# Another way to distribute the work load

• Find the Minimum value in an array - take 2

• Let's do the "find min" example again, this time splitting the task of finding the minimum value in an array in a different way

• Solution 2:

• Split the array into 2 (approximately) equal halves
• Thread 0 finds the minimum among the even-indexed elements of the array
(i.e., x[0], x[2], x[4], etc.)
• Thread 1 finds the minimum among the odd-indexed elements of the array
(i.e., x[1], x[3], x[5], etc.)
• The main thread waits for the results and finds the actual minimum.

Pictorially:

```
              values handled by thread 0
 |   |   |   |   |   |   |   |   |   |   |   |   |   |
 V   V   V   V   V   V   V   V   V   V   V   V   V   V
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^
   |   |   |   |   |   |   |   |   |   |   |   |   |   |
              values handled by thread 1


        Thread 0          Thread 1
           |                 |
           |                 |
           V                 V
         min[0]            min[1]
             \              /
              \            /
               \          /
                \        /
                 \      /
                main thread
                     |
                     |
                     V
               Actual minimum
```

• The division of labor in general is:

Main Thread: (UNCHANGED)

```
// -----------------------------------
// Create worker threads....
// -----------------------------------
for (i = 0; i < num_threads; i = i + 1)
{
   start[i] = i;      // Pass ID to thread in a private variable

   if ( pthread_create(&tid[i], NULL, worker, (void *)&start[i]) )
   {
      cout << "Cannot create thread" << endl;
      exit(1);
   }
}

// -----------------------------------
// Wait for worker threads to end....
// -----------------------------------
for (i = 0; i < num_threads; i = i + 1)
   pthread_join(tid[i], NULL);

// ----------------------------------------
// Post processing: Find actual minimum
// ----------------------------------------
my_min = min[0];

for (i = 1; i < num_threads; i++)
   if ( min[i] < my_min )
      my_min = min[i];
```

Worker Thread: (CHANGED !!!)

```
void *worker(void *arg)
{
   int i, s;
   double my_min;

   s = * (int *) arg;     // Convert arg to an integer

   // --------------------------------------
   // Find min in my range
   // --------------------------------------
   my_min = x[s];

   for (i = s+num_threads; i < MAX; i += num_threads)
   {
      if ( x[i] < my_min )
         my_min = x[i];
   }

   min[s] = my_min;       // Store min in private slot

   return(NULL);          /* Thread exits (dies) */
}
```

The elements processed by thread s are: x[s], x[s+num_threads], x[s+2*num_threads], etc.

It's much easier to code the worker thread !!!
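The main-thread and worker fragments above can be combined into one self-contained sketch. This is not the course's min-mt2.C: the names `WorkerArg`, `strided_worker` and `strided_min` are hypothetical, and the globals `x`, `min`, `start` and `MAX` are replaced by a per-thread argument struct so the function can be called on any array.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper type for this sketch (not taken from min-mt2.C).
struct WorkerArg {
    const double *x;  // shared, read-only input array
    int n;            // number of elements in x
    int id;           // thread ID s = first index this thread examines
    int stride;       // num_threads = step between examined elements
    double result;    // private slot for this thread's minimum
};

// Worker: scans x[id], x[id + stride], x[id + 2*stride], ...
static void *strided_worker(void *p) {
    WorkerArg *a = static_cast<WorkerArg *>(p);
    double my_min = a->x[a->id];
    for (int i = a->id + a->stride; i < a->n; i += a->stride)
        if (a->x[i] < my_min)
            my_min = a->x[i];
    a->result = my_min;
    return NULL;
}

// Main-thread side: create workers, join them, combine the partial minima.
double strided_min(const std::vector<double> &x, int num_threads) {
    // Every thread needs at least one element to start from.
    assert(num_threads > 0 &&
           static_cast<std::size_t>(num_threads) <= x.size());

    std::vector<pthread_t> tid(num_threads);
    std::vector<WorkerArg> arg(num_threads);

    for (int i = 0; i < num_threads; i++) {
        arg[i].x = x.data();
        arg[i].n = static_cast<int>(x.size());
        arg[i].id = i;
        arg[i].stride = num_threads;
        arg[i].result = 0.0;
        if (pthread_create(&tid[i], NULL, strided_worker, &arg[i]) != 0) {
            // Join the threads already started before reporting the error.
            for (int j = 0; j < i; j++)
                pthread_join(tid[j], NULL);
            throw std::runtime_error("Cannot create thread");
        }
    }

    for (int i = 0; i < num_threads; i++)
        pthread_join(tid[i], NULL);

    // Post processing: find the actual minimum among the partial results.
    double my_min = arg[0].result;
    for (int i = 1; i < num_threads; i++)
        if (arg[i].result < my_min)
            my_min = arg[i].result;
    return my_min;
}
```

For instance, with the array {5, 3, 8, -2.5, 7, 1, 9} and 2 threads, thread 0 scans indices 0, 2, 4, 6 and thread 1 scans indices 1, 3, 5; the combined result is -2.5.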

• Example Program: (Demo above code)

Compile with: g++ -pthread min-mt2.C

• Speed up...

• Try running the programs using different numbers of threads (the program prints the elapsed time)

• Notice that the first version has drastically improved times on multi-processors (e.g. on compute

But the second version... not so much...
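How the demo program measures its elapsed time is not shown here. One possible way to take such a measurement, sketched with the standard `std::chrono::steady_clock`, is a small helper like the hypothetical `time_seconds` below:

```cpp
#include <chrono>

// Hypothetical helper: runs f() once and returns the wall-clock time
// it took, in seconds. steady_clock is used because it never jumps
// backwards while the measurement is in progress.
template <typename F>
double time_seconds(F f) {
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    f();
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Wrapping the "create threads, join threads, post-process" part of the main thread in such a call, once per thread count, gives numbers that can be compared between the two versions.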

• \$60,000 question:

• Why is the second version not doing so well?

• Answer: paging...

Each thread traverses the array from the beginning to the end.

Due to the large size of the array, the whole array cannot be kept in memory at once, so parts of it must be paged in whenever a thread accesses array elements that are not currently in memory.

(Solution 1 does not have the paging problem, because each thread's accesses are limited to one contiguous region of the array)
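The difference in access pattern can be made concrete by computing the first and last index that thread s touches under each scheme. Solution 1's code is not shown in this section, so the contiguous split in `block_span` is an assumption about how it divides the array; both function names are hypothetical.

```cpp
#include <cassert>
#include <utility>

// Assumed Solution 1 style split: thread s gets one contiguous block.
// Returns the (first, last) index touched by thread s.
std::pair<int, int> block_span(int s, int num_threads, int n) {
    assert(num_threads > 0 && s >= 0 && s < num_threads && n >= num_threads);
    long first = static_cast<long>(s) * n / num_threads;
    long last = static_cast<long>(s + 1) * n / num_threads - 1;
    return std::make_pair(static_cast<int>(first), static_cast<int>(last));
}

// Solution 2 split: thread s touches s, s + num_threads, s + 2*num_threads, ...
// Returns the (first, last) index touched by thread s.
std::pair<int, int> interleaved_span(int s, int num_threads, int n) {
    assert(num_threads > 0 && s >= 0 && s < num_threads && n >= num_threads);
    int steps = (n - 1 - s) / num_threads;  // number of full strides that fit
    return std::make_pair(s, s + steps * num_threads);
}
```

With n = 1000 and 4 threads, thread 1 spans indices 250..499 under the block split but 1..997 under the interleaved split, so every interleaved thread walks across essentially the whole array.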