# Computer Algorithms for Message Passing MIMD

• Different kinds of thinking...

• When I first encountered algorithms for Message-Passing MIMD computers, they struck me as "weird"... at least, much "weirder" than Shared-Memory MIMD algorithms.

• The reason is probably that all of us "grew up" in a shared-memory programming environment and have never seen a message passing algorithm before.

• Main cause of the differences:

 In shared-memory multi-processors, a result produced by one processor is immediately available to all other processors. In contrast, in message-passing multi-processors, a result produced by a processor must be sent to the other processor(s) to make it available!

• So here is a brief intro to a "weird" message passing algorithm

(This is not a course in distributed algorithms; I am just trying to give you a glimpse into this area)

(As you can see, this is the end of the semester and we don't have time to study parallel and distributed algorithms at length; this is meant to make you aware of a new way of thinking about solving problems).

• Computing partial sums

• Given a sequence of values:

 ```
 a0, a1, a2, ..., aN-1
 ```

• Problem:

 Find ALL the partial sums:

 ```
 a0
 a0 + a1
 a0 + a1 + a2
 a0 + a1 + a2 + a3
 ...
 a0 + a1 + a2 + a3 + ... + aN-1
 ```

 ```
 PartialSum[0] = a[0];

 for (i = 1; i < N; i++)
     PartialSum[i] = PartialSum[i-1] + a[i];
 ```

The run time complexity is O(N)

• Overview of the Partial Sum Message Passing Algorithm:

• Uses N processors
(think: transputer - a large number of processors available)

• Processor i holds the number a[i] (only one number)

• Processors exchange information to compute the partial sums...

• Example of the Partial Sum Message Passing Algorithm:

• I think the algorithm can best be explained by using an example first.

• Consider N = 8

• The following steps show how 8 message-passing processors can compute all the partial sums:

The goal is to compute ALL the partial sums:

• Processor 0: a0
• Processor 1: a0 + a1
• Processor 2: a0 + a1 + a2
• Processor 3: a0 + a1 + a2 + a3
• ...
• Processor 7: a0 + a1 + a2 + a3 + ... + a7

(Array notation a[i] and subscript notation ai refer to the same value below)

• Step 1:

• processor i sends its data to processor i + 1, for i = 0, 1, ..., 6

• processor i receives the message and adds the received value to its data, for i = 1, 2, ..., 7

Resulting state:

• Processor 0: a0
• Processor 1: a0 + a1
• Processor 2: a1 + a2
• Processor 3: a2 + a3
• ...
• Processor 7: a6 + a7

(each processor now holds the sum of at most 2 consecutive values, ending at its own value)

• Step 2:

• processor i sends its data to processor i + 2, for i = 0, 1, ..., 5

• processor i receives the message and adds the received value to its data, for i = 2, 3, ..., 7

Resulting state:

• Processor 0: a0
• Processor 1: a0 + a1
• Processor 2: a0 + a1 + a2
• Processor 3: a0 + a1 + a2 + a3
• Processor 4: a1 + a2 + a3 + a4
• ...
• Processor 7: a4 + a5 + a6 + a7

(each processor now holds the sum of at most 4 consecutive values, ending at its own value)

• Step 3: (final step)

• processor i sends its data to processor i + 4, for i = 0, 1, ..., 3

• processor i receives the message and adds the received value to its data, for i = 4, 5, ..., 7

Resulting state:

• Processor 0: a0
• Processor 1: a0 + a1
• Processor 2: a0 + a1 + a2
• Processor 3: a0 + a1 + a2 + a3
• ...
• Processor 7: a0 + a1 + a2 + a3 + ... + a7

OK, how can we formulate this as a computer algorithm ?

• Message Passing Algorithm for processor i:

 ```
 // k = loop index
 // k runs: 1, 2, 4, 8, .... etc !!!
 for (k = 1; k < N; k = 2*k)
 {
     // Check if I am a processor that needs to send data
     if ( i + k < N )
         send a[i] to processor (i+k);

     // Check if I am a processor that needs to receive data
     if ( i >= k )
     {
         receive message x;
         a[i] = a[i] + x;
     }
 }
 ```

• Run time complexity:

• # iterations = O(log(N)) !!!

• Foot Note

• This kind of message passing algorithm will only work when you have a massive number of processors and a super-high-speed interconnection network (because you are sending an insane number of messages, and if you have a slow network, the running time will be dominated by the message transmissions).

• Most message-passing MIMDs available today consist of a relatively small number (16 to a few hundred) of powerful computers interconnected by a high-speed network.

• The most commonly used message-passing programming paradigm is "job-shop" where one processor (master) doles out pieces of work (tasks) to other processors (slaves).

• The Message Passing Interface (MPI) is ideal for this kind of message passing programming paradigm:

• Process 0 in MPI is the master process
• All other processes are slave processes