
\[ v = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \]
\[ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 5 \\ 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \\ 4 \end{bmatrix} \]
(The elements are added rowwise)
(The addition operation is applied to all rows)
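As a minimal sketch of the row-wise addition described above, the following Python function applies the same `+` operation to every row of two column vectors. The function name `vector_add` is illustrative only and does not come from the notes.

```python
def vector_add(x, y):
    """Add two equal-length vectors element by element (row by row)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # The same '+' operation is applied to every row; only the operands differ.
    return [a + b for a, b in zip(x, y)]


# Operands taken from the vector addition example.
result = vector_add([1, 2, 3], [5, 4, 1])
print(result)
```

On a vector machine the three additions are independent of one another, which is what allows them to be issued as a single vector operation.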
\[ A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} \]

\[ B = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33} \end{bmatrix} \]

Then:

\[ C = A*B = \begin{bmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{bmatrix} \]

where:

\[ C_{ij} = A_{i1}*B_{1j} + A_{i2}*B_{2j} + A_{i3}*B_{3j} \]

(for i = 1, 2, 3 and j = 1, 2, 3)
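The formula for C_{ij} can be checked with a short scalar sketch in Python. The function name `matmul_3x3` and the sample matrices are assumptions chosen for illustration; the code uses 0-based indices where the formula uses 1-based ones.

```python
def matmul_3x3(A, B):
    """Multiply two 3x3 matrices using the element formula for C_ij."""
    n = 3
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # C_ij = A_i1*B_1j + A_i2*B_2j + A_i3*B_3j
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return C


# Illustrative values (not from the notes): multiplying by the identity
# matrix should return the original matrix unchanged.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
C = matmul_3x3(A, I)
print(C)
```

This version computes one element of C at a time, so it performs nine separate three-term sums.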

But each row of matrix A uses DIFFERENT operands (data)
While going through the example, note that the same operations are performed on each column.
Initialization step (done once):
Processing Row 1:
Processing Row 2:
And so on....
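A minimal sketch of such a row-by-row scheme, assuming the initialization step makes the rows of B available once as vector operands and that each row of C is then built from scalar-times-vector operations. The helper names `scale`, `vadd` and `matmul_by_rows` are hypothetical, and this is not a transcription of the actual Cray instruction sequence.

```python
def scale(s, v):
    """Scalar times vector: the same multiply is applied to every column."""
    return [s * x for x in v]


def vadd(u, v):
    """Vector plus vector: the same add is applied to every column."""
    return [a + b for a, b in zip(u, v)]


def matmul_by_rows(A, B):
    # Initialization (done once): keep the rows of B at hand as vectors.
    b_rows = [list(row) for row in B]
    C = []
    for a_row in A:
        # Processing one row of A: different scalars A_i1, A_i2, A_i3,
        # but the same sequence of vector operations every time.
        acc = [0] * len(b_rows[0])
        for a_ik, b_k in zip(a_row, b_rows):
            acc = vadd(acc, scale(a_ik, b_k))
        C.append(acc)
    return C


# Illustrative values only.
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
product = matmul_by_rows(M, M)
print(product)
```

Each row of C follows from the element formula by fixing i and letting j range over all columns: row i of C is A_{i1} times row 1 of B, plus A_{i2} times row 2 of B, plus A_{i3} times row 3 of B.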
The Cray has multiple system busses:
So it can fetch multiple (up to 64!) operands from memory at the same time, but ONLY IF the operands are stored in DIFFERENT memory banks.
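The bank restriction can be illustrated with a small Python sketch. It assumes a hypothetical memory with 64 banks and simple low-order interleaving (word address modulo the number of banks); both the bank count and the mapping are assumptions made for illustration, not the documented Cray memory layout.

```python
NUM_BANKS = 64  # assumption: hypothetical bank count for this illustration


def bank_of(address, num_banks=NUM_BANKS):
    """Bank holding a word address, assuming low-order interleaving."""
    return address % num_banks


def conflict_free(addresses, num_banks=NUM_BANKS):
    """True if every address falls in a different bank."""
    banks = [bank_of(a, num_banks) for a in addresses]
    return len(set(banks)) == len(banks)


# 64 consecutive words land in 64 different banks under this mapping.
unit_stride = list(range(64))
# Every 64th word lands in the same bank, so the fetches would collide.
stride_64 = [i * 64 for i in range(64)]

print(conflict_free(unit_stride))
print(conflict_free(stride_64))
```

Under these assumptions, operands stored in consecutive words can be fetched together, while operands spaced exactly one bank count apart all compete for the same bank.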
To perform this step in the matrix multiplication:
It must
