
# Data Hazards in the Basic Pipeline

• Not all is well with the basic pipeline...

• Consider the following program that is executed by the basic pipeline:

```
// Assume: R1 = 123, R2 = 11, R3 = 9, R4 = 1, R5 = 8, R6 = 0, R7 = 2

ADD R2, R3, R1        // R1 = R2 + R3   (new value = 11 + 9 = 20)
ADD R4, R1, R4        // R4 = R1 + R4
...
```
• The correct behavior (the one that an assembly programmer would expect) is:

 R1 := R2+R3 = 20, then the other instructions will add R1=20 to registers R4, R5, R6 and R7
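As a point of reference, the expected behavior can be sketched as a tiny sequential interpreter in Python. This is only an illustration, not part of the course's CPU: the names `regs`, `program` and `run_sequential` are made up here, and only the two instructions listed above are modeled.

```python
# Hypothetical sequential model: each instruction completes before the next starts.
regs = {"R1": 123, "R2": 11, "R3": 9, "R4": 1, "R5": 8, "R6": 0, "R7": 2}

# Each entry is (src1, src2, dest), following the "ADD src1, src2, dest" order.
program = [
    ("R2", "R3", "R1"),  # ADD R2, R3, R1
    ("R4", "R1", "R4"),  # ADD R4, R1, R4
]

def run_sequential(regs, program):
    """Execute the instructions one at a time and return the final registers."""
    regs = dict(regs)  # work on a copy
    for src1, src2, dest in program:
        regs[dest] = regs[src1] + regs[src2]
    return regs

result = run_sequential(regs, program)
print(result["R1"], result["R4"])  # prints: 20 21
```

In this model the second ADD always sees R1 = 20, which is exactly what the programmer expects.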

• But that is NOT what will happen in the basic pipelined CPU...

• CPU Cycle 1

• At start of the CPU cycle, the IF stage sends out PC
• At end of the CPU cycle, the IR(ID) register is updated with the instruction fetched (ADD R2, R3, R1)

• The picture above depicts the content of the CPU at end of the first CPU cycle (and the start of the 2nd cycle)

• CPU Cycle 2

• At start of the CPU cycle, the ID stage sends out selection signal that selects values from R2 and R3

• At end of the CPU cycle, A register is updated with R2 = 11, B register is updated with R3= 9.
• Also, at the end of the CPU cycle, the instruction (ADD R2, R3, R1) is moved into IR(EX) and instruction ADD R4, R1, R4 is fetched into IR(ID)

• The picture above depicts the content of the CPU at end of the second CPU cycle (and the start of the 3rd cycle)

• CPU Cycle 3

• At start of the CPU cycle, the EX stage selects the values of R2 and R3 for the ALU and uses the ALU opcode to make the ALU add the input values, forming the result 20 (which will become the value of R1)

Also, at start of the CPU cycle, the ID stage selects R4 and R1 to be copied into the A and B registers:

• Notice that an OLD value of R1 will be fetched

• At end of the CPU cycle, the ALUo and DMAR registers are updated with the value R2+R3 = 20 (future value of R1)

Also, at the end of the CPU cycle, A is updated to R4 (=1) and B is updated to the "current value" of R1 (= 123). This "current" value is a wrong value because there is a more current one on the way....

Also, at the end of the CPU cycle, the instruction (ADD R2, R3, R1) is moved into IR(MEM), ADD R4, R1, R4 is moved into IR(EX) and instruction ADD R5, R1, R5 is fetched into IR(ID)

• The picture above depicts the content of the CPU at end of the 3rd CPU cycle (and the start of the 4th cycle)
• We can see that ADD R4,R1,R4 will not execute correctly.

• CPU Cycle 4

• At start of the CPU cycle, the MEM stage's ALUo1 register will start to receive the output 20 of ADD R2, R3, R1.

• Also at start of the CPU cycle, the EX stage selects the value of R4 and the old value of R1 for the ALU and uses the ALU opcode to make the ALU add the input values. The result is INCORRECT because an old value of R1 was used

Also, at start of the CPU cycle, the ID stage selects R5 and the same OLD value of R1 to be copied into the A and B registers.

• At end of the CPU cycle, ALUo1 in MEM stage is updated with the value R2+R3 = 20

• Also, at end of the CPU cycle, the ALUo and DMAR registers are updated with the value R4+R1(wrong value) = 124

Also, at the end of the CPU cycle, A is updated to R5 and B is updated to the old value of R1 .

Also, at the end of the CPU cycle, the instruction (ADD R2, R3, R1) is moved into IR(WB), ADD R4, R1, R4 is moved into IR(MEM), instruction ADD R5, R1, R5 is moved into IR(EX) and instruction ADD R6, R1, R6 is fetched into IR(ID)

• The picture above depicts the content of the CPU at end of the 4th CPU cycle (and the start of the 5th cycle)
• We can see that ADD R4,R1,R4 and ADD R5,R1,R5 will not execute correctly.
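The stage-by-stage walk-through of cycles 1 to 4 can be summarized with a short Python sketch. It assumes an ideal 5-stage pipeline (IF, ID, EX, MEM, WB) with no stalls, in which instruction number i (counting from 0) is in IF during cycle i + 1. The helper names `stage_of` and `occupancy` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# The four instructions named in the walk-through, in program order.
program = [
    "ADD R2, R3, R1",
    "ADD R4, R1, R4",
    "ADD R5, R1, R5",
    "ADD R6, R1, R6",
]

def stage_of(instr_index, cycle):
    """Stage occupied by instruction `instr_index` in 1-based `cycle`, or None."""
    pos = cycle - 1 - instr_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

def occupancy(cycle):
    """Map each busy stage to the instruction it holds during `cycle`."""
    table = {}
    for i, instr in enumerate(program):
        stage = stage_of(i, cycle)
        if stage is not None:
            table[stage] = instr
    return table

for cycle in range(1, 6):
    print(cycle, occupancy(cycle))
```

Under these assumptions, ADD R4, R1, R4 is in ID during cycle 3 and ADD R5, R1, R5 is in ID during cycle 4, both before ADD R2, R3, R1 reaches WB in cycle 5.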

• CPU Cycle 5

• You may think that since register R1 has not yet been updated when cycle 5 starts, the next instruction (ADD R6,R1,R6) that uses R1 will also fetch an OLD value.

That is not true: we can make ADD R6,R1,R6 fetch the updated R1 value through the following "trick".

• D-flipflop memory elements can be updated when the clock goes from 0-->1 or from 1-->0.

• We wired the general purpose registers in such a way that they will be updated when the clock goes from 0-->1

We wired the A, B and D registers in such a way that they are updated when the clock goes from 1-->0.

• Let us see what will happen to the ADD R6,R1,R6 instruction in the following diagrams...

• At start of the CPU cycle, the WB stage will start to send the output 20 of ADD R2, R3, R1 (now held in the ALUo1 register) to the general purpose registers:

• At the mid point of the cycle, the WB stage will trigger the register R1 to be updated (because the general purpose registers are updated when clock goes from 0-->1):

• Immediately after R1 is updated, the output of R1 will change from 123 to 20 and the value 20 will flow from the registers to the input of the B-register.

So sometime after the midpoint (but before the transition from 1-->0), the new value 20 of R1 will arrive at register B:

• At end of the CPU cycle, B will be updated with the new value in R1 !!!

• The picture above depicts the content of the CPU at end of the 5th CPU cycle (and the start of the 6th cycle)
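The effect of the two clock edges can be sketched in Python by measuring time in half cycles. This is a simplified model, not the actual circuit: it assumes cycle c ends at time c, that R1 is written at the midpoint of cycle 5 (clock 0-->1), and that B latches at the end of the ID cycle (clock 1-->0). The function name and the `split_clock` flag are hypothetical; `split_clock=False` models an imagined design in which R1 is only written at the very end of the WB cycle.

```python
OLD_R1, NEW_R1 = 123, 20   # values of R1 from the example
PRODUCER_WB_CYCLE = 5      # ADD R2, R3, R1 is in WB during cycle 5

def value_latched_into_b(id_cycle, split_clock=True):
    """Value of R1 that register B captures at the end of `id_cycle`."""
    if split_clock:
        write_time = PRODUCER_WB_CYCLE - 0.5   # midpoint of the WB cycle
    else:
        write_time = PRODUCER_WB_CYCLE         # end of the WB cycle (assumed alternative)
    latch_time = id_cycle                      # end of the ID cycle
    # B sees the new value only if R1 was written strictly before B latches.
    return NEW_R1 if write_time < latch_time else OLD_R1

print(value_latched_into_b(3))  # ADD R4, R1, R4 -> 123
print(value_latched_into_b(4))  # ADD R5, R1, R5 -> 123
print(value_latched_into_b(5))  # ADD R6, R1, R6 -> 20
```

With `split_clock=False` the third call would also return 123, which is why the half-cycle offset matters for ADD R6,R1,R6.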

• We can see that ONLY TWO instructions (ADD R4,R1,R4 and ADD R5,R1,R5) will not execute correctly.

• The fact that TWO instructions will not be able to obtain the correct value is important in the design of a solution.
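The whole example can be replayed with a small event-driven sketch in Python. It is a rough model under stated assumptions, not the course's simulator: instruction i (counting from 0) latches its operands at the end of cycle i + 2 (ID) and writes its destination at the midpoint of cycle i + 5 (WB). The program consists of the four ADD instructions named in the walk-through, and `run_pipelined` is a hypothetical name.

```python
regs = {"R1": 123, "R2": 11, "R3": 9, "R4": 1, "R5": 8, "R6": 0, "R7": 2}

# (src1, src2, dest), following the "ADD src1, src2, dest" order.
program = [
    ("R2", "R3", "R1"),  # ADD R2, R3, R1
    ("R4", "R1", "R4"),  # ADD R4, R1, R4
    ("R5", "R1", "R5"),  # ADD R5, R1, R5
    ("R6", "R1", "R6"),  # ADD R6, R1, R6
]

def run_pipelined(regs, program):
    """Replay operand reads and register writes in time order."""
    regs = dict(regs)
    events = []
    for i in range(len(program)):
        events.append((i + 2.0, "read", i))   # end of ID: A and B are latched
        events.append((i + 4.5, "write", i))  # midpoint of WB: dest is updated
    alu_result = {}
    for _, kind, i in sorted(events):
        src1, src2, dest = program[i]
        if kind == "read":
            # The sum is fixed by whatever values the registers hold right now.
            alu_result[i] = regs[src1] + regs[src2]
        else:
            regs[dest] = alu_result[i]
    return regs

final = run_pipelined(regs, program)
print(final["R1"], final["R4"], final["R5"], final["R6"])  # prints: 20 124 131 20
```

A sequential machine would produce R4 = 21 and R5 = 28, so in this model exactly two instructions end up with wrong results, while R6 = 20 is correct.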

• Demo: Aaron/5-ALU-hazard

• Executes the following program:

```
Initially: R1 = 7
           R4,R5,R6,R7 = 1

R1 = R2 + R3   ==> R1 = 5 !!!
R4 = R4 + R1   ==> R4 = 8 (error) !!!
R5 = R4 + R1   ==> R5 = 8 (error) !!!
R6 = R4 + R1   ==> R6 = 6 (correct)
R7 = R4 + R1   ==> R7 = 6 (correct)
```

What to do:

• Clock until all stages have 0 for instructions

• First instruction will update R1

• Next 2 instructions will get the wrong value for R1

• Pay attention to the 4th instruction:

 Clock up   ⇒   R1 is updated

 Clock down   ⇒   B is updated with the correct value in R1