
ECE 2500 Computer Organization and Architecture Spring 2012

Multi-cycle MIPS Hardware

The Single Cycle Computer Review

[Figure: Single Cycle Computer]

To recap, let's follow closely how the following instruction is executed on this architecture:
lw $s0, 12($t3)

Program in instruction memory:
0x1000 add $t3, $t2, $t3
0x1004 lw $s0, 12($t3)
0x1008 sub $t5, $t2, $t1

Delays that add up while lw $s0, 12($t3) executes:
1. Fetch the instruction at 0x1004: memory access time
2. Read $t3 from the register file: register access time
3. Compute the address $t3 + 12: alu add time
4. Read mem($t3 + 12) from data memory: memory read time (+ mux)
5. The time left in the clock cycle must be bigger than the register file write setup time
Single-Cycle implies a long Critical Path

Critical Path varies with instruction type

add $t3, $t4, $t5
bne $t3, $t4, 25
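The effect of the critical path on the clock period can be sketched in a few lines of Python. The component delays below are hypothetical values chosen only for illustration (the slides give no numbers), and the component names are assumptions as well.

```python
# Hypothetical component delays in picoseconds (illustrative only, not from the slides).
DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Components on the path of each instruction type in a single-cycle datapath.
PATHS = {
    "lw": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "add": ["instr_mem", "reg_read", "alu", "reg_write"],
    "bne": ["instr_mem", "reg_read", "alu"],
}


def path_delay(instr):
    """Sum the delays of all components an instruction passes through."""
    return sum(DELAY_PS[component] for component in PATHS[instr])


# A single-cycle clock must fit the slowest instruction.
clock_period = max(path_delay(instr) for instr in PATHS)

for instr in PATHS:
    print(instr, path_delay(instr), "ps, idle:", clock_period - path_delay(instr), "ps")
```

With these assumed delays, lw sets the clock period and every other instruction leaves part of the cycle unused, which is the inefficiency the multi-cycle design addresses.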

The multi-cycle processor

The single cycle processor
  has a very simple control scheme
  BUT has a very long critical path
The critical path varies with the instruction type
  Results in inefficient use of the clock cycle
Therefore, we will chop the instruction cycle
  Multiple shorter cycles per instruction
  Vary the number of shorter cycles with the instruction type
Key points to figure out
  How to 'chop' logic in cycles
  How to modify the single-cycle computer architecture

Splitting a combinational computation

[Figure: in the single-cycle version, two Logic blocks sit between one pair of Registers; in the multi-cycle version, a Register between them makes the first Logic block run in Cycle 1 and the second in Cycle 2.]

Making the transformation to multi-cycle

Add a register at the output of each logic block

Result is 6 small operations

1. Fetch Instruction
2. Increment PC
3. Calculate Branch
4. Read Register Operands
5. ALU Operation
6. Fetch Data operand

Merging Logic in multicycle implementations

Distributing logic over multiple cycles enables reuse!

[Figure: before the multi-cycle conversion, operation 1 and operation 2 each use their own ALU between reg1 and reg2; after the conversion, reg3 separates operation 1 from operation 2, so the two ALUs are used in different clock cycles.]

Two identical ALUs used in different clock cycles can be merged into one ALU

[Figure: after merging, a single ALU between reg1 and reg2/3 performs operation 1 in cycle 1 and operation 2 in cycle 2.]

What Logic can we merge in the S-C computer?

Merge additions and ALU
Merge memory access

Logic merging results in multi-cycle datapath

[Figure: multi-cycle datapath with registers between the merged logic blocks]

Multi-cycle datapath (with multiplexers)

What is the difference between the green and the blue multiplexers?

The Single Cycle Computer

Look at the single cycle computer if you're not sure ...

Multi-cycle datapath (with multiplexers)

Green multiplexers accommodate multiple types of MIPS instructions
  Result data is from data memory (lw) or from ALU (arithmetic op)
  I-type and R-type have different destination reg field

Multi-cycle datapath (with multiplexers)

Blue multiplexers support logic reuse during multi-cycle instructions
  ALU does next-PC calculation and arithmetic
  Memory serves as instruction-memory and data-memory

Multi-cycle controller design

Each instruction can now be mapped to several clock cycles in the multi-cycle controller design
A MIPS instruction will be split into up to 5 execution steps
  Instruction Fetch
  Instruction Decode and Register Fetch
  Execution
  Memory access
  Memory read completion
In the following, we will map the MIPS instructions to the above 5 execution steps

Register Transfers and Register Names

We will describe the execution in terms of so-called register-transfers between the registers with names as shown below

[Figure: datapath with the registers IR, PC, AluOut and MDR labeled]

Cycle 1: Instruction Fetch

PC <= PC + 4
IR <= Memory[PC]

Cycle 2: Instruction Decode and Reg Fetch

A <= Reg[IR[25:21]]
B <= Reg[IR[20:16]]
AluOut <= PC + (SignExt(IR[15:0]) << 2)

The branch target calculation is optimistic: the result may or may not be needed
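The optimistic branch-target calculation of cycle 2 can be written out as a small Python sketch. The function names are hypothetical, and 32-bit wraparound of the address is assumed.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, ir):
    """AluOut <= PC + (SignExt(IR[15:0]) << 2).

    pc is the value PC holds in cycle 2, i.e. it was already
    incremented by 4 during the instruction fetch cycle.
    """
    offset = sign_extend16(ir & 0xFFFF) << 2  # word offset to byte offset
    return (pc + offset) & 0xFFFFFFFF         # assumed 32-bit wraparound


# Only IR[15:0] matters here, so the upper instruction bits are left at zero.
print(hex(branch_target(0x1008, 25)))      # forward branch by 25 words
print(hex(branch_target(0x1008, 0xFFFF)))  # backward branch by 1 word
```

The calculation runs for every instruction, whether or not it is a branch; for non-branch instructions the value in AluOut is simply overwritten later.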

Cycle 3: Execution (for Branch)

if (A == B) PC <= AluOut
This completes the Branch Instruction

Cycle 3: Execution (for R-type)

AluOut <= A op B

Cycle 4: Memory Access (for R-type)

Reg[IR[15:11]] <= AluOut
This completes the R-type instruction

Cycle 3: Execution (for lw/sw)

AluOut <= A + SignExt(IR[15:0])

Cycle 4: Memory Access (for lw/sw)

lw instruction - MDR <= Memory[AluOut]
sw instruction - Memory[AluOut] <= B
This completes the sw instruction

Cycle 5: Memory completion

Reg[IR[20:16]] <= MDR
This completes the lw instruction

RTL Summary

Instructions take 3, 4, or 5 cycles
This table can be used to design the controller

Step               | R-type                   | lw                              | sw                              | branch
Instruction Fetch  | IR <= Memory[PC]; PC <= PC + 4 (all instruction types)
Instruction Decode | A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; AluOut <= PC + (SignExt(IR[15:0]) << 2) (all instruction types)
Execution          | AluOut <= A op B         | AluOut <= A + SignExt(IR[15:0]) | AluOut <= A + SignExt(IR[15:0]) | if (A==B) PC <= AluOut
Memory Access      | Reg[IR[15:11]] <= AluOut | MDR <= Mem[AluOut]              | Mem[AluOut] <= B                |
Memory Completion  |                          | Reg[IR[20:16]] <= MDR           |                                 |
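As a sketch of how the RTL summary can drive the controller design, the table can be stored as plain data with one list of steps per instruction type. The Python names below are hypothetical; the register transfers are copied from the table.

```python
# Steps shared by every instruction type.
FETCH = "IR <= Memory[PC]; PC <= PC + 4"
DECODE = ("A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; "
          "AluOut <= PC + (SignExt(IR[15:0]) << 2)")
ADDRESS = "AluOut <= A + SignExt(IR[15:0])"

# One list of register transfers per instruction type, one entry per clock cycle.
RTL = {
    "R-type": [FETCH, DECODE, "AluOut <= A op B", "Reg[IR[15:11]] <= AluOut"],
    "lw": [FETCH, DECODE, ADDRESS, "MDR <= Mem[AluOut]", "Reg[IR[20:16]] <= MDR"],
    "sw": [FETCH, DECODE, ADDRESS, "Mem[AluOut] <= B"],
    "branch": [FETCH, DECODE, "if (A==B) PC <= AluOut"],
}

# The number of cycles per instruction is the length of each list.
cycles = {instr: len(steps) for instr, steps in RTL.items()}
print(cycles)
```

Counting the entries reproduces the statement that instructions take 3, 4, or 5 cycles.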

Multi-Cycle Datapath (with control signals)


How to generate these control signals?

Example: let's find the control signals for

AluOut <= A + SignExt(IR[15:0])

This register transfer is the Execution step of lw and sw in the RTL summary table.

Control signals for Register-Transfers

[Figure: datapath annotated with the control signal values X 0 0 0 X 0 1 and the ALU operation add]

Finite State Machines

Generate a (possibly conditional) sequence of control signals
A Finite State Machine (FSM) is an abstract representation of a control sequence
  An FSM models a machine that can be in several different states
  State transitions bring the machine from one state into the other
  A single state is designated as an initial state
  An FSM is captured by a graph, with nodes representing states and edges representing state transitions

Finite State Machine: Sequencing

This machine has three states (s0, s1, s2) and starts out in s0
When in s0, it will always transition into s1, and next into s2

[Figure: S0 -> S1 -> S2]

Finite State Machine: Sequencing

In our discussion, state transitions are timed.
Each clock cycle, the FSM will make a single state transition

[Figure: S0 in cycle 0, cycle 3, ...; S1 in cycle 1, cycle 4; S2 in cycle 2, cycle 5]
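The timed sequencing can be simulated with a transition table and one lookup per clock cycle. This is a minimal Python sketch; the names NEXT and run are hypothetical.

```python
# Unconditional transitions of the three-state machine.
# S2 returns to S0, since S0 is active again in cycle 3.
NEXT = {"S0": "S1", "S1": "S2", "S2": "S0"}


def run(num_cycles, state="S0"):
    """Return the state the FSM is in during each clock cycle."""
    trace = []
    for _ in range(num_cycles):
        trace.append(state)
        state = NEXT[state]  # exactly one transition per clock cycle
    return trace


for cycle, state in enumerate(run(6)):
    print("cycle", cycle, state)
```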

Finite State Machine: Sequencing

State transitions can be conditional when decision-making is needed

[Figure: a controller modeled with an FSM; from S0 it goes to S1 when a==0 and to S2 when a==1. The Datapath generates the state transition condition a.]
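A conditional transition only changes the next-state function: it now takes the datapath condition a as an input. In the Python sketch below, the edges out of S1 and S2 are an assumption (both return to S0), because the slide only shows the two edges leaving S0.

```python
def next_state(state, a):
    """Return the state after one clock cycle, given the datapath condition a."""
    if state == "S0":
        return "S1" if a == 0 else "S2"
    # Assumption: S1 and S2 both return to S0 (not shown on the slide).
    return "S0"


def run(conditions, state="S0"):
    """Apply one transition per clock cycle, one value of a per cycle."""
    trace = [state]
    for a in conditions:
        state = next_state(state, a)
        trace.append(state)
    return trace


print(run([0, 0, 1]))
```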

FSM for the multi-cycle datapath

Each loop in this graph represents the execution of a single instruction
  5 loops for 5 instruction types: lw, sw, R-type, conditional branch, jump
  We did not discuss jump
Each state shows the value of the control signals
Each state transition shows the condition that triggers it
  If nothing is shown, the transition is taken unconditionally at the start of the next clock cycle.

Multicycle Datapath Finite State Machine

State 0: Fetch (start state)
State 1: Decode
State 2: Exec Load/Store
State 3: Memory Load
State 4: Write Back
State 5: Memory Store
State 6: Exec R-Type
State 7: Memory R-Type
State 8: Exec branch
State 9: Exec jump
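The state graph can be sketched as a next-state function in Python. The state numbering follows the labels above, reading state 0 as Fetch and state 1 as Decode; the transitions are reconstructed from the per-instruction cycle descriptions and should be treated as an assumption, since the edges of the graph did not survive in the text.

```python
# State reached from Decode (state 1) for each instruction type.
AFTER_DECODE = {"lw": 2, "sw": 2, "R-type": 6, "branch": 8, "jump": 9}


def next_state(state, instr):
    """Next FSM state for the instruction currently being executed."""
    if state == 0:   # Fetch -> Decode
        return 1
    if state == 1:   # Decode -> first execution state
        return AFTER_DECODE[instr]
    if state == 2:   # Exec Load/Store -> Memory Load or Memory Store
        return 3 if instr == "lw" else 5
    if state == 3:   # Memory Load -> Write Back
        return 4
    if state == 6:   # Exec R-Type -> Memory R-Type
        return 7
    return 0         # states 4, 5, 7, 8, 9 complete the instruction


def states_visited(instr):
    """List the states of one loop through the graph, starting at Fetch."""
    state, visited = 0, []
    while True:
        visited.append(state)
        state = next_state(state, instr)
        if state == 0:
            return visited


for instr in AFTER_DECODE:
    print(instr, states_visited(instr))
```

Each list is one loop of the graph, and its length is the cycle count of that instruction type.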

FSM corresponds to RTL summary table

[Figure: the RTL summary table with each entry labeled by the number of the FSM state that performs it]

FSM Implementation

State Encoding: represent each state with a unique number.
  N states => ceil(log2(N)) bits required
Next-state logic & output logic are combinational circuits

[Figure: Next-state Logic (inputs from Datapath and from the State Register) -> State Register (log2(# states) bits) -> Output Logic -> to Datapath]
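The width of the state register follows directly from the number of states. A small Python sketch (the function name is hypothetical) computes it with integer arithmetic, which avoids floating-point rounding in log2:

```python
def state_bits(num_states):
    """Smallest number of bits that gives every state a unique code."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2.
    return max(1, (num_states - 1).bit_length())


print(state_bits(3))   # the three-state sequencing example
print(state_bits(10))  # the ten states of the multi-cycle controller
```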

Microprogramming

It's not always possible or desirable to hardcode the next-state logic & output logic
  Example: Suppose you want a programmable instruction set, i.e. the possibility to define new instructions in a computer

Solution: Microprogramming

[Figure: Next-address Logic (input from Datapath) -> Microprogram Address Register -> Microprogram Memory (output to Datapath). The Microprogram Memory is a writable MEMORY, not hardwired gates like an FSM.]

Thus, a MIPS program is made with instructions
  Each instruction is made with micro-instructions
  Some computers make each micro-instruction with nano-instructions
  ...

After all, How many Cycles Per Instruction (CPI)?

The average program contains 25% load, 50% arithmetic, 10% store, 15% branches
Load: 5 cycles, Store: 4 cycles, Arith: 4 cycles, Branch: 3 cycles
Therefore, CPI, cycles per instruction:
CPI = 0.25*5 + 0.5*4 + 0.1*4 + 0.15*3 = 4.1 cycles/instruction
You can predict the cycle-true behavior of a program by looking at the sequence of instructions
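The CPI figure is a weighted average, which a short Python sketch makes explicit. The dictionary names are hypothetical; the mix and the cycle counts are the ones given above.

```python
# Instruction mix of the average program (fractions sum to 1).
MIX = {"load": 0.25, "arith": 0.50, "store": 0.10, "branch": 0.15}

# Cycles needed by each instruction class on the multi-cycle processor.
CYCLES = {"load": 5, "arith": 4, "store": 4, "branch": 3}

# CPI is the average cycle count, weighted by how often each class occurs.
cpi = sum(MIX[kind] * CYCLES[kind] for kind in MIX)
print(round(cpi, 2), "cycles/instruction")
```

Changing the mix changes the CPI, so two programs with the same instruction count can take a different number of cycles.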

Recap: The multi-cycle MIPS datapath

Single-cycle datapath
  ALU with Sign-extend
  Instruction Memory
  Data Memory
  Register File
  Branch Address adder
  Next-PC adder
  + Program Counter Reg

Insert Registers after each logic block
Merge logic & registers used in exclusive clock cycles

Multi-cycle datapath
  ALU with Sign-extend
  Memory
  Register File
  + Program Counter Reg
  + A and B Reg
  + ALUOut Reg
  + Memory Data Reg
  + Instruction Reg

Multi-cycle datapath


Multi-cycle Implementation of MIPS instructions

MIPS Instructions
  R-type instructions
  lw/sw instructions
  conditional branch
are implemented using Register Transfers
  = operations for which the source and the destination are defined by one of the following hardware registers: PC, IR, MDR, A, B, AluOut
  An RT takes 1 clock cycle
  Several RTs can execute in parallel

3-cycle instruction: beq
4-cycle instruction: R-type, sw
5-cycle instruction: lw

Example: add $s0, $t0, $t1

Cycle   | RTL                                     | What happens
Cycle 1 | IR <= Memory[PC]                        | IR <= 'add $s0, $t0, $t1'
        | PC <= PC + 4                            | PC <= PC + 4
Cycle 2 | A <= Reg[IR[25:21]]                     | A <= $t0
        | B <= Reg[IR[20:16]]                     | B <= $t1
        | AluOut <= PC + (SignExt(IR[15:0]) << 2) | AluOut <= garbage
Cycle 3 | AluOut <= A op B                        | AluOut <= $t0 '+' $t1
Cycle 4 | Reg[IR[15:11]] <= AluOut                | $s0 <= AluOut

[Figures: the datapath annotated cycle by cycle with the control bits for this instruction]
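The four cycles of add $s0, $t0, $t1 can be traced in a small Python sketch that applies the register transfers one cycle at a time. The function names, the dictionary used as memory, and the start address are assumptions made for illustration; the register numbers ($t0 = 8, $t1 = 9, $s0 = 16) and the funct code 0x20 for add are the standard MIPS encodings.

```python
T0, T1, S0 = 8, 9, 16  # MIPS register numbers of $t0, $t1, $s0


def encode_r_type(rs, rt, rd, funct):
    """Build an R-type instruction word (opcode 0, shamt 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def execute_add(reg, memory, pc):
    """Run one add instruction through the four multi-cycle steps."""
    # Cycle 1: IR <= Memory[PC]; PC <= PC + 4 (both use the old PC)
    ir = memory[pc]
    pc = pc + 4
    # Cycle 2: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]
    # (the branch target also lands in AluOut here, but add never uses it)
    a = reg[(ir >> 21) & 0x1F]
    b = reg[(ir >> 16) & 0x1F]
    # Cycle 3: AluOut <= A op B, with op = '+' for add
    alu_out = (a + b) & 0xFFFFFFFF
    # Cycle 4: Reg[IR[15:11]] <= AluOut
    reg[(ir >> 11) & 0x1F] = alu_out
    return pc


reg = [0] * 32
reg[T0], reg[T1] = 5, 7
memory = {0x1000: encode_r_type(T0, T1, S0, 0x20)}  # add $s0, $t0, $t1
new_pc = execute_add(reg, memory, 0x1000)
print(hex(new_pc), reg[S0])
```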

Multi-cycle control

Multi-cycle control boils down to generating sequences of control bits for the datapath.
To execute add $s0, $t0, $t1, generate the following bits (one row per control signal):

Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4
0       | X       | X       | X
1       | 0       | 0       | 0
0       | 0       | 0       | 0
1       | 0       | 0       | 0
X       | X       | X       | 1
0       | 0       | 0       | 1
0       | 0       | 1       | X
X       | X       | X       | 0
01      | 11      | 00      | XX
00      | 00      | 10      | XX

The sequence only depends on the value of the opcode field (IR[31:26]).

[Figure: Multi-cycle controller with inputs clock and opcode field, and output control-bits for the datapath]
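Viewed from the controller, the table above is a lookup: each clock cycle selects one column and drives it onto the datapath. The Python sketch below stores the rows exactly as listed; the rows are left unnamed because the signal names appear only on the datapath figure, and the names ROWS and control_word are hypothetical.

```python
# One tuple per control signal, one entry per cycle; 'X' means don't care.
ROWS = [
    ("0", "X", "X", "X"),
    ("1", "0", "0", "0"),
    ("0", "0", "0", "0"),
    ("1", "0", "0", "0"),
    ("X", "X", "X", "1"),
    ("0", "0", "0", "1"),
    ("0", "0", "1", "X"),
    ("X", "X", "X", "0"),
    ("01", "11", "00", "XX"),
    ("00", "00", "10", "XX"),
]


def control_word(cycle):
    """Return the control bits driven in the given cycle (1 to 4) of add."""
    if not 1 <= cycle <= 4:
        raise ValueError("add takes exactly 4 cycles")
    return [row[cycle - 1] for row in ROWS]


for cycle in range(1, 5):
    print("Cycle", cycle, " ".join(control_word(cycle)))
```

A full controller would hold one such table per opcode and pick the table from IR[31:26].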

Summary for the multi-cycle processors

Single cycle processor has long, variable critical path
Split critical path by introducing registers
Merge logic when similar functions are used in separate cycles
Describe instruction execution with Register Transfers
Capture control in a Finite State Machine
Performance Measure is CPI = Cycles per Instruction
