14 Multi-Cycle MIPS

ECE 2500 Computer Organization and Architecture Spring 2012
Multi-cycle MIPS Hardware
The Single Cycle Computer Review
Single Cycle Computer

To recap, let's follow closely how the following instruction is executed on this architecture:

lw $s0, 12($t3)

The instruction is part of this program fragment:

0x1000 add $t3, $t2, $t3
0x1004 lw $s0, 12($t3)
0x1008 sub $t5, $t2, $t1

Within a single clock cycle, the lw at 0x1004 passes through these delays in sequence:

1. memory access time: the instruction at 0x1004 is fetched from instruction memory
2. register access time: $t3 is read from the register file
3. ALU add time: the ALU computes the address $t3 + 12
4. memory read time (+ mux): data memory returns mem($t3 + 12)
5. the time left in the cycle must be bigger than the register file write setup time, so that mem($t3 + 12) can be written into $s0
Single-Cycle implies a long Critical Path
Critical Path varies with instruction type

add $t3, $t4, $t5
bne $t3, $t4, 25
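
The effect of the instruction type on the critical path can be sketched by summing the delays each instruction passes through. The delay values below are made-up numbers chosen only for illustration (the slides give no timing figures), and the component names are hypothetical labels for the delays listed in the recap.

```python
# Hypothetical component delays in picoseconds (illustrative assumptions,
# not values from the lecture).
DELAY_PS = {
    "instr_mem": 200,   # memory access time (instruction fetch)
    "reg_read": 100,    # register access time
    "alu": 150,         # ALU add/compare time
    "data_mem": 200,    # data memory read time (+ mux)
    "reg_write": 50,    # register file write setup time
}

# Components on the critical path of each instruction type.
PATHS = {
    "lw":  ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "add": ["instr_mem", "reg_read", "alu", "reg_write"],
    "bne": ["instr_mem", "reg_read", "alu"],
}

def critical_path_ps(instr):
    """Sum the delays of the components an instruction passes through."""
    return sum(DELAY_PS[c] for c in PATHS[instr])

def single_cycle_period_ps():
    """The single clock period must fit the slowest instruction."""
    return max(critical_path_ps(i) for i in PATHS)

for name in PATHS:
    print(name, critical_path_ps(name), "ps")
print("clock period:", single_cycle_period_ps(), "ps")
```

With these assumed numbers the clock period is set by lw, so add and bne leave part of every cycle unused.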
The multi-cycle processor

The single cycle processor
  has a very simple control scheme
  BUT has a very long critical path
  The critical path varies with the instruction type
  Results in inefficient use of the clock cycle

Therefore, we will chop the instruction cycle
  Multiple shorter cycles per instruction
  Vary the number of shorter cycles with the instruction type

Key points to figure out
  How to 'chop' logic in cycles
  How to modify the single-cycle computer architecture
Splitting a combinational computation

[Figure: single cycle: Register -> Logic -> Logic -> Register. Multi-cycle: Register -> Logic (Cycle 1) -> Register -> Logic (Cycle 2) -> Register]
Making the transformation to multi-cycle

Add a register at the output of each logic block
Result is 6 small operations

1. Fetch Instruction
2. Increment PC
3. Calculate Branch
4. Read Register Operands
5. ALU Operation
6. Fetch Data operand
Merging Logic in multicycle implementations

Distributing logic over multiple cycles enables reuse!

[Figure: operation 1 and operation 2, each with its own ALU, between reg1 and reg2; after the multi-cycle conversion, reg3 separates operation 1 from operation 2]

Two identical ALUs used in different clock cycles can be merged into one ALU

[Figure: after merging, a single ALU performs operation 1 in cycle 1 and operation 2 in cycle 2, using reg1 and a shared reg2/3]
What Logic can we merge in the S-C computer?
Merge additions and ALU
Merge memory access

Logic merging results in multi-cycle datapath

[Figure: multi-cycle datapath after merging, with registers between the logic blocks]
Multi-cycle datapath (with multiplexers)

What is the difference between the green and the blue multiplexers?

Look at the single cycle computer if you're not sure ...

Green multiplexers accommodate multiple types of MIPS instructions
  Result data is from data memory (lw) or from ALU (arithmetic op)
  I-type and R-type have different destination reg field

Blue multiplexers support logic reuse during multi-cycle instructions
  ALU does next-PC calculation and arithmetic
  Memory serves as instruction-memory and data-memory
Multi-cycle controller design

Each instruction can now be mapped to several clock cycles in the multi-cycle controller design
A MIPS instruction will be split into up to 5 execution steps

1. Instruction Fetch
2. Instruction Decode and Register Fetch
3. Execution
4. Memory Access
5. Memory Read Completion

In the following, we will map the MIPS instructions to the above 5 execution steps
Register Transfers and Register Names

We will describe the execution in terms of so-called register-transfers between the registers with names as shown below

[Figure: multi-cycle datapath with the registers IR, PC, AluOut and MDR labeled]
Cycle 1: Instruction Fetch

PC <= PC + 4
IR <= Memory[PC]
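
Both register transfers of the fetch cycle take effect on the same clock edge, so IR receives the instruction at the old PC, not at PC + 4. A minimal Python sketch of that behaviour follows; the dictionary-based `state` and `memory` and the function name `fetch` are illustrative assumptions, not part of the lecture.

```python
def fetch(state, memory):
    """Cycle 1: IR <= Memory[PC]; PC <= PC + 4 (both use the old PC)."""
    old_pc = state["PC"]          # right-hand sides are read before any write
    return {**state, "IR": memory[old_pc], "PC": old_pc + 4}

# Hypothetical memory image: the lw from the recap placed at 0x1004.
memory = {0x1004: 0x8D70000C}     # MIPS encoding of lw $s0, 12($t3)
state = fetch({"PC": 0x1004, "IR": 0}, memory)
print(hex(state["IR"]), hex(state["PC"]))
```

Reading every right-hand side before writing any register is what makes the order in which the two transfers are listed irrelevant.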
Cycle 2: Instruction Decode and Reg Fetch

A <= Reg[IR[25:21]]
B <= Reg[IR[20:16]]
AluOut <= PC + (SignExt(IR[15:0]) << 2)

Optimistic: these results may or may not be needed by the instruction
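
The decode cycle is mostly bit-field extraction. The sketch below shows one way to model it in Python; the helper names `bits`, `sign_ext16` and `decode`, the dictionary `state`, and the register values are assumptions made for illustration.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_ext16(value):
    """Sign-extend a 16-bit field to a Python int."""
    return value - 0x10000 if value & 0x8000 else value

def decode(state, reg):
    """Cycle 2: read A and B and compute the optimistic branch target."""
    ir, pc = state["IR"], state["PC"]
    return {
        **state,
        "A": reg[bits(ir, 25, 21)],
        "B": reg[bits(ir, 20, 16)],
        # Kept to 32 bits, as a hardware adder would.
        "AluOut": (pc + (sign_ext16(bits(ir, 15, 0)) << 2)) & 0xFFFFFFFF,
    }

reg = [0] * 32
reg[11] = 0x2000      # $t3 (hypothetical value)
reg[16] = 7           # $s0 (hypothetical value)
# IR holds lw $s0, 12($t3); PC was already incremented in cycle 1.
state = decode({"IR": 0x8D70000C, "PC": 0x1008}, reg)
print(hex(state["A"]), state["B"], hex(state["AluOut"]))
```

For this lw the value placed in AluOut is never used, which is the sense in which the computation is optimistic.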
Cycle 3: Execution (for Branch)

if (A == B) PC <= AluOut
This completes the Branch Instruction
Cycle 3: Execution (for R-type)

AluOut <= A op B
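Cycle 4: Memory Access (for R-type)

Reg[IR[15:11]] <= AluOut
R-type instructions do not access memory; they use this step to write the ALU result to the register file
This completes the R-type instruction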
Cycle 3: Execution (for lw/sw)

AluOut <= A + SignExtend(IR[15:0])
Cycle 4: memory Access (for lw/sw)

lw instruction - MDR <= Memory[AluOut]
sw instruction - Memory[AluOut] <= B
This completes the sw instruction
Cycle 5: memory completion

Reg[IR[20:16]] <= MDR
This completes the lw instruction
RTL Summary

Instructions take 3, 4, or 5 cycles
This table can be used to design the controller

Instruction Fetch (all instructions)
  IR <= Memory[PC]; PC <= PC + 4
Instruction Decode (all instructions)
  A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; AluOut <= PC + (SignExt(IR[15:0]) << 2)
Execution
  R-type:  AluOut <= A op B
  lw, sw:  AluOut <= A + SignExt(IR[15:0])
  branch:  if (A==B) PC <= AluOut
Memory Access
  R-type:  Reg[IR[15:11]] <= AluOut
  lw:      MDR <= Mem[AluOut]
  sw:      Mem[AluOut] <= B
Memory Completion
  lw:      Reg[IR[20:16]] <= MDR
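
The summary can also be written down as a small lookup table, which makes the 3, 4, or 5 cycle counts fall out directly. This is only a sketch: the Python names `RTL`, `FETCH`, `DECODE` and `cycles` are illustrative, while the register transfers themselves are copied from the summary.

```python
FETCH = "IR <= Memory[PC]; PC <= PC + 4"
DECODE = ("A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; "
          "AluOut <= PC + (SignExt(IR[15:0]) << 2)")

# One list entry per clock cycle, in execution order.
RTL = {
    "R-type": [FETCH, DECODE,
               "AluOut <= A op B",
               "Reg[IR[15:11]] <= AluOut"],
    "lw":     [FETCH, DECODE,
               "AluOut <= A + SignExt(IR[15:0])",
               "MDR <= Mem[AluOut]",
               "Reg[IR[20:16]] <= MDR"],
    "sw":     [FETCH, DECODE,
               "AluOut <= A + SignExt(IR[15:0])",
               "Mem[AluOut] <= B"],
    "branch": [FETCH, DECODE,
               "if (A==B) PC <= AluOut"],
}

def cycles(instr_class):
    """Number of clock cycles an instruction class needs."""
    return len(RTL[instr_class])

for name, steps in RTL.items():
    print(name, cycles(name), "cycles")
    for number, transfer in enumerate(steps, start=1):
        print("  cycle", number, ":", transfer)
```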
Multi-Cycle Datapath (with control signals)
How to generate these control signals ?

Example: let's find the control signals for

AluOut <= A + SignExt(IR[15:0])

This register transfer is the Execution step of lw and sw in the RTL summary table.
Control signals for Register-Transfers

[Figure: datapath annotated with the control signal values X 0 0 0 X 0 1 and the ALU operation add]
Finite State Machines

Generate a (possibly conditional) sequence of control signals

A Finite State Machine (FSM) is an abstract representation of a control sequence
  An FSM models a machine that can be in several different states
  State transitions bring the machine from one state into the other
  A single state is designated as an initial state
  An FSM is captured by a graph, with nodes representing states and edges representing state transitions
Finite State Machine: Sequencing

This machine has three states (s0, s1, s2) and starts out in s0
When in s0, it will always transition into s1, and next into s2

[Figure: state graph S0 -> S1 -> S2 -> S0]

In our discussion, state transitions are timed.
Each clock cycle, the FSM will make a single state transition

S0: cycle 0, cycle 3, ...
S1: cycle 1, cycle 4, ...
S2: cycle 2, cycle 5, ...
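
The timed sequencing can be reproduced with a few lines of Python. This is a minimal sketch: the names `NEXT_STATE` and `run` are illustrative, and the transition from S2 back to S0 is inferred from the cycle labels (S0 is occupied again in cycle 3).

```python
# Unconditional transitions of the three-state machine.
NEXT_STATE = {"S0": "S1", "S1": "S2", "S2": "S0"}

def run(cycles, start="S0"):
    """Return the state occupied in each clock cycle, starting at cycle 0."""
    trace, state = [], start
    for _ in range(cycles):
        trace.append(state)
        state = NEXT_STATE[state]   # exactly one transition per clock cycle
    return trace

for cycle, state in enumerate(run(6)):
    print("cycle", cycle, state)
```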

State transitions can be conditional when decision-making is needed

S0 -> S1 when a==0
S0 -> S2 when a==1

The controller is modeled with an FSM
The datapath generates the state transition condition a
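
A conditional transition simply makes the next state a function of both the current state and the condition supplied by the datapath. In the sketch below, the function name `next_state` is illustrative, and the return from S1 and S2 to S0 is an assumption, since the slide only shows the transitions leaving S0.

```python
def next_state(state, a):
    """Next-state function; a is the condition generated by the datapath."""
    if state == "S0":
        return "S1" if a == 0 else "S2"
    # Assumption: S1 and S2 return to S0 (not shown on the slide).
    return "S0"

state = "S0"
for a in [0, 0, 1, 1]:          # hypothetical condition values, one per cycle
    print(state, "a =", a)
    state = next_state(state, a)
```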
FSM for the multi-cycle datapath
Each loop in this graph represents the execution of a single instruction
  5 loops for 5 instruction types: lw, sw, R-type, conditional branch, jump
  We did not discuss jump
Each state shows the value of the control signals
Each state transition shows the condition that triggers it
  If nothing is shown, the transition is taken unconditionally at the start of the next clock cycle.
Multicycle Datapath Finite State Machine

Start -> state 0

0  Fetch
1  Decode
2  Exec Load/Store
3  Memory Load
4  Write Back
5  Memory Store
6  Exec R-Type
7  Memory R-Type
8  Exec branch
9  Exec jump
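
The state graph can be sketched as a next-state function. The state numbers and names are taken from the slide; the transitions are an assumption inferred from the RTL summary (each instruction class follows its own sequence of steps and then returns to state 0), and the names `STATE_NAMES`, `next_state` and `trace` are illustrative.

```python
STATE_NAMES = {
    0: "Fetch", 1: "Decode", 2: "Exec Load/Store", 3: "Memory Load",
    4: "Write Back", 5: "Memory Store", 6: "Exec R-Type",
    7: "Memory R-Type", 8: "Exec branch", 9: "Exec jump",
}

def next_state(state, instr_class):
    """Assumed transitions, inferred from the RTL summary."""
    if state == 0:
        return 1
    if state == 1:              # decode dispatches on the instruction class
        return {"lw": 2, "sw": 2, "R-type": 6, "branch": 8, "jump": 9}[instr_class]
    if state == 2:
        return 3 if instr_class == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0                    # states 4, 5, 7, 8, 9 complete the instruction

def trace(instr_class):
    """States visited while executing one instruction (one loop of the graph)."""
    states = [0]
    while True:
        following = next_state(states[-1], instr_class)
        if following == 0:
            return states
        states.append(following)

for name in ["lw", "sw", "R-type", "branch", "jump"]:
    print(name, [STATE_NAMES[s] for s in trace(name)])
```

The length of each trace matches the cycle counts in the RTL summary: 5 for lw, 4 for sw and R-type, 3 for branch.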
FSM corresponds to RTL summary table

Each row of the RTL summary table is one level of the FSM graph:

Cycle 1, Instruction Fetch: state 0 (Fetch)
Cycle 2, Instruction Decode: state 1 (Decode)
Cycle 3, Execution: states 2 (Exec Load/Store), 6 (Exec R-Type), 8 (Exec branch), 9 (Exec jump)
Cycle 4, Memory Access: states 3 (Memory Load), 5 (Memory Store), 7 (Memory R-Type)
Cycle 5, Memory Completion: state 4 (Write Back)
FSM Implementation
State Encoding: represent each state with a unique number.
  N states => log2(N) bits required, rounded up to a whole number of bits
Next-state logic & output logic are combinational circuits

[Figure: from Datapath -> Next-state Logic -> State Register (log2(# states) bits) -> Output Logic -> to Datapath]
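
As a quick check of the encoding rule, the sketch below computes the width of the state register for a plain binary encoding and prints the code of each state. The function names `state_bits` and `encode` are illustrative; other encodings (one-hot, for instance) would need a different width.

```python
def state_bits(num_states):
    """Minimum number of state-register bits for a binary encoding."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 2.
    return max(1, (num_states - 1).bit_length())

def encode(state, num_states):
    """Binary code of a state number, padded to the register width."""
    return format(state, "0{}b".format(state_bits(num_states)))

print(state_bits(10))                       # ten states, as in the multi-cycle FSM
print([encode(s, 10) for s in range(10)])
```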
Microprogramming
It's not always possible or desirable to hardcode the next-state logic & output logic
  Example: Suppose you want a programmable instruction set, i.e. the possibility to define new instructions in a computer
Solution: Microprogramming

[Figure: from Datapath -> Next-address Logic -> Microprogram Address Register -> Microprogram Memory -> to Datapath. The Microprogram Memory is a writable MEMORY (not hardwired gates like an FSM)]

Thus, a MIPS program is made with instructions
  Each instruction is made with micro-instructions
    Some computers make each micro-instruction with nano-instructions
      ...
After all, How many Cycles Per Instruction (CPI)?

The average program contains 25% load, 50% arithmetic, 10% store, 15% branches
Load: 5 cycles, Store: 4 cycles, Arith: 4 cycles, Branch: 3 cycles
Therefore, CPI, cycles per instruction:
CPI = 0.25*5 + 0.5*4 + 0.1*4 + 0.15*3 = 4.1 cycles/instruction

You can predict the cycle-true behavior of a program by looking at the sequence of instructions
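
The CPI figure is a weighted average, and the same per-class cycle counts give the exact cycle count of a known instruction sequence. A short sketch, with illustrative names `MIX`, `CYCLES`, `cpi` and `program_cycles`:

```python
# Instruction mix and cycle counts from the slide.
MIX = {"load": 0.25, "arith": 0.50, "store": 0.10, "branch": 0.15}
CYCLES = {"load": 5, "arith": 4, "store": 4, "branch": 3}

def cpi(mix, cycles):
    """Average cycles per instruction for a given instruction mix."""
    return sum(mix[kind] * cycles[kind] for kind in mix)

def program_cycles(sequence):
    """Exact cycle count of a concrete sequence of instruction classes."""
    return sum(CYCLES[kind] for kind in sequence)

print(round(cpi(MIX, CYCLES), 2))
# The add, lw, sub fragment from the recap: arith, load, arith.
print(program_cycles(["arith", "load", "arith"]))
```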
Recap: The multi-cycle MIPS datapath

Single-cycle datapath
  ALU with Sign-extend
  Instruction Memory
  Data Memory
  Register File
  Branch Address adder
  Next-PC adder
  + Program Counter Reg

Insert Registers after each logic block
Merge logic & registers used in exclusive clock cycles

Multi-cycle datapath
  ALU with Sign-extend
  Memory
  Register File
  + Program Counter Reg
  + A and B Reg
  + ALUOut Reg
  + Memory Data Reg
  + Instruction Reg
Multi-cycle Implementation of MIPS instructions

MIPS Instructions
  R-type instructions
  lw/sw instructions
  conditional branch
are implemented using Register Transfers
  = operations for which the source and the destination are defined by one of the following hardware registers: PC, IR, MDR, A, B, AluOut
  An RT takes 1 clock cycle
  Several RTs can execute in parallel

3-cycle instruction: beq
4-cycle instruction: R-type, sw
5-cycle instruction: lw
Example: add $s0, $t0, $t1

         RTL                                        What happens
Cycle 1  IR <= Memory[PC]                           IR <= 'add $s0, $t0, $t1'
         PC <= PC + 4                               PC <= PC + 4
Cycle 2  A <= Reg[IR[25:21]]                        A <= $t0
         B <= Reg[IR[20:16]]                        B <= $t1
         AluOut <= PC + (SignExt(IR[15:0]) << 2)    AluOut <= garbage
Cycle 3  AluOut <= A op B                           AluOut <= $t0 '+' $t1
Cycle 4  Reg[IR[15:11]] <= AluOut                   $s0 <= AluOut
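
The four cycles of the add example can be traced in a small Python model. This is a sketch under stated assumptions: the function names `bits` and `run_add`, the memory dictionary and the register values are made up for illustration, and only the add operation is modelled.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def run_add(pc, memory, reg):
    """Execute one R-type add in four cycles; returns (new PC, new registers)."""
    # Cycle 1: IR <= Memory[PC]; PC <= PC + 4
    ir, pc = memory[pc], pc + 4
    # Cycle 2: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]
    # (the branch target also computed in this cycle is garbage for add
    #  and is left out of the model)
    a, b = reg[bits(ir, 25, 21)], reg[bits(ir, 20, 16)]
    # Cycle 3: AluOut <= A op B, with op fixed to a 32-bit add here
    alu_out = (a + b) & 0xFFFFFFFF
    # Cycle 4: Reg[IR[15:11]] <= AluOut
    new_reg = list(reg)
    new_reg[bits(ir, 15, 11)] = alu_out
    return pc, new_reg

reg = [0] * 32
reg[8], reg[9] = 5, 7                     # $t0, $t1 (hypothetical values)
memory = {0x1000: 0x01098020}             # MIPS encoding of add $s0, $t0, $t1
pc, result = run_add(0x1000, memory, reg)
print(hex(pc), result[16])                # $s0 is register 16
```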

Cycle 1: IR <= Memory[PC]; PC <= PC + 4
Cycle 2: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; AluOut <= PC + (SignExt(IR[15:0]) << 2)
Cycle 3: AluOut <= A op B
Cycle 4: Reg[IR[15:11]] <= AluOut

[Figure: datapath annotated, cycle by cycle, with the control bit values for each of these register transfers; the values are collected under "Multi-cycle control"]
Multi-cycle control
Multi-cycle control boils down to generating sequences of control bits for the datapath.
To execute add $s0, $t0, $t1, generate the following bits (one row per control signal)

Cycle 1  Cycle 2  Cycle 3  Cycle 4
0        X        X        X
1        0        0        0
0        0        0        0
1        0        0        0
X        X        X        1
0        0        0        1
0        0        1        X
X        X        X        0
01       11       00       XX
00       00       10       XX

The sequence only depends on the value of the opcode field (IR[31:26]).

[Figure: Multi-cycle controller block with inputs clock and opcode field, and output control-bits for the datapath]
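
The bit sequence above can be stored as one control word per cycle. The slide lists the bits without signal names; the names in the sketch below follow the usual textbook multi-cycle MIPS datapath (IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, MemtoReg, ALUSrcB, ALUOp) and are an assumption, as are the Python names `SIGNALS`, `ADD_SEQUENCE` and `control_word`.

```python
# Assumed signal names, in the row order of the bit table.
SIGNALS = ["IorD", "MemRead", "MemWrite", "IRWrite", "RegDst",
           "RegWrite", "ALUSrcA", "MemtoReg", "ALUSrcB", "ALUOp"]

# One control word per cycle for add $s0, $t0, $t1 ("X" = don't care).
ADD_SEQUENCE = [
    ["0", "1", "0", "1", "X", "0", "0", "X", "01", "00"],  # cycle 1: fetch
    ["X", "0", "0", "0", "X", "0", "0", "X", "11", "00"],  # cycle 2: decode
    ["X", "0", "0", "0", "X", "0", "1", "X", "00", "10"],  # cycle 3: execute
    ["X", "0", "0", "0", "1", "1", "X", "0", "XX", "XX"],  # cycle 4: write result
]

def control_word(cycle):
    """Control bits for a given cycle (1-based) as a name -> value mapping."""
    return dict(zip(SIGNALS, ADD_SEQUENCE[cycle - 1]))

for cycle in range(1, 5):
    print("cycle", cycle, control_word(cycle))
```

A controller for the full instruction set would hold one such sequence per opcode and select among them in the decode cycle.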
Summary for the multi-cycle processors

Single cycle processor has long, variable critical path
Split critical path by introducing registers
Merge logic when similar functions are used in separate cycles
Describe instruction execution with Register Transfers
Capture control in a Finite State Machine
Performance Measure is CPI = Cycles per Instruction
