To recap, let's follow closely how the following instruction is executed on this architecture:
lw $s0, 12($t3)
[Figure: execution of lw $s0, 12($t3), located at address 0x1004 right after an add at 0x1000. Within one clock cycle the delays add up: register access time (read $t3), ALU add time (adr1 = $t3 + 12), and memory read time (+ mux) to obtain the loaded word x. The time left before the clock edge must be bigger than the register file write setup time.]
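The delays in this walk-through all add up inside a single clock cycle. A minimal Python sketch of that sum follows; the delay values are illustrative only, and counting the instruction fetch as part of the same path is an assumption of the sketch, not something shown in the figure.

```python
# Hypothetical delays in picoseconds; none of these values come from the slides.
DELAYS_PS = {
    "instruction_fetch": 200,          # read the lw instruction from memory
    "register_access": 100,            # read $t3
    "alu_add": 150,                    # adr1 = $t3 + 12
    "memory_read_and_mux": 220,        # read x from adr1, through the result mux
    "register_file_write_setup": 30,   # x must be stable this long before the edge
}

# Blocks traversed, in order, by lw $s0, 12($t3) within one clock cycle.
LW_PATH = [
    "instruction_fetch",
    "register_access",
    "alu_add",
    "memory_read_and_mux",
    "register_file_write_setup",
]


def min_clock_period_ps(path, delays):
    """In a single-cycle design the clock period must cover the whole path."""
    return sum(delays[block] for block in path)


print(min_clock_period_ps(LW_PATH, DELAYS_PS))  # 700
```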
The single-cycle processor has a very simple control scheme BUT has a very long critical path, and the critical path varies with the instruction type.
This results in inefficient use of the clock cycle: the clock period must fit the slowest instruction, so faster instructions waste part of every cycle.
Therefore:
Use multiple shorter cycles per instruction.
Vary the number of shorter cycles with the instruction type.
Key questions:
How to 'chop' the logic into cycles
How to modify the single-cycle computer architecture
[Figure: single cycle: Register → Logic → Register. Multi-cycle: the logic is split by an extra register, Register → Logic (Cycle 1) → Register → Logic (Cycle 2) → Register.]
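The effect of chopping the logic can be put in numbers. A minimal Python sketch, assuming five hypothetical logic-block delays (not taken from the slides) and, by default, zero register overhead:

```python
# Hypothetical delays (ps) of five logic blocks that one instruction may pass through.
BLOCK_DELAYS_PS = [200, 100, 150, 220, 100]


def single_cycle_period(delays):
    """One register-to-register path through all blocks: the delays add up."""
    return sum(delays)


def multi_cycle_period(delays, register_overhead=0):
    """A register after each block: the clock only has to cover the slowest block."""
    return max(delays) + register_overhead


def instruction_time(delays, cycles_used, register_overhead=0):
    """An instruction that needs `cycles_used` short cycles takes that many periods."""
    return cycles_used * multi_cycle_period(delays, register_overhead)


print(single_cycle_period(BLOCK_DELAYS_PS))   # 770: every instruction pays this
print(instruction_time(BLOCK_DELAYS_PS, 3))   # 660: a 3-cycle instruction
print(instruction_time(BLOCK_DELAYS_PS, 5))   # 1100: a 5-cycle instruction
```

With these numbers a 5-cycle instruction is slower than in the single-cycle design while a 3-cycle one is faster; whether the multi-cycle design wins overall depends on the instruction mix and on how evenly the logic is split.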
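[Figure: two ALUs in series, the first fed by reg1 and reg2. After the multi-cycle conversion, register reg3 separates them: operation 1 runs in one clock cycle and operation 2 (on reg3 and reg2) in the next.]
Two identical ALUs used in different clock cycles can be merged into one ALU.
[Figure: after merging, a single ALU performs operation 1 and then operation 2, with inputs drawn from reg1 and reg2/3.]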
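A minimal Python sketch of the merge, assuming (from the figure labels) that operation 2 combines reg3 with reg2 and that a 2:1 multiplexer selects the ALU's first input; the operation names add/sub/and/or are hypothetical examples. The merged version gives the same result as the two chained ALUs.

```python
def alu(op, a, b):
    """A single ALU; `op` plays the role of its function-select input."""
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    raise ValueError(f"unsupported ALU operation: {op}")


def two_alus(reg1, reg2, op1, op2):
    """Before merging: two ALUs chained inside one long cycle."""
    return alu(op2, alu(op1, reg1, reg2), reg2)


def shared_alu(reg1, reg2, op1, op2):
    """After merging: one ALU used in two cycles; reg3 holds the intermediate result."""
    reg3 = 0
    # One entry per clock cycle: (ALU function, mux select for the first ALU input).
    schedule = [(op1, "reg1"), (op2, "reg3")]
    for op, select in schedule:
        a = reg1 if select == "reg1" else reg3  # 2:1 mux in front of the shared ALU
        reg3 = alu(op, a, reg2)                 # result clocked into reg3
    return reg3


print(shared_alu(12, 10, "and", "or"), two_alus(12, 10, "and", "or"))
```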
[Figures: registers inserted into the datapath]
Result data comes either from data memory (lw) or from the ALU (arithmetic ops).
The ALU does both the next-PC calculation and the arithmetic, and a single memory serves as both instruction memory and data memory.
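Sharing one ALU and one memory means the datapath needs multiplexers in front of them, and the result path needs one as well. A minimal Python sketch; the function and select-signal names are hypothetical (not the slides' control-signal names), and the register values ($t3 = 0x2000, loaded word 42) are made up.

```python
def memory_address(pc, alu_out, data_access):
    """Shared memory: instruction fetch addresses it with PC, lw/sw with AluOut."""
    return alu_out if data_access else pc


def alu_operand_a(pc, reg_a, use_register):
    """Shared ALU: PC arithmetic uses PC, instruction arithmetic uses register A."""
    return reg_a if use_register else pc


def result_data(mdr, alu_out, from_memory):
    """Write-back value: from data memory (lw) or from the ALU (arithmetic ops)."""
    return mdr if from_memory else alu_out


# Instruction fetch of the lw at 0x1004: the memory sees PC and the ALU computes PC + 4.
pc, reg_a, alu_out, mdr = 0x1004, 0x2000, 0x200C, 42
fetch_address = memory_address(pc, alu_out, data_access=False)
next_pc = alu_operand_a(pc, reg_a, use_register=False) + 4
# Later in the same lw: the memory sees AluOut and the loaded word comes from MDR.
load_address = memory_address(pc, alu_out, data_access=True)
write_back = result_data(mdr, alu_out, from_memory=True)
print(hex(fetch_address), hex(next_pc), hex(load_address), write_back)
```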
An instruction can now be mapped to several clock cycles in the multi-cycle controller design: a MIPS instruction is split into up to 5 execution steps:
1. Instruction Fetch
2. Instruction Decode and Register Fetch
3. Execution
4. Memory Access
5. Memory Read Completion
In the following, we map the MIPS instructions to these 5 execution steps.
We will describe the execution in terms of so-called register transfers between the registers named below:
IR (Instruction Register), PC (Program Counter), AluOut (ALU output register), MDR (Memory Data Register)
RTL Summary
Instructions take 3, 4, or 5 cycles.
Instruction Fetch (all instructions):
  IR <= Memory[PC]; PC <= PC + 4
Instruction Decode (all instructions):
  A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; AluOut <= PC + (SignExt(IR[15:0]) << 2)
Execution:
  R-type: AluOut <= A op B
  lw, sw: AluOut <= A + SignExt(IR[15:0])
  branch: if (A==B) PC <= AluOut
Memory Access:
  R-type: Reg[IR[15:11]] <= AluOut
  lw: MDR <= Mem[AluOut]
  sw: Mem[AluOut] <= B
Memory Completion:
  lw: Reg[IR[20:16]] <= MDR
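The RTL summary can be executed directly. A minimal Python sketch, assuming a dictionary-based register and memory model, only the add and sub funct codes, and a made-up four-instruction program; it applies the transfers of each step in order rather than modelling individual clock edges. The opcodes (0x00, 0x23, 0x2B, 0x04) and register numbers are the standard MIPS ones.

```python
MASK32 = 0xFFFFFFFF
OP_RTYPE, OP_LW, OP_SW, OP_BEQ = 0x00, 0x23, 0x2B, 0x04
FUNCT = {0x20: lambda a, b: a + b, 0x22: lambda a, b: a - b}  # add, sub only


def bits(word, hi, lo):
    """IR[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_ext(imm16):
    """SignExt of a 16-bit immediate."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def i_type(opcode, rs, rt, imm):
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def write_reg(s, number, value):
    if number != 0:  # $zero stays 0
        s["Reg"][number] = value & MASK32


def step(s):
    """Run one instruction as the register transfers of the RTL summary; return its cycles."""
    # Instruction Fetch
    s["IR"] = s["Mem"][s["PC"]]
    s["PC"] = (s["PC"] + 4) & MASK32
    ir, op = s["IR"], bits(s["IR"], 31, 26)
    # Instruction Decode and Register Fetch (branch target computed for every instruction)
    s["A"], s["B"] = s["Reg"][bits(ir, 25, 21)], s["Reg"][bits(ir, 20, 16)]
    s["AluOut"] = (s["PC"] + (sign_ext(bits(ir, 15, 0)) << 2)) & MASK32
    # Execution
    if op == OP_BEQ:
        if s["A"] == s["B"]:
            s["PC"] = s["AluOut"]
        return 3
    if op == OP_RTYPE:
        s["AluOut"] = FUNCT[bits(ir, 5, 0)](s["A"], s["B"]) & MASK32
        write_reg(s, bits(ir, 15, 11), s["AluOut"])  # Memory Access step: R-type completion
        return 4
    s["AluOut"] = (s["A"] + sign_ext(bits(ir, 15, 0))) & MASK32
    if op == OP_SW:
        s["Mem"][s["AluOut"]] = s["B"]                # Memory Access
        return 4
    if op == OP_LW:
        s["MDR"] = s["Mem"][s["AluOut"]]              # Memory Access
        write_reg(s, bits(ir, 20, 16), s["MDR"])      # Memory Completion
        return 5
    raise ValueError(f"opcode {op:#x} is not modelled in this sketch")


def demo():
    # Hypothetical program and data; $t0=8, $t1=9, $t3=11, $s0=16, $s1=17.
    s = {"PC": 0x1000, "IR": 0, "A": 0, "B": 0, "AluOut": 0, "MDR": 0,
         "Reg": [0] * 32,
         "Mem": {0x1000: r_type(8, 9, 17, 0x20),      # add $s1, $t0, $t1
                 0x1004: i_type(OP_LW, 11, 16, 12),   # lw  $s0, 12($t3)
                 0x1008: i_type(OP_SW, 11, 17, 0),    # sw  $s1, 0($t3)
                 0x100C: i_type(OP_BEQ, 8, 8, -4),    # beq $t0, $t0, back to 0x1000
                 0x200C: 42}}
    s["Reg"][8], s["Reg"][9], s["Reg"][11] = 5, 7, 0x2000
    cycles = [step(s) for _ in range(4)]
    return cycles, s


if __name__ == "__main__":
    cycles, s = demo()
    print(cycles, sum(cycles), hex(s["PC"]))  # [4, 5, 4, 3] 16 0x1000
```

Note that the branch target uses SignExt(IR[15:0]) << 2: the offset is sign-extended first and then shifted, and it is added to the already incremented PC.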
add
signals
A machine has three states (s0, s1, s2) and starts out in s0. When in s0, it will always transition into s1, and next into s2.
[Figure: state diagram S0 → S1 → S2 → S0]
Each state lasts one clock cycle: S0 in cycle 0 and cycle 3, S1 in cycle 1 and cycle 4, S2 in cycle 2 and cycle 5, ...
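The three-state machine can be written as a next-state table plus a state register. A minimal Python sketch; the transition from S2 back to S0 is read off the cycle timeline (S0 again in cycle 3), and the "when a==1" condition of the controller figure is not modelled.

```python
# Next-state table of the three-state machine: s0 -> s1 -> s2, then back to s0.
NEXT_STATE = {"S0": "S1", "S1": "S2", "S2": "S0"}


def simulate(cycles, start="S0"):
    """Return the state held in the state register during each clock cycle."""
    trace, state = [], start
    for _ in range(cycles):
        trace.append(state)
        state = NEXT_STATE[state]  # loaded into the state register at the clock edge
    return trace


print(simulate(6))  # ['S0', 'S1', 'S2', 'S0', 'S1', 'S2']
```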
Controller modeled with FSM
[Figure: FSM with states S0, S1, S2; one transition is labeled "when a==1"]
Each loop in this graph represents the execution of a single instruction: 5 loops for 5 instruction types (lw, sw, R-type, conditional branch, jump).
We did not discuss jump.
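[Figure: controller FSM. Every instruction starts with Fetch (0) and Decode (1), then follows its own loop back to Fetch:
lw: Exec Load/Store (2) → Memory Load (3) → Write Back (4)
sw: Exec Load/Store (2) → Memory Store (5)
R-type: Exec R-Type (6) → Memory R-Type (7)
branch: Exec branch (8)
jump: Exec jump (9)]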
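The graph's next-state logic can be sketched as a function of the current state and the opcode. In this minimal Python sketch the state names follow the graph labels, the opcode strings are simplified placeholders rather than MIPS encodings, and returning to Fetch after Exec jump is an assumption, since jump is not discussed.

```python
# State names follow the graph labels; opcodes are simplified strings, not MIPS encodings.
DECODE_DISPATCH = {"lw": "ExecLoadStore", "sw": "ExecLoadStore",
                   "rtype": "ExecRType", "beq": "ExecBranch", "j": "ExecJump"}
LAST_STATES = {"WriteBack", "MemoryStore", "MemoryRType", "ExecBranch", "ExecJump"}


def next_state(state, opcode):
    """Next-state logic: current state plus the opcode coming from the datapath."""
    if state == "Fetch":
        return "Decode"
    if state == "Decode":
        return DECODE_DISPATCH[opcode]
    if state == "ExecLoadStore":
        return "MemoryLoad" if opcode == "lw" else "MemoryStore"
    if state == "MemoryLoad":
        return "WriteBack"
    if state == "ExecRType":
        return "MemoryRType"
    if state in LAST_STATES:
        return "Fetch"  # the loop closes: fetch the next instruction
    raise ValueError(f"unknown state {state}")


def loop_for(opcode):
    """States visited by one instruction, one per clock cycle."""
    states, state = ["Fetch"], next_state("Fetch", opcode)
    while state != "Fetch":
        states.append(state)
        state = next_state(state, opcode)
    return states


for op in ["lw", "sw", "rtype", "beq"]:
    print(op, len(loop_for(op)), loop_for(op))
```

The loop lengths reproduce the 3, 4, or 5 cycles per instruction of the RTL summary.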
[Figures: the RTL summary table shown again, annotated with numbers linking its cells to the controller FSM]
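FSM Implementation
[Figure: a State Register holds the current state; the Next-state Logic computes the next state from the current state and the inputs from the datapath; the Output Logic produces the signals sent to the datapath.]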
Microprogramming
It's not always possible or desirable to hardcode the next-state logic & output logic.
Example: suppose you want a programmable instruction set, i.e. the possibility to define new instructions in a computer.
Solution: microprogramming
[Figure: Next-address Logic, MicroProgram Address Register, and Microprogram Memory, with inputs from the datapath and outputs to the datapath. The microprogram memory is a writable MEMORY (not hardwired gates like an FSM).]
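A minimal Python sketch of a microprogrammed controller, assuming one microinstruction per controller state and simple sequencing fields (next, dispatch, back to fetch). The control words are placeholder strings, and the dispatch tables and the new instruction "newop" are hypothetical. The point it illustrates: the sequence lives in writable memory, so a new instruction is added by writing memory, not by changing gates.

```python
# Writable microprogram memory: address -> (label, control word, sequencing).
# Control words are placeholders; a real design stores the datapath control bits.
microprogram = [
    ("Fetch",         "ctrl_fetch",  "next"),       # 0
    ("Decode",        "ctrl_decode", "dispatch1"),  # 1
    ("ExecLoadStore", "ctrl_addr",   "dispatch2"),  # 2
    ("MemoryLoad",    "ctrl_memrd",  "next"),       # 3
    ("WriteBack",     "ctrl_wb",     "fetch"),      # 4
    ("MemoryStore",   "ctrl_memwr",  "fetch"),      # 5
    ("ExecRType",     "ctrl_alu",    "next"),       # 6
    ("MemoryRType",   "ctrl_rwrite", "fetch"),      # 7
    ("ExecBranch",    "ctrl_branch", "fetch"),      # 8
]
dispatch1 = {"lw": 2, "sw": 2, "rtype": 6, "beq": 8}  # after Decode
dispatch2 = {"lw": 3, "sw": 5}                        # after the address computation


def next_address(upc, sequencing, opcode):
    """Next-address logic: the only fixed sequencing hardware in this sketch."""
    if sequencing == "next":
        return upc + 1
    if sequencing == "dispatch1":
        return dispatch1[opcode]
    if sequencing == "dispatch2":
        return dispatch2[opcode]
    if sequencing == "fetch":
        return 0
    raise ValueError(f"unknown sequencing {sequencing}")


def execute(opcode):
    """Control words issued, one per clock cycle, for one instruction."""
    upc, issued = 0, []  # upc models the microprogram address register
    while True:
        _label, controls, sequencing = microprogram[upc]
        issued.append(controls)
        upc = next_address(upc, sequencing, opcode)
        if upc == 0:
            return issued


# Defining a new instruction = writing a microinstruction and a dispatch entry;
# next_address() stays unchanged.
microprogram.append(("ExecNewOp", "ctrl_newop", "fetch"))
dispatch1["newop"] = len(microprogram) - 1
print(execute("lw"), execute("newop"))
```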
Thus, ...
An average program contains 25% loads, 50% arithmetic, 10% stores, and 15% branches.
Load: 5 cycles, Store: 4 cycles, Arith: 4 cycles, Branch: 3 cycles.
CPI, cycles per instruction:
CPI = 0.25*5 + 0.5*4 + 0.1*4 + 0.15*3 = 4.1 cycles/instruction
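The CPI arithmetic can be reproduced in a few lines of Python; the dictionary keys are labels chosen for this sketch.

```python
# Instruction mix and cycle counts from the example above.
MIX = {"load": 0.25, "arith": 0.50, "store": 0.10, "branch": 0.15}
CYCLES = {"load": 5, "arith": 4, "store": 4, "branch": 3}


def cpi(mix, cycles):
    """Average cycles per instruction: sum of (fraction of instructions x cycles)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction fractions must add up to 1")
    return sum(fraction * cycles[kind] for kind, fraction in mix.items())


print(round(cpi(MIX, CYCLES), 2))  # 4.1
```

Execution time then follows as instruction count × CPI × clock period.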
Therefore, you can predict the cycle-true behavior of a program by looking at the sequence of instructions.
Single-cycle datapath
→ Insert registers after each logic block →
Multi-cycle datapath: ALU with Sign-extend, Memory, Register File + Program Counter Reg + A and B Reg + ALUOut Reg + Memory Data Reg + Instruction Reg
[Figure: Multi-cycle datapath]
Register Transfers
= operations for which the source and the destination are defined by one of the following hardware registers: PC, IR, MDR, A, B, AluOut.
An RT takes 1 clock cycle.
Several RTs can execute in parallel.
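Executing several RTs in parallel means every right-hand side reads the register contents from before the clock edge. A minimal Python sketch using the instruction-fetch pair IR <= Memory[PC]; PC <= PC + 4; the helper name clock_edge and the memory contents are hypothetical.

```python
def clock_edge(regs, transfers):
    """Apply several register transfers in one clock cycle.

    Every right-hand side reads the values held *before* the edge; all
    destination registers are then updated together.
    """
    new_values = {dest: rhs(regs) for dest, rhs in transfers.items()}
    updated = dict(regs)
    updated.update(new_values)
    return updated


# Encodings of add $t2, $t0, $t1 and lw $s0, 12($t3) at two consecutive addresses.
memory = {0x1000: 0x01095020, 0x1004: 0x8D70000C}
regs = {"PC": 0x1000, "IR": 0}

# Instruction Fetch: IR <= Memory[PC]; PC <= PC + 4, both in the same cycle.
regs = clock_edge(regs, {
    "PC": lambda r: r["PC"] + 4,      # listed first on purpose: order does not matter
    "IR": lambda r: memory[r["PC"]],  # still sees the old PC, 0x1000
})
print(hex(regs["PC"]), hex(regs["IR"]))
```

If the two transfers were instead applied one after the other with PC first, IR would receive the word at 0x1004.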
3-cycle instruction: beq
4-cycle instruction: R-type, sw
5-cycle instruction: lw
[Figure: an R-type instruction executed over Cycle 1 to Cycle 4; in Cycle 3, AluOut <= A op B]
Multi-cycle control
Multi-cycle control boils down to generating sequences of control bits for the datapath.
To execute add $s0, $t0, $t1, generate the following bits:
Cycle 1  Cycle 2  Cycle 3  Cycle 4
0        X        X        X
1        0        0        0
0        0        0        0
1        0        0        0
X        X        X        1
0        0        0        1
0        0        1        X
X        X        X        0
01       11       00       XX
00       00       10       XX
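The bit sequence for add is exactly what the controller's output logic emits while the FSM walks through its R-type loop. A minimal Python sketch, assuming cycles 1 to 4 correspond to the Fetch, Decode, Exec R-Type and Memory R-Type states of the controller graph; the rows are only indexed 1 to 10 because the sketch does not name the control signals.

```python
# Control words for add $s0, $t0, $t1, taken column by column from the bits above.
# Each tuple lists the 10 control rows for one state; "X" marks a don't-care.
OUTPUT_LOGIC = {
    "Fetch":       ("0", "1", "0", "1", "X", "0", "0", "X", "01", "00"),
    "Decode":      ("X", "0", "0", "0", "X", "0", "0", "X", "11", "00"),
    "ExecRType":   ("X", "0", "0", "0", "X", "0", "1", "X", "00", "10"),
    "MemoryRType": ("X", "0", "0", "0", "1", "1", "X", "0", "XX", "XX"),
}

# Controller states visited by an R-type instruction, one per cycle.
R_TYPE_STATES = ["Fetch", "Decode", "ExecRType", "MemoryRType"]


def control_sequence(states):
    """Output logic applied cycle by cycle: one control word per state."""
    return [OUTPUT_LOGIC[state] for state in states]


# Transpose back to the table layout: one line per control row, one column per cycle.
for index, row in enumerate(zip(*control_sequence(R_TYPE_STATES)), start=1):
    print(f"signal {index:2}: " + " ".join(row))
```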
The
opcode field
Split
Merge
cycles
Instruction
Capture
Performance