By
Harsh S Mehta
DECEMBER 2012
ACKNOWLEDGEMENTS
I would like to thank Ramin Roosta (PhD) for providing good ideas to work on and Shahnam
Mirzaei (PhD) for his guidance. I sincerely thank my other committee members,
Professor Ali Amini (PhD) and Kourosh Sedghisigarchi (PhD), for taking the time to review my
project report and for their suggestions.
Table of Contents
Signature Page
ACKNOWLEDGEMENTS
ABSTRACT
Chapter 1: Introduction and Background
1.1 RISC and CISC architecture
1.2 Introduction to single cycle CPU, multi cycle CPU and comparison with pipeline CPU
1.2.1 Basic of Single Cycle CPU
1.2.2 Basic of Multi cycle CPU
1.2.3 Comparison among Single Cycle, Multi Cycle and pipelined CPU
1.3 Design Environment
Chapter 2: Concept of pipelining
2.1 Fundamental of Pipelining
2.2 MIPS subset for an implementation
2.2.1 MIPS instruction format
2.2.2 A Pipeline Datapath and Control
2.3 Data Hazard and Forwarding
2.4 Data Hazard and Stalls
Chapter 3: Synthesis using Xilinx ISE 13.2
Chapter 4: Conclusion and Future work
4.1 Conclusion
4.2 Future Enhancement
References
Appendix A: Different Verilog Code files
Appendix B: Output
B.1 initial information from vcs.log
B.2 Waveforms
Appendix C: Use of VCS simulator
Appendix D: Schematic view of the Design
ABSTRACT
The aim of this project is to implement a 32-bit, five-stage pipelined RISC CPU based on MIPS.
The project involves the design and simulation of a simple RISC processor. A Reduced
Instruction Set Computer (RISC) is a microprocessor designed to perform a small
set of instructions, with the aim of increasing the overall speed of the processor. In this work, I
analyze the MIPS instruction format, instruction data path, control module function and design
theory based on the RISC CPU instruction set. Furthermore, I use a pipelined design to
successfully simulate the instruction fetch (IF), instruction decode (ID), execution
(EX), data memory (MEM) and write back (WB) modules of the 32-bit CPU. The IF module
fetches the instruction from instruction memory. In the ID stage, the instruction is sent to the
control unit and decoded, which produces the control commands. The EX stage
executes arithmetic operations; its main component is the ALU. The MEM stage fetches data from
memory and stores data to memory; if the instruction is not a memory/IO instruction, the result is
passed on to the WB stage. Finally, the WB stage is in charge of writing the results, store data and
input data to the register file, that is, of writing data to the destination register. To resolve
hazards, the project implements forwarding and hazard detection with pipeline stalls.
The idea of this project was to create a MIPS processor as a building block in Verilog. For
simulation I used Synopsys VCS as well as the Xilinx ISE tool.
1.2 Introduction to single cycle CPU, multi cycle CPU and comparison with pipeline CPU
To understand how the RISC instruction set can be implemented in a pipelined fashion, we
should first understand how it can be implemented without pipelining, so here we go
through the basics of the multi-cycle CPU approach. An unpipelined implementation is
clearly less economical than a pipelined CPU structure. We will see this with the
help of an example later in this section.
In general, every instruction in a RISC architecture can be implemented using 5 clock cycles. The
five cycles are as follows:
1. Instruction Fetch (IF)
Send the PC to memory and fetch the current instruction from memory, then
update the PC to the next sequential instruction by adding 4 (PC = PC + 4).
2. Instruction decode (ID)
Decode the instruction and read the registers specified by it from the register file.
For a possible branch instruction, do the equality test on the registers as they
are read.
Sign-extend the offset field if it is needed.
Compute the possible branch target address.
Decoding can be done in parallel with reading the registers, since the register
specifiers are at a fixed location; this is called fixed-field decoding.
3. Execute (EX)
In this stage the ALU performs an operation that depends on the instruction type.
For memory instructions, it adds the base address and the offset to obtain the
effective address.
For register-register operations, it performs addition, subtraction, etc. as selected
by the ALU opcode.
For register-immediate ALU instructions, it performs the operation on a register
value and the immediate.
4. Memory access (MEM)
In this stage, load and store instructions are performed.
A load instruction reads data from memory at the effective address, and
a store instruction writes the data into memory.
5. Write Back (WB)
This is the last stage. For a register-register ALU instruction or a LOAD
instruction, it writes the result into the register file (the one read in the ID stage); the
result comes from memory for a load instruction and from the ALU for an ALU instruction.
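The five steps above can be sketched as a small state sequencer. This is only an illustrative sketch, not part of the project code in Appendix A: the module name stage_sequencer, the synchronous reset and the state encodings are all assumptions made for the example.

```verilog
// Hypothetical five-step sequencer for a multi-cycle CPU (illustration only).
// Each instruction walks through IF -> ID -> EX -> MEM -> WB, one step per clock.
module stage_sequencer (
    input  wire       clk,
    input  wire       reset,   // synchronous, active high (assumed)
    output reg  [2:0] stage    // step currently being performed
);
    // State encodings are arbitrary choices for this sketch.
    localparam S_IF  = 3'd0,
               S_ID  = 3'd1,
               S_EX  = 3'd2,
               S_MEM = 3'd3,
               S_WB  = 3'd4;

    always @(posedge clk) begin
        if (reset)
            stage <= S_IF;
        else
            case (stage)
                S_IF:    stage <= S_ID;
                S_ID:    stage <= S_EX;
                S_EX:    stage <= S_MEM;
                S_MEM:   stage <= S_WB;
                default: stage <= S_IF;  // after WB (or an unused code) fetch the next instruction
            endcase
    end
endmodule
```

In a real multi-cycle design the sequencer would skip steps an instruction does not need (for example MEM for an ALU instruction); the sketch always takes all five steps to stay close to the list above.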
1.2.1 Basic of Single Cycle CPU
As the name suggests, this category of CPU executes every instruction in one clock cycle. In reality
each cycle takes a certain amount of time, which means a single cycle CPU spends the same
amount of time on each instruction, namely one cycle, no matter how complex the
instruction is. To ensure correct operation, the slowest instruction, e.g. load (ld), must complete
within one clock tick, which means a single cycle CPU operates at the speed of the
slowest instruction in the ISA. Another aspect of this CPU is that, since every instruction has to
complete in one clock cycle, any hardware element can be used only once per instruction, so an
element that is needed more than once has to be duplicated. It also follows that when an element
is shared by different instruction flows, different connections to it have to be realized, and
this is done with multiplexers.
Fig. 1.2.1a represents the combined data path, including the instruction memory, the data memory, the
ALU, the program counter (PC) unit and the multiplexers.
Courtesy of Computer Organization and Design, 4th edition by David A. Patterson and John L. Hennessy
If we consider this condition, then the speedup from pipelining equals the number of
pipeline stages, so it should be five in the case of the MIPS processor. In reality, the stages are not
perfectly balanced, and pipelining adds overhead, mainly pipeline register delay (including the
setup time of these registers) and clock skew. Once the clock cycle is as small as the pipeline
overhead, pipelining is no longer useful, which means a very deep pipeline may not pay off.
Always keep in mind that pipelining reduces the average execution time per instruction, not the
execution time of an individual instruction.
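The speedup argument can be written out as a short calculation. The symbols are assumptions introduced for this sketch and do not appear in the original text: k is the number of pipeline stages, n the number of instructions, tau the delay of one perfectly balanced stage, and d the per-stage overhead (register delay and clock skew).

```latex
% k: pipeline stages, n: instructions executed,
% \tau: delay of one balanced stage, d: per-stage pipeline overhead
\[
T_{\text{unpipelined}} = n\,k\,\tau, \qquad
T_{\text{pipelined}} = (k + n - 1)\,(\tau + d)
\]
\[
S(n) = \frac{n\,k\,\tau}{(k + n - 1)\,(\tau + d)}, \qquad
\lim_{n \to \infty} S(n) = \frac{k\,\tau}{\tau + d}
\]
```

With k = 5 and d = 0 the limit is 5, the ideal case described above. When the stage delay shrinks to the size of the overhead (tau = d), the limit drops to k/2, which is one way to see why very deep pipelines stop paying off.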
and operation field; an I-type instruction has two registers as well as a 16-bit immediate field; and
a J-type instruction has a 26-bit address field, which is the jump target.
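The three formats share fixed bit positions, so all fields can be sliced out of the 32-bit instruction word in parallel. A minimal sketch follows; the module and port names are hypothetical, and the bit positions are the standard MIPS ones, which also match the slices used in cpu_top.v in Appendix A.

```verilog
// Hypothetical field extractor for the R-, I- and J-type formats (illustration only).
module instr_fields (
    input  wire [31:0] instr,
    output wire [5:0]  opcode,  // all formats
    output wire [4:0]  rs,      // R- and I-type: first source register
    output wire [4:0]  rt,      // R-type: second source; I-type: destination
    output wire [4:0]  rd,      // R-type: destination register
    output wire [4:0]  shamt,   // R-type: shift amount
    output wire [5:0]  funct,   // R-type: operation (function) field
    output wire [15:0] imm16,   // I-type: 16-bit immediate
    output wire [25:0] addr26   // J-type: 26-bit jump target field
);
    assign opcode = instr[31:26];
    assign rs     = instr[25:21];
    assign rt     = instr[20:16];
    assign rd     = instr[15:11];
    assign shamt  = instr[10:6];
    assign funct  = instr[5:0];
    assign imm16  = instr[15:0];
    assign addr26 = instr[25:0];
endmodule
```

The fields overlap on purpose: which of them is meaningful depends on the opcode, and the control unit decides that after the slices are already available.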
Instruction set definition
Name    Description         Type of instruction
J       Jump                J
Lw      load word           I
Sw      store word          I
Bne     branch not equal    I
Beq     branch equal        I
Addi    add immediate       I
Ori     Or immediate        I
Add     Addition            R
Sub     Subtraction         R
Mult    Multiplication      R
Div     Division            R
And     AND                 R
Or      OR                  R
Nor     NOR                 R
For example,
addi $r1, $r2, 9 (instruction rt, rs, immediate), which means it adds the value 9 to the contents of
register $r2 and stores the result in $r1.
J-type instruction
J instructions are written with labels; it is the assembler's or linker's duty to convert the label into
a numerical value.
For example,
j label (instruction addr), which tells the processor to jump to the
instruction located at address addr.
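How the 26-bit field becomes a full 32-bit address can be sketched as follows. The module and port names are hypothetical; the concatenation itself mirrors the JumpTarget assignments in cpu_top.v in Appendix A.

```verilog
// Hypothetical jump-target builder (illustration only).
module jump_target (
    input  wire [31:0] pc_plus_4,  // address of the instruction after the jump
    input  wire [25:0] addr26,     // 26-bit field from the J-type instruction
    output wire [31:0] target      // full 32-bit jump address
);
    // Upper 4 bits come from PC+4, the 26-bit field is a word address,
    // and the two low bits are zero because instructions are word aligned.
    assign target = {pc_plus_4[31:28], addr26, 2'b00};
endmodule
```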
2.2.2 A Pipeline Datapath and Control
Fig. 2.2.2a shows the pipeline datapath. Here we follow section 1.2, but in
terms of the pipeline structure and specifically for the MIPS architecture.
Courtesy of Computer Organization and Design, 4th edition by David A. Patterson and John L. Hennessy
needs to be stored temporarily in the corresponding pipeline register. The operations in each stage of the
pipeline structure are shown below in Fig. 2.2.2b.
; Result is written in $A
; 1st Operand $A is dependent on add instruction
; 2nd Operand $A is dependent on add instruction
; Both operands are dependent on add instruction
; Base is dependent on add instruction
All four instructions that follow the add instruction depend on it. $A stores
the sum of $B and $C. Fig. 2.3a shows the dependencies of these instructions. It clearly
shows that $A is updated in clock cycle 5 and that the written value is unavailable before then, but
the instructions following the add read the value of $A earlier, so
they need the updated value as early as the very next clock cycle. This is called a data hazard.
or $E, $A, $H
Execution Hazard:
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 [1]
add $A, $B, $C
or $E, $A, $H
sw $M, 15($A)
Memory Hazard:
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 [1]
Note that there is no hazard in the write back stage, since it is assumed that the register file
supplies the correct result when the instruction in the instruction decode stage reads the same register
that the instruction in the write back stage writes. There is, however, one potential hazard in the case of
forwarding: a conflict between the result of the instruction in the write back stage, the result of the
instruction in the memory stage, and the source operand of the instruction in the ALU stage. Below is
an example [1]:
add $A, $A, $V
add $A, $A, $X
add $A, $A, $U [1]
In the above example each instruction reads from and writes to the same register. In such a case
the result needs to be forwarded from the MEM stage, since the result at this stage is the most recent one.
Including this case, the condition becomes:
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 [1]
Fig. 2.3c shows the hardware needed for the forwarding unit described above in section 2.3.
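The execution-hazard and memory-hazard conditions of this section can be combined into one small combinational block. The sketch below is an illustration under assumptions: the module and signal names are made up for the example (the project's own version is forward_data.v in Appendix A), and the priority of the execution hazard is expressed with if/else ordering rather than with an explicit "not equal" term.

```verilog
// Hypothetical forwarding unit sketch (illustration only).
// 2'b10: forward from EX/MEM, 2'b01: forward from MEM/WB, 2'b00: use the register file value.
module forwarding_sketch (
    input  wire       ex_mem_regwrite,
    input  wire [4:0] ex_mem_rd,
    input  wire       mem_wb_regwrite,
    input  wire [4:0] mem_wb_rd,
    input  wire [4:0] id_ex_rs,
    input  wire [4:0] id_ex_rt,
    output reg  [1:0] forward_a,
    output reg  [1:0] forward_b
);
    always @(*) begin
        // Operand A (Rs). Checking EX/MEM first means the most recent result
        // wins when both older instructions write the same register.
        if (ex_mem_regwrite && (ex_mem_rd != 5'd0) && (ex_mem_rd == id_ex_rs))
            forward_a = 2'b10;
        else if (mem_wb_regwrite && (mem_wb_rd != 5'd0) && (mem_wb_rd == id_ex_rs))
            forward_a = 2'b01;
        else
            forward_a = 2'b00;

        // Operand B (Rt), same rule.
        if (ex_mem_regwrite && (ex_mem_rd != 5'd0) && (ex_mem_rd == id_ex_rt))
            forward_b = 2'b10;
        else if (mem_wb_regwrite && (mem_wb_rd != 5'd0) && (mem_wb_rd == id_ex_rt))
            forward_b = 2'b01;
        else
            forward_b = 2'b00;
    end
endmodule
```

The test against register 0 keeps a write to $0 from being forwarded, since $0 must always read as zero.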
lw $A, 10($B)
and $C, $A, $U
add $K, $A, $N
or $D, $C, $A
and $J, $H, $G
In this case the dependence of the and instruction, which follows the load instruction, goes backward
in time, so forwarding cannot be the remedy here and the pipeline must be stalled. Fig. 2.4a shows
this case.
lw $A, 10($B)
and $C, $A, $U
add $K, $A, $N
or $D, $C, $A
and $J, $H, $G
The above condition checks whether the instruction is a load; if it is, it checks
whether the destination register of the load instruction in the execution stage matches either source
register of the instruction in the decode stage, and if so it stalls the processor for one cycle. After
this one-cycle stall, the forwarding unit can handle the dependence, because the loaded value is then
available to be forwarded to the execution stage.
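The load-use check described above can be sketched as a single combinational expression. The module and signal names are hypothetical, and the condition is the standard one from [1]; the project's own detector (hazard_detection.v in Appendix A) is clocked and also handles branches, so this is a simplified illustration rather than a copy of it.

```verilog
// Hypothetical load-use hazard detector (illustration only).
module load_use_detect (
    input  wire       id_ex_memread,  // instruction in EX is a load
    input  wire [4:0] id_ex_rt,       // destination register of that load
    input  wire [4:0] if_id_rs,       // source registers of the instruction in ID
    input  wire [4:0] if_id_rt,
    output wire       stall           // 1: hold PC and IF/ID, insert a bubble
);
    assign stall = id_ex_memread &&
                   ((id_ex_rt == if_id_rs) || (id_ex_rt == if_id_rt));
endmodule
```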
Now the question is how to implement this stall when designing the MIPS processor. As explained,
we stall the instruction in the instruction decode stage, which
means the instruction in the fetch stage must be stalled as well; otherwise the fetched
instruction would be lost. The basic idea is to prevent the program
counter register and the fetch/decode pipeline register from updating. Note that at the same time
the back half of the pipeline must be given something to execute that has no
effect. So we need to insert a bubble, and how this is done is shown in Fig. 2.4b below.
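A minimal sketch of these three actions (hold the PC, hold the fetch/decode register, zero the control signals) is given below. All names are hypothetical, and the 9-bit control width is an assumption taken from the control_unit output in Appendix A.

```verilog
// Hypothetical stall logic for the front of the pipeline (illustration only).
module stall_front_end (
    input  wire        clk,
    input  wire        stall,          // from the hazard detection unit
    input  wire [31:0] next_pc,
    input  wire [31:0] fetched_instr,
    input  wire [8:0]  control_in,     // control signals decoded in ID
    output reg  [31:0] pc,
    output reg  [31:0] if_id_instr,    // fetch/decode pipeline register
    output wire [8:0]  control_out     // control signals passed on to EX
);
    initial begin
        pc          = 32'd0;
        if_id_instr = 32'd0;
    end

    // While stalled, neither register updates, so the same instruction is
    // fetched again and the same instruction is decoded again.
    always @(posedge clk) begin
        if (!stall) begin
            pc          <= next_pc;
            if_id_instr <= fetched_instr;
        end
    end

    // The bubble: all control signals are forced to zero, so the instruction
    // slot entering EX writes neither registers nor memory.
    assign control_out = stall ? 9'd0 : control_in;
endmodule
```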
lw $A, 10($B)
and $C, $A, $U
add $K, $A, $N
or $D, $C, $A
and $J, $H, $G
Fig. 2.4c Pipeline datapath with forwarding as well as hazard detection unit
In the same way, control hazards can be resolved by different techniques, one of which, a one-bit
branch prediction technique, is involved in this project. A control hazard arises from the need to make
a decision based on the result of one instruction while other instructions are executing.
There are two solutions for a control hazard: one is to stall the processor, and
the other is to flush the instructions already fetched and start again. The latter is more expensive
in terms of performance than the former, and in today's processors, whose complexity has
grown and which support a large number of different instructions,
relying on flushing alone is not an option. The well-known way of resolving such hazards is
dynamic branch prediction, which is outside the scope of this project, as it is a complex
topic in its own right.
Note that the processor must start fetching the instruction that follows a branch
on the very next clock cycle, and this creates a problem: the pipeline cannot yet know which
instruction should come next, since it has only just received the branch instruction from memory. The
processor uses prediction to handle control hazards (branches). The simplest approach is to always
predict that the branch is not taken, so the processor is stalled only when the branch is actually
taken.
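The predict-not-taken scheme can be sketched as follows: fetching continues at PC+4, and only a taken branch (or a jump) redirects the PC and squashes the instruction fetched behind it. Module and port names are hypothetical; the taken and flush expressions follow the PCSrc and fetch_Flush assignments in cpu_top.v in Appendix A.

```verilog
// Hypothetical predict-not-taken branch resolution (illustration only).
module branch_resolve (
    input  wire        beq,            // instruction in ID is beq
    input  wire        bne,            // instruction in ID is bne
    input  wire        jump,           // instruction in ID is j
    input  wire [31:0] reg_a,          // register values compared in ID
    input  wire [31:0] reg_b,
    input  wire [31:0] pc_plus_4,
    input  wire [31:0] branch_target,
    input  wire [31:0] jump_target,
    output wire        taken,          // the not-taken prediction was wrong
    output wire        flush,          // squash the instruction fetched after the branch
    output wire [31:0] next_pc
);
    assign taken   = (beq && (reg_a == reg_b)) || (bne && (reg_a != reg_b));
    assign flush   = taken || jump;
    assign next_pc = taken ? branch_target
                           : (jump ? jump_target : pc_plus_4);
endmodule
```

When the prediction is correct nothing special happens, which is why this scheme costs cycles only on taken branches.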
Fig. 2.4d The solution of control hazard by predicting that branch is not taken.
Fig. 2.4d shows the pipeline when the branch is not taken. The bottom picture shows the case where
the branch is taken; as a result we insert a bubble, which means stalling the pipeline. When the
not-taken prediction turns out to be wrong, the only option is to flush the pipeline, as
explained earlier. For this project such an approach is acceptable, since we do not
support many instructions, but with deeper pipelines the branch penalty, measured in clock
cycles, increases. The branch penalty also increases in terms of instructions lost, which means
that in an aggressive pipeline such a static prediction wastes too much performance, as said
earlier.
A dynamic branch prediction method is not included in this project; for branches,
static prediction is used just to illustrate the fundamentals of pipeline stalls. There is one
test case written for this, which shows that the pipeline is stalled in the case of a branch
instruction; the result is shown in the waveforms in Appendix B.
Used
1024
Available
12,292
Utilization
9%
696
263
1956
12,292
17%
1208
1208
0
6144
1208
1208
0%
1958
12,292
17%
1956
6
146
2
2
0
240
32
61%
6%
Logic Distribution
20
References
1. Computer Organization and Design, 4th edition by David A. Patterson and John L. Hennessy
2. IIT Kharagpur video lectures.
3. MIPS Architecture class notes:
http://pages.cs.wisc.edu/~smoler/x86text/lect.notes/MIPS.html (10/24/2012)
4. Synopsys VCS User Guide
5. MIPS Architecture and Assembly Language Overview, adapted from:
http://edge.mcs.dre.g.el.edu/GICL/people/sevy/architecture/MIPSRef(SPIM).html
(10/24/2012)
6. UC Berkeley and Princeton University online Computer Architecture class notes
7. Xilinx ISE 13.2 User Guide
fetch.v
module fetch(Instruction,Instruction_Reg,PC_plus_4_reg,flush,clk,Hazard_in,PC_plus_4);
input [31:0] Instruction,PC_plus_4;
input Hazard_in,clk,flush;
output [31:0] Instruction_Reg, PC_plus_4_reg;
reg [31:0] Instruction_Reg, PC_plus_4_reg;
initial begin
Instruction_Reg = 0;
PC_plus_4_reg = 0;
end
// fetch/decode pipeline register
always@(posedge clk)
begin
if(flush) // squash the fetched instruction
begin
Instruction_Reg <= 0;
PC_plus_4_reg <= 0;
end
else if( ~ Hazard_in) // Hazard_in low: pass a zero (nop) instruction
begin
Instruction_Reg <= 32'b0;
PC_plus_4_reg <= 32'b0;
end
else // normal operation: latch the fetched instruction and PC+4
begin
Instruction_Reg <= Instruction;
PC_plus_4_reg <= PC_plus_4;
end
end
endmodule
decode.v
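module decode(A_Data,B_Data,immediate_value,RegRs,RegRt,clk,Write_Back,Memory,Execution,RegRd,
reg_WriteBack,reg_Memory,reg_Execution,
flop_Rs,flop_Rt,flop_Rd,flop_A_Data,flop_B_Data,immediate_valuereg);
input clk;
input [1:0] Write_Back;
output [1:0] reg_WriteBack;
input [2:0] Memory;
output [2:0] reg_Memory;
// The remaining declarations are missing from the source listing; the widths
// below are assumptions inferred from how the ports are used in cpu_top.v.
input [3:0] Execution;
output [3:0] reg_Execution;
input [4:0] RegRs,RegRt,RegRd;
output [4:0] flop_Rs,flop_Rt,flop_Rd;
input [31:0] A_Data,B_Data,immediate_value;
output [31:0] flop_A_Data,flop_B_Data,immediate_valuereg;
reg [1:0] reg_WriteBack;
reg [2:0] reg_Memory;
reg [3:0] reg_Execution;
reg [4:0] flop_Rs,flop_Rt,flop_Rd;
reg [31:0] flop_A_Data,flop_B_Data,immediate_valuereg;
initial begin
reg_WriteBack = 0;
reg_Memory = 0;
reg_Execution = 0;
flop_A_Data = 0;
flop_B_Data = 0;
immediate_valuereg = 0;
flop_Rs = 0;
flop_Rt = 0;
flop_Rd = 0;
end
// decode/execute pipeline register
always@(posedge clk)
begin
reg_WriteBack <= Write_Back;
reg_Memory <=Memory;
reg_Execution <= Execution;
flop_A_Data <= A_Data;
flop_B_Data <= B_Data;
immediate_valuereg <= immediate_value;
flop_Rs <= RegRs;
flop_Rt <= RegRt;
flop_Rd <= RegRd;
end
endmodule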
execution.v
module execution(RegRD,WriteDataIn,flop_Memory,clk,WriteBack,Memory,ALU_out,flop_WriteBack,
flop_ALU,flop_Rd,WriteDataOut);
input clk;
input [1:0] WriteBack;
output [1:0] flop_WriteBack;
input [2:0] Memory;
output [2:0] flop_Memory;
input [4:0] RegRD;
output [4:0] flop_Rd;
input [31:0] ALU_out,WriteDataIn;
output [31:0] flop_ALU,WriteDataOut;
reg [31:0] flop_ALU,WriteDataOut;
reg [4:0] flop_Rd;
reg [1:0] flop_WriteBack;
reg [2:0] flop_Memory;
initial begin
flop_ALU=0;
WriteDataOut=0;
flop_Rd=0;
flop_WriteBack=0;
flop_Memory=0;
end
always@(posedge clk)
begin
flop_WriteBack <= WriteBack;
flop_Memory <= Memory;
flop_ALU <= ALU_out;
flop_Rd <= RegRD;
WriteDataOut <= WriteDataIn;
end
endmodule
memory.v
module Memory_ory(Reg_RD,write_backreg,Memory_reg,ALU_reg,Reg_Rdreg,clk,write_back,
Memory_out,ALU_Out);
input clk;
input [1:0] write_back;
input [4:0] Reg_RD;
input [31:0] Memory_out,ALU_Out;
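output [1:0] write_backreg;
output [31:0] Memory_reg,ALU_reg;
output [4:0] Reg_Rdreg;
// outputs assigned inside initial/always blocks must be declared reg
reg [1:0] write_backreg;
reg [31:0] Memory_reg,ALU_reg;
reg [4:0] Reg_Rdreg;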
initial begin
write_backreg = 0;
Memory_reg = 0;
ALU_reg = 0;
Reg_Rdreg = 0;
end
always@(posedge clk)
begin
write_backreg <= write_back;
Memory_reg <= Memory_out;
ALU_reg <= ALU_Out;
Reg_Rdreg <= Reg_RD;
end
endmodule
memory_data.v
module DATAMem(Write_data,Read_data,Mem_Write,Mem_Read,Address);
input [31:0] Address,Write_data;
input Mem_Write,Mem_Read;
output [31:0] Read_data;
reg [31:0] Read_data;
// The memory array declaration is missing from the source listing; 512 words
// are assumed here to match memory_instruction.v.
reg [31:0] regfile[511:0];
always@(Address,Write_data,Mem_Write,Mem_Read)
if(Mem_Write)
begin
regfile[Address]<=Write_data; //Write Operation
end
always@(Address,Write_data,Mem_Write,Mem_Read)
if(Mem_Read)
Read_data <= regfile[Address];//read operation
endmodule
memory_instruction.v
module memory_instruction(PC,instruction);
input [31:0] PC;
output [31:0] instruction;
reg [31:0] regfile[511:0];//512 words of 32 bits each
assign instruction = regfile[PC]; //instruction word addressed by the PC value
endmodule
noclk_mux.v
module nonclk_mux(A,A0,A1,A2,A3,Out);
input [1:0] A;
input [31:0] A3,A2,A1,A0;
output [31:0] Out;
reg [31:0] Out;
always@(A,A3,A2,A1,A0)
begin
case(A)
2'b00:
Out <= A0;
2'b01:
Out <= A1;
2'b10:
Out <= A2;
2'b11:
Out <= A3;
endcase
end
endmodule
ALU.v
module ALU_Unit(control_ALU_Unit,DataA,DataB,Result);
input [3:0] control_ALU_Unit; //2^4 =16 possibilities
input [31:0] DataA,DataB; //32 bit data
output [31:0] Result;
reg [31:0] Result;
initial begin
Result = 32'd0;
end
// The bodies of the and, or, add, divide, sub and slt items are missing from
// the source listing; the operations below are assumptions based on the comments.
always@(control_ALU_Unit,DataA,DataB)
begin
case(control_ALU_Unit)
4'b0000: //and instruction
Result <= DataA & DataB;
4'b0001: //or instruction
Result <= DataA | DataB;
4'b0010: //add instruction
Result <= DataA + DataB;
4'b0011: //multiply instruction
Result <= DataA*DataB;
4'b0100: //nor instruction (the per-bit assignments written as one vector operation)
Result <= ~(DataA | DataB);
4'b0101: //divide instruction
Result <= DataA / DataB;
4'b0110: //sub instruction
Result <= DataA - DataB;
4'b0111: //slt instruction (signed comparison assumed, as in MIPS slt)
Result <= ($signed(DataA) < $signed(DataB)) ? 32'd1 : 32'd0;
//sll (4'b1000) and srl: the bodies, and the code used for srl, cannot be
//recovered from the source listing, so these fall through to default here
4'b1001: //xnor
Result <= DataA ^~ DataB;
4'b1010: //MAX
if (DataA > DataB)
Result <= DataA;
else
Result <= DataB;
4'b1011: //absolute sub
if (DataA > DataB)
Result <= DataA - DataB;
else
Result <= DataB - DataA;
4'b1111: //xor
Result <= DataA^DataB;
default: //Error checking
begin
$display("Checking error");
Result <= 0;
end
endcase
end
endmodule
control_ALU.v
module control_ALU(andi,ori,addi,ALU_Op,operation,ALU_control);
input andi,ori,addi;
input [5:0] operation;
input [1:0] ALU_Op;
output [3:0] ALU_control;
// The lines between the port declarations and the 2'b01 item are missing from
// the source listing; the reg declaration, the always/case header and the
// 2'b00 item are assumptions.
reg [3:0] ALU_control;
always@(ALU_Op,operation,andi,ori,addi)
begin
case(ALU_Op)
2'b00:
ALU_control = 4'b0010;//add for lw/sw (assumed)
2'b01:
ALU_control = 4'b0110;//sub for branches
2'b10:
begin
if(operation==6'b100100)
ALU_control = 4'b0000;//and
if(operation==6'b100101)
ALU_control = 4'b0001;//or
if(operation==6'b100000)
ALU_control = 4'b0010;//add
if(operation==6'b011000)
ALU_control = 4'b0011;//multi
if(operation==6'b100111)
ALU_control = 4'b0100;//nor
if(operation==6'b011010)
ALU_control = 4'b0101;//div
if(operation==6'b100010)
ALU_control = 4'b0110;//sub
if(operation==6'b101010)
ALU_control = 4'b0111;//slt
if(operation==6'b101011)
ALU_control = 4'b1001;//xnor
if(operation==6'b101110)
ALU_control = 4'b1010;//Max
if(operation==6'b101111)
ALU_control = 4'b1011;//absolute sub
if(operation==6'b111111)
ALU_control = 4'b1111;//xor
end
2'b11:
begin
if(andi)begin
ALU_control = 4'b0000;//andi
end
if(ori) begin
ALU_control = 4'b0001;//ori
end
if(addi)
ALU_control = 4'b0010;//addi
end
endcase
end
endmodule
control_unit.v
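//follow chapter 5 from the book
module control_unit(Opcode,Out,jump,bne,immediate,andi,ori,addi);
input [5:0] Opcode;
output[8:0] Out;
output jump,bne,immediate,andi,ori,addi;
wire regdst,alusrc,mem_toreg,regwrite,mem_read,mem_write,branch;
// The opcode decoding and several declarations are missing from the source
// listing. The lines down to "microcode control" are assumptions that use the
// standard MIPS opcodes.
wire r,lw,sw,beq;
wire [3:0] Execution;
wire [2:0] Memory;
wire [1:0] WriteBack;
assign r = (Opcode == 6'b000000);
assign lw = (Opcode == 6'b100011);
assign sw = (Opcode == 6'b101011);
assign beq = (Opcode == 6'b000100);
assign bne = (Opcode == 6'b000101);
assign jump = (Opcode == 6'b000010);
assign addi = (Opcode == 6'b001000);
assign andi = (Opcode == 6'b001100);
assign ori = (Opcode == 6'b001101);
assign immediate = addi|andi|ori;
assign mem_read = lw;
assign mem_write = sw;
assign branch = beq;
// microcode control
assign regdst = r;
assign alusrc = lw|sw|immediate;
assign mem_toreg = lw;
assign regwrite = r|lw|immediate;
// Execution control
assign Execution[3] = regdst;
assign Execution[2] = alusrc;
assign Execution[1] = r;
assign Execution[0] = beq;
//Memory control
assign Memory[2] = branch;
assign Memory[1] = mem_read;
assign Memory[0] = mem_write;
//WriteBack control
assign WriteBack[1] = mem_toreg; //not same as diagram
assign WriteBack[0] = regwrite;
//output control
assign Out[8:7] = WriteBack;
assign Out[6:4] = Memory;
assign Out[3:0] = Execution;
endmodule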
hazard_detection.v
module Hazard_Detection(Branch, Stall, clk ,Fetch_RegRs, Fetch_RegRt, Decode_RegRt,
Deode_MemRead, Decode_RegWrite);
input [4:0] Fetch_RegRs, Fetch_RegRt, Decode_RegRt;
input Deode_MemRead, Branch, Decode_RegWrite, clk;
output Stall;
reg Stall, Stall_two;
initial begin
Stall <= 0;
Stall_two <= 0;
end
always @ (negedge clk) begin
Stall <= 0;
if (Branch) begin
if (Deode_MemRead && ((Fetch_RegRs == Decode_RegRt) || (Fetch_RegRt == Decode_RegRt))) begin
Stall <= 1;
Stall_two <= 1;
end else if (Decode_RegWrite && ((Fetch_RegRs == Decode_RegRt) || (Fetch_RegRt ==
Decode_RegRt))) begin
Stall <= 1;
Stall_two <= 0;
end else if (Stall_two) begin
Stall <= 1;
Stall_two <= 0;
end
end else if (Deode_MemRead && ((Fetch_RegRs == Decode_RegRt) || (Fetch_RegRt ==
Decode_RegRt))) begin
Stall <= 1;
Stall_two <= 0;
end else begin
Stall <= 0;
end
end
endmodule
forward_data.v
module forward_data(ForwardA, ForwardB, ForwardA_Branch, ForwardB_Branch , Fetch_RegRs,
Fetch_RegRt, Branch, Decode_RegRs, Decode_RegRt, Execution_RegWrite, Execution_RegRd,
Memory_RegWrite, Memory_RegRd );
input [4:0] Decode_RegRs, Decode_RegRt, Execution_RegRd, Memory_RegRd, Fetch_RegRs,
Fetch_RegRt;
input Execution_RegWrite, Memory_RegWrite, Branch;
output [1:0] ForwardA, ForwardB, ForwardA_Branch, ForwardB_Branch;
reg [1:0] ForwardA, ForwardB, ForwardA_Branch, ForwardB_Branch;
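initial begin
ForwardA = 2'b00;
ForwardB = 2'b00;
ForwardA_Branch = 2'b00;
ForwardB_Branch = 2'b00;
end
always @ (Decode_RegRs or Decode_RegRt or Execution_RegRd or Memory_RegRd or Fetch_RegRs or
Fetch_RegRt or Execution_RegWrite or Memory_RegWrite or Branch) begin
if (Execution_RegWrite && (Execution_RegRd != 5'b0) && (Execution_RegRd == Decode_RegRs))
ForwardA <= 2'b10;
else if (Memory_RegWrite && (Memory_RegRd != 5'b0) && (Memory_RegRd == Decode_RegRs))
ForwardA <= 2'b01;
else
ForwardA <= 2'b00;
if (Execution_RegWrite && (Execution_RegRd != 5'b0) && (Execution_RegRd == Decode_RegRt))
ForwardB <= 2'b10;
else if (Memory_RegWrite && (Memory_RegRd != 5'b0) && (Memory_RegRd == Decode_RegRt))
ForwardB <= 2'b01;
else
ForwardB <= 2'b00;
if (Branch) begin
if (Execution_RegWrite && (Execution_RegRd != 5'b0) && (Execution_RegRd == Fetch_RegRs))
ForwardA_Branch <= 2'b10;
else if (Memory_RegWrite && (Memory_RegRd != 5'b0) && (Memory_RegRd == Fetch_RegRs))
ForwardA_Branch <= 2'b01;
else
ForwardA_Branch <= 2'b00;
// The source listing breaks off after the "else" above; the rest of this
// block is an assumption completed by symmetry with the lines before it.
if (Execution_RegWrite && (Execution_RegRd != 5'b0) && (Execution_RegRd == Fetch_RegRt))
ForwardB_Branch <= 2'b10;
else if (Memory_RegWrite && (Memory_RegRd != 5'b0) && (Memory_RegRd == Fetch_RegRt))
ForwardB_Branch <= 2'b01;
else
ForwardB_Branch <= 2'b00;
end else begin
ForwardA_Branch <= 2'b00;
ForwardB_Branch <= 2'b00;
end
end
endmodule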
cpu_top.v
/**
1. defining 5 stage and control unit variables
2. assign statement //follow chapter 6 carefully
3. port mapping of 5 stages as well as control unit, non clk multiplexer and ALU unit.
4. defining cycle counter for debugging purpose (VCS)
**/
module cpu_top(clk);
input clk;
//1st stage :fetch_
wire [31:0] next_pc,fetch_pc_plus_4,fetch_instruction;
reg [31:0] pc;
//2nd stage :decode_
wire PCSrc;
wire [4:0] decode_RegRs,decode_RegRt,decode_RegRd;
wire [31:0] decode_pc_plus_4,decode_instruction;
initial begin
pc = 0;
cycle = 0;
end
//: instruction Fetch (fetch_)
assign PCSrc = ((decode_RegAout==decode_RegBout)&decode_control[6])|((decode_RegAout!=decode_RegBout)&bne);
assign next_pc = PCSrc ? Branch_Address : PCMuxOut;
assign fetch_pc_plus_4 = pc + 4;
assign fetch_Flush = PCSrc|jump;
always @ (posedge clk) begin
if(PC_Write)
begin
pc = next_pc; //update pc
$display("PC: %d",pc);
end
else
$display("do not write to PC nop"); //nop: do not update
end
memory_instruction memory_instr(pc,fetch_instruction);
assign decode_RegRs[4:0]=decode_instruction[25:21];
assign decode_RegRt[4:0]=decode_instruction[20:16];
assign decode_RegRd[4:0]=decode_instruction[15:11];
assign decode_immediate_value = {{16{decode_instruction[15]}},decode_instruction[15:0]}; //sign extension
assign Branch_Address = (decode_immediate_value << 2) + decode_pc_plus_4; //matching 32 bit
assign JumpTarget[31:28] = fetch_pc_plus_4[31:28];
assign JumpTarget[27:2] = decode_instruction[25:0];
assign JumpTarget[1:0] = 0;
assign decode_control = Hazard_mux_control ? out_control : 0;
assign PCMuxOut = jump ? JumpTarget : fetch_pc_plus_4;
Hazard_Detection Hazard(
Branch, Stall, clk ,Fetch_RegRs, Fetch_RegRt, Decode_RegRt, Deode_MemRead, Decode_RegWrite
);
control_unit control(decode_instruction[31:26],out_control,jump,bne,immediate,andi,ori,addi);
pipeline_regs registres(clk,WriteBack_WriteBack_[0],datatowrite,WriteBack_RegRd,decode_RegRs,decode_RegRt,
decode_RegAout,decode_RegBout);
// 2nd stage of pipeline : instruction Decode (decode_)
decode decodereg(clk,decode_control[8:7],decode_control[6:4],decode_control[3:0],decode_RegAout,
decode_RegBout,decode_immediate_value,
decode_RegRs,decode_RegRt,decode_RegRd,Execution_WriteBack,Execution_M,Execution_Execute,
Execution_RegAout,Execution_RegBout,Execution_immediate_value,Execution_RegRs,Execution_RegRt,
Execution_RegRd);
// ALU control
assign alu_op[0] = (~decode_instruction[31]&~decode_instruction[30]&~decode_instruction[29]&decode_instruction[28]
&~decode_instruction[27]&~decode_instruction[26])|(immediate);
assign alu_op[1] = (~decode_instruction[31]&~decode_instruction[30]&~decode_instruction[29]&~decode_instruction[28]
&~decode_instruction[27]&~decode_instruction[26])|(immediate);
//instance renamed so that it does not clash with the control_ALU net
control_ALU control_ALU_inst(andi,ori,addi,Execution_Execute[1:0],Execution_immediate_value[5:0],control_ALU);
ALU_Unit ALU(control_ALU,ALU_SrcA,ALU_SrcB,Execution_ALUOut);
// 3rd stage of pipeline
Execution_Memory Execution_Mem_reg(regtopass,Execution_RegBout,Memory_Mem_,clk,Execution_WriteBack,
Execution_M,Execution_ALUOut,Memory_WriteBack,Mem_ALUOut,Mem_RegRd,
Memory_WriteData);
Memory Memory_WriteBackreg(Mem_RegRd,WriteBack_WriteBack_,WriteBack_ReadData,WriteBack_ALUOut,
WriteBack_RegRd,clk,Memory_WriteBack,Mem_ReadData,Mem_ALUOut);
tb_cpu.v
module tb_cpu;
integer i;
reg Clk;
initial
begin
$vcdplusfile(cpu.vpd);
$vcdpluson;
$vcdplusmemon;
end
46
initial begin
Clk = 1;
end
//clk controls
always begin
clk = ~clk;
#25;
end
initial begin
    // Initialization of Instruction Memory
    Instruction_Memory_register[0] = 32'h012A4020; //add R5,R3,R4
    Instruction_Memory_register[4] = 32'h012A4023; //sub R6,R5,R4
    Instruction_Memory_register[8] = 32'h2128000C; //addi R3, R3, 12
    Instruction_Memory_register[12] = 32'h01090018; //mult $t0, $t1
    Instruction_Memory_register[16] = 32'h0109001B; //j
    Instruction_Memory_register[20] = 32'h012A4024; //and R7,R3,R4
    Instruction_Memory_register[24] = 32'h00094280; //sll R5,R11,R3
    Instruction_Memory_register[28] = 32'h0094282A; //srl R6,R7,R9
    Instruction_Memory_register[32] = 32'h8D28000C; //lw R4,1(R0)
    Instruction_Memory_register[36] = 32'hAD28000C; //add R5,R3,R4
    Instruction_Memory_register[40] = 32'h1509000C; //bne $t0, $t1, 12
    Instruction_Memory_register[44] = 32'h012A402A; //slt R10,R6,R5
    Instruction_Memory_register[48] = 32'h0166601A; //div R12,R11,R6
    Instruction_Memory_register[52] = 32'h34CE0002; //ori R14,R6,2
endmodule
Appendix B : Output
B.1 Initial information from vcs.log
Chronologic VCS
Version G-2012.09 Wed Oct 11 16:50:47 2012
Copyright 1991-2012 by Synopsys Inc.
ALL RIGHTS RESERVED
This program is proprietary and confidential information of Synopsys Inc.
and may be used and disclosed only as authorized in a license agreement
controlling such use and disclosure.
Warning-[DFLT_OPT] Default option found
Option -ntb_opts dtm is already default. Future releases of VCS may
not
accept -ntb_opts dtm.
Warning-[OBSLFLGS] Obsolete flag(s) used
The flag(s) -no_error is(are) obsolete and will not be supported
courtesy of
this release. Please use -error=no<ID> instead.
Please contact vcs_support@synopsys.com or call VCS Customer Support at
1-800-VERILOG for any questions about obsolete switches.
Warning-[LCA_FEATURES_ENABLED] Usage warning
LCA features enabled by -lca argument on the command line. For more
information regarding list of LCA features please refer to Chapter LCA
features in the VCS/VCS-MX Release Notes
Parsing design file ../../arc_6300/TESTBENCH/MIPS/tb_cpu.v
Parsing design file ../RTL/MIPS/Control_unit.v
Parsing design file ../RTL/MIPS/control_ALU.v
Parsing design file ../RTL/MIPS/ALU_unit.v
Parsing design file ../RTL/MIPS/noclk_mux.v
Parsing design file ../RTL/MIPS/memory_data.v
Parsing design file ../RTL/MIPS/execution.v
Parsing design file ../RTL/MIPS/Hazard_detection.v
Parsing design file ../RTL/MIPS/decode.v
Parsing design file ../RTL/MIPS/fetch.v
Parsing design file ../RTL/MIPS/memory_instruction.v
Parsing design file ../RTL/MIPS/memory.v
Parsing design file ../RTL/MIPS/pipeline_regs.v
Parsing design file ../RTL/MIPS/Forward_data.v
Parsing design file ../RTL/MIPS/cpu_top.v
Top Level Modules:
tb_cpu
No TimeScale specified
Starting vcs inline pass...
modules and 0 UDP read.
B.2 Waveforms:
The first screenshot shows the cycle count together with all pipeline stages and the corresponding
register values. The waveform in Fig. B.2a shows that the instructions are loaded correctly and
clocked from the fetch unit to the decode unit, which confirms that the instruction memory, the
fetch unit, and the decode unit are working correctly. On every clock edge an instruction moves
from the fetch stage to the decode stage, and for each PC value the instruction loaded into the
unit matches the one written in the testbench.
The waveform in Fig. B.2b again shows that the instructions are loaded as written in the test
cases. In this waveform and in Fig. B.2c, the instruction at PC 40 exercises a special test case:
a data hazard in which the pipeline must stall so that the subsequent instruction executes
correctly. The specific test case from the testbench is listed below. It requires a stall, and it
shows that the hazard detection unit works. Note that the PC in the waveform is always one
instruction ahead of the PC in the testbench, because an instruction becomes visible in the CPU
only in the decode stage, one cycle after the fetch unit fetches it from Instruction Memory.
Instruction_Memory_register[32] = 32'h8D28000C; //lw R4,1(R0)
Instruction_Memory_register[36] = 32'hAD28000C; //add R5,R3,R4
In this test the add instruction reads R4, which is the destination (write) register of the
preceding load word instruction. Load is the longest instruction, since it passes through all five
stages of the pipeline. This is where the hazard detection policy comes into action and saves the
CPU from being frozen for a larger number of cycles.
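The load-use check described above can be sketched in Verilog as follows. This is only an illustration of the textbook stall condition, not the Hazard_detection module of this project: the module name, port names, and the small self-checking testbench are all hypothetical, and the sketch assumes that the load sits in the execute stage while the dependent instruction sits in decode.

```verilog
// Hypothetical sketch: names below are illustrative and not taken from the report.
module load_use_stall_sketch (
    input  wire       ex_mem_read, // 1 when the instruction in execute is a load
    input  wire [4:0] ex_rt,       // destination register of that load
    input  wire [4:0] id_rs,       // first source register of the instruction in decode
    input  wire [4:0] id_rt,       // second source register of the instruction in decode
    output wire       stall        // 1 requests a pipeline stall
);
    // Stall only when a load's destination matches a source of the next instruction.
    assign stall = ex_mem_read && ((ex_rt == id_rs) || (ex_rt == id_rt));
endmodule

// Minimal self-checking testbench for the sketch above.
module load_use_stall_sketch_tb;
    reg        ex_mem_read;
    reg  [4:0] ex_rt, id_rs, id_rt;
    wire       stall;

    load_use_stall_sketch dut (
        .ex_mem_read(ex_mem_read), .ex_rt(ex_rt),
        .id_rs(id_rs), .id_rt(id_rt), .stall(stall)
    );

    initial begin
        // A load writes R4 and the next instruction reads R4: a stall is expected.
        ex_mem_read = 1; ex_rt = 5'd4; id_rs = 5'd3; id_rt = 5'd4;
        #1 if (stall !== 1'b1) $display("FAIL: expected stall");
        // Same registers, but the older instruction is not a load: no stall.
        ex_mem_read = 0;
        #1 if (stall !== 1'b0) $display("FAIL: unexpected stall");
        $finish;
    end
endmodule
```

The condition is purely combinational, so the stall request appears in the same cycle in which the dependent instruction reaches decode. A fuller version might also exclude register 0, which is never written in MIPS.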
Fig. B.2d shows another stall, this time caused by a branch instruction together with a store word
instruction. Here the testbench exercises the CPU with two back-to-back cases in which a stall must
be applied before execution can proceed, and the waveform in Fig. B.2d shows that this happens
correctly. To follow the behavior of the pipeline, examine the signals Branch, Branch_zero, and
Branch_taken. The flush signal shows that the pipeline is flushed, and the corresponding decoded
instruction is empty. From cycle 19 onward the pipeline resumes work on the next instruction.
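The flush and stall behavior described above can be illustrated with a small fetch-to-decode pipeline register. This is a sketch under stated assumptions, not the register used in this design: the module and port names are hypothetical, flush is assumed to take priority over stall, and a flushed slot is assumed to hold the all-zero word, which MIPS decodes as a NOP.

```verilog
// Hypothetical sketch: names below are illustrative and not taken from the report.
module fetch_decode_reg_sketch (
    input  wire        clk,
    input  wire        stall,        // 1 holds the current instruction
    input  wire        flush,        // 1 discards the fetched instruction
    input  wire [31:0] fetch_instr,  // instruction coming from the fetch stage
    output reg  [31:0] decode_instr  // instruction presented to the decode stage
);
    // Start with an empty (NOP) slot for simulation.
    initial decode_instr = 32'h00000000;

    always @(posedge clk) begin
        if (flush)
            decode_instr <= 32'h00000000; // all-zero word is the MIPS NOP (sll $0,$0,0)
        else if (!stall)
            decode_instr <= fetch_instr;  // normal operation: pass the instruction on
        // With stall = 1 and flush = 0 the register keeps its value.
    end
endmodule
```

With a register like this, a taken branch would show up in the waveform as one cycle in which the decoded instruction is zero, followed by the instruction at the branch target.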
+incdir+
Specifies the directory that VCS searches for `include files; here it covers the current
directory structure.
+vcs+vcdpluson
Enables dumping for the entire design.
+plusarg_ignore
Tells VCS MX not to compile certain runtime options.
-assert enable_diag
Enables control of result reporting through runtime options. The runtime assert options are
enabled only if this switch is used.