Documentos de Académico
Documentos de Profesional
Documentos de Cultura
Application
Compiler
Micro-code
Operating
System
Machine
Organizatio
n
Circuit
Design
Instruct
ion Set
Archite
cture
Compact
Simple
Scalable
(64bit)
Spare
opcodes
Amenable to hw
implementa tion.
express
parallelism
T
u
ri
n
g
C
o
m
p
l
e
t
e
M
a
k
e
c
a
s
e
f
a
s
t.
t
h
e
c
o
m
m
o
n
E
a
s
y
t
o
v
4
5
erify
Cost effective
easy to compile
for
Co
nsi
ste
nt/
re
g
ul
ar
/
o
rt
h
o
g
o
n
a
l
R
e
g
u
l
a
r
i
n
s
t
r
u
c
ti
o
n
f
o
r
m
a
t.
G
o
o
d
O
S support
protection
VM
Interrupts
Ea
sy
for
the
p
r
o
g
ra
m
me
rs
Crafting an ISA
1
2
completeness
orthogonality
regularity and simplicity
compactness
ease of programming
ease of implementation
Basic Questions
1
Operat
ions
destination
operand
how
many
?
2
what
kinds
?
Opera
nds
how
many?
location
types
how
to
specify?
Instruc
tion
format
how
does
t
h
e
c
o
m
p
u
t
e
r
k
n
o
w
w
h
at
m
e
a
n
s
?
size
how
many
form
ats?
operand
s
o
p
e
r
a
t
i 0001
o 0100
n 1101
1111
y=x+b
source
Operand Location
1
Stack:
0 address add tos tos + next
Register-Memory:
2 address
add Ra B
Ra Ra + EA(B)
3 address
add Ra Rb C
Ra Rb + EA(C)
Rb
Ra
Load/Store:
3 address
Rc
add Ra Rb
load Ra
stor
e Ra
Ra
Rb
Rb +
mem[
Rc
Rb]
Rb]
mem[
Ra
A load/store
architecture has
instructi
ons
that do
either
ALU
operati
ons or
access
memory,
but never
both.
Functionality
calculate: A = X * Y - B * C
X Y B
SP
C A
+4
+8
+12
+16
m
stack
e
accumulato m
or
r
y
registerlo
Functionality
calculate: A = X * Y - B * C
X Y B
SP
C A
+4
+8
+12
+16
stack
accumula
tor
registermemory
load-
s
t
o
r
e
Pu
sh
Functionality
calculate: A = X * Y - B * C
X Y B
SP
C A
+4
+8
+12
+16
stack
Push 8(SP)
Push 12(SP)
Mult
Push 0(SP)
Push 4(SP)
Mult
Sub
Store 16(SP)
Pop
Functionality
calculate: A = X * Y - B * C
X Y B
SP
C A
+4
+8
+12
+16
stack
Push 8(SP)
Push 12(SP)
Mult
Push 0(SP)
Push 4(SP)
Mult
Sub
Store 16(SP)
Pop
Functionality
Sub
Store 16(SP)
Store 16(SP)
Pop
calculate: A = X * Y - B * C
SP
+4
+8
+12
+16
X
Y
B
C
A
stack accumulator
register-memory
Push
Push
Mult
Push
Push
Mult
8(SP)
12(SP)
0(SP)
4(SP)
Load 8(SP)
Mult 12(SP)
Store 20(SP)
Load 4(SP)
Mult 0(SP)
Sub 20(SP)
load-store
Load R1 0(SP) Load R2 4(SP)
Load
R3
8(SP)
Load R4
12(SP)
Mult R5
R1 R2
Mult R6
R3 R4
Sub R7
R5 R6
St
16(SP)
R7
Trade-offs
1
Stack
Short instructions
Lots of instructions
Simple hardware
Accumulator
See stack
Register-memory
Expressive instructions
Few instruction
Load-store
Simple
Memory Considerations
Memory Considerations
Instruction Operands
non-memory
Register direct
Add R4, R3
Immediate
Add R4, #3
Memory
Displacement
Indirect
Indexed
Direct
Mem. indirect
Autoincrement
Autodecrement
Conclusion?
Which Operations?
1
Arithmetic
add, subtract, multiply, divide
Logical
and, or, shift left, shift right
Data Transfer
load word, store word
Control flow
branch
PC-relative
1 displacement added to the program counter to get target address
Does it make sense to have more complex instructions?
-e.g., square root, mult-add, matrix multiply, cross product ...
the 3%
criteria
Branch Decisions
1
conditional branch
jump
procedure call
procedure return
Branch Conditions
Condition Codes
Processor status bits are set as a side-effect of executed
instructions or explicitly by a compare and/or test
instruction Ex: sub r1, r2, r3
bz label
Condition Register
Ex: cmp r1, r2, r3
bgt r1, label
Displacement Size
Conclusio
ns?
Compiler/ISA Interaction
1
Compiler wants
System/Compiler/ISA Issues
Parameter passing
Accessing data
Stack
global
Addressing modes
immediate (8-16 bits)
displacement (12-16 bits)
register indirect
MIPS64 instruction
set architecture
1
2
3
MIPS instructions
Read on your own and become comfortable speaking MIPS
LD
R1, 1000(R2)
DADDU
DADDI
JALR
JR
BEQZ
R1, R2, R3
RA gets PC + 4; Jump to R2
Jump to R3
If R5 == 0, jump to label (label is
within displacement)
Each
instruction
word contains
multiple
operations
1
The semantics of
the ISA say that
they happen in
parallel
VLIW Example
$s3$s1
= 1; $s2 = 1,
=4
add $s2, $s1, $s3
$s3sub $s5, $s2,
R
I = 5 Sub sees 5 s2
S VLIW
C instruction
word :
$s3$s1
= 1; $s2 = 1,
=4
c <add $s2, $s1,
$s3; sub $s5, $s2,
o $s3>
d sub sees s1
e = 1.
27
VLIWs History
1
28
Trace Scheduling
1
2
3
Trace Scheduling
3
1
Loops
4
5
Unroll them.
1
2
Function calls
32
VLIW Today
1
2
3
33
Beyond VLIW
1
2
3
4
1
2
34
Key Points
1