Design Challenges
A larger register file is needed to hold multiple
contexts.
Clock cycle time must not be affected, especially in:
Instruction issue: more candidate instructions need
to be considered
Instruction completion: choosing which instructions
to commit may be challenging
Observation
There are two main observations:
The potential performance overhead due to
multithreading is small
The efficiency of current superscalar processors is low, leaving
room for significant improvement
Transient Faults
The future is worse:
smaller feature sizes, reduced voltage, higher transistor
counts, reduced noise margins
[Figure: redundant threads with input replication and output comparison; labels: R1 (R2), thread2, Instruction Scheduler, Functional Units]
+ Lower cost
avoids complete replication
CRT borrows the detection scheme from the SMT-based Simultaneously and
Redundantly Threaded (SRT) processors and applies the scheme to CMPs.
It replicates the program into two communicating threads (leading and trailing threads)
and compares the results of the two.
CRT's leading thread commits stores only after checking, so that memory is guaranteed to
be correct.
CRT compares only stores and uncached loads, but not register values, of the two threads.
CRT uses a store buffer (StB) in which the leading thread places its committed store values
and addresses. The store values and addresses of the trailing thread are compared against
the StB entries to determine whether a fault has occurred. (Once checked, a store proceeds to
the cache hierarchy.)
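The store check described above can be sketched in a few lines of Python. This is only an illustrative model, not CRT's actual hardware design: the class name StoreBuffer, the FaultDetected exception, and the assumption that both threads issue their stores in the same program order (so a FIFO suffices) are all hypothetical choices made for the example.

```python
from collections import deque


class FaultDetected(Exception):
    """Raised when the leading and trailing stores disagree."""


class StoreBuffer:
    """Hypothetical model of the StB: a FIFO of (address, value) pairs."""

    def __init__(self):
        self.pending = deque()  # stores committed by the leading thread
        self.memory = {}        # stands in for the cache hierarchy

    def leading_store(self, address, value):
        # The leading thread's store waits here; memory is not touched yet.
        self.pending.append((address, value))

    def trailing_store(self, address, value):
        # Compare the trailing thread's store with the oldest pending entry
        # (assumes both threads store in the same program order).
        if not self.pending:
            raise FaultDetected("trailing store without a leading store")
        expected = self.pending.popleft()
        if expected != (address, value):
            raise FaultDetected(
                f"leading={expected}, trailing={(address, value)}")
        # Only a checked store is released to memory.
        self.memory[address] = value


stb = StoreBuffer()
stb.leading_store(0x10, 7)    # buffered, not yet visible in memory
stb.trailing_store(0x10, 7)   # matches, so the store is released
```

Register values never enter the comparison in this sketch, which mirrors the point that only stores (and uncached loads, not modeled here) are checked.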
Performance Evaluation
Forwarding: IP Forward
Authentication: MD5
Encryption: 3DES
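Architectures compared: SS (superscalar), FGMT (fine-grained multithreading), CMP (chip multiprocessor), SMT (simultaneous multithreading)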
Workloads have little ILP
Need to exploit packet-level parallelism
CMP and SMT do just that
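As a software analogy for packet-level parallelism, the sketch below hashes independent packets on a pool of worker threads. It is an assumption-laden illustration only: the function names, the packet contents, and the worker count are made up, and Python threads stand in for the hardware contexts that a CMP or SMT processor would provide.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def authenticate(packet: bytes) -> str:
    # MD5 digest of one packet; it depends on no other packet.
    return hashlib.md5(packet).hexdigest()


def process_packets(packets, workers=4):
    # Each packet goes to whichever worker is free; map() returns the
    # results in the same order as the input packets.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(authenticate, packets))


# Hypothetical packet payloads, used only for the example.
packets = [f"packet-{i}".encode() for i in range(8)]
digests = process_packets(packets)
```

Because no packet's digest depends on another's, the work divides cleanly across contexts even though each individual digest computation offers little ILP.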
I-Cache:
Instruction bandwidth
I-Cache misses: Since instructions are fetched from many different
contexts, instruction locality is degraded and the I-cache miss rate rises.
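The effect on the I-cache can be illustrated with a toy simulation. Everything here is an assumption made for the example: a direct-mapped cache of 64 lines, threads that each loop over 48 lines of code in separate memory regions, and round-robin fetch among contexts. It is a sketch of the trend, not a model of any real processor.

```python
def miss_rate(num_threads, cache_lines=64, loop_lines=48, fetches=10_000):
    """Miss rate of a direct-mapped I-cache shared by round-robin threads."""
    cache = [None] * cache_lines   # line address currently held in each slot
    position = [0] * num_threads   # each thread's offset within its loop
    misses = 0
    for i in range(fetches):
        t = i % num_threads        # round-robin fetch among contexts
        # Each thread runs its own loop in a separate region of memory.
        line_addr = t * 1000 + position[t]
        position[t] = (position[t] + 1) % loop_lines
        index = line_addr % cache_lines
        if cache[index] != line_addr:
            misses += 1
            cache[index] = line_addr
    return misses / fetches


single = miss_rate(1)   # the loop fits in the cache: only cold misses
shared = miss_rate(4)   # four loops compete for the same 64 lines
```

With one context the loop fits in the cache and misses occur only on the first pass; with four contexts the loops evict each other's lines, so the miss rate is higher.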