
A Selective Trigger Scan Architecture

for VLSI Testing


Mohammad Hosseinabady, Shervin Sharifi, Fabrizio Lombardi, Senior Member, IEEE, and
Zainalabedin Navabi, Senior Member, IEEE
Abstract: Time, power, and data volume are among the most challenging issues in testing a System-on-Chip (SoC) and have not been fully resolved, even when a scan-based technique is employed. A novel architecture, referred to as the Selective Trigger Scan architecture, is introduced in this paper to address these issues. This architecture reduces switching activity in the circuit-under-test (CUT) and increases the clock frequency of the scanning process. An auxiliary chain is utilized to avoid applying the large number of scan-in transitions to the CUT; it enables retaining the currently applied test vector and applying only the necessary changes to it. The auxiliary chain shifts in the difference between consecutive test vectors, and only the required transitions (referred to as trigger data) are applied to the CUT. Power requirements are substantially reduced; moreover, DFT penalties are reduced because no additional multiplexer is utilized along the scan path. Data reformatting is applied in order to make the proposed architecture amenable to data compression, thus permitting a further reduction in test time. The architecture also permits delay fault testing. Using ISCAS 85 and 89 benchmark circuits, the effectiveness of this architecture for improving SoC test measures (such as power, time, and data volume) is experimentally evaluated and confirmed.
Index Terms: Scan test, test data volume, test application time, test power, test compression, delay testing.

1 INTRODUCTION
Intellectual property (IP) cores are commonly used for
designing a System-on-Chip (SoC). Although IP cores can
help to reduce the design cycle time, they still pose many
challenges when testing is considered. The precomputed
test patterns that are provided by core vendors must be
applied to each core within the power constraints of the
whole SoC. As a system integrator may use a core in
different platforms with diverse test mechanisms (whether
for on-chip or off-chip implementation), the test mechanism
of the core must take into account issues related to data
volume, application time, and power consumption during
test. Moreover, other models (such as for delay faults) must
be considered to improve the overall test quality. A
comprehensive solution is very difficult; such a solution
requires major changes in different parts of the design as
provided by the IP providers. Power and test data volume
are especially challenging:
Test power. The increased use of portable computing
and wireless communication (together with growing
density and higher operational frequencies) has made
power dissipation an important issue in both the design and
test of VLSI circuits. Power consumption in CMOS circuits
can be static or dynamic. Dynamic power consists of
switching power and short circuit power. Switching power
results from the activity of a circuit in changing its states
due to the charging and discharging of the effective
capacitive loads. Dynamic power significantly contributes
to total power dissipation. Switching power dissipation is
given by $P_{Dynamic} = C V_{cc}^2 f$, where $C$ is the capacitance of
the switching nodes, $V_{cc}$ is the supply voltage, and $f$ is the
effective operating frequency. As the activity of the test
input signal is significantly higher than during normal
operation, power dissipation can be substantially higher
while testing takes place [1], [2]. However, power con-
straints are usually defined with respect to the normal
operational mode. Currently, design techniques are em-
ployed to reduce power dissipation during the normal
mode of operation [2]. The power constraints that are
usually considered during design are much lower than the
power consumed during testing [3], thus causing severe
reliability problems. Furthermore, the current trend toward
VLSI circuit miniaturization prevents the use of dissipating
devices to remove excessive heat generated during test [2].
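As a quick numeric illustration of the switching power relation above (all values here are hypothetical and chosen only to exercise the formula, not taken from the paper):

```python
# Hypothetical numbers, only to illustrate P_Dynamic = C * Vcc^2 * f.
C   = 50e-12    # effective switched capacitance: 50 pF (assumed)
Vcc = 1.2       # supply voltage in volts (assumed)
f   = 100e6     # effective switching frequency: 100 MHz (assumed)
P_dynamic = C * Vcc ** 2 * f
print(f"{P_dynamic * 1e3:.2f} mW")  # 7.20 mW
```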
Test time and test data volume. Application time is one of
the sources of complexity when testing IP cores as commonly
found in SoCs. Random or deterministic vectors are either
generated by on-chip hardware for built-in self-test (BIST) or
provided by an external tester such as automatic test
equipment (ATE) for manufacturing test. For testing logic
blocks, the feasibility of on-chip test generation is mainly
restricted to (pseudo)random methods. In general, the area
overhead required by dedicated (on-chip deterministic)
vector generation is rather prohibitive for manufacturing test.
. M. Hosseinabady and Z. Navabi are with the Electrical and Computer
Engineering Department, Faculty of Engineering, University of Tehran,
North Kargar Ave., Tehran, Iran, 14395/894.
E-mail: mohammad@cad.ece.ut.ac.ir, h_314@yahoo.com,
navabi@ece.neu.edu.
. S. Sharifi is with the Computer Science and Engineering Department,
University of California, San Diego, 9500 Gilman Drive, La Jolla, CA
92093-0404. E-mail: shervin@ucsd.edu.
. F. Lombardi is with the Department of Electrical and Computer
Engineering, Northeastern University, 424 Dana, Boston, MA 02115.
E-mail: lombardi@ece.neu.edu.
Manuscript received 25 Feb. 2006; revised 15 Feb. 2007; accepted 21 May
2007; published online 28 Aug. 2007.
Recommended for acceptance by C. Metra.
For information on obtaining reprints of this article, please send e-mail to:
tc@computer.org, and reference IEEECS Log Number TC-0075-0206.
Digital Object Identifier no. 10.1109/TC.2007.70806.
Random vectors are usually generated using linear feedback
shift registers (LFSR), so no storage is required. Response
evaluation is performed using a signature analyzer that
compacts test responses into a signature and compares it with
the signature of an error-free reference design. LFSRs
introduce a small area overhead; however, testing by random
vectors requires a long application time due to its modest
quality. Hybrid schemes are commonly used to reduce test
time by reseeding the LFSRs. Compared with random testing,
deterministic vectors are designed to detect a set of target
faults. This significantly reduces the number of vectors
required for large designs. Due to the high overhead incurred
in on-chip generation, deterministic vectors are usually
provided from an external source (such as an ATE). The
application of external data (as for manufacturing test)
involves downloading vectors from a storage device to a
user interface workstation usually attached to the ATE.
Compression has been investigated for resolving some of the
problems associated with SoC testing. Lossless compression
is the process of encoding test vectors such that the original
data can be uniquely reconstructed by a decoder. A basic
feature of lossless compression is to decompose an input data
set into a sequence of events, then to encode the events using
as few bits as possible. Reordering test vectors in combina-
tional or full-scan circuits can increase similarity (and
redundancy) of data in the vectors and consequently increase
the compression rate. The relatively small I/O pin count is
one of the main causes of speed degradation for data transfer
across a chip. In general, for a manufacturing test, determi-
nistic solutions face many challenges for applicability to SoC;
a high storage volume, long application time through the
serial paths, and vector sets that are generated by third parties
(that is, from the IP core providers) with limited information
are some of the unresolved issues associated with an efficient
test of these devices.
1.1 Previous Works
Recent years have seen the development of many techni-
ques for overcoming the aforementioned difficulties in VLSI
testing. In this section, some of these works are briefly
reviewed.
Power reduction. Circuits are often designed to
operate in two modes: normal and test modes. The test
mode usually dissipates more power than the normal
mode, especially if a scan mechanism is employed.
During the data scan-in process, the difference between
two adjacent bits moves through the scan path due to the
shift operation; many floating transitions are then applied
to the CUT. Fig. 1 shows this process for shifting the vector $a_k = (a_{k,0}, a_{k,1}, a_{k,2}, a_{k,3}, a_{k,4})$ when the vector $a_{k-1}$ is in the scan chain; (1) can be used to determine the number of shift-in transitions:

$$
S_{k-1,k} = (a_{k-1,0} \oplus a_{k-1,1}) + 2(a_{k-1,1} \oplus a_{k-1,2}) + 3(a_{k-1,2} \oplus a_{k-1,3})
+ 4(a_{k-1,3} \oplus a_{k-1,4}) + 5(a_{k-1,4} \oplus a_{k,0}) + 4(a_{k,0} \oplus a_{k,1})
+ 3(a_{k,1} \oplus a_{k,2}) + 2(a_{k,2} \oplus a_{k,3}) + (a_{k,3} \oplus a_{k,4}). \quad (1)
$$
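To make the position weighting in (1) concrete, the following sketch (not from the paper) counts the weighted shift-in transitions for an arbitrary chain length n; for n = 5 it reproduces (1) term by term. The assumption that lower bit indices sit nearer the scan output follows the weighting pattern visible in (1).

```python
def shift_in_transitions(prev, new):
    """Weighted transition count of (1), generalized to an n-bit chain:
    each differing pair of adjacent stream bits is an edge that toggles
    `weight` scan-cell outputs while it travels through the chain."""
    n = len(prev)
    assert len(new) == n
    # Serial stream seen at the chain: the old vector leaves while
    # the new one enters behind it.
    stream = list(prev) + list(new)
    total = 0
    for j in range(2 * n - 1):
        weight = j + 1 if j < n else 2 * n - 1 - j
        total += weight * (stream[j] ^ stream[j + 1])
    return total

# Example for a 5-bit chain, matching the indexing of (1).
print(shift_in_transitions([0, 1, 1, 0, 1], [1, 0, 0, 1, 1]))  # -> 14
```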
Many techniques in the technical literature have been
proposed to reduce the number of these transitions for
power dissipation management. These techniques can be
categorized as follows:
. transition techniques to reduce the difference be-
tween two consecutive test vectors,
. transition techniques to reduce the effect of the
difference between two consecutive bits in the scan
chain,
. transition techniques by partitioning to reduce the
effective length of the scan chains,
. techniques to block transitions in a circuit,
. scan reordering techniques, and
. integrated techniques that use two or more of the
aforementioned techniques.
Sharifi et al. [4], [5] propose a technique for reducing the
power dissipation in scan-based structures used for testing
digital circuits. This method reduces both the switching
activity and static power. They propose a scan structure
along with an algorithm to find the optimum configuration
of the structure that results in minimum dynamic and static
power consumptions during the scan mode.
Baik et al. [6] have proposed the use of the so-called
Random Access Scan (RAS) for the simultaneous reduction
in test power, data volume, and application time. Two
techniques, test vector reordering and Hamming distance
reduction, have been proposed to reduce the total number
of RAS operations. An MISR has been used for output
response compaction. Dabholkar et al. [7] have proposed a
post-ATPG process to reduce power dissipation for full-scan
and combinational circuits; in this technique, test vector and
scan latch orderings are used to reduce power dissipation
during test application. In [8], an ATPG technique that
reduces switching activity (during testing of sequential
circuits with full-scan) has been presented. The tests
generated by this ATPG can be used for at-speed testing
of chips and bare dies with no risk of damage due to
excessive heat. Sinanoglu et al. [9] have presented a
methodology in which the scan chain is modified by
inserting logic gates between the scan cells, thus reducing
the number of transitions. This methodology proposes the
introduction of gate delays to the scan path. Nicolici and
Al-Hashimi [10], Saxena et al. [11], and Rosinger et al. [12]
have proposed different approaches to divide the scan
chain into multiple length-balanced partitions and to enable
only one partition at each test clock; therefore, instead of
Fig. 1. Scan cell values during shift operation.
being all active at the same time, only a fraction of the scan
cells is active in each test clock cycle. Bhunia et al. [13] have
proposed a solution to reduce test power by inserting
blocking logic into the stimulus path of the scan flip-flops to
prevent propagation of the scan ripple effect to logic gates.
Ghosh et al. [34] describe a technique for minimizing
power dissipation during scan testing. For a given set of test
vectors, the (local) optimal reordering of the scan cells is
found to minimize a score function; the selected function is
a linear combination of power and area overhead. The scan
cell reordering technique in [35] uses a heuristic algorithm
to modify the order of the scan cells within the chain to
reduce switching activity.
Chen et al. [14] have presented an integrated approach to
reduce both power consumption and test application time.
This method is made possible by combining a scan
architecture (referred to as the multiple clock disabling
architecture) and several other techniques, including scan
cell and vector reordering.
Test time. A reduction in test time can be accomplished
using data compression and scan tree techniques.
The advantages of compression are twofold: It reduces
storage requirements and decreases test application time (a
smaller volume of test data must be transferred to the CUT).
Run-length coding, Huffman coding, Lempel-Ziv [18], and
arithmetic coding are some of the compression techniques
found in the technical literature. In run-length coding, a
sequence of symbols is encoded into two elements (the
repeating symbol and the length of the sequence). Run-
length coding is efficient for data with long sequences of
equal symbols. Huffman coding is more sophisticated than
run-length coding. It uses a table of frequencies of
occurrence to build up an optimal representation of each
character as a binary string. Compression is accomplished
by allocating short codewords to frequent characters and
long codewords to infrequent characters. Jas and Touba [15]
have presented a method for using embedded processors as
an aid in the execution of the test process. The tester loads a
program along with compressed data into the on-chip
memory; the processor executes a program that decom-
presses the data and applies it to the scan chains in the other
components of the SoC. The application of this method is
restricted by the assumption of availability of appropriate
functional units within the SoC architecture. Jas and Touba
[16] have introduced a scheme for compression/decom-
pression of test data using cyclic scan chains. It captures
repeated patterns in a given set of vectors by applying a
run-length coding technique to the difference vector
between consecutive vector pairs. Adjacent repeated data
result in long runs of zeros in the difference vector; thus,
they are effectively encoded by run-length coding. How-
ever, cyclic and noncontiguous repeating patterns cannot be
fully used in [16]. Compressed scan data is serially
transferred from a tester to a chip at the clock speed of
the tester and is decompressed by an internal decoder. Due
to on-chip data expansion, [16] employs a faster internal
clock to avoid overrunning data. Golomb and the related
Rice [19] coding methods are based on data with exponen-
tially distributed run lengths. The Golomb and Rice
methods consist of a family of parameterized codes that
can be estimated adaptively [20], thus giving a good
compression performance. Chandra and Chakrabarty [21]
have introduced the application of Golomb coding for test
data compression. A high compression rate, analytically
predictable compression results, and a low-cost scalable on-
chip decoder [24] are the major advantages of Golomb
coding [21], [24]. Although the compression results in [21]
are promising, the exponential distribution of 0 runs in the
data is not a guaranteed characteristic in practical VLSI test
data for industrial designs. Lingappan et al. [22] have
investigated the use of heterogeneous and multilevel
compression schemes and demonstrated that substantial
reductions in test volume are accomplished compared with
current test compression techniques. They have also
investigated how these techniques can be efficiently
implemented by exploiting functional components that are
already present in today's SoCs.
Scan tree design is often used to address different issues
such as test time [32], [33]. In a scan tree design, the chain is
divided into multiple scan chains and a cell may drive
multiple scan cells. This technique is suitable for test time
reduction. In this scheme, the same test data is stored in
different scan chains; however, this method suffers from
reduced controllability in the design. The effectiveness of a
scan tree architecture depends on the correlation between
test data in the structure of the scan tree. Bonhomme et al.
[32] have proposed a scan tree architecture for reducing test
application time. This technique is based on a dynamic
reconfiguration mode, allowing a reduction in the depen-
dence between a test set and the final scan tree architecture.
This procedure does not require additional test control
inputs and an MISR is used for circuit response compaction.
Miyase et al. [33] have proposed a scan tree method for
multiple scan inputs; in a multiple scan design, the number
of scan trees is equal to the number of scan inputs. As each
scan input drives a scan tree, then test data volume and
application time are dominated by the scan tree of
maximum height. They have also proposed a method for
test compression in multiple scan designs.
1.2 Our Contributions
This work proposes a novel scan architecture that is
referred to as the Selective Trigger Scan Architecture
(STSA). This scan architecture uses a triggering (enabling)
chain in addition to the data registers. Furthermore,
triggering chain hardware is designed to take advantage
of similar adjacent data for test compression. Instead of
shifting new serial data into the data registers, the
triggering chain determines where a data flip-flop must toggle
and where it must retain its old value. Retaining data causes a small number
of transitions at the data register outputs and, hence, low power
dissipation. Along with test reformatting techniques, this
architecture can reduce test time and power. It can also
reduce data volume by enabling the application of
compression algorithms on its reformatted data. This
structure can be used as a core or chip-level design-for-
test (DFT) technique. In addition, it is applicable to delay
fault testing. When applied at the core level, substantial
improvements in power and test time can be achieved by
reformatting the precomputed vectors rather than starting
with a new set of tests.
The rest of this paper is organized as follows: Section 2
describes the proposed scan architecture. Test data refor-
matting (as required for generating the vectors for the
proposed architecture) is explained in Section 3. Section 4
describes the algorithm used for compressing the test
vectors. Section 5 describes the test time reduction of the
proposed architecture and the application of this architec-
ture to delay fault testing. Sections 6 and 7 report
experimental results and conclusions, respectively.
2 THE PROPOSED ARCHITECTURE
We start the explanation of our proposed architecture with a simple example. Assume that $V_1$ in Fig. 2a is the test vector currently in a scan chain and $V_2$ is the next test vector that must be shifted in. Comparing $V_1$ and $V_2$ (Fig. 2a), they differ in only three bit positions; these differences are called necessary transitions. If we were to use a standard scan chain and shift $V_2$ into the scan chain in eight test clocks, each shift would produce the transitions shown in Fig. 2b. For example, shifting the rightmost 1 of $V_2$ into the scan chain causes five transitions in the eight scan flip-flops. Altogether, shifting $V_2$ causes 32 transitions; these are called unnecessary transitions. On the other hand, loading $V_2$ in parallel directly into our architecture eliminates unnecessary transitions at the inputs of the CUT.
Hence, our scan architecture should eliminate the
unnecessary transitions. In addition, the following features
should be considered for the proposed scan architecture
and DFT method:
. A scan architecture should not add extra inputs
compared to a conventional scan approach.
. A DFT approach must add no delay to the normal
operation of the circuit.
The proposed architecture, shown in Fig. 3, serves two
purposes. One is to reduce the activity at the data outputs
and the second is to facilitate test data compression. As
shown in Fig. 3, this architecture has data registers that
contain the test data applied to the CUT, a triggering chain
where the test data is shifted in, and a triggering logic
circuit with an enabling AND gate that determines how the test
data should be decoded to trigger the data registers.
The triggering chain is for reducing the activity at the
data register outputs. For this purpose, instead of shifting
test vectors into the data registers, triggering data is
obtained by formatting test vectors and shifted into the
triggering chain. For example, if the triggering logic has an
identity function, the current data register is 00101110, and
the new test vector is 01100111, then the triggering chain
must contain 01001001, that is, the bitwise difference of the
two vectors. This architecture also blocks changes in the test
data from being directly applied to the CUT.
The triggering logic comes into play for compression of
test data. For test compression, instead of shifting the
difference of consecutive test vectors into the triggering
chain, the 0 → 1 and 1 → 0 transitions of the difference are
shifted. The triggering logic is implemented by an XOR gate.
Fig. 4 shows the structure of a scan cell of the proposed scan
chain. In this case, for the two aforementioned test vectors,
01101101 should be shifted into the triggering chain. This is
formed by the XOR of adjacent bits of the difference vector
01001001, starting from the right-hand side.
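A small sketch of this reformatting step may help; it reproduces the worked example above (difference 01001001, trigger data 01101101). The left-to-right bit ordering and the implicit 0 beyond the vector boundary are assumptions made here so that the sketch matches that example.

```python
def difference_vector(current, nxt):
    """Bitwise difference of the vector currently in the DR chain and
    the next test vector; a 1 marks a DR flip-flop that must toggle."""
    return [a ^ b for a, b in zip(current, nxt)]

def trigger_scan_data(diff):
    """Trigger data for the TR chain: the XOR of adjacent bits of the
    difference vector (an implicit 0 is assumed past the boundary)."""
    out, prev = [], 0
    for d in diff:
        out.append(d ^ prev)
        prev = d
    return out

v_cur = [0, 0, 1, 0, 1, 1, 1, 0]   # 00101110, current DR contents
v_new = [0, 1, 1, 0, 0, 1, 1, 1]   # 01100111, next test vector
print(trigger_scan_data(difference_vector(v_cur, v_new)))
# [0, 1, 1, 0, 1, 1, 0, 1] = 01101101, as in the text
```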
As shown in Fig. 4, the DR flip-flop is the main storage
cell and holds the bit of the vector that must be applied to the
CUT. The TR chain provides the data required for selective
triggering. Testing starts by resetting the DR chain. The TR
chain cell has three modes of operation: Shift, Trigger, and
Normal. The Shift and Trigger modes are used for testing,
Fig. 2. (a) Necessary transitions between two consecutive test vectors
(three transitions). (b) Total number of unnecessary transitions in
conventional scan architecture (32 transitions).
Fig. 3. Proposed scan-chain architecture.
Fig. 4. Scan cell structure of the proposed scan architecture.
while the Normal mode is used for normal operation of the
circuit. Table 1 shows the cell configuration in the different
operational modes.
. In the Shift mode, the Enable signal is low (inactive)
and the DR flip-flops remain unchanged. Therefore,
the required data can be shifted in the TR chain with
no effect on the contents of the DR flip-flops.
. In the Trigger mode, the Enable signal is high (active)
and the multiplexer selects the input connected to
the Q output of DR flip-flops. If the XOR output is 0,
the DR flip-flop value will not change. If the XOR
output is 1, the value of the DR flip-flop is inverted.
Therefore, in the Trigger mode, a 1 at the XOR output
of a cell causes an inversion of the value stored in its
DR flip-flop. This is accomplished by storing
different values in the TR flip-flops of this cell and
its neighboring cell (to the left).
. In the normal mode, the TR chain is loaded with a
sequence of alternating 1s and 0s (1010 . . . ). This
activates the outputs of all XORs; by selecting the
normal input of the multiplexer and setting the
Enable signal to the desired value, each cell performs
its normal operation. The loading process of the TR
chain with 1010. . . is performed only once, that is,
when the test process is completed and the circuit
starts its normal operation. During the test, each new
vector is obtained through a vector update cycle.
The timing diagram for a test vector update cycle is
shown in Fig. 5. In the Shift mode, the trigger data is shifted
into the TR chain; this requires n scan clocks. After n clocks
in the Shift mode, the cell enters the Trigger mode for a
single clock. In this mode (based on the TR chain data),
some of the DR flip-flops invert their values to obtain the
new vector. A test vector is reconstructed in a single test
update cycle (Fig. 6 shows this process).
As the Enable signal can be generated internally, no
additional pin is required. This signal can be easily
generated through a pulse after receiving n clock cycles.
Moreover, the Select signal of the MUX does not require an
additional pin; the same signal in conventional scan chains
can be used to determine the Test and Normal modes. In the
Test mode, the Select signal selects the 1 input of the MUX
(from the Q output), while, in the normal mode, it selects
the other input (which directs the Normal Input to the input
of the DR flip-flop). Therefore, the proposed STSA requires
no additional test pin compared to conventional scan
structures. The output response is not captured in the scan
chain; however, there are many techniques available in the
current literature [30], [31] that can be used at no loss of
coverage depending on the application and its constraints.
Significant improvements to SoC testing are achieved
using the proposed architecture; these improvements are
described in more detail here.
Power reduction. One of the main features of the
proposed architecture is to prevent unnecessary transitions
from affecting the CUT by altering a conventional scan
chain. As the required transitions are only a small portion of
the total transitions made during scanning, a reduction in
transitions will affect power consumption. This is accom-
plished using the so-called trigger data; in the proposed
architecture, trigger data is transferred through the TR chain,
whose transitions have a less significant effect on
power dissipation. A transition in the TR chain will affect
only one XOR gate (that is, a low-power consumption gate,
as described in Section 3). Cells in the TR chain change only
in the Shift mode (in this mode, the Enable signal is 0).
Therefore, transitions in the TR chain can only affect the
XOR gate and are masked by the 0 input of the AND gate.
The so-called TR transitions have substantially less impact
than the transitions in the DR chain. The proposed
architecture also reduces the number of TR chain transitions
by using XORs. The use of XORs makes it possible to send
the same difference information with a smaller number of
transitions in the TR flip-flops. Without the XOR gates,
enabling a DR flip-flop at a specific position would require
shifting a 1 to that position in the TR chain, thus resulting in
two transitions (a high-to-low and a low-to-high transition)
per shift passing through the TR chain. When using XORs,
only the difference between two adjacent TR flip-flops at a
TABLE 1
Different Operational Modes of the New Cell
Fig. 5. Timing diagram of a test vector update cycle.
Fig. 6. Test session state diagram.
given position is required, that is, only one transition must
pass through the TR chain to reach the specified position.
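A tiny sketch of this saving follows (the enabled position and the framing are hypothetical, chosen only to illustrate the argument):

```python
def serial_edges(stream):
    """Edges that must travel through the TR chain when `stream` is
    shifted in serially (transitions between consecutive stream bits)."""
    return sum(a != b for a, b in zip(stream, stream[1:]))

# Enabling the DR cell at position 3 of 8: without the XOR gates, a
# one-hot pattern (two edges) must be shifted to that position; with
# them, a step pattern that differs across a single cell boundary
# (one edge) suffices.
one_hot = [0, 0, 0, 1, 0, 0, 0, 0]
step    = [0, 0, 0, 1, 1, 1, 1, 1]
print(serial_edges(one_hot), serial_edges(step))  # 2 1
```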
Although it may seem that adding XORs could result in a
significant area overhead, these gates can be efficiently
implemented in terms of area. As most flip-flops provide
both the Q output and its complement Q', an XOR function can
be provided by using only two transistors in a pass-
transistor-based structure, as shown in Fig. 7a. This
implementation of the XOR not only has a small impact
on area, but its effects on power consumption are negligible
due to its pass-transistor-based structure. As shown in
Fig. 7, the logic required for providing the Clk Enable signal
to each DR flip-flop requires only six transistors to
implement the functionality of both the XOR and AND
gates in Fig. 4. Strollo et al. [23] have introduced new
designs for gating the clock in D-type flip-flops to reduce
power consumption. These D-type flip-flops are best suited
for applications in which the data switching activity is low,
such as with the TR chain. These D-type flip-flops use a
comparator to determine whether the new data is identical
to the previous state of the flip-flop. If not, the comparator
output activates the clock and the new data is clocked in.
Else, the clock of the D-type flip-flop is disabled by the
comparator. The comparator is an XOR gate whose
inputs are connected to the flip-flop input and output lines
(Fig. 8c). As mentioned previously, the switching activity in
the TR chain is low, so this D-type flip-flop can be used to
reduce the power consumption when scanning the data in
the TR chain. As shown in Fig. 8d, the XOR gate compares
the input of the TR cell with its output. With the D-type flip-
flop in Fig. 8b in the TR cells, the XOR gate can be removed
by using the output of the built-in XOR of the D-type flip-
flop. The inverter used for inverting the D input is not
required because the D input of each cell is the Q output of
the previous cell (whose inverted value is also available).
Fig. 9 shows the HSPICE simulation of the proposed scan
chain with three scan cells in Fig. 8a. This simulation is for
180 nm technology and a supply voltage of 1.2 V. Q1, Q2, and
Q3 are the outputs of the data registers; SI is the serial input
of the TR chain. SO1, SO2, and SO are the outputs of the
three triggering registers. In this simulation, the data
registers have the value of 000 and the data shifted into
the triggering registers is 110, which corresponds to the
010 test vector. After shifting the data to the TR chain, the
new test vector is triggered in the DR register when the
Enable signal is active (that is, Trigger mode).
Data compression. Data compression techniques usually
compress the difference between successive vectors and
then use an on-chip hardware to regenerate the original
vectors from the difference information [24], [25]. Both [24]
and [25] use cyclical scan registers (CSRs) to reconstruct the
original vectors from the difference vectors. A CSR consists
of an XOR and a scan chain of length n (n is the length of the
internal scan chain of the circuit). In STSA, the additional
chain is used differently, significantly improving power
utilization and test application time, enabling delay testing,
and achieving high compression. The proposed archi-
tecture can be directly used to reconstruct the original test
vectors from the difference information; this is accom-
plished by triggering the DR cells as they must be flipped to
Fig. 7. (a) A two-transistor XOR. (b) The structure used for implementing
the Clk Enable signal of the DR flip-flops.
Fig. 8. Low-power flip-flop structure [23]. (a) A scan chain with three
cells. (b) Normal negative edge triggered flip-flop. (c) Comparator and
inverter. (d) Gating logic.
Fig. 9. SPICE simulation of the proposed scan chain.
construct the next test vector. This is made possible by
properly filling the TR chain such that the desired DR flip-
flops are enabled when entering the Trigger mode. This
process requires reformatting the difference information to
make the test data compatible with this architecture. This
capability makes the STSA suitable for algorithms that are
based on compressing difference vectors. Test data com-
pression can further reduce test time and also the memory
requirements of the tester.
3 TEST REFORMATTING AND COMPRESSION
Test vectors should be reformatted for use in the proposed
architecture and to generate the original vectors at the
inputs of the CUT (that is, the DR outputs of the scan cells).
This section describes the process of generating test data for
the proposed architecture from vectors provided for a
circuit with a traditional scan. Changes are made to the test
data to address power consumption while improving test
time and data volume. The process of reformatting test
vectors consists of the following steps.
3.1 Test Vector Reordering
In the first step, test vectors are reordered to further reduce
the number of transitions in the DR chain. In the reordering
process, similar vectors are placed next to each other to
reduce the number of transitions between consecutive test
vectors, hence also reducing the number of total transitions
resulting from the entire test set. This technique is usually
used in data compression to reduce the number of 1s in the
difference vectors. The Hamming distance is used as a
measure of transition activity to estimate power consump-
tion. A complete undirected graph is generated with test
vectors as nodes. Then, the Hamming distance between
each pair of vectors is assigned as the weight of the edge
connecting them. The solution to this problem consists of
finding a path that traverses all nodes with minimum
overall weight. This corresponds to the well-known
problem of finding the minimal tour for the traveling
salesman, which has been proven to be NP-complete [26].
Some heuristics can be used to find nearly optimal solutions
in polynomial time complexity [29]. When the initial order
of the test vectors is important (for example, for delay fault
testing or sequential circuits), reordering should carefully
consider the limitations of those particular cases.
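A minimal sketch of this reordering step is given below, using a greedy nearest-neighbor pass as a stand-in for the Lin-Kernighan heuristic that the paper actually uses:

```python
def hamming(u, v):
    """Hamming distance: the number of bit positions where u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def reorder_nearest_neighbor(vectors):
    """Greedy stand-in for the TSP-based reordering: repeatedly append
    the remaining vector closest (in Hamming distance) to the last one
    chosen. The paper uses the stronger Lin-Kernighan heuristic (LKH)
    [29]; this sketch only illustrates the objective."""
    remaining = list(vectors)
    order = [remaining.pop(0)]            # arbitrary starting vector
    while remaining:
        nearest = min(remaining, key=lambda v: hamming(order[-1], v))
        remaining.remove(nearest)
        order.append(nearest)
    return order
```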
3.2 Extracting the Difference Vectors
The second step extracts the difference vectors from the reordered vectors. The difference vectors show the positions in which two consecutive vectors differ. $D_k$ denotes the difference vector of two vectors $V_{k-1}$ and $V_k$ and can be easily calculated as

$$D_k = V_{k-1} \oplus V_k. \quad (2)$$

$D_0$ denotes the difference between the initial state of the scan chain and the first test vector $V_0$. Difference vectors show where transitions are required to produce the desired test vectors, that is, the positions of the scan chain in which the DR flip-flops should be enabled in the Trigger mode to invert their values.
3.3 Generating the TR Chain Scan Data
In the Trigger mode, the DR flip-flops at the positions in which transitions are required must be enabled; so, different values should be stored in the two TR flip-flops connected to the XOR gate for enabling the corresponding DR flip-flop. The 1s in the difference vectors should be translated into transitions in the new scan data. Let $a_{k,i}$ and $N_{k,i}$ represent the bits at the $i$th positions in the $k$th vectors of the original test set and of the test set generated for the proposed architecture, respectively; $D_{k,i}$ represents the $i$th bit of the difference vector of the $k$th and $(k-1)$th original test vectors. The first bit in the new vector, $N_{0,0}$, can be selected as 1 or 0. Other bits can be calculated as follows:

$$N_{k,i} = N_{k,i-1} \oplus D_{k,i}. \quad (3)$$

Using (2), the new vectors are calculated from the reordered vectors by the following equation:

$$N_{k,i} = N_{k,i-1} \oplus a_{k-1,i} \oplus a_{k,i}. \quad (4)$$

For conversion, the first bit of each vector $N_{k,0}$ can be
arbitrarily selected. However, this may increase the number
of transitions if the last bits of some vectors are different
from the values selected for the first bit of the next vector.
Therefore, the last converted bit of each vector is used to
start the next converted vector.
Fig. 10 shows the process of converting the original test
vectors to new vectors, as required by the proposed
architecture. Each vertical arrow indicates the conversion
of a 1 in the difference vector to a transition in the TR chain
scan data.
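The whole reformatting step of (2)-(4), including the chaining of the first bit of each vector, can be sketched as follows (the DR chain is assumed to start from the all-zero reset state, and the free initial bit is taken as 0):

```python
def reformat_test_set(vectors, initial_state=None):
    """TR-chain scan data per N[k][i] = N[k][i-1] XOR D[k][i]
    (equations (3) and (4)); each new vector starts from the last
    converted bit of the previous one, so no extra transition appears
    at vector boundaries."""
    n = len(vectors[0])
    prev_vec = list(initial_state) if initial_state is not None else [0] * n
    last_bit = 0                      # free choice for the very first bit
    new_set = []
    for vec in vectors:
        new_vec = []
        for i in range(n):
            d = prev_vec[i] ^ vec[i]  # D[k][i], equation (2)
            last_bit ^= d             # equation (3)
            new_vec.append(last_bit)
        new_set.append(new_vec)
        prev_vec = vec
    return new_set
```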
4 COMPRESSION/DECOMPRESSION
4.1 Compression
Jas and Touba [25] and Chandra and Chakrabarty [24] use
different run-length codes to compress the difference
between successive test vectors; however, both of these
techniques use CSRs to reconstruct the original test vectors
from the difference vectors and they do not consider power
consumption during testing. The proposed STSA can be
used for reconstructing test vectors from the difference
vectors. A compression technique similar to [24] is utilized
to reduce the data volume. The reformatted vectors
preserve the information in the difference vectors. Since
successive vectors are highly correlated, the new
vectors are likely to contain long runs of 0s and 1s, and they
can be compressed by a run-length encoding technique.
As a Golomb-like coding technique is used in this paper,
Golomb coding is briefly described next. The most
important parameter of Golomb coding is the group size $m$.
Fig. 10. Generating TR chain vectors from reordered test vectors.
The input string is broken into runs of the form $0^i1$. For example, 01000000011 is broken into the following runs: 01, 00000001, and 1. Each run of the form $0^i1$ is coded as $1^q0y$, where $i = qm + r$ ($0 \le r < m$) and $y$ is the binary representation of $r$ using $\log_2 m$ bits. Therefore, if $m = 4$ is chosen, then the three aforementioned runs will be coded as 001, 1011, and 000. A concatenation of the encoded runs produces the final code; so, 01000000011 is coded as 0011011000. Next, the
proposed Golomb-like compression technique is presented;
this is used for reducing the amount of data required by the
proposed scan architecture. Vectors for the STSA are likely to consist of runs of consecutive 0s and 1s, and a Golomb-like coding is employed to compress them by exploiting this characteristic. First, the vectors are broken into runs of consecutive 0s or 1s. If the length of a run is $L$ and the selected group size is $M$, the run is encoded as $1^Q0Y$, where $L - 1 = QM + R$ ($0 \le R < M$) and $Y$ is the binary representation of the remainder $R$ using $\log_2 M$ bits. The first part of the compressed code (that is, $1^Q0$) is called the prefix and the second part (that is, $Y$) is called the tail. Fig. 11 shows this process on a reformatted test vector. Assume that the vector 0000001111111110001 is constructed for the proposed scan architecture. It is then broken into four runs: six 0s, nine 1s, three 0s, and one 1. For the first run, $L = 6$; so, if $M = 4$, then $Q = 1$, $R = 1$, and the encoded run is 1010. This process is performed for the next three runs and the results are concatenated to form the final coded vector.
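A compact sketch of this encoder follows. The run splitting and the Q, R computation follow the description above; the MSB-first tail ordering is an assumption on our part (the tail bits in the paper's printed example suggest its figure may order them differently).

```python
from math import log2

def golomb_like_encode(bits, M=4):
    """Sketch of the proposed Golomb-like run-length encoder for STSA
    vectors (group size M, a power of two). Each run of identical
    symbols of length L becomes a unary prefix 1^Q 0 followed by a
    log2(M)-bit tail for R, with L - 1 = Q*M + R."""
    tail_bits = int(log2(M))
    code, i = [], 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1                        # find the end of the current run
        Q, R = divmod((j - i) - 1, M)
        code += [1] * Q + [0]             # prefix
        code += [(R >> b) & 1 for b in range(tail_bits - 1, -1, -1)]  # tail
        i = j
    return code
```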
4.2 Decompression
Decompression is accomplished with an on-chip decoder;
similarly to [24], the Golomb-like decoder can be implemented
using a $\log_2 M$-bit counter and a state machine (Fig. 12).
Fig. 12a shows the block diagram of the decoder, in which
the counter advances on activation of its inc input and
signals its terminal count to the Finite State Machine (FSM)
through the rs signal. The FSM in Fig. 12a receives the
compressed data pattern and the rs signal from the counter
and generates the decompressed data on its out output. In
addition, the v signal indicates the times at which the value
on out is valid. The FSM diagram in Fig. 12b shows the
behavior of the decompression method. States S0-S3 are used
to decode the prefix in the compressed data and States S4-S8
are used to decode the tail in the compressed data.
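In software, the behavior of the decoder FSM can be sketched as follows. The starting run polarity is an explicit input here, since the code itself carries only run lengths (an assumption consistent with the alternating-run scheme above); with it, the sketch round-trips with the encoder sketch of Section 4.1.

```python
def golomb_like_decode(code, M=4, first_symbol=0):
    """Software sketch of the decoder FSM: read a unary prefix 1^Q 0,
    then a log2(M)-bit tail R, and emit a run of length Q*M + R + 1.
    Run polarity alternates from `first_symbol`, mirroring the encoder
    sketch above (the hardware FSM of Fig. 12 must track polarity in
    an equivalent way)."""
    tail_bits = M.bit_length() - 1        # log2(M) for M a power of two
    out, i, symbol = [], 0, first_symbol
    while i < len(code):
        Q = 0
        while code[i] == 1:               # unary prefix
            Q += 1
            i += 1
        i += 1                            # skip the terminating 0
        R = 0
        for _ in range(tail_bits):        # fixed-length tail, MSB first
            R = (R << 1) | code[i]
            i += 1
        out += [symbol] * (Q * M + R + 1)
        symbol ^= 1                       # runs alternate between 0s and 1s
    return out
```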
5 TEST TIME REDUCTION AND DELAY FAULT
5.1 Low-Cost ATE
For a compression method, we can use two different clock frequencies for the scan cells and the ATE that provides the test vectors. If $f_{ATE}$ and $f_{SCAN}$ are the frequencies of the external ATE and of the internal scan cells, then $f_{ATE} < f_{SCAN}$. This way, a slow (and consequently inexpensive) ATE can be used to test high-speed circuits. These two clock frequencies should be synchronized; thus, based on [24], $f_{SCAN} = M f_{ATE}$, where $M$ is the group size in the Golomb-like compression.

If the uncompressed data (reformatted data) contains $p$ vectors and the length of the scan chain is $n$ bits, then $pn$ bits should be shifted into the circuit and, therefore, $pn$ ATE clock cycles are needed to test the circuit. If $r$ is the number of runs of consecutive 1s and 0s in the reformatted data and $n_c$ is the number of bits in the compressed data, then the number of external clock cycles needed to shift the compressed data into the scan chain is less than $T_{max} = M n_c - r(M \log_2 M + 1)$ [24]. Using the scheme described in [24], if $f'_{ATE}$ and $f_{ATE}$ are the clock frequencies of the ATE without and with compression, then

$$\frac{f'_{ATE}}{f_{ATE}} = \frac{pn}{n_c - r \log_2 M - r/M}. \quad (5)$$
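Under the reconstruction of (5) given above, the attainable ATE slow-down can be evaluated directly; this sketch simply restates the formula, with the parameters being the quantities just defined:

```python
from math import log2

def ate_speed_ratio(p, n, n_c, r, M=4):
    """Ratio f'_ATE / f_ATE from equation (5): how much slower an ATE
    can be clocked when the Golomb-like compression (group size M) is
    used, for p vectors of n bits compressed to n_c bits in r runs."""
    return (p * n) / (n_c - r * log2(M) - r / M)
```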
5.2 Low Test Application Time
The proposed architecture avoids the use of multiplexers
(MUX) in the scan path of the TR chain. The absence of
these MUXs from the critical scan path leads to a higher
scan clock frequency in the Shift mode, hence reducing the
total test time. The Trigger mode is entered when the
required data is loaded in the TR chain. After a sequence of
n clocks in the Shift mode (for a scan chain of length n), the
cell is in the Trigger mode for a single clock. When
operating in the Trigger mode (immediately after the Shift
mode), a stable MUX output is present. During the shift
cycle, the output of the DR flip-flop and the select input of
the MUX do not change. Therefore, the D input of the
DR flip-flop is stable when the circuit enters the Trigger
mode. The delay of the MUX has no impact on the clock
frequency. Therefore, the proposed architecture adds no
delay to the normal operation of the circuit compared with
a conventional scan arrangement; additional features
provided by the proposed scan architecture will be
described in more detail in the following sections.
Fig. 11. Example of proposed Golomb-like coding.
Fig. 12. Golomb-like decoder: (a) block diagram and (b) FSM.
5.3 Delay Faults
Delay faults are those faults that affect the timing of the
circuit without changing its logic operation. In a delay fault,
the traversal of one or more paths (not necessarily the
critical path) exceeds the clock period. Testing a delay fault
requires placing the appropriate transition at the input of
the path and appropriately setting the required off-path
inputs of those gates located on the path under test.
Moreover, a circuit should be clocked at-speed after the
application of each vector. The transition on the circuit
inputs requires the application of different vectors at two
consecutive clocks. Therefore, delay fault testing consists of
vector pairs. However, conventional scan cells cannot apply
two arbitrary vectors in two consecutive clock cycles. The
proposed scan cell allows for the application of arbitrary
vector pairs, hence making possible testing of delay faults.
This is accomplished as follows: Assume that the test generated for a delay fault includes the vector pair $(V_1, V_2)$. First, the DR chain is loaded with $V_1$. Then, the TR vector that is required for changing the DR vector from $V_1$ to $V_2$ is shifted into the TR chain. At the next clock edge, the DR vector $V_1$ is changed to $V_2$. The proposed scan architecture thus allows applying two arbitrary vectors to the CUT in two consecutive clocks. Using the HSPICE simulation of the scan chain in Fig. 8a, Fig. 13 shows this scenario for delay fault testing.
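Reusing the helper functions sketched in Section 2, the two-pattern sequence can be summarized as follows (the vector pair is the Section 2 example; the cycle counts follow the update cycle of Fig. 5):

```python
# Two-pattern (delay-test) application with the STSA, as a sequence
# of the modes described above. v1 -> v2 is the Section 2 example pair.
v1 = [0, 0, 1, 0, 1, 1, 1, 0]
v2 = [0, 1, 1, 0, 0, 1, 1, 1]
tr_data = trigger_scan_data(difference_vector(v1, v2))
# 1. Load the DR chain with v1 (Shift mode, then one Trigger clock).
# 2. Shift tr_data into the TR chain (n Shift-mode clocks); the DR
#    chain, and hence the CUT inputs, still hold v1.
# 3. One at-speed Trigger-mode clock flips the marked DR cells: v1 -> v2.
```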
Another method for delay fault testing is the so-called enhanced scan technique [27]; however, enhanced scan increases the path delay during normal operation. In contrast, the proposed scan architecture not only allows a reduction of data volume, time, and power, but it also does not increase the delay during normal operation when testing for delay faults.
6 EXPERIMENTAL RESULTS
The proposed architecture has been evaluated using fully
specified vectors generated for some of the ISCAS 85 and 89
benchmark circuits [28]. The full-scan versions of the
sequential benchmark circuits and predefined fully speci-
fied test vectors are utilized. The use of partially specified
test vectors would provide an evaluation with improved
results. In all tables, n and p represent the scan chain length
and the number of test vectors in the test set, respectively.
Table 2 shows the number of necessary and unnecessary
transitions for the application of the test sets to the circuit
inputs. $T_{TB}$ denotes the total number of unnecessary transitions in the scan chain. $T_{NB}$ and $T_{NA}$ denote the numbers of necessary transitions required for the application of the same test vectors to the inputs of the CUT before and after vector reordering, respectively. The last column of this table shows the efficiency of the reordering algorithm. In some cases, scan vector reordering significantly affects the number of transitions.
The test vectors are reordered to reduce the number of
transitions between vector pairs; this is accomplished by
constructing a complete graph with test vectors as nodes
and the Hamming distance between the two connected
vectors as the weight of the edge between two nodes; the
Lin-Kernighan Heuristic (LKH) algorithm [29] is then run
on this complete graph. The execution time of this
algorithm is $O(n^{2.2})$ [29].
Fig. 14 shows the runtime of the reordering algorithm for
the test vectors of the benchmark circuits.
In a conventional scan design, all transitions occurring in
the scan chain are directly applied to the CUT. However, in
the STSA, only the DR transitions are applied to the CUT.
As power consumption is proportional to the number of
transitions, a substantial reduction is achieved. As shown in
Table 3, the transitions occurring in the logic circuitry of the
CUT are reduced using the Selective Trigger architecture.
The number of transitions in the flip-flops is also less than
in a conventional scan design. In this table, $ST_T$ is the total number of transitions in a conventional scan design, while $ST_{DR}$ and $ST_{TR}$ are the numbers of transitions in the DR and TR cells of the proposed STSA when applying the same test set. $LT_T$ and $LT_{ST}$ are the numbers of transitions in the internal logic of the CUT when applying the test set using a conventional scan design and the STSA, respectively.

Fig. 13. Timing diagram of delay testing with the proposed scan architecture.

The
simulation results show that this architecture achieves a
significant power reduction for long scan chains because,
while shifting vectors in a conventional scan chain, each
transition passes through all scan cells and produces
undesired transitions at the CUT inputs. In the proposed
architecture, undesired transitions due to shifting are
eliminated by directly applying the necessary changes to
the desired location (TR flip-flops).
Table 4 shows the compression ratios obtained for some
of the ISCAS benchmark circuits by applying the proposed
Golomb-like coding to the vectors generated for the STSA.
In this table, the compression ratios are reported for
different group sizes. The percentages of improvement for
different group sizes are also reported.
HSPICE simulation of the STSA in Fig. 8a has been
performed; the propagation delays of the DR and TR cells
are computed and are shown in Table 5. Three different
technologies used in this table are listed in the first column.
The second column shows the voltage supply of the
simulation. The propagation delay of the DR and TR cells
is shown in Columns 3 and 4, respectively. The last column
shows the percentage of delay reduction when the
TR chain is used instead of the DR chain to shift data vectors (the
STSA scan shift is almost twice as fast as in a traditional scan chain).
Table 6 shows the ability of Golomb-like compression to
use a slower ATE without affecting test application time.
Using (5), the last column of this table shows the ratio of the
ATE clock frequency reduction in the proposed method.
7 CONCLUSIONS
This paper has proposed a novel scan architecture that
considerably improves SoC testing in terms of power and
time. Power consumption is reduced during the test process
by preventing transitions in the scan chain from spreading
into the CUT. Data in this architecture is scanned at higher
clock frequencies because no MUX is introduced in the
critical scanning path (that is, the TR chain). It was shown
that the proposed architecture can
also be used for reconstructing the original test vectors from
their differences within a compressed data stream. The
applicability of the proposed architecture to compression
leads to a shorter test time and reduces ATE memory
requirements. A further feature of this architecture is its
TABLE 2
Comparing Necessary and Unnecessary Transitions
Fig. 14. Runtime of the reordering with respect to the number of test
vectors in each benchmark circuit.
applicability to delay fault testing. This is achieved by
enabling the application of two arbitrary test vectors to the
CUT in two consecutive clock cycles. The process of test
data reformatting for this new architecture has been
presented; a Golomb-like coding compression algorithm
has been proposed and analyzed. Experimental results for
various ISCAS benchmark circuits have confirmed the
effectiveness of the proposed approach on predefined fully
specified vectors. When embedded in a core by the
provider, this architecture can be used by an integrator
for different scan-based strategies in testable SoC designs.
TABLE 3
Number of Transitions in SCAN Cells and Combinational Logic
TABLE 4
Percentage of Compression Obtained by Golomb-Like Coding
TABLE 6
Comparison between ATE Clock of the
Proposed Method and Traditional Scan
TABLE 5
HSPICE Simulation
ACKNOWLEDGMENTS
This paper is based on the MS thesis of Shervin Sharifi at the
University of Tehran in 2003. The preliminary version of
this paper appeared in the Proceedings of the 18th IEEE
International Symposium on Defect and Fault Tolerance in VLSI
Systems in 2003. The authors would like to thank the
anonymous reviewers of this paper for very insightful
suggestions and comments for improving the paper.
REFERENCES
[1] Y. Zorian, A Distributed BIST Control Scheme for Complex VLSI
Devices, Proc. IEEE VLSI Test Symp., pp. 4-9, 1993.
[2] S. Wang and S.K. Gupta, ATPG for Heat Dissipation Minimiza-
tion during Scan Testing, Proc. IEEE Design Automation Conf.,
pp. 614-619, 1997.
[3] A. Wang and S. Gupta, LT-RTPG: A New Test-Per-Scan BIST
TPG for Low Heat Dissipation, Proc. IEEE Intl Test Conf., pp. 85-
94, 1999.
[4] S. Sharifi, J. Jaffari, M. Hosseinabady, Z. Navabi, and A. Afzali-
Kusha, A Scan-Based Structure with Reduced Static and
Dynamic Power Consumption, J. Low Power Electronics, vol. 2,
no. 3, pp. 477-487, Dec. 2006.
[5] S. Sharifi, J. Jaffari, M. Hosseinabady, Z. Navabi, and A. Afzali-
Kusha, Simultaneous Reduction of Dynamic and Static Power in
Scan Structures, Proc. Design, Automation and Test in Europe,
vol. 2, pp. 846-851, 2005.
[6] D.H. Baik, K.K. Saluja, and S. Kajihara, Random Access Scan: A
Solution to Test Power, Test Data Volume and Test Time, Proc.
17th Intl Conf. VLSI Design, pp. 883-888, 2004.
[7] V. Dabholkar, S. Chakrabarty, I. Pomeranz, and S. Reddy,
Techniques for Minimizing Power Dissipation in Scan and
Combinational Circuits during Test Application, IEEE Trans.
Computer-Aided Design, vol. 20, no. 7, pp. 911-917, 2001.
[8] S. Wang and S.K. Gupta, An Automatic Test Pattern Generator
for Minimizing Switching Activity during Scan Testing Activity,
IEEE Trans. Computer-Aided Design, vol. 21, no. 8, pp. 954-968,
2002.
[9] O. Sinanoglu, I. Bayraktaroglu, and A. Orailoglu, Test Power
Reduction through Minimization of Scan Chain Transitions, Proc.
20th IEEE VLSI Test Symp., pp. 166-171, 2002.
[10] N. Nicolici and B.M. Al-Hashimi, Multiple Scan Chain for Power
Minimization during Test Application in Sequential Circuits,
IEEE Trans. Computers, vol. 51, no. 5, May 2002.
[11] J. Saxena, K. Butler, and L. Whetsel, A Scheme to Reduce Power
Consumption during Scan Testing, Proc. IEEE Intl Test Conf.,
pp. 670-677, Oct. 2001.
[12] P. Rosinger, B.M. Al-Hashimi, and N. Nicolici, Scan Architecture
for Shift and Capture Cycle Power Reduction, Proc. 17th IEEE
Intl Symp. Defect and Fault Tolerance in VLSI Systems, pp. 129-137,
2002.
[13] S. Bhunia, H. Mahmoodi, D. Ghosh, S. Mukhopadhyay, and K.
Roy, Low-Power Scan Design Using First-Level Supply Gating,
IEEE Trans. Very Large Scale Integration Systems, vol. 13, no. 3,
pp. 384-395, 2005.
[14] J.J. Chen, C.K. Yang, and K.J. Lee, Test Pattern Generation and
Clock Disabling for Simultaneous Test Time and Power Reduc-
tion, IEEE Trans. Computer-Aided Design, vol. 22, no. 3, pp. 363-
370, Mar. 2003.
[15] A. Jas and N. Touba, Using an Embedded Processor for Efficient
Deterministic Testing of Systems-on-a-Chip, Proc. IEEE Intl Conf.
Computer Design, pp. 418-423, Oct. 1999.
[16] A. Jas and N. Touba, Test Vector Decompression via Cyclical
Scan Chains and Its Application to Testing Core-Based Design,
Proc. Intl Test Conf., pp. 458-464, Oct. 1998.
[17] J. Ziv and A. Lempel, A Universal Algorithm for Sequential Data
Compression, IEEE Trans. Information Theory, vol. 23, pp. 337-343,
May 1977.
[18] J. Ziv and A. Lempel, A Compression of Individual Sequences
via Variable-Rate Coding, IEEE Trans. Information Theory, vol. 24,
pp. 530-538, Sept. 1978.
[19] R.F. Rice, Some Practical Universal Noiseless Coding Techni-
ques, JPL, Mar. 1979.
[20] J.S. Vitter and P.G. Howard, Fast and Efficient Lossless Image
Compression, Proc. IEEE Data Compression Conf., pp. 351-360,
Apr. 1993.
[21] A. Chandra and K. Chakrabarty, Test Data Compression and
Decompression Based on Internal Scan Chains and Golomb
Coding, IEEE Trans. Computer-Aided Design, vol. 21, pp. 715-
722, June 2002.
[22] L. Lingappan, S. Ravi, A. Raghunathan, N.K. Jha, and S.T.
Chakradhar, Test-Volume Reduction in Systems-on-a-Chip
Using Heterogeneous and Multilevel Compression Techniques,
IEEE Trans. Computer-Aided Design of Integrated Circuits and
Systems, vol. 25, no. 10, pp. 2193-2206, 2006.
[23] A. Strollo, E. Napoli, and D. De Caro, New Clock-Gating
Techniques for Low-Power Flip-Flops, Proc. Intl Symp. Low
Power Electronics and Design, pp. 114-119, July 2000.
[24] A. Chandra and K. Chakrabarty, System-on-a-Chip Test Data
Compression and Decompression Architectures Based on Golomb
Codes, IEEE Trans. Computer-Aided Design, vol. 20, no. 3, pp. 355-
368, Mar. 2001.
[25] A. Jas and N.A. Touba, Test Vector Decompression via Cyclical
Scan Chains and Its Application to Testing Core-Based Design,
Proc. Intl Test Conf., pp. 458-464, Nov. 1998.
[26] T. Cormen, C. Leiserson, and R. Rivest, Introduction to Algorithms.
MIT Press, 1990.
[27] M.L. Bushnell and V.D. Agrawal, Essentials of Electronic Testing for
Digital, Memory and Mixed-Signal VLSI Circuits. Kluwer Academic,
2000.
[28] ATOM Test Vectors, http://www.crhc.uiuc.edu/IGATE/
ATOM-vectors.html, 2007.
[29] K. Helsgaun, An Effective Implementation of the Lin-Kernighan
Traveling Salesman Heuristic, DATALOGISKE SKRIFTER (Writ-
ings on Computer Science), 81, Roskilde Univ., 1998.
[30] N.K. Jha and S. Gupta, Testing of Digital Systems. Cambridge Univ.
Press, 2003.
[31] M.L. Bushnell and V.D. Agrawal, Essentials of Electronic Testing for
Digital, Memory, and Mixed-Signal VLSI Circuits. Kluwer Academic,
2000.
[32] Y. Bonhomme, T. Yoneda, H. Fujiwara, and P. Girard, An
Efficient Scan Tree Design for Test Time Reduction, Proc. Ninth
IEEE European Test Symp., pp. 174-179, 2004.
[33] K. Miyase, S. Kajihara, and S.M. Reddy, Multiple Scan Tree
Design with Test Vector Modification, Proc. 13th Asian Test Symp.,
2004.
[34] S. Ghosh, S. Basu, and N.A. Touba, Joint Minimization of Power
and Area in Scan Testing by Scan Cell Reordering, Proc. IEEE CS
Ann. Symp. VLSI, pp. 246-249, 2003.
[35] V. Dabholkar, S. Chakravarty, I. Pomeranz, and S.M. Reddy,
Techniques for Minimizing Power Dissipation in Scan and
Combinational Circuits during Test Application, IEEE Trans.
Computer Aided Design, vol. 17, no. 12, pp. 1325-1333, 1998.
Mohammad Hosseinabady received the BS
degree in electrical engineering from the Sharif
University of Technology, Iran, in 1992, the MS
degree in electrical engineering from the Uni-
versity of Tehran, Iran, in 1995, and the PhD
degree in computer engineering from the Uni-
versity of Tehran in 2006. From 2005 to 2006, he
was a research assistant at the Politecnico di
Torino, working on soft errors in deep-submicron
CMOS technologies. He is currently a research
assistant at the University of Bristol, working on Network-on-Chip (NOC)
reliability. His research interests include high-level testability, SoC
testing, NoC structures and testing, and soft error in nanometer CMOS
technologies. He has published several papers on these topics in
journals and conference proceedings.
Shervin Sharifi received the BS degree in
computer engineering from the Sharif University
of Technology, Tehran, Iran, in 2000 and the MS
degree in computer engineering from the Uni-
versity of Tehran, Tehran, Iran, in 2003. He is
currently working toward the PhD degree in
computer engineering at the University of
California, San Diego. His research interests
include low-power design, test, reliability, and
VLSI architectures for signal processing.
Fabrizio Lombardi received the BSc (Hons)
degree in electronic engineering from the Uni-
versity of Essex, United Kingdom, in 1977, the
master's degree in microwaves and modern
optics and the diploma in microwave engineer-
ing from the University College London in 1978,
and the PhD degree from the University of
London in 1982. He is currently the holder of the
International Test Conference (ITC) Endowed
Chair Professorship at Northeastern University,
Boston. At the same institution, from 1998 to 2004, he served as chair of
the Department of Electrical and Computer Engineering. Prior to joining
Northeastern, he was a faculty member at Texas Tech University, the
University of Colorado-Boulder, and Texas A&M University. His
research interests are testing and design of digital systems, bio and
nanocomputing, emerging technologies, defect tolerance, and CAD
VLSI. He has extensively published on these topics and is the coauthor
or editor of seven books. Since 2000, he has been an associate editor of
IEEE Design and Test. He has been with the IEEE Transactions on
Computers, as an associate editor from 1996 to 2000, an associate
editor-in-chief from 2000 to 2006, and as the editor-in-chief since 2007.
He was a guest editor of special issues in archival journals and
magazines such as the IEEE Transactions on Computers, IEEE
Transactions on Instrumentation and Measurement, IEEE Micro, and
IEEE Design and Test. He was a distinguished visitor of the IEEE
Computer Society twice, from 1990 to 1993 and 2001 to 2004. He is the
founding general chair of the IEEE Symposium on Network Computing
and Applications. He is a senior member of the IEEE. He has been
involved in organizing many international symposia, conferences, and
workshops sponsored by professional organizations. He has also served
as the chair of the Committee on Nanotechnology Devices and
Systems, Test Technology Technical Council of the IEEE since 2003.
He has received many professional awards, such as a visiting fellowship
at the British Columbia Advanced System Institute at the University of
Victoria, Canada, in 1988, the Texas Experimental Engineering Station
Research Fellowship twice from 1991 to 1992 and 1997 to 1998, the
Halliburton Professorship in 1995, the Outstanding Engineering Re-
search Award at Northeastern University in 2004, and an International
Research Award from the Ministry of Science and Education of Japan
from 1993 to 1999. He was also the recipient of the 1985/86 Research
Initiation Award from the IEEE/Engineering Foundation and a Silver Quill
Award from Motorola, Austin, in 1996.
Zainalabedin Navabi received the BS degree
from the University of Texas at Austin in 1975
and the MS and PhD degrees from the
University of Arizona in 1978 and 1981, respec-
tively. He is an adjunct professor of electrical
and computer engineering at Northeastern Uni-
versity and an electrical and computer engineer-
ing professor at the University of Tehran. His
involvement with hardware description lan-
guages began in 1976 when he started the
development of a register-transfer level simulator for one of the very first
Hardware Description Languages (HDLs). In 1981, he completed the
development of a synthesis tool that generated an MOS layout from an
RTL description. Since 1981, he has been involved in the design,
definition, and implementation of HDLs. He started one of the first full
HDL courses at Northeastern University in 1990. Since then, he has
conducted many short courses and tutorials on this subject in the US
and abroad. In addition to being a professor, he is also a consultant to
CAE companies. He is the author of several textbooks and computer-
based trainings on VHDL, Verilog, and related tools and environments.
He has written numerous papers on HDLs in simulation, synthesis, and
test of digital systems. He is a senior member of the IEEE and a member
of the IEEE Computer Society, the ASEE, and the ACM.
