
In this paper we present an HPC-based automated LFSR-based BIST test vector generator synthesis technique for mixed-signal SoCs (Fig. 1) that can be used to: a) embed deterministic patterns for digital logic BIST, and b) store sinusoidal stimuli or pre-calculated Delta-Sigma modulated bitstreams for analog and mixed-signal BIST. The Non-Exclusive XOR Test (NEXT) 2-D LFSR for testing core-based SoC designs proposed in [10] is utilized. The computational complexity of the synthesis procedure is handled primarily by an Nvidia Tesla C2050 graphics processing unit (GPU), which accelerates the synthesis.

II. NEXT 2-D LFSR - INTRODUCTION

The NEXT 2-D LFSR [10], an SoC BIST technique, improves BIST hardware and resource utilization in an SoC in comparison with the 2-D LFSR [11]. It mainly comprises five functional blocks: the flip-flop array (FFA), the configuration networks (CNs), the multiplexers (MUXs), the demultiplexers (DEMUXs), and the control unit (CU). The FFA is an N-flip-flop array, where N is the number of inputs of the circuit under test (CUT). Each CN consists of logic gates and forms feedback paths from the FFA. A MUX between the CNs and the FFA selects one of the n configuration networks to feed the feedback signals to the FFA; this MUX is controlled by the CU. Each core has its own CN and FFA. N1, N2, ..., Nn represent the primary inputs of each SoC core. The MUXs and DEMUXs select the CN and FFA and feed the test patterns to the corresponding SoC core; all MUX and DEMUX selections are controlled by the CU. The 2-D LFSR [11] used only the exclusive-OR (XOR) function in its multiple stages of flip-flops. In the NEXT 2-D LFSR [10], however, the next-state operation of the LFSR is implemented by solving the logic function using not only XOR gates but also basic gates such as NOR, NAND, and INV in the multiple stages of flip-flops. Using more gate types in the NEXT 2-D LFSR relaxes the optimization constraints, because the design solution space searched for the optimum solution is enlarged.
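The mixed-gate next-state operation described above can be sketched as follows. This is a minimal illustrative Python sketch with an assumed gate table and feedback configuration, not the authors' implementation:

```python
# Sketch of a NEXT-style next-state step: each FFA position may take its
# feedback from a gate other than XOR (e.g. NAND, NOR, INV), which is
# what enlarges the design solution space. Gate set and taps below are
# illustrative assumptions.

GATES = {
    "WIRE": lambda bits: bits[0],              # direct wire connection
    "INV":  lambda bits: 1 - bits[0],
    "XOR":  lambda bits: bits[0] ^ bits[1],
    "NAND": lambda bits: 1 - (bits[0] & bits[1]),
    "NOR":  lambda bits: 1 - (bits[0] | bits[1]),
}

def next_state(state, feedback):
    """Compute the next flip-flop array (FFA) contents.

    state    -- current FFA bits
    feedback -- per-position (gate_name, tap_indices), i.e. one CN entry
    """
    return [GATES[gate]([state[t] for t in taps]) for gate, taps in feedback]

# 4-bit FFA: bit 0 uses classic XOR feedback, bit 1 a NAND, bits 2-3 shift.
cn = [("XOR", (2, 3)), ("NAND", (0, 2)), ("WIRE", (1,)), ("WIRE", (2,))]
print(next_state([1, 0, 1, 1], cn))  # -> [0, 0, 0, 1]
```

With a pure-XOR gate table this reduces to an ordinary 2-D LFSR step; allowing the other gate types per position is what distinguishes the NEXT variant.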
The general representation of the NEXT 2-D LFSR based on N primitive polynomials with more than one stage in the FFA is as follows:

    Vi = Gmin(i)( Σj=1~N Σk=1~M aijk Vj Dk )        (1)

The circuit consists of N shift registers, each of which has M stages. Vi (i = 1~N) represents an N-bit vector, and Gmin(i) represents the logic gate with the minimum number of inputs. The logic gates considered for the implementation, in priority order, are a direct wire connection, INV, the 2-input gates (NAND, NOR, XOR, XNOR), the 3-input gates (NAND, NOR, XOR, XNOR), and so forth. Vi Dk (k = 1~M) represents the kth delay of a vector Vi. When solving the first vector V1, Vj (j = 1) represents the first embedded pattern.

In the optimization algorithm for Eq. (1) [10], the number of patterns per configuration, the maximum fan-in of the logic gates, and the number of stages in the FFA are determined after the test patterns to be embedded are input. The maximum number of stages is initially set to three. In the first iteration, the optimum feedback logic with minimum fan-in (m = 1), such as an inverter, is considered for the first primary input. This step is complete when a minimum fan-in gate satisfying Eq. (1) is found. The same procedure is repeated for the subsequent primary inputs and configurations. Whenever no logic gate of the current fan-in meets the feedback requirement for an input, the fan-in (the value of m) is incremented by 1 and the procedure is applied recursively to all inputs. If no suitable logic gate can be found even after incrementing m, the number of patterns considered for that particular configuration is decremented and the above procedure is repeated. After all the test patterns have been considered, the NEXT 2-D LFSR function utilizing the least hardware is reported.

A hardware comparison on benchmark circuits showed an average reduction of 75% with the NEXT 2-D LFSR approach [10] relative to the original 2-D LFSR [11]. The BIST circuits generated using the NEXT 2-D LFSR for am2910 (12-bit microprogram sequencer), mult16 (16-bit 2's-complement shift-and-add multiplier), div16 (16-bit divider using repeated subtractions), pcont2 (8-bit controller for DSP applications), and piir8 (8-bit digital filter) achieved hardware reductions of 76%, 86%, 83%, 57%, and 72%, respectively. The NEXT 2-D LFSR BIST synthesis algorithm used in [10] to find an optimal solution involves a computationally intensive exhaustive search procedure. The next section presents an HPC-based, optimized NEXT 2-D LFSR algorithm that drastically reduces the computational time of the automated synthesis of the LFSR-based BIST test vector generator for analog and mixed-signal SoCs.

III. HPC BASED BIST HARDWARE SYNTHESIS

An Nvidia Tesla C2050 GPU with 14 streaming multiprocessors (each with 32 cores clocked at 1.15 GHz) was utilized as the hardware accelerator for the synthesis procedure. Because the C-based Compute Unified Device Architecture (CUDA) API was used for programming the GPU, the development time for this application was reduced considerably. The hardware and software model of the hardware accelerator is described next.

A. GPU Hardware Model

NVIDIA's CUDA enables general-purpose computation in which the GPU is used as a multi-core coprocessor. Within each of the 14 streaming multiprocessors of the Tesla C2050 GPU, the cores have common access to 64 kB of shared memory. Each multiprocessor also has a set of 32-bit registers per processor, along with constant-memory and texture caches. The Tesla C2050 supports concurrent kernel execution, which allows programs that execute a number of small kernels to utilize the whole GPU.

B. GPU Software Model

The CPU is referred to as the host and the GPU is referred to as the device. CUDA assumes that the host and the device have separate accesses to their memory, also referred to as host

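The fan-in-escalating search behind the NEXT 2-D LFSR synthesis (try minimum fan-in gates first, and increase m only when no gate satisfies the feedback requirement) can be sketched as follows. This is an illustrative Python sketch with an assumed gate set, not the paper's GPU implementation; each (gate, tap-set) candidate check is independent of the others, which is the kind of work the GPU acceleration above can distribute across threads:

```python
from itertools import combinations

# Hypothetical gate tables grouped by fan-in, following the stated
# priority order (wire/INV first, then 2-input gates). Illustrative only.
GATES_BY_FANIN = {
    1: {"WIRE": lambda b: b[0],
        "INV":  lambda b: 1 - b[0]},
    2: {"XOR":  lambda b: b[0] ^ b[1],
        "XNOR": lambda b: 1 - (b[0] ^ b[1]),
        "NAND": lambda b: 1 - (b[0] & b[1]),
        "NOR":  lambda b: 1 - (b[0] | b[1])},
}

def find_feedback(states, targets, max_fanin=2):
    """Find a minimum fan-in gate realizing the required feedback bit.

    states  -- FFA state vector at each embedded-pattern step
    targets -- required feedback bit at each step
    Returns (gate_name, taps) or None (caller would then drop a pattern
    from the configuration and retry, as in the algorithm above).
    """
    n = len(states[0])
    for m in range(1, max_fanin + 1):            # escalate fan-in m
        for taps in combinations(range(n), m):   # candidate tap sets
            for name, fn in GATES_BY_FANIN[m].items():
                if all(fn([s[t] for t in taps]) == tgt
                       for s, tgt in zip(states, targets)):
                    return name, taps
    return None

states = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
print(find_feedback(states, [0, 0, 0]))  # -> ('NOR', (0, 1))
```

In this toy example no single-tap gate produces the all-zero target sequence, so the search escalates to fan-in 2 and settles on a NOR, mirroring how the NEXT approach trades XOR-only feedback for whichever basic gate is cheapest.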