VLSI DSP Project Report - 1.0

A 1024 POINT RADIX-2^2 AND COMPLEX FFT DESIGN USING WALLACE MULTIPLIER
Le Cai
Student ID: 4125589
Yuan Xu
Student ID: 4139225
Zhe Zhang
Student ID: 4137165
December 18th, 2009
TABLE OF CONTENTS
A 1024 POINT RADIX-2^2 AND COMPLEX FFT DESIGN USING WALLACE MULTIPLIER
OBJECTIVE
THE WALLACE MULTIPLIER DESIGN
    Booth recoding
    WallaceTree_Adder for Partial Product Reduction
    32-bit Brent Kung adder
SIMULATION RESULTS
SYNTHESIS RESULTS OF THE WALLACE MULTIPLIER
    Results of Critical Path
    Results of Power Consumption
    Results of Area
    Conclusion of Phase1
SEVERAL OTHER FFT DESIGNS
    R2MDC
    R2SDF
    R4SDF
    R4MDC
    R4SDC
FFT DESIGN BASED ON RADIX-2^2 ALGORITHM
RADIX-2^2 SDF ARCHITECTURE FOR 1024 POINTS COMPLEX FFT
SYNTHESIS RESULTS OF THE 1024 POINTS FFT
    Results of Power Dissipation
    Results of Critical Path
    Results of Area
    Conclusion of the Phase2
REFERENCES
LIST OF FIGURES
Fig 4. Simulation Waveforms of Wallace Multiplier
Fig 5. R2MDC (N=16)
Fig 7. R4SDF (N=256)
Fig 8. R4MDC (N=256)
Fig 9. R4SDC (N=256)
Fig 10. Butterfly with decomposed twiddle factors
Fig 11. 1024 points Radix-2^2 FFT architecture
Fig 12. BF2I
Fig 13. BF2II
Figure 14. Butterfly architecture of the 1024 points radix-2^2 FFT
Figure 15. Simulation Waveforms of 1024 points FFT
Figure 16. Schematic View of 1024 points FFT
LIST OF TABLES
Table 2. Twiddle factors of each wire
OBJECTIVE:
Design a 1024-point radix-2^2 complex FFT module in Verilog, based on a Booth-recoded Wallace
multiplier. This project contains two stages.
In the first stage, implement a 16×16 multiplier based on a Wallace tree using the radix-4 Booth
algorithm. Using Verilog, design sub-blocks such as a half adder, full adder, Booth encoder,
partial product generator, 32-bit Brent-Kung adder, and Wallace tree carry-save adder. A
simulation is carried out to verify the correct function of the proposed multiplier.
In the second stage, design a 1024-point radix-2^2 complex FFT module based on the first
stage. Following He and Torkelson's papers [1], [2], the proposed 1024-point FFT processor uses a
simplified cascaded radix-2^2 single-path delay feedback (SDF) structure. The control circuit of
the proposed radix-2^2 SDF architecture is simpler than that of the direct radix-4
SDF structure. The multiplier cost of the proposed FFT architecture is lower than that of the
previous FFT structures in 1024-point FFT applications. Only 4 complex multipliers and 1024
complex words of data memory are needed for the pipelined 1024-point FFT processor.
THE WALLACE MULTIPLIER DESIGN
Booth recoding
module Booth recoding
Booth multiplication is a technique that allows for smaller, faster multiplication circuits by recoding the numbers that are multiplied.
Reducing the Number of Partial Products
It is possible to halve the number of partial products by using radix-4 Booth recoding. The basic idea is that, instead of
shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we only take every second column and multiply by ±1, ±2, or
0 to obtain the same result. So, to multiply by 7, we can multiply the partial product aligned against the least significant bit by -1, and multiply the
partial product aligned with the third column by 2.
Partial Product 0 = Multiplicand * -1, shifted left 0 bits (x -1)
Partial Product 1 = Multiplicand * 2, shifted left 2 bits (x 8)
This is the same result as the equivalent shift and add method, since -1 + 8 = 7 = 1 + 2 + 4.
The advantage of this method is the halving of the number of partial products. This is important in circuit design because it reduces the propagation delay
of the circuit, as well as the complexity and power consumption of its implementation.
Radix-4 Booth Recoding
To Booth recode the multiplier term, we consider the bits in blocks of three, such that each block overlaps the previous block by one bit. Grouping
starts from the LSB, and the first block only uses two bits of the multiplier (since there is no previous block to overlap); its missing rightmost bit is taken as 0:
Fig 1.Grouping of bits from the multiplier term for Booth recoding
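The grouping described above can be sketched in Python as a behavioral model. This is a minimal illustration, not the Verilog used in the project; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical. Each overlapping block of three bits (b_hi, b_mid, b_lo) is mapped to the digit b_lo + b_mid - 2*b_hi, which is the standard radix-4 Booth mapping.

```python
def booth_radix4_digits(multiplier, bits=16):
    """Recode a signed `bits`-bit multiplier into bits/2 digits in {-2,-1,0,1,2}.

    Digit i has weight 4**i. Hypothetical helper, for illustration only.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    y = multiplier & ((1 << bits) - 1)   # two's-complement bit pattern
    padded = y << 1                      # implicit 0 to the right of the LSB
    digits = []
    for i in range(0, bits, 2):
        b_lo = (padded >> i) & 1         # bit shared with the previous block
        b_mid = (padded >> (i + 1)) & 1
        b_hi = (padded >> (i + 2)) & 1
        digits.append(b_lo + b_mid - 2 * b_hi)
    return digits


def booth_multiply(multiplicand, multiplier, bits=16):
    """Sum the Booth partial products: digit * multiplicand, shifted 2 bits per digit."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))
```

For the multiplier 7, the first two digits are -1 and 2, which matches Partial Product 0 and Partial Product 1 in the example above.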
Table 1. Booth recoding mapping calculation
We generate three signals depending on the input bits: S (for shift, x2), N (when negative), and Z (when zero). The logic expressions are
Sign Extension Tricks
Once the Booth recoded partial products have been generated, they need to be shifted and added together. The problem with implementing this in
hardware is that the first partial product needs to be sign extended by 6 bits, the second by four bits, and so on. This is easily achievable in hardware,
but requires more logic gates than if those bits could be permanently kept constant.
The procedure to avoid the full sign extension is:
Invert the most significant bit (MSB) of each partial product
Add an additional '1' to the MSB of the first partial product
Add an additional '1' in front of each partial product
This technique allows any sign bits to be correctly propagated, without the need to sign extend all of the bits.
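The three steps above can be checked numerically with a short Python sketch. It assumes (these are assumptions, not stated in the report) that every partial product is a w-bit two's-complement word, that partial product i is shifted left by 2i bits, and that w - 1 + 2n is at least the output width for n partial products, so that the added constants sum to a multiple of 2^out_bits. The function names are hypothetical.

```python
def sum_sign_extended(pps, out_bits):
    """Reference: add the signed partial products with full sign extension."""
    return sum(p * 4 ** i for i, p in enumerate(pps)) % (1 << out_bits)


def sum_with_trick(pps, w, out_bits):
    """Add w-bit partial products using the three-step trick instead of sign extension."""
    n = len(pps)
    # Assumed precondition: the constant 2**(w - 1 + 2n) must vanish modulo 2**out_bits.
    if w - 1 + 2 * n < out_bits:
        raise ValueError("constants would not cancel for these widths")
    total = 0
    for i, p in enumerate(pps):
        raw = p & ((1 << w) - 1)     # w-bit two's-complement pattern
        raw ^= 1 << (w - 1)          # invert the MSB of each partial product
        raw |= 1 << w                # add a '1' in front of each partial product
        total += raw << (2 * i)
    total += 1 << (w - 1)            # add a '1' at the MSB of the first partial product
    return total % (1 << out_bits)
```

With four 9-bit partial products and a 16-bit result (an 8x8 example), both functions return the same value for any partial products in the range -256 to 255.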
WallaceTree_Adder for Partial Product Reduction
module Wallace tree adder for Partial product reduction Tables
Fig 2.Wallace tree adder for Partial product reduction
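As a rough behavioral illustration of partial product reduction, the sketch below repeatedly applies 3:2 compressors (rows of full adders) until only two rows remain, which would then go to the final adder. It is an assumption-laden model, not the wiring of Fig 2: the grouping of rows, the 32-bit width and the names `full_adder_row` and `wallace_reduce` are all hypothetical.

```python
def full_adder_row(a, b, c):
    """3:2 compressor applied bitwise: returns (sum row, carry row)."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_reduce(operands, out_bits=32):
    """Reduce a list of rows to at most two rows using carry-save addition."""
    mask = (1 << out_bits) - 1
    rows = [x & mask for x in operands]
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            s, c = full_adder_row(rows[i], rows[i + 1], rows[i + 2])
            nxt += [s & mask, c & mask]
        nxt += rows[len(rows) - len(rows) % 3:]   # leftover rows pass through unchanged
        rows = nxt
    return rows
```

Adding the two returned rows modulo 2^32 gives the same result as adding all input rows, which is the job of the final carry-propagate adder.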
32-bit Brent Kung adder
module Brent Kung adder(VMA)
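In order to build fast adders, it is necessary to organize carry propagation and generation into recursive trees.
Here is the definition:
$$P_i = A_i \oplus B_i, \qquad G_i = A_i \cdot B_i$$
$P_i$ means that Cout = Cin (the carry is propagated), while $G_i$ means that Cout = 1 independent of Cin (a carry is generated).
Then, the expressions for the sum and carry-out of one adder bit are:
$$S_i = P_i \oplus C_i, \qquad C_{i+1} = G_i + P_i \cdot C_i$$
A dot operator is needed for the recursive implementation:
$$(G, P) \bullet (G', P') = (G + P \cdot G',\; P \cdot P')$$
A 16-bit Brent-Kung adder is given as an example: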
Fig 3. 16-bit Brent-Kung adder
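A behavioral Python sketch of a Brent-Kung prefix adder is given here for illustration. It assumes the usual definitions G = A AND B, P = A XOR B and the dot operator (G, P) • (G', P') = (G + P·G', P·P'), with carry-in 0 and a power-of-two width. The names `dot` and `brent_kung_add` are hypothetical and the code is not derived from the project's Verilog.

```python
def dot(high, low):
    """Dot operator: merge the (G, P) pair of a higher bit group with a lower one."""
    g_hi, p_hi = high
    g_lo, p_lo = low
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def brent_kung_add(a, b, width=16):
    """Add two unsigned `width`-bit integers; width must be a power of two."""
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    p = [((a ^ b) >> i) & 1 for i in range(width)]
    g = [((a & b) >> i) & 1 for i in range(width)]
    gp = list(zip(g, p))
    # Up-sweep: build group signals over spans of 2, 4, 8, ... bits.
    step = 1
    while step < width:
        for i in range(2 * step - 1, width, 2 * step):
            gp[i] = dot(gp[i], gp[i - step])
        step *= 2
    # Down-sweep: fill in the remaining prefixes from the completed ones.
    step = width // 4
    while step >= 1:
        for i in range(3 * step - 1, width, 2 * step):
            gp[i] = dot(gp[i], gp[i - step])
        step //= 2
    # gp[i][0] is now the carry out of bit i; the carry into bit 0 is 0.
    carry_in = [0] + [gp[i][0] for i in range(width)]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carry_in[i]) << i
    return total | (carry_in[width] << width)
```

The up-sweep and down-sweep together use about 2·log2(width) - 1 levels of dot operators, which is the trade-off Brent-Kung makes for low wiring and fan-out.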
SIMULATION RESULTS
The simulation is carried out with Modelsim SE 6.5c. The results are shown in Fig. 4 as follows:
Fig 4. Simulation Waveforms of Wallace Multiplier.
SYNTHESIS RESULTS OF THE WALLACE MULTIPLIER
With the aid of Synopsys Design Compiler, we use the FreePDK 45 nm CMOS technology to
obtain synthesis results for power dissipation, critical path length, and silicon
area.
Results of Critical Path
The result of Critical Path is shown as follows:
****************************************
Report : timing
-path full
-delay max
-max_paths 1
-sort_by group
Design : multiplier
Version: A-2007.12
Date : Tue Nov 17 16:39:59 2009
****************************************
Operating Conditions: typical Library: gscl45nm
Wire Load Model Mode: top
Startpoint: B[3] (input port)
Endpoint: sum[31] (output port)
Path Group: (none)
Path Type: max
Point Incr Path
--------------------------------------------------------------------------
input external delay 0.00 0.00 r
B[3] (in) 0.00 0.00 r
pre_coding/y[3] (booth_coding) 0.00 0.00 r
pre_coding/U63/Y (INVX1) 0.04 0.04 f
pre_coding/U50/Y (NAND3X1) 0.04 0.09 r
pre_coding/U13/Y (BUFX2) 0.04 0.12 r
pre_coding/U49/Y (OAI21X1) 0.01 0.14 f
pre_coding/C857/Z_11 (*SELECT_OP_5.15_5.1_15) 0.00 0.14 f
pre_coding/pp1[12] (booth_coding) 0.00 0.14 f
tree_adder/pp1[12] (wallace_tree_adder) 0.00 0.14 f
tree_adder/adder13/inb (full_adder_81) 0.00 0.14 f
tree_adder/adder13/U6/Y (XNOR2X1) 0.07 0.20 r
tree_adder/adder13/U3/Y (XOR2X1) 0.07 0.27 r
tree_adder/adder13/sum (full_adder_81) 0.00 0.27 r
tree_adder/adder50/ina (full_adder_53) 0.00 0.27 r
tree_adder/adder142/U3/Y (XOR2X1) 0.05 0.78 f
tree_adder/adder142/sum (full_adder_1) 0.00 0.78 f
tree_adder/VMA/A[10] (brent_kung_28bitadder) 0.00 0.78 f
tree_adder/VMA/pg10/A (p_g_17) 0.00 0.78 f
tree_adder/VMA/pg10/U2/Y (AND2X1) 0.04 0.82 f
tree_adder/VMA/pg10/G (p_g_17) 0.00 0.82 f
tree_adder/VMA/U30/Y (AND2X1) 0.06 0.88 f
tree_adder/VMA/adder5/Gin (dot_com_18) 0.00 0.88 f
tree_adder/VMA/adder5/U4/Y (AOI21X1) 0.05 0.93 r
tree_adder/VMA/adder5/U1/Y (BUFX2) 0.03 0.96 r
tree_adder/VMA/adder5/U3/Y (INVX1) 0.02 0.98 f
tree_adder/VMA/adder5/Gout (dot_com_18) 0.00 0.98 f
tree_adder/VMA/adder18/G (dot_com_5) 0.00 0.98 f
tree_adder/VMA/adder22/G (dot_com_1) 0.00 1.04 f
tree_adder/VMA/adder26/G (half_dot_com_22) 0.00 1.11 f
tree_adder/VMA/adder26/Gout (half_dot_com_22) 0.00 1.20 f
tree_adder/VMA/adder41/Gin (half_dot_com_8) 0.00 1.20 f
tree_adder/VMA/U9/Y (XOR2X1) 0.04 1.73 r
tree_adder/VMA/sum[27] (brent_kung_28bitadder) 0.00 1.73 r
tree_adder/sum[31] (wallace_tree_adder) 0.00 1.73 r
sum[31] (out) 0.00 1.73 r
data arrival time 1.73
--------------------------------------------------------------------------
(Path is unconstrained)
Results of Power Consumption
The result of power consumption is shown as follows:
****************************************
Report : power
-analysis_effort low
Design : multiplier
Version: A-2007.12
Date : Tue Nov 17 16:39:24 2009
****************************************
Library(s) Used:
gscl45nm (File: /home/class/zhan0915/project/gscl45nm.db)
Global Operating Voltage = 1.1
Power-specific unit information :
Voltage Units = 1V
Capacitance Units = 1.000000pf
Time Units = 1ns
Dynamic Power Units = 1mW (derived from V,C,T units)
Leakage Power Units = 1nW
Cell Internal Power = 1.7201 mW (57%)
Net Switching Power = 1.2722 mW (43%)
---------
Total Dynamic Power = 2.9922 mW (100%)
Cell Leakage Power = 18.9980 uW
Results of Area
The result of silicon area is shown as follows:
****************************************
Report : area
Design : multiplier
Version: A-2007.12
Date : Thu Nov 17 16:47:24 2009
****************************************
Library(s) Used:
gscl45nm (File: /home/grads/zhan0884/FreePDK45/osu_soc/lib/files/gscl45nm.db)
Number of ports: 70
Number of nets: 486
Number of cells: 135
Number of references: 135
Combinational area: 4213.5213404
Noncombinational area: 0.000000
Net Interconnect area: undefined (No wire load specified)
Total cell area: 4213.5213404
Total area: undefined
Conclusion of Phase1
In Phase 1 of the project, a 16×16 multiplier is designed based on a Wallace tree using the radix-4
Booth algorithm. Synthesis results show that the multiplier occupies a silicon area of
4213.521340 µm^2. The critical path of the multiplier is 1.73 ns, and its total dynamic power
consumption is 2.9922 mW.
SEVERAL OTHER FFT DESIGNS
Before going into the details of the new approach, it is beneficial to briefly review the
various architectures for pipeline FFT processors. This section reviews previous
approaches to FFT hardware design. The different approaches are put into functional blocks
with unified terminology, where the additive butterfly has been separated from the multiplier to
show the hardware requirements distinctly. The control and twiddle factor reading mechanisms
have also been omitted for clarity.
R2MDC
Radix-2 Multi-path Delay Commutator (R2MDC) is probably the most straightforward
approach to pipeline implementation of the radix-2 FFT algorithm. The input sequence is
broken into two parallel data streams flowing forward, with the correct distance between data
elements entering the butterfly scheduled by proper delays. Both butterflies and multipliers are at
50% utilization. (log2 N - 2) multipliers, log2 N radix-2 butterflies, and (3N/2 - 2) registers (delay
elements) are needed.
Fig 5. R2MDC(N=16)
R2SDF
Radix-2 Single-path Delay Feedback (R2SDF) uses the registers more efficiently by storing the
butterfly output in feedback shift registers. A single data stream goes through the multiplier at
every stage. It has the same number of butterfly units and multipliers as the R2MDC approach, but
with a much reduced memory requirement: (N-1) registers. Its memory requirement is minimal.
Fig 6. R2SDF(N=16)
R4SDF
Radix-4 Single-path Delay Feedback (R4SDF) was proposed as a radix-4 version of R2SDF,
employing Coordinate Rotation Digital Computer (CORDIC) iterations. The utilization of
multipliers is increased to 75% due to the storage of 3 out of 4 radix-4 butterfly outputs.
However, the utilization of the radix-4 butterfly, which is fairly complicated and contains at least
8 complex adders, drops to only 25%. It requires (log4 N - 1) multipliers, log4 N full radix-4
butterflies and storage of size (N-1).
Fig 7. R4SDF(N=256)
R4MDC
Radix-4 Multi-path Delay Commutator (R4MDC) is a radix-4 version of R2MDC. It has been
used as the architecture for the initial VLSI implementation of pipeline FFT processors and
massive wafer scale integration. However, it suffers from low (25%) utilization of all
components, which can be compensated only in some special applications where four FFTs are
being processed simultaneously. It requires 3 log4 N multipliers, log4 N full radix-4 butterflies and
(5N/2 - 4) registers.
Fig 8. R4MDC(N=256)
R4SDC
Radix-4 Single-path Delay Commutator (R4SDC) uses a modified radix-4 algorithm with
programmable 1/4 radix-4 butterflies to achieve higher (75%) utilization of multipliers. A
combined Delay-Commutator also reduces the memory requirement to (2N-2) from (5N/2-1),
that of R4MDC. The butterfly and delay-commutator become relatively complicated due to
the programmability requirement. R4SDC has been used recently in building the largest ever single
chip pipeline FFT processor for HDTV application.
Fig 9. R4SDC(N=256)
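To compare the architectures reviewed above for a concrete size, the cost formulas quoted in this section can be evaluated in a few lines of Python. This is only a tabulation of the expressions as stated in the text (the R4MDC register count uses the figure from the R4MDC paragraph). The function name `pipeline_fft_costs` is hypothetical, and N is assumed to be a power of 4.

```python
def pipeline_fft_costs(n):
    """Evaluate the multiplier, butterfly and register counts quoted in this section."""
    l2 = n.bit_length() - 1            # log2(N) for a power of two
    if n != 1 << l2 or l2 % 2:
        raise ValueError("N is assumed to be a power of 4")
    l4 = l2 // 2                       # log4(N)
    return {
        "R2MDC": {"multipliers": l2 - 2, "butterflies": l2, "registers": 3 * n // 2 - 2},
        "R2SDF": {"multipliers": l2 - 2, "butterflies": l2, "registers": n - 1},
        "R4SDF": {"multipliers": l4 - 1, "butterflies": l4, "registers": n - 1},
        "R4MDC": {"multipliers": 3 * l4, "butterflies": l4, "registers": 5 * n // 2 - 4},
        "R4SDC": {"registers": 2 * n - 2},   # only the memory size is quoted for R4SDC
    }


if __name__ == "__main__":
    for name, cost in pipeline_fft_costs(1024).items():
        print(name, cost)
```

For N = 1024 the single-path delay feedback structures need 1023 registers, against 1534 for R2MDC and 2556 for R4MDC, which is the memory advantage the text attributes to the SDF approach.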
FFT DESIGN BASED ON RADIX-2^2 ALGORITHM
In this section, we derive the hardware-oriented radix-2^2 algorithm for FFT implementation.
One example of a 16-point radix-2^2 FFT is given in this section. Finally, the detailed butterfly
trellis is plotted to guide the following hardware design.
The DFT of size N is defined by
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad 0 \le k < N$$
To make the derivation of the new algorithm clearer, consider the first 2 steps of decomposition
in the radix-2 DIF FFT together, applying a 3-dimensional linear index map:
$$n = \left\langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3 \right\rangle_N, \qquad k = \left\langle k_1 + 2k_2 + 4k_3 \right\rangle_N$$
where the range of each parameter depends on the number of points N. For a 1024-point
FFT, $n_1$, $n_2$, $k_1$ and $k_2$ are 1-bit values, equal to 0 or 1, and $n_3, k_3 \in [0, N/4 - 1]$; i.e. each iteration
reduces the problem size to N/4 points. It is nevertheless not the same as the normal radix-4 algorithm; the
reason is shown as follows.
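Rewrite the DFT equation as:
$$\begin{aligned}
X(k_1 + 2k_2 + 4k_3) &= \sum_{n_3=0}^{\frac{N}{4}-1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\!\left(\tfrac{N}{2}n_1 + \tfrac{N}{4}n_2 + n_3\right) W_N^{\left(\frac{N}{2}n_1 + \frac{N}{4}n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)} \\
&= \sum_{n_3=0}^{\frac{N}{4}-1} \sum_{n_2=0}^{1} \left\{ B_{N/2}^{k_1}\!\left(\tfrac{N}{4}n_2 + n_3\right) W_N^{\left(\frac{N}{4}n_2 + n_3\right)k_1} \right\} W_N^{\left(\frac{N}{4}n_2 + n_3\right)(2k_2 + 4k_3)} \\
&= \sum_{n_3=0}^{\frac{N}{4}-1} \sum_{n_2=0}^{1} B_{N/2}^{k_1}\!\left(\tfrac{N}{4}n_2 + n_3\right) W_N^{\left(\frac{N}{4}n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}
\end{aligned}$$
where
$$B_{N/2}^{k_1}\!\left(\tfrac{N}{4}n_2 + n_3\right) = x\!\left(\tfrac{N}{4}n_2 + n_3\right) + (-1)^{k_1}\, x\!\left(\tfrac{N}{4}n_2 + n_3 + \tfrac{N}{2}\right)$$
The composite twiddle factor decomposes as
$$\begin{aligned}
W_N^{\left(\frac{N}{4}n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)} &= W_N^{N n_2 k_3}\; W_N^{\frac{N}{4} n_2 (k_1 + 2k_2)}\; W_N^{n_3 (k_1 + 2k_2)}\; W_N^{4 n_3 k_3} \\
&= (-j)^{n_2 (k_1 + 2k_2)}\; W_N^{n_3 (k_1 + 2k_2)}\; W_{N/4}^{n_3 k_3}
\end{aligned}$$
Substituting this into the previous expression and expanding the summation over $n_2$ gives the radix-2^2 FFT algorithm:
$$X(k_1 + 2k_2 + 4k_3) = \sum_{n_3=0}^{\frac{N}{4}-1} \left[ H(k_1, k_2, n_3)\, W_N^{n_3 (k_1 + 2k_2)} \right] W_{N/4}^{n_3 k_3}$$
$$H(k_1, k_2, n_3) = \underbrace{\overbrace{\left[ x(n_3) + (-1)^{k_1} x\!\left(n_3 + \tfrac{N}{2}\right) \right]}^{\text{BF I}} + (-j)^{(k_1 + 2k_2)} \overbrace{\left[ x\!\left(n_3 + \tfrac{N}{4}\right) + (-1)^{k_1} x\!\left(n_3 + \tfrac{3N}{4}\right) \right]}^{\text{BF I}}}_{\text{BF II}}$$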
This equation represents the first two stages of butterflies with only trivial multiplications in the
signal flow graph (SFG), as BF I and BF II in Fig 10. After these two stages, full multipliers are required to compute
the product with the decomposed twiddle factor $W_N^{n_3 (k_1 + 2k_2)}$ in the expression for $X(k_1 + 2k_2 + 4k_3)$, as shown in
Fig 10. Note that the order of the twiddle factors is different from that of the radix-4 algorithm.
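The decomposition can be checked numerically. The Python sketch below evaluates H(k1, k2, n3), applies the twiddle factor W_N^{n3(k1+2k2)}, and finishes with a direct N/4-point DFT over n3; the result is compared against a direct N-point DFT. It assumes W_N = exp(-j·2π/N), which is consistent with the factor -j = W_N^{N/4} in the derivation. The function names are hypothetical, and the direct O(N^2) DFT is used only as a reference.

```python
import cmath


def w(n_points, exponent):
    """Twiddle factor W_n^e, assuming W_n = exp(-j*2*pi/n)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def dft(x):
    """Direct DFT, used as the reference and as the N/4-point sub-transform."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]


def radix22_dft(x):
    """One radix-2^2 step (BF I, BF II, twiddle) followed by N/4-point DFTs."""
    n = len(x)
    q = n // 4
    out = [0j] * n
    for k1 in (0, 1):
        for k2 in (0, 1):
            g = []
            for n3 in range(q):
                bf1_a = x[n3] + (-1) ** k1 * x[n3 + 2 * q]       # BF I
                bf1_b = x[n3 + q] + (-1) ** k1 * x[n3 + 3 * q]   # BF I
                h = bf1_a + (-1j) ** (k1 + 2 * k2) * bf1_b       # BF II
                g.append(h * w(n, n3 * (k1 + 2 * k2)))           # full multiplier
            sub = dft(g)                                          # sum over n3, index k3
            for k3 in range(q):
                out[k1 + 2 * k2 + 4 * k3] = sub[k3]
    return out
```

For a 16-point input, `radix22_dft` and `dft` agree to within floating-point rounding, which confirms that only the W_N^{n3(k1+2k2)} factor needs a full multiplier after the two butterfly stages.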
Fig 10. Butterfly with decomposed twiddle factors.
The radix-2^2 algorithm has the same multiplicative complexity as radix-4
algorithms, but still retains the radix-2 butterfly structure. The multiplicative operations are
arranged so that only every other stage has non-trivial multiplications. This is a great
structural advantage over other algorithms when a pipeline/cascade FFT architecture is under
consideration.
RADIX-2^2 SDF ARCHITECTURE FOR 1024 POINTS COMPLEX FFT
Fig. 11 outlines an implementation of the R2^2SDF architecture for N = 1024. Note the similarity
of the data-path to R2SDF and the reduced number of multipliers. The implementation uses two
types of butterflies: one identical to that in R2SDF, the other also containing the logic to implement
the trivial twiddle factor multiplication, as shown in Fig. 12 and Fig. 13 respectively. Due to the spatial
regularity of the radix-2^2 algorithm, the synchronization control of the processor is very simple. A
(log2 N)-bit binary counter serves two purposes: synchronization controller and address counter
for twiddle factor reading in each stage.
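The behavior of one single-path delay feedback stage can be modeled in a few lines of Python. This is a sketch of the general SDF principle for the plain radix-2 butterfly, under the assumption that one counter bit selects between two modes: while the bit is 0 the input is shifted into the feedback register and the register's old contents are output, and while the bit is 1 the butterfly outputs the sum and stores the difference. The name `sdf_stage` and the use of `t // delay` as the counter bit are illustrative assumptions.

```python
from collections import deque


def sdf_stage(samples, delay):
    """Model one radix-2 SDF butterfly stage with a feedback register of `delay` words."""
    fifo = deque([0] * delay)
    out = []
    for t, x in enumerate(samples):
        old = fifo.popleft()
        select = (t // delay) % 2        # the counter bit assumed to control this stage
        if select == 0:
            fifo.append(x)               # fill the feedback register
            out.append(old)              # flush the stored differences
        else:
            out.append(old + x)          # butterfly sum goes forward
            fifo.append(old - x)         # butterfly difference is fed back
    return out
```

Feeding the block 1, 2, 3, 4 followed by two zeros into a stage with `delay=2` produces, after a latency of two samples, the sums 4, 6 followed by the differences -2, -2.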
Fig 11. 1024 points Radix-2^2 FFT architecture.
Fig 12. BF2I.
Fig 13. BF2II.
Figure 14. Butterfly architecture of the 1024 points radix-2^2 FFT. Inputs x(0) to x(1023) pass through butterflies BF 1 to BF 10 to outputs X(0) to X(1023); successive butterflies are connected by networks N(0) to N(8), each with wires m-0 to m-1023.
The 1024-point FFT using the R2^2SDF architecture is shown in Fig. 14. It includes 5 stages, each consisting of two butterflies
named BF1 and BF2. We describe the connection between two successive butterflies as a network, so there are 9 networks, named N(0) to
N(8), shown in Fig. 14. Each network contains 1024 wires, labeled m-n, where m identifies the network and n
the wire. For instance, the 128th wire in network 7 is labeled 7-128. Thus, we can use the formulas given
before to calculate each wire's twiddle factor. The twiddle factor values are specified in Table 2:

Table 2. Twiddle factors of each wire.
Network   Wire number                                      Twiddle factor
N(0)      0-m, 768 ≤ m ≤ 1023                              -j
          0-m, else                                        1
N(2)      2-m, 192+256n ≤ m ≤ 255+256n (0 ≤ n ≤ 3)         -j
          2-m, else                                        1
N(4)      4-m, 48+64n ≤ m ≤ 63+64n (0 ≤ n ≤ 15)            -j
          4-m, else                                        1
N(6)      6-m, 12+16n ≤ m ≤ 15+16n (0 ≤ n ≤ 63)            -j
          6-m, else                                        1
N(8)      8-m, m = 3+4n (0 ≤ n ≤ 255)                      -j
          8-m, else                                        1
N(1)      1-m, 768 ≤ m ≤ 1023                              W^(3(m-768))
          1-m, 512 ≤ m ≤ 767                               W^(m-512)
          1-m, 256 ≤ m ≤ 511                               W^(2(m-256))
          1-m, else                                        1
N(3)      3-m, 192+256n ≤ m ≤ 255+256n (0 ≤ n ≤ 3)         W^(3(m-192-256n))
          3-m, 128+256n ≤ m ≤ 191+256n (0 ≤ n ≤ 3)         W^(m-128-256n)
          3-m, 64+256n ≤ m ≤ 127+256n (0 ≤ n ≤ 3)          W^(2(m-64-256n))
          3-m, else                                        1
N(5)      5-m, 48+64n ≤ m ≤ 63+64n (0 ≤ n ≤ 15)            W^(3(m-48-64n))
          5-m, 32+64n ≤ m ≤ 47+64n (0 ≤ n ≤ 15)            W^(m-32-64n)
          5-m, 16+64n ≤ m ≤ 31+64n (0 ≤ n ≤ 15)            W^(2(m-16-64n))
          5-m, else                                        1
N(7)      7-m, 12+16n ≤ m ≤ 15+16n (0 ≤ n ≤ 63)            W^(3(m-12-16n))
          7-m, 8+16n ≤ m ≤ 11+16n (0 ≤ n ≤ 63)             W^(m-8-16n)
          7-m, 4+16n ≤ m ≤ 7+16n (0 ≤ n ≤ 63)              W^(2(m-4-16n))
          7-m, else                                        1
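The entries of Table 2 follow a regular pattern that can be generated programmatically. The Python sketch below is one way to do so, under two assumptions that the table does not state explicitly: the wire index within each sub-FFT block is 2·(block/4)·k1 + (block/4)·k2 + n3, and W in the odd-numbered networks denotes the twiddle base of the current sub-FFT size (W_1024 for N(1), W_256 for N(3), W_64 for N(5), W_16 for N(7)). The function name `table2_entry` is hypothetical.

```python
def table2_entry(network, m, n_points=1024):
    """Return the twiddle factor on wire `m` of network N(`network`).

    Even networks return the string "-j" or "1".
    Odd networks return (base, exponent), standing for W_base ** exponent;
    exponent 0 corresponds to the table's "1" rows.
    """
    block = n_points >> (2 * (network // 2))   # sub-FFT size at this depth (assumed)
    q = block // 4
    quarter, n3 = divmod(m % block, q)         # quarter = 2*k1 + k2 (assumed ordering)
    if network % 2 == 0:
        return "-j" if quarter == 3 else "1"
    k1, k2 = quarter >> 1, quarter & 1
    return (block, n3 * (k1 + 2 * k2))
```

For example, wire 1-300 lies in the range 256 ≤ m ≤ 511, so its exponent is 2(300-256) = 88, and the sketch returns `(1024, 88)`.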

SYNTHESIS RESULTS OF THE 1024 POINTS FFT
With the aid of Synopsys Design Compiler, we use the FreePDK 45 nm CMOS technology to
obtain synthesis results for power dissipation, critical path length, and silicon
area.
Figure 15. Simulation Waveforms of 1024 points FFT
Results of Power Dissipation
The result of power consumption is shown as follows:
****************************************
Report : power
-analysis_effort low
Design : fft_1024
Version: A-2007.12
Date : Tue Dec 22 07:40:01 2009
****************************************
Library(s) Used:
gscl45nm (File: /home/class/zhan0915/fft3/gscl45nm.db)
Global Operating Voltage = 1.1
Power-specific unit information :
Voltage Units = 1V
Capacitance Units = 1.000000pf
Time Units = 1ns
Dynamic Power Units = 1mW (derived from V,C,T units)
Leakage Power Units = 1nW
Cell Internal Power = 37.6966 mW (83%)
Net Switching Power = 7.7248 mW (17%)
---------
Total Dynamic Power = 45.4214 mW (100%)
Cell Leakage Power = 2.2162 mW
Results of Critical Path
The result of length of critical path is shown as follows:
****************************************
Report : timing
-path full
-delay max
-max_paths 1
-sort_by group
Design : fft_1024
Version: A-2007.12
Date : Tue Dec 22 07:41:08 2009
****************************************
# A fanout number of 1000 was used for high fanout net computations.
Startpoint: counter/q_reg[8]
(rising edge-triggered flip-flop)
Endpoint: imag_out[15]
(output port)
Path Group: (none)
Path Type: max
Point Incr Path
--------------------------------------------------------------------------
counter/q_reg[8]/CLK (DFFPOSX1) 0.00 # 0.00 r
counter/q_reg[8]/Q (DFFPOSX1) 0.36 0.36 r
counter/q[8] (ctr) 0.00 0.36 r
bf_2_0/s (bf_2_0) 0.00 0.36 r
bf_2_0/U279/Y (INVX1) 0.21 0.57 f
bf_2_0/U101/Y (OR2X1) 0.07 0.64 f
bf_2_0/U102/Y (INVX1) 0.74 1.38 r
bf_2_0/U278/Y (MUX2X1) 0.20 1.58 f
bf_2_0/U262/Y (INVX1) 0.10 1.67 r
bf_2_0/adder1/B[0] (vma16_35) 0.00 1.67 r
bf_2_0/adder1/ipg16/B[0] (p_g_16_35) 0.00 1.67 r
bf_2_0/adder1/ipg16/U31/Y (XOR2X1) 0.04 1.72 f
bf_2_0/adder1/ipg16/pg0[1] (p_g_16_35) 0.00 1.72 f
bf_2_0/adder1/ir1c1/pg[1] (partial_product_generator1_560)
0.00 1.72 f
bf_2_0/adder1/ir1c1/U2/Y (AOI21X1) 0.04 1.76 r
bf_2_0/adder1/ir1c1/U1/Y (INVX1) 0.04 1.79 f
bf_2_0/adder1/ir1c1/pgo (partial_product_generator1_560)
0.00 1.79 f
bf_2_0/adder1/ir2c3/pg0 (partial_product_generator1_559)
0.00 1.79 f
0.00 1.87 f
0.00 1.87 f
0.00 1.96 f
bf_2_0/adder1/ixor16/A[7] (xor_16_35) 0.00 1.96 f
bf_2_0/adder1/ixor16/U3/Y (XOR2X1) 0.04 2.00 f
bf_2_0/adder1/ixor16/S[7] (xor_16_35) 0.00 2.00 f
bf_2_0/adder1/S[7] (vma16_35) 0.00 2.00 f
bf_2_0/U233/Y (AOI22X1) 0.05 2.05 r
bf_2_0/U21/Y (BUFX2) 0.05 2.10 r
bf_2_0/U9/Y (AND2X1) 0.03 2.13 r
bf_2_0/U89/Y (INVX1) 0.03 2.15 f
bf_2_0/imag_out0[7] (bf_2_0) 0.00 2.15 f
mul0_i/A[7] (multiplier_7) 0.00 2.15 f
mul0_i/pre_coding/x[7] (booth_coding_7) 0.00 2.15 f
mul0_i/pre_coding/U240/Y (INVX1) 0.25 2.41 r
mul0_i/pre_coding/U663/Y (MUX2X1) 0.10 2.50 f
mul0_i/pre_coding/U662/Y (OAI21X1) 0.05 2.55 r
mul0_i/pre_coding/pp0[7] (booth_coding_7) 0.00 2.55 r
mul0_i/tree_adder/pp0[7] (wallace_tree_adder_7) 0.00 2.55 r
mul0_i/tree_adder/adder6/ina (full_adder_647) 0.00 2.55 r
mul0_i/tree_adder/adder6/U6/Y (XNOR2X1) 0.07 2.63 r
mul0_i/tree_adder/adder6/U3/Y (XOR2X1) 0.07 2.70 r
mul0_i/tree_adder/adder6/sum (full_adder_647) 0.00 2.70 r
mul0_i/tree_adder/adder108/ina (half_adder_441) 0.00 3.00 r
mul0_i/tree_adder/adder108/sum (half_adder_441) 0.00 3.07 r
mul0_i/tree_adder/VMA/A[3] (brent_kung_28bitadder_7)
0.00 3.14 r
mul0_i/tree_adder/VMA/pg3/A (p_g_195) 0.00 3.14 r
mul0_i/tree_adder/VMA/pg3/U1/Y (XOR2X1) 0.08 3.22 r
mul0_i/tree_adder/VMA/pg3/P (p_g_195) 0.00 3.22 r
mul0_i/tree_adder/VMA/adder1/P (dot_com_167) 0.00 3.22 r
mul0_i/tree_adder/VMA/adder1/U1/Y (AND2X1) 0.04 3.26 r
mul0_i/tree_adder/VMA/adder1/Pout (dot_com_167) 0.00 3.26 r
mul0_i/tree_adder/VMA/adder20/P (half_dot_com_182) 0.00 3.26 r
mul0_i/tree_adder/VMA/adder20/U2/Y (AOI21X1) 0.02 3.28 f
mul0_i/tree_adder/VMA/adder20/U1/Y (INVX1) 0.05 3.33 r
mul0_i/tree_adder/VMA/adder20/Gout (half_dot_com_182)
0.00 3.33 r
mul0_i/tree_adder/VMA/U6/Y (XOR2X1) 0.08 3.42 r
mul0_i/tree_adder/VMA/sum[4] (brent_kung_28bitadder_7)
0.00 3.42 r
mul0_i/tree_adder/sum[8] (wallace_tree_adder_7) 0.00 3.42 r
mul0_i/sum[8] (multiplier_7) 0.00 3.42 r
bf_1_1/imag_in1[8] (bf_1_4) 0.00 3.42 r
bf_1_1/adder1/B[8] (vma16_31) 0.00 3.42 r
bf_1_1/adder1/ipg16/B[8] (p_g_16_31) 0.00 3.42 r
0.00 3.47 f
bf_1_1/adder1/ir1c9/pgo[0] (partial_product_generator2_338)
0.00 3.53 f
0.00 3.53 f
0.00 3.58 f
0.00 3.58 f
0.00 3.64 f
bf_1_1/adder1/S[10] (vma16_31) 0.00 3.68 f
bf_1_1/U128/Y (MUX2X1) 0.04 3.71 r
bf_1_1/U127/Y (INVX1) 0.04 3.76 f
bf_1_1/imag_out0[10] (bf_1_4) 0.00 3.76 f
bf_2_1/imag_in1[10] (bf_2_4) 0.00 3.76 f
bf_2_1/U277/Y (MUX2X1) 0.08 3.84 r
bf_2_1/U261/Y (INVX1) 0.09 3.92 f
bf_2_1/adder1/B[10] (vma16_27) 0.00 3.92 f
bf_2_1/adder1/ipg16/B[10] (p_g_16_27) 0.00 3.92 f
0.00 3.97 f
bf_2_1/adder1/ir1c11/U1/Y (AND2X1) 0.04 4.01 f
0.00 4.01 f
0.00 4.01 f
0.00 4.05 f
0.00 4.05 f
0.00 4.12 f
bf_2_1/adder1/S[11] (vma16_27) 0.00 4.16 f
bf_2_1/U244/Y (AOI22X1) 0.05 4.21 r
bf_2_1/U21/Y (BUFX2) 0.05 4.26 r
bf_2_1/U4/Y (AND2X1) 0.03 4.29 r
bf_2_1/U86/Y (INVX1) 0.03 4.31 f
bf_2_1/imag_out0[11] (bf_2_4) 0.00 4.31 f
0.00 5.38 r
mul1_i/tree_adder/VMA/pg7/U1/Y (XOR2X1) 0.05 5.43 f
mul1_i/tree_adder/VMA/pg7/P (p_g_135) 0.00 5.43 f
mul1_i/tree_adder/VMA/adder3/P (dot_com_117) 0.00 5.43 f
mul1_i/tree_adder/VMA/adder3/U1/Y (AND2X1) 0.05 5.49 f
mul1_i/tree_adder/VMA/adder3/Pout (dot_com_117) 0.00 5.49 f
mul1_i/tree_adder/VMA/adder19/P (dot_com_101) 0.00 5.49 f
mul1_i/tree_adder/VMA/adder19/U1/Y (AND2X1) 0.05 5.54 f
mul1_i/tree_adder/VMA/adder19/Pout (dot_com_101) 0.00 5.54 f
mul1_i/tree_adder/VMA/adder23/P (half_dot_com_129) 0.00 5.54 f
mul1_i/tree_adder/VMA/adder23/U2/Y (AOI21X1) 0.04 5.57 r
mul1_i/tree_adder/VMA/adder23/U1/Y (INVX1) 0.04 5.61 f
mul1_i/tree_adder/VMA/adder23/Gout (half_dot_com_129)
0.00 5.61 f
0.00 5.69 r
bf_1_2/imag_in1[12] (bf_1_3) 0.00 5.69 r
bf_1_2/adder1/B[12] (vma16_23) 0.00 5.69 r
bf_1_2/adder1/ipg16/B[12] (p_g_16_23) 0.00 5.69 r
0.00 5.75 f
0.00 5.81 f
0.00 5.81 f
0.00 5.86 f
0.00 5.86 f
0.00 5.92 f
bf_1_2/adder1/S[14] (vma16_23) 0.00 5.95 f
bf_1_2/U120/Y (MUX2X1) 0.04 5.99 r
bf_1_2/U119/Y (INVX1) 0.04 6.04 f
bf_1_2/imag_out0[14] (bf_1_3) 0.00 6.04 f
bf_2_2/imag_in1[14] (bf_2_3) 0.00 6.04 f
bf_2_2/U273/Y (MUX2X1) 0.08 6.11 r
bf_2_2/U257/Y (INVX1) 0.09 6.20 f
bf_2_2/adder1/B[14] (vma16_19) 0.00 6.20 f
bf_2_2/adder1/ipg16/B[14] (p_g_16_19) 0.00 6.20 f
bf_2_2/adder1/ipg16/U21/Y (XOR2X1) 0.06 6.26 r
bf_2_2/adder1/ipg16/pg14[1] (p_g_16_19) 0.00 6.26 r
bf_2_2/adder1/ixor16/B[14] (xor_16_19) 0.00 6.26 r
bf_2_2/adder1/S[14] (vma16_19) 0.00 6.31 f
bf_2_2/U241/Y (AOI22X1) 0.05 6.36 r
bf_2_2/U24/Y (BUFX2) 0.05 6.40 r
bf_2_2/U13/Y (AND2X1) 0.03 6.43 r
bf_2_2/U83/Y (INVX1) 0.03 6.46 f
bf_2_2/imag_out0[14] (bf_2_3) 0.00 6.46 f
0.00 7.60 r
0.00 7.74 r
bf_1_3/imag_in1[14] (bf_1_2) 0.00 7.74 r
bf_1_3/adder1/B[14] (vma16_15) 0.00 7.74 r
bf_1_3/adder1/ipg16/B[14] (p_g_16_15) 0.00 7.74 r
bf_1_3/adder1/S[14] (vma16_15) 0.00 7.85 f
bf_1_3/U120/Y (MUX2X1) 0.04 7.89 r
bf_1_3/U119/Y (INVX1) 0.04 7.94 f
bf_1_3/imag_out0[14] (bf_1_2) 0.00 7.94 f
bf_2_3/imag_in1[14] (bf_2_2) 0.00 7.94 f
bf_2_3/U273/Y (MUX2X1) 0.08 8.02 r
bf_2_3/U257/Y (INVX1) 0.09 8.10 f
bf_2_3/adder1/B[14] (vma16_11) 0.00 8.10 f
bf_2_3/adder1/ipg16/B[14] (p_g_16_11) 0.00 8.10 f
bf_2_3/adder1/S[14] (vma16_11) 0.00 8.21 f
bf_2_3/U241/Y (AOI22X1) 0.05 8.26 r
bf_2_3/U29/Y (BUFX2) 0.05 8.30 r
bf_2_3/U13/Y (AND2X1) 0.03 8.33 r
bf_2_3/U96/Y (INVX1) 0.03 8.36 f
bf_2_3/imag_out0[14] (bf_2_2) 0.00 8.36 f
0.00 9.50 r
0.00 9.64 r
bf_1_4/imag_in1[14] (bf_1_1) 0.00 9.64 r
bf_1_4/adder1/B[14] (vma16_7) 0.00 9.64 r
bf_1_4/adder1/ipg16/B[14] (p_g_16_7) 0.00 9.64 r
bf_1_4/adder1/S[14] (vma16_7) 0.00 9.75 f
bf_1_4/U120/Y (MUX2X1) 0.04 9.79 r
bf_1_4/U119/Y (INVX1) 0.04 9.84 f
bf_1_4/imag_out0[14] (bf_1_1) 0.00 9.84 f
bf_2_4/imag_in1[14] (bf_2_1) 0.00 9.84 f
bf_2_4/U257/Y (MUX2X1) 0.08 9.91 r
bf_2_4/U241/Y (INVX1) 0.09 10.00 f
bf_2_4/adder1/B[14] (vma16_3) 0.00 10.00 f
bf_2_4/adder1/ipg16/B[14] (p_g_16_3) 0.00 10.00 f
0.00 10.04 f
0.00 10.09 f
0.00 10.09 f
0.00 10.13 f
0.00 10.13 f
0.00 10.16 f
0.00 10.16 f
0.00 10.23 f
bf_2_4/adder1/ixor16/U10/Y (XOR2X1) 0.06 10.29 r
bf_2_4/adder1/ixor16/S[15] (xor_16_3) 0.00 10.29 r
bf_2_4/adder1/S[15] (vma16_3) 0.00 10.29 r
bf_2_4/U218/Y (AOI22X1) 0.04 10.33 f
bf_2_4/U54/Y (BUFX2) 0.11 10.45 f
bf_2_4/U217/Y (NAND2X1) 0.08 10.53 r
bf_2_4/imag_out0[15] (bf_2_1) 0.00 10.53 r
imag_out[15] (out) 0.00 10.53 r
data arrival time 10.53
--------------------------------------------------------------------------
(Path is unconstrained)
Results of Area
The result of silicon area is shown as follows:
****************************************
Report : area
Design : fft_1024
Version: A-2007.12
Date : Tue Dec 22 07:40:53 2009
****************************************
Library(s) Used:
gscl45nm (File: /home/class/zhan0915/fft3/gscl45nm.db)
Number of ports: 66
Number of nets: 39672
Number of cells: 38565
Number of references: 31
Combinational area: 70080.567213
Noncombinational area: 262240.141182
Net Interconnect area: undefined (No wire load specified)
Total cell area: 332320.708395
Total area: undefined
Information: This design contains black box (unknown) components. (RPT-8)
Conclusion of the Phase2
In the second stage, a 1024-point radix-2^2 FFT module is designed. Synthesis results show that
the FFT occupies a silicon area of 332320.708 µm^2. The critical path of the FFT is
10.53 ns, and its total dynamic power consumption is 45.4214 mW. Fig. 16 is the
schematic view of the FFT.
Fig. 16 Schematic View of 1024-points FFT
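The reported critical paths can be converted into rough upper bounds on clock frequency. The Python sketch below does this arithmetic under the assumptions that the clock period equals the reported critical path (both timing reports are unconstrained, so these are estimates only) and that the SDF pipeline accepts one sample per clock cycle. The function names are hypothetical.

```python
def max_clock_mhz(critical_path_ns):
    """Upper bound on clock frequency if the period equals the critical path."""
    return 1e3 / critical_path_ns


def block_time_us(critical_path_ns, n_points=1024):
    """Time to stream one block of n_points samples, one sample per clock (assumed)."""
    return n_points * critical_path_ns / 1e3


if __name__ == "__main__":
    for name, path_ns in (("multiplier", 1.73), ("fft_1024", 10.53)):
        print(f"{name}: {max_clock_mhz(path_ns):.2f} MHz")
    print(f"fft_1024 block time: {block_time_us(10.53):.2f} us")
```

The FFT path is roughly six times longer than the multiplier path because, as the timing report shows, it runs from the counter through several butterflies and multipliers before reaching the output port.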
REFERENCES
[1] Shousheng He and Mats Torkelson, "A New Approach to Pipeline FFT Processor," 15-19 April
1996, pp. 766-770, DOI: 10.1109/IPPS.1996.508145.
[2] Shousheng He and Mats Torkelson, "Design and Implementation of a 1024-point Pipeline FFT
Processor," 11-14 May 1998, pp. 131-134, DOI: 10.1109/CICC.1998.694922.
[3] S. He and M. Torkelson, "A complex array multiplier using distributed arithmetic," in Proc. IEEE
CICC'96, pp. 71-74, San Diego, CA, May 1996.
[4] M. Garrido, K. Parhi, and J. Grajal, "A Pipelined FFT Architecture for Real-Valued Signals," vol.
PP, 2009, pp. 1-1, DOI: 10.1109/TCSI.2009.2017125.
[5] Kia Bazargan, University of Minnesota Class Handouts, EE 5324 - VLSI Design II, Spring 2006.
[6] M. W. Kharrat, M. A. Ben Ayed, M. Loulou, N. Masmoudi, and L. Kamoun, "A new method to
implement a constant operand multiplier," The 14th International Conference on Microelectronics
(ICM 2002), 11-13 Dec. 2002, pp. 62-65.
[7] Saeeid Tahmasbi Oskuii, Per Gunnar Kjeldsberg, and Oscar Gustafsson, "Power Optimized Partial
Product Reduction Interconnect Ordering in Parallel Multipliers."