I. INTRODUCTION
Fig. 3. Complementary delay line with inverter delay elements for improved phase resolution.
B. Delay-Line Improvements
To overcome some of these limitations, we developed a
complementary delay line as shown in Fig. 3 for our DLL.
In this architecture, two parallel delay lines with weak cross
coupling are driven by complementary input signals ClkIn and
Fig. 5. Phasor diagram with phasors of signals from the taps of a complementary delay line with one inverter delay = 50°.
indicating the first 180° of delay in the delay lines. The first
state transition in the EOC code indicates the first true tap
from the delay line that provides a signal with phase that
lags the phase of the signal from Tap 1 by more than 180°.
With this information, the DLL logic knows when to switch
between the true and complement taps of the delay line to
ensure full coverage of all 360° of phase space, with phase
steps of at most one inverter delay. Use of the EOC code also
prevents negative phase steps in the phase-transfer function as
taps are successively selected from the delay line. This allows
the complementary delay lines to provide infinite, monotonic
phase range for the DLL. The clocking signal for the EOC
detector, SampClk, is synchronized to the signal from Tap 1
by a replica timing network (not shown).
To illustrate the principle of infinite phase generation using
the EOC code with this delay-line scheme, refer to Fig. 5,
which shows a phasor diagram of the signals from the first
five true and complement taps of a complementary delay line
like the one shown in Fig. 3. The figure assumes that the
PVT conditions and operating frequency are such that the
propagation delay of each inverter stage is equal to 50° of
phase. In the figure, the solid lines correspond to signals from
the true taps, while dashed lines correspond to signals from
the complement taps. Because Tap 5 delivers a signal that is
delayed by 200° from the signal at Tap 1, the EOC detector's
thermometer code would indicate that Tap 5 is the first true
tap to provide a signal with phase beyond 180° relative to the
signal from Tap 1. With this information, the DLL knows to
switch between the true and complement taps after four stages.
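The Fig. 5 example can be checked numerically. The following is a minimal sketch, assuming an idealized delay line in which every stage adds exactly the same phase and each complement tap sits exactly 180° from its true tap. The function names and the 0/1 encoding of the thermometer code are illustrative assumptions, not the encoding used by the EOC detector in the DLL.

```python
def tap_phases(stage_delay_deg, num_taps):
    """Phase of each true tap relative to Tap 1, in degrees (Tap 1 = 0)."""
    return [(tap - 1) * stage_delay_deg for tap in range(1, num_taps + 1)]


def eoc_thermometer(stage_delay_deg, num_taps):
    """Assumed encoding: 1 for every tap that lags Tap 1 by more than 180 degrees."""
    return [1 if phase > 180.0 else 0
            for phase in tap_phases(stage_delay_deg, num_taps)]


def first_tap_beyond_180(stage_delay_deg, num_taps):
    """1-based index of the first true tap beyond 180 degrees, or None."""
    code = eoc_thermometer(stage_delay_deg, num_taps)
    return code.index(1) + 1 if 1 in code else None


def selectable_phases(stage_delay_deg, num_taps):
    """True taps up to the switch point, then the matching complement taps."""
    switch_tap = first_tap_beyond_180(stage_delay_deg, num_taps)
    if switch_tap is None:
        raise ValueError("delay line does not span 180 degrees")
    true_part = tap_phases(stage_delay_deg, switch_tap - 1)
    return true_part + [phase + 180.0 for phase in true_part]


# Fig. 5 conditions: 50 degrees per inverter stage, five taps.
phases = selectable_phases(50.0, 5)
# Phase steps around the full circle, including the wrap from the last tap to Tap 1.
steps = [(b - a) % 360.0 for a, b in zip(phases, phases[1:] + phases[:1])]
```

Under these assumptions the switch happens after four stages, and every step around the circle, including the two at the true/complement boundaries, is no larger than one inverter delay.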
Although the delay-line improvements discussed above reduced the required power and area of the delay line, improved
its jitter accumulation performance, enabled infinite phase
range, and improved the available phase resolution by a factor
of two, this phase resolution was still not good enough to
meet the requirements of our memory system. In the 0.4-µm
process we used, the propagation delay of one inverter over all
anticipated PVT conditions varied from 100 to 300 ps. This
is much larger than the worst case phase step specification of
50 ps. Therefore, to ensure compliance with this specification,
the DLL's phase resolution needed to be improved by at least
six times over what the delay line provided.
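The factor of six follows directly from the worst-case numbers. The sketch below repeats that arithmetic and, as a purely hypothetical extension, asks how many resolution-doubling interpolation stages would cover it. The doubling-per-stage assumption and the function names are illustrative and are not a description of the circuit that was built.

```python
import math


def required_improvement(worst_stage_delay_ps, max_step_ps):
    """Ratio by which the delay-line phase step must shrink to meet the spec."""
    return worst_stage_delay_ps / max_step_ps


def doubling_stages_needed(factor):
    """Smallest n with 2**n >= factor (hypothetical: each stage doubles resolution)."""
    return max(0, math.ceil(math.log2(factor)))


factor = required_improvement(300.0, 50.0)   # slowest inverter vs. 50-ps spec
stages = doubling_stages_needed(factor)      # hypothetical stage count
worst_step_ps = 300.0 / 2 ** stages          # resulting worst-case step
```

With the slowest inverter at 300 ps, three hypothetical doubling stages would bring the worst-case step to 37.5 ps, inside the 50-ps specification.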
To solve this problem, we used inverter phase blending. A simple,
single-stage phase-blender circuit is shown in Fig. 6(a). This
circuit receives two phase-adjacent input signals, which are
separated in phase by one inverter delay. The phase blender
directly passes these two signals with a simple delay to produce
two output signals. However, it also uses a pair of phase-blending
inverters to interpolate between these two input signals to
produce a third output signal, having a phase between that of
the two directly passed outputs. This effectively doubles the
available phase resolution.
However, it is not sufficient to use equal-sized inverters
for the phase blending. Fig. 6(b) illustrates a simple model
[12] used for determining the ideal relative sizes of the two
phase-blending inverters to ensure that the phase of the blended
output lies directly between that of the two directly passed
outputs. The model approximates the two inverters with two
simple switched current sources sharing a common
resistance-capacitance (RC) load. For two rising-edge input
signals separated in time by a fixed interval, the model
yields the equation
(2)
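The effect of inverter sizing in this kind of model can be explored numerically. The sketch below is an assumption-laden illustration, not a reconstruction of (2): it treats the two blending inverters as ideal current sources that switch on at t = 0 and t = dt, carry fractions w and 1 - w of the total current, and discharge a shared RC load from the supply toward ground. The symbols `w`, `dt`, `rc`, and `vdd` and all function names are introduced here for illustration only.

```python
import math


def blend_output(t, w, dt, rc=1.0, vdd=1.0):
    """Output voltage of the assumed two-source RC model at time t."""
    def step_response(x):
        # Normalized RC response to a current source that turns on at x = 0.
        return 1.0 - math.exp(-x / rc) if x > 0.0 else 0.0
    return vdd * (1.0 - w * step_response(t) - (1.0 - w) * step_response(t - dt))


def crossing_time(w, dt, rc=1.0, vdd=1.0):
    """Time at which the output falls through vdd / 2, found by bisection."""
    lo, hi = 0.0, dt + 20.0 * rc
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if blend_output(mid, w, dt, rc, vdd) > 0.5 * vdd:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def midpoint_error(w, dt, rc=1.0):
    """Blended crossing time minus the midpoint of the early and late crossings."""
    early = crossing_time(1.0, dt, rc)   # only the early source drives the load
    late = crossing_time(0.0, dt, rc)    # only the late source drives the load
    return crossing_time(w, dt, rc) - 0.5 * (early + late)


# Example with the input separation equal to one RC time constant (assumed value).
error_equal = midpoint_error(0.50, dt=1.0)
error_skewed = midpoint_error(0.60, dt=1.0)
```

In this simplified model, equal weighting (w = 0.50) places the blended edge later than the midpoint, and shifting more weight to the earlier input (w = 0.60) moves it closer, which is consistent with the comparison of w = 0.50 and w = 0.60 in Fig. 6(d) and (e). The exact optimum depends on the ratio of the input separation to the RC time constant.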
Fig. 6. Phase blending for phase-resolution improvement. (a) Single-stage phase-blender circuit, (b) simple model of phase-blending inverters, (c) plot of signal voltages in the simple model for w = WA/(WA + WB) = 0.50, (d) phase-blender output signal edges for w = 0.50, and (e) phase-blender output signal edges for w = 0.60.
Fig. 9. (a) A 16:1 duty-cycle correcting multiplexer circuit. (b) Duty-cycle correction control circuit.
Fig. 10.
Fig. 11. Test-chip micrograph showing on the left side (a) the analog DLL
of [6] and on the right side (b) the new digital DLL integrated into identical
interface cells.
Fig. 12. Measured transmit eye diagrams at 3.3 V and 400 MHz of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.
V. MEASURED PERFORMANCE
A. Test Chip
Both the digital DLL presented here and an implementation
of the analog DLL of Donnelly et al. [6] were integrated into
identical high-speed CMOS interface cells on opposite sides
of a single test chip. A micrograph of this test chip is shown in
Fig. 11. The test chip I/O was laid out symmetrically so that
either interface cell could be tested on the same hardware by
simply removing the test chip from the test socket, rotating
it 180°, and reinserting it into the socket. This allowed a
true side-by-side comparison of the two DLLs operating in a
system. The test-chip circuits were fabricated using a standard
0.4-µm, 3.3-V CMOS process with 0.65-V threshold voltages.
B. Test Results
Unless indicated otherwise, all test results described in this
section were measured with the analog and digital DLLs
operating in their respective high-speed interface cells at 3.3 V
and 400 MHz (800 Mb/s/pin) using the same test vectors.
Additionally, the test chip included noise-generator circuits,
which produced digital switching noise during the testing of
both interfaces.
Fig. 12(a) and (b) show eye diagrams of the two interfaces
with the analog and digital DLLs, respectively. The diagrams
indicate the output timing performance of the interface cells
in the test system. Although the interface with the analog
DLL provided slightly better timing performance, 320 ps
peak-to-peak versus 380 ps peak-to-peak for the interface with
the digital DLL, the performances of both interfaces (and
therefore, both DLLs) were comparable. This result is
surprisingly good for the digital DLL considering the
extensive use of elements with poor PSRR, such as inverters, in
its signal path. (Note: I/O circuit duty-cycle distortion
produced the unequal eyes in both diagrams.
This is unrelated to the DLLs.)
Fig. 13(a) and (b) show receive shmoo diagrams for the
two interfaces with the analog and digital DLLs, respectively.
The diagrams indicate the CMOS interfaces' valid timing windows
for receiving data. On the diagrams, one axis is supply
voltage (2.5 V to 4.0 V) while the other axis indicates input
data positioning along a bit period (1/(800 Mb/s) = 1.25 ns).
The normal data position is in the center of the bit period. A
black dot in the diagram indicates incorrectly received data for
that combination of bit position and supply voltage. Ideally,
the window should be entirely white, but realistically, it is
limited by jitter from the DLL and other sources. Therefore,
this test measures the amount of tolerable skew on the input
timing over a range of supply voltages. Although the interface
with the analog DLL delivers better timing performance than the
interface with the digital DLL (1.02 versus 0.92 ns), both meet
the component specification of 0.85 ns.
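As a quick sanity check on these receive windows, the sketch below relates each measured window to the 0.85-ns specification and to the bit period implied by the 800-Mb/s/pin data rate. The helper name and the dictionary layout are illustrative; only the numeric inputs come from the measurements quoted above.

```python
BIT_RATE_MBPS = 800.0                 # per-pin data rate at 400 MHz
BIT_PERIOD_NS = 1e3 / BIT_RATE_MBPS   # 1.25 ns per bit
SPEC_WINDOW_NS = 0.85                 # component specification

measured_windows_ns = {"analog": 1.02, "digital": 0.92}


def window_report(window_ns):
    """Return (margin over spec in ns, window as a fraction of the bit period)."""
    return window_ns - SPEC_WINDOW_NS, window_ns / BIT_PERIOD_NS


report = {name: window_report(w) for name, w in measured_windows_ns.items()}
for name, (margin_ns, fraction) in report.items():
    print(f"{name}: margin {margin_ns:.2f} ns, {fraction:.1%} of bit period")
```

Both windows clear the specification, by roughly 0.17 ns for the analog DLL and 0.07 ns for the digital DLL.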
Fig. 14 is a circle plot of the measured phase of the DLL's
output signal ClkOut, illustrating the DLL's ability to provide
infinite phase range. One axis indicates delay [or phase, as in
(1)] of the ClkOut signal relative to a fixed 400-MHz signal;
the other indicates cycle count. These data were measured by
probing the on-chip DLL output signal (ClkOut) and forcing
the DLL's phase-detector output low. This caused the DLL's
output phase to continually advance over time. The term "circle
plot" is used because this diagram is equivalent to sweeping a
phasor that represents the phase of ClkOut around the phase
plane, thereby drawing a circle in the phase plane. Because
the phase of ClkOut is measured relative to a fixed 400-MHz
signal, the plotted delay appears modulo 2.5 ns, where 2.5 ns
is the period of the 400-MHz signal.
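The wrap-around in such a plot can be illustrated with a short sketch. It assumes an idealized DLL whose output delay advances by one fixed step on every update; the 50-ps step is a hypothetical value chosen to match the worst-case step specification, and the function names are illustrative.

```python
def clock_period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in megahertz."""
    return 1e6 / freq_mhz


def wrapped_delays(step_ps, num_updates, freq_mhz=400.0):
    """Delay relative to the fixed reference after each update, modulo one period."""
    period = clock_period_ps(freq_mhz)
    return [(k * step_ps) % period for k in range(num_updates)]


# Hypothetical 50-ps step: the delay climbs to one period and then wraps to zero.
delays = wrapped_delays(50.0, 120)
wrap_points = [k for k, d in enumerate(delays) if k > 0 and d == 0.0]
```

With these assumed values the delay returns to zero every 50 updates, so an unbounded phase advance appears as a repeating sawtooth bounded by the 2.5-ns period.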
Fig. 13. Measured shmoo diagrams showing the 400-MHz receive timing windows of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.
Fig. 14. Measured circle plot illustrating the infinite phase transfer characteristic of the digital DLL.
Fig. 15.
TABLE I
ANALOG AND DIGITAL DLL PERFORMANCE SUMMARY AT 3.3 V AND 400 MHz
Although the analog DLL uses less power and area, and provides
better timing performance (smaller long-term jitter) and phase
resolution (smaller maximum phase step), both DLLs enable the
interface cells to meet the component requirements when
operating in the test system. Additionally, the digital DLL has
a higher maximum operating frequency, works at lower supply
voltages, and requires much less effort to port to other
processes (one versus four man-months).
Fig. 15(a) and (b) show plots of measured DLL power
versus frequency at a supply voltage of 3.3 V and measured DLL
power versus supply voltage at 400 MHz, respectively. Although
both plots show that the digital DLL dissipated more power
than the analog DLL for all measured conditions, the plots
illustrate the different characteristics of the power consumed by
the two DLLs. As mentioned earlier, the power of both DLLs
is distributed between IV power in the constant-current stages
and CV²f power in the CMOS stages. The curves in Fig. 15(a)
show that the digital DLL's power dissipation has a greater
dependence on frequency than does the analog DLL's power.
The curves in Fig. 15(b) show that the digital DLL's power
dissipation has a predominantly square-law dependence on
supply voltage, whereas the analog DLL's power dissipation
has a mixed square-law and linear dependence. These trends
confirm that the power of the analog DLL has a relatively
higher IV term, whereas the power of the digital DLL has a
relatively higher CV²f term.
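The voltage trends described for Fig. 15(b) follow from a two-term power model, P = IV + CV²f. The sketch below computes the local power-law exponent of P with respect to supply voltage, which is 1 for a purely IV-dominated design and 2 for a purely CV²f-dominated one. The current and capacitance values are hypothetical placeholders chosen only to show the two regimes; they are not measured values for either DLL.

```python
def dll_power(v, f, i_static, c_eff):
    """Two-term power model: constant-current (I*V) plus switching (C*V^2*f)."""
    return i_static * v + c_eff * v ** 2 * f


def voltage_exponent(v, f, i_static, c_eff):
    """Local exponent n in P ~ V^n, i.e. d(ln P) / d(ln V) for the model above."""
    static = i_static * v
    dynamic = c_eff * v ** 2 * f
    return (static + 2.0 * dynamic) / (static + dynamic)


V, F = 3.3, 400e6  # test conditions: 3.3 V, 400 MHz

# Hypothetical coefficients (amperes, farads) for illustration only.
analog_like = voltage_exponent(V, F, i_static=10e-3, c_eff=5e-12)
digital_like = voltage_exponent(V, F, i_static=1e-3, c_eff=20e-12)
```

With these placeholder values the IV-heavy case gives an exponent near 1.4 (mixed linear and square-law), while the CV²f-heavy case gives an exponent close to 2 (predominantly square-law).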
[2] J.-M. Han, J. Lee, S. Yoon, S. Jeong, C. Park, I. Cho, S. Lee, and D. Seo,
"Skew minimization techniques for 256 Mb synchronous DRAM and
beyond," in VLSI Circuits Dig. Tech. Papers, June 1996, pp. 192-193.
[3] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H.
Tsuboi, S. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura,
K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T.
Anezaki, M. Hasegawa, and M. Taguchi, "A 256 Mb SDRAM using
register-controlled digital DLL," in ISSCC 1997 Dig. Tech. Papers, Feb.
1997, pp. 72-73.
[4] T. Lee, K. Donnelly, J. Ho, J. Zerbe, M. Johnson, and T. Ishikawa, "A
2.5 V CMOS delay-locked loop for 18 Mbit, 500 megabyte/s DRAM,"
IEEE J. Solid-State Circuits, vol. 29, pp. 1491-1496, Dec. 1994.
[5] S. Sidiropoulos and M. Horowitz, "A semidigital dual delay-locked
loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683-1692, Nov. 1997.
[6] K. Donnelly, Y. Chan, J. Ho, C. Tran, S. Patel, B. Lau, J. Kim, P.
Chau, C. Huang, J. Wei, L. Yu, R. Tarver, R. Kulkarni, D. Stark, and M.
Johnson, "A 660MB/s interface megacell portable circuit in 0.3 µm-0.7
µm CMOS ASIC," IEEE J. Solid-State Circuits, vol. 31, pp. 1995-2003,
Dec. 1996.
[7] N. Kushiyama, S. Ohshima, D. Stark, H. Noji, K. Sakurai, S. Takase,
T. Furuyama, R. Barth, A. Chan, J. Dillon, J. Gasbarro, M. Griffin,
M. Horowitz, T. Lee, and V. Lee, "A 500-Megabyte/s data-rate 4.5M
DRAM," IEEE J. Solid-State Circuits, vol. 28, pp. 490-508, Apr. 1993.
[8] M. Hasegawa, M. Nakamura, S. Narui, S. Ohkuma, Y. Kawase, H.
Endoh, S. Miyatake, T. Akiba, K. Kawakita, M. Yoshida, S. Yamada, T.
Sekigguchi, I. Asano, Y. Tadaki, R. Nagai, S. Miyaoka, K. Kajigaya, M.
Horiguchi, and Y. Nakagome, "A 256 Mb SDRAM with subthreshold
leakage current suppression," in ISSCC 1998 Dig. Tech. Papers, Feb.
1998, pp. 80-81.
[9] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara,
T. Matano, Y. Hoshino, K. Miyano, S. Isa, E. Kakehashi, J. Drynan,
M. Komuro, T. Fukase, H. Iwasaki, J. Sekine, M. Igeta, N. Nakanishi,
T. Itani, K. Yoshida, H. Yoshino, S. Hashimoto, T. Yoshii, M. Ichinose,
T. Imura, M. Uziie, K. Koyama, Y. Fukuzo, and T. Okuda, "A 2.5
ns clock access 250 MHz 256 Mb SDRAM with synchronous mirror
delay," in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 374-375.
[10] B. Garlepp, K. Donnelly, J. Kim, P. Chau, J. Zerbe, C. Huang, C. Tran,
C. Portmann, D. Stark, Y. Chan, T. Lee, and M. Horowitz, "A portable
digital DLL architecture for CMOS interface circuits," in VLSI Circuits
Dig. Tech. Papers, June 1998, pp. 214-215.
[11] M. Griffin, J. Zerbe, A. Chan, Y. Jun, Y. Tanaka, W. Richardson, G.
Tsang, M. Ching, C. Portmann, Y. Li, B. Stonecypher, L. Lai, K. Lee,
V. Lee, D. Stark, H. Modarres, P. Batra, J. Louis-Chandran, J. Privitera,
T. Thrush, B. Nickell, J. Yang, V. Hennon, and R. Sauve, "A process
independent 800 MB/s DRAM bytewide interface featuring command
interleaving and concurrent memory operation," in ISSCC 1998 Dig.
Tech. Papers, Feb. 1998, pp. 156-157.
[12] S. Sidiropoulos, "High-performance interchip signalling," Ph.D.
dissertation, Computer Systems Laboratory, Stanford University,
Stanford, CA, Apr. 1998. Available as Tech. Rep. CSL-TR-98-760 from
http://elib.stanford.edu/.
[13] I. Young, M. Mar, and B. Bhushan, "A 0.35 µm CMOS 3-880 MHz
PLL N/2 multiplier and distribution network with low jitter for
microprocessors," in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 330-331.
[14] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, "A 320 MHz,
1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation,"
in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 132-133.
[15] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, "A 600
MHz CMOS PLL microprocessor clock generator with a 1.2 GHz
VCO," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 396-397.
Charles Huang received the B.S. degree in electrical engineering from the University of Fuzhou,
China, in 1982 and the M.S. degree in electrical
engineering from the University of Arkansas, Fayetteville, in 1990.
He was with ULSI and SGI, working in the area
of PLL and cache circuit design. He joined Rambus,
Inc., Mountain View, CA, in 1994, where he has
been engaged in high-speed CMOS DLL and I/O
circuit design.
Thomas H. Lee (S'87-M'87), for a photograph and biography, see this issue,
p. 585.
Donald Stark received the B.S. degree from the
Massachusetts Institute of Technology, Cambridge,
in 1985 and the M.S. and Ph.D. degrees from
Stanford University, Stanford, CA, in 1987 and
1991, respectively, all in electrical engineering.
His research interests at Stanford included circuit
design and CAD tools for analysis of voltage and
current distributions in VLSI circuits. From 1987
to 1991, he was also a Member of the Western
Research Laboratory, Digital Equipment Corp., Palo
Alto, CA, working on CAD development and ECL
circuit design. From 1991 to 1993, he was with the Semiconductor Device
Engineering Laboratory, Toshiba Corp., Kawasaki, Japan, working on DRAM
design. In 1993, he joined Rambus, Inc., Mountain View, CA, where he
currently works on DRAM, high-speed I/O design, and CAD.
Mark A. Horowitz, for a photograph and biography, see p. 528 of the April
1999 issue of this JOURNAL.