IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999
A Portable Digital DLL for High-Speed CMOS Interface Circuits
Bruno W. Garlepp, Kevin S. Donnelly, Associate Member, IEEE, Jun Kim, Pak S. Chau,
Jared L. Zerbe, Charles Huang, Chanh V. Tran, Clemenz L. Portmann, Member, IEEE,
Donald Stark, Yiu-Fai Chan, Member, IEEE, Thomas H. Lee, Member, IEEE, and Mark A. Horowitz
Abstract: A digital delay-locked loop (DLL) that achieves
infinite phase range and 40-ps worst case phase resolution at
400 MHz was developed in a 3.3-V, 0.4-µm standard CMOS
process. The DLL uses dual delay lines with an end-of-cycle detector, phase blenders, and duty-cycle correcting multiplexers. This
more easily process-portable DLL achieves jitter performance
comparable to a more complex analog DLL when placed into
identical high-speed interface circuits fabricated on the same
test-chip die. At 400 MHz, the digital DLL provides <250 ps
peak-to-peak long-term jitter at 3.3 V and operates down to 1.7 V,
where it dissipates 60 mW. The DLL occupies 0.96 mm².
Index Terms: Delay circuits, delay-locked loops (DLLs), digital control, digital DLL, phase blending, phase control, phase
synchronization.
I. INTRODUCTION
IN RECENT years, there has been a great deal of interest
in delay-locked loops (DLLs) for clock alignment. Both
analog and digital DLLs have been developed [1]–[6], with
analog loops generally providing better jitter performance
at the expense of greater complexity. This paper describes
a digital DLL that achieves jitter performance comparable
to an analog DLL. Although the digital DLL uses more
area and power than the analog DLL, its greater simplicity,
easier portability, and lower minimum required supply voltage
make it very attractive in many clock alignment applications.
Additionally, the digital DLL not only operates at lower supply
voltages than the analog DLL but it also demonstrates that
digital DLLs have the potential for good power-consumption
scaling as supply voltage is decreased.
The motivation for the development of this digital DLL
was the need for a clock alignment circuit for use in the
CMOS interface cells [6] of a high-speed memory system
as in [7].¹ The memory system operates at 400 MHz, with
data transferred on both edges of the clock, producing an
effective 800-Mb/s/pin transfer rate. This corresponds to a
1.25-ns bit time. With such tight timing requirements, it
becomes imperative to include clock alignment circuits in
the interface cells to provide internal on-chip clocks that
are aligned in phase with an external system clock. The
clock alignment circuits must provide a phase resolution
better than 50 ps and produce a worst case long-term jitter
of less than 250 ps peak-to-peak (pp). To facilitate the
use of many different application-specific integrated-circuit
controllers with the memory system, the clock alignment
circuit should be easily portable across multiple processes
without compromising performance.

Manuscript received September 15, 1998; revised December 23, 1998.
B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang,
C. V. Tran, C. L. Portmann, D. Stark, and Y.-F. Chan are with Rambus, Inc.,
Mountain View, CA 94040 USA.
T. H. Lee and M. A. Horowitz are with the Center for Integrated Systems,
Stanford University, Stanford, CA 94305 USA.
Publisher Item Identifier S 0018-9200(99)03668-9.
¹ Documentation is available at http://www.rambus.com/html/direct_documentation.html.
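The timing numbers quoted above follow from simple arithmetic. The short Python sketch below reproduces the bit time and expresses the two requirements as fractions of a bit time; the function and constant names are illustrative and do not come from the paper.

```python
def bit_time_s(clock_hz: float, edges_per_cycle: int = 2) -> float:
    """Bit time when one bit is sent on each of `edges_per_cycle` clock edges."""
    return 1.0 / (clock_hz * edges_per_cycle)


CLOCK_HZ = 400e6        # system clock frequency from the text
MAX_STEP_S = 50e-12     # phase-resolution requirement from the text
MAX_JITTER_S = 250e-12  # long-term jitter requirement (pp) from the text

t_bit = bit_time_s(CLOCK_HZ)
step_fraction = MAX_STEP_S / t_bit
jitter_fraction = MAX_JITTER_S / t_bit

print(f"bit time = {t_bit * 1e12:.0f} ps")
print(f"phase step uses {step_fraction:.0%} of a bit time")
print(f"jitter uses {jitter_fraction:.0%} of a bit time")
```

Under these numbers the jitter requirement alone consumes one fifth of the 1.25-ns bit time, which is why the resolution and jitter targets are stated so tightly.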
The clock alignment function can be provided using either
phase-locked loops (PLLs) or DLLs. Because frequency synthesis is not needed in this application, DLLs are preferred for
their unconditional stability, lower phase-error accumulation,
and faster locking time. In previous designs of the interface
cells for this memory system, we have used an analog DLL
with a two-step coarse/fine architecture. A high-level drawing
of this approach is shown in Fig. 1. This analog DLL includes
a quadrature generator, which produces four reference signals
spaced 90° apart in phase to evenly cover the full 360°
of phase space. A phase interpolator circuit in the analog
DLL receives these reference signals and selects a phase-adjacent
pair that defines a phase quadrant for interpolation to
produce an output signal phase-aligned to a reference signal,
RefClk.
Analog DLLs constructed with this approach provide several significant benefits. Because most of the elements in the
signal path can be made from differential analog blocks with
good power-supply rejection ratio (PSRR), the analog DLL
architecture of Fig. 1 can provide very good jitter performance.
Additionally, it can be carefully designed to occupy relatively
little area and consume relatively little current. Furthermore,
the analog DLL can provide very small phase steps when
locked ( 50 ps). Finally, the architecture of Fig. 1 provides
infinite phase range, and one set of quadrature reference
signals can be fed to multiple phase interpolators, allowing
phase alignment to multiple reference signals simultaneously.
However, because of the relatively high analog complexity of
this DLL and its individual elements, the analog DLL of Fig. 1
requires a detailed, process-specific implementation, making it
relatively labor intensive to port across multiple processes.
0018-9200/99$10.00 © 1999 IEEE
GARLEPP et al.: PORTABLE DIGITAL DLL

Fig. 1. Block diagram of a two-step, coarse/fine analog DLL architecture.

Although we have traditionally used analog DLLs to provide the clock alignment function in the CMOS interface
cells of the memory system described above, we decided to
consider using a digital DLL. Digital DLLs are characterized
by their use of a digital delay line and are typically made from
simple, digital circuit elements. This facilitates their design and
portability across multiple processes. Additionally, because
phase information in a digital DLL is stored as a digital
state, digital DLLs can provide very fast timing recovery after
being placed into a low power mode. However, conventional
digital DLLs provide only moderate phase resolution and jitter
performance [8], [9].
Another benefit of digital DLLs is their ability to readily
operate at lower voltages than analog DLLs. Because analog
DLLs require the use of saturated current sources, they
experience voltage headroom problems as supply voltages
decrease. Digital DLLs, on the other hand, need only enough
voltage to ensure the proper operation of their digital gate
elements. For the same reason, digital DLLs better utilize
the power-saving benefits of digital CMOS voltage scaling
than analog DLLs. The power of an analog DLL is typically
distributed between IV power (where I is current and V is
voltage) from the constant current (differential) stages and
CV²f power (where C is capacitance and f is frequency) from
the CMOS (single-ended) stages (if any). The power of digital
DLLs, on the other hand, is determined primarily by CV²f
power, which decreases quadratically with supply voltage.
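The difference between the two scaling laws can be made concrete with a small Python sketch. It assumes that capacitance, frequency, and bias current stay fixed while only the supply voltage changes, which is a simplification; the function names are illustrative.

```python
def cv2f_power_ratio(v_new: float, v_ref: float) -> float:
    """Relative CV^2 f power at v_new versus v_ref (C and f held constant)."""
    return (v_new / v_ref) ** 2


def iv_power_ratio(v_new: float, v_ref: float) -> float:
    """Relative IV power at v_new versus v_ref (bias current held constant)."""
    return v_new / v_ref


V_REF = 3.3  # nominal supply from the abstract
V_LOW = 1.7  # minimum operating supply from the abstract

print(f"CV^2 f power scales to {cv2f_power_ratio(V_LOW, V_REF):.2f} of nominal")
print(f"IV power scales to     {iv_power_ratio(V_LOW, V_REF):.2f} of nominal")
```

The sketch only compares the scaling trends; it does not predict the absolute power of either DLL.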
This paper describes a digital DLL [10] used as the clock
alignment circuit in the CMOS interface cells of a high-speed
memory system. This work improves upon the performance of
previous digital DLLs by paralleling the two-step coarse/fine
analog DLL architectures presented in [4], [5], [7], and [11],
allowing the digital DLL to achieve jitter performance comparable to the analog DLLs.
This paper is arranged as follows. Section II describes
delay-generation techniques used in conventional digital
DLLs and describes the improved techniques implemented
in the new DLL. This section also describes infinite phase
generation with the new delay-line scheme. Section III
describes several new circuit techniques used for enhancing
the phase resolution and signal quality in the new digital DLL.
Section IV describes the overall DLL architecture. Section V
discusses our test chip and measured results, with special
attention given to making a direct, side-by-side comparison of
the new digital DLL with an analog DLL placed into identical
CMOS interface cells on the same test-chip die. Section VI
concludes this paper.
The terms phase and delay are used throughout this paper
to describe the DLL's operation. It is helpful to recall that at a
given system frequency, the two quantities are related by the
simple equation

$$\phi = 360 \cdot f \cdot t_d \qquad (1)$$

where $\phi$ is phase in degrees, $t_d$ is delay in seconds, and $f$ is frequency in hertz.
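The relation between phase and delay (phase in degrees equals 360 times frequency times delay) is used repeatedly in what follows, so a minimal Python sketch of the conversion in both directions may be helpful. The function names are illustrative.

```python
def delay_to_phase_deg(delay_s: float, freq_hz: float) -> float:
    """Phase in degrees corresponding to a delay at a given frequency."""
    return 360.0 * freq_hz * delay_s


def phase_deg_to_delay(phase_deg: float, freq_hz: float) -> float:
    """Delay in seconds corresponding to a phase at a given frequency."""
    return phase_deg / (360.0 * freq_hz)


FREQ_HZ = 400e6  # system frequency from the text

# A 50-ps step at 400 MHz, expressed as phase.
print(f"{delay_to_phase_deg(50e-12, FREQ_HZ):.1f} degrees")
# Half a cycle (180 degrees) at 400 MHz, expressed as delay.
print(f"{phase_deg_to_delay(180.0, FREQ_HZ) * 1e9:.2f} ns")
```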
II. DIGITAL DELAY CIRCUIT TECHNIQUES

A. Conventional Digital Delay Lines
As mentioned above, the purpose of a DLL in a clock
alignment application is to provide an output clock signal that
is aligned in phase with a reference clock signal of the same
frequency. To do this, the DLL must include a mechanism for
providing a variable delay to an input signal. The DLL then
adjusts this variable delay such that the input signal passes
through the delay mechanism and emerges at the output of the
DLL aligned in phase with the reference signal.
Digital DLLs generally incorporate a tapped digital delay
line as the variable-delay mechanism. The delay line receives
an input clock signal (e.g., a buffered version of the reference
signal) and passes it through a series of delay elements. The
outputs of the delay elements are tapped and buffered to
provide a series of phase-adjacent signals. The DLL then
selects the delay-line tap that provides the signal that produces
an output with a phase that most closely matches the desired
phase.
A conventional delay line suitable for a CMOS digital DLL
is shown in Fig. 2. The delay elements could be implemented
with almost any circuit block, but because the phase resolution
of the delay line is determined by the delay through the delay
elements, delay elements that provide minimal delay are generally preferred. Thus, the delay line of Fig. 2 uses inverters,
since they provide the shortest delay of any CMOS digital gate.
Because of the inverting characteristic of all standard CMOS
gates, the delay line is tapped only at every other inverter
output to ensure that each successive tap provides a signal
that is adjacent in phase to the signals at its adjacent taps.

Fig. 2. Conventional digital delay line with inverter delay elements.
Fig. 3. Complementary delay line with inverter delay elements for improved phase resolution.

Although conventional delay lines are attractive for their
simplicity, DLLs designed around such conventional delay
lines suffer from several significant limitations. First, the delay
line provides fairly coarse phase resolution. For example, the
delay line in Fig. 2 provides a minimum phase step corresponding to two inverter delays. Such coarse phase resolution
is not fine enough for our clock alignment application. Second,
conventional delay lines deliver only a finite phase range.
Typically, in order to cover at least one full cycle of phase, the
delay-line length and element delays are adjusted to provide
at least 360° of phase under the fastest process, voltage,
and temperature (PVT) conditions and minimum operating
frequency. More often, however, the delay
line is designed with as much as 720° (i.e., two cycles)
of phase under these conditions. This requires the use of a
long delay line, occupying a large silicon area and dissipating
additional power as the input signal propagates through the
many delay elements. Additionally, because inverters offer
poor PSRR, voltage supply noise-induced jitter can accumulate
as the signal propagates down the delay line. This causes
the signals available from the later taps in the delay line
to be more jitter prone than the signals from the earlier
taps. Last, even with an extended delay line, the DLL can
nonetheless run out of phase range and lose lock in a system
with slowly drifting phase (e.g., spread-spectrum clocking).
These limitations prohibited the use of a conventional delay
line in our DLL design.
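A rough sizing exercise shows why the conventional line is both coarse and long. The Python sketch below assumes a 2.5-ns period (the 400-MHz system clock, used here in place of the unspecified minimum operating frequency) and the 100-ps to 300-ps inverter delay range reported later in the paper; the function name and the choice of picosecond units are illustrative.

```python
import math


def taps_needed(phase_deg: float, period_ps: float,
                inverter_delay_ps: float, inverters_per_tap: int = 2) -> int:
    """Taps needed for the line to span `phase_deg` of a clock period."""
    span_ps = period_ps * phase_deg / 360.0
    return math.ceil(span_ps / (inverters_per_tap * inverter_delay_ps))


PERIOD_PS = 2500.0       # assumed: 400-MHz clock
FAST_INVERTER_PS = 100.0  # fastest inverter delay quoted later in the paper
SLOW_INVERTER_PS = 300.0  # slowest inverter delay quoted later in the paper

print("taps for 360 degrees, fast corner:", taps_needed(360, PERIOD_PS, FAST_INVERTER_PS))
print("taps for 720 degrees, fast corner:", taps_needed(720, PERIOD_PS, FAST_INVERTER_PS))
print("phase step, slow corner (ps):", 2 * SLOW_INVERTER_PS)
```

The line length is set by the fast corner, while the resolution is set by the slow corner, where a two-inverter step is far larger than the 50-ps requirement.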
B. Delay-Line Improvements
To overcome some of these limitations, we developed a
complementary delay line as shown in Fig. 3 for our DLL.
In this architecture, two parallel delay lines with weak cross
coupling are driven by complementary input signals ClkIn and
ClkInb. Because of the use of complementary inputs, the two
delay lines are tapped after every inverter to provide phase-adjacent signals separated by only one inverter delay, thereby
improving the phase resolution by a factor of two. An example
of how this delay-line scheme provides single inverter delay
resolution is shown by the shaded paths in Fig. 3. The signal
that emerges from Tap 2 has passed through three inverter
delays, while the signal that emerges from Tap 3 has passed
through four inverter delays. However, ClkInb is exactly 180°
out of phase with ClkIn, providing the additional inversion
required to ensure that the signals emerging from Taps 2 and
3 are indeed separated in phase by exactly one inverter delay.
This complementary delay-line architecture also allows the
delay lines to be made shorter. The true taps from the delay
line can provide the first 180° of phase, while the complement
taps can provide the second 180° of phase. Thus, each of
the two delay lines can be tuned for only 180° of phase
under the fastest PVT conditions and minimum operating
frequency. Shorter delay
lines provide the additional benefits of reduced maximum
jitter accumulation, smaller silicon area, and lower power
consumption. The problem that this design creates is a need to
determine when to switch from the true taps to the complement
taps and vice versa to ensure full and even coverage of the
entire 360° phase plane. This is particularly important because
the number of delay elements (and output taps) needed to cover
180° changes with PVT conditions and operating frequency.
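The dependence of the switch point on operating conditions can be illustrated with a short Python sketch. It treats each inverter as an ideal fixed delay, measures all phases relative to Tap 1, and uses the 100-ps to 300-ps inverter delay range reported later in the paper at 400 MHz; the function names are illustrative.

```python
import math


def stage_phase_deg(inverter_delay_s: float, freq_hz: float) -> float:
    """Phase contributed by one inverter delay at a given frequency."""
    return 360.0 * freq_hz * inverter_delay_s


def taps_per_half_cycle(stage_deg: float) -> int:
    """Number of taps (counting Tap 1) whose phase relative to Tap 1 is below 180 degrees."""
    return math.ceil(180.0 / stage_deg)


def complementary_tap_phases(n_taps: int, stage_deg: float):
    """Ideal phases of the true taps and complement taps, relative to Tap 1."""
    true_taps = [(k * stage_deg) % 360.0 for k in range(n_taps)]
    comp_taps = [(p + 180.0) % 360.0 for p in true_taps]
    return true_taps, comp_taps


FREQ_HZ = 400e6
for label, delay_s in (("fast", 100e-12), ("slow", 300e-12)):
    stage = stage_phase_deg(delay_s, FREQ_HZ)
    print(f"{label}: {stage:.1f} deg per stage, "
          f"{taps_per_half_cycle(stage)} taps cover the first 180 degrees")
```

With these assumed numbers the switch point moves from roughly the thirteenth tap to the fifth tap between corners, so a fixed switch point would leave either a gap or an overlap in the phase plane.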
C. Infinite Phase Generation
To solve the problem of determining when to switch between the true and complement taps of the complementary
delay line, we developed an end-of-cycle (EOC) detector, as
shown in Fig. 4, for use with the complementary delay line. An
EOC detector is essentially a bank of data flip-flops arranged
as a time-to-digital converter for measuring the delay through
the delay line. The EOC detector produces a thermometer code
indicating the first 180° of delay in the delay lines. The first
state transition in the EOC code indicates the first true tap
from the delay line that provides a signal with phase that
lags the phase of the signal from Tap 1 by more than 180°.
With this information, the DLL logic knows when to switch
between the true and complement taps of the delay line to
ensure full coverage of all 360° of phase space, with phase
steps of at most one inverter delay. Use of the EOC code also
prevents negative phase steps in the phase-transfer function as
taps are successively selected from the delay line. This allows
the complementary delay lines to provide infinite, monotonic
phase range for the DLL. The clocking signal for the EOC
detector, SampClk, is synchronized to the signal from Tap 1
by a replica timing network (not shown).
To illustrate the principle of infinite phase generation using
the EOC code with this delay-line scheme, refer to Fig. 5,
which shows a phasor diagram of the signals from the first
five true and complement taps of a complementary delay line
like the one shown in Fig. 3. The figure assumes that the
PVT conditions and operating frequency are such that the
propagation delay of each inverter stage is equal to 50° of
phase. In the figure, the solid lines correspond to signals from
the true taps, while dashed lines correspond to signals from
the complement taps. Because Tap 5 delivers a signal that is
delayed by 200° from the signal at Tap 1, the EOC detector's
thermometer code would indicate that Tap 5 is the first true
tap to provide a signal with phase beyond 180° relative to the
signal from Tap 1. With this information, the DLL knows to
switch between the true and complement taps after four stages.
In other words, to travel counterclockwise around the phase
plane, the DLL would successively select Taps 1–4, then Taps
1b–4b, then Taps 1–4, etc., to provide infinite phase range.
In this manner, all phase steps are equivalent to at most one
inverter delay (i.e., 50°), except for the Tap 4 to Tap 1b and
the Tap 4b to Tap 1 transitions, which are less (30°).

Fig. 4. EOC detector circuit (180°).
Fig. 5. Phasor diagram with phasors of signals from the taps of a complementary delay line with one inverter delay = 50°.

III. RESOLUTION-ENHANCING CIRCUIT TECHNIQUES
A. Phase Blending
Although the delay-line improvements discussed above reduced the required power and area of the delay line, improved
its jitter accumulation performance, enabled infinite phase
range, and improved the available phase resolution by a factor
of two, this phase resolution was still not good enough to
meet the requirements of our memory system. In the 0.4-µm
process we used, the propagation delay of one inverter over all
anticipated PVT conditions varied from 100 to 300 ps. This
is much larger than the worst case phase step specification of
50 ps. Therefore, to ensure compliance with this specification,
the DLL's phase resolution needed to be improved by at least
six times over what the delay line provided.
To solve this problem, we used inverter phase blending. A simple, single-stage phase-blender circuit is shown
in Fig. 6(a). This circuit receives two phase-adjacent input
signals, $\phi_A$ and $\phi_B$, which are separated in phase by one
inverter delay. The phase blender directly passes these two
signals with a simple delay to produce output signals $\phi_A'$ and $\phi_B'$.
However, it also uses a pair of phase-blending inverters to
interpolate between these two input signals to produce a third
output signal, $\phi_{AB}$, having a phase between that of $\phi_A'$ and $\phi_B'$.
This effectively doubles the available phase resolution.
However, it is not sufficient to use equal-sized inverters
for the phase blending. Fig. 6(b) illustrates a simple model
[12] used for determining the ideal relative sizes of the two
phase-blending inverters to ensure that the phase of $\phi_{AB}$ lies
directly between that of $\phi_A'$ and $\phi_B'$. The model approximates
the two inverters with two simple switched current sources
sharing a common resistance-capacitance (RC) load. For two
rising edge input signals separated in time by $\Delta t$, the model
yields the equation

$$V_{out}(t) = V_{DD} - w I R \left[1 - e^{-t/RC}\right] u(t) - (1 - w) I R \left[1 - e^{-(t - \Delta t)/RC}\right] u(t - \Delta t) \qquad (2)$$

where $R$ is the total resistive load, $C$ is the output capacitance,
$I$ is the total pulldown current of the two phase-blending
inverters, $u(t)$ is the unit step function, and $w$ is the phase-blending
inverter relative size ratio [refer to Fig. 6(a), where
$w = W_A/(W_A + W_B)$ is the ratio of the device widths in
inverter A to the total device widths in both inverters A and B].
Equation (2) is the sum of two decaying exponential terms,
and Fig. 6(c) shows a plot of the resulting waveform according
to this equation for the case where $w = 0.50$. Because the
second exponential term is delayed in time by $\Delta t$ relative to
the first, it only begins to affect the slope of the decay after
this delay has elapsed. Therefore, without explicitly solving
the equation for each case of $w$ and $\Delta t$, it is not
obvious when $V_{out}$ will cross the switching threshold.

Fig. 6. Phase blending for phase-resolution improvement. (a) Single-stage phase-blender circuit, (b) simple model of phase-blending inverters, (c) plot of
signal voltages in the simple model for $w = W_A/(W_A + W_B) = 0.50$, (d) phase-blender output signal edges for $w = 0.50$, and (e) phase-blender
output signal edges for $w = 0.60$.

For input signals separated in phase by one inverter delay,
the model specifies that in order to ensure
that the phase of $\phi_{AB}$ lies directly in between that of $\phi_A'$
and $\phi_B'$, the phase-blending inverters must be sized in a
$w = 0.60$ ratio, such that the leading phase is
coupled to an inverter that is bigger than the one that receives
the lagging phase. This ratio was also confirmed empirically
with simulations. The effect of the relative sizing of the phase-blending
inverters is illustrated in Fig. 6(d) and (e), which
show the resulting output signal edges for $w = 0.50$ and
$w = 0.60$, respectively. Clearly, the phase of output signal $\phi_{AB}$
is closer to that of $\phi_B'$ than to that of $\phi_A'$ when the
phase-blending inverter size ratio is $w = 0.50$. Although
asymmetrical $w = 0.60$ inverter sizing ensures good, evenly
spaced edge placement of the three output signals, it requires
that $\phi_A$ lead $\phi_B$. Reversing the phase of these two input
signals would result in a severely misplaced $\phi_{AB}$, since the
effective sizing ratio would then be $w = 0.40$.
Another design constraint of the phase-blender circuit is that
all paths through the circuit must provide precisely the same
loading and delay to ensure that the phase relationship between
$\phi_A$ and $\phi_B$ is maintained by $\phi_A'$ and $\phi_B'$.
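The effect of the sizing ratio can be explored numerically with the switched-current-source model described above. The Python sketch below is only an illustration under stated assumptions: the pulldown swing I·R is set equal to the supply, the switching threshold is half the supply, the input separation equals one RC time constant, and the pass-through outputs are modeled as the w = 1 and w = 0 limits. None of these values come from the paper, and the function names are hypothetical.

```python
import math


def blended_output(t, w, dt, tau=1.0, vdd=1.0, ir=1.0):
    """Model output voltage: two switched current sources on a shared RC load."""
    v = vdd
    if t >= 0.0:  # leading input has switched
        v -= w * ir * (1.0 - math.exp(-t / tau))
    if t >= dt:   # lagging input has switched
        v -= (1.0 - w) * ir * (1.0 - math.exp(-(t - dt) / tau))
    return v


def crossing_time(w, dt, tau=1.0, vdd=1.0, ir=1.0):
    """Time at which the output falls through vdd/2, found by bisection."""
    lo, hi = 0.0, dt + 20.0 * tau
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if blended_output(mid, w, dt, tau, vdd, ir) > 0.5 * vdd:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def edge_position(w, dt, tau=1.0):
    """Blended edge position: 0 means on the leading edge, 1 on the lagging edge."""
    t_lead = crossing_time(1.0, dt, tau)  # all drive on the leading input
    t_lag = crossing_time(0.0, dt, tau)   # all drive on the lagging input
    return (crossing_time(w, dt, tau) - t_lead) / (t_lag - t_lead)


for w in (0.40, 0.50, 0.60):
    print(f"w = {w:.2f}: blended edge at {edge_position(w, dt=1.0):.2f} of the input spacing")
```

Under these assumptions, equal sizing places the blended edge at about 0.62 of the spacing (late), while w = 0.60 moves it to about 0.52, close to the midpoint, and w = 0.40 pushes it further toward the lagging edge. The exact ratio that centers the edge depends on the assumed ratio of input spacing to RC time constant.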
The phase-blender idea can be extended to multiple cascaded stages for further phase-resolution improvement, with
each additional stage improving the resolution by a factor of
two. Fig. 7 shows a two-stage cascaded phase-blender circuit
that provides a 4x improvement in phase resolution from input
to output. Although it is theoretically possible to increase phase
resolution indefinitely by adding more and more phase-blender
stages, there is a practical limit. The number of inverters in
each signal path increases by two with each additional phase-blending stage, making the circuit increasingly susceptible
to voltage supply noise-induced jitter due to the additional
delay in the signal path. Therefore, it is prudent to increase
the number of blending stages to improve phase resolution
only until the output phase step size from the phase blender
is approximately equivalent to the anticipated voltage supply
noise-induced jitter.
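The number of blending stages implied by the numbers in this section can be worked out directly. The Python sketch below uses the 100-ps to 300-ps inverter delay range and the 50-ps step specification from the text, and assumes ideal blending in which each stage exactly halves the step; the function name is illustrative.

```python
def blender_stages_needed(worst_inverter_delay_ps: float, max_step_ps: float) -> int:
    """Smallest number of ideal halving stages that brings the step within the limit."""
    stages = 0
    while worst_inverter_delay_ps / (2 ** stages) > max_step_ps:
        stages += 1
    return stages


SLOW_INVERTER_PS = 300.0  # slowest inverter delay from the text
FAST_INVERTER_PS = 100.0  # fastest inverter delay from the text
MAX_STEP_PS = 50.0        # worst case phase step specification from the text

stages = blender_stages_needed(SLOW_INVERTER_PS, MAX_STEP_PS)
improvement = 2 ** stages
print(f"stages needed: {stages} ({improvement}x improvement)")
print(f"step range: {FAST_INVERTER_PS / improvement} ps to {SLOW_INVERTER_PS / improvement} ps")
print(f"inverters added to each signal path: {2 * stages}")
```

A sixfold improvement is not a power of two, so three ideal stages are needed, which gives a worst case step of 37.5 ps at the cost of six additional inverters in each signal path.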
There are several design limitations that must be considered
when designing a cascaded phase blender. First, the importance of proper (asymmetrical) sizing of the phase-blending
inverters grows with the number of cascaded blending stages
because edge misplacement has a compounding effect as the
signals travel through the multiple stages. Additionally, close
attention must be paid to ensuring equal loading for equal
delay through all paths, requiring the use of dummy devices
on otherwise unbalanced paths. Finally, like a single-stage
phase blender, a cascaded phase blender also requires the
phase of input $\phi_A$ to lead that of input $\phi_B$ to ensure even output phase
spacing.

Fig. 7. Two-stage, cascaded phase-blender circuit for 4x phase-resolution improvement.
Fig. 8. Three-stage, symmetrical phase-blender circuit.

To overcome these design limitations of the cascaded phase
blender, we developed a symmetrical phase blender. A block
diagram of a three-stage symmetrical phase blender is shown
in Fig. 8. This circuit is essentially two parallel cascaded
phase-blender circuits, sharing some common paths. When
input $\phi_A$ leads input $\phi_B$, one set of outputs provides
equal output phase spacing. When input $\phi_B$ leads input $\phi_A$,
the other set of outputs provides equal output phase
spacing. Therefore, the circuit provides phase blending with an
8x improvement in phase resolution and equally spaced output
signals regardless of which input signal leads in phase.
Additionally, the symmetrical blender allows for seamless
input switching for continuous phase blending over multiple
input delays. For example, assume that input $\phi_A$ leads input $\phi_B$ in
phase. Beginning with the output aligned with $\phi_A$, successive
outputs can be selected to evenly span
the phase range between $\phi_A$ and $\phi_B$. Once the output aligned
with $\phi_B$ is selected, $\phi_A$
can be changed to another signal that lags $\phi_B$. This
switching is possible without affecting the selected output signal
because that output has no dependence on or coupling from $\phi_A$. Then
successive outputs can be selected
to evenly span the phase range between $\phi_B$ and the new $\phi_A$.
Once the output aligned with $\phi_A$ is selected, $\phi_B$ can be changed to yet another
signal that lags $\phi_A$. Again, this is possible without any change
in the selected output signal because that output has no dependence on or
coupling from $\phi_B$. This process can continue indefinitely.
Also, because all paths through the symmetrical phase blender
are inherently balanced, no dummy devices are needed.
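The input-switching sequence described above can be sketched behaviorally in Python. The sketch assumes ideal, linear interpolation between the two inputs with eight fine steps per coarse step, ignores the wrap from true taps to complement taps, and uses hypothetical names throughout; it models the selection order, not the circuit.

```python
def symmetric_blender_walk(tap_phases_deg, n_fine=8):
    """Output phases seen while walking forward through successive coarse taps.

    Input A starts on the first tap and input B on the second. A fine index of 0
    selects the output that follows A alone; n_fine selects the one that follows B alone.
    """
    a_idx, b_idx = 0, 1
    fine, direction = 0, +1
    phases = []
    while True:
        pa, pb = tap_phases_deg[a_idx], tap_phases_deg[b_idx]
        phases.append(pa + (pb - pa) * fine / n_fine)
        if direction > 0 and fine == n_fine:
            # Output depends only on B, so A can be moved to the tap after B.
            if b_idx + 1 >= len(tap_phases_deg):
                break
            a_idx, direction = b_idx + 1, -1
        elif direction < 0 and fine == 0:
            # Output depends only on A, so B can be moved to the tap after A.
            if a_idx + 1 >= len(tap_phases_deg):
                break
            b_idx, direction = a_idx + 1, +1
        fine += direction
    return phases


walk = symmetric_blender_walk([0.0, 50.0, 100.0, 150.0])
steps = [b - a for a, b in zip(walk, walk[1:])]
print(walk[:10])
print("all steps equal:", len(set(steps)) == 1)
```

In this idealized model the output phase advances in equal steps across the input changes, because each input is swapped only while the selected output is independent of it.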
Fig. 9. (a) A 16:1 duty-cycle correcting multiplexer circuit. (b) Duty-cycle correction control circuit.
B. Signal Selection and Duty-Cycle Correction

Since the digital DLL was to be placed into a memory
system that exchanges data on both edges of the clock, good
duty cycle (i.e., close to 50%) is required to ensure that the
data exchanged on either edge of the clock have equal bit
times. Duty-cycle distortion is usually addressed in PLLs by
simply running the PLL's voltage-controlled oscillator (VCO)
at twice the system frequency and using a postdivider triggered
on one edge of the VCO output to produce the output clock
from the PLL [13]–[15]. This ensures good, 50% duty cycle. In
a DLL, however, no frequency multiplication is possible. The
duty cycle of the output signal must be directly corrected to
50%, for example, by using a duty-cycle correcting amplifier
in the signal path as in Fig. 1 and in [4].
Although duty-cycle correction can be addressed by placing
a duty-cycle corrector at the output of the DLL, this approach
has several limitations. First, since duty cycle is corrected only
at the output of the DLL, internal DLL signals may have
poor duty cycle. It is good practice, however, to maintain
50% duty cycle throughout the signal path to maximize signal
propagation as frequency is increased. Second, performing all
the duty-cycle correction in one stage at the output of the
DLL places a great deal of strain on the duty-cycle correcting
circuit; it must have a large duty-cycle correction range to
compensate for all the duty-cycle distortion that can accumulate in the signal path. Finally, adding a duty-cycle corrector
directly into the signal path increases signal path delay, and
thus susceptibility to voltage supply noise-induced jitter.
To address the issue of duty cycle, we developed the
idea of duty-cycle correcting multiplexers. Since multiplexers
would be needed in our DLL regardless, by adding duty-cycle
correcting functionality to the multiplexing circuitry, we
implemented duty-cycle correction while requiring minimal
additional power, area, and delay.
A 16:1 duty-cycle correcting multiplexer is shown in
Fig. 9(a) with a corresponding control circuit in Fig. 9(b). To
facilitate understanding of this circuit's operation, consider an
example. Assume that one input signal is selected and has duty-cycle
distortion such that the multiplexer output signal has a high
duty cycle. Assume also that the output signal is sensed by a duty-cycle
error detector, which produces a differential output error signal
proportional to the difference in duty cycle between
the output signal and the ideal 50%. Thus, in our example, one side of the
differential error signal will be greater than the other,
causing more current to be steered
through the right branch of the control circuit in Fig. 9(b) than
through the left side. This in turn increases the strength of
one pair of correction transistors compared to the complementary pair in the duty-cycle
correcting multiplexer of Fig. 9(a). These transistors alter the
duty cycle of the signal as it passes through the multiplexer, driving
the output to the ideal 50% duty cycle. The use of both PMOS and
NMOS devices to perform the duty-cycle correction ensures
a symmetrical duty-cycle correction range. Furthermore, because duty-cycle correction has been distributed through two
stages, the requirements on each individual duty-cycle correcting stage are reduced. By combining both necessary functions
of signal selection and duty-cycle correction, this circuit
minimizes signal path delay, jitter accumulation, circuit area,
and power compared to performing both functions separately.
IV. DLL ARCHITECTURE
Fig. 10 is a block diagram of the entire digital DLL, with
shading indicating the circuit blocks that were described in
greater detail above. The DLL receives an input clock ExtClk
and passes it through a clock amplifier and splitter to provide
the two complementary input signals (ClkIn and ClkInb) to a
16-stage, 32-tap complementary delay line with EOC detector.
The delay line provides 32 signals at its output taps, which then
feed into two 32:1 duty-cycle correcting multiplexers. Each
multiplexer selects one of a pair of phase-adjacent signals
from the delay line. The two selected signals then pass to
a three-stage, 2:16 symmetrical phase-blender circuit, which
improves the phase resolution by a factor of eight. A final 16:1
duty-cycle correcting multiplexer selects one of the phase-blender
output signals and passes it through a clock tree to
provide the DLL's output signal ClkOut. The digital DLL also
includes two independent duty-cycle correction loops as shown
in the figure. By using two separate duty-cycle correcting
loops, duty-cycle correction is distributed throughout the signal
path. This ensures a good duty cycle throughout the signal path
and reduces the duty-cycle correcting requirements of any one
stage.

Fig. 10. Complete block diagram of the new digital DLL.

The DLL uses bang-bang-type, all-digital feedback to lock
the phase of its output signal ClkOut to that of a reference
signal RefClk. A phase detector compares the phase of ClkOut
to RefClk and produces a binary error signal, which passes
through an optional digital filter to a control logic circuit. The
digital filter is a simple majority detector, which has no effect
when the loop is acquiring lock but reduces dithering once
lock is acquired. The control logic is composed of simple
combinational logic and counters that drive the multiplexers
to select the two phase-adjacent coarse phase signals from the
delay line and the fine phase signal from the phase blender
that minimize the phase error between ClkOut and RefClk.

Fig. 11. Test-chip micrograph showing on the left side (a) the analog DLL
of [6] and on the right side (b) the new digital DLL integrated into identical
interface cells.

Because the phase information is stored in this DLL as a
digital state, the DLL can quickly recover from low-power
modes, requiring only enough time for the signals to propagate
through the signal path of the circuit from ExtClk to ClkOut
to provide a phase-locked output signal.
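The bang-bang feedback described in this section can be sketched behaviorally in Python. The sketch is a simplification under stated assumptions: the phase detector returns only the sign of the error, each update moves the output by one fixed 37.5-ps step (one eighth of an assumed 300-ps inverter delay), and the optional majority filter and all noise are omitted. The function names are hypothetical.

```python
def wrap_ps(x: float, period_ps: float) -> float:
    """Wrap a time difference into the range [-period/2, period/2)."""
    return (x + period_ps / 2.0) % period_ps - period_ps / 2.0


def bang_bang_lock(ref_ps, out_ps, step_ps, period_ps=2500.0, updates=100):
    """Return the phase error seen at each update of a bang-bang loop."""
    errors = []
    for _ in range(updates):
        error = wrap_ps(ref_ps - out_ps, period_ps)
        errors.append(error)
        # Binary phase-detector decision: only the sign of the error is used.
        direction = 1.0 if error > 0 else -1.0
        # The output phase wraps around the period, so the range is unbounded.
        out_ps = (out_ps + direction * step_ps) % period_ps
    return errors


errors = bang_bang_lock(ref_ps=1000.0, out_ps=0.0, step_ps=37.5)
print("initial error (ps):", errors[0])
print("errors after lock (ps):", errors[-4:])
```

Once the loop reaches the reference it dithers between the two nearest settings, so the residual error stays within one step; the majority filter described above exists to reduce that dithering.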
It is important to recognize the role of the EOC detector and
code in this architecture. Because the delay line and blender
are uncontrolled, open-loop circuits, the architecture relies on
the control circuit's use of the EOC code to ensure proper
coarse phase selection, small maximum phase step size, and
phase transfer function monotonicity. The EOC code enables
the control logic to determine when to switch between the true
and complement taps of the delay line to ensure that phase-adjacent taps are always selected by the coarse multiplexers
for the phase blender. The EOC code also enables the control
logic to determine which set of blender taps provides evenly
spaced output signals.
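The way the control logic can use the EOC code for coarse tap selection is illustrated by the Python sketch below, which reuses the 50° per inverter example from the phasor discussion in Section II. The polarity of the thermometer code, the tap naming, and the function names are assumptions made for illustration; the sketch models ideal delays only.

```python
def eoc_thermometer(n_taps: int, stage_deg: float):
    """Assumed code: 1 while a true tap lags Tap 1 by no more than 180 degrees."""
    return [1 if k * stage_deg <= 180.0 else 0 for k in range(n_taps)]


def coarse_tap_sequence(code, n_steps: int):
    """Tap names visited when advancing: true taps, then complement taps, repeating."""
    n = sum(code)  # taps used per half cycle, read from the EOC code
    names = [str(k + 1) for k in range(n)] + [f"{k + 1}b" for k in range(n)]
    return [names[i % (2 * n)] for i in range(n_steps)]


def tap_phase_deg(name: str, stage_deg: float) -> float:
    """Ideal phase of a named tap relative to Tap 1."""
    k = int(name.rstrip("b")) - 1
    offset = 180.0 if name.endswith("b") else 0.0
    return (k * stage_deg + offset) % 360.0


STAGE_DEG = 50.0
code = eoc_thermometer(8, STAGE_DEG)
sequence = coarse_tap_sequence(code, 9)
phases = [tap_phase_deg(name, STAGE_DEG) for name in sequence]
steps = [(b - a) % 360.0 for a, b in zip(phases, phases[1:])]
print("EOC code:", code)
print("tap order:", sequence)
print("phase steps:", steps)
```

In this example every step is 50° except the two transitions between the true and complement taps, which are 30°, so the phase advances monotonically and can continue around the phase plane indefinitely.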
Fig. 12. Measured transmit eye diagrams at 3.3 V and 400 MHz of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.
V. MEASURED PERFORMANCE
A. Test Chip
Both the digital DLL presented here and an implementation
of the analog DLL of Donnelly et al. [6] were integrated into
identical high-speed CMOS interface cells on opposite sides
of a single test chip. A micrograph of this test chip is shown in
Fig. 11. The test chip I/O was laid out symmetrically so that
either interface cell could be tested on the same hardware by
simply removing the test chip from the test socket, rotating
it 180°, and reinserting it into the socket. This allowed a
true side-by-side comparison of the two DLLs operating in a
system. The test-chip circuits were fabricated using a standard
0.4-µm, 3.3-V CMOS process with 0.65-V threshold voltages.
B. Test Results
Unless indicated otherwise, all test results described in this
section were measured with the analog and digital DLLs
operating in their respective high-speed interface cells at 3.3 V
and 400 MHz (800 Mb/s/pin) using the same test vectors.
Additionally, the test chip included noise-generator circuits,
which produced digital switching noise during the testing of
both interfaces.
Fig. 12(a) and (b) shows eye diagrams of the two interfaces
with the analog and digital DLLs, respectively. The diagrams
indicate the output timing performance of the interface cells
in the test system. Although the interface with the analog
DLL provided slightly better timing performance, 320 ps pp
versus 380 ps pp for the interface with the digital DLL, the
performances of both interfaces (and therefore, both DLLs)
were comparable. This is surprisingly good considering the
extensive use of poor PSRR elements, such as inverters, in
the signal path of the digital DLL. (Note: I/O circuit duty-cycle distortion produced the unequal eyes in both diagrams.
This is unrelated to the DLLs.)
Fig. 13(a) and (b) shows receive shmoo diagrams for the
two interfaces with the analog and digital DLLs, respectively.
The diagrams indicate the CMOS interfaces' valid timing windows for receiving data. On the diagrams, one axis is supply
voltage (2.5 V ≤ VDD ≤ 4.0 V) while the other axis indicates input
data positioning along a bit period (1/800 Mb/s = 1.25 ns).
The normal data position is in the center of the bit period. A
black dot in the diagram indicates incorrectly received data for
that combination of bit position and VDD. Ideally, the window
should be entirely white, but realistically, it is limited by jitter
from the DLL and other sources. Therefore, this test measures
the amount of tolerable skew on the input timing over a range
of supply voltages. Although the interface with the analog DLL
delivers better timing performance than the interface with the
digital DLL (1.02 versus 0.92 ns), both meet the component
specification of 0.85 ns.
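The receive-window figures above can be put in proportion with a short calculation. The Python sketch below uses only numbers quoted in this section (800 Mb/s/pin, the 0.85-ns specification, and the 1.02-ns and 0.92-ns measured windows); the function and variable names are illustrative assumptions.

```python
BIT_RATE_BPS = 800e6                   # 800 Mb/s/pin, from the test conditions
BIT_PERIOD_NS = 1e9 / BIT_RATE_BPS     # 1.25 ns per bit
SPEC_WINDOW_NS = 0.85                  # component specification

MEASURED_WINDOW_NS = {"analog": 1.02, "digital": 0.92}


def window_summary(window_ns: float) -> dict:
    """Compare one measured receive window with the spec and the bit period."""
    return {
        "margin_ns": window_ns - SPEC_WINDOW_NS,
        "fraction_of_bit": window_ns / BIT_PERIOD_NS,
        "meets_spec": window_ns >= SPEC_WINDOW_NS,
    }


if __name__ == "__main__":
    for name, window in MEASURED_WINDOW_NS.items():
        print(name, window_summary(window))
```

Under these numbers, the digital DLL's window covers about 74% of the bit period and the analog DLL's about 82%, and both exceed the specification.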
Fig. 14 is a circle plot of the measured phase of the DLL's
output signal ClkOut, illustrating the DLL's ability to provide
infinite phase range. One axis indicates delay [or phase, as in
(1)] of the ClkOut signal relative to a fixed 400-MHz signal.
The other axis indicates cycle count. These data were measured by
probing the on-chip DLL output signal (ClkOut) and forcing
the DLL's phase-detector output low. This caused the DLL's
output phase to continually advance over time. The term "circle
plot" is used because this diagram is equivalent to sweeping a
phasor that represents the phase of ClkOut around the phase
plane, thereby drawing a circle in the phase plane. Because
the phase of ClkOut is measured relative to a fixed 400-MHz
signal, the plotted delay appears modulo 2.5 ns, where one clock
period is 2.5 ns at 400 MHz. The absolute value of delay (i.e., from 3.4
to 5.9 ns) is irrelevant since it includes some test-system setup
time. The data were measured and plotted using a time-interval
analyzer.
Fig. 13. Measured shmoo diagrams showing the 400-MHz receive timing windows of the high-speed interface cells with (a) the analog DLL of [6]
and (b) the new digital DLL.
Fig. 14. Measured circle plot illustrating the infinite phase transfer characteristic of the digital DLL.
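The modulo behavior of the circle plot can be reproduced with a minimal Python sketch. It assumes an idealized, perfectly uniform step of 20 ps (the average step reported for this measurement) and treats the 3.4-ns lower bound of the plotted delay as a fixed test-system offset; the real transfer function has small nonuniformities that this sketch ignores. All names are illustrative.

```python
PERIOD_PS = 2500  # one clock period at 400 MHz, in picoseconds


def wrapped_delay_ps(step_count: int, step_ps: int = 20, offset_ps: int = 3400) -> int:
    """Delay observed against a fixed reference clock after step_count steps.

    The accumulated phase grows without bound, but the measurement can only
    resolve it modulo one clock period, so the plotted delay folds back.
    """
    return offset_ps + (step_count * step_ps) % PERIOD_PS


if __name__ == "__main__":
    # Print the delay around the first wrap (steps 123 through 126).
    for n in range(123, 127):
        print(n, wrapped_delay_ps(n))
```

With these assumed values the delay ramps from 3400 ps up to 5880 ps and returns to 3400 ps on the 125th step, which is the sawtooth expected when a phasor completes one turn around the phase plane.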
The circle plot illustrates the DLL's phase transfer function,
showing its reasonably good linearity, monotonicity, and lack
of discontinuities. The small bumps in the transfer function
indicate a change in coarse reference phase selected from
the delay line. The slope of the transfer function depends on
PVT conditions and system frequency, since these conditions
determine how many delay-line taps are required to provide
180° of phase. In this case, nine taps were required, resulting
in an average phase step size of 20 ps or 2.9°.
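The conversion between the time and angle forms of the step size, and the average delay per tap implied by the tap count, can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function names are assumptions made for illustration.

```python
def ps_to_degrees(delay_ps: float, freq_hz: float) -> float:
    """Convert a delay in picoseconds to phase in degrees at freq_hz."""
    return 360.0 * delay_ps * 1e-12 * freq_hz


def average_tap_delay_ps(taps_per_half_cycle: int, freq_hz: float) -> float:
    """Average delay per tap when the given number of taps spans 180 degrees."""
    half_period_ps = 0.5e12 / freq_hz
    return half_period_ps / taps_per_half_cycle


if __name__ == "__main__":
    print(ps_to_degrees(20, 400e6))        # about 2.88 degrees
    print(average_tap_delay_ps(9, 400e6))  # about 138.9 ps per tap
```

At 400 MHz a 20-ps step corresponds to about 2.88°, consistent with the rounded 2.9° figure, and nine taps across the 1.25-ns half period give an average of roughly 139 ps per tap.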
Table I presents a summary of many of the measured and
simulated results of the analog and digital DLLs operating in
their respective CMOS interfaces. Although the analog DLL
uses less power and area, and provides better timing performance (smaller long-term jitter) and phase resolution (smaller
maximum phase step), both DLLs enable the interface cells to
meet the component requirements when operating in the test
system. Additionally, the digital DLL has a higher maximum
operating frequency, works at lower supply voltages, and
requires much less effort to port to other processes (one versus
four man-months).
Fig. 15. Measured DLL power consumption (a) as a function of frequency for VDD = 3.3 V and (b) as a function of supply voltage for f = 400 MHz.
TABLE I
ANALOG AND DIGITAL DLL PERFORMANCE SUMMARY AT 3.3 V AND 400 MHz
Fig. 15(a) and (b) shows plots of measured DLL power
versus frequency at VDD = 3.3 V and measured DLL power
versus supply voltage at f = 400 MHz, respectively. Although
both plots show that the digital DLL dissipated more power
than the analog DLL for all measured conditions, the plots illustrate the different characteristics of the power consumed by
the two DLLs. As mentioned earlier, the power of both DLLs
is distributed between IV power in the constant-current stages
and CV²f power in the CMOS stages. The curves in Fig. 15(a)
show that the digital DLL's power dissipation has a greater
dependence on frequency than does the analog DLL's power.
The curves in Fig. 15(b) show that the digital DLL's power
dissipation has a predominantly square-law dependence on
supply voltage, whereas the analog DLL's power dissipation
has a mixed square-law and linear dependence. These trends
confirm that the power of the analog DLL has a relatively
higher IV term, whereas the power of the digital DLL has a
relatively higher CV²f term. This indicates that digital DLLs
have the potential for providing better power scaling than
analog DLLs as supply voltages decrease in the future.
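The scaling argument can be made concrete with a two-term power model, P = I·V + C·V²·f. The Python sketch below is a simplified illustration, not a fit to the measured curves: the bias current and switched capacitance are hypothetical parameters chosen by the caller, and the model assumes the bias current stays constant as the supply changes.

```python
import math


def dll_power_w(vdd_v: float, freq_hz: float,
                bias_current_a: float, switched_cap_f: float) -> float:
    """Two-term model: static I*V power plus dynamic C*V^2*f power."""
    static_w = bias_current_a * vdd_v                  # constant-current stages
    dynamic_w = switched_cap_f * vdd_v ** 2 * freq_hz  # CMOS switching stages
    return static_w + dynamic_w


def scaling_ratio(v_high: float, v_low: float, freq_hz: float,
                  bias_current_a: float, switched_cap_f: float) -> float:
    """Power at v_high divided by power at v_low for the same circuit."""
    return (dll_power_w(v_high, freq_hz, bias_current_a, switched_cap_f)
            / dll_power_w(v_low, freq_hz, bias_current_a, switched_cap_f))


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    print(scaling_ratio(3.3, 1.65, 400e6, 0.0, 100e-12))     # purely dynamic
    print(scaling_ratio(3.3, 1.65, 400e6, 10e-3, 0.0))       # purely static
    print(scaling_ratio(3.3, 1.65, 400e6, 10e-3, 100e-12))   # mixed
```

Halving the supply reduces a purely dynamic design's power by a factor of four but a purely static design's power by only a factor of two, and a mixed design falls in between. This is the sense in which a design dominated by the CV²f term scales better with supply voltage.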
Finally, we have shown in Table I and in Fig. 15(b) that
the digital DLL operates at lower supply voltages than the
analog DLL. Although the operation of the digital DLL was
limited to 1.7 V, this limitation was due to our use of several
analog elements in the digital DLL (i.e., it was a "mostly digital"
DLL). The digital DLL used an analog clock amplifier, two
analog duty-cycle error detectors (see Fig. 10), and an analog
quadrature phase detector (in a second loop, not shown). Using
an analog design for these circuit blocks in the digital DLL
was faster to implement without preventing evaluation of the
key digital blocks in the DLL, but their use determined the
minimum supply voltage of the digital DLL.
VI. CONCLUSION
We have described the architecture of a portable digital
DLL and demonstrated that it provides jitter performance
comparable to an analog DLL when fabricated in the same
3.3-V, 0.4-µm standard CMOS process. Several circuits were
developed to enable the DLL to provide very fine phase
resolution, infinite phase range, and good duty-cycle performance throughout the signal path. Despite its relatively simple
architecture, the digital DLL meets all system specifications,
and it operates down to lower supply voltages than its analog
counterpart. Utilizing essentially only simple digital CMOS
gates, the DLL can be ported to new processes in minimal time. For these reasons, this digital DLL provides an
alternative to analog DLLs for clock alignment applications.
ACKNOWLEDGMENT
The authors thank J. McBride and P. Gordon for layout
support and S. Sidiropoulos for helpful insights.
REFERENCES
[1] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, "Multifrequency zero-jitter delay-locked loop," IEEE J. Solid-State Circuits, vol. 29, pp. 67–70, Jan. 1994.
[2] J.-M. Han, J. Lee, S. Yoon, S. Jeong, C. Park, I. Cho, S. Lee, and D. Seo, "Skew minimization techniques for 256 Mb synchronous DRAM and beyond," in VLSI Circuits Dig. Tech. Papers, June 1996, pp. 192–193.
[3] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H. Tsuboi, S. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura, K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T. Anezaki, M. Hasegawa, and M. Taguchi, "A 256 Mb SDRAM using register-controlled digital DLL," in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 72–73.
[4] T. Lee, K. Donnelly, J. Ho, J. Zerbe, M. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for 18 Mbit, 500 megabyte/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[5] S. Sidiropoulos and M. Horowitz, "A semidigital dual delay-locked loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[6] K. Donnelly, Y. Chan, J. Ho, C. Tran, S. Patel, B. Lau, J. Kim, P. Chau, C. Huang, J. Wei, L. Yu, R. Tarver, R. Kulkarni, D. Stark, and M. Johnson, "A 660MB/s interface megacell portable circuit in 0.3 µm–0.7 µm CMOS ASIC," IEEE J. Solid-State Circuits, vol. 31, pp. 1995–2003, Dec. 1996.
[7] N. Kushiyama, S. Ohshima, D. Stark, H. Noji, K. Sakurai, S. Takase, T. Furuyama, R. Barth, A. Chan, J. Dillon, J. Gasbarro, M. Griffin, M. Horowitz, T. Lee, and V. Lee, "A 500-Megabyte/s data-rate 4.5M DRAM," IEEE J. Solid-State Circuits, vol. 28, pp. 490–508, Apr. 1993.
[8] M. Hasegawa, M. Nakamura, S. Narui, S. Ohkuma, Y. Kawase, H. Endoh, S. Miyatake, T. Akiba, K. Kawakita, M. Yoshida, S. Yamada, T. Sekigguchi, I. Asano, Y. Tadaki, R. Nagai, S. Miyaoka, K. Kajigaya, M. Horiguchi, and Y. Nakagome, "A 256 Mb SDRAM with subthreshold leakage current suppression," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 80–81.
[9] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara, T. Matano, Y. Hoshino, K. Miyano, S. Isa, E. Kakehashi, J. Drynan, M. Komuro, T. Fukase, H. Iwasaki, J. Sekine, M. Igeta, N. Nakanishi, T. Itani, K. Yoshida, H. Yoshino, S. Hashimoto, T. Yoshii, M. Ichinose, T. Imura, M. Uziie, K. Koyama, Y. Fukuzo, and T. Okuda, "A 2.5 ns clock access 250 MHz 256 Mb SDRAM with synchronous mirror delay," in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 374–375.
[10] B. Garlepp, K. Donnelly, J. Kim, P. Chau, J. Zerbe, C. Huang, C. Tran, C. Portmann, D. Stark, Y. Chan, T. Lee, and M. Horowitz, "A portable digital DLL architecture for CMOS interface circuits," in VLSI Circuits Dig. Tech. Papers, June 1998, pp. 214–215.
[11] M. Griffin, J. Zerbe, A. Chan, Y. Jun, Y. Tanaka, W. Richardson, G. Tsang, M. Ching, C. Portmann, Y. Li, B. Stonecypher, L. Lai, K. Lee, V. Lee, D. Stark, H. Modarres, P. Batra, J. Louis-Chandran, J. Privitera, T. Thrush, B. Nickell, J. Yang, V. Hennon, and R. Sauve, "A process independent 800 MB/s DRAM bytewide interface featuring command interleaving and concurrent memory operation," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 156–157.
[12] S. Sidiropoulos, "High-performance interchip signalling," Ph.D. dissertation, Computer Systems Laboratory, Stanford University, Stanford, CA, Apr. 1998. Available as Tech. Rep. CSL-TR-98-760 from http://elib.stanford.edu/.
[13] I. Young, M. Mar, and B. Bhushan, "A 0.35 µm CMOS 3–880 MHz PLL N/2 multiplier and distribution network with low jitter for microprocessors," in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 330–331.
[14] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, "A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation," in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 132–133.
[15] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, "A 600 MHz CMOS PLL microprocessor clock generator with a 1.2 GHz VCO," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 396–397.
Kevin S. Donnelly (A'93) was born in Los Angeles,
CA, in 1961. He received the B.S. degree in electrical engineering and computer science from the
University of California, Berkeley, in 1985 and the
M.S. degree in electrical engineering from San Jose
State University, San Jose, CA, in 1992.
He was with Memorex, Sipex, and National Semiconductor, specializing in bipolar and BiCMOS
analog circuits for disk-drive read/write and servo
channels. In 1992, he joined Rambus, Inc., Mountain View, CA, where he has designed high-speed
CMOS PLL circuits for clock recovery and data synchronization, and high-speed I/O circuits. He currently manages a group developing I/O circuits
and PLLs. His interests include PLLs and DLLs, I/O circuits, and data
converters. He is a Member of the ISSCC Digital Subcommittee. He has
received several circuit design patents.
Mr. Donnelly is a coauthor of the paper that won the Best Paper Award
at the 1994 ISSCC.
Jun Kim was born in Tokyo, Japan, on November
14, 1966. He received the B.S.E.E. degree from the
University of California, Berkeley, in 1989.
From 1989 to 1991, he was with Vitelic, Inc.,
where he worked on SRAM and DRAM development. Between 1991 and 1994, he was with Sun
Microsystems, where he was involved in microprocessor and digital circuit design. Since 1994, he
has been with Rambus, Inc., Mountain View, CA,
as a Designer of high-speed CMOS I/O and DLL
circuits.
Pak S. Chau was born in Hong Kong in 1966.
He received the B.S. degree in computer system
engineering from the University of Massachusetts,
Amherst, in 1989 and the M.S. degree in electrical engineering from the University of California,
Davis, in 1991.
He was with National Semiconductor and Chrontel, Inc., where he worked as an Analog Circuit
Designer. In 1994, he joined Rambus, Inc., Mountain View, CA, where he has engaged in designing
high-speed I/O and DLL circuits.
Jared L. Zerbe was born in New York, NY, in
1965. He received the B.S. degree in electrical engineering from Stanford University, Stanford, CA,
in 1987.
He joined VLSI Technology, Inc., in 1987, where
he worked on semicustom ASIC design. In 1989, he
joined MIPS Computer Systems, where he designed
high-performance floating-point blocks. Since 1992,
he has been with Rambus Inc., Mountain View, CA,
where he has specialized in the design of high-speed I/O and PLL/DLL clock recovery and data
synchronization circuits.
Bruno W. Garlepp was born in Bahia, Brazil, on
October 29, 1970. He received the B.S.E.E. degree
from the University of California, Los Angeles,
in 1993 and the M.S.E.E. degree from Stanford
University, Stanford, CA, in 1995.
In 1993, he joined the Hughes Aircraft Advanced
Circuits Technology Center, Torrance, CA. There,
he designed high-precision analog integrated circuits
for A/D applications, as well as CMOS, bipolar,
and SiGe RF circuits for wide-band communications applications. In 1996, he joined Rambus, Inc.,
Mountain View, CA, where he designs and develops high-speed CMOS
clocking and I/O circuits for synchronous chip-to-chip communication.
Charles Huang received the B.S. degree in electrical engineering from the University of Fuzhou,
China, in 1982 and the M.S. degree in electrical
engineering from the University of Arkansas, Fayetteville, in 1990.
He was with ULSI and SGI, working in the area
of PLL and cache circuit design. He joined Rambus,
Inc., Mountain View, CA, in 1994, where he has
been engaged in high-speed CMOS DLL and I/O
circuit design.
Chanh V. Tran was born in Vietnam in 1964. He
received the B.S. degree in electrical engineering
and computer science from the University of California, Berkeley, in 1989.
From 1989 to 1992, he was with National Semiconductor Corp., Santa Clara, CA, where he worked
on CMOS mixed-signal IC design in the Data
Acquisition Group. In 1992, he joined Rambus Inc.,
Mountain View, CA, where he has been involved in
DLL and high-speed I/O design.
Clemenz L. Portmann (S'92–M'95) received the
B.S.E.E. degree from the University of Washington,
Seattle, in 1986, the M.S.E.E. degree from the
University of Hawaii at Manoa, Honolulu, in 1988,
and the Ph.D. degree in electrical engineering from
Stanford University, Stanford, CA, in 1995.
From 1988 to 1989, he was a Visiting Researcher
at Nagoya University, Nagoya, Japan, and the Toyohashi University of Technology, Toyohashi, Japan,
under the Monbusho (Ministry of Education) scholarship program. From 1989 to 1990, he was a
Design Engineer for VLSI Technology, Inc., San Jose, CA, where he designed
standard cell libraries and SRAMs for ASIC designs. In 1995, he joined
Rambus, Inc., Mountain View, CA, where he is engaged in the design of
high-speed I/O circuits and DLLs for DRAM interfaces.
Yiu-Fai Chan (S'76–M'78) received the B.S. and
M.S. degrees in electrical engineering and computer
science (with highest honors) from the University
of California (UC), Berkeley, in 1972 and 1973,
respectively.
He joined Rambus, Inc., Mountain View, CA, in
1992, where he is Director of Engineering, responsible for the development, application engineering,
and customer support of high-speed mixed-signal
circuits, device packaging, signal integrity, and system engineering. Prior to that, he was with Tera
Microsystems in charge of developing chips for workstations based on the
Sparc architecture. He was with Altera Corp. from 1983 to 1990, where he
led a team of engineers to develop the industry's first CMOS programmable
logic devices. From 1976 to 1983, he held various technical and management
positions at Intersil, Inc. (later a division of General Electric), where he was
engaged in the development of various CMOS memories, microprocessors,
and peripheral devices. It was there that he developed the first EPROM devices
in CMOS technology. From 1974 to 1976, he designed calculator and TV
game integrated circuits at National Semiconductor. He has received several
patents in circuits and systems technologies.
Mr. Chan is a member of Tau Beta Pi, Phi Beta Kappa, and Eta Kappa
Nu. He received the University Science Fellowship from UC Berkeley and
conducted research on solid-state devices and microwave acoustics. He has
published in various IEEE technical publications and presented papers at IEEE
technical conferences.
Thomas H. Lee (S'87–M'87), for a photograph and biography, see this issue,
p. 585.
Donald Stark received the B.S. degree from the
Massachusetts Institute of Technology, Cambridge,
in 1985 and the M.S. and Ph.D. degrees from
Stanford University, Stanford, CA, in 1987 and
1991, respectively, all in electrical engineering.
His research interests at Stanford included circuit
design and CAD tools for analysis of voltage and
current distributions in VLSI circuits. From 1987
to 1991, he was also a Member of the Western
Research Laboratory, Digital Equipment Corp., Palo
Alto, CA, working on CAD development and ECL
circuit design. From 1991 to 1993, he was with the Semiconductor Device
Engineering Laboratory, Toshiba Corp., Kawasaki, Japan, working on DRAM
design. In 1993, he joined Rambus, Inc., Mountain View, CA, where he
currently works on DRAM, high-speed I/O design, and CAD.
Mark A. Horowitz, for a photograph and biography, see p. 528 of the April
1999 issue of this JOURNAL.
