


A Quantum-Dot Cellular Automata Processor Design


Elverton Fazzion
Computer Science Department (UFMG)
Belo Horizonte, Brazil
elverton@dcc.ufmg.br

Osvaldo L. H. M. Fonseca
Computer Science Department (UFMG)
Belo Horizonte, Brazil
osvaldo.morais@dcc.ufmg.br

Jose Augusto M. Nacif
Science and Technology Institute, Florestal (UFV)
Viçosa, Brazil
jnacif@ufv.br

Omar P. Vilela Neto
Computer Science Department (UFMG)
Belo Horizonte, Brazil
omar@dcc.ufmg.br

Antonio Otavio Fernandes
Computer Science Department (UFMG)
Belo Horizonte, Brazil
otavio@dcc.ufmg.br

Douglas S. Silva
Computer Science Department (UFMG)
Belo Horizonte, Brazil
douglas.sales@dcc.ufmg.br

ABSTRACT

This paper describes the complete implementation of a robust SUBNEG (subtract and branch if negative) processor using quantum-dot cellular automata (QCA) technology. A processor is the basic unit of a computer system and is responsible for performing the basic arithmetic, logic, and input/output operations. QCA is a promising nanotechnology whose components have nanoscale size and ultra-low power consumption, and could reach clock rates in the terahertz range. The architecture of our processor was inspired by the one used in the first carbon nanotube computer. We used that work as a reference because both nanotechnologies (carbon nanotubes and QCA) are promising and able to overcome the limits of current CMOS technology. Our work is the first implementation of a SUBNEG processor in QCA technology and, moreover, it satisfies all the constraints required to make it robust. In a bottom-up approach, we first describe the building blocks that compose the QCA SUBNEG processor, such as the ALU and the data and instruction memories. Next, we present the processor architecture. Lastly, we describe tests and a performance evaluation of the QCA SUBNEG processor.

Keywords

QCA technology, Processor Design

Categories and Subject Descriptors

C.0 [General]: Modeling of computer architecture, System architectures

General Terms
Design
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for components
of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee. Request
permissions from Permissions@acm.org.
SBCCI '14, September 01 - 05 2014, Aracaju, Brazil
Copyright 2014 ACM 978-1-4503-3156-2/14/09$15.00
http://dx.doi.org/10.1145/2660540.2660997.

1. INTRODUCTION

The invention and miniaturization of the transistor enabled the great advances in the electronics and computer industries over the past 60 years. However, currently available devices are quickly approaching the physical limit of miniaturization as a result of various effects that are not found at larger scales, such as current leakage [1].

One possible alternative to current CMOS circuits is Quantum-dot Cellular Automata (QCA) [9]. QCA technology consists of a group of cells which, when combined and arranged in a particular way, are able to perform computational functions. QCA transfers information by means of the polarization state of its cells, in contrast to traditional computers, which use the flow of electrical current. A QCA design provides advantages such as ultra-small size, low power consumption, and high clock speed. The QCA clock rate could be in the range of 1-2 THz [8].

A CPU (Central Processing Unit), also known as a processor, is responsible for interpreting and executing program instructions and is the main component of a computer system. Many processor architectures have been proposed in the literature, such as MIPS, x86, and SUBNEG. When a new technology arises, there is an effort to implement a processor architecture on it. Recently, the 1-bit SUBNEG architecture [10] was implemented using carbon nanotube (CNT) technology [16], which is also one of the alternatives to CMOS circuits. The most recent processor implemented in QCA technology is based on the accumulator architecture [19].

In this paper we present an implementation of a SUBNEG processor using QCA technology, based on the first CNT computer [16]. Our contributions are the processor design and simulation, as well as the modifications required by QCA technology. We also propose a novel and simple synchronization system for large QCA designs. We demonstrate the correct behavior of the processor by presenting simulation results.

2. BACKGROUND

2.1 QCA

Figure 1: Possible polarizations of QCA cells with four quantum dots: P = -1 (binary 0) and P = +1 (binary 1). Black dots represent the electron positions.

The basic QCA building block is the cell. A QCA cell is composed of four quantum dots, each representing a position where an electron can be located [9]. Each cell is charged with two free electrons, which can tunnel between any of the four quantum dots. The cells are constructed such that charges cannot tunnel out of a cell because of a large potential barrier [9]. Due to Coulomb interaction, the charges in a cell tend to arrange themselves as far apart as possible. They therefore occupy diagonally opposite dots, so the cell has only two stable states, as illustrated in Figure 1. The cell polarization P corresponds to the probability distribution of the electron locations and varies in the interval [-1, +1]. By definition, P = -1 represents the binary value 0 and P = +1 represents the binary value 1.

When two cells are placed close together, each interferes with the other. When cells are arranged side by side, they tend to assume the same polarization. For example, consider a cell (cell 1) whose polarization is fixed at P1 = +1, and place a second cell (cell 2) next to it. The distribution of charges in cell 1 influences the distribution of charges in cell 2, which in turn determines the polarization of cell 2. Cell 2 therefore tends to take the same polarization as cell 1, which minimizes the Coulomb interaction among all the electrons involved. This behavior is shown in Figure 2(a). As a consequence, QCA cells placed in a row act like a wire, as shown in Figure 2(b). By placing QCA cells so as to exploit the interaction between them, we can create a QCA device with the desired logic. The basic QCA gates, the inverter and the majority gate, are explained below.

Figure 2: (a) Coulomb interaction between two QCA cells and (b) a QCA wire.

When cells are placed diagonally to each other, an inverter is implemented. As we can see in Figure 3(a), they tend to have opposite polarizations due to the repulsion between electrons.

The majority gate, shown in Figure 3(b), is the basic QCA logic device. The center cell of the gate reaches its lowest energy state when it assumes the polarization of the majority of the three input cells. This is the configuration in which the repulsion between the electrons in the three input cells and the electrons in the center cell is smallest. Observe in Figure 3(b) that, even though the polarization of input cell A represents binary 0, the output cell takes the polarization of cells B and C, which are the majority in this case. If input cell A is fixed at binary 0, the gate becomes a two-input AND gate (inputs B and C). Likewise, if cell A is fixed at binary 1, the gate becomes an OR gate. Any logic function can be implemented with ANDs, ORs, and inverters, so any computational circuit can be designed with these two basic QCA gates.
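
The reduction of the majority gate to AND and OR can be checked at the purely logical level. The sketch below models only the Boolean behavior, not the cell physics; the function names are illustrative and do not come from the paper.

```python
def majority(a, b, c):
    """Return the value held by at least two of the three input bits."""
    return (a & b) | (b & c) | (a & c)


def and_gate(b, c):
    """Majority gate with input A fixed at binary 0."""
    return majority(0, b, c)


def or_gate(b, c):
    """Majority gate with input A fixed at binary 1."""
    return majority(1, b, c)


def inverter(a):
    """Logical behavior of the diagonal-cell inverter."""
    return 1 - a


if __name__ == "__main__":
    # Print the truth table of both derived gates.
    for b in (0, 1):
        for c in (0, 1):
            print(b, c, "AND:", and_gate(b, c), "OR:", or_gate(b, c))
```

With A = 0 and B = C = 1, as in the case described for Figure 3(b), `majority` returns 1.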

Figure 3: (a) A QCA inverter and (b) a QCA majority gate.


In QCA, a clock is used to control the information flow in a circuit, preventing a signal that reaches a logic gate from propagating before the other inputs arrive at the gate. This characteristic is extremely important in QCA circuits, since it ensures their correct operation. The control is performed by raising or lowering the potential barriers between the quantum dots in the cells, which suppresses or allows charge tunneling between the dots.

The clock can be applied to groups of cells (clock zones). In each zone, a single potential modulates the barriers between the dots. The clock zone scheme allows a cluster of QCA cells to perform a computation, freeze its state and, finally, serve its outputs as inputs to the next clock zone.

The QCA clock is divided into four phases, the first of which is called the Switch phase. At the start of this phase, the cells are depolarized and the potential barriers are low. The barriers are then gradually raised, and the cells become polarized according to the polarization of their neighbors. Once the barriers reach their maximum height, the Hold phase begins. During this phase, the charges cannot change position within the cell, so the cell can influence other cells without changing itself. At the end of this phase, the barriers are gradually lowered and the cells begin to depolarize, in what is called the Release phase. Finally, when the barriers reach their lowest level, the Relax phase begins, in which the cells remain depolarized and the barriers remain at their lowest level. The clock cycle then starts over with the Switch phase.
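
The four phases and the hand-off between clock zones can be sketched as a small schedule. The sketch assumes that each clock zone lags the previous one by exactly one phase; this is the usual four-zone convention and is not stated explicitly in the text above. The names `zone_phase` and `schedule` are illustrative.

```python
PHASES = ("Switch", "Hold", "Release", "Relax")


def zone_phase(zone, step):
    """Phase of clock zone `zone` at time step `step`.

    Assumption: zone k lags zone k-1 by one phase (a quarter of a cycle).
    """
    return PHASES[(step - zone) % len(PHASES)]


def schedule(zones, steps):
    """Rows are time steps, columns are clock zones."""
    return [[zone_phase(z, t) for z in range(zones)] for t in range(steps)]


if __name__ == "__main__":
    for t, row in enumerate(schedule(zones=4, steps=4)):
        print(t, row)
```

Under this assumption, whenever a zone is in its Switch phase the zone feeding it is in Hold, which matches the description of a zone freezing its state and serving its outputs as inputs to the next zone.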
QCA has been applied to the development of logic circuits, as can be found in some recent studies [3, 14, 4, 11, 5, 13, 2, 17].

3. RELATED WORK

Few works have presented the implementation of a QCA processor, and almost all of them implement an accumulator architecture, as in [19] and [12]. The first uses an architecture that does not follow the constraints required to make a QCA circuit design robust; for example, the circuit does not use clock zones in the wires [7] and uses rotated cells for crossings [15]. The other, although more robust, has neither an instruction memory nor a program counter, i.e., it does not implement branching, the feature that essentially differentiates a computer from a calculator.

Our QCA processor implements the SUBNEG architecture, similar to that of the first carbon nanotube processor [16, 10]. Since QCA technology is still new, implementing a simple processor architecture with a robust design is an important step toward showing the feasibility of the technology. Our work can also serve as a model for the next steps of QCA technology.

The SUBNEG architecture, in spite of being simple, is Turing complete. SUBNEG has a single instruction, which performs two operations: a subtraction of two operands and a branch when the result of the subtraction is negative. The instruction format is also simple: it holds the addresses of the two operands and part of the next instruction's address. This simplicity is why the architecture is a good starting point for new technologies.

4. PROCESSOR ORGANIZATION

In this section, we present the SUBNEG architecture for QCA. As QCA is not based on transistors, some changes to the circuit were needed to make SUBNEG viable in this technology. Figure 4 shows the combinational logic design of the QCA processor. The circuit is composed of three main components: the Instruction Memory (IM), the Data Memory (DM), and the ALU. The Next Instruction Address is a register that saves the known part of the next instruction address. For simplicity, we consider an architecture with a DM of four 1-bit positions and an IM that holds four 5-bit instructions. However, our architecture scales to processors with larger memories.

Figure 4: Combinational logic design of the processor architecture.

QCA provides inherent pipelining, and no registers are needed to save the state of the machine because this information is kept by the clock zones. However, this implies that the correct state is held in the wire. We therefore need a mechanism to identify the correct state in the circuit in order to perform writes correctly in the DM and to select the right instruction in the IM. In addition, QCA has no high-impedance signal. To handle this situation, we created a naive but efficient technique: a wire in parallel to the processor circuit that carries the polarization P = +1 if the corresponding instruction in the processor should be executed. We call this mechanism WireSync because this wire is responsible for the synchronization of the entire processor. Figure 5 illustrates the idea.

Figure 5: WireSync operation.

Notice that this mechanism allows more than one instruction to execute at the same time: each instruction with polarization P = +1 in WireSync will be executed. In the best scenario, where no data hazards (RAW, WAR, or WAW) are present, instructions could run only one clock cycle apart, strongly improving performance. However, the SUBNEG architecture always has a RAW hazard: to proceed to the next instruction, we need the branching result from the ALU. Therefore, we execute only one instruction at a time, i.e., there is only one P = +1 in WireSync.
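
The WireSync idea can be sketched behaviorally as a data wire and a parallel sync wire that advance together, with a register that stores a value only when the sync bit arrives with it. This is a minimal sketch under stated assumptions: the wire is modeled as a delay line with one (data, sync) pair per stage, P = +1 is encoded as 1 and P = -1 as 0, and the class and function names (`SyncedWire`, `GatedRegister`, `deliver`) are hypothetical.

```python
from collections import deque


class SyncedWire:
    """A wire modeled as a delay line of (data, sync) pairs, one pair per stage."""

    def __init__(self, stages):
        self.line = deque([(0, 0)] * stages, maxlen=stages)

    def step(self, data, sync):
        """Shift by one stage: push a new pair in, return the pair leaving the wire."""
        leaving = self.line[-1]
        self.line.appendleft((data, sync))  # maxlen discards the oldest pair
        return leaving


class GatedRegister:
    """Register that stores `data` only when the sync bit is 1 (P = +1)."""

    def __init__(self, value=0):
        self.value = value

    def clock(self, data, sync):
        if sync == 1:
            self.value = data


def deliver(stream, stages=3, initial=0):
    """Send (data, sync) pairs down the wire and return the final register value."""
    wire, reg = SyncedWire(stages), GatedRegister(initial)
    # Feed the stream, then flush the wire with idle pairs (sync = 0).
    for data, sync in list(stream) + [(0, 0)] * stages:
        reg.clock(*wire.step(data, sync))
    return reg.value


if __name__ == "__main__":
    # Only the middle value travels with sync = 1, so only it is stored.
    print(deliver([(1, 0), (0, 1), (1, 0)], initial=1))
```

The data wire always carries some value, so the register cannot tell a meaningful value from a transient one on its own; the sync bit travelling alongside is what marks the value to keep.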
Next, we present the details of each of the three main
components.

4.1 Instruction Memory

The Instruction Memory (IM) holds the instructions of the SUBNEG processor. This memory is a ROM, i.e., the instructions are already stored. In the QCA circuit, we use fixed cells to represent the fixed data in the ROM and the robust multiplexers from [6] to select the instruction. Figure 6 shows the IM as a combinational logic design.

Figure 6: Combinational logic design of the instruction memory architecture.

The SUBNEG architecture has only one simple instruction. In Figure 6, the instruction bits are little endian (i[0] to i[4]) and have the following format: the least significant bit (i[0]) is the known part of the next instruction address, the next two bits (i[1] and i[2]) are the address of the second operand (B), and the last two bits (i[3] and i[4]) are the address of the first operand (A). Figure 7 illustrates the instruction format.

Figure 7: SUBNEG instruction format.

As part of the IM, bit i[0] is stored in a register (1), where it waits for the other bit, which comes from the ALU result, to define the next instruction address. Here, we use the WireSync mechanism to write the correct value to the register. The correct value from the instruction is accompanied by a parallel P = +1 in the WireSync_out from the IM. When i[0] reaches the special register for the next instruction address, the P = +1 reaches it as well and the write to the register is performed.

(1) All registers implemented here use the robust register implementation from [6].

Figure 8: Combinational logic design of the data memory architecture.

4.3 Arithmetic Logic Unit

The arithmetic logic unit (ALU) is the simplest component of the SUBNEG processor because it performs only two operations. The first operation, the subtraction, reduces to an XOR, and the negative test can be performed by the single logic equation A·B' (A AND NOT B) [16, 10]. The ALU works asynchronously, so WireSync serves here only to accompany the correct ALU result. The ALU logic design is shown in Figure 9.
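
The two ALU outputs can be written as a pair of one-bit Boolean functions. In this sketch the negative test is taken to be A AND NOT B, an assumption chosen because it reproduces the values reported in the simulation section (A = 1, B = 0 gives a branching result of 1, and A = B = 1 gives 0). The function name `alu` is illustrative.

```python
def alu(a, b):
    """Return (subtraction, branching) for one-bit operands a and b.

    subtraction: XOR of the operands, the 1-bit difference.
    branching:   assumed to be A AND NOT B, the negative test.
    """
    subtraction = a ^ b
    branching = a & (1 - b)
    return subtraction, branching


if __name__ == "__main__":
    # Print both outputs for every operand combination.
    for a in (0, 1):
        for b in (0, 1):
            print("A =", a, "B =", b, "->", alu(a, b))
```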

4.2 Data Memory

The DM design is slightly more complex because it operates in two steps. The first step selects the A and B values for the ALU and stores the write-back address, which is the address of register B. The second step writes the subtraction result from the ALU (a single XOR) to the address stored in the first step.

We use the WireSync mechanism in both steps. To store the write-back address, when the right write-back address reaches the write-back address registers, a P = +1 in WireSync_IM triggers the writing of the correct address.

The same idea applies to the write-back of the ALU's subtraction result. For each DM position, a logic function of the address stored in the write-back address registers produces a P = +1 at the output of the corresponding memory position. When the P = +1 from WireSync_ALU arrives, it indicates that the correct value from the ALU has arrived, and a P = +1 is released to perform the write in the memory. Figure 8 shows the DM combinational logic design.
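
The write-back gating described above can be sketched as one write-enable signal per DM position: the position is selected by the stored write-back address, and the write happens only when the sync bit from the ALU is present. This is a behavioral sketch, not the QCA circuit; P = +1 is encoded as 1, and `write_enables` and `write_back` are hypothetical names.

```python
def write_enables(stored_addr, wiresync_alu, positions=4):
    """One enable bit per DM position: address match AND sync bit present."""
    return [int(stored_addr == k and wiresync_alu == 1) for k in range(positions)]


def write_back(dm, stored_addr, alu_result, wiresync_alu):
    """Return a new DM in which only the enabled position takes the ALU result."""
    enables = write_enables(stored_addr, wiresync_alu, len(dm))
    return [alu_result if en else old for old, en in zip(dm, enables)]


if __name__ == "__main__":
    dm = [0, 1, 1, 1]
    print(write_back(dm, stored_addr=0b11, alu_result=0, wiresync_alu=1))
    print(write_back(dm, stored_addr=0b11, alu_result=0, wiresync_alu=0))
```

Without the sync bit the memory is left unchanged, whatever value happens to be on the ALU output wire.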

Figure 9: Combinational logic design of the ALU architecture (labeled signals: WireSync, WireSync_out_DM, WireSync_out_IM).

4.4 Integrated Processor

In this section, we present the full QCA circuit with all the components discussed in Section 4. Figure 10 shows the complete QCA circuit of the SUBNEG processor. It is important to note that our processor implements all the robustness constraints found in the literature. The work in [7] defines a maximum wire length of 12 cells, a minimum of 2 cells per clock zone, and a robust majority gate, which is widely used in our circuit. Another work [15] encourages the use of multi-layer crossings instead of coplanar crossings with rotated cells in order to make the circuit more robust. Finally, we make broad use of the robust multiplexer and D flip-flop presented in [6].

The proposed processor occupies a total area of 8.63 µm^2, using 4817 QCA cells.

Table 1: Internal Component Delays

Component  Operation                  Cycles
IM         Select Instruction         11
IM         Save Addr Bit              7
DM         Select Registers A and B   8
DM         Save Register B            4
DM         Write Value to Register    12
ALU        Process Registers A and B  3

The processor works as follows. First, we define the critical path of the QCA circuit as the minimum number of cycles required before the next instruction can be processed. Initially, the IM receives the 2-bit instruction address, whose most significant bit comes from the register that keeps the known bit from the previous instruction and whose least significant bit comes from the ALU. It also receives a P = +1 from WireSync_out_IM. The instruction selection step, which is in the critical path, takes 11 full cycles to complete, according to Table 1, and passes the addresses of A and B and the P = +1 of WireSync_out to the DM.

Next, three operations occur concurrently: the storage of instruction bit[0], the reading of A (bit[4] and bit[3]) and B (bit[2] and bit[1]) from the DM, and the storage of address B for the write-back. They take 7, 8, and 4 cycles, respectively. As the data reading belongs to the critical path and takes the longest of the three, all operations are guaranteed to complete in time. Then, the A and B values and the P = +1 of WireSync are passed to the ALU, which takes 3 cycles to perform its operations. The ALU operations are in the critical path. The ALU then forwards the P = +1 in WireSync_out_DM and the subtraction result to the DM for the write-back, and the branching result and the P = +1 in WireSync_out_IM to the IM.

Lastly, the write-back completes in at most 12 cycles (if the target register is 00) while the next instruction is being selected, meaning that the write-back is not in the critical path. It is important to mention that the write-back finishes before the storage of address B (of the next instruction) occurs. Thus, no conflict is possible here, even with two instructions sharing the same time window. The processor cycle therefore takes 22 QCA cycles in the critical path.
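
The 22-cycle figure follows directly from the delays in Table 1: instruction selection, then the longest of the three concurrent operations, then the ALU. The sketch below redoes that arithmetic. The second function is an added inference rather than a result from the paper: it assumes that the write-back and the next instruction selection both start when the ALU emits its results, and that address B is stored at the end of its 4-cycle operation. The dictionary keys and function names are illustrative.

```python
# Delays in QCA clock cycles, copied from Table 1.
DELAYS = {
    "im_select_instruction": 11,
    "im_save_addr_bit": 7,
    "dm_select_a_and_b": 8,
    "dm_save_register_b": 4,
    "dm_write_value": 12,
    "alu_process": 3,
}


def critical_path(d):
    """Instruction selection + slowest concurrent operation + ALU."""
    concurrent = max(d["im_save_addr_bit"],
                     d["dm_select_a_and_b"],
                     d["dm_save_register_b"])
    return d["im_select_instruction"] + concurrent + d["alu_process"]


def write_back_slack(d):
    """Cycles between the end of the write-back and the next address-B storage.

    Assumption (not from the paper): both intervals are measured from the
    moment the ALU emits its results.
    """
    next_b_store = d["im_select_instruction"] + d["dm_save_register_b"]
    return next_b_store - d["dm_write_value"]


if __name__ == "__main__":
    print("critical path:", critical_path(DELAYS), "cycles")
    print("write-back slack:", write_back_slack(DELAYS), "cycles")
```

Under these assumptions the slack is positive, which is consistent with the statement that the write-back finishes before address B is stored.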

5. SIMULATION RESULTS

In this section, we present a test of the SUBNEG processor. Several tests were executed; however, due to space limitations, we present only one example. We implemented the proposed processor in the QCADesigner simulator, using the coherence vector engine [18] at a temperature of 1 K.

The instructions are shown in Table 2. First, we initialize DM addresses 01, 10, and 11 with 1 and address 00 with 0. Then, we set the first instruction address to 10. According to Table 2, the instruction at this address sets the write-back address to 11 and selects the A and B values from the DM (both 1), sending them to the ALU. The ALU results for both the subtraction and the branching are 0. Thus, DM 11 is set to 0 and the next instruction address becomes 00: the first 0 comes from instruction 10 and the second from the branching. The results of the first cycle are shown in Figure 11.

Table 2: Instruction Memory

Address  b[0] to b[4]
00       10001
01       01010
10       01111
11       11111

Figure 11: Two processor cycles. (Simulated waveforms, each ranging from -9.88e-01 to 9.88e-01: ALU A, ALU B, Subtraction, Branching, Writeback_addr[0], Writeback_addr[1], Instruction_addr[0], Instruction_addr[1], Wiresync, and Reg[0] to Reg[3].)


The next instruction sets the write-back address to 00 and reads the values at addresses 10 and 00 for A and B, respectively. Therefore, A is 1 and B is 0. Both ALU results are 1. Thus, 1 is written to DM 00 and the next instruction address is 11. The results of the second cycle are also shown in Figure 11.
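
The two cycles described above can be replayed with a small behavioral model of the processor. This is a sketch at the logic level only: it ignores QCA timing and WireSync. It takes the instruction words from Table 2, reads the operand addresses with bit[4] and bit[2] as the high address bits (as in the example above), and assumes the negative test is A AND NOT B. All function names are illustrative.

```python
# Table 2: each word lists bits b[0]..b[4] from left to right.
INSTRUCTION_MEMORY = {0b00: "10001", 0b01: "01010", 0b10: "01111", 0b11: "11111"}


def decode(word):
    """Split a 5-bit word into (known next-address bit, address of A, address of B)."""
    i = [int(ch) for ch in word]
    known_bit = i[0]
    addr_b = (i[2] << 1) | i[1]
    addr_a = (i[4] << 1) | i[3]
    return known_bit, addr_a, addr_b


def step(pc, dm):
    """Execute the instruction at address pc; return (next_pc, new_dm)."""
    known_bit, addr_a, addr_b = decode(INSTRUCTION_MEMORY[pc])
    a, b = dm[addr_a], dm[addr_b]
    subtraction = a ^ b        # 1-bit subtraction is an XOR
    branching = a & (1 - b)    # assumption: negative test is A AND NOT B
    new_dm = list(dm)
    new_dm[addr_b] = subtraction           # write back to operand B's address
    next_pc = (known_bit << 1) | branching  # known bit is the high address bit
    return next_pc, new_dm


def run(pc, dm, cycles):
    """Return the list of (next_pc, dm) states after each processor cycle."""
    trace = []
    for _ in range(cycles):
        pc, dm = step(pc, dm)
        trace.append((pc, dm))
    return trace


if __name__ == "__main__":
    # DM addresses 01, 10, 11 hold 1 and address 00 holds 0; start at address 10.
    for pc, dm in run(0b10, [0, 1, 1, 1], cycles=2):
        print(f"next instruction address = {pc:02b}, DM[00..11] = {dm}")
```

Starting from instruction address 10, the model sets DM 11 to 0 and moves to address 00 in the first cycle, then writes 1 to DM 00 and moves to address 11 in the second, matching the values reported in the text.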

6. CONCLUSIONS

In this work we proposed and implemented a 1-bit SUBNEG processor using QCA technology. The proposed processor is scalable and may have a major impact on the development of future extremely fast nanoarchitecture circuits. QCA is a promising nanoscale technology whose components have nanoscale size and ultra-low power consumption, and might reach clock rates in the terahertz range. Since we respected all the constraints found in the QCA literature for designing a robust circuit, our work is an important step toward showing the feasibility of computer development using QCA technology. We demonstrated the functionality of the processor and validated the results. For future work, we intend to expand the implementation of our processor.

Figure 10: SUBNEG processor. (Labeled blocks: Data Memory, Register, Selector A (3 mux 2:1), Selector B (3 mux 2:1), Writeback Address, QCA Clock Zones, ALU, Initializers, WireSync, Instruction Address[1], Instruction Memory.)

7. ACKNOWLEDGMENTS

This work has been supported by DISSE - The National Institute of Science and Technology on Semiconductors Nanodevices, PRPq-UFMG, CNPq, UFV, and FAPEMIG.

8. ADDITIONAL AUTHORS

Jeferson Figueiredo Chaves (Computer Science Department (UFMG), email: jefchaves@dcc.ufmg.br)

9. REFERENCES

[1] S. I. Association. The technology roadmap for semiconductors: Emerging research devices. 2004 update. 2004.
[2] H. Cho and E. Swartzlander. Adder designs and analyses for quantum-dot cellular automata. IEEE Transactions on Nanotechnology, 6(3):374-383, 2007.
[3] M. Dehkordi, A. Shamsabadi, B. Ghahfarokhi, and A. Vafaei. Novel ram cell designs based on inherent capabilities of quantum-dot cellular automata. Microelectronics Journal, 42(5):701-708, 2011.
[4] R. Devadoss, K. Paul, and M. Balakrishnan. p-qca: A tiled programmable fabric architecture using molecular quantum-dot cellular automata. ACM Journal on Emerging Technologies in Computing Systems (JETC), 7(3):13, 2011.
[5] M. Gladshtein. Quantum-dot cellular automata serial decimal adder. IEEE Transactions on Nanotechnology, 10(6):1377-1382, 2011.
[6] S. Hashemi and K. Navi. New robust QCA d flip flop and memory structures. Microelectronics Journal, 43(12):929-940, 2012.
[7] K. Kim, K. Wu, and R. Karri. Towards designing robust qca architectures in the presence of sneak noise paths. In Proceedings of the Conference on Design, Automation and Test in Europe - Volume 2, DATE '05, pages 1214-1219, Washington, DC, USA, 2005. IEEE Computer Society.
[8] K. Kim, K. Wu, and R. Karri. Quantum-dot cellular automata design guideline. IEICE Trans. Fundam. Electron. Commun. Comput. Sci., E89-A(6):1607-1614, June 2006.
[9] C. Lent and P. Tougaw. A device architecture for computing with quantum dots. Proceedings of the IEEE, 85(4):541-557, Apr. 1997.
[10] A. Lin. Carbon Nanotube Synthesis, Device Fabrication, and Circuit Design for Digital Logic Applications. PhD thesis, Stanford, 2010.
[11] V. Mardiris and I. Karafyllidis. Design and simulation of modular 2n to 1 quantum-dot cellular automata (qca) multiplexers. International Journal of Circuit Theory and Applications, 38(8):771-785, 2010.
[12] M. T. Niemier. Designing digital systems in quantum cellular automata. PhD thesis, University of Notre Dame, 2004.
[13] V. Pudi and K. Sridharan. Efficient design of a hybrid adder in quantum-dot cellular automata. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 19(9):1535-1548, 2011.
[14] L. H. B. Sardinha, A. M. M. Costa, O. P. Vilela Neto, L. F. M. Vieira, and M. A. M. Vieira. Nanorouter: A quantum-dot cellular automata design. IEEE Journal on Selected Areas in Communications, 31(12):825-834, 2013.
[15] G. Schulhof, K. Walus, and G. A. Jullien. Simulation of random cell displacements in qca. J. Emerg. Technol. Comput. Syst., 3(1), Apr. 2007.
[16] M. M. Shulaker, G. Hills, N. Patil, H. Wei, H.-Y. Chen, H.-S. P. Wong, and S. Mitra. Carbon nanotube computer. Nature, 501(7468):526-530, Sep. 2013. Letter.
[17] H. Thapliyal and N. Ranganathan. Reversible logic-based concurrently testable latches for molecular qca. IEEE Transactions on Nanotechnology, 9(1):62-69, 2010.
[18] K. Walus, T. Dysart, G. Jullien, and R. Budiman. Qcadesigner: a rapid design and simulation tool for quantum-dot cellular automata. IEEE Transactions on Nanotechnology, 3(1):26-31, Mar. 2004.
[19] K. Walus, M. Mazur, G. Schulhof, and G. A. Jullien. Simple 4-bit processor based on quantum-dot cellular automata (qca). In ASAP, pages 288-293. IEEE Computer Society, 2005.
