DESIGN AND IMPLEMENTATION OF AN ARBITER FOR
EFFICIENT MULTISYSTEM INTERACTION WITH SHARED
RESOURCES
ASHISH K BABU, V NATTARASU, SHASHIDHARA H R
Department of Electronics and Communication Engineering, Sri Jayachamarajendra College of Engineering, Mysore
JSS Academy of Technical Education, Bengaluru, Karnataka, INDIA.
Abstract: The paper presents the design and analysis of an arbiter for efficient multisystem interaction with shared
resources. The arbitration algorithm uses a static fixed priority scheme to gain fast system performance and lower area
implementation and cost. The analysis for data requests of clients under different priorities has been simulated. The results
indicate that a better performance can be achieved with this arbitration scheme. Based on the performance analysis, the fixed
priority arbitration scheme can be custom tuned to meet the design requirements. The implementation of the arbiter with
fixed priority arbitration scheme for SoC applications has also been explained. The arbiter was implemented on FPGA
using XILINX 14.2 version with a 90nm cell library. In addition, the power analysis of the arbiter at various
arbitration states is reported. The fixed priority arbiter can be custom tuned to obtain high bandwidth utilization, low latency
and power effectiveness for on-chip bus communication.
Keywords—Arbiter, system-on-chip, arbitration algorithm, FPGA, shared resources.
I. INTRODUCTION
A typical System-on-Chip (SOC) design contains many heterogeneous cores linked together with sophisticated on-chip bus communication architectures. The on-chip bus communication architecture determines the way these heterogeneous functional units exchange and synchronize data, and has a great impact on the system's performance. The SOC design paradigm relies on well defined interfaces and reuse of intellectual property (IP). Because more and more IPs are integrated into the design platform, the amount of communication between the IPs is on the increase and becomes the source of performance bottlenecks.
In the last thirty years computer memory and other shared resources have evolved rapidly, seeing improvements in both capacity and speed. However, the logic for controlling the shared resources has also become increasingly complex and difficult to interface with. The third generation of double data rate synchronous dynamic random access memory (DDR3 SDRAM) is the newest and the fastest volatile memory currently available. DDR3 RAM is one of the many types of RAM used to temporarily hold data that a system needs to have quick access to. DRAM memory is designed for communication with a single system. For two systems to share the same memory, an arbiter must be used. The shared resource arbiter serves as an interface to both systems and grants each system access to the memory one at a time to avoid collisions.
In this project we developed an arbiter for a shared resource (in our case a basic hardware multiplier) and analysed it. The hardware multiplier would enable a very fast data calculation rate and significantly reduce the time the testers take to transfer data. The development of the arbiter was done using Xilinx ISE 14.2. It enabled us to write VHDL code that describes the design of the arbiter and to insert a basic hardware multiplier as the shared resource. The simulation was run in ModelSim 6.3f. The ModelSim tool was very useful because its GUI enables the display of any signal waveform in the design. Since simulation has a fast turnaround time, we iteratively validated the design in ModelSim to ensure any bugs in the code were caught early. Although the arbiter has been fully developed and is fully functional, it still needs to be properly tested with greater coverage. Functionality checkers should be developed for this design which would test the validity of the arbiter in a runtime environment, as in practical use.
In many applications it is very useful for more than one system to interact with memory. Xilinx has released a memory controller for DDR3 which handles everything for communication between one system and memory. If two systems try to access DDR3 at the same time, there is a great risk that the data accessed will not be accurate. The goal of this project is to design a traffic controller, or arbiter, that decides which system's turn it is to send requests while blocking requests from the other system [8]. An arbiter will prevent collisions between data accesses and keep requests in order, to ensure that memory holds accurate data.
II. ARBITRATION SCHEMES FOR
SHARED RESOURCES: A REVIEW
This section discusses arbiters, which resolve
multiple requests for a single resource. In addition to
being useful in their own right, arbiters form the
fundamental building block for controllers that match
Proceedings of International Joint Conference, 7th July 2013, Goa, India, ISBN: 978-81-927147-7-5
multiple requesters with multiple resources. Whenever a resource, such as a memory, a bus or a multiplier, is shared by many agents, an arbiter is required to assign access to the resource to one agent at a time. Figure 1 shows a symbol for an n-input arbiter that is used to arbitrate the use of a resource, such as the input port to a crossbar switch, among a set of agents, such as the client systems (CSs) connected to that input port. Each client system that has a flit to send requests access to the input port by asserting its request line. Suppose, for example, that there are n = 8 CSs and CSs 1, 3, and 5 assert their request lines, r1, r3, and r5. The arbiter then selects one of these CSs (say, CS i = 3) and grants the input port to CS 3 by asserting grant line g3. CSs 1 and 5 lose the arbitration and must try again later by reasserting r1 and r5. Most often, these lines will just be held high until the CS is successful and receives a grant.
Figure 1: Arbiter symbols. (a) An arbiter accepts n request lines, $r_0, \ldots, r_{n-1}$, arbitrates among the asserted request lines, selecting one, $r_i$, for service, and asserting the corresponding grant line, $g_i$. (b) To allow grants to be held for an arbitrary amount of time without gaps between requests, a hold line is added to the arbiter.
2.1 Fixed Priority Schemes
2.1.1 Static fixed priority scheme
Static fixed priority is a common scheduling mechanism on most common buses. In a static fixed priority scheduling policy, each master is assigned a fixed priority value. When several masters request simultaneously, the master with the highest priority will be granted. The advantage of this arbitration is its simple implementation and small area cost. The static priority based architecture does not provide a means for controlling the fraction of communication bandwidth assigned to a component. If masters with high priority request frequently, this leads to starvation of the ones with low priority.
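The starvation effect described above can be illustrated with a minimal Python sketch. The function names and the convention that index 0 is the highest priority are assumptions made for this illustration only; they are not taken from the implemented design.

```python
def static_priority_grant(requests):
    """Return the index of the granted master, or None if nobody requests.

    Assumption for this sketch: index 0 has the highest priority.
    """
    for master, requesting in enumerate(requests):
        if requesting:
            return master
    return None


def count_grants(request_trace):
    """Count how often each master wins over a trace of request vectors."""
    grants = [0] * len(request_trace[0])
    for requests in request_trace:
        winner = static_priority_grant(requests)
        if winner is not None:
            grants[winner] += 1
    return grants


# Masters 0 and 2 both request in every one of 8 cycles.
# Master 0 always wins, so master 2 is starved.
trace = [[True, False, True]] * 8
grant_counts = count_grants(trace)
```

In this trace the low priority master never receives a grant for as long as the high priority master keeps requesting, which is the behaviour the paragraph warns about.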
If we assign priority in a linear order, we can construct an arbiter as an iterative circuit, as illustrated for the fixed-priority arbiter of Figure 2. We construct the arbiter as a linear array of bit cells. Each bit cell i, as shown in Figure 2(a), accepts one request input, $r_i$, and one carry input, $c_i$, and generates a grant output, $g_i$, and a carry output, $c_{i+1}$. The carry input $c_i$ indicates that the resource has not been granted to a higher priority request and, hence, is available for this bit cell. If the current request is true and the carry is true, the grant line is asserted and the carry output is deasserted, signalling that the resource has been granted and is no longer available.
Figure 2: A fixed priority arbiter. (a) A bit cell for an iterative arbiter. (b) A four-bit fixed priority arbiter.
We can express this logic in equation form as:
$g_i = r_i \wedge c_i$
$c_{i+1} = \neg r_i \wedge c_i$
Figure 2(b) shows a four-bit fixed priority arbiter constructed in this manner. The arbiter consists of four of the bit cells of Figure 2(a). The first and last bit cells, however, have been simplified. The first bit cell takes advantage of the fact that $c_0$ is always 1, while the last bit cell takes advantage of the fact that there is no need to generate $c_4$. At first glance, it appears that the delay of an iterative arbiter must be linear in the number of inputs. Indeed, if they are constructed as shown in Figure 2, then that is the case, since in the worst case the carry must propagate from one end of the arbiter to the other. However, we can build these arbiters to operate in time that grows logarithmically with the number of inputs by employing carry-lookahead techniques similar to those employed in adders [4].
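As a rough software model of the iterative arbiter, the Python sketch below chains the bit cell equations (grant = request AND carry, next carry = NOT request AND carry) from the highest priority input downwards. The function names are illustrative, and treating index 0 as the highest priority is an assumption of the sketch. The second function shows a word-level shortcut that isolates the lowest set bit; it gives the same result on an integer request word but is not a model of carry-lookahead hardware.

```python
def bit_cell(r_i, c_i):
    """One arbiter bit cell: returns (g_i, c_next)."""
    g_i = r_i and c_i            # grant if requesting and resource still free
    c_next = (not r_i) and c_i   # resource stays free only if not granted here
    return g_i, c_next


def fixed_priority_arbiter(requests):
    """Ripple the carry through the bit cells, index 0 first (assumed highest priority)."""
    carry = True                 # c_0 is always 1
    grants = []
    for r in requests:
        g, carry = bit_cell(r, carry)
        grants.append(g)
    return grants


def lowest_set_bit(request_word):
    """Word-level equivalent: keep only the least significant asserted request bit."""
    return request_word & -request_word


# Request lines r1, r3 and r5 asserted out of n = 8, as in the earlier example.
example_grants = fixed_priority_arbiter(
    [False, True, False, True, False, True, False, False]
)
```

With a fixed priority order the arbiter always picks CS 1 in this example; an arbiter that selects CS 3 instead, as in the general description of Figure 1, would need a different priority assignment.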
2.1.2 TDM/Round-robin scheme
Time division multiplexed (TDM) scheduling divides execution time on the bus into time slots and allocates the time slots to adapters requesting use of the bus. Each time slot can span several physical transactions on the bus. A request for use of the bus might require multiple slot times to perform all required transfers. However, in this architecture, the components are provided access to the communication channel in an interleaved manner, using a two-level arbitration protocol [3].
Figure 3: Time division multiplexed / Round Robin Architecture
The first level of arbitration uses a timing wheel where each slot is statically reserved for a unique master. In a single rotation of the wheel, a master that has reserved more than one slot is potentially granted access to the channel multiple times. If the master interface associated with the current slot has an outstanding request, a single word transfer is granted, and the timing wheel is rotated by one slot. To alleviate the problem of wasted slots, a second level of arbitration is supported. The policy is to keep track of the last master interface to be granted access via the second level of arbitration, and to issue a grant to the next requesting master in a round-robin fashion. In Figure 3, the current slot is reserved for M1, but it has no data to communicate. The second level increments a round-robin pointer rr2 from its current position at M2 to the next outstanding request at M4. The advantage of this algorithm is that it is easy to implement. The disadvantage is that it can lead to errors in data transfer.
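The two-level protocol described above can be sketched in Python as follows. This is an illustrative model only: the function name, the list-based timing wheel and the assumption that the wheel rotates by one slot on every arbitration (whether or not the slot owner used it) are choices made for this sketch.

```python
def two_level_arbitrate(wheel, slot, rr_pointer, requests):
    """Return (granted_master, next_slot, next_rr_pointer).

    wheel:      list of master indices, one entry per time slot
    slot:       index of the current slot in the wheel
    rr_pointer: master last granted by the second (round-robin) level
    requests:   list of booleans, one per master
    """
    n = len(requests)
    next_slot = (slot + 1) % len(wheel)   # assumed: wheel rotates every cycle
    owner = wheel[slot]

    # First level: the slot owner wins if it has an outstanding request.
    if requests[owner]:
        return owner, next_slot, rr_pointer

    # Second level: search round-robin, starting after the last winner.
    for offset in range(1, n + 1):
        candidate = (rr_pointer + offset) % n
        if requests[candidate]:
            return candidate, next_slot, candidate

    return None, next_slot, rr_pointer    # slot wasted: nobody is requesting


# Masters M1..M4 are indices 0..3. The current slot belongs to M1, which is idle;
# the round-robin pointer is at M2 and only M4 has an outstanding request.
result = two_level_arbitrate(
    wheel=[0, 1, 0, 2, 3], slot=0, rr_pointer=1,
    requests=[False, False, False, True],
)
```

The example reproduces the situation described for the figure: the slot reserved for M1 is reused by M4, and the round-robin pointer moves from M2 to M4.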
III. METHODOLOGY
A system has a shared resource, such as a bus or memory, with which various devices may communicate upon a request being granted by an arbiter. In order to reduce the arbitration time, the arbiter is comprised of a state machine and a latch which run asynchronously. Consequently, once one request has been granted and acted upon, the state machine will commence arbitration for any remaining requests. In order that a plurality of different portions of a computer system may share a common resource, such as a common bus or a common memory, it is necessary to provide an arbitration system so that only one portion uses the resource at a given time. Arbitration problems arise not only between processors or processes in computer systems, but also in a wide variety of other electronic systems. For example, arbitration needs commonly arise in communications networks, where interface access and gateway access can be important shared resources. In a known bus arbitration scheme, at one system clock cycle the bus requests which are current are loaded into a register, and at the next system clock cycle the requests held in the register are arbitrated by a state machine to determine which request to grant. The register is necessary to overcome metastability problems; without the register, if a request were to change at the time of arbitration, an incorrect grant, or "glitch", is likely to occur.
The design presented in this paper provides an arbitration system which does not need to wait for an external clock signal before commencing an arbitration operation and which can commence a subsequent arbitration operation immediately after a preceding request has been satisfied. It also provides an arbitration system having latch means with a plurality of signal inputs, a corresponding plurality of signal outputs and a control input. Each signal input receives a respective one of the resource request signals. The control input receives a control signal. Each signal output provides a respective second resource request signal which is selectably a latched form or a passed form of the respective first-mentioned resource request signal, in dependence on the control signal. Furthermore, the system includes an asynchronous state machine having a plurality of inputs and outputs. Each input receives a respective one of the first and second resource request signals; each output is provided for a respective one of the first resource request signals and its corresponding second resource request signal. Each output provides a respective one of the grant signals to a respective one of the operating means.
A control output provides the control signal to the latch means. The state machine operates cyclically and asynchronously (a) to set the control signal in response to one or more of the resource request signals so that the latching means latches the first resource request signals to provide the second resource request signals; (b) to arbitrate between the second resource request signals to select one of the second resource request signals corresponding to a selected one of the operating means; (c) to provide that one of the grant signals corresponding to the selected one of the operating means; (d) to wait until the first resource request signal for the selected one of the operating means ceases to request the resource; and (e) to set the control signal so that the latching means passes the first resource request signals to form the second resource request signals. Therefore, in accordance with the invention, an initial arbitration operation can commence immediately upon a first resource request signal arising, glitches are prevented due to the action of the latching means, which prevents any second resource request signal changing during an arbitration operation, and subsequent arbitration operations can commence immediately after a preceding request has been satisfied, as signalled by the appropriate first resource request signal.
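Steps (a) to (e) above can be followed in a small behavioural model. The Python sketch below is an assumption-laden illustration, not the implemented circuit: the class name, the three named states and the discrete `step` call are modelling conveniences (the described design is asynchronous), and arbitration in step (b) is assumed to be fixed priority with index 0 highest.

```python
class LatchedArbiter:
    """Behavioural sketch of a latch plus state machine arbiter."""

    IDLE, DECIDE, GRANT = "idle", "decide", "grant"

    def __init__(self, n):
        self.n = n
        self.state = self.IDLE
        self.latched = [False] * n   # the "second" request signals
        self.latch_closed = False    # control signal to the latch
        self.granted = None

    def step(self, requests):
        """Apply the current request lines, advance one transition, return grants."""
        if not self.latch_closed:
            self.latched = list(requests)            # latch is transparent
        if self.state == self.IDLE:
            if any(requests):
                self.latch_closed = True             # (a) freeze the requests
                self.state = self.DECIDE
        elif self.state == self.DECIDE:
            self.granted = self.latched.index(True)  # (b) arbitrate latched copy
            self.state = self.GRANT                  # (c) assert the grant
        elif self.state == self.GRANT:
            if not requests[self.granted]:           # (d) wait for release
                self.granted = None
                self.latch_closed = False            # (e) reopen the latch
                self.state = self.IDLE
        return [i == self.granted for i in range(self.n)]


arb = LatchedArbiter(3)
history = [
    arb.step([False, True, True]),   # requests arrive, latch closes
    arb.step([True, True, True]),    # late request 0 is ignored by the decision
    arb.step([True, False, True]),   # client 1 releases the resource
    arb.step([True, False, True]),   # next arbitration starts immediately
    arb.step([True, False, True]),   # client 0 now wins
]
```

Because the decision uses only the latched copy, a request that changes during arbitration cannot alter the outcome, which is the glitch protection the paragraph describes.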
In accordance with one aspect of the presented design, the arbitration system is operated asynchronously. Put in other words, once an arbitration operation has been completed, the system does not wait to be triggered by an external signal (such as a computer system clock) before commencing the next arbitration operation. In accordance with another aspect of the invention, there is provided a system having a common resource, a plurality of means each operable to provide a resource request signal and to communicate with the resource in response to a grant signal, and arbitration means which is responsive to the resource request signals and is operable repeatedly to determine which request to grant and provide a corresponding grant signal, characterised in that the arbitration means operates asynchronously with respect to any external signal. In accordance with a further aspect of the design, there is provided a system having a common resource, a plurality of means each operable to provide a resource request signal and to communicate with the resource in response to a grant signal, and arbitration means which is responsive to the resource request signals and is operable repeatedly to perform an arbitration operation to determine which request to grant and provide a corresponding grant signal, characterised in that the arbitration means is operable to commence a succeeding arbitration operation in response to one or more resource request signals immediately after completion of the preceding arbitration operation [5]. Thus, the systems according to the various aspects of the design have the advantage that the arbitration period is determined solely by the propagation delays of the system and is independent of any system clock frequency or the timing of the request signals relative to the system clock.
Preferably, the arbitration means includes a state machine, which may be operable to change state cyclically: from an initial state to a decision state in response to one or more request signals; from the decision state to one of a plurality of grant states determined in the decision state from the current request signals; and from the determined grant state back to the initial state once the determined request has been met. Conventionally, the state machine has a plural bit state variable which has different values in each of the states, and in order to prevent glitches due to the asynchronous operation of the state machine, the correlation between the bit values of the state variable and the states is preferably arranged such that upon a change from any state to a succeeding state only one of the bit values changes. Preferably, the bits of the state variable are allocated such that each grant signal can be determined from the value of a respective one of the bits of the state variable, thus obviating the need for further logic in order to produce the grant signals. With such a bit allocation of the state variable, it may be required that the state machine is operable to change from at least one of the states to at least one other of the states via a transient state.
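The single-bit-change requirement, and the reason a transient state may be needed, can be checked mechanically. The Python sketch below uses a hypothetical 3-bit encoding for a two-client arbiter (one decision bit plus one grant bit per client); the state names and codes are assumptions for illustration and are not taken from the design. Note that in this particular encoding the grant bit is still set in the transient state.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two state codes differ."""
    return bin(a ^ b).count("1")


def multi_bit_transitions(encoding, transitions):
    """Return the transitions that change more than one state bit."""
    return [(src, dst) for src, dst in transitions
            if hamming_distance(encoding[src], encoding[dst]) != 1]


# Hypothetical encoding: bit 2 = deciding, bit 1 = grant A, bit 0 = grant B.
ENCODING = {
    "initial":     0b000,
    "decision":    0b100,
    "grant_a":     0b110,
    "grant_b":     0b101,
    "transient_a": 0b010,
    "transient_b": 0b001,
}

# Direct return from a grant state to the initial state flips two bits.
DIRECT = [
    ("initial", "decision"),
    ("decision", "grant_a"), ("decision", "grant_b"),
    ("grant_a", "initial"), ("grant_b", "initial"),
]

# Routing the return path through a transient state flips one bit per hop.
WITH_TRANSIENT = [
    ("initial", "decision"),
    ("decision", "grant_a"), ("decision", "grant_b"),
    ("grant_a", "transient_a"), ("transient_a", "initial"),
    ("grant_b", "transient_b"), ("transient_b", "initial"),
]

unsafe_direct = multi_bit_transitions(ENCODING, DIRECT)
unsafe_with_transient = multi_bit_transitions(ENCODING, WITH_TRANSIENT)
```

The check flags only the two direct grant-to-initial transitions, and reports none once the transient states are inserted.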
In order to prevent glitches, the system preferably further comprises means to prevent a change in a request signal input to the state machine while the state machine is in the decision state. Such means may comprise a latch which is operable to pass the request signals when the state machine is in the initial state and which is operable to latch the request signals when the state machine is in the decision state. Conveniently, the state machine and the means to prevent a change are formed by a single programmable logic array. In order to determine when a grant signal has been acted upon, the state machine may be arranged to be responsive directly to the request signals and be operable to remain in a determined grant state until the respective request signal is removed. In one embodiment, the common resource is a bus of the computer system. In another embodiment, the common resource is a memory of the computer system. By adopting the arbitration system according to the invention, it is envisaged that it will be possible to use relatively inexpensive multi-ported DRAM memories, where otherwise it would be necessary to use more expensive SRAM memories [6].
In summary, the definitions of the states and the latch enable signal are as follows and are depicted in the diagram below.
Initially, by default, all the priorities and the latch enable signal are set to zero. The state is then reset to S0, and in the next cycle, if reset goes low and the clock is applied, the state is updated to S1.
State S0: The mux input select and the enable register are set to 1, and the mux request select is assigned 01, 00 or 10 for the grants 111, no grant and 000 respectively. The next state is then assigned S1.
State S1: When the grant is 111, the mux request select is 10 and the next state is Finish; for any grant other than 111, the next state is assigned S2.
State S2 (Memory address computation): When the count is 11, the mux request select is 10 and the next state is Finish; for any other count, the next state remains S2.
State Finish (Memory read access): Since this is the finish state, the finish signal is assigned 1, the mux request select is 10 and the next state is assigned Finish; in all other cases the state is also Finish.
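The state definitions above can be summarised as a next-state function. The Python sketch below is a hedged reading of the text rather than the VHDL of the design: the function name, the string representation of grants, the use of `None` both for "no grant" and for select values the text does not specify, and the integer count (3 standing for binary 11) are all assumptions of this illustration.

```python
def next_state(state, grant, count):
    """Return (next_state, mux_request_select, finish) for one transition.

    grant: "111", "000", another 3-bit string, or None for no grant
    count: integer counter value, where 3 corresponds to binary 11
    mux_request_select is None where the description leaves it unspecified.
    """
    if state == "S0":
        select = {"111": "01", None: "00", "000": "10"}.get(grant)
        return "S1", select, 0
    if state == "S1":
        if grant == "111":
            return "Finish", "10", 0
        return "S2", None, 0
    if state == "S2":
        if count == 3:
            return "Finish", "10", 0
        return "S2", None, 0
    # Finish state, and any other value, stays in Finish.
    return "Finish", "10", 1


# Walk one path through the machine for a grant other than 111.
path = [
    next_state("S0", "010", 0),
    next_state("S1", "010", 0),
    next_state("S2", "010", 2),
    next_state("S2", "010", 3),
    next_state("Finish", "010", 3),
]
```

How the counter advances in S2 is not described in the text, so the sketch takes the count as an input instead of modelling it.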
Figure 4: State diagram for different states and latch enable
Based on the methodology discussed, and taking a basic hardware multiplier as the shared resource, the design below represents an arbiter for six client requests. According to the state operations discussed above, the grants are given to the requesting clients based on the fixed priority scheme discussed in section 2. Figure 5 below depicts the hardware design of the arbiter.
Figure 5: Hardware design of the arbiter
The static fixed priority arbitration scheme implemented in the design is a scheduling policy where each master is assigned a fixed priority value. When more than one master requests simultaneously, the master with the highest priority is granted. The synthesized arbiter block for the design using the arbitration scheme discussed is shown in figure 6.
Figure 6: RTL Schematic of the arbiter design
IV. SIMULATION RESULTS
The design is further synthesized and simulated based on the different input values for shared resources. The state machine steps through the states S0, S1, S2 and Finish as defined in Section III.
Further, priorities are assigned to the inputs coming from six different clients. The client request which is assigned the highest priority is given the grant at the earliest. Later, the requests with the next higher priorities are processed and given grants without giving an external clock to each client for every request, thereby reducing the time delay. The simulated results for all client requests and grants are shown with priorities assigned in figure 8.

The logic utilization summarizing the above simulation is provided in the table below.

Selected device                  | Xilinx XC7V585T-1
Flip-flops                       | 668
Fully used LUT-FF pairs          | 9M
IOB utilization                  | or
Delay (ns)                       | 8.07
Total estimated power (in mW)    | 20678
Table 1: Logic utilization table of the design

Figure 8: Simulation results of the final arbiter design
CONCLUSION
The arbiter with fixed priority for a single shared resource is presented in the study. The simulation results provide not only the performance analysis for various request combinations, but also the logic utilization analysis, including timing analysis and power report under optimal conditions. The results obtained show that the framework of the arbiter can be used to explore the space of possible configurations to evaluate the performance tradeoffs.
REFERENCES
[1] H.-M. Lin, C.-C. Yen, C.-H. Shih and J.-Y. Jou, "On compliance test of on-chip bus for SOC", in Proceedings of the 2004 Conference on Asia South Pacific Design Automation, 2004.
[2] J. Noguera and R. M. Badia, "HW/SW codesign techniques for dynamically reconfigurable architectures", IEEE Trans. VLSI Syst., Vol. 10, 2002, pp. 398-415.
[3] E. S. Shin, V. J. Mooney III and G. F. Riley, "Round-robin Arbiter Design and Generation", Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-CC-02-38, 2002.
[4] W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann.
[5] Asynchronous arbiter state machine for arbitrating between operating devices requesting access to a shared resource, US 5179705, Osman Kent, Dupont Pixel Systems, Ltd.
[6] F. Poletti, D. Bertozzi, L. Benini and A. Bogliolo, "Performance Analysis of Arbitration Policies for SoC Communication Architectures", Design Automation for Embedded Systems, Numbers 2-3, 2003.
[7] Douglas L. Perry, VHDL: Programming by Example, Tata McGraw-Hill Publishing Company Limited.
[8] "Design and implementation of a reconfigurable arbiter", Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 2007.