
Instituto Nacional de Tecnología Industrial - INTI & Comisión Nacional de Energía Atómica - CNEA, Buenos Aires, Argentina

Circuits and Systems Society IEEE Catalog number CFP0854E-CDR ISBN 978-987-655-003-1

Technical inquiries: Editorial de la Universidad Nacional del Sur, EDIUNS, Av. Alem 925, Bahía Blanca, Argentina, Email: ediuns@uns.edu.ar, TE: +54-291-4595173 e.2092, FAX: 4562499

Proceedings of the School of MicroNanoelectronics, Technology and Applications 2008 / Actas de la Escuela Argentina de MicroNanoelectrónica, Tecnología y Aplicaciones 2008


Coordinated by:

Pedro Julián, Universidad Nacional del Sur
Andreas G. Andreou, The Johns Hopkins University

EdiUNS, Editorial de la Universidad Nacional del Sur
Av. Alem 925, Bahía Blanca, 8000, Argentina
July 2008


Actas de la Escuela Argentina de Micro-Nanoelectrónica, Tecnología y Aplicaciones 2008 / Proceedings of the Argentine School of Micro-Nanoelectronics, Technology and Applications. Compiled by Pedro Julián and Andreas G. Andreou. 1st ed. Bahía Blanca: Universidad Nacional del Sur - Ediuns, 2008. CD-ROM. ISBN 978-987-655-003-1. 1. Technology. I. Julián, Pedro, comp. II. Andreou, Andreas G., comp. CDD 664. IEEE Catalog number: CFP0854E

The legal deposit required by Law 11.723 has been made. Book published in Argentina. Partial or total reproduction, storage, rental, transmission or transformation of this book, in any format and by any means, whether electronic or mechanical, through photocopying, digitization or other methods, is not permitted without the prior written permission of the publisher and/or the author of the corresponding work. Infringement is punishable under Laws 11.723 and 25.446.


The third edition of the Escuela Argentina de Microelectrónica, Tecnología y Aplicaciones (EAMTA) has moved to Buenos Aires, to the venues of the National Commission of Atomic Energy (CNEA) and the National Institute of Industrial Technology (INTI). This year, twenty-nine (29) manuscripts have been accepted for publication. Contributions were received from all over the world, including Taiwan, China, India, Iran, Spain, Japan, USA, Brazil, Uruguay, Peru, Colombia, United Kingdom, Mexico, and, of course, Argentina. We especially want to thank all reviewers for their hard work in providing useful feedback to improve the papers. All papers received at least three reviews, and some received up to six. All papers will be presented in poster format to stimulate feedback and discussion from the attendees, as we experienced last year. This year, thanks to the quality of previous years' papers, and also to the quality of the Reviewers and Committee, the papers will appear in IEEE Xplore.

The Technical Program is also very rich. This edition of the School will be nine days long, with eight plenary talks and four tutorials delivered by experts from Brazil, Uruguay, United States and Taiwan. Three course tracks, two advanced and one basic, will cover material on digital and analog design, VLSI tools, semiconductor physics, low power design, filter design, MEMS, low noise design, and switched-capacitor circuits. In addition, there will be special laboratory sessions taking advantage of the clean rooms of INTI and CNEA, including a course on Microsystems and another one on PDMS basics. Industry Day will consist of another four lectures from members of industry.

The School activities will take place from the first day, Saturday, until Wednesday. Then, on Thursday and Friday, the Conference Section will take place with poster exhibitions and tutorials, plus Industry Day. The School will resume on Saturday for the two final days, dedicated exclusively to letting the students finish the design of their first integrated circuit.

As is already known, one of the targets of EAMTA is to help establish a technological platform in the country, and in doing this, we recognize the central role of students. Therefore, continuing with the tradition started in the first EAMTA, sixty (60) travel and lodging grants were provided to students from different provinces of Argentina, and also to students from Brazil, Peru, USA and Uruguay. We want to thank all sponsors for making this possible. We hope you enjoy the event and the city, and wish you a pleasant stay.

Sponsors


Local Organization

Adriana Vallese, Alberto Lamagna, Alejandra Massacane, Alex Lozano, Alfredo Boselli, Anahí Weinstock, Betiana Lerner, Carlos Rinaldi, Daniel Lupi, Daniel Rodríguez, Diego Schmidt, Eliana Mangano, Fabiana Barrera, Federico Ibáñez, Francisco Nespras, Gabriel Carbonara, Gabriel Molinaro, Gabriel Redelico, Gustavo Estévez, Gustavo Giménez, Gustavo Merletti, Jorge Quiroga, Juan Bonaparte, Juan Ortiz, Karina Pierpauli, Laura Malatto, Mariano Roberti, Maximiliano Fischer, Maximiliano Perez, Mercedes Malvasio, Natalia Vega, Norberto Boggio, Omar Milano, Pablo Gurman, Paola Colombo, Salvador Ortiz, Sandra Romero, Silvia Moncaglieri, Valeria Muñoz


Conference Organization

Programme Chairs

Andreas Andreou, Pedro Julian

Programme Committee

Oscar Agazzi Martin Alurralde Alfredo Arnaud Sergio Bampi Jennifer Blain Christen Claudio Busada Gert Cauwenberghs Ricardo Cayssials Alfonso Chacon Rodriguez Liang Gee Chen Hector Chiacchiarini Juan Cousseau Eugenio Cullurciello Alejandro De La Plaza Jader De Lima Tobias Dellbruck Carlos Dualibe Adrian Faigon Maximiliano Fischer Liliana Fraigi Antonio Garcia Rozo Carlos Gayoso Gregorio Oscar Glas Fernando Gregorio Victor Grimblatt Guillermo Guichal Luis Hernandez Mario Hueda Ting Ting Hwang Victor Jimenez Marcelo Johann Jing Yang Jou Morris Ker Walter Lancioni Sing Ling Lee


Ching Ting Lee Chin-Teng Lin Brian Liu Alex Lozano Daniel Lupi Laura Malatto Pablo Mandolesi Carlos Marques Franco Noel Martin Pirchio Favio Masson Venkata Rakesh Mekala Carlos Muravchik Roberto Murphy Alejandro Oliva Beatriz Olleta Rogelio Palomera Felix Palumbo Eduardo Paolini Hernan Pastoriza Marcelo Pavanello Pablo Petrashin Adrian Quijano Ricardo Reis Murilo Romero Hernan Romero Conrado Rossi Arturo Sarmiento Marcio Schneider Carlos Silva Cardenas Fernando Silveira Milutin Stanacevic Gustavo Sutter Luis Toledo


Table of Contents

Extraction Parameters Method to get a Dual Gate MOSFET Macromodel
    Julio Zola, Juan Kelly, Gregorio Oscar Glas

SoC Prototyping Environment for Electromagnetic Immunity Measurements
    Fabian Vargas

Low Dimensionality Electronic Devices Based on Heterodimensional Schottky Contacts: Modeling and Experimental Results
    Regiane Ragi, Murilo Romero, Bahram Nabet

Theoretical Analysis of Power Clock Generator based on the Switched Capacitor Regulator for Adiabatic CMOS Logic
    Yasuhiro Takahashi, Toshikazu Sekine, Michio Yokoyama

Temperature and interface traps compensation in MOS Bias Controlled Cycled dosimeters
    Jose Lipovetzky, Mariano Garcia Inza, Sebastian Carbonetto, Eduardo Gabriel Redin, Adrian Faigon

Constraint-based test-scheduling of embedded microprocessors
    Nikolaos Bartzoudis, Vasileios Tantsios, Klaus McDonald-Maier

Improved hardware implementation of complementary sequences generator and correlator
    Marcos Funes, Patricio G. Donato, Matias Hadad, Daniel Carrica

Correction Algorithm for the Proximity Effect in e-beam Lithography
    Juan Jose Zarate, Hernan Pastoriza

Computer Assisted Design of a CMOS-Compatible Inductor for RF
    Jesus Garcia-Guzman, Antonio Salgado-Uscanga, Fayne Meza-Martinez, Carlos Alberto Gomez-Pecero

Evaluation of Gunshot Detection Algorithms
    Alfonso Chacon, Pedro Julian

VLSI Microprocessor Architecture for a Simplicial PWL Function Evaluation Core
    Juan Agustin Rodriguez, Victor Manuel Jimenez-Fernandez, Pedro Julian, Osvaldo Agamennoni, Omar Lifschitz

CDM ESD Protection in CMOS Integrated Circuits
    Ming-Dou Ker, Yuan-Wen Hsiao

A 76db-ohm, 2 mW, 10Gbps optical receiver analog front end in 80nm CMOS
    Mustansir Mukadam, Alyssa Apsel

On the Analysis of Switched Continuous Time Filters
    Matias Miguez, Alfredo Arnaud

A Fast Acquisition Phase Frequency Detector for Phase-Locked Loops
    Zhongtao Fu, Xiao Wang, Eugene Minh, Alyssa Apsel

Human Identification Experiments Using Acoustic Micro-Doppler Signatures
    Zhaonian Zhang, Andreas Andreou

Impulse Radio Address Event Interconnects for Body Area Networks and Neural Prostheses
    Andrew Cassidy, Zhaonian Zhang, Andreas Andreou

Fabrication Process Design for Complementary Metal-Cytop-Organic-Semiconductor Integrated Circuits
    Edward Choi, Recep Ozgun, Bal Dhar, Howard Katz, Andreas Andreou

Sigma Delta Based Receiver
    Hector Kirschenbaum, Alejandro De La Plaza

Remote logic analyzer implemented on FPGA
    Luisa Fernanda Garcia Vargas, Henry Leonardo Moreno Diaz, Alejandra Gonzalez Correal, Guillermo Jaquenod

Parallel Architecture for Decoding LDPC Codes on High Speed Communication Systems
    Damian Morero, Graciela Corral-Briones, Mario Hueda

Brent-Kung fast adder description, simulation and formal verification using Lava
    Leandro Marso

High Value Resistance for Neural Signals Acquisition System using OTAs topologies
    Juan Pablo Zeballos Raczy, Omar Olguin Amado, Cesar Vasquez Vargas

Common Gate LNA Design Space Exploration in All Inversion Regions
    Rafaella Fiorelli, Fernando Silveira

Single post load cell in LTCC
    Mariano Roberti, Liliana Fraigi, Mario Ricardo Gongora-Rubio

RFID Front-End in 0.5um Standard CMOS Process: Experimental Results
    Gustavo San Martin, Pedro Julian, Pablo Mandolesi

CNN Digital Pixel Processor Cells for Automated Design: Experimental results
    Martin Di Federico, Pedro Julian, Pablo Mandolesi

A 6 Watt Linear RF Power Amplifier for WiMAX Base Station
    Marcelo Bruno, Lucas Citta, Juan Cousseau

Close Range Bearing Estimation and Tracking of Slow Moving Vehicles Using the Microphone Arrays in the Hopkins Acoustic Surveillance Unit
    Zhaonian Zhang, Andreas Andreou


Extraction Parameters Method to get a Dual Gate MOSFET Macromodel

Julio Guillermo Zola, Juan Miguel Kelly, Gregorio Oscar Glas

Electronic Circuits Laboratory, Electronic Department, Faculty of Engineering, University of Buenos Aires, Argentina. Email: {jzola, jkelly, gglas}@fi.uba.ar

Abstract: A simple extraction parameters method (EPM), based on datasheet values, for building a dual gate MOSFET (DGM) macromodel to be used in PSpice is explained. The model obtained for each DGM under test approximately fits its characteristic curves and real parameter values. The extracted values can be validated or improved either by calibration measurements or by the responses of simulation models. Finally, a comparison between the proposed macromodel, current models and real responses is shown.

I. INTRODUCTION

DGMs are used in several applications, e.g. high-frequency amplifiers and mixers. Current DGM models have a large number of parameters to be set. Finding the complete set of parameters for a new device that has not been modelled yet is known to be a complex issue [1,2]. Therefore, a simple method to extract basic parameters and build a simple macromodel for a new device is developed. The extracted values can be validated or improved either by calibration measurements or by the responses of simulation models. The proposed DGM macromodel is shown in Fig. 1 [3,4]. The EPM gets the following values from datasheets:

Static parameters: VT, k and λ of T1 and T2, RS and RD (from the ID_VG1S, ID_VDS and ID_VG2S characteristics).
Dynamic parameters: Ls, Cg1, Cg2, Cf, Co1 and Co2 (from values of capacitances and admittance parameters).
Limiting parameters: IS and Vzener (from input breakdown voltages and leakage input current).

The proposed macromodel is initially intended to work at TA = 25 °C, since most datasheet values are specified for that ambient condition. The DGM BF966 will be used to describe the EPM. Then the EPM will be applied to other devices.

II. EPM DESCRIPTION

A. Static parameters

Fig. 2 shows the ID_VG1S characteristics. For low values of VG1S and high values of VG2S, they have approximately quadratic behavior. According to Fig. 1b, for these levels of gate-source voltage T1 will be in the saturation region, and therefore ID = f(VG1S²). Under these conditions, VT1 (for ID ≈ 0, VG1S = VT1) and k1 can be obtained.
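As an illustration of the static-parameter step, VT1 and k1 can be estimated from a few points read off the quadratic part of the ID_VG1S curve. The sketch below is not the authors' procedure: it assumes a plain square law, ID = k1 (VG1S - VT1)², with the λ1 term neglected, and fits a straight line to sqrt(ID) versus VG1S. The function name and the data points are hypothetical; the points are generated from VT1 = 0.5 V and k1 = 20 mA/V² purely for demonstration.

```python
import math

def fit_square_law(vg1s, i_d):
    """Fit sqrt(ID) = sqrt(k1) * (VG1S - VT1) by least squares.

    Assumes the square-law saturation model ID = k1 * (VG1S - VT1)**2
    (channel-length modulation neglected). Returns (VT1, k1).
    """
    y = [math.sqrt(i) for i in i_d]
    n = len(vg1s)
    mean_x = sum(vg1s) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in vg1s)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(vg1s, y))
    slope = sxy / sxx                    # equals sqrt(k1)
    intercept = mean_y - slope * mean_x  # equals -sqrt(k1) * VT1
    return -intercept / slope, slope ** 2

# Hypothetical points (volts, amperes), not taken from any datasheet.
vg1s_points = [0.6, 0.8, 1.0, 1.2]
id_points = [0.020 * (v - 0.5) ** 2 for v in vg1s_points]

vt1, k1 = fit_square_law(vg1s_points, id_points)
print(f"VT1 = {vt1:.3f} V, k1 = {k1 * 1e3:.1f} mA/V^2")
```

Fitting sqrt(ID) rather than ID keeps the problem linear, so no iterative solver is needed. With real datasheet curves, only points in the quadratic region (low VG1S, high VG2S) should be used.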

For VG1S = VG1Sa << VG2S, T1 will be in the saturation region but near the triode region, VDS(T1) ≈ VG1Sa - VT1. Under this assumption and taking ID = IDa, the value of k1 is:

(1)

Also, for the gate-source voltage levels analyzed, with constant VG1S, ID depends slightly on VG2S (see Fig. 2b). Thus, the value of λ1 can be obtained. For VG2S = VG2Sb >> VG1S = VG1Sa, ID = IDb, it can be assumed that VDS(T1) ≈ VG2Sb. Thus:

(2)

Figure 2. (a) Drain current vs. Gate 1 - Source voltage; (b) Zoom in.

Fig. 3 shows the ID_VDS and ID_VG2S characteristics. For ID ≈ 0, VG2S = VT2. For low values of VG2S and high values of VG1S (T1 in the triode region and T2 in the saturation region), the relationship between ID and VG2S is almost linear. This means that T2 has strong feedback from the equivalent T2 source resistance Rsource, and therefore the ID_VG2S characteristic will have a slope equal to Rsource^-1.

Figure 3. (a) Drain current vs. Drain - Source voltage; (b) Drain current vs. Gate 2 - Source voltage.

It can be observed in Fig. 3b that:

From Fig. 1b, Rsource is the series combination of RS and the T1 drain-source equivalent resistance in the triode region. Since, in this case, VG2S << VG1S = VG1Sc, VDS(T1) ≈ 0. Therefore, the T1 drain-source equivalent resistance in the triode region is approximately gm1^-1. The value of RS will be:

The same result is obtained by analyzing the ID_VG1S characteristic in Fig. 2, for VG1S = VG1Sc >> VG2S. It can be observed that ΔID/ΔVG2S is approximately constant for VG1S = VG1Sc and constant VG2S.

Thus, for a particular value of ID and VT1 ≈ VT2 (acceptable in the DGM), it follows that k2 >> k1, according to the analyzed characteristics:

For high values of VG2S, ID = f(VG1S²).
For low values of VG2S, ID = f(Rsource).

Thus, k2 can be approximated by choosing a value higher than k1. For example, k2 ≈ 10 k1 is a good relationship. Finally, analyzing the ID_VDS characteristic of Fig. 3a, with T1 and T2 in the triode region, for low VDS (VDS ≈ 0) and high VG1S = VG1Sd, the characteristic is approximately linear with a slope mDS:

(4)

Since k2 >> k1, (5) results:

(5)

The value of RD can be null, depending on the device characteristics. Since the values obtained are approximate, if RD << RS + gm1^-1, it is assumed that RD = 0.

B. Dynamic parameters

The dynamic parameters are extracted with T1 and T2 in the saturation region and ID = IDQ. The values of Cg1, Cg2 and Cf are obtained from typical datasheet values, since the capacitances in the proposed macromodel remain constant under any voltage variation. Fig. 4 shows the Im(Y11)_Re(Y11) characteristic. This is one of several ways to show the short-circuit input admittance Y11 = ig1/vg1s |vd=vg2=0. From Fig. 1b, (6) results:

(6)

Figure 4. Short circuit input admittance (Y11)

where Zs is the equivalent impedance between the source of T1 and the grounded source of the DGM. Since Im(Y11) > 0 and Re(Y11) > 0, Zs must be inductive. Thus, if it is assumed that, for any useful frequency, the inductive impedance is lower than any capacitive impedance connected to the source of T1 (this assumption is verified for all DGMs analyzed), solving (6):

$L_s \approx \mathrm{Re}(Y_{11}) \left[ \omega^2 C_{g1} \cdot 2\sqrt{k_1 I_{DQ}} \right]^{-1}$    (7)

where the value of Re(Y11) is obtained from Fig. 4 for a chosen frequency and drain current. Since the capacitances are specified at 1 MHz, at this frequency Zs ≈ 0 and the output capacitance will be:

$C_o \approx C_{o1} + C_{o2}$    (8)

The components of Co, Co1 and Co2, are useful because:

If Co1 = 0, then Y12 = 0 (neglecting Cf in the macromodel) or Im(Y12) < 0 for any frequency.
If Co1 = 0, then Re(Y22) = 0 (neglecting RD in the macromodel).
If Co2 = 0, all parameters are fixed and it may not be possible to adjust the values of Y12 and Y22.

Therefore, Co1 can first be obtained from Y12 and then Co2 from (8).

Sometimes, datasheets do not specify all admittances. However, based on simulation results, the following can be assumed:

(9)

C. Limiting parameters

As the breakdown gate-source voltage and the leakage current are known, the breakdown voltage, VZ, and the saturation current of the diodes, IS, are respectively obtained by:

(10)
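The dynamic-parameter step built on (7) and (8) can be checked numerically. The sketch below is only an illustration under stated assumptions: it reads (7) as Ls ≈ Re(Y11) / (ω² Cg1 gm1), with gm1 = 2·sqrt(k1·IDQ) for a square-law device in saturation, and it uses (8) to split the output capacitance. The function names, the 200 MHz frequency and the operating point are hypothetical and are not datasheet values.

```python
import math

def gm_saturation(k, i_d):
    """Transconductance of a square-law device, ID = k * (VGS - VT)**2 (assumed model)."""
    return 2.0 * math.sqrt(k * i_d)

def source_inductance(re_y11, freq_hz, cg1, k1, idq):
    """Ls from Re(Y11), assuming Re(Y11) ~ w^2 * Cg1 * gm1 * Ls (reading of eq. 7)."""
    omega = 2.0 * math.pi * freq_hz
    return re_y11 / (omega ** 2 * cg1 * gm_saturation(k1, idq))

def co2_from_total(co, co1):
    """Co2 from eq. (8): Co ~ Co1 + Co2."""
    return co - co1

# Hypothetical operating point, for illustration only.
k1 = 0.020       # A/V^2
idq = 0.010      # A
cg1 = 2.1e-12    # F
freq = 200e6     # Hz

# Build a consistent Re(Y11) from a known Ls, then recover Ls (round trip).
ls_assumed = 1e-9  # H
omega = 2.0 * math.pi * freq
re_y11 = omega ** 2 * cg1 * gm_saturation(k1, idq) * ls_assumed

ls = source_inductance(re_y11, freq, cg1, k1, idq)
print(f"Re(Y11) = {re_y11 * 1e3:.3f} mS -> Ls = {ls * 1e9:.2f} nH")
print(f"Co2 = {co2_from_total(1.0e-12, 0.2e-12) * 1e12:.1f} pF")
```

In practice Re(Y11) would be read from the datasheet admittance plot at the chosen frequency and drain current, and Co1 would be fixed first from Y12, as described in the text.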

III.

In this section, the characteristics of some DGMs and the macromodel responses are compared. The devices under test are the BF966, BF998 and BF1211 [6,7]. Table I shows the parameter values obtained for these devices. The comparison between typical characteristics and the macromodel response is shown in Figs. 5, 6 and 7.

TABLE I. MACROMODEL PARAMETERS
DGM under test: BF966, BF998, BF1211
Parameters: VT1 VT2 k1 0.1 k2 1 RS RD Ls Cg1 Cg2 Cf Co1 0.2 Co2 VZ IS
0.5 V 20 mA/V2 0.1 V-1 50 0 1 nH 2.1 pF 1.1 pF 15 fF 0.2 pF 8 V 30 nA

Figure 5. ID vs. VG1S: BF1211 (continuous line) and Macromodel (dashed line)

Figure 6. ID vs. VDS: BF1211 (continuous line) and Macromodel (dashed line)

Figure 7. Y11: BF966 (continuous line) and Macromodel (dashed line)

IV. CURRENT MODELS

The comparison between the real characteristics (device BF998), the proposed macromodel and the PSpice Intusoft current model is shown in Figs. 8 and 9 (Y21 and Y12 parameters). The response of both models approximately fits the real values.

The number of parameters needed to create the macromodel netlist (obtained by means of the EPM) and the netlist of the current model are shown in Table II.

Figure 8. Y21: BF966 (continuous line), Intusoft model (dashed line) and Macromodel (dashed line), magnitude and phase.

Figure 9. Y12: BF966 (continuous line), Intusoft model (dashed line) and Macromodel (dashed line), magnitude and phase.

TABLE II. NETLISTS COMPARISON

V. CONCLUSIONS

The EPM is a simple and quick method to create a DGM macromodel. The response of this macromodel fits the real response and the response of more complex current models. The basic datasheet values needed to build the macromodel are:

ID_VG1S characteristic.
ID_VDS characteristic.
Y11_Frequency characteristic.
Input, output and feedback capacitances.
Maximum voltage and leakage current.

By means of the EPM, reference values are obtained, and such values can be improved using extra simulations and measurements.

REFERENCES

[1] Barsan, "Analysis and Modeling of Dual-Gate MOSFET's," IEEE Transactions on Electron Devices, Vol. ED-28, no. 5, pp. 523-534, May 1981.
[2] Malobabic, Ortiz-Conde, García Sánchez, "Modeling the Undoped-Body Symmetric Dual-Gate MOSFET," Proceedings of the Fifth IEEE International Caracas Conference on Devices, Circuits and Systems, Dominican Republic, Nov. 3-5, 2004, pp. 19-25.
[3] Kung-Hao Liang, Yi-Jen Chan, "A 0.18 um Dual-Gate CMOS Model for the Design of 2.4 GHz Low Noise Amplifier," Radio Frequency Integrated Circuits (RFIC) Symposium, 2006 IEEE, pp. 313-316.
[4] Intusoft Newsletter, "SpiceMod Helps Model Dual-Gate Mosfets," August 1991, pp. 23_6-22_8.
[5] Vishay-Telefunken, BF966 Datasheets, Rev. 3 - 20 Jan 1999.
[6] Philips, BF1211 Datasheets, December 2003.
[7] Philips, BF998 Datasheets, August 1996.

SoC Prototyping Environment for Electromagnetic Immunity Measurements

F. Vargas*, J. Benfica*, L. Piccoli*, M. Moraes*, E. Gatti**, L. Garcia**, D. Lupi**, F. Hernandez***

* Electrical Engineering Dept., Catholic University (PUCRS), Porto Alegre, Brazil.
** INTI, Buenos Aires, Argentina.
*** Universidad ORT / URSEC, Montevideo, Uruguay.

vargas@computer.org, lupi@inti.gov.ar, fhernandez@uni.ort.edu.uy

Abstract: We present a configurable standard environment for electromagnetic (EM) immunity measurement of prototype systems-on-chip (SoC). The environment is composed of two boards compliant with Parts 62.132-2 and 62.132-4 of the IEC standard, conceived for radiated and conducted measurements, respectively. The SoC under test can be prototyped on two types of ICs: two FPGAs and a microcontroller. Practical experiments have been carried out. The results obtained demonstrate the utility and benefits of using the proposed platform to estimate, at an early stage of the design process, the behavior of embedded systems operating in an EM environment.

I. INTRODUCTION

The roadmap for standardization of immunity measurement methods has reached a high degree of success with the IEC 62.132 proposal [1]. Recently (2006), some extensions have been proposed through research publications, which aim at extending the Bulk Current Injection Method and the Direct Power Injection Method to 10 GHz [2]. At the same time, technology scaling offers the possibility to design more complex integrated circuits (ICs) [3], with tens of millions of transistors placed and routed between more than one thousand I/O pins. The supply voltage is continuously decreasing, reaching less than 1 volt for the IC core and less than 2 volts for the periphery and I/O pads. This scenario reduces noise margins and increases circuit susceptibility to external electromagnetic (EM) waves [4,5]. There has been an increased demand for EMC models applicable to integrated circuits and for hardware/software-based prototyping vehicles, in order to conduct compatibility analysis early in the system-on-chip (SoC) design process.

It is at this point that we introduce our work. We propose a configurable platform for measuring the EM susceptibility of SoCs prototyped during the design phase. Depending on the designer's interest and the target application, the prototype immunity can be measured with respect to the hardware and/or the software parts of the SoC. To the best of our knowledge, this is the first time that this kind of platform has been reported.

The remainder of this paper is organized as follows: Section 2 presents the proposed platform. Section 3 describes the case study and the practical experiment that have been carried out to demonstrate the utility and benefits of using the proposed platform. Finally, Section 4 summarizes the main points of this work.

II. PROPOSED PLATFORM

The proposed environment is composed of two boards for radiated and conducted electromagnetic immunity measurements. With this infrastructure, multiple embedded microprocessors like MicroBlaze¹ and PowerPC 601 running uCLinux or uCOS-II² [6,7] can be prototyped. In addition to the hardware parts, several implementations of VHDL-described embedded intellectual property (IP) cores and C-code programs can also have their immunity response measured and compared to each other in order to leverage the final dependability level of the SoC under design.

Figure 1a presents a photograph of the first board (Board I), designed and fabricated according to the IEC 62.132-2 standard for radiated electromagnetic (EM) immunity measurement. The test side of this board is shown in Fig. 1a; it contains the IC under test (Xilinx FPGA, Spartan 300E) and also the board ground layer. Fig. 1b shows the other side of the board, which contains the remaining logic (SRAM memories, clock generator and voltage regulators, among other components). This side also carries the VDD distribution network for the system. The two inner layers of the board are used for signal propagation. A ground ring can also be observed around the board; it ties the on-board system ground and the TEM cell ground to a single reference. Fig. 2 presents the basic blocks composing Board I.

Figure 1. Board I: 10x10 cm² IEC 62.132-2 std compliant board comprised of four layers: Gnd (top) / signal / signal / Vdd (bottom). (a) Top view; (b) Bottom view.

The second board contains two Xilinx Spartan 500E FPGAs, a Texas 8051-like microcontroller, 16 MBytes of SDRAM, and 8 MBytes of serial Flash memory, among other glue logic required for communication with the test host computer (see Fig. 3 for details). In this figure, side (a) contains the components under test, i.e., the parts on which EM measurements can be performed, whereas side (b) contains the remainder of the logic (processor bus, memories, crystals, connectors and external environment communication-support ICs, among other devices). Fig. 4 depicts the shielding box for radiated testing. The remaining logic of the board is protected inside the box, while the devices under test (FPGAs and microcontroller) are placed externally, to be exposed to EM fields. Fig. 5 presents the basic blocks of Board II.

Figure 4. Shielding box for radiated test: (a) General view; (b) Inside the GTEM Cell.

¹ MicroBlaze is a true 32-bit soft RISC processor optimized for use in Xilinx's FPGA architectures. The processor's main memory interface conforms to the IBM CoreConnect specification for the On-Chip Peripheral Bus (OPB).
² MicroC/OS-II has been certified to RTCA DO-178B Level A for use in avionics systems where failure could result in catastrophic loss of the aircraft, and approved for use in FDA Class III medical devices where failure could result in loss of life for the patient or clinician.


III. EXPERIMENTAL RESULTS

This section presents the case study and the practical experiment that have been carried out to demonstrate the utility and benefits of using the proposed platform.

A. Case-Study

We conducted an experiment aimed at analyzing the radiated electromagnetic sensitivity of a watch-dog processor intellectual property (WDP-IP) core [8,9] designed to monitor the Xilinx MicroBlaze soft core processor running under the control of the uCOS-II operating system. This system, referred to as the Test Vehicle, was prototyped in Board I (Fig. 6a presents a general view of this system, whereas Fig. 6b depicts details of the WDP-IP core basic blocks). The whole SoC was described in VHDL (VHSIC Hardware Description Language).


Figure 3. Board II: IEC 62.132-4 std compliant board comprised of 6 layers for conducted immunity measurement. Views: (a) Top; (b) Bottom.

Figure 6. SoC prototyped in Board I: (a) General architecture; (b) WDP-IP basic blocks (Counter, CAM Memory, Control Logic, Bus Interface Logic, Comparison Logic); (c) CAM memory architecture (fields: Process ID, Idle, Counter, Slower, Supper, Parity).

current counted number of clock cycles is within the clock-cycle range [Slower, Supper] estimated for a given task. 5) CAM Memory: The memory fields shown in Fig. 6c are interpreted as follows: the Process ID field (4 bits) identifies the existing system tasks; the Idle field (1 bit) indicates whether the time that a task has been waiting to be executed by the processor is under a predetermined value; the Counter field (32 bits) holds the number of clock cycles accumulated by the WDP-IP up to a given moment; the Slower and Supper fields (32 bits each) store the minimum and the maximum number of clock cycles, computed by system simulation, that the processor needs to complete the execution of a given task; finally, the Parity field contains the parity bit for the whole line of the CAM memory. This bit is used by the WDP-IP to run a sanity check when requested by the processor. To access the WDP-IP, a dedicated driver was written in ANSI C and compiled with the OS kernel. By means of this driver, the processor informs the WDP-IP about the beginning and completion of user tasks. In the other direction, the WDP-IP uses this driver to signal a system failure to the processor or to periodically indicate its own health status. The driver contains two functions. The first one, ip_cmd, is used by the Xilinx MicroBlaze to write instructions into R-ONE and data into R-TWO of the WDP-IP, or to read data stored in R-TWO by the WDP-IP. The second function, ip_sw, is used by the processor to tell the WDP-IP to switch from one task to another: the WDP-IP saves the context of the first task in its CAM Memory and recovers the context of the second one from the same memory. Figure 7 depicts the basic communication sequence between the processor and the WDP-IP during a multi-task execution.
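The counter bookkeeping described here can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' VHDL or driver code: the class name WatchdogModel, its methods start, tick and finish, the cycle bounds in the usage lines, and the choice of even parity are all hypothetical; only the ip_sw name and the save/reload behaviour come from the text.

```python
class WatchdogModel:
    """Toy model of the WDP-IP counter and CAM bookkeeping (illustrative only)."""

    def __init__(self, bounds):
        # bounds: {task_id: (s_lower, s_upper)} in clock cycles (hypothetical values)
        self.cam = {tid: {"counter": 0, "lower": lo, "upper": hi}
                    for tid, (lo, hi) in bounds.items()}
        self.current = None  # task currently in foreground
        self.counter = 0     # live cycle counter

    def start(self, task_id):
        """Models the 'reset the counter and start counting' use of ip_cmd."""
        self.current = task_id
        self.counter = 0

    def tick(self, cycles):
        """Advance the live counter by a number of clock cycles."""
        self.counter += cycles

    def ip_sw(self, next_task):
        """Save the outgoing task's count in the CAM, reload the incoming task's count."""
        if self.current is not None:
            self.cam[self.current]["counter"] = self.counter
        self.current = next_task
        self.counter = self.cam[next_task]["counter"]

    def finish(self):
        """Return True if the finished task's total count lies in [lower, upper]."""
        entry = self.cam[self.current]
        entry["counter"] = self.counter
        return entry["lower"] <= self.counter <= entry["upper"]


def parity_bit(fields):
    """Even-parity bit over the integer fields of one CAM line (parity type assumed)."""
    ones = sum(bin(f).count("1") for f in fields)
    return ones % 2


wdp = WatchdogModel({1: (100, 200), 2: (50, 80)})
wdp.start(1)
wdp.tick(60)              # task 1 runs for 60 cycles
wdp.ip_sw(2)
wdp.tick(70)              # task 2 runs to completion
task2_ok = wdp.finish()
wdp.ip_sw(1)
wdp.tick(90)              # task 1 resumes from its saved count of 60
task1_ok = wdp.finish()
```

In this model a task whose accumulated count falls outside its [lower, upper] window would make finish() return False, which corresponds to the error indication raised by the Comparison Logic.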
When the Xilinx MicroBlaze starts running a task, it signals the WDP-IP (Fig. 7: command ip_cmd) to reset the counter and then start counting, from zero, the number of clock cycles needed by the processor to execute that task (in Fig. 7, this command is used once, the first time tasks #1, #2 and #3 are executed). When the OS switches context, moving from one task to another, the Xilinx MicroBlaze signals the WDP-IP (Fig. 7: ip_sw) to perform the following actions: (a) save the current counter value of the leaving task (the one going to background) into the CAM Memory; (b) reload the counter with the partial value stored in the CAM Memory for the next task in the OS waiting list (the one coming to foreground) and increment the counter from this value on, until the task is switched back to the wait state (background). This process is repeated as many times as the task is run, until its complete retirement by the processor. Note that the communication between the processor and the WDP-IP is handled by the OS in supervisor mode. In addition to allowing application programs to be compiled as they are, i.e., with no modifications, this condition also increases system reliability

A brief description of the WDP-IP basic blocks, as depicted in Fig. 6b, follows: 1) Bus Interface Logic: This block is composed of two 16-bit registers, namely R-ONE and R-TWO. R-ONE is used by the processor to write a command to be executed by the WDP-IP (e.g., reset the whole CAM contents, reset only the Counter column of the CAM, perform a ping in the WDP-IP), whereas R-TWO is used to write data to the WDP-IP or to read data requested by the processor from the WDP-IP. 2) Control Logic: The Control Logic is a very simple combinational circuit used to decode the commands received from the AP through R-ONE and to write/read data into/from R-TWO. This block is also responsible for managing the task scheduling process inside the WDP-IP by loading/resetting the 32-bit counter of the Counter block and by interrupting the AP when a system error is detected. Another role of this block is to periodically reset the whole Idle column in the CAM Memory (Fig. 6c). The period with which this column is reset is defined by the maximum number of clock cycles that the processor is allowed to execute before returning control to run another slice of the same task. 3) Counter: This block is a 32-bit counter with reset and preset commands, used to count the number of clock cycles required by the processor to run a given task. The preset command is used to load the counter with the Counter field of the CAM Memory before continuing the count operation after a context switch (task switch) controlled by the OS on a time-shared basis. 4) Comparison Logic: This block is basically a full adder used by the Control Logic to determine whether the

ISBN 978-987-655-003-1 EAMTA 2008


The Interface Board shown in Fig. 8b and 8c handles communication between the Test Vehicle and the external computer (test host). The Interface Board is responsible, for instance, for the RS232 serial and JTAG communications between the test engineer and the Test Vehicle during the measurement procedure. It is worth noting that we also implemented a second version of the WDP-IP. This version provides the same functionality as the hardware WDP-IP, but it was implemented purely in software (ANSI C) and compiled with the kernel of the uCOS-II OS. Additionally, the fault detection capability of the proposed I-IP was compared against the native fault detection structures of the uCOS-II kernel. In summary, test measurements were carried out on three different system configurations: (a) microprocessor + WDP-IP in hardware (VHDL); (b) microprocessor + WDP-IP in software (C); and (c) microprocessor + uCOS-II OS native fault detection structures (original uCOS-II OS kernel). Fig. 9 summarizes the measurements for this experiment. Figure 10 presents the occurrence of faults as a function of the modulated EM signal frequency (Fig. 10a) and of the EM field incident on the board under test (Fig. 10b). For instance, in the frequency range of 100-200 MHz (Fig. 10a), the system under test presented 265 faults: 174 (65.7%) were detected by the WDP-IP in hardware, 88 (33.2%) were detected by the WDP-IP in software, and 3 (1.1%) were detected by the native fault detection structures of the uCOS-II kernel. Table I summarizes the main characteristics of the WDP-IP in hardware. The area overhead is computed with respect to the area required to lay down the MicroBlaze processor.
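As a quick arithmetic check of the shares quoted for the 100-200 MHz range, the percentages can be recomputed from the reported fault counts. The function name detection_shares is hypothetical; the three counts are the ones stated in the text.

```python
def detection_shares(counts):
    """Return each detector's share of the total fault count, in percent (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Fault counts reported for the 100-200 MHz range (Fig. 10a).
counts_100_200_mhz = {"WDP-IP (HW)": 174, "WDP-IP (SW)": 88, "uCOS-II": 3}
shares = detection_shares(counts_100_200_mhz)
total_faults = sum(counts_100_200_mhz.values())
```

The three counts add up to the 265 faults reported, and the rounded shares match the 65.7%, 33.2% and 1.1% given above.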

TABLE I. OVERHEADS MEASURED FOR THE WDP I-IP.
Area: 11.90% (Configurable Logic Blocks)
Memory (Bytes): 0.77% (OS-Kernel Driver for the WDP)


Figure 7. Basic commands and communication sequence between the Xilinx MicroBlaze and the WDP-IP during normal system operation.

since any communication with the WDP-IP has the priority and the security of the OS rather than those of the application programs.

B. Practical Experiment

To perform the experiment, we implemented three user tasks running on the processor on a time-shared basis: a random prime number generator (PNG), a bubble sort to reorder a matrix (BS), and a digital filter (DT). This experiment was based on the International Standard IEC 62.132, Part 2: Measurement of Radiated Immunity, TEM Cell Method. Figure 8 depicts the TEM Cell and the test setup at the Instituto Nacional de Tecnologia Industrial (INTI), Buenos Aires, where the experiment was conducted. To limit the complexity of the test procedure, we arbitrarily decided to stop the experiment once 330 measurements of system failure had been obtained. This resulted in a total exposure time to radiated EMI of approximately 40 hours. The test conditions were as follows: a) EM field range: from 10 to 220 V/m; b) Measured frequency range: from 150 kHz to 3 GHz (extended IEC 62.132-2); c) Signal modulation format: three different types were used: 80% modulation, no modulation, and pulsed signal.

System Performance Degradation (ms): Negligible (some assembly-level macros are inserted in the OS kernel to perform CPU-WDP communication)

[Fig. 10a plot. Vertical axis: number of detected faults (0% to 100%). Horizontal axis: frequency (MHz). Series: WDP-IP (HW), WDP-IP (SW), uCOS-II.]

Figure 8. Test environment showing TEM Cell and test vehicle prototyped in Board I. (a) General view; (b) and (c) Closer views detailing the test vehicle with the FPGA board side turned into the chamber.

[Fig. 10b plot. Vertical axis: number of detected faults (0% to 100%). Horizontal axis: EM field (V/m), in bins 0-35, 36-70, 71-105, 106-140, 141-175 and 176-210. Series: WDP-IP (HW), WDP-IP (SW), uCOS-II.]

Figure 9. Fault detection capability measured for the I-IP approach during IEC 62.132-2 test session: Approaches comparison and Classification of observed errors.

[Fig. 9 plots. Panel "Approaches Comparison": fault detection (%) for the uC/OS-II, WDP-IP/SW and WDP-IP/HW approaches. Panel "Types of errors": fault occurrence (%) for HW configuration faults and other faults.]

Practical experiments have been carried out. The results obtained demonstrate the utility and benefits of using the proposed platform to estimate the behavior of embedded systems operating in an EM environment.

ACKNOWLEDGMENT

The work reported in this paper has been partially funded by CNPq (Science and Technology Foundation, Brazil).

REFERENCES

[1] www.iec.ch (last access on 30/04/2007).
[2] Sicard, E.; Vargas, F.; Hernandez, F.; Fiori, F.; Teixeira, J. P. Design and Test on Chip for EMC. IEEE Design and Test of Computers, Nov./Dec. 2006, pp. 502-503.
[3] Bernardi, P.; Veiras Bolzani, L. M.; Rebaudengo, M.; Sonza Reorda, M.; Vargas, F. L.; Violante, M. A New Hybrid Fault Detection Technique for Systems-on-a-Chip. IEEE Transactions on Computers, Feb. 2006, Vol. 55, No. 2, pp. 185-198.
[4] Semião, J.; Rodriguez-Irago, M.; Piccoli, L.; Vargas, F.; Santos, M. B.; Teixeira, I. C.; Andina, J. J. R.; Teixeira, J. P. Digital Circuit Signal Integrity Enhancement by Monitoring Power Grid Activity. 8th IEEE Latin American Test Workshop (LATW'07), Cuzco, Peru, 11-14 March 2007.
[5] Steinecke, T. Experimental Characterization of Switching Noise and Signal Integrity in Deep Submicron Integrated Circuits. IEEE International Symposium on Electromagnetic Compatibility, Washington, DC, USA, 21-25 August 2000, pp. 107-112.
[6] Micrium: Empowering Embedded Systems. www.ucos-ii.com (last access on 25/03/2008).
[7] Validated Software Corporation. http://www.validatedsoftware.com (last access on 25/03/2008).
[8] Vargas, F.; Piccoli, L.; Benfica, J.; Alecrim Jr., A.; Moraes, M. Time-Sensitive Control-Flow Checking for Multitask Operating System-Based SoCs. 13th IEEE International On-Line Testing Symposium (IOLTS07), Crete, Greece, 9-11 July 2007.
[9] Vargas, F.; Piccoli, L.; Benfica, J.; Alecrim Jr., A.; Moraes, M. Summarizing a Time-Sensitive Control-Flow Checking Monitoring for Multitask SoCs. IEEE International Conference on Field Programmable Technology (FPT06), Bangkok, Thailand, 13-15 December 2006, pp. 249-252.

Figure 10. Fault occurrence as a function of the modulated EM signal frequency (a) and the EM field incident on the board under test (b).

After analyzing the measurement results, we concluded that:

The fault detection rate of the uCOS-II kernel native structures was very low (approx. 1%) because these embedded structures were able to detect only those faults that resulted in an increase of the time allocated by the OS for the processor to run the task slices. Note that faults that reduce the time allocated by the OS (for instance, by aborting a procedure) are not detected by the kernel native structures.

The WDP-IP software version was able to detect (in addition to those faults that resulted in an increase or decrease of the time allocated by the OS for the processor to run the task slices) most of the faults that affected user memory elements (FFs and SRAM). However, it failed to signal most of the faults that changed the FPGA configuration bitstream. Several of these faults caused a system crash (the processor had to be reinitialized).

The WDP-IP hardware version was able to detect most of the faults that affected user memory elements as well as those that corrupted the FPGA configuration logic. In addition, it detected the faults that corrupted (by increasing or reducing) the task slice execution time frames defined by the OS.

IV. CONCLUSIONS


There has been an increased demand for hardware/software-based prototyping vehicles in order to conduct compatibility analysis early in the system-on-chip (SoC) design process. To address this point, we presented a configurable standard environment for electromagnetic (EM) immunity measurement of prototype SoCs. To the best of our knowledge, this is the first time that this kind of platform has been reported. The environment is composed of two boards compliant with Parts 2 and 4 of the IEC 62.132 standard, conceived for radiated and conducted measurements, respectively. The SoC under test can be prototyped on two types of ICs: two FPGAs and a microcontroller. The advantages of the proposed test platform are: (a) reduction of SoC design cost and time due to early estimation of system behavior in the presence of EM noise according to recognized standards, and (b) support for measurements on hardware (IP cores) as well as on software (user code and operating system kernel).


Low Dimensionality Electronic Devices Based on Heterodimensional Schottky Contacts: Modeling and Experimental Results

Murilo A. Romero and R. Ragi

Electrical Engineering Department, University of São Paulo, São Carlos - Brazil. E-mail: muriloa@sel.eesc.usp.br

Abstract: This paper discusses the modeling and experimental results for a new family of electronic devices in which a Schottky metal is placed in direct contact with a low-dimensionality structure such as a quantum well or a quantum wire. Based on those principles, we experimentally demonstrate the improved performance of both a microwave varactor and an MSM photodetector. The results are explained in terms of a fully quantum-mechanical model, obtained by self-consistently solving the Schrödinger and Poisson equations.

Bahram Nabet

Electrical and Computer Engineering Department, Drexel University, Philadelphia - USA

high breakdown voltage, making them very promising for applications in ultrahigh-frequency and low-power electronics [2]. However, despite the significant amount of device-related work, the number of investigations on the modeling of the capacitance-voltage and current-voltage characteristics of these metal to 2-DEG interfaces is still limited.


I. INTRODUCTION

The properties of electrons in an inversion layer have attracted interest since the invention of the MOS field-effect transistor. Further attention has been motivated by the enhanced transport properties of the two-dimensional electron gas (2-DEG) formed at modulation-doped heterointerfaces, where the inversion layer is quantized in the growth direction. Already in the early 1990s, High Electron-Mobility Transistors (HEMTs) based on this principle displayed power amplification well above 100 GHz with outstanding noise performance. In HEMT devices the 2-DEG is accessed by ohmic contacts. Here, we focus instead on electronic devices based on Schottky contacts to low-dimensional systems, such as a 2-DEG. For example, Fig. 1 shows a schematic interface of a Schottky contact between a three-dimensional (3D) metal and a two-dimensional electron gas (2-DEG), arising from an AlGaAs/GaAs modulation-doped heterostructure grown on top of a GaAs substrate, the same layer structure as a conventional HEMT. Other configurations are also possible, including the contact between a quantum wire (1-DEG) and a three-dimensional (3D) Schottky metal. In fact, since their first proposal [1], devices relying on the contact of a Schottky metal to a low-dimensional system have displayed several attractive features, such as low capacitance due to the small effective cross-section, excellent noise and transport characteristics due to the 2D electron gas, as well as a


Fig. 1: Schematic view of the 3D-2D contact, also showing the depletion region which appears between the metal and the 2-DEG channel. The semiconductor layers above and below the 2-DEG channel are not shown.

In this framework, the present paper is an overview of our work in this field during the last few years, also incorporating several new results concerning 1D-3D structures. In section II, the first device investigated is a gate-controlled Schottky diode varactor. The three-terminal varactor is a modulation-doped AlGaAs/GaAs heterostructure with two Schottky contacts, similar to a metal-semiconductor-metal (MSM) structure. The Schottky metal contacts are made directly to a two-dimensional electron gas (2-DEG). The third contact, the gate, is formed from highly doped n+ GaAs material to allow an open optical window that can be used for optical gating and mixing. The measured capacitance is less than 1 pF and a change of more than 30 percent from the zero-bias capacitance is observed with the applied gate voltage. In section III, the capacitance versus voltage characteristics of the device are modeled by numerically solving the Poisson and Schrödinger equations under the effective mass approximation in real space. In section IV, we explore the I-V characteristics of those devices. Our experimental data compare two MSM (metal-semiconductor-metal) devices with identical layer structures, one of them employing metal to 2-DEG Schottky contacts, while the reference device uses conventional bulk semiconductor-metal contacts. The data show that a reduction of the reverse saturation current by almost one order of magnitude was achieved. This result makes these devices very attractive as low-noise photodetectors as well as low-leakage gate contacts for next-generation transistors. Our model explains it as a consequence of a strong suppression of the thermionic emission current in the reduced-dimensionality contact, due to an effective Schottky barrier height enhancement caused by energy quantization. Since this energy level quantization is expected to be even more significant in one-dimensional systems, our next step was to move to the study of quantum-wire based devices. To do so, we implemented, in section V, by extending the results of section II, a novel two-dimensional finite-difference self-consistent Schrödinger-Poisson solver to model quantum wire systems, structures which may be a promising platform for several applications, including solid-state based quantum computing. Our simulation tool has already allowed us to investigate a number of intriguing features, and work is under way to provide the experimental validation of these last theoretical findings.

II. FABRICATION AND CHARACTERIZATION OF A 2-DEG VARACTOR

$$C = \frac{\partial Q_c}{\partial V} = \left[\frac{q B (\varepsilon_0 \varepsilon_r)^{m+1}}{(m+2)(V + V_{bi})}\right]^{\frac{1}{m+2}} \qquad (1)$$

Eq. (1) shows that the capacitance varies with the reverse applied terminal bias as $C \propto (V + V_{bi})^{-s}$, with sensitivity $s = 1/(m+2)$. For a uniformly doped abrupt junction ($m = 0$) this reduces to the usual inverse square root bias dependence for the capacitance. In Schottky diode varactors the junction capacitance changes as the reverse applied bias voltage depletes the semiconductor channel until it reaches its minimum physical limit. Although some variation due to charge storage can still be observed, a further increase in the applied bias typically causes no significant effect.
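To make the bias dependence concrete, the short sketch below evaluates the power law C proportional to (V + Vbi)^(-s), with s = 1/(m + 2), as a ratio to the capacitance at a reference bias. The function name capacitance_ratio and the built-in voltage of 0.8 V are illustrative assumptions, not values taken from the paper.

```python
def capacitance_ratio(v_reverse, v_bi, m=0.0, v_ref=0.0):
    """Return C(v_reverse) / C(v_ref) for C proportional to (V + Vbi)^(-1/(m+2)).

    v_reverse and v_ref are reverse bias magnitudes in volts, v_bi is the
    built-in voltage in volts, and m is the exponent of the doping profile
    (m = 0 corresponds to uniform doping).
    """
    s = 1.0 / (m + 2.0)  # sensitivity of the capacitance to the applied bias
    return ((v_reverse + v_bi) / (v_ref + v_bi)) ** (-s)


# Uniform doping: quadrupling (V + Vbi) halves the capacitance.
ratio_uniform = capacitance_ratio(2.4, 0.8, m=0.0)
# Linearly graded junction (m = 1): weaker, inverse cube root dependence.
ratio_graded = capacitance_ratio(2.4, 0.8, m=1.0)
```

With these assumed numbers, the uniformly doped case gives a ratio of about 0.5, while the graded case (m = 1) changes less for the same bias swing, which is why the doping profile sets the tuning range of a varactor.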

In what follows we describe the fabrication of a three-terminal varactor in which two back-to-back Schottky contacts are made to a 2-DEG channel formed through modulation doping of a heterostructure. While maintaining the desirable properties of the Schottky diode, this heterojunction metal-semiconductor-metal (HMSM) device has the advantage of ease of fabrication, since both contacts are deposited in the same step. Also, an important distinguishing feature of our proposed device is the addition of a third contact, a gate, allowing a second degree of freedom to modulate the capacitance, in particular, and the channel characteristics, in general. We chose to fabricate the gate for this HMSM varactor by using a thin layer of highly doped n+ material. This layer is mostly transparent to light, so that the device can be electrically as well as optically controlled. The structure of the HMSM varactor is shown in Fig. 2. On top of a buffer layer grown on a semi-insulating GaAs substrate, 5000 Å of undoped GaAs was deposited, followed by 100 Å of undoped Al0.24Ga0.76As and 500 Å of 3x10^17 cm^-3 n-type Al0.24Ga0.76As. The topmost layer is 200 Å of 3x10^18 cm^-3 n-type GaAs. All growth was done by molecular beam epitaxy (MBE) and the structure was chosen to be compatible with enhancement-type HEMTs, where the n+ cap layer is usually used for ohmic contact formation. A trench was formed by wet chemical etching through the Al0.24Ga0.76As layers and 500 Å of Schottky Ti/Au contact metal was deposited on the GaAs side to form Schottky junctions with the 2-DEG. The contacts have the usual interdigital structure, for a total device area of 40 µm x 40 µm. Since the cathode and anode terminals are recessed, the top n+ layer readily becomes available for probing and voltage application.

Varactors have extensive applications in communication transceivers and control system circuitry. They are the main building blocks in Voltage Controlled Oscillators (VCOs), and Schottky diode varactors have been extensively investigated for frequency multiplication applications. Due to the lack of charge storage time delays, Schottky diodes have the added advantage of high-speed operation compared to PIN diodes. Their most important property, however, is that Schottky contacts are routinely deposited as gates of MESFETs and HEMTs, making these devices compatible with unipolar MMIC technology. Varactors are variable capacitance devices, as they utilize the voltage dependence of the semiconductor junction capacitance. Generally, for a semiconductor junction with an arbitrary power-law doping profile $N(x) = B x^m$, the capacitance is given by Eq. (1).


consistently solving the Schrödinger and Poisson equations in the growth direction. The quantum-mechanical formalism is based on the effective mass approximation, where the electron wavefunction is taken as the product of a Bloch function and an envelope function, the latter being a solution of the time-independent Schrödinger equation $H\psi_i(z) = E_i \psi_i(z)$, where z is the direction perpendicular to the epitaxial layers. The Hamiltonian used is based on a generalized expression suited for conventional as well as strained quantum-well devices, since it is able to account for position-dependent effective mass and lattice constant. It is given by:

$$H = -\frac{\hbar^2}{2a(z)} \frac{d}{dz} \frac{a^2(z)}{m^*(z)} \frac{d}{dz} \frac{1}{a(z)} + V_{ef}(z)$$

Fig. 2. Layer structure of the 2DEG varactor showing terminal contacts and the n+ transparent gate.

The capacitance-voltage characteristics of the device were measured by directly probing the three terminals and recording the results with an HP LCR meter. In Fig. 3, the device capacitance is plotted as a function of gate voltage around zero terminal bias voltage. The capacitance changes from 311 fF to 417 fF, a change of about 35% for a gate voltage variation of 15 volts. A large gate voltage is therefore needed to achieve capacitance modulation. This can be related to the voltage drop along the n+ layer lines, which, although highly doped, are very thin and long. One way to enhance the device sensitivity to gate voltage is to sacrifice its fabrication simplicity by aligning and depositing gate metal. This solution, however, in addition to requiring more processing effort, would suppress the optical coupling capability. In order to gain further insight into the C-V characteristics, in the next section a theoretical model is developed and compared to the experimental results.

[Fig. 3 plot. Vertical axis: capacitance (fF), 300 to 420. Horizontal axis: gate voltage Vg (V), 0 to 14.]

where a(z) and m*(z) are the position-dependent lattice constant and effective mass. The effective potential Vef includes not only the band-diagram discontinuities and the Hartree term due to the electrostatic potential but also an exchange-correlation term as well as strain components caused by lattice mismatch [3]. The Poisson equation, which yields the Hartree term, is given by:

$$\frac{d}{dz}\left[\varepsilon_0\,\varepsilon(z)\,\frac{dV(z)}{dz}\right] = -q\left[N_D^{+}(z) - N_A^{-}(z) - n(z)\right],$$

where q is the electronic charge, $\varepsilon(z)$ is the position-dependent dielectric constant of the semiconductor, $N_D^{+}$ is the ionized donor concentration, $N_A^{-}$ is the ionized non-intentional background acceptor concentration and n(z) is the free-electron concentration in the conduction band (the free hole concentration has been neglected). Note that the above formulation is a set of coupled differential equations, since the free-carrier concentration n(z) is in turn a function of the electronic eigenfunctions $\psi_i(z)$ [3].
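Before running a full self-consistent solution, it can be useful to have a rough estimate of the confined-state energies. The sketch below does not implement the Schrödinger-Poisson solver described in this section; it swaps in the standard textbook approximation for an infinite triangular well in a uniform field F, $E_n \approx (\hbar^2/2m^*)^{1/3}\,[\tfrac{3}{2}\pi q F (n + \tfrac{3}{4})]^{2/3}$. The field value of 1e7 V/m and the GaAs effective mass ratio of 0.067 are assumptions chosen only for illustration.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
Q = 1.602176634e-19      # elementary charge, C
M0 = 9.1093837015e-31    # free-electron mass, kg


def triangular_well_levels(field_v_per_m, m_eff_ratio=0.067, n_levels=2):
    """Approximate subband energies (eV) of a triangular well in a uniform field.

    Uses the WKB-type result E_n = (hbar^2 / 2m*)^(1/3) * [1.5*pi*q*F*(n + 3/4)]^(2/3),
    with energies measured from the bottom of the well at the interface.
    """
    m_eff = m_eff_ratio * M0
    prefactor = (HBAR ** 2 / (2.0 * m_eff)) ** (1.0 / 3.0)
    levels = []
    for n in range(n_levels):
        arg = 1.5 * math.pi * Q * field_v_per_m * (n + 0.75)
        levels.append(prefactor * arg ** (2.0 / 3.0) / Q)  # convert J to eV
    return levels


# Hypothetical interface field of 1e7 V/m (100 kV/cm).
e0_ev, e1_ev = triangular_well_levels(1.0e7)
```

Under these assumptions the first level comes out at several tens of meV above the band edge, which is the order of magnitude one would expect a self-consistent calculation to refine.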

In order to perform the capacitance calculation for a given terminal voltage, the channel is initially divided into several segments of length dy. Next, the capacitance contribution of each segment is computed by solving the one-dimensional Schrödinger-Poisson problem above in the growth direction, but now under an effective surface potential $V_s^{ef} = V_s - V(y_i)$, where $y_i$ is a coordinate position located in the middle of each segment dy. A quasi-static approach was used, giving the capacitance per unit area as the total charge variation caused by a small voltage change around a given bias point. Then, the free-carrier capacitance, Cfree, is given by the summation of the capacitance contributions of all segments dy. Further details concerning our procedure can be found in [4]. In order to verify our model, the theoretical predictions were compared to the experimental data of a control AlGaAs/GaAs heterodimensional device fabricated in our labs and described in section II. Fig. 4 displays the free-carrier capacitance as a function of the terminal voltage for a heterodimensional Schottky-ohmic device. In order to obtain the theoretical results we used the method described above, employing the Petrosyan approximation [5] for the

Figure 3. Modulation of terminal capacitance with gate voltage at zero terminal voltage.

III. THEORETICAL MODELING

The model presented in this section is a quasi-two-dimensional extension of work previously published by the authors [3]. Since charge is mostly confined in a triangular quantum well at the AlGaAs/GaAs interface, we start by self-


longitudinal potential V(y) and simulated exactly the same layer structure discussed previously, assumed to be subjected to a surface potential of 0.75 V, due to surface states causing Fermi-level pinning at the AlGaAs/air interface. The theoretical results obtained are quite satisfactory. Our model was able to reproduce, without any fitting parameter, the general features of the varactor capacitance-voltage characteristics, yielding capacitance values in the measured range. Given the uncertainty in some device parameters (such as the amount of non-intentional doping in the GaAs buffer layer and the value of the surface potential at the top of the structure), no attempt is made in this paper to strictly match theory and experiment. However, given the satisfactory results obtained above, we believe that this novel model is a useful tool to optimize the device performance by providing design guidelines.
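The quasi-static, segment-by-segment capacitance procedure of this section can be outlined in a few lines. This is a sketch under stated assumptions: the per-segment charge functions would in practice come from the Schrödinger-Poisson solution, whereas here they are hypothetical linear stand-ins, and the names segment_capacitance and total_capacitance are illustrative.

```python
def segment_capacitance(charge_fn, v_bias, dv=1e-3):
    """Quasi-static capacitance of one segment: central difference of Q(V) around v_bias."""
    return (charge_fn(v_bias + dv) - charge_fn(v_bias - dv)) / (2.0 * dv)


def total_capacitance(charge_fns, v_bias, dv=1e-3):
    """Free-carrier capacitance as the sum of the contributions of all channel segments."""
    return sum(segment_capacitance(fn, v_bias, dv) for fn in charge_fns)


# Hypothetical stand-ins for the per-segment charge (in coulombs) versus voltage.
# Each behaves as an ideal capacitor of 1 fF, 2 fF and 3 fF respectively.
segment_charges = [lambda v, c=c: c * v for c in (1e-15, 2e-15, 3e-15)]
c_free = total_capacitance(segment_charges, v_bias=1.0)
```

For these linear stand-ins the sum is simply 6 fF; with charge functions obtained from the self-consistent solver, the same finite-difference step would give the bias-dependent capacitance.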


where h is the Planck constant, q is the electron charge, E is the electron energy and Ex is the kinetic energy component in the x-direction. The derivative of the energy with respect to the linear momentum component px is the electron velocity in the x-direction, perpendicular to the Schottky barrier, and T(Ex) is the probability that an electron with energy Ex will pass through the barrier, which can be determined by the well-known WKB approximation [6]. Due to the low-dimensional nature of the 2-DEG, the electron motion is confined to the quantum well at the AlxGa1-xAs/GaAs interface, where the first available energy state lies above the conduction band edge by the energy of the first confined state, E0. Based on this boundary condition, the equation above was obtained by restricting ourselves to the case where the electron population is low enough that only the first energy level of the 2-DEG has a significant carrier concentration. The above integral can then be solved, and an analytical expression for the case where the thermionic emission process is dominant can be obtained by restricting ourselves to the case in which all carriers have kinetic energy larger than the top of the Schottky barrier [4]. Keeping in mind that, at zero bias, Jgm = Jmg, this results in [4]:

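$$J = \frac{2q\sqrt{2\pi m}\,(k_B T)^{3/2}}{h^2} \exp\left(-\frac{E_0}{k_B T}\right) \exp\left(-\frac{q\phi_B}{k_B T}\right) \left[\exp\left(\frac{qV}{k_B T}\right) - 1\right].$$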

where J is the current density through the barrier, V is the applied voltage, $\phi_B$ is the Schottky barrier height and E0 is the energy of the first confined state at the AlGaAs/GaAs interface. Observe that the above equation resembles the result obtained for a standard Schottky contact. However, the temperature dependence is different. Also, it should be stressed that the additional term in E0, the energy of the first confined state, is not a phenomenological factor but rather arises directly from the derivation. Independent experimental evidence of this barrier enhancement effect due to energy level quantization was already provided in [7]. From the device point of view, the effect of the first confined state emerging from the above derivation is to produce an exponential reduction of the reverse saturation current through an increase in the effective Schottky barrier height. There are still fabrication issues to be solved concerning high-quality metal to 2-DEG contacts, in order to completely eliminate the current flow through the adjacent semiconductor layers. However, the experimental data in Fig. 5, which compare two MSM (metal-semiconductor-metal) photodetectors with identical layer structures, one of them employing metal to 2-DEG Schottky contacts, show that an effective dark current reduction of almost one order of magnitude was achieved. This agrees with the theoretical expectations of Fig. 6, in which E0 was obtained from the formalism described in section III. Theoretical ratios are larger only because the free-electron mass was used in the calculations.
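The size of the barrier enhancement effect can be gauged with a short calculation of the exponential factor exp(-E0 / kBT) alone. This is a sketch, not the authors' calculation: it ignores the different prefactors and temperature dependences of the 2D and bulk expressions, and the value E0 = 0.06 eV and the barrier height of 0.8 eV are hypothetical inputs chosen only for illustration.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def saturation_current_factor(e0_ev, temperature_k=300.0):
    """Return exp(-E0 / kB*T), the reduction of the saturation current due to E0 alone."""
    return math.exp(-e0_ev / (K_B_EV * temperature_k))


def effective_barrier_ev(phi_b_ev, e0_ev):
    """Effective barrier height (eV) when the first confined state adds to the barrier."""
    return phi_b_ev + e0_ev


# Hypothetical first confined state energy of 60 meV at room temperature.
factor = saturation_current_factor(0.06)
barrier = effective_barrier_ev(0.8, 0.06)
```

With these assumed values the factor is close to 0.1, so a confined-state energy of a few kBT is already enough to account for a reduction of about one order of magnitude.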

Figure 4. Free-carrier contribution for heterodimensional Schottky device capacitance. Dots represent experimental data while the solid line is the theoretical prediction.

IV. I-V CHARACTERISTICS

After investigating the capacitance-voltage (C-V) characteristics of the device, our next step was to study the current-voltage (I-V) features of those Schottky contacts. Unlike most formulations, we made an effort to address both tunneling and thermionic emission in a unified fashion. Therefore, in this more general framework, the carrier transport across the metal-2DEG interface is characterized by the quantum-mechanical transmission coefficient, defined as the ratio of the transmitted to the incident current. Carriers can traverse from the two-dimensional electron gas to the metal and vice versa, corresponding to current densities designated by Jgm and Jmg, respectively. The expression for Jgm, for a parabolic energy-momentum relation, is proportional to the transmission coefficient T(Ex) multiplied by the occupation probability in the two-dimensional electron gas, fg(E), the Fermi-Dirac distribution function in the gas, and the unoccupied probability in the metal, 1 - fm(E) [4]:

$$ J_{gm} = \frac{q}{2\pi^2\hbar^2} \iint \frac{\partial E}{\partial p_x}\, T(E_x)\, f_g(E)\,[1 - f_m(E)]\, dp_x\, dp_y $$
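The reverse flux Jmg has the same structure with the roles of the two occupation factors exchanged. As a sketch, assuming the same transmission coefficient in both directions and writing the common prefactor as A (both are assumptions made here for illustration), the net current density follows from the difference of the two fluxes:

```latex
\begin{aligned}
J_{mg} &= A \iint \frac{\partial E}{\partial p_x}\, T(E_x)\, f_m(E)\,[1 - f_g(E)]\, dp_x\, dp_y, \\
J &= J_{gm} - J_{mg}
   = A \iint \frac{\partial E}{\partial p_x}\, T(E_x)\,[f_g(E) - f_m(E)]\, dp_x\, dp_y,
\end{aligned}
```

because f_g(1 - f_m) - f_m(1 - f_g) = f_g - f_m. The net current therefore vanishes when the two distributions coincide, as expected at zero bias.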

[Figure 5: plot of current (A) versus voltage (V) for the CMSM and HMSM devices.]

[Figure 7: schematic with labels VC, VG, 2-DEG, wC, wG and dCG.]

Figure 5. Experimental dark current characteristics of two metal-semiconductor-metal (MSM) photodetectors. CMSM refers to a conventional device while HMSM represents the device using our heterodimensional contact.

Figure 7. Schematic view of the quantum wire transistor for a structure with a split gate over a conventional GaAs-based heterostructure. The two top Schottky contacts, biased at Vc, determine the wire channel width, while the top control gate, biased at Vg, sets the electronic density. We assume the same AlGaAs/GaAs heterojunction of section II.

[Figure 6: plot of the ratio r versus d2 (Å).]

Figure 6. Theoretical ratio r between the saturation currents of a conventional and a heterodimensional Schottky contact. Calculations are given as a function of d2 and Nd, the thickness and doping level of the AlGaAs layer.

V. THEORETICAL INVESTIGATION OF A QUANTUM WIRE TRANSISTOR
Since energy level quantization is expected to be even more significant in one-dimensional systems, our next step was to study quantum-wire based devices and contacts. To do so, the formulation described in section III was altered to allow quantum confinement in two dimensions. It is also necessary to write the carrier concentration n(x,y) in terms of the wavefunctions ψi(x,y):

$$ n(x,y) = \sum_{i} |\psi_i(x,y)|^2 \, \frac{1}{\pi\hbar} \int_{E_i}^{\infty} \sqrt{\frac{2m}{E - E_i}} \, \frac{dE}{1 + \exp[(E - E_F)/k_B T]} $$

where m is the electron effective mass in the 1-DEG channel, kB is the Boltzmann constant, T is the absolute temperature, ħ is the reduced Planck constant, EF is the Fermi level energy and Ei represents the i-th eigenvalue. The summation is carried out over all subbands i.

To illustrate the principle of operation, Fig. 8 gives typical values for the potential profile in the y-direction as a function of the applied voltage Vc. It is clearly seen that an increase in Vc modulates the wire channel width.

Figure 8. Effective potential profile V(y) as a function of the applied voltage Vc.

The device then behaves as a quantum wire transistor if a control voltage is applied to the gate contact. The calculated charge control relationship, relating the applied gate voltage Vg to the total charge per unit length within the quantum wire, is given in Fig. 9.
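The energy integral in the carrier concentration can be evaluated numerically once the subband energies are known. The sketch below assumes the standard one-dimensional density of states with spin degeneracy, (1/πħ)·sqrt(2m/(E - Ei)), and removes the square-root singularity with the substitution E = Ei + u². Integrating n(x,y) over the wire cross-section with normalized wavefunctions then gives the electron density per unit length. The GaAs effective mass, the function names and the sample energies are illustrative assumptions, not parameters of the device studied here.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
K_B = 1.380649e-23       # Boltzmann constant, J/K
M0 = 9.1093837015e-31    # free-electron mass, kg
Q = 1.602176634e-19      # elementary charge, C (also joules per eV)


def subband_line_density(ei_ev, ef_ev, temperature_k=300.0,
                         m_eff=0.067 * M0, steps=4000):
    """Electrons per metre contributed by one subband of a quantum wire.

    Integrates (1/(pi*hbar)) * sqrt(2m/(E-Ei)) * f(E) dE from Ei upwards.
    With E = Ei + u**2 the integrand becomes
    2*sqrt(2m)/(pi*hbar) * f(Ei + u**2), which is smooth at u = 0.
    """
    kt = K_B * temperature_k
    ei, ef = ei_ev * Q, ef_ev * Q
    # Stop 40 kT above the larger of Ei and EF, where f(E) is negligible.
    u_max = math.sqrt(max(ef - ei, 0.0) + 40.0 * kt)
    du = u_max / steps
    total = 0.0
    for k in range(steps + 1):
        u = k * du
        occupancy = 1.0 / (1.0 + math.exp((ei + u * u - ef) / kt))
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoidal rule
        total += weight * occupancy
    return 2.0 * math.sqrt(2.0 * m_eff) / (math.pi * HBAR) * total * du


def wire_line_density(subband_energies_ev, ef_ev, temperature_k=300.0):
    """Sum of the line densities of all listed subbands."""
    return sum(subband_line_density(e, ef_ev, temperature_k)
               for e in subband_energies_ev)


if __name__ == "__main__":
    # Hypothetical subband energies (eV) measured from an arbitrary reference.
    print(wire_line_density([0.00, 0.03, 0.07], ef_ev=0.02))
```

In a self-consistent calculation the subband energies would come from the Schrödinger-Poisson solution for each gate voltage; here they are simply supplied as inputs.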

[Figure 9: plot of ns (cm^-1, scale ×10^6) versus Vg (V).]

Figure 9. Charge control relation for the proposed quantum wire transistor.

VI. CONCLUSIONS
This manuscript described a family of electronic devices in which the required Schottky contacts directly reach a quantum well or a quantum wire, which serves as a charge reservoir. The first device discussed was a gate-controlled Schottky diode varactor. The device was experimentally characterized, and a quantum-mechanical model was implemented that is in excellent agreement with the measured results. Next, we investigated the I-V characteristics of those devices. A strong reduction of the reverse saturation current, by almost one order of magnitude, was experimentally achieved. It was shown that this result, which makes these devices very attractive as low-noise photodetectors as well as low-leakage gate contacts for next-generation transistors, is a consequence of a Schottky barrier height enhancement caused by energy quantization. Since this energy level quantization is even more significant in one-dimensional systems, our next step was to theoretically investigate a proposed quantum-wire transistor. We obtained the charge control relationship, and work is under way to experimentally validate these last theoretical findings.

REFERENCES
[1] W. C. Peatman, T. W. Crowe and M. Shur, "Design and fabrication of heterostructure varactor diodes for millimeter and submillimeter wave multiplier applications," Proceedings of the IEEE/Cornell Conf. Advanced Concepts High Speed Semiconductor Devices and Circuits, Ithaca, NY, 1991.
[2] M. S. Shur, W. C. Peatman, H. Park, W. Grimm and M. Hurt, "Novel Heterodimensional Diodes and Transistors," Solid State Electronics, Vol. 38, no. 9, pp. 1727-1730, September 1995.
[3] J. E. Manzoli, M. A. Romero and O. Hipólito, "On the Capacitance-Voltage Modeling of Strained Quantum-Well MODFETs," IEEE Journal of Quantum Electronics, pp. 2314-2320, December 1998.
[4] R. Ragi, M. A. Romero and O. B. Nabet, "Modeling the Electrical Characteristics of Schottky Contacts in Low-Dimensional Heterostructure Devices," IEEE Transactions on Electron Devices, pp. 170-175, February 2005.
[5] S. G. Petrosyan and A. Y. Shik, "Contact Phenomena in a Two-Dimensional Electron Gas," Soviet Physics Semiconductors, Vol. 23, no. 6, pp. 696-697, June 1989.
[6] F. A. Padovani and R. Stratton, "The Accuracy of the WKB Approximation for Tunneling in Metal-Semiconductor Junctions," Applied Physics Letters, Vol. 13, no. 5, pp. 167-169, May 1968.
[7] T. Ytterdal, M. S. Shur, M. Hurt and W. C. B. Peatman, "Enhancement of Schottky Barrier Height in Heterodimensional Metal-Semiconductor Contacts," Applied Physics Letters, Vol. 70, no. 4, pp. 441-442, January 1997.

Theoretical Analysis of Power Clock Generator based on the Switched Capacitor Regulator for Adiabatic CMOS Logic

Yasuhiro Takahashi, Toshikazu Sekine

Department of Electrical and Electronic Engineering Gifu University, 1-1 Yanagido, Gifu-shi 501-1193 Japan Email: {yasut, sekine}@gifu-u.ac.jp

Michio Yokoyama

Department of Bio-system Engineering Yamagata University, 4-3-16 Jonan, Yonezawa-shi 992-8510 Japan Email: yoko@yz.yamagata-u.ac.jp

Abstract: This paper reports an analytical method for a power clock generator based on a switched capacitor circuit, which is used in adiabatic logic. We first derive an equivalent circuit model of the switched capacitor circuit. We then discuss the design optimization of the capacitance ratio. Finally, we show that the analytical results agree rather well with the SPICE simulation results.

I. INTRODUCTION
In the design of low-power VLSI circuits, adiabatic (or energy recovery) logic shows great potential because it is able to break the lower limit of the energy dissipation in static CMOS, which amounts to CVdd²/2, where C is the load capacitance and Vdd is the supply voltage of the VLSI circuit. Numerous designs of adiabatic logic have been presented [1]-[10]. Driving adiabatic logic requires adiabatically controlled voltage sources. Adiabatic drivers fall into two classes: resonant drivers and staircase drivers. The resonant driver generates the pulses from the natural oscillations of a resonator, with power recovery provided by a dc-voltage source. Generators of quasi-sinusoidal pulses can be built around the simplest resonator, namely an LC circuit. Such a driver has been used in Refs. [2]-[7] and [9]. The staircase driver, on the other hand, was first proposed by L. J. Svensson and J. G. Koller [1], and has since been used in Refs. [8] and [10]. The staircase driver includes a switched capacitor regenerator, which has a tank capacitor for restoring the charge energy. In Ref. [8], the properties and stability of a switched capacitor regenerator have been discussed, but the regenerator has not yet been discussed from the viewpoint of design optimization.

This paper reports an analytical method for the switched capacitor regenerator. We first derive an equivalent circuit model of the switched capacitor circuit, and then propose analytical methods for the step voltage difference. Finally, we show that the analytical results agree rather well with the SPICE simulation results.

II. CONVENTIONAL CMOS LOGIC VS. ADIABATIC LOGIC
Conventional switching can be understood by using a simple CMOS inverter. The CMOS inverter can be considered to consist of pull-up and pull-down networks connected to a load capacitance C. The pull-up and pull-down networks are actually MOS transistors in series with the same load C. Both transistors can be modeled by an ideal switch in series with a resistor equal to the corresponding channel resistance of the transistor in the saturation mode, as shown in Fig. 1. When a conventional CMOS inverter is set into a logical 1 state, a charge Q = CVdd is delivered to the load and the energy that the supply delivers is Eapplied = QVdd = CVdd². The energy stored in the load C is half of the supplied energy:

$$ E_{stored} = \frac{1}{2} C V_{dd}^2. \qquad (1) $$

The same amount of energy is dissipated during the discharge process in the NMOS pull-down network, because no energy can enter the ground rail: Q × Vgnd = Q × 0 = 0. By energy conservation, conventional CMOS logic emits heat and thus wastes energy in every charge-discharge cycle:

$$ E_{total} = E_{charge} + E_{discharge} = \frac{1}{2} C V_{dd}^2 + \frac{1}{2} C V_{dd}^2 = C V_{dd}^2. \qquad (2) $$

If the logic is driven at a frequency f (= 1/T), where T is the period of the signal, then the power of the CMOS gate is

$$ P_{total} = \frac{E_{total}}{T} = C V_{dd}^2 f. \qquad (3) $$
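Equations (1) to (3) can be checked with a few lines of arithmetic. The sketch below uses hypothetical values for C, Vdd and f, chosen only for illustration; the function names are likewise assumptions.

```python
def cmos_switching_energy(c_load: float, vdd: float) -> dict:
    """Energies (J) of one charge-discharge cycle of a conventional CMOS gate."""
    supplied = c_load * vdd ** 2          # E_applied = Q * Vdd = C * Vdd^2
    stored = 0.5 * c_load * vdd ** 2      # Eq. (1)
    return {
        "supplied": supplied,
        "stored": stored,
        "lost_on_charge": supplied - stored,   # heat in the pull-up network
        "lost_on_discharge": stored,           # heat in the pull-down network
        "total": supplied,                     # Eq. (2)
    }


def cmos_dynamic_power(c_load: float, vdd: float, freq: float) -> float:
    """Eq. (3): P_total = C * Vdd^2 * f, in watts."""
    return c_load * vdd ** 2 * freq


if __name__ == "__main__":
    c, vdd, f = 0.1e-12, 5.0, 10e6   # assumed values: 0.1 pF, 5 V, 10 MHz
    energies = cmos_switching_energy(c, vdd)
    print(energies["total"], cmos_dynamic_power(c, vdd, f))
```

Half of the supplied energy is lost while charging and the other half while discharging, which is the loss that adiabatic charging aims to reduce.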

The main idea of adiabatic switching, shown in Fig. 2, is that transitions are made sufficiently slow so that heat is not emitted significantly. This is made possible by replacing the DC power supply by a resonant LC driver or oscillator. If a constant current source delivers the charge Q = CVdd during the time period T, the energy dissipation in the channel resistance R is given by

$$ E_{diss} = \xi P T = \xi I^2 R T = \xi \left( \frac{C V_{dd}}{T} \right)^2 R T, \qquad (4) $$

where ξ is a shape factor which depends on the shape of the clock edges [11]. It takes its minimum value ξmin = 1 if the charging current of the load capacitor is constant (DC). For a sinusoidal current, ξ = π²/8 = 1.23. The above equation indicates that when the charging period T is indefinitely long, the energy dissipation is, in theory, reduced to zero. This is called adiabatic switching [1].

Fig. 1. [Conventional switching: supply Vdd, resistance R and load C; waveform of V rising to Vdd and energy dissipation over the period T.]

Fig. 2. [Adiabatic switching: supply Vp, resistance R and load C; waveform of V following Vp and energy dissipation over the period T.]

III. SWITCHED CAPACITOR REGENERATOR
A. Concept
The switched capacitor regenerator (SCR) was first proposed by L. Svensson and J. G. Koller [1]. This regenerator uses a source voltage and N - 1 capacitors, so that an N-step waveform is created and the charging energy is reduced to 1/N. Figure 3 shows the switched capacitor regenerator used in the analysis (N = 4). The regenerator consists of a voltage source, five pass transistors that have input signals from Clk0 to Clk4, three tank capacitors C1, C2 and C3, and a load capacitor CL. Figure 4 depicts its operation. The transistors are turned on in the order Clk0, Clk1, Clk2, Clk3, Clk4, Clk3, Clk2, Clk1, Clk0. This is done repeatedly, and the output voltage Vout becomes a step waveform. With i running from 1 to N, the load capacitor is switched from one voltage source to the next. It is clearly seen from the V-Q diagram shown in Fig. 5 that the energy dissipated per cycle is

$$ W = q V_{dd} = \frac{C V_{dd}^2}{N}. \qquad (5) $$

Since the voltage sources are free from dissipation, except for the N-th source, they can be represented by capacitors with high capacitances (such as C1 in Fig. 3). This circuit has a self-stabilizing property: the voltages across the tank capacitors are set to the required levels automatically. In Ref. [8], Nakata has proved that each step of the output voltage of the regenerator circuit with (N - 1) capacitors always settles to the voltage

$$ \frac{i}{N} V \quad (i = 0, 1, 2, \ldots, N) \qquad (6) $$

regardless of the initial condition, in the case of Cn ≫ CL, where Cn is the tank capacitor. However, the tank capacitor Cn cannot be increased without limit, because the capacitor size is constrained by the chip die area. In the next subsection, we explain an equivalent circuit model of the SCR, and then discuss the design optimization of the capacitance ratio.

Fig. 3. [Switched capacitor regenerator: supply Vdd, pass transistors driven by Clk0 to Clk4, tank capacitors C1, C2 and C3, and load capacitor CL at node Vout.]
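The self-stabilizing property and the CVdd²/N dissipation of (5) can be illustrated with an idealized charge-sharing simulation. The sketch below assumes ideal switches, equal tank capacitors and no parasitic capacitances, so it omits the Cox terms analyzed in the next subsection; the function names and capacitance values are hypothetical.

```python
def share(c_a, v_a, c_b, v_b):
    """Connect two capacitors; return the common voltage and the energy lost."""
    v_common = (c_a * v_a + c_b * v_b) / (c_a + c_b)
    loss = 0.5 * c_a * c_b / (c_a + c_b) * (v_a - v_b) ** 2
    return v_common, loss


def simulate_scr(c_load, c_tank, vdd=1.0, n_steps=4, cycles=2000):
    """Idealized N-step SCR with N-1 equal tank capacitors.

    Returns the rising-step voltages, the falling-step voltages (both ordered
    from the tank nearest ground) and the energy dissipated in the last cycle.
    """
    tanks = [0.0] * (n_steps - 1)      # tanks start fully discharged
    rising, falling, energy = [], [], 0.0
    for _ in range(cycles):
        rising, falling, energy = [], [], 0.0
        v_out = 0.0                    # Clk0: load tied to ground
        for k in range(n_steps - 1):   # Clk1 .. Clk(N-1), rising half
            v_out, loss = share(c_load, v_out, c_tank, tanks[k])
            tanks[k] = v_out
            energy += loss
            rising.append(v_out)
        energy += 0.5 * c_load * (vdd - v_out) ** 2   # ClkN: load tied to Vdd
        v_out = vdd
        for k in reversed(range(n_steps - 1)):        # falling half
            v_out, loss = share(c_load, v_out, c_tank, tanks[k])
            tanks[k] = v_out
            energy += loss
            falling.append(v_out)
        energy += 0.5 * c_load * v_out ** 2           # back to Clk0
        falling.reverse()
    return rising, falling, energy


if __name__ == "__main__":
    for ratio in (2.0, 10.0, 100.0):   # hypothetical Cn/CL ratios
        up, down, e_cycle = simulate_scr(1.0, ratio, cycles=5000)
        print(ratio, [round(v, 4) for v in up], [round(v, 4) for v in down],
              round(e_cycle, 4))
```

In this idealized model the rising and falling voltages at the same clock phase differ by the factor Cn/(Cn + CL), and both approach iVdd/N as the ratio Cn/CL grows, while the energy per cycle approaches CLVdd²/N.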

Fig. 4. [Timing diagram: clock signals Clk4 to Clk0 and the output voltage Vout stepping through v0 to v7 at times t0 to t7, up to Vdd.]

Fig. 5. V-Q diagram of SCR. [Plot of V up to Vdd, with W = CVdd²/N.]

Fig. 6. NMOS equivalent digital model. [Terminals G, D and S; input capacitance (3/2)Cox, on-resistance Ron and output capacitance Cox.]

B. Theoretical Analysis
The switching behavior of the NMOS transistor can be generalized by examining the parasitic capacitances and resistances, and so we consider the NMOS switch shown in Fig. 6 with the equivalent digital model [12]. Note that the effective input and output capacitances of the NMOS are Cin = (3/2)Cox and Cout = Cox, respectively. We can then draw the equivalent circuit of Fig. 7 by using the equivalent digital model of the NMOS.

Fig. 7. SCR equivalent circuit. [Each pass transistor Clk0 to Clk4 is replaced by (3/2)Cox, Ron and Cox; supply Vdd, tank capacitors C1, C2 and C3, and load CL at node Vout.]

At first, we consider the voltage on the capacitance C1. The electric charge can be determined from the equivalent circuit as follows:

$$ Q_{t0} = \cdots + C_{ox}(0 - V_{C3,0}) + C_{ox}(0 - V_{dd}), \qquad (7) $$

where V_{Cx,y} (y = 1, 2, ..., n) is the voltage of the node capacitance Cx (x = 1, 2, ..., n), and the total charge Qt1 is

$$ Q_{t1} = V_1 \left( C_L + \tfrac{5}{2} C_{ox} \right) + C_1 V_1 + \cdots + C_{ox}(V_1 - V_{C3,1}) + C_{ox}(V_1 - V_{dd}). \qquad (8) $$

From equations (7) and (8) and charge conservation,

$$ V_1 = \frac{\tfrac{3}{2} C_{ox} + C_1}{C_L + 7 C_{ox} + C_1} V_7. \qquad (9) $$

Cox is much smaller than CL (or Cn), and so the Cox term in the numerator of the above equation can be neglected as compared to the C1 term:

$$ V_1 = \frac{\tfrac{3}{2} C_{ox} V_7 + C_1 V_7}{C_L + 7 C_{ox} + C_1} \approx \frac{C_1}{C_L + 7 C_{ox} + C_1} V_7. \qquad (10) $$

From the above equation, we can see that the terminal voltage V1 is not equal to V7. It is of course possible to get the same voltage if C1 is much larger than CL; however, the tank capacitor C1 cannot be increased without limit, because the capacitor size is constrained by the chip die area. The other voltage conditions are as follows:

[Equations (11) to (16), which give V0 and V2 to V6: right-hand sides not legible in the source.]

$$ V_7 = \frac{V_6 (C_L + C_{ox}) + C_1 V_1}{C_L + 7 C_{ox} + C_1}. \qquad (17) $$

From equations (12) and (16), and (13) and (15), we can also see that the terminal voltages V2 and V6 (or V3 and V5) are not equal.

C. Comparison of Analysis and Simulation Results
In order to compare the analysis with simulation results, the SCR was simulated in a 1.2 µm CMOS n-well technology provided by ON Semiconductor. The transistor size W/L is 5.0 µm/1.2 µm for both the PMOS and the NMOS transistors. Cox is calculated from the SPICE parameters. The tank and load capacitances are implemented as poly-poly capacitors. Figure 8 compares the analysis with the simulation results. The analytical results agree rather well with the SPICE simulation results. From the viewpoint of CMOS implementation, however, we think that the condition of Fig. 8(a) is the optimized one, because of the poly-poly capacitance.

Fig. 8. [Plots of voltage (V) versus time (s, ×10^-5). Legible panel titles: (a) CL = 0.1 pF, Cn = 1 pF; (c) CL = 0.01 pF, Cn = 1 pF; (d) CL = 0.1 pF, Cn = 5 pF; (e) CL = 0.5 pF, Cn = 1 pF.]

IV. CONCLUSION
We have reported an analytical method for a power clock generator based on a switched capacitor circuit, which is used in adiabatic logic. We first derived an equivalent circuit model of the switched capacitor circuit. We then discussed the design optimization of the capacitance ratio. Finally, we showed that the analytical results agree rather well with the SPICE simulation results. From the viewpoint of 1.2 µm CMOS implementation, the capacitance ratio CL : Cn has been set at 0.1 : 1.0.

ACKNOWLEDGMENT
This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys, Inc. and Cadence Design Systems, Inc. This work was supported, in part, by a grant from Ogawa Foundation for Science and Technology of Pacific Industrial Co., Ltd. and by a grant from Research Foundation for the Electrotechnology of Chubu. Finally, the authors would like to thank Mr. Y. Fukuta and Mr. Y. Sakai for their help and support in this work.

REFERENCES
[1] L. J. Svensson and J. G. Koller, "Adiabatic charging without inductors," in Proc. IEEE Int. Workshop Low Power Design (IWLPD '94), Napa Valley, CA, April 22-27, 1994, pp. 159-164.
[2] S. G. Younis and T. G. Knight, "Asymptotically zero energy split-level charge recovery logic," in Proc. IWLPD '94, pp. 177-182.
[3] A. G. Dickinson and J. S. Denker, "Adiabatic dynamic logic," IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 311-315, April 1995.
[4] Y. Moon and D. K. Jeong, "An efficient charge recovery logic circuit," IEEE J. Solid-State Circuits, vol. 31, no. 4, pp. 514-522, April 1996.
[5] S. Kim and M. C. Papaefthymiou, "True single-phase energy-recovering logic for low-power, high-speed VLSI," in Proc. IEEE Int. Symp. Low-Power Electronics and Design, Monterey, CA, Aug. 10-12, 1998, pp. 167-172.
[6] D. Maksimović, V. G. Oklobdžija, B. Nikolić and K. W. Current, "Clocked CMOS adiabatic logic with integrated single-phase power-clock supply," IEEE Trans. VLSI Syst., vol. 8, no. 4, pp. 460-463, Aug. 1998.
[7] Y. Ye and K. Roy, "QSERL: Quasi-static energy recovery logic," IEEE J. Solid-State Circuits, vol. 36, no. 2, pp. 239-248, Feb. 2001.
[8] S. Nakata, "Adiabatic charging reversible logic using a switched capacitor regenerator," IEICE Trans. Electron., vol. E87-C, no. 11, pp. 1837-1846, Nov. 2004.
[9] Y. Takahashi, T. Sekine and M. Yokoyama, "VLSI implementation of a 4×4-bit multiplier in a two phase drive adiabatic dynamic CMOS logic," IEICE Trans. Electron., vol. E90-C, no. 10, pp. 2002-2006, Oct. 2007.
[10] Y. Takahashi, T. Sekine and M. Yokoyama, "Two-phase clocked CMOS adiabatic logic," in Proc. IEEE Asia-Pacific Conf. Circuits and Systems, Macao, China, Nov. 30-Dec. 3, 2008 (to appear).
[11] M. Alioto and G. Palumbo, "Power estimation in adiabatic circuits: A simple and accurate model," IEEE Trans. VLSI Syst., vol. 9, no. 5, pp. 608-615, Oct. 2001.
[12] R. J. Baker, H. W. Li and D. E. Boyce, CMOS Circuit Design, Layout, and Simulation, IEEE Press, NY, 1998, pp. 201-228.

Temperature and Interface Traps Compensation in MOS Bias Controlled Cycled Dosimeters

J. Lipovetzky, M. García Inza, S. Carbonetto, E. Redin, A. Faigon

Laboratorio de Física de Dispositivos-Microelectrónica, Facultad de Ingeniería, Universidad de Buenos Aires, Paseo Colón 850, Buenos Aires, Argentina
afaigon@fi.uba.ar, jlipove@fi.uba.ar

Abstract: MOS dosimetry employing the Bias Controlled Cycled Measurement technique is investigated regarding its ability to compensate threshold voltage drifts that are superimposed on the signal and induce measurement errors. Two sources of drift were addressed: interface state creation and temperature variations. The first case was measured and modeled; the second was numerically simulated using the same model. The results show that the good compensation observed for interface state creation would also occur for temperature-induced drifts, reducing the measurement error by at least one order of magnitude compared to non-compensated standard MOS dosimeters.

I. INTRODUCTION
Metal Oxide Semiconductor (MOS) dosimeters are p-channel MOS transistors used to measure absorbed doses of ionizing radiation through the shift of the threshold voltage [1]. Their use allows automatic, real-time measurement of absorbed doses using small and low-cost sensors. MOS dosimeters are used in medical applications [1]-[6], personal dosimetry [2], sterilization plants [7], and space environments [1]. In many of these applications, the sensors are exposed to temperature variations, whose effects are superimposed on those of the radiation and lead to erroneous dose measurements. Recently, a new biasing technique was proposed to extend the measurement range of MOS dosimeters [7] while maintaining a high sensitivity during the whole measurement. It was termed Bias Controlled Cycled Measurement (BCCM). This paper presents a complete study of how interface traps influence the response of dosimeters biased with the new technique and, based on these results, a model and numerical simulation of how unwanted temperature effects can be compensated, reducing the dose measurement uncertainty.

The following section gives an introduction to radiation effects on MOS devices and describes the cycled bias technique. Section III discusses the effect of interface trap creation on the repeatability of the sensor responses, and shows that the distortion it introduces is almost cancelled by the end of each cycle. The experimental results are modeled and compared to simulation results. Section IV shows the compensation of temperature effects by an analogous mechanism, and presents simulations showing that the cycled biasing technique reduces the measurement error caused by temperature variations. Finally, Section V summarizes the results and proposes future work.

II. PHYSICAL MECHANISMS AND BIAS CONTROLLED CYCLED MEASUREMENTS
The irradiation of MOS transistors with ionizing radiation causes, among other effects, a shift of the threshold voltage (VT) of the devices [9]-[12]. This shift originates in an increase in the interface trap density (NIT) and in the buildup and neutralization of electrical charge in the insulating gate oxide. The increase in NIT and the variations in oxide trapped charge cause a shift in VT in p-channel transistors given by [9]:

$$ V_T = -\frac{q N_{IT}}{C_{OX}} + V_{OX} + V_{T0} \qquad (1) $$

where

$$ V_{OX} = -\frac{1}{C_{OX}} \int_0^{t_{ox}} \frac{x}{t_{ox}} \, \rho_{ox}(x) \, dx, \qquad (2) $$

and q is the electron charge, NIT the number of interface traps per unit area, COX the oxide capacitance per unit area, x the distance from the gate towards the semiconductor, ρox(x) the oxide charge density at x, and tox the gate oxide thickness. The term VOX is the contribution of the oxide-trapped charge and VT0 is the pre-irradiation threshold voltage.

The physical mechanisms leading to interface trap creation and oxide charge buildup and neutralization are summarized in Fig. 1. The incidence of ionizing radiation generates electron-hole pairs in the gate oxide of the devices (A), (E). A field-dependent fraction of the carriers escapes the initial recombination; under positive gate bias, the electrons accelerate towards the gate and the holes begin a slower migration towards the Si-SiO2 interface (B). During their migration, the holes can be captured in deep oxide traps, mostly located near the semiconductor interface (C), and positive oxide charge builds up (PCB). Probably as a result of the liberation of H+, the number of interface traps increases (D) [9]. If the gate bias is switched to a negative value during the irradiation, the direction in which the electrons accelerate is inverted, as shown in Fig. 1 (F). A fraction of the accelerated electrons can recombine with trapped holes (G), decreasing the net oxide trapped charge. The decrease in the oxide charge caused by irradiation is called Radiation Induced Charge Neutralization (RICN) [13]-[15]. The physical processes involved in the trapping and neutralization of electrical charge depend strongly on the electric field in the oxide [10]. Thus, the rate of trapping or neutralization depends on the gate bias applied during the irradiation and on the initial condition of trapped charge in the oxide, as was shown in [16].

[Figure 1: band diagrams of the gate/SiO2/Si structure under positive and negative gate bias, with the processes labeled B, C, E, F, G and H+.]

Figure 1. Physical effects involved in oxide charge buildup and neutralization, and interface trap creation under positive and negative gate bias. A) generation, B) escape, C) hole capture, D) interface trap creation, E) and F) net detrapping, G) charge neutralization.

In the standard use of MOS dosimeters, the negative shift in VT caused by positive or zero gate bias irradiation is used to quantify the absorbed dose. A recent work [7] proposed the BCCM technique to extend the measurement range of the sensors, taking advantage of the possibility of neutralizing the oxide trapped charge by means of RICN. The idea is to alternate stages in which the dosimeter is irradiated under a positive gate bias and VT decreases, with stages of negative bias irradiation in which VT rebounds. During positive gate bias, positive charge builds up in the oxide (PCB stage), and during negative bias irradiation the charge is neutralized by the irradiation (RICN stage). During both stages, the shift in VT is used to quantify the absorbed dose. The threshold voltage, which is periodically read, is kept within a convenient window to maintain a uniform sensitivity during the whole measurement. Figure 2 shows the application of the BCCM technique, extending the measurement range of a sensor more than one hundred times.

III. EFFECTS OF INTERFACE TRAPS CREATION
The response of fresh dosimeters irradiated under the BCCM technique was studied to analyze the influence of interface trap creation on the response of the sensors.
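The bias-switching rule of the BCCM technique described in Section II amounts to a comparator with hysteresis acting on the periodically measured VT. The sketch below is a minimal illustration; the window limits follow the example values quoted for Fig. 2 (-6 V and -5 V), while the class name, the method names and the sample readings are assumptions made for illustration.

```python
class BccmController:
    """Switches the gate bias whenever VT leaves the preset window."""

    def __init__(self, vt_min=-6.0, vt_max=-5.0):
        self.vt_min = vt_min
        self.vt_max = vt_max
        self.stage = "PCB"           # start under positive gate bias
        self.completed_stages = 0

    def update(self, vt_measured):
        """Process one periodic VT reading and return the stage to apply next."""
        if self.stage == "PCB" and vt_measured <= self.vt_min:
            self.stage = "RICN"      # switch to negative gate bias
            self.completed_stages += 1
        elif self.stage == "RICN" and vt_measured >= self.vt_max:
            self.stage = "PCB"       # switch back to positive gate bias
            self.completed_stages += 1
        return self.stage


if __name__ == "__main__":
    controller = BccmController()
    readings = [-5.2, -5.6, -6.0, -5.7, -5.3, -5.0, -5.4]   # hypothetical VT values (V)
    print([controller.update(v) for v in readings])
```

Because the shift in VT is used in both stages, the dose estimate accumulates the absolute VT excursion of every stage rather than the net shift.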
The biasing technique deals with the trapping and neutralization of positive charge in the oxide, which can be controlled through the change of the gate bias during irradiation. Interface trap creation, on the other hand, cannot be reverted at room temperature, but was shown to be a process that saturates after absorbing 50 to 100 kGy.

Figure 2. The BCCM technique alternates stages of PCB under positive gate bias with stages of RICN under negative gate bias. When the measured value of VT crosses a minimum preset value (in this case VTmin = -6 V), the gate bias is switched to a negative voltage and a RICN stage begins. During this stage VT increases, and the now positive shift in VT is used to quantify the absorbed dose. When VT crosses the maximum preset value, in this case VTmax = -5 V, the gate bias is switched back to a positive voltage, and a new PCB stage begins. (After [7].)

[Figure 3: schematics with labels VT, VG and ION.]

Figure 3. a) Measure configuration of the dosimeter to read VT, using ION = 40 µA; b) bias configuration.

A. Irradiation results and IT creation
Several fresh MOS dosimeters with a 70 nm thick gate oxide, intended for use in a sterilization irradiation plant, were irradiated using 60Co gamma sources at a dose rate of 6.3 Gy/s (all doses are referred to SiO2). During most of the time, the sensors were irradiated in the bias configuration of Fig. 3, switching every five seconds to the measure configuration to read VT. During the experiment, the irradiation field was removed seven times to measure current-voltage (I-V) curves in order to estimate NIT through the sub-VT slope technique [17]-[18]. Fig. 4 shows the evolution of VT for one of the devices. The responses of the sensor do not repeat along the first tens of kGy of irradiation. Fig. 5 plots the dose required to complete a 1 V shift in VT during PCB and RICN stages as a function of the total dose. As the dosimeter is irradiated, the PCB stages shorten and the RICN stages lengthen, approaching a final steady length that is observed in dosimeters irradiated with higher doses. The change in the length of the stages is correlated to the creation of new interface traps, as can be observed in Fig. 6, where the length of the stages is plotted against the increase in NIT estimated from the I-V curves. An important observation is that, although the lengths of the PCB and RICN stages both change, their sum, and thus the length of a complete cycle, is almost constant after the first kilograys. This property of the BCCM technique can be used to compensate measurement errors caused by this and other sources of threshold voltage drift, such as temperature variation. This will be addressed in Section IV.

[Figure 4: plot of threshold voltage (V) versus dose (kGy). Annotation: fresh sample, 54 kGy, biased according to the method; the irradiation was interrupted six times to measure I-V curves.]

Figure 4. Threshold voltage evolution in a fresh sensor using the Bias Controlled Cycled Measurement.

Figure 6. Dose length of the stages vs. the interface traps density (normalized to the final value at the end of the measurement).

B. Model and simulation of responses during IT creation
Equations (1) and (2) relate the radiation-induced shift in VT to IT creation and oxide charge trapping. Eq. (1) can be generalized to include a third parasitic effect, such as temperature variations or slow border traps affecting VT, by including the term Vdrift:

$$ V_T = V_{OX} - \frac{q N_{IT}}{C_{OX}} + V_{drift} + V_{T0} \qquad (3) $$

During the first tens of kGy of irradiation, IT are created and the absolute value of the second term on the right-hand side of (3) increases. Therefore, while this term changes, it

also changes the VOX needed to reach a given VT in the window. This variation of VOX as NIT increases is responsible for the change in the slopes of the responses of the sensor observed in Figs. 5 and 6, as is explained in the following paragraphs. In this analysis the term Vdrift is assumed to be zero.

The rate of trapping of the oxide charge depends on the amount of trapped charge and on the applied field. Therefore, the rate of change of VOX with the dose is a function of VOX and the gate bias. This dependence can be partially estimated from Fig. 7, where the rate of change of VT with the dose is plotted vs. VT in a highly irradiated dosimeter in which NIT is saturated [7]. Without changes in NIT or other drifts, the rate of change of VT is equal to the rate of change of VOX. The curves of Fig. 7 can be approximated in the range of voltages of interest by

$$ \frac{\partial V_{OX}}{\partial D} = S_0^{OX} + S_1^{OX} \, V_{OX}, \qquad (4) $$

where S0 and S1 are calibration parameters which depend on the applied bias. From (3) and (4) it can be seen that as NIT increases, the rate of change of VOX at a fixed VT value should change. This effect can be understood as a shift towards the left of the curves of Fig. 7 as NIT increases. While the window is kept in the constant range of -5 V to -6 V, the slope of the PCB stages should increase whereas the slope of the RICN stages should decrease, explaining the results of Figs. 5 and 6.

To validate this explanation, a numerical simulation tool was developed. The tool simulates the response of a MOS BCCM dosimeter, calculating the threshold voltage shift using (3) and (4). The experimental sensitivities as a function of VT from a highly irradiated device were used as parameters in (4). Figures 8 and 9 plot the result of simulating the irradiation of Fig. 4. The increase in NIT with dose was extracted from the sub-VT slopes of the I-V curves and introduced in (1) to overlap both effects. The solid points in Figs. 8 and 9


correspond to the experimental VT evolution during the first PCB and RICN stages and after 20, 40 and 57 kGy, whereas the lines show the simulated responses for the same doses. The simulation accurately reproduced the measurements.

IV. TEMPERATURE EFFECTS AND COMPENSATION
A change in the temperature of a MOS dosimeter during irradiation leads to measurement errors. Temperature variations cause parasitic VT shifts which can be mistaken for radiation-induced shifts. Many attempts have been made to compensate for temperature effects in MOS dosimeters. Commercial RADFET MOS dosimeters have been integrated with a diode that can be used to measure the temperature and make complex numerical corrections of the measured dose [1]. In other implementations, VT is read at a point of the I-V curve which is approximately insensitive to temperature [19], [20]. However, this point does not exist in all MOS transistors, and thus cannot always be used. Another technique for compensating temperature effects is to integrate in the same sensor two transistors that are biased differently during irradiation. The difference between the threshold voltages of the devices is a measure of the absorbed dose which is less affected by temperature variations [21], [22]. In this section, we show by means of numerical simulations that the BCCM technique can compensate, in both cases, the measurement error introduced by slow temperature variations during irradiation.

Temperature variations of VT are caused by the change in the potential required to cause strong inversion in the semiconductor [23]. Interface trap creation causes an increase in the temperature-induced shift as a result of the decrease of the sub-VT swing [9]. For example, for the devices used in this work, the VT shift in a non-irradiated transistor was +5.0 mV/°C, whereas in a highly irradiated transistor with NIT saturated the shift was +7.5 mV/°C.
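The model of (3) and (4) can be explored with a short calculation. The sketch below integrates (4) in closed form over one stage and treats the thermal shift as a constant Vdrift during that stage, with the NIT and VT0 terms of (3) absorbed into the reference condition. The calibration parameters S0 and S1 are hypothetical placeholders, chosen only so that VT moves in the right direction inside the -5 V to -6 V window; the +7.5 mV/°C coefficient is the value quoted above for a highly irradiated device.

```python
import math

# Hypothetical calibration of Eq. (4), dVox/dD = S0 + S1*Vox
# (D in kGy, voltages in V). These are NOT the measured sensitivities.
PARAMS = {
    "PCB":  {"s0": -20.0, "s1": -2.0},   # Vox decreases inside the window
    "RICN": {"s0": -2.0,  "s1": -2.0},   # Vox increases inside the window
}
VT_MAX, VT_MIN = -5.0, -6.0              # VT window quoted in the text


def stage_dose(stage, v_drift):
    """Dose (kGy) needed to cross the VT window during one stage.

    VT = Vox + v_drift, so the drift moves the Vox interval that has to be
    covered while the VT window itself stays fixed.
    """
    p = PARAMS[stage]
    vt_start, vt_end = (VT_MAX, VT_MIN) if stage == "PCB" else (VT_MIN, VT_MAX)
    rate_start = p["s0"] + p["s1"] * (vt_start - v_drift)
    rate_end = p["s0"] + p["s1"] * (vt_end - v_drift)
    # Closed-form integral of dVox / (S0 + S1*Vox) between the two limits.
    return math.log(rate_end / rate_start) / p["s1"]


def cycle_doses(temperature_c, t_ref_c=25.0, coeff_v_per_c=7.5e-3):
    """PCB, RICN and complete-cycle doses at a constant temperature."""
    v_drift = coeff_v_per_c * (temperature_c - t_ref_c)
    pcb = stage_dose("PCB", v_drift)
    ricn = stage_dose("RICN", v_drift)
    return pcb, ricn, pcb + ricn


if __name__ == "__main__":
    for temp in (20.0, 25.0, 50.0, 80.0):
        pcb, ricn, total = cycle_doses(temp)
        print(f"{temp:5.1f} C  PCB {pcb:.4f}  RICN {ricn:.4f}  cycle {total:.4f} kGy")
```

With these placeholder parameters, the PCB stage lengthens and the RICN stage shortens as the temperature rises, while their sum changes much less, which is the compensation mechanism discussed in this section.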
The thermal shift in VT can be included in (3) through the term Vdrift, and it introduces an effect analogous to the one observed during NIT creation. The dependence of VT on temperature was measured and introduced into the simulation tool used in the previous section, which had been shown to reproduce the response of the dosimeters to irradiation. The response of the dosimeters under irradiation was simulated at different constant temperatures and along temperature ramps to evaluate the measurement error introduced by thermal shifts.

A. Simulation of constant-temperature irradiations

Figure 10 shows the length of the simulated PCB and RICN stages as a function of temperature over a wide range, from -75 °C to 150 °C. The dose required to complete an RICN stage changes by 40% along the temperature range, decreasing with temperature. On the other hand, the dose required to complete a PCB stage has a similar variation, but increases with temperature. This change in the

Figure 7. Rate of change of VT, i.e. sensitivity, as a function of VT during +2.90 V and -1.73 V bias irradiation.

Figure 8. Simulation and measurement of PCB stages. Points correspond to measurements, lines to the numerical simulation.

Figure 9. Simulation and measurement of RICN stages (VT in V versus dose in kGy) for the first stage and after total doses of 20, 40 and 57 kGy. Points correspond to measurements, lines to the numerical simulation.

Figure 10. Simulation of the variation with temperature of the dose length of the PCB stages, the RICN stages and a complete cycle (dose in kGy versus temperature in °C). Although the dose length of each stage can vary by as much as 40% in the studied range, the length of a complete cycle does not vary by more than 3%.

sensitivity would introduce a temperature-dependent measurement error during each separate stage, or with the standard use of MOS dosimeters. However, with the cycled measurement technique the error is compensated after completing a PCB/RICN cycle, as shown in the upper curve of the same figure. The length of a cycle has a maximum dispersion of only 3% along the whole temperature range.

B. Simulation of temperature variations during irradiation

Temperature variations during irradiation introduce an unwanted VT shift which can be mistaken for a radiation-induced shift, leading to an extra measurement error. To quantify this error, and to compare the BCCM technique with the standard use of a MOS sensor, the response of a dosimeter under both biasing techniques was simulated. Several irradiations up to 5 kGy were simulated, varying the temperature in a small range from 25 °C to 30 °C, in a wider range from 20 °C to 80 °C, and keeping the temperature constant at 25 °C as a reference. The simulated VT evolution is plotted in Fig. 11.a. Figure 11.b shows the relative dose measurement error as a function of dose for the two temperature variations and the two biasing techniques. For the wide thermal variation from 20 °C to 80 °C, the measurement error for the standard biasing technique increased with dose, reaching about 20% after absorbing 5 kGy. For the BCCM technique the error was always smaller than 1%, being compensated after each complete cycle. For the small thermal variation from 25 °C to 30 °C, shown in the inset, the standard biasing method introduced a temperature-induced error of 2% after 5 kGy of irradiation, whereas the error for the BCCM technique was always smaller than 0.2%, ten times smaller for the same dose.

V. DISCUSSION AND SUMMARY

The effect on MOS dosimeters biased with the BCCM technique of interface traps creation and temperature

Figure 11. Temperature-induced measurement error in BCCM compared with standard MOS dosimetry. Panel (b) shows the simulated relative error for a continuous measurement along a temperature ramp between 20 °C and 80 °C (25 °C to 30 °C in the inset).

variations was studied. It was shown experimentally that, in a fresh dosimeter, the increase in NIT causes a variation in the length of the PCB and RICN stages. Since interface trap (IT) creation saturates after a high absorbed dose, the effect stops once the dosimeters are highly irradiated, as was shown in [7]. Therefore, an accurate calibration of the sensors is possible after IT saturation. In many applications MOS dosimeters must deal with temperature variations, which induce VT shifts that, in turn, introduce an error in the dose measurement. The effect of temperature variations on MOS dosimeters was studied with the aid of a simulation tool. The simulation tool, which accurately reproduced measurements during IT creation, was modified to include shifts in VT caused by temperature variations. The main result is that the temperature-induced variation in the sensitivity during PCB stages is compensated by the variation of the sensitivity during RICN stages, as happened during IT creation. Therefore, the


BCCM dosimetry should allow a significant reduction in the measurement error caused by using the sensors at different temperatures and by temperature variations during irradiation. In future work, the evolution of VT during irradiation at different temperatures will be measured to complete the results of this work.

REFERENCES

[1] A. Holmes-Siedle and L. Adams, "RADFETs: A Review of the Use of Metal-Oxide-Silicon Devices as Integrating Dosimeters," Radiation Physics and Chemistry, Vol. 28, No. 2, pp. 235-244.
[2] G. Tarr, K. Shortt, Y. Wang, I. Thomson, "A Sensitive, Temperature-Compensated, Zero-Bias Floating Gate MOSFET Dosimeter," IEEE Trans. Nucl. Sci., Vol. 51, No. 3, pp. 1277-1282, 2004.
[3] M.C. Lavallée, L. Gingras, Luc B., "Energy and integrated dose dependence of MOSFET dosimeter sensitivity for irradiation energies between 30 kV and 60Co," Med. Phys. 33, pp. 3683-3689, 2006.
[4] R. A. Price, C. Benson, K. Rodgers, "Development of a RadFET linear array for intracavitary in vivo dosimetry in external beam radiotherapy and brachytherapy," IEEE Trans. Nucl. Sci., Vol. 51, No. 4, pp. 1420-1426, 2004.
[5] R. Ramaseshan, K. S. Kohli, T. J. Zhang, T. Lam, B. Norlinger, A. Hallil, and M. Islam, "Performance characteristics of a microMOSFET as an in vivo dosimeter in radiation therapy," Phys. Med. Biol. 49, pp. 4031-4048, 2004.
[6] L.J. Asensio, M.A. Carvajal, J.A. Lopez-Villanueva, M. Vilches, A.M. Lallena, A.J. Palma, "Evaluation of a low-cost commercial mosfet as radiation dosimeter," Sensors and Actuators A, No. 125, pp. 288-295, 2006.
[7] J. Lipovetzky, E. Redin, M. Maestri, M. García Inza, and A. Faigón, "Extension of the Measurement Range of MOS Dosimeters Using Radiation Induced Charge Neutralization," IEEE Trans. Nucl. Sci., in press, and Proceedings of the 9th European Conference on Radiation and Its Effects, Deauville, France, September 2007.
[8] J. Lipovetzky, E. G. Redin, A. Faigón, "Electrically Erasable Metal Oxide Semiconductor Dosimeters," IEEE Trans. Nucl. Sci., Vol. 54, Iss. 4, pp. 1244-1250, 2007.
[9] T. R. Oldham, Ionizing Radiation Effects in MOS Oxides, Advances in Solid State Electronics and Technology Series, Singapore: World Scientific, 1999.
[10] T. R. Oldham, F. B. McLean, "Total Ionizing Effects in MOS oxides and Devices," IEEE Trans. Nucl. Sci., Vol. 50, No. 3, pp. 483-499, 2003.
[11] H. L. Hughes, J. M. Benedetto, "Radiation Effects and hardening of MOS Technology: Devices and Circuits," IEEE Trans. Nucl. Sci., Vol. 50, No. 3, pp. 500-520, 2003.
[12] Der-Sun Lee, Chung-Yu Chan, "Oxide charge accumulation in metal oxide semiconductor devices during irradiation," J. Appl. Phys. 69 (10), pp. 7134-7141, May 1991.
[13] D. M. Fleetwood, P. S. Winokur, L. C. Riewe, "Predicting switched-bias response from steady-state irradiations MOS transistors," IEEE Trans. Nucl. Sci., Vol. 37, Iss. 6, pp. 1806-1817, 1990.
[14] D. M. Fleetwood, "Radiation-induced charge neutralization and interface-trap buildup in metal-oxide-semiconductor devices," J. Appl. Phys., Vol. 67, Iss. 1, pp. 580-583, 1990.
[15] C. Benson, A. Albadri, M. J. Joyce, "The empirical dependence of radiation-induced charge neutralization on negative bias in dosimeters based on the metal-oxide-semiconductor field-effect transistor," J. Appl. Phys. 100, 044505, 2006.
[16] A. Faigón, J. Lipovetzky, E. Redin, M. García Inza, M. Maestri, A. Cedola, "Experimental evidence and modeling of non-monotonic responses in MOS dosimeters," submitted to Radiation Physics and Chemistry, 2008.
[17] D. M. Fleetwood, M. R. Shaneyfelt, and J. Schwank, "Estimating Oxide-Trap, Interface-Trap, and Border-Trap Charge Densities in MOS Transistors," Appl. Phys. Lett. 64, 1965 (1994).
[18] P. Antognetti, D. Caviglia, E. Profumo, "CAD Model for Threshold and Subthreshold Conduction in MOSFETs," J. Solid-State Circuits, Vol. SC-17, No. 3, pp. 454-458, 1982.
[19] A. Haran, A. Jaksic, N. Refaeli, A. Eliyahu, D. David, J. Barak, "Temperature effects and long term fading of implanted and unimplanted gate oxide RADFETs," IEEE Trans. Nucl. Sci., Vol. 51, No. 5, pp. 2917-2921, 2004.
[20] Per H. Halvorsen, "Dosimetric evaluation of a new design MOSFET in vivo dosimeter," Med. Phys., Vol. 32, Iss. 1, pp. 110-117, 2005.
[21] M. Soubra and J. Cygler, "Evaluation of a dual bias dual metal oxide-silicon semiconductor field effect transistor detector as radiation dosimeter," Med. Phys., Vol. 21, Iss. 4, pp. 567-572, 1994.
[22] N. Garry Tarr, Ken Shortt, Yanbin Wang, and Ian Thomson, "A Sensitive, Temperature-Compensated, Zero-Bias Floating Gate MOSFET Dosimeter," IEEE Trans. Nucl. Sci., Vol. 51, No. 3, pp. 1277-1282, 2004.
[23] Sze, Physics of Semiconductor Devices, 3rd edition, Wiley, ISBN: 978-0-471-14323-9.

Nikolaos Bartzoudis

Centre Tecnològic de Telecomunicacions de Catalunya, Av. Canal Olímpic S/N, 08860, Castelldefels, Barcelona, Spain, nikolaos.bartzoudis@cttc.es

Abstract: Test scheduling is a key aspect in the automation of embedded microprocessor self-testing. This paper presents a self-testing framework with built-in test-scheduling features, targeting the LEON3 embedded microprocessor. The proposed design exploits existing post-production test sets, designed for software-based testing of embedded microprocessors. The framework also includes a constraint-based approach to test-routine scheduling. The initial results show that the test execution time can be dynamically scaled by the test selection algorithm. The scheduler itself adds insignificant overheads in terms of execution cost and code size.

Department of Computing and Electronic Systems, University of Essex, Colchester, CO4 3SQ, UK, Vasileios.Tantsios@eu.sony.com, kdm@essex.ac.uk

The research of the authors N. Bartzoudis, V. Tantsios and K.D. McDonald-Maier was supported in part by the EPSRC grant EP/C54630X/1; by the Catalan Government under grant 2005SGR-00690; by the Ministerio de Industria Turismo y Comercio of the Spanish Government under project 2A103 (MIMOWA) from MEDEA+ program (PROFIT FIT-330225-20072), and by the European Commission under projects NEWCOM++ (ICT 216715) and PHYDYAS (ICT 211887).

I. INTRODUCTION

Software-based self-test (SBST) [1] and constraint-aware scheduling of testing patterns have attracted the attention of various researchers in recent years. Efficient test scheduling minimizes the overall system test application time, prevents test resource conflicts and limits power dissipation during test mode. The author in [2] tries to minimize test application time within a System-on-Chip (SoC) while considering structural resource allocation. Zhao et al. present an algorithm for solving the general test scheduling problem where multiple test sets are selected [3]. Another inspiring concept in power-aware test scheduling is presented in [4]. Zhou et al. propose a structural SBST methodology optimized for energy, average power consumption, test length and fault coverage [5]. Nourani et al. focus on the use of the power profile of non-embedded cores within a SoC platform to find the best mix of their test pattern subsets that satisfy the power and/or time constraints [6]. A cost-effective approach to the construction of diagnostic software-based test sets for microprocessors is presented in [7]. The most closely related work to our implementation is presented in [8], where a set of test routines from different test approaches compose a test program for an embedded processor, optimized for memory usage and for the real-time requirements of the application. This paper presents a test selection algorithm which not only ensures the periodic execution of the test, but also optimizes the test process considering the real-time requirements of the application and the execution cost. The testing framework uses functional testing and pass/fail techniques. Moreover, the proposed task scheduling is generic and could be reused across different hardware platforms or testing strategies. This is due to its extensible structure, which may include additional scheduling parameters (e.g. power consumption) on top of the test execution time that is currently considered.

II. THE TESTING FRAMEWORK

The target embedded microprocessor, LEON3 [9], is a synthesizable VHDL model of a 32-bit processor. LEON3 is part of the GRLIB IP library, which is a set of reusable IP cores centered on a common on-chip bus. Part of the GRLIB IP library can be simulated with the TSIM simulator [9]. TSIM is able to emulate LEON-based computer systems and a number of GRLIB IP cores through loadable modules. The self-testing framework and the test scheduling algorithm were implemented and debugged using the GNU-based cross-compilation tool-chain for the LEON3 processor. The online testing of LEON3 was accomplished by testing its arithmetic operations and memory transactions. Two different test sets were used to achieve efficient functional testing of embedded microprocessors. The final testing framework included 30 different tests.

A. Testing of arithmetic operations

The selected testing suite, namely paranoia, applies arithmetic operations that test the FPU of embedded microprocessors for functional errors [10]. Paranoia comprises a series of tests on arithmetic calculations (e.g. radix, precision and range, consistency of comparison, presence of guard digits, underflow/overflow, square root and powers). An indicative representation of a test sequence is shown in Fig. 1. Initially, the radix and the precision of the system are estimated. Next, the closest relative separation between two floating point numbers is calculated. Basic operations with some special numbers follow; floating point numbers have limited precision, and thus a small difference, measured in units in the last place (ulp), is observed between the exact value of the real number and the one that is calculated. Finally, the difference between the calculated value and the correct one is


measured. If the difference is equal to Radix to the power of the closest relative separation (21.1102230e-16), then the conclusion is that no errors appear in these kinds of operations. This test sequence is repeated four times in order to evaluate the impact of the rounding errors in the precision of the system.
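The first steps of such a test sequence (radix, then closest relative separation) can be illustrated with a short host-side probe. This is a minimal sketch in Python, not the paranoia code itself; the function names are hypothetical, and the probing loops follow the classic Malcolm-style approach, which is an assumption about how such a test is typically written. On an IEEE 754 double-precision system it finds a radix of 2 and a separation just below 1.0 of 2^-53, about 1.1102230e-16, the value quoted in the text.

```python
import sys


def estimate_radix():
    """Malcolm-style probe for the base of the floating-point format."""
    big = 1.0
    while (big + 1.0) - big == 1.0:   # grow until adding 1.0 is no longer exact
        big *= 2.0
    step = 1.0
    while (big + step) - big == 0.0:  # smallest increment that becomes visible
        step += 1.0
    return int((big + step) - big)


def gap_below_one(radix):
    """Closest relative separation: spacing of the floats just below 1.0."""
    u = 1.0
    while 1.0 - u / radix != 1.0:
        u /= radix
    return u


radix = estimate_radix()
u1 = gap_below_one(radix)   # separation just below 1.0
u2 = radix * u1             # separation just above 1.0
```

Here `u2` can be compared against `sys.float_info.epsilon` as a cross-check of the probe on the host platform.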

Figure 1. Estimating the radix, the precision and the rounding errors.

B. Memory testing

The memory testing suite uses three separate sub-tests (i.e. data bus test, address bus test and memory space test). The shifting digit technique is applied in all three test scenarios. Fig. 2 shows the workflow diagram for the address bus test. Initially, the pattern value is written to each power-of-two offset. In the next step these values are read. At this point an additional variable is introduced (i.e. testoffset), which helps to define the memory address. The testing sequence continues with the following processing steps. The values of offset and testoffset are set to 1. The value at the testoffset address is changed. The anti-pattern is written. All the power-of-two offset values are read. The testoffset value shifts left; before that, the pattern value is written again to the changed address, so that all the power-of-two addresses store the pattern value. The process is re-initiated.

C. Overview of system design

The implemented online self-testing framework comprises three major structural components: the test pattern sets, the test selection algorithm and the testing evaluation. The execution of a test program may be initiated during system startup/shutdown. Alternatively, an OS scheduler may identify idle cycles and issue test program execution, or this could be achieved at regular time intervals with the aid of programmable timers. Finally, the test programs could also be triggered manually by a user input. The testing framework can be dynamically modified according to priorities related to system parameters. The test scheduling algorithm presented in this paper has the inherent ability to prioritize the test routines according to the available execution time. However, it may easily be extended to account for other system metrics.

The implemented online self-testing framework follows a structural strategy with pass/fail indications. The errors were classified in four categories according to their impact on the system (i.e. failures, serious defects, defects, flaws). The number of detected errors, together with the functional severity assessment, indicates the quality and health status of the system.

III. TEST SCHEDULING

The test selection algorithm (TSA) is a complex scheduling engine that tries to obtain the best execution order of test routines, given timing constraints (in terms of allowed time for test), while guaranteeing that conflicting tests cannot be executed concurrently. More constraints could be considered, since the TSA has a generic programmable structure. The time overhead of each test routine is determined by simulating it offline. A brief description of the algorithm, with details of the processing steps, is summarized next (Fig. 3). The full extent of the cross comparisons is omitted for simplicity (i.e. the condition checking that prevents the TSA from being trapped in an infinite loop). Three lists are initially created, consisting of doubly linked nodes with three parameters (i.e. the name of the test, the execution time of the test, and the number of times it has already been executed). List 1 stores the tests sorted by their execution time. List 2 stores the tests sorted by the number of times they have already been executed. List 3 is empty during initialization. It is used later to store the tests that were executed during the current execution cycle, preventing them from being executed again during the same execution cycle. Next, Mergesort is applied to List 2. The test that is placed in the first position of List 2 is selected. If the execution time of the selected test is less than the total available time, then a) the total available time is reduced by the selected test's execution time, b) the number of times that the selected test has been executed is increased by 1, c) the test is stored in List 3 so that it is not executed again in this execution cycle.
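The selection steps of this section can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name and data layout are hypothetical, plain Python lists replace the doubly linked lists, Python's stable sort stands in for Mergesort, and a test whose execution time exactly equals the remaining time is allowed to run (an assumption). In the sample data, the execution times of tests 3, 4, 6 and 7 (60, 60, 40 and 30 ms) are those quoted in the worked example of this section; the times of tests 1, 2, 5 and 8 are assumed values chosen only to keep List 1 ordered from Test 1 to Test 8.

```python
def run_cycle(exec_ms, times_run, list2, available_ms):
    """One TSA execution cycle; returns (tests run in order, unused time)."""
    # List 1: tests ordered by execution time, longest first.
    list1 = sorted(exec_ms, key=lambda t: -exec_ms[t])
    list3 = []  # tests already executed in this cycle
    shortest = min(exec_ms.values())
    while available_ms >= shortest:
        # First test of List 2 that has not run in this cycle.
        chosen = next((t for t in list2 if t not in list3), None)
        if chosen is None:
            break
        if exec_ms[chosen] > available_ms:
            # Too long: walk down List 1 from the position of the rejected test.
            below = list1[list1.index(chosen) + 1:]
            chosen = next((t for t in below
                           if t not in list3 and exec_ms[t] <= available_ms), None)
            if chosen is None:
                break
        available_ms -= exec_ms[chosen]
        times_run[chosen] += 1
        list3.append(chosen)
        # Move the test to the bottom, then re-sort List 2 by times executed.
        list2.remove(chosen)
        list2.append(chosen)
        list2.sort(key=lambda t: times_run[t])  # stable; stands in for Mergesort
    return list3, available_ms


# Times of tests 1, 2, 5 and 8 are assumed; the others come from the text.
exec_ms = {1: 100, 2: 80, 3: 60, 4: 60, 5: 50, 6: 40, 7: 30, 8: 20}
times_run = {6: 2, 3: 2, 4: 2, 5: 2, 1: 3, 8: 3, 2: 3, 7: 3}
list2 = [6, 3, 4, 5, 1, 8, 2, 7]
order, unused = run_cycle(exec_ms, times_run, list2, 200)
```

With these inputs the cycle runs tests 6, 3, 4 and 7, leaves 10 ms unused, and List 2 ends up ordered as 5, 1, 8, 2, 6, 3, 4, 7.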

A more detailed example of the functionality of the TSA is described next using the data shown in Table I. We assume for simplicity that the test set consists of 8 tests and that the available time is 200 ms. List 2 sorts the tests according to the number of times they have already been executed. Test 6 is initially selected. Its execution time is less than the available time; thus, it is executed. The total available time becomes 200-40=160 ms and the number of times test 6 has been executed becomes 3, moving test 6 to the bottom of List 2. Test 3 is selected next. Since its execution time is less than the available time, test 3 is executed. The total available time becomes 160-60=100 ms and the number of times test 3 has been executed becomes 3, moving test 3 to the bottom of List 2. Then, test 4 is selected. Since its execution time is less than the available time, test 4 is executed. The total available time becomes 100-60=40 ms and the number of times test 4 has been executed becomes 3, moving test 4 to the bottom of List 2. Then, test 5 is selected. However, since its execution time is longer than the available time, test 5 is not executed and the available time remains 40 ms.

The TSA finds test 5 within List 1. The test that is then selected is the one with the highest execution time below the position of test 5, which is test 6. However, test 6 has already run during this execution cycle; thus, test 7 is selected. The total available time becomes 40-30=10 ms and the number of times test 7 has been executed becomes 4, moving test 7 to the bottom of List 2. Finally, the remaining available time is less than the execution time of the shortest test; thus, the execution cycle stops.

TABLE I. AN EXAMPLE OF THE LISTS CREATED BY THE TSA.

LIST 1    LIST 2    Times executed    LIST 2 update    Times executed
Test 1    Test 6    2                 Test 5           2
Test 2    Test 3    2                 Test 1           3
Test 3    Test 4    2                 Test 8           3
Test 4    Test 5    2                 Test 2           3
Test 5    Test 1    3                 Test 6           3
Test 6    Test 8    3                 Test 3           3
Test 7    Test 2    3                 Test 4           3
Test 8    Test 7    3                 Test 7           4

IV.

The effectiveness of the self-testing framework was validated by manually injecting errors. Errors in the memory were triggered by changing (offline) certain program variable values, while the microprocessor register values were altered at run-time (i.e. using the GDB debugger). Fig. 4 is a TSIM screenshot depicting part of the simulation process. As can be seen, no failures, serious defects or defects were detected (i.e. the detected flaw is due to compiler limitations).

The TSA was also validated with the TSIM simulator. The test sets verified the arithmetic operations of the LEON microprocessor and its memory response, while the testing framework, including the TSA, was executed 39 times. The contents of List 2 are initially displayed in each execution cycle (Fig. 5). The left column shows the name of the test, the middle one shows the execution time of the test and the right column shows how many times each test was executed before the current execution cycle. The TSA then prints the test sequence numbers for the current execution cycle and finally reports the total available time that was not used. As shown in Fig. 5, the different test routines were used exactly the same number of times after 39 execution cycles of the testing framework, thereby ensuring high fault coverage.

An example that verifies that the TSA never runs the same test more than once during a given execution cycle is illustrated in Fig. 6. The total available time left during the 36th execution cycle is 40 ms. Test 8, with an execution time of 20 ms, and test 9, with an execution time of 30 ms, could have been executed. However, these two tests had already been executed in that particular execution cycle; therefore, the execution sequence ends at that point.

Figure 6. A test is not allowed to run twice in the same execution cycle.

The total execution time of all the tests is approximately 34 s. The execution time overhead added by the TSA can be considered minor, since it is 350 ms (i.e. 1% of the total execution time of the testing framework). The execution times of the test sets and the TSA could be reduced significantly when implemented in a hardware platform. The experimental results presented in this paper show that the TSA could optimize the overall system test application time, avoid test resource conflicts and decrease the execution cost, while ensuring that real-time system tasks meet their deadlines. Moreover, the TSA can be tuned either for lower execution cost or for higher error coverage; there is always a tradeoff between these two testing strategies. High error coverage could be achieved in a long-term testing scenario by dynamically modifying the execution time of the framework and its execution frequency. The TSA is fully programmable and, thanks to its extensible structure, it could also be parameterized to consider different sorting conditions (e.g. sorting the test execution according to test importance).

V. EXTENDING THE TSA FUNCTIONALITY

Additional scheduling parameters could be integrated into the TSA to target other operational scenarios or testing strategies. The tradeoff for using extra cross-correlation parameters would be additional memory, performance and execution-time overheads for the TSA. The implemented testing framework can run in conjunction with the main processing activity of the LEON3. This could be further optimized by applying minor modifications to the existing implementation. The TSA could also be extended to include scheduling of VHDL-based built-in self-test routines. Additional validation of the TSA at run time in a hardware platform should help to boost its performance and optimize its code size.

REFERENCES

[1] Janusz Sosnowski, "Software-based self-testing of microprocessors," Journal of Systems Architecture, Vol. 52, No. 5, pp. 257-271, May 2006.
[2] Škarvada Jaroslav, "Test Scheduling for SOC under Power Constraints," in Proc. of IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, Prague, 2006, pp. 91-93.
[3] D. Zhao and S. Upadhyaya, "Adaptive test scheduling in SoCs by dynamic partitioning," in IEEE Int. Symp. Defect and Fault Tolerance in VLSI Systems, Nov. 2002, pp. 334-342.
[4] Richard M. Chou, Kewal K. Saluja and Vishwani D. Agrawal, "Scheduling tests for VLSI systems under power constraints," IEEE Transactions on Very Large Scale Integration Systems, Vol. 5, No. 2, pp. 175-185, Jun. 1997.
[5] J. Zhou and H. Wunderlich, "Software-based self-test of processors under power constraints," in Proc. of Design, Automation and Test in Europe, pp. 430-435, Germany, March 06-10, 2006.
[6] Mehrdad Nourani and James Chin, "Power-time tradeoff in test scheduling for SoCs," in Proc. of the 21st International Conference on Computer Design, pp. 548-553, Washington DC, Oct. 2003.
[7] P. Bernardi, E. Sanchez, M. Schillaci, G. Squillero, M. Sonza Reorda, "An Effective Technique for Minimizing the Cost of Processor Software-Based Diagnosis in SoCs," in Proc. of the IEEE Conference on Design, Automation and Test in Europe, pp. 412-417, Germany, March 06-10, 2006.
[8] M. Moraes, É. Cota, L. Carro, F. Wagner and M. Lubaszewski, "A constraint-based solution for on-line testing of processors embedded in real-time applications," in Proc. of the 18th Annual Symposium on Integrated Circuits and System Design, pp. 68-73, Florianópolis, Brazil, Sep. 2005.
[9] LEON3 - TSIM2 Simulator, Oct. 2007 (www.gaisler.com).
[10] Les Hatton, "Embedded System Paranoia: a tool for testing embedded system arithmetic," Information and Software Technology, 47 (2005), pp. 555-563.

Marcos Funes*, Patricio Donato*, Matías Hadad*, Daniel Carrica*

* Laboratorio de Instrumentación y Control (LIC), Universidad Nacional de Mar del Plata, Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina. {mfunes, donatopg, mhadad, carrica}@fi.mdp.edu.ar

Abstract: Complementary sequences are currently being applied to multiple fields of engineering. Given their particular mathematical properties, they have been widely used in applications ranging from robotics to communications, and have even been implemented in programmable logic devices (e.g. FPGA). Most of the applications proposed in the literature need to employ efficient architectures to reduce the computational load. This paper describes improved generator and correlator architectures which reduce the hardware requirements when implemented in FPGAs.

This work was partially supported by CONICET Grant PIP 6245/04 (Argentina), Universidad Nacional de Mar del Plata Grant ING 191/07 (Argentina), and ANPCyT Grant PICT 11-473/04 (Argentina).

I. INTRODUCTION

Many digital signal processing techniques are commonly used to recover information contained in signals corrupted by noise and/or disturbances introduced by external sources (e.g. other users). Additionally, these signals can be affected by the characteristics of the physical medium (attenuation, distortion, etc.). Among the processing techniques most frequently employed is the correlation function, which is suitable for the detection of signals immersed in Additive White Gaussian Noise (AWGN). Its application encompasses fields such as communications, radar, and sonar systems. With a view to maximizing the advantages of the correlation function, it is convenient to encode the signals. Several mathematical algorithms can produce an easily identifiable characteristic response, even in the presence of noise. Among them, the following could be mentioned: pseudorandom sequences [1], Barker sequences [2], Walsh-Hadamard sequences [3] and complementary sequences [4] [5].

In particular, the absence of sidelobes in the autocorrelation (AC) has rendered complementary sequences suitable for applications in which the detection of a signal in conditions of negative Signal to Noise Ratio (SNR) is required [10]. Their suitability also derives from their orthogonality property in those cases in which large amounts of independent information need to be sent simultaneously in a particular physical channel [11]. Binary complementary sequences are widely used in fields of engineering such as communications [6], Non Destructive Tests (NDT) [7], Ground Penetration Radars (GPR) [8], robotics [9], etc. Generation and correlation of complementary sequences can be conducted by means of recursive methods that reduce computational load and hardware complexity compared with the straightforward implementation [12] [13]. These methods allow the replication of a simple structure that can be easily implemented in hardware such as Field Programmable Gate Arrays (FPGA). FPGAs combine the flexibility of a general-purpose programmable digital signal processor with the speed and density of a custom hardware implementation. Yet, notwithstanding the simplicity of the generator or correlator schemes, the requirements of logic resources increase with the sequence length, thereby reaching considerable levels. This work proposes improved architectures of the well-known efficient generators and correlators of complementary sequences with the aim of reducing hardware requirements. In order to verify the reduction level obtained, these improved architectures have been parameterized in VHDL and implemented in an FPGA platform. The basic concepts of complementary sequences and efficient architectures are developed in Section II. Section III introduces the implementation of the efficient structures, while Section IV presents the proposed ones. The results of the implementation in FPGA are listed in Section V and conclusions are discussed in Section VI.

II. COMPLEMENTARY PAIRS OF SEQUENCES

Complementary pairs of sequences (also known as Golay sequences) are defined as a pair of sequences, $\{a_n[k]; b_n[k]\}$, composed of two binary elements, -1 and +1, which can be generated with a length of $L = 2^n$ elements (n being a natural number excluding zero). The number of pairs of like elements with any given separation in one sequence is equal to the number of pairs of unlike elements with the same separation in the other sequence [4]. For example, for n=3, the following pair of complementary sequences is obtained:

$$a_3[k] = \{+1, -1, -1, -1, -1, +1, -1, -1\}, \qquad b_3[k] = \{+1, +1, -1, +1, -1, -1, -1, +1\}, \qquad k = 1, \ldots, 8 = L \qquad (1)$$

Budisin [12] presented a complementary sequences generation algorithm based on a recursive formulation:

$$\begin{aligned} a_0[k] &= \delta[k], & b_0[k] &= \delta[k] \\ a_i[k] &= a_{i-1}[k] + w_i\, b_{i-1}[k - D_i], & b_i[k] &= a_{i-1}[k] - w_i\, b_{i-1}[k - D_i] \end{aligned} \qquad (2)$$

where: $\delta[k]$ is a Kronecker delta; $i$ is the iteration number 1, 2, ..., n; $w_i$ is a coefficient +1 or -1; $D_i$ is a positive delay, $D_i = Z^{-2^{(i-1)}}$.

The set of values of $w_i$ is referred to as a generation seed, and denoted by means of a W vector:

$$W = [w_1 \;\; w_2 \;\; \cdots \;\; w_n] \qquad (3)$$

This result is independent of the synchronous or asynchronous correlation, and unattainable by any other binary sequence.

Taking into account the recursive formulation by Budisin (2), Popovic [13] introduced a recursive algorithm to obtain the correlation of pairs of complementary sequences:

$$R_a^{(0)}[k] = R[k], \quad R_b^{(0)}[k] = R[k]; \qquad C_{ra}[k] = R_a^{(n)}[k], \quad C_{rb}[k] = R_b^{(n)}[k] \qquad (6)$$

where: $R_x$ are partial results; $C_{rx}$ is the correlation between the input signal R[k] and the sequence x.

From (6), it is possible to obtain an efficient complementary pairs correlator. Both the correlator and the generator are referred to as efficient because they reduce the number of mathematical operations and simultaneously make the correlation of the input signal with both sequences of the pair. Fig. 2 illustrates the architecture of the efficient correlator. However, two of these efficient correlators are needed to realize the sum of correlations (5) as seen in Fig. 3. Indeed hardware requirements are lesser if compared with those of the straightforward correlation, yet the efficient correlator misuses resources to make two undesired correlations (Cab and Cba). Next sections analyze the implementation of these architectures in FPGA and the way in which architecture can be improved to reduce logic resources consumption.

d(j)

Based on (2), an efficient complementary pairs generator can be attained (Fig. 1). Each recursive algorithm iteration features a new basic module in the implementation. Choosing a different generation seed, W, enables the generation of different pairs of complementary sequences. Given a complementary pair of sequences of length L, their corresponding autocorrelations (AC) are:

C an an [i ] = C bnbn [i ] =

k =L k =1 k =L k =1

+

D1

+

D2

+

Dn

an [k]

bn [k]

a n [k ] a n [k + i]

(4)

w1

w2

wn

bn [k ] bn [k + i]

Basic Module

where i is the i-th iteration of the C XX correlation. The sum of both sequences AC is as follows:

0 C an an [i ] + Cbnbn [i ] = 2 L [i ]

i0 i=0

R[k]

D1

+

+

D2

+

+

Dn

+

+

CRa[k]

CRb[k]

(5)

w1

w2

wn

The sum of complementary pairs of sequences AC yields a Krnecker delta of magnitude 2L with null sidelobes, this being one of the complementary sequences main features.

Basic Module
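The recursion (2) and the property (5) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the delays D_i = 2^(i-1), and the function names `budisin_pair` and `autocorrelation` are hypothetical.

```python
import numpy as np

def budisin_pair(W):
    """Complementary pair from the seed W = [w1, ..., wn] using recursion (2)."""
    a = np.array([1])                      # a_0[k] = delta[k]
    b = np.array([1])                      # b_0[k] = delta[k]
    for i, w in enumerate(W, start=1):
        D = 2 ** (i - 1)                   # assumed delay D_i = 2^(i-1)
        pad = np.zeros(D, dtype=int)
        a_prev = np.concatenate([a, pad])  # a_{i-1}[k], extended to the new length
        b_del = np.concatenate([pad, b])   # b_{i-1}[k - D_i]
        a, b = a_prev + w * b_del, a_prev - w * b_del
    return a, b

def autocorrelation(x):
    """Aperiodic autocorrelation for shifts -(L-1) ... (L-1)."""
    return np.correlate(x, x, mode="full")

a3, b3 = budisin_pair([-1, -1, -1])
ac_sum = autocorrelation(a3) + autocorrelation(b3)
print(a3, b3)
print(ac_sum)  # all sidelobes cancel; the zero-shift term equals 2L
```

Each autocorrelation on its own has nonzero sidelobes; only their sum collapses to a single peak, which is why both sequences of the pair have to be correlated.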


Figure 3. Sum of correlations using efficient correlators. (One efficient correlator receives an[k] and the other bn[k]; the outputs Caa and Cbb are added to give Y[k] = Caa + Cbb, while Cab and Cba are unused correlations.)

Complementary sequences generators and correlators are basically made up of delays and adders/subtracters. These blocks can be implemented in FPGA as binary adders/subtracters (for additions and subtractions, respectively), and D flip-flops or shift registers (for delays).

A. Complementary sequences generator
The basic module of the Budisin generator implementation is shown in Fig. 4. The multiplication between the seed w_n and the output of the previous stage is realized by means of a + or - selection in the binary adder. The seed w_n is a binary value -1 or +1 digitally represented as 0 or 1. Given that the generator output can take three different values (-1 and +1 for the sequence elements, and 0 for no emission), two bits are needed for their representation. Each D_i delay unit is composed of a shift register of 2^i bits in length.

B. Complementary sequences correlator
The basic module of the Popovic correlator implementation is depicted in Fig. 5. The logical structure closely resembles that of the generator, the main difference lying in the word length of each element in the sequences, which increases with each correlator stage. The word length of the first stage is provided by an Analog to Digital Converter (e.g. an m = 8 bit word). In each stage the word length is increased by one bit due to a carry bit. At the output of the correlator, the word length is m+n bits, so the sum of the correlations Caa and Cbb is the sum of two (m+n)-bit numbers.

IV. IMPROVED IMPLEMENTATION

As explained in the preceding section, implementing the efficient generator and correlator implies the use of logical resources (e.g. flip-flops) whose number increases significantly with each new stage. Moreover, the efficient correlator proposed in [13] concurrently correlates the input signal with a_n[k] and b_n[k], but fails to simultaneously correlate two input signals. In other words, if the received signal is comprised of a complementary pair of sequences (this can be realized by means of special emission techniques [10]), it will be necessary to use two efficient correlators or one or more multiplexer stages to compute the sum of correlations described in (5). The next subsections propose a new type of implementation applicable to both architectures (generator and correlator), which substantially reduces the computational load and hardware complexity by way of a series of modifications.
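The need for two efficient correlators can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the hardware description: it uses the commonly cited form of the efficient Golay correlator of [13] (delay on the a branch, seed on the b branch), delays D_i = 2^(i-1), and hypothetical helper names.

```python
import numpy as np

def budisin_pair(W):
    """Pair generated by (2); with D_i = 2^(i-1) each step is a concatenation."""
    a = np.array([1])
    b = np.array([1])
    for w in W:
        a, b = np.concatenate([a, w * b]), np.concatenate([a, -w * b])
    return a, b

def delay(x, D):
    """Delay a sample stream by D samples (zero initial state), keeping its length."""
    return np.concatenate([np.zeros(D, dtype=x.dtype), x])[:len(x)]

def popovic_correlator(r, W):
    """Return (correlation of r with a_n, correlation of r with b_n)."""
    Ra = np.asarray(r).copy()
    Rb = Ra.copy()
    for i, w in enumerate(W, start=1):
        Ra_d = delay(Ra, 2 ** (i - 1))           # R_a^(i-1)[k - D_i]
        Ra, Rb = Ra_d + w * Rb, Ra_d - w * Rb
    return Ra, Rb

W = [1, -1, 1]
a, b = budisin_pair(W)
L = len(a)
tail = np.zeros(L - 1, dtype=int)                # room for the correlator latency
Caa, Cab = popovic_correlator(np.concatenate([a, tail]), W)   # Cab is unused
Cba, Cbb = popovic_correlator(np.concatenate([b, tail]), W)   # Cba is unused
Y = Caa + Cbb
print(Y)  # single peak of height 2L after a latency of L-1 samples
```

Each call computes two correlations but only one of them enters the sum, which is the redundancy the improved architecture sets out to remove.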

Figure 5. Basic module of the Popovic correlator with an 8-bit word length input.

A. Improved complementary sequences generator
The improvement in the logical architecture of the generator consists in selecting periodic parts of a sequence to compose a sequence of greater length. The flip-flops used in the classical implementation are replaced by multiplexers and a counter that works as a frequency divider. The recursive algorithm (2) presented in Section II can be represented as follows:

$$
a_n = [\, a_{n-1} \;\;\; w_n b_{n-1} \,], \qquad
b_n = [\, a_{n-1} \;\; -w_n b_{n-1} \,], \qquad
\text{with } a_0 = [1],\ b_0 = [1] \tag{7}
$$

So, a pair of complementary sequences of length L_n = 2^n is composed of two pairs of sequences of length L_{n-1} = 2^{n-1}, and these, in turn, are made up of two pairs of sequences of length L_{n-2} = 2^{n-2}, and so on. For instance, a pair of complementary sequences of n = 3 and generation seed W can be split into groups of sequences of lesser length by applying (7) twice:

$$
\begin{aligned}
a_3 &= [\, a_2 \;\;\; w_3 b_2 \,] = [\, a_1 \;\;\; w_2 b_1 \;\;\; w_3 a_1 \;\; -w_3 w_2 b_1 \,] \\
b_3 &= [\, a_2 \;\; -w_3 b_2 \,] = [\, a_1 \;\;\; w_2 b_1 \;\; -w_3 a_1 \;\;\; w_3 w_2 b_1 \,]
\end{aligned} \tag{8}
$$

Hence it is possible to generate pairs of complementary sequences by means of a selection and concatenation of sequences of lesser length. A single stage of this implementation can be seen in Fig. 6. A frequency divider is used to drive the selection inputs of the multiplexers (Fig. 7).

Figure 7. Frequency divider (outputs f/2, f/4, ..., f/n) used for the selection of the multiplexers.

B. Improved complementary sequences correlator
In this case, an architecture based on Popovic's efficient correlator is proposed. Said architecture simultaneously correlates two different input signals with the sequences a_n[k] and b_n[k]. The basic module of Popovic's correlator is improved by inverting the order of the adders/subtracters with respect to the delay and seed. A recursive formulation is suggested in order to introduce the following changes:

$$
\begin{aligned}
a_0[k] &= R_a[k], \qquad b_0[k] = R_b[k] \\
a_i[k] &= a_{i-1}[k - D_{n-i+1}] + b_{i-1}[k - D_{n-i+1}] \\
b_i[k] &= w_{n-i+1}\, a_{i-1}[k] - w_{n-i+1}\, b_{i-1}[k] \\
Y[k] &= a_n[k] + b_n[k] = 2L\,\delta[k]
\end{aligned} \tag{9}
$$

where a_i and b_i are the partial results.

From (9), the sum of correlations of a pair of complementary sequences of length L = 2^n can be realized with the architecture in Fig. 8. Said architecture works as the inverse of the efficient generator, producing a Kronecker delta at the output. Each basic module halves the length of the input sequences and doubles the amplitude of their elements. Iterating this process along the n stages of the correlator produces two Kronecker deltas of amplitude 2^n. The sum of both outputs, Y[k], results in a Kronecker delta of amplitude 2 · 2^n = 2L, as in (5). A significant reduction in hardware requirements can be ascribed to this improved correlator, as it requires just a single correlator for the sum. In contrast, the previous system, based on the Popovic correlator, calls for two correlators to complete the same process (see Fig. 3). A reduction of nearly 50% can then be expected in the FPGA implementation.
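A short simulation can illustrate how a single pass through the improved structure yields the sum of correlations. It is a sketch only: it assumes D_j = 2^(j-1), pairs the seed of each stage with the delay of that stage (w_{n-i+1} with D_{n-i+1}), builds the input pair by the concatenation rule (7), and uses hypothetical function names.

```python
import numpy as np

def concat_pair(W):
    """Complementary pair built by concatenation, following (7)."""
    a = np.array([1])
    b = np.array([1])
    for w in W:
        a, b = np.concatenate([a, w * b]), np.concatenate([a, -w * b])
    return a, b

def delay(x, D):
    """Delay a sample stream by D samples (zero initial state), keeping its length."""
    return np.concatenate([np.zeros(D, dtype=x.dtype), x])[:len(x)]

def improved_correlator(ra, rb, W):
    """Sum of correlations Y[k] for the two input streams ra and rb."""
    a = np.asarray(ra).copy()
    b = np.asarray(rb).copy()
    n = len(W)
    for i in range(1, n + 1):
        D = 2 ** (n - i)      # D_{n-i+1}, assuming D_j = 2^(j-1)
        w = W[n - i]          # w_{n-i+1} (Python lists are 0-indexed)
        # adder/subtracter first, then delay on one branch and seed on the other
        a, b = delay(a + b, D), w * (a - b)
    return a + b

W = [-1, -1, -1]
a3, b3 = concat_pair(W)
Y = improved_correlator(a3, b3, W)
print(Y)  # one delta of amplitude 2L, appearing with the last input sample
```

After each stage the two branches carry a complementary pair of half the length and twice the amplitude, so n stages leave two aligned deltas of amplitude 2^n.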


V. IMPLEMENTATIONS

The classical as well as the improved architectures for the generator and correlator were implemented in a Xilinx Spartan 3 XC3S200 FPGA. This FPGA contains a total of 1920 slices, each one with the following elements in common: two logic function generators (FG), two storage elements (FF), wide-function multiplexers, carry logic, and arithmetic gates. To compare the hardware requirements of each generator, the results of the implementation are summarized in Table I, where L is the sequence length. This table demonstrates the significant reduction obtained with the proposed generator, amounting to a total of 90% for a 128-bit sequence length. Table II summarizes the implementation results of a correlator module composed of two Popovic correlators and an (m+n)-bit adder, and of the proposed improved correlator (Fig. 8). A reduction of 50% to 58% in logic resource consumption is observed for sequence lengths from 8 to 128 bits, respectively.

Figure 8. Improved correlator: inputs Ra[k] and Rb[k], n cascaded basic modules with partial results a1, b1 ... an, bn, and output Y[k].

Table I
GENERATOR IMPLEMENTATION

Proposed
FF    FG    Slices
4     9     5
5     16    8
6     15    9
7     16    7
8     18    10

Table II
CORRELATOR IMPLEMENTATION

        Popovic               Proposed              Reduction
L       FF     FG     Slices  FF     FG     Slices  (%)
8       164    170    127     85     94     64      49.61
16      340    293    231     164    153    112     51.52
32      724    522    440     323    253    200     54.55
64      1556   979    874     642    434    368     57.89
128     3348   1921   1677    1281   776    698     58.38

VI. CONCLUSIONS

This paper proposes two optimized architectures for the generator and the correlator of complementary pairs of sequences. These architectures reduce the hardware requirements of their implementation in programmable logic devices. Reductions of about 50% were obtained with the correlator for different lengths, while an additional reduction ranging from 20% to 90% was achieved in the implementation of the improved generator for different lengths. Both proposed architectures can be generalized to complementary sets, thereby allowing a more efficient utilization of the resources.

ACKNOWLEDGMENT

The authors would like to express their thanks to the Universidad Nacional de Mar del Plata, UNMDP (Argentina), the Agencia Nacional de Investigaciones Científicas y Tecnológicas (Argentina) and the Consejo Nacional de Investigaciones Científicas y Técnicas, CONICET (Argentina).

REFERENCES

[1] D. Sarwate and M. B. Pursley, "Crosscorrelation properties of pseudorandom and related sequences," Proceedings of the IEEE, vol. 68, no. 5, pp. 593-619, May 1980.
[2] S. W. Golomb and R. A. Scholtz, "Generalized Barker sequences," IEEE Transactions on Information Theory, vol. IT-11, no. 4, pp. 533-537, October 1965.
[3] H. Harmuth, "Application of Walsh functions in communications," IEEE Spectrum, vol. 6, pp. 82-91, November 1969.
[4] M. J. E. Golay, "Complementary series," IRE Transactions on Information Theory, vol. IT-7, pp. 82-87, April 1961.
[5] C. C. Tseng and C. L. Liu, "Complementary sets of sequences," IEEE Transactions on Information Theory, vol. IT-18, no. 5, pp. 644-652, September 1972.
[6] H. M. Wang, X. Q. Gao, B. Jiang, X. H. You and W. Hong, "Efficient MIMO channel estimation using complementary sequences," IET Communications, vol. 1, no. 5, pp. 962-969, October 2007.
[7] J. D. H. White and R. E. Challis, "A Golay sequencer based NDT system for highly attenuating materials," IEE Colloquium on Non-Contacting and Remote NDT, pp. 7/1-7/7, November 1992.
[8] A. Vazquez Alejos, D. Muhammad and H. U. Rahman Mohammed, "Ground penetration radar using Golay sequences," Proceedings of the 2007 IEEE Region 5 Technical Conference, pp. 318-321, Fayetteville, USA, April 2007.
[9] C. De Marziani, J. Ureña, A. Hernández, M. Mazo, F. Álvarez, J. J. García, J. M. Villadangos and A. Jimenez, "Simultaneous measurement of times-of-flight and communications in acoustic sensor networks," Proceedings of the IEEE International Workshop on Intelligent Signal Processing (WISP05), pp. 122-127, Faro, Portugal, September 2005.
[10] P. G. Donato, J. Ureña, M. Mazo, C. De Marziani and A. Ochoa, "Design and signal processing of a magnetic sensor array for train wheel detection," Sensors and Actuators A (Physical), vol. 132, no. 2, pp. 516-525, Elsevier Science B.V., November 2006.
[11] C. De Marziani, J. Ureña, A. Hernández, M. Mazo, F. Álvarez, J. J. García and P. Donato, "Modular architecture for efficient generation and correlation of complementary set of sequences," IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 2323-2337, May 2007.
[12] S. Z. Budisin, "Efficient pulse compressor for Golay complementary sequences," IEE Electronics Letters, vol. 27, no. 3, pp. 219-220, January 1991.
[13] B. M. Popovic, "Efficient Golay correlator," IEE Electronics Letters, vol. 35, no. 17, pp. 1427-1428, August 1999.


Juan José Zárate

Laboratorio de Bajas Temperaturas, Centro Atómico Bariloche. Email: zaratej@ib.cnea.gov.ar

Hernán Pastoriza

Laboratorio de Bajas Temperaturas e Instituto Balseiro, Centro Atómico Bariloche. Email: hernan@cab.cnea.gov.ar

Abstract: E-beam lithography is a technique capable of fabricating sub-micrometer planar structures. The ultimate resolution of this technique is limited mainly by the proximity effect, in which the dose accumulated at one spatial point is affected by the dose irradiated in its neighborhood. The relevance of this effect for a particular pattern strongly depends on its geometry, the sensitivity of the resist and the physical characteristics of the substrate. In this work we present a numerical algorithm to calculate the nominal dose that must be applied at each point of the geometry in order to obtain an optimal net dose for an efficient pattern transfer.

I. INTRODUCTION
While optical lithographic techniques are well established for the mass production of semiconductor electronics, electron beam (e-beam) lithography is widely used in applications where fast prototyping and nanometer resolution are required. Given that the de Broglie wavelength of an electron accelerated to 25 keV is around 0.008 nm, this technique is not limited in its resolution by diffraction effects. Two other constraints are mainly responsible for the ultimate resolution attainable in e-beam lithography. One is the beam size, which is determined by the electron source size and by the collimation and focusing capability of the hardware used. Current equipment is capable of focusing the electron beam into a spot of tenths of a nanometer in diameter. On the other hand, there is an intrinsic spread of the electron trajectories and of the energy transfer due to forward scattering, back-scattering and secondary electron generation, all caused by the interaction of electrons with matter. Moreover, a fraction of the forward electrons reaching the bottom of the resist can be reflected or can generate secondary electrons in the substrate. Both of these effects are responsible for an effective spot much broader than the incident electron beam. In Figure 1 we sketch these two mechanisms for energy transfer. The spatial dose function mentioned above is well described by the sum of two Gaussian distributions [1], [2],

$$
f(r) = \frac{1}{\alpha^2}\, e^{-r^2/\alpha^2} + \frac{\eta}{\beta^2}\, e^{-r^2/\beta^2} \tag{1}
$$

where the first term corresponds to the incident beam spread in a region of size \alpha, which depends on the electron energy and the resist characteristics. The second term takes into account the fraction of the incident e-beam that is reflected backwards by the substrate, which is distributed in a region of size \beta. Clearly the incidence of this latter term in the

ISBN 978-987-655-003-1 EAMTA 2008

Fig. 1. Energy transfer mechanisms in e-beam lithography.

lithography is substrate dependent and becomes significant when heavy materials such as Pb or Au are used, for dimensions below 1 μm. In silicon this effect is noticeable for dimensions below 0.1 μm. In Figure 2 we plot the radial dose distribution for typical values of α/β = 0.025 and η = 0.7. The value of α can be obtained by Monte Carlo simulations [2] knowing the electron energy, the resist characteristics and its thickness. Both β and η can be determined by measuring the dimensions of overexposed patterns [3].

Fig. 2. Radial dose distribution for α/β = 0.025 and η = 0.7.
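The radial distribution described above can be evaluated directly. The sketch below assumes the two-Gaussian form of equation 1 written without an overall normalization prefactor, takes β as the unit of length, and uses the typical values quoted in the text; the function name `dose_spread` is hypothetical.

```python
import numpy as np

def dose_spread(r, alpha, beta, eta):
    """Two-Gaussian spread: narrow forward term plus broad backscatter term."""
    forward = np.exp(-r**2 / alpha**2) / alpha**2
    backscatter = eta * np.exp(-r**2 / beta**2) / beta**2
    return forward + backscatter

beta = 1.0                 # length unit (assumption: beta normalized to 1)
alpha = 0.025 * beta       # alpha/beta = 0.025, as quoted for Figure 2
eta = 0.7

r = np.linspace(0.0, 8.0 * beta, 400_001)
f = dose_spread(r, alpha, beta, eta)

# Integral of f over the plane, by the trapezoid rule on 2*pi*r*f(r)
integrand = 2.0 * np.pi * r * f
area = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
print(area, np.pi * (1.0 + eta))
```

With this form each Gaussian integrates to π times its weight, so the backscatter term carries a fraction η/(1+η) of the total deposited dose even though its peak value is far lower than that of the forward term.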

The influence of this spatial distribution of the irradiation dose is enhanced when two adjacent regions are close enough that the dose of each region is affected by the dose irradiated in the other. This is called the proximity effect. The problem of knowing a priori the irradiating dose that


has to be applied to each point in order to take into account the proximity effect can be approached as an inverse matrix problem. There are reports in the literature showing partial developments of algorithms [1], [2], but their details are not public and are accessible only through commercial software. In this work we present a complete new algorithm that can be used to correct the proximity effect.

II. MODEL
To quantify the proximity effect we must specify the pattern region, separating the parts that have to be drawn from those that do not. Let m_ij be the pattern matrix, defined as

$$
m_{ij} = \begin{cases} 1 & \text{if point } (i,j) \text{ is drawn} \\ 0 & \text{if point } (i,j) \text{ is not drawn} \end{cases} \tag{2}
$$

We assume that the lithography can be defined on a grid where each index (ij) represents a unit cell. With this model the smallest feature that can be defined is a square of size b in which the applied dose is assumed to be constant. This square does not correspond to the smallest spot size, but includes many written points. The pattern defined by m_ij will be transferred to the resist if:

$$
D_{ij} \geq D_0\, m_{ij} \quad \text{(drawn points)} \tag{3}
$$

$$
D_{ij} \leq S_0\, (1 - m_{ij}) \quad \text{(undrawn points)} \tag{4}
$$

where D_ij is the dose density at the position (i, j). D_0 is the saturation dose for the given resist; it represents the value at which the resist completely changes its solubility. S_0 is the threshold dose, the value up to which the resist does not change its solubility. A high-contrast resist has its characteristic values S_0 and D_0 very close to each other. The magnitudes of these doses are extracted from the contrast curve of the resist, which can be determined experimentally [3]. For the implementation of the algorithm it is useful to define a spatial dose distribution on discrete values, as given by equation 1,

$$
f(|\mathbf{r}_{ij} - \mathbf{r}_{uv}|) = f_{uv}^{ij} = f_{ij}^{uv}
= \frac{1}{\alpha^2}\, e^{-[(i-u)^2 + (j-v)^2]\, b^2/\alpha^2}
+ \frac{\eta}{\beta^2}\, e^{-[(i-u)^2 + (j-v)^2]\, b^2/\beta^2} \tag{5}
$$

Note that f(r) and f_uv^ij have units of 1/area. It can be readily seen that when the scattering effects are negligible both approach the delta function,

$$
\lim f(r) = \delta(r) \tag{6}
$$

$$
\lim f_{uv}^{ij} = \delta_{ij}^{uv} \tag{7}
$$

This limit corresponds to the case where the proximity effect is irrelevant in comparison to the typical details of the pattern.

$$
\sum_{uv} f_{uv}^{ij} = \pi\,(1 + \eta) \tag{8}
$$

The functions defined by equations 1 and 5 mean that if one point (uv) is irradiated with a dose d_uv, the dose density in another cell (ij) will be D_ij = d_uv f_uv^ij (see footnote 1). The final dose in the cell (ij) is determined by the dose matrix d_uv as:

$$
D_{ij} = \sum_{u,v} d_{uv}\, f_{uv}^{ij} \tag{9}
$$

where d_uv is a matrix defined for all points where m_uv is defined. The problem consists in finding d_uv such that it satisfies conditions 3 and 4:

$$
D_{ij} \begin{cases} > D_0 & \text{point } (i,j) \text{ is drawn} \\ < S_0 & \text{point } (i,j) \text{ is not drawn} \end{cases} \tag{10}
$$

Fig. 3. Sketch of the pattern matrix m_ij.

III. ALGORITHM DETAILS
Our algorithm takes as its argument the pattern matrix m_ij defined in equation 2 and sketched in Figure 3. At zero order we propose a dose matrix d_ij^(0) proportional to m_ij:

$$
d_{ij}^{(0)} = C\, m_{ij} \tag{11}
$$

The variable C can be evaluated taking into account the condition D_ij ≥ D_0' m_ij. The value of D_0' is an adjustable parameter of the algorithm, as the best results are not always obtained with D_0' = D_0. In this way, from equation 9 we obtain

$$
D_0'\, m_{ij} = \sum_{u,v} d_{uv}^{(0)} f_{ij}^{uv} = C \sum_{u,v} m_{uv}\, f_{ij}^{uv} \tag{12}
$$

Note that for each point (i, j) we have a solution for C. Given the shape of the function f_ij^uv, the points (u, v) that have the greatest weight in equation 12 are those close to (i, j). We can propose C ≡ C_ij to get

$$
d_{ij}^{(0)} = \frac{D_0'\, m_{ij}}{\sum_{uv} m_{uv}\, f_{ij}^{uv}} \tag{13}
$$

1 Note the dimensional character of each of the defined functions: f(r) and f_uv^ij in 1/cm²; D_ij in C/cm²; d_ij in C.
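The zero-order step (equations 5, 9 and 13) can be sketched on a small grid. This is an illustration under assumptions, not the authors' code: the spread function is the un-normalized two-Gaussian form, the full four-index array is stored explicitly (only practical for small grids), and all names and parameter values are hypothetical.

```python
import numpy as np

def spread_tensor(shape, b, alpha, beta, eta):
    """f[i, j, u, v] of equation 5 for a grid of the given shape and cell size b."""
    i, j = np.indices(shape)
    r2 = ((i[:, :, None, None] - i[None, None, :, :]) ** 2
          + (j[:, :, None, None] - j[None, None, :, :]) ** 2) * b**2
    return np.exp(-r2 / alpha**2) / alpha**2 + eta * np.exp(-r2 / beta**2) / beta**2

def net_dose(d, f):
    """D_ij = sum_uv d_uv f_uv^ij (equation 9)."""
    return np.einsum("ijuv,uv->ij", f, d)

def zero_order_dose(m, f, D0p):
    """d0_ij = D0' m_ij / sum_uv m_uv f_ij^uv (equation 13); zero where m_ij = 0."""
    denom = net_dose(m, f)
    d0 = np.zeros_like(m)
    exposed = m > 0
    d0[exposed] = D0p / denom[exposed]
    return d0

# Hypothetical test pattern: a 5x5 block in a 9x9 grid
m = np.zeros((9, 9))
m[2:7, 2:7] = 1.0
f = spread_tensor(m.shape, b=1.0, alpha=0.5, beta=2.0, eta=0.7)
d0 = zero_order_dose(m, f, D0p=1.0)
print(d0[4, 4], d0[2, 2])  # the center of the block needs less dose than a corner
```

Cells surrounded by many exposed neighbors receive part of their dose from those neighbors, so equation 13 assigns them a lower nominal dose than cells at edges and corners.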


where the value of d_ij^(0) at each point is modified by the contribution of the closest points (u, v), as sketched in Figure 4. In the limit given in equation 7 the step-like matrix d_ij^(0) = D_0' m_ij is obtained.

Fig. 4. Sketch of d_ij^(0).

One possible strategy is to use the result of equation 13 as the argument in equation 11 to solve the problem iteratively [4]. Before that, we must calculate the dose at the points where m_ij = 0. Because of the proximity effect, some of these points will receive a nonzero dose. We define a new matrix s_kl^(0) as the dose at the points (k, l) where m_kl = 0 (see Figure 5):

$$
s_{kl}^{(0)} = \sum_{uv} d_{uv}^{(0)}\, f_{kl}^{uv} \tag{14}
$$

Fig. 5. Sketch of s_kl^(0).

The necessary condition is that s_ij^(0) < S_0, which is not fulfilled at all points. We add a new correction to the dose matrix by subtracting a factor proportional to the dose accumulated at the points where m_ij = 0,

$$
d_{uv}^{(1)} = d_{uv}^{(0)} - \gamma\, b^4 \sum_{kl} s_{kl}^{(0)}\, f_{uv}^{kl} \tag{15}
$$

The factor \sum_{kl} s_kl^(0) f_uv^kl has greater weight the greater the dose at the point (k, l) and the closer that point is to (u, v). In Figure 6 we sketch this correction d_ij^(1).

Fig. 6. Sketch of the d_uv^(1) term.

The undetermined factor γ has to be evaluated by minimizing the contribution of d_uv^(1) to the dark areas (m_kl = 0). This can be accomplished by evaluating a new dose matrix on these areas, similar to the evaluation given by equation 14, but taking d_uv^(1) as the argument and requiring s_ij^(1) ≤ S_0', with 0 < S_0' < S_0:

$$
s_{ij}^{(1)} = \sum_{uv} d_{uv}^{(1)}\, f_{ij}^{uv} = S_0' \tag{16}
$$

For simplicity it is convenient to define an auxiliary matrix A_uv where m_uv = 0,

$$
A_{uv} = b^4 \sum_{kl} s_{kl}^{(0)}\, f_{uv}^{kl} \tag{17}
$$

In this way γ can be calculated by substituting into equation 16 the value of d_uv^(1) from equation 15,

$$
\begin{aligned}
S_0' &= \sum_{uv} d_{uv}^{(1)} f_{ij}^{uv}
= \sum_{uv} \Big[ d_{uv}^{(0)} - \gamma\, b^4 \sum_{kl} s_{kl}^{(0)} f_{uv}^{kl} \Big] f_{ij}^{uv} \\
&= \sum_{uv} d_{uv}^{(0)} f_{ij}^{uv} - \gamma\, b^4 \sum_{uv} \sum_{kl} s_{kl}^{(0)} f_{uv}^{kl} f_{ij}^{uv} \\
&= s_{ij}^{(0)} - \gamma \sum_{uv} A_{uv}\, f_{ij}^{uv}
\end{aligned} \tag{18}
$$

$$
\gamma = \gamma_{ij} = \frac{s_{ij}^{(0)} - S_0'}{\sum_{uv} A_{uv}\, f_{ij}^{uv}} \tag{19}
$$

Again this is an over-determined system of equations, with a solution for each point (i, j) outside the pattern, whereas we need to compute a sum inside the pattern (see equation 15). Numerically we assign to γ the average value of all γ_ij close to the point where we are calculating d_uv^(1), as shown in Figure 7.

Fig. 7.

We call r(u, v) the square region of side r centered at the point (u, v); it is used as the area over which this average value of γ is calculated:

$$
\langle \gamma_{ij} \rangle_{uv} = \frac{\sum_{(ij)} \gamma_{ij}}{\sum_{(ij)} 1}
\qquad \text{with } (ij) = \{ (i'j') \in r(u,v),\ m_{i'j'} = 0 \} \tag{20}
$$
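The first-order correction can be sketched as follows. Several points are assumptions made for illustration only: the proportionality factor of equation 15 is called `gamma` here; the correction is applied only to exposed cells; the averaging region is given by a hypothetical half-width `half` in cells; exposed cells with no dark cell in their window are left uncorrected; and the spread function is the un-normalized two-Gaussian form.

```python
import numpy as np

def spread_tensor(shape, b, alpha, beta, eta):
    """f[i, j, u, v] of equation 5 (symmetric under exchange of (ij) and (uv))."""
    i, j = np.indices(shape)
    r2 = ((i[:, :, None, None] - i[None, None, :, :]) ** 2
          + (j[:, :, None, None] - j[None, None, :, :]) ** 2) * b**2
    return np.exp(-r2 / alpha**2) / alpha**2 + eta * np.exp(-r2 / beta**2) / beta**2

def zero_order_dose(m, f, D0p):
    """Equation 13, evaluated on exposed cells only."""
    d0 = np.zeros_like(m)
    d0[m > 0] = D0p / np.einsum("ijuv,uv->ij", f, m)[m > 0]
    return d0

def first_order_dose(m, d0, f, b, S0p, half=1):
    dark = m == 0
    s0 = np.einsum("kluv,uv->kl", f, d0) * dark            # eq. 14: dose on dark cells
    A = b**4 * np.einsum("uvkl,kl->uv", f, s0) * (m > 0)   # eq. 17, kept on exposed cells
    denom = np.einsum("ijuv,uv->ij", f, A)
    gamma = np.zeros_like(s0)
    gamma[dark] = (s0[dark] - S0p) / denom[dark]           # eq. 19: one value per dark cell
    d1 = d0.copy()
    rows, cols = m.shape
    for u, v in zip(*np.nonzero(m)):
        win = (slice(max(u - half, 0), min(u + half + 1, rows)),
               slice(max(v - half, 0), min(v + half + 1, cols)))
        if dark[win].any():
            # eq. 20 (local average over dark cells) inserted into eq. 21
            d1[u, v] -= gamma[win][dark[win]].mean() * A[u, v]
    return d1

# Hypothetical pattern: two vertical bars separated by a one-cell gap
m = np.zeros((7, 7))
m[:, 2] = 1.0
m[:, 4] = 1.0
f = spread_tensor(m.shape, b=1.0, alpha=0.5, beta=3.0, eta=0.7)
d0 = zero_order_dose(m, f, D0p=1.0)
d1 = first_order_dose(m, d0, f, b=1.0, S0p=0.01)
gap_before = np.einsum("ijuv,uv->ij", f, d0)[:, 3]
gap_after = np.einsum("ijuv,uv->ij", f, d1)[:, 3]
print(gap_before.max(), gap_after.max())
```

In this example every dark cell starts above the chosen S0', so the correction lowers the nominal dose on both bars and the dose leaking into the gap drops accordingly; the exposed regions then have to be renormalized in the next step.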


At this point all the elements needed to calculate the dose matrix at first order are available. We can rewrite equation 15 as

$$
d_{uv}^{(1)} = d_{uv}^{(0)} - \langle \gamma_{ij} \rangle_{uv}\, A_{uv} \tag{21}
$$

where ⟨γ_ij⟩_uv is the locally averaged proportionality factor of equation 15. The new dose matrix d_uv^(1) has been calculated to minimize s_kl^(1). However, the value of d_uv^(1) must be renormalized to keep the dose in the regions m_ij = 1 constant. We iterate as proposed in Ref. [4], but using d_ij^(1) as the argument (see Figure 8):

$$
d_{ij}^{(2)} = C'\, d_{ij}^{(1)} \quad \text{such that} \quad D_{ij} \geq D_0'\, m_{ij} \tag{22}
$$

Fig. 8.

leading to

$$
D_0'\, m_{ij} = \sum_{u,v} d_{uv}^{(2)} f_{ij}^{uv} = C' \sum_{u,v} d_{uv}^{(1)} f_{ij}^{uv} \tag{23}
$$

The calculation of C' is identical to that performed for C. Finally we obtain the new value for the dose matrix,

$$
d_{ij}^{(2)} = \frac{D_0'\, m_{ij}}{\sum_{uv} d_{uv}^{(1)} f_{ij}^{uv} \,/\, d_{ij}^{(1)}} \tag{24}
$$

corresponding to the exposure matrix used in the lithography equipment. Further improvement can be attained by iterating more steps of the procedure described. For the patterns tested in this work, the precision obtained at this point is better than the limitations of the lithography equipment itself [3], so further time-consuming iterations are not necessary.

IV. CORRECTION RESULTS
The presented algorithm has been developed to optimize the fabrication process of patterns consisting of an array of micrometer-sized crosses. The process flow required that the lithography be performed using the negative e-beam resist maN 2403 (Microresist Technology) on top of Pb. Results for this particular process are presented.

A. Algorithm parameters
Some parameters have to be defined in order to calculate the optimal writing dose of a particular pattern. Those related to the particular lithography that is performed, α, β, η, D_0 and S_0, can be determined experimentally by measuring the results obtained in particular exposure patterns, as reported in [3]. Another set of parameters corresponds to those associated with the numerical algorithm: D_0' and S_0', which define the convergence of the calculation; the grid size, chosen by weighing computation time against spatial resolution; and finally the number of discrete values to which D_ij is rounded, given the limitations of the lithography control software.

B. Results
We present some graphs showing the results obtained in each step of the calculation for the pattern design sketched in Figure 3. We plot the final dose D_ij convolved with the contrast curve of the resist maN 2403, using Pb as the substrate. This gives as a result the thickness profile that would be obtained in the lithography.

Fig. 9.

In Figure 9 we plot the profile obtained in a simulation with a constant dose d_ij = 0.6 D_0 m_ij. The maximum value is around 0.6 of the targeted thickness. The roundness of the cross arms is also visible. These are indications that the dose is low. Increasing this factor does not improve the pattern, and adjacent crosses start to become connected.

Fig. 10. Resist thickness profile taking into account d_ij^(0).
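The renormalization of equation 24 is a cell-by-cell rescaling and can be sketched in a few lines. The example assumes the un-normalized two-Gaussian spread function, restricts the result to exposed cells, and uses an arbitrary positive matrix in place of a real first-order dose; all names are hypothetical.

```python
import numpy as np

def spread_tensor(shape, b, alpha, beta, eta):
    """f[i, j, u, v] of equation 5 for a grid of the given shape and cell size b."""
    i, j = np.indices(shape)
    r2 = ((i[:, :, None, None] - i[None, None, :, :]) ** 2
          + (j[:, :, None, None] - j[None, None, :, :]) ** 2) * b**2
    return np.exp(-r2 / alpha**2) / alpha**2 + eta * np.exp(-r2 / beta**2) / beta**2

def renormalize(m, d1, f, D0p):
    """d2_ij = D0' m_ij d1_ij / sum_uv d1_uv f_ij^uv (equation 24, rearranged)."""
    accumulated = np.einsum("ijuv,uv->ij", f, d1)
    d2 = np.zeros_like(d1)
    exposed = m > 0
    d2[exposed] = D0p * d1[exposed] / accumulated[exposed]
    return d2

# Hypothetical input: a 3x3 exposed block with a non-uniform first-order dose
m = np.zeros((5, 5))
m[1:4, 1:4] = 1.0
d1 = m * (1.0 + 0.1 * np.arange(25.0).reshape(5, 5))
f = spread_tensor(m.shape, b=1.0, alpha=0.5, beta=2.0, eta=0.7)
d2 = renormalize(m, d1, f, D0p=1.0)
print(d2[2, 2])
```

Because the numerator and the denominator are both linear in d1, the result does not depend on the overall scale of d1; only its shape across the pattern is carried over to the final exposure matrix.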


In Figure 10 we plot the normalized thickness obtained using d_uv^(0) (equation 13) as the dose matrix. Now the maximum thickness is one, indicating that at this point the algorithm is capable of homogenizing the profile in the regions m_ij = 1. However, overlap between facing crosses is still observed.

Fig. 11. Resist thickness profile taking into account d_ij^(1).

In Figure 11 we show the profile obtained using the dose matrix d_uv^(1), where the correction due to the dose in the regions m_ij = 0 avoids the overlapping of neighboring crosses. However, at this stage the distance between crosses is greater than the nominal one.

Fig. 12. Resist thickness profile taking into account d_ij^(2).

Finally, in Figure 12 the results obtained using the dose matrix d_ij^(2) are plotted. The final thickness in the regions m_ij = 1 is almost constant and near one, there is no overlap between neighboring crosses, and the distance between them is the nominal one of the pattern. In addition, we show in Figure 13 the dose matrix d_ij^(2) used to obtain Figure 12. It is clearly different from the initial constant matrix. The correction algorithm injects more dose in the corners, where the points (ij) have fewer neighbors, and subtracts dose in the region between facing arms.

Fig. 13. Dose matrix d_ij^(2).

V. CONCLUSIONS
In the present work we study the influence of the proximity effect in electron lithography, which is the main limitation on the final resolution attainable by this technique. We develop a correction algorithm that takes into account the total dose accumulated inside and outside the desired pattern, evaluating the contrast characteristics of the resist. This algorithm can be used with any geometrical pattern where the proximity effect is noticeable. For the patterns tested, the algorithm succeeds in correcting the proximity effect with great quality.

REFERENCES

[1] T. H. P. Chang, "Proximity effect in electron-beam lithography," J. Vac. Sci. Technol., vol. 12, pp. 1271-1275, 1975.
[2] M. G. R. Thomson, "Incident dose modification for proximity effect correction," J. Vac. Sci. Technol. B, vol. 11(6), pp. 2768-2772, 1993.
[3] J. J. Zárate, "Ratchet de vórtices: experimentos en redes de junturas Josephson," Master's thesis, Instituto Balseiro, 2007.
[4] T. R. Groves, "Efficiency of electron-beam proximity effect correction," J. Vac. Sci. Technol. B, vol. 11(6), pp. 2746-2753, 1993.


Jesús García-Guzmán, Antonio Salgado-Uscanga, Fayne Meza-Martínez and Carlos Gómez-Pecero

Facultad de Ingeniería Mecánica Eléctrica, Universidad Veracruzana, Xalapa, México. E-mail: jesusgarcia@ieee.org

Abstract: The calculations and basic design steps for the fabrication of a 170-nH inductor are given in this paper. The design process is supported by computer software at every stage. The inductance is calculated by implementing an algorithm in Matlab code, in which several modifications to a previously developed method are proposed. The results of the calculation program are used for the layout of an inductor compatible with CMOS technology, using AutoCAD and L-Edit software, and the addition of an etching step is proposed in order to raise the self-resonance frequency of the device to the radio-frequency band.

I. INTRODUCTION

The recent increase in interest in micro-electromechanical systems (MEMS) is a consequence of the cumulative experience in the use of IC technologies. Almost any type of device is now available at the microscale, and nanotechnology has opened newer applications for electronic devices [1, 2]. Among these advances, some components have received less attention, either because of the problems with their use in conventional systems or because they were initially hard to fabricate in the microtechnology environment [3, 4]. This is the case of inductors, electric devices which, in spite of providing a number of solutions for engineering problems, were hardly used at the beginning of the IC revolution. Overheating, volume and overall dimensions, cost, interference and noise were some of the drawbacks of the conventional use of inductors [5, 6]. Nevertheless, microinductors are nowadays very useful components of integrated circuits, microtransducers and other microsystems. The movement of small devices like the pins in the head of high-resolution printers and the tuning of signals at high frequencies are typical examples of the use of microinductors. In the same way that inductive devices have been useful for conventional applications, the development of micro- and nanosystems is now producing a new generation of motors, coils, inductive sensors, acoustic inductive transducers and many other types of micro-inductive mechanisms [7, 8, 9]. These applications require the design and introduction of fabrication methods compatible with IC technology. In particular, communications have benefited greatly from advances in microsystems technology. However, the design of very small wireless systems requires components in the radio-frequency (RF) bands that are capable of being integrated within standard silicon processes. Inductors, being important components at such high frequencies, are now required to be fabricated in the same IC chips where the rest of the components are built. Miniaturization of inductors, nonetheless, has not been an easy task. Structures in silicon have inherent capacitance that produces self-resonance when an inductive element is placed in the same substrate, hence limiting the use of these devices at high frequencies. Also, the silicon of the substrate has a resistive effect that reduces the quality factor. This has been a challenge for researchers in the development of amplifiers, filters and other inductive circuits [10, 11, 12]. In this paper we describe the inductance calculations and the design of a microinductor that can be fabricated through standard IC technology. The inductor can be integrated within CMOS circuits, being part of systems such as amplifiers, passive filters, wireless devices or any other application in the RF range.

II. INDUCTANCE CONSIDERATIONS

It is desired to obtain a high-inductance component that can be used at high frequencies, but it is necessary to reduce the parasitic capacitance in the silicon substrate. For a high value of inductance, a long conductor with many turns may be used, but this can increase the overall size of the component, so it is necessary not to exceed practical dimensions for compatibility with ICs. The resonance frequency can be increased if the capacitive effect in the silicon substrate is reduced [5]. This can be done by reducing the area of the metallic film forming the coil, which in turn can be achieved by reducing the width of the segments. But the main reduction in capacitance is obtained by removing the silicon substrate under the inductor, using an additional etching step in the process [12, 13].
In conventional inductors, several techniques, instruments and correction parameters have been used for the calculation of inductance, and the theoretical estimations of the inductance values approximate the values measured in practice well enough. There are many methods for the calculation of the self-inductance L, and they are all derived from geometrical considerations on a straight conductor. For small coils, several methods have been developed, and there are modern software packages available for electrical analysis and simulation. Nevertheless, researchers have reported significant differences between the measured values obtained in their experimental designs and the values predicted by theoretical estimations [5, 6]. In the design of


microinductors, some of the factors that are irrelevant at the conventional scale become very important, and it is in this sense that the methods of calculation differ significantly [14, 15, 16]. A symmetrical geometry (square coil) was selected for this project, and consequently some of the considerations are valid only for this particular case, affecting each stage of the design, from the basic theoretical aspects to the etching stage at the end of the process. Beyond the effects of the geometry on the self-inductance, special attention is required for the calculation of the mutual inductance M, a term that at the conventional scale only has practical importance in the design of transformers and coupled coils. Mutual inductance corresponds to the effect of the current in a conductor and its associated magnetic flux when a second conductor, also carrying current, is placed in its proximity. One of the critical considerations in the design of inductors at the microscale is the usual simplification obtained by neglecting the negative mutual inductance. This quantity, which arises from the coupling of coils carrying currents in opposite directions, is much smaller than the whole inductance in traditional circuits, but this proportion does not hold for microinductors [17]. The inductance of the coil in this project was calculated with a method which includes these considerations and has previously proved effective [17]. The same approach is used in this work, but a different method for the numerical manipulation is proposed here, improving the original two-part descending algorithm. The total inductance $L_T$ of a coil is given by:

$$L_T = L_0 + M \qquad (1)$$

where $L_0$ is the resulting self-inductance of all the segments and $M$ is the sum of all the mutual inductance terms.
The theoretical development of the equations for the calculation of inductance in planar rectangular inductors is given elsewhere [17]. From the results of these works, after the corrections for the rectangular section of thin films and assuming that the magnetic permeability of the conductor is equal to 1 (for the near-direct-current case), the self-inductance L (in nanohenries) of a straight conductor is calculated with the formula:

Because every segment of the microcoil is placed very close to the others, there will be as many mutual inductance terms as there are interactions between segments. For a square coil, the magnetic fields of perpendicular segments produce only small effects on each other, but parallel segments have a strong mutual inductance. The sign is positive when the currents flow in the same direction and negative when they flow in opposite directions. The number of mutual inductance terms increases with the number of segments, and this is the main reason why a computer method is needed to simplify the numerical work. The proposed algorithm also takes practical advantage of the symmetry of the design. To compute the mutual inductance M in nanohenries, the following expression is applied: M = 2lQ (4) where l is again the conductor length in centimetres and Q is the mutual-inductance parameter, calculated from equation (5), adapted from [17], in which d is the distance between the track centres, approximately equal to the geometric mean distance between conductors.

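$$Q = \ln\left[\frac{l}{d} + \sqrt{1 + \left(\frac{l}{d}\right)^{2}}\right] - \sqrt{1 + \left(\frac{d}{l}\right)^{2}} + \frac{d}{l} \qquad (5)$$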
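The mutual-inductance parameter and equation (4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the Matlab program described in the paper: the function names are hypothetical, equation (5) is taken in its usual Greenhouse form, and the two tracks are assumed to be parallel and of equal length, with l and d in centimetres.

```python
import math


def mutual_inductance_parameter(l_cm, d_cm):
    """Q of equation (5) for two parallel conductors of equal length l
    whose centres are a distance d apart (both in centimetres)."""
    ratio = l_cm / d_cm
    return (math.log(ratio + math.sqrt(1.0 + ratio ** 2))
            - math.sqrt(1.0 + (1.0 / ratio) ** 2)
            + 1.0 / ratio)


def mutual_inductance_nh(l_cm, d_cm):
    """M = 2 l Q of equation (4), in nanohenries."""
    return 2.0 * l_cm * mutual_inductance_parameter(l_cm, d_cm)


if __name__ == "__main__":
    # Illustrative values only: two 300 um tracks with centres 6 um apart.
    print(mutual_inductance_nh(300e-4, 6e-4))
```

The sign of each term (positive for currents in the same direction, negative otherwise) is deliberately left to the caller, since it depends on the position of the two segments in the coil.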

The symmetrical design simplifies the computation of mutual inductances for conductors of different lengths. Fig. 1 shows the distance parameters for a two-segment arrangement. In order to account for the parts of the segments where one conductor does not induce on the other, the empty spaces must be considered by subtracting the corresponding mutual inductance terms. The general expression for the mutual inductance between segments i and m is

(6)

(7) (8)

$$L = 2l\left[\ln\frac{2l}{w+t} + 0.50049 + \frac{w+t}{3l}\right] \qquad (2)$$

where l is the length of the conductor, w is the width of the segment and t is the thickness of the film, all of them in centimetres. The resulting self-inductance is the sum of the self-inductances of the Z segments of the square coil:

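$$L_0 = \sum_{i=1}^{Z} L_i = 2\sum_{i=1}^{Z} l_i\left[\ln\frac{2l_i}{w+t} + 0.50049 + \frac{w+t}{3l_i}\right] \qquad (3)$$

[Fig. 1: segment lengths $l_p$, $l_i$ and $l_q$ of the two-segment arrangement.]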
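As a rough numerical illustration of the self-inductance sum, the following Python sketch evaluates the expression for one straight segment and adds it over a list of segment lengths. It assumes the Greenhouse-type form L = 2l[ln(2l/(w+t)) + 0.50049 + (w+t)/(3l)], with all lengths in centimetres and the result in nanohenries; the function names are hypothetical and this is not the authors' Matlab code.

```python
import math


def segment_self_inductance_nh(l_cm, w_cm, t_cm):
    """Self-inductance of one straight thin-film segment, in nH.

    Assumed form: L = 2 l [ln(2l / (w + t)) + 0.50049 + (w + t) / (3l)],
    with l, w and t in centimetres.
    """
    return 2.0 * l_cm * (math.log(2.0 * l_cm / (w_cm + t_cm))
                         + 0.50049
                         + (w_cm + t_cm) / (3.0 * l_cm))


def total_self_inductance_nh(lengths_cm, w_cm, t_cm):
    """L0: sum of the self-inductances of all Z segments of the coil."""
    return sum(segment_self_inductance_nh(l, w_cm, t_cm) for l in lengths_cm)


if __name__ == "__main__":
    # A single 30 um long segment, 3 um wide and 1 um thick.
    print(segment_self_inductance_nh(30e-4, 3e-4, 1e-4))
```

Because the thickness t only enters through the sum w + t, small changes in t have a limited effect on the result for narrow, long tracks.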


The use of the symmetrical equation (7) will introduce a very small error. Moreover, the same error appears in the calculation of both the positive and the negative terms, and therefore it is mostly cancelled out in the resulting overall inductance.

III. CALCULATION OF INDUCTANCE

The procedure for the calculation of the overall inductance of the square coil was implemented in Matlab code, with substantial modifications to the classical algorithm proposed by Greenhouse [17]. The data required for the process are:

n: the total number of complete turns;
w: the width of the conductor;
t: the thickness of the film;
s: the edge-to-edge distance between conductors;
l(1): the length of the innermost segment.

[Fig. 2: matrix of mutual-inductance terms for segments 1 to 12, showing the alternating positive (+) and negative (-) entries.]

Considering a square coil with n complete turns, the first step of the process is the calculation of the number of segments as Z = 4*n. Then, a vector l with the lengths of every segment is calculated, with the length of the first segment as the only input. Afterwards, the vector of self-inductances is calculated and the sum is stored in L0, as given by equation (3). The next step of the program calculates the mutual inductances. An analysis of the terms contributing to the whole inductance revealed some substantial reductions, which were applied in the design of the Matlab loop computing the mutual inductances. The symmetry of the system gives $M_{i,m} = M_{m,i}$, so only one of each pair of terms has to be calculated, i.e., only one half of the matrix M is used in the calculations. Furthermore, since the inductive effect of every segment of the coil on itself corresponds to the previously calculated self-inductance, there is no need for calculations with the terms on the diagonal of matrix M. Likewise, since perpendicular segments do not produce mutual inductance, they are also eliminated from the process, which is thus reduced to the minimum number of iterations. In addition, the directions of the currents in the coil produce an alternating sequence of negative and positive mutual inductances following the distribution illustrated in Fig. 2, which shows the variation of the indexes, the low density of the required matrix, and the alternating distribution of the negative and positive inductances. This pattern suggests an algorithm in which the horizontal index varies from 3 to Z in steps of 2 and the vertical index varies from 1 to Z-2 in steps of 1. The last two segments do not interact with subsequent segments. Distances between conductors are calculated depending on the relative positions of the interacting segments. It is necessary to consider the alternating sequence and the fact that the lengths and distances are given in pairs of equal values.
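The bookkeeping described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's Matlab loop: it assumes that the centre-line lengths come in equal pairs that grow by one pitch (w + s) per pair, and that a pair of parallel segments contributes a negative term when their indexes differ by 2, 6, 10, ... (opposite sides of the coil) and a positive term when they differ by 4, 8, ... (same side). The function names are hypothetical.

```python
def segment_lengths_um(n_turns, first_um, w_um, s_um):
    """Centre-line lengths of the Z = 4n segments, innermost first.

    Assumption: lengths come in pairs of equal values and each new pair
    is one pitch (w + s) longer than the previous one.
    """
    z = 4 * n_turns
    pitch = w_um + s_um
    return [first_um + (k // 2) * pitch for k in range(z)]


def parallel_pairs(z):
    """Yield (i, m, sign) for the 1-based pairs of parallel segments, m > i.

    Only the upper half of the matrix is visited, the diagonal and the
    perpendicular pairs (odd index difference) are skipped.
    """
    for i in range(1, z - 1):              # vertical index: 1 .. Z-2
        for m in range(i + 2, z + 1, 2):   # horizontal index: steps of 2
            sign = 1 if (m - i) % 4 == 0 else -1
            yield i, m, sign


if __name__ == "__main__":
    lengths = segment_lengths_um(n_turns=30, first_um=30, w_um=3, s_um=3)
    print(len(lengths), lengths[0], lengths[-1])
    print(sum(1 for _ in parallel_pairs(len(lengths))))
```

With the input data used later in the paper (n = 30, w = s = 3 µm, l(1) = 30 µm), the longest segment produced by this length rule matches the reported outer size of the coil, which suggests the pairing assumption is at least consistent with the design.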


In order to calculate the mutual inductance $M_{i,m}$ between segments i and m, assuming a symmetrical configuration so that equation (7) can be used, it is necessary to obtain the following values:

$$l_p = \frac{l_m - l_i}{2} \qquad (9)$$

$$l_{i+p} = l_i + l_p \qquad (10)$$

where the values are taken from the vector of lengths calculated before. These lengths and the distances between segments are used to obtain the values of Q in accordance with equation (5). Once all the terms are computed, the final calculations are performed. The overall mutual inductance is twice the sum of all the elements in M, and this result is added to the previously obtained self-inductance to give the total inductance of the coil. Although not described in this paper, several modifications of the program were also used to estimate some parameters in the design of the coil, such as the overall dimensions, the resonance frequency and the effects of variations of the input data on the final result.

IV. RESULTS

The computer program was tested with data reported in the literature. Unfortunately, only one of the reports [6] provided the complete set of data required for the comparison with the results produced by the program. First, the program was validated with the data of the example given in [17], and the results were correct. After that, the data reported by Nguyen and Meyer [6] were used for testing. They report the results for two square-spiral inductors, one of them with 4 turns and the other with 9 turns. As expected from the assumptions made for the


calculations performed with the program, the accuracy increases with the number of turns. This was confirmed for the two inductors reported in [6]. For the 4-turn inductor, with an outer dimension of 115 µm, the authors reported a measured inductance of 1.9 nH and a theoretical value of 1.3 nH. The computer program in this project gave a result of 1.25 nH, composed of L = 0.80 nH and M = 0.45 nH. With the large inductor, consisting of 9 turns, the compensation of errors between negative and positive mutual inductances produced better results. The reported values were 9.7 nH for the measured inductance and 9.3 nH for the theoretical one. The program in this project gave an overall inductance of 9.48 nH, resulting from the sum of L = 3.73 nH and M = 5.75 nH. After these tests, the program was used to obtain the inductance of the coil designed in this work. Several trials and considerations were made before obtaining the selected values, according to the objectives of the project. The aim was to design a coil smaller than the size reported as compatible with IC devices; hence it was kept below 400 µm. It was also expected to obtain an inductor suitable for RF work after the removal of the silicon substrate, and even to obtain a higher value of inductance. These considerations determined the selection of the number of turns, the width of the segments, the spacing between them and the length of the starting inner segment. In addition, as the active area of the conductor is directly related to the parasitic capacitance, a reduction of the dimensions was suggested. The input data fed to the program were selected as follows: n = 30, w = 3 µm, t = 1 µm, s = 3 µm and l(1) = 30 µm. As in previous works [3, 4, 5, 6, 13, 17], the metal is assumed to be aluminium, with a conductor permeability of 1 and a frequency-correction parameter equal to 1. The program produced a coil with an outer size of 384 µm and an overall inductance of 171 nH, resulting from the sum of L = 27 nH and M = 144 nH.
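The agreement quoted above can be made explicit with a short calculation. The sketch below only re-uses the numbers given in the text (program output, measured value and theoretical value from [6]) and computes relative deviations; the helper name and the dictionary layout are hypothetical.

```python
def relative_deviation(value, reference):
    """Absolute deviation of `value` from `reference`, as a fraction."""
    return abs(value - reference) / reference


# (program result, measured, theoretical) in nH, as quoted in the text.
inductors = {
    "4 turns": (1.25, 1.9, 1.3),
    "9 turns": (9.48, 9.7, 9.3),
}

if __name__ == "__main__":
    for name, (program, measured, theory) in inductors.items():
        print(name,
              "vs measured: %.1f %%" % (100 * relative_deviation(program, measured)),
              "vs theory: %.1f %%" % (100 * relative_deviation(program, theory)))
```

The deviation from the measured value drops from roughly a third for the 4-turn coil to a few percent for the 9-turn coil, which is the trend the paragraph describes.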

A drawing of the coil, obtained with AutoCAD, is shown in Fig. 3. The distances are given between the points at the middle of the tracks of the segments. The first segment is the one at the top side of the inner turn, measuring 30 µm. Since the coil has 30 complete turns, the number of segments is 120.

V. FABRICATION PROCESS

The inductor is intended to be fabricated as a component of a microsystem in a CMOS process. The fabrication of the inductor requires two metal layers in the process. One of them provides the external contact for the inner end of the coil. The second one contains the 30 turns of conductor. A metal contact point is also needed to join these two layers. The isolation between the metal films is achieved through the oxide layers in the CMOS process. The only step that is additional to the conventional process is the etching stage required at the end for the removal of the silicon substrate under the coil, leaving the device suspended at the centre of the structure. A cross-section of the layers in the structure is shown in Fig. 4, with the vertical dimensions exaggerated to make the details visible.

Figure 4. Cross section of the structure with the suspended coil showing the required layers (Metal 2: coil with 30 turns; Metal 1: external contact from centre; substrate).

The post-processing etching step is introduced in order to form a free-standing microstructure without affecting the components fabricated in the conventional IC process. Several methods [13, 18, 19, 20] are available for this purpose, but the one proposed here consists of using an anisotropic silicon etchant to fabricate oxide microstructures. After depositing the two metal layers between the oxide, it will be necessary to pattern the openings where the etching will start, and finally to etch away the layers and the substrate underneath the coil. The desired structure can be obtained by carefully controlling the application of the etchant and by considering specific characteristics of the crystal orientation. Since this is a non-conventional step in the processing of CMOS circuits, and given that the quality factor of the inductor and the self-resonance frequency will depend on this stage, further explanation is given about the technique that allows the removal of the silicon with the purpose of reducing the capacitance. Calculations of the actual resistance of the coil and, therefore, of the resulting quality factor will depend on the particular characteristics of the CMOS process used for fabrication, which will determine both the thickness of the metal layer and the expected resistance per square.

Figure 3. Layout view of the designed coil, with 30 turns and outer size of 384 µm.


Anisotropic etching is chosen because it can be used to develop unique structures not feasible by other methods. In this technique, the etch rates depend on the crystallographic orientation. Also, the geometry of the structures that can be etched through the openings in a mask depends on the shape of the opening itself. In order to obtain the desired structure, the edges of the opening must be oriented properly [21]. For example, truncated pyramidal structures can be obtained when a <100> oriented silicon wafer is exposed to an anisotropic etchant through a hole in a layer of silicon dioxide. The shape of the pit will be bounded by <111> crystallographic planes, which have a very low etch rate. Thus, it is necessary to define the orientation of the edges of the opening in order to produce the desired structure. If a mask has only concave corners and the opening is properly aligned in the <110> directions, there will be no undercut. If the edges of the opening are misaligned, concave corners and edges will be undercut. There is no theoretical explanation for the fact that geometries with convex corners will produce undercut, irrespective of alignment conditions. Experimentally, the undercut seems to depend on the etch time and on the local surface area actively exposed to the etchant [22]. In particular, the symmetrical square structure described in this paper can be totally undercut because it is supported from the corners. When anisotropic etching begins, undercutting of the four tethers starts because they are not aligned in the <110> directions. The tethers are fully undercut, creating four convex corners. This in turn causes further undercutting of the oxide that is patterned as a central square. If the etching continues, the central square will be fully undercut and suspended. Several etchants are used for anisotropic etching, but potassium hydroxide (KOH) is the most frequently used. Other common etchants are Ethylenediamine-Pyrocatechol-Water (EDP), tetramethylammonium hydroxide and cesium hydroxide. KOH is used in mixtures with water and isopropyl alcohol (IPA) for etching. The etch rate is a function of the composition of the mixture, the etch temperature, and the silicon orientation and resistivity. Etching mixtures may have different compositions and are often used in two phases. A typical etch solution is made up of 5000 ml of 40% aqueous KOH with sufficient IPA added to ensure separation into two phases. Additional IPA is added to produce liquid on top of the solution to ensure the two-phase operation. The mixture is used between 80 °C and 90 °C. This mixture of KOH with IPA has an anisotropic ratio of 34, while without IPA the anisotropic ratio is only 8. Also, the use of <110> oriented silicon instead of <111> material gives similar results [23]. The final step in the fabrication of the inductor will be the etching with a mixture of KOH, IPA and water which, applied as described, will remove the silicon substrate.

VI. CONCLUSIONS

In general, the proposed design seems to be feasible. The design process concentrated on the inductor, without tying the calculation or fabrication steps to a particular situation. The proposed procedure for the calculation of the inductance of square microcoils can be extended to rectangular inductors with small modifications. Most of the restrictions adopted for this project can be lifted to extend the method to a wider scope of applications. Given that the fabrication process could not be tested in practice, the only results that can be validated are the outputs of the computer code implemented for the calculation of the inductance. The particular conclusions derived from these observations are:

a) The thickness of the film is not a relevant factor in the value of the final inductance, and so it can be adjusted to the requirements of the fabrication process.
b) The number of turns in the coil has a heavy influence (there is a quadratic relationship) on the value of the mutual inductance, since the number of combinations grows rapidly.
c) The length of the segments is also an important factor for the final value of the inductance. Since the outer dimensions of the coil must be kept in the range of the size of microsystems, it is advisable to design coils with a low thickness and a long first segment.
d) Reducing the length of the first segment of the coil will not significantly reduce the outer dimensions of the coil.

VII. REFERENCES

[1] D.J. Bishop, C. Giles, G. Austin, The Lucent Lambda Router: MEMS technology, Communications Magazine IEEE, Volume 40, Issue 3, March 2002, pp. 75-70.
[2] A. Helland, H. Kastenholz, Development of nanotechnology in light of sustainability, Journal of Cleaner Production, Volume 16, Issues 8-9, May-June 2008, pp. 885-888.
[3] W. Göpel, J. Hesse and J. N. Zemel, Sensors: A Comprehensive Survey. Volume 5: Magnetic Sensors, VCH Publishers Inc., New York, 1996, p. 513.
[4] W. Göpel, J. Hesse and J. N. Zemel, Sensors: A Comprehensive Survey. Volume 8: Micro and Nanosensor Technology/Trends in Sensor Markets, VCH Publishers Inc., New York, 1996, p. 565.
[5] J.Y. Chang, A. A. Abidi, Large Suspended Inductors on Silicon and Their Use in a 2 µm CMOS RF Amplifier, IEEE Electron. Dev. Lett., Vol. EDL-14, No. 5, 1993, pp. 246-248.
[6] N.M. Nguyen and R. G. Meyer, Si IC-Compatible inductors and LC passive filters, IEEE Journal of Solid State Circuits, Vol. 25, No. 4, 1990, pp. 1028-1031.
[7] C. Tassetti, G. Lissorgues, J.P. Gilles, Tunable RF MEMS microinductors for future communication systems, Microwave and Optoelectronics Conference, 2003 IMOC 2003, Volume 1, Sept. 2003, pp. 541-545.
[8] I. Giouroudi, J. Kosel, H. Pfützner, W. Brenner, Magnetostrictive bilayer sensor system for testing of rotating microdevices, Sensors and Actuators A: Physical, Volume 142, Issue 2, 10 April 2008, pp. 474-478.


[9] O. Cugat, J. Delamare, G. Reyne, Magnetic micro-actuators and systems (MAGMAS), IEEE Transactions on Magnetics, Volume 39, Issue 6, Nov. 2003, pp. 3607-3612.
[10] L. Meimei, L. Shandong, D. Jenq-Gong, High-frequency ferromagnetic properties of FeNdBO thin films, Journal of Alloys and Compounds, Volume 455, Issues 1-2, 2008, pp. 516-518.
[11] K. Jae-Wook, J. Myung-Hee, P. Nho Kyung, Y. Eui-Jung, Microfabrication of solenoid-type RF SMD chip inductors with an Al2O3 core, Current Applied Physics, In Press, Corrected Proof, 17 November 2007.
[12] N. Wang, T. O'Donnell, S. Roy, P. McCloskey, C. O'Mathuna, Micro-inductors integrated on silicon for power supply on chip, Journal of Magnetism and Magnetic Materials, Volume 316, Issue 2, September 2007, pp. e233-e237.
[13] M. Parameswaran, H. Baltes, L. Ristic, A.C. Dhaded, A. M. Robinson, A new approach for the fabrication of micromechanical structures, Sensors and Actuators, Vol. 19, 1989, pp. 289-307.
[14] C. Peters, Y. Manoli, Inductance calculation of planar multi-layer and multi-wire coils: An analytical approach, Sensors and Actuators A: Physical, Volumes 145-146, March-April 2008, pp. 394-404.
[15] D. Fang, Y. Zhou, X. Wang, X. Zhao, Surface micromachined high-performance RF MEMS inductors, Microsystem Technologies, Volume 13, Number 1, 2007, pp. 79-83.
[16] X. Wang, X. Zhao, Y. Zhou, X. Dai, B. Cai, Fabrication and performance of novel RF spiral inductors on silicon, Journal of Shanghai University (English Edition), Volume 9, Number 4, 2005, pp. 361-364.
[17] H. M. Greenhouse, Design of planar rectangular microelectronic inductors, IEEE Transactions on Parts, Hybrids and Packaging, Vol. PHP-10, No. 2, 1974, pp. 102-109.
[18] I. Pellejero, M. Urbiztondo, M. Villarroya, J. Sesé, M.P. Pina, J. Santamaría, Development of etching processes for the micropatterning of silicalite films, Microporous and Mesoporous Materials, Volume 114, Issues 1-3, 1 September 2007, pp. 110-120.
[19] X. Gao, Y. Zhou, Y. Cao, C. Lei, W. Ding, H. Choi, J. Won, A Copper/Polyimide Fabrication Process for Fabricating High-Inductance Microinductor, Electronics Packaging Manufacturing, IEEE Transactions, Volume 30, Issue 2, April 2007, pp. 123-127.
[20] R. Changhong, C. Jun, Y. Liang, H. Pick, N. Balasubramanian, J. Sin, The partial silicon-on-insulator technology for RF power LDMOSFET devices and on-chip microinductors, Electron Devices, IEEE Transactions, Volume 49, Issue 12, Dec. 2002, pp. 2271-2278.
[21] M. Elwenspoek, Q. D. Nguyen, Characterisation of anisotropic etching in KOH using network etch rate function model: influence of an applied potential in terms of microscopic properties, Journal of Physics: Volume 34, 2006, pp. 1038-1043.
[22] C. R. Tellier, Anisotropic etching of silicon crystals in KOH solution: Part III Experimental and theoretical shapes for 3D structures micromachined in (hk0) plates, Journal of Materials Science, Volume 33, Number 1, 1998, pp. 117-131.
[23] S.J. Kwon, Y.M. Jeong, S.H. Jeong, Fabrication of high-aspect-ratio silicon nanostructures using near-field scanning optical lithography and silicon anisotropic wet-etching process, Applied Physics A: Materials Science & Processing, Volume 86, Number 1, 2007, pp. 11-18.


Alfonso Chacón-Rodríguez

Laboratorio de Componentes Electrónicos, Universidad Nacional de Mar del Plata, Argentina Email: alchacon@itcr.ac.cr

Pedro Julián¹

Instituto de Investigaciones en Ingeniería Eléctrica IIIE (UNS-CONICET) Departamento de Ingeniería Eléctrica y de Computadoras Universidad Nacional del Sur Avda. Alem 1253, (8000) Bahía Blanca Argentina Email: pjulian@uns.edu.ar

Abstract: Five pre-processing algorithms for the detection of firearm gunshots are statistically evaluated, using the Receiver Operating Characteristic method, as a preliminary feasibility metric for their implementation on a low power VLSI circuit.

[Fig. 1 plot: normalized amplitude versus time in seconds.]

I. INTRODUCTION

Detection, classification and localization of gunshots are of particular interest in areas related to public health, surveillance, law enforcement and the military. There is plenty of research regarding gunshot theory and the need for its study (see [1]-[4]), as well as many software and hardware implementations of computationally efficient signal processing analysis methods [5]-[10], [22]. These solutions mostly use complex algorithms such as short-time Fourier Transforms, Wavelet Transforms, Hidden Markov Models, Gaussian Mixtures and Maximum Likelihood Models, and claim to be very effective at detecting, classifying and localizing shots from different firearms. Yet such algorithms are expensive in terms of power due to their computation needs, which range from whole personal computer systems to mote-oriented sensor networks with dedicated DSP chips, making their deployment in the field cumbersome when not totally restricted to indoor use. One particular instance of interest is the establishment of a surveillance network against illegal hunting in tropical forest reserves. In such an environment, low power sensor networks provide a feasible solution, considering the large areas to be protected and the near impossibility of providing the sensors with standard long-lasting power supplies. Though there are some commercial solutions available in that area (see [4], [9], [11]), all of them entail the use of equipment and algorithms that claim to be efficient in terms of processing but not in terms of power dissipation, which calls into question the use of complex classification methods, at least in the early stages of detection. Regarding the complexities behind the detection and classification of gunshots and firearms, Maher offers a very thorough explanation of the physics of a gunshot in [1]. Gunshot sound is produced by two phenomena. First, the muzzle blast, which is produced by the rapidly expanding gases from the confined explosive charge that is used to propel the bullet out of the gun barrel. This acoustic disturbance lasts 3-5 milliseconds and propagates through the air at the speed of sound. Second,

Fig. 1. Typical time signature of a gunshot: 9 mm pistol at 30 m from the recording microphone. Multipath distortion is visible a few milliseconds after the first peak.

¹ P. Julián is also with CONICET, Av. Rivadavia 1917, Bs. As., Argentina.

if the bullet travels at supersonic speed, it causes an acoustic shock wave that propagates away from the bullet's path. The shock wave expands as a cone behind the bullet, with the wave front propagating outward at the speed of sound. A typical gunshot signature is shown in Fig. 1. The sound characteristics of any gunshot are thus determined by factors such as the caliber of the bullet and the barrel, the length of the latter and the chemical properties of the propellant. Besides, since a gunshot is a nearly perfect impulsive signal, any particular measurement of its spectral or impulsive characteristics will likely give more information about the acoustic surroundings (i.e., the acoustic impulse response) than about the firearm or the projectile characteristics [1]; the surroundings in turn depend on another set of factors such as temperature, wind speed, foliage density, air moisture and soil characteristics [12]. Attempts at detecting the N-shaped shock wave (as Sadler et al. report using a wavelet approach [7]) become difficult, as the wave rapidly loses its shape due to non-linear dispersion, or disappears altogether once the bullet's speed falls below supersonic speed or the bullet hits an obstacle, a possibility which is higher in such a dense setting as a tropical rain forest. On the other hand, looking


at the power spectra of three particular gunshots also gives an idea of the differences between firearms located at the same distance (Fig. 2), which discourages the use of a simple filtering method for the task of detection. Therefore, a prior evaluation of the efficiency of any detection algorithm and of the feasibility of its low power implementation becomes mandatory before proposing a particular solution. The paper is organized as follows: Section II depicts the typical basic detection architecture; Section III explains the algorithms to be evaluated; Section IV shows the analysis of the results; finally, Section V presents the conclusions.


A Receiver Operating Characteristic plot is to be obtained for each pre-processing method. According to signal detection theory [14], the ROC plot is constructed with the ordered pairs (TPR, FPR) of a detection system as a function of a certain detection threshold, where TPR stands for True Positive Rate and FPR for False Positive Rate, and each figure is defined according to

$$\mathrm{TPR} = \frac{\text{True positives detected}}{\text{Total number of positives}}, \qquad \mathrm{FPR} = \frac{\text{False positives detected}}{\text{Total number of negatives}} \qquad (1)$$

Fig. 2. Example of the power spectra for a .22 carbine, a 9 mm pistol and a .12 shotgun recorded at 30 m (three panels; vertical axis: energy).

AND

ROC

The proposed detection scheme is shown in Fig. 3 and is common in the field of biomedical engineering for the detection of neural spikes [13], but also in other applications involving detection and classification of impulsive audio events [5]. Detection is achieved by the comparison between a pre-processed version of the signal and an adaptive threshold, typically a running average or RMS estimation of the same pre-processed signal, scaled by a gain factor C.

Fig. 3. Block diagram with the blocks x(t), Pre-Processing, Running Average, gain C and Detection.
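The comparison against an adaptive threshold can be sketched in Python as follows. This is a minimal sketch, not the circuit or simulation code used by the authors: it assumes that the running average is taken over the previous `window` samples of the pre-processed signal (excluding the current one), and the names `detect`, `gain_c` and `window` are hypothetical.

```python
def detect(pre, gain_c, window):
    """Return one boolean per sample of the pre-processed signal `pre`.

    A sample is flagged when it exceeds gain_c times the running average
    of the previous `window` samples (assumed form of the threshold).
    """
    flags = []
    running_sum = 0.0
    for n, value in enumerate(pre):
        if n >= window:
            # running_sum holds pre[n - window] .. pre[n - 1] at this point.
            flags.append(value > gain_c * running_sum / window)
            running_sum -= pre[n - window]
        else:
            flags.append(False)  # not enough history for an average yet
        running_sum += value
    return flags


if __name__ == "__main__":
    # Illustrative data: flat background with one impulsive event.
    signal = [1.0] * 10 + [10.0] + [1.0] * 5
    print(detect(signal, gain_c=3.0, window=5))
```

A larger gain C lowers the false positive rate at the cost of missed detections, which is the trade-off the ROC analysis explores.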

A true positive is counted whenever a detection occurs within a few tens of samples of a real gunshot peak impulse. The evaluation is based on the ordered pair that lies closest (in terms of Euclidean distance) to a perfect detector with (TPR, FPR) = (1, 0), where FPR gives the x-axis coordinate and TPR gives the y-axis coordinate. Usually, effective detectors are chosen allowing for a certain percentage of FPR in order to increase the TPR, since a false positive can always be eliminated later on by the classification system, while a missed true positive is lost forever. In our case, nonetheless, a detector with a high FPR implies wasted power. Besides, it is assumed that the redundancy of the sensor network can compensate for a certain number of missed true positives. Thus, a sensor with a relatively low TPR may be acceptable for our purposes. Due to the intensive computation involved, a discrete ROC with 5 threshold values is calculated for each method and the best pair is extracted from the plot. The signals used in the evaluation are a collection of sounds recorded in a dense tropical rain forest, at a 48 kHz sampling rate with 32-bit quantization, on a high quality digital recorder, using a professional, high sensitivity, directional microphone. Amplitude is normalized to a maximum pressure of 110 dB SPL. The target samples include 5 firearms of different calibers, fired at 30 m, 90 m and 250 m from the recording equipment, at angles of 0°, 90° and 180°. Additional samples for negative validation include: a chain-saw recorded at 30 m from the equipment, at the same three angles as the firearms; two low-flying planes over the scene; three recordings of various birds singing; two recordings of rain showers; recordings of two different water streams; a recording of wind through the trees surrounding the setting; a Matlab-generated white noise signal with σ² = 0.1; and a male human voice recorded close to the microphone at a normal speech level. All the signals are pre-filtered using an IIR 4th order Butterworth low-pass filter with a cutoff frequency of 3 kHz (determined from the observation of the gunshots' power spectra), except those that are to be processed using wavelets, where the filtering is done by the pre-processing itself. In the case of the negative samples, signals are scaled to amplitude levels equivalent to sound pressures ranging between 90 dB SPL and 98 dB SPL (the typical pressure levels of gunshots at distances greater than 90 m from the gun barrel, in an obstacle-free propagation environment).
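The selection of the best operating point can be sketched as below. The sketch computes TPR and FPR from detection counts and picks, among a discrete set of thresholds, the pair closest in Euclidean distance to the perfect detector at FPR = 0, TPR = 1. Function names and the example numbers are hypothetical and do not come from the paper's data.

```python
import math


def rates(true_pos, false_neg, false_pos, true_neg):
    """Return (TPR, FPR) from the four detection counts."""
    tpr = true_pos / (true_pos + false_neg)    # positives detected / all positives
    fpr = false_pos / (false_pos + true_neg)   # false alarms / all negatives
    return tpr, fpr


def best_operating_point(points):
    """points: iterable of (threshold, tpr, fpr).

    Returns the entry closest to the ideal detector (FPR, TPR) = (0, 1).
    """
    return min(points, key=lambda p: math.hypot(p[2], 1.0 - p[1]))


if __name__ == "__main__":
    # Hypothetical 5-point discrete ROC, one entry per threshold gain C.
    roc = [(1.0, 0.95, 0.60), (2.0, 0.90, 0.30), (3.0, 0.80, 0.10),
           (4.0, 0.60, 0.05), (5.0, 0.40, 0.00)]
    print(best_operating_point(roc))
```

Since a high FPR means wasted power in this application, a weighted distance that penalizes FPR more than missed detections would be a natural variation, although the text only mentions the plain Euclidean distance.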


III. DESCRIPTION OF PRE-PROCESSING ALGORITHMS

The following methods are proposed alternatively in [5], [13], and [22]. Detection with no signal pre-processing is taken as the floor reference, as Obeid and Wolf do [13]. Implementation complexity is not taken into consideration in the comparison figures, yet an intuitive evaluation of the hardware feasibility of each method is offered.

A. Absolute Value

The absolute value of the input signal is taken before it is introduced into the detection scheme of Fig. 3. Since abs[x(t)] is a one-to-one mapping of the energy estimation of the signal (x²(t)), their respective performances are considered to be equivalent (as Obeid and Wolf argue in [13]), but the absolute value has a lower implementation complexity. Besides, such pre-processing can be performed by an analog circuit, with the corresponding power savings.

B. Median Filter

Data is fed through a median filter with a window size of 7 samples (3 samples before and 3 samples after the center of the window), with a 1 ms delay between window samples. The filter output is subtracted from the signal in the middle of the window; the result is considered as the normalized energy that enters the threshold unit (Fig. 4). Dufaux et al. [5] proposed this structure using a median filter with a window size of 20 samples, a 44.1 kHz sampling rate and 24-bit data resolution. No detail is offered in their paper about the delay between samples, which we assumed to be equal to the sampling period.
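The median-filter pre-processing can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: the energy estimate is abs[x(n)], the window taps are `tap_spacing` samples apart (so a 1 ms delay corresponds to tap_spacing = sampling rate / 1000), and indexes falling outside the signal are clamped to the nearest valid sample. All names are hypothetical.

```python
from statistics import median


def median_preprocess(x, half_window=3, tap_spacing=1):
    """Subtract the windowed median of abs(x) from the centre sample.

    The window has 2 * half_window + 1 taps (7 by default), spaced
    tap_spacing samples apart. Edges are handled by clamping the tap
    index, which is an assumption of this sketch.
    """
    energy = [abs(v) for v in x]
    last = len(energy) - 1
    out = []
    for n in range(len(energy)):
        taps = [energy[min(max(n + k * tap_spacing, 0), last)]
                for k in range(-half_window, half_window + 1)]
        out.append(energy[n] - median(taps))
    return out


if __name__ == "__main__":
    # Illustrative data: silence with a single negative impulse.
    signal = [0.0] * 10 + [-5.0] + [0.0] * 10
    print(median_preprocess(signal))
```

Because the median ignores isolated peaks inside the window, an impulsive event passes almost unchanged while slowly varying background levels are removed.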


for the analog shift register. The search window, nonetheless, must be limited in length, as this structure quickly degrades the signal [18]. By extensive simulation with the available gunshot data, the maximum difference between the peak of the signal and the normalized energy is searched for, using a window length as short as possible and a specific delay. From the resulting plots, it is possible to determine that a window size of 7 samples with a delay of 1 ms between samples, at a 7 kHz sampling rate, is adequate (Fig. 6 shows one of the plots used to determine these parameters: it is clear that beyond 1 ms, the improvement in the energy difference is not significant).

Fig. 5. Simplified Bucket Brigade Device used as an analog shift register (stages n = 1, 2, 3, ..., N along the data direction, with input x(t)). $\phi_1$ and $\phi_2$ are complementary phases of a bi-phase sampling clock.

Fig. 6. Example of search for the optimum delay: median filter with a search window of 7 samples. (Plot: normalized maximum energy and RMS versus time in seconds, window = 7, for a 9mm pistol, .32 revolver, .38 revolver, .12 shotgun and .22 carbine.)

Fig. 4. Median filter structure. Since there is a one-to-one relation between $x^2(t)$ and abs[x(t)], the second method is used, as it is easier to implement in analog or mixed-signal circuits. (Blocks: x(t), energy e(t) = abs[x(t)], analog register, running average, med[e(n)].)

C. Teager Energy Operator

The Teager Energy Operator (TEO), as defined in [19], is applied to the signal before feeding it to the threshold unit. This operator has the following discrete form:

$y(n) = x^2(n) - x(n-1)\,x(n+1)$ (2)

For our purposes, a digital implementation is not possible due to its high power requirements. A completely analog implementation as in [16] and [17] is constrained by the delays imposed by the search window, which are in the order of 1 ms. Typical analog delay chains are based on all-pass filters, which at best provide delays equivalent to a phase shift of up to $\pi$ radians of the input signal. Besides, delay constants in the order of hundreds of microseconds require RC values that are hard to achieve in standard CMOS processes. A mixed-signal approach is therefore a reasonable alternative. A bucket-brigade device (Fig. 5) is a good compromise solution


which, as reported by [19], enhances the high-frequency components of the input signal x(n), and is thus recommended for the detection of impulsive signals. One advantage of this method is that it can be implemented by an analog circuit, following the continuous equation also proposed in [19]:

$y(t) = \dot{x}^2(t) - x(t)\,\frac{d^2 x(t)}{dt^2}$ (3)
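As an illustration of the discrete operator, the following Python sketch computes $x^2(n) - x(n-1)\,x(n+1)$ for the interior samples of a sequence. The function name `teager_energy` is hypothetical, and setting the first and last outputs to zero is an assumption made here, since the operator is undefined at the edges.

```python
import math


def teager_energy(x):
    """Discrete TEO sketch: y[n] = x[n]**2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours, so they are
    left at 0.0 (an assumption, not specified in the text).
    """
    y = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        y[n] = x[n] ** 2 - x[n - 1] * x[n + 1]
    return y


# For a pure tone cos(w*n) the operator is constant and equal to sin(w)**2,
# so higher frequencies give a larger output for the same amplitude.
omega = 0.3
tone = [math.cos(omega * n) for n in range(50)]
tone_energy = teager_energy(tone)
```

The constant output for a pure tone grows with frequency, which is consistent with the high-frequency enhancement reported in [19].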


D. Correlation Against a Template

Detection and classification methods based on correlation matching are common in many fields, from brain-machine interfaces [13] to Ultra Wide Band receivers [20]. Simplified digital detectors based on correlation, with very low power dissipation, have been successfully built for particular applications [21], and mixed-signal general classifiers have also been proposed [18]. Here, a full-scale method is initially proposed (floating-point resolution, 48 kHz sampling rate) as a top metric for the evaluation of the methods' efficiency. At a later stage, simplifications such as the use of low-resolution integer arithmetic or a lower sampling rate may be introduced, in order to gauge the trade-off between the degraded efficiency of the method and its hardware feasibility.

the coefficient decomposition algorithm used in this kind of filter bank (see Bultheel [23]), a signal f(t) is decomposed into a sum of functions of the type

$f_j = \sum_k v_{jk}\,\varphi_{jk}, \qquad g_j = \sum_k w_{jk}\,\psi_{jk}$ (4)

where $\varphi(t)$ is a Riesz basis and $\psi(t)$ is an orthonormal basis (often referred to as the scale, or father, function and the wavelet, or mother, function), respectively spanning the spaces $V_n$ and $W_n$. The approximation and detail coefficients of a level n are related to those of the next level n + 1 by the equations:

$v_{nk} = \sum_l h_{l-2k}\,v_{n+1,l}, \qquad w_{nk} = \sum_l g_{l-2k}\,v_{n+1,l}$ (5)

These equations correspond to the respective application of filters with transfer functions $H(z) = \sum_k h_k z^{-k}$ and $G(z) = \sum_k g_k z^{-k}$, followed by sub-sampling, where $h_k = c_k/\sqrt{2}$ and $g_k = d_k/\sqrt{2}$, with $c_k$ and $d_k$ as the coefficients from the dilation equation

$\varphi(t) = \sum_k c_k\,\varphi(2t - k), \qquad \sum_k c_k = 2$ (6)

Fig. 7. (Block diagram: x(t), anti-alias filter, analog register, template, correlation $R_{xy}$, running RMS, detection.)

The structure of detection is shown in Fig. 7. First, two signal templates are obtained by averaging gunshot signals at 30 m and 90 m, as Obeid and Wolf propose [13]. The templates are stored in two 1000-sample vectors. The signal is fed through a window of the same size as the template vectors, at a rate of 39 samples per iteration. At each iteration, the correlation with each vector is computed and stored in another pair of vectors. These are the outputs of the system, which go to a threshold detector. Since correlation is a signed operation, the averaging is done using a running RMS scheme. In Edwards and Cauwenberghs' proposal of a mixed-mode classifier implementation of this algorithm [18], the computation is not done directly on the signal itself but on the features provided by a wavelet, cochlear or any other kind of pre-processing algorithm. The sampling vector used is a BBD structure and the correlation is done by analog current multiplication. Just as in the median filter case, the BBD cannot be extremely long. This entails a shortening of the template as well, and a reduction of the sampling frequency.

E. Discrete Wavelet Transform

Istrate et al. proposed in [22] the use of discrete analysis with a Daubechies wavelet of six vanishing moments for the detection of impulsive signals. In our case, we use a similar approach, but with an 8-level Haar wavelet filter bank on a signal sub-sampled at 7 kHz (Fig. 8). According to


Fig. 8. General structure of a wavelet decomposition filter bank. Signal details are given by the $w_n$ coefficients; the signal is approximated by the $v_n$ coefficients.

Fig. 9. Filter bank structure. The energy of each level's detail coefficients is calculated before feeding them to the energy sum.


The filter bank is structured following a dyadic scale using 3500 Hz as the Nyquist frequency, and it is fed with input sequences of 2048 samples. The number of decomposition levels and the choice of the coefficients of interest are the result of a preliminary analysis with the interactive wavelet toolbox from Matlab. The energy of the chosen coefficients is calculated and added. Various cases are evaluated: the best results are obtained considering levels 3, 4 and 5, and levels 4, 5 and 6. The output is then fed to the threshold detector, as in Fig. 9. The choice of Haar functions is based on their simpler form, which allows for a mixed-mode implementation using switched capacitors, for instance. A Haar scale function is basically a moving average operator with transfer function $H(z) = (1 + z^{-1})/2$, while its wavelet function is a moving difference operator with transfer function $G(z) = (1 - z^{-1})/2$.

Fig. 11. (ROC plot with both axes ranging from 0 to 1; curves for Gain = 15, 20, 25, 30 and 35.)

IV. EVALUATION RESULTS AND ALGORITHM RANKING

Detection is evaluated for different gains. An example of such evaluation is shown in Fig. 10 for the wavelet algorithm. With the results from the evaluation of the 45 positive samples, plus the 15 negative samples, ROCs were plotted for each method. An example of such plots is given in Fig. 11. Table I shows the ranking of the best pairs for each tested method, with the corresponding gain for the threshold unit (C).
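The ROC points behind this ranking are simple ratios over the 45 positive and 15 negative samples. The Python sketch below is only illustrative: the hit counts in the usage lines (39 or 40 detections, 2 or 3 false alarms) are assumptions inferred from rates quoted in this section, and `best_operating_point`, which picks the gain whose point is closest to the ideal corner (TPR = 1, FPR = 0), is a hypothetical selection rule, since the paper does not state how the best pair was chosen.

```python
import math


def roc_point(true_positives, false_positives, n_positive=45, n_negative=15):
    """Return (TPR, FPR) for one threshold gain."""
    return true_positives / n_positive, false_positives / n_negative


def best_operating_point(points):
    """Pick the gain whose (TPR, FPR) pair lies closest to the ideal corner.

    `points` maps gain -> (tpr, fpr). The Euclidean-distance criterion is an
    assumption made for this sketch, not the paper's stated rule.
    """
    return min(points, key=lambda g: math.hypot(1.0 - points[g][0], points[g][1]))


# Hypothetical counts for the median filter at two gains.
median_filter = {20: roc_point(40, 3), 25: roc_point(39, 2)}
chosen_gain = best_operating_point(median_filter)
```

With these assumed counts, the lower gain trades one extra detection for one extra false alarm, and the distance criterion prefers the higher gain.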

TABLE I

Method                        TPR      FPR      Threshold Gain C
Correlation, 30 m template    0.91     0        25
Correlation, 90 m template    0.91     0        25
DWT, coeffs. 3, 4, 5          0.89     0        80
DWT, coeffs. 4, 5, 6          0.87     0.07     70
Median filter                 0.8666   0.1333   25
Absolute value                0.8444   0.1333   15
TEO                           0.80     0.20     45
No pre-processing             0.77     0.1333   15

Fig. 10. Example of detection of a 12-caliber shotgun at 250 m from the detector, using the DWT pre-processing method at a 7 kHz sampling rate. (Panels: binary detection, levels 4, 5, 6; samples, level 4; Prep.wav signal at 250 m, Gun = S12c, Gain = 70.)

Correlations against templates at 30 m or at 90 m were the best pre-processing methods, yielding no false positives in the negative samples. Two false positives did occur, though, in the positive samples, that is, detections that occurred outside the sample window where the signal's peak impulse lies. Wavelet analysis using coefficients 3, 4 and 5 was second, with no false positives and a TPR of 0.89, very close to the correlation's TPR of 0.91. Third was also wavelet analysis, using the coefficient information from levels 4, 5 and 6, with a slight decrease in detection efficiency. A smoother wavelet would probably have given better results, yet with an increase in implementation complexity. Contrary


to the correlation method, with wavelet pre-processing there were no false positives in the positive samples. Median filter pre-processing came fourth. This method was particularly affected by false positives. If the gain is set at 20 instead of 25, its TPR equals that of the wavelet pre-processing, but with an FPR of 0.2. Nevertheless, analysis of the negative samples that produce some of these false positives shows strong pops in the recording (see for instance Fig. 12), which are probably caused by water drops hitting the microphone and generating an impulsive sound. An acoustic protection on the microphone could thus increase the TPR while decreasing the FPR. In any case, it is remarkable that the correlation and the wavelet algorithms are not fooled by these pops. The fifth method, which consists of just taking the absolute value of the signal, outperformed the TEO operator not only in its TPR but also in its FPR, which coincides with Obeid and Wolf's observations, which included even more refined versions of the latter method [13]. As in the median filter case, the effect of the pops in the negative samples is present in both methods, which means that a similar protection of the microphone may increase their performance. Not considering, for instance, the negative sample of the water stream brings the FPR down to 0.07 in the absolute value pre-processing, the same as in the second case of the wavelet method. No pre-processing, as


Fig. 12. False positive using the median filter on a water stream recording. Notice the pop in the sample that fools the algorithm. A higher gain in the threshold detector avoids the false positive, at the expense of losing some true positives. An acoustic protection on the microphone may be a simple solution for this false positive, without sacrificing the TPR. (Panels: Preprocessing Median: Stream 2; binary detection; horizontal axis: time in seconds.)

expected, gave the metric's floor. Surprisingly, even no pre-processing yielded a better FPR than the TEO operator.

V. CONCLUSIONS

Detection of impulsive signals can be implemented with a wide variety of effective algorithms. A ROC statistical metric has been proposed in order to rank them in terms of detection efficiency and, from the results obtained, some remarks have been made about their feasibility for VLSI integration. Clearly, correlation and wavelet-based detection algorithms give high performance at a higher hardware cost, but there exist good mixed-signal approaches to their VLSI implementation. A median filter approach may be as costly in hardware as the preceding methods, with inferior results. For that matter, just considering the absolute value of the signal, with a protected microphone, can offer a similar performance at a much lower hardware cost.

ACKNOWLEDGMENT

A. Chacón-Rodríguez is on leave from the Instituto Tecnológico de Costa Rica, on a scholarship funded by this institution and the Ministry of Science and Technology of Costa Rica. The authors thank Néstor Hernández Hostaller and Pablo Alvarado at the School of Electronics Engineering, Instituto Tecnológico de Costa Rica, for the high quality signal samples used in this case study. This work is funded by Project ANPCyT-PICT 2006 No. 1835, Project PGI-UNS 2006 No. 24/ZK17, and Project PIP 2005-2006 No. 5048 of CONICET.

REFERENCES

[1] R. C. Maher, "Modeling and Signal Processing of Acoustic Gunshot Recordings," in Proc. IEEE Signal Processing Society 12th DSP Workshop, September 2006, pp. 257-261.
[2] P. G. Weissler and M. T. Kobal, "Noise of police firearms," Journal of the Acoustical Society of America, vol. 56, no. 5, pp. 1515-1522, Nov. 1974.
[3] R. Stoughton, "Measurements of small-caliber ballistic shock waves in air," Journal of the Acoustical Society of America, vol. 102, no. 2, pt. 1, pp. 781-787, Aug. 1997.
[4] L. Green Mazerolle, C. Watkins, D. Rogan, and J. Frank, "Random Gunfire Problems and Gunshot Detection Systems," in National Institute of Justice: Research in Brief, U.S. Department of Justice, Office of Justice Programs, National Institute of Justice, http://www.ojp.usdoj.gov/nij, Dec. 1999.
[5] A. Dufaux, L. Besacier, M. Ansorge, and M. Pellandini, "Automatic Sound Detection and Recognition for Noisy Environment," in Proc. of the X European Signal Processing Conference, EUSIPCO 2000, http://citeseer.ist.psu.edu/besacier00automatic.html, 2000.
[6] K. Molnár, A. Lédeczi, L. Sujbert, and G. Péceli, "Muzzle Blast Detection Via Short Time Fourier Transform," 12th MiniSymposium 2005 of the Department of Measurement and Information Systems, Budapest University of Technology and Economics, http://home.mit.bme.hu/kmolnar/index.html, 2005.
[7] B. M. Sadler, L. C. Sadler, and T. Pham, "Optimal and Robust Shockwave detection and estimation," in Proc. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 97), vol. 3, pp. 1889-1892, 1997.
[8] B. G. Ferguson, L. G. Criswick, and K. W. Lo, "Locating far-field impulsive sound sources in air by triangulation," Journal of the Acoustical Society of America, vol. 111, no. 1, pt. 1, Jan. 2002.
[9] G. L. Duckworth, J. E. Barger, S. H. Carlson, D. C. Gilbert, M. L. Knack, J. Korn, and R. J. Mullen, "Fixed and wearable acoustic counter-sniper systems for law enforcement," in Proc. SPIE International Symposium on Enabling Technologies for Law Enforcement and Security Sensors, C3I, Information, and Training Technologies for Law Enforcement, November 1998, pp. 3575-3577.
[10] A. Lédeczi, P. Völgyesi, M. Maróti, G. Simon, G. Balogh, A. Nádas, B. Kusy, S. Dóra, and G. Pap, "Multiple Simultaneous Acoustic Source Localization in Urban Terrain," in Proc. Fourth International Symposium on Information Processing in Sensor Networks, IPSN 2005, April 2005, pp. 491-497.
[11] M. Zu, P. Su, R. Shi, W. Wang, and J. Yu, "AntiHunter. Intelligent Tracer of Hunting Activities," Lily Studio, University of Nanjing, June 2006.
[12] A. I. Tarrero Fernández, "Propagación del sonido en bosques. Análisis comparativo de las medidas in situ, en laboratorio y de los valores predichos por un modelo," Doctoral dissertation, Facultad de Ciencias, Universidad de Valladolid, 2002.
[13] I. Obeid and P. D. Wolf, "Evaluation of Spike-Detection Algorithms for a Brain Machine Interface Application," IEEE Transactions on Biomedical Engineering, vol. 51, no. 6, pp. 905-911, 2004.
[14] D. Heeger, "Signal Detection Theory," http://www.cns.nyu.edu/david/handouts/sdt/sdt.html, 1997.
[15] C. H. Hansen, "Fundamentals of Acoustics," Department of Mechanical Engineering, University of Adelaide, www.who.int/occupational_health/publications/noise1.pdf.
[16] A. Díaz-Sánchez, J. Ramírez-Angulo, A. Lopez-Martin, and E. Sánchez-Sinencio, "A Fully Parallel CMOS Analog Median Filter," IEEE Transactions on Circuits and Systems-II: Express Briefs, vol. 51, no. 3, pp. 116-123, March 2004.
[17] I. E. Opris and G. T. A. Kovacs, "A High-Speed Median Circuit," IEEE Journal of Solid-State Circuits, vol. 32, no. 6, pp. 905-908, June 1997.
[18] R. T. Edwards and G. Cauwenberghs, "Mixed-Mode Correlator for Micropower Acoustic Transient Classification," IEEE Journal of Solid-State Circuits, vol. 34, no. 10, pp. 1367-1372, Oct. 1999.
[19] S. Mukhopadhyay and G. C. Ray, "A New Interpretation of Nonlinear Energy Operator and Its Efficacy in Spike Detection," IEEE Transactions on Biomedical Engineering, vol. 45, no. 2, pp. 180-187, Feb. 1998.
[20] T. Kaiser et al., "Spatial aspects of UWB," in UWB Communication Systems, ed. M. G. Di Benedetto et al., New York: Hindawi, 2006, ch. 5, pp. 253-410.
[21] D. Goldberg, A. G. Andreou, P. Julián, P. O. Pouliquen, L. Riddle, and R., "VLSI Implementation of an Energy-Aware Wake-Up Detector for an Acoustic Surveillance Sensor Network," ACM Transactions on Sensor Networks, vol. 2, no. 4, pp. 594-611, 2006.
[22] D. Istrate, E. Castelli, M. Vacher, L. Besacier, and J. F. Serignat, "Information Extraction from Sound for Medical Telemonitoring," IEEE Transactions on Information Technology in Biomedicine, vol. 10, no. 2, pp. 264-274, Apr. 2006.
[23] A. Bultheel, "Wavelets with Applications in Signal and Image Processing," http://www.cs.kuleuven.be/ade/WWW/WAVE/contents.html, 2008.


J. A. Rodríguez, P. Julián, O. Lifschitz, O. Agamennoni

Instituto de Investigaciones en Ingeniería Eléctrica - IIIE (UNS-CONICET), Departamento de Ingeniería Eléctrica y de Computadoras, Universidad Nacional del Sur, Av. Alem 1253, (8000) Bahía Blanca

Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro No. 1, Tonantzintla, Puebla, México

Abstract: In this paper, we present a VLSI design for a piecewise linear (PWL) function evaluator. This design is based on a dedicated microprocessor architecture that allows reprogrammability not only in the function, but also in the dimension (n = 1, ..., 6). The design was developed using industry electronic design automation (EDA) tools and a standard 0.5 µm CMOS technology. Logic and analog simulations show the correct operation of the design.

I. INTRODUCTION

Piecewise linear (PWL) functions are a mathematical abstraction widely used in circuit theory, computer graphics, and system identification [2]. The evaluation of this type of function has been approached in different ways by diverse algorithms such as simplicial paths [3], comparator architectures [4], and, more recently, neural networks [5]. In this paper, a PWL computing system that can represent and evaluate $\mathbb{R}^6$ PWL functions using the hyperplane path algorithm is presented. In a previous paper [1], a first approach to the design of this system was presented; a methodology suitable for a digital realization and the steps to evaluate a given simplicial PWL function were introduced. In this paper, a VLSI architecture and the design of a full chip are described. A dedicated microprocessor, namely the PWLR6-µP, has been designed in order to execute the calculation of the PWL function with a high degree of flexibility. A micro-programmed control unit enables the setup of different configurations using the PWLR6 ISA (Instruction Set Architecture). Absolute and relative jump, ALU (Arithmetic Logic Unit), memory read and write, and register access instructions provide a rich environment to exploit the PWLR6-µP functionalities. Section II presents a description of the PWL evaluation problem followed by a brief explanation of the algorithm used to solve it. Section III describes the system-level architectural characteristics, giving a detailed explanation of the main components of the system and the communication protocols. Finally, section IV presents the VLSI implementation and the simulated results.

II. THE PWL FUNCTION

In this section, a brief description of the simplicial PWL formulation is given. For further details about this kind of PWL representation, the reader is referred to [8]-[10]. Let us consider a function domain subdivided with a simplicial partition, using an equally sized grid. In a given simplex of the domain, an n-dimensional PWL function can be expressed as a weighted sum:

$F(X) = \sum_{i=1}^{n+1} c_i\,\mu_i$ (1)

where n is the dimension of the domain, $\mu_i$ are internal parameters to be explained in section II-A, and $c_i$ are the values of the function at the vertices of the simplex where X is located. Figure 1 illustrates the geometrical representation of F(X) at an arbitrary simplex in a two-dimensional domain. Notice that the shaded triangular region over the simplicial partition indicates the simplex where X is found, and $\{c_1, c_2, c_3\}$ represents the set of values of F(X) at the simplex vertices.

Fig. 1. (Plot of F(x1, x2) over the (x1, x2) plane, showing the point X, the value F(X) and the vertex values c1, c2 and c3.)

A. PWL function evaluation

From Eq. (1), we observe that in order to calculate the value of F(X) at input X, $c_i$ and $\mu_i$ are required. While the set of $c_i$ values is known beforehand, the $\mu$-parameters must be computed. In order to do that, a sequential algorithm (hereinafter denoted as vertex addressing) is followed, which produces the decomposition of X by using the so-called hypercube path [6].


That type of decomposition is reported in [3], and it states that a point belonging to an n-dimensional simplicial domain can be decomposed as:

$X = \sum_{i=1}^{n+1} \mu_i\,v_i$ (2)

where $v_i$ are the vertices of each simplex. In order to clarify Eq. (2), let us consider an example.

Example 1: A - Let X = [2.1, 1.5] be a point in the $\mathbb{R}^2$ simplicial domain depicted in Fig. 2.

Fig. 3. (Hypercube path over the vertices v1, v2 and v3 in the (x1, x2) plane, with vertex values 2.6, 2.3 and 2.0.)

B. Algorithmic scheme for evaluating F(X)

The algorithm for the evaluation of F(X) is based on a sorted input data set to compute the $\mu_i$ parameters, and also includes a binary-format vertex addressing procedure. The evaluation scheme is summarized as follows.

1) Input-Data: Let $X = [x_1, x_2, \ldots, x_n]$ be an n-dimensional input to an $\mathbb{R}^n$ piecewise linear function.
2) Decomposition: Each variable $x_j$ (for j = 1, 2, ..., n) is decomposed into its integer and fractional parts, expressed by the notation $x_j = x_{int_j}.x_{frac_j}$.
3) Sorting: Let $X_{sorted} = [x_{s1}, x_{s2}, \ldots, x_{sn}]$ be a vector which includes the $x_{frac_j}$ elements sorted by the relation $x_{s1} \le x_{s2} \le \cdots \le x_{sn}$.
4) $\mu$-Computation: the $\mu_i$ parameters can be computed by

$\mu_1 = x_{s1}$ (5)
$\mu_{j+1} = x_{s(j+1)} - x_{sj}$ (6)
$\mu_{n+1} = 1 - x_{sn}$ (7)

Fig. 2. (Point X = [2.1, 1.5] in the (x1, x2) simplicial domain.)

This point can be decomposed as follows.

1) First, we decompose X into integer and fractional parts: $X_{integer} = [2, 1]$ and $X_{fractional} = [0.1, 0.5]$.
2) Then, we pick the minimum nonzero component of $X_{fractional}$ (i.e., 0.1), and decompose it as $X_{fractional} = 0.1\,[1, 1] + 0.4\,[0, 1]$. Notice that the second coefficient, 0.4, is obtained by the subtraction 0.4 = 0.5 - 0.1.
3) After that, $X_{fractional}$ is factored as $X_{fractional} = 0.1\,[1, 1] + 0.4\,[0, 1] + 0.5\,[0, 0]$, where the last coefficient in the sum, 0.5, is obtained by the complementary subtraction 0.5 = 1 - (0.1 + 0.4).
4) Finally, X can be expressed as

$X = [2, 1] + 0.1\,[1, 1] + 0.4\,[0, 1] + 0.5\,[0, 0]$ (3)

B - Let {2.6, 2.0, 2.3} be the set of values of the function at the vertices of the simplex where the point X = [2.1, 1.5] is located. The previous decomposition of X gives as a result $\mu_1 = 0.1$, $\mu_2 = 0.4$, and $\mu_3 = 0.5$. The point $X_{integer}$ indicates the vertex [0, 0] of the simplex where the point X = [2.1, 1.5] lies, and the $v_i$ code sequence [1, 1], [0, 1], [0, 0] defines the hyperplane path enclosing the $X_{fractional}$ location in the selected simplex, and therefore the $c_i$-$\mu_i$ correspondence. Figure 3 shows the hypercube path and the $c_i$-$\mu_i$ correspondence for the described simplex. From Eq. (1), the evaluation of F(X) at X = [2.1, 1.5] is obtained as follows.

$F(X) = 0.1\,(2.3) + 0.4\,(2.0) + 0.5\,(2.6)$ (4)
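Example 1 can be reproduced with a short Python sketch of the decomposition. It is a software illustration only, not the hardware datapath: the function names are hypothetical, and the dictionary that maps absolute vertex coordinates to function values (for instance (3, 2) for $X_{integer}$ + [1, 1]) is an assumed storage layout chosen for readability.

```python
import math


def simplicial_decomposition(x):
    """Return (integer parts, mu weights, vertex offsets) for a point x.

    Fractional parts are sorted in ascending order; the weights are the first
    sorted value, the successive differences, and one minus the last value.
    The offsets start at [1, ..., 1] and switch off one coordinate at a time,
    following the sorted order.
    """
    ints = [math.floor(v) for v in x]
    fracs = [v - i for v, i in zip(x, ints)]
    order = sorted(range(len(x)), key=lambda j: fracs[j])
    sorted_fracs = [fracs[j] for j in order]

    mus = [sorted_fracs[0]]
    mus += [b - a for a, b in zip(sorted_fracs, sorted_fracs[1:])]
    mus.append(1.0 - sorted_fracs[-1])

    offsets = []
    state = [1] * len(x)
    for k in range(len(x) + 1):
        offsets.append(tuple(state))
        if k < len(x):
            state[order[k]] = 0
    return ints, mus, offsets


def evaluate_pwl(x, vertex_values):
    """Weighted sum of the vertex values along the path.

    `vertex_values` maps absolute vertex coordinates to function values
    (an assumed layout for this sketch).
    """
    ints, mus, offsets = simplicial_decomposition(x)
    total = 0.0
    for mu, offset in zip(mus, offsets):
        vertex = tuple(i + o for i, o in zip(ints, offset))
        total += mu * vertex_values[vertex]
    return total


# Values of Example 1: [1, 1] -> 2.3, [0, 1] -> 2.0, [0, 0] -> 2.6,
# relative to the base vertex X_integer = [2, 1].
values = {(3, 2): 2.3, (2, 2): 2.0, (2, 1): 2.6}
result = evaluate_pwl([2.1, 1.5], values)
```

With the values of Example 1 the weighted sum is 0.23 + 0.80 + 1.30, so `result` is 2.33 up to floating-point rounding.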

for j = 1, 2, ..., n - 1.
5) Vertex addressing: In our design, the $c_i$ values are physically stored in a RAM, so a procedure for addressing the memory is required in order to select the $c_i$ that corresponds to any specific $\mu_i$ parameter. Data and addresses in a RAM are expressed as binary numbers; therefore the addressing procedure must also be defined in binary format.
a) Let $V = x_{int_n} \ldots x_{int_2}\,x_{int_1}$ be a binary number formed by the concatenation of the integer parts of all $x_i$ input variables, where i = {1, 2, ..., n}.
b) Consider that $S = S_n \ldots S_2\,S_1$ is a binary number (with the same word length as V) composed by the concatenation of n $S_i$-terms (each of them with the same length as $x_{int_i}$) whose value can be either 00...00 or 00...01.
c) Let $\sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$ be a sequence indicating the order of the i-th elements of X in $X_{sorted}$ (i.e., if $X_{sorted} = [x_{frac_3}, x_{frac_1}, x_{frac_2}]$ then $\sigma = \{3, 1, 2\}$). The memory address for a $c_i$ value


that corresponds to any specific $\mu_i$ is obtained by the following procedure:

turn-on $S_j$, for j = {1, 2, ..., n}
for k = 0 to n
    $DIR_k = V + S$
    turn-off $S_{\sigma_{k+1}}$ (while k < n)

The notations turn-on and turn-off are used to indicate the process of setting $S_j$ = 00...01 and $S_j$ = 00...00, respectively. It is important to notice that the set of (n + 1) RAM addresses selected for the $c_i$-terms constitutes the path in the n-dimensional hypercube.
6) F(X) evaluation: Equation (1) is evaluated in accordance with the $c_i$-$\mu_i$ correspondence given by the hypercube path.

C. Hardware realization

The hardware realization of the previous algorithm is implemented as follows. In the first place, the PWL function values at the simplicial vertices ($c_i$) are stored in a RAM. A convenient way to store these values is to use the concatenated binary value of the PWL function domain vertices as a physical memory address. This allows the system to scan the PWL function by following an ordered sequence related to the dimension of the function domain. In Fig. 4, a two-dimensional PWL function is depicted to illustrate the relationship between the vertices and the RAM addresses, here denoted by add = <x2 x1>. In the second place, digital structures are needed to perform the Input, Decomposition, Sorting, $\mu$-Computation, Vertex Addressing, and Evaluation steps. Most of this hardware is already described in [1], except for the vertex addressing. The hardware designed for this operation consists of one shift register, VRT, which stores the integer parts of $x_i$, and another register, RSX, which serves as the S binary number mentioned in section II-B.

Example 2: The vertex addressing procedure for X = {0.25, 2.75} (point i in Fig. 4) is the following:

VRT = [xint2, xint1] = [10, 00]
RSX = [01, 01]
ADD0 = [10, 00] + [01, 01] = [11, 01], Data0 = D
As $\sigma$ = {1, 2}, RSX = [01, 00] after turning off the RSX1 component (equivalent to $S_1$).
ADD1 = [10, 00] + [01, 00] = [11, 00], Data1 = E
Then RSX = [00, 00] after turning off the RSX2 component.
ADD2 = [10, 00] + [00, 00] = [10, 00], Data2 = F

III. PWLR6 SYSTEM

Although the algorithmic scheme described in the previous section is valid for an n-dimensional domain, the designed system is able to perform the F(X) evaluation for n = 1, 2, ..., 6. Since n = 6 is the maximum programmable dimension in our design, the system will be referred to hereinafter as the PWLR6 system. Its main component is the PWLR6-µP. A description

Fig. 4. RAM addressing. (Two-dimensional domain with x1 and x2 each coded from 00 to 11, RAM addresses <x2 x1> from 00 00 to 11 11, stored values labelled A to F, and the point i.)

of the designed datapath for the PWLR6-µP and an in-depth explanation of the operations involved in the PWL evaluation algorithm are given in [1]. In this work, a system-level description, the I/O architecture, the data memory subsystem, and the control unit for the PWLR6-µP are presented.

A. System Level

The PWLR6 system involves two main components: the PWLR6-µP, in charge of executing the evaluation program, and the data RAM, which stores the $c_i$ values of the PWL function, also called the PWL function image. The PWLR6 system was developed as a coprocessor architecture. It is intended to work with a master system that: (1) configures the PWL function image data; (2) configures the internal program memory with an evaluation program; and (3) sends different processing requests (Fig. 5). The three different tasks are described next.

Fig. 5. PWLR6 system. (Blocks: master, program channel and program I/O, XY channel and XY I/O, PWLR6-µP, memory bus, 16 Mb data memory.)

1) PWL function image data: the PWLR6 system requires the ci values to be stored in a 16 Mbyte data RAM (the use of a RAM memory provides the reprogrammability). This data


will be read by the evaluation program in order to compute the processing requests.
2) Evaluation programs: The PWLR6-µP can be programmed to perform different tasks, although it is optimized for PWL function evaluation. This programming capability enables the master to set configurations for $\mathbb{R}^1$ up to $\mathbb{R}^6$ PWL function evaluations. Other programs can be defined using the PWLR6's ISA (Instruction Set Architecture); e.g., the configure-data-RAM program receives the 16 Mbyte PWL function data and stores it in the data RAM.
3) Processing requests: Once the PWL function image is stored in the data RAM and the $\mathbb{R}^i$, i = {1, 2, ..., 6}, evaluation program is loaded for execution, the system is ready to compute the processing requests. The master inputs an X; the PWLR6-µP computes the function's value for this point and produces the output.

B. PWLR6-µP I/O and communication protocols

The PWLR6-µP has two main I/O ports: (1) XY I/O and (2) program I/O.
1) The XY I/O is used to send and receive 8/16/24-bit data. An 8-bit bidirectional bus, a 3×8 (3 blocks of 8 bits) shift register, a request bit and an acknowledge bit make up a communication channel with an asynchronous system. In the PWLR6 system, the master sends the 24-bit $x_i$ values that define the X where the evaluation must be computed and receives the 24-bit output through XY I/O (in both cases, using 3 send-receive transactions). The 8-bit PWL function image values are sent using this port too. The protocol to send 8-bit data works as described in Fig. 6. It can be seen from the figure that the REQ and ACK signals are used to synchronize the transaction. On the one hand, once the master establishes a datum to be sent, it will not overwrite this datum until it receives the acknowledgement; on the other hand, the PWLR6-µP will not read the same message twice, because it waits for the reset of the request signal, which depends on the conclusion of the previous transaction.


C. Data memory subsystem

As previously mentioned, the data memory subsystem addresses a 16 Mbyte RAM. For the highest programmable dimension, $\mathbb{R}^6$, a 4-bit integer part is defined for each $x_i$, as stated in [1]. This results in 16 divisions per dimension and $16^6$ values to define the PWL function, so the RAM requires a 24-bit address. Due to pin count limitations of the design, the memory addressing procedure is multiplexed using a structure similar to that of Intel's 8086 [7]. An internal 16-bit Data Address Memory Register (DAMR) outputs the 24-bit address in two cycles: the lower 16-bit part is stored in an external register (ALE) and the higher 8-bit part is stored in the DAMR's lower byte. The 8-bit addressed data is finally read into the DAMR's higher byte: RAM address = DAMR[7:0] & ALE. The data memory subsystem is controlled by using the read enable (ReadMem) and enable memory (EnMem) bits. Figure 7 shows a block description of this subsystem.
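The address arithmetic above can be checked with a small Python sketch. The function names are hypothetical, and the sketch assumes that the integer part of x1 occupies the least significant field of the address (consistent with the <x2 x1> notation) and that "&" in the text denotes concatenation of the high byte with the 16 latched low bits.

```python
def pack_address(int_parts, bits_per_dim=4):
    """Concatenate integer parts into one address.

    `int_parts` is [x_int_1, x_int_2, ...]; x_int_1 is placed in the least
    significant field (an assumption based on the <x2 x1> notation).
    """
    address = 0
    for position, value in enumerate(int_parts):
        if not 0 <= value < (1 << bits_per_dim):
            raise ValueError("integer part does not fit in its field")
        address |= value << (position * bits_per_dim)
    return address


def split_address(address):
    """Split a 24-bit address into (high 8 bits, low 16 bits).

    The low part models what is latched externally and the high part what is
    driven in the second cycle (a simplified view of the multiplexing).
    """
    return (address >> 16) & 0xFF, address & 0xFFFF


# Six dimensions with 4-bit integer parts span exactly 2**24 addresses.
full_scale = pack_address([15] * 6)
high_byte, low_word = split_address(full_scale)
```

With 2-bit fields, `pack_address([0, 2], bits_per_dim=2)` gives binary 1000, the value [10, 00] used for VRT in Example 2.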

Fig. 7. Memory subsystem. (Blocks: PWLR6-µP, MADR with address bits 15-0 and 23-16, ALE, and the RAM address and data ports.)

D. PWLR6-µP Control

The PWLR6-µP control is designed using a microprogrammed approach. An on-chip memory stores 256 (micro)program words. This memory is actually the program memory; no addressing capabilities are provided for external program memory. The control unit fetches (micro)instructions from this memory and decodes them to produce the control bits.

Fig. 6. XY I/O protocol. (Master side: wait for ACK = 0, establish DATA, set REQ = 1, wait for ACK = 1.)

2) Program I/O allows a program to be loaded into the PWLR6-µP. A bit line and a clock line establish a synchronous communication channel. The 20-bit program words are stored in a 20-bit shift register that is used during programming as input data for the internal program RAM.

Fig. 8. Control unit. (Blocks: program memory, fetch logic, MPC, MIR, decode logic, control bits.)

Program memory is addressed with an MPC (Micro Program Counter) and program words are stored in an MIR (Micro Instruction Register). Decode logic uses MIR data to


TABLE I. PWLR6 INSTRUCTION SET

Opcode  Mnemonic   Description
0       DIR CTRL   Set control bits 0 to 15
1       VRT SR38   Enable/shift for VRT and SR38
2       ALU OPS    ALU operation (CMP, ADD, SUB, MUX)
3       REG DIR    Register direct load
4       MADR HLA   MADR=Add[15:0]
5       MADR LA    MADR[7:0]=Add[23:16] & Set ALE bit
6       MEM BITS   Set ACK, EnMem or ReadMem bits
7       MADR IN    Data input in MADR higher byte
8       JMP RELB   Jump relative backward
9       JMP RELF   Jump relative forward
A       SET RSX    Control signals for register RSX
B       NOP        No operation
C       JMP ABS    Jump absolute
D       MADR OUT   Data output in MADR higher byte
E       SR48 CTRL  SR48 output register enable
F       SR48 CTRL  SR48 output register shift


constraints involved the definition of four clocks, the specification of false paths among clocks in order to separate clock domains, and the setting of driving and loading parameters for input and output ports. Pad cells were instantiated in the VHDL code in order to perform synthesis with pad timing and pad capacitance information. After synthesis, the gate level (GL) description with the Verilog standard cell definitions was simulated with ModelSim. Both RTL and GL simulations were performed using 8 data sets designed to cover most of the functionalities. After that, a formal verification between the RTL and the GL netlist was done in Synopsys's Formality.

evaluate the data path's control bits. The 256-word microcode may include absolute/relative jumps, memory RD/WR instructions, ALU operations and dedicated register instructions. Table I summarizes the instruction set architecture designed for the PWLR6-P. This instruction set lets the programmer use all of the hardware available inside the PWLR6-P. In order to gain flexibility, a special opcode 0000 is used to set the control bits (control bits 0 to 15) directly. This allows the programmer to move data between registers in an arbitrary way. Bus contention logic is included to avoid short circuits if programming errors occur. Figure 9 shows the 20-bit micro instruction format: the opcode is 4 bits wide, while the widths of the opcode-dependent bits and of the addressing bits depend on the actual micro instruction.
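As an illustration of the 20-bit format, the sketch below splits a micro instruction word into its opcode and the remaining bits. It assumes the 4-bit opcode occupies bits 19 to 16, as the bit labels of Fig. 9 suggest, and treats the other 16 bits as an uninterpreted payload because their layout depends on the instruction. The function name is hypothetical, and the mnemonics are copied from Table I as printed (including the repeated SR48 CTRL entry).

```python
# Mnemonics in opcode order (0x0 to 0xF), as listed in Table I.
MNEMONICS = [
    "DIR CTRL", "VRT SR38", "ALU OPS", "REG DIR",
    "MADR HLA", "MADR LA", "MEM BITS", "MADR IN",
    "JMP RELB", "JMP RELF", "SET RSX", "NOP",
    "JMP ABS", "MADR OUT", "SR48 CTRL", "SR48 CTRL",
]


def decode(word):
    """Return (mnemonic, payload) for a 20-bit micro instruction word.

    Assumption: opcode in bits 19..16, instruction-dependent payload in bits 15..0.
    """
    if not 0 <= word < 1 << 20:
        raise ValueError("micro instruction must fit in 20 bits")
    opcode = word >> 16
    payload = word & 0xFFFF
    return MNEMONICS[opcode], payload
```

For opcode 0000 the 16-bit payload would map naturally onto control bits 0 to 15, which is consistent with the description of the direct control instruction.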

Fig. 9. Micro instruction format (bit 19 down to bit 0): opcode, opcode-dependent parameters, addressing.

Fig. 10. Design flow.

A final characteristic that is also worth mentioning is that the PWLR6-P is a pipelined architecture. Fetch / Decode / Execute cycles are performed in parallel; dedicated logic was designed in order to insert wait cycles for jump instructions.

IV. IMPLEMENTATION AND SIMULATION

A. Logic Implementation

The PWLR6-P logic implementation was done following a standard cell VLSI design flow. The standard cell library used for this implementation was the AMI 05 OSU standard cell library [11]. In the first step of the design flow, the HDL (Hardware Description Language) description of the design was written in VHDL. A test bench was programmed including all the system level blocks, and the design was simulated at register transfer level (RTL) with Mentor's ModelSim. The second step consisted of performing the logic synthesis of the design with Synopsys's Design Compiler. The synthesis

B. Physical Implementation

The place and route was done using Cadence's Encounter. The Verilog output, including the synthesized clock tree, was simulated in ModelSim. A static timing analysis of the placed and routed design was performed using Synopsys's PrimeTime with the back-annotated parasitic capacitances of the nets. Finally, the full layout was generated in Cadence Virtuoso Layout Editor using the NCSU tech library (Fig. 11). A fast Spice simulation was run. In order to shorten simulation time, a special test bench was configured. A 20-instruction program using most of the logic (buses, ALU, registers, etc.) was simulated, and a stimulus file was generated in ModelSim for simulation in Cadence's Ultrasim simulator. Figures 12 and 13 show that both results, logic and fast analog simulation for this data, were consistent.


Fig. 13.

Juan Agustín Rodríguez has a (Type I) grant from CONICET. Víctor Manuel Jiménez is an Associated Type-C researcher at the National Institute for Astrophysics, Optics and Electronics (INAOE), Puebla, México. The authors would like to thank Alfonso Chacón Rodríguez for his support in EDA design flow and Dave Wilder from Synopsys Inc. for his support in Design Compiler.

REFERENCES

[1] V. M. Jimenez, J. A. Rodriguez, P. M. Julian, O. Agamennoni, M. Di Federico, "Digital architecture for R6 PWL function computation," in Proc. Arg. School of Micro Nanoelectronics, pp. 1-6, 2007.
[2] L. Castro, J. Figueroa, O. Agamennoni, "BIBO stability for NOE model structure using HL CPWL functions," in Proc. of Modelling, Identification, and Control, 2005.
[3] M. Chien and E. Kuh, "Solving nonlinear resistive networks using piecewise-linear analysis and simplicial subdivision," IEEE Transactions on Circuits and Systems, vol. CAS-24, no. 6, pp. 305-317, 1977.
[4] P. Mandolesi, P. Julián, and A. G. Andreou, "A scalable and programmable simplicial CNN digital pixel processor architecture," IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 51, pp. 988-996, 2004.
[5] X. Sun, S. Wang, "A Special Kind of Neural Networks: Continuous Piecewise Linear Functions," LNCS Advances in Neural Networks, pp. 375-379, 2005.
[6] J. P. Bowen, "Hypercubes," Practical Computing, 5(4), pp. 97-99, 1982.
[7] Intel Co., Microprocessor and Peripheral Handbook, Volume 1: Microprocessors, Intel, 1987.
[8] P. Julian, "A high level canonical piecewise-linear representation: theory and applications," Doctoral Thesis, Universidad Nacional del Sur, Argentina, 1998.
[9] P. Julián and O. Agamennoni, "High-level canonical piecewise linear representation using a simplicial partition," IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 46, pp. 463-480, 1999.
[10] P. Julián, A. Desages, and B. D'Amico, "Orthonormal high-level canonical PWL functions with applications to model reduction," IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 47, pp. 702-712, 2000.
[11] J. E. Stine, J. Grad, I. Castellanos, J. Blank, V. Dave, M. Prakash, N. Iliev, and N. Jachimiec, "A Framework for High-Level Synthesis of System-on-Chip Designs," Proc. of IEEE Int. Conf. Microelectronic Systems Education, pp. 11-12, 2005.

Fig. 12.

V. CONCLUSIONS

In this paper we have presented the design of a PWL function evaluation core, starting from the mathematical abstraction level and ending in a compact VLSI implementation. In order to accomplish this, it was necessary to perform an in-depth analysis of the PWL evaluation problem and the hyperplane path algorithm. A domain dimension of six was selected in order to provide flexibility to future users in a wide range of applications. From the digital design point of view, a design of medium complexity that includes many of the key characteristics of a microprocessor was implemented, including a communication interface to interact with other systems.

VI. ACKNOWLEDGMENT

This work was partially funded by projects PICT 2006 No. 1835 and PICT 2006 No. 1864 of ANPCyT, project PGIUNS 2006 No. 24/ZK17 and project PIP 2005-2006 No. 5048 of CONICET.


Ming-Dou Ker and Yuan-Wen Hsiao

Nanoelectronics and Gigascale Systems Laboratory Institute of Electronics, National Chiao-Tung University Hsinchu, Taiwan

Abstract: The impacts of charged-device-model (CDM) electrostatic discharge (ESD) events on integrated circuit (IC) products are presented in this paper. The mechanism of the chip-level CDM ESD event is introduced with some case studies on CDM ESD damage. Besides the chip-level CDM ESD event, the board-level CDM ESD event, which has been reported to cause damage in many customer-returned ICs, is also investigated in this work. The chip-level and board-level CDM ESD levels of several test devices and test circuits fabricated in CMOS processes are characterized and compared. The experimental results show that the board-level CDM ESD level of the test circuit is much lower than the chip-level CDM ESD level, which indicates that the board-level CDM ESD test is more critical than the chip-level CDM ESD test in field applications. In addition, failure analysis reveals that the failure on the test circuit under the board-level CDM ESD test is much more severe than that under the chip-level CDM ESD test.

effective ESD protection design against CDM ESD stresses has been increasingly requested by the IC industry. Besides the chip-level CDM ESD issue, the board-level CDM ESD issue has recently become more important, because it often damages ICs after they have been installed on the circuit board of an electronic system. For example, board-level CDM ESD events often occur during the module function test on the circuit board of an electronic system. Even though an IC has been designed with good chip-level ESD robustness, it can still be very weak in the board-level CDM ESD test. The reason is that the discharging current during the board-level CDM ESD event is significantly larger than that of the chip-level CDM ESD event. Several papers address the phenomenon of board-level CDM ESD events on real IC products [4], [5]. In these two previous works, ICs which had already passed the component-level ESD specifications were still returned by customers because of ESD failure. When the field-induced CDM ESD test was performed on ICs mounted on the printed circuit board (PCB), the resulting failure was the same as that found in the customer-returned ICs. This indicates that real-world charged-board-model (CBM) ESD damage can be duplicated by the board-level CDM ESD test. The previous works have demonstrated that board-level CDM ESD events do occur and should be taken into consideration for all IC products. In this paper, the CDM ESD issue in CMOS ICs is comprehensively addressed, including chip-level and board-level CDM ESD events. The mechanisms of chip-level and board-level CDM ESD events are introduced and compared. The chip-level and board-level CDM ESD tests are performed on some test devices and test circuits fabricated in CMOS processes. Moreover, failure analysis is also performed to investigate the difference between the failures under chip-level and board-level CDM ESD tests.

II. CHIP-LEVEL CDM ESD EVENT

I. INTRODUCTION

With the advances of CMOS processes, integrated circuits (ICs) have been fabricated with thinner gate oxides to achieve higher operation speed and lower power consumption. However, in field applications, electrostatic discharge (ESD) has not scaled down with CMOS technology. Thus, ESD protection design in nanoscale CMOS processes has become a challenging task. There are three component-level (also called chip-level) ESD test standards: the human body model (HBM) [1], the machine model (MM) [2], and the charged device model (CDM) [3]. The CDM ESD test is becoming more and more critical, because nanoscale CMOS devices are fabricated with thinner gate oxides, and more function blocks are integrated into a single chip for system-on-chip (SoC) applications with a larger die size. An IC with a larger die size can store more static charge in its body, which leads to a larger discharging current during CDM ESD events. In addition, a CDM ESD event has a huge peak current and a short duration, which make it more difficult to protect the internal circuits effectively against CDM ESD events. To provide efficient CDM ESD protection, the ESD protection device should turn on quickly and have high ESD robustness. Furthermore, CDM ESD current flows from the chip substrate to the external ground, whereas HBM and MM ESD currents are injected from the external ESD source into the zapped pin. As a result, CDM ESD events often cause internal damage to CMOS ICs. The aforementioned features make CDM ESD protection for CMOS ICs more challenging. Recently,

A. Mechanism of Chip-Level CDM ESD Event

During the assembly of IC products, static charges can be stored within the body of IC products due to induction or rubbing. Once a certain pin of the IC chip is suddenly grounded, the static charges originally stored within the IC body are discharged through the grounded pin. This is called the CDM ESD event and is shown in Fig. 1. The CDM ESD event causes a huge current (of ~10 A) in a very short time


period (of ~1 ns). There are many situations in which the pins of an IC can touch ground. For example, the pin may touch a grounded metallic surface or be touched by a grounded metallic tool, as shown in Fig. 2.

Different ICs have different die sizes, so their equivalent parasitic capacitances (CD) differ from one another. Thus, different ICs have different peak currents and robustness under CDM ESD tests. When a device under test (DUT) with an equivalent capacitance of 4 pF is subjected to a 1-kV CDM ESD test, the CDM ESD current rises to more than 15 A in several nanoseconds [6]. Compared with HBM and MM ESD events, the discharging current in a CDM ESD event is not only larger but also faster. Since the duration of a CDM ESD event is much shorter than that of HBM and MM ESD events, the IC may be damaged during a CDM ESD event before the ESD protection circuit is turned on. A capacitor becomes a low-impedance device as the signal frequency increases. Thus, CDM ESD current is most likely to flow through the capacitive structures of devices in ICs. In CMOS ICs, the gate oxides of MOS transistors are capacitive structures, so the gate oxide is most likely to be damaged under CDM ESD events. In nanoscale CMOS processes, the gate oxide becomes thinner, which makes the equivalent capacitance per unit area larger. Consequently, the thinner gate oxides of MOS transistors in nanoscale CMOS processes are more vulnerable to CDM ESD stresses. Besides, more functions are integrated into a single chip, which makes the die size larger. Under the same charged voltage, a larger capacitance stores more static charge, so the CDM ESD current is larger for an IC with a larger die size. Therefore, nanoscale CMOS ICs with larger die size and thinner gate oxide are very sensitive to ESD, especially CDM ESD events.

During the manufacturing of IC products, some of the steps have been reported to cause chip-level CDM ESD events, which lead to yield loss. Several works address the cause of chip-level CDM ESD events during the manufacturing of IC products [7]-[9]. In the packaging process of plastic-leaded-chip-carrier (PLCC) packages, the chips are induced to store static charges when they are carried by the carrier of the machine. When a certain pin of the charged chip is connected to external ground, a CDM ESD event occurs. To solve this problem, an ionizing air blower can be used in the manufacturing environment to neutralize the static charges stored in the chips and the machines [7]. An IC fabricated in a 0.8-µm CMOS process was reported to exhibit leakage current when it was normally biased, although it had worked well during the function test after fabrication. Failure analysis demonstrated that the gate oxide of the NMOS in the input buffer was damaged by a CDM ESD event. It was found that the socket of the IC tester was charged during the function test, which induced the tested IC to store static charges. After the function test, the charged IC was placed on a grounded metallic table, and a CDM ESD event occurred that damaged the IC which had passed the function test [8]. During the fabrication of ICs, separating the tape and the die after cutting the die from the wafer also causes substantial charge accumulation in the die. Measured with a Faraday cup, the CDM ESD voltage was reported to exceed 1000 V during the separation of the tape and die. Such a high CDM ESD voltage may damage the IC product [9].

Fig. 1. In CDM ESD event, the stored charges in the IC are quickly discharged through the grounded pin.

B. Case Study on Chip-Level CDM ESD Damage

The CDM ESD current path in an input buffer fabricated in a 0.8-µm CMOS process is shown in Fig. 3(a). This chip passes 2-kV HBM and 200-V MM ESD tests. Although this chip is equipped with an ESD protection circuit at its input pad, it is still damaged after a 1000-V CDM ESD test. As shown in Fig. 3(b), the failure point after the CDM ESD test is located at the gate oxide of the NMOS in the input buffer. For noise isolation between I/O cells and internal circuits, the VSS of the I/O cells (VSS_I/O) and the VSS of the internal circuits (VSS_Internal) are often separated in the chip layout. As a result, the ESD clamp device located at the input pad cannot effectively protect the gate oxide of the input buffer during CDM ESD stresses, because there is no connection between VSS_I/O and VSS_Internal. The CDM ESD current which damages the gate oxide of the NMOS in the input buffer is shown by the dashed line in Fig. 3(a). Fig. 4 is the failure picture of another IC after the CDM ESD test. This IC was fabricated in a 0.5-µm CMOS process. The scanning-electron-microscope (SEM) picture shows that the failure caused by the CDM ESD event is located at the poly gate of a MOS transistor in the internal circuit that is connected to an input pad through a metal connection.

In these two cases, the charges stored in the body of the chip still flow through the gate terminal of the input MOS transistor in the internal circuits and damage its gate oxide during CDM ESD stresses, even though an ESD protection circuit has been applied to the input pad. According to previous works, the pins around the corners of IC products suffer CDM ESD events more often, because the corner pins are usually the first to be touched by external ground during


Fig. 2. CDM ESD event may occur when (a) the pin touches grounded metallic surface, or (b) the pin is touched by grounded metallic tool.


transportation or assembly [10]. In addition to HBM and MM ESD protection, how to design efficient CDM ESD protection circuit for IC products is an important consideration in component-level ESD protection design.

the IC chip is attached to the PCB, C1 and C2 are shorted and the charges stored in the IC chip and the PCB are redistributed. Consequently, the voltages across C1 and C2 become equal, with the common value (C1 V1 + C2 V2) / (C1 + C2), after they are connected together. The instantaneous current during the attachment of the IC chip to the PCB increases with the initial voltage difference between the IC chip and the PCB. The instantaneous current during the charge redistribution may be larger than 10 A, which can easily damage the IC and cause a CDM-like failure. This is one example of a board-level CDM ESD event. Moreover, installing the modules into the system during the assembly of microelectronic products also causes board-level CDM ESD events. To mitigate this impact, an ionizing air blower can be used in the manufacturing environment to neutralize the static charges stored in the IC chips and PCBs.
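The charge redistribution expression above follows from conservation of charge and can be evaluated numerically. The sketch below is only illustrative: the function name is hypothetical, and it computes the final common voltage and the net charge moved onto C1, not the current waveform, which also depends on the resistance and inductance of the discharge path.

```python
def redistribute(c1, v1, c2, v2):
    """Return (final voltage, charge gained by C1) after C1 and C2 are connected.

    Capacitances in farads, voltages in volts, charge in coulombs.
    The total charge C1*V1 + C2*V2 is conserved.
    """
    if c1 <= 0 or c2 <= 0:
        raise ValueError("capacitances must be positive")
    v_final = (c1 * v1 + c2 * v2) / (c1 + c2)
    delta_q1 = c1 * (v_final - v1)   # negative when C1 loses charge
    return v_final, delta_q1


if __name__ == "__main__":
    # Hypothetical scenario: a 6.2 pF chip charged to 1000 V meets an uncharged 274 pF board.
    v, dq = redistribute(6.2e-12, 1000.0, 274e-12, 0.0)
    print(f"final voltage: {v:.2f} V, charge leaving the chip: {-dq * 1e9:.3f} nC")
```

Because C2 is usually much larger than C1, the final voltage stays close to V2, so nearly all of the initial voltage difference appears across the discharge path at the moment of contact.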

Fig. 3. (a) CDM ESD current path in an input buffer. (b) The failure point is located at the gate oxide of the input NMOS.

Fig. 5. The charges stored in the printed circuit board (PCB) and the IC chip will be redistributed when the IC chip is attached to the PCB.

Fig. 4. After chip-level CDM ESD test, the failure point is located at the gate oxide of an NMOS in the internal circuit.

III.

A. Mechanism of Board-Level CDM ESD Event In microelectronic systems, IC chips must be attached to the PCB. Before the attachment, static charges could be stored in the substrate of the chip or the metal traces on the dielectric layer in the PCB. During the attachment, the static charges originally stored in the IC chip or the PCB will be redistributed, as illustrated in Fig. 5. To illustrate the charge redistribution mechanism, two capacitors C1 and C2 are used to denote the parasitic capacitances of the IC chip and the PCB, respectively. Usually C2 is much larger than C1, because the size of PCB is much larger than that of the IC chip. The initial voltages across C1 and C2 are V1 and V2, respectively. C1 and C2 are not connected together in the beginning. When


After the IC chips are attached to the PCB, the module function test is performed. During the module function test, the I/O pins of the module are connected to the instruments. If static charges are stored in the module, a board-level CDM ESD event will occur and damage the IC chips on the PCB. Besides, a board-level CDM ESD event may also occur before the module function test when an I/O pin is connected to a cable whose other terminal is accidentally grounded. If the voltages across the equivalent capacitances of the chips and PCB are larger, more charges are stored, which leads to a larger discharging current. To solve this problem, ESD dischargers consisting of large resistances (~MΩ) can be used to ground the I/O pins of the module before the module function test. Although current still flows through the IC chips, the current peak can be significantly reduced by the large series resistance. As a result, the chip can be protected from being damaged by the board-level CDM ESD event during the module function test. In the assembly and testing of LCD monitors, board-level CDM ESD events may also occur. As shown in Fig. 6, when the driver ICs are attached to the LCD panel, charge transfer occurs, which causes board-level CDM ESD current to flow between the driver ICs and the LCD panel and damage them. Moreover, the driver IC can also be damaged by such board-level CDM ESD events when a certain pin of the driver IC on the panel is connected to ground during the panel function test. The charges stored in the LCD panel will be discharged through the pins of the driver ICs to the external ground during the


panel function test. The ESD current paths are shown by the dashed lines in Fig. 7. Since the on-glass thin-film transistors (TFTs) in the LCD panel have a higher operation voltage than most digital ICs, the core circuits and I/O cells of LCD driver ICs have different operation voltages. Such ICs with multiple power domains have individual power pads and ground pads for each power domain. Once the aforementioned board-level CDM ESD events occur, ESD current will flow from the LCD panel through the output pad of the driver IC into the driver IC. Although ESD protection circuits have been applied to each output pad of the driver IC to bypass ESD current to the power pad (VCC) or ground pad (VSS1) within the power domain, the interface circuits between different power domains are often damaged during such board-level CDM ESD events because the power pads or ground pads of different power domains are not connected. To solve this problem, ESD protection devices can be inserted between the power pads or ground pads of different power domains to provide ESD current paths between the separated power domains, as shown in Fig. 8 [11].

B. Case Study on Board-Level CDM ESD Damage

Recently, it has been reported that real-world CBM ESD damage is caused by the board-level CDM ESD event [4], [5]. In [5], an LCD driver IC had passed 4-kV HBM, 200-V MM, and 500-V CDM ESD tests, but it was still returned by the customer. Failure analysis showed that the ESD protection diode was damaged with a CDM-like failure. To verify the ESD damage, the board-level CDM ESD test was performed on the LCD driver IC. In the board-level CDM ESD test, the IC and the PCB on which the IC is mounted are both put on the charging plate of the conventional field-induced CDM ESD tester, as shown in Fig. 9. After the +1000-V board-level CDM ESD test, the LCD driver IC was damaged. Failure analysis showed that the IC after the board-level CDM ESD test exhibits the same failure as that found in the customer-returned IC, as shown in Fig. 10. This experiment demonstrated that performing the board-level CDM ESD test can successfully duplicate the failure in the customer-returned IC.

Fig. 6. During panel function test, connecting the pins of the driver IC to external ground will cause board-level CDM ESD event.

Fig. 9. The IC was attached to PCB and placed on the charging plate of field-induced CDM ESD tester to perform board-level CDM ESD test.

Fig. 10. SEM cross sectional pictures of the ESD protection diode in the (a) customer returned IC and (b) IC after +1000-V board-level CDM ESD test.

IV.

Fig. 7. During board-level CDM ESD event, ESD current flows from the LCD panel through the interface circuits of driver IC to the grounded pins.

Fig. 8. ESD protection devices are inserted between different power domains to provide ESD current paths between the separated power domains.

In this section, the board-level and chip-level CDM ESD tests are performed on several CMOS ICs. Two components are tested: a stand-alone gate-grounded NMOS (GGNMOS) and a 2.5-GHz high-speed receiver circuit. The equivalent capacitance of the PCB used in the board-level CDM ESD test is 274 pF. The main difference between the board-level and chip-level CDM ESD tests is that the IC and PCB are both charged in the board-level CDM ESD test, whereas only the IC is charged in the chip-level CDM ESD test. Since the equivalent capacitance of the PCB is significantly larger than that of the DUT, more charges are stored and discharged in the board-level CDM ESD test. Therefore, it is expected that the board-level CDM ESD test is more critical than the conventional chip-level CDM


ESD test. The measured chip-level and board-level CDM ESD levels of the different test components are compared. In addition, failure analysis has been performed to characterize the failure mechanism.

A. Test With Gate-Grounded NMOS

A GGNMOS fabricated in a 0.18-µm CMOS process was used as the DUT for the chip-level and board-level CDM ESD tests. The GGNMOS is a widely used ESD protection device in CMOS ICs. In a GGNMOS, the drain terminal is connected to the protected pad, whereas the gate, source, and bulk terminals are connected to the VSS power line of the IC. The equivalent capacitance of this GGNMOS in the IC package between its drain terminal and substrate is 6.2 pF. In the chip-level and board-level CDM ESD tests, the drain terminal of the GGNMOS is tested. Fig. 11(a) and (b) show the measured current waveforms under 1-kV chip-level and board-level CDM ESD tests, respectively. The peak currents under the chip-level and board-level CDM ESD tests are 11.04 A and 19.67 A, respectively. Under the same charged voltage, the peak discharging current under the board-level CDM ESD test is significantly larger than that under the chip-level CDM ESD test. Such a huge discharging current with a very short rise time can easily damage the GGNMOS.
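To put the reported numbers side by side, the following sketch computes the static charge Q = C V stored at 1 kV for the chip alone and for chip plus board, together with the ratio of the measured peak currents. It is a back-of-the-envelope illustration, not part of the reported measurements: adding the 6.2 pF and 274 pF capacitances as if both were charged to the same voltage is a simplifying assumption, and the function name is hypothetical.

```python
def stored_charge(capacitance_f, voltage_v):
    """Static charge Q = C * V in coulombs."""
    return capacitance_f * voltage_v


CHIP_C = 6.2e-12      # GGNMOS drain-to-substrate capacitance in package (from the text)
PCB_C = 274e-12       # equivalent PCB capacitance (from the text)
CHARGE_V = 1000.0     # charging voltage of the compared tests

q_chip = stored_charge(CHIP_C, CHARGE_V)
# Assumption: chip and board are both charged to the same voltage and simply add up.
q_board = stored_charge(CHIP_C + PCB_C, CHARGE_V)

charge_ratio = q_board / q_chip
peak_ratio = 19.67 / 11.04   # measured board-level / chip-level peak currents

if __name__ == "__main__":
    print(f"chip-level charge:  {q_chip * 1e9:.1f} nC")
    print(f"board-level charge: {q_board * 1e9:.1f} nC")
    print(f"charge ratio: {charge_ratio:.1f}, peak-current ratio: {peak_ratio:.2f}")
```

Under these assumptions the stored charge grows far more than the measured peak current does, which suggests that the peak is limited by the discharge path, a factor this sketch does not model.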

B. Test With 2.5-GHz High-Speed Receiver Interface Circuit

A 2.5-GHz differential high-speed receiver interface circuit fabricated in a 0.13-µm CMOS process was also verified with the chip-level and board-level CDM ESD tests. Fig. 12 shows the circuit schematic of the 2.5-GHz differential high-speed receiver interface circuit with on-chip ESD protection design. The receiver interface circuit has a differential input stage realized by PMOS transistors. The double-diode ESD protection scheme is applied to each differential input pad. Besides the ESD protection devices at the differential input pads, a power-rail ESD clamp circuit has been designed to provide an ESD current path between VDD and VSS. A P-type substrate-triggered silicon-controlled rectifier (PSTSCR) [12] was used in the power-rail ESD clamp circuit because SCR devices have been reported to achieve high ESD robustness with a small device size.

Fig. 12. Test circuit of 2.5-GHz high-speed receiver interface circuit for chip-level and board-level CDM ESD tests.

Fig. 11. Measured current waveforms of GGNMOS under (a) +1-kV chip-level CDM ESD test, and (b) +1-kV board-level CDM ESD test.

Because of the high-speed application, the dimensions of the ESD protection diodes at the input pads are limited to reduce the parasitic capacitance at the pads. Besides, the ESD protection devices and the inverter of the power-rail ESD clamp circuit were placed under the bonding pad to save chip area. A reference high-speed receiver interface circuit without on-chip ESD protection design was also fabricated in the same process to compare its ESD robustness. The tested pin under the chip-level and board-level CDM ESD tests is the Vin1 pad. The chip-level and board-level CDM ESD levels of the reference high-speed receiver interface circuit without ESD protection are quite poor: it failed at 100 V and 50 V, respectively. With the on-chip ESD protection circuits, the failure voltages under the chip-level and board-level CDM ESD tests are greatly improved to +2000 V/-1300 V and +1300 V/-900 V, respectively. Again, the board-level CDM ESD level is lower than the chip-level CDM ESD level. Failure analysis was performed on the ESD-protected high-speed receiver interface circuits after the chip-level CDM ESD test of -1300 V and the board-level CDM ESD test of -900 V. The SEM failure pictures after the chip-level and board-level CDM ESD tests are shown in Fig. 13(a) and (b), respectively. The failure points are located at the ESD diode DP1. Although the ESD protection devices are successfully turned on during the CDM ESD tests, the huge current during the tests still damages the ESD protection devices. According to the SEM failure pictures, the failure in Fig. 13(b) under the board-level CDM ESD test is much worse than that in Fig. 13(a) under the chip-level CDM ESD test. This again demonstrates that the board-level CDM ESD event is more critical than the chip-level CDM ESD event.


chip-level and board-level CDM ESD events, will become more critical and should be taken into consideration in ICs and microelectronic systems which are realized in nanoscale CMOS processes.

ACKNOWLEDGMENT

This work was supported by National Science Council (NSC), Taiwan, under Contract of NSC96-2221-E-009-182.

REFERENCES

[1] Electrostatic Discharge (ESD) Sensitivity Testing Human Body Model (HBM), 1997. EIA/JEDEC Standard EIA/JESD22-A114-A.
[2] Electrostatic Discharge (ESD) Sensitivity Testing Machine Model (MM), 1997. EIA/JEDEC Standard EIA/JESD22-A115-A.
[3] For electrostatic discharge sensitivity testing - Charged Device Model (CDM) - component level, ESD Association Standard Test Method ESD STM-5.3.1, 1999.
[4] A. Olney, B. Gifford, J. Guravage, and A. Righter, "Real-world charged board model (CBM) failures," in Proc. EOS/ESD Symp., 2003, pp. 34-43.
[5] C.-T. Hsu, J.-C. Tseng, Y.-L. Chen, F.-Y. Tsai, S.-H. Yu, P.-A. Chen, and M.-D. Ker, "Board level ESD of driver ICs on LCD panels," in Proc. IEEE Int. Reliab. Phys. Symp., 2007, pp. 590-591.
[6] L. Henry, J. Barth, H. Hyatt, T. Diep, and M. Stevens, "Charged device model metrology: limitations and problems," Microelectron. Reliab., vol. 42, no. 6, pp. 919-927, Jun. 2002.
[7] W. Tan, "Minimizing ESD hazards in IC test handlers and automatic trim/form machines," in Proc. EOS/ESD Symp., 1993, pp. 57-64.
[8] H. Sur, C. Jiang, and D. Josephs, "Identification of charged device ESD induced IC parameter degradation due to tester socket charging," in Proc. Int. Symp. for Testing and Failure Analysis, 1994, pp. 219-227.
[9] J. Bernier and G. Croft, "Die level CDM testing duplicates assembly operation failures," in Proc. EOS/ESD Symp., 1996, pp. 117-122.
[10] M. Tanaka and K. Okada, "CDM ESD test considered phenomena of division and reduction of high voltage discharge in the environment," in Proc. EOS/ESD Symp., 1996, pp. 54-61.
[11] M.-D. Ker, C.-Y. Chang, and Y.-S. Chang, "ESD protection design to overcome internal damages on interface circuits of a CMOS IC with multiple separated power pins," IEEE Trans. Components and Packaging Technologies, vol. 27, no. 3, pp. 445-451, Sep. 2004.
[12] M.-D. Ker and K.-C. Hsu, "Overview of on-chip electrostatic discharge protection design with SCR-based devices in CMOS integrated circuits," IEEE Trans. Device Mater. Reliab., vol. 5, no. 2, pp. 235-249, Jun. 2003.
[13] T. Maloney, "Designing MOS inputs and outputs to avoid oxide failure in the charged device model," in Proc. EOS/ESD Symp., 1988, pp. 220-227.
[14] M.-D. Ker, H.-C. Jiang, and J.-J. Peng, "ESD protection design and verification in a 0.35-µm CMOS ASIC library," in Proc. IEEE Int. ASIC/SOC Conf., 1999, pp. 262-266.
[15] M.-D. Ker, "Charged device mode ESD protection circuit," U.S. Patent 5901022, May 4, 1999.
[16] M.-D. Ker and C.-Y. Chang, "Charged device model electrostatic discharge protection for integrated circuits," U.S. Patent 6437407, Aug. 20, 2002.
[17] M.-D. Ker, H.-H. Chang, and W.-T. Wang, "CDM ESD protection design using deep N-well structure," U.S. Patent 6885529, Apr. 26, 2005.
[18] C. Brennan, S. Chang, M. Woo, K. Chatty, and R. Gauthier, "Implementation of diode and bipolar triggered SCRs for CDM robust ESD protection in 90nm CMOS ASICs," in Proc. EOS/ESD Symp., 2005, pp. 380-386.

(a)

(b) Fig. 13. SEM failure pictures on the failure points of the 2.5-GHz high-speed receiver interface circuit after (a) -1300-V chip-level CDM ESD test, and (b) -900-V board-level CDM ESD test.

V.

CONCLUSION

Both the chip-level and board-level CDM ESD issues in CMOS ICs have been comprehensively addressed in this work. The mechanisms of chip-level and board-level CDM ESD events are presented with some case studies. Then, the chiplevel and board-level CDM ESD tests are performed to several test devices and test circuits fabricated in 0.18-m and 0.13m CMOS processes. Measured results have shown that the board-level CDM ESD tests are more critical than the chiplevel CDM ESD tests. There were several designs reported for chip-level CDM ESD protection [13]-[18]. However, no design against board-level CDM ESD events is reported so far. In the nanaoscale CMOS processes, the gate-oxide becomes thinner, which degrades the CDM ESD robustness of CMOS ICs. In high-speed or radio-frequency (RF) applications, large ESD protection devices can not be applied to the I/O pad due to the limitation on parasitic capacitance, which further increases the difficulty on CDM ESD protection design. Moreover, the die size becomes larger in SoC applications, which indicates that more charges can be stored in the substrate of chip. Consequently, CDM ESD issues, including


Mustansir Y. Mukadam
Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY
mym3@cornell.edu

Alyssa B. Apsel
Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY
apsel@ece.cornell.edu

Abstract: This paper describes the design of an optical receiver analog front end for a low power, high speed, chip-to-chip or board-to-board communication system. The circuit has been designed in 80nm CMOS and consumes 2mW with a 1-V supply. Using active inductors, a transimpedance gain of 62.5 dBΩ, a limiting amplifier gain of 15dB, and an overall bandwidth of 9GHz were achieved. This work presents the highest gain-bandwidth product and lowest power consumption for optical front-end receivers in this process reported to date.

I. INTRODUCTION

The well known advantages offered by an optical medium in terms of latency, crosstalk, and bandwidth have generated recent interest in using optical interconnects as a replacement for high speed, short haul electrical wires [1]. New applications being explored include chip-to-chip and on-chip communication channels or busses in microprocessors and routers. To compete with purely electrical ICs, optoelectronic ICs must exhibit not only high speed and high gain but also low power consumption, so that short distance optical communication becomes an attractive alternative.

The front end of an optical receiver generally consumes a high percentage of the total power in a short distance optical link. It consists of a photodiode coupled with a transimpedance amplifier and a limiting amplifier. This configuration, shown in Fig. 1, converts information from the optical domain to the electrical domain. Depending on the application, further data processing, such as clock recovery, may be performed on the data.

This paper describes the design of a low power front end for an optical receiver comprising a transimpedance amplifier (TIA) and a cascade of gain stages which realize a limiting amplifier (LA), using IBM's CMOS9SF process. This circuit may be used in the first block of such a system to improve the efficiency of high speed optical communication. The front end achieves high speed and high gain without the use of passive inductors, which enables a compact design while maintaining high performance and low power operation.

II.

A. Implementation

The TIA topology shown in Fig. 2 uses a basic common-gate transimpedance amplifier with regulated cascode feedback. A single-ended topology was preferred over a differential one to enable low power operation and to avoid the need for an additional capacitor at the negative input to balance the photodiode capacitance. Unlike a conventional regulated cascode TIA [3], where node Vy would require two gate-source voltages to keep transistor M3 in saturation, the introduction of the common-gate transistor M2 allows Vy to be biased at one gate-source and one drain-source voltage drop, as demonstrated in [2]. This is crucial for proper biasing of all transistors, since the supply voltage is only 1V for the 80nm process, which limits headroom.

Transistor M1 is implemented as a gm-boosted amplifier and provides the feedback path. A fraction of the input resistance of the TIA is accounted for by 1/gm of M1. To prevent the input pole from being the dominant pole, gm of M1 should be made large. This could be accomplished either by increasing the size of M1 or by increasing the current through it, but the former would increase the capacitance at the output node and deteriorate the TIA bandwidth, while the latter would result in more power consumption. One solution is to use transistors M2 and M3 as gain stages in the feed-forward path. This increases the effective gm of M1 by a factor of the combined feed-forward gain, thus lowering the input resistance of the TIA and pushing the input pole to a higher frequency.

Power consumption for the TIA is adjusted by controlling the current supplied by the current source transistor M4. Also, the feed-forward gain configuration allows for a large gm for M1 while keeping transistor sizes, and thus the current, small. This makes the TIA suitable for low power applications.

B. Transimpedance Gain

A small signal analysis of the circuit gives the following intermediate gain equations:

$$A_2 = \frac{V_x}{V_z} \approx \frac{g_{mM2} R_x}{1 + s C_x R_x} \qquad (1)$$

$$A_3 = \frac{V_y}{V_x} \approx -\frac{g_{mM3} R_y}{1 + s C_y R_y} \qquad (2)$$

Cx is approximated as Cg3 + Cd2. Cy is approximated as Cg1 + Cd3. Rx and Ry are the resistances at node x and y, respectively. Solving the circuit equation, the loop gain of the TIA is given by

$$A = \frac{V_o}{V_z} \qquad (3)$$

where Co = Cnext-stage + Cd1 and Ro is the total resistance at the output node, which is the parallel combination of the active inductor and cascade of M1 and M4. The input impedance of the TIA is given as:

$$R_{in} = \frac{1}{g_{mM1}(1 - A_2 A_3)} \,\Big\|\, \frac{1}{g_{mM2}} \,\Big\|\, \frac{1}{g_{dsM1}} \qquad (4)$$

The capacitance of the photodiode at the input node is assumed to be the dominant capacitance, thereby giving Cin = Cdiode.

$$Z_{in} = \frac{V_z}{I_{in}} \approx \frac{1}{g_{mM1}(1 - A_2 A_3) + s C_{diode}} \qquad (6)$$

(6) shows the increase in transconductance of transistor M1 by a factor of the feed-forward gain. This allows the use of smaller transistors to reduce power consumption and increase bandwidth due to the reduced capacitance associated with each transistor. The transimpedance gain of the circuit is given as:

$$Z_{TIA} = \frac{V_o}{I_{in}} \approx \frac{R_o}{1 + \dfrac{s C_{diode}}{g_{mM1}(1 - A_2 A_3)}} \qquad (7)$$

From (7) it is evident that the TIA is represented by a 4 pole system with a transimpedance gain of Ro at low frequencies. A direct trade-off between gain and bandwidth is observed, since ZTIA is inversely proportional to the bandwidth.

III. LIMITING AMPLIFIER

A. Implementation

The limiting amplifier is used to boost the low amplitude, high speed signal from the TIA to logic levels; therefore it requires high gain and high bandwidth. Since the feed-forward stage of transistors M2 and M3 in Fig. 2 provides substantial voltage gain, the topology of the TIA was used as one gain cell of the LA, with the modification that the voltage input to the gain cell is at the gate of transistor M4 (at Va) instead of at the drain. Also, since the TIA is able to support a signal at 10Gbps, using the same topology for the gain cell ensures that the front end is not bandwidth limited.

B. Voltage Gain

Performing a small signal analysis for the gain of a single cell of the LA gives the following equations:

$$A_4 = \frac{V_z}{V_a} = \frac{g_{mM4} R_z}{1 + s C_z R_z} \qquad (8)$$

Rz is obtained from (4). Cz is approximated as Cd4 + Cs1 + Cs2.

$$A_1 = \frac{V_o}{V_y} = \frac{g_{mM1} R_o}{1 + s C_o R_o} \qquad (9)$$


where A2 and A3 are obtained from (1) and (2). The gain cell is a 4 pole system with inverting gain. In this work, two cascaded gain cells were designed to realize the LA.

In the next section, the use of active inductors as load elements for an increase in both gain and bandwidth is described.
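The bandwidth cost of cascading gain cells, which the paper quantifies in Section V with expression (12) taken from [10], can be illustrated numerically. The following is a minimal sketch, assuming identical, non-interacting first-order stages (the actual gain cell is a multi-pole circuit, so the numbers are only indicative). The function name and the 10 GHz per-stage bandwidth are hypothetical choices for illustration, not values from the design.

```python
import math


def cascaded_bandwidth(bw_stage_hz: float, n_stages: int) -> float:
    """-3 dB bandwidth of n identical first-order stages in cascade.

    Implements BW_system = BW_stage * sqrt(2**(1/N) - 1), the form of
    expression (12); it assumes every stage has the same single pole.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    return bw_stage_hz * math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)


if __name__ == "__main__":
    bw_stage = 10e9  # hypothetical per-stage bandwidth, in Hz
    for n in range(1, 5):
        bw = cascaded_bandwidth(bw_stage, n)
        print(f"{n} stage(s): {bw / 1e9:.2f} GHz")
```

Under these assumptions, every added stage reduces the overall bandwidth, which is the reason the paper prefers a high-gain TIA followed by only a few LA stages.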

IV.

Active inductors were designed for the loads of each amplifying stage in Fig. 2. The advantages of using an active inductor are its tunability and its considerably reduced chip area. By using a transistor as part of the design, its parasitic capacitances are exploited to position the poles and zeros to emulate inductive behavior [4]. The gate-source capacitance of the transistor was used in this case. The active inductor topology used is shown in Fig. 3. It consists of a common source FET with resistive feedback [5]. The resistive divider of R1 and R2 was used to allow for easy biasing of the gate of the PMOS and therefore ease headroom requirements. Combining all other resistances and capacitances seen at the drain of the PMOS in parallel with the active inductor as Ri and Ci respectively, the impedance of the active inductor is given by:

$$Z_{out} = \frac{R_1 + R_2 + s C_{gsMin} R_1 R_2}{1 + g_{mMin} R_1 + s C_{gsMin} R_1} \,\Big\|\, \frac{R_i}{1 + s C_i R_i} \qquad (11)$$

At the frequency extremes,

$$|Z_{out}(s = 0)| = R_i \,\Big\|\, \frac{R_1 + R_2}{1 + g_{mMin} R_1} \quad \text{and} \quad |Z_{out}(s = \infty)| = R_2 .$$

Therefore, if

$$R_2 \gg \frac{R_1 + R_2}{1 + g_{mMin} R_1}$$

the impedance rises with frequency and the load behaves inductively. This constraint had to be considered when designing the block for high gain and inductive behavior for each intermediate gain stage. The active inductor exhibits a zero at

$$\omega_{zero} = \frac{R_1 + R_2}{C_{gsMin} R_1 R_2}$$

and two poles which are obtained from the solution of the second order equation:

$$s^2 C_{gsMin} C_i R_1 R_2 R_i + s \left[ C_i R_i (R_1 + R_2) + C_{gsMin} R_1 (R_i + R_2) \right] + R_1 + R_2 + R_i + g_{mMin} R_1 R_i = 0$$

The poles and zeros of each active inductor were established by the method of pole-zero cancellation used to extend the bandwidth for the circuit, as demonstrated in [6]. By using the equations for the poles and zero of an active inductor, the active inductor elements were sized such that the zero associated with the active inductor of gain stage A2 (associated with transistor M2) is used to cancel the lower frequency pole associated with the active inductor of gain stage A3 (associated with transistor M3). The same was done for the zero associated with gain stage A3 and the pole of gain stage M1. This ensured that only those poles that occurred beyond the cutoff frequency required for 10Gbps operation remained in the circuit. A concern with pole-zero cancellation for general systems is that this compensation technique can potentially place a pole in the right half plane. For this circuit, however, both poles obtained from the second order equation lie in the left half plane, since all coefficients of the equation, including that of s, are positive. This ensures system stability.

V. SIMULATION RESULTS

The optical receiver was designed using IBM's CMOS9SF process. The photodiode was modeled as a current source in parallel with a 40fF capacitor. The circuit was simulated with a 2^31-1 PRBS current input at 10Gbps. Circuit simulation was carried out in the Cadence design environment. Simulation waveforms are shown in Fig. 4, 5, and 6. Tables I and II list the performance parameters of the TIA and LA, respectively. Table III compares this work with other high speed optical front end designs in submicron CMOS.

TABLE I. KEY PERFORMANCE PARAMETERS OF THE TIA

Performance Parameter           Simulation result
Transimpedance gain             62.5 dBΩ
Bandwidth                       11.5 GHz
Power consumption               0.86 mW
Dynamic Range                   5 µA - 100 µA
Input Referred Noise @ 10GHz    28.8 pA/√Hz

TABLE II. KEY PERFORMANCE PARAMETERS OF THE LA

Performance Parameter           Simulation result
Gain                            15 dB
Bandwidth                       10 GHz
Power consumption               1.21 mW

TABLE III. COMPARISON OF HIGH SPEED FRONT END DESIGNS

Process                         Blocks      Total Power   Gain     BW
80 nm CMOS [7]                  TIA         6.5 mW        178      19 GHz
0.18 µm CMOS [9]                TIA + LA    103.2 mW      31400    6.84 GHz
80 nm CMOS [2]                  TIA         2.2 mW        400      20 GHz
0.12 µm CMOS [8]                TIA + LA    10 mW         1900     10Gbps*
80 nm CMOS (this work **)       TIA + LA    2 mW          6600     9 GHz

* Bandwidth not reported
** Simulation results.
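The active-inductor expressions of Section IV (the impedance (11), its zero, and the second order pole equation) lend themselves to a quick numerical check. The sketch below is only an illustration under stated assumptions: the component values are hypothetical (the paper does not list R1, R2, Ri, Ci, CgsMin or gmMin), and the function names are invented for this example.

```python
import cmath


def z_inductor(s, r1, r2, cgs, gm):
    """First factor of (11): impedance of the active inductor alone."""
    return (r1 + r2 + s * cgs * r1 * r2) / (1 + gm * r1 + s * cgs * r1)


def z_out(s, r1, r2, ri, ci, cgs, gm):
    """Impedance (11): active inductor in parallel with Ri and Ci."""
    z_ind = z_inductor(s, r1, r2, cgs, gm)
    z_par = ri / (1 + s * ci * ri)
    return z_ind * z_par / (z_ind + z_par)


def zero_and_poles(r1, r2, ri, ci, cgs, gm):
    """Zero and the two poles (rad/s) from the expressions in Section IV."""
    zero = -(r1 + r2) / (cgs * r1 * r2)
    a = cgs * ci * r1 * r2 * ri
    b = ci * ri * (r1 + r2) + cgs * r1 * (ri + r2)
    c = r1 + r2 + ri + gm * r1 * ri
    root = cmath.sqrt(b * b - 4 * a * c)  # complex sqrt handles conjugate poles
    return zero, ((-b + root) / (2 * a), (-b - root) / (2 * a))


if __name__ == "__main__":
    # Hypothetical values, chosen so that R2 >> (R1 + R2) / (1 + gm * R1).
    params = dict(r1=2e3, r2=5e3, ri=10e3, ci=10e-15, cgs=20e-15, gm=2e-3)
    zero, poles = zero_and_poles(**params)
    print("DC impedance [ohm]:", z_out(0, **params))
    print("zero [rad/s]:", zero)
    print("poles [rad/s]:", poles)
```

Because every coefficient of the quadratic is positive for positive component values, both poles returned by `zero_and_poles` have negative real parts, which is the stability argument made in the text.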


Figure 4: Eye diagram of output waveform at 10Gbps

Figure 5: Simulated AC gain of front end (horizontal axis: frequency in Hz)
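The eye diagram is obtained with a 2^31-1 PRBS stimulus. The paper does not state which generator polynomial was used; the sketch below assumes the widely used x^31 + x^28 + 1 polynomial and a Fibonacci-style shift register, so it should be read as one plausible way to produce such a bit stream rather than as the authors' test bench. The function name and default seed are hypothetical.

```python
def prbs31_bits(n_bits, seed=0x7FFFFFFF):
    """Yield n_bits of a PRBS31 sequence.

    Assumed polynomial: x^31 + x^28 + 1. The 31-bit state must never be
    zero, otherwise the register locks up and outputs only zeros.
    """
    state = seed & 0x7FFFFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    for _ in range(n_bits):
        # XOR of register taps 31 and 28 (bit indices 30 and 27).
        new_bit = ((state >> 30) ^ (state >> 27)) & 1
        state = ((state << 1) | new_bit) & 0x7FFFFFFF
        yield new_bit


if __name__ == "__main__":
    bits = list(prbs31_bits(64))
    print("".join(str(b) for b in bits))
```

Each generated bit would then be mapped to a current level within the dynamic range of the TIA before being applied to the photodiode model.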

The dynamic range was measured as the range of input current amplitudes over which the transimpedance gain of the TIA remains relatively constant at 62.5 dBΩ. A Monte Carlo simulation was carried out to estimate the variation in the performance parameters of the front end over process. Over 100 process runs, the standard deviation over mean for the gain of the TIA was calculated as 1.5%. Standard deviation over mean for the DC power consumption of the TIA and LA over 100 runs was 9.1% and 7.3%, respectively.

This work shows a lower bandwidth than competing designs in 80nm CMOS. This is because no passive inductors were used, in order to conserve area. By designing the TIA for large transimpedance gain, fewer gain stages are required in the LA for substantial gain from the front end. The benefit of using fewer gain stages is seen when the following expression for system bandwidth [10] is considered:

$$BW_{3dB\_system} = BW_{3dB\_one\_stage} \sqrt{2^{1/N} - 1} \qquad (12)$$

This shows that limiting the number of cascaded amplifying stages allows for operation over a wider frequency range. Another concern is the system dynamic range. Since the feed-forward circuit has a limited dynamic range, there is a maximum allowable input voltage to the LA before transistors M1 and M2 are driven out of saturation, degrading the gain. Automatic gain control circuits are being investigated to overcome this issue [10].

VI. CONCLUSION

An analog optical front end receiver designed in 80nm CMOS for high speed, low power applications has been described. The common gate feed-forward topology reduces the constraints on voltage headroom and decreases the input resistance, pushing the input pole to a higher frequency. The large feed-forward gain allows for the use of smaller sized transistors, reducing the overall power consumption of the TIA and LA. The use of active inductors allows for greater gain and bandwidth extension and, along with a single-ended front end design, conserves area. Even without the use of passive inductors, to the authors' knowledge, the front end has the highest transimpedance gain-bandwidth product and lowest power consumption of optical receiver front ends in 80nm CMOS technology.

REFERENCES

[1] D. Miller, "Rationale and challenges for optical interconnects to electronic chips," Proc. IEEE, vol. 88, pp. 728-749, June 2000.
[2] G. Kromer, G. Sialm, T. Morf, M. L. Schmatz, F. Ellinger, D. Erni, H. Jackel, "A Low-Power 20-GHz 52-dBΩ Transimpedance Amplifier in 80-nm CMOS," IEEE J. Solid State Circuits, vol. 39, no. 6, June 2004.
[3] S. M. Park, H. Yoo, "2.5 Gbit/s CMOS transimpedance amplifier for optical communication applications," IEE Electronics Letters, vol. 39, issue 2, 23 January 2003.
[4] A. Thanachayanont, A. Payne, "VHF CMOS integrated active inductor," IEE Electronics Letters, vol. 32, issue 11, 23 May 1996.
[5] Y. Cho, S. Hong, Y. Kwon, "A Novel Active Inductor and Its Application to Inductance-Controlled Oscillator," IEEE Trans. on Microwave Theory and Techniques, vol. 45, no. 8, August 1997.
[6] W. Sansen, Z. Y. Chang, "Feedforward Compensation Techniques for High-Frequency CMOS Amplifiers," IEEE J. Solid State Circuits, vol. 25, no. 6, December 1990.
[7] M. Kossel, C. Menolfi, T. Morf, M. Schmatz, T. Toifl, "Wideband CMOS transimpedance amplifier," IEE Electronics Letters, vol. 39, issue 7, 3 April 2003.
[8] D. Guckenberger, J.D. Schaub, D. Kucharski, K.T. Kornegay, "1V, 10mW, 10Gb/s CMOS Optical Receiver Front-End," IEEE RFIC Symposium, pp. 309-312, 12-14 June 2005.
[9] M. Maadani, M. Atarodi, "A Low-Area, 0.18 µm CMOS 10Gb/s Optical Receiver Analog Front End," ISCAS 2007, pp. 3904-3907, 27-30 May 2007.
[10] B. Razavi, Design of Integrated Circuits for Optical Communication, McGraw Hill, 2003.


Alfredo Arnaud, Matías Miguez

Departamento de Ingeniería Eléctrica, Universidad Católica, Montevideo, Uruguay

Abstract: A study of the operation of switched continuous time filters (SCTF), defined as continuous time filters with elements that are alternately switched on and off in the signal path, is conducted. A detailed calculation of the output of a SCTF in the frequency domain, which allows a fast but exact analysis of any SCTF in a general framework, is presented. Several applications of SCTFs are shown and examined using the developed theory, including a non-ideal Sample & Hold, the realization of fully integrated extremely large time constants, and filter tuning by varying the duty cycle of the switching. A detailed noise analysis for switched operated filters is also presented.

I.

Consider the case of an active bandpass filter, built with operational amplifiers, capacitors, and resistors, in which each capacitor has an ideal switch in series to connect it to or disconnect it from the filter at regular time intervals. If the capacitors are connected, the filter acts like a continuous time one; when the switches are open, the capacitors preserve their charge and the output of the filter is assumed to remain constant. The filter is not a sampled-signal one, because calculations are not performed between samples. In active time intervals, when the passive elements are connected, the filter is continuous-time, but during hold time no filtering takes place; only the state of the filter is kept in an analog memory. This filter is an example of what will be referred to as a switched continuous time filter (SCTF). SCTF examples may include the switched operation of Gm-C, MOSFET-C, active or passive filters. Switched operation of filters has been studied in the past using different approximations ([1],[2]), but the theoretical background in this paper provides a general tool to study the behavior and limitations of any SCTF in the frequency domain.

For a linear, continuous time filter H(f) and an input signal x(t) ↔ X(f) (in the time and frequency domain respectively), the output signal will be X(f)H(f). Fig.1 shows a scheme of a GmC SCTF that is regularly connected during the active time τ. During hold time, the output and all the state variables inside the filter are kept constant. A control input m(t) sets the filter to hold or active depending on its value. For the sake of simplicity, m(t) is considered as a periodic pulse train of period TS:

$$m(t) = \sum_{n=-\infty}^{\infty} p(t - nT_S), \quad \text{with } p(t) = 1 \text{ if } |t| \le \tau/2, \quad p(t) = 0 \text{ if } |t| > \tau/2 .$$

The objective is to calculate xOut(t) ↔ XOut(f) in Fig.1 in terms of X(f), H(f), TS, τ. Several helpful SCTF circuits will also be presented and examined along this work.

Figure 1. A switched continuous time GmC filter.

II. CALCULATION OF THE OUTPUT OF A SCTF IN THE FREQUENCY DOMAIN

A SCTF is not a time-invariant system, so it is not possible to define a transfer function HSCTF(f) such that the output of the filter can be calculated as X(f)HSCTF(f). However, it is possible to calculate the output XOut(f) of Fig.1, in the same way that it is possible to write the output of an ideal Sample & Hold, which is not a time-invariant system either. Note that state variables in the SCTF are continuous in time, and they are modified only during the active time. Let y denote the vector containing the state variables of the filter. The value of the input signal during hold time is not relevant for the calculation of y(t). It is only necessary to solve the differential equations of the filter on each active time interval nTS - τ/2 < t < nTS + τ/2, assuming as the initial condition the value of the state variables at the end of the previous active time interval. That is:

$$y(nT_S - \tau/2) = y((n-1)T_S + \tau/2) .$$

Therefore, to calculate the output it is possible to compress all the active time


intervals side by side, and then solve the differential equations of the filter in a single step. Consider the input of the SCTF x(t) in Fig.2(a), and the chopped signal xCh(t) = x(t).m(t) in Fig.2(b). The compressed signal xComp(t) ↔ XComp(f) is defined as in Fig.2(c), placing together the pieces of x(t) corresponding to active time slots. An intermediate auxiliary function xI(t) is defined as its convolution with the impulse response h(t) of the continuous time filter that is being switched:

$$x_I(t) = x_{Comp}(t) * h(t) . \qquad (1)$$

The signal xI(t), shown in Fig.2(d), determines both y(t) and the output of the filter xOut(t). In effect, (1) solves the filter's equations for all active times, incorporating the proper initial condition on each time segment. xOut(t) can be calculated by the inverse of the compression process, as depicted in Fig.2(e). The output consists of the pieces of xI(t) during active time (A), and the output of the filter is assumed to be a state variable that does not change during hold time (B).

Figure 2. Evaluation process for the output of a SCTF: (a) input signal (b) chopped signal (c) compressed signal (d) intermediate signal (e) output signal for "active" time slots (A) and "hold" time slots (B).

The compressed signal xComp(t) of Fig.2(c), where the hold times have been removed from xCh(t), can be expressed as:

$$x_{Comp}(t) = \sum_{n=-\infty}^{\infty} p(t - n\tau)\, x\big(t + n(T_S - \tau)\big) \qquad (2)$$

$$X_{Comp}(f) = \frac{\tau}{T_S} \sum_{n=-\infty}^{\infty} X\!\left(f \frac{\tau}{T_S} + n f_S\right) \mathrm{sinc}\!\left[\left(f \left(\frac{\tau}{T_S} - 1\right) + n f_S\right) \tau\right] \qquad (3)$$

(3) is the Fourier transform of (2). Note in (3) that aliasing may occur if the bandwidth of the input signal is larger than fS/2. Also note the frequency scaling by a factor τ/TS when evaluating the input signal spectrum. The intermediate signal XI(f) is then calculated as XI(f) = XComp(f).H(f):

$$X_I(f) = \frac{\tau}{T_S} \left[ \sum_{n=-\infty}^{\infty} \mathrm{sinc}\!\left[\left(f \left(\frac{\tau}{T_S} - 1\right) + n f_S\right) \tau\right] X\!\left(f \frac{\tau}{T_S} + n f_S\right) \right] H(f) \qquad (4)$$

The exact output XOut(f) is the sum of two components:

$$x_{Out}(t) = x_{OutA}(t) + x_{OutB}(t) \qquad (5)$$

that have to be calculated separately. xOutA(t) corresponds to the output of the filter in the active time slots and is calculated by the inverse of the compression process in Fig.2(e). xOutB(t) corresponds to the output of the filter in the hold time slots and holds the value of xOutA(nTS + τ/2) in the last active time.

$$x_{OutA}(t) = \sum_{n=-\infty}^{\infty} p(t - nT_S)\, x_I\big(t - n(T_S - \tau)\big) \qquad (6)$$

$$X_{OutA}(f) = \sum_{n=-\infty}^{\infty} \mathrm{sinc}\big(f (T_S - \tau) + n\big)\, X_I\!\left(\frac{T_S}{\tau} (f + n f_S)\right) \qquad (7)$$

$$X_{OutB}(f) = \frac{T_S - \tau}{\tau}\, \mathrm{sinc}\big((T_S - \tau) f\big) \sum_{n=-\infty}^{\infty} (-1)^n X_I\!\left(\frac{T_S}{\tau} (f - n f_S)\right) \qquad (8)$$

Equations (4),(5),(7),(8) allow the exact output signal of the SCTF to be computed in terms of the continuous time filter transfer function H(f), the input signal X(f), and the switching parameters TS, τ. Note that the input signal is scaled up in frequency in (4) and down in (7),(8), but H(f) in (4) is not scaled. Roughly, in (4) a frequency-scaled version of the input signal is filtered, and the result is then downscaled. From another point of view, neglecting the effect of the modulating sinc functions and the effect of the aliasing (n=0):

$$X_{Out}(f) \approx X(f)\, H\!\left(\frac{T_S}{\tau} f\right) \qquad (9)$$

In (9), the scaling of the time constant by TS/τ in the case of a switched R-C filter can be clearly appreciated.

III. SWITCHED CONTINUOUS TIME FILTER TRANSFER FUNCTION DEFINITION

Although a transfer function in the sense of a linear, time-invariant system cannot be defined for a SCTF, it is possible to work with the function GSCTF(f), defined as the output at frequency f of the SCTF when a pure sine wave of unity amplitude vin(t) = sin(2πft) is applied at the input, that is, GSCTF(f) = Vout(f) for this input. GSCTF(f) represents a pseudo-transfer function for the system because, neglecting the effect of aliasing, the output is:

$$V_{out}(f) \approx G_{SCTF}(f)\, V_{in}(f) \qquad (10)$$

For a given SCTF, GSCTF(f) can be calculated with a computer program, adding a limited number of terms in (4),(7),(8), and using an input signal $X_{In}(f) = e^{i 2 \pi f}$ for |f| < fS/2, XIn(f) = 0 otherwise [3]. It should be pointed out that switched operation of a filter does not affect stability. In effect, for an arbitrary initial perturbation, the non-switched system response is equal to xI(t) in Fig.2. So the solution of the differential equations of the system will be stable or not regardless of whether the differential equation set is solved in a single step or by applying the time-domain partition-compression of Fig.2. A SCTF adds some distortion to the signal, in the sense that the input may be affected by sinc functions that are not constant with frequency. Also, harmonics appear by the effect of aliasing. Both effects are considered in the above developed theory, and no extra distortion appears if switches are assumed to be ideal ones. The detailed analysis of a SCTF included in this section may help to design a proper equalizer if necessary.

IV. A NON-IDEAL SAMPLE & HOLD AS A SCTF

MOS Sample & Hold (S&H) circuits are all based on the elementary analog switch and capacitor structure shown in Fig.3. During sample time, the switch is closed for a time τ so the capacitor is charged to the input voltage. The circuit can be seen as a switched R-C filter where the resistor is the on-resistance RON of the MOS switch. The usual approximation is to consider τ large enough to guarantee accurate samples every time the switch is closed. But if the time τ is too short or the time constant RON.C is too large, the S&H becomes non-ideal and cannot accurately charge the capacitor if abrupt changes of the input signal occur. However, a non-ideal S&H can follow an input signal provided its bandwidth B is much less than the sampling frequency fS. This case of an oversampling S&H (fS >> B) has been valuable in a micropower accelerometer signal conditioning circuit [4]. The output of the non-ideal S&H will be calculated using the SCTF background. TS >> τ is assumed for simplicity, thus only XOutB has to be computed in (5). When τ → 0 it is possible to reduce (4),(8) to:

$$X_{OutB}(f) \approx \mathrm{sinc}(f T_S) \sum_{n=-\infty}^{\infty} (-1)^n H\!\left(\frac{T_S}{\tau} (f - n f_S)\right) \sum_{m=-\infty}^{\infty} X(f - m f_S) . \qquad (11)$$

In this case (11) is evaluated using the R-C low-pass transfer function

$$H(f) = \frac{1}{1 + j 2 \pi R_{ON} C f} .$$

Fig.3 shows the measured and calculated output of a non-ideal S&H built with a W/L = 100µm/5µm NMOS transistor (measured RON = 3.9kΩ) and a capacitor C. The continuous line is GSCTF(f) as described in section III.

Figure 3. Magnitude of the pseudo-transfer function of a non-ideal SCTF for fS = 125Hz, τ = 0.1ms, RON = 3.9kΩ. The dashed line is the classical result for the ideal S&H. (Plot legend: Calculated, Measured, Ideal S&H; axes: normalized output versus frequency in Hz.)

For the non-ideal S&H, the SCTF theory allowed for the calculation of the exact distortion, which can be equalized later in the signal path (a usual practice for the ideal S&H). Apart from the above described SCTF analysis, there are other possible approaches to examine this circuit, the simplest being the traditional S&H transfer function represented by the dashed line of Fig.3. Albeit simple, this approximation does not take into account non-idealities when τ is too short. Using the intuitive resistor multiplication [2], one can estimate that the transfer function of Fig.3 should be equal to a low-pass whose time constant is TS/τ times the RC constant. This second approach takes into account the S&H non-ideality; however, the sinc modulating effect is lost. An exact approach should consider the sinc function, small τ, and aliasing effects. In the annex at the end, an exact transfer for the non-ideal S&H was developed using the z-transform formalism. When plotted, the result is the same as (11). In spite of being derived in a more comprehensible way, this result cannot be easily adapted to other SCTFs, unlike the analysis of section II.

V. VERY LARGE TIME CONSTANTS

Another interesting application of SCTFs is the realization of fully integrated extremely large time constants. The circuit in Fig.4a, for example, is a bandpass GmC filter corresponding to the second stage of a 0.5 - 7Hz, 40dB/dec bandpass amplifier (gain 400) for a piezoelectric accelerometer [5]. Even employing an OTA equivalent to a 10GΩ resistor, a large C1 = 250pF capacitor was used in [5] to properly set the high-pass pole. In Fig.4a, however, a switch S1 was included to operate the feedback loop Gm3-C1 as a SCTF. By switching S1 at a 20% or 4% duty cycle, it was possible to substitute C1 by a 50pF or 10pF capacitor, respectively, without a major impact on the overall transfer function of the filter, as depicted in Fig.4b. Each symbol in Fig.4b represents the result of a transient simulation for a given frequency and C1 value. The switch S3 is closed in hold times, and was placed only to set the voltage at the output node of Gm3 during hold. Without S3, it takes some time for Gm3 to deliver the proper current to C1 after S1 is closed, and the amplitudes in Fig.4b are slightly different for the smaller capacitors.

Figure 4. (a) A 2nd order bandpass GmC, where one of the OTAs is switched to reduce the size of a 250pF capacitor to 50, or 10pF. (b) AC transfer function of the non-switched filter (line), and transient simulation for the switched version at several frequencies. Legend of (b): theoretical (AC simulation); transient simulation with C1 = 250pF, S1 duty = 100%; C1 = 50pF, S1 duty = 20%; C1 = 10pF, S1 duty = 4%. Horizontal axis: frequency in Hz.

VI. FILTER TUNING BY MEANS OF SWITCHING
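Both the capacitor reduction of Section V and the duty-cycle tuning discussed in this section rest on the scaling predicted by (9): a filter that is active only a fraction τ/TS of the time behaves as if its time constants were multiplied by TS/τ. The following is a minimal time-domain sketch of that effect for a switched R-C low-pass, assuming an ideal switch and a unit step input. The function name, the 1 MΩ resistor and the 1 MHz switching frequency are hypothetical illustration values; only the 250pF / 50pF / 20% figures echo the example of Section V.

```python
import math


def switched_rc_step(r_ohm, c_farad, t_s, duty, t_end, steps_per_period=100):
    """Unit-step response at time t_end of a switched R-C low-pass.

    The capacitor is connected during the first `duty` fraction of every
    switching period t_s (active time) and isolated otherwise (hold time).
    The position of the active window inside the period is an arbitrary
    choice of this sketch and does not matter for a step input.
    """
    dt = t_s / steps_per_period
    n_steps = int(round(t_end / dt))
    active_steps = int(round(duty * steps_per_period))
    decay = math.exp(-dt / (r_ohm * c_farad))  # exact update for constant input
    v = 0.0
    for k in range(n_steps):
        if (k % steps_per_period) < active_steps:
            v = 1.0 + (v - 1.0) * decay  # active time: the R-C filter runs
        # hold time: the state variable is simply kept
    return v


if __name__ == "__main__":
    # 50 pF switched at 20 % duty versus 250 pF always connected.
    v_switched = switched_rc_step(1e6, 50e-12, 1e-6, 0.2, 250e-6)
    v_continuous = switched_rc_step(1e6, 250e-12, 1e-6, 1.0, 250e-6)
    print(v_switched, v_continuous)
```

Under these assumptions the two calls return essentially the same value, because the 50pF capacitor connected 20% of the time accumulates the same total active time, relative to its time constant, as the 250pF capacitor connected continuously.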

Switched operation can be exploited, for example, to adjust the transfer function of a filter by means of the duty cycle of a digital signal. Fig.5 shows the topology of a lowpass Sallen-Key filter, including switches to operate it as a SCTF. The continuous-time transfer function H ( f ) of the filter is the dashed line in Fig.5, and is the result of an AC analysis in SPICE simulator with both switches closed. The remaining curves and symbols in Fig.5, show simulations of the switched transfer function G SCTF ( f ) defined in (10) for different values of the duty cycle of m(t ) . The symbols in Fig.5 were not calculated using the SCTF equations of section

ISBN 978-987-655-003-1 EAMTA 2008

II. Instead, for each frequency and duty cycle, a SPICE transient analysis is performed. The amplitude of the output of the SCTF is measured for each simulation, and it corresponds to a single symbol in the plot of Fig.5. The operational amplifier and switches were simulated for a standard MOS 0.35m technology. To demonstrate the accuracy of the previously developed equations, the continuous lines, which are the result of applying equations (4) to (8) to the Sallen Key filter, are shown. It should be highlighted that while each transient SPICE simulation takes a couple of minutes to complete, SCTF equations take only a few seconds to calculate the filter response over the full frequency span. For the latter also the simulation setup is simpler, just by changing a couple of equations in a MATLAB script. As predicted by (9), the filter transfer is shifted in frequency by the duty cycle of m(t). Another example of filter tuning by adjusting the duty cycle is presented in Fig.6. Three different bandpass filters aimed for implantable medical devices, which span three orders of magnitude from few Hertz to kHz, will be implemented with the same GmC but different switching duty cycle. These filters are a 700Hz-centered bandpass for ENG recording in [6], a 70-200Hz bandpass for cardiac activity sensing filtering [7], and the 0.5-7Hz bandpass for the accelerometer in [5]. The original filters (normalized amplitude) are ploted in Fig.6 in continuous lines. The filters


[Figure 5 circuit labels: R1 = 1MΩ, R2 = 1MΩ, C1 = 50pF, output vOut(t)]

changed to 120Hz and 2Hz, respectively. In Fig.6, each symbol was obtained by a transient simulation (no lower frequency points were obtained for duty=0.5% because of convergence issues).

VII. NOISE ANALYSIS OF A SCTF

Output noise in a continuous time filter is calculated by adding the noise contributions of all the elements in the filter at the output. In the case of a SCTF it is also necessary to sum the contribution of each noise source, but each one reaches the output through a SCTF. Consequently, noise aliasing may result in a significant contribution if the noise is not band-limited in some way. Fortunately this limit is normally embedded in the filter. Equations (4), (7) and (8) contain all the information required for noise calculation, including the effect of aliasing. X OutA ( f ) and X OutB ( f ) are expressed as sums of infinitely many terms, some of which are themselves infinite sums. When calculating the output, a large enough number of terms must be summed for each infinite sum. If aliasing cannot be neglected (for example in the case of white noise), an appropriate number of terms of X OutA ( f ) and X OutB ( f ) has to be added when numerically evaluating noise contributions with a computer. As noise is usually expressed in terms of its power spectral density (PSD), the coefficients in those equations must be squared. Because they are correlated, X OutA ( f ) and X OutB ( f ) cannot be calculated separately and then added; instead, all squared terms must be calculated together and multiplied by the noise PSD. The exact calculation can be a bit tricky: equations (4), (7) and (8) must all be combined, and each term of the double sum must be calculated, squared and multiplied by the input before being added. To simplify calculations, a set of MATLAB routines was implemented [3] to calculate the noise contribution of a given noise source.
The thermal noise contribution of a simple Gm-C chopper amplifier [8] (Fig.8), which can be studied as a SCTF, is shown in Fig.7 and Fig.8. To compute the thermal noise contribution, X ( f ) of section II is substituted by a constant PSD.

[Figure residue: Figure 7 legend "PSDs: Xout, XoutA, XoutB"; Figure 5 circuit labels vIn(t), C2 = 50pF; Figure 5 plot horizontal axis: Frequency [Hz], 10 to 100k]

Figure 5. A switched Sallen-Key lowpass filter (top, continuous-time 3dB cut-off 2kHz) and simulation (bottom) of the transfer function GSCTF(f) while varying the duty cycle of m(t). The dashed line is the continuous time transfer function. Symbols are the result of a time domain simulation while continuous lines are obtained with SCTF theory.
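As a rough numerical check of the Fig.5 example, the sketch below evaluates the continuous-time response and an idealized duty-cycle shift. It assumes a unity-gain Sallen-Key stage with the component values read from the figure labels (R1 = R2 = 1MΩ, C1 = C2 = 50pF), and it assumes that the switched response can be approximated as H(f/duty). That approximation ignores aliasing and is not the full set of SCTF equations; the function names are illustrative only.

```python
import math

R = 1e6      # R1 = R2 = 1 Mohm (read from the Fig. 5 labels)
C = 50e-12   # C1 = C2 = 50 pF (read from the Fig. 5 labels)
F0 = 1.0 / (2.0 * math.pi * R * C)   # pole frequency of the equal-component stage


def sallen_key_gain(f, duty=1.0):
    """Magnitude of the assumed unity-gain, equal-component Sallen-Key lowpass.

    With equal R and C the transfer function is 1 / (1 + s*R*C)**2, so the
    magnitude is 1 / (1 + (f / F0)**2). `duty` applies the idealized
    assumption that switching scales the frequency axis by the duty cycle.
    """
    x = f / (duty * F0)
    return 1.0 / (1.0 + x * x)


def cutoff_3db(duty=1.0):
    """Frequency at which the gain has dropped to 1/sqrt(2)."""
    return duty * F0 * math.sqrt(math.sqrt(2.0) - 1.0)


if __name__ == "__main__":
    for duty in (1.0, 0.5, 0.2):
        print(f"duty = {duty:4.2f}: -3 dB cutoff ~ {cutoff_3db(duty):7.1f} Hz")
```

With duty = 1 the computed cutoff is close to the 2kHz quoted in the caption of Fig.5, which supports the assumed component values.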

were implemented with the topology of Fig.4, but an extra pair of switches was included to connect/disconnect C2. By selecting Gm1=20µS, Gm2=683nS, Gm3=2.2nS, C1=78pF, C2=22pF, the first transfer function is obtained (higher frequency bandpass of Fig.6). Operating with different duty cycles, 20% and 0.5%, the center frequency value can be

[Figure 6 plot: Filter Gain versus Frequency [Hz]; legend: 0.5-7Hz BP [4], 70-200Hz BP [7], 700Hz BP [6], duty = 0.5 %, duty = 20 %]

Figure 6. By switching a 700Hz-centered bandpass filter for ENG [6], a 70-200Hz cardiac sensing [7] and a 0.5-7Hz accelerometer [5] filters are simulated. Continuous lines represent original continuous time filter, while symbols represent switched Gm-C response (transient simulation).


Figure 7. Normalized output noise of a chopped GmC amplifier (continuous line); normalization is performed with respect to the white noise output at low frequency of a non-chopped GmC. XOutA (dashed) and XOutB (dots) components of noise are highly correlated.


shown for a single chopped Gm-C. In the same figure, the PSDs of the separate components X OutA ( f ) and X OutB ( f ) are shown. Note that X OutA ( f ) and X OutB ( f ) are highly correlated functions, thus the total output PSD is not the sum of the individual ones. All results are normalized to the noise of a non-chopped amplifier. To verify the accuracy of the SCTF extension for noise contribution, Fig.8 shows a time-domain simulation of thermal noise (continuous line) compared to the PSD of X Out resulting from the SCTF equations (now dashed line). The time-domain simulation is the result of applying a white-noise-like random input signal to the chopper amplifier. Both curves were obtained with MATLAB programs; the first simulation took 15 minutes to complete while the latter took only 30 seconds. Since the output of the circuit in Fig. 8 is twice that of a single GmC, twice the input noise is expected if no overhead due to switching is present, as measurements in [8] show.

[Figure 8 labels: VIn, VOut; plot: Normalized PSD versus Frequency (Hz)]

Figure 8. GmC chopper, and its normalized output noise PSD, calculated using a time-domain simulation (continuous line) and using the proposed SCTF equations (dashed line).

VIII. CONCLUSION

A set of equations has been introduced to examine generic SCTFs in the frequency domain. The tool allows an exact evaluation of the output of this kind of filter, as well as the exploration of different design trade-offs. Particularly important is that the impact of noise, the effect of aliasing, and a transfer function definition can be analyzed, among other filter properties. Different SCTF examples were presented: a non-ideal S&H, active filters with duty-cycle tuning, and noise analysis in a switched Gm-C chopper amplifier. The previously developed SCTF theory allowed a better understanding of circuit operation and limitations. It should be pointed out that, in comparison, time-domain simulations required much more time and computer resources, and that the SCTF approach required only minimal changes to a computer program to examine the widely different circuits studied.

IX.

The evolution of the sampled signal $x_s(t)$ is given by the following equation, where the exponential charge of the capacitor C has been introduced:

$$x_s(t) = x_s(t - T_S) + \left[x(t) - x_s(t - T_S)\right]\left(1 - e^{-\tau/(R_{ON}C)}\right) \qquad (12)$$

When the sample time $\tau$ is negligible in comparison to the period $T_S$, a discrete-time equivalent of (12) can be derived:

$$x_s[n] = \alpha\,(x[n] - x_s[n-1]) + x_s[n-1], \qquad \alpha = 1 - e^{-\tau/(R_{ON}C)},$$

which defines a discrete-time filter with input $x[n]$ and output $x_s[n]$. Its Z-transform transfer function is given by:

$$H(z) = \frac{X_s(z)}{X(z)} = \frac{\alpha}{1 - z^{-1}(1-\alpha)}.$$

Calculating the frequency response of this digital filter as $H(e^{j\omega T_S})$, and applying the usual sampled Fourier transform:

$$X_s(f) = \frac{\alpha}{1 - e^{-j2\pi f T_S}(1-\alpha)}\,X(f) \qquad (13)$$
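The discrete-time model of the non-ideal S&H can be checked numerically. The sketch below assumes the recurrence xs[n] = alpha*(x[n] - xs[n-1]) + xs[n-1] with alpha = 1 - exp(-tau/(R_ON*C)), where `tau` denotes the sample time. The symbol names `alpha` and `tau`, the function names, and the component values in the usage part are assumptions chosen for illustration; they are not taken from the paper.

```python
import cmath
import math


def alpha_from_rc(tau, r_on, c):
    """Charging fraction per sample: alpha = 1 - exp(-tau / (R_ON * C))."""
    return 1.0 - math.exp(-tau / (r_on * c))


def sh_response(f, ts, alpha):
    """Frequency response alpha / (1 - (1 - alpha) * exp(-j*2*pi*f*Ts))."""
    z_inv = cmath.exp(-2j * math.pi * f * ts)
    return alpha / (1.0 - (1.0 - alpha) * z_inv)


def sh_simulate(x, alpha):
    """Run the recurrence xs[n] = alpha*(x[n] - xs[n-1]) + xs[n-1], xs[-1] = 0."""
    xs, prev = [], 0.0
    for sample in x:
        prev = alpha * (sample - prev) + prev
        xs.append(prev)
    return xs


if __name__ == "__main__":
    # Hypothetical values: 1 us sample time, 10 kohm switch, 100 pF capacitor.
    alpha = alpha_from_rc(tau=1e-6, r_on=1e4, c=100e-12)
    ts = 1e-4  # hypothetical 10 kHz sampling
    for f in (0.0, 1e3, 5e3):
        print(f"f = {f:6.0f} Hz  |H| = {abs(sh_response(f, ts, alpha)):.4f}")
    print("step response tail:", sh_simulate([1.0] * 10, alpha)[-1])
```

The DC gain of this model is exactly 1 for any alpha between 0 and 1, and a smaller alpha (a larger R_ON*C relative to the sample time) lowers the gain near half the sampling frequency.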

Eq. (13) is an exact transfer function for the non-ideal S&H with the only assumption of an infinitely short sample time τ. Once plotted, (13) is equal to our exact plot in Fig.3 but was derived in a more comprehensible way. However, the z-transform formalism applies only for a negligible τ, while the general SCTF formalism has no such restriction. Also, it is important to consider that the same MATLAB program was used with minor modifications to evaluate the widely different examples in sections III to VII.

REFERENCES

[1] A. Kaehler, "Periodic-Switched Filter Networks - A Means of Amplifying and Varying Transfer Functions", IEEE Journ. Solid State Circuits, vol.4, n.4, pp.225-230, Aug. 1969.

[2] Y. Sun, I.T. Frisch, "Resistance Multiplication in Integrated Circuits by Means of Switching", IEEE Trans. Circuit Theory, vol.15, n.3, pp.184-192, Sept. 1968.

[3] MATLAB routines for SCTF calculations, available online: http://die.ucu.edu.uy/users/aarnaud/projects/sctf_m_files.zip

[4] A. Arnaud, M. Barú, G. Picún, F. Silveira, "Design of a Micropower Signal Conditioning Circuit for a Piezoresistive Acceleration Sensor", Proc. IEEE Int. Symp. on Circuits and Systems, vol.I, pp.269-272, 1998.

[5] A. Arnaud, C. Galup-Montoro, "Fully integrated signal conditioning of an accelerometer for implantable pacemakers", Analog Integrated Circuits and Signal Processing, vol.49, pp.313-321, Dec. 2006.

[6] J. Gak, M. Bremermann, A. Arnaud, "Integrated Filter-Amplifier for ENG Signals", Proc. Escuela Argentina de Microelectrónica, Tecnología y Aplicaciones EAMTA 2007, pp.19-23, Cordoba, Argentina, Sept. 2007.

[7] L.H. Spiller, "Filtro OTA-C de baixa potência aplicado a um detector de atividade cardíaca", MSc. thesis, UFSC, Jul. 2005. Available at: www.eel.ufs.br/lci

[8] A. Arnaud, M. Bremermann, J. Gak, M. Miguez, "On the design of ultra low noise amplifiers for ENG recording", 20th Symposium on Integrated Circuits and Systems Design - SBCCI 07, Rio de Janeiro, Brazil, Sept. 2007.


Zhongtao Fu, Xiao Wang, Eugene Minh and Alyssa Apsel Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA

Abstract: In this paper, we present a fast acquisition, ambiguity-free Phase Frequency Detector (PFD) design. This PFD completely eliminates the missing edge and phase ambiguity problems found in many conventional PFDs. Therefore, this PFD topology speeds up the acquisition process and improves the maximum operating frequency. A fabricated PFD operating at 556MHz along with a 6.5GHz phase-locked loop design in a 0.25µm SOI process confirm these advantages.

I. INTRODUCTION

Phase-locked loops (PLLs) have been widely used in wireless communication, microprocessor IOs and signal processors for clock generation, distribution and data recovery [1, 2]. Charge pump PLLs draw the most attention due to their simple structure, CMOS compatibility and low phase noise properties. However, the charge pump PLL poses limitations on acquisition time if implemented with a conventional PFD [3]. Fast acquisition increases the data transfer rate when the PLL is used for on-chip clock distribution. Fast acquisition is also required for carrier generation in wireless communications, especially for frequency hopping systems. Finally, fast acquisition enables power saving, because various power management techniques require the clock to be suspended and re-activated on the fly. In all the above applications, fast acquisition in the PLL is crucial.

The PFD plays a significant role in determining the acquisition time of a charge pump PLL. The PFD detects the phase and frequency difference between the reference and the Voltage Controlled Oscillator (VCO), and then drives the VCO frequency towards the reference frequency through the charge pump. A PFD is usually built as a finite state machine (FSM), shown in Fig. 1, with memory elements such as flip-flops [4-6]. Fig. 2 illustrates a common PFD structure using D flip-flops and an AND gate. Triggered by the reference and the VCO, the PFD generates an UP and a DOWN signal that switch the charge pump current. Initially, both the VCO (A) and reference (B) inputs are low. The FSM stays in Idle State until either input goes high. Then, the FSM goes to either Charge Up State or Charge Down State depending on which input comes first. When the second input arrives, reset is turned ON and the FSM returns to Idle State.

In many conventional PFDs, non-ideal effects such as missing edges [7] and phase ambiguity slow down the acquisition. These problems are discussed in Section II. A proposed PFD topology that completely eliminates these problems is described in Section III. We demonstrate this new PFD topology through a 556MHz PFD design along with a 6.5GHz PLL design in a 0.25µm SOI process. The results are shown in Section IV.

This work was supported by Army Research Office, ID# W911NF-05-10515.
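The three-state behaviour of a conventional PFD (Idle, Charge Up, Charge Down) can be sketched as a small event-driven model. The code below is an idealized illustration with zero reset delay, so it does not reproduce the missing-edge effect. It assumes that a leading reference edge selects Charge Up and a leading VCO edge selects Charge Down; the state and function names are hypothetical.

```python
IDLE, CHARGE_UP, CHARGE_DOWN = "idle", "charge_up", "charge_down"


def pfd_next_state(state, edge):
    """Advance the idealized PFD by one rising edge.

    `edge` is "ref" for a rising edge of the reference and "vco" for a
    rising edge of the VCO. The reset is assumed to be instantaneous.
    """
    if edge not in ("ref", "vco"):
        raise ValueError("edge must be 'ref' or 'vco'")
    if state == IDLE:
        # The first edge decides the pumping direction.
        return CHARGE_UP if edge == "ref" else CHARGE_DOWN
    if state == CHARGE_UP:
        # A VCO edge is the second input: reset and return to idle.
        return IDLE if edge == "vco" else CHARGE_UP
    # CHARGE_DOWN: a reference edge is the second input.
    return IDLE if edge == "ref" else CHARGE_DOWN


def run_pfd(edges):
    """Return the state reached after each rising edge in `edges`."""
    state, trace = IDLE, []
    for edge in edges:
        state = pfd_next_state(state, edge)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # Reference leads once, then the VCO leads once.
    print(run_pfd(["ref", "vco", "vco", "ref"]))
```

Adding a finite reset delay to such a model is one way to study how edges that arrive during the reset pulse are lost.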


II. NON-IDEAL EFFECTS

A. Missing Edge

Parasitic and gate delays lengthen the ON state of the reset signal, which can cause missing edges, and this effect is never negligible. In the case shown in Fig. 3, when the phase difference approaches 2π, the leading edge of the reference triggers the UP signal until the lagging edge of the VCO arrives, which resets both the UP and DOWN signals to low. Due to the finite delay of the reset signal, the reset overwrites the next edge of the reference, which is supposed to cause UP to go high. As a result of the missing edge, a discrepancy occurs and the PLL is not able to approach lock monotonically. How much the acquisition is slowed down depends on how often the wrong information is fed to the charge pump and VCO. This, in turn, will lead the PLL to pull out instead of pull in. In the worst case scenario, when the reset delay is as much as half the input signal period, the PLL fails to lock [8].

Figure 3. Finite Reset Delay Causes Missing Edge. [Figure annotation: PFD cannot respond to inputs]

B. Phase Ambiguity

Another problem is that the two rising edges to be compared are not unique. At any initial state, the circuit can pick, based upon the previous state, any two rising edges shown in Fig. 4 (solid or dashed) to calculate the phase error, and therefore it can produce two possible output currents for each phase error. In other words, the output current can be in either of the two valid states (solid or dashed). This ambiguity in the PFD has detrimental effects in PLLs. We can plot the phase error versus the frequency error during the phase locking process in the state space chart shown in Fig. 5. The origin is where the PLL achieves lock, ideally with zero frequency error and zero phase error. The inner curve is obtained when the PFD picks φ, and the outer curve when the PFD picks 2π−φ, assuming critically damped designs, where φ is the initial normalized phase error. Obviously, picking the correct edge to start with can shorten the acquisition time. Depending on the initial phase difference, picking the close-in edge or the further-out edge to compare can make a significant difference in locking time. If one assumes that the initial phase difference is uniformly distributed from 0 to 2π, then the average advantage in initial phase error can be 50%.

Figure 5. State Space Representation of Locking Process from Two Initial Conditions
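The 50% figure quoted for the average initial phase error can be illustrated numerically under one possible reading: a PFD that always starts from the initial phase difference as given, compared with one that always starts from the smaller of the two candidate phase errors. The sketch below assumes a uniform distribution of the initial phase difference between 0 and 2π and uses a midpoint average; the function name is hypothetical.

```python
import math


def mean_initial_phase_error(pick_smaller, steps=100_000):
    """Average initial phase error over a uniform phase difference in [0, 2*pi).

    pick_smaller=False: the phase error is taken as the phase difference phi.
    pick_smaller=True:  the smaller of phi and 2*pi - phi is always used.
    """
    total = 0.0
    for k in range(steps):
        phi = 2.0 * math.pi * (k + 0.5) / steps   # midpoint of each sub-interval
        total += min(phi, 2.0 * math.pi - phi) if pick_smaller else phi
    return total / steps


if __name__ == "__main__":
    plain = mean_initial_phase_error(pick_smaller=False)
    smaller = mean_initial_phase_error(pick_smaller=True)
    print(f"mean error, phi as given : {plain:.4f} rad")
    print(f"mean error, smaller edge : {smaller:.4f} rad")
    print(f"relative reduction       : {1.0 - smaller / plain:.2%}")
```

Under these assumptions the two averages come out close to π and π/2, which matches the factor of two mentioned in the text.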

III.

We propose a PFD design that forces the PFD to compare only the two rising edges with the smaller phase error. This eliminates the case in which the reset signal overwrites the next incoming edge (effect A), because the reset signal has at least half of the period to go low. It also eliminates the phase ambiguity by always picking the initial condition with the smaller phase error (effect B). Starting from the logic level design, the simplest way to force the PFD to compare only the two rising edges with the smaller phase difference is to ensure that the reset signal goes high only when A, B, QA and QB are all high. This can be done with a 4-input AND gate as shown in Fig. 6a. The 4-input AND gate eliminates the possibility of comparing the reference edge with the dashed edge in Fig. 4, because when QB rises, signal A is still low, so it does not trigger the reset. With this topology, each phase error is mapped uniquely to one possible data point from −π to π.
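The difference between the two reset conditions can be made explicit with a truth-table comparison. The sketch below assumes that the conventional PFD of Fig. 2 resets when QA and QB are both high, and that the proposed logic resets only when A, B, QA and QB are all high. It models combinational logic only, not the transistor-level flip-flop, and the function names are hypothetical.

```python
from itertools import product


def reset_conventional(a, b, qa, qb):
    """Assumed conventional reset: 2-input AND of the flip-flop outputs."""
    return qa and qb


def reset_proposed(a, b, qa, qb):
    """Proposed reset: 4-input AND of both inputs and both outputs."""
    return a and b and qa and qb


def differing_cases():
    """Input combinations (A, B, QA, QB) where the two conditions disagree."""
    return [
        bits
        for bits in product((False, True), repeat=4)
        if reset_conventional(*bits) != reset_proposed(*bits)
    ]


if __name__ == "__main__":
    for a, b, qa, qb in differing_cases():
        print(f"A={int(a)} B={int(b)} QA={int(qa)} QB={int(qb)}: "
              "conventional resets, proposed does not")
```

The disagreeing cases are exactly those in which both outputs are high while at least one input has already returned low, which is the situation the 4-input AND gate is meant to exclude.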


In the circuit design, an obvious way to realize this topology is to follow the logic diagram, that is, to feed A and B along with QA and QB to the AND gate, or to invert A and B and then feed them along with QA and QB to a four-input NOR gate. However, this increases the number of transistors and the PFD power consumption. To simplify the circuit, a novel D flip-flop for the PFD is presented in Fig. 6b. This flip-flop operates as follows. When A and Reset are low, the middle node, M, is charged high by M1 and M2. At the rising edge of A, M7 turns on and node M holds its value, then Q goes low. There are three important facts to note. First, for Q to go high, not only reset but also A and B have to be high. Second, when M3, M4 and M5 discharge the node M, Q goes high. M3 is the key to lower power consumption, as it prevents a short from VDD to gnd. Third, the small number of stages maintains low power operation. A PFD using this D flip-flop, shown in Fig. 6c, works at high speed because the middle node, M, is already precharged when the signal (CK) is low. At the rising edge of the signal, only the second stage of the D flip-flop and an inverter need to be driven to flip the sign of the Q signals. In addition, the three-transistor discharge path naturally adds delay to the reset, which is actually beneficial for dead-zone reduction and eliminates the need for an additional delay block [9]. Note that this is still a rising-edge-triggered PFD, so the characteristic does not depend on the duty cycle of the inputs. We demonstrate a PLL design using this PFD. The loop locks quickly, in 2µs, as shown in Fig. 7. However, under the same conditions, in the worst case scenario the loop never locks when using a PFD with ambiguity, as also shown in Fig. 7.

Figure 7. (UP) Example of PLL Locking with proposed PFD (Down): One Case when PLL Fails to Lock with Ambiguity PFD

IV. MEASUREMENT RESULTS

We have fabricated and tested this PFD design in a 0.25µm Silicon on Sapphire process. The use of an SOI process as well as the topology described in the previous section help us achieve a measured power consumption of 2.3mW with a 2.5V supply at over 550MHz. A measured curve of the average output current versus the input phase difference is shown in Fig. 8. The phase difference was swept with a phase shifter from Advanced Technical Material, Inc. for 556MHz input signals. Our measurements showed no ambiguity in this PFD characteristic and a good match to the simulated behavior, with 8% variation in gain. The 2.7ps dead-zone in simulation is too small to be observed on the test bench. Since the proposed PFD uses fewer transistors (23 transistors) than the conventional PFD (64 transistors) [10], the layout footprint is only 48µm x 58µm.


Figure 6. (a) Proposed PFD Logic Diagram (b) proposed D Flip-Flop (c) Proposed PFD Schematic

A comparison of several PFDs is listed in Table I. We compare our design to two others found in the literature. It is important to note that many PLL papers present PFD designs that suffer from the large signal discontinuity problem described here as ambiguity. Both of the other PFDs in Table I suffer from phase ambiguity, while the proposed PFD is ambiguity free. It also has improved dead-zone. Power numbers were not made available for comparison within the literature. It is clear that the improved PFD presented in this paper decreases acquisition time and removes the potential for metastability without sacrificing dead-zone, bandwidth, power, or other critical metrics.

TABLE I. A COMPARISON OF PFDS

        [5]     [6]     us
Size    19 T    29 T    23 T

a. Simulation only. T: transistor number. NL: Not Listed.

V. CONCLUSION

This paper demonstrates a novel fast acquisition phase frequency detector design operating at 556MHz. This PFD completely eliminates the missing edge and phase ambiguity problems found in many conventional PFDs. The measurement data has confirmed the advantages of this PFD design. Therefore, this PFD can be broadly used in charge pump PLLs for fast acquisition.

ACKNOWLEDGMENT

This work was supported by Army Research Office, ID# W911NF-05-1-0515. The authors would like to thank Paul Chen for valuable comments.

REFERENCES

[1] M. V. Paemel, "Analysis of a charge-pump PLL: a new model," IEEE Transactions on Communication, vol. 42, pp. 2490-2498, July 1994.

[2] J. Craninckx, M.S.J. Steyaert, "A fully integrated CMOS DCS-1800 frequency synthesizer," IEEE Journal of Solid-State Circuits, vol. 33, pp. 2054-2065, December 1998.

[3] I.-C. Hwang, S.-H. Song, and S.-W. Kim, "A digitally controlled phase-locked loop with a digital phase-frequency detector for fast acquisition," IEEE Journal of Solid-State Circuits, vol. 36, pp. 1574-1581, October 2001.

[4] B. Razavi, Design of Integrated Circuits for Optical Communication, McGraw-Hill, 1st ed. 2003, ch. 8.

[5] W.-H. Lee and J.-D. Cho, "A High Speed and Low Power Phase-Frequency Detector and Charge-Pump," IEEE ASP-DAC '99, Asia and South Pacific, vol. 1, pp. 269-272, 1999.

[6] R. C. Chang and L.-C. Kuo, "A differential type CMOS phase frequency detector," Proceedings of the Second IEEE Asia Pacific Conference on 28-30, pp. 61-64, August 2000.

[7] M. Mansuri, D. Liu, and C.-K.K. Yang, "Fast frequency acquisition phase-frequency detectors for Gsamples/s phase-locked loops," IEEE Journal of Solid-State Circuits, vol. 37, pp. 1375-1382, November 2002.

[8] M. Soyuer and R. G. Meyer, "Frequency limitations of a conventional phase-frequency detector," IEEE Journal of Solid-State Circuits, vol. 25, pp. 1019-1022, August 1990.

[9] K.-S. Lee, B.-H. Park, H. Lee, and M. J. Yoh, "Phase frequency detectors for fast frequency acquisition in zero-dead-zone CPPLLs for mobile communication systems," ESSCIRC '03, Proceedings of the 29th European, pp. 525-528, September 2003.

[10] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, 2nd ed. Reading, MA, Addison Wesley, 1993.


Zhaonian Zhang and Andreas G. Andreou

Electrical and Computer Engineering The Johns Hopkins University Baltimore, MD 21218 zz@jhu.edu, andreou@jhu.edu

Abstract: Active acoustic scene analysis is a promising approach to distributed persistent surveillance in sensor networks. We report on the design of a bandpass sampling technique for an acoustic micro-Doppler sonar [1] that reduces the data rate to as low as 85kbps. We then explore the use of Gaussian mixture models for human identification. We compare the classification performance using different feature vectors and different sampling schemes. We show that the use of differential cepstral vectors of context length 2 improves the classification accuracy. We also show that the classification performance of the bandpass sampling system with an 8-bit resolution is still over 90% on a database consisting of 160 gait signatures from 8 individuals.

I. INTRODUCTION

The Johns Hopkins University Acoustic Surveillance Unit (JHU-ASU) [2][3][4] is a distributed system employing passive acoustic sensing for detection and tracking of vehicles. The demonstrated detection and localization performance of the JHU-ASU is comparable with state-of-the-art DSP systems at a mere fraction of their size and power dissipation [2], which enables the JHU-ASU to be deployed in an autonomous wireless sensor network for surveillance purposes. In this paper, we explore the use of active acoustics, and in particular micro-Doppler signatures, to augment the passive sensing in the JHU-ASU, in a multi-modal wireless sensor network for detection and classification. Although active acoustic sensing has been widely used in applications such as underwater sonar [5] and medical imaging [6] for decades, the application of active acoustics for sensor networks has only recently been studied [7].

We have previously reported an active acoustic micro-Doppler system that can image objects with articulated moving components, such as gaits of humans [1] and four-legged animals [8]. In those experiments, we observed that each object possesses a somewhat unique signature, which suggested the use of these signatures for biometric identification or gait recognition. Currently, the most widely used technique for gait recognition employs one or more visible or infrared video cameras to record a person walking in the monitored field, either indoors or outdoors [9]. Despite the encouraging results from these studies, large memory size and fast processing speed are necessary to process the sequences of images obtained from the camera(s). For example, consumer grade progressive scan single CCD cameras were used by the HumanID Gait Challenge program to create their database. The reported data rate after compression and subsampling is 25.7Mbps [10]. Such high data rates and the computation power required make video-based gait recognition techniques prohibitive to implement in wireless sensor networks, where system complexity and power consumption are of primary concern. In addition to the engineering concerns, privacy concerns with camera-based systems are serious obstacles to their deployment in public.

Kalgaonkar and Raj reported an acoustic micro-Doppler sonar that employed a heterodyne receiver for signal processing and data acquisition [11]. The signal chain of their system is illustrated in Fig. 1(a). Classification was attempted with Gaussian mixture models and feature vectors consisting of cepstral vectors and differential cepstral vectors whose context length is one.

In this paper, we first describe the design of an ultrasonic micro-Doppler sonar that employs a bandpass sampling approach for data acquisition, an approach that significantly reduces the amount of data compared with our previous design [1]. Then we extract feature vectors from the micro-Doppler signatures and explore the use of Gaussian mixture models for training and classification. Finally, we compare the classification performance of the bandpass sampling system, the oversampling system previously reported and the heterodyne system in [11].

The reduced data rate of the bandpass sampling system minimizes the amount of communication as well as the signal processing and computational costs. This enables the micro-Doppler sonar approach to be used in a wireless sensor network for various security and surveillance applications. Although bandpass sampling techniques have been widely used in optics [12], sonar [13], and communications [14] to reduce the data rate or the requirements on the signal processing circuitry, they have not been applied to continuous wave micro-Doppler systems.
The bandpass sampling approach described in this paper also applies to microwave-based continuous wave X-band radars, which were used in the past to study human gaits [15][16][17], eliminating the need for a down conversion mixer and hence reducing overall system complexity and power consumption. An acoustic system nevertheless offers the advantages of lower cost, easier signal processing and immunity from electromagnetic interference.

The rest of this paper is organized as follows. Section II reviews the micro-Doppler effect, discusses the design of a micro-Doppler sonar using bandpass sampling and compares this design with previous designs. In order for us to explore human identification from micro-Doppler signatures and compare different designs, a data collection was conducted to build a database. In Section III, we explore the use of Gaussian mixture models for classification and present the results.

II. MICRO-DOPPLER EFFECT AND BANDPASS SAMPLING

A. Micro-Doppler Effect

The velocity of a moving object relative to an observer can be estimated by measuring the frequency shift of a wave radiated or scattered by the object, known as the Doppler effect. If the object itself contains moving parts, each moving part will result in a modulation of the base Doppler frequency shift, known as the micro-Doppler effect [18]. Given an acoustic wave transmitted by an observer, the frequency of the received wave due to a simple single-point scatterer is [19]

$$f = f_0\left(1 + \frac{2v}{c}\right), \qquad (1)$$

where $f_0$ is the frequency of the transmitted acoustic wave, $v$ is the velocity of the scatterer relative to the observer and $c$ is the speed of sound. The Doppler frequency shift due to the scatterer is $f_{Doppler} = f_0\,\frac{2v}{c}$, which is proportional to the velocity of the scatterer relative to the observer. In the case of an articulated body such as a walking person, the torso, each arm and each leg has its own velocity, and even when the velocity of the torso is constant, the velocity of the limbs changes over time. The Doppler signature $f_{Dsig}$ for such a complex object has multiple time-dependent frequency shifted components and is defined as:

$$f_{Dsig}(t) = f_0 \sum_i \frac{2 v_i(t)}{c}, \qquad (2)$$

where $v_i(t)$ is the velocity of the torso or an individual limb as a function of time. A two-dimensional representation of human gait can be obtained from the returned Doppler signal by applying the short-time Fourier transform (STFT) to the received signal as follows:

$$STFT(t, f) = \int_{-\infty}^{+\infty} x(\tau)\, g(\tau - t)\, e^{-j 2\pi f \tau}\, d\tau, \qquad (3)$$

where x(t) is the received signal, g(t) is a sliding window function (e.g., a Hamming window), t is time and f is frequency. In this time-frequency plot, the horizontal axis is time, the vertical axis is frequency, and the magnitude of the short time Fourier transform output at each point is represented by the hue of the color of the point (or the intensity in the case of a gray-scale representation).

[Fig. 1 block diagram labels: (a) 40kHz, amp, 36kHz, 0-8kHz LPF, S/H 16kHz, classifier; (b) 40kHz, amp, S/H (1MHz, 10.5kHz), classifier]

Fig. 1. (a) Block diagram of the data acquisition and signal processing approach presented in [11]; (b) block diagram of the data acquisition and signal processing approach presented in this paper and our previous paper [1].

B. Data Acquisition Using Bandpass Sampling

Based on the above principle, an active acoustic sensing system was prototyped which allows us to collect gait signatures of humans [1] and four-legged animals [8]. An example spectrogram of a person walking toward the system is illustrated in Figure 2. The frequency of the incident wave is 40kHz and the sampling rate in this previously published direct sampling system is 1MHz. A careful examination of Fig. 2 reveals that all the frequency shifts are located within a 4kHz frequency band around the 40kHz carrier frequency. Such a narrow band signal can be sampled at a rate much lower than the Nyquist rate with bandpass sampling techniques [20]. More specifically, if $f_H$ and $f_L$ are the upper and lower cutoff frequencies of the frequency band of interest, the sampling rate $f_s$ must satisfy

$$\frac{2 f_H}{n} \le f_s \le \frac{2 f_L}{n-1}, \qquad (4)$$

where n is an integer and

$$1 \le n \le \frac{f_H}{f_H - f_L}. \qquad (5)$$

TABLE I. Range of sampling frequencies fs for the bandpass sampling system, given that fL = 38kHz and fH = 42kHz.

n     range
1     84kHz ≤ fs
2     42kHz ≤ fs ≤ 76kHz
3     28kHz ≤ fs ≤ 38kHz
4     21kHz ≤ fs ≤ 25.33kHz
5     16.8kHz ≤ fs ≤ 19kHz
6     14kHz ≤ fs ≤ 15.2kHz
7     12kHz ≤ fs ≤ 12.67kHz
8     10.5kHz ≤ fs ≤ 10.86kHz
9     9.33kHz ≤ fs ≤ 9.5kHz
10    8.4kHz ≤ fs ≤ 8.44kHz

The proof can be found in [12]. When n = 1, Eq.(4) becomes fs ≥ 2fH, which is the Nyquist sampling theorem. When n ≥ 2, subsampling occurs. Using the bandpass sampling theorem, we prototyped a bandpass sampling micro-Doppler active sonar system that significantly reduces the data rate. In our design, we assume fL = 38kHz and fH = 42kHz; therefore, n can be an integer between 1 and 10, according to Eq.(5). Table I lists the range of sampling frequencies fs for each n. A Microchip microcontroller PIC18LF6680 is programmed to drive an ultrasonic transmitter with square waves at 40kHz.
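The allowed sampling ranges can be reproduced with a few lines of code. The sketch below assumes the bandpass sampling condition 2fH/n ≤ fs ≤ 2fL/(n−1) with n an integer between 1 and fH/(fH − fL), and it treats the upper bound for n = 1 as unbounded. The function name is hypothetical.

```python
import math


def bandpass_sampling_ranges(f_low, f_high):
    """List (n, fs_min, fs_max) for every valid integer n.

    Assumes 2*f_high/n <= fs <= 2*f_low/(n-1), with 1 <= n <= f_high/(f_high-f_low).
    For n = 1 the upper bound is infinite (ordinary Nyquist sampling).
    """
    if not 0 < f_low < f_high:
        raise ValueError("expected 0 < f_low < f_high")
    n_max = math.floor(f_high / (f_high - f_low))
    ranges = []
    for n in range(1, n_max + 1):
        fs_min = 2.0 * f_high / n
        fs_max = math.inf if n == 1 else 2.0 * f_low / (n - 1)
        ranges.append((n, fs_min, fs_max))
    return ranges


if __name__ == "__main__":
    for n, fs_min, fs_max in bandpass_sampling_ranges(38e3, 42e3):
        upper = "inf" if math.isinf(fs_max) else f"{fs_max / 1e3:.2f} kHz"
        print(f"n = {n:2d}: {fs_min / 1e3:.2f} kHz <= fs <= {upper}")
```

For fL = 38kHz and fH = 42kHz this yields ten ranges, and a sampling rate near 10.5kHz falls in the n = 8 range.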


[Fig. 3 plot: "Bandpass sampling at 10.526kHz"; horizontal axis: Frequency (kHz), -50 to 50]

Fig. 3. Illustration of bandpass sampling of a signal between 38kHz and 42kHz at a rate of 10.526kHz (whose period is 95µs). The spectrum plotted in black represents the original signal spectrum. Note that the spectrum near DC is aliased from negative frequencies, hence mirrored from its normal view in the positive frequencies.

Fig. 2. Micro-Doppler spectrogram of a person walking toward the system. An accelerometer is attached to the left leg and the velocity derived from the accelerometer signal is superimposed on the spectrogram. The letters L and R mark regions of the spectrogram representing the motion of the left and right legs respectively. The right arrow T points to a region of the spectrogram around 40.3kHz, corresponding to a strong Doppler return due to the motion of the torso (walking speed approximately 1.3m/s).
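The torso return near 40.3kHz can be checked against the single-scatterer Doppler relation f = f0(1 + 2v/c). The sketch below assumes a speed of sound of 343 m/s in air, a value that is not stated in the text; the function name is hypothetical.

```python
def doppler_shift(f0, v, c=343.0):
    """Doppler shift f0 * 2v / c for a scatterer approaching at speed v.

    c defaults to 343 m/s, an assumed speed of sound in air at about 20 C.
    """
    return f0 * 2.0 * v / c


if __name__ == "__main__":
    f0 = 40e3   # transmitted frequency in Hz
    v = 1.3     # approximate walking speed in m/s
    shift = doppler_shift(f0, v)
    print(f"Doppler shift  : {shift:.1f} Hz")
    print(f"received tone  : {(f0 + shift) / 1e3:.2f} kHz")
```

With these numbers the shift is roughly 300Hz, consistent with a torso return around 40.3kHz.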

On the receiving end, the same microcontroller controls a 16-bit analog-to-digital converter (AD7654) to digitize the incoming signal at roughly 10.526kHz (a period of 95µs) and sends the upper 14 bits of every sample to a computer via an RS-232 interface. Although ideally we would like to minimize fs by choosing the largest n, which is 10 in this case, we pick a slightly higher sampling frequency for the following reasons. Since a microcontroller is used in the design to control the analog-to-digital converter in the receiver, the period of the sampling frequency has to be an integer number of instruction cycles of this microcontroller. For the Microchip PIC18LF6680, the instruction cycle is 100ns. Furthermore, the microcontroller has to be programmed to drive the ultrasonic transmitter at 40kHz (corresponding period is 25µs). In order to have a reasonably short piece of assembly code, the least common multiple of the period of the sampling frequency fs and 25µs must not be too large. Taking those constraints into account, our choice for the sampling period is 95µs, i.e. fs ≈ 10.526kHz.

Fig. 4. A sample gait signature collected using a direct sampling system when a person is walking towards the sensors. Note that since no aliasing occurs in this case, the positive Doppler frequency shifts are all above the 40kHz line.

Figure 3 illustrates bandpass sampling of a narrow band signal from 38kHz to 42kHz in the frequency domain. Note that when the spectrum is aliased to near DC, it is mirrored from the original spectrum. Figures 4 and 5 show actual gait signatures (spectrograms) computed from data collected using both the direct sampling approach and the bandpass sampling approach.
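The mirroring noted above can be illustrated with a short folding calculation. This is a sketch under the assumption of ideal sampling of a real-valued tone; `aliased_frequency` is a hypothetical helper, and the 40.3kHz value is the torso return quoted in the caption of Fig. 2.

```python
def aliased_frequency(f, fs):
    """Apparent frequency in [0, fs/2] of a real tone at f Hz sampled at fs Hz.

    The second return value is True when the alias originates from the
    negative-frequency image, i.e. when the spectrum appears mirrored.
    """
    r = f % fs
    if r <= fs / 2.0:
        return r, False
    return fs - r, True

FS = 1.0 / 95e-6  # sampling rate for a 95 us period

carrier_alias, carrier_mirrored = aliased_frequency(40_000.0, FS)
torso_alias, torso_mirrored = aliased_frequency(40_300.0, FS)

print(round(carrier_alias, 1), carrier_mirrored)
print(round(torso_alias, 1), torso_mirrored)
```

With these numbers the 40kHz carrier lands near 2.1kHz and the 40.3kHz return lands near 1.8kHz, so a positive Doppler shift appears below the aliased carrier, which is the mirrored orientation described in the text.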


Fig. 5. A sample gait signature collected using bandpass sampling at the same time as the data in Fig. 4. Note that due to subsampling, aliasing occurs: the frequency spectrum near DC is aliased from negative frequencies and is therefore mirrored relative to Fig. 4.


Fig. 6. [Diagram of the walkway. Labels: chairs, desks, windows; dimensions 27ft (8.2m) and 3ft (0.9m).]

C. Comparison with the Heterodyne Receiver

Although 40kHz ultrasonic transducers were also used in Kalgaonkar and Raj's system [11], a heterodyne receiver was built to mix the returned echoes with a 36kHz sinusoidal wave to demodulate the gait signal from the 40kHz carrier frequency. The output of this mixer was lowpass filtered at 8kHz and a sound card in a personal computer was used to digitize the filter output at a sampling rate of 16kHz. A different but much simpler signal chain is used in the bandpass sampling system presented in this paper and the oversampling system previously published [1], as illustrated by Fig. 1 (b). The received acoustic signals are amplified, then acquired by either oversampling or bandpass sampling, without the use of a mixer or lowpass filter. A 36kHz reference sinusoidal signal is not needed either, thus eliminating the need for a direct digital synthesizer (DDS). The elimination of the mixer, the lowpass filter and the DDS from the signal chain reduces not only the system complexity but also the power consumption. The data rate and power consumption can be further reduced if bandpass sampling is used to acquire the incoming signals, which are then sampled at 10.5kHz as opposed to the 16kHz sampling rate needed in [11].

III. CLASSIFICATION

A. Data Collection

In order to explore the use of micro-Doppler signatures for identification, we collected gait data to build a database. The data collection was conducted along a walkway in a research lab. A diagram of the walkway is shown in Fig. 6 and a photograph in Fig. 7. Eight volunteers, five males and three females, helped with the data collection. In each run, one volunteer was instructed to walk either from point A to B or from B to A. Both the oversampling and the bandpass sampling units were placed at the same location A, and recorded simultaneously for five seconds in each run. Twenty runs were recorded for each volunteer, 10 walking towards the sensors and 10 walking away from them. A total of 320 recordings, 160 from the oversampling unit (at 1MS/s and 12-bit resolution) and another 160 from the bandpass sampling unit (at approximately 10.5kS/s and 14-bit resolution), were acquired for subsequent analyses.

Fig. 7. [Photograph of the walkway.]

B. Gaussian Mixture Models and Feature Vectors

A Gaussian mixture model (GMM) is employed to model the distribution of feature vectors for each walker. Previously, Gaussian mixture models were applied to feature vectors extracted from speech spectra for speaker model training and recognition. Unlike hidden Markov models, a Gaussian mixture model does not impose any Markovian constraints between sound classes, but rather provides a probabilistic model of the underlying sounds of one's voice [21]. These models have also proven effective in modeling the dynamics and movement of the torso and legs of a walking person [11]. Mathematically, the model is

$$\Pr(X \mid w) = \sum_{i} p_i \, \mathcal{N}(X; \mu_i, \sigma_i), \tag{6}$$

where $X$ denotes a feature vector, $\Pr(X \mid w)$ represents the distribution of the feature vectors given a walker $w$, $\mathcal{N}(X; \mu_i, \sigma_i)$ is a Gaussian distribution with mean $\mu_i$ and variance $\sigma_i$ for the $i$th member in the mixture model, and finally $p_i$ is the weight of the $i$th member in the mixture model. It should be noted here that $p_i$, $\mu_i$ and $\sigma_i$ are all functions of $w$, i.e., the probabilistic weight, mean and variance for each member of the Gaussian mixture model are different for different walkers.

The feature vectors can be obtained by calculating the cepstral coefficients from the micro-Doppler signatures, which are the STFT of the received gait signals. In our experiments, only the first 40 dimensions of the cepstral coefficients are retained. Differential cepstral vectors have been widely used in automatic speech recognition techniques to include temporal information. If $c(t)$ is an $n$-dimensional cepstral vector at a given time $t$, the differential cepstrum at $t$ is calculated as a weighted sum of the cepstral vectors up to $K$ frames on either side of $t$, i.e.,

$$\Delta c(t) = \frac{\sum_{k=-K}^{K} k \, c(t+k)}{\sum_{k=-K}^{K} k^{2}}, \tag{7}$$


where we call $K$ the context length. Differential cepstral vectors of context lengths 1 and 2 are calculated using Eq. (7) to augment the cepstral vector. The resulting 80-dimensional vectors are the final set of feature vectors used in the classification. The parameters for the Gaussian mixture models are initialized using the K-means clustering algorithm and learned using the expectation-maximization (EM) algorithm. A simple Bayesian classifier was used for classification. Let $\mathbf{X}$ represent the set of feature vectors obtained from a subject. The subject is recognized as a walker $\hat{w}$ according to the rule:

$$\hat{w} = \arg\max_{w} \Pr(w) \prod_{X \in \mathbf{X}} \Pr(X \mid w), \tag{8}$$

where $\Pr(w)$ is the a priori probability of walker $w$. The MATLAB toolbox Netlab (http://www.ncrg.aston.ac.uk/netlab/) is used to implement these algorithms and evaluate the Gaussian mixture models.

C. Classification Results

1) Context Length of the Differential Cepstral Vectors: Since differential cepstral vectors have been widely used in automatic speech recognition techniques to include temporal information [22], we first explore the impact of the context length used to calculate the differential cepstral vectors on the classification performance. In [11], a context length of 1 is used to augment the feature vectors. Here, we explore the difference in recognition when a different context length is used. Figures 8 and 9 compare the recognition performance using data collected from the direct sampling system and the bandpass sampling system, respectively, when the context length of the differential cepstral coefficients is 2. The number of Gaussians (NGAUSS) in the mixture model and the dimension of the cepstral vectors are swept in our simulations. It can be seen clearly that in both cases a context length of 2 improves the recognition rate by an average of 2-3% over the heterodyne system.

2) Oversampling vs. Bandpass Sampling: We have shown in Figs. 8 and 9 that both the oversampling and the bandpass sampling systems outperform the heterodyne system. Here we compare the classification performance on signals acquired by the oversampling system and the bandpass sampling system. Figure 10 plots the classification results from data obtained simultaneously using the bandpass sampling and direct sampling data acquisition systems. Clearly, the recognition accuracy on data from the direct sampling system is higher than that on data from the bandpass sampling system, but by only about 2%. One possible explanation for this is that when the signal is bandpass sampled at a much lower rate, the noise in the frequency band of the original system is also aliased to near DC and thus degrades the signal-to-noise ratio. However, there is a clear advantage in trading a slight degradation in signal-to-noise ratio and recognition performance for a system that consumes much less power.
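Returning to the feature vectors and decision rule of Section III-B, Eqs. (6) to (8) can be sketched in a few lines. This is an illustrative sketch only, not the Netlab implementation used in the paper: it assumes diagonal covariances, replicates the first and last frames at the edges of a recording, works in the log domain, and uses made-up toy model parameters. Training by K-means and EM is omitted, and all function names are hypothetical.

```python
import math

def delta_cepstrum(frames, K):
    """Eq. (7) for every frame; frames is a list of equal-length lists.

    Assumption: frames beyond either end are replaced by the first/last frame.
    """
    T = len(frames)
    denom = sum(k * k for k in range(-K, K + 1))
    deltas = []
    for t in range(T):
        acc = [0.0] * len(frames[t])
        for k in range(-K, K + 1):
            src = frames[min(max(t + k, 0), T - 1)]
            for d, value in enumerate(src):
                acc[d] += k * value
        deltas.append([a / denom for a in acc])
    return deltas

def augment(frames, K):
    """Append the differential cepstrum to each cepstral vector."""
    return [c + d for c, d in zip(frames, delta_cepstrum(frames, K))]

def gmm_log_likelihood(x, weights, means, variances):
    """log Pr(x | w) from Eq. (6), assuming diagonal covariances."""
    component_logs = []
    for p, mu, var in zip(weights, means, variances):
        log_n = 0.0
        for xd, md, vd in zip(x, mu, var):
            log_n += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
        component_logs.append(math.log(p) + log_n)
    peak = max(component_logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(v - peak) for v in component_logs))

def classify(feature_vectors, models, priors):
    """Eq. (8) in the log domain: argmax of log prior plus summed log-likelihoods."""
    scores = {}
    for walker, (weights, means, variances) in models.items():
        scores[walker] = math.log(priors[walker]) + sum(
            gmm_log_likelihood(x, weights, means, variances) for x in feature_vectors)
    return max(scores, key=scores.get)

# Toy data: a 1-dimensional "cepstrum" that ramps linearly over five frames.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
features = augment(ramp, K=2)

# Toy single-component models (weights, means, variances); values are invented.
models = {
    "walker_a": ([1.0], [[2.0, 1.0]], [[1.0, 1.0]]),
    "walker_b": ([1.0], [[-2.0, -1.0]], [[1.0, 1.0]]),
}
priors = {"walker_a": 0.5, "walker_b": 0.5}
predicted = classify(features, models, priors)
print(predicted)
```

Summing log-likelihoods is equivalent to the product in Eq. (8) and avoids numerical underflow when many feature vectors are available for one subject.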


Fig. 8. Comparison of recognition performance on data acquired from the direct sampling system, with a context K=2 vs. K=1 reported in [11]. [Plot of classification rate; curves for NGAUSS = 10, 12, 14, 16, 18, 20, oversampling vs. heterodyne.]

Fig. 9. Comparison of recognition performance on data acquired from the bandpass sampling system, with a context K=2 vs. K=1 reported in [11]. [Plot title: Comparison of classification rates. Vertical axis: classification rate. Curves for NGAUSS = 10, 12, 14, 16, 18, 20, bandpass vs. heterodyne.]

3) Bandpass Sampling: 14-bit ADC vs. 8-bit ADC: To further reduce the amount of data we need to process, we compared the recognition performance on data with 14-bit resolution against data with only 8-bit resolution. The 8-bit data is obtained in MATLAB by trimming the 6 least significant bits from the 14-bit data set. We can see that the performance difference is statistically negligible.

Fig. 10. Comparison of recognition performance: bandpass sampling vs. oversampling (direct) in data acquisition. [Plot title: Comparison of classification rates. Vertical axis: classification rate. Curves for NGAUSS = 10, 12, 14, 16, 18, 20, oversampling vs. bandpass.]

Fig. 11. Comparison of recognition performance: 14-bit ADC vs. 8-bit ADC in bandpass sampling. [Plot title: Comparison of classification rates (NGAUSS = 16). Vertical axis: classification rate.]

IV. CONCLUSION

We present the design of an active acoustic micro-Doppler sonar system that significantly reduces the data rate from our previous design [1]. This system utilizes the bandpass sampling technique to acquire the sonar return at 10.5kHz and is therefore amenable to sensor network systems where energy and communication resources are limited. We then explore classification using Gaussian mixture models, with cepstral and differential cepstral coefficients as feature vectors, and compare the performance of the bandpass sampling system and the oversampling system with that of the heterodyne system previously published [11]. Simulation results show that the recognition accuracy is improved when the context length of the differential cepstral vectors is increased to 2. Furthermore, the oversampling system demonstrates the best classification performance among all three in our simulations. Finally, the classification performance of the bandpass sampling system with 8-bit resolution is close to that of the oversampling system, thus making it very attractive for implementation in low-power VLSI and for various security and surveillance applications in a wireless sensor network environment.

REFERENCES

[1] Z. Zhang, P. Pouliquen, A. Waxman, and A. G. Andreou, "Acoustic micro-doppler radar for human gait imaging," Journal of Acoustical Society of America Express Letters, vol. 121, pp. EL110–EL113, March 2007.
[2] G. Cauwenberghs, A. G. Andreou, J. West, M. Stanacevic, A. Celik, P. Julian, T. Teixeira, C. Diehl, and L. Riddle, "A miniature, low-power, intelligent sensor node for persistent acoustic surveillance," in Proc. SPIE Defense and Security Symposium, Orlando, FL, 2005.
[3] P. Julian, A. Andreou, L. Riddle, S. Shamma, D. Goldberg, and G. Cauwenberghs, "Comparative study of sound localization algorithms for energy aware sensor network nodes," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 4, pp. 640–648, April 2004.
[4] M. Stanacevic and G. Cauwenberghs, "Micropower gradient flow acoustic localization," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, no. 10, pp. 2148–2157, October 2005.
[5] L. Warren, "Hull-mounted sonar/ship design evolution and transition to low-frequency applications," IEEE Journal of Oceanic Engineering, vol. 13, no. 4, pp. 196–198, October 1988.
[6] P. Wells, "Current status and future technical advances of ultrasonic imaging," IEEE Engineering in Medicine and Biology, vol. 19, no. 5, pp. 14–20, September 2000.
[7] M. A. Clapp and R. Etienne-Cummings, "Single ping - multiple measurements: Sonar bearing angle estimation using spatiotemporal frequency filters," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 53, no. 4, pp. 769–783, April 2006.
[8] Z. Zhang, P. Pouliquen, A. Waxman, and A. G. Andreou, "Acoustic micro-doppler gait signatures of humans and animals," in Conference on Information Sciences and Systems, March 2007, pp. 627–630.
[9] M. S. Nixon and J. N. Carter, "Automatic recognition by gait," Proceedings of IEEE, vol. 94, no. 11, pp. 2013–2024, November 2006.
[10] S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, and K. W. Bowyer, "The HumanID Gait Challenge Problem: Data Sets, Performance and Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, pp. 162–177, February 2005.
[11] K. Kalgaonkar and B. Raj, "Acoustic doppler sonar for gait recognition," in Conference on Advanced Video and Signal Based Surveillance, 2007, pp. 27–32.
[12] J. D. Gaskill, Linear Systems, Fourier Transforms and Optics. New York, NY: Wiley, 1983.
[13] O. D. Grace and S. P. Pitt, "Quadrature sampling of high frequency waveforms," Journal of Acoustical Society of America, vol. 44, pp. 1432–1436, 1968.
[14] W. M. Waters and B. R. Jarrett, "Bandpass signal sampling and coherent detection," IEEE Transactions on Aerospace Electronic Systems, vol. AES-18, pp. 731–736, November 1982.
[15] J. Geisheimer, W. Marshall, and E. Greneker, "A continuous-wave (CW) radar for gait analysis," in Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computer, vol. 1. Pacific Grove, CA: IEEE, Nov 2001, pp. 834–838.
[16] J. Geisheimer, E. Greneker, and W. Marshall, "A high-resolution doppler model of human gait," in Proceedings of SPIE, 2002.
[17] M. Otero, "Application of a continuous wave radar for human gait recognition," in Proceedings of SPIE: Signal Processing, Sensor Fusion, and Target Recognition XIV, vol. 5809, Orlando, FL, March 2005, pp. 538–548.
[18] V. C. Chen and H. Ling, Time-Frequency Transforms for Radar Imaging and Signal Analysis. Boston, MA: Artech House, January 2002.
[19] P. A. Tipler, Physics for Scientists and Engineers, 3rd ed. New York, NY: Worth Publishers, 1991.
[20] R. G. Vaughan, N. L. Scott, and D. R. White, "The theory of bandpass sampling," IEEE Transactions on Signal Processing, vol. 39, no. 9, pp. 1973–1984, September 1991.
[21] D. Reynolds and R. Rose, "Robust text-independent speaker identification using gaussian mixture speaker models," Speech and Audio Processing, IEEE Transactions on, vol. 3, no. 1, pp. 72–83, Jan 1995.
[22] B. Milner, "Inclusion of temporal information into features for speech recognition," Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on, vol. 1, pp. 256–259, 3-6 Oct 1996.

Impulse Radio Address Event Interconnects for Body Area Networks and Neural Prostheses

Andrew Cassidy, Zhaonian Zhang, Andreas G. Andreou

Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 Email: {acassidy, zz, andreou}@jhu.edu

Abstract: Impulse or Ultra Wideband (UWB) radio is a wireless communications method, providing high bandwidth for short range communication. Address Event Representation (AER) is a communications protocol for conveying information in terms of events and is widely used in neuromorphic systems. Combining the two techniques, we present a method for wirelessly sending AER events between neuromorphic systems. The resulting wireless protocol includes an AER event interface, asynchronous handshaking, PRN coding, and UWB transmission. The first three functions are implemented in digital logic and the last, the physical transmission, is accomplished using UWB radio components. The asynchronous handshaking and PRN coding provide two layers of robustness for operating the protocol in noisy conditions. An analysis of the capacity of a network of UWB AER nodes demonstrates its ability to support several thousand neurons across multiple transmitters, even at relatively high sustained firing rates, making it suitable for Body Area Networks (BANs) and wireless networked neural prostheses.

I. INTRODUCTION

Address event representation (AER) is the predominant communication paradigm in spike-based neuromorphic systems. It has two major characteristics that are advantageous for communication of neural spike events. First, it is logically asynchronous, enabling communication of spike events at the time they occur. This retains the spike timing information critical for neuromorphic systems. Second, AER is a multiplexed bus, allowing spike events from many spike sources to share the same communication channel. This efficiently utilizes the high frequency of communication in VLSI in order to compensate for the scarcity of wires, particularly in inter-chip communications.

Wireless AER increases the scope of applicability of the communication method, and opens up the possibility of efficiently communicating AER information over longer, undefined, and untethered distances. Applications for which this is important include neuromorphic robotics and wireless body sensor networks. Wireless AER benefits neuromorphic robotics by enabling networks of autonomous AER systems to communicate or interact. Wireless AER benefits neuromorphic sensor networks by communicating remote neuromorphic sensor information to base stations or other autonomous sensor network nodes.

A. Background

AER [1] [2] [3] and its variant Address Data Event Representation (ADER) [4] are asynchronous digital multiplexing protocols. They are widely used for interchip communications in multi-chip neuromorphic system architectures, such as silicon retinas [5] [6] [7], cochleas [8], as well as among chips representing computation in the cortex [9] [10] [11] [12] [13]. Although wired links offer simplicity in implementing AER or ADER interfaces, much recent work has been geared towards building a distributed wireless cortex. In fact, wireless AER has been proposed and demonstrated in a distributed wireless cortex architecture [14] that employed silicon retinas and commercial off-the-shelf wireless sensor motes [15] [16].

In this paper, we report on the architecture and experimental demonstration of a wireless AER link that employs Ultra Wideband (UWB) radio as an alternative. Also known as impulse radio and analogous to the way information is communicated and processed in the brain [17], UWB utilizes short bursts of pulses to provide short-range, high-bandwidth wireless communications at very low energy levels [18]. Unlike a conventional carrier-based communication system, a UWB system does not require a free-running oscillator to provide a carrier for the modulator or demodulator. Thus the system design employs mostly digital high-speed circuits that are amenable to CMOS technology scaling. With the use of a pseudo-random noise (PRN) sequence to address each user in the system, each wireless transmitter can be addressed individually without conflict. The low emission limit (-41.3dBm/MHz between 3.1GHz and 10.6GHz) set by the FCC on a UWB transmitter also ensures that it does not interfere with existing communication systems.

II. SYSTEM ARCHITECTURE

Our impulse radio AER architecture is illustrated in Fig. 1. The data bus, request, and acknowledgement signals in a wired AER system are replaced by a wireless channel and a pair of antennas on each end. A D-bit AER event is represented by D data symbols. Within each data symbol, an entire N-bit PRN code is either transmitted directly to signify a data bit of 1, or transmitted inverted to signify a data bit of 0. An end-of-message (EOM) symbol is sent after all of the data symbols for one event have been transmitted. When the receiver receives the EOM symbol, it returns an acknowledgement (ACK) symbol to acknowledge that it has received or recovered the message without error. If it has not received the correct number of message bits, it does not return the ACK symbol. The transmitter, after sending the EOM symbol, waits for the ACK symbol to be returned before sending the next event in the queue. If it does not receive the ACK within a specified timeout period, the transmitter retransmits the previous message. This serves as a higher-level method of error detection: if a message is corrupted beyond recovery, it is retransmitted until it is correctly received.

Using m-sequences (maximum length sequences) as the PRN, a number of pseudo-orthogonal codewords can be produced to support multiple access to the same wireless channel simultaneously with nearly zero interference. The receiver correlates the received PRN with reference PRNs for different users to determine its ownership. The number of simultaneous users in such an AER wireless network is determined by the number of distinguishable pseudo-orthogonal PRN codewords of a given length N. The number of m-sequences for PRN codewords of length N is shown in Table I (from [19]). As explained above, each transmitter requires two code words: one for data bits (1 - direct and 0 - inverted) and one for EOM/ACK (direct and inverted). For example, for a PRN code width of 63, there are 6 m-sequences, and therefore 3 transmitters can be supported on the network. Although longer PRN codewords support more users, they also introduce longer latency in transmitting the message and require longer correlation banks in the implementation. A full analysis of the tradeoffs associated with PRN selection is given in Section IV.

Fig. 1. Wireless AER system architecture, depicting transmission of one AER event composed of D data symbols.

TABLE I
PRN M-SEQUENCES [19]

n     N = 2^n - 1    M = # m-sequences
3     7              2
4     15             2
5     31             6
6     63             6
7     127            18
8     255            16
9     511            48
10    1023           60
11    2047           176
12    4095           144
13    8191           630
14    16383          756
15    32767          1800
16    65535          2048
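The m-sequence counts in Table I can be cross-checked against a standard result that is not stated in the paper: the number of distinct m-sequences of length 2^n - 1 equals phi(2^n - 1)/n, where phi is Euler's totient function. The sketch below is illustrative only; the function names are hypothetical, and the channel count follows the two-codewords-per-transmitter rule described above.

```python
def euler_totient(m):
    """Count the integers in 1..m that are coprime to m (trial-division factorization)."""
    result = m
    p = 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def m_sequence_count(n):
    """Number of m-sequences of length 2**n - 1 (standard result, assumed here)."""
    return euler_totient(2 ** n - 1) // n

def duplex_channels(n):
    """Two codewords (data and EOM/ACK) are needed per transmitter."""
    return m_sequence_count(n) // 2

# M column of Table I, keyed by n.
TABLE_I = {3: 2, 4: 2, 5: 6, 6: 6, 7: 18, 8: 16, 9: 48, 10: 60,
           11: 176, 12: 144, 13: 630, 14: 756, 15: 1800, 16: 2048}

mismatches = [n for n, m in TABLE_I.items() if m_sequence_count(n) != m]
print("rows that disagree with the formula:", mismatches)
print("transmitters for a 63-bit code:", duplex_channels(6))
```

Every row of Table I agrees with the formula, and a 63-bit code (n = 6) yields the 3 transmitters quoted in the text.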

Fig. 2. Implementation of the UWB-AER transceiver architecture. A linear feedback shift register (LFSR) is used to generate the PRN codewords. The digital logic of the transmitter and receiver is implemented on an FPGA. [Block diagram labels: AER interface, m-bit events, shift register, Tx logic, LFSR, encoder, transmitter oscillator, receiver, decoder, correlator bank, Rx logic, ACK.]
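To make the coding scheme concrete, the following sketch generates a 15-chip m-sequence with a software LFSR, sends data bits as the direct or inverted codeword, and recovers them with a thresholded correlator. It is a toy model under stated assumptions: the recurrence a[k] = a[k-4] XOR a[k-3] (primitive polynomial x^4 + x + 1), the seed, and the threshold of 9 are choices made for illustration, not values from the paper, and only the data codeword is modeled (no EOM/ACK codeword, no RF channel).

```python
def lfsr_sequence(degree, lags, seed):
    """One period (2**degree - 1 chips) of the recurrence a[k] = XOR of a[k - lag]."""
    bits = list(seed)
    length = 2 ** degree - 1
    while len(bits) < length:
        new_bit = 0
        for lag in lags:
            new_bit ^= bits[-lag]
        bits.append(new_bit)
    return bits[:length]

def invert(code):
    return [1 - chip for chip in code]

def encode_bits(bits, code):
    """Data bit 1 -> codeword sent directly; data bit 0 -> codeword sent inverted."""
    return [list(code) if bit == 1 else invert(code) for bit in bits]

def correlate(received, reference):
    """Agreements minus disagreements between two chip sequences."""
    return sum(1 if r == c else -1 for r, c in zip(received, reference))

def decode_symbol(received, code, threshold):
    """Return 1 or 0 when the correlation magnitude clears the threshold, else None."""
    score = correlate(received, code)
    if score >= threshold:
        return 1
    if score <= -threshold:
        return 0
    return None

def cyclic_shift(code, shift):
    return code[shift:] + code[:shift]

CODE = lfsr_sequence(4, (4, 3), [1, 0, 0, 0])
off_peak = [correlate(CODE, cyclic_shift(CODE, s)) for s in range(1, len(CODE))]

symbols = encode_bits([1, 1, 0], CODE)
symbols[2][0] ^= 1   # inject two chip errors into the third symbol
symbols[2][5] ^= 1
decoded = [decode_symbol(s, CODE, threshold=9) for s in symbols]
print(CODE, decoded)
```

The off-peak cyclic autocorrelation of an m-sequence is -1 at every nonzero shift, which is what lets a correlator bank separate a valid symbol from a misaligned one; the two injected chip errors only lower the correlation magnitude from 15 to 11, so the symbol still decodes.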

The UWB-AER transmitter and receiver architectures are illustrated in Figure 2. The transmitter comprises a gated oscillator (Cellonics UWBTM-001, [20]) that outputs short pulses (approximately 2.5ns wide) centered around 4.5GHz with a bandwidth of 800MHz and an output power of -57dBm. The output of the gated oscillator is then amplified by a broadband amplifier (Picosecond 5840B) with a 21dB gain to produce a -36dBm signal on a broadband antenna (Skycross SMT-3TO10M-A). The oscillator is gated by the serialized digital PRN codeword of length N. When the serial bit is a 1, the oscillator is turned on and sends out the coded pulse train on the wireless interface.

On the receiving end, the signal is received by the antenna (Skycross SMT-3TO10M-A), amplified by a broadband amplifier (Picosecond 5840B), and then sent to an envelope detector (Linear Technology LTC5532). The output of the envelope detector is then fed into a comparator to create a digital pulse train, which represents the serialized PRN codeword. A bank of correlators compares the recovered incoming signal with local references to determine its validity and ownership. A symbol is valid when the correlation output exceeds a pre-defined threshold. The correlator bank also distinguishes between data symbols and EOM symbols. If a data codeword is detected (direct or inverted), a bit is sent to the AER shift register that reconstructs the AER event. Once the EOM codeword is received, if the correct number of message data bits has been received, the AER event is recovered, and the AER interface logic in the receiver returns an ACK to the AER interface in the transmitter.

The digital logic for the transmitter and receiver is implemented on a Xilinx XC3S1500 FPGA. It uses at most 5% of the FPGA's flip-flops and LUTs, leaving plenty of room for implementing the AER neural array itself. The device utilization is summarized in Table II. The transceiver was hosted on an Opal Kelly XEM3010 FPGA integration module. The integration module has a USB 2.0 interface to a host PC. High-level control of and interfacing to the design is through MATLAB or Visual C++. Figure 3 shows an oscilloscope capture of the control signal for the gated oscillator on the transmitter and the received pulses after they have been filtered by the envelope detector.

TABLE II
DEVICE UTILIZATION: XILINX SPARTAN XC3S1500

Resource     Percent Utilization    Total Available
Slice FFs    5%                     26,624
4-LUTs       3%                     26,624
2KB RAMs     18%                    32

Fig. 3. Oscilloscope capture of the control signal to the gated oscillator (top) and the recovered signal from the envelope detector. Three data symbols (1,1,0) and an EOM symbol are shown.

IV. CAPACITY ANALYSIS

The capacity of a UWB-AER wireless network is determined in terms of two parameters: (1) the number of AER event generators or neurons (Nnrn) supported by a single UWB-AER wireless channel and (2) the number of distinct wireless channels (Nch) that can be simultaneously addressed in the network. A generic UWB-AER wireless network is shown in Figure 4, depicting the neurons in each wireless node, as well as the distinct wireless channels in the network. Note that each channel is duplex, requiring a transmitter and receiver on both ends of a channel, so that every message can be acknowledged.

Fig. 4. [Diagram of a generic UWB-AER wireless network. Labels: Node 0, Node 1, Node 2, Node 3, ..., Node 2Nch-2; Nch channels.]

The number of AER neurons is determined by two factors: the width of the AER word and the capacity of the wireless channel. Given $w_{aer}$, the AER bus width in bits, $n = 2^{w_{aer}}$ neurons are supported by the AER bus. For example, a 10-bit AER bus supports 1024 neurons.


The UWB-AER wireless channel capacity is analyzed below. The number of channels in a UWB-AER network is determined by the width of the PRN (pseudo-random noise) code used for encoding the transmitted data. The wider the PRN, the more orthogonal combinations exist. Table I [19] shows the number of m-sequences ($M$) for a given PRN width $N = 2^n - 1$. The number of channels is $N_{ch} = M/2$, since each channel requires two distinct code words, one for data and one for EOM/ACK.

A. Single Channel Capacity

The number of neurons ($N_{nrn}$) supported by a single UWB-AER wireless channel is determined as follows. In this analysis, we assume that the UWB transmitter sends pulses with a width of $t_p = 10$ ns/pulse, a conservative value, as the minimum UWB pulse width is approximately 5 ns/pulse. The number of PRN words, $w_{msg}$ (PRN words/event), that compose an AER message is:

$$w_{msg} = w_{aer} + 2 \tag{1}$$

where $w_{aer}$ is defined as the AER bus width in bits. The time to send an event, $t_{event}$ (sec/event), is:

$$t_{event} = w_{msg} \cdot w_{prn} \cdot t_p \tag{3}$$

where $w_{prn}$ (bits/PRN word) is the number of bits in a PRN word, and $t_p$ (sec/bit) is the UWB pulse width. The event rate, $D_{ch}$ (events/sec), of a single wireless AER channel is:

$$D_{ch} = 1 / t_{event} \tag{4}$$

The overall channel capacity, $C_{ch}$, is the number of neurons that can be supported by the channel:

$$C_{ch} = D_{ch} / D_{neuron} \tag{5}$$

where $D_{neuron}$ (events/sec) is the event rate of a single neuron. For example, suppose the PRN width is 15 bits, the AER bus width is 10 bits, and each neuron on the AER bus fires at a rate of 1kHz. Using equations 3 and 4 above, the number of neurons supported by the wireless UWB-AER channel is:

$$t_{event} = (10 + 2) \times 15 \times 10\,\mathrm{ns} = 1800\,\mathrm{ns/event}$$

$$D_{ch} = 555.55\mathrm{k\ events/sec}$$

$$C_{ch} = D_{ch} / D_{neuron} = 555.55\mathrm{k\ (events/sec)} / 1\,\mathrm{kHz} = 555\ \mathrm{neurons}$$

while the AER bus supports $2^{10} = 1024$ neurons. So the wireless channel is limiting the system throughput. If the assumed fire rate for each neuron is reduced to 500Hz, then the channel capacity becomes:

$$C_{ch} = 555.55\mathrm{k\ (events/sec)} / 500\,\mathrm{Hz} = 1111\ \mathrm{neurons}$$

which is above the 1024 neurons supported by the AER bus width. Also note that since the PRN width is 15 bits, there are 2 m-sequences (given in Table I), so only one channel can be supported by the wireless network.

B. Network Capacity

Generalizing the previous example, we can plot curves showing the tradeoffs for $N_{nrn}$, the number of AER neurons supported by a single UWB-AER wireless channel, and $N_{ch}$, the number of distinct wireless channels in the UWB-AER network. The number of AER neurons is determined by the AER bus width and the channel capacity. The channel capacity is primarily determined by three parameters: the AER bus width, the PRN word width, and the single neuron fire rate. The following plots show the number of neurons as a function of the AER bus width and an additional variable, either neuron firing rate (Figure 5) or PRN word width (Figures 6 and 7).

Fig. 5. [Plot: UWB AER network capacity, variable individual fire rate. Vertical axis: number of neurons. Curves: 2^(AER width), 100 Hz, 200 Hz, 500 Hz, 1000 Hz.]
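The single-channel capacity relations of Eqs. (1), (3), (4) and (5) can be evaluated directly. The following is a minimal sketch that reproduces the worked example; the function names are hypothetical, the 10ns pulse width is the value assumed in the text, and taking the smaller of the bus limit and the channel capacity reflects the "minimum of the two curves" reading of Figure 5.

```python
def event_time_ns(aer_width, prn_width, pulse_ns=10):
    """Eqs. (1) and (3): time to send one AER event, in nanoseconds."""
    words_per_event = aer_width + 2
    return words_per_event * prn_width * pulse_ns

def channel_capacity(aer_width, prn_width, fire_rate_hz, pulse_ns=10):
    """Eqs. (4) and (5): number of neurons the wireless channel can carry."""
    events_per_second = 1e9 / event_time_ns(aer_width, prn_width, pulse_ns)
    return events_per_second / fire_rate_hz

def supported_neurons(aer_width, prn_width, fire_rate_hz, pulse_ns=10):
    """Smaller of the AER bus limit (2**width) and the channel capacity."""
    bus_limit = 2 ** aer_width
    return min(bus_limit, int(channel_capacity(aer_width, prn_width, fire_rate_hz, pulse_ns)))

# Worked example from the text: 15-bit PRN, 10-bit AER bus.
print(event_time_ns(10, 15))                 # 1800 ns per event
print(int(channel_capacity(10, 15, 1000)))   # 555 neurons at 1 kHz
print(int(channel_capacity(10, 15, 500)))    # 1111 neurons at 500 Hz

# 11-bit AER bus: channel-limited at 500 Hz, bus-limited at 200 Hz.
print(supported_neurons(11, 15, 500), supported_neurons(11, 15, 200))
```

For an 11-bit bus the sketch gives about 1025 neurons at 500Hz (channel-limited) and 2048 neurons at 200Hz (bus-limited).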

Fig. 6. [Plot: UWB AER network capacity, variable PRN width. Vertical axis: number of neurons. Curves: 2^(AER width) and PRN widths of 7, 15, 31, 63, and 127 bits.]

Fig. 7. [Plot: UWB AER network capacity, variable PRN width. Vertical axis: number of neurons. Curves: 2^(AER width) and PRN widths of 255, 511, 1023, 2047, and 4095 bits.]

In Figure 5, one curve plots the increasing number of neurons as the AER bus width increases (black). Another set of curves plots the decreasing number of neurons supported by the channel as the AER width increases. This is plotted for several neural firing rates. The PRN width is held fixed at 15 bits. The minimum of these two curves is the maximum number of neurons supported for the given parameters. For example, if the AER bus width is 11 bits and the neural firing rate is 500Hz, then the number of neurons is limited by the channel capacity (to approximately 1000 neurons). If the firing rate is 200Hz, however, the number of neurons is limited by the AER bus width (to 2048 neurons).

Figures 6 and 7 depict the channel capacity (number of neurons) versus AER bus width for several PRN widths, assuming a fixed neural firing rate of 200Hz. Figure 6 shows PRN bit widths of 7 to 127 bits, and corresponding numbers of neurons in the thousands. Figure 7 shows higher PRN bit widths ranging from 255 bits to 4095 bits, and hundreds of neurons. As the PRN width grows, the number of neurons per channel decreases.

The advantage of increasing the PRN width is that the number of channels per network can be increased (at the expense of neurons per channel). Assuming a fixed neural firing rate of 200 or 100 Hz (N200 and N100 respectively), the number of neurons and the number of channels are plotted versus the PRN width in Figures 8 and 9. This shows the tradeoff: the number of neurons per channel decreases while the number of channels increases as the PRN width grows larger. The number of channels and the number of neurons for each PRN width are also listed in Table III.

Fig. 8. [Plot: UWB AER network capacity, duplex channels and neurons.]

From the table, we can see that the network can support thousands of neurons per channel for a few channels (less than four) or hundreds of neurons per channel for nine channels. Note that these estimates are for high sustained ring rates (100-200Hz) for every neuron. It is more realistic for a neural array to have a wide variety of ring rates, with only a fraction of the neurons ring at a high sustained rate. V. D ISCUSSION AND C ONCLUSION We have demonstrated a UWB based wireless AER system that enables wireless interchip communications between neuromorphic systems. With the use of a gated oscillator in the transmitter, the system power consumption is reduced because the system does not require a free running oscillator to provide a carrier reference for the


Fig. 9. UWB AER network capacity, duplex channels. (Plot: number of neurons and number of UWB channels.)

TABLE III

PRN width (bits)   Channels   N200   N100
7                  1          4096   8192
15                 1          2048   4096
31                 3          1024   2048
63                 3          512    1024
511                24         64     128
1023               30         32     64

modulator or demodulator as in a conventional communication system. The gated-pulse wireless interface presented in this paper can easily be generalized for use with other asynchronous protocols to achieve ultra-low-power operation. In addition, our asynchronous wireless AER protocol is compatible with other wireless transmission technologies. The wireless protocol includes AER event input and output, asynchronous handshaking, PRN coding, and UWB transmission. The top three layers are implemented in digital logic, and the lowest (physical) layer is implemented with UWB radio components.

After describing the architecture, implementation, and testing of the UWB AER link, we presented an analysis of the capacity of a network of UWB AER nodes. The combined UWB AER protocol is able to support several thousand neurons across multiple transmitters, even at relatively high sustained firing rates.

REFERENCES

[1] M. Sivilotti, "Wiring considerations in analog VLSI systems with applications to field programmable networks," Ph.D. dissertation, California Institute of Technology, 1991.
[2] M. Mahowald, "VLSI analogs of neuronal visual processing: A synthesis of form and function," Ph.D. dissertation, California Institute of Technology, 1992.
[3] K. A. Boahen, "Point-to-point connectivity between neuromorphic chips using address events," IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 5, pp. 416–434, 2000.
[4] J. Georgiou and A. Andreou, "Address-data event representation for communication in multichip neuromorphic system architectures," Electronics Letters, vol. 43, no. 14, July 5, 2007.
[5] K. Boahen, "Retinomorphic vision systems," Proc. of MicroNeuro '96 (IEEE), pp. 2–14, 1996.
[6] E. Culurciello, R. Etienne-Cummings, and K. A. Boahen, "A biomorphic digital image sensor," IEEE Journal of Solid-State Circuits, vol. 38, no. 2, pp. 281–294, February 2003.
[7] E. Culurciello and A. Andreou, "16 x 16 pixel silicon on sapphire CMOS digital pixel photosensor array," IEE Electronics Letters, vol. 40, no. 1, pp. 66–67, January 2004.
[8] V. Chan, S. Liu, and A. van Schaik, "AER EAR: A matched silicon cochlea pair with address event representation interface," IEEE Transactions on Circuits and Systems I, vol. 54, no. 1, pp. 48–59, 2007.
[9] T. Choi, P. Merolla, J. Arthur, K. Boahen, and B. Shi, "Neuromorphic implementation of orientation hypercolumns," IEEE Transactions on Circuits and Systems I, vol. 52, no. 6, pp. 1049–1060, 2005.
[10] T. Serrano-Gotarredona, A. Andreou, and B. Linares-Barranco, "AER image filtering architecture for vision-processing systems," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 46, no. 9, pp. 1064–1071, Sep. 1999.
[11] G. Indiveri, E. Chicca, and R. Douglas, "A VLSI reconfigurable network of integrate-and-fire neurons with spike-based learning synapses," Proceedings of 12th European Symposium on Artificial Neural Networks (ESANN'04), pp. 405–410, 2004.
[12] D. Goldberg, G. Cauwenberghs, and A. Andreou, "Probabilistic synaptic weighting in a reconfigurable network of VLSI integrate-and-fire neurons," Neural Networks, vol. 14, no. 6–7, pp. 781–793, July 2001.
[13] R. J. Vogelstein, U. Mallik, E. Culurciello, G. Cauwenberghs, and R. Etienne-Cummings, "A multichip neuromorphic system for spike-based visual information processing," Neural Comp., vol. 19, no. 9, pp. 2281–2300, 2007.
[14] E. Culurciello, A. Andreou, and P. Mandolesi, "A distributed network for visual processing," Autumn 2004.
[15] E. Culurciello and A. G. Andreou, "CMOS image sensors for sensor networks," Analog Integrated Circuits and Signal Processing, vol. 49, no. 1, pp. 39–51, October 2006.
[16] T. Teixeira, E. Culurciello, J. Park, D. Lymberopoulos, A. Barton-Sweeney, and A. Savvides, "Address-event imagers for sensor networks: evaluation and modeling," The Fifth International Conference on Information Processing in Sensor Networks (IPSN 2006), pp. 458–466, 19-21 April 2006.
[17] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek, Spikes: Exploring the Neural Code. Cambridge, MA: MIT Press, 1997.
[18] M. Z. Win and R. A. Scholtz, "Impulse radio: how it works," IEEE Communications Letters, vol. 2, no. 2, pp. 36–38, 1998.
[19] D. Sarwate and M. Pursley, "Crosscorrelation properties of pseudorandom and related sequences," Proceedings of the IEEE, vol. 68, no. 5, pp. 593–619, 1980.
[20] J. Joe, "Cellonics UWB pulse generators," in International Workshop on Ultra Wideband Systems, Oulu, Finland, 2003.


Edward Choi, Recep Ozgun, Bal Mukund Dhar, Howard Katz and Andreas G. Andreou

Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA. Email: {echoi, recep, andreou}@jhu.edu
Materials Science and Engineering, The Johns Hopkins University, Baltimore, MD, USA. Email: {bdhar1, hekatz}@jhu.edu

Abstract: We report on design flows for a fabrication process that places n- and p-type organic transistors on a single substrate, allowing the construction of integrated circuits. Our fabrication process employs Cytop as the gate dielectric and four different combinations of organic semiconductors. Two p-type organic semiconductors were used, pentacene and α-sexithiophene, and two n-type semiconductors, F15-NTCDI (bis(pentadecafluorooctyl) naphthalenetetracarboxylic diimide) and F16CuPc (hexadecafluorocopper phthalocyanine). Combining both p- and n-type organic semiconductors on a single, interconnected substrate allows for CMOS digital circuits. Through the use of a silicon-on-insulator substrate, individual transistors with separate contacts for gate, drain and source can be fabricated, allowing for the development of hybrid silicon/organic integrated circuits. Experimental results from fabricated devices and simple circuits are reported.

Fig. 1. α-sexithiophene

Fig. 2. Pentacene

I. INTRODUCTION

Research in organic semiconductors[1] has progressed to building circuits for large-area applications[2], such as solar cells[3] and flexible-substrate[4] organic light-emitting device (OLED) displays[5], where the relatively inexpensive organic processes show many advantages over conventional silicon-based processes. Technologies developed for patterning these organics include shadow masks for evaporation[6][7], the typical method of deposition, and also new compounds compatible with printing[8][9], to the extent that stacked, 3D processes might be possible[10]. Modeling of prototype devices built with these organic semiconductors[11] and their incorporation into simple circuits[12][13][14] demonstrate great promise for this technology. The latter examples employ pseudo-nMOS designs based on pentacene, which works as a p-type organic semiconductor, and thus use only p-type transistors. There has been some work on complementary organic transistor circuits using F16CuPc as the n-type material[15][16][17].

There are further opportunities for hybrid integration, combining the electronics of an underlying oxidized silicon substrate with the organic materials above. This not only suggests a test platform for characterizing organic materials but also allows for the possibility of hybrid chips, where the materials can operate in a synergistic fashion.

In an integrative research program that spans materials, devices, circuits and sub-systems, there needs to be a systematic way of tackling the problems at the different levels of the process while at the same time making progress towards the ultimate research objectives. For example, to study the variability of the threshold in organic transistor arrays, we must have a means to fabricate such devices and to automate the testing and data acquisition in the arrays. The process described in this paper was designed with this future development in mind. Pentacene as the p-type material and F16CuPc as the n-type material were incorporated into the process as described in Section II, in addition to the fluoropolymer Cytop as a gate dielectric, as explained in Section IV-B.

II. ORGANIC SEMICONDUCTOR MOLECULES

Small-molecule organic semiconductors have been reported to have higher mobility values than polymerized organic semiconductors[19]. p-type organic semiconductors such as pentacene (Figure 2) and α-sexithiophene (Figure 1) are well known and have been used successfully for all-p-type (pseudo-nMOS style) transistor circuits[12][13][14]. Newer n-type organic semiconductors such as hexadecafluorocopper phthalocyanine (hereafter denoted F16CuPc, as per Figure 4) and a 15-fluorine variant of the naphthalenetetracarboxylic diimide (NTCDI) class of organic semiconductor compounds (bis(pentadecafluorooctyl) NTCDI, hereafter denoted F15-NTCDI, as per Figure 3) have been synthesized and show promise for integration into complementary organic semiconductor circuits[15][16][17]. These four compounds, two n-type and two p-type, were deemed to be the most stable and simple to handle of the many known organic semiconductors available at the time, and thus were used in this study to prototype the processing, although any organic semiconductor that can be deposited by thermal evaporation and will adhere to gold and Cytop could be used.

Fig. 4.

III. TOP CONTACT VS. BOTTOM CONTACT DEVICES

Most of the work done to characterize organic semiconductor materials uses top-contact devices, as per Figure 5. These have a fairly simple fabrication process, whereby an entire oxidized silicon substrate is coated with a single organic semiconductor material (typically by thermal evaporation), followed by shadow-mask evaporation of the gold source and drain electrodes through a machined array of lines (similar to a diffraction grating). The underlying silicon substrate serves as a common gate for all of the devices, and is accessed by scratching through the oxide on a corner of the substrate. Each device is tested by probing adjacent lines of source and drain; leakage to other devices (which share the same contiguous organic film) would be significant, except that typically only one pair of electrodes is accessed at one time. Devices fabricated through this process allow for the characterization of the electrical properties of the material, and its integration as part of a device structure. For results of testing the organic materials using these structures, please see Section VI.

However, the common gate of top-contact devices makes it difficult to build circuits involving multiple transistors. Ideally, a process would allow separate gates connected to separate source and drain nodes, to allow maximum flexibility in circuits designed with these materials. To accomplish this, a bottom-contact (gate) fabrication process is necessary.

Fig. 5.

IV. BOTTOM GATE CONTACT PROCESS FLOW

A process flow that allows for the fabrication of organic devices and circuits is described here. A diagram of a bottom-contact device is shown in Figure 6.

A. Oxide dielectric

The initial process flow, using sputtered oxide as the gate dielectric, is as follows:
1) Silicon wafer
2) Wet oxidation for passivation
3) Chrome e-beam evaporation for gate metal, 1000 Å
4) Lithographic patterning of gate electrodes
5) E-beam evaporation (sputtering) of silicon oxide, 5000 Å
6) Lithography to form openings for interconnects
7) E-beam evaporation of chrome-gold for source-drain metal, 1000+500 Å
8) Lithography for source-drain contacts
9) Oxygen plasma cleaning to remove surface contaminants
10) Shadow-mask evaporation of n-type organic semiconductor

11) Shadow-mask evaporation of p-type organic semiconductor

F15-NTCDI[20] or F16CuPc[16] was used as the n-type semiconductor and pentacene or α-sexithiophene[16] as the p-type semiconductor, for a total of four combinations of organic materials. The shadow mask is a KOH-etched silicon wafer with openings for both the n and p areas. Alternatively, a laser-machined ceramic or metal mask can be used. After alignment of the shadow mask, the openings for PMOS devices are covered with tape during the n-type organic semiconductor evaporation, and likewise the openings for NMOS devices during the p-type organic semiconductor evaporation.

As anticipated, the sputtered silicon oxide does not behave like thermal oxide. Thus 500 nm of oxide was deposited, to ensure no holes in the dielectric due to non-uniformity or porosity. An additional problem resulted from the inability to control the etch rate accurately. The sputtered oxide etches particularly fast (the etch completes in less than 30 seconds in 6:1 Buffered Oxide Etch). This results in significant undercutting of the oxide during the oxide etch step, and consequently the openings for interconnects are much larger than intended (Figure 7). As a result, some of the gate metal lines underneath get etched during the source-drain patterning, and some of the source-drain metal above is washed away when the supporting oxide underneath is etched away.

Despite these problems, a few structures appeared to be sufficiently intact for testing. Structures designed for resistivity measurements (Figure 8) indicated a sheet resistance of about 0.5 Ω/□ for the source-drain metal and about 12 Ω/□ for the gate metal. None of the structures designed for measuring the interconnect resistance survived, due to undercutting of the oxide as explained above. Transistor measurements did not yield any results above the noise floor. This may be due in part to surface treatments, or lack thereof, at the interfaces between the organic semiconductors and the source-drain metal and gate oxide.

Fig. 7.

Fig. 8.

There have been reports in the literature of success using amorphous fluoropolymers as gate dielectrics[21]. In particular, a material with the trade name Cytop, manufactured by Asahi Glass Co. (Japan), has garnered significant interest. The process was thus modified to incorporate this material as the gate dielectric instead of the sputtered oxide, to eliminate the many problems resulting from the sputtered oxide as mentioned above.

B. Cytop processing

Cytop is spinnable and curable as per the processing directions on the datasheet[22]. The thickness of Cytop after spinning and curing on a glass or silicon dioxide surface matches the specifications remarkably well. With a dielectric constant ε = 2.1 ε0 and a resistivity on the order of 10^17 Ω·cm, Cytop works well as a dielectric for the organic transistors. It has the further advantage of being transparent, and hence the film thickness is easily measurable using non-destructive optical methods. The Filmetrics F20NIR thin-film analyzer works particularly well. Using an index of refraction of n = 1.32 yielded measurements that matched up nicely with step measurements taken with a surface profilometer (Dektak IIA).

As the compound is a close derivative of Teflon, once cured it can be expected to be impervious to all manner of chemicals, which was confirmed in several papers[23][24]. However, these sources also indicate that Cytop is easily etched using RIE, specifically an oxygen plasma treatment of the kind used for removing organic compounds. Tests have confirmed that a tabletop ICP RIE will etch this film reliably. The protocol follows the directions suggested in the laboratory operating procedure, with the oxygen pressure set to 0.4 Torr and the power varied according to the desired rate. At 50 and 100 Watts, the etch rate of Cytop is approximately 30 to 40 Å/s; at 200 Watts, the rate is approximately 200 Å/s; and at 400 Watts, the rate exceeds 100 Å/s. The 400 Watt setting is only used to remove all of the Cytop in the exposed areas, for the via etch step (see Section IV-C). The oxygen plasma RIE also etches the masking photoresist, so it is necessary to ensure that there is sufficient photoresist for the longer, higher-power etches; however, because the Cytop is typically an order of magnitude thinner than the masking photoresist (S1813 spun at 1500 rpm yields on the order of 2 µm of photoresist thickness), this is not usually an issue. Certainly at the highest power setting the photoresist can easily mask a 5 minute etch.
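The numbers quoted in Section IV lend themselves to quick back-of-envelope checks. The sketch below is illustrative only: the function names are hypothetical, the 200 nm (2000 Å) Cytop thickness is an assumption based on the statement that the film is roughly an order of magnitude thinner than the ~2 µm photoresist, and the line dimensions are placeholders rather than measured layout values.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity in F/m


def line_resistance(sheet_res_ohm_sq: float, length_um: float, width_um: float) -> float:
    """Resistance of a metal line: sheet resistance times the number of squares (L/W)."""
    return sheet_res_ohm_sq * length_um / width_um


def capacitance_per_area(thickness_nm: float, rel_permittivity: float = 2.1) -> float:
    """Parallel-plate gate capacitance per unit area in F/m^2.
    The default relative permittivity is the Cytop value quoted in the text."""
    return EPSILON_0 * rel_permittivity / (thickness_nm * 1e-9)


def etch_time_s(thickness_angstrom: float, rate_angstrom_per_s: float) -> float:
    """Time to etch through a film at a constant etch rate."""
    return thickness_angstrom / rate_angstrom_per_s


if __name__ == "__main__":
    # Hypothetical 1000 um x 10 um line (100 squares) in each metal layer.
    print("source-drain line:", line_resistance(0.5, 1000, 10), "ohm")
    print("gate line:", line_resistance(12, 1000, 10), "ohm")

    # Assumed 200 nm Cytop film; 1 F/m^2 equals 1e5 nF/cm^2.
    c = capacitance_per_area(200)
    print("Cytop gate capacitance: %.2f nF/cm^2" % (c * 1e5))

    # Low-power etch (30 to 40 angstrom/s) through the assumed 2000 angstrom film.
    print("etch time: %.0f to %.0f s" % (etch_time_s(2000, 40), etch_time_s(2000, 30)))
```

Under these assumptions the low-power etch takes on the order of a minute, which is consistent with the remark that the photoresist mask is not usually a limiting factor.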


Fig. 9. Structure resulting from Cytop-based processing

C. Cytop-based process

The revised process is as follows:
1) Silicon wafer
2) Wet oxidation for passivation
3) E-beam evaporation of chrome-gold for gate metal
4) Lithographic patterning of gate electrodes
5) Spin-coat of Cytop as per manufacturer instructions
6) Cure Cytop as per manufacturer instructions
7) Lithography to form openings for interconnects
8) RIE to form interconnects
9) E-beam evaporation of gold into interconnects
10) Lift-off to remove excess gold
11) Lithography to define source-drain regions
12) RIE to form trenches for source-drain metal
13) E-beam evaporation of chrome-gold for source-drain metal
14) Lift-off to remove excess chrome-gold
15) Oxygen plasma cleaning to remove surface contaminants
16) Shadow-mask evaporation of n-type organic semiconductor
17) Shadow-mask evaporation of p-type organic semiconductor

Note that the source-drain metal is now recessed into the dielectric to provide a smaller step at the interface between the dielectric and the source-drain contact. Large steps at this interface have been correlated with small grain sizes of the evaporated organic semiconductor and result in poor performance, so minimizing this step alleviates this concern. In addition, the process now has explicit via plugs connecting the gate metal layer to the source-drain metal lines, which improves yield significantly. The deposition of the gold via plugs is self-aligned without any additional lithography: the same mask used for the via etch may be used as a lift-off mask for the via metal, and similarly the mask for the source-drain groove etch is used as a lift-off mask for the source-drain metal. The extensive use of gold is not cost-effective, and efforts are underway to investigate electroless nickel plating as an alternative.

D. Alternative Process Flows

In this section, we discuss an alternative process flow that allows the fabrication of isolated devices with source, drain and gate contacts available for circuit design. This second approach employs silicon-on-insulator wafers. The key idea here is that

Fig. 10. Organic semiconductor devices fabricated on silicon-on-insulator substrates; top contacts (top), bottom contacts (bottom)

organic semiconductor devices can be fabricated with isolated gate contacts by using highly doped silicon on insulator as the gate material. The advantage of this approach is that it enables the design of the conductive channel and the study of its interfaces without necessarily being concerned with the design of the thin gate dielectric; this can be silicon dioxide, an organic dielectric deposited on the highly doped silicon, or a combination of the two. This approach also allows the design of transistors with extremely thin gate oxide, as thin as a few nm, since oxidation of silicon is a well-understood process.

The fabrication of these devices (Figure 10) begins with highly doped n-type SOI silicon wafers (1 ohm-cm). The gates of the transistors are defined using photolithography, followed by wet or reactive ion etching of the silicon from everywhere except the gate areas. A dry thermal oxidation step follows to create the gate oxide. Lithography is then performed to define the transistor areas and the contacts to the source, drain and silicon gates of the transistors, followed by thermal deposition of the organic semiconductor channel. With two additional lithographic steps we can also design structures that incorporate two layers of interconnects separated from each other, by employing the silicon on the insulating substrate as one layer of interconnects and the source-drain metal as the other. The methods described above should yield excellent-quality devices and interconnects, sufficient to carry on the research in materials and interfaces as well as the design and characterization of small circuits with device sizes of micron and sub-micron dimensions.

V. CONTACTS AND INTERCONNECTS CHARACTERIZATION

The improvements in connectivity are immediately evident from visual inspection of the samples following this new processing. The RIE step produces clean sidewalls with little


Fig. 11. Dogbone structures with vias etched in the Cytop layer. Via pads are 100 µm square

Fig. 12. Dogbone structures used to test via connectivity. Via pads are 100 µm square

Fig. 13. Pentacene transistor, bottom contact, Cytop dielectric. Via pads are 100 µm square

Fig. 14.

evidence of undercutting, as may be expected. Figure 11 shows the clean, square via openings on the dogbone resistance measurement structures. Testing of the vias connecting the two layers using these dogbone structures (Figure 12) showed 2 Ω resistance per via. The connectivity between layers was quite robust, with all 4 of these structures demonstrating connectivity. There was, however, some cracking observed in the top metal layer, which is still under investigation. It is surmised that this may be due to thermal expansion of the underlying Cytop when the source-drain metal layer is evaporated, with differences in thermal expansion between the Cytop and the metal resulting in stress at that interface.

VI. ORGANIC SEMICONDUCTOR DEVICE CHARACTERIZATION

The four combinations of organic semiconductors were shadow-mask evaporated onto the Cytop substrate (see Figure 13), but for reasons currently unknown these transistors performed poorly, if at all. Given that the resistance and connectivity tests proved positive in most cases, another possible source of failure would be contamination in the chamber when depositing the organic semiconductors. However, test samples deposited at the same time in the same chamber, but using top-contact devices, appeared to operate well. For example, the transistor curves from F16CuPc are shown in Figure 14 and the transistor curves from α-sexithiophene are shown in Figure 15.

VII. DISCUSSION

Much progress has been made in developing a bottom-contact process for integrating both n- and p-type organic transistors onto a single substrate. In this process, the source-drain metal exhibits some stress-related fracture which may affect yield, so this part of the process needs to be investigated


further. Furthermore, the poor performance of the organic transistors in the bottom-contact configuration is particularly worrying given the excellent performance of the same compounds when the top-contact samples are tested. Some surface treatments have been suggested to improve the interface between the source-drain contacts and the organic semiconductor, and these modifications will be included in future experiments.

The current masks are designed to connect the transistors in inverter-based circuits; however, these are only preliminary tests and there is no restriction on how the transistors are connected. Other circuit configurations are certainly possible by simply changing the masks. The process is also designed to allow connectivity to the substrate below; there is no fundamental reason why the underlying silicon substrate cannot be a silicon chip with embedded electronics. All processing steps are conducted at low temperatures (below 200 °C) and as such are fully compatible with using a foundry chip as the substrate below.

Fig. 15.

VIII. ACKNOWLEDGEMENTS

This work was supported by DOE grant DE-FG02-07ER46465, Johns Hopkins University. Lithography and fabrication were done at the Whitaker Institute Fabrication and Lithography Facility at Johns Hopkins University. The authors would like to thank Dr. Jia Sun for his helpful comments and AFM images concerning the organic semiconductor grain size at interfaces.

REFERENCES

[1] M. Pope and C. Swenberg, Electronic Processes in Organic Crystals and Polymers, 2nd ed. Oxford, Great Britain: Oxford University Press, 1999.
[2] C. Dimitrakopoulos and P. Malenfant, "Organic thin film transistors for large area electronics," Advanced Materials, vol. 14, pp. 99–117, 2002.
[3] S. Shaheen, D. Ginley, and G. Jabbour, "Organic-based photovoltaics: Toward low-cost power generation," Materials Research Society Bulletin, vol. 30, pp. 10–19, 2005.
[4] M. Kane, J. Campi, M. Hammond, F. Cuomo, B. Greening, C. Sheraw, J. Nichols, D. Gundlach, J. Huang, C. Kuo, L. Jia, H. Klauk, and T. Jackson, "Analog and digital circuits using organic thin-film transistors on polyester substrates," IEEE Electron Device Letters, vol. 21, pp. 534–536, 2000.
[5] S. Forrest, P. Burrows, and M. Thompson, "The dawn of organic electronics," IEEE Spectrum, pp. 29–34, August 2000.
[6] D. Muyres, P. Baude, S. Theiss, M. Haase, T. Kelley, and P. Fleming, "Polymeric aperture masks for high-performance organic integrated circuits," Journal of Vacuum Science and Technology A, vol. 22, pp. 1892–1895, 2004.
[7] S. De Vusser, S. Steudel, K. Myny, J. Genoe, and P. Heremans, "Integrated shadow mask method for patterning small molecule organic semiconductors," Applied Physics Letters, vol. 88, p. 103501, 2006.
[8] H. Katz, "Recent advances in semiconductor performance and printing processes for organic transistor-based electronics," Chemistry of Materials, vol. 16, pp. 4748–4756, 2004.
[9] Y. Liu and T. Cui, "Polymer-based rectifying diodes on a glass substrate fabricated by ink-jet printing," Macromolecular Rapid Communications, vol. 26, pp. 289–292, 2005.
[10] J. Ahn, H. Kim, K. Lee, S. Jeon, S. Kang, Y. Sun, R. Nuzzo, and J. Rogers, "Heterogeneous three-dimensional electronics by use of printed semiconductor nanomaterials," Science, vol. 314, pp. 1754–1757, 2006.
[11] M. Fadlallah, W. Benzarti, G. Billiot, W. Eccleston, and D. Barclay, "Modeling and characterization of organic thin film transistors for circuit design," Journal of Applied Physics, vol. 99, p. 104504, 2006.
[12] S. Steudel, S. De Vusser, K. Myny, M. Lenes, J. Genoe, and P. Heremans, "Comparison of organic diode structures regarding high-frequency rectification behaviour in radio-frequency identification tags," Journal of Applied Physics, vol. 99, p. 114519, 2006.
[13] P. Baude, D. Ender, M. Haase, T. Kelley, D. Muyres, and S. Theiss, "Pentacene-based radio-frequency identification circuitry," Applied Physics Letters, vol. 82, pp. 3964–3966, 2003.
[14] S. Han, S. Cho, J. Kim, J. Choi, J. Jang, and M. Oh, "Ring oscillator made of organic thin-film transistors produced by self-organized process on plastic substrates," Applied Physics Letters, vol. 89, p. 093504, 2006.
[15] J. P. H. Klauk, U. Zschieschang and M. Halik, "Ultralow-power organic complementary circuits," Nature, vol. 445, pp. 745–748, February 2007.
[16] Y. Y. L. R. W. F. Z. B. A. L. R. S. H. E. K. B. Crone, A. Dodabalapur and W. Li, "Large-scale complementary integrated circuits based on organic transistors," Nature, vol. 403, pp. 521–523, February 2000.
[17] B. Crone, A. Dodabalapur, R. Sarpeshkar, R. Filas, Y. Lin, Z. Bao, J. O'Neill, W. Li, and H. Katz, "Design and fabrication of organic complementary circuits," Journal of Applied Physics, vol. 89, pp. 5125–5132, 2001.
[18] J. J. C. K. T. S. W. L. Y. Y. L. H. E. Katz, A. J. Lovinger and A. Dodabalapur, "A soluble and air-stable organic semiconductor with high electron mobility," Nature, vol. 404, pp. 478–481, March 2000.
[19] T. T. S. M. T. S. T. M. H. S. N. Y. S. O. S. Kobayashi, T. Nishikawa and Y. Iwasa, "Control of carrier density by self-assembled monolayers in organic field-effect transistors," Nature Materials, vol. 3, pp. 317–322, May 2004.
[20] A. J. L. H. E. Katz, J. Johnson and W. Li, "Naphthalenetetracarboxylic diimide-based n-channel transistor semiconductors: Structural variation and thiol-enhanced gold contacts," Journal of the American Chemical Society, vol. 122, pp. 7787–7792, 2000.
[21] S. H. A. F. S. W. L. Kalb, T. Mathis and B. Batlogg, "Organic small molecule field-effect transistors with Cytop gate dielectric: Eliminating gate bias stress effect," Applied Physics Letters, vol. 90, p. 092104, 2007.
[22] Asahi Glass Co., "Cytop data sheet."
[23] A. M. D. K. H. R. K. P. J. B. P. E. J. Melin, K. Hedsten and F. Nikolajeff, "Microreplication in a silicon processing compatible material," Journal of Micromechanics and Microengineering, vol. 15, pp. S116–S121, 2005.
[24] S. B. K. W. Oh, A. Han and C. H. Ahn, "A low-temperature bonding technique using spin-on fluorocarbon polymers to assemble microsystems," Journal of Micromechanics and Microengineering, vol. 12, pp. 187–191, 2002.



3 3 "2

8

5 4

64 2 4 1264 3 53 685

8

53

2 72 3 5 2 5 8 5 81 5 3

!
3 2 48 3 8 4

8

9 5
6893 487 5 3 8

65
5 7
3 53 7 8

5 2 753 2 4 88 9 753
5 5 2 81 558972
38

47 42 3 5

893 72 5
665
4 3

63
825
535

7
638
8 2 2

9 9 2 " 3 83 43 5
8281

2 4
55
5
3 2 8
82
3 2 4
53

2# 538 $%&'58
83 4

8

9 5
6 5 7

4 56
5

3 5
5 3 5 5 5 2 2 48

( 4 81 2 3 87 8

5
89 3 5 72
3 62672" 53 4 5 53 5

3 481
2 2 * 4
5-.

3 75
3(538
) +, 2

3 4 2 2

8

5 72 3 3 2

"
453 4
73 532 5/ 3
5 3 236 42 2 455142
82 7
68

124 3 2 3

53 3 4
73
5 2

8893 5 3 487 8

2 53 8 2 8

8

9 3 5 738 5 5

8

3 2 4 2 753 9 5 6 7

29 33 2 9

27538 qr 4 2 73 52 5 5 72 3>p 2

56 5 4 65

2

97 6 2 33 s "8 2 8 t 5 /38

B) 657 3 2 97 6 2 33 753 5 4 5

8 8 3 465

2

97 64 5 3 423u D 98 3 8 9

465

2

55

3 873 637 72 B) 2 4 73

3 4 5 5

7 2 3 7 2

2 3 3 4 2 538 653 4

8

9 5 6 893 68 3 787ov

452 3 9 4 2 3 4 8812 35

9 2

9 7 638 8 ;w 6x T # }' ;w E=y =z{K5 x V < yT =6x T |7=6 124 y : ~ 5 |: NQ 3 6 3597 6 2 3 4

NQ 5 3 753 9 8

2 5 2 4

3 4 2 563 342 45 4 2

8

2 3 53 2 3 55 56677 8

53 4 873 5 2 2

53 2 8

3

3 3 73 2 3 8

8 8

4 8

2

8

564 873 "B) 2 2 2 5 5 53 73 5 365 873 5 9 6 2 8 3 873 42 72 3 73 5

8

3 4 73 2 5 3 2

5

124 5 1 9 45328 8 38762 3 3 8

45 67: 4 H 5 7:

EKL =7 M5 # ' 42

53 2 7
3 8
52 68

5
8952 3 3 25 42 48

9 3 4533 35

9 2 895 3 3 8

3 4

9 7
638
10 9 2 7
638
47 68

52 73563

3 3 2

98 4 5 481
2 2 124 5
5

35

35

9 2
) 3 2

9 7
638
"4 5 " 67 2
73
56 2 3 9 2 8
4 56/ 88 28 $ 7 2 45
68
763 124 3 52 89 3 2 538
3 4 ; 45 67 95 67: =-5 67< =>5 67 # 3 3 ?' 426

8 4
7

89 1

33 EV 3 5 8 8 ;< 4 5 67 ;< 4 5 67 5642

8
5 4

8

5 883 9 5
6 83 7
3 "124 @# B2( "3
82

2 * 8 2 42 3 ('A # B' 4 637 +, l 2 3 873
h 4 73893

2 4 62 481
9 87 8

5 I 2 2 53
7397
6 89~

~~~;N4873 7368

4 3 END =J C5 D7: E=-FG H = EJ =KL M5 7 # 5668

' 5
6 124 3
82

63 2 #
3 3 4 637 2
'2 2 E 893 5
5 3 4

5
3 97
6 35
538
4

2 124 3 O # )' -FG H : I j v i % vo! $li % ;E 75

14

2 4893 O 3 3 4685

8

FG H 2 45
53 "- 3 2633 2 3 62

8 25

5

3 7

3 64263 4537

97
6 75
3(538

8

14
3
733 3 2 2

42 8 4685

8

4575 68

2 2 2 5 5 3 45

53 8
2 5 88 88

3 89
5
14

2 3

8522 2
4 5
PO QERS : END" 5 72
3845
2 81 5 6

"5812 9
8

5
J 2 4

8 3 5 2 5

638
89 3

8

2 2 4 9 5
6 89 3 87 8

2 4 53
45
893 2 4
5 3 5
892

3 7 98 5 26

3 7 2 4
3 & 3

3 2 2 538
@
62 5 T 2
3 53 5 9C D7 2

8

3

48188 5

3 3 5 124 7

63 2 5
3

7 52 2 3 3 5
82 81

28
3 4 32
75

32 4 83 3 72
3 5 4 5 81 8 2 3 5
*
42 +,

ISBN 978-987-655-003-1 EAMTA 2008

3895

38

2

5568

3 3 4 2

8

2 2 3 5 5 56677 8

5 3 5 501 3 53 53 4 2

3 75 3(538

8

2

3 4 2 2

3876 3 2 68

4 3 3

101

0

9 8 4 978

2

987 2

1

!" # 8 4$%&'(%) 2 8

38
2 2 *2%

8

+&,

2

--./ 34 5 7809: .;:7/ .:6;3 012 6/ < &%=&4!&99 >=?) >!&@ ?==?4 ?" @ 4 & ' 7

7% 2 3
2 2

3) 8 7% 4 A 8 8 2 % ) 8 +&--- ./ 3B1 C7DDE15 5 &%
, 012 60F712 4>>&4>& 994?0G) ?H& I
!GH4 >" 4
% &'J9

2 %7% 2 2 &I 3

2 % % 8 2 &I 2 8 32% 2

+&, ./ 3B1 C56E5201O P<2:D2 4>H&4 --- 012 / F F &% !!&994!0=M) !0=G&4!GG!4 " A

A4A2 & ) 2'(28 2 8

2 8

2 2

32%7) 2 2 +&--- Q3P7KOR 0F C56E52 8

, 5 PF : / F&%>=&4!?&99 !>GG) !0!=& ( !GG4

&@ 7 &4 8 A4X 2 Y&Z4[2 3&4U7&'1 \7)98\72 8272 2 % ) 8 % ) 9

78

, Q3P7KOR 0F / F&% 2 92%2 +&--5 PF :C56E52 4>H& 4>&9940!G) 0?M&I ?==>4 5" A4 &J4U4]73&' 0==) N !? 2 X # 2 8N *2 A I 8!H) 1 322 8 2

3) 8 ^ 2 2 2 %7% % 8 9+&, Q7E/ --10K7_2 5 20F 7KOR F : C56E52 / F&%>0&4!?&994!5M) !55M&( !GGG4 H" U4Z&@ 4[7 A 0==)

\7 8

83 9

4I &' I `
%2

3) 8 7%
--- Q3P7KOR 0F C56E52 2 % 8 +&, 5 PF : / F&%>0&4!?& 99 !50!) !5?&( !GGG4 G" 4
Y
X 4 J% @4 & '7)

& 4 Y
9%3 7%
2 8

9

3+&CE27D , :a/ :O C56E52 C71_ :16:bcddc3

2 F 1F 0F / F :/ e/ 76::O5 1a27_F , cddc 4 0< fcR ;: --fg&994!G) !GH4 !=" A4J

7 %&72 2

4 $
U2% 8

8 &>
&I * %& !GHM4

102

Luisa García, Alejandra González, Henry Moreno, Guillermo Jaquenod

Departamento de Electrónica, Pontificia Universidad Javeriana, Bogotá, Colombia
Email: {luisa.garcia,agonzalez,henry.moreno}@javeriana.edu.co, gjaquenod@ciudad.com.ar

Abstract: This paper describes the development of a 32-channel, 150 MHz bandwidth logic analyzer implemented on an Altera Stratix II FPGA. The analyzer is remotely controlled through TCP/IP, which allows acquisition configuration, acquisition control and data visualization over the Internet by means of a dedicated user interface.

I. INTRODUCTION

Logic analyzers are digital data acquisition tools characterized by their ability to detect complex triggering patterns, by their bandwidth and data width, and by their capability to organize the captured information in a meaningful way, as signals or buses, in different formats or radices. Such an instrument is a digital system that verifies other digital systems [1]; this makes its design a challenge with stringent performance requirements, such as the size of the acquisition memory, the real-time alternatives for sample capture, and remote transmission capabilities.

The present work implements a logic analyzer able to evaluate digital devices at frequencies up to 30 MHz, with 32 data channels and two dedicated clock channels. The analyzer has a user interface designed to configure the acquisition parameters according to the nature of the SUT (System Under Test) and to visualize the data. It uses Internet communications to receive configuration data and acquisition parameters and to transmit the captured data, so that the SUT can be evaluated remotely.

This paper describes the steps followed during the design and development of the logic analyzer, with remarks on the engineering decisions taken to reach an adequate balance among all the instrument requirements, and with emphasis on the design methodology adopted for hardware and software cooperation. Fig. 1 shows the FPGA development kit used for the analyzer implementation, with the Internet connection and the acquisition system.

II. DESCRIPTION

A. Metrics of design

The viability evaluation, the planning of the objectives and the delimitation of the project were based on a set of functionality metrics: real-time acquisition, remote control, the possibility of expansion, standardization of the measurements, and the user interface.

The real-time acquisition requirement meant that the system had to be developed on custom hardware. This hardware was designed for high-frequency performance, for use in the diagnosis of systems and processes with frequencies up to 30 MHz. In addition, the real-time condition imposed synchronous handling of the storage in the acquisition memories. These characteristics determined the selection of a Stratix II device, a technology that offers high-frequency circuits, embedded memories and a high density of logic resources.

Additionally, the selection of the Stratix II device allowed the standardization of measurements. The logic analyzer works with voltage levels compatible with the

current standards LVTTL and LVCMOS, and it can also be configured to work with 2.5 V, 1.8 V and 1.5 V LVCMOS.

The remote control requirement suggested the Internet as the most convenient communication interface. This choice was based on the search for a system with a fast transmission rate and widespread use [2]. It allows remote access to data in real time, without requiring an expert to be at the location of the system under analysis.

The Internet interface implied the need for a processor-based system to manage communications between the acquisition hardware and the remote user application. Taking into account the capacity and logic resources of Stratix II devices, the Altera Nios II soft-core embedded processor was chosen. The Nios II interacts with the memory and the peripherals through a configurable structure called the Avalon fabric. TCP/IP was chosen as the communication protocol, using the NicheStack TCP/IP implementation of the stack; this tool allowed the easy development of a socket server. In this application, the Nios II is in charge of receiving the configuration data, executing the orders from the user interface, and controlling the acquisition of samples. Finally, the samples are transmitted by the acquisition hardware to the user interface, where they are shown according to the initial configuration. The concurrency of acquisition events (that is, the analyzer instructions to acquire, to stop, etc.) and the handling of the transmission and reception of information are managed by the MicroC/OS-II RTOS (real-time operating system) [3].

The analyzer acquisition port has 32 data channels and two additional clock channels; however, the expansion requirement motivated the development of a scalable system architecture with configurable blocks, so that future hardware improvements are easy to handle. The custom system is composed of blocks and interfaces.

The hardware design methodology used hierarchical structural description in VHDL (VHSIC hardware description language, where VHSIC stands for very high speed integrated circuit). The analyzer was implemented using the Quartus II synthesis and simulation tool. This tool, together with the SOPC Builder application, allowed an effective and complete integration of the Nios II processor, the memories and the custom block. The Nios II IDE tool was also used for the development and debugging of the complete system.

From the user's point of view, the user interface design was based on National Instruments Application Note 175, "Document/View Architecture in Visual C++ Test and Measurement Applications" [4]. This document is intended for instrumentation applications and helped keep the acquisition and configuration windows coherent.

B. Custom components: the acquisition hardware

The custom system is composed of a few blocks: the state analyzer, the time analyzer, the simultaneous acquisition module and the sample storage memories.

1) State analyzer

The state analyzer samples the SUT using a combination of two signals called state clocks. The clocks can be configured to trigger data acquisition on their rising edge, falling edge or both edges. The state analyzer has several triggers, activated by level comparison or state words and by detection of sequences or patterns. It has a storage memory of 12288 samples of 32 bits, which corresponds to 614 µs.

2) Time analyzer

The time analyzer acquires samples in time and analyzes the behavior of the signals. The acquisition can be periodic, with a period of 14 ns (75 MHz), or transitional, a method that detects changes of the signals in the channels and counts their duration. The time analyzer has a memory of 8192 samples of 64 bits that store the information of the channels and the clocks of the SUT. Trigger conditions can be selected among comparison of time words, detection of edges, pulses of defined duration, or glitches. The periodic acquisition can last up to 114.688 µs; the transitional acquisition, 109.22 µs.

3) Simultaneous acquisition module

The simultaneous acquisition module allows the user to acquire state and time samples in the same process, using the triggers mentioned above.

4) Samples storage memories

The internal memory blocks store the samples captured according to the trigger information and the pretrigger and posttrigger configuration.

C. The embedded microprocessor system

The system is composed of the Nios II processor core, the data and instruction memories and the Ethernet module. Additionally, the custom-designed logic analyzer peripheral is connected to the processor; this peripheral is in charge of the data acquisition, the trigger logic, and the writing to the acquisition memories.
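As a rough illustration of the two acquisition modes of the time analyzer described above, the following Python sketch compares periodic storage with transitional storage and reproduces the periodic capture window from the memory depth and sampling period quoted in the text (8192 samples, 14 ns), assuming the quoted durations are in microseconds. The function names and the (value, duration) record layout are assumptions made for illustration; the paper does not specify how the hardware encodes transitional samples.

```python
SAMPLE_PERIOD_NS = 14      # periodic sampling period quoted in the text
TIME_MEMORY_DEPTH = 8192   # time analyzer memory depth quoted in the text


def periodic_window_us(depth=TIME_MEMORY_DEPTH, period_ns=SAMPLE_PERIOD_NS):
    """Longest capture, in microseconds, when every sample is stored."""
    return depth * period_ns / 1000.0


def transitional_encode(samples):
    """Store one (value, duration) record per run of identical samples.

    Hypothetical record layout: duration counts sampling periods.
    """
    records = []
    for value in samples:
        if records and records[-1][0] == value:
            # Same bus value as the previous sample: extend the current run.
            records[-1] = (value, records[-1][1] + 1)
        else:
            # The bus changed: open a new record.
            records.append((value, 1))
    return records


def transitional_decode(records):
    """Rebuild the periodic sample stream from (value, duration) records."""
    samples = []
    for value, duration in records:
        samples.extend([value] * duration)
    return samples


if __name__ == "__main__":
    print(periodic_window_us())          # 114.688
    bus = [0x0] * 5 + [0xA] * 3 + [0xF] * 4
    print(transitional_encode(bus))      # [(0, 5), (10, 3), (15, 4)]
```

In this sketch the number of records stored in transitional mode depends on signal activity rather than on elapsed time, whereas periodic mode always consumes one memory word per sampling period.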
The main elements of the system are described next.

1) Nios II

With a performance of up to 64 MIPS (million instructions per second), the Nios II is a 32-bit RISC (reduced instruction set computer) soft-core processor with a 4 Kbyte instruction cache, branch prediction, and hardware implementation of multiplication and division. The Nios II uses an instruction master bus, a data master bus and a JTAG (Joint Test Action Group) level 1 debug slave port for programming and debugging the software on the development kit.

2) Avalon reconfigurable fabric

The Avalon reconfigurable fabric is composed of Avalon ports. A port is a group of signals used as a simple interface. Ports can be slaves, which respond to transfers, or masters, which initiate them. Master and slave ports cannot be connected directly to each other; instead, the Avalon fabric translates the signals between the ports.

3) Clock generation

Clock handling is based on the Altera predefined function ALTPLL, which configures the PLL (phase-locked loop) of

the development kit. The oscillator frequency is 50 MHz, and it is the main input to the PLL. The PLL generates the other clocks used in the system.

4) Ethernet

The Internet connection uses the LAN91C111 peripheral device, a 10/100 Ethernet MAC (media access controller) with a physical interface (PHY), connected on the kit to a standard RJ-45 Ethernet connector. Communication with the Nios II processor core is done through an Avalon slave port.

5) MicroC/OS-II

The main objective of the software developed on this operating system is to manage the interchange of information between the hardware and the graphic interface. Its functions are the configuration and control of the analyzer hardware, data reception, and transmission over the Internet. The software was developed as a socket server application that involves 5 concurrent tasks: stack initialization; server setup; control of the acquisition, trigger and storage blocks (custom peripheral); detection of the acquisition process states; and data transmission to the graphic interface.

D. User interface

The graphic interface for remote operation was written in Visual C++ 2005 and developed using MFC (Microsoft Foundation Classes). Its purpose is to allow the analyzer's user to configure the acquisition process and to visualize the data. Given the complexity of this task, the document/view methodology was used for window handling instead of a dialog-based application architecture. This eases the graphical visualization of the captured data as timing diagrams and state lists. The interface is composed of three modules: the first is in charge of communication with the hardware, the second is the configuration module, and the third is the visualization module. The visualization module is composed of the configuration display, the timing diagram, the state list, the simultaneous acquisition display, the menu and the toolbar. Fig. 2 shows the visualization window for simultaneous acquisition.
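The paper states that captured samples travel from the socket server to the graphic interface, where they are shown as timing diagrams and state lists, but it does not describe the wire format. Purely as an illustration of how such a transfer could be framed and unpacked on the client side, the Python sketch below assumes a hypothetical length-prefixed frame of 32-bit words; the frame layout and all function names are assumptions, not the authors' protocol.

```python
import struct

# Hypothetical frame layout (not taken from the paper):
#   4-byte big-endian sample count, then one 32-bit word per sample.
HEADER = struct.Struct(">I")


def pack_samples(samples):
    """Serialize 32-bit channel words into one length-prefixed frame."""
    return HEADER.pack(len(samples)) + struct.pack(f">{len(samples)}I", *samples)


def unpack_samples(frame):
    """Recover the list of 32-bit channel words from a frame."""
    (count,) = HEADER.unpack_from(frame, 0)
    return list(struct.unpack_from(f">{count}I", frame, HEADER.size))


def channel_bits(sample, width=32):
    """Split one captured word into per-channel bits, channel 0 first."""
    return [(sample >> ch) & 1 for ch in range(width)]


if __name__ == "__main__":
    frame = pack_samples([0x00000001, 0xDEADBEEF, 0x00000005])
    for word in unpack_samples(frame):
        # One row of a state list: the word in hexadecimal and its low 4 channels.
        print(f"{word:08X}", channel_bits(word, 4))
```

A state list corresponds to printing one row per word, while a timing diagram corresponds to plotting each column of `channel_bits` against the sample index.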

III. METHODOLOGY

The project comprises hardware and software, so a hardware/software co-design methodology was used for the hardware design, the software design, and the validation of the integrated system.

A. Hardware design methodology

The implementation followed the hierarchical design standard [5], which is used in the development of highly complex systems. To this end, the design is divided into blocks of lower complexity. The hierarchical design was analyzed from two perspectives: top-down and bottom-up [6].

B. Design methodology for the graphic interface

The development process uses the pure waterfall methodology [7]. The method is based on a sequential linear model, in which the requirements of the system elements are identified and then assigned to a software subset, in order to identify the problem and propose a solution. The analyzer is an application conceived for instrumentation, and National Instruments Application Note 175, "Document/View Architecture in Visual C++ Test and Measurement Applications" [4], was used as a recommendation on applying the document/view architecture to test and measurement applications.

C. Test methodologies

Following the hierarchical hardware design, each designer validated, in a stand-alone way, the blocks proposed by another member of the group. These cross tests were developed using testbenches, with ModelSim as the simulation tool. In the interface development, the evaluation tests were aimed at verifying the processes and verifying real execution.

A validation protocol was carried out to check the integrated hardware and the complete system (i.e., hardware plus software). The validation protocol is based on functionality criteria to confirm the initial specifications of the system. The protocol is supported by the system-level verification methodology [8, 9]. Coverage analysis was introduced to restrict the test space to the critical cases, in such a way that the tests were controllable and observable [10].

IV. CONCLUSIONS

The validation of the system specifications was successful in all the test environments: real-time debugging of the hardware with the Nios II IDE tool; transmission and reception over the Internet; coherence and integrity of the configuration registers, control signals and acquired samples; validation of the system while changing the configuration and visualizing the data from the graphic interface; verification of the specifications with external circuits of different logic families; and tests with users.

Figure 2. Visualization for simultaneous acquisition


The logic resources used are less than 33% of the total available, the input/output pins used are below 43%, and 56% of the embedded M4K memories are used, leaving the M512 and M-RAM blocks completely free. The design uses 31% of the device interconnect, and the global lines account for 85% of the fan-out. The estimated power consumption and thermal dissipation is 1397.58 mW for the complete system.

This work includes a methodological proposal for developing a complex system. The proposal addresses the measurement standards of digital systems; collaborative work in a combined hardware and software scheme; the mastering of new technologies in Colombia; a custom device that can be configured remotely and that allows acquired data to be visualized anywhere in the world with an Internet connection; and an appropriate methodology for the design of a graphic interface applied to instrumentation and measurement.

Implementing the digital analyzer on reconfigurable (FPGA) hardware makes upgrade and expansion inexpensive. The graphic interface is complex to design and implement. For that reason, a document/view methodology is proposed for window handling, which facilitated the visualization of the captured data as a timing diagram and/or a state list.

An effective methodological scheme should allow hardware modules and software to be implemented in parallel and integrated later, reducing the development time and optimizing the hardware resources used in the project. With this objective in mind, the specifications, the module tests and the integration tests should be defined in the same stage and should remain unmodified during development unless a change is agreed upon. To take full advantage of the FPGA architecture, the hardware should be designed with methodologies used in complex industrial developments. This constraint makes the design challenge a process based on specifications and tied to the characteristics of the implementation technology.

REFERENCES

[1] F. Clyde, Electronic Instrument Handbook, 2nd ed., McGraw-Hill, 1994.
[2] R. Rivera, R. Hidalgo, J. Fernández, W. Gemin and M. González, "Internet y la instrumentación distribuida en red", Laboratorio de Procesos y Mediciones de Señales, Facultad de Ingeniería, Universidad Nacional de Mar del Plata, Argentina, http://www.mdp.edu.ar/rectorado/secretarias/investigacion/nexos/16/16 internet.htm [online], March 2006.
[3] J. Labrosse, MicroC/OS-II: The Real Time Kernel, 2nd ed., USA, CMP Books, 2002.
[4] National Instruments Corporation, Application Note 175: "Document/View Architecture in Visual C++ Test and Measurement Applications", http://zone.ni.com/devzone/cda/tut/p/id/3305 [online], September 2007.
[5] W. Wolf, FPGA Based System Design, Prentice Hall, 2004.
[6] F. Plavec, "Soft-Core Processor Design", Master of Applied Science thesis, Department of Electrical and Computer Engineering, University of Toronto, Toronto, 2004.
[7] R. Pressman, Ingeniería del software, un enfoque práctico, 6th ed., Spain, McGraw-Hill, 2005.
[8] H. D. Foster, "Integrating Functional Formal Verification Into Your Flow", http://www.mentor.com/techpapers/fulfillment/upload/mentorpaper_29 515.pdf, Mentor Graphics Corporation [online], August 2007.
[9] L. Curtis, "The Mentor Graphics 0-In Formal Verification Technology Backgrounder", http://www.mentor.com/techpapers/fulfillment/upload/mentorpaper_29 577.pdf, Mentor Graphics Corporation [online], August 2007.
[10] B. Bailey, "Verification Strategies - The Right Strategy for You?", http://www.mentor.com/techpapers/fulfillment/upload/mentorpaper_82 27.pdf, Mentor Graphics Corporation [online], August 2007.

Parallel Architecture for Decoding LDPC Codes on High Speed Communication Systems

Damián A. Morero, Graciela Corral-Briones, and Mario R. Hueda

Digital Communications Research Laboratory - National University of Cordoba - CONICET
Av. Vélez Sarsfield 1611 - Córdoba (X5016GCA) - Argentina
Emails: dmorero, gcorral, mhueda@com.uncor.edu

Abstract: This paper presents a novel parallel architecture for decoding LDPC codes. The proposed architecture has low memory and interconnection requirements, which makes it attractive for high-speed applications such as fiber optic communications and high-density magnetic recording. As an example, the implementation on an FPGA of a TPC/SPC code using the proposed architecture is also described.

I. INTRODUCTION

The use of iterative decoders based on Low-Density Parity Check (LDPC) codes has made it possible to reach information rates close to Shannon's channel capacity [1]. SISO (Soft Input/Soft Output) decoding of LDPC codes using the Sum-Product Algorithm (SPA) requires approximately one order of magnitude fewer calculations than equivalent Turbo Codes [2]. Nevertheless, the implementation complexity of SISO LDPC detectors is one of the main obstacles to their viability in commercial integrated circuits: high interconnection complexity and large amounts of memory are required. This issue becomes more important in high-speed applications, where parallel processing schemes are needed.

Different low-complexity architectures for implementing LDPC codes have been proposed. The one described in [3] has the drawback of degrading the decoder performance because it simplifies the SPA. The architectures described in [2] and [4] have high memory and interconnection requirements. The architectures shown in [5] and [6] focus on implementing LDPC codes with particular structural properties.

The present paper introduces a new architecture for implementing a wide family of LDPC codes. It operates in a parallel fashion without requiring approximations of the SPA, and it requires a reduced amount of memory and interconnection complexity. These features make the architecture desirable for implementation on integrated circuits.

The rest of the paper is organized as follows. Section II gives a brief introduction to LDPC codes. Section III explains the decoding algorithm. Section IV describes the proposed architecture, which implements the algorithm explained in Section III. Section V shows the results of implementing on an FPGA the architecture proposed in Section IV. Finally, Section VI presents the conclusions.

The present paper has been supported by SeCyT-UNC and ClariPhy Argentina S.A.

II. LOW-DENSITY PARITY CHECK CODES

An LDPC code is a linear block code defined by a sparse parity check matrix $H$ of dimensions $m \times n$. This matrix defines a code of length $n$ and dimension $k$, where $k = n - \mathrm{rank}(H)$. A parity check matrix $H$ has an associated bipartite graph, called the Tanner graph (TG), which fully characterizes the code. The Tanner graph is composed of two types of nodes: variable nodes $v_i$ and check nodes $c_j$. The nodes $v_i$ and $c_j$ represent the $i$-th coded bit (column of $H$) and the $j$-th check equation (row of $H$), respectively. In a TG, connections exist only between a variable node and a check node: $v_i$ is connected to $c_j$ if and only if $H_{j,i} \neq 0$, where the indices $j$ and $i$ denote the row and column of $H$, respectively. The degree of a TG node is the number of connections the node has; i.e., the degrees of $v_i$ and $c_j$ are the numbers of non-zero elements in the $i$-th column and in the $j$-th row of $H$, respectively. If all $v_i$ nodes have the same degree and all $c_j$ nodes have the same degree, then the code is said to be regular.

Let $V$ be the set of variable nodes and $C$ the set of check nodes. If $v_i \in V$ and $c_j \in C$, the sets of neighbors of $v_i$ and $c_j$ are denoted $g(v_i) \subseteq C$ and $g(c_j) \subseteq V$, respectively; i.e., $g(v_i) = \{c_j \in C : H_{j,i} \neq 0\}$ and $g(c_j) = \{v_i \in V : H_{j,i} \neq 0\}$.

This paper focuses on regular LDPC codes that have the following structural property: there exists a partition $P_V = \{V_q\}$ of $V$ (that is, $V_q \subseteq V$, $V_q \cap V_w = \emptyset$ for $w \neq q$, and $\bigcup_q V_q = V$) with the following properties:

P1: for every $V_q \in P_V$, $g(V_q) = C$.
P2: for every $V_q \in P_V$ and every $v_i, v_j \in V_q$ with $i \neq j$, $g(v_i) \cap g(v_j) = \emptyset$.

A partition $P_V$ fulfilling the properties above is called a Variable-node Partition with Full Check-node Connectivity (VPFCC), and a code for which such a partition can be created is said to be a VPFCC-code. The architecture proposed in this paper to implement

an LDPC decoder is grounded on these specific properties of a partition $P_V$.

Several LDPC codes satisfy the characteristic properties of VPFCC-codes. One such family is the Turbo Product Codes based on Single Parity Check codes (TPC/SPC). References [7], [8] and [9] suggest that these codes are suitable for iterative detection schemes in magnetic recording. The implementation of a TPC/SPC code is described in this paper to illustrate the proposed decoding architecture. The usefulness of the VPFCC property of an LDPC code for the new architecture is analyzed in Section IV. Next, the main characteristics of TPC/SPC codes are briefly described.

A. TPC/SPC Codes

A TPC code is built as a multi-dimensional array of code words derived from one or more previously defined codes. The codes used for building the multi-dimensional array are called base codes. Typical base codes are Hamming, BCH, and single parity check codes. Within TPC codes, those based on two-dimensional arrays (denoted 2D-TPC) are of special interest. A 2D-TPC code consisting of two component codes $C_1$ and $C_2$ with parameters $(n_1, k_1, d_1, G_1)$ and $(n_2, k_2, d_2, G_2)$, respectively, has parameters $(n_1 n_2, k_1 k_2, d_1 d_2, G_1 \otimes G_2)$, where $n$, $k$, $d$ and $G$ are the length, dimension, minimum distance and generator matrix of the code, respectively, and $\otimes$ represents the Kronecker product. Within the 2D-TPC codes, attention is focused on those based on single parity check (SPC) codes of length $N$ and dimension $N-1$, denoted 2D-TPC/SPC$(N,N-1)^2$ [7], [8] and [9]. Fig. 1 shows the TG and its structure for a 2D-TPC/SPC$(4,3)^2$ code.

Figure 1. Tanner graph and structure of the 2D-TPC/SPC$(4,3)^2$ code, with bit nodes B1 to B9, parity nodes P1 to P7 and check nodes C1 to C8.

2D-TPC/SPC codes are all VPFCC-codes. This can be seen through the 2D-TPC/SPC$(4,3)^2$ example code of Fig. 1, from which a partition $P_V$ of the variable nodes can be formed; two of its sets are $\{B_3, P_6, B_7, P_2\}$ and $\{P_5, B_4, B_8, P_3\}$. It can be verified that this partition satisfies properties (P1) and (P2), making 2D-TPC/SPC$(4,3)^2$ a VPFCC-code. This result can be generalized to a 2D-TPC/SPC$(N,N-1)^2$ code by creating a partition $P_V = \{V_i : i = 1, \ldots, N\}$, where $V_i$ is the $i$-th modular diagonal of the square $N \times N$ array of variable nodes. The modular diagonals of an $N \times N$ square array of elements $x_{i,j}$ are defined as the sets

$$D_k = \{x_{i,j} : i = 1, \ldots, N \ \wedge\ j = (i + k - 1) \bmod N\}. \qquad (1)$$

III. THE SUM-PRODUCT ALGORITHM

Let $b_i$ be the $i$-th bit of the code word. The Sum-Product Algorithm (SPA) [10] takes as input the a priori log-likelihood ratio of each bit:

$$\mathrm{AprLLR}_i = \ln \frac{P(b_i = 1)}{P(b_i = 0)}. \qquad (2)$$

It then iterates on the computation of the a posteriori log-likelihood ratio:

$$\mathrm{ApoLLR}_i = \ln \frac{P_C(b_i = 1)}{P_C(b_i = 0)} \qquad (3)$$

where $P_C(b_i)$ is obtained by summing, over all $b_j$ with $j \neq i$, $P(\cdot \mid \cdot)$, the joint probability function of the code word given the a priori information.

One iteration of the SPA consists of two steps. In the first one (A), messages $M(v_i \to c_j)$ are calculated and sent from the variable nodes $v_i$ to the check nodes $c_j$. In the second step (B), messages $M(c_j \to v_i)$ are calculated and sent from the check nodes $c_j$ to the variable nodes $v_i$. Steps (A) and (B) are repeated for as many iterations as required. Finally, in a third step (C), the ApoLLR is computed. In summary, the calculations performed in each step are the following:

A. Messages from variable to check nodes:

$$M(v_i \to c_j) = \mathrm{AprLLR}_i + \sum_{c_k \in g(v_i) \setminus c_j} M(c_k \to v_i) \qquad (4)$$

B. Messages from check to variable nodes:

$$M(c_j \to v_i) = 2 \tanh^{-1}\!\Big( \prod_{v_k \in g(c_j) \setminus v_i} \tanh \frac{M(v_k \to c_j)}{2} \Big) \qquad (5)$$

Messages Related to V 1

Messages Related to V 2

B1 B2 C1 C1

BCV

L U T

B1

C1

C1

FIFO

+ + +

B2 B3 B4 C2 C2

B3 B4

C2

C2

(abs)

(abs)

L U T

B5 B6 B7 B8 B9

C3

C3

B5 B6

C3

C3

C4

C4

B7 B8

C4

C4

C5

C5

(sign)

B9 P1

C5

C5

(sign)

P1 P2 P3 P4 P5 P6 P7 C8 C8 C7 C7 C6 C6

P2 P3 P4 P5 P6 P7

C6

C6

FIFO

C7

C7

C8

C8

Messages Related to V 3

B1 B2 B3 C2 C2 C1 C1 B1 B2 B3 B4 C3 C3 B5 B6 C4 C4 B7 B8 C5 C5 B9 P1 C6 C6 P2 P3 C7 C7 P4 P5 C8 C8 P6 P7

Messages Related to V 4

C1 C1

C2

C2

BVC

B4 B5 B6 B7 B8 B9 P1

C3

C3

C4

C4

C5

C5

BApoLLR

Figure 2. The BCV block computes the messages $M(c_j \to v_i)$ recursively. The BVC block computes the messages $M(v_i \to c_j)$ in parallel. The BApoLLR block computes the a posteriori information $\mathrm{ApoLLR}_i$. All the blocks correspond to one code of the 2D-TPC/SPC family.


Figure 3. Partition of the computation flow of messages in the SPA algorithm using the VPFCC property of the 2D-TPC/SPC$(4,3)^2$ code.

C. A posteriori log-likelihood ratio (ApoLLR) computation:

$\mathrm{ApoLLR}_i = \mathrm{AprLLR}_i + \sum_{c_k \in g(v_i)} M(c_k \to v_i)$   (6)

IV. PROPOSED ARCHITECTURE

The computation in steps (A), (B), and (C) is performed in independent blocks named BVC, BCV, and BApoLLR respectively. Since the BCV block, which computes the messages $M(c_j \to v_i)$, accounts for most of the computational cost of the SPA algorithm, a modified version is usually implemented. This version is based on the following approximation:

$a \boxplus b = 2 \tanh^{-1}\left(\tanh\frac{a}{2}\,\tanh\frac{b}{2}\right) = \mathrm{sign}(a)\,\mathrm{sign}(b)\,\min\{|a|,|b|\} + \log\frac{1+e^{-|a+b|}}{1+e^{-|a-b|}} \approx \mathrm{sign}(a)\,\mathrm{sign}(b)\,\min\{|a|,|b|\}.$

In this way, the computation of the check to variable messages can be rewritten as:

$M(c_j \to v_i) = \min_{v_k \in g(c_j) \setminus v_i} |M(v_k \to c_j)| \cdot \prod_{v_k \in g(c_j) \setminus v_i} \mathrm{sign}(M(v_k \to c_j))$   (7)
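The difference between the exact check node rule and approximation (7) can be illustrated with a small numerical sketch. It is written in Python only for illustration and is not the hardware description used in this work; the function names and the sample message values are assumptions.

```python
import math

def check_to_variable_exact(incoming):
    """Exact SPA check node rule: 2*atanh of the product of tanh(m/2)
    over all incoming messages except the one on the target edge."""
    out = []
    for i in range(len(incoming)):
        prod = 1.0
        for k, m in enumerate(incoming):
            if k != i:
                prod *= math.tanh(m / 2.0)
        out.append(2.0 * math.atanh(prod))
    return out

def check_to_variable_min_sum(incoming):
    """Approximation (7): product of signs times the minimum magnitude,
    again excluding the message on the target edge."""
    out = []
    for i in range(len(incoming)):
        others = [m for k, m in enumerate(incoming) if k != i]
        sign = 1.0
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * min(abs(m) for m in others))
    return out

# Hypothetical messages M(v_k -> c_j) arriving at one check node of degree 4.
msgs = [1.2, -0.4, 2.5, 0.9]
exact = check_to_variable_exact(msgs)
approx = check_to_variable_min_sum(msgs)
```

Under these assumptions both rules agree in sign on every edge, and the approximation never underestimates the magnitude of the exact message, which is the usual behaviour of this simplification.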

The proposed implementation architecture is based on the order in which the messages $M(v_i \to c_j)$ and $M(c_j \to v_i)$ are computed and evaluated. Fig. 2 shows the BVC and BCV block architectures used to compute the messages $M(v_i \to c_j)$ and $M(c_j \to v_i)$, respectively. The BVC block works in parallel, while the BCV blocks work recursively, thus minimizing their complexity. Fig. 4 shows the complete architecture of an SPA decoder with two iterations for a 2D-TPC/SPC$(4,3)^2$ code. The system parallelism is 4, i.e., on each clock cycle 4 values of AprLLR are input and 4 values of ApoLLR are output. The input and output sequence corresponds to the partition elements $P_V = \{V_i : i = 1, \ldots, 4\}$ described in Section II. Fig. 3 shows how the messages of the SPA algorithm are computed when the partition $P_V$ is used. Property (P2) of $P_V$ prevents conflicts between $M(v_i \to c_j)$ messages, i.e., on each clock cycle no two $M(v_i \to c_j)$ messages arrive at the same check node. Property (P1) ensures that, on each clock cycle, all check nodes have input information available. As Fig. 3 shows, only $N = 4$ variable nodes are updated on each clock cycle. Because of this, only 4 BVC blocks need to be implemented instead of the $N^2 = 16$ that a fully parallel architecture would need. The multiplexers shown in Fig. 4 interconnect the 4 BVC blocks with the corresponding BCV blocks of each set $V_i$ of the partition $P_V$. For the 2D-TPC/SPC$(N, N-1)^2$ family of codes, the proposed architecture presents the following characteristics:

- The partition of the computation flow in the SPA algorithm removes the need to store the $M(v_i \to c_j)$ and $M(c_j \to v_i)$ messages, which are computed and used on demand.
- The possibility of using a recursive architecture for the BCV block while keeping the parallel processing of the whole decoder reduces the implementation complexity.
- Only $N$ BVC blocks are implemented, instead of $N^2$.

The architecture of Fig. 4 can be easily generalized to any other LDPC code fulfilling the VPFCC property. Moreover, the proposed technique can be implemented recursively without major modifications. This allows several SPA iterations to be performed while instantiating only the hardware that corresponds to one iteration, e.g. the first half of the architecture shown in Fig. 4.

Figure 4. SISO decoder architecture with two SPA iterations for the 2D-TPC/SPC$(4,3)^2$ code.

V. FPGA IMPLEMENTATION

The synthesis of the proposed architecture was performed on a Virtex-5 FPGA (XC5VLX330) from Xilinx. The decoder was designed for the 2D-TPC/SPC$(32,31)^2$ code with parameters $(n = 1024, k = 961, d = 4)$ and rate $R = 0.9384$. All the signals in the decoder were quantized with 8-bit precision. The resources used were 16952 registers out of 207360 (8.18%) and 14402 lookup tables (LUTs) out of 207360 (6.95%).

VI. CONCLUSIONS

This paper proposed a novel parallel architecture for decoding VPFCC-type LDPC codes. This architecture is suitable for high speed applications and requires low memory and interconnection complexity. These features make the architecture attractive for implementing iterative decoders on integrated circuits.

REFERENCES

[1] D. J. C. MacKay and R. M. Neal, "Near Shannon limit performance of low density parity check codes," IEE Electronics Letters, vol. 33, no. 6, pp. 457-458, March 1997.
[2] E. Yeo, P. Pakzad, B. Nikolic, and V. Anantharam, "VLSI architectures for iterative decoders in magnetic recording channels," IEEE Trans. Magnetics, vol. 37, no. 2, pp. 748-755, March 2001.
[3] E. Yeo, P. Pakzad, B. Nikolic, and V. Anantharam, "High throughput low-density parity-check decoder architectures," IEEE, 2001.
[4] C. Howland and A. Blanksby, "A 220mW 1Gb/s 1024-bit rate-1/2 low density parity check code decoder," IEEE Custom Integrated Circuits Conference, 2001.
[5] E. Liao, E. Yeo, and B. Nikolic, "Low-density parity-check code constructions for hardware implementation," IEEE Communications Society, 2004.
[6] M. Karkooti, P. Radosavljevic, and J. R. Cavallaro, "Configurable, high throughput, irregular LDPC decoder architecture: Tradeoff analysis and implementation," IEEE ASAP, 2006.
[7] J. Li, K. Narayanan, and C. Georghiades, "Product accumulate codes: A class of codes with near-capacity performance and low decoding complexity," IEEE Trans. on Inf. Theory, vol. 50, no. 1, pp. 31-46, January 2004.
[8] J. Li, K. R. Narayanan, E. Kurtas, and C. N. Georghiades, "On the performance of high-rate TPC/SPC codes and LDPC codes over partial response channels," IEEE Trans. on Communications, vol. 50, no. 5, pp. 723-734, May 2002.
[9] J. Li, E. Kurtas, K. R. Narayanan, and C. N. Georghiades, "On the performance of turbo product codes over partial response channels," IEEE Trans. on Communications, vol. 37, no. 4, pp. 1932-1934, July 2001.
[10] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. on Inf. Theory, vol. 47, no. 2, pp. 498-519, February 2001.


Brent-Kung Fast Adder Description, Simulation and Formal Verification Using Lava

Leandro Marsó

Facultad de Ciencias Exactas, Físicas y Naturales, Universidad Nacional de Córdoba
Email: elleandro@gmail.com


Abstract: Integrated circuit design can benefit from ideas that come from computer science, and particularly from functional programming, which give us more abstract representations and verification techniques to keep up with the ever-increasing complexity of modern hardware designs. Using Lava [1], an HDL embedded in Haskell, we explain how to design, simulate, and formally verify a carry chain binary adder and a fast adder parameterized in the size of their inputs.


I. INTRODUCTION

We shall present a Lava description of a carry chain adder. With this simple circuit we can verify all the properties of addition in less computer time than with a more complex one, like the Brent-Kung adder. We shall then prove that the latter is correctly designed by means of an equivalence check between the two.

A. Simple circuits

Following figure 1a, we describe a half adder:

import Lava

halfAdd (a, b) = (sum, carry)
  where
    sum   = xor2 (a, b)
    carry = and2 (a, b)

Fig. 1(a): Half adder

The import Lava statement is needed to import a module which defines a number of operations that we can use to build circuits, such as the gates xor2 and and2, among others. Figure 1b can be turned into a circuit description by giving names to every internal signal of the circuit and writing down the subcomponent definitions, as follows:

fullAdd (carryIn, (a, b)) = (sum, carryOut)
  where
    (sum1, carry1) = halfAdd (a, b)
    (sum, carry2)  = halfAdd (carryIn, sum1)
    carryOut       = xor2 (carry2, carry1)

Note that we made use of the previous circuit description, halfAdd, as well as the xor2 function to build the new circuit called fullAdd.

Lastly, the binary adder shown in figure 2 can be described by:

adder (carryIn, ([], [])) = ([], carryIn)
adder (carryIn, (a:as, b:bs)) = (sum:sums, carryOut)
  where
    (sum, carry)     = fullAdd (carryIn, (a, b))
    (sums, carryOut) = adder (carry, (as, bs))

Fig. 2: Binary Adder

B. Connection Patterns

Standard Connection Patterns

Connection patterns are higher-order functions (see footnote 1); when they are used to build circuits, we call them higher-order circuits, or circuit generators. Looking at the adder definition and its topology, we want to capture its structure and generalize the connection pattern it uses. We can do this by replacing the fullAdd circuit with a not yet defined parameter, which becomes another input in the function definition; we call it circ:

row circ (carryIn, []) = ([], carryIn)
row circ (carryIn, a:as) = (b:bs, carryOut)
  where
    (b, carry)     = circ (carryIn, a)
    (bs, carryOut) = row circ (carry, as)

The row function takes a circuit circ and a set of inputs, and connects them as shown in figure 3a. Now, using the row circuit generator, the binary adder can be described simply by:

adder' (carry, inps) = row fullAdd (carry, inps)

adder'' = row fullAdd

Defining adder' and adder'' that way is quite convenient, because we gain expressiveness by thinking in terms of circuit generators instead of recursion over lists. Having seen the advantage of defining connection patterns, we present two more circuit generators that will be used later:

1 Functions are higher-order when they can take other functions as arguments, and return them as results.

par cir1 cir2 (a, b) = (c, d)
  where
    c = cir1 a
    d = cir2 b

It is very useful to define a more graphical version of the function par, the infix operator -|-:

cir1 -|- cir2 = par cir1 cir2

And lastly, the serial connection of circuits and its infix operator version:

serial cir1 cir2 a = c
  where
    b = cir1 a
    c = cir2 b

cir1 ->- cir2 = serial cir1 cir2

Fig. 3: (a) Row, (b) Serial, (c) Parallel

II. INTERPRETATIONS

Though our goal is to introduce Lava as a tool for hardware designers, until now every line of code that we wrote was plain Haskell. Designers only need to know which gates (e.g. xor2, or2, inv, etc.) are available in the Lava module; a circuit described in terms of those definitions can be used by any of the following interpretations.

A. Simulation

We can simulate a circuit using the simulate operation, giving it the circuit and the state of its inputs, for instance:

simulate adder (low, ([high,low], [low,high]))

yields ([high,high], low). We can also simulate sequences of inputs with the simulateSeq operation:

simulateSeq halfAdd [(low,low), (high,low), (low,high)]

which returns [(low,low), (high,low), (high,low)].

B. Formal Verification

Verification in Lava is done by feeding the circuit description with symbolic inputs and using the output of the circuit as the input to a SAT solver [2]; the one we used is called Minisat. We shall verify all the properties of addition, to be sure that our adder is correctly designed. Beforehand, we change the definition of the adder circuit slightly, just to make the verification examples easier. We define an adder that does not take in a carry bit and throws away the resulting carry.

adder2 ([], []) = []
adder2 (a:as, b:bs) = sum:sums
  where
    (sum, carry)     = halfAdd (a, b)
    (sums, carryOut) = adder (carry, (as, bs))

Commutativity of the addition is stated as:

prop_AdderCommutative (as, bs) = ok
  where
    out1 = adder2 (as, bs)
    out2 = adder2 (bs, as)
    ok   = out1 <==> out2

Note that the <==> operator is the infix version of a function that maps the xnor2 gate, i.e. the logical equality operation, over two lists. Because it is very hard to automatically verify properties for any size, we define a new property which includes the size of the circuit to be verified:

prop_AdderCommutative_ForSize n =
  forAll (list n) $ \as ->
  forAll (list n) $ \bs ->
    prop_AdderCommutative (as, bs)

So now we do the verification by calling Minisat from Lava, giving the size:

minisat (prop_AdderCommutative_ForSize 32)

Minisat: ... (t=0.00system) Valid.

The other possible answers are:

Minisat: ... (t=0.00system) Falsifiable.
Minisat: ... (t=0.00system) Indeterminate.

Associativity is stated as:

prop_AdderAssociative (as, bs, cs) = ok
  where
    out1 = adder2 (adder2 (as, bs), cs)
    out2 = adder2 (as, adder2 (bs, cs))
    ok   = out1 <==> out2

To verify that zero is the identity element of addition, we add extra logic to the circuit to force one of the two numbers to zero:

alwaysLow :: [Signal Bool] -> [Signal Bool]
alwaysLow as = [low | _ <- [1..n]]
  where n = length as

addZero = (id -|- alwaysLow) ->- adder2  -- id is the identity function

prop_AdderZero (as, bs) = ok
  where
    out = addZero (as, bs)
    ok  = out <==> as

C. Generating RTL Description

Just as we use circuit descriptions and symbolic data to prove properties, we can also use them to generate VHDL code. After describing the Brent-Kung fast adder, we will show how to generate a VHDL netlist from the Lava description.

III. FAST ADDER DESCRIPTION AND VERIFICATION

We are going to design a fast adder based on the Brent and Kung paper [3], and on a recursive pattern proposed by M. Sheeran [4] to generate the parallel prefix network used to compute the carries.

A. Brent-Kung Adder

1) Brent-Kung Operator: The operator $\circ$ is defined as:

$(g, p) \circ (\hat{g}, \hat{p}) = (g \lor (p \land \hat{g}),\; p \land \hat{p})$

In Lava, we can write it following figure 5:

dotOp ((g1, p1), (g, p)) = (go, po)
  where
    go = or2 (g, and2 (p, g1))
    po = and2 (p, p1)

If dotOp describes the operator correctly, it must fulfill the associative property, as Brent and Kung demonstrated. The property can be stated as:

checkAssociativeDotOp (a, b, c) = ok
  where
    (d, e) = dotOp (a, dotOp (b, c))
    (f, g) = dotOp (dotOp (a, b), c)
    ok1 = d <=> f
    ok2 = e <=> g
    ok  = and2 (ok1, ok2)  -- both outputs must agree

2) Generate and Propagate Circuit: Now we have to describe a circuit that computes the generate and propagate signals, defined as:

$g_i = a_i \land b_i, \quad p_i = a_i \oplus b_i$

These signals are generated in parallel, given two binary numbers a[n] and b[n] of length n.

gAndPs ([], []) = []
gAndPs (a:as, b:bs) = (g, p) : gps
  where
    (g, p) = (and2 (a, b), xor2 (a, b))
    gps    = gAndPs (as, bs)

3) Overall Circuit: Assuming that we already have the carry chain computation block, denoted as the parallel prefix network, the overall circuit proposed in the Brent and Kung paper can be conveniently seen as in figure 7. On the right of figure 7 is the Lava description of the overall pattern, which is built by connecting other blocks using the ->- and -|- circuit generators. The Lava code is exactly the same as the picture description, but written on a single line. Again looking at figure 7, we define the fork, evens, odds and sums circuits:

fork as = (as, as)
--
evens as = cs
  where (bs, cs) = unzip as
--
odds as = bs
  where (bs, cs) = unzip as
-- some shorter definitions:
dropP = id -|- odds
dropG = evens -|- ppNet
--
sums (a:as, bs) = (a : lastXor (as, init bs), cOut)
  where cOut = last bs
--
lastXor (as, bs) = map xor2 cs
  where cs = zipp (as, bs)

B. Parallel Prefix Network

We shall now describe a parallel prefix network with a maximum fan-out of two, i.e. the Brent-Kung carry chain computation block. Sheeran [4] proposed a recurrence pattern (figure 8a) called wrap: at every step of the iteration, we take the result of the previous iteration (the P circuit) and apply the dot operation as shown, which leads to parallel prefix networks like the one in figure 8c. Figure 8b shows the first two iterations of the circuit ppNet, where the dotted box is the base case of the description and the black dots are dotOp. Let us describe the ppNet circuit:

dop [a, b] = [a, dotOp (a, b)]
--
unzipl [] = ([], [])
unzipl [a] = ([a], [])

unzipl (a:b:abs) = (a:as, b:bs)
  where (as, bs) = unzipl abs
--
zipl ([], []) = []
zipl ([a], []) = []
zipl (a:as, b:bs) = a : b : zipl (as, bs)
-- zipl and unzipl are the key to
-- have a binary adder that accepts
-- any input sizes
--
comb [] = []
comb [a] = []
comb (a:as) = dop [a, head as] ++ comb (tail as)
--
posComb (a:as) = a : (comb (init as)) ++ [last as]
--
miti p = unzipl ->- (id -|- p) ->- zipl
--
wrap p = comb ->- miti p ->- posComb

ppNet [a] = []
ppNet [a, b] = dop [a, b]
ppNet as = wrap ppNet as

GENERATION OF THE FAST ADDER

A. Verification

We can easily modify fastAdd to make it drop the carry out, obtaining a new circuit called fastAdd2 that should behave the same as adder2. To state that in Lava we can write:

prop_Equivalent adder2 fastadd2 a = ok
  where
    out1 = adder2 a
    out2 = fastadd2 a
    ok   = out1 <==> out2

B. Generating VHDL Description

This is as simple as defining the following function:

fastAdder n =
  writeVhdlInputOutputNoClk "BrentKungFastAdder" fastAdd
    (varList n "a", varList n "b")
    (varList n "sum", var "cout")

To generate the VHDL netlist we must specify the actual size of the adder; calling fastAdder 16 produces the BrentKungFastAdder.vhl file, which is 376 lines long. If this netlist is to be used by a synthesis tool, the tool should not touch the wires, so as to preserve the network structure.

V. CONCLUSION

Using functional programming we managed to describe, verify and generate a VHDL description of both adders in less than 150 lines of code (comments included), which is less than half of the generated VHDL code of the 16-bit adder. We want to point out some advantages of this approach:

- We can generate adders of any size.
- Verification code is fully reusable.
- We work with a unified language for description, simulation and verification, resulting in a simpler design flow.
- Functional programming gives us the ability to design correct circuits in a simple and compact way, easily integrated into a standard design flow.

We also want to stress again that, since Lava circuits are plain Haskell programs, system-level simulations can easily be done when needed. It is worth mentioning the use of the expressiveness of functional programming to describe circuits that adapt to their context, for example to the delay profile of the inputs, as can be found in Sheeran's paper [4]. Functional programming techniques are also being used [5] to take into account non-functional properties such as area, power consumption and timing, even when working at a high level of abstraction in the early stages of the design.

ACKNOWLEDGMENT

The author would like to thank M. Sheeran and K. Claessen for making Lava and its documentation freely available.

REFERENCES

[1] K. Claessen and M. Sheeran, "A tutorial on Lava: A hardware description and verification system," Website, 2000, http://www.cs.chalmers.se/~koen/Lava.
[2] K. Claessen, N. Een, M. Sheeran, and N. Sorensson, "SAT-Solving in practice," in 9th International Workshop on Discrete Event Systems (WODES08), Göteborg, Sweden, May 2008.
[3] R. P. Brent and H. T. Kung, "A Regular Layout for Parallel Adders," IEEE Transactions on Computers, vol. C-31, no. 3, pp. 260-264, 1982.
[4] M. Sheeran, "Parallel prefix network generation: an application of functional programming," in Hardware Design and Functional Languages (HFL07), Braga, Portugal, March 2007.
[5] E. Axelsson, K. Claessen, and M. Sheeran, "Wired: Wire-aware circuit design," in Proc. of Conference on Correct Hardware Design and Verification Methods (CHARME), ser. Lecture Notes in Computer Science, vol. 3725. Springer Verlag, October 2005.


High Value Resistance for Neural Signals Acquisition System using OTA topologies

Juan Pablo Zeballos Raczy, César Vásquez Vargas, Omar Olguín Amado

Grupo de Microelectrónica, Pontificia Universidad Católica del Perú, Lima, Perú
{jpzeba, cvasquezv, omar.olquin}@ieee.org

Abstract: This paper presents the design of a high value integrated resistance by analyzing three OTA (Operational Transconductance Amplifier) topologies and comparing them with each other and with the topology described in [1]. This resistance is used to obtain high time constants in neural signal acquisition systems (NSAS). To achieve this goal, the gm/Id methodology was used, together with CAD tools based on the BSIM3v3 mathematical model [2], and the design was made in AMS 0.35 µm CMOS technology.

I. INTRODUCTION

In integrated circuit design, where space is limited, a conventional high value resistance cannot be placed because it would take a large area inside the integrated circuit (IC). High value resistances that occupy small areas can be obtained by designing low transconductance OTAs with low noise and low power consumption. Those resistances can be applied to band pass filters in NSAS. The first part of this work presents a brief theory of large integrated resistances and the OTA topologies to be analysed. The second part covers the design of these topologies and their comparison with each other and with the one described in [1]. The results and conclusions are presented in the last part.

A. Neural signals acquisition system structure

An NSAS requires a high common-mode rejection ratio (CMRR), because neural signals have low amplitude and noise can interfere, producing acquisition errors. This kind of system has 3 functional blocks:

- Pre-amplification: it has differential stages and eliminates noise.
- Band pass filter: it is based on a DDA (double differential amplifier). Two feedback loops are required: a negative one, to define the DDA gain, and a positive one, where the low frequency poles are introduced. To achieve the latter, a high resistance is needed. It is known that the higher the capacitance, the lower the frequency, but a large capacitance is not suitable for ICs.
- Adaptation: it defines the analog-to-digital converter (ADC), by using a high band filter and an amplifier. [1]

Figure 1. Place where the high value resistance is implemented in the band pass filter. [1]

B. High value integrated resistances

An integrated resistance is defined as a resistance implemented inside an IC. Several types of resistance can be found according to the material: the Poly resistance, of some tens of ohms; the N-well resistances, built from n-doped material, with values from 1 to 2 kΩ/□ (where □ is equal to 1 µm²); and finally the active zone resistances, made of silicon, with values from 100 to 200 Ω/□ [1]. A small area and low power architecture is proposed in [1] and [3]. This technique replaces Rinf in Fig. 1 with a P-type transistor with bulk and voltage controlled by the architecture shown in Fig. 2.

Another alternative to obtain a high resistance is to use the OTA-based architecture shown in Fig. 3, where a floating resistance is built with 3 OTAs. Its value is derived below. Working with $I_{in}$, the following equations can be deduced:

$I_c = (V_{out} - V_{in})\, G_m$   (1)
$I_{in} = -I_c = (V_{in} - V_{out})\, G_m$   (2)
$(V_{in} - V_{out}) / I_{in} = 1 / G_m = R$   (3)

With $I_{out}$, the following equations:

$I_a = (-V_{in})\, G_m$   (4)
$I_b = V_{out}\, G_m$   (5)
$I_{out} = I_a + I_b = G_m (V_{out} - V_{in})$   (6)
$(V_{out} - V_{in}) / I_{out} = 1 / G_m = R$   (7)

Finally, with (3) and (7), it is demonstrated that the value of the resistance (R) is given by:

$R = 1 / G_m$   (8)

So, with a transconductance value in the order of nS, high resistance values can be obtained without occupying much area. However, if the transconductance is reduced in the design, noise may increase.

C. OTA topologies

An OTA is an amplifier in which the input voltages control an output current; the relation between voltage and current is therefore the transconductance. See Fig. 4.

a) Simple OTA

This architecture uses few transistors, so it can achieve low power consumption with small areas. However, obtaining very low transconductance and a high linear range is a difficult task. Fig. 5 shows the simple OTA, which consists of a differential pair (M1, M2) and an active current mirror (M3, M4).

Figure 5: Simple OTA

At low frequencies the input voltage noise can be reduced by making L3 larger than L1. L3 = 3 L1 is used as a practical case.

b) Symmetric OTA

The symmetric OTA consists of a differential pair and three active current mirrors. The overall transconductance, Gm, is the same as that of M1 and M2, with Id(M5) = Id(M3) = Id(M4) = Id(M6) [4]. This topology has a better linear range than the basic topology because of its symmetric distribution. Thus, the input voltage noise also depends on M6 and M5.

c) Symmetric OTA using series-parallel current division

This structure employs the series-parallel (SP) association to achieve very low transconductance (pA/V). The SP technique divides the differential pair current by a factor that

depends on the numbers of series and parallel transistors in the N-type active current mirrors [5]:

$\frac{I_{out}}{I_{in}} = \frac{S\,P}{R\,Q}$   (9)

where S and Q are the series transistors and P and R are the parallel transistors in the active current mirror. See Fig. 7.

The active current mirrors may be divided into two branches, in series and in parallel. When the two branches have the same number of transistors (N), the effective transconductance ($G_M$) is expressed by [5]:

$G_M = \frac{g_{m1}}{N^2}$   (10)

Figure 8: Series-Parallel OTA [5]

II. OTA DESIGN

A. Simple OTA design

The design objective was to obtain the lowest possible transconductance with a small area. The result was a Gm of 96 nA/V with a bias current of 16 nA, as shown in Table I:

TABLE I. SIMPLE OTA DESIGN
Transistor   gm/Id (V^-1)   Id (nA)   W (µm)   L (µm)   Gm (nA/V)
M1, M2       12             8         1        67.8     96
M3, M4       6.53           8         0.8      500      52.24
M5, M6       5.86           16        1        150      93.76

For the symmetric OTA, the M1 transistor is fixed in weak inversion [6] with a transconductance (Gm1) of 12.5 nA/V, giving a bias current of 0.48 nA. This current value forces the other transistors into weak inversion as well. The N-type current mirrors are fixed with a gm/Id of 28 and the P-type current mirror with 26. The design results are shown in Table II:

TABLE II. SYMMETRIC OTA DESIGN
Transistor        gm/Id (V^-1)   Id (nA)   W (µm)   L (µm)   Gm (nA/V)
M1, M2            26             0.48      1.3      10       12.5
M3, M4, M5, M6    28             0.48      1.1      10       13.44
M7, M8            26             0.48      5.3      10       12.5

For the series-parallel OTA, the M1 transistor is fixed in weak inversion with 1.25 µA/V because Gm = gm1/N², where N = 10, obtaining a system transconductance of 1.25 nA/V. The other transistors were also fixed in weak inversion. The design results are shown in Table III:

TABLE III. SERIES-PARALLEL OTA DESIGN
Transistor   gm/Id (V^-1)   Id (nA)   W (µm)   L (µm)   Gm (nA/V)
M1           22             56.8      77.6     1        1250
M2           24             56.8      99.5     1        1360
M2           24             0.568     9.95     10       16.5
M3           29             0.568     3.71     10       13.6

Besides, the differential pair transistors (M1) were divided in order to decrease the flicker noise and mismatch [2]:

TABLE IV. DIVIDED TRANSISTORS VALUES
Transistor   Wu (µm)   Lu (µm)   Array Structure
M1           38.8      1
M2           9.95      1
M2           9.95      1

III. RESULTS

As shown in Table V, it is difficult to obtain very high resistances with simple OTAs. With the other topologies it is possible to obtain and even exceed them, but the problem with the symmetric topology is its small linear range compared with the series-parallel topology, and its high input noise voltage. Besides, there is a clear advantage in using the SP topology instead of the resistance in [1], not only because of the high resistance values but also because of the small area and power consumption.
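As a quick consistency check of the design tables, the transconductance of each device follows from gm = (gm/Id) · Id, and the emulated resistance follows from (8). The sketch below is illustrative Python, not part of the original design flow; the helper names are assumptions, and the numbers are taken from Tables I and II.

```python
def gm_from_ratio(gm_over_id, id_na):
    """Transconductance in nA/V from gm/Id (in 1/V) and Id (in nA)."""
    return gm_over_id * id_na

def resistance_mohm(gm_na_per_v):
    """R = 1/Gm, equation (8). A Gm of 1 nA/V (1 nS) corresponds to
    1 GOhm, so the result in MOhm is 1000 / Gm[nA/V]."""
    return 1000.0 / gm_na_per_v

# Differential pair of the simple OTA (Table I): gm/Id = 12, Id = 8 nA.
simple_gm = gm_from_ratio(12, 8)
# Differential pair of the symmetric OTA (Table II): gm/Id = 26, Id = 0.48 nA.
symmetric_gm = gm_from_ratio(26, 0.48)

# Resistance emulated by each topology, using the Gm values quoted in the text.
r_simple = resistance_mohm(96.0)      # about 10.4 MOhm
r_symmetric = resistance_mohm(12.5)   # 80 MOhm
```

The 80 MΩ obtained for the symmetric OTA agrees with the resistance reported for that topology in Table V, while the simple OTA stays near 10 MΩ, which is consistent with the remark that very high resistances are hard to reach with it.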

TABLE V. RESULTS
Parameter                 Symmetric   SP       Resistance [1]
Resistance (MΩ)           80          125      80.5
Linear range (mV)         100         220
Input noise (µVrms)       9464        110.4
Estimated area (µm²)      231         2535.6   4147
Power consumption (nW)    21          1140     90009

The following graphs show the transconductance transfer characteristic of the resistances implemented with the three topologies according to Fig. 3, whose resistance value is given by (8). The linear range and the simulated transconductance are shown as well.

Figure 9: Simple OTA GM transfer characteristic simulation (output current Iout [A])
Figure 10: Symmetric OTA GM transfer characteristic simulation (output current Iout [A])
Figure 11: Series-Parallel OTA GM transfer characteristic simulation (output current Iout [A])

IV. CONCLUSIONS

The use of OTA topologies optimizes the area of integrated resistances. It can also be useful in high time constant filters, allowing low capacitance values. The input voltage noise is not a problem, because it is low in all cases. However, it can be reduced by adjusting the design parameters. The linear range can be increased, in all cases, with two transistors between the differential pair. According to the results obtained, OTA topologies used as resistors can replace the resistor described in [1].

REFERENCES

[1] E. Raygada, E. Azabache, J.C. Saldaña, "Diseño de una resistencia integrada de alto valor aplicada a un sistema de adquisición de señales neuronales," XIV IBERCHIP, February 2008, Puebla, México.
[2] H. Alarcon, H. Villacorta, "A design-space generation tool for analog blocks of ultra low-power ICs based upon the Bsim3v3 model," XIV IBERCHIP, March 2006, San Jose, Costa Rica.
[3] J. Sacristán, M. T. Osés, "Low noise amplifier for recording ENG signals in implantable systems," ISCAS 2004 (International Symposium on Circuits and Systems), p. IV-, 2004.
[4] A. Veeravalli, E. Sánchez-Sinencio, J. Silva-Martínez, "Transconductance amplifiers with very small transconductances: A comparative design approach," IEEE JSSC, vol. 37, no. 6, pp. 770-775, Jun. 2002.
[5] A. Arnaud, R. Fiorelli, C. Galup, "On the design of very small transconductance OTAs with reduced input offset," SBCCI05, September 4-7, 2004, Florianópolis, Brazil.
[6] F. Silveira, D. Flandre, P. G. A. Jespers, "A gm/ID based methodology for the design of CMOS analog circuits and its application to the synthesis of a silicon-on-insulator micropower OTA," IEEE Journal of Solid-State Circuits, vol. 31, no. 9, September 1996.


Rafaella Fiorelli, Fernando Silveira

Instituto de Ingeniería Eléctrica, Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay. {fiorelli, silveira}@fing.edu.uy

AbstractThe design of CMOS CG-LNAs using a design space exploration proposed for all inversion regions, from weak to strong, is performed in this work. The exploration is done in terms of current consumption, gain and noise gure in the design space (ID , gm ) showing the trade-offs of designing the CG-LNA ID in moderate or weak inversion. Finally comparisons between the MATLAB design space exploration and BSIM3v3 simulations using Spectre-RF are done, through the design example of a 900MHz CG-LNA implemented in a 0.35m CMOS technology. Index TermsCommon Gate LNA, Low power consumption, All inversion regions, Design Methodology, ACM model.

I. INTRODUCTION

In autonomous front-end RF applications there are strong constraints on both power and noise. The design of these circuits therefore involves several trade-offs, which can be seen clearly through a design space exploration of their characteristics. This work presents the design space exploration of a single-ended common-gate low noise amplifier (CG-LNA), taking noise figure and gain as its basic characteristics. All inversion levels are considered. In [1] a design space exploration is performed, but considering only transistor operation in strong inversion. In [2], [3] a graphical optimization of a CG-LNA is presented using the all-region EKV model, but no study of inductor constraints is made. In [4] the ACM model is used to design an LNA, but in that case a common-source LNA.

With the aid of MATLAB, the design space exploration of the CG-LNA is done using the power gain GT and the noise figure (NF). The all-region ACM CMOS transistor model is used. Limitations imposed by on-chip inductors are also studied and included in the exploration. The proposed CG-LNA has no input matching network, so limitations on the load resistance for achieving the desired input impedance are also considered. Following a design flow, the complete set of LNA parameters is obtained. The way the inversion level affects the current consumption, the gain and the noise figure of the CG-LNA is discussed.

Section II briefly reviews the transistor model used. Section III describes the CG-LNA and its behaviour. The design exploration and results are given in Sections IV and V. Finally, a design example is described in Section VI.

II. ACM MODEL

For the theoretical deductions and simulations, the one-equation all-region MOSFET model ([5], [6]) has been used to describe the transistor behaviour. This is a physics-based compact model valid for all inversion levels. In this design, the transistors are considered to work in the saturation region. In the ACM model, the drain current is expressed as the difference between the forward $I_F$ and reverse $I_R$ components,

$$I_D = I_F - I_R = I_S\,(i_f - i_r) \quad (1)$$

where

$$I_S = \frac{1}{2}\,\mu\, n\, C'_{ox}\,\phi_t^2\,\frac{W}{L} \quad (2)$$

$I_S$ is the specific current, which is proportional to the aspect ratio of the transistor. $V_G$, $V_S$ and $V_D$ are the gate, source and drain voltages, referred to the substrate. $\mu$ is the effective mobility, $\phi_t$ is the thermal voltage, $C'_{ox}$ is the gate oxide capacitance per unit area and $n$ is the slope factor, slightly greater than unity and weakly dependent on the gate voltage. Parameters $i_f$ and $i_r$ are the normalized forward and reverse currents, or inversion levels at source and drain, respectively. Note that, in the saturation region, the drain current is almost independent of $V_D$; therefore $i_f \gg i_r$ and $I_D \cong I_F$. The small-signal transconductances $g_m$, $g_{ms}$ and $g_{md}$ (gate, source and drain transconductances) are given by

$$g_{ms(d)} = -\mu\,\frac{W}{L}\,Q'_{IS(D)} = \frac{2 I_S}{\phi_t}\left(\sqrt{1 + i_{f(r)}} - 1\right) \quad (3)$$

$$g_m = \frac{g_{ms} - g_{md}}{n} \quad (4)$$

where $Q'_{IS(D)}$ is the inversion charge density at the source (drain). The other small-signal parameters can also be derived in terms of the inversion levels. For the sake of simplicity a complete list of expressions is not presented here, but the ACM intrinsic capacitance equations [5] were employed throughout the design process. The model considered is a long-channel model. Nevertheless, given the technology analyzed (0.35 µm) and the fact that there is no interest in working in deep strong inversion, short-channel effects, velocity saturation and mobility reduction may be neglected, as the agreement between calculations and simulations shows.
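As an illustration of how (1), (3) and (4) link the inversion level to the transconductance-to-current ratio, the sketch below evaluates gm/ID in saturation (where $i_r \ll i_f$, so $g_m \approx g_{ms}/n$ and $I_D \approx I_S i_f$) and inverts the relation. The slope factor n = 1.3 and the thermal voltage of about 25.85 mV are assumed placeholder values, and the function names are hypothetical, not taken from the paper.

```python
import math

PHI_T = 0.02585  # thermal voltage near 300 K, in volts (assumed value)

def gm_over_id(i_f, n=1.3, phi_t=PHI_T):
    """gm/ID in saturation, from (1), (3), (4) with i_r << i_f.

    gm = gms / n = (2*IS/phi_t) * (sqrt(1 + i_f) - 1) / n and ID = IS * i_f,
    which simplifies to 2 / (n * phi_t * (sqrt(1 + i_f) + 1)).
    """
    return 2.0 / (n * phi_t * (math.sqrt(1.0 + i_f) + 1.0))

def inversion_level(gm_id, n=1.3, phi_t=PHI_T):
    """Inverse of gm_over_id: the inversion level i_f for a target gm/ID."""
    root = 2.0 / (n * phi_t * gm_id) - 1.0  # this is sqrt(1 + i_f)
    if root <= 1.0:
        raise ValueError("gm/ID must be below the weak-inversion limit 1/(n*phi_t)")
    return root * root - 1.0

# Weak (i_f << 1), moderate and strong (i_f >> 1) inversion.
for level in (0.1, 1.0, 10.0, 100.0):
    print(f"i_f = {level:6.1f}  ->  gm/ID = {gm_over_id(level):5.2f} 1/V")
```

The ratio tends to 1/(n·φt) as the inversion level goes to zero, which is why the largest gm/ID values correspond to weak inversion.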


B. Design flow

For a complete CG-LNA design, the following parameters must be computed: Ls, the width WM1 of M1 and the drain current ID of M1. The length L1 of M1 is chosen as the smallest available, to reach the highest fT for a given ID. The requirements on NF, current consumption, Zin and gain must be fulfilled. The design flow is as follows. A pair (ID, gm/ID) is chosen. This determines gm, if, WM1 ([8]) and Cgs. Ls is obtained from (6). RL is calculated from (7). The power gain GT is calculated using (8), and NF using (9) and (10).
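The first steps of this flow can be sketched as follows, under stated assumptions: the ACM relations give the inversion level and aspect ratio from the pair (ID, gm/ID), and the resonance condition (6) gives Ls. In the paper Cgs comes from the ACM capacitance equations, which are not listed, so here it is simply passed in; the values of n, φt and µC'ox are hypothetical placeholders, and `design_point` is an illustrative name.

```python
import math

def design_point(i_d, gm_id, c_gs, f0=900e6, n=1.3, phi_t=0.02585,
                 mu_cox=100e-6, length=0.35e-6):
    """Map a pair (ID, gm/ID) to gm, inversion level, width and Ls.

    c_gs is supplied by the caller; n, phi_t and mu_cox (mobility times
    oxide capacitance per unit area, in A/V^2) are placeholder values.
    """
    gm = gm_id * i_d
    root = 2.0 / (n * phi_t * gm_id) - 1.0        # equals sqrt(1 + i_f)
    if root <= 1.0:
        raise ValueError("gm/ID is above the weak-inversion limit 1/(n*phi_t)")
    i_f = root * root - 1.0
    i_s = i_d / i_f                               # ID ~= IS * i_f in saturation
    w_over_l = i_s / (0.5 * mu_cox * n * phi_t ** 2)   # specific current, eq. (2)
    omega0 = 2.0 * math.pi * f0
    l_s = 1.0 / (omega0 ** 2 * c_gs)              # resonance condition, eq. (6)
    return {"gm": gm, "i_f": i_f, "W": w_over_l * length, "Ls": l_s}

# Example pair: ID = 1 mA, gm/ID = 15 1/V, with an assumed Cgs of 1 pF.
print(design_point(i_d=1e-3, gm_id=15.0, c_gs=1e-12))
```

RL, GT and NF then follow from (7) to (10) once rds is known for the chosen point.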

Figure 1.

IV. DESIGN SPACE EXPLORATION ALGORITHM

The proposed algorithm for designing an RF CG-LNA considers the design space defined by the DC bias current ID of the active transistor M1 and the gm/ID ratio of this transistor. In this design space the current consumption is considered for a given NF and power gain of the CG-LNA. The algorithm is summarized as follows: the design space is covered by a grid of pairs (ID, gm/ID). For each of these pairs, NF = NF(ID, gm/ID) and the power gain GT = GT(ID, gm/ID) are obtained. Once the design space has been explored, as shown in the example of the next section, for a given NF the minimum current consumption is found in weak inversion.

A. Inductor and load resistance constraints

The inductor values are themselves a limitation in the design, as they are on-chip inductors. The limitations come from the range of inductors available in the technology used and from area constraints, and they determine whether or not the circuit can be completely integrated. In the design space exploration, the constraint on the inductor Ls is considered: it cannot be higher than Ls,max or lower than Ls,min, limits imposed by the technology used. As Ls is also obtained from the pair (ID, gm/ID), this constraint is added to the design space. This means that only a certain zone of the design space is valid for choosing a design. As previously mentioned, the resistance RL is also limited by the pair (ID, gm/ID). If no matching network is to be used, the zone of the design space where negative RL values would be needed is excluded from the design.

V. RESULTS AND DISCUSSION

In the following example, the maps of NF and gain generated by the algorithm are shown and briefly explained. The design space of a CG-LNA is generated for a 0.35 µm CMOS technology, working with the ACM model. We consider the following design requirements: working frequency f0 = 900 MHz, supply voltage Vdd = 2.5 V, power gain GT of at least 10 dB, NF less than 3 dB and current ID less than 2 mA. The output and input impedances must also be 50 Ω, and the output network must be designed as well. The design space considering only the power gain evaluation is shown in Fig. 2. The zone where RL < 0 is discarded in the routine, as GT is expressed in dB. Fig. 3 shows the

III. CG-LNA DESIGN

A. Circuit description

The circuit, shown in Fig. 1, is a single-ended common-gate narrowband LNA (CG-LNA). It consists of the transistor M1, which fixes the gain and the real part of the input impedance, and the inductor Ls, which cancels the imaginary part of the input impedance. Ld, Rd, Cd1 and Cd2 are part of the output matching network and include the parasitic capacitances of M1. Owing to its topology it is a narrowband device, as the input impedance is real only at the resonant frequency. The input impedance is

$$Z_{in}(s)\big|_{s=j\omega_0} = \frac{1}{g_m + g_{mb}}\;\frac{2r_{ds} + R_L}{2r_{ds}} \quad (5)$$

when

$$\omega_0 = \frac{1}{\sqrt{L_s C_{gs}}} \quad (6)$$

At resonance, Zin must be equal to Rs, as no input matching network is used. This means that, for a fixed Rs and a fixed transistor size, the load resistance RL is fixed. Taking $g_m + g_{mb} \approx n g_m$, it is equal to ([1]):

$$R_L = 2\,(R_s\, n\, g_m - 1)\, r_{ds} \quad (7)$$

It is interesting to note that for certain values of gm and Rs, RL can be negative, in which case an input matching network would be needed. gds = 1/rds is the transistor small-signal output conductance. It is modelled as proportional to the transistor transconductance, assuming that in moderate and weak inversion the DIBL effect dominates over the channel length modulation effect ([7]). The power gain is

$$G_T = \frac{R_L}{4R_s} \quad (8)$$

The NF of this circuit is

$$NF = 10\log(F) \quad (9)$$

with F, the noise factor, given by ([1]):

$$F \cong 1 + \frac{\gamma}{n\, g_m R_s} + \frac{2R_s}{(n\, g_m R_s - 1)\, r_{ds}} \quad (10)$$

where $\gamma$ is the thermal noise factor of the transistor.
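A minimal numerical sketch of (7) to (10) is given below. It assumes that the first noise term of (10) carries the channel thermal-noise factor γ in its numerator (taken here as 2/3, a common long-channel value) and uses the identity 2Rs/((n·gm·Rs − 1)·rds) = 4Rs/RL that follows from (7). The slope factor n = 1.3 and the function name are illustrative assumptions, not values from the paper.

```python
import math

def lna_metrics(gm, r_ds, r_s=50.0, n=1.3, gamma=2.0 / 3.0):
    """Return (RL, GT in dB, NF in dB) for one bias point, or None if RL <= 0.

    gamma and n are assumed values; the RL <= 0 case corresponds to the zone
    where an input matching network would be required.
    """
    x = n * gm * r_s
    r_l = 2.0 * (x - 1.0) * r_ds                 # load resistance, eq. (7)
    if r_l <= 0.0:
        return None
    g_t = r_l / (4.0 * r_s)                      # power gain, eq. (8)
    f = 1.0 + gamma / x + 4.0 * r_s / r_l        # noise factor, eq. (10)
    return r_l, 10.0 * math.log10(g_t), 10.0 * math.log10(f)

print(lna_metrics(gm=0.02, r_ds=1000.0))   # valid point
print(lna_metrics(gm=0.01, r_ds=1000.0))   # n*gm*Rs < 1, so RL would be negative
```

Sweeping this function over a grid of bias points, with the Ls limits applied as an extra mask, gives maps of the kind discussed in the design space exploration.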


Figure 2.

Figure 3. Design space exploration considering NF, with the constraints of RL and Ls. The design point is shown. (In the zone marked with (*), the NF rises to values of up to 12 dB and then decreases until RL reaches 0.)

Parameter: ID (mA), gm/ID, gm (S)

NF curves together with the valid inductor zone and the chosen design point. The power gain and the NF are plotted separately to simplify the study. The chosen design point meets all the design requirements previously mentioned. Table I shows the parameters calculated in MATLAB for this particular design point.

Analyzing the NF and GT curves, some comments can be made regarding the selection of the inversion level. For a particular value of NF, lower current consumption is achieved by moving towards weak inversion. Also, for a given GT, the power consumption decreases as gm/ID increases. It is interesting to note that both characteristics show the same tendency in moderate and weak inversion.

The proposed method allows a simple design space exploration that gives fairly accurate results when compared with simulation. Nevertheless, some limitations on the accuracy of the applied model must be discussed. In weak inversion fT is close to the operating frequency, particularly at the chosen point, where fT is around 2 GHz. As non-quasi-static effects are not considered in the ACM model equations used, there is an error in the parameters calculated in this zone. On the other hand, when going towards strong inversion, although this is usually not the region of interest for this architecture, some error is associated with neglecting the velocity saturation and mobility reduction effects.

VI. DESIGN EXAMPLE

This section presents the complete design of the CG-LNA and the simulation results obtained with Spectre-RF at the above design point. To complete the design, the values of Ld, Rd, Cd1 and Cd2 were calculated for a 50 Ω output load RL, and are shown in Table II. In Table I the simulated values of NF, gain and ID, among others, are compared with those obtained from the MATLAB design exploration. Very good agreement exists between calculated and simulated results. Figs. 4, 5, 6 and 7 show the S-parameters. The reverse isolation is good [1]. The input and output impedance

NF (dB), GT (dB)

Table I. Parameters calculated with MATLAB and simulated with Spectre-RF

values are Zin@900MHz = (30 + 13j) Ω and Zout@900MHz = (32 + 11j) Ω. These values are not 50 Ω as expected; adjusting them requires some iterations on the component values of the output network and on the source inductance. The simulated NF is shown in Fig. 8. Fig. 9 plots the IIP3, which has a simulated value of 4.6 dBm.

VII. CONCLUSIONS

A design space exploration for designing an RF CG-LNA using an all-region CMOS transistor model has been presented in this work. It has been shown how the inversion level affects the NF, the current consumption and the gain. Moreover, it has been shown that operation in moderate or even weak inversion reduces the current consumption without degrading GT or NF. The RL constraint has been considered, and the importance of including the inductor size constraints in the design space exploration has also been studied. Finally, a design example was considered and very good agreements between

Parameter: W (µm), L (µm), Ls (nH), Ld (nH), Rd (Ω), Cd1 (pF), Cd2 (pF). Value: 2000, 0.35, 0.35, 5.5, 10, 1800, 1.7, 6.6


Figure 4. S11 parameter

Figure 8. NF

Figure 5. S12 parameter

Figure 9. IIP3

the MATLAB calculations and the Spectre-RF simulations were observed.

VIII. ACKNOWLEDGEMENTS

The authors would like to thank the Uruguayan project PDT 69/08 for its financial support.

REFERENCES

[1] P. Leroux and M. Steyaert, LNA-ESD Co-Design for Fully Integrated CMOS Wireless Receivers, 1st ed. The Springer International Series in Engineering and Computer Science, 2005.
[2] T. Stücke, N. Christoffers, R. Kokozinski, S. Kolnsberg, and B. Hosticka, "Graphical optimization of common-gate LNA," Research in Microelectronics and Electronics 2006, Ph. D., pp. 453-456, Jun. 2006.
[3] P. H. Amin Shameli, "A novel power optimization technique for ultra-low power RFIDs," in International Symposium on Low Power Electronics and Design, G. Tegernsee, Ed., 2006.
[4] V. Varotto and O. Gouveira-Filho, "Design of RF CMOS low noise amplifiers using a current based MOSFET model," in 17th Symposium on Integrated Circuits and System Design, 2004, pp. 82-87.
[5] A. Cunha, M. C. Schneider, and C. Galup-Montoro, "An MOS transistor model for analog circuit design," IEEE Journal of Solid-State Circuits, vol. 33, no. 10, pp. 1510-1519, Oct. 1998.
[6] C. Galup-Montoro, Márcio Schneider, and A. Cunha, "A current-based MOSFET model for integrated circuit design." IEEE Press, 1999, ch. 2 of Low-voltage/Low-Power Integrated Circuits and Systems, pp. 7-55.
[7] Y. Tsividis, Operation and Modelling of the Metal-oxide Semiconductor Transistor, 2nd ed. McGraw-Hill, 1999.
[8] F. Silveira, D. Flandre, and P. G. A. Jespers, "A gm/Id based methodology for the design of CMOS analog circuits and its applications to the synthesis of a silicon-on-insulator micropower OTA," IEEE Journal of Solid-State Circuits, vol. 31, no. 9, pp. 1314-1319, 1996.

Figure 6. S21 parameter

Figure 7. S22 parameter


Roberti M., Fraigi L.

INTI Electronics and Informatics National Institute of Industrial Technology (INTI) Buenos Aires, Argentina mariano@inti.gov.ar

Abstract: This paper presents the development of a post-type compressive-force load cell fabricated using Low Temperature Cofired Ceramics (LTCC) technology. An LTCC mechanical load cell structure with a z-axis thick-film strain gage was implemented using two different approaches. Fabrication methods and materials are explored, and fabricated devices are presented. Mechanical characterization tests are still in progress, but preliminary load tests with compressive forces showed consistent behavior at strain levels of up to 1500 microstrain.

Gongora-Rubio Mario R.

Institute of Technological Research of Sao Paulo State (IPT) Sao Paulo/SP, Brazil gongoram@ipt.br

I. INTRODUCTION

Micro-fabrication technologies have played a fundamental role in the development of MEMS/MST. LTCC technology is an excellent option for implementing 3D sensor devices, with several advantages when compared with other microfabrication technologies [1]. Embedded passive components are well established in LTCC as a way to improve MCM packaging density. Microvolume resistors have been manufactured by filling a via hole with a suitable thick-film paste [2] to obtain resistors, thermistors or varistors. In addition, the electrical properties of LTCC resistors have been investigated [3] in order to verify the compatibility of LTCC materials and pastes from different manufacturers. Recently, new LTCC-MST developments were introduced that make new fabrication techniques feasible [4,5].

Load cells are used in industry for weighing measurements. Basically, a load cell is an elastic element to which an appropriate type of strain sensor is bonded. The application of a force to the elastic element causes a deformation that is sensed by the strain sensor, providing an electrical output proportional to the applied force. The present work proposes a novel structure designed for low cost compressive force sensing. Fabrication methods for different meso-scale load cell structures implemented using LTCC technology are also presented.

Some features of this approach are: good performance-to-cost ratio, robust design, long-term stability and suitability for difficult environments. The structure consists of several green tape layers machined using a printed circuit board prototyping CNC machine, together with metal electrodes and posts of thick-film piezoresistive paste, embedded in sacrificial materials.

II. STRAIN SENSORS

The change in resistance of any resistor under applied stress is due to changes in the dimensions of the resistor and to modifications of its material conductivity as a result of micro-structural changes. The gage factor (GF) of a resistor is defined as the ratio of the relative change in resistance (ΔR/R) to the applied strain (Δl/l):

$$GF = \frac{\Delta R / R}{\Delta l / l} \quad (1)$$

Metals are affected only by geometrical changes, resulting in a GF of 2 to 2.5. Semiconductors, thin-film and thick-film resistors display higher gage factors: besides the geometrical changes caused by the applied strain, the resistance shift is due to micro-structural modifications of the material conductivity. A sensing element based on the piezoresistive effect of thick resistive films, as reported in [6], is used in this work. The longitudinal GF values of 10 kΩ/sq thick-film resistors are usually between 9 and 20, which makes them an appropriate sensor for the proposed application. The geometries of piezoresistive sensors implemented using thick-film resistor materials are displayed in Fig. 1: (a) a planar structure, (b) a vertical structure, (c) a vertical structure with surrounding dielectric and (d) the proposed novel post vertical structure, suitable for z-axis force sensing.


$$R = \rho\,\frac{z}{x\,y} \quad (2)$$

with ρ the film resistivity, z the thickness of the resistor and x and y the lateral dimensions of the resistor. Force sensors using thick-film sensors with planar geometry, see Fig. 1(a), have been proposed in [6, 7], but planar methods are generally indirect measurements. Compared with conventional planar sensors, z-axis sensitive devices exhibit higher GF and good thermal stability, and they are a direct measurement technique.

Figure 1. Thick film strain sensor geometries: (a) planar structure; (b) vertical structure; (c) vertical structure with surrounding dielectric; (d) proposed post vertical structure.

With

$$\varepsilon_z = \frac{\Delta z}{z} = \frac{1}{E}\,\frac{4F_z}{\pi d^2} \quad (3)$$

$$\frac{\Delta R}{R} = GF\,\varepsilon_z = GF\,\frac{1}{E}\,\frac{4F_z}{\pi d^2} \quad (4)$$

where $F_z$ is the applied force, $d$ the post diameter and $E$ the Young's modulus of the post material.
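As a worked illustration of the relation between axial force, strain and relative resistance change for a cylindrical post (stress = force over circular cross-section, strain = stress over Young's modulus, ΔR/R = GF times strain), the sketch below uses the 950 µm post diameter and the 15 N maximum load quoted in this paper. The Young's modulus of 50 GPa and the gage factor of 10 are hypothetical placeholders chosen only to make the example run; they are not measured values.

```python
import math

def post_response(force_n, diameter_m, youngs_modulus_pa, gauge_factor):
    """Return (axial strain, relative resistance change) of a cylindrical post."""
    area = math.pi * diameter_m ** 2 / 4.0       # circular cross-section
    stress = force_n / area                      # compressive stress in Pa
    strain = stress / youngs_modulus_pa          # Hooke's law along z
    return strain, gauge_factor * strain

# 15 N on a 950 um post; E and GF below are assumptions, not data.
strain, dr_over_r = post_response(15.0, 950e-6, 50e9, 10.0)
print(f"strain = {strain * 1e6:.0f} microstrain, dR/R = {dr_over_r * 100:.2f} %")
```

For the three-post arrangement, each post would see one third of the total force, so the same function can be called with `force_n / 3`.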

A z-axis piezoresistive force sensor using the geometry shown in Fig. 1(b) has been designed, obtaining high levels of strain. Another implementation, a vertical z-axis thick-film piezoresistive resistor surrounded with dielectric and using the geometry depicted in Fig. 1(c), has been used for a multipoint load sensor, with sensing resistances of 2.5 kΩ and sensitivities (ΔR/R) of about 1.3% at 6 bar pressure. Here we introduce a novel LTCC sensor concept for a z-axis piezoresistive sensor using the geometry illustrated in Fig. 1(d). The sensor is composed of a post of piezoresistive material with upper and lower electrodes for electrical contact.

III. SENSOR OUTLINE

Operation under compressive forces offers direct force measurement and allows the use of arrays for vectorial force decomposition, without mechanical translation. Fig. 2 displays the outline of a z-axis load cell force sensor. Force is applied through a metal sphere. Three posts at an angle of 120° to each other, connected to an upper structure, receive the decomposed force; each post sustains 1/3 of the total applied force. For the single post configuration the relation is 1:1: the total force is applied to one post.

IV. FABRICATION

In order to know the behavior of each post of the load cell, we decided to fabricate a single post device. The structures were fabricated with two different sacrificial materials: high purity carbon black tape of 200 µm thickness (TCS-CARB-1) and Setter Powder Sheet (SPS) with a thickness of 127 µm, both from Harmonics, Inc. With the latter, using two sheets of 127 µm thickness, we obtained the best aspect ratio and shape in the cavities and resistor post.

[Figure: layer stack L0 to L8 with cavity and applied force]


The post has a diameter of 950 µm and a height of 200 µm. The pouring method proved troublesome, and it requires several paste drying steps. We therefore decided to prepare the piezoresistive material from thick-film paste by removing the organic content of the paste with acetone in an ultrasonic bath, followed by heating up to 250 °C. Finally, grinding was used to obtain a fine powder. The best method for forming the post resistor was to make a pellet from this fine powder. The pellet was inserted into layers L1-L6 before sintering. The rest of the manufacturing followed the standard process flow of the 951 LTCC system. Thick-film 6146 paste was used for the electrical contacts; both paste and ceramics are from DuPont.

The characterization system for the fabricated single post device is composed of a loading structure for applying static forces to the sensor, signal conditioning electronic circuitry, data acquisition and a PC workstation for data analysis, as shown in Fig. 4.


Single post structures with Ro = 1.3 kΩ were fabricated. The temperature behavior of free posts (no mechanical load) showed good stability, with variations of no more than 1% of Ro. The single post structure was loaded from 0 to 15 N in INSTRON force calibration equipment; displacement was monitored using a laser displacement meter, and resistance was measured using a Keithley 2000. Loading results for the single post force sensor are presented in Fig. 5; at this time we measured %ΔR/R and ΔL.

These results correspond to two fabricated sensors. The variation of Δl is very similar for both sensors tested; one of them is presented in Fig. 6.

V. CONCLUSION

LTCC technology proved able to produce a z-axis compressive force sensor suitable for load cell applications. The fabricated sensors present large buried cavities without sagging, a buried resistive post with good aspect ratio and shape, good electrical contact after force has been applied several times, a repeatable fabrication process and low drift with temperature. Further measurements for sensor characterization at higher and lower loads, and of the temperature behavior, are in progress.

REFERENCES

[1] M. R. Gongora-Rubio, P. Espinoza-Vallejos, L. Sola-Laguna and J. J. Santiago-Aviles; Overview of Low Temperature Co-Fired Ceramics Tape Technology for Meso-System Technology; Sensors & Actuators A, Physical; v. 89, 2001, pp. 222-241.
[2] L.J. Golonka, K.-J. Wolter, A. Dziedzic, J. Kita & L. Rebenklau; Embedded passive components for MCM; 24th International Spring Seminar on Electronics Technology, Calimanesti-Caciulata, Romania, May 5-9, 2001, pp. 73-77.
[3] A. Dziedzic, L.J. Golonka, J. Kita, H Thust, K-H Drue, R Bauer, L. Rebenklau & K-J Wolter; Electric and stability properties and ultrasonic microscope characterization of Low temperature co-fired ceramics resistors; Microelectronics Reliability 41, pp. 669-676, 2001.
[4] Peterson K.A., Rohde S. D. Walker C.A., Patel K.D., Tuner T.S. and Norquist C.D., Microsystem integration with new techniques in LTCC, Proc. of Ceram. Intercon. Tech. Conf., IMAPS, Denver (2004) 19-26.
[5] Peterson K. A., Patel K. D., Ho C. K., Rohde S. B., Nordquist C. D., Walker C. A., Wroblewski B. D. and Okandan M.; Novel Microsystem Applications with New Techniques in Low-Temperature Co-Fired Ceramics; International Journal of Applied Ceramic Technology, 2 [5] 345-363 (2005).
[6] M. Prudenziati (Ed.) Handbook of sensors and actuators 1: Thick film sensors, Elsevier 1994.
[7] White N. M. and Brignell J. E.; A Planar Thick-film Load Cell; Sensors and Actuators A, 25-27 (1991) 313-319.


G. San Martín*, P. Julián* and P. Mandolesi*

* Instituto de Investigaciones en Ingeniería Eléctrica IIIE (UNS-CONICET), Departamento de Ingeniería Eléctrica y de Computadoras, Universidad Nacional del Sur, Avda. Alem 1253 (8000) Bahía Blanca. E-mail: {gsanmartin,pjulian,pmandolesi}@uns.edu.ar

Abstract: This paper describes the front-end implementation of an RFID integrated circuit (IC) for a frequency of 134.2 kHz. The front-end includes a rectifier circuit and over-voltage and ESD protection circuits. Experimental results are presented. The IC was fabricated in the AMI 0.5 µm process through MOSIS.

I. INTRODUCTION

An RFID (Radio Frequency Identification) IC consists of a set of electronic circuits that permit information exchange between a reader device and a device that carries data. RFID systems are very useful for the automatic detection and identification of people, objects and animals [1]. The principle of operation and the standards for RFID systems for animal identification have been presented in [2] and [3]. Basically, a resonant circuit formed by a coil and a capacitor is used for information exchange using radio frequency. The coil is excited by the activation electromagnetic field emitted by the reader. This field induces a voltage in the coil, which is used to power the IC circuits. The coil also acts as a transmission antenna when the device responds with its identification code.

The coil and the capacitor are connected to two pads on the IC. An ESD protection circuit is used to protect the internal circuits against over-voltage peaks (the gate oxide breakdown voltage in CMOS technology gets lower and lower as technology scales down). Another protection circuit is necessary to limit the induced voltage to levels permitted by the technology: when the IC is close to the reader, the induced voltage can reach several hundred volts [4][5][6]. Inside the IC, a circuit that converts the incoming RF signal provides the DC voltage for all the digital circuits. The architecture, as well as a description of every circuit, has been presented in [3] and [7].

An RFID IC front-end should thus generate a DC supply voltage from the RF signal while also recovering clock and data signals from it. The front-end must be sufficiently reliable and robust to bear operating conditions that would damage the rest of the IC. The tests presented here consist of verifying the rectifier circuit under different load conditions and the effectiveness of the turn-on of the ESD protection circuit.

The paper is organized as follows. Section II describes the circuits of the IC; Section III presents the experimental results; finally, the conclusions are given in Section IV.

II. CIRCUIT BLOCKS

A. Rectifier Bridge

The circuits most often used in inductive coupling are analyzed and compared in [8]. The efficiency of a rectifier circuit is in general reduced by the threshold voltage drop of the diode-connected MOSFETs, the substrate leakage and the parasitic channel resistance. The main advantage of the gate-cross bridge, shown in Fig. 1, is its smaller turn-on voltage and thus its higher output voltage level. It is more efficient, from a power consumption viewpoint, for low input levels. The implemented circuit uses two NMOS transistors N1 and N3, and two other transistors N2 and N4 connected as diodes. Because the voltages of ant1 and ant2 are always high and low, respectively, the gates of N1 and N2 are cross connected and work as a switch.

B. Over Voltage Protection

The network shown in Fig. 2, formed by three PMOS transistors connected as diodes, is part of the over-voltage protection circuits. The VA1 node is connected to Vrec. If Vrec is lower than 3VTP, VTP being the threshold of the PMOS transistors, the circuit does not affect the normal behavior of the rectifier. But if Vrec is high enough to turn on the diodes, current begins to flow through R1. When the voltage drop across R1 reaches the threshold of N1 and N2, these transistors turn on, causing a current flow from ant1 to ant2. As a consequence, the input voltage is reduced due to the change in the equivalent impedance. The VA terminal is a test point.
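The two conditions described for the over-voltage clamp can be put into a short numerical sketch: the diode chain starts to conduct when Vrec reaches three PMOS thresholds, and the shunt transistors turn on once the current through R1 produces a drop equal to their threshold. R1 = 200 kΩ is the value listed in Table I; the threshold voltages used below (0.9 V and 0.7 V) are hypothetical placeholders, since the paper does not state them.

```python
def clamp_onset(v_tp, v_tn, r1=200e3):
    """Return (Vrec at which the diode chain conducts, current needed in R1).

    v_tp: PMOS threshold (sign ignored), v_tn: threshold of the shunt NMOS
    devices. Both are assumed values supplied by the caller.
    """
    v_rec_on = 3.0 * abs(v_tp)     # three diode-connected PMOS in series
    i_r1_on = v_tn / r1            # drop across R1 equals the NMOS threshold
    return v_rec_on, i_r1_on

v_on, i_on = clamp_onset(v_tp=-0.9, v_tn=0.7)
print(f"diode chain conducts above about {v_on:.1f} V")
print(f"shunt devices turn on at about {i_on * 1e6:.1f} uA through R1")
```

With such a large R1, only a few microamperes through the diode chain are enough to start shunting current between the antenna pads.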


Fig. 1. Rectifier bridge

C. ESD Protection

A protection circuit against ESD is placed between the IC inputs, providing a safe low-impedance path between ant1 and ant2. The architecture of this protection circuit is based on a circuit proposed by Facen and Boni [9], modified to fulfill the requirements of our particular system. The circuit is shown in Fig. 2.

This circuit has the gates of transistors N1 and N2 tied to ground. Therefore the structure formed by N1-N2-P4-P5 is equivalent to a full-wave bridge rectifier, and allows the ground and Vrec potentials to track fast impulsive voltage peaks. The over-voltage peak on Vrec caused by an ESD event is coupled to the gate of N3 through capacitor C, turning transistor N3 on and allowing a large current to flow through N1-N3-P4 or N2-N3-P5. Once the peak has passed, resistor R2 turns N3 off. Although the VA1 and Vrec nodes are usually connected in this type of configuration, they are left unconnected here for testing reasons.

The transistors of the ESD protection circuit follow the recommended technology rules. These rules specify the minimum transistor width, the optimum contact-to-gate spacing and the implementation of transistors with a multiple-finger structure. The resistor R2 and the capacitor C are dimensioned using the HBM (Human Body Model) [10]. The R2-C network is activated by any over-voltage peak. The devices involved in discharging the current transient satisfy the withstand voltage requirement of 1.5 kV HBM. The IC is therefore compatible with the standard production requirement, typically 1 to 2 kV HBM.

III. EXPERIMENTAL RESULTS

Measurements were made on unbonded dies of the IC, using a probing station built in the lab with Rucker & Kolls micromanipulators. The signals were generated using an Agilent 3220B signal generator. Measurements were taken with a Tektronix TDS 3052 oscilloscope. Figure 3 shows a photograph of the measurement setup. Fig. 4 shows a microphotograph of the IC, including the probing points with the names referenced in the text. Table I shows the sizes of the devices used.

ISBN 978-987-655-003-1 EAMTA 2008

A. ESD Protection

To verify the turn-on of the ESD protection network, an 8 V pulse with a width of 200 ns is applied between pads ant1 and ant2. The pulse has a rise time of 5 ns, similar to the rise time of an HBM ESD pulse (typically 2-15 ns [10][11]). This pulse does not cause drain breakdown of N1 in the ESD protection circuit. The generator has a limited output current capability, so the voltage pulse is degraded if the protection circuit turns on, because the generator cannot deliver enough current to sustain the voltage on the pads [11]. A negative pulse was also applied. Figures 5 and 6 show an obvious degradation of the input pulse.
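For context on the event that the test pulse imitates, the sketch below evaluates the usual first-order Human Body Model network, a 100 pF capacitor discharged through 1.5 kΩ, at the 1.5 kV level mentioned earlier. It idealizes the device under test as a short circuit and ignores the finite rise time, so it is only an order-of-magnitude illustration and not the authors' test procedure.

```python
import math

def hbm_current(t, v_charge=1500.0, r=1500.0, c=100e-12):
    """Ideal HBM discharge current in amperes at time t (seconds).

    Models a capacitor c charged to v_charge and discharged through r into
    a short circuit; the rise time of a real HBM pulse is ignored.
    """
    return (v_charge / r) * math.exp(-t / (r * c))

tau = 1500.0 * 100e-12   # discharge time constant, 150 ns
print(f"peak current = {hbm_current(0.0):.2f} A, time constant = {tau * 1e9:.0f} ns")
for t_ns in (50, 150, 450):
    print(f"t = {t_ns:3d} ns  ->  {hbm_current(t_ns * 1e-9):.3f} A")
```

A bench signal generator cannot supply anything close to such a current, which is consistent with the pulse degradation used above as evidence that the protection has turned on.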


Fig. 5. The degraded voltage waveform when the 8 Volt voltage pulse is applied.

Fig. 4. IC microphotograph

TABLE I.

Device                                   Size
Rectifier Bridge Transistors             120 µm / 4.8 µm
ESD Protection Transistors               840 µm / 0.9 µm
Over-Voltage Protection Transistors      42.1 µm / 7.5 µm
Capacitor, CS                            10 pF
Capacitor, C                             27 pF
Resistor, R1                             200 kΩ
Resistor, R2                             12 kΩ

To verify the performance of the R2-C network, a voltage pulse is applied between pads ant1 and ant2. The pulse has a height of 3.3 V, a width of 20 µs and a rise time of 5 ns. Figure 7 shows the waveforms obtained with the oscilloscope at the probing points ant1-GND and Vrec-GND.
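The time scale of the R2-C network can be estimated from the values listed in Table I (R2 = 12 kΩ, C = 27 pF). The sketch below assumes a simple first-order model in which a fast edge on Vrec is fully coupled to the gate of N3 and then decays through R2; the gate capacitance of N3 and any other loading are ignored, which is a simplification rather than something stated in the paper.

```python
import math

R2 = 12e3      # ohms, from Table I
C = 27e-12     # farads, from Table I
TAU = R2 * C   # first-order time constant of the trigger network

def gate_fraction(t):
    """Fraction of the coupled step remaining on the gate after t seconds."""
    return math.exp(-t / TAU)

print(f"tau = {TAU * 1e9:.0f} ns")
for t_ns in (5, 200, 1000):
    print(f"t = {t_ns:4d} ns  ->  {gate_fraction(t_ns * 1e-9) * 100:5.1f} % of the step")
```

Under these assumptions the network responds to nanosecond edges but has essentially recovered well before the end of a 20 µs pulse.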

Fig. 6. The degraded voltage waveform when the - 8 Volt voltage pulse is applied.

B. Rectifier Bridge

The rectifier bridge was measured under different load conditions: I1 = 1 µA, I2 = 3 µA and I3 = 10 µA. The input voltage is set to an amplitude of 3.3 V and a frequency of 134.2 kHz. Figures 8-10 show the waveforms obtained with the oscilloscope at the probing points ant1-GND, ant2-GND and VCAP-GND. Table II summarizes the values of maximum, average and ripple voltage.

Load current    Vavg (V)    Vmax (V)    Vripp (V)
1 µA            2.25        2.62        0.65
5 µA            2.15        2.809       1.32
10 µA           1.83        2.19        0.995


IV. CONCLUSION

Experimental results of the implementation of the front-end of an RFID IC have been shown. The turn-on of the ESD protection circuit has been verified; it can therefore be used as a protection circuit for the rest of the analog and digital circuits of the RFID IC. A full-wave rectifier implementation has also been verified.

ACKNOWLEDGMENT

This paper was partially funded by the research projects: PICT 2006 No. 1835, Agencia Nacional de Promoción Científica y Tecnológica (ANPCyT); PGI-UNS 2006 No. 24/ZK17 and PGI-UNS 2006 No. 24/ZK17.

REFERENCES

[1] W. J. Eradus and M. B. Jansen, "Animal identification and monitoring," Computers and Electronics in Agriculture, 24, no. 5, pp. 91-98, 1999.
[2] K. Finkenzeller, RFID Handbook: Fundamentals and Applications in Contactless Smart Cards and Identification. New York: John Wiley & Sons, 2003.
[3] G. San Martín, P. Julián, and P. Mandolesi, "Front-end de un chip rfid para identificacion de animales," XII Reunión de Trabajo en Procesamiento de la Información y Control, no. 360, October 2007.
[4] U. Kaiser and W. Steinhagen, "A low power transponder ic for high-performance identification systems," IEEE Journal of Solid-State Circuits, vol. 3, no. 3, pp. 306-310, March 1995.
[5] Y. Li and J. Liu, "A 13.56mhz rfid transponder front-end with merged load modulation and voltage doubler-clamping rectifier circuits," IEEE International Symposium on Circuits and Systems, vol. 5, pp. 5095-5098, 2005.
[6] G. K. Balachandran and R. E. Barnett, "A 110na voltage regulator system with dynamic bandwidth boosting for rfid systems," IEEE Journal of Solid-State Circuits, vol. 41, no. 9, pp. 2019-2028, September 2006.
[7] G. San Martín, P. Julián, and P. Mandolesi, "Front-end para un rfid en 0.35um," Segunda Escuela Argentina de Microelectrónica, Tecnología y Aplicaciones, pp. 37-39, September 2007.
[8] Z. Zhu, Z. B. Jamali, and P. H. Cole, "Brief Comparison of Different Rectifier Structures for RFID Transponders," http://www.mlab.ch/autoid/SwissReWorkshop/papers/ /BriefComparisonOfRectifierStructuresForRFIDtransponders.pdf, available online.
[9] A. Facen and A. Boni, "A cmos analog frontend for a passive uhf rfid tag," Proceedings of the 2006 International Symposium on Low Power Electronics and Design, pp. 280-285, Oct. 2006.
[10] S. G. Beebe, "Methodology for layout design and optimization of esd protection transistors," Proceedings Electrical Overstress/Electrostatic Discharge Symposium, pp. 265-275, Sept. 1996.
[11] M.-D. Ker, "Layout design to minimize voltage-dependent variation input capacitance of an analog esd protection circuit," Journal of Electrostatics, vol. 54, pp. 87-93, 2002.

Fig. 8. Rectifier voltage for a load of 1 µA.


CNN Digital Pixel Processor Cells for Automated Design: Experimental results

M. Di Federico, P. Julián, P. S. Mandolesi
Instituto de Investigaciones en Ingeniería Eléctrica - IIIE (UNS-CONICET)
Departamento de Ingeniería Eléctrica y de Computadoras
Universidad Nacional del Sur
Avda. Alem 1253 (8000) Bahía Blanca

Abstract: This paper describes the design of two different isolated cells of a simplicial Cellular Nonlinear Network (S-CNN) digital pixel processor and presents experimental results for both. The cells were integrated in an n-well non-silicided 0.35 µm process.

I. INTRODUCTION

A Cellular Nonlinear Network (CNN) is a parallel computational structure consisting of an array of interconnected cells. The computation of the next state of the array is based on the processing capabilities available at each cell, in this case a dynamical system with a local state, inputs and outputs. Several CNN realizations have been reported in the literature [2], [3], but the operation of most of them is based on the standard CNN [4]. The state difference / differential equation of the S-CNN uses a simplicial piecewise linear function to calculate the next state. This algorithm was originally proposed in [1]. The cell is intended to be used as a standard cell in an automated design flow for digital pixel processors. The cells tested here are an improved version of the architecture proposed in [5] and [6]. The new circuit is all static CMOS, whereas the architecture in [6] uses pre-charged logic, and it needs fewer control signals to perform the operations. The different blocks were integrated in an n-well non-silicided 0.35 µm TSMC CMOS process through the service provided by MOSIS.

II.

As previously indicated, each cell has a state, inputs and outputs. The inputs to a cell are the converted values of the light intensity at the cell and at its neighboring pixels (in this particular case, the sphere of influence of the cell contains five cells: up, down, left, right and the cell itself). The state consists of registers that can store previous values of the inputs, or other arbitrary values. The output is the information that each cell sends to its neighbors. Based on all these variables, a state equation produces the time evolution of the cell, using a piecewise linear function. The value of the cell state is encoded as a 5-bit digital word and is stored in a register. The input is also coded as a 5-bit digital word.

The next state of each cell is calculated according to the algorithm described in [5], which is briefly summarized next. The input ui and the state xi are compared to a 5-bit digital ramp during a cycle of 32 steps called the program cycle (PC). The results are two 1-bit signals (Upwm, Xpwm) that carry the cell input and state, respectively, coded in time. These signals and the equivalent signals from the four neighbors are arranged to form 5-bit time-coded words (uf, xf). These words are used to address the memory and retrieve the parameter values, which are the values of the logic functions to be implemented by the S-CNN, G(uf) and F(xf), respectively. As the memory is not in the cell, for every single step of the PC a digital ramp and the values of the memory are distributed to all cells through the bus. Each cell latches the function value when the correct address is on the bus. This internal cycle is called the memory evaluation cycle (MEC). At every step of the PC, the two values retrieved from the memory (one bit for G(uf) and another for F(xf)) are combined by a selectable logical function (AND, OR, XOR), and the result is integrated by a counter, which updates the state at the end of the PC.

Each cell has the necessary circuits to:
- Store the state and the input value.
- Compare the state with the digital ramp (program cycle).
- Obtain the PWM signal to send to the neighborhood.
- Compare the inputs with the digital ramp (evaluation ramp).
- Obtain from the broadcast the correct value of the function.
- Integrate the function obtained from the memory.
- Compute the new state based on the current state, inputs and outputs.
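The update algorithm summarized above can be illustrated with a short behavioral sketch in Python. It is not the circuit described in [5]: the function names are hypothetical, the PWM polarity (bit high while the ramp is below the value), the bit order of the 5-bit words (own cell as MSB) and the wrap-around of the 5-bit count are all assumptions, and a direct table lookup stands in for the broadcast-and-latch memory evaluation cycle.

```python
from typing import Callable, Sequence

RAMP_STEPS = 32  # 5-bit digital ramp: one program cycle (PC) has 32 steps


def pwm_bit(value: int, ramp: int) -> int:
    """Time-coded bit; assumed to be 1 while the ramp is below the value."""
    return 1 if ramp < value else 0


def pack_word(bits: Sequence[int]) -> int:
    """Pack five 1-bit signals into a 5-bit address (first bit is the MSB)."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


def next_state(
    inputs: Sequence[int],    # u of the cell and of its four neighbors
    states: Sequence[int],    # x of the cell and of its four neighbors
    g_table: Sequence[int],   # 32 one-bit values of G(uf)
    f_table: Sequence[int],   # 32 one-bit values of F(xf)
    op: Callable[[int, int], int],  # AND, OR or XOR on single bits
) -> int:
    count = 0
    for ramp in range(RAMP_STEPS):
        uf = pack_word([pwm_bit(u, ramp) for u in inputs])
        xf = pack_word([pwm_bit(x, ramp) for x in states])
        # A table lookup replaces the MEC, in which the cell latches the
        # broadcast value when the bus address equals uf (or xf).
        count += op(g_table[uf], f_table[xf])
    return count % RAMP_STEPS  # assumed wrap-around of the 5-bit register


# Example: G is 1 only when the cell's own U_PWM bit (the MSB) is set.
g_example = [1 if addr >= 16 else 0 for addr in range(32)]
f_example = [1] * 32
and_result = next_state([15, 0, 0, 0, 0], [20, 0, 0, 0, 0],
                        g_example, f_example, lambda g, f: g & f)
```

With these example tables the AND result simply counts the steps in which the input PWM bit is high, so the integrated value reproduces the input.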

A. Static Cell Architecture

The static cell uses static CMOS logic to perform all the operations needed to calculate the evolution of the cell. The design of the cell is shown in Fig. 1.

The input and the cell state are stored in single-clock D-type flip-flops with clear. The comparator that compares the input and state values with the digital ramp to generate the Upwm and Xpwm signals is a modular comparator. The comparator used for the uf and xf signals is a regular equality comparator. The logical function between G(uf) and F(xf) is executed by the FoG block, which selects between an OR, AND or XOR function. A modular counter is used to integrate the value of the FoG signal.

B. Dynamic Cell Architecture

The dynamic cell uses precharge-and-evaluate logic to perform all the operations needed to calculate the evolution of the cell. For both program and memory evaluation, the internal nodes are precharged during one clock cycle and the logic is evaluated during the next. If the resulting logic function is 0, the node is discharged. To perform this operation, both the signal and its complement are needed. All the required signals are stored in SR-type flip-flops with clock enable. The comparisons of the input and the state against the digital ramp, which generate the Upwm and Xpwm signals, are done by precharging the parUPWM and parXPWM nodes and discharging them through a partial XOR gate implemented bitwise. The output node is first precharged; afterwards the data is evaluated, and the node is discharged if there is a difference between the inputs. The circuit is shown in Fig. 2.

Figure 3 shows the partial precharged node, which is charged before the comparison takes place, and the SR flip-flop where the UPWM and XPWM signals are stored.

The comparator used for the uf and xf signals is the same as the comparator described above. The FoG block computes the selected logical function between G(uf) and F(xf).

The counter has been implemented indirectly in the following way. At the end of the PC, if the count (stored in a register) needs to be increased by one, a flag is set. In the next MEC, an equality comparator detects when the ramp is equal to the current count, and the register then stores the next value of the ramp, which increases the register value by one. This eliminates the need for a dedicated counter and can be done very efficiently using an equality comparator.

The layout of both cells is shown in Fig. 4; the static cell is on the right and the dynamic cell on the left. The dynamic cell is more complex to design but results in a very small circuit: its area is approximately 8,000 µm², whereas the static cell occupies 18,000 µm². A photograph of the chip is shown in Fig. 5.
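The indirect increment used in the dynamic cell (an equality comparator plus a capture of the next ramp value) can be sketched behaviorally as follows. This is only an illustration: the function name is hypothetical, the ramp is assumed to count upward from 0, and the behavior at the maximum count (where no next ramp value exists) is not described in the text, so the sketch simply leaves the register unchanged in that case.

```python
def increment_with_ramp(count: int, bits: int = 5) -> int:
    """Increase a register by one using only an equality comparison
    against an upward-counting ramp (assumed to start at 0)."""
    register = count
    capture_next = False  # set when the ramp has matched the register
    for ramp in range(1 << bits):
        if capture_next:
            register = ramp  # store the ramp value that follows the match
            break
        if ramp == register:
            capture_next = True
    return register


incremented = increment_with_ramp(7)
```

The register never performs an addition: it only reloads from the bus, which is why no dedicated counter hardware is required.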

There is only one output pin to read all the internal nodes. A multiplexer selects the signal to be read; its selection input is driven by a counter clocked by the ClkO input. The value of the cell counter, as well as the value of the cell state, can be read from the I/O bus. To store data in the SR flip-flops, the complementary value of the bus is needed; the BusComplement signal generates it. The neighbor values are loaded through shift registers using the InSReg and ClkNei signals. The function and logic-operation values are stored in another shift register using the ClkFG clock. A decoder is used to reset the UPWM, FPWM, F, G and State signals and to latch the state and input values.

III. TEST RESULTS

This section describes the experimental results of the CNN Cell. The experimental setup and the different data obtained from the testing of both the static and the dynamic circuits are described.

A. Interface

Both cells were connected using the same set of signals. The Din/CMOS signal selects the cell to be tested and cuts off the input signals to the unselected block. Twenty (20) signals were used to test both circuits, as listed next:
- I/O Bus(4..0): input and output bus
- Bus Dir: bus direction
- BusComplement: complements the bus for the dynamic logic
- ClkE, ClkP, ClkR, ClkNei, ClkFG, ClkO: clocks
- PCb: precharge for the dynamic logic
- InSReg: shift register input
- Deco(2..0): decoder input value
- Din/CMOS: dynamic / CMOS cell selection
- Out: node values output

The experimental setup consists of a Digilent Spartan-3 development kit (Starter Board) connected to a PC via RS232. The FPGA contains a synthesized microcontroller (PicoBlaze) that exchanges information with the PC and generates all the control and data signals necessary to test the circuit. A board was specially designed to hold the chip and connect it to the FPGA. A MATLAB program was written to generate all the commands and read the output signals. A set of commands was implemented in the microcontroller in order to perform the tests. Each test is executed by the microcontroller when the corresponding parameter is received via RS232. Before starting the cell tests, the test bench was verified: first the correct behavior of the microcontroller and the generation of signals from the FPGA were checked, then the static power consumption was verified, and finally all data registers and shift registers were tested.


B. Cells Testing

In both tests shown here, the data in the registers is the same. The values of the vectors uf and xf are stored in the shift register:
uf = 00101b (5d)
xf = 01010b (10d)
The cell state has a decimal value of 20 and the input a value of 15:
xi = 10100b (20d)
ui = 01111b (15d)
Once all the registers are loaded with this data and the signals are reset, an incremental digital ramp is generated and all the signals are read on each ramp cycle.
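As a rough guide to what these test vectors should produce, the following Python sketch steps a 5-bit ramp and records idealized signal values. It is a behavioral approximation, not a model of either circuit: the PWM polarity (high while the ramp is below the stored value) and the signal naming in the trace are assumptions.

```python
def ramp_trace(ui: int, xi: int, uf: int, xf: int, bits: int = 5):
    """Idealized per-step signal values for an incremental digital ramp."""
    trace = []
    for ramp in range(1 << bits):
        trace.append({
            "ramp": ramp,
            "U_PWM": int(ramp < ui),   # assumed polarity
            "X_PWM": int(ramp < xi),   # assumed polarity
            "GetGu": int(ramp == uf),  # bus equals the stored uf word
            "GetFx": int(ramp == xf),  # bus equals the stored xf word
        })
    return trace


# Test vectors quoted in the text.
trace = ramp_trace(ui=0b01111, xi=0b10100, uf=0b00101, xf=0b01010)
u_high_steps = sum(step["U_PWM"] for step in trace)
x_high_steps = sum(step["X_PWM"] for step in trace)
gu_step = next(s["ramp"] for s in trace if s["GetGu"])
fx_step = next(s["ramp"] for s in trace if s["GetFx"])
```

Under these assumptions U_PWM stays high for 15 ramp steps and X_PWM for 20, while GetGu and GetFx pulse once each, at ramp values 5 and 10 respectively.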

1) Static Cell Test Results

The static cell test results are shown in Fig. 6. The U_PWM and X_PWM signals are the result of comparing the input value and the state value with the digital ramp. GetFx and GetGu are the signals used to fetch the values of the functions F(xf) and G(uf) during the MEC. These signals must be 1 when the bus is equal to the word formed by the outputs of the neighbors and of the cell itself. The U_PWM and X_PWM signals are generated during the PC, whereas G(uf) and F(xf) are obtained from the memory during the MEC. During the PC the GetGu and GetFx signals are generated but no value is fetched from the memory, and during the MEC the U_PWM and X_PWM signals are not latched. This test shows all the signals in the same cycle. The FoG signal is the output of the operation between Fx and Gu, and it is the input to the counter. In this case the FoG output is 1 and the counter is counting.

2) Dynamic Cell Test Results

The dynamic cell test results are shown in Fig. 7. All the nodes whose names start with the letter p are precharged nodes. pFX is the result of the comparison between the bus and the neighbor values. This signal is used to retrieve from the memory the values G(uf) and F(xf). The outputs pUPWM, pXPWM, UPWM and XPWM are the signals shown in Fig. 3. The UPWM value is obtained by using pUPWM as the Set input of the SR flip-flop.

IV. CONCLUSION

Two different cell designs for an S-CNN digital pixel processor were designed and tested. These cells will be used in the future in an automated design flow for a digital pixel processor. As was shown, the static cell is more than twice the size of the dynamic cell, although the latter is much more complex in terms of the control signals that it requires.

V. ACKNOWLEDGMENTS

This work was partially funded by project ANPCyT PICT 2006 No. 1835 and PGI-UNS 2006 No. 24/ZK17.

VI. REFERENCES

[1] P. Julián, R. Dogaru, and L. O. Chua, "A piecewise-linear simplicial coupling cell for CNN gray-level image processing," IEEE Trans. Circuits Syst. I, vol. 49, no. 7, pp. 904-913, 2002.
[2] J. Cruz and L. O. Chua, "A 16 × 16 cellular neural network universal chip: the first complete single-chip dynamic computer array with distributed memory and with gray-scale input-output," Analog Integr. Circuits Signal Process., vol. 15, no. 3, pp. 227-238, 1998.
[3] A. Rodríguez-Vázquez, G. Liñán-Cembrano, L. Carranza, E. Roca-Moreno, R. Carmona-Galán, F. Jiménez-Garrido, R. Domínguez-Castro, and S. E. Meana, "ACE16k: the third generation of mixed-signal SIMD-CNN ACE chips toward VSoCs," IEEE Trans. Circuits Syst. I, vol. 51, no. 5, pp. 851-863, 2004.
[4] L. O. Chua, "CNN: a vision of complexity," Int. J. Bifurcation Chaos Appl., vol. 7, no. 10, pp. 2219-2425, 1997.
[5] P. S. Mandolesi, P. Julián, and A. G. Andreou, "A scalable and programmable simplicial CNN digital pixel processor architecture," IEEE Trans. Circuits Syst. I, vol. 51, no. 5, pp. 988-996, 2004.
[6] M. Di Federico, P. S. Mandolesi, P. Julián, and A. G. Andreou, "Experimental results of a simplicial CNN digital pixel processor," Electronics Letters, vol. 44, no. 1, pp. 27-29, 2008.


Marcelo J. Bruno

Departamento de Ingenieria Electrica y Computadoras Universidad Nacional del Sur Bahia Blanca, Av. Alem 1253 Tel. 0291-4595100-int 3316 Email: mbruno@criba.edu.ar

Lucas Citta

Departamento de Ingenieria Electrica y Computadoras Universidad Nacional del Sur Bahia Blanca, Av. Alem 1253

Juan E. Cousseau

IIIE - Conicet, Departamento de Ingenieria Electrica y Computadoras Universidad Nacional del Sur Tel. 0291-4595100-int 3313 Bahia Blanca, Av. Alem 1253 Email: jcousseau@uns.edu.ar

Abstract: This paper presents a two-stage design of a 6 Watt RF linear power amplifier following the WiMAX radio interface specification. The main design goal is linearity rather than efficiency; for this reason the design must be complemented with a predistortion scheme. The main nonlinear impairment, i.e., the 3rd-order intermodulation distortion product, is below -32 dBc, with a minimum ripple of 0.5 dB over a bandwidth of 150 MHz centered at 2.4 GHz. Due to the specified bandwidth, the DC decoupling filters and the input-output matching networks are implemented mainly with microstrip components in both stages. A total gain of 27 dB was measured, which allows this amplifier to be connected directly to low-power baseband signal processors.

I. INTRODUCTION

Due to its high power operating regime, closely related to the nonlinear impairments that need to be reduced, the radio frequency power amplifier (RFPA) plays a key role in the actual signal quality of broadband telecommunications systems [1], [2], [3]. Thus, it should come as no surprise that the RFPA has become the subject of many new studies intended to understand its limitations and then optimize its performance. Moreover, not only design issues are of major concern but also compensation techniques and circuits. Some of these compensation techniques, like predistortion (digital or analog) [4], Cartesian feedback [5] or Envelope Elimination and Restoration (EER, or Kahn method [6]), are currently investigated and commercialized. For example, the old EER technique, dating from early AM transmitters, is now used in commercial WiMAX equipment [7].

In this context, broadband wireless communications systems are continuously pushing the design requirements for RFPAs and compensation circuits. The main topics involved in RFPA design are linearity, (power) efficiency and bandwidth [4], [8], [9], although the so-called bandwidth-dependent effects or memory effects are gaining attention [10]. It is well known that these requirements are opposing design goals; for this reason the modern approach is to design the RFPA together with some linearization configuration. This work uses the predistortion approach [11], which means that the design goal is linearity. Then, only Class A or AB amplifiers can be taken into account. For current and future investigation purposes we chose a Class A amplifier. In this manner, high efficiency is obtained with additional design steps, mainly by adding an external predistortion circuit.

The rest of this paper is organized as follows. Section II briefly describes some topics of highly linear RFPA theory and modeling, and introduces the figures of merit required for test and evaluation. Section III describes the design of each amplifier stage and of the decoupling filters. Section IV presents simulation and measurement results. Finally, Section V discusses the results and draws conclusions.

II. PA MODELING ASPECTS

The nonlinear operation of the RFPA must be avoided or minimized in some way. To deal with these nonlinear phenomena we need to understand the nonlinear behavior of the RFPA. Thus, a model of this behavior is needed, and here we briefly introduce a simple one. The model selected for our analysis is based on a power series and is also known as the polynomial model [12], [13]. It provides the figures of merit needed to evaluate the nonlinear performance of the RFPA. The polynomial model describes the behavior of the RFPA caused mostly by the nonlinear nature of the active devices (bipolar or FET transistors) used to build the amplifier. Its origins are mainly related to the input-output relation of the active device. For example, in a FET-based amplifier the most important nonlinearity is the relation between the gate voltage (input) and the drain current (output), although a nonlinear relationship between drain current (output) and drain voltage (output) can also be verified [14], [15]. The power series or polynomial representation [12], [13] relates the input ($v_i$) and output ($v_o$) signals in the following way:

$$v_o = \sum_{n} a_n v_i^{n} \tag{1}$$
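The polynomial model of Eq. (1) is easy to exercise numerically. The sketch below applies a two-tone input to a third-order polynomial and extracts the amplitudes at one fundamental and at the 2ω1 − ω2 intermodulation frequency with a single-bin DFT. The function names and the coefficient values (a1 = 10, a2 = 0, a3 = −0.4) are arbitrary illustrations, not parameters of the amplifier described in this paper; the tones are placed on integer DFT bins so that no windowing is needed.

```python
import math


def polynomial_pa(v_in: float, coeffs) -> float:
    """Eq. (1): v_o = sum_n a_n * v_i**n, with coeffs = [a_1, a_2, ...]."""
    return sum(a * v_in ** n for n, a in enumerate(coeffs, start=1))


def tone_amplitude(samples, k: int) -> float:
    """Amplitude of the sinusoidal component at DFT bin k."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def two_tone_test(coeffs, k1: int = 10, k2: int = 11, n: int = 256):
    """Return (fundamental amplitude, IMD3 amplitude at 2*k1 - k2)."""
    v_in = [math.cos(2 * math.pi * k1 * i / n) + math.cos(2 * math.pi * k2 * i / n)
            for i in range(n)]
    v_out = [polynomial_pa(v, coeffs) for v in v_in]
    return tone_amplitude(v_out, k1), tone_amplitude(v_out, 2 * k1 - k2)


# Illustrative coefficients only (negative a3 gives gain compression).
fund, imd3 = two_tone_test([10.0, 0.0, -0.4])
imd3_dbc = 20.0 * math.log10(imd3 / fund)
```

For unit-amplitude tones the intermodulation amplitude equals (3/4)|a3| and the fundamental is reduced to a1 + (9/4)a3, so a negative a3 both compresses the gain and sets the IMD3 level.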


Keeping in mind the design of a highly linear RFPA, the usefulness of the model stands out when harmonic signals are applied to its input. To see this, consider two different input signals. First, a sinusoidal voltage is applied to the input of the model:

$$v_i(t) = \cos(\omega_1 t) \tag{2}$$

In this situation the output voltage is [9]:

$$v_o(t) = a_1\cos(\omega_1 t) + a_2\cos^2(\omega_1 t) + a_3\cos^3(\omega_1 t) + \dots \tag{3}$$

Second, the input voltage is the sum of two tones:

$$v_i(t) = \cos(\omega_1 t) + \cos(\omega_2 t) \tag{4}$$

and the output voltage becomes [9]:

$$
\begin{aligned}
v_o(t) ={}& a_1[\cos(\omega_1 t) + \cos(\omega_2 t)] + a_2[\cos(\omega_1 t) + \cos(\omega_2 t)]^2 \\
&+ a_3[\cos(\omega_1 t) + \cos(\omega_2 t)]^3 + a_4[\cos(\omega_1 t) + \cos(\omega_2 t)]^4 \\
&+ a_5[\cos(\omega_1 t) + \cos(\omega_2 t)]^5 + \dots
\end{aligned} \tag{5}
$$

The main observation about the outputs in either case (Eqs. (3) and (5)) is that a set of new frequencies has been created at the output of the nonlinear system. This phenomenon points to the convenience of using harmonic signals to analyze the nonlinear behavior of the RFPA. Based on this consideration, there are two straightforward measurable figures of merit that help in RFPA design and analysis: the 1 dB compression point ([16], Chapter 4) and the third-order intermodulation distortion (IMD3) ([9], Chapter 2). The next subsections briefly point out the usefulness of these metrics and their relation to the polynomial model.

A. 1 dB Compression Point

By definition, the 1 dB compression point is the input power at which the power gain is reduced by 1 dB with respect to the small-signal gain. This gain reduction is related to the odd-order coefficients of the polynomial model described by Eq. (1). Its value is obtained with a one-tone test measurement setup ([16], Chapter 4). For linear amplifiers the importance of this metric is closely related to the dynamic range of the input signal [17], [18]. Signals like those generated by OFDM modulators (our case study) present high values of Peak to Average Power Ratio (PAPR) [19]. This means that the peak signal values are far above the average signal value. For this reason, the transistor must be biased below the optimal power operating point to avoid symbol distortion (clipping) caused by saturation [4]. This new power operating point is called Output Power Back-Off (OBO) and is calculated as [20]:

$$\mathrm{OBO} = 10\log_{10}\left(\frac{P_{OUT,Peak}}{P_{OUT,Avg}}\right) \tag{6}$$

where $P_{OUT,Peak} = P_{1dB,Comp}$. From Eq. (6) we can verify that the OBO value in dB is the same as the PAPR of the signal in dB.

Another key issue related to the PAPR of the signal (and the resulting OBO) is its relation to the maximum achievable drain efficiency. For example, a Class A amplifier has a maximum theoretical efficiency of 50%, but this calculation assumes a sinusoidal input signal. If the input is not a deterministic sinusoid but a signal with some statistical behavior (described by its PAPR), the efficiency calculation changes. From the amplifier perspective, the values that are amplified most efficiently occur with the lowest probability. Then, statistically speaking, the amplifier spends most of the time amplifying the less efficient values. For this reason lower efficiencies must be expected when signals with higher PAPR are amplified. For OFDM signals the maximum theoretical efficiency is between 5% and 20%.

It is straightforward to verify the origin of the gain compression. A voltage $v_i$ like Eq. (2) is applied to the input of a nonlinear system represented by Eq. (1) (up to third order). Applying the identity $\cos^3\theta = \tfrac{3}{4}\cos\theta + \tfrac{1}{4}\cos 3\theta$ to Eq. (3) and keeping only the terms at the fundamental frequency, we obtain the following relationship between the output voltage $v_o$ and the input voltage $v_i$:

$$v_o(t) = a_1\cos(\omega_1 t) + \tfrac{3}{4}a_3\cos(\omega_1 t) \tag{7}$$

From Eq. (7) we can see that the output voltage depends on the linear gain, coefficient $a_1$, and on the nonlinear gain, coefficient $a_3$. The sign of $a_3$ must be negative for compression.

B. IMD3

By definition, third-order intermodulation distortion (IMD3) products are the harmonic voltages generated by the intermodulation of two different harmonic components at the input of a nonlinear system. This new set of frequencies results from sums and differences of the harmonics of the input frequencies. They also originate in the odd-order distortion coefficients of Eq. (1). Their value is obtained with a two-tone test measurement setup ([16], Chapter 4).

It is really important to avoid or minimize these products since they fall inside the operating band or the adjacent band, whether the system is broadband or narrowband. Looking at Fig. 1 we can observe that the in-band IMD3 products cannot be eliminated by filtering, and that the adjacent-band IMD3 products require cost-ineffective filtering for their cancellation.

Fig. 1.

For linear amplifiers the importance of this figure of merit is related to the output dynamic range of the RFPA. More precisely, it is related to the Spurious Free Dynamic Range (SFDR) ([16], Chapter 4), which is defined as the distance between the maximum output power without clipping (distortion) and the noise floor (NF) at the output of the RFPA. For maximal linearity the IMD3 value is selected to be equal to the NF.

The value of IMD3 is derived by applying an input voltage like Eq. (4) to the polynomial model of Eq. (1). After some trigonometric operations, the third-order intermodulation products are expressed (for a polynomial model up to third order) by [9]:

$$v_o(t) = \tfrac{3}{4}a_3\cos(2\omega_1 t - \omega_2 t) \tag{8}$$

and

$$v_o(t) = \tfrac{3}{4}a_3\cos(2\omega_2 t - \omega_1 t) \tag{9}$$

It must be noted that third-order distortion is only dominant for low levels of distortion (10 dB below $P_{1dB}$). At higher levels, 5th- and higher-order IMD effects can also produce sidebands at the IMD3 frequencies. Since the signals to be amplified are the outputs of OFDM modulators, and several modulators can be amplified in a multicarrier fashion [18], the assumption that IMD3 is composed only of third-order distortion remains valid because the PAPR is always higher than 10 dB. For the WiMAX standard the IMD3 must be -32 dBc or lower [21], where the carrier reference power (dBc) is the average power of the two sinusoids applied to the input in the two-tone measurement setup.

III. AMPLIFIER DESIGN

A. WiMAX Specification

The WiMAX system (IEEE 802.16-2004 standard) is a broadband outdoor wireless scheme using up to 20 MHz of bandwidth per channel. It uses M-ary quadrature amplitude modulation (up to 256-QAM) [21]. This modulation can support bit rates of $\log_2(M)$ bits/s per 1 Hz of bandwidth. In practice, the maximum achievable data rate is about 75 Mbps.

The RFPA for a WiMAX base station has stringent requirements to meet. The most important and critical specifications are:

1) Power: the average output power is not specified in the IEEE 802.16-2004 standard. For a medium-range coverage area (about 3000 feet in the 2-4 GHz spectrum band) the average output power must be greater than 30 dBm (1 Watt) [22].
2) Bandwidth: the channel bandwidth needed to achieve 75 Mbps is about 20 MHz. However, in order to build a cost-effective base station, several 20 MHz channels must be handled by the RFPA. Thus, the total bandwidth could be increased to between 100 MHz and 200 MHz.
3) Distortion: the standard specifies an Error Vector Magnitude (EVM) lower than 2.5%, or a signal to noise plus distortion ratio of -32 dBc. This forces the operating point of the amplifier to be approximately 3 dB to 6 dB below the 1 dB output compression point. Of course, this results in a low-efficiency scheme. For example, with the above distortion values, a class A amplifier achieves between 5% and 10% DC-to-RF power conversion, and a class AB amplifier about 15% to 18%.

B. DC Decoupling Filter

Basic (classic) networks for filtering the RF feedback through the DC path can be found in [14] and [23]. The function of this network is mainly to avoid self-oscillations, and it consists basically of capacitors and RF chokes (inductors). At the operating frequency of 2.3 GHz to 2.5 GHz this network can be replaced by microstrip circuits. In particular, the radial stub ([24], [25]) has excellent properties for constructing RF chokes [14]. The attenuation provided by the network increases by about 30 dB for each additional radial stub, and the notch frequency and bandwidth can be controlled easily through the stub geometry. Our design was made with the ADS simulator [26], and two iterations were required to achieve an acceptable prototype. In the second pass we optimized the design for small size. Figure 2 illustrates the attenuation values obtained with the ADS simulator.
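The numerical figures quoted in the WiMAX specification of Section III-A can be cross-checked with a few lines of arithmetic. The helper names below are hypothetical and the conversions are the standard decibel definitions, not anything specific to this design; the QAM helper assumes that M is a power of two.

```python
import math


def evm_percent_to_db(evm_percent: float) -> float:
    """Express an EVM given in percent as a level in dB relative to the carrier."""
    return 20.0 * math.log10(evm_percent / 100.0)


def dbm_to_watt(p_dbm: float) -> float:
    """Convert a power in dBm to watts (0 dBm = 1 mW)."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)


def qam_bits_per_symbol(m: int) -> int:
    """log2(M) for an M-ary QAM constellation (M assumed to be a power of two)."""
    return m.bit_length() - 1


evm_db = evm_percent_to_db(2.5)      # EVM limit quoted in the specification
p_min_w = dbm_to_watt(30.0)          # minimum average output power
bits_256qam = qam_bits_per_symbol(256)
```

An EVM of 2.5% corresponds to roughly -32 dB, which is consistent with the -32 dBc distortion figure given alongside it, and 30 dBm is exactly 1 W.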

Fig. 2.

Figure 3 shows the final decoupling filter prototype (the $0.5 Argentine coin has a diameter of 2.5 cm).

Fig. 3.

C. Driver Stage

For this stage we chose the MMG3003NT1 transistor from Freescale [27]. It is a Darlington pair (InGaP technology) with a cut-off frequency fT higher than 3.5 GHz. It is easy to bias because it requires only one DC power supply. The operating point is selected with a series resistor connected to the DC supply, which controls the collector current ICQ. Its main drawback is that the information available in the transistor data sheet is partial (no ADS model is available): the manufacturer provides only a few S-parameters at fixed bias points [27]. Thus, to meet our specifications, we built a dedicated board with known S-parameters (device-under-test board, DUT) to extract the S-parameters of the transistor [28] at the specific operating point using a vector network analyzer (VNA). Once the S-parameters were obtained, the matching network design (in microstrip) and the stability analysis were performed with the ADS simulation tools. The basic driver stage design results are the following: power gain 13 dB, output power at the 1 dB compression point 19 dBm (80 mW), input impedance Zin = 84.76 + j9.85 Ω, output impedance Zout = 70.59 + j7.56 Ω. Figure 4 illustrates the final driver stage prototype.

Fig. 4. Driver Stage

D. Power Stage

The transistor selected for this stage is the MRF6S23100H from Freescale [29]. The selection is based on two main aspects. First, LDMOS devices are currently the cheapest solution in the specified power range and frequency bandwidth. Second, Freescale is the only LDMOS transistor manufacturer that provides a complete physical model for simulation in ADS [30], [31]. This model includes thermal drift and thermal transient capabilities.

The design steps are quite similar to those of the previous stage. The main differences are the output matching network design and the fact that a physical model (for ADS) of the transistor is now available, which gives more flexibility and precision in the design and simulation steps. First, we select the Q operating point from the simulated I-V curves to meet the required output power back-off explained in Section II. Afterwards, we calculate the maximum thermal power dissipation (operating case temperature and heat sink size). The next step is to simulate the stability of the device with the input and output impedances matched at the selected operating point. If the device is stable, the best input matching network is obtained by optimization in the ADS simulator. If the device is not stable, lossy networks (series or parallel) must be introduced at the input to damp oscillations, and the matching is then done for this new circuit. Later, the output matching network is calculated using the load-pull simulation tool [32] provided by ADS. In this way, we find the output matching impedance (power matching) that gives the best trade-off between delivering maximum power to the load and maximum power-added efficiency (PAE) [12]. The power stage design results are the following: maximum allowed power dissipation 113 W, maximum case temperature 140 degrees, power gain 14 dB, output power at the 1 dB compression point 34 dBm (4 Watt), input impedance Zin = 14.81 - j2.41 Ω, output impedance Zout = 2.64 + j3.42 Ω. Figure 5 illustrates the final power stage prototype.

Fig. 5. Power Stage
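The device impedances reported for the driver and power stages give a feel for how much matching each port needs. The sketch below evaluates the standard reflection coefficient, return loss and mismatch loss of an impedance connected directly to a reference line, assuming a 50 Ω system. The function names are hypothetical, and the calculation only quantifies the unmatched case; it says nothing about the matching networks actually built.

```python
import math


def reflection_coefficient(z_load: complex, z0: float = 50.0) -> complex:
    """Gamma = (Z - Z0) / (Z + Z0) for a load Z on a line of impedance Z0."""
    return (z_load - z0) / (z_load + z0)


def return_loss_db(z_load: complex, z0: float = 50.0) -> float:
    """Return loss in dB (positive number; larger means better match)."""
    return -20.0 * math.log10(abs(reflection_coefficient(z_load, z0)))


def mismatch_loss_db(z_load: complex, z0: float = 50.0) -> float:
    """Power lost to reflection, in dB, when no matching network is used."""
    gamma = abs(reflection_coefficient(z_load, z0))
    return -10.0 * math.log10(1.0 - gamma ** 2)


# Impedances quoted in the text, taken here as ohms.
driver_zin = 84.76 + 9.85j
power_zout = 2.64 + 3.42j

driver_rl = return_loss_db(driver_zin)
power_gamma = abs(reflection_coefficient(power_zout))
power_ml = mismatch_loss_db(power_zout)
```

Under the 50 Ω assumption the driver input is already moderately close to the reference impedance, whereas the very low output impedance of the power transistor would reflect most of the power without a matching network, which is consistent with the emphasis the text places on the output matching design.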

IV. SIMULATIONS AND MEASUREMENTS For simplicity this section presents laboratory measurements results veried with the complete prototype. Results observed in each separated amplier stage are discussed in the next Section.

IEEE Catalog number CFP0854E-CDR

137

One tone measurements results: measurements were carried out with a whit a 3GHz Vector Signal Analyzer. The output power was attenuated with a Aeroex xed load with 30dB of attenuation and 25 Watt of dissipation (Fig8). From Figure 6 we can observe a constant gain of 27 dB up to the 1dB compression point. At this point the power amplier is delivering around 34 dBm of RF power to the load. The overall prototype measured efciency is approximately 9%. This agree with the theoretical values presented in section II.

Fig. 8.

Final Prototype

V. CONCLUSIONS AND DISCUSSION

A 6 W peak linear RFPA was implemented. The prototype performance fulfills the specifications over a bandwidth of 120 MHz. Outside this range a better matching network must be designed. At maximum power, the IMD3 obtained is -36 dBc, which is better than the WiMAX specification. Even so, more effort could be made to obtain a higher overall efficiency.

The main impairments of the design were found in the power stage. The design was defined to work in class A (with drain current IdQ = 3 A), but at the specified biasing point the thermal drift cancels all the linearity benefits. As a result, the power stage works in class A but actually with IdQ = 1.4 A, which results in a lower efficiency. Despite the valuable physical ADS thermal model provided by Freescale, careful design considerations must be taken if linear modulation is used at maximum power.

The ADS simulation software predicts the measured results within 5%, which was in general a good reference. However, the accuracy of ADS was not sufficient for the design of the output matching network of the power stage. A displacement of the center frequency with respect to the specified value (2.4 GHz) was observed. It could be caused by two different aspects. One is the high sensitivity of the output power transistor impedance to the operating point conditions. The other is the high RF current flowing in the output matching circuit. Since the microstrip matching circuit was calculated with 2D simulation, the variation in the microstrip component values caused by surface currents was not included. The way to address the latter aspect is to use a 3D electromagnetic simulator ([16], Chapter 8) such as [33]. This kind of tool allows the refinement of the design since it can deal with surface currents and crosstalk effects.

Both amplifier stages were designed separately, with their respective inputs and outputs matched to 50 Ω for simplicity of individual measurement. A further improvement to the whole device is to design the interstage matching network to match the stages' own impedances directly. Furthermore, if this matching network has a passband nature, the IMD3 of the first stage can be eliminated. Self oscillations were not observed with the prototype in the


Fig. 6.

Two tone measurements results: measurements were carried out with a 26 GHz Agilent Spectrum Analyzer. Due to practical limitations this measurement was based on two separate signal generators coupled through a Wilkinson combiner [24], with each tone having a different power. Figure 7 illustrates the IMD3 results. The tone separation was set to 1 kHz. This allows easy characterization of the out-of-band distortion and its relationship with the power of the input tones. For both tones, the measured sideband distortion power is lower than -32 dBc. Figure 7 also shows that the -32 dBc level was achieved with 38 dBm of RF power in the load. This means that the amplifier can deliver up to 6 W of RF power to the load while fulfilling the linearity requirement of WiMAX.
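The power levels quoted in dBm can be cross-checked against the values in watts with a short conversion. The following is a minimal sketch; the helper names are illustrative and are not part of the original measurement setup.

```python
import math

def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm (decibels relative to 1 mW) to watts."""
    return 10 ** (p_dbm / 10.0) / 1000.0

def watts_to_dbm(p_watts: float) -> float:
    """Convert a power level in watts to dBm."""
    return 10.0 * math.log10(p_watts * 1000.0)

p_two_tone_w = dbm_to_watts(38.0)  # level at which -32 dBc was measured
p_1db_w = dbm_to_watts(34.0)       # level at the 1 dB compression point (one-tone test)
six_watts_dbm = watts_to_dbm(6.0)  # the 6 W figure expressed in dBm
```

Under this conversion, 38 dBm corresponds to about 6.3 W, which is consistent with the "up to 6 W" statement, and 34 dBm corresponds to about 2.5 W.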

Fig. 7.


range of 0 Hz to 26 GHz. Related to that, the design of the DC decoupling filter was very conservative. Its attenuation is around 60 dB, whereas a rule of thumb for preventing oscillation is to design the filter for around 30 dB of attenuation. For a future design, in order to reduce the board size, a filter with only one radial stub could be considered. All prototypes were implemented on Rogers R04230 printed circuit board [34], whose manufacturer data and ADS simulation results show excellent agreement.

ACKNOWLEDGMENTS

We wish to thank the Agencia Nacional de Promocion Cientifica y Tecnologica (Project N#21723) and Fundacion Tarpuy for partial support of this work, Mr. Hernan Gutierrez from Laboratorio de Electronica (UNS) for his helpful collaboration in the construction of the prototypes, and Prof. Nestor H. Mata for his advice and helpful discussions.

REFERENCES

[1] S. A. Ahson, WiMAX Applications, 1st ed. CRC Press, 2007.
[2] R. Prasad, Multicarrier Techniques for 4G Mobile Communications, 1st ed. Artech House, 2003.
[3] S. Glisic, Advanced Wireless Communications. 4G Technologies, 1st ed. John Wiley & Sons, Inc., 2006.
[4] S. Cripps, Advanced Techniques in RF Power Amplifier Design, 1st ed. Artech House, 2002.
[5] J. L. Dawson and T. H. Lee, Cartesian feedback for RF power amplifier linearization, Center for Integrated Systems, Stanford University.
[6] L. R. Kahn, Single sideband transmission by envelope elimination and restoration, Proceedings of the IRE, vol. 40, pp. 803-806, July 1952.
[7] G. Wimpenny, Improving multi-carrier PA efficiency using envelope tracking, RF Design On Line Magazine, March 2008.
[8] P. Kenington, High Linearity RF Amplifier Design. Artech House, 2000.
[9] J. C. Pedro and N. B. Carvalho, Intermodulation Distortion in Microwave and Wireless Circuits, 1st ed. Artech House, 2003.
[10] J. Brinkhoff, Ph.D. Thesis: Bandwidth Dependent Intermodulation Distortion in FET Amplifiers. Macquarie University, Sydney, Australia, 2004.
[11] L. Ding, Ph.D. Thesis: Digital Predistortion of Power Amplifiers for Wireless Applications. Georgia Institute of Technology, 2004.
[12] S. Cripps, RF Power Amplifier for Wireless Communications, 1st ed. Artech House, 1999.
[13] S. Maas, Nonlinear Microwave and RF Circuits, 2nd ed. Artech House, 2003.
[14] U. Rohde and D. Newkirk, RF/Microwave Circuit Design for Wireless Applications, 1st ed. John Wiley & Sons, Inc., 2000.
[15] N. Dye and H. Granberg, Radio Frequency Transistors, 1st ed. Newnes, 2001.
[16] M. Golio, The RF and Microwave Handbook. CRC Press, 2001.
[17] O. N. Andrew Wright, Multi-carrier WCDMA basestation design considerations. Amplifier linearization and crest factor control, PMC Sierra Technology White Paper, August 2002.
[18] S. Kenney and A. Leke, Design considerations for multicarrier CDMA basestation power amplifiers, Microwave Journal, November 1998.
[19] A. Behravan and T. Eriksson, PAPR and other measures for OFDM systems with nonlinearity, The 5th International Symposium on Wireless Personal Multimedia Communications, vol. 1, pp. 149-153, October 2002.
[20] G. G. G. Gonzalez, Master Thesis: Measurements for Modelling Wideband Nonlinear Power Amplifiers for Power Communications. Department of Electrical and Communications Engineering, Helsinki University of Technology, 2004.
[21] S. A. Ahson, WiMAX: Standards and Security, 1st ed. CRC Press, 2007.
[22] R. Crane, Propagation Handbook for Wireless Communications System Design, 1st ed. CRC Press, 2003.
[23] M. Albulet, RF Power Amplifiers, 1st ed. Noble Publishing, 2001.
[24] B. Wadell, Transmission Lines Design Handbook. Artech House, 1991.
[25] B. Atwater, The design of the radial line stub: A useful microstrip circuit element, Microwave Journal, pp. 149-156, November 1985.
[26] A. Technologies, Agilent EEsof EDA Advanced Design System, 2006.
[27] Freescale, www.freescale.com/files/rf_if/doc/data_sheet/mmg3003nt1.pdf.
[28] I. Agilent, S parameters design application note 154.
[29] Freescale, www.freescale.com/files/rf_if/doc/data_sheet/mrf6s23100h.pdf.
[30] D. B. T. L. W. Curtice, J. Pla and E. Shumate, A new dynamic electrothermal nonlinear model for silicon RF LDMOS FETs, MTT-S Digest, 1999.
[31] Motorola, Motorola electrothermal model.
[32] A. Andy Howard, Load pull simulation using ADS white paper, Agilent Application Notes.
[33] C. S. T. Inc., CST Microwave Studio, 2004.
[34] Rogers, www.rogerscorporation.com/mwu/pdf/ro4000data_fab_10_07.pdf.


Close Range Bearing Estimation and Tracking of Slow Moving Vehicles Using the Microphone Arrays in the Hopkins Acoustic Surveillance Unit

Zhaonian Zhang and Andreas G. Andreou

Electrical and Computer Engineering The Johns Hopkins University Baltimore, MD 21218 zz@jhu.edu, andreou@jhu.edu

Abstract In this paper, we report on the use of a microphone array and enclosure in the Hopkins Acoustic Surveillance Unit (HASU) for bearing estimation and tracking of a moving vehicle at a close range. We use correlation based algorithms and we report bearing estimation results from moving vehicles at various constant velocities. The estimation results are in good agreement with theoretical model predictions. The correlation algorithms employed can be implemented in low power digital VLSI for wireless sensor networks.

I. INTRODUCTION

Julian et al. reported a sound localization algorithm for energy aware sensor network nodes [1]. This algorithm has been implemented in a low power CMOS integrated circuit for bearing estimation [2]. Its field test results show that it can estimate the bearing angle of a stationary sound source within one degree of precision [3]. Such a system has been integrated into an acoustic surveillance unit [4] to be used in a wireless sensor network environment for security and surveillance applications [5].

This work is a continuation of the work reported by Julian et al. [1]. We report experimental results from using the cross correlation algorithm and the Johns Hopkins University Acoustic Surveillance Unit [3] to track one or more vehicles travelling at various velocities at close range. These correlation based algorithms can be readily used with custom designed low power digital VLSI chips [6], [7] in sensor network applications.

II. PROBLEM STATEMENT

Consider the configuration in Fig. 1. Two microphones separated by a distance $L$ can be used to estimate the bearing angle $\theta$ of a sound source at a distance $L_s$, where $L \ll L_s$. If the source is far away, the sound waves arriving at the microphones can be considered plane, and the bearing angle can be estimated using the time delay $D$ between the arrival of the wavefront at one microphone and at the other. If we let $c$ be the speed of sound, the time delay can be calculated as

$$D = \frac{L}{c} \cos\theta. \qquad (1)$$

Then the bearing estimate is

$$\theta = \arccos\left(\frac{D c}{L}\right). \qquad (2)$$
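Eqs. (1) and (2) can be exercised numerically. The sketch below assumes a speed of sound of 343 m/s, which the paper does not state, and uses as an example spacing the 15.9 cm effective separation reported in the field-test section. The function names are illustrative only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air, not given in the paper

def delay_from_bearing(theta_deg, spacing_m, c=SPEED_OF_SOUND):
    """Eq. (1): time delay D in seconds for a far-field source at bearing theta."""
    return spacing_m / c * math.cos(math.radians(theta_deg))

def bearing_from_delay(delay_s, spacing_m, c=SPEED_OF_SOUND):
    """Eq. (2): bearing in degrees recovered from a measured delay D."""
    ratio = delay_s * c / spacing_m
    # Clamp so that measurement noise cannot push the argument outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))

spacing = 0.159                                   # effective separation in meters
d60 = delay_from_bearing(60.0, spacing)           # delay for a source at 60 degrees
theta_back = bearing_from_delay(d60, spacing)     # round trip through Eq. (2)
```

The clamp is a design choice of this sketch: a delay estimate slightly larger than L/c would otherwise make the arccos undefined.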


Fig. 1. Microphone setup for bearing angle estimation. The two microphones (marked by X) are placed at -L/2 and L/2 respectively. The bearing angle of an acoustic source can be estimated by the time delay between the time-of-arrival of the signals received by the microphones.

A cross correlation of the signals received by the two microphones can be used to find the time delay $D$. Mathematically, if

$$R_{x_1,x_2}(\tau) = \int_{-\infty}^{+\infty} x_1(t)\, x_2(t+\tau)\, dt \qquad (3)$$

is the cross correlation of the two signals, the correlation output $R_{x_1,x_2}$ exhibits a peak and is maximized when $\tau = D$. The computation of $R_{x_1,x_2}$ in hardware is expensive because it involves multiplication operations between two signals. Since the derivative of $R_{x_1,x_2}$ is zero where $R_{x_1,x_2}$ is maximized, the derivative of the cross correlation can instead be evaluated to search for a zero crossing point, as reported in the algorithm by Julian et al. [1]. If we let $r(i)$ be a practical implementation of $R_{x_1,x_2}$, with

$$r(i) = \sum_{k=0}^{K} x_1(k)\, x_2(k-i) \qquad (4)$$

computed from the sampled signals $x_1(k)$ and $x_2(k)$ of the two microphones, then the derivative of $r(i)$ can be written as

$$\Delta r(i) = r(i) - r(i-1) = \sum_{k=0}^{K} x_1(k)\,\bigl(x_2(k-i) - x_2(k-(i-1))\bigr). \qquad (5)$$
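Eq. (5) can be checked numerically. For signals restricted to the values 0 and 1, every term of the sum is +1, -1 or 0, so the sum can be accumulated by counting. The sketch below evaluates Eq. (5) both ways and searches for the lag at which the derivative changes sign. The square-wave test signal, the function names and the index range are assumptions of this sketch, not taken from the paper or from [1].

```python
import numpy as np

def corr_derivative(x1, x2, lag, k):
    """Delta r(i) from Eq. (5), evaluated directly as a sum over the indices k."""
    return int(np.sum(x1[k] * (x2[k - lag] - x2[k - lag + 1])))

def corr_derivative_counter(x1, x2, lag, k):
    """Same quantity for 0/1 signals, accumulated as up-counts minus down-counts."""
    a, b, c = x1[k], x2[k - lag], x2[k - lag + 1]
    up = np.count_nonzero((a == 1) & (b == 1) & (c == 0))
    down = np.count_nonzero((a == 1) & (b == 0) & (c == 1))
    return int(up) - int(down)

def zero_crossing_lag(x1, x2, max_lag):
    """Return the lag i at which Delta r(i) goes from positive to non-positive."""
    # Keep every index k - i and k - i + 1 inside the arrays for all lags.
    k = np.arange(max_lag, len(x1) - max_lag - 1)
    lags = list(range(-max_lag, max_lag + 1))
    dr = [corr_derivative_counter(x1, x2, i, k) for i in lags]
    for i, cur, nxt in zip(lags, dr, dr[1:]):
        if cur > 0 and nxt <= 0:
            return i
    return None

# Hypothetical test signal: a 1-bit square wave with a 40-sample period.
n = np.arange(421)
TRUE_LAG = 5
x1 = ((n % 40) < 20).astype(np.int64)
x2 = (((n + TRUE_LAG) % 40) < 20).astype(np.int64)  # x2 leads x1 by TRUE_LAG samples

estimated_lag = zero_crossing_lag(x1, x2, max_lag=10)
```

With the index convention of Eq. (4), a positive lag means that x2 leads x1. The signals are stored as signed integers so that the difference inside the sum cannot wrap around.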

If the signals $x_1(k)$ and $x_2(k)$ from the two microphones are quantized to 1 bit resolution, Eq. (5) can be implemented as an up-down counter [1]. The counter counts up when $x_1(k) = 1$, $x_2(k-i) = 1$, and $x_2(k-i+1) = 0$. It counts down when $x_1(k) = 1$, $x_2(k-i) = 0$, and $x_2(k-i+1) = 1$. Combinatorial logic is designed to detect the zero crossing point from the bank of counters. This bearing estimation algorithm can be easily implemented in digital logic. Since switching activity only occurs when $x_2(k-i+1)$ changes its state, very little power is consumed.

III. FIELD TESTING: BEARING ESTIMATION OF MOVING VEHICLES

A. Setup

The field testing took place at Fort Devens, Massachusetts, USA in July 2005. The setup is illustrated in Fig. 2, with a picture of the actual site shown in Fig. 3. Four Knowles SiSonic MEMS microphones and amplifying circuitry in the Hopkins Acoustic Surveillance Unit (HASU) (see Fig. 4) were placed at the side of the road, roughly 3 meters away. Inside the enclosure, the physical distance between microphones is 6 cm; however, the acoustic enclosure produces an effective separation of 15.9 cm. The four channels of microphone output were amplified with a cutoff frequency of 300 Hz. A National Instruments data acquisition card (NI-DAQ) was used to acquire these four channels of amplified signals at 4 kHz. The acquired signals were processed offline in MATLAB to estimate the bearing angles of the moving vehicles from the noise they made.

B. Signal Processing

The acoustic signals acquired at 4 kHz were interpolated to a sampling rate of 200 kHz in MATLAB with the interp function. This step is necessary because the maximum time delay that two opposing microphones can sense is determined by the effective separation between the microphones, which is 15.9 cm in this case. Such a distance gives rise to a time delay of roughly 460 µs in air. A 5 µs spacing between samples (i.e., a 200 kHz sampling rate) is necessary to provide enough resolution to resolve this maximum delay.
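The delay figures above can be reproduced with a few lines of arithmetic. This sketch assumes a speed of sound of 343 m/s (not stated in the paper) and applies Eq. (2) to one delay stage; the helper names are illustrative.

```python
import math

C = 343.0       # speed of sound in m/s (assumed; the paper does not state a value)
L_EFF = 0.159   # effective microphone separation in m (from the paper)

def max_delay_s(spacing_m=L_EFF, c=C):
    """Largest inter-microphone delay, reached when the source is in line with the pair."""
    return spacing_m / c

def delay_stages(fs_hz, spacing_m=L_EFF, c=C):
    """Number of whole sample periods that fit in the maximum delay."""
    return math.floor(max_delay_s(spacing_m, c) * fs_hz)

def bearing_deg(stage, fs_hz, spacing_m=L_EFF, c=C):
    """Bearing from Eq. (2) for a delay of `stage` sample periods."""
    ratio = stage / fs_hz * c / spacing_m
    return math.degrees(math.acos(max(-1.0, min(1.0, ratio))))

stages_4k = delay_stages(4_000)       # raw acquisition rate
stages_200k = delay_stages(200_000)   # rate after interpolation
step_deg = bearing_deg(0, 200_000) - bearing_deg(1, 200_000)  # angular step near 90 degrees
```

Under these assumptions the maximum delay is about 464 µs. Only one whole delay stage fits at 4 kHz, against 92 at 200 kHz, and one stage near a 90 degree bearing corresponds to roughly 0.6 degrees.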
In the actual VLSI implementation, this implies that the incoming signals from the microphone array need to be quantized to 1 bit at 200 kHz. However, since Eq. (5) can be implemented with an up-down counter and the incoming signals have a cutoff frequency of 300 Hz, the counter value is only updated when certain conditions are met, which effectively makes the correlator bank run at 300 Hz or lower.

When the target is moving, its bearing angle changes constantly. The acoustic signals have to be windowed and the bearing of the acoustic target needs to be estimated for each



Fig. 2. Field testing setup. A microphone array (indicated by the small circle) consisting of four microphones was placed near a road where different vehicles drove by at different velocities. A zoomed-in view of the orientation of the microphones is shown on the right.

Fig. 3. Picture of the actual field testing setup. A zoomed-in view of the location of the microphone array is shown on the right.

individual window. This is similar to the idea of performing a short time Fourier transform to see how the frequency content of a signal changes over time. The window size here is 0.17 s, or 34000 samples after interpolation. One channel of the microphone array output is differentiated, and both channels are quantized to 1 bit. Then the two channels are correlated using Eq. (5), with one channel delayed by different numbers of stages. The correlator bank output is searched for the zero crossing point, and the corresponding number of delay stages is used to estimate the bearing angle of the moving vehicle. Only the outputs from microphones 1 and 2 (see Fig. 2) are analyzed because microphone 3 was broken due to the


[Fig. 5 plot. Title: Run 33 - Malibu @ 10mph. Top panel: Bearing Angle (degrees) vs. Time (s). Bottom panel: Microphone Output (V) vs. Time (s).]

Fig. 4. Picture of the enclosure housing the microphone array. Although the physical distance between microphones is 6cm, the acoustic enclosure produces an effective separation of 15.9cm.


excessive dust in the testing field. When a vehicle drives by these two microphones, the bearing angle changes from 90 degrees to 0 and then back to 90 degrees.

A simple model is used to validate the above algorithm for bearing estimation. In the window where the maximum number of delay stages occurs, the vehicle is assumed to be in line with the two microphones and to produce a zero bearing. The vehicle's theoretical bearing is then calculated by assuming that the vehicle drives at a constant velocity before and after reaching the zero bearing point. This bearing trajectory is used to evaluate the accuracy of the signal processing algorithms described above for bearing estimation.

C. Results

A number of tests were carried out and their results are presented as follows. Figure 5 shows a Chevy Malibu driving by at 10 miles per hour. Figures 6 and 7 illustrate the same Malibu driving by at 20 and 40 miles per hour from A to B and then from B to A in Figure 2. In the top graph of each figure, the solid blue line is the bearing estimate from our algorithm, and the dashed red line is the bearing from the model. The bottom graph of each figure plots the raw acoustic signals from the microphones. We can see that the algorithm is in good agreement with the model.

Figures 8 and 9 illustrate a convoy consisting of a Malibu and a Honda driving by at 20 and 40 miles per hour. The plots follow the same layout as Figures 5 to 7. Again, two peaks are present in the bearing estimation plots, and the algorithm is in good agreement with the model.

IV. CONCLUSION

We have applied signal processing algorithms to acoustic signals collected by a microphone array in the HASU to estimate the bearing angles of moving vehicles. The results are in good agreement with the vehicle locations. These algorithms


Fig. 5. A Chevy Malibu drove by the microphone array at 10mph. The top plot shows the bearing estimation using our algorithm and our model. The bottom plot shows the raw acoustic signals collected by the microphones.

[Fig. 6 plot. Title: Run 34 - Malibu @ 20mph. Top panel: Bearing Angle (degrees) vs. Time (s). Bottom panel: Microphone Output (V) vs. Time (s).]

Fig. 6. A Chevy Malibu drove by the microphone array at 20mph, turned around and drove by again in the opposite direction. The top plot shows the bearing estimation using our algorithm and our model. The bottom plot shows the raw acoustic signals collected by the microphones.

can be implemented in low power VLSI and embedded in a sensor network for bearing estimation and tracking of moving targets.

ACKNOWLEDGEMENTS

This work was supported by the MASINT project Decentralized-Fusion, On-Demand Activation, Awareness Sensor Network NMA401-02-9-2002 under a subcontract by Honeywell. It was also partially supported by NSF grant IIS0434161 and by DARPA/ONR contract N00014-00-C-0315. We are grateful to Prof. Pedro Julian for his critical review


[Fig. 7 plot. Title: Run 35 - Malibu @ 40mph. Top panel: Bearing Angle (degrees) vs. Time (s). Bottom panel: Microphone Output (V) vs. Time (s).]

[Fig. 8 plot. Title: Run 50 - Malibu and Honda Wagon Convoy @ 20mph. Top panel: Bearing Angle (degrees) vs. Time (s). Bottom panel: Microphone Output (V) vs. Time (s).]

Fig. 7. A Chevy Malibu drove by the microphone array at 40mph, turned around and drove by again in the opposite direction. The top plot shows the bearing estimation using our algorithm and our model. The bottom plot shows the raw acoustic signals collected by the microphones.
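The per-window processing described in Section III-B (windowing, 1-bit quantization, correlation with Eq. (5), zero-crossing search, conversion with Eq. (2)) can be sketched as follows. This is a simplified illustration and not the authors' MATLAB code. It assumes a speed of sound of 343 m/s, skips the interpolation step by generating the test signal directly at 200 kHz, and relies on the difference inside Eq. (5) instead of a separate differentiation of one channel. The 100 Hz test tone and all function names are hypothetical.

```python
import math
import numpy as np

C = 343.0        # speed of sound in m/s (assumed; not stated in the paper)
L_EFF = 0.159    # effective microphone separation in m (from the paper)
FS = 200_000     # sampling rate after interpolation in Hz (from the paper)
WINDOW_S = 0.17  # window length in s (from the paper)

def one_bit(x):
    """Quantize a signal to 1 bit: 1 where positive, 0 elsewhere."""
    return (np.asarray(x) > 0).astype(np.int64)

def zero_crossing_lag(b1, b2, max_lag):
    """Return the lag i where Delta r(i) of Eq. (5) changes from > 0 to <= 0."""
    k = np.arange(max_lag, len(b1) - max_lag - 1)  # keeps all indices in range
    prev = None
    for lag in range(-max_lag, max_lag + 1):
        dr = int(np.sum(b1[k] * (b2[k - lag] - b2[k - lag + 1])))
        if prev is not None and prev > 0 and dr <= 0:
            return lag - 1
        prev = dr
    return None

def track_bearing(x1, x2, fs=FS, window_s=WINDOW_S, spacing_m=L_EFF, c=C):
    """Estimate one bearing (degrees) per non-overlapping window, or None."""
    n_win = int(round(window_s * fs))
    max_lag = int(spacing_m / c * fs)  # number of delay stages in the bank
    bearings = []
    for start in range(0, len(x1) - n_win + 1, n_win):
        b1 = one_bit(x1[start:start + n_win])
        b2 = one_bit(x2[start:start + n_win])
        lag = zero_crossing_lag(b1, b2, max_lag)
        if lag is None:
            bearings.append(None)
            continue
        ratio = max(-1.0, min(1.0, lag / fs * c / spacing_m))
        bearings.append(math.degrees(math.acos(ratio)))  # Eq. (2)
    return bearings

# Hypothetical input: a 100 Hz tone, with channel 2 leading by 46 delay stages.
k = np.arange(2 * 34_000)
TRUE_STAGES = 46
x1 = np.sin(2 * np.pi * 100.0 * k / FS + 0.3)
x2 = np.sin(2 * np.pi * 100.0 * (k + TRUE_STAGES) / FS + 0.3)
bearings = track_bearing(x1, x2)
```

In this synthetic case the two windows should both yield a bearing close to 60 degrees, since 46 stages of 5 µs correspond to about half of the maximum delay. The mapping between the sign of the lag and the side of the array is a convention of this sketch.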


REFERENCES

[1] P. Julian, A. Andreou, L. Riddle, S. Shamma, D. Goldberg, and G. Cauwenberghs, Comparative study of sound localization algorithms for energy aware sensor network nodes, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 4, pp. 640-648, April 2004.
[2] P. Julian, A. Andreou, P. Mandolesi, and D. Goldberg, A low-power CMOS integrated circuit for bearing estimation, Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS 03), vol. 5, pp. V-305-V-308, May 2003.
[3] P. Julian, A. Andreou, G. Cauwenberghs, M. Stanacevic, H. Goldberg, P. Mandolesi, L. Riddle, and S. Shamma, Field test results for low power bearing estimator sensor nodes, IEEE International Symposium on Circuits and Systems (ISCAS 2005), vol. 5, pp. 4205-4208, May 2005.
[4] D. H. Goldberg, A. G. Andreou, P. Julián, P. O. Pouliquen, L. Riddle, and R. Rosasco, VLSI implementation of an energy-aware wake-up detector for an acoustic surveillance sensor network, ACM Trans. Sen. Netw., vol. 2, no. 4, pp. 594-611, 2006.
[5] G. Cauwenberghs, A. G. Andreou, J. West, M. Stanacevic, A. Celik, P. Julian, T. Teixeira, C. Diehl, and L. Riddle, A miniature, low-power, intelligent sensor node for persistent acoustic surveillance, in Proc. SPIE Defense and Security Symposium, Orlando, FL, 2005.
[6] P. Julián, A. G. Andreou, and D. Goldberg, A low power correlation-derivative CMOS VLSI circuit for bearing estimation, IEEE Transactions on Very Large Scale Integration Systems, vol. 42, no. 2, pp. 207-212, 2006.
[7] P. Julián, F. M. Pichio, and A. G. Andreou, Experimental results for a cascadable micropower time delay estimator, IEE Electronics Letters, vol. 42, no. 21, pp. 1218-1219, 2006.

Fig. 8. A Malibu and Honda convoy drove by the microphone array at 20mph. The top plot shows the bearing estimation using our algorithm and our model. The bottom plot shows the raw acoustic signals collected by the microphones.

[Fig. 9 plot. Top panel: Bearing Angle (degrees) vs. Time (s). Bottom panel: Microphone Output (V) vs. Time (s).]

Fig. 9. A Malibu and Honda convoy drove by the microphone array at 40mph. The top plot shows the bearing estimation using our algorithm and our model. The bottom plot shows the raw acoustic signals collected by the microphones.
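The constant-velocity model used as the reference curve in Figs. 5 to 9 can be sketched as below. The geometry is an assumption of this sketch: the vehicle moves along a straight line 3 m from the array (the approximate roadside distance given in the setup), the axis through microphones 1 and 2 is perpendicular to that line, and the bearing is folded into the range 0 to 90 degrees. The function name and parameters are illustrative.

```python
import math

MPH_TO_MPS = 0.44704  # exact conversion factor from miles per hour to meters per second

def model_bearing_deg(t_s, t0_s, speed_mph, offset_m=3.0):
    """Bearing of a vehicle passing at constant speed, relative to the microphone axis.

    t0_s is the time of the zero bearing point, where the vehicle is in line
    with the two microphones. offset_m is the assumed distance to the road.
    """
    along_road_m = speed_mph * MPH_TO_MPS * (t_s - t0_s)
    return math.degrees(math.atan2(abs(along_road_m), offset_m))

# Example: a 10 mph pass with the zero bearing point at t0 = 10 s.
t0 = 10.0
bearing_at_pass = model_bearing_deg(t0, t0, 10.0)
t_45 = t0 + 3.0 / (10.0 * MPH_TO_MPS)          # vehicle is 3 m down the road
bearing_45 = model_bearing_deg(t_45, t0, 10.0)
bearing_far = model_bearing_deg(t0 + 10.0, t0, 10.0)
```

With these assumptions the bearing is 0 at the zero bearing point, reaches 45 degrees when the vehicle has travelled 3 m along the road, and approaches 90 degrees far from the array, which matches the 90 to 0 to 90 degree sweep described in Section III.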
