
Volume 14(S1) Supplementary August 2007

Dynamics of Continuous,
Discrete & Impulsive Systems

Series A: Mathematical Analysis

Editor-in-Chief

Xinzhi Liu, University of Waterloo

Special Issue on

Advances in Neural Networks--Theory and Applications

Part 2

DCDIS 14(S1) 503-815 (2007) ISSN 1201-3390

Watam Press • Waterloo


DYNAMICS OF CONTINUOUS, DISCRETE AND IMPULSIVE SYSTEMS
Series A: Mathematical Analysis
Editor-in-Chief: Xinzhi Liu, Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario, Canada
Honorary Editors:
J. Hale, Georgia Institute of Technology, Atlanta, GA 30332, USA
V. Lakshmikantham, Florida Institute of Technology, Melbourne, FL 32901, USA
G. Leitmann, University of California, Berkeley, CA 94720, USA
R. M. May, University of Oxford, Oxford, OX1 3PS, UK
P. A. Samuelson, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

Editorial Board:

R.P. Agarwal, National University of Singapore, 10 Kent Ridge, Singapore 0511
S. Ahmad, University of Texas, San Antonio, TX 78249, USA
N.U. Ahmed, University of Ottawa, Ottawa, Ontario, Canada
J.L. Bona, University of Texas, Austin, TX 78712, USA
H. Brezis, Université Pierre et Marie Curie, 75252 Paris, Cedex 05, France
S.A. Campbell, University of Waterloo, Waterloo, Canada
C.Y. Chan, University of Louisiana, Lafayette, LA 70504, USA
M. Chipot, Universität Zürich, Winterthurerstr. 190, Zürich
Y.J. Cho, Gyeongsang National University, Chinju 660-701, Korea
S.N. Chow, National University of Singapore, 10 Kent Ridge, Singapore 0511
L.O. Chua, University of California, Berkeley, CA 94720, USA
G. Da Prato, Scuola Normale Superiore, 56126 Pisa, Italy
D.G. De Figueiredo, IMECC-UNICAMP, Campinas S.P., Brazil
T. Furumochi, Shimane University, Matsue, 690 Japan
K. Gopalsamy, Flinders University of South Australia, Adelaide S.A. 5001, Australia
D. Guo, Shandong University, Jinan 250014, P.R. China
K.P. Hadeler, Universität Tübingen, D-72076 Tübingen, Germany
L. Hatvani, Bolyai Institute, Szeged, Hungary
N. Hirano, Yokohama National University, Yokohama 240, Japan
I.M. Lasiecka, University of Virginia, Charlottesville, USA
T.T. Li, Fudan University, Shanghai 200433, P.R. China
J. Mawhin, University of Louvain, B-1348 Louvain-la-Neuve, Belgium
D. O'Regan, National University of Ireland, Galway, Ireland
S. Sathananthan, Tennessee State University, Nashville, TN 37203, USA
J. Serrin, University of Minnesota, Minneapolis, MN 55455, USA
D.D. Siljak, Santa Clara University, Santa Clara, CA 95053, USA
R. Temam, Université de Paris-Sud et C.N.R.S., France
K.L. Teo, Hong Kong Polytechnic University, Hung Hom, Hong Kong
G.S.K. Wolkowicz, McMaster University, Hamilton, Ontario, Canada
J. Wu, York University, Toronto, Ontario, Canada

DCDIS Series A: Mathematical Analysis is published bimonthly in the months of February, April, June, August, October, and December. Articles published in this

journal are indexed or abstracted in: CompuMath Citation Index, Science Citation Index Expanded, Current Contents/Engineering, Computing

and Technology, Current Mathematical Publications, ISI Alerting Services, Mathematical Reviews, Mathsci, Research Alert, SciSearch
and Zentralblatt für Mathematik/Mathematics Abstracts.
© Copyright 2007 by Watam Press. All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission
in writing from the publisher.

Printed in Canada
Dynamics of Continuous, Discrete and Impulsive Systems, Series A, Vol.14 (S1)
Copyright © 2007 Watam Press
Advances in Neural Networks--Theory and Applications
TABLE OF CONTENTS page

Section Language Model for Information Retrieval 503


Wen-Yu Chen, Bin-Wei Yang, Shi-Xin Sun
Markerless Tracking Based on Transiently Chaotic Neural Network 508
with Invariant Features for Augmented Reality System
Xinyu Li
Identification, Filtering and Control of Nonlinear Plants by Recurrent 512
Neural Networks Using First and Second Order Algorithms of Learning
Ieroham S. Baruch, Saul Escalante M, Carlos R. Mariaca Gaspar.
Combining Radar Emitter Recognition with Ambiguous a priori Knowledge 522
Shurong Tian, Xin Guan, You He, Wei Xiong
Local Weather Forecast for Flight Training Using Neural Networks 526
Jianguo Chen, Shijun Liu
The Research of Enterprise Strategy Management Based on Bayesian Networks 531
Xu Jian-zhong, Ren Jia-song
Application of a Swarm-based Artificial Neural Network to Ultrasonic Detector 536
Based Machine Condition Monitoring
Shan He, Xiaoli Li
Inverse Learning Control of an Experimental Helicopter Using 543
Adaptive Neuro-Fuzzy Inference System
Gwo-Ruey Yu, C. W. Tao
Kinematic Control of a 6-DOF Robot Manipulator using Kohonen 550
Self-Organizing Map (SOM)
Anjan Kumar Ray, Laxmidhar Behera, Amit Shukla
A WNN Based Kalman Filtering For Auto-Correction Of SINS/Star 559
Sensor Integrated Navigation System
Baiqi Liu, Jiancheng Fang, Lei Guo
A Method to Pre-select Support Vectors for LS-SVM Classifiers 565
Yongsheng Sang, Zhang Yi, Stones Lei Zhang
A Multi-layer Quantum Neural Networks Recognition System for 570
Handwritten Digital Recognition
Li Peng, Rushi Wu
Text-Independent Speaker Identification Using Fuzzy LS-SVM 575
Chengfu Yang, Zhang Yi, Stones Lei Zhang
A Canonical Integrator Environment For The Development Of Connectionist Systems 580
Diego Ordóñez, Carlos Dafonte, Alfonso Iglesias, Bernardino Arcay
Stock Price Forecasting Using a Recurrent Fuzzy Neural Network 586
Menghua Tong, Qizhi Zhang, Woonseng Gan
Basic Engineering Materials Classification Model- A Neural Network Application 591
Doreswamy

Complexity Analysis Of Eeg Under Different Brain Functional States 596
Using Symbolic Entropy
Lisha Sun, Guoliang Chang, Patch. Beadle
Prediction of Clinical Response to Treatment of Crohn’s Disease by Using RBFN 602
Igor Grabec, Ivan Ferkolj, Daša Grabec, Dušan Grošelj
Study on Fuzzy Evaluating Neural Network for University Student Credit 608
Qingyu Xiong, Jing Chen, Qi Huang
Real-Time Control of Erythromycin Fermentation Process Based on 612
ANN Left- and Right-Inversion
Xianzhong Dai, Wancheng Wang
Optimal Control Based-Neurocontroller to Guide the Crop Growth 618
under Perturbations
J. Pucheta, H.D. Patiño, C. Schugurensky, R. Fullana, B. Kuchen
Stock Investor Behavior Simulation with Hybrid Neural Network 624
and Technical Analysis on Long-term Increases of Hong Kong Hang Seng Index
Chi Xu, Yan Cai, Zheru Chi
Application of neural networks in predictive Maintenance of rotating 631
machinery – a review
H Ranganathan, J Pattabiraman
Artificial Neural Network approach for Estimation of Hemoglobin in 638
Human Blood using Color Analysis
H Ranganathan, N Gunasekaran
One-class/Two-class Training for Windows NT User Profiling 644
Li Ling, C.N Manikopoulos
Combinatorial Productivity Through The Emergence Of Categories 650
In Connectionist Networks
Francis C. K. Wong, William S-Y Wang
An Estimative Model of Maximum Power Generation From Photovoltaic 658
Modules Based on Generalized Regression Neural Network
Hung-Cheng Chen, Jeng-Chyan Lin, Meng-Hui Wang, Jian-Cong Qiu
Channel Noise Induced Transition From Quiescence To Bursting In The 664
Dissipative Stochastic Mechanics Based Model Neuron
Marifi Güler
Fuzzy Systems on Orthogonal Bases 671
Musa Alcı
Homomorphisms in a direct sum of full matrix algebras 677
Zhongyan Li, Minli Li
A Delay Differential Equation Model of Immune Surveillance 682
Against Cancer and Its Stability Analysis
Dan Li, Wanbiao Ma
The Effect of Indexing Methods on SVM-based Text Categorization 689
Ju Jiang, Lei Chen, Mohamed S. Kamel, Yi Zhang
Research on a New Method of Processing Distributed Data 694
Wang Zhiguang, Chen Ming, Liu Lifeng
Triangles With Median And Altitude Of One Side Coincide 699
Dong-hai Ji, Jun-jing Jia, Sen-lin Wu

Matrix Representation of Solution Concepts in Graph Models for Two 703
Decision-Makers with Preference Uncertainty
Haiyan Xu, D. Marc Kilgour, Keith W. Hipel
A Homotopy Method for Solving MPEC Problem 708
Jiamin Li, Qinghuai Liu, Xinmin Wang, Guochen Feng
A State-Of-Charge Estimation Method Based On Extension Theory 713
For Lead-Acid Batteries
Kuei Hsiang Chao, Meng Hui Wang, Chia Chang Hsu
On Subnormal Completion Problem 719
Chunji Li, Shengjun Li, Jianrong Wu
The Left-Groebner Bases in Ring of Differential Operators 724
Jinwang Liu, Xiaoling Fu, Dongmei Li
On The Fuzzy Riemann-Stieltjes Integral 728
Xue-Kun Ren, Cong-Xin Wu, Zhi-Gang Zhu
On The Discrete Time Brownian Flow I: Characteristic And Invariant 733
Measure Of The N-Point Motion
Jingxiao Zhang
On The Discrete Time Brownian Flow II: Central Limit Theorem Of 738
The N-Point Motion
Jingxiao Zhang
Doob’s Martingale Inequality in G-Framework 742
Jing Xu, Bo Zhang
Fuzzy Genetic Algorithm Based on Principal Indexes Operation 746
Fachao Li, Panxing Yue, Chenxia Jin
Analysis of Compressible Miscible Displacement with Dispersion 751
by a Characteristics Collocation Method
Ning Ma
Existence of Weak Solutions for Evolution Inclusions in Reflexive Banach Spaces 756
Guocheng Li
A Class of Robust Strategy for Robot Manipulators with Uncertainties 761
Lixia Zhi
The Sufficient And Necessary Conditions Of Error Bounds For Constrained 766
Multifunctions
Jian-Rong Wu, Shi-Ji Song
Commutativity Theorems On Rings 770
Chen Guanghai, Yang Xinsong
Local-Bandwidth Mean Shift Segmentation of MR Images Using 774
Nonlinear Diffusion
Dong Huang, Huizhong Qiu, Zhang Yi
On the Central Limit Theorem of Markov Chains in Markovian 779
Environments for φ-mixing Stochastic Sequences
Chen Neiping, Yang Gang
Lower Bounds and Existence Conditions of the Solution for the Perturbation 784
Generalized Lyapunov Equations
Dong-Yan Chen, Ling Hou, Jun-Fang An

Commuting Toeplitz Operators with Harmonic Symbols 788
Limin Yang
Ergodic Characteristics Analysis of Time Series Data in Hydrological Process 791
Hongrui Wang, Xin Lin, Xiaoming Peng, Dongli Zhou
Periodic Solutions For A Kind Of p−Laplacian Liénard Equation 798
With A Deviating Argument
Minggang Zong, Wei Yuan, Wenqing Zhao
Web Page Importance Ranking With Priori Knowledge 805
Guoyang Shen, Shiji Song
Author Index 811

DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 503--507
Copyright © 2007 Watam Press

Section Language Model for Information Retrieval


Wen-Yu Chen, Bin-Wei Yang and Shi-Xin Sun
College of Computer Science and Engineering, University of Electronic Science and Technology
of China, Chengdu, Sichuan 610054, China.
Corresponding author e-mail: cwy@uestc.edu.cn

Abstract— This paper studies information retrieval in natural language processing. A new model, named the Section Language Model, is proposed. In this model, by using a Correlation Vocabulary Table, the recall rate of the search results is increased effectively. This paper also proposes two specific language models, named the Unigram Section Language Model and the Bigram Section Language Model. A thorough study has been made of the correlative word set for those models. Simulations have been carried out to show our results.

Index Terms— Conceptual base, Information Retrieval, Recalling rate, Section Language Model.

I. INTRODUCTION

According to Schank's concept existence theory [1], there are all kinds of concepts in the real world. Language is used when people transmit information, while information is defined by concepts and the relations between these concepts. So the process of extracting the information of a language is equal to the process of mapping language to concept sets. It is clear that the words of a language are representative symbols of the concept sets of that particular language.

In the information retrieval model based on concept dynamic feature matching [2], a sufficient and necessary condition has been summed up for the query sentence. It declares that, in the concept sets which have been picked out to represent the feature structure of an article, there must exist some concepts matching the feature structure of the selected conditions. This implies that the concept sets represented by the query sentence must be the same as those in the article. However, in real language, vocabulary items are not matched to concept sets one by one. A concept set can match more than one vocabulary item; this is the so-called synonymy relationship. A word can also match more than one concept set, which is the multivocal (polysemy) relationship. How can we find out whether the selected conditions are matched to the article by concept sets rather than by vocabulary items? This is a basic difficulty of information retrieval [3]. According to the concept dynamic feature set theory, the relations between the concepts of a sentence or an article form its feature structure. In information retrieval, the matched concepts between an article and the query sentence are not single matched concepts, but matched concepts with certain characteristics [4]. But in real language, the same information, such as a concept set, can be made up by different sentence structures [5]. How can the same meaning be distinguished in different sentences? This is the second difficulty.

This paper proposes a method in which a related vocabulary table is used to solve the synonymous and multivocal questions. This related vocabulary table consists of all words that have the same or a related meaning in the sentence. A brief description of the proposed Related Vocabulary Set is presented in Section II. The proposed Section Language Models are presented in Section III. The simulation results are given in Section IV. Finally, conclusions are drawn in Section V.

II. THE DESIGN OF THE RELATED VOCABULARY SET

In an information retrieval model, all the basic processes are matching. We can compute the similarity of an article and the query sentence from the words existing in the query sentence or the article. Then the articles can finally be sorted according to this probability. The computation of the similarity varies between models, but in all methods, the frequency and position of the words related to the query sentence in the article must always be considered.

A. Dealing with Query Sentences

The input of an information retrieval model is a query sentence. The output should be a group of articles related to this query sentence. Before searching, the query sentence should first be analyzed. How to identify the boundaries of the cryptic words is the first task of the analysis. Differently from models based on sentence-meaning analysis, not all the words of a query sentence are useful to information retrieval in a probability model. Some empty words, such as prepositions, auxiliary words, modal words, and so on, occur in a high proportion in the frequency statistics. These words make no sense in forming the language model and should be overlooked. In information retrieval, we therefore prefabricate a table called the "stop table"; it contains those words that occur in the sentence depot with high frequency.

B. Computing Words' Connection

Let us suppose the query sentence is Q = {q1, q2, ..., qn}, the related table is V, and the related word set of qi is V(qi) = {vi1, vi2, ..., vim}. It is clear that vij in V(qi) is not the same as qi in Q. Suppose Q exists; if vij has the same meaning as qi, then the value of P(vij | Q − qi) is bigger than that of P(qi | Q − qi) or near to it. If the value of P(vij | Q − qi) is too small, then vij does not match the other words in Q well. This implies that vij has little relationship with qi in Q.

The value of P(vij | Q − qi) should be computed by training on the vocabulary depot.


Suppose w = vij and Q − qi = w1, w2, ..., wk; then

$$P(w\mid w_1^k) = \frac{C(w, w_1^k)}{C(w_1^k)} \quad (1)$$

We only compute the co-occurrence frequency of w among w1, ..., wk, rather than the sentence-meaning relationship between the words. So we can introduce a window to compute the co-occurrence frequency, thus:

$$P(w\mid w_1^k) = \frac{C(w, w_1^k, L)}{C(w_1^k, L)} \quad (2)$$

where C(w, L) is the frequency of w in window L, and L is a section or an article.

When k in equation (2) is big, the data sparsity becomes serious, so smoothing technology is used in our design. We use back-off smoothing together with Good-Turing smoothing to compute the occurrence frequency of w: if the value is zero, we compute the frequency of w from a shorter context. So equation (2) can be modified to:

$$P_{\mathrm{Katz}}(w\mid w_1^k)=\begin{cases}P_{GT}(w\mid w_1^k), & C(w,w_1^k,L)>0\\ \alpha(w_1^k)\,P_{GT}(w\mid w_2^k), & C(w,w_1^k,L)=0,\; C(w,w_2^k,L)>0\\ \;\;\vdots & \;\;\vdots\\ \alpha(w_{k-1}^k)\,P_{GT}(w\mid w_k), & \cdots\\ 0, & C(w,w_k,L)=0\end{cases} \quad (3)$$

C. Sorting of Words

From the above discussion we know that, when dealing with the query sentences, we should compute the relation degree between qi and the words in its related vocabulary table, i.e., the sequence of context-sensitive words of qi. The words more related to qi should be put at the front of the table.

The meaning of qi in the query sentence is determined by itself. So when sorting, the position of each word in the related vocabulary is determined only by its relation degree. Suppose the relation degree of w and qi in a sentence is proportional to the probability of their occurring together in window L; then we have

$$R(w, q_i) \propto P(w\mid q_i) = \frac{C(w, q_i, L)}{C(q_i, L)} \quad (4)$$

Once qi is confirmed, the value of C(qi, L) can be confirmed, and the relationship of w and qi can then be determined by C(w, qi, L), which is computed by summing up the number of times w and qi appear together in window L.

In a sentence, the meaning of a word is not determined by all the words in the sentence, but can be determined by a few of them. It means that the relation degree between a related word w of qi and the key words in the query sentence depends only on a few words, not on all the query words. So, after sorting by co-occurrence, all the words that appear rarely can be ignored, keeping only the closely connected words. We use an absolute-value threshold to achieve this.
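The window statistics above reduce to simple co-occurrence counting. The following is a minimal sketch of equations (2) and (4), assuming sections are plain lists of tokens; the function names and the cut-off value are our assumptions for illustration, not the authors' implementation.

```python
def relation_degree(w, qi, sections):
    """Eq. (4): R(w, qi) ~ P(w|qi) = C(w, qi, L) / C(qi, L), counted over windows."""
    c_w_qi = sum(1 for sec in sections if w in sec and qi in sec)  # C(w, qi, L)
    c_qi = sum(1 for sec in sections if qi in sec)                 # C(qi, L)
    return c_w_qi / c_qi if c_qi else 0.0

def sort_related_vocabulary(qi, candidates, sections, keep=5):
    """Order the related words of qi by relation degree and keep only the
    closely connected ones (the absolute-value cut-off of Section II.C)."""
    ranked = sorted(candidates,
                    key=lambda w: relation_degree(w, qi, sections),
                    reverse=True)
    return ranked[:keep]

# Usage on a toy corpus of two one-section articles:
sections = [["economy", "development", "china", "growth"],
            ["weather", "development", "flight"]]
print(sort_related_vocabulary("development", ["growth", "weather"], sections))
```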


III. THE PROPOSED SECTION LANGUAGE MODEL

There are two characteristics of our proposed model. The first is that it is based on a window rather than on the whole article. The second is that the key words are not only those in the query sentence, but are also based on the related word sets. We propose two models: the Unigram Section Language Model and the Bigram Section Language Model.

A. Unigram Section Language Model

In the Unigram Section Language Model, we take each word as isolated and then calculate the product of the individual probabilities:

$$P(Q\mid d) = P(w_1, w_2, \ldots, w_n \mid d) \approx \prod_{i=1}^{n} P(w_i \mid d) \quad (5)$$

where w1, ..., wn is the concept-radicel sequence of key words after dealing with the query sentence Q.

Assume article D consists of M sections L1, L2, ..., Lm; then we can compute the peak value of the query sentence's probability, taking the maximal value over these M sections as the value of D:

$$P_{L^1}(Q\mid d) = \max_{L_j \in d} P_{L^1}(Q\mid L_j) \approx \max_{L_j \in d} \prod_{i=1}^{n} P(w_i \mid L_j) \quad (6)$$

where P_{L^1}(Q | d) represents the probability of the article.

1) Model Deducing: In this model we deduce the value of P(wi | d). From the above results, each concept radicel matches a Related Word Structure. This structure consists of two parts: the key word qi of the query sentence and the related word list Rqi of qi.

Theoretically, qi and Rqi can represent the concept radicel in the query sentence. But it is difficult to make sure which items in Rqi have the same meaning as qi in the context of the query sentence. We therefore bring in the relation set Rdi, which represents the probability that each element in Rqi has the same meaning as qi.

At the same time, qi is the word chosen artificially to represent the concept radicel, and by default the user's choice is the best suitable word. Thus qi gets the largest part when computing the probability of wi. Putting these together, we get the estimation formula of P(wi | d):

$$P(w_i\mid L_j) = \lambda P(q_i\mid L_j) + (1-\lambda)\, P(R_{q_i}\mid L_j) \quad (7)$$

where λ is a parameter with value range [0, 1]; the larger its value, the larger the weight of qi in the computation. P(Rqi | Lj) is the sum of the probabilities of all the elements. Each element in Rqi has a different degree of relatedness to qi in the context of the query sentence, so it should be multiplied by the corresponding value of Rdi:

$$P(R_{q_i}\mid L_j) = \sum_{k=1}^{|R_{q_i}|} R_{d_i}[k]\; P(R_{q_i}[k]\mid L_j) \quad (8)$$

where |Rqi| is the length of Rqi, Rqi[k] is the k-th element of Rqi, and Rdi[k] is the k-th element of Rdi.

Relating this to the correlation computation, Rdi[k] can be written as

$$R_{d_i}[k] = P(R_{q_i}[k]\mid q_1^b) \quad (9)$$

where q1, ..., qb represent the word set of context words after ordering and truncation. The sum of Rdi may be larger than 1, so it needs to be normalized to 1 in order to make sure that the sum of the correlations is 1 and P(Rqi | d) is a legal probability. Then

$$\mathrm{total} = \sum_{k=1}^{|R_{d_i}|} R_{d_i}[k], \qquad R_{d_i}[k] = \frac{R_{d_i}[k]}{\mathrm{total}}, \quad k = 1 \ldots |R_{d_i}| \quad (10)$$

Thus we have

$$P(w_i\mid L_j) = \lambda P(q_i\mid L_j) + (1-\lambda)\sum_{k=1}^{|R_{q_i}|} R_{d_i}[k]\; P(R_{q_i}[k]\mid L_j) \quad (11)$$

$$P(Q\mid L_j) \approx \prod_{i=1}^{n}\Big(\lambda P(q_i\mid L_j) + (1-\lambda)\sum_{k=1}^{|R_{q_i}|} R_{d_i}[k]\; P(R_{q_i}[k]\mid L_j)\Big) \quad (12)$$

2) Model Glide: In the section language model we take the section as the statistical unit, but this brings a serious data sparseness problem: not all the related words can be found in one section of an article, though they can be found in several sections. Here we use a glide technology similar to the deleted-interpolation model.

When gliding, the probability is computed not only on a single section but on a window. The size of this window is variable: it is increased by 1 after each computation, until it covers the whole article. Here we need to abate the value of a single section using a rebate rate η; that is, the whole probability before a section is added is multiplied by the rebate rate. η expresses the following: for two articles A and B, if all the query concept radicels appear in some section of A but with small frequency, while in B no single section contains them all but they appear in several other sections with a large frequency, then obviously, when η is larger the correlation of A is higher, and when η is smaller the correlation of B is higher. It takes a lot of practice to confirm a suitable η. Then

$$P(Q\mid d) = \eta\, P_{L^1}(Q\mid d) + \frac{1-\eta}{m}\sum_{k=2}^{m} P_{L^k}(Q\mid d) \quad (13)$$

Here P_{L^1} means the size of the window is 1 section, and P_{L^k} means the size of the window is k sections. In equation (13), η is the rebate rate and the remainder rate is (1 − η); after each abatement, each window takes 1/m of the remainder rate. Because only one section can be added after each abatement, while the remainder rate is reduced by (1 − η)/m each time, this is good for short articles but not for long ones. Consider this situation: of two articles, one has more sections while the other has fewer. Suppose the concept radicels are distributed evenly, but not all of them can be found in any N sections, only throughout the whole article. In this case, by using this glide arithmetic, the article with fewer sections gets a higher correlation, although the two should usually be close. Thus the arithmetic should be improved. We modify it by adding m/s sections at a time:

$$P(Q\mid d) = \eta\, P_{L^1}(Q\mid d) + \frac{1-\eta}{s}\sum_{k=2}^{s} P_{L^{(k-1)m/s}}(Q\mid d) \quad (14)$$

Here m is the number of sections of the article and s is the number of interpolation steps.

3) Parameter Estimating: Five parameters need to be estimated: λ, P(qi | Lj), P(Rqi[k] | Lj), η and s. We take the maximum likelihood estimation method for P(qi | Lj) and P(Rqi[k] | Lj), and the EM arithmetic to estimate λ, η and s; in order to simplify the model, we estimate them in Section IV.
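To make the scoring procedure of equations (6)-(8) concrete, here is a hedged Python sketch; the data layout and names are ours, add-one smoothing stands in for the Good-Turing/Katz scheme of Section II, and the glide of equations (13)-(14) is omitted for brevity.

```python
from collections import Counter
import math

def p_word(w, section, vocab_size):
    """Add-one smoothed word probability in one section (a stand-in smoother)."""
    return (Counter(section)[w] + 1) / (len(section) + vocab_size)

def p_wi(qi, Rq, Rd, section, vocab_size, lam=0.6):
    """Eqs. (7)-(8): interpolate qi with its (normalized) related words."""
    p_rel = sum(rd * p_word(r, section, vocab_size) for r, rd in zip(Rq, Rd))
    return lam * p_word(qi, section, vocab_size) + (1 - lam) * p_rel

def score_article(query, Rq_table, Rd_table, sections, vocab_size):
    """Eq. (6): the article score is the best section score."""
    return max(
        math.prod(p_wi(q, Rq_table[q], Rd_table[q], sec, vocab_size)
                  for q in query)
        for sec in sections)
```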


B. Bigram Section Language Model

Although the Unigram Section Language Model is simple, it supposes that every concept radicel is independent, which does not accord with the truth. Actually, according to link grammar, each word in a query sentence is related to at least one other word. If a word is related to the word in front of it, the language model is a Bigram Section Language Model:

$$P(Q\mid d) = P(w_1, w_2, \ldots, w_n\mid d) \approx P(w_1\mid d)\prod_{i=2}^{n} P(w_i\mid w_{i-1}, d) \quad (15)$$

If we take the section as the unit of statistics, then, the same as for the Unigram Section Language Model, the probability of the whole article is the maximal value over its sections:

$$P_{L^1}(Q\mid d) = \max_{L_j\in d} P_{L^1}(Q\mid L_j) \approx \max_{L_j\in d}\Big(P(w_1\mid L_j)\prod_{i=2}^{n} P(w_i\mid w_{i-1}, L_j)\Big) \quad (16)$$

As in the Unigram Section Language Model, when computing the posterior probability of the concept radicel X represented by qi−1, given qi or Rqi, we need to consider which word is suitable to represent X. A better way is to compute the posterior probability according to the proportions of the words in qi−1 and Rqi−1 and then sum them; this gives a more exact value, but needs very complex computing. We simplify it by using qi−1 alone to get the posterior probability:

$$P(w_i\mid w_{i-1}, L_j) = \lambda P(q_i\mid q_{i-1}, L_j) + (1-\lambda)\, P(R_{q_i}\mid q_{i-1}, L_j),\qquad P(R_{q_i}\mid q_{i-1}, L_j) = \sum_{k=1}^{|R_{q_i}|} R_{d_i}[k]\; P(R_{q_i}[k]\mid q_{i-1}, L_j) \quad (17)$$

There is much more data sparseness in the bigram section language model, so besides the modifications of the unigram section language model we also need to modify P(wi | wi−1, Lj) for data smoothness. For this we can take the Good-Turing technique or the back-off glide technique; we do not expound on it any more here.

In the parameter estimation, the parameters needing estimation are λ, P(qi | qi−1, Lj), P(Rqi[k] | Lj), η and s. For λ, η and s we take the same method as for the unigram model, but it is different to estimate P(qi | qi−1, Lj) and P(Rqi[k] | Lj), that is:

$$P(q_i\mid q_{i-1}, L_j) = \frac{C(q_i, q_{i-1}, L_j)}{C(q_i, L_j)} \quad (18)$$

Define C(qi, qi−1, Lj) as the number of times qi and qi−1 occur together within window Lj. This is different from the traditional bigram language model: after deleting the meaningless words of the query sentence, qi−1 and qi will not assuredly be adjacent, so we do not compute the co-occurrence frequency as in the traditional model; a range for the co-occurrence of qi−1 and qi is needed. Considering that the probability of a correlation in meaning between words of one sentence is large, that is, if qi−1 and qi occur together both in the query sentence and in a sentence of an article, this is to a large degree not an accident; it shows some correlation in meaning between them and the query sentence, and thus estimates the probability of the query sentence exactly, which is the core of a language model. If they do not occur together in the same sentence but only in a window or even an article, the chance of co-occurrence of qi−1 and qi increases greatly, which finally influences the estimate. This is why we take the sentence as the range of co-occurrence of qi−1 and qi. A sketch of this scoring appears below.
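The following sketch illustrates equations (15)-(18) under the same simplified layout as the unigram sketch above. As argued in this subsection, C(qi, qi−1, Lj) is counted over single sentences of the window; the probability floor and all names are our assumptions.

```python
def p_bigram(qi, q_prev, window_sentences):
    """Eq. (18): C(qi, q_{i-1}, Lj) / C(qi, Lj), counted over sentences."""
    c_pair = sum(1 for s in window_sentences if qi in s and q_prev in s)
    c_qi = sum(1 for s in window_sentences if qi in s)
    return c_pair / c_qi if c_qi else 1e-6

def score_bigram_section(query, window_sentences, p_unigram):
    """Eqs. (15)-(16) for one section: P(w1|Lj) * prod_i P(wi|w_{i-1}, Lj)."""
    p = p_unigram(query[0])
    for prev, cur in zip(query, query[1:]):
        p *= max(p_bigram(cur, prev, window_sentences), 1e-6)
    return p

# Usage: pass any unigram estimator, e.g. a closure over one section.
sentences = [["china", "economic", "development"], ["economic", "growth"]]
print(score_bigram_section(["economic", "development"], sentences,
                           lambda w: 0.1))
```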
IV. TEST AND RESULT ANALYSIS

A. Testing Data

The testing data used to evaluate the performance of the section language model come from the open data of the People's Daily of China. 3238 articles, 873612 word tokens and 56445 different words are involved in our test. The average length of the articles is 269 words.

B. Testing Results of the Unigram Section Language Model

In this subsection, the setting of the parameters in the model deducing and model glide of the unigram section language model is studied in detail.

1) Confirming λ: Take "the economic development of China" as an example to examine the relationship of λ with the interpolated veracity (precision) and the recall rate. Fig. 1 shows the graph of the average interpolated veracity of the retrieval model when λ takes different values.

From Fig. 1 we can see that the average interpolated veracity increases when λ becomes larger. It also shows that the words in the query sentence can mostly express the concept of the retrieval; that is to say, the words the user picks out must express the meaning of the query most definitely. The veracity of the average interpolated value decreases abruptly when λ is 1.0. This is because the recall rate is then very low (only 30%). It shows that, when constructing the language model, if we only consider the user's query sentence and do not consider the contribution of the related words, the great mass of connected articles will not be recalled.

In the testing, we found that when the veracity of the average interpolated value reaches its peak, the value of λ is always above 0.6. In most cases, the veracity changes little when λ is changed from 0.6 to 0.9. So in this paper the value of λ is set to 0.6.

2) The Parameter Setting: In order to smooth the section language model, we take a back-off-like technology. First we consider only one section, then two sections, several sections, and finally all the sections; in the end, the article's probability can be determined. In this procedure two parameters should be considered: η, denoting the back-off quotient, and S, denoting the number of back-off steps. In our testing, the value of η is set to 0.2 and s is set to 3.

C. Testing of the Unigram Section Language Model and the Traditional Unigram Language Model

In the traditional language model, the model depends on the whole article, the statistics are also carried out over the whole article, and the contribution of the related words is not considered. Setting λ to 1 removes the related words; setting η to 0 removes the contribution of the section frequency; setting s to 1 means the window is the whole article. Then the section language model reduces to a traditional language model. Fig. 2 shows the veracity-recall graphs of these two models.

From Fig. 2 we can see that the unigram section language model achieves a higher recall rate, and it also increases the veracity greatly. The traditional language model, which does not consider the contribution of the related words, creates the language model only from the words in the query sentence. It is clear that only half of the information related to the query would be returned by the traditional language model. Here, the veracity of the unigram section language model is from 0.5 to 0.6.

D. Comparison of the Unigram Section Language Model and the Bigram Section Language Model

The Unigram Section Language Model does not consider the relationships between the concept radicels of the query sentence. The Bigram Section Language Model considers each concept radicel as related only to the concept in front of it, which is a complement to the unigram model in some degree.

[Fig. 3. Recall rate-veracity graph of the Unigram/Bigram Section Language Models and the traditional language model (x-axis: Recalling Rate, 0.1-1.0; y-axis: Veracity, 0.0-1.1).]

Fig. 3 shows the recall rate-veracity graph of the Unigram Section Language Model, the Bigram Section Language Model and the traditional language model with a related vocabulary table. In the Unigram Section Language Model we set λ to 0.6, η to 0.2 and S to 3. In the Bigram Section Language Model, λ is set to 0.6, η to 0.3 and S to 3. In the traditional language model, λ is 0.6, η is 0 and S is 1.

The related vocabulary table affects the retrieval result greatly; the results are much better after adding the related words. The Bigram Section Language Model has better results than the Unigram Section Language Model.


[Fig. 1. The values of λ and the average insert-value veracity (x-axis: λ, 0.0-1.0; y-axis: Average Insert-Value Veracity, 0.100-0.600).]

[Fig. 2. Veracity-recall rate graph of the unigram section language model and the traditional language model (x-axis: Recalling Rate, 0.1-1.0; y-axis: Veracity, 0.000-1.100).]

However, in the Bigram Section Language Model the data sparseness is more serious; its recall rate is 90%. By using the related word table, the recall rate of the traditional language model is also improved greatly.

V. SUMMARY

This paper presents a new language model: the section language model. We propose two specific models of language, named the Unigram Section Language Model and the Bigram Section Language Model. A thorough study has been made of the correlative word set in those models. Simulations have been carried out to show our results.

REFERENCES
[1] Schank, R.C., "Conceptual Information Processing", North-Holland Publishing Company, 1975.
[2] Berger, A. and Lafferty, J., "Information Retrieval as Statistical Translation", Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 222-229, 1999.
[3] Good, I.J., "The population frequencies of species and the estimation of population parameters", Biometrika, 40, 237-264, 1953.
[4] Witten, I.H. and Bell, T.C., "The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression", IEEE Transactions on Information Theory, 37(3), 1085-1094, 1991.
[5] Jelinek, F. and Mercer, R.L., "Interpolated estimation of Markov source parameters from sparse data", Proceedings of Pattern Recognition in Practice, North-Holland, Amsterdam, 381-397, 1980.
[6] Katz, S.M., "Estimation of probabilities from sparse data for the language model component of a speech recognizer", IEEE Transactions on Acoustics, Speech and Signal Processing, 35(3), 400-401, 1987.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 508--511
Copyright © 2007 Watam Press

Markerless Tracking Based on Transiently Chaotic Neural Network with Invariant


Features for Augmented Reality System
XINYU LI
School of Automation Engineering
University of Electronic Science and Technology of China
Chengdu, Sichuan Province, China

Abstract: Tracking and registration of camera and object is one of the most important issues in Augmented Reality (AR) systems. Markerless visual tracking technologies based on image features are used in many AR applications. Feature-point-based neural network image matching methods have attracted considerable attention in recent years. This paper proposes an approach to feature point correspondence of image sequences based on transiently chaotic neural networks. Rotation- and scale-invariant features are first extracted from the images, and then a transiently chaotic neural network is used to perform global feature matching and carry out the initialization phase of the tracking. Experimental results demonstrate the efficiency and the effectiveness of the proposed method.

1 Introduction
Augmented Reality (AR) is an advanced technology for enhancing or augmenting a person's view of the real world with computer-generated "virtual" objects. AR systems overlay these "virtual" objects onto the real world to enrich users' visual experience, expand their visual system, and help them achieve their tasks in a more natural way. So it has a wide range of applications in human-computer interaction, industrial maintenance, multimedia computing, and medical and military training [1].

In order to overlay the "virtual" objects onto the real world, AR systems first need to know the accurate geometric relations between the real objects and the user's viewing position, to locate the "virtual" objects at suitable positions in the real world. When the user moves his or her head and viewpoint, the "virtual" objects must remain aligned with the three-dimensional (3-D) locations and orientations of the real objects. The accuracy of the alignment depends on the tracking technology. Vision-based tracking methods are regarded as the most common and promising solutions amongst many other technologies, mostly due to their accuracy as well as their flexibility and ease of use. AR tracking based on fiducial markers in the scene has been highly successful [2,3]. However, in some AR applications (e.g. maintenance in the automotive industry) it is hard to use markers. Furthermore, marker-based tracking is very sensitive to occlusions. So markerless visual tracking methods, which directly use natural features of the scene instead of markers, are much desired. When such features are used for tracking, the character of the features and the feature matching are the most important issues in the initialization phase of the tracking.

We propose an approach to increase the accuracy of feature matching in the initialization phase of the tracking. The algorithmic process can be divided into two stages. During the first, offline stage, features are extracted from the reference images by the Scale Invariant Feature Transform (SIFT); these are invariant to image scaling and rotation, and partially invariant to changes in illumination and viewpoint. In the second stage, the feature matching problem is first transformed into an optimization problem. Due to its capability for collective computation and parallel processing, feature matching can then be performed efficiently and reliably via a transiently chaotic neural network.

The remainder of the paper is organized as follows. First, the method for extracting scale- and rotation-invariant features is presented in Section 2. Section 3 explains in more detail how the feature matching based on the transiently chaotic neural network is carried out. Section 4 discusses some experimental results obtained with our AR system. Section 5 concludes the paper.

* Supporting fund: Project supported by the National Natural Science Foundation of China (60674077).

2 Feature Extraction
Feature point extraction is an important pre-processing step in image processing and computer vision for applications such as image registration and object recognition, among others. Many different algorithms have been developed over the past few years. These algorithms try to search for features that are relatively invariant to changes in orientation and lighting conditions, so that they can find the same features in other images with different backgrounds or points of view. David Lowe proposed an efficient algorithm, SIFT (Scale Invariant Feature Transform), to extract features that are local extrema in the scale-space pyramid built with difference-of-Gaussian (DoG) filters [4,5]. This algorithm was initially presented to the computer vision community a few years ago (1999), and more recently with some improvements (2004). Features detected by this algorithm are invariant to small affine image transforms and small changes in lighting, so they are quite robust compared to some of the other algorithms used to detect features for object recognition [6].

The main attractions of SIFT features are their distinctiveness and invariance, resulting in a high probability of correct matches across a wide range of image variations. In addition, many of these features, densely populating a typical image, can be efficiently extracted (see Fig. 1), making them suitable for recognition and tracking in the presence of occlusions, and generally for algorithms benefiting from a large number of feature matches.


In this section we give a brief overview of the feature extraction algorithm (see [4, 5] for more details).

[Fig. 1. SIFT features, shown as white arrows, extracted from a 320×240 image of a book.]

In the initial stage of the algorithm, a source image I(x, y) is convolved with a 2-dimensional Gaussian function G(x, y, σ) to build the scale space of the image, which is defined as a function L(x, y, σ):

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \quad (1)$$

where * is the convolution operation in x and y, and

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2+y^2)/(2\sigma^2)} \quad (2)$$

where σ represents the scale axis of the image.

To efficiently detect stable keypoint locations in scale space, scale-space extrema are found in the difference-of-Gaussian function computed from the convolved image pyramid as the difference of two nearby scales separated by a constant multiplicative factor k:

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \quad (3)$$

Once all potential keypoint candidates have been identified, the keypoints need to be checked for stability. Because of sensitivity to small amounts of noise, the potential keypoints with low contrast or poor localization are discarded.

Besides the image location and scale at which it was found, each stable feature is assigned an image orientation and a feature descriptor vector, which reflect local image properties. The feature orientation and descriptor vector are computed from gradient magnitudes and orientations.

The length of the descriptor vector varies depending on the number of orientation histograms used to accumulate the samples. The best results are achieved with 128 dimensions, though smaller values are acceptable. These descriptors are subsequently used in the process of feature matching.
3 Feature Matching by Neural Network


The output status of the neuron v_ij represents the matching state of points No. i and No. j: if i matches j, v_ij is set to 1; else v_ij is set to 0.

According to the energy function corresponding to image matching based on the HNN [8], we define a new energy function for the TCNN as follows:

$$E = -D\sum_{i=1}^{M}\sum_{k=1}^{N}\sum_{j=1}^{M}\sum_{l=1}^{N} C_{ikjl}\, v_{ik}\, v_{jl} + \frac{A}{2}\sum_{i=1}^{M}\Big(1 - \sum_{k=1}^{N} v_{ik}\Big)^2 + \frac{B}{2}\sum_{k=1}^{N}\Big(1 - \sum_{i=1}^{M} v_{ik}\Big)^2 \quad (7)$$

where D, A/2 and B/2 respectively denote the weight coefficients of each constraint; v_ik denotes the matching degree of point k of the scene image feature point set with point i of the model image feature point set, corresponding to the output state of a neuron. If the i-th node of the model image matches the k-th node of the scene image, v_ik will be one; otherwise it will be zero.

The first term of (7) uses the information of the relational properties between the two images. The second and third terms are uniqueness constraints which force at most one neuron to be active in each column and row of the network. C_ikjl is the connection weight between neurons v_ik and v_jl, defined as follows:

$$C_{ikjl} = w_1\sum_{m=1}^{128}\big(f_{i,m} - f_{k,m}\big)^2 + w_2\sum_{m=1}^{128}\big(f_{j,m} - f_{l,m}\big)^2 \quad (8)$$

where w1 and w2 are weight coefficients; f_{i,m} and f_{j,m} are the values of the m-th descriptor component of the i-th and j-th SIFT feature in the model image, and f_{k,m} and f_{l,m} are the values of the m-th descriptor component of the k-th and l-th SIFT feature in the scene image.

The derivative of equation (7) with respect to v_ik is as follows:

$$\frac{\partial E}{\partial v_{ik}} = -D\sum_{j=1}^{M}\sum_{l=1}^{N} C_{ikjl}\, v_{jl} - A\Big(1 - \sum_{k=1}^{N} v_{ik}\Big) - B\Big(1 - \sum_{i=1}^{M} v_{ik}\Big) \quad (9)$$

From (5) and (9), the following equation of motion can be obtained:

$$y_i(t+1) = k\,y_i(t) - z_i(t)\,x_i(t) + \alpha\Big(D\sum_{j=1}^{M}\sum_{l=1}^{N} C_{ikjl}\, v_{jl} + A\Big(1 - \sum_{k=1}^{N} v_{ik}\Big) + B\Big(1 - \sum_{i=1}^{M} v_{ik}\Big)\Big) + z_i(t)\, I_0 \quad (10)$$

When the network converges to a stable state, the statuses of the neurons give the matching results. Thus the TCNN algorithm for SIFT feature matching can be summarized in the following steps:
1. SIFT features are extracted from the model image and the scene image respectively.
2. Set the initial state of the network and update the state of the network until a stable status is obtained.

4 Experiment Results
To collect input images, an IBM ThinkPad R50e with a Pentium 4 processor (1.6 GHz) and a Logitech QuickCam IM USB camera are used in the experiments. Fig. 2(a) shows the model image of a book, which is the tracked object; the SIFT algorithm found 356 features in this image. Fig. 2(b) shows the scene image, in which the tracked object is placed in a different environment with changes in illumination, scale, rotation and perspective.

[Fig. 2. (a) Model image; (b) scene image.]

During the matching process, the network parameters in equations (4), (6) and (9) are set as: A = B = D = 1, k = 0.9, ε = 1/250, I_0 = 0.65, α = 1.5. The feature matching results are shown in Fig. 3, where the correspondences are connected by white lines. The number of matched features is 56, and the match probability is 90.3%.

[Fig. 3. The matching results.]

Feature matching is only the initialization phase of the markerless tracking. Matched pairs are verified by fitting an affine transform to remove the remaining outliers. From the remaining inliers, a homography matrix between the current and the reference image is calculated by RANSAC (Random Sample Consensus). Through the homography matrix, the motion parameters of the camera can be worked out and the registration of the virtual objects can be finished.
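The following sketch of ours wires equations (8)-(10) together for random stand-in descriptors. Note that C_ikjl as defined in equation (8) is a descriptor distance, so a practical implementation may negate it to reward similar pairs; we follow the text literally and flag that choice in a comment. The quoted experimental settings are used where available; w1, w2, the iteration count and the seed are assumptions.

```python
import numpy as np

def tcnn_match(Fm, Fs, steps=300, A=1.0, B=1.0, D=1.0, k=0.9,
               alpha=1.5, beta=0.002, eps=1/250, I0=0.65, z0=0.1,
               w1=0.5, w2=0.5, seed=0):
    """Eqs. (8)-(10) on an M x N neuron array; Fm, Fs are descriptor matrices."""
    rng = np.random.default_rng(seed)
    M, N = len(Fm), len(Fs)
    # Pairwise squared descriptor distances; eq. (8) composes C_ikjl from
    # dist[i, k] and dist[j, l]. Negate dist if a similarity is intended.
    dist = ((Fm[:, None, :] - Fs[None, :, :]) ** 2).sum(axis=2)
    y = 0.01 * rng.standard_normal((M, N))
    z = np.full((M, N), z0)
    for _ in range(steps):
        v = 1.0 / (1.0 + np.exp(-y / eps))                  # eq. (4)
        coup = w1 * dist * v.sum() + w2 * (dist * v).sum()  # sum_jl C_ikjl v_jl
        row = 1.0 - v.sum(axis=1, keepdims=True)            # A-term of eq. (10)
        col = 1.0 - v.sum(axis=0, keepdims=True)            # B-term of eq. (10)
        y = k * y - z * (v - I0) + alpha * (D * coup + A * row + B * col)
        z *= (1.0 - beta)                                   # eq. (6)
    return v > 0.5   # thresholded match matrix

# Usage: matches = tcnn_match(np.random.rand(20, 128), np.random.rand(30, 128))
```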


Fig. 4 shows a virtual teapot placed on a book, the tracked object, in the model desk scene.

[Fig. 4. Virtual object overlaid on the real scene.]

5 Conclusions and Future Work
This paper presented a new approach to feature matching. Features are extracted from the reference images by SIFT; these are invariant to image scaling and rotation, and partially invariant to changes in illumination and viewpoint. A TCNN is used to optimize the computing process of feature matching, which is treated as the minimization of an energy function. Robust matching is proved by the experimental results.

In future work, we are going to decrease the computational cost of the SIFT feature extraction and use the method in real-time augmented reality systems.

References
[1] Azuma, Ron; Baillot, Yohan; Behringer, Reinhold; Feiner, Steven; Julier, Simon and MacIntyre, Blair, Recent advances in Augmented Reality, IEEE Computer Graphics and Applications, 21(6), (2001), 34-47.
[2] H. Kato and M. Billinghurst, Marker tracking and HMD calibration for a video-based augmented reality conferencing system, Proc. IWAR '99, (1999), 85-94.
[3] X. Zhang, S. Fronz, and N. Navab, Visual marker detection and decoding in AR systems: a comparative study, Proc. ISMAR '02, (2002), 97-106.
[4] D. G. Lowe, Object recognition from local scale-invariant features, Proceedings of the 7th International Conference on Computer Vision, (1999), 1150-1157.
[5] Lowe, D.G., Distinctive image features from scale-invariant keypoints, IJCV, 60, (2004), 91-110.
[6] Yan Ke and Rahul Sukthankar, PCA-SIFT: A more distinctive representation for local image descriptors, CVPR, (2004), 506-513.
[7] Hopfield, J.J., Tank, D.W., Neural computations of decisions in optimization problems, Biol. Cybern., 52, (1985), 141-152.
[8] Shi, Z., Huang, S., Feng, Y., Artificial neural network image matching, Microelectronics and Computer, 20, (2003), 18-21.
[9] Wen-Jing Li, Tong Lee, Hopfield neural networks for affine invariant matching, IEEE Transactions on Neural Networks, 12(6), (2001), 1400-1410.
[10] Chen, L., Aihara, K., Chaotic simulated annealing by a neural network model with transient chaos, Neural Networks, 8(6), (1995), 915-930.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 512--521
Copyright © 2007 Watam Press

Identification, Filtering and Control of Nonlinear Plants by


Recurrent Neural Networks Using First and Second Order
Algorithms of Learning
IEROHAM S. BARUCH, SAUL ESCALANTE M., CARLOS R. MARIACA GASPAR
Department of Automatic Control, CINVESTAV-IPN, Mexico City
Abstract—The paper proposes a new Recurrent Neural Network (RNN) model for systems identification and state estimation of nonlinear plants. The RNN model is learned both by the Backpropagation and by the recursive Levenberg-Marquardt (L-M) learning algorithm. The estimated states of the RNN model are used for direct adaptive trajectory tracking control. The system also contains a noise-rejection plant output filter. The applicability of the proposed neural control system is confirmed by simulation results with SISO and MIMO plants, where comparative results obtained by both learning algorithms are given. The results show good convergence of both algorithms, with priority to the recursive L-M algorithm.

Index Terms—Backpropagation learning, Levenberg-Marquardt learning, nonlinear plants, recurrent trainable neural network, noise rejection plant output filter.

I. INTRODUCTION

THE recent advances in the understanding of the working principles of artificial neural networks (ANN) and the rapid growth of available computational resources have led to the development of a wide number of ANN-based modelling, identification, prediction and control applications, [1]-[6], especially in the field of mechanical engineering and robotics. The ability of ANNs to approximate complex nonlinear relationships without prior knowledge of the model structure makes them a very attractive alternative to classical modeling and control techniques. Many of the applications currently reported are based on the classical Nonlinear Autoregressive Moving Average (NARMA) model, where a Feedforward Neural Network (FFNN) is implemented, [4], [5], [6]. However, the FFNN has in general a static structure; it is therefore adequate to approximate mainly static (nonlinear) relationships, and its real-time application to dynamical systems requires the introduction of external time-delayed feedbacks, [5]. The application of the FFNN to modeling, identification and control of nonlinear dynamic plants caused some problems, which can be summarized as follows: 1. The dynamic systems modeling is usually based on the NARMA model, which needs some information on the input/output model orders, and input and output tap-delays ought to be used, [5], [6]; 2. The FFNN application to Multi-Input Multi-Output (MIMO) systems identification needs some relative-order structural information, [6]; 3. The ANN model structure ought to correspond to the structure of the identified plant, where four different input/output plant models are used, [5]; 4. The lack of universality in ANN architectures causes some difficulties in learning, and a Backpropagation-through-time learning algorithm needs to be used, [7]; 5. Most NARMA-based ANN models are sequential in nature and introduce relative plant-dependent time delays; 6. Most of the ANN-based models are nonparametric ones, [5], and so are not applicable for indirect adaptive control systems design; 7. All these ANN-based models do not perform state and parameter estimation at the same time, [4]; 8. All these models are appropriate only for the identification of nonlinear plants with smooth, single, odd, nonsingular nonlinearities.

Recurrent Neural Networks (RNN) possess internal time-delayed feedbacks, so they are a promising alternative for system identification and control, particularly when the task is to model dynamical systems, [2], [3], [4], [7]. Their main advantage is the reduced complexity of the network structure. However, the analysis of the state of the art in the area of classical RNN-based modeling and control has also shown some inherent limitations, as follows: 1. The RNN input vector consists of a number of past system inputs and outputs, and there is no systematic way to define the optimal number of past values, [4]; usually the method of trial and error is applied; 2. The RNN model is naturally formulated as a discrete model with a fixed sampling period; therefore, if the sampling period is changed, the network has to be trained again; 3. It is assumed that the plant order is known, which represents a quite strong modeling assumption in general, [5].

The M.S. student Saul-Fernando Escalante Magana and the Ph.D. student Carlos-Roman Mariaca-Gaspar are thankful to CONACYT, Mexico for the fellowships received during their studies at CINVESTAV-IPN, Mexico.
I. S. Baruch is with the Department of Automatic Control, CINVESTAV-IPN, ave. IPN No 2508, Col. Zacatenco, A.P. 14-740, 07360 Mexico D. F., Mexico (telephone: (+52-55) 5061-3800/ext. 42-29, e-mail: baruch@ctrl.cinvestav.mx).

Driven by these limitations, a new Recurrent Trainable Neural Network (RTNN) topology and a recursive Backpropagation (BP) type learning algorithm in vector-matrix form were derived, [8]-[12], and their convergence was studied, [9], [10]. But the recursive BP algorithm, applied for RTNN learning, is a gradient-descent first-order learning algorithm, which does not permit augmenting the precision and accelerating the learning. So the aim of the paper is to apply for RTNN learning a second-order algorithm such as the Levenberg-Marquardt (L-M) algorithm, [13]-[15]. The RTNN with L-M learning will be applied for SISO/MIMO plants identification, filtering and control.

II. TOPOLOGY AND LEARNING OF THE RTNN

A. RTNN topology
The RTNN model and its learning algorithm of dynamic Backpropagation type, together with the explanatory figures and stability proofs, are described in [9], [10]. The RTNN topology, given in vector-matrix form (see Fig. 1), is described by the following equations:

$$X(k+1) = A\,X(k) + B\,U(k) \quad (1)$$
$$Z(k) = G[X(k)] \quad (2)$$
$$Y(k) = F[C\,Z(k)] \quad (3)$$
$$A = \mathrm{block\text{-}diag}(a_{ii});\quad |a_{ii}| < 1 \quad (4)$$

where Y, X and U are l-, n- and m-dimensional output, state and input vectors; A = block-diag(a_ii) is an (n×n) state block-diagonal weight matrix; a_ii is the i-th diagonal block of A, of (1×1) or (2×2) dimension. Equation (4) represents the local stability condition imposed on all blocks of A; B and C are (n×m) and (l×n) input and output weight matrices; G and F are vector-valued sigmoid or hyperbolic-tangent activation functions; k is a discrete-time integer variable.

[Fig. 1. Block-diagram of the RTNN topology]
[Fig. 2. Block-diagram of the adjoint RTNN]

B. Backpropagation RTNN learning
The general BP learning algorithm is given by:

$$W(k+1) = W(k) + \eta\,\Delta W(k) + \alpha\,\Delta W(k-1) \quad (5)$$

where W is the weight matrix being modified (A, B, C); ΔW is the weight matrix correction (ΔA, ΔB, ΔC); η and α are learning rate parameters. Applying the diagrammatic method derived in [16] and using the block diagram of the RTNN topology (see Fig. 1), we can construct an error-predictive adjoint RTNN, which is given in Fig. 2. Following this adjoint RTNN block diagram, we can derive the following RTNN weight updates, [9], [10], given in matrix-vector form:

$$\Delta C(k) = E_1(k)\,Z^T(k) \quad (6)$$
$$E_1(k) = F'[Y(k)]\,E(k);\qquad E(k) = Y_p(k) - Y(k) \quad (7)$$
$$\Delta B(k) = E_3(k)\,U^T(k) \quad (8)$$
$$\Delta A(k) = E_3(k)\,X^T(k) \quad (9)$$
$$E_3(k) = G'[Z(k)]\,E_2(k);\qquad E_2(k) = C^T E_1(k) \quad (10)$$
$$\Delta vA(k) = E_3(k)\odot X(k) \quad (11)$$

where ΔA, ΔB, ΔC are the weight corrections of the learned matrices A, B, C, respectively; E is the error vector of the output RTNN layer, where Yp is the desired target vector and Y is the RTNN output vector, both of dimension l; X is the state vector, and E1, E2, E3 are error vectors, illustrated in Fig. 2; F', G' are diagonal Jacobian matrices of appropriate dimensions, whose elements are derivatives of the activation functions. Equation (9) represents the learning of the feedback weight matrix of the hidden layer when it is supposed to be a full (n×n) matrix. Equation (11) gives the learning solution when this matrix is diagonal, stored as the vector vA and updated element-wise, which is our case. The stability and the properties of the BP-RTNN learning algorithm given by equations (5)-(11) are proved by one theorem and two lemmas, [9], [10].

C. Theorem of Stability of the BP-RTNN learning
Let the RTNN with Jordan canonical structure, [9], be given by equations (1), (2), (3), (4), and let the nonlinear plant model, [9], be as follows:

$$X_d(k+1) = H[X_d(k), U(k)] \quad (12)$$
$$Y_d(k) = S[X_d(k)] \quad (13)$$

where {Yd(.), Xd(.), U(.)} are output, state and input variables with dimensions l, nd, m, respectively, and H(.), S(.) are vector-valued nonlinear functions of respective dimensions. Under the assumption of RTNN identifiability made, the application of the BP learning algorithm for A(.), B(.), C(.), in general matricial form, described by equations (5)-(11), with the learning rates η(k), α(k) (here considered as time-dependent and normalized with respect to the error), is derived using the following Lyapunov function, [9], [10]:

$$L(k) = \|J(k)\|^2 + \|B(k)\|^2 + \|C(k)\|^2 \quad (14)$$

Then the identification error is bounded, i.e.:

$$\Delta L(k) \le -\eta(k)\,|E(k)|^2 - \alpha(k)\,|E(k-1)|^2 + d;\qquad \Delta L(k) = L(k) - L(k-1) \quad (15)$$

where all unmodelled dynamics, approximation errors and perturbations are represented by the d-term. The complete proof of the stability theorem and the two lemmas is given in [9], [10].


Where all: unmodelled dynamics, approximation errors and
perturbations, are represented by the d-term. The complete ª ’Y T [W (k )] º
proof of the stability theorem and two lemmas are given in :T [W (k )] « »;
[9], [10]. ¬ 0 " 1 " 0 ¼
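For readers who want to experiment with the updates above, the following is a minimal NumPy sketch of the forward pass (1)-(3) and the BP corrections (5)-(11). It assumes tanh for both activations F and G and keeps the feedback matrix diagonal, i.e. the case covered by (11); the dimensions, learning rates and the helper name step are illustrative, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the RTNN forward pass (1)-(3) and BP updates (5)-(11).
n, m, l = 2, 1, 1            # state, input, output sizes (illustrative)
eta, alpha = 0.01, 0.001     # learning rate and momentum of eq. (5)

rng = np.random.default_rng(0)
A = rng.uniform(-0.9, 0.9, n)            # diag(a_ii) with |a_ii| < 1, eq. (4)
B = rng.normal(scale=0.1, size=(n, m))
C = rng.normal(scale=0.1, size=(l, n))
X = np.zeros(n)
dA_old, dB_old, dC_old = np.zeros(n), np.zeros((n, m)), np.zeros((l, n))

def step(U, Yp):
    """One forward pass and one BP correction for a sample pair (U, Yp)."""
    global X, A, B, C, dA_old, dB_old, dC_old
    X = A * X + B @ U                    # eq. (1); diagonal A acts elementwise
    Z = np.tanh(X)                       # eq. (2)
    Y = np.tanh(C @ Z)                   # eq. (3)
    E = Yp - Y                           # output error, eq. (7)
    E1 = (1.0 - Y**2) * E                # F'[Y]E, diagonal Jacobian of tanh
    E2 = C.T @ E1                        # eq. (10)
    E3 = (1.0 - Z**2) * E2               # G'[Z]E2
    dC, dB, dA = np.outer(E1, Z), np.outer(E3, U), E3 * X  # eqs. (6), (8), (11)
    C += eta * dC + alpha * dC_old       # generalized delta rule, eq. (5)
    B += eta * dB + alpha * dB_old
    A = np.clip(A + eta * dA + alpha * dA_old, -0.99, 0.99)  # keep (4) valid
    dA_old, dB_old, dC_old = dA, dB, dC
    return Y, E
```

A call such as `step(np.array([0.5]), np.array([0.3]))` performs one learning iteration on a single input/target pair.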
D. Recursive Levenberg-Marquardt RTNN learning
The general recursive L-M algorithm of learning, [13]-[15], is given by the following equations:

W(k+1) = W(k) + P(k) ∇Y[W(k)] E[W(k)]    (16)
Y[W(k)] = g[W(k), U(k)]    (17)
E²[W(k)] = {Yp(k) − g[W(k), U(k)]}²    (18)
DY[W(k)] = ∂g[W, U(k)]/∂W |_{W = W(k)}    (19)

Where: W is the general weight matrix (A, B, C) under modification; P is the covariance matrix of the weight estimates, being updated; DY is an nw-dimensional gradient vector; Y is the RTNN output vector, which depends on the updated weights and the input; E is an error vector; Yp is the plant output vector, which is in fact the target vector. Following the same RTNN adjoint block diagram, we could obtain the values of DY for each updated weight, propagating D = 1 through it (see Fig. 3). Following the block diagram of Fig. 3, we could apply equation (19) for each element of the weight matrices (A, B, C) to be updated. The corresponding gradient components are as follows:

DY[C_ij(k)] = D1,i(k) Z_j(k)    (20)
D1,i(k) = F′_i[Y_i(k)]    (21)
DY[A_ij(k)] = D2,i(k) X_j(k)    (22)
DY[B_ij(k)] = D2,i(k) U_j(k)    (23)
D2,i(k) = G′_i[Z_i(k)] C_i D1,i(k)    (24)

So the Jacobian matrix could be formed as:

DY[W(k)] = [DY(C_ij(k)), DY(A_ij(k)), DY(B_ij(k))]    (25)

The P(k) matrix is computed recursively by the equation:

P(k) = α⁻¹(k){P(k−1) − P(k−1) Ω[W(k)] S⁻¹[W(k)] Ωᵀ[W(k)] P(k−1)}    (26)

Where the S(·) and Ω(·) matrices are given as follows:

S[W(k)] = α(k) Λ(k) + Ωᵀ[W(k)] P(k−1) Ω[W(k)]    (27)
Ωᵀ[W(k)] = [∇Yᵀ[W(k)]; 0 ⋯ 1 ⋯ 0];  Λ(k)⁻¹ = diag(1, ρ);  10⁻⁶ ≤ ρ ≤ 10⁻⁴;  0.97 ≤ α(k) ≤ 1;  10³ ≤ P(0) ≤ 10⁶    (28)

The matrix Ω(·) has dimension (nw×2), where the second row has only one unity element (the others are zero). The position of that element is computed by:

i = k mod(nw) + 1;  k > nw    (29)

As the recursive L-M algorithm is based on the Newton method of optimization, it does not need a stability proof. Next, the topology and learning given above are applied for SISO and MIMO systems identification and control.

Fig. 3. Block-diagram of the adjoint RTNN, used for the L-M algorithm
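The recursion (16), (26)-(29) can be transcribed as follows; this is an illustrative NumPy sketch, not the authors' code. The helper name rlm_step is hypothetical, a scalar plant output is assumed, and ρ, α(k) and P(0) are chosen inside the ranges quoted in (28).

```python
import numpy as np

def rlm_step(w, P, grad_y, error, k, alpha=0.99, rho=1e-5):
    """One recursive Levenberg-Marquardt step, eqs. (16) and (26)-(29).

    w      : flattened weight vector (nw,)
    P      : covariance matrix of the weight estimates (nw, nw)
    grad_y : gradient DY of the output w.r.t. the weights, eq. (19)
    error  : scalar output error Yp - Y, cf. eq. (18)
    """
    nw = w.size
    # Omega has two columns: the output gradient and one rotating unity
    # element, eqs. (28)-(29) (index kept 0-based in this sketch)
    i = k % nw
    omega = np.zeros((nw, 2))
    omega[:, 0] = grad_y
    omega[i, 1] = 1.0
    lam_inv = np.diag([1.0, rho])                 # Lambda(k)^{-1}, eq. (28)
    S = alpha * np.linalg.inv(lam_inv) + omega.T @ P @ omega   # eq. (27)
    P_new = (P - P @ omega @ np.linalg.inv(S) @ omega.T @ P) / alpha  # eq. (26)
    w_new = w + P_new @ grad_y * error            # eq. (16)
    return w_new, P_new

# typical initialization from (28): P(0) = p0*I with 1e3 <= p0 <= 1e6
nw = 10
w, P = np.zeros(nw), 1e4 * np.eye(nw)
```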


III. DIRECT ADAPTIVE NEURAL CONTROL OF MIMO SYSTEMS

The block-diagram of the control system is given on Fig. 4. It contains a recurrent neural identifier RTNN-1, two neural controllers (feedback and feedforward) RTNN-2, RTNN-3, and a low-pass noise rejection filter. In the direct adaptive neural control, the weight parameters of the feedback and feedforward controllers are learned so as to minimize the cost function, which is the reference-tracking quadratic instantaneous error of the plant output. The structure of the closed-loop system contains a neural identifier issuing an estimated state vector to the feedback neural plant dynamics compensator. The control feedback signal is added to the signal of the feedforward neural controller. The feedforward controller represents in fact an inverse model of the feedback closed-loop system and depends on the reference signal. The system is completed by a low-pass filter, which is aimed to reject the plant and measurement noises. So, the RTNN-1 identifies the combined dynamics of the plant and the filter, and estimates the states of this complex dynamic system. The plant and the filter, including also the input and output noises, are described by the following models:

Fig. 4. Block-diagram of the closed-loop RTNN control system

Xp(k+1) = ψ[Xp(k), U(k), V1(k)]    (30)
Yp(k) = φ[Xp(k)]    (31)
Yp1(k) = Yp(k) + V2    (32)
Y*(k+1) = A*Y*(k) + B*Yp1(k) = A*Y*(k) + B*Yp(k) + B*V2    (33)

Here the input, output and state dimensions of the plant are m, l, np. The filter dynamics is completely decoupled, so the state matrix is a diagonal (l×l) one. The plant equations (30) and (31) could be linearized and written in the same state-space form as:

Xp(k+1) = Ap Xp(k) + Bp U(k) + Bp V1(k)    (34)
Yp(k) = Cp Xp(k)    (35)

The linearized identification RTNN-1 could also be described by a state-space model:

Xi(k+1) = Ai Xi(k) + Bi U(k)    (36)
Yi(k) = Ci Xi(k)    (37)

Here the input, output and state dimensions of the RTNN-1 are m, l, ni. The feedback neural RTNN-2 controller has a similar linearized state-space representation, whose input is the estimated system state, issued by the RTNN-1:

Xcfb(k+1) = Acfb Xcfb(k) + Bcfb Xi(k)    (38)
Ufb(k) = Ccfb Xcfb(k)    (39)

Here the input, output and state dimensions of the RTNN-2 are ni, m, nfb. The feedforward neural RTNN-3 controller could be described in the same manner as:

Xcff(k+1) = Acff Xcff(k) + Bcff R(k)    (40)
Uff(k) = Ccff Xcff(k)    (41)

Here the input, output and state dimensions of the RTNN-3 are l, m, nff. Let us write the following z-transfer-function representations of the given state-space equations for the plant, filter, feedback and feedforward controllers:

WP(z) = CP(zI − Ap)⁻¹Bp;  W*(z) = C*(zI − A*)⁻¹B*;  Pi(z) = (zI − Ai)⁻¹Bi    (42)
Q1(z) = Ccfb(zI − Acfb)⁻¹Bcfb;  Q2(z) = Ccff(zI − Acff)⁻¹Bcff    (43)

The control system z-transfer functions (42), (43) are connected by the following equations, given in z-operational form:

Y*(z) = W*(z)Yp + W*(z)V2;  Xi(z) = Pi(z)U(z)    (44)
Ufb(z) = Q1(z)Xi(z);  Uff(z) = Q2(z)R(z)    (45)
Yp(z) = Wp(z)U(z) + Wp(z)V1(z);  U(z) = Uff(z) + Ufb(z)    (46)

Effectuating some substitutions and mathematical manipulations, we could obtain the following statement for the system control variable:

U(z) = [I − Q1(z)Pi(z)]⁻¹ Q2(z) R(z)    (47)

The substitution of the control into the plant equation yields:

Yp(z) = Wp(z)[I − Q1(z)Pi(z)]⁻¹Q2(z)R(z) + Wp(z)V1(z)    (48)

The substitution of the plant equation into the system output equation finally gives:

Y*(z) = W*(z)Wp(z)[I − Q1(z)Pi(z)]⁻¹Q2(z)R(z) + V3(z)    (49)

Where V3(·) is a generalized noise term, given as:

V3(z) = W*(z)[Wp(z)V1(z) + V2(z)]    (50)

The RTNN topology is controllable and observable, [9], [10], and the L-M algorithm of learning is convergent, [13], [14], so the identification and control errors tend to zero:

Ei(k) = Y*(k) − Yi(k) → 0, k → ∞;  Ec(k) = R(k) − Y*(k) → 0, k → ∞    (51)

This means that each transfer function given by equations (42), (43) is stable with minimum phase. From (48), it is seen that the dynamics of the stable low-pass filter is independent from the dynamics of the plant, and it does not affect the stability of the closed-loop system. The closed-loop system is stable, and the RTNN-2 controller compensates the combined "plant plus filter" dynamics. The RTNN-3 feedforward controller dynamics is an inverse dynamics of the closed-loop system, which assures precise reference tracking in spite of the presence of process and measurement noises.
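As a numeric illustration of (42)-(49), the sketch below evaluates the closed-loop frequency response at one complex point z, assuming generic state-space triples for each block and ignoring the noise term V3(z); tf and closed_loop_output are hypothetical helper names, not part of the paper.

```python
import numpy as np

def tf(z, A, B, C):
    """Transfer matrix W(z) = C (zI - A)^{-1} B, as in eqs. (42)-(43)."""
    return C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B)

def closed_loop_output(z, plant, filt, ident, fb, ff):
    """Frequency response of eq. (49) at one complex point z, with V3 = 0.
    plant, filt, fb, ff are (A, B, C) triples; ident is an (A, B) pair,
    since Pi(z) = (zI - A)^{-1} B carries no output map."""
    Wp = tf(z, *plant)                            # plant transfer, eq. (42)
    Ws = tf(z, *filt)                             # low-pass filter W*(z)
    Ai, Bi = ident
    Pi = np.linalg.solve(z * np.eye(Ai.shape[0]) - Ai, Bi)
    Q1, Q2 = tf(z, *fb), tf(z, *ff)               # controllers, eq. (43)
    loop = np.linalg.inv(np.eye(Q1.shape[0]) - Q1 @ Pi)  # [I - Q1 Pi]^{-1}, eq. (47)
    return Ws @ Wp @ loop @ Q2                    # eq. (49) without the noise term
```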


IV. SIMULATION RESULTS

A. Identification of SISO nonlinear plant
Let us consider the SISO mechanical plant governed by the following state-space discrete-time nonlinear dynamic equations, taken from [11]:

Y(k+1) = [X1 X2 X3 X5 (X3 − 1) + X4] / [1 + X2² + X3²]    (52)

Where:

X1 = Y(k), X2 = Y(k−1), X3 = Y(k−2), X4 = U(k), X5 = U(k−1)    (53)

The input signal is as follows:

U(k) = 0.25 sign[sin(2πk/30)] + 0.25    (54)

The RTNN topology is (1, 2, 1) and To = 0.01. The graphics of the identification results for the BP and L-M learning algorithms (plant with filter and plant without filter) are given on Fig. 5-8, respectively. The results of the final MSE% are given in Table I. The results show a good convergence of both BP and L-M learning algorithms, with priority of the L-M one. The 10% noise augmented the MSE% in both cases.
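The benchmark (52)-(54) is easy to reproduce; a minimal simulation sketch follows, assuming a zero initial condition and a 100 s horizon (the figures show results up to about 92 s). The identification itself is omitted.

```python
import numpy as np

# Sketch of the benchmark plant (52)-(53) driven by the input (54),
# sampled with To = 0.01 s as stated above.
To = 0.01
N = int(100.0 / To)
Y = np.zeros(N + 1)
k_idx = np.arange(N)
U = 0.25 * np.sign(np.sin(2 * np.pi * k_idx / 30.0)) + 0.25   # eq. (54)

for k in range(2, N):
    x1, x2, x3 = Y[k], Y[k - 1], Y[k - 2]                     # eq. (53)
    x4, x5 = U[k], U[k - 1]
    Y[k + 1] = (x1 * x2 * x3 * x5 * (x3 - 1) + x4) / (1 + x2**2 + x3**2)  # eq. (52)
```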
B. Direct adaptive RTNN control of MIMO plant
The equations of the MIMO mechanical plant, taken from [6], are given as follows:

X1(k+1) = 0.9 X1(k) sin[X2(k)] + [2 + 1.5 X1(k)U1(k)/(1 + X1²(k)U1²(k))] U1(k) + [X1(k) + 2X1(k)/(1 + X1²(k))] U2(k)    (55)
X2(k+1) = X3(k){1 + sin[2X3(k)]} + X3(k)/(1 + X3²(k))    (56)
X3(k+1) = {3 + sin[2X1(k)]} U2(k)    (57)
Y1(k) = X1(k);  Y2(k) = X2(k)    (58)

The input, output and state dimensions of the plant are 2, 2, and 3. The reference signals R1, R2 of the control system are chosen as:

R1(k) = γ1 sign[sin(πk/10)]    (59)
R2(k) = γ2 sign[sin(πk/10)]    (60)

Where the reference amplitudes γ1, γ2 are random Gaussian zero-mean numbers. The results of the direct adaptive neural control are given on Fig. 9, applying the L-M algorithm of learning for all RTNNs. A 10% white noise is added to the plant inputs and outputs. Detailed graphical simulation results of BP and L-M learning of filtered and non-filtered plant outputs are given in Fig. 10-13, respectively. The results obtained for the control MSE% are summarized in Table II. The obtained results show the good convergence of both learning algorithms. The noise augmented the MSE% of control in both cases for systems without noise filters. The results show that the L-M algorithm of learning is more precise but more complex than the BP one.

V. CONCLUSION

The paper proposed a new RTNN model for systems identification and states estimation of nonlinear mechanical plants. The RTNN is learned both by the BP and by the second-order recursive L-M learning algorithm. The estimated states of the recurrent neural network model are used for direct adaptive trajectory tracking control systems design. The system contains also a noise rejection output filter, whose dynamics is separated from the dynamics of the control system. The applicability of the proposed neural control system with L-M learning is confirmed by simulation results with SISO and MIMO mechanical plants and compared with the results obtained by the BP learning algorithm. The MSE% results of learning, summarized in Tables I and II, show good convergence of both L-M and BP learning algorithms. The presence of noise terms augmented the MSE% of identification and control in both cases of systems without noise filters. The L-M algorithm of learning is more precise but more complex than the BP one.

Fig. 5. Graphical results of SISO plant identification by means of RTNN (BP learning and filtered plant output); a) Comparison of the filtered plant output (continuous line) and RTNN output (pointed line); b) Plant input with noise; c) Detailed view of a) in the first 10 sec; d) Detailed view of a) in the last interval 88-92 sec; e) MSE% of identification (last value 3.70%); f) RTNN states (ni=2).


Fig. 6. Graphical results of SISO plant identification by means of RTNN (BP learning without plant output filtering); a) Comparison of the filtered plant output (continuous line) and RTNN output (pointed line); b) Plant input with noise; c) Detailed view of a) in the first 10 sec; d) Detailed view of a) in the last interval 88-92 sec; e) MSE% of identification (last value 4.22%); f) RTNN states (ni=2).

Fig. 7. Graphical results of SISO plant identification by means of RTNN (L-M learning and filtered plant output); a) Comparison of the filtered plant output (continuous line) and RTNN output (pointed line); b) Plant input with noise; c) Detailed view of a) in the first 10 sec; d) Detailed view of a) in the last interval 88-92 sec; e) MSE% of identification (last value 0.48%); f) RTNN states (ni=2).


Fig. 8. Graphical results of SISO plant identification by means of
RTNN (L-M learning without plant output filtering); a)
Comparison of the filtered plant output (continuous line) and
RTNN output (pointed line); b) Plant input with noise; c) Detailed
view of a) in the first 10 sec; d) Detailed view of a) in the last
interval 88-92 sec; e) MSE% of identification (last value 1.58%);
f) RTNN states (ni=2).

TABLE I
MEAN SQUARED ERROR OF SISO PLANT IDENTIFICATION
Learning / noise filtering MSE (%)
BP with filter 3.70
BP without filter 4.22
L-M with filter 0.48
L-M without filter 1.58


Fig. 9. Graphical simulation results of MIMO plant direct neural
control using a noise rejection filter and a L-M RTNN learning; a)
comparison between the first filtered plant output (pointed line)
and the first reference signal (continuous line); b) comparison
between the second filtered plant output (pointed line) and the
second reference signal (continuous line); c) comparison between
the first filtered plant output (continuous line) and the first output
of the identification RTNN (pointed line); d) comparison between
the second filtered plant output (continuous line) and the second
output of the identification RTNN (pointed line); e) first control
signal; f) second control signal; g) Mean Squared Error of control
of the first filtered plant output; h) Mean Squared Error of control
of the second filtered plant output; i) state variables (ni=3) of the
identification RTNN used for control.

Fig. 11. Detailed graphical simulation results of MIMO plant direct


neural control using/or not a noise rejection filter and a BP RTNN
learning in the end of simulation (23-28 sec.); a) comparison
between the first filtered plant output (pointed line) and the first
reference signal (continuous line); b) comparison between the
second filtered plant output (pointed line) and the second reference
signal (continuous line); c) comparison between the first unfiltered
plant output (pointed line) and the first reference signal (continuous
line); d) comparison between the second unfiltered plant output
(pointed line) and the second reference signal (continuous line).

Fig. 10. Detailed graphical simulation results of MIMO plant direct neural control using/or not a noise rejection filter and a BP RTNN learning in the first 2 seconds of simulation; a) comparison between the first filtered plant output (pointed line) and the first reference signal (continuous line); b) comparison between the second filtered plant output (pointed line) and the second reference signal (continuous line); c) comparison between the first unfiltered plant output (pointed line) and the first reference signal (continuous line); d) comparison between the second unfiltered plant output (pointed line) and the second reference signal (continuous line).

TABLE II
MEAN SQUARED ERROR OF MIMO PLANT CONTROL
Learning / noise filtering    MSE1 (%)    MSE2 (%)
BP with filter                2.89        3.02
BP without filter             3.44        3.80
L-M with filter               2.25        2.32
L-M without filter            2.68        2.52


Fig. 12. Detailed graphical simulation results of MIMO plant direct neural control using/or not a noise rejection filter and a L-M RTNN learning in the first 2 seconds of simulation; a) comparison between the first filtered plant output (pointed line) and the first reference signal (continuous line); b) comparison between the second filtered plant output (pointed line) and the second reference signal (continuous line); c) comparison between the first unfiltered plant output (pointed line) and the first reference signal (continuous line); d) comparison between the second unfiltered plant output (pointed line) and the second reference signal (continuous line).

Fig. 13. Detailed graphical simulation results of MIMO plant direct neural control using/or not a noise rejection filter and a L-M RTNN learning in the end of simulation (23-28 sec.); a) comparison between the first filtered plant output (pointed line) and the first reference signal (continuous line); b) comparison between the second filtered plant output (pointed line) and the second reference signal (continuous line); c) comparison between the first unfiltered plant output (pointed line) and the first reference signal (continuous line); d) comparison between the second unfiltered plant output (pointed line) and the second reference signal (continuous line).

REFERENCES

[1] W.T. Miller III, R.S. Sutton, and P.J. Werbos, Neural Networks for Control, London: MIT Press, 1992.
[2] S. Chen, and S.A. Billings, "Neural networks for nonlinear dynamics system modeling and identification," International Journal of Control, vol. 56, 1992, pp. 319-346.
[3] S.A. Pao, S.M. Phillips, and D.J. Sobajic, "Neural net computing and intelligent control systems," International Journal of Control, vol. 56, 1992, pp. 263-289.
[4] K.J. Hunt, D. Sbarbaro, R. Zbikowski, and P.J. Gawthrop, "Neural network for control systems - A survey," Automatica, vol. 28, no. 6, 1992, pp. 1083-1112.
[5] K.S. Narendra, and K. Parthasarathy, "Identification and control of dynamic systems using neural networks," IEEE Transactions on Neural Networks, vol. 1, no. 1, 1990, pp. 4-27.
[6] K.S. Narendra, and S. Mukhopadhyay, "Adaptive control of nonlinear multivariable systems using neural networks," Neural Networks, vol. 7, no. 5, 1994, pp. 737-752.
[7] L. Jin, and M. Gupta, "Stable dynamic backpropagation learning in recurrent neural networks," IEEE Transactions on Neural Networks, vol. 10, 1999, pp. 1321-1334.
[8] I.S. Baruch, J.M. Flores, F. Thomas, and R. Garrido, "Adaptive neural control of nonlinear systems," in G. Dorffner, H. Bischof, and K. Hornik (eds.), Artificial Neural Networks - ICANN 2001, Lecture Notes in Computer Science, vol. 2130, Berlin: Springer, ISBN 3-540-42486-5, 2001, pp. 930-936.


[9] F. Nava R., I.S. Baruch, A. Poznyak, and B. Nenkova,
“Stability proofs of advanced recurrent neural
networks topology and learning,” Comptes Rendus
(Proceedings of the Bulgarian Academy of Sciences),
ISSN 0861-1459, vol. 57, No 1, 2004, pp. 27-32.
[10] I.S. Baruch, J.M. Flores, F. Nava, I.R. Ramirez, and B.
Nenkova, “An advanced neural network topology and
learning applied for identification and control of a D.C.
motor,” in: Proc. of the First Int. IEEE Symposium on
Intelligent Systems, Varna, Bulgaria, 2002, pp. 289-
295.
[11] I.S. Baruch, E. Gortcheva, and R. Garrido, “Recurrent
neural networks for identification of nonlinear plants
(in spanish),” Científica-ESIME-IPN, No 14, 1999, pp.
39-46.
[12] J.M. Flores, I.S. Baruch, and R. Garrido, “Recurrent
neural network for identification and control of
nonlinear systems (in spanish),” Científica-ESIME-
IPN, vol. 5, No 1, 2001, pp. 11-20.
[13] V.S. Asirvadam, S.F. McLoone, and G.W. Irwin,
“Parallel and separable recursive Levenberg-Marquardt
training algorithm,” in: Proceedings of the 2002 12th
IEEE Workshop on Neural Networks for Signal
Processing, 2002, pp. 129-138.
[14] L.S. Ngia, J. Sjöberg, and M. Viberg, “Adaptive neural
nets filter using a recursive Levenberg-Marquardt
search direction,” IEEE Signals, Systems and
Computer, vol. 1, 1998, pp. 697-701.
[15] L.S. Ngia, and J. Sjöberg, “Efficient training of neural
nets for nonlinear adaptive filtering using a recursive
Levenberg Marquardt algorithm,” IEEE Trans. on
Signal Processing, vol. 48, 2000, pp. 1915-1927.
[16] E. Wan, and F. Beaufays, “Diagrammatic method for
deriving and relating temporal neural networks
algorithms,” Neural Computations, vol. 8, 1996, pp.
182-201.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 522--525
Copyright@2007 Watam Press

Combining Radar Emitter Recognition with Ambiguous a priori Knowledge
Shurong Tian 1
Department of Basic science, Naval Aeronautical Engineering Institute, Yantai, 264001, China
Xin Guan, You He, Wei Xiong
Research Institute of Information Fusion, Naval Aeronautical Engineering Institute, Yantai, 264001, China

AMS subject classifications: 34K35,34H05,49J25,

Abstract: Based on the concept of the concordance existing between the ambiguous evidences and the ambiguous prior knowledge, and based on the fuzzy conditional Dempster-Shafer (D-S) evidence theory, we provide a novel radar emitter recognition approach which reflects the influences of ambiguous prior knowledge. First, we change the fuzzy measurements about radar emitters into the form of ambiguous bodies of D-S evidences; then we apply the ambiguous conditional D-S evidence theory to combine these evidences and calculate the concordance of evidence and prior knowledge. This method can help us to increase the reliability of radar emitter recognition under complex battle circumstances.

1 Corresponding author, tiansr110@sohu.com.

1 Introduction

Radar emitter recognition has become an important issue in military intelligence, surveillance, and reconnaissance. With the rapid development of radar technology, the density and complexity of radar signals are increasing. Moreover, radar signals take on uncertainty, illegibility and contradiction. Current algorithms for radar emitter recognition do not always give good performance. So some researches have been conducted for emitter recognition over the past years, such as expert systems, artificial neural networks, and Dempster-Shafer reasoning, etc. [1]-[6]; in [7] and [8], the prior probability of multi-type radar occurrence had been considered. Because of the secrecy of military affairs and the complexity of the electromagnetic environment, the prior probability of radar occurrence is difficult to obtain; on the other hand, the collected radars' character parameters and the database are both imprecise and vague. So, in this paper, based on Mahler's concept of concordance of evidence and prior knowledge, and the fuzzy conditioned Dempster-Shafer (FCDS) theory[9], we give a novel radar emitter recognition approach which reflects the influence of ambiguous prior knowledge.

2 Concordance and Conditioned Evidence Combination

2.1 Ambiguous a priori Knowledge and Concordance of Evidence

Definition 2.1 Given a compact subset E of Rⁿ, let F(E) denote the collection of finite subsets of E. A random finite set Ξ on E is defined as a measurable mapping from a probability space (Ω, s(Ω), P) to the collection (F(E), B(F)) of finite subsets of E:

Ξ : Ω → F(E)

where B(F) is the (Borel) subsets of F(E). Simply, a random finite set is a random variable which samples from a collection of finite sets.

Let U be a finite universe, and L = {l0, l1, · · · , ln} be a finite list of numbers in [0,1] such that li < li+1, i = 1, · · · , n − 1, and where l0 = 0, ln = 1. Let ZL(U) denote the set of all fuzzy membership functions of U which take values in L only.

Definition 2.2 Let (Ω, s(Ω), p) be our given probability space. Then, a random (finite-level) fuzzy subset of U is a random variable σ : ω → ZL(U). Conjunction of random fuzzy subsets σ and τ is defined by: (σ ∧ τ)(ω) = σ(ω) ∧ τ(ω) for all ω ∈ Ω.

In this paper, we denote the ambiguous evidence B = Σᵢ₌₁ᵈ mᵢ fᵢ of U as a random fuzzy subset σ of U; the mass assignment of σ is mσ(fᵢ) = p(σ = fᵢ) = mᵢ, i = 1, · · · , d, and Σᵢ₌₁ᵈ mᵢ = 1. Similarly, a random fuzzy set γ denotes the ambiguous a priori knowledge of U; then the random conditional event (σ|γ) reflects the influence of γ on σ.

Definition 2.3 Let (Ω, s(Ω), p) be our given probability space. Then, a random fuzzy conditional event is a random variable on the set of fuzzy conditional events.


Let σ, γ : ω → ZL(U) be two random fuzzy subsets of U. Then the random fuzzy conditional event (σ|γ) is defined by:

(σ|γ)(ω) = (σ(ω)|γ(ω)), ∀ω ∈ Ω

The mass assignment of a random fuzzy conditional event is defined by:

m(σ|γ)((f|μ)) = p((σ|γ) = (f|μ)), ∀f, μ ⊂ U

Random sets and conditional event algebra theory is in [10].

Consistency between evidence and prior knowledge is measured by determining how frequently the random fuzzy conditional event is a fuzzy tautology (that is, a fuzzy conditional event of the form (f|f)). We call the set of these frequencies the concordance of the imprecise and vague evidence σ with respect to the imprecise and vague prior knowledge γ.

Definition 2.4 Concordance of the random fuzzy conditional event (σ|γ) is the truncation of (σ|γ) to the set of fuzzy tautologies. That is, it is the mass assignment f → mγ(f|σ) defined by:

mγ(f|σ) = mσ|γ((f|f)) / Σ_{g∈ZL(U)} mσ|γ((g|g)),  ∀f ∈ ZL(U)    (2.1)

Concordance reflects the degree of a priori belief in evidence.

Remark 1: A fuzzy tautology is a fuzzy conditional event (f|f). Fuzzy conditional events must be objects of the form (f|μ), satisfying (f|μ) = (g|η) ⇔ (f ∧ μ) = (g ∧ η) and μ = η [9], where ∧ denotes conjunction of fuzzy sets.

Remark 2: Concordance is strong consistency. From Remark 1, we know that (σ(ω)|γ(ω)) = (f|f) if and only if γ(ω) ⊂ σ(ω) and γ(ω) = f. So, truncating the random fuzzy conditional event (σ|γ) to the set of fuzzy tautologies is equivalent to requiring γ ⊆ σ.

Notice that

m(σ|γ)((f|f)) = p(σ ∧ γ = f, γ = f) = p(f ⊆ σ, γ = f)

we have

Σ_{g∈ZL(U)} mσ|γ((g|g)) = Σ_{g∈ZL(U)} p(γ ⊆ σ, γ = g) = p(γ ⊆ σ) = βγ(σ)

βγ(σ) is the belief measure of the fuzzy random set σ associated with γ. From (2.1), we have

mγ(f|σ) = p(γ ⊆ σ, γ = f) / p(γ ⊆ σ) = p(γ = f | γ ⊆ σ)

and if σ and γ are statistically independent, then

mγ(f|σ) = p(f ⊆ σ)p(γ = f) / p(γ ⊆ σ) = mγ(f)δσ(f) / βγ(σ)    (2.2)

where δσ(f) = p(f ⊆ σ) = Σ_{f⊆g} mσ(g) is the commonality measure (doubt measure) of the fuzzy random set σ.

2.2 The Fuzzy Conditioned Dempster-Shafer (FCDS) agreement and combination

In this subsection, σ, τ, γ will always be assumed to be statistically independent random fuzzy subsets of U.

Definition 2.5 Let γ be a random fuzzy subset of U, and let σ : B = Σ_{f∈ZL(U)} b_f f and τ : C = Σ_{g∈ZL(U)} c_g g be two ambiguous D-S evidences on U. Then,
(1) The conditioned agreement of B, C with respect to γ is defined by

αγ(B, C) = Σ_{f,g∈ZL(U)} b_f c_g αγ(f, g)    (2.3)

where if βγ(f) ≠ 0 ≠ βγ(g), then αγ(f, g) = βγ(f ∧ g) / (βγ(f)βγ(g)), and αγ(f, g) = 0 otherwise.
(2) The conditioned product of B, C with respect to γ is

B ·γ C = Σ_{f,g∈ZL(U)} b_f c_g αγ(f, g)(f ∧ g)    (2.4)

(3) The conditioned combination of B, C with respect to γ is

B ∗γ C = (B ·γ C) / αγ(B, C),  αγ(B, C) ≠ 0    (2.5)

The following proposition shows that FCDS combination obeys the usual algebraic properties[9].

Proposition:
(1) The product ·γ is commutative, associative, distributive.
(2) (A ∗γ B) ∗γ C = A ∗γ (B ∗γ C) whenever defined.
(3) αγ(A ·γ B, C) = αγ(A, B ·γ C).

3 Combining Emitter Recognition with Ambiguous a priori Knowledge

Let {R1, · · · , Rn} be the finite universe of radar types, and L = {l1, · · · , lm}. For a certain battle circumstance, a random fuzzy set γ denotes the ambiguous prior knowledge of the occurrence of the n types of radar. Usually, the multi-type radars can be recognized; however, when the enemy radars


transmit interference signals, we will get ambiguous information. Thus, we give a small mass assignment to γ = U, representing that the information is ambiguous.

Using gray correlation analysis[11], we represent the collected radar signal and the information provided by the classification knowledge base as ambiguous D-S evidence. The process is as follows. The data vector collected from the radar emitter is the reference sequence; we choose the data vectors conforming to the reference sequence from the known radar emitter database as comparison sequences, compute the gray correlation grade of each comparison sequence and the reference sequence, and denote the sum of the gray correlation grades between those comparison sequences belonging to the i-th type radar Ri and the reference sequence as γ̃i, i = 1, · · · , n. We define the gray correlation grade of the measurement vector and the i-th class radar Ri as:

γi = γ̃i / Σᵢ₌₁ⁿ γ̃i,  i = 1, · · · , n

hence, we get the mass assignments of the evidence σ provided by the measurement sample:

mσ(Ri) = γi,  γi ∈ L,  i = 1, · · · , n

Maybe some Rj are not focal elements of σ, that is, mσ(Rj) = γj = 0. The details are in [12].

Any fuzzy subset f of U has the form f(Ri) = ai, ai ∈ L, i = 1, · · · , n (f is a fuzzy membership function). We abbreviate such a fuzzy set f by the ordered n-ple (a1, · · · , an); the fuzzy intersection of two such fuzzy sets (a1, · · · , an) and (b1, · · · , bn) is therefore the fuzzy set (a1, · · · , an) ∧ (b1, · · · , bn) = (a1 ∧ b1, · · · , an ∧ bn), where ai ∧ bi = min{ai, bi}. We represent the ambiguous a priori knowledge γ as a prior mass assignment of fuzzy sets of U: mγ(μj) = mγj, j = 1, · · · , d, Σⱼ₌₁ᵈ mγj = 1. The sensor report can be represented by a fuzzy body of evidence σ: B = Σₖ₌₁ᑫ mσk fk, and γ, σ are statistically independent. Then we calculate the concordance of the ambiguous evidence σ and the ambiguous a priori knowledge γ with (2.2), to help us recognize the radar emitter type.

If we can collect t statistically independent ambiguous evidences σ1, · · · , σt, we firstly fuse the t evidences using the FCDS combination theory associated with a priori knowledge (from the proposition in Section 2.2, we obtain a new evidence uniquely); then, for the fused evidence λ, we calculate the concordance between λ and the a priori knowledge γ, and recognize the radar type.

4 Example Analysis

In certain battle circumstances, the classification knowledge base consists of three new radar types, U = {R1, R2, R3}, L = {0, 1/2, 1}. Any fuzzy subset f of U is abbreviated by an ordered triple (a, b, c).

We assume that the three radar types are equally likely to occur in some special battle circumstance. However, there is some evidence that R2 and R3 may be the same radar, there is a small possibility that R1 and R2 might be the same radar, and there is a small possibility that all three radars are actually the same. This information is represented by a prior mass assignment of fuzzy subsets of U as follows:

mγ(1/2, 0, 0) = 0.2,  mγ(0, 1/2, 0) = 0.2
mγ(0, 0, 1/2) = 0.2,  mγ(0, 1/2, 1/2) = 0.2
mγ(1/2, 1/2, 0) = 0.1,  mγ(1, 1, 1) = 0.1

Part A: Let βγ be the prior belief measure corresponding to the prior mass assignment mγ. Assuming that we have an observation evidence σ: B = 0.95(0, 1, 1/2) + 0.05(1, 1, 1), and σ, γ are independent, we have the concordance of the random conditional event (σ|γ):

mγ(1/2, 0, 0|σ) = 0.015,  mγ(0, 1/2, 0|σ) = 0.323
mγ(0, 0, 1/2|σ) = 0.323,  mγ(0, 1/2, 1/2|σ) = 0.323
mγ(1/2, 1/2, 0|σ) = 0.008,  mγ(1, 1, 1|σ) = 0.008

Thus, on the basis of the evidence σ and the a priori knowledge γ, we are able to conclude that the emitter is not R1. However, the evidence is too equivocal for us to be able to decide between the following three possibilities: the emitter is R2, or R3, or a third radar which encompasses both R2 and R3.

Part B: Getting another observation evidence τ: C = 0.80(0, 1/2, 0) + 0.20(1, 1, 1), with σ, τ, γ independent, and using (2.3), we get the conditioned agreement between the two reports σ, τ:

αγ(B, C) = 0.76 αγ((0, 1, 1/2), (0, 1/2, 0)) + 0.19 αγ((0, 1, 1/2), (1, 1, 1)) + 0.04 αγ((0, 1/2, 0), (1, 1, 1)) + 0.01 αγ((1, 1, 1), (1, 1, 1))

where

αγ((0, 1, 1/2), (0, 1/2, 0)) = βγ((0, 1, 1/2) ∧ (0, 1/2, 0)) / (βγ(0, 1, 1/2) βγ(0, 1/2, 0)) = βγ(0, 1/2, 0) / (βγ(0, 1, 1/2) βγ(0, 1/2, 0))

The focal fuzzy sets of the prior mass assignment mγ that are contained in (0, 1, 1/2) are (0, 1/2, 0), (0, 0, 1/2), (0, 1/2, 1/2), with weights 0.2, 0.2, 0.2. Thus, βγ(0, 1, 1/2) = 0.6; likewise, βγ(0, 1/2, 0) = 0.2. Hence, αγ((0, 1, 1/2), (0, 1/2, 0)) = 1.67. For any subset (a, b, c) of U,

αγ((a, b, c), (1, 1, 1)) = βγ((a, b, c) ∧ (1, 1, 1)) / (βγ(a, b, c) βγ(1, 1, 1)) = 1 / βγ(1, 1, 1) = 1


Combining these values, we get αγ(B, C) = 1.31 > 1; we are therefore justified in combining B and C to get the conditioned product of B and C:

B ·γ C = 1.27((0, 1, 1/2) ∧ (0, 1/2, 0)) + 0.19((0, 1, 1/2) ∧ (1, 1, 1)) + 0.04((1, 1, 1) ∧ (0, 1/2, 0)) + 0.01((1, 1, 1) ∧ (1, 1, 1))
= 1.31(0, 1/2, 0) + 0.17(0, 1, 1/2) + 0.01(1, 1, 1)

The report of the combined evidence λ of σ and τ is:

B ∗γ C = 0.88(0, 1/2, 0) + 0.11(0, 1, 1/2) + 0.01(1, 1, 1)

and using (2.2), we get the concordance of (λ|γ):

mγ(1/2, 0, 0|λ) = 0.008,  mγ(0, 1/2, 0|λ) = 0.794
mγ(0, 0, 1/2|λ) = 0.095,  mγ(0, 1/2, 1/2|λ) = 0.095
mγ(1/2, 1/2, 0|λ) = 0.004,  mγ(1, 1, 1|λ) = 0.004

Thus, on the basis of the evidences σ, τ, and the a priori knowledge γ, we conclude that the emitter is R2.

From this example, although the evidences are ambiguous, we can recognize the type of radar (Part B) or exclude some radar types in U (Part A) using our method associated with a priori knowledge.
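The Part A numbers can be checked directly from (2.2). The following is a small Python sketch, with subset, delta and concordance as hypothetical helper names; it reproduces the concordance values above up to rounding (it prints 0.016 where the text quotes 0.015).

```python
# Sketch checking the Part A concordance values via eq. (2.2).
m_gamma = {(.5, 0, 0): .2, (0, .5, 0): .2, (0, 0, .5): .2,
           (0, .5, .5): .2, (.5, .5, 0): .1, (1, 1, 1): .1}   # prior m_gamma
m_sigma = {(0, 1, .5): .95, (1, 1, 1): .05}                   # evidence B

def subset(f, g):
    """Fuzzy containment f <= g, checked pointwise on the triples."""
    return all(a <= b for a, b in zip(f, g))

def delta(f, m):
    """Commonality delta_sigma(f) = p(f contained in sigma) = sum of m(g) over f <= g."""
    return sum(w for g, w in m.items() if subset(f, g))

def concordance(m_prior, m_ev):
    """Eq. (2.2): m_gamma(f|sigma) = m_gamma(f) * delta_sigma(f) / beta_gamma(sigma)."""
    beta = sum(w * delta(f, m_ev) for f, w in m_prior.items())
    return {f: w * delta(f, m_ev) / beta for f, w in m_prior.items()}

for f, c in concordance(m_gamma, m_sigma).items():
    print(f, round(c, 3))    # 0.016, 0.323, 0.323, 0.323, 0.008, 0.008
```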
5 Conclusion

In this paper we addressed the problem of radar classification when the underlying classification knowledge base is both imprecise and vague. In our approach, multi-source information can be used sufficiently to improve the reliability of radar recognition. Especially, in the problem of target recognition with information collected from multi-type sensors, we can combine the information using FCDS when ambiguous prior knowledge about the target is provided, to increase the reliability of target recognition.

References

[1] Y. He, G. H. Wang, D. J. Lu, Y. N. Peng, Multi-sensor Information Fusion with Applications, Publishing House of Electronics Industry, (2000)
[2] B. Tessem, Approximations for Efficient Computation in the Theory of Evidence, Artificial Intelligence, 61(2), (1993), 315-329
[3] X. Guan, Y. He, X. Yi, Attribute Measure Recognition Approach and its Applications to Emitter Recognition, Science in China Series F, Information Sciences, 48(2), (2005), 225-233
[4] E. Granger, M. A. Rubin, S. Grossberg, et al., Classification of incomplete data using the fuzzy ARTMAP neural network, Proceedings of the International Joint Conference on Neural Networks, Vol IV, (2004), 34-40
[5] Y. J. Shen, B. W. Wang, A fast learning algorithm of neural network with tunable activation function, Science in China, Ser F, 47(1), (2004), 126-136
[6] M. Testsuya, K. Yasuo, S. Yoshiharu, Association Rules and Dempster-Shafer Theory of Evidence, DS 2003, LNAI 2843, (2003), 377-384
[7] R. Mahler, Combining Ambiguous Evidence with Respect to Ambiguous a Priori Knowledge I: Boolean Logic, IEEE Trans Systems, Man and Cybern Part A: Systems and Humans, 26(1), (1996), 27-41
[8] M. J. Gai, X. Guan, X. Yi, B. Shi, Research on Combining Radar Emitter Recognition with a priori Knowledge, Journal of XIDIAN University, 33(5), (2006), 831-837
[9] R. Mahler, Combining Ambiguous Evidence with Respect to Ambiguous a Priori Knowledge II: Fuzzy Logic, Fuzzy Sets and Systems, 75, (1995), 319-354
[10] I. Goodman, R. Mahler and H. Nguyen, Mathematics of Data Fusion, Kluwer Academic Publishers, (1997)
[11] J. L. Deng, Gray Control System, Huazhong University of Science and Technology Press, Wuhan, (1997)
[12] X. Guan, Y. He, X. Yi, A Novel Gray Model for Radar Emitter Recognition, 7th International Conference on Signal Processing, August, (2004), Beijing, China: 2116-2119


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 526--530
Copyright@2007 Watam Press

Local Weather Forecast for Flight Training Using Neural Networks
Jianguo Chen
Department of Computer Science, Leshan Teachers College
Leshan 614004, Sichuan, P. R. China.
Email:cenjguo@sina.com

Shijun Liu
Department of Computer Foundation Teaching
Chengdu University of Information Technology, Chengdu 610225, P. R. China.
Email: lsjuen@cuit.edu.cn

Abstract— Local weather forecast is very important for flight training. Neural networks have been used in this paper for local weather forecast for flight training. Experiments show good performance of the neural networks method for such weather forecast.

Index Terms— Local weather, Forecast, Flight training, Neural Networks.

I. INTRODUCTION

The local weather forecast (LWF) is to estimate the state of weather in some particular districts. The LWF plays important roles in flight training because it directly affects the security of flying. Its formation mechanism and forecast involve a rather complex physics that is not completely understood so far. Traditionally, weather forecasting is based mainly on numerical models which attempt to model the fluid and thermal dynamic system for grid-point time series prediction based on boundary meteorological data. But this classical approach is not suited to LWF, since it is more appropriate for long-term (24 or 48 hours) forecasting over a large area of several thousand kilometers.

So, mesoscale and short-term weather forecasting in a relatively small region, such as the region of flight training, needs a dissimilar approach, based on relative weather conditions combined with historical weather information in some local areas. The analyzing processes for LWF can be described as follows.

The first is to collect weather-related data such as cold front, the line of wind shear, subtropical high, trough, and vortex, etc. in the region as much as possible. These weather data will directly affect the prediction of local weather.

In the second step, the local weather forecasters search for statistical knowledge from the historical database of LWF. Such knowledge can be described in the form of association rules, such as "if C then R", where C is some vector in which every element is a weather fact related to the local weather, and R is also a vector, each element of R being a forecast result of weather, such as thunderstorm, fog, floating dust, etc.

Finally, the local weather forecasters add the factual state of local weather to the local historical forecast database. Then they check the consistency between the conclusion of the weather forecast and the actual weather condition. If they match, the confidence of the rule increases, otherwise it decreases. For example, in a certain process of forecasting, a forecast rule "IF C1 then R1" shows that there will be R1 after a certain time because C1 appears. If the actual weather is R2 later, and R1 = R2, the confidence of the rule "IF C1 then R1" would be increased; otherwise, if R1 ≠ R2, the confidence of the rule "IF C1 then R1" would be decreased. The rule "IF Ci then Ri" will be used to forecast the local weather when the confidence of the rule has reached a certain threshold.

Because the weather of a local region is influenced by its circumstances and numerous factors such as cold front, wind shear, air vortex, air trough and subtropical high etc., and the accuracy of LWF for flight training must be kept at a high level, we must update the rules of LWF for flight training in time. However, in the traditional LWF for flight training, we have to concentrate lots of human and material resources to modify the rules every several years. It is obvious that there are some disadvantages. We must look for a new method to solve this problem.

The neural networks (NN) technique has been frequently used for forecast, recognition and classification of many weather events[1][2][3][4][6]. Some practical applications of NN have been used in rainfall prediction[5][7]. Neural networks are trained to learn relationships involving the atmospheric circulation and local weather with the intention of capturing the local circulation dynamics. In this paper, we use NN to update the rules of LWF for flight training. Using the method of NN not only releases the work stress of the weather forecaster, but also improves the velocity and quality of the weather forecast; especially, it can update the new rules in time.

The rest of this paper is organized as follows. Section 2 discusses the preprocessing and standardization of meteorologic data. In Section 3, we will give a review about the NN model. Experiments will be carried out in Section 4. Conclusions will be given in Section 5.
II. DATA PREPROCESSING AND STANDARDIZATION

In this section, we will deal with the preprocessing and standardization of the meteorologic data of the LWF. The following factors have significant influence on the LWF for flight training in some particular area; we denote this area as the HX area.

Cold Front: It is the leading portion of a cold atmospheric air mass moving against and eventually replacing a warm air mass. The cold front which influences the local climate in the HX area is classified as western (LF1), northwest (LF2), and northern cold front (LF3), according to the historical weather data in the HX area. In Table I, if the probability is less than the least value of forecast, the form is null. In the same region, the probability of some weather influenced by a dissimilar cold front may be enlarged or reduced with the variation of the environment. For example, as the size of the green area in some region increases, the weather of floating dust will reduce. So, we must update the rules of LWF for flight training in time.

Wind Shear: It is a change in wind direction and speed between slightly different altitudes, especially a sudden downdraft. According to the historical weather data in the HX area, the wind shear of Guanzhong which influences the local climate is classified as cold wind shear of Guanzhong (LSFQB) and warm wind shear of Guanzhong (NSFQB), as in Table II.

Air Vortex: It is a spiral motion of fluid within a limited area, especially a whirling mass of air that sucks everything near it toward its center. According to the historical weather data in the HX area, the air vortex which influences the local climate is classified as northwest vortex and southwest vortex, which are defined as in Table III.

Air Trough: It is an elongated region of relatively low atmospheric pressure, often associated with a front. The trough which influences the local climate in the HX area is classified as northwest trough and tableland trough according to the historical weather data. The changing state of an air trough may bring about rain, wind, dust, thunderstorm, rainstorm etc. In particular, thunderstorm, the main factor affecting flight safety, occurs when the tableland trough rises between spring and summer. The symbols of the trough as inputs of the NN are defined as C01, C02, ..., C07, C08, according to the different seasons and the types of trough in Table IV.

Subtropical High: The subtropical high is one of several regions of semipermanent high atmospheric pressure located over the oceans near 35° latitude in both the northern and southern hemispheres of the Earth. In the HX region, the subtropical high has a distinct influence on the weather such as rainfall, sandstorm, wind, hail, cumulus, thunderstorm, rainstorm etc. During the spring and summer months, the state of the subtropical high may cause sandstorms. The symbols of the subtropical high as inputs of the NN are defined as R01, R02, R03, R04, according to the different seasons in Table V.

The steps of preprocessing and standardization: The first is to analyze and conclude the meteorological conditions which influence the weather of the local region, such as spring cold front, summer cold front, spring subtropical high, etc. The meteorological conditions can be represented as a 40-dimensional vector. The vector is defined as:

p = (LF11, ..., LF34, Q01, ..., Q08, W01, ..., W08, C01, ..., C08, R01, ..., R04)ᵀ₄₀.    (1)

In eq. (1), if the probability of the meteorological condition is less than 0.1, the corresponding element in the vector is 0, otherwise it is 1. The second is to conclude the type of weather according to the historical weather data, such as Table VI. The type of weather can be represented as an 11-dimensional vector. The vector is defined as:

a = (T01, ..., T05, ..., T11)ᵀ.    (2)

In eq. (2), if the value of an element is 0, the corresponding weather cannot appear, otherwise the weather appears. For example, if T01 = 0, rainfall does not appear. Finally, according to the historical weather data in the HX region, the vectors are used to represent all the meteorological data and the corresponding weather.

III. THE NEURAL NETWORKS MODEL

An NN is an arrangement of processing elements (neurons). The artificial neuron model consists of a linear combination followed by a transfer function. Learning is accomplished through modifying existing connections between neurons or establishing new connections. Arrangements of such units form the NN, which has features such as: very simple neuron-like processing elements; weighted connections between the processing elements; highly parallel processing and distributed control; automatic learning of internal representations. The simplest NN model is the single-layer Perceptron with a hard-limiter transfer function, which is appropriate for solving linear problems[8]. In this paper, we use the single-layer Perceptron with a hard-limiter transfer function to update the rules of LWF for the flight training. In Section 2, we proposed the meteorological data which consist of 40 elements including different factors in different seasons in the HX region. So, we designed the NN model in which the input layer has 40 neurons (5 meteorological variables in different seasons) and the output layer has 11 neurons, as in Fig. 1.

The output of the NN is given by:

a = hardlim(W p)    (3)

where p is the input vector, and the network weight matrix W is as follows:

W = [w0101 ... w0140; ... ; w1101 ... w1140]₁₁ₓ₄₀.    (4)

The hardlim transfer function is defined as:

y = hardlim(n) = 1 if n ≥ 0, 0 otherwise.
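A minimal sketch of the model (3)-(4) follows; the positions of the nonzero entries of p are illustrative only, since the exact ordering of the 40 factors inside the vector is fixed by Tables I-V.

```python
import numpy as np

def hardlim(n):
    """Hard-limiter of eq. (3): 1 where n >= 0, else 0 (elementwise)."""
    return (n >= 0).astype(int)

# 40-input / 11-output single-layer perceptron of eqs. (3)-(4)
rng = np.random.default_rng(1)
W = rng.normal(size=(11, 40))     # weight matrix W of eq. (4)
p = np.zeros(40)
p[[4, 17]] = 1                    # two active factors; positions are illustrative
a = hardlim(W @ p)                # 11-dimensional weather-type vector, cf. eq. (2)
```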


TABLE I
THE STATISTICAL RESULT OF WEATHER WHICH INFLUENCED BY COLD FRONT IN HX AREA . THE VALUE OF FORM IS THE PROBABILITY OF WEATHER
INFLUENCED BY DISSIMILAR COLD FRONT IN THE DIFFERENT SEASON .

Cold Front Season Rainfall Wind Dust Snowfall Coldwave Thunderstorm Rainstorm
LF1 Mean / Year 0.645 0.834
LF11 Spr. 0.846 0.867
LF12 Sum. 0.723 0.845 0.345
LF13 Aut. 0.602 0.903
LF14 Win. 0.409 0.721
LF2 Mean / Year 0.542
LF21 Spr. 0.698 0.787 0.675
LF22 Sum. 0.567 0.327 0.432 0.302
LF23 Aut. 0.525
LF24 Win. 0.378 0.865 0.786 0.352
LF3 Mean / Year 0.342
LF31 Spr. 0.369 0.679 0.623 0.567
LF32 Sum. 0.429 0.685 0.567 0.604
LF33 Aut. 0.37
LF34 Win. 0.2 0.784

TABLE II
THE STATISTICAL RESULT OF WEATHER WHICH INFLUENCED BY WIND SHEAR IN HX AREA . THE VALUE OF FORM IS THE PROBABILITY OF WEATHER
INFLUENCED BY DISSIMILAR WIND SHEAR OF G UANZHONG IN THE DIFFERENT SEASON .

Wind Shear of Guanzhong Season Rainfall Wind Snowfall Coldwave Rainstorm


Q01 Spr. 0.923 0.456
LSFQB Q02 Sum. 0.976 0.546
Q03 Aut. 0.934 0.234
Q04 Win. 0.897
Q05 Spr. 0.234
NSFQB Q06 Sum. 0.212
Q07 Aut. 0.245
Q08 Win. 0.145

TABLE III
THE STATISTICAL RESULT OF WEATHER WHICH INFLUENCED BY AIR VORTEX IN HX AREA . THE VALUE OF FORM IS THE PROBABILITY OF WEATHER
INFLUENCED BY DISSIMILAR AIR VORTEX IN THE DIFFERENT SEASON .

Air Vortex Season Rainfall Wind Snowfall Coldwave Rainstorm
W01 Spr. 0.454 0.345 0.124
Northwest Vortex W02 Sum. 0.778 0.767 0.346
W03 Aut. 0.684 0.886 0.772 0.234
W04 Win. 0.223 0.346 0.357
W05 Spr. 0.334 0.245 0.123
Southwest Vortex W06 Sum. 0.661 0.454 0.214 0.221
W07 Aut. 0.321 0.566 0.174
W08 Win. 0.112 0.247

TABLE IV
THE STATISTICAL RESULT OF WEATHER WHICH INFLUENCED BY AIR TROUGH IN HX AREA . THE VALUE OF FORM IS THE PROBABILITY OF WEATHER
INFLUENCED BY DISSIMILAR AIR TROUGH IN THE DIFFERENT SEASON .

Air Trough Season Rainfall Wind Dust Thunderstorm Rainstorm


C01 Spr. 0.445 0.345 0.342 0.103
Northwest Trough C02 Sum. 0.552 0.243 0.222 0.233 0.234
C03 Aut. 0.459 0.634 0.123 0.448
C04 Win. 0.106 0.211
C05 Spr. 0.546 0.424 0.542 0.331 0.442
Tableland Trough C06 Sum. 0.641 0.354 0.221 0.214 0.433
C07 Aut. 0.354 0.578 0.254
C08 Win. 0.252 0.346 0.102


TABLE V
THE STATISTICAL RESULT OF WEATHER WHICH INFLUENCED BY SUBTROPICAL HIGH IN HX AREA . THE VALUE OF FORM IS THE PROBABILITY OF
WEATHER INFLUENCED BY DISSIMILAR SUBTROPICAL HIGH IN THE DIFFERENT SEASON .

Subtropical High Season Rainfall Sandstorm Wind Hail Cumulus Thunderstorm Rainstorm
R01 Spr. 0.545 0.386 0.348 0.232 0.422
R02 Sum. 0.712 0.456 0.345 0.553 0.323 0.554
R03 Aut. 0.554 0.557 0.678 0.232 0.102 0.349
R01 Win. 0.234 0.122

TABLE VI
THE TYPES OF WEATHER AND THE CORRESPONDING SYMBOLS.

type of weather Rainfall Wind Dust Snowfall Coldwave Thunderstorm Rainstorm Sandstorm Fractus Cumulus Hail
Symbol T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11

Fig. 1. The topology of the NN model: the input layer has 40 neurons and the output layer has 11 neurons

The Perceptron learning rule is defined as:

W(n+1) = W(n) + η · (D − a(n)) pᵀ,    (5)

where η is the rate of learning and D is the target value of the output.

IV. EXPERIMENTS

The rules of the LWF were constituted by human experts (weather forecasters) according to a great deal of historical weather data for the flight training in the HX region. Our experiment is designed to validate the usability of the NN which updates the rules of LWF for flight training. According to the historical weather data for flight training in the HX region, we select two rules from the historical weather forecast database, such as:

(1) "If LF12 = 1 and W02 = 1, then T01 = 1 and T02 = 1" (if the western cold front and the northwest vortex appear and the probabilities of them are above 0.25 at the same time in summer, then it will be rainy and windy). The rule is defined as:

if p1 = (0, 1, 0, ..., 1, 0, ..., 0)ᵀ₄₀, then a1 = (1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)ᵀ,    (6)

where LF12 = 1, W02 = 1, and the other 38 meteorological data are 0 in p1;

(2) "If LF11 = 1 and C05 = 1 and R01 = 1, then T11 = 1" (if the western cold front, the tableland trough and the subtropical high appear and the probabilities of them are above 0.25 at the same time in spring, then hail will appear). The rule is defined as:

if p2 = (1, 0, 0, ..., 1, 0, ..., 0)ᵀ₄₀, then a2 = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)ᵀ,    (7)

where LF11 = 1, C05 = 1, R01 = 1, and the other 37 meteorological data are 0 in p2.

But, in fact, the actual local weather shows that: when the western cold front and the northwest vortex appeared in summer, rainfall, wind and thunderstorm occurred with high frequency; when the western cold front, the tableland trough and the subtropical high appeared in spring, hail and sandstorm occurred with high frequency. If we forecast the local weather according to the old rules, the flight training will not be conducted safely. So we have to update the rules of LWF for flight training. We use the network architecture of Fig. 1 to update the rules; all the weights of the NN have been initialized as wij = 1 (i = 1, ..., 11; j = 1, ..., 40) and the rate of learning is η = 0.5. We use p1, p2 as the input vectors of the NN, and update the output vectors a1 and a2 as follows:

a1 = (1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0)ᵀ,
a2 = (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1)ᵀ.    (8)

We use some historical weather data and the actual results of weather to test the NN trained with p1, p2, a1, a2. The result of the testing shows that the method which uses a single-layer Perceptron NN to update the rules of the LWF for flight training is effective and fast.
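The rule update of this experiment can be sketched as follows, assuming illustrative positions for the one-entries of p1 and p2 (the real positions are fixed by the factor ordering of eq. (1)). The loop applies rule (5) with η = 0.5 from the all-ones initialization until the corrected outputs (8) are reproduced.

```python
import numpy as np

def hardlim(n):
    return (n >= 0).astype(float)

eta = 0.5
W = np.ones((11, 40))                       # w_ij = 1 initialization
p1 = np.zeros(40); p1[[1, 13]] = 1          # stands for LF12 = 1, W02 = 1 (assumed positions)
p2 = np.zeros(40); p2[[0, 24, 32]] = 1      # stands for LF11 = 1, C05 = 1, R01 = 1 (assumed)
a1 = np.array([1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0], float)   # corrected targets, eq. (8)
a2 = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1], float)

for _ in range(100):                        # iterate rule (5) until both rules fit
    converged = True
    for p, d in ((p1, a1), (p2, a2)):
        a = hardlim(W @ p)
        if not np.array_equal(a, d):
            W += eta * np.outer(d - a, p)   # W(n+1) = W(n) + eta (D - a) p^T
            converged = False
    if converged:
        break
```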

V. CONCLUSION
In this paper, we have studied LWF for flight training by
using a single-layer Perceptron neural network. Experiments
show that our method is both effective and efficient.

REFERENCES
[1] Marzbam, C., Stumpf, G.: Neural Network for tornado prediction based
on Doppler radar derived attributes, Journal of Applied Meteorology,
35(1996) 617-626.
[2] Cavazos, T.,: Downscaling large-scale circulation to local winter rainfall
in North-Eastern Mexico, International Journal of Climatology, 17(1997)
1069-1082.
[3] Hsu, K., Gao, H., Soroshian, S., Gupta, H.: Precipitation estimation from
remotely sensed information using artificial neural networks, Journal of
Applied Meteorology, 36(1997) 1176-1190.
[4] Hall, T.: Precipitation forecasting using a neural network, Weather and
Forecasting, 14(1999) 338-345.
[5] Maier, H., Dandy, G.: Neural networks for the predictions and forecasting
of water resources variables: review of modeling issues and applications,
Environmental Modelling and Software, 15(2000) 101-124.
[6] Jaruszewicz, M., Mandziuk, J.: Application of PCA method to weather
prediction task, Proceedings of the 9th International Conference on Neural
Information Processing (ICONIP’02), vol.5(2002) 2359-2364.
[7] Ramírez, M., Velho, H., Ferreira, N.: Artificial neural network technique
for rainfall forecasting applied to the São Paulo region, Journal of
Hydrology, 301(2005)146-162.
[8] Haykin S. : Neural Networks: A Comprehensive Foundation, Published
by Pearson Education Inc, Second Edition, 2004.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 531--535
Copyright@2007 Watam Press

The Research of Enterprise Strategy Management Based on Bayesian Networks

Xu Jian-zhong1, Ren Jia-song2
1 School of Economics and Management, Harbin Engineering University, P.R.China, 150001
2 School of Economics and Management, Harbin Engineering University, P.R.China, 150001

Abstract: The economical globalization and the internationalization of competition greatly increase the instability and complexity of the survival environment that enterprises face, so Chinese enterprises must pay more attention to enterprise strategy management for survival and development[1]. However, the enterprises face international and domestic changeable management environments at present, so strategic management is no longer management in the traditional sense; we need to take the enterprise as a whole and use the conformity thought to arrange the human resources, internal environment, external environment and so on, in order to realize the optimization of the enterprise's own system[2]. We must pay attention to the long term of enterprise management when we consider the integrity of enterprises; thus we need to control and manage every kind of instability in the process of enterprise management in order to achieve the dynamics of strategic management and the strategic target well[3]. The Bayesian networks, a completely statistical model, can give dual attention to the integrity and the long term of enterprise management to a very great degree; it can forecast the possibility of the realization of the strategic management goal to the maximum limit by weaving each link of the enterprise into a network, so it is an ideal model for enterprise strategy management. However, the Bayesian networks, used widely in the enterprises of western countries, still do not obtain enough attention, and this will affect the development of Chinese enterprises greatly. Therefore, we should pay more attention to the research of Bayesian networks in order to enhance the competitive power of Chinese enterprises. In this article, we analyze the integrity and long term of enterprise management firstly; then we introduce a simple Bayesian network and explain its use in enterprise strategy management by an example; at last we summarize the problems of Bayesian networks in application.

Keywords: Bayesian networks, enterprises, strategic management

1 Introduction

The development of enterprises includes the whole process of the birth, the growth and the expansion; it includes not only the increase of quantity but also the change of nature. The integrity and long term of enterprise strategy management pass throughout the whole process of the enterprises' development[4]. Therefore, enterprise strategy management also needs to take the enterprise as a whole and give dual attention to the integrity character and the long-term character.

2 The integrity and long term of strategic management

2.1 The integrity
The enterprise as a whole includes many parts which relate to and affect each other mutually. The parts have partial problems and the whole has whole problems; the whole problems are not the sum of the partial problems and have essential differences with them[5]. The development of enterprises faces many whole problems, for example responding to the great changes of environment; the development, use and conformity of resources; and the balance of the elements of production and management[6]. Planning the integrity problems well is an important condition of the enterprises' development, so we must grasp the integrity development of enterprises frequently. Grasping the whole development means not seeing only the trees while missing the forest. We must embark from the overall situation of enterprises, consider each link and factor which will influence the realization of the strategic target in the future, and do our best to find the reasons in the widest range in order to guarantee the validity of enterprise strategy management.

2.2 The long term
The enterprises also have life; life can be long or short. The investors should set up "the longevity enterprise" consciousness[7]. The operators should pay dual attention to short-term development problems and long-term development problems in order to cause the enterprise to have a long life. The long-term development questions are not the sum of the short-term development questions; there are essential differences between them[5]. The enterprises which are eager to have long life face a lot of long-term questions, such as the development goal question, the development step question, the product and technical innovation question, the brand and prestige question, the development of talent question, and the cultural reconstruction question; these questions exist in each link of the enterprises' management in all kinds of forms[8]. The enterprises craving for long life should care about the future, and not only think about the future questions ahead of time but solve them ahead of time, because solving any problem needs a process; the operators must deal with the relations between the short-term benefits and the long-term benefits correctly.


management on the long term of the enterprises carry on the supposition analysis, at the same time this is
management, find the problems and solve them as also the most fascinating characteristic of this network.
necessary in order to guarantee the realization of Therefore, the characteristic of using the visible model
enterprise strategic target. which is easy to deal with to carry on the supposed
In fact, the integrity questions of enterprise strategy management embody the scope of management, namely that the operators need to find problems to the maximum extent; the long term questions manifest efficiency, namely finding the problems as early as necessary[9]. Therefore, a model which can give dual attention to the integrity and the long term is suitable for the application of strategic management, and the Bayesian network, as a complete statistical model, can satisfy these basic requests of enterprise strategy management well.

3 The summary of Bayesian networks[10]

The Bayesian network is a kind of statistical model which studies the marginal distributions of a series of stochastic factors and the joint distribution tendency of their risk characteristics. The most basic and simple Bayesian network "construction" is a directed acyclic graph: the points of the graph represent the random variables, and the line between two points expresses the relation between them. The right part of Fig.1 illustrates a simple Bayesian network "construction": there is a single analyzed target in this network (namely the end point Z), and Z has two mother points, X and Y; the distribution of every point is shown on the left side. An initial point is a point having no upper-level points (for example X and Y); each of them presents the distribution of a single variable, and this kind of distribution must be specified correctly in the model: in Fig.1, the probability that X equals 0 is 20%, and so on by analogy. The end point Z presents a distribution over many facets; its distribution is determined by the distributions and conditional probabilities of the initial points (in this example, Z in condition a expresses that X equals 2 and Y equals 1). Not all the conditional probabilities are shown in Fig.1; only the combined distribution of the goal point is shown on its right side.

[Fig.1 The points and probabilities of a simple Bayesian network: X (20% condition 0, 50% condition 1, 30% condition 2) and Y (30% condition 0, 70% condition 1) are the mother points of Z (30% condition a, 40% condition b, 30% condition c).]

This network uses the Bayesian rules to carry out the computation for network dissemination: if the probabilities of the initial points and the conditional probabilities of all points are determined, then the distributions of all points in the network can be quantified. For two events Y and Z, the concrete meaning of the Bayesian rule is as in (1).

P(Z|Y) = P(Y|Z)P(Z) / P(Y)    (1)
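As a worked instance of rule (1), using numbers that will appear in the example of Section 4: with P(S|T) = 0.96, P(T) = 0.24 and P(S) = 0.458, rule (1) gives P(T|S) = 0.96 × 0.24 / 0.458 ≈ 0.50, so observing high working pressure raises the probability of an oversized staff quantity from 0.24 to about 0.50.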
More important: if the condition of a point in the network is fixed, the network can use the Bayesian rules to carry out the computation from top to bottom or from bottom to top by itself, and thus obtain the posterior probability of any point in the network. This is the foundation of using the Bayesian network to carry out supposition ("what-if") analysis, and at the same time it is the most fascinating characteristic of this network. Therefore, the ability to carry out supposed analysis on a visible, easily handled model is the most important reason for using it.

4 A simple example of Bayesian networks

In view of the above characteristics of Bayesian networks, we introduce them into enterprise strategy management in order to control and manage strategic targets. Enterprise strategy management includes many links and aspects; in this article we carry out a supposed analysis based on Bayesian networks for one sub-module: enhancing staff efficiency.

Staff efficiency involves two aspects, staff wages and staff quantity: lower staff wages and a smaller staff quantity indicate that staff efficiency is higher and can easily bring the enterprise more benefits in the short term. However, staff quantity determines the working pressure, and long-lasting, oversized working pressure can also affect the working efficiency of the staff. At the same time, staff wages and staff quantity together determine the manpower cost, as in Fig.2.

[Fig.2 The sub-module of enhancing staff efficiency: staff efficiency influences staff wages and staff quantity; these two together determine the manpower cost, and staff quantity also determines the working pressure.]

We turn the model in Fig.2 into the form of simple letters for the following operations, as in Fig.3.


[Fig.3 The sub-module expressed by simple letters: A at the top, F and T in the middle, C and S at the bottom.]

With this kind of model, we can use the Bayesian rules to carry out the probability operations for each point in the network.

Step 1: According to statistical data and the experience of experts, we set P(A) = 0.2 and P(Ā) = 0.8, and at the same time assign the conditional probabilities between the various points, as in Tab.1, Tab.2 and Tab.3.

Tab.1 The conditional probabilities of F and T given A
      F     F̄     T     T̄
A    0.3   0.7   0.8   0.2
Ā    0.5   0.5   0.1   0.9

Tab.2 The conditional probabilities of C given F and T
         C      C̄
F T     0.98   0.02
F̄ T     0.8    0.2
F T̄     0.7    0.3
F̄ T̄     0.1    0.9

Tab.3 The conditional probabilities of S given T
      S      S̄
T    0.96   0.04
T̄    0.3    0.7

The letters denote the following conditions:
A: the staff efficiency is excessively low; Ā: the staff efficiency is normal
F: the staff wages are excessively high; F̄: the staff wages are normal
T: the staff quantity is too large; T̄: the staff quantity is normal
C: the manpower cost is excessively high; C̄: the manpower cost is normal
S: the working pressure is high; S̄: there is no working pressure

Step 2: Calculating the combined probabilities.
P(AFT) = P(A)P(F|A)P(T|A) = 0.2 × 0.3 × 0.8 = 0.048
P(ĀFT) = P(Ā)P(F|Ā)P(T|Ā) = 0.8 × 0.5 × 0.1 = 0.040
P(AF̄T) = P(A)P(F̄|A)P(T|A) = 0.2 × 0.7 × 0.8 = 0.112
P(ĀF̄T) = P(Ā)P(F̄|Ā)P(T|Ā) = 0.8 × 0.5 × 0.1 = 0.040
P(AFT̄) = P(A)P(F|A)P(T̄|A) = 0.2 × 0.3 × 0.2 = 0.012
P(AF̄T̄) = P(A)P(F̄|A)P(T̄|A) = 0.2 × 0.7 × 0.2 = 0.028
P(ĀFT̄) = P(Ā)P(F|Ā)P(T̄|Ā) = 0.8 × 0.5 × 0.2 = 0.080
P(ĀF̄T̄) = P(Ā)P(F̄|Ā)P(T̄|Ā) = 0.8 × 0.5 × 0.9 = 0.360

P(FTC) = P(C|FT)[P(AFT) + P(ĀFT)] = 0.98 × 0.128 = 0.125
P(FTC̄) = P(C̄|FT)[P(AFT) + P(ĀFT)] = 0.02 × 0.128 = 0.003
P(F̄TC) = P(C|F̄T)[P(AF̄T) + P(ĀF̄T)] = 0.8 × 0.172 = 0.138
P(FT̄C) = P(C|FT̄)[P(AFT̄) + P(ĀFT̄)] = 0.7 × 0.112 = 0.078
P(F̄T̄C) = P(C|F̄T̄)[P(AF̄T̄) + P(ĀF̄T̄)] = 0.1 × 0.588 = 0.059
P(F̄TC̄) = P(C̄|F̄T)[P(AF̄T) + P(ĀF̄T)] = 0.2 × 0.172 = 0.034
P(FT̄C̄) = P(C̄|FT̄)[P(AFT̄) + P(ĀFT̄)] = 0.3 × 0.112 = 0.034
P(F̄T̄C̄) = P(C̄|F̄T̄)[P(AF̄T̄) + P(ĀF̄T̄)] = 0.9 × 0.588 = 0.529

P(TS) = P(T)P(S|T) = 0.24 × 0.96 = 0.230
P(TS̄) = P(T)P(S̄|T) = 0.24 × 0.04 = 0.010
P(T̄S) = P(T̄)P(S|T̄) = 0.76 × 0.3 = 0.228
P(T̄S̄) = P(T̄)P(S̄|T̄) = 0.76 × 0.7 = 0.532

Step 3: Calculating the boundary (marginal) probabilities.
P(F) = Σ_A Σ_T P(AFT) = 0.18
P(T) = Σ_A Σ_F P(AFT) = 0.24
P(C) = Σ_F Σ_T P(FTC) = 0.4
P(S) = Σ_T P(TS) = 0.458

According to the results of the computation, we find that the probabilities of excessively high staff wages, too large a staff quantity, excessively high manpower cost and high working pressure are respectively 0.18, 0.24, 0.4 and 0.458, when the probability of excessively low staff efficiency is 0.2. Obviously, staff wages and staff quantity are basically normal, but the manpower cost is slightly high and there is a certain working pressure. If the operators want to reduce the manpower cost and the working pressure further in order to control them within a certain scope, they can first determine the target probability ranges of manpower cost and working pressure, then carry out the reverse operation according to the Bayesian rules and keep staff efficiency controlled within the corresponding scope, so as to achieve the anticipated effect.
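For readers who want to reproduce the propagation mechanically, the following minimal Python sketch enumerates the joint distribution implied by Tab.1, Tab.2 and Tab.3 (the variable names and helper functions are ours, not part of the original model). It reproduces P(T) = 0.24 and P(S) ≈ 0.458 exactly; the printed values for P(F) and P(C) are not recovered exactly, which points to transcription slips in the intermediate products above.

import itertools

def bern(p_true, v):            # P(X = v) when P(X = True) = p_true
    return p_true if v else 1.0 - p_true

def p_a(a): return bern(0.2, a)                       # P(A) from Step 1
def p_f(f, a): return bern(0.3 if a else 0.5, f)      # Tab.1, F column
def p_t(t, a): return bern(0.8 if a else 0.1, t)      # Tab.1, T column
P_C = {(True, True): 0.98, (False, True): 0.8,
       (True, False): 0.7, (False, False): 0.1}       # Tab.2
def p_c(c, f, t): return bern(P_C[(f, t)], c)
def p_s(s, t): return bern(0.96 if t else 0.3, s)     # Tab.3

def joint(a, f, t, c, s):
    return p_a(a) * p_f(f, a) * p_t(t, a) * p_c(c, f, t) * p_s(s, t)

def marginal(name):
    i = "AFTCS".index(name)
    return sum(joint(*v) for v in itertools.product([True, False], repeat=5)
               if v[i])

for name in "FTCS":
    print("P(%s) = %.4f" % (name, marginal(name)))

# Reverse operation: posterior of low staff efficiency given high manpower cost
num = sum(joint(True, f, t, True, s)
          for f, t, s in itertools.product([True, False], repeat=3))
print("P(A|C) = %.4f" % (num / marginal("C")))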


On the basis of the same principle, we can also design Bayesian sub-modules for the other links of enterprise strategy management and then calculate the probabilities of their points in the same way, in order to achieve the goals of control and management.

5 The problems in using Bayesian networks

Just as with other data mining and risk control models, the Bayesian network is not an all-purpose model; even within the same management domain there can be a series of distinct questions, such as the size of the company and the positioning of the target market. Therefore, we should pay attention to the following problems when applying Bayesian networks[11].

5.1 Determining the probabilities and conditional probabilities of the points

In the example above, the probabilities and conditional probabilities of the points are the result of combining statistical data with the experience of experts. In actual applications, determining these probabilities is the most difficult link, and the accuracy of this determination directly affects how well the Bayesian network works. The probabilities are mainly produced qualitatively from the concrete analysis of these two sources[11]. Between the two, the accumulation of primary data is the more important link: without primary data there is no reference for statistical analysis, so the more primary data accumulated, the more valuable the analysis. When determining the probabilities and conditional probabilities of the points, an enterprise can also assign different weights to the statistical data and the expert advice according to its actual situation, and finally combine them into the probabilities of the points.
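To make this weighting concrete, a minimal sketch (the weights and probabilities below are invented for illustration):

def combine(p_data, p_expert, w_data=0.7):
    """Weighted blend of a statistically estimated probability and an
    expert-elicited one; w_data reflects how much primary data exists."""
    return w_data * p_data + (1.0 - w_data) * p_expert

print(combine(0.25, 0.10))  # 0.205: a data-dominated estimate for one point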

5.2 The design of the sub-module is not unique

In the entire Bayesian network, the design of the sub-module is an especially important link. It is the part which most clearly reflects the designer's intent, and it is also the key part which determines how satisfied the managers will be with the results. Here, what we examined is the degree to which staff efficiency influences the human resources policy of the enterprise, so we mainly chose several qualitative indices, such as staff wages, staff quantity, manpower cost and working pressure, as points; if we wanted to examine how staff efficiency influences the management achievements of the enterprise, we would have to choose other indices for the analysis. Moreover, for the same question there can be many different sub-module designs. In practice we must choose the indices according to the actual situation of the enterprise and the preferences of the managers.

5.3 Not depending totally on the computed results for decision-making

Although the Bayesian network is a widely applied statistical model, we can see from the above analysis that all the probabilities in the network are experience values obtained by qualitative analysis, so some subjective factors are inevitable. Therefore, the managers cannot depend on the computed results alone to make decisions; the results can only provide directive reference, not precise prescriptions. The managers should combine the direction indicated by the computed results with the actual situation of their own enterprise to reach a comprehensive decision. More precisely, the Bayesian network model is in fact a kind of risk early-warning model: it can prompt the managers to find problems promptly and then make their own judgments.

6 Conclusion

Facing economic globalization and the ceaseless intensification of international competition, strategic management is obtaining more and more attention from the business community. Managers must innovate unceasingly and seek new management methods in order to adapt to the new situation, while giving dual attention to the integrity and the long term of enterprise strategy management[12]. The Bayesian network, as a complete statistical model, can give dual attention to these two requests to a great degree, find and solve problems promptly, and realize the strategic plans and goals of the enterprise effectively, so it is an ideal method for enterprise strategy management. With its continuing application and improvement in the strategic management domain, we believe that Bayesian networks will certainly promote the development of enterprise strategy management in our country.

References

[1] Bryan A. Lukas, J. Justin Tan, G. Tomas M. Hult, Strategic fit in transitional economies: The case of China's electronics industry, in Journal of Management, 2001, 27, pp. 409-429.
[2] A. D. F. Price and E. Newson, Strategic Management: Consideration of Paradoxes, Processes, and Associated Concepts as Applied to Construction, in Journal of Management in Engineering, 2003, 10, pp. 18-36.
[3] Keith D. Brouthers, Patrick Arens, Privatization and Strategic Fit: Evidence from Rumania, in Business Strategy Review, 1999, 10, pp. 53-59.
[4] Matthew S. Kraatz, Edward J. Zajac, How Organizational Resources Affect Strategic Change and Performance in Turbulent Environments: Theory and Evidence, in Organization Science, 2001, 5, pp. 632-610.
[5] Zhang Wan-chun, Tan Zhong-fu, The Integration Thought Model on Enterprise Strategic Management, in Journal of North China Electric Power University, 2006, 1, pp. 34-37. (in Chinese)
[6] Luo Yi-xin, Probe and Seek for Main Questions on Business's Strategic Management in China and its Countermeasure, in Science Technology and Industry, 2005, 12, pp. 9-11. (in Chinese)
[7] Sampler, Jeffrey L., James E., Strategy in dynamic information-intensive environments, in Journal of Management Studies, 1998, 4, pp. 429-436.
[8] Barnett, William P., Burgelman, Robert, Evolutionary perspectives on strategy, in Strategic Management Journal, 1996, 7, pp. 5-19.
[9] Neil Wrigley, Strategic market behaviour in the internationalization of food retailing, in European Journal of Marketing, 2000, 3, pp. 891-907.
[10] Carol Alexander, Operational Risk: Regulation, Analysis and Management. Chinese finance publishing house, 2004, pp. 307-309. (in Chinese)
[11] Hu Xiao-ming, The Research of Software Risk Management Based on Bayesian Networks. Nanjing University of Science and Technology, 2004, pp. 33-35. (in Chinese)
[12] Charles Chi Cui, Derrick F. Ball and John Coyne, Working effectively in strategic alliances through managerial fit between partners: some evidence from Sino-British joint ventures and the implications for R&D professionals, in R&D Management, 2002, 4, pp. 25-61.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 536--542
Copyright@2007 Watam Press

Application of a Swarm-based Artificial Neural Network to Ultrasonic Detector Based Machine Condition Monitoring
Shan He (s.he@cs.bham.ac.uk), Xiaoli Li (xiaoli.avh@gmail.com),
Cercia, School of Computer Science,
The University of Birmingham, Birmingham, B15 2TT, UK

Abstract—Artificial Neural Networks (ANNs) have been applied to machine condition monitoring. This paper first addresses an ANN trained by the Group Search Optimizer (GSO), a novel Swarm Intelligence (SI) optimization algorithm inspired by animal social foraging behaviour. The global search performance of GSO has been shown to be competitive with other evolutionary algorithms, such as Genetic Algorithms (GAs) and the Particle Swarm Optimizer (PSO). Herein, the parameters of a 3-layer feed-forward ANN, including connection weights and biases, are tuned by the GSO algorithm. Secondly, the GSO based ANN is applied to model and analyse ultrasound data recorded from grinding machines in order to distinguish different conditions. The real experimental results show that the proposed method is capable of indicating machine malfunction from the ultrasound data.

Index Terms—Ultrasonic, Condition Monitoring, Artificial Neural Networks, Swarm Intelligence, Evolutionary Algorithms

Correspondence and requests for materials should be addressed to Dr. X. Li (email: xiaoli.avh@gmail.com).

[Fig. 1. Intelligent condition monitoring system: sensors and amplifiers, extraction of measured signal features, automatic feature selection, automatic feature integration, and state of the observed phenomenon.]

I. INTRODUCTION
Condition monitoring systems are very important for machinery such as aircraft, machine tools, and so on [1]. The main technologies of condition monitoring include sensors, signal processing and the classification of conditions [2]. A typical system is shown in Fig. 1.

New techniques from computational intelligence have attracted the interest of the condition monitoring field. These techniques include advanced signal/data analysis, fuzzy logic, artificial neural networks, evolutionary computation and machine learning [1]. These computational techniques inspired by nature have shown promise in many condition monitoring systems, but moving them from simulated data sets, toy problems or laboratory settings to real industrial applications is still challenging [3].

Applications of ultrasonic detectors for nondestructive testing are found in numerous industries, including refineries, pipelines, power generation (nuclear or other), aircraft, offshore oil platforms, paper mills and structures (bridges, cranes, etc.). So far, the ultrasonic detector can address the following problems in industrial practice: leak detection, crack detection, diagnostics and process monitoring. In order to apply the ultrasonic detector to a condition monitoring system, an advanced method should be developed to process the signals from the detector. Usually, an adaptive filter may be applied to the time series. For instance, the adaptive LMS filter's output is defined by a linear combination of the order statistics of the input samples in the filter window:

Y(k) = a^T(k) X_r(k)

The coefficient vector a(k) is adapted at each step k according to the LMS adaptation algorithm. The core of this filter is to build a simple model for the time series; the difference between the original series and the output of the model, the so-called residue, is then treated as noise. The noise information can be directly employed to diagnose the condition of the machine [4]. It is a noted fact that machine faults often result from some sort of nonlinear operation of the machine involved, which may in turn lead to nonlinearities occurring in the machine's signatures, such as ultrasonic data [5]. Newer signal processing techniques, such as neural networks, can detect these nonlinearities. In particular, neural networks, fuzzy logic and evolutionary programming are widely applied to such complicated problems.
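As a minimal sketch of this residue idea (plain LMS rather than the order-statistic variant mentioned above, with an illustrative step size and window length):

import numpy as np

def lms_residual(x, window=8, mu=0.01):
    """One-step LMS prediction; returns the residue (prediction error)."""
    a = np.zeros(window)                  # adaptive coefficient vector a(k)
    res = np.zeros(len(x))
    for k in range(window, len(x)):
        xr = x[k - window:k][::-1]        # samples in the filter window
        y = a @ xr                        # filter output Y(k)
        e = x[k] - y                      # residue: actual minus model output
        a += 2 * mu * e * xr              # LMS coefficient update
        res[k] = e
    return res

# A residue with unusually large variance can flag an abnormal condition.
sig = np.sin(np.linspace(0, 20, 500)) + 0.05 * np.random.randn(500)
print(np.var(lms_residual(sig)))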
In this study, an ultrasonic detector is used to detect the malfunction of a machine. Ultrasonic detectors are made of piezo-electric quartz crystals. The detector can sense ultrasound generated by a machine. Ultrasound from a mechanical


system is energy that created by the friction between moving input layer hidden layer output layer
components, such as bearings, gear mesh, etc. Therefore, the whn wkn
ultrasound is a reflection of machine condition. For example, x1 f1 (¦ ) f1 (¦ ) ŷ1
the friction varies with the lubrication condition. Different
friction leads different ultrasound, in turn, the ultrasound
can be employed to predict the lubrication condition. Since
1980’s, the ultrasonic detectors have been applied for various xn f h (¦ ) f k (¦ ) ŷ k
maintenance of mechanical system.
Previous work show that ultrasound-based condition moni-
toring is better than vibration. For instance, the low frequency
vibration can indicate the wearing state of a bearing and xN f H (¦ ) f K (¦ ) ŷ K
provide information about root cause of premature failure.
However, ultrasound can indicate necessary lubrication in- Th Tk
tervals, and triggers alarms before the bearing enters failure bias bias
state. Ultrasound data is non-stationary, the traditional signal
-1 -1
processing methods fail to build a suitable model to simulate
it. In this paper, we will use a novel neural network trained Fig. 2. A three-layer feed-forward ANN.
by GSO to solve this problem.
GSO is a novel SI algorithm for continuous optimization problems [6]. The algorithm is based on a generic social foraging model, the Producer-Scrounger (PS) model, which is different from the metaphors used by ACO and PSO. In order to evaluate its performance, an extensive experimental study has been carried out; from the experimental results, it was found that the GSO algorithm has better search performance on large-scale multi-modal benchmark functions. Probably the most significant merit of GSO is that it provides an open framework for using research in animal behavioural ecology to tackle hard optimization problems. In this paper, the GSO algorithm is applied to ANN training to model ultrasound data for condition monitoring.

The rest of the paper is organized as follows. We present the background information on evolving ANNs and SI algorithms in Section II. The GSO algorithm and the GSO based ANN are introduced in Section III. In Section IV, the ultrasound data, the experimental settings and the results are given. The paper is concluded in Section V.

II. BRIEF OVERVIEW OF ADVANCED ARTIFICIAL NEURAL NETWORKS

Artificial Neural Networks (ANNs) are inspired by the workings of the human brain. They can be used as a universal function approximator, mapping an arbitrary number of inputs onto an arbitrary (but smaller) number of outputs (generally just one decision variable). This feature is particularly useful for modeling very complex systems. Essentially, neural networks model the process of the activation and strengthening of neural connections. Generally, they are built in layers comprising an input layer, an output layer and one or more hidden layers. Fig. 2 shows a very simple ANN model. Each neural connection is given a certain weight; these weights are then tuned on a training set by ANN training algorithms so that the ANN output most closely resembles the quantity the network is designed to fit. The objective of ANN training algorithms is to minimize the least-square error between the desired output of the training set and the actual output of the ANN [7].

In this section, we loosely refer to evolving artificial neural networks as those ANNs trained by Evolutionary Algorithms (EAs) and SI algorithms, such as PSO.

Since the renaissance of ANNs in the 80s, EAs have been introduced to ANNs to perform various tasks, such as connection weight training, architecture design, learning rule adaptation, input feature selection, connection weight initialization, rule extraction from ANNs, etc. [8].

In [9], an improved genetic algorithm was used to tune the structure and parameters of a neural network. In order to tune the structure of the ANN in a simple way, link switches were incorporated into a three-layer neural network; by introducing link switches, a given fully connected feed-forward neural network may become a partially connected network after training [9]. An improved Genetic Algorithm (GA) with new genetic operators was introduced to train the proposed ANN. Two application examples, sunspot forecasting and associative memory tuning, were solved in their study.

Palmes et al. proposed a mutation-based genetic neural network (MGNN) [10]. A simple matrix encoding scheme was used to represent an ANN's architecture and weights. The network utilized a mutation strategy of local adaptation from evolutionary programming to evolve network structures and connection weights dynamically. Three classification problems, namely iris classification, the wine recognition problem, and the Wisconsin breast cancer diagnosis problem, were used in their paper as benchmarks.

In [11] a new type of EANN called the memetic pareto artificial neural network (MPANN) was developed for breast cancer diagnosis. A multi-objective differential evolution algorithm was employed to determine the number of the ANN's hidden neurons and to train its connection weights. In order to speed up the training process, a so-called memetic approach, which incorporates the BP algorithm, was used.

Cantú-Paz and Kamath presented an empirical evaluation of eight combinations of EAs and ANNs on 11 well-studied real-world benchmarks and 4 synthetic problems [12]. The algorithms they used included binary-encoded and real-encoded GAs, and the BP algorithm. The tasks performed by these


algorithms and their combinations included searching for weights, designing the architecture of ANNs, and selecting feature subsets for ANN training.

SI algorithms have also been applied to ANN training. The expression "swarm intelligence" was coined by Beni and Wang in 1989 [13]. There is no commonly accepted definition of Swarm Intelligence (SI); as defined in [14], SI is "an artificial intelligence technique based around the study of collective behavior in decentralized, self-organized systems."

According to [14], and as generally accepted by most researchers in SI, the most prominent components of SI are the Ant Colony Optimiser (ACO) and the Particle Swarm Optimiser (PSO), both of which are based on observations of collective animal behaviour. ACO is inspired by real ants' foraging behaviour: in the ACO algorithm, artificial ants build solutions by moving on the problem graph and depositing artificial pheromone on the graph so that future artificial ants can build better solutions [14]. ACO has been successfully applied to a number of difficult optimization problems, e.g., traveling salesman problems. PSO is another well-known SI algorithm, which gleans ideas from animal aggregation behaviour; artificial life models such as BOID, which can mimic animal aggregation vividly, serve as the direct inspiration of PSO. The PSO algorithm is particularly attractive to practitioners because it has only a few parameters to adjust. In the past few years, the PSO algorithm has been successfully applied in many areas [15], [16].

In their seminal paper [17], Kennedy and Eberhart, the inventors of PSO, first applied PSO to train a simple multi-layer feedforward ANN to solve XOR problems. Since then, more and more SI algorithms, including PSO and ACO, have been applied to ANN training [18], [19].
III. GROUP SEARCH OPTIMIZER BASED ARTIFICIAL NEURAL NETWORKS

A. Group Search Optimizer

The Group Search Optimizer is a novel swarm intelligence algorithm inspired by animal social foraging behavior [6]. The GSO algorithm employs the Producer-Scrounger (PS) model [20] as a framework. The PS model was first proposed by C. J. Barnard to analyze the social foraging strategies of group-living animals. In this model, it is assumed that there are two foraging strategies within groups: (1) producing, e.g., searching for food; and (2) joining (scrounging), e.g., joining resources uncovered by others. Foragers are assumed to use producing or joining strategies exclusively. Under this framework, concepts of resource searching from animal scanning mechanisms are used to design the optimum searching strategies of the GSO algorithm.

Basically, GSO is a population-based optimization algorithm. The population of the GSO algorithm is called a group and each individual in the population is called a member. In an n-dimensional search space, the ith member at the kth searching bout (iteration) has a current position X_i^k ∈ R^n, a head angle φ_i^k = (φ_i1^k, ..., φ_i(n-1)^k) ∈ R^(n-1) and a head direction D_i^k(φ_i^k) = (d_i1^k, ..., d_in^k) ∈ R^n, which can be calculated from φ_i^k via a polar to Cartesian coordinates transformation:

d_i1^k = ∏_{p=1}^{n-1} cos(φ_ip^k)
d_ij^k = sin(φ_i(j-1)^k) · ∏_{p=j}^{n-1} cos(φ_ip^k),  j = 2, ..., n-1
d_in^k = sin(φ_i(n-1)^k)    (1)

As in the PS model, a GSO group comprises producers and scroungers, which perform producing and scrounging strategies respectively. We also employ "rangers", which perform random walks to avoid entrapment in local minima. For accuracy [21] and convenience of computation, the GSO algorithm simplifies the PS model by assuming that there is only one producer at each searching bout. This is based on recent research [21], which suggested that the larger the group, the smaller the proportion of informed individuals needed to guide the group with better accuracy. The simplest joining policy, in which all scroungers join the resource found by the producer, is used.

During each search bout, the group member located in the most promising area, i.e., conferring the best fitness value, acts as the producer. It then stops and scans the environment to search for resources (optima). Vision, the main scanning mechanism used by many animal species [22], is employed by the producer in GSO. In order to handle optimization problems whose number of dimensions is usually larger than 3, the scanning field of vision is generalized to an n-dimensional space, characterized by a maximum pursuit angle θ_max ∈ R^(n-1) and a maximum pursuit distance l_max ∈ R^1, as illustrated for a 3D space in Figure 3. In the GSO algorithm, at the kth iteration the producer X_p behaves as follows:

[Fig. 3. Scanning field in 3D space [22], showing the forward-directed maximum pursuit angle θ_max and the maximum pursuit distance l_max.]

1) The producer scans at zero degrees and then scans laterally by randomly sampling three points in the scanning field [23]: one point at zero degrees:

X_z = X_p^k + r_1 l_max D_p^k(φ^k)    (2)

one point in the right-hand-side hypercube:

X_r = X_p^k + r_1 l_max D_p^k(φ^k + r_2 θ_max / 2)    (3)

and one point in the left-hand-side hypercube:

X_l = X_p^k + r_1 l_max D_p^k(φ^k − r_2 θ_max / 2)    (4)


where r_1 ∈ R^1 is a normally distributed random number with mean 0 and standard deviation 1, and r_2 ∈ R^(n-1) is a uniform random sequence in the range (0, 1).

2) The producer then finds the best of these points, i.e., the one with the best resource (fitness value). If that point has a better resource than its current position, it flies to this point; otherwise it stays in its current position and turns its head to a new angle:

φ^(k+1) = φ^k + r_2 α_max    (5)

where α_max is the maximum turning angle.

3) If the producer cannot find a better area after a iterations, it turns its head back to zero degrees:

φ^(k+a) = φ^k    (6)

where a is a constant given by round(√(n+1)).

At each iteration, a number of group members are selected as scroungers, which keep searching for opportunities to join the resources found by the producer. The commonest scrounging behavior [20] in house sparrows (Passer domesticus), area copying, i.e., moving across to search in the immediate area around the producer, is adopted. At the kth iteration, the area-copying behavior of the ith scrounger can be modeled as a random walk towards the producer:

X_i^(k+1) = X_i^k + r_3 (X_p^k − X_i^k)    (7)

where r_3 ∈ R^n is a uniform random sequence in the range (0, 1).

In group-living animals, group members often have different searching and competitive abilities; subordinates, which are less efficient foragers than the dominant, will be dispersed from the group [24]. This may result in ranging behavior, an initial phase of a search that starts without cues leading to a specific resource [25]. In our GSO algorithm, rangers are introduced to explore new search space and thereby avoid entrapment in local minima. Random walks, which are thought to be the most efficient searching method for randomly distributed resources [26], are employed by the rangers. If the ith group member is selected as a ranger, at the kth iteration it generates a random head angle:

φ_i^(k+1) = φ_i^k + r_2 α_max    (8)

where α_max is the maximum turning angle; it chooses a random distance:

l_i = a · r_1 l_max    (9)

and moves to the new point:

X_i^(k+1) = X_i^k + l_i D_i^k(φ^(k+1))    (10)

In order to maximize their chances of finding resources, animals restrict their search to a profitable patch. One strategy is turning back into the patch when its edge is detected [27]. This strategy is employed by GSO to handle the bounded search space: when a member is outside the search space, it turns back to its previous position inside the search space.
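For orientation, the following compressed Python sketch shows search bouts of GSO on a toy objective. It is a simplification, not the authors' implementation: direction vectors are drawn directly on the unit sphere instead of via the head-angle mapping of Eq. (1), and the 80% scrounger proportion is an assumed setting.

import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Toy fitness to minimize."""
    return float(np.sum(x ** 2))

n, n_members, l_max = 5, 20, 1.0
X = rng.uniform(-5.0, 5.0, (n_members, n))

def rand_dir():
    d = rng.standard_normal(n)
    return d / np.linalg.norm(d)

for k in range(100):
    fit = np.array([sphere(x) for x in X])
    p = int(np.argmin(fit))                    # best member is the producer

    # Producer: sample three points (ahead, right, left) within l_max.
    cands = [X[p] + abs(rng.standard_normal()) * l_max * rand_dir()
             for _ in range(3)]
    best = min(cands, key=sphere)
    if sphere(best) < fit[p]:
        X[p] = best                            # fly to the better point

    for i in range(n_members):
        if i == p:
            continue
        if rng.random() < 0.8:                 # scrounger: Eq. (7)
            X[i] = X[i] + rng.random(n) * (X[p] - X[i])
        else:                                  # ranger: random walk, Eqs. (8)-(10)
            X[i] = X[i] + abs(rng.standard_normal()) * l_max * rand_dir()
        # keep members inside the bounded space (the paper instead turns
        # an escaped member back to its previous position)
        X[i] = np.clip(X[i], -5.0, 5.0)

print("best fitness:", min(sphere(x) for x in X))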
B. Training Artificial Neural Networks using GSO

The ANN weight training problem is essentially a hard continuous optimization problem, because the search space is high-dimensional and multi-modal and is usually polluted by noise and missing data. It is therefore quite logical to apply our GSO algorithm to ANN weight training. The objective of the ANN weight training process is to minimize the ANN's error function. However, it has been pointed out that minimizing the error function is different from maximizing generalization [28]. Therefore, to improve the ANN's generalization performance, an early stopping scheme is introduced in this study: the error rates on validation sets are monitored during the training process, and when the validation error increases for a specified number of iterations, the training stops. In this study, the GSO algorithm and the early stopping scheme have been applied together to train an ANN.
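In outline, with gso_step and validation_error as hypothetical callbacks for the search bout and the validation-set evaluation (the patience value is an illustrative choice):

def train(gso_step, validation_error, max_iter=500, patience=20):
    """Run GSO search bouts until the validation error stops improving."""
    best_err, best_iter = float("inf"), 0
    for k in range(max_iter):
        gso_step()                    # one producing/scrounging/ranging bout
        err = validation_error()      # monitored error on the validation set
        if err < best_err:
            best_err, best_iter = err, k
        elif k - best_iter >= patience:
            break                     # error has risen for `patience` bouts
    return best_err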


In this study, the ANN we employed consists of three layers: input, hidden, and output. The nodes in each layer receive input signals from the previous layer and pass their outputs to the subsequent layer. The nodes of the input layer supply the respective elements of the activation pattern (input vector), which constitute the input signals applied through weighted links to the nodes in the hidden layer. The output signals of the nodes in the output layer constitute the overall response of the network to the activation pattern supplied by the source nodes in the input layer [7]. The subscripts n, h, and k denote nodes in the input, hidden, and output layers, respectively. The net input u is defined as the weighted sum of the incoming signals minus a bias term. The net input u_h of node h in the hidden layer is

u_h = Σ_n w_hn y_n − θ_h

where y_n is the output of node n in the input layer, w_hn is the connection weight from node n in the input layer to node h in the hidden layer, and θ_h is the bias of node h. The activation function used in the proposed ANN is the sigmoid function; in the hidden layer, the output y_h of node h is

y_h = f_h(u_h) = 1 / (1 + e^(u_h))

The output of node k in the output layer is likewise

y_k = f_k(u_k) = 1 / (1 + e^(u_k))    (11)

where

u_k = Σ_h w_kh y_h − θ_k

and θ_k is the bias of node k in the output layer.

The parameters (connection weights and bias terms) are tuned by the GSO algorithm. In the GSO-based training algorithm, each member of the population is a vector comprising connection weights and bias terms. Without loss of generality, we denote by W_1 the connection weight matrix between the input layer and the hidden layer, by Θ_1 the bias terms of the hidden layer, by W_2 the weight matrix between the hidden layer and the output layer, and by Θ_2 the bias terms of the output layer. The ith member of the population can then be represented as X_i = [W_1^i Θ_1^i W_2^i Θ_2^i]. The fitness function assigned to the ith individual is the least-squared error function

F_i = (1/2) Σ_{p=1}^{P} Σ_{k=1}^{K} (d_kp − y_kp^i)^2    (12)

where y_kp^i is the kth computed output in equation (11) of the ANN for the pth sample vector of the ith member, P denotes the total number of sample vectors, and d_kp is the desired output at the kth output node.

[Fig. 4. Schematic diagram of GSO based ANN: the ANN output is compared with the desired output, and GSO adjusts the parameters from the error.]

[Fig. 5. ANN for time series modeling: past values y(t−1), ..., y(t−l) feed a three-layer network producing the estimate ŷ(t).]

[Fig. 6. The structure of the model-based condition monitoring system: the measured ultrasound is fed to ANN models 1...N, residuals are formed, and a classifier distinguishes the states.]
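A minimal sketch of this member encoding and of the fitness of Eq. (12), assuming the 6-4-1 architecture used later in Section IV; the flat packing order and the standard form of the sigmoid are our conventions:

import numpy as np

def unpack(member, n_in=6, n_hid=4, n_out=1):
    """Split a flat member vector into W1, Theta1, W2, Theta2."""
    i = 0
    W1 = member[i:i + n_hid * n_in].reshape(n_hid, n_in); i += n_hid * n_in
    t1 = member[i:i + n_hid]; i += n_hid
    W2 = member[i:i + n_out * n_hid].reshape(n_out, n_hid); i += n_out * n_hid
    t2 = member[i:i + n_out]
    return W1, t1, W2, t2

def fitness(member, X, D):
    """Least-squared error of Eq. (12) for one member over P samples."""
    W1, t1, W2, t2 = unpack(member)
    sig = lambda u: 1.0 / (1.0 + np.exp(-u))      # standard sigmoid
    H = sig(X @ W1.T - t1)                        # hidden layer outputs
    Y = sig(H @ W2.T - t2)                        # network outputs
    return 0.5 * np.sum((D - Y) ** 2)

rng = np.random.default_rng(0)
member = rng.normal(size=6 * 4 + 4 + 4 * 1 + 1)   # 33 parameters in total
print(fitness(member, rng.normal(size=(10, 6)), rng.normal(size=(10, 1))))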
IV. APPLICATION OF GSOANN TO ULTRASONIC DETECTOR
A. Ultrasound Modeling with GSOANN

Ultrasound is a time series generated by a dynamic system such as a running machine; in fact, the ultrasound is an output of this dynamic system. A simple idea is to build a model of this dynamic system using this output. Traditional methods make use of the relation between past and current values to model the dynamical system. Herein, we apply GSOANN to build a model of the ultrasound from a grinding machine tool. Denoting the ultrasound series at instant t as y(t), where y may be a vector, the time series model can be described as y(t) = f(t−1, t−2, ···, t−l), where f(·) is a modeling function. Once the model is constructed, it can be used for further analysis, e.g., forecasting, control, and diagnosis. In the past few decades, ANNs have attracted more and more attention, since they provide a powerful alternative for modeling complex dynamical systems. The ANN time series model can be described as NN(t−1, t−2, ···, t−l), where NN(·) stands for a neural network modeler.

In this study, an ANN model based condition monitoring system is proposed, as illustrated in Fig. 6. To use this system to identify the condition of a machine, it is necessary to record ultrasound data under different conditions, including normal and abnormal states. Then, using the measured ultrasound as input to the trained ANN models, we obtain the expected outputs; the differences between the actual ultrasound data and the expected outputs from the ANN models are calculated to form the residuals. The residuals may then be used to indicate the different conditions of the machine. Each case will have different residuals: the smaller the residual, the closer the data is to the state of the compared ANN model. For a complex condition monitoring system, a classifier will be used to distinguish different states.
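In sketch form, assuming each trained GSOANN is wrapped in an object exposing a predict(X) method (an illustrative convention, not an interface from the paper), the comparison of Fig. 6 amounts to:

import numpy as np

def residual(model, X, y):
    """Mean squared error between the measured series y and the model output."""
    return float(np.mean((y - model.predict(X)) ** 2))

def identify_condition(models, labels, X, y):
    """Pick the condition whose ANN model best explains the unknown data."""
    errors = [residual(m, X, y) for m in models]
    return labels[int(np.argmin(errors))]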
B. Case study: Condition Monitoring in Grinding

In this study, we use the GSOANN to build time series models of ultrasound in grinding in order to identify the condition of a machine. Fig. 7 shows the measurement of the sound signal from a grinding machine. The sound is recorded by an ultrasonic detector placed near the motor. The data was recorded at a 44100 Hz sampling rate with 16 bits per sample, then saved and transferred to a computer for further analysis. Meanwhile, we also measured at different positions; because of the differences in structure, the ultrasound differs between positions.

We have recorded two types of ultrasound data: one from the normal condition and one from an abnormal condition. We first model these two ultrasound data sets using GSOANN, and then investigate whether the GSOANN models can be used to distinguish healthy/faulty conditions from unknown ultrasound. Figs. 8 and 9 show the ultrasound samples of the abnormal and normal conditions, respectively.


[Fig. 7. The measurement of ultrasound data from the grinding machine.]

[Fig. 8. Ultrasound data sample from a machine in abnormal condition (amplitude vs. time, 0-8 s).]

[Fig. 9. Ultrasound data sample from a machine in normal condition (amplitude vs. time, 0-9 s).]

The total lengths of the sound samples from the normal and abnormal conditions are 9.6632 and 8.1422 seconds, respectively. We use the first 5 seconds of data from these samples as the ANN training sets to construct the ultrasound models.

To build a model of the ultrasound data, a 3-layer feed-forward neural network with 4 hidden neurons is employed. The training algorithm we chose is the GSO-based training algorithm of Section III. To reduce the complexity of the ANN, we select only six past data points as the inputs: x(t − 0.01), x(t − 0.02), x(t − 0.03), x(t − 0.04), x(t − 0.05), x(t − 0.06); the output is x(t). The input and output are normalized so that they fall in the interval [−1, 1]. The number of epochs for training is set to 150.
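Read literally, and assuming the 0.01 offsets denote seconds at the 44100 Hz rate (the paper does not spell this out), the training set can be assembled as:

import numpy as np

FS = 44100                      # sampling rate (Hz)
LAG = int(0.01 * FS)            # 441 samples between successive inputs

def make_training_set(x, n_lags=6):
    """Rows: [x(t-0.01), ..., x(t-0.06)] -> target x(t)."""
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0   # scale to [-1, 1]
    start = n_lags * LAG
    X = np.column_stack([x[start - j * LAG: len(x) - j * LAG]
                         for j in range(1, n_lags + 1)])
    y = x[start:]
    return X, y

X_train, y_train = make_training_set(np.sin(np.linspace(0, 50, 5 * FS)))
print(X_train.shape, y_train.shape)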
Once the models are constructed, we can test the condition of the machine using unknown ultrasound data. We select an ultrasound recording of 3.1422 seconds, recorded from the machine in the abnormal condition, for testing. The data is input into the two constructed models, and the errors between the model outputs and the actual ultrasound are calculated to form the residues. We select an ultrasound fragment of 0.03 seconds to illustrate the details of the GSOANN model output in comparison with the actual data, as shown in Fig. 10. The error rate between the output of the GSOANN model of the abnormal condition and the actual ultrasound data is 0.0024%. The output of the GSOANN model of the normal condition is shown in Fig. 11; it can be seen that the difference between the model output and the actual ultrasound is large. The error rate calculated from this GSOANN output is 0.008%, almost 4 times larger than the error rate obtained with the GSOANN model of the abnormal condition.

[Fig. 10. Testing results from the ANN ultrasound model of the machine in abnormal condition in comparison with the actual unseen ultrasound data. Ultrasound fragment taken from 0.8-0.85 second.]

V. CONCLUSION

In this study, we proposed a novel ANN trained by the SI algorithm GSO (GSOANN) for machine condition monitoring.


We have successfully constructed GSOANN models of ultrasound data to monitor machine conditions in grinding. Based on the case study, it is found that the proposed method is capable of approximating the ultrasound data of different conditions. Preliminary experiments show that these models can be used to indicate the condition from recorded ultrasound. However, it would be more helpful to extend the classification to other conditions, such as lubrication conditions. Therefore, to assure and widen the applicability of the proposed method, more experiments are required.

[Fig. 11. Testing results from the ANN ultrasound model of the machine in normal condition in comparison with the actual unseen ultrasound data. Ultrasound fragment taken from 0.18-0.2 second.]

REFERENCES

[1] X. Li, Y. Yao, and Z. Yuan. On-line tool condition monitoring using wavelet fuzzy neural network. Journal of Intelligent Manufacturing, 8:271-276, 1997.
[2] X. Li, Y. Ou, X. P. Guan, and R. Du. Ram velocity control in plastic injection molding machines with higher order iterative learning. Control and Intelligent Systems, 34:64-71, 2006.
[3] X. Li, R. Du, B. Denkena, and J. Imiela. Tool breakage monitoring using motor current signals for machine tools with linear motors. IEEE Trans. Industrial Electronics, 52:1403-1409, 2005.
[4] X. Li and R. Du. Condition monitoring using latent process model with an application to sheet metal stamping processes. ASME Trans. J. Manuf. Sci. Engg., 127:376-385, 2005.
[5] X. Li. Detection of tool flute breakage in end milling using feed-motor current signatures. IEEE/ASME Transactions on Mechatronics, 6:376-385, 2001.
[6] S. He, Q. H. Wu, and J. R. Saunders. A novel group search optimizer inspired by animal behavioural ecology. In 2006 IEEE Congress on Evolutionary Computation (CEC 2006), pages Tue PM-10-6, Sheraton Vancouver Wall Centre, Vancouver, BC, Canada, July 2006.
[7] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, New Jersey, USA, 1999.
[8] X. Yao. Evolving artificial neural networks. Proceedings of the IEEE, 87(9):1423-1447, Sep. 1999.
[9] F. H. F. Leung, H. K. Lam, S. H. Ling, and P. K. S. Tam. Tuning of the structure and parameters of a neural network using an improved genetic algorithm. IEEE Trans. on Neural Networks, 14(1):79-88, Jan. 2003.
[10] P. P. Palmes, T. Hayasaka, and S. Usui. Mutation-based genetic neural network. IEEE Trans. on Neural Networks, 16(3):587-600, May 2005.
[11] H. A. Abbass. An evolutionary artificial neural networks approach for breast cancer diagnosis. Artificial Intelligence in Medicine, 25:265-281, 2002.
[12] E. Cantu-Paz and C. Kamath. An empirical comparison of combinations of evolutionary algorithms and neural networks for classification problems. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35(5):915-927, 2005.
[13] G. Beni and J. Wang. Swarm intelligence. In Seventh Annual Meeting of the Robotics Society of Japan, pages 425-428, Tokyo, Japan, 1989. RSJ Press.
[14] Wikipedia. Swarm intelligence — Wikipedia, the free encyclopedia, 2005. [Online; accessed 21-July-2006].
[15] S. He, J. Y. Wen, E. Prempain, Q. H. Wu, J. Fitch, and S. Mann. An improved particle swarm optimization for optimal power flow. In 2004 International Conference on Power System Technology, Nov. 2004.
[16] S. He, E. Prempain, and Q. H. Wu. An improved particle swarm optimizer for mechanical design optimization problems. Engineering Optimization, 36(5):585-605, Oct. 2004.
[17] J. Kennedy and R. C. Eberhart. Particle swarm optimization. In IEEE International Conference on Neural Networks, volume 4, pages 1942-1948. IEEE Press, 1995.
[18] C. Blum and K. Socha. Training feed-forward neural networks with ant colony optimization: An application to pattern classification. In HIS '05: Proceedings of the Fifth International Conference on Hybrid Intelligent Systems, pages 233-238, Washington, DC, USA, 2005. IEEE Computer Society.
[19] F. van den Bergh and A. Engelbrecht. Cooperative learning in neural networks using particle swarm optimizers. South African Computer Journal, 26:84-90, 2000.
[20] C. J. Barnard and R. M. Sibly. Producers and scroungers: a general model and its application to captive flocks of house sparrows. Animal Behaviour, 29:543-550, 1981.
[21] I. D. Couzin, J. Krause, N. R. Franks, and S. A. Levin. Effective leadership and decision-making in animal groups on the move. Nature, 434:513-516, Feb. 2005.
[22] J. W. Bell. Searching Behaviour: The Behavioural Ecology of Finding Resources. Chapman and Hall Animal Behaviour Series. Chapman and Hall, 1990.
[23] W. J. O'Brien, B. I. Evans, and G. L. Howick. A new view of the predation cycle of a planktivorous fish, white crappie (Pomoxis annularis). Can. J. Fish. Aquat. Sci., 43:1894-1899, 1986.
[24] D. G. C. Harper. Competitive foraging in mallards: 'ideal free' ducks. Animal Behaviour, 30:575-584, 1988.
[25] D. B. Dusenbery. Ranging strategies. Journal of Theoretical Biology, 136:309-316, 1989.
[26] G. M. Viswanathan, S. V. Buldyrev, S. Havlin, M. G. da Luz, E. Raposo, and H. E. Stanley. Optimizing the success of random searches. Nature, 401:911-914, 1999.
[27] A. F. G. Dixon. An experimental study of the searching behaviour of the predatory coccinellid beetle Adalia decempunctata. J. Anim. Ecol., 28:259-281, 1959.
[28] D. H. Wolpert. A mathematical theory of generalization. Complex Systems, 4(2):151-249, 1990.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 543--549
Copyright@2007 Watam Press

Inverse Learning Control of an Experimental Helicopter Using Adaptive Neuro-Fuzzy Inference System
Gwo-Ruey Yu and C. W. Tao
Department of Electrical Engineering, National Ilan University
Section 1, Shen-Lung Rd., Ilan City, Ilan County, Taiwan, 260
e-mail: cwtao@niu.edu.tw


Abstract: The inverse learning strategy is applied to control an experimental helicopter. The inverse dynamics of the helicopter is learned through adaptive neuro-fuzzy inference systems. The hybrid learning algorithm of the adaptive neuro-fuzzy inference systems includes the gradient descent and least squares methods. Both computer simulations and experimental results illustrate the effectiveness of the proposed design.

1 Introduction

Since helicopters are very important conveyances and have been extensively used in disaster rescue, helicopters with superior performance are needed. Helicopters exhibit high levels of agility and maneuverability, such as "climbing", "hovering" and "forward flight" [6]. With this high agility and maneuverability, the dynamics of a helicopter is unstable and nonlinear. A nonlinear dynamic model has been developed for a coaxial helicopter in hover condition [2]. In view of this, a more effective helicopter flight control strategy is developed in this paper to improve stability and performance.

L. A. Zadeh first proposed fuzzy set theory in [12]. A fuzzy inference system employing fuzzy if-then rules can control a plant with experts' knowledge. Fuzzy controllers have been widely used in industry because of their easy realization and robustness [10]. However, the rules and the membership functions of a fuzzy logic controller lack systematic design approaches.

A lot of work has been done on the construction of fuzzy control rules and the determination of parameters in membership functions. In reference [5], a hybrid fuzzy controller with two types of fuzzy control rules is presented to control an autonomous helicopter: the Mamdani-type control rules regulate the desired velocity and the TS-type control rules achieve the desired attitude angle. The "rules evolution tuning method" and "knowledge based adjustment" are devised for the fuzzy control of an unmanned helicopter in [1]. The literature [7] utilizes genetic algorithms to obtain appropriate parameter values in the fuzzy mechanism used for the control of a UH-1H helicopter.

Unlike the approaches in the above literature, the inverse learning technology is applied here to control a two-degrees-of-freedom helicopter through adaptive neuro-fuzzy inference systems. Adaptive neuro-fuzzy inference systems are fuzzy inference systems implemented in the framework of adaptive neural networks [3]. After a hybrid learning procedure, the adaptive neuro-fuzzy inference systems establish fuzzy if-then rules with trained membership functions that capture the inverse dynamics of the helicopter. The trained fuzzy inference systems are then utilized to produce control actions such that the desired attitudes of the helicopter are tracked.

This paper is organized as follows. In Section 2, the nonlinear dynamics of the experimental helicopter is introduced and its linearized state-space model is described. Section 3 illustrates the strategy of inverse learning control and the principles of adaptive neuro-fuzzy inference systems. The computer simulations of learning and flight are presented in Section 4. Section 5 describes the equipment setup and the experimental results. Conclusions are provided in Section 6.

2 Aerodynamic Equations of the Experimental Helicopter

The experimental helicopter used in this study is made by the Quanser Company in Canada [9]. The helicopter system consists of a helicopter plant on a fixed base, two power modules (Universal Power Module UPM2405 and UPM1503), a data acquisition card (MultiQ-PCI data acquisition card), and a terminal board (MultiQ-PCI Terminal Board). There are two propellers driven by DC motors (a front motor and a back motor) in the helicopter plant. The mathematical model of the two-degrees-of-freedom helicopter describes the propeller dynamics and the aerodynamic forces. The dynamic diagram of the experimental helicopter is shown in Figure 1.

The physical meanings of the parameters in the diagram are as follows. The helicopter mass center is represented by the symbol "Mc". The symbol "L" is the half-length of the helicopter fuselage. The revolving speed of the front DC motor is controlled via the input voltage Vp, which actuates the pitch propeller. A lifting force is generated by the revolving of the front propeller; thus the aerodynamic force acts normally on the helicopter fuselage at a distance Rp from the pitch axis. However, the revolution of the propeller also generates a load torque Tp on the rotating motor observed at the yaw axis (parallel axis theorem). Therefore, the rotation of the pitch propeller affects not only the motion with respect to the pitch axis, but also the motion relative to the yaw axis. Similarly, the back DC motor creates a force Fy on


the helicopter fuselage at a distance Ry from the yaw axis; there is also a torque Ty about the pitch axis.

[Fig 1. Dynamic diagram of the experimental helicopter.]

According to the above analysis of the aerodynamics, the equations of longitudinal and lateral motion of the experimental helicopter can be given as follows:

J_pp p̈ = R_p F_p − C_p(T_y) − F_G(p)    (2.1)
J_yy ÿ = R_y F_y − C_y(T_p)    (2.2)

where "p" is the pitch angle relative to the horizontal axis and "y" is the yaw angle. J_pp and J_yy are the moments of inertia of the fuselage about the pitch and yaw axes, respectively. R_p is the horizontal distance between the center of mass and the pivot point, and R_y is the vertical distance between the center of mass and the pivot point. F_p is the aerodynamic force created by the front propeller and is a function of the input voltage V_p; F_y is the aerodynamic force created by the back propeller and is a function of the input voltage V_y. T_p and T_y are the torques at the propeller axes and are also functions of V_p and V_y, respectively. C_p and C_y are the nonlinear coupling functions. F_G represents the effect of the gravity force on the angular momentum.

The nonlinear dynamic equations (2.1) and (2.2) of the experimental helicopter can be linearized as a state-space model:

\begin{bmatrix} \dot{p} \\ \dot{y} \\ \ddot{p} \\ \ddot{y} \end{bmatrix} =
\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} p \\ y \\ \dot{p} \\ \dot{y} \end{bmatrix} +
\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ L K_{ff}/J_{pp} & L K_{tb}/J_{pp} \\ L K_{tf}/J_{yy} & L K_{fb}/J_{yy} \end{bmatrix}
\begin{bmatrix} V_p \\ V_y \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ G_d \\ 0 \end{bmatrix}    (2.3)

where V_p is the pitch motor voltage and V_y is the yaw motor voltage. The symbols K_ff and K_tf are the constants of the front motor, and K_tb and K_fb are the constants of the back motor; these parameter values are listed in Table 1. The symbol G_d denotes the gravitational disturbance constant. Note that a positive pitch voltage results not only in a pitch but also in a negative yaw.

Table 1: Parameters of the experimental helicopter

Symbol | Value  | Unit
Jpp    | 0.0307 | kg-m^2
Jyy    | 0.0307 | kg-m^2
Kff    | 0.8722 | Newton/Volt
Ktf    | 0.0200 | Newton-m/Volt
Kfb    | 0.4214 | Newton/Volt
Ktb    | 0.0100 | Newton-m/Volt
L      | 0.4064 | m

To reduce the effect of the gravitational disturbance constant G_d, integrators have to be included in the loop. Two new states α and ζ are defined as the integrals of the pitch and yaw angles, respectively. The new state-space model is derived as follows:

\begin{bmatrix} \dot{p} \\ \dot{y} \\ \ddot{p} \\ \ddot{y} \\ \dot{\alpha} \\ \dot{\zeta} \end{bmatrix} =
\begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} p \\ y \\ \dot{p} \\ \dot{y} \\ \alpha \\ \zeta \end{bmatrix} +
\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ K_{pp} & K_{py} \\ K_{yp} & K_{yy} \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} V_p \\ V_y \end{bmatrix}    (2.4)

In the next section, this augmented model is used for the computer simulation. The training data of the adaptive neuro-fuzzy inference systems will be generated from equation (2.4).
3 Inverse Learning Control Using Adaptive Neuro-fuzzy Inference Systems

The strategy of inverse learning control is first to find the inverse dynamics of the experimental helicopter [8]. Then the inverse dynamics is implemented in the feed-forward


loop to produce the desired control signal. The discrete form of equation (2.4) can be written as

x(n+1) = A x(n) + B u(n)    (3.1)

where x is the 6 × 1 state vector and u is the 2 × 1 control signal. Iterating equation (3.1), the discrete state equation at the mth step is

x(n+m) = A^m x(n) + S u(n)    (3.2)

where S = [B  AB  ···  A^(m−1)B] is the controllability matrix. Since the experimental helicopter is controllable, the controllability matrix S is nonsingular. Thus the inverse mapping of the experimental helicopter exists and can be expressed as an explicit function

u = F(x(n), x(n+m))    (3.3)
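As a quick numerical check of this construction (the A and B below are placeholder values of the right shapes, not the Quanser parameters; with two inputs and m = 3, S is square 6 × 6):

import numpy as np

A = np.eye(6)
A[0, 2] = A[1, 3] = 0.01        # angle rows integrate the rates
A[4, 0] = A[5, 1] = 0.01        # integrator states
B = np.zeros((6, 2))
B[2] = [0.5, 0.1]               # pitch acceleration inputs (illustrative)
B[3] = [0.05, 0.4]              # yaw acceleration inputs (illustrative)

m = 3
S = np.hstack([np.linalg.matrix_power(A, j) @ B for j in range(m)])
print(S.shape, np.linalg.matrix_rank(S))  # (6, 6), rank 6: S is invertible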
It is not easy to solve equation (3.3) explicitly. On the where P Ai and P Bi are the membership functions of fuzzy
other hand, the adaptive neuro-fuzzy inference systems could
be treated as a universal approximator by the sets Ai and Bi , respectively.
Stone-Weierstrass theorem [4]. That is, the zero-order
Sugeno model possesses the characteristic of approximating Let the coefficient of hi in Eq. (3.4) be w i , i.e.,
any nonlinear function arbitrarily. Therefore, the adaptive
neuro-fuzzy inference systems are utilized to learn the inverse wi
mapping F. Fig. 2 shows the block diagram of inversing wi n
.
learning control of the experimental helicopter using adaptive ¦ wi
neuro-fuzzy inference systems (ANFIS). The principles of i 1
the adaptive neuro-fuzzy inference systems are introduced in
the following. Then, from the first-order Sugeno model, the output h could
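As a quick illustration of the inverse-mapping argument around equations (3.1)-(3.3), the following Python sketch builds the matrix S = [B AB A²B] for the augmented 6-state, 2-input model and checks that it is square and nonsingular; the gain values and the crude zero-order-hold discretization are assumptions for illustration, not the paper's identified parameters.

```python
import numpy as np

# Augmented 6-state model of equation (2.4); the gains K are
# hypothetical placeholders, not the identified plant parameters.
Kpp, Kpy, Kyp, Kyy = 1.0, 0.2, 0.3, 1.1
A = np.zeros((6, 6))
A[0, 2] = A[1, 3] = 1.0   # pitch/yaw rates feed the angles
A[4, 0] = A[5, 1] = 1.0   # integrator states alpha, zeta
B = np.zeros((6, 2))
B[2] = [Kpp, Kpy]
B[3] = [Kyp, Kyy]

# Crude discretization (assumption): Ad ~ I + A*dt, Bd ~ B*dt.
dt = 0.01
Ad = np.eye(6) + A * dt
Bd = B * dt

# S = [B  AB  A^(m-1)B]; with 6 states and 2 inputs, m = 3 makes
# S square (6 x 6), as the inverse mapping of eq. (3.3) requires.
m = 3
S = np.hstack([np.linalg.matrix_power(Ad, k) @ Bd for k in range(m)])
print("S is nonsingular:", np.linalg.matrix_rank(S) == 6)
```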
be written as
[Fig 2. Block diagram of inverse learning control: the reference commands rp and ry enter the ANFIS controller, which outputs the motor voltages Vp and Vy to the helicopter; the resulting pitch and yaw responses are compared with the commands, and the errors ep and ey drive a second ANFIS block for learning.]
Fundamentally, the adaptive neuro-fuzzy inference systems implement fuzzy systems on adaptive neural networks and tune them by a combination of the back-propagation algorithm and the least-squares method, based on collections of input-output data. This allows the neuro-fuzzy systems to learn. To demonstrate the architecture of adaptive neuro-fuzzy inference systems, consider a first-order Sugeno fuzzy model. A fuzzy rule set contains fuzzy if-then rules with the following common format:

Rule i: If x is A_i and y is B_i, then h_i = p_i x + q_i y + r_i

where A_i and B_i represent linguistic labels for the input variables x and y, respectively, i = 1, ..., n. The symbols p_i, q_i and r_i are the parameters in the consequent of the fuzzy inference system, i = 1, ..., n.

Defuzzifying the fuzzy set by the centroid method, the output h can be derived as

h = (Σ_{i=1}^{n} w_i h_i) / (Σ_{i=1}^{n} w_i)   (3.4)

and

w_i = μ_{A_i}(x) μ_{B_i}(y)

where μ_{A_i} and μ_{B_i} are the membership functions of the fuzzy sets A_i and B_i, respectively. Let the coefficient of h_i in Eq. (3.4) be the normalized firing strength

w̄_i = w_i / Σ_{i=1}^{n} w_i.

Then, from the first-order Sugeno model, the output h can be written as

h = Σ_{i=1}^{n} w̄_i (p_i x + q_i y + r_i)   (3.5)

Fig 3 shows the structure of adaptive neuro-fuzzy inference systems. Layer 1 converts the inputs to membership grades: its inputs are the states of the system and its outputs are the membership values. Layer 2 performs the fuzzy "AND" to find the firing strength of each rule; its output is the product of all incoming signals. Layer 3 normalizes the firing strengths; its output is the ratio of the i-th rule's firing strength to the sum of all rules' firing strengths. Layer 4 calculates the output of each fuzzy rule, and Layer 5 computes the overall output as the summation of all incoming signals.

The adaptive neuro-fuzzy inference systems utilize a hybrid learning rule composed of a forward pass and a backward pass to tune the premise and consequent parameters. In the forward pass, signals go forward to layer 4 and the consequent parameters {p_i, q_i, r_i} are identified by least-squares estimation. In the backward pass, the error rates propagate backward and the premise parameters are updated by the gradient descent method. The adjustable premise parameters of the membership functions, P_p, are tuned so as to minimize the average error between the actual network output and the desired target over the vectors in a training set. The error function E is defined as the sum of squared errors to be minimized over the training set:

E(P_p) = (1/2) Σ_{l=1}^{L} (d_l − h_l)²   (3.6)

where d_l is the desired target and h_l is the network output. The premise parameters are updated as follows:

ΔP_p = −ρ ∂E/∂P_p   (3.7)

where ρ is the learning rate.
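To make the Sugeno computation of equations (3.4)-(3.5) concrete, here is a minimal Python sketch of the five-layer forward pass with generalized bell membership functions; the rule count and all parameter values are illustrative assumptions, not the trained values from this paper.

```python
import numpy as np

def bell_mf(x, a, b, c):
    """Generalized bell membership function 1 / (1 + |(x-c)/a|^(2b))."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def anfis_forward(x, y, premise, consequent):
    """Five-layer first-order Sugeno forward pass (eqs. (3.4)-(3.5)).

    premise:    per rule, ((a,b,c) for A_i, (a,b,c) for B_i)
    consequent: per rule, (p_i, q_i, r_i)
    """
    # Layer 1: membership grades; Layer 2: firing strengths w_i = mu_A * mu_B
    w = np.array([bell_mf(x, *pa) * bell_mf(y, *pb) for pa, pb in premise])
    # Layer 3: normalized firing strengths w_bar_i
    w_bar = w / w.sum()
    # Layer 4: rule outputs h_i = p_i*x + q_i*y + r_i
    h_i = np.array([p * x + q * y + r for p, q, r in consequent])
    # Layer 5: overall output h = sum_i w_bar_i * h_i
    return float(np.dot(w_bar, h_i))

# Illustrative two-rule example (all parameters are assumptions):
premise = [((10, 2, 15), (50, 2, 60)), ((10, 2, 45), (50, 2, 190))]
consequent = [(0.05, 0.01, 0.3), (0.02, 0.03, -0.1)]
print(anfis_forward(30.0, 120.0, premise, consequent))
```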
[Fig 3. Architecture of adaptive neuro-fuzzy inference systems: five layers with membership nodes A_1...A_n and B_1...B_n, product nodes computing the firing strengths w_1...w_n, normalization nodes, rule-output nodes, and a summation node producing the overall output h.]

[Fig 4. (a) Initial membership functions of pitch angle: five bell-shaped functions in1mf1-in1mf5 over the input range [0, 60].]
4 Computer Simulations

The adaptive neuro-fuzzy inference systems are trained to find the inverse dynamics of the experimental helicopter. The input data are the pitch angle and the yaw angle; the output data are the voltage Vp of the front motor and the voltage Vy of the back motor, respectively. Let the universe of each input variable of the fuzzy mechanism be partitioned into five fuzzy sets. The architecture of the adaptive neuro-fuzzy inference systems thus contains 25 rules, with 5 bell-shaped membership functions assigned to each input variable. A total of 1000 training data are sampled uniformly from the input range [0, 60] × [0, 250]. Fig 4(a) shows the initial membership functions of the pitch angle, and the trained membership functions of the pitch angle are shown in Fig 4(b). The bell-shaped functions in Fig 4(c) are the initial membership functions of the yaw angle, and Fig 4(d) shows the trained membership functions of the yaw angle. To illustrate the learning performance, the training data and the neuro-fuzzy inference system outputs are plotted in Fig 5; the final result is a good fit to the original data.

[Fig 4. (b) Trained membership functions of pitch angle.]

[Fig 4. (c) Initial membership functions of yaw angle: five bell-shaped functions in2mf1-in2mf5 over the input range [0, 250].]

[Fig 4. (d) Trained membership functions of yaw angle.]
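The uniform sampling described above can be sketched as follows; this is a hypothetical reconstruction of the data-generation step (the true targets come from simulating the plant of equation (2.4), which is not reproduced here), with a placeholder inverse-plant function standing in for the real one.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 input pairs sampled uniformly over [0, 60] x [0, 250],
# matching the pitch/yaw input ranges stated in the text.
pitch = rng.uniform(0.0, 60.0, size=1000)
yaw = rng.uniform(0.0, 250.0, size=1000)

def inverse_plant(p, y):
    """Placeholder for the inverse dynamics (Vp, Vy) = F(p, y).
    In the paper these targets are generated from equation (2.4);
    the linear map here is purely an assumption for illustration."""
    return 0.05 * p + 0.002 * y, 0.01 * p + 0.015 * y

Vp, Vy = inverse_plant(pitch, yaw)
training_set = np.column_stack([pitch, yaw, Vp, Vy])
print(training_set.shape)  # (1000, 4): two inputs, two targets
```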


[Fig 5. (a) Fitness of input voltage Vp. (b) Fitness of input voltage Vy.]

After training, the adaptive neuro-fuzzy inference systems are applied to control the helicopter. Let the command be two step signals: 60 degrees for the pitch angle and 45 degrees for the yaw angle. The simulation results for the attitude response and the input voltages are shown in Fig 6. The neuro-fuzzy controller evidently tracks the command signals.

[Fig 6. (a) Response of pitch angle. (b) Input voltage Vp. (c) Response of yaw angle. (d) Input voltage Vy.]

5 Experimental results

A two-degree-of-freedom helicopter is used as the platform to demonstrate the effectiveness of the adaptive neuro-fuzzy inference systems. The experimental helicopter plant mounted on a fixed base is shown in Fig 7. Encoders measure the longitudinal and lateral motion, and electrical signals are transmitted through the slip ring on the base. Two power modules supply the motor voltages in the system: the UPM2405 is used for the pitch motor and the UPM1503 for the yaw motor. The maximum input voltage is limited to 5 volts.

The MultiQ-PCI data acquisition card installed in the computer supports 48 I/O channels. The MultiQ-PCI terminal board supports 4 analog outputs, 16 analog inputs, 6 encoder inputs and 48 digital I/O channels. The MATLAB Simulink model shown in Fig 8 is used as the interface for operation. Digital command signals are transmitted to the MultiQ-PCI terminal board via the MultiQ-PCI data acquisition card. These signals are converted into two kinds of signals (analog and encoder): the analog signals are transmitted to the Universal Power Module, and the encoder signals are transmitted to the helicopter plant. Applying the trained neuro-fuzzy inference systems in the Simulink window, the experimental results are shown in Fig 9. The attitude of the experimental helicopter approaches the reference command through inverse learning control.

[Fig 7. The experimental helicopter.]

[Fig 8. Simulink interface.]

[Fig 9. (a) Attitude of pitch angle. (b) Voltage signal of front motor. (c) Attitude of yaw angle. (d) Voltage signal of back motor.]


6 Conclusion

The adaptive neuro-fuzzy inference systems are trained to learn the inverse dynamics of an experimental helicopter. The membership functions have been tuned to fit the training data, and the mapping between the attitude of the helicopter and the motor voltages is established by the fuzzy inference systems. The trained fuzzy systems serve as the flight controller of the experimental helicopter. Computer simulations and experimental results demonstrate that the pitch and yaw angle responses can track the desired signals.

Acknowledgement

This research was sponsored by the National Science Council of the Republic of China, under Grant NSC 95-2221-E-197-008-.

References

[1] C. Calvalcante, J. Cardoso, J. G. Ramos and O. R. Neves, "Design and Tuning of a Helicopter Fuzzy Controller," IEEE Int. Conference on Fuzzy Systems, pp. 1549-1554, 1995.
[2] A. Dzul, T. Hamel and R. Lozano, "Modeling and nonlinear control for a coaxial helicopter", Proceedings of the 2002 IEEE International Conference on Systems, Man and Cybernetics, Vol. 6, 6-9 Oct. 2002.
[3] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system", IEEE Trans. on Sys., Man and Cyber., Vol. 23, No. 3, pp. 665-684, 1993.
[4] J.-S. R. Jang, C.-T. Sun and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, Prentice-Hall, Inc., 1997.
[5] B. Kadmiry and D. Driankov, "Autonomous Helicopter Control Using Linguistic and Model-Based Fuzzy Control," Proceedings of the 2001 IEEE International Symposium on Intelligent Control, Mexico City, September 5-7, pp. 348-352, 2001.
[6] D. McLean, Automatic Flight Control System, Prentice Hall, 1990.
[7] C. Phillips, C. Karr and G. Walker, "Helicopter Flight Control with Fuzzy Logic and Genetic Algorithms," Engineering Applications of Artificial Intelligence, Vol. 9, pp. 175-184, 1996.
[8] D. Psaltis, A. Sideris and A. Yamamura, "A multilayered neural network controller", IEEE Control Systems Magazine, Vol. 8, No. 4, pp. 17-21, 1988.
[9] Quanser Consulting Inc, User Manual for 2DOF Helicopter, 2000.
[10] K. Tanaka and M. Sugeno, "Stability Analysis and Design of Fuzzy Control Systems," Fuzzy Sets and Systems, Vol. 45, No. 2, pp. 135-156, 1992.
[11] L.-X. Wang, A Course in Fuzzy Systems and Control, Prentice-Hall Inc, 1997.
[12] J. Yen and R. Langari, Fuzzy Logic: Intelligence, Control, and Information, Prentice-Hall, New Jersey, 1999.
[13] J. Zhang, J. Chen, C. C. Ko, B. M. Chen and S. S. Ge, "A Web-Based Laboratory on Control of a Two-Degree-of-Freedom Helicopter", Proceedings of the 40th IEEE Conference on Decision and Control, Vol. 3, pp. 2821-2826, December 2001.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 550--558
Copyright@2007 Watam Press

Kinematic Control of a 6-DOF Robot Manipulator using Kohonen


Self-Organizing Map (SOM)
Anjan Kumar Ray ∗ , Laxmidhar Behera † and Amit Shukla ‡
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
208 016, UP, INDIA
∗ akray@iitk.ac.in † lbehera@iitk.ac.in ‡ shuklaam@iitk.ac.in

Abstract— Redundancy resolution is a prime goal for a robot manipulator whose joint space has a higher dimension than its task space. In this work, we present three schemes for this redundancy resolution based on hybrid visual motor coordination (VMC) for a 6-dof robot manipulator. The first proposed scheme is based on the rotation of the manipulator's coordinate frames and uses an extended Kohonen Self-Organizing Map (EKSOM) to find the mapping from the 3-dimensional positional task space to the 6-dimensional joint space of the manipulator while maintaining its desired orientation. The next two schemes use semi-joint space and full-joint space clustering during the training through EKSOM. The EKSOM is modified to confine the joint angles within a specified range. The visual feedback is obtained through a pair of calibrated cameras; here, we assume preprocessing of the camera data. So, given the positional data corresponding to camera coordinates, the modified EKSOM has been trained to obtain the input-output mapping by combining the visual feedback and a hybrid system model. These schemes can be used for trajectory tracking in the workspace of the manipulator as well as for maintaining a required orientation or joint movement. The schemes are successfully implemented on a 6-dof IRB-140 manipulator.

I. INTRODUCTION

Visual Motor Coordination (VMC), in the context of robotics, is the process of using visual feedback from a camera system to control a robot manipulator. It is similar to the hand-eye coordination of human beings. The visual feedback provides the information about the desired location of a robot manipulator in its workspace, thus helping the end-effector reach a target point. In order to use the visual information, a priori knowledge of the camera model, robot kinematics and dynamics may be required. This demands exact knowledge of those models, which are quite uncertain in the field of robotics. However, the parameters of the camera model can be estimated through camera calibration techniques [1], [2]. Moreover, obtaining exact knowledge of the robot's inverse kinematics is also a complex process [3]. In the case of a redundant robot, the problem grows further, as there are multiple solutions for a given task. Redundancy resolution is a process to handle this multi-solution problem [4]. Several methods are available to solve the redundancy resolution problem, such as the Jacobian transpose, pseudo-inverse, damped least squares [3], configuration control [5], and successive-approximation-based techniques [6].

In the field of robotics, many aspects are complex to model from first principles. So, in those cases, learning and acquisition are suitable processes to build a robust system model. The main aim of the learning task is to formulate a mapping between the task space and the joint space. In this learning scheme the manipulator is trained to place its end-effector at a desired location by using visual data, similar to what we observe in human beings. This coordination is achieved through learning the mapping that exists between the camera output and the desired end-effector position. The adaptive capabilities of motion control in biological organisms are still highly superior to the capabilities of current robot systems. Therefore, various neural network models have been developed that apply biologically inspired control mechanisms to robot control tasks. Kuperstein's model [7], [8] is an early contribution to the application of topology-conserving maps to visual-motor coordination. Ritter, Martinetz and Schulten [9] improved on Kuperstein's model by considering a more general model based on Kohonen's self-organizing scheme. A further modification was made by Walter and Schulten [10], who used the visual feedback from the camera for fine tuning of the manipulator's joint space variables. Redundancy has been recognized as a major characteristic in performing tasks which need dexterity comparable to that of the human arm. Redundancy in the manipulator structure provides high dexterity and versatile performance for a given task due to the infinite number of joint motions which result in the same end-effector trajectory.

In the present work, a general framework for kinematic control is presented using visual feedback. A model of the 6-dof IRB-140 manipulator is considered to study the applicability of the proposed schemes. In the hybrid network approach, we take the partial system model and a neural network based on the Extended Kohonen Self-Organizing Map (EKSOM). The trained network is used for trajectory tracking in an obstacle-free workspace. The update of the neural network is planned so that the joint angles are always kept within the limits of [−π, π]. This limit can be changed for any joint whenever required, based on its physical limit, and can be included in the neural model as required during the training phase. The redundancy of the manipulator is resolved using the rotational elements of the system, which also provide the desired orientation.
In the present work, with this scheme we map an 8-dimensional input space to the 6-dimensional joint space during training. Among the 8 dimensions of the input space, 3 come from the position, and to each position we attach 5-dimensional random rotational elements during the training phase. The simulation results show that this method is capable of tracking a trajectory as well as maintaining a required orientation. The next two schemes use the joint angle information: during the training, the positional data points are generated by random selection of joint angles, and this information is combined with the positional data to form the clustered network.

In section II we present the forward kinematic model of the 6-dof IRB-140 manipulator. In section III, a modified EKSOM is presented, which is the main building block of the neural network training; the first training scheme, which enables the end-effector to maintain its position as well as orientation, is presented there. The input-output clustering of the network is described in section IV. The simulation results are presented in section V, followed by section VI, which gives the overall conclusion.

II. MODELLING OF SYSTEMS

The visual-motor coordination system for a robot manipulator consists of a pair of cameras and a robot manipulator. The objective of this method is to place the end-effector of the manipulator at a desired position using the information gained from the pair of cameras. Each target object position [X Y Z]^T is seen by each of the two cameras. In this paper, we are not concerned with image processing issues. We assume a visual preprocessing of the camera data that reduces the images on the two cameras to two-dimensional 'retinal' coordinates [u1 u2] and [u3 u4], respectively. Such preprocessing can be done easily by convolution and thresholding operations, provided that there is a single object with high contrast. In the present work we consider a 6-dof IRB-140 robot manipulator whose D-H parameters are as follows:

  i    α_{i−1}    a_{i−1}    d_i    θ_i
  1    0          0          0      θ1
  2    −90°       70         d2     θ2
  3    0          a2         d3     θ3
  4    −90°       a3         d4     θ4
  5    90°        0          0      θ5
  6    −90°       0          0      θ6

where the parameter values are a2 = 360, a3 = 0, d2 = 352, d3 = 0, d4 = 380. The position of the end-effector is specified by [X Y Z]^T and the orientation is given by the rotational matrix R. The end-effector position [xe ye ze]^T and orientation are related to the base [xb yb zb]^T of the manipulator by the following equation:

[xb; yb; zb] = R [xe; ye; ze] + [X; Y; Z]   (1)

where R is the rotational matrix

R = [r11 r12 r13; r21 r22 r23; r31 r32 r33]   (2)

The rotational matrix elements and position are found to be as follows:

r11 = c6(c5(c4(c1c2c3 − c1s2s3) + s1s4) + (−(c1c3s2) − c1c2s3)s5) − (−(c4s1) + (c1c2c3 − c1s2s3)s4)s6   (3)
r12 = −(c6(−(c4s1) + (c1c2c3 − c1s2s3)s4)) − (c5(c4(c1c2c3 − c1s2s3) + s1s4) + (−(c1c3s2) − c1c2s3)s5)s6   (4)
r13 = c5(−(c1c3s2) − c1c2s3) − (c4(c1c2c3 − c1s2s3) + s1s4)s5   (5)
r21 = c6(c5(c4(c2c3s1 − s1s2s3) − c1s4) + (−(c3s1s2) − c2s1s3)s5) − (c1c4 + (c2c3s1 − s1s2s3)s4)s6   (6)
r22 = −(c6(c1c4 + (c2c3s1 − s1s2s3)s4)) − (c5(c4(c2c3s1 − s1s2s3) − c1s4) + (−(c3s1s2) − c2s1s3)s5)s6   (7)
r23 = c5(−(c3s1s2) − c2s1s3) − (c4(c2c3s1 − s1s2s3) − c1s4)s5   (8)
r31 = c6(c4c5(−(c3s2) − c2s3) + (−(c2c3) + s2s3)s5) − (−(c3s2) − c2s3)s4s6   (9)
r32 = −(c6(−(c3s2) − c2s3)s4) − (c4c5(−(c3s2) − c2s3) + (−(c2c3) + s2s3)s5)s6   (10)
r33 = c5(−(c2c3) + s2s3) − c4(−(c3s2) − c2s3)s5   (11)
X = 70c1 + a2c1c2 − d2s1 − d3s1 + d4(−(c1c3s2) − c1c2s3) + a3(c1c2c3 − c1s2s3)   (12)
Y = c1d2 + c1d3 + 70s1 + a2c2s1 + d4(−(c3s1s2) − c2s1s3) + a3(c2c3s1 − s1s2s3)   (13)
Z = d1 − a2s2 + a3(−(c3s2) − c2s3) + d4(−(c2c3) + s2s3)   (14)

Here ci = cos θi, si = sin θi, cij = cos θi cos θj and sij = sin θi sin θj, where 0 < i < 7 and 0 < j < 7 are integers.
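As a sanity check on the position equations (12)-(14), a small Python sketch can evaluate the end-effector position for a given joint vector using the link values quoted above; d1 is not listed among the quoted parameters, so the value used here is a placeholder assumption.

```python
import numpy as np

# Link parameters quoted in the text; d1 does not appear in the list,
# so its value here is an assumption for illustration only.
a2, a3, d2, d3, d4 = 360.0, 0.0, 352.0, 0.0, 380.0
d1 = 352.0  # placeholder assumption

def end_effector_position(theta):
    """Evaluate equations (12)-(14) for joint angles theta[0..5] (rad);
    only the first three joints enter the position equations."""
    c, s = np.cos(theta), np.sin(theta)
    X = (70*c[0] + a2*c[0]*c[1] - d2*s[0] - d3*s[0]
         + d4*(-(c[0]*c[2]*s[1]) - c[0]*c[1]*s[2])
         + a3*(c[0]*c[1]*c[2] - c[0]*s[1]*s[2]))
    Y = (c[0]*d2 + c[0]*d3 + 70*s[0] + a2*c[1]*s[0]
         + d4*(-(c[2]*s[0]*s[1]) - c[1]*s[0]*s[2])
         + a3*(c[1]*c[2]*s[0] - s[0]*s[1]*s[2]))
    Z = (d1 - a2*s[1] + a3*(-(c[2]*s[1]) - c[1]*s[2])
         + d4*(-(c[1]*c[2]) + s[1]*s[2]))
    return np.array([X, Y, Z])

print(end_effector_position(np.zeros(6)))
```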
III. HYBRID APPROACH WITH ROTATIONAL ELEMENTS USING MODIFIED EKSOM

The learning algorithm in the present work is based on the extended Kohonen Self-Organizing Feature Map, which was introduced by Walter and Schulten in [10]. This method was presented for a 3-dof robot manipulator [11], [12]. From each positional datum [X Y Z]^T we can get a pair of camera coordinate data [u1 u2] and [u3 u4] from cameras 1 and 2 respectively. These form the visual target

u_target = [u1 u2 u3 u4]   (15)

For each training step, a target position u_target is presented at random and the corresponding θ(u_target) = (θ1 θ2 θ3) is found using a hybrid model. The transformation depends (i) on the geometry of all the robot links, (ii) on the relative position of the robot arm with respect to the camera system, and (iii) on the cameras' optical properties. The control law is adaptively represented by a 'winner-takes-all' scheme. Each neuron in the network has three fields, namely the 'weight vector' wr, the 'Jacobian matrix' Ar and the 'output vector' θr, where the subscript r stands for its position in the Kohonen lattice. The system produces the joint angles θ(u) = (θ1 θ2 θ3) by using θr and Ar, which are required for the first two terms of the Taylor series expansion of θ(u), i.e.,

θ(u) = θμ + Aμ(u − wμ)   (16)

where the subscript μ denotes the neuron responsible for the output.

The kinematic redundancy in robotics has two sources. One is the task definition in a space of lower dimensionality than the joint space, and the second is the robot's construction with more than six degrees-of-freedom (DOF). For the case of the 6-dof manipulator, we cannot simply use u_target = [u1 u2 u3 u4] for calculating the proper joint space. In this case we have redundant solutions, as the 3-dimensional positional task or 4-dimensional image plane task can be observed for various combinations of the 6-dimensional joint space. So, additional tasks have to be considered for resolving the redundancy. In the present work, we want to maintain a desired orientation along with the desired position of the end-effector. The orientation of the end-effector is defined by 3 angles, named roll γ, pitch β and yaw α, which the end-effector coordinate frame makes with the base frame of the manipulator [13]. The relationship between the orientation and the rotational matrix is found to be as follows:

[r11 r12 r13; r21 r22 r23; r31 r32 r33] = [cαcβ, −sαcγ + cαsβsγ, sαsγ + cαsβcγ; sαcβ, cαcγ + sαsβsγ, −cαsγ + sαsβcγ; −sβ, cβsγ, cβcγ]   (17)

where cα = cos α, sα = sin α, cβ = cos β, sβ = sin β, cγ = cos γ, sγ = sin γ. From the above relation, we find

r11 = cαcβ
r21 = sαcβ
r31 = −sβ   (18)
r32 = cβsγ
r33 = cβcγ

The inverse relation between the orientation and the selected rotational elements can be formulated as

β = Atan2(−r31, √(r11² + r21²))   (19)
α = Atan2(r21/cβ, r11/cβ)   (20)
γ = Atan2(r32/cβ, r33/cβ)   (21)

where Atan2(o, ∗) computes the angle in a 4-quadrant structure with tan⁻¹(o/∗).

In the present work, we use the information of the rotational elements instead of the orientation directly. The position is directly related to the first three joint angles. So, given a position and orientation, the first three joint angles will be used to find the exact position, whereas to maintain the proper orientation it is seen that the last three joint angles have rapid changes. So, instead of using direct orientation information, our approach is based on rotational elements, which results in smooth joint movements. The advantage of the hybrid model is that we can directly calculate the rotational matrix elements given a set of joint angles. So, during the process of training, the position can be obtained through the two cameras, whereas the orientation can be calculated using the system model with random joint angles. In this way, the erroneous attachment of orientations can be avoided, because with a chosen position not every random orientation is feasible within the physical limits. During the commissioning phase, however, the orientation will come from a valid requirement, so the rotational elements can be calculated with the help of equation (18). In this way a fusion of camera data and the system model is achieved, which is an important contribution to proper training: the choice of a random orientation associated with a specific position may not be feasible for a physical system, and this hybrid nature greatly reduces the chance of erroneous data during training. As a result, during the training process we use this rotation information along with the positional information. As mentioned earlier, we consider preprocessing of the camera data so that we can look deeply into the redundancy resolution process instead of image processing issues. So, in the coming discussion we will consider the directly available position [X Y Z]^T instead of its image coordinates u_target. The input space of the network is then formed as

ip_target = [X Y Z r11 r21 r31 r32 r33]^T   (22)

representing an end-effector position and orientation. The output space for the network is the 6-dimensional joint space

θ = [θ1 θ2 θ3 θ4 θ5 θ6]^T   (23)
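A brief Python sketch of the inverse relations (19)-(21), including the β = ±π/2 special case treated later in equations (42)-(43), follows; it is a straightforward transcription of those equations, not code from the paper.

```python
import math

def rpy_from_rotation(r11, r21, r31, r32, r33, r12=None, r22=None):
    """Recover (roll gamma, pitch beta, yaw alpha) from the selected
    rotational elements, following equations (19)-(21)."""
    beta = math.atan2(-r31, math.hypot(r11, r21))          # eq. (19)
    cb = math.cos(beta)
    if abs(cb) > 1e-9:
        alpha = math.atan2(r21 / cb, r11 / cb)             # eq. (20)
        gamma = math.atan2(r32 / cb, r33 / cb)             # eq. (21)
    else:
        # Degenerate case beta = +/- pi/2 (equations (42)-(43)):
        # alpha is set to 0 and gamma uses r12, r22.
        alpha = 0.0
        gamma = math.copysign(1.0, beta) * math.atan2(r12, r22)
    return gamma, beta, alpha
```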
With this selection of input-output space, the dimensions of the parameters of each neural unit change: the dimensions of wr, Ar and θr become 8 × 1, 6 × 8 and 6 × 1. The neuron μ whose reference position is closest to the target is declared the winner based on the Euclidean distance metric in the workspace:

||wμ − ip_target|| = min_ρ ||wρ − ip_target||   (24)

The arm is given a coarse movement θ0_out, which is the network output that moves the end-effector to a position v0 = [X0 Y0 Z0]^T with rotation [r11⁰ r21⁰ r31⁰ r32⁰ r33⁰]^T. Collectively we take it as

ip_0 = [X0 Y0 Z0 r11⁰ r21⁰ r31⁰ r32⁰ r33⁰]^T   (25)

This is followed by some fine movements, given by θ_n1_out, which bring the end-effector to ip_n1. In this process, we consider not only the effect of the winning neuron but also the effect of the neighbors of that winning neuron. The neighborhood is chosen by the equations

h_r1(r) = exp(−||r − μ||² / (2σ1²))   (26)
σ1 = σ1_initial (σ1_final / σ1_initial)^(t1/tmax)   (27)

where σ1_initial and σ1_final stand for the parameter σ1's initial and final values, t1 is the current iteration step and tmax is the maximum number of iterations in our training. The following equations depict the process of averaged output used to calculate θ0_out and θ_n1_out:

θ0_out = s⁻¹ Σ_r h_r1(r) [θr + Ar(ip_target − wr)]   (28)
θi_out = θ_{i−1}_out + s⁻¹ Σ_r h_r1(r) Ar (ip_target − ip_{i−1})   (29)
s = Σ_r h_r1(r)   (30)
θi_out = [θ1 θ2 θ3 θ4 θ5 θ6]_i^out,  i = 1, 2, ..., n1

It is observed that under these conditions θi_out may become unbounded. So, to keep θi_out within the range of ±2π the following corrective measures are taken:

if θj > 0:  θj = mod(θj, 2π)   (31)
if θj < 0:  θj = mod(θj, −2π),  j = 1, 2, ..., 6

where θj is an element of θi_out and the function mod(a, b) is the remainder of the division of a by b. This rule ensures that every element of θi_out stays within the range ±2π and also enforces smooth movement of the joint angles. A further modification restricts the range to within ±π as follows:

if θj > 0:  θj = θj for 0 ≤ θj ≤ π;  θj = θj − 2π for π < θj < 2π   (32)
if θj < 0:  θj = θj for 0 ≥ θj ≥ −π;  θj = θj + 2π for −π > θj > −2π

After completion of the movement stage, the neural units are adjusted by the following update rules:

wr ← wr + ε h_r2 (ip_target − wr)   (33)
θr ← θr + ε1 h_r3 Δθr
Ar ← Ar + ε1 h_r3 ΔAr

The update rule for θr can be further corrected using equations (31) and (32), where θj is now an element of θr. ΔAr is found using a stochastic gradient descent approach to minimize the quadratic cost function

E = (1/2) ||Δθ_{0n1}_out − Ar Δip||²   (34)

and it is given by

ΔAr = ||Δip||⁻² (Δθ_{0n1}_out − Ar Δip) Δip^T   (35)

where n1 stands for the number of fine movements. The quantities Δip, Δθ_{0n1}_out and Δθr are found using the following rules:

Δip = ip_n1 − ip_0   (36)
Δθ_{0n1}_out = θ_n1_out − θ0_out   (37)
Δθr = θ0_out − θr − Ar(ip_0 − wr)   (38)

In this work, we consider the functions h_r2 and h_r3 to be Gaussian:

h_r2 = exp(−||r − μ||² / (2σ2²))   (39)
h_r3 = exp(−||r − μ||² / (2σ3²))   (40)

Also, the learning rate parameters ε, ε1 and the neighborhood width functions σ2, σ3 change during the training process according to the general rule

η = η_initial (η_final / η_initial)^(t1/tmax)   (41)

where η ∈ {ε, ε1, σ2, σ3}.
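The following Python sketch is a minimal reconstruction (not the authors' code) of the winner selection of equation (24), the neighborhood-averaged coarse output of equations (28) and (30), and the angle-wrapping rules of equations (31)-(32); the array shapes assume the rotational scheme's 8-dimensional input and 6-dimensional output.

```python
import numpy as np

def wrap_to_pi(theta):
    """Corrective measures of equations (31)-(32): reduce each joint
    angle modulo +/-2*pi, then fold the result into [-pi, pi]."""
    theta = np.where(theta > 0,
                     np.mod(theta, 2 * np.pi),
                     np.mod(theta, -2 * np.pi))          # eq. (31)
    theta = np.where(theta > np.pi, theta - 2 * np.pi, theta)   # eq. (32)
    theta = np.where(theta < -np.pi, theta + 2 * np.pi, theta)
    return theta

def coarse_output(ip_target, W, A, Theta, h1):
    """Neighborhood-averaged coarse movement of eqs. (28) and (30).
    W: (N, 8) weight vectors; A: (N, 6, 8) Jacobians; Theta: (N, 6)
    output vectors; h1: (N,) neighborhood weights h_r1 centered on
    the winner of eq. (24)."""
    s = h1.sum()                                          # eq. (30)
    contrib = Theta + np.einsum('nij,nj->ni', A, ip_target - W)
    theta0 = (h1[:, None] * contrib).sum(axis=0) / s      # eq. (28)
    return wrap_to_pi(theta0)

# Winner index from eq. (24):
# mu = np.argmin(np.linalg.norm(W - ip_target, axis=1))
```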
One corrective measure is required during the training of the network. If the required pitch β is ±π/2, then from equation (18) it is observed that the elements r11, r21, r32 and r33 become 0. In this special condition the roll γ, pitch β and yaw α are computed as

β = π/2,  α = 0,  γ = Atan2(r12, r22)   (42)
β = −π/2,  α = 0,  γ = −Atan2(r12, r22)   (43)

In the present work, the initial and final values of the parameters ε, ε1, σ2, σ3 are taken as {1.0, 0.05}, {0.9, 0.9}, {2.5, 0.01} and {2.5, 0.01} respectively.

IV. SEMI-JOINT SPACE AND FULL-JOINT SPACE CLUSTER

In visual-motor coordination, a topological map is formed between the input space and the output space using a 3D neural lattice. Each neuron acts as a receptive point of the input. During the training phase, random θ = [θ1 θ2 θ3 θ4 θ5 θ6]^T are generated within the workspace of the manipulator. For each set of joint movements, the end-effector of the manipulator reaches a particular workspace point [X Y Z]^T. With the neural network training, a topological organization of the input space is formed which directly produces the inverse kinematics of the manipulator. So, after training, the network is capable of finding the joint space θ given the task space ip_target. Among the elements of θ, [θ1 θ2 θ3]^T resolve the required positional task [X Y Z]^T. The extra 3-dof can be resolved by using the available joint space information. We can utilize this joint space information along with the positional information in two ways, called semi-joint space clustering and joint space clustering.

In semi-joint space clustering, we use the available information of the last three joint angles during the training, as the first three joint angles are directly related to the positional task (eqns (12)-(14)); the information of the first three joint angles is therefore redundant. The actual input and output spaces are defined by

ip_target = [X Y Z θ4 θ5 θ6]^T   (44)
θ = [θ1 θ2 θ3 θ4 θ5 θ6]^T   (45)

Each neuron of the 3-dimensional neural network is now associated with a weight vector wr of dimension 6 × 1, a Jacobian matrix Ar of dimension 6 × 6 and an output vector θr of dimension 6 × 1. Hence for each point in the workspace, the neurons are receptive to both the Cartesian space and the semi-joint space simultaneously. The network training follows the same steps as given by eqns (24) through (41), except for a change in eqn (25), where ip_0 is given by

ip_0 = [X0 Y0 Z0 θ4⁰ θ5⁰ θ6⁰]^T   (46)

where the end-effector position is v0 = [X0 Y0 Z0]^T and the last three joint angles are [θ4⁰ θ5⁰ θ6⁰]^T.

In joint space clustering, we use all the available information to train the network. In this scheme, we club together the positional input space with the output space, i.e. the joint space. So, here the actual input and output spaces are defined by

ip_target = [X Y Z θ1 θ2 θ3 θ4 θ5 θ6]^T   (47)
θ = [θ1 θ2 θ3 θ4 θ5 θ6]^T   (48)

As all the input-output information is available, this method facilitates the training. The associated parameters of each neuron are the weight vector wr with dimensionality 9 × 1, the Jacobian matrix Ar with dimensionality 6 × 3 and the output vector θr with dimensionality 6 × 1. During the training, the collective output of the network is calculated using eqn (28) and eqn (29), where only the first three elements of ip_target, ip_{i−1} and wr are used. In this way, this method reduces the dimensionality of the Jacobian matrix Ar to 6 × 3 and maps the output space θ directly from the positional input space [X Y Z]^T. In this case, ip_0 is given by

ip_0 = [X0 Y0 Z0 θ1⁰ θ2⁰ θ3⁰ θ4⁰ θ5⁰ θ6⁰]^T   (49)

where the end-effector position is v0 = [X0 Y0 Z0]^T and the joint space variables are [θ1⁰ θ2⁰ θ3⁰ θ4⁰ θ5⁰ θ6⁰]^T. During the testing phase, the winner neuron is selected based on the 3-dimensional positional task space [X Y Z]^T. In the following section V, the simulation results are presented in support of these schemes.

V. SIMULATION RESULTS

In the simulation, the initial and final values of σ1 are taken as 1 and 0.1 respectively. The 3-D neural network with 8 × 8 × 8 neurons has been trained with 100000 data points randomly spread over the workspace of 200 × 200 × 200 mm. During the training phase 2 visual corrections are used for fine tuning. The visual correction is not required during the testing phase if the network uses a hybrid model: in that case, the mapped values of the joint angles from the task space produce the corresponding positional and rotational information through the hybrid network, which saves a lot of computational effort in image processing. It has been observed that by normalizing the input space within the range ±1, the errors are greatly reduced and the possibility of smooth joint movements is increased; this policy is adopted during the training of the network. Figs. 1, 2 and 3 show the elements of the weight vector wr of the network trained with the first scheme. It is clear from the figures that the normalized weights' values are very small. The corresponding output vectors of the trained network are shown in Figs. 4 and 5: joint angles 1, 2 and 3 are shown in Fig. 4, and Fig. 5 shows the last three joint angles. The figures show that all the joint angles' values are within the range of ±π. The trained network has been tested to verify that it actually tracks a desired trajectory while maintaining the desired orientation. A data set for the testing has been generated using the system model; those data have then been fed to the network to observe its outcome. We observe a satisfactory result, as the output almost matches the input to the network. A circular trajectory tracking is shown in Fig. 6, and the corresponding joint angles of the manipulator are shown in Fig. 7 and Fig. 8. As shown in the figures, smooth joint movements are achieved for the desired tracking operation.
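To summarize the three input-space definitions (equations (22), (44) and (47)), here is a small Python sketch assembling each ip_target vector from a pose; the rotation-element argument is assumed to be computed beforehand from equations (3)-(11).

```python
import numpy as np

def build_ip_target(pos, theta, scheme, rot_elements=None):
    """Assemble the network input for the three clustering schemes.
    pos: [X, Y, Z]; theta: six joint angles; rot_elements:
    [r11, r21, r31, r32, r33] (assumed computed from eqs. (3)-(11))."""
    if scheme == "rotational":    # eq. (22): position + 5 rotation elements
        return np.concatenate([pos, rot_elements])
    if scheme == "semi-joint":    # eq. (44): position + last three angles
        return np.concatenate([pos, theta[3:]])
    if scheme == "full-joint":    # eq. (47): position + all six angles
        return np.concatenate([pos, theta])
    raise ValueError(scheme)
```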
[Fig. 1. Trained weight vector elements 1, 2 and 3 for the rotational scheme.]
[Fig. 2. Trained weight vector elements 4, 5 and 6 for the rotational scheme.]
[Fig. 3. Trained weight vector elements 6, 7 and 8 for the rotational scheme.]
[Fig. 4. Trained output vector elements θ123 for the rotational scheme.]
[Fig. 5. Trained output vector elements θ456 for the rotational scheme.]
[Fig. 6. Tracking of a circular path in the workspace for the rotational scheme.]
[Fig. 7. First three joint angles' motion during tracking for the rotational scheme.]
[Fig. 8. Last three joint angles' motion during tracking for the rotational scheme.]
[Fig. 9. Positional error in tracking (in mm) for the rotational scheme.]
[Fig. 10. Rotational elements' error during tracking for the rotational scheme.]
[Fig. 11. Orientation error during tracking (in rad) for the rotational scheme.]
[Fig. 12. Tracking of a circular path in the workspace for the semi-joint cluster.]
[Fig. 13. First three joint angles' motion during tracking for the semi-joint cluster.]
[Fig. 14. Last three joint angles' motion during tracking for the semi-joint cluster.]
[Fig. 15. Positional error in tracking (in mm) for the semi-joint cluster.]
[Fig. 16. Tracking of a circular path in the workspace for the joint cluster.]
[Fig. 17. First three joint angles' motion during tracking for the joint cluster.]
[Fig. 18. Last three joint angles' motion during tracking for the joint cluster.]
[Fig. 19. Positional error in tracking (in mm) for the joint cluster.]

The positional error of tracking is shown in Fig. 9, whereas the rotational and orientation errors are shown in Fig. 10 and Fig. 11. The rms errors associated with tracking and maintaining the orientation are tabulated below. Table I shows the errors associated with the positional tracking: the rms errors along the x-axis, y-axis and z-axis are 0.0257 mm, 0.8222 mm and 0.5984 mm respectively. The rotational rms errors are given in Table II and the orientation errors are tabulated in Table III. We observe that the errors are quite small; the maximum rms orientation error occurs for the yaw α and is 0.0628 rad. So, we may conclude that the present work satisfactorily minimizes the error while maintaining the desired position and orientation. The tracking with semi-joint space clustering is shown in Fig. 12, the associated joint movements are shown in Fig. 13 and Fig. 14, and the positional tracking error is shown in Fig. 15. The tracking with joint space clustering is shown in Fig. 16, the associated joint movements are shown in Fig. 17 and Fig. 18, and the positional tracking error is shown in Fig. 19.

TABLE I. POSITIONAL ERROR OF TRACKING FOR THE ROTATIONAL SCHEME
Position         X        Y        Z
rms error (mm)   0.0257   0.8222   0.5984

TABLE II. ROTATIONAL ERROR OF TRACKING FOR THE ROTATIONAL SCHEME
Rotation    r11      r21      r31      r32      r33
rms error   0.0791   0.0151   0.0018   0.0087   0.0138

TABLE III. ORIENTATION ERROR OF TRACKING FOR THE ROTATIONAL SCHEME
Orientation       Roll     Pitch    Yaw
rms error (rad)   0.0219   0.0123   0.0628

VI. CONCLUSION

In this work, a visual control scheme based on the rotational elements of a robot manipulator is proposed which uses an extended Kohonen Self-Organizing Map. This method has been successfully implemented on a 6-dof IRB-140 manipulator to achieve trajectory tracking in the workspace of the manipulator as well as the desired orientation.
The orientation information is transformed into rotational information, and the network is then trained along with the visual data as described in section III. The trained network has been tested on randomly generated positions and orientations, and the simulation results show satisfactory performance: both the position and the orientation are maintained up to tolerable errors. The errors can be further minimized by using a larger number of visual corrections, but that generally increases the computational burden of image processing and the operational time; in the present work, 2 visual corrections are used. For smoother joint movement, the joint angle limits are considered during the training, which restricts the movement of the joints to within the range of ±π. With this process, the inverse mapping of the joint space from the task space is achieved. The simulation results show that this process can be implemented on a 6-dof manipulator. The performance of VMC for redundancy resolution is also demonstrated through semi-joint space and joint space clustering; in these cases, we use the available information of the joint angles during the training, clubbed together with the positional task. This work
assumes an obstacle-free environment. The normalization of the input-output space produces more accurate results with bounded θ. Obstacle avoidance, which is another way to resolve redundancy, is not considered in the present work, and no image processing issues have been addressed.

REFERENCES

[1] R. Y. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE Journal of Robotics and Automation, vol. RA-3, no. 4, pp. 323-344, August 1987.
[2] B. K. P. Horn, "Tsai's camera calibration method revisited," Tech. Rep., 2000.
[3] S. R. Buss, "Introduction to inverse kinematics with Jacobian transpose, pseudoinverse and damped least squares methods," University of California, San Diego, Tech. Rep., April 2004.
[4] R. V. Patel and F. Shadpey, Control of Redundant Robot Manipulators. Springer, 2005.
[5] H. Seraji, "Configuration control of redundant manipulators: Theory and implementation," IEEE Trans. on Robotics and Automation, vol. 5, no. 4, pp. 472-490, August 1989.
[6] D. K. Goran S., Milan R., "Learning of inverse kinematics behavior of redundant robot," in Proceedings of the 1999 IEEE International Conference on Robotics and Automation, Detroit, Michigan, May 1999, pp. 3165-3170.
[7] M. Kuperstein, "Adaptive visual-motor coordination in multijoint robots using parallel architecture," in Proc. IEEE Int. Automat. Robotics, Raleigh, NC, 1987, pp. 1595-1602.
[8] M. Kuperstein, "Neural model of adaptive hand-eye coordination for single postures," Science, vol. 239, pp. 1308-1311, 1988.
[9] T. M. Martinetz, H. Ritter and K. Schulten, "Three dimensional neural network for learning visuomotor coordination of a robot arm," IEEE Trans. NN, vol. 1, no. 1, pp. 131-136, 1990.
[10] A. Walter and K. Schulten, "Implementation of self-organizing neural networks for visuo-motor coordination of an industrial robot," IEEE Trans. NN, vol. 4, no. 1, pp. 86-95, 1993.
[11] L. Behera and N. Kirubanandan, "A hybrid neural control scheme for visual-motor coordination," IEEE Control Systems Magazine, vol. 19, pp. 34-41, 1999.
[12] L. Behera and N. Kumar, "Visual-motor coordination using a quantum clustering based neural control scheme," Neural Processing Letters, vol. 20, pp. 11-22, 2004.
[13] J. J. Craig, Introduction to Robotics, 2nd ed. Pearson Education, 2003.
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 559--564
Copyright@2007 Watam Press

A WNN Based Kalman Filtering For Auto-Correction Of


SINS/Star Sensor Integrated Navigation System
Baiqi Liu, Jiancheng Fang, Lei Guo
School of Instrumentation Science and Optoelectronics Engineering, Beihang University, Beijing – 100083, China

Abstract: Large velocity errors and position errors result from initial misalignments in the SINS/Star sensor integrated navigation system, since SINS works independently, without the correction of the star sensor, before the ballistic missile flies out of the atmosphere. First, a wavelet neural network is designed to estimate the velocity errors and position errors of SINS due to initial misalignments. The errors estimated through the WNN can then be corrected in the SINS/Star sensor integrated navigation system. Simulations are carried out to show the efficiency of the presented method for reducing the velocity and position errors of the SINS/Star Sensor integrated navigation system.

Key words: Integrated navigation system, SINS, star sensor, auto-correction, wavelet neural network, initial misalignment.

1 Introduction

The Strapdown Inertial Navigation System (SINS) has the merits of small volume, light weight, low cost and medium accuracy. Particularly with the rapid development of the Ring Laser Gyro (RLG) and the Fiber Optic Gyro (FOG), SINS plays a more and more significant role in navigation fields [1]. However, the error accumulation of SINS with time retards further improvement of SINS precision. If SINS works independently, it is hard to meet the precision requirements of long-range or middle-range ballistic missiles. Therefore, SINS is often integrated with other aiding systems through various filtering methods. A star sensor can determine the attitude of the ballistic missile by celestial observation to enhance the accuracy of SINS [2]. The star sensor provides updates for the angle error derived from the stellar observation, which can correct not only the attitude error of SINS but also the gyro drift [3-5].

It is noted that SINS works independently inside the atmosphere, since the star sensor is interfered with by sunlight. When the ballistic missile flies out of the atmosphere region, the star sensor starts to search for and recognize the stars in the sky, and then the attitude of the ballistic missile can be calculated from the attitude matrix of the star sensor [6]. During the standalone operation of SINS, there are large velocity errors and position errors resulting from the initial misalignments, gyro drifts and the accelerometer bias, which cannot be corrected by Kalman filters or other nonlinear filters. As a result, it is necessary to correct these errors due to initial misalignments to improve the precision of the SINS/Star sensor integrated navigation system.

Up to now, few feasible approaches have been proposed, since the relationship between the velocity errors, position errors and the initial misalignments is nonlinear. How to estimate the velocity errors and position errors from the misalignment information is an open problem in the integrated navigation area. Neural networks are suitable for approximating such a nonlinear system. The wavelet neural network (WNN), combining the strong points of wavelet decompositions and feedforward networks, has become a popular tool for function approximation [7]. In this paper, a WNN is first designed to correct the velocity errors and position errors of the SINS/Star sensor integrated navigation system due to initial misalignments, and simulations are carried out to indicate the efficiency of the new method.

2 Traditional Integrated Method of SINS and Star Sensor

2.1 The Scheme of the SINS/Star Sensor integrated navigation system

The scheme of the SINS/Star Sensor integrated navigation system is shown in Fig 1. SINS determines the position, velocity and attitude (φg, θg and γg) of the ballistic missile according to the launch point inertial coordinate system (LPICS), which is described with the subscript g. The star sensor determines the attitude (φc^s, θc^s and γc^s) of the ballistic missile according to the geocentric inertial coordinate system (GCICS), which is described with the subscript c; the superscript s represents the star sensor. φc^s, θc^s and γc^s are transferred to the attitude in the launch point inertial coordinate system through the transfer matrix C_c^g, namely φg^s, θg^s and γg^s.


The star sensor attitudes φg^s, θg^s and γg^s are subtracted from φg, θg and γg respectively, and the results are denoted δφ, δθ and δγ. Then δφ, δθ and δγ are multiplied by the transfer matrix A, and the result is taken as the observation of the Kalman filter. The attitude errors and gyro drifts of SINS are estimated by the Kalman filter and then corrected in SINS.

[Fig 1. The basic scheme of the SINS/Star Sensor integrated navigation system: SINS (with FOG) outputs the position, velocity and attitude (φg, θg, γg) of the ballistic missile in the launch point inertial frame; the star sensor outputs (φc^s, θc^s, γc^s), which are transformed by C_c^g into (φg^s, θg^s, γg^s) and differenced with the SINS attitude; the differences (δφ, δθ, δγ) are mapped through the transfer matrix A to (φx, φy, φz) and fed to the Kalman filter, whose estimate X̂ corrects SINS.]

2.2 Mathematical Model of SINS/CNS

(1) State Equation of the System

The launch point inertial coordinate system is taken as the navigation reference frame, and the body frame of SINS coincides with the body frame of the ballistic missile. According to the error model of SINS, the state equation of the SINS/Star Sensor integrated navigation system in the launch point inertial frame is given as follows:

Ẋ(t) = F(t)X(t) + G(t)W(t)   (1)

where X(t) is the state vector of the SINS/Star Sensor integrated navigation system, F(t) is the state transition matrix, G(t) is the error distribution matrix, and W(t) is the system noise vector, assumed to be white noise. X(t), W(t), F(t) and G(t) are given by

X(t) = [φx φy φz δVx δVy δVz δx δy δz εx εy εz]^T   (2)
W(t) = [wεx wεy wεz]^T   (3)

F(t) = [ 0_{3×3}  0_{3×3}  0_{3×3}  C_b^g
         F_b      0_{3×3}  F_a      0_{3×3}
         0_{3×3}  I_{3×3}  0_{3×3}  0_{3×3}
         0_{3×3}  0_{3×3}  0_{3×3}  0_{3×3} ]   (4)

G(t) = [ C_b^g
         0_{9×3} ]   (5)

where φx, φy and φz are the misalignments of SINS, δVx, δVy and δVz are the velocity errors of SINS, δx, δy and δz are the position errors of SINS, and εx, εy and εz are the gyroscope drifts. F_a and F_b are given by

F_a = [ f14 f15 f16
        f24 f25 f26
        f34 f35 f36 ]

F_b = [ 0    Wz   −Wy
        −Wz  0    Wx
        Wy   −Wx  0 ]

where Wx, Wy and Wz are the specific forces measured by the accelerometers. The entries f14, f15, f16, f24, f25, f26, f34, f35 and f36 are given by

f14 = ∂gx/∂x = −(GM/r³)(1 − 3x²/r²)
f15 = ∂gx/∂y = 3GM x(y + R0)/r⁵
f16 = ∂gx/∂z = 3GM xz/r⁵
f24 = ∂gy/∂x = f15
f25 = ∂gy/∂y = −(GM/r³)(1 − 3(R0 + y)²/r²)
f26 = ∂gy/∂z = 3GM (R0 + y)z/r⁵
f34 = ∂gz/∂x = f16
f35 = ∂gz/∂y = f26
f36 = ∂gz/∂z = −(GM/r³)(1 − 3z²/r²)

r = √(x² + (y + R0)² + z²)


(2) Measurement Equation of the System

The attitude φg^s, θg^s and γg^s determined by the star sensor, whose error does not accumulate with time, is much more accurate than the attitude φg, θg and γg determined by SINS. Subtracting the attitude of the star sensor from the attitude of SINS gives

[δφ; δθ; δγ] = [φg − φg^s; θg − θg^s; γg − γg^s]   (6)

In the error model of SINS, φx, φy and φz represent the angle errors between the computational navigation coordinate system and the navigation reference frame, which are different from δφ, δθ and δγ. δφ, δθ and δγ are transferred to φx, φy and φz through the transfer matrix A using the following equation:

[φx; φy; φz] = A · [δγ; δφ; δθ] = [ 0   cos φ   −cos θ sin φ
                                    0   −sin φ   cos θ cos φ
                                    1   0        0           ] · [δγ; δφ; δθ]   (7)

According to equation (7), the measurement equation can be obtained as follows:

Z(t) = H X(t) + V(t)   (8)

where Z(t) is the observation vector of the SINS/Star Sensor integrated navigation system, H(t) is the measurement matrix, and V(t) is the measurement noise vector, assumed to be white noise. Z(t), H(t) and V(t) are given by

Z(t) = [φx φy φz]^T,  H(t) = [I_{3×3}  0_{3×12}],  V(t) = [vφx vφy vφz]^T
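A short Python sketch of the attitude-difference-to-misalignment mapping of equations (6)-(7) follows; the sign layout of A is taken from the reconstruction of equation (7) above, which was recovered from a garbled source and should therefore be checked against the original typeset paper.

```python
import numpy as np

def misalignment_from_attitude_error(dgamma, dphi, dtheta, phi, theta):
    """Map the attitude differences (delta-gamma, delta-phi,
    delta-theta) of eq. (6) to the misalignments (phi_x, phi_y, phi_z)
    via the transfer matrix A of eq. (7). Sign layout follows the
    reconstruction above (an assumption, not verified)."""
    A = np.array([
        [0.0,  np.cos(phi), -np.cos(theta) * np.sin(phi)],
        [0.0, -np.sin(phi),  np.cos(theta) * np.cos(phi)],
        [1.0,  0.0,          0.0],
    ])
    return A @ np.array([dgamma, dphi, dtheta])
```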
3 A WNN based Auto-Correction Method of SINS using the Star Sensor

Suppose that it is T seconds when the ballistic missile flies out of the atmosphere and the star sensor starts to work. It takes Δt time for the star sensor to search for stars, recognize stars, and calculate the attitude, which means the time is T + Δt when the star sensor determines the attitude for the first time, namely φg^s(T + Δt), θg^s(T + Δt) and γg^s(T + Δt). The WNN based auto-correction method to correct the errors due to initial misalignments is shown in Fig 2. SINS works alone from launching until the star sensor determines the attitude, which is much more accurate than the attitude of SINS. The misalignments of SINS at the (T + Δt) moment are calculated by differencing φg^s(T + Δt), θg^s(T + Δt) and γg^s(T + Δt) with φg(T + Δt), θg(T + Δt) and γg(T + Δt). Because T + Δt is generally short (about 40 seconds in this research), the misalignments of SINS at the (T + Δt) moment are nearly equal to the initial misalignments. The misalignments of SINS at the (T + Δt) moment are fed into the trained wavelet neural network; the position errors and velocity errors of SINS due to the initial misalignments are then calculated by the WNN designed in the next part of this paper and corrected in SINS. After the (T + Δt) time, the Kalman filter introduced above is used to integrate SINS and the star sensor.

[Fig 2. A WNN based auto-correction method of SINS using the star sensor.]

4 The Design of the Wavelet Neural Network to Correct SINS

The activation function of the WNN is the wavelet function [8]. The wavelet neural network, combining the merits of wavelet decompositions and feed-forward networks, has excellent learning and generalization ability and has become a popular tool for function
approximation [9, 10].

In this part, a WNN is designed to estimate the position errors and velocity errors of SINS due to the initial misalignments. The inputs of the WNN are the three initial misalignments Δφ0, Δθ0 and Δγ0. The outputs are the three position errors and three velocity errors due to the initial misalignments: Δx, Δy, Δz, ΔVx, ΔVy and ΔVz. So the numbers of input and output neurons are three and six respectively, and the number of neurons in the hidden layer is set to seven. The structure of the WNN designed to correct SINS is shown in Fig 3.

[Fig 3. The structure of the WNN designed to estimate the position errors and velocity errors due to initial misalignments: inputs Δφ0, Δθ0, Δγ0; hidden wavelet neurons Ψ1-Ψ7; outputs Δx, Δy, Δz, ΔVx, ΔVy, ΔVz.]

The mathematical model of this WNN can be written as

ŷ_i = Σ_{j=1}^{7} w_ij Ψ( (Σ_{k=1}^{3} x_k − b_j) / a_j ),  i = 1, 2, ..., 6   (9)

where x_k (k = 1, 2, 3) is the k-th input, ŷ_i (i = 1, 2, ..., 6) is the i-th output, w_ij represents the weight between the j-th neuron in the hidden layer and the i-th neuron in the output layer, and a_j and b_j are the scale and shift of the wavelet function, respectively. Ψ(x) is the mother wavelet, given by

Ψ(x) = −x · exp(−x²/2)   (10)

If the training samples are [x_k^l, y_i^l], l = 1, 2, ..., N, with N the number of training samples, the objective function is obtained as follows:

J = (1/2) Σ_{l=1}^{N} Σ_{i=1}^{6} (y_i^l − ŷ_i^l)²   (11)

From equation (11), the learning algorithms for the parameters are obtained as follows:

w_ij(n+1) = w_ij(n) − η_w ∂J/∂w_ij + α_w Δw_ij(n)   (12)
a_j(n+1) = a_j(n) − η_a ∂J/∂a_j + α_a Δa_j(n)   (13)
b_j(n+1) = b_j(n) − η_b ∂J/∂b_j + α_b Δb_j(n)   (14)

where η_w, η_a and η_b are the learning rates of w, a and b respectively, all set to 0.5, and α_w, α_a and α_b are the momentum coefficients of w, a and b respectively, all set to 0.02.

The method to obtain the training samples is as follows. First, a trajectory for the ballistic missile is designed; then the independent navigation calculation of SINS is carried out for T + Δt seconds with the initial misalignments according to the trajectory. Because the other initial errors, such as the gyro drift and accelerometer bias, are set to zero in the independent calculation of SINS, the position errors and velocity errors at the (T + Δt) moment due only to the initial misalignments can be obtained. The three initial misalignments and the errors at the (T + Δt) moment due to them are taken as training samples. When the two initial level misalignments vary from -20 arc-seconds to 20 arc-seconds with an increment of 10 arc-seconds, and the initial heading error varies from -1 degree to 1 degree with an increment of 30 arc-seconds, 1200 groups of training samples can be obtained.
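To make the WNN model of equations (9)-(10) concrete, here is a minimal Python sketch of the forward pass; the weights, scales and shifts in the example call are random assumptions, not the trained values from this paper.

```python
import numpy as np

def mother_wavelet(x):
    """Equation (10): psi(x) = -x * exp(-x**2 / 2)."""
    return -x * np.exp(-x**2 / 2.0)

def wnn_forward(x, W, a, b):
    """Forward pass of equation (9): three inputs (the initial
    misalignments), seven hidden wavelet neurons, six outputs (the
    position and velocity errors).
    x: (3,) inputs; W: (6, 7) output weights; a, b: (7,) scales/shifts."""
    z = (x.sum() - b) / a          # argument of each hidden neuron
    return W @ mother_wavelet(z)   # (6,) estimated errors

# Illustrative call with random parameters (assumptions, not trained):
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 7)); a = np.ones(7); b = rng.normal(size=7)
print(wnn_forward(np.array([1e-4, 5e-5, 3e-3]), W, a, b))
```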
5 Simulation

The trajectory designed in this paper is shown in Fig 4. The longitude and latitude of the launch point are 116° (E) and 0° respectively, and the initial heading, pitch and azimuth are 90°, 0° and 90° respectively. The time span of the powered phase is 158 seconds. The star sensor starts to work at the T moment and determines the attitude at the (T + Δt) moment; T + Δt equals 40 seconds in this paper. The initial heading misalignment is 6 arc-minutes, and the two level misalignments are both 10 arc-seconds. The three gyro drifts are all 0.1°/h (σ), the three accelerometer biases are all 100 ug (σ), and the error of the star sensor is 10 arc-seconds (σ).


Fig 4 Trajectory of the ballistic missile designed for simulation

First, the simulation without WNN correction is carried out for comparison. SINS works independently from launching to the time 40 seconds after launching, and the Kalman filter is used to integrate SINS and the star sensor from 40 seconds to 158 seconds, which is the power-off point. The simulation results are shown in Fig 5 with the red dotted line. The three velocity errors at the power-off point are 0.3152 m/s, -0.1263 m/s and -12.234 m/s respectively, and the three position errors are 32.606 meters, -13.03 meters and -1717 meters respectively, which are too big to satisfy the accuracy requirement of the ballistic missile.

Then, the simulation with WNN correction is carried out. SINS also works independently from launching to the time 40 seconds after launching. The WNN is used to estimate the position errors and velocity errors due to initial misalignments, and then the errors estimated by the WNN are corrected in SINS. The Kalman filter is still used to integrate SINS and the star sensor from 40 seconds to 158 seconds, when the ballistic missile is powered off. The simulation results are shown in Fig 5 with the blue line. The three velocity errors at the power-off point reduce to 0.1267 m/s, -0.058 m/s and 0.1194 m/s respectively, and the three position errors reduce to 7.778 meters, -6.665 meters and 7.317 meters respectively, which are much more accurate than the traditional method without WNN.

Fig 5 Comparison between the results of the traditional method and the new WNN based correction method for the errors due to initial alignment of SINS
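To make the correction method concrete, the following is a minimal NumPy sketch of the WNN forward model and the shift update (14) given above. It is a sketch under stated assumptions rather than the authors' implementation: the objective J is taken to be the squared output error (its exact definition is garbled in the source), and the weights and data shown are placeholders.

```python
import numpy as np

def mother_wavelet(x):
    # Psi(x) = -x * exp(-x^2 / 2), the mother wavelet given above
    return -x * np.exp(-x ** 2 / 2)

def mother_wavelet_deriv(x):
    # dPsi/dx = (x^2 - 1) * exp(-x^2 / 2)
    return (x ** 2 - 1) * np.exp(-x ** 2 / 2)

class WNN:
    """3 inputs (initial misalignments), 7 hidden wavelet neurons, 6 outputs."""
    def __init__(self, lr=0.5, momentum=0.02, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=(6, 7))  # output weights w_ij
        self.a = np.ones(7)                          # wavelet scales a_j
        self.b = np.zeros(7)                         # wavelet shifts b_j
        self.lr, self.mom = lr, momentum             # eta_b = 0.5, a_b = 0.02
        self.db = np.zeros(7)                        # previous step, for momentum

    def forward(self, x):
        u = (x.sum() - self.b) / self.a              # hidden-layer arguments
        return self.w @ mother_wavelet(u)            # y_hat, shape (6,)

    def step_b(self, x, y):
        """One update of the shifts b_j as in Eq. (14); w and a are updated
        analogously with their own learning rates and momentum coefficients."""
        u = (x.sum() - self.b) / self.a
        err = self.forward(x) - y                    # dJ/dy_hat, assuming J = 0.5*||err||^2
        dJ_db = (err @ self.w) * mother_wavelet_deriv(u) * (-1.0 / self.a)
        self.db = -self.lr * dJ_db + self.mom * self.db
        self.b += self.db
```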


6 Conclusion

Because the star sensor is interfered with by sunlight, it starts to work only after the ballistic missile flies out of the atmosphere; before that, SINS works independently. The initial misalignments of SINS result in very large position errors and velocity errors which cannot meet the accuracy requirement of the ballistic missile, especially for maneuver launching and underwater launching. The position errors and velocity errors due to initial misalignments cannot be corrected by the star sensor directly. A new WNN based correction method for the errors due to initial misalignments of SINS is proposed in this paper. The simulation results indicate that this method corrects the position errors and velocity errors due to initial misalignments effectively and improves the accuracy of the SINS/Star Sensor integrated navigation system.

References:
[1] Upton R. W. Jr. and Miller W. G., The next frontier for strapdown RLG inertial systems [C]. Position Location and Navigation Symposium, 1990: 537-542.
[2] C. C. Liebe, Star trackers for attitude determination [J]. IEEE AES Systems Magazine, 1995(6): 10-16.
[3] Brown Alison, Moy Geng, Long duration strapdown stellar-inertial navigation using satellite tracking [C]. IEEE PLANS, Position Location and Navigation Symposium, 1992: 194-201.
[4] Johnson W. M., Phillips R. E., Space avionics stellar-inertial subsystem [C]. Proceedings of the 20th Conference on Digital Avionics Systems, 2001, Vol.2: 8D2/1-8D2/9.
[5] Andy Wu, Douglas H. Hein, Stellar inertial attitude determination for LEO spacecraft [C]. Proceedings of the 35th Conference on Decision and Control, Kobe, Japan, 1996, 3: 3236-3244.
[6] Chen Xueqin, Geng Yunhai, On-orbit calibration algorithm for gyro/star sensor [J]. Journal of Harbin Institute of Technology, 2006, 38(8): 1369-1373.
[7] Ho D.W.C., P.A. Zhang and J. Xu, Fuzzy wavelet networks for function learning [J]. IEEE Trans. on Fuzzy Systems, 2001, 9(1): 200-211.
[8] Jin Zhen-Shan, Shen Gong-Xun, Study on stellar-inertial integrated guidance system for mobile ballistic missile [J]. Hangkong Xuebao, 2005, Vol.6, No.2: 168-172.
[9] Zhang Q., Benveniste A., Wavelet networks [J]. IEEE Transactions on Neural Networks, 1992, 3(6): 889-898.
[10] Zhang Q., Using wavelet networks in nonparametric estimation [J]. IEEE Transactions on Neural Networks, 1997, 8(2): 227-236.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 565--569
Copyright@2007 Watam Press

A Method to Pre-select Support Vectors for LS-SVM Classifiers
Yongsheng Sang1,2 , Zhang Yi1 and Stones Lei Zhang1
1 Computational Intelligence Laboratory, School of Computer Science and Engineering,
University of Electronic Science and Technology of China, Chengdu 610054, P.R.China.
2 School of Computer Science and Technology,
Southwest University of Science and Technology, Mianyang 621010, P.R.China.
{sangys,zhangyi,leizhang}@uestc.edu.cn.

Abstract— Least Squares Support Vector Machines (LS-SVM), a modified version of standard SVM, is a proven method for classification and function estimation. Compared to standard SVM, LS-SVM only needs to solve a set of linear equations instead of a quadratic optimization problem. As a result, LS-SVM is computationally attractive. However, the sparseness is lost because LS-SVM makes use of a least-squares cost function. To impose sparseness on the LS-SVM solution, typical pruning methods omit the data with the smallest training errors and retrain the LS-SVM on the reduced training set iteratively. But iterative pruning and retraining are time-consuming. In this paper we propose a direct method to impose sparseness on LS-SVM, which is done by pre-selecting some more significant data points as support vectors for LS-SVM. Experiments show our method is time-saving and can effectively improve the sparseness of LS-SVM.

(This work was supported by National Science Foundation of China under Grant 60471055 and Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20040614017.)

I. INTRODUCTION

Support vector machines, introduced by Vapnik et al in [1], are receiving increasing attention in recent years. SVM is an important learning methodology with good generalization ability, which has been successfully applied in classification and function estimation problems [2], [3]. For the standard SVM, one needs to solve a quadratic convex optimization problem. The dimension of the optimization problem grows with the size of the training data set, which results in intense computation. But due to the ε-insensitive cost function, the solution obtained by SVM is sparse.

A modified SVM called LS-SVM was proposed by J.A.K. Suykens et al, which has been investigated for classification and function estimation problems [4], [5]. Taking into account equality constraints, LS-SVM only needs to solve a set of linear equations. This method significantly reduces the computational complexity. However, the sparseness of the solution for LS-SVM is lost because of the choice of a least-squares cost function. In standard SVM, many vectors are non-support vectors and their support values are zero. The non-zero support values belong to the support vectors, which contribute to constructing the classifier. In LS-SVM, by contrast, the support values are proportional to the errors, with the result that almost all the data points contribute to the classifier. There are several numerical algorithms proposed for training LS-SVM [9], [10], but they usually get non-sparse solutions.

A pruning method, called Sparse LS-SVM, was proposed in [6] by Suykens et al for imposing sparseness on LS-SVM. Motivated by the fact that the LS-SVM support values are proportional to the errors at the data points, they first train LS-SVM on the whole training data, then remove a small amount of points with the smallest values in the sorted |αk| spectrum and re-train LS-SVM based on the reduced training set. The above steps are repeated until the user-defined performance index degrades. This is an iterative and time-consuming procedure. De Kruif et al [7] argue that omitting data with small errors in the previous pass does not reliably predict what the errors will be after the samples have been omitted. So they propose to select the samples that bear the smallest errors when they are omitted in the next pass. However, the computational cost of determining the pruning points increases in this method. The SMO-based method for sparse LS-SVM in [8] is still an iterative method. L. Hoegaerts et al [11] give a general comparison of various pruning algorithms, and conclude that pruning based on absolute support values is still most attractive if one takes into account both the computational costs and the classification accuracy.

In this paper, we propose a direct method to pre-select a subset from the original data set as a new training set, which does not need to train a non-sparse LS-SVM and retrain iteratively on reduced training sets. Our method evaluates the significance of a data point to the LS-SVM according to its distance to the other class center. The effectiveness of our method is demonstrated by some experiments.

This paper is organized as follows. In Section II, LS-SVM for the binary classification problem is reviewed. A pre-selecting support vectors method for LS-SVM is proposed in Section III. The experimental results on some artificial datasets are reported in Section IV. The conclusion is given in Section V.

II. LEAST SQUARES SUPPORT VECTOR MACHINE CLASSIFIERS

Given a training set of N data points {(x1, y1), (x2, y2), ..., (xN, yN)}, where xk ∈ Rp is the kth input vector and yk ∈ {+1, −1} is the corresponding class label. Assuming the data points are linearly separable, there exists a linear classifier in input space such that y(x) = Sign(wT x + b). For nonlinearly separable cases, we employ the idea of mapping the data points into a high dimensional feature space by means of a nonlinear


function ϕ(·), and a separating hyperplane in feature space takes the form y(x) = Sign(wT ϕ(x) + b). To build an SVM, one needs to find the optimal hyperplane between the two classes of training samples, which has the maximum margin of separation 1/||w||. Obviously, the maximization of the margin is equivalent to the minimization of the Euclidean norm of w. In order to obtain the optimal hyperplane, standard SVM needs to build an optimization problem with inequality constraints such that

yk[wT ϕ(xk) + b] ≥ 1, k = 1, ..., N.    (1)

Unlike the standard SVM model, the LS-SVM model builds an optimization problem with equality constraints (3) instead of (1), and the optimization problem takes the following form

min_{w,b,e} J3(w, b, e) = (1/2) wT w + (γ/2) Σ_{k=1}^{N} ek²    (2)

subject to the equality constraints

yk[wT ϕ(xk) + b] = 1 − ek, k = 1, ..., N.    (3)

The Lagrangian is constructed as

L(w, b, e; α) = J3(w, b, e) − Σ_{k=1}^{N} αk { yk[wT ϕ(xk) + b] − 1 + ek }.    (4)

According to the Kuhn-Tucker conditions, the conditions for optimality are obtained as follows

∂L/∂w = 0 → w = Σ_{k=1}^{N} αk yk ϕ(xk)
∂L/∂b = 0 → Σ_{k=1}^{N} αk yk = 0
∂L/∂ek = 0 → αk = γ ek, k = 1, ..., N
∂L/∂αk = 0 → yk[wT ϕ(xk) + b] − 1 + ek = 0, k = 1, ..., N.    (5)

A linear system can be given through these conditions:

[ 0    YT          ] [ b ]   [ 0 ]
[ Y    ZZT + γ⁻¹I  ] [ α ] = [ 1 ].    (6)

where Z = [ϕ(x1)T y1; ...; ϕ(xN)T yN], Y = [y1; ...; yN], 1 = [1; ...; 1], α = [α1; ...; αN]. Mercer's condition can be applied to the matrix Ω = ZT Z, where

Ωkl = yk yl ϕ(xk)T ϕ(xl) = yk yl K(xk, xl),    (7)

and then the linear system (6)-(7) is solved instead of the quadratic optimization problem of standard SVM, which finally results in the following LS-SVM classifier

f(x) = sign( Σ_{k=1}^{N} αk yk K(x, xk) + b ),    (8)

where K : Rp × Rp → R : (x, xk) → K(x, xk) is a positive-definite kernel function. The αk are Lagrange multipliers, and they are proportional to the errors at the points. Consequently, all the data points are related to the construction of the classifier in LS-SVM and the sparseness is lost.

III. PRE-SELECTING SUPPORT VECTORS METHOD FOR LS-SVM CLASSIFIERS

From Equation (3) we can formulate (9):

ek = 1 − yk (wT ϕ(xk) + b).    (9)

According to Equation (9), ek can be classified as follows

ek = { < 0       if |wT ϕ(xk) + b| > 1
     { = 0       if |wT ϕ(xk) + b| = 1
     { ∈ (0, 1)  if 0 < |wT ϕ(xk) + b| < 1    (10)
     { = 1       if |wT ϕ(xk) + b| = 0

According to αk = γ · ek (γ > 0), we can get a shift rule of |αk|, which is shown in Fig.1.

In Fig.1, D denotes (wT xk + b), and |αk| denotes the absolute support values of the data points. The data points falling inside the region nearest to the optimal hyperplane (D = 0) have the biggest |αk|, approximate to γ. When data points come closer to the decision surfaces D = +1 and D = −1, their |αk| reduce closer to 0. When data points go further from the decision surfaces, their |αk| gradually begin to increase, and reach γ again when they approach D = +2 and D = −2. The further data points go away from the decision surfaces, the bigger |αk| they have.

[Figure: |αk| plotted against D, taking the values 2γ, γ, 0, γ, 0, γ, 2γ at D = −3, −2, −1, 0, +1, +2, +3 respectively; regions 1 and 2 and the retained points (drawn with bold borders) are marked]
Fig. 1. The shift rule of absolute support values of data points.

The pruning method mentioned in [6] omits an amount of data points with the smallest |αk| step by step. However, when omitting those data points with |αk| approaching γ, it is difficult to distinguish the data points near D = 0 from the points near D = +2 and D = −2, and the data points near D = 0 cannot be omitted arbitrarily because they construct the boundary between the two classes. Hence, the pruning procedure probably stops when it starts omitting data points with |αk| approaching γ, or it will result in a huge loss of performance in most cases. In other words, the pruning method in [6] usually can omit only those data points in region 1 without performance loss. In fact, we can omit all those hidden data points in region 2 without further loss of performance. Therefore, the data points falling inside the boundary regions and the points far from the decision surfaces (points with bold borders in Fig.1) are retained, which are the points with the largest |αk| and the points constructing the boundary. It is obvious that region 2 covers a wider range than region 1. As a result, omitting the data points in region 2


can impose more sparseness on LS-SVM than the pruning method in [6].

Therefore, we propose a novel pruning method, called PS LS-SVM, which directly omits the data points least significant for the construction of the LS-SVM classifier before training it. PS LS-SVM can omit those hidden data points effectively through a pre-selection procedure, and the detailed description of PS LS-SVM is given as follows:

1) Compute the class centers ϕ+ for class C+ and ϕ− for class C− in the given training data sets.
2) Compute the distances di+− between each data point xi ∈ C+ and ϕ−, then sort these distances, and a distance spectrum D+ is obtained.
3) Compute the distances di−+ between each data point xi ∈ C− and ϕ+, then sort these distances, and a distance spectrum D− is obtained.
4) Pre-select an amount of data points with the smallest distances (e.g. 20% of the original training set in nonlinearly separable problems) and the largest distances (e.g. 20% of the original training set in nonlinearly separable problems) in the distance spectra D+ and D− as support vectors, and omit all the other data points.
5) Train the LS-SVM classifier based on the pre-selected data set.

In our pre-selection procedure, we evaluate the significance of a data point according to its distance to the other class center. To compute the distance, we need to find the class centers first. In linearly separable data sets, the class centers can be easily obtained. Define ϕ+ as the center of class C+; it can be computed in input space by

ϕ+ = (1/N+) Σ_{xi∈C+} xi,    (11)

where N+ is the number of samples in class C+. Define ϕ− as the center of class C− in the input space by

ϕ− = (1/N−) Σ_{xi∈C−} xi,    (12)

where N− is the number of samples in class C−. The distances di+− and di−+ for linearly separable data sets can be computed by means of the Euclidean distance in input space.

For nonlinearly separable data sets, we compute the class centers in feature space. Given a training sample, define ϕ(xi) as the mapping of sample xi from input space to feature space; then the kernel function can be written as K(xi, xj) = ϕ(xi) · ϕ(xj). Define ϕ+ as the class center of C+ in the feature space by

ϕ+ = (1/N+) Σ_{xi∈C+} ϕ(xi),    (13)

where N+ is the number of the samples of class C+. Similarly, define ϕ− as the class center of C− in the feature space by

ϕ− = (1/N−) Σ_{xi∈C−} ϕ(xi),    (14)

where N− is the number of the samples of class C−.

Define di+− as the distance between a sample xi ∈ C+ and the class center of C− in feature space; then we can formulate d²i+− as follows

d²i+− = ||ϕ(xi) − ϕ−||²
      = ϕ(xi)² − 2ϕ(xi)ϕ− + ϕ−²
      = K(xi, xi) − (2/N−) Σ_{xj∈C−} K(xj, xi) + (1/N−²) Σ_{xj∈C−} Σ_{xk∈C−} K(xj, xk).    (15)

di+− can then be computed as

di+− = [ K(xi, xi) − (2/N−) Σ_{xj∈C−} K(xj, xi) + (1/N−²) Σ_{xj∈C−} Σ_{xk∈C−} K(xj, xk) ]^{1/2}.    (16)

Similarly, define di−+ as the distance between a sample xi ∈ C− and the class center of C+ in feature space; di−+ can be computed as

di−+ = [ K(xi, xi) − (2/N+) Σ_{xj∈C+} K(xj, xi) + (1/N+²) Σ_{xj∈C+} Σ_{xk∈C+} K(xj, xk) ]^{1/2}.    (17)

The proposed pre-selection procedure does not need to train a non-sparse LS-SVM beforehand or to retrain LS-SVM iteratively, which saves much time. The results of the pruning method in [6] also show that it keeps an amount of data points near the optimal hyperplane and far from the decision surfaces. Hence, our method is consistent with Sparse LS-SVM to some extent, but imposes more sparseness on LS-SVM classifiers. The experimental results show that our method usually achieves as good performance as standard LS-SVM.

IV. EXPERIMENTS

In this section, we present some design examples and simulation results in order to compare standard LS-SVM, Sparse LS-SVM, and our method PS LS-SVM. Part A is designed for linearly separable problems and Part B for nonlinearly separable cases. Many experiments based on different sizes of training sets, both for linearly and nonlinearly separable examples, are given in Part C. Our experiments were carried out on an Intel Celeron M 1.4GHz with 256MB of memory running Windows XP Professional 2002. In all the experiments presented here, LS-SVM is carried out using the LS-SVM tools from LS-SVMlab [12] based on the Matlab 6.5 system. Our pre-selecting support vectors procedure is carried out in a Visual Basic 6.0 program, and we train LS-SVM using the standard LS-SVM tools.
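As an illustration of steps 1)-5) and the kernel distances (16)-(17), here is a minimal NumPy sketch of the pre-selection procedure. It is a sketch only, not the authors' Visual Basic program; the RBF kernel width and the selection fractions are the values used in the experiments that follow.

```python
import numpy as np

def rbf_kernel(A, B, sigma=7.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def center_distances(X_own, X_other, sigma=7.0):
    """Feature-space distance of each point in X_own to the other class
    center, Eqs. (16)-(17): sqrt(K(xi,xi) - (2/N) sum_j K(xj,xi)
    + (1/N^2) sum_j sum_k K(xj,xk))."""
    n = len(X_other)
    k_self = np.ones(len(X_own))                  # K(xi, xi) = 1 for the RBF kernel
    k_cross = rbf_kernel(X_other, X_own, sigma)   # K(xj, xi)
    k_other = rbf_kernel(X_other, X_other, sigma)
    d2 = k_self - (2.0 / n) * k_cross.sum(axis=0) + k_other.sum() / n ** 2
    return np.sqrt(np.maximum(d2, 0.0))

def pre_select(X_pos, X_neg, frac_near=0.2, frac_far=0.2, sigma=7.0):
    """Steps 2)-4): keep the nearest and farthest fractions of each class."""
    kept = []
    for own, other in ((X_pos, X_neg), (X_neg, X_pos)):
        order = np.argsort(center_distances(own, other, sigma))  # distance spectrum
        near = order[: int(frac_near * len(own))]
        far = order[::-1][: int(frac_far * len(own))]
        kept.append(own[np.concatenate([near, far])])
    return kept  # [pre-selected C+ points, pre-selected C- points]
```

The LS-SVM classifier is then trained on the returned subsets only; for linearly separable data the same selection can be done with plain Euclidean distances to the input-space centers (11)-(12).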


A. Linearly separable experiment

In this experiment, we used an artificial linearly separable training set of 500 vectors with 2 dimensions, in which 250 vectors belong to class C+ and 250 vectors belong to class C−. We pre-selected the 20% of data points of class C+ with the smallest distances to the class center of class C−, and the 5% furthest data points, into the support vector set. The same pre-selection operation was done for class C−. The pre-selection procedure of support vectors in this experiment was carried out in input space. We used standard LS-SVM, Sparse LS-SVM, and our method PS LS-SVM to train on the data set, and Fig.2 shows the training result of our method.

B. Nonlinearly separable experiment

In this experiment, we used another, nonlinearly separable training set of 500 vectors with 2 dimensions, in which 250 vectors belong to class C+ and 250 vectors belong to class C−. Because the data sets are nonlinearly separable, we pre-selected support vectors in the feature space, which is obtained by using the Radial Basis Function kernel with σ = 7. During training of LS-SVM, we also used the Radial Basis Function kernel with γ = 10, σ = 7; in Sparse LS-SVM, we employed step = 0.05 and tradeoff = 0.75. We pre-selected the 20% of data points of class C+ with the smallest distances to the class center of class C−, and the 20% furthest data points, into the support vector set. The same pre-selection operation was done for class C−. The training results of Sparse LS-SVM and PS LS-SVM on this data set are shown in Fig.4 and Fig.5 respectively.

[Figure: scatter plot of classes C+ and C− in the X1-X2 plane with the pre-selected points marked]
Fig. 2. Training result of PS LS-SVM in Experiment A (126 points).

[Figure: the |αk| spectra plotted over the number of SVs for LS-SVM, Sparse LS-SVM and PS LS-SVM]
Fig. 3. The |αk| spectrums for three methods in Experiment A.

[Figure: scatter plot of class 1 and class 2 in the X1-X2 plane with the retained points marked]
Fig. 4. Pruning result of Sparse LS-SVM in Experiment B (285 points).

[Figure: scatter plot of class 1 and class 2 in the X1-X2 plane with the pre-selected points marked]
Fig. 5. Training result of PS LS-SVM in Experiment B (200 points).

Fig.2 and Fig.5 show the training results of our method, in which some data points near the optimal hyperplane and far from the decision surfaces are kept. Fig.4 shows the pruning result of Sparse LS-SVM in Experiment B. Fig.3 shows the |αk| spectrums for the three methods in Experiment A, in which Sparse LS-SVM is based on 326 support vectors and our method is based on 126 support vectors. Fig.3 also shows that the spectrum of our method is well consistent with the part of standard LS-SVM with bigger |αk|. In Experiment B, Sparse LS-SVM and PS LS-SVM are based on 285 and 200 support vectors respectively. Obviously, our pre-selected support sets are far smaller than those of standard LS-SVM and Sparse LS-SVM.

C. Performance tests on different sizes of data sets

This part includes six experiments for performance comparisons of the three methods based on different sizes of data sets. We designed training sets of size 500, 800 and 1200 for both linearly separable experiments and nonlinearly separable cases. All the experiments were tested on size-400 test sets. All the data sets were generated from Gaussian distributions, and some linearly separable data sets have a small overlap. Each experiment was repeated almost 100 times and average results were taken. The performance comparisons of the linearly separable experiments are listed in Table I, and the nonlinearly separable cases are listed in Table II. In the two tables, the training time of Sparse LS-SVM includes training a non-sparse LS-SVM and the pruning time. The PS LS-SVM training time includes the pre-selecting time and the training of a sparse LS-SVM procedure.


TABLE I
PERFORMANCE COMPARISON OF THREE METHODS ON LINEARLY SEPARABLE TRAINING SETS.

Size of training sets | Methods       | # of SVs | Training time(s)      | Test time(s) | Test accuracy(%)
500                   | LS-SVM        | 500      | 0.114                 | 0.029        | 99.92
500                   | Sparse LS-SVM | 351      | 0.114 + 1.146 = 1.260 | 0.023        | 99.84
500                   | PS LS-SVM     | 125      | 0.024 + 0.017 = 0.041 | 0.012        | 99.79
800                   | LS-SVM        | 800      | 0.268                 | 0.047        | 99.95
800                   | Sparse LS-SVM | 546      | 0.268 + 2.596 = 2.864 | 0.035        | 99.92
800                   | PS LS-SVM     | 200      | 0.055 + 0.028 = 0.083 | 0.015        | 99.92
1200                  | LS-SVM        | 1200     | 0.576                 | 0.067        | 99.98
1200                  | Sparse LS-SVM | 795      | 0.576 + 5.984 = 6.560 | 0.046        | 99.97
1200                  | PS LS-SVM     | 300      | 0.116 + 0.060 = 0.176 | 0.020        | 99.98

TABLE II
PERFORMANCE COMPARISON OF THREE METHODS ON NONLINEARLY SEPARABLE TRAINING SETS.

Size of training sets | Methods       | # of SVs | Training time(s)         | Test time(s) | Test accuracy(%)
500                   | LS-SVM        | 500      | 3.043                    | 0.085        | 99.95
500                   | Sparse LS-SVM | 332      | 3.043 + 18.626 = 21.67   | 0.065        | 99.91
500                   | PS LS-SVM     | 200      | 0.360 + 0.388 = 0.748    | 0.045        | 99.59
800                   | LS-SVM        | 800      | 8.901                    | 0.138        | 99.96
800                   | Sparse LS-SVM | 515      | 8.901 + 62.293 = 65.97   | 0.097        | 99.92
800                   | PS LS-SVM     | 320      | 0.978 + 1.039 = 2.026    | 0.061        | 99.92
1200                  | LS-SVM        | 1200     | 24.512                   | 0.203        | 99.99
1200                  | Sparse LS-SVM | 645      | 24.51 + 173.05 = 197.560 | 0.109        | 99.95
1200                  | PS LS-SVM     | 480      | 2.246 + 2.841 = 5.087    | 0.078        | 99.97

From Table I, we can see that our method pre-selects only a fixed amount, 25% in total, of the original training sets as support vectors in the linearly separable cases. In Table II, our method pre-selects in total 40% of the original training samples as the support vector sets for the nonlinearly separable data sets. The pruning results of Sparse LS-SVM in the two tables show that it usually keeps 60% to 70% of the data points as support vectors. It is obvious that our method can impose more sparseness on LS-SVM and effectively speed up the training and test procedures. The test accuracy of our method is almost the same as that of standard LS-SVM, and in some cases it achieves higher performance than Sparse LS-SVM. But it has a little more loss of performance on some small training sets than Sparse LS-SVM. The loss of performance of our method can easily be reduced by increasing the amount of pre-selected support vectors. In fact, we can pre-select fewer support vectors in linearly separable cases without performance loss; for example, pre-selecting 10% of the training samples in total usually obtains a good classifier too. In this paper, we only give a conservative, experiential amount of pre-selected support vectors.

V. CONCLUSIONS

An effective pre-selecting support vectors method for LS-SVM was proposed. The pre-selection is based on the significance of data points to the construction of the LS-SVM classifier, which is determined by the distances of the points to the other class center. Experiments for linearly separable and nonlinearly separable cases were given, and a great number of tests based on different sizes of data sets were studied too. These experiments show that the support vectors can be reduced obviously by our method using the experiential amounts of 25% for linearly separable data sets and 40% for nonlinearly separable cases, which avoids training LS-SVM iteratively and speeds up the training and test procedures. Our method has only a little, acceptable loss of performance on some small training data sets in these experiments, which can be improved easily by increasing the amount of pre-selected support vectors. Further study will focus on how to obtain an optimal amount for pre-selecting support vectors in diverse cases.

REFERENCES

[1] V. Vapnik, "The Nature of Statistical Learning Theory", Springer-Verlag, New York, NY, 1995.
[2] B. Schölkopf, C. Burges, and A.J. Smola, "Advances in Kernel Methods: Support Vector Learning", Cambridge, UK: Cambridge Univ. Press, 1998.
[3] V. Vapnik, "The support vector method of function estimation", in J.A.K. Suykens and J. Vandewalle (Eds.), Nonlinear Modeling: Advanced Black-Box Techniques, Kluwer Academic Publishers, Boston, pp. 55-85, 1998.
[4] J.A.K. Suykens and J. Vandewalle, "Least squares support vector machine classifiers", Neural Process. Lett., vol. 9, no. 3, pp. 293-300, June 1999.
[5] J.A.K. Suykens, L. Lukas, and J. Vandewalle, "Sparse approximation using least squares support vector machines", in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS'00), Geneva, Switzerland, May 2000, pp. II757-II760.
[6] J.A.K. Suykens, L. Lukas, and J. Vandewalle, "Sparse least squares support vector machine classifiers", in Proc. of the European Symposium on Artificial Neural Networks (ESANN2000), Bruges, Belgium, 2000, pp. 37-42.
[7] B.J. de Kruif and T.J. de Vries, "Pruning error minimization in least squares support vector machines", IEEE Trans. Neural Netw., vol. 14, no. 3, pp. 696-702, May 2003.
[8] X.Y. Zeng and X.W. Chen, "SMO-based pruning methods for sparse least squares support vector machines", IEEE Trans. Neural Netw., vol. 16, no. 6, pp. 1541-1546, Nov. 2005.
[9] J.A.K. Suykens, L. Lukas, P. Van Dooren, B. De Moor, and J. Vandewalle, "Least squares support vector machine classifiers: a large scale algorithm", in Proc. Eur. Conf. Circuit Theory and Design (ECCTD'99), Stresa, Italy, 1999, pp. 839-842.
[10] W. Chu, C.J. Ong, and S.S. Keerthi, "An improved conjugate gradient scheme to the solution of least squares SVM", IEEE Trans. Neural Netw., vol. 16, pp. 498-501, 2005.
[11] L. Hoegaerts, J.A.K. Suykens, J. Vandewalle, and B. De Moor, "A comparison of pruning algorithms for sparse least squares support vector machines", in Proc. 11th Int. Conf. ICONIP, Calcutta, India, Nov. 22-25, 2004.
[12] LS-SVMlab, http://www.esat.kuleuven.ac.be/sista/lssvmlab/.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 570--574
Copyright@2007 Watam Press

A Multi-layer Quantum Neural Networks Recognition System for Handwritten Digital Recognition


Li Peng Rushi Wu

College of Communication and Control Engineering, Southern Yangtze University, Wuxi, Jiangsu,
China, 214122, pengli@sytu.edu.cn
Abstract: In this paper, a handwritten digital recognition system based on multi-level transfer function Quantum Neural Networks (QNN) and multi-layer classifiers is proposed. The proposed recognition system consists of two layers of sub-classifiers, namely a first-layer QNN coarse classifier and second-layer QNN numeral pair classifiers. Handwritten digital recognition experiments are performed using data from the MNIST database. The experimental results indicate that the proposed QNN recognition system achieves excellent performance in terms of recognition rate and recognition reliability, and at the same time they show the superiority and potential of QNN in solving pattern recognition problems.
Keywords: quantum neural network; multi-level transfer function; multi-layer classifier; pattern recognition

1. Introduction
Handwritten digital recognition is an important branch of Optical Character Recognition (OCR) because of its broad applications in postal code recognition, financial document processing, form processing etc. In the past 30 years, handwritten digital recognition has always been a research hotspot in the image processing and pattern recognition fields[1]. Although a great deal of work has been done on handwritten digital recognition[2-5], there is still room to pursue a higher recognition rate and reliability for real-world applications.

As a new method of pattern recognition, the artificial neural network has some unique virtues compared with traditional methods: good fault tolerance, powerful classification ability, parallel processing and self-learning ability. Among all artificial neural network models, the back propagation neural networks (BPNN) have been broadly used. But one of the major disadvantages of BPNN is their inability to correctly estimate the class membership of data belonging to regions of the feature space where there is overlapping between the classes[6]. The reason for this is that BPNN use sharp decision boundaries to partition the feature space. Handwritten digits, due to the extensive variety of writing styles, have overlapping data between the classes, so it is unreasonable to use BPNN for handwritten digital recognition.

In 1997, N.B. Karayiannis[7-9] and others employed the idea of quantum state superposition and advanced the QNN model based on a multi-level transfer function. It has a three-layer network architecture. The transfer function of the quantum neuron in the hidden layer applies the superposition of several traditional transfer functions, and this architecture gives the network a sort of fuzzy characteristics. It has been proved by theory and experiments that the QNN based on the multi-level transfer function has a very good classification effect for pattern recognition problems which have uncertainty and overlapping data between two patterns. At present the QNN has been applied to many fields successfully, such as image processing, weather forecasting and speech recognition[10-11].

In this paper, aiming at the data overlapping of feature vectors in handwritten digital recognition, we applied a recognition system based on multi-level transfer function Quantum Neural Networks (MLQNN) and multi-layer classifiers to handwritten digital recognition. The results of the experiments testify that the MLQNN recognition system has a very good classification effect for pattern recognition problems that have uncertainty and overlapping data between two patterns.

2. Quantum Neural Networks
2.1 Architecture of Quantum Neural Networks
The main difference between conventional feed-forward neural networks and QNNs is the form of the nonlinear activation functions of their hidden


units[12]. Instead of the ordinary sigmoid functions employed by conventional BPNN, according to the idea of quantum state superposition, the transfer function of the neurons in the hidden layer is expressed as a linear superposition of multiple sigmoid functions, i.e. a multi-level transfer function[13]; the nodes in the hidden layer can then denote more states than the traditional hidden-layer nodes, which can denote only two states. The multi-level transfer function has several different quantum intervals. By adjusting the quantum intervals, the different classes of data are mapped onto corresponding states; accordingly the classification has more freedom. The quantum intervals of the QNN are determined through training. Given a suitable training algorithm, the uncertainty in the sample data will be adaptively learned and quantified. If a feature vector lies at the boundary between overlapping classes, the QNN will assign it partially to all related classes. This gives the QNN a sort of inherent ambiguity, and it can assign ambiguous data to the corresponding patterns reasonably, thereby reducing the uncertainty of pattern recognition and improving its accuracy.

The standard architecture of a three-layer BPNN is shown in Fig.1. Suppose that there are m nodes in the input layer LA, n nodes in the output layer LC and u nodes in the hidden layer LB. The nodes located in every two adjacent layers connect to each other, while the nodes belonging to the same layer do not connect. The output function of the rth node in the hidden layer is:

b_r = f(W^T X − θ_r), r = 1,2,...,u    (1)

[Figure: a three-layer network with input layer LA (m nodes), hidden layer LB (u nodes) and output layer LC (n nodes), connected by weights W_ir and V_rj]
Fig.1 Architecture of a Three-layer BPNN

The output function of the jth node in the output layer is:

c_j = f(V^T B − ϕ_j), j = 1,2,...,n    (2)

It can be seen from Eq.(1) and Eq.(2) that f is the sigmoid function, i.e. f(x) = (1 + e^{−x})^{−1}; W_ir is the connecting weight between node a_i in the input layer and node b_r in the hidden layer; V_rj is the connecting weight between node b_r in the hidden layer and node c_j in the output layer; θ_r is the threshold of the hidden layer; and ϕ_j is the threshold of the output layer.

For the QNN, Eq.(1) becomes:

b_r = (1/n_s) Σ_{s=1}^{n_s} f[ β (W^T X − θ^s) ]

where f(x) = 1/(1 + exp(−x)), W is the weight vector, X is the input vector, β is a slope factor, W^T X is the input activation of the quantum neuron, and θ^s (s = 1,2,...,n_s) is the sth quantum interval.

2.2 Training of Quantum Neural Networks
There are two steps in the training of a QNN. The first step is to make the input sample data correspond to the relevant class spaces by updating the connecting weights. The second is to embody the uncertainty of the data by updating the quantum intervals of the quantum neurons in the hidden layer. The connecting weights are adjusted through the standard back propagation algorithm. Once the weights have been obtained, the quantum intervals can be learned by a suitable training algorithm[7]. The quantum intervals of the quantum neurons in the hidden layer can be learned by minimizing the class-conditional variances at the outputs of the hidden units. Essentially, the quantum-interval adjusting algorithm is a gradient-descent-based algorithm.
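As a minimal illustration of the multi-level transfer function above, here is a NumPy sketch; the weights, quantum intervals and slope factor are placeholder values, not trained ones.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def quantum_neuron(x, w, thetas, beta=1.0):
    """Hidden-unit output b_r: the average of n_s sigmoids, one per
    quantum interval theta^s, applied to the activation W^T X."""
    activation = w @ x
    return sigmoid(beta * (activation - thetas)).mean()

# Example: three quantum levels give a staircase-like transfer function,
# so a hidden node can represent more than two states.
w = np.array([0.8, -0.2])              # placeholder weight vector
thetas = np.array([-1.0, 0.0, 1.0])    # placeholder quantum intervals
print(quantum_neuron(np.array([0.5, 1.5]), w, thetas, beta=5.0))
```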


The class-conditional variance of the output of the ith hidden unit for pattern class C_m is

σ²_{i,m} = Σ_{x_k: x_k∈C_m} ( O_{i,m} − O_{i,k} )²

where O_{i,k} is the output of the ith hidden unit with input vector x_k, O_{i,m} = (1/|C_m|) Σ_{x_k: x_k∈C_m} O_{i,k}, and |C_m| denotes the cardinality of C_m. By minimizing σ²_{i,m}, we can get the following update equation of θ_{i,s} for the ith hidden unit and its sth quantum level:

Δθ_{i,s} = η (β/n_s) Σ_{m=1}^{n_0} Σ_{x_k: x_k∈C_m} ( O_{i,m} − O_{i,k} ) · ( V_{i,m,s} − V_{i,k,s} )

where η is the learning rate; β is a slope factor; n_0 is the number of output nodes, i.e. the number of classes; n_s is the number of quantum levels; x_k: x_k ∈ C_m denotes all the input samples belonging to the class C_m; and

V_{i,m,s} = (1/|C_m|) Σ_{x_k: x_k∈C_m} V_{i,k,s},
V_{i,k,s} = O_{i,k,s} · (1 − O_{i,k,s}),

in which O_{i,k,s} is the output of the ith hidden unit at its sth quantum level with the input vector x_k.

3. Handwritten Digital Recognition System
3.1 Preprocessing and Feature extraction
We obtained the training set and test set from the MNIST database. The MNIST database is a free standard numerals database provided by Doctor Y. Le Cun from the AT&T laboratory. The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. All the numeral images are 28x28 pixel grey-level images; the background is white and the foreground is black.

In order to process the handwritten numerals efficiently, they must be transferred into a more appropriate form by preprocessing. Because some preprocessing has already been done on the digit images of MNIST, in this paper we only applied the binarization, blank segmentation and normalization steps to process the images. Although the numeral images from MNIST are 28x28 pixel grey-level ones, the virtual size of the digit is 20x20 pixels and there are blanks in the images. In order to reduce the workload of the computer and to center the numeral in the image, we applied the blank segmentation step to remove the useless feature values. In this paper, we applied the vertical projection method to search for the right blank segmentation points. Concretely speaking, we project the digit image onto the coordinate axis and calculate the total pixel value; if the total pixel value at some points on the coordinate axis is less than a predefined critical value, then those points may be segmentation points. The sizes of the digit images become unequal after the blank segmentation step, so we applied the linear normalization method to unify the size of the digit images. The original black and white images from MNIST were size-normalized to fit in a 16x10 pixel box while preserving their aspect ratio.

In this paper, we use pixel features as the direct input of the artificial neural networks. The reason this succeeds is that the neural network also acts as a feature extractor during learning and weight forming[14].

3.2 Architecture of Multi-layer QNN Recognition System
The block diagram of the proposed system for handwritten digital recognition is shown in Fig.2.

[Figure: block diagram with three stages: preprocessing and feature extraction, first-layer coarse classification, second-layer fine classification]
Fig.2 The system block diagram of the MLQNN recognition system

The recognition system consists of three main parts: preprocessing and feature extraction, first-layer coarse classification and second-layer fine classification. The method of preprocessing and feature extraction has been described in section 3.1. For the first-layer coarse classification, a three-layer QNN is employed. The 160 bits of features are fed into the input layer. The output layer is composed of


10 nodes representing the 10 character classes {0},{1},{2},{3},{4},{5},{6},{7},{8},{9}. The hidden layer is composed of 30 quantum neurons and the quantum level is set to 6. After being trained, the QNN can be used as a classifier.

The second-layer fine classification is composed of 13 three-layer QNNs. After being trained, each QNN is used to recognize one confusing numeral pair. In handwritten digital recognition, there are many confusing numeral pairs in the recognition process, such as the numeral pairs {4,9},{3,8},{7,9} etc. By statistics, we obtained the 13 most confusing numeral pairs in the MNIST database. These confusing numeral pairs make the feature vectors of samples overlap, the correct recognition rate of the recognition system decrease and the mis-recognition rate increase. QNN have been shown to be capable of correctly estimating the class membership of data belonging to regions of the feature space where there is overlapping between the classes, so we applied the QNN classifier to compose the handwritten digital recognition system.

In the second-layer fine classification, the input layer has 160 nodes and the output layer is composed of 2 nodes representing 2 classes. The hidden layer is composed of 10 nodes and the quantum level is set to 3.

3.3 Flow of Recognition System
The work process of the designed recognition system is composed of two stages: the training stage and the recognition stage. Each QNN classifier is trained independently by using the algorithm described in section 2. It is important to determine when the network finishes training. Over-training can cause the network to over-specialize on the training set and to adapt too closely to its particular features. This problem is solved by using a validation set. After training the network for a short time, the performance of the network is tested on the validation set. The training is repeated until the recognition rate on the validation set starts to deteriorate.

In the recognition stage, the features of a handwritten digit are sent into the first-layer classifier, and a result is obtained; it is one of the ten character classes. Then the result from the first-layer classifier is sent into the corresponding QNN numeral pair classifier of the second-layer fine classification; the result of the first-layer classification will be confirmed, rejected or rectified.

4. Experimental Results and Analysis
4.1 Experiments of QNN recognition
We tested the performance of the proposed recognition system on the set of handwritten numerals from the MNIST database. 20,000 randomly selected samples out of the 60,000-sample training set are used as training samples for the first-layer classifier of the recognition system; when the square sum error (SSE) of the network is not more than 0.002 we stop training. We chose 2,000 samples out of the training set for every QNN of the second-layer fine classification; the end condition of training is SSE ≤ 0.0005. The learning rate for weight adjusting is set to 0.05, and the learning rate for quantum interval adjusting is set to 0.005. The training evolvement process of the recognition system is shown in Fig.3.

The 10,000 testing samples of MNIST are used as the testing set of our experiments. We have conducted three experiments, and have calculated the recognition rate and reliability rate:

δ = (number of characters recognized correctly / total number of characters recognized by the network) × 100
κ = (number of characters recognized incorrectly / total number of characters recognized by the network) × 100
γ = 100 − δ − κ    (10)
λ = recognition rate / (recognition rate + mis-recognition rate) × 100

where δ is the recognition rate, κ is the mis-recognition rate, γ is the rejection rate and λ is the reliability rate.

[Figure: two SSE training curves, (a) the first-layer classifier reaching its goal of 0.002 in 440 epochs and (b) the second-layer classifier for numeral pair {4,9} reaching its goal of 0.0005 in 136 epochs]
Fig.3 Training evolvement process of the first-layer QNN classifier and the second-layer for numeral pair {4,9}

4.2 Recognition results analysis
For experiment one, a three-layer BP network was employed as the digit recognizer. The network has a ten-node output layer (10 nodes standing for the digits 0-9) and a hidden layer with 30 nodes. The network is trained by using the same training samples used by the first-layer QNN classifier; then a recognition test is conducted by using the testing set. The overall recognition rates of the BP network recognizer are tabulated in Table.1.

For experiment two, a QNN is used as the digit recognizer, and its structure is similar to the first-layer QNN classifier of the proposed recognition system. The training and testing process of the QNN recognizer is similar to that of the BP network used in experiment one. The overall recognition rate is


shown in the second row of Table.1.

For experiment three, the proposed recognition system shown in Fig.2 is used to do the recognition test. The overall recognition rate for the testing data is shown in the third row of Table.1.

Table.1 The comparison of handwritten digital recognition rates by BPNN, QNN and MLQNN

Classifier | Recognition Rate(%) | Rejection Rate(%) | Mis-recognition Rate(%) | Reliability Rate(%)
BPNN       | 89.9                | 3                 | 7.1                     | 92.6
QNN        | 91.7                | 4.9               | 3.4                     | 96.4
MLQNN      | 96.5                | 2.3               | 1.2                     | 98.8

Table.1 shows the overall recognition rate comparison of the three kinds of recognizers. The recognition rate and reliability of the pure QNN classifier improve obviously compared to the conventional BP network classifier. This shows the superiority and potential of QNN in solving pattern recognition problems. The experimental results demonstrate that our proposed multi-layer QNN recognition system (MLQNN) achieves excellent performance in terms of recognition rate and recognition reliability compared to the BP network and QNN classifiers.

5. Conclusion
In this paper, a handwritten digital recognition system based on multi-level transfer function Quantum Neural Networks (QNN) is proposed. The recognition system consists of two layers of sub-classifiers, namely a first-layer QNN coarse classifier and second-layer QNN numeral pair classifiers. The experiments are performed on handwritten digits from the MNIST database. We obtain good performance on the testing set of the MNIST database: a 96.5% recognition rate and a 98.8% reliability rate. The experiments demonstrated that our proposed system improves the recognition rate and recognition reliability compared to the pure BP network and QNN recognizers.

In addition, in this paper we only use simple pixel features as the direct input of the recognition system; this method has difficulty in describing the character features, and strong noise can make the character features become lost or distorted. So in the future, we will apply other feature extraction methods with good performance to improve the performance of the recognition system.

Reference
[1] Xia Guo-en, Jin Weidong, Zhang Gexiang. Handwritten digit recognition method based on combination features[J]. Application Research of Computers, 2006, Vol.23, No.6, pp.170-172.
[2] Suen C.Y., Liu K., Strathy N.M.. Sorting and recognizing cheques and financial documents[C]. In Proc. of the Third International Association for Pattern Recognition Workshop on Document Analysis Systems, Nagano, Japan, 1998, pp.1-18.
[3] Cho S.B.. Neural-network classifiers for recognizing totally unconstrained handwritten numerals[J]. IEEE Transactions on Neural Networks, 1997, Vol.8, No.1, pp.43-53.
[4] Filatov A., Gitis A., Kil I.. Graph-based handwritten digit string recognition[C]. In Proceedings of ICDAR'95, Montreal, 1995, pp.845-848.
[5] Ha T.M., Bunke H.. Off-line handwritten numeral recognition by perturbation method[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, Vol.19, No.5, pp.535-539.
[6] Jie Zhou. Recognition and verification of unconstrained handwritten numerals[D]. Canada: Concordia University, 1999.
[7] Karayiannis N.B., Purushothaman G.. Fuzzy pattern classification using feedforward neural networks with multilevel hidden neurons[C]. IEEE International Conference on Neural Networks, Orlando, Florida, 1994, Vol.3, pp.127-132.
[8] Purushothaman G., Karayiannis N.B.. Quantum neural networks: inherently fuzzy feedforward neural networks[J]. IEEE Transactions on Neural Networks, 1997, Vol.8, No.3, pp.679-693.
[9] Behrman E.C., Chandrashkar V.G., Wang C.K.. A quantum neural network computes entanglement[J]. Physical Review Letters, 2002, Vol.16, No.1, pp.152-159.
[10] Zhou J., Qing G., Adam Krzyzak. Recognition of handwritten numerals by quantum neural network with fuzzy features[J]. IJDAR, 1999, No.2, pp.30-36.
[11] Li F., Zhao S.G., Zheng B.Y.. Quantum neural network in speech recognition[C]. 6th International Conference on Signal Processing, Beijing, China, 2002, Vol.2, pp.312-317.
[12] Purushothaman G. and Karayiannis N.B.. Feed-forward neural architectures for membership estimation and fuzzy classification[J]. International Journal of Smart Engineering System Design, 1998, No.1, pp.163-185.
[13] Zhou Shude, Wang Yan, Sun Zengqi, Sun Fuchun. Quantum neural network[C]. China Intelligent Automation Conference, 2003, pp.163-168. (in Chinese)
[14] Trier O.D., Jain A.K., and Taxt T.. Feature extraction methods for character recognition - a survey[J]. Pattern Recognition, 1996, Vol.29, No.4, pp.641-662.
will apply other good performance feature extraction


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 575--579
Copyright@2007 Watam Press

Text-Independent Speaker Identification Using Fuzzy LS-SVM
Chengfu Yang1,2 , Zhang Yi1 and Stones Lei Zhang1
1 Computational Intelligence Laboratory, School of Computer Science and Engineering
University of Electronic Science and Technology of China. Chengdu 610054,P.R.China.
2 Sichuan University of Arts and Science, Dazhou 635000,P.R.China
(E-mail: ycfwsm@gmail.com {zhangyi,leizhang}@uestc.edu.cn)

Abstract— This paper presents a text-independent speaker identification system using fuzzy Least Squares Support Vector Machines (fLS-SVM) in the score-space. In contrast to most current systems, which are based on frame-level discrimination, the approach provides direct discrimination between whole sequences by combining Gaussian Mixture Models (GMMs) as underlying generative models with fLS-SVM as a multi-classifier with a score-space kernel. The improvement can be attributed to better feature extraction and to the fLS-SVM construction based on the score-space. Experiments using the PolyVar database show that the proposed algorithm outperforms the other systems in reducing the relative error rates and in reducing the computational complexity in such a high-dimensional space.

(This work was supported by National Science Foundation of China under Grant 60471055, Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20040614017.)

I. INTRODUCTION

Speaker recognition has two categories in general: speaker identification (SI) and speaker verification (SV). In the identification task, an unknown speaker is compared against a database of known speakers, and the best matching speaker is given as the identification result, or a rejection is given because the applicant is not in the database. By text-independent we mean that the words are unknown and unrestricted in both training and testing. Current speaker identification systems are either based on generative probability models, such as Gaussian mixture models (GMM) [3] and hidden Markov models (HMM) [2], or based on a discriminative framework, such as artificial neural networks (ANN) and support vector machines (SVM) [5], [7].

The generative probability models are usually used for modelling sequences of data because of their ability to handle variable-size sequences and missing information. The advantage of this kind of model lies in the ability to find the statistical distribution of the training data, which reflects the similarity between data in the same class. On the other hand, discriminative models like SVM usually yield better performance in classification problems and can construct flexible decision boundaries that reflect the discrepancies between data in different classes. In general, the discriminative models can get better performance than generative models in classification tasks, but fall short of capturing the implicit properties of the training data. An ideal classifier should have all the power of these two complementary approaches. There are two manners of combining the two models: fusing the results from the two models, and embedding one model in the other. A series of recent papers has reported that techniques mixing generative models and discriminative models obtain better performance than those using single models [8], [9].

At the same time, some limitations of these approaches lie in the following facts. The first is that discrimination occurs between frames, whereas speaker identification is concerned with sequence discrimination. The second is the increase in the expected error and computational cost with growing population size, especially in multi-class SVM [11]. In this paper, we describe an approach to speaker identification based on fuzzy least squares support vector machines [13] with a score-space kernel that enables direct discrimination between sequences, to overcome the two main drawbacks above. We performed experiments on the PolyVar database [16] and compared the error rates and computational cost to those of the approach of [11]. The simulations show that we can get better performance not only in error rate, but also in time consumption for large population sizes in the multi-class case.

The rest of the paper is organized as follows: Section II provides an overview of GMM-based text-independent speaker identification; Section III reviews the score-space and its normalization, for the next section to use as a kernel; Section IV reviews the fuzzy least squares support vector machines for classification; experimental evaluation and results compared to other methods are presented in Section V; in the last section, some conclusions and future work are given.

II. GMM-BASED TEXT-INDEPENDENT SPEAKER IDENTIFICATION

Over the past decades, GMM have become the dominant approach for modelling in text-independent speaker recognition applications. As a generic probabilistic model for multivariate densities capable of representing arbitrary densities, GMM suits unconstrained text-independent speaker identification very well [4].

There are about five steps in text-independent speaker identification using GMM. The first step is to get the training speaker sets. Suppose that there are n speakers to be trained; for every speaker, we get m utterances of about 5 minutes each in different


TABLE I
environment. The next step is to segment every utterance into
S CORE O PERATORS
frames by a 20-ms window progressing at a 10-ms frame rate.
A speech activity detector is then used to discard silence-
score-operator expression
noise frames. The third step is to get the feature vectors first derivative Ŝ=∇θ
for example, the Mel-scale cepstral feature vectors, extracted first derivative and argument Ŝ=[∇θ , 1]T
from the speech frames. The dimension of feature vector is first and second derivative Ŝ=[∇θ , vec(∇2θ )T ]T
supposed to D. The following step is to get speaker models (or
called client model) and the universal background model (or
called world model) using the expectation-maximization (EM)
algorithm [15]. In the hypothesis prior, there are m component is defined by and derived from the likelihood score of a set
Gaussian mixture models for every speaker, denoted as Mi = of m generative models for a speaker, {pk (X|Mk , θk ) k =
{αij , μij , Σij } f or i = 1...n and j = 1...m. Where 1, ..., m }, where the parameter θk is referred to αk , μk and
the αij , μij and Σij are the mixture weight, mean vector Σk defined in Section II. The generic formulation for mapping
and the covariance matrix respectively. The universal model is denoted as Ω. The last step is to identify the speaker of a given utterance with the GMM models.

Given an utterance X = \{x_1, x_2, \dots, x_N\}, the probability P(X|M_i), used as the utterance score, is estimated by the mean log-likelihood as in (1), or by the mean log-likelihood ratio of the client model to the world model as in (2) [11]:

S_i(X) = \frac{1}{N}\log P(X|M_i) = \frac{1}{N}\sum_{k=1}^{N}\log P(x_k|M_i),  (1)

S_i(X) = \frac{1}{N}\sum_{k=1}^{N}\log\frac{P(x_k|M_i)}{P(x_k|\Omega)},  (2)

where P(x_k|M_i) is the likelihood of the input vector x_k given the mixture model M_i, of the form (3):

P(x_k|M_i) = \sum_{j=1}^{m}\alpha_{ij}\,\frac{1}{(2\pi)^{D/2}|\Sigma_{ij}|^{1/2}}\exp\left(-\frac{1}{2}(x_k-\mu_{ij})^T\Sigma_{ij}^{-1}(x_k-\mu_{ij})\right).  (3)

For reasons of both modelling and estimation, it is usual to employ GMMs consisting of components with diagonal covariance matrices. A detailed discussion of the application of GMMs to speaker modelling can be found in [4].

III. SCORE-SPACES AND NORMALIZATION

Discriminative classification between frames has been developed in many fields over the last several decades, but discriminative classification between complete utterances is difficult since sequences have different lengths. The general method developed by Jaakkola and Haussler [6] in this field is a mapping from a variable-length sequence to a fixed-length vector, which was generalized in [10] into a technique referred to as score-spaces. In score-spaces, a set of sequences is mapped to a comparatively high-dimensional feature space where discriminative classifiers such as the SVM, which uses the mapping as a kernel, can be used to discriminate the different classes.

In general, a score-space is obtained by applying some operator to the likelihood score of a generative model such as the GMM outlined in Section II. In this paper, the score-space mapping of a sequence X = \{x_1, x_2, \dots, x_N\} is given by [11]:

\Psi_{\hat{S},f}(X) = \hat{S}(f(p_k(X|M_k,\theta_k))),  (4)

where \Psi_{\hat{S},f}(X) is called the score-vector; f(\cdot), a function of the scores of the set of generative models, is called the score-argument; and \hat{S}(\cdot) is the score-operator that maps the scalar score-argument to the score-space. The log-likelihood and the log-likelihood ratio are common score-arguments. Several common options for the score-operator were proposed by Smith et al. [10] and are summarized in TABLE I. In this paper, we use the log-likelihood as the score-argument and the first derivative as the score-operator. The log-likelihood score-vector can then be expressed as [11]:

\Psi_{Fisher}(X) = \left[\frac{d}{d\alpha_{ij}},\dots,\frac{d}{d\mu_{ij}^{l}},\dots,\frac{d}{d\sigma_{ij}^{l}},\dots\right]^{T}\log P(X|M_i,\theta_i),  (5)

for j = 1, \dots, m and l = 1, \dots, D for the i'th speaker model.

Because fuzzy least squares support vector machines using a linear transformation in feature space are not invariant, normalization of the feature vectors is desirable. In this paper, we used two stages of normalization as in [11]: the first stage is whitening the data in the score-space by normalizing the components of the vectors \Psi_{Fisher}(X) to zero mean and unit variance; the second stage is applying spherical normalization, for which the modified stereographic projection is used in this paper.

The kernel used in the fLS-SVM for a given score-space is constructed from the mapping as in (6):

K(X_p, X_q) = \Psi(X_p)^T G^T G\,\Psi(X_q),  (6)

where G^T G is the metric of the space and the subscript on X enumerates the sequences. In this way, we obtain the first-stage whitening normalization. In the case of the log-likelihood score-space mapping, G^T G is the inverse Fisher information matrix as in (7):

G^T G = (E\{U(X)U(X)^T\})^{-1},  (7)

where U(X) = \Psi(X) - E\{\Psi(X)\} and E is the expectation operator.
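For concreteness, the following is a minimal numpy sketch of the generative scoring in (1) and (3) that underlies this score-space, assuming a diagonal-covariance GMM whose parameters are given as arrays; it is an illustration, not the authors' implementation:

    import numpy as np

    def frame_loglik(x, weights, means, variances):
        # log P(x_k | M_i) under the diagonal-covariance mixture of eq. (3)
        D = x.shape[0]
        log_comp = (np.log(weights)
                    - 0.5 * D * np.log(2.0 * np.pi)
                    - 0.5 * np.sum(np.log(variances), axis=1)
                    - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
        m = log_comp.max()                    # log-sum-exp for stability
        return m + np.log(np.exp(log_comp - m).sum())

    def utterance_score(X, weights, means, variances):
        # mean log-likelihood score S_i(X) of eq. (1) over the N frames
        return np.mean([frame_loglik(x, weights, means, variances) for x in X])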


Spherical normalization is a preconditioning step employing a transformation that maps each feature vector onto the surface of a unit hypersphere embedded in a space that has one dimension more than the feature vector itself. Dot products between high-dimensional vectors may lead to an ill-conditioned Hessian, since the dynamic range of the result is large. If the vectors have unit length, then the dot product is just the cosine of the angle between them, and the result must lie in the range -1 to +1. This procedure leads to a large reduction of parameters in the score-space. The modified stereographic projection used in this paper as the spherical normalization is depicted in Fig. 1:

[Fig. 1 sketches the projection of a vector x at offset d from the origin O onto the sphere through O': \phi(x) = \frac{1}{\sqrt{x^2+d^2}}[x\;\;d]^T, so that K(x_1,x_2) = \frac{x_1\cdot x_2+d^2}{\sqrt{(x_1^2+d^2)(x_2^2+d^2)}}.]

Fig. 1. Spherical Normalization: the modified stereographic projection

Then, for the score-space kernels as in (6), the mapping applied explicitly to the score-vectors is as in (8):

\Psi(X) \rightarrow \phi(\Psi(X)) = \frac{1}{\sqrt{\Psi(X)\cdot\Psi(X)+d^2}}\,[\Psi(X)\;\;d]^T,  (8)

where \Psi(X) is the whitened score-vector of the sequence X. The spherically normalized sequence kernel becomes:

K(X_p, X_q) = \phi(\Psi(X_p)) \cdot \phi(\Psi(X_q)).  (9)

IV. FUZZY LEAST SQUARES SUPPORT VECTOR MACHINES

Support vector machines (SVMs), introduced in [12], are used to solve pattern recognition and nonlinear function estimation problems. By mapping the data into a high-dimensional space through the kernel trick, which rests on the Mercer conditions, the SVM obtains a global solution while the VC dimension is minimized according to the structural risk minimization principle. At the same time, because the capacity concept has a purely combinatorial definition, the quality and complexity of the SVM solution do not depend directly on the dimensionality of the input space. SVMs as proposed by Vapnik [12] are trained by solving a quadratic optimization problem. Least squares support vector machines (LS-SVMs), proposed by Suykens [13], are trained by solving a set of linear equations, which is better suited to the large-scale data case. In general, speaker identification is a multi-class problem, but the primary SVM and LS-SVM are formulated for two-class classification problems. An extension to multi-class problems is not unique. In this paper, we adopt the modified fuzzy least squares support vector machine (fLS-SVM) [14] based on the one-against-one multi-class support vector machine.

A. Least Squares Support Vector Machines

Given a training set of N data points \{x_k, y_k\}_{k=1}^{N}, where x_k \in R^n is the k-th input pattern and y_k \in \{1, -1\} is the k-th output pattern, the LS-SVM approach aims at constructing a classifier

f(x) = \mathrm{sign}\left(\sum_{k=1}^{N}\alpha_k y_k \Psi(x_k, x) + b\right),  (10)

where the \alpha_k are support values and b is a real constant bias. For \Psi(x,y) = \varphi(x)\cdot\varphi(y), called a Mercer kernel, one typically has the linear, polynomial, RBF and MLP kernels [12].

LS-SVM classifiers, introduced in [13], are obtained as the solution of the following optimization problem:

\min_{w,b,e}\;\zeta(w,b,e) = \frac{1}{2}w^T w + \gamma\,\frac{1}{2}\sum_{k=1}^{N}e_k^2,  (11)

subject to the equality constraints

y_k[w^T\varphi(x_k) + b] = 1 - e_k, \quad k = 1, \dots, N.  (12)

The Lagrangian then has the form

L(w,b,e;\alpha) = \zeta(w,b,e) - \sum_{k=1}^{N}\alpha_k\big(y_k(w^T\varphi(x_k)+b) - 1 + e_k\big),  (13)

where the \alpha_k are Lagrange multipliers, which can be either positive or negative.

The conditions for optimality are:

\partial L/\partial w = 0 \;\rightarrow\; w = \sum_{k=1}^{N}\alpha_k y_k \varphi(x_k)
\partial L/\partial b = 0 \;\rightarrow\; \sum_{k=1}^{N}\alpha_k y_k = 0
\partial L/\partial e_k = 0 \;\rightarrow\; \alpha_k = \gamma e_k
\partial L/\partial \alpha_k = 0 \;\rightarrow\; y_k(w^T\varphi(x_k)+b) - 1 + e_k = 0,  (14)

for k = 1, \dots, N. After eliminating the parameters w and e, (14) can be written as the linear system

\begin{bmatrix} 0 & Y^T \\ Y & ZZ^T+\gamma^{-1}I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix},  (15)

where Z = [\varphi(x_1)y_1; \dots; \varphi(x_N)y_N], Y = [y_1; \dots; y_N], \mathbf{1} = [1; \dots; 1], e = [e_1; \dots; e_N] and \alpha = [\alpha_1; \dots; \alpha_N].

The matrix in (15) is not positive definite, hence in this form the system cannot be solved directly. However, (15) is equivalent to solving (16):

\begin{bmatrix} s & 0 \\ 0 & H \end{bmatrix}\begin{bmatrix} b \\ \alpha + H^{-1}Yb \end{bmatrix} = \begin{bmatrix} Y^T H^{-1}\mathbf{1} \\ \mathbf{1} \end{bmatrix},  (16)

with s = Y^T H^{-1} Y > 0, H = \Omega + \gamma^{-1}I and \Omega = ZZ^T. In this paper, we use the RBF kernel (17):

\Psi(x_1, x_2) = \exp(-\|x_1 - x_2\|^2/\sigma^2).  (17)

The parameter \sigma of the kernel can be chosen optimally by optimizing an upper bound on the VC dimension, which involves solving a quadratic programming problem as in [12].
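A minimal numpy sketch of this training step follows; under the stated assumptions it solves the KKT system (15) directly with a generic linear solver (rather than through the transformed system (16)) using the RBF kernel (17). Names and shapes are illustrative:

    import numpy as np

    def rbf_kernel(X1, X2, sigma):
        # RBF kernel of eq. (17)
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / sigma ** 2)

    def train_ls_svm(X, y, gamma, sigma):
        # assemble and solve the linear system (15) for the bias b and supports alpha
        N = y.size
        Omega = rbf_kernel(X, X, sigma) * np.outer(y, y)   # Omega = Z Z^T
        A = np.zeros((N + 1, N + 1))
        A[0, 1:], A[1:, 0] = y, y
        A[1:, 1:] = Omega + np.eye(N) / gamma
        sol = np.linalg.solve(A, np.r_[0.0, np.ones(N)])
        return sol[0], sol[1:]                             # b, alpha

    def ls_svm_decision(x, X, y, b, alpha, sigma):
        # classifier of eq. (10)
        k = rbf_kernel(x[None, :], X, sigma)[0]
        return np.sign(np.dot(alpha * y, k) + b)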


B. Multi-class Least Squares Support Vector Machines

The formulation of the LS-SVM is based on a two-class classification problem. Since the LS-SVM determines the decision boundary directly from a linear system such as (16), its extension to the multi-class problem is not unique. Roughly four types of algorithms can handle the multi-class problem in the LS-SVM: the one-against-all algorithm, the one-against-one algorithm, the error-correcting output code (ECOC) algorithm and the all-at-once algorithm. In this paper, our methods are based on the one-against-one algorithm.

In one-against-one classification, we require a binary classifier for each possible pair of classes, so the total number of pairs is n(n-1)/2 for an n-class problem. The decision function for the pair of classes i and j is given by

F_{ij}(x) = w_{ij}^T\varphi(x) + b_{ij},  (18)

where F_{ij} = -F_{ji}. Then for the datum x we can calculate

F_i(x) = \sum_{j=1, j\neq i}^{n}\mathrm{sign}(F_{ij}(x)),  (19)

and the datum x is classified into the class

\arg\max_{i=1,\dots,n} F_i(x).  (20)

C. Fuzzy Least Squares Support Vector Machines

If only one i satisfies formula (20), the datum x is classified into class i. But if (20) is satisfied for plural i's, x is unclassifiable. Fig. 2 illustrates this case: when x is in the shaded region, F_i(x) = 1 for i = 1, 2, 3, so by (20) x is unclassifiable.

[Fig. 2 shows three pairwise decision boundaries F_{21}(x), F_{23}(x) and F_{31}(x) whose intersection leaves a shaded unclassifiable region.]

Fig. 2. Unclassifiable Region by One-against-One Classification

According to [14], a fuzzy membership function is used to avoid the unclassifiable case. We define the one-dimensional membership function m_{ij}(x) in the direction perpendicular to the optimal separating hyperplane F_{ij}(x) as follows:

m_{ij}(x) = \begin{cases} 1 & \text{for } F_{ij}(x) \geq 1, \\ F_{ij}(x) & \text{otherwise.} \end{cases}  (21)

There are two ways to define the membership function m_i(x) of x for class i. One uses the minimum operator:

m_i(x) = \min_{j=1,\dots,n} m_{ij}(x).  (22)

The other uses the average operator:

m_i(x) = \frac{1}{n-1}\sum_{j=1, j\neq i}^{n} m_{ij}(x).  (23)

The class into which the datum x is classified is then determined by

\arg\max_{i=1,\dots,n} m_i(x).  (24)

In this paper, the minimum operator is used. The unclassifiable region shown in Fig. 2 is resolved as shown in Fig. 3.

[Fig. 3 shows the same boundaries F_{21}(x), F_{23}(x) and F_{31}(x) with the former unclassifiable region partitioned among the three classes.]

Fig. 3. Resolution of the Unclassifiable Region by the Minimum Operator
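A minimal sketch of the fuzzy one-against-one decision rule of (18)-(24) with the minimum operator, assuming the pairwise decision values F_ij(x) have already been evaluated:

    import numpy as np

    def fuzzy_classify(F):
        """F is an n-by-n antisymmetric matrix with F[i, j] = F_ij(x), eq. (18).
        Returns the class of eq. (24) using the minimum operator of eq. (22)."""
        m = np.minimum(F, 1.0)            # m_ij(x) of eq. (21)
        np.fill_diagonal(m, np.inf)       # exclude j = i from the minimum
        mi = m.min(axis=1)                # eq. (22)
        return int(np.argmax(mi))         # eq. (24)

    # example with three classes, F[i, j] = -F[j, i]:
    # F = np.array([[0.0, 0.4, 1.2], [-0.4, 0.0, 0.7], [-1.2, -0.7, 0.0]])
    # fuzzy_classify(F)  ->  0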


V. EXPERIMENTS

A number of development experiments and testing approaches were carried out using the PolyVar database [16]. The PolyVar database consists of 38 client speakers, 24 male and 14 female, recorded over a telephone network. 85 utterances were recorded from each speaker in 5 sessions, with 17 utterances per session. There are also 952 impostor utterances from 56 speakers, each contributing 17 utterances in a single session.

A speaker identification system is composed of two distinct phases, a training phase and a test phase. There are four main steps in our algorithm.

The first step is feature extraction from the speaker frames. In this paper, we used two kinds of features, listed in TABLE II, to quantize a speaker: perceptual features and L-order frequency cepstral coefficient (FCC) features. From TABLE II, a (14+2L)-dimensional feature vector is obtained for every frame in a speaker utterance. The frame length is 256 samples (20 ms) with a 128-sample (10 ms) overlap between adjacent frames.

TABLE II
LIST OF EXTRACTED FEATURES

Type of Features                           | Transform | Number of features
Perceptual feature: Subband power P_j      | Wavelet   | 3
Perceptual feature: Pitch frequency f_p    | Wavelet   | 1
Perceptual feature: Brightness w_c         | Fourier   | 1
Perceptual feature: Bandwidth B            | Fourier   | 1
Frequency cepstral coefficient c_n         | Fourier   | L

In general, there are three steps in feature extraction:

1) Preprocessing: This procedure can be implemented via a pre-emphasizing filter defined by (25)-(27):

s'_n = s_n - 0.96 \times s_{n-1}, \quad n = 1, \dots, 255,  (25)

where s_n is the nth sample of the frame s and s'_0 = s_0. The pre-emphasized frame is then Hamming-windowed by

s^h_i = s'_i \times h_i, \quad i = 0, \dots, 255,  (26)

with h_i = 0.54 - 0.46 \times \cos(2\pi i/255). The pre-processed frame is detected as a nonsilent frame for feature extraction if its total power satisfies

\sum_{i=0}^{255}(s^h_i)^2 > 400^2,  (27)

where 400 is an empirical value.

2) Feature extraction from nonsilent frames: this procedure can be implemented via the wavelet transform and other general methods [3][4][6][11].

3) Pre-normalization, such as whitening and spherical normalization of the training and testing feature vectors, can be applied.

The second step is to obtain the GMM space using the EM algorithm. The third step is to project the GMM space to the score-space and apply the normalization. The last step is to train the fLS-SVM on the vectors in the score-space to obtain the multi-classification boundaries.

In order to compare the different algorithms on accuracy and complexity, four experiments were implemented as follows. 24 speakers were selected, 12 male and 12 female. For every speaker, we select four sessions with 10 utterances per session, and every utterance is 30 s long. For testing, there are also 20 impostor utterances from 5 speakers, each contributing 4 utterances in a single session. The parameters in the experiments are therefore n = 24, m = 4 and D = 14+2L, and the RBF kernel is selected with two different σ values. The identification accuracy ratio, the rejection ratio and the computational complexity are adopted as the evaluation protocol. The compared results are given in TABLE III.

TABLE III
RESULTS BASED ON SCORE-SPACE WITH DIFFERENT PARAMETERS

Algorithm                                      | (%) Correct Identification | (%) Correct Rejection | Total Complexity (s)
GMM                                            | 84.5 | 98.5 | 20
fLS-SVM (γ=0.2, RBF: σ=60)                     | 64.2 | 85.3 | 15
GMM+Score-space+fLS-SVM (γ=0.2, RBF: σ=60)     | 92.1 | 99.5 | 25
GMM+Score-space+fLS-SVM (γ=0.2, RBF: σ=30)     | 89.7 | 96.1 | 30
GMM+Score-space+SVM (C=30, RBF: σ=60)          | 87.8 | 97.5 | 35

From TABLE III we can see that the GMM+Score-space+fLS-SVM algorithm outperforms the other algorithms in the table, not only in accuracy ratio but also in computational complexity.

VI. CONCLUSIONS

GMM can achieve high accuracy under the hypothesis that the processed information is Gaussian distributed. However, speech information does not always follow a Gaussian distribution, so speaker identification based on GMM combined with a discriminative algorithm, such as the SVM, achieves better performance. Because frame-based algorithms lose much of the information between frames, the utterance-based score-space is necessary. For multi-class speaker identification, the fLS-SVM, which avoids quadratic programming, outperforms the standard SVM in speed and is also more accurate because of the fuzzy membership function. Thus, the speaker identification algorithm based on GMM+Score-space+fLS-SVM outperforms the other algorithms considered here. In future work, we will extend the score-space-based fLS-SVM to more ASA (Audio Signal Analysis) problems. Finally, the sparseness of the fLS-SVM must be addressed in our applications.

REFERENCES

[1] T. Matsui and S. Furui, A text-independent speaker recognition method robust against utterance variations, in Proc. IEEE ICASSP, pp. 377-380, 1991.
[2] N.Z. Tishby, On the application of mixture AR hidden Markov models to text-independent speaker recognition, IEEE Trans. Signal Processing, vol. 39, pp. 563-570, Mar. 1991.
[3] D.A. Reynolds, Speaker identification and verification using Gaussian mixture speaker models, Speech Commun., vol. 17, pp. 91-108, 1995.
[4] D.A. Reynolds and R.C. Rose, Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Trans. Speech Audio Process., pp. 72-83, March 1995.
[5] M. Schmidt and H. Gish, Speaker identification via support vector classifiers, in Proc. ICASSP, vol. 1, pp. 105-108, 1996.
[6] T.S. Jaakkola and D. Haussler, Exploiting generative models in discriminative classifiers, in Advances in Neural Information Processing Systems 11, MIT Press, 1998.
[7] V. Wan and W.M. Campbell, Support vector machines for speaker verification and identification, in Proc. Neural Networks for Signal Processing X, pp. 775-784, 2000.
[8] S. Fine, J. Navratil and R.A. Gopinath, A hybrid GMM/SVM approach to speaker identification, in Proc. ICASSP, vol. 1, pp. 417-420, 2001.
[9] L. Quan and S. Bengio, Hybrid generative-discriminative models for speech and speaker recognition, Tech. Rep. IDIAP-RR 02-06, IDIAP, 2002.
[10] N. Smith and M.J.F. Gales, Using SVMs and discriminative models for speech recognition, in Proc. ICASSP, vol. 1, pp. 77-80, 2002.
[11] V. Wan and S. Renals, Speaker verification using sequence discriminant support vector machines, IEEE Transactions on Speech and Audio Processing, vol. 13, no. 2, pp. 203-210, 2005.
[12] V.N. Vapnik, Statistical Learning Theory, New York: Wiley, 1998.
[13] J.A.K. Suykens and J. Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters, pp. 293-300, 1999.
[14] D. Tsujinishi and S. Abe, Fuzzy least squares support vector machines for multiclass problems, Neural Networks, pp. 785-792, 2003.
[15] R.A. Redner and H.F. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev., vol. 26, pp. 195-202, 1984.
[16] G. Chollet, J.L. Cochard, A. Constantinescu, C. Jaboulet and P. Langlais, Telephone speech databases to model inter- and intra-speaker variability, in J. Nerbonne, Ed., Linguistic Databases, pp. 117-135, 1997.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 580--585
Copyright@2007 Watam Press

A canonical integrator environment for the development of


connectionist systems
Diego Ordóñez, Carlos Dafonte, Alfonso Iglesias, Bernardino Arcay
Information and Communications Technologies Department
University of A Coruña, 15071, A Coruña, Spain
{dordonez, dafonte, alfonsoiglesias, cibarcay}@udc.es

Abstract: The present article proposes a framework for the definition and generation of artificial neural networks. The building blocks of this framework are a library and two tools that provide various ways of interacting with the library for various user profiles. This proposal solves the problem of using trained networks on the target platform where they will eventually be operative, by generating the network's source code as a component that can easily be integrated into other, more general software. At the same time, we try to detach the networks from any concrete tool and provide the information of their definition and state in the exchange format XML. This data format allows interoperability between tools and provides an adequate framework for the transfer and storage of information related to neural networks. The network architectures and learning algorithms were verified in the solution of two real problems with satisfactory results.

1 Introduction

Artificial neural networks (ANNs) belong to the computational models that are inspired by the solutions devised by nature in the course of evolution. Our work is focused on developing mechanisms that allow us to manipulate these networks and generalize their use, presenting the user with a framework that lets him focus on the networks rather than on their implementation and execution platforms.

Various environments provide the user with high-level mechanisms to accelerate the development of solutions: whereas Matlab's Neural Network Toolbox is a commercial product, SNNS is a free software tool. A common characteristic of these tools is that they do not provide satisfactory support for exporting their functionalities and using the ANNs on other platforms.

The alternative proposed here tries to solve some of the problems of the existing tools for the development of connectionist models by providing a series of features that are not easily found: generation of multiplatform code, abstraction with respect to the final execution platform, and user-friendliness.

2 Objectives and scope

Our main objective is not to compete with widely accepted ANN development tools in terms of quantity, by increasing, for instance, the number of supported network architectures or training algorithms. Rather, we have focused on solving problems that are common to many ANN-related tools, following the good practices of reusable components, application integration, platform transparency, and object orientation.

The reusable components principle was applied at various levels. On the environment level, it allows a user who has trained a network to recover it at any time without having to train it again. On the development level, it sustains the design of the environment, which is based on reusable software components.

The transparency of the platform requires the user to abstract from the development platform on which the networks are developed and from the environment in which the resulting networks will be operative. It is this way of obtaining networks that allows us to create light and efficient software components that represent them.

Object orientation plays a fundamental role in the work methodology proposed for the development of this environment. We applied architectonic patterns [8] in order to guarantee the environment's robustness, minimize errors, and provide high-quality documentation. We also promoted the incremental development of the software by means of mechanisms that simplify the design of new components as much as possible.

2.1 The basic principles

The most distinctive functionalities of this environment rest on the principle of a canonical format for network representation. This format is based on XML schemas that provide the correct way of representing the networks and of checking whether a given definition is valid.

This format gives us the freedom to operate the networks in various environments and has two logical advantages: it facilitates the integration of applications through a common language that allows their communication, and it is adequate to represent the information and provides us with a robust framework to solve problems related to the storage of networks in databases or their transmission over the network.
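As an illustration of such a canonical representation, the following sketch parses a small network definition with Python's standard XML library; the element and attribute names are hypothetical and do not reproduce the actual schema used by the environment:

    import xml.etree.ElementTree as ET

    # Hypothetical canonical definition; tags and attributes are
    # illustrative only, not the real schema.
    DOC = """
    <network type="FeedForward">
      <layer size="8" activation="Sigmoid"/>
      <layer size="5" activation="HyperbolicTangent"/>
      <layer size="1" activation="Linear"/>
    </network>
    """

    root = ET.fromstring(DOC)
    print("architecture:", root.get("type"))
    for i, layer in enumerate(root.findall("layer")):
        # any tool with an XML parser can recover the same structure
        print(f"layer {i}: size={layer.get('size')}, "
              f"activation={layer.get('activation')}")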


Complex internal operations such as code generation, which require network transformations, are based on XSLT style sheets ([3] and [11]) and on pre-existing libraries that allow these transformations.

The software components are developed in JAVA, but we represent the network instances in XML [2] because this allows us to integrate the tool with other software platforms. XML is presented as an internal functioning mechanism that provides flexibility for the implementation of tools, but it is only one of the many network views that are available to the user. We do not merely wish to represent the data; we also provide concrete implementations by extending the functionalities to diverse output platforms, as can be seen in section 2.2.

2.2 Generation of multilanguage and multiplatform code

One of the distinctive features of this tool is code generation. In our proposal, code generation does not mean transferring the network content to a file in a given format and then providing a library in charge of parsing this file and loading the network into memory. We propose something quite different: a library-style file that is apt for one very specific network and can be imported into other software as a component. This alternative has three direct consequences.

1. Transparency of the exploitation platform. We can manipulate the networks without having to worry about where they will be used.

2. Reusability. One network can be used in various contexts to solve the same problem, regardless of the language in which it is coded.

3. We provide the source code. This is one of the key factors of the environment and allows the code to be compiled for a specific destination platform. If we generated precompilations, we would fix the destination platform on which the code can be used and limit the efficiency of the programme. A specifically compiled code is always more efficient for an architecture.

A fourth, indirect consequence is that the code generation was designed in a way that allows its functionality, as reusable component software, to be easily integrated into other network-related tools.

We provide XSLT style sheets to transform the networks to ANSI C; in this environment we can also load networks from their XML representation and use them in JAVA programmes. The initial choice of transforming networks to ANSI C was not accidental: we intended to provide an alternative that can be used on almost any platform, from a PC with any type of operating system to microprocessors embedded in, for instance, a washing machine. ANSI C compilers exist for virtually all hardware platforms.

2.3 Applications integration and unification

The use of a canonical format to represent networks in XML offers a series of benefits to both users and developers. Considering that there are XML parsers for almost any language or platform, and that it is an adequate format to transfer information, we have a good alternative for the communication (integration) of applications. The transmitter and the receiver are applications, the XML documents are the message, and the channel can take any shape, from a file that stores the XML, over an object in memory that represents it (XMLBean), to a remote call over the network.

We can imagine a set of tools that collaborate to offer solutions: one tool may be adequate for the design of network architectures, another may be efficient in training networks. This work proposes the necessary mechanisms for these tools to communicate and interoperate in order to solve the problem and generate, as quickly and simply as possible, the best final product (the trained network).

Another consequence is the minimization of the tools' learning curves, which fosters competitiveness. A tool that appears on the market and tries to compete with the existing tools may be very good, but the user still needs to learn how to use it. However, if this new tool is especially apt for a specific functionality, the user can continue using his familiar tool for common tasks and use the new tool for that specific functionality. This may help to reduce the reticence of users towards novelties.

The existing tools that use proprietary formats do not have to radically change their modus operandi in order to adopt this format; they only have to provide adequate XML converters, which is considerably less complicated than having to operate directly with the new format.
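A minimal sketch of the XSLT-driven generation described in section 2.2 follows, under stated assumptions: the file names and the stylesheet are hypothetical (the stylesheet itself is not reproduced here), and the third-party lxml package supplies the XSLT processor:

    # Sketch of XSLT-driven code generation, assuming a network definition
    # 'network.xml' and a hypothetical stylesheet 'to_ansi_c.xslt' that
    # emits the C translation of the network.
    from lxml import etree

    network = etree.parse("network.xml")
    transform = etree.XSLT(etree.parse("to_ansi_c.xslt"))
    c_source = str(transform(network))     # generated ANSI C source text

    with open("network.c", "w") as out:
        out.write(c_source)                # compile with any ANSI C compiler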


3 Components

The kernel of the system is formed by the library and two tools that are mentioned in section 2 and shown in Figure 1.

Figure 1: Developed components

3.1 eXtensible Object Oriented Artificial Neural Networks Engine (XOANE)

The main component of the system is the XOANE library, which is the result of factorizing the training architectures and algorithms and other ANN-related entities. Their definition marks the limit between what can and cannot be done with the rest of the developed components, which only provide the interface for handling the library.

The current version of the library supports various entities. In what follows, we show an inventory of some of these entities according to their category, and a brief explanation of why they were selected:

• Network Architectures [7]: We tried to select a significant subset of architectures: Feed Forward (with proven applicability to the resolution of real problems), SOM (distinctly representative of competitive architectures), Cascade Correlation [6] (a network architecture of the incremental type), CPN (a typical example of a composed network), and Hopfield (for its behaviour as a self-associative memory).

• Training algorithms [7]: Backpropagation (online and batch), Kohonen, Fahlman [6], Hopfield, and Counterpropagation. These algorithms were chosen according to the selected network architectures and are used to train them.

• Activation Functions: Linear, Sigmoid, Hyperbolic Tangent, Hardlimit, and their derivatives where they exist. In our opinion, these are the most frequently applied activation functions.

We have incorporated the capability of ANN composition, which is a relevant functionality. The composition propagates the output of a network towards the inputs of another network with which it connects. The output of the composed network is the result of propagating the inputs throughout all the intermediate ANNs. The output network of the composition is the network whose output is not connected to the inputs of any other network.

Among the innovative features are the neighbourhood functions, which are adapted to the problem for the Kohonen training. We can define high-level neighbourhood functions for the training (cylindrical, spherical, linear, rectangular, etc), which means that instead of indicating connectivities, we only need to specify the shape towards which the user wants the forms to tend.

3.2 Web Application

The functionality for handling ANN-related entities is exposed by the web application and contained in the functionality of the interpreter, which in turn is contained in the potential functionality of the XOANE library. However, there are differences with regard to visualization and information management:

• Focus based on forms
• More detailed information on the training
• Better oriented error messages
• Persistency of the information (patterns management, ANNs, etc)

The focus is similar to what is offered by a standalone application, but with the advantage that it does not need to be installed. A user who possesses an account through which he can access the application has the complete functionality within his reach. The ANNs that belong to a given user are available anywhere. The application provides the user with a way to organize his information: pattern and ANN repositories, the possibility to save intermediate networks during the training and retain the one that generalizes best, the possibility to concatenate trainings for a given instance of a network if we do not agree with the performed training, etc.

Figure 2: Content of a page from the Web application

We have implemented the necessary mechanisms to guarantee the storage of the information in a database and to recover it upon request. This feature also provides the user with more capacity to manage the entities; capacities that manifest themselves in the shape of pattern and ANN repositories that can be searched and retrieved at any given moment.

The Web application was elaborated according to the architectonic design patterns MVC (Model-View-Controller) and Layers. The application view was generated with JSP pages, and the controller was developed with Struts (http://struts.apache.org). The application is packed in a standard WAR file and deployed in the Web J2EE container Tomcat (http://tomcat.apache.org). The information is saved in a database that can be accessed via JDBC; the database manager used is PostgreSQL (http://postgresql.org).


3.3 Ad Hoc Language and Interpreter

The grammar was defined for a declarative language that allows us to define the architectures, specify the training parameters, and perform the ANN-related operations.

The idea behind this alternative is to equip the user with more expressive capacity, an alternative that allows him to carry out more complex operations according to his knowledge. It is an alternative to applications of the WYSIWYG (What You See Is What You Get) type, such as the Web application presented in section 3.2.

Figure 3: Declaration example of the training of a SOM network with the Kohonen algorithm

The code of figure 3 declares the training of a self-organized map with the Kohonen algorithm. The parameter of the exposed algorithm is a decline function of the learning rate of the Gaussian type. It represents a bidimensional Gaussian function that models how the learning rate decreases according to the time (steps) and the distance of the winning process element from its neighbours. Another important feature is how we define the neighbourhood relations between the process elements. We specify a form, not connectivities. In the example, we have specified that during the training the process elements of the map influence each other mutually as if they were placed on a cylinder. The patterns are declared by specifying the way in which they will be recovered: in this case, by declaring ANNDFFile we say that they can be found in a file and that the parameter indicates the route.

The source programme can be reused in subsequent testing, or it can be used in other developments as a template. The result of the programme is code in some output language that represents some of the instances declared in the programme. The tool consists of the compiler that interprets this programme and returns the source code for the output platform.

The interpreter provides the functionality of separating the definition of the network from the implementation, thanks to the declarative language. This abstraction allows the user to focus on the ANNs instead of on how to implement them.

The interpreter is the tool that processes the declarative language. It was elaborated on the basis of the parsing tool Javacc (https://javacc.dev.java.net/). The parsing was approached with a lazy evaluation strategy, which, applied to this case, means analyzing everything, checking that everything can be done, and, if all is correct, carrying out the pertinent operations. On many occasions it may be irrelevant whether we opt for a lazy or an eager [1] strategy, but in this case it is not. If we have two training instructions, the system spends a considerable amount of time evaluating and processing the first instruction before it continues with the second one. If there is a logical error that is not related to the parsing, the interpreter ends with errors, and much time will have gone by before the error is notified. The proposed strategy, on the contrary, notifies errors before processing any instructions.

4 Results

The functioning of this tool and the capacities of the implemented algorithms were tested with two real and different problems that allow us to compare our results with those of similar works. The first problem is a classical case in our field, whereas the second one originates in the field of astrophysics and is currently being studied by our research group [5] and [10]. The test cases are the following:

• Determination of the age of abalones.
• Stellar classification in the Morgan Keenan scale.

4.1 Abalone age prediction

The purpose of this case is to predict the age of abalones from physical measurements. The age of abalones is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope. It is a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age.

The patterns that we use provide a series of measurements to extract the number of growth rings of the abalones and deduce their age. The number of rings plus 1.5 indicates the age in years.

The data comes from an original (non-machine-learning) study [12]. The pattern set consists of 4177 examples with 8 attributes each and the number of rings of each abalone as output. Although we have a significant number of examples, the set is badly distributed because it contains considerably more abalones of intermediate ages than young or old abalones. The complete set was divided into 80% (3341) for training and 20% (836) for testing.
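As a small illustration of the tolerance-based evaluation protocol described next (see Table 2), the following sketch computes the success percentage; the array names are hypothetical:

    import numpy as np

    def tolerance_success(y_true, y_pred, t):
        # a prediction y is accepted when (x + t) > y and (x - t) < y,
        # i.e. it lies strictly inside the tolerance band around the true age x
        ok = (y_true + t > y_pred) & (y_true - t < y_pred)
        return 100.0 * ok.mean()

    # hypothetical true and predicted ring counts for the 836 test cases:
    # for t in range(1, 6):
    #     print(t, tolerance_success(rings_test, rings_hat, t))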


Table 1: Input parameters that determine the age of an abalone

Name           | Data Type  | Meas
Sex            | nominal    | -
Whole weight   | continuous | grams
Length         | continuous | mm
Shucked weight | continuous | grams
Diameter       | continuous | mm
Viscera weight | continuous | grams
Height         | continuous | mm
Shell weight   | continuous | grams

The tests carried out with Feed Forward networks obtained a success rate that varied according to the tolerance, as can be seen in table 2. These results can be compared to previous studies such as [13] and [4]. The tolerance is the maximum distance from the correct value that is still considered valid. If the correct value is x and the network output is y, then, applying tolerance t, y is correct if ((x + t) > y) ∧ ((x − t) < y).

Table 2: Success percentages on the test set examples (ages of the abalones)

Tolerance | Backprop
1         | 26.31
2         | 68.42
3         | 85.76
4         | 92.46
5         | 96.17

4.2 Morgan Keenan classification

In astronomy, stellar classification is initially based on the temperature of star surfaces and their associated spectral features. The information obtained with a spectroscope can be classified by locating and measuring the absorption and emission lines, the spectral energy, and the molecular bands (which is often a complex process). Particular absorption lines can be observed only for a certain range of temperatures [9].

The Morgan Keenan spectral classification is among the most common classifications. The classes are ordered from hot to cold as follows: O, B, A, F, G, K, M.

We train patterns with a high dimensionality. The inputs of the data set have a range of 659, which is the number of spectral points provided by the spectroscope and the input that feeds the network. The outputs are binary; there is one output for each of the seven classes that we wish to recognize.

The best results were obtained with Feed Forward networks and the error backpropagation algorithm. At best we were able to classify correctly 92% of the test set examples, a good percentage when compared to previous studies that applied diverse techniques such as [5] and [10]. We also obtained acceptable results with other architectures and algorithms: the counterpropagation networks allowed us to correctly classify 80% of the test examples.

5 Conclusions

Whereas the library provides what is needed to define the ANNs and handle them, and the tools abstract the implementation platform and the network destination platform, the user can focus exclusively on the development of the networks.

The framework was entirely developed in Java and can therefore only be used on platforms for which a Java virtual machine exists.

One of the most noticeable features of the developed environment is the possibility to derive from an instance the output code of a network. The design provides for the addition of new destination languages, as well as for the extension of other features such as network architectures, training algorithms, etc. Therefore, and following the guidelines of the design, we can incrementally develop components that will subsequently be added to the rest of the software. Another advantage is that the generated compact code can be used in systems with reduced computational capacities. For instance, in the case of the network that solves the abalone problem, the compiled and linked code occupies approximately 35KB (gcc compiler version 4.1.0, default options).

Grouping the functionalities of the ANNs into a library will allow the future development of new tools based on this library without having to re-implement the same features. This implies a smaller error margin and an important decrease in development time.

One of the most powerful features of this proposal is the XML representation of the networks, which provides interoperability between tools and comfortable information transfer and storage.

The developed tool was successfully used to solve problems with diverse features and difficulties, as can be observed in section 4.

References

[1] G. Brassard and P. Bratley, Fundamentos de Algoritmia, Prentice Hall (1997).
[2] T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler, F. Yergeau and J. Cowan, Extensible Markup Language (XML), W3C Recommendation (2006).
[3] J. Clark, XSL Transformations (XSLT), W3C Recommendation (1999).


[4] D. Clark, Z. Schreter, A. Adams, A Quantitative Comparison of Dystal and Backpropagation, Australian Conference on Neural Networks (ACNN'96).
[5] C. Dafonte, A. Rodriguez, B. Arcay, I. Carricajo and M. Manteiga, A comparative study of KBS, ANN and Statistical Clustering techniques for unattended stellar classification, Lecture Notes in Computer Science (LNCS) (2005), Vol 3773, pp. 566-577, Springer Verlag.
[6] S. Fahlman and C. Lebiere, The cascade-correlation architecture, in Touretzky, D.S., editor, Advances in Neural Information Processing Systems II (1990), pages 524-532, Morgan-Kaufmann.
[7] J.A. Freeman, D.M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley, 1991.
[8] E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley Professional Computing Series (2000).
[9] W.W. Morgan, P.C. Keenan, E. Kellman, An atlas of stellar spectra with an outline of spectral classification, University of Chicago Press (1943).
[10] A. Rodriguez, B. Arcay, C. Dafonte, M. Manteiga and I. Carricajo, An automated knowledge-based analysis and classification of stellar spectra using fuzzy reasoning, Expert Systems with Applications, vol. 27(2), pp. 237-244, Elsevier Science, 2004.
[11] The Apache XALAN Project, http://xalan.apache.org
[12] J.N. Warwick, T.L. Sellers, S.R. Talbot, A.J. Cawthorn and W.B. Ford, The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait, Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288), 1994.
[13] S. Waugh, Extending and benchmarking Cascade-Correlation, PhD thesis, Computer Science Department, University of Tasmania, 1995.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 586--590
Copyright@2007 Watam Press

Stock Price Forecasting Using a Recurrent Fuzzy


Neural Network
Menghua Tong
Department of Quantitative Economics, Dongbei University of Finance & Economics, tongmenghua@yahoo.com.cn

Qizhi Zhang
Department of Computer Science and Automation, Beijing Institute of Machinery, zqzbim@yahoo.com.cn

Woonseng Gan
School of EEE, Nanyang Technological University, Singapore

Abstract: Stock price forecasting using a Recurrent Fuzzy Neural Network (RFNN) is considered. The proposed RFNN is a feed-forward fuzzy neural network (NN) with local full feedback connections that are used to construct dynamic fuzzy rules. An online dynamic back-propagation learning algorithm based on the error gradient descent method is proposed, and its convergence is proven. Because the RFNN can capture the dynamic behavior of a system through the feedback links, the exact lag of the input variables need not be known in advance. Only one input node is used, and the drawback of input node selection in feed-forward neural networks is overcome. The proposed RFNN is applied to stock price forecasting. The simulation results show that the proposed algorithm is effective for stock price forecasting.

I. INTRODUCTION

Application of neural networks to financial market prediction is an interesting area attracting much research effort [1-6]. The most common form of financial forecasting system is the feedforward neural network (NN) using the gradient-descent-based back-propagation (BP) algorithm [1]. When used in financial forecasting, the NN is usually trained with thousands of input data and retrained many times as it is used, which can take several hours or even weeks [1]. A local approximation NN, known as the radial basis function (RBF) network, can be introduced to improve convergence performance [2]. Training set selection and optimal partition algorithms have been studied to improve the forecasting ability [2-3]. The NN topology must be selected first, and the performance varies for different NN structures [1]. A feedforward NN is a static mapping. With tapped delays, a feed-forward NN can be used to represent a dynamic mapping, but a large number of neurons are required to represent dynamic responses in the time domain. On the other hand, a recurrent NN (RNN) may be used to deal with time-varying inputs or outputs through its natural temporal operation itself. Thus, an RNN is a dynamic mapping and is better suited for a dynamic system than a feed-forward NN. A recurrent adaptive fuzzy filter has been proposed to resolve speech processing problems involving noise [7]. Good performance has been obtained, and the exact order of the inputs need not be known.

The training process is usually done with the training data off-line, and then the NN is used to forecast future values, such as stock prices, based on the past history of stock prices. The NN can be retrained when new data comes. The ability of the NN to be retrained on-line in the process of its usage is very useful in applications of NNs to financial analysis and prediction [1]. The focus of this study is to provide an RFNN model for stock price forecasting, and the RFNN model can be trained on-line.

This paper is organized as follows. Section 2 describes the architecture of the recurrent fuzzy NN (RFNN) and its functions. Section 3 presents the learning algorithm and its convergence for the RFNN. The stock price forecasting simulation results using the proposed RFNN are given in Section 4. The conclusions are given in Section 5.

II. STRUCTURES OF RFNN

A diagonal RFNN (DRFNN) structure is shown in Fig. 1. The system has five layers as proposed in Ref. [8]. A model with two inputs and a single output is considered here for convenience. The nodes in Layer 1 are input nodes that directly transmit the input signals to the next layer. Layer 5 is the output layer.

[Fig. 1 depicts the five-layered network for inputs x(k) and x(k-1) and output u(k): term nodes (G), rule nodes (R) with recurrent weights V, normalization nodes (N), and output weights W.]

Fig. 1. Structure of five-layered DRFNN

The nodes in Layer 2 are "term nodes" (G), and they act as membership functions expressing the input fuzzy linguistic variables. A Gaussian function is used for the membership function, in which the mean value is m and the variance is σ. The fuzzy sets of the first and the second input variables consist of n1 and n2 linguistic terms, respectively. Each node in Layer 3 is called a "rule node" (R) and represents a single fuzzy rule.


A diagonal feedback connection is introduced to give the feed-forward fuzzy NN a temporal processing capability. In total, there are n1×n2 nodes in Layer 3, forming a fuzzy rule base for two linguistic input variables. The nodes in Layer 4 (N) perform the normalization of the firing strengths from Layer 3, and the input links are fully connected. The normalization of the firing strengths is helpful in improving the convergence performance of the linear adaptive process. The number of nodes in this layer is equal to that of the nodes in Layer 3. In the following descriptions, the symbol v_i^{(k)} denotes the ith input of a node in the kth layer, and the symbol a^{(k)} denotes the output of a node in the kth layer.

To provide a clear understanding of the RFNN, the functions of Layer 1 to Layer 5 are defined as follows:

Layer 1: The nodes in this layer only transmit the input values to the nodes of the next layer directly,

a_i^{(1)}(k) = v_i^{(1)}(k).  (1)

Layer 2: The nodes in this layer represent Gaussian membership functions. The functions of the nodes are defined as

a_j^{(2)}(k) = \exp\left\{-\frac{(v_i^{(2)}(k)-m_{ij})^2}{\sigma_{ij}^2}\right\},  (2)

where m_{ij} and \sigma_{ij} are the mean and the width of the Gaussian membership function of the jth term of the ith input variable, x(i), respectively.

Layer 3: The nodes in this layer are rule nodes, and a diagonal recurrent architecture is selected. The rule nodes perform a fuzzy AND operation (or product inference) to calculate the firing strength,

a_i^{(3)}(k) = \prod_j v_j^{(3)}(k)\,V_i\,a_i^{(3)}(k-1),  (3)

where V_i and a_i^{(3)}(k-1) are the recurrent link weight and the output at the previous step of the ith node in Layer 3, respectively.

Layer 4: The nodes in Layer 4 perform the normalization of the firing strengths from Layer 3,

a_i^{(4)}(k) = \frac{v_i^{(4)}(k)}{\sum_j v_j^{(4)}(k)}.  (4)

Layer 5: This layer is the output layer. The link weights in this layer represent the singleton constituents (W_i) of the output variable. The output node integrates all the normalized firing strengths from Layer 4 with the corresponding singleton constituents and acts as a defuzzifier,

u(k) = a^{(5)}(k) = \sum_i v_i^{(5)}(k)\,W_i.  (5)

Remark 1: The architecture of the DRFNN shown in Fig. 1 possesses the advantage of a simple structure with dynamic characteristics. The purpose of the recurrence is to retain the past firing strength of the corresponding rule in Layer 3. Because the feedback terms contain the firing history of the rules, the recurrent fuzzy network has dynamic characteristics [8]. The fuzzy NN is a local approximation model, and most firing strengths of the rules in Layer 3 are zero (or near zero) for an arbitrary input. Thus, once the output of a node in Layer 3 is zero, it will remain zero forever in the DRFNN, since the output in Layer 3 is

a_i^{(3)}(k) = S(k)V_i a_i^{(3)}(k-1) = S(k)S(k-1)\cdots S(0)\,V_i^{k}, \quad S(k) = \prod_j v_j^{(3)}.

Therefore, a DRFNN is difficult to use in financial forecasting.

Several techniques can be used to improve the performance of the DRFNN. First, as in Ref. [7], a global membership function, f(x) = 1/(1+e^{-x}), can be used with the feedback term node. The firing strength of a rule term in Layer 3 can then take a nonzero value, even if it was zero in the previous iteration. The fully connected recurrent NN has interlinked weights, and it can capture more complex dynamic systems. A fully connected RFNN is shown in Fig. 2. Only the rule layer is illustrated; "sum" and "z^{-1}" denote summation and a one-sample delay, respectively.

[Fig. 2 shows the rule nodes (R) of Layer 3 with interconnected feedback weights V_{ij}, each passing its summed, one-sample-delayed inputs (sum & z^{-1}) through f(x).]

Fig. 2. Fully connected RFNN

The function of Layer 3 can then be defined as

a_i^{(3)}(k) = S_i(k)\,f(net_i(k)), \quad S_i(k) = \prod_j v_j^{(3)}(k), \quad net_i(k) = \sum_j V_{ij}\,a_j^{(3)}(k-1).  (6)

In the next section, we shall discuss the learning algorithm and its convergence for the RFNN.
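Before turning to learning, here is a minimal numpy sketch of one forward pass through (1)-(6) for the single-input case used later in the simulations, where each rule is driven by one Gaussian term node; all names are illustrative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def rfnn_forward(x, m, s, V, W, a3_prev):
        """One pass through the five layers of eqs. (1)-(6); x is the scalar
        input, m/s the term-node means and widths, V the recurrent weight
        matrix, W the singleton consequents, a3_prev the previous Layer-3
        outputs a_j(k-1)."""
        mu = np.exp(-((x - m) ** 2) / s ** 2)   # Layer 2, eq. (2)
        S = mu                                  # rule firing strengths S_i(k)
        net = V @ a3_prev                       # recurrent drive, eq. (6)
        a3 = S * sigmoid(net)                   # Layer 3 outputs
        a4 = a3 / a3.sum()                      # Layer 4 normalization, eq. (4)
        u = a4 @ W                              # Layer 5 defuzzifier, eq. (5)
        return u, a3, a4, S, net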


III. LEARNING ALGORITHM AND CONVERGENCE

Generally, the learning algorithm of an RFNN consists of two major components:
(1) Input/output space partitioning and construction of the fuzzy rules.
(2) Identification of the parameters.

In this paper, it is presumed that the input space is partitioned using a priori knowledge, and the gradient descent method is used to adjust the parameters of the RFNN. The RFNN is a nonlinear tap-delay filter, and the input of the RFNN is x(k). Only the singleton constituents of the output variable and the recurrent weights are adaptively adjusted while the system is running. The rule of adaptive learning can be obtained using the gradient descent technique. The gradients with respect to the weight vectors W and V can be computed using the chain rule, starting from the cost

J(k) = \frac{1}{2}e^2(k) = \frac{1}{2}[x(k+p) - u(k)]^2,  (7)

where p is the prediction level. The unknown parameters can be adjusted according to the gradient descent method,

W(k+1) = W(k) - \mu\,\frac{\partial J(k)}{\partial W(k)} = W(k) + \mu\,e(k)\,\frac{\partial u(k)}{\partial W(k)},
V(k+1) = V(k) - \mu\,\frac{\partial J(k)}{\partial V(k)} = V(k) + \mu\,e(k)\,\frac{\partial u(k)}{\partial V(k)},  (8)

with the required gradients

\frac{\partial u(k)}{\partial W_i} = v_i^{(5)}(k) = a_i^{(4)}(k),
\frac{\partial u(k)}{\partial V_{ij}} = \left\{[W_i - u(k)]\Big/\sum_l a_l^{(3)}(k)\right\}\frac{\partial a_i^{(3)}(k)}{\partial V_{ij}}.  (9)

From Eq. (6), the gradient with respect to the recurrent link weight of the RFNN is found as

\frac{\partial a_i^{(3)}(k)}{\partial V_{ij}} = S_i(k)\,f'(net_i(k))\left\{a_j^{(3)}(k-1) + V_{ii}\,\frac{\partial a_i^{(3)}(k-1)}{\partial V_{ij}}\right\}.  (10)

The gradient with respect to the recurrent link weight is a dynamic equation. Using symbols similar to those in Ref. [9], the gradient for the RFNN in Eq. (9) is given by

\frac{\partial u(k)}{\partial V_{ij}} = B_i(k)\,P_{ij}(k),  (11)

where B_i(k) = [W_i - u(k)]/\sum_j a_j^{(3)}(k) and P_{ij}(k) = \partial a_i^{(3)}(k)/\partial V_{ij}. P_{ij}(k) satisfies

P_{ij}(k) = S_i(k)\,f'(net_i(k))\left\{a_j^{(3)}(k-1) + V_{ii}\,P_{ij}(k-1)\right\}, \quad P_{ij}(0) = 0.  (12)

Eq. (12) is a dynamic recursive equation for the gradient, and it can be solved recursively from the given initial conditions. Eq. (8) is used for p = 1; if p > 1, x(k+p) cannot be obtained, and we can use the delayed errors e(k-p) to adjust the weights of the RFNN on-line.

An RFNN-based forecasting system uses an error gradient descent algorithm to adjust the weight vector of the NN. As in Ref. [9], a discrete-type Lyapunov function can be given by

V(k) = \frac{1}{2}e^2(k).  (13)

During the training process, the change in the Lyapunov function is

\Delta V(k) = V(k+1) - V(k) = \frac{1}{2}[e^2(k+1) - e^2(k)].  (14)

The error difference resulting from the learning can be represented by

e(k+1) = e(k) + \Delta e(k) = e(k) + \left[\frac{\partial e(k)}{\partial W(k)}\right]^T\Delta W(k).  (15)

According to the update rule of the weights, we obtain

\Delta W(k) = \mu\,e(k)\,\frac{\partial u(k)}{\partial W(k)} = \mu\,e(k)\,A(k),  (16)

where A(k) = \partial u(k)/\partial W is the gradient matrix with respect to the general weight vector. A general convergence theorem can be presented as follows.

Theorem 1: Let \mu be the learning rate for the general weights of the NN. We define g_{max} = \max_k\|A(k)\|, where \|\cdot\| is the usual Euclidean norm of a matrix or a vector. If the learning rate \mu is chosen as 0 < \mu < 2/(g_{max})^2, then the local convergence of the NN is guaranteed.

Proof: \Delta V(k) can be represented as

\Delta V(k) = \Delta e(k)\,[2e(k) + \Delta e(k)]/2
= \frac{1}{2}\left[\frac{\partial e(k)}{\partial W(k)}\right]^T\mu\,e(k)A(k)\left\{2e(k) + \left[\frac{\partial e(k)}{\partial W(k)}\right]^T\mu\,e(k)A(k)\right\}  (17)
= -\frac{1}{2}\mu\,e(k)\|A(k)\|^2\left\{2e(k) - \mu\,e(k)\|A(k)\|^2\right\}
= -\frac{1}{2}\mu\,e^2(k)\|A(k)\|^2\left\{2 - \mu\|A(k)\|^2\right\} = -\lambda e^2(k).

Because \|A(k)\| \leq g_{max}, if the learning rate \mu is chosen as 0 < \mu < 2/(g_{max})^2, then 0 < \mu < 2/\|A(k)\|^2, which implies that \lambda = \frac{1}{2}\mu\|A(k)\|^2\{2 - \mu\|A(k)\|^2\} > 0 and \Delta V(k) < 0. Therefore, the NN system is locally convergent.
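A compact sketch of one online update combining (8)-(12) and (16) for the single-input, fully connected case follows, with the learning rates assumed to satisfy the bounds of Theorem 2 below; it pairs with the rfnn_forward sketch above, and all names are illustrative:

    import numpy as np

    def rfnn_update(x_next, u, a3, a3_prev, a4, S, net, V, W, P, mu_w, mu_v):
        """One online gradient step following eqs. (8)-(12) and (16);
        P holds the recursive gradients P_ij(k-1) of eq. (12)."""
        e = x_next - u                        # prediction error, eq. (7)
        B = (W - u) / a3.sum()                # B_i(k) of eq. (11)
        W = W + mu_w * e * a4                 # du/dW_i = a_i^(4), eq. (9)
        f = 1.0 / (1.0 + np.exp(-net))        # f(net_i)
        fprime = f * (1.0 - f)                # f'(net_i)
        # P_ij(k) = S_i f'(net_i) (a_j(k-1) + V_ii P_ij(k-1)), eq. (12)
        P = (S * fprime)[:, None] * (a3_prev[None, :] + np.diag(V)[:, None] * P)
        V = V + mu_v * e * B[:, None] * P     # dV_ij via du/dV_ij = B_i P_ij
        return V, W, P, e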


The general convergence theorem can be used to find the specific convergence criterion for the RFNN.

Theorem 2: Let \mu_W and \mu_V be the learning rates for the feed-forward weight vector and the recurrent weight vector of the RFNN, respectively. The dynamic back-propagation algorithm converges if the recurrent weights satisfy |V_{ii}| < 1 and the learning rates are chosen as

0 < \mu_W < 2,  (18a)
0 < \mu_V < S_{min}^2/[2\,n_r^2\,W_{max}^2],  (18b)

where n_r is the number of rule nodes in Layer 3 and W_{max} = \max_k\|W(k)\|. W(k) is the link weight vector between Layers 4 and 5, S_{min} = \min_k[Sum(k)], and Sum(k) = \sum_j v_j^{(4)}(k).

Proof: (a) From Eq. (9), A(k) = [a_1^{(4)}(k)\;a_2^{(4)}(k)\;\dots\;a_{n_r}^{(4)}(k)]^T is the output vector of Layer 4. Because a_i^{(4)}(k) \geq 0 and \sum_i a_i^{(4)}(k) = 1, we obtain

\|A(k)\|^2 = \sum_i [a_i^{(4)}(k)]^2 \leq 1.  (19)

Hence, from Theorem 1, Eq. (18a) follows.

(b) From Eq. (12), P_{ij}(k) = S_i(k)f'(k)\{a_j^{(3)}(k-1) + V_{ii}P_{ij}(k-1)\}, where f'(k) = f'(net_i(k)). From Eqs. (2) and (6), we obtain S_i(k) \leq 1 and a_i^{(3)}(k) = S_i(k)f(net_i(k)) \leq 1. Because 0 < f'(net_i(k)) < 0.5 and |V_{ii}| < 1, the recursion can be estimated as

|P_{ij}(k)| \leq |S_i(k)f'(k)|\left\{|a_j^{(3)}(k-1)| + |V_{ii}||P_{ij}(k-1)|\right\} \leq 0.5 + 0.5|P_{ij}(k-1)|.  (20)

Using Eq. (20) recurrently and considering the fact that P_{ij}(0) = 0, it follows that

|P_{ij}(k)| \leq 0.5 + 0.5^2 + \dots + 0.5^{k-1} + 0.5^k|P_{ij}(0)| = \sum_{t=1}^{k}0.5^t \leq \sum_{t=1}^{\infty}0.5^t = 1.  (21)

Denote A_I(k) = \{A_{ij}(k)\}_{n_r\times n_r}, with A_{ij}(k) = \partial u(k)/\partial V_{ij}. From Eq. (11), we obtain

|A_{ij}(k)| = |B_i(k)P_{ij}(k)| \leq |W_i - u(k)|\Big/\sum_j a_j^{(3)}(k) \leq \{|W_i| + |u(k)|\}/S_{min}.  (22)

From Eqs. (4) and (5) and the condition |W_i| \leq W_{max}, we obtain

|u(k)| \leq \sum_i v_i^{(5)}|W_i| \leq W_{max}\sum_i v_i^{(5)} = W_{max}\sum_i v_i^{(4)}\Big/\sum_j v_j^{(4)} = W_{max}.  (23)

Thus |A_{ij}(k)| \leq \{|W_i| + |u(k)|\}/S_{min} \leq 2W_{max}/S_{min}, and hence, summing over the n_r \times n_r entries,

\|A_I(k)\| \leq 2\,n_r\,W_{max}/S_{min}.  (24)

Hence, from Theorem 1 and Eq. (24), Eq. (18b) follows.

IV. SIMULATION EXAMPLES

Some simulations are presented to illustrate the online forecasting performance of the RFNN. There is only one input node and one output node in the RFNN. Given the input data set, the means and variances of the Gaussian functions can be estimated using a clustering algorithm. Because only one input node is selected in the simulation, the input space is uniformly partitioned into eight fuzzy sets, and the means and widths of the Gaussian membership functions are selected as [10]:

m = [-0.65, -5/8, -3/8, -1/8, 1/8, 3/8, 5/8, 0.65],
σ = [-20, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 20].

Only the weight vectors W and V are adjusted online; the means and widths of the Gaussian membership functions remain fixed while the system is running. Experiments are conducted with the Shanghai stock index (from 2005-1-4 to 2005-11-30) and a stock price (No. 600001, from 1998-1-22 to 2004-12-31). Fig. 3 and Fig. 4 show the forecasting results of the RFNN for the daily stock index (p=1) and the daily stock price (p=1). It can be seen that the RFNN is able to predict the values of the stock index and the stock price. To reduce the noise in the daily data, Moving Average (MA) techniques are applied to the daily stock prices. Fig. 5 and Fig. 6 show the forecasting results of the RFNN for MA(5) (p=1 and p=5), and the forecasting results of the RFNN for MA(20) (p=1 and p=5) are presented in Fig. 7 and Fig. 8. It can be seen that the Moving Average (MA) techniques are able to improve the forecasting performance.

[Figure: normalized price vs. trading days; RFNN predictions against real values.]
Fig. 3. The predictions of RFNN for daily stock index (p=1)

[Figure: normalized price vs. trading days; RFNN predictions against real values.]
Fig. 4. The predictions of RFNN for daily stock price (p=1)
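The MA(5) and MA(20) series behind Figs. 5-8 below can be produced by simple window averaging, sketched here; the array name is hypothetical:

    import numpy as np

    def moving_average(prices, L):
        # simple MA(L): each point is the mean of the last L daily prices
        return np.convolve(prices, np.ones(L) / L, mode="valid")

    # e.g. the smoothed series fed to the RFNN:
    # ma5, ma20 = moving_average(prices, 5), moving_average(prices, 20)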


[Figure: normalized price vs. trading days; RFNN predictions against real values.]
Fig. 5. The predictions of RFNN for MA(5) (p=1)

[Figure: normalized price vs. trading days; RFNN predictions against real values.]
Fig. 6. The predictions of RFNN for MA(5) (p=5)

[Figure: normalized price vs. trading days; RFNN predictions against real values.]
Fig. 7. The predictions of RFNN for MA(20) (p=1)

[Figure: normalized price vs. trading days; RFNN predictions against real values.]
Fig. 8. The predictions of RFNN for MA(20) (p=5)

V. CONCLUSION

A Recurrent Fuzzy Neural Network (RFNN) is proposed, together with an online dynamic back-propagation learning algorithm based on the error gradient descent method, and the proposed RFNN is applied to stock price forecasting. Because the RFNN can capture the dynamic behavior of a system through the feedback links, the exact lag of the input variables need not be known in advance. Only one input node is used, and the drawback of input node selection in feed-forward neural networks is overcome. The simulation results show that the proposed algorithm is effective for stock price forecasting. The moving average techniques can improve the predictive performance when the prediction level p > 1. Our current work is focused on designing a trading decision system based on the RFNN to meet the requirements of practical application.

ACKNOWLEDGMENTS

This research is supported by Training Funds for Elitist of Beijing.

REFERENCES

[1] Marius J.: Testing Stock Market Efficiency Using Neural Networks, Case of Lithuania. SSE Riga Working Papers 17(52) (2003) 1-55
[2] Sun Y. F., Zhang W. L., Gu X. J. and Liang Y. C.: Optimal Partition Algorithm of RBF Neural Network and Its Application to Stock Price Prediction. Proceedings of the International Conference on Intelligent Information Technology, Posts & Telecom Press, Beijing (2002) 448-454
[3] Huang W., Nakamori Y., Wang S. Y. and Zhang H.: Select the Size of Training Set for Financial Forecasting with Neural Networks. ISNN 2005, LNCS 3497, 879-884
[4] Lee K., Park J. and Lee S.: Effectiveness of Different Target Coding Schemes on Networks in Financial Engineering. ISNN 2005, LNCS 3497, 873-878
[5] Zeng F. Z. and Zhang Y. H.: Stock Index Prediction Based on the Analytical Center of Version Space. ISNN 2006, LNCS 3971, 458-463
[6] Liang X.: Neural Network Method to Predict Stock Price Movement Based on Stock Information Entropy. ISNN 2006, LNCS 3971, 442-451
[7] Juang C. F. and Lin C. T.: Noisy Speech Processing by Recurrently Adaptive Fuzzy Filters. IEEE Trans. on Fuzzy Systems 9 (2001) 139-152
[8] Lin J. and Wai R. J.: Hybrid Control Using Recurrent Fuzzy Neural Network for Linear-induction Motor Servo Drive. IEEE Trans. on Fuzzy Systems 9 (2001) 102-115
[9] Ku C. C. and Lee K. Y.: Diagonal Recurrent Neural Networks for Dynamic Systems Control. IEEE Trans. on Neural Networks 6 (1995) 144-156
[10] Zhang Q. Z. and Gan W. S.: Active Noise Control Using a Simplified Fuzzy Neural Network. Journal of Sound and Vibration 272 (2004) 437-449


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 591--595
Copyright@2007 Watam Press

Basic Engineering Materials Classification Model - A Neural Network Application
Doreswamy
Department of Post-Graduate Studies and Research in Computer Science
Mangalore University,
Mangalagangotri-574 199,
Karnataka, INDIA
Ph.No: +91-824-2287670
doreswamyh@yahoo.com
Subject Classification: NN Model and Algorithms, and Neural Network Applications.

Abstract

This paper deals with an application of Artificial Neural Network (ANN) to the classification of non-linear engineering-materials data sets and to the prediction of the class of inconsistent input design requirements. An optimal Back Propagation Neural Network (BPNN) structure is constructed to improve classification accuracy with minimum classification error, based on network features that are determined by an analytical trial-and-error procedure of network training on the learning data set. The knowledge about the class associated with the majority of the data sets in the materials database is determined by statistical parameters, which are computed by the network learning process on the data set at different test phases. The results of the constructed neural network model are compared with an equal probability distribution function. The validation results show that the implemented optimal network structure achieves an acceptable classification accuracy of 99.35% with a cost function error of 0.00690%, less than 1%. This classification model is proposed as a decision support model to assist the composite designer in selecting basic engineering materials against the input end-user requirements.

1 Introduction

Materials and their designs have evolved over two millennia from the simple to the complex macro-design of mixtures - from monolithic to composite materials. The trend today is advanced composite materials design for various applications, and the research emphasis is towards more computationally tractable 'atomic-scale' design of lattices, surfaces and interfaces. In general, the pace of all materials research is being challenged, particularly from an affordability perspective. As a consequence, a world-wide pursuit of more efficient and accurate prediction methods for 'yet-to-be-made' materials is becoming a preeminent materials research frontier. One currently very popular approach to materials design is to utilize existing materials data to predict the properties of yet-to-be-made materials [1]. Because of the vast amounts and varying quality of this information, the use of artificial-intelligence-based data mining models for augmenting more analytical approaches to advanced composite materials design is receiving increasing attention [2].

Inspired by the biological nervous system, the artificial neural network (ANN) approach is a fascinating mathematical tool which can be used to simulate a wide variety of complex scientific and engineering problems. A self-modeling system for materials search [3] and neural network applications for composite property prediction [4] using the gradient descent algorithm [5] have been developed using artificial neural networks and classification mappings. Classification tasks form an important class of problems in pattern classification and recognition within machine intelligence and learning. A number of classification methods, such as decision-tree-based classifiers [6], ant colony algorithms [7][8], neural networks for knowledge discovery through classification [9][10], fuzzy expert systems [11], and hierarchical classification systems [12] for data selection, have been applied successfully to tackle these types of problems. The goal of this model is to promote more consideration of using ANNs in the field of composite materials design and basic engineering materials classification, and to support decision making in materials selection by composite design engineers. The model closely examines classification and prediction strategies on materials data sets with non-linear relationships among their attribute/property values. The applications of data classification models in composite materials design are revealed and promoted as new classification models.

The rest of this paper is organized as follows. Section 2 describes the classification system's architecture. Section 3 emphasizes the database and the attributes used for the classification and prediction model. Section 4 describes the experimental results of the classification model on the training and testing data sets. Section 5 draws conclusions and briefly outlines the future scope.
2. Neural Network Architecture

Artificial neural networks are a class of parametric non-linear statistical models that have found widespread use in many domains, including data mining, signal processing, medical diagnosis, composite materials design and processing, and materials property prediction [13]. They differ from other traditional techniques in two major characteristics: learning and recall. Learning is the process of adjusting the connection weights to produce the desired output. Recall is the process of providing an output for a given input in accordance with the neural weight structure [14].

An artificial neural architecture is mainly composed of elementary processing units, which are interconnected by weighted connections and arranged in layers. Among the various neural network architectures, the Back Propagation Neural Network (BPNN) is a widely used technique for training Multilayer Perceptrons (MLP). Though there are several case studies of computer-aided materials design and manufacturing problems solved successfully by the MLP algorithmic approach [15], the back propagation model is applied here as a new model for basic engineering materials classification. The Neural Networks (NNs) approach involves the following major steps.

1. Construct the multilayer neural network architecture.
2. Prepare the input and output patterns for training.
3. Train and validate the NN.

2.1 Construction of Neural Network Architecture

Structural design of a NN involves the determination of the layers and the neurons in each layer. The ANN architecture is selected by a trial-and-error approach, such that the number of neurons in the input layer, the number of hidden layers, the number of neurons in each hidden layer and the number of neurons in the output layer are found by several repeated runs of the system. Several repeated solutions with different initial weights and network parameters are used to converge to the optimal solution.

A typical Multilayer Perceptron Neural Network (MPNN) architecture and a processing neuron are depicted in figure 1.

[Figure 1: Typical MPNN architecture - input data set, data preprocessing, and fully connected network layers.]

Figure 1: Typical MPNN Architecture

2.2 Preparation of Input and Output Patterns for Training

The performance and the convergence of NNs are determined by their capability of generating correct solutions for input patterns. Training of the NN is carried out through the presentation of training samples. Training samples are selected to be representative of the variations, in such a way that unseen part features can be recognized by the network.

The neural network model is a three-layer feed-forward neural network [15][16]. Each layer is fully connected to all successive layers through the connection weights, as shown in figure 1. For a neuron i, the normalized weighted inputs are fed in and summed up to the final input u_i:

u_i = \sum_{j=1}^{m} w_{j,i} x_j    (1)

The inputs of the neurons are propagated to the outputs through the neurons in the hidden layers according to the following sigmoid activation function with bias \theta:

f(u_i) = \frac{1}{1 + e^{-(u_i + \theta)}}    (2)

where u_i is the input function and f(u_i) is the output function.

2.3 Training and Validation of the Neural Network

The training procedure is a search algorithm that minimizes the error between the input and the output patterns by changing the weights. This process determines the weights of the NN connections that map the relationships between input and output. The network must be trained with the training data sets in such a way that, for a given input vector, the output vector that classifies the pattern is obtained.

When the back propagation learning method is employed as the training procedure, the objective function for an input/output pattern is the sum of the squared residual errors:

E = \frac{1}{2} \sum_{k=1}^{m} (T_k - O_k)^2    (3)

where T_k and O_k are the target and the actual computed outputs of the k-th output unit, respectively. A gradient descent method is implemented to find a set of weights that minimizes the objective function error. The weight change is proportional to the derivative of the error with respect to each weight:

\Delta w \propto \frac{\partial E}{\partial w}    (4)

The determination of the weight change is a recursive process which starts with the output units. For a weight connected to a unit in the output layer, the change is based on the error of this output unit:

\Delta w_{k,j} \propto O_k (1 - O_k)(T_k - O_k) O_j = \delta_k O_j, \quad \Delta w_{k,j} = \delta_k O_j    (5)

where \delta_k is referred to as the error signal at the k-th output unit. The output signals are back-propagated to the units in the hidden layer. The change of a weight in a hidden layer is determined by

\Delta w_{j,i} \propto O_j (1 - O_j) \sum_k \delta_k w_{k,j} O_i, \quad \Delta w_{j,i} = \delta_j O_i    (6)

In order to increase the speed of the training procedure without oscillations, an adaptive learning rate and a momentum term are used during the training process. Equations (5) and (6) are then rewritten as

\Delta w_{k,j}(n) = \eta \delta_k O_j + \alpha \Delta w_{k,j}(n-1)    (7)

\Delta w_{j,i}(n) = \eta \delta_j O_i + \alpha \Delta w_{j,i}(n-1)    (8)

where n is the training epoch number, \eta is the learning rate and \alpha is the momentum. The momentum allows the previous weight change to have a continuing influence on the current weight change.
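For concreteness, the following sketch (ours, not code from the paper) implements one training step of equations (1)-(8) for a small three-layer network; the synthetic data and initial weights are assumptions, while the 25-4-3 shape follows the optimal structure reported later:

# Sketch of one BPNN training step per equations (1)-(8): sigmoid forward
# pass, output/hidden error signals, and momentum-based weight updates.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 25, 4, 3
W1 = rng.normal(scale=0.1, size=(n_in, n_hid))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(n_hid, n_out))  # hidden -> output weights
dW1_prev, dW2_prev = np.zeros_like(W1), np.zeros_like(W2)
eta, alpha, theta = 0.8, 1.0, 0.0                # learning rate, momentum, bias

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-(u + theta)))    # eq. (2)

x = rng.random(n_in)                             # one normalized input pattern
T = np.array([1.0, 0.0, 0.0])                    # target class (e.g. Polymer)

O_hid = sigmoid(x @ W1)                          # eq. (1)-(2), hidden layer
O_out = sigmoid(O_hid @ W2)                      # output layer
E = 0.5 * np.sum((T - O_out) ** 2)               # eq. (3)

delta_out = O_out * (1 - O_out) * (T - O_out)        # eq. (5)
delta_hid = O_hid * (1 - O_hid) * (W2 @ delta_out)   # eq. (6)

dW2 = eta * np.outer(O_hid, delta_out) + alpha * dW2_prev  # eq. (7)
dW1 = eta * np.outer(x, delta_hid) + alpha * dW1_prev      # eq. (8)
W2 += dW2; W1 += dW1
dW2_prev, dW1_prev = dW2, dW1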
Neurons in the hidden layers play a critical role in the operation of an MLP network with back-propagation learning because they act as feature detectors. As the learning process progresses, the hidden neurons gradually discover the hidden features that characterize the training data. Hidden neurons perform a non-linear transformation of the input data into a new space called the feature space. The error obtained during the learning process in each epoch is measured by the cost function (9). The error signal is gradually minimized as the iterations proceed over the fixed number of epochs.

E_{av} = \frac{1}{2N} \sum_{n=1}^{N} \sum_{k=1}^{M} (T_k - O_k)^2    (9)

The neural network with the optimal number of neurons in the hidden layer is determined at a fixed momentum constant and different learning rates. The best learning curve is induced for the classification of engineering materials.

Network validation is the primary task of determining the correctness of the results obtained by the network architecture. The performance of the implemented network is measured and validated against the Equal Probability Distribution Function (EPDF) theory:

P(C_{pmc}, N) = p_p P(PM_c, N/P_m) + p_c P(CM_c, N/C_m) + p_m P(M_c, N/M_m), where p_p = p_c = p_m = 1/3    (10)

P(PM_c, N/P_m) = \int_{\Omega(N)} f_x(x/PM_c)\, dx    (11)

P(CM_c, N/C_m) = \int_{\Omega(N)} f_x(x/CM_c)\, dx    (12)

P(MM_c, N/M_m) = \int_{\Omega(N)} f_x(x/MM_c)\, dx    (13)

The probability of an error occurring is defined as

P_e = 1 - P(C_{pmc}, N)    (14)

where N is the total number of training samples, \Omega(N) is the decision space, C_{pmc} is the probability of correct classification on the entire training data set, P_e is the error probability, and PM, CM and MM are the probabilities of correct classification of the Polymer, Ceramic and Metal classes, respectively.

3 Materials Database

The reconstructed material database, obtained by sampling various data sets from different websites [16][17], is used as the training samples for classifying the material class. The database contains 5000 data instances that include all material classes: Polymer, Metal and Ceramic.

3.1 Descriptions of Attributes of the Database

The material database is an organized collection of related data that specifies the materials and their properties, including physical, mechanical, thermal and general properties. Each property has some sub-properties that emphasize the characteristics or features of engineering materials [17][18].

4 Experimental Results

4.1 Inputs

The unique materials attributes/features [17][18] are extracted from the database and used to represent the input neurons of the BPNN. The numeric values presented to the input neurons are preprocessed to check the integrity of the attribute values and to replace linguistic/categorical values with predefined numeric values according to the domain-expert classification rules in the knowledge base. The data sets are then transformed to the range 0-1 by the min-max normalization technique [19] to improve the classification accuracy. A set of properties is randomly selected, processed and presented to the input layer of the network. The network predicts the class to which the majority of the input properties belong.

4.2 Outputs

The outputs of the BPNN algorithm are obtained in the form of weights for every neuron. The neuron in the output layer whose output value converges to 1 is considered the winning output neuron. The class label corresponding to that winning neuron is identified as the predicted class for the input design-requirement properties.
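As a small illustration of the preprocessing and readout just described (our sketch; the column values are hypothetical, not rows from the paper's database):

# Sketch: min-max normalization of attribute columns to [0, 1] (section 4.1)
# and winner-take-all readout of the BPNN output layer (section 4.2).
import numpy as np

CLASSES = ["Polymer", "Ceramic", "Metal"]

def min_max_normalize(X):
    """Column-wise min-max scaling of an attribute matrix to the range 0-1."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def predicted_class(output_activations):
    """The output neuron closest to 1 wins; its label is the predicted class."""
    return CLASSES[int(np.argmax(output_activations))]

X = np.array([[210.0, 7.85, 1500.0],     # hypothetical raw property rows
              [  3.2, 1.10,  400.0],
              [380.0, 3.90, 2000.0]])
print(min_max_normalize(X))
print(predicted_class(np.array([0.12, 0.08, 0.97])))  # -> "Metal"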
4.3 Optimal BPNN Selection and Classification Results

Determining the optimal number of neurons in the hidden layer for producing the minimum error is the core task of optimal neural network design. As there is no standard method of determining the optimal neural network structure, an analytical trial-and-error procedure is employed to design the optimal network architecture. Network structures with different numbers of active neurons in the hidden layer are trained with different learning rates \eta \in \{0.01, 0.1, 0.25, 0.5, 0.75, 0.725, 0.8, 0.9\} and a fixed momentum constant m = 1.0 for fast convergence. The minimized cost function errors obtained by equation (9) for the different data sets are analyzed for the optimal network selection. Figure 2 compares the best learning curves extracted from the learning curves of the different network architectures. From figure 2, it is found that the optimal BPNN with four hidden neurons produced the least cost function error, 0.002312, at 100 epochs, with the optimal learning rate 0.8 and momentum constant 1.0, for better classification accuracy. The knowledge of the optimal neural network structure extracted to obtain better classification and prediction results is listed in table 1.

[Figure 2: Best learning curves (minimized cost function error vs. number of epochs) for 25x4x3, 25x6x3 and 25x7x3 networks with learning rate 0.8.]

Figure 2: Best learning curves for different network structures

Table 1
Optimized MLBP Neural Network
Optimum number of hidden neurons: 4
Optimum learning rate: 0.8
Momentum constant: 1.0

The determined optimal features in table 1 are used to construct the optimal network structure for the classification and prediction of materials in the database.

The network performance is analyzed on the materials database consisting of 5000 data sets that include all types of materials. The classification performance and residual errors of the network, and statistical parameters such as the mean and standard deviation, are estimated for each individual class of materials. From these statistical parameters, the following are found:

1. The mean classification accuracy of 99.35440% and the mean cost function error of 0.00690% are obtained by the optimal network. These are the optimal results for the testing data set and for predicting the unknown class of non-linear data sets.
2. The standard deviations of the individual classes of training data measure the deviations of classification among the training set samples.
3. The majority of the materials in the proposed database are polymers. Table 2 compares the materials classification performance of the Equal Probability Distribution Function (EPDF) and the Neural Network Model (NNM).

Table 2
Comparison of materials classification performance of the Neural Network Model (NNM) and the equal probability density function.

Run      Training  Polymer               Ceramic               Metal                 Training Data Set
Steps    set       Classification in %   Classification in %   Classification in %   Classification in %
                   BPNN      EPDF        BPNN      EPDF        BPNN      EPDF        BPNN      EPDF
1        1000      45.89     42.543      28.37     25.335      24.98     22.374      99.24     90.252
2        2500      47.2756   45.204      25.789    21.245      26.287    25.334      99.3516   91.783
3        3500      50.321    48.215      26.450    22.264      22.65     21.346      99.421    91.825
4        5000      51.6427   47.226      27.4674   26.279      20.294    19.446      99.4041   92.951
Average            48.78233  45.797      27.0191   23.78075    23.55275  22.125      99.35418  91.70275

4.4 Testing Results

The network structure constructed and trained on the learning data set was tested with the different classes of materials. Non-linear material properties are extracted randomly from the material database and preprocessed with the classification rules to test the integrity of the Polymer, Metal and Ceramic classes, respectively. The implemented optimal neural network is capable of predicting the class of input design requirements with inconsistent data. Figure 3 depicts the exponentially decreasing cost function errors at different iteration steps for the correctly classified material classes.

[Figure 3: Cost function errors for correctly classified Polymer, Metal and Ceramic classes, plotted against iterations (roughly 10^5 to 10^6 steps); separate error scales for the Polymer/Metal classes and the Ceramic class.]

Figure 3: Cost function errors associated with correctly classified material classes

5. Conclusion and Future Work Scope

An optimal multilayer back propagation non-linear neural network model has been implemented successfully to predict the class of input design requirements with inconsistent/non-linear data items. Knowledge about the network structure, including the number of network layers, the number of neurons in the hidden layers and the optimal learning rate parameters, is revealed for the optimal network construction. The majority class of the materials in the training data set is revealed by statistical knowledge. This supports the design engineer to directly select the domain
class and to reduce the search complexity in materials selection. The implemented model has obtained satisfactory results on the training data set, with a classification accuracy of 99.35440% and a mean cost function error of 0.00690%.

Further, this model can be extended to a neuro-fuzzy classification model by inputting a fuzzified data set to the neural network model, to achieve still better classification accuracy with negligible cost function error.

ACKNOWLEDGEMENT

This work is supported by UGC under minor research project No. MRP(S)-285/2005(X plan)/KAMA004/UGC-SWRO. The author gratefully acknowledges the support.

REFERENCES

[1] Christopher C. Fischer, Kevin J. Tibbetts, Dane Morgan and Gerbrand Ceder. (2006) Predicting Crystal Structure by Merging Data Mining with Quantum Mechanics, Nature Materials, Vol. 5, No. 8, pp. 641-646.
[2] Jang-Kyo Kim. (2003) Polymer Composite Processing Technologies, in the Proceedings of the 2nd International Conference on Applied Science and Technology, pp. 13-25, Bhurban.
[3] Hang Su, Cai-Fu Yang, Jun-Chang Shen and Zhi-Ling Tian. (2001) A Systemic Self-Modeling Method and Its Application to Material Design and Optimization, Modelling and Simulation in Materials Science and Engineering, Vol. 9, pp. 97-109.
[4] Necat Altinkok. (2006) Use of Artificial Neural Network for Prediction of Mechanical Properties of Al2O3 Particulate-Reinforced Al-Si10Mg Alloy Composites Prepared by Using Stir Casting Process, Journal of Composite Materials, Vol. 40, No. 9, pp. 779-796.
[5] Simon Haykin. (2003) Neural Networks - A Comprehensive Foundation, Pearson Education, Fourth Edition.
[6] S. Rasoul Safavian and David Landgrebe. (1991) A Survey of Decision Tree Classifier Methodology, IEEE Transactions on Systems, Man and Cybernetics, Vol. 21, No. 3, pp. 660-674.
[7] Shankar P. S., Jayaraman V. K. and Kulkarni B. D. (2004) Ant Colony Classifier System: Applications to Some Process Engineering Problems, Computers and Chemical Engineering, 28, pp. 1577-1584.
[8] Parpinelli R. S., Lopes H. S. and Freitas A. A. (2002) An Ant Colony Algorithm for Classification Rules Discovery, in H. Abbas, R. Sarker and C. Newton (Eds.), Data Mining: A Heuristic Approach, pp. 191-208, London, UK: Idea Group Publishing.
[9] Gangadhara T. Shoba, Srinivas C. Shama and Doreswamy. (Dec. 2005) Knowledge Discovery for Large Data Sets Using Artificial Neural Network, International Journal of Innovative Computing, Information and Control, Vol. 1, No. 4, pp. 635-642.
[10] Khosrow Kaikhah and Sandesh Doddameti. (Feb. 2006) Discovering Trends in Large Datasets Using Neural Networks, Applied Intelligence, Vol. 24, No. 1, Springer Netherlands.
[11] Chang S. Y., Lin C. R. and Chang C. T. (2002) A Fuzzy Diagnosis Approach Using Dynamic Fault Trees, Chemical Engineering Science, Vol. 57, No. 15, pp. 2971-2985.
[12] Chih-Ming Chen, Hahn-Ming Lee and Cheng-Wei Hwang. (Dec. 2005) A Hierarchical Neural Network Document Classifier with Linguistic Feature Selection, Applied Intelligence, Vol. 23, No. 3, pp. 277-294, Springer Netherlands, ISSN 0924-669X (paper), 1573-7497 (online).
[13] Peter Sittner, Veronique Michaud, Antonio Balta-Neumann and Jan Schrooten. (2001) Modeling and Materials Design of SMA Polymer Composites, Proceedings of the International Symposium on Smart Materials, PRICM4, pp. 11-15, Honolulu, Hawaii.
[14] Rumelhart D., Hinton G. and Williams R. (1986) Learning Internal Representations by Error Propagation, in Parallel Distributed Processing, Cambridge, MIT Press, Vol. 1, pp. 318-362.
[15] Zhang Z. and Friedrich K. (2003) Artificial Neural Networks Applied to Polymer Composites: A Review, Composites Science and Technology, Vol. 63, No. 14, pp. 2029-2044.
[16] Zhou Z. H., Wu J. and Tang W. (2002) Ensembling Neural Networks: Many Could Be Better Than All, Artificial Intelligence, Vol. 137, pp. 239-263.
[17] Materials database at http://www.matweb.com.
[18] Dieter, G. E. (Ed.) (1997) Materials Selection and Design, ASM Handbook, Vol. 20, Materials Park, OH: ASM International.
[19] Michalski, R. S. and Kaufman, K. A. (1998) Data Mining and Knowledge Discovery: A Review of Issues and a Multistrategy Approach, in Machine Learning and Data Mining: Methods and Applications, Michalski, R. S., Bratko, I. and Kubat, M. (Eds.), London, John Wiley & Sons, pp. 71-112.
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 596--601
Copyright@2007 Watam Press

COMPLEXITY ANALYSIS OF EEG UNDER DIFFERENT BRAIN FUNCTIONAL STATES USING SYMBOLIC ENTROPY

Lisha Sun1, Guoliang Chang1, Patch Beadle2

1 Key-Lab of Intel. Manuf. Tech. of State Education Ministry,
College of Engineering, Shantou University, Shantou 515063, China
(lssun@stu.edu.cn)
2 School of Engineering, Portsmouth University,
Portsmouth, United Kingdom

ABSTRACT

This paper proposes a novel method based on symbolic dynamics for investigating the complexity of EEG signals. The symbolic entropy (SyEn) of the symbolic sequences derived from the given signals is defined, and the corresponding algorithm is developed for quantitatively measuring the complexity of EEG data. To evaluate the performance of the presented symbolic entropy, simulations with symbolic random sequences were carried out and compared with other traditional entropies such as the binary Shannon entropy and the approximate entropy. Furthermore, the proposed SyEn is used as a quantitative parameter for estimating the complexity of two groups of clinical EEG data from normal and schizophrenic subjects. Several experimental results are provided to show that our method is significantly superior to the common entropy approaches. Based on the developed time-varying entropy, the significant difference between the two kinds of EEG data was easily distinguished.

Index Terms—symbolic entropy, time series analysis, sample entropy, complexity analysis

1. INTRODUCTION

Studies have shown that the human brain is a complicated nonlinear spatial-temporal neural system [1,2]. The electroencephalogram (EEG), which is regarded as the summed electrical activity of very large numbers of neurons, exhibits high-dimensional chaotic characteristics. Regarding the functional significance of brain activity, it is of major interest to investigate the nonlinear spatial-temporal structure of such brain oscillations, as far as they are visible in the EEGs, under pathological or physiological brain states such as epileptic seizures, sleep-wake stages, etc. Advanced approaches for studying EEGs enable us to extract more useful information and the underlying inherent mechanisms of different brain functional states.

In principle, nonlinear dynamics provides a more complete description of the EEG recordings and a better understanding of the underlying mechanism of the brain, through measures such as the Lyapunov exponent and correlation dimensions [3-5]. However, these algorithms are based on a low-dimensional nonlinear dynamical system. Using surrogate-data methods, Theiler et al. found that the EEG was not produced by low-dimensional chaos [6]. Pritchard et al. also applied surrogate data testing to normal resting human EEG and revealed that it was nonlinear but did not represent low-dimensional chaos [7]. Similar results have been independently reported by Casdagli, Rombouts and others [8]. Hence these algorithms may produce spurious dimension or Lyapunov exponent estimates, supporting false identification of chaotic dynamics in the observed data [9].

Information theory provides a new way to measure the nonlinear trends in a brain electrical time series, and several entropies for identifying the complexity of medical signals have been introduced. As early as the 1970s, Lempel and Ziv researched the complexity of random time series from the viewpoint of Kolmogorov complexity [10]. In their view, the complexity describes the growth of distinct patterns in a time series as the data set grows, and represents the extent of randomness. In 1991, Pincus, who formulated the theory and method for a measure of regularity closely related to the Kolmogorov entropy, introduced a new scheme in terms of the approximate entropy (ApEn) [11]. Later on, Richman and Moorman developed a new algorithm called sample entropy (SampEn) [12]. Compared to ApEn, SampEn reduces two sources of bias, agrees much more closely with the theory for random numbers with known probabilistic characteristics over a broad range of operating conditions, and furthermore maintains relative consistency. However, the performance of both methods depends highly on the chosen parameters: variations of the parameters strongly affect the reliability of the results. They are empirical to some extent and time-consuming, so researchers urgently pursue a more stationary, reliable and simple algorithm to identify the complexity of nonlinear signals. Recently, several entropies based on symbolic dynamics have been shown to offer a new way of identifying the complexity of medical signals and have developed rapidly [13-16].

In this study, a novel scheme called symbolic entropy (SyEn) is proposed to distinguish the complexities of different EEGs. By setting proper thresholds, the nonlinear dynamic series are converted into symbolic series and a novel SyEn is defined. We also compare the results with the traditional SampEn. Experimental results show that the SyEn can effectively identify the differences between the EEG data of healthy subjects and epileptic patients, which is extremely useful for assisting the diagnosis of epilepsy and evaluating the medical treatment.

The paper is organized as follows. In section 2, the method of symbolic entropy is introduced. In section 3, the surrogate data algorithm is used to test the nonlinear characteristics of EEGs, and the SyEn and SampEn analyses based on a sliding window are applied to the EEG data, followed by the clinical interpretation. The detailed conclusion and discussion are given in section 4.

2. SYMBOLIC ENTROPY METHODS

Symbolic dynamics is considered to be a coarse-grained description of the trajectories of a general class of systems which keeps both the robustness and the statistical properties of the system invariant [17]. The main principle of symbolic dynamics is to transform a time series into symbol sequences, which provides a model for the orbits of the dynamical system via a space of sequences. The adaptation of the definitions of symbolic dynamics to chaotic systems allows partitioning the infinite number of finite-length trajectories into a finite number of trajectory sets.

Given a data set X, the symbol sequence is achieved by quantizing X into symbols. Let E = \{E_0, E_1, \ldots, E_{q-1}\} be a finite disjoint partition of a phase space X. Considering a given symbol set \{s_0, s_1, \ldots, s_{m-1}\} and a set of m+1 critical points \{c_0, c_1, \ldots, c_m\}, the given chaotic sequence \{x_i \mid i = 1, 2, \ldots, N\} can be replaced by the symbolic series \{s_i \mid i = 1, 2, \ldots, N\}:

if c_k < x_i \le c_{k+1}, then s(i) = s_k    (1)

The performance of the method depends on the precision of the quantization and on the dynamical characteristics of the original system. Also, one of the most important problems in symbolic dynamics is that we must risk the loss of information caused by the coarse-grained processing. Generally, this influence is not significant, except for some extreme cases such as binary quantization [18, 19]. So binary symbolization is not applicable, and quantization to multiple scales is more reliable. In our proposed method, we convert the original signals into multiple integers according to the following algorithm [20, 21]:

VC(x) = j, if \sin^2\left(\frac{j\pi}{2K}\right) < x \le \sin^2\left(\frac{(j+1)\pi}{2K}\right), \quad j = 0, 1, 2, \ldots, K-1    (2)

The number of different integers is K = 2^n. Next, we segment the symbolic time series \{s_i \mid i = 1, 2, \ldots, N\} into short time series \{U(1), \ldots, U(N-L+1)\} of length L, in which U(i) = [s(i), s(i+1), \ldots, s(i+L-1)]. Fig. 1 illustrates the segmentation for length L = 5.

[Figure 1: The segmentation of the symbolic time series.]

The short sequences can be marked and identified as:

l_x(L, i) = \sum_{p=1}^{L} m^{L-p} s(p+i)    (3)

where m denotes the number of different integers in the symbol set \{s_0, s_1, \ldots, s_{m-1}\}, L is the length of the short time sequences, and i represents the initial point within the symbolic series \{s_i \mid i = 1, 2, \ldots, N\}. Through the quantization, the symbol s_k can be replaced by an integer k, so the derived short symbolic series can be identified with elements of the set \{0, 1, \ldots, m-1\}^L. Therefore, the symbolic entropy of the sequences can be defined by calculating the information contained in the symbolic series:

E = -\sum_{l_x} P_{l_x} \ln P_{l_x}    (4)

where P_{l_x} = n_{l_x} / n_{sum} is the probability of the pattern l_x, n_{l_x} is the frequency of the pattern l_x, and n_{sum} is the total number of patterns.

Clearly, the ability to code information depends on the number of critical points. If the number of critical points c_i is given, we can find the best critical values by optimizing the symbolic entropy E. The number of critical points is proportional to the ability to code the original time series: the more critical points, the higher the symbolic entropy. When the number of critical points reaches a certain extent, the pseudo-random sequences carry sufficient information so that the symbolic entropy reaches its maximum.

For a very regular binary sequence, only a few distinct patterns occur. Hence the symbolic entropy becomes small, since the probability of these patterns is high and only little information is contained in the whole sequence. For a random binary sequence, all possible patterns of length N occur with the same probability, so the corresponding information content is maximal.
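A compact sketch of equations (2)-(4) (our illustration, not the authors' code; the normalization of the entropy to [0, 1] and the test signals are our assumptions, while K=8 and L=5 follow the parameters used later in this paper):

# Sketch of SyEn per equations (2)-(4): quantize a [0,1]-normalized signal
# into K integers with the sin^2 partition, form overlapping words of
# length L, and compute the Shannon entropy of the word distribution.
from collections import Counter
import numpy as np

def symbolize(x, K=8):
    """Map each sample in [0,1] to an integer 0..K-1 via eq. (2)."""
    edges = np.sin(np.pi * np.arange(K + 1) / (2 * K)) ** 2  # critical points
    return np.clip(np.searchsorted(edges, x, side="left") - 1, 0, K - 1)

def symbolic_entropy(x, K=8, L=5):
    """Entropy of length-L words, eq. (3)-(4); normalized by ln(K**L)
    so the value lies in [0, 1] (our convention)."""
    s = symbolize(x, K)
    # words as tuples, equivalent to the integer labels l_x of eq. (3)
    words = [tuple(s[i:i + L]) for i in range(len(s) - L + 1)]
    counts = np.array(list(Counter(words).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(float(K) ** L))

rng = np.random.default_rng(1)
noise = rng.random(2000)                                   # irregular sequence
regular = (np.sin(np.linspace(0, 40 * np.pi, 2000)) + 1) / 2
print(symbolic_entropy(noise), symbolic_entropy(regular))  # high vs. low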
3. COMPLEXITY ANALYSIS OF EEGS
represents the algorithm of segmentation with length L 5 .
3.1 EEG Data Collection
The subjects included controls and epileptics. The controls were
4 healthy male graduates and researchers. They have no history of
neurological or psychiatric disease and not taking any drugs. 8


epileptic patients have no medicated histories. The experiments phase-randomized surrogate, the null hypothesis is that EEG signals
were performed in an acoustically and electrically shielded room. are generated by linear stochastic process, and symbolic entropy is
The EEG data were collected with NIKKON 4000 EEG System at used as the discriminating statistic.
sampling rate of 100Hz and 1-min of data were recorded from 14 Usually, a Gaussian distribution of surrogate data SyEn is
electrodes (Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T5, T6) assumed. A statistical t-test is performed by calculating
according to the international 10-20 system. The reference electrode significance:
was placed on the tip of the nose, grounding through linked earlobes.  Q !  Q (5)
surr o rig
Artifact rejection was performed off-line by an experienced S
doctor’s visual inspection of the recordings. V
The preprocessing of the EEG data was performed before the
complexity analysis. Data were high-pass filtered at 1 Hz and where ҏ Qorig stands for the SyEn of the original time series.
low-pass filtered at 35 Hz. Then they are normalized into interval [0,
1] via a linear map and then the sequences are coded for the estimate  Qsurr ! stands for that of the surrogate data and V is the
of the SyEn. Here 30s long EEG time series was selected. In order to standard deviation of the SyEn for surrogate time series. The test
detect and reduce non-stationary present in taking small windows of allows rejection of the null hypothesis with some confidence level,
the whole EEG data, two windows sizes, 2s-window and 5s-window for example S t 1.96 , means the confidence level P=0.05.
are used with 50 data points overlap. For the former situation, there In this study, we use the 14-channles EEG data from a healthy
are 57 sliding windows with 200 data points in each one. The subject as the original data sets. 40 realizations of the surrogate
averaged values of the entropies for the 57 sliding windows are algorithm are generated, and SyEn is calculated as described in
taken. By averaging mean data of the two groups individually, the Section II for each group surrogate ensemble. The mean and SD of
final entropy value is obtained. For the latter, there are 51 sliding SyEn in surrogate data are calculated and compared with that of the
windows with 500 data each correspondingly. The calculations are original. The result is shown in Table 1. It indicates that the SyEn
performed for each sliding window via both SyEn and SampEn between the original and the surrogate data is significantly different.
algorithms. For SampEn, the input parameters m=2, r=0.15SD (the The SyEn of the surrogate data change from 0.8022 to 0.8723 which
standard deviation of EEGs) are chosen. As to SyEn, we take K=8, is significantly higher (P<0.001) than the original. So the null
L=5 for each sliding window. Entropy algorithms are applied in hypothesis can be rejected at a confidence level of 0.001.
calculating EEGs of both the controls and the patients. Moreover, as mentioned above, the size of the sliding window is
an important parameter regarding the time resolution of the
3.2 Text of Nonlinear Hypothesis time-varying entropy. Fig. 2 shows the results of the SyEn for an
From the neurophysiological point of view, there is much example of a segment time series with 2 kinds of window size: 50
evidence that EEGs arise from a highly nonlinear system. Not only and 100, respectively. By comparing the Fig.2 (b) and (c), it can be
have multiple feedback loops been detected at each of the seen that the window size with 50 is more reasonable for our
hierarchical levels of the central nervous system, but also individual purpose.
neurons themselves appear to be nonlinear elements. It has been
demonstrated experimentally that indeed neuronal systems exhibit a 3.3 EEGs Analysis
variety of nonlinear behaviours. But, with respect to the dynamical Table 2 presents the SyEn and the SampEn measurements of the
modelling of EEG signals, it is still a crucial problem whether EEGs controls and epilepsy groups with 2s-window. The EEGs of 14
are indeed in accordance with nonlinear deterministic models, channels are used for calculations. Each subject's entropy results
rather than linear stochastic models. come from the average value of 57 sub-sections with 50 data
A straightforward way to test different dynamical models for a overlapping segment measurements. SyEn values of epilepsy lie
given time series is the so-called method of surrogate data, which is between 0.2238 and 0.3070, corresponding to the areas of epileptic
first introduce by Theiler et al. [6].The null hypothesis that the time focus and the other areas of the brain. However, all these data are
series is the result of a linear stochastic process. The measured significantly lower (P<0.01 to 0.001) than that of the healthy
topological properties are compared with the measured topological subjects in which the entropy values are from 0.4131 to 0.5055.
properties of the surrogate data sets. If both the experimental time Similarly, the SampEn results of the epileptic patients range from
series data and the surrogate data sets yield the same values for the 0.5451 to 0.7098 but not statistically lower compared to that of the
topological properties (within the standard deviation measured from control group in which the entropy values are from 0.6521 to 0.7730.
the surrogate data sets), then the null hypothesis that the It can be obviously seen that the performances of SyEn are
experimental data set is random noise cannot be ruled out. significantly effectively than that of the SampEn in EEGs
Here surrogate signal is produced by phase randomizing the complexity analysis given in this term.
given data which has similar spectral properties as of the given data. Table 3 gives the results of the SyEn and the SampEn
This means that the surrogate data sequence has the same mean, measurements in both two groups with 5s-window. The individual
variance and autocorrelation function and therefore has the same entropy for each subject is obtained from the average of 51
power spectrum as the original sequence but (nonlinear) phase sub-sections with 50 data overlapping EEG segment measurements.
relations of the original series are destroyed in the surrogate series. The SyEn of the epileptic patients ranges from 0.2388 to 0.3276. It
To construct the phase-randomized surrogate data set, the original is significantly lower (P<0.01 to 0.001) than that of the control
time series is transformed by Fourier transform, and then take the group in which the entropy values are from 0.4417 to 0.5391.
inverse Fourier transform. For comparison with the Correspondingly, the SampEn of the epilepsy group ranges from
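The surrogate test described in this section can be sketched as follows (our code, not the authors'; it reuses the symbolic_entropy helper sketched in section 2, and the synthetic channel is a placeholder, while the ensemble size of 40 follows the paper):

# Sketch: phase-randomized surrogates (same power spectrum, randomized
# phases) and the significance statistic S of eq. (5).
import numpy as np

def phase_randomized_surrogate(x, rng):
    """FFT the series, randomize the phases, inverse-FFT back."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, size=X.shape)
    phases[0] = 0.0                      # keep the mean (DC) component
    Xs = np.abs(X) * np.exp(1j * phases)
    s = np.fft.irfft(Xs, n=len(x))
    # re-map to [0, 1] before symbolization, as for the original data
    return (s - s.min()) / (s.max() - s.min())

rng = np.random.default_rng(2)
eeg = rng.random(3000)                   # stand-in for one normalized channel
q_orig = symbolic_entropy(eeg, K=8, L=5)
q_surr = [symbolic_entropy(phase_randomized_surrogate(eeg, rng), K=8, L=5)
          for _ in range(40)]            # 40 realizations, as in the paper
S = abs(np.mean(q_surr) - q_orig) / np.std(q_surr)   # eq. (5)
print(q_orig, np.mean(q_surr), S)        # S >= 1.96 rejects H0 at P = 0.05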


3.3 EEGs Analysis

Table 2 presents the SyEn and SampEn measurements of the control and epilepsy groups with the 2 s window. The EEGs of the 14 channels are used for the calculations. Each subject's entropy results come from the average value of 57 sub-sections with 50-data-point overlapping segment measurements. The SyEn values of the epilepsy group lie between 0.2238 and 0.3070, corresponding to the areas of the epileptic focus and the other areas of the brain. All these values are significantly lower (P < 0.01 to 0.001) than those of the healthy subjects, whose entropy values range from 0.4131 to 0.5055. The SampEn results of the epileptic patients range from 0.5451 to 0.7098 but are not statistically lower than those of the control group, whose entropy values range from 0.6521 to 0.7730. It can be clearly seen that the performance of the SyEn is significantly more effective than that of the SampEn in this EEG complexity analysis.

Table 3 gives the SyEn and SampEn measurements of the two groups with the 5 s window. The individual entropy for each subject is obtained from the average of 51 sub-sections with 50-data-point overlapping EEG segment measurements. The SyEn of the epileptic patients ranges from 0.2388 to 0.3276, significantly lower (P < 0.01 to 0.001) than that of the control group, whose entropy values range from 0.4417 to 0.5391. Correspondingly, the SampEn of the epilepsy group ranges from 0.5674 to 0.6997, which is not statistically lower than that of the control group, whose entropy values range from 0.6789 to 0.8088.

[Figure 2: three panels - (a) a segment of a time series with changing entropy; (b) the SyEn with window size 100; (c) the SyEn with window size 50.]

Fig. 2. Comparison of the size of the sliding window. (a) A segment of a time series with changing entropy. (b) The SyEn with window size 100. (c) The SyEn with window size 50.

In order to clearly evaluate the performance of the different algorithms in the different windows, we depict the changes of the EEG entropy with time in channel Fp1 of a healthy person, measured by both SyEn and SampEn, as shown in Fig. 3. The data in Fig. 3(a) are obtained from the SampEn measurements and the curves in Fig. 3(b) from the SyEn calculations in the 2 s window, which generates 57 entropy results. Fig. 3(c) and 3(d) represent the SampEn and SyEn measurements in the 5 s window, generating 51 entropy values respectively. It can be seen that the SyEn of the epileptic patients is clearly lower than that of the healthy subjects in both sliding windows. Only in the 5 s window can the SampEn distinguish the two groups, while in the 2 s window the difference is very weak, which disables the algorithm.

TABLE 1. COMPARISON OF SYMBOLIC ENTROPY FOR ORIGINAL TIME SERIES AND SURROGATE DATA FOR A HEALTHY SUBJECT (***P<0.001)

Channel  Original  Surrogate data
Fp1      0.6168    0.8147 ± 0.0594***
Fp2      0.5965    0.8022 ± 0.0674***
F3       0.7230    0.8396 ± 0.0469***
F4       0.7665    0.8296 ± 0.0542***
C3       0.6945    0.8426 ± 0.0448***
C4       0.7274    0.8577 ± 0.0634***
P3       0.7610    0.8033 ± 0.0594***
P4       0.7813    0.8621 ± 0.0445***
O1       0.7791    0.8495 ± 0.0604***
O2       0.7544    0.8723 ± 0.0473***
F7       0.7320    0.8056 ± 0.0564***
F8       0.6702    0.8210 ± 0.0576***
T5       0.8120    0.8491 ± 0.0512***
T6       0.8031    0.8223 ± 0.0588***

TABLE 2. COMPARISON OF SYEN AND SAMPEN WITH 2S-WINDOW FOR TWO DIFFERENT EEG DATA

         SyEn                                  SampEn
Channel  Controls          Patients            Controls          Patients
Fp1      0.4921 ± 0.0223   0.2572 ± 0.0816***  0.6521 ± 0.0022   0.6051 ± 0.0961
Fp2      0.4998 ± 0.0313   0.2656 ± 0.1070***  0.6650 ± 0.0510   0.6142 ± 0.1407
F3       0.4918 ± 0.0177   0.2446 ± 0.1016***  0.7271 ± 0.0485   0.5912 ± 0.1119*
F4       0.4965 ± 0.0041   0.2625 ± 0.1009***  0.7372 ± 0.0503   0.5978 ± 0.1259*
C3       0.4992 ± 0.0391   0.2653 ± 0.0846***  0.7311 ± 0.0630   0.6488 ± 0.0992
C4       0.5055 ± 0.0567   0.2708 ± 0.1100***  0.7097 ± 0.0124   0.6489 ± 0.1401
P3       0.4582 ± 0.0190   0.2944 ± 0.0850***  0.7551 ± 0.0684   0.6701 ± 0.1120
P4       0.4779 ± 0.0031   0.3070 ± 0.0917**   0.7610 ± 0.0561   0.6068 ± 0.2035
O1       0.4882 ± 0.0386   0.3049 ± 0.0512***  0.7695 ± 0.0798   0.6952 ± 0.0565*
O2       0.4907 ± 0.0445   0.2640 ± 0.1245**   0.7730 ± 0.0209   0.7098 ± 0.1171
F7       0.4131 ± 0.0029   0.2303 ± 0.0916***  0.6538 ± 0.0711   0.5451 ± 0.1123*
F8       0.4310 ± 0.0183   0.2464 ± 0.0944***  0.6786 ± 0.0131   0.5634 ± 0.1625
T5       0.4333 ± 0.0161   0.2238 ± 0.1214**   0.7595 ± 0.0462   0.6651 ± 0.0988*
T6       0.4426 ± 0.0089   0.2689 ± 0.1151**   0.7709 ± 0.0544   0.6760 ± 0.1479


TABLE 3. COMPARISON OF SYEN AND SAMPEN WITH 5S-WINDOW FOR TWO DIFFERENT EEG DATA

         SyEn                                  SampEn
Channel  Controls          Patients            Controls          Patients
Fp1      0.5326 ± 0.0352   0.2728 ± 0.0838***  0.6837 ± 0.0057   0.6293 ± 0.1034
Fp2      0.5384 ± 0.0503   0.2850 ± 0.1128***  0.6988 ± 0.0393   0.6339 ± 0.1412
F3       0.5184 ± 0.0254   0.2560 ± 0.1053***  0.7557 ± 0.0457   0.6138 ± 0.1185*
F4       0.5238 ± 0.0030   0.2836 ± 0.1054***  0.7756 ± 0.0455   0.6205 ± 0.1295*
C3       0.5391 ± 0.0570   0.2807 ± 0.0832***  0.7640 ± 0.0679   0.6701 ± 0.1037*
C4       0.5329 ± 0.0554   0.2896 ± 0.1130***  0.7401 ± 0.0200   0.6699 ± 0.1424
P3       0.4855 ± 0.0164   0.3100 ± 0.0837**   0.7884 ± 0.0604   0.6902 ± 0.1168*
P4       0.5012 ± 0.0071   0.3276 ± 0.0960**   0.7997 ± 0.0466   0.6303 ± 0.2050
O1       0.5140 ± 0.0421   0.3221 ± 0.0494***  0.7929 ± 0.0610   0.7196 ± 0.0636
O2       0.5144 ± 0.0483   0.2827 ± 0.1308**   0.8001 ± 0.0197   0.7342 ± 0.1194
F7       0.4417 ± 0.0009   0.2420 ± 0.0944***  0.6789 ± 0.0622   0.5674 ± 0.1153*
F8       0.4642 ± 0.0223   0.2678 ± 0.1016***  0.7096 ± 0.0045   0.5838 ± 0.1684
T5       0.4513 ± 0.0235   0.2388 ± 0.1240**   0.8039 ± 0.0344   0.6904 ± 0.1058*
T6       0.4650 ± 0.0143   0.2860 ± 0.1205**   0.8088 ± 0.0389   0.6997 ± 0.1521

(For Tables 2 and 3, the comparisons are between the control and epileptic groups using the SyEn and SampEn algorithms (*P<0.05, **P<0.01, ***P<0.001). Control: the healthy subjects; Patient: epilepsy.)

[Figure 3: four panels of entropy vs. number of sections for channel Fp1 - (a) SampEn, 2 s window; (b) SyEn, 2 s window; (c) SampEn, 5 s window; (d) SyEn, 5 s window.]

Fig. 3. Comparisons of the control and epileptic groups using the SyEn and SampEn algorithms, obtained from Fp1 with different windows, where -*- denotes the healthy and -o- denotes the patient.
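To make the windowing behind Fig. 3 and Tables 2-3 concrete, here is a minimal sliding-window driver (our sketch; it reuses the symbolic_entropy helper from section 2, and the 50-sample shift is our reading of the overlap that yields the 57 and 51 sections stated above):

# Sketch: time-varying SyEn over overlapping sliding windows. At 100 Hz,
# a 2 s window holds 200 samples and a 5 s window 500 samples; shifting
# by 50 samples gives 57 and 51 sections on a 30 s (3000-sample) record.
import numpy as np

def sliding_entropy(x, win, step=50, K=8, L=5):
    """symbolic_entropy of each overlapping window of length win."""
    starts = range(0, len(x) - win + 1, step)
    return np.array([symbolic_entropy(x[i:i + win], K, L) for i in starts])

rng = np.random.default_rng(3)
record = rng.random(3000)                 # stand-in for one 30 s channel
e2 = sliding_entropy(record, win=200)     # 2 s windows -> 57 values
e5 = sliding_entropy(record, win=500)     # 5 s windows -> 51 values
print(len(e2), len(e5), e2.mean(), e5.mean())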


4. CONCLUSIONS AND DISCUSSIONS

Observational data of natural systems, as acquired in medical measurements, are typically quite different from those obtained in laboratories. Due to the peculiarities of these data, well-known characteristics such as power spectra or the fractal dimension often do not provide a suitable description. To study such data, we have presented a novel measure of complexity called symbolic entropy, based on symbolic dynamics. The performances of the proposed method and of SampEn were presented for analyzing the complexity of EEG time series collected from both healthy and epileptic subjects.

The results further suggest that the neural information transmission or communication in patients could be more isolated or impaired to some degree compared to that in control subjects. The ability of the entropy measurements to detect these differences provides a means by which the complexity of the EEG signal can be characterized, and thus comparisons can be made between different brain areas and subjects using only a relatively short portion of the signal. One of the considerations in this study is the comparison of different window sizes, such as the 2 s window and the 5 s window, which may provide significant entropy measurements for characterizing the loss of complexity associated with the seizure. It is well known that EEG signals are non-stationary and that it is difficult to obtain a long transient for a reliable estimation of the invariant measures. Our SyEn indicates that either the 2 s window or the 5 s window gives distinguishable differences of the entropy measurements between epileptic patients and healthy people. However, the SampEn cannot distinguish the complexity of the different EEG data in the 2 s window. This shows that the proposed SyEn is more applicable for analyzing these dynamical medical time series. It also provides preliminary support for the notion that the complex nonlinear nature of brain electrical activity may be the result of isolation or impairment of the neural information transmission within the brain.

In addition, we used surrogate data with the SyEn as discriminating statistic to test the nonlinear hypothesis of EEGs. The investigations show that EEG data are chaotic, and applying nonlinear dynamical methods to the analysis of time series of brain electrical activity provides new information about the complex dynamics of the underlying neuronal networks. The proposed entropy algorithm is a useful tool in several fields of complexity analysis in nonlinear science and is meaningful for biomedical engineering.

ACKNOWLEDGEMENTS

This work is supported by the Natural Science Foundation of China (60571066) and the Natural Science Foundation of Guangdong, respectively.

5. REFERENCES

[1] W. J. Freeman, "Simulation of chaotic EEG patterns with a dynamic model of the olfactory system", Biological Cybernetics, 1987, 56:139-150.
[2] W. J. Freeman, "Tutorial on neurobiology: from single neurons to brain chaos", Int. J. Bifurcation and Chaos, 1992, 2:451-482.
[3] D. Gallez, "Predictability of human EEG: a dynamical approach", Biological Cybernetics, 1991, 64:381-391.
[4] J. Roschke, "The dimensionality of human's electroencephalogram during sleep", Biological Cybernetics, 1991, 64:307-313.
[5] L. Werner, "Dimensional analysis of the human EEG and intelligence", Neuroscience Letters, 1992, 143:10-14.
[6] J. Theiler, "Testing for nonlinearity in time series: The method of surrogate data", Physica D, 1992, 58:77-94.
[7] J. Theiler and D. Prichard, "Using 'surrogate surrogate data' to calibrate the actual rate of false positives in tests for nonlinearity in time series", Fields Inst. Comm., 1997, 11:99-.
[8] M. Casdagli, "Chaos and deterministic versus stochastic nonlinear modeling", J. R. Stat. Soc. B, 1992, 54:303-328.
[9] T. Zhang, "Chaotic characteristics of renal nerve peak interval sequence in normotensive and hypertensive rats", Clin. Exp. Pharmacol. Physiol., 1998, 25:896-903.
[10] A. Lempel and J. Ziv, "On the complexity of finite sequences", IEEE Trans. on Information Theory, 1976, IT-22(1):75-81.
[11] S. M. Pincus, "Approximate entropy as a measure of system complexity", Proc. Natl. Acad. Sci., 1991, 88:2297-2301.
[12] J. S. Richman, "Physiological time-series analysis using approximate entropy and sample entropy", Am. J. Physiol., 2000, 278:H2039-H2049.
[13] D. Cysarz et al., "Irregularities and nonlinearities in fetal heart period time series in the course of pregnancy", Herzschr. Elektrophys., 2000, 11:179-183.
[14] D. Cysarz et al., "Entropies of short binary sequences in heart period dynamics", Am. J. Physiol. Heart Circ. Physiol., 2000, 278:2163-2172.
[15] P. B. Graben, "Is it positive or negative? On determining ERP components", IEEE Trans. Biomedical Engineering, 2004, 51:1374-1382.
[16] C. Thornton, "Evaluating depth of anesthesia: Review of methods", International Anesthesiology Clinics, 1993, 31:67-88.
[17] K. Q. Yang et al., "A new approach to transforming time series into symbolic sequences", First Joint BMES/EMBS Conference: Serving Humanity, Advancing Technology, 1999, 6:13-16.
[18] X. Meng, "Coarse graining in complexity analysis of EEG I: Over coarse graining and a comparison among three kinds of complexities", Acta Biophysica Sinica, 2000, 16(4):701-706.
[19] E. Shen, "Coarse graining in complexity analysis of EEG II: The influence of quantization on complexity analysis", Acta Biophysica Sinica, 2000, 16(4):707-710.
[20] J. Cai, "Analysis on the chaotic pseudo-random sequence complexity", Acta Physica Sinica, 2003, 52:1871-1876.
[21] F. H. Xiao et al., "A symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences", Acta Physica Sinica, 2004, 53:2877-2881.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 602--607
Copyright@2007 Watam Press

Prediction of Clinical Response to Treatment of Crohn's Disease by Using RBFN

Igor Grabec, Ivan Ferkolj*, Daša Grabec**, Dušan Grošelj***
Faculty of Mechanical Engineering, University of Ljubljana,
Aškerčeva 6, POB 394, SI-1001 Ljubljana, Slovenia, e-mail: igor.grabec@fs.uni-lj.si
University Medical Center*, Institute of Oncology**, Medical Faculty***
University of Ljubljana, SI-1001 Ljubljana, Slovenia

Manuscript received: January 31, 2007

Abstract— This paper concerns the prediction of patient response to treatment of Crohn's disease with the drug infliximab. As an optimal predictor, a normalized radial basis function neural network is utilized. The information used in the prediction is based on joint data from clinical parameters and response indicators observed in a test group of patients. In the presented example, the network utilizes given oral parameters of selected patients to predict their response to drug administration. In the prediction algorithm, the similarity between the given parameters and those in the database is calculated. It is then used as a weight by which the response of a patient from the test group is predicted. The method thus resembles the prediction performed by a physician based upon comparison of a treated patient with previously tested ones. The prediction quality is estimated from the discrepancy between predicted and observed response data. The prediction quality corresponding to particular clinical parameters provides for their ordering and for the selection of an optimal set of parameters that together yield the maximal quality 0.63 and ∼ 80% coincidence between predicted and observed response categories.

Index Terms— normalized radial basis function NN, clinical response prediction, Crohn's disease, oral parameters

I. INTRODUCTION

One of the basic problems in modern medicine is predicting the effect of an administered drug based on quantitative clinical and biomedical data on the patient's condition. For this purpose, a reliable quantitative predictor representing the relation between the corresponding clinical parameters and the response indicators of the treated patient is needed. The aim of this article is to demonstrate that statistical modeling of the predictor by non-parametric regression, which represents a normalized radial basis function neural network (NRBFNN), enables a generic approach to the solution of this problem. This will be exemplified in modeling the predictor of a Crohn's disease (CD) patient's response to administration of infliximab (a chimeric monoclonal antibody to tumor necrosis factor). This example is being considered because 3 months after treatment with this drug only 44% of patients are in complete remission, while 56% have weak or no remission. Since this drug can cause unfavorable side effects in some patients, a reliable and simple method of predicting their response is sought [1]. Since the origin of Crohn's disease is unknown, it is not a priori evident which parameters of the patients' condition can successfully be applied for this prediction. Our hypothesis is that several oral parameters could be applicable for the purpose of response prediction, since they describe the condition of the upper digestive tract. This hypothesis was derived from clinical observations [2], [3], [4], [5].

In comparison with other parameters recently considered for this purpose [1], [6], [7], oral parameters can easily be determined using non-invasive methods, and therefore appear convenient in clinical application, provided that they yield sufficient information for prediction. As a set of oral parameters describing the patient's condition before drug administration, we consider here dental, periodontal and oral mucosa parameters, as well as morphotypes in subgingival plaque detected by darkfield microscopy. The objective of this article is to prove our hypothesis statistically and to propose a method for its application.

Among the different statistical estimators applicable for describing the effects of medical treatment by various drugs, logistic regression and the correlation between a drug dosage and an indicator of the patient's condition are most frequently applied [1], [8], [9], [10], [11]. However, in cases when many indicators are included, a general non-parametric regression appears to be better suited for prediction purposes, especially when no functional form of the predictor is given in advance [12]. To demonstrate the applicability of this method, it was recently used for the prediction of the healing process of periodontitis [7]. In the following sections the same method is adapted to processing Crohn's disease data, and its applicability is examined by estimating the prediction quality of the given data. An additional objective of our presentation is to demonstrate that this method is also applicable as a selector of the optimal parameters appropriate for prediction.

II. MODELING THE PREDICTOR

Let us consider a case in which the patient's state is characterized by M variables that comprise the components {v_m; m = 1, ..., M} of a random vector V = (v_1, ..., v_M). Repeated observations of equally prepared states of a phenomenon generally yield scattered values of the measured variables. The result of the measurements is most frequently represented by the mean values (<v_1>, ..., <v_M>) = <V> and the standard deviations (σ_1, ..., σ_M). However, more in tune with the stochastic character of the measurements is describing the scattering of the state variable V around its mean <V>


by the density of the probability distribution ψ(V − <V>), known as the scattering function. Quite often this function characterizes the properties of the instruments utilized in mutually independent measurements. Without loss of generality we may assume that the scatterings of the different components, but not the components themselves, are mutually statistically independent. All standard deviations can further be equalized, σ_m = σ; m = 1, ..., M, by normalization or adaptation of the measurement scales. We assume this property in the following treatment. The form of the scattering function pertaining to a particular case can be estimated objectively from measured data, but it turns out that the scattering function of the m-th component can often be expressed analytically by the Gaussian function:

\psi(v_m - \langle v_m \rangle) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(v_m - \langle v_m \rangle)^2}{2\sigma^2}\right],    (1)

which we further apply in our treatment. We also assume that the measurements of the components v_1, ..., v_M are mutually independent and express the multivariate scattering function by the product

\psi(V - \langle V \rangle) = \prod_{m=1}^{M} \psi(v_m - \langle v_m \rangle).    (2)

If only one measurement is performed on a selected subject, we interpret the acquired data vector as the most probable one and consequently utilize it as the mean value of the scattering function ψ(V − <V>). In this case the scattering function can still be expressed by the last two equations, provided that the standard deviation σ is determined by additional measurements.

In order to characterize the properties of the phenomenon under consideration, we have to perform measurements on a representative set of, say, N samples. A measurement on the n-th sample yields the vector V_n = (v_{1,n}, ..., v_{M,n}), which we further consider as the center of the scattering function ψ(V − V_n). Since measured data generally vary from sample to sample, we treat V as a random variable whose properties can be statistically specified based on the examination of the representative statistical set of state vectors {V_1, ..., V_N}, which forms a database. We consider this database as a source of quantitative empirical information and utilize it when characterizing various relations between the measured variables statistically [12]. For example, in the case presented later in this article, the database represents the clinical parameters and response indicators of a group of patients suffering from Crohn's disease [1].

Relations between various components of the vector V are frequently sought. Such a relation is generally expressed by some regression function. If the structure of the function is known in advance, then only the parameters in the function are estimated by minimizing the statistical error [11]. However, there are many examples in the natural sciences where the structure of the regression function is not known, and we must formulate the regression function non-parametrically [12]. A typical example is encountered when the response of a patient to a certain treatment has to be predicted from the given parameters of his or her condition [13].

In this article we consider a patient whose initial state is described by given components of the vector V, and we want to estimate his or her response to treatment described by the complementary hidden components. The given components comprise a partial vector G = (v_1, ..., v_g, ø_{g+1}, ..., ø_M), in which the sign ø denotes a component that is missing. Similarly, the hidden components comprise the vector H = (ø_1, ..., ø_g, v_{g+1}, ..., v_M), which we treat as the missing complement of G. A complete vector V is thus expressed by the concatenation V = G ⊗ H, and the vectors of the database by V_n = G_n ⊗ H_n. Our task is to estimate H from the probability density function (1) subject to the condition that the vector G is given. We consider as an optimal predictor of H that vector H_o for which the statistical mean-square error at a given data vector G is a minimum:

E[(H - H_o)^2 \mid G] = \min(H_o).    (4)

Here E[...] denotes the statistical average or mean value. The minimum occurs where dE[(H − H_o)^2 | G]/dH_o = 0. This equation yields as the optimal estimator H_o the conditional average (CA):

H_o(G) = E[H \mid G] = \int H\, f(H \mid G)\, dH    (5)

This estimator determines a non-parametric regression of H on G and can be interpreted as a general statistical model of the function H(G). If we express the CA by the estimated joint probability density (1), we obtain the following expression [12]:

H_o(G) = \frac{\sum_{n=1}^{N} H_n\, \psi(G - G_n)}{\sum_{j=1}^{N} \psi(G - G_j)} = \sum_{n=1}^{N} H_n B_n(G).    (6)

The coefficients

B_n(G) = \frac{\psi(G - G_n)}{\sum_{j=1}^{N} \psi(G - G_j)}    (7)

Properties of the variable V are statistically described by are called the basis functions and satisfy the conditions
the joint probability density function f(V), which can be N

described by the kernel estimator [12]: Bn (G) = 1 , 0 ≤ Bn (G) ≤ 1. (8)
N
1 
n=1
f(V) = ψ(V − Vn) . (3) The basis functions Bn (G) can be interpreted as a normalized
N n=1
measure of similarity between the given G and the sample
Here the kernel ψ(V − Vn ) denotes a multivariate Gaussian Gn. Hence the vector Hn whose complement Gn is most
function, while the value of its width σ will be specified similar to the given vector G mainly determines the optimal
more in detail later on. Using f(V), various quantities can predictor Ho (G). Therefore, this predictor represents an as-
be estimated by a statistical average. Among them, relations sociative recall of information from the database. It resembles
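Because Eqs. (6)-(8) involve nothing more than Gaussian weighting, the recall step of the NRBFNN can be sketched in a few lines of code. The following fragment is our illustration only; the function name and array layout are assumptions, not the authors' implementation:

```python
import numpy as np

def nrbf_predict(G, G_db, H_db, sigma):
    """Conditional-average (NRBFNN) predictor of Eqs. (6)-(8).

    G     : (g,)   given components of a new record
    G_db  : (N, g) given components of the database records G_n
    H_db  : (N, h) hidden components of the database records H_n
    sigma : Gaussian kernel width
    """
    # Gaussian similarity of G to each stored sample G_n
    d2 = np.sum((G_db - G) ** 2, axis=1)
    psi = np.exp(-d2 / (2.0 * sigma ** 2))
    # Normalized basis functions B_n(G) of Eq. (7)
    B = psi / np.sum(psi)
    # Optimal predictor H_o(G) = sum_n H_n B_n(G), Eq. (6)
    return B @ H_db
```

Each database record acts as one hidden unit with Gaussian activation; the normalization producing B implements Eq. (7), so the conditions of Eq. (8) hold by construction.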


A. Prediction quality

A predictor maps the random variable G to a new random variable H_o(G), which generally differs from the variable H. When the variables G, H are related by some natural law, we expect that the first and second statistical moments E[H − H_o], E[(H − H_o)²] of the prediction error are also small. The second moment is E[(H − H_o)²] = Var(H) + Var(H_o) − 2Cov(H, H_o) + [m(H) − m(H_o)]², where E, m, Var, Cov denote statistical average, mean value, variance and covariance, respectively. In the case of statistically independent variables H and H_o with equal mean values we get E[(H − H_o)²] = Var(H) + Var(H_o). With respect to this property we define predictor quality by the formula

  Q = 1 − E[(H − H_o)²] / (Var(H) + Var(H_o))
    = 2Cov(H, H_o) / (Var(H) + Var(H_o)) − [m(H) − m(H_o)]² / (Var(H) + Var(H_o)).   (9)

The quality is 1 if the prediction is exact, H_o = H, while it is 0 if H and H_o are statistically independent and have equal mean values. The quality Q approximately corresponds to the correlation coefficient between H_o and H. Q can be negative if the correlation coefficient is negative or if m(H) ≠ m(H_o). In the case of a small number N of samples, the estimated quality could be subject to considerable statistical variation, but with an increasing number of samples N we generally expect that the statistically estimated CA will increasingly better represent the underlying natural law. The distributions of predicted and actually observed data need not completely coincide. The discrepancy between the corresponding mean values and standard deviations can simply be removed by normalization of the predicted data, which is also utilized in our treatment.

The quality of prediction generally depends on the properties of the vector H. Since H is a random variable, the quality should be estimated as a statistical mean over values calculated from various statistical samples of H. For this purpose an additional data set must be provided besides the database utilized in the predictor [14]. When only one set of data records is provided, it is usually split into a database for modeling the predictor and a remaining data set utilized for testing. If the given set includes a small number of data records, then it is reasonable to exclude only one record for validation of predictor performance and to utilize the remaining samples for modeling. In this case the procedure can be sequentially repeated by excluding a different sample in each step. Prediction quality is then determined by the mean of the values obtained by this repetition. In the later demonstration we utilize this cross-validation method to estimate prediction quality.

Prediction quality generally depends on the settable kernel parameter σ, which determines the smoothing of the probability density function. The value of σ at which the quality attains a maximum is considered as optimal for the modeling of the predictor. In our calculations, this value is equal to 0.7 of the standard deviation of the data.

Prediction quality generally depends on the number of components comprising the vector G, and it is often observed that not all components are equally applicable for modeling; moreover, some of them could be disturbing. We roughly estimate the applicability of a particular component by considering it as the only component of G and calculating the corresponding prediction quality. In such a manner all components that are considered as given can be ordered with respect to the prediction quality pertaining to them. By joining the components that are most informative, we simply proceed to a combination that eventually corresponds to an optimal set of components for modeling the predictor. In our example we follow this procedure for the specification of an optimal set of oral parameters.
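The leave-one-out estimate of quality just described is straightforward to script. A minimal sketch, reusing the hypothetical nrbf_predict routine from the earlier fragment (the function names are ours, not the authors'):

```python
import numpy as np

def prediction_quality(H, Ho):
    """Predictor quality Q of Eq. (9) for one response component."""
    H, Ho = np.asarray(H, float), np.asarray(Ho, float)
    mse = np.mean((H - Ho) ** 2)
    return 1.0 - mse / (np.var(H) + np.var(Ho))

def loo_quality(G_db, H_db, sigma):
    """Leave-one-out cross-validation: predict each record from the others,
    then average the quality over the hidden components."""
    N = len(G_db)
    Ho = np.array([
        nrbf_predict(G_db[n], np.delete(G_db, n, axis=0),
                     np.delete(H_db, n, axis=0), sigma)
        for n in range(N)
    ])
    return np.mean([prediction_quality(H_db[:, j], Ho[:, j])
                    for j in range(H_db.shape[1])])
```

Scanning σ over a grid and keeping the maximizer of loo_quality reproduces the rule stated above for choosing the kernel width (in this study roughly 0.7 of the data standard deviation).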


III. PREDICTION OF PATIENT RESPONSE TO TREATMENT OF CROHN'S DISEASE

We apply the NRBFNN to model the relationship between oral condition and the patient's response indicators in treating CD with the drug infliximab. Data on each patient are treated as a statistical sample of a random variable V. A particular sample is denoted by the index n and characterized by M = 30 measured data comprising the components of the n-th data vector V_n. The data vectors obtained in a group of N = 14 patients are combined in the initial database, providing information about the relation between 27 oral condition parameters and 3 response parameters, as specified in TABLE I. The values of the database are shown in Fig. 1 by a set of 14 records. The response of Crohn's disease patients at 8 weeks, 3 months and 6 months after drug administration is represented by the level of the CD activity index, three discrete scores of which correspond to complete (denoted by the level 3), partial (2), and no remission (1). Each component of the data vector is further treated as a random variable whose mean value and standard deviation are statistically estimated using the N = 14 samples from the database. Using these statistics, all components are normalized before application in the NN.

Fig. 1. Initial database comprising N = 14 samples of vector V. A record represents a sample V_n; n = 1...14 and is displaced in the vertical direction by the same amount with respect to the previous one. Components with indices between 1 and 27 represent oral parameters, while components with indices between 28 and 30 represent the response of patients to treatment by the drug at various time intervals.

A. Results

In predictor modeling, we assume that partial information about a new patient is given by his/her oral parameters, and we then predict the response indicators. The oral parameters comprise the given (G) components and the response indicators the hidden (H) components of the data vector V. The optimal predictor H_o is then modeled non-parametrically by the NRBFNN. The quality of prediction Q is estimated by the previously mentioned cross-validation method. The applicability of each oral parameter for modeling patient response is determined by calculating its prediction quality Q. The resulting dependence of prediction quality on the oral index is shown in Fig. 2. By using these data, the oral parameters are ordered with respect to their quality, as shown in Fig. 3. An optimal set of oral parameters is formed by joining to G those with high quality, excluding the others, and observing the resulting quality. In this manner, the set that yields maximal prediction quality is determined. It consists of 8 parameters, printed in bold in TABLE I. The optimal set of oral parameters joined with the response parameters is further treated as a reduced database. It is presented in Fig. 4 by normalized data. The set of normalized response parameters determined in patients and those predicted from the optimal set is shown in Fig. 5. At 8 weeks after drug administration, the predicted response (index 9) coincides well with the observed one, but with increasing time intervals (indices 10, 11) this agreement diminishes. This is quantitatively demonstrated in Fig. 6, showing the dependence of the mean prediction discrepancy and the corresponding error on the response index. The corresponding dependence of prediction quality is shown in Fig. 7. The value of prediction quality at the first time interval is Q = 0.63. This means that the oral parameters of the optimal set contribute substantially to prediction quality. However, inclusion of other oral parameters, on average, decreases the prediction quality. With increasing time intervals, prediction quality decreases, meaning that the correlation between predicted and observed responses becomes ever lower. In spite of the observed decrease of prediction quality with increasing time intervals, the mean and variance of the predicted and observed data coincide for all time intervals. This agreement is achieved by additional normalizing of the data obtained from the non-parametric regression.

In our research the response of the examined patients was represented by the discrete scores 3, 2, 1, indicating the categories of complete, partial or no remission, respectively. In contrast to this discrete scale, the application of the conditional average yields a continuous scale for the predicted response indicator. This discrepancy can be avoided by rounding the predicted value to the closest integer score. By such tuning, the coincidence between predicted and observed category values at the first time interval (index 9) is ≈ 80%. The distribution of the category prediction error versus response index is shown in Fig. 8. The highest peaks correspond to accurate prediction with zero error. As could be expected, the accuracy of prediction also in this case diminishes with increasing time intervals after application of the drug.

Fig. 2. Relation between oral index and the corresponding prediction quality Q.

Fig. 3. Graph of ordered prediction quality Q. Prediction quality decreases evenly with the increasing order index.

Fig. 4. Normalized reduced database. Components with indices between 1 and 8 represent reduced oral parameters, while components with indices between 9 and 11 represent the response of patients to treatment at various time intervals.

Fig. 5. Predicted (*) and measured (o) response data.

Fig. 6. The mean prediction discrepancy (bottom) and error (top) versus response index.

Fig. 7. Prediction quality Q versus response index.

Fig. 8. Error of predicted category versus response index.
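The ordering-and-joining construction of the optimal parameter set can likewise be automated. The sketch below is a simplified greedy reading of that procedure (all names are ours; it reuses loo_quality from the earlier fragment and is an illustration, not the authors' code):

```python
import numpy as np

def select_parameters(G_db, H_db, sigma):
    """Rank each parameter by its single-parameter leave-one-out quality,
    then join parameters in that order, keeping a parameter only if the
    joint quality improves (a greedy simplification of the procedure)."""
    M = G_db.shape[1]
    # quality of each parameter used alone, as in Fig. 2
    single = [loo_quality(G_db[:, [m]], H_db, sigma) for m in range(M)]
    order = np.argsort(single)[::-1]          # best first, as in Fig. 3
    chosen, best = [], -np.inf
    for m in order:
        trial = chosen + [int(m)]
        q = loo_quality(G_db[:, trial], H_db, sigma)
        if q > best:                          # keep only improving joins
            chosen, best = trial, q
    return chosen, best
```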


IV. CONCLUSIONS

In our study we have considered the hypothesis that certain oral signs and symptoms can be applied for the purpose of predicting the response to infliximab administration in treating Crohn's disease. The aim of this article was to prove the hypothesis statistically, based on an estimation of predictor quality. The rather high level of modeled predictor quality and the relatively good coincidence between predicted and observed response scores indicate that this hypothesis cannot be rejected.

Since the etiology of Crohn's disease is still unknown, it is not entirely clear which parameters of a patient's condition could be successfully applied for prediction of response. A recent study suggests that the likelihood of a patient with Crohn's disease achieving complete remission 3 months after treatment with infliximab can be predicted from immunological parameters measured in the inflamed intestinal mucosa before treatment [1]. As demonstrated in our article, the clinical response of patients can also be successfully predicted based on the following parameters: recurrent or persistent swelling of the lips and cheeks; missing, healthy and non-vital teeth, and their total number; and the percentage of large rods, spirochetes and the total number of bacteria. These parameters are rather easily determined by non-invasive methods, and therefore appear promising for clinical use in comparison with other parameters recently examined for this purpose [6].

In cases that include many indicators, the NRBFNN appears to be better suited for prediction purposes than logistic regression and correlation between a drug amount and an indicator of the patient's condition [1]. Therefore, a study similar to the one presented here has been done using endoscopic biotic samples instead of oral parameters. In this case, the level of optimal prediction quality reaches a value of 0.73. However, when both kinds of optimal parameters are used together as given data, this level rises to approximately 0.9.

The value of prediction quality 8 weeks after drug administration, as determined from the optimal set of oral parameters, is 0.63, which is significantly higher than the prediction quality determined from a single parameter. This indicates that the method is also applicable as a selector of optimal parameters appropriate for prediction. Therefore, the method yields new information for the study of the relation between Crohn's disease therapy and the condition of patients, and eventually also for the interpretation of other clinical symptoms.

In conclusion, our study has shown that an optimal set of measured oral parameters in patients suffering from Crohn's disease can identify those in whom remission at a time interval of 8 weeks can be expected. We are aware that the number of patients studied is small and that the group is not homogeneous, since both luminal and fistulizing forms of Crohn's disease are treated. Further studies with larger numbers of patients and subgroup analysis are necessary before this new response indicator prediction method can be proposed for routine use.

TABLE I
VARIABLES USED IN MODELING

I  | II | III                                    | IV         | V
1  | 25 | Oral ulcers / (fr)1                    | 0.1±0.3    | -0.808
2  | 1  | Swelling of lips and cheeks / (fr)1 *  | 0.1±0.4    | +0.365
3  | 24 | Recurrent oral aphthae / (fr)1         | 0.6±0.5    | -0.775
4  | 26 | Hyperplasia of the mucosa / (fr)1      | 0.1±0.3    | -0.830
5  | 19 | Decayed teeth / (fr)1                  | 5.7±3.8    | -0.459
6  | 10 | Filled teeth / (fr)1                   | 8.7±4.5    | +0.001
7  | 7  | Missing teeth / (fr)1 *                | 5.9±6.3    | +0.110
8  | 8  | Healthy teeth / (fr)1 *                | 10.7±6.7   | +0.095
9  | 2  | Non-vital teeth / (fr)1 *              | 1.5±1.3    | +0.214
10 | 14 | Root canal filled teeth / (fr)1        | 1.1±1.3    | -0.264
11 | 6  | Total number of teeth / (fr)1 *        | 25.1±5.7   | +0.123
12 | 21 | Impacted teeth / (fr)1                 | 1.0±1.2    | -0.510
13 | 22 | Width of keratinized gingiva / (mm)3   | 4.1±0.6    | -0.559
14 | 16 | Probing depth / (mm)3                  | 1.7±0.4    | -0.379
15 | 9  | Gingival margin / (mm)3                | -0.2±0.5   | +0.042
16 | 15 | Clinical attachment level / (mm)3      | 1.8±0.8    | -0.365
17 | 18 | Bleeding on probing / (fr)1            | 0.2±0.1    | -0.415
18 | 13 | Plaque index                           | 0.7±0.4    | -0.203
19 | 11 | Gingival index                         | 0.7±0.4    | -0.034
20 | 17 | Cocci / (%)4                           | 50.8±37.2  | -0.398
21 | 12 | Small rods / (%)4                      | 5.3±5.1    | -0.189
22 | 4  | Large rods / (%)4 *                    | 5.9±12.2   | +0.201
23 | 23 | Small motile rods / (%)4               | 35.8±33.7  | -0.709
24 | 27 | Large motile rods / (%)4               | 1.4±5.2    | -0.830
25 | 5  | Spirochetes / (%)4 *                   | 0.7±1.8    | +0.186
26 | 3  | Total number of bacteria / (fr)1 *     | 132.8±65.1 | +0.205
27 | 20 | Gingival crevicular fluid / (N)2       | 10.6±3.7   | -0.507
28 |    | CD resp. ind. at 8 weeks / (N)2        | 2.6±0.5    |
29 |    | CD resp. ind. at 3 months / (N)2       | 2.4±0.6    |
30 |    | CD resp. ind. at 6 months / (N)2       | 2.0±0.9    |

A. Legend of Table

I – index of the component in the initial database; II – order index; III – meaning of the measured data vector components; IV – mean value and standard deviation of the group; V – prediction quality. The set of optimal parameters is marked with *. Units: (fr)1 – frequency in the group, (N)2 – number, (mm)3 – millimeter, (%)4 – percentage.

ACKNOWLEDGMENT

This research was supported by the Ministry of Higher Education, Science and Technology of the Republic of Slovenia, EU-COST, and Schering-Plough AG, Slovenia.

REFERENCES

[1] J. Ferkolj, A. Ihan, S. Markovic, "CD19+ in intestinal mucosa predict the response to infliximab in Crohn's disease", Hepatogastroenterology, vol. 52 (64), pp. 1128-1133, 2005.
[2] S. Harty, P. Fleming, M. Rowland, E. Crushell, M. McDermott, B. Drumm and B. Bourke, "A prospective study of the oral manifestations of Crohn's disease", Clin. Gastroenterol. Hepatol., vol. 3 (9), pp. 86-91, 2005.
[3] L. Halme, J.H. Meurman, P. Laine, K. von Smitten, S. Syrjanen, C. Lindqvist, I. Strand-Pettinen, "Oral findings in patients with active or inactive Crohn's disease", Oral Surg. Oral Med. Oral Pathol., vol. 76 (2), pp. 175-181, 1993.
[4] M. Plauth, H. Jenss and J. Meyle, "Oral manifestations of Crohn's disease. An analysis of 79 cases", J. Clin. Gastroenterol., vol. 13, pp. 29-37, 1991.
[5] U. Mahadevan and W.J. Sandborn, "Infliximab for the treatment of orofacial Crohn's disease", Inflamm. Bowel Dis., vol. 7, pp. 38-42, 2001.
[6] D. Grošelj and I. Grabec, "Statistical modeling of tooth mobility after treatment of adult periodontitis", Clin. Oral Invest., vol. 6, pp. 28-38, 2002.
[7] I. Grabec and D. Grošelj, "Detection and prediction of tooth mobility during the periodontitis healing process", Comput. Methods Biomech. Biomed. Engin., vol. 6, pp. 319-328, 2003.
[8] S. Vermeire, E. Louis, A. Carbonez, G. van Asche, M. Noman, J. Belaiche, M. de Vos, A. van Gossum, P. Pescatore, R. Fiasse, P. Pelckmans, H. Reynaert, G. d'Haens, P. Rutgeert, "Belgian group of infliximab expanded access program in Crohn's disease. Demographic and clinical parameters influencing the short-term outcome of anti-tumor necrosis factor (infliximab) treatment in Crohn's disease", Am. J. Gastroenterology, vol. 97, pp. 2357-2363, 2002.
[9] M. A. Parsi, J. P. Achkar, S. Richardson, J. Katz, J. P. Hammel, B. A. Lashner and A. Brzezinski, "Predictors of response to infliximab in patients with Crohn's disease", Gastroenterology, vol. 123, pp. 707-713, 2002.
[10] R. H. Riffenburgh, Statistics in Medicine, Amsterdam: Elsevier, 2005.
[11] F. E. Harrell, Regression Modelling Strategies: With Applications to Linear Models, Logistic Regression and Survival Analysis, Berlin: Springer-Verlag, 2006.
[12] I. Grabec and W. Sachse, Synergetics of Measurement, Prediction and Control, Berlin: Springer-Verlag, 1997, pp. 237-242.
[13] P. S. Rosenberg, H. Katki, C. A. Swanson, L. M. Brown, S. Wacholder, R. N. Hoover, "Quantifying epidemiologic risk factors using nonparametric regression: model selection remains the greatest challenge", Statistics in Medicine, vol. 22, pp. 3369-3381, 2003.
[14] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., New York, NY: Macmillan College Publishing Company, 1999.
[15] I. Grabec, "Extraction of Physical Laws from Joint Experimental Data", Eur. Phys. J. B, vol. 48, pp. 279-289, 2005. (DOI: 10.1140/epjb/e2005-00391-0)


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 608--611
Copyright@2007 Watam Press

Study on Fuzzy Evaluating Neural Network for University Student Credit

QINGYU XIONG 1,2, JING CHEN 1,2, QI HUANG 2


1 The Key Lab of High Voltage Engineering & Electrical New Technology of the Education Ministry of China
2 Automation Department of Chongqing University, Chongqing, China, 400030

Abstract: This paper presents an evaluation method based on a neural network and a fuzzy evaluation model with multiple layers and multiple indexes, which evaluates university student credit effectively. The method integrates fuzzy mathematics into a neural network and trains the network to modify the fuzzy parameters. The model uses expert experience to evaluate university student credit and simplifies the evaluating process effectively.

1 Introduction

Personal credit files play a very important role in every aspect of modern life abroad, and personal credit in China is becoming necessary with the trend of social development. University student credit, especially, is the basis of personal credit, so the study of university student credit is very important. This paper presents an index system of university student credit and a university student credit evaluation model.

Much research has been done on personal credit evaluation at home and abroad. There are many comparatively mature methods in the field of personal credit evaluation, such as linear regression [1], discriminant analysis [2], logistic regression [3], linear programming [4] and so on. In the last few years, non-parametric statistics and artificial intelligence techniques, such as the nearest neighbor algorithm [5], neural networks [6], [7] and genetic algorithms [4], [8], have been used in personal credit evaluation models. All these methods are effective and feasible, but they have some limitations: every time a case is processed, it must first be analyzed and then evaluated comprehensively, so successful prior cases cannot be utilized and expert knowledge cannot be accumulated.

Many indexes of university student credit are fuzzy information. The key to handling fuzzy information is the generation of the membership functions and the weights, but these are not easy to generate and take a long time to adjust. A neural network can realize complicated nonlinear mappings and has learning ability, so fuzzy logic can be embedded in a neural network to handle the fuzzy information and adjust the membership functions and the weights. In this paper, we present a fuzzy evaluating neural network for university student credit. The model integrates the fuzzy comprehensive evaluation method into a neural network. It makes use of the nonlinearity, fault tolerance and self-learning ability of neural networks [9] to adjust the membership functions and the weights. Trained repeatedly, the neural network can reflect the non-linear relation between the index system and the evaluation and can accumulate expert knowledge. The model can reduce workload and subjectivity in evaluation, and the scientific soundness and rationality of the evaluation are improved.

This paper is organized as follows. In Section 2, the evaluating model structure and index system are developed. The fuzzy evaluating neural network is introduced in Section 3. In Section 4, we give a simulation example. Finally, brief conclusions of this paper are drawn in Section 5.

2. Index System

According to the fully-fledged index system for personal credit evaluation and the practical situation of China, we establish a two-level comprehensive evaluation index system, with two 1st-level indexes and eleven 2nd-level indexes (see Table 1).

Table 1. The evaluation index system for university student credit

First-level indexes: Nature Information u(1) | Economic Information u(2)
Second-level indexes under u(1): Grades ranking u1(1); Moral score u2(1); Prize and punishment u3(1); Cheating in an exam u4(1); Attendance u5(1); Participation in socially useful activity u6(1)
Second-level indexes under u(2): Scholarship u1(2); Part-time job earning u2(2); Other earning u3(2); Payment of school fee u4(2); Repayment of the loan u5(2)

There isn't a uniform standard to measure the indexes, and some indexes are qualitative rather than quantitative, so they are difficult to analyze and process. The indexes used for input should be quantified by an expert scoring method. The quantified attribute values of the indexes are shown in Table 2.

Table 2. Quantification of qualitative indexes

Index name | Quantified attribute value of index
Grades ranking u1(1) | A=10, B=8, C=6, D=4, E=2
Moral score u2(1) | A=5, B=4, C=3, D=2, E=1
Prize and punishment u3(1) | State-level prize=4, City-level prize=3, School-level prize=2, No=0, Warning=-1, Gig=-2, Disciplinary probation=-3
Cheating in an exam u4(1) | No=1, Yes=-1
Attendance u5(1) | No=1, Leave=-1, Late=-2, Leave early=-3, Absence=-4
Participation in socially useful activity u6(1) | Frequently=2, Occasionally=1, Never=0
Scholarship u1(2) | A=4, B=3, C=2, D=1
Part-time job earning u2(2) | Yes=7, No=3
Other earning u3(2) | Yes=4, No=2
Payment of school fee u4(2) | On time=8, Actively pay after payment=5, Fallen into arrears=0
Repayment of the loan u5(2) | On time=6, Work off arrears=-1, Fallen into arrears=-2

3. Fuzzy Evaluating Neural Network

Based on the characteristics of neural networks and fuzzy comprehensive evaluation, this paper puts forward the fuzzy evaluating neural network (FENN). The structure is shown in Fig. 1. The FENN is not a black box; it corresponds to the fuzzy comprehensive evaluation in structure, and its nodes and parameters have physical meaning. The two levels of indexes are input on the first layer, and the membership degrees to each evaluation level are generated on the second layer. The third layer completes the first fuzzy comprehensive evaluation and the fourth layer accomplishes the second fuzzy comprehensive evaluation. The output of the fifth layer is the result of the fuzzy comprehensive evaluation.

Fig. 1. Fuzzy evaluating neural network structure for university student credit.

(1) The first layer is the input layer, where the eleven quantified indexes are input. The input and output of the first layer are

  o1_k(i) = u_k(i),  k = 1,...,6 for i = 1;  k = 1,...,5 for i = 2.   (3.1)

(2) The outputs of the second layer are the membership degrees to each evaluation level. There are 5 evaluation levels, so the number of nodes on the second layer is 5 × 11. The neuron functions of the second layer are the membership functions shown in formulas (3.2) to (3.4):

  μ1(x) = 1 for x ≤ x1;  μ1(x) = exp(−(x − x1)²/σ²) for x > x1,   (3.2)

  μj(x) = exp(−(x − xj)²/σ²),  j = 2, 3, 4,   (3.3)

  μ5(x) = exp(−(x − x5)²/σ²) for x ≤ x5;  μ5(x) = 1 for x > x5.   (3.4)

The input and output of the second layer are

  net2_jk(i) = o1_k(i),  j = 1,...,5,   (3.5)

  o2_jk(i) = μ_jk(i)(o1_k(i)),  j = 1,...,5.   (3.6)

(3) The synthesis operator M accomplishes the first fuzzy synthesis on the third layer and the second synthesis on the fourth layer. The synthesis operator M is shown in formula (3.7). There are 5 × 2 nodes on the third layer and 5 nodes on the fourth layer:

  M(×, ⊕) = ∑ w·o.   (3.7)

The input and output of the third layer are

  o3_m(i) = M(w_k(i), o2_mk(i)),  m = 1,...,5.   (3.8)

The input and output of the fourth layer are

  o4_n = M(w_i, o3_n(i)),  n = 1,...,5.   (3.9)

(4) The output of the fifth layer is the result of the fuzzy comprehensive evaluation. There is 1 node on the fifth layer. Its input and output are

  o5 = net5 = ∑_{n=1}^{5} v_n o4_n / ∑_{n=1}^{5} o4_n.   (3.10)
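To make the five-layer forward pass concrete, the sketch below implements our reading of formulas (3.1)-(3.10); the array shapes, the saturation of the boundary memberships, and all names are assumptions recovered from the text, not the authors' code:

```python
import numpy as np

def membership(x, centers, sigma):
    """Five membership degrees of formulas (3.2)-(3.4): Gaussian bumps at
    centers x1..x5, saturated to 1 below x1 and above x5."""
    mu = np.exp(-((x - centers) ** 2) / sigma ** 2)
    if x <= centers[0]:
        mu[0] = 1.0
    if x > centers[-1]:
        mu[-1] = 1.0
    return mu

def fenn_output(u1, u2, w1, w2, W, c1, s1, c2, s2, v):
    """Forward pass (3.1)-(3.10). u1: six 1st-group indexes, u2: five
    2nd-group indexes; w1, w2: 2nd-level weights; W: the two 1st-level
    weights; c*, s*: per-index membership centers/widths (Table 3);
    v: scores attached to the five evaluation levels."""
    # layers 1-2: membership degrees of every index, eqs (3.1)-(3.6)
    o1 = np.array([membership(x, c1[k], s1[k]) for k, x in enumerate(u1)])
    o2 = np.array([membership(x, c2[k], s2[k]) for k, x in enumerate(u2)])
    # layer 3: first fuzzy synthesis within each index group, eq (3.8)
    o3 = np.stack([w1 @ o1, w2 @ o2])          # shape (2, 5)
    # layer 4: second synthesis across the two groups, eq (3.9)
    o4 = W @ o3                                # shape (5,)
    # layer 5: weighted-average defuzzification, eq (3.10)
    return (v @ o4) / np.sum(o4)
```

With the Table 3 centers and widths and the Table 4 weights as initial values, fenn_output would reproduce the untrained network's score for one student.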
4 Simulation

The paper selects the eleven quantified indexes as the input. 80 groups of data are chosen as the training samples of the neural network. The input vector is the 2nd-level indexes X = (u1(1),...,u6(1), u1(2),...,u5(2)) and the output vector Y is the value of the evaluation. So the input layer has 11 nodes and the output layer has 1 node.

The algorithm shown in formulas (4.1)-(4.4) is used to modify the weights and the parameters of the membership functions. The initial parameters of the membership functions are given in Table 3. The initial weights are determined by the AHP and shown in Table 4.

  Δw_i = (1/5) ∑_{m=1}^{5} η (d − y) v_n o3_m(i)   (4.1)

  Δw_k(i) = (1/5) ∑_{j=1}^{5} η (d − y) v_n w_i o2_jk(i)   (4.2)

  Δx_kj(i) = (2/5) ∑_{j=1}^{5} η (d − y) v_n w_i w_k(i) o2_kj(i) e^{−(x − x_kj(i))²/σ²} (x − x_kj(i)) / σ²_k(i)   (4.3)

  Δσ_k(i) = (2/5) ∑_{j=1}^{5} η (d − y) v_n w_i w_k(i) o2_kj(i) e^{−(x − x_kj(i))²/σ²} (x − x_kj(i))² / σ³_k(i)   (4.4)

Table 3. The initial parameters of the membership functions

Index | x1 | x2 | x3 | x4 | x5 | σ
Grades ranking u1(1) | 0 | 11/4 | 11/2 | 33/4 | 11 | 11/4
Moral score u2(1) | 0 | 3/2 | 3 | 9/2 | 6 | 3/2
Prize and punishment u3(1) | -4 | -7/4 | 1/2 | 11/4 | 5 | 9/4
Cheating in an exam u4(1) | -1 | -1/4 | 1/2 | 5/4 | 2 | 3/4
Attendance u5(1) | -4 | -5/2 | -1 | 1/2 | 2 | 3/2
Participation in socially useful activity u6(1) | 0 | 3/4 | 3/2 | 9/4 | 3 | 3/4
Scholarship u1(2) | 0 | 5/4 | 5/2 | 15/4 | 5 | 5/4
Part-time job earning u2(2) | 0 | 2 | 4 | 6 | 8 | 2
Other earning u3(2) | 0 | 5/4 | 5/2 | 15/4 | 5 | 5/4
Payment of school fee u4(2) | 0 | 9/4 | 9/2 | 27/4 | 9 | 9/4
Repayment of the loan u5(2) | -2 | 1/4 | 5/2 | 19/4 | 7 | 9/4

Table 4. The initial weights of each level

First-level indexes: Nature Information u(1) 0.568 | Economic Information u(2) 0.432
Second-level indexes: Grades ranking u1(1) 0.128 | Scholarship u1(2) 0.184 | Moral score u2(1) 0.362 | Part-time job earning u2(2) 0.396 | Prize and punishment u3(1) 0.216 | Other earning u3(2) 0.128 | Cheating in an exam u4(1) 0.145 | Payment of school fee u4(2) 0.179 | Attendance u5(1) 0.149 | Repayment of the loan u5(2) 0.113 | Participation in socially useful activity u6(1) 0.082

The learning rate is 0.01, the momentum is 0.95, the learning accuracy is 0.001, and the maximum number of epochs is 3000. The connective weights and the parameters of the membership functions are adjusted to reasonable values after training the FENN. The simulation is carried out with the neural network toolbox in MATLAB. Fig. 2 shows the simulation of the FENN ("o" is the expected output, "+" is the simulated output). The simulated output of the 80 training samples approximates the expected output, and the relative error reaches the expected standard, so the convergence of the evaluation model is good.
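The updates (4.1)-(4.4) amount to gradient steps with momentum on the squared output error. A heavily hedged skeleton of one such step, with the hyperparameters quoted above; the gradient dictionary stands for the bracketed sums of (4.1)-(4.4), which would be accumulated from the stored layer activations o2 and o3:

```python
import numpy as np

eta, momentum, tol, max_epoch = 0.01, 0.95, 1e-3, 3000

def train_step(params, grads, velocity):
    """One momentum update applied to every adjustable quantity of the
    FENN: group weights w_i, index weights w_k(i), membership centers
    x_kj(i) and widths sigma_k(i). A sketch, not the authors' code."""
    for name in params:
        velocity[name] = momentum * velocity[name] - eta * grads[name]
        params[name] = params[name] + velocity[name]
    return params, velocity
```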


Fig. 2. The contrast between the actual output and the expected output.

Table 5. Expected output and actual output

 | Expected output | Actual output | Relative error
No.1 Student | 0.879 | 0.881 | 0.2%
No.2 Student | 0.752 | 0.756 | 0.5%
No.3 Student | 0.774 | 0.771 | 0.4%
No.4 Student | 0.531 | 0.535 | 0.7%
No.5 Student | 0.605 | 0.608 | 0.5%

The checking samples are input into the trained FENN model to get the model output. The contrast between the actual output and the expected output is shown in Table 5. All the relative errors are less than 1% and the result is satisfactory, so it is feasible to evaluate university student credit with the model. The trained neural network effectively reflects the non-linear relation between the index system and the evaluation. The model can reduce workload and subjectivity in evaluation, and the scientific soundness and rationality of the evaluation are improved.

5 Conclusions

This paper sets up an index system for university student credit evaluation and a fuzzy evaluating neural network which solve the problem of comprehensive evaluation of university student credit. The model can reduce workload and subjectivity in evaluation, and the scientific soundness and rationality of the evaluation are improved. Additionally, every index adopts a fuzzy attribute value, so the evaluation is subjective to some extent. The evaluation model based on a neural network provides a new way for university student credit evaluation.

Acknowledgements

The work described in this paper was supported by Project 60375024 of NSFC and Project 2006BB2192 of CSTC.

References

[1] Henley, W.E.: Statistical aspects of credit scoring. Unpublished PhD thesis, The Open University, Milton Keynes, UK (1995).
[2] Altman, E., Eisenbeis, R.A., Sinkey, J.: Applications of classification techniques in business, banking and finance. JAI Press, Greenwich, CT (1981).
[3] Gothe, P.: Credit bureau point scoring sheds light on shades of gray. The Credit World (1990) 25-29.
[4] Thomas, Lyn C.: A survey of credit and behavioral scoring: forecasting financial risk of lending to consumers. International Journal of Forecasting (2000) 149-172.
[5] Henley, W.E., and Hand, D.J.: A k-nearest-neighbor classification for assessing consumer credit risk. The Statistician (1996) 45(1) 77-95.
[6] Cheng, B., & Titterington, D.M.: Neural networks: a review from a statistical perspective. Statistical Science (1994) 230.
[7] West, David: Neural network credit scoring models. Computers & Operations Research (2000).
[8] Coflman, J.: The proper role of tree analysis in forecasting the risk behavior of borrowers. Management Decision Systems, Atlanta, MDS Reports.
[9] Yuan, Zengren: Artificial Neural Network and Its Application. Tsinghua University Press (2003).
[10] Saaty, T.L.: The Analytic Hierarchy Process: Planning, Priority Setting. New York: McGraw-Hill (1998).
[11] Qin, Shoukang: Comprehensive Evaluation Principle and Application. Beijing: Publishing House of Electronics Industry (2003).
[12] Liu, Yuntong, Hu, Jiangbi: A Mathematical Model and Its Parameters Estimation for Fuzzy Evaluation. Journal of Beijing University of Technology, 2001(3): 112-115.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 612--617
Copyright@2007 Watam Press

Real-Time Control of Erythromycin Fermentation Process Based on ANN Left- and Right-Inversion*
Xianzhong Dai, Wancheng Wang
School of Automation, Southeast University, Nanjing, 210096, China (xzdai@seu.edu.cn)

Abstract: Based on the artificial neural network (ANN) left-inversion and ANN right-inversion methods proposed in our previous work, this paper focuses on addressing the problem of real-time control of some crucial variables which are directly immeasurable in the erythromycin fermentation process. First, the left-inversion and the right-inversion are presented for the erythromycin fermentation process; they act, respectively, as a soft-sensor and as a linearization & decoupling controller. Then, a combined control scheme is proposed by coupling the left- and right-inversion. Further, application of ANNs to the combined control scheme results in an ANN combined scheme based on the ANN left- and right-inversion, which overcomes the difficulty of implementing the combined control scheme by analytic means. The ANN combined control scheme can decouple the erythromycin fermentation process into three linear subsystems, and finally a closed control scheme is obtained by using the mature linear control theory.

Keywords: ANN, left-inversion, right-inversion, combined control scheme, erythromycin fermentation process

1 Introduction

In the last two decades, there has been considerable interest in the problem of linearizing and decoupling nonlinear multivariable systems by using right-inversion, whose realization usually depends on full state feedback [1-4]. Taking the respective advantages of the strict control theory of right-inversion and the strong potential of artificial neural networks (ANN) to approximate nonlinear functions, in [5,6] we have presented an ANN right-inversion method for general nonlinear systems that overcomes the difficulty of implementing right-inversion by analytic means.

For the erythromycin fermentation process, the major bottleneck restricting its real-time control is that some crucial process variables are very difficult to measure directly by biochemical and/or physical sensors due to limitations in technology or high cost. In order to overcome this difficulty, we have proposed an ANN inversion as a soft-sensor to estimate such directly immeasurable states in [7]. For avoiding confusion, in this paper the inversion soft-sensor is called left-inversion.

Based on our previous work [5,7], the present paper focuses on addressing the problem of real-time control of some immeasurable crucial variables in the erythromycin fermentation process by coupling the ANN left-inversion (i.e. soft-sensor) with the ANN right-inversion (i.e. linearization and decoupling controller).

To do so, a right-inversion is first given, which is composed of a nonlinear function to describe its nonlinearity and a set of integrators to describe its dynamic behavior. The right-inversion can be used as a dynamic nonlinear controller to decouple the erythromycin fermentation process into three linear subsystems, i.e. the mycelia concentration, sugar concentration, and chemical potency subsystems. This makes it very easy to design additional linear controllers according to the mature linear control theory. Then, coupling the left-inversion and the right-inversion forms a combined control scheme which still holds the same decoupling function as the right-inversion.

Further, in order to overcome the difficulties in implementing the left- and right-inversion based combined control scheme by analytic means, static ANNs are used to approximate the inversions respectively, hence resulting in an ANN combined control scheme based on the ANN right-inversion and ANN left-inversion. The proposed ANN combined inversion is a special kind of dynamic ANN since it features two relatively independent parts---a series of integrators & differentiators to represent its dynamic behavior and a static ANN to approximate the static nonlinear functions difficult to implement by analytic means. Different from common dynamic ANNs like the well-known Hopfield network [8], such a separate structure makes the ANN right-inversion easier to use. Therefore, the ANN right-inversion is capable of approximating a complex dynamic system with satisfactory accuracy and relatively small effort.

* This work is supported by both the National Natural Science Foundation of China (60574097) and the Specialized Research Fund for the Doctoral Program of Higher Education (20050286029).


Based on the proposed ANN combined control scheme, the ultimate closed-loop real-time control of the erythromycin fermentation process is achieved by further resorting to the mature linear system theory.

2 Right-Inversion of the Erythromycin Fermentation Process

As given in our previous work [7], the erythromycin fermentation process can be described by the following so-called gray-box model:

  ẋ1 = μx1 − (x1/x6)(∑_{i=1}^{5} ui) = f1(x, u)
  ẋ2 = σx1 + (s1/x6)u2 − (x2/x6)(∑_{i=1}^{5} ui) = f2(x, u)
  ẋ3 = πx1 − s2 x3 + (s3/x6)u3 − (x3/x6)(∑_{i=1}^{5} ui) = f3(x, u)
  ẋ4 = ηx1 − s4 x4 − (x4/x6)(∑_{i=1}^{5} ui) = f4(x, u)          (1)
  ẋ5 = ψx1 + (s5 u4 − s6 u1 − s7 u2 − s8 u3)/x6 − (x5/x6)(∑_{i=1}^{5} ui) = f5(x, u)
  ẋ6 = u1 + u2 + u3 + u4 + u5 = f6(x, u)
  y = (y1, y2, y3, y4, y5)ᵀ = (x1, x2, x3, x4, x5)ᵀ

where u = (u1, u2, u3, u4, u5)ᵀ are directly measurable inputs (oil u1, dextrin u2, propanol u3, aqua ammonia u4, and water u5), y = (y1, y2, y3, y4, y5)ᵀ = (x1, x2, x3, x4, x5)ᵀ are outputs, and x = (x1, x2, x3, x4, x5, x6)ᵀ are states that are divided into two groups: the measurable group (dissolved oxygen concentration x4, pH value x5 and zymotic fluid volume x6) and the immeasurable group (mycelia concentration x1, sugar concentration x2 and chemical potency x3). In addition, μ(x), σ(x), π(x), η(x), ψ(x) are all analytic functions, and si, i = 1,...,8 are all non-zero constants, both of which imply that all the functions fi(x, u), i = 1,...,6 are analytic as well.

To realize the real-time control of the erythromycin fermentation process, in this section we will use its right-inversion as a nonlinear controller to achieve its linearization and decoupling, consequently obtaining several SISO pseudo-linear subsystems that will enable us to easily design respective linear controllers to accomplish their closed-loop real-time control according to the mature linear system theory.

It is evident that it is very difficult to verify the right invertibility of system (1) due to its high inaccuracy and nonlinearity. Fortunately, considering the actual control situation in which the dissolved oxygen concentration and pH value (i.e. x4, x5) are controlled separately by the inputs u4 and u5 via normally designed PIDs, system (1) can be simplified by leaving them out of account. Thus, by redefining the inputs as u = (u1, u2, u3)ᵀ and the controlled outputs as y = (x1, x2, x3)ᵀ, the erythromycin fermentation system (1) can be simplified as follows:

  ẋ1 = μx1 − (x1/x6)(u1 + u2 + u3)
  ẋ2 = σx1 + (s1/x6)u2 − (x2/x6)(u1 + u2 + u3)
  ẋ3 = πx1 − s2 x3 + (s3/x6)u3 − (x3/x6)(u1 + u2 + u3)           (2)
  ẋ6 = u1 + u2 + u3
  y = (y1, y2, y3)ᵀ = (x1, x2, x3)ᵀ

Since xi ≠ 0, i = 1, 2, 3, 6 and sj ≠ 0, j = 1, 2, 3, we have

  det(∂ẏ/∂u) = det [ −x1/x6      −x1/x6       −x1/x6
                     −x2/x6   (s1 − x2)/x6    −x2/x6             (3)
                     −x3/x6      −x3/x6    (s3 − x3)/x6 ]
             = −s1 s3 x1 / x6³ ≠ 0,

which means that system (2) is right invertible. Then, according to the inversion theory [1,5], in theory its right-inversion can be expressed as

  u = (u1, u2, u3)ᵀ = (φ1(x6, y1, y2, y3, ẏ1, ẏ2, ẏ3),
                       φ2(x6, y1, y2, y3, ẏ1, ẏ2, ẏ3),
                       φ3(x6, y1, y2, y3, ẏ1, ẏ2, ẏ3))ᵀ.   (4)
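The determinant in Eq. (3) is easy to verify symbolically. The following check, an illustration using SymPy rather than part of the original derivation, confirms the value −s1 s3 x1 / x6³:

```python
import sympy as sp

# Decoupling matrix of system (2): rows are d(y_i dot)/d(u_1, u_2, u_3)
x1, x2, x3, x6, s1, s3 = sp.symbols('x1 x2 x3 x6 s1 s3')
J = sp.Matrix([
    [-x1/x6,        -x1/x6,        -x1/x6],
    [-x2/x6,  (s1 - x2)/x6,        -x2/x6],
    [-x3/x6,        -x3/x6,  (s3 - x3)/x6],
])
# Eq. (3): det = -s1*s3*x1/x6**3, nonzero whenever x1, x6, s1, s3 != 0
assert sp.simplify(J.det() + s1*s3*x1/x6**3) == 0
```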


Let ẏ1 = v1, ẏ2 = v2, ẏ3 = v3; then the right-inversion (4) can be rewritten as

  u = (u1, u2, u3)ᵀ = (φ1(x6, y1, y2, y3, v1, v2, v3),
                       φ2(x6, y1, y2, y3, v1, v2, v3),
                       φ3(x6, y1, y2, y3, v1, v2, v3))ᵀ          (5)
    = φ(x6, y1, y2, y3, v1, v2, v3).

Obviously, cascading the right-inversion (5) with the nonlinear system (2) naturally results in three linearized and decoupled subsystems, i.e. the mycelia concentration x1, sugar concentration x2, and chemical potency x3 subsystems, as shown in Fig. 1. From Fig. 1 it is clear that the nonlinear system (2) has been linearized and decoupled into three first-order linear subsystems. Thus, according to the linear system theory, one can design three respective linear controllers to accomplish the closed-loop control of these three linearized and decoupled subsystems.

Fig. 1 The linearization and decoupling of the fermentation system (s⁻¹ denotes an integrator)

3 ANN Combined Control Scheme Based on Left- and Right-Inversion

For the above linearized and decoupled subsystems, as the controlled outputs y1, y2, y3 are not directly measurable, it is necessary to estimate them online so as to realize their real-time control. In our previous work [7], we have proposed an inversion soft-sensing model (i.e. soft-sensor) described by

  (x1, x2, x3)ᵀ = (ϕ1(x4, x5, x6, ẋ4, ẋ5, ẍ4, u, u̇),
                   ϕ2(x4, x5, x6, ẋ4, ẋ5, ẍ4, u, u̇),
                   ϕ3(x4, x5, x6, ẋ4, ẋ5, ẍ4, u, u̇))ᵀ           (6)
    = ϕ(x4, x5, x6, ẋ4, ẋ5, ẍ4, u, u̇)

to estimate the directly immeasurable states x1, x2, x3. For avoiding confusion, in this paper the inversion soft-sensor is called left-inversion. The detailed construction procedure of the left-inversion and the ANN left-inversion is given in the Appendix.

Thus, we can achieve a combined control scheme by coupling the right-inversion with the left-inversion, as shown in Fig. 2, where the left-inversion is used as a soft-sensor to estimate the immeasurable crucial process variables, while the right-inversion acts as a nonlinear controller to linearize and decouple the nonlinear erythromycin fermentation process.

Fig. 2 Combined control scheme based on left- and right-inversion (s denotes a differentiator)

Similar to the left-inversion, the right-inversion is also difficult to implement by analytic means due to the inaccuracy and high nonlinearity of the erythromycin fermentation process. In order to overcome this problem, a static ANN can also be used to approximate the nonlinear functions φ(·), owing to the ANN's strong potential to approximate nonlinear functions with any accuracy. Thus, an ANN combined control scheme is achieved, as shown in Fig. 3.


The ANN right-inversion as well as the ANN left-inversion proposed above is a kind of dynamic ANN in essence, since it features two relatively independent parts---a set of integrators (or differentiators) to represent its dynamic behavior and a static ANN to approximate the static nonlinear function. As opposed to common dynamic ANNs composed of dynamic neurons, like the well-known Hopfield network, such a separate structure makes the ANN left- and right-inversion easier to use than common dynamic ANNs. Therefore, the ANN left- and right-inversion is capable of approximating a complex dynamic system with satisfactory accuracy and relatively small effort.

Fig. 3 ANN combined control scheme

Although common dynamic ANNs composed of dynamic neurons can also be used to approximate a complex dynamic system, the presented ANN right-inversion as well as ANN left-inversion is the preferred option over the common dynamic ANNs, since the latter generally have a very complex structure and their dynamic characteristics are very hard to capture.

4 ANN's Training and Structure

First consider the training of the ANN right-inversion, whose training structure is shown in Fig. 4. Before training the ANN right-inversion, we first need to collect the raw data to form the training data sets. For the erythromycin fermentation process, the raw data x4, x5, x6, u1~u5 are acquired by actual chemical and physical sensors every 5 minutes, while x1, x2, x3 are obtained by off-line analyzers every 6 hours and denoted as x1a, x2a, x3a. To get enough training data matching x4, x5, x6, u1~u5, the offline analysis data y1, y2, y3 (i.e. x1, x2, x3) are fitted and smoothed by the least-squares fitting method in 5-minute steps. The required derivatives of y1, y2, y3 (i.e. ẋ1, ẋ2, ẋ3) are obtained by the 5-point derivative method, which can guarantee high accuracy. Thus, we obtain the ultimate ANN training data sets.

Before being used to train the ANN, the training data should be filtered and smoothed by normal digital filters so as to remove noise disturbances. In addition, to improve the training effect and enhance the soft-sensing performance, all the data are normalized within ±1; the normalized data are denoted by {u1~u3} and {y1, y2, y3, ẏ1, ẏ2, ẏ3, x6} as shown in Fig. 4. The ANN's outputs are denoted as u1nn~u3nn.

The ANN can be trained with the Levenberg-Marquardt training algorithm due to its faster convergence than common BP algorithms. In addition, the trained ANN right-inversion should be tested with data not used for training to show whether or not the generalization of the ANN right-inversion is appropriate for actual application.

In practical use, three independent static ANNs are adopted to approximate the three functions φ1, φ2, φ3 in the right-inversion (5), respectively, in order to reduce the complexity of the ANN's structure, enhance the ability of generalization and shorten the training time.

Fig. 4 The offline training scheme of ANN right-inversion

The training procedure of the ANN left-inversion is quite similar to that of the ANN right-inversion, as shown in Fig. 5, and is not repeated here.

Fig. 5 The offline training scheme of ANN left-inversion
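The data preparation just described (derivative targets and ±1 scaling) can be sketched as follows; the five-point stencil is the standard central-difference formula, and the function names are our own illustration:

```python
import numpy as np

def five_point_derivative(y, h):
    """Derivative of a uniformly sampled signal y with step h using the
    5-point central formula; boundary samples fall back to np.gradient's
    lower-order one-sided differences."""
    dy = np.gradient(y, h)                      # fallback everywhere
    dy[2:-2] = (y[:-4] - 8*y[1:-3] + 8*y[3:-1] - y[4:]) / (12.0*h)
    return dy

def normalize(z):
    """Scale a signal into [-1, 1] before training; returns the scaled
    signal together with the (min, max) needed to undo the scaling."""
    lo, hi = z.min(), z.max()
    return 2.0*(z - lo)/(hi - lo) - 1.0, (lo, hi)
```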


As pointed out in [7], any kind of static ANN, such as BP and RBF networks together with the corresponding learning algorithms, can be used to approximate the nonlinear functions φ(·) and ϕ(·). Due to the simplicity of structure and ease of realization, here all of the static ANNs adopt a feed-forward BP network with the structure 16-22-15-1, with a "tan sigmoid" transfer function on the nodes of the two hidden layers and a "linear" transfer function on the node of the output layer. The experimental results have shown that the static ANN thus chosen is of high accuracy and good generalization ability.

Following Fig. 3, the trained ANN left- and right-inversion can decouple the nonlinear system (2) into three first-order linear subsystems. Thus, the ultimate closed-loop real-time control of the outputs y1, y2, y3 is achieved by further resorting to the mature linear system theory, as shown in Fig. 6.

5 Conclusions

This paper proposes an ANN combined control scheme based on the ANN left-inversion and right-inversion, which can provide real-time control for the erythromycin fermentation process.

Fig. 6 The closed control scheme of erythromycin fermentation process
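For reference, a 16-22-15-1 feed-forward network of the kind described above can be sketched in a few lines of NumPy. The initialization and names are our assumptions; training would use Levenberg-Marquardt as stated in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# 16 inputs, two hidden layers of 22 and 15 tan-sigmoid nodes, 1 linear
# output node, matching the 16-22-15-1 structure quoted above.
sizes = [16, 22, 15, 1]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[1:], sizes[:-1])]
biases = [np.zeros(m) for m in sizes[1:]]

def ann_forward(x):
    """x: 16 normalized inputs -> 1 output (e.g. one of u1nn~u3nn)."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(W @ x + b)                 # 'tan sigmoid' hidden layers
    return weights[-1] @ x + biases[-1]        # linear output layer
```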

Appendix: ANN Left-Inversion

For the erythromycin fermentation process (1), to estimate the immeasurable x1, x2, x3 by a soft-sensing method, one may firstly assume that, in the interior of the original biochemical process (1), there exists a subsystem whose inputs are just the to-be-estimated x1, x2, x3 while its outputs are the measurable x4, x5, x6. Such a subsystem can be viewed as an "assumed inherent sensor" contained in the biochemical process, as depicted in Fig. 7, with u1~u5 regarded as parameter variables.

Fig. 7 "Assumed inherent sensor" structure

In [7], we also proposed the "assumed inherent sensor" and the inversion of the "assumed inherent sensor" that acts as a soft-sensor to estimate the directly immeasurable states x1, x2, x3. They are respectively described by

  (ẋ4, ẍ4, ẋ5)ᵀ = (φ̂1(x1, x2, x3, x4, x5, x6, u, u̇),
                   φ̂2(x1, x2, x3, x4, x5, x6, u, u̇),
                   φ̂3(x1, x2, x3, x4, x5, x6, u, u̇))ᵀ           (7)

and

  (x1, x2, x3)ᵀ = (ϕ1(x4, x5, x6, ẋ4, ẋ5, ẍ4, u, u̇),
                   ϕ2(x4, x5, x6, ẋ4, ẋ5, ẍ4, u, u̇),
                   ϕ3(x4, x5, x6, ẋ4, ẋ5, ẍ4, u, u̇))ᵀ           (8)
    = ϕ(x4, x5, x6, ẋ4, ẋ5, ẍ4, u, u̇).

For avoiding confusion, the inversion soft-sensor here is called the left-inversion.

As shown in Fig. 8, the cascade of the left-inversion expressed by (8) with the "assumed inherent sensor" expressed by (7) leads to a so-called composite identity system whose outputs are the identity mapping of its inputs. It is clear that the output of the "assumed inherent sensor" inversion completely reproduces the inputs of the "assumed inherent sensor", x1, x2, x3, and then the problem of estimating the directly immeasurable states x1, x2, x3 is solved.

Fig. 8 The left-inversion structure

In order to further overcome the difficulty in constructing the left-inversion by analytic means, a static ANN is used to approximate the nonlinear function ϕ(·) appearing in (8), so that the ANN left-inversion is finally completed; it is composed of a static ANN and a series of differentiators, as shown in Fig. 9. This makes its construction easy in practical use and strict in theory as well.

Fig. 9 The ANN left-inversion structure

References

[1] Singh, S.N., A modified algorithm for invertibility in nonlinear systems. IEEE Trans. on Automat. Contr., Vol. 26, No. 2 (1981) 595-598.
[2] Li, C., Feng, Y., Decoupling theory of general multivariable analytic non-linear systems. Int. J. Control, Vol. 45, No. 4 (1987) 1147-1160.
[3] Benedetto, M.D.D., Glumineau, A., Moog, C.H., The nonlinear interactor and its application to input-output decoupling. IEEE Trans. Automat. Contr., Vol. 39, No. 6 (1994) 1246-1250.
[4] Wu, R., Li, C., Constructive inverse system method for general nonlinear systems. Control Theory and Applications, Vol. 20, No. 3 (2003) 345-350.
[5] Dai, X., He, D., Zhang, X., MIMO system invertibility and decoupling control strategies based on ANN α-th-order inversion. IEE Proceedings-Control Theory and Applications, Vol. 148, No. 2 (2001) 125-136.
[6] Xianzhong Dai, Dan He, Teng Zhang, Kaifeng Zhang, ANN generalized inversion for the linearization and decoupling control of nonlinear systems. IEE Proceedings-Control Theory and Applications, Vol. 150, No. 3 (2003) 267-277.
[7] Xianzhong Dai, Wancheng Wang, Yuhan Ding, Zongyi Sun, "Assumed Inherent Sensor" Inversion Based ANN Dynamic Soft-sensing Method and Its Application in Erythromycin Fermentation Process. Computers and Chemical Engineering, Vol. 30, No. 8 (2006) 1203-1225.
[8] Hagan, M.T., Demuth, H.B., Beale, M., Neural Network Design. PWS Publishing Company, Boston (1996).


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 618--623
Copyright@2007 Watam Press

Optimal Control-Based Neurocontroller to Guide the Crop Growth
under Perturbations
J. Pucheta, H.D. Patiño, C. Schugurensky, R. Fullana, B. Kuchen

J. Pucheta: Department of Electrical Engineering, Faculty of
Exact, Physical and Natural Sciences, 5000 Córdoba, Argentina;
julian.pucheta@gmail.com
H.D. Patiño, C. Schugurensky, R. Fullana and B. Kuchen:
Institute of Automatics, Faculty of Engineering, National
University of San Juan, 5400 San Juan, Argentina;
dpatino@inaut.unsj.edu.ar
Abstract—In this work, an optimal-control-based neurocontroller
for guiding the crop growth in a greenhouse, considering the
perturbations associated with the climatic variables, is
presented. In order to guide the development of a
tomato-seedling crop in a greenhouse, a neural network
implements the controller. To this end, the operational costs
(associated with the actuators) and a sequence of weather
temperature measurements are considered. The design procedure of
the neurocontroller is based on the neuro-dynamic programming
technique, which assumes the availability of the mathematical
model of the crop-greenhouse system and of historical data of
the weather temperature. In order to show the performance of the
proposed control scheme, comparative simulation results with
respect to other approaches are presented.

I. INTRODUCTION

The crop growth guidance consists of generating the control
actions that take the biological process from some initial state
to the desired final state, minimizing the operative cost
associated with the actuators. To this end, it is assumed that
the system's state (which in this case is defined by the values
of dry weight and number of leaves) is known at every moment
[14] [11]. Furthermore, the system's model is nonlinear, with
constraints on its variables, and the performance index of the
optimization criterion is not quadratic [12] [13].
In order to solve this problem, diverse strategies have been
proposed [9]. However, such approaches suffer from
dimensionality problems, because they demand many computational
resources as the number of state variables increases, which is
the situation presented here.
The application of the neuro-dynamic programming (NDP)
technique to design the controller is attractive from several
points of view, since it allows obtaining a robust controller
[10] [1], which can also require relatively few computational
resources for its field implementation [8].
In this approach, the controller is computed off-line because
it requires a great number of operations [2]. Therefore, the
system's performance depends on the exactness of the pattern
used to model the future process scenario, whose errors are
treated as perturbations that are neither measured nor used by
the control system [10] [14]. However, the system's future
operating conditions are not always accurately modeled. In the
case of crop growth guidance in a greenhouse, the weather
forecast errors represent those conditions, which are associated
with the perturbation variables. Thus, in order to compute the
controller it is necessary to have a weather pattern; then, when
the system is operating on-line, the neurocontroller assumes the
presence of those conditions, and the performance normally moves
away from the optimum [10].
In this work, an approach for guiding the growth of a
tomato-seedling crop in a greenhouse is proposed, in which the
growth rate is controlled by means of control actions on the
greenhouse variables, while the associated costs are minimized
and historical data of the weather temperature are considered.
The control actions are the window opening and the heater use
percentage.
The control scheme is shown in Fig. 1; it operates on-line with
the process. The state variable x contains the values of dry
weight, of number of leaves, and of a sequence of five
historical values of the weather temperature. The variable u is
the control action that operates on the two actuators of the
greenhouse: the window opening (-100 ≤ u ≤ 0) and the heater use
percentage (0 ≤ u ≤ 100).

Fig. 1. Optimal control PC-based approach, in which a sequence
of weather temperature is considered (the PC-based system
contains the neurocontroller u(x,k,{To}) and a state observer
that feeds back x(k) and To from the cultivation).

Thus, the minimization of the operative costs associated with
the heating is incorporated by considering the changes of the
weather temperature, whose function in the control law tries to
adapt it, on line, according to the latest temperature data.
II. PROBLEM FORMULATION

The optimal control problem is composed of the process's dynamic
model, the cost function, and the restrictions on the state and
control variables. Next, by considering those elements
simultaneously, the optimal control problem is stated, whose
solution is the optimal control law.

A. The model of the crop-greenhouse system

The crop growth model is

$$\begin{cases}
\dot{W} = \beta\, P_g(T, S_{PAR}, CO_2, L) - R_m(T)\, W \\
\dot{N} = r_m\, r(T)
\end{cases} \qquad (1)$$

where W and N are the time-dependent state variables
representing the total dry weight of the crop in [g m-2] and the
number of leaves, respectively. However, from now on the
dimension of the variable W will be changed from [g m-2] to [g]
by dividing W by the plant density ρ, where ρ = 835 m-2. The
functions Rm(T) and Pg(T, S_PAR, CO2, L) are the maintenance
respiration rate of the leaves, in g[CH2O] g-1[tissue] h-1, and
the canopy gross photosynthesis rate, in g[CH2O] m-2[ground]
h-1, respectively. The temperature T, measured in °C, the
photosynthetically active radiation S_PAR, in μmol[photon]
m-2 s-1, and the CO2 concentration, in ppm, are the environment
variables, whereas L is the leaf area index, which represents
m2[leaf] m-2[soil]. The coefficient β denotes the conversion
efficiency from CH2O to plant tissue, with units in g[tissue]
g-1[CH2O]. In addition, the coefficient rm is the maximum rate
of leaf appearance per hour, and r(T) is a piecewise linear
temperature function with range in [0, 1], based on the TOMGRO
model [4].
In addition, the greenhouse model used here [5] is

$$T = T_o + \frac{b\, S_o\, \lambda + F_c}{U + Q_v} \qquad (2)$$

where T and To are the internal and external temperatures [°C],
respectively. The function So is the global solar radiation,
measured in [W m-2]; the coefficient b is the fraction of the
global solar energy that contributes to the increase of T, with
b ≈ 1 m2; λ = 0.45 is an empirical coefficient, and
U = 0.5056 W °C-1 is the coefficient of energy loss by
interaction with the environment. The variables Fc and Qv are
used as control actions, where Fc = F·H(t), with H(t) ranging
between 0 and 1 and F the heating coefficient, 14.966 [W], and
Qv is the cooling through ventilation, in [W °C-1], defined by
the expression

$$Q_v = v_1 V(t) + v_2 V(t)^2 + v_3 V(t)^3 \qquad (3)$$

where v1, v2, v3 are 0.107, 2.3275 and -1.2761, respectively,
and V(t) is the window action, dimensionless with range in
[0, 1]. Here, the windows are completely open with V(t) = 1 and
closed with V(t) = 0, where the leakages are considered by U in
Eq. (2).
The described model corresponds to a scale model located in the
Laboratory of Processes of the Instituto de Automática, of the
UNSJ (Argentina) [9] [10]. The greenhouse model is algebraic and
does not add any state variable.
The control actions or manipulated variables are the heater use
percentage H(t) and the window opening V(t). It is considered,
in the model of Eq. (2), that the heating and ventilation
variables are mutually exclusive. Thus, the manipulated
variables H(t) and V(t), which can vary between 0 and 1, are
grouped into a single variable with range in [-1, 1]:

$$V(t) = -\min\{0, a(t)\}, \qquad H(t) = \max\{0, a(t)\} \qquad (4)$$

where -1 ≤ a(t) ≤ 1.
It is assumed that the weather temperature To is known during
the system's on-line operation. For the neurocontroller's
off-line computation, a historical temporal sequence was used,
which belongs to July 1999 from the Tulum Valley, Argentina.
The moisture control within the greenhouse is performed
independently of the neurocontroller. This is possible because
the seedlings require a minimum humidity content, and the
necessity to diminish it, in case of excess, can eventually
arise, ending the guidance process. The CO2 concentration is
assumed constant and equal to the ambient value of 350 ppm.
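To make the model concrete, the following minimal sketch
(assumed, not the authors' code) advances the crop-greenhouse
model of Eqs. (1)-(4) by one Euler step. The functions Pg(·),
Rm(·) and r(·) are placeholders, since their TOMGRO-based
definitions [4] are not reproduced in the paper, and the values
of β and rm used here are assumptions.

    # A minimal sketch (assumed, not the authors' code) of one Euler step of
    # the crop-greenhouse model, Eqs. (1)-(4). Pg(.), Rm(.) and r(.) are
    # placeholders for the TOMGRO-based functions [4]; BETA and RM_LEAF are
    # assumed values.
    BETA = 0.7                    # assumed conversion efficiency [g/g]
    RM_LEAF = 1.0 / 24.0          # assumed max leaf-appearance rate [1/h]
    B, LAM, U_LOSS, F_HEAT = 1.0, 0.45, 0.5056, 14.966
    V1, V2, V3 = 0.107, 2.3275, -1.2761

    def q_v(V):                   # Eq. (3): ventilation cooling [W/degC]
        return V1 * V + V2 * V ** 2 + V3 * V ** 3

    def step(W, N, a, To, So, dt=4.0):
        """Advance dry weight W [g/plant] and leaf number N by dt hours under
        the control a in [-1, 1], which Eq. (4) splits into V(t) and H(t)."""
        V, H = -min(0.0, a), max(0.0, a)                           # Eq. (4)
        T = To + (B * So * LAM + F_HEAT * H) / (U_LOSS + q_v(V))   # Eq. (2)
        r = min(max((T - 8.0) / 20.0, 0.0), 1.0)   # placeholder for r(T)
        Pg = 1e-3 * max(T, 0.0)                    # placeholder for Pg(T, ...)
        Rm = 1e-4                                  # placeholder for Rm(T) [1/h]
        W_next = W + dt * (BETA * Pg - Rm * W)     # Eq. (1), dry weight
        N_next = N + dt * (RM_LEAF * r)            # Eq. (1), leaf number
        return W_next, N_next, T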
B. The Objective function

The cost function is composed of terms that consider certain
variables of the system. One of these terms is the performance
index, which is specified in order to guide the cultivation
toward the desired evolution of the system. This performance
index is defined as

$$I(x, v) = \Lambda \cdot v + \theta_1 \cdot P_r + \theta_2 \cdot P_W \qquad (5)$$

where x = [W N]^T is the state vector, with the state variables
as defined in Eq. (1); the costs associated with the control
actions are considered through v = Cf(t), where Cf(t) = H(t)·Pc,
with Pc = $0.12 h-1 (the greenhouse equipment includes a 1500 W
electric heater, whose operative cost is 0.08 $(kWh)-1
(December 2006, in San Juan, Argentina), i.e., 0.12 $ h-1).
In addition, Pr is a nonlinear function used to constrain the
internal temperature, defined by

$$P_r = \begin{cases} T - 36 & \text{if } T > 36 \\ 8 - T & \text{if } T < 8 \\ 0 & \text{if } 8 \le T \le 36, \end{cases} \qquad (6)$$


and PW is a nonlinear function used to avoid any overshoot of
the desired dry weight:

$$P_W = \begin{cases} W(t) - W_d & \text{if } W(t) > W_d \\ 0 & \text{otherwise.} \end{cases} \qquad (7)$$

Λ, θ1 and θ2 are weighting coefficients, with Λ = 0.05,
θ1 = 50 and θ2 = 50.
By integrating the performance index, one can define the cost
function, which incorporates an extra term in order to consider
the final values of the state variables:

$$J(x, x_d, v) = \int_t^{t_f} I(x, v)\, d\tau + \Gamma \cdot \left| x(t_f) - x_d \right| \qquad (8)$$

where xd = [Wd Nd]^T contains the desired final values of the
state variables, Wd being the desired dry weight and Nd the
desired number of leaves, and Γ is the weighting matrix, with
Γ = diag{5662.8, 1600}. The use of the function |·| is motivated
by its good performance when arg(|·|) → 0.
The field experience in San Juan, Argentina, has shown that a
good tomato seedling must have a dry weight of 0.21 g and three
to four leaves. Thus, 0.21 g and 3-4 leaves were the final state
to be reached by the optimal trajectory of the state variables.
The internal temperature T(k) was constrained to the 8 °C - 36 °C
range, which is the admissible temperature range for tomatoes,
and the dry weight to the 0 - 0.21 g range, through Eqs. (6) and
(7), respectively. The weight matrix values were tuned by trial
and error [9].
Finally, the operative cost associated with the heater action is
computed by

$$C_o(t) = \int_t^{t_f} C_f\, d\tau \qquad (9)$$

where Co is the operative cost in economic terms, which is
incurred when the process evolves from time t to tf.
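The following minimal sketch (assumed) shows how the running
cost of Eq. (5), the penalties of Eqs. (6)-(7) and the
accumulated costs of Eqs. (8)-(9) can be evaluated along a
simulated trajectory; the terminal term Γ·|x(tf) - xd| of
Eq. (8) is left out and would be added at the final stage.

    # A minimal sketch (assumed) of the running cost of Eq. (5), the penalty
    # functions of Eqs. (6)-(7) and the accumulated costs of Eqs. (8)-(9).
    LAMBDA, THETA1, THETA2 = 0.05, 50.0, 50.0
    PC = 0.12            # heater operative cost [$/h]
    W_D = 0.21           # desired final dry weight [g]

    def p_r(T):          # Eq. (6): internal-temperature penalty
        if T > 36.0:
            return T - 36.0
        if T < 8.0:
            return 8.0 - T
        return 0.0

    def p_w(W):          # Eq. (7): dry-weight overshoot penalty
        return W - W_D if W > W_D else 0.0

    def running_cost(T, W, H):     # Eq. (5), with v = Cf(t) = H(t)*Pc
        return LAMBDA * H * PC + THETA1 * p_r(T) + THETA2 * p_w(W)

    def totals(traj, dt=4.0):
        """traj: list of (T, W, H) samples along the guidance horizon.
        Returns the integral part of Eq. (8) and the operative cost, Eq. (9);
        the terminal term Gamma*|x(tf) - xd| would be added separately."""
        J = sum(running_cost(T, W, H) for T, W, H in traj) * dt
        Co = sum(H * PC for _, _, H in traj) * dt
        return J, Co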
Then, the closed-loop system can be described as follows:

$$\begin{cases} \dot{x} = f(x, u), & x(0) = x_0 \\ u = \mu(x, t), & 0 \le t \le t_f \end{cases} \qquad (10)$$

where x is the time-dependent state vector defined as
x = [W N]^T, W is the dry weight as a function of time
[g plant-1], N is the number of leaves per plant as a function
of time, and u is the control action defined as u = a(t). In
addition, f(·) is the equivalent function of the model, which
combines Eqs. (1) and (2), μ(·) is the optimal control law, and
x0 is the initial condition of the nonlinear equation (i.e.,
initial dry weight and number of leaves).

C. The control objective

The optimal control problem for guiding the crop growth can be
formulated as follows. Considering the dynamic model of the
production system of expression (10), it is desired to obtain an
optimal control law (which gives the values of heater use and
window opening) such that the crop growth goes from an arbitrary
initial state condition to a desired final state, minimizing the
predefined cost function (8). The domain of the control law μ is
the state space [W N]^T and the time stage, μ: R^3 → R, which
requires a measurement of the state variables without perturbing
the normal crop development [11] [14].
In addition, incorporating information about the weather
temperature is also an objective. The reason is to give
robustness to the control system by performing an implicit
prediction of the future scenario of the process environment.
Therefore, the tabulated control law of the system (10) will be
approximated by means of a neural network (NN), which
incorporates historical data of the weather temperature in its
inputs. This NN is trained in an optimal control scheme.

III. THE PROPOSED SOLUTION

The proposed control scheme is shown in Fig. 2. The controller
is a NN that generates the control action from the actual
extended state of the system. The motivation for using such a
structure for the neurocontroller is based on its simplicity,
although other variants were tried.

Fig. 2. Scheme of the control approach based on the
approximation of the control law and the cost function,
considering the historical values of the weather temperature
(the crop-greenhouse system feeds W(t), N(t) and To(t) to
two-layer NN approximators of the control action a(t) and of the
cost-to-go J̃).

This state is extended because it consists of the state vector
of the crop plus an added part, which corresponds to a sequence
of historical values of the weather temperature.
The extended state vector at each time stage is defined by

$$x_e = [\,k \;\; W_k \;\; N_k \;\; T_o(k) \;\; T_o(k-1) \;\; T_o(k-2) \;\; T_o(k-3) \;\; T_o(k-4)\,]^T \qquad (11)$$

where k is the time stage, W and N are the state variables, and
To is the weather temperature. The state vector is not augmented
for the approximating function of the cost function.
In order to facilitate the operation of the method that tunes
the parameters of the approximating function, a scaling process
is performed on the state vector defined in Eq. (11), which
consists of normalizing it in the range [-3, 0].
The idea is to tune the NN's parameters so as to approximate the
tabulated control law μ: R^7 → R. Thus, this law will consider
historical values of the weather temperature.

A. Controller computation

In order to apply the approximate policy iteration algorithm [2]
to the stated problem, the magnitudes of the state variable x
and of the control variable u are quantized. The magnitude of
each component of xe at each time stage was quantized into 12
values, and that of u into 9.
The time of the process evolution was divided into 100 stages of
4 h each. Stage number 100 is the last one, with the associated
state x(100).
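The following minimal sketch (assumed) builds the extended state
of Eq. (11) from a rolling buffer of outdoor-temperature samples
and quantizes its components into 12 levels, as described above;
the padding rule used at start-up is an assumption.

    # A minimal sketch (assumed) of the extended state of Eq. (11) and of the
    # quantization used by the approximate policy iteration: a rolling buffer
    # keeps the last five outdoor-temperature samples, and each component is
    # mapped onto one of 12 levels.
    from collections import deque
    import numpy as np

    temps = deque(maxlen=5)                 # To(k), To(k-1), ..., To(k-4)

    def extended_state(k, W, N, To_k):
        temps.appendleft(To_k)
        hist = list(temps) + [temps[-1]] * (5 - len(temps))   # pad early stages
        return np.array([k, W, N] + hist)                     # Eq. (11)

    def quantize(xe, lo, hi, levels=12):
        # Map each component of xe onto an integer level in {0, ..., 11}.
        idx = np.floor((xe - lo) / (hi - lo) * levels).astype(int)
        return np.clip(idx, 0, levels - 1)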
Fig. 3. Discretized states belonging to the sets S̃ and Ŝ, used
to calculate J̃(·, r) and μ̃(·, s), respectively.

The number of values J(i) is 12 x 100, which leaves 1200
possible values of J̃. From the set S of 1200 elements, a sample
of 400 was taken to generate the subset Ŝ, and 400 for S̃. The
tuning algorithm used is the Levenberg-Marquardt algorithm
[6] [3].
Fig. 3 shows the space of discretized states, distinguishing
those used for the computation of J̃ from those used for that of
μ̃. A desired trajectory is superposed on both data sets.
In order to highlight the origin of the NN parameters, those of
the function used to approximate the cost function are denoted
by r, and those of the action network are denoted by s.
The parameter-tuning algorithm was coded in Matlab®, and the
computing time was around 18 minutes on a Pentium IV running a
Windows XP®-based system.
The algorithm was run for 20 iterations. The learning law for
the parameters in the off-line calculation, which gives the
learning rate, is

$$\gamma(n+1) = 1 - \frac{15\, n}{20 + 15\, n}, \qquad n = 1, 2, 3, \ldots, 20. \qquad (12)$$

This structure and the coefficients of the function γ were
chosen after several trials; Eq. (12) is the preferred learning
function, which yielded a well-suited parameter adjustment.
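As a quick check of Eq. (12), the schedule below decays from
about 0.571 at n = 1 to 0.0625 at n = 20, so the parameter
updates shrink as the iterations proceed.

    # A quick check of the learning-rate schedule of Eq. (12).
    def gamma(n):
        return 1.0 - (15.0 * n) / (20.0 + 15.0 * n)

    rates = [gamma(n) for n in range(1, 21)]   # [0.571..., ..., 0.0625]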

Fig. 4. Top: trajectories of the parameters of the approximating
function for the control law during the adjustment of the
algorithm. Bottom: cost-to-go evolution from the initial state
t = 0 to the final stage t = tf.

Fig. 4 shows the evolution of the parameters of the matrix whose
coefficients belong to the hidden layer of the NN corresponding
to the controller.

IV. NUMERICAL RESULTS

In order to check the performance of the proposed control
approach, the system was simulated by means of two
neurocontrollers. One of them was computed by NDP considering
only the states, and is labeled "Traditional" [14] [10] [13];
the other one is the neurocontroller proposed here, which
considers historical data of the weather temperature, and is
labeled "Modified".
The temporal evolution of the environment conditions of the
cultivation and the greenhouse is shown in Fig. 5. In addition,
the operative cost accumulation of the greenhouse for each
neurocontroller is depicted. The resulting evolution of the
state variables is shown in Fig. 6.


The title "Cost accumulation" refers to the evaluation of the
time-dependent function expressed by Eq. (8). Note that the
improvement with respect to the criterion is quite good when
comparing the violation of the restrictions on the internal
temperature, although the final performance is very similar.

Fig. 5. Evolution of the control action a(t) and of the internal
and external greenhouse temperatures. Also depicted is the
operative cost associated with the heater action, which accrues
when the condition a(t) > 0 holds.

Fig. 6. Temporal and spatial evolution of the state variables
for the two NDP-based controllers: the traditional controller
and the one that considers disturbances due to the temperature
outside the greenhouse. The evaluation of the cost functional is
also shown.

A. Performance of the neurocontroller

In order to obtain comparison results, a traditional
neurocontroller was computed. The same algorithm parameters were
used for the calculation of the neurocontroller with states
x = [k W(k) N(k)]^T and for the one with the state defined by
expression (11). The former was already developed previously
[13], with its numerical results shown in [10] and some
experimental results shown in [14].

              Performance criterion   Operative cost $   ||ef|| %
Traditional   349.0191                11.0411            4.4919
Modified      194.1676                11.2421            0.9793

Table 1. Comparative values obtained by simulation, where
Performance arises from evaluating Eq. (8), Operative Cost from
evaluating Eq. (9), and ||ef||% is the percentage norm of the
state-variable error at time tf.

Both controllers were used to guide the crop growth under the
same weather conditions. The weather conditions used in the
computation are compared with those used in the test by means of
the percentage of similarity, which is the central value of the
correlation between the signals [7].
To aid the comparison, the numerical results are collected in
Table 1. Fig. 7 graphically depicts how the variations of the
weather conditions affect the values of the cost function and
the final state error.

Fig. 7. Performance of the traditional NDP-based neurocontroller
and of the modified approach. The expected behavior gets worse
as the similarity between foretold and measured variables
diminishes, although this effect is not so strong in the
modified version.

B. Discussion

The proposed control scheme has shown an improved performance
when compared against the case of not considering the
perturbations under the same weather conditions. The improvement
is observed when comparing the final state errors shown in
Table 1, which are noticeably reduced. A similar effect is
observed for the cost, which represents the evaluation of
Eq. (8). This performance improvement entails a 1.8% increase of
the operative cost. This fact shows a trade-off situation, given
the 78% improvement of the final state error and the 44%
reduction of the cost function.
However, note that the operative cost does not rise at the same
rate in the proposed approach as in the traditional one when the
environment conditions are very different, that is, under a very
bad weather forecast.
This effect can be seen in Fig. 7, which shows that


although the weather conditions differ from the patterns used by
the algorithm when computing the neurocontroller, the system's
performance does not degrade as it does in the case of the
traditional neurocontroller.

V. CONCLUSIONS

An NDP-based neurocontroller for guiding the crop growth in a
greenhouse system considering perturbations was presented.
The effect of increasing the state vector dimension in the
present approach, with respect to the traditional one, is to
enrich the information available to the control law during the
on-line operation, given that it considers historical weather
conditions. Thus, the weather forecast is built in by the
optimization procedure.
From the control exactness point of view, the improvement
reached is remarkable. Nevertheless, the same results were not
obtained for the economic costs.
The experimentation with the proposed neurocontroller demands
the availability of a greenhouse with the necessary equipment in
the field. These requirements seem to be justified in accordance
with the results obtained by means of the simulations presented
here.

ACKNOWLEDGMENT

This work was supported by the National Council for Scientific
and Technical Research (CONICET), the National Agency for
Scientific and Technological Promotion (ANPCyT) under grants
PAV-TIC-076 and PICT/04 25423, and the National Institute for
Agricultural Technology (INTA).

REFERENCES

[1] Bertsekas, D., "Dynamic Programming and Suboptimal Control:
A Survey from ADP to MPC". In Proc. of the 44th IEEE Conference
on Decision and Control and the 2005 European Control Conference
(CDC-ECC '05), 2005.
[2] Bertsekas, D., Tsitsiklis, J., "Neuro-dynamic Programming",
Chapters 5 and 6. Athena Scientific, MIT, 1996.
[3] Bishop, C., "Neural Networks for Pattern Recognition",
pp. 290-292. Oxford University Press, Oxford, 1995.
[4] Jones, J., "Crop growth, development and production
modeling". In Proc. of the Symposium on Automated Agriculture
for the 21st Century, ASAE, pp. 447-457, 1991.
[5] Lapilli, S., Fullana, R., Schugurensky, C., Pucheta, J.,
"Modelo algebraico de temperatura de un invernadero. Algoritmos
de control" [Algebraic model of greenhouse temperature. Control
algorithms]. IX RPIC, Santa Fe, Argentina, 2001.
[6] Norgaard, M., Ravn, O., Poulsen, N.K., Hansen, L.K., "Neural
Networks for Modelling and Control of Dynamic Systems: A
Practitioner's Handbook". Springer-Verlag London Ltd., 2000.
[7] Oppenheim, A., Willsky, A., Nawab, S.H., "Signals and
Systems", 2nd edition. Prentice-Hall, pp. 168-170, 1997.
[8] Pucheta, J., Patiño, H., Fullana, R., Schugurensky, C.,
Kuchen, B., "A neuro-dynamic programming based optimal
controller for crop-greenhouse systems". X RPIC, San Nicolás,
Buenos Aires, Argentina, 2003.
[9] Pucheta, J.A., Schugurensky, C., Fullana, R., Patiño, H.,
Kuchen, B., "Optimal greenhouse control of tomato-seedling
crops". Computers and Electronics in Agriculture, Vol. 50,
No. 1, January 2006, pp. 70-82.
[10] Pucheta, J.A., Schugurensky, C., Fullana, R., Patiño, H.,
Kuchen, B., "A Neuro-Dynamic Programming-Based Optimal
Controller for Tomato Seedling Growth in Greenhouse Systems".
Neural Processing Letters, Vol. 24, No. 3, December 2006,
pp. 241-260.
[11] Pucheta, J., Patiño, H., Fullana, R., Schugurensky, C.,
Kuchen, B., "A state observation approach for crop growth
control with a neuro-controlled greenhouse system". XX Congreso
Argentino de Control Automático, AADECA 2006, Buenos Aires,
2006.
[12] Pucheta, J.A., Patiño, H.D., Schugurensky, C., Fullana, R.,
Kuchen, B., "Optimal Control-Based Neurocontroller for Crop
Growth in Greenhouse". In Proc. of the 2006 IEEE International
Conference on Networking, Sensing and Control (ICNSC '06),
Ft. Lauderdale, Florida, U.S.A., April 23-25, 2006, pp. 398-403.
[13] Patiño, H., Pucheta, J., Fullana, R., Schugurensky, C.,
Kuchen, B., "Neuro-dynamic programming-based optimal control for
crop growth in precision agriculture". In Proc. of the 2004 IEEE
International Symposium on Intelligent Control, Sept. 2-4, 2004,
pp. 397-402.
[14] Patiño, D., Pucheta, J., Schugurensky, C., Fullana, R.,
Kuchen, B., "Approximate Optimal Control-Based Neurocontroller
with a State Observation System for Seedlings Growth in
Greenhouse". In Proc. of the 2007 IEEE International Symposium
on Approximate Dynamic Programming and Reinforcement Learning,
Honolulu, Hawaii, April 1-4, 2007.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 624--630
Copyright@2007 Watam Press

Stock Investor Behavior Simulation with Hybrid Neural Network


and Technical Analysis on Long-term Increases of Hong Kong
Hang Seng Index

Chi Xu, Yan Cai, Zheru Chi


Department of Electronics and Information Engineering, The Hong Kong Polytechnic University, Hong Kong

AMS subject classifications: 03E72, 68T73, 49N35

Abstract
In this project, the long-term data of the Hong Kong Hang Seng
Index have been analyzed. Neural network technology is used to
simulate the investor's trading behavior in the stock market.
The experimental results indicate that the system has a certain
behavioral orientation instead of moving unregulated during the
trading period. In addition, the impact that different neural
network parameter settings have on the investor's behavior is
evaluated as well.
Keywords: Neural network, Investor behavior, Simulation

Introduction
Stock investors follow a crucial principle in stock trading,
which is to buy low and sell high, but the practice is not as
simple as the rule looks, and not everyone can win money from
the market, because the decision-making mechanism in the
investor's brain might not be sufficiently perfect.

Investors apply several classical approaches to predict the
stock price before any buy, sell, or hold action, and a large
proportion of investors practice technical analysis. Technical
analysis is based on studying past and current market activity,
using price and volume patterns or indices to do the prediction.
However, technical analysis is subjective. When analyzing a
chart, personal biases can be reflected in the analysis. In
addition, technical analysis is open to interpretation. Even
though there are standards, it is quite common that two
technicians look at the same chart but paint two different
scenarios or see different patterns. Technical analysis is also
criticized as being concerned solely with the dynamics of market
price and volume behavior as a basis for price prediction,
ignoring the efficient-markets hypothesis.

In the recent decade, the neural network approach has become
popular in the analysis of stock market prices before trading
actions. Neural networks are non-parametric, non-linear models
that can be trained to map past values of a time series, for
purposes of classification or prediction. They are able to work
on input variables in parallel and consequently handle large
sets of data swiftly. They support detecting multi-dimensional
non-linear connections in data, which is extremely useful for
modeling the dynamic stock market.

W. Leigh, M. Paz, and R. Purvis [1] proposed in 2002 a hybrid
approach with neural network and pattern recognition techniques
to predict short-term increases in the NYSE composite index, in
which the investors analyzed 5 trading days of data before
trading actions; such a technique was capable of returning
results that were superior to those attained by random choice.

In this project, the long-term data of the Hong Kong Hang Seng
Index have been analyzed to gain a wide-angle view of the
market's
increasing progress. The impact that different neural network
parameter settings have on the investor's behavior is evaluated
as well.

Methodology
Since long-term market data can include abnormal volatility from
sudden political impacts, malicious trading actions, etc., and
since the neural network analysis needs to use a pattern
template as the input data, a data sorting process is required
to examine and constrain the inputs. A neural network engine is
used to train and test the data for simulating the investor's
decision-making process, and one of the technical analysis
methods, the bull flag template [2], is adapted to fit the time
series data. The technical analysis is also used to evaluate the
network trained by feed-forward with back-propagation. Figure 1
shows the working scheme of the system, which takes inputs from
windowed price and volume information and supplies outputs for
the investor's decision to buy, sell or hold the stock.

Figure 1 Windowed Price and Volume Components Relationship in
the Analyzing System

Data Sorting
In this process, two steps, data cleaning and template
preparation, are proposed to sort the data. Invalid data beyond
the standard boundary, which is set to 2.5σ, are defined as
noise data and are removed or normalized in the database.

Raw data inputs are generated from the time series of price or
of price & volume data. If only price data are processed, the
number of raw data inputs is equal to the chosen window size. On
the other hand, if price and volume data are processed, the
number of raw data inputs is double the window size.

The preparation of the raw data inputs has only one
normalization step. All the raw data are normalized to the range
[-1, 1]. All the records in the column, either price or volume,
are scanned to get the maximum value Xmax and the minimum value
Xmin. Thus, the abnormal data can be filtered as they are
normalized.

In the template fitting step, the 10x10 grid template is applied
to the normalized data over a window of the input time series of
price and volume. The horizontal dimension of the window
corresponds to trading days, and the vertical dimension
corresponds to either stock prices or volume values.

Figure 2 Template matching

The template consists of the percentages of values that the time
series data occupy in the table, and the percentage is converted
into a code. To fit the time series data, each cell of a column
is assigned the percentage of price or volume values which fall
into the respective cell of the column. If one cell in a column
is fitted by all 5 values, then it is coded with 100%; if no
value falls in a cell, then this cell is coded with 0%. Then the
percentage of values which falls in each cell of a column is
multiplied by the weight in the corresponding cell of the
rectangle template. The cross-correlation computation is done
for the 10 cells in the column and summed, ending with a fit
value for the column. Hence, for each trading day, 10 column fit
values for price and 10 column fit values for volume are
computed. Besides the column fit values, the height of the
window is also needed as an input. However, the window height
value needs the normalization step that has been mentioned,
where the normalization scales the data to the active range and
domain of the activation functions used. The activation function
used in both hidden neurons and output neurons is the hyperbolic
tangent function, with the active range [-1, 1]. A pattern data
file with 566 patterns is generated, of which 90% of the
patterns are used for training, while three exams are performed
with different numbers of hidden nodes: 3, 9, and 60.
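The template-fitting step can be summarized by the following
minimal sketch (assumed, not the system's code): each column's
cell-occupancy percentages are multiplied by the template
weights and summed into one fit value per column, giving 10 fit
values for price and 10 for volume per window. The 60-day window
and the 10x10 grid follow the settings reported later; the exact
cell-coding details are assumptions.

    # A minimal sketch (assumed) of the template-fitting step: cell-occupancy
    # percentages per column are weighted by the 10x10 template and summed
    # into one fit value per column.
    import numpy as np

    def column_fit_values(window, template):
        """window: 1-D array of 60 daily values (price or volume), so each of
        the 10 columns covers 6 trading days; template: 10x10 weight grid
        (e.g., a bull-flag template). The cell coding is an assumption."""
        window = np.asarray(window, dtype=float)
        lo, hi = window.min(), window.max()
        scaled = (window - lo) / (hi - lo + 1e-12)          # map into [0, 1]
        days_per_col = len(window) // 10
        fits = np.zeros(10)
        for c in range(10):
            col = scaled[c * days_per_col:(c + 1) * days_per_col]
            rows = np.minimum((col * 10).astype(int), 9)    # cell row per value
            occupancy = np.bincount(rows, minlength=10) / len(col)
            fits[c] = float(occupancy @ template[:, c])     # weighted sum
        return fits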
Neural Network Construction
Batch-mode training is used, where the weights of the links are
modified after each cycle, once all the training patterns have
been presented. The training process works in two major steps
per iteration: the first step is the feed-forward activity, and
the other is back-propagation learning. The agent determines the
configuration of the neural network and writes it to a
configuration file; then the agent reads the configuration file
to construct a network, randomly initializes the thresholds and
starts feed-forwarding. When the output nodes have calculated
the real output, an error function is evaluated to compute the
error, which is then back-propagated to the preceding layer. The
error function is defined as

$$E_p = \frac{1}{2} \sum_{p=1}^{P} (d_p - o_p)^2$$

where dp is the target output, op is the real output, p indexes
the pth pattern (input-output pair), and P is the total number
of patterns in the training set.
The weights are adjusted when the neuron responds with an error.
The function for updating the weights is defined as

$$w_i(t) = w_i(t-1) + \eta\, (d_p - o_p)\, x_{i,p}$$

In each cycle, a cross-validation is performed to estimate the
generalization error, in order to keep the training from
overtraining. The mean squared error (MSE) is calculated during
the cross-validation, and training continues until the MSE
starts increasing.
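A minimal sketch (assumed) of the training loop implied by the
two equations above, reduced to a single tanh unit: the
delta-rule update is applied in batch mode, and training stops
as soon as the cross-validation MSE starts to rise, as described
above. The data shapes and the learning rate are assumptions.

    # A minimal sketch (assumed) of batch delta-rule training with early
    # stopping on the cross-validation MSE.
    import numpy as np

    def train(X, Y, Xval, Yval, eta=1.0, max_cycles=1000):
        rng = np.random.default_rng(0)
        w = rng.normal(scale=0.1, size=X.shape[1])
        best_w, best_mse = w.copy(), np.inf
        for _ in range(max_cycles):
            o = np.tanh(X @ w)                     # feed-forward pass
            w = w + eta * X.T @ (Y - o) / len(X)   # batch delta-rule update
            val_mse = np.mean((Yval - np.tanh(Xval @ w)) ** 2)
            if val_mse > best_mse:                 # early stopping
                return best_w
            best_w, best_mse = w.copy(), val_mse
        return best_w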

Figure 3 Templates Applied in the Analyzing System

The general effectiveness of the trained network needs to be
tested. Firstly, the agent decides which network to test and
which pattern data file to use. Then, the neural network
configuration file and weight file are read and a feed-forward
neural network is constructed. Furthermore, the pattern data
file is read, the input data are fed into the input nodes and
fed forward to the next layer, until the output nodes produce a
real output value. With this real output value and the target
output, the mean squared error of each cycle is calculated. If
the agent is to forecast the future tendency, the precision of
the prediction, the number of actual increases, the number of
predicted increases, and the number of successfully predicted
increases are displayed. If the agent is to estimate the future
price, the MSE and the average error are displayed.
The result from the network then goes through an evaluating
procedure. The predicted result as well as the actual result can
be displayed when the information of the day to be predicted
exists; otherwise only the predicted result can be displayed.

Simulation and Discussion
Two pattern data files are used: one is new, with no overlap
with the dates of the training pattern data file, and the other
is the original training file. The new pattern data file for
testing is generated with the same settings as the training file
except for the ending day and the number of records, which are
set to April 01, 2006 and 180 records, respectively.

                                           3 nodes   9 nodes   60 nodes
Training         Time                      20 s      32 s      38 s
                 MSE                       0.627     0.57      0.623
Testing with     MSE                       0.598     0.585     0.728
new data         Precision (%)             55.4      57.2      50.2
                 Actual Buys               93        93        93
                 Indicated Buys            137       77        61
                 Good Buys                 78        46        34
                 Indicate Precision (%)    56.9      59.7      55.7
                 Prob. of good buys (%)    83.9      49.5      36.6
Testing with     MSE                       0.607     0.595     0.595
training data    Precision (%)             57.5      61.3      59
                 Actual Buys               334       334       334
                 Indicated Buys            463       457       408
                 Good Buys                 274       282       251
                 Indicate Precision (%)    59.2      61.7      61.5
                 Prob. of good buys (%)    82        84.4      75.1

Table 1 Optimization of the number of hidden nodes in the neural
network

The Precision indicates the percentage of successful trend
predictions over all predictions. The Actual Buys indicates the
total number of days in which there is an increasing trend. The
Indicated Buys indicates the number of days in which the system
predicts an increasing trend. The Good Buys indicates the number
of days in which the system successfully predicts an increasing
trend. The Indicate Precision is derived by dividing Good Buys
by Indicated Buys. The Probability of good buys is derived by
dividing Good Buys by Actual Buys.
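As a quick worked check of these definitions, using the
3-hidden-node column of Table 1 under "Testing with new data":

    # Derived metrics from the 3-hidden-node "new data" column of Table 1.
    actual_buys, indicated_buys, good_buys = 93, 137, 78
    indicate_precision = 100.0 * good_buys / indicated_buys   # 56.9%
    prob_good_buys = 100.0 * good_buys / actual_buys          # 83.9%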


During the training period, the neural network with fewer hidden
nodes requires less training time, while more hidden nodes
require more training time. Concerning the effectiveness, the
precision and indicate precision of the neural network with 9
nodes are much better.
The simulation of the investor behavior in the market is based
on the agent's prediction of the market price. After the
optimization of the agent settings, an ideal performance of the
neural network is highly expected.

Pattern template preparation settings         Neural network training settings
Index/stock file      Hang Seng Index         No. of hidden nodes      9
End day               2005.07.01              Learning rate (hidden)   1
No. of records        600                     Learning rate (output)   1
Price/price & volume  Price & volume          Momentum (hidden)        0.5
Raw data preparation  Yes                     Momentum (output)        0.5
Template matching     Cup and Handle,         Flatness (hidden)        1
                      Descending Triangle,
                      Head and Shoulders Top
Window size           60                      Flatness (output)        1
Predict day           5

Table 2 Settings of the experiment on predicting tendency with a
combination of ideal settings

Testing with       MSE                        0.794
new data           Precision (%)              51.7
                   Actual Buys                67
                   Indicated Buys             51
                   Good Buys                  31
                   Indicate precision (%)     60.8
                   Prob. of good buys (%)     46.3
Testing with       MSE                        0.394
training data      Precision (%)              78.4
                   Actual Buys                316
                   Indicated Buys             350
                   Good Buys                  275
                   Indicate precision (%)     78.6
                   Prob. of good buys (%)     87

Table 3 Experiment on predicting tendency with a combination of
ideal settings

Different pattern data have also been tried, to examine how the
input data affect the prediction precision, which in turn
affects the trader's behavior in the market. Experimental
results indicate that such an agent is capable of supplying
results that are superior to those attained by random choice;
what is more, the technique of combining the two (template
matching and neural network) gives better returns than the
neural network alone.

Conclusion
The system applies a hybrid technology with neural network and
technical analysis to analyze the data of the Hong Kong Hang
Seng Index. Different neural network parameter settings as well
as different pattern data have been examined to verify their
impact on the performance of the system in simulating the
investor's behavior. The experimental results indicate that the
prediction from the system can clearly supply the investors with
a certain behavioral orientation instead of moving unregulated
during the trading period.

Reference
1. W. Leigh, M. Paz, R. Purvis, An Analysis of Hybrid Neural
Network and Pattern Recognition Technique for Predicting
Short-term Increases in NYSE Composite Index, Vol. 30, 2002,
pp. 69-76.
2. William Leigh, Russell Purvis, James M. Ragusa, Forecasting
the NYSE composite index with technical analysis, pattern
recognizer, neural network, and genetic algorithm: a case study
in romantic decision support, Decision Support Systems, Vol. 32,
No. 4, March 2002, pp. 361-377.
3. Wanas, N., Auda, G., Kamel, M.S., and Karray, F., On the
Optimal Number of Hidden Nodes in a Neural Network, Proceedings
of the 1998 IEEE Canadian Conference on Electrical and Computer
Engineering, Vol. 2, 24-28 May 1998, pp. 918-921.
4. Roy, S., Factors Influencing the Choice of Learning Rate for
a Back Propagation Neural Network, Proceedings of the 1994 IEEE
International Conference on Neural Networks (IEEE World Congress
on Computational Intelligence), Vol. 1, 27 June-2 July 1994,
pp. 503-507.
5. Tetsuji Tanigawa, Ken'ichi Kamijo, Stock Price Pattern
Matching System - Dynamic Programming Neural Network Approach,
Proceedings of the International Joint Conference on Neural
Networks (IJCNN), Vol. 2, 7-11 June 1992, pp. 465-471.
6. Clarence N.W. Tan, Gerhard E. Wittig, A Study of the
Parameters of a Back Propagation Stock Price Prediction Model,
Proceedings of the First New Zealand International Two-Stream
Conference on Artificial Neural Networks and Expert Systems,
24-26 Nov. 1993, pp. 288-291.
7. William Leigh, Naval Modani, Russell Purvis, Tom Roberts,
Stock Market Trading Rules Discovery using Technical Charting
Heuristics, Expert Systems with Applications, Vol. 23, No. 2,
August 2002, pp. 155-159.
8. William Leigh, Noemi Paz, Russell Purvis, Market timing: a
test of a charting heuristic, Economics Letters, Vol. 77, 2002,
pp. 55-63.
9. Martin J. Pring, Technical Analysis, 4th edition.
10. Chart School, http://www.stockcharts.com


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 631--637
Copyright@2007 Watam Press

APPLICATION OF NEURAL NETWORKS IN PREDICTIVE


MAINTENANCE OF ROTATING MACHINERY – A REVIEW
H. Ranganathan (1), J. Pattabiraman (2)

(1) Sakthi Mariamman Engineering College, Chennai, 602 105,
India. email: rangah@vsnl.com
(2) MNM Jain College of Engineering, Chennai, 600 096, India.
email: dr_praman@yahoo.com

Abstract— Integration of interdisciplinary and domain-specific
knowledge, databases and expert opinions using neural networks
and fuzzy logic, with a view to providing fast and reliable
solutions, is now rendered a possibility by advanced computing
concepts and tools, which are believed to help in providing a
quick remedy to some of the issues pertaining to machinery
vibration diagnostics. This paper presents an overview of the
status of machinery vibration problem diagnostics and its
limitations, and discusses a conceptual software design that can
minimize such limitations and render more precision to the
diagnostics of vibration problems in industries. Efforts are
under way to implement such a diagnostic tool on a trial basis.

Index Terms—Rotating Machinery – Predictive Maintenance –
Artificial Neural Network approach – Boltzmann Machine.

I. PROBLEMS OF VIBRATION IN ROTATING MACHINERY

It is known that vibration in machines with moving or rotating
parts is an outcome of several causal factors such as design,
manufacturing, assembly, construction, erection, installation
and operation. No machine can be designed for a pre-determined
vibration output. Vibrations are felt and perceived only when
the machine is running. Over years of experience and the pooling
of knowledge resources from designers, operators and
researchers, vibration standards have been evolved [1] to
provide some guidelines as to whether the vibrations are
acceptable, and to help in permitting continued operation of
machines with or without remedial measures.

Quite often, plant and machinery management do not concern
themselves much with vibrations so long as continuous operations
are not hampered by them. It is felt that there is no need to
keep records of the characteristic features of vibration so long
as no problems are posed (unlike what is practiced in human
health monitoring). Only when vibrations are likely to exceed
limits is there an awakening, resulting in finding a quick
solution to restore the system rapidly to normal operation. This
procedure is known as preventive maintenance. It is not an
efficient method of maintenance management, since it still does
not provide answers to what caused the disorder, how it
manifested itself and how long it took to be noticed. A
maintenance strategy that incorporates such knowledge is known
as the predictive maintenance technique.

Many progressive organizations are conscious of keeping their
machinery health in good condition and choose to adopt
predictive maintenance technology. Vibration signature analysis,
one of the very powerful tools of machinery vibration
diagnostics, has gained popularity in industrial applications to
resolve and identify the most likely causes of excessive
vibrations. Standard malfunction tables, presenting causes along
with the corresponding peaks in vibration spectra, have been
made available by many of the leading instrument manufacturers,
like IRD Mechanalysis, B&K, etc. [2]. Such generic diagnostic
tables still leave the following areas vague.

(i) Attributing a specific cause to an increased spectral
response in vibration (more than one cause can produce the same
response in the spectrum) by collating other related data and
plant operating parameters is not yet made available.

(ii) Since only causes pertaining to mechanical malfunctions are
identified as responsible for a change in vibration behavior,
other likely causes, such as changes in temperature, velocity,
volume flow rate, humidity, etc., are not considered for
correlation with the corresponding changes in the vibration
spectrum. This aspect is qualitatively addressed using the
experiential knowledge of specialists who have spent a
substantial part of their professional career dealing with plant
and machinery problems.

There is a cost to be borne for "vague knowledge", as it may
demand opening up the machine to see if any defects are visible.
This cost is contributed by two factors: firstly, the downtime
of machinery, resulting in production loss, and secondly, the
fact that the machinery in general is not restored to its
original performance condition after reassembly. It is also
quite probable that this process may introduce further
complications, which may have their own consequences. The main
objective should be that the process of finding a solution to an
existing problem does not result in the creation of further new
problems, so to say, to eliminate side effects (an analogy with
medical diagnostics).
The above can be summarized in the following two major thrust
areas.

(i) There is a very strong need to integrate the knowledge of
machinery vibration data, vibration standards, past history of
vibration performance and diagnostic information with the
spectral response. This can be made possible only by designing
appropriate databases and data flows between databases, which
will narrow down the causes of machinery vibration.

(ii) With the output provided by (i), correlated with experts'
knowledge as well as other parameters relating to plant
operations, sequence-of-events records and the vibration
behavior of similar machines operating elsewhere, the solution
can be provided more precisely, to help in correct remedial
action. Opening up of the machine is then performed only in case
of absolute necessity (like surgery in medical treatment).

Challenged with high expectations from customers and a strong
commitment from management to assure continuous machinery
operation with minimal or practically no down-time, utility
managements are compelled to look for a quick, accurate and
reliable answer to their machinery problem and to provide a
total solution, which incorporates recommendations for
modifications or repairs, and post-repair operating procedures
which ensure that the existing problems do not re-manifest
themselves and that newer problems are not created.

Rapidly expanding knowledge from several specialized fields
needs to be integrated meaningfully to provide a total solution
to the machinery problem. The human brain will not be able to
handle such a vast information bank, and therefore
computerization becomes more effective for such situations.

Integration of interdisciplinary and domain-specific knowledge,
databases and expert opinions using neural networks and fuzzy
logic, with a view to providing fast and reliable solutions, is
now rendered a possibility by advanced computing concepts and
tools, which are believed to help in providing a quick remedy to
some of the issues pertaining to machinery vibration
diagnostics.

This paper illustrates some typical cases pertaining to the
vibration diagnostics of rotating machines and presents areas
which can be probed further with the help of neural networks and
fuzzy logic.

II. ROTATING MACHINES AND VIBRATIONS

PLANT EQUIPMENT CATEGORY

A plant's equipment can be classified into four categories.

Critical machinery, like a turbine generator, the unexpected
failure of which causes significant production loss.

Critical, but spared machinery in arduous service, like boiler
feed pumps, the unexpected failure of which jeopardizes but does
not interrupt production.

Partially spared equipment, like pumps, compressors, gears,
etc., in critical services.

Non-critical machinery, like various auxiliary service motors
and pumps.

Vibration problems, which may be damaging in critical equipment,
cannot be neglected and should be remedied as soon as possible
when a problem is encountered.

GUIDELINES FOR VIBRATION ANALYSIS

The nature of problems in machinery causing vibrations can be
identified through comparison of signals from a healthy machine
with those from a machine having a known problem. The general
guidelines for adopting a systematic procedure in collecting the
relevant data to tackle vibration problems are furnished below:

(1) Vibration measurement and analysis should be done on a
machine that is having a problem and also on a machine that is
trouble free.

(2) Periodic measurement (preferably once a week) of vibration
and signature analysis should be done to see the trend of
vibration behavior.

(3) Only one change must be effected at a time. When a
modification is carried out either on the machine or on the
foundation, vibration and signature records should be
appropriately tagged, showing the status of machine and
foundation and the location and date of measurement, together
with the specification of the instrumentation used. The process
is repeated for any other change or combination of changes made.

(4) Systematic numbering and labeling of measurement locations
should be followed on the sketches used to describe the machine
layout.

(5) All details pertaining to the vibrating machine (make,
rating, drive and its connections with neighbouring equipment,
operating and process parameters, etc.) should be recorded.

(6) A description of events occurring prior to noticing high
vibration should be obtained, along with the data on operating
conditions. This information could be obtained from the machine
operators/control room operators and the machinery logbook of
the plant.

(7) Whenever necessary, the natural frequency of the system (be
it piping or equipment or structures) should be determined,
using either a computer program or known mathematical analysis,
depending upon the problem and the desired degree of accuracy,
and kept as part of the plant data.

(8) Signature analysis should be done for the same location as
was done earlier, to identify changes, if any, in the vibration
pattern after carrying out the modification(s).

In many process plants and power plants, a combination of both
periodic and continuous monitoring systems is required. This is
classified as "Integrated Plant Condition Monitoring", which is
dictated by the concentration and distribution of the rotating
machinery, its strategic importance to production, the time to
failure, and the level of sophistication of the required
monitoring system. In this case, periodic monitoring is
performed using automated data collection, while the critical
machines are "hardwired" to a machine information center for
continuous protection. The data from the various periodically
and continuously monitored systems are off-loaded and
transmitted to a centralized computer terminal where process


and maintenance data are also logged. This will provide the
maintenance manager with a comprehensive plant performance and
maintenance information database.

For machinery vibration analysis and predictive maintenance, the
key monitoring parameters under steady-state and transient
conditions should include the magnitude of dynamic motion (rotor
radial vibration, casing vibration), the frequencies of dynamic
motion, the phase angle, and the journal centerline position.
Measurement of the parameters should be followed by data
reduction into an interpretable format. Steady-state (on-line)
vibration data can be reduced into Orbit or Time Base plots,
comparative time spectrum plots, Mode Shape and Trend Analysis
plots. Transient vibration data can be reduced into Bode plots
(phase vs. speed and amplitude vs. speed) and Polar plots (phase
vs. speed vs. amplitude), spectrum cascades, and a plot of the
journal centerline position with respect to the geometric center
of the bearing. For vibration analysis in the laboratory with
the help of taped (FM) data recorded at site, it is desirable to
develop the necessary software and to use it on a digital
computer, because of the data reduction capability of a computer
system.

So, from the above review, it can be concluded that for periodic
vibration monitoring, Automatic Data Collection Systems (both
static and dynamic, depending on the suitability of overall
vibration level measurement or amplitude-vs.-frequency signature
checks) should be introduced, while for critical machines
continuous monitoring through a machine information center
should be adopted. In the case of process plants and power
plants, where both periodic and continuous monitoring are
essential, an integrated plant condition monitoring system is
recommended.

BUILDING IN INTELLIGENCE

It is needless to emphasize that building "intelligence" into
the automation process has become a necessity, to improve the
preciseness of diagnostics and remedial procedures and to
perform these in minimal down time based on the situation. The
efficiency of the maintenance system is improved if a smaller
domain is selected for analysis and diagnostics, rather than
dealing with the vast data pertaining to all the machines, which
is not only time consuming but can also lead to doubtful
solutions. So a dedicated system for each critical machine, one
that caters to the entire knowledge integration, is a more
effective proposal.

Common causes of vibration are: some part of the moving
machinery being out of balance, turbulent fluid flow, rattling
of loose objects, impulses and shocks, etc. Standard tables,
which relate the various sources of vibration to the nature of
the resulting response, are available in the literature as well
as in instrument catalogues [1], [2], [3], [4].

Using the data collected and the information derived from the
diagnostic table, it is possible to assess the likely cause(s)
of vibrations with a reasonable range of accuracy, and provide
the remedial measure. References [5], [6] give some case studies
of plant vibration problems and how they were resolved, and also
convey that with further refinements in decision making, the
solutions would have been much better.


remedial measures along with the corresponding peak
Fig 1 Machine specific vibration diagnostic flow chart using ANN spectral frequency data. For example, under the category of
vibration amplitudes, which show prominent peaks under
Utilizing all the above, a generalized flow chart running speed (1 * RPM), several possibilities exist as
depicting the flow of information as well as deductions causes, some of which are as follows.
and inferences made from these from start (beginning of a
vibration problem noticed or reported) to the end (1) Unbalance
(successful solution of a vibration problem) has been (2) Misalignment
evolved as shown in flow chart (Fig 1). This is helpful for (3) Bent shaft
future reference for tackling any vibration problem and (4) Assembly errors
incorporates scope of further analysis. (5) Foundation errors
(6) Eccentric armature of electrical m/c


(7) Electrically induced vibrations
(8) Flexible rotors
(9) Rotating m/c rubbing
(10) Self-excited sub-harmonic resonance
(11) Excessive clearance in journal bearings
(12) Thermal unbalance
(13) Increased turbulence in air-handling m/c
(14) Damaged or bad gears

There is a need to make further observations to bring the above possibilities down to a more manageable list of causes. Additional data on vibrations, like an increase in axial vibrations, a stroboscopic test, resonant behavior, the vibration response when electric power is switched off, shaft vibration measurement in addition to bearing-housing vibration, the vibration change between start-up and steady state with the corresponding temperature change, a change in noise level, etc., throw more light on narrowing the above list of causes down to a manageable list. If all these reductions point to, say, unbalance as the most likely cause, then it is necessary to check whether the earlier data consistently show the symptoms reinforcing this reason. If it is pointing to a case of mechanical unbalance in the machine, the measurement and analysis should also be made at different flow levels at the same speed, to ensure that the vibration readings do not change. Then it can be concluded to be a case of unbalance. This might have been caused by some "inputs" during transportation, handling, erection or installation. In such an event, field balancing, preferably two-plane balancing, can be recommended, and the machine watched more frequently after this is done to ensure consistent machine performance.

III. PROPOSED AUTOMATED SYSTEM

In the proposed automated system with built-in neural network and fuzzy logic, all the foregoing reasons are automatically scanned with the other data and "put into" the system, along with the expert system, to give a final recommendation. This is one such example, and the same logic can be extended to other causes of machinery vibration.

A conceptual scheme illustrating the organization of databases, dynamic data and logical decisions derived from expert systems and Neural Networks is furnished already in Fig. 1.

It must be appreciated that the plant instrumentation system provides scope for retrieving all the above data. What is necessary to solve the problem precisely is not just looking at the vibration spectral data alone but at all the other data over a period, and to look at them together. Such a totality approach ensures a unique cause for the observed effect, arrives at a unique solution, and also provides reasoning to plant personnel that a scientific approach is followed and the solution is justified.

EXPERT SYSTEMS

The details of the Expert Systems are conceptually presented as an integral part of plant vibration diagnostic systems.

In the foregoing analysis, it is seen that any mechanical problem in the machinery setup reflects in a change in the pattern of the vibration spectrum over time. Experts in the field can "read" the vibration information and come out with a set of possible causes. These experts have gained considerable knowledge to be able to interpret the information contained in the vibration analysis and transfer the same into diagnostic information. However, such knowledge is personalized. Transducers are used to measure vibration in terms of axial, vertical and horizontal displacement, velocity or acceleration. There are also standards specified for different applications. It is possible to have other machine-specific data built into a database and refer to the same to compare with current data, to identify any significant variation.

With the aid of historic information and current data, it is possible to deduce the changes introduced into the system. It is also possible to digitize such variations in an 'n'-dimensional plane, the dimensions being (to name only a few) variations in:

1. Peak value of the 1 × rpm vibration value
2. 3 dB spread at the 1 × rpm value in terms of frequency
3. Peak value of the ½ × rpm value
4. 3 dB spread at the ½ × rpm value
5. Peak value at the critical speed
6. 3 dB spread at the critical speed
7. Peak value at the 2 × rpm value
8. 3 dB spread at the 2 × rpm value and higher harmonics
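To make the idea concrete, the following is a minimal sketch (our own illustration, not the authors' code) of digitizing these variations into an n-dimensional feature vector; peak_at and spread_3db are hypothetical helpers that read a peak amplitude and its 3 dB bandwidth from a measured spectrum.

    # Turn a baseline spectrum and a current spectrum into the
    # n-dimensional variation vector listed above.
    def variation_vector(baseline, current, rpm, critical_speed,
                         peak_at, spread_3db):
        points = [0.5 * rpm, 1.0 * rpm, 2.0 * rpm, critical_speed]
        features = []
        for f in points:
            # change in peak value at this spectral location
            features.append(peak_at(current, f) - peak_at(baseline, f))
            # change in 3 dB spread at this spectral location
            features.append(spread_3db(current, f) - spread_3db(baseline, f))
        return features  # one point in the 'n'-dimensional plane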
An expert who studies these values, from vibration analysis data taken at the time of installation of the machinery and at a later time, can identify "patterns". The expert may then proceed to "classify" different possibilities of potential malfunction. There will be attempts, then, to eliminate easily identifiable exceptions before one ventures on to a study of more possible causes, which may be more time consuming.

There is a similarity between human systems and Artificial Neural Networks. Artificial Neural Networks (ANN) take their cue from the way the human brain works and try to emulate a similar operation with the help of a computer program. The human brain is the fastest parallel processor, which is best at processing different data perceived simultaneously through different organs, and recognizes different patterns.

The human brain is considered to be fast and most efficient in the following:
1. Pattern Recognition – Audio – Image
2. Classification – Tall – Short, etc.
3. Clustering

ANNs operate on the principle of identifying the digital data inputs and recognizing "patterns" out of them, as the human brain is expected to do. After the pattern recognition, ANNs classify the different faults and the list of possible remedies. ANNs operate on the following two specific characteristics:
1. Structure
2. Learning process


Structure:

Different networks use different structures with a number of layers. Based on the actual application, the structures vary. Various options are:
1. Perceptron
2. Back propagation networks
3. Kohonen's self-organizing networks
4. ART networks
5. Hopfield, etc.

Learning Process:

Each network has to be "trained" first. The network has to be 'informed' that a particular set of inputs will result in a particular set of outputs. This training or learning process can be under supervision or with no supervision at all. Based on the available information and on the number of samples available for "training" the network, the training is performed. In this network, the number of input neurons and output neurons, and an arbitrary network with arbitrary weights, are considered first. A set of inputs and outputs is specified, which will alter the weights. The alteration of weights proceeds until the local and global minima are reached. After the training, once a set of inputs is presented, the network will give the corresponding set of outputs.

NEURAL NETWORKS APPROACH

The maintenance engineer has the responsibility not only to assess the severity of the problem precisely when there is a change in the vibration level of any machine, but also to indicate how soon the rectification should be carried out. Such professionals are usually experts in their respective domains, with comprehensive knowledge about the relevant characteristics that are likely to influence the problem. Besides diagnostic skills – gained through repeated practice and tutoring – these professionals develop, with experience, a certain 'feel' for the domain which enables them to improve their assessment of the extent of the problem in a given case.

The main characteristics of such an assessment of diagnosis / classification are:

- Identifying the impairments from the information available, based on the vibration pattern.
- Confirming the identified impairments based on specific criteria, to determine the machine status.

Using a manual, one can assess the likely impairments reasonably well, but only in ideal-case scenarios. The expert has to make an assessment based on the vibration pattern that is available to him. All missing information must be compiled before attempting to classify an impairment to determine the cause of variation in the vibration pattern. Even when the complete machine functional profile is available, this is a time-consuming and expensive process. One is unlikely to match the specified criteria exactly in every case. The ambiguity in rendering a classification is made more complex in cases where more than one malfunction is present and each contributes in some proportion to the overall change. In short, there is a lack of adequate diagnostic and classificatory procedure in the existing signature analysis approaches. Uncertainty and incomplete information seem to prevail generally. On rare occasions the desired constraints are specified. Even on those rare occasions, the assessments made are incomplete to the extent that a total solution is not achieved.

ROLE OF ARTIFICIAL INTELLIGENCE

In the field of Artificial Intelligence (AI), the recent past has seen the evolution of several methods and tools for constructing diagnostic systems. It appears that there is no single method / tool which can be claimed to be the best. Studies have revealed that there is no method which has clear advantages over the others. The main issue at stake has been the accuracy of such systems in predicting outcomes from a selected set of symptoms. The development of an intuitive 'feel' for the domain, a judgment skill that is routinely displayed by experts, is, however, functionality that is impossible to replicate using classical AI representations. Classical AI systems do not allow assimilation of new information about prior case histories, which is an intrinsic part of the decision-making process. Being aware of such limitations in solving diagnostic problems using AI techniques, in this paper it is proposed to have a more practical approach using Artificial Neural Networks (ANN) for capturing, quantifying and emulating the diagnostic (objective) logic followed by the experts.

ARTIFICIAL NEURAL NETWORK TECHNIQUE

The proposed ANN model employs a unique decision-making paradigm that enables the gradual augmentation of domain knowledge through an experience-based, unsupervised learning process called episodic learning. The model used here is a process mechanism which assumes that all information processing takes place through interactions of a large number of simple processing elements that are connected to each other in the form of a network. The most distinguishing feature of such a network is the absence of a central controller to coordinate the processing within the system. The relative autonomy with which each processing element can carry out its computations makes parallel activation models extremely attractive for solving problems such as the one on hand. These problems traditionally required inordinate amounts of processing time and capacity.

Diagnostic reasoning is a high-level cognitive task that involves providing an explanation for a set of symptoms by postulating a set of disorders or impairments. Existing symbol-processing models of diagnostic reasoning do not provide an adequate mechanism to support interaction among knowledge structures, which is necessary to capture the generative capacities of human diagnosticians in novel situations. Achieving such interactions has been one of the greatest difficulties in implementing models of diagnostic reasoning that can reason in the presence of imprecise and incomplete information. Our proposed model is based on the theory that diagnostic reasoning takes place through the interactions of several relevant domain concepts connected to each other in the form of a causal network. The model is based on an underlying associative network of
nodes that represent domain concepts or 'knowledge atoms'. These nodes are connected to each other through links that represent associations (both causal and non-causal) among the knowledge atoms. The network consists of a set of hierarchically layered nodes in which nodes in one layer connect only to the nodes in the immediately adjacent layer, as shown in figure 2.

[Figure 2 appears here: a hierarchically layered associative network in which nodes in one layer connect only to nodes in the immediately adjacent layer.]

There are two kinds of nodes within such a network. They are:
- Nodes whose activations do not change during the problem-solving phase (stuck-at nodes)
- Nodes whose activations change during problem solving (memory or accumulator nodes)

A node in the network is said to be active if it has a non-zero activation. An active node can have a negative or a positive activation level, which lies in the interval −1.0 to +1.0. A positive activation signifies the presence of the concept in the node. A negative activation signifies the absence of the concept at the node. Zero activation signifies no knowledge about the presence or absence of the concept represented by the node.

Nodes in one layer are connected to the nodes in the adjacent layer through bi-directional links. In such a network, one can define a connectivity matrix for the nodes of two adjacent layers. Each entry in the matrix specifies the association between a pair of nodes in mutually adjacent layers, if it exists, and the strength (weight) of this association. These weights lie in the interval −1 to +1. A negative weight implies a preventive association or inhibitory link, and a positive weight implies a supportive association or excitatory link.

A network with two adjacent layers, layer D (disorders or causes) and layer M (manifestations or effects), can be considered in the parallel activation model framework. Each node di in layer D is connected to a set of nodes man(di) in layer M through causal links. Similarly, each node mj in layer M is connected to a set of nodes evokes(mj) in layer D through upward evoking links. In each direction the associative strengths may represent either conditional probabilities or subjective causal or evoking strengths respectively. If conditional probabilities are used as weights, then for any given manifestation node the sum of all the upward link weights must add up to 1.0. These constraints need not be necessary or relevant when subjective evoking strengths are used as weights. In the context of the parallel activation model, the following characteristics are to be noted:

1. For any given manifestation node that is said to be present, all plausible disorders that can account for its presence get activated initially.

2. Among alternative disorders with the same manifestation, only one disorder stays active, by completely inhibiting all the others (some of them may get reactivated subsequently to account for other observed manifestations).

3. Each observed manifestation is accounted for by at least one disorder that stays active.

4. The disorders that are found to be active at the end of the problem-solving session are those that present a globally most plausible explanation for the observed manifestations.

It is useful to think of the associative links between the nodes of adjacent layers as constraints, each of which could be a numeric value denoting how activated one node can become when its corresponding associated neighbor is fully activated. Since these constraints are local to the specific nodes that are involved, they are to be satisfied only locally. In the present model, the activity of a node that spreads to its neighbor is not determined by the level of activity possessed by the recipient node. In such models, the incoming activation for a node is the weighted sum of the activities of the neighbors to which it is directly connected.

Two separate aspects of contention exist between neighboring nodes that become competitors and form an inhibitory cluster. The idea of competition is introduced by permitting the evoked disorders of manifestation mj in an adjacent disorder layer to actively compete for the total output activity of mj. The ability of a disorder node di to compete for mj's output activity aj(t) is proportional to its own activation level ai(t) and to the weight of its association with mj. This computation enables highly activated nodes to extract a proportionally larger part of mj's finite activation, leaving successively smaller portions for competitors with lower activations.

In diagnostic reasoning, whenever supporting evidence is found for one disorder among several plausible alternatives, then not only does it support the belief that that disorder is present, it also discourages the belief that one of the other alternatives is present. Since competing disorders influence each other through the manifestations that they all share, this drain of activation among competitors can be interpreted as a correction of the original support that is received by each competing disorder di from a common manifestation node mj. The degree of confidence (DC) in the presence of disorder di, due to the manifestation nodes mj that are present from its man(di) set, is computed as the sum of the contributions that are made by each of the individual manifestations.

Similarly, the Degree of Uncertainty (DU) in its presence is computed as the sum of the activation that is drained from it by all of its competitors that share its active
manifestations. One other phenomenon that is believed to occur during diagnostic reasoning is that the belief in the presence of a manifestation is diminished if other manifestations which share a common cause with it are not observed. This effect can be expressed as a proportionately lower degree of confidence in the presence of their common cause. The Active Fan-Out (ACTFOi) can be computed as a weighted measure of the portion of the manifestations of di that are in fact found to be present. Using these definitions, one can compute the net flow of activity into a disorder node di at time t as:

Fi(t) = tanh [ DCi(t) · ACTFOi − DUi(t) ]

Given this competition-based parallel activation approach to satisfying local constraints in a causal network, a global interpretation of the effects is possible by allowing the network to stabilize. Such a network could have the structure shown in figure 3.

[Figure 3 appears here: a two-layer causal network with disorder nodes d1, d2, d3, … connected to manifestation nodes m1, m2, m3, … .]
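Purely as an illustration of the preceding definitions, here is a sketch (our own reading of the prose, not the authors' code) of one update of the net flow Fi(t) for all disorder nodes; the exact competition and drain formulas are assumptions based on the description above.

    import numpy as np

    def disorder_net_flow(W, a_m, a_d):
        """One illustrative step of the competition-based activation model.
        W   : (n_disorders, n_manifestations) link weights in [-1, 1]
        a_m : activations of manifestation nodes in [-1, 1]
        a_d : current activations of disorder nodes in [-1, 1]
        """
        present = a_m > 0                                  # observed manifestations
        support = np.where(present, W * a_m, 0.0)          # original support from each m_j
        # Disorders compete for each manifestation's finite output activity,
        # in proportion to their own activation and their link weight.
        claim = np.maximum(a_d[:, None] * W, 0.0) * present
        share = claim / (claim.sum(axis=0, keepdims=True) + 1e-9)
        DC = (share * a_m * present).sum(axis=1)           # confidence won after competition
        DU = np.maximum(support - share * a_m, 0.0).sum(axis=1)  # support drained by competitors
        # Active fan-out: weighted fraction of d_i's manifestations observed.
        ACTFO = (np.abs(W) * present).sum(axis=1) / (np.abs(W).sum(axis=1) + 1e-9)
        return np.tanh(DC * ACTFO - DU)                    # F_i(t) = tanh[DC_i(t)·ACTFO_i − DU_i(t)]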
IV. CONCLUSION

In this context d1, d2, d3, … etc. could imply disorders such as unbalance, looseness, a bend in the shaft, non-alignment, excessive bearing clearance, etc., and m1, m2, m3, … etc. could be manifestations such as the vibration spectrum exhibiting peaks at 1 × rpm, 2 × rpm, critical speed 1, critical speed 2, etc.

With the proposed intelligent system built into the diagnostic process, the authors believe that the handling of vibration problems in plant and machinery could be made more 'user friendly', and that the scientific, heuristic and probabilistic approaches could be merged in the final decision for corrective action. This will also ensure that the cause-effect relations are clearly understood and the corrective action is made fool-proof, without introducing any new problem in the process, as per Fig 1.

The authors propose to work further and apply the above concept to a live industrial problem, and shall present the results in their future publications.

REFERENCES

[1] Chhaya S.V. and Sharma S.K., "Vibration Monitoring and Diagnosis of Rotating Machines", technical paper presented at the National Symposium on Vibration of Power Plant Equipment, BARC, Bombay, March 1986.
[2] Brik Anderson, "Vibration Monitoring System, a Way to Improved Reliability", Technical Symposium, Thermal Power and Heat Generation, ASEA, 1983.
[3] Hills P.W., "Predictive Maintenance Systems for Monitoring Vibrations on Rotating Machinery", IRD Mechanalysis (U.K.) Ltd., August 1986.
[4] Bently D.E., Zimmer S., Palmatier George E. and Muszynska Agnes, "Interpreting Vibration Information from Rotating Machinery", Bently Nevada Corporation, 1985.
[5] J Pattabiraman, R Srinivasan and D K Bhattacharjee, "Vibration studies on suction piping of a boiler feed pump", Proceedings of the National Symposium on Vibration in Power Plant Equipment, BARC, Trombay, Bombay, March 19-22, 1986.
[6] J Pattabiraman, "Vibration problems in operating plants – Some Case Studies", in Proceedings of the International Conference on Condition Monitoring, CM '97, Xian, P. R. of China, March 1997.

H. Ranganathan, India, received his BE in Electronics and Communication Engineering and his M.Sc (Engg) in Communication Systems Engineering from the College of Engineering, Guindy, Madras 600 025, India, in the years 1975 and 1978 respectively. He is currently pursuing a PhD at Anna University, Chennai, India. He worked in the installation and maintenance of various computers with different configurations from 1978 until 2000, and developed hardware and software solutions for various field problems with many customers in India. He has been teaching since November 2000. Currently he is Professor and Head, Department of Electronics and Communication Engineering, Sakthi Mariamman Engineering College, Thandalam, 602 105 (near Chennai), India.

Dr. J. Pattabiraman holds Bachelor's, Master's and Doctorate degrees in Mechanical Engineering from the Indian Institute of Technology, Madras, India. He has thirty-eight years of experience, of which about 25 years were spent in consulting and manufacturing industries in India and the balance 13 years in teaching and research. He has 32 papers to his credit in the areas of stress, vibration and failure analysis, and remanent life evaluation of power plants. He is currently Dean (Mech. Engg.) and Vice-Principal of MNM Jain Engineering College, Chennai, India. He is a member of SAE (International) and ASME (International), a Fellow of the Institution of Engineers (India) and the Institution of Production Engineers (India), and a Chartered Engineer.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 638--643
Copyright@2007 Watam Press

Artificial Neural Network approach for Estimation of Hemoglobin in Human Blood using Color Analysis

H Ranganathan¹, N Gunasekaran²
¹Sakthi Mariamman Engineering College, Chennai, 602 105, India. email: rangah@vsnl.com
²College of Engineering Guindy, Anna University Chennai, 600 025, India. email: nguna@annauniv.edu

Abstract—Artificial Neural Network (ANN) is an excellent approach for establishing tacit relationships among variables. Anemia is a globally dreaded health hazard, caused by a reduction of the Hemoglobin (Hb) level in one's blood. To reduce the complications due to anemia, the Hb level needs to be measured. There are a number of methods available for estimating the Hb value in blood, but these give very approximate results as compared to the standard and well-recognized Cyanmethemoglobin method. The Cyanmethemoglobin method is not suitable for rural settings as it involves handling of deadly chemicals and sophisticated equipment. It is found that a correlation exists between the color of one's blood and the corresponding Hb level in one's blood. There is a requirement for a simple method suitable for rural settings. The color-matching technique is one of them, and it is recommended by WHO for adaptation in low-resource settings. Since human interpretation errors are likely to creep in during this subjective method, a computerized method to determine the color of blood is devised. One non-invasive method is examined for establishing the relationship between the color of the blood and the Hb level. Since its level of accuracy is not adequate, an Artificial Neural Network (ANN) approach for estimation of Hemoglobin in human blood has been evaluated using the back propagation learning algorithm. The ANN used color-coded values of the samples as input and the Hemoglobin value, as obtained with the Cyanmethemoglobin method, as desired output, using 2007 samples. The results show a strong relation between the color of the blood sample and the hemoglobin level in the blood.

Index Terms—Estimation of Hemoglobin in Human blood – Artificial Neural Network approach – Back Propagation Network.

I. INTRODUCTION

Many relationships between variables are not clearly defined, but these relationships exist in some abstract form. When there is a set of dependent variables that depend on a set of independent variables, many approaches in mathematics can be used to define the existing relationships between the independent and dependent variables. Generally these relationships are established by observing variations of the dependent variables against variations of the independent variables. Some of the known methods of establishing relationships are:

1. Curve fitting method
2. Regression analysis

It is possible to visualize the relationship in the form of a straight line, a curve, a plane or a curvilinear plane if the number of independent variables is limited to 2 or 3. When the number of independent variables is more than 3, it is practically impossible to visualize the relationship.

Many problems which involve real-world intelligent tasks, such as speech, vision or natural language processing, do not fit into categories of straightforward relationships. Only computer-based algorithms are useful in overcoming cumbersome manual calculations that tend to get repetitive and time consuming. Many software packages are readily available for finding solutions or for "curve fitting". It is found that these concepts are easier to follow in establishing linear relationships. When the problem on hand involves a number of variables with non-linear relationships, it becomes cumbersome.

Artificial Intelligence (AI) techniques appear to be promising for providing solutions to intelligent tasks. However, even AI techniques need clear specifications of the problems. Besides, it is required to map the problem on hand into a form suitable for application of AI techniques. There are some well-known approaches available for use in AI techniques. Some of them are:

1. Heuristic search methods
2. Rule-based approach

In all the approaches it is important that one understands the underlying principle involved and applies the available information in a suitable method for identifying the mapping relationship that exists. To overcome the problem of matching the given problem to an AI approach, many scientists feel that the intelligent mapping of inputs and outputs can be achieved by Artificial Neural Networks. It is possible to extract relevant features from the input data and recognize patterns by ANN approaches. These networks learn from examples without a need for explicitly stating the rules.

In using ANN techniques a distinction is made between patterns and data. Knowledge, common sense and learning are natural to human beings but unknown to computers. This is because humans recognize patterns while computers are very good at handling data. Recall of patterns is found to be possible even when the data are noisy or partial. Many data patterns appear in the real world with an implicit relationship between input and output patterns. ANN systems are provided with some training pattern pairs. The system maps the relationship between the input patterns and the corresponding output patterns. When an input pattern is offered for testing, the stored mapping is recalled and, after interpolation, the corresponding output pattern is identified.


ANN is a technique that is extensively used in the field of medicine [1,2,3]. An ANN works on the basis of an architecture and a learning algorithm. There are a number of architectures and a number of learning algorithms. Based on the nature of the problem and the type of mapping required, the architecture and the learning algorithm are chosen. It is experienced that the selection of proper input variables has a significant impact on the performance of ANNs. For examining the existence of any type of relationship, where there is no explicit relationship between the input variables and the output variables, the Back Propagation Network learning algorithm is recommended.

It is found that the color of blood can give an indication of the hemoglobin (Hb) level in one's blood. Currently available methods of estimating the Hb level in blood use this information. For establishing the relationship between the color of the blood and its Hb level, two methods are suggested. In one method, the color of the blood is determined by occluding the blood flow to the thumb tip and taking a digital photograph of the blood-welled thumb tip. Any natural color can be resolved into Red, Green and Blue components. The R, G and B values before and after occlusion of blood flow give an indication of the blood color in terms of R, G and B values. Multiple regression is carried out to establish the relationship between the R, G and B values so found and the Hb value of the given sample. To validate the relationship, color digitization along with the ANN technique is used. This paper explains the steps involved in such a process.

II. HEMOGLOBIN IN BLOOD

HEMOGLOBIN

Hemoglobin (Hb) is a blood substance containing iron and protein. Hb needs to be monitored regularly in some cases. A drop of Hb in blood results in anemia, which is a health hazard. The drop of the Hb level in blood can be due to a deficiency of Iron, Vitamin B12 or Folic acid. Of these, anemia due to Iron deficiency is the most prevalent. Hemoglobin is responsible for carrying Oxygen from the lungs to the various parts of the body through the blood. So, a reduction in the Iron level will result in a reduced Oxygen-carrying capacity of the blood, which can have an adverse effect on the health of the individual.

Anemia afflicts about 2 billion people worldwide, mainly women and children, as per the report by InFocus [4]. The main reasons behind such an alarming proportion of prevalence of anemia can be poverty, inadequate diet, certain diseases such as leukemia and thalassemia, pregnancy and lactation, and poor access to health services. Many times, wrong estimation of the Hb level leads to a false sense of security, resulting in more severe anemic conditions. Anemia is not very prevalent in developed countries, but is observed in alarming proportions in developing countries.

The problem of anemia gets aggravated specifically during menstruation, pregnancy, and athletic and sports activities. The problem of anemia during pregnancy not only affects the person suffering from that condition, but also extends to situations such as stillbirths or babies with low birth weight, and prenatal and maternal mortality [5].

There are a number of methods available for estimation of the Hemoglobin level in blood. These are documented and dealt with in detail by the World Health Organization (WHO) [6]. Of all the methods available, the Cyanmethemoglobin method is considered the most accurate and is also recommended by WHO. The Cyanmethemoglobin method cannot be the regular method for estimation of Hb in blood, as it calls for the handling of hazardous chemicals such as Cyanide, a skilled technician and some sophisticated equipment. Researchers at the Indian Institute of Health and Family Welfare, Hyderabad, and AIIMS, New Delhi, India observe that there is a difference in Hb levels of 2.08 mg/dl between the cyanmethemoglobin method and the Hemocue method [7,8]. This can translate into an error of approximately 15% to 20%, as compared to the reading by the Cyanmethemoglobin method.

Since most of the clinics and diagnostic laboratories use one of the methods other than the Cyanmethemoglobin or Hemocue methods, the readings obtained have a tendency to give an incorrect estimate of Hb in blood, resulting in a false sense of safety leading to complications, specifically for women in pregnancy. The readings will have a variance from the standard method beyond 15% to 20%. Most rural settings cannot fulfill the stringent requirements of the standard methods, and hence there is a need for a simple, objective, accurate and easily implementable method for the analysis of Hb in blood that is still cost effective. The method based on ANN provides a possible breakthrough.

Blood consists of cellular material (99% red blood cells, with white blood cells and platelets making up the remainder), water, amino acids, proteins, carbohydrates, lipids, hormones, vitamins, electrolytes, dissolved gases, and cellular wastes. Each red blood cell has about 33% hemoglobin by volume. Plasma is about 92% water, with plasma proteins as the most abundant solutes. The main plasma protein groups are albumins (~60%), globulins (~36%), and fibrinogens (~4%). The primary blood gases are oxygen, carbon dioxide, and nitrogen. The plasma nutrients include amino acids, simple sugars (e.g., glucose), lipids (e.g., triglycerides, phospholipids, cholesterol), and nucleotides. Cellular wastes include nonprotein nitrogenous substances, such as urea, uric acid, creatine, creatinine, and amino acids. The plasma electrolytes include sodium, potassium, magnesium, calcium, bicarbonate, chloride, sulfate, and phosphate ions. Blood is slightly denser and approximately 3-4 times more viscous than water. Blood consists of cells that are suspended in a liquid. As with other suspensions, the components of blood can be separated by filtration; however, the most common method of separating blood is to centrifuge (spin) it. Three layers are visible in centrifuged blood. The straw-colored liquid portion, called plasma, forms at the top (~55%). A thin cream-colored layer, called the buffy coat, forms below the plasma. The buffy coat consists of white blood cells and platelets. The red blood cells form the heavy bottom portion of the separated mixture (~45%) [9,10]. So, it can be deduced that the color of blood can give an accurate estimation of the Hb available in it.

It is a regular practice to observe the color of blood using direct vision or through the usage of some chemical substances for the estimation of Hb in blood. These methods use human perception as the means for the matching of colors. So,
the color of blood can be a factor for estimation of Hb. Our attempt is to replace the subjective human perception of color with non-subjective machine vision. So, it is decided to use a method to transfer the color information of each blood sample to a digital image file.

COLOR INFORMATION FROM DIGITAL IMAGE

Before we proceed, it may be prudent to consider the basics of a digital image. The eye's receptors are sensitive to the Red, Green and Blue colors, and so other colors are perceived by adjusting the combination of these additive primary colors. That is, the orange color perceived through vision is a combination of the green and red lights from the computer monitor [11]. There are a number of methods by which colors are defined in a computer frame. The methods are RGB, HSB, HSL, CMYK and CIE. It has been decided to specify the color by means of the RGB format in this study.

In the RGB format, a digital image frame consists of a number of pixels, say 640 x 480. That is, the frame will have 640 pixels horizontally and 480 pixels vertically, making 307200 pixels in all in the frame. There could be better resolutions, such as 1024 x 768 and so on. In each pixel, the color information is defined in terms of the combination of the color information pertaining to the red, green and blue colors in that pixel. The color level of the pixel for each particular color is defined as one of 256 classified color levels. These 0 to 255 levels are indicated by an 8-bit digital word [12]. In a picture with non-uniform color content, these values keep varying from one pixel to the next. If there is some uniform information available in each frame, the color values of adjacent pixels will not vary very much. It is possible, in that case, to define the color of the frame in terms of the average value of each of the colors.

HEMOGLOBIN AND COLOR OF BLOOD

It is found that the color of blood is an indication of the level of Hb in one's blood. In this study an attempt is made to get the relationship that exists between the color of blood as measured by digital means and the Hb level. The Hb value of a person depends on the number of Red Blood Cells (RBCs) available in one's blood. Red blood cells occupy 45% of the blood volume. RBCs have the form of a bi-concave disc with a mean diameter of about 7.5 µm and a thickness of about 1.7 µm [13]. The mean surface area of each RBC is about 154 µm², and the mean volume of each RBC is about 90 µm³. Compared to this size of RBCs, other cells have negligible size. So, when blood is smeared on a glass slide homogeneously, the RBCs spread uniformly on the surface.

Since the smear thickness is a small fraction of a mm, the RBCs occupy the major portion of the smear. Plasma is about 92% water and so is expected not to contribute to the color of the slide. When a photograph is taken by a digital camera, as the smear thickness is only a fraction of a mm, the light from the flash of the camera is expected to penetrate the complete thickness of the slide and get reflected. The glass is transparent, and at the bottom portion of the glass slide a white surface is used. So, the resultant color recorded in the camera is a true representation of the color of the blood sample. If the slide is not uniform, the digital photograph is not smooth. The smoothness is decided by the mean and standard deviation of the colors, as explained later. The transmission, absorption and reflection characteristics of Hb products are not linear functions over the complete range of wavelengths. So, a linear, straightforward relationship between the color components of blood and the Hb value is not to be expected, and hence simpler mathematical or AI methods will not be able to establish the existing relationship clearly. An ANN method provides the required solution to this situation.

III. MULTIPLE REGRESSION METHOD

The correlation between the color of the blood and the Hb value is tested using the multiple regression method. The color of the blood is derived from a non-invasive technique. In this case, a digital photograph of the fingertip is taken before and after occlusion of the blood flow into the fingertip. The photographs are found to be uniform, and it is possible to get the R, G and B values of the digital photographs. It is found that the color of the fingertip is darkened after occlusion, due to the welling of blood in the fingertip. There is a perceivable difference in the readings for the R, G and B values from the photographs of the fingertip taken before and after occlusion of blood flow. Sample photographs of a fingertip before and after occlusion of blood flow are shown in fig 1.

Fig 1 Photograph of a fingertip of a person before and after occlusion of blood flow. Left picture is before and right picture is after occlusion of blood flow.

In this study, readings are taken from 200 people. The R, G and B values are listed from the photographs taken before and after occlusion of blood flow. The difference in the R, G and B readings gives an indication of the blood color in terms of R, G and B differences, taking care of any effect due to various skin colors. The readings are taken for 200 volunteers. Multiple regression software by NCSS is used for establishing any relationship that exists between the blood color and the Hb value. The regression gives the relationship as follows:

Hb = Nr / Dr, where
Nr = 11.5 − 3.7R − 1.4G − 0.1B + 0.08R² + 0.03RG + 0.02G² + 0.1RB + 0.04GB + 0.01B²
Dr = 1 − 0.3R − 0.2G + 0.02B + 0.07R² + 0.004RG + 0.003G² + 0.02RB + 0.005GB − 0.0005B²

In this, Hb is the Hb value for the person, and the R, G and B variables are the differences in the R, G and B readings before and after occlusion of blood flow to the fingertip. It can be seen that the values of the R, G and B readings before occlusion are more than the readings that are taken after occlusion.
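For illustration, the regression relation above can be evaluated directly; the following is a minimal sketch (the function name is ours, not from the paper).

    def estimate_hb(dR, dG, dB):
        """Estimate Hb (mg/dl) from the differences in mean R, G, B
        readings before and after occlusion, using the regression
        coefficients reported above."""
        Nr = (11.5 - 3.7*dR - 1.4*dG - 0.1*dB
              + 0.08*dR**2 + 0.03*dR*dG + 0.02*dG**2
              + 0.1*dR*dB + 0.04*dG*dB + 0.01*dB**2)
        Dr = (1 - 0.3*dR - 0.2*dG + 0.02*dB
              + 0.07*dR**2 + 0.004*dR*dG + 0.003*dG**2
              + 0.02*dR*dB + 0.005*dG*dB - 0.0005*dB**2)
        return Nr / Dr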
The results indicate that there is a close correlation between the difference in the R, G and B values before and after occlusion and the Hb value of the blood of the person. However, perfect prediction is not possible. Usually, there will be a number of observed variation points around the predicted regression line. The deviation of a particular point from the regression line is called the residual value (X). The Coefficient of Determination (R²) is defined as the value (1 − X). It indicates the extent of perfection of the fit. It is found that in this case the multivariate regression relation obtained is very close. The R² value is 0.71, which shows that the relationship derived explains about 71% of the cases in the actual data. The regression expression gives the best estimation for the predicted value (Hb) in terms of the independent variables (the R, G, B values).

IV. ANN METHOD

Having established that there is a possibility of a relationship between the color of the blood and the Hb value, it is decided to use an ANN to arrive at a better mapping. There are a number of architectures available for verification of the relationship. A feed-forward network with the BPN learning algorithm, and the ART1 and ART2 networks, offer very good alternatives. BPN and ART2 networks can accept data in analog form, whereas ART1 expects data in digital form. Since this is a first attempt, it is decided to use the BPN algorithm.

For better estimation, an ANN method is studied. In the present study, 2007 samples of blood are taken from a cross-section of people – young and old, males and females, people with anemia and those who are normal, and people with different levels of anemia, namely mild, moderate and severe. Samples were collected with Hb values from as low as 4.4 mg/dl up to an Hb value of 16.5 mg/dl. The Hb levels are measured using the standard Cyanmethemoglobin method for these samples, to maintain a high level of accuracy. The same sample of blood is then smeared on a glass plate to prepare a slide. This smeared blood slide is then photographed using a high-resolution digital camera to obtain the digitized information of the pixels for processing. The camera has a resolution of 1280 x 1024 pixels for each frame and 24-bit color information for each pixel. The slides are digitally processed and the digital images are transferred to different files. The images so transferred are processed using software to identify the color of the uniform smear.

This program provides the R, G, B values of each pixel, relating to the Red, Green and Blue levels respectively. These R, G, B values determine the color information of each pixel, representing the color of the blood. After identifying the area of uniform smear and the R, G, B values, the mean values of the R, G, B readings and the corresponding standard deviations are calculated. A few color slides of samples with their corresponding Hb values are shown in Figure 2.

The slides with readings of significant value for the standard deviation as compared to the mean value are considered to be non-uniform. Such slides are discarded from consideration. Thus, the color information in respect of each of the slides is identified. All the information is then tabulated. Table 1 contains the readings corresponding to 9 of the slides prepared for the analysis.

Fig. 2: Slides of various samples with different Hb values in mg/dl. Individual Hb values are given in the figure. One can note that the area of smear is not uniform at some places.

The BPN architecture used consists of five layers, as shown in figure 3, with one input layer, 3 hidden layers and one output layer. Three neurons in the input layer correspond to the three colors, R, G, and B. A single neuron in the output layer provides the Hb level in the blood. The number of hidden layers and the number of neurons in each hidden layer vary based on the accuracy of the results required and the acceptable time for computations.

Table No 1
Representative samples for estimation of Hb using color information.

Slide No   Hb     Red   Green   Blue
97         10.9   201   162     128
101        10.8   197   135     102
103        11.0   194   153     119
107        8.4    206   162     126
109        10.7   162   96      78
113        6.4    206   187     168
117        11.9   196   163     143
123        13.3   217   146     133
130        13.4   195   137     87

The values are normalized by dividing the color values by 255 and the Hb value by 20. These values are then tabulated. The normalized R, G, B values obtained from the slides prepared, together with the normalized Hb value obtained from the standard Cyanmethemoglobin method, form the training pairs used to train the ANN. Out of the 2007 samples collected, 1500 samples are used to train the network and the remaining 507 samples are used to verify the mapping capability of the network. Training has been carried out for various BPN architectures, such as:
1. One hidden layer with 4, 5 or 6 neurons
2. Two hidden layers, with combinations such as 2, 4 or 5 neurons in the first hidden layer and 5, 6, 7 or 9 neurons in the second hidden layer
3. Three hidden layers, with 2, 4 or 5 neurons in the first layer, 3, 5, 6 or 7 neurons in the second layer and 2, 4 or 5 neurons in the third layer.
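As a concrete illustration of the normalization and of the five-layer structure described above, here is a minimal sketch (ours, not the authors' code; the paper does not state the activation function, so a sigmoid is assumed, and only the forward pass is shown):

    import numpy as np

    # Normalization as described above: color values divided by 255, Hb by 20.
    def make_training_pair(r, g, b, hb):
        return np.array([r, g, b]) / 255.0, hb / 20.0

    # The 3-5-5-5-1 feed-forward structure (input layer, three hidden
    # layers of five neurons, one output neuron); weights start arbitrary.
    LAYERS = [3, 5, 5, 5, 1]
    rng = np.random.default_rng(0)
    weights = [rng.uniform(-0.5, 0.5, (m, n))
               for m, n in zip(LAYERS[:-1], LAYERS[1:])]

    def forward(x):
        for W in weights:
            x = 1.0 / (1.0 + np.exp(-(x @ W)))  # assumed sigmoid activation
        return x  # normalized Hb estimate; multiply by 20 for mg/dl

    # Example: the first row of Table 1 as a training pair.
    x, t = make_training_pair(201, 162, 128, 10.9)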


[Fig. 3 appears here: the BPN structure used in this problem – an input layer with neurons R, G, B; three hidden layers of five neurons each; and a single output-layer neuron giving Hb.]

Fig. 3 BPN structure used in this problem

V. RESULTS

It is found that the configuration with 3 hidden layers containing 5 neurons each offers the best results. In the other combinations, the training does not converge to an acceptable level of error rate. It is verified that with more than three hidden layers and more neurons per layer, the results do not offer more attractive accuracies but take more time to train the network. The network is trained with a fixed training rate of 0.4 and for accuracies of 20% and 15%. Accuracies below 15% demand more computing time; the computing time is about 20 hours to realize an accuracy of 5%. The lower the noise content of the samples (due to non-uniform smear), the better the accuracy of the result.

It is observed from the sample values that the red content in blood is the maximum, followed by the Green and Blue colors. It is also seen that the Hb levels as given by the BPN method are within 5% of the values indicated by the standard Cyanmethemoglobin method. Comparisons in respect of ten samples are given in table 2.

This confirms the choice of the BPN technique proposed in this paper for estimation of Hb in blood. It is expected that the smearing of the blood over the slide may not be uniform, possibly affecting the end result marginally. Slides with significant non-uniformity are already eliminated by computing the standard deviation. Hence the resultant training inputs are capable of providing the training to the network while taking into consideration the insignificant non-uniformity in smearing.

On the other hand, the possible human error due to subjective estimation of color is totally eliminated by the use of the camera. To establish the consistency of the results, images of slides of the same blood sample are created by taking the photograph at different angles, under different lighting conditions and from different distances. It is found that the photograph must be taken immediately (within a few minutes) after the smear of blood for good results. It is also found that the photograph must be taken at an angle of 90° with the plane of the slide.

Table No 2
Comparison of results for 10 samples. The Hb values (mg/dl) are estimated by the standard and BPN methods.

S.No   Red    Green   Blue   Hb by Cyanmethemoglobin method   Hb by BPN method
1      0.51   0.41    0.30   11.6                             11.4
2      0.54   0.48    0.37   11.0                             11.0
3      0.54   0.49    0.38   10.8                             10.8
4      0.53   0.46    0.37   11.4                             11.2
5      0.52   0.49    0.33   9.8                              9.2
6      0.49   0.41    0.35   10.2                             10.2
7      0.55   0.52    0.38   10.2                             10.4
8      0.58   0.52    0.38   9.8                              9.6
9      0.44   0.26    0.14   12.0                             12.4
10     0.47   0.30    0.14   12.0                             12.2

The results are compared with the standard Cyanmethemoglobin method for evaluation of Hb in blood, as it has been indicated in much of the literature that there is a large difference between the results obtained using the standard method and other methods. The other methods are practiced very widely in many rural and low-resource settings, as they are the only affordable methods available to many poor rural folk. As per the report by the National Institute of Nutrition [7], the best method other than the cyanmethemoglobin method, namely the Hemocue method, has variations in the range of 2 mg/dl of Hb reading.

The Bland Altman plot for the comparison of results between the Cyanmethemoglobin method and the ANN method is given in fig. 4. The Bland Altman plot is deemed simple both to do and to interpret. It became available only in recent years and is considered to be a substitute for correlation and regression analyses [14]. The X-axis shows the mean of the results from the two methods and the Y-axis represents the difference between the results from the two methods.

[Fig 4 appears here: a scatter plot with the Hb values in mg/dl (mean of the two methods, 0 to 20) on the X-axis and the difference between the two readings in % (−15 to +20) on the Y-axis.]

Fig 4. Bland Altman plot for comparison of the results of measuring Hb using the ANN method and the Cyanmethemoglobin method.

In our Bland Altman plot, the mean of the values estimated by the two methods is taken on the X-axis and the difference between them is plotted on the Y-axis. We have chosen to plot the difference in percentage on the Y-axis as
against the actual difference, because the difference at lower values of Hb may be small but may still constitute a significant percentage. In such a case, the comparison is easier and the read-out is straightforward. We have plotted the difference for all values, and it can be seen that there is a cloud in the −5% to +5% region between Hb values of 5 mg/dl and 15 mg/dl.
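The percentage Bland Altman comparison just described can be computed directly; a minimal sketch (ours), assuming paired readings from the two methods:

    import numpy as np
    import matplotlib.pyplot as plt

    def bland_altman_percent(hb_std, hb_ann):
        """Mean of the two readings on X, percentage difference on Y,
        as in Fig 4 (hb_std: Cyanmethemoglobin, hb_ann: BPN estimates)."""
        hb_std = np.asarray(hb_std, dtype=float)
        hb_ann = np.asarray(hb_ann, dtype=float)
        mean = (hb_std + hb_ann) / 2.0
        diff_pct = 100.0 * (hb_ann - hb_std) / mean
        plt.scatter(mean, diff_pct)
        plt.axhline(0.0, linewidth=0.5)
        plt.xlabel("Hb values in mg/dl (mean of the two methods)")
        plt.ylabel("Difference between the two readings in %")
        plt.show()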
VI. CONCLUSION

In these studies, it has been established that there is a strong correlation between the R, G and B components of the digital photograph of one's blood and the Hb value. For better accuracy, an ANN method is tried. In this ANN method, the color of the blood is determined by taking a digital photograph of the blood after making a smooth smear of it. In the procedure adopted for preparing the blood smears to evaluate the Hb value, there is a clear need for standardizing the whole process in order to find the precise value of the Hb content in blood. Once that is done, it is easier to diagnose the status of the patients and finally to offer them the correct remedial advice. The results of the BPN method proposed in this paper to estimate Hb in blood are within 5% of the results of the standard but complex cyanmethemoglobin method in 85% of the samples tested. By standardizing the procedure of smearing, the angle of taking the photograph, the distance between the camera and the slide, the ambient light and the time of taking the photograph after preparing the slide, it is not only possible to establish a closer correlation between the color of the blood sample and the Hb level, but also to consider the ANN method as a future substitute for the standard cyanmethemoglobin method. Thus it is concluded that the BPN method is a simple method for establishing a complex relationship between the color of blood and its Hb level. This can be used to establish a simple and cost-effective method to measure the Hb level in patients, specifically in low-resource rural settings. The results show an interesting and effective approach to the estimation of Hb in blood.

REFERENCES

[1] Artificial Neural Networks in Medicine [Online]. Available: http://www.emsl.pnl.gov:2080/docs/cie/techbrief/NN.techbrief.html
[2] Electronic Noses for Telemedicine [Online]. Available: http://www.emsl.pnl.gov:2080/docs/cie/neural/papers2/keller.ccc95.abs.html
[3] Pattern Recognition of Pathology Images [Online]. Available: http://kopernik-eth.npac.syr.edu:1200/Task4/pattern.html
[4] "Focus on Young Adults", report after a 6-year program funded by USAID and led by Path Finder International. Available: http://www.pathfinder.org/focus.htm
[5] Dr. Monika Malhotra, "Severe Anemia linked to poor outcomes for pregnant women and their babies", International Journal of Gynecology and Obstetrics, 2002, 79, pp 93–100.
[6] Anemia detection methods in low-resource settings: a manual for health workers, December 1997. Available: http://www.path.org
[7] GV Ramana Rao, "A Comparative study on prevalence of Anemia in women by Cyanmethemoglobin and Hemocue methods", in Proceedings of the IX Asian Congress of Nutrition, New Delhi, 2003, Abstract pp 98.
[8] S K Kapoor, Umesh Kapil, Sadanand Dwivedi, K Anand, Priyali Pathak, Preeti Singh, "Comparison of Hemocue method and Cyanmethemoglobin method for estimation of Hemoglobin", Indian Pediatrics 2002; 39:743–746.
[9] Hole's Human Anatomy & Physiology, 9th Edition, McGraw Hill, 2002.
[10] S Ramakrishnan, K G Prasannan and R Rajan, Textbook of Medical Biochemistry, Orient Longman, 1990, ISBN 81 250 0764 4.
[11] Tay Vaughan, "Multimedia – Making It Work", Tata McGraw Hill, 4th ed., 1998, ISBN 0-07-463953-6.
[12] Fred Halsall, "Multimedia Communications", Pearson Education, 2001, ISBN 81-7808-532-1.
[13] R S Khandpur, "Handbook of Biomedical Instrumentation", Tata McGraw-Hill Publishing Company Limited, New Delhi, 1987, ISBN 0-07-451725-2.
[14] Katy Dewitte, Colette Fierens, Dietmar Stockl and Linda M Thienpont, "Application of the Bland–Altman Plot for Interpretation of Method-Comparison Studies: A Critical Investigation of its Practice", Journal of Clinical Chemistry, 48, No 5, pp 799–801, 2002.

H. Ranganathan, India, received his BE in Electronics and Communication Engineering and his M.Sc (Engg) in Communication Systems Engineering from the College of Engineering, Guindy, Madras 600 025, India, in the years 1975 and 1978 respectively. He is currently pursuing a PhD at Anna University, Chennai, India. He worked in the installation and maintenance of various computers with different configurations from 1978 until 2000, and developed hardware and software solutions for various field problems with many customers in India. He has been teaching since November 2000. Currently he is Professor and Head, Department of Electronics and Communication Engineering, Sakthi Mariamman Engineering College, Thandalam, 602 105 (near Chennai), India.

N. Gunasekaran, India, (M'78) received his Masters in Engineering in the year 1974 and his PhD in the year 1985 from the University of Madras, in the area of Microwave Engineering. His current fields of interest include Microwave Engineering, Optical Communication, Antennas, Electromagnetic Fields, Satellite Communication, and Artificial Neural Networks. He joined the teaching profession in the year 1974. Currently he is Professor, Department of Electronics and Communication Engineering, College of Engineering, Guindy, Anna University, Chennai, 600 025, India. He has over 60 papers to his credit, published in National / International Workshops, Journals and Symposia. He was involved in many projects sponsored by State and National / International funding agencies, and has guided over 100 UG / PG projects. He is presently one of the task team members in a project involving the design, fabrication, testing and launching of a micro satellite in collaboration with ISRO, Bangalore, India. He is also presently involved in the design of a seamless communication network over the coastal area of Tamilnadu, India. Dr. N. Gunasekaran is a member of IEEE, a life member of ISTE, a Fellow of IETE, and a life member of IE, OSI and BES.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 644--649
Copyright@2007 Watam Press

One-class/Two-class Training for Windows NT User Profiling


Li Ling
Electrical Engineering Department, New Jersey Institute of Technology, NJ 07102, USA
C.N. Manikopoulos
Electrical Engineering Department, New Jersey Institute of Technology, NJ 07102, USA

Abstract—Previous research has mainly studied UNIX system acknowledged challenge in the area of user profiling is how to
command line users, while here we investigate Windows system accurately model a user’s behavior while it changes
users, utilizing real network data. This work primarily focuses constantly.
on one-class Neural Network Classifier and Support Vector Constructing effective user profiles is a challenging
Machines masquerade detection. The one-class approach offers
significant ease of management of the roster of users: the addition of new users or the deletion of legacy ones requires much less effort than in the multi-class case. A two-class study has also been carried out for comparison. Both receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) have been evaluated to compare the performance of detecting different masqueraders from different legitimate users. For Neural Network (NN) two-class training, the best performance is a hit rate of 90% with a false alarm rate of 10%. For Support Vector Machines (SVM), two-class training shows that about a 63% hit rate can be reached with a low false alarm rate (about 3.7%). One-class SVM training shows a detection rate of about 66.7% with a false alarm rate of about 22%. Even though the one-class training approach sacrifices some performance in false alarms, the gains in ease of roster management and the reduction in required training data may be more desirable in some practical environments.

I. INTRODUCTION

Masqueraders are individuals who impersonate other users on computer networks and systems. They could be intruders from outside the network who steal other users' passwords to gain access to a super-user account. More likely, they could be insiders, such as disgruntled employees with malicious intentions, attempting to gain additional privileges. From the system's point of view, many operations executed by an insider masquerader may be technically legal and thus undetectable by existing access control or authentication mechanisms. To detect this kind of attack, the most promising avenue is to track and analyze the operations a legitimate user executes. That will enable us to build a long-term historical profile of the user's activity. This, in turn, will allow us to compare the user's recent behavior against that long-term historical profile. Finding a significant deviation in this comparison might indicate that a masquerader is at work.

User profiling is an important technique for detecting an insider's misuse of critical information systems. A "user profile" contains information that characterizes a user's computer usage behavior. Profiling users can help security officers identify activities that deviate from a user's normal patterns and distinguish one user from another. A widely [...] problem. Compared with program applications or system performance, human behavior appears erratic and thus hard to profile. It has an extremely broad range of "normal" and can be highly unpredictable. Moreover, insider misuse may look "normal" if the masquerader can mimic the user's behavior successfully. In addition, the legal users themselves may sometimes behave differently from their trained profile, because of a change in their mission, for example, which may cause a false alarm.

Another difficulty in detecting masquerades is the unavailability of many proven techniques, especially for Windows systems, due in large part to the lack of relevant experimental data that faithfully represent sessions in the Windows environment. Obtaining real data, particularly of attack attempts, is difficult for a number of reasons, including concerns about privacy and business aspects, the large and time-consuming collection effort involved, the need for tools for managing such efforts, etc. In light of this, the data used in this work have been made available to the public at one of the authors' websites [1], in the hope that this will enable more researchers to conduct relevant work.

In generating the data analyzed here, other approaches have been employed. In particular, it has been found that the Windows process table provides a unique and rich perspective on a user's behavior. A tool has been employed that queries the Windows NT process table periodically and collects all the process information associated with each user [5]. Processes, in turn, are mapped to various user modes representing the different applications that the user runs [5].

The user interface of Windows is quite different from that of UNIX (for example, it relies primarily on mouse clicks rather than typed commands). In the past, much work has been done on profiling users in the UNIX operating system environment (for example, [2][3][4]). In particular, short sequences of UNIX shell commands have been used to characterize a user's behavior with modest success [2][3][4]. Many UNIX user profiling techniques do not directly apply to a Windows system, so investigating new techniques for Windows NT, another important operating system, is crucial.

As discussed in [5], collecting real data describing legitimate users is much easier than getting data that describe
masquerading attacks. For example, when using the "Schonlau dataset" [2], the authors adopt the following approach: each user's command lines are divided into 150 blocks of 100 command lines. They randomly select 50 users to serve as intrusion targets, while the remaining users serve as masqueraders, their blocks interspersed amongst the normal users' data. In this paper, we follow this approach and simulate illegitimate sessions by attributing them to an incorrect data source.

The data we investigate were collected in a live Windows NT environment over about a year. Because our purpose here is detecting deviations from a user's profile, we are only interested in whether a new session does or does not belong to the real user (a "positive" or "negative"); we need not know how exactly the masquerade attack behaves or what kind of masquerade attack it is. Therefore, what we need is a binary classifier that outputs "normal" or "abnormal" rather than a clustering of masquerades. We apply both a Neural Network classifier and Support Vector Machines to decide "self" versus "non-self" after training with "negative" records from legitimate user sessions and "positive" records from masqueraders, the latter playing a role similar to attack "signatures" in intrusion detection systems.

In this paper, we focus on a different approach with an important practical advantage. We train a user profile by modeling the user's own data exclusively, without using examples from other users, and achieve good detection performance using a one-class SVM, at the cost of a somewhat higher false positive rate. This is the so-called "one-class" training approach. Instead of using both positive and negative examples to build self and non-self profiles, we use one-class training to build a "self" profile from legitimate sessions only and declare "masquerader" when a significant deviation occurs. This "self" profile idea is similar to the widely used "anomaly detection" techniques in intrusion detection systems [6][7][8][9][10]. We find that this one-class approach achieves satisfactory detection performance, with only a small penalty in higher false positive rates, compared to multi-class methods. The important advantage of the one-class approach is that it makes the management of users in the system almost effortless: a new user can be added, or an existing one removed, by just adding or removing that user's own profile, without affecting anyone else's; by contrast, in multi-class approaches the whole system would need to be re-trained, a process that is laborious and time-consuming.

The rest of the paper is organized as follows. Section II reviews related work. Section III describes the Windows NT user dataset and features [5]. Section IV discusses the one-class SVM method. Section V describes the experimental setting and our results, and Section VI concludes the paper with our findings and future work.

II. RELATED WORKS

Schonlau and his colleagues [2] used keyboard command data from 50 users, injected with data from users outside the community of 50. Data for each user comprised 15,000 commands. The first 5,000 commands constituted ground truth (i.e., contained no injected data), and the last 10,000 commands were probabilistically injected with commands issued by another user. The idea was to detect blocks of 100 commands typed by the "masquerader," discriminating them from blocks of 100 commands typed by the true user.

Maxion and his colleagues [4] used the "Schonlau dataset" described above and extended Schonlau's work. First, another experiment configuration is presented in [4]. Instead of randomly injecting data from users outside the community of 50 (the SEA configuration described in [2]), Maxion constructed an experiment where each user is crossed with every other user, to compare the effects of every user acting as a "masquerader" against all other users (the 1v49 experiment). The advantages of this experiment are: (1) it provides a consistent set of intrusions against all users, which allows meaningful error analysis, and (2) it samples a greater range of intrusive behavior (2,450 sequences from 49 masqueraders per victim, as opposed to 0-24 sequences from a maximum of three different masqueraders per victim under the SEA regime).

A new detection algorithm, inspired by Naive Bayes text classification, was also used in [4]. This algorithm was chosen because the self/nonself classification task is similar to the generalized task of text classification, and the Naive Bayes classifier has a history of good performance in that context. The results show that the Naive Bayes method with updating, using a threshold extracted from the training data by cross validation, obtained a false alarm rate of 1.3% with a concomitant hit rate of 61.5%.

The above two works describe UNIX user profiling data sets, experiments and techniques. Goldring designed a tool that queries the Windows NT process table periodically and collects all the process information associated with each user [1][5]. Processes, in turn, are mapped to various user modes representing the different applications the user runs. Local ordering of user modes has been used to model a user's behavior, where each user profile contains a large number of observed short sequences (n-grams) of user modes, and statistical or symbolic learning methods are employed to classify a new sequence as normal or abnormal [5].

III. WINDOWS NT USER DATASET

The data analyzed here were collected on an internet-connected Windows NT network and represent "real" network interactions on an actual network. As mentioned, the data are provided in [1]. The original raw data were reformatted so that the background operating system behavior (which for the present purpose can be thought of as noise) can be filtered out [5].

Each file represents a single session, i.e. the user's interactions from login to logout. In particular, what appears in any given session is a temporal stream of window titles and process table information. In order to protect the privacy of the users so that the data could be made publicly available,
many of the words in the window titles were "sanitized" by mapping them into a meaningless form.

Each session contains three types of records: Window, Process, and Ancestry.

1) Window
A line that begins with a left parenthesis. The fields are:
1. line number in original raw data
2. delta t (seconds) since login
3. name of process
4. pid = the actual process ID
5. the window title as it appears in the title bar

2) Process
A line that begins with two spaces represents process table information. The fields are:
1. line number in original raw data
2. delta t (seconds) since login
3. name of process
4. a, b, c, or d, corresponding to background, birth, continuation or death
5. total cpu time accrued by process
6. ancestry information

3) Ancestry
This field lists those process records whose pid matches that of the current window.

From this dataset, we construct our feature vectors before feeding them into the SVM. For each session (from login to logout), we construct one feature vector. For example, features we can use include:
• Log (# of total windows opened)
• Log (elapsed time between login and logout)
• Log (total cpu time)
• Log (# of process lines)
or, presented as stream features:
• Log (time inactive for any interval > 1 minute)
• Log (elapsed time between new windows)
• Log (1 + # of windows opened), whenever it changes
• Log (1 + # of process lines between successive windows)
• Total cpu time used per window
…
We apply the log operation to each feature because the original features may have a great range (for example, the elapsed time between login and logout ranges from several seconds to over 10,000). Taking logs reduces the range and provides better scaling before the data are fed into the SVM and NN, which results in better classification performance.
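As a concrete illustration of this feature construction, the following minimal sketch (in Python) builds one log-scaled vector per session. The record shapes and the helper name are our own simplification for illustration, not the exact field layout of the dataset in [1]:

    import math

    def session_features(windows, processes, login_t, logout_t):
        # windows:   list of (delta_t, pid, title) tuples for Window records
        # processes: list of (delta_t, name, status, cpu_time) tuples for
        #            Process records -- hypothetical, simplified shapes
        elapsed = max(logout_t - login_t, 1.0)
        total_cpu = sum(p[3] for p in processes)
        return [
            math.log(1 + len(windows)),     # log(# of total windows opened)
            math.log(elapsed),              # log(elapsed login-to-logout time)
            math.log(1 + total_cpu),        # log(total cpu time)
            math.log(1 + len(processes)),   # log(# of process lines)
        ]                                    # 1 + x guards against log(0)

    # e.g. a toy session with two windows and three process lines:
    print(session_features([(1, 101, "w1"), (5, 102, "w2")],
                           [(1, "p", "b", 0.2), (2, "p", "c", 0.5),
                            (9, "q", "d", 1.1)],
                           0, 3600))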
IV. ONE-CLASS SUPPORT VECTOR MACHINES

Support Vector Machines (SVM) [11] have been shown to be highly effective in text classification [12], among other important learning tasks. They are maximal-margin classifiers. In the two-class formulation, the basic idea is to map feature vectors into a high-dimensional space and to compute a hyperplane that not only separates the training vectors of different classes, but also maximizes this separation by making the margin as large as possible.

Scholkopf et al. [13] proposed a method to adapt the SVM algorithm to one-class SVM, which uses examples from only one class, instead of multiple classes, for training. The one-class SVM algorithm first maps the input data into a high-dimensional feature space via a kernel function and treats the origin as the only example from the other class. It then iteratively finds the maximal-margin hyperplane that best separates the training data from the origin.

Given a training data set x_1, x_2, ..., x_l ∈ X and a feature mapping Φ from X to a high-dimensional space, we can define the kernel function as

    k(x, y) = Φ(x) · Φ(y)

Using kernel functions, the feature vectors need not be computed explicitly, which greatly improves computational efficiency since we can compute the kernel values directly and operate on their images. Some common kernels are the linear, polynomial, and radial basis function (rbf) kernels:

    Linear kernel:               k(x, y) = x · y
    p-th order polynomial kernel: k(x, y) = (x · y + 1)^p
    rbf kernel:                  k(x, y) = exp(−‖x − y‖² / 2σ²)

Solving the one-class SVM problem is then equivalent to solving the dual quadratic programming (QP) problem

    min_α (1/2) Σ_{i,j} α_i α_j k(x_i, x_j)
    subject to 0 ≤ α_i ≤ 1/(νl), Σ_i α_i = 1

where α_i is a Lagrange multiplier, which can be thought of as a weight on example x_i, and ν is a parameter that controls the trade-off between maximizing the number of data points contained by the hyperplane and the distance of the hyperplane from the origin.

After solving for the α_i, we can classify data with the decision function

    f(x) = sgn( Σ_i α_i k(x_i, x) − ρ )

where the offset ρ can be recovered from any support vector x_i as

    ρ = Σ_j α_j k(x_j, x_i)

In our work, we used LIBSVM 2.4 [11], available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, for our experiments. LIBSVM is an integrated tool for support vector classification and regression that implements Scholkopf's algorithm for one-class SVM. We used the default rbf kernel and the default values of the parameters for one-class SVM.
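For orientation, the sketch below reproduces this one-class training and decision step using scikit-learn's OneClassSVM, which implements the same ν-formulation of Scholkopf et al. [13]; it is a stand-in for the LIBSVM 2.4 setup actually used here, and the feature vectors are synthetic:

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_self = rng.normal(0.0, 1.0, size=(200, 4))   # legitimate sessions only
    X_new = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),   # more "self"
                       rng.normal(3.0, 1.0, size=(50, 4))])  # shifted = "non-self"

    clf = OneClassSVM(kernel="rbf", nu=0.5, gamma="scale")   # rbf kernel, as above
    clf.fit(X_self)                 # training uses no non-self examples at all

    scores = clf.decision_function(X_new)  # sum_i alpha_i k(x_i, x) - rho
    labels = clf.predict(X_new)            # sign of the score: +1 self, -1 non-self
    print((labels[50:] == -1).mean())      # fraction of shifted sessions flagged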
V. EXPERIMENTS AND RESULTS

As discussed above, it is easy to collect user profile data for legitimate users but hard to obtain data for actual masquerade attacks. Therefore, we simulate "masquerader" data (illegitimate sessions) by attributing them to an incorrect user, i.e. a different user. In previous work, mainly two kinds of experimental method have been applied to investigate masquerade detection: 1) divide the sessions (command lines) of different sources into small blocks and randomly inject blocks from illegitimate users (as masquerade blocks) into target users' sessions, labelling the blocks for training and evaluation [2]; and 2) the alternative "1v49" exhaustive evaluation presented in [4], where the authors take each user in turn as the (legitimate) target while interpreting all other users as "masqueraders".

In our experiments, we follow the "1v49" experiment but make some modifications. We take all activities between "login" and "logout" as one "session" and deal with whole sessions during training and testing. We use one target user's sessions as "negative" training data to compute a one-class SVM without any positive training data. We test with the remaining "negative" sessions and with each other user's sessions as "positives".
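Schematically, this session-level evaluation loop looks as follows; the per-user feature matrices here are random stand-ins for the real session feature vectors of Section III, so the numbers it prints are illustrative only:

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(1)
    users = ["user1", "user4", "user7", "user8",
             "user2", "user3", "user19", "user25"]
    sessions = {u: rng.normal(i, 1.0, size=(60, 4)) for i, u in enumerate(users)}

    for target in ["user1", "user4", "user7", "user8"]:
        train, held_out = sessions[target][:40], sessions[target][40:]
        clf = OneClassSVM(kernel="rbf", nu=0.5, gamma="scale").fit(train)
        fp = (clf.predict(held_out) == -1).mean()       # false alarms on "self"
        hits = {u: (clf.predict(sessions[u]) == -1).mean()  # hit rate per user
                for u in users if u != target}
        print(target, round(float(fp), 2), hits)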
We chose eight users with over 35 sessions each for evaluation. The target users are user1, user4, user7 and user8; user2, user3, user19 and user25 are used for testing only. The numbers of sessions used for training and testing are given in the following table:

Table 1: Number of training/testing sessions
User Id   Training   Testing
User1     200        87
User4     100        34
User7     140        52
User8     120        47
User2     -          54
User3     -          37
User19    -          134
User25    -          99

In experiment 1, we evaluate one-class SVM performance for our purpose; we also present the performance of an NN (specifically, a back-propagation network) using one-class training. In experiment 2, we conduct experiments using two-class SVM and NN for comparison with the one-class performance on masquerade detection. In experiment 3, we note that different users may have different characteristics and occupy a different "space" from other users, which causes different performance even for the same classifier; we therefore conduct experiments with multiple users using two-class SVM, and the ROC score is computed for evaluation.

A. Experiment 1: one-class training

Our purpose here is detecting masqueraders. We are only interested in whether a new session does or does not belong to the real user (a "positive" or "negative"); we need not know how exactly the masquerade attack behaves or what kind of masquerade attack it is. Therefore, what we need is a binary classifier that outputs "normal" or "abnormal" rather than a clustering of masquerades.

In this section, we conduct experiments using one-class training. For each target user (namely, user1, user4, user7 and user8), we train an SVM and an NN using only his legitimate training sessions. One-class training has practical meaning when only a user's own legitimate sessions are available.

1) One-class SVM

We present the one-class SVM results for user1 in Table 2. From Table 2, we can see the hit rate ranges from 53.7% to 78.4% at a false alarm rate of 22.8%. So far, the best reported results using command line sequences in UNIX [4][5] are 60-70% hit rates with a false positive rate of 1-2%. Compared with those, our results have a competitive hit rate (average 66.7%) with a higher false alarm rate (22.8%).

The reason we have a higher false alarm rate is that we use no non-self data at all for training. Without knowing non-self, the decision threshold is blinder, which increases the false alarms, just as anomaly-based intrusion detection tends to have a higher false alarm rate. Considering how much less data collection one-class training needs, this result is encouraging.

Table 2: Results of one-class SVM for User1 (FP = 22.8%)
          Hits (%)
User2     62.9
User3     78.4
User4     78.4
User7     63.2
User8     56.3
User19    53.7
User25    73.7
Average   66.7

2) One-class NN

In Figure 1, we present the ROC curves obtained using the Neural Network classifier; a back-propagation network is used here. From Figure 1 we find that the one-class NN is not effective for our user profiling. The detection rate is from 30% to 50% when the false alarm rate is about 20%. This is much worse than the detection rate of the one-class SVM (53.7-78.4%) at a false positive rate of 22.8%.

Fig. 1: One-Class NN ROC curves. [Figure: detection rate vs. false positive rate of the one-class NN trained on user1, with test curves for user3, user4, user7 and user8.]

B. Experiment 2: two-class training

In this section, we conduct experiments using two-class training, meaning we use both self and non-self data to train the SVM. In practice, we may have some knowledge of masquerade attacks; for example, a masquerader will generally try to obtain a superuser account. This concept is similar to an attack "signature". We take different users as different attack "signatures", use knowledge of each signature during training, and test the performance the classifier achieves for different kinds of attacks (different users here).
1) Two-Class SVM

Table 3 presents the results we obtained using two-class SVM training; user1 is again chosen as the example. From Table 3, we can see that the average false alarm rate is dramatically decreased by knowing "non-self". We have an average hit rate of 62.7% with a false alarm rate of 3.7%. This performance is as good as the best results for masquerade detection using UNIX command line sequences.

Table 3: Results of two-class SVM for User1
          Hits (%)   False positives (%)
User2     33.3       0.0
User3     59.5       0.0
User4     80.6       2.3
User7     86.5       9.2
User8     59.3       6.8
User19    68.7       5.7
User25    50.5       2.3
Average   62.7       3.7

2) Two-Class NN

In Figure 2 and Figure 3, we present the ROC curves obtained using the binary (two-class) training NN classifier. Binary training NN outperforms one-class NN on both false alarm rate and hit rate. The best performance is obtained for user7: 90% detection rate with about 10% false alarm rate. The worst performance is obtained for user2: 60% detection rate with about 10% false alarm rate.

Fig. 2: Two-Class NN ROC curves I. [Figure: detection rate vs. false positive rate of the binary training NN for user1, with test curves for user4, user7 and user25.]

Fig. 3: Two-Class NN ROC curves II. [Figure: detection rate vs. false positive rate of the binary training NN for user1, with test curves for user2, user3, user8 and user19.]

This parallels the two-class SVM (Table 3), which likewise has its worst performance when detecting user2 and its best performance when detecting user7 (86.5% detection rate with 9.2% false alarm rate).

In summary, both binary training algorithms (NN and SVM) outperform one-class training, and both have satisfactory performance for profiling Windows NT users for the purpose of masquerade detection.

C. Experiment 3: ROC scores for different users

In the previous two experiments, we presented the performance of both one-class and two-class training, using user1 as an example. However, the same algorithm might perform quite differently for different users.

The area under the ROC curve (AUC) is commonly used as a summary measure of classification accuracy. It can take values from 0.0 to 1.0. The AUC can be interpreted as the probability that a randomly selected masquerader case (or "event") will be regarded with greater suspicion (in terms of its rating or continuous measurement) than a randomly selected legitimate case (or "non-event"). So, for example, in a study involving rating data, an AUC of 0.84 implies that there is an 84% likelihood that a randomly selected masquerader case will receive a more suspicious (higher) rating than a randomly selected legitimate case. Note that an AUC of 0.50 means that the classification accuracy in question is equivalent to that obtained by flipping a coin (i.e., random chance).
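This probabilistic reading gives a direct, rank-based way to compute the AUC from classifier suspicion scores, as in the minimal sketch below (the scores are illustrative):

    import numpy as np

    def auc(masquerader_scores, legitimate_scores):
        # Probability that a random masquerader session scores higher
        # (more suspicious) than a random legitimate one; ties count 1/2.
        m = np.asarray(masquerader_scores, dtype=float)[:, None]
        l = np.asarray(legitimate_scores, dtype=float)[None, :]
        return ((m > l).sum() + 0.5 * (m == l).sum()) / (m.size * l.size)

    print(auc([0.9, 0.8, 0.7, 0.4], [0.3, 0.5, 0.2]))  # 11/12, about 0.92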
To compare the detection performance for different users, we compute the AUC for each user. In Table 4, we present the AUC for detecting user2, user3, user19 and user25 against the different target users. We use two-class training, and cross validation is used for better parameter selection.

Table 4: AUCs for different targets and masqueraders
       U2      U3      U19     U25     Average
U1     0.818   0.848   0.878   0.817   0.846
U4     0.760   0.969   0.781   0.859   0.853
U7     0.779   0.917   0.985   0.819   0.875
U8     0.616   0.913   0.671   0.739   0.735

From the table above, we can see the AUCs range from 0.6165 to 0.9851 when detecting different masqueraders from different targets. The results show that detecting User2 from User8's profile is the most difficult, while detecting User19 from User7's profile is the most successful. On average, User8 is an easy target while User7 is a hard target.

VI. CONCLUSION

In this paper, we investigate an important and challenging problem: detecting masqueraders in Windows NT systems. We apply one-class SVM and NN algorithms and two-class SVM and NN for profiling Windows NT users for the purpose of masquerade detection. We also use the AUC to compare the performance of detecting different masqueraders. Both two-class training algorithms have satisfactory performance. Binary SVM shows that about a 63% hit rate can be reached with a low false alarm rate (about 3.7%). The best performance of the binary NN, obtained for user7, is a 90% detection rate with about a 10% false alarm rate. The one-class SVM shows a detection rate of about 66.7% with a false alarm rate of about 22%. This is an encouraging result because much less training data is needed; it has practical meaning for user profiling when only a user's own legitimate data are available.

Acknowledgement: this work has been partially supported by a Phase 2 SBIR of the US Army and the NJWINS Center of the State of New Jersey.
REFERENCES
[1] ftp://ftp.njit.edu/pub/manikopo/
[2] M. Schonlau, W. DuMouchel, W.-H. Ju, A. F. Karr, M. Theus, and Y. Vardi, "Computer intrusion: Detecting masquerades", Statistical Science, 16(1):58-74, February 2001.
[3] T. Lane and C. E. Brodley, "Temporal sequence learning and data reduction for anomaly detection", ACM Transactions on Information and System Security, 2(3):295-331, 1999.
[4] R. A. Maxion and T. N. Townsend, "Masquerade detection using truncated command lines", in Proceedings of the International Conference on Dependable Systems & Networks, Washington DC, 23-26 June 2002, pp. 219-228.
[5] T. Goldring, "Recent experiences with user profiling for Windows NT", in Workshop on Statistical and Machine Learning Techniques in Computer Intrusion Detection, Johns Hopkins University, 11-13 June 2002.
[6] C. Manikopoulos and S. Papavassiliou, "Network Intrusion and Fault Detection: A Statistical Anomaly Approach", IEEE Communications Magazine, vol. 40, no. 10, pp. 76-82, October 2002.
[7] J. Li and C. Manikopoulos, "Early Statistical Anomaly Intrusion Detection of DOS Attacks Using MIB Traffic Parameters," in Proceedings of the 4th Annual IEEE SMC Information Assurance Workshop (IAW03), West Point, NY, June 18-20, 2003.
[8] Z. Zhang and C. Manikopoulos, "Investigation of Neural Network Classification of Computer Network Attacks," in Proceedings of the International Conference on Information Technology: Research and Education (ITRE2003), NJIT, Newark, NJ, Aug. 10-13, 2003.
[9] J. Li, S. Xu, C. Manikopoulos, and S. Papavassiliou, "Anomaly Network Intrusion Detection for AD-HOC Mobile Wireless Networks," in Proceedings of the 3rd Annual IEEE SMC Information Assurance Workshop (IAW2002), West Point, NY, June 17-19, 2002.
[10] Z. Zhang and C. Manikopoulos, "Neural Networks in Statistical Anomaly Intrusion Detection," Journal of Neural Network World, vol. 3, pp. 305-316, 2001.
[11] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines", 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[12] T. Joachims, "Text categorization with support vector machines: Learning with many relevant features", in Proceedings of the European Conference on Machine Learning (ECML), pp. 137-142, 1998.
[13] B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution", Technical Report MSR-TR-99-87, Microsoft Research, 1999.
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 650--657
Copyright@2007 Watam Press
Combinatorial Productivity through the Emergence of Categories in Connectionist Networks

Francis C. K. Wong (a), William S-Y Wang (b)
Language Engineering Laboratory, Department of Electronic Engineering, The Chinese University of Hong Kong.
(a) franciswong@cuhk.edu.hk, (b) wsywang@ee.cuhk.edu.hk
Abstract — Combinatorial productivity refers to events or objects where complex entities are composed by combining simple elements in a linear or hierarchical fashion. In such cases, complexity in terms of number arises. In human language, sentences are composed from a lexicon that may be an ever-growing open set. How language learners exposed to just a fraction of the language generalise their knowledge combinatorially to comprehend all possible grammatical sentences is a challenge being tackled in connectionist modelling research. In this study we will (a) provide simulation experiments to show connectionist networks' potential to generalise combinatorially; and (b) explore the plausible mechanism underlying the networks' success.

(Keywords: connectionist cognitive sciences, language processing models, recurrent networks, combinatorial productivity)

I. INTRODUCTION

The notion of combinatorial productivity in the present context refers to the ability to deal with the multiplicative growth of the number of possible combinations of objects with the number of classes of objects to be combined and the sizes of those classes. The idea is best illustrated with the construction of sentences in a language. Suppose we consider that a sentence is formed by a combination of nouns and verbs occupying various syntactic positions. The number of possible sentences in the language is then m^n; this quantity grows exponentially with the number of syntactic positions, n. In the case of simple declarative S-V-O sentences in English, n = 3 (for example, with m = 10 words per class, there are already 10^3 = 1,000 possible sentences). The size of the language also grows polynomially with the size of each class, i.e. the number of nouns and the number of verbs in the language, assuming both to be equal to m for simplicity here.

The complexity that arises poses, on the one hand, a challenge in the area of machine learning, often referred to as "the curse of dimensionality" [3]: the problem of finding enough training examples to cope with such combinatorial complexity with respect to generalization [4]. On the other hand, as recently raised by van der Velde et al. [1], it is an issue yet to be properly addressed in building computational models that account for the combinatorial nature of cognition. The focus of this study is on the latter. We pay particular attention to the linguistic perspective, as we consider that the combinatorial nature of cognition is best expressed in language; more importantly, it addresses the old issue of the learnability of language [5, 6] from a new perspective.

A. Combinatorial Productivity in language modelling

For adult language users it is apparently straightforward that, having mastered a sentence construction, one can comprehend all possible sentences of that type, even though most of the instances have near-zero probability of occurrence in the language input. Adults achieve generalisation from "sparse input" [7] to competence by having mastered the underlying syntactic rules of the language, which govern the lawful and meaningful ways of combining categories of syntactic elements, and the proper assignment of lexical items to syntactic categories. Since rules operate over categories under this traditional framework of analysis, language users by definition possess a high degree of generalisation ability: once a rule is learnt, it immediately applies to every element of the categories that define the rule.

This feature of rules has led some scholars to take rules per se as the mechanism underlying language learning [8, 9]. However, in the present study, we would like to seek ways to explain how rules may be deduced from exemplars in the first place. To be more specific, we try to explain the emergence of higher-order phenomena from lower-level processing that is more grounded in neurological terms, as implemented by artificial neural networks as associations.

We follow up the discussion raised by van der Velde et al. [1, 10, 11] concerning whether connectionist networks, as one of the contemporary cognitive models, exhibit the ability to generalize, i.e. to be productive in a combinatorial sense. We attempt to demonstrate, contrary to van der Velde et al. [1], that networks do exhibit such abilities, and we also probe the question of how networks achieve this. We hypothesise that networks succeed through the emergence of categories, which can be observed by analysing the internal representations they develop during the course of training.

II. CONNECTIONIST LANGUAGE PROCESSING MODELS

The connectionist architecture to be discussed in this study is the simple recurrent network (SRN) model, which was proposed by Elman [12] as a model for processing sequential information. It has evolved mainly as a model for the acquisition of syntax [13-16] and sentence processing
[17-19].

A. The Network architecture

Fig. 1 shows the general architecture of the SRNs commonly employed in the literature. Without the context layer, an SRN is just a layered feedforward network in which every neuron in a layer is connected to every neuron in the layer immediately above it (as shown in Fig. 1, boxed). The context layer provides the network with the ability to process sequential information through the accumulation of the network's hidden layer activation.

Fig. 1. The general architecture of a simple recurrent network employed in connectionist modelling of language processing. [Figure: input layer (i neurons) and context layer (k neurons) feed through weights W1 into a hidden layer (k neurons), which feeds through weights W2 into an output layer (i neurons).] Solid lines denote full connections between layers of neurons, represented as blocks. The arrow with a dotted line denotes the copy-back one-to-one connections. Arrows denote directionality.

Suppose the network is to process a sequence of words (w_1, w_2, …). The lexicon is by convention encoded with a set of orthogonal bit strings, for example an identity matrix of size i, where each column bit string, w_1, …, w_i, codes for a word in the lexicon.

At the first time step, t = 1, the network's input layer activation, in(t) ∈ ℝ^i, is set to w_t, the code of the first word in the sequence. The context layer activation, c(t) ∈ ℝ^k, is set to a null context of some neutral value, [0.5, 0.5, …, 0.5]^T, since w_t is the first word of the sequence. The concatenation of the vectors in(t) and c(t) is then fed to the network and propagated to the hidden layer via the weighted connection W1, a k-by-(i+k) matrix. The hidden layer activation, h(t) ∈ ℝ^k, is given, in matrix notation, by

    h(t) = φ(W1 (in(t) ⊕ c(t)))

where ⊕ denotes concatenation and φ denotes an activation function. In this study the logistic sigmoid function is used, applied elementwise:

    φ(x) = (1 + e^(−x))^(−1),  φ([x_1, x_2, …]) = [φ(x_1), φ(x_2), …]

Similarly, the output of the network, o(t) ∈ ℝ^i, is obtained by propagating h(t) to the output layer:

    o(t) = φ(W2 h(t))

The connection weights, W1 and W2, are modified using the backpropagation algorithm [20, 21] with the objective of minimising the difference between o(t) and some target output, p(t) ∈ ℝ^i, associated with in(t). Very often, a prediction task [12] is used to train SRNs, and hence p(t) is set to w(t+1), the next word in the sequence. This explains why in(t) and o(t) are of the same dimension ℝ^i. More about the prediction task and its use in this study will be discussed in Sections II.B and III. Here we focus on giving a brief summary of the working mechanism of SRNs and providing notation for later analysis.

Recall that the context layer in an SRN provides the basis for the network to process sequential data. At the second time step, t = 2, the context layer activation c(t) is set to h(t−1), the hidden layer activation of the network at the previous time step. As the process continues through time, the context layer keeps track of the accumulated internal activation of the network. This is commonly denoted as a one-to-one copy-back connection between the hidden layer and the context layer, the arrow with a dotted line in Fig. 1.
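A minimal sketch of this forward pass in NumPy, with toy sizes and untrained random weights (the initialisation is ours, chosen only to show the data flow):

    import numpy as np

    def phi(x):                        # logistic sigmoid, applied elementwise
        return 1.0 / (1.0 + np.exp(-x))

    i, k = 20, 40                      # lexicon size and hidden/context size
    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 0.1, size=(k, i + k))  # hidden weights over in(t) (+) c(t)
    W2 = rng.normal(0, 0.1, size=(i, k))      # output weights

    lexicon = np.eye(i)                # orthogonal bit strings, one column per word
    c = np.full(k, 0.5)                # null context at t = 1
    for word in [0, 5, 3]:             # a toy 3-word sequence of word indices
        in_t = lexicon[:, word]
        h = phi(W1 @ np.concatenate([in_t, c]))  # h(t) = phi(W1 (in(t) (+) c(t)))
        o = phi(W2 @ h)                          # o(t): next-word prediction
        c = h                                    # copy-back: c(t+1) = h(t)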
B. Model training and evaluation

As a model for scientific enquiry, the SRN should reflect the hypotheses one takes. In the case of the SRN model for language processing, it fits the emergentist school's [15, 16, 22-26] perspectives on plausible language acquisition mechanisms.

First, the usage-based nature of language learning, as proposed in [25, 26], is reflected by the fact that SRNs (and connectionist networks in general) are statistical learning devices and are trained with positive exemplars alone. Second, minimal assumptions are built into the model: in order to explore the plausibility of the emergence of linguistic ability out of elementary, domain-general operations, SRN models attempt to provide an existence proof that syntax can emerge out of temporal associations between sequences of elements. To achieve the latter, the networks are trained to associate the current word in a sequence, together with the context in which the word appears, with the next word in the sequence, i.e. in(t) ⊕ c(t) is associated with w(t+1).

A network's ability to capture the grammar of the language after training can be evaluated by assessing the grammaticality of the network's output in processing a sentence. Take the processing of a simple declarative S-V-O English sentence as an example. Since the network is trained to associate a word in a particular context with the next word, the network's output at the first time step, when a noun is fed in, is regarded as the network's prediction of the words to follow. Such a prediction is non-deterministic, because virtually all verbs are grammatical continuations of the partial sentence up to this point. Recalling that the lexicon is coded by a set of orthogonal bit strings, the output activation of the SRN is taken to be its estimate of the conditional probability distribution indicating which words may follow. In the literature [1, 27], an error measurement called the Grammatical Prediction Error (GPE) is used to quantify the grammaticality of the network's output, defined as:

    GPE = 1 − (Σ correct activation) / (Σ correct activation + Σ incorrect activation)

Continuing with the example of an SRN fed with a partial
sentence, the sum of the activations of the output neurons coding for words that are grammatically correct continuations constitutes the numerator in calculating the GPE; the second part of the denominator is obtained in a similar fashion. Notice that grammatical SRN prediction requires not just a mere mastery of bi-gram statistics but also sensitivity to sentence structure. When it comes to the third time step in processing the S-V-O sentence, when another noun is fed in, the network has to take into account the context in which the noun appears in order to differentiate an object noun from a subject noun and to achieve a low GPE evaluation.
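The GPE of a single prediction step can therefore be computed as in this small sketch, given the output activation and a mask of the grammatical continuations (the toy numbers are ours):

    import numpy as np

    def gpe(output, grammatical):
        # output:      network output activation over the lexicon, o(t)
        # grammatical: boolean mask of words that are grammatical continuations
        output = np.asarray(output, dtype=float)
        mask = np.asarray(grammatical, dtype=bool)
        correct, incorrect = output[mask].sum(), output[~mask].sum()
        return 1.0 - correct / (correct + incorrect)

    # After a sentence-initial noun, all verbs are grammatical continuations:
    out = np.array([0.05, 0.7, 0.2, 0.05])        # toy 4-word lexicon
    verbs = np.array([False, True, True, False])  # positions coding the verbs
    print(gpe(out, verbs))                        # 1 - 0.9/1.0 = 0.1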
In sum, SRNs are faced with the demands of several concurrent and interdependent tasks in learning the artificial language:
i) forming categories of nouns and verbs, since the lexicon is deliberately coded with orthogonal vectors;
ii) learning to associate words that are immediately adjacent to each other;
iii) learning the context dependence of the association.

III. FRAMEWORK OF ASSESSING COMBINATORIAL PRODUCTIVITY

Having introduced the basics of simple recurrent networks, we now turn to the focus of this study, namely combinatorial productivity exhibited by SRNs. The framework of assessment was introduced by van der Velde et al. [1], in which they attempted to demonstrate that SRNs lack the ability to generalise with respect to the combinatorial complexity of language, and hence argued that SRNs fail to be a model of language acquisition/processing. We have reported our replications of their simulation in [2], showing otherwise. In the remaining parts of this paper we will first summarise our disagreement with van der Velde et al. [1, 2] and then discuss our recent findings concerning the plausible mechanism underlying SRNs' success in our simulations.

A. Training and testing sets

SRNs of the architectures shown in Fig. 3 (a) and Fig. 3 (b) were used in van der Velde [1] and in our previous study [2], respectively. The networks were trained with three types of sentences, simple, right-branching and centre-embedding, as tabulated in Table I. As pointed out by van der Velde [1], the use of complex sentences reveals whether networks have truly captured the underlying structure instead of merely the bi-gram transitions between nouns and verbs.

TABLE I. THREE TYPES OF SENTENCE USED IN TRAINING THE SRNS
Sentence type     Construction         Natural language equivalent
Simple            N-V-N-#              the boy kisses the girl
Right-branching   N-V-N-that-V-N-#     the boy kisses the girl that chases the dog
Centre-embedding  N-that-N-V-V-N-#     the girl that the boy kisses chases the dog

Eight nouns and eight verbs, together with the relative marker "that" and the end-of-sentence marker "#", were incorporated into the lexicon to compose the training and testing set sentences. The key element behind this framework for assessing SRNs' ability to exhibit combinatorial productivity lies in the design of the training and testing sets. We illustrate the rationale with the two utterance networks shown in Fig. 2. An utterance (consider only the simple sentence construction) is represented by a path through the network from left to right.

Fig. 2. Combinatorial productivity and generalisation from training set (LC) to testing set (LA). [Figure: two word-transition networks over nouns n1, n2, … and verbs v1, v2, …; LC contains only a subset of the noun-verb paths present in LA.] Assuming a left-to-right directionality, arrow heads are hidden for simplicity.

We consider LC as a model of the language available to a child during his acquisition of the target language, modelled as LA. If we consider an utterance as a combination of lexical items, LC under-represents the target language. Such under-representation parallels the observation of the "skewed input" of child-directed speech in corpus data [28] analysed by Goldberg [29]. Children can and always do generalise their knowledge about the language to comprehend novel sentences like "the horse sings a story", one of the test stimuli used in [30] as a comprehension task, which children aged 21 to 35 months showed no problem in understanding in general. For SRNs to be a successful model of language acquisition, they should also exhibit such an ability to generalise from LC to LA. The training and testing set sentences were thus constructed according to Fig. 2.

The lexicon of nouns and verbs was divided into four non-overlapping groups; we denote the jth member of the ith group of nouns as nij, and similarly vij for verbs. Training set sentences were composed of nouns and verbs from the same group, and hence the complete set of 128 right-branching training sentences was:
Group 1: {n1a-v1b-n1c-that-v1d-n1e-#},
Group 2: {n2a-v2b-n2c-that-v2d-n2e-#},
Group 3: {n3a-v3b-n3c-that-v3d-n3e-#},
Group 4: {n4a-v4b-n4c-that-v4d-n4e-#},
where a, b, c, d, e ∈ {1, 2}.
Hence 32 unique sentences (2^5: 2 different words at each of 5 content-word positions) were generated for each group. The other two types of sentences, simple and centre-embedding, were generated in a similar way. The four groups of sentences were combined to form the training sets, with different weightings of simple, right-branching and centre-embedding sentences mixed together according to the 4-phased training scheme in Table II. The design of the training scheme, with an increasing number of complex sentences, was in accordance with Elman's notion of "starting small" [1, 31]; our initial simulations also agreed that training SRNs with simple sentences first, followed by an increasing number of complex sentences, indeed gives better training results. SRNs trained on training set sentences were evaluated after the fourth phase of training, via GPE as introduced in Section II.B, with testing set sentences.

TABLE II. 4-PHASED TRAINING SCHEME
Phase   Token and type (bracketed) ratio*   No. of sentences fed to a network
1       1 : 0 : 0 (1 : 0 : 0)               32,000
2       6 : 1 : 1 (24 : 1 : 1)              10,240
3       2 : 1 : 1 (8 : 1 : 1)               51,200
4       1 : 2 : 2 (2 : 1 : 1)               64,000
* ratio of simple : right-branching : centre-embedding

Fig. 3. Network architectures used in (a) van der Velde [1], (b) Wong [2] and (c) this study. [Figure: (a) a three-hidden-layer SRN with the context layer copied back from the second hidden layer; (b) a one-hidden-layer SRN with a 40-neuron hidden layer and 40-neuron context layer; (c) a two-hidden-layer SRN with two 40-neuron hidden layers and the 40-neuron context layer copied back from the first hidden layer; input and output layers have 20 neurons.]

The testing sets were constructed by combining lexical items from mixed groups, hence giving sentences in LA but not in LC. The level of difficulty with respect to generalization was varied by the number of groups that are mixed: the more groups, the more difficult the sentence. We use M to denote this level of complexity of a testing set sentence. Examples of right-branching testing set sentences with different M values are:
M=2: {n1a-v3b-n1c-that-v3d-n1e-#}, {n4a-v3b-n4c-that-v3d-n4e-#}
M=3: {n1a-v3b-n2c-that-v1d-n3e-#}, {n4a-v3b-n1c-that-v4d-n3e-#}
M=4: {n1a-v3b-n2c-that-v4d-n1e-#}, {n4a-v3b-n1c-that-v2d-n4e-#}
where a, b, c, d, e ∈ {1, 2}.

Obviously the testing sets contain more sentences than the training set. More importantly, constructing the testing set sentences this way ensures a maximum separation between lexical items from the same group, since GPE is evaluated at every sentence position.
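The construction of both sets is mechanical; the sketch below generates the 128 right-branching training sentences and one M=3 testing pattern (the helper name and the string encoding of nij/vij are ours):

    from itertools import product

    def rb(gs, a, b, c, d, e):
        # right-branching N-V-N-that-V-N-# with group indices
        # gs = (noun, verb, noun, verb, noun) and member indices a..e in {1, 2}
        g1, g2, g3, g4, g5 = gs
        return (f"n{g1}{a}", f"v{g2}{b}", f"n{g3}{c}", "that",
                f"v{g4}{d}", f"n{g5}{e}", "#")

    # Training set: all five content words from the same group.
    train = [rb((g, g, g, g, g), *m)
             for g in range(1, 5) for m in product((1, 2), repeat=5)]
    print(len(train))          # 4 groups x 2**5 = 128 sentences

    # One M=3 testing pattern, mixing three groups (e.g. n1-v3-n2-that-v1-n3):
    test_m3 = [rb((1, 3, 2, 1, 3), *m) for m in product((1, 2), repeat=5)]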
IV. OUR RESULTS CONTRARY TO VAN DER VELDE

We trained twenty SRNs with the architecture shown in Fig. 3(b), each initialized with an independent random set of connection weights, on streams of concatenated sentences randomly sampled from the training sets. Similar to the results reported by van der Velde et al. [1], the networks achieved a small GPE, on average about 0.05, on training set sentences. Since the prime focus is to examine the generalisation ability exhibited by the networks, we plot only the testing set GPE in Fig. 4, as a comparison of the results obtained by us [2] and by van der Velde et al. [1].

Fig. 4. GPE evaluation on testing set sentences with complexity level M=3, for (a) simple, (b) right-branching and (c) centre-embedding sentences. [Figure: GPE per sentence position for each sentence type.] Solid line (labelled M3): results we obtained; solid line with square markers (labelled VDVM3): results reported by van der Velde et al. [1]; dotted line: expected GPE from a bi-gram model.

The GPE of the networks' output in processing testing set sentences with complexity level M=3 at each sentence position was
measured. The values were averaged over the twenty SRNs and are plotted in Fig. 4 with error bars of two standard deviations in height. Results reported by van der Velde et al. [1] are marked on the plots with square markers. A baseline GPE pattern expected from a bi-gram model (Table III) is also included.

TABLE III. THE BI-GRAM MODEL*
        N       V       that    #
N       0       0.357   0.286   0.357
V       0.778   0.222   0       0
that    0.5     0.5     0       0
#       1       0       0       0
* The value of the cell in the ith row and jth column is the relative frequency with which words in category j follow a word in category i; formally, Pr(w_{k+1} ∈ Cj | w_k ∈ Ci), where w_k and w_{k+1} are consecutive words in a sequence. The values are calculated from the training data.
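Such a baseline table can be estimated directly from the category sequences of the training sentences, as in this small sketch (Python 3.10+ for itertools.pairwise; the three example sequences stand in for the full training stream):

    from collections import Counter
    from itertools import pairwise   # Python 3.10+

    def bigram_table(sequences):
        # Relative frequency Pr(w_{k+1} in Cj | w_k in Ci) over category
        # sequences such as ("N", "V", "N", "#").
        counts, totals = Counter(), Counter()
        for seq in sequences:
            for a, b in pairwise(seq):
                counts[(a, b)] += 1
                totals[a] += 1
        return {pair: n / totals[pair[0]] for pair, n in counts.items()}

    data = [("N", "V", "N", "#"),                    # simple
            ("N", "V", "N", "that", "V", "N", "#"),  # right-branching
            ("N", "that", "N", "V", "V", "N", "#")]  # centre-embedding
    print(bigram_table(data)[("V", "N")])            # V -> N relative frequency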
Van der Velde et al. [1] argued that SRNs are only able to make use of bi-gram statistics when processing testing set sentences and therefore fail to generalise with respect to combinatorial complexity (consider the position of the second noun of a right-branching sentence: the frequent N-V transitions would bias the network to predict another verb to follow, which is grammatically incorrect). Their argument was based on the observation that the testing set GPEs they obtained were, in most sentence positions, larger than the expected values obtained from a bi-gram model. In other words, they were attempting to show that SRNs fail to capture the sentence structure when the sentence involves novel combinations of lexical items.

Our simulation results, however, showed the contrary. In most sentence positions, our networks achieved GPEs smaller than the bi-gram GPE. This casts doubt on their criticism of SRNs. We speculate that our improvement in training the networks lies mainly in the choice of network architecture (cf. Fig. 3). SRNs in van der Velde et al. [1] had three hidden layers, with the recurrent copy-back connections coming from the second hidden layer. In our initial simulations, SRNs with recurrent connections coming from a layer other than the first hidden layer showed poor performance on testing set sentences, just as van der Velde et al. [1] reported, even though they learned the training set well. This was the reason we adopted the architecture we used. In short, we consider the dismissal of SRNs as a model for language acquisition and processing based on a premature investigation, as in van der Velde et al. [1], unconvincing. This triggered us to explore further the impact of network architecture on performance with respect to generalisation.

We carried out another set of simulations with SRNs of the architecture shown in Fig. 3 (c). Because the number of connection weights for each network was increased by 50 per cent, we extended the fourth phase (cf. Table II) of training such that the number of training sentences fed to the network in the last phase was increased to 640,000. The ten-fold increase was to exclude the possibility of premature training that might otherwise happen. Twenty 2-hidden-layer SRNs and twenty 1-hidden-layer SRNs were trained according to the revised training scheme. The GPE evaluations after training, averaged over the twenty trials of each network type, are plotted in Fig. 5. Networks with two hidden layers showed considerably smaller GPE in processing testing set sentences, although centre-embedding sentences remained difficult to process, which is consistent with human performance [32, 33]. Networks with only one hidden layer showed no improvement in response to the extended training. The difference in performance between one-hidden-layer and two-hidden-layer networks, together with the individual differences in achieved GPE among networks of the same type, led us to explore whether there exists a qualitative difference between the two types of networks that may shed light on the plausible mechanism underlying generalisation. This is discussed in the next section.

Fig. 5. Comparing the generalisation performance exhibited by SRNs with two hidden layers and SRNs with one hidden layer, for (a) simple, (b) right-branching and (c) centre-embedding sentences. [Figure: GPE per sentence position for the two network types, with the bi-gram baseline.]

V. COMBINATORIAL PRODUCTIVITY THROUGH CATEGORIES

Proponents of connectionist models for language acquisition [7, 34] have long argued that networks are more than passive statistics-gathering devices. Attempts have been made to show that networks can go beyond surface
similarity towards successful generalisation. Elman's early attempt in [12], in which the architecture of Fig. 3(b) was used, showed how knowledge of the categories of nouns and verbs, the sub-categorisation of nouns into animates and inanimates, and the sub-categorisation of verbs into transitives and intransitives could be induced by SRNs. The active role played by, and modelled by, neural networks was clearly stated by McClelland and Plaut in [34]:

"The relevant overlap of representations required for generalization in a neural network or other statistical learning procedure need not be present directly in the 'raw input' but can arise over internal representations that are subject to learning."

In the context of the current study, a testing set sentence such as "n11-v21-n31" is novel to the network, since the noun n31 was never seen by the network as an object in a sentence with n11 as the subject. The results in Fig. 5 show that SRNs with two hidden layers are better at dealing with such novelty, and some of them even achieved a very low GPE value; Fig. 7 plots the mean GPE achieved by each of the networks with two hidden layers. We speculate that, on top of forming the categories of nouns and verbs, as demonstrated in Elman [12], categorisation according to sentence position may also be the driving force for the success of the networks.

In the connectionist research literature, to probe the question of how networks solve a task, analysis is often done on the internal representations, the hidden layer activations (denoted h(t) in Section II.A), developed by the networks through training. Since hidden layer activations are of high dimension, methods of dimensionality reduction such as Principal Components Analysis, Multidimensional Scaling, and Hierarchical Clustering Analysis are often used. We choose Classical Multidimensional Scaling, as it targets preserving the Euclidean distances between data points in the reduced space and hence might be better at revealing categories in the form of clusters formed by the networks.

Fig. 6 gives the scattering plot of hidden layer activations


sampled from network #8, the network that achieved the
0.14
lowest mean GPE (of 0.003, cf. Fig. 7) on M=3 testing set 0.12
sentences among the twenty 2-hidden-layer SRNs trained,
Mean GPE

0.1

i.e. the one that was most able to generalise. The network 0.08

was fed with 90 simple sentences, 30 for each complexity 0.06

level, M=1…3; 120 right-branching and 120 centre- 0.04

0.02
embedding sentences, 30 for each complexity level,
0
M=1…4. The network was fed word-by-word with the test 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Network number
sentences and the hidden layer activation h(t) at each time
step, which corresponds to each sentence position, was Fig. 7. Mean GPE achieved by each of the 2-hidden-layer
networks. Obtained by testing the networks with 300 M=3 testing
recorded. Since the first three words of a right-branching
set sentences, 100 for each sentence type.
sentence is equivalent to a simple sentence and due to the
limitation of space we highlight only data points that
correspond to the processing of right-branching sentences in
The first nouns (N1) in the right-branching sentences are marked with a star, the second nouns (N2) with a square, and the last nouns (N3) with a triangle in Fig. 6. Labels for the two verb positions and the relative marker 'that' position can be found in the figure.

A qualitative difference between the functioning of the first and the second hidden layer can be observed. In the second hidden layer, more distinct clustering according to sentence position is observed, compared with the groupings formed in the first hidden layer, which are more dominated by word type. In other words, the classification of main clause subject (N1), main clause object / relative clause subject (N2) and relative clause object (N3) in processing right-branching sentences is more distinctive in the second hidden layer. This agrees with the low GPE evaluation obtained by the network, as grammatical predictions depend not only on the word type of the incoming word but also on the context in which the word appears. Notice that the hidden layer activations of V1 and V2 almost completely overlap with one another. This is due to the current limitation of the prediction task: in both of these sentence positions only nouns can be the correct continuation, so the task does not require the network to make a further distinction.

B. Analysis of hidden layer activations – network #5

The worst network in terms of generalisation is network #5 (cf. Fig. 7), which achieved a mean GPE of 0.1 in processing M=3 testing set sentences. Fig. 8 shows plots of this network's hidden layer activations for the two layers. While a somewhat fuzzy distinction between nouns and verbs is still available in the first hidden layer, clear distinction according to sentence position is lost in the second layer. Analysis of other networks with relatively poor performance, e.g. networks #7 and #10, shows a similar phenomenon.

Fig. 8. Hidden layer activations of network #5's (a) first hidden layer and (b) second hidden layer. [Figure: 2-D scatter plots with labels N1, V1, N2, THAT, V2 and N3; the position clusters are less separated than in Fig. 6.] See the caption of Fig. 6 for details.

VI. SUMMARY AND CONCLUSION

We consider the dismissal by van der Velde et al. [1, 10, 11] of SRNs as a model for cognition, based on a premature analysis of the networks' performance, unconvincing. Our experimentation with SRNs under the framework of combinatorial productivity suggested that (i) networks do exhibit the ability to generalise, (ii) networks with recurrent connections coming from the first hidden layer generalise better than networks with recurrent connections coming from another layer, and (iii) networks with two hidden layers show better ability to generalise.

To our knowledge, this study is the first to show that layers can play differential roles in connectionist networks for language modelling. Among the two-hidden-layer networks we analysed, all showed various degrees of success in forming general noun and verb categories in the first hidden layer. SRNs that were more successful with respect to generalisation not only developed a more separated distinction between nouns and verbs in the first hidden layer; they also made finer categorisations according to the sentence context in the second hidden layer. Our speculation that the success of two-hidden-layer SRNs is driven by this categorisation, on top of the general noun-verb distinction, was supported by such observations, particularly by the contrast between the more successful networks and the less successful ones.

From the perspective of psycholinguists, one might ask whether the model and results we presented support a view that the noun-verb distinction is acquired by a child before the ability to comprehend a sentence. The answer is no, as we have not traced the development of the categories through time to see which type of categorisation evolved first. In other words, the analysis done on the two hidden layers was strictly synchronic. Our speculation is that they are likely to coevolve with one another, since sentence context certainly provides a strong cue towards the identification of word class. This view is also supported by multi-agent models of language emergence [35, 36].

It remains for future research to see how much of the observed "working mechanism" in SRNs parallels the
function of neural substrates in the brain of a language learner. Nevertheless, the functional building block of artificial neural networks is the association between functionally correlated signals. If complex behaviour can emerge out of such low level association in artificial networks, we see no reason why that cannot happen in the living brain.

ACKNOWLEDGEMENT

The authors would like to express their gratitude towards Dr. James W. Minett and Mr. Gong Tao for useful comments. The research is supported by research grants from the RGC Hong Kong: CUHK-1224/02H and CUHK-1127/04H.

REFERENCES

[1] F. van der Velde, G. T. van der Voort van der Kleij, and M. de Kamps, "Lack of combinatorial productivity in language processing with simple recurrent networks," Connection Science, vol. 16, pp. 21-46, Mar 2004.
[2] F. C. K. Wong, J. W. Minett, and W. S.-Y. Wang, "Reassessing combinatorial productivity exhibited by simple recurrent networks in language acquisition," in Proceedings of the 2006 International Joint Conference on Neural Networks, Vancouver, Canada, 2006, pp. 2905-2912.
[3] R. E. Bellman, Adaptive Control Processes. Princeton: Princeton University Press, 1961.
[4] L. I. Perlovsky, "Toward physics of the mind: concepts, emotions, consciousness and symbols," Physics of Life Reviews, vol. 3, pp. 23-55, 2006.
[5] S. Pinker, Language Learnability and Language Development, 2nd ed. Cambridge, Mass.: Harvard University Press, 1996.
[6] G. Lupyan and M. H. Christiansen, "Case, word order, and language learnability: Insights from connectionist modeling," in Proceedings of the 24th Annual Conference of the Cognitive Science Society, Mahwah, NJ: Lawrence Erlbaum Associates, 2002, pp. 569-601.
[7] J. L. Elman, "Generalization from sparse input," in Proceedings of the 38th Annual Meeting of the Chicago Linguistic Society, 2003.
[8] G. F. Marcus, The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, Mass.; London: MIT Press, 2001.
[9] G. F. Marcus, S. Vijayan, S. Bandi Rao, and P. M. Vishton, "Rule learning by seven-month-old infants," Science, vol. 283, pp. 77-80, January 1, 1999.
[10] F. van der Velde and M. de Kamps, "Neural blackboard architectures of combinatorial structures in cognition," Behavioral and Brain Sciences, vol. 29, pp. 37-108, 2006.
[11] F. van der Velde, "Modelling language development and evolution with the benefit of hindsight," Connection Science, vol. 17, pp. 361-379, Sep-Dec 2005.
[12] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, pp. 179-211, 1990.
[13] A. Borovsky and J. L. Elman, "Language input and semantic categories: a relation between cognition and early word learning," Journal of Child Language, vol. 33, pp. 759-790, 2006.
[14] M. H. Christiansen, C. M. Conway, and S. Curtin, "Multiple-cue integration in language acquisition: a connectionist model of speech segmentation and rule-like behavior," in Language Acquisition, Change and Emergence: Essays in Evolutionary Linguistics, J. W. Minett and W. S. Y. Wang, Eds. Hong Kong: City University of Hong Kong Press, 2005, pp. 205-240.
[15] J. L. Elman, "Connectionism and language acquisition," in Language Development: The Essential Readings, M. Tomasello and E. Bates, Eds. Malden, Mass.: Blackwell Publishers, 2001.
[16] J. L. Elman, "An alternative view of the mental lexicon," Trends in Cognitive Sciences, vol. 8, pp. 301-306, Jul 2004.
[17] M. H. Christiansen and N. Chater, "Connectionist natural language processing: the state of the art," Cognitive Science, vol. 23, pp. 417-437, 1999.
[18] M. H. Christiansen and J. T. Devlin, "Recursive inconsistencies are hard to learn: A connectionist perspective on universal word order correlations," in Proceedings of the 19th Annual Cognitive Science Society Conference, Mahwah, NJ: Lawrence Erlbaum, 1997, pp. 113-118.
[19] P. Rodriguez, "Simple recurrent networks learn context-free and context-sensitive languages by counting," Neural Computation, vol. 13, pp. 2093-2118, 2001.
[20] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Eds. Cambridge, Mass.: MIT Press, 1986, pp. 319-362.
[21] S. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. New York: Prentice Hall, 1999.
[22] J. L. Elman, "The emergence of language: A conspiracy theory," in Emergence of Language, B. MacWhinney, Ed. Hillsdale, NJ: Lawrence Erlbaum Associates, 1999, pp. 1-27.
[23] M. Redington and N. Chater, "Connectionist and statistical approaches to language acquisition: a distributional perspective," in Language Acquisition and Connectionism, K. Plunkett, Ed. Hove, UK: Psychology Press, 1998, pp. 129-191.
[24] M. Redington, N. Chater, and S. Finch, "Distributional information: A powerful cue for acquiring syntactic categories," Cognitive Science, vol. 22, pp. 425-469, Oct-Dec 1998.
[25] M. Tomasello, "The item-based nature of children's early syntactic development," in Language Development: The Essential Readings, M. Tomasello and E. Bates, Eds. Malden, Mass.: Blackwell Publishers, 2001, pp. 169-186.
[26] M. Tomasello, Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, Mass.: Harvard University Press, 2003.
[27] M. H. Christiansen and N. Chater, "Toward a connectionist model of recursion in human linguistic performance," Cognitive Science, vol. 23, pp. 157-205, 1999.
[28] E. Bates, I. Bretherton, and L. Snyder, From First Words to Grammar: Individual Differences and Dissociable Mechanisms. Cambridge, MA: Cambridge University Press, 1988.
[29] A. E. Goldberg, Constructions at Work: The Nature of Generalization in Language. New York: Oxford University Press, 2006.
[30] V. Valian, S. Prasada, and J. Scarpa, "Direct object predictability: effects on young children's imitation of sentences," Journal of Child Language, vol. 33, pp. 247-269, 2006.
[31] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, pp. 71-99, 1993.
[32] F. Hsiao and E. Gibson, "Processing relative clauses in Chinese," Cognition, vol. 90, pp. 3-27, 2003.
[33] E. Gibson, "The dependency locality theory: a distance-based theory of linguistic complexity," in Image, Language, Brain: Papers from the First Mind Articulation Project Symposium, A. Marantz, Y. Miyashita, and W. O'Neil, Eds. Cambridge, Mass.: MIT Press, 2000, pp. 95-126.
[34] J. L. McClelland and D. C. Plaut, "Does generalization in infant learning implicate abstract algebra-like rules?," Trends in Cognitive Sciences, vol. 3, pp. 166-168, 1999.
[35] T. Gong and W. S. Y. Wang, "Computational modeling on language emergence: A coevolution model of lexicon, syntax and social structure," Language and Linguistics, vol. 6, pp. 1-41, 2005.
[36] T. Gong, J. Ke, J. W. Minett, J. H. Holland, and W. S. Y. Wang, "A computational model of the coevolution of lexicon and syntax," Complexity, vol. 10, pp. 50-62, 2005.

¹ Hence the name "prediction task" and "Grammatical Prediction Error".


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 658--663
Copyright@2007 Watam Press

An Estimative Model of Maximum Power Generation from


Photovoltaic Modules Based on Generalized Regression Neural Network

HUNG-CHENG CHEN, JENG-CHYAN LIN, and MENG-HUI WANG


Department of Electrical Engineering, National Chin-Yi University of Technology, Taichung, Taiwan, R.O.C.

JIAN-CONG QIU
Institute of Information and Electrical Energy, National Chin-Yi University of Technology, Taichung, Taiwan, R.O.C.

Abstract: The Generalized Regression Neural Network (GRNN) is usually applied to function approximation. Based on the principle of GRNN, this paper presents an estimative model of the maximum power generation (MPG) from photovoltaic (PV) modules. Weather factors such as irradiation and temperature are utilized as the input information to the proposed neural network. The model implementation using Matlab/Simulink is also developed. It can be applied in further simulation of a DC/AC inverter for converting the DC source to a three-phase AC source. The simulation results show that the presented model has good accuracy in estimating the maximum power generation from PV modules.

1 Introduction

Burning fossil fuels produces pollutants such as carbon dioxide, nitrogen oxides, sulfur oxides and hydrocarbon compounds [1]. All of these poisonous gases are causes of air pollution and the greenhouse effect. Fossil fuels are non-renewable resources and are being depleted by massive consumption. Compared with nuclear energy and thermal power, renewable energy is inexhaustible and has non-polluting characteristics. Solar energy, wind power, hydraulic power and tidal energy are natural resources of interest for generating electrical power. Extensive use of renewable energy is widely advocated to reduce the pollution we have caused on Earth.

Solar energy is a welcome substitute for many other energy resources because it is a natural, inexhaustible resource of sunlight for generating electricity [2]. PV modules are, by nature, nonlinear power sources that need accurate estimation of the maximum power generation (MPG). For the operation planning of a power system including PV modules, accurate estimation of the MPG is indispensable. The MPG depends on the weather factors, mainly the irradiation and the cell temperature. Therefore, the weather factors, namely the irradiation and the temperature, are utilized for the estimation of the maximum power in this study.

For the estimation of the maximum power generation from PV modules, an estimative model based on neural networks is proposed. The problem of maximum power generation estimation is viewed as a function approximation problem consisting of a nonlinear mapping from a set of input variables containing information about the weather onto a single output variable representing the estimated maximum power generation [3]-[5]. The presented study develops a GRNN model trained on an extended data set adopted from the power-voltage characteristic curves of a PV module under different irradiations and temperatures. The maximum powers generated corresponding to different irradiations and temperatures are utilized. The accuracy of the neural network model is evaluated by making a comparison between the estimated values and the characteristic curves. Simulation results and discussions are presented in this paper. The estimation results made by the proposed model show a good agreement with the characteristic curves of PV modules.

2 Characteristic Analysis of PV Modules

To investigate the fundamental characteristics of the PV module, the specification of the SIEMENS SP75 PV module under standard test conditions (irradiation 1000 W/m2 and temperature 25°C) is shown in Table 1. Fig. 1 shows the characteristic curves of this PV module. We can observe the changes of the V-I and P-V characteristic curves under different irradiation and temperature values. The behaviour is nonlinear and depends on the irradiation level and the operating cell temperature. When the irradiation reduces, the output current and the output power obviously decrease, as shown in Fig. 1(a) and Fig. 1(b). If the temperature rises, the output voltage and output power decrease, as shown in Fig. 1(c) and Fig. 1(d).

If a maximum-power-point tracker is applied, the PV module is always working at the maximum power point. Therefore, the MPG can be adopted from the power-voltage characteristic curves under different irradiations and temperatures. The maximum powers generated corresponding to different irradiations and temperatures are utilized as training data of the developed GRNN.

Table 1. Specification of the SIEMENS SP75 PV module

Characteristics                                    | Value
Typical maximum power (Pmax)                       | 74.8 W
Voltage at maximum power (Vmp)                     | 17 V
Current at maximum power (Imp)                     | 4.4 A
Short circuit current (Isc)                        | 4.8 A
Open circuit voltage (Voc)                         | 21.7 V
Temperature coefficient of short-circuit current   | 2.06 mA/°C
Temperature coefficient of open-circuit voltage    | -0.077 V/°C

[Fig. 1. Characteristic curves of the SIEMENS SP75 PV module: (a) V-I characteristic curves for different irradiations, (b) P-V characteristic curves for different irradiations, (c) V-I characteristic curves for different temperatures, (d) P-V characteristic curves for different temperatures.]
3 GRNN Based MPG Estimative Model for PV Modules

3.1 GRNN Structure Analysis

The typical structure of a GRNN is shown in Fig. 2. It is often used in function approximation for complex models. It consists of a radial basis layer and a linear layer [6]. Direct mappings are adopted by the GRNN between the input layer and the hidden layer, while the mapping between the hidden layer and the output layer uses the weighted linear sum of the hidden layer outputs. In Fig. 2, IW^{1,1} ∈ R^{Q×R} is the weighting matrix of the first layer, and ".*" stands for the element-wise product of input vectors. LW^{2,1} ∈ R^{Q×Q} is the weight matrix of the second layer. a^1 ∈ R^{Q×1} is the output of the first layer. R is the dimension of the network input. Q is the neuron number of every layer and is also the number of training samples. nprod denotes the normalized dot product used for calculating the output vector n^2 ∈ R^{Q×1}, and a^2 ∈ R^{Q×1} is the output of the GRNN. This structure can reduce the complexity of the computational problem so as to speed up the learning process. It is especially suitable for accomplishing function approximation and model identification quickly.

[Fig. 2. Structure of GRNN.]

In the Matlab/Simulink implementation of this kind of neural network, the centers of the Gaussians are chosen equal to the training input patterns. The first layer has as many neurons as the number of input patterns. Each hidden node has an associated bias that plays the role of the variance of the Gaussian. The bias is set to a column vector of 0.8326/SPREAD, where SPREAD determines the distance of an input vector from a neuron's weight vector at which the radial basis function responds with an output of 0.5 [7]. In practice, SPREAD should be large enough that more than one node in the hidden layer of the GRNN responds with a nonzero output; on the other hand, SPREAD should not be so large that every node in the hidden layer effectively responds over the same large region of the input space [8]. After the outputs of the hidden layer are determined, the weights from the hidden layer to the output layer are chosen equal to the desired output vectors.
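To make the forward pass concrete, the following is a minimal sketch in Python (an illustration only; the paper's implementation is in Matlab/Simulink, and the function name `grnn_predict` is hypothetical). It uses the training inputs as hidden centers, the bias 0.8326/SPREAD described above, and the training targets as linear-layer weights:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, spread=1.0):
    """GRNN forward pass: hidden centers are the training inputs,
    linear-layer weights are the training targets (e.g. Pmax values)."""
    bias = 0.8326 / spread          # radial basis outputs 0.5 at distance = spread
    y_hat = np.empty(len(X_query))
    for k, p in enumerate(X_query):
        dist = np.linalg.norm(X_train - p, axis=1)   # ||dist|| block of Fig. 2
        a1 = np.exp(-(bias * dist) ** 2)             # radial basis layer activations
        y_hat[k] = (y_train @ a1) / a1.sum()         # normalized linear layer (nprod)
    return y_hat

# Usage sketch (values hypothetical): estimate Pmax at R = 900 W/m2, T = 25 C
# p = grnn_predict(train_RT, train_Pmax, np.array([[900.0, 25.0]]), spread=50.0)
```

The normalization by the summed hidden activations is what makes the output an interpolation of the training targets, consistent with choosing the output weights equal to the desired vectors.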
3.2 MPG Estimative Model Implementation Using Matlab/Simulink

Based on the principle of GRNN discussed above, the MPG estimative model implementation using Matlab/Simulink is developed. The Matlab/Simulink model is shown in Fig. 3.

[Fig. 3. An estimative model of maximum power generation from PV modules based on GRNN: (a) GRNN structure using Simulink, (b) GRNN Simulink model.]

4 Estimation Results and Discussions

The proposed MPG estimative model based on GRNN is evaluated by using the training data set. The estimation accuracy for different weather conditions is shown in Table 2. The maximum error is only 0.58%. This table shows that the presented model has good accuracy in estimating the maximum power generation from PV modules.

Table 2. Estimation accuracy of GRNN for different weather conditions

Weather              | Characteristic Curves         | Estimation Values
R (W/m2)   T (°C)    | Pmax (W)  Imp (A)  Vmp (V)    | Pmax (W)  Imp (A)  Vmp (V)
1000        0        | 82.30     4.35     19.0       | 82.29     4.35     18.90
1000       25        | 74.80     4.40     17.0       | 74.80     4.40     17.00
1000       50        | 67.10     4.45     15.1       | 67.10     4.45     15.07
1000       75        | 59.20     4.50     13.2       | 59.20     4.50     13.15
800        25        | 59.50     3.50     17.0       | 59.39     3.48     17.06
600        25        | 46.70     2.72     17.2       | 46.52     2.72     17.10
400        25        | 31.20     1.80     17.3       | 31.14     1.80     17.30
200        25        | 15.66     0.90     17.4       | 15.64     0.90     17.38

For further investigation of the performance of the estimative model, an analysis has been performed over a day. Fig. 4 illustrates the daily estimation of the maximum power generation obtained by using the GRNN-based estimative model. The data of the day indicated in Fig. 4(a) and Fig. 4(b) are not included in the training data. As shown in Fig. 4(c) and Fig. 4(d), the proposed GRNN-based estimative model gives a highly accurate estimation of the maximum power all day long. This fact indicates the wide applicability of the proposed model.

[Fig. 4. Estimation results by GRNN during a day: (a) temperature variation, (b) radiation variation, (c) voltage and current outputs at maximum power generation, (d) estimated maximum power generation.]

The output of a PV module is a DC source. However, most power distribution and power generation systems are three-phase AC systems. We have to convert the DC source of the PV module output to an AC source in order to connect with the other parts of the power system. When the irradiation is 1000 W/m2 and the temperature is 25°C, the DC source of the PV module output is converted to a three-phase AC source through an IGBT inverter, with an LC circuit used as a filter. Fig. 5 shows the simulation of the DC/AC inverter using the estimative model of the PV module. The dynamical responses of the line output voltage and the line output current of the DC/AC inverter are shown in Fig. 6.

As shown in Fig. 6, we can observe that the output waveforms of Vab and Ian form complete three-phase AC sources. The output frequencies of both voltage and current are 50 Hz. When the irradiation is 1000 W/m2 and the temperature is 25°C, we obtain a current Ian with peak value 4.95 A, so the root mean square value is 3.5 A. This means that the three-phase power delivered to the three-phase RLC load is 63 W. Some power is consumed in converting the DC source to a three-phase AC source, namely in the IGBT inverter and the LC filter. Summing these consumed powers and the absorbed power, the total three-phase output power is about 74 W. It is very near the maximum power generation of the SIEMENS SP75 PV module studied in this research.
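As a quick arithmetic check of the quoted values, the root mean square current follows from the peak value of the sinusoidal waveform as

\[
I_{an,\mathrm{rms}} \;=\; \frac{I_{an,\mathrm{peak}}}{\sqrt{2}} \;=\; \frac{4.95\ \mathrm{A}}{\sqrt{2}} \;\approx\; 3.5\ \mathrm{A},
\]

which is consistent with the figure cited above.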


[Fig. 5. Simulation of DC/AC inverter using the estimative model of the PV module.]

5. Conclusions

For the operation planning of a power system including PV modules, accurate estimation of the MPG is indispensable. The MPG depends on the weather factors, mainly the irradiation and the cell temperature. In this paper, the performance of the Generalized Regression Neural Network used to estimate the maximum power generation from PV modules is investigated. The proposed GRNN-based estimative model gives a highly accurate estimation of the maximum power in response to changes in weather conditions all day long. This fact indicates the wide applicability of the proposed model.

[Fig. 6. Dynamical responses of the DC/AC inverter: (a) line voltage Vab, (b) line current Ian.]

Acknowledgments

The research was supported in part by the National Science Council of the Republic of China, under Grant No. NSC94-2622-E-167-009.

References

[1] K. Kobayashi, H. Matsuo and Y. Sekine, An Excellent Operating Point Tracker of the Solar-Cell Power Supply System, IEEE Conference of Power Electronics Specialists, (2006), 495-499.
[2] J. Applebaum, The Quality of Load Matching in a Direct Coupling Photovoltaic System, IEEE Trans. on Energy Conversion, 2 (1987), 534-541.
[3] T. Hiyama, S. Kouzuma and T. Imakubo, Identification of Optimal Operation Point of PV Modules Using Neural Network for Real Time Maximum Power Tracking Control, IEEE Trans. on Energy Conversion, 10 (1995), 360-367.
[4] S. Premrudeepreechacharn and N. Patanapirom, Solar-Array Modeling and Maximum Power Tracking Using Neural Networks, Proc. of IEEE Conf. on Power Technology, 2 (2003), 1-5.
[5] A. Al-Amoudi and L. Zhang, Application of Radial Basis Function Networks for Solar-Array Modeling and Maximum Power-Point Prediction, IEE Proc. on Generation, Transmission and Distribution, 147 (2000), 310-316.
[6] D. W. Philippe, Neural Network Models: An Analysis, Springer-Verlag, Berlin Heidelberg, New York, (1996).
[7] H. Demuth and M. Beale, Neural Network Toolbox for Use with MATLAB, User's Guide, Version 3.0.
[8] C. Christodoulou and M. Georgiopoulos, Application of Neural Networks in Electromagnetics, Artech House, Norwood, MA, (2001).


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1)664--670
Copyright@2007 Watam Press

Channel noise induced transition from quiescence


to bursting in the dissipative stochastic mechanics
based model neuron
MARİFİ GÜLER
Department of Computer Engineering, Eastern Mediterranean University,
Famagusta, Mersin-10,
Turkey.
E-mail: marifi.guler@gmail.com

AMS subject classifications: 92C05, 62M45, 82C05

Abstract— Recently, an approach based on dissipative stochastic mechanics was proposed by the author for the effects of channel noise on the neuron's behavior [1]. The present paper advocates a scheme for identifying the location of the extremum of the renormalization potential, accommodated inherently by the approach, in the membrane voltage space; it is argued that the location should be decided by the stationary solution of the underlying deterministic model. It is then found that there exists a region of input current values in which the neuron with noise is in the bursting state while the deterministic neuron is in the quiescent state. Moreover, this region coexists with another region in which the noisy neuron is bursting while the deterministic neuron is in the state of tonic firing.

I. INTRODUCTION

Neurons are under the influence of noise of two different types — the external type and the internal type. External noise arises from synaptic transmission and network effects. Internal noise, on the other hand, is specific to neurons and generates stochastic behavior at the level of neuronal dynamics. The major source of internal noise is the existence of a finite number of voltage-gated ion channels in a patch of neuronal membrane, together with the fact that channels have one open state and one or more closed states. The number of open channels fluctuates in a seemingly random manner, implying a fluctuation in the conductivity of the membrane, which, in turn, implies a fluctuation in the membrane voltage [2]. Although the voltage across the membrane is commonly termed the membrane potential, we shall use the term "membrane voltage" instead throughout the paper in order to avoid possible confusion with the potential functions that are going to be introduced later.

Ion channels are water filled holes in the cell membrane that are formed by proteins embedded in the lipid bilayer, with the property that each type of ion channel is selective to conduct a particular ion species. The dynamics of the coupled system composed of ions, water, protein, and lipid molecules has been treated in various approaches: (1) continuum approximation [3]-[5], (2) Brownian motion of each ion [6], [7], (3) molecular dynamics that takes into account the motion of the involved particles [8], [9], and (4) conceiving the ion channel as a quantum system with two or more states [10]. Possible effects of channel noise on the neuron's behavior have been investigated by means of representing the stochasticity of the ion channels as an additional voltage dependent Gaussian noise term introduced into the deterministic equations of motion for the gating variables [11] or the conductances [12] in the Hodgkin-Huxley model; and, more recently, by using an approach based on stochastic mechanics with dissipation [1].

Internal noise from ion channels has been shown to be sufficient to cause spontaneous activity (repetitive firing or bursting) in otherwise quiet neuronal models [12]-[17], chapter 15 in [18]. The effects of channel noise and temperature on more complicated behavior, such as the coexistence of different dynamical states (in particular, the states of bursting and tonic firing) and noise-induced transitions among these dynamical states, have attracted attention recently and have only started to be investigated within the last few years [19]-[22].

The occurrence of burst-generating activity in neurons is widespread, a substantial fraction of spikes fired by cortical neurons during information processing occurs during bursts, and these spikes play a crucial role in information processing in the cortex [23]-[25]. In a study using integrate-and-fire neurons with spike adaptation, it was shown that networks of tonically firing excitatory neurons can evolve to a state where the neurons burst in a synchronized manner [26]. It has been argued that tonic firing is associated with a linear coding of the input, whereas bursting is associated with a nonlinear one, and that the bursting neuron signals certain stimulus features [27]. There is a phenomenon, which can be considered reminiscent of this conjecture, already known in artificial
neural networks as follows: a much better generalization is achieved if networks of intrinsically higher order units, rather than networks of linear McCulloch-Pitts units, are used in learning complicated logical functions [28]. All that might explain the reason for having most neurons in a bursting state, rather than the simple spiking state, during a cognitive task. Consequently, noise-induced transitions between the bursting and tonic firing states might have particularly significant implications.

In the present paper, we suggest a principle for identifying the location of the extremum of the renormalization potential accommodated inherently by the dissipative stochastic mechanics based model neuron. Some numerical results are also presented in order to show the implications of that principle.

II. ASPECTS OF THE DISSIPATIVE STOCHASTIC MECHANICS BASED MODEL NEURON

The dissipative stochastic mechanics based approach to model the effects of channel noise in neurons was proposed by Güler in a recent study [1]. The approach views the total channel activity as constituted by the collective and the intrinsic systems, both subject to the channel fluctuations, in which the collective system is described through a voltage dependent potential in the membrane voltage phase space and the intrinsic system is formed by a set of dynamical channel attributes. The coupling between the collective and the intrinsic states induces the emergence of some correction terms due to the renormalizations of the membrane capacitance and of the voltage dependent potential, as well as the channel dissipation. A computational neuron model that incorporates channel noise was introduced consequently. This model assumes the Rose-Hindmarsh model as the underlying deterministic model of the neuron and reduces to it in the deterministic limit. The Rose-Hindmarsh model [29] is a three parameter model, where the three variables describe, in dimensionless units, the membrane voltage, an auxiliary variable representing the fast ion dynamics (e.g., potassium and sodium), and a slow variable which captures the slower dynamics. The Rose-Hindmarsh model is formulated in the form of a coupled set of dynamical differential equations:

(1a)
(1b)
(1c)

where the remaining symbols denote constant parameters, the external current injected into the neuron, and, introduced here for convenience, the membrane capacitance. The model is capable of exhibiting tonic firing and bursting, for a proper choice of the parameters, depending on the value of the current.
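The equations of (1) did not survive reproduction above. For reference, a sketch of the commonly quoted form of the Rose-Hindmarsh (Hindmarsh-Rose) model [29] is

\[
\dot{v} = u - a\,v^{3} + b\,v^{2} - w + I, \qquad
\dot{u} = c - d\,v^{2} - u, \qquad
\dot{w} = r\,\bigl(s\,(v - v_{0}) - w\bigr),
\]

where v is the membrane voltage, u the fast (recovery) variable, w the slow variable, and I the injected current; note that the paper's own version additionally carries the capacitance factor mentioned above on the voltage equation, and its notation may differ.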


The collective system is specified as follows. Assume, for the time being, that the deterministic model is not the one given by the set of equations (1), but instead it reads as

(2a)
(2b)
(2c)

Then, by a change of variables, it easily follows that this set of equations is equivalent to

(3a)
(3b)
(3c)

where one term matches the overall current that the neuron experiences and another denotes the initial time. The dissipative stochastic mechanics based approach, in this case, makes the ansatz that the combined effect of noise, emerging from the fluctuations of the two channel species, on the membrane voltage is the same as the effect of a Brownian environment, with zero friction, on the position of a one-dimensional particle. Here the term "noise" (or "channel noise") denotes how particular responses of populations of ion channels differ from the mean behavior. Then, following Nelson [30], [31], the membrane voltage obeys a Schrödinger type equation with some membrane voltage diffusion constant and a potential function. The probability of finding the membrane voltage at a given value at a given time is given through the wave function, which follows from (3) to be as

(4)

Then, Ehrenfest's procedure results in the following first moment equations:

(5a)
(5b)
(5c)

where the first moments are the expectation values of the collective position and the collective momentum operators:

(6)

and

(7)

In deriving equation (5), instead of the potential function itself, its Taylor expansion up to the quadratic order was used in order to avoid the appearance of the pair correlation functions.

The frictional parts in (1b) and (1c) are articulated by the intrinsic system. The voltage dynamics of the collective system, induced by the intrinsic system, is worked out from the dynamics of the entire (collective plus intrinsic) system through the use of reduced density operator techniques. The derivation starts with a separation of the total Hamiltonian, corresponding to the entire system, into intrinsic and collective Hamiltonians and a weak coupling,

(8)

where two of the symbols denote the set of intrinsic coordinates and the intrinsic momentum operator, respectively, and another is the coupling (or interaction) between the collective and the intrinsic systems. The superscript indicates that the intrinsic system in consideration is in association with one of the channel species; similarly, another intrinsic system, in association with the other channel species, is defined. The intrinsic system is assumed to be in a state of large but nearly random excitation with the fluctuations being distributed as Gaussians. Following the derivation of the equation for the reduced density operator as in [32], equations of motion for the collective variables are obtained. Then, the first moments dynamics reads as

(9a)
(9b)
(9c)
(9d)

where the correction terms emerge due to the renormalizations of the membrane capacitance and the voltage dependent potential, respectively. They are induced by the part of the intrinsic system associated with the first channel species. The renormalization potential is an additional conservative potential in the form of an inverted parabola:

(10)

The influence of the intrinsic system associated with the other channel species is treated in the same manner, with similar renormalization terms.

The correction coefficients in association with the first channel species are given by

(11)

and

(12)

where the quantities involved are obtained from the eigenvalues of the unperturbed intrinsic Hamiltonian:

(13)

Here the partition function of the intrinsic system at thermal equilibrium with a given temperature enters. The correction coefficients in association with the other channel species are given similarly.

III. WHERE SHOULD THE RENORMALIZATION POTENTIAL BE LOCATED?

Taking the interaction Hamiltonian as in (8) leads to a particular location for the extremum of the renormalization potential. It can be argued, however, that the extremum of the renormalization potential should coincide with the extremum of the collective potential. In other words, the extremum should be identified with the quasi-static or equilibrium state of the entire system. A detailed physical discussion of the issue can be found in [33]. Following this argument, we suggest taking the extremum as the stationary solution of the Rose-Hindmarsh model, since it is the underlying deterministic model. Then, solving (1) subject to the stationarity condition results in the equilibrium voltage obeying

(14)

Consequently, equation (8) should be modified as

(15)

with the result that the renormalization potential now becomes

(16)

This, in turn, implies a modification in (9b), in that one term in it is replaced accordingly. The first moments dynamics (9), subject to this modification, will hold for slowly varying input currents.

Taking into consideration the above argument also for the other channel species, and combining the effects of the two intrinsic


systems as in [1], results in the following equations of motion for the first moments:

(17a)
(17b)
(17c)
(17d)
(17e)

where

(18)
(19)
(20)
(21)
(22)
(23)

and

(24)

In (17), one of the quantities is the expectation value of the operator defined by

(25)

IV. SOME NUMERICAL RESULTS

In this section, we shall solve the governing equation (17) numerically using the fourth-order Runge-Kutta method and investigate the role played by the correction terms in the evolution of the membrane voltage expectation value. Our investigation is conducted using the commonly used values of the (deterministic) model parameters. For two of the parameters, various values have been used by researchers; we fix their values throughout the study.
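The closed forms of the correction terms in (17) are not recoverable from this copy, but the integration scheme itself is standard. A minimal sketch of the fourth-order Runge-Kutta loop, applied to the deterministic Rose-Hindmarsh baseline of (1) and using commonly quoted parameter values (assumptions for illustration; the values used in the paper did not survive reproduction):

```python
import numpy as np

# Commonly quoted Rose-Hindmarsh parameters (assumed, for illustration only)
a, b, c, d = 1.0, 3.0, 1.0, 5.0
r, s, v0 = 0.006, 4.0, -1.6

def rhs(state, I):
    """Right-hand side of the deterministic Rose-Hindmarsh equations."""
    v, u, w = state
    return np.array([u - a * v**3 + b * v**2 - w + I,
                     c - d * v**2 - u,
                     r * (s * (v - v0) - w)])

def rk4_step(state, I, h):
    """One classic fourth-order Runge-Kutta step of size h."""
    k1 = rhs(state, I)
    k2 = rhs(state + 0.5 * h * k1, I)
    k3 = rhs(state + 0.5 * h * k2, I)
    k4 = rhs(state + h * k3, I)
    return state + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

state, h, I = np.array([-1.6, 0.0, 0.0]), 0.01, 2.0
trace = []
for _ in range(100000):        # integrate over the desired time range
    state = rk4_step(state, I, h)
    trace.append(state[0])     # record the membrane voltage v
```

Integrating (17) proceeds the same way, with the state vector enlarged to include the first moments and the right-hand side augmented by the correction terms.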


We need first to solve equation (14), since its solution takes place in the governing equation. For the stated values of the model parameters, it can be shown that the equilibrium voltage attains a uniquely defined value, provided that a certain condition on the parameters holds, as given by

(26)

where

(27a)
(27b)

Figure 1 shows the membrane voltage time course of the deterministic Rose-Hindmarsh model for the stated parameter values, over a fixed time range, for various (constant) input current values. Figure 2 shows the time course of the membrane voltage expectation value in our noisy neuron model, using fixed correction coefficients; the model parameter values, the time range, and the input current values are the same as in Figure 1. For current values below the smallest plotted value, both the deterministic and the noisy neurons are quiescent. The initial portion of the time course is not included in the plots, in order to skip the transient activity. Since the exact time course depends on the initial values of the dynamical variables, the same set of initial values was used in obtaining both figures.

It can be seen from Figure 2 that even though the time course of the membrane voltage is affected by the channel noise, it still exhibits a regular behavior, provided that the correction coefficients are not unrealistically large. Thus, channel noise does not destroy or modify the basic qualitative properties of the neuron's repertoire, but only alters its quantitative behavior. However, note here that Figure 2 includes plots of the expectation value rather than plots of the voltage itself; that is, these plots show the ensemble average of a Brownian type motion and, therefore, the neuron in a specific sequence of membrane voltage measurements will actually exhibit a zigzag behavior.

[Fig. 1. Membrane voltage time series of the deterministic Rose-Hindmarsh model for the stated parameter values and time range, for various (constant) current values (panels A-H).]

[Fig. 2. Time series of the membrane voltage expectation value in our noisy neuron model for the stated correction coefficients and parameter values, over the same time range and current values as Fig. 1 (panels A-H).]

The dynamical states of the Rose-Hindmarsh model are quiescence, bursting (rhythmic with a high degree of periodicity, or chaotic), and tonic firing. We observe from Figure 2 that the same repertoire of dynamical states endures also in the presence of channel noise. However, the quiescent state in the noisy case should be considered to include also the subthreshold oscillations due to the zigzag behavior of a specific sequence of membrane voltage data. Any realistic model of the neuron with channel noise should display the above repertoire of dynamical states, since this is exactly what a typical biological neuron (in which internal noise is

inherent) does.

The plots in Figure 2 for our noisy neuron show that the membrane voltage expectation exhibits a highly periodic time course; this can be seen even more clearly by extending the plots to longer time intervals. However, it can be shown for some specific values of the input current that the number of spikes in successive bursts has no apparent regularity; it appears to exhibit chaotic behavior. This is actually the type of behavior the deterministic Rose-Hindmarsh model exhibits. The values of the input current for which the chaotic behavior takes place in the noisy neuron are, however, different from the values for the Rose-Hindmarsh model. Thus, just like the underlying deterministic model, our noisy neuron model (in terms of the voltage expectation) is capable of carrying out a highly periodic activity as well as a chaotic type behavior. It can be argued, however, that some approximations were accommodated in developing our formalism and, therefore, in reality, the degree of periodicity may not be that high.

It is seen from the figures that our noisy neuron model displays bursting in a wider range of input currents in comparison with its deterministic counterpart. Figure 1 shows the range of current values for which the Rose-Hindmarsh model is in the bursting state. In our neuron, on the other hand, the domain for bursting is wider when the stated correction coefficient values are used; here, floor-ceiling style brackets are used to indicate an approximate range of values. The widening of the current range for bursting with increasing noise, i.e., with larger values of the correction coefficients, results in two types of noise-induced transitions: a transition from quiescence to bursting, and a transition from tonic firing to bursting. There is a region of input current values where the deterministic neuron is in the quiescent state but the neuron with channel noise is in the bursting state, and a distinct region of higher input current values where the deterministic neuron is in the state of tonic firing but the noisy neuron is still in the bursting state. These two regions are in coexistence; that is, both regions turn up simultaneously for a single set of values of the model parameters and the correction coefficients. Two of the plots in Figure 1 fall into the lower region, while two others, in the same figure, fall into the higher region. It seems that the lower region owes its existence to one particular function in the model: if this function is chosen to be identically zero, even though the higher region is still intact, the lower region is either very narrow or does not exist at all [1].

It is remarkable that the two types of noise-induced transitions which coexist in our model were separately suggested or observed, in some form, in various studies. In a study focused on stellate cells of the medial entorhinal cortex, using a simple two-state Markov process model of the persistent channels in a system of identical, independent ion channels [12], it was found that increasing the input current value results in a shift from the stable fixed point to subthreshold oscillations and then to rhythmic firing, for the stochastic model, while, for the deterministic model, the result is a shift from the stable fixed point to rhythmic firing [16]. More recently, by taking into account the stochastic nature of calcium and fast potassium channel currents in the Plant model of a bursting neuron, it was reported that noise-induced coherent bursting emerges even in the case when the deterministic neuron is silent [17]. The occurrence of the noise-induced transition from quiescence to bursting, or the presence of the lower region, in our neuron model is compatible with these findings. Our model, however, tends to exhibit a time course with a much higher degree of periodicity in comparison with the results of the currently available other models cited above.

It is also remarkable that the occurrence of the noise-induced transition from tonic firing to bursting, or the presence of the higher region, in our neuron model is also supported by recently reported findings of different researchers, obtained by means of introducing noise into the gating variables [19], by means of increasing the temperature in the Huber-Braun cold receptor model [20], and through the stimulation of the electroreceptors in paddlefish with Gaussian noise [34]. Inspired by the stochastic automaton model of ion channels [11], in [19] the Hodgkin-Huxley model of the neuron was considered and a stochastic gating of the channels was simulated by adding, to the equation for the variable representing the proportion of the open gates in the Hodgkin-Huxley equations, a Gaussian noise term with mean zero and a voltage dependent variance; it was then suggested that sufficient noise, associated with the channels, converts tonic firing into bursting, for which the minimal essential noise amplitude increases with the applied current. In [20], two interacting minimal sets of ionic conductances, each including simplified depolarizing and repolarizing Hodgkin-Huxley type currents with sigmoidal steady state activation kinetics, together with a temperature scaling of the ionic currents, were considered, and it was found that a transition from tonic firing to bursting activity with increasing temperature always occurs. In [34], stimulation of electroreceptors in paddlefish with Gaussian noise was found to change the tonic firing pattern of the electroreceptors to a bursting mode.

In conclusion, when compared with the Rose-Hindmarsh model, our model of the noisy neuron displays bursting in a wider range of input current values, caused by the existence of the correction terms; and the larger the values of the correction coefficients are, the wider the range for bursting is. Consequently, there exists a region of input current values in which the neuron with noise is in the bursting state while the deterministic neuron is in the quiescent state. Moreover, this region coexists with another region in which the noisy neuron is bursting while the deterministic neuron is in the state of tonic firing. It might also be worth mentioning

that, in our model, noise tends to cause a faster spiking in bursts with a high degree of periodicity.

REFERENCES

[1] M. Güler, Modeling the effects of channel noise in neurons: A study based on dissipative stochastic mechanics, Fluctuation and Noise Letters, 6 (2006), L147-L159.
[2] J.A. White, J.T. Rubinstein and A.R. Kay, Channel noise in neurons, Trends in Neurosciences, 23(3) (2000), 131-137.
[3] B. Hille, Ionic Channels of Excitable Membranes, 3rd edition, Sunderland, MA: Sinauer Associates, (2001).
[4] P. Graf, A. Nitzan, M.G. Kurnikova and R.D. Coalson, A dynamics lattice Monte Carlo model of ion transport in inhomogeneous dielectric environments: method and implementation, Journal of Physical Chemistry B, 104 (2000), 12324-12338.
[5] G. Moy, B. Corry, S. Kuyucak and S.H. Chung, Tests of continuum theories as models of ion channels: I. Poisson-Boltzmann theory versus Brownian dynamics, Biophysical Journal, 78 (2000), 2349-2363.
[6] K.E. Cooper, E. Jakobsson and P. Wolynes, The theory of ion transport through membrane channels, Progress in Biophysics and Molecular Biology, 46 (1985), 51-96.
[7] S. Kuyucak and T. Bastug, Physics of ion channels, Journal of Biological Physics, 29 (2003), 429-446.
[8] B. Roux and M. Karplus, Molecular dynamics simulations of the Gramicidin channel, Annual Review of Biophysics and Biomolecular Structure, 23 (1994), 731-761.
[9] D.P. Tieleman, P.C. Biggin, G.R. Smith and M.S.P. Sansom, Simulation approaches to ion channel structure-function relationships, Quarterly Reviews of Biophysics, 34 (2001), 473-561.
[10] H. Haken, Noise and correlated transport in ion channels, Fluctuation and Noise Letters, 4 (2004), L171-L178.
[11] R.F. Fox and Y. Lu, Emergent collective behavior in large numbers of globally coupled independently stochastic ion channels, Physical Review E, 49 (1994), 3421-3431.
[12] C.C. Chow and J.A. White, Spontaneous action potentials due to channel fluctuations, Biophysical Journal, 71 (1996), 3013-3021.
[13] L.J. DeFelice and A. Isaac, Chaotic states in a random world: Relationship between the nonlinear differential equations of excitability and the stochastic properties of ion channels, Journal of Statistical Physics, 70 (1992), 339-354.
[14] A.F. Strassberg and L.J. DeFelice, Limitations of the Hodgkin-Huxley formalism: effects of single channel kinetics on transmembrane voltage dynamics, Neural Computation, 5 (1993), 843-855.
[15] J. Rubinstein, Threshold fluctuations in an N sodium channel model of the node of Ranvier, Biophysical Journal, 68 (1995), 779-785.
[16] J.A. White, R. Klink, A. Alonso and A.R. Kay, Noise from voltage-gated ion channels may influence neuronal dynamics in the entorhinal cortex, Journal of Neurophysiology, 80 (1998), 262-269.
[17] S.L. Ginzburg and M.A. Pustovoit, Bursting dynamics of a model neuron induced by intrinsic channel noise, Fluctuation and Noise Letters, 3 (2003), L265-L274.
[18] C. Koch, Biophysics of Computation: Information Processing in Single Neurons, Oxford: Oxford University Press, (1999).
[19] P.F. Rowat and R.C. Elson, State-dependent effects of Na channel noise on neuronal burst generation, Journal of Computational Neuroscience, 16 (2004), 87-112.
[20] O.V. Sosnovtseva, S.D. Postnova, E. Mosekilde and H.A. Braun, Inter-pattern transitions in a noisy bursting cell, Fluctuation and Noise Letters, 4 (2004), L521-L533.
[21] H.A. Braun, M.T. Huber, N. Anthes, K. Voight, A. Neiman, X. Pei and F. Moss, Noise-induced impulse pattern modifications at different dynamical period-one situations in a computer model of temperature encoding, BioSystems, 62 (2001), 99-112.
[22] U. Feudel, A. Neiman, X. Pei, W. Wojtenek, H.A. Braun, M.T. Huber and F. Moss, Homoclinic bifurcations in a Hodgkin-Huxley model of thermally sensitive neurons, Chaos, 10 (2000), 231-239.
[23] A. Agmon and B. Connors, Repetitive burst-firing neurons in the deep layers of mouse somatosensory cortex, Neuroscience Letters, 99 (1989), 137-141.
[24] R.K. Snider, J.F. Kabara, B.R. Roig and A.B. Bonds, Burst firing and modulation of functional connectivity in cat striate cortex, Journal of Neurophysiology, 80 (1998), 730-744.
[25] G.S. Cymbalyuk, Q. Gaudry, M.A. Masino and R.L. Calabrese, Bursting in leech heart interneurons: Cell-autonomous and network-based mechanisms, Journal of Neuroscience, 22 (2002), 10580-10592.
[26] C. van Vreeswijk and D. Hansel, Patterns of synchrony in neural networks with spike adaptation, Neural Computation, 13 (2001), 959-992.
[27] M.J. Chacron, A. Longtin and L. Maler, To burst or not to burst?, Journal of Computational Neuroscience, 17 (2004), 127-136.
[28] M. Güler, A model with an intrinsic property of learning higher order correlations, Neural Networks, 14 (2001), 495-504.
[29] J.L. Hindmarsh and R.M. Rose, A model of neuronal bursting using three coupled first order differential equations, Proceedings of the Royal Society of London, B221 (1984), 87-102.
[30] E. Nelson, Derivation of the Schrödinger equation from Newtonian mechanics, Physical Review, 150 (1966), 1079-1085.
[31] E. Nelson, Dynamical Theories of Brownian Motion, Princeton, NJ: Princeton University Press, (1967).
[32] H. Hofmann and P.J. Siemens, On the dynamics of statistical fluctuations in heavy ion collisions, Nuclear Physics, A275 (1977), 464-486.
[33] H. Hofmann, A quantal transport theory for nuclear collective motion: the merits of a locally harmonic approximation, Physics Reports, 284 (1997), 137-380.
[34] A.B. Neiman and D.F. Russell, Stochastic dynamics of electroreceptors in paddlefish, Fluctuation and Noise Letters, 4 (2004), L139-L149.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 671--676
Copyright@2007 Watam Press

Fuzzy Systems on Orthogonal Bases


Musa Alcı
musa.alci@ege.edu.tr
Ege University, Electrical and Electronics Engineering Dept., 34100 Bornova, Izmir, Turkey
Abstract— Two fuzzy system models based on two orthogonal bases are presented in this work. The nonlinear mapping property of fuzzy systems is combined with the powerful function space approximation tools of orthogonal bases. In the first model, the Laguerre base is used for input orthogonalization, whereas in the other model trigonometric bases are used for output orthogonalization. The performances of the models are demonstrated by the sum squared error (SSE) and the minimum description length (MDL) criteria, tested on a nonlinear system identification problem, and the performances are compared with the classic Sugeno model.

Index terms-- orthogonal bases, Laguerre bases, trigonometric bases, nonlinear dynamical system, fuzzy system identification.

1 Introduction

A fuzzy system is a representation of a rule base. A rule base consists of a set of rules. Rules are constructed from linguistic propositions, which have linguistic variables and linguistic values. Since a membership function is a basic and essential element of the rule base and the fuzzy system, linguistic values are represented with membership functions. Their values vary continuously on the interval [0, 1]. A fuzzy system is not linear with respect to the membership functions. For that reason a fuzzy system cannot be represented as an orthogonal approximation using membership functions.

A fuzzy rule base consists of M rules [1][2] and is represented by

R^j: IF x₁ is A₁^j and x₂ is A₂^j and ... and x_n is A_n^j, THEN y^j is B^j    (1)

where j = 1, 2, ..., M, the x_i (i = 1, 2, ..., n) are inputs to the fuzzy system, y is the output variable, and A_i^j, B^j are linguistic values which are represented by membership functions. A fuzzy system can be represented as a non-orthogonal expansion using normalized input membership functions [3],

f(x) = ∑_{j=1}^{M} p_j(x)·θ_j    (2)

where the θ_j ∈ R are constants or output weights. The normalized input membership functions p_j(x) are defined as follows:

p_j(x) = ∏_{i=1}^{n} μ_{A_i^j}(x_i) / ∑_{j=1}^{M} ∏_{i=1}^{n} μ_{A_i^j}(x_i)    (3)

where j = 1, 2, ..., M, and the μ_{A_i^j}(x_i) are Gaussian membership functions. The p_j(x) are also known as fuzzy basis functions. Eq. (2) is the analytic representation of the fuzzy rule base in Eq. (1). The fuzzy model in Eq. (2) and Eq. (3) is formed using a singleton fuzzifier, product inference, a centroid defuzzifier and Gaussian membership functions. If we use normalized membership functions we do not have enough freedom in the parametric space: since the normalized input membership function parameters are set to constant values, only the output weights are left as free parameters to adjust. The output weights are adjusted using the orthogonal least squares (OLS) learning algorithm in [4][5].

On the other hand, the well-known method studied in the literature concerning orthogonality in fuzzy systems is the Gram-Schmidt orthogonalization process [6-8][3]. Other orthogonal decomposition methods, namely the Eigenvalue Decomposition (ED) method, the SVD-QR with column pivoting method, the Total Least-Squares (TLS) method, and the direct SVD method [9], have also been examined. In these orthogonal transformation methods the aim is to select important fuzzy rules from a given rule base. These methods also have a restriction: they must be set up completely, i.e., the rule number and consequently all system parameters must be selected or adjusted before applying the orthogonalization methods mentioned above.

2 Method

We propose two methods for the investigation of fuzzy systems using orthogonal bases. They include input and output orthogonalization studies.
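Before turning to the two methods, a minimal numerical sketch of the fuzzy basis functions of Eq. (3) and the expansion of Eq. (2) may help fix ideas; this is an illustration in Python with hypothetical helper names, not part of the paper's implementation:

```python
import numpy as np

def fuzzy_basis(x, m, s):
    """Fuzzy basis functions p_j(x) of Eq. (3) for one input vector x.
    m, s: (M, n) arrays of Gaussian centers m_i^j and widths sigma_i^j."""
    mu = np.exp(-((x - m) / s) ** 2)   # Gaussian memberships mu_{A_i^j}(x_i)
    w = mu.prod(axis=1)                # product inference over the n inputs
    return w / w.sum()                 # normalization over the M rules

def fuzzy_model(x, m, s, theta):
    """Expansion f(x) = sum_j p_j(x) * theta_j of Eq. (2)."""
    return fuzzy_basis(x, m, s) @ theta
```

The normalization in `fuzzy_basis` is exactly what makes the expansion non-orthogonal in the membership functions, which motivates the orthogonalization schemes that follow.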


A. Fuzzy system input orthogonalization

The NARMAX (Nonlinear AutoRegressive Moving Average model with eXogenous inputs) is used extensively [10-13] in linear and nonlinear system identification. This model requires previous inputs and outputs in addition to the actual inputs. Using past terms causes propagation of the estimation error and may cause instability of the models [14-15]. To overcome this difficulty, filtered inputs can be used. Laguerre based identification, on the other hand, does not need previous outputs; it only needs past inputs in time. Another good property is that the Laguerre bases decompose the inputs into orthogonal signals. This speeds up fuzzy system learning on system identification problems. Laguerre bases are widely used in system identification and control applications [16-18]. They are also used for input orthogonalization in this study. They can be considered as filters.

A Laguerre based fuzzy system model can be formed by placing Laguerre based filters at the input of the classic fuzzy system model. The block diagram of the Laguerre based fuzzy model is shown in Fig. 1. The output of the system can be expressed as y(k) = f(l₁(k), ..., l_r(k)), where l_i(k) is the ith filter output in the time domain and f(·) is a static fuzzy model. The overall model shown in Fig. 1 is a dynamic model.

[Figure 1. Laguerre based fuzzy system.]

Here g_i(k) = l_i(k) ⊗ x(k) is the output of the ith Laguerre filter, and ⊗ denotes time domain convolution. The Z-transforms of the Laguerre basis functions are given as [19-20]

L_i(z) = (√(1 − a²)/(z − a)) · ((1 − az)/(z − a))^(i−1)    (4)

where a, |a| < 1, is the pole of the Laguerre bases and r is the number of filters used, i = 1, 2, ..., r. When the pole parameter a is 0, the Laguerre bases or filters become regular delay operators [21].
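Eq. (4) can be realized as a first-order low-pass section followed by a cascade of identical all-pass sections. A sketch in Python, assuming scipy is available (the function name `laguerre_bank` is hypothetical):

```python
import numpy as np
from scipy.signal import lfilter

def laguerre_bank(x, a=0.6, r=3):
    """Outputs l_1..l_r of the discrete Laguerre filters of Eq. (4),
    pole |a| < 1, driven by the input sequence x."""
    gain = np.sqrt(1.0 - a * a)
    # L_1(z) = sqrt(1-a^2)/(z-a) = sqrt(1-a^2) z^-1 / (1 - a z^-1)
    l = lfilter([0.0, gain], [1.0, -a], x)
    outs = [l]
    for _ in range(r - 1):
        # all-pass section (1-az)/(z-a) = (z^-1 - a)/(1 - a z^-1)
        l = lfilter([-a, 1.0], [1.0, -a], l)
        outs.append(l)
    return np.stack(outs)
```

With a = 0, the low-pass and all-pass sections reduce to unit delays, recovering a plain tapped delay line, consistent with the remark above and [21].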
B. Fuzzy system output orthogonalization

In TSK (Takagi-Sugeno-Kang) fuzzy system modeling, the output function is a nonlinear function of the inputs. This is an important motivation for selecting output functions as orthogonal bases. The well-known trigonometric orthogonal bases φ(x),

{φ_n(x)} = {cos(nx), sin(nx)},  n = 0, 1, 2, ...    (5)

are selected in this study. We want to investigate their contribution to fuzzy system modeling. Orthogonalization is the process of splitting a problem or system into its distinct components. Orthogonal bases span the entire space; they are the building stones of the functional space. We can represent any arbitrary function by a summation of weighted orthogonal bases to within arbitrary error. Since the weights are unique in an orthogonal expansion, the minimum in the error space is the global minimum [22]. This explains why we are interested in orthogonalization of fuzzy systems.

If B^j in the rule base expression Eq. (1) is replaced with a function g_j(x), the general TSK model is formed. If the g_j(x) are selected as first order polynomial functions of the inputs, the resulting fuzzy system is the known first-order Sugeno fuzzy model, and if the g_j(x) are set to constants, then the fuzzy system is called the zero-order Sugeno fuzzy model. The aim here is to use the orthogonal bases in Eq. (5), or their proper combinations, instead of g_j(x). The proposed g_j(x) are

g_j(x) = a(j)·cos((j − 1)·x) + b(j)·sin((j − 1)·x),  j = 1, 2, ..., M.    (6)

In this case, the fuzzy rule base can be expressed as follows:

R^j: IF x₁ is A₁^j and x₂ is A₂^j and ... and x_n is A_n^j THEN y^j = a(j)·cos((j − 1)·x) + b(j)·sin((j − 1)·x).    (7)

The g_j(x) vary through the rule index. Thus, the proposed fuzzy model output forms a Fourier-like expansion, named the trigonometric based fuzzy (TBF) system. For j = 1, y^j will be a constant a(1); for j = 2, y^j will be a(2)·cos(x) + b(2)·sin(x), and so on. The a(j), b(j) are consequent free parameters which will be adjusted. Eq. (6) maps onto the compact space [−1, +1], whereas in general the output space may differ; it is expected that adjustment of the a(j) and b(j) parameters compensates for this constraint.

The Gaussian membership function is μ_{A_i^j}(x_i) = exp(−(x_i − m_i^j)²/(σ_i^j)²), where m_i^j and σ_i^j are free parameters belonging to the premise part of the fuzzy system.
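For illustration, the TBF consequents of Eq. (6), weighted by the basis activations of Eq. (3), give the model output as a Fourier-like expansion; a minimal sketch (hypothetical names, scalar input x in the consequents as in Eq. (6), with `p_x` computed, e.g., by the `fuzzy_basis` sketch given earlier):

```python
import numpy as np

def tbf_output(x, p_x, a_coef, b_coef):
    """TBF output y = sum_j p_j(x) * [a(j) cos((j-1)x) + b(j) sin((j-1)x)],
    Eqs. (6)-(7); p_x holds the M normalized basis activations p_j(x)."""
    k = np.arange(len(p_x))                              # k = j - 1
    g = a_coef * np.cos(k * x) + b_coef * np.sin(k * x)  # rule consequents g_j(x)
    return p_x @ g
```

Note that for j = 1 the consequent reduces to the constant a(1), since cos(0) = 1 and sin(0) = 0, matching the description above.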

3 Simulations

A. Generation of test data

In this section the proposed model performances are discussed. We have tested the models on synthetic data obtained from the linear-nonlinear Wiener cascade test system y(n) = y₁(n) + y₁(n)² − x(n)·y₁(n), where y₁(n) = x(n) − 0.5·y₁(n − 1). In practice y₁(n) cannot be measured; the input-output data [x(n); y(n)] is sufficient for identification. The overall model is shown in Fig. 2. The output depends nonlinearly on its past values and past inputs.

The training input is x = 0.4·sin(0.9u) + cos(2.3u)·sin(1.2u) − cos(2u), where u varies on [0, 2π]; it is shown in Fig. 3.

[Figure 2. Test data generation.]
[Figure 3. Input sequence used in training.]

The models are tested with a different test input, x_t = 1.5·cos(1.3u)·sin(2u) − cos(0.6u), shown in Fig. 4.

[Figure 4. Test input.]
model which will be identified, and N is the
number of training data. The model output
depends on the parameter vector, i.e., p
x 1 y1 - y
2 embedded in y ( p, k ) . For simplicity y ( p, k )
(.)
1−0.5 z−1 and yd ( k ) are referenced y and yd respectively.
The Levenberg-Marquardth (LM) method is
used in optimizations [23]. At q. iteration, the
Figure 2. Test data generation. parameter variation Δpq is computed using the
following equation,
    (J_qᵀ J_q + μ_q I) Δp_q = −J_qᵀ (y − yd),   (9)

where μ_q ≥ 0 is a scalar which makes the Hessian approximation matrix (J_qᵀ J_q) positive definite, I is the unit matrix whose order equals the total number of parameters, and J is the Jacobian matrix of the fuzzy system model output sensitivities with respect to the free parameters. It is

    J = [ ∂y/∂m_i^j   ∂y/∂σ_i^j   ∂y/∂θ_j ]   (10)

for the Sugeno and LBF models, and

    J = [ ∂y/∂m_i^j   ∂y/∂σ_i^j   ∂y/∂a(j)   ∂y/∂b(j) ]   (11)

for the TBF model. The parameter number is P = M(2n + 1) for the Sugeno and LBF models, and P = M(2n + 2) for the TBF model.

In Eq. (10) and Eq. (11), i = 1, …, n and j = 1, 2, …, M, where n is the number of inputs to the fuzzy system. We selected n = 2 for the Sugeno and TBF models, and n = 3 for the LBF model. The parameter vector is p = [m_i^j  σ_i^j  θ_j] for the Sugeno fuzzy model and the LBF model, and p = [m_i^j  σ_i^j  a(j)  b(j)] for the TBF model. In other words, the Sugeno fuzzy model and the LBF model have 3 parameters per rule; the TBF model has 4 parameters per rule. In the computer simulations, the input membership function centers and the output parameters are initialized uniformly between the minimum and maximum of the input and output [23]. The scaling parameters (the standard deviations of the membership functions) are initialized to 1. The rule number is selected as M = 3. Another free parameter is the pole a. If the memory of the underlying system is known in advance, a suitable value for the pole parameter can be selected easily. We selected the pole parameter of the Laguerre bases as a = 0.6 in our simulation study.

Fig. 5 shows the training of the LBF model for 100 iterations with the training input of Fig. 3. The sum squared error (SSE) of the LBF model during training is shown in Fig. 6. The LBF model has a fast learning capability.

Figure 5. Training performance of LBF.

Figure 6. SSE for LBF model.

The proposed models' performances and the Sugeno fuzzy model's performance with respect to the test input are shown in Fig. 7, Fig. 8 and Fig. 9, respectively.

Figure 7. Test response of the trained LBF model.

The Sugeno fuzzy system model and the trigonometric based fuzzy system model training performances are almost the same, but the Sugeno model's test performance is not adequate.
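For illustration only (this is a sketch, not the authors' code), the snippet below generates the Wiener-cascade identification data of Fig. 2 and performs one Levenberg-Marquardt update of Eq. (9). The analytic sensitivities of Eqs. (10)-(11) are replaced here by a finite-difference Jacobian, and the initialization y1(0) = x(0) is an assumption.

```python
import numpy as np

# synthetic identification data (the Wiener cascade of Fig. 2)
u = np.linspace(0.0, 2*np.pi, 300)
x = 0.4*np.sin(0.9*u) + np.cos(2.3*u)*np.sin(1.2*u) - np.cos(2*u)
y1 = np.zeros_like(x)
y1[0] = x[0]                          # assumed initial condition
for n in range(1, len(x)):
    y1[n] = x[n] - 0.5*y1[n-1]
yd = y1 + y1**2 - x*y1                # desired output y(n)

def lm_step(p, model, mu=1e-2, eps=1e-6):
    """One LM update, Eq. (9): (J^T J + mu I) dp = -J^T (y - yd).
    model(p) returns the model output over the training set; J is
    approximated by central finite differences."""
    r = model(p) - yd                 # residual y - yd
    J = np.empty((len(r), len(p)))
    for c in range(len(p)):
        dp = np.zeros_like(p)
        dp[c] = eps
        J[:, c] = (model(p + dp) - model(p - dp)) / (2*eps)
    H = J.T @ J + mu*np.eye(len(p))   # positive definite for mu > 0
    return p + np.linalg.solve(H, -J.T @ r)
```

In practice μ_q is adapted per iteration (decreased when E(p) drops, increased otherwise), which is the usual LM heuristic.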
Figure 8. Test response of the trained trigonometric based fuzzy system model.

Figure 9. Test response of the trained Sugeno fuzzy system model.

4 Results

Simulation results are summarized in Table 1. In all cases the models are set with the same initial parameters and the same rule number, M = 3, are trained for 100 iterations with the same inputs, and are tested on the same test input. The training and test performances are listed in Table 1.

The training performances are given in terms of the sum squared error (SSE) and the minimum description length (MDL) [24] criterion in Table 1. The MDL consists of two terms acting in opposite senses,

    MDL = N log(SSE) + P log(N),   (12)

where N is the length of the data sequence. Because the MDL criterion also takes the number of parameters into consideration, it is more meaningful than the SSE alone.

Table 1. Comparison of the Sugeno fuzzy model, the proposed LBF model, and the proposed TBF model. P is the number of model parameters.

    Fuzzy Model                    MDL        Training (SSE)   Test (SSE)
    Sugeno (P=15)                  98.65      0.522            97.546
    Trigonometric Based (P=18)     107.93     0.509            4.171
    Laguerre Based (P=21)          -1381.60   0.003            0.180

5 Conclusions

Fuzzy system models on orthogonal bases are examined in this study. Input and output orthogonal approximations are introduced: the Laguerre based and the trigonometric based fuzzy system models. Their contributions are discussed on a nonlinear system identification problem. Both models are compared with the classic Sugeno fuzzy model using the SSE and MDL criteria. It is shown that their generalization capability is superior to that of the Sugeno fuzzy model. The LBF model has the best training and test performances.

References

[1] Jang, J.-S. R., "ANFIS: Adaptive-network based fuzzy inference system", IEEE Trans. Fuzzy System, 23, 665-685, May/June 1993.
[2] Jang, J.-S. R., Sun, C.-T., Mizutani, E., Neuro Fuzzy and Soft Computing, USA: Prentice Hall, 1997, 4, 73-84.
[3] Wang, L.-X., Mendel, J. M., "Fuzzy basis functions, universal approximation, and orthogonal least squares learning", IEEE Trans. Neural Networks, 3:807-814, 1992.
[4] Lotfi, A., Howarth, M. and Hull, J. B., "Orthogonal rule-based systems: Selection of optimum rules", Neural Comput. & Applic., 9, 4-11, 2000.
[5] Hong, X., Harris, C. J., "A neurofuzzy network knowledge extraction and extended Gram-Schmidt algorithm for model subspace decomposition", IEEE Trans. on Fuzzy Systems, 11, 528-540, August 2001.
[6] Juang, C.-F., Lin, C.-T., "An on-line self-constructing neural fuzzy inference network and its applications", IEEE Transactions on Fuzzy Systems, vol. 6, no. 1, February 1998.
[7] Setnes, M., Hellendoorn, H., "Orthogonal transforms for ordering and reduction of fuzzy rules", Fuzzy Systems, 2000: The Ninth IEEE International Conference on, Volume 2, pp. 700-705.
[8] Roubos, H., Setnes, M., "Compact fuzzy models through complexity reduction and evolutionary optimization", Fuzzy Systems, 2000: The Ninth IEEE International Conference on, Volume 2, pp. 762-767.
[9] Yen, J. and Wang, L., "Simplifying fuzzy rule-based models using orthogonal transformation methods", IEEE Trans. on Systems, Man, and Cybernetics-Part B: Cybernetics, 29(1), 13-24, February 1999.
[10] Chen, S. and Billings, S. A., "Representation of non-linear systems: the NARMAX model", International Journal of Control, 49(3), pp. 1012-1032, 1989.
[11] Chen, S., Billings, S., "Practical identification of NARMAX models using radial basis functions", International Journal of Control, 52:66, pp. 1327-1350, 1990.
[12] Chiras, N., Evans, C., Rees, D., "Nonlinear gas turbine modeling using NARMAX structures", IEEE Transactions on Instrumentation and Measurement, Vol. 50, Issue 4, pp. 893-898, Aug 2001.
[13] Seborg, D. E., "Experience with nonlinear control and identification strategies", International Conference on Control '94, pp. 879-886, Coventry, UK, 21-24 March 1994.
[14] Yen, G., Lee, S., "Multiple model approach by orthonormal bases for controller design", Proceedings of the American Control Conference, Chicago, Illinois, June 2000, pp. 2321-2329.
[15] Campello, R. J. G. B., Amaral, W. C., "Takagi-Sugeno fuzzy models within orthonormal basis function framework and their application to process control", Proceedings of the 2002 IEEE International Conference on Fuzzy Systems, Volume 2, 1399-1404, 2002.
[16] Nelles, O., Isermann, R., "Basis function networks for interpolation of local linear models", Decision and Control, 1996: Proceedings of the 35th IEEE Conference on, 11-13 Dec 1996, Volume 1, pp. 470-475.
[17] Campello, R. J. G. B., Meleiro, L. A. C., Amaral, W. C., "Control of a bioprocess using orthonormal basis function fuzzy models", Proceedings of the 2004 IEEE International Conference on Fuzzy Systems, 25-29 July 2004.
[18] El Adel, E. M., Ouladsine, M., Radouane, L., "Predictive steering control using Laguerre series representation", Control Applications, 2003: Proceedings of the 2003 IEEE Conference on, Volume 1, 23-25 June 2003, pp. 439-445.
[19] Asyali, M. H., Juusola, M., "Use of Meixner functions in estimation of Volterra kernels of nonlinear systems with delay", IEEE Transactions on Biomedical Engineering, 52(2), 229-237, Feb 2005.
[20] Oliver, P. D., "Online system identification using Laguerre series", IEE Proc.-Control Theory Appl., Vol. 141, No. 4, July 1994.
[21] Oliveira, G. H. C., Campello, R. J. G. B., Amaral, W. C., "Fuzzy models within orthonormal basis function framework", IEEE International Fuzzy Systems Conference Proceedings, August 22-25, 1999, Seoul, Korea.
[22] Chen, C.-S., Tseng, C.-S., "Performance comparison between the training method and the numerical method of the orthogonal neural network in function approximation", International Journal of Intelligent Systems, vol. 19, 1257-1275, 2004.
[23] Alci, M., Gradient Based Fuzzy Logic Systems Depending on Training, PhD Thesis, SAU, January 1999.
[24] Barron, A., Rissanen, J., Yu, B., "The minimum description length principle in coding and modeling", IEEE Trans. Information Theory, Vol. 44, No. 6, pp. 2743-2760, 1998.
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 677--681
Copyright@2007 Watam Press

Homomorphisms in a direct sum of full matrix algebras

Zhongyan Li
Department of Mathematics and Physics, North China Electric Power University, Beijing 102206, China

Minli Li*
Department of Mathematics, Capital Normal University, Beijing 100037, China
(* Corresponding author. E-mail address: liml@mail.cnu.edu.cn (Minli Li).)

AMS subject classifications: 46L05

Abstract: In this paper, we give a detailed discussion of the homomorphisms in a direct sum of full matrix algebras over the real, complex and quaternion fields, which are more complicated than those over the complex field alone.

1 Introduction

By definition, any complex (real) AF C*-algebra A may be regarded as the inductive limit of an increasing sequence A_n of finite dimensional complex (real) C*-algebras. In fact, let Φ_n be the embedding map from A_n into A_{n+1} and Φ_{mn} = Φ_{m−1} ∘ … ∘ Φ_n (m ≥ n); then A is *-isomorphic to lim_n {A_n, Φ_{mn} | m ≥ n} = lim_n {A_n, Φ_n}. Conversely, let A_n be a sequence of finite dimensional C*-algebras, and for each n suppose that Φ_n is a *-isomorphism from A_n into A_{n+1}; then the inductive limit lim_n {A_n, Φ_n} is an AF C*-algebra. Therefore it is very important to investigate the *-homomorphisms between complex (real) finite dimensional C*-algebras. Bratteli has given a detailed discussion of the homomorphisms between finite dimensional complex C*-algebras ([1]). For the real case, there is a simple depiction ([2]). Here, we give a complete and detailed description of the homomorphisms between finite dimensional real C*-algebras with the new system of real definitions ([3]). The real case is more complicated than the complex one. Because a finite dimensional real C*-algebra A is isomorphic to a direct sum of full matrix algebras over R, C or H ([3]), we mainly discuss the *-homomorphisms between direct sums of full matrix algebras. Throughout, let F, G = R, C or H (as real C*-algebras).

2 Main results

As real C*-algebras, the canonical embeddings are given as follows:

(1) C → M2(R): λ + iμ ↦
    (  λ   μ )
    ( −μ   λ );

(2) H → M2(C): λ + μi + σj + τk ↦
    (  λ + iμ    σ + iτ )
    ( −σ + iτ    λ − iμ );

(3) H → M4(R): λ + μi + σj + τk ↦
    (  λ    μ    σ    τ )
    ( −μ    λ   −τ    σ )
    ( −σ    τ    λ   −μ )
    ( −τ   −σ    μ    λ ),   ∀λ, μ, σ, τ ∈ R.

In view of the above, the "standard" *-homomorphism is defined as follows:

Definition 2.1 Let k, l, n be positive integers. A unital real *-homomorphism ([3]) between two full matrix algebras (as real C*-algebras) is called "standard" if it is one of the following forms:

(1) α_{R,F}^k : Mn(R) → M_{nk}(F) = Mn(F) ⊗ Mk(R), x ↦ x ⊗ I_k, ∀x ∈ Mn(R);

(2) (i) α_{C,R}^k : Mn(C) = Mn(R) +̇ iMn(R) → M_{2nk}(R) = M_{2n}(R) ⊗ Mk(R),
    a + ib ↦ (  a   b ; −b   a ) ⊗ I_k,   ∀a, b ∈ Mn(R);

(ii) α_{C,C}^{k,l} : Mn(C) → M_{n(k+l)}(C), x ↦ diag(x, …, x, x̄, …, x̄), ∀x ∈ Mn(C), where the multiplicities of x and x̄ are k and l respectively;

(iii) α_{C,H}^k : Mn(C) → M_{nk}(H) = Mn(H) ⊗ Mk(R), x ↦ x ⊗ I_k, ∀x ∈ Mn(C).
(3) (i) α_{H,R}^k : Mn(H) → M_{4nk}(R) = M_{4n}(R) ⊗ Mk(R),
    a0 + a1 i + a2 j + a3 k ↦
    (  a0    a1    a2    a3 )
    ( −a1    a0   −a3    a2 )
    ( −a2    a3    a0   −a1 )
    ( −a3   −a2    a1    a0 ) ⊗ I_k,   ∀a0, a1, a2, a3 ∈ Mn(R);

(ii) α_{H,C}^k : Mn(H) → M_{2nk}(C) = M_{2n}(C) ⊗ Mk(R),
    a0 + a1 i + a2 j + a3 k ↦
    (  a0 + ia1    a2 + ia3 )
    ( −a2 + ia3    a0 − ia1 ) ⊗ I_k,   ∀a0, a1, a2, a3 ∈ Mn(R);

(iii) α_{H,H}^k : Mn(H) → M_{nk}(H) = Mn(H) ⊗ Mk(R), x ↦ x ⊗ I_k, ∀x ∈ Mn(H).

Remark: The geometrical meaning of the "standard" embeddings is as follows. Let α : Mn(F) → Mm(G) be a unital real *-homomorphism which is "standard". For simplicity, α must be injective.

(1) If m = nℓ, the geometrical meaning of ℓ is as follows. Let {e_ij | 1 ≤ i, j ≤ n} be the matrix units of Mn(F); then {α(e_ii) | 1 ≤ i ≤ n} are orthogonal and equivalent projections of Mm(G). Thus each α(e_ii) is the orthogonal direct sum of minimal projections in Mm(G) with rank ℓ (independent of i).

(2) If (F, G) = (R, G), (C, H) or (H, H), the embedding is determined by the ℓ of (1). The situation is the same as in the complex case (see Definition 2.1).

(3) (i) If (F, G) = (C, R), then ℓ = 2k (see Lemma 2.1(i)); it involves the canonical embedding C → M2(R).
(ii) If (F, G) = (H, C), then ℓ = 2k (see Lemma 2.1(i)); it involves the canonical embedding H → M2(C).
(iii) If (F, G) = (H, R), then ℓ = 4k (see Lemma 2.1(ii)); it involves the canonical embedding H → M4(R).
(iv) If (F, G) = (C, C), then ℓ = p + q. The geometrical meaning of the couple (p, q) is as follows. α may be written as α : C → M_ℓ(C). Then α(i) is a unitary of M_ℓ(C), skew-hermitian with α(i)² = −I_ℓ, so σ(α(i)) = {±i}. There is a unitary u ∈ M_ℓ(C) with

    uα(i)u* = i( I_p 0 ; 0 −I_q ),   p + q = ℓ,

where p and q are the multiplicities of the eigenvalues i and −i respectively.

Definition 2.2 Let A = A1 ⊕ ⋯ ⊕ An and B = B1 ⊕ ⋯ ⊕ Bm be two finite dimensional real C*-algebras, where the Ai, Bj are full matrix algebras over C, R or H. The real *-homomorphism α = {α_ij | 1 ≤ i ≤ n, 1 ≤ j ≤ m} from A into B, where each α_ij : Ai → Bj is a real *-homomorphism, is called "standard" if

    α(x) = diag(α_11(x1), …, α_n1(xn), 0, …, 0) ⊕ ⋯ ⊕ diag(α_1m(x1), …, α_nm(xn), 0, …, 0) ∈ B1 ⊕ ⋯ ⊕ Bm = B,

∀x = x1 ⊕ ⋯ ⊕ xn, xi ∈ Ai, 1 ≤ i ≤ n.

Remark: Since each α_ij in Definition 2.2 is determined by a positive integer or a couple of positive integers, α is determined by the matrix

    ( s_ij(F_i, G_j) )_{1 ≤ i ≤ n, 1 ≤ j ≤ m}

with

    Σ_{i=1}^{n} s_ij(F_i, G_j) · √(dim_R(A_i)) ≤ √(dim_R(B_j)),   ∀1 ≤ j ≤ m,

where each s_ij is a non-negative integer or a couple of non-negative integers (the latter when (F_i, G_j) = (C, C); in this case the corresponding summand in the above inequality is the sum of the couple). Obviously, α is unital if and only if the inequality becomes an equality. We see that this matrix is different from that of the complex finite dimensional C*-algebras ([1]).

Lemma 2.1 Let φ : F → Mm(G) be a unital real *-homomorphism. Then
(i) If (F, G) = (C, R) or (H, C), then m is even;
(ii) If (F, G) = (H, R), then m = 4k for some integer k.

Proof: (i) Let (F, G) = (C, R); then φ(i) is a unitary of Mm(R) with φ(i)² = −I_m. It follows that det φ(i)² = (−1)^m, where det φ(i) ∈ R, so m is even. Now let (F, G) = (H, C) and let φ : H → Mm(C) be a unital *-homomorphism. Then φ(i), φ(j) and φ(k) are unitary elements of Mm(C), and φ(i)φ(j) = φ(ij) = −φ(j)φ(i), φ(j)φ(k) = −φ(k)φ(j), φ(k)φ(i) = −φ(i)φ(k). Since σ(φ(i)) = {±i} in Mm(C), there is a unitary v ∈ Mm(C) such that

    φ(i)′ = vφ(i)v* = i( I_p 0 ; 0 −I_q ),   where p + q = m.

Let φ(j)′ = vφ(j)v*. We have φ(i)′φ(j)′ + φ(j)′φ(i)′ = 0 and φ(i)′² = −I_m. We can assume that φ(j)′ = ( A B ; C D ), so φ(j)′ = φ(i)′ φ(j)′ φ(i)′
= −( I_p 0 ; 0 −I_q )( A B ; C D )( I_p 0 ; 0 −I_q ) = ( −A B ; C −D ).

Thus φ(j)′ = ( 0_p B ; C 0_q ). Since

    φ(j)′² = ( BC 0 ; 0 CB ) = −I_m,

with B : C^q → C^p and C : C^p → C^q (B is a p × q matrix, C is a q × p matrix), both B and C are invertible, and it follows that p = q, m = 2p.

(ii) Let φ : H → Mm(R) be a unital real *-homomorphism. Then φ(i), φ(j) and φ(k) are unitary elements of Mm(R). As in the first case of (i), m = 2p. We investigate the real operator φ(i) in Mm(C) = Mm(R) +̇ iMm(R). Its eigenvalues are i and −i, with eigenspaces [ξ1 + iη1, ⋯, ξp + iηp] and [ξ1 − iη1, ⋯, ξp − iηp] in C^m. The following is obvious: ⟨ξj + iηj, ξk + iηk⟩ = δ_jk, with ξj, ηj ∈ R^m, ∀j. So in R^m we have

    ⟨ξj, ξk⟩ + ⟨ηj, ηk⟩ = δ_jk,   ⟨ηj, ξk⟩ − ⟨ξj, ηk⟩ = 0,   ∀1 ≤ j, k ≤ m.   (2.1)

Obviously ⟨ξk + iηk, ξj − iηj⟩ = 0, and it follows that

    ⟨ξk, ξj⟩ − ⟨ηk, ηj⟩ = 0,   ⟨ξk, ηj⟩ + ⟨ηk, ξj⟩ = 0,   ∀1 ≤ j, k ≤ m.   (2.2)

By (2.1) and (2.2): ξk ⊥ ξj and ηk ⊥ ηj for all k ≠ j; ξk ⊥ ηj for all k, j; and ‖ξk‖² = ‖ηk‖² = 1/2 for all k. We may normalize so that ‖ξk‖ = ‖ηk‖ = 1 for all k; thus {ξ1, ⋯, ξp, η1, ⋯, ηp} is an orthonormal basis of R^m, and

    φ(i) : ξj + iηj ↦ i(ξj + iηj) = iξj − ηj,   ξj − iηj ↦ −i(ξj − iηj) = −iξj − ηj.

Thus φ(i)(ξj) = −ηj and φ(i)(ηj) = ξj, ∀j. There is a unitary (real) matrix v ∈ Mm(R) with

    vφ(i)v* = φ(i)′ = ( 0 I_p ; −I_p 0 ).

Denote φ(j)′ = vφ(j)v* ∈ Mm(R); still φ(j)′² = −I_m and φ(j)′φ(i)′ = −φ(i)′φ(j)′. Write φ(j)′ = ( A B ; C D ) with A, B, C, D ∈ Mp(R). Since φ(j)′ = φ(i)′φ(j)′φ(i)′, we have

    φ(j)′ = ( A B ; B −A ),   A, B ∈ Mp(R).

Thus A² + B² = −I_p and AB = BA, by φ(j)′² = −I_m. It follows that Aᵀ = −A and Bᵀ = −B, since φ(j)′* = φ(j)′ᵀ = −φ(j)′.

Viewing A, B as real operators on C^p = R^p +̇ iR^p, we have C^p = X1 ⊕ ⋯ ⊕ X_ℓ ⊕ X0 ⊕ X̄_ℓ ⊕ ⋯ ⊕ X̄1, where X_j is the eigenspace of A with eigenvalue iλ_j and X̄_j is the eigenspace of A with eigenvalue −iλ_j (λ_j ≠ 0 ∈ R). Obviously dim_C X_j = dim_C X̄_j, 1 ≤ j ≤ ℓ; X0 = X̄0 is the null space of A. Since AB = BA, BX0 ⊂ X0. The eigenvalues of B|X0 come in couples on the imaginary axis, since A² + B² = −I_p. So dim_C X0 is even. By the above, p = 2k, k ∈ Z, so m = 4k. The proof is complete.

Lemma 2.2 Let φ : H → M_{2p}(C) be a unital real *-homomorphism. Then there is a unitary u ∈ M_{2p}(C) with Adu ∘ φ = α_{H,C}^p, where Adu ∘ φ = u·φ(·)·u*.

Proof: There is a unitary v ∈ M_{2p}(C) (see the proof of Lemma 2.1) such that

    Adv ∘ φ(i) = i( I_p 0 ; 0 −I_p )   and   Adv ∘ φ(j) = ( 0_p B ; −B^{−1} 0_p ),   B ∈ Mp(C).

Since φ(j)* = −φ(j), B is a unitary and B^{−1} = B*. It follows that

    Adv ∘ φ(k) = Adv ∘ φ(i) · Adv ∘ φ(j) = i( 0_p B ; B^{−1} 0_p ).

Set ω = ( I_p 0 ; 0 B ); then u = ωv is a unitary of M_{2p}(C). We have

    Adu ∘ φ(i) = ( I_p 0 ; 0 B ) · i( I_p 0 ; 0 −I_p ) · ( I_p 0 ; 0 B* ) = i( I_p 0 ; 0 −I_p );
    Adu ∘ φ(j) = ( I_p 0 ; 0 B )( 0 B ; −B^{−1} 0 )( I_p 0 ; 0 B* ) = ( 0 I_p ; −I_p 0 );
    Adu ∘ φ(k) = ( I_p 0 ; 0 B ) · i( 0 B ; B^{−1} 0 ) · ( I_p 0 ; 0 B* ) = i( 0 I_p ; I_p 0 ).

So Adu ∘ φ = α_{H,C}^p. The proof is complete.

Lemma 2.3 Let φ : H → Mm(R) be a unital real *-homomorphism, and m = 2p = 4k. Then there is a unitary u ∈ Mm(R) with Adu ∘ φ = α_{H,R}^k.

Proof: There is a unitary v ∈ Mm(R) (see the proof of Lemma 2.1(ii)) such that

    Adv ∘ φ(j) = ( 0 I_p ; −I_p 0 ) = φ(j)′   and   Adv ∘ φ(k) = ( A B ; B −A ) = φ(k)′,

where A, B ∈ Mp(R) with Aᵀ = −A, Bᵀ = −B, AB = BA and A² + B² = −I_p. We write R^{2p} = [ξ1, ⋯, ξp] ⊕ [η1, ⋯, ηp], with

    φ(j)′ : ξs ↦ −ηs,   ηs ↦ ξs,   1 ≤ s ≤ p.

Since AB = BA, we may find common eigenvectors of A and B in C^p. Let ξ1^(1), ⋯, ξk^(1); ξ1^(2), ⋯, ξk^(2) ∈ [ξ1, ⋯, ξp] be such that

    A(ξt^(1) + iξt^(2)) = iλt (ξt^(1) + iξt^(2)),   B(ξt^(1) + iξt^(2)) = iμt (ξt^(1) + iξt^(2)),
    A(ξt^(1) − iξt^(2)) = −iλt (ξt^(1) − iξt^(2)),   B(ξt^(1) − iξt^(2)) = −iμt (ξt^(1) − iξt^(2)),

where λt, μt ∈ R, λt² + μt² = 1, 1 ≤ t ≤ k. Obviously

    ⟨ξt^(1) + iξt^(2), ξs^(1) + iξs^(2)⟩ = 2δ_st,   ⟨ξt^(1) + iξt^(2), ξs^(1) − iξs^(2)⟩ = 0,   ∀s, t.

As in Lemma 2.1, {ξ1^(1), ⋯, ξk^(1), ξ1^(2), ⋯, ξk^(2)} is an orthonormal basis of R^p = [ξ1, ⋯, ξp]. We know that

    A : ξt^(1) ↦ −λt ξt^(2), ξt^(2) ↦ λt ξt^(1);   B : ξt^(1) ↦ −μt ξt^(2), ξt^(2) ↦ μt ξt^(1),   ∀1 ≤ t ≤ k.

Likewise, there is [η1, ⋯, ηp] = [η1^(1), ⋯, ηk^(1); η1^(2), ⋯, ηk^(2)] on which A and B have the same matrix representations. Now we may write R^m = ⊕_{t=1}^{k} R_t^(4), where R_t^(4) = [ξt^(1), ξt^(2), ηt^(1), ηt^(2)]. With respect to this ordered basis, the matrix of φ(j)′ : ξt^(1)(2) ↦ −ηt^(1)(2), ηt^(1)(2) ↦ ξt^(1)(2) is ( 0 I2 ; −I2 0 ); at the same time,

    φ(k)′|_{R_t^(4)} =
    (   0    λt    0    μt )
    ( −λt     0  −μt     0 )
    (   0    μt    0   −λt )
    ( −μt     0   λt     0 ).

Now we need to find a unitary ωt ∈ M(R_t^(4)) such that

    Adωt ∘ φ(j)′ = ( 0 I2 ; −I2 0 )   and   Adωt ∘ φ(k)′ = ( 0 S ; S 0 ),   where S = ( 0 1 ; −1 0 ).

Then there is a unitary u ∈ Mm(R) with

    Adu ∘ φ(j) = ( 0 I2 ; −I2 0 ) ⊗ I_k,   Adu ∘ φ(k) = ( 0 S ; S 0 ) ⊗ I_k,

so that

    Adu ∘ φ(i) = Adu ∘ φ(j) · Adu ∘ φ(k) = ( S 0 ; 0 −S ) ⊗ I_k.

That is, Adu ∘ φ = α_{H,R}^k.

In order to find such ωt, the problem is reduced to the case φ : H → M4(R). By the above, we have the following matrices in M4(R):

    φ(j)′ = ( 0 I2 ; −I2 0 ),   φ(k)′ =
    (  0    s    0    t )
    ( −s    0   −t    0 )
    (  0    t    0   −s )
    ( −t    0    s    0 ),

where s, t ∈ R, s² + t² = 1. Obviously, in C⁴ = X ⊕ X̄,

    φ(j)′ : e1 ↦ −e3, e3 ↦ e1, e2 ↦ −e4, e4 ↦ e2,

and

    φ(k)′ : X → X̄ : e1 ↦ −se2 − te4, e2 ↦ se1 + te3, e3 ↦ −te2 + se4, e4 ↦ te1 − se3,

where X = [e1 + ie3, e2 + ie4] is the eigenspace corresponding to the eigenvalue i of φ(j)′, and X̄ = [e1 − ie3, e2 − ie4] is the eigenspace corresponding to the eigenvalue −i of φ(j)′.

Now we must find an orthonormal basis g, h with X = [g, h], X̄ = [ḡ, h̄]. Set

    f1 = (g + ḡ)/2,   f2 = (h + h̄)/2,   f3 = (g − ḡ)/(2i),   f4 = (h − h̄)/(2i).

If {f1, f2, f3, f4} is an orthonormal basis of R⁴ and

    φ(j)′ : f1 ↦ −f3, f3 ↦ f1, f2 ↦ −f4, f4 ↦ f2,   (2.3)

    φ(k)′ : f1 ↦ −f4, f2 ↦ f3, f3 ↦ −f2, f4 ↦ f1,   (2.4)

then φ(j)′, φ(k)′ will satisfy our requirements. These requirements are equivalent to φ(k)′ : g ↦ −ih with h = −iφ(k)′g ⊥ ḡ. Let g = (λ + iμ)(e1 + ie3) + (σ + iτ)(e2 + ie4), where λ, μ, σ, τ ∈ R. Thus

    iφ(k)′g = (iλ + μ)[(−se2 − te4) − i(−te2 + se4)] + (iσ + τ)[(se1 + te3) − i(te1 − se3)].

That is, we require

    g = ( λ + iμ, σ + iτ, iλ − μ, iσ − τ )ᵀ ⊥ (−h̄) = ( (iσ + τ)(s − it), (iλ + μ)(−s + it), (iσ + τ)(t + is), (iλ + μ)(−t − is) )ᵀ.

By computing, ⟨g, −h̄⟩ = 0. So we may choose

    g = e2 + ie4,   h = −iφ(k)′g = (−te1 + se3) + i(−se1 − te3),

and set

    f1 = e2,   f3 = e4,   f2 = −te1 + se3,   f4 = −se1 − te3.

Then {f1, f2, f3, f4} is an orthonormal basis of R⁴ (s² + t² = 1). Therefore φ(j)′ and φ(k)′ satisfy (2.3) and (2.4). The proof is complete.

Lemma 2.4 Let φ : Mn(F) → Mm(G) be a unital real *-homomorphism. Then there is a unique standard *-homomorphism α : Mn(F) → Mm(G) and a unitary u ∈ Mm(G) such that φ = Adu ∘ α.
Proof: (1) We know that p = p1 ⊕ ⋯ ⊕ pk for any projection p of Mm(G), where each pi (i = 1, 2, ⋯, k) is a minimal projection. We know that any two minimal projections of Mm(G) are equivalent. If p has another decomposition into minimal projections, p = q1 ⊕ ⋯ ⊕ qℓ, then k = ℓ.

(2) Let {f_ij | 1 ≤ i, j ≤ m} be the matrix units of Mm(G). Since f_ii Mm(G) f_ii ≅ G and the only non-zero projection in G is I, we know that each f_ii (1 ≤ i ≤ m) is a minimal projection.

(3) Let {e_ij | 1 ≤ i, j ≤ n} be matrix units of Mn(F). Then {φ(e_ii) | 1 ≤ i ≤ n} is an equivalent projection family with φ(e_ii)·φ(e_jj) = 0, i ≠ j. Thus each φ(e_ii) is an orthogonal direct sum of minimal projections with rank k (independent of i) in Mm(G); that is, φ(e_ii) is equivalent to the orthogonal direct sum of k of the f_jj. Since Σ_i φ(e_ii) = 1 (φ is unital), m = n·k, with k as above.

(4) We see

    F ≅ e_ii Mn(F) e_ii → φ(e_ii) Mm(G) φ(e_ii) ≅ Mk(G)   (via φ).

Then φ may be written as

    φ = ψ ⊗ id : F ⊗ Mn(R) → Mk(G) ⊗ Mn(R),

where ψ : F → Mk(G) is a unital real *-homomorphism. So we may assume that n = 1.

(i) If F = R: φ : R → Mm(G) = G ⊗ Mm(R), I ↦ I ⊗ I_m. Thus φ = α_{R,G}^m.

(ii) If F = C:
(a) φ : C → Mm(R) is a unital real *-homomorphism. By Lemma 2.1(i), m = 2p. We know that φ(i) is a unitary, skew-hermitian element of Mm(R), and by the proof of Lemma 2.1(ii) there is a unitary u such that

    Adu ∘ φ(i) = ( 0 I_p ; −I_p 0 ) = ( 0 1 ; −1 0 ) ⊗ I_p.

That is, Adu ∘ φ = α_{C,R}^p.
(b) φ : C → Mm(C) is a unital real *-homomorphism. In this case φ(i) is unitary and skew-hermitian with φ(i)² = −I_m, so σ(φ(i)) = {±i}. There is a unitary u ∈ Mm(C) such that

    Adu ∘ φ(i) = i( I_p 0 ; 0 −I_q ),   p + q = m.

That is, Adu ∘ φ = α_{C,C}^{p,q}.
(c) φ : C → Mm(H) is a unital real *-homomorphism. As in the proof of (c) of (iii) below, there is a unitary u ∈ Mm(H) such that Adu ∘ φ = α_{C,H}^m.

(iii) If F = H:
(a) φ : H → Mm(R) is a unital real *-homomorphism. By Lemma 2.1(ii), m = 2p = 4k. By Lemma 2.3, there is a unitary u ∈ Mm(R) such that Adu ∘ φ = α_{H,R}^k.
(b) φ : H → Mm(C) is a unital real *-homomorphism. By Lemma 2.1(i), m = 2p. By Lemma 2.2, there is a unitary u ∈ Mm(C) such that Adu ∘ φ = α_{H,C}^p.
(c) φ : H → Mm(H) is a unital real *-homomorphism. By the Noether-Skolem theorem, there is an invertible element a ∈ Mm(H) such that aα(x)a^{−1} = φ(x), ∀x ∈ H, where α : H → Mm(H) is a standard homomorphism. Since φ and α are *-homomorphisms, we have (a*)^{−1}α(x)a* = φ(x) = aα(x)a^{−1}, so a*a ∈ α(H)′ (the commutant of α(H)). Naturally α(H)′ is also a C*-subalgebra of Mm(H), and is unital. So a*a is an invertible element of α(H)′ and (a*a)^{−1/2} ∈ α(H)′. Define u = a(a*a)^{−1/2}; we have uα(x)u^{−1} = aα(x)a^{−1} = φ(x), ∀x ∈ H; moreover

    u*u = (a*a)^{−1/2}(a*a)(a*a)^{−1/2} = I,   uu* = a(a*a)^{−1}a* = I.

So u is a unitary and Adu* ∘ φ = α. The proof is complete.

Theorem 2.1 Let A and B be finite dimensional real C*-algebras, and let φ : A → B be a unital real *-homomorphism. Then there is a standard *-homomorphism α : A → B and a unitary u ∈ B such that Adu ∘ φ = α.

Proof: Let A = A1 ⊕ ⋯ ⊕ An and B = B1 ⊕ ⋯ ⊕ Bm, where each Ai and Bj is a full matrix algebra over R, C or H. We set

    φ = {φ_ij | 1 ≤ i ≤ n, 1 ≤ j ≤ m},   φ_ij : Ai → Bj a *-homomorphism, ∀i, j.

For each j, we investigate the orthogonal projection family {φ_ij(I_{Ai}) | 1 ≤ i ≤ n}. By the discussion of Lemma 2.4, applying an inner automorphism of B we may assume that φ has the same form as α in Definition 2.2. By Lemma 2.4, there exist standard *-homomorphisms α_ij and unitaries u_ij ∈ (Bj)_{φ_ij(I_{Ai})} such that

    Adu_ij ∘ φ_ij = α_ij,   ∀i, j.

Let u = Σ_{j=1}^{m} ( Σ_{i=1}^{n} u_ij + I_{Bj} − Σ_{i=1}^{n} φ_ij(I_{Ai}) ); then u is a unitary of B, and α = (α_ij) is standard. We have Adu ∘ φ = α. The proof is complete.

References

[1] Ola Bratteli, Inductive limits of finite dimensional C*-algebras, Transactions of the American Mathematical Society, 171 (1972), 195-234.
[2] T. Giordano, A classification of approximately finite real C*-algebras, J. reine angew. Math., 385 (1988), 161-194.
[3] Bingren Li, Real Operator Algebras, World Scientific, New Jersey, London, Singapore, Hong Kong (2003).
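As a quick numerical sanity check of the canonical embedding (3) (an illustrative sketch, not part of the paper), one can verify that the 4 × 4 real matrix representation is multiplicative for the Hamilton product and intertwines quaternion conjugation with the matrix transpose:

```python
import numpy as np

def quat_to_m4(q):
    """Embedding (3): lambda + mu*i + sigma*j + tau*k -> M_4(R)."""
    l, m, s, t = q
    return np.array([[ l,  m,  s,  t],
                     [-m,  l, -t,  s],
                     [-s,  t,  l, -m],
                     [-t, -s,  m,  l]], dtype=float)

def quat_mul(p, q):
    """Hamilton product of quaternions given as (lambda, mu, sigma, tau)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

p, q = (1.0, 2.0, -0.5, 3.0), (0.5, -1.0, 2.0, 1.5)
# multiplicative: the matrix of p*q equals the product of the matrices
assert np.allclose(quat_to_m4(quat_mul(p, q)), quat_to_m4(p) @ quat_to_m4(q))
# *-preserving: conjugation corresponds to transposition
assert np.allclose(quat_to_m4(p).T, quat_to_m4((p[0], -p[1], -p[2], -p[3])))
```

The two assertions together confirm that (3) is a real *-homomorphism on the chosen sample, as the general statement asserts by bilinearity.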
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 682--688
Copyright@2007 Watam Press

A Delay Differential Equation Model of Immune Surveillance Against Cancer and Its Stability Analysis

Dan Li and Wanbiao Ma*
Department of Mathematics and Mechanics, School of Applied Science, University of Science and Technology Beijing, Beijing 100083, China
(* The research of this article is partially supported by the Foundation of University of Science and Technology Beijing and the National Natural Science Foundation of China (No. 10671011).)

Abstract In this paper, based on some biological meanings, time delay is introduced into a nonlinear ordinary differential equation model of immune surveillance against cancer proposed by Garay and Lefever (1978), and a revised nonlinear delay differential equation model is derived. The model describes proliferation, decomposition and death of cancerous cells, and consists of four components: cancerous cells, effector cells, complexes of effector cells and cancerous cells, and dead cancerous cells. Under some biological assumptions, the revised model can be reduced to a simpler delay differential equation with two components: cancerous cells and dead cancerous cells. Then we study the effect of time delay on the stability of the equilibria of the system, and sufficient criteria for local asymptotic stability of the equilibria are given. Finally, numerical simulations are presented to illustrate the main results and to show the existence of periodic orbits for large time delay.

1 Introduction

It is well known that the cells in a living system usually become cancerous cells at some rate, and that, under immune surveillance against cancer, cancerous cells will proliferate, decompose and then die. This process is often mathematically described by nonlinear differential equations. In the model proposed by Lefever and Garay ([12]), four types of interacting populations are taken into account. The density of cancerous cells, which replicate in accordance with the law of logistic growth, is denoted by X. The density of dead cancerous cells is represented by P. Generally speaking, the cytotoxic process includes two steps: (a) the effector cells and the target cells are conjugated and form multicellular complexes (an effector can bind to several target cells); (b) the complexes decompose into the original effectors and dead cancerous cells. Let E0 and E denote the densities of the effectors and the complexes. The resulting ODE model is ([12], [13], [17] and [18])

    Ẋ(t) = λX(t)[1 − (X(t) + P(t))] − k1 E0(t) X(t),
    Ė0(t) = −k1 E0(t) X(t) + k2 E(t),
    Ė(t) = k1 E0(t) X(t) − k2 E(t),
    Ṗ(t) = k2 E(t) − k3 P(t),   (1.1)

where λ, k1, k2 and k3 are parameters. λ is the rate at which the cancerous cells proliferate, k1 and k2 represent the rates of binding and decomposition, respectively, and k3 depicts the rate at which the dead cancerous cells are removed or eliminated.

In the ODE model (1.1), it is assumed that, for cancerous cells, the growth rate is of classical logistic type, i.e., at any time t the increase of the cancerous cells depends on the relative numbers of individuals occupying potential spaces. From the second and third equations of (1.1), there are input and output for the effectors and the complexes. Therefore, the sum E0(t) + E(t) of the local densities E0(t) and E(t) can be biologically assumed to be a constant for any time t ≥ 0, i.e., E0(t) + E(t) ≡ E1 = const. ([16]). Furthermore, in biology ([13], [16] and [18]), the distribution of E0 and E may also be assumed to equilibrate rapidly with respect to the local density of the cancerous cells X, i.e., k1 E0 X − k2 E ≡ 0. Therefore, E0 = k2 E1/(k1 X + k2), E = k1 E1 X/(k1 X + k2), and the four-dimensional nonlinear differential equation (1.1) can be reduced to the following nonlinear differential equation with the variables X and P ([16] and [18]),

    Ẋ(t) = λX(t)[1 − (X(t) + P(t))] − k1 k2 E1 X(t)/(k1 X(t) + k2),
    Ṗ(t) = k1 k2 E1 X(t)/(k1 X(t) + k2) − k3 P(t).   (1.2)
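For illustration (a sketch with hypothetical parameter values, not from the paper), the reduction from (1.1) to (1.2) can be checked numerically: the four-component system conserves E0(t) + E(t) exactly, and when the binding/decomposition rates k1, k2 are fast relative to λ and k3, its (X, P) trajectories track those of the reduced system.

```python
import numpy as np
from scipy.integrate import solve_ivp

lam, k1, k2, k3, E1 = 1.0, 5.0, 2.0, 0.5, 0.6   # hypothetical values

def full(t, u):                      # the four-component model (1.1)
    X, E0, E, P = u
    bind = k1*E0*X
    return [lam*X*(1 - (X + P)) - bind,
            -bind + k2*E,            # dE0/dt + dE/dt = 0, so E0 + E = E1
            bind - k2*E,
            k2*E - k3*P]

def reduced(t, u):                   # the reduced model (1.2)
    X, P = u
    kill = k1*k2*E1*X/(k1*X + k2)
    return [lam*X*(1 - (X + P)) - kill,
            kill - k3*P]

sol_full = solve_ivp(full, (0, 50), [0.1, E1, 0.0, 0.0], max_step=0.05)
sol_red = solve_ivp(reduced, (0, 50), [0.1, 0.0], max_step=0.05)
# compare sol_full.y[0], sol_full.y[3] with sol_red.y[0], sol_red.y[1]
```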


In the paper, based on some biological meanings, we shall introduce a time delay τ into the ODE model (1.1) and get the following nonlinear delay differential equation model,

    Ẋ(t) = λX(t)[1 − (X(t−τ) + P(t−τ))] − k1 E0(t) X(t),
    Ė0(t) = −k1 E0(t) X(t) + k2 E(t),
    Ė(t) = k1 E0(t) X(t) − k2 E(t),
    Ṗ(t) = k2 E(t) − k3 P(t),   (1.3)

which can be further reduced to the following nonlinear delay differential system with the variables X and P,

    Ẋ(t) = λX(t)[1 − (X(t−τ) + P(t−τ))] − k1 k2 E1 X(t)/(k1 X(t) + k2),
    Ṗ(t) = k1 k2 E1 X(t)/(k1 X(t) + k2) − k3 P(t).   (1.4)

We introduce the time delay τ into the ODE model (1.1) based on the following motivations in biology and mathematics. (i) The classical logistic term X(t)(1 − X(t) − P(t)) in the ODE model (1.1) assumes that the growth rate of the cancerous cells in a population depends on the relative number of the individuals occupying potential spaces, at any time t, in the system under consideration. In practice this is improbable in biology, because the replication of the cancerous cells is not instantaneous. Thus, it is more reasonable to use X(t)(1 − X(t−τ) − P(t−τ)) to describe the growth of the cancerous cells at time t. In fact, a similar argument has been used in [7] for deriving the following well known logistic model with time delay τ ≥ 0 ([11]),

    Ṅ(t) = rN(t)(K − N(t−τ))/K   (r, K > 0).

(ii) In recent years, time delays (including discrete time delays and distributed time delays) have been widely incorporated into various immune dynamical systems in HIV infection (see, for example, [1]-[5], [9], [10], [14]-[15], [20] and references therein). For example, an intracellular delay represents the time between viral entry into a target cell and the production of new virus particles. A pharmacological delay occurs between the ingestion of a drug and its appearance within cells. Furthermore, time delays can also be used to describe the time between infection of a CD4+ T-cell and the emission of viral particles on the cellular level. (iii) In general, delay differential equations exhibit much more complicated dynamics than ordinary differential equations, since a time delay can cause a stable equilibrium to become unstable, and cause the populations to fluctuate.

The purpose of the paper is to consider dynamical properties of the nonlinear delay differential system (1.4).

For simplicity, we introduce the dimensionless parameters λt → t, k1 X/k2 → X, k1 P/k2 → P, and k2/k1 ≡ θ, k1 E1/λ ≡ β, k3/λ ≡ γ; then the system (1.4) can be rewritten in the following dimensionless form,

    Ẋ(t) = X(t)[1 − θ(X(t−τ) + P(t−τ))] − βX(t)/(1 + X(t)),
    Ṗ(t) = βX(t)/(1 + X(t)) − γP(t).   (1.5)

The organization of the paper is as follows. In the following section, we consider existence and ultimate boundedness of the solutions of (1.5). In Section 3, we give a detailed analysis of the local asymptotic stability of the nonnegative equilibrium and the positive equilibrium of (1.5). In Section 4, we give some remarks and numerical simulations which illustrate the main results.

2 Boundedness of solutions

By biological meaning, the initial condition of (1.5) is given as

    X(t) = φ1(t) ≥ 0,   P(t) = φ2(t) ≥ 0   (t ∈ [t0 − τ, t0]),   (2.1)

where t0 is a real constant, and φ1(t), φ2(t) are continuous functions on [t0 − τ, t0]. With a standard argument, it is easily shown that the solution (X(t), P(t)) of (1.5) with (2.1) exists and is nonnegative on [t0, +∞), and is also ultimately bounded; i.e., we have the following

Theorem 1 The solution (X(t), P(t)) of (1.5) with (2.1) exists and is nonnegative on [t0, +∞), and lim sup_{t→+∞} X(t) ≤ M ≡ e^τ/θ, lim sup_{t→+∞} P(t) ≤ N ≡ βM/(γ(1 + M)).

Proof. It follows from the local existence theory of solutions of functional differential equations (see, for example, [8] and [11]) that (X(t), P(t)) exists on [t0, b) for some constant b > t0. The first equation of (1.5) can be written as Ẋ(t) = X(t) g(X_t, P_t) for t ∈ [t0, b), where

    g(X_t, P_t) = 1 − θX(t−τ) − θP(t−τ) − β/(1 + X(t)).

It follows that for t ∈ [t0, b),

    X(t) = φ1(t0) exp ∫_{t0}^{t} g(X_u, P_u) du ≥ 0,

which further implies that X(t) ≡ 0 on [t0, b) if φ1(t0) = 0, and X(t) > 0 on [t0, b) if φ1(t0) > 0.

We further show that P(t) ≥ 0 for any t ∈ [t0, b). If not, it follows from the continuity of P(t) on [t0 − τ, b) that there exists some t1 > t0 such that P(t1) < 0 and Ṗ(t1) ≤ 0. On the other hand, from the second equation of (1.5) and X(t) ≥ 0 on [t0, b),

    Ṗ(t1) = βX(t1)/(1 + X(t1)) − γP(t1) > 0,

which contradicts Ṗ(t1) ≤ 0. Hence P(t) ≥ 0 for any t ∈ [t0, b).

Next, let us show that X(t) and P(t) are bounded on [t0, b). From the second equation of (1.5), Ṗ(t) ≤ β for t ∈ [t0, b). Integrating from t0 to t gives P(t) ≤ φ2(t0) + β(t − t0) < φ2(t0) + β(b − t0), which implies that P(t) is bounded on [t0, b). Since P(t) is nonnegative on [t0, b), the first equation of (1.5) gives Ẋ(t) ≤ X(t) for t ∈ [t0, b), so X(t) ≤ X(t0) e^{(b − t0)} holds on [t0, b). Hence X(t) is also bounded on [t0, b). Therefore, it follows from the continuation theory of solutions of functional differential equations (see, for example, [8] and [11]) that the solution (X(t), P(t)) exists and is nonnegative on [t0, +∞).

Now we consider the ultimate boundedness of the solutions of (1.5), by analysis techniques similar to those in [11]. From the first equation of (1.5), Ẋ(t) ≤ X(t) for t ≥ t0. Integrating from t − τ to t gives X(t − τ) ≥ e^{−τ} X(t) for t ≥ t0 + τ. Hence, for t ≥ t0 + τ, Ẋ(t) ≤ X(t)[1 − θ e^{−τ} X(t)]. The well known comparison principle then yields lim sup_{t→+∞} X(t) ≤ e^τ/θ = M. Hence, for any sufficiently small ε > 0, there exists a sufficiently large T such that X(t) < M + ε for all t ≥ T. From the second equation of (1.5),

    Ṗ(t) ≤ β(M + ε)/(1 + M + ε) − γP(t),

which implies lim sup_{t→+∞} P(t) ≤ β(M + ε)/(γ(1 + M + ε)). Since ε may be arbitrarily small, lim sup_{t→+∞} P(t) ≤ N.

This completes the proof of Theorem 1.

The existence of the equilibria of (1.5) without time delay has been considered in [18], based on some simple theoretical analysis and numerical simulations. For convenience in the discussion of stability of the equilibria of (1.5) in the next section, we give the classification of the equilibria of (1.5). First, it is clear that the boundary equilibrium E0 = (0, 0) always exists for any θ > 0, β > 0 and γ > 0. Furthermore, positive equilibria exist in the following cases:

(i) If β < 1, there exists a unique positive equilibrium E* = (X*, P*). (ii) If β > 1 and 0 < θ < θ1, there exist two positive equilibria E1* = (X1*, P1*) and E2* = (X2*, P2*). (iii) If β > 1 and θ = θ1, there exists a unique positive equilibrium E** = (X**, P**). (iv) If β = 1 and 0 < θ < γ/(1 + γ), there exists a unique positive equilibrium E*** = (X***, P***).

3 Stability of equilibria

In this section, we first give a detailed analysis of the local asymptotic stability of the boundary equilibrium E0 = (0, 0). It has the following

Theorem 2 If β > 1, then E0 of (1.5) is locally asymptotically stable for any time delay τ ≥ 0. If β < 1, then E0 of (1.5) is unstable for any time delay τ ≥ 0. If β = 1, it is a critical case.

Proof. To discuss the local asymptotic stability of E0, consider the coordinate transformation u(t) = X(t) − X̄, v(t) = P(t) − P̄, where (X̄, P̄) denotes any equilibrium of (1.5). The corresponding linearized system of (1.5) is

    u̇(t) = (1 − θX̄ − θP̄ − βq̄) u(t) − θX̄ u(t−τ) − θX̄ v(t−τ),
    v̇(t) = βq̄ u(t) − γ v(t),   (3.1)

where q̄ = 1/(1 + X̄)². The associated characteristic equation of (3.1) is

    λ² + (γ − 1 + θX̄ + θP̄ + βq̄) λ + γ(−1 + θX̄ + θP̄ + βq̄) + e^{−λτ}(θX̄ λ + θγX̄ + βθX̄ q̄) = 0.   (3.2)

It is clear that at E0 = (0, 0) = (X̄, P̄), the characteristic equation (3.2) becomes

    λ² + (γ − 1 + β) λ + γ(−1 + β) = (λ − 1 + β)(λ + γ) = 0.   (3.3)

Hence (3.3) has two real roots, λ = λ1 = −γ < 0 and λ = λ2 = 1 − β, which clearly imply that the conclusions of Theorem 2 are true.

This completes the proof of Theorem 2.
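A minimal numerical sketch of (1.5) (not from the paper) is given below: fixed-step forward Euler with a delay buffer — essentially the method of steps — with a constant history playing the role of the initial functions in (2.1).

```python
import numpy as np

def simulate(beta, theta, gamma, tau, T=200.0, h=0.01, X0=1.0, P0=1.0):
    """Integrate the dimensionless delay system (1.5) with constant
    history (phi1, phi2) = (X0, P0) on [-tau, 0]. Illustrative only;
    a production run would use a dedicated DDE solver."""
    d = max(1, int(round(tau/h)))            # delay measured in steps
    n = int(T/h)
    X = np.full(n + d + 1, float(X0))        # indices 0..d hold the history
    P = np.full(n + d + 1, float(P0))
    for k in range(d, n + d):
        Xl, Pl = X[k-d], P[k-d]              # lagged values X(t-tau), P(t-tau)
        X[k+1] = X[k] + h*(X[k]*(1 - theta*(Xl + Pl)) - beta*X[k]/(1 + X[k]))
        P[k+1] = P[k] + h*(beta*X[k]/(1 + X[k]) - gamma*P[k])
    return X[d:], P[d:]

# e.g. the first parameter set of Section 4: beta = 2 > 1, so (X, P) -> E0
Xs, Ps = simulate(beta=2.0, theta=3.0, gamma=6.0, tau=20.0)
```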


Remark 1 Note that β = k1 E1/λ; thus the condition β > 1 means that the killing rate of the effector cells is higher than the increasing rate of the cancerous cells, and hence the stability of E0 of (1.5) implies that the cancerous cells shall die out ultimately. Furthermore, β < 1 means that the killing rate of the effector cells is lower than the increasing rate of the cancerous cells, and hence the instability of E0 of (1.5) implies that the cancerous cells cannot be eliminated ultimately.

For stability of the positive equilibrium E* of (1.5), it has the following

Theorem 3 If β < 1, and the linearized system of (1.5) at E* is locally asymptotically stable for τ = 0, then there exists a critical value τ0 such that E* of (1.5) is locally asymptotically stable for τ < τ0 and unstable for τ > τ0.

Proof. For simplicity, set q = 1/(1 + X*)². It is known from (3.1) that the corresponding linearized system of (1.5) at E* is

    u̇(t) = qβX* u(t) − θX* u(t−τ) − θX* v(t−τ),
    v̇(t) = qβ u(t) − γ v(t),   (3.4)

and the associated characteristic equation of (3.4) is

    λ² + (γ − qβX*) λ − qβγX* + e^{−λτ}(θX* λ + θγX* + qβθX*) = 0.   (3.5)

If τ = 0, (3.5) reduces to

    λ² + (γ − qβX* + θX*) λ − qβγX* + θγX* + qβθX* = 0.   (3.6)

Since the linear system (3.4) is locally asymptotically stable for τ = 0, the Routh-Hurwitz criterion gives

    A ≡ γ − qβX* + θX* > 0,
    B ≡ −qβγX* + θγX* + qβθX* > 0.

It is known that the positive equilibrium E* is locally asymptotically stable if all the roots of (3.5) have negative real parts. However, as pointed out in [5] and [11], (3.5) is a transcendental equation with infinitely many eigenvalues, so the classical Routh-Hurwitz criterion can no longer be used to discuss (3.5). Moreover, though there are some general tests (see, for example, [19]) that can be used to determine when all the roots of the characteristic equations have negative real parts, applying such a general test to a specific transcendental equation is far from trivial.

We shall use methods similar to those in [5] and [11] to discuss the distribution of the roots of (3.5). Let λ = λ(τ) = η(τ) + iω(τ) be any root of (3.5). Then η(τ) and ω(τ) depend continuously on the time delay τ (see, for example, [11] and [19]). From the assumptions of Theorem 3, all the roots of (3.5) for τ = 0 have negative real parts, i.e., η(0) < 0. Hence, it follows from Rouché's theorem and the continuity with respect to τ that η(τ) < 0 for τ > 0 sufficiently small, which implies that E* of (1.5) is still locally asymptotically stable for sufficiently small τ > 0.

We first show that there exists some critical value τ0 > 0 such that η(τ0) = 0 and ω(τ0) > 0, i.e., λ = ±iω(τ0) is a pair of purely imaginary roots of (3.5) for τ = τ0.

In fact, it follows from B > 0 that λ = 0 is not a root of (3.5) for any τ ≥ 0. If λ = iω is a purely imaginary root of (3.5) for some τ > 0 and ω > 0, then separating the real and imaginary parts in (3.5) gives

    (θγX* + qβθX*) cos ωτ + ωθX* sin ωτ = ω² + qβγX*,
    ωθX* cos ωτ − (θγX* + qβθX*) sin ωτ = −ω(γ − qβX*).   (3.7)

Adding up the squares of both equations, we obtain

    (θγX* + qβθX*)² + ω²θ²X*² = (ω² + qβγX*)² + ω²(γ − qβX*)²,

from which

    ω⁴ + [2qβγX* + (γ − qβX*)² − θ²X*²] ω² + (qβγX*)² − (θγX* + qβθX*)² = 0.   (3.8)

Since A > 0 and B > 0, it has that

    Ā = 2qβγX* + A(γ − qβX* − θX*),
    B̄ = −B(qβγX* + θγX* + qβθX*) < 0.

Hence, from (3.8),

    ω² = −Ā/2 + √(Ā² − 4B̄)/2 > 0.

This implies that (3.5) has a pair of purely imaginary roots λ = ±iω (ω > 0). Furthermore, from (3.7) it easily follows that

    cos ωτ = [(γ + qβ)qβγX* + (qβω² + qβX*ω²)] μ > 0,   (3.9)
    sin ωτ = [(ω² + qβγX*) + (γ − qβX*)(γ + qβ)] ωμ,   (3.10)

where μ = 1/(θX*[ω² + (γ + qβ)²]). Hence there is a unique angle θ̃ (0 < θ̃ ≤ 2π) such that ωτ = θ̃ and (3.9) and (3.10) hold. Therefore θ̃ = arctan(A1/A2), where

    A1 = ω[(γ − qβX*)(γ + qβ) + ω² + qβγX*],
    A2 = (γ + qβ)qβγX* + qβω² + qβX*ω².

Let τ0 = θ̃/ω. Then λ = ±iω(τ0) (ω(τ0) > 0) is a pair of purely imaginary roots of (3.5) for τ = τ0. Hence, by the continuity of λ(τ) with respect to τ, all the roots of (3.5) have negative real parts for 0 ≤ τ < τ0, i.e., E* of (1.5) is locally asymptotically stable for τ < τ0.

To show instability of E* of (1.5) for τ > τ0, consider the sign of dReλ/dτ at λ = iω. Differentiating both sides of (3.5) with respect to the time delay τ gives

    [2λ + (γ − qβX*) + e^{−λτ}θX* − τe^{−λτ}(θX*λ + θγX* + qβθX*)] dλ/dτ = λe^{−λτ}(θX*λ + θγX* + qβθX*).

For convenience, we study (dλ(τ)/dτ)^{−1} instead of dλ(τ)/dτ. By some simple computations,

    (dλ(τ)/dτ)^{−1} = −(2λ + (γ − qβX*))/(λ(λ² + (γ − qβX*)λ − qβγX*)) + θX*/(λ(θX*λ + θγX* + qβθX*)) − τ/λ.   (3.11)

Therefore, from (3.8) and (3.11),

    sign{dReλ/dτ}|_{λ=iω} = sign{Re (dλ(τ)/dτ)^{−1}}|_{λ=iω}
    = sign{Re[ −(2λ + (γ − qβX*))/(λ(λ² + (γ − qβX*)λ − qβγX*)) + θX*/(λ(θX*λ + θγX* + qβθX*)) − τ/λ ]}|_{λ=iω}
    = sign{(γ − qβX*)² + 2(ω² + qβγX*) − θ²X*²}
    = sign{√(Ā² − 4B̄)} = 1 > 0.

It is clear from sign{dReλ/dτ}|_{λ=iω} > 0 that all the roots that cross the imaginary axis at λ = ±iω must cross from left to right as τ increases. This implies that for any τ > τ0 there exists at least one root of the characteristic equation (3.5) with positive real part, i.e., the positive equilibrium E* of (1.5) is unstable for τ > τ0.

This completes the proof of Theorem 3.

Remark 2 Since β < 1 means that the killing rate of the effector cells is lower than the increasing rate of the cancerous cells, Theorems 2-3 show that the cancerous cells cannot be eliminated ultimately and that the densities of the cancerous cells and the dead cancerous cells tend to positive constants, respectively, as t tends to infinity for τ < τ0. If β < 1 and τ > τ0, Theorem 3 also implies that the densities of the cancerous cells and the dead cancerous cells vary periodically as t increases.

4 Conclusions

In this paper, based on some biological meanings, we introduce a time delay into the ODE model (1.1), which was used by Lefever and Garay (1978) to describe proliferation, decomposition and death of cancerous cells, and obtain a revised two-dimensional nonlinear differential equation model (1.5) with time delay in Section 1. In biology, our revised model (1.5) is more realistic than that without time delay. In Section 2, we give a detailed discussion of the global existence and ultimate boundedness of the solutions of (1.5) with suitable initial conditions. A detailed analysis of the local stability of the boundary equilibrium E0 and the positive equilibrium E* of (1.5) is given in Section 3. Theorem 2 shows that, if β > 1, i.e., the killing rate of the effector cells is higher than the increasing rate of the cancerous cells, the boundary equilibrium E0 is locally asymptotically stable for any τ ≥ 0, i.e., the cancerous cells can be eliminated ultimately. However, if β < 1, i.e., the killing rate of the effector cells is lower than the increasing rate of the cancerous cells, the positive equilibrium E* of (1.5) appears and is locally asymptotically stable for τ < τ0, i.e., the densities of the cancerous cells and the dead cancerous cells tend to constants, respectively, as t tends to infinity for τ < τ0, and the cancerous cells cannot be eliminated ultimately. Furthermore, it follows from Theorem 3 that, if β < 1 and τ > τ0, the densities of the cancerous cells and the dead cancerous cells become oscillatory as t increases. Theorem 3 suggests that the time delay τ in fact has important effects on the dynamics of the positive equilibrium E* of (1.5).

It should be pointed out here that a detailed analysis of the local and global asymptotic properties of the other positive equilibria, such as E**, E***, E1* and E2*, is also important both in biology and mathematics. Because of computational complexity, we shall discuss them in another paper.

Finally, let us give a numerical example to illustrate applications of Theorems 2-3. We choose the parameters β, θ, γ and τ, and the initial function (φ1(t), φ2(t)), as follows: β = 2, θ = 3, γ = 6, τ = 20, (φ1(t), φ2(t)) = (1, 1) (−τ ≤ t ≤ 0). Since β > 1, it follows from Theorem 2 that the boundary equilibrium E0 is locally asymptotically stable for any time delay τ ≥ 0. In fact, numerical simulations suggest that E0 may also be globally asymptotically stable for any time delay τ ≥ 0 (see Figure 4.1 below).

Next, let us choose the parameters β, θ and γ, and the initial function (φ1(t), φ2(t)), as follows: β = 0.5, θ = 2, γ = 0.1, (φ1(t), φ2(t)) = (0.4, 0) (−τ ≤ t ≤ 0). Then (X*, P*) ≈ (0.0451, 0.2157) and τ0 ≈ 3.20. Since

    A ≡ γ − qβX* + θX* = 0.16955 > 0,
    B ≡ −qβγX* + θγX* + qβθX* = 0.048247 > 0,

the linearized system of (1.5) at the positive equilibrium E* is locally asymptotically stable for τ = 0. Hence, it follows from Theorem 3 that E* is locally asymptotically stable for τ < τ0 ≈ 3.20 (see Figure 4.2 below).

If we choose τ = 3.21 > 3.20 ≈ τ0, Figure 4.3 below clearly shows that E* becomes unstable and that there are some orbits of (1.5) that tend to some non-constant periodic orbit as t increases. Furthermore, numerical simulations also suggest that E* may be globally asymptotically stable for τ < τ0.

References

[1] E. Beretta and Y. Kuang, Modeling and analysis of a marine bacteriophage infection, Math. Biosci., 149 (1998), 57-76.
[2] E. Beretta, T. Hara, W. Ma and Y. Takeuchi, Global asymptotic stability of an SIR epidemic model with distributed time delay, Nonl. Anal. TMA, 47 (2001), 4107-4115.
[3] S. Busenberg and K. L. Cooke, Vertically Transmitted Diseases, Springer-Verlag, Berlin (1993).
[4] R. V. Culshaw and S. Ruan, A delay-differential equation model of HIV infection of CD4+ T-cells, Math. Biosci., 165 (2000), 27-39.
[5] R. V. Culshaw, S. Ruan and G. Webb, A mathematical model of cell-to-cell HIV-1 spread that includes a time delay, J. Math. Biol., 46 (2003), 425-444.
[6] J. M. Cushing, Integrodifferential Equations and Delay Models in Population Dynamics, Springer-Verlag, Heidelberg (1977).
[7] G. E. Hutchinson, An Introduction to Population Ecology, Academic Science (1978).
[8] J. K. Hale, Theory of Functional Differential Equations, Springer-Verlag, New York (1997).
[9] A. V. M. Herz, S. Bonhoeffer, R. M. May and M. A. Nowak, Viral dynamics in vivo: limitations on estimates of intracellular delay and virus decay, Proc. Nat. Acad. Sci. USA, 93 (1996), 7247-7251.
[10] T. Kajiwara and T. Sasaki, Theoretical analysis of pathogen-immune interaction dynamical system models (in Japanese), Kyoto University: Suri Kaiseki Kenkyujyo Kokyuroku, 1432 (2005), 172-177.
[11] Y. Kuang, Delay Differential Equations with Applications in Population Dynamics, Academic Press, San Diego (1993).
[12] R. Lefever and R. Garay, Local description of immune tumor rejection, in Biomathematics and Cell Kinetics, A. J. Valleron and P. D. M. Macdonald (Eds.), North-Holland: Series Developments in Cell Biology, 2 (1978), 333-340.
[13] R. Lefever and W. Horsthemke, Bistability in fluctuating environments: implications in tumor immunology, Bull. Math. Biol., 41 (1979), 469-490.
[14] J. E. Mittler, B. Sulzer, A. U. Neumann and A. S. Perelson, Influence of delayed viral production on viral dynamics in HIV-1 infected patients, Math. Biosci., 152 (1998), 143-.
[15] P. W. Nelson and A. S. Perelson, Mathematical analysis of delay differential equation models of HIV-1 infection, Math. Biosci., 179 (2002), 73-94.
[16] I. Prigogine and R. Lefever, Stability problems in cancer growth and nucleation, Comp. Biochem. Physiol., 67B (1980), 389-393.
[17] A. Qi, Nonlinear Models in Immunity, Shanghai Scientific and Technological Education Publishing House (1998).
[18] A. Qi, Multiple solutions of a model describing cancerous growth, Bull. Math. Biol., 50 (1988), 1-17.


[19] Y. Qin, Y. Liu, L. Wang and M. Wang, Stability of Dynamical Systems with Time Delays, Science Press, Beijing (1989).
[20] J. Tam, Delay effect in a model for virus replication, IMA J. Math. Appl. Med. Biol., 16 (1999), 29-37.

Figure 4.1: The orbit of (1.5) with β = 2, θ = 3, γ = 6 and (φ1(t), φ2(t)) = (1, 1).

Figure 4.2: The orbit of (1.5) with β = 0.5, θ = 2, γ = 0.1, τ = 3.19 < τ0 and (φ1(t), φ2(t)) = (0.4, 0).

Figure 4.3: The orbit of (1.5) with β = 0.5, θ = 2, γ = 0.1, τ = 3.21 > τ0 and (φ1(t), φ2(t)) = (0.4, 0).
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 689--693
Copyright@2007 Watam Press

The Effect of Indexing Methods on SVM-based Text Categorization


Ju Jiang¹, Lei Chen¹, Mohamed S. Kamel¹, and Yi Zhang²
¹Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, N2L 3G1, Canada
²Department of Mathematics and Physics, China University of Petroleum (Beijing), Beijing, China

j4jiang@uwaterloo.ca, l5chen@uwaterloo.ca, mkamel@uwaterloo.ca, z y12@126.com

Abstract— The use of support vector machines (SVMs) in text categorization problems has been claimed to achieve high accuracy and efficient computational complexity. The effects of different kernel functions and functional parameters on the classification results have been well discussed in previous research. However, few papers have studied the influence of different indexing methods, which is an important step in document classification. In this paper, we focus on how different indexing methods affect the classification results. Our experimental results reveal that the accuracy of document classification is not sensitive to the method of indexing, while the computational complexity can be affected. A brief discussion following the description of the experiment is presented to explain the reasons theoretically.

I. INTRODUCTION

SVM is a supervised machine learning technique which can create a function from a set of labeled training data to describe the characteristics of the data set. This innovative approach was invented by Vladimir Vapnik [11] and is now widely used in the data mining areas. Some desirable characteristics, such as independence of the dimension of the input space and the use of simple kernel functions rather than explicit nonlinear mappings, make SVM successful in many fields like classification and regression. Because of the use of inner products in kernel functions, SVM methods are able to learn and generalize well in large-dimensional input spaces, which are disasters for many other learning algorithms, and to handle nonlinear feature spaces. The performance of conventional document classification algorithms (described in Section II) is sensitive to feature selection methods, but SVM pays little attention to feature selection, which makes SVM widely used in text categorization and gives it advantages over other classification algorithms. Nevertheless, most research focuses only on classification accuracy under different feature selection methods, and little of it analyzes the influence of indexing methods on computational complexity. This paper will provide some SVM-based text categorization results and analyze the effect of different indexing methods on both the computational complexity and the classification accuracy.

The rest of this paper is organized as follows. A brief review of document classification and SVM will be given in Sections II and III. Section IV discusses the effects of the indexing methods based on the experimental results. Section V will provide some conclusions.

II. DOCUMENT CLASSIFICATION

Text categorization is the problem of automatically assigning one or more predefined categories to free-text documents. Most of the research in text categorization has been transformed into the binary classification problem, in which each classifier only needs to decide whether a document is relevant or not to a predefined category. In this section, we briefly review some common approaches used in text categorization. In this paper we use ci to represent the ith category, dj to represent the jth document, and wkj to represent the term (word) k in document j.

The Naive Bayesian classifier, one of the most widely discussed probabilistic classifiers, assumes the independence of any two coordinates of the document vectors. Thus the estimation of P(dj|ci) can be simplified as

    P(dj|ci) = Π_{k=1}^{|T|} P(wkj|ci),   (1)

where |T| is the number of total words in document j. Then the Bayesian formula can be applied to estimate the probabilities:

    P(ci|dj) = P(ci) P(dj|ci) / P(dj).   (2)

While the "Naive" property of this approach does not always hold in practice, Aas [1] claims the surprising effectiveness of this approach in many circumstances. Lewis [9] gives a detailed review of the different Naive Bayesian classifiers, which relax the binary and independence assumptions. However, the estimation of P(dj|ci) is sensitive to indexing methods.

A decision tree classifier is a tree such that the leaves are labeled by the categories, the internal nodes are labeled by the different terms, and the branches departing from a node are labeled by tests on the weight that the term has in the test document. A document dj is recursively evaluated by each term in the internal nodes until a leaf is reached; the document dj is then classified to the category which labels that leaf. Aas [1] and Sebastiani [10] extensively examined different decision tree learning models including CART, C4.5 and CHAID.

A Neural Network (NN) text classifier is a network of nodes in which the input nodes represent terms, the output node(s) represent the category or categories of interest, and

the weights connecting between nodes represent dependence Equation (3) gets its maximum. Where, α∗ are positive real
relations. One of the most widely used training methods is numbers that maximize Equation (5) under the constrains of
the back-propagation, in which errors are back propagated (6) and (7).
to update the parameters of the network. Multi-layered
1  
n n n
neural networks are widely adopted, in which the high-order
Q(α) = αi αj yi yj < ϕ(xi ), ϕ(xj ) > + αi
interaction between the terms can be solved [12]. However, 2 i=1 j=1 i=1
one of the main disadvantages of NN is that the complexity (5)
of structure and computation load are greatly affected by
the dimension of input. Therefore, the training performance 
n
highly depends on indexing methods. αi yi = 0. (6)
In the discussion above, we have tried to give an overview i=1
of relevant learning approaches proposed for literature text
categorization. There are still many other text categorization
0 ≤ αi ≤ C, i = 1, 2, . . . , n. (7)
methods including K-nearest neighbors classification algo-
rithms, voting algorithms, decision rule-based methods and
Then, the decision function becomes
regression models [1], [3], [10]. However, all the methods
have one common drawback: the classification results are
f (x) = sign(< w∗ , ϕ(x) > +b∗ )
sensitive to the indexing methods. Different indexing meth-

n
ods affect the construction of feature space and the calcula- = sign( αi∗ yi < ϕ(xi ), ϕ(x) > +b∗ )
tion of feature’s values, the estimated functional parameters, i
such as the weights of a neural network, or the structure of
a decision tree, could be different; even though the method Only these samples that make αi∗ = 0 have influence on
that normalizes the text length has been taken into consider- the decision function, which are called support vectors.
ation. Therefore, the classification results of these document A kernel function can be used to substitute the inner prod-
classification methods could vary with the different indexing ucts in Equations (3) and (5) if the kernel function satisfies
methods. This makes the classification process subjective and the Mercer’s Theorem. The optimal problem becomes finding
unstable. Our experiment shows that one of the desirable αi∗ to maximize Equation (8) with respect to the constrains
properties of SVM is that the classification results are not of Equations (6) and (7).
sensitive to the indexing methods.

1  
n n n
III. R EVIEW OF SVM
Q(α) = αi αj yi yj K(xi , xj ) + αi (8)
2 i=1 j=1 i=1
SVM is a type of supervised learning algorithm based
on the structural risk minimization principle from statistical
In this case the decision function is expressed as
learning theory. SVM adopts a linear decision function
f (x) = sign(w · x + b) described by a weight vector w
f (x) = sign(K(w∗ , ϕ(x) > +b∗ )
and a threshold b obtained from a given training sample
n
set Sn : (x1 , y1 ), (x2 , y2 ), · · · , (xn , yn ) of size n. Here xi = sign( αi∗ yi K(xi , x) + b∗ )
represents the features of a sample, and yi ∈ {−1, +1} i
indicates the class to which the sample belongs. Based on the
decision function, SVM can classify new data efficiently and Some useful kernel functions are
accurately. The objective function of the SVM is an optimal • RBF:
hyperplane, which separates two different classes with the
x − y
2
maximum margin. In general, this hyperplane corresponds K(x, y) = exp{− } (9)
2σ 2
to a nonlinear decision boundary in the input space. By
choosing a non-linear mapping ϕ(x), the input space can • Polynomial:
be transformed into a higher dimension space, the feature
space, where the two classes could be linearly separated [2]. K(x, y) = (< x, y > +1)d (10)
The minimum distance between the closest vector and the
hyperplane can be expressed as: • Sigmoid:

r = mini yi [< w, ϕ(xi ) > +b] (3) K(x, y) = tanh(η < x, y > +c). (11)

It can be proved [11] that when In the experiments, all of these three kernel functions were
examined respectively. The experimental results (Tables II–

w = αi∗ yi ϕ(xi ) (4) V) show that their performances are very similar. This proves
i that SVM methods are not sensitive to the kernel functions.


TABLE I
IV. T HE I NFLUENCE OF I NDEXING
S ELECTED SUB - DATABASE
The database we used is the Reuters-21578, which con-
tains 21 categories. 10 categories are selected as the training Category Number of files Category Number of files
data and testing data. The detailed information about these Earn 3799 Trade 517
categories is described in table 1. Acq 2213 Interest 426
Money-fx 689 ship 303
The sub-database is randomly separated into two parts, Grain 580 Wheat 288
the training data that contains 6055 files, and the testing data Crude 573 Corn 224
that contains 2594 files. All the experimental results given in
Tables II–V are the average of 20 independent experiments.
TABLE II
For each experiment, the training data and the testing data
C LASSIFICATION RESULTS WITH txc- INDEXING
are chosen randomly. Since the SVM could cope with the
problem of high dimension input vectors, all the words in
RBF Polynomial Sigmoid
the training data are used as the input features. In this case, Earn 98.4/97.5 98.5/97.4 98.3/97.2
the dimension of input vectors is 13845. Acq 96.1/96.9 96.0/96.7 95.9/96.6
Feature selection is critical for many classification algo- Money-fx 693.2/86.2 93.2/86.2 93.1/85.8
Grain 96.7/90.3 96.7/90.7 96.7/90.7
rithms mentioned in Section II. High dimensionality of input
Crude 93.0/85.4 93.0/85.4 93.0/85.9
feature space makes these algorithms computationally costly Trade 92.7/83.6 92.7/84.1 92.7/84.1
or sometimes not feasible. There are various feature selection Interest 86.7/84.4 86.7/84.4 85.4/83.7
methods to reduce the dimensionality of the feature space. Ship 92.1/80.9 92.2/81.7 92.2/81.7
Wheat 96.8/81.8 96.8/81.8 96.8/81.8
Information gain, χ2 -statistic, mutual information, document Corn 91.7/45.8 85.2/48.0 85.2/48.0
frequency threshold, and indexing are five important methods
[1]. However, most dimensionality reduction methods will
lead to loss of some information about the data and cause the
classification errors. For SVM, the computational complexity assigned to this category; c is the number of the documents
is influenced very little by the dimension of the feature incorrectly rejected from this category.
space. The only influence is the calculation of “dot product” Tables II and III show the results of SVM classification.
in kernel function. The input vectors will become a scalar Although each SVM-based classifier can only separate two
with the kernel function or dot product. So the procedure of classes, the combination of a number of SVM-based classi-
feature selection and reduction are not important for SVM. fiers can deal with the problem of classification for multiple
Compared with SVM, neural network based classifiers suffer classes data. The test results are organized with different
a dramatic increasing of computational complexity with the combinations of indexing methods and kernel functions. In
increasing of dimension of input data. In this case, feature the experiment, the C library SVMlight [8] is used to train
selection and reduction are very important. and test the SVM classifiers.
In this experiment, two indexing methods are used to In Tables II and III, the data are presented as: in each entry,
create the indexing of the word-by-file matrix. the first value is the precision and the second value is the
• txc-indexing: recall (i.e., precision/recall), and the values are the average
of 20 experiments.
T F (wi , d)
xi =  Some conclusions can be derived from Tables II and III.
2
j T F (wj , d) The first conclusion is that the SVM has many advantages
compared to other methods for text categorization, for ex-
• tfc-indexing: ample, the high accuracy of the classification results, the
T F (wi , d)log( DF|D|
(wi ) )
insensitivity to the different type of the kernel functions,
xi = 
|D|
j T F (wj , d)log( DF (wi ) ) TABLE III
Here, T F (wi , d) is the term frequency of wi in the document C LASSIFICATION RESULTS WITH tfc- INDEXING
d; DF (wi ) is the document frequency of wi , and |D| is the
cardinality of the document set D. RBF Polynomial Sigmoid
The precision and recall are used to evaluate the testing Earn 98.7/96.6 98.8/96.6 98.5/96.6
Acq 96.0/96.5 95.9/96.4 95.8/96.4
results and them are defined as: Money-fx 93.0/87.7 95.9/87.7 92.5/87.0
a Grain 96.0/91.6 95.8/91.6 95.9/92.0
P recision = Crude 93.5/87.5 93.5/87.9 93.6/87.9
a+b
Trade 93.1/82.6 92.6/83.6 92.6/83.6
a Interest 83.1/80.3 81.5/81.0 81.6/81.6
Recall =
a+c Ship 92.2/81.7 92.2/81.7 92.2/82.6
Wheat 95.8/82.7 94.9/83.6 94.9/83.6
where, a is the number of documents correctly assigned to Corn 81.5/45.9 81.5/45.8 81.5/45.8
this category; b is the number of the documents incorrectly


TABLE IV
and tolerance to a wide range of the functional parameters.
T HE NUMBER OF SUPPORT VECTORS AND TRAINING TIME WITH
These characteristics have been intensively discussed in the
txc- INDEXING
previous researches [6], [7], and we only analyze our results.
In our experiment, the average precision and recall of the RBF Polynomial Sigmoid
classification results reaches 92.7 % and 83.5 % respectively. Earn 1330/13.7 1274/11.97 1218/12.44
Further study reveals that the highest error rate occurs Acq 1695/16.9 1628/17.74 1552/15.2
in the category Corn, which is the smallest category in Money-fx 741/8.54 702/7.35 678/7.73
Grain 953/10.15 857/8.41 809/8.8
our experiment. Since the sample documents are randomly Crude 916/10.27 855/8.9 802/9.05
selected, relatively few documents that belong to the small Trade 717/8.42 622/7.09 638/7.49
categories are chosen. Therefore, the classifiers for the small Interest 670/7.44 636/6.36 615/7.77
Ship 576/6.66 528/5.57 492/5.6
categories are under-trained, which causes higher error rate.
Wheat 440/5.16 393/4.27 352/4.28
One of the solutions is to give more weight to the small Corn 450/5.07 422/4.2 401/4.51
categories, which makes the classifiers of the small categories
more sensitive.
TABLE V
The experimental results also demonstrate that the accu- T HE NUMBER OF SUPPORT VECTORS AND TRAINING TIME WITH
racy of the classification result is not sensitive to the type of tfc- INDEXING
kernel functions. In tables II and III, each row represents the
classification results of three popular kernel functions for a
RBF Polynomial Sigmoid
certain category. Obviously, their precision and recall values Earn 1900/22.4 1779/21.92 1701/17.13
are very similar. Acq 2141/21.13 2032/19.49 1961/21.33
The experimental results also highly tolerance to wide Money-fx 945/10.28 893/8.88 852/9.2
Grain 1323/13.32 1207/11.09 1122/11.2
ranges of parameters. In our experiments, the ranges of Crude 1204/12.48 1120/11.04 1048/10.9
parameter values can be chosen widely, for example, γ = Trade 955/10.41 886/8.79 833/9.05
(0.001 ∼ 9), (for RBF kernel function); d = (1 ∼ 10), Interest 850/9.05 780/7.64 753/8.17
Ship 777/8.57 706/7.29 660/7.3
(for polynomial kernel function); η = (0.01 ∼ 1), and c =
Wheat 657/6.95 587/5.64 546/5.73
(0.01 ∼ 5) (for sigmoid kernel function). The explanation Corn 505/5.48 474/4.16 444/4.81
of these parameters is given in [11]. The experimental results
with wide parameters ranges were very similar, but only the
results with γ = 5, d = 2, η = 0.1, and c = 1 are given
because of the space limitation. It is another advantage of positions of training data and affect the training and running
SVM, high robustness, which makes SVM easy to choose time complexity. Yang’s work [12] further supports our
parameters, and makes the algorithm more robust than other argument by indicating that the computational complexity
learning methods. of SVM depends on the number of support vectors.
The second conclusion from our experiment is that the In summary, our work demonstrates that the classification
accuracy of the classification is not sensitive to the index- accuracy of SVM is not sensitive to the indexing methods
ing methods while the computational complexity could be in the text categorization, which guarantees the SVM-based
significantly affected. The phenomenon has not been well classifiers more stable and objective than other classification
discussed in the previous researches. In this experiment, the algorithms. Furthermore, we also reveal the relation between
effects of two types of popular indexing methods, txc and the time complexity of SVM and the indexing methods,
tfc are discussed. Comparing the classification accuracy of which could be used to evaluate the different indexing
each category shown in Tables II and III, we find that the methods.
differences between their precisions and recalls are trivial V. C ONCLUSION
but the computational cost of the different methods varies
significantly. In order to explain the difference, more detailed Some conclusions of SVM can be summarized from the
experimental results are provided. Tables IV and V give the above experiments.
experimental results in the form of (the number of support • The accuracy of SVM classification is not sensitive to
vectors)/(time used to train a SVM classifier). Clearly, SVM different kernel functions, and SVM has wide range of
classifier has more support vectors and needs more training tolerance to the functional parameters.
time with tfc indexing than with txc method in almost all the • The accuracy of SVM classification is not sensitive to
cases for the given database in our experiment. the different indexing methods.
The explanation for this phenomenon is that tfc indexing • The computational complexity of SVM could be af-

generates more support vectors than txc indexing in our ex- fected by the indexing methods because of the different
periments, thus increases the time complexity. Generally, the number of support vectors generated by different index-
number of support vectors is determined by the distribution ing methods.
of the training data. Different indexing methods will create The scalability, robustness, and shorter training and run-
different word-by-file matrices, which will determine the ning time make SVM more useful in real time classification


than other methods. Indexing method has a great influence
on training and running time. So a suitable indexing method
is very important to real time applications of SVM-based
classifiers. However, the quantitative relationship between the
indexing and the computational complexity is still open to
discuss in our future work.
R EFERENCES
[1] Kjersti Aas, Line Eikvil, “Text categorization: A survey”, Technical
Report, No. 941, Norwegian Computing Center, 1999.
[2] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector
Machines, Cambridge University Press, 2000.
[3] R. O. Duda, P. E. Hart, Pattern Classification and Scene Analysis, Wiley,
New York, 1972.
[4] Steve R. Gunn, “Support vector machines for classification and regres-
sion”, Technical Report, Faculty of Engineering and Applied Science,
Department of Electronics and Computer Science, May, 1998.
[5] D. A. Hull, “Improving text retrieval for the routing problem using
latent semantic indexing”. In Proceedings of SIGIR-94, 17th ACM
International Conference on Research and Development in Information
Retrieval, pp.282-289, 1994.
[6] Thorsten Joachims, “Text categorization with support vector machines:
Learning with many relevant features,” Proceedings of ECML-98, 10th
European Conference on Machine Learning, No. 1398, pp. 137-142,
1998.
[7] Thorsten Joachims, “Making large-scale SVM learning practical”,
www.cs.cornell.edu/P eople/tj/publications/joachims9 9a.pdf .
[8] Thorsten Joachims, “SVMlight”, http://svmlight.joachims.org
[9] D. D. Lewis, “Naive (Bayes) at forty: The independence assumption
in information retrieval”, In Proceedings of ECML-98, 10th European
Conference on Machine Learning, pp. 4-15, 1998.
[10] Fabrizio Sebastiani, “Machine learning in automated text categoriza-
tion”, ACM Computing Surveys, Vol. 34, No. 1, pp. 1-47, March 2002.
[11] Vladimir Vapnik, Statistical Learning Theory, Wiley, Chichester, GB,
1998.
[12] Y. Yang, X. Liu, “A re-examination of text categorization methods”,
In Proceedings of SIGIR-99, 22nd ACM International Conference on
Research and Development in Information Retrieval, pp. 42-49, 1999.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 694--698
Copyright@2007 Watam Press

Research on a New Method of Processing Distributed Data


Wang Zhiguang, Chen Ming, Liu Lifeng
Department of Computer Science and Technology
China University of Petroleum, Beijing 102249
E-mail:cwangzg@cup.edu.cn

Abstract: In traditional Web Distributed Application, ever-increasing number of users.


we usually emphasize the server, and most functions are (2).The interaction ability and responding speed of
finished by the server. This not only causes huge load to the client cannot meet the demand of office users, for
the server, but also makes the server easy to be attacked. example, it may take you a dozen of minutes to wait for
So the server may have serious potential dangers of the establishment of a large datum to finish.
security and stability in it; on the other hand, the (3).The server is unstable, and often fails to respond,
functions at the client are often too simple, generally just which is difficult to maintain.
downloading the web page to browse. Resources at the (4).The server may generate serious potential safety
client have been ignored and wasted seriously. For such problems. For example, continuous submitting
a reason, this paper proposes a new method based on the unfinished data processing tasks by the user is very
Queuing model to dispose the data on the Web. By possible to lead to the delay of the server.
utilizing Component technology and XML technology, it To solve the above-mentioned problems, this
transplants some of the data disposing work to the client. article has proposed a highly efficient distributed
Through cooperation between the Server and the Client, processing technology. By using distributed component
data in the database system can be disposed and stored technology and XML technology, it moves part of the
efficiently. data processing work to the client, to finish the data
processing and put the data into the database through
cooperation between the server and the client;
1. Introduction furthermore, it administrates the user tasks by a
application queuing model at the server , to make the
The distributed application on the basis of Windows is whole system efficient and stable. Both the analyzing
now widely used, with self-management, reliability, with probability theory and queuing theory, testing its
usability, elasticity and interoperationality and so on as performance index in the practical use, we proved that
its common goal to pursue. The commonly accepted it has greatly improved the overall efficiency, stability
solution by IT industry at present is the platform of DNA and safety of a system [2,6,7].
and .NET put forward by Microsoft, namely the
component-basing distributed development on the 2. Construction of the model
Windows platform by using the built-in service of
COM+ and windows2000/XP. With regard to the above-mentioned goal of the system,
Theoretically, this kind of application system model the following technological blueprint is put forward:
boasts great flexibility, meaning that it can dispose the Firstly, the data processing work should be treated by
display of application system, i.e. the distribution mode certain algorithm or transplantation to increase the data
of its components and the application pattern of clients, processing efficiency. Secondly, to make the server
according to the scope in which it is used. However, the work efficiently, the queuing theory from the field of
commonly used distributed application solution of today midstream amount of the network and congested theory
is usually confined to the thinking way of the “thin” is used, that is to say, regarding large task processors as
clients, and attaches all processing capacity of the waiters who share a queue. The number of waiters can
system to the server, leaving only simple functions such be configured according to system capacity and each of
as browsing, downloading etc to the client. But in them offers service to the arrived tasks. When a task
practical use, the application scopes of system comes, it will be refused if the task fails to meet the
application differ one from another. For example, the service qualification. On the contrary, if it does meet,
office system basing on Web at enterprise level has a one waiter will serve and so the task is immediately
very different demand on system capacity from the one submitted. If all waiters are busy (i.e. the task number
of an e-commerce platform. One of the most prominent reaches its limit), a queue will come into being by a
differences is that the former demands rich Office certain rule. As soon as a waiter becomes available, one
service-oriented client-end interaction and rapid task in the queue will be taken out according to the
response, in contrast, the latter only needs normal service rule and handed over to the waiter. The
interaction and limited response speed that can meet above-mentioned model is as Figure 1 shows.
their demand.
After developing some large-scale distributed 3ˊThe data processing modes in
application projects, the author realizes that conventional the system
solutions may produce many disadvantages, such as
those in the application of Web-basing office system, In order to specialize data processing, here we will take
which are due to the over-concentration of application Web office automatic system as an example to illustrate
logic on the server. They are briefly listed as follows: the fact that two processing modes of Documents and
(1).Because the server is overloaded, even the Report Forms are adopted in practical development of
component balancing technology cannot support the


large-scale Web office automatic application projects, and realized an optimized server-end processing mode
due to the difference in data processing modes. through study, development and application. The mode
will be introduced in following texts.

DHTML BROWSER 3.2 Mode of Documents and Report Forms


Client emphasizing Client
Resources ActiveX
In this article, part of the data processing work is
XML transplanted to the Client by using distributed
component technology. Here it is exemplified by the
COM/DCOM
processing of Web office Documents and Report Forms.
To be specific, the Client automatically call macro with
HTTP components and load XML data from record set by
components. After parsed, these data are sent back to
front users to use. Meanwhile, the user can conduct
some complex operations such as result operation
Task Queue
through automatic call. Comparing to the server-end
processing mode, this mode reduces greatly the load of
ASP/IIS/MTS Database the server by shifting complex application processing
XML File
to the client with the server’s job only to collect data
and produce XML document. What’s more, because the
Components Service
data are sealed in XML file and users cannot get the
detailed information of data connection, the server
ADO/RDO/ODBC becomes safer, more reliable and immune to user’s
malicious attack by continuously submitted tasks.
Other Service Besides, users can load data from XML by components
at the client , send them back to front user to use and
Fig 1 Description of the Model user can automatically call to conduct complex
operations such as result operation. COM/DCOM may
help to finish these tasks without sacrificing the ability
ķ Ordinary server-end processing mode of set up in the proscenium environment such as Office
Documents and Report Forms and the whole maintainability of the system; and make
ĸ Mode of Documents and Report Forms it possible for users to customize the expression of the
emphasizing the client data freely [3,4].
The ķmode is commonly used now. Since it is The description of the processing on the Sever end
referred to in almost every Office VBA book, here this and on the Client described as below.
article makes brief introduction only. However, theĸ Server ˖
processing mode is developed by the author on the If Operation=Query Then
basis of actual conditions and practical use, so its //If present operation is querying
algorithms and processing course are explained in Sql=Request //the condition of the query
detail in this article. Rs.open Sql, Conn
//create ADO object and Obtain the data
3ˊ1 Ordinary server-end processing mode of Set Dom =new DOMDocument
Documents and Report Forms //create dom document object
Rs save Dom
As mentioned above, this mode can easily reflect the //store the require data into Dom Document object
data into database in Word or Excel Documents and Post Dom
Report Forms. The working course can be described as //send the Xml document to the XML file Sever
follows[3]˖ //with //POST Method
(1).Establish Word (Excel) object and open the URL= Accept Document
document template to be operated //Receive file, save, send back URL
(2).Read data from database and reflect them into Post URL //Send the target url to the ActiveX
document //Components on //Client with post method
(3).Save the file and provide user with wanted If Operation=Update Then
Documents and Report Forms //When user’s operation is Update Data
Ordinary server-end processing mode of Update Database
Documents and Report Forms is rather stable. //Update Database
Therefore, concerning those file report files with Create NewDom
operation result, this mode can assure the consistency //Create New XML Document
and integrity of data. So long as the data are inputted Error: //Error Disposing
through Web page first and then the form is produced Disposing Error
following certain procedures, the user need not worry End // Algorithm End
that the data in the form do not agree with those in the Client ˖
database. However, there is a drawback in this kind of Check Version
processing mode, that is, the speed is relatively slow. In //Check Component version
order to solve this problem, this article has proposed If version=New or Unregistered Then


//Component is new or unregistered submit in TL time
install Component PresentNum –The number of active servers at
//Get new component from the Components present
//Server Count –The number of tasks that behind the last
Set Dom =new DOM Document rejected task
//Create DOM Document Object TP -- Time to process the task
Dom =load URL Regular parameters˖
//Load XML file on the XML Server Wq – Weight of the queue
//asynchronously QHmin-- The limited ideal length of the queue
event˖ (no task rejected)
//When the Dom’s loading state change QHmax – The limited maximal length of the
If URL=Updated Then queue
//If Data is updated SVRmax –The limited number of the Task
Dom =Reload URL Processing Server
//Reload XML and Resolute TOmax – The limited time of task processing of
If DomState=COMPLETED Then overtime
//When the loading of the XML document is TTminˉThe limited minimum time interval that
//completed user submit task between two times
If Operation=Create Report then TNmaxˉThe limited maximal tasks number that
//If user select creating Documents user submit within a given time
//and Report Forms Other parameters˖
Parse Dom Pa – The probability refused by the system of the
//Explain or Parsing Dom Document present task
Create Document or Report Pb – The temp used probability
//Open User’s Document or Report and Create
Ǐ ǐˉProcess asynchronously
//New File
Else qˉThe present length of queue
Start OtherWork
//User can start other work asynchronously Description of Algorithm:
End // Algorithm End Initialize˖
avg =0
TT(i)=Big Int //Very big integer such as 65535
4ˊRealization of the model TN(i)=0
PresentNum=0
In real project, the client-end processing mode enjoys For each Task
obvious advantages. Although the following model is //For each task needed to be controlled
based on this mode, as a whole system it must achieve Calculate TT(i) and TN(i)
the goals listed below: If TT(i) <TTmax or TN(i) >TNmax
(1). Avoiding the possibility that some users Reject Serving And Warn User(i)
continuously submit their tasks on purpose and other //Reject serving and feedback to the present user
users cannot get served. Else If q>0 or PresentNum = SVRmax
(2). If some service is destroyed, the queue should //When queue is not empty or PresentNum = SVRmax
be able to continue and abort those services which has avg=(1ˉWq )avg+ Wq q
no response. //Accord the number of avg to decide
(3). Avoiding some problems such as task //reject or queue the task
congestion in order to make the whole system capable If avg > QHmax
of processing the submitted task within a short time Reject Serving And Warn User(i)
and make the server capable of processing enough //Reject serving and feedback to the user
tasks. If avg<QHmin
(4). Avoiding the synchronisms of the users, Queueing Task
meaning that it can inform the connected user or refuse count= –1
to serve if necessary. Else If QHmin İavgİ QHmax
(5). Restricting the length of the queue, to control count=count+1
the delay of average user task. Pb=Pmax(avg –QHmin )/( QHmax –QHmin)
Referring to queuing theory [5,6,7] and Pa=Pb/(1–count * Pb)
experiences in the developing application, we put With Pa //With Probability Pa
forward the algorithm to maintain the queue. For the
Reject Serving And Warn User(i)
convenience of description, the follow symbols and //Reject serving and feedback to the
their meanings are given.
//present user
Reserving variables: count=0
avg—Average length of the queue
Else With 1-Pa //with Probability 1-Pa
i – User ID Queueing Task
User(i) ˉUser of i //Put the task in rear of the queue
Task --Task that user submit //and wait to be disposed
TT(i)ˉTime interval of submitted tasks of user i //processing server to dispose
TL –Standard time interval of the system to serve q=q+1
TN(i) ˉ The number of tasks that the user i Else


Start Disposing Task goes whit the accurate number of avg and increases with
//accord the task type to start avg closing to QHmax; When avg is in this area, it
ǏCalculate TP //Record The disposing time abandons the tasks submitted with probability Pa, and
If Error Or TP>Tomax notify the user; Otherwise with probability 1-Pa to queue
//If Disposing error or TP>TOmax the tasks. The purpose of parameter count is to avoid that
End Disposing //Stop Disposing tasks abandoned or succeed gathering in succession.
PresentNum=PresentNum-1ǐ In order to carry on the comparison of
If Within TL//If beyond the limited time TL performance, we need to know average processing time
TN(i)=TN(i)+1 Ts of the three task processing systems. In order to
//Count of the task of user I receive the values, through a survey of the Tarim oil field
Else experiment measure center, we find that the scale of the
TN(i)=0
data quantity which the user dealt with remains 40-60
PresentNum=PresentNum+1
When avg=0 //When queue is empty records basically. The following is the statistics result of
q=0 consuming time about server-end in the case that dealing
End //Algorithm End with a document the scale of which is 50-record and
using two processing mode separately (systematic
Note: Among above-mentioned algorithm, environment: Web server: CPU Pentium 4/2.4GHz,
According to the probability theory [6], Probability Po memory 512MB, Windows 2000 Server; Customer's
that drew tasks and was refused is: quantity is 60: CPU Pentium 4/2.4GHz, memory 256MB,
Po= the busying of the probability that all the Windows 2000, IE 6.0), as table 1and 2 shows.
Waitershfull probability of the queue 5ˊExperiment
According to the theory of queuing [6], There are By the model and the Algorithm we have mentioned
the following formulas: above in this paper, we compared the two kind method
²˙(¬hTs)/N about the Browser and Server, shown as table 1. The
N 1
(NU )I deference efficiency is clear.
. ¦
I 0 I!
Pasong RatioFunction
N
(NU )I Table 1 time consuming when the server create
¦
I 0 I! Documents or Reports(chronomere for minute)
Erlang C Function˙the probability of that all the Time Convertional Cient Disposing
consuming Disposing
Waiters are busy˙C˙ 1  K  Mode˄M˅
type Mode˄M˅
1  UK
Consuming
And Among above-mentioned algorithm,
time of 5.1 2.2
whenever one new demand data processing task reach Server(second)
FIFO queue, the algorithm carries out the following
function:
(1).From the user’s IP address, it confirms user’s 6ˊConclusion
identity and calculates TT (i) and TN (i) to judge present
user whether meets the condition to be served: A certain
Starting with some defects in present Web Office
time interval that a user submits two tasks beyonds
Automated System, this article proposes a new
TTmin;The number of the tasks is smaller than TNmax
distributed data processing model basing on the
within time of TL. If the user is unqualified with the
Distributed Processing Technology, XML technology
condition to be served, it refuses to server and notify the
and Queuing theory. The inventive points of the model
user, otherwise puts the task into the queue to be
lie in its high efficiency of data processing by using
managed.
Component technology and XML technology; and
(2).It judges whether the queue is empty and
from the perspective of the whole system it also puts
present number of active Task Processing Servers
forward a system that the tasks should be managed in a
PresenNum. If the queue is empty and PresentNum <
unified manner through queuing system. Through
SVRmax, then the system process the task, otherwise put
referencing the present mature theory and methods of
the task into the queue.
network such as flow and congestion control,
(3).Calculating the average length of the queue
combining with some characteristics of Web system, a
avg.The destination of introducing parameter Wq is that
specific queue manage algorithm of tasks is put
when the tasks amount reach temporarily in a short time
forward. The algorithm realizes the dynamic
is very big , The algorithm makes no reactions. By
management of tasks submitted by users, and achieves
allowing breaking length, it enhances the stability of the
the goal of stability, security and efficiency of the
system.
system.
(4).If the queue is not empty, it calculates the
average length of the queue avg, then it compares avg
with two limited length as QHmax and QHmin. If avg is Reference
greater than QHmax, it considers that the present amount
1. Hans-W.Gellersen&Martin Gaedke,
of tasks submitted is too great, so it abandons the task. If
“Object-Oriented Web Application Development”,
avg betweens the two length, it considers that the tasks
January-February 1999, IEEE Internet Computing.
may in congested area, In this area, it calculates Pa which


2. Don Benage, Azam Mirza, Using Visual Studio,
Que Corporation.
3. Professional Active Sever Page 3.0. Richard
Adnerson et al. 1999
4. Professional Windows DNA [America] Christopher
Blexrud etc. 2001
5. HIGH-SPEED NETWORKS TCP/IP and ATM
design principle [America] William Stallings 1998
6. Queueing Systems,Volume I: Theory. New
York:Wiley 1975
7. Queueing Systems,Volume II: Computer
Application .New York:Wiley 1976
8. Hans-W.Gellersen&Martin Gaedke ˈ
Ā Object-Oriented Web Application
Development āˈ January-February 1999 ˈ IEEE
Internet Computing.
9. Fonseca F.T., Egenhofer M.J., Ontology-driven
information systems. In The Proceedings 7th ACM
Symposium on Advances in GIS, Medeiros C.B.
(Ed.), Kansas Sity, MO, 1999:14-19.
10. Josefa Z. H, Juan M. S, Knowledge-based models
for emergency management systems, Expert System
with Application,2001(20): 173-186.
 Zhiguang Wang,Ming Chenˈ You Luˈ Xiaoxue
Zhong. The Real Time Data Model Based on COM
Technology. Journal of Information &
Computational Science 1:3.2004.12:335-340
12. Shim, J.P., Warkentin, M., 2002. Courtney, J.F.,
Power, D. J., Sharda R, Carlsson C., Past. present
and future of decision support technology, Decision
Support Systems 33:pp.111–126


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 699--702
Dynamics of Continuous,
Copyright@2007 Watam Press Discrete and Impulsive Systems
Series B: Theory and Applications
\LNU\PX
Special Volume: Advances in Neural Networks–Theory and Applications
Copyright c 2007 Watam Press

1
Triangles with median and altitude of one side coincide
Dong-hai Ji2 Jun-jing Jia Sen-lin Wu

Department of Applied Mathematics, Harbin University of Science and Technology, Harbin Heilongjiang 150080, China

AMS subject classifications: 46B20, 52A10, 46E30,

Abstract: In this paper, the geometrical Theorem 1.1. If triangles with median and altitude
constant D (X) is introduced to characterize the differ- of one side coincide are isosceles triangles, then the
ence between Birkhoff orthogonality and isosceles or- normed linear space is Euclidean.
thogonality. It is showed that 0 and 1 are the lower
The above theorem shows that, for general normed
and upper bounds for D (X), respectively. The spaces
spaces, triangles with altitude and median on one side
of which D (X) attains the upper and the lower bounds
coincide may not be isosceles triangles, i.e. the lengths
are characterized. The relationship between D(X) and
of two legs are different. To characterize this difference,
D (X) and the attainability of D (X) are discussed.
we introduced the following constant:
D (X) are also calculated when X = (R2 , · p ), X =
(R2 , · 8 ) and X = l1−2 , respectively. D (X) = sup{| x + y − x − y | : x, y ∈ S(X), x ⊥B y}
To characterize the difference between isosceles orthorg-
onality and Birkhoff orthogonality, Donghai Ji and Sen-
1 Introduction lin Wu[1] introduced the following constant:
j ff
It is well known that, in Euclidean plane, if the D(X) = inf inf |x + λy| : x⊥I y, x, y ∈ S(X)
λ∈R
altitude and median of one side of a triangle coincide,

then the triangle is an isosceles triangle. To study the and showed that 2( 2 − 1)  D(x)  1, D(X) = 1 iff
analogue of this property in normed planes, we have the underlying space is Euclidean. If there exist√ x, y ∈
to generalize the definition of triangles, altitudes and S(X) with x⊥T y such that inf λ∈R |x + λy| = 2( 2−1),
medians to normed planes: iff there exists x0 ∈ S(X) where x0 is the common
Let x, y, z be three affinely independent points on end point of two segments√ on S(X) whose lengths are
a normed plane (i.e. real two-dimensional normed lin- greater that or equal to 2.
ear space), then the set ∂conv{x, y, z} is said to be a Our constant is similar to D(X) introduced by Ji
triangle, the segments [x, y], [y, z], [x, z] are said to be Donghai, Wu Senlin in [1], geometrically, D(X) char-
the sides of the triangle, and x, y, z are said to be the acterize the difference between the altitude and median
vertexs of that triangle. A triangle is said to be an on the base of an isosceles triangle, while our constant
isosceles triangle iff at least two of the three sides are characterize the difference of the legs of triangles with
with equal length, and the sides with equal length are altitude and median on one side coincide. Both D(X)
said to be the legs of that triangle. [x, w] is the median and D (X) can be viewed as quantitative characteri-
of the triangle on [y, z] where w = y+z 2
. zation of the difference between Birkhoff orthogonality
To define altitude in normed linear spaces, we need and isosceles orthogonality.
to introduce the Birkhoff orthogonality. Let X be a real
normed linear space, for any x, y ∈ X, x is said to be
Birkhoff orthogonal to y (x ⊥B y) iff x + αy ≥ x 2 Lower and upper bounds
holds for all α ∈ R[2]. of D (X)
The segment [x, w], where w ∈ {λy + (1 − λ)z : λ ∈
R}, is said to be one (the altitude may not be unique) Definition 2.1. For any Banach space X, D (X) is
of the altitudes on [y, z] iff x − w ⊥B y − z. said to be attainable, if there exist x, y ∈ S(X), such
Another type of orthogonality, namely isosceles or- that x ⊥B y and | x + y − x − y | = D (X).
thogonality, was introduced by R. C. James as:
Theorem 2.1. Let X be a real Banach space with
x is said to be Isosceles orthogonal to y (x ⊥I y) dim(X) ≥ 2. Then
iff x + y = x − y [3].
It is prove in[4], if x ⊥B y ⇒ x ⊥I y for any (1) 0 ≤ D (X) ≤ 1.
x, y ∈ S(X), then X is Euclidean. Thus we can eas-
ily get the following theorem: (2) D (X) = 0 iff X is an inner product space.
1 Supported by the National Natural Science Foundation of China (10671048) and the Foundation of Hei Longjiang Education

Committee
2 Corresponding author, E-mail address:jidonghai@126.com


(3) D (X) = 1 is attainable iff there exists a two- Hence x is the common endpoint of two segments on
dimensional subset X0 of X, and x is the common end- S(X0 ) of which the lengths are greater than or equal to
point of two segments on S(X0 ) of which the lengths are 1.
greater than or equal to 1. Sufficiency. Suppose now that there exists a two-
dimensional subspace X0 of X, x, y, z ∈ S(X0 ) and
Proof. (1) Since x ⊥B y, x + y ≥ x = 1. y = z such that [x, y], [x, z] ⊂ S(X0 ) and x − y =
By the Triangle Inequality, x − z = 1.
It is obvious that
x + y ≤ x + y = 2
x + y = 2.
we can show that
Let
1 ≤ x + y ≤ 2 f (t) = x + ty ,
then f is a convex function with respect to t and
Similarly
f (−1) = f (0). Thus f attains its minimum 1 on [−1, 0],
1 ≤ x − y ≤ 2
hence
Hence x + ty ≥ 1 = x , (∀t ∈ R)
0 ≤ | x + y − x − y | ≤ 1 which shows that there exist x, y such that x ⊥B y and
which implies | x + y − x − y | = 1.
0 ≤ D (X) ≤ 1
Corollary 2.1. D(X) = 1 iff D (X) = 0

(2) Suppose that D (X) = 0, then Birkhoff orthogonal- Corollary 2.2. For √ any finite-dimensional Banach
ity implies isosceles orthogonality on S(X). By[4], X space X, if D(X) = 2( 2 − 1), then D (X) = 1.
is an inner product space. Since those two orthogonal-
ities coincide in an inner product space, the converse
also holds.
3 Constant D (X) of lp2 space
(3) Necessity. Suppose that there exist x, y ∈ S(X0 )
Theorem 3.1.
such that x ⊥B y and (˛" !p
˛
 ˛ 1 tp−1
| x + y − x − y | = 1 D (lp2 ) = sup ˛ 1 +
˛ (1 + tp ) p
1
[1 + tp(p−1) ] p
Without lose of generality, we can assume that
!p # 1
p
x + y ≥ x − y 1 t
+ 1 − 1
[1 + tp(p−1) ] p (1 + tp ) p

Then we have
" !p
x + y = 2, x − y = 1, 1 tp−1
− 1 − 1
(1 + tp ) p [1 + tp(p−1) ] p
hence ‚ ‚
‚x + y‚ !p # 1 ˛˛ 9
‚ ‚ = 1. =
2 t 1
p
˛
+ + ˛ : t ∈ [0, 1]
By the convexity of the unit ball, 1 1 ˛ ;
(1 + tp ) p [1 + tp(p−1) ] p ˛
[x, y] ⊆ S(X0 )
Proof. Let x = (α, β) ∈ S(lp2 ), then x∗ = (αp−1 , β p−1 )
Since satisfies
x∗ (x) = x∗ x = 1.
λx + (1 − λ)(x − y) = x − (1 − λ)y ≥ x = 1 Without lose of generality, we can assume that α ≥ β ≥
(∀λ ∈ R) 0 and α = 0.
Let
and x, x − y ∈ S(X0 ), by the Triangle Inequality, β
= t,
α
λ0 x + (1 − λ0 )(x − y) ≤ 1, (∀λ0 ∈ [0, 1]) then
αp (1 + tp ) = 1,
hence
that is
1
λ0 x + (1 − λ0 )(x − y) = 1(λ0 ∈ [0, 1]). α= 1
(1 + tp ) p
That is t
β= 1
[x, x − y] ⊆ S(X0 ) (1 + tp ) p


"˛ ˛p
1 1
, − β p−1 ˛ ˛
Let y = a( αp−1 ), where a > 0, then y is an el- ˛ 1 tp−1 ˛
x − y = ˛ 1 − ˛
ement to which x is Birkhoff orthogonal. If we require ˛ (1 + tp ) p 1
[1 + tp(p−1) ] p ˛
further that y ∈ S(lp2 ), we can obtain
˛ ˛p # 1
˛ ˛p ˛ ˛ p
˛ ˛ ˛ t 1 ˛
˛ a ˛p ˛˛ −a ˛˛ + ˛
˛ (1 + tp ) p1
+ 1 ˛
˛
˛ p−1 ˛ + ˛ p−1 ˛ = 1 [1 + t p(p−1) ]p
α β
" !p
„ « 1 tp−1
= 1 − 1
1 1 (1 + tp ) p [1 + tp(p−1) ] p
ap + =1
αp(p−1) β p(p−1)
!p # 1
p
Where t 1
+ 1 + 1
1 (1 + tp ) [1 + tp(p−1) ]
= (1 + tp )p−1 p p

αp(p−1)
Thus
(˛" !p
˛
 ˛ 1 tp−1
D (lp2 ) = sup ˛ 1 +
1 (1 + tp )p−1 ˛ (1 + tp ) p
1
[1 + tp(p−1) ] p
=
β p(p−1) tp(p−1)
!p # 1
p
Hence 1 t
+ 1 − 1
tp(p−1) [1 + tp(p−1) ] p (1 + tp ) p
ap =
(1 + tp )p−1 (1 + tp(p−1) )
" !p
1 tp−1
− 1 − 1

p−1
(1 + tp ) p [1 + tp(p−1) ] p
t
a= 1 1
!p # 1 ˛˛ 9
(1 + tp ) q [1 + tp(p−1) ] p p
˛ =
t 1 ˛ : t ∈ [0, 1]
+ + ˛
Then (1 + tp ) p
1 1
[1 + tp(p−1) ] p ˛ ;

!
1 t
x = (α, β) = 1 , 1
(1 + tp ) p (1 + tp ) p Corollary 3.1. lim D (lp ) = 1
p→∞

„ «
Attainability of D (X)
1 1
y = a
αp−1
, − p−1
β 4
! Lemma 4.1. [5] X = lp (Xi ), (1 < p < ∞, i = 1, 2...)
tp−1 −1 is strictly convex Banach space iff Xi is strictly convex
= 1 , 1
[1 + tp(p−1) ] p [1 + tp(p−1) ] p Banach space.
Theorem 4.1. For any finite-dimensional Banach
Hence space X, and {xn }, {yn } ⊂ S(X), satisfying xn ⊥B yn ,
· ·
˛p xn −→ x and yn −→ y, we have x ⊥B y.

˛ ˛
˛ 1 tp−1 ˛ · ·
x + y = ˛ + ˛ Proof. For any  > 0, since xn −→ x, yn −→ y, there
˛ (1 + tp ) p1 1
[1 + tp(p−1) ] p ˛
exists some N such that xn − x <  and yn − y < 
hold for all n > N . Thus
˛ ˛p # 1
˛ ˛ p
˛ t 1 ˛ | xn + tyn − x + ty | ≤ (xn + tyn ) − (x + ty)
+ ˛ − ˛
˛ (1 + tp ) p1 p(p−1)
1
˛
[1 + t ]p = (xn − x) + t(yn − y)
" !p ≤ xn − x + |t| yn − y
1 tp−1 < (1 + |t|)
= 1 + 1
(1 + tp ) p [1 + tp(p−1) ] p < 2
!p # 1 holds for all t ∈ (−1, 1). That is
p
1 t
+ −
[1 + tp(p−1) ] p
1
(1 + tp ) p
1
lim xn + tyn = x + ty , (∀ t ∈ (−1, 1))
n→∞


Since References
xn + tyn ≥ xn , (∀ t ∈ R)
We know that [1] Ji Donghai, Wu senlin, Quantitative Characteri-
zation of the Difference between Birkhoff Orthogo-
x + ty ≥ x , (∀ t ∈ (−1, 1))
nality and Isosceles Orthogonality, J. Math. Anal.
Let f (t) = x + ty , then f is a convex function with Appl. 2006, 323, pp.1-7.
respect to t. Since f (t) ≥ f (0), where t ∈ (−1, 1), f (t)
is decreasing and increasing on (−∞, 0) and (0, +∞), [2] G. Birkhoff, Orthogonality in Linear Metric
respectively. Hence, x + ty ≥ x holds for all t ∈ R. Spaces, Duke Math. J. 1935, 1, pp.169–172.
Namely, x ⊥B y holds.
[3] R. C. James, Orthogonality in Normed Linear
The theorem shows that D (X) is attainable if X Space, Duke Math. J. 1945, 12, pp.291–301.
is a finite-dimensional Banach space.
It is obvious that if D (X) = 1 and D (X) is at- [4] D. Amir, Characterizations of Inner Product
tainable for a Banach space X, then X is not strictly Space, Birkhäuser 1986.
convex. Hence, for any finite-dimensional Banach space [5] M. M. Day, Reflexive Banach Space not Isomorphic
X, if X is strictly convex, then D (X) < 1. to Uniformly Convex Space, Bull. Amer. Math.
In fact, let X = lp (Xi ), where Xi = (R2 , · pi ), Soc. 1941, 47, pp.313–317.
pi ∈ (1, 2], i = 1, 2, . . . , pi > pi+1 , and lim pi = 1, then
i→∞
D (X) is not attainable. [6] R. C. James, Orthogonality and Linear Functions
in Normed Linear Space, Trans. Amer. Math. Soc.
Theorem 4.2. Let X be as above . Then 1947, 61, pp.265–292.
(1) X is strictly convex.
(2) D (X) = 1. [7] R. C. James, Uniformly Non-square Banach
Proof. (1) Since lp2i , (1 < pi ≤ 2) is a strictly convex Spaces, Ann. of Math. 80(1964),542-550.
Banach space, by Lemma4.1, X is strictly convex.
[8] J. Go and K. S. Lau, On the Geometry of Spheres
(2) By the Definitions of X and D (X), it is obvi-
in Normed Linear Spaces, J. Austral. Math. Soc
ous that
(Series A)48(1990), 101-112.
D (X) ≥ lim D (Xi ) = lim D (lp2i ) = 1
pi →1 pi →1 [9] Ji Donghai, Zhan Dapeng, Some equivalent repre-
For any Banach space X, we have 0 ≤ D (X) ≤ 1, then sentations of Nonsequare constants and its appli-
D (X) = 1 holds. cations, Northeast Math. J 15(4)(1999),439-444.

Example 1 Let X = R2 with the norm defined √ [10] J. Alonso and C. Benítez, Some Characteristic
by · 8 = max{ · ∞ , √12 · 1 }, then D (X) = 2 − 2. and Non-characteristic Properties Inner Product
Example 2 Let X = l1−2 with the norm defined Spaces, J. Approx Theory.55(1988),318-325.
by j
· 1 if x1 x2 ≥ 0 [11] J. García-Falset, E, Llorens-Fuster and Eva M.
· = Mazcuíían-Navarro, Uniformly nonsequare Banach
· 2 if x1 x2 ≤ 0
spaces have the fixed point property for nonexpan-
then q
√ sive mappings, J. Funct Anal. 233(2)(2006),494-
D (X) = 2 + 2 − 1. 514.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 703--707
Copyright@2007 Watam Press

Matrix Representation of Solution Concepts in Graph Models for Two


Decision-Makers with Preference Uncertainty
Haiyan Xu1 , D. Marc Kilgour2 , and Keith W. Hipel1 , Fellow, IEEE
1 Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
2 Department of Mathematics, Wilfrid Laurier University, Waterloo, ON, N2L 3C5, Canada

h6xu@engmail.uwaterloo.ca, mkilgour@wlu.ca, kwhipel@engmail.uwaterloo.ca

Abstract— Explicit matrix formulations are proposed for four new solution concepts that take strength of preference into
basic solution concepts of two decision-maker (2-DM) graph account.
models with preference uncertainty. The relative preferences
of each DM are an important component of a graph model, A graph model consists of a set of DMs, a set of feasible
as preference information must play a key role in decision states, a directed graph describing the state transitions avail-
analysis. But unfortunately it is difficult to obtain accurate able to each DM, and each DM’s relative preferences over
preference information in some situations, so uncertainty in states. Obviously, preference information plays an important
models is sometimes unavoidable. Although the four basic graph
model solution concepts have been extended to models with role in decision analysis; unfortunately, it is not easy to
preference uncertainty, they are not easy to integrate into obtain accurate preference information in some situations.
GMCR II, an existing Decision Support System for the graph Moreover, as Fischer et al. [6] [7] discussed, conflicts among
model. In this paper, these stability definitions are represented the attributes of alternatives can cause preference uncertainty.
using matrices instead of graphically or logically. Compared To incorporate preference uncertainty into the graph model
with existing analytical representations, the matrix method has
the advantages of easy calculation and easy coding. methodology, Li et al. [17] proposed a new preference struc-
ture for the graph model that can handle DMs’ preference
I. INTRODUCTION uncertainty. They also extended some of the seven solution
concepts to conflict models with preference uncertainty. In
In a strategic conflict, two or more Decision Makers the new preference structure, the uncertain preferences of
(DMs) interact. Many methodologies have been proposed DM i over the set of states S, are expressed by a triple
to analyze strategic conflicts, including Metagame Analysis of relations {i , ∼i , Ui } on S, where {s i q}, indicates
[12], Conflict Analysis [8], Drama Theory [13], the Graph that DM i prefers s to q, and s ∼i q means that DM i is
Model for Conflict Resolution [3] with its decision support indifferent between s and q (or equally prefers s and q). Also
system GMCR II [4] [5], and Theory of Moves [2]. Most of s i q means either s i q or s ∼i q, and Ui indicates that
these methods identify possible equilibrium states based on DM i does not know the relative preference, so sUi q means
solution concepts that model human behavior. DM i may prefer state s to state q, may prefer q to s, or may
In 1944, Neumann and Morgenstern [20] presented the be indifferent between s and q.
normal form representation of games. In 1971, option form Although Li et al. [17] redefined several solution concepts
models, which are often more convenient to use than normal with preference uncertainty, their new solution concepts are
form, were introduced by Howard [12]. In 1987, the Graph defined either graphically or logically, and cannot easily be
Model for Conflict Resolution (GMCR) was proposed by integrated into the Decision Support System (DSS) GMCR
Kilgour et al. [16] to provide simple, flexible, and insightful II. Matrix Representation of Solution Concepts (MRSC)
models for conflict analysis and resolution. In a graph model, [22] facilitates the development of improved algorithms to
a solution concept is a precise definition of stability for a assess the stabilities of states and to implement new stability
state. In order to represent a diversity of behavior patterns, concepts. In this paper, the new stability definitions are
at least seven solution definitions have been formulated represented using explicit matrix formulations. The MRSC
for graph models, including Nash stability [18], [19], Gen- method is extended to allow for preference uncertainty.
eral Metarationality (GMR) [12], Symmetric Metarationality
(SMR) [12], Sequential Stability (SEQ) [9], Limited-move Several important matrix representations corresponding to
Stability (LS) [15], Non-myopic Stability (NM) [1], [14], and the state set of a graph model and preference relations over
Stackelberg’s equilibrium concept [21]. Recently, Li et al. the states are established in Section II. Section III develops
[17] extended some of these solution concepts to models with matrix representations of solution concepts for 2-DM conflict
preference uncertainty, and Hamouda et al. [10] proposed models with preference uncertainty. In Section IV, a practical
application is used to illustrate the extended MRSC method.
This work was supported by the Natural Sciences and Engineering Finally, some conclusions and ideas for future work are
Research Council (NSERC) of Canada. provided in Section V.


II. THE STRUCTURE OF MATRIX REPRESENTATIONS FOR 2-DM CONFLICTS

To discuss the representation of a graph model using matrices, we first introduce some notation.

A. The Preference Structure with Uncertainty

A graph model for a strategic conflict is comprised of a finite set of DMs N, a set of feasible states S, a preference relation on S for each DM i, and a directed graph G_i = {S, A_i}. In each directed graph, S is the vertex set, and each oriented arc of the arc set A_i ⊆ S × S indicates that DM i can make a legal move (in one step) from the initial state to the terminal state of the arc.

In a graph model, the preferences of DM i over the set of states S can be expressed by a triple of relations {≻_i, ∼_i, U_i} on S. It is assumed that the preference relations of each DM i ∈ N have the following properties:
(i) ≻_i is asymmetric.
(ii) ∼_i is reflexive and symmetric.
(iii) U_i is symmetric.
(iv) {≻_i, ∼_i, U_i} is strongly complete.

Note that the assumption of transitivity of preferences is not required, so the results in this paper hold for both transitive and intransitive preferences.

For i ∈ N, J_i is an |S| × |S| 0-1 matrix defined by

J_i(s, q) = 1 if (s, q) ∈ A_i, and 0 otherwise,

where |S| is the number of states in S. J_i is called a reachability matrix; it represents DM i's unilateral moves (UMs). R_i(s) denotes DM i's reachable list from a state s, containing all states to which DM i can move from state s in one step, so R_i(s) = {q : J_i(s, q) = 1}. For DM i, a unilateral improvement (UI) matrix J_i^+ is defined as follows:

J_i^+(s, q) = 1 if J_i(s, q) = 1 and q ≻_i s, and 0 otherwise.

Similarly, R_i^+(s) is defined as the set of all of DM i's unilateral improvements (UIs) from state s; therefore R_i^+(s) = {q : J_i^+(s, q) = 1}.

Define an |S| × |S| 0-1 matrix by

J_i^U(s, q) = 1 if J_i(s, q) = 1 and s U_i q, and 0 otherwise.

Let R_i^U(s) denote all states of uncertain preference relative to state s for DM i, and let R_i^{+,U}(s) = R_i^+(s) ∪ R_i^U(s). Then R_i^U(s) = {q : J_i^U(s, q) = 1} and R_i^{+,U}(s) = {q : J_i^{+,U}(s, q) = 1}, where J_i^{+,U} = J_i^+ + J_i^U.

B. Matrix Representations of Preference Relations over a State Set

In this paper, we use matrix formulations to calculate stabilities and predict equilibria of a graph model.

Let |S| = m. E is the m × m unit matrix with every entry 1, and e_k denotes the m-dimensional column vector with k-th element 1 and all other elements 0. For two m × m matrices H and V, M = H ∘ V is defined as the m × m matrix with (s, q) entry m(s, q) = h(s, q) · v(s, q) ("∘" denotes the Hadamard product). If H is an m × m matrix, then the m × m matrix sign(H) has (s, q) entry

sign[h(s, q)] = 1 if h(s, q) > 0; 0 if h(s, q) = 0; -1 if h(s, q) < 0.

Below, two preference relation matrices P_i^+ and P_i^{-,=} for DM i are respectively defined as

P_i^+(s, q) = 1 if q ≻_i s, and 0 otherwise,

and

P_i^{-,=}(s, q) = 1 if s ⪰_i q, and 0 otherwise.

Let

P_i^{+,U} = E - I - P_i^{-,=}

and

P_i^{-,=,U} = E - I - P_i^+,

where I is the m × m identity matrix. Hence,

J_i^+ = J_i ∘ P_i^+

and

J_i^{+,U} = J_i ∘ P_i^{+,U}.

The four extended solution concepts Nash, GMR, SMR, and SEQ for 2-DM conflict models are represented using the MRSC method next.

III. MRSC IN GRAPH MODELS FOR TWO DECISION-MAKERS WITH PREFERENCE UNCERTAINTY

Li et al. [17] extended Nash stability, general metarationality, symmetric metarationality, and sequential stability to models with preference uncertainty, using four distinct sets of definitions (indexed by a, b, c, and d) to calculate stabilities in situations with uncertainty. For a 2-DM model, the algebraic characterizations of the new stability definitions are presented in the following theorems.

In the definitions indexed a, DM i has an incentive to move to states with uncertain preferences relative to the status quo, but will not consider a move to a state with uncertain preference to be a sanction.

Theorem 3.1: Let the two DMs be N = {i, j}. A state s ∈ S is Nash_a stable for DM i iff e_s^T · J_i^{+,U} = 0^T. (T denotes the transpose of a matrix or a vector.)

Proof: It is obvious that R_i^{+,U}(s) = ∅ iff e_s^T · J_i^{+,U} = 0^T.

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{GMRa} = J_i^{+,U} · [E - sign(J_j · (P_i^{-,=})^T)].

Theorem 3.2: Let i ∈ N. A state s ∈ S is general metarational (GMR_a) for DM i iff

M_i^{GMRa}(s, s) = 0.  (1)

Proof: (1) is equivalent to

(e_s^T J_i^{+,U}) · {[E - sign(J_j · (P_i^{-,=})^T)] e_s} = 0.


Since

(e_s^T J_i^{+,U}) · {[E - sign(J_j · (P_i^{-,=})^T)] e_s} = Σ_{s1=1}^{m} J_i^{+,U}(s, s1) [1 - sign((e_{s1}^T J_j) · (e_s^T P_i^{-,=})^T)],

(1) holds iff

J_i^{+,U}(s, s1)[1 - sign((e_{s1}^T J_j) · (e_s^T P_i^{-,=})^T)] = 0, for all s1 ∈ S.  (2)

It is clear that (2) is equivalent to

(e_{s1}^T J_j) · (e_s^T P_i^{-,=})^T ≠ 0, for all s1 ∈ R_i^{+,U}(s),

which means that, for any s1 ∈ R_i^{+,U}(s), there exists at least one s2 ∈ R_j(s1) with s ⪰_i s2.

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{SMRa} = J_i^{+,U} · [E - sign(H)],

with

H = J_j · [(P_i^{-,=})^T ∘ (E - sign(J_i · (P_i^{+,U})^T))],

for j ∈ N - {i}.

Theorem 3.3: Let i ∈ N. A state s ∈ S is symmetric metarational (SMR_a) for DM i iff

M_i^{SMRa}(s, s) = 0.  (3)

Proof: Since

M_i^{SMRa}(s, s) = (e_s^T J_i^{+,U}) · [(E - sign(H)) e_s] = Σ_{s1=1}^{m} J_i^{+,U}(s, s1)[1 - sign(H(s1, s))]

with

H(s1, s) = Σ_{s2=1}^{m} J_j(s1, s2) · W,

and

W = P_i^{-,=}(s, s2) [1 - sign(Σ_{s3=1}^{m} J_i(s2, s3) P_i^{+,U}(s, s3))],

(3) holds iff H(s1, s) ≠ 0 for all s1 ∈ R_i^{+,U}(s), which is equivalent to the statement that, for every s1 ∈ R_i^{+,U}(s), there exists s2 ∈ R_j(s1) such that

P_i^{-,=}(s, s2) ≠ 0,  (4)

and

Σ_{s3=1}^{m} J_i(s2, s3) P_i^{+,U}(s, s3) = 0.  (5)

Obviously, for every s1 ∈ R_i^{+,U}(s) there exists s2 ∈ R_j(s1) such that (4) and (5) hold iff for every s1 ∈ R_i^{+,U}(s) there exists s2 ∈ R_j(s1) such that s ⪰_i s2 and s ⪰_i s3 for all s3 ∈ R_i(s2).

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{SEQa} = J_i^{+,U} · [E - sign(J_j^{+,U} · (P_i^{-,=})^T)].

Theorem 3.4: Let i ∈ N. A state s ∈ S is sequentially stable (SEQ_a) for DM i iff

M_i^{SEQa}(s, s) = 0.  (6)

Proof: (6) is equivalent to

(e_s^T J_i^{+,U}) · {[E - sign(J_j^{+,U} · (P_i^{-,=})^T)] e_s} = 0.

Since

(e_s^T J_i^{+,U}) · {[E - sign(J_j^{+,U} · (P_i^{-,=})^T)] e_s} = Σ_{s1=1}^{m} J_i^{+,U}(s, s1)[1 - sign((e_{s1}^T J_j^{+,U}) · (e_s^T P_i^{-,=})^T)],

(6) holds iff

J_i^{+,U}(s, s1)[1 - sign((e_{s1}^T J_j^{+,U}) · (e_s^T P_i^{-,=})^T)] = 0, for all s1 ∈ S.  (7)

It is clear that (7) is equivalent to

(e_{s1}^T J_j^{+,U}) · (e_s^T P_i^{-,=})^T ≠ 0, for all s1 ∈ R_i^{+,U}(s).

This means that for any s1 ∈ R_i^{+,U}(s) there exists at least one s2 ∈ R_j^{+,U}(s1) with s ⪰_i s2.

For the next definitions, indexed b, in an uncertain situation DM i considers leaving a state, and assesses sanctions, excluding uncertain preferences.

Theorem 3.5: Let the two DMs be N = {i, j}. A state s ∈ S is Nash_b stable for DM i iff e_s^T · J_i^+ = 0^T.

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{GMRb} = J_i^+ · [E - sign(J_j · (P_i^{-,=})^T)].

Theorem 3.6: Let i ∈ N. A state s ∈ S is general metarational (GMR_b) for DM i iff M_i^{GMRb}(s, s) = 0.

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{SMRb} = J_i^+ · [E - sign(H)],

with

H = J_j · [(P_i^{-,=})^T ∘ (E - sign(J_i · (P_i^{+,U})^T))],

for j ∈ N - {i}.

Theorem 3.7: Let i ∈ N. A state s ∈ S is symmetric metarational (SMR_b) for DM i iff M_i^{SMRb}(s, s) = 0.

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{SEQb} = J_i^+ · [E - sign(J_j^{+,U} · (P_i^{-,=})^T)].

Theorem 3.8: Let i ∈ N. A state s ∈ S is sequentially stable (SEQ_b) for DM i iff M_i^{SEQb}(s, s) = 0.

Although they exclude uncertainty in preference, the stability definitions indexed b are different from those discussed by Fang et al. [3], since the present definitions are used to analyze conflict models with preference uncertainty. For the extended definitions indexed c, DM i has an incentive to move to states with uncertain preferences relative to the status quo, and will consider a move to a state with uncertain preference to be a sanction.

Theorem 3.9: Let the two DMs be N = {i, j}. A state s ∈ S is Nash_c stable for DM i iff e_s^T · J_i^{+,U} = 0^T.


Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{GMRc} = J_i^{+,U} · [E - sign(J_j · (P_i^{-,=,U})^T)].

Theorem 3.10: Let i ∈ N. A state s ∈ S is general metarational (GMR_c) for DM i iff M_i^{GMRc}(s, s) = 0.

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{SMRc} = J_i^{+,U} · [E - sign(H)],

in which

H = J_j · [(P_i^{-,=,U})^T ∘ (E - sign(J_i · (P_i^+)^T))],

for j ∈ N - {i}.

Theorem 3.11: Let i ∈ N. A state s ∈ S is symmetric metarational (SMR_c) for DM i iff M_i^{SMRc}(s, s) = 0.

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{SEQc} = J_i^{+,U} · [E - sign(J_j^{+,U} · (P_i^{-,=,U})^T)].

Theorem 3.12: Let i ∈ N. A state s ∈ S is sequentially stable (SEQ_c) for DM i iff M_i^{SEQc}(s, s) = 0.

For the last definitions, indexed d, DM i considers leaving a state excluding preference uncertainty, but will consider a move to a state with uncertain preference to be a sanction.

Theorem 3.13: Let the two DMs be N = {i, j}. A state s ∈ S is Nash_d stable for DM i iff e_s^T · J_i^+ = 0^T.

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{GMRd} = J_i^+ · [E - sign(J_j · (P_i^{-,=,U})^T)].

Theorem 3.14: Let i ∈ N. A state s ∈ S is general metarational (GMR_d) for DM i iff M_i^{GMRd}(s, s) = 0.

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{SMRd} = J_i^+ · [E - sign(H)],

in which

H = J_j · [(P_i^{-,=,U})^T ∘ (E - sign(J_i · (P_i^+)^T))],

for j ∈ N - {i}.

Theorem 3.15: Let i ∈ N. A state s ∈ S is symmetric metarational (SMR_d) for DM i iff M_i^{SMRd}(s, s) = 0.

Let i ∈ N and m = |S|. Define the m × m matrix

M_i^{SEQd} = J_i^+ · [E - sign(J_j^{+,U} · (P_i^{-,=,U})^T)].

Theorem 3.16: Let i ∈ N. A state s ∈ S is sequentially stable (SEQ_d) for DM i iff M_i^{SEQd}(s, s) = 0.

IV. APPLICATIONS

In this section, we apply the matrix method to a practical problem to show the procedure.

A. A 2-DM Conflict Model

A 2-DM conflict model with preference uncertainty is used to illustrate how stability analysis is carried out by MRSC with preference uncertainty. A sustainable development game modeling a conflict between environmental agencies and developers was considered by Hipel [11] and Li et al. [17]. The conflict is modeled with two DMs: environmental agencies (DM 1) and developers (DM 2). DM 1 has two strategies, being proactive (labeled P) or reactive (labeled R) in monitoring developers' activities and their impacts on the environment; DM 2 has two strategies, assigning a high priority (labeled H) or a low priority (labeled L) to assessing the environment. These strategies combine to form four feasible states: PH, PL, RH, and RL, which we order as 1: PH; 2: PL; 3: RH; 4: RL. The graph model of the conflict is shown in Fig. 1.

[Fig. 1. Graph model for a 2 × 2 game: (a) graph model for DM 1, payoff P1 = (4, 2, 3, 1); (b) graph model for DM 2, payoff P2 = (b1, b2, b3, b4).]

DM 1's preference information is complete, but DM 2's preferences are uncertain. We are given

1 U_2 2, 1 U_2 4, 2 U_2 3, 3 U_2 4, 3 ≻_2 1, and 4 ≻_2 2.

The reachability matrices for DM 1 and DM 2 are

J_1 = [0 0 1 0; 0 0 0 1; 1 0 0 0; 0 1 0 0] and J_2 = [0 1 0 0; 1 0 0 0; 0 0 0 1; 0 0 1 0].

Preference relation matrices for DMs 1 and 2 are

P_1^+ = [0 0 0 0; 1 0 1 0; 1 0 0 0; 1 1 1 0], P_1^{-,=} = [0 1 1 1; 0 0 0 1; 0 1 0 1; 0 0 0 0],

P_2^+ = [0 0 1 0; 0 0 0 1; 0 0 0 0; 0 0 0 0], P_2^U = [0 1 0 1; 1 0 1 0; 0 1 0 1; 1 0 1 0],

and P_1^U is a zero matrix, P_2^{+,U} = P_2^+ + P_2^U, P_2^{-,=} = E - I - P_2^{+,U}, P_2^{-,=,U} = P_2^{-,=} + P_2^U.

Hence, we can calculate the extended stabilities of Nash, GMR, SMR, and SEQ, respectively, using Theorems 3.1-3.16.
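As a concrete illustration of how these matrix tests are computed, the following minimal numpy sketch (our illustration, not code from the paper; the function and variable names are our own) applies Theorems 3.1 and 3.2 to DM 1 of this example. Since DM 1's preferences contain no uncertainty, J_1^{+,U} = J_1^+ = J_1 ∘ P_1^+.

```python
import numpy as np

# Data transcribed from Section IV-A.
J1 = np.array([[0,0,1,0],[0,0,0,1],[1,0,0,0],[0,1,0,0]])
J2 = np.array([[0,1,0,0],[1,0,0,0],[0,0,0,1],[0,0,1,0]])
P1_plus = np.array([[0,0,0,0],[1,0,1,0],[1,0,0,0],[1,1,1,0]])

m = 4
E, I = np.ones((m, m)), np.eye(m)
P1_minus_eq = E - I - P1_plus      # P_1^{-,=}; valid because P_1^U = 0
J1_plus_U = J1 * P1_plus           # Hadamard product: J_1^{+,U} = J_1 o P_1^+

# Theorem 3.1: s is Nash_a stable for DM 1 iff row s of J_1^{+,U} is zero.
nash_a = [not J1_plus_U[s].any() for s in range(m)]

# Theorem 3.2: s is GMR_a stable for DM 1 iff M_1^{GMRa}(s, s) = 0,
# with M_1^{GMRa} = J_1^{+,U} [E - sign(J_2 (P_1^{-,=})^T)].
M_gmr_a = J1_plus_U @ (E - np.sign(J2 @ P1_minus_eq.T))
gmr_a = [M_gmr_a[s, s] == 0 for s in range(m)]

print("Nash_a stable states for DM 1:", [s + 1 for s in range(m) if nash_a[s]])
print("GMR_a stable states for DM 1:", [s + 1 for s in range(m) if gmr_a[s]])
```

Running the sketch reports states 1 and 2 as Nash_a stable and states 1, 2, and 3 as GMR_a stable for DM 1, since DM 1 has no unilateral improvement from states 1 and 2, and DM 2 can sanction DM 1's improvement from state 3.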


TABLE I
STABILITY RESULTS OF THE SUSTAINABLE DEVELOPMENT GAME WITH UNCERTAIN PREFERENCE

[Table I lists, for each state 1-4 and for each definition set a, b, c, and d, whether the state is stable for DM 1, stable for DM 2, and an equilibrium (E) under Nash, GMR, SMR, and SEQ.]

Table I provides the stability results for the 2 × 2 game calculated by the extended MRSC method. They are precisely the same as the results of Li et al. [17]. In Table I, "√" denotes that the state is stable for DM 1 or DM 2, or is an equilibrium, under the corresponding stability definition.

V. CONCLUSION AND FUTURE WORK

A. Conclusions

In this paper, the MRSC is extended to produce a matrix method that represents several solution concepts for 2-DM conflict models with preference uncertainty. Because of the nature of logical representations, procedures to identify stable states based on the solution concepts defined by Li et al. [17] are not easy to code, which may explain why algorithms for these solution concepts have not been integrated into a decision support system (DSS). The MRSC method with preference uncertainty handles this problem efficiently, and therefore facilitates the development of improved algorithms to assess the stability of states. We have shown that the extended MRSC method has the advantages of easy calculation and easy coding. It can be expected to stimulate further empirical research.

B. Future Work

To demonstrate the efficiency of the new representations, it would be worthwhile to extend the matrix method introduced here to multiple-decision-maker conflicts with preference uncertainty. The method could then be further extended to tackle more complex problems, such as models with different strengths of preference.

REFERENCES

[1] S. J. Brams and D. Wittman, "Nonmyopic equilibria in 2 × 2 games", Conflict Management and Peace Science, vol. 6, no. 1, pp. 39-62, 1981.
[2] S. J. Brams, Theory of Moves, Cambridge University Press, Cambridge, U.K., 1994.
[3] L. Fang, K. W. Hipel, and D. M. Kilgour, Interactive Decision Making: The Graph Model for Conflict Resolution, Wiley, New York, U.S.A., 1993.
[4] L. Fang, K. W. Hipel, D. M. Kilgour, and X. Peng, "A decision support system for interactive decision making, Part 1: Model formulation," IEEE Transactions on Systems, Man and Cybernetics, Part C, vol. SMC-33, no. 1, pp. 42-55, 2003.
[5] L. Fang, K. W. Hipel, D. M. Kilgour, and X. Peng, "A decision support system for interactive decision making, Part 2: Analysis and output interpretation," IEEE Transactions on Systems, Man and Cybernetics, Part C, vol. SMC-33, no. 1, pp. 56-66, 2003.
[6] G. W. Fischer, J. Jia, and M. F. Luce, "Attribute conflict and preference uncertainty: The RandMAU model", Management Science, vol. 46, no. 5, pp. 669-684, 2000.
[7] G. W. Fischer, M. F. Luce, and J. Jia, "Attribute conflict and preference uncertainty: Effects on judgment time and error", Management Science, vol. 46, no. 1, pp. 88-103, 2000.
[8] N. M. Fraser and K. W. Hipel, Conflict Analysis: Models and Resolutions, North-Holland, New York, 1984.
[9] N. M. Fraser and K. W. Hipel, "Solving complex conflicts", IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, pp. 805-817, 1979.
[10] L. Hamouda, D. M. Kilgour, and K. W. Hipel, "Strength of preference in the graph model for conflict resolution", Group Decision and Negotiation, vol. 13, pp. 449-462, 2004.
[11] K. W. Hipel, "Conflict resolution: Theme overview paper in conflict resolution", in Encyclopedia of Life Support Systems (EOLSS), EOLSS Publishers, Oxford, U.K., 2002.
[12] N. Howard, Paradoxes of Rationality: Theory of Metagames and Political Behavior, MIT Press, Cambridge, MA, 1971.
[13] N. Howard, "Confrontation analysis: How to win operations other than war," DoD C4ISR Cooperative Research Program, The Pentagon, Washington, D.C., 1999.
[14] D. M. Kilgour, K. W. Hipel, and N. M. Fraser, "Solution concepts in non-cooperative games", Large Scale Systems, vol. 6, pp. 49-71, 1984.
[15] D. M. Kilgour, "Anticipation and stability in two-person noncooperative games", in M. D. Ward and U. Luterbacher (Eds.), Dynamic Models of International Conflict, Lynne Rienner Press, Boulder, CO, pp. 26-51, 1985.
[16] D. M. Kilgour, K. W. Hipel, and L. Fang, "The graph model for conflicts", Automatica, vol. 23, no. 1, pp. 41-55, 1987.
[17] K. W. Li, K. W. Hipel, D. M. Kilgour, and L. Fang, "Preference uncertainty in the graph model for conflict resolution", IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 34, no. 4, pp. 507-520, 2004.
[18] J. Nash, "Equilibrium points in n-person games," Proceedings of the National Academy of Sciences, vol. 36, pp. 48-49, 1950.
[19] J. F. Nash, "Noncooperative games", Annals of Mathematics, vol. 54, no. 2, pp. 286-295, 1951.
[20] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, NJ, 1944.
[21] H. von Stackelberg, The Theory of the Market Economy, Oxford University Press, Oxford, 1952.
[22] H. Xu, K. W. Hipel, and D. M. Kilgour, "Matrix representation of conflicts with two decision-makers", under preparation, 2007.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 708--712
Copyright@2007 Watam Press

A Homotopy Method for Solving MPEC Problem

JIAMIN LI 1, QINGHUAI LIU 2, XINMIN WANG 2, GUOCHEN FENG 1

1 Institute of Mathematics, Jilin University, Changchun 130012, China
2 Institute of Applied Mathematics, Changchun University of Technology, Changchun 130012, China

AMS subject classifications: 90C30, 47J20

Abstract: In this paper, we present a homotopy method for solving the MPEC problem with variational inequality constraints. We prove that the homotopy pathway exists for almost all initial points, and that the homotopy method converges globally to a KKT point of the MPEC problem.

1 Introduction

Mathematical programs with equilibrium constraints (MPEC) are important nonlinear models in artificial intelligence, neural networks, control, and machine learning (see [1], [2]). They can be used for complex macroeconomic and microeconomic analysis, and they can solve many pivotal problems in science and engineering, such as machine learning, pattern recognition, automation, chemical engineering, traffic transportation programming, and network design (see [2], [4], [5], [7], [8], [9]).

The MPEC is a special optimization problem. The complexity of its constraints is the main difference between the MPEC and general programming problems; that is, its constraints include equilibrium constraints besides the customary equality and inequality constraints. The word "equilibrium" has its roots in economics. The equilibrium constraints are defined by a parametric variational inequality or complementarity system, and because of these conditions, solving the MPEC is very difficult. In recent years, the MPEC has been intensively studied; general and systematic research on the MPEC began with [1]. The SQP algorithm, the smooth trust region method, and other methods have been presented for mathematical programs with linear complementarity constraints. [6] proposed a homotopy method for solving the bilevel programming problem, and [14] discussed an algorithm for solving multiobjective programs with equilibrium constraints. At present, there are few effective methods for solving mathematical programs with variational inequality constraints, and many problems in this field remain unsolved. The most common approach among existing methods is the penalty technique (see [12]); some use sensitivity-based trial-and-error methods (see [11]). In this paper, we present a combined homotopy method for solving the MPEC problem with variational inequality constraints, and prove that this method converges globally.

In general, the MPEC with variational inequality constraints is defined as follows. Let f: R^{n+m} -> R and F: R^{n+m} -> R^m be continuously differentiable functions, let Z ⊆ R^{n+m} be a nonempty closed set, and let C: R^n -> R^m be a set-valued map with closed convex values. Let X be the projection of Z onto R^n, with X ⊆ dom(C). Then the MPEC can be expressed as

min f(x, y)
s.t. (x, y) ∈ Z ⊆ R^{n+m},  (1.1)
y ∈ S(x) := SOL(F(x, ·), C(x)),

where for each x ∈ X, S(x) is the solution set of the variational inequality defined by the pair (F(x, ·), C(x)); i.e., y ∈ S(x) if and only if y is in C(x) and satisfies the inequality

(v - y)^T F(x, y) >= 0 for all v ∈ C(x).  (1.2)

Let

Gr(S) := {(x, y) ∈ R^{n+m} : y ∈ S(x)};

then the MPEC may be written as

min f(x, y)
s.t. (x, y) ∈ E := Z ∩ Gr(S),

where E denotes the feasible region of (1.1).

2 Homotopy equation

In this paper, we consider the following MPEC problem. Let, for each x ∈ X,

C(x) := {y ∈ R^m : G(x, y) <= 0},

where G: R^{n+m} -> R^l is continuously differentiable and, for each x ∈ X, G_i(x, y) (i = 1, ..., l) is a convex function of y; then C(x) is a closed convex set.

By Lemma 2.1 of [13], y ∈ S(x) if and only if there exists u ∈ R^l such that

F(x, y) + ∇_y G(x, y) u = 0,
u >= 0, G(x, y) <= 0, U G(x, y) = 0,  (2.1)

where U = diag(u). System (2.1) is called the KKT system of (1.2).

Let M(x, y) (possibly empty) denote the set of multipliers u ∈ R^l satisfying (2.1). Obviously, M: R^{n+m} -> R^l is a set-valued map. Clearly, dom(M) ⊆ Gr(C), and for (x, y) ∈ dom(M), M(x, y) is a nonempty convex polyhedron. Let

I(x, y) := {i : G_i(x, y) = 0};

then, by (2.1), we have

M(x, y) = {u ∈ R^l : F(x, y) + ∇_y G(x, y) u = 0, u_j = 0, j ∉ I(x, y)}.

Corresponding author: Xinmin Wang, Changchun University of Technology, wang_xinmin@email.jlu.edu.cn

We shall refer to the following sequentially bounded constraint qualification (SBCQ) of [1] for the MPEC (1.1):


SBCQ: for any convergent sequence {(x^k, y^k)} ⊂ E, there exists for each k a multiplier vector u^k ∈ M(x^k, y^k) such that {u^k} is bounded.

If E ⊆ dom(M) and the SBCQ holds on E for the set-valued map M, then, under the above conditions, the MPEC (1.1) is equivalent to the following optimization problem in the variables (x, y, u):

min f(x, y)
s.t. (x, y) ∈ Z ⊆ R^{n+m},  (2.2)
F(x, y) + ∇_y G(x, y) u = 0,
u >= 0, G(x, y) <= 0, U G(x, y) = 0.

When the set {y ∈ R^m : G(x, y) < 0} is nonempty, the homotopy equation for (2.1) can be represented as

h(w, t) = ( F(x, y) + ∇_y G(x, y) u ; U G(x, y) + t e ) = 0,  (2.3)

where w = (x, y, u), e = (1, ..., 1)^T ∈ R^l, and t ∈ (0, 1].

Let

Z = {(x, y) ∈ R^{n+m} : g(x, y) <= 0},

where g: R^{n+m} -> R^s is continuously differentiable and each g_i(x, y) (i = 1, ..., s) is a convex function; then Z is a closed convex set.

Set f~(w) = f(x, y) and g~(w) = g(x, y), and introduce the following notation:

Ω1(t) = {w ∈ R^{n+m} × R^l_+ : g~(w) < 0, h(w, t) = 0},
Ω1bar(t) = {(w, t) ∈ R^{n+m} × R^l_+ × (0, 1] : g~(w) <= 0, h(w, t) = 0},
Ω(t) = Ω1(t) × R^s_+ × R^{m+l},
Ω = {(x, y) ∈ R^{n+m} : g~(w) <= 0, G(x, y) <= 0},
∂Ω1(t) = {w ∈ Ω1bar(t) : Π_{i=1}^{s} g~_i(w) = 0},
I(w) = {i ∈ {1, ..., s} : g~_i(w) = 0},
I_G(x, y) = {i ∈ {1, ..., l} : G_i(x, y) = 0}.

Condition A:
(A.1) For every t ∈ [0, 1], Ω1(t) is nonempty and bounded;
(A.2) {∇_y G_i(x, y) : i ∈ I_G(x, y)} is a matrix of full column rank;
(A.3) for every t ∈ (0, 1] and w ∈ ∂Ω1(t), {∇g~_i(w), i ∈ I(w), ∇_w h(w, t)} is of full column rank;
(A.4) for every w ∈ Ω1bar(1),
{w + Σ_{i ∈ I(w)} v_i ∇g~_i(w) + ∇_w h(w, 1) λ : v_i >= 0, i ∈ I(w); λ ∈ R^{m+l}} ∩ Ω1bar(1) = {w};
(A.5) for any (x, y) ∈ Ω,
∇_y F(x, y) + Σ_{i=1}^{l} (∇^2_{yy} G_i(x, y) + ∇_y G_i(x, y) ∇_y G_i(x, y)^T)
is positive definite.

Remark 2.1: The geometric interpretation of Condition (A.4) is that Ω1(1) satisfies the normal cone condition. Condition (A.5) guarantees the uniqueness of the solution of the lower-level problem of the MPEC in this paper.

Lemma 2.1 (McCormick [3]): Suppose that Conditions (A.1), (A.2), and (A.5) hold, and that the set

Ω^0 = {(x, y) ∈ R^{n+m} : g~(w) < 0, G(x, y) < 0}

is nonempty. Then for any given point x ∈ Ω, the solution curve {(y(t), u(t)) : t ∈ (0, 1]} of the homotopy equation (2.3) of the lower-level problem of the MPEC is continuous, bounded, and unique, and as t -> 0, y(t) tends to the solution of the lower-level problem.

3 Homotopy pathway

By the above discussion, the MPEC (1.1) is equivalent to the following parameterized optimization problem:

min f(x, y)
s.t. g(x, y) <= 0,  (3.1)
h(w, t) = 0.

When t -> 0, by Lemma 2.1, problem (3.1) becomes problem (2.2). Problem (3.1) can be rewritten as

min f~(w)
s.t. g~(w) <= 0,  (3.2)
h(w, t) = 0.

Its KKT system is

∇_w f~(w) + ∇_w g~(w) v + ∇_w h(w, t) λ = 0,
h(w, t) = 0,  (3.3)
V g~(w) = 0, v >= 0, g~(w) <= 0,

where λ ∈ R^{m+l} and v ∈ R^s are multipliers and V = diag(v).

Now we construct a homotopy equation for problem (2.2). We define

H(θ, θ0_0, t) = ( (1 - t)(∇f~(w) + ∇g~(w) v) + ∇_w h(w, t) λ + t (w - w^0) ; h(w, t) ; V g~(w) - t V^0 g~(w^0) ) = 0,  (3.4)

where θ = (w, v, λ)^T, θ0_0 = (w^0, v^0)^T, t ∈ (0, 1], v ∈ R^s, λ ∈ R^{m+l}, w = (x, y, u) ∈ R^{n+m} × R^l_+, U = diag(u), and V = diag(v).

Lemma 3.1: For any given θ0_0 = (w^0, v^0)^T ∈ Ω1(1) × R^s_{++}, the equation H(θ, θ0_0, 1) = 0 has a unique solution θ = (θ0_0, 0)^T.

Proof: When t = 1, (3.4) becomes

∇_w h(w, 1) λ + w - w^0 = 0,
h(w, 1) = 0,
V g~(w) = V^0 g~(w^0).

By g~(w^0) < 0 and h(w, 1) = 0, we have I(w) = ∅ and w ∈ Ω1(1). Then, by Condition (A.4) and ∇_w h(w, 1) λ = -(w - w^0), we have λ = 0 and w = w^0; hence v = v^0 and θ = (θ0_0, 0)^T.

In general, we rewrite (3.4) in the form

H_{θ0_0}(θ, t) = H(θ, θ0_0, t),

and define the solution set of H_{θ0_0} by

H_{θ0_0}^{-1}(0) = {(θ, t) ∈ Ω(t) × (0, 1] : H_{θ0_0}(θ, t) = 0}.

We prove the existence of the homotopy pathway in what follows.

Lemma 3.2 (Parameterized Sard theorem [10]): Let Q, N, and P be smooth manifolds of dimensions q, m, and p, respectively. Let Φ: Q × N -> P be a C^r map, where r > max{0, m - p}. If 0 ∈ P is a regular value of Φ, then, for almost all a ∈ Q, 0 is a regular value of Φ_a := Φ(a, ·).


Lemma 3.3: If H(θ, θ0_0, t) is defined as in (3.4) and Conditions (A.1)-(A.5) hold, then for almost all θ^0 = (θ0_0, 0)^T, 0 is a regular value of the map H_{θ0_0}, and H_{θ0_0}^{-1}(0) consists of some smooth curves, among which one, denoted Γ_{θ0_0}, starts from (θ^0, 1).

Proof: For every (θ, t) ∈ Ω(t) × (0, 1], consider the partial Jacobian

∂H(θ, θ0_0, t)/∂(w, w^0, v^0) =
( Q   -tI   0 ;
 ∇_w h(w, t)^T   0   0 ;
 V ∇g~(w)^T   -t V^0 ∇g~(w^0)^T   -t diag(g~(w^0)) ),

where

Q = (1 - t)(∇^2 f~(w) + Σ_{i=1}^{s} v_i ∇^2 g~_i(w)) + Σ_{i=1}^{m+l} λ_i ∇^2_w h_i(w, t) + tI.

By w^0 ∈ Ω1(1), we have g~(w^0) < 0. Since ∇_w h(w, t) is of full column rank, ∂H(θ, θ0_0, t)/∂(w, w^0, v^0) is of full row rank; hence the full Jacobian of H is also of full row rank, and 0 is a regular value of H. By Lemma 3.2, for almost all (θ0_0, 0)^T, 0 is a regular value of H_{θ0_0}(θ, t). By the inverse image theorem, H_{θ0_0}^{-1}(0) consists of some smooth curves, and because H_{θ0_0}((θ0_0, 0)^T, 1) = 0, there must be a smooth curve Γ_{θ0_0} starting from ((θ0_0, 0)^T, 1).

Lemma 3.4: Suppose that Condition A holds. Given θ^0 ∈ Ω(t), if 0 is a regular value of H, then Γ_{θ0_0} is a bounded curve in Ω(t) × (0, 1].

Proof: Suppose, on the contrary, that Γ_{θ0_0} ⊂ Ω(t) × (0, 1] is unbounded. Then there exists a sequence {(θ^k, t_k)} ⊂ Γ_{θ0_0} with ||(θ^k, t_k)|| -> ∞. Noting that (x, y) and t ∈ (0, 1] are bounded, by Lemma 2.1 the sequence (x^k, y^k, u^k) is bounded, so there exists a subsequence (denoted also by (x^k, y^k, u^k)) with x^k -> x_bar, y^k -> y_bar, u^k -> u_bar, t_k -> t_bar. By the supposition, there exists a subsequence of {(v^k, λ^k)} (denoted also by {(v^k, λ^k)}) that goes to infinity, i.e., ||(v^k, λ^k)|| -> ∞. Let

I_1 = {i ∈ {1, ..., s} : v_i^k -> ∞}, I_2 = {i ∈ {1, ..., m+l} : λ_i^k -> ∞}.

By H(θ^k, θ0_0, t_k) = 0, (θ^k, t_k) satisfies

( (1 - t_k)(∇f~(w^k) + ∇g~(w^k) v^k) + ∇_w h(w^k, t_k) λ^k + t_k (w^k - w^0) ; h(w^k, t_k) ; V^k g~(w^k) - t_k V^0 g~(w^0) ) = 0.  (3.5)

We consider two cases.

(i) When t_bar ∈ [0, 1): from the first equation in (3.5), we have

Σ_{i ∈ I_1} (1 - t_k) v_i^k ∇g~_i(w^k) + Σ_{j ∈ I_2} λ_j^k ∇h_j(w^k, t_k)
= -(1 - t_k) ∇f~(w^k) - Σ_{i ∉ I_1} (1 - t_k) v_i^k ∇g~_i(w^k) - Σ_{j ∉ I_2} λ_j^k ∇h_j(w^k, t_k) - t_k (w^k - w^0).

Since {w^k} is bounded and f~, g~, h are sufficiently smooth, when k -> ∞ we have I_1 ⊆ I(w_bar). Let

v~_i = lim_{k -> ∞} v_i^k / min_{i ∈ I_1, j ∈ I_2} {|v_i^k|, |λ_j^k|}, λ~_j = lim_{k -> ∞} λ_j^k / min_{i ∈ I_1, j ∈ I_2} {|v_i^k|, |λ_j^k|};

then at least one of v~_i and λ~_j is not equal to zero. Hence, dividing the above identity by min_{i ∈ I_1, j ∈ I_2} {|v_i^k|, |λ_j^k|} and letting k -> ∞, we obtain

Σ_{i ∈ I_1} (1 - t_bar) v~_i ∇g~_i(w_bar) + Σ_{j ∈ I_2} λ~_j ∇h_j(w_bar, t_bar) = 0,

which contradicts Condition (A.3).

(ii) When t_bar = 1: by the third equation of (3.5), we get I_1 ⊆ I(w_bar). Hence

lim_{k -> ∞} [ (1 - t_k) Σ_{i ∈ I_1} v_i^k ∇g~_i(w^k) + Σ_{i=1}^{m+l} λ_i^k ∇_w h_i(w^k, t_k) ] = -(w_bar - w^0)

exists. Letting

(1 - t_k) v_i^k -> v~_i >= 0 (i ∈ I_1), λ_i^k -> λ~_i (i = 1, ..., m + l),

we have

Σ_{i ∈ I_1} v~_i ∇g~_i(w_bar) + ∇_w h(w_bar, 1) λ~ = -(w_bar - w^0).

This contradicts Condition (A.4). Hence {(v^k, λ^k)} is bounded, and the result is true.

By Lemma 3.3 and Lemma 3.4, we obtain the main result of this paper.

Lemma 3.5: Suppose that Condition A holds. Then (3.3) has a solution as t decreases to 0: for almost all θ^0 ∈ Ω(t), H_{θ0_0}^{-1}(0) contains a smooth curve Γ_{θ0_0}, which starts from (θ^0, 1). As t -> 0, the limit set Θ × {0} ⊂ Ω(1) × {0} of Γ_{θ0_0} is nonempty, and every point of Θ is a solution of (3.3). In particular, if Γ_{θ0_0} is of finite length and (θ_bar, 0) is the end point of Γ_{θ0_0}, then θ_bar is a KKT solution of problem (2.2).

Proof: By Lemma 3.3, for almost all θ^0 ∈ Ω(t), 0 is a regular value of H_{θ0_0}, and H_{θ0_0}^{-1}(0) consists of some smooth curves; among them, a smooth curve Γ_{θ0_0} starts from (θ^0, 1). By the classification theorem of one-dimensional smooth manifolds, Γ_{θ0_0} is diffeomorphic to a unit circle or the unit interval.


Because the Jacobian

∂H(θ, 1)/∂θ evaluated at θ = (w^0, v^0, 0) =
( Σ_{i=1}^{m+l} λ_i ∇^2_w h_i(w^0, 1) + I   0   ∇_w h(w^0, 1) ;
 ∇_w h(w^0, 1)^T   0   0 ;
 V^0 ∇g~(w^0)^T   diag(g~(w^0))   0 )

is nonsingular (using g~(w^0) < 0), Γ_{θ0_0} is diffeomorphic to the unit interval.

Let (θ_bar, t_bar) be a limit point of Γ_{θ0_0}; then three cases are possible:
(i) (θ_bar, t_bar) ∈ Ω(t) × {1};
(ii) (θ_bar, t_bar) ∈ ∂Ω(t) × (0, 1];
(iii) (θ_bar, t_bar) ∈ Ω(t) × {0}.

By Lemma 3.1, the equation H(θ, θ0_0, 1) = 0 has only one solution ((θ0_0, 0)^T, 1) in Ω(1) × {1}, so case (i) is impossible. In case (ii), there must exist a sequence (θ^k, t_k) ∈ Γ_{θ0_0} such that g_j(x^k, y^k) -> 0 for some 1 <= j <= s. From the third equation of (3.5), it follows that ||v_j^k|| -> ∞, which contradicts Lemma 3.4. Hence case (ii) is impossible, and case (iii) is the only possible case.

By Lemma 3.5, Γ_{θ0_0} is determined by the initial value problem for the following system of ordinary differential equations:

DH_{θ0_0}(θ, t) (θ'(ς); t'(ς)) = 0,
θ(0) = θ^0, t(0) = 1.

If there is ς_bar > 0 such that t(ς_bar) = 0, then θ_bar = θ(ς_bar) must be a solution of (3.3).

4 Algorithm

In this section, we discuss how to trace the homotopy equation (3.4) numerically. By Lemma 3.5, for almost all θ^0 ∈ Ω(t), the homotopy (3.4) generates a smooth curve Γ_{θ0_0}, which we call the homotopy path. Tracing numerically from (θ^0, 1) until t -> 0, one can find a solution of (3.3). The tangent vector at a point on Γ_{θ0_0} has two opposite directions: one (the positive direction) makes ς increase, and the other (the negative direction) makes ς decrease. The negative direction leads back to the initial point, so we must go along the positive direction. The criterion in step (1)(b) of Algorithm 1 that determines the positive direction is based on a basic result of homotopy method theory (see [13], [14], [15]): the positive direction η at any point (w, t) on Γ_{θ0_0} keeps the sign of the determinant

det( H'_{θ0_0}(θ^0, 1) ; η^T )

invariant. We can then determine the direction by the following result.

Proposition 4.1 (Garcia and Zangwill [14], Allgower and Georg [15]): If Γ_{θ0_0} is smooth, then the positive direction η^(0) at the initial point (θ^0, 1) satisfies

sign det( H'_{θ0_0}(θ^0, 1) ; η^(0)T ) = (-1)^{m+s+l+1}.

We give the algorithm as follows.

Algorithm 1 (Euler-Newton method)
(0) Give t_0 = 1, an initial point θ^0 ∈ Ω(t), a step-length h_0 > 0, and two small positive numbers ε_1, t_ε > 0; set k := 0.
(1) Compute a predictor point (θ^{(k+1,0)}, t_{k+1,0}):
(a) compute a unit tangent vector ζ^(k) ∈ R^{n+2m+s+l+1};
(b) determine the direction η^(k) of the predictor step: if the sign of the determinant

det( H'_{θ0_0}(θ^(k), t_k) ; ζ^(k)T )

is (-1)^{m+s+l+1}, then η^(k) = ζ^(k); if the sign is (-1)^{m+s+l}, then η^(k) = -ζ^(k);
(c) set (θ^{(k+1,0)}, t_{k+1,0}) = (θ^(k), t_k) + h_k η^(k).
(2) Compute a corrector point (θ^{(k+1)}, t_{k+1}):

(θ^{(k+1,j)}, t_{k+1,j}) = (θ^{(k+1,j-1)}, t_{k+1,j-1}) - M_{k,j-1}^+ H_{θ0_0}(θ^{(k+1,j-1)}, t_{k+1,j-1}), j = 1, 2, ...,

until ||H_{θ0_0}(θ^{(k+1,j)}, t_{k+1,j})|| <= ε_1. Set (θ^{(k+1)}, t_{k+1}) = (θ^{(k+1,j)}, t_{k+1,j}).
(3) If t_{k+1} <= t_ε, then stop; otherwise choose a new step-length h_{k+1} > 0, set k := k + 1, and go to (1).

In Algorithm 1,

M_{k,j-1} = H'_{θ0_0}(θ^{(k+1,j-1)}, t_{k+1,j-1}),  (4.1)
M^+ = M^T (M M^T)^{-1}

is the Moore-Penrose inverse of M.
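As an illustration of Algorithm 1, the following Python sketch (ours, not the authors' implementation) traces a generic homotopy H(θ, t) = 0 with an Euler predictor and a Newton corrector based on the Moore-Penrose inverse of (4.1). The user-supplied map `H`, its Jacobian `DH`, and the sign exponent `parity` (corresponding to m + s + l + 1) are assumptions of the sketch.

```python
import numpy as np

def trace_homotopy(H, DH, theta0, parity, h=0.05, eps1=1e-8, t_eps=1e-6, max_steps=1000):
    """Euler-Newton path tracing for H(z) = 0 with z = (theta, t), from t = 1 toward t = 0.

    H  : function of z returning a vector of length n (one less than len(z)).
    DH : function of z returning the n x (n+1) Jacobian of H.
    parity : exponent from Proposition 4.1, i.e. m + s + l + 1.
    """
    z = np.append(theta0, 1.0)                      # start point (theta^0, 1)
    for _ in range(max_steps):
        J = DH(z)
        # (1a) unit tangent vector: (approximate) kernel of the n x (n+1) Jacobian.
        _, _, Vt = np.linalg.svd(J)
        zeta = Vt[-1]
        # (1b) orient the tangent so that sign(det([J; zeta^T])) = (-1)**parity.
        if np.sign(np.linalg.det(np.vstack([J, zeta]))) != (-1) ** parity:
            zeta = -zeta
        # (1c) Euler predictor step.
        z_pred = z + h * zeta
        # (2) Newton corrector with the Moore-Penrose inverse M^+ = M^T (M M^T)^{-1}.
        for _ in range(20):
            r = H(z_pred)
            if np.linalg.norm(r) <= eps1:
                break
            M = DH(z_pred)
            z_pred = z_pred - M.T @ np.linalg.solve(M @ M.T, r)
        z = z_pred
        # (3) stop once the homotopy parameter t has essentially reached 0.
        if z[-1] <= t_eps:
            return z[:-1]                           # approximate KKT point theta
    return z[:-1]
```

In practice, the step-length h would be adapted between predictor steps, as step (3) of Algorithm 1 allows; a fixed h is used here only to keep the sketch short.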
References

[1] Z. Q. Luo, J. S. Pang, and D. Ralph, Mathematical Programs with Equilibrium Constraints, Cambridge University Press, New York, 1996.
[2] J. Outrata, M. Kocvara, and J. Zowe, Nonsmooth Approach to Optimization Problems with Equilibrium Constraints, Kluwer Academic Publishers, The Netherlands, 1998.
[3] G. P. McCormick, The projective SUMT method for convex programming, Math. Oper. Res. 14 (1989), 203-223.
[4] G. Anandalingam and T. Friesz (Eds.), Hierarchical Optimization, Annals of Operations Research 34 (1992).
[5] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory, Academic Press, New York, 1982.
[6] D. L. Zhu, Q. Xu, and Z. Lin, A homotopy method for solving bilevel programming problem, Nonlinear Analysis 57 (2004), 917-928.
[7] I. Constantin and M. Florian, A method for optimizing the frequencies in a transit network: a special case of nonlinear bilevel programming, Technical Report TRISTAN 1, Centre de recherche sur les transports, University of Montreal, 1991.
[8] O. L. Mangasarian and J. S. Pang, Exact penalty functions for mathematical programs with linear complementarity constraints, Report, Computer Science Department, University of Wisconsin, Madison, WI 53706, U.S.A., 1998.
[9] R. L. Tobin, Uniqueness results and algorithm for Stackelberg-Cournot-Nash equilibria, Annals of Operations Research 34 (1992), 21-36.
[10] E. L. Allgower and K. Georg, Numerical Continuation Methods: An Introduction, Springer, Berlin, New York, 1990.
[11] T. L. Friesz, R. L. Tobin, H. J. Cho, and N. J. Mehta, Sensitivity analysis based heuristic algorithms for mathematical programs with variational inequality constraints, Math. Prog. 48 (1990), 265-284.
[12] Y. Ishizuka and E. Aiyoshi, Double penalty method for bilevel optimization problems, in: G. Anandalingam and T. Friesz (Eds.), Hierarchical Optimization, Ann. Oper. Res. 34 (1992), 73-88.
[13] Q. Xu, B. Yu, and G. C. Feng, Globally convergent method for variational inequalities, Journal of Global Optimization 31 (2005), no. 1, 121-131.
[14] Z. Jiang, Y. Liu, and Y. Yang, An algorithm for solving multiobjective programs with equilibrium constraints, Journal of Jilin University (Science Edition) 43 (2005), no. 3, 275-281.
[15] E. L. Allgower and K. Georg, Simplicial and continuation methods for approximating fixed points and solutions to systems of equations, SIAM Rev. 22 (1980), 28-85.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 713--718
Copyright@2007 Watam Press

A State-Of-Charge Estimation Method Based On Extension Theory


For Lead-Acid Batteries
KUEI HSIANG CHAO
Department of Electrical Engineering, National Chin-Yi University of Technology, Taichung, Taiwan

MENG HUI WANG


Department of Electrical Engineering, National Chin-Yi University of Technology, Taichung, Taiwan

CHIA CHANG HSU


Institute of Information and Electrical Energy, National Chin-Yi University of Technology, Taichung, Taiwan

Abstract: This paper presents a state-of-charge (SOC) estimation method for lead-acid batteries based on extension theory. First, a constant-current discharging experiment with an electronic load is performed on lead-acid batteries, and the internal resistance and open-circuit voltage are measured and recorded using an internal resistance tester. Then, the experimental data are used to discriminate the remaining capacity of the lead-acid battery through an estimation method constructed on extension theory. The simulation results indicate that the proposed estimation method can discriminate the remaining capacity of lead-acid batteries rapidly and accurately.

1 Introduction

Modern convenience products such as electric vehicles, mobile telephones, uninterruptible power supplies, and emergency lights require battery energy, and electrochemical cells play an important role in everyday life. Among these, the lead-acid battery is the most widely used, owing to its simple structure, low price, large electromotive force (EMF), and wide operating temperature range. Because only a little of the gas in a lead-acid battery can be lost, it is difficult to estimate the remaining capacity from consumption. However, users usually need to know how much electric energy is left and how long the battery can still be used. If the state of electric capacity can be estimated more accurately, permanent harm to the lead-acid battery caused by over-discharging can be avoided; moreover, operating efficiency and service life will improve, making the most efficient use of the lead-acid battery energy.

There are currently several methods for estimating the remaining capacity of lead-acid batteries, including electrolyte specific gravity, open-circuit voltage, internal resistance, coulometric measurement, and loaded voltage [1]-[4]. Among these, the open-circuit voltage and internal resistance methods are the most commonly used. The open-circuit voltage method uses the linear relation between the voltage value and the remnant capacity of the lead-acid battery to carry out SOC estimation. However, it takes one half to one hour, the recovery time of the open-circuit voltage after changing from the charge state to the discharge state, for a lead-acid battery to reach a stable state; estimating the remaining capacity from the open-circuit voltage during this period therefore incurs sizable errors. Moreover, during this period, the open-circuit voltage with respect to remnant capacity is not linear but only quasi-linear. The internal resistance method estimates the residual capacity of a lead-acid battery by monitoring the change in internal resistance during discharging. However, mistakes may occur because the small variation of resistance in the early discharge period cannot be measured sufficiently accurately.

This paper presents a residual capacity estimation method based on extension theory for SOC estimation of the lead-acid battery. Extension theory was proposed by Cai in 1983 for the purpose of solving inconsistent problems [5] and has been widely adopted in many domains [6]-[8]. The proposed method first creates a set of remaining-capacity matter-elements of the lead-acid battery; a regular extended correlation function can then directly estimate the remaining capacity of the battery by calculating the degrees of extended correlation. According to the results, the proposed capacity estimation method can discriminate the remaining capacity exactly, and thus make the most efficient use of the lead-acid battery energy.

2 Lead-Acid Battery

The discharge reaction of a lead-acid battery mainly takes place on the polar boards. Oxidation and reduction reactions occur on the positive electrode (anode) board and the negative electrode (cathode) board, respectively. Equation (2.1) shows the discharge and charge chemical reactions of a lead-acid battery on its anode and cathode boards; the discharge reaction proceeds toward the right side and the charge reaction toward the left side. The discharge reaction produces lead sulphate (PbSO4) on both the anode (PbO2) and cathode (Pb) polar boards, which makes the electrolyte concentration near the polar boards drop, so that the battery voltage also drops immediately. If one removes the load at this time, the electrolyte near the pole plates rises again because of the diffusion effect.


The battery voltage then also increases immediately. The electrolyte density becomes even and the rising speed of the battery voltage becomes steady after about 2 hours; the voltage measured at this time is exactly the open-circuit voltage.

PbO2 + 2 H2SO4 + Pb  <-(Charge)-(Discharge)->  PbSO4 + 2 H2O + PbSO4  (2.1)
(anode)              (cathode)                 (anode)        (cathode)

Figure 1 shows the familiar equivalent circuit of a lead-acid battery [9]. Here R_in1 is the equivalent resistance between the battery electrode and the battery electrolyte, and R_in2 is the interface resistance of the battery electrode and the battery electrolyte. C is the capacitance formed by the space charge between the active substrate and the electrolyte interface during discharging, and I is the discharge current. The battery voltage can be obtained from Fig. 1 as

V_b = V_oc - (R_in1 + R_in2) I + R_in2 I e^(-t/(C R_in2)) + V_C e^(-t/(C R_in1)).  (2.2)

The voltage equation can be simplified as equation (2.3) when the circuit is in a steady state:

V_b = V_oc - I (R_in1 + R_in2).  (2.3)

The internal resistance can then be derived as

R_in1 + R_in2 = ΔV / I,  (2.4)

where ΔV ≈ V_oc - V_b.

[Fig. 1. Equivalent circuit of a lead-acid battery.]

3 Summary of Extension Theory

Many questions cannot be answered under their existing conditions, but an answer may be found after a suitable transformation. Besides the algebraic transformations employed in mathematical computation and the Z-transformation widely used in digital design applications, extension theory can be used to solve contradictory problems through transformation of the matter-element. The extension set extends the fuzzy set from [0, 1] to (-∞, ∞) [10]. Therefore, an extension set can assign any value in this range to express the degree of relationship between a point and two regions. If the value is below -1, the point is not in the set. If the value is between 0 and -1, the point is originally outside the set but can be moved into the set by changing the conditions. Values above 0 represent points located in the set.

3.1 Matter-Element Principle

The matter-element is one of the main concepts of extension theory. A matter-element contains three essential factors: the matter, called N; its characteristic c; and the value v related to c. The matter-element can be expressed as in equation (3.1):

R = (N, c, v).  (3.1)

In addition, we can define R = (N, C, V) as a multi-dimensional matter-element with a characteristic vector C = [c1, c2, ..., cn] and a value vector V = [v1, v2, ..., vn] with respect to C. The multi-dimensional matter-element is described as

R = [R1; R2; ...; Rn] = [N, c1, v1; c2, v2; ...; cn, vn].  (3.2)

In equation (3.2), R_j = (N, c_j, v_j) (j = 1, 2, ..., n) is a sub-matter-element of R.

3.2 Concepts of Extension Mathematics

In the classical mathematical field, 0 and 1 indicate whether a matter owns a property or not. The extension set, however, adopts real numbers from (-∞, ∞) to indicate the degree to which a matter owns a property. Assuming U is a universal set, x is any element of U, and X_o is the classical domain, an extension set E is a set of ordered pairs in U:

E = {(x, y) | x ∈ U, y = K(x) ∈ (-∞, ∞)}.  (3.3)


There are several forms of correlation function; a primitive extended correlation function is applied in this paper. Let X_o = <a, b> be the classical region and X = <m, n> the joint field, with X_o ⊂ X. The primitive extended correlation function can then be written as

K(x) = ρ(x, X_o) / D(x, X_o, X),  (3.4)

where

ρ(x, X_o) = |x - (a + b)/2| - (b - a)/2,  (3.5)

D(x, X_o, X) = ρ(x, X) - ρ(x, X_o) if x ∉ X_o; -1 if x ∈ X_o.  (3.6)

It is convenient to calculate the degree of relation between x and X_o by using the extended correlation function [11].

4 The Proposed SOC Estimation Method

4.1 Matter-Element Model of SOC Estimation

The remaining quantity of electricity of the lead-acid battery is classified into ten types, with the following symbols:

K1: the remaining capacity is 90%.
K2: the remaining capacity is 80%.
K3: the remaining capacity is 70%.
K4: the remaining capacity is 60%.
K5: the remaining capacity is 50%.
K6: the remaining capacity is 40%.
K7: the remaining capacity is 30%.
K8: the remaining capacity is 20%.
K9: the remaining capacity is 10%.
K0: the remaining capacity is 0%.

Table 1 shows the matter-element model of SOC estimation based on extension theory. Here R represents the matter-elements of the ten kinds of remaining capacity, K_n = {K1, K2, K3, ..., K0} is the set of remaining-capacity sorts, and each matter-element has three characteristics: the internal resistance R_in, the open-circuit voltage V_oc, and the short-circuit current V_oc / R_in. The classical region of each characteristic is assigned from the lower and upper boundaries of the measured records in each capacity section; it is also considered the value range of each matter-element model. The joint field of every characteristic, i.e., the possible range of all SOC estimation matter-element models of the lead-acid battery in this paper, can be determined directly from the maximum and minimum values of every characteristic in the test records. R can be represented as

R = (K, C, V_F) = { (K, R_in, <11.1, 96.8>); (V_oc, <11.04, 12.84>); (V_oc/R_in, <0.119, 1.14299>) }.  (4.1)

4.2 The Extension SOC Estimation Method

The proposed extension SOC estimation method can be implemented as a measured-electrical-capacity method for the lead-acid battery in a computer program. It proceeds in the following steps.

Step 1: Establish the matter-element model of each remaining-capacity type in every section as

R_j = { (K_j, C1, V_j1); (C2, V_j2); (C3, V_j3) }, j = 1, 2, 3, ..., 9, 0,  (4.2)

where V_jk = <a_jk, b_jk> is the classical region of each characteristic, and its joint field is expressed as V'_jk = <m_jk, n_jk>. In this paper, the classical region of each SOC estimation matter-element is assigned by the maximum and minimum values of open-circuit voltage, internal resistance, and short-circuit current; these values are produced by the upper- and lower-bound values of saturation voltage and cutoff voltage of the lead-acid battery.

Step 2: Set the SOC estimation matter-element of the tested lead-acid battery as

R_x = (K_x, C, V_F) = { (K_x, R_in, V_f1); (V_oc, V_f2); (V_oc/R_in, V_f3) }.  (4.3)

Step 3: Estimate the degree of relation of the tested lead-acid battery, when the discharge is interrupted, by the proposed extended correlation function:

K_jk(v_jk) = -ρ(v_jk, V_jk) / |V_jk| if v_jk ∈ V_jk;
K_jk(v_jk) = ρ(v_jk, V_jk) / (ρ(v_jk, V'_jk) - ρ(v_jk, V_jk)) if v_jk ∉ V_jk,  (4.4)
j = 1, 2, 3, ..., 9, 0; k = 1, 2, 3,

where

|V_jk| = (b_jk - a_jk) / 2,  (4.5)

ρ(v_jk, V_jk) = |v_jk - (a_jk + b_jk)/2| - (b_jk - a_jk)/2,  (4.6)

ρ(v_jk, V'_jk) = |v_jk - (m_jk + n_jk)/2| - (n_jk - m_jk)/2.  (4.7)
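A minimal Python sketch of the extended correlation function (3.5) and (4.4)-(4.7) follows. This is our illustration, not code from the paper; the sign convention is the reconstruction used above, so that values inside the classical region fall in (0, 1], matching the interpretation of extension sets in Section 3.2.

```python
def rho(x, lo, hi):
    """Extension distance rho(x, <lo, hi>) of eq. (3.5)/(4.6)."""
    return abs(x - (lo + hi) / 2.0) - (hi - lo) / 2.0

def correlation(x, classical, joint):
    """Extended correlation K(x) of eq. (4.4) for one characteristic.

    classical = (a, b): classical region V_jk of one capacity category.
    joint     = (m, n): joint field V'_jk covering all categories.
    Returns a value in (0, 1] inside the classical region and a
    negative value outside it.
    """
    a, b = classical
    r0 = rho(x, a, b)
    if a <= x <= b:                   # inside the classical region
        return -r0 / ((b - a) / 2.0)  # first branch of (4.4), with |V_jk| of (4.5)
    r1 = rho(x, *joint)
    return r0 / (r1 - r0)             # second branch of (4.4)

# Example: an open-circuit voltage of 12.79 V (tested battery No. 9 in Table 2)
# scored against the 90% category of Table 1, with the joint field of (4.1).
print(correlation(12.79, (12.71, 12.82), (11.04, 12.84)))   # about 0.545
```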


Step 4: Assign weights W_j1, W_j2, W_j3 to the estimation model; they denote the significance of each estimation characteristic in the extension SOC estimation.

Step 5: Calculate the relation degree of each estimation category:

λ_j = Σ_{k=1}^{3} W_jk K_jk, j = 1, 2, 3, ..., 9, 0.  (4.8)

Step 6: Normalize the degree of relation of each remaining category into the interval between 1 and -1; this makes it convenient to identify the remaining type:

λ'_j = λ_j / λ_max if λ_j > 0; λ'_j = λ_j / |λ_min| if λ_j <= 0.  (4.9)

Step 7: Select the maximum value among the normalized relation indices (i.e., 1) to recognize the remaining-capacity category of the tested lead-acid battery. The corresponding rule is

if λ'_j = 1, then K_n = K_j.  (4.10)

The remaining-capacity indices obtained from the proposed estimation method not only point out the best-matching remaining-capacity type compared with the others, but also indicate the membership possibility of the other categories. In general, the larger a relation index value, the greater the possibility.

Step 8: If a new tested lead-acid battery exists, go back to Step 2; otherwise end the process.

Table 1 The SOC estimation matter-element model of the lead-acid battery

Category | Matter-Element Model
K1 | R1 = { (K1, R_in, <11.19, 11.82>); (V_oc, <12.71, 12.82>); (V_oc/R_in, <1.081494, 1.142091>) }
K2 | R2 = { (K2, R_in, <11.83, 12.16>); (V_oc, <12.59, 12.69>); (V_oc/R_in, <1.036184, 1.072696>) }
K3 | R3 = { (K3, R_in, <12.18, 12.64>); (V_oc, <12.49, 12.58>); (V_oc/R_in, <0.988133, 1.03284>) }
K4 | R4 = { (K4, R_in, <12.65, 13.69>); (V_oc, <12.33, 12.46>); (V_oc/R_in, <0.900657, 0.98498>) }
K5 | R5 = { (K5, R_in, <13.7, 15.95>); (V_oc, <12.18, 12.32>); (V_oc/R_in, <0.775445, 0.940541>) }
K6 | R6 = { (K6, R_in, <15.96, 20.18>); (V_oc, <11.87, 12.18>); (V_oc/R_in, <0.588206, 0.763158>) }
K7 | R7 = { (K7, R_in, <20.19, 25.24>); (V_oc, <11.7, 11.86>); (V_oc/R_in, <0.465533, 0.58742>) }
K8 | R8 = { (K8, R_in, <25.27, 36.65>); (V_oc, <11.59, 11.74>); (V_oc/R_in, <0.316235, 0.464583>) }
K9 | R9 = { (K9, R_in, <33.66, 46.5>); (V_oc, <11.48, 11.6>); (V_oc/R_in, <0.227112, 0.316421>) }
K0 | R0 = { (K0, R_in, <46.7, 96.7>); (V_oc, <11.05, 11.47>); (V_oc/R_in, <0.118925, 0.24561>) }
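Combining the pieces, the sketch below (ours) scores tested battery No. 1 of Table 2 against two categories of Table 1 following Steps 3-5. The equal weights W_jk = 1/3 are an assumption, since the paper does not list the weight values it uses, so the printed numbers illustrate only the ranking, not Table 3's exact indices.

```python
def rho(x, lo, hi):
    return abs(x - (lo + hi) / 2.0) - (hi - lo) / 2.0

def correlation(x, classical, joint):
    a, b = classical
    r0 = rho(x, a, b)
    if a <= x <= b:
        return -r0 / ((b - a) / 2.0)
    return r0 / (rho(x, *joint) - r0)

# Joint fields from eq. (4.1): R_in, V_oc, V_oc/R_in.
JOINT = [(11.1, 96.8), (11.04, 12.84), (0.119, 1.14299)]

# Classical regions of two categories transcribed from Table 1.
CATEGORIES = {
    "90%": [(11.19, 11.82), (12.71, 12.82), (1.081494, 1.142091)],
    "80%": [(11.83, 12.16), (12.59, 12.69), (1.036184, 1.072696)],
}

# Tested battery No. 1 from Table 2: R_in, V_oc, V_oc/R_in.
sample = (11.82, 12.82, 1.084602)

weights = [1.0 / 3] * 3              # assumed equal weights W_jk
for name, regions in CATEGORIES.items():
    # Step 5, eq. (4.8): weighted sum of the three correlations.
    lam = sum(w * correlation(v, c, j)
              for w, v, c, j in zip(weights, sample, regions, JOINT))
    print(name, round(lam, 3))
```

The 90% category obtains the larger relation degree, in agreement with row 1 of Table 3, where battery No. 1 is classified as 90% remaining capacity.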


5 Simulation Results

A SHYKUANG BP12-12 lead-acid battery was used in a constant-current (1 A) discharge experiment, with an electronic load connected to the lead-acid battery. The change of internal resistance in each minute and the open-circuit voltage in each hour after load disconnection were recorded. Figure 2 shows the relation between the open-circuit voltage and the degree of discharge of the lead-acid battery, and Fig. 3 shows the relation between the internal resistance and the degree of discharge. The experimental results are all close to those in the manufacturer's datasheet.

[Fig. 2. The relation curve between remaining capacity and open-circuit voltage of the lead-acid battery.]

[Fig. 3. The relation curve between remaining capacity and internal resistance of the lead-acid battery.]

To confirm the feasibility of the proposed SOC estimation method, the 10 remaining-capacity categories were first built according to the relation between battery discharge time and the total capacity of the battery. Then the parameters of 15 tested samples were set arbitrarily from the experimental data; the parameters include open-circuit voltage, internal resistance, and short-circuit current (defined as V_oc / R_in), and are listed in Table 2. Next, the tested matter-elements were established by applying the matter-element principle of extension theory. Finally, the tested data were fed into the proposed SOC estimation system. The estimated results of the remaining capacity are shown in Table 3. The simulation results prove that the proposed estimation method can easily recognize the remaining category, as seen in Table 3. For example, for tested number 3, the relation index of the remaining-60% category is 1 (the maximum value), which indicates that the remaining capacity of the battery is 60% and it can continue to be used without charging. For the same tested number, we can also be sure that 0% is impossible, since its relation index is equal to -1 (the minimum value).

Table 2 Tested data of the lead-acid battery

No. | Internal Resistance (Ohm) | Open Circuit Voltage (V) | Short Circuit Current (A) | Actual Remaining Capacity
1 | 11.82 | 12.82 | 1.084602 | 90%
2 | 33.7 | 11.61 | 0.34451 | 20%
3 | 12.75 | 12.44 | 0.975686 | 60%
4 | 17.76 | 12 | 0.675676 | 40%
5 | 23.82 | 11.78 | 0.494542 | 30%
6 | 52.7 | 11.42 | 0.216698 | 0%
7 | 41.7 | 11.5 | 0.276978 | 10%
8 | 26.62 | 11.73 | 0.440646 | 20%
9 | 11.56 | 12.79 | 1.106401 | 90%
10 | 14.64 | 12.25 | 0.836749 | 50%
11 | 18.68 | 11.96 | 0.640257 | 40%
12 | 15.46 | 12.21 | 0.78978 | 50%
13 | 13.7 | 12.32 | 0.89927 | 60%
14 | 24.52 | 11.76 | 0.479608 | 30%
15 | 12.15 | 12.65 | 1.04115 | 80%

6 Conclusions

In this paper, an SOC estimation method based on extension theory for lead-acid batteries was proposed. First, a discharge with constant current (1 A) was carried out using an electronic load, and the changes in internal resistance and open-circuit voltage were recorded with the TES-32 internal resistance tester. Secondly, the extension relation index was used to estimate the remaining capacity of the lead-acid battery. Finally, the simulation results show that the proposed estimation method can recognize the remaining capacity of lead-acid batteries accurately and rapidly. When the proposed SOC estimation method is applied to other lead-acid batteries, only parts of the data need to be modified; this saves time on updating data and increases the practicability of the proposed remaining-capacity estimation method. The proposed method does not require particular parameters to be set, and there are no learning procedures. Moreover, the computation process is fast and simple. Hence, the proposed estimation method is convenient for a great variety of secondary cells.

References

[1] J. H. Aylor et al., A Battery State-of-Charge Indicator for Electric Wheelchairs, IEEE Transactions on Industrial Electronics, (1992), 398-409.
[2] M. J. Hlavac and D. Feder, VRLA Battery Monitoring Using Conductance Technology, International Telecommunications Energy Conference, INTELEC (1995), 284-291.
[3] T. Yanagihara and A. Kawamura, Residual Capacity Estimation of Sealed Lead-Acid Batteries for Electric Vehicles, Power Conversion Conference, (1997), 943-946.
[4] O. Caumont et al., Energy Gauge for Lead-Acid Batteries in Electric Vehicles, IEEE Transactions on Energy Conversion, 15 (2000), 354-360.
[5] W. Cai, The Extension Set and Incompatibility Problem, Journal of Scientific Exploration, 1 (1983), 81-93.
[6] Y. P. Huang and H. J. Chen, The Extension-based Fuzzy Modeling Method and Its Applications, IEEE Canadian Conference on Electrical and Computer Engineering, (1999), 977-982.
[7] J. Li and S. Wang, Primary Research on Extension Control and System, International Academic Publishers, New York, (1991).
[8] M. H. Wang and H. C. Chen, Application of Extension Theory to the Fault Diagnosis of Power Transformers, Proceedings of the 22nd Symposium on Electrical Power Engineering, (2001), 797-800 (in Taiwan).
[9] S. Sato and A. Kawamura, A New Estimation Method of State of Charge Using Terminal Voltage and Internal Resistance for Lead-Acid Battery, Power Conversion Conference, (2002), 565-570.
[10] M. H. Wang, Application of Extension Theory to Vibration Fault Diagnosis of Generator Sets, IEE Proceedings - Generation, Transmission and Distribution, 151 (2004), 503-508.
[11] S. H. Ho, K. H. Chao, and M. H. Wang, Application of Extension Fault Diagnosis Method to Malfunction Investigation of Photovoltaic System, The 10th Conference on Artificial Intelligence and Applications, (2005), 282-289 (in Taiwan).

Table 3 Estimated results of battery SOC based on the proposed method


Tested Relation Index of the Remaining Capacity Type Estimation
No. 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Result
K1 K2 K3 K4 K5 K6 K7 K8 K9 K0
1 1 0.937 0.084 -0.102 -0.593 -0.794 -0.912 -0.961 -1 -0.984 90%
2 -0.840 -0.834 -1 -0.797 -0.838 -0.667 -0.459 1 -0.159 -0.593 20%
3 0.230 0.434 0.636 1 0.108 -0.450 -0.762 -0.890 -0.992 -1 60%
4 -0.590 -0.571 -0.666 -0.475 -0.340 1 -0.362 -0.676 -0.927 -1 40%
5 -0.748 -0.736 -0.881 -0.675 -0.661 -0.348 1 -0.126 -0.788 -1 30%
6 -0.785 -0.782 -1 -0.757 -0.843 -0.708 -0.585 -0.290 0.075 1 0%
7 -0.886 -0.882 -1 -0.864 -0.903 -0.812 -0.717 -0.444 1 -0.423 10%
8 -0.792 -0.777 -1 -0.704 -0.717 -0.361 0.211 1 -0.570 -0.965 20%
9 1 -0.317 -0.657 -0.681 -0.875 -0.934 -0.972 -0.987 -1 -0.988 90%
10 -0.462 -0.424 -0.484 -0.183 1 -0.289 -0.682 -0.842 -0.970 -1 50%
11 -0.588 -0.565 -0.674 -0.473 -0.367 1 -0.185 -0.585 -0.905 -1 40%
12 -0.334 -0.295 -0.395 -0.065 1 0.181 -0.467 -0.733 -0.946 -1 50%
13 0.230 0.434 0.636 1 0.108 -0.450 -0.762 -0.890 -0.992 -1 60%
14 -0.710 -0.695 -0.890 -0.618 -0.609 -0.224 1 0.229 -0.700 -1 30%
15 0.399 1 0.592 0.232 -0.394 -0.698 -0.872 -0.943 -1 -0.993 80%


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 719--723
Copyright@2007 Watam Press

ON SUBNORMAL COMPLETION PROBLEM


CHUNJI LI1, SHENGJUN LI2 AND JIANRONG WU3
1
Institute of System Science, College of Sciences, Northeastern University, Shenyang 110004, China
2
College of Information Sciences and Technology, Hainan University, Haikou 570228, China
3
Department of Mathematics, University of Science and Technology of Suzhou, Suzhou 215009, China
E-mail: jrwu@mail.usts.edu.cn

Abstract: We first reconsider Curto-Fialkow's subnormal completion criterion, and then discuss the two-variable subnormal completion problem by using the solution of the truncated complex moment problem.

Keywords: Subnormal completion problem; unilateral weighted shift; moment problem

1. Introduction and Preliminaries

Let H be a separable, infinite-dimensional, complex Hilbert space and let L(H) denote the algebra of all bounded linear operators on H. An operator T ∈ L(H) is said to be normal if T*T = TT*, hyponormal if T*T >= TT*, and subnormal if T = N|_H, where N is normal on some Hilbert space K containing H. It is well known that normal => subnormal => hyponormal, and that the converses are false. Recently, the classes of k-hyponormal and weakly k-hyponormal operators have been introduced and studied in an attempt to bridge the gap between subnormality and hyponormality.

Let Z_+ := {0} ∪ N, where N is the set of positive integers. For A, B ∈ L(H), let [A, B] = AB - BA. We say that an n-tuple T = (T_1, ..., T_n) of operators on H is hyponormal if the operator matrix ([T_j*, T_i])_{i,j=1}^{n} is positive on the direct sum of n copies of H. For k ∈ N and T ∈ L(H), T is k-hyponormal if (I, T, ..., T^k) is hyponormal. The Bram-Halmos characterization of subnormality ([1]) may be paraphrased as follows: T ∈ L(H) is subnormal if and only if T is k-hyponormal for every k ∈ N. This provides the bridge between k-hyponormality and subnormality.

Let {e_n}_{n=0}^{∞} be the canonical orthonormal basis for l^2(Z_+), and let α = {α_n}_{n=0}^{∞} be a bounded sequence of positive numbers. Let W_α be the unilateral weighted shift defined by W_α e_n := α_n e_{n+1} for all n ∈ Z_+. The numbers

γ_0 := 1, γ_1 := α_0^2, γ_2 := α_0^2 α_1^2, ..., γ_n := α_0^2 ... α_{n-1}^2,

are called the moments of W_α. It is well known that W_α is hyponormal if and only if α_n <= α_{n+1} for all n ∈ Z_+.

Berger's Theorem ([2, III.8.16]): W_α is subnormal if and only if there exists a Borel probability measure μ supported in [0, ||W_α||^2], with ||W_α||^2 ∈ supp μ, such that γ_n = ∫ t^n dμ(t) for all n ∈ Z_+.

On the other hand, it was shown that W_α is subnormal if and only if the two Hankel matrices (γ_{p+q})_{p,q=0}^{∞} and (γ_{p+q+1})_{p,q=0}^{∞} are positive ([4]).

For a weighted shift W_α, if there exist a smallest positive integer r and real numbers ψ_i (i = 0, ..., r - 1) such that

γ_n = ψ_{r-1} γ_{n-1} + ... + ψ_0 γ_{n-r}  (n >= r),

or equivalently,

α_n^2 = ψ_{r-1} + ψ_{r-2}/α_{n-1}^2 + ... + ψ_0/(α_{n-1}^2 ... α_{n-r+1}^2)  (n >= r - 1),

we call r the rank of α. Among unilateral weighted shift operators, the recursively generated weighted shifts play an important role in the study of hyponormal operators (cf. [5]). We recall here the unilateral weighted shift generated by only three weights 0 < α_0 < α_1 < α_2 (see [6, p. 370] for the general case). Given α: α_0, α_1, α_2 with 0 < α_0 < α_1 < α_2.

§J · §J · §J · (4) D has an l -hyponormal completion;
Let v0 : ¨ 0 ¸ , v1 : ¨ 1 ¸ , v2 : ¨ 2 ¸ . Then the (5) A( k ) t 0 , B (l  1) t 0 , and
© J1 ¹ ©J 2 ¹ ©J3 ¹ v(k  1, k )  Ran ( A(k )) if m is even,
2
vectors v0 and v1 are linearly independent in ¡ , so v(k  1, k  1)  Ran ( B (l  1)) if m is odd,
there exists a unique real numbers < 0 , <1 such that where
< 0 v0  <1v1 v2 . (1.1) §J0 L Jj · § J1 L J j 1 · § Ji ·
¨ ¸ ¨ ¸ ¨ ¸
A( j ) : ¨ M O M¸ , B ( j ) : ¨ M O M ¸ , v i, j : ¨ M ¸.
In fact,
¨J L J 2 j ¸¹ ¨J J 2 j 1 ¸¹ ¨J ¸
D 02D12 D 22  D12 D12 D 22  D 02 . © j © j 1 L © i j ¹
<0  2 2
, <1
D D D12  D 02
1 0
Let ξ2 : u
and let l 2 ¢  be the Hilbert
Moreover, the equality (1.1) is < 0J 0  <1J 1 J 2 and  

space of square summable complex sequences indexed by


< 0J 1  <1J 2 J 3 . Let Jˆi : J i , i 0,1 , and let
¢ 2 . For D D1 , D 2  l 2 ¢  , let WD { WD , WD be the
Jˆn : < 0J n  2  <1J n 1 , , n t 2 . associated 2-variable weighted shift, acting on l ¢  as
1 2

Since Jˆn ! 0 n  ¢  (see [5, Proof of Th. 3.5]), we


follows WD ek : D i k ek H
i i
k  ¢ 2
 , i 1, 2 , where
define Dˆ n : (Jˆn 1 / Jˆn ) 1/ 2
n ¢  (so that Dˆ n Dn
D i k ! 0 for all k  ¢  , i
2
1, 2 and ^ek `k¢ 2 is the
for 0 d n d 2 ). Hence we obtain a bounded sequences 

Dˆ : ^Dˆi `i
f
and the weighted shift operator WDˆ (or
canonical orthonormal basis for l 2 ¢  , H1 1, 0 and
0

written as W(D ^ ).
H2 0,1 . Assume that WD is commuting,
0 ,D1 ,D 2 )

J. Stampfli posed the following


i.e., D i k  H j D j k D j k  H i D i k for all k  ¢ 2 ,
i, j 1, 2 .
Subnormal Completion Problem ([8]). Given m t 0
and an initial segment of positive weights D : D 0 ,K , D m , Lemma 1.1 ([4, Th. 6.1]). WD is hyponormal if and only
seeks necessary and sufficient conditions for the existence
if D1 k  H1 t D1 k , D 2 k  H 2 t D 2 k , and
of a subnormal weighted shift WDˆ whose first m  1
weights are D 0 ,K , D m . D k  H  D k D k  H  D k
2
1 1
2
1
2
2 2
2
2
2
t D k  H D k  H  D k D k .
1 2
2
2 1 1 2

J. Stampfli solved the problem for m 2 . That is,


W(D ,D ,D )^ is subnormal if and only if 0  D 0  D1  D 2 We now define
0 1 2
­1 if k (0, 0),
([8]). °D 2 (0, 0)L D 2 k  1, 0
R. Curto and L. Fialkow showed the following result °° 1 1 1 if k1 t 1, k2 0,
which solves the subnormal completion problem. J%k : ®D 2 (0, 0)L D 2 0, k2  1
2 2
if k1 0, k2 t 1,
°D 2 (0, 0)L D 2 k  1, 0 if k1 , k2 t 1.
Curto-Fialkow’s Subnormal Completion Criterion ([5, ° 12 1 1

°̄˜D 2 (k1 , 0)L D 2 k1 , k2  1


2

Th. 3.5]). Let D : D 0 ,L , D m m t 0 be an initial segment Generalized Berger Theorem ([7]). WD is subnormal if
of positive weights and let k : ª
m  1º ªmº . and only if there exists a compactly supported positive
«¬ 2 »¼ , l : «¬ 2 »¼  1 Borel measure P on ¡ 2 such that
The following statements are equivalent:
(1) WDˆ is a subnormal completion of D; ³ t d P t : ³ t
k
1 2 t d P t1 , t2 J%
k1 k2
k k  ¢ . 2


(2) D has a recursive subnormal completion; Two Variable Subnormal Completion Problem ([7]).
(3) D has a subnormal completion; Given m t 0 and a finite collection of pairs of positive


numbers C ^D k D (k ), D
1 2 (k ) ` k d m k : k1  k2 . Find If m is odd, i.e., 2 p  1 , then k l p .
m
necessary and sufficient conditions to guarantee the Since det A( j ) ! 0 , for j 1, 2,L , k  1 , and
existence of a subnormal two variable weighted shift whose det A(k ) 0 , we know that A(k ) is positive. And
initial weights are given by C .
since det B ( j ) ! 0 , for j 1, 2,L , l  1 , we know that
In this article, we first reconsider Curto-Fialkow’s B (l  1) is positive and invertible. Thus we have
subnormal completion criterion, and discuss two variable v k  1, k  1  Ran B(l  1) . Hence, by Curto-Fialkow’s
subnormal completion problem by using the solution of
truncated complex moment problem. All of the calculations Criterion, we have (1). ƶ
in this article, we used the computer software program
Mathematica ([9]). Theorem 2.2. Let D : D 0 ,L , D m be positive numbers with
m  1º ªmº
2. Reconsidering Curto-Fialkow’s Subnormal D 0 d D1 d L d D m and let k : ª« » , l : « » 1.
Completion Criterion ¬ 2 ¼ ¬2¼
Assume that rank D l  1 . The following statements are
Theorem 2.1. Let D : D 0 ,L , D m be positive numbers with equivalent:
(1) D has a subnormal completion;
D 0 d D1 d L d D m and let ª m  1º ªmº
k: « » ,l : « » 1 . (2) D has an l -hyponormal completion;
¬ 2 ¼ ¬2¼
(3) det A( j ) ! 0 , for j 1, 2,L , k and
Assume that rank D l . The following statements are
equivalent: det B ( j ) ! 0 , for j 1, 2,L , l  1 .
(1) D has a subnormal completion; Proof. (1) œ (2): Curto-Fialkow’s Criterion.
(2) D has an l -hyponormal completion; (2) œ (3): Since rank D l  1 , we know that
(3) det A( j ) ! 0 , for j 1, 2,L , k  1 , det B( j ) ! 0 , det A( j ) ! 0 , det B( j ) ! 0 for j 1, 2,L , l , and
for j 1, 2,L , l  1 , and det A(k ) ! 0 if m is even, det A( j ) 0 , det B ( j ) 0 for j t l  1 . Thus A( k )
det A(k ) 0 if m is odd. and B(l  1) are positive and invertible. Hence we have the
Proof. (1) Ÿ (2): Trivial. result. ƶ
(2) Ÿ (3): If (2) holds, then by Curto-Fialkow’s
criterion, A(k ) t 0 , B(l  1) t 0 , and By Theorem 2.1 and Theorem 2.2, we have the
following corollaries.
v k  1, k  Ran A(k ) if m is even,
v k  1, k  1  Ran B(l  1) if m is odd. Since D : D 0 , D1 , D 2 be distinct positive
Corollary 2.3. Let
rank D l , det A( j ) ! 0 , det B( j ) ! 0 for numbers with rank D 2 . Then W(D ,D ,D )^ is
0 1 2
j 1, 2,L , l  1 , and det A( j ) 0 , det B( j ) 0 for subnormal if and only if D 0  D1  D 2 .
j t l . If m is even, i.e., m 2 p , then
k l  1 p . Thus A(k ) A( p) A(l  1) . So we Corollary 2.4. Let D : D 0 , D1 , D 2 , D 3 be distinct positive
have det A(k ) ! 0 . If m is odd, i.e., m 2 p  1 , numbers with D 0  D1  D 2  D 3 .
then k l p . Thus A( k ) A( p ) A(l ) . So we have (1) Let rank D 2 . Then W(D is
^
0 ,D1 ,D 2 ,D 3 )
det A(k ) 0 .
subnormal if and only if
(3) Ÿ (1): If m 2 p , then
is even, m 2
D12 D 22  D 02 D 2
1  D 02 D 22D 32  D 02D12 .
k l  1 p . Since det A( j ) ! 0 , for j 1, 2,L , k , we
know that A(k ) is positive and invertible. Thus we have,
(2) Let rank D 3 . Then W(D ^ is
0 ,D1 ,D 2 ,D 3 )

subnormal if and only if


v k  1, k  Ran A( k ) . And since, det B ( j ) ! 0 , for 2
D12 D 22  D 02  D12  D 02 D 22D 32  D 02D12 .
j 1, 2,L , l  1 , we know that B (l  1) is positive.


§ J 0,0 J 0,1 J 1,0 ·
¨
¸
Corollary 2.5. Let D : D 0 , D1 , D 2 , D 3 , D 4 be distinct M (1) ¨ J 1,0 J 1,1 J 2,0 ¸ t 0.
¨ ¸
positive numbers with D 0  D1  D 2  D 3  D 4 . Let ¨ J 0,1 J 0,2 J 1,1 ¸
© ¹
rank D 3 . Then W(D ^ is subnormal if and Case 1. rank M (1) 1 .
0 ,D1 ,D 2 ,D 3 ,D 4 )

2 2 Proposition 3.3 ([7, Th. 6.1]). If rank M (1) 1 , then the


only if D12 D 22  D
 D 0
2
1  D 02 D 22D 32  D 02D12 and
representing measure is P J (0,0)G J / J .
2 2
D 22 D 32  D  D  D D D  D12D 22 .
( 0,1) ( 0,0)
2 2 2 2
1 2 1 3 4

In this case, C ^( a , b ), ( a , b ), ( a , b ) . `
3. Two Variable Subnormal Completion Problem Hence

We discuss the problem with m 1. Theorem 3.4. C admits a subnormal completion and the
associated measure is. P t P (t1,t2 ) G ( a ,b ) .
Proposition 3.1. Given C ^ a, b ,
c, d , e, f ` , Proof. In fact, J%
1,1 : ab
1 1
³ t t d P.
1 2
where a, b, c, d , e, f are positive numbers with bc af . By similar checking, we have our conclusion. ƶ
The following statements are equivalent.
(1) C has a hyponormal completion; Case 2. rank M (1) 2 .
(2) C has a subnormal completion; In this case, there exist D , E  £ such that
2
(3) a d e, b d d , b a  c d a d  b e  a . J 1,0 DJ 0,0  EJ 0,1 , J 2,0 DJ 1,0  EJ 1,1 , J 1,1 DJ 0,1  EJ 0,2 .
Proof. By Lemma 1.1. ƶ §J J  J J ·
Since § D · 1 ¨ 1,1 1,0 2,0 0,1 ¸ , for each choice of
¨ ¸
© E ¹ G ©¨ J 0,0 J 2,0  J 1,0 ¹¸
2

In this case, we have J%


0,0 1 , J%
0,1 b
, J%
1,0 a,
z0 on the line z D Ez with z0 z J 0,1 / J 0,0 , we
J%
0,2 bd , J%
1,1 af
, J%
2,0 ae . Now we can obtain
J 0,2  z0J 0,1
J 0,0 : 1 , J (0,1) : a  bi , J (0,2) : ae  bd  2iaf and define z1 . Then the densities are
J 0,1  z0J 0,0
J 1,1 : ae  bd . Using ^J (i , j ) ` as a data, assume that
0di  j d 2
J 0,0 J 0,2  J 2 0,1 (J 0,1  z0J 0,0 )
2

a compacted representing measure Q has been found, i.e., U0 : , U1 : .


J 0,2  2J 0.1 z0  J 0,0 z0 2 J 0,2  2J 0.1 z0  J 0,0 z0 2
i j
³ z z dQ ( z, z ) J i , j , (0 d i  j d 2) .
Thus we have the following
Let d P (t1 , t2 ) : dQ (t1  it2 , t1  it2 ) . Then we have the
following Theorem 3.5. Given C ^ a, b , c, d , e, f ` ,
Proposition 3.2. P is a compactly supported positive where a, b, c, d , e, f are positive numbers with bc af .
Borel measure on ¡ 2
which interpolates J%
. If a d e, b d d and b(a  c) 2 a (d  b)(e  a ), then

Proof. In fact, C has a subnormal completion. Furthermore, the
1 associated measure is P : U 0G (Re z0 ,Im z0 )  U1G (Re z1 ,Im z1 ) .
0 2
³ t t d P(t t )
1 2 1, 2  (J (0,2)  2J (1,1)  J (2,0) ) bd J%
0,2 .
4
By similar checking, we have our conclusion. ƶ Example 3.6. Let a 1, b 1, d 2, e 2 . We choose
c f 2 . Then all conditions of Theorem3.5 are satisfied.
The solution of the quadratic moment problem ([7, Th.
6.1]) implies that C has a subnormal completion if and Thus C ^ 1,1 , 2, 2 , 2, 2 ` admits a subnormal
only if completion. In this case,


§ 1 1 i 1 i · And det M (1) 36 . Then the atoms of P ( y ) are the 3
¨ ¸
M (1) ¨ 1  i 4 4 ¸ t 0. distinct roots of
¨1  i 4i 4i ¸ 9 z 3  15(1  i ) z 2  4(7 19  4i ) z  32(1  i )(2 19  i ) 0 .
© ¹
By computer calculation, we can obtain the following
And rank M (1) 2 , D 0, E i. Take z0 1  i / 2 . atoms and densities
Then z1 3(1  i ) . U 0 4 / 5, U1 1/ 5 . Hence the z0 | 1.50812  5.43922i, U0 | 0.0591804,
associated measure is P U0G 1/ 2,1/ 2  U1G (3,3) . z1 | 2.15249  2.69381i, U1 | 0.189787,
z2 | 1.99394  1.07875i, U 2 | 0.751104.
Case 3. rank M (1) 3. Thus the associate measure is
P : U0G (Re z ,Im z )  U1G (Re z ,Im z )  U 2G (Re z ,Im z ) .
0 0 1 1 2 2

Theorem 3.7. Given C ^ a, b , c, d , e, f ` ,


where a, b, c, d , e, f are positive numbers with bc af . Acknowledgements
2
If a d e, b d d and b(a  c) a (d  b)(e  a), then
This paper is supported by science foundations from
C has a subnormal completion. the ministry of education of Jiangsu Province (No:
04KJD110168, 06KJB110107˅.
Taking y such that
2
2 Re((J 0,1 J 2.0  J 1.1 J 1.0 )J 2,0 y )  (J 1.1  J 0,1 ) 2 y
2
(J 2 1.1  J 0,2 ) 2 . References
Let
2 [1] J. Bram, “Subnormal operators”, Duke Math. J., Vol.
E1 : (J 2 1,1  J 0,2 ) 2 J 0,2  (J 0,1 J 2,0  J 1,1 J 1,0 ) y; 22, pp. 75-94, 1955.
E 2 : (J 2,0 J 0,1  J 1,0 J 1,1 )J 0,2  (J 2 1,0  J 2.0 ) y; [2] J. B. Conway, Subnormal operators, Pitman Publ. Co.
London, 1981.
2
E3 : (J 1,0 J 0,2  J 0,1 J 1,1 )J 0,2  (J 1,1  J 0,1 ) y; [3] R. Curto, “Quadratically hyponormal weighted shifts”,
2
Integral Equations Operator Theory, Vol. 13, pp.
E 4 : (J 2 1,1  J 0,2 )J 1,1 ; 49-66, 1990.
E5 : (J 2,0 J 0,1  J 1,0 J 1,1 )J 1,1 ; [4] R. Curto, “Joint hyponormality: A bridge between
hyponormality and subnormality”, Proc. Sympos. Pure
E 6 : (J 1,0 J 0,2  J 0,1 J 1,1 )J 1,1 .
Math. , Vol. 51, Part Ċ, pp. 69-91, 1990.
Then the atoms of P ( y) are the 3 distinct roots of [5] R. Curto and L. Fialkow, “Recursively generated
(det M (1)) 2 z 3 E3 E 4  E1E 6  ((det M (1)) E1  E 3 E5  E 2 E 6 ) z weighted shifts and the subnormal completion
problem, ĉ”, Integral Equations Operator Theory,
(det M (1))( E 2  E 6 ) z 2 .
Vol. 17, pp. 202-246, 1993.
[6] R. Curto and L. Fialkow, “Recursively generated
Example 3.8. If a b c f 1, d e 4 , then all weighted shifts and the subnormal completion
conditions of Theorem 3.7 are satisfied. Thus problem, Ċ”, Integral Equations Operator Theory,
C ^ 1,1 , 1,2 , 2,1 ` admits a subnormal completion. In Vol. 18, pp. 369-426, 1994.
this case, [7] R. Curto and L. Fialkow, “Solution of the truncated
J 0,0 1, J 0,1 1  i, J 0,2 2i, J 1,1 8. complex moment problems for flat data”, Mem. Amer.
Math. Soc., Vol. 568, pp. 603-635, 1996.
2
Taking y such that 4 Re((1  i ) y )  y 600 . So we [8] J. Stampfli, “Which weighted shifts are subnormal”,
Pacific J. Math. , Vol. 17, pp. 367-379, 1966.
let y 2(1  i )  4 19(1  i ) . Then [9] Wolfram Research, Inc. Mathematica, Version 3.0,
E1 48( 19  2i ), E 2 12(1  i ), E 3 24 19(1  i ), Wolfram Research Inc., Champaign, IL, 1996.
E 4 480, E5 48(1  i ), E 6 48(1  i ).


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 724--727
Copyright@2007 Watam Press

The Left-Groebner bases in Ring of Differential Operators


Jinwang Liu Xiaoling Fu Dongmei Li

College of Mathematics and Computation, Hunan Science and Technology University, Xiangtan, Hunan, China, 411201,
E-mail: jwliu64@yahoo.com.cn

AMS subject classifications: 68Q40

Abstract: The ring of differential operators is an al- We define two partial orders <, <+ on Z2n ≥0 : α <
gebraic structure that is composed of differential opera- β ⇔ the first non-zero component of (β − α) is posi-
tors. A classical result of Stafford says that every (left) tion, α <+ β ⇔ β = α + δ,where 0 = δ ∈ Z2n ≥0 . We
ideal of the n-th ring of differential operators An (k) can define a term order < on An (K) : Y α < Y β ⇔ α < β.
be generated by two element. In this article , we dis- Obviously, x1 > · · · > xn > ∂1 > · · · >  ∂n .
cuss mainly propositions and computations of Gröbner For 0 = f ∈ An (K), one has, f = m i=1 ci Y
δi
,
2n
bases of left ideal generated by two element in An (k), 0 = ci ∈ K, δi ∈ Z≥0 , i = 1, 2, · · · , n.
and obtain some interesting results. The leading term degree of f is max{δ1 , δ2 , · · · , δn },
denoted by le(f ); if le(f ) = δj , cj is called the leading
coefficient of f , denoted by lc(f ); the leading term of f
1 Preliminaries is Y δj , denoted by lp(f ); the leading monomial of f is
lc(f ) · Y δj , denoted by lt(f ).
We usually apply differential equations to study Let A = {f1 , f2 , · · · , fs }(fi = 0) be a subset of
neural networks. Ring of differential operators is an An (K),I be a left-ideal in An (K) generated by A, de-
algebraic structure that is composed of differential op- note I = L(f1 , f2 , · · · , fs ). For any f ∈ I, we have a
erators, the structure is complexity. We shall apply finite sum
Gröbner bases in symbolic computation to study gen- 
erators of ideal in ring of differential operators. We see f= hi fi , hi ∈ An (K), fi ∈ A (1)
that every left ideal in ring of differential operators is
generated by two element, we mainly study Gröbner we call (1) a representation of f by A.
bases (well generators) of left ideals generated by two If le(hi fi ) ≤ le(f ), ∀i, we call (1) a canonical rep-
element. This shall be in the interest of analysis and resentation of f by A.
computations in differential equations. Definition 2.[6] We call A a left-Gröbner basis
of left ideal I if each of non-zero element of I has a
Let K be a field of characteristic zero and K[x] =
canonical representation by A.
K[x1 , · · · , xn ] the polynomial ring in n variable with
A left-Gröbner basis denotes simply by a L-G-B.
coefficients in K.On K[x] we have the usual k-linear
We define S-polynomials:
derivations ∂/∂x1 , · · · , ∂/∂xn . The K-linear map
∂/∂xi which maps a polynomial P into ∂P/∂xi is called 1 εij −αi 1
the partial differentiation with respect to xi . S(fi , fj ) = Y · fi − Y εij −αj · fj , ∀fi , fj ∈ A.
vi vj
We use the notation ∂i = ∂/∂xi so that ∂i (p) =
∂p/∂xi . Here αi = le(fi ), vi = lc(fi ), εij = sup{αi , αj , <+ }, i =
Definition 1. The ring of K-linear operators on 1, 2, · · · , s.
K[x] which is generated by the derivations ∂1 , · · · , ∂n Example. If αi = (1, 0, 3, 2), αj = (2, 1, 1, 2), then
and the multiplication operators defined by the polyno- sup{αi , αj , <+ } = (2, 1, 3, 2).
mials in K[x], is called the ring of K-linear differential From [6], we know that each non-zero left-ideal of
operators on K[x]. An (K) has a left-Gröbner basis A, and I = L(A), and
We denote this ring by An (K) = K < x1 , · · · , xn , A = {f1 , f2 , · · · , fs } is a L-G-B of L(f1 , f2 , · · · , fs ) ⇔
A
∂1 , · · · , ∂n > and An (K) is also called ring of differen- S(fi , fj ) →+ 0, i, j = 1, 2, · · · , s; A is a L-G-B of
A
tial operators (weyl algebra) in n variables with coeffi- I⇔f →+ 0, ∀f ∈ I.
cients in K. The theory and computation of Gröbner bases in
The xu denotes a monomial xu1 1 · · · xunn in K[x] An (K) are complex, we are interested in what are L-
and similarly ∂ v denotes a monomial ∂1v1 · · · ∂nvn , here Gröbner bases of some left ideals, and how are charac-
u, v ∈ Zn ≥0 . Simply, x · ∂ denotes by Y , α = (u, v).
u v α
teristic of left-Gröbner bases.
From [3], we know some following results: xi · xj = Let α = (a1 , · · · , a2n ), β = (b1 , · · · , b2n ) ∈ Z2n
≥0 ,
xj · xi , ∂i · ∂j = ∂j · ∂i , ∂i · xi = xi · ∂i + 1, ∂i · P = we say α and β are relatively prime if ai bi = 0, i =
P · ∂i + ∂i (P ); when i = j, ∂i xj = xj ∂i . Each element in 1, 2, · · · , n. We say Y α and Y β are relatively prime if
An (K) can be written in a unique way as a finite sum
 α and β are relatively prime. We say that h is a divisor
kα Y α where the coefficients kα ∈ K. of f , if f = q · h, here f, h, q ∈ An (K); denoted by h | f ;


say h is a common divisor f and g if h | f and h | g. Proof. since S(f h, gh) = S(f, g) · h, and
If for any common divisor h of f and g, h must be in
{f h,gh} {f,g}
K and h = 0, we will say that f and g are relatively S(f h, gh) −→ + 0 ⇐⇒ S(f, g) −→ + 0.
prime.Obviously, if lt(f ) and lt(g) are relatively prime,
then f and g are relatively prime too. We say that lp(f ) Thus, we can obtain the results.
is a multiple of lp(g) if le(lp(g)) ≤+ le(lp(f )). Let 0 = f (xi ) ∈ K[xi ],and Ani (K) = K < xi , ∂i >
, Li (∂i , f (xi )) is a left-ideal in Ani (K) generated by
{xi , ∂i }.
2 Main Results Proposition 3. Li (∂i , f (xi )) = Ani (K), L(∂i , f (xi )) =
An (K). That is.a L-G-B of L(∂i , f (xi )) is {1}.
Two classical results of Stafford said that that ev-
Proof. Let m = degf (x)
ery (left)ideal of the n-th ring of differential operator
When m = 0, then 0 = f (xi ) = c ∈ K, so
weyl algebra An (k) can be generated by two element,
Li (∂i , f (xi )) = L(∂i , c) = Ani (K).
Anton modifies Stafford’s original proofs to make the
Assume, when m = k > 0, the results are right.
algorithmic computation of there generators possible.
When m = k + 1.
Theory and computation of Gröbner bases in ring of
Since ∂i f (xi ), f (xi )∂i ∈ Li (∂i , f (xi )), and
differential operators, our interest lies especially in the-
∂i f (xi ) = f (xi )∂i + ∂i (f (xi )).
ory and computation of Gröbner bases of left ideal gen-
So ∂i f (xi ) − f (xi )∂i = ∂i (f (xi )) ∈ Li (∂i , f (xi )),
erated by two elements, we shall get some new results.
by assume, so Li (∂i , ∂i (f (xi ))) = Ani (K). And
Proposition 1. Let f, g ∈ An (K),lp(f ), and
Li (∂i , ∂i (f (xi ))) ⊆ Li (∂i , f (xi )).Thus,Li (∂i , f (xi )) =
lp(g) be relatively prime. If f g = gf , then {f, g} is
Ani (K).
a L-G-B.
Furtherly, L(∂i , f (xi )) = An (K),a L-G-B of
Proof. We write f = aY α + f  , g = bY β + g  ,
L(∂i , f (xi )) is {1}.
where lt(f ) = aY α , lt(g) = bY β , a, b ∈ K. Then
Proposition 4. For any positive integer p, we
Y α = a1 (f − f  ),and Y β = 1b (g − g  ).
have
Case 1. f  = g  . Since f g = gf , so S(f, g) = 0.
∂ip ·f (xi ) = a1 f (xi )·∂ip +a2 ∂i (f (xi ))·∂ip−1 +a3 ∂i2 (f (xi ))·
{f, g} is a Gröbner basis.
∂ip−2 + · · · + ap+1 ∂ip (f (xi )) and a1 , a2 , · · · , ap , ap+1 > 0.
Case 2. f  = 0 and g  = 0.S(f, g) = a1 Y β f −
f Proof. When p = 1, ∂i · f (xi ) = f (xi ) · ∂i +
1 α
b
Y g = ab 1
(g − g  )f − ab1
f g = − ab 1 
g f , so S(f, g) →+ ∂i (f (xi )), the results is right.
0, {f, g} is a L-G-B. Assume, when p = k, we have
Case 3. f  = 0 and g  = 0.This is the same as ∂ik ·f (xi ) = b1 f (xi )·∂ik +b2 ∂i (f (xi ))·∂ik−1 +b3 ∂i2 (f (xi ))·
case 2. ∂ik−2 + · · · + bk+1 ∂ik (f (xi )) and b1 , b2 , · · · , bk , bk+1 > 0.
Case 4. f  = 0, and g  = 0. we have Then
S(f, g) = a1 Y β f − 1b Y α g = ab 1
(g − g  )f − ab 1
(f − f  )g ∂ik+1 f (xi ) = ∂i · ∂ik · f (xi ) = ∂ik · f (xi ) · ∂i + ∂i (∂ik · f (xi ))
1      
= ab (f g − g f ). Let α = le(f ), β = le(g ), we have = [b1 f (xi ) · ∂ik + b2 ∂i (f (xi )) · ∂ik−1 + b3 ∂i2 (f (xi )) · ∂ik−2 +
that α < α, β  < β,and α, β are relatively prime. If · · · + bk+1 ∂ik (f (xi ))] · ∂i + ∂i (∂ik · f (xi ))
lp(f  g) = lp(g  f ), then α + β = β  + α, α − α = β − β  . = b1 f (xi ) · ∂ik+1 + b2 ∂i (f (xi )) · ∂ik + b3 ∂i2 (f (xi )) · ∂ik−1 +
This is a contradiction to above. Therefore lp(f  g) = · · · + bk+1 ∂ik (f (xi )) · ∂i
lp(g  f ), and the leading term of ab 1
(f  g −g  f )appears in +∂i (b1 f (xi )∂ik +b2 ∂i (f (xi ))·∂ik−1 +b3 ∂i2 (f (xi ))·∂ik−2 +
 
f g or g f . and hence is a multiple of lp(f ) or lp(g).If · · · + bk+1 ∂ik (f (xi )))
lp(f  g) > lp(g  f ),then = b1 f (xi ) · ∂ik+1 + b2 ∂i (f (xi )) · ∂ik + b3 ∂i2 (f (xi )) · ∂ik−1 +
1 · · · + bk+1 ∂ik (f (xi )) · ∂i
((f  − lt(f  ))g − g  f ).
g
S(f, g) → +b1 ∂i (f (xi ))∂ik +b2 ∂i2 (f (xi ))·∂ik−1 +· · ·+bk ∂ik−1 (f (xi ))·
ab
∂i + bk+1 ∂ik+1 (f (xi ))
If lp(f  g) < lp(g  f ),then = b1 f (xi ) · ∂ik+1 + (b1 + b2 )∂i (f (xi )) · ∂ik + (b2 +
1  b3 )∂i2 (f (xi ))∂ik−1 + · · · + bk+1 ∂ik+1 (f (xi )).
(f g − (g  − lt(g  ))f ).
f
S(f, g) → Obviously b1 , b1 + b2 , b2 + b3 , · · · , bk + bk+1 , bk+1 > 0.
ab
Therefore, we obtain the results.
Using an argument similar to the one above,we see the Since ∂i (f (xi )) = ∂f (xi )/∂xi , ∂i2 (f (xi )) =
leading term of ab1
((f  − lt(f  ))g − g  f ) or ab
1
(f  g − (g  − ∂ f (xi )/∂x2i ,we denote f  (xi ) = ∂i (f (xi )), f  (xi ) =
2

lt(g ))f ) is a multiple of lp(f ) or lp(g). Therefore this ∂i2 (f (xi )), · · · , f (k) (xi ) = ∂ik (f (xi )).
reduction process uses only f or g. At each stage of the Proposition 5. For any positive integer
reduction the remainder has a leading term which is a p, Li (∂ip , f (xi )) = Ani (K), L(∂ip , f (xi )) = An (K).
multiple of lp(f ) or lp(g). We can continue this process That is, a L-G-B of L(∂ip , f (xi )) is {1}.
{f,g}
until we obtain 0. that is. S(f, g) −→ + 0. Therefore Proof. Suppose that degf (xi ) = m.
{f, g} is a L-G-B. When p = 1, from proposition 3,the results is right.
Proposition 2. For 0 = h ∈ An (K), {f, g} is a Assume, when p = k, the results is right, that is,
L − G − B if and only if {f h, gh} is a L-G-B. Li (∂ik , f (xi )) = Ani (K).


When p = k + 1.From proposition 4, we can write L(Fi · f (xi ), Fi · ∂ip ) = L(Fi ), of course, {Fi } is a L-G-B
∂ik+1 ·f (xi ) = a11 f (xi )·∂ik+1 +a12 f  (xi )·∂ik +a13 f  (xi )· of L(Fi ).
∂ik−1 + · · · + a1k+2 f (k+1) (xi ), Corollary 2. {Fi } is a L-G-B of L(Fi ·xm p
i , Fi ·∂i ).
a11 , a12 , · · · , a1,k+2 > 0 When Fi ∈ An (K). we can also obtain following
results.
Since ∂ik+1 · f (xi ), f (xi )∂ik+1 ∈ Li (∂ik+1 , f (xi )), so Proposition 9. {Fi } is a L-G-B of L(f (xi ) ·
h1 = ∂ik+1 · f (xi ) − a11 f (xi )∂ik+1 = a12 f  (xi ) · Fi , ∂ip · Fi ).
∂ik + a13 f  (xi ) · ∂ik−1 + · · · + a1k+2 f (k+1) (xi ) ∈ Proof. Since 1 = g(xi , ∂i ) · f (xi ) + h(xi , ∂i ) · ∂ip .
Li (∂ik+1 , f (xi )) So Fi = 1 · Fi = g(xi , ∂i ) · f (xi ) · Fi + h(xi , ∂i ) · ∂ip · Fi ∈
∂i · h1 = a12 f  (xi )∂ik+1 + a13 f  (xi ) · ∂ik + · · · + L(f (xi ) · Fi , ∂ip · Fi )
a1k+2 f (k+1) (xi ) · ∂i Obviously, L(f (xi ) · Fi , ∂ip · Fi ) ⊆ L(Fi ).Thus,L(f (xi ) ·
+a12 f  (xi )∂ik +a13 f  (xi )·∂ik−1 +· · · +a1k+2 f (k+2) (xi ) Fi , ∂ip · Fi ) = L(Fi ), {Fi } is a L-G-B of L(f (xi ) · Fi , ∂ip ·
= a21 f  (xi )∂ik+1 +a22 f  (xi )·∂ik +· · ·+a2k+2 f (k+2) (xi ). Fi ).
Here a21 = a12 , a22 = a12 + a13 , · · · , a2k+1 = a1k + Example.
a1k+1 , a2k+2 = a1k+1 > 0. 1. L(x21 , ∂15 ) = L(x72 + x42 , ∂25 ) = An (K)
h2 = ∂i · h1 − a21 f  (xi )∂ik+1 = a22 f  (xi ) · ∂ik + · · · 2. {x72 + x42 , x21 − 2∂3 } is a L-G-B of L(x72 + x42 , (x21 −
+ a2k+2 f (k+2) (xi ) ∈ Li (∂ik+1 , f (xi )). 2∂3 ) · ∂25 )
We can continue this process until we obtain that 3. {x21 − 2∂3 , ∂25 } is a L-G-B of L((x21 − 2∂3 ) · (x72 +
hm = ∂1 hm−1 − am1 f (m−1) (xi )∂ik+1 = am2 f (m) (xi ) · x2 ), ∂25 )
4

∂ik + · · · + amk+2 f (k+m) (xi ), 4. {x22 − 2∂3 } is a L-G-B of L((x72 + x42 ) · (x22 −
and hm ∈ Li (∂ik+1 , f (xi )), am2 > 0. Since degf (xi ) = 2∂3 ), ∂25 (x22 − 2∂3 ))
m,so 0 = f (m) (xi ) = c ∈ K, f (m+l) (xi ) = 0, ∀l > 5. Let F = ∂2 , {F } is a L−G−B of L(xi ·F, ∂2 ·F ),
0.Thus,hm = am2 · c · ∂ik ∈ Li (∂ik+1 , f (xi )), ∂ik ∈ but {F } is not a L-G-B of L(F ·x2 , F ·∂2 ) = L(∂2 ·x2 , ∂22 )
Li (∂ik+1 , f (xi )), Li (∂ik , f (xi )) ⊆ Li (∂ik+1 , f (xi )). But, Since ∂2 ∂2 x2 − x2 ∂22 = 2∂2 , 2∂2 x2 − x2 · 2∂2 = 2,
Li (∂ik , f (xi )) = Ani (K). Therefore Li (∂ik+1 , f (xi )) = so L(F · x2 , F · ∂2 ) = An (K), obviously, {F } is not a
Ani (K). so, L(∂ik+1 , f (xi )) = An (K), that is, a L-G-B L-G-B of An (K).
of L(∂ik+1 , f (xi )) is {1}. The computations of the L-G-B in An (K) are com-
In following, we let p, m be any positive integer. plex and interesting, many problems need further re-
Corollary 1. Li (∂ip , xm i ) = Ani (K), L(∂i , xi ) =
p m
search.
An (K). That is a L-G-B of L(∂ip , xm i ) is {1}.
Let f (xi ) is same as above, and 0 = Fi ∈ K <
x1 , · · · , xi−1 , xi+1 , · · · , xn , ∂1 , · · · , ∂i−1 , ∂i+1 , · · · , ∂n >. ∗
This research is supported by natural science fun-
We easily see that Fi · f (xi ) = f (xi ) · dation in Hunan(06jj2053) and item in the education
Fi ; lp(f (xi )) and lp(Fi ), lp(∂i ) and lp(Fi ) are Department of Hunan province(06A017).
relatively prime,respectively. From proposition
1,{f (xi ), Fi }, {∂ip , Fi } is a L-G-B, respectively.
Proposition 6. {f (xi ), Fi } is a L-G-B of References
L(f (xi ), Fi ∂ip ).
Proof. Since Li (f (xi ), ∂ip ) = Ani (K), then, [1] Adams,W.,Loustaunau,P., An Introduction to
there exist that g(xi , ∂i ), h(xi , ∂i ) ∈ Ani (K) such that Gröbner Bases, Graduate Studies in Mathematics
1 = g(xi , ∂i ) · f (xi ) + h(xi , ∂i )∂ip . So 3, Amer.Math.Soc., Providence, (1994).
Fi = Fi · 1 = Fi · g(xi , ∂i ) · f (xi ) + Fi · h(xi , ∂i ) · ∂ip = [2] Anton,L., Algotithmic proofs two theorems of
Fi · g(xi , ∂i ) · f (xi ) + h(xi , ∂i ) · Fi · ∂ip ∈ L(f (xi ), F · ∂ip ). stafford, J.Symb. Comput, 38, 1335-1550, (2004).
Obviously, L(f (xi ), Fi · ∂ip ) ⊆ L(f (xi ), Fi ). So,
L(f (xi ), Fi · ∂ip ) = L(f (xi ), Fi ). Therefore,{f (xi ), Fi } [3] BJ0̈RK,J-E., Rings of Differential Operators,
is a L-G-B of L(f (xi ), Fi · ∂ip ). North-Holland Publishing Company Amsterdam,
Similarly to the proof of proposition 6, we can ob- OXFORD, New York.
tain following results. [4] Jinwang liu, The Gröbner Bases in Wely-Algebras,
Proposition 7. {∂ip , Fi } is a L − G − B of Acta Mathematica Sinica, No384, (1995).
p
L(∂i , Fi · f (xi )).
[5] Jinwang liu, An Effective Judging Mathod for Ho-
Proposition 8. {Fi } is a L − G − B of L(Fi ·
momorphisms which Keep Gröbner Bases of Ide-
f (xi ), Fi · ∂ip ).
als, Acta Mathematica Sinica, No46(5), (2003).
Proof. Since 1 = g(xi , ∂i ) · f (xi ) + h(xi , ∂i ) · ∂ip .
So [6] Jinwang Liu, The Gröbner bases of Weyl algebra,
ACTA Mathematica Sinica, No 4, 38, (1995).
Fi = g(xi , ∂i )·Fi ·f (xi )+h(xi , ∂i )·Fi ·∂ip ∈ L(Fi ·f (xi ), Fi ·∂ip )
[7] Jinwang Liu, The term orderings which are com-
Obviously, L(Fi · f (xi ), Fi · ∂ip ) = L(f (xi ) · Fi , ∂ip · Fi ) ⊆ patible with composition II, J.Symb.Comput., 35,
L(Fi ). Thus, 153-168, (2003)..


[8] Jinwang Liu, Homogeneous Gröbner bases under proceedings of ISSAC 2001, New York: ACM
composition, to appear(accepted) in: Journal of press, 2001, 237-244.
Algebra.
[9] Mingsheng wang,Zhoujun Liu, Remarks on [10] Stafford, J., Module structure of weyl algebra,
Gröbner bases for ideal under composition , J.London Math.soc.,(2), 18(3), 429-442, (1978).


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 728--732
Copyright@2007 Watam Press

ON THE FUZZY RIEMANN-STIELTJES INTEGRAL


XUE-KUN REN, CONG-XIN WU, ZHI-GANG ZHU

Department of Mathematics, Harbin Institute of Technology, Harbin 150001, China


E-MAIL: renxuekun@hit.edu.cn

Abstract: In this paper, we define a kind of fuzzy Hence, the research of fuzzy Riemann-Stieltjes integral is
Riemann-Stieltjes integral of fuzzy-number-valued meaningful and may be find some applications in
functions; discuss the properties of the integral, present machine learning.
two necessary and sufficient conditions of integrability
and obtain an existence theorem. 2. Notations and preliminaries

1. Introduction We first recall some notions and notations on the


fuzzy number space ( E1 , D ) for the convenience of
The concept of fuzzy sets was first introduced by reader.
Zadeh [1], the integral for a fuzzy-number-valued Let E 1 = {u : 1 → [0,1] fulfills (i)-(iv) } , where (i) u
function was first defined by Dubios and Prade [2] in is normal, i.e. there exists t0 ∈ 1 , such that u (t0 ) = 1 ; (ii)
1982. Afterward in 1989, Nanda [3] presented a
definition about Riemann-Stieltjes integral of u is fuzzy convex; (iii) u is upper semi-continuous;
fuzzy-number-valued functions. In 2001 Hsein-Chung (iv) [u ]0 = {t ∈ 1 : u (t ) > 0} is a compact set. For the
Wu [4] has given a new definition about Riemann properties, representation theorems operations and
-Stieltjes integral of fuzzy mappings. Furthermore, ordering of fuzzy numbers refer to [10].
Chong Wu [5,6,7] defined the RSu integral of fuzzy We define D : E1 × E 1 → [0, ∞) as:
-number-valued functions and obtained some properties D (u , v) = sup r∈[0,1] d ([u ]r ,[v ]r )
of this type of integral.
But the expression of definition in [4] is too = sup r∈[0,1] max( u r− − vr− , u r+ − vr+ )
complex; and it not based on fuzzy mappings directly. where d (⋅, ⋅) is the ordinary Hausdorff metric. For the
Nanda [3] gave two definitions of Riemann-Stieltjes
properties refer to [10].
integrals of fuzzy-number-valued functions, i.e.
Definition 3.1 and 3.2 in [3] and proved that this two We say that f ( x ) is a fuzzy-number-valued
Definitions are equivalent, that is , the conclusion (a) of function if f :[ a , b] → E 1 . A fuzzy-number-valued
Theorem 4.2 in [3]. Unfortunately, this conclusion is not function f ( x ) is said to be bounded above on [a , b ] if
true (see [8] Remark 3.2); so we can’t use the conclusion
there exists a fuzzy number u ∈ E 1 , called an upper
(a) of Theorem 4.2 in [3] to consider the integral which is
defined by Definition 3.1 in [3] and to investigate this bound of f ( x ) , such that f ( x) ≤ u for all x ∈ [ a, b] ; u
integral by Definition 3.1 in [3] is more not convenient is called the supermum of f ( x ) on [a , b ] , denoted as
(see [8] Theorem 2.1). Therefore, in [5-7] the authors u = sup x∈[ a ,b ] f ( x) , if u is an upper bound of f ( x) and
used a different method to define the fuzzy
Riemann-Stieltjes type integral and got a series of u ≤ v for any upper bound v of f ( x) . The lower
properties. But the references [3] and [5-7] don’t discuss bound and infimum of f ( x ) on [a , b ] are defined
the necessary and sufficient conditions of integrability, so similarly. f ( x ) is said to be bounded on [a , b ] , if it is
they are uncompleted. Hence, in this paper we shall
both bounded above and bounded below [12]. And
define a kind of fuzzy Riemann-Stieltjes integral of
fuzzy-number-valued functions, and present two f :[ a, b ] → E 1 is continuous at x0 ∈ [ a , b] , if for any
necessary and sufficient conditions of integrability and ε > 0 , there exists δ (ε ) > 0 , when x − x0 < δ (ε ) we
obtain an existence theorem. Clearly, for the crisp case
have D ( f ( x ), f ( x0 )) < ε . If f ( x ) is continuous at all
this kind of fuzzy Riemann-Stieltjes integral is just the
ordinary Riemann -Stietjes integral. points of interval [a , b ] , then we say that f ( x ) is
It is well known that in classical case the continuous on [a , b ] .
expectation is defined by ordinary Riemann-Stieltjes Let f , g : [ a, b ] → 1 be real functions. For any
integral, so this kind of integral is more important in division T of [a, b] , T : a = x0 < x1 <  < xn = b and for
probability and statistics, and this kind of integral is also
very useful in machine learning theory, such as the any ξi ∈ [ xi −1 , xi ] (i = 1, 2,…, n) , we denote sT = sT ( f , g )
n
problem of minimizing the risk functional from empirical
= ∑ f (ξi )( g ( xi ) − g ( xi −1 )) . If lim|T |→0 sT = A ∈ 1 where
data, the problem of pattern recognition etc, we can find i =1
these applications in Vapnik’s famous monograph [9].


T = max 1≤i ≤ n ( xi − xi −1 ) , i.e. for any ε > 0 , there exists b
( f1 + f2 )dg = ∫ f1dg + ∫ f2 dg .
b b
exists, and ∫ a a a
δ (ε ) > 0 , we have | sT − A |< ε when T < δ (ε ) , then we
(iv) If ( f , g ) ∈ FRS [a , b] , ( h , g ) ∈ FRS [a , b] and
say that f ( x ) is Riemann-Stieltjes integrable (in short,
 ≥ b hdg
f ( t ) ≥ h (t ) for all t ∈ [a , b ] , then ∫  .
b

RS-integrable) with respect to g ( x ) on [a , b ] , denoted ∫a


fdg
a

b b
(v) If ( f , g ) ∈ FRS [a , b] and f ( t ) ≥ 0 for all t ∈ [a , b ] ,
as ( f , g ) ∈ RS [a , b] , and ∫ a
f ( x )dg ( x ) = ( RS ) ∫ fdg = A .
a b
 ≥ 0 .
And f ( x ) is RS-integrable with respect to g ( x ) on then ∫ a
fdg
[a , b ] if and only if for any ε > 0 , there exists δ (ε ) > 0 , (vi) If f ( t ) = u ∈ E 1 for all t ∈ [a , b ] , then
such that for any two divisions T and T ' , we have  = ( g (b) − g (a ))u .
( f , g ) ∈ FRS [a , b] and
b

| sT ( f , g ) − sT ' ( f , g ) |< ε when T < δ (ε ) and T ' < δ (ε ) . ∫ a


fdg

Lemma 2.1 (see [11] Ch2 Th4) Let ( D, ≥ ) be a directed Theorem 3.3 Let f :[ a, b ] → E 1 be a bounded function,
set, let ( Em , > m ) be a directed set for each m in D , let F g be an increase real function on [a , b ] . Then
be the product D × ⊗{Em : m ∈ D} , and for ( m, f ) in F let ( f , g ) ∈ FRS [a , b] if and only if ( fr− , g ) and ( fr+ , g ) are
R ( m, f ) = ( m, f ( m)) . If S ( m, n ) is an element of a topological uniformly RS-integrable for r ∈ [0,1] on [a , b ] , and
 ] = [ f dg , f dg ] .
b
[ ∫ fdg
b b
space for each m in D and each n in Em , then S  R ∫ ∫
r − +
a a r a r

converges to lim m lim n S ( m, n ) whenever this iterated limit Proof.


exists. b
 . From the definition of
(i) Necessity. Denote w = ∫a fdg
In above Lemma 2.1, the Cartesian product
⊗{Em : m ∈ D} is the set of all functions f on D such ( f , g ) ∈ FRS [a , b] we know that for any ε > 0 , there
that f m ( = f (m)) is an element of Em for each m in exists δ (ε ) > 0 , such that for any division T of [a , b ] ,
D . In product directed set (⊗{Em : m ∈ D}, ) , d  e if T : a = x0 < x1 <  < xn = b and for any ξi ∈ [ xi −1 , xi ] , we
and only if d m > m em for each m in D and in product have D ( sT , w) < ε when T < δ (ε ) , namely, for any
directed set ( D × ⊗{Em : m ∈ D}, ) , ( d , e)  ( f , g ) if and r ∈ [0,1] we have | ( sT )+r − wr+ |< ε and | (sT )r− − wr− |< ε .
only if d ≥ f and e  g . Lemma 2.1 is important Then we get | wr− − ∑ i =1 fr− (ξi )( g ( xi ) − g ( xi −1 )) |< ε , and
n

because it replaces an iterated limit by a single limit [11].


| wr+ − ∑ i =1 fr+ (ξi )( g ( xi ) − g ( xi −1 )) |< ε for any r ∈ [0,1] . So
n

3. The fuzzy Riemann-Stieltjes integral ( fr− , g ) , ( fr+ , g ) ∈ RS [a , b ] , further, wr− = ∫ fr− dg


b
and
a

let f :[ a , b] → E be a bounded function, w = ∫ f dg


b
+ +
Definition 3.1 1
r r are uniformly for all r ∈ [0,1] , and
a
g be an increase real function on [a , b ] and w ∈ E 1 . If for  ]r = [ f − dg , f + dg ] .
b b b
[ ∫ fdg ∫ r ∫ r
any ε > 0 there exists δ (ε ) > 0 such that for any division a a a

T of [a, b] , T : a = x0 < x1 <  < xn = b and for any (ii) Sufficiency.


n (a) We show that the class of closed intervals
ξi ∈ [ xi −1 , xi ] (i = 1, 2,… , n) , denoted sT = ∑ f (ξi )( g ( xi ) − {[∫ fr− dg , ∫ fr+ dg ] : r ∈ [0,1]} determines a fuzzy number.
b b

i =1 a a

g ( xi −1 )) , we have D ( sT , w) < ε when T < δ (ε ) In fact, since for any r ∈ [0,1] fr− ( x ) ≤ fr+ ( x ) on [a , b ] ,
( T = max1≤ i ≤ n xi − xi −1 ). Then we say that w is the fuzzy b
fr− dg ≤ ∫ fr+ dg , then [ ∫ fr− dg , ∫ fr+ dg ] is a
b b b
we have ∫ a a a a
Riemann-Stieltjes integral of ( f , g ) and denote it by nonempty bounded closed interval. If 0 ≤ r ≤ t ≤ 1 , since
b

w = ∫ fdg or ( f , g ) ∈ FRS [a , b] . fr− ( x ) ≤ ft − ( x ) and ft + ( x ) ≤ fr+ ( x ) , we obtain
a

fr− dg ≤ ∫ ft − dg , ft + dg ≤ ∫ fr+ dg , then


b b b b
By properties of D : E1 × E1 → [0, ∞) we can easily ∫ a a ∫ a a
obtain the follow theorem.
ft − dg , ∫ ft + dg ] ⊂ [ ∫ fr− dg , ∫ fr+ dg ] .
b b b b

Theorem 3.2 Let f :[ a , b] → E 1 be a bounded function, [∫ (1)


a a a a

g be an increase real function on [a, b] . Now we show that for positive numbers rn  r ' ∈ (0,1] ,
b
 frn− dg , ∫ frn+ dg ] = [ ∫ fr−' dg , ∫ fr+' dg ] .

∞ b b b b
(i) If a
fdg exists and c is a positive constant, then we have ∩ [∫ n =1 a a a a

 (cg ) , b (cf )dg exist and b fd


∫  (cg ) = ∫ (cf )dg = c ∫ fdg
 .
b b b
∫ ∫ frn− dg , ∫ frn+ dg ] ⊃ [∫ fr−' dg , ∫ fr+' dg ]
b b b b

a
fd
a a a a
From Eq.(1),we have [ ∫a a a a
b
 b
 b
 (g + g )
∫ ∫ ∫ ∩ n=1[ ∫ frn− dg , ∫ frn+dg ] ⊃ [∫ fr−' dg , ∫ fr+' dg ] .

b b b b
(ii) If a
fdg1 and a
fdg 2 exist, then a
fd 1 2 , so
a a a a

 ( g + g ) = b fdg
∫  1 + ∫ fdg
 .
b b
∫ ∩ n =1[ ∫ frn− dg , ∫ frn+ dg ] ⊂ [ ∫ fr−' dg ,
∞ b b b
exists and fd 1 2 2 Then we prove that
a a a a a a

f1dg f2 dg ( f1 + f2 ) dg


b b b
∫ ∫ ∫ fr+' dg ]
b
(iii) If a
and a
exist, then a ∫ a
. By assumption, for the divisions
Tk : a = x0 < x1 <  < x2k = b ( xi = a + i ( b − a ) 2 , i = 0,1,, 2k ) k


and ξi ∈ [ xi −1 , xi ] ( i = 0,1,,2 k ) (without loss of generality, b
fr−' dg − 2ε ≤ y ≤ ∫ fr+' dg + 2ε . Since ε
b
we obtain ∫ a a
is
we may assume that all xi and ξ i are independent of
arbitrary, we have y ∈ [ ∫ fr−' dg ∫ fr+' dg ] . So
b b

n and k ), we have a a

∩ n=1[ ∫ frn− dg , ∫ frn+dg ] ⊂ [∫ fr−' dg , ∫ fr+' dg ] .


k
∞ b b b b
lim k → ∞ ∑ i =1 frn− (ξi )( g ( xi ) − g ( xi −1 )) = ∫ frn− dg
2 b

a
(2) a a a a

and From the above we have


∩ [∫ frn− dg , ∫ frn+ dg ] = [ ∫ fr−' dg , ∫ fr+' dg ] .
k ∞ b b b b

lim k → ∞ ∑ i =1 frn+ (ξi )( g ( xi ) − g ( xi −1 )) = ∫ frn+ dg


2 b

a
(3) n =1 a a a a

uniformly for all n ∈


, namely, for any ε > 0 , there Also from a representation theorem of fuzzy number we
know that {[ ∫a fr− dg , ∫a fr+ dg ] : r ∈ [0,1]}
b b
exists N1 (ε ) > 0 , when k > N 1 (ε ) we have determines a

k
− frn− dg |< ε fuzzy number, denoted as , and
⎪| ∑ i =1 f rn (ξi )( g ( xi ) − g ( xi −1 )) − ∫a
b
2 w
( n = 1, 2,) [∫
b
f − dg , b f + dg ] = [ w− , w+ ] for any r ∈ [0,1] .
⎨ k
⎪| ∑ 2 f + (ξ )( g ( x ) − g ( x )) −
r ∫ r r r
frn+ dg |< ε
b

∫a
a a
⎩ i =1 rn i i i −1
 . By
(b) We prove that ( f , g ) ∈ FRS[a, b] and w = ∫ fdg
b

i.e. a

fr− dg and fr+ dg exist uniformly for


b b
∫ ∫
k

∑ frn− (ξi )( g ( xi ) − g ( xi −1 )) −ε < ∫ frn− dg


b
2
assumption, a a
i =1 a
k
(4) r ∈ [0,1] , so for any ε > 0 , there exists δ (ε ) > 0 such
< ∫ frn+ dg < ∑ i =1 frn+ (ξi )( g ( xi ) − g ( xi −1 )) +ε
b 2

a that for any division T of [ a , b] ,


T : a = x0 < x1 <  < xn = b ξi ∈ [ xi −1 , xi ]
k

∑ f (ξi )( g ( xi ) − g ( xi −1 )) ∈ E1 is clear, and for any


2
( n = 1,2,) . Since i =1

by a representation theorem of fuzzy numbers we obtain (i = 1, 2,… , n) , when T < δ (ε ) we have


k k
⎧| n f − (ξ )( g ( x ) − g ( x )) − b f − dg |< ε
∩ [∑ frn− (ξi )( g ( xi ) − g ( xi −1 )), ∑ i =1 frn+ (ξi )( g ( xi ) − g ( xi −1 ))] ⎪ ∑ i =1 r i

∫a r
2 2
n =1 i =1 i i −1
k k ⎨ n ,
⎪| ∑ i =1 f r (ξi )( g ( xi ) − g ( xi −1 )) − ∫ fr+ dg |< ε
 b
= ∑ i =1 fr−' (ξi )( g ( xi ) − g ( xi −1 )), ∑ i =1 fr+' (ξi )( g ( xi ) − g ( xi −1 ))](5)
2 2 +
⎩ a

So from Eq. (5), (2) and (3) we imply that for all r ∈ [0,1] . Thus
k

lim k → ∞ limn →∞ ∑ i =1 frn− (ξi )( g ( xi ) − g ( xi −1 )) = ∫ fr−' dg D ( w, sT ) = sup r∈[0,1] max{| wr− − ∑ i =1 fr− (ξi )( g ( xi ) − g ( xi −1 )) |,
2 b n

| wr+ − ∑ i =1 fr+ (ξi )( g ( xi ) − g ( xi −1 )) |} < ε


n
and
k

lim k → ∞ limn →∞ ∑ i =1 frn+ (ξi )( g ( xi ) − g ( xi −1 )) = ∫ fr+' dg .


b
Hence ( f , g ) ∈ FRS [a , b] .
2

a
k
Theorem 3.4 Let f :[ a, b ] → E 1 be a bounded function,
Denote S (k , n) = ∑ i =1 fr− (ξi )( g ( xi ) − g ( xi−1 )) , then we have
2
n
g be an increase real function on [a , b ] . Then
limk →∞ limn →∞ S (k , n) = ∫ fr−' dg . Let F =
× ⊗{Em : m ∈
} ,
b

a
( f , g ) ∈ FRS [a , b] if and only if for any ε > 0 , there exists
where Em =
for any m and let R ( k , h ) = ( k , h( k )) for δ (ε ) > 0 , such that for any divisions T and T ' , we have
any ( k , h ) ∈ F . Again, notice that S ( k , n ) ∈ 1 and 1 D ( sT , sT ' ) < ε when T < δ (ε ) and T ' < δ (ε ) .
with usual topology is a topological space. Hence, from Proof.
Lemma 2.1 we obtain (i) Necessity. Obviously, inequality D ( sT , sT ' ) ≤ D ( sT , w )
+ D ( w, sT ' ) implies the conclusion.
k

S  R = ∑ i =1 frh−( k ) (ξi )( g ( xi ) − g ( xi −1 )) → ∫ fr−' dg .


2 b

a
k
(ii) Sufficiency. Let T be the set of all divisions of
∑ frh+( k ) (ξi )( g ( xi ) − g ( xi −1 )) → ∫ fr+' dg . Then
b
[a, b] , and define the ordering of T as: T1 ≺ T2 ⇔ T2 is
2
Similarly, i =1 a

for any ε >0


, there exists N 2 (ε ) > 0 and the subdivision of T1 . Then {sT }T ∈T is a net of E 1 . By
h0 ∈⊗{Em : m ∈
} , when k > N 2 (ε ) and h  h0 we have assumption, for any ε > 0 , there exists δ (ε ) > 0 , such
that for any divisions T ' and T we have D ( sT , sT ' ) < ε

k
− −
⎪| ∑ i =1 f rh ( k ) (ξi )( g ( xi ) − g ( xi −1 )) − ∫a f r ' dg |< ε
2 b

when T < δ (ε ) and T ' < δ (ε ) . Taking division T0


⎨ k
. (6)
⎪| ∑ 2 f + (ξ )( g ( x ) − g ( x )) − f + dg |< ε
b
∫ satisfies T0 < δ (ε ) , then for any T , T '  T0 , we have
⎩ i = 1 rh ( k ) i i i −1 a r '

So for any ε > 0 , let N (ε ) = max( N1 (ε ), N 2 (ε )) , then T ≤ T0 < δ (ε ) and T ' ≤ T0 < δ (ε ) , so D ( sT , sT ' ) < ε . It
from Eq. (6) and Eq. (4) we get means that {sT }T ∈T is a Cauchy net of E 1 . By properties
fr−' dg − 2ε < ∫ frh−( k ) dg < ∫ frh+( k ) dg < ∫ fr+' dg + 2ε
b b b b
of D : E1 × E 1 → [0, ∞) we know that ( E 1 , D ) is a
∫ a a a a

when k > N (ε ) and h  h0 . And clearly, complete metric space, so there exists w ∈ E 1 , such that
net {sT }T ∈T converges to w . Now we prove that
frn− dg, ∫ frn+ dg ] ⊂ [∫ frh−( k ) dg , ∫ frh+( k ) dg ] . Therefore,
∞ b b b b
∩ [∫
n =1 a a a a
b

w = ∫ fdg i.e. ( f , g ) ∈ FRS [a , b] . In fact, by assumption
a
for any y ∈ ∩ n =1 [∫a fr− dg, ∫a fr+ dg ] ⊂ [ ∫a fr− dg, ∫a fr+ dg ] ,
∞ b b b b
n n h(k) h( k ) for any ε >0 , there exists δ1 (ε ) > 0 such that


D ( sT , sT ' ) < ε 2 when T < δ1 (ε ) and T ' < δ1 (ε ) . On the (ii) Let T ' be a new division which only adds one point
e between x0 and x1 of division T . We consider the
other hand, we can find T1 ∈ T such that D ( sT , w) < ε 2
when T  T1 , then we take T2 ∈ T satisfying T2  T1 corresponding FRS-integral sums:
n

and T2 < δ1 (ε ) . Finally, we get sT = ∑ f (ξi )( g ( xi ) − g ( xi −1 ))


i =1
D ( sT , w) ≤ D ( sT , sT2 ) + D ( sT2 , w) < ε and
b
 .
when T < δ1 (ε ) , that is, w = ∫a fdg
n
sT ' = ∑ f (ξ 'i )( g ( xi ) − g ( xi −1 )) + f (ξ )( g ( x1 ) − g ( e ))
i=2
Lemma 3.5 Let f :[ a, b ] → E 1 be a bounded function,
+ f (ξ ')( g ( e) − g ( x0 ))
g be an increase real function on [a , b ] . If
where e ≤ ξ ≤ x1 , x0 ≤ ξ ' ≤ e and xi −1 ≤ ξ 'i ≤ xi
fr− ( fr+ ) : [ a , b] → 1 are uniformly RS-integrable for r ∈ [0,1] ,
(i = 2,3,…, n ) . Similar to the proof of (i), when T < δ (ε )
then f − | , f − | ( f + | , f + | ) are also uniformly
r [ a ,c ] r [ c,b ] r [ a ,c ] r [ c,b ]
we have
RS-integrable for r ∈ [0,1] , and D ( sT , sT ' ) ≤ ε ( g (b ) − g ( a )) + ε ( g ( x1 ) − g ( x0 )) .
fr− dg = ( RS ) ∫ fr− dg + ( RS ) ∫ fr− dg
b c b
( RS ) ∫ From the process of the above proof, we can also obtain
a a c
D ( sT , sT '' ) ≤ 2ε ( g ( b) − g ( a )) , where the division T '' is
( ( RS ) ∫a fr+ dg = ( RS )∫a fr+ dg + ( RS ) ∫c fr+ dg ).
b c b

that we add more than one point to the division T .


Theorem 3.6 Let f :[ a , b] → E 1 be a bounded function, (iii) We consider any two divisions T1 , T2 and the
g be an increase real function on [a , b ] . If ( f , g ) ∈ FRS [a , b] , corresponding FRS-integral sums:
⎧ n1
then for any c ∈ ( a, b) , we have ( f , g ) ∈ FRS[a, c] , 
⎪ sT = ∑ f (ξi )( g ( xi ) − g ( xi −1 ))
⎪ i =1
 = c fdg
( f , g ) ∈ FRS [c, b] and ∫  + ∫ fdg
 . If for any
b b
⎨ .
∫ fdg
⎪ s ' =
n2

∑ f (ξ 'i )( g ( x 'i ) − g ( x 'i −1 ))


a a c

c ∈ ( a, b) , ( f , g ) ∈ FRS[a, c] , ( f , g ) ∈ FRS[c, b] , and f is ⎪⎩ T


i =1

continuous at point c , then we have ( f , g ) ∈ FRS [a , b] and Let T3 be the union division of T1 and T2 , and the
 = fdg corresponding FRS-integral sum is sT . Then from (ii)
∫  + ∫ fdg
 .
b c b
∫a
fdg
a c
3

we get D( sT , sT ) ≤ D( sT , sT ) + D( sT , sT ) ≤ 4ε ( g (b) − g (a )).


If f :[ a , b] → E 1 is continuous and g ( x )
1 2 1 3 3 2
Theorem 3.7
At last, from Theorem 3.4 we infer ( f , g ) ∈ FRS [a , b] .
is an increase real function on [a, b] , then ( f , g ) ∈ FRS [a , b] .
Proof.
Acknowledgements
(i)Firstly, we consider a division T : a = x0 < x1 <  < xn = b
, and xi −1 ≤ ξi ≤ xi , xi −1 ≤ ξ 'i ≤ xi (i = 1, 2,… , n) . Then the This paper is supported by NSFC (10571035).
corresponding FRS-integral sums are:
⎧ n
 References
⎪ sT = ∑ f (ξi )( g ( xi ) − g ( xi −1 ))
⎪ i =1
⎨ n
. [1] L. A. Zadeh, Fuzzy sets, Inform. and Control, Vol
⎪ s ' =  (ξ ' )( g ( x ) − g ( x ))
⎪⎩ T ∑
f i i i −1 8, (1965) 338-353.
i =1
[2] D. Dubois and H. Prade, Towards fuzzy differential
Thus
calculus, Fuzzy Sets and Systems, Vol 8, No. 1-3,
D ( s 'T , sT ) = sup r∈[0,1] max{| ∑ i =1 ( fr− (ξ 'i ) − fr− (ξi ))( g ( xi )
n
(1982) 1-17, 105-116, 225-233.
− g ( xi −1 )) |,| ∑ i =1 ( fr+ (ξ 'i ) − fr+ (ξi ))( g ( xi ) − g ( xi −1 )) |}. [3] S. Nanda, On fuzzy integrals, Fuzzy Sets and Systems,
n

Vol. 32, No. 1, (1989) 95-101.


Since f ( x ) is continuous, we can easily prove that [4] Hsien-Chung Wu, The fuzzy Riemann-Stieltjes integral,
f ( x ) is uniformly continuous on [a , b ] . Hence, for any International Journal of Uncertainty, Fuzziness and
ε > 0 ,there exists δ (ε ) > 0 ,such that D ( f (ξi ), f (ξ 'i )) < ε Knowledge-Based Systems, Vol. 6, No. 1, (1998)
51-67.
when T < δ (ε ) and 1≤ i ≤ n , so that
[5] Wu Chong, RSu integral of interval-valued functions and
max1≤i ≤n | fr− (ξi ), fr− (ξ 'i ) |< ε for any r ∈ [0,1] when fuzzy-valued functions redefined, Fuzzy Sets and
T < δ (ε ) . Thus, Systems, Vol. 84, (1996) 301-308.
[6] Wu Congxin and Wu Chong, A note of the RSu integrals
| ∑ i =1 ( fr− (ξ 'i ) − fr− (ξi ))( g ( xi ) − g ( xi −1 )) |
n
of fuzzy-valued functions, Fuzzy Sets and Systems, Vol.
≤ max 1≤ i ≤ n | fr− (ξi ), fr− (ξ 'i ) | ∑ i =1 | g ( xi ) − g ( xi −1 ) |
n
95, (1998) 119-125.
[7] Chong Wu and Cong-xin Wu, On Riemann-Stieltjes type
≤ ε ( g (b ) − g ( a )).
integral of fuzzy-valued functions, Proceeding of
Similarly, we have
ICMLC2004 Conference, Shanghai, August (2004)
| ∑ i =1 ( fr+ (ξ 'i ) − fr+ (ξi ))( g ( xi ) − g ( xi −1 )) |≤ ε ( g ( b) − g ( a )).
n
26-29.
So we obtain D ( s 'T , sT ) ≤ ε ( g (b) − g ( a )). [8] Wu Congxin and Wu Chong, The supremum and infimum


of the set of fuzzy numbers and its application, J. Math.
Anal. Appl., Vol. 210, (1997) 499-510.
[9] V. N. Vapnik, Statistical Learning Theory [M], John
Wiley & Sons, Inc., New York, (1998).
[10] P. Diamond, P. Kloeden, Metric Spaces of Fuzzy Sets
[M], World Scientific, Singapore, (1994)
[11] J. L. Kelley, General Topology [M], Van Nostrand,
Princeton, (1955)


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 733--737
Copyright@2007 Watam Press

ON THE DISCRETE TIME BROWNIAN FLOW I:


CHARACTERISTIC AND INVARIANT MEASURE OF THE
N-POINT MOTION
Jingxiao Zhang1
School of Statistics, Renmin University of China, Beijing, 100872

Abstract: First we give out the characteristic of the 2. A stochastic flow is called a temporally homege-
discrete time Brownian flow, then We establish the ex- neous stochastic flow if the law of φm,n and that
istence and uniqueness of an invariant probability for of φm+k,n+k coincide for any k > 0;
the motion corresponding to a temporally homogeneous 3. A stochastic flow is called a stochastic flow of
Brownian flow. homeomorphisms if the map φm,n : Rd → Rd is
an onto homeomorphisms for all m, n and all ω
not in F.
1 INTRODUCTION 4. A stochastic flow of homeomorphisms is called
In this article first we will give out the characteris- a Brownian flow of homeomorphisms if for, any
tics which can determine the discrete time Brownian partition 0 ≤ n0 < n1 < n2 · · · < nl , the random
flow(for the continuous time case,the correponding re- variables φni ,ni+1 , i = 0, 1, 2, · · · , l, are indepen-
sult is classical (see Kunita[4])), then we obtain the ex- dent.
istence of a unique invariant probability for the N-point 5. Let φm,n be a (forward) Brownian flow of home-
motion. omorphisms and xN = (x1 , x2 , · · · , xN ) be a vec-
For continuous time Brownian flow, there has been tor of N points from Rd . Define φm,n (xN ) ≡
lots of result such as the long-term behavior, law of large (φm,n (x1 ), φm,n (x2 ), · · · , φm,n (xN )). By fixing
number and central limit theorems and so on, for ref- m we can treat φm,n (xN , ω) as an Rd - valued
erences, see Kunita [4], Basak and Kannan[1] ,[2]. The stochastic process. This stochastic process is
discrete time Brownian flow was first defined in [7], in called an N-point notion of the Brownian flow
that paper, they also studied the Markovian property φm,n .
of the N-point motion. But relative to the continuous We shall now introduce a basic assumption on the
case, we have little information of the discrete Brownian (forward) Brownian flow of homeomorphisms φm,n (x)
flow. and some notations:
We shall begin by setting up the notations and in-
troducing the notions that are basic in this work. All 1. The random variable φm,n (x) is square interable
our random variables will be supported by a completely for each m, n, x.
probability space (Ω, F, P ). For m, n ∈ N, x ∈ Rd , 2. There exists a positive constant K such that
φm,n (x, ω) be an Rd -valued random field on (Ω, F , P ).
We will recall the definition of discrete stochastic flow |E[φm,n (x) − x]| ≤ K(1 + |x|)(n − m);
and Brownian flow in the following. |E[(φm,n (x) − x)(φm,n (x) − x)t ]|
We can treat the random field φm,n (x, ω) as a fam- ≤ K(1 + |x|)(1 + |y|)(n − m).
ily Φ ≡ {φm,n (x, ω), m, n ∈ N} of continuous functions
mapping Rd into itself. Let F be a null set in Ω. Then: 3. Let b(x, n) ≡ E[φn,n+1 (x) − x].
1. The family Φ is called a (discrete time) stochas-
tic flow if for all ω not in F, 2 CHARACTERISTICS OF
(a) φn,n = the identity map on Rd , for all n and
all ω ∈ F c , and
φm,n (x)
(b) φm,n = φm,k ◦ φk,n holds for all m, k, n, and In this chapter, we want to find the characteristics
all ω ∈ F c ;here · denote the compositions which candetermine the law of the Brownian flow.
of functions.
Proposition 2.1 For each m, x,
A stochastic flow is called a forward stochastic
X
n−1
flow if the time points m, k, n in (b) are taken Mm,n (x) ≡ φm,n (x) − x − b(φm,k (x), k), n ≥ m
in the increasing order m ≤ k ≤ n. We consider k=m
only the foward flows. (2.1)
1 Address
correspondence to zhjxiao@gmail.com. Research was supported in part by the National Natural Science Foundation
of China(10601066)


is a square integrable martingale adapted to Fm,n , re- Proposition 2.2 The joint quadratic variation of
call that Fm,n is the completion of σ(φk,l : m ≤ k, l ≥ Mm,n (x) is
n).
i j
X
n−1

Proof. Set ψm,n (x) = E[φm,n (x)]. The flow prop- [Mm,n (x), Mm,n (x)] = ai,j (φm,k (x), φm,k (x), k).
erty : φm,n+1 = φn,n+1 ◦ φm,n and the independence of k=m
(2.9)
φm,n and φn,n+1 imply
Z Proof. For m + 1,
ψm,n+1 (x) = E[φn,n+1 (y)]P (φm,n (x) ∈ dy)
Mm,m+1 (x)Mm,m+1 (x)t = a(x, x, m). (2.10)
= E[ψn,n+1 (φm,n (x))].
Since Mm,n (x), n ≥ m is an L2 -martingale with the
Therefore we have
additive property (2.5), we have
ψm,n+1 (x) − ψm,n (x) = E[ψn,n+1 (φm,n (x)) − φm,n (x)].
(2.2) (Mm,m+2 (x) − Mm,m+1 (x))(Mm,m+2 (y)
From the assumption on φm,n , we have the inequality −Mm,m+1 (y))t + a(x, x, m)
= Mm+1,m+2 (φm,m+1 (x))Mm+1,m+2 (φm,m+1 (y))t
|ψn,n+1 (φm,n (x)) − φm,n (x)| ≤ K(1 + |φm,n (x)|, (2.3)
+a(x, x, m)
where the right hand side is integrable. Sum both sides = a((φm,m+1 (x), φm,m+1 (y)), m + 1) + a(x, x, m),
with respect to n, we obtain
since Mm,m = 0. By induction ,we arrive at the result.
X
n−1
ψm,n (x) − x = E[b(φm,k (x), k)]. (2.4)
k=0
3 INVARIANT MEASURE
This proves E[Mm,n (x) = 0] for any n, x.
Now Mm,n (x) had the additive property:
FOR THE N-POINT MO-
TION
Mm,n (x) = Mm,k (x) + Mk,n (φm,k (x)) (2.5)
We shall show that there exists a unique invariant prob-
if m < k < n. Since Fm,k and Mk,n are independent, ability π such that the transition probability of the N-
we have point motion of the temporally homogeneous Brownian
flow will converge in distribution to π. Compare with
E[Mm,n (x)|Fm,k ] = Mm,k (x) + E[Mk,n (y)]y=Mm,k (x)
the continuous case(see [1]), we don’t have Itô formula
= Mm,k (x), which was a strong tool in their proof, but we can derive
similar result under certain conditions.
proving that Mm,n (x) is a martingale for each m, x. Let p(n; X (N ) , dY (N ) ) denote the transition prob-
Recall the definition of (co)quadratic variation of ability of the N −point motion {φn (X (N ) ), N ≥ 0}.
a discrete time martingale (see [6], [7]) Then the corresponding transition semigroup Tn , n ≥ 0,
is given by
Definition 2.1 For two martinales (Xn , Fn ) and
(Yn , Fn ), one defines their quadratic covariation se-
(Tn f )(X (N ) )
quence ([X, Y ]n , Fn ), by
= Ef (φn (X (N ) ))
Z
X
n
[X, Y ]n ≡ Xk Yk , (2.6) = f (Y (N ) )p(n; X (N ) , dY (N ) )
RN d
k=1

where Xk = Xk − Xk−1 , Yk = Yk − Yk−1 . on B(RN d ), the Bannach space of all real valued
In particular, the quadratic variation of a martingale bounded Borel measurable functions on RN d with sup
(Xn , Fn ) is defined by norm.

X
n
Definition 3.1 A measure m is said to be invariant
[X]n ≡ [Xk ]2 . (2.7) for the transition probability p(n; X (N ) , dY (N ) ) if, for
k=1
every Borel set B ⊆ RN d such that m(B) < ∞ , one
For any m ∈ N, x, y, let has
Z
a(x, y, m) ≡ Mm,m+1 (x)Mm,m+1 (y)t . (2.8) p(n; X (N ) , B)m(dX (N ) ) = m(B). (3.11)
RN d

2


734
Equivalently, $m$ is invariant if, for all $f \in L^1(\mathbb{R}^{Nd}, m)$, one has
$$\int_{\mathbb{R}^{Nd}} \int_{\mathbb{R}^{Nd}} f(Y^{(N)})\, p(n; X^{(N)}, dY^{(N)})\, m(dX^{(N)}) = \int_{\mathbb{R}^{Nd}} f(X^{(N)})\, m(dX^{(N)}).$$

Theorem 3.1 Suppose that there exists a constant $0 < K < 1$ such that, for any $x, y, n$,
$$E|\phi_{n,n+1}(x) - \phi_{n,n+1}(y)| \le K|x - y|, \qquad (3.12)$$
and that for every $\epsilon > 0$ there exists a compact set $B_\epsilon \subset \mathbb{R}^{Nd}$ such that
$$\phi_1(x, \mathbb{R}^{Nd} \setminus B_\epsilon) < \epsilon, \quad \forall x \in \mathbb{R}^{Nd}. \qquad (3.13)$$
Then the $N$-point motion $\phi_n(x^{(N)}) \equiv \phi_{0,n}(x^{(N)})$ of the homogeneous Brownian flow
$$\phi_n(x) = x + \sum_{k=0}^{n-1} b(\phi_k(x), k) + M_n(x) \qquad (3.14)$$
has a unique invariant probability, where $M_n(x) = M_{0,n}(x)$.

Proof. Existence. To prove the existence of a (unique) invariant probability it suffices to show that the family $\{p(n; X^{(N)}, dY^{(N)}) : n \ge 0\}$ is Cauchy in the bounded Lipschitz (BL) metric $d_{BL}$ defined on the space $\mathcal{P}(\mathbb{R}^{Nd})$ of all probability measures on the Borel $\sigma$-field $\mathcal{B}(\mathbb{R}^{Nd})$ (see Dudley [3]). Recall that
$$BL := \{f :\ |f(z) - f(z')| \le |z - z'|\ \ \forall z, z' \in \mathbb{R}^{Nd},\ \text{and}\ |f(z)| \le 1 \text{ for all } z \in \mathbb{R}^{Nd}\},$$
$$d_{BL}(P_1, P_2) = \sup_{f \in BL} \Big| \int f\, dP_1 - \int f\, dP_2 \Big| = \|P_1 - P_2\|_{BL}, \quad (P_1, P_2 \in \mathcal{P}(\mathbb{R}^{Nd})).$$

In order to prove that the family $\{p(n; X^{(N)}, dY^{(N)}) : n \ge 0\}$ is Cauchy in the metric $d_{BL}$, we need to show that for all $\epsilon > 0$ there exists $n_0 \in \mathbb{N}$ such that, for all $m \in \mathbb{N}$,
$$\|p(n+m; X^{(N)}, dY^{(N)}) - p(n; X^{(N)}, dY^{(N)})\|_{BL} < \epsilon \qquad (3.15)$$
whenever $n > n_0$. Now, for all $m \in \mathbb{N}$,
$$\begin{aligned}
\|p(n+m; X^{(N)}, dY^{(N)}) - p(n; X^{(N)}, dY^{(N)})\|_{BL}
&= \sup_{f \in BL} |Ef(\phi_{n+m}(X^{(N)})) - Ef(\phi_n(X^{(N)}))| \\
&= \sup_{f \in BL} |E(E(f(\phi_{n+m}(X^{(N)}))\,|\,\mathcal{F}_n)) - Ef(\phi_n(X^{(N)}))| \\
&= \sup_{f \in BL} \Big|\int_{\mathbb{R}^{Nd}} E\big(f(\phi_n(Y^{(N)})) - f(\phi_n(X^{(N)}))\big)\, p(m; X^{(N)}, dY^{(N)})\Big| \\
&\le \sup_{f \in BL} \int_{\mathbb{R}^{Nd}} \big|E\big(f(\phi_n(Y^{(N)})) - f(\phi_n(X^{(N)}))\big)\big|\, p(m; X^{(N)}, dY^{(N)}) \\
&\le \int_{\mathbb{R}^{Nd}} E\{|\phi_n(Y^{(N)}) - \phi_n(X^{(N)})| \wedge 2\}\, p(m; X^{(N)}, dY^{(N)}),
\end{aligned}$$
while
$$\begin{aligned}
\int_{\mathbb{R}^{Nd}} E|\phi_n(Y^{(N)}) - \phi_n(X^{(N)})|\, p(m; X^{(N)}, dY^{(N)})
&= \int_{\mathbb{R}^{Nd}} E|\phi_1(\phi_{n-1}(Y^{(N)})) - \phi_1(\phi_{n-1}(X^{(N)}))|\, p(m; X^{(N)}, dY^{(N)}) \\
&= \int_{\mathbb{R}^{Nd}} E\big[E\big(|\phi_1(\phi_{n-1}(Y^{(N)})) - \phi_1(\phi_{n-1}(X^{(N)}))|\ \big|\ \mathcal{F}_{n-1}\big)\big]\, p(m; X^{(N)}, dY^{(N)}) \\
&= \int_{\mathbb{R}^{Nd}} E\big[E|\phi_1(y) - \phi_1(x)|\big|_{y=\phi_{n-1}(Y^{(N)}),\ x=\phi_{n-1}(X^{(N)})}\big]\, p(m; X^{(N)}, dY^{(N)}) \\
&\le K \int_{\mathbb{R}^{Nd}} E|\phi_{n-1}(Y^{(N)}) - \phi_{n-1}(X^{(N)})|\, p(m; X^{(N)}, dY^{(N)}) \\
&\le \cdots \le K^n \int_{\mathbb{R}^{Nd}} E|Y^{(N)} - X^{(N)}|\, p(m; X^{(N)}, dY^{(N)}).
\end{aligned}$$

Next we will show that $\{p(m; X^{(N)}, dY^{(N)}) : m \ge 0\}$ is a tight family. We only need to show that, for each $\epsilon > 0$ and every subset $B_\epsilon$ of $\mathbb{R}^{Nd}$ such that $\int_{\mathbb{R}^{Nd}\setminus B_\epsilon} p(1, x, dy) = \phi_1(x, \mathbb{R}^{Nd}\setminus B_\epsilon) < \epsilon$ for all $x \in \mathbb{R}^{Nd}$,
$$\int_{\mathbb{R}^{Nd}\setminus B_\epsilon} p(2, x, dy) = \int_{\mathbb{R}^{Nd}\setminus B_\epsilon} \int_{\mathbb{R}^{Nd}} p(1, x, dz)\, p(1, z, dy) \le \epsilon.$$
Then for each $\epsilon > 0$ we can find a compact subset $B_\epsilon \subset \mathbb{R}^{Nd}$ such that
$$\int_{\mathbb{R}^{Nd}\setminus B_\epsilon} p(m; X^{(N)}, dY^{(N)}) < \frac{\epsilon}{4}, \quad \forall m. \qquad (3.16)$$
We can also find $n_0$ such that, for $n > n_0$,
$$E|\phi_n(Y^{(N)}) - \phi_n(X^{(N)})| \le \frac{\epsilon}{4}, \quad \forall\, Y^{(N)} \in B_\epsilon. \qquad (3.17)$$
Then we have
$$\begin{aligned}
&\int_{\mathbb{R}^{Nd}\setminus B_\epsilon} E\{|\phi_n(Y^{(N)}) - \phi_n(X^{(N)})| \wedge 2\}\, p(m; X^{(N)}, dY^{(N)}) + \int_{B_\epsilon} E\{|\phi_n(Y^{(N)}) - \phi_n(X^{(N)})| \wedge 2\}\, p(m; X^{(N)}, dY^{(N)}) \\
&\qquad \le 2 \int_{\mathbb{R}^{Nd}\setminus B_\epsilon} p(m; X^{(N)}, dY^{(N)}) + \frac{\epsilon}{4} \int_{B_\epsilon} p(m; X^{(N)}, dY^{(N)}) \le 2\cdot\frac{\epsilon}{4} + \frac{\epsilon}{4} < \epsilon.
\end{aligned}$$
The arbitrariness of $\epsilon$ thus proves that $\{p(n; X^{(N)}, dY^{(N)}) : n \ge 0\}$ is Cauchy. Since $\mathcal{P}(\mathbb{R}^{Nd})$ is complete under $d_{BL}$, there exists a probability $\pi$ such that
$$\|p(n; X^{(N)}, dY^{(N)}) - \pi(dY^{(N)})\|_{BL} \to 0 \quad \text{as } n \to \infty. \qquad (3.18)$$
Now for arbitrary $Z^{(N)} \in \mathbb{R}^{Nd}$,
$$\begin{aligned}
\|p(n, Z^{(N)}, dY^{(N)}) - \pi(dY^{(N)})\|_{BL}
&\le \|p(n, Z^{(N)}, dY^{(N)}) - p(n, X^{(N)}, dY^{(N)})\|_{BL} + \|p(n, X^{(N)}, dY^{(N)}) - \pi(dY^{(N)})\|_{BL} \\
&= \sup_{f \in BL} |Ef(\phi_n(Z^{(N)})) - Ef(\phi_n(X^{(N)}))| + \|p(n, X^{(N)}, dY^{(N)}) - \pi(dY^{(N)})\|_{BL} \\
&\le \sup_{f \in BL} E|f(\phi_n(Z^{(N)})) - f(\phi_n(X^{(N)}))| + \|p(n, X^{(N)}, dY^{(N)}) - \pi(dY^{(N)})\|_{BL} \\
&\le E\{|\phi_n(Z^{(N)}) - \phi_n(X^{(N)})| \wedge 2\} + \|p(n, X^{(N)}, dY^{(N)}) - \pi(dY^{(N)})\|_{BL} \\
&\to 0, \quad \text{as } n \to \infty. \qquad (3.19)
\end{aligned}$$
This proves that the limiting distribution $\pi$ exists and does not depend on the initial point.

Invariance. It follows from (3.18) and (3.19) that the transition probability of the $N$-point motion $\phi_n(X^{(N)})$ converges in the $d_{BL}$ metric to a probability distribution $\pi$ which does not depend on the initial point. To prove the invariance of $\pi$, first note that convergence in $d_{BL}$ is equivalent to weak convergence (Dudley [3]). If $f$ is bounded and continuous, then so is $T_n f$, for all $n \ge 0$. Since $\lim_{n\to\infty}(T_n f)(X^{(N)}) = \int_{\mathbb{R}^{Nd}} f(Y^{(N)})\,\pi(dY^{(N)})$ for every $X^{(N)} \in \mathbb{R}^{Nd}$, it therefore holds for all bounded continuous $f$ that
$$\int_{\mathbb{R}^{Nd}} (T_n f)(Y^{(N)})\,\pi(dY^{(N)}) = \lim_{m\to\infty} T_m(T_n f)(X^{(N)}) = \lim_{m\to\infty} T_{m+n} f(X^{(N)}) = \lim_{m'\to\infty} T_{m'} f(X^{(N)}) \quad (m' = m + n) = \int_{\mathbb{R}^{Nd}} f(Y^{(N)})\,\pi(dY^{(N)}).$$
Thus $\pi$ is invariant.

Uniqueness. To prove uniqueness of the invariant probability measure $\pi$, let $\pi'$ be another invariant distribution. Then for any $Z^{(N)} \in \mathbb{R}^{Nd}$,
$$\begin{aligned}
\|p(n, Z^{(N)}, dY^{(N)}) - \pi'(dY^{(N)})\|_{BL}
&= \sup_{f \in BL} \Big|Ef(\phi_n(Z^{(N)})) - \int_{\mathbb{R}^{Nd}} f(Y^{(N)})\,\pi'(dY^{(N)})\Big| \\
&= \sup_{f \in BL} \Big|Ef(\phi_n(Z^{(N)})) - \int_{\mathbb{R}^{Nd}} Ef(\phi_n(Y^{(N)}))\,\pi'(dY^{(N)})\Big| \\
&= \sup_{f \in BL} \Big|\int_{\mathbb{R}^{Nd}} E[f(\phi_n(Z^{(N)})) - f(\phi_n(Y^{(N)}))]\,\pi'(dY^{(N)})\Big| \\
&\le \sup_{f \in BL} \int_{\mathbb{R}^{Nd}} E|f(\phi_n(Z^{(N)})) - f(\phi_n(Y^{(N)}))|\,\pi'(dY^{(N)}) \\
&\le \int_{\mathbb{R}^{Nd}} E\{|\phi_n(Z^{(N)}) - \phi_n(Y^{(N)})| \wedge 2\}\,\pi'(dY^{(N)}) \\
&\to 0, \quad \text{as } n \to \infty,
\end{aligned}$$
by the Lebesgue dominated convergence theorem, since the integrand goes to zero as $n \to \infty$ and is bounded. Then we have
$$\|\pi(dY^{(N)}) - \pi'(dY^{(N)})\|_{BL} \le \|\pi(dY^{(N)}) - p(n, Z^{(N)}, dY^{(N)})\|_{BL} + \|p(n, Z^{(N)}, dY^{(N)}) - \pi'(dY^{(N)})\|_{BL} \to 0, \quad \text{as } n \to \infty,$$
which implies that
$$\int_{\mathbb{R}^{Nd}} f(Y^{(N)})\,\pi(dY^{(N)}) = \int_{\mathbb{R}^{Nd}} f(Y^{(N)})\,\pi'(dY^{(N)}) \qquad (3.20)$$
for every bounded and continuous function $f$. Consequently $\pi = \pi'$. This completes the proof.
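The mechanism behind Theorem 3.1 can be seen in a small simulation. The Python sketch below (an illustration, not part of the paper) iterates a one-dimensional random map satisfying the mean contraction condition (3.12) and shows two chains started from distant points coupling geometrically fast, which is exactly what drives the Cauchy estimate above:

    import numpy as np

    rng = np.random.default_rng(1)

    # One-step random map phi_{n,n+1}(x) = K*x + theta_n with K < 1, so
    # E|phi(x) - phi(y)| = K|x - y|: condition (3.12) holds with this K.
    K = 0.7

    def step(x, theta):
        return K * x + theta

    x, y = -50.0, 50.0
    for n in range(60):
        theta = rng.normal()      # same noise: the flow moves both points
        x, y = step(x, theta), step(y, theta)

    # |x - y| has contracted by a factor K^60; the common law of the chain
    # approaches the unique invariant probability, here N(0, 1/(1 - K^2)).
    print(abs(x - y))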
References

[1] Basak, G.K., Kannan, D., On the Singular N-point Motion of a Brownian Flow: Asymptotic Flatness and Invariant Measure, Stochastic Analysis and Applications, 11(4), 369-397, 1993.

[2] Basak, G.K., Kannan, D., On the Singular N-point Motion of a Brownian Flow: Asymptotic Flatness and Invariant Measure, Random Oper. and Stoch. Equ., Vol. 4, No. 2, 163-178, 1996.

[3] Dudley, R.M., Distances of Probability Measures and Random Variables, Ann. Math. Statist. 39, 1563-1572, 1968.

[4] Kunita, H., Stochastic Flows and Stochastic Differential Equations, Cambridge: Cambridge University Press, 1990.

[5] Xu, J., Kannan, D. and Zhang, B., Optimal Dynamic Control for the Defined Benefit Pension Plans with Stochastic Benefit Outgo, Stochastic Analysis and Applications, Vol. 25(1), 201-236, 2007.

[6] Zhang, B., Kannan, D., Discrete-time Martingales with Spatial Parameters, Stochastic Analysis and Applications, Vol. 20, No. 5, 1101-1131, 2002.

[7] Zhang, B., Zhang, J., Kannan, D., Nonlinear Stochastic Difference Equations Driven by Martingales, Stochastic Analysis and Applications, Vol. 23, 1277-1304, 2005.
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 738--741
Copyright@2007 Watam Press

ON THE DISCRETE TIME BROWNIAN FLOW II: CENTRAL LIMIT THEOREM OF THE N-POINT MOTION
Jingxiao Zhang1
School of Statistics, Renmin University of China, Beijing, 100872

Abstract: Continuing our study of an N-point motion of a Brownian flow [17], we establish in this article a certain functional central limit theorem for such motions, using the CLT for non-homogeneous Markov chains given by Dobrushin [3].

1 INTRODUCTION

We continue our earlier study of an N-point motion of a Brownian flow [17], in which we considered the characteristics which determine the discrete time Brownian flow, as well as the existence and uniqueness of an invariant probability for such a motion. In the present work, we establish a certain functional limit theorem (CLT) for the N-point motion (notice that we do not need the Brownian flow to be temporally homogeneous). In doing this, we use the CLT for non-homogeneous Markov chains which was established by Dobrushin [3] and by Sethuraman and Varadhan in [12].

We begin our discussion by setting up the notation and introducing the notions that are basic in this work. A complete probability space $(\Omega, \mathcal{F}, P)$ will support all our random variables. For $m, n \in \mathbb{N}$ and $x \in \mathbb{R}^d$, let $\phi_{m,n}(x, \omega)$ be a continuous $\mathbb{R}^d$-valued random field on $(\Omega, \mathcal{F}, P)$, treated as a process $\Phi \equiv \{\phi_{m,n}(x, \omega), n \in \mathbb{N}\}$ of continuous functions mapping $\mathbb{R}^d$ into itself. The system that we analyze is an N-point motion of a Brownian flow, which we shall now define. The $C(\mathbb{R}^d)$-valued process $\Phi$ is called a (discrete time) stochastic flow if there exists a null set $F$ in $\omega$ such that:

1. $\phi_{n,n}$ = the identity map on $\mathbb{R}^d$, for all $n$ and all $\omega \in F^c$, and
2. $\phi_{m,n} = \phi_{m,k} \circ \phi_{k,n}$ holds for all $m, k, n$ and all $\omega \in F^c$; here $\circ$ denotes the composition of functions.

A stochastic flow $\Phi$ is called a Brownian flow if it is a process with independent increments. A stochastic flow $\Phi$ is called a stochastic flow of homeomorphisms (respectively, of diffeomorphisms) if $\phi_{m,n}: \mathbb{R}^d \to \mathbb{R}^d$ is an onto homeomorphism (respectively, diffeomorphism). A stochastic flow of homeomorphisms (respectively, of diffeomorphisms) $\Phi$ is called a Brownian flow of homeomorphisms (respectively, of diffeomorphisms) if it is a process with independent increments taking values in the group of homeomorphisms (respectively, diffeomorphisms) of $\mathbb{R}^d$.

Let $x^{(N)} = (x_1, x_2, \cdots, x_N)$ be a vector of N points from $\mathbb{R}^d$ and $\phi_{m,n}$ be a Brownian flow. The process $\Phi^N \equiv (\phi_{m,n}(x_1), \phi_{m,n}(x_2), \cdots, \phi_{m,n}(x_N))$ is called an N-point motion of the Brownian flow $\Phi$.

Notice that the N-point motion $\Phi^N$ is a non-homogeneous Markov chain corresponding to the transition operators
$$\pi_{m,n}(x, dy) = P(\phi_{m,n}(x) \in dy), \quad \forall m \ge 0,\ n \ge m. \qquad (1.1)$$
Define the contraction coefficient $\delta(\pi_{m,n})$ ([12]) of $\pi_{m,n}$ as
$$\delta(\pi_{m,n}) \equiv \frac{1}{2} \sup_{X_1, X_2 \in \mathbb{R}^{Nd},\ \|f\|_{L^\infty} \le 1} |E(f(\phi_{m,n}(X_1)) - f(\phi_{m,n}(X_2)))|, \quad m \ge 0,\ n \ge m.$$
Also define the related coefficient $\alpha(\pi_{m,n}) = 1 - \delta(\pi_{m,n})$. Clearly, $0 \le \delta(\pi_{m,n}) \le 1$, and $\delta(\pi_{m,n}) = 0$ if and only if $P(\phi_{m,n}(x) \in \cdot)$ does not depend on $x$. As in [12], we call $\pi_{m,n}$ "non-degenerate" if $0 \le \delta(\pi_{m,n}) < 1$.

For a probability measure $\mu$ on $\mathbb{R}^{Nd}$ and a bounded measurable function $f$, let
$$(\mu\pi)(A) = \int \mu(dx)\, \pi_{m,n}(x, A), \quad \forall m \ge 0,\ n \ge m, \qquad (1.2)$$
$$(\pi f)(x) = \int \pi_{m,n}(x, dy)\, f(y), \quad \forall m \ge 0,\ n \ge m. \qquad (1.3)$$

We can see that $\delta(\pi_{m,n})$ has the following properties:
$$\delta(\pi_{m,n}) = \sup_{X_1, X_2 \in \mathbb{R}^{Nd},\ A \in \mathcal{B}(\mathbb{R}^{Nd})} |\pi_{m,n}(X_1, A) - \pi_{m,n}(X_2, A)| = \sup_{X_1, X_2 \in \mathbb{R}^{Nd},\ f \in U} |(\pi_{m,n} f)(X_1) - (\pi_{m,n} f)(X_2)|,$$
where $U = \{f : \sup_{X_1, X_2} |f(X_1) - f(X_2)| \le 1\}$. It is the operator norm of $\pi_{m,n}$ with respect to the Banach (semi-)norm $\mathrm{Osc}(f) = \sup_{X_1, X_2} |f(X_1) - f(X_2)|$, namely the oscillation of $f$. In particular,
$$\delta(\pi_{m,n}) \le \delta(\pi_{m,k})\, \delta(\pi_{k,n}), \quad \text{for } m \le k \le n. \qquad (1.4)$$
For a random variable $X$, denote by $E[X]$ and $V[X]$ its expectation and variance with respect to $P$.

1 Address correspondence to zhjxiao@gmail.com. Research was supported in part by the National Natural Science Foundation of China (10601066).
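For intuition, the contraction coefficient and the submultiplicativity (1.4) are easy to check numerically on a finite state space, where the transition operator is a stochastic matrix and delta is half the maximal total-variation distance between rows. A minimal sketch (my illustration; the 3-state kernel below is arbitrary, not from the paper):

    import numpy as np

    def dobrushin_delta(P):
        # delta(P) = (1/2) * max over row pairs i, j of sum_k |P[i,k] - P[j,k]|
        n = P.shape[0]
        return max(0.5 * np.abs(P[i] - P[j]).sum()
                   for i in range(n) for j in range(n))

    P = np.array([[0.5, 0.3, 0.2],
                  [0.2, 0.5, 0.3],
                  [0.3, 0.2, 0.5]])

    d1 = dobrushin_delta(P)
    d2 = dobrushin_delta(P @ P)
    print(d1, d2, d2 <= d1 * d1 + 1e-12)   # submultiplicativity as in (1.4)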


2 CLT of the N-point Motion

On the CLT for non-homogeneous Markov chains there is a famous result by R. Dobrushin [3], which was proved again by Sethuraman and Varadhan [12] through a martingale approximation method. We will use their result to prove the central limit theorem for the N-point motion of a Brownian flow, a special non-homogeneous Markov chain.

For each $n \ge 1$, consider the Markov chain of length $n$, $\phi_{0,1}(x), \phi_{0,2}(x), \ldots, \phi_{0,n}(x)$, with transition operators
$$\pi_{m,m+1} = \pi_{m,m+1}(x, dy), \quad \forall m \le n-1. \qquad (2.5)$$
Let
$$\alpha_n = \min_{0 \le m \le n-1} \alpha_{m,m+1}. \qquad (2.6)$$
Let $f_m^{(n)}$, $m \le n$, be real valued functions. Define, for $n \ge 1$,
$$S_n = \sum_{m=1}^{n} f_m^{(n)}(\phi_{0,m}(x)). \qquad (2.7)$$

Theorem 2.1 Suppose there exist finite constants $C_n$ such that
$$\sup_{1 \le m \le n}\ \sup_{X \in \mathbb{R}^{Nd}} |f_m^{(n)}(X)| \le C_n. \qquad (2.8)$$
If
$$\lim_{n \to \infty} C_n^2\, \alpha_n^{-3} \Big[\sum_{m=1}^{n} V(f_m^{(n)}(\phi_{0,m}(x)))\Big]^{-1} = 0, \qquad (2.9)$$
then we have
$$\frac{S_n - E[S_n]}{\sqrt{V(S_n)}} \Rightarrow N(0, 1). \qquad (2.10)$$

Proof. $\Phi$ is a special non-homogeneous Markov chain with transition operators $\pi_{m,n}(x, dy)$. All the conditions imposed above on $f_m^{(n)}$, $m \le n$, and on $\phi_{0,1}(x), \ldots, \phi_{0,n}(x)$ match the conditions on $f_m^{(n)}$, $m \le n$, and $X$ in Theorem 1.1 of [12], where $X$ in their theorem denotes a general non-homogeneous Markov chain. We can therefore derive the result by using their theorem directly.

Remark 2.1 Notice that here we do not need the Brownian flow to consist of diffeomorphisms, as in the continuous case [2].

Corollary 1 [3],[12] Suppose there exist two finite constants $C_1, C_2 > 0$ such that
$$\sup_{X \in \mathbb{R}^{Nd}} |f(X)| \le C_1, \qquad (2.11)$$
$$V(|f(\phi_{0,m}(x))|) \ge C_2, \quad \forall\, 1 \le m \le n; \qquad (2.12)$$
then we have
$$\frac{S_n - E[S_n]}{\sqrt{V(S_n)}} \Rightarrow N(0, 1), \qquad (2.13)$$
provided
$$\lim_{n \to \infty} \alpha_n\, n^{1/3} = \infty. \qquad (2.14)$$

Proof. It can be easily derived from Theorem 2.1. This kind of result was also given by Dobrushin [3].

Corollary 2 For the N-point motion of a homogeneous Brownian flow $\Phi_n$, suppose that there exist two finite constants $C_1, C_2 > 0$ such that
$$\sup_{X \in \mathbb{R}^{Nd}} |f(X)| \le C_1, \qquad (2.15)$$
$$V(|f(\phi_{0,m}(x))|) \ge C_2, \quad \forall\, 1 \le m \le n, \qquad (2.16)$$
$$\alpha \equiv 1 - \delta(\pi_{0,1}) > 0; \qquad (2.17)$$
then we have
$$\frac{S_n - E[S_n]}{\sqrt{V(S_n)}} \Rightarrow N(0, 1). \qquad (2.18)$$

Proof. Since $\Phi_n$ is homogeneous,
$$\alpha_n \equiv 1 - \delta(\pi_{n,n+1}) = 1 - \delta(\pi_{0,1}) = \alpha, \qquad (2.19)$$
so we can derive the result.

Example 2.1 Consider the 1-point motion of a Brownian flow, and suppose that
$$\pi_{n,n+1}(x, dy) = P(\phi_{n,n+1}(x) \in dy) = \epsilon\, \delta_x(dy) + (1 - \epsilon)\,\mu(dy),$$
where $0 < \epsilon < 1$ and $\mu$ is a probability measure on $\mathbb{R}^d$. Then
$$\begin{aligned}
\delta(\pi_{n,n+1}) &= \frac{1}{2} \sup_{X_1, X_2 \in \mathbb{R}^d,\ \|f\|_{L^\infty} \le 1} |E(f(\phi_{n,n+1}(X_1)) - f(\phi_{n,n+1}(X_2)))| \\
&= \frac{1}{2} \sup_{X_1, X_2 \in \mathbb{R}^d,\ f \in U} \Big|\int f(y)\,[\pi_{n,n+1}(X_1, dy) - \pi_{n,n+1}(X_2, dy)]\Big| \\
&= \sup_{X_1, X_2 \in \mathbb{R}^d,\ f \in U} |\epsilon\,[f(X_1) - f(X_2)]| \le \epsilon.
\end{aligned}$$
Then $\alpha(\pi_{n,n+1}) = 1 - \delta(\pi_{n,n+1}) \ge 1 - \epsilon > 0$. This example was given by P. Del Moral and L. Miclo in [7].
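Corollary 2 together with Example 2.1 can be probed by simulation. The sketch below is illustrative only (the choices mu = N(0,1), f(x) = sin x and epsilon = 0.3 are mine, not the authors'); it simulates the kernel of Example 2.1 and compares the normalized partial sums of (2.18) against the moments of N(0,1):

    import numpy as np

    rng = np.random.default_rng(2)
    eps, n, reps = 0.3, 400, 2000
    f = np.sin                      # bounded, non-degenerate test function

    def sample_S(reps):
        # Kernel of Example 2.1: stay at x w.p. eps, else redraw from mu = N(0,1).
        x = rng.normal(size=reps)
        S = np.zeros(reps)
        for _ in range(n):
            stay = rng.random(reps) < eps
            x = np.where(stay, x, rng.normal(size=reps))
            S += f(x)
        return S

    S = sample_S(reps)
    Z = (S - S.mean()) / S.std()    # empirical version of (2.18)
    print(Z.mean(), Z.var(), np.mean(Z**3), np.mean(Z**4))  # approx 0, 1, 0, 3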
Example 2.2 Consider the 1-point motion of a Brownian flow, and suppose that there exists a probability measure $\nu$ on $\mathbb{R}^d$ such that
$$\pi_{n,n+1}(x, A) = P(\phi_{n,n+1}(x) \in A) \ge (1 - \epsilon)\,\nu(A), \quad 0 < \epsilon < 1,\ A \in \mathcal{B}(\mathbb{R}^d).$$
Then
$$\delta(\pi_{n,n+1}) = \frac{1}{2} \sup_{X_1, X_2 \in \mathbb{R}^d,\ \|f\|_{L^\infty} \le 1} |E(f(\phi_{n,n+1}(X_1)) - f(\phi_{n,n+1}(X_2)))| = \sup_{X_1, X_2 \in \mathbb{R}^d} \|\pi_{n,n+1}(X_1, \cdot) - \pi_{n,n+1}(X_2, \cdot)\|_{Var}.$$
By the equation
$$\alpha(\pi_{n,n+1}) = 1 - \delta(\pi_{n,n+1}) = \inf \sum_{i=1}^{m} \min(\pi_{n,n+1}(X_1, A_i),\, \pi_{n,n+1}(X_2, A_i)) \ge 1 - \epsilon,$$
where the infimum is taken over all $X_1, X_2 \in \mathbb{R}^d$ and all resolutions of $\mathbb{R}^d$ into pairs of nonintersecting subsets $\{A_i;\ 1 \le i \le m\}$, $m \ge 1$ (see [3], [7]).

Example 2.3 Let $d = 1$, and suppose the 1-point motion of a Brownian flow is
$$\phi_{n,n+1}(x) = \frac{\theta_n}{F(x)}, \quad n \ge 0, \qquad (2.20)$$
where $\{\theta_n, n \ge 0\}$ is a family of independent random variables with the same distribution Exp(1), the standard exponential distribution, and $F(x)$, $x \in \mathbb{R}$, is a Lipschitz function with $\|F\|_\infty \vee \|F\|_L < 1$, where
$$\|F\|_L \equiv \sup_{x, y \in \mathbb{R},\ x \ne y} \frac{|F(x) - F(y)|}{|x - y|}. \qquad (2.21)$$
Then
$$\pi_{n,n+1}(x, dy) = P(\phi_{n,n+1}(x) \in dy) = I_{\mathbb{R}^+}(y)\, \frac{1}{F(x)} \exp\{-y/F(x)\}\, dy.$$
Therefore,
$$\begin{aligned}
\delta(\pi_{n,n+1}) &= \frac{1}{2} \sup_{X_1, X_2 \in \mathbb{R},\ \|f\|_{L^\infty} \le 1} |E(f(\phi_{n,n+1}(X_1)) - f(\phi_{n,n+1}(X_2)))| \\
&= \frac{1}{2} \sup_{X_1, X_2 \in \mathbb{R},\ \|f\|_{L^\infty} \le 1} \Big|\int f(y)\,[\pi_{n,n+1}(X_1, dy) - \pi_{n,n+1}(X_2, dy)]\Big| \\
&= \sup_{X_1, X_2 \in \mathbb{R},\ f \in U} \Big|\int_{\mathbb{R}^+} f(y) \Big[\frac{1}{F(X_1)} \exp\Big\{-\frac{y}{F(X_1)}\Big\} - \frac{1}{F(X_2)} \exp\Big\{-\frac{y}{F(X_2)}\Big\}\Big] dy\Big| \\
&\le \|F\|_\infty \vee \|F\|_L < 1.
\end{aligned}$$
Then $\alpha(\pi_{n,n+1}) = 1 - \delta(\pi_{n,n+1}) > 0$.

Example 2.4 Let $d = 1$, and suppose the 1-point motion of a Brownian flow is
$$\phi_{n,n+1}(x) = \frac{F(x)}{4} + \theta_n, \quad n \ge 0, \qquad (2.22)$$
where $\{\theta_n, n \ge 0\}$ is a family of independent random variables with the same distribution $N(0,1)$, the standard normal distribution, and $F(x)$, $x \in \mathbb{R}$, is the distribution function of $N(0,1)$. Then
$$\begin{aligned}
\delta(\pi_{n,n+1}) &= \frac{1}{2} \sup_{X_1, X_2 \in \mathbb{R},\ \|f\|_{L^\infty} \le 1} |E(f(\phi_{n,n+1}(X_1)) - f(\phi_{n,n+1}(X_2)))| \\
&= \frac{1}{2} \sup_{X_1, X_2 \in \mathbb{R},\ \|f\|_{L^\infty} \le 1} \Big|\frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(y) \Big[\exp\Big\{-\frac{1}{2}\Big(y - \frac{F(X_1)}{4}\Big)^2\Big\} - \exp\Big\{-\frac{1}{2}\Big(y - \frac{F(X_2)}{4}\Big)^2\Big\}\Big] dy\Big| \\
&\le \Big|\frac{F(X_1) - F(X_2)}{4}\Big| < \frac{1}{4}.
\end{aligned}$$
Then $\alpha(\pi_{n,n+1}) = 1 - \delta(\pi_{n,n+1}) \ge \frac{3}{4} > 0$.
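The bound in Example 2.4 is just the total-variation distance between two unit-variance normals whose means differ by at most 1/4, and it is easy to confirm numerically; the following is an illustrative check, not from the paper:

    import numpy as np

    def tv_normals(m1, m2):
        # Total-variation distance between N(m1,1) and N(m2,1):
        # (1/2) * integral of |density difference|, by a Riemann sum.
        grid = np.linspace(-10.0, 10.0, 200001)
        dx = grid[1] - grid[0]
        p = np.exp(-0.5 * (grid - m1) ** 2) / np.sqrt(2 * np.pi)
        q = np.exp(-0.5 * (grid - m2) ** 2) / np.sqrt(2 * np.pi)
        return 0.5 * np.abs(p - q).sum() * dx

    # Means F(X1)/4, F(X2)/4 lie in (0, 1/4) since F is the N(0,1) cdf;
    # the worst case is a mean gap approaching 1/4.
    print(tv_normals(0.0, 0.25))   # ~0.0995, comfortably below 1/4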
Remark 2.2 1. On the central limit theorem for a general Markov chain, especially in the homogeneous case, there are many works; see, for example, [4], [5], [8], [10], [14]. In those works the powerful tools are martingale approximations and the martingale central limit theorem (for references on martingale limit theorems, see [6]). There are relatively fewer works on the non-homogeneous case, among which Dobrushin's works [3] are very important. In [12], S. Sethuraman and S.R.S. Varadhan gave a different and short proof of Dobrushin's theorem; the methods they use are also martingale approximations and the martingale central limit theorem. The main point is how to construct a martingale from the Markov chain. They gave a kind of "Poisson-resolvent" sequence which is often used in martingale approximations; certainly, what they gave differs from the general case [13].

2. In [17], we considered the invariant measure of the N-point motion of a homogeneous Brownian flow; we proved that there exists a unique invariant measure of the N-point motion under certain conditions. The existence and uniqueness of the invariant measure of a Markov chain is a strong condition when one considers its central limit theorem. One may then use the central limit theorem of a homogeneous Markov chain to derive a similar CLT in our special case; it should be better than the result given in Corollary 2. We do not discuss this further here.

References

[1] Basak, G.K., Kannan, D., On the Singular N-point Motion of a Brownian Flow: Asymptotic Flatness and Invariant Measure, Stochastic Analysis and Applications, 11(4), 369-397, 1993.

[2] Basak, G.K., Kannan, D., On the Singular N-point Motion of a Brownian Flow: Asymptotic Flatness and Invariant Measure, Random Oper. and Stoch. Equ., Vol. 4, No. 2, 163-178, 1996.

[3] Dobrushin, R., Central Limit Theorems for Non-Stationary Markov Chains I, II, Theory of Probab. and its Appl. 1, 65-80, 329-383, 1956.

[4] Gordin, M.I., The Central Limit Theorem for Stationary Processes, Soviet Math. Dokl. 10, 1174-1176, 1969.

[5] Gudynas, P., An Invariance Principle for Inhomogeneous Markov Chains, Lithuanian Math. J. 17, 184-192, 1977.

[6] Hall, P., Heyde, C.C., Martingale Limit Theory and its Applications, Academic Press, New York, 1980.

[7] Del Moral, P., Miclo, L., Self-Interacting Markov Chains, Stochastic Analysis and Applications, Vol. 24, No. 3, 615-660, 2006.

[8] Kipnis, C., Varadhan, S.R.S., Central Limit Theorem for Additive Functionals of Reversible Markov Processes, Commun. Math. Phys. 104, 1-19, 1986.

[9] Kunita, H., Stochastic Flows and Stochastic Differential Equations, Cambridge: Cambridge University Press, 1990.

[10] Pinsky, M., Lectures on Random Evolution, World Scientific, Singapore, 1991.

[11] Seneta, E., Non-negative Matrices and Markov Chains, Second Edition, Springer-Verlag, New York, 1981.

[12] Sethuraman, S., Varadhan, S.R.S., A Martingale Proof of Dobrushin's Theorem for Non-Homogeneous Markov Chains, Electronic Journal of Probability, Vol. 10, 1221-1235, 2005.

[13] Varadhan, S.R.S., Probability Theory, Courant Lecture Notes 7, American Mathematical Society, Providence, R.I., 2001.

[14] Wu, Wei Biao, Woodroofe, M., Martingale Approximation for Sums of Stationary Processes, Ann. Probab. 32, 1674-1690, 2004.

[15] Zhang, B., Kannan, D., Discrete-time Martingales with Spatial Parameters, Stochastic Analysis and Applications, Vol. 20, No. 5, 1101-1131, 2002.

[16] Zhang, B., Zhang, J., Kannan, D., Nonlinear Stochastic Difference Equations Driven by Martingales, Stochastic Analysis and Applications, Vol. 23, 1277-1304, 2005.

[17] Zhang, J., On the Discrete Time Brownian Flow I: Characteristics and Invariant Measure of the N-Point Motion, Dynamics of Continuous, Discrete and Impulsive Systems, Series B: Theory and Applications, Special Volume, 2007.
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 742--745
Copyright@2007 Watam Press

Doob’s Martingale Inequality in G-Framework


Jing Xu1 Bo Zhang2
School of Statistics, Renmin University of China, Beijing, 100872

Abstract: In this paper, we give Doob's martingale inequality in the G-framework established by Peng [21] in 2006. This result covers Doob's martingale inequality in probability space.

1 Introduction

In 1933 Kolmogorov published his foundation of probability. From then on, probability theory has been widely used in many fields, especially in mathematical finance. But the linearity of the probability measure and of the related expectation restricts the application of probability theory in finance; see the Allais paradox and the Ellsberg paradox. From this point of view, it is very important to develop a nonlinear expectation. In 1997, Peng introduced g-expectations and related g-conditional expectations via backward stochastic differential equations (BSDEs in short) in [16], which can describe dynamic financial models. The properties of g-expectation have been studied in many papers; see [4], [5], [17], [18] and [19].

In 2006, Peng constructed the G-normal distribution via a reformed heat equation, see [21]. With this G-normal distribution, a nonlinear expectation, called G-expectation, is given, and the related conditional expectation is constructed, which is a kind of dynamic coherent risk measure introduced by Delbaen in [10]. Under this framework, the canonical process is a G-Brownian motion. The stochastic calculus of Itô's type with respect to the G-Brownian motion and the related Itô formula are also derived. Differently from the Brownian motion of the classical case, the G-Brownian motion is not based on a given probability space.

Doob's martingale inequality is one of the elementary inequalities of classical martingale theory, and it also plays an important role in stochastic analysis and many other fields. In our paper, we give Doob's martingale inequality in the G-framework, which covers Doob's martingale inequality in probability space, see [22].

We shall begin by setting up the notation and introducing the notions that are basic in this work. Let $(\Omega, \mathcal{F}, \mathcal{F}_T, E_G)$ be the G-framework; the space of random variables $L^p_G(\mathcal{F}_T)$ and the space of stochastic processes $M^p_G(0, T)$, $p \ge 1$, have all been well defined in [21]. We set
$$S^p = \{M \mid M : \mathbb{R}^+ \times \Omega \to \mathbb{R},\ M(t, \omega) \in L^p_G(\mathcal{F}_t),\ \forall T > 0,\ \{M_t\}_{t \in [0,T]} \in M^p_G(0, T)\}.$$

Definition 1.1 $M \in S^p$ is an $L^p_G$ martingale, $p \ge 1$, if for any $0 \le s \le t < \infty$ it satisfies $E[M_t | \mathcal{F}_s] = M_s$; furthermore, if $M$ is symmetric, which means $E[-M_t | \mathcal{F}_s] = -E[M_t | \mathcal{F}_s]$, we call $M$ a symmetric martingale.

2 Main Results

To investigate Doob's inequality, we first extend the definition of G-expectation to any measurable function on $\Omega$. By the theory of stochastic control as in [23], we know that, for any fixed $T > 0$,
$$E_G[X] = \sup_{v. \in \Lambda'} E\Big[\varphi\Big(\int_{t_1}^{t_2} v_s\, dB_s, \cdots, \int_{t_{m-1}}^{t_m} v_s\, dB_s\Big)\Big] = \sup_{P_v \in \Lambda} E_{P_v}[X],$$
where $X = \varphi(\omega_{t_1}, \cdots, \omega_{t_m} - \omega_{t_{m-1}}) \in L^0_{ip}(\mathcal{F}_T)$, $E$ is the linear expectation under the Wiener measure (and this is the G-expectation when $\sigma_0 = 1$), $\{B_t\}_{t \ge 0}$ is the Brownian motion under the Wiener measure, $\int_0^\cdot v\, dB(\cdot): C[0,T] \to C[0,T]$, and
$$\Lambda' = \{v : v \text{ is a progressively measurable, square integrable stochastic process such that } \sigma_0^2 \le v^2(t) \le 1 \text{ a.s. with respect to the Wiener measure},\ 0 \le t \le T\},$$
$$\Lambda = \Big\{P_v : P_v \text{ is the distribution of } \int_0^\cdot v_s\, dB_s,\ v. \in \Lambda'\Big\}.$$

Remark 2.1 For any $X \in L^1_G(\mathcal{F}_T)$ there is a sequence $f_n \in L^0_{ip}(\mathcal{F}_T)$ such that $f_n$ converges to $X$ in $L^1_G(\mathcal{F}_T)$; then for any $P_v \in \Lambda$, $f_n$ converges to $X$ in $L^1(\Omega, P_v)$, and this convergence is uniform with respect to $P_v$. We have $E_G[X] = \sup_{P_v \in \Lambda} E_{P_v}[X]$ (see Proposition 2.2 in [15]).

1 Research of this author was supported in part by the Supporting Program for Excellent Graduate Students in Renmin University of China and in part by the NNSF of China (10601066).
2 Corresponding author, mabzhang@ruc.edu.cn (B. Zhang). Research of this author was supported in part by the Program for NCET and in part by the NNSF of China (No. 60574077).
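In the simplest case, the supremum over Lambda can be probed numerically by restricting to constant volatilities v = sigma in [sigma_0, 1], for which the stochastic integral at time T is N(0, sigma^2 T). A minimal sketch of E_G[phi(B_T)] under this restriction (my illustration; constant controls need not attain the sup for general phi, though they do for convex or concave phi):

    import numpy as np

    rng = np.random.default_rng(3)

    def EG_constant_controls(phi, T=1.0, sigma0=0.5, n_sigma=21, n_mc=200_000):
        # sup over constant v = sigma in [sigma0, 1] of E[phi(sigma * B_T)]
        z = rng.normal(size=n_mc) * np.sqrt(T)
        best = -np.inf
        for s in np.linspace(sigma0, 1.0, n_sigma):
            best = max(best, np.mean(phi(s * z)))
        return best

    # For convex phi(x) = x^2 the sup is attained at sigma = 1: E_G = T.
    print(EG_constant_controls(lambda x: x ** 2))        # ~1.0
    # For concave phi(x) = -x^2 it is attained at sigma0: E_G = -sigma0^2 * T.
    print(EG_constant_controls(lambda x: -(x ** 2)))     # ~-0.25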


Next we will prove the tightness of $\Lambda$.

Lemma 2.1 $\Lambda$ is tight; that is, for any $\varepsilon > 0$ there exists a compact set $K \subset C([0,T]) \subset \Omega$ such that for any $P_v \in \Lambda$, $P_v(K^c) < \varepsilon$, where $K^c$ is the complement of $K$.

Proof: For any function $x(t)$, $t \in [0, T]$, define
$$\omega_x(\delta) = \sup_{|s - t| \le \delta} |x_t - x_s|.$$
By the Arzelà-Ascoli and Prokhorov theorems (see Theorem 4.4.11 in [7]), to prove the tightness of $\Lambda$ we only need to prove that for any $\alpha > 0$ and $\varepsilon > 0$ there exists $\delta > 0$ such that for any $P_v \in \Lambda$ we have $P_v(\{x : \omega_x(\delta) \ge \alpha\}) < \varepsilon$. For $\alpha > 0$ and $\varepsilon > 0$, choose $\delta = \dfrac{\varepsilon\alpha^2}{2} > 0$; then
$$P_v(\{x : \omega_x(\delta) \ge \alpha\}) \le \frac{E_{P_v}[\omega_x^2(\delta)]}{\alpha^2} = \frac{E_{P_v}\big[\sup_{|s-t|<\delta} |\int_s^t v\, dW|^2\big]}{\alpha^2} \le 2\, \frac{E_{P_v}\big[\int_0^\delta v_u^2\, du\big]}{\alpha^2} \le \frac{2\delta}{\alpha^2} = \varepsilon.$$
Hence $\Lambda$ is tight.

Remark 2.2 Since $\Omega$ is a Polish space, by the Prokhorov theorem we know that $\Lambda$ is weakly compact. For $X_n \in L^0_{ip}(\mathcal{F}_T)$ with $X_n \downarrow 0$ pointwise, as in the appendix of [15], by the Dini lemma, $E_G[X_n] \downarrow 0$.

As in [8] and [15], we consider the Lebesgue extension of $E_G[|\cdot|]$:

For a bounded continuous function $f \in C_b(\Omega)$,
$$E_G[|f|] = \sup\{E_G[g] : g \in L^0_{ip}(\mathcal{F}_T),\ 0 \le g \le |f|\}.$$
For an open set $A$,
$$E_G[I_A] = \sup\{E_G[f] : f \in L^0_{ip}(\mathcal{F}_T),\ 0 \le f \le I_A\}.$$
For a closed set $B$,
$$E_G[I_B] = \inf\{E_G[g] : g \in L^0_{ip}(\mathcal{F}_T),\ g \ge I_B\}.$$
Obviously, for any open set $A$, $E_G[I_A] = \sup_{P_v \in \Lambda} E_{P_v}[I_A]$.

Remark 2.3 For any bounded continuous $f \in C_b(\Omega)$ with $|f| \le M$, $M > 0$, there exists a sequence of random variables $f_n \in L^0_{ip}(\mathcal{F}_T)$ such that $f_n$ monotonically converges to $f$. The extension above still satisfies the sub-additivity property.

Remark 2.4 Since $\Lambda$ is tight, for any $\varepsilon > 0$ there exists a compact set $K$ such that $\sup_{P_v \in \Lambda} E_{P_v}[I_{K^c}] < \varepsilon$, where $K^c$ is the complement of $K$. In a metric space a compact set is closed, so $K^c$ is an open set and $E_G[I_{K^c}] = \sup_{P_v \in \Lambda} E_{P_v}[I_{K^c}] < \varepsilon$. By Dini's theorem, on any compact set $K$, $f_n$ converges to $f$ uniformly. Hence
$$\lim_{n\to\infty} E_G[|f_n - f|] \le \lim_{n\to\infty} \big[E_G[|f_n - f| I_K] + E_G[|f_n - f| I_{K^c}]\big] \le M\varepsilon,$$
so $\lim_{n\to\infty} E_G[|f_n - f|] = 0$ and $f \in L^1_G(\mathcal{F}_T)$.

By Remark 2.2 and Remark 2.4, for any Borel function $f$ defined on $\Omega$ we can define $E_G[|f|]$. For lower semi-continuous $f \ge 0$,
$$E_G[f] = \sup\{E_G[g] : g \in L^0_{ip}(\mathcal{F}_T),\ 0 \le g \le f\}.$$
For arbitrary $f : \Omega \to \mathbb{R}$, $f > 0$,
$$E_G[f] = \inf\{E_G[g] : g \text{ is lower semi-continuous},\ g \ge f\}.$$
Set $P_G(A) = E_G[I_A]$.

Proposition 2.1 Suppose $A_n$ is a sequence of open sets satisfying $A_n \downarrow \emptyset$, where $\emptyset$ stands for the empty set; then $P_G(A_n) \downarrow 0$.

Proof: Since $A_n$ is an open set, we can find an increasing sequence $f_{m_n} \in L^0_{ip}(\mathcal{F}_T)$ such that $f_{m_n}$ converges to $I_{A_n}$ pointwise for each $n$. For any $\varepsilon > 0$ there exists $m_{n,\varepsilon} > 0$ such that
$$f_{m_{n,\varepsilon}} < I_{A_n}, \qquad P_G(A_n) < E_G[f_{m_{n,\varepsilon}}] + \varepsilon;$$
then $f_{m_{n,\varepsilon}} \downarrow 0$, so we have
$$\lim_{n\to\infty} P_G(A_n) < \lim_{n\to\infty} E_G[f_{m_{n,\varepsilon}}] + \varepsilon = \varepsilon.$$
That is, $P_G(A_n) \downarrow 0$.

Lemma 2.2 For any Borel set $A \in \mathcal{F}_T$, $I_A \in L^1_G(\mathcal{F}_T)$.

Proof: If $A$ is an open set, we can find an increasing sequence $f_n \in L^0_{ip}(\mathcal{F}_T)$ such that $f_n$ converges to $I_A$ pointwise. Let
$$E_{k,n} = \bigcup_{m=n}^{\infty} \Big\{\omega : I_A - f_m > \frac{1}{k}\Big\};$$
then $E_{k,n}$ is an open set, and $\bigcap_{n=1}^{\infty} E_{k,n} = \emptyset$, where $\emptyset$ stands for the empty set. Obviously, for any $k$, $E_k = \bigcap_{n=1}^{\infty} E_{k,n} = \emptyset$. Since each $E_{k,n}$ is an open set, $E_{k,j} = \bigcap_{n=1}^{j} E_{k,n}$ is also an open set, and by Proposition 2.1 we have
$$P_G[E_{k,n}] \downarrow 0, \quad n \to \infty.$$
Then for any $\varepsilon > 0$ there exists $n_k^\varepsilon > 0$ such that $P_G[E_{k,n_k^\varepsilon}] < \frac{\varepsilon}{2^k}$; let $E_\varepsilon = \bigcup_{k=1}^{\infty} E_{k,n_k^\varepsilon}$; then $P_G(E_\varepsilon) < \varepsilon$. By the classical argument we get that $f_n \to I_A$ uniformly on $E_\varepsilon^c$, and then
$$\lim_{n\to\infty} E_G[|f_n - I_A|] \le \lim_{n\to\infty} \big[E_G[|f_n - I_A| I_{E_\varepsilon^c}] + E_G[|f_n - I_A| I_{E_\varepsilon}]\big] \le \varepsilon,$$
so $I_A \in L^1_G(\mathcal{F}_T)$. We then get that for any closed set $A$, $I_A \in L^1_G(\mathcal{F}_T)$, and by a similar argument, for any Borel set $A \in \mathcal{F}_T$, $I_A \in L^1_G(\mathcal{F}_T)$.
Based on this lemma, we know that $P_G$ is a regular Choquet capacity, that is:
(1) For any Borel set $A$, $0 \le P_G(A) \le 1$;
(2) If $A \subset B$, then $P_G(A) \le P_G(B)$;
(3) If $A_n$ is a sequence of Borel sets, then $P_G(\bigcup_n A_n) \le \sum_n P_G(A_n)$;
(4) If $A_n$ is an increasing sequence of Borel sets, then $P_G(\bigcup_n A_n) = \lim_n P_G(A_n)$.

We use the standard capacity-related vocabulary: a set $A$ is polar if $P_G(A) = 0$, and a property holds "quasi-surely" (q.s.) if it holds outside a polar set.

Proposition 2.2 Suppose that $X_n, X \in L^p_G$ and $E_G[|X_n - X|^p] \to 0$; then there exists a subsequence $X_{n_k}$ of $X_n$ such that $X_{n_k} \to X$, q.s.

Proof: We know that for any $\varepsilon > 0$,
$$\lim_{n\to\infty} P_G\{|X_n - X| > \varepsilon\} = 0.$$
Then for every integer $k$ there exists $n_k > 0$ such that for any $n \ge n_k$ we have
$$P_G\Big\{|X_n - X| \ge \frac{1}{2^k}\Big\} < \frac{1}{2^k}.$$
We define $X_k = X_{n_k}$, a subsequence of $X_n$; then
$$\begin{aligned}
P_G\{X_k \nrightarrow X\} &= P_G\Big(\bigcup_m \bigcap_k \bigcup_v \{|X_{k+v} - X| \ge \varepsilon_m\}\Big) \\
&\le \sum_m P_G\Big(\bigcap_k \bigcup_v \{|X_{k+v} - X| \ge \varepsilon_m\}\Big) \\
&\le \sum_m P_G\Big(\bigcup_v \{|X_{k_0+m+v} - X| \ge \varepsilon_m\}\Big) \\
&\le \sum_m \sum_v P_G\Big\{|X_{k_0+m+v} - X| \ge \frac{1}{2^{k_0+m+v}}\Big\} \\
&\le \sum_m \sum_v \frac{1}{2^{k_0+m+v}} = \frac{1}{2^{k_0}} \to 0.
\end{aligned}$$
Thus we obtain $P_G\{X_k \nrightarrow X\} = 0$.

In the following we give the main theorem.

Theorem 2.1 Let $M$ be an $L^p_G$ symmetric martingale which is continuous quasi-surely; then
$$P_G\Big[\sup_{0\le t\le T} |M_t| \ge \lambda\Big] \le \frac{1}{\lambda^p}\, E_G[|M_T|^p], \quad \forall p \ge 1,\ T \ge 0.$$
In particular,
$$E_G\Big[\sup_{0\le t\le T} |M_t|^2\Big] \le 2\, E_G[|M_T|^2].$$

Proof: Here we only give the proof of the first inequality. By Lemma 2.2, for any $\xi \in L^p_G(\mathcal{F}_T)$ and $A \in \mathcal{F}_T$, $E_G[\xi I_A]$ is well defined. For any $s \le t$ and $A \in \mathcal{F}_s$,
$$E_G[I_A M_t - I_A M_s] = E_G[E_G[I_A M_t - I_A M_s | \mathcal{F}_s]] = E_G[E_G[I_A M_t | \mathcal{F}_s] - I_A M_s] = E_G[I_A E_G[M_t | \mathcal{F}_s] - I_A M_s] = 0.$$
Similarly we can get that
$$E_G[I_A M_s - I_A M_t] = 0.$$
Then for any $P_v$ we have $E_{P_v}[I_A M_t - I_A M_s] = 0$, so $M$ is an $L^p$ martingale under $P_v$. By Doob's martingale inequality in probability space we have
$$P_G\Big[\sup_{0\le t\le T} |M_t| \ge \lambda\Big] = \sup_{P_v \in \Lambda} P_v\Big[\sup_{0\le t\le T} |M_t| \ge \lambda\Big] \le \sup_{P_v \in \Lambda} \frac{1}{\lambda^p} E_{P_v}[|M_T|^p] = \frac{1}{\lambda^p}\, E_G[|M_T|^p].$$

As an application, we give a corollary of the theorem.

Corollary 1 $I(t) = \int_0^t \eta\, dB_s$ is continuous q.s., where $\eta \in M^2_G(0, T)$ and $B$ is the canonical process.

Proof: $\int_0^t \eta\, dB_s$ is a symmetric $L^2_G$ martingale, see [21]. If $\eta \in M^{2,0}_G(0, T)$, then $\int_0^t \eta\, dB_s$ is continuous quasi-surely; otherwise there exist $\eta_n \in M^{2,0}_G(0, T)$ such that $E_G[|I_n(t) - I(t)|^2] \to 0$, where $I_n(t) = \int_0^t \eta_n\, dB_s$. From Theorem 2.1 we know
$$E_G\Big[\sup_{0\le t\le T} |I_n(t) - I_m(t)|^2\Big] \le E_G\Big[\int_0^T |\eta_n - \eta_m|^2\, dt\Big] \to 0;$$
then by Proposition 2.2 we know $\sup_{0\le t\le T} |I_n(t) - I_m(t)|^2$ converges q.s., so $I(t)$ is continuous quasi-surely.
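Since the theorem reduces P_G to a supremum of classical probabilities over the scenarios P_v, its content can be sanity-checked with a classical simulation. The sketch below is illustrative (a small family of constant-volatility scenarios stands in for Lambda); it estimates both sides of the first inequality for p = 2:

    import numpy as np

    rng = np.random.default_rng(4)
    T, nsteps, npaths, lam = 1.0, 250, 10_000, 2.0
    dt = T / nsteps

    lhs, rhs = 0.0, 0.0
    for sigma in (0.5, 0.75, 1.0):          # stand-ins for P_v with v = sigma
        dM = sigma * rng.normal(scale=np.sqrt(dt), size=(npaths, nsteps))
        M = np.cumsum(dM, axis=1)           # martingale M_t = sigma * B_t
        lhs = max(lhs, np.mean(np.abs(M).max(axis=1) >= lam))
        rhs = max(rhs, np.mean(M[:, -1] ** 2) / lam ** 2)

    # Doob: sup_v P_v[sup_t |M_t| >= lam] <= sup_v E_v[|M_T|^2] / lam^2
    print(lhs, rhs, lhs <= rhs)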
References

[1] Artzner, P., Delbaen, F., Eber, J.M., Heath, D., Coherent Measures of Risk, Mathematical Finance, 9 (1999), 203-228.

[2] Coquet, F., Hu, Y., Memin, J. and Peng, S., Filtration-consistent nonlinear expectations and related g-expectations, Probab. Theory Relat. Fields, 123 (2002), 1-27.

[3] Billingsley, P., Convergence of Probability Measures, John Wiley & Sons, New York, 1968.

[4] Chen, Z. and Peng, S., Continuous properties of g-martingales, Chinese Ann. Math. Ser. B, 22:1 (2001), 115-128.

[5] Chen, Z., Chen, T. and Davison, M., Choquet expectation and Peng's g-expectation, Annals of Probability, Vol. 33, No. 3 (2005), 1179-1199.

[6] Chen, Z. and Epstein, L., Ambiguity, risk and asset returns in continuous time, Econometrica, 70:4 (2002), 1403-1443.

[7] Cheng, S.H., Modern Probability, Peking University Press, Beijing, 2000.

[8] Choquet, G., Theory of Capacities, Ann. Inst. Fourier, 5 (1955), 131-295.

[9] Danielsson, J., de Vries, C.G., Value at risk and extreme returns, London School of Economics, Financial Markets Group, Discussion paper no. 273, 1998.

[10] Delbaen, F., Coherent risk measures on general probability spaces, Advances in Finance and Stochastics, Springer-Verlag, 2002, 1-37.

[11] El Karoui, N., Peng, S., Quenez, M.C., Backward stochastic differential equations in finance, Mathematical Finance, 7 (1997), 1-71.

[12] Föllmer, H., Schied, A., Convex measures of risk and trading constraints, Finance and Stochastics, 6(4) (2002), 429-447.

[13] Föllmer, H., Schied, A., Robust preferences and convex measures of risk, Advances in Finance and Stochastics, Springer-Verlag, 2002, 39-56.

[14] Krylov, N.V., Controlled Diffusion Processes, Springer-Verlag, 1980, 173-192.

[15] Denis, L., Martini, C., A Theoretical Framework for the Pricing of Contingent Claims in the Presence of Model Uncertainty, Annals of Applied Probability, 16:2 (2006), 827-852.

[16] Peng, S.G., Backward SDE and related g-expectations, in: El Karoui, N., Mazliak, L. (Eds.), Backward Stochastic Differential Equations, Pitman Res. Notes Math. Ser., Vol. 364, Longman, Harlow, 1997, 141-159.

[17] Peng, S.G., Nonlinear expectations, nonlinear evaluations and risk measures, in: Frittelli, M., Runggaldier, W. (Eds.), Stochastic Methods in Finance, Lecture Notes in Mathematics, Springer, 2004, 165-254.

[18] Peng, S.G., Monotonic limit theorem of BSDE and nonlinear decomposition theorem of Doob-Meyer's type, Probability Theory and Related Fields, 113:4 (1999), 473-499.

[19] Peng, S.G., Filtration consistent nonlinear expectations and evaluations of contingent claims, Acta Mathematicae Applicatae Sinica, English Series, 20:2 (2004), 1-24.

[20] Peng, S.G., Nonlinear expectations and nonlinear Markov chains, Chin. Ann. Math., 26B:2 (2005), 159-184.

[21] Peng, S.G., G-expectation, G-Brownian motion and related calculus of Itô's type, http://abelsymposium.no/symp2005/preprints/peng.pdf, 2006.

[22] Stroock, D.W., Varadhan, S.R.S., Multidimensional Diffusion Processes, Springer-Verlag, 1979.

[23] Yan, J.A., Peng, S.G. et al., Topics on Stochastic Analysis, Science Press, Beijing, 1997.

[24] Yosida, K., Functional Analysis, Springer-Verlag, 1999.

[25] Zhang, B., Zhang, J.X. and Kannan, D., Nonlinear Stochastic Difference Equations Driven by Martingales, Stochastic Analysis and Applications, Vol. 23(6), 1277-1304, 2005.
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 746--750
Copyright@2007 Watam Press

Fuzzy Genetic Algorithm Based on Principal Indexes Operation


FACHAO LI, PANXING YUE, CHENXIA JIN
School of Economy and Management, Hebei University of Science and Technology,
Shijiazhuang, Hebei, 050018, P. R. China
E-mail: lifachao@tsinghua.org.cn

Abstract: In this paper, by distinguishing principal indexes and assistant indexes, we give a comparison method for fuzzy information based on synthesizing effect and a description method for fuzzy information on principal indexes. Furthermore, combining the transform strategy of penalizing constrained problems, a new fuzzy genetic algorithm based on principal indexes operation is proposed (denoted BPIO-FGA for short). Finally, we consider its convergence using Markov chain theory and analyze its performance through an example. The results indicate that BPIO-FGA possesses interesting advantages such as faster convergence, fewer iterations and less chance of trapping into premature states, so it can be applied to many fuzzy optimization problems.

1. Introduction

The theory of fuzzy numbers is very popular for describing uncertain phenomena in many actual problems; its trace can be found in many domains such as fuzzy control, fuzzy optimization and fuzzy time series. For fuzzy optimization, good results both in theory and application mainly focus on fuzzy linear problems [1-3]; the basic methods were mostly obtained by transforming a fuzzy optimization problem into a classical one according to the structural properties of fuzzy numbers. With the development of computer science, evolutionary computation methods have entered the field of vision of scholars interested in fuzzy optimization problems. For instance, genetic algorithms were used to process optimization problems with fuzzy coefficients but real variables in [4] and [5], and evolutionary computation was used for fuzzy linear optimization problems with fuzzy variables and fuzzy coefficients in [6], the essence of which is transforming a fuzzy linear optimization problem into a multi-objective optimization problem. Up to now, there is no effective and common method for general fuzzy optimization problems. In this paper, for general optimization problems with fuzzy coefficients, fuzzy variables and fuzzy constraints, we have the following findings: 1) based on distinguishing principal indexes and assistant indexes, we give a comparison method for fuzzy information on synthesizing effect and a description method for fuzzy information on principal indexes; 2) we establish a broad and operable fuzzy optimization model and, combining the transform strategy of penalizing constrained problems, propose a new fuzzy genetic algorithm based on principal indexes operation (denoted BPIO-FGA for short); 3) we give the concrete implementation steps and the crossover and mutation strategies; 4) we consider its global convergence under the elitist reserved strategy using Markov chain theory; 5) we further analyze the performance of BPIO-FGA by an example.

2. Preliminaries

2.1. Fuzzy numbers

Definition 1 [7] Let A be a fuzzy set on the real number field R, and $A_\lambda = \{x \mid A(x) \ge \lambda\}$ be the $\lambda$-cuts of A. If $A_1 \ne \emptyset$, $A_\lambda$ is a bounded closed interval for each $\lambda \in (0, 1]$ and supp A = $\{x \mid A(x) > 0\}$ is bounded, then A is called a fuzzy number. The class of all fuzzy numbers is called the fuzzy number space, written $E^1$.

For a given $A \in E^1$, if there exist real numbers a, b and c satisfying $A(x) = (x - a)/(b - a)$ for all $a \le x < b$, $A(b) = 1$, $A(x) = (c - x)/(c - b)$ for all $b < x \le c$, and $A(x) = 0$ for all $x < a$ or $x > c$, then A is called a triangular fuzzy number, written $A = (a, b, c)$.

Obviously, a real number a can be viewed as a special fuzzy number with membership function defined by $a(a) = 1$ and $a(x) = 0$ for all $x \ne a$, which tells us that fuzzy numbers are just an extension of real numbers.

The operations of fuzzy numbers are the foundation of fuzzy optimization problems. For the operations of fuzzy numbers we have the following results.

Let $A, B \in E^1$, $k \in R$, $f(x, y)$ be a continuous binary function, and $A_\lambda, B_\lambda$ be the $\lambda$-cuts of A and B, respectively. Then $f(A, B) \in E^1$, and $(f(A, B))_\lambda = f(A_\lambda, B_\lambda)$ for each $\lambda \in (0, 1]$. In particular, for the triangular fuzzy numbers $A = (a_1, b_1, c_1)$ and $B = (a_2, b_2, c_2)$, we have
$$A + B = (a_1 + a_2,\ b_1 + b_2,\ c_1 + c_2), \qquad A - B = (a_1 - c_2,\ b_1 - b_2,\ c_1 - a_2);$$
$$kA = (ka_1, kb_1, kc_1) \text{ for } k \ge 0, \qquad kA = (kc_1, kb_1, ka_1) \text{ for } k < 0.$$
Fuzzy numbers have many good analytical properties; see ref. [8] for the concrete content.

2.2. Compound Quantification Description of Fuzzy Information

Ranking fuzzy numbers, a main component of fuzzy number theory, is the key to fuzzy optimization problems. In this paper, we use the compound quantification strategy of fuzzy information proposed in reference [9].

Definition 2 [9] For fuzzy information A, if the real number a is the centralized quantification value under a certain consciousness, and $a_1, a_2, \ldots, a_s$ denote the assistant quantity indexes describing the connection between a and A from different sides, then $(a; a_1, a_2, \ldots, a_s)$ is said to be a compound quantification value of A.

Generally speaking, we can define the centralized quantification value of A as
$$I(A) = \frac{1}{\bar{L}} \int_0^1 L(\lambda)\, M(A_\lambda)\, d\lambda. \qquad (2.1)$$
Here, $M(A_\lambda)$ is the midpoint of $A_\lambda$, and $L(\lambda)$ is a piecewise continuous, monotone non-decreasing function from $[0,1]$ to $[0,\infty)$, called the level effect function, which is used to describe the reliability of level $\lambda$ in the process of decision making; $\bar{L} = \int_0^1 L(\lambda)\, d\lambda$. In particular, if $L \equiv 0$, then $I(A)$ is defined as the midpoint of $A_1$. And we take the following index
$$CD(A) = \int_0^1 L(\lambda)\, m(A_\lambda)\, d\lambda \qquad (2.2)$$
as the assistant quantity index describing the confidence degree of $I(A)$ as the centralized quantification value of A; $CD(A)$ is called the concentration degree of A. Here, m is the Lebesgue measure. So $(I(A); CD(A))$ is a kind of compound quantification value of A in the level sense.

For the sake of specificity, in the following discussion we use $(I(A); CD(A))$ as the compound quantification form of fuzzy information A. In the process of decision making, the assistant indexes play the role of supplement and constraint for the principal index; we can obtain specific quantitative values by acting the assistant index on the principal index through a certain method (called a synthesizing effect operator), through which the size comparison of fuzzy numbers can be realized from a global view. For maximum optimization problems, we select
$$S(I(A), CD(A)) = \frac{I(A)}{[1 + \beta \cdot CD(A)]^\alpha} \qquad (2.3)$$
as the synthesizing effect operator. Here, $\alpha, \beta \in (0, \infty)$ both represent some kind of decision consciousness.

3. Formalized Description of Fuzzy Optimization Problems

In classical optimization, the objective function and constraints are determined. In practice, however, not only the objective function but also the constraint conditions often carry uncertainty in different forms. In this paper we consider the following optimization problems:
$$\max f(x), \quad \text{s.t. } c_i(x)\ \tilde{\le}\ b_i,\ i = 1, 2, \cdots, m. \qquad (3.1)$$
Here, f and the $c_i$ are n-dimensional fuzzy value functions, $\tilde{\le}$ denotes the inequality relationship in the fuzzy sense, $x = (x_1, x_2, \cdots, x_n)$, each $x_i$ is a fuzzy variable, and each $b_i$ is a given fuzzy number.

Because fuzzy numbers do not have comparability like real numbers, (3.1) is just a formal model and cannot be solved easily. According to the above compound quantification strategy, it can be converted into the following model (3.2) by a synthesizing effect operator:
$$\max E(f(x)), \quad \text{s.t. } E(c_i(x)) \le E(b_i),\ i = 1, 2, \cdots, m. \qquad (3.2)$$
Here, $E(f(x))$ and $E(c_i(x))$ denote the synthesizing effect values of the fuzzy value functions $f(x)$ and $c_i(x)$, respectively.

Obviously, (3.2) has the feature of an optimization operation, but it is not a conventional optimization problem and cannot be solved by existing methods; the bottleneck is that it is hard to describe the changing way of fuzzy information in detail.

Considering that triangular fuzzy numbers are often used to describe fuzzy information in practical problems, we assume in this article that the optimized variables and coefficients are all triangular fuzzy numbers. Using the structural features of fuzzy numbers and the density of step-type fuzzy numbers and quasi-linear fuzzy numbers in fuzzy number space (see ref. [7]), we can establish a solution method for general fuzzy optimization problems. Owing to the intrinsic difference between fuzzy numbers and real numbers in operations (for example, addition and subtraction are not inverse operations), (3.2) still cannot be solved by analytical methods even though triangular fuzzy numbers are strong in description. For this, we can establish a concrete solution method (denoted BPIO-FGA) by combining a genetic algorithm with the compound quantification strategy of fuzzy information.

4. Structure of BPIO-FGA

Genetic algorithms possess the features of easy operation and strong flexibility, which help them become one of the most commonly used methods in many fields. In this section we focus on the structure of BPIO-FGA. The basic operation strategy of BPIO-FGA includes the following two aspects:

1) For a decision variable $A = (a, b, c)$, we regard b as the principal index describing the size position of A, and a and c as the assistant indexes. In the optimization process, we first consider the change of the principal index b, and then, combining the lengths of $[a, b]$ and $[b, c]$ with the change result of b, we determine the change results of the assistant indexes a and c by the method of random supplement. Given that the change result $A' = (a', b', c')$ of $A = (a, b, c)$ largely depends on the principal index b in this operational strategy, this strategy is one of the main backgrounds for the name of our algorithm.

2) For the evaluation of the objective function and the satisfaction of the fuzzy constraints, we take the synthesizing effect value of the compound quantification description of fuzzy information constituted by (2.1) and (2.2) as the main criterion of operation; a minimal numerical sketch of this criterion is given below.

From what we discussed in Section 2.2, we can see that the concepts of principal index and assistant index are involved there as well, which is another main background for the name of our algorithm.

Owing to the nonnegativity of the objective function value in real life, in the following we assume that: a) $E(f(x)) \ge 0$; if not, we can convert it into $M - E(f(x))$; b) the optimization problem is a maximum one, and a minimum optimization problem $\min f(x)$ can be converted into the maximum optimization problem $\max [M - E(f(x))]$. Here, M is an appropriately large positive number.

4.1. Coding

Coding is the most basic component of a genetic algorithm. In BPIO-FGA, for $(a, b, c)$, we adopt three equal-length 0-1 codes to separately represent the principal index b and the left and right assistant indexes a and c.

4.2. Crossover and Mutation

The crossover and mutation operators are the specific strategies for finding the optimal or satisfactory solution. In BPIO-FGA, we only apply the crossover and mutation operations to the middle section of the fuzzy variables; the two ends of the coding string are obtained by a random complement or definite complement strategy.

4.2.1. Crossover operation

For $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$, cross the two strings representing $b_1$ and $b_2$ separately, and take one of the obtained strings b as the crossover result of $b_1$ and $b_2$; then the left and right assistant indexes a and c can be determined by the following methods (here, $r_1$ and $r_2$ are random numbers in a specified scope):

(1) $a = b - r_1 b$, $c = b + r_2 b$;
(2) $a = b - r_1$, $c = b + r_2$;
(3) $a = b - r_1(b_1 - a_1) - r_2(b_2 - a_2)$, $c = b + r_1(c_1 - b_1) + r_2(c_2 - b_2)$.

4.2.2. Mutation operation

For $A = (a, b, c)$, mutate the string representing b to obtain the mutation result $b'$, and then determine the left and right assistant indexes $a'$ and $c'$ of $b'$ by the method of random complement as well. Usually, we can take the following methods to determine $a'$ and $c'$ (here, $r_1$ and $r_2$ are random numbers in a specified scope):

(1) $a' = b' - r_1 b'$, $c' = b' + r_2 b'$;
(2) $a' = b' - r_1$, $c' = b' + r_2$;
(3) $a' = b' - r_1(b - a)$, $c' = b' + r_1(c - b)$.

In this paper, we choose method (1) for both crossover and mutation.

4.3. Replication

In designing genetic algorithms, a penalty strategy is commonly used to eliminate constraints in the optimization process. Its purpose is to convert infeasible solutions into feasible ones by adding a penalty term to the objective function. In BPIO-FGA, we use the following fitness function with a penalty strategy:
$$F(x) = E(f(x)) \cdot p(x), \qquad (4.1)$$
and take (4.1) as the basis of proportional selection. Here, $p(x)$ is the penalty factor, with the basic form: if all the constraints are satisfied, then $p(x) = 1$; if the constraints are not completely satisfied, then $0 \le p(x) \le 1$.

In general, an exponential function can be used as the penalty function:
$$p(x) = \exp\Big\{-K \cdot \sum_{i=1}^{m} \alpha_i \cdot r_i(x)\Big\}. \qquad (4.2)$$
Here, $K \in (0, \infty]$, $\alpha_i \in (0, \infty]$, $r_i(x) \in [0, \infty)$, and $0 \cdot \infty = 0$. Obviously, $K = \infty$ implies the decision result must satisfy all the constraints, $\alpha_i = \infty$ implies the decision result must satisfy the i-th constraint, and $0 < \alpha_i, K < \infty$ implies the decision result can break the i-th constraint. In the following example, we let $\alpha_i = 1$, $K = 0.01$, and $r_i(x)$ be the difference of synthesizing effect between the two sides of the i-th constraint. A compact sketch of these operators follows.

5. Convergence of BPIO-FGA

We can see from the discussion above that the process of crossover, mutation and selection in BPIO-FGA is only relevant to the current state of the population and has nothing to do with the former ones, so BPIO-FGA is a Markov chain.

For convenience, let n be the size of the population in BPIO-FGA, S the population space and l the string length; then $|S| = 2^{l \cdot n}$. Let $R = \{r_{ij}\}$, $C = \{c_{ij}\}$, $M = \{m_{ij}\}$ be the state transition matrices determined by the operations of reproduction, crossover and mutation, respectively. Then, using finite Markov chain theory, we can obtain the following results.

Lemma 1: The selection probability matrix R based on the proportional replication operation is stochastic.

Lemma 2: The probability matrix C of the crossover operation with crossover probability $P_c \in [0, 1]$ is stochastic.

Lemma 3: The probability matrix M of the mutation operation with mutation probability $P_m \in (0, 1)$ is strictly positive.

Theorem 1: If we select the genetic operations as proportional replication, crossover with probability $P_c \in [0, 1]$ and mutation with probability $P_m \in (0, 1)$, then the transition matrix P of BPIO-FGA is regular.

Corollary 1: The Markov chain constituted by BPIO-FGA using proportional replication, crossover with probability $P_c \in [0, 1]$ and mutation with probability $P_m \in (0, 1)$ is an ergodic Markov chain.

Theorem 2: BPIO-FGA using the elitist preservation strategy in the replication process is globally convergent, that is, $P\{Z_n = f^*\} \to 1$ $(n \to \infty)$. Here, $Z_n$ denotes the optimal value of the n-th population and $f^*$ denotes the global optimal value over individuals.

6. Application Example

Consider the following nonlinear programming problem:
$$\max z = -(0.1, 0.3, 0.8)x_1^2 - (0.2, 0.4, 0.7)x_2^2 + (16.1, 17, 17.3)x_1 + (17.7, 18, 18.6)x_2$$
$$\text{s.t. } (1.4, 2, 2.6)x_1 + (2.7, 3, 3.3)x_2\ \tilde{\le}\ (47, 50, 51),$$
$$(3.8, 4, 4.4)x_1 + (1.6, 2, 2.2)x_2\ \tilde{\le}\ (40, 44, 47),$$
$$(2.6, 3, 3.2)x_1 + (1.6, 2, 2.2)x_2\ \sim\ (32, 36, 40),$$
$$x_1, x_2 \ge 0.$$
For this optimization problem, if the coefficients and variables are all real numbers, then the optimal solutions are $x_1 = 4.8333$, $x_2 = 10.75$, $\max z = 222.4329$.

Using BPIO-FGA with 20-bit binary coding and the genetic parameters set as follows: population size 80, number of evolution generations 100, crossover probability $p_c = 1$, mutation probability $p_m = 0.0001$, level effect function $L(\lambda) = \lambda$, compound quantification form $(I(A); CD(A))$, and synthesizing effect operator $S(a, b) = a/[1 + 0.01b]^{0.5}$, we obtain the optimal value shown in Fig. 1 after 100 iterations (taking the number of iterations as the x-coordinate and the centralized quantification value of the fuzzy maximum value as the y-coordinate). The optimal solutions are $x_1 = (4.6245, 5.0000, 5.3103)$, $x_2 = (10.8989, 11.0000, 11.3093)$, and the centralized quantification of the fuzzy maximum value is 222.7926.

[Fig. 1 The results of 100 iterations of Example 1]

In order to further analyze the performance of BPIO-FGA, for different synthesizing effect operators and level effect functions $L(\lambda)$, we separately make tests from the following three aspects:

Test 1 For $L(\lambda) = \lambda$ and the synthesizing effect operator $S(I(A), CD(A)) = I(A)/(1 + \beta \cdot CD(A))^\alpha$, with $(\alpha, \beta)$ taking (0.5, 0.1), (0.5, 1), (2, 0.1) and (2, 1), respectively; the computational results are stated in Table 1.

Test 2 For $S(I(A), CD(A)) = I(A)/(1 + 0.1 \cdot CD(A))^{0.5}$ and $L(\lambda)$ taken to be $\lambda$, $\lambda^2$ and $\lambda^{0.5}$, respectively; the computational results are stated in Table 2.

Test 3 For $S(I(A), CD(A)) = I(A)/(1 + 0.1 \cdot CD(A))^{0.5}$ and $L(\lambda) = \lambda$; the results of 10 experiments are stated in Table 3.

In Table 1, Table 2 and Table 3, Y1 denotes the centralized quantification value of the maximum value, C.D. the concentration degree, Y2 the synthesizing effect value of the maximum value, C. the convergence generation, C.T. the computation time (in seconds), and A.V. the average value.

All the calculations above are based on Matlab 6.5 and a 2.00 GHz Pentium 4 processor, and were worked out under the Windows 2000 Professional Edition platform.

Table 1 The computation results of Test 1

(alpha, beta)   Y1         C.D.      Y2         C.    C.T.
(0.5, 0.1)      221.0667   28.8954   112.0919   21    29.0460
(0.5, 1)        220.0334   28.2740   102.0919   17    21.3910
(2, 0.1)        199.1048   25.2674   22.1943    23    27.0780
(2, 1)          163.1917   20.9739   0.3380     19    15.6400

Table 2 The computation results of Test 2

L(lambda)    Y1         C.D.      Y2         C.    C.T.
lambda       219.9388   21.1604   124.5949   18    17.7340
lambda^2     222.5093   13.7181   144.4800   15    19.0160
lambda^0.5   220.1760   26.0321   115.9912   20    22.0470

Table 3 The computation results of Test 3

       1         2         3         4         5         6         7         8         9         10        A.V.
Y1     221.5670  221.9289  222.2023  220.7847  221.5398  222.1236  219.6166  221.7816  220.1272  220.7866  221.2458
C.D.   21.5118   21.9740   21.6370   21.5817   21.8301   21.6555   20.6622   21.8890   22.5024   22.0553   21.7299
Y2     70.3124   69.4093   70.2349   69.9090   69.6007   70.1691   71.6246   69.5480   67.7264   68.8768   69.7411
C.     13        19        18        20        21        14        16        22        24        18        18.5000
C.T.   25.4545   25.2658   22.5965   25.0002   30.2365   24.4589   26.2648   24.2647   26.0213   26.0213   25.7158

From the results above we can see that: 1) the computational results are related to the level effect function and the synthesizing effect operator, which shows that BPIO-FGA can effectively merge decision consciousness into the decision process; 2) despite the variation of parameters, the convergence generation is about 20 and the convergence time is within 25 seconds; also, the rate of reaching the optimal result is more than 80%, which shows the algorithm has high computational efficiency and good convergence performance; 3) though the computational complexity is a bit larger than that of conventional algorithms, the difference is not great under a high-performance parallel computing environment, so BPIO-FGA has good practicability; 4) BPIO-FGA, with the features of good interpretability and strong operability, has good structure.

7. Conclusion

In this paper, on the basis of distinguishing principal indexes and assistant indexes and the restriction and supplementation relation between them, we give a comparison method for fuzzy information based on synthesizing effect and a description method for fuzzy information on principal indexes; a new fuzzy genetic algorithm based on principal indexes operation for general optimization problems is proposed; finally, we consider its convergence using Markov chain theory and analyze its performance through an example. The results indicate that the algorithm not only merges decision consciousness effectively into the optimization process, but also possesses many interesting advantages such as strong robustness, faster convergence, fewer iterations and less chance of trapping into premature states, so it can be applied to many fuzzy fields such as artificial intelligence, manufacturing and management, and optimization control.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (70671034, 60574077), the Ph.D. Foundation of Hebei Province (05547004D-2, B2004509) and the Natural Science Foundation of Hebei Province (F2006000346).

References

[1] Tang, J. and Wang, D., Fuzzy Optimization Theory and Methodology Survey, Control Theory and Applications, 17 (2000), 159-164 (in Chinese).

[2] Cadenas, J.M. and Verdegay, J.L., Using ranking functions in multiobjective fuzzy linear programming, Fuzzy Sets and Systems, 111 (2000), 47-53.

[3] Maleki, H.R., Tata, M. and Mashinchi, M., Linear programming with fuzzy variables, Fuzzy Sets and Systems, 109 (2000), 21-33.

[4] Leu, S., Chen, A. and Yang, C., A GA-based fuzzy optimal model for construction time-cost trade-off, International Journal of Project Management, 19 (2001), 47-58.

[5] Tang, J., Wang, D. and Fung, R.Y.K., Modeling and method based on GA for nonlinear programming problems with fuzzy objective and resources, International Journal of Systems Science, 29 (1998), 907-913.

[6] Buckley, J.J. and Feuring, T., Evolutionary algorithm solution to fuzzy problems: fuzzy linear programming, Fuzzy Sets and Systems, 109 (2000), 35-53.

[7] Li, F., Wu, C. and Qiu, J., Platform fuzzy number and separability of fuzzy number space, Fuzzy Sets and Systems, 117 (2001), 347-353.

[8] Diamond, P. and Kloeden, P., Metric Spaces of Fuzzy Sets: Theory and Applications, Singapore: World Scientific, 1994.

[9] Li, F., Yue, P. and Su, L., Research on the Convergence of Fuzzy Genetic Algorithms based on Rough Classification, Proceedings of the Second International Conference on Natural Computation and the Third International Conference on Fuzzy Systems and Knowledge Discovery, 2006, 792-795.

DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 751--755
Copyright@2007 Watam Press

Analysis of Compressible Miscible Displacement with Dispersion by a Characteristics Collocation Method
Ning Ma
Department of Mathematics and Physics, China University of Petroleum, Beijing, 102249, P.R.China

AMS subject classifications: 35M10, 65M25, 65M70

Abstract: A nonlinear parabolic system is proposed to describe compressible miscible displacement with dispersion in porous media. The concentration is treated by a characteristic collocation method, while the pressure is treated by a finite element collocation method. Optimal order error estimates are also derived in this paper.

1 Introduction

Compressible flow with dispersion is modelled by a nonlinear coupled system of two partial differential equations. Let $\Omega = (0,1) \times (0,1)$ with boundary $\partial\Omega$, $p(x, y, t)$ the pressure in the mixture, $u$ the Darcy velocity of the fluid, and $c(x, y, t)$ the relative concentration of the injected fluid. $k(x, y)$ and $\phi(x, y)$ are the permeability and the porosity of the porous media, $\mu(c)$ is the viscosity of the fluid, and $q$ and $\bar{c}(t)$ etc. are as defined in [1,2]. The mathematical controlling model for compressible flow with dispersion in porous media is given by
$$\begin{aligned}
&(a)\ d(c)\frac{\partial p}{\partial t} + \nabla \cdot u = d(c)\frac{\partial p}{\partial t} - \nabla \cdot (a(c)\nabla p) = q, \\
&(b)\ \phi\frac{\partial c}{\partial t} + b(c)\frac{\partial p}{\partial t} + u \cdot \nabla c - \nabla \cdot (D(u)\nabla c) = (\bar{c} - c)q, \\
&\qquad (x, y) \in \Omega,\ t \in (0, T],
\end{aligned} \qquad (1.1)$$
where
$$c = c_1 = 1 - c_2, \quad a(c) = a(x, y, c) = k(x, y)/\mu(c),$$
$$b(c) = b(x, y, c) = \phi(x, y)\, c_1 \Big\{z_1 - \sum_{j=1}^{2} z_j c_j\Big\},$$
$$d(c) = d(x, y, c) = \phi(x, y) \sum_{j=1}^{2} z_j c_j.$$
$c_i$ denotes the concentration of the i-th component of the fluid mixture, and $z_i$ is the "constant compressibility" factor [1] for the i-th component. $D(u)$ is the molecular diffusion and dispersion coefficient; its form is $D(u) = \phi\{d_m I + |u|(d_l E(u) + d_t E^\perp(u))\}$, where $E(u) = (u_k u_l / |u|^2)$ is the $2 \times 2$ matrix representing orthogonal projection along the velocity vector and $E^\perp(u) = I - E(u)$ is the complementary projection.

We shall assume that no flow occurs across the boundary:
$$\begin{aligned}
&(a)\ u \cdot \nu = 0 \text{ on } \partial\Omega, \\
&(b)\ D(u)\nabla c \cdot \nu = 0 \text{ on } \partial\Omega,
\end{aligned} \qquad (1.2)$$
where $\nu$ is the outer normal to $\partial\Omega$, together with the initial conditions
$$\begin{aligned}
&(a)\ p(x, y, 0) = p_0(x, y), \quad (x, y) \in \Omega, \\
&(b)\ c(x, y, 0) = c_0(x, y), \quad (x, y) \in \Omega.
\end{aligned} \qquad (1.3)$$

Collocation methods are widely used for solving practical problems in engineering due to their ease of implementation and high-order accuracy, but most of the mathematical theory has focused on one-dimensional or two-dimensional constant coefficient problems [3,4]. In the 1990s, the collocation method for two-dimensional variable coefficient elliptic problems was given in [5]. In this paper, however, the mathematical controlling model for compressible flow with dispersion in porous media is a strongly nonlinear coupled system of two partial differential equations. Nonlinear terms introduce many difficulties for the convergence analysis of algorithms, so we use different collocation techniques to treat equations of different types: the usual collocation method to solve the equation for the pressure, and a characteristic collocation scheme to approximate the equation for the concentration. We develop some techniques to analyze the convergence of the collocation algorithm for this strongly nonlinear system and obtain the optimal order $L^2$ error estimate. We shall assume the coefficients $a(c)$, $D(u)$, $\phi(x, y)$, $d(c)$, $b(c)$ to be bounded above and below by positive constants independently of $c$, as well as being smooth.

The organization of the rest of the paper is as follows. In Section 2, we present the formulation of the characteristic collocation scheme for the nonlinear system (1.1). In Section 3, we analyze the convergence rate of the scheme defined in Section 2.

Throughout, the symbols $K$ and $\varepsilon$ will denote, respectively, a generic constant and a generic small positive constant. We also assume that the problem is periodic in space; then the boundary condition (1.2) can be omitted.

2 Fully discrete characteristic collocation scheme

In this section, we give some basic notation and definitions for the characteristic collocation method, which will be used in this article. Then we present the fully discrete characteristic collocation scheme for the nonlinear system (1.1).

2.1 Preliminaries

We make a partition of the domain Ω by a quasi-uniform, equally spaced rectangular grid. The grid points are (xᵢ, yⱼ), i = 0, 1, ···, N_x; j = 0, 1, ···, N_y. Let

δ_x: 0 = x₀ < x₁ < ··· < x_{N_x} = 1,
δ_y: 0 = y₀ < y₁ < ··· < y_{N_y} = 1,

and let h_x = xᵢ − xᵢ₋₁, h_y = yⱼ − yⱼ₋₁ and h = max{h_x, h_y} be the grid sizes along the x-direction and the y-direction and the maximum size of the partition, respectively. Introduce the following notations: Ωᵢⱼ = (xᵢ₋₁, xᵢ) × (yⱼ₋₁, yⱼ), I = [0, 1], I_{xi} = [xᵢ₋₁, xᵢ], I_{yj} = [yⱼ₋₁, yⱼ], for i = 1, 2, ···, N_x and j = 1, 2, ···, N_y. Define function spaces as follows:

m₁(3, δ_x) = {v ∈ C¹(I) | v ∈ P₃(I_{xi}), i = 1, ···, N_x},
m₁(3, δ_y) = {v ∈ C¹(I) | v ∈ P₃(I_{yj}), j = 1, ···, N_y},

where P₃ denotes the set of polynomials of degree ≤ 3, and

m₁,P(3, δ_x) = {v ∈ m₁(3, δ_x) : v(0) = v(1)},
m₁,P(3, δ_y) = {v ∈ m₁(3, δ_y) : v(0) = v(1)};

then let m₁(3, δ) and m₁,P(3, δ) be the spaces of piecewise Hermite bicubics defined by

m₁(3, δ) = m₁(3, δ_x) ⊗ m₁(3, δ_y),
m₁,P(3, δ) = m₁,P(3, δ_x) ⊗ m₁,P(3, δ_y).

Next, we take four Gauss points as collocation points in Ωᵢⱼ: (ξ^x_{ik}, ξ^y_{jl}), k, l = 1, 2, where ξ^x_{ik} = xᵢ₋₁ + h_x ξ_k, ξ^y_{jl} = yⱼ₋₁ + h_y ξ_l, with ξ₁ = (3 − √3)/6, ξ₂ = (3 + √3)/6. Let T_{3,δx} and T_{3,δy} be the interpolation operators of piecewise Hermite cubics of m₁(3, δ_x) in x and of m₁(3, δ_y) in y, respectively, and let T_{3,δ} be the interpolation operator of piecewise Hermite bicubics in m₁(3, δ) on Ω, which may be defined by T_{3,δ}v = T_{3,δx}T_{3,δy}v = T_{3,δy}T_{3,δx}v for sufficiently smooth functions v.

Introduce the following summation notation:

⟨u, v⟩ = Σ_{i=1}^{N_x} Σ_{j=1}^{N_y} ⟨u, v⟩ᵢⱼ = Σ_{i=1}^{N_x} Σ_{j=1}^{N_y} (h_x h_y/4) Σ_{k,l=1}^{2} (uv)(ξ^x_{ik}, ξ^y_{jl}),
⟨u, v⟩_x = Σ_{i=1}^{N_x} ⟨u, v⟩ᵢ^x = Σ_{i=1}^{N_x} (h_x/2) Σ_{k=1}^{2} (uv)(ξ^x_{ik}),   (2.1)
⟨u, v⟩_y = Σ_{j=1}^{N_y} ⟨u, v⟩ⱼ^y = Σ_{j=1}^{N_y} (h_y/2) Σ_{l=1}^{2} (uv)(ξ^y_{jl}),

and

⟨u, v⟩ = ⟨⟨u, v⟩_x, 1⟩_y = ⟨⟨u, v⟩_y, 1⟩_x,
⟨u, u⟩ = |||u|||²,
|||u|||²_{H¹(Ω)} = ∫₀¹ ⟨Du_x, u_x⟩_y dx + ∫₀¹ ⟨Du_y, u_y⟩_x dy,   (2.2)
|||u|||²_E = ∫₀¹ ⟨u_x, u_x⟩_y dx + ∫₀¹ ⟨u_y, u_y⟩_x dy,

for all u ∈ m₁(3, δ).

2.2 Fully discrete CCS

First, time is discretized: 0 = t₀ < t₁ < ··· < t_N = T, Δt = tₙ − tₙ₋₁. We consider the concentration equation. Let ψ = [φ² + u₁² + u₂²]^{1/2}, and denote by τ(x, y) the characteristic direction associated with the operator φc_t + u·∇c; hence

ψ ∂c/∂τ = φ ∂c/∂t + u·∇c.

The equation (1.1)(b) can be put in the form

ψ ∂c/∂τ + b(c) ∂p/∂t − ∇·(D(u)∇c) = (c̄ − c)q,  (x, y) ∈ Ω, t ∈ (0, T].   (2.3)

For (2.3), we use a backward difference quotient for ∂c/∂τ along the characteristic line:

ψ ∂cⁿ/∂τ ≈ ψ [cⁿ(x, y) − cⁿ⁻¹(x̆, y̆)] / (Δt[1 + |u|²/φ²]^{1/2}) = φ (cⁿ − c̆ⁿ⁻¹)/Δt,   (2.4)

where

f̆ⁿ = f(x̆ⁿ, y̆ⁿ, tⁿ),  x̆ⁿ⁻¹ = x − (u₁/φ)Δt,  y̆ⁿ⁻¹ = y − (u₂/φ)Δt.

Then we have the following discrete equation:

φ (c_hⁿ − c̆_hⁿ⁻¹)/Δt + b(c_hⁿ⁻¹)(Pⁿ − Pⁿ⁻¹)/Δt − ∇·(D(u_hⁿ)∇c_hⁿ) − (c̄ⁿ⁻¹ − c_hⁿ⁻¹)q = 0,  n = 1, 2, ···.   (2.5)

Now, using the interpolation operator T_{3,δ} and the Gauss points {(ξ^x_{ik}, ξ^y_{jl}), 1 ≤ i ≤ N_x; 1 ≤ j ≤ N_y; k, l = 1, 2}, we give the fully discrete characteristic collocation scheme.

Characteristic Collocation Scheme: if (Cⁿ⁻¹, Pⁿ⁻¹) is known at t = tⁿ⁻¹, then at t = tⁿ the pair (Cⁿ, Pⁿ) is determined by

(a) C⁰ = T_{3,δ} c₀(x, y),  P⁰ = T_{3,δ} p₀(x, y),
(b) {d(Cⁿ⁻¹)(Pⁿ − Pⁿ⁻¹)/Δt − ∇·(a(Cⁿ⁻¹)∇Pⁿ) − q}(ξ^x_{ik}, ξ^y_{jl}) = 0,
(c) {φ (Cⁿ − Ĉⁿ⁻¹)/Δt + b(Cⁿ⁻¹)(Pⁿ − Pⁿ⁻¹)/Δt − ∇·(D(Uⁿ)∇Cⁿ) − (C̄ⁿ⁻¹ − Cⁿ⁻¹)q}(ξ^x_{ik}, ξ^y_{jl}) = 0,   (2.6)

where

f̂ⁿ = f(x̂ⁿ, ŷⁿ, tⁿ),  x̂ⁿ⁻¹ = x − (U₁ⁿ/φ)Δt,  ŷⁿ⁻¹ = y − (U₂ⁿ/φ)Δt,

and

Uⁿ⁻¹ = −a(Cⁿ⁻¹)∇Pⁿ⁻¹   (2.7)

for 1 ≤ i ≤ N_x, 1 ≤ j ≤ N_y, k, l = 1, 2 and n, m ≥ 0, computed in the following order: first Pⁿ is computed from (2.6)(b); then from (2.7) and (2.6)(c) we obtain Cⁿ.


3 Convergence analysis

In this section, we first analyze the existence of the solution of the characteristic collocation scheme, and then analyze its convergence.

3.1 Preliminary results

We list some basic results from [3,4,6].

Lemma 3.1 Let e = v − T_{3,δx}v; then there exists a constant K > 0 such that

(1) ⟨e_l, e_l⟩_x ≤ K h_x^{2(4−l)} Σ_{i=1}^{N_x} ∫_{xᵢ₋₁}^{xᵢ} Σ_{α≤4} |∂^α v/∂x^α|² dx,  l = 0, 1,
(2) ⟨e_{xx}, e_{xx}⟩_x ≤ K h_x⁶ Σ_{i=1}^{N_x} ∫_{xᵢ₋₁}^{xᵢ} Σ_{α≤5} |∂^α v/∂x^α|² dx,
(3) |⟨e_x, 1⟩_x|² ≤ K h_x⁹ Σ_{i=1}^{N_x} ∫_{xᵢ₋₁}^{xᵢ} Σ_{α≤5} |∂^α v/∂x^α|² dx,
(4) |⟨e_{xx}, 1⟩_x|² ≤ K h_x⁹ Σ_{i=1}^{N_x} ∫_{xᵢ₋₁}^{xᵢ} Σ_{α≤6} |∂^α v/∂x^α|² dx.

The same conclusions hold in the y-direction.

Lemma 3.2 There exists a constant K ≥ 0 such that for every sufficiently smooth function v,

‖v − T_{3,δ}v‖_{L²(Ω)} ≤ K h⁴ (Σ_{i=1}^{N_x} Σ_{j=1}^{N_y} ‖v⁽⁴⁾‖²_{L²(Ωᵢⱼ)})^{1/2},
‖v_t − T_{3,δ}v_t‖_{L²(Ω)} ≤ K h⁴ (Σ_{i=1}^{N_x} Σ_{j=1}^{N_y} ‖v_t⁽⁴⁾‖²_{L²(Ωᵢⱼ)})^{1/2}.

Lemma 3.3 For any v ∈ m₁(3, δ), if

v(ξ^x_{ik}, 0) = v(ξ^x_{ik}, 1) = v(0, ξ^y_{jl}) = v(1, ξ^y_{jl}) = v(0, 0) = v(0, 1) = v(1, 0) = v(1, 1) = v(ξ^x_{ik}, ξ^y_{jl}) = 0

for 1 ≤ i ≤ N_x, 1 ≤ j ≤ N_y and k, l = 1, 2, then v ≡ 0.

Lemma 3.4 For any v ∈ m₁(3, δ), there exist constants K₁ ≥ 0 and K₂ ≥ 0 such that

‖v‖_{L²(Ω)} ≤ |||v||| ≤ K₁‖v‖_{L²(Ω)},
‖v‖_{L∞(Ω)} ≤ K₂ h⁻¹ ‖v‖_{L²(Ω)}.

Lemma 3.5 Assume that D(x, y) is sufficiently smooth. There exist constants 0 < K∗ ≤ K* such that for each v ∈ m₁,P(3, δ),

K∗⟨−Δv, v⟩ ≤ ⟨−∇·(D∇v), v⟩ ≤ K*⟨−Δv, v⟩.

Proof: The conclusion follows from the Peano representation of the remainder in the two-point Gauss-Legendre quadrature and Leibniz's formula (see Theorem 4.2 in [5]) and Lemma 3.4 in [6].

Lemma 3.6 Under the same conditions as in Lemma 3.5, there exist constants 0 < C∗ ≤ C* such that

C∗ |||v|||²_{H₀¹(Ω)} ≤ ⟨−∇·(D∇v), v⟩ ≤ C* |||v|||²_{H₀¹(Ω)}.

Proof: For any v ∈ m₁,P(3, δ), the conclusion follows immediately from (2.1) and Lemma 3.5.

3.2 Existence of the solution of the CCS

In this subsection we consider the existence and uniqueness of the numerical solution. (2.6)(b)(c) can be rewritten as the discrete Galerkin method given by

(a) ⟨d(Cⁿ⁻¹)(Pⁿ − Pⁿ⁻¹)/Δt − ∇·(a(Cⁿ⁻¹)∇Pⁿ) − q, χ⟩ = 0, ∀χ ∈ m₁,P(3, δ),
(b) ⟨φ(Cⁿ − Ĉⁿ⁻¹)/Δt + b(Cⁿ⁻¹)(Pⁿ − Pⁿ⁻¹)/Δt − ∇·(D(Uⁿ)∇Cⁿ) − (C̄ⁿ⁻¹ − Cⁿ⁻¹)q, Z⟩ = 0, ∀Z ∈ m₁,P(3, δ).   (3.1)

We only discuss the pressure equation; the concentration equation is similar. It is clear that any solution of (2.6)(b) is a solution of (3.1)(a). Thus, it is sufficient to prove existence for (2.6)(b) and uniqueness for (3.1)(a) (Lemma 4.1 of [3]). For sufficiently small Δt, existence for (2.6)(b) follows from Lemma 3.3, since it implies that the matrix generated by the time-derivative term is nonsingular for any choice of basis for m₁,P(3, δ). Uniqueness for solutions of (3.1)(a) is also implied by Lemma 3.3, since the matrix generated by the time-derivative term in (3.1)(a) must be nonsingular because d(c) is bounded below by a positive constant.

So the CCS (2.6) and the discrete Galerkin method (3.1) each possess a unique solution for 0 < t ≤ T; moreover, these solutions are identical if the processes are started from the same initial values.
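As a quick sanity check of the quadrature fact underlying Lemma 3.5, the following sketch (ours; the helper name gauss2 is illustrative, not from the paper) verifies numerically that the two-point Gauss rule built on the collocation nodes ξ₁ = (3 − √3)/6, ξ₂ = (3 + √3)/6 with equal weights integrates every polynomial of degree ≤ 3 exactly, which is why the Peano remainder argument applies to the P₃ spaces of Section 2.1:

```python
import numpy as np

XI = np.array([(3 - np.sqrt(3)) / 6, (3 + np.sqrt(3)) / 6])

def gauss2(f, a, b):
    """Two-point Gauss rule on [a, b]: nodes a + (b-a)*xi_k, weights (b-a)/2."""
    h = b - a
    return 0.5 * h * np.sum(f(a + h * XI))

# Exactness for polynomials of degree <= 3 on [0, 1]:
for k in range(4):
    exact = 1.0 / (k + 1)                      # integral of x^k over [0, 1]
    assert abs(gauss2(lambda x: x**k, 0.0, 1.0) - exact) < 1e-14
```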


3.3 Error estimate

In this subsection, we obtain the optimal L²-norm error estimate. We assume that

(R) c ∈ L∞(H⁶) ∩ L∞(W∞²) ∩ H¹(W∞²) ∩ H²(H¹),  p ∈ L∞(H⁶) ∩ H¹(H⁶) ∩ L∞(W∞¹) ∩ H²(H¹).

Theorem 3.1 Suppose (R) and r = 3 hold, and Δt = o(h). Then there exists a constant K = K(Ω, a∗, b∗, d∗, φ∗, D∗, ···, K*, K₁, K₂) such that, for h sufficiently small,

max_{0≤n≤[T/Δt]} ‖cⁿ − Cⁿ‖² + Σ_{n=0}^{T/Δt} ‖pⁿ − Pⁿ‖² Δt ≤ K(Δt² + h⁸).

Proof: Let c̃ = T_{3,δ}c, ζ = c − c̃, ξ = c̃ − C, p̃ = T_{3,δ}p, η = p − p̃, π = p̃ − P.

We first consider the pressure equation. Subtracting (3.1)(a) from the Galerkin form of (1.1)(a), we obtain

⟨d(Cⁿ⁻¹)d_t πⁿ, χ⟩ − ⟨∇·(a(Cⁿ⁻¹)∇πⁿ), χ⟩
= ⟨[d(Cⁿ⁻¹) − d(cⁿ)]d_t p̃ⁿ, χ⟩ − ⟨d(cⁿ)d_t ηⁿ, χ⟩ + ⟨d(cⁿ)(d_t pⁿ − ∂pⁿ/∂t), χ⟩ + ⟨∇·(a(cⁿ)∇ηⁿ), χ⟩ + ⟨∇·[(a(cⁿ) − a(Cⁿ⁻¹))∇p̃ⁿ], χ⟩,  ∀χ ∈ m₁,P(3, δ),   (3.2)

where d_t fⁿ = (fⁿ − fⁿ⁻¹)/Δt. Choosing the test function χ = πⁿ in (3.2), we denote the terms on the right-hand side by Tᵢ, i = 1, 2, ···, 5, in turn. Then by Lemma 3.1, Lemma 3.2 and Lemma 3.4, we have

|T₁| = |⟨[(∂d/∂c)(Cⁿ⁻¹ − cⁿ⁻¹) + (∂d/∂c)(cⁿ⁻¹ − cⁿ)] d_t p̃ⁿ, πⁿ⟩| ≤ K(h⁸ + Δt² + ‖ξⁿ⁻¹‖²) + ε‖πⁿ‖².

And

|T₂| ≤ |⟨d(cⁿ)(ηⁿ − ηⁿ⁻¹)/Δt, πⁿ⟩| ≤ K|||η_t|||² + ε|||πⁿ|||² ≤ Kh⁸ + ε‖πⁿ‖².   (3.3)

For T₃, from the standard backward-difference error equation or Taylor expansion [7,8],

|T₃| ≤ |⟨d(cⁿ)[(pⁿ − pⁿ⁻¹)/Δt − ∂pⁿ/∂t], πⁿ⟩| ≤ K(Δt)² + ε‖πⁿ‖².   (3.4)

To bound T₄, we have the following conclusion for ε sufficiently small (see Lemma 4.1 of [6]):

|⟨(Dζ_xⁿ)_x, ξⁿ⟩| ≤ ε{⟨ξ_xⁿ, ξ_xⁿ⟩ + ⟨ξⁿ, ξⁿ⟩} + Kh⁸ Σ_{i=1}^{N_x} Σ_{j=1}^{N_y} ∫_{Ωᵢⱼ} Σ_{α≤6} |∂^α cⁿ/∂x^α|² dΩ.   (3.5)

So we obtain

|T₄| = |⟨∇·(a(cⁿ)∇ηⁿ), πⁿ⟩| ≤ Kh⁸ + ε(‖πⁿ‖² + ‖∇πⁿ‖²).   (3.6)

For T₅, we shall need an induction hypothesis. We assume that

‖Cⁿ‖_{W∞¹} ≤ K,  0 ≤ n ≤ l − 1.   (3.7)

We start this induction by noting that

‖C⁰‖_{W∞¹} ≤ ‖c̃⁰‖_{W∞¹} + ‖ξ⁰‖_{W∞¹} ≤ ‖c̃⁰‖_{W∞¹} ≤ K

for h sufficiently small; that (3.7) also holds for n = l will be checked at the end of the proof. Similarly to the proofs for T₁ and T₄, using Lemma 3.1, Lemma 3.2, Lemma 3.4 and (3.7), we get

|T₅| ≤ K(‖ξⁿ⁻¹‖₁² + h⁸ + Δt²) + ε(‖πⁿ‖² + ‖∇πⁿ‖²).

Next, using the inequality a(a − b) ≥ ½(a² − b²), we see that for the first left-hand side term of (3.2),

⟨d(Cⁿ⁻¹)d_t πⁿ, πⁿ⟩ ≥ (1/(2Δt))[⟨d(Cⁿ⁻¹)πⁿ, πⁿ⟩ − ⟨d(Cⁿ⁻¹)πⁿ⁻¹, πⁿ⁻¹⟩].

Similarly to the proof of Lemma 3.5, using (3.7), the second left-hand side term of (3.2) satisfies

⟨−∇·(a(Cⁿ⁻¹)∇πⁿ), πⁿ⟩ ≥ (a∗ − KK₂h)‖∇πⁿ‖²;   (3.8)

then for h sufficiently small there exists a constant C > 0 such that a∗ − KK₂h ≥ C > 0. Multiplying (3.3)-(3.8) by 2Δt and summing in time n, we obtain, for ε sufficiently small,

Σ_{n=1}^{m−1} d∗‖πⁿ‖² Δt + d∗‖πᵐ‖² + Σ_{n=1}^{m} ‖∇πⁿ‖² Δt ≤ K Σ_{n=1}^{m−1} [h⁸ + Δt² + ‖ξⁿ‖₁²] Δt.   (3.9)

We now turn to the derivation of a corresponding evolution inequality for ξⁿ. Subtracting (3.1)(b) from the discrete Galerkin form of (1.1)(b), we obtain

⟨φ(ξⁿ − ξⁿ⁻¹)/Δt, Z⟩ − ⟨∇·(D(Uⁿ)∇ξⁿ), Z⟩
= −⟨φ ∂cⁿ/∂t + uⁿ·∇cⁿ − φ(cⁿ − c̆ⁿ⁻¹)/Δt, Z⟩ + ⟨φ(c̆ⁿ⁻¹ − ĉⁿ⁻¹)/Δt, Z⟩ − ⟨φ(ξⁿ⁻¹ − ξ̂ⁿ⁻¹)/Δt, Z⟩
− ⟨φ(ζⁿ − ζ̂ⁿ⁻¹)/Δt, Z⟩ + ⟨∇·(D(Uⁿ)∇ζⁿ), Z⟩ + ⟨∇·[D(uⁿ) − D(Uⁿ)]∇cⁿ, Z⟩
+ ⟨[−(ξⁿ⁻¹ + ζⁿ⁻¹) + (cⁿ⁻¹ − cⁿ)]q, Z⟩ + ⟨b(Cⁿ⁻¹)(Pⁿ − Pⁿ⁻¹)/Δt − b(cⁿ)∂pⁿ/∂t, Z⟩,   (3.10)

where Z ∈ m₁,P(3, δ). To obtain an L² estimate for ξ, we choose Z = ξⁿ as test function in (3.10) and denote the resulting right-hand side terms by T₁, T₂, ···, T₈. First we discuss the right-hand side of (3.10). We need another induction hypothesis: we assume that

‖∇Pⁿ‖_{L∞} ≤ K,  0 ≤ n ≤ l − 1.   (3.11)

If l = 1, we can start the induction by (3.9) to get

‖∇P⁰‖_{L∞} ≤ ‖∇p̃⁰‖_{L∞} + ‖∇π⁰‖_{L∞} ≤ K + Kh⁻¹(h⁴ + Δt) ≤ K

for h sufficiently small and Δt = o(h); that (3.11) also holds for n = l will be checked at the end of the proof.

Similarly to the discussion in [2,6,7], by the above lemmas and the induction hypothesis, we get

|T₁ + T₂ + T₃ + T₄| ≤ ε(‖ξⁿ‖₁² + ‖ξⁿ⁻¹‖₁²) + K(Δt²h² + Δt² + h² + ‖ξⁿ‖²).   (3.12)

Then by (3.5), (3.7) and (3.11), we have

|T₅| ≤ |⟨(D(Uⁿ)ζ_xⁿ)_x, ξⁿ⟩| + |⟨(D(Uⁿ)ζ_yⁿ)_y, ξⁿ⟩| ≤ Kh⁸‖cⁿ‖²_{H⁶} + ε(‖ξⁿ‖² + ‖∇ξⁿ‖²),   (3.13)


and, similarly to T₁, T₄, T₅, by (3.7) and (3.11), we get

|T₆| ≤ |⟨[D(uⁿ) − D(Uⁿ)]Δcⁿ, ξⁿ⟩| + |⟨[D(uⁿ) − D(Uⁿ)]_x c_xⁿ, ξⁿ⟩| + |⟨[D(uⁿ) − D(Uⁿ)]_y c_yⁿ, ξⁿ⟩|
≤ K(h⁸ + Δt² + ‖ξⁿ⁻¹‖²) + ε(‖ξⁿ‖² + ‖∇ξⁿ‖²).

Next, by Lemma 3.1, Lemma 3.2 and Lemma 3.4, we get

|T₇| ≤ K(h⁸ + Δt² + ‖ξⁿ⁻¹‖²) + ε‖ξⁿ‖².

Similarly to the pressure equation estimate (3.2), T₈ can be written as

|T₈| ≤ |⟨d(Cⁿ⁻¹)d_t πⁿ, ξⁿ⟩| + |⟨[d(Cⁿ⁻¹) − d(cⁿ)]d_t p̃ⁿ, ξⁿ⟩| + |⟨d(cⁿ)d_t ηⁿ, ξⁿ⟩| + |⟨d(cⁿ)(d_t pⁿ − ∂pⁿ/∂t), ξⁿ⟩|
≤ K(h⁸ + Δt² + ‖ξⁿ⁻¹‖²) + ε‖ξⁿ‖² + |⟨d(Cⁿ⁻¹)(πⁿ − πⁿ⁻¹)/Δt, ξⁿ⟩|.

Thus we have estimated the right-hand side of (3.10). Next, for the left-hand side of (3.10), we use the inequality ½(a² − b²) ≤ a(a − b) and Lemma 3.6, so that

(1/(2Δt)){⟨φξⁿ, ξⁿ⟩ − ⟨φξⁿ⁻¹, ξⁿ⁻¹⟩} + C∗|||ξⁿ|||²_{H₀¹(Ω)} ≤ ⟨φ(ξⁿ − ξⁿ⁻¹)/Δt, ξⁿ⟩ − ⟨∇·(D(Uⁿ)∇ξⁿ), ξⁿ⟩.   (3.14)

So by (3.12)-(3.14), we now have

(1/(2Δt))[⟨φξⁿ, ξⁿ⟩ − ⟨φξⁿ⁻¹, ξⁿ⁻¹⟩] + C∗|||ξⁿ|||²_{H₀¹(Ω)}
≤ K(Δt² + Δt²h² + h⁸ + ‖ξⁿ⁻¹‖² + ‖ξⁿ‖²) + ε(‖ξⁿ‖² + ‖∇ξⁿ‖²) + |⟨d(Cⁿ⁻¹)(πⁿ − πⁿ⁻¹)/Δt, ξⁿ⟩|.   (3.15)

If (3.15) is multiplied by 2Δt and summed in time n (ξ⁰ = 0, Δt = o(h)), then it follows that

⟨φξᵐ, ξᵐ⟩ + Σ_{n=1}^{m} |||ξⁿ|||²_{H₀¹(Ω)} Δt ≤ K[Δt² + h⁸ + Σ_{n=1}^{m} ‖ξⁿ‖² Δt] + ε Σ_{n=1}^{m} ‖ξⁿ‖₁² Δt + 2 Σ_{n=1}^{m} |⟨d(Cⁿ⁻¹)(πⁿ − πⁿ⁻¹), ξⁿ⟩|,   (3.16)

where the last term of (3.16) can be bounded as

Σ_{n=1}^{m} |⟨d(Cⁿ⁻¹)(πⁿ − πⁿ⁻¹), ξⁿ⟩| ≤ d∗ Σ_{n=1}^{m−1} ‖πⁿ‖² Δt + d*‖πᵐ‖² + ε Σ_{n=1}^{m} ‖ξⁿ‖² Δt.   (3.17)

So the relations (3.16) and (3.17) can be combined with (3.9) and the Gronwall lemma (in its discrete form: if aᵐ ≤ A + K Σ_{n=1}^{m−1} aⁿ Δt for all m, then aᵐ ≤ A e^{KT}) to show that

max_{1≤n≤m} ‖ξⁿ‖² + C∗ Σ_{n=1}^{m} |||ξⁿ|||²_{H₀¹(Ω)} Δt ≤ K(Δt² + h⁸),

and this can be combined with (3.9) to show that

Σ_{n=1}^{m} ‖∇πⁿ‖² Δt ≤ K(Δt² + h⁸).   (3.18)

Finally we check the induction hypotheses (3.7) and (3.11):

‖∇Pˡ‖_{L∞} ≤ ‖∇p̃ˡ‖_{L∞} + ‖∇πˡ‖_{L∞} ≤ K + Kh⁻¹‖∇πˡ‖ ≤ K + Kh⁻¹(Δt + h⁴) ≤ K,
‖Cˡ‖_{W∞¹} ≤ ‖c̃ˡ‖_{W∞¹} + ‖ξˡ‖_{W∞¹} ≤ K + Kh⁻²‖ξˡ‖ ≤ K + Kh⁻²(Δt + h⁴) ≤ K,

for h sufficiently small, and the proof is complete.

References

[1] J. Douglas, Jr. and J. E. Roberts, Numerical methods for a model for compressible miscible displacement in porous media, Math. Comp., 41 (1983), 441-459.
[2] Thomas F. Russell, Time stepping along characteristics with incomplete iteration for a Galerkin approximation of miscible displacement in porous media, SIAM J. Numer. Anal., 17 (1985), 970-1013.
[3] J. Douglas and T. Dupont, Lecture Notes in Math. 385, Berlin: Springer-Verlag, 1974.
[4] Ryan L. Fernandes and Graeme Fairweather, Analysis of alternating direction collocation methods for parabolic and hyperbolic problems in two space variables, Numerical Methods for Partial Differential Equations, 9 (1993), 191-211.
[5] Bernard Bialecki and X. Cai, H¹-norm error bounds for piecewise Hermite bicubic orthogonal space collocation schemes for elliptic boundary value problems, SIAM J. Numer. Anal., 31 (1994), 1128-1146.
[6] N. Ma, T. Lu and D. Yang, Analysis of incompressible miscible displacement in porous media by a characteristics collocation method, Numerical Methods for Partial Differential Equations, 22 (2006), 797-814.
[7] Y. Yuan, Time stepping along characteristics for the finite element approximation of compressible miscible displacement in porous media, Mathematica Numerica Sinica, 4 (1992), 385-400.
[8] J. Douglas and Thomas F. Russell, Numerical methods for convection-dominated diffusion problems based on combining the method of characteristics with finite element or finite difference procedures, SIAM J. Numer. Anal., 19 (1982), 871-885.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 756--760
Copyright@2007 Watam Press

Existence of Weak Solutions for Evolution Inclusions


in Reflexive Banach Spaces

Guocheng Li
School of Basic Courses, Beijing Institute of Machinery
Beijing, 100085, China
E-mail: xyliguocheng@sohu.com

Abstract

In this paper we investigate the existence of weak mild solutions for evolution inclusions in reflexive Banach spaces. Using the Kakutani-Ky Fan fixed point theorem, we prove existence theorems.

Keywords: evolution inclusion; weak mild solutions; fixed point

1 INTRODUCTION

In this paper, we consider the following evolution inclusion defined in a reflexive Banach space X:

x′(t) ∈ A(t)x(t) + F(t, x(t)) a.e. on T,  x(0) = x₀,   (1)

where t ∈ T = [0, b], {A(t) : t ∈ T} is a family of closed, linear, densely defined operators in X and F : T × X → 2^X \ {∅} is a set-valued function. Such evolution inclusions appear in the study of distributed parameter control problems (see Hu-Papageorgiou [12]). For evolution inclusions, some existence theorems for mild solutions have been established by Cichoń [4], Papageorgiou [17] and Hu-Papageorgiou [12] when the set-valued function F meets a certain set of growth conditions; there, the vector valued integral used in the mild solution is taken in the sense of Bochner. The purpose of our work is to establish existence theorems for weak mild solutions of evolution inclusions when F is constrained by a weakly integrably bounded set-valued function and satisfies the weak growth condition; here the vector valued integral used in the definition of a weak mild solution is taken in the sense of Pettis. This change is essential because the former notion is based on the Bochner integral while the latter rests on the Pettis integral. It is well known that in any infinite dimensional Banach space there is at least one vector valued function that is Pettis integrable but not Bochner integrable. Concerning the single-valued problem for the Pettis integral in reflexive Banach spaces, O'Regan [14] established the existence of a weak solution to the Volterra-Hammerstein integral equation; his approach was based on the Schauder-Tychonoff theorem in a locally convex topological space. Szep [20] discussed the abstract Cauchy problem. The nonreflexive case was examined by Cramer, Lakshmikantham and Mitchell [6], and more recently by Bugajewski [2], Cichoń [3], Cichoń and Kubiaczyk [5], and O'Regan [15-16].
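As a concrete illustration of the gap between the two integrals mentioned above, the following classical example (added here for the reader; it is not part of the original text) exhibits a Pettis integrable, non-Bochner integrable function with values in the reflexive space ℓ²:

```latex
% Let (e_n) be the standard basis of l^2 and define f : (0,1] -> l^2 by
%   f(t) = n e_n  for  t in (1/(n+1), 1/n].
\[
  \int_0^1 \|f(t)\|\,dt = \sum_{n\ge 1} n\Big(\tfrac1n-\tfrac1{n+1}\Big)
  = \sum_{n\ge 1}\tfrac1{n+1} = \infty,
\]
% so f is not Bochner integrable, while for every x^* = (a_n) in l^2
\[
  \int_0^1 |x^* f(t)|\,dt = \sum_{n\ge 1}\frac{|a_n|}{n+1}
  \le \|x^*\|_2\Big(\sum_{n\ge 1}\tfrac1{(n+1)^2}\Big)^{1/2} < \infty,
\]
% and over each measurable E the candidate integral
% x_E = ( n\, m(E \cap (1/(n+1), 1/n]) )_{n\ge1} lies in l^2,
% so f is Pettis integrable.
```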


2 PRELIMINARIES

In this section we fix our terminology and notation and briefly recall some definitions and facts from multivalued analysis that we shall need in the sequel.

X will always be a reflexive Banach space with norm ‖·‖, and X* will denote the dual space of X. We assume that X* is separable. We let X_w denote the space X endowed with the weak topology generated by the continuous linear functionals on X (the family of seminorms {ρ_h : h ∈ X*} is defined by ρ_h(x) = |h(x)| for all x ∈ X). C(T, X_w) denotes the family of weakly continuous functions on T (the family of seminorms {η_h} is defined by η_h(g) = sup_{x∈T} ρ_h(g(x)) for all g ∈ C(T, X_w)); C(T, X_w) is a locally convex topological space. For A ∈ 2^X \ {∅} and x* ∈ X*, σ_A(x*) = sup{(x*, a) : a ∈ A} is called the support function of A. By L(p)(T, X) we denote the space of all (equivalence classes of) Pettis integrable functions f : T → X endowed with the norm

‖f‖_{L(p)} = sup_{‖x*‖≤1} ∫₀ᵇ |x*f(t)| dt.

Then L(p)(T, X) is a Banach space. Furthermore, similarly to the proof of Theorem 1 of Diestel-Uhl [7, pp. 98-100], we have {L(p)(T, X)}* = L∞(T, X*), i.e., the dual space of L(p)(T, X) is L∞(T, X*). By P_{f(c)}(X) (respectively, P_{(w)k(c)}(X)) we denote the collection of all nonempty closed (convex) (respectively, nonempty (weakly) compact (convex)) subsets of X.

A set-valued function F : T → P_f(X) is said to be measurable if it satisfies any of the following equivalent conditions:

(i) t → d(x, F(t)) = inf{‖x − y‖ : y ∈ F(t)} is measurable for all x ∈ X;
(ii) there exists a sequence {f_n}_{n≥1} of measurable selections of F such that F(t) = cl{f_n(t) : n ≥ 1} for all t ∈ T (Castaing's representation);
(iii) GrF = {(t, x, v) ∈ T × X × X : v ∈ F(t, x)} ∈ Σ × B(X) × B(X), with Σ being the Lebesgue σ-field of T and B(X) the Borel σ-field of X (graph measurability).

A graph measurable set-valued function F : T × X → P_f(X) has the property that if x : T → X is measurable, then t → F(t, x(t)) is graph measurable, i.e. GrF(·, x(·)) ∈ Σ × B(X). So by Aumann's selection theorem we can find a measurable function g : T → X such that g(t) ∈ F(t, x(t)) a.e. on T. We denote by S_{wF} the set of all selections of F(·) that belong to the space L(p)(T, X), i.e., S_{wF} = {f(·) ∈ L(p)(T, X) : f(t) ∈ F(t) a.e. on T}. We say that F(·) is weakly integrably bounded if and only if F(·) is measurable and, for any x* ∈ X*, |x*F(·)| = sup{|x*(y)| : y ∈ F(·)} is integrable. If F(·) is weakly integrably bounded, then S_{wF} ≠ ∅.

Let Y, Z be Hausdorff topological spaces and let G : Y → 2^Z \ {∅}. We say that G(·) is upper semicontinuous (u.s.c.) if, for every nonempty closed C ⊆ Z, G⁻(C) = {y ∈ Y : G(y) ∩ C ≠ ∅} is closed in Y. If G(·) is closed valued and G(Y) is compact, then the above definition of upper semicontinuity is equivalent to saying that G(·) has a closed graph in Y × Z.

Let Δ = {(t, s) ∈ T × T : 0 ≤ s ≤ t ≤ b} and let L(X) denote the space of linear, bounded operators from X into itself. We will assume that the operators {A(t) : t ∈ T} generate a strongly continuous evolution operator (fundamental solution) S : Δ → L(X). Conditions for the existence of such an operator can be found in Friedman [9], Kato [13], Tanabe [21] and Pazy [19]. A function x : T → X_w is a weak mild solution of (1) if it is continuous and for all t ∈ T we have

x(t) = S(t, 0)x₀ + ∫₀ᵗ S(t, s)f(s) ds,

where f ∈ S_{wF(·,x(·))}.

In the rest of this section we give the lemmas that we will need in the proofs of our existence results. The idea of the following two lemmas can be found in Hu-Papageorgiou [11, pp. 187-188, Theorem 3.34; p. 694, Proposition 3.9].

Lemma 2.1 If F : T → P_{wkc}(X) is weakly integrably bounded, then S_{wF} is a nonempty, weakly compact and convex subset of L(p)(T, X).

Proof Nonemptiness and convexity follow from the fact that F(·) is weakly integrably bounded and convex valued. To show weak compactness, by James's theorem (see Floret [8, p. 59]) it suffices to show that every element of (L(p)(T, X))* attains its supremum on S_{wF}. We know that (L(p)(T, X))* = L∞(T, X*) and that the duality brackets are given by ⟨g, f⟩ = ∫_T (g(t), f(t)) dt for all f(·) ∈ L(p)(T, X), g(·) ∈ L∞(T, X*). So for g(·) ∈ L∞(T, X*) we have

sup_{f∈S_{wF}} ⟨g, f⟩ = sup_{f∈S_{wF}} ∫_T (g(t), f(t)) dt.

As proved in Theorem 2.2 of Hiai-Umegaki [10], we have

sup_{f∈S_{wF}} ∫_T (g(t), f(t)) dt = ∫_T sup_{z∈F(t)} (g(t), z) dt.

Let L(t) = {x ∈ F(t) : (g(t), x) = M(t)}, where M(t) = sup_{z∈F(t)} (g(t), z). An application of Castaing's representation tells us that M(·) is measurable. Thus u(t, x) = (g(t), x) − M(t) is a Carathéodory function, hence jointly measurable. Therefore

GrL = {(t, x) ∈ T × X : u(t, x) = 0} ∈ Σ × B(X).

Applying Aumann's selection theorem we find a measurable f̂ : T → X such that f̂(t) ∈ L(t) a.e. Since F(·) is weakly integrably bounded, f̂(·) ∈ L(p)(T, X). Therefore

∫_T sup_{z∈F(t)} (g(t), z) dt = ∫_T (g(t), f̂(t)) dt = ⟨g, f̂⟩ ⇒ sup_{f∈S_{wF}} ⟨g, f⟩ = ⟨g, f̂⟩.

Since g(·) was arbitrary, we conclude that S_{wF} is w-compact in L(p)(T, X).

Lemma 2.2 If {f_n(·), f(·)} ⊆ L(p)(T, X), f_n(·) → f(·) weakly in L(p)(T, X) and f_n(t) ∈ G(t) a.e. on T, where G(t) ∈ P_{wk}(X) a.e. on T, then f(t) ∈ conv w−lim{f_n(t)}_{n≥1} a.e. on T.

Proof From Mazur's theorem we know that for all k ≥ 1, f(·) ∈ conv ∪_{n≥k} f_n(·). Then there exist g_k(·) ∈ conv ∪_{n≥k} f_n(·) such that g_k(·) → f(·) in L(p)(T, X). Since X* is separable, by diagonalization we can find a subsequence {g_{kk}(·)}_{k≥1} ⊆ {g_k(·)}_{k≥1} such that g_{kk}(t) → f(t) weakly a.e. on T. Then f(t) ∈ conv ∪_{n≥k} f_n(t) a.e. on T. Let x* ∈ X*. Then for all k ≥ 1 we have

(x*, f(t)) ≤ σ_{conv∪_{n≥k}f_n(t)}(x*) = σ_{∪_{n≥k}f_n(t)}(x*) = sup_{n≥k} (x*, f_n(t)) a.e. on T
⇒ (x*, f(t)) ≤ lim(x*, f_n(t)) = lim σ_{{f_n(t)}}(x*) a.e. on T.

Using Proposition 3.1 of Papageorgiou [18], we get that

lim σ_{{f_n(t)}}(x*) ≤ σ_{w−lim{f_n(t)}_{n≥1}}(x*)
⇒ (x*, f(t)) ≤ σ_{w−lim{f_n(t)}_{n≥1}}(x*)
⇒ f(t) ∈ conv w−lim{f_n(t)}_{n≥1} a.e. on T.

Finally we state a result which is just a version of the classical Ascoli-Arzelà theorem; its proof is similar to the classical case. Please refer to Theorem 1.2 of [14] for similar results.

Theorem 2.3 A subset K of C(T, X_w) is compact if and only if the following conditions are satisfied:


(1) K is closed;
(2) for any ε > 0 and h* ∈ X*, there exists δ = δ(ε, h*) > 0 such that sup_{x∈K} |h*(x(t) − x(s))| < ε when |t − s| < δ;
(3) {x(t) : x ∈ K} is a relatively weakly compact subset of X for all t ∈ T.

3 MAIN RESULTS

For the proof of the main results, we shall need the following hypotheses:

H(F). F : T × X → P_{fc}(X) is a set-valued function such that:
(i) F(·, x) is measurable;
(ii) x → F(t, x) is ww-u.s.c.;
(iii) F(t, x) ⊆ G(t) a.e. on T, where G : T → P_{wkc}(X) is weakly integrably bounded.

H(A). {A(t)}_{t∈T} is a family of linear, densely defined operators that generate a strongly continuous evolution operator S : Δ → L(X).

Theorem 3.1 If hypotheses H(F) and H(A) hold, then problem (1) admits a weak mild solution.

Proof For t ∈ T, let K(t) : L(p)(T, X) → X be defined by

K(t)(g) = S(t, 0)x₀ + ∫₀ᵗ S(t, s)g(s) ds.

Clearly, t → K(t)(g) is w-continuous from T into X. Next, consider the set-valued function N : S_{wG} → 2^{S_{wG}} defined by

N(g) = S_{wF(·,K(·)(g))}.

We shall show that N(·) has nonempty, convex values in L(p)(T, X) and is u.s.c. Indeed, the convexity of the values of N(·) is clear. To see the nonemptiness, we proceed as follows. Since K(·)(g) is weakly continuous on T, K(·)(g) is strongly measurable. Let p_n(·) be simple functions such that p_n(t) → K(t)(g) a.e. on T in X. Then, by virtue of hypothesis H(F)(i), for every n ≥ 1, t → F(t, p_n(t)) admits a measurable selection f_n(·). From hypothesis H(F)(iii) and Lemma 2.1, {f_n(·)} is a relatively weakly compact set in L(p)(T, X), and passing to a subsequence if necessary, we may assume that f_n → f weakly in L(p)(T, X) as n → ∞. Then from Lemma 2.2 we have

f(t) ∈ conv w−lim{f_n(t)}_{n≥1} ⊆ conv w−lim F(t, p_n(t)) ⊆ F(t, K(t)(g)) a.e. on T,

the last inclusion being a consequence of hypothesis H(F)(ii). So f ∈ N(g), and this proves the nonemptiness of the values of N(·).

Next we show that N(·) is u.s.c. from S_{wG} with the weak topology into itself. Since S_{wG} is w-compact in L(p)(T, X), in order to show that N(·) is u.s.c. we only need to show that GrN ⊆ S_{wG} × S_{wG} is weakly sequentially closed. Hence, let (g_n, f_n) → (g, f) weakly in S_{wG} × S_{wG} with (g_n, f_n) ∈ GrN. Once again, Lemma 2.2 tells us that

f(t) ∈ conv w−lim F(t, K(t)(g_n)) a.e. on T.

But note that K(t)(g_n) → K(t)(g) weakly in X. So, exploiting the ww-upper semicontinuity of F(t, ·) (H(F)(ii)), we get that f(t) ∈ F(t, K(t)(g)) a.e. on T, and then f ∈ S_{wF(·,K(·)(g))}. Therefore (g, f) ∈ GrN, and GrN is closed in S_{wG} × S_{wG}. Applying the Kakutani-Ky Fan fixed-point theorem, we get ĝ ∈ S_{wG} such that

ĝ ∈ N(ĝ) ⇒ ĝ ∈ S_{wF(·,K(·)(ĝ))}.

Then

x̂(t) = S(t, 0)x₀ + ∫₀ᵗ S(t, s)ĝ(s) ds.

Clearly, this x̂(·) is the desired weak mild solution of (1).

We can state an alternative version of Theorem 3.1 in which the set-valued function obeys a more general w-growth assumption. Assume the following:

H′(F). F : T × X → P_{wkc}(X) is a set-valued function such that:
(i) t → F(t, x) is measurable;
(ii) x → F(t, x) is u.s.c. from X_w into X_w;
(iii) for every x* ∈ X*,

|sup_{u∈F(t,x)} x*(u)| ≤ |(x*, a(t))| + |(x*, b(t))|·‖x‖ a.e. on T,

where a(·), b(·) ∈ L(p)(T, X).

Theorem 3.2 If hypotheses H′(F) and H(A) hold, then (1) admits a weak mild solution.

Proof First, let us determine an a priori bound for the solutions. So, let x(·) ∈ C(T, X_w) be a weak mild solution of (1). Then, by definition, we have

x(t) = S(t, 0)x₀ + ∫₀ᵗ S(t, s)f(s) ds,  t ∈ T,  f ∈ S_{wF(·,x(·))}.

Fix t ∈ T and, without loss of generality, assume x(t) ≠ 0. Then there exists (as a consequence of the Hahn-Banach theorem) x*_t ∈ X* with ‖x*_t‖ = 1 and

‖x(t)‖ = x*_t (S(t, 0)x₀ + ∫₀ᵗ S(t, s)f(s) ds).

So, if ‖S(t, s)‖ ≤ M, (t, s) ∈ Δ, we have

‖x(t)‖ ≤ M‖x₀‖ + ∫₀ᵗ |x*_t(S(t, s)f(s))| ds = M‖x₀‖ + ∫₀ᵗ |S*(t, s)(x*_t)f(s)| ds
≤ M‖x₀‖ + ∫₀ᵗ ‖S*(t, s)‖·|x*_t f(s)| ds
≤ M‖x₀‖ + M∫₀ᵗ |x*_t(a(s))| ds + M∫₀ᵗ |x*_t(b(s))|·‖x(s)‖ ds
≤ M‖x₀‖ + M‖a(·)‖_{L(p)} + M∫₀ᵗ |x*_t(b(s))|·‖x(s)‖ ds,

where S*(·,·) is the dual operator of S(·,·). Applying Gronwall's inequality, we get

‖x(t)‖ ≤ (M‖x₀‖ + M‖a(·)‖_{L(p)}) exp(M‖b(·)‖_{L(p)}) = M₁.

Consider the new multifunction F̂ : T × X → P_{wkc}(X) defined by

F̂(t, x) = F(t, x) for ‖x‖ ≤ M₁;  F̂(t, x) = F(t, M₁x/‖x‖) for ‖x‖ > M₁.

Then F̂(·,·) has the same measurability and continuity properties as F(·,·), and

|sup_{u∈F̂(t,x)} x*(u)| ≤ |x*(a(t))| + M₁|x*(b(t))| a.e. on T.

Let V = {g ∈ L(p)(T, X) : for every x* ∈ X*, |x*g(t)| ≤ |x*a(t)| + M₁|x*b(t)| a.e. on T}. Now consider W ⊆ C(T, X_w) defined by W = {y ∈ C(T, X_w) : y(t) = S(t, 0)x₀ + ∫₀ᵗ S(t, s)g(s) ds, g ∈ V}. Our claim is that W̄ (the closure taken in C(T, X_w)) is compact and convex in C(T, X_w). To this end, let ε > 0 and x* ∈ X* be given, and let δ(ε, x*) > 0 be such that

∫_{t−δ(ε,x*)}^{t} 2M(|x*a(s)| + 2M₁|x*b(s)|) ds < ε/4.

For 0 < δ(ε, x*) < t < t′ ≤ b and every y ∈ W, we have

|x*[y(t′) − y(t)]| ≤ |x*[(S(t′, 0) − S(t, 0))x₀]| + |∫₀^{t′} x*S(t′, s)g(s) ds − ∫₀ᵗ x*S(t, s)g(s) ds|
≤ ‖x*‖·‖S(t′, 0) − S(t, 0)‖·‖x₀‖ + |∫_t^{t′} x*S(t′, s)g(s) ds|
+ |∫₀^{t−δ(ε,x*)} x*(S(t′, s) − S(t, s))g(s) ds| + |∫_{t−δ(ε,x*)}^{t} x*(S(t′, s) − S(t, s))g(s) ds|
≤ ‖x*‖·‖S(t′, 0) − S(t, 0)‖·‖x₀‖ + ∫_t^{t′} M(|x*a(s)| + M₁|x*b(s)|) ds
+ ∫₀^{t−δ(ε,x*)} ‖S(t′, s) − S(t, s)‖(|x*a(s)| + M₁|x*b(s)|) ds + 2M∫_{t−δ(ε,x*)}^{t} (|x*a(s)| + M₁|x*b(s)|) ds.

We know that t → S(t, s) is continuous in the operator norm topology on [s, b], uniformly in s, for t − s bounded away from zero. So, if 0 < δ₁(ε, x*) < δ(ε, x*) is such that, for t′ − t < δ₁(ε, x*), we have

‖x*‖·‖S(t′, 0) − S(t, 0)‖·‖x₀‖ < ε/4,
∫_t^{t′} M(|x*a(s)| + M₁|x*b(s)|) ds < ε/4,
∫₀^{t−δ(ε,x*)} ‖S(t′, s) − S(t, s)‖(|x*a(s)| + M₁|x*b(s)|) ds < ε/4,

we finally get that, for t′ − t < δ₁(ε, x*),

|x*(y(t′) − y(t))| < ε ⇒ W is w-equicontinuous.

Also note that, for every y ∈ W, we have

‖y(t)‖ ≤ sup_{‖x*‖≤1} x*(S(t, 0)x₀ + ∫₀ᵗ S(t, s)g(s) ds) ≤ M‖x₀‖ + M(‖a(·)‖_{L(p)} + M₁‖b(·)‖_{L(p)})

with g ∈ V. Then {y(t) : y ∈ W} is relatively w-compact in X for all t ∈ T (note that X is reflexive). Therefore, from Theorem 2.3 we deduce that W̄ is compact in C(T, X_w). The convexity of W̄ is clear (note that V is convex).

Next, consider the set-valued function R : W̄ → 2^{C(T,X_w)} defined by

R(y) = {x ∈ C(T, X_w) : x(t) = S(t, 0)x₀ + ∫₀ᵗ S(t, s)f(s) ds, f ∈ S_{wF̂(·,y(·))}}.

As above, by approximating y(·) with simple functions and exploiting the upper semicontinuity of F̂(t, ·), we get that S_{wF̂(·,y(·))} ≠ ∅, so R(y) ≠ ∅. Furthermore, it is clear that, for every y ∈ W̄, R(y) ⊆ W̄. Therefore R : W̄ → P_{fc}(W̄). We claim that R(·) is u.s.c. Since W̄ is compact in C(T, X_w), we only need to show that GrR ⊆ W̄ × W̄ is closed. So, let

(y_n, x_n) ∈ GrR and (y_n, x_n) → (y, x) in W̄ × W̄.

We have

x_n(t) = S(t, 0)x₀ + ∫₀ᵗ S(t, s)f_n(s) ds,  t ∈ T,  f_n ∈ S_{wF̂(·,y_n(·))}.

F̂(t, ·) is u.s.c. from X_w into X_w, and so, since it is P_{wkc}(X)-valued, it maps w-compact sets into w-compact sets (Aubin-Cellina [1], Proposition 3, p. 42). So

conv ∪_{n≥1} F̂(t, y_n(t)) = G(t) ∈ P_{wkc}(X),

and clearly it is weakly integrably bounded. Hence, from Lemma 2.1, S_{wG} is w-compact in L(p)(T, X). Because {f_n}_{n≥1} ⊆ S_{wG}, by passing to a subsequence if necessary we may assume that f_n → f weakly in L(p)(T, X). By Lemma 2.2 we have that

f(t) ∈ conv w−lim F̂(t, y_n(t)) ⊆ F̂(t, y(t)) a.e. ⇒ f ∈ S_{wF̂(·,y(·))}.

Also

x(t) = S(t, 0)x₀ + ∫₀ᵗ S(t, s)f(s) ds ⇒ (y, x) ∈ GrR ⇒ R(·) is u.s.c.

We apply the Kakutani-Ky Fan fixed-point theorem to get x̂ ∈ W̄ such that

x̂ ∈ R(x̂) ⇒ x̂(t) = S(t, 0)x₀ + ∫₀ᵗ S(t, s)f(s) ds,  t ∈ T,  f ∈ S_{wF̂(·,x̂(·))}.

Through Gronwall's inequality, as at the beginning of the proof, we can check that

‖x̂(t)‖ ≤ M₁, t ∈ T ⇒ F̂(t, x̂(t)) = F(t, x̂(t)) ⇒ x̂(·) is the weak mild solution of (1).

ACKNOWLEDGEMENT

This work is supported by the National Natural Science Foundation of China under Grants No. 10571035 and No. 60574077.

References

[1] J. P. Aubin and A. Cellina. Differential Inclusions. Springer-Verlag, Berlin, 1984.
[2] D. Bugajewski. On the existence of weak solutions of integral equations in Banach spaces. Comment. Math. Univ. Carolinae, 35:35-41, 1994.
[3] M. Cichoń. Weak solutions of differential equations in Banach spaces. Discuss. Mathematicae-Differential Inclusions, 15:5-14, 1995.
[4] M. Cichoń. Differential inclusions and abstract control problems. Bull. Austral. Math. Soc., 53:109-122, 1996.
[5] M. Cichoń and I. Kubiaczyk. Existence theorems for the Hammerstein integral equation. Discuss. Mathematicae-Differential Inclusions, 16:171-177, 1996.
[6] E. Cramer, V. Lakshmikantham and A. R. Mitchell. On the existence of weak solutions of differential equations in nonreflexive Banach spaces. Nonlinear Anal., 2:169-177, 1978.
[7] J. Diestel and J. J. Uhl. Vector Measures. Math. Surveys Monogr., 1977.
[8] K. Floret. Weakly Compact Sets. Springer, Berlin, 1980.
[9] A. Friedman. Partial Differential Equations. Holt, Rinehart and Winston, New York, 1969.
[10] F. Hiai and H. Umegaki. Integrals, conditional expectations and martingales of multivalued functions. J. Multivariate Anal., 7:149-182, 1977.
[11] S. Hu and N. S. Papageorgiou. Handbook of Multivalued Analysis. I: Theory. Kluwer, 1997.
[12] S. Hu and N. S. Papageorgiou. Handbook of Multivalued Analysis. II: Applications. Kluwer, 2000.
[13] T. Kato. Nonlinear evolution equations of hyperbolic type. J. Fac. Sci. Univ. Tokyo Sect. I A, 17:241-258, 1970.
[14] D. O'Regan. Integral equations in Banach spaces and weak topologies. Proc. Amer. Math. Soc., 124:607-614, 1996.
[15] D. O'Regan. Fixed point theory for weakly sequentially continuous mappings. Math. Comput. Modelling, 27(5):1-14, 1998.
[16] D. O'Regan. Weak solutions of ordinary differential equations in Banach space. Applied Mathematics Letters, 12:101-105, 1999.
[17] N. S. Papageorgiou. On multivalued semilinear evolution equations. Boll. Un. Math. Ital., 7(3-B):1-16, 1989.
[18] N. S. Papageorgiou. Convergence theorems for Banach space valued integrable multifunctions. Internat. J. Math. Sci., 10:433-442, 1987.
[19] A. Pazy. Semigroups of Linear Operators and Applications to Partial Differential Equations. Springer-Verlag, Berlin, 1983.
[20] A. Szep. Existence theorems for weak solutions of ordinary differential equations in reflexive Banach spaces. Studia Sci. Math. Hungar., 6:197-203, 1971.
[21] H. Tanabe. Equations of Evolution. Pitman, London, 1979.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 761--765
Copyright@2007 Watam Press

A Class of Robust Strategy for Robot Manipulators with Uncertainties

Lixia Zhi
Department of Mathematics and Physics, University of Petroleum, Beijing 102200, China

AMS Subject Classification: 70E60

Abstract: A class of robust control strategy for trajectory tracking of robot manipulators with uncertainties, based on Lyapunov theory, is proposed in this paper. A controller based on a computed-torque structure and a continuous nonlinear compensator is included. It is shown that uncertain effects can be eliminated and that exponential stability, asymptotic stability or uniform ultimate bound stability can be guaranteed globally. Moreover, when the upper bounds of some uncertainties are unknown, the controller can be revised to guarantee globally asymptotic stabilization. A numerical example demonstrates the validity of the proposed method.

1 Introduction

In recent years, a large number of references have appeared in the field of robust control for robot manipulators with unmodelled uncertainties [1]. In these references, the most common approach is to compensate the uncertainties using variable structure control. Although the system's global asymptotic stability can be guaranteed, the chattering caused by variable structure control cannot be avoided. By introducing a saturation function and boundary-layer techniques in [2-4], the chattering is greatly reduced, but a clear transient description of the system is not given. In [5], a continuous nonlinear feedback controller is given for uncertain systems with sector nonlinear input; in fact, any input can be considered as a sector input. In this paper, if we use the computed-torque controller as feedback of the standard system, the system with parametric and unstructured uncertainties satisfies the conditions of [5], and the nonlinear feedback controller can be established accordingly, which is then revised to eliminate the uncertain effects. Globally exponential stability, globally asymptotic stability or globally uniform ultimate bound stability of the closed loop system can be guaranteed at the same time. In addition, when the upper bounds of some uncertainties are unknown, we design a simple adaptive controller to ensure the globally asymptotic stability of the closed loop system.

2 Problem Formulation

The dynamics of a robot manipulator with n degrees of freedom can be written as

M(q)q̈ + C(q, q̇)q̇ + G(q) + τ_d(q, q̇, t) = τ,   (1)

where τ is an n×1 vector of applied torques; q(t) is an n×1 vector of joint displacements; M(q) is an n×n symmetric positive definite manipulator inertia matrix; C(q, q̇)q̇ is an n×1 vector of centripetal and Coriolis torques; G(q) is an n×1 vector of gravitational torques; and τ_d(q, q̇, t) is the dynamic part whose structure is unknown, including friction, external disturbance, unmodelled dynamics, etc.

The dynamic part with unknown structure is assumed to satisfy the following inequality:

‖τ_d(q, q̇, t)‖ ≤ β(q, q̇).   (2)

Our purpose is to design a robust tracking strategy to ensure one of three kinds of stability, that is, globally exponential stability (GES), globally asymptotic stability (GAS) or globally uniform ultimate bound stability (GUUB), of the closed loop for the robot manipulator described in (1).

3 Robust Controller Design

Assume M₀(q), C₀(q, q̇), G₀(q) are estimates of M(q), C(q, q̇), G(q), respectively, where M(q) is a symmetric positive definite matrix. Then the standard form of system (1) is

M₀(q)q̈ + C₀(q, q̇)q̇ + G₀(q) = τ₀.   (3)

Using computed torques on system (3), that is, letting

τ₀ = M₀(q)(q̈_d − k_v ė − k_p e) + C₀(q, q̇)q̇ + G₀(q),   (4)
τ = τ₀ + u,   (5)

where q_d is the desired trajectory, e = q − q_d, k_p, k_v are linear PD feedback matrices, both positive definite, and u is an unknown function to be determined, we get from (4), (5)

ë + k_v ė + k_p e = M⁻¹[ΔM(q̈_d − k_v ė − k_p e) + ΔH] + M⁻¹u − M⁻¹τ_d(q, q̇, t),   (6)

where

ΔM = M₀(q) − M(q),
ΔH = C₀(q, q̇)q̇ + G₀(q) − C(q, q̇)q̇ − G(q).

Let x = [e, ė]ᵀ; formula (6) can be expressed in the following form:

ẋ = Ax + BM⁻¹[ΔM(q̈_d − k_v ė − k_p e) + ΔH] + BM⁻¹u − BM⁻¹τ_d,   (7)

where

A = [ 0  I ; −k_p I  −k_v I ],  B = [ 0 ; I ].

Let f(x) = Ax, g(x) = B, Δf(x) = M⁻¹[ΔM(q̈_d − k_v ė − k_p e) + ΔH] and p(x, t) = −BM⁻¹τ_d; we can clearly see that formula (7) is a special form of the system in [5] with g(x, q) = 0. In order to use the conclusions of [5], we first prove that formula (7) satisfies assumptions 1-3 of reference [5].
positive definite manipulator inertia matrix; C (q, q )q is a


Firstly we give a lemma:

Lemma 1: For system (3), if k_v, k_p ∈ R^{n×n} are symmetric positive definite matrices, the closed loop system obtained from the computed torque (4) is globally exponentially stable.

The detailed proof of this lemma can be found in [6].

From Lemma 1, we know that the standard system ẋ = Ax + Bu is globally exponentially stable after using u = α(x) ≡ 0 as feedback. Based on Lyapunov stability theory, we can select a positive definite matrix Q to ensure that there is a symmetric positive definite matrix P satisfying the Riccati equation AᵀP + PA = −Q. That is, letting V = 0.5xᵀPx, we have

V̇ = −xᵀQx ≤ −(λ_min(Q)/λ_max(P))V ≜ −ηV,

where λ_max, λ_min denote the largest and smallest eigenvalues, respectively. Assumption 1 is thus verified.

Noting that the desired trajectory is uniformly bounded and taking formula (2) into consideration, we have (the detailed proof can be found in [7])

‖M⁻¹[ΔM(q̈_d − k_v ė − k_p e) + ΔH]‖ ≤ β₁(x),
‖M⁻¹τ_d(q, q̇, t)‖ ≤ β₂(x),

where β₁(x), β₂(x) are continuous functions and x = [e, ė]ᵀ.

Now we can prove that φ(u) satisfies the condition of sector nonlinear input (that is, assumption 3). We consider it from two aspects:

(I) Because the inertia matrix M is positive definite and bounded, we have λ_min(M) ≤ λ(M) ≤ λ_max(M), so

1/λ_max(M) ≤ λ(M⁻¹) ≤ 1/λ_min(M),
(1/λ_max(M)) uᵀu ≤ uᵀφ(u) = uᵀM⁻¹u ≤ (1/λ_min(M)) uᵀu.

Letting h₁ = 1/λ_max(M), h₂ = 1/λ_min(M), we have

h₁uᵀu ≤ uᵀφ(u) ≤ h₂uᵀu.

(II) We regard φ(u) = u as a special form of sector nonlinear input. Taking h₁ = h₂ = 1, we get

h₁uᵀu ≤ uᵀφ(u) ≤ h₂uᵀu.

From the above, we see that system (7) satisfies the assumptions of Theorem 1 in reference [5]; thus we can design the controller as follows.

In case (I):

u₁ = − BᵀPx γ² / [a(‖BᵀPx‖γ + 0.5ε‖x‖²)],   (8)

where γ = (β₁(x) + β₂(x))/(1 − k₁), k₁ = (h₂ − h₁)/(h₂ + h₁), a = (h₂ + h₁)/2.

In case (II):

u₂ = − MBᵀPx γ² / (‖BᵀPx‖γ + 0.5ε‖x‖²),   (9)

where γ = β₁(x) + β₂(x).

Under the two conditions above, ε is a positive constant satisfying α ≜ (λ_min(Q) − ε)/λ_max(P) > 0.

If β₁(x), β₂(x) are completely known, the following theorem can be proved:

Theorem 1: For the system (1), if we use (4) and (5) as the controller and (8) or (9) as the compensator, the closed loop system (7) obtained is globally exponentially stable, and

‖x(t)‖ ≤ [λ_max(P)/λ_min(P)]^{1/2} ‖x(t₀)‖ exp{−(1/2)α(t − t₀)}.

Remark: In formula (8), we consider φ(u) = M⁻¹u as a sector input, which enhances the robust stability of the system. That is to say, the stability of the closed loop system is still ensured when there is a disturbance on u, provided that the disturbed input lies within the boundary of the sector. In addition, if φ(u) is known but its form is very complex, controller (8) is suitable, which simplifies the design. A more detailed discussion can be found in reference [8].
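A minimal numerical sketch of the compensator (9) follows (our illustration; the function name and the NumPy-based interface are assumptions, and β₁, β₂ are passed in as the known bound functions required by Theorem 1):

```python
import numpy as np

def compensator_case2(x, B, P, M, beta1, beta2, eps):
    """Continuous robust compensator u2 of (9):
    u2 = -M B^T P x * gamma^2 / (||B^T P x|| * gamma + 0.5*eps*||x||^2),
    with gamma = beta1(x) + beta2(x)."""
    s = B.T @ P @ x
    gamma = beta1(x) + beta2(x)
    denom = np.linalg.norm(s) * gamma + 0.5 * eps * float(x @ x)
    if denom == 0.0:              # x = 0: no compensation is needed
        return np.zeros_like(s)
    return -(M @ s) * gamma**2 / denom
```

Note that the denominator is strictly positive whenever x ≠ 0, so the compensator is continuous, in contrast with the discontinuous variable-structure laws discussed in the introduction.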


Now we revise the compensator (9) as follows.

Theorem 2: For the system (1), if we use (4) and (5) as the controller and (10) as the compensator,

u = − MBᵀPx γ² / (‖BᵀPx‖γ + 0.5ε(t)),   (10)

then the closed loop system (7) obtained can be guaranteed one of the three kinds of stability (GUUB, GAS, GES), according to the choice of ε(t).

Before the proof of Theorem 2 we give a lemma; references [7,9] are useful for comprehension.

Lemma 2: Assume V(t) is a given Lyapunov function satisfying V̇ ≤ −λV + ε(t), where λ is a positive constant and ε(t) > 0, t > 0. Then:

(1) If ε(t) ≡ c or lim_{t→∞} ε(t) = c, the closed loop system (7) obtained is globally uniformly ultimately boundedly stable (GUUB), and

V(t) ≤ (1/λ)[c + (c + λV(0)) exp(−λt)],  t > 0.

(2) If lim_{t→∞} ε(t) = 0, the closed loop system (7) obtained is globally asymptotically stable (GAS); for example we can take ε(t) = 1/(2t + 1).

(3) If ε(t) = ε exp(−βt), ε > 0, β > 0, the closed loop system (7) obtained is globally exponentially stable (GES), and

V(t) ≤ V(0) exp(−λt) + εt exp(−λt),  λ = β,
V(t) ≤ V(0) exp(−λt) + [ε/(λ − β)][exp(−βt) − exp(−λt)],  λ ≠ β,

for t > 0.

Now we can prove Theorem 2.

Proof of Theorem 2: We choose the Lyapunov function V = xᵀPx; then

V̇ = ẋᵀPx + xᵀPẋ
= −xᵀQx + 2xᵀPB{M⁻¹[ΔM(q̈_d − k_v ė − k_p e) + ΔH] − M⁻¹τ_d} + 2xᵀPBM⁻¹u
≤ −λ_min(Q)‖x‖² + 2‖BᵀPx‖γ(x) − 2xᵀPBBᵀPx γ² / (‖BᵀPx‖γ + 0.5ε(t))
≤ −λ_min(Q)‖x‖² + ε(t) ≤ −(λ_min(Q)/λ_max(P))V + ε(t).

Taking Lemma 2 into consideration, Theorem 2 follows immediately.

Theorem 2 is proved with β₁(x) and β₂(x) known. Usually the estimate of β₁(x) and β₂(x) is too conservative, especially when τ_d(q, q̇, t) is present; the system is then over-controlled and the actuator may saturate. In order to overcome the overly conservative estimate, a self-adaptive control law is created so that the uncertain upper-bound parameter can be estimated online. The closed loop system then still converges gradually and globally.

With γ̂ the estimate of γ, the revised compensating control algorithm is

u = − MBᵀPx γ̂² / (‖BᵀPx‖γ̂ + 0.5ε‖x‖²),
γ̂̇ = c‖BᵀPx‖,   (11)

where c is a positive constant; the selection of ε and P is the same as in Theorem 1.

Theorem 3: For the system (1), if we use (4) and (5) as the controller and (11) as the compensator, the closed loop system (7) obtained is globally asymptotically stable.

Proof: Let V′ = V + (1/c)(γ − γ̂)² = xᵀPx + (1/c)(γ − γ̂)²; we have

V̇′ = ẋᵀPx + xᵀPẋ − (2/c)γ̂̇(γ − γ̂)
= −xᵀQx + 2xᵀPB{M⁻¹[ΔM(q̈_d − k_v ė − k_p e) + ΔH] − M⁻¹τ_d} − 2xᵀPBBᵀPx γ̂² / (‖BᵀPx‖γ̂ + 0.5ε‖x‖²) − 2‖BᵀPx‖(γ − γ̂)
≤ −xᵀQx + 2‖xᵀPB‖γ̂ − 2‖xᵀPB‖²γ̂² / (‖BᵀPx‖γ̂ + 0.5ε‖x‖²)
≤ −xᵀQx + ε‖x‖² ≤ −[(λ_min(Q) − ε)/λ_max(P)]V ≤ 0.

Remark: A simple self-adaptive robust controller, which computes faster and is more easily realized than traditional ones, is offered in Theorem 3. What is more, it is not necessary to consider the persistent excitation conditions that guarantee the convergence of the parameters.
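Before turning to the example, here is a minimal sketch (ours, with hypothetical helper names) of one integration step of the self-adaptive compensator (11), combining the control computation with a forward-Euler update of the estimated bound γ̂:

```python
import numpy as np

def adaptive_compensator(x, gamma_hat, B, P, M, c, eps, dt):
    """One forward-Euler step of the self-adaptive compensator (11):
    u = -M B^T P x * gamma_hat^2 / (||B^T P x||*gamma_hat + 0.5*eps*||x||^2),
    d(gamma_hat)/dt = c * ||B^T P x||.  Returns (u, updated gamma_hat)."""
    s = B.T @ P @ x
    ns = np.linalg.norm(s)
    denom = ns * gamma_hat + 0.5 * eps * float(x @ x)
    u = np.zeros_like(s) if denom == 0.0 else -(M @ s) * gamma_hat**2 / denom
    gamma_hat = gamma_hat + dt * c * ns   # adaptation law of (11)
    return u, gamma_hat
```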


1

'p02 0.225 | sin t | , 'p 03 0.25 | sin t | . 0.9

0.8

With W d [q1q1 sin(t ), q2 q2 cos(t )]T , controlling


0.7

0.6

0.5

parameter kv diag(100), k p diag(50)andQ I , by


0.4

0.3

0.2
T
Riccati Equation A P  PA Q , symmetrical positive 0.1

0
0 1 2 3 4 5 6 7 8 9 10

definite matrix P is


(a) positional error
ª 1.2600  0.0000 0.0050  0.0000º 0.05

« 0.0000 1.2600  0.0000 0.0050 » 0

P « » -0.05

« 0.0050  0.0000 0.0101  0.0000 » -0.1

-0.15

« »
¬  0.0000 0.0050  0.0000 0.0101 ¼
-0.2

-0.25

-0.3

With a prior uncertain function of upper boundary -0.35

-0.4

2  2.5|| x ||  || x ||2 ,If the expected tracks of the


-0.45

J -0.5
0 1 2 3 4 5 6 7 8 9 10

robot are (b) velocity error


qd 1 0.5sin(t )  0.1sin(3t )  0.2sin(4t ) , Figure 3
qd 2 0.1sin( 2t )  0.2 sin(3t )  0.1sin( 4t ) and initial 0.9
1

1.0 ˈq 2 (0) 0.5 ˈq1 (0) q 2 (0) 0 , by


0.8
states are q1 (0) 0.7

0.6
Runge-Kutta, results of this simulation are as follows 0.5

1 0.4

0.9 0.3

0.8 0.2

0.7 0.1

0.6 0
0 1 2 3 4 5 6 7 8 9 10
0.5

0.4

0.3
(a) positional error
0.2 0.05

0.1 0

0 -0.05
0 1 2 3 4 5 6 7 8 9 10
-0.1

(a) positional error -0.15

-0.2
0.05
-0.25
0
-0.3
-0.05
-0.35
-0.1
-0.4
-0.15
-0.45
-0.2
-0.5
-0.25 0 1 2 3 4 5 6 7 8 9 10
-0.3

-0.35

-0.4
(b) velocity error
-0.45

-0.5 0.5014
0 1 2 3 4 5 6 7 8 9 10

0.5012
(b) velocity error
0.501

Figure 1 (a)(b)Simulating Results with Application of 0.5008

Controlling Law˄4˅˄5˅˄8˅ 0.5006

1
0.5004
0.9

0.8 0.5002

0.7
0.5
0 1 2 3 4 5 6 7 8 9 10
0.6

0.5

0.4

0.3
˄c˅Self-adaptive curve of J
0.2

0.1
Figure 4 (a)(b)(c)Simulating result of self- adaptive
0
0 1 2 3 4 5 6 7 8 9 10
Lyapunov Controller with uncertain upper boundary
(a) positional error From the figures above, the controlling law in this paper
0.05 works well whether the uncertain upper boundary function is
0

-0.05 known or not, namely, unknown parameters change little and


-0.1

-0.15
converge to the constant 0.5. Moreover, so long as the
-0.2

-0.25
disturbance of u is located in the sector area, the controlling
-0.3

-0.35
law can work. Cases of u ' Du , D 0.5 and 2 are
-0.4

-0.45
simulated as in figures 2 and 3. Compared with Literature [7],
-0.5
0 1 2 3 4 5 6 7 8 9 10 there is less parameter that need be distinguished. Therefore,
(b) velocity error simulative velocity is faster (according to actual simulative
operation); what’s more, in spite of the large initial error, nice
Figure 2
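For readers who wish to reproduce the experiment, the following sketch (ours; the integrator loop is omitted and the function names are illustrative, with the robust or adaptive compensator u from (8)-(11) left as a plug-in) encodes the nominal model (12) and the computed-torque law (4)-(5) with the gains and desired trajectory of this section:

```python
import numpy as np

p1, p2, p3 = 1.45, 0.45, 0.5          # nominal p_{01}, p_{02}, p_{03}

def M_mat(q):
    c2 = np.cos(q[1])
    return np.array([[p1 + p2 + 2*p3*c2, p2 + p3*c2],
                     [p2 + p3*c2,        p2        ]])

def C_mat(q, dq):
    s2 = np.sin(q[1])
    return np.array([[-p3*s2*dq[1], -p3*s2*(dq[0] + dq[1])],
                     [ p3*s2*dq[0],  0.0                  ]])

def qd(t):
    """Desired trajectory of Section 4 with its first two derivatives."""
    q   = np.array([0.5*np.sin(t) + 0.1*np.sin(3*t) + 0.2*np.sin(4*t),
                    0.1*np.sin(2*t) + 0.2*np.sin(3*t) + 0.1*np.sin(4*t)])
    dq  = np.array([0.5*np.cos(t) + 0.3*np.cos(3*t) + 0.8*np.cos(4*t),
                    0.2*np.cos(2*t) + 0.6*np.cos(3*t) + 0.4*np.cos(4*t)])
    ddq = np.array([-0.5*np.sin(t) - 0.9*np.sin(3*t) - 3.2*np.sin(4*t),
                    -0.4*np.sin(2*t) - 1.8*np.sin(3*t) - 1.6*np.sin(4*t)])
    return q, dq, ddq

kv, kp = 100.0, 50.0                  # PD gains: k_v = diag(100), k_p = diag(50)

def tau_total(t, q, dq, u):
    """Computed torque (4) plus compensator u, as in (5)."""
    qdes, dqdes, ddqdes = qd(t)
    e, de = q - qdes, dq - dqdes
    return M_mat(q) @ (ddqdes - kv*de - kp*e) + C_mat(q, dq) @ dq + u
```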


5 Conclusion

A class of robust control strategy for trajectory tracking of robot manipulators with uncertainties, based on Lyapunov theory, is proposed in this paper. It consists of a controller based on the so-called computed-torque structure and a continuous nonlinear compensator. Moreover, when the upper bounds of some uncertainties are unknown, the control is revised to guarantee globally asymptotic stabilization. A numerical example demonstrates the effectiveness of the proposed method.

References

1 Abdallah C et al. Survey of robust control for rigid robots. IEEE Control Systems, 1991, 11(1):24-30.
2 Ye Xudong, Jiang Jingping. A new robust control strategy for tracking. Control Theory and Application, 1994, 11(4):502-506.
3 Craig J J. Adaptive Control of Mechanical Manipulators. New York: Addison-Wesley, 1988.
4 Man Zhihong, Palaniswami M. Robust tracking control for rigid robotic manipulators. IEEE Trans. Auto. Contr., 1994, 39(1):154-159.
5 Lixia Zhi, Hong Wang. Robust exponential stabilization for uncertain dynamic systems with sector nonlinearities. Control and Decision, 2001, 16(5):545-548 (in Chinese).
6 Richard M. Murray, Zexiang Li, S. Shankar Sastry. A Mathematical Introduction to Robotic Manipulation. CRC Press, 1994.
7 Ying Dai, Nanning Zheng, Chunlai Li. A new class of robot control strategies for trajectory tracking of robot manipulators. Robot, 1998, 20(2):111-115 (in Chinese).
8 Jianfeng Wei, Liufan Zheng. On robust almost disturbance decoupling control design of a class of uncertain nonlinear systems with sector nonlinear inputs. Control Theory and Application, 2000, 17(5):747-751 (in Chinese).
9 LaSalle J, Lefschetz S. Stability by Lyapunov's Direct Method with Applications. Academic Press, New York, 1961.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 766--769
Copyright@2007 Watam Press

THE SUFFICIENT AND NECESSARY CONDITIONS OF ERROR BOUNDS


FOR CONSTRAINED MULTIFUNCTIONS
JIAN-RONG WU¹, SHI-JI SONG²
¹Department of Applied Mathematics, University of Science and Technology of Suzhou, Jiangsu, 215009, China
²Department of Automation, Tsinghua University, Beijing 100084, China
E-mail: jrwu@mail.usts.edu.cn, shijis@tsinghua.edu.cn

Abstract: In this paper, the concept of the contingent derivative D_C F(x, y) (with respect to a set C ⊆ X) of a multifunction F is introduced; then sufficient and necessary conditions of error bounds for the constrained set inclusion, in terms of contingent derivatives, are given.

Keywords: error bound, contingent derivative, set inclusion system, Banach space

1. Introduction

Let f₁, …, fₙ be convex functions from a normed space X to R ∪ {+∞}. Consider the following system of finitely many convex inequalities:

fᵢ(x) ≤ 0,  i = 1, …, n.   (1)

Let S denote the solution set of the system (1). We say that (1) has an error bound if there exists a constant τ ∈ (0, +∞) such that

d(x, S) ≤ τ max{[fᵢ(x)]₊ : i = 1, …, n},

where d(x, S) = inf_{s∈S} ‖s − x‖ and [fᵢ(x)]₊ = max{fᵢ(x), 0}.

In the last two decades, the study of error bounds for convex systems has received growing interest in the mathematical programming literature. There are both theoretical and practical reasons for this phenomenon. For example, the theory of error bounds has been crucial for the analysis of linear convergence of various descent methods for solving linearly constrained programs, for termination conditions in several optimization methods, for the sensitivity analysis of integer or linear programs and of linear complementarity problems, and so on. Error bounds are also closely related to metric regularity and weak sharp minima of mathematical programming problems. Some of these achievements can be seen in the recent papers by Wu and Ye [1], Ngai and Théra [2-3], Bosch, Jourani and Henrion [4], and the references therein.

Owing to the needs of applications, the above convex inequality system (1) has been generalized to the set inclusion system involving a multifunction. Let X, Y be Banach spaces and F : X → 2^Y a multifunction. Let b ∈ Y, and consider the problem of set inclusion:

b ∈ F(x).   (2)

Thus F⁻¹(b) = {x ∈ X : b ∈ F(x)} consists of all x satisfying (2). The following definition was introduced and studied in [5] by Li and Singer for the special case when X, Y are normed spaces.

Definition 1. Let F : X → 2^Y and b ∈ Y. We say that F has a Lipschitz error bound (or error bound for short) for the set inclusion (2) if there exists a constant τ ∈ (0, +∞) such that

d(x, F⁻¹(b)) ≤ τ d(b, F(x)),  ∀x ∈ X.

A multifunction F : X → 2^Y is said to be

(i) convex, if for any x₁, x₂ ∈ X, 0 ≤ λ ≤ 1, we have

λF(x₁) + (1 − λ)F(x₂) ⊆ F(λx₁ + (1 − λ)x₂);

(ii) closed, if its graph

Gr F = {(x, y) ∈ X × Y : y ∈ F(x)}

is closed in X × Y.

Moreover, we say that F has closed values if F(x) is closed in Y for every x ∈ X. It is easy to see that F has closed values if F is closed, and that F is convex if and only if Gr F is a convex subset of X × Y.
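The following toy computation (our illustration, not from the paper) makes Definition 1 concrete for the scalar convex closed multifunction F(x) = [x, +∞) with b = 0, for which F⁻¹(0) = (−∞, 0] and the error bound holds with τ = 1:

```python
import numpy as np

# Toy convex closed multifunction on X = Y = R:  F(x) = [x, +infinity).
# Then F^{-1}(0) = (-inf, 0], d(0, F(x)) = max(x, 0) = d(x, F^{-1}(0)),
# so Definition 1 holds with tau = 1.
def d_b_Fx(x, b=0.0):          # distance from b to F(x) = [x, +inf)
    return max(x - b, 0.0)

def d_x_Finv(x):               # distance from x to F^{-1}(0) = (-inf, 0]
    return max(x, 0.0)

tau = 1.0
for x in np.linspace(-2.0, 2.0, 41):
    assert d_x_Finv(x) <= tau * d_b_Fx(x) + 1e-12
```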


Li and Singer [5] showed that the set inclusion (2) has an error bound if X, Y are Banach spaces, F is a convex closed multifunction and b ∈ int F(X). Recently, Zheng [6] proved that the above result is still true under a weaker condition on F.

In this note, we will first introduce the concept of the contingent derivative of a multifunction; we then develop sufficient and necessary conditions of error bounds for the set inclusion in terms of contingent derivatives, where the solution of the considered set inclusion system is constrained to a given subset.

2. Notation and preliminaries

Let X be a Banach space, K ⊆ X and x ∈ X. The contingent cone [7, p. 121] T_K(x) is defined by

T_K(x) = {v ∈ X : liminf_{h↓0} d(x + hv, K)/h = 0}.

It is convenient to have the following characterization of this cone in terms of sequences: v ∈ T_K(x) if and only if there exist hₙ ↓ 0 and vₙ → v such that, for all n, x + hₙvₙ ∈ K.

Let F : X → 2^Y be a multifunction. The contingent derivative D_C F(x, y) (with respect to C ⊆ X) of F at (x, y) ∈ Gr F is the set-valued map from X to 2^Y defined by: v ∈ (D_C F(x, y))(h) if and only if there exist tₙ → 0+, hₙ → h and vₙ → v such that

y + tₙvₙ ∈ F(x + tₙhₙ),  x + tₙhₙ ∈ C.

Lemma 1. Let F : X → 2^Y be a multifunction and y ∈ Y.
(i) If F is convex, then F⁻¹(y) is a convex subset;
(ii) If F is closed, then F⁻¹(y) is a closed subset.

The proof is trivial.

Lemma 2. Let X be a Banach space, Y a reflexive Banach space, and F : X → 2^Y a convex closed multifunction. Then, for any b ∈ Y, the function f(x) = d(b, F(x)) is lower semicontinuous.

Proof. To show that f is lower semicontinuous, it is enough to show that the set S_λ = {x : f(x) ≤ λ} is closed for any λ ∈ R. To this end, take a sequence {xₙ} in S_λ such that xₙ → x₀. Then f(xₙ) ≤ λ for all n. Since Y is reflexive and F(xₙ) is convex and closed, there exists yₙ ∈ F(xₙ) such that ‖b − yₙ‖ = d(b, F(xₙ)) = f(xₙ) ≤ λ. Therefore yₙ ∈ b + λB_Y, where B_Y denotes the closed unit ball of Y. Recall that Y is reflexive, so B_Y is weakly compact and hence weakly sequentially compact by the Eberlein-Šmulian theorem (cf. [8, p. 18]). Then there is a weak limit point y₀ of {yₙ}, and y₀ ∈ b + λB_Y, that is, ‖b − y₀‖ ≤ λ. Furthermore, (x₀, y₀) is a weak limit point of {(xₙ, yₙ)} in Gr F. Note that Gr F is closed and convex; by the Mazur theorem (cf. [8, p. 11]), Gr F is closed in X × Y (where X is equipped with the norm topology and Y with the weak topology). Then (x₀, y₀) ∈ Gr F, that is, y₀ ∈ F(x₀). Hence f(x₀) = inf_{y∈F(x₀)} ‖b − y‖ ≤ ‖b − y₀‖ ≤ λ, or x₀ ∈ S_λ. Thus S_λ is closed.

3. Main results

Theorem 1. (Sufficient Condition) Let X be a Banach space, Y a Hilbert space with inner product ⟨·,·⟩, and F : X → 2^Y a convex closed multifunction. Let C be a closed subset of X, b ∈ F(X), and S_C = F⁻¹(b) ∩ C ≠ ∅. If for every x ∈ (X \ F⁻¹(b)) ∩ C there exist h ∈ T_C(x) with ‖h‖ = 1 and v ∈ D_C F(x, p(x))(h) such that

⟨(b − p(x))/‖b − p(x)‖, v⟩ ≥ 1/τ,   (3)

then, for every x ∈ C, d(x, S_C) ≤ τ d(b, F(x)). Here τ > 0 and p(x) is the (orthogonal) projection of b onto F(x).

Remark: Since Y is a Hilbert space, by Theorem 3.1 of [9], p(x) always exists and is unique.
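Continuing the toy example introduced after Definition 1 (again our own construction), condition (3) can be verified directly with C = X = R, confirming that the error bound there holds with τ = 1:

```latex
% For F(x) = [x, +infinity), b = 0 and x > 0 (so x is not in F^{-1}(0)):
% p(x) = x is the projection of b = 0 onto F(x), and Gr F = {(x,y) : y >= x},
% so v in D_C F(x, p(x))(h) iff v >= h.  Taking h = -1 (a unit vector) and v = -1,
\[
  \Big\langle \frac{b-p(x)}{\|b-p(x)\|},\, v \Big\rangle
  = \Big\langle \frac{-x}{x},\, -1 \Big\rangle = 1 \;\ge\; \frac{1}{\tau}
  \quad\text{with } \tau = 1,
\]
% matching the error bound d(x, S_C) <= d(0, F(x)) checked numerically above.
```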


Proof. Suppose the conclusion is not true; then there exist x₀ ∈ C and ε₀ > 0 such that

d(b, F(x₀)) < (1/(τ + ε₀)) d(x₀, S_C);

therefore

d(b, F(x₀)) < inf{d(b, F(x)) : x ∈ C} + (1/(τ + ε₀)) d(x₀, S_C).

Apply Lemma 2 and the Ekeland variational principle [10] with λ = (τ + ε₀)d(x₀, S_C)/(τ + 2ε₀): there is an x₁ ∈ C such that

d(b, F(x₁)) ≤ d(b, F(x₀)) − [(τ + 2ε₀)/(τ + ε₀)²]‖x₀ − x₁‖,   (4)
‖x₀ − x₁‖ ≤ λ < d(x₀, S_C),   (5)

and for x ∈ C, x ≠ x₁,

d(b, F(x₁)) < d(b, F(x)) + [(τ + 2ε₀)/(τ + ε₀)²]‖x − x₁‖.   (6)

(5) tells us that x₁ ∈ X \ F⁻¹(b). Since F is convex and closed, F(x₁) is a convex and closed set in Y. Recall that p(x₁) is the projection of b onto F(x₁) and ‖b − p(x₁)‖ = d(b, F(x₁)), which together with (6) implies that for x ∈ C, x ≠ x₁,

‖b − p(x₁)‖ < d(b, F(x)) + [(τ + 2ε₀)/(τ + ε₀)²]‖x − x₁‖.   (7)

Note that x₁ ∈ (X \ F⁻¹(b)) ∩ C; by hypothesis, there are h ∈ T_C(x₁) with ‖h‖ = 1 and v ∈ D_C F(x₁, p(x₁))(h) such that

⟨(b − p(x₁))/‖b − p(x₁)‖, v⟩ ≥ 1/τ.   (8)

According to the definition of the contingent derivative, there are tₙ → 0+, hₙ → h and vₙ → v such that

x₁ + tₙhₙ ∈ C,  p(x₁) + tₙvₙ ∈ F(x₁ + tₙhₙ).

From (7), we have

‖b − p(x₁)‖ < d(b, F(x₁ + tₙhₙ)) + [(τ + 2ε₀)/(τ + ε₀)²]‖tₙhₙ‖
≤ ‖b − p(x₁) − tₙvₙ‖ + [(τ + 2ε₀)/(τ + ε₀)²]‖hₙ‖tₙ
≤ ‖b − p(x₁) − tₙv‖ + tₙ‖vₙ − v‖ + [(τ + 2ε₀)/(τ + ε₀)²]‖hₙ‖tₙ,

that is,

0 ≤ (1/tₙ)[‖b − p(x₁) − tₙv‖ − ‖b − p(x₁)‖] + ‖vₙ − v‖ + [(τ + 2ε₀)/(τ + ε₀)²]‖hₙ‖.

Letting n → ∞, we get

0 ≤ −⟨(b − p(x₁))/‖b − p(x₁)‖, v⟩ + (τ + 2ε₀)/(τ + ε₀)²,

or

⟨(b − p(x₁))/‖b − p(x₁)‖, v⟩ ≤ (τ + 2ε₀)/(τ + ε₀)² < 1/τ,

which is a contradiction to (8).

Theorem 2. (Necessary Condition) Let X be a reflexive Banach space, Y a Banach space, and F : X → 2^Y a convex closed multifunction. Let C ⊆ X be convex and closed and b ∈ F(X) with S_C = F⁻¹(b) ∩ C ≠ ∅. If there is a τ > 0 such that for any x ∈ C,

d(x, S_C) ≤ τ d(b, F(x)),

then, for every x ∈ (X \ F⁻¹(b)) ∩ C, there exists a p ∈ S_C such that

d(0, D_C F(p, b)((x − p)/‖x − p‖)) ≥ 1/τ.   (9)


that S C is also closed and convex. Remember that X is No.: .-' ˈ .-%  and the National
Natural Science Foundation of China (Grant No.:
reflexive, there is p  SC such that 60574077)
|| x  p || = d ( x, S C ) >0 ( p is the (orthogonal)
References
projection of x onto S C ).
[1] Z. Wu and J.J. Ye, On error bounds for lower
§ x p ·
Take any v  DC F ( p, b)¨¨ ¸¸ , there exist semicontinuous functions, Math. Program., Ser. A, Vol
© || x  p || ¹ 92: pp. 301-314, 2002.
[2] H. Nagi and M. Thera, Error bounds and implicit
x p multifunction theorem in smooth Banach spaces and
t n o 0+ˈ hn o and v n o v such that:
|| x  p|| applications to optimization, Set-Valued Analysis, Vol
12: pp. 195-223, 2004.
b  t n v n  F ( p  t n hn ) , p  t n hn  C . [3] H. Nagi and M. Thera, Error bounds for convex
By hypothesis, differentiable inequality systems in Banach spaces,
d ( p  t n hn , S C ) d W d (b, F ( p  t n hn )) Math. Program., Ser. B, Vol 104: pp. 465-482, 2005.
[4] P. Bosch, A. Jourani and R. Henrion, Sufficient
d W || b  (b  t n v n ) || = W t n || v n || . (10) conditions for error bounds and applications, Appl.
On the other hand, Math. Optim, Vol 50: pp. 161-181, 2004.
d ( p  t n hn , S C ) t d ( x, S C ) - || x  p  t n hn ) || [5] L. Wu, I. Singer, Global error bounds for convex
multifunctions and applications, Math. Oper. Res., Vol
x p x p 23: pp. 443-462, 1998.
t || x  p || - x  p  t n - t n hn  [6] X. Zheng, Error bounds for set inclusions, Science in
|| x  p || || x  p || China, Series A, Vol 33: pp. 631-643, 2003.
x p [7] J.P. Aubin, H. Frankowska, Set Valued Analysis, Boston:
= || x  p || - || x  p || t n - t n hn  . Birkhauser, 1990.
|| x  p || [8] J. Diestel, Sequences and Series in Banach Spaces,
Springer-Verlag, New York, 1984.
Since t n o 0+, we can suppose that || x  p||! t n ,
[9] J. Weidmann, Linear Operators in Hilbert Spaces,
then Springer-Verlag, New York, 1980.
x p [10] I. Ekeland, On the variational principle, J. Math. Anal.
d ( p  t n hn , S C ) t t n - t n hn  . Appl., 47: pp. 324-352, 1974.
|| x  p ||
Notice that (10), we have
x p
W t n || v n || t t n - t n hn  ,
|| x  p ||
or,
x p
W || v n || t 1- hn  .
|| x  p ||
1
Let n o f , then W ||v || t 1, that is, ||v || t .
W
The arbitrariness of v follows that (9).

Acknowledgements

This paper is supported by the Science Foundation


from the Ministry of Education of Jiangsu Province (Grant


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 770--773
DCDIS A Supplement,
Dynamics Advances
Press in Discrete
of Continuous,
Copyright@2007 Watam Neural Networks, Vol. 14(S2) 770--773
and Impulsive Systems
Copyright@2007 Watam Press
Series B: Theory and Applications
Special Volume: Advances in Neural Networks–Theory and Applications
Copyright c 2007 Watam Press

Commutativity Theorems On Rings1


Chen Guanghai and Yang Xinsong

Department of Applied Mathematics, Harbin University of Science and Technology, Harbin Heilongjiang 150080, China

AMS subject classifications: 16U80.

Abstract: In this paper we give two theorems Proof: If R is not commutative, then there exist
about commutativity in Kőthe semisimple rings and x, y ∈ R such that [x, y] = 0. The subring R1 gener-
semiprime rings, which improve the results in references ated by x, y is not commutative. For all a ∈ R1 and
[5], [7], [10] and [11]. e, x ∈ R, there exists a polynomial f (t1 , t2 ) such that

[f (a, b), c] = 0.
1 Introduction Note that
f (a, t) = ak + a2k φ1 (a1 ).
In most paper on commutativity of rings, ring R was
Let
said to satisfy a polynomial f (t1 , t2 ), in which the small-
a1 = ak + a2k φ1 (a1 ).
est degree term has the form such as ±t1 tk2 or ±tk1 t2 .
We want to extend this condition to that the smallest For a1 , e, y ∈ R, we have
degree term can be a polynomial in which the degree of
[g(a1 , e), y] = 0,
neither indeterminate should be 1. Thus the results we
get in this paper improve the conclusions of the refer- Similarly, by letting
ences [5], [7], [10] and [11] naturally.
a2 = g(a1 , e) = ak1 + a2k
1 φ1 (a1 )

= akk + akk+1 φ2 (a),


2 Preliminaries
we have a2 ∈ R1 , [a2 , x] = 0 and [a2 , y] = 0. This
In this paper, R denotes a ring, Z(R) and J(R) denote implies that a2 ∈ Z(R1 ). By Lemma 2.1, R1 is com-
the center of R and the Jacobson radical of R, respec- mutative. This is a contradiction. Thus R must be a
tively. Let [x, y] = xy − yx and let C(R) be the ideal field.
that generated by all elements having the form [x, y]. In the next lemma, some restrictions are added
We can suppose that in this paper the polynomial to the coefficients of f (t1 , t2 ). As in [3], axb (axe ) and
f (t1 , t2 ) can be written as the sum of f1 (t1 , t2 ) and ayb (aye ) are the sums of coefficients of all terms that be-
f2 (t1 , t2 ), where the polynomial f1 (t1 , t2 ) is the smallest gin with t1 in f (t1 , t2 ) and that end with t2 in f (t1 , t2 ),
degree part in f . The sum of all coefficients in f1 (t1 , t2 ) respectively.
is 1 or −1, and, in any term of f1 (t1 , t2 ), the degrees of
Lemma 2.3 If Jacobson semisimple ring R satisfies
t1 and t2 are k1 and k2 (t1 +t2 = n), respectively, where
condition (A) and the coefficients of f satisfy either one
n is a fixed integer. For the terms in f2 (t1 , t2 ) with de-
of the following conditions
grees bigger than n, the degree of t1 is mk1 ,where m is
an integer bigger than 1. (1) axb (axe ) = 1;
We say that ring R satisfies condition (A) if for all
a, b, c ∈ R, there exists a polynomial f (t1 , t2 ) as men- (2) ayb (aye ) = 1,
tioned above depending on a, b, c, such that then R is commutative.
[f (a, b), c] = 0, Proof: We only need to proof under condition (1). For
Jacobson semisimple ring is sub direct sum of primitive
where k1 and k2 = n − k1 are depending on a, b, c. rings, if R is not division ring, then there exists a sub
ring of R as the homomorphism of a full matrix ring on
Lemma 2.1 (see [2]) If R is a division ring and for division ring D1 and D2 still satisfying the conditions.
all a ∈ R, there exists a polynomial pa (a) and a positive Let
integer r(a) depending on a, such that ar(a)+1 pa (a) − a = E11 + E12 , b = E21 + E22 .
ar(a) ∈ Z(R), then R is a field.
Then we have
„ «
Lemma 2.2 If R is a division ring which satisfies con- axb axb
f (a, b) = .
dition (A), then R is a field. ayb ayb
1 supported by the Foundation of HLEC



Since axb (axe ) = 1, f (a, b) ∈
/ Z(D2 ). This is a contra- Proof: Suppose that x ∈ J(R) and 0 = y ∈ R with
diction. Thus R is a division ring. By Lemma 2.2, R is xy = 0. If x is not nilpotent, then, since RyR is non-
commutative. in R, there must exist si , ti , i = 1, 2, · · · , N1 ,
zero ideal P
Before discussing the commutativity of Kőthe such that N si ti = b5 , where b is a bone element. We
semisimple ring, we introduce the concept of B-ring also have that xp + xp+1 f (x) can be commuted with si .
If there exists a non-zero nilpotent element a in R and Thus
any non-zero ideal in R must include the power of a, b5 xp + xp+1 f (x) = 0.
then we call R a B-ring and a a bone-element of R. Noting that b5 is a regular element, we have
Lemma 2.4 (see [4]) Kőthe semisimple rings can be xp + xp+1 f (x) = xp [1 + xf (x)] = 0.
expressed as the sub direct sum of a group of B-rings.
Then
xp = 0.
Lemma 2.5 If B-ring R satisfies condition (A) and
its Jacobson radical J(R) = 0, then the bone-elements Lemma 2.7 If R is a B-ring satisfying condition (A)
in R must be regular. and J(R) = 0, then there is no zero divisor in J(R).
Proof: (1) It is easy to prove that if z is a regular ele-
Proof: If a is a bone-element in R, then there must
ment in J(R), then for all n ≥ 1, z n + z n+1 f (z) is also
exist a positive integer m such that b = am ∈ J(R). If
regular.
bx = 0, then, for b, b, x ∈ R, we have
(2) If z is a regular element, x is a zero divisor and
ˆ n+1 ˜ zx = xz, then z − x is regular. Otherwise, by Lemma
b + bn+2 p(b), x = 0.
2.6, there must ∃k > 1, such that (z − x)k = 0. Spread-
It follows that ing it we get z k = g(x, z)x. Obverse that the left side of
the equation is a regular element, while the right side
xbn+1 [1 + bp(b)] = 0. of is a zero divisor. This is a contradiction. Thus z − x
is regular.
Note that bp(b) ∈ J, we get xbn+1 = 0. Thus there ex- (3) All nilpotent elements in generate nilpotent ideal.
ists y such that by = yb = 0. If y = 0, then, by the Suppose x, y are zero divisors in J(R). If x + y is
fact that RyR is a non-zero ideal of R, there exists a regular, then, by condition (A), we have that a =
5
PN b ∈ RyR. 5Hence there exist
positive integer such that (x + y)k + (x + y)k+1 f (x + y) can be commuted with
si , ti ∈ R such that i=1 si yti = b . For b, b, s1 we x. Similarly, b = ak + ak+1 fa (a) can be commuted with
have ˆ n+1 ˜ y. Then b = (x + y)kk + (x + y)kk+1 f3 (x + y) can be
b + bn+2 f (b), s1 = 0. commuted with both x and y. For b can be expressed
Let as b = xc1 + yc2 and can be commuted with xc1 , in
c1 = bn+1 + bn+2 f1 (b). view of (2) we have b − xc1 = yc2 is regular. But yc2
is a zero divisor obviously. Contradiction! Thus x + y
Then c1 can be commuted with s1 . By condition (A) c2 must be a zero divisor.
can be commuted with s2 , where c2 = cn +cn+1 f1 (c1 ) = Furthermore, for all r ∈ R, it is obvious that rx is
bn + bn+1 f2 (b). Furthermore, c2 can be commuted with a zero divisor. Then all zero divisors in J(R) generate
s1 obviously. Similarly, cN can be commuted with an ideal in R, by Lemma 2.6, we get that the ideal is
s1 , s2 , · · · , sN . We know that cN can be expressed by nilpotent. Since R is a B-ring, there is no zero divisor
bp + bp+1 g(b) and cy = yc = 0. Hence in J(R).
!
5 ˆ p p+1 ˜ X
N Lemma 2.8 (see [5]) If R is a prime ring and I is
cN b = b + b g(b) si yti a non-zero ideal in R, then if I is commutative, R is
i=1 commutative too.
X
N
= si cN yti = 0 In the rest of this paper f (t1 , t2 ) is a polynomial satis-
i=1 fying the hypothesis of Lemma 2.3.

It follows that Lemma 2.9 If R is a Kőthe semisimple ring and for


all x ∈ R, there exists a positive integer k depending on
bp+5 [1 + bg(b)] = 0. x and a polynomial p(t) such that xk + xk+1 p(x) = 0,
then R is commutative.
Thus bs+p = 0. This is contrary to the fact that a is a
Proof : By hypothesis of the lemma, for all x ∈ R, we
bone-element. Thus y = 0, which implies that b and a
have
are regular.
f (x, x) = xk + xk+1 g(x) = 0.
Lemma 2.6 If R is a B-ring satisfying condition (A) Since x ∈ J(R), so xk = 0. This implies that J(R) is a
and J(R) = 0, then the zero divisors in J(R) must be nilpotent ideal in R. Since R a Kőthe semisimple ring,
nilpotent. J(R) = 0. By Lemma 2.3, R is commutative.


Lemma 2.10 (see [6]) If f is a polynomial in several Hence H 2 = 0 and H 2 = H. That is to say, R is a
non-commutative indeterminates x1 , x2 , · · · , xr and the sub direct irreducible ring and its heart is idempotent.
coefficients of f are co-prime, then the following asser- Thus R is a prime ring.
tions are equivalent For R is not commutative,
˘ by equation
¯ (2.1),
(1) A ring satisfying f = 0 must have a nilpotent Z(R) = 0. Let S = x|x ∈ Z(R), x = 0 Then there
commutator ideal; is no zero divisor in S.
We define an equivalent relation on R × S :
(r1 , c1 ) (r2 , c2 ) if and only if r1 c1 = r2 c2 . The addition
(2) A semiprime ring satisfying f = 0 is commuta-
and the multiplication are defined as:
tive;
(r1 , c1 ) + (r2 , c2 )
(3) For all prime integer p, there exists a matrix ring = (r1 c2 + r2 c1 , c1 c2 )
of 2 order on Zp not satisfying f = 0.
and
Lemma 2.11 If ring R satisfying condition (A) has no
zero divisor and for all x ∈ R there exists a polynomial (r1 , c1 )(r2 , c2 ) = (r1 r2 , c1 c2 )
p(t) depending on x, such that respectively. Then we get a ring for fractions S −1 R. R
is embedded into S −1 R under the map σ : r → (rc, c).
xq + xq p(x) ∈ Z(r), (2.1)
For all non-zero ideal A in S −1 R, let
where q is a positive integer, then R is commutative. ˘
I = r|r ∈ R,
Proof : If R is not commutative, by condition (A), for ¯
∃c ∈ S, s.t.(r, c) ∈ A .
all x, y ∈ R, we have
ˆ n ˜ Then I is a non-zero ideal in R. By Lemma 2.8, we
x + xn+1 p(x), y .
have is not commutative. Then there exists x ∈ I, such
If [xn , y] = 0 for all x and y, then R is a PI-ring satis- that xr(x) + xr(x)+1 ∈ S. Thus (S, S) ∈ A which means
fying [xn , y] = 0. Let a = E11 + E12 and b = E12 . Then A = S −1 R. Hence A = S −1 R is a simple ring including
[an , b] = E12 . Since for all E12 can’t be 0 for all zp , 1. Then
there must exist x1 , y1 ∈ R such that [xn 1 , y1 ] = 0. By
equation (2.1) and Lemma 2.8 and Lemma 2.9, there S −1 H = {(y, c)|y ∈ H, c ∈ S}
exists = S −1 R.
0 = b
q q+1 Therefore, for all (x, a) ∈ S −1 R, there exist y ∈ H and
= [xn n
1 , y1 ] + [x1 , y1 ] p ([xn
1 , y1 ]) ∈ Z(R). d ∈ S such that (x, a) = (y, d). Since dH is a non-zero
For bx1 , b and y1 by applying condition (A), we have ideal in R and dH ⊆ H, dH = H. This means that
h i there exists h ∈ H such that
bk xn
1 +b
k+1
φ (xn
1 , b) , y1 = 0.
y = dh = hd,
Thus
Then
[xn n
1 , y1 ] = −b [φ (x1 , b) , y1 ] . (2.2) (x, a) = (y, b) = (hd, d) ∈ σ(R).
Suppose R1 is a subring generated by xn 1 and y1 . It fol- Thus S R ∼
−1
= R. Then R is a Jacobson semisimple
lows from Zorn Lemma that there exists a maximal ring. Since R satisfies condition (A), by Lemma 2.3,
ideal M not including [xn 1 , y1 ] . Then
ˆ every ˜ non-zero R is commutative. This is a contradiction. Thus R is
ideal in R = R/M must including xn 1 , y1 . Hence R commutative.
is a non-commutative sub direct irreducible ring gener-
ated by xn 1 , y1h. Its hearti H is a principal ideal generated
ˆ ˜ ˆ n ˜
by xn 1 , y 1 . x kn
1 , y k
1 ∈ H since x1 , y 1 ∈ H. Noting 3 Main Results
that R is generated by xn
and y1 we know the commu-
1
Theorem 3.1 If R is a Kőthe semisimple ring and sat-
tator ideal C(R) is included in H. Thus H = C(R) and
isfies condition (A), then R is commutative.
then b ∈ H. By equation (2.2), we have
[xn n
1 , y1 ] = b[φ (x1 , b) , y1 ]. Proof : By Lemma 2.4, we can suppose that R is a
B-ring. If R is not commutative, then, by Lemma 2.3
Then and Lemma 2.7, we know that J(R) = 0 and J(R) is a
0 = [xn non-commutative ring which has no zero divisor. Then
1 , y1 ] ∈ H,
there exist x1 , x2 ∈ J(R) such that [x1 , x2 ] = 0. Sup-
b ∈ C(R) ∈ H, [φ (xn
1 , b) , y1 ] ∈ H. pose R1 generated by x1 and x2 is a subring in J(R).


Then there is no zero divisor in R1 . For all a, a, x1 ∈ R1 , Corollary 3.2 If R is a semiprime ring and for all
there exists a polynomial f (t1 , t2 ) such that x, y ∈ R, one of the following formulas holds, then R is
n+1 commutative.
[f (a, a), x1 ] = 0, f (a, a) = an+a p(a)
.
(1) xy − xm(x,y) y n(x,y) ∈ Z(R)[5]; (MR87i 16042)
Let a1 = f (a, a). Then we have
[f (a1 , a1 ), x2 ] = 0and [f (a1 , a1 ), x1 ] = 0. (2) xy − xm(x,y) y n(x,y) ∈ Z(R)[5]; (MR87i 16042)

Thus f (a1 , a1 ) ∈ Z(R1 ). Note that f (a1 , a1 ) can be


expressed by an + an+1 p(a). By Lemma 2.11, R1 is (3) (xy)n(x,y) − yx ∈ Z(R)[10]; (MR86e 16041)
commutative. This is a contradiction. Thus J(R) and
R are commutative. (4) (xy)n(x,y) − yx ∈ Z(R)[10]; (MR86e 16041)
Corollary 3.1 If R is a Kőthe semisimple ring and for
all a, b ∈ R, there exist a positive integer K = K(a, b), (5) (xm y)n − xm y ∈ Z(R).[11]
a polynomial fx (x, y) and a polynomial with integral co-
efficients φx (x, y) such that
abk − fx (a, b)φx (a, b ∈ Z(R),
References
where fx (x, y) includes the term x2 and the degree of y [1] Herstein I. N. Two remarks on the commutativity
is n = n(a, b) ( K), then R is commutative[7]. of rings, canad. J. Math 7(1955), 411-412.
[2] Herstein I. N. The strcture of a certain class of
Lemma 3.1 If R is a non-commutative prime ring sat- rings, Amer. J. Math. 75(1953), 864-871.
isfying condition (A), then there is no zero divisor in
R. [3] Fu Changlin, Commutative conditions, for rings
with f (x, y) ∈ C, North Math J ,7(2)(1991)206-
Proof : We will prove the lemma under the following 208.
two cases:
(1) If the nilpotent index of the nilpotent elements in R [4] Wang Xianghao, About Kőthe semisimple rings,
is less than n, then if there is no one-side nilpotent ideal Journal of Northeast Public University (Natural
in R, R is a Kőthe semisimple ring. By Theorem 3.1, Science Edition) 1(1955),143-147.
R is commutative. Otherwise, if there exists a one-side [5] Liu Zeyi, Some commutative conditions in as-
nilpotent ideal in R, in view of [8], we can assert there sociate rings. Acta. Sci. Natur. Univ. Jilin.1986,
exists a nilpotent ideal in R, which is contradictive to No.144-54.
the fact that R is semiprime.
[6] Kezlan.T.P, A note on commutativity of semiprime
(2) If there exists a ∈ R such that ˆ a˜ = 0 and
m
PI-rings, Math Japon, 27(1982), 267-268.
am+1 = 0, where m + 1 > n. Let k = m n
. Then
[7] Dai Yuejin, Commutative conditions in some rings,
kn  m  (k + 1)n. J.of Math.(PRC). 14(3)(1991), 431-434.
ˆLet b =˜ ak . Then for b, b, y, by condition (A) we have [8] Xie Bangjie, Abstract Algebra, Shanghai Sience
a , y = 0. This shows that akn ∈ Z(R), since there
kn
and Technology Publishing Company, 1982.
is no zero divisor in the center of prime ring, we get
akn = 0. This is contradictive to the hypothesis that [9] Qiu Qizhang, Sufficient condition about com-
am = 0. Thus there is no non-zero nilpotent in R, and, mutativity of semisimple ring, J.of Math.(PRC).
in view of [9], R has no zero divisors. 2(3)(1982), 291-301.
By Lemma 2.11 and Lemma 3.1, we have the fol- [10] Zhu Xiaozhang, Some commutativity conditions
lowing theorem. of baer-semi-simple rings, Acta. Sci. Natur. Univ.
Jilin. 3(1984), 27-34.
Theorem 3.2 If R is a semiprime ring satisfying con-
dition (A), f (t1 , t2 ) satisfies the hypothesis of Lemma [11] Guo Xiuzhan, Two commutativity results for
2.3 and n is a fixed positive integer, then R is commu- semiprime rings, J.Math, Res, Exposition,
tative. 11(3)(1991), 575-578.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 774--779
Copyright@2007 Watam Press

Local-Bandwidth Mean Shift Segmentation of MR


Images Using Nonlinear Diffusion
Dong Huang, Huizhong Qiu and Zhang Yi
Computational Intelligence Laboratory, School of Computer Science and Engineering
University of Electronic Science and Technology of China
Chengdu 610054, P. R. China
E-mail: {donnyhuang, hzqiu,zhangyi}@uestc.edu.cn

Abstract— This paper proposes a data-driven tissue segmen- Thus it’s natural to expect that the regions near boundaries be
tation method for magnetic resonance (MR) brain images using analyzed at finer scales, while other regions can be assigned
nonlinear diffusion and mean-shift algorithm in the joint spatial- with coarser scales. For the image data processing (non-
range domain. The quality of tissue segmentation is improved
by assigning local bandwidth to image pixels for the mean- gaussian distribution) considered above, automatic image-
shift algorithm during the anti-geometric diffusion process. based local scale selection is more suitable.
Experiment results show the good performance of the proposed In this paper, we solve this problem automatically using
method comparing to the traditional fixed-bandwidth mean shift
segmentation. anti-geometric nonlinear diffusion [5]. Instead of adaptive
thresholding method in [5], we use the efficient mean shift
Index Terms— Local Bandwidth, Mean Shift, Image Segmen- algorithm to partition the image into homogenous regions
tation, MR Images, Nonlinear Diffusion
considering both spatial and range similarities. The images
obtained in the anti-geometric diffusion process are used to
I. I NTRODUCTION determine proper local bandwidth for the mean shift segmen-
tation. This approach greatly improve the segmentation quality
Segmentation of the brain structure of magnetic resonance
of tissue regions with different sizes and contrast comparing
imaging (MRI) is an active research topic over a decade. MRI
to the traditional mean shift algorithm.
segmentation is an important image processing step to identify
anatomical areas of interest for diagnosis, the study of lesion The rest of this paper is organized as follows. Section
and disorders, treatment, surgical planning, image registration II introduces some preliminaries of mean shift process and
and functional mapping. discuss its properties in image segmentation. In section III, we
The goal of the segmentation procedure in medical appli- discuss the method of adaptively assigning local bandwidth
cations is to partition an image into significant anatomical for the traditional fixed bandwidth mean shift. Section IV
regions. Various methods have been proposed for this problem. presents the implementation details of the proposed method.
Among them, mean shift algorithm ([1][2]) is one of the Finally, results on both synthesis example and real MR image
most efficient tools. However, in the traditional mean shift segmentation are given in Section V.
segmentation the radius of the kernel window is constant
throughout the image. Local structures, especially the noisy
gray matter regions and tissue boundaries, are not well pre-
served. This leaves difficult work to the following region-
II. M EAN -S HIFT S EGMENTATION
fusion and tissue classification process. Traditional mean shift
algorithm presents no solution to this problem itself. In [3]
and [4] variable bandwidth selection methods are used, where
bandwidth selection is carried out by estimating fixed band- This section first introduces mean shift segmentaion in
width mean-shift using linearly (or logistically) spaced scale the joint spatial-range domain and discusses the how the
parameters, and checking the clustering stability measured by bandwidth parameter affects the boundary preserving property.
Jensen-Shannon divergence. In addition, these methods are Denote X = {xi ∈ Rd , i = 1, · · · , N } as a data set in the
based on the assumption of local gaussian distribution of high d-dimensional space Rd . The kernel density estimator with
dimensional data space. Gaussian kernel and a symmetric positive definite bandwidth
Note in MRI, the tissue regions are characterized by pixel matrix H, computed at the point x is given by
intensities originated from spatially differentiated MR signals.
And gray matter (GM), white matter (WM), and cerebrospinal
fluid (CSF) regions have practically uniform image intensities 1 N  1 
2
fˆ(x) = exp − d (x, xi , H) ,
but are of different size and shape throughout the whole brain. N
2πH
1/2 i=1 2
This work was supported by National Science Foundation of China under
Grant 60471055 and Specialized Research Fund for the Doctoral Program of
Higher Education under Grant 20040614017. where d2 (x, xi , H) = (x − xi )T H−1 (x − xi ) is the Maha-

774
lanobis distance. Mean shift vector can be computed as starting from the same locations/ in the image, converge to
 1  different modes (denote by ” ”) in the joint spatial-range

N
xi exp − d2 (x, xi , H) domain using different “global” bandwidth parameters. The
i=1
2 resulted mean shift segmentations are displayed in Fig. 2 (e)-
m(x) = −x

N  1  (f). It can be seen that the smaller plate in the bottom-right
exp − d2 (x, xi , H) corner of the original image fail to be segmented as a separate
i=1
2 patch in Fig. 2 (e). Meanwhile in Fig. 2 (f) the smaller plate
∇fˆ(x) is successfully segmented, but trajectories starting from pixels
= H . of the bigger plate are attracted to large numbers of separated
fˆ(x)
modes. The attraction basins of these modes are difficult to be
Assume now that the data points xi (i = 1 · · · N ) are the fused into a meaningful region because, practically, distances
generalized pixels of the input image. The joint spatial-range among them are too large compared to their kernel bandwidth
representation of the image can be formulated as xi = parameters.
(s) (r) (s)
((xi )T , (xi )T )T , i = 1 · · · N , where xi is the spatial
(r)
coordinates, and xi is the range information. r = 1 for
gray level image, r = 3 for color images, or r > 3 for the
multispectral case. Here we will assume that the bandwidth
matrix is diagonal for both spatial and range parts, i.e., H =
2 2
diag{σ(s) , σ(r) }.. Using these notations, the mean shift vector
can be expressed as
m(x) = −x +
N 
x(s) − x(s)
2
x(r) − xi
2 
(r)
xi exp − 2
i
− 2
(a) (b)
i=1
2σ(s) 2σ(r)
.

N 
x(s) − x(s)
2
x(r) − xi
2 
(r)
exp − 2
i
− 2
i=1
2σ(s) 2σ(r)
Observe that the translation of the kernel according to the
mean shift vector leads to a local mode of the density. By
running the procedure for all j (= 1, · · · , N ), each data point
is associated to a local mode in the joint spatial-range domain.
After the mean shift process for all the pixels, the gray level of (c) (d)
each mode is assigned to all pixels that converge to this mode
(pixels referred to as attraction basin), the image is segmented
as homogeneous patches. Bandwidth plays a very important
role in this process. Inappropriate bandwidth may fail to lead
the mean shift trajectories to converge to significant objects or
regions in the image.
In traditional mean shift algorithm, a fixed bandwidth matrix
is used for all data points. And by varying the bandwidth
parameters, we have a multi-scale representation of the image. (e) (f)
Fig.1 shows a synthesis image with patches of different sizes Fig. 2. The performance of the fixed bandwidth mean shift segmentation on
and contrast. Fig.2 shows the behavior of the mean shift algo- the synthesis image. (a) and (c) show mean shift trajectories of some pixels
in the upper-left white square in Fig. 1 (σ(s) = 4, σ(r) = 30) and (e) the
segmentation result; (b) and (d) show mean shift trajectories of some pixels
in the bottom-right white square in Fig. 1 (σ(s) = 6, σ(r) = 50) and (f) the
segmentation result.

In medical image segmentation, one wants to segment tissue


regions of different contrast and sizes in the present of strong
noise. Traditional fixed bandwidth mean shift can not do a
good job. To cluster pixels into significant tissue regions, it’s
expected that a set of appropriate local bandwidth parameters
Fig. 1. A synthesis image (30 × 30) with significant patches of different
sizes and contrast. is assigned to these regions. Manual bandwidth selection is
difficult. And presently there’s no efficient method to auto-
rithm on the image in Fig. 1 with different “global” bandwidth matically assign local bandwidth for image segmentation.
parameters. In Fig.2 ((a)-(b) and (c)-(d) are the regions in the Mean shift segmentations with fixed bandwidth of a MR
white squares in Fig. 1 respectively), the mean shift trajectories image of human brain is shown (Fig 3). Fig.3(b) results in

775
many trivial patches. Since pixels are spaced in uniform grid, where Ix ,Iy are the first derivatives of the image along axis
mean shift process with constant bandwidth may stop in large x and y respectively, and ∇I = (Ix , Iy ) is the local spatial
coherent regions due to the round-off effect. However, in Fig.3 gradient of the image.
(c), patches are more homogeneous within tissue boundary, The linear scale-space
/ → representation / of the image is
but some boundaries are not well preserved. In other word, I(−→
x , t) = I(− →
x ) G(− x , t), where is the convolution
operator, and G(− → 1 −
→ − 2
mean shift processes with fixed bandwidth throughout the x , t) = 4πt e x
/4t denotes the gaussian
whole image (Fig.3(b) and (c)) can not obtain homogenous kernel). The linear scale-space representation satisfies the
segmented regions of different sizes simultaneously. linear heat diffusion equation
∂I(− →x , t)
= div(∇I)
∂t
= Iξξ + Iηη ,

where I(− →x , 0) = I(−



x ). Keeping only the tangential diffusion
yields the well-known anisotropic geometric heat flow [8],
which diffuses along the boundaries of image features but
not across them. Anisotropic diffusion and the use of partial
differential equations for image restoration and denoising has
(a) (b) become a rather mature field. Mean Curvature Motion and
Total Variation (TV) Flow are of such nonlinear diffusion.
These model are well known for its ability to denoise images
while maintaining sharp edges. These models are actually
filtering the image [6]. And segmentation of the MR images
in to tissue regions are generally done by threshold method
[5] or linking through scale [7].
Note that image diffusion under these flow presents no
reliable information to decide the local scale of homogeneous
(c) regions, in which case diffusion across the image edges is
Fig. 3. Fixed-bandwidth mean shift segmentations of a MR image of human preferred. If, instead, we omit the diffusion term along the
brain (60 × 60). (a) is the original image; (b) the segmented result with tangential direction and keep the term along the normal
σ(s) = 4, σ(r) = 15; (c) the segmented result with σ(s) = 8, σ(r) = 30. diffusion, the complementary diffusion model of the geometric
heat flow is obtained, which is referred to as the anti-geometric
heat flow [5]:
III. O BTAIN L OCAL BANDWIDTH THROUGH Ix2 Ixx + 2Ix Iy Ixy + Iy2 Iyy
∂I
A NTI -G EOMETRIC H EAT F LOW = , (1)
∂t Ix2 + Iy2
In this section we discuss the properties of Anti-Geometric
Diffusion, then present the method on how to obtain local where I(t) is the diffused image (t ∈ R+ ), and I(0) is the
bandwidth through the diffusion process. original image.
The nonlinear diffusion filtering is designed to utilize locally Next, we discuss how to use the anti-geometric diffusion
geometric properties to achieve anisotropic changes in images. process described above to adaptively assign local bandwidth
For 2-D image I(− →
x ), −

x = (x, y), x and y denote the horizon for each pixel. Fig 5 (a)-(d) show the diffused images and
and vertical directions respectively. Let η be the direction intensity changes from the original image through the diffusion
orthogonal to the local gradient, and ξ be the tangent direction process. Fig 5(a)-(d) are difference images (normalized to
(see Fig. 4). [0,1] for display purpose) between the diffused images and
the original image corresponding to Fig 5 (a)-(d).
Early in the diffusion process, only the intensity values
of pixels near object boundaries change significantly. As the
diffusion proceeds, its more global averaging effects spread
intensity changes of pixels further away. If we wait long
enough for diffused intensities of pixels far away from object
boundaries to differ from their original intensities, then inten-
sities near boundaries of smaller features may dissolve into
Fig. 4. The local coordinate system at the image boundary. nearby regions due to the prolonged diffusion. So we propose
to assign bandwidth parameter to a given pixel as soon as
The local orthogonal coordinate system (η, ξ) can be written it meet the predefined criteria (i.e., the diffused and original
as: intensities differ significantly), and maintain this assignment
(Ix , Iy ) (−Iy , Ix ) as the diffusion proceeds, then run the diffusion as long as
η= , ξ= ,

∇I

∇I
necessary to assign pixels far away from region boundaries.

776
(a) (e) (a) (b)

(b) (f) (c) (d)


Fig. 6. The process of local bandwidth assignment. (a)-(d) show bandwidth
assignment process with pixel gray level displayed here corresponding to
relative values of local bandwidth. (normalized to [0,1])

Fig. 6 (a)-(d) show the bandwidth assignment process.


Assignment first takes place in regions near boundaries, then
spreads to inner regions. The gray scale of image displayed
in Fig. 6 is proportional to the relative value of the assigned
(c) (g) bandwidth. It is important that not every pixel gets classified
at the same time. Once a a pixel is classified, it continues
to diffuse, but is never re-assigned. This process is stopped
when bandwidth of all interested pixels in the MR image are
assigned. To keep the analysis resolution in the range domain
consistent with that in spatial domain, the bandwidth in the
range domain σ(r) are calculated from σ(s) using the method
in [9].
To sum up, the proposed method consists of two major parts:
(d) (h) bandwidth assignment and mean shift segmentation. First, the
Fig. 5. The anti-geometric diffusion process. (a)-(d) are images in diffusion image values at each pixel location are actually diffused for
process (at t= 1, 4, 20, 60 respectively); (e)-(h) are corresponding difference one time step t to yield new diffused image values. Second,
images (normalized to [0,1]) between the diffused images and the original
image.
Check the monotonicity of the pixels changes, then assign the
local bandwidth H to the pixels according to scale parameter t
2
(t = 2σ(s) , and σ(r) is calculated from σ(s) using the method
In this manner, we are utilizing entire family of images in [9] ), when specified criteria is met. Once a pixel has been
generated by the anisotropic diffusion model. This method assigned its bandwidth, its classification is maintained. The
is effective because pixels in regions with high details (i.e., authors preferred stop condition is that all pixels in the Region
boundaries) change intensity relatively quickly during diffu- Of Interest (ROI) in the image have been assigned. Run local
sion, are therefore assigned at finer scales. Pixels in low-detail bandwidth mean shift on the original image, relate all pixels to
regions change intensity slowly, are assigned much later, and the modes. Merge the attraction basins where the distances of
are therefore classified at coarser scales. their modes in the joint spatial-range domain are smaller than
To increase robustness of bandwidth assignment to noise, the windows specified by the local bandwidth parameters.
one should check the intensity changes of a pixel after the
diffusion is monotonic at that pixel. Note that an isolated bright IV. E XPERIMENTAL R ESULTS
pixel of noise on a dark background (but near the boundary of In this section, the proposed method is applied on both
a large bright region) will initially decrease by a huge amount synthesis and real MR brain images.
as the noise is smoothed away. Soon, the pixels intensity will Fig.7 shows the results of the bandwidth assignment and
steadily increasing due to the diffusion of the large bright the local bandwidth mean shift algorithm on the synthesis
region nearby. Thus, assignment based upon absolute intensity image mentioned in Fig.1. The set of bandwidth parameters
changes would erroneously classify the noise. (Fig.7 (a)) is adaptively assigned during the anti-geometric

777
Analysis at Massachusetts General Hospital and are available
at http://www.cma.mgh.harvard.edu/ibsr/. The results of our
method is compared to fixed bandwidth mean-shift and manual
segmentation by experts.
Fig 8 (a) is the original image, and Fig 8 (f) is the
manual segmentation by expert obtained from the Center for
Morphometric Analysis at Massachusetts General Hospital.
We first segment the MR image using is the fixed bandwidth
(a) (b) mean-shift (σ(s) = 8, σ(r) = 30) (Fig 8 (b)). Notice that
Fig. 7. The performance of the proposed method on synthesis image in Fig. despite favorable boundary preserving ability in most large
1. (a) the bandwidth obtained by anti-geometric diffusion (normalized to [0,1] area of white matter, the tissue boundaries in in the white
for display purpose); (b) the final segmentation result of the proposed method. circle are destroyed because size of the local regions are not
considered in the fixed bandwidth mean shift process. Next,
the local bandwidth set is assigned for the image during the
diffusion process. After local bandwidth mean-shift and fusion
anti-geometric diffusion process (see Fig 8 (c)). And the mean-
of the attraction basins according to the bandwidth, the final
shift with adaptive local bandwidth is applied to segment the
segmentation result of the proposed method is shown in Fig.7
image (See Fig 8 (d)). The final segmentation result (Fig 8(e))
(b). It can be seen that all the significant regions (the plates
is obtained by fusing the attraction domain of the mean shift
and rings) are segmented using the local bandwidth mean shift.
modes within the range of their local bandwidth. Compared
to the fixed bandwidth mean-shift (Fig 8 (b)), Fig 8 (e) shows
the improved performance of the proposed method.

V. C ONCLUSION
In this paper, we proposed a data-driven tissue segmen-
tation method based on the mean shift algorithm, in which
local bandwidth parameters are automatically chosen using
anti-geometric diffusion. The incorporation of local image
information enables our method to partition the image into
homogenous regions of various sizes and contrast. In the
(a) (b)
future work, if the shape information is considered during
the parameter assignment process, our method may also be
developed into anisotropic versions. And we believe these
variations will further improve the segmentation performance
on asymmetry image regions.

R EFERENCES
[1] D. Comaniciu, P. Meer, “Mean Shift: A Robust Approach Toward
Feature Space Analysis”, IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 24, no.5, pp. 603-619, 2002.
(c) (d) [2] J. R. Jimnez-Alaniz, V. Medina-Banuelos, O. Yanez-Suarez, “Data-
Driven Brain MRI Segmentation Supported on Edge Confidence and
A Priori Tissue Information”, IEEE Trans. Medical Imaging, vol. 25,
no.1, pp 74-83, 2006
[3] D. Comaniciu, “An Algorithm for Data-Driven Bandwidth Selection”,
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no.2,
pp. 281-288, 2003
[4] K. Okada, D. Comaniciu, and A. Krishnan, “Robust Anisotropic Gaus-
sian Fitting for Volumetric Characterization of Pulmonary Nodules in
Multislice CT”, IEEE Trans. Medical Imaging, vol. 24, no.3, pp 409-
423, 2005
[5] S. Manay, A. Yezzi, “Anti-Geometric Diffusion for Adaptive Thresh-
(e) (f) olding and Fast Segmentation”, IEEE Trans. Image Processing, vol. 12,
Fig. 8. The performance of the proposed method. (a) is the original image; no.11, pp 1310-1323, 2003
(b) the fixed bandwidth mean-shift segmentation (σ(s) = 8, σ(r) = 30); (c) [6] G. Gerig, R. Kikinis, O. Kubler and F.A. Jolesz , “Nonlinear Anisotropic
the bandwidth obtained by anti-geometric diffusion (normalized to [0,1] for Filtering of MRI Data”, IEEE Trans. Medical Imaging, vol. 11, no.2,
display purpose); (d) the local bandwidth mean-shift segmentation; (e) the pp 221-232, 1992
final segmentation result from (d); (f) the manual segmentation by expert. [7] A. Petrovic, O. D. Escoda and P. Vandergheynst, “Multiresolution
Segmentation of Natural Images: From Linear to Nonlinear Scale-Space
Representations”, IEEE Trans. Image Processing, vol. 13, no.8, pp 1104-
Next, we show the performance of the proposed approach 1114, 2004
on the real MR brain image. Here, T1-weighted brain scan [8] P. Perona, J. Malik, “Scale-space and edge detection using anisotropic
diffusion”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol.
of a male subject is segmented using the local bandwidth 12, no.7, pp 629-639, 1990
mean shift algorithm. The MR brain data set and their manual [9] J. Wang, B. Thiesson, Y. Xu, M. Cohen, “Image and Video Segmentation
segmentation were provided by the Center for Morphometric by Anisotropic Kernel Mean Shift”, ECCV, 2004

778
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 779--783
Copyright@2007 Watam Press

On the Central Limit Theorem of Markov Chains in Markovian


Environments for ϕ∗ -mixing Stochastic Sequences ∗

Chen Neiping1 Yang Gang1,2

1.Information Department,Hunan Business College,Changsha,410205,China

2.School of mathematical science and computing technology,CSU,Changsha,410075,China

Abstract A sufficient condition on Markov of stochastic process by virtue of the ϕ∗ −


chains in Markovian environments being ϕ∗ − mix- mixing property.
ing is given, and the functional central limit the- Let {ξn }∞
0 be an arbitrary stochastic
orem related to this process is established. It is sequence. For simplicity, we write ξ =
∞ m 
a very important theory that is widely applied to {ξn }∞ m
0 = ξ0 ,ξk = {ξn }k . We call ξ = ξ0

speech recognition based on Markov model and is ϕ∗ − mixing, if there exists a nonnega-
Artificial neutral network recently. tive real valued function ϕ(k), k ≥ 0 such
KeyWords: ϕ∗ − mixing; Central Limit Theo-
that
rem; Markov Chains in Markovian Environments
(1)limk→∞ ϕ(k) = 0,
(2)For ∀k, n ≥ 0, ∀A ∈ σ(ξ0k ), B ∈
2000 MR Subject Number: 37A30 ∞
σ(ξk+n ), we have

|P (AB) − P (A)P (B)| ≤ ϕ(n)P (A)P (B)


1 Introduction
In this paper we study the ϕ∗ − mixing
The central limit issue of stochastic pro- property of a stochastic process on Markov
cess is very important in probability the- chains in Markovian environments, and
ory.The most classical research approach obtain some related central limit results.
to study the central limit issue is by virtue Suppose that (X, D) and (Θ, Σ) are two
of the ϕ− mixing property of stochastic arbitrary measurable spaces, {P (θ), θ ∈
process. Since a stochastic process is ϕ∗ − Θ} is a transition probability family on
mixing, evidently, it is ϕ− mixing. Hence (X, D), and, ∀A ∈D, P (θ; x, A) is Σ × D-
it is natural to study the central limit issue measurable. X  = {Xn }n≥0 is a stochas-
∗ This work is supported by the Soft Science Foundation of Hunan (No.: 2006ZK3028)and Education Organization

Foundation of Hunan(No.: 05C562), China.


tic sequence taking values in X, and so is Based on the tilting process[3] and
ξ = {ξn }n=−∞ in Θ. If X
∞  and ξ satisfy the ϕ − mixing property, Cogburn[2] studied

following conditions: for ∀A ∈ D,n ≥ 0, it the related central limit issue of Markov
follows that process in stationary environments. While
 = P (ξn ; Xn , A), it is based on the Markov bichain and
P (Xn+1 ∈ A|X0 , ···, Xn; ξ)
ϕ∗ − mixing property in this paper that
a.e. we carry on our research work. That is to
say, we set X  on Markov bichain (X, 
 ξ),
P (X0 ∈ A|ξ) = P (X0 ∈ A|ξ0 ), a.e.
−∞ 
and give a sufficient condition as that X

then we call X a Markov chain in ran- is ϕ∗ − mixing by Markov bichain (X, 
 ξ).
 
dom environments ξ, and ξ is a random And then some central limit results of X 

environmental sequence. Especially, if ξ are obtained. Therefore, we give different
is a Markov process, then X  is said to
sufficient condition satisfying central limit
be a Markov Chain in Markovian environ- theorem even if the Markovian condition
ments. At the same time (X,  ξ) is Marko-
were added to the environmental process
vian as a bivariate process, so we call it in [2]. It should be noted that more and
Markov bichain [1]. more people in the fields of engeering focus
In this paper we only study the case on the theoretical application to speech
in which the state space X and environ- recognition based on Markov model and
mental space Θ are all countable. For Artificial neutral network recently[6].
convenience, we suppose that X and Θ In section 2 we will present the main
are nonnegative integral sets. In this results and proofs.
case we denote the transition probabil-
ity of Markovian environmental sequence
ξ by K(θ, θ ), θ, θ ∈ Θ, and of Markov 2 Main results and proofs
 

bichain (X,  by Q(·, ·). Then for ∀θ, θ ∈


 ξ)
Θ, x, y ∈ X , it follows that In this section we will give a sufficient
  condition under with a Markov chain in
Q(θ, x; θ , y) = K(θ, θ ) · P (θ; x, y)
Markovian environments is ϕ∗ − mixing. It
This formula sets up the relation- follows that a sufficient condition that this
ship among the transition probability of process satisfies central limit theorem can
Markov bichain, environmental transition be obtained.
probability and transition function family, Lemma 1 : If environmental se-
which makes it possible to study Markov quence ξ is a Markov Chain taking values
chain X in Markovian environments em- in Θ ,and X is a stochastic sequence taking
ploying Markov bichain (X,  It is the
 ξ). values in X, then the sufficient and neces-
basic idea in this paper. sary condition that X  is a Markov chain


in Markovian Environments ξ is bivariate (ηn )n≥0 is a stationary and ergodic Markov
process (X,  = {Xn , ξn }∞ is a Markov
 ξ) chains with stationary distribution π(·),
0
chain with transition probability Q(·, ·), and transition probability P (·, ·). If there
and exits a constant number λ, 0 < λ < 1 such
  that the following inequality holds
Q(θ, x; θ , y) = K(θ, θ ) · P (θ; x, y),
P (x, y) − π(y)

sup ≤λ (3)
θ, θ ∈ Θ, x, y ∈ X. π(y)
x,y∈X
Cogburn [1] presented the conclusion
in Lemma 1 without proof. The details for then (ηn )n≥0 is ϕ∗ − mixing, and there ex-
the proof can be found in [5].This lemma ists a constant number C > 0, such that
set up the relationship among the Markov ϕ(n) = C · λn .
To prove Markov bichain (X,  ξ) is ϕ∗ −
bichain transition probability, the envi-
ronmental transition probability and the mixing, we only need to show that its
transition function family, hence it sup- transition probability P (·, ·) satisfies for-
plied with key evidence for us to study mula (3). Since supω∈Θ K(ω, θ) ≤ (1 +
the Markov Chains in Markovian Envi- α) inf ω∈Θ K(ω, θ), supω,x P (ω; x, y) ≤ (1 +
ronments boiling down to studying of the β) inf ω,x P (ω; x, y), hence
 
Markov bichain.
sup K(ω, θ) · sup P (ω; x, y)
Lemma 2 : Suppose that Markov ω ω

bichain (X,  is a stationary ergodic chain


 ξ) ≤ (1 + α)((1 + β)) inf K(ω, θ) · inf P (ω; x, y)
ω ω
with stationary distribution π(·), if there
i.e.
exist two real numbers α, β, 0 < α, β < 1
, and λ = β + α(1 + β) < 1 such that, for sup Q(ω, x; θ, y) ≤ (1 + λ) inf Q(ω, x; θ, y)
ω,x ω,x
∀θ ∈ Θ, ∀y ∈ X
Moreover since π(·) is a stationary dis-
sup K(ω, θ) ≤ (1 + α) inf K(ω, θ) (1)
ω∈Θ ω∈Θ tribution of Q(·, ·), for ∀(θ, y), it follows
that inf ω,x Q(ω, x; θ, y) ≤ π{(θ, y)} ≤
sup P (ω; x, y) supω,x Q(ω, x; θ, y), so
ω∈X,ω∈Θ

≤ (1 + β) inf P (ω; x, y) (2) Q(ω, x; θ, y) − π{(θ, y)}
ω∈X,ω∈Θ sup
π{(θ, y)}
(ω,x),(θ,y)
  ∗
then Markov bichain (X, ξ) is ϕ − mix-
supω,x Q(ω, x; θ, y) − inf ω,x Q(ω, x; θ, y)
 . And there exists a constant
ing, so is X ≤ sup

n (ω,x),(θ,y) inf (ω,x) Q(ω, x; θ, y)
number , such that ϕ(n) = C · λ .
≤λ
Proof. By the definition of ϕ∗ −
mixing and the stationary and ergodic of Hence transition probability Q(·, ·) of the
Markov chains we can obtain the follow- Markov bichain satisfies formula (3),so
ing facts without difficulty: suppose that (X,  is ϕ∗ − mixing.
 ξ)


In addition, by the definition of ϕ∗ − real number α, β, 0 < α, β < 1, and
mixing we can deduce directly the follow- λ = β + α(1 − β) < 1 such that, for
ing facts: if a stochastic process (ηn )n≥0 is ∀θ ∈ Θ, ∀y ∈ X
ϕ∗ − mixing and f (·) is an arbitrary mea-
sup K(ω, θ) ≤ (1 + α) inf K(ω, θ) (4)
surable mapping, let yn = f (ηn ), n ≥ 0 , ω∈Θ ω∈Θ

then stochastic process (yn )n≥0 is also ϕ∗ − sup P (ω; x, y) ≤ (1+β) inf P (ω; x, y)
ω∈X,ω∈Θ
mixing. By setting f (x, θ) = x, θ ∈ Θ, x ∈ ω∈X,ω∈Θ
(5)
X evidently, f (·) is a measurable mapping, [nt]
 = f ((X,  ,it follows that X
 ξ))  is LetYn (t) = √1nσ Σk=0 (Xk − EX0 ), 0 ≤ t ≤
and X
D
ϕ∗ − mixing and ϕ(n) = C · λn , C > 0. 1, ω ∈ Ω, then Yn → W . where σ 2 =

This ends the proof of this lemma. E(X0 −EX0 )2 +2 ∞ k=1 E(X0 −EX0 )(Xk −
EXk ).
Lemma 3 Suppose that station-
Proof By the definitions of ϕ∗ − mix-
ary process (ηn )n≥0 is ϕ−mixing and
 1 ing and ϕ− mixing it is easy to find that
n ϕ(n) < ∞, Eη0 = 0, Eη0 < ∞. Let
2
2
 [nt] a ϕ∗ − mixing process must be ϕ− mixing.
Yn (t, ω) = √1nσ k=0 ηk (ω),0 ≤ t ≤ 1, ω ∈
 It is not difficult to verify X  satisfies the
Ω. If σ 2 = Eη02 + 2 ∞ k=1 E(η0 ηk ) > 0
D condition of Lemma 3 by the condition of
, then Yn → W .Where [x] is the great-
this theorem and the conclusion of Lemma
est integer less than or equal to real num-
D 2.Using the Lemma 3 the proof is immedi-
ber x,W is a Wiener measure, → and de-
ately done.
notes converge in distribution, Ω is a sam-
We should note that the hypothesis
ple space.
(4) and (5) in the theorem seem stronger,
The above lemma directly come from while the hypothesis acting on the tran-
Theorem 20.1 in the literature[4].By appli- sition probability of environment ξ and
cation of this lemma, we obtain immedi- the transition function family of X  respec-
ately the following central limit theorem tively are very concise to verify, most im-
of Markov chains in Markovian environ- portant of all, such process classes satis-
ments. fying the hypothesis exist out of question.
Theorem X  is a Markov chain in the We take for a simple example to illustrate.
Markovian environments ξ,  and the tran- Example Let Θ = {1, 2, 3, · ·
sition probability of ξ is K(θ, θ ), θ, θ ∈
 
·}, K(i, j), i, j ∈ Θ , is a transition prob-
Θ.The transition function family of X  ability on Θ and satisfies the following
 
is P (θ; x, y), θ ∈ Θ, x, y ∈ X. Sup- conditions: K(i, j) = K(i , j), ∀i, j , j ∈
pose that transition probability matrix Θ. Suppose that X is an arbitrary finite

Q(θ, θ ) of the Markov bichain (X, 
 ξ) set, and P (x, y), x, y ∈ X is a transition
has stationary distribution π(·), and probability on X satisfying the following
  2
i j j π({(i, j)}) < ∞. If there exist conditions:A, B are all constant, 0 < A ≤


P (x, y) ≤ B < 1, ∀x, y ∈ X, β = B−A A
< Markovian Environments, The An-
1. Define a transition probability family nals of Probability, 8(5)(1980), 908-
{P (θ, ·, ·) : θ ∈ Θ} on X by P (i; x, y) = 916.
P (i) (x, y),i ∈ Θ, x, y ∈ X. Where
 [2] Cogburn R., On the Central Limit
P (x, y) =
(i)
z P (x, z)P
(i−1)
(z, y), i ≥
Theorem for Markov Chains in Ran-
1. By the property of P (x, y), x, y ∈
dom Environments, The Annals of
X, we see that transition function family
Probability, 19(2)(1991), 587-604.
{P (θ, ·, ·) : θ ∈ Θ} has following property:

A ≤ P (i; x, y) ≤ B, ∀i ∈ Θ, x, y ∈ X [3] Cogburn R., The Ergodic theory of


Markov Chains in Random Environ-
Let β = (B − A)/A, 0 < α < (1 − β)/(1 + ments, Probability Theory and Re-
β), then it follows that K(i, j), i, j ∈ Θ lated Fields, 66(1)(1984),109-128.
and transition function family {P (θ, ·, ·) :
θ ∈ Θ} satisfy the following formula: [4] Billingsley P., Convergence of Prob-
ability Measures, New York:Wiley.
sup K(i, j) ≤ (1 + α) inf K(i, j), ∀j ∈ Θ (1968).
i∈Θ i∈Θ

sup P (i; x, y) ≤ (1+β) inf P (i; x, y) [5] Wang Hanxing, The Exponential Esti-
i∈Θ,x∈X i∈Θ,x∈X
mation of Probability on Small Cylin-
∀y ∈ X. Hence they satisfy the hypoth-
drical set for Markov Chain Ran-
esis (4) and (5) of the above theorem. If
dom Environment, Acta Sci Nat Univ
environmental space and state space are fi-
Norm Hunan, 18(3)(1995),(in Chi-
nite, naturally the hypothesis of the above
nese).
theorem can be satisfied easily.
[6] Chen Jiguang ,ZhuLingde, Markov
References Model for the Deformation Forecast
Based on Artificial Neural Network,
[1] Cogburn R., Markov Chains in Ran- Computer Engineering and Applica-
dom Environments: the case of tions, 6(2006),225-226,(in Chinese).


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 784--787
DCDIS A Supplement,
Dynamics Advances in Discrete
of Continuous, Neural Networks, Vol. 14(S2) 784--787
and Impulsive Systems
Copyright@2007 Watam Press
Copyright@2007 Watam Press
Series B: Theory and Applications
Special Volume: Advances in Neural Networks–Theory and Applications
Copyright c 2007 Watam Press

Lower Bounds and Existence Conditions of the Solution for the


Perturbation Generalized Lyapunov Equations1
Dong-Yan Chen Ling Hou2 Jun-Fang An

Department of Applied Mathematics, Harbin University of Science and Technology, Harbin Heilongjiang 150080, China

AMS subject classifications: 34K35,34H05,49J25,

Abstract: In this paper, concept of the perturbation


generalized Lyapunov equations is introduced, some es-
timation of lower bounds of the symmetric positive
definite matrix solution for perturbation generalized 2 Lower bounds of the solu-
Lyapunov equations are presented, and several lower
bounds of matrix solution are given for a class of un- tion matrix for PGLE
certainty structure. At the same time, some existence
conditions of symmetric positive definite matrix solu- Define the region Ω of the complex plane as
tion are shown for perturbation generalized Lyapunov
equations.
Ω = {(x, y)|β0 + β1 x + β2 y < 0}
where β0 ,β1 ,β2 ∈ R and β12 + β22 = 0
1 Introduction
It is well known that stability is an important charac- Then we have the following result for the problem
teristic of controlled systems and the most fundamen- of root clustering.
tal requirement for controlled system design. There-
fore, during the past several decades stability analysis Lemma 1[10] All eigenvalues of A ∈ Rn×n are
for linear system has been a hot topic and various ap- located inside Ω if and only if for any given positive
proaches have been proposed for the stability testing definite Hermitian matrix Q there exists a unique ma-
problem wherein the stability region is the left half of trix P > 0 such that
the complex plane for continuous systems and the in-
1
terior of unit disc for discrete systems. Among those c0 P + c1 AT P + c2 P A = − Q (2.1)
2
approaches, Lyapunov theory may be the most use-
ful one, and continuous and discrete Lyapunov type where c0 = β0 , c1 = 12 (β1 + iβ2 ), c2 = 12 (β1 − iβ2 ), and
equations are usually utilized to solve the above prob- (·)T denotes the transpose of matrix (·).
lem. Furthermore, as mentioned in [1], the solutions
bounds of the Lyapunov equations can also be applied Equation (1) is called generalized Lyapunov equa-
to solve many control problems including stability anal- tions (GLE).
ysis, root clustering and determination of the size of
the estimation error for multiplicative systems. So far The solution’s bounds of GLE has been discussed
a lot of results have been obtained to estimate the solu- by Chien-Hua Lee and Su-Tsung Lee[9] . Also in [9],
tions bounds of the continuous and discrete Lyapunov they gave the upper and lower bounds of the solution
equations[2-8]. In [9], Lee C.-H. and Lee S.-T. proposed matrix and the upper bounds of maximal eigenvalues
and discussed the solutions bounds of generalized Lya- of solution matrix to GLE.
punov equations (GLE). By extending the methods de-
veloped by [6] and [8], new upper and lower bounds for In this paper, we consider the estimation problem
the solution of GLE are obtained, and it is shown that of the solution matrix under the condition that matrix
the majority of these existing bounds are the special A is with uncertainty ΔA in equation (1), that is,
cases of their results.
In this paper, the concept of perturbation general- 1
c0 P + c1 (A + ΔA)T P + c2 P (A + ΔA) = − Q (2.2)
ized Lyapunov equation (PGLE) is introduced accord- 2
ing to the definition of GLE, the solutions bounds of
PGLE in a certain perturbation structure are obtained Equation (2) is called perturbation generalized
by means of the approach for GLE, and the existence Lyapunov equations(PGLE).
conditions of the solution for PGLE are presented.
1 Supported by National Natural Science Foundation of P.R.China (10471031)
2 Author for correspondence: Ling Hou, E-mail: ling1037@sina.com

 
Assumption (a): All eigenvalues of A + ΔA ∈ R^{n×n} are located inside Ω.

Since all eigenvalues of A + ΔA ∈ R^{n×n} are located inside Ω, by Lemma 1 the PGLE (2.2) must have a unique solution matrix P > 0. Let

U = c0 I + 2c2 (A + ΔA)

Then the PGLE (2.2) can be rewritten as

U^H P + P U = −Q

where (·)^H denotes the conjugate transpose of a matrix.

Note that U = (c0 I + 2c2 A) + 2c2 ΔA. If we let Ã = c0 I + 2c2 A and ΔÃ = 2c2 ΔA, then U = Ã + ΔÃ. Thus the PGLE (2.2) can be rewritten as the perturbation Lyapunov equation

(Ã + ΔÃ)^H P + P (Ã + ΔÃ) = −Q    (2.3)

Therefore, the bounds of the solution matrix of the PGLE (2.2) can be obtained by estimating the bounds of the solution matrix of the perturbation Lyapunov equation (2.3).

Assumption (b): In the PGLE (2.2), the uncertain matrix ΔA satisfies a norm bounded uncertainty, that is,

ΔA = D F(t) E

where D, E are known matrices of suitable dimensions, and F(t) is an uncertainty matrix with Lebesgue measurable elements satisfying

F^T(t) F(t) ≤ I

where I is the unit matrix of suitable dimension.

Under the condition of Assumption (b), the uncertainty matrix ΔÃ in the perturbation Lyapunov equation (2.3) can be rewritten as

ΔÃ = 2c2 ΔA = 2c2 DFE ≜ D̃FE,  or  ΔÃ = 2c2 ΔA = 2c2 DFE ≜ DFẼ,

where D̃ = 2c2 D and Ẽ = 2c2 E.

Lemma 2 [11] For any matrices A, B ∈ C^{n×n}, the following relation is true:

A^H B + B^H A ≤ α² A^H A + (1/α²) B^H B

where α ∈ R, α ≠ 0.

Lemma 3 [12] Suppose that A, D, E and F are matrices of suitable dimensions and F^H(t)F(t) ≤ I. Then
1) for any symmetric positive definite matrix R > 0 and any positive number ε > 0 satisfying R − εDD^H > 0, we have

(A + DFE)^H R⁻¹ (A + DFE) ≤ A^H (R − εDD^H)⁻¹ A + ε⁻¹ E^H E;

2) for any symmetric positive definite matrix R > 0 and any positive number ε > 0 satisfying εI − ERE^H > 0, we have

(A + DFE) R (A + DFE)^H ≤ εDD^H + ARA^H + ARE^H (εI − ERE^H)⁻¹ ERA^H.

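Lemma 2 is the key inequality behind the bounds below; it follows from (αA − (1/α)B)^H (αA − (1/α)B) ≥ 0. As a quick sanity check (ours, not part of the paper), the sketch below verifies that α²A^HA + α⁻²B^HB − A^HB − B^HA is positive semidefinite for random complex matrices:

```python
# Sketch: numerical check of Lemma 2 for random complex matrices.
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 4, 0.7
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

H = lambda X: X.conj().T
gap = alpha**2 * H(A) @ A + alpha**-2 * H(B) @ B - H(A) @ B - H(B) @ A
# gap = (alpha*A - B/alpha)^H (alpha*A - B/alpha) >= 0 exactly
print(np.linalg.eigvalsh(gap).min() >= -1e-10)
```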

Theorem 1 Under the conditions of Assumptions (a) and (b), a lower bound of the symmetric positive definite solution P of the PGLE (2.2) is

P1 = α[Q − α²M1]^{1/2}    (2.4)

or

P2 = α[Q − α²M2]^{1/2}    (2.5)

if there exist α ∈ R and ε > 0 such that

Q − α²M1 > 0    (2.6)

Q − α²M2 > 0    (2.7)

where M1 = Ã^H(I − εD̃D̃^H)⁻¹Ã + ε⁻¹E^TE and M2 = Ã^H(I − εDD^T)⁻¹Ã + ε⁻¹Ẽ^HẼ.

Proof By the perturbation Lyapunov equation (2.3) and Lemma 2, we have

Q = [−(Ã + ΔÃ)^H]P + P[−(Ã + ΔÃ)] ≤ (1/α²)P² + α²(Ã + ΔÃ)^H(Ã + ΔÃ)    (2.8)

Hence, a lower bound of P can be obtained from inequality (2.8) if an upper bound of the matrix (Ã + ΔÃ)^H(Ã + ΔÃ) can be estimated.

By Lemma 3, for any ε > 0 we have

(Ã + ΔÃ)^H(Ã + ΔÃ) = (Ã + D̃FE)^H(Ã + D̃FE) ≤ Ã^H(I − εD̃D̃^H)⁻¹Ã + ε⁻¹E^TE = M1

By inequality (2.8), it follows that

Q ≤ (1/α²)P² + α²M1

Therefore, P ≥ P1 if there exist α ∈ R and ε > 0 such that inequality (2.6) holds.

On the other hand, by Lemma 3, for any ε > 0 we have

(Ã + ΔÃ)^H(Ã + ΔÃ) = (Ã + DFẼ)^H(Ã + DFẼ) ≤ Ã^H(I − εDD^T)⁻¹Ã + ε⁻¹Ẽ^HẼ = M2

By inequality (2.8), we have

Q ≤ (1/α²)P² + α²M2

Therefore, P ≥ P2 if there exist α ∈ R and ε > 0 such that inequality (2.7) holds. ∎

Theorem 2 Under the conditions of Assumptions (a) and (b), a lower bound of the symmetric positive definite solution P of the PGLE (2.2) is

P3 = (α/√λ1(N1)) [Q − α²I]^{1/2}    (2.9)

or

P4 = (α/√λ1(N2)) [Q − α²I]^{1/2}    (2.10)

if there exist α ∈ R and ε > 0 such that

Q − α²I > 0    (2.11)

where

N1 = ÃÃ^H + ÃE^T(εI − EE^T)⁻¹EÃ^H + εD̃D̃^H
N2 = ÃÃ^H + ÃẼ^H(εI − ẼẼ^H)⁻¹ẼÃ^H + εDD^T

Proof By the perturbation Lyapunov equation (2.3) and Lemma 2, we have

Q = [−(Ã + ΔÃ)^H]P + P[−(Ã + ΔÃ)] ≤ (1/α²)P(Ã + ΔÃ)(Ã + ΔÃ)^HP + α²I    (2.12)

By Lemma 3, for any ε > 0 it follows that

(Ã + ΔÃ)(Ã + ΔÃ)^H = (Ã + D̃FE)(Ã + D̃FE)^H ≤ ÃÃ^H + ÃE^T(εI − EE^T)⁻¹EÃ^H + εD̃D̃^H = N1

Thus, by inequality (2.12), P ≥ P3 if there exists α ∈ R such that inequality (2.11) holds.

Again by Lemma 3, for any ε > 0 we have

(Ã + ΔÃ)(Ã + ΔÃ)^H = (Ã + DFẼ)(Ã + DFẼ)^H ≤ ÃÃ^H + ÃẼ^H(εI − ẼẼ^H)⁻¹ẼÃ^H + εDD^T = N2

Combining this with inequality (2.12), P ≥ P4 if there exists α ∈ R such that inequality (2.11) is satisfied. ∎

3 The existence condition of the solution for the PGLE

The above discussion was developed under the condition of Assumption (a). Next, we show conditions under which Assumption (a) is tenable.

Theorem 3 Suppose that the solution P > 0 of the GLE (2.1) exists. Under the condition of Assumption (b), Assumption (a) is tenable if there exists α ∈ R such that

−Q + (1/α²)PD̃D̃^HP + α²E^TE < 0    (3.13)

or

(1/α²)σ1²(D̃)λ1²(P) + α²σ1²(E) < (1/2)λn(Q)    (3.14)

Proof We assume that the solution P > 0 of the GLE (2.1) exists. Note that

c0 P + c1(A + ΔA)^TP + c2P(A + ΔA) = (1/2)[−Q + 2c1ΔA^TP + 2c2PΔA]

and

−Q + 2c1ΔA^TP + 2c2PΔA = −Q + PD̃FE + E^TF^TD̃^HP ≤ −Q + (1/α²)PD̃D̃^HP + α²E^TE

Hence, if there exists α ∈ R such that inequality (3.13) or (3.14) holds, then the solution P̃ > 0 of the PGLE (2.2) exists and P̃ = P. ∎

Similarly, another sufficient condition for the existence of the solution P̃ > 0 of the PGLE (2.2) can be obtained.

Theorem 4 Suppose that the solution P > 0 of the GLE (2.1) exists. Under the condition of Assumption (b), Assumption (a) is tenable if there exists α ∈ R such that

−Q + (1/α²)PDD^TP + α²ẼẼ^H < 0    (3.15)

or

(1/α²)σ1²(D)λ1²(P) + α²σ1²(Ẽ) < (1/2)λn(Q)    (3.16)


4 Example

Example 1 Consider the matrix A and the uncertain matrix ΔA as follows:

A = [−1.5, 0, 1; 0, −3.4, 0; −1, 0, −1.5],  ΔA = DFE,

D = I3,  E = [1, 0, 1; 0, 0, 0; 1, 1, 1],  F^TF ≤ I3

It is obvious that the eigenvalues of the matrix A are −3.4, −1.5 + i, −1.5 − i, and they are all located inside Ω, where

Ω = {(x, y) | x − y < 0}

Moreover,

β0 = 0, β1 = 1, β2 = −1, c0 = 0, c1 = (1/2)(1 − i), c2 = (1/2)(1 + i), Ã = 2c2A = (1 + i)A,
D̃ = 2c2D = (1 + i)I3, Ẽ = 2c2E = (1 + i)E

Thus, the GLE (2.1) is

(1 − i)A^TP + (1 + i)PA = −Q

Let

Q = [4, 0, 1; 0, 4, 0; 1, 0, 2]

By Theorem 1, we have

M1 = [6.5/(1−2ε) + 2/ε, 1/ε, 2/ε; 1/ε, 23.12/(1−2ε) + 1/ε, 1/ε; 2/ε, 1/ε, 6.5/(1−2ε) + 2/ε]

There exist α = ε = 0.25 such that

Q − α²M1 = [2.6875, −0.25, 0.5; −0.25, 0.865, −0.25; 0.5, −0.25, 0.6875] > 0

Hence a lower bound of the solution matrix of the PGLE (2.2) is

P1 = α(Q − α²M1)^{1/2} = [0.4061, −0.0219, 0.0505; −0.0219, 0.2290, −0.0340; 0.0505, −0.0340, 0.1982]

On the other hand, by Theorem 1, we have

M2 = [6.5/(1−ε) + 4/ε, 2/ε, 4/ε; 2/ε, 23.12/(1−ε) + 2/ε, 2/ε; 4/ε, 2/ε, 6.5/(1−ε) + 4/ε]

There exist α = 0.25 and ε = 0.5 such that

Q − α²M2 = [2.6875, −0.25, 0.5; −0.25, 0.865, −0.25; 0.5, −0.25, 0.6875] > 0

Hence the other lower bound of the solution matrix of the PGLE (2.2) is

P2 = α(Q − α²M2)^{1/2} = P1
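The bound computation in Example 1 is easy to reproduce numerically. The following sketch (our verification, not part of the paper) uses the simplification M1 = 2A^TA/(1 − 2ε) + E^TE/ε, which follows from Ã = (1+i)A and D̃D̃^H = 2I3, checks condition (2.6), and forms P1 via a matrix square root:

```python
# Sketch reproducing Example 1 (our verification, not part of the paper).
import numpy as np
from scipy.linalg import sqrtm

A = np.array([[-1.5, 0, 1], [0, -3.4, 0], [-1, 0, -1.5]])
E = np.array([[1, 0, 1], [0, 0, 0], [1, 1, 1]])
Q = np.array([[4, 0, 1], [0, 4, 0], [1, 0, 2]])

alpha = eps = 0.25
# M1 = A~^H (I - eps D~ D~^H)^(-1) A~ + (1/eps) E^T E
#    = 2 A^T A / (1 - 2 eps) + E^T E / eps  for this example
M1 = 2 * A.T @ A / (1 - 2 * eps) + E.T @ E / eps
G = Q - alpha**2 * M1
print(np.linalg.eigvalsh(G))        # all positive: condition (2.6) holds
P1 = alpha * sqrtm(G)
print(np.round(P1.real, 4))         # matches the matrix P1 printed above
```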
References

[1] T. Mori and I.A. Derese, A brief summary of the bounds on the solution of the algebraic matrix equation in control theory, Int. J. Control, 39(1984), 247-256.
[2] K. Yasuda and H. Hirai, Upper and lower bounds on the solution of the algebraic Riccati equation, IEEE Transactions on Automatic Control, 24(1979), 483-487.
[3] N. Komaroff, Upper bounds for the eigenvalues of the solution of the Lyapunov matrix equation, IEEE Transactions on Automatic Control, 35(1990), 737-739.
[4] N. Komaroff, Diverse bounds for the eigenvalues of the continuous algebraic Riccati equation, IEEE Transactions on Automatic Control, 39(1994), 532-534.
[5] C.H. Lee, Upper and lower bounds of the solutions of the discrete algebraic Riccati and Lyapunov matrix equations, Int. J. Control, 68(1997), 579-598.
[6] C.H. Lee, On the upper and lower bounds of the solutions for the continuous Riccati matrix equations, Int. J. Control, 66(1997), 105-118.
[7] C.H. Lee, Upper and lower matrix bounds of the solutions for the discrete Lyapunov equations, IEEE Transactions on Automatic Control, 41(1997), 1338-1341.
[8] C.H. Lee and F.C. Kung, Upper and lower matrix bounds of the solutions for the continuous and discrete Lyapunov equations, J. of Franklin Institute, 334B(1997), 539-546.
[9] C.H. Lee, On the estimation of solution bounds of the generalized Lyapunov equations and the robust root clustering for the linear perturbed systems, Int. J. Control, 74(10)(2001), 996-1008.
[10] S. Gutman and E.I. Jury, A general theory for matrix root-clustering in subregions of the complex plane, IEEE Transactions on Automatic Control, 26(1981), 853-863.
[11] K. Zhou and P.P. Khargonekar, Robust stabilization of linear systems with norm-bounded time-varying uncertainty, Systems and Control Letters, 10(1988), 17-20.
[12] E. Noldus, Design of robust state feedback laws, Int. J. Control, 35(4)(1982), 935-938.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 788--790
Copyright@2007 Watam Press

Commuting Toeplitz Operators with Harmonic Symbols

Limin Yang
Department of Mathematics and Physics, China University of Petroleum, Beijing 102249, China
(e-mail: zhjyzhsqyang@yahoo.com.cn)

AMS subject classifications: 34K35, 34H05, 49J25.

Abstract: This paper investigates Toeplitz operators on the weighted Bergman spaces and, by a method similar to Axler's, gives necessary and sufficient conditions for two Toeplitz operators with harmonic symbols to commute: suppose φ and ψ are bounded harmonic functions on D; then TφTψ = TψTφ if and only if (1) φ and ψ are both analytic on D, or (2) φ̄ and ψ̄ are both analytic on D, or (3) there exist constants a, b ∈ C, not both 0, such that aφ + bψ is constant on D.

1 Introduction

In 1991, Sheldon Axler showed that on the Bergman space two Toeplitz operators with harmonic symbols commute only in the obvious cases. The main tool is a characterization of harmonic functions by a conformally invariant mean value property [1]. This paper investigates Toeplitz operators on the weighted Bergman spaces by a similar method and gives the sufficient and necessary conditions for Toeplitz operators with harmonic symbols to commute.

Let dA denote the usual area measure on the open unit disk D in the complex plane C. For −1 < α < ∞ let

dA_α(z) = (α + 1)(1 − |z|²)^α dA(z).

The complex space L²(D, dA_α) is a Hilbert space with the inner product

⟨f, g⟩ = ∫_D f ḡ dA_α,  f, g ∈ L²(D, dA_α).

The weighted Bergman space L²_a(dA_α) is the set of those functions in L²(D, dA_α) that are analytic on D; when α = 0, L²_a(dA_α) is the Bergman space. The weighted Bergman spaces are closed subspaces of L²(D, dA_α), so there is a bounded orthogonal projection P from L²(D, dA_α) onto L²_a(dA_α). L^∞(D, dA_α) is the algebra of essentially bounded functions. For φ ∈ L^∞(D, dA_α), the Toeplitz operator with symbol φ, denoted by Tφ, is the operator from L²_a(dA_α) to L²_a(dA_α) defined by

Tφ(f) = P(φf),  f ∈ L²_a(dA_α).

Tφ is a bounded linear operator. A harmonic function means a complex valued function on D whose Laplacian is identically zero. Let Aut(D) denote the set of analytic one-to-one maps of D onto D; a function h on D is in Aut(D) if and only if there exist β ∈ ∂D and z ∈ D such that

h(w) = β(z − w)/(1 − z̄w)  for all w ∈ D.

A function u ∈ C(D) is harmonic if and only if u has the invariant mean value property [2]:

∫₀^{2π} u(h(re^{iθ})) dθ/2π = u(h(0)),  h ∈ Aut(D).

If u ∈ C(D), then the radialization of u, denoted R(u), is the function on D defined by

R(u)(w) = ∫₀^{2π} u(we^{iθ}) dθ/2π.

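As a quick illustration of the invariant mean value property (ours, not in the paper): for the harmonic function u(z) = Re(z²) and an automorphism h, the circular means ∫₀^{2π} u(h(re^{iθ})) dθ/2π all coincide with u(h(0)):

```python
# Sketch: numerical check of the invariant mean value property.
import numpy as np

u = lambda z: (z ** 2).real                           # a harmonic function on D
a, beta = 0.3 + 0.4j, np.exp(1j * 0.7)
h = lambda w: beta * (a - w) / (1 - np.conj(a) * w)   # h in Aut(D)

theta = np.linspace(0, 2 * np.pi, 20001)[:-1]
for r in (0.2, 0.5, 0.9):
    mean = u(h(r * np.exp(1j * theta))).mean()
    print(r, mean, u(h(0)))                           # the means coincide for every r
```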

2 Some lemmas

Lemma 1 Suppose that u ∈ C(D) ∩ L¹(D, dA_α). Then u is harmonic on D if and only if

∫_D u∘h dA_α/π = u(h(0))  and  R(u∘h) ∈ C(D̄)

for every h ∈ Aut(D).

Proof: Suppose that u is harmonic on D and let h ∈ Aut(D). Then

∫_D u∘h dA_α/π = (α + 1)∫₀¹(1 − r²)^α [∫₀^{2π} u(h(re^{iθ})) dθ/π] r dr = (α + 1)∫₀¹(1 − r²)^α u(h(0)) d(r²) = [−(1 − r²)^{α+1}]₀¹ u(h(0)) = u(h(0)),

while

R(u∘h)(w) = ∫₀^{2π} u(h(we^{iθ})) dθ/2π = u(h(0))  for every w ∈ D,

so R(u∘h) is constant and in particular belongs to C(D̄).

To prove the other direction, let h ∈ Aut(D), put v = R(u∘h) ∈ C(D̄), and fix g ∈ Aut(D). Then

∫_D v∘g dA_α/π = ∫_D R(u∘h)(g(w)) dA_α(w)/π = ∫_D ∫₀^{2π} u(h(g(w)e^{iθ})) dθ/2π dA_α(w)/π.

For each θ ∈ [0, 2π], define f_θ ∈ Aut(D) by f_θ(w) = h(g(w)e^{iθ}). The inverse f_θ⁻¹ of f_θ is also an analytic automorphism, so there exist η ∈ ∂D and β ∈ D such that f_θ⁻¹(z) = η(β − z)/(1 − β̄z) for all z ∈ D; thus

|(f_θ⁻¹)′(z)| = (1 − |β|²)/|1 − β̄z|² ≤ (1 + |β|)/(1 − |β|),  where β = f_θ(0) = h(g(0)e^{iθ}).

As θ runs through [0, 2π], β stays in a compact subset of D, so |(f_θ⁻¹)′(z)| is bounded for all z ∈ D and θ ∈ [0, 2π]. Therefore, changing variables z = f_θ(w) and using 1 − |f_θ⁻¹(z)|² = (1 − |β|²)(1 − |z|²)/|1 − β̄z|²,

∫₀^{2π}∫_D |u(h(g(w)e^{iθ}))| dA_α(w)/π dθ/2π ≤ C ∫_D |u(z)| dA_α(z)/π < ∞,

with a constant C depending only on h and g. Thus we can apply Fubini's theorem:

∫_D v∘g dA_α/π = ∫₀^{2π}[∫_D (u∘f_θ)(w) dA_α(w)/π] dθ/2π = ∫₀^{2π} u(f_θ(0)) dθ/2π = ∫₀^{2π} u(h(g(0)e^{iθ})) dθ/2π = R(u∘h)(g(0)) = v(g(0)).

Thus v is a continuous function on D̄ that has the area version of the invariant mean value property [2], so v is harmonic on D. Because v is also a radial function, the mean value property implies that v is a constant function on D with value v(0). Recall that v = R(u∘h), so

∫₀^{2π} u(h(re^{iθ})) dθ/2π = v(r) = v(0) = u(h(0))

for every r ∈ [0, 1) and for each h ∈ Aut(D); that is, u has the invariant mean value property, and hence u is harmonic on D. ∎

Lemma 2 Let w ∈ D and let 0 < r < 1. Then

|D(w, r)| = πr²(1 − |w|²)²/(1 − r²|w|²)²  [3],

where D(w, r) = {z ∈ D : |(w − z)/(1 − w̄z)| < r} is the pseudohyperbolic disk and |D(w, r)| its area.

Lemma 3 Let 0 < r < 1, let 1 ≤ p < ∞ and let μ be a positive Borel measure on D. Then the following two quantities are equivalent [3]:

(A) sup{ ∫_D |f|^p dμ / ∫_D |f|^p dA : f ∈ L^p_a(D, dA), f ≠ 0 };

(B) sup{ μ(D(w, r)) / |D(w, r)| : w ∈ D };

furthermore, the constants of equivalency depend only upon r and not upon p or μ.

Lemma 4 L²_a(D, dA) is contained in L²_a(D, dA_α).

Proof: Let A_α denote the measure with density dA_α. For w ∈ D and 0 < r < 1, the change of variables z = (w − ξ)/(1 − w̄ξ) gives

A_α(D(w, r)) = (α + 1)(1 − |w|²)^{α+2} ∫_{D(0,r)} (1 − |ξ|²)^α |1 − w̄ξ|^{−(2α+4)} dA(ξ).

Since (1 − r|w|)² ≤ |1 − w̄ξ|² ≤ 4 for ξ ∈ D(0, r) and

(α + 1)∫_{D(0,r)} (1 − |ξ|²)^α dA(ξ) = π(1 − (1 − r²)^{α+1}),

combining these estimates with Lemma 2 yields, in each of the cases α ≥ 0, and −1 < α < 0 with 0 ≤ |w| ≤ r or with |w| > r, a bound of the form

A_α(D(w, r)) / |D(w, r)| ≤ (1 − (1 − r²)^{α+1}) / (r²(1 − r²)^{2α+4}),

which depends only on r and α. By Lemma 3 (applied with p = 2 and dμ = dA_α), every f ∈ L²_a(D, dA) satisfies ∫_D |f|² dA_α ≤ C(r, α) ∫_D |f|² dA < ∞, hence L²_a(D, dA) ⊂ L²_a(D, dA_α). ∎

Lemma 5 For h ∈ Aut(D), define the operator U_h from L²(D, dA_α) onto L²(D, dA_α) by

U_h(f) = (f∘h)(h′)^{(α+2)/2}.

Then U_h is a unitary operator with inverse U_{h⁻¹}.

Proof: A simple computation shows U_{h⁻¹}U_h = I and U_hU_{h⁻¹} = I. Moreover, for h ∈ Aut(D) one has 1 − |h(w)|² = (1 − |w|²)|h′(w)|, so the change of variables ξ = h(w) gives dA_α(ξ) = (1 − |w|²)^α |h′(w)|^{α+2}(α + 1) dA(w), and therefore

⟨U_h f, U_h g⟩ = ∫_D (f∘h)(g∘h)‾ |h′|^{α+2} dA_α··(1 − |w|²)^α/(1 − |w|²)^α = ∫_D f ḡ dA_α = ⟨f, g⟩.

Thus U_h is an isometry of L²(D, dA_α) onto itself, hence unitary, and U_h⁻¹ = U_h* = U_{h⁻¹}. ∎

Lemma 6 Let h ∈ Aut(D) and let φ ∈ L^∞(D, dA_α), and write V_h for the restriction of U_h to L²_a(dA_α). Then

T_{φ∘h} = V_h T_φ V_h⁻¹.

Proof: V_h maps L²_a(dA_α) onto L²_a(dA_α), so P^(α)U_h = U_hP^(α), where P^(α) denotes the orthogonal projection onto L²_a(dA_α). If f ∈ L²_a(dA_α), then

T_{φ∘h}V_h f = P^(α)((φ∘h)(f∘h)(h′)^{(α+2)/2}) = P^(α)(U_h(φf)) = U_h(P^(α)(φf)) = V_h T_φ f,

so T_{φ∘h}V_h = V_hT_φ, which is the assertion. ∎

Lemma 7 Let H^p(D) denote the usual Hardy space on the disk. Then H¹(D) ⊂ L²_a(dA_α) [4].

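The area formula of Lemma 2 is easy to check numerically. The following sketch (ours, for illustration) estimates the area of the pseudohyperbolic disk D(w, r) by Monte Carlo sampling and compares it with πr²(1 − |w|²)²/(1 − r²|w|²)²:

```python
# Sketch: Monte Carlo check of the area formula of Lemma 2.
import numpy as np

rng = np.random.default_rng(1)
w, r = 0.6 + 0.2j, 0.5
n = 2_000_000
z = rng.uniform(-1, 1, n) + 1j * rng.uniform(-1, 1, n)
z = z[np.abs(z) < 1]                                  # uniform samples in the unit disk
inside = np.abs((w - z) / (1 - np.conj(w) * z)) < r
area_mc = np.pi * inside.mean()                       # fraction of the disk times pi
area_formula = np.pi * r**2 * (1 - abs(w)**2)**2 / (1 - r**2 * abs(w)**2)**2
print(area_mc, area_formula)                          # agree to about three decimals
```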
3 The main results

Theorem: Suppose φ and ψ are bounded harmonic functions on D. Then TφTψ = TψTφ if and only if (i) φ and ψ are both analytic on D, or (ii) φ̄ and ψ̄ are both analytic on D, or (iii) there exist constants a, b ∈ C, not both 0, such that aφ + bψ is constant on D.

Proof: "If" is obvious; we prove "only if". Suppose that ψ and φ are bounded harmonic functions on D such that TψTφ = TφTψ. Because ψ and φ are harmonic on D, there exist functions f1, f2, g1 and g2 analytic on D such that ψ = f1 + f̄2 and φ = g1 + ḡ2 on D. Because ψ and φ are bounded on D, the functions f1, f2, g1 and g2 must be in H²(D) and products such as f̄2g1, f1ḡ2 are in H¹(D); by Lemma 7 all of them belong to L²_a(D, dA_α). Let 1 denote the constant function 1 on D. Then

TψTφ1 = Tψ(P^(α)φ) = Tψ(P^(α)(g1 + ḡ2)) = Tψ(g1 + ḡ2(0)) = P^(α)([f1 + f̄2][g1 + ḡ2(0)])
= P^(α)(f1g1 + f1ḡ2(0) + f̄2g1 + f̄2ḡ2(0))
= f1g1 + f1ḡ2(0) + P^(α)(f̄2g1) + f̄2(0)ḡ2(0),    (1)

and hence, using ∫_D F dA_α = πF(0) for analytic F ∈ L¹(D, dA_α),

⟨TψTφ1, 1⟩ = ∫_D [f1g1 + ḡ2(0)f1 + f̄2g1 + f̄2(0)ḡ2(0)] dA_α
= π[f1(0)g1(0) + ḡ2(0)f1(0) + f̄2(0)ḡ2(0)] + ∫_D f̄2g1 dA_α.    (2)

A similar formula (interchanging the f's and g's) can be obtained for ⟨TφTψ1, 1⟩. Because TψTφ = TφTψ, we can set the right-hand side of (2) equal to the corresponding formula for ⟨TφTψ1, 1⟩, which gives

∫_D (f̄2g1 − f1ḡ2) dA_α/π = f̄2(0)g1(0) − f1(0)ḡ2(0).    (3)

Multiplying both sides of the equation TψTφ = TφTψ by V_h on the left and V_h⁻¹ on the right, and recalling that V_h is unitary, Lemma 6 shows that

T_{ψ∘h}T_{φ∘h} = T_{φ∘h}T_{ψ∘h}.    (4)

Composing with h expresses each of the bounded harmonic functions ψ∘h and φ∘h as the sum of an analytic function and a conjugate analytic function on D:

ψ∘h = f1∘h + (f2∘h)‾,  φ∘h = g1∘h + (g2∘h)‾.    (5)

Equation (3) was derived under the assumption that TψTφ = TφTψ; thus (4) combined with (5) says that (3) is still valid when we replace each function in it by its composition with h. In other words,

∫_D (f̄2g1 − f1ḡ2)∘h dA_α/π = f̄2(h(0))g1(h(0)) − f1(h(0))ḡ2(h(0)).    (6)

Let u = f̄2g1 − f1ḡ2; the above equation becomes

∫_D (u∘h) dA_α/π = u(h(0)).

In other words, u has the area version of the invariant mean value property. We want to show that u is harmonic on D; by the above equation and Lemma 1, we need to show that R(u∘h) ∈ C(D̄). To do this, represent the analytic
that R (u D h )  c ( D ) .To do this, represent the analytic References
function f2 D h and g1 D h as Taylor series
[1] Sheldon Axler, Commuting Toeplitz operators with harmonic
f f symobles, Integral equations and operator theory, 1991 , 14(1),
( f 2 D h )( z ) ¦D z n
n
and ( g1 D h )( z ) ¦E z n
n
, p: 1-12.
n 0 n 0
[2] Walter Rudin, Function theory on the unit ball of C n .
f f
2 Springer-verlag, NewYork, 1980.
where ¦D n
2
 f and ¦E n
f.
[3] Luecking Daniel, A technique for characterizing carleson
n 0 n 0
measures on Bergman spaces, Proceedings of the American
We have R (( f2 . g ) D h )( z ) 1
mathematical society, 1983 (87), p: 656-660.
2S 2S
dT f f
dT [4] John B. Garnet, Bounded analytic functions, Academic press.
³ ¦ D n z e .¦ E z e
iT iT n  inT n inT
³ ( f2 D h)( ze )( g1 D h)( ze ) 2S n 0 n 0
n
2S New York ,1980, p: 90-93.
0 0

¦DnE
2n
n
z f
n 0

So R (( f 2 g ) D h )  c ( D ) , similarly we get


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 791--797
Copyright@2007 Watam Press

Ergodic Characteristics Analysis of Time Series Data in Hydrological Process

Hongrui Wang^1, Xin Lin^2, Xiaoming Peng^3 and Dongli Zhou^2

1 College of Water Sciences, Beijing Normal University, Key Laboratory for Water and Sediment Sciences, Ministry of Education, Beijing, P.R.C.
2 School of Mathematical Science, Beijing Normal University, Beijing, P.R.C.
3 Department of Mathematics and Physics, China University of Petroleum-Beijing, Beijing, P.R.C.

Abstract—The ergodic characteristics analysis of hydrological processes is a new research field. In this paper, we put forward the system cluster method, the autocorrelation analysis plot and the RBF (Radial Basis Function) Neural Network model to study the ergodic characteristics, and take rainfall data from the Lanzhou hydrologic station in the Yellow River basin and the Ankang hydrologic station in the Hanjiang River basin as examples to carry out the calculation and analysis. The results show that the rainfall data in August from both the Lanzhou and Ankang stations possess ergodicity. The methodology developed in this paper could be extended to the ergodic property analysis of other hydrological processes, such as runoff or vaporization.

Index Terms—Ergodicity, Hydrological Process, Rainfall, Stationarity, RBF Neural Network

I. INTRODUCTION

HYDROLOGICAL phenomena change with time; such a change is called a hydrological process. A stationary process is a stochastic process whose statistical properties do not change with time. However, the procedure to obtain hydrologic data has high uncertainty; a record is one realization of the underlying stochastic process. If we have a stationary hydrological process and can use one realization to obtain statistical properties (such as mean and variance), it will significantly contribute to hydrological forecast research. In a hydrological process, complicated variables of hydrological data vary with time and are affected by stochastic factors. We need to know the statistical properties, such as the mean function m(t) = E{ξ(t)} and the correlation function R(s, t) = Cov(ξ(s), ξ(t)), to describe and analyze such a system. However, in the real world, what we usually have is just a sample function of the whole process, i.e., we have only a limited portion {ξ(t); t = 1, 2, ..., N} of the whole stochastic sequence {ξ(t)}. Therefore, the problem is: can we extend the statistical property from the limited sample to the whole population? In statistics, the property that the sample mean equals the population mean is called ergodicity [1][2], which is the basic assumption of research on many other subjects, and its investigation is highly significant [3][4][5].

A. Definition of Ergodicity

In ergodicity theory, a stochastic process exhibits ergodicity if its mean and covariance possess ergodicity. However, the ergodicity of the covariance always involves a 4th-order moment matrix, which is too complex to be tested here [6]. So we just discuss the ergodicity of the mean in this article.

B. Ergodicity of Mean

We define a stochastic sequence {ξ(t); t = 1, 2, ...} and a sample mean sequence {M_T, T = 1, 2, ...}, where

M_T = (1/T) Σ_{t=1}^{T} ξ(t).

If lim_{T→∞} D(M_T) = 0, the sample mean sequence is ergodic and {ξ(t)} possesses ergodicity.

II. ERGODIC PROPERTIES ANALYSIS OF TIME SERIES RAINFALL DATA

Here we have analyzed rainfall data from two drainage areas: Lanzhou Station on the Yellow River and Ankang Station on the Hanjiang River (Table I). Both have complete records of long history, of which the Lanzhou Station's sequence is 51 years long (January 1951 ~ December 2001) and the Ankang Station's sequence is 70 years long (July 1929 ~ June 1998). We used these data to carry out the ergodic property analysis for time series.

A. Primary Data Analysis

Rainfall changes with the seasons, and different seasons have quite different rainfall amounts. Here we assume that the rainfall data for the same month belong to the same population. Monthly rainfall data for both study areas are shown in Fig. 1 and Fig. 2.

TABLE I
Basic Properties of the Two Study Stations

Name    | Longitude (°) | Latitude (°) | Position
Lanzhou | 103.70        | 35.90        | Lanzhou, Gansu
Ankang  | 109.03        | 32.72        | Ankang, Shanxi

Fig. 1. Monthly rainfall data from Lanzhou Area. Y-axis is the rainfall amount (mm) and X-axis is the month of the calendar year.

Fig. 2. Monthly rainfall data from Ankang Area.

From Fig. 1 and Fig. 2, we can clearly see that the pattern of rainfall amount changes with the seasons. It rains most in summer, less in spring and fall, and least in winter. Total precipitation is higher in the Ankang area than in the Lanzhou area.

B. Cluster Analysis for Periodic Rainfall Data

The system cluster method is used to partition the 12-month periodic rainfall data into groups. Every month is a variable. We investigate the ergodic properties of months with similar rainfall patterns. Data elements are partitioned into groups called clusters, which represent collections of data elements that are close according to a distance (similarity index). The R² statistic, partial R² statistic, pseudo-F statistic and pseudo-t² statistic determined the number of groups. Every step reflects the change of squared distance within a cluster, and consequently reflects whether a cluster is good or bad.

Cluster Results

Cluster results for the 12 months for the Lanzhou area are:

{Jan, Feb, Mar, Apr, Oct, Nov, Dec}
{May, Jun, Sep}
{Jul}
{Aug}

Cluster results for the 12 months for the Ankang area are:

{Jan, Feb, Mar, Nov, Dec}
{Apr, May, Jun, Jul, Oct}
{Aug}
{Sep}

The 12 months are partitioned into 4 clusters in both areas; however, the cluster results are different due to different rainfall patterns.

C. Analysis of Stationarity and Ergodicity

Since we have a monthly hydrological data sequence with a periodic effect, which does not satisfy the equal-mean prerequisite of a stationary process, we treat months with similar rainfall properties as one stochastic process to investigate its stationarity and ergodicity. Therefore, the 8 clusters listed above are just the objects of our research.

1) Methodology

Step 1: Analyze the autocorrelation function of the sample sequence to find out whether it is a stationary sequence or not.
Step 2: Calculate the sample mean sequence {M_T, T = 1, 2, ...}, M_T = (1/T) Σ_{t=1}^{T} ξ(t), and the variance sequence D(M_T). Plot M_T and D(M_T).
Step 3: Employ the RBF (Radial Basis Function) Neural Network model to simulate and predict D(M_T), obtain D(M_T) as T → ∞, and test the ergodicity of the original sequence {ξ(t)}.

2) Analysis of Stationarity and Ergodicity, Lanzhou Station

a) Analysis of Autocorrelation and Stationarity

According to the definition of stationarity, it is difficult to find out whether a sequence is stationary. Here we use the autocorrelation function between the sequential values ξ(t), ξ(t+1), ..., ξ(t+k):

r_k = Σ_{t=1}^{n−k} (ξ(t) − ξ̄)(ξ(t+k) − ξ̄) / Σ_{t=1}^{n} (ξ(t) − ξ̄)²    (1)

The autocorrelation analysis plot can provide evidence of stationarity: if the autocorrelation coefficient rapidly trends to 0, i.e., it falls into the stochastic interval, the time series sequence is stationary; otherwise, it is not stationary [7][8]. (A small computational sketch of Steps 1-2 and formula (1) is given below.)

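The following is a minimal sketch (ours; the series is synthetic stand-in data, not the station records) of the sample mean sequence M_T, the variance sequence D(M_T) of Step 2, and the autocorrelation coefficients r_k of formula (1):

```python
# Sketch: M_T, D(M_T) and the autocorrelation coefficients r_k of formula (1).
import numpy as np

rng = np.random.default_rng(0)
xi = rng.gamma(shape=2.0, scale=50.0, size=51)   # stand-in for 51 August rainfall values

M = np.cumsum(xi) / np.arange(1, len(xi) + 1)    # M_T = (1/T) sum_{t<=T} xi(t)
D = np.array([np.mean((M[:T] - M[:T].mean())**2) for T in range(1, len(M) + 1)])

def autocorr(x, k):
    xm = x - x.mean()
    return np.sum(xm[:len(x) - k] * xm[k:]) / np.sum(xm**2)

r = [autocorr(xi, k) for k in range(1, 11)]
print(D[-1], np.round(r, 3))   # D(M_T) should shrink with T for an ergodic mean
```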

Figures 3-6 illustrate the autocorrelation analysis plots for the 4 clusters for Lanzhou Station.

Fig. 3. {Jan, Feb, Mar, Apr, Oct, Nov, Dec, Lanzhou Station} autocorrelation function plot.
Fig. 4. {May, Jun, Sep, Lanzhou Station} autocorrelation function plot.
Fig. 5. {Jul, Lanzhou Station} autocorrelation function plot.
Fig. 6. {Aug, Lanzhou Station} autocorrelation function plot.

From Fig. 3 to Fig. 6, we have the following results:
Sequence {Jan, Feb, Mar, Apr, Oct, Nov, Dec, Lanzhou Station} is not stationary;
Sequences {May, Jun, Sep, Lanzhou Station}, {Jul, Lanzhou Station} and {Aug, Lanzhou Station} are stationary, i.e., these three time series satisfy the prerequisite to carry out the test of ergodicity.

b) Analysis of Ergodicity

Formulas of ergodicity:

M_T = (1/T) Σ_{t=1}^{T} ξ(t),    (2)

D(M_T) = (1/T) Σ_{t=1}^{T} (M_t − M̄_T)²,    (3)

where M̄_T denotes the mean of M_1, ..., M_T. If lim_{T→∞} D(M_T) = 0, then the sample mean sequence is ergodic, and the sequence {ξ(t)} possesses ergodicity [8].

Fig. 7 to Fig. 9 illustrate the trend curves of M_T and D(M_T) versus T for the sequences {May, Jun, Sep, Lanzhou Station}, {Jul, Lanzhou Station} and {Aug, Lanzhou Station}.

Fig. 7. Curves of M_T, D(M_T) varying with time T for {May, Jun, Sep, Lanzhou Station}.
Fig. 8. Curves of M_T, D(M_T) varying with time T for {Jul, Lanzhou Station}.
Fig. 9. Curves of M_T, D(M_T) varying with time T for {Aug, Lanzhou Station}.

Fig. 7 to Fig. 9 illustrate that we cannot reach T → ∞ because of the limited sample size, so the trend of D(M_T) is still ambiguous. Next we use a neural network model to simulate and predict D(M_T).

From Fig. 7 to Fig. 9, we can see that the curve of D(M_T) is nonlinear, so we use the RBF network model to simulate it [9][10]. The RBF (Radial Basis Function) Neural Network has high operation speed and good predictive ability, and it has a great capacity to approximate a nonlinear mapping, so RBF is appropriate for nonlinear sequence analysis.

The computing steps are presented below (a small code sketch follows the list):

Step 1: Normalize the original data x into x1, where x1 = x / max(x) [11][12];
Step 2: Build a 3-layer RBF network. The input layer has n1 neurons and the output layer has m neurons [13];
Step 3: Use the RBF network to build a relation between the n1 data and the (n1+1)-th data, to simulate the relation within the original data;
Step 4: Use the established relation to make forward predictions;
Step 5: After continuous prediction, find out whether this sequence would reach zero or not.
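The following is a minimal numpy sketch of Steps 1-5 (ours, with illustrative window size and Gaussian width, not the authors' exact network): it normalizes a sequence, fits a Gaussian RBF map from n1 consecutive values to the next one by least squares, and then iterates the fitted map forward:

```python
# Sketch: Gaussian RBF network for one-step-ahead prediction, iterated forward.
import numpy as np

def rbf_forecast(d, n1=5, width=0.5, steps=200):
    x = d / d.max()                                   # Step 1: normalize
    X = np.array([x[i:i + n1] for i in range(len(x) - n1)])
    y = x[n1:]
    C = X.copy()                                      # centers = training inputs
    phi = lambda A: np.exp(-((A[:, None, :] - C[None, :, :])**2).sum(-1)
                           / (2 * width**2))
    w, *_ = np.linalg.lstsq(phi(X), y, rcond=None)    # Steps 2-3: fit output weights
    seq = list(x)
    for _ in range(steps):                            # Steps 4-5: iterate forward
        window = np.array(seq[-n1:])[None, :]
        seq.append(float(phi(window) @ w))
    return np.array(seq) * d.max()

# usage with a decaying variance-like sequence
T = np.arange(1, 101)
D = 30.0 / T + 0.5 * np.random.default_rng(0).standard_normal(100) / T
print(rbf_forecast(D)[-5:])   # forward predictions; check whether they approach 0
```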
Simulation results are illustrated in Figures 10-12. Specifically, in Figures 10-12 the thin curve is the original sequence plus the expanded prediction sequence, and the thick curve is the prediction sequence moved k length units to the left, where k is the length of the original sequence.

Fig. 10. Curve of D(M_T) varying with time T for {May, Jun, Sep, Lanzhou Station}.
Fig. 11. Curve of D(M_T) varying with time T for {Jul, Lanzhou Station}.
Fig. 12. Curve of D(M_T) varying with time T for {Aug, Lanzhou Station}.

From Fig. 10 to Fig. 12, we can see that only D(M_T) of the sequence {Aug, Lanzhou Station} reaches 0, when t = 320. In conclusion, the rainfall data sequence {Aug, Lanzhou Station} possesses ergodicity.

3) Analysis of Stationarity and Ergodicity, Ankang Station

Similarly, we carry out the analysis of ergodicity for Ankang Station.

a) Analysis of Autocorrelation and Stationarity

Fig. 13 to Fig. 16 illustrate the autocorrelation analysis for the 4 clusters for Ankang Station.

Fig. 13. {Jan, Feb, Mar, Nov, Dec, Ankang Station} autocorrelation function plot.
Fig. 14. {Apr, May, Jun, Jul, Oct, Ankang Station} autocorrelation function plot.
Fig. 15. {Aug, Ankang Station} autocorrelation function plot.
Fig. 16. {Sep, Ankang Station} autocorrelation function plot.

From Fig. 13 to Fig. 16, we conclude:
Sequence {Jan, Feb, Mar, Nov, Dec, Ankang Station} is not stationary;
Sequences {Apr, May, Jun, Jul, Oct, Ankang Station}, {Aug, Ankang Station} and {Sep, Ankang Station} are stationary, i.e., these three time series satisfy the prerequisite to carry out the ergodicity test.

b) Analysis of Ergodicity

Fig. 17 to Fig. 19 illustrate the trend curves of M_T and D(M_T) versus T for the sequences {Apr, May, Jun, Jul, Oct, Ankang Station}, {Aug, Ankang Station} and {Sep, Ankang Station}.

Fig. 17. Curves of M_T, D(M_T) varying with time T for {Apr, May, Jun, Jul, Oct, Ankang Station}.
Fig. 18. Curves of M_T, D(M_T) varying with time T for {Aug, Ankang Station}.
Fig. 19. Curves of M_T, D(M_T) varying with time T for {Sep, Ankang Station}.

Fig. 17 to Fig. 19 show that we cannot reach T → ∞ for a limited sample size to find out the tendency of D(M_T). Similarly, we use the RBF network to simulate and predict D(M_T). Results are illustrated in Figures 20-22.

Fig. 20. Curve of D(M_T) varying with time T for {Apr, May, Jun, Jul, Oct, Ankang Station}.
Fig. 21. Curve of D(M_T) varying with time T for {Aug, Ankang Station}.
Fig. 22. Curve of D(M_T) varying with time T for {Sep, Ankang Station}.

From Fig. 20 to Fig. 22, only D(M_T) of the sequence {Aug, Ankang Station} reaches 0, when t = 337, so its rainfall data sequence possesses ergodicity.

III. DISCUSSION AND CONCLUSIONS

Ergodicity analysis of hydrological processes is a brand-new research area, and it is also very difficult to implement. There are some studies suggesting the possibility of ergodicity in hydrological processes [14][15][16], but no research results have yet been published in the open literature.

In this study, we carried out the ergodicity analysis using the monthly rainfall periodic data from both the Lanzhou and Ankang Stations. Since the periodic rainfall data possess the properties of limitation, single path and semi-periodicity, which do not satisfy the stationarity prerequisite of ergodicity, we developed some original methods in our study, as follows. (1) Treat every month as a variable. According to the rainfall data, we use system cluster analysis to partition the 12 months into groups with similar rainfall patterns. We then rearrange the sample sequence to get a continuous time series sequence and discuss its stationarity and ergodicity; from the hydrological point of view, this is also a very reasonable approach. (2) Use autocorrelation analysis to investigate stationarity, which avoids the difficulty of using the definition of stationarity to carry out the test. (3) Suggest using the RBF Neural Network model to simulate and predict the variance sequence of the sample mean, which overcomes the shortcoming of the limited rainfall data sequence.

In conclusion, the rainfall data in August from both the Lanzhou and Ankang Stations possess ergodicity, i.e., the rainfall processes in August in the Lanzhou and Ankang areas are both highly stationary and well-regulated. Similarly, we could also study the stationarity and ergodicity of other hydrological processes, such as runoff and vaporization. In the future, we could analyze the auto-covariance of certain hydrological process data to investigate ergodicity; such a study is not only more significant but also more difficult.

ACKNOWLEDGMENT

This work was supported by the Key Project of the National Basic Scientific Program "973 Project" entitled "Evolving Law and Maintaining Mechanism of Renewable Capacity of Water Resources in the Yellow River Basin", funded by the Ministry of Science and Technology.

Dr. Hongrui Wang is an associate professor in the College of Water Sciences at Beijing Normal University. His research interests include hydrology and water resources. He has published 40 papers and one book in water resources and hydrology. Email: henryzsr@bnu.edu.cn Postcode: 100875

M.S. Xin Lin is a graduate student in the School of Mathematical Science, Beijing Normal University. Her research interests include mathematical statistics and hydrology statistics. Email: snaillx@tom.com

Dr. Xiaoming Peng is an associate professor in the Department of Mathematics and Physics, China University of Petroleum-Beijing. His recent research interests include environmental mathematics and ecological mathematics. He has published a number of technical papers in scientific journals. Email: pxm518@126.com

M.S. Dongli Zhou is a graduate student in the School of Mathematical Science, Beijing Normal University. Her research interests include applied statistics and computational mathematics. Email: dongli_zhou@hotmail.com

REFERENCES

[1] S. Chick, J. Shortle, P. V. Gelder, M. B. Mendel, "A Model for the Frequency of Extreme River Levels Based on River Dynamics," Structural Safety, vol. 18, no. 4, pp. 261-276, 1996.
[2] East China Engineering School of Water Resources, Fundamentals of Probability and Statistics on Hydrology, China WaterPower Press, 1981, pp. 170-173.
[3] F. Aldo, J. Igor, "Can we determine the transverse macrodispersivity by using the method of moments?" Advances in Water Resources, vol. 28, no. 6, pp. 589-599, 2005.
[4] K-C Hsu, "The influence of the log-conductivity autocovariance structure on macrodispersion coefficients," Journal of Contaminant Hydrology, vol. 65, no. 1-2, pp. 65-77, 2003.
[5] H. T. Mitosek, "On Stochastic Properties of Daily River Flow Processes," Journal of Hydrology, vol. 228, no. 3-4, pp. 188-205, 2000.
[6] Z. B. Fang, B. Q. Miao, Stochastic Process, University of Science and Technology of China Press, 2002, pp. 82-91.
[7] B. H. Daren, H. Pu, "Verifying Irreducibility and Continuity of a Nonlinear Time Series," Statistics & Probability Letters, vol. 40, pp. 139-148, 1998.
[8] B. H. Daren, H. Pu, "Stability of Nonlinear AR(1) Time Series with Delay," Stochastic Processes and Their Applications, vol. 82, pp. 307-333, 1999.
[9] Y. Y. Li, D. R. Shen, Y. J. Chen, S. J. Li, "Prediction of Nonlinear Systems Based on RBF Neural Networks," Computer Measurement and Control, vol. 14, no. 3, pp. 319-321, 2006.
[10] A. Murat, H. K. Cigizoglu, "Suspended sediment load simulation by two artificial neural network methods using hydrometeorological data," Environmental Modelling & Software, vol. 22, no. 1, pp. 2-13, 2007.
[11] J. K. Zu, Q. F. Zeng, "Structure Optimization Strategy of Normalized Radial Basis Function Networks," Computer Simulation, vol. 19, no. 3, pp. 43-45, 2002.
[12] J. S. R. Jang, C. T. Sun, E. Mizutani, Neuro-fuzzy and Soft Computing, Prentice Hall, 1997.
[13] M. Norgaard, Neural network based system identification toolbox, Report 00-E-891, Department of Automation, Technical University of Denmark, 2000.
[14] J. Q. Duan, B. Goldys, Ergodicity of Stochastically Forced Large Scale Geophysical Flows, Hindawi Publishing Corp. Available: http://ijmms.hindawi.com
[15] K. Demetris, Stochastic Simulation of Hydrosystems. Available: http://www.itia.ntua.gr/getfile/541/2/2004EncyclStochSimulPP.pdf
[16] C. Z. Fan, "The influence of the preferential flow path on slope stability: taking Tasoling as an example," Ph.D. dissertation, Department of Resources Engineering, National Cheng Kung University, Taiwan, 1993.


DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 798--804
Copyright@2007 Watam Press

Periodic solutions for a kind of p−Laplacian Liénard equation with a deviating argument

Minggang Zong†,‡,1, Wei Yuan†, Wenqing Zhao*

†. Faculty of Science, Jiangsu University, Zhenjiang 212013, Jiangsu, P.R. China
‡. School of Statistics, Renmin University of China, Beijing 100872, P.R. China
*. School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, Jiangsu, P.R. China

AMS subject classifications: 34K13, 34C25

Abstract: By employing the continuation theorem of coincidence degree theory and some analysis techniques, we study a kind of p−Laplacian Liénard equation with a deviating argument as follows:

(ϕp(x′(t)))′ + f(t, x(t))ϕp(x′(t)) + g(x(t − τ(t))) = e(t).

Some sufficient conditions on the existence of periodic solutions are obtained.

1 Introduction

The purpose of this paper is to deal with the existence of periodic solutions for the following p−Laplacian differential equation with a deviating argument

(ϕp(x′(t)))′ + f(t, x(t))ϕp(x′(t)) + g(x(t − τ(t))) = e(t),    (1.1)

where ϕp : R → R is defined by ϕp(s) = |s|^{p−2}s, and p > 1 is a constant. f(t, x) is continuous for (t, x) ∈ R² and periodic in t with period T, i.e., f(t + T, ·) = f(t, ·); g, e, τ ∈ C(R, R), e, τ are periodic with period T, and ∫₀^T e(t) dt = 0.

Obviously, when p = 2, ϕ₂(s) = s, and the existence of periodic solutions to several types of second order differential equations with deviating arguments has been discussed extensively; see the papers [6-8,11]. For example, in [8] Lu and Ge discussed the following kind of Liénard equation with deviating arguments

x″(t) + f(t, x(t), x(t − τ₀(t)))x′(t) + β(t)g(x(t − τ₁(t))) = p(t).    (1.2)

Under the conditions that sup_{(t,x,y)∈R³} |f(t, x, y)| < 1/T, lim sup_{|x|→+∞} |g(x)/x| ≤ r, and other conditions, the authors obtained the existence of periodic solutions for Eq.(1.2).

The case p ≠ 2 corresponds to the so-called one-dimensional p−Laplacian, which is used to describe fluid mechanical and nonlinear elastic mechanical phenomena; many papers have investigated the existence of periodic solutions with or without delays, for example [1-4,9-10,12] and the references therein. In [2] Cheung and Ren discussed the p−Laplacian Liénard equation with a deviating argument of the form

(ϕp(x′(t)))′ + f(x(t))x′(t) + g(x(t − τ(t))) = e(t),    (1.3)

where ϕp is defined as above and the functions f(x), g(x) are continuous. In their work, a Lipschitz condition imposed on g(x), such as

|g(x₁) − g(x₂)| ≤ L|x₁ − x₂|, ∀x₁, x₂ ∈ R,    (1.4)

was required.

The main technique of the works [1-5] was to convert the problem into the abstract form Lx = Nx, where L is a non-invertible linear operator; the existence of solutions of the problem can then be obtained by Mawhin's continuation theorem [5]. However, when p ≠ 2, the differential operator x ↦ (ϕp(x′))′ is no longer linear; in this case the continuation theorem of Mawhin cannot be used directly, and, on the other hand, the crucial step ∫₀^T f(x(t))x′(t) dt = 0, which is required to obtain an a priori bound of periodic solutions for Eq.(1.1), is no longer valid. In the present paper we try to establish some criteria to guarantee the existence of T−periodic solutions of Eq.(1.1). The methods used to estimate the a priori bound of periodic solutions are different from the corresponding ones in [1-4]. We conquer these difficulties by means of translating Eq.(1.1) into a two-dimensional system to which Mawhin's continuation theorem can be applied. Furthermore, the significance of this paper is that the conditions imposed on g(x) in Theorem 2 are weaker than condition (1.4) (see Example 3.2 later). The results in our paper are related to the deviating argument, which can be used in system control.

2 Some Lemmas

Let X and Y be real Banach spaces and let L : D(L) ⊂ X → Y be a Fredholm operator with index zero; here D(L) denotes the domain of L. This means that Im L is closed in Y and dim Ker L = dim(Y / Im L) < ∞. Consider the complementary subspaces X₁ and Y₁ such that X = Ker L ⊕ X₁ and Y = Im L ⊕ Y₁, and let P : X → Ker L and Q : Y → Y₁ be the natural projections. Clearly, Ker L ∩ (D(L) ∩ X₁) = {0}, thus the restriction L_P := L|_{D(L)∩X₁} is invertible. Denote by K the inverse of L_P.

Let Ω be an open bounded subset of X with D(L) ∩ Ω ≠ ∅. A map N : Ω̄ → Y is said to be L−compact in Ω̄ if QN(Ω̄) is bounded and the operator K(I − Q)N : Ω̄ → X is compact.

1 Corresponding author. E-mail address: zong-mg@163.com (M. Zong)


Lemma 2.1 [5] Suppose that X and Y are two Banach spaces, and L : D(L) ⊂ X → Y is a Fredholm operator with index zero. Furthermore, Ω ⊂ X is an open bounded set and N : Ω̄ → Y is L−compact on Ω̄. If
(1) Lx ≠ λNx, ∀x ∈ ∂Ω ∩ D(L), λ ∈ (0, 1);
(2) Nx ∉ Im L, ∀x ∈ ∂Ω ∩ Ker L;
(3) deg{JQN, Ω ∩ Ker L, 0} ≠ 0, where J : Im Q → Ker L is an isomorphism,
then the equation Lx = Nx has a solution in Ω̄ ∩ D(L).

Next, we introduce the following inequality, which is important for us to estimate the a priori bound of periodic solutions in Section 3.

Lemma 2.2 Let 0 ≤ α ≤ T be a constant, and let s(t) be continuous and periodic with period T with max_{t∈[0,T]} |s(t)| ≤ α. Then for any x ∈ C¹(R, R) which is periodic with period T, we have

∫₀^T |x(t) − x(t − s(t))|^p dt ≤ 2α^p ∫₀^T |x′(t)|^p dt.    (2.5)

Proof Let Λ₁ = {t : t ∈ [0, T], s(t) ≥ 0} and Λ₂ = {t : t ∈ [0, T], s(t) < 0}. Then from the Hölder inequality, for ∀t ∈ Λ₁,

|∫_{t−s(t)}^{t} x′(η) dη|^p ≤ (∫_{t−s(t)}^{t} |x′(η)| dη)^p ≤ (∫_{t−s(t)}^{t} 1^q dη)^{p/q} (∫_{t−s(t)}^{t} |x′(η)|^p dη) ≤ |s(t)|^{p/q} ∫_{t−α}^{t} |x′(η)|^p dη ≤ α^{p/q} ∫_{t−α}^{t} |x′(η)|^p dη,    (2.6)

and for ∀t ∈ Λ₂,

|∫_{t−s(t)}^{t} x′(η) dη|^p = |∫_{t}^{t−s(t)} x′(η) dη|^p ≤ α^{p/q} ∫_{t}^{t+α} |x′(η)|^p dη.    (2.7)

Notice that 1/p + 1/q = 1, so p/q = p − 1. We have

∫₀^T |x(t) − x(t − s(t))|^p dt = ∫_{Λ₁} |∫_{t−s(t)}^{t} x′(η) dη|^p dt + ∫_{Λ₂} |∫_{t−s(t)}^{t} x′(η) dη|^p dt    (2.8)
≤ α^{p−1} ∫_{Λ₁} ∫_{t−α}^{t} |x′(η)|^p dη dt + α^{p−1} ∫_{Λ₂} ∫_{t}^{t+α} |x′(η)|^p dη dt
≤ α^{p−1} ∫₀^T ∫_{t−α}^{t+α} |x′(η)|^p dη dt.    (2.9)

Case 1. If α ∈ [0, T/2], then, exchanging the order of integration,

∫₀^T ∫_{t−α}^{t+α} |x′(η)|^p dη dt = ∫_{−α}^{α} |x′(η)|^p (η + α) dη + 2α ∫_{α}^{T−α} |x′(η)|^p dη + ∫_{T−α}^{T+α} |x′(η)|^p (T − η + α) dη
= 2α ∫_{−α}^{α} |x′(η)|^p dη + 2α ∫_{α}^{T−α} |x′(η)|^p dη = 2α ∫₀^T |x′(η)|^p dη,    (2.10)

where the second equality folds the third integral onto [−α, α] by the T−periodicity of |x′|^p. So from (2.9) we get (2.5) in this case.

Case 2. If α ∈ [T/2, T], then, since |x′|^p is T−periodic and 2α ≥ T, for each t the window (t − α, t + α) covers one full period together with a window of length 2α − T ≤ T; applying the computation of Case 1 to the shorter window gives

∫₀^T ∫_{t−α}^{t+α} |x′(η)|^p dη dt = T ∫₀^T |x′(η)|^p dη + (2α − T) ∫₀^T |x′(η)|^p dη = 2α ∫₀^T |x′(η)|^p dη.    (2.11)

Substituting this into (2.9), we get

∫₀^T |x(t) − x(t − s(t))|^p dt ≤ 2α^p ∫₀^T |x′(t)|^p dt. ∎

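Inequality (2.5) can be sanity-checked numerically. The sketch below (ours, with the illustrative choices x(t) = sin t, s(t) = α sin²t, p = 3, T = 2π) compares both sides on a grid:

```python
# Sketch: numerical check of inequality (2.5) of Lemma 2.2.
import numpy as np

T, p, alpha = 2 * np.pi, 3.0, 0.4
t = np.linspace(0, T, 20000, endpoint=False)
x = lambda u: np.sin(u)                  # T-periodic test function
s = alpha * np.sin(t) ** 2               # continuous, T-periodic, |s| <= alpha

lhs = np.mean(np.abs(x(t) - x(t - s)) ** p) * T
rhs = 2 * alpha ** p * np.mean(np.abs(np.cos(t)) ** p) * T   # x'(t) = cos t
print(lhs <= rhs, lhs, rhs)
```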

In order to use Mawhin's continuation theorem to study the existence of T−periodic solutions for Eq.(1.1), we rewrite Eq.(1.1) in the following form:

x₁′(t) = ϕq(x₂(t)) = |x₂(t)|^{q−2}x₂(t),
x₂′(t) = −g(x₁(t − τ(t))) − f(t, x₁(t))x₂(t) + e(t),    (2.12)

where q > 1 is a constant with 1/p + 1/q = 1. Clearly, if x(t) = (x₁(t), x₂(t)) is a T−periodic solution of Eq.(2.12), then x₁(t) must be a T−periodic solution of Eq.(1.1). Therefore the problem of finding a T−periodic solution for Eq.(1.1) reduces to finding one for Eq.(2.12).

For convenience, set C_T = {x ∈ C(R, R) | x(t + T) = x(t)} with the norm |x|₀ = max_{t∈[0,T]} |x(t)|; let X = Y = {x = (x₁(·), x₂(·)) ∈ C(R, R²) : x(t) ≡ x(t + T)} with the norm ‖x‖ = max{|x₁|₀, |x₂|₀}; and write |x|_p = (∫₀^T |x(t)|^p dt)^{1/p}. Clearly, X and Y are Banach spaces. Define

L : D(L) = {x = (x₁(·), x₂(·)) ∈ C¹(R, R²) : x(t) ≡ x(t + T)} ⊂ X → Y

by

Lx = x′ = (x₁′, x₂′),

and N : X → Y by

Nx = ( ϕq(x₂), −f(t, x₁(t))x₂(t) − g(x₁(t − τ(t))) + e(t) ).

It is easy to see that Ker L = R² and Im L = {y ∈ Y : ∫₀^T y(s) ds = 0}, so L is a Fredholm operator with index zero. Let P : X → Ker L and Q : Y → Im Q ⊂ R² be defined by

Px = (1/T) ∫₀^T x(s) ds;  Qy = (1/T) ∫₀^T y(s) ds,

and let K denote the inverse of L|_{Ker P ∩ D(L)}. Obviously, Ker L = Im Q = R² and

[Ky](t) = ∫₀^T G(t, s) y(s) ds,    (2.13)

where

G(t, s) = s/T for 0 ≤ s < t ≤ T,  G(t, s) = (s − T)/T for 0 ≤ t ≤ s ≤ T.

From (2.13), we can easily see that N is L−compact on Ω̄, where Ω is an open, bounded subset of X.

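A quick numerical illustration of (2.13) (ours, with an illustrative zero-mean input): for T−periodic y with ∫₀^T y = 0, the function Ky below satisfies (Ky)′ = y, as the definition of K requires:

```python
# Sketch: the operator K of (2.13), [Ky](t) = int_0^T G(t,s) y(s) ds, checked numerically.
import numpy as np

T, n = 2 * np.pi, 4000
s = (np.arange(n) + 0.5) * T / n          # midpoint grid on [0, T]
y = np.cos(s) + 0.5 * np.sin(2 * s)       # zero-mean, T-periodic input

def Ky(t):
    G = np.where(s < t, s / T, (s - T) / T)
    return np.sum(G * y) * T / n          # Riemann sum over [0, T]

t = np.linspace(0.1, T - 0.1, 50)
u = np.array([Ky(v) for v in t])
du = np.gradient(u, t)
err = np.max(np.abs(du - (np.cos(t) + 0.5 * np.sin(2 * t))))
print(err)                                # small: (Ky)' = y, as required of K
```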
3 Main Results

In this section, we give our main results and some examples.

Theorem 3.1 Suppose that there are positive constants σ, β, d such that the following conditions hold:
(H₁) inf_{(t,x)∈[0,T]×R} |f(t, x)| ≥ σ > 0 for all (t, x) ∈ [0, T] × R;
(H₂) sgn(x)·g(x) > |e|₀ for |x| > d;
(H₃) lim sup_{|x|→+∞} |g(x)/x^{p−1}| ≤ β.
Then Eq.(1.1) has at least one T−periodic solution provided that βT^{p−1} < σ.

Proof Consider the following operator equation

Lx = λNx, λ ∈ (0, 1),    (3.14)

where the operators L, N are defined as before. Let

Ω₁ = {x ∈ X : Lx = λNx, λ ∈ (0, 1)}.

If x(t) = (x₁(t), x₂(t)) ∈ Ω₁, then from (3.14) we have

x₁′(t) = λϕq(x₂(t)) = λ|x₂(t)|^{q−2}x₂(t),
x₂′(t) = −λg(x₁(t − τ(t))) − λf(t, x₁(t))x₂(t) + λe(t).    (3.15)

We first assert that there is a constant ξ ∈ R such that

|x₁(ξ)| ≤ d.    (3.16)

In fact, as ∫₀^T x₁′(t) dt = 0, we know that there exist two constants t₁, t₂ ∈ [0, T] such that

x₁′(t₁) ≥ 0, x₁′(t₂) ≤ 0.    (3.17)

From the first equation of (3.15), we have x₂(t) = ϕp(x₁′(t)/λ), so

x₂(t₁) = λ^{1−p}|x₁′(t₁)|^{p−2}x₁′(t₁) ≥ 0,  x₂(t₂) = λ^{1−p}|x₁′(t₂)|^{p−2}x₁′(t₂) ≤ 0.

Let t₃, t₄ ∈ [0, T] be the maximum point and the minimum point of x₂(t). Obviously, we have

x₂(t₃) ≥ 0, x₂′(t₃) = 0,    (3.18)
x₂(t₄) ≤ 0, x₂′(t₄) = 0.    (3.19)

From condition (H₁) and by continuity, f does not change sign for (t, x) ∈ [0, T] × R. Without loss of generality, suppose f(t, x) > 0 for (t, x) ∈ [0, T] × R. Upon substituting (3.18) into the second equation of (3.15), we have

−λg(x₁(t₃ − τ(t₃))) + λe(t₃) = λf(t₃, x₁(t₃))x₂(t₃) ≥ 0,

that is,

g(x₁(t₃ − τ(t₃))) ≤ e(t₃) ≤ |e|₀.    (3.20)

From (H₂) we can get

x₁(t₃ − τ(t₃)) < d.    (3.21)

By a similar argument, from (3.19) we have

g(x₁(t₄ − τ(t₄))) ≥ e(t₄) ≥ −|e|₀,    (3.22)

and by (H₂),

x₁(t₄ − τ(t₄)) > −d.    (3.23)

Case (1): If x₁(t₃ − τ(t₃)) ∈ (−d, d), define ξ = t₃ − τ(t₃); obviously, |x₁(ξ)| ≤ d.
Case (2): If x₁(t₃ − τ(t₃)) ≤ −d, then from (3.23) and the fact that x₁(t) is a continuous function on R, there exists a constant ξ between t₃ − τ(t₃) and t₄ − τ(t₄) such that |x₁(ξ)| = d.

So inequality (3.16) holds.

Since ξ ∈ R, there is an integer k and a constant t₅ ∈ [0, T] such that ξ = kT + t₅, and |x₁(ξ)| = |x₁(t₅)| ≤ d. So

|x₁|₀ ≤ d + ∫₀^T |x₁′(t)| dt ≤ d + T^{1/q}|x₁′|_p.    (3.24)

Substituting x₂(t) = ϕp(x₁′(t)/λ) into the second equation of (3.15), we have

[ϕp(x₁′(t)/λ)]′ + λf(t, x₁(t))ϕp(x₁′(t)/λ) + λg(x₁(t − τ(t))) = λe(t),

that is,

(ϕp(x₁′(t)))′ + λf(t, x₁(t))ϕp(x₁′(t)) + λ^p g(x₁(t − τ(t))) = λ^p e(t).    (3.25)

By condition (H₃), for the given ε = (1/2)[σ/T^{p−1} − β] > 0 there is a constant A > 0 such that

|g(x₁(t − τ(t)))| ≤ (β + ε)|x₁(t − τ(t))|^{p−1}  for |x₁(t − τ(t))| > A.

Set

E₁ = {t | t ∈ [0, T], |x₁(t − τ(t))| ≤ A},  E₂ = {t | t ∈ [0, T], |x₁(t − τ(t))| > A}.

Multiplying both sides of Eq.(3.25) by x₁′(t) and integrating over [0, T], and noticing that by the boundary condition

∫₀^T (ϕp(x₁′(t)))′ x₁′(t) dt = ∫₀^T x₁′(t) dϕp(x₁′(t)) = ∫_{x₁′(0)}^{x₁′(T)} s dϕp(s) = 0,

we have

σ∫₀^T |x₁′(t)|^p dt ≤ λ^{p−1}|∫₀^T x₁′(t)g(x₁(t − τ(t))) dt| + λ^{p−1}|∫₀^T x₁′(t)e(t) dt|
≤ ∫_{E₁} |x₁′(t)||g(x₁(t − τ(t)))| dt + ∫_{E₂} |x₁′(t)||g(x₁(t − τ(t)))| dt + ∫₀^T |x₁′(t)||e(t)| dt
≤ (β + ε)∫₀^T |x₁′(t)||x₁(t − τ(t))|^{p−1} dt + (g_{E₁} + |e|₀)∫₀^T |x₁′(t)| dt,    (3.26)


where g_{E₁} := max_{|u|≤A} |g(u)|. From the Hölder inequality,

∫₀^T |x₁′(t)||x₁(t − τ(t))|^{p−1} dt ≤ (∫₀^T |x₁(t − τ(t))|^{(p−1)·p/(p−1)} dt)^{(p−1)/p} (∫₀^T |x₁′(t)|^p dt)^{1/p},

while

∫₀^T |x₁(t − τ(t))|^{(p−1)·p/(p−1)} dt = ∫₀^T |x₁(t − τ(t))|^p dt ≤ |x₁|₀^p T ≤ (d + T^{1/q}|x₁′|_p)^p T,

so inequality (3.26) becomes

σ|x₁′|_p^p ≤ (β + ε)|x₁′|_p (d + T^{1/q}|x₁′|_p)^{p−1}·T^{1/q} + (g_{E₁} + |e|₀)T^{1/q}|x₁′|_p
= (β + ε)T^{p−1}|x₁′|_p^p + (β + ε)|x₁′|_p T^{1/q} Σ_{k=0}^{p−2} C_{p−1}^k |x₁′|_p^k T^{k/q} d^{p−k−1} + (g_{E₁} + |e|₀)T^{1/q}|x₁′|_p.    (3.27)

From the selection of ε, we see that (β + ε)T^{p−1} < σ, so there is a constant M₁, independent of λ, such that |x₁′|_p < M₁, which yields

|x₁|₀ ≤ d + T^{1/q}|x₁′|_p ≤ d + T^{1/q}M₁ := M₂.    (3.28)

In view of the first equation of (3.15), we have

∫₀^T |x₂(t)|^{q−2}x₂(t) dt = 0,

which implies that there is a constant t₆ ∈ [0, T] such that x₂(t₆) = 0. Therefore

|x₂|₀ ≤ ∫₀^T |x₂′(t)| dt.    (3.29)

Taking absolute values and integrating over [0, T] on both sides of the second equation of (3.15), we obtain

∫₀^T |x₂′(t)| dt ≤ λ∫₀^T |f(t, x₁(t))||x₂(t)| dt + λg_{M₂}T + λ|e|₁
= λ∫₀^T |f(t, x₁(t))||ϕp(x₁′(t)/λ)| dt + λg_{M₂}T + λ|e|₁
≤ g_{M₂}T + f_{M₂}∫₀^T |x₁′(t)|^{p−1} dt + |e|₁
≤ g_{M₂}T + f_{M₂}(∫₀^T 1^p dt)^{1/p}(∫₀^T |x₁′(t)|^p dt)^{(p−1)/p} + |e|₁
≤ g_{M₂}T + f_{M₂}T^{1/p}M₁^{p−1} + |e|₁,

where g_{M₂} := max_{|u|≤M₂} |g(u)|, f_{M₂} := max_{t∈[0,T],|u|≤M₂} |f(t, u)| and |e|₁ := ∫₀^T |e(t)| dt. Thus from (3.29) we have

|x₂|₀ ≤ g_{M₂}T + f_{M₂}T^{1/p}M₁^{p−1} + |e|₁ := M₃.    (3.30)

Let Ω₂ := {x ∈ Ker L : Nx ∈ Im L}. If x ∈ Ω₂, then x ∈ Ker L and QNx = 0. From the assumption that ∫₀^T e(t) dt = 0, we see

|x₂|^{q−2}x₂ = 0,  g(x₁) = 0.    (3.31)

So

|x₁| ≤ d ≤ M₂,  x₂ = 0 ≤ M₃.    (3.32)

Set Ω = {x = (x₁, x₂) ∈ X : |x₁|₀ < N₁, |x₂|₀ < N₂}, where N₁ and N₂ are constants with N₁ > M₂, N₂ > M₃ and (N₂)^q > d·g_d, where g_d := max_{|u|≤d} |g(u)|. Then Ω₁ ⊂ Ω and Ω₂ ⊂ Ω. From (3.32), (3.29) and (3.28), it is obvious that conditions (1) and (2) of Lemma 2.1 are satisfied.

Next, we claim that condition (3) of Lemma 2.1 is also satisfied. For this, define the isomorphism

J : Im Q → Ker L

by J(x₁, x₂) := (−x₂, x₁), and let

H(v, μ) := μv + (1 − μ)JQNv,  (v, μ) ∈ Ω × [0, 1].

By a simple calculation, we obtain that for (x, μ) ∈ ∂(Ω ∩ Ker L) × [0, 1],

x·H(x, μ) = μ(x₁² + x₂²) + (1 − μ)(x₁g(x₁) + |x₂|^q) > 0.

Hence

deg{JQN, Ω ∩ Ker L, 0} = deg{H(x, 0), Ω ∩ Ker L, 0} = deg{H(x, 1), Ω ∩ Ker L, 0} = deg{I, Ω ∩ Ker L, 0} ≠ 0,

and condition (3) of Lemma 2.1 is also satisfied. Therefore, by Lemma 2.1, we conclude that the equation

Lx = Nx

has a solution x(t) = (x₁(t), x₂(t)) in Ω̄, i.e., Eq.(1.1) has a T−periodic solution x₁(t) with |x₁|₀ ≤ M₂. This completes the proof of Theorem 3.1. ∎


Theorem 3.2 Suppose that there exists a constant K > 0 such that

|g(x) − g(y)| ≤ K|x − y|^{p−1}, ∀x, y ∈ R,    (3.33)

and that the conditions (H₁), (H₂) of Theorem 3.1 are also satisfied. Then equation (1.1) has at least one T−periodic solution provided that σ > 2^{(p−1)/p} K |τ|₀^{p−1}.

Remark When 1 < p ≤ 2, a function g(x) satisfying (3.33) is called Hölder continuous, and if p = 2 Hölder continuity becomes Lipschitz continuity. We will see later that there exist functions which are Hölder continuous but not Lipschitz continuous, so our condition is weaker than (1.4) in some sense.

Proof Let Ω₁ be defined as in Theorem 3.1. If x(t) = (x₁(t), x₂(t)) ∈ Ω₁, then from the proof of Theorem 3.1 we know that

(ϕp(x₁′(t)))′ + λf(t, x₁(t))ϕp(x₁′(t)) + λ^p g(x₁(t − τ(t))) = λ^p e(t),    (3.34)

and

|x₁|₀ ≤ d + ∫₀^T |x₁′(t)| dt ≤ d + T^{1/q}|x₁′|_p.    (3.35)

Next we assert that |x₁|₀ is bounded. Multiplying both sides of Eq.(3.34) by x₁′(t) and integrating over [0, T], from the boundary condition we know

∫₀^T (ϕp(x₁′(t)))′ x₁′(t) dt = ∫₀^T x₁′(t) dϕp(x₁′(t)) = ∫_{x₁′(0)}^{x₁′(T)} s dϕp(s) = 0.    (3.36)

So we can get

∫₀^T f(t, x₁(t))|x₁′(t)|^p dt + λ^{p−1}∫₀^T x₁′(t)g(x₁(t − τ(t))) dt = λ^{p−1}∫₀^T e(t)x₁′(t) dt,

i.e.,

σ∫₀^T |x₁′(t)|^p dt ≤ |∫₀^T x₁′(t)g(x₁(t − τ(t))) dt| + |e|₀∫₀^T |x₁′(t)| dt.    (3.37)

However, since ∫₀^T x₁′(t)g(x₁(t)) dt = 0,

|∫₀^T x₁′(t)g(x₁(t − τ(t))) dt| = |∫₀^T x₁′(t)[g(x₁(t − τ(t))) − g(x₁(t))] dt + ∫₀^T x₁′(t)g(x₁(t)) dt|
≤ ∫₀^T |x₁′(t)||g(x₁(t − τ(t))) − g(x₁(t))| dt ≤ K∫₀^T |x₁′(t)||x₁(t − τ(t)) − x₁(t)|^{p−1} dt.

From the Hölder inequality,

∫₀^T |x₁′(t)||x₁(t − τ(t)) − x₁(t)|^{p−1} dt ≤ (∫₀^T |x₁(t − τ(t)) − x₁(t)|^{(p−1)·p/(p−1)} dt)^{(p−1)/p} (∫₀^T |x₁′(t)|^p dt)^{1/p}.

By using Lemma 2.2 we know

(∫₀^T |x₁(t − τ(t)) − x₁(t)|^p dt)^{(p−1)/p} ≤ (2|τ|₀^p ∫₀^T |x₁′(t)|^p dt)^{(p−1)/p} = 2^{(p−1)/p} |τ|₀^{p−1} (∫₀^T |x₁′(t)|^p dt)^{(p−1)/p}.    (3.38)

So inequality (3.37) reduces to

σ|x₁′|_p^p ≤ 2^{(p−1)/p} K |τ|₀^{p−1} |x₁′|_p^p + |e|₀ T^{1/q} |x₁′|_p.

With the condition 2^{(p−1)/p} K |τ|₀^{p−1} < σ we know there exists a constant M₄, independent of λ, such that |x₁′|_p ≤ M₄. The remainder is similar to the proof of Theorem 3.1. ∎

Example 3.1 Consider the following differential equation:

(ϕp(x′(t)))′ + (sin x(t) + 2)ϕp(x′(t)) + (1/64)(x(t − cos t))³ = cos t,    (3.39)

where p = 3, f(t, x) = sin x + 2, g(x) = (1/64)x³, τ(t) = e(t) = cos t, t ∈ [0, 2π]. Then |f(t, x)| ≥ σ = 1, and there is a constant d = 1 > 0 such that sgn(x)g(x) > |e|₀ for |x| > 1. As T = 2π, we see βT^{p−1} = (1/64)(2π)² = π²/16 < 1 = σ, so from Theorem 3.1 we know that Eq.(3.39) has at least one 2π−periodic solution.

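The constants in Example 3.1 are easy to check numerically; the sketch below (ours) evaluates σ on a grid and the quantity βT^{p−1} with β = 1/64 as used in the example:

```python
# Sketch: checking the constants of Example 3.1.
import numpy as np

x = np.linspace(-50, 50, 100001)
sigma = np.min(np.abs(np.sin(x) + 2))        # = 1, so (H1) holds with sigma = 1
beta, T, p = 1 / 64, 2 * np.pi, 3
print(sigma, beta * T ** (p - 1))            # approx 1.0 and pi^2/16 < 1
```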

Example 3.2 Consider the following differential equation:

(ϕ_{3/2}(x′(t)))′ + (e^{t·x(t)} + 2)ϕ_{3/2}(x′(t)) + √(x(t − cos t) − 1) + 1 = sin t,    (3.40)

where p = 3/2, f(t, x) = e^{t·x} + 2 > 2 for all (t, x) ∈ [0, 2π] × R, g(x) = √(x − 1) + 1, τ(t) = cos t, e(t) = sin t; sgn(x)g(x) = √(x − 1) + 1 > 1 for x > 1.

We now verify that the condition imposed on g(x) satisfies (3.33) for p = 3/2. In fact, we need to prove that for all x, y ≥ 1 the following ratio is bounded:

|√(x − 1) − √(y − 1)| / |x − y|^{1/2} ≤ K.    (3.41)

This can be transformed into the inequality

|√x̂ − √ŷ| / |x̂ − ŷ|^{1/2} ≤ K    (3.42)

for all x̂, ŷ ≥ 0. Dividing through by √ŷ (if ŷ = 0, the bound is obviously one), we only need to show that for all x̃ > 0

|√x̃ − 1| / |x̃ − 1|^{1/2} ≤ K.    (3.43)

The left hand side of the above inequality is continuous except perhaps at x̃ = 1. As x̃ → 1 we get lim |√x̃ − 1|/|x̃ − 1|^{1/2} = 0, and as x̃ → +∞, lim |√x̃ − 1|/|x̃ − 1|^{1/2} = 1, and (3.43) follows. It is easy to see that 2^{(p−1)/p} K |τ|₀^{p−1} = 2^{1/3} × 1 × 1 < 2 = σ. By using Theorem 3.2, Eq.(3.40) has at least one 2π−periodic solution.

References

[1] P. De Amster, P. Napoli and M.C. Mariani, Periodic solutions for p−Laplacian like systems with delay, Dynamics of Continuous, Discrete and Impulsive Systems, Series A: Mathematical Analysis, 13(3-4)(2006), 311-319.
[2] W.S. Cheung, J.L. Ren, Periodic solutions for p−Laplacian Liénard equation with a deviating argument, Nonlinear Anal., TMA 59(2004), 107-120.
[3] W.S. Cheung, J.L. Ren, Periodic solutions for p−Laplacian Rayleigh equation, Nonlinear Anal., TMA 65(2006), 2003-2012.
[4] C. Fabry, D. Fayyad, Periodic solutions of second order differential equations with a p-Laplacian and asymmetric nonlinearities, Rend. Ist. Univ. Trieste, 24(1992), 207-227.
[5] R.E. Gaines, J.L. Mawhin, Coincidence Degree and Nonlinear Differential Equations, Springer-Verlag, Berlin, 1977.
[6] J.P. Gossez, P. Omari, Periodic solutions of a second order ordinary differential equation: a necessary and sufficient condition for nonresonance, J. Differential Equations, 94(1991), 67-82.
[7] X.K. Huang, Z.G. Xiang, On the existence of 2π−periodic solutions of Duffing type equation x″(t) + g(x(t − τ)) = p(t), Chinese Science Bulletin, 39(1994), 201-203 (in Chinese).
[8] S.P. Lu, W.G. Ge, Periodic solutions of the second order differential equation with deviating arguments, Acta Mathematica Sinica, 45(2002), 811-818 (in Chinese).
[9] R. Manásevich, J. Mawhin, Periodic solutions for nonlinear systems with p-Laplacian like operators, J. Differential Equations, 145(1998), 367-393.
[10] M.A. del Pino, R. Manásevich, Multiple solutions for the p−Laplacian under global nonresonance, Proc. Amer. Math. Soc., 112(1991), 131-138.
[11] G.Q. Wang, A priori bounds for periodic solutions of a delay Rayleigh equation, Appl. Math. Lett., 12(1999), 41-44.
[12] M.G. Zong, H.Z. Liang, Periodic solutions for Rayleigh type p−Laplacian equation with deviating arguments, Applied Mathematics Letters, 20(2007), 43-47.


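To complement the limit argument above, a small numerical probe (an editorial sketch, not from the paper) can estimate the supremum of the ratio in (3.41) for $g(x) = \sqrt{x - 1} + 1$ over random pairs $x, y \ge 1$; as the analysis predicts, it stays below $K = 1$:

```python
import random

def g(x):
    # g of Example 3.2, defined for x >= 1.
    return (x - 1) ** 0.5 + 1

random.seed(0)
worst = 0.0
for _ in range(100_000):
    x = 1 + 100 * random.random()
    y = 1 + 100 * random.random()
    if x != y:
        # Ratio of (3.41) with Hoelder exponent p - 1 = 1/2.
        worst = max(worst, abs(g(x) - g(y)) / abs(x - y) ** 0.5)
print(f"largest observed ratio: {worst:.4f}")  # < 1, consistent with K = 1
```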
DCDIS A Supplement, Advances in Neural Networks, Vol. 14(S1) 805--810
Copyright@2007 Watam Press

WEB PAGE IMPORTANCE RANKING WITH PRIORI KNOWLEDGE

GUOYANG SHEN, SHIJI SONG
Department of Automation, Tsinghua University, Beijing 100084, China
E-mail: shijis@tsinghua.edu.cn, shengy@mails.tsinghua.edu.cn
Abstract PageRank and its variations have been widely used in current search engines to measure the importance of Web pages. However, the Web graphs used by current search engines are inexact owing to missing edges and additive noise. As a result, the so-calculated importance of Web pages may not be as reasonable as expected. This paper investigates how to achieve better ranking for a noisy Web graph with the help of priori knowledge. In particular, partial-order constraints are used to represent the priori knowledge, and Web page importance ranking is formulated as an optimization problem. The optimization problem is solved by an augmented Lagrange multiplier algorithm. Preliminary simulations demonstrated the promising performance of the presented approach for Web page importance ranking with priori knowledge.

Keywords Search engine; Web page importance ranking; optimization problem; Lagrange multiplier algorithm

1. Introduction

Using search engines has become a major means for people to find information on the Web. Web page importance ranking is one of the most important factors that influence the efficiency with which people find the information they want. The most famous ranking method for Web pages is PageRank[1]. This algorithm uses the Markov random walk model to explain its mathematical foundation, and it was proved that the resultant importance of the Web pages corresponds to the principal eigenvector of the transition probability matrix of the Markov chain corresponding to the Web graph.

The basic assumption of PageRank is that the Web graph it works on is exact and reliable. Unfortunately, this assumption is not true. On the one hand, the Web graph may lose some edges: while constructing the Web graph, some links may be missed due to HTML parsing errors and URL expiration. On the other hand, the graph may contain noisy edges. For example, there are many spam links in the Web[2], which can dramatically degrade the performance of a search engine by reducing the reliability of the PageRank scores of Web pages.

As aforementioned, the Web graph is inexact. Therefore, the importance of Web pages calculated by PageRank may not be as reasonable as expected. For instance, one may find that a very popular Web page is ranked lower than it should be. To tackle this problem, a feasible approach is to use the feedback of users to adjust the ranking so as to make it more reasonable. That is, we can integrate priori knowledge, in terms of partial-order constraints, into the calculation of PageRank, so that the eventual importance of Web pages is jointly determined by the priori knowledge and the intrinsic structure of the Web graph. This is the basic idea of our work.

In particular, we propose to adjust the transition probability matrix of the Web graph to make the invariant probability distribution (corresponding to the importance of Web pages) produced by PageRank satisfy the priori knowledge. Accordingly, we formulate Web page importance ranking as a constrained optimization problem, and we use an augmented Lagrange multiplier algorithm to solve this problem. Preliminary simulations demonstrated the promising performance of our proposed approach.

The rest of this paper is organized as follows. In Section 2, we introduce the PageRank algorithm and the research that followed it. In Section 3, we show the problem of the PageRank algorithm. In Section 4, we propose our approach, called Web page importance ranking by priori knowledge, to tackle this problem. In Section 5, we analyze the adjustment to the Web graph in our approach. In Section 6, we present the preliminary simulations we conducted to test the performance of our approach. After that, we give the conclusions and discussions in the last section.

2. PageRank Algorithm

Now we will introduce the PageRank algorithm. The original PageRank algorithm can be described by the following formula:
$$\pi(v) = \sum_{w \in p(v)} \frac{\pi(w)}{q(w)}, \tag{1}$$
where $\pi(v)$ represents the PageRank value of Web page $v$, $q(w)$ represents the set of child nodes of Web page $w$, and $p(v)$ represents the set of parent nodes of Web page $v$.

We can also use a matrix formula to represent the original PageRank algorithm:
$$\pi^T = \pi^T P, \tag{2}$$
where $\pi$ is the probability distribution over the Web pages (which can also be seen as the states of a Markov chain) and $P$ is the transition probability matrix,
$$P(u, v) = \begin{cases} \dfrac{1}{q(u)}, & \text{if edge } (u, v) \text{ exists}, \\[4pt] 0, & \text{else}. \end{cases} \tag{3}$$

From (2) we can see that $\pi$ is the principal eigenvector of the transition probability matrix $P$, and we can use the power method to obtain $\pi$. If the Markov chain corresponding to the Web graph has an invariant distribution, the power method will converge to the $\pi$ in (2) (see [3]). According to the properties of Markov chains, a finite, irreducible and aperiodic Markov chain has an invariant distribution. In a real Web graph, leaf nodes (which have no child nodes) lead to the reducibility of the Markov chain. To solve this problem, Prasanna Desikan et al. proposed that the leaf nodes be linked to all nodes in the Web graph (see [4]). The new transition probability matrix can be described as in (4):
$$P' = P + \frac{1}{n}\, d \cdot e^T, \qquad d_i = \delta_{q(i),0} = \begin{cases} 1, & q(i) = 0, \\ 0, & q(i) \ne 0, \end{cases} \qquad e = (1, 1, \ldots, 1)^T. \tag{4}$$

At the same time, a damping factor is introduced to link every node to a random node in the Web graph. The intuition is that when a Web user browses a Web page, he may not click a link on the page but instead randomly choose another Web page. So the transition probability matrix is changed to
$$P'' = (1 - \alpha) P' + \alpha K, \tag{5}$$
where $K = \frac{1}{n}\, e \cdot e^T$ and $0 < \alpha < 1$. $P''$ is the transition probability matrix used in the current PageRank algorithm:
$$\pi^T = \pi^T P''. \tag{6}$$
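To make (3)-(6) concrete, here is a minimal Python sketch (an editorial illustration, not code from the paper; the 4-node link structure is hypothetical) that builds $P$, patches the leaf rows as in (4), applies the damping step (5), and approximates $\pi$ in (6) by the power method:

```python
import numpy as np

# Hypothetical toy Web graph: node -> children (node 3 is a leaf).
children = {0: [1, 2], 1: [2], 2: [0], 3: []}
n = len(children)

# (3): P(u, v) = 1/q(u) if the edge (u, v) exists, 0 otherwise.
P = np.zeros((n, n))
for u, kids in children.items():
    for v in kids:
        P[u, v] = 1.0 / len(kids)

# (4): link every leaf node (all-zero row) to all nodes.
d = (P.sum(axis=1) == 0).astype(float)
P1 = P + np.outer(d, np.ones(n)) / n

# (5): damping with the uniform matrix K = e e^T / n.
alpha = 0.15
P2 = (1 - alpha) * P1 + alpha * np.ones((n, n)) / n

# (6): power method for pi^T = pi^T P''.
pi = np.ones(n) / n
for _ in range(100):
    pi = pi @ P2
print("approximate PageRank vector:", pi)
```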
In current search engines, the PageRank algorithm and its variations are widely used, and much research has been done since it was proposed. In 1999, T.H. Haveliwala studied the convergence of the PageRank algorithm[5]. In 2002-2003, T.H. Haveliwala proposed the topic-sensitive PageRank algorithm, which can be adjusted according to the personal interests of users[6]. In 2004, Deng Cai et al. gave different weights to links in different positions of Web pages before calculating PageRank[8]. In 2005, Gui-rong Xue et al. calculated PageRank based on the hierarchical structure of the Web[9].

3. Drawback of the PageRank Algorithm

The aforementioned descriptions show that the ranking result produced by PageRank is mainly determined by the structure of the Web graph (although we have a tunable parameter $\alpha$). This may cause some potential problems. As we know, the eigenvector is sensitive to changes in the matrix. Therefore, the inexactness of the graph structure may impact the final ranking significantly in some cases and lead to unsatisfactory ranking results. For example, with many spam links, a trivial page may get a higher PageRank score than an important Web page. This is surely inconsistent with human knowledge. To tackle this problem, a feasible way is to refine the ranking result produced by PageRank with the priori knowledge of human beings. This is just the idea of our work, which will be elaborated on in the next section.

4. Ranking with Priori Knowledge

It is not very difficult for us to get some priori knowledge or user feedback about the importance of Web pages. For example, the number of visitors may reflect the importance of a Web page. For ease of representation, we denote such priori knowledge by partial-order constraints. For instance, supposing we know that node 1 is more important than node 2, and node 3 is more important than node 4, the partial-order constraints can be represented by the following inequalities:
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \pi > \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \pi, \tag{7}$$
or equivalently,
$$A\pi > B\pi, \tag{8}$$
where $A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$.

Suppose that we have obtained constraints in a format similar to (8). With these constraints, our methodology is to adjust the transition probability matrix so that the invariant distribution of the new Markov chain will be consistent with the constraints as much as possible.
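To make the encoding of (7)-(8) concrete, the sketch below (an editorial illustration; the constraint pairs are hypothetical) builds the 0/1 selection matrices $A$ and $B$ from a list of "page $i$ should outrank page $j$" statements and tests whether a given $\pi$ satisfies $A\pi > B\pi$:

```python
import numpy as np

def order_constraints(pairs, n):
    # One row of A and B per constraint: A @ pi > B @ pi means
    # pi[i] > pi[j] for every pair (i, j) in the list.
    A = np.zeros((len(pairs), n))
    B = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        A[row, i] = 1.0
        B[row, j] = 1.0
    return A, B

# Hypothetical priori knowledge: node 0 outranks node 1, node 2 outranks node 3.
A, B = order_constraints([(0, 1), (2, 3)], n=4)
pi = np.array([0.4, 0.1, 0.3, 0.2])
print("constraints satisfied:", bool(np.all(A @ pi > B @ pi)))
```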
Note that if we do not further constrain the adjustment to the transition probability matrix, we will get many possible adjustments which can satisfy those partial-order constraints. This is not reasonable, because the graph after adjustment may be totally different from the original graph. In our opinion, although the original graph is noisy, much of its information can still be regarded as reliable and informative. Therefore, we should not adjust the transition probability matrix too much.

With the above considerations, we formulate Web page importance ranking as the following optimization problem (denoted by Problem 1 for ease of reference), in which the adjustment is minimized subject to those partial-order constraints:
$$\min f(x) = x^T x, \tag{9}$$
$$\text{s.t. } (P'' - xx^T)e = e, \tag{10}$$
$$\pi^T = \pi^T (P'' - xx^T), \tag{11}$$
$$e^T \pi = 1, \tag{12}$$
$$A\pi > B\pi, \tag{13}$$
$$\pi > 0, \tag{14}$$
where $e = (1, 1, \ldots, 1)^T$, $P''$ is an $n \times n$ matrix, and $x$ is a vector with $n$ dimensions. The matrix $xx^T$ is an $n \times n$ matrix used to adjust $P''$. We do not use a full-rank matrix to adjust $P''$, for it would make the problem too complicated; the matrix derived from a vector can adjust the matrix $P''$ and still simplify the problem.

Problem 1 includes two vectors, $x$ and $\pi$. We make the following transformation. Firstly, we denote $y = \binom{x}{\pi}$; then
$$x = Cy, \quad C = [\, I_{n \times n} \mid 0_{n \times n} \,], \qquad \pi = Dy, \quad D = [\, 0_{n \times n} \mid I_{n \times n} \,],$$
where $I_{n \times n}$ is the $n \times n$ unit matrix. With the above transformation, Problem 1 can be transformed into Problem 2:
$$\min f(x) = g(y) = (Cy)^T Cy = y^T \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} y, \tag{15}$$
$$\text{s.t. } (P'' - Cyy^T C^T)e = e, \tag{16}$$
$$y^T D^T = y^T D^T (P'' - Cyy^T C^T), \tag{17}$$
$$e^T Dy = 1, \tag{18}$$
$$Dy \ge 0, \tag{19}$$
$$ADy > BDy. \tag{20}$$
In order to use the augmented Lagrange multiplier method to solve Problem 2, we now transform it into Problem 3:
$$\min g(y) = y^T \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} y \tag{21}$$
s.t.
$$c_1(y) = e^T (I - P'' + Cyy^T C^T)^T (I - P'' + Cyy^T C^T)\, e = 0, \tag{22}$$
$$c_2(y) = y^T D^T (I - P'' + Cyy^T C^T)(I - P'' + Cyy^T C^T)^T Dy = 0, \tag{23}$$
$$c_3(y) = e^T Dy - 1 = 0, \tag{24}$$
$$\begin{pmatrix} c_4(y) \\ \vdots \\ c_{2n+3}(y) \end{pmatrix} = \begin{pmatrix} Dy \\ (A - B) Dy \end{pmatrix} \ge 0, \tag{25}$$
where $c(y) = (c_1(y), c_2(y), \ldots, c_{2n+3}(y))$ is a vector with $2n+3$ dimensions.

The augmented Lagrangian function of Problem 3 is
$$L(y, \lambda, \sigma) = g(y) - \sum_{j=1}^{3} \lambda_j c_j(y) + \sum_{j=1}^{3} \frac{\sigma_j}{2} c_j^2(y) + \sum_{i=4}^{2n+3} \begin{cases} -\lambda_i c_i(y) + \dfrac{\sigma_i}{2} c_i^2(y), & \text{if } c_i(y) \le \dfrac{\lambda_i}{\sigma_i}, \\[6pt] -\dfrac{\lambda_i^2}{2\sigma_i}, & \text{else}. \end{cases} \tag{26}$$

Denote $c^{(-)}(y) = (c_1^{(-)}(y), \ldots, c_{2n+3}^{(-)}(y))$, where
$$c_i^{(-)}(y) = c_i(y), \quad i = 1, \ldots, 3; \qquad c_i^{(-)}(y) = \min\{0, c_i(y)\}, \quad i = 4, \ldots, 2n+3.$$
We use the augmented Lagrange multiplier algorithm to solve Problem 3.
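As a reading aid for (26) (an editorial sketch, not the authors' code), the piecewise augmented Lagrangian can be evaluated directly once $g$ and the constraints are supplied as callables; the two-variable usage example at the end is hypothetical:

```python
import numpy as np

def aug_lagrangian(y, lam, sig, g, eq_cons, ineq_cons):
    # L(y, lambda, sigma) of (26): quadratic-penalty terms for the
    # equality constraints, Rockafellar-type terms for the inequalities.
    val = g(y)
    k = 0
    for c in eq_cons:          # j = 1, ..., 3 in the paper
        cj = c(y)
        val += -lam[k] * cj + 0.5 * sig[k] * cj ** 2
        k += 1
    for c in ineq_cons:        # j = 4, ..., 2n+3 in the paper
        ci = c(y)
        if ci <= lam[k] / sig[k]:
            val += -lam[k] * ci + 0.5 * sig[k] * ci ** 2
        else:
            val += -lam[k] ** 2 / (2 * sig[k])
        k += 1
    return val

# Hypothetical example: one equality and one inequality constraint.
value = aug_lagrangian(np.array([1.0, 2.0]), lam=[0.5, 0.5], sig=[10.0, 10.0],
                       g=lambda y: float(y @ y),
                       eq_cons=[lambda y: y.sum() - 3.0],
                       ineq_cons=[lambda y: y[0]])
print(value)
```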
Suppose that in the $k$-th iteration we have
$$\nabla g(y^{(k+1)}) - \sum_{j=1}^{3} \left[ \lambda_j^{(k)} - \sigma_j^{(k)} c_j(y^{(k+1)}) \right] \nabla c_j(y^{(k+1)}) - \sum_{j=4}^{2n+3} \max\left\{ \lambda_j^{(k)} - \sigma_j^{(k)} c_j(y^{(k+1)}),\, 0 \right\} \nabla c_j(y^{(k+1)}) = 0. \tag{27}$$
Thus, by taking
$$\lambda_j^{(k+1)} = \lambda_j^{(k)} - \sigma_j^{(k)} c_j(y^{(k+1)}), \quad j = 1, 2, 3, \tag{28}$$
$$\lambda_j^{(k+1)} = \max\left\{ \lambda_j^{(k)} - \sigma_j^{(k)} c_j(y^{(k+1)}),\, 0 \right\}, \quad j = 4, 5, \ldots, 2n+3, \tag{29}$$
as the Lagrange multipliers for the next iteration, from (27)-(29) we know that
$$\nabla g(y^{(k+1)}) - \sum_{j=1}^{2n+3} \lambda_j^{(k+1)} \nabla c_j(y^{(k+1)}) = 0. \tag{30}$$

From (30) we get that for any $k \ge 2$, the error of the Kuhn-Tucker condition at $(y^k, \lambda^{(k)})$ is measured by the pair
$$\left( \nabla_y L_0(y^k, \lambda^{(k)}),\; c^{(-)}(y^k) \right), \tag{31}$$
where
$$L_0(y, \lambda) = g(y) - \sum_{i=1}^{2n+3} \lambda_i c_i(y) \tag{32}$$
is the Lagrange function of Problem 3.

Thus, if $k \ge 2$ and
$$\| c^{(-)}(y^{(k+1)}) \| \le \frac{1}{4} \| c^{(-)}(y^{(k)}) \| \tag{33}$$
cannot be satisfied, we amplify the penalty factor, i.e.,
$$\sigma_i^{(k+1)} = 10\, \sigma_i^{(k)}. \tag{34}$$

Thus the augmented Lagrange multiplier algorithm is given as follows.

Step 1 Initialization: $y^{(1)} \in \mathbb{R}^{2n}$; $\lambda^{(1)} \in \mathbb{R}^{2n+3}$ with $\lambda_i^{(1)} \ge 0$ ($i = 4, \ldots, 2n+3$); $\sigma_i^{(1)} > 0$ ($i = 1, \ldots, 2n+3$); $\varepsilon \ge 0$; $k := 1$.

Step 2 Solve the following unconstrained optimization problem to get $y^{(k+1)}$:
$$\min_{y \in \mathbb{R}^{2n}} L(y, \lambda^{(k)}, \sigma^{(k)}). \tag{35}$$
If $\| c^{(-)}(y^{(k+1)}) \|_\infty \le \varepsilon$, then stop the iteration.

Step 3 For $i = 1, \ldots, 2n+3$, take
$$\sigma_i^{(k+1)} = \begin{cases} \sigma_i^{(k)}, & \text{if } |c_i^{(-)}(y^{(k+1)})| \le \frac{1}{4} |c_i^{(-)}(y^{(k)})|, \\[4pt] \max\left[ 10\,\sigma_i^{(k)},\, k^2 \right], & \text{else}. \end{cases}$$

Step 4 Calculate $\lambda^{(k+1)}$ by using the formulas
$$\lambda_i^{(k+1)} = \lambda_i^{(k)} - \sigma_i^{(k)} c_i(y^{(k+1)}), \quad i = 1, \ldots, 3; \qquad \lambda_i^{(k+1)} = \max\left\{ \lambda_i^{(k)} - \sigma_i^{(k)} c_i(y^{(k+1)}),\, 0 \right\}, \quad i = 4, \ldots, 2n+3; \tag{36}$$
set $k := k + 1$ and go to Step 2.
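These steps translate almost line-for-line into code. Below is a self-contained toy sketch (an editorial illustration; the two-variable problem is hypothetical, and scipy.optimize.minimize merely stands in for the unconstrained solver of Step 2, which the paper does not specify):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: min y1^2 + y2^2  s.t.  y1 + y2 - 1 = 0  and  y1 >= 0.
g = lambda y: y[0] ** 2 + y[1] ** 2
c_eq = lambda y: y[0] + y[1] - 1.0   # equality constraint
c_in = lambda y: y[0]                # inequality constraint, c_in(y) >= 0

def L(y, lam, sig):
    # Augmented Lagrangian in the form of (26).
    val = g(y) - lam[0] * c_eq(y) + 0.5 * sig[0] * c_eq(y) ** 2
    if c_in(y) <= lam[1] / sig[1]:
        val += -lam[1] * c_in(y) + 0.5 * sig[1] * c_in(y) ** 2
    else:
        val += -lam[1] ** 2 / (2 * sig[1])
    return val

y = np.zeros(2)
lam, sig, eps = np.array([0.0, 0.0]), np.array([10.0, 10.0]), 1e-6
viol_prev = np.full(2, np.inf)
for k in range(1, 50):
    y = minimize(lambda z: L(z, lam, sig), y).x                # Step 2
    viol = np.abs([c_eq(y), min(0.0, c_in(y))])                # |c^(-)(y)|
    if viol.max() <= eps:
        break
    # Step 4, using the sigma^(k) that was used in the subproblem.
    lam = np.array([lam[0] - sig[0] * c_eq(y),
                    max(lam[1] - sig[1] * c_in(y), 0.0)])
    # Step 3: amplify penalties whose violations did not shrink enough.
    sig = np.where(viol <= 0.25 * viol_prev, sig, np.maximum(10 * sig, k ** 2))
    viol_prev = viol
print("solution:", y)  # approximately [0.5, 0.5]
```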
LEMMA 1 If the feasible domain of $y$ is not empty, the above algorithm either stops within finitely many iterations or produces a sequence $y^{(k+1)}$ satisfying
$$\liminf_{k \to \infty} g(y^k) < +\infty. \tag{37}$$
Proof. See [10].

For convenience, we call the whole solution of Web page importance ranking with priori knowledge the RPK algorithm.

5. Adjustment Analysis

In this section, we give a lower bound on the adjustment. Suppose $\pi$ is an invariant distribution of the transition probability matrix $P''$; after adjustment, the transition probability matrix $P_1''$ is defined by $P_1'' = P'' - E$, where $E = xx^T$, and the new invariant distribution is denoted $\pi_1$. We denote $A = I - P''$ and $Z = (A + e\pi^T)^{-1}$; then we have

THEOREM 1 If $\pi^T = \pi^T P''$ and $\pi_1^T = \pi_1^T (P'' - E)$, then $\pi^T - \pi_1^T = \pi_1^T E Z$.

Proof: From
$$\pi^T - \pi_1^T = \pi^T P'' - \pi_1^T (P'' - E) = (\pi^T - \pi_1^T) P'' + \pi_1^T E, \tag{38}$$
we have
$$(\pi^T - \pi_1^T)(I - P'') = \pi_1^T E. \tag{39}$$
Note that
$$(\pi^T - \pi_1^T)\, e \pi^T = (\pi^T e - \pi_1^T e)\, \pi^T = (1 - 1)\pi^T = 0, \tag{40}$$
so we have
$$(\pi^T - \pi_1^T)(I - P'' + e\pi^T) = \pi_1^T E. \tag{41}$$
Considering that $I - P'' + e\pi^T$ is an invertible matrix, we have
$$\pi^T - \pi_1^T = \pi_1^T E (I - P'' + e\pi^T)^{-1} = \pi_1^T E Z. \tag{42}$$

Define the F-norm of a matrix as
$$\| A \|_F = \left[ \operatorname{tr}(A^T A) \right]^{\frac{1}{2}}. \tag{43}$$
Noting that $\pi^T - \pi_1^T = \pi_1^T E Z$, we have
$$\| \pi^T - \pi_1^T \|_2 \le \| \pi_1^T \|_2 \| Z \|_F \| E \|_F \le \| Z \|_F \| E \|_F. \tag{44}$$

From (44) we can estimate the lower bound of the adjustment to the transition probability matrix $P''$ needed to satisfy the priori knowledge:
$$\| E \|_F \ge \frac{\| \pi^T - \pi_1^T \|_2}{\| Z \|_F}. \tag{45}$$
Noting that $\| E \|_F = \| xx^T \|_F = x^T x$, we have
$$x^T x \ge \frac{\| \pi^T - \pi_1^T \|_2}{\| Z \|_F}. \tag{46}$$
(46) gives the lower bound of $x^T x$ and shows that minimizing $x^T x$ is reasonable.
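Theorem 1 and the bound (46) are easy to probe numerically. In the sketch below (an editorial addition), $E$ is taken as the difference of two random stochastic matrices rather than the rank-one $xx^T$ of the paper, since the identity in the proof does not use the rank-one structure:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
e = np.ones(n)

def stationary(P):
    # Left eigenvector of P for eigenvalue 1, normalized to sum to 1.
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1))])
    return v / v.sum()

P2 = rng.random((n, n)); P2 /= P2.sum(axis=1, keepdims=True)     # P''
P12 = rng.random((n, n)); P12 /= P12.sum(axis=1, keepdims=True)  # P1'' = P'' - E
E = P2 - P12

pi, pi1 = stationary(P2), stationary(P12)
Z = np.linalg.inv(np.eye(n) - P2 + np.outer(e, pi))

print(np.allclose(pi - pi1, pi1 @ E @ Z))   # Theorem 1: pi^T - pi1^T = pi1^T E Z
# The bound (45)/(46): ||E||_F >= ||pi - pi1||_2 / ||Z||_F.
print(np.linalg.norm(E, "fro") >= np.linalg.norm(pi - pi1) / np.linalg.norm(Z, "fro"))
```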
6. Preliminary Simulations

In this section we present the preliminary simulations we conducted to test the performance of the RPK algorithm.

For the evaluation criterion, we used a metric based on the Kendall distance [11] to measure the similarity between two ranking lists. That is, suppose the ground-truth rank list is $s$ and the rank list we calculated is $t$; then their similarity is calculated as follows:
$$K(s, t) = \left| \{ (i, j) \mid i < j,\ s(i) < s(j),\ t(i) > t(j) \} \right|, \tag{47}$$
where $K(s, t)$ is the Kendall distance, which counts the number of pair-wise disagreements between the two ranking lists $s$ and $t$. The similarity of two ranking lists is then defined as
$$S(s, t) = 1 - \frac{K(s, t)}{C_n^2}. \tag{48}$$
It is clear that $S(s, t)$ takes values between 0 and 1. In particular, when it equals 1, the new rank list is exactly the same as the ground truth; if $S(s, t)$ equals 0, the new rank list is totally different from the ground truth.

For the dataset, we first randomly generated an $n$-node Web graph (denoted by the original graph in the following discussions) and calculated a PageRank vector for this graph. The $C_n^2$ partial orders between all the vertices in this vector are used as the ground truth for evaluation. After that, we generated an inexact graph based on the original graph by means of sampling. In this paper, we consider two kinds of sampling: down-sampling and up-sampling. Down-sampling refers to keeping each edge of the original graph with some probability α (named the sampling rate); the inexactness of the down-sampled graph is caused by information loss, i.e., missing edges. Up-sampling refers to randomly adding some new edges to the original graph (the ratio of newly added edges to the edges of the original graph is named the noise rate); the corresponding inexactness is caused by additive noise. In the next step, we randomly choose some of the partial orders from the original graph as the constraints to rank the inexact graph with the algorithm given in Section 4. The effectiveness of the RPK algorithm is illustrated by the following figures.

Figure 1 The randomly generated original graph

Figure 2 Similarity between the RPK algorithm over the down-sampled graph and original PageRank

Figure 3 Similarity between the RPK algorithm over the up-sampled graph and original PageRank
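For reference, the evaluation protocol of (47)-(48) and the two sampling schemes can be sketched in a few lines (an editorial reconstruction of the described setup, not the authors' code; the graph size and rates follow the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def similarity(s, t):
    # S(s, t) of (48): one minus the Kendall distance (47), normalized by C(n, 2).
    n = len(s)
    disagree = sum(1 for i in range(n) for j in range(i + 1, n)
                   if (s[i] - s[j]) * (t[i] - t[j]) < 0)
    return 1.0 - disagree / (n * (n - 1) / 2)

def down_sample(adj, rate):
    # Keep each edge of the 0/1 adjacency matrix with probability `rate`.
    return adj * (rng.random(adj.shape) < rate)

def up_sample(adj, noise):
    # Add roughly noise * (#edges) randomly placed new edges.
    noisy = adj.copy()
    for _ in range(int(noise * adj.sum())):
        i, j = rng.integers(0, adj.shape[0], size=2)
        noisy[i, j] = 1
    return noisy

adj = (rng.random((30, 30)) < 0.15).astype(int)     # random 30-node graph
print(similarity(rng.random(30), rng.random(30)))   # ~0.5 for unrelated lists
print(adj.sum(), down_sample(adj, 0.6).sum(), up_sample(adj, 0.2).sum())
```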
Figure 1 shows the original graph with 30 vertices. Figure 2 and Figure 3 show the performance of the RPK algorithm with respect to different numbers of constraints, in the cases of down-sampling and up-sampling respectively (without loss of generality, we set the sampling rate to 0.6 and the noise rate to 0.2). The y-axis represents the similarity between the ranking list produced by the RPK algorithm based on the inexact graph and the ranking list of the original graph; the larger the similarity is, the more effective the RPK algorithm is. The x-axis represents the number of constraints. Considering that these constraints were randomly selected, we actually ran our experiment 50 times and report the average performance. From the above figures, we can see that the performance of the RPK algorithm increases with the number of constraints. For example, for the down-sampling case shown in Figure 2, without any constraint the similarity is about 0.7, while with 12 constraints the RPK algorithm improves the performance to almost 0.8, which corresponds to a relative improvement of 14.3%. Note that 12 is a small number of constraints, considering that there are 30 × 29 / 2 = 435 constraints in total for this graph. Similarly, from Figure 3 we also see an absolute improvement from less than 86% to over 92%, or a relative improvement of about 9.3%, for the case of up-sampling with 12 constraints.

7. Conclusion

In this paper, we investigated how to improve Web page importance ranking with the help of priori knowledge. Our main idea is to adjust the transition probability matrix of the Markov chain corresponding to the Web graph, so that the invariant distribution of the Markov chain can satisfy the given partial-order constraints. In particular, we formulated Web page importance ranking with priori knowledge as an optimization problem, and further proposed the RPK algorithm to solve this optimization problem. Preliminary simulations validated the effectiveness of the RPK algorithm.

Besides what we reported in this paper, we are also very interested in some other critical problems related to Web page ranking with priori knowledge, such as how to select more effective constraints and how to further boost the efficiency of the RPK algorithm using specific properties of the Web graph. In addition, as can be seen, we evaluated our algorithm only on a synthetic dataset in this paper. We will do more experiments, especially on real-world large-scale Web graphs, in our future work.

Acknowledgements

This paper is supported by the National Science Foundation of China under grant 60574077.

References

[1] Larry Page, Sergey Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Stanford University, 1998.
[2] Z. Gyongyi, H. Garcia-Molina. Web Spam Taxonomy. Technical Report, Stanford Digital Library Technologies Project, 2004.
[3] Pavel Berkhin. A Survey on PageRank Computing. Internet Mathematics, 2005, 73-120.
[4] Prasanna Kumar Desikan, Nishith Pathak, Jaideep Srivastava, Vipin Kumar. Divide and Conquer Approach for Efficient PageRank Computation. Proc. 6th Intl. Conf. on Web Engineering, 2006, 233-240.
[5] T.H. Haveliwala. Efficient Computation of PageRank. Technical Report, Stanford University, 1999.
[6] T.H. Haveliwala. Topic-Sensitive PageRank. Proc. 11th Intl. World Wide Web Conf., 2002.
[7] T.H. Haveliwala. Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search. IEEE Transactions on Knowledge and Data Engineering, 2003, 784-796.
[8] Deng Cai, Xiaofei He, Ji-Rong Wen, Wei-Ying Ma. Block-level Link Analysis. Proc. 27th Annual Intl. ACM SIGIR Conf., 2004, 440-447.
[9] Gui-rong Xue, Qiang Yang, Hua-Jun Zeng. Exploiting the Hierarchical Structure for Link Analysis. Proc. 28th Annual Intl. ACM SIGIR Conf., 2005, 186-193.
[10] Yaxiang Yuan, Wenyu Sun. Optimization Theory and Methods. Science Press, Beijing, 1997, 475-477.
[11] W.J. Conover. Practical Non-Parametric Statistics, 1980.
AUTHOR INDEX
A
Alexander G. Loukianov 25 Amol Patwardhan 168
Alfonso Iglesias 580 Anamarija Borštnik Bračič 227
Alma Y. Alanis 25 Anjan Kumar Ray 550
Amit Shukla 550 An Jun-Fang 784
B
Bernardino Arcay 580
C
Carlos Dafonte 580 Chen Hung-Cheng 658
Carlos R. Mariaca Gaspar. 512 Chen Jianguo 526
Cao Daosheng 378 Chen Jing 608
Chacón M. 232 Chen Lei 689
Chang Guoliang 285,805 Chen Leiting 481
Chen-Chia Chuang 119 Chen Lin 416
Chen Degang 19 Chen Qiao 31
Chen Guanghai 770 Chen Qijun 470
Chen Ming 694 Chen Wen-Yu 503
Chen Pu 78 Cui Xuehui 41,435
Chen Neiping 779 Chen Yong 313
Chia Chang Hsu 713 Cleber Zanchettin 328
Ching-Hung Lin 146 Cui Ying-kun 382
Chen Guanrong 25 Cui Yuan 439
Chen Dong-Yan 784 Chi Zheru 624
Chen Hanping 36
D
Dae Sik Jeong 408 Donald C. Wunsch II 486,494
Dai Xianzhong 612 Dong Xiaogang 41
Daša Grabec 602 Doreswamy 591
David A. Cartes 486,494 Dušan Grošelj 602
Diego Ordóñez 580 Du W 323
Ding Xuejun 36 Du Yongfeng 463
E
Ebenezer JeyaKumar.A 349 Edvard Govekar 227
Edgar N. Sanchez 25 Eui Chul Lee 408
F
Fan Zhengping 400 Fang Zhongjie 481
Fang Jiancheng 138,338,559 Feng Chunhua 111
Feng Guochen 708 Fu Xiaoling 724
Francis C. K. Wong 650 Fullana.R 618
Fu Jiacai 307 Fx Sun 323
Fu Xiangling 77
G
Geok See Ng 248 Guo Shenghai 450
Guan Xiaohong 395 Gursel Serpen 168
Gunasekaran.N 638 Gwo-Ruey Yu 543
H
Ha Ming-Hu 46,72 Hong Bingrong 248
Hamdi A. Awad 153 Hou Ling 784
Han-Pang Huang 146 Hou Jinjun 77
Hao Zhifeng 294 Hu Cq 323
He Jia 416 Huang Dong 737
He Qiang 222 Huang Qi 608
He Shan 536 Hyun-Ae Park 408
He You 374,522
I
Ieroham S. Baruch 512 I-Hsum Li 57
Igor Grabec 129,227,602 Ivan Ferkolj 602
J
James B. Hayfron-Acquah 307 Jiang Han-qiao 382
Jeng-Chyan Lin 658 Jiang Ju 202,689
Ji Dong-hai 699 Jin Chenxia 746
Ji Luping 369 Jin Fan 313
Ji Pengcheng 133 Jin Feng 182,207
Jia Jun-jing 699 Jin-Tsong Jeng 119
Jian Cheng Lv 163
K
Kang Ryoung Park 408,425 Keyue Zhang 83
Kang Xidai 95 Kuchen.B 618
Keith W. Hipel 703 Kuei Hsiang Chao 713
L
Laxmidhar Behera 550 Li Dan 682
Lei Guo 138,559 Li Dongmei 724
Lei Qiang 253 Li Fachao 746
Li Chunji 719 Li Guocheng 756
Li Jiamin 708 Liu Bo 294
Li Lanlan 450 Liu Baiqi 559
Li Ling 644 Liu Fen 382
Li Peng 260,570 Liu Guisong 369,454
Li Ping 1 Liu Haitao 248
Li Shengjun 719 Liu Jinwang 724
Li Wei 444 Liu Lifeng 694
Li Xiaoli 536 Liu Qinghuai 41,708
Li Xinyu 364,508 Liu Shijun 526
Li Yajie 444 Liu Xiangdong 459
Li Yunxia 163 Liu Yanhui 463
Li Zhonghua 400 Liu Yuan 391
Li Zhongyan 677 Lu Xiaodan 301
Liang Jingwei 193 Lu Shuxia 187
Liang Yc 323 Luo Wenjing 338,386
Ling Ping 265
M
Ma Ning 751 Marifi Güler 664
Ma Wanbiao 682 Michael T. Manry 284
Ma Xicheng 444 Ming Hui 435
Manikopoulos.C.N 644 Mohamed S.Kamel 202,689
Marc Kilgour.D 703 Musa Alcı 671
Mariesa L. Crow 486,494
P
Patch. Beadle 285,596 Peng Xiaoming 791
Patiño.H.D 618 Pu Xiaorong 481
Pattabiraman.J 631 Pucheta.J 618
Q
Qiu Jian-Cong 658 Qiu Huizhong 454,737
Qiao Jihong 89,474
R
Rainer Spiegel 354 Réjean Plamondon 111
Ramakrishnan.S 349 Ren Jia-song 531
Ranganathan.H 631,638 Ren Xue-Kun 728
Raymond Pavloski 271 Rivas P. 232
S
Sabri Arik 6 Shi Jun 450
Sang Yongsheng 565 Shi Juan 307
Saul Escalante M. 512 Song Shiji 133,253,395,805
Schugurensky.C 618 Song Shi-Ji 182,197,207,766
Seong G. Kong 242 Stones LeiZhang 439,565,575
Shang Lifeng 369 Su Shun-Feng 57
Shen Guoyang 596 Sun Shi-Xin 503
Shen Minfen 285 Sun Xiaoshu 374
Shi Daming 248
T
Tao. C. W. 57,543 Tian Shurong 374,522
Teresa B. Ludermir 328 Tolga Ensari 307
Tian Da-Zeng 46 Tong Menghua 586
Tian Jing 72 Tu Guoyu 395
V
Varoon Charastrakul 307 Venugopalan.S 349
W
Walter H. Delashmit 284 Wang Zhiguang 694
Wang Hengyou 19 Wei Bian 13
Wang Hongrui 791 Wei Jiang 242
Wang Hongyan 89,474 Wei Qiuping 459
Wang Guoli 400 Wei Wu 95
Wang Guoyin 313 Wei Xiong 522
Wang Lijun 67 Wei Yuan 798
Wang Meng-Hui 658,713 William S-Y Wang 650
Wang Runqiu 421 Woonseng Gan 586
Wang Qing-lin 474 Wu Cheng 133,182
Wang Sq 323 Wu Chong 217,378
Wang Wancheng 612 Wu Cong-Xin 222,728
Wang Wei-Yen 57 Wu Huaiqin 51
Wang Xinmin 41,708 Wu Jianrong 719,766
Wang Xizhao 187 Wu Rushi 570
Wang Xm 323 Wu Sen-lin 699
Wang Y 323
X
Xia Yu-hui 474 Xin Lin 791
Xiao Tianyuan 253 Xiong Qingyu 608
Xin Guan 522 Xu Chi 624
Xu Fei 391 Xu Jing 742
Xu Jian-zhong 531 Xu Zongben 31
Xu Haiyan 703 Xue Fang 444,463
Xu Hong 416 Xue Xiaoping 13,51
Xu Honglei 36
Y
Yan Cai 624 Yang Shangming 212
Yan Chen 89 Yang Xing 260
Yan Hua 386 Yang Xinsong 770
Yan Xiong 95 Yang Yan-Xuan 207
Yan Zhizhong 435 Yue Panxing 746
Yang Bin-Wei 503 Yi Danhui 391
Yang Chengfu 575 You Hui 450
Yang Gang 779 Yu Linghui 138
Yang Limin 788
Z
Zhang Bo 425,459,742 Zhang Yi1 1,177,212,338,386,454,565,575,774
Zhang C 323 Zhang Yi2 197,689,67
Zhang Chao 95
Zhang Chenggong 177 Zhang Zhi-Ming 46
Zhang Da Yong 217 Zhao Chuan-feng 382
Zhang Jiaqi 470 Zhao Wenqing 798
Zhang Jingxiao 733,738 Zhao Xia 343
Zhang Jiye 83,100,106 Zheng Weifan 100
Zhang JunPeng 439 Zhi Lixia 421,761
Zhang Liming 301 Zhou Cg 323
Zhang Ming 421 Zhou Cai-Li 72
Zhang Qizhi 586 Zhou Dongli 791
Zhang Rui 31 Zhou Lei 197
Zhang Shuguan 106 Zhu Zhi-Gang 728
Zhang Weihua 83,106 Zong Minggang 798
Zhang Yunong 400