Improved Nelder Mead's Simplex Method and Applications
Abstract – Nelder Mead's simplex method is known as a fast and widely used algorithm for local minimum optimization. However, this algorithm by itself does not have enough capability to optimize large scale problems or train neural networks. This paper presents a solution to this deficiency of Nelder Mead's simplex algorithm by incorporating a quasi gradient method. This method approximates the gradients of a function in the vicinity of a simplex by numerical means, without calculating derivatives, and it is much simpler than analytical gradient methods from a mathematical perspective. With this solution, the improved algorithm converges much faster with a higher success rate while still maintaining the simplicity of the simplex method. Testing results of the improved algorithm on several benchmark optimization problems are compared with Nelder Mead's simplex method. The algorithm is then applied to synthesizing lossy ladder filters and to training neural networks to control robot arm kinematics. These typical applications are used to show the ability of the improved algorithm to solve a variety of engineering problems.

Index Terms — Nelder Mead's simplex method, quasi gradient method, lossy filter, Error Back Propagation, Levenberg Marquardt, neural networks
Nelder Mead's simplex method is simple and can converge to local minima without calculating derivatives. To maintain this simplicity, a quasi gradient method is presented to approximate the gradients of a function [16]. This method uses an extra point created from a simplex to approximate gradients. The accuracy of this method depends on the linearity of the function in the vicinity of the simplex. However, its computing cost does not increase significantly when the size of the optimized problem becomes larger.

This method approximates the gradients of an (n+1)-dimensional plane created from a geometrical simplex. By approximating the gradients of the plane, we can approximate the gradients in the vicinity of the simplex. First we select an extra point with its coordinates composed from the (n+1) vertices of a simplex, and then combine this point with n selected vertices of the same simplex to estimate the gradients. The steps are presented as follows.

Assume an optimized function f: R^n -> R, x ∈ R^n.

Step 1: initialize a simplex with (n+1) random vertices x_1, x_2, ..., x_{n+1}.

Step 2: select an extra point x_s with its coordinates composed from n vertices of the simplex. In other words, the coordinates of the selected point are the diagonal of the matrix X formed from n vertices of the simplex:

x_s = (x_{1,1}, x_{2,2}, ..., x_{n,n})    (2)

Step 4: if the function value at R' is smaller than the function value at B, it means that B -> R' is the right direction of the gradient, so R' can be expanded to E':

E' = (1 - γ)B + γR'    (5)

The quasi gradient method using numerical methods presented above is much simpler than analytical gradient methods: it does not have to derive the derivatives of a function, which is usually very difficult for complicated functions. Generally, the improved simplex method with the quasi gradient method is the same as Nelder Mead's simplex method except for the way it calculates the reflected point and the expanded point in the case where Nelder Mead's simplex method cannot define its moving direction.
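Only Steps 1, 2 and 4 of the procedure appear in this excerpt (the gradient estimate itself, eqs. 3-4, is not reproduced here), but the two pieces that are shown are easy to illustrate. The sketch below, written in Python rather than the Matlab used by the authors, builds the extra point x_s from the diagonal of the vertex matrix and performs the expansion of eq. (5); the function names and the tiny usage example are illustrative assumptions, not the paper's code.

```python
import numpy as np

def extra_point(simplex):
    """Step 2: take the diagonal of the (n+1) x n vertex matrix X,
    i.e. x_s = (x_{1,1}, x_{2,2}, ..., x_{n,n})  -- eq. (2)."""
    X = np.asarray(simplex, dtype=float)   # rows are vertices x_1 ... x_{n+1}
    n = X.shape[1]
    return np.array([X[i, i] for i in range(n)])

def expand(best, reflected, gamma=2.0):
    """Step 4: if f(R') < f(B), the direction B -> R' agrees with the
    quasi gradient, so push further: E' = (1 - gamma)*B + gamma*R'  -- eq. (5)."""
    return (1.0 - gamma) * np.asarray(best) + gamma * np.asarray(reflected)

# Tiny usage example on a random 3-D simplex (illustration only).
rng = np.random.default_rng(0)
simplex = rng.uniform(-100, 100, size=(4, 3))   # (n+1) = 4 vertices in R^3
x_s = extra_point(simplex)
print("extra point x_s:", x_s)
print("expanded point :", expand(simplex[0], simplex[1]))
```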
4. EXPERIMENTAL RESULTS
All algorithms are written in Matlab and all experiments are tested on a PC with an Intel Quad processor. Several benchmark functions which are well known in local minimum optimization are tested in these experiments [1], [17], [18], [21]. Each function exhibits features deemed relevant for the purpose of this comparison.
In order to compare the performances of these two algorithms, some assumptions are set: the algorithms start with a random initial simplex in the range of [-100, 100]; the dimensions of all benchmark problems are equal to 10, 15, 20 respectively; the maximum number of iterations is equal to 50,000; the desired error predefined to terminate the algorithms is DE = 0.001; the coefficients are α = 1, β = 0.5, γ = 2; the learning constant is σ = 1. All results in Tables 1-3 are average values calculated over 100 random runs.
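As a rough illustration of how these settings translate into a test harness (the authors' Matlab scripts are not reproduced in the paper), the following Python sketch runs a hypothetical optimizer 100 times from random initial simplices in [-100, 100] and records the average success rate, iteration count and run time under the DE = 0.001 / 50,000-iteration criteria. The `optimizer` callable and its return values are placeholders.

```python
import time
import numpy as np

def benchmark(optimizer, f, n, runs=100, max_iter=50_000, desired_error=1e-3):
    """Average success rate, iterations and wall-clock time over random restarts.
    `optimizer` stands in for a simplex routine and is assumed to return
    (best_function_value, iterations_used)."""
    rng = np.random.default_rng()
    successes, iters, secs = 0, [], []
    for _ in range(runs):
        simplex0 = rng.uniform(-100, 100, size=(n + 1, n))  # random initial simplex
        t0 = time.perf_counter()
        f_best, used = optimizer(f, simplex0, max_iter=max_iter, tol=desired_error)
        secs.append(time.perf_counter() - t0)
        iters.append(used)
        successes += (f_best <= desired_error)
    return successes / runs, float(np.mean(iters)), float(np.mean(secs))
```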
(1) De Jong function 1:

F_1 = \sum_{i=1}^{n} x_i^2

(2) Step function:

F_2 = \sum_{i=1}^{n} \lfloor x_i + 0.5 \rfloor^2

(3) Wood function:

F_3 = \sum_{i=1}^{n} [ 100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 + 90(x_{i+3} - x_{i+2}^2)^2 + (1 - x_{i+2})^2 + 10.1((1 - x_{i+1})^2 + (1 - x_{i+3})^2) + 19.8(1 - x_{i+1})(1 - x_{i+3}) ]

(4) Powell function:

F_4 = \sum_{i=1}^{n} [ (x_i + 10x_{i+1})^2 + 5(x_{i+2} - x_{i+3})^2 + (x_{i+1} - 2x_{i+2})^4 + 10(x_i - x_{i+3})^4 ]

(5) Rosenbrock function:

F_5 = \sum_{i=1}^{n} [ 100(x_{i+1} - x_i^2)^2 + (x_i - 1)^2 ]

(6) Schwefel function:

F_6 = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2

(7) Zakharov function:

F_7 = \sum_{i=1}^{n} x_i^2 + \left( \sum_{i=1}^{n} 0.5 i x_i \right)^2 + \left( \sum_{i=1}^{n} 0.5 i x_i \right)^4

(8) Biggs EXP6 function:

F_8 = \sum_{i=1}^{n} [ x_{i+2} e^{-t_i x_i} - x_{i+3} e^{-t_i x_{i+1}} + x_{i+5} e^{-t_i x_{i+4}} - y_i ]^2

where t_i = 0.5 i and y_i = e^{-t_i} - 5 e^{-10 t_i} + 3 e^{-4 t_i}

(9) Colville function:

F_9 = \sum_{i=1}^{n} [ 100(x_i^2 - x_{i+1})^2 + (x_i - 1)^2 + (x_{i+2} - 1)^2 + 90(x_{i+2}^2 - x_{i+3})^2 + 10.1((x_{i+1} - 1)^2 + (x_{i+3} - 1)^2) + 19.8(x_{i+1} - 1)(x_{i+3} - 1) ]
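For concreteness, a few of the simpler benchmarks above can be written out directly. The Python definitions below (the paper's experiments were done in Matlab) follow the formulas for F1, F2 and F7 as listed above; they are illustrative implementations, not the authors' test code.

```python
import numpy as np

def de_jong_1(x):                        # F1: sum of squares
    x = np.asarray(x, dtype=float)
    return np.sum(x ** 2)

def step(x):                             # F2: step function
    x = np.asarray(x, dtype=float)
    return np.sum(np.floor(x + 0.5) ** 2)

def zakharov(x):                         # F7: Zakharov function
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    s = np.sum(0.5 * i * x)
    return np.sum(x ** 2) + s ** 2 + s ** 4

# Quick sanity check: all three have minimum value 0 at the origin.
print(de_jong_1([1, 2, 3]), step([0.2, -0.7]), zakharov([0.0] * 10))
```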
Function | Nelder Mead's simplex method              | Improved simplex method with quasi gradient
         | Success rate | Iteration | Computing time | Success rate | Iteration | Computing time
F1       | 100%         |   989     | 0.0907         | 100%         |   531     | 0.0806
F2       | 100%         |   935     | 0.0863         | 100%         |   535     | 0.0825
F3       |  41%         |  2148     | 0.2042         |  52%         |  1259     | 0.2038
F4       | 100%         |  1041     | 0.1037         | 100%         |   777     | 0.1346
F5       |  55%         |  9606     | 0.8812         |  76%         |  7993     | 1.2094
F6       | 100%         |  1772     | 0.1666         | 100%         |   898     | 0.1400
F7       |  99%         |  3415     | 0.3195         | 100%         |  1208     | 0.1879
F8       |  52%         |  1158     | 0.11186        |  60%         |  2084     | 0.3301
F9       |  46%         |  2065     | 0.2026         |  50%         |  1251     | 0.2081
F10      |  65%         |  3435     | 0.3321         |  81%         |  4012     | 0.6398

Table 1: Evaluation of success rate and computing time of 10-dimensional functions
In the experiments run on a PC with an Intel Quad processor, the improved algorithm with the quasi gradient method shows better performance than Nelder Mead's simplex method in terms of success rate and computing time. When the scale of the optimization problems becomes larger, Nelder Mead's simplex method achieves a lower success rate. With the same random choice of initial vertices, the improved simplex method always obtains a higher convergence rate and less computing time than Nelder Mead's simplex method. This means that the improved simplex method is more reliable and more effective in optimization than the original simplex method.

5. APPLICATIONS

In the previous sections, the improved simplex method was presented, and it has shown better performance on several benchmark optimization functions. This section applies the new algorithm to synthesizing lossy filters and to training neural networks to control robot arm kinematics.

5.1 Synthesis of lossy ladder filters

Ladder filters are made up of inductors and capacitors and are widely used in communication systems. Designing a good filter with a desired frequency response is a challenging task, because traditional procedures such as Butterworth, Chebychev or inverse Chebychev synthesize filters without the effects of lossy inductors and capacitors (Fig. 3). Therefore, the frequency responses of these ideal filters are much different from those of real filters. In order to implement a real filter with lossy elements having characteristics similar to a prototype filter, we have to shift the pole locations of the real filter close to the pole locations of the prototype filter. In this part, the improved algorithm can be used to replace analytical solutions, which are very complex and inefficient, especially for high order filters.

Fig. 3. Models of a real capacitor and a real inductor

Our example is to design a 4th-order low-pass Chebychev filter with pass-band attenuation αp = 3 dB, stop-band attenuation αs = 30 dB, and pass-band frequency ωp = 1 kHz. The transfer function of the ideal prototype filter is
H(S) = 0.17198 / (S^4 + 0.88598 S^3 + 1.22091 S^2 + 0.64803 S + 0.17198)    (6)

while the transfer function of the real filter with lossy elements has the form

H(S) = 1 / (a_1 S^4 + a_2 S^3 + a_3 S^2 + a_4 S + a_5)    (7)

where:

a_1 = C_1 C_2 L_1 L_2 R_2

a_2 = C_1 L_1 L_2 + 0.1 C_1 C_2 L_1 R_2 + 0.1 C_1 C_2 L_2 R_2 + 0.1 C_1 L_1 L_2 R_2 + 0.1 C_2 L_1 L_2 R_2 + C_1 C_2 L_2 R_1 R_2

a_3 = 0.1 C_1 L_1 + 0.1 C_1 L_2 + 0.1 L_1 L_2 + C_1 L_2 R_1 + 0.01 C_1 C_2 R_2 + 1.01 C_1 L_1 R_2 + 1.01 C_2 L_1 R_2 + 0.01 C_1 L_2 R_2 + 1.01 C_2 L_2 R_2 + 0.01 L_1 L_2 R_2 + 0.1 C_1 C_2 R_1 R_2 + 0.1 C_1 L_2 R_1 R_2 + 0.1 C_2 L_2 R_1 R_2    (8)

a_4 = 0.01 C_1 + 1.01 L_1 + 1.01 L_2 + 0.1 C_1 R_1 + 0.1 L_2 R_1 + 0.101 C_1 R_2 + 0.201 C_2 R_2 + 0.201 L_1 R_2 + 0.101 L_2 R_2 + 1.01 C_1 R_1 R_2 + 1.01 C_2 R_1 R_2 + 0.01 L_2 R_1 R_2

a_5 = 0.201 + 1.01 R_1 + 1.0301 R_2 + 0.201 R_1 R_2

Fig. 7. Pole locations of filters

In order to design a filter having the desired characteristics of the ideal filter, its transfer function (eq. 7) has to be similar to the transfer function of the prototype filter (eq. 6). In other words, its pole locations have to be close to the pole locations of the ideal filter (Fig. 7). According to the analytical method presented in [23], one has to solve a nonlinear system (eq. 9) of five equations with five unknowns (assuming R_2 is known). Clearly, the analytical method is not an effective way to find proper solutions for this filter. Firstly, it is not easy to solve this nonlinear system, especially for high order filters. Secondly, its solutions may not be usable in a real implementation if one of the component values is a negative number.
a_1 = 1
a_2 = 0.88598
a_3 = 1.22091    (9)
a_4 = 0.64803
a_5 = 0.17198

As presented in the previous sections, the improved simplex method has the ability to optimize high dimensional problems with a very reliable convergence rate. For this particular application, the algorithm can be applied very effectively to synthesize filters. Instead of using the analytical method, we can use the improved simplex method to optimize the error function (eq. 10) between the coefficients of the two transfer functions (eq. 6, 7). To guarantee reasonable results with all positive values, we may divide the numerator and denominator of the transfer function of the real filter by the value of C_1 in this case (which does not change the characteristics of the filter). The desired filter with R_1 = 0.07131, R_2 = 0.2, C_1 = 3.3831, L_1 = 0.70676, C_2 = 9.7949, L_2 = 0.7189 has a similar frequency response and pole locations as the ideal filter (Fig. 6, 7).

Er = (a_1 - 1)^2 + (a_2 - 0.8859)^2 + (a_3 - 1.2209)^2 + ...    (10)
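To make the optimization target concrete, the Python sketch below forms the denominator coefficients a_1..a_5 from the component values using the expressions of eq. (8) as given above (all terms taken as additive, which is an assumption) and builds an eq. (10)-style objective against the prototype values of eq. (9); the terms for a_4 and a_5 are assumed to follow the same squared-difference pattern as the visible ones. This is an illustration of how the objective would be constructed, not the authors' implementation.

```python
def denominator_coeffs(R1, R2, C1, C2, L1, L2):
    """Coefficients a1..a5 of eq. (7), written out from eq. (8) above
    (all terms assumed additive)."""
    a1 = C1*C2*L1*L2*R2
    a2 = (C1*L1*L2 + 0.1*C1*C2*L1*R2 + 0.1*C1*C2*L2*R2
          + 0.1*C1*L1*L2*R2 + 0.1*C2*L1*L2*R2 + C1*C2*L2*R1*R2)
    a3 = (0.1*C1*L1 + 0.1*C1*L2 + 0.1*L1*L2 + C1*L2*R1
          + 0.01*C1*C2*R2 + 1.01*C1*L1*R2 + 1.01*C2*L1*R2
          + 0.01*C1*L2*R2 + 1.01*C2*L2*R2 + 0.01*L1*L2*R2
          + 0.1*C1*C2*R1*R2 + 0.1*C1*L2*R1*R2 + 0.1*C2*L2*R1*R2)
    a4 = (0.01*C1 + 1.01*L1 + 1.01*L2 + 0.1*C1*R1 + 0.1*L2*R1
          + 0.101*C1*R2 + 0.201*C2*R2 + 0.201*L1*R2 + 0.101*L2*R2
          + 1.01*C1*R1*R2 + 1.01*C2*R1*R2 + 0.01*L2*R1*R2)
    a5 = 0.201 + 1.01*R1 + 1.0301*R2 + 0.201*R1*R2
    return a1, a2, a3, a4, a5

PROTOTYPE = (1.0, 0.88598, 1.22091, 0.64803, 0.17198)   # targets from eq. (9)

def coefficient_error(components):
    """Eq. (10)-style objective: sum of squared differences between the real
    filter's coefficients and the prototype's (last two terms assumed)."""
    a = denominator_coeffs(*components)
    return sum((ai - pi) ** 2 for ai, pi in zip(a, PROTOTYPE))

# coefficient_error((R1, R2, C1, C2, L1, L2)) is the function that a simplex
# optimizer would minimize instead of solving the nonlinear system (eq. 9).
```

Handing this objective over the component values (or over five of them, with R_2 fixed) to the improved simplex method is the optimization described in the text; the paper additionally normalizes the transfer function by C_1, a step omitted in this sketch.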
5.2 Training neural networks to control robot arm kinematics

Many algorithms are good at training some network architectures but are not good at training others. In other words, it is not easy to find a reliable algorithm which has the ability to train all types of neural networks. Error Back Propagation (EBP) is considered a breakthrough in training neural networks, but it is not an effective algorithm because of its slow convergence. Levenberg Marquardt is much faster, but it is not suitable for large networks. Although these two algorithms are well known, they usually face difficulties in many real applications because of their complex computations. Training neural networks to control robot arm kinematics is a typical example.

Fig. 8: Two-link planar manipulator

Fig. 11: Desired output and actual output from the neural network in the y direction

As we can see, the desired outputs and the actual outputs from the neural network are not very different, with an error of about 0.001. The advantage of this algorithm in training neural networks is that its computational cost is proportional to the number of weights, not the number of input patterns. For this particular case, all 2500 patterns can be applied at once, and the improved simplex method will train the neural network by optimizing a function with seven variables, which correspond to the seven weights. In contrast, training neural networks with Error Back Propagation for 2500 patterns seems impractical because EBP has to adjust its weights for each training pattern in each iteration. This training process is very time-consuming and inapplicable for this particular case. Levenberg Marquardt is known as a fast training algorithm, but its training ability is limited by the number of input patterns P, weights N, and outputs M. In other words, the problem becomes more difficult as the size of the network increases. In each iteration, this algorithm has to calculate the Jacobian matrix J of size (P·M) × N and the inversion of the J^T J square matrix. As a result, LM cannot train a neural network with seven weights and 2500 input patterns for this robot arm kinematics problem because the huge Jacobian matrix J_{2500×7} exceeds the computing capability of PC computers. In order to train neural networks with EBP or LM, the size of the training pattern set usually has to be reduced. This eases the computing burden, but it affects the accuracy of the neural network outputs significantly; the actual outputs may then be much different from the desired outputs. In contrast, an increased number of input patterns may affect the convergence rate but not the training ability of the improved simplex method. This characteristic makes it different from Error Back Propagation.
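The point that the computational cost scales with the number of weights rather than the number of patterns can be illustrated with a toy batch objective. The sketch below is not the paper's network (its exact architecture and its seven weights are not detailed in this excerpt); it uses a stand-in two-input network with five weights simply to show an error function that evaluates all training patterns at once while exposing only the weight vector to a simplex-type optimizer.

```python
import numpy as np

def batch_error(weights, inputs, targets):
    """Sum-of-squares error over ALL patterns for a tiny 2-input, 1-output
    network with one hidden neuron (a stand-in, not the paper's architecture).
    The optimizer only ever sees `weights`, so the search dimension stays
    fixed no matter how many patterns are supplied."""
    w = np.asarray(weights, dtype=float)
    w_h, b_h, w_o, b_o = w[0:2], w[2], w[3], w[4]          # 5 weights here
    hidden = np.tanh(inputs @ w_h + b_h)                   # all patterns at once
    outputs = w_o * hidden + b_o
    return np.sum((outputs - targets) ** 2)

# 2500 random patterns: one evaluation touches every pattern, but the
# optimization problem itself stays 5-dimensional.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(2500, 2))
y = rng.uniform(-1, 1, size=2500)
print(batch_error(np.zeros(5), X, y))
```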
6. CONCLUSIONS

REFERENCES

[1] J. A. Nelder, R. Mead, "A Simplex Method for Function Minimization", Computer Journal, vol. 7, pp. 308-313, 1965.
[2] I. Jordanov, A. Georgieva, "Neural Network Learning with Global Heuristic Search", IEEE Trans. on Neural Networks, vol. 18, no. 3, pp. 937-942, 2007.
[3] S. Shrivastava, K. R. Pardasani, M. M. Malik, "SVM Model for Identification of Human GPCRs", Journal of Computing, vol. 2, no. 2, 2010.
[4] G. Guo, Y. Shouyi, "Evolutionary Parallel Local Search for Function Optimization", IEEE Trans. on Systems, Man, and Cybernetics, vol. 33, no. 6, pp. 864-876, 2003.
[5] D. Stiawan, A. H. Abdullah, M. Y. Idris, "Classification of Habitual Activities in Behavior-based Network Detection", Journal of Computing, vol. 2, no. 8, 2010.
[6] E. Öz, İ. Guney, "A Geometric Programming Approach for the Automotive Production in Turkey", Journal of Computing, vol. 2, no. 7, 2010.
[7] H. Miyamoto, K. Kawato, T. Setoyama, R. Suzuki, "Feedback-Error Learning Neural Network for Trajectory Control of a Robotic Manipulator", IEEE Trans. on Neural Networks, vol. 1, no. 3, pp. 251-265, 1988.
[8] Y. Fukuyama, Y. Ueki, "An Application of Neural Networks to Dynamic Dispatch Using Multi Processors", IEEE Trans. on Power Systems, vol. 9, no. 4, pp. 1759-1765, 1994.
[9] S. Sadek, A. Al-Hamadi, B. Michaelis, U. Sayed, "Efficient Region-Based Image Querying", Journal of Computing, vol. 2, no. 6, 2010.
[10] D. E. Rumelhart, G. E. Hinton, R. J. Williams, "Learning Representations by Back-propagating Errors", Nature, vol. 323, pp. 533-536, Oct. 9, 1986.
[11] M. T. Hagan, M. Menhaj, "Training Feedforward Networks with the Marquardt Algorithm", IEEE Trans. on Neural Networks, vol. 5, no. 6, pp. 989-993, 1994.
[12] K. Levenberg, "A Method for the Solution of Certain Problems in Least Squares", Quart. Appl. Math., vol. 2, pp. 164-168, 1944.
[13] B. M. Wilamowski, H. Yu, "Improved Computation for Levenberg-Marquardt Training", IEEE Trans. on Neural Networks, vol. 21, no. 6, pp. 930-937, June 2010.
[14] R. A. Jacobs, "Increased Rates of Convergence through Learning Rate Adaptation", IEEE Trans. on Neural Networks, vol. 5, no. 1, pp. 295-307, 1988.
[15] T. Tollenaere, "SuperSAB: Fast Adaptive Back Propagation with Good Scaling Properties", IEEE Trans. on Neural Networks, vol. 3, pp. 561-573, 1990.
[16] R. Salomon, J. L. van Hemmen, "Accelerating Back Propagation through Dynamic Self-Adaption", IEEE Trans. on Neural Networks, vol. 9, pp. 589-601, 1996.
[17] M. Manic, B. M. Wilamowski, "Random Weights Search in Compressed Neural Networks Using Overdetermined Pseudoinverse", IEEE International Symposium on Industrial Electronics 2003, vol. 2, pp. 678-683, 2003.
[18] L. Nazareth, P. Tseng, "Gilding the Lily: A Variant of the Nelder-Mead Algorithm Based on Golden-Section Search", Comput. Optim. Appl., vol. 22, no. 1, pp. 133-144, 2002.
[19] F. Gao, L. Han, "Implementing the Nelder-Mead Simplex Algorithm with Adaptive Parameters", Comput. Optim. Appl., 4 May 2010.
[20] V. Torczon, "Multi-directional Search: A Direct Search Algorithm for Parallel Machines", Ph.D. Thesis, Rice University, TX, 1989.
[21] J. T. Betts, "Solving the Nonlinear Least Square Problems: Application of a General Method", Journal of Optimization Theory and Applications, vol. 8, no. 4, 1976.
[22] B. M. Wilamowski, R. Gottiparthy, "Active and Passive Filter Design with MATLAB", International Journal on Engineering Education, vol. 21, no. 4, pp. 561-571, 2005.
[23] M. Jagiela, B. M. Wilamowski, "A Methodology of Synthesis of Lossy Ladder Filters", 13th IEEE Intelligent Engineering Systems Conference, INES 2009, Barbados, April 16-18, 2009, pp. 45-50.

... Electrical and Computer Engineering, Auburn University. His main interests include numerical optimization, neural networks, database systems and network security.

Bogdan M. Wilamowski (M'82, SM'83, F'00) received the M.S. degree in computer engineering, the Ph.D. degree in neural computing, and the Dr. Habil. degree in integrated circuit design in 1966, 1970, and 1977, respectively. He received the title of Full Professor from the President of Poland in 1987. He was the Director of the Institute of Electronics (1979-1981) and the Chair of the Solid State Electronics Department (1987-1989), Technical University of Gdansk, Gdansk, Poland. He was/has been a Professor with the Gdansk University of Technology, Gdansk (1987-1989), the University of Wyoming, Laramie (1989-2000), the University of Idaho, Moscow (2000-2003), and Auburn University, Auburn, AL (2003-present), where he is currently the Director of the Alabama Micro/Nano Science and Technology Center and a Professor with the Department of Electrical and Computer Engineering. He was also with the Research Institute of Electronic Communication, Tohoku University, Sendai, Japan (1968-1970), the Semiconductor Research Institute, Sendai (1975-1976), Auburn University (1981-1982 and 1995-1996), and the University of Arizona, Tucson (1982-1984). He is the author of four textbooks and about 300 refereed publications and is the holder of 28 patents.