
INTERNATIONAL SCIENTIFIC CONFERENCE

17-18 November 2017, GABROVO

LONG SHORT TERM MEMORY IN MLP PAIR


Todor Balabanov
Institute of Information and Communication Technologies - Bulgarian Academy of Sciences
akad. Georgi Bonchev Str., block 2, office 514, 1113 Sofia, Bulgaria
+359898237103, todorb@iinf.bas.bg

Abstract
This study focuses on the possibility of introducing long short-term memory into the classical multilayer perceptron (MLP). The MLP has no recurrent links and therefore no memory capabilities. The proposed solution improves the time series forecasting abilities of the MLP by introducing a modified long short-term memory model built as a pair of artificial neural networks.

Keywords: artificial neural networks, time series forecasting, backpropagation, machine learning.

INTRODUCTION

Artificial neural networks (ANNs) were introduced in the middle of the 20th century as mathematical models that aim to describe natural neural systems. In their organization, ANNs are oriented weighted graphs. ANNs are basically used in two modes - training and usage. The training mode is a search for such weight values [7,9,10] that the ANN reproduces the mapping from input information to output information as well as possible. In the most widely used kind of ANN, the MLP, signals are transmitted from the input to the output. The input of each neuron comes from the outputs of the neurons before it. The external impact obtained in this way is processed by a summation function. The most commonly used summation function is linear (the neurons' output signals are multiplied by the weights of the connections). This is the general case, but functions other than the linear one can also be applied. The output of the summation function is then passed to a normalization function, which determines the activation level of the neuron. A set of activation functions has been proposed in the literature; some of them are well established, while others are less so [11].
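For illustration, the summation and activation steps described above can be sketched as follows (a generic example whose names are chosen here, not code from the paper):

// Minimal sketch of a single neuron forward pass: a weighted sum of the
// inputs (linear summation function) followed by an activation function.
// This is a generic illustration, not code from the paper.
public final class Neuron {
    private final double[] weights; // one weight per incoming connection
    private final double bias;

    public Neuron(double[] weights, double bias) {
        this.weights = weights;
        this.bias = bias;
    }

    // Linear summation function: input signals times connection weights.
    private double summation(double[] inputs) {
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return sum;
    }

    // Sigmoid used here as a stand-in for any normalization (activation) function.
    private static double activation(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // The neuron's output: the activation level of the summed external impact.
    public double output(double[] inputs) {
        return activation(summation(inputs));
    }
}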
input information in each neuron is coming
from the outputs of the previous neurons. The
external impact obtained by this way is then MLP PAIR
processed in a summation function. The most
used summation function is the linear function If we put two MLPs in pair we can
(neurons' output signals are multiplied by the achieve network memory (Fig. 1). As input of
weights of the connections). This is the general MLP1 past values of the time series are
case but functions different than the linear provided with combination of MPL2 output.
function also can be applied. The output of the At the output of MLP1 the forecast is
summation function is then transferred to a expected. MLP1 output is also used as input
function for normalization which determines for MLP2. After that signals are propagated in
the activation level of the neuron. In the MLP2. By this way MLP2 is used as network
literature a set of activation functions are memory. In this implementation both MLPs
proposed. Some of them are well established are trained with backpropagation of the error.
when others not so well [11]. The values of the time series are normalized
before to be supplied in MLP1. The time series
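The data flow of the pair can be sketched as follows. Both networks are treated as black-box vector functions; the class and method names, as well as the min-max normalization, are illustrative assumptions rather than code from the paper or from the Encog library:

import java.util.function.Function;

// Sketch of the forecasting loop of the MLP pair (Fig. 1). Each MLP is
// treated as a black-box mapping from an input vector to an output vector;
// all names here are illustrative assumptions.
public final class MlpPair {
    private final Function<double[], double[]> mlp1; // forecasting network
    private final Function<double[], double[]> mlp2; // memory network
    private double[] memory; // last output of MLP2, fed back into MLP1

    public MlpPair(Function<double[], double[]> mlp1,
                   Function<double[], double[]> mlp2, int memorySize) {
        this.mlp1 = mlp1;
        this.mlp2 = mlp2;
        this.memory = new double[memorySize];
    }

    // One step: MLP1 receives the lag window concatenated with MLP2's
    // previous output; MLP1's forecast (the lead frame) is then propagated
    // through MLP2, whose output becomes the memory for the next step.
    public double[] forecast(double[] lagWindow) {
        double[] input = new double[lagWindow.length + memory.length];
        System.arraycopy(lagWindow, 0, input, 0, lagWindow.length);
        System.arraycopy(memory, 0, input, lagWindow.length, memory.length);
        double[] lead = mlp1.apply(input);
        memory = mlp2.apply(lead);
        return lead;
    }

    // Min-max normalization of the raw series into [0, 1] before it is
    // supplied to MLP1 (the exact scheme used in the paper is not specified).
    public static double[] normalize(double[] series) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : series) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double[] scaled = new double[series.length];
        for (int i = 0; i < series.length; i++) {
            scaled[i] = (series[i] - min) / (max - min);
        }
        return scaled;
    }
}

With the topologies reported in the experiments below (MLP1 with topology 20-11-5 and MLP2 with topology 5-6-5), the lag window would contain 15 normalized values and the memory vector 5.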



Fig. 1. MLP Pair.

ACTIVATION FUNCTION ALTERNATIVES

It is possible to use other functions instead of the most widely used activation functions. In this research two different activation functions are proposed (Fig. 2-5).

The fading sine function is presented in Fig. 2 and its formula is given by Eq. 1.

(1)

Fig. 2. Fading sin activation function.

The fading sine derivative is presented with Eq. 2 and visualized in Fig. 3. A small disadvantage of this derivative is that it has two break points.

(2)

Fig. 3. Fading sin derivative.

The second proposal, the exponent regulated sine shown in Fig. 4 with the formula in Eq. 3, has less periodic influence.

(3)

Fig. 4. Exponent regulated sin activation function.

The derivative of the exponent regulated sine is given by Eq. 4 and visualized in Fig. 5.

(4)

It can also be written as a formula that uses the function itself, as shown in Eq. 5.

(5)

A disadvantage of this derivative is that it has a break point when x is zero.

Fig. 5. Exponent regulated sin derivative.
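The exact formulas of Eq. 1-5 appear only as images in the source and are not reproduced here. Purely for illustration, the following sketch assumes plausible forms matching the described behavior (a sine that fades away from the origin, and a sine regulated by a decaying exponent whose derivative can be written through the function itself and breaks at zero); the paper's own definitions may differ:

// Hypothetical illustration only: the forms below are assumptions chosen to
// match the behavior described in the text, not the paper's definitions.
public final class AlternativeActivations {
    // Assumed "fading sine": a sine wave damped by the distance from zero.
    public static double fadingSin(double x) {
        return x == 0.0 ? 1.0 : Math.sin(x) / x;
    }

    // Assumed "exponent regulated sine": a sine damped by a decaying exponent.
    public static double exponentRegulatedSin(double x) {
        return Math.exp(-Math.abs(x)) * Math.sin(x);
    }

    // Derivative of the assumed exponent regulated sine, written in terms of
    // the function itself (cf. Eq. 5); signum(x) makes it break at x = 0.
    public static double exponentRegulatedSinDerivative(double x) {
        return Math.exp(-Math.abs(x)) * Math.cos(x)
                - Math.signum(x) * exponentRegulatedSin(x);
    }
}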



In both derivatives there is a periodic component (Fig. 3 and Fig. 5). Periodic functions are problematic in gradient descent methods such as backpropagation of the error. An efficient way to escape these complications is to artificially replace the derivatives with a much smoother and much more effective function, as proposed in Fig. 6. Such an artificial replacement of the derivatives is possible because of the way in which the Encog ANN programming library is designed.

Fig. 6. Alternative activation function derivative.
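A minimal sketch of such a replacement, assuming Encog 3.3's ActivationFunction interface, is given below. Encog's training code obtains the derivative from the activation object itself, so returning a smooth surrogate instead of the analytic derivative changes what backpropagation sees. The sinc-style activation and the bell-shaped surrogate derivative here are placeholders, not the exact functions from Fig. 2 and Fig. 6:

import org.encog.engine.network.activation.ActivationFunction;

// Sketch of how a derivative can be artificially replaced in Encog 3.3:
// the activation is a fading-sine-like function, while the reported
// derivative is a smooth, non-periodic surrogate (placeholder formula).
public class ActivationFadingSin implements ActivationFunction {
    @Override
    public void activationFunction(double[] values, int start, int size) {
        for (int i = start; i < start + size; i++) {
            values[i] = values[i] == 0.0 ? 1.0 : Math.sin(values[i]) / values[i];
        }
    }

    @Override
    public double derivativeFunction(double before, double after) {
        // Smooth surrogate instead of the analytic (periodic) derivative.
        return 1.0 / (1.0 + before * before);
    }

    @Override
    public boolean hasDerivative() {
        return true;
    }

    @Override
    public double[] getParams() {
        return new double[0];
    }

    @Override
    public void setParam(int index, double value) {
    }

    @Override
    public String[] getParamNames() {
        return new String[0];
    }

    @Override
    public ActivationFunction clone() {
        return new ActivationFadingSin();
    }

    @Override
    public String getFactoryCode() {
        return null; // not registered with the Encog activation factory
    }
}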

The usage of a fading function gives the neurons a saturating behavior with respect to the input signals: if the input signals are too strong, the neurons stop working. Such a barrier divides the responsibilities of the neurons in the hidden layers much better.

EXPERIMENTS AND RESULTS

All the experiments performed and the results achieved are done with the Encog program library. All experiment runs are conducted on a desktop computer with the macOS Sierra 10.12.2 operating system, a 2.3 GHz Intel Core i5 processor and 8 GB of RAM, using Encog Core v3.3.0 (Java version) and Java 8 Update 112 (64-bit).

Fig. 7. Single MLP Convergence.

Fig. 8. MLP Pair Convergence.
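For orientation, a network like the single MLP used in the comparison below (topology 15-11-5) could be assembled and trained with backpropagation in Encog roughly as follows; the sigmoid activation, the stopping threshold, and the placeholder data arrays are assumptions of this sketch:

import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.back.Backpropagation;

// Rough sketch of assembling and training the 15-11-5 network in Encog 3.3.
public final class SingleMlpExperiment {
    public static void main(String[] args) {
        // 15 inputs (lag frame), 11 hidden neurons, 5 outputs (lead frame).
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 15));
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 11));
        network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 5));
        network.getStructure().finalizeStructure();
        network.reset();

        // Placeholder training data: lag windows and their lead frames.
        double[][] input = new double[][] { new double[15] };
        double[][] ideal = new double[][] { new double[5] };
        MLDataSet trainingSet = new BasicMLDataSet(input, ideal);

        // Backpropagation of the error, as used in the paper's experiments.
        Backpropagation train = new Backpropagation(network, trainingSet);
        do {
            train.iteration();
        } while (train.getError() > 0.01); // assumed stopping threshold
        train.finishTraining();
    }
}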



The same experiment was performed with two different networks [6]: the first without recurrent links (topology 15-11-5), and the second a combination of two multilayer perceptrons (MLP1 with topology 20-11-5 and MLP2 with topology 5-6-5). As a time series, publicly available data on the number of earthquakes per year with a magnitude of 7.0 or greater for the period from 1900 to 1998 were used. The paired multilayer perceptron has slightly better predictive properties than the three-layer MLP, as shown in Fig. 7 and Fig. 8. In terms of training time, the pair performs slightly worse than the single MLP, as shown in Fig. 9 and Fig. 10.

Fig. 9. Single MLP Training Epochs.

The proposed model of the MLP pair leads to results with improved training time and forecasting accuracy.

Fig. 10. MLP Pair Training Epochs.

CONCLUSION

The MLP pair shows promising results when compared with a single MLP on the problem of financial time series forecasting. Training of the MLP pair is more time consuming, but this is logical because the network consists of more neurons and many more connections than the single MLP. The proposed alternative activation functions lead to a better distribution of the information in the hidden layers.

As further development, it would be interesting to implement this kind of training as parallel or distributed computing, as proposed in [3,5,8]. Another direction for further development is the application to big data problems, as described in [12]. In this study backpropagation of the error was used, but evolutionary or population-based training heuristics can also be applied, as proposed in [1,2].

ACKNOWLEDGMENTS

This work was supported by private funding of Velbazhd Software LLC.

REFERENCES

[1] Balabanov, T., Zankinski, I., Barova, M.: Strategy for Individuals Distribution by Incident Nodes Participation in Star Topology of Distributed Evolutionary Algorithms. Cybernetics and Information Technologies, Sofia, Bulgaria, vol. 16, no. 1, 80--88 (2016).
[2] Balabanov, T., Zankinski, I., Dobrinkova, D.: Time Series Prediction by Artificial Neural Networks and Differential Evolution in Distributed Environment. International Conference on Large-Scale Scientific Computing, Sozopol, Bulgaria, Lecture Notes in Computer Science, vol. 7116, no. 1, 198--205 (2011).
[3] Balabanov, T.: Heuristic Forecasting Approaches in Distributed Environment (in Bulgarian). Proceedings of Anniversary Scientific Conference 40 Years Department of Industrial Automation, UCTM, Sofia, Bulgaria, 163--166 (2011).
[4] Balabanov, T.: Distributed Evolutional Model for Music Composition by Human-Computer Interaction. Proceedings of International Scientific Conference UniTech15, University publishing house V. Aprilov, Gabrovo, Bulgaria, vol. 2, 389--392 (2015).

[5] Balabanov, T.: Avoiding Local Optimums in Distributed Population Based Heuristic Algorithms (in Bulgarian). Proceedings of XXIII International Symposium Management of Energy, Industrial and Environmental Systems, John Atanasoff Union of Automation and Informatics, Sofia, Bulgaria, 83--86 (2015).
[6] Balabanov, T.: Long Short Term Memory in MLP Pair with Encog, https://github.com/VelbazhdSoftwareLLC/Long-Short-Term-Memory-in-MLP-Pair-with-Encog, Sofia, Bulgaria (2017).
[7] Keremedchiev, D., Barova, M., Tomov, P.: Mobile Application as Distributed Computing System for Artificial Neural Networks Training Used in Perfect Information Games. Proceedings of 16th International Scientific Conference UNITECH16, Gabrovo, Bulgaria, vol. 2, 389--393 (2016).
[8] Tashev, T., Monov, V., Tasheva, R.: Load Optimization in a Grid Structure for Parallel Simulations of the Throughput of a Packet Switch Node. Journal Information Technology and Control, vol. 12, issue 2, ISSN (Online) 1312-2622, DOI: 10.1515/itc-2015-0013, 23--30 (2014).
[9] Tomov, P., Monov, V.: Artificial Neural Networks and Differential Evolution Used for Time Series Forecasting in Distributed Environment. Proceedings of International Conference Automatics and Informatics, ISSN 1313-1850, Sofia, Bulgaria, 129--132 (2016).
[10] Zankinski, I., Stoilov, T.: Effect of the Neuron Permutation Problem on Training Artificial Neural Networks with Genetic Algorithms in Distributed Computing. Proceedings of XXIV International Symposium Management of Energy, Industrial and Environmental Systems, ISSN 1313-2237, Bankya, Bulgaria, 53--55 (2016).
[11] Zankinski, I., Tomov, P., Balabanov, T.: Alternative Activation Function Derivative in Artificial Neural Networks. Proceedings of XXV International Symposium Management of Energy, Industrial and Environmental Systems, ISSN 1313-2237, Bankya, Bulgaria, 79--81 (2017).
[12] … (in Bulgarian), 193--198, ISSN 1314-1937 (2016).
