




No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or
by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no
expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No
liability is assumed for incidental or consequential damages in connection with or arising out of information
contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in
rendering legal, medical or any other professional services.

Additional books in this series can be found on Nova’s website

under the Series tab.

Additional e-books in this series can be found on Nova’s website

under the e-Books tab.




Copyright © 2018 by Nova Science Publishers, Inc.
All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in
any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or
otherwise without the written permission of the Publisher.

We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to reuse
content from this publication. Simply navigate to this publication’s page on Nova’s website and locate the
“Get Permission” button below the title description. This button is linked directly to the title’s permission
page on Alternatively, you can visit and search by title, ISBN, or ISSN.

For further questions about using the service on, please contact:
Copyright Clearance Center
Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail:


The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied
warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for
incidental or consequential damages in connection with or arising out of information contained in this book.
The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or
in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government
reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of
such works.

Independent verification should be sought for any data, advice or recommendations contained in this book. In
addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property
arising from any methods, products, instructions, ideas or otherwise contained in this publication.

This publication is designed to provide accurate and authoritative information with regard to the subject
matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering
legal or any other professional services. If legal or any other expert assistance is required, the services of a
competent professional should be sought.

Additional color graphics may be available in the e-book version of this book.

Library of Congress Cataloging-in-Publication Data

Names: Rabelo Mendizabal, Luis C. (Luis Carlos), 1960- editor.
Title: Artificial intelligence : advances in research and applications / Luis Rabelo (Department of Industrial
Engineering and Management Systems, Orlando, FL, US).
Other titles: Artificial intelligence (Rabelo)
Description: Hauppauge, New York : Nova Science Publishers, Inc., [2017] |
Series: Computer science, technology and applications | Includes bibliographical references and index.
Identifiers: LCCN 2017045787 (print) | LCCN 2017046373 (ebook) | ISBN 9781536126778 (hardcover) | ISBN
9781536126785 (ebook)
Subjects: LCSH: Artificial intelligence.
Classification: LCC TA347.A78 (ebook) | LCC TA347.A78 A785 2017 (print) | DDC 006.3--dc23
LC record available at

Published by Nova Science Publishers, Inc. † New York


Preface vii
Chapter 1 Unsupervised Ensemble Learning 1
Ramazan Ünlü
Chapter 2 Using Deep Learning to Configure Parallel Distributed
Discrete-Event Simulators 23
Edwin Cortes, Luis Rabelo and Gene Lee
Chapter 3 Machine Learning Applied to Autonomous Vehicles 49
Olmer Garcia and Cesar Diaz
Chapter 4 Evolutionary Optimization of Support Vector Machines Using
Genetic Algorithms 75
Fred K. Gruber
Chapter 5 Texture Descriptors for the Generic Pattern
Classification Problem 105
Loris Nanni, Sheryl Brahnam and Alessandra Lumini
Chapter 6 Simulation Optimization Using a Hybrid Scheme with Particle
Swarm Optimization for a Manufacturing Supply Chain 121
Alfonso T. Sarmiento and Edgar Gutierrez
Chapter 7 The Estimation of Cutting Forces in the Turning of Inconel 718
Assisted with a High Pressure Coolant Using Bio-Inspired
Artificial Neural Networks 147
Djordje Cica and Davorin Kramar
Chapter 8 Predictive Analytics using Genetic Programming 171
Luis Rabelo, Edgar Gutierrez, Sayli Bhide and Mario Marin

Chapter 9 Managing Overcrowding in Healthcare using Fuzzy Logic 195

Abdulrahman Albar, Ahmad Elshennawy, Mohammed Basingab
and Haitham Bahaitham
Chapter 10 The Utilization of Case-Based Reasoning: A Case Study of the
Healthcare Sector Using Simulation Modeling 229
Khaled Alshareef, Ahmad Rahal and Mohammed Basingab
Chapter 11 Agent-Based Modeling Simulation and
Its Application to Ecommerce 255
Oloruntomi Joledo, Edgar Gutierrez and Hatim Bukhari
Chapter 12 Artificial Intelligence for the Modeling and Prediction of
the Bioactivities of Complex Natural Products 277
Jose M. Prieto
Chapter 13 Predictive Analytics for Thermal Coal Prices Using
Neural Networks and Regression Trees 301
Mayra Bornacelli, Edgar Gutierrez and John Pastrana
Chapter 14 Explorations of the ‘Transhuman’ Dimension of
Artificial Intelligence 321
Bert Olivier
Index 339

After decades of basic research and more promises than impressive applications,
artificial intelligence (AI) is starting to deliver benefits. A convergence of advances is
motivating this new surge of AI development and applications. Computing capability, as
evolved from high-throughput and high-performance computing systems, is increasing. AI
models and operations research adaptations are becoming more mature, and the world is
breeding big data not only from the web and social media but also from the Internet of
Things.

This is a very distinctive book which discusses important applications using a variety
of paradigms from AI and outlines some of the research to be performed. The work
supersedes similar books that do not cover as diversified a set of sophisticated
applications. The authors present a comprehensive and articulated view of recent
developments, identify the applications gap by quoting from the experience of experts,
and detail suggested research areas.
The book is organized into 14 chapters which provide a perspective of the field of AI.
Areas covered in these selected papers include a broad range of applications, such as
manufacturing, autonomous systems, healthcare, medicine, advanced materials, parallel
distributed computing, and electronic commerce. AI paradigms utilized in this book
include unsupervised learning, ensembles, neural networks, deep learning, fuzzy logic,
support-vector machines, genetic algorithms, genetic programming, particle swarm
optimization, agents, and case-based reasoning. A synopsis of the chapters follows:
• Clustering Techniques: Novel research in clustering techniques is essential to
improve the exploratory analysis required for revealing hidden patterns where label
information is unknown. Ramazan Ünlü, in the chapter “Unsupervised Ensemble
Learning,” discusses unsupervised ensemble learning, or consensus clustering, a
method to improve upon the selection of the most suitable clustering algorithm. The goal of
this combination process is to increase the average quality of individual clustering
methods. Throughout this chapter, the main concepts of clustering methods are introduced
viii Luis Rabelo, Sayli Bhide and Edgar Gutierrez

first and then the basics of ensemble learning are given. Finally, the chapter concludes
with a summary of novel progress in unsupervised learning.
• Deep Learning and a Complex Application in Parallel Distributed Simulation: This
topic is introduced in the chapter by Edwin Cortes and Luis Rabelo entitled “Using Deep
Learning to Configure Parallel Distributed Discrete-Event Simulators.” The authors
implemented a pattern recognition scheme to identify the best time management and
synchronization scheme for executing a particular parallel discrete-event simulation (DES)
problem. This innovative pattern recognition method measures software complexity
and characterizes the features of the network and hardware configurations to quantify and
capture the structure of the parallel distributed DES problem. It is innovative research
in deep belief network models.
• Autonomous Systems: The area of autonomous systems, as represented by
autonomous vehicles, and deep learning, in particular convolutional neural networks
(CNNs), are presented in the chapter “Machine Learning Applied to Autonomous
Vehicles” by Olmer García and Cesar Díaz. This chapter presents an application of deep
learning to the architecture of autonomous vehicles, which is a good example of a
multiclass classification problem. The authors argue that the use of AI in this domain
requires two hardware/software systems: one for training in the cloud and the other one in
the autonomous vehicle. This chapter demonstrates that deep learning can create
sophisticated models which are able to generalize with relatively small datasets.
• Genetic Algorithms & Support Vector Machines: The utilization of genetic
algorithms (GAs) to select the learning parameters of AI paradigms, which can actually assist
researchers in automating the learning process, is discussed in the chapter “Evolutionary
Optimization of Support Vector Machines Using Genetic Algorithms.” Fred Gruber uses
a GA to find an optimized parameter set for support vector machines (SVMs). GAs and cross-
validation increase the generalization performance of SVMs, although it should be noted
that the processing time increases. However, this drawback can be reduced by finding
more efficient configurations for SVMs.
• Texture Descriptors for the Generic Pattern Classification Problem: In the
chapter “Texture Descriptors for the Generic Pattern Classification Problem”, Loris
Nanni, Sheryl Brahnam, and Alessandra Lumini propose a framework that employs a
matrix representation for extracting features from patterns that can be effectively applied
to very different classification problems. Under texture analysis, the chapter goes through
experimental analysis showing the advantages of their approach. They also report the
results of experiments that examine the performance outcomes from extracting different
texture descriptors from matrices that were generated by reshaping the original feature
vector. Their new methods outperformed SVMs.
• Simulation Optimization: The purpose of simulation optimization in predicting
supply chain performance is addressed by Alfonso Sarmiento and Edgar Gutierrez in the
chapter “Simulation Optimization Using a Hybrid Scheme with Particle Swarm
Optimization for a Manufacturing Supply Chain.” The methodology uses particle swarm
optimization (PSO) to find stability in the supply chain using a system dynamics
model of an actual situation. This is a classical problem where asymptotic stability has
been listed as one of the problems to solve. The authors show there are many factors that
affect supply chain dynamics, including shorter product life cycles, the timing of inventory
decisions, and environmental regulations. Supply chains evolve with these changing
dynamics, which causes the systems to behave nonlinearly. The impacts of these
irregular behaviors can be minimized when the methodology solves an optimization
problem to find a stabilizing policy using PSO (which outperformed GAs in the same task).
To obtain convergence, a hybrid algorithm must be used. Incorporating a theorem
that allows finding ideal equilibrium levels enables a broader search for stabilizing
policies.
• Cutting Forces: Accurate prediction of cutting forces has a significant impact on
product quality in manufacturing. The chapter “The Estimation of Cutting Forces in the
Turning of Inconel 718 Assisted with a High Pressure Coolant Using Bio-Inspired Artificial
Neural Networks” aims at utilizing neural networks to predict cutting forces in the turning of
the nickel-based alloy Inconel 718 assisted with a high pressure coolant. Djordje Cica and
Davorin Kramar discuss a study that employs two bio-inspired algorithms, namely GAs
and PSO, as training methods for neural networks. Further, they compare the performance of
the GA-based and PSO-based neural network models with the most commonly used
backpropagation-based neural networks.
• Predictive Analytics Using Genetic Programming: The chapter “Predictive
Analytics using Genetic Programming” by Luis Rabelo, Edgar Gutierrez, Sayli Bhide,
and Mario Marin focuses on predictive analytics using genetic programming (GP). The
authors describe the methodology of GP in detail and demonstrate its advantages. It is
important to highlight the use of the decile table to classify better predictors and guide the
evolutionary process. An actual application to the Reinforced Carbon-Carbon structures
of the NASA Space Shuttle is used. This example demonstrates how GP has the potential
to be a better option than regression/classification trees because GP has more
operators, including those of regression/classification trees. In addition, GP can
help create synthetic variables to be used as input to other AI paradigms.
• Managing Overcrowding in Healthcare Using Fuzzy Logic: The chapter
“Managing Overcrowding in Healthcare using Fuzzy Logic” focuses on the
overcrowding problem frequently observed in the emergency departments (EDs) of
healthcare systems. A hierarchical fuzzy logic approach is utilized by Abdulrahman
Albar, Ahmad Elshennawy, Mohammed Basingab, and Haitham Bahaitham to develop a
framework for quantifying overcrowding. The purpose of this research was to develop a
quantitative measurement tool for evaluating ED crowding which captures healthcare
experts’ opinions and other ED stakeholders’ perspectives. This framework has the
ability to be applied in a variety of healthcare systems. The methodology developed is the
first of its kind.
• Simulation Modeling: Simulation modeling can be used as an important methodology to
capture and develop knowledge and to complement the implementation of intelligent systems.
The chapter “The Utilization of Case-Based Reasoning: A Case Study of the Healthcare
Sector Using Simulation Modeling” applies a combination of discrete event simulation
(DES) and case-based reasoning (CBR) to assist in solving new cases in healthcare
systems. An important objective of this approach is that it can improve stakeholders’
involvement by eliminating the need for simulation or statistical knowledge or
experience. A case study on EDs, which face multiple resource constraints including
financial, labor, and facilities, is explained by Khaled Alshareef, Ahmad Rahal, and
Mohammed Basingab. The application of DES-CBR provided solutions that were
realistic and robust; more importantly, the results were scrutinized and validated by field
experts.
• Agent-Based Modeling and Simulation and Its Application to E-commerce: The chapter by
Oloruntomi Joledo, Edgar Gutierrez, and Hatim Bukhari presents an application to a
peer-to-peer lending environment. The authors seek to find how system performance is
affected by the actions of stakeholders in an ecommerce system. Dynamic system
complexity and risk are considered in this research. Combining system dynamics and neural
networks at the strategy level with agent-based models of
consumer behavior allows for a business model representation that leads to reliable
decision-making. The presented framework shares insights into consumer-to-
consumer behavior in ecommerce systems.
• Artificial Intelligence for the Modeling and Prediction of the Bioactivities of
Complex Natural Products: The chapter by Jose Prieto presents neural networks as a tool to predict
bioactivities of very complex chemical entities such as natural products, and suggests
strategies for the selection of inputs and conditions for the in silico experiments. Jose
Prieto explains that neural networks can become reliable, fast, and economical tools for
the prediction of anti-inflammatory, antioxidant, and antimicrobial
activities, thus improving their use in medicine and nutrition.
• Predictive Analytics: Predictive analytics is one of the most advanced forms of analytics, and AI
paradigms are the core of these predictive systems. The chapter “Predictive Analytics
for Thermal Coal Prices using Neural Networks and Regression Trees” by Mayra
Bornacelli and Edgar Gutierrez aims to deliver price predictive analytics models, a
necessity for many industries. This chapter is targeted towards predicting prices of
thermal coal. By implementing the Delphi methodology along with neural networks,
conclusions can be reached about global market tendencies and variables. Although
neural networks outperformed regression trees, the latter created models which can be
easily visualized and understood. Overall, the research found that even though the market
for thermal coal is dynamic and the history of its prices is not a good predictor of future
prices, the general patterns that were found hold more importance than the study of
individual prices, and the methodology that was used applies to oligopolistic markets.
• Explorations of the ‘Transhuman’ Dimension of Artificial Intelligence: The final
chapter provides a very important philosophical discussion of AI and its ‘transhuman’
dimension, which is “here understood as that which goes beyond the human, to the point
of being wholly different from it.” In “Explorations of the ‘Transhuman’ Dimension of
Artificial Intelligence”, Bert Olivier examines the concept of intelligence as a function of
artificially intelligent beings. However, these artificially intelligent beings are recognized
as being ontologically distinct from humans as “embodied, affective, intelligent beings.”
These differences are the key to understanding the contrast between AI and being-human.
His examination involves contemporary AI research as well as projections of possible AI
developments. This is a very important chapter with important conclusions for AI and its
future.

We would like to acknowledge the individuals who contributed to this effort. First
and foremost, we would like to express our sincere thanks to the contributors of the
chapters for reporting their research and also for their time and promptness. Our thanks
are due to Nova for publishing this book, for their advice, and for their patience. We believe that this
book is an important contribution to the AI community. We hope this book will serve
as a motivation for continued research and development in AI.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 1

Unsupervised Ensemble Learning


Ramazan Ünlü*
Industrial Engineering and Management Systems,
University of Central Florida, Orlando, FL, US


Clustering is used in identifying groups of samples with similar properties, and it is
one of the most common preliminary exploratory analyses for revealing “hidden”
patterns, in particular for datasets where label information is unknown. Even though
clustering techniques have been widely used to analyze a variety of datasets in different
domains for years, their limitation is that each clustering method works well only
under certain conditions. This makes the selection of the most suitable algorithm for a particular
dataset much more important. The restrained applicability of individual clustering methods has
pushed clustering practitioners to develop more robust methods that are reasonably
practicable under any condition. Unsupervised ensemble learning, or consensus
clustering, was developed to serve this purpose. It consists of finding an optimal
combination strategy of individual partitions that is robust with respect to the selection
of the algorithmic clustering pool. The goal of this combination process is to improve the
average quality of the individual clustering methods. Given the increasing development of new
methods, their promising results, and the great number of applications, a brief review of the
area is warranted. In this chapter, the main concepts
of clustering methods are briefly introduced first, and then the basics of ensemble learning are
given. Finally, the chapter concludes with a comprehensive summary of novel
developments in the area.

Keywords: consensus clustering, unsupervised ensemble learning

Corresponding Author Email:


Data mining (DM) is one of the most notable research areas of the last decades. DM
can be defined as an interdisciplinary area at the intersection of artificial intelligence (AI),
machine learning, and statistics. One of the earliest characterizations of DM, which highlights
some of its distinctive features, was proposed by (Fayyad, Piatetsky-Shapiro, &
Smyth, 1996; Kantardzic, 2011), who define it as "the nontrivial process of identifying
valid, novel, potentially useful, and ultimately understandable patterns in data." In
general, the extraction of implicit, hidden, and potentially useful knowledge
from data is a well-accepted definition of DM.
With the growing use of computers and data storage technology, a great
amount of data is being produced by different systems. Data can be defined as a set of
qualitative or quantitative variables, such as facts, numbers, or texts, that describe
things. For DM, the standard structure of a dataset is a collection of samples, or cases, for which
measurements named features are specified. If we consider that a sample is represented by a
multidimensional vector, each dimension can be considered one feature of the sample.
In other words, features are values that represent a specific characteristic of a sample
(Kantardzic, 2011).

Figure 1. Tabular form of the data. Original dataset can be found in

Based on true class information, data can be categorized from a DM perspective as labeled
or unlabeled. Labeled data refers to a set of samples or cases with known
true classes, and unlabeled data is a set of samples or cases without known true classes.
Figure 1 shows some samples of a dataset in tabular form, in which the columns
represent features of samples and the rows are the values of these features for a specific
sample. In this example, consider that the true outputs are unknown. The true outputs could
be, for example, whether people have an annual income of more or less than $100,000. In
general, an appropriate DM method needs to be selected based on the available labeled or
unlabeled data. Therefore, DM methods can be roughly categorized as supervised or
unsupervised learning, based on whether the data are labeled or unlabeled. While supervised
learning methods are reserved for labeled datasets, unsupervised learning methods are designed for
unlabeled datasets. Selecting a suitable algorithm is crucial because a method developed
for labeled data may not be effective for mining unlabeled data.
Throughout the chapter, the focus will be on unsupervised learning.


Clustering, one of the most widely used DM methods, finds applications in
numerous domains, including information retrieval and text mining (A. Jain, 1999; A. K. Jain,
Murty, & Flynn, 1999), spatial database applications (Sander, Ester, Kriegel, & Xu, 1998),
sequence and heterogeneous data analysis (Cades, Smyth, & Mannila, 2001), web data analysis
(Srivastava, Cooley, Deshpande, & Tan, 2000), bioinformatics (de Hoon, Imoto, Nolan,
& Miyano, 2004), and many others. As pointed out, no labeled data are available in
clustering problems. Therefore, the goal of clustering is the division of unlabeled data
into groups of similar objects (Berkhin, 2006). Objects in the same group are considered
similar to each other and dissimilar to objects in other groups. An example of clustering is
illustrated in Figure 2, where points belonging to the same cluster are shown with the same
symbol.
More formally, for a given dataset X = {x_i}_{i=1}^N where x_i ∈ ℝ^n, with N and n the
number of samples and features respectively, clustering methods try to find k clusters of
X, p = {p_1, p_2, …, p_k} where k < N, such that:

p_i ≠ ∅ for i = 1, …, k
∪_{i=1}^k p_i = X
p_i ∩ p_j = ∅ for i ≠ j, i, j = 1, …, k

Figure 2. An example of clustering.

Figure 3. Clustering process.
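These three conditions can be checked mechanically. The following is a minimal sketch in plain Python; the index sets are invented for illustration and are not taken from the chapter:

```python
def is_valid_partition(sample_ids, partition):
    """Check the three partition conditions: non-empty clusters,
    full coverage of the dataset, and pairwise disjoint clusters."""
    if any(len(p) == 0 for p in partition):            # p_i != empty set
        return False
    covered = set().union(*(set(p) for p in partition))
    if covered != set(sample_ids):                     # union of all p_i equals X
        return False
    # sizes add up only if no sample appears in two clusters
    return sum(len(p) for p in partition) == len(covered)

samples = range(6)
print(is_valid_partition(samples, [[0, 1], [2, 3], [4, 5]]))     # True
print(is_valid_partition(samples, [[0, 1], [1, 2, 3], [4, 5]]))  # False: clusters overlap
```

The disjointness test exploits the fact that, once coverage holds, overlapping clusters make the summed sizes exceed the number of distinct samples.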

Through this clustering process, clusters are created based on dissimilarities and
similarities between samples. These dissimilarities and similarities are assessed based on
the feature values describing the objects and are relevant to the purpose of the study,
domain-specific assumptions, and prior knowledge of the problem (Grira, Crucianu, &
Boujemaa, 2005). Since similarity is an essential part of a cluster, a measure of the
similarity between two objects is crucial in clustering algorithms. This measure must
be chosen very carefully because the quality of a clustering model depends on this
decision. Instead of a similarity measure, the dissimilarity between two samples is
also commonly used. As dissimilarity metrics, distance measures defined on the
feature space, such as the Euclidean, Minkowski, and city-block distances, are typically
employed (Kantardzic, 2011).
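As a brief illustration, the Minkowski distance generalizes the other two measures just named; the feature vectors below are made up for the example:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two feature vectors.
    p = 2 gives the Euclidean distance; p = 1 gives the city-block
    (Manhattan) distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

a, b = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(minkowski(a, b, 2))  # Euclidean: 5.0
print(minkowski(a, b, 1))  # city-block: 7.0
```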
The standard process of clustering can be divided into several steps. The structure
of these necessary steps of a clustering model is depicted in Figure 3, inspired by (R. Xu
& Wunsch, 2005). Several taxonomies of clustering methods have been
proposed by researchers (Nayak, Naik, & Behera, 2015; D. Xu & Tian, 2015; R. Xu &
Wunsch, 2005). It is not easy to capture the strong diversity of clustering methods under one
taxonomy because of their different starting points and criteria. A rough but widely agreed
categorization is to classify clustering methods as hierarchical or partitional,
based on the properties of the clusters generated (R. Xu & Wunsch, 2005). However, the more
detailed taxonomy listed below in Table 1, inspired by the one suggested in (D. Xu &
Tian, 2015), is put forward.

In this study, the details of the algorithms categorized in Table 1 are not discussed; we
refer the reader to (D. Xu & Tian, 2015) for a detailed explanation of these clustering
algorithms. However, a brief overview of ensemble-based clustering is given, and a detailed
discussion is introduced in the section below.

Table 1. Traditional and Modern algorithms


Clustering algorithms based on ensembles, called unsupervised ensemble learning or
consensus clustering, can be considered modern clustering algorithms. Clustering
results are prone to being diverse across algorithms, and each algorithm might work
better for a particular dataset. This diversity is hypothetically illustrated by a toy example
in Figure 4, in which samples in the same group are represented by the same symbol.
As shown in the figure, different clustering methods might give us different partitions of the
data, and they can even produce different numbers of clusters because of their diverse
objectives and methodological foundations (Haghtalab, Xanthopoulos, & Madani, 2015).

As will be discussed later, consensus clustering can be used to deal with this potential
variation of clustering methods. The core idea of consensus clustering is to
combine the good characteristics of different partitions to create a better clustering model. As
the basic logic of the process shown in Figure 5 indicates, different partitions (P_1, P_2, …, P_q) need
to be somehow produced and combined to create an optimum partition (P*).

Figure 4. Comparison of different clustering methods: (a) the raw data without known true
classes; (b), (c), and (d) various partitions of the data produced by different methods.

Figure 5. Process of consensus clustering.
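One common way to realize this combination is evidence accumulation: build a co-association matrix whose entries record the fraction of base partitions that place each pair of samples in the same cluster, and then re-cluster that matrix. A small sketch, with hypothetical label vectors standing in for the partitions P_1, …, P_q:

```python
def co_association(partitions, n):
    """Fraction of base partitions assigning samples i and j to the
    same cluster; values near 1 suggest i and j belong together."""
    m = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(partitions)
    return m

# three hypothetical base partitions of five samples
partitions = [[0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1],
              [1, 1, 0, 0, 0]]
m = co_association(partitions, 5)
print(m[0][1])  # samples 0 and 1 co-clustered in all partitions: 1.0
print(m[2][3])  # co-clustered in 2 of 3 partitions: about 0.67
```

Note that the matrix is insensitive to how each base partition names its clusters, which is exactly what makes it a convenient common ground for the combination step.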


The analysis of consensus clustering is summarized under the title of modern
clustering methods in (D. Xu & Tian, 2015) as follows:

 The time complexity of these algorithms depends on the algorithm chosen to
combine the individual results.
 Consensus clustering can produce robust, scalable, and consistent partitions and can
take advantage of the individual algorithms used.
 Deficiencies remain in the design of the consensus function used to
combine the results of the individual algorithms.


As touched upon before, clustering consists of identifying groups of samples with
similar properties, and it is one of the most common preliminary exploratory analyses for
revealing “hidden” patterns, in particular for datasets where label information is
unknown (Ester, Kriegel, Sander, & Xu, 1996). With the rise of big data, efficient and
robust algorithms able to handle massive amounts of data in a reasonable amount of
time are necessary (Abello, Pardalos, & Resende, 2013; Leskovec, Rajaraman, &
Ullman, 2014). Some of the most common clustering schemes include, but are not limited
to, k-means (MacQueen, 1967), hierarchical clustering (McQuitty, 1957), spectral
clustering (Shi & Malik, 2000), and density-based clustering approaches (Ester et al.,
1996). A detailed taxonomy of clustering methods is given in Table 1. Given the
diverse objectives and methodological foundations of these methods, it is possible to
obtain clustering solutions that differ significantly across algorithms (Haghtalab et al.,
2015). Even for multiple runs of the same algorithm on the same dataset, one is not
guaranteed the same solution. This is a well-known phenomenon that is attributed to the
local optimality of clustering algorithms such as k-means (Xanthopoulos, 2014). In
addition to local optimality, the algorithmic choice or even the dataset itself might be
responsible for utterly unreliable and unusable results. Therefore, when two different
clustering algorithms are applied to the same dataset and produce entirely different results, it
is not easy to say which is correct. To handle this problem, consensus clustering can
help minimize this variability through an ensemble procedure that combines the
“good” characteristics of a diverse pool of clusterings (A. L. Fred & Jain, 2005; Liu,
Cheng, & Wu, 2015; Vega-Pons & Ruiz-Shulcloper, 2011). It has emerged as a powerful
technique to produce an optimum and useful partition of a dataset. Studies such as
(A. L. Fred & Jain, 2005; Strehl & Ghosh, 2002; Topchy, Jain, & Punch, 2004) defined
various properties that endorse the use of consensus clustering. Some of them are
described as follows:

 Robustness: The consensus clustering might have better overall performance than
the majority of the individual clustering methods.
 Consistency: The consensus result is similar to all of the combined individual
clusterings.
 Stability: The consensus clustering shows less variability across iterations than
the combined algorithms.

With respect to properties like these, better partitions can be produced in comparison
to most individual clustering methods. The result of consensus clustering cannot be
expected to be the best result in all cases, as there can be exceptions. It can only be
ensured that consensus clustering outperforms most of the single algorithms combined
with respect to some properties, under the assumption that a combination of the good
characteristics of various partitions is more reliable than any single algorithm.
Over the past years, many different algorithms have been proposed for consensus
clustering (Al-Razgan & Domeniconi, 2006; Ana & Jain, 2003; Azimi & Fern, 2009; d
Souto, de Araujo, & da Silva, 2006; Hadjitodorov, Kuncheva, & Todorova, 2006; Hu,
Yoo, Zhang, Nanavati, & Das, 2005; Huang, Lai, & Wang, 2016; Li & Ding, 2008; Li,
Ding, & Jordan, 2007; Naldi, Carvalho, & Campello, 2013; Ren, Domeniconi, Zhang, &
Yu, 2016). As mentioned earlier, it can be seen in the literature that the consensus clustering framework is able to enhance the robustness and stability of clustering analysis. Thus, consensus clustering has found many real-world applications, such as gene classification, image segmentation (Hong, Kwong, Chang, & Ren, 2008), video retrieval, and so on (Azimi, Mohammadi, & Analoui, 2006; Fischer & Buhmann, 2003; A. K. Jain et al., 1999). From a combinatorial optimization point of view, the task of combining different partitions has been formulated as a median partition problem, which is known to be NP-complete (Křivánek & Morávek, 1986). Even with the use of recent breakthroughs, this approach cannot handle datasets larger than several hundred samples (Sukegawa, Yamamoto, & Zhang, 2013). For a comprehensive review of 0-1 linear programming formulations of the consensus clustering problem, readers can refer to (Xanthopoulos, 2014).
The problem of consensus clustering can be stated informally as follows: given multiple partitions of a dataset, find a combined clustering model, or final partition, that gives better quality with respect to some of the aspects pointed out above. Therefore, every consensus clustering method is in general made up of two steps: (1) generation of multiple partitions and (2) a consensus function, as shown in Figure 6 (Topchy, Jain, & Punch, 2003; Topchy et al., 2004; D. Xu & Tian, 2015).
Generation of multiple partitions is the first step of consensus clustering. This step aims to create the multiple partitions that will be combined. It can be critical for some problems because the final partition depends on the partitions produced in this step. Several methods have been proposed in the literature to create multiple partitions:
Figure 6. Process of consensus clustering.

 For the same dataset, employ different traditional clustering methods: Using different clustering algorithms might be the most commonly used way to create multiple partitions for a given dataset. Even though there is no particular rule for choosing which conventional algorithms to apply, it is advisable to use methods that can capture more information about the data in general. However, it is not easy to know in advance which methods will be suitable for a particular problem; therefore, expert opinion can be very useful (Strehl & Ghosh, 2002; Vega-Pons & Ruiz-Shulcloper, 2011; D. Xu & Tian, 2015).
 For the same dataset, employ different traditional clustering methods with different initializations or parameters: Using different algorithms with different parameters or initializations is another efficient method (Ailon, Charikar, & Newman, 2008). A simple algorithm can produce different informative partitions of the data, and it can yield an effective consensus in conjunction with a suitable consensus function. For example, using the k-means algorithm with different random initial centers and numbers of clusters to generate different partitions was introduced by (A. L. Fred & Jain, 2005).
 Using weak clustering algorithms: Weak clustering algorithms are also used in the generation step. These methods produce a set of partitions of the data using a very straightforward methodology. Despite their simplicity, it has been observed that weak clustering algorithms can provide high-quality consensus clustering along with a proper consensus function (Luo, Jing, & Xie, 2006; Topchy et al., 2003; Topchy, Jain, & Punch, 2005).
 Data resampling: Data resampling, such as bagging and boosting, is another useful method to create multiple partitions (Dudoit & Fridlyand, 2003; Hong et al., 2008). Dudoit and Fridlyand applied a partitioning clustering method (e.g., Partitioning Around Medoids) to a set of bootstrap learning samples to produce multiple partitions. They aimed to reduce variability in the result of the partitioning-based algorithm by averaging, and they successfully produced more accurate clusters than a single application of the algorithm.
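The second strategy above can be made concrete in a few lines. The following is a minimal, self-contained sketch (a toy k-means written from scratch and an invented dataset, not the chapter's own code): the same algorithm is run with different random initializations, yielding an ensemble of labelings that a consensus function can later combine.

```python
import random

def kmeans(points, k, seed, iters=20):
    """Minimal k-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # random initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        for i, (x, y) in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: (x - centers[c][0]) ** 2
                                        + (y - centers[c][1]) ** 2)
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels

# Two well-separated groups of six 2-D points.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
# The ensemble: one labeling per random initialization.
ensemble = [kmeans(data, k=2, seed=s) for s in range(10)]
```

Because the labels are symbolic, two runs may encode the same grouping with swapped labels; it is the consensus function's job to cope with that.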

The consensus function is the crucial and leading step of any consensus clustering algorithm. It is used to combine the set of labels produced by the individual clustering algorithms in the previous step. The combined labels, or final partition, can be considered the result of another clustering algorithm. The definition of the consensus function can profoundly impact the quality of the final partition, which is the product of any consensus clustering. However, the way multiple partitions are combined is not the same in all cases. A sharp, but well-accepted, division of consensus functions is into (1) object co-occurrence and (2) median partition approaches.
Object co-occurrence methods work on the basis of similar and dissimilar objects. If two data points are in the same cluster, they can be considered similar; otherwise they are dissimilar. Therefore, object co-occurrence methods analyze how many times data samples belong to the same cluster. In the median partition approach, the final partition is obtained by solving an optimization problem: the problem of finding the median partition with respect to the cluster ensemble. The median partition problem can now be defined formally. Given a set of 𝑞 partitions 𝑃_1, …, 𝑃_𝑞 and a distance 𝜔(·,·) between two partitions, a partition 𝑃∗ is found such that:


𝑃∗ = argmin_𝑃 ∑_{𝑖=1}^{𝑞} 𝜔(𝑃_𝑖, 𝑃)
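As an illustration, 𝜔 can be instantiated as the number of sample pairs on which two partitions disagree about co-membership (a Mirkin-type distance). Since the exact median partition problem is NP-complete, the sketch below (illustrative code, not from the chapter) restricts the argmin to the input partitions themselves, a common heuristic:

```python
from itertools import combinations

def pair_disagreements(p, q):
    """Mirkin-style distance: count sample pairs on which two labelings
    disagree about co-membership (same cluster vs. different clusters)."""
    return sum(1 for i, j in combinations(range(len(p)), 2)
               if (p[i] == p[j]) != (q[i] == q[j]))

def median_of_ensemble(partitions):
    """Heuristic median partition: restrict the argmin in the formula above
    to the input partitions themselves (the exact problem is NP-complete)."""
    return min(partitions,
               key=lambda p: sum(pair_disagreements(p, q) for q in partitions))

ensemble = [
    [0, 0, 1, 1, 1],   # majority grouping
    [0, 0, 0, 1, 1],   # one sample moved
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],   # same grouping as the first, relabeled
]
print(median_of_ensemble(ensemble))  # → [0, 0, 1, 1, 1]
```

Note that the pair-counting distance is label-free, so the relabeled fourth partition is correctly treated as identical to the first.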

A detailed review of consensus functions, and a taxonomy of the principal consensus functions, can be found in studies such as (Ghaemi, Sulaiman, Ibrahim, & Mustapha, 2009; Topchy et al., 2004; Vega-Pons & Ruiz-Shulcloper, 2011; D. Xu & Tian, 2015). Also, relations among different consensus functions can be found in (Li, Ogihara, & Ma, 2010). Some of the main functions are summarized as follows:

 Based on relabeling and voting: These methods are based on two important steps. In the first step, the labeling correspondence problem needs to be solved. The label of each sample is symbolic; the set of labels given by one algorithm might differ from the labels given by another algorithm, even though both sets of labels correspond to the same partition. Solving this problem makes the partitions ready for the combination process. Once the labeling correspondence problem is solved, a voting procedure can be applied in the second step. The voting procedure counts how many times a sample receives the same label. To apply these methods, each produced partition should have the same number of clusters as the final partition (Topchy et al., 2005; Vega-Pons & Ruiz-Shulcloper, 2011). On the other hand, the strength of these methods is that they are easy to understand and employ. Plurality Voting (PV) (Fischer & Buhmann, 2003), Voting-Merging (VM) (Weingessel, Dimitriadou, & Hornik, 2003), Voting for fuzzy clustering (Dimitriadou, Weingessel, & Hornik, 2002), Voting Active Cluster (VAC) (Tumer & Agogino, 2008), and Cumulative Voting (CV) (Ayad & Kamel, 2008) can be given as examples.
 Based on the co-association matrix: Algorithms based on the co-association matrix are used to avoid the labeling correspondence problem. The main idea of this approach is to create a co-association matrix in which each element is computed based on how many times two particular samples are in the same cluster. A clustering algorithm is then necessary to produce the final partition. One deficiency of this kind of algorithm is that its computational complexity is quadratic in the number of samples; therefore, it is not suitable for large datasets. On the other hand, these methods are very easy to understand and employ. Evidence accumulation in conjunction with the Single Link (EA-SL) or Complete Link (EA-CL) algorithms (A. Fred, 2001) can be given as examples.
 Based on graph partitioning: These methods transform the combination of multiple partitions into a graph or hypergraph partitioning problem (Vega-Pons & Ruiz-Shulcloper, 2011). Each partition in the ensemble can be represented by a hyperedge, and the final partition is obtained by applying a graph-based clustering algorithm. Three graph partitioning algorithms, the Cluster-based Similarity Partitioning Algorithm (CSPA), the Hypergraph Partitioning Algorithm (HGPA), and the Meta-CLustering Algorithm (MCLA), are proposed by (Strehl & Ghosh, 2002). In CSPA, a similarity matrix is created from a hypergraph. Each element of this matrix shows how many times two points are assigned to the same cluster. The final partition can be obtained by applying a graph similarity-based algorithm such as spectral clustering or METIS. In HGPA, the hypergraph is directly clustered by removing the minimum number of hyperedges. To get the final partition from the hypergraph, an algorithm suitable for clustering hypergraphs, such as HMETIS (Karypis, Aggarwal, Kumar, & Shekhar, 1999), is used. In MCLA, the similarity between two clusters is defined based on the number of common samples, using the Jaccard index. The similarity matrix between the clusters is the adjacency matrix of the graph whose nodes are the clusters and whose edges are the similarities between the clusters. The METIS algorithm is used to recluster that graph. The computational and storage complexity of CSPA is quadratic in the number of samples n, while HGPA and MCLA are linear. Another graph-based method, Hybrid Bipartite Graph Formulation (HBGF), is proposed by (Fern & Brodley, 2004). Differently from the previous methods, it represents both the samples and the clusters of the ensemble simultaneously as vertices in a bipartite graph. In this graph, edges exist only between clusters and samples, and there is no edge when the weight is zero, meaning the sample does not belong to the cluster. The final partition is obtained by using a graph similarity-based algorithm.
 Based on information theory: Information theory based algorithms define the ensemble problem as finding the median partition by a heuristic solution. In these methods, the category utility function is used to measure the similarity between clusters. Within the context of clustering, the category utility function (Gluck, 1989) can be defined as a partition quality scoring function. It has been proven that maximizing this function is equivalent to minimizing the within-cluster variance, so it can be maximized using the k-means algorithm (Mirkin, 2001). Using the k-means algorithm, on the other hand, brings a deficiency: the necessity of determining the number of clusters as an initial parameter. Besides, the method should be run multiple times to avoid bad local minima. For the methodological details and implementation of the method, readers can refer to (Gluck, 1989; Topchy et al., 2005).
 Based on local adaptation: Local adaptation based algorithms combine multiple partitions produced by the locally adaptive clustering algorithm (LAC), proposed by (Domeniconi et al., 2007), run with different parameter initializations. The Weighted Similarity Partition Algorithm (WSPA), the Weighted Bipartite Partition Algorithm (WBPA), and the Weighted Subspace Bipartite Partition Algorithm (WSBPA) (Domeniconi & Al-Razgan, 2009) are examples. To obtain the final partition, each method uses a graph partitioning algorithm such as METIS. The strong restriction of these methods is that LAC algorithms can be applied only to numerical data.
 Based on kernel methods: Weighted Partition Consensus via Kernels (WPCK) is proposed by (Vega-Pons, Correa-Morris, & Ruiz-Shulcloper, 2010). This method uses an intermediate step called Partition Relevance Analysis to assign weights that represent the significance of each partition in the ensemble. Also, this approach defines the consensus clustering via the median partition problem, using a kernel function as the similarity measure between partitions (Vega-Pons & Ruiz-Shulcloper, 2011). Other proposed methods using the same idea can be found in (Vega-Pons, Correa-Morris, & Ruiz-Shulcloper, 2008; Vega-Pons & Ruiz-Shulcloper, 2009).
 Based on fuzzy theory: So far, the ensemble clustering methods described have been developed based on hard partitioning. However, soft partitioning might also work in various cases. There are clustering methods, such as EM and fuzzy c-means, that produce a soft, or fuzzy, partition of the data. Thus, the main logic of these methods is to combine fuzzy partitions instead of hard ones as an internal step of the process. sCSPA, sMCLA, and sHBGF (Punera & Ghosh, 2008) can be found as examples in the literature.
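To make the co-association idea above concrete, the sketch below (illustrative code, not any of the cited algorithms) builds the co-association matrix from an ensemble of labelings and then cuts it at a fixed threshold, taking connected components as the final partition; this is a crude stand-in for the single-link step used in evidence accumulation:

```python
from itertools import combinations

def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster.
    Works directly on label co-occurrence, so no relabeling step is needed."""
    n, q = len(partitions[0]), len(partitions)
    m = [[0.0] * n for _ in range(n)]
    for p in partitions:
        for i, j in combinations(range(n), 2):
            if p[i] == p[j]:
                m[i][j] += 1.0 / q
                m[j][i] += 1.0 / q
    for i in range(n):
        m[i][i] = 1.0
    return m

def consensus_by_threshold(partitions, threshold=0.5):
    """Final partition: connected components of the graph linking pairs whose
    co-association exceeds the threshold (a crude single-link style cut)."""
    m = co_association(partitions)
    n, labels, cluster = len(m), {}, 0
    for s in range(n):
        if s in labels:
            continue
        stack = [s]            # depth-first flood fill of one component
        labels[s] = cluster
        while stack:
            i = stack.pop()
            for j in range(n):
                if j not in labels and m[i][j] > threshold:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return [labels[i] for i in range(n)]

# Third labeling is a relabeled copy; fourth is a noisy one.
ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
print(consensus_by_threshold(ensemble))  # → [0, 0, 1, 1]
```

As the bullet notes, filling the matrix costs quadratic time and memory in the number of samples, which is why this family does not scale to large datasets.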


In the literature, various studies focus on the development of consensus clustering methods or the application of existing ones. In this section, some relatively recent and related works are summarized. One can find many different terms corresponding to consensus clustering frameworks; that is why the search for this study was limited to the following terms:

 Consensus clustering
 Ensemble clustering
 Unsupervised ensemble learning

Ayad and Kamel proposed the cumulative voting-based aggregation algorithm (CVAA) as a multi-response regression problem (Ayad & Kamel, 2010). The CVAA was later enhanced by assigning weights to the individual clustering methods used to generate the consensus, based on the mutual information associated with each method, which is measured by the entropy (Saeed, Ahmed, Shamsir, & Salim, 2014). Weighted Partition Consensus via Kernels (WPCK) is proposed by (Vega-Pons et al., 2010). This method uses an intermediate step called Partition Relevance Analysis to assign weights that represent the significance of each partition in the ensemble. Also, this method defines the consensus clustering via the median partition problem, using a kernel function as the similarity measure between partitions. Different from partitional clustering methods, whose results can be represented by vectors, hierarchical clustering methods produce a more complex solution, which is represented by dendrograms or trees. This makes using hierarchical clustering in a consensus framework more challenging. A hierarchical ensemble clustering method is proposed by (Yu, Liu, & Wang, 2014) to handle this difficult problem. This algorithm combines both partitional and hierarchical clustering and yields a hierarchical consensus clustering as output.
Link-based cluster ensemble (LCE) is proposed as an extension of the hybrid bipartite graph formulation (HBGF) technique (Iam-On, Boongeon, Garrett, & Price, 2012; Iam-On & Boongoen, 2012). The authors applied a graph-based consensus function to an improved similarity matrix instead of the conventional one. The main difference between the proposed method and HBGF is the similarity matrix: while the association between samples is represented by the binary values {0, 1} in the traditional similarity matrix, approximate values of the unknown relationships (the zeros) are used in the improved one. This is accomplished through a link-based similarity measure called the 'Weighted Connected Triple (WCT)'.
Mainly, after some base partitions have been created, an improved similarity matrix is built, and an optimal partition is obtained from it by using spectral clustering. An improved version of LCE is proposed by (Iam-On, Boongoen, & Garrett, 2010) with the goal of using additional information by implementing 'Weighted Triple Uniqueness (WTU)'. An iterative consensus clustering is applied to complex networks by (Lancichinetti & Fortunato, 2012). Lancichinetti and Fortunato stress that there might be noisy connections in the consensus graph which should be removed. Thus, they refined the consensus graph by removing edges whose values are lower than some threshold and reconnecting the affected nodes to their closest neighbors until a block diagonal matrix is obtained. At the end, a graph-based algorithm is applied to the consensus graph to get the final partition. To efficiently find the similarity between two data points, which can be interpreted as the probability of their being in the same cluster, a new index, called the Probabilistic Rand Index (PRI), is developed by (Carpineto & Romano, 2012). According to the authors, it obtained better results than existing methods. One possible problem in the consensus framework is an inability to handle uncertain data points, which are assigned to the same cluster in about half of the partitions and to different clusters in the rest. This can yield a final partition of poor quality. To overcome this limitation, (Yi, Yang, Jin, Jain, & Mahdavi, 2012) proposed an ensemble clustering method based on the technique of matrix completion. The proposed algorithm constructs a partially observed similarity matrix based on the pairs of samples that are assigned to the same cluster by most of the clustering algorithms. The similarity matrix therefore consists of three kinds of elements: 0, 1, and unobserved. A matrix completion algorithm is then used to fill in the unobserved elements. The final data partition is obtained by applying a spectral clustering algorithm to the completed matrix (Yi et al., 2012).
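The construction of such a partially observed matrix can be sketched as follows (an illustrative simplification of the idea, not the authors' algorithm; the agreement threshold is a hypothetical parameter, and the completion and spectral steps are omitted):

```python
def partial_similarity(partitions, agree=0.75):
    """Partially observed similarity matrix in the spirit of the matrix
    completion approach: 1 where at least `agree` of the partitions put a
    pair together, 0 where at least `agree` separate it, and None
    (unobserved) for uncertain pairs, to be filled in by matrix completion."""
    n, q = len(partitions[0]), len(partitions)
    sim = [[None] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1
        for j in range(i + 1, n):
            together = sum(p[i] == p[j] for p in partitions) / q
            if together >= agree:
                sim[i][j] = sim[j][i] = 1
            elif together <= 1 - agree:
                sim[i][j] = sim[j][i] = 0
    return sim

ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1], [0, 1, 0, 1]]
sim = partial_similarity(ensemble)
# sim[2][3] == 1 (mostly together), sim[0][2] == 0 (mostly apart),
# sim[0][1] is None (uncertain: together in only half of the partitions)
```

Only the confident entries are kept as observations; the uncertain pairs are exactly the ones the completion step is meant to recover.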
A boosting theory based hierarchical clustering ensemble algorithm called Bob-Hic is proposed by (Rashedi & Mirzaei, 2013) as an improved version of the method suggested by (Rashedi & Mirzaei, 2011). Bob-Hic includes several boosting steps; in each step, first a weighted random sampling is applied to the data, and then a single hierarchical clustering is created on the selected samples. At the end, the results of the individual hierarchical clusterings are combined to obtain the final partition. The diversity and the quality of the combined partitions are critical properties for a strong ensemble. Validity indexes are used by (Naldi et al., 2013) to select high-quality partitions among the produced ones. In that study, the quality of a partition is measured using a single index or a combination of indexes. APMM, proposed by (Alizadeh, Minaei-Bidgoli, & Parvin, 2014), is another criterion used to determine the quality of a partition; it is also used to select some partitions among all the produced ones. A consensus particle swarm clustering algorithm based on particle swarm optimization (PSO) (Kennedy, 2011) is proposed by (Esmin & Coelho, 2013). According to the results of that study, the PSO algorithm produces results as good as or better than other well-known consensus clustering algorithms.
A novel consensus clustering method called "Gravitational Ensemble Clustering (GEC)" is proposed by (Sadeghian & Nezamabadi-pour, 2014), based on gravitational clustering (Wright, 1977). This method combines "weak" clustering algorithms such as k-means and, according to the authors, has the ability to determine underlying clusters with arbitrary shapes, sizes, and densities. A weighted voting based consensus clustering (Saeed et al., 2014) is proposed to overcome the limitations of traditional voting-based methods and to improve the performance of combining multiple clusterings of chemical structures.
To reduce the time and space complexity of the proposed ensemble clustering methods, (Liu et al., 2015) developed a spectral ensemble clustering approach, in which spectral clustering is applied to the obtained co-association matrix to compute the final partition. A stratified sampling method for generating a subspace of the dataset, with the goal of producing a better representation of big data in the consensus clustering framework, was proposed by (Jing, Tian, & Huang, 2015). Another approach, based on evidence accumulation clustering (EAC), is proposed by (Lourenço et al., 2015). This method is not limited to hard partitions and fully uses the intuition of the co-association matrix; the developed methodology determines the probability of assigning each point to a particular cluster.
Another method based on the refinement of the co-association matrix is proposed by (Zhong, Yue, Zhang, & Lei, 2015). At the data sample level, even if a pair of samples is in the same cluster, their probability of assignment might vary, which also affects the contribution of the whole partition. From this perspective, they developed a refined co-association matrix using a probability density estimation function.
A method based on giving weights to each sample is proposed by (Ren et al., 2016). This idea originated in the boosting method, which is commonly used in supervised classification problems. They distinguished points as hard-to-cluster (receiving a larger weight) and easy-to-cluster (receiving a smaller weight) based on the agreement between partitions for a pair of samples. To address the neglected diversity of the partitions in the combination process, a method based on ensemble-driven cluster uncertainty estimation and a local weighting strategy is proposed by (Huang, Wang, & Lai, 2016). The difference of each partition is estimated via an entropic criterion in conjunction with a novel ensemble-driven cluster validity measure.
In (Huang, Lai, & Wang, 2016), the concept of the super-object, which is a high-quality representation of the data, is introduced to reduce the complexity of the ensemble problem. The authors cast the consensus problem into a binary linear programming problem and proposed an efficient solver based on a factor graph.
More recently, Ünlü and Xanthopoulos introduced a modified weighted consensus graph-based clustering method by adding weights that are determined by internal clustering validity measures. The intuition for this framework comes from the fact that internal clustering measures can be used for a preliminary assessment of the quality of each clustering, which in turn can be utilized to provide a better clustering result. By internal quality measures, they refer to real-valued quality metrics that are computed directly from a clustering and do not involve data sample class information, as opposed to external quality measures (Ünlü & Xanthopoulos, 2016b). In a subsequent step, they improved this approach in terms of a well-known evaluation metric, variance, by optimizing the internal quality measures with Markowitz Portfolio Theory (MPT). Using the core idea of MPT, which is constructing portfolios that optimize expected return for a given level of market risk, where risk is considered as variance, they took into consideration not only the values of the validity measures themselves but also the variation in them. By doing this, they aimed to reduce the variance of the accuracy of the final partition produced by weighted consensus clustering (Ünlü & Xanthopoulos, 2016a).
Throughout this section, some featured studies have been summarized. Research on consensus clustering is not limited to those summarized above; other contributions can be found in (Berikov, 2014; Gupta & Verma, 2014; Kang, Liu, Zhou, & Li, 2016; Lock & Dunson, 2013; Parvin, Minaei-Bidgoli, Alinejad-Rokny, & Punch, 2013; Su, Shang, & Shen, 2015; Wang, Shan, & Banerjee, 2011; Wu, Liu, Xiong, & Cao, 2013).


Abello, J., Pardalos, P. M., & Resende, M. G. (2013). Handbook of massive data sets
(Vol. 4): Springer.
Ailon, N., Charikar, M., & Newman, A. (2008). Aggregating inconsistent information:
ranking and clustering. Journal of the ACM (JACM), 55(5), 23.
Al-Razgan, M., & Domeniconi, C. (2006). Weighted clustering ensembles Proceedings
of the 2006 SIAM International Conference on Data Mining (pp. 258-269): SIAM.
Alizadeh, H., Minaei-Bidgoli, B., & Parvin, H. (2014). Cluster ensemble selection based
on a new cluster stability measure. Intelligent Data Analysis, 18(3), 389-408.
Ana, L., & Jain, A. K. (2003). Robust data clustering Computer Vision and Pattern
Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on (Vol.
2, pp. II-II): IEEE.
Ayad, H. G., & Kamel, M. S. (2008). Cumulative voting consensus method for partitions
with variable number of clusters. IEEE Transactions on pattern analysis and
machine intelligence, 30(1), 160-173.
Ayad, H. G., & Kamel, M. S. (2010). On voting-based consensus of cluster ensembles.
Pattern Recognition, 43(5), 1943-1953.
Azimi, J., & Fern, X. (2009). Adaptive Cluster Ensemble Selection. Paper presented at the

Azimi, J., Mohammadi, M., & Analoui, M. (2006). Clustering ensembles using genetic
algorithm Computer Architecture for Machine Perception and Sensing, 2006. CAMP
2006. International Workshop on (pp. 119-123): IEEE.
Berikov, V. (2014). Weighted ensemble of algorithms for complex data clustering.
Pattern Recognition Letters, 38, 99-106.
Berkhin, P. (2006). A survey of clustering data mining techniques Grouping
multidimensional data (pp. 25-71): Springer.
Cades, I., Smyth, P., & Mannila, H. (2001). Probabilistic modeling of transactional data
with applications to profiling, visualization and prediction, sigmod. Proc. of the 7th
ACM SIGKDD. San Francisco: ACM Press, 37-46.
Carpineto, C., & Romano, G. (2012). Consensus clustering based on a new probabilistic
rand index with application to subtopic retrieval. IEEE Transactions on pattern
analysis and machine intelligence, 34(12), 2315-2326.
d Souto, M., de Araujo, D. S., & da Silva, B. L. (2006). Cluster ensemble for gene
expression microarray data: accuracy and diversity Neural Networks, 2006.
IJCNN'06. International Joint Conference on (pp. 2174-2180): IEEE.
de Hoon, M. J., Imoto, S., Nolan, J., & Miyano, S. (2004). Open source clustering
software. Bioinformatics, 20(9), 1453-1454.
Dimitriadou, E., Weingessel, A., & Hornik, K. (2002). A combination scheme for fuzzy
clustering. International Journal of Pattern Recognition and Artificial Intelligence,
16(07), 901-912.
Domeniconi, C., & Al-Razgan, M. (2009). Weighted cluster ensembles: Methods and
analysis. ACM Transactions on Knowledge Discovery from Data (TKDD), 2(4), 17.
Domeniconi, C., Gunopulos, D., Ma, S., Yan, B., Al-Razgan, M., & Papadopoulos, D.
(2007). Locally adaptive metrics for clustering high dimensional data. Data mining
and knowledge discovery, 14(1), 63-97.
Dudoit, S., & Fridlyand, J. (2003). Bagging to improve the accuracy of a clustering
procedure. Bioinformatics, 19(9), 1090-1099.
Esmin, A. A., & Coelho, R. A. (2013). Consensus clustering based on particle swarm
optimization algorithm Systems, Man, and Cybernetics (SMC), 2013 IEEE
International Conference on (pp. 2280-2285): IEEE.
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for
discovering clusters in large spatial databases with noise. Paper presented at the
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge
discovery in databases. AI magazine, 17(3), 37.
Fern, X. Z., & Brodley, C. E. (2004). Solving cluster ensemble problems by bipartite
graph partitioning Proceedings of the twenty-first international conference on
Machine learning (pp. 36): ACM.
Fischer, B., & Buhmann, J. M. (2003). Bagging for path-based clustering. IEEE
Transactions on pattern analysis and machine intelligence, 25(11), 1411-1415.
Fred, A. (2001). Finding consistent clusters in data partitions International Workshop on
Multiple Classifier Systems (pp. 309-318): Springer.
Fred, A. L., & Jain, A. K. (2005). Combining multiple clusterings using evidence
accumulation. IEEE Transactions on pattern analysis and machine intelligence,
27(6), 835-850.
Ghaemi, R., Sulaiman, M. N., Ibrahim, H., & Mustapha, N. (2009). A survey: clustering
ensembles techniques. World Academy of Science, Engineering and Technology, 50,
Gluck, M. (1989). Information, uncertainty and the utility of categories. Paper presented
at the Proc. of the 7th Annual Conf. of Cognitive Science Society.
Grira, N., Crucianu, M., & Boujemaa, N. (2005). Active semi-supervised fuzzy clustering
for image database categorization Proceedings of the 7th ACM SIGMM international
workshop on Multimedia information retrieval (pp. 9-16): ACM.
Gupta, M., & Verma, D. (2014). A Novel Ensemble Based Cluster Analysis Using
Similarity Matrices & Clustering Algorithm (SMCA). International Journal of
Computer Application, 100(10), 1-6.
Hadjitodorov, S. T., Kuncheva, L. I., & Todorova, L. P. (2006). Moderate diversity for
better cluster ensembles. Information Fusion, 7(3), 264-275.
Haghtalab, S., Xanthopoulos, P., & Madani, K. (2015). A robust unsupervised consensus
control chart pattern recognition framework. Expert Systems with Applications,
42(19), 6767-6776.
Hong, Y., Kwong, S., Chang, Y., & Ren, Q. (2008). Unsupervised feature selection using
clustering ensembles and population based incremental learning algorithm. Pattern
Recognition, 41(9), 2742-2756.
Hu, X., Yoo, I., Zhang, X., Nanavati, P., & Das, D. (2005). Wavelet transformation and
cluster ensemble for gene expression analysis. International journal of bioinformatics
research and applications, 1(4), 447-460.
Huang, D., Lai, J., & Wang, C.-D. (2016). Ensemble clustering using factor graph.
Pattern Recognition, 50, 131-142.
Huang, D., Wang, C.-D., & Lai, J.-H. (2016). Locally Weighted Ensemble Clustering.
arXiv preprint arXiv:1605.05011.
Iam-On, N., Boongeon, T., Garrett, S., & Price, C. (2012). A link-based cluster ensemble
approach for categorical data clustering. IEEE Transactions on knowledge and data
engineering, 24(3), 413-425.
Iam-On, N., & Boongoen, T. (2012). Improved link-based cluster ensembles Neural
Networks (IJCNN), The 2012 International Joint Conference on (pp. 1-8): IEEE.
Iam-On, N., Boongoen, T., & Garrett, S. (2010). LCE: a link-based cluster ensemble
method for improved gene expression data analysis. Bioinformatics, 26(12), 1513-
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM
computing surveys (CSUR), 31(3), 264-323.
Jing, L., Tian, K., & Huang, J. Z. (2015). Stratified feature sampling method for
ensemble clustering of high dimensional data. Pattern Recognition, 48(11), 3688-
Kang, Q., Liu, S., Zhou, M., & Li, S. (2016). A weight-incorporated similarity-based
clustering ensemble method based on swarm intelligence. Knowledge-Based Systems,
104, 156-164.
Kantardzic, M. (2011). Data mining: concepts, models, methods, and algorithms: John
Wiley & Sons.
Karypis, G., Aggarwal, R., Kumar, V., & Shekhar, S. (1999). Multilevel hypergraph
partitioning: applications in VLSI domain. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 7(1), 69-79.
Kennedy, J. (2011). Particle swarm optimization Encyclopedia of machine learning (pp.
760-766): Springer.
Křivánek, M., & Morávek, J. (1986). NP-hard problems in hierarchical-tree clustering.
Acta informatica, 23(3), 311-323.
Lancichinetti, A., & Fortunato, S. (2012). Consensus clustering in complex networks.
Scientific reports, 2.
Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets:
Cambridge University Press.
Li, T., & Ding, C. (2008). Weighted consensus clustering Proceedings of the 2008 SIAM
International Conference on Data Mining (pp. 798-809): SIAM.
Li, T., Ding, C., & Jordan, M. I. (2007). Solving consensus and semi-supervised
clustering problems using nonnegative matrix factorization Data Mining, 2007.
ICDM 2007. Seventh IEEE International Conference on (pp. 577-582): IEEE.
Li, T., Ogihara, M., & Ma, S. (2010). On combining multiple clusterings: an overview
and a new perspective. Applied Intelligence, 33(2), 207-219.
Liu, H., Cheng, G., & Wu, J. (2015). Consensus Clustering on big data Service Systems
and Service Management (ICSSSM), 2015 12th International Conference on (pp. 1-
6): IEEE.
Lock, E. F., & Dunson, D. B. (2013). Bayesian consensus clustering. Bioinformatics,
Lourenço, A., Bulò, S. R., Rebagliati, N., Fred, A. L., Figueiredo, M. A., & Pelillo, M.
(2015). Probabilistic consensus clustering using evidence accumulation. Machine
Learning, 98(1-2), 331-357.
20 Ramazan Ünlü

Luo, H., Jing, F., & Xie, X. (2006). Combining multiple clusterings using information
theory based genetic algorithm Computational Intelligence and Security, 2006
International Conference on (Vol. 1, pp. 84-89): IEEE.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate
observations. Paper presented at the Proceedings of the fifth Berkeley symposium on
mathematical statistics and probability.
McQuitty, L. L. (1957). Elementary linkage analysis for isolating orthogonal and oblique
types and typal relevancies. Educational and Psychological Measurement, 17(2),
Mirkin, B. (2001). Reinterpreting the category utility function. Machine Learning, 45(2),
Naldi, M. C., Carvalho, A. C., & Campello, R. J. (2013). Cluster ensemble selection
based on relative validity indexes. Data mining and knowledge discovery, 1-31.
Nayak, J., Naik, B., & Behera, H. (2015). Fuzzy C-means (FCM) clustering algorithm: a
decade review from 2000 to 2014 Computational Intelligence in Data Mining-
Volume 2 (pp. 133-149): Springer.
Parvin, H., Minaei-Bidgoli, B., Alinejad-Rokny, H., & Punch, W. F. (2013). Data
weighing mechanisms for clustering ensembles. Computers & Electrical
Engineering, 39(5), 1433-1450.
Punera, K., & Ghosh, J. (2008). Consensus-based ensembles of soft clusterings. Applied
Artificial Intelligence, 22(7-8), 780-810.
Rashedi, E., & Mirzaei, A. (2011). A novel multi-clustering method for hierarchical
clusterings based on boosting Electrical Engineering (ICEE), 2011 19th Iranian
Conference on (pp. 1-4): IEEE.
Rashedi, E., & Mirzaei, A. (2013). A hierarchical clusterer ensemble method based on
boosting theory. Knowledge-Based Systems, 45, 83-93.
Ren, Y., Domeniconi, C., Zhang, G., & Yu, G. (2016). Weighted-object ensemble
clustering: methods and analysis. Knowledge and Information Systems, 1-29.
Sadeghian, A. H., & Nezamabadi-pour, H. (2014). Gravitational ensemble clustering
Intelligent Systems (ICIS), 2014 Iranian Conference on (pp. 1-6): IEEE.
Saeed, F., Ahmed, A., Shamsir, M. S., & Salim, N. (2014). Weighted voting-based
consensus clustering for chemical structure databases. Journal of computer-aided
molecular design, 28(6), 675-684.
Sander, J., Ester, M., Kriegel, H.-P., & Xu, X. (1998). Density-based clustering in spatial
databases: The algorithm gdbscan and its applications. Data mining and knowledge
discovery, 2(2), 169-194.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions
on pattern analysis and machine intelligence, 22(8), 888-905.
Unsupervised Ensemble Learning 21

Srivastava, J., Cooley, R., Deshpande, M., & Tan, P.-N. (2000). Web usage mining:
Discovery and applications of usage patterns from web data. Acm Sigkdd
Explorations Newsletter, 1(2), 12-23.
Strehl, A., & Ghosh, J. (2002). Cluster ensembles — a knowledge reuse framework for
combining multiple partitions. Journal of machine learning research, 3(Dec), 583-
Su, P., Shang, C., & Shen, Q. (2015). A hierarchical fuzzy cluster ensemble approach and
its application to big data clustering. Journal of Intelligent & Fuzzy Systems, 28(6),
Sukegawa, N., Yamamoto, Y., & Zhang, L. (2013). Lagrangian relaxation and pegging
test for the clique partitioning problem. Advances in Data Analysis and
Classification, 7(4), 363-391.
Topchy, A., Jain, A. K., & Punch, W. (2003). Combining multiple weak clusterings Data
Mining, 2003. ICDM 2003. Third IEEE International Conference on (pp. 331-338):
Topchy, A., Jain, A. K., & Punch, W. (2004). A mixture model for clustering ensembles
Proceedings of the 2004 SIAM International Conference on Data Mining (pp. 379-
390): SIAM.
Topchy, A., Jain, A. K., & Punch, W. (2005). Clustering ensembles: Models of consensus
and weak partitions. IEEE Transactions on pattern analysis and machine
intelligence, 27(12), 1866-1881.
Tumer, K., & Agogino, A. K. (2008). Ensemble clustering with voting active clusters.
Pattern Recognition Letters, 29(14), 1947-1953.
Ünlü, R., & Xanthopoulos, P. (2016a). A novel weighting policy for unsupervised
ensemble learning based on Markowitz portfolio theory. Paper presented at the
INFORMS 2016, Nashville, TN.
Ünlü, R., & Xanthopoulos, P. (2016b). A weighted framework for unsupervised ensemble
learning based on internal quality measures. Manuscript submitted for publication.
Vega-Pons, S., Correa-Morris, J., & Ruiz-Shulcloper, J. (2008). Weighted cluster
ensemble using a kernel consensus function. Progress in Pattern Recognition, Image
Analysis and Applications, 195-202.
Vega-Pons, S., Correa-Morris, J., & Ruiz-Shulcloper, J. (2010). Weighted partition
consensus via kernels. Pattern Recognition, 43(8), 2712-2724.
Vega-Pons, S., & Ruiz-Shulcloper, J. (2009). Clustering ensemble method for
heterogeneous partitions. Paper presented at the Iberoamerican Congress on Pattern
Vega-Pons, S., & Ruiz-Shulcloper, J. (2011). A survey of clustering ensemble
algorithms. International Journal of Pattern Recognition and Artificial Intelligence,
25(03), 337-372.
22 Ramazan Ünlü

Wang, H., Shan, H., & Banerjee, A. (2011). Bayesian cluster ensembles. Statistical
Analysis and Data Mining, 4(1), 54-70.
Weingessel, A., Dimitriadou, E., & Hornik, K. (2003). An ensemble method for
clustering Proceedings of the 3rd International Workshop on Distributed Statistical
Wright, W. E. (1977). Gravitational clustering. Pattern Recognition, 9(3), 151-166.
Wu, J., Liu, H., Xiong, H., & Cao, J. (2013). A Theoretic Framework of K-Means-Based
Consensus Clustering. Paper presented at the IJCAI.
Xanthopoulos, P. (2014). A review on consensus clustering methods Optimization in
Science and Engineering (pp. 553-566): Springer.
Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of
Data Science, 2(2), 165-193.
Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on
neural networks, 16(3), 645-678.
Yi, J., Yang, T., Jin, R., Jain, A. K., & Mahdavi, M. (2012). Robust ensemble clustering
by matrix completion Data Mining (ICDM), 2012 IEEE 12th International
Conference on (pp. 1176-1181): IEEE.
Yu, H., Liu, Z., & Wang, G. (2014). An automatic method to determine the number of
clusters using decision-theoretic rough set. International Journal of Approximate
Reasoning, 55(1), 101-115.
Zhong, C., Yue, X., Zhang, Z., & Lei, J. (2015). A clustering ensemble: Two-level-
refined co-association matrix with path-based transformation. Pattern Recognition,
48(8), 2699-2709.


Dr. Ramazan Unlu holds a Ph.D. in Industrial Engineering from the University of
Central Florida, with particular interest in data mining, including classification and
clustering methods. His dissertation was titled “Weighting Policies for Robust
Unsupervised Ensemble Learning.” In addition to his research, he served as a
graduate teaching assistant in several courses during his Ph.D. Prior to enrolling at
UCF, he earned a master’s degree in Industrial Engineering from the University of Pittsburgh
and a B.A. in Industrial Engineering from Istanbul University. For his master’s and doctoral
education, he won a fellowship awarded to 26 industrial engineers by the Republic
of Turkey Ministry of National Education in 2010.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 2



Using Deep Learning to Configure Parallel Distributed
Discrete-Event Simulators



Edwin Cortes1, Luis Rabelo2, and Gene Lee3


1Institute of Simulation and Training, Orlando, Florida, US
2Department of Industrial Engineering and Management Systems,
University of Central Florida, Orlando, Florida, US
3Department of Industrial Engineering and Management Systems,
University of Central Florida, Orlando, Florida, US

ABSTRACT


This research discusses the utilization of deep learning for selecting the time
synchronization scheme that optimizes the performance of a particular parallel discrete
simulation hardware/software arrangement. Deep belief neural networks are able to
use measures of software complexity and architectural features to recognize and match
patterns and therefore to predict performance. Software complexities such as simulation
objects, branching, function calls, concurrency, iterations, mathematical computations,
and messaging frequency were given a weight based on the cognitive weighted approach.
In addition, hardware/network features such as the distribution
pattern of simulation objects, CPU features (e.g., multithreading/multicore), and the
degree of coupling (loosely vs. tightly coupled) of the utilized computer architecture were also
captured to define the parallel distributed simulation arrangement. Deep belief neural
networks (in particular, restricted Boltzmann machines, RBMs) were then used to
perform deep learning from the complexity parameters and their corresponding time
synchronization scheme value as measured by speedup performance. The simulation
distributed simulation systems to optimize performance.

Keywords: Deep Learning, Neural Networks, Complexity, Parallel Distributed


INTRODUCTION


Parallel distributed discrete event simulation (PDDES) is the execution of a discrete

event simulation on a tightly or loosely coupled computer system with several
processors/nodes. The discrete-event simulation model is decomposed into several logical
processors (LPs) or simulation objects that can be executed concurrently using
partitioning types (e.g., spatial and temporal) (Fujimoto, 2000). Each LP/simulation
object of a simulation (which can be composed of numerous LPs) is located in a single
node. PDDES is very important in particular for:

• Increase Speed (i.e., Reduced Execution Time) due to the parallelism
• Increase Size of the Discrete Event Simulation Program and/or data generation
• Heterogeneous Computing
• Fault Tolerance
• Usage of unique resources in Multi-Enterprise/Geographical Distributed
• Protection of Intellectual Property in Multi-Enterprise simulations.

One of the challenges in PDDES is time management: providing flow control
over event processing and coordinating the different LPs and
nodes to take advantage of parallelism. Several time management schemes have been
developed, such as Time Warp (TW), Breathing Time Buckets (BTB), and Breathing
Time Warp (BTW) (Fujimoto, 2000). Unfortunately, there is no clear methodology for
deciding a priori which time management scheme to apply to a particular PDDES problem
in order to achieve higher performance.
This research shows a new approach for selecting the time synchronization technique
class that corresponds to a particular parallel discrete simulation with different levels of
simulation logic complexity. Simulation complexities such as branching, function calls,
concurrency, iterations, mathematical computations, messaging frequency and number of
simulation objects were given a weighted parameter value based on the cognitive weight
approach. Deep belief neural networks were then used to perform deep learning from the
simulation complexity parameters and their corresponding time synchronization scheme
value as measured by speedup performance.
Using Deep Learning to Configure Parallel Distributed Discrete-Event Simulators 25

Time Warp (TW)

The optimistic viewpoint in time management of simulation objects uses a different

strategy for obtaining parallelism by aggressively processing simulation events without
regard for accuracy. Rollback techniques are implemented to undo events that might have
been processed out of order whenever straggler event messages are received from other
simulation objects located in different nodes (Figure 1). In this manner, events are
executed optimistically. While the optimistic approach places no restrictions on how
simulation objects can interact, the biggest drawback is that models must be developed in
a rollbackable manner. Optimistic event processing is able to achieve optimal execution
of the chain of dependent events that limit the performance of a simulation.
The Time Warp (TW) event management provides an efficient rollback mechanism
for each simulation object (Fujimoto, 2000). The simulation time of each simulation
object is defined as the time stamp of its last executed event, or the time of the event it is
presently executing. When a simulation object receives a straggler event in its past, it
rolls the simulation object back to its last correctly processed event. These events that
were rolled back are either reprocessed or rolled forward. A rolled back event can be
safely rolled forward if the straggler event does not modify any of the simulation object’s
state variables that were accessed by the event when it was originally executed
(Steinman, 1991; Steinman, 1992).
TW does not rollback the entire node when a simulation object receives a straggler
message. Instead, only the affected simulation object is rolled back. Of course, during the
rollback, all events scheduled by those events that were rolled back must also be
retracted, potentially causing secondary (or cascading) rollbacks. Each event must
therefore keep track of its generated events until the event itself is committed. Retraction
messages, used to withdraw incorrectly scheduled event messages in Time Warp, are
called antimessages (Fujimoto, 2000).
Figure 1 details the process of rollback and the cascading of antimessages. Rollback
can be started when a simulation object receives a straggler message (one whose time tag is
before the current simulation time of the simulation object). This straggler message will
invalidate several processed events (the ones from the time tag of the straggler event
to the current simulation time of the simulation object).
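As a concrete illustration of this mechanism, the following minimal Python sketch (a hypothetical API, not WarpIV code) saves a state snapshot before each event and rolls back on a straggler; the class and attribute names are illustrative assumptions:

```python
import copy

class TimeWarpLP:
    """Minimal sketch of one optimistic logical process. A snapshot of the
    state is saved before each event; a straggler triggers a rollback, and
    messages scheduled by rolled-back events are retracted as antimessages."""

    def __init__(self):
        self.now = 0.0             # local simulation time (last event's time stamp)
        self.state = {"count": 0}  # example state variable
        self.processed = []        # (timestamp, snapshot taken before the event)
        self.sent = []             # (timestamp, message) pairs this LP scheduled

    def process(self, ts, data):
        self.processed.append((ts, copy.deepcopy(self.state)))
        self.now = ts
        self.state["count"] += data    # the event's effect on the state

    def receive(self, ts, data):
        if ts < self.now:              # straggler event in this LP's past
            antimessages = self.rollback(ts)
            # ...antimessages would be sent here to cancel remote events;
            # rolled-back events would then be reprocessed or rolled forward.
        self.process(ts, data)

    def rollback(self, ts):
        # undo every event with a time stamp later than the straggler
        while self.processed and self.processed[-1][0] > ts:
            _, snapshot = self.processed.pop()
            self.state = snapshot
        self.now = self.processed[-1][0] if self.processed else 0.0
        # retract messages generated by the rolled-back events
        antimessages = [m for m in self.sent if m[0] > ts]
        self.sent = [m for m in self.sent if m[0] <= ts]
        return antimessages
```

Note that only the affected LP is rolled back, mirroring the per-object rollback described above.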

Figure 1: The process of rollback in TW using antimessages and the process of cancellation of events.

Breathing Time Buckets (BTB)

BTB is a hybrid between the Fixed Time Buckets algorithm and TW (Steinman,
1993). Unlike TW, “messages generated while processing events are never actually
released until it is known that the event generating the messages will never be rolled
back” (Steinman, 1993). This means that messages which cause invalid events with
potential antimessages are not released. Therefore, BTB is a hybrid in the following ways:

• BTB is TW without antimessages.
• BTB processes events in time window cycles like Fixed Time Buckets; however, the cycles are not fixed.

The Event Horizon is an important concept in BTB (Steinman, 1994). The event
horizon is the point in time where events generated by the simulation turn back into the
simulation. At the event horizon, all new events that were generated through event
processing at the previous “bucket” could be sorted and merged back into the main event
queue. Parallelism can be exploited because the events processed in each event horizon
cycle have time tags earlier than the cycle’s event horizon. Therefore, it is important to
calculate the Global Event Horizon (GEH) with its respective Global Virtual Time (GVT)
to avoid problems with events that will be scheduled in other simulation objects
(Steinman, 1994). The local event horizon (Figure 2) only considers the event horizon for
events being processed on its node, while the global event horizon factors all nodes in its
calculation. Once all of the nodes have processed events up to their local event horizon,
they are then ready to synchronize. The next step is to compute the global event horizon
as the minimum local event horizon across all nodes. Once GVT is determined, all events
with time stamps less than or equal to GVT are committed (Steinman, Nicol, Wilson, &
Lee, 1995).
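The GVT computation just described can be sketched in a few lines (hypothetical helper names; events are simple (timestamp, payload) tuples):

```python
def local_event_horizon(new_events):
    """Earliest time stamp among the events a node generated this cycle."""
    return min(ts for ts, _ in new_events)

def global_virtual_time(per_node_new_events):
    """GVT: the minimum local event horizon across all nodes."""
    return min(local_event_horizon(events) for events in per_node_new_events)

def commit(processed, gvt):
    """Events at or before GVT can never be rolled back, so commit them."""
    committed = [e for e in processed if e[0] <= gvt]
    pending = [e for e in processed if e[0] > gvt]
    return committed, pending
```

For example, with three nodes whose newly generated events start at times 5.0, 7.0, and 6.0, GVT is 5.0, and only events stamped at or before 5.0 are committed.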

Figure 2: The Event Horizon and GVT.

A potential problem is that some of the nodes may have processed events that went
beyond GVT. An event processed by the respective simulation object must be rolled back
when a newly generated event is received in its past. Rollback is very simple in this case
and involves discarding unsent messages that were generated by the event and then
restoring state variables that were modified by the event. Therefore, antimessages are not
required because messages that would create bad events are never released (Steinman

Breathing Time Warp (BTW)

BTW is another hybrid algorithm for time management and event synchronization
that tries to solve the problems with TW and BTB (Steinman, 1993):

• TW has the potential problem of rollback and cascading antimessage explosions.
• BTB has the potential problem of a higher frequency of synchronizations.

Cascading antimessage explosions can occur when events are close to the current
GVT. Because events processed far ahead of the rest of the simulation will likely be
rolled back, it might be better for those runaway events to not immediately release their
messages. On the other hand, using TW as an initial condition before switching to BTB reduces the
frequency of synchronizations and increases the size of the bucket.
The process of BTW is explained as follows:

1. The first simulation events processed locally on each node beyond GVT release
their messages right away as in TW. After that, messages are held back and the
BTW starts execution.
2. When the events of the entire cycle are processed, or when the event horizon is
determined, each node requests a GVT update. If a node ever processes more
events beyond GVT, it temporarily stops processing events until the next GVT
cycle begins. These parameters are defined by the simulation engineer. An
example of a typical processing cycle for a three-node execution is provided in
Figure 3.
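The two-phase cycle above can be sketched as follows (a simplified illustration; `n_tw`, the number of events released TW-style, is a hypothetical tuning parameter):

```python
def btw_cycle(events, n_tw):
    """One BTW cycle on a node (simplified sketch). The first n_tw events
    past GVT release their messages immediately, as in Time Warp; the
    remaining events hold their messages back, as in Breathing Time
    Buckets, until a GVT update confirms they cannot be rolled back."""
    released, held = [], []
    for i, (ts, msgs) in enumerate(sorted(events)):
        if i < n_tw:
            released.extend(msgs)   # TW phase: optimistic release
        else:
            held.extend(msgs)       # BTB phase: buffer until committed
    return released, held
```

Held messages would be released only once the cycle's GVT update commits their generating events.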

Figure 3: BTW cycle in three nodes. The first part of the cycle is Time Warp (TW) and it ends with
Breathing Time Buckets (BTB) until GVT is reached.


DEEP BELIEF NEURAL NETWORKS

Deep neural architectures with multiple hidden layers were difficult to train and
unstable with the backpropagation algorithm. Empirical results show that using
backpropagation alone for neural networks with 3 or more hidden layers produced poor
solutions (Larochelle, Bengio, Louradour, & Lamblin, 2009).
Hinton, Osindero, & Teh (2006) provided novel training algorithms that trained
multi-hidden layer deep belief neural networks (DBNs). Their work introduced the
greedy learning algorithm to train a stack of restricted Boltzmann machines (RBMs),
which compose a DBN, one layer at a time. The central concept of accurately training a
DBN, that extracts complex patterns in data, is to find the matrix of synaptic neuron
connection weights that produce the smallest error for the training (input-data) vectors.
The fundamental learning blocks of a DBN are stacked restricted Boltzmann
machines. The greedy algorithm proposed by Hinton et al. (2006) focused on allowing
each RBM model in the stack to process a different representation of the data. Then, each
model transforms its input-vectors non-linearly and generates output-vectors that are then
used as input for the next RBM in the sequence.
When RBMs are stacked, they form a composite generative model. RBMs are
generative probabilistic models between input units (visible) and latent (hidden) units
(Längkvist, Karlsson, & Loutfi, 2014). An RBM is also defined by Zhang, Zhang, Ji, &
Guo (2014) as a parameterized generative model representing a probability distribution.
Figure 4 shows an RBM (at lower level) with binary variables in the visible layer and
stochastic binary variables in the hidden layer (Hinton et al., 2012). Visible units have no
synaptic connections between them. Similarly, hidden units are not interconnected. This
absence of hidden-hidden and visible-visible connectivity is what makes these Boltzmann
machines restricted. During learning, the higher-level RBM (Figure 4) uses the data
generated by the hidden activities of the lower RBM.

Figure 4: Two RBMs.

Zhang et al. (2014) stated that learning in an RBM is accomplished by using training
data and “adjusting the RBM parameters such that the probability distribution represented
by the RBM fits the training data as well as possible.” RBMs are energy-based models.
As such, a scalar energy is associated to each variable configuration. Per Bengio (2009),
learning from data corresponds to performing a modification of the energy function until
its shape represents the properties needed. This energy function has different forms
depending on the type of RBM it represents. Binary RBMs, also known as Bernoulli
(visible)-Bernoulli (hidden) RBMs, have an energy function E (the energy of a joint
configuration of visible and hidden units) of the form:


E(\mathbf{v}, \mathbf{h}; \theta) = -\sum_{i=1}^{I} \sum_{j=1}^{J} w_{ij} v_i h_j - \sum_{i=1}^{I} b_i v_i - \sum_{j=1}^{J} a_j h_j \quad (1)

The variables 𝑤𝑖𝑗 represent the weight (strength) of a neuron connection between a
visible (𝑣𝑖 ) and hidden units (ℎ𝑗 ). Variables 𝑏𝑖 and 𝑎𝑗 are the visible units biases and the
hidden units biases, respectively. I and J are the number of visible and hidden units,
respectively. The set θ represents the vector variables 𝒘, 𝒃, and 𝒂 (Hinton, 2010;
Mohamed et al., 2011; Mohamed, Dahl, & Hinton, 2012).
On the other hand, a Gaussian RBM (GRBM), Gaussian (visible)-Bernoulli (hidden),
has an energy function of the form:

E(\mathbf{v}, \mathbf{h}; \theta) = -\sum_{i=1}^{I} \sum_{j=1}^{J} w_{ij} v_i h_j + \frac{1}{2} \sum_{i=1}^{I} (v_i - b_i)^2 - \sum_{j=1}^{J} a_j h_j \quad (2)

RBMs represent probability distributions after being trained. They assign a

probability to every possible input-data vector using the energy function. Mohamed et al.
(2012) stated that the probability that the model assigns to a visible vector 𝐯 is as follows:

p(\mathbf{v}; \theta) = \frac{\sum_{\mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h}; \theta)}}{\sum_{\mathbf{v}} \sum_{\mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h}; \theta)}} \quad (3)

For binary RBMs, the conditional probability distributions are sigmoidal in nature
and are defined by:

p(h_j = 1 \mid \mathbf{v}; \theta) = \sigma\left(\sum_{i=1}^{I} w_{ij} v_i + a_j\right) \quad (4)

p(v_i = 1 \mid \mathbf{h}; \theta) = \sigma\left(\sum_{j=1}^{J} w_{ij} h_j + b_i\right) \quad (5)


where \sigma(\lambda) = \frac{1}{1 + e^{-\lambda}} is the sigmoid function (Hinton, 2006; Hinton et al., 2006).
Real-valued GRBMs have a conditional probability for ℎ𝑗 =1, a hidden variable
turned on, given the evidence vector 𝐯 of the form:

p(h_j = 1 \mid \mathbf{v}; \theta) = \sigma\left(\sum_{i=1}^{I} w_{ij} v_i + a_j\right) \quad (6)


The GRBM conditional probability for 𝑣𝑖 =1, given the evidence vector h, is
continuous-normal in nature and has the form

p(v_i \mid \mathbf{h}; \theta) = \mathcal{N}\left(\sum_{j=1}^{J} w_{ij} h_j + b_i,\; 1\right) \quad (7)


where \mathcal{N}(\mu_i, 1) = \frac{1}{\sqrt{2\pi}} e^{-(v_i - \mu_i)^2 / 2} is a Gaussian distribution with mean \mu_i = \sum_{j=1}^{J} w_{ij} h_j + b_i
and variance of unity (Mohamed et al., 2012; Cho, Ilin, & Raiko, 2011).
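To make equations 1 and 4 concrete, the following sketch evaluates the energy and a hidden-unit activation probability for a tiny binary RBM (hand-picked toy parameters, following equation 1's bias convention; not from the chapter):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def energy(v, h, W, b, a):
    """Binary RBM energy, equation (1): visible biases b_i, hidden biases a_j."""
    e = -sum(W[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h)))
    e -= sum(b[i] * v[i] for i in range(len(v)))
    e -= sum(a[j] * h[j] for j in range(len(h)))
    return e

def p_hidden_on(j, v, W, a):
    """Equation (4): p(h_j = 1 | v) = sigma(sum_i w_ij v_i + a_j)."""
    return sigmoid(sum(W[i][j] * v[i] for i in range(len(v))) + a[j])

# tiny example: 2 visible units, 2 hidden units, zero biases
W = [[1.0, -1.0],
     [0.5,  0.5]]
b = [0.0, 0.0]
a = [0.0, 0.0]
print(energy([1, 1], [1, 0], W, b, a))         # → -1.5
print(round(p_hidden_on(0, [1, 1], W, a), 3))  # → 0.818
```

Lower energy configurations are assigned higher probability by equation 3, which is what the toy values illustrate.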
Learning from input-data in an RBM can be summarized as calculating a good set of
neuron connection weight vectors, 𝒘, that produce the smallest error for the training
(input-data) vectors. This also implies that a good set of bias (b and a) vectors must be
determined. Because learning the weights and biases is done iteratively, the weight
update rule is given by Δw_ij (equation 8), which is proportional to the partial derivative of the
log-likelihood of a training vector with respect to the weights:

\Delta w_{ij} \propto \frac{\partial \log p(\mathbf{v})}{\partial w_{ij}} = \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \quad (8)

This is well explained by Salakhutdinov and Murray (2008), Hinton (2010), and
Zhang et al. (2014). However, this exact computation is intractable because 〈𝑣𝑖 ℎ𝑗 〉𝑚𝑜𝑑𝑒𝑙
takes exponential time to calculate exactly (Mohamed et al., 2011). In practice, the
gradient of the log-likelihood is approximated.
The contrastive divergence learning rule is used to approximate the gradient of the log-likelihood
of a training vector with respect to the neuron connection weights.
The simplified learning rule for an RBM has the form (Längkvist et al., 2014):

\Delta w_{ij} \propto \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{reconstruction} \quad (9)

The reconstruction values for v_i and h_j are generated by applying equations 4 and 5,
or equation 7 for GRBMs, as explained by Mohamed et al. (2012), in a Markov chain using Gibbs
sampling. After Gibbs sampling, the contrastive divergence learning rule for an RBM can
be calculated and the weights of the neuron connections updated based on ∆𝑤. The
literature also shows that the RBM learning rule (equation 9) may be modified with constants
such as learning rate, weight-cost, momentum, and mini-batch sizes for a more precise
calculation of neuron weights during learning. Hinton et al. (2006) described that the
contrastive divergence learning in an RBM is efficient enough to be practical.
In RBM neuron learning, a gauge of the error between the visible unit probabilities and
their reconstruction probabilities computed after Gibbs sampling is provided by the
cross-entropy. The cross-entropy between the Bernoulli probability distributions of each
element of the visible units v_data and its reconstruction probabilities v_recon is defined by
Erhan, Bengio, & Courville (2010) as follows:

CEE = -\sum_{i} \left[ v_{data,i} \log(v_{recon,i}) + (1 - v_{data,i}) \log(1 - v_{recon,i}) \right] \quad (10)
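A minimal CD-1 sketch combining equations 4, 5, 9, and 10 (toy Python, no momentum, weight-cost, or mini-batches; function and variable names are illustrative assumptions):

```python
import math
import random

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_step(v_data, W, a, b, lr=0.1, rng=random.Random(0)):
    """One CD-1 update for a binary RBM. Returns the cross-entropy
    reconstruction error of equation (10)."""
    I, J = len(v_data), len(a)
    # positive phase: hidden probabilities given the data, eq. (4)
    ph_data = [_sigmoid(sum(W[i][j] * v_data[i] for i in range(I)) + a[j])
               for j in range(J)]
    h = [1 if rng.random() < p else 0 for p in ph_data]
    # negative phase: one Gibbs step reconstructs v, eq. (5), then h again
    pv_recon = [_sigmoid(sum(W[i][j] * h[j] for j in range(J)) + b[i])
                for i in range(I)]
    ph_recon = [_sigmoid(sum(W[i][j] * pv_recon[i] for i in range(I)) + a[j])
                for j in range(J)]
    # weight and bias updates: <v h>_data - <v h>_reconstruction, eq. (9)
    for i in range(I):
        for j in range(J):
            W[i][j] += lr * (v_data[i] * ph_data[j] - pv_recon[i] * ph_recon[j])
    for i in range(I):
        b[i] += lr * (v_data[i] - pv_recon[i])
    for j in range(J):
        a[j] += lr * (ph_data[j] - ph_recon[j])
    # cross-entropy reconstruction error, eq. (10)
    return -sum(v_data[i] * math.log(pv_recon[i])
                + (1 - v_data[i]) * math.log(1 - pv_recon[i]) for i in range(I))
```

Repeating `cd1_step` on a training vector drives the reconstruction error down, which is the monitoring role the cross-entropy plays during pre-training.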


For the final DBN learning phase, after the stack of RBMs in the DBN is pre-trained
via greedy layer-wise unsupervised learning, the complete DBN is fine-tuned in a supervised way.
The supervised learning via the backpropagation algorithm uses labeled data (classification
data) to calculate neuron weights for the complete deep belief neural network. Hinton et
al. (2006) used the wake-sleep algorithm for fine-tuning a DBN. However, recent
research has demonstrated that the backpropagation algorithm is faster and has a lower
classification error (Wulsin et al., 2011). In backpropagation, the derivative of the log
probability distribution over class labels is propagated to fine-tune all neuron weights in
the lower levels of a DBN.
In summary, the greedy layer-wise algorithm proposed by Hinton pre-trains the
DBN one layer at a time using contrastive divergence and Gibbs sampling, starting from
the bottom first layer of visible variables to the top of the network, one RBM at a time
(Figure 5). After pre-training, the final DBN is fine-tuned in a top-down mode using
algorithms such as supervised backpropagation (Hinton & Salakhutdinov, 2006;
Larochelle et al., 2009) or the wake-sleep algorithm (Hinton et al., 2006; Bengio, 2009), among others.
Figure 5: RBM Neuron Learning: Gibbs Sampling and Weight Update.



Simulation Kernel and Experiments

The parallel discrete event simulator utilized was WarpIV. This simulation kernel is
able to host discrete-event simulations over parallel and distributed cluster computing
environments. WarpIV supports heterogeneous network applications through its portable
high-speed communication infrastructure, which integrates shared memory with
standard network protocols to facilitate high bandwidth and low latency message passing.
We provide an example of programming in WarpIV in this section to illustrate this
simulator and the PDDES paradigm, as depicted in Figure 6.

Figure 6: Aircraft range detection scenario using two types of simulations objects (radar and aircraft).

The aircraft range detection simulation program implements a parallel distributed
discrete event simulation with interactions of multiple aircraft and multiple radars. The
simulation randomly initializes the position of each aircraft object and each ground radar
object. Their position (X, Y, Z) is represented by the earth centered rotational Cartesian
coordinates (ECR). After initialization, the simulation detects an aircraft’s proximity to a
ground radar using a pre-established range value for detection.
The experiment executes several runs (24 in total = 8 for each time management and
synchronization scheme) with specific computing configurations. Table 1 shows the
results of this experiment and the different runs. These are the definitions of the columns
of Table 1:
Table 1: Experiment results for each computing configuration and time management and synchronization scheme
(BTW, BTB, and TW)

Scheme | Local | Global | Wall Clock Time | Speedup Rel | Speedup Theoretical | PT | Min Committed PT per Node | Max Committed PT per Node | Mean Committed PT per Node | Sigma
BTW | 1 | 1 | 16.5 | 1.0 | 3.0 | 15.6 | 15.6 | 15.6 | 15.6 | 0.0
    | 1 | 2 | 14.1 | 1.2 | 3.0 | 15.6 | 5.3 | 10.3 | 7.8 | 2.5
    | 1 | 3 | 12.4 | 1.3 | 3.0 | 15.7 | 5.2 | 5.3 | 5.2 | 0.0
    | 1 | 4 | 11.4 | 1.4 | 3.0 | 15.6 | 0.0 | 5.3 | 3.9 | 2.2
    | 2 to 4 | 14 | 6.1 | 2.7 | 3.0 | 15.4 | 0.0 | 5.2 | 1.1 | 2.5
    | 4 | 8 | 6.5 | 2.6 | 3.0 | 15.5 | 0.0 | 5.2 | 1.9 | 2.2
    | 4 | 4 | 9.4 | 1.8 | 3.0 | 15.5 | 0.0 | 5.2 | 3.9 | 0.0
    | 3 | 3 | 10.5 | 1.6 | 3.0 | 15.8 | 5.3 | 5.3 | 5.3 | 0.0
BTB | 1 | 1 | 16.1 | 1.0 | 3.0 | 15.6 | 5.7 | 5.7 | 5.7 | 2.5
    | 1 | 2 | 62.1 | 0.3 | 3.0 | 15.6 | 5.3 | 10.3 | 7.8 | 0.5
    | 1 | 3 | 148.0 | 0.1 | 3.0 | 15.6 | 5.1 | 5.2 | 5.2 | 2.2
    | 1 | 4 | 162.6 | 0.1 | 3.0 | 15.7 | 0.0 | 5.3 | 3.9 | 2.1
    | 2 to 4 | 14 | 7.7 | 2.1 | 3.0 | 15.4 | 0.0 | 5.2 | 1.1 | 2.5
    | 4 | 8 | 6.2 | 2.6 | 3.0 | 15.3 | 0.0 | 5.2 | 1.2 | 2.2
    | 4 | 4 | 9.4 | 1.7 | 3.0 | 15.5 | 0.0 | 5.2 | 3.9 | 0.0
    | 3 | 3 | 10.2 | 1.6 | 3.0 | 15.6 | 5.2 | 5.2 | 5.2 | 0.0
TW | 1 | 1 | 17.2 | 1.0 | 3.0 | 15.6 | 15.6 | 15.6 | 15.6 | 0.0
    | 1 | 2 | 13.8 | 1.2 | 3.0 | 15.6 | 5.3 | 10.3 | 7.8 | 2.5
    | 1 | 3 | 12.6 | 1.4 | 3.0 | 15.6 | 5.2 | 5.3 | 5.2 | 0.0
    | 1 | 4 | 10.9 | 1.6 | 3.0 | 15.5 | 0.0 | 5.2 | 3.9 | 2.2
    | 2 to 4 | 14 | 5.9 | 2.9 | 3.0 | 15.4 | 0.0 | 5.2 | 1.1 | 2.1
    | 4 | 8 | 6.2 | 2.8 | 3.0 | 15.3 | 0.0 | 5.2 | 1.9 | 2.5
    | 4 | 4 | 10.0 | 1.7 | 3.0 | 15.5 | 0.0 | 5.2 | 3.9 | 2.2
    | 3 | 3 | 11.4 | 1.5 | 3.0 | 15.8 | 5.2 | 5.3 | 5.3 | 0.0

Wall Clock Time (elapsed wall time in seconds) is a measure of the real time that
elapses from start to end, including time that passes due to programmed (artificial) delays
or waiting for resources to become available. In other words, it is the difference between
the time at which a simulation finishes and the time at which the simulation started. It is
given in seconds.
Speedup Rel (Speedup Relative) is the ratio

T(Wall Clock Time for 1 Node for that time synchronization scheme) /
T(Wall Clock Time for the Nodes used for that time synchronization scheme).

Speedup Theoretical is based on the Simulation Object with the longest processing
time. It is the maximum (approximated) Speedup expected using an excellent parallelized
scheme (taking advantage of the programming features, computer configuration of the
system, and partitions of the problem).
PT (processing time) is the total CPU time required to process committed events, in
seconds. The processing time does not include the time required to process events that are
rolled back, nor does it include additional overheads such as event queue management
and messages.
Min Committed PT per Node is the Minimum Committed Processing Time per
Node of the computing system configuration utilized.
Max Committed PT per Node is the Maximum Committed Processing Time per
node of the computing system configuration utilized.
Mean Committed PT per Node is the Mean Committed Processing Time per node
of the Computing system configuration utilized.
Sigma is the standard deviation of the processing times of the different nodes utilized
in the experiment.
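The Speedup Rel column in Table 1 is thus a simple ratio of wall clock times; for example, using the TW rows:

```python
def relative_speedup(t_one_node, t_n_nodes):
    """Speedup Rel = T(1 node) / T(N nodes), same synchronization scheme."""
    return t_one_node / t_n_nodes

# TW in Table 1: 17.2 s wall clock on one node, 5.9 s on the largest
# distributed configuration, giving the best observed speedup
print(round(relative_speedup(17.2, 5.9), 1))   # → 2.9
```

The result matches the 2.9 reported for TW against the theoretical speedup of 3.0.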

The benchmark for the different time management and synchronization schemes
(TW, BTB, and BTW) is depicted in Figure 7. TW has the best result of 2.9 (close to the
theoretical speedup of 3.0). BTW and TW are very comparable. BTB does not perform
well with this type of task for distributed systems. However, BTB has better
performance with the utilization of multicore configurations (i.e., tightly coupled) for this
specific problem.
Figure 7: Combined Speedup chart for BTW, BTB, and TW for different numbers of processors (nodes).
A global node is a separate cluster; a local node is a node within a specific cluster. Therefore, Global 3
and Local 3 means 3 separate clusters, each with 3 computers (9 nodes in total).

Characterization of Software Complexity

Measuring simulation algorithm complexity is challenging. Researchers have
proposed measures that categorize complexity by attributes such as the number of code
lines, internal code structures, and interfaces. Shao and Wang (2003) and Misra (2006)
examined software complexity from the perspective of software being a product of the
human creative process. As such, they explored complexity measures based on cognitive
weights, which take into account the complexity of the cognitive and psychological
components of software. In this paradigm, cognitive weights represent the effort and
relative time required to comprehend a piece of software. The approach suggests that
software complexity is directly proportional to the complexity of understanding the
information contained in it. We have selected this measure because it is the most
recognized in the literature.
Using cognitive weights of basic control structures to measure complexity addresses
the cognitive and architectural aspects of software complexity. Fundamental logic
blocks of software constructs, such as conditional if-then statements, method calls, and
for-loops, are assigned a weight value. Table 2 shows the cognitive weight of each type
of basic control structure (BCS).
38 Edwin Cortes, Luis Rabelo and Gene Lee

Table 2: Cognitive Weights

Category             Basic Control Structure   Weight
Sequence             Sequence                  1
Branch               If-Then-Else              2
                     Case                      3
Iteration            For Loop                  3
                     Repeat-Until              3
                     While-Do                  3
Embedded Component   Function Call             3
Concurrency          Parallel                  4
                     Interrupt                 4

The total cognitive weight $W_c$ of a piece of software c is computed by applying the
following equation (11), considering the nesting structures indexed by j, k, and i:

$W_c = \sum_{j=1}^{q} \left[ \prod_{k=1}^{m} \sum_{i=1}^{n} w_c(j, k, i) \right]$  (11)

The cognitive weight score of a particular block of software contributes more to the
total weight if multiple basic control structures are encompassed within nested sections.
For example, methodA() in Figure 8 receives a larger cognitive weight than methodB()
due to the while-loop nested inside the if-then construct.
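The effect of nesting can be sketched as follows. The weights come from Table 2, while the tree encoding of a method body is our own illustration (sequential siblings add, nested structures multiply, in the spirit of equation (11)):

```python
# Cognitive weights of basic control structures (Table 2).
BCS_WEIGHTS = {"sequence": 1, "if-then-else": 2, "case": 3, "for": 3,
               "repeat-until": 3, "while": 3, "call": 3,
               "parallel": 4, "interrupt": 4}

def cognitive_weight(structure):
    """Weight of a structure: its own BCS weight, multiplied by the
    total weight of the structures nested inside it."""
    kind, children = structure
    w = BCS_WEIGHTS[kind]
    return w * sum(cognitive_weight(c) for c in children) if children else w

# methodA: a while-loop nested inside an if-then-else -> 2 * 3 = 6
method_a = ("if-then-else", [("while", [])])
# methodB: an if-then-else followed by a while-loop -> 2 + 3 = 5
method_b = [("if-then-else", []), ("while", [])]

weight_a = cognitive_weight(method_a)
weight_b = sum(cognitive_weight(s) for s in method_b)
```

The nested version scores higher even though both bodies contain the same two control structures, which is exactly the property the cognitive-weight measure is designed to capture.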

Figure 8: Cognitive Weights Sample Calculations.

Input and Output Vectors for a Sample

This research uses cognitive weights to measure the complexity of a parallel
discrete-event simulation with respect to the implemented algorithms. Because each
simulation object in a simulation implements discrete events defined as code functions,
the complexity of each object is computed by applying equation (11) to all
events/methods mapped to that simulation object. As a result, several parameters that
gauge simulation complexity are then used as inputs to the deep belief neural network for
deep learning. These are: total simulation program cognitive weights, maximum
cognitive weights of all simulation objects, minimum cognitive weights of all simulation
objects, and mean cognitive weights of all objects.
In addition, we have captured other parameters that define the hardware, flow
processing, potential messaging and other important characteristics that define a parallel
distributed discrete-event simulator implementation. The different components are
defined as follows:

1. Total Simulation Program Cognitive Weights: It is the total number of

cognitive weights of the simulation program.
2. Number of Simulation objects: It is the total number of simulation objects
in the simulation.
3. Types of Simulation objects: It is the number of classes of Simulation
Objects utilized in the simulation.
4. Mean Events per Simulation Object: It is the mean of the events per
simulation object.
5. STD Events per Simulation Object: It is the standard deviation of the
events per simulation object.
6. Mean of Cognitive Weights of All objects: It is the mean of the number of
cognitive weights used by the simulation objects in the simulation.
7. STD Cognitive Weights of All objects: It is the standard deviation of the
number of cognitive weights used by the simulation objects in the simulation.
8. Number of Global Nodes: It is the total number of Global Nodes in the simulation.
9. Mean Local Nodes per Computer: It is the mean of the local nodes per
global node utilized in the simulation.
10. STD Local Nodes per Computer: It is the standard deviation of the local
nodes per global node utilized in the simulation.
11. Mean Number of cores: It is the mean number of cores/threads utilized by
each global node in the simulation.
12. STD Number of cores: It is the standard deviation of number of
cores/threads utilized by each global node in the simulation.
13. Mean processor Speed: It is the mean processor speed of the CPUs used in
the simulation.

14. STD processor Speed: It is the standard deviation of the speed of the CPUs
used in the simulation.
15. Mean RAM: It is the mean of the RAM memory used by the CPUs in the system.
16. STD RAM: It is the standard deviation of the RAM memory used by the
CPUs in the system.
17. Critical Path%: It is the Critical Path taking into consideration the
sequential estimated processing time.
18. Theoretical Speedup: It is the theoretical (maximum) speedup to be
achieved with perfect parallelism in the simulation.
19. Local Events/(Local Events + External Events): It is the ratio of the total
local events divided by the summation of the total local events and the total
external events during a specific unit of Simulation Time (estimated).
20. Subscribers/(Publishers + Subscribers): It is the ratio of the total number
of objects subscribing to a particular object divided by the summation of the
total number of publishers and subscribers.
21. Block or Scatter?: Block and scatter are decomposition algorithms used to
distribute the simulation objects in the parallel/distributed system. If Block
is selected, this value is 1; if Scatter is selected, this value is 0.
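The 21 parameters above can be flattened into a fixed-order vector before being fed to the DBN. A sketch (the field names and ordering are our own; the values reproduce the aircraft-detection case of Table 3, with 0.0 as a placeholder assumption for the STD RAM entry that Table 3 omits):

```python
FEATURE_ORDER = [
    "total_cognitive_weights", "num_sim_objects", "types_sim_objects",
    "mean_events_per_object", "std_events_per_object",
    "mean_cog_weights", "std_cog_weights", "num_global_nodes",
    "mean_local_nodes", "std_local_nodes", "mean_cores", "std_cores",
    "mean_cpu_speed", "std_cpu_speed", "mean_ram", "std_ram",
    "critical_path_pct", "theoretical_speedup", "local_event_ratio",
    "subscriber_ratio", "block_or_scatter",
]

def to_input_vector(params):
    """Flatten the 21 parameters into the fixed-order vector fed to the DBN."""
    return [float(params[name]) for name in FEATURE_ORDER]

# Aircraft-detection case: 4 Global Nodes, 1 Local Node, Block (Table 3).
case = {"total_cognitive_weights": 2919, "num_sim_objects": 6,
        "types_sim_objects": 3, "mean_events_per_object": 1,
        "std_events_per_object": 0, "mean_cog_weights": 1345,
        "std_cog_weights": 1317, "num_global_nodes": 4,
        "mean_local_nodes": 1, "std_local_nodes": 0, "mean_cores": 1,
        "std_cores": 0, "mean_cpu_speed": 2.1, "std_cpu_speed": 0.5,
        "mean_ram": 6.5, "std_ram": 0.0,  # STD RAM not given in Table 3
        "critical_path_pct": 0.32, "theoretical_speedup": 3,
        "local_event_ratio": 1.0, "subscriber_ratio": 0.5,
        "block_or_scatter": 1}
vector = to_input_vector(case)
```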

For example, for the aircraft detection implementation discussed earlier, the input
vector, built using the hardware and complexity specifications from Figures 6 and 7 and
Tables 1 and 2 for a configuration of 4 Global Nodes and 1 Local Node (a loosely
coupled system) using “Block” as the distribution scheme for the simulation objects, is
shown in Table 3.
The output for a DBN is based on Table 4, where the Wall Clock Time for
BTW is 11.4 seconds, for BTB 162.6 seconds, and for TW 10.9 seconds. Table 4
displays the output vector of the respective aircraft detection case study.

Table 3: Vector that defines the PDDES implementation for the aircraft detection
with 4 Global Nodes and 1 Local Node using Block

Complexity Parameters that Capture the hardware/software Structure of a Parallel Distributed Discrete-Event Simulator
Total Simulation Program Cognitive Weights 2919
Number of Sim objects 6
Types of Sim objects 3
Mean Events per Object 1
STD Events per Simulation Object 0
Mean Cog Weights of All objects 1345
STD Cog Weights of All objects 1317
Number of Global Nodes 4
Mean Local Nodes per Computer 1
STD Local Nodes per Computer 0
Mean Number of cores 1
STD Number of cores 0
Mean processor Speed 2.1
STD processor Speed 0.5
Mean RAM 6.5
Critical Path% 0.32
Theoretical Speedup 3
Local Events/(Local Events + External Events) 1
Subscribers/(Publishers + Subscribers) 0.5
Block or Scatter? 1

Table 4. TW has the minimum wall clock time for the aircraft detection problem
using 4 Global Nodes and 1 Local Node with Block

Time Management and Synchronization Scheme    Best (Minimum Wall Clock Time)
TW                                            1


This is the methodology devised to recognize the best time management and
synchronization scheme for a PDDES problem. The input vector is defined based on the
complexity and features of the software, hardware, and messaging of the PDDES
problem (as explained above). The output vector defines the best time management and
synchronization scheme (TW, BTW, or BTB). This pattern matching is achieved using a
DBN trained with case studies performed by a Parallel Distributed Discrete-Event
Simulator. This methodology is depicted in Figure 9.

Figure 9: Classification of Optimistic Synchronization Scheme with DBN.


This section deals with the testing of our proposed idea of using deep belief networks
as pattern-matching mechanisms for time management and synchronization of parallel
distributed discrete-event simulations. The performance criterion and the knowledge
acquisition scheme will be presented. This discussion includes an analysis of the results.

Performance Criterion, Case Studies, and Training Scheme

For these studies, the performance criterion is the minimum wall-clock time.
Wall-clock time is the actual time taken by the computer system to complete a
simulation. Wall-clock time is very different from CPU time: CPU time measures the
time during which the processor(s) are actively working on certain task(s), whereas
wall-clock time measures the total time for the process(es) to complete.
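The distinction is easy to demonstrate with Python's standard timers:

```python
import time

wall_start = time.perf_counter()   # wall-clock timer
cpu_start = time.process_time()    # CPU timer for this process

time.sleep(0.2)                    # waiting: wall clock advances, CPU time barely does
_ = sum(range(100_000))            # computing: both timers advance

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
# Sleeping counts toward wall-clock time but not CPU time, so
# wall_elapsed exceeds cpu_elapsed by roughly the sleep duration.
```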
Several PDDES problems were selected to generate the case studies in order to train
the DBN. We had in total 400 case studies. Two hundred case studies were selected for
training (i.e., to obtain the learning parameters), one hundred case studies for validation
(i.e., to obtain the right architecture), and one hundred for testing (i.e., to test the DBN).
The training session for the DBN was then carried out. There are three principles for
training DBNs:

1. Pre-training one layer at a time in a greedy way;

2. Using unsupervised learning at each layer in a way that preserves information from
the input and disentangles factors of variation;
3. Fine-tuning the whole network with respect to the ultimate criterion of interest.

We used principle No. 2 for this research because it is the most recognized one
(Mohamed et al., 2011). In addition, we developed several standard backpropagation
networks with only one hidden layer, and they never converged with the training data.
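Principle No. 2 is commonly realized by stacking restricted Boltzmann machines, each trained with one step of contrastive divergence (CD-1) on the activations of the layer below. A toy NumPy sketch of this idea, not the authors' implementation (layer sizes, data, and hyperparameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Train one Bernoulli RBM with CD-1; return (weights, hidden bias)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_hidden)      # hidden bias
    c = np.zeros(n_visible)     # visible bias
    for _ in range(epochs):
        h_prob = sigmoid(data @ W + b)                       # positive phase
        h_sample = (rng.random(h_prob.shape) < h_prob) * 1.0
        v_recon = sigmoid(h_sample @ W.T + c)                # negative phase
        h_recon = sigmoid(v_recon @ W + b)
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b += lr * (h_prob - h_recon).mean(axis=0)
        c += lr * (data - v_recon).mean(axis=0)
    return W, b

# Greedy layer-wise pre-training: each layer learns (unsupervised) from
# the hidden activations of the layer below it.
X = (rng.random((200, 21)) < 0.5) * 1.0   # 200 synthetic 21-feature cases
layers = []
inp = X
for n_hidden in (16, 8):
    W, b = train_rbm(inp, n_hidden)
    layers.append((W, b))
    inp = sigmoid(inp @ W + b)            # feed activations upward
```

After this unsupervised stage, the stacked weights would be fine-tuned with labeled case studies (principle No. 3).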


The finalized DBN has the training and testing performance shown in Figure 10. It is
important to remember that the training set consisted of 200 case studies, the validation
set of 100 case studies, and the testing set of 100 case studies. The validation set is used
in order to find the right architecture, i.e., the one that leads to higher performance.
Figure 10 indicates the performance obtained with DBNs for this problem.

Figure 10: Confusion matrix for two DBNs.

Stating the research question initiates the research methodology process. This
investigation started by asking: Is there a mechanism to accurately model and predict the
best time management and synchronization scheme for a parallel discrete-event
simulation environment (program and hardware)? Based on the results, this was
accomplished in spite of the limited number of case studies.


This research implemented a pattern recognition scheme to identify the best
optimistic time management and synchronization scheme for executing a particular
parallel distributed discrete-event simulation problem. This innovative pattern
recognition approach utilizes Deep Belief Neural Networks and measures of complexity
to quantify and capture the structure of the PDDES problem. The implementation of this
approach was very successful. This means that we no longer need to proceed by trial and
error, or rely on “inconsistent” and/or “fuzzy” rules, in order to select the time
management and synchronization scheme. This method is direct (i.e., timely execution)
and automatically selects the right scheme (i.e., TW, BTW, or BTB).
A deep belief network model can be used as a detector of patterns not seen during
training by inputting a mixture of diverse data from different problems in PDDES. In
reaction to the input, the ingested mixed data triggers neuron activation probabilities
that propagate through the DBN layer by layer until the DBN output is reached. The
output probability curve is then examined to select the best optimistic time management
and synchronization scheme to be utilized.
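This layer-by-layer propagation can be sketched as a forward pass that ends in a softmax over the three schemes. The weights below are random placeholders standing in for a trained network, purely for illustration:

```python
import numpy as np

def dbn_predict(x, layers, W_out, b_out):
    """Propagate activation probabilities layer by layer, then score the
    three schemes (TW, BTW, BTB) with a softmax output layer."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for W, b in layers:
        x = sigmoid(x @ W + b)          # activation probabilities
    logits = x @ W_out + b_out
    p = np.exp(logits - logits.max())   # numerically stable softmax
    return p / p.sum()                  # one probability per scheme

# Hypothetical trained weights: 21 input features, one 8-unit hidden
# layer, three outputs. The arg-max picks the recommended scheme.
rng = np.random.default_rng(1)
layers = [(rng.standard_normal((21, 8)), np.zeros(8))]
probs = dbn_predict(rng.random(21), layers,
                    rng.standard_normal((8, 3)), np.zeros(3))
best = ["TW", "BTW", "BTB"][int(np.argmax(probs))]
```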


Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in
Machine Learning, 2(1), 1-127.
Cho, K., Ilin, A., & Raiko, T. (2011). Improved learning of Gaussian-Bernoulli
restricted Boltzmann machines. In Artificial Neural Networks and Machine
Learning – ICANN 2011, 10-17.
Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P., & Bengio, S. (2010).
Why does unsupervised pre-training help deep learning? The Journal of Machine
Learning Research, 11, 625-660.
Fujimoto, R. (2000). Parallel and Distributed Simulation. New York: John Wiley & Sons.
Hinton, G. (2007). Learning multiple layers of representation. Trends in Cognitive
Sciences, 11(10), 428-434. doi:10.1016/j.tics.2007.09.004
Hinton, G. (2010). A practical guide to training restricted Boltzmann machines.
Momentum, 9(1), 926.
Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke,
V., Nguyen, P., Sainath, T., & Kingsbury, B. (2012). Deep neural networks for
acoustic modeling in speech recognition: The shared views of four research groups.
Signal Processing Magazine, IEEE, 29(6), 82-97. doi:10.1109/MSP.2012.2205597

Hinton, G., Osindero, S., & Teh, Y. (2006). A Fast Learning Algorithm for Deep Belief
Nets. Neural Computation, 18(7), 1527-1554.
Hinton, G., & Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural
networks. Science, 313(5786), 504-507. doi:10.1126/science.1127647
Längkvist, M., Karlsson, L., & Loutfi, A. (2012). Sleep stage classification using
unsupervised feature learning. Advances in Artificial Neural Systems, 2012, Article
ID 107046, 9 pages. doi:10.1155/2012/107046
Längkvist, M., Karlsson, L., & Loutfi, A. (2014). A Review of Unsupervised Feature
Learning and Deep Learning for Time-Series Modeling. Pattern Recognition Letters,
42, 11-24. doi:10.1016/j.patrec.2014.01.008
Larochelle, H., Bengio, Y., Louradour, J., & Lamblin, P. (2009). Exploring strategies for
training deep neural networks. The Journal of Machine Learning Research, 10, 1-40.
Le Roux, N., & Bengio, Y. (2008). Representational power of restricted Boltzmann
machines and deep belief networks. Neural Computation, 20, 1631-1649.
Misra, S. (2006). A Complexity Measure based on Cognitive Weights. International
Journal of Theoretical and Applied Computer Sciences, 1(1), 1–10.
Mohamed, A., Sainath, T., Dahl, G., Ramabhadran, B., Hinton, G., & Picheny, M.
(2011). Deep belief networks using discriminative features for phone recognition.
Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing.
Mohamed, A., Dahl, G., & Hinton, G. (2012). Acoustic modeling using deep belief
networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1),
14-22. doi:10.1109/TASL.2011.2109382
Salakhutdinov, R., & Murray, L. (2008). On the quantitative analysis of deep belief
networks. Proceedings of the 25th international conference on Machine learning,
872-879. doi:10.1145/1390156.1390266
Shao, J., & Wang, Y. (2003). A new measure of software complexity based on cognitive
weights. Canadian Journal of Electrical and Computer Engineering (ISSN 0840-8688),
1-6.
Steinman, J. (1991). SPEEDES: Synchronous Parallel Environment for Emulation and
Discrete Event Simulation. Proceedings of Advances in Parallel and Distributed
Simulation, 95-103.
Steinman, J. (1992). SPEEDES: A Multiple-Synchronization Environment for Parallel
Discrete-Event Simulation. International Journal in Computer Simulation, 2, 251-
Steinman, J. (1993). Breathing Time Warp. Proceedings of the 7th Workshop on Parallel
and Distributed Simulation (PADS93), 23, 109-118.
Steinman, J. (1994). Discrete-Event Simulation and the Event Horizon. Proceedings of
the 1994 Parallel and Distributed Simulation Conference, 39-49.

Steinman, J. (1996). Discrete-Event Simulation and the Event Horizon Part 2: Event List
Management. Proceedings of the 1996 Parallel and Distributed Simulation
Conference, 170- 178.
Steinman, J., Nicol, D., Wilson, L., & Lee, C. (1995). Global Virtual Time and
Distributed Synchronization. Proceedings of the 1995 Parallel and Distributed
Simulation Conference, 139-148.
Steinman, J., Lammers, C., Valinski, M., & Steinman, W. (2012). External Modeling
Framework and the OpenUTF. Report of WarpIV Technologies. Retrieved from
Wulsin, D., Gupta, J., Mani, R., Blanco, J., & Litt, B. (2011). Modeling
electroencephalography waveforms with semi-supervised deep belief nets: fast
classification and anomaly measurement. Journal of neural engineering, 8(3),
036015. doi:10.1088/1741-2560/8/3/036015
Zhang, C., Zhang, J., Ji, N., & Guo, G. (2014). Learning ensemble classifiers via
restricted Boltzmann machines. Pattern Recognition Letters, 36, 161-170.


Dr. Edwin Cortes has a B.S. in Mechanical Engineering, an M.S. in Mathematics,

and a Ph.D. in Simulation and Training from the University of Central Florida in 2015.
He has been working as an Aerospace professional for NASA Kennedy Space Center
since 2004. Edwin has worked in very important programs such as the NASA Shuttle.
Currently, he works for the NASA’s Space Launch System (SLS) Program. SLS is an
advanced launch vehicle for a new era of exploration beyond Earth’s orbit into deep
space. He has published in conference proceedings and journals related to Aerospace
Engineering. His areas of interest are software engineering, simulation, space missions,
propulsion, and control engineering.

Dr. Luis Rabelo was the NASA EPSCoR Agency Project Manager and is currently a
Professor in the Department of Industrial Engineering and Management Systems at the
University of Central Florida. He received dual degrees in Electrical and Mechanical
Engineering from the Technological University of Panama and Master’s degrees from the
Florida Institute of Technology in Electrical Engineering (1987) and the University of
Missouri-Rolla in Engineering Management (1988). He received a Ph.D. in Engineering
Management from the University of Missouri-Rolla in 1990, where he also did Post-
Doctoral work in Nuclear Engineering in 1990-1991. In addition, he holds a dual MS
degree in Systems Engineering & Management from the Massachusetts Institute of
Technology (MIT). He has over 280 publications, three international patents being
utilized in the Aerospace Industry, and has graduated 40 Master's and 34 Doctoral students as their advisor.

Dr. Gene Lee is a professor in the Department of Industrial Engineering and

Management Systems at the University of Central Florida. He has researched
ergonomics/human factors issues in the area of Modeling and Simulation as well as LVC
simulation training. He has received several grants from various federal and private
organizations. Recently, he successfully completed a project sponsored by Korean
Agency for Defense Development (ADD) and taught the CMSP courses for ROK-ADD
which was funded by the Boeing Co. He has a Ph.D. in Industrial Engineering from
Texas Tech University (1986).
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 3

Machine Learning Applied to Autonomous Vehicles


Olmer Garcia, PhD* and Cesar Diaz, PhD

School of Engineering, Universidad Jorge Tadeo Lozano, Bogotá, Colombia


This article presents an overview of machine learning in general, and deep learning in
particular, applied to autonomous vehicles. The noise and small size of the data made
this problem intractable for other methods. The use of machine learning for this project
required two hardware/software systems: one for training in the cloud and the other in
the autonomous vehicle. The main conclusion is that deep learning can create
sophisticated models that are able to generalize with relatively small datasets. In
addition, autonomous vehicles are a good example of a multiclass classification
problem.

Keywords: perception, deep learning, autonomous vehicles


According to data published by the United Nations, more than 1.2 million people die
on roads around the world every year, and as many as 50 million are injured. Over 90%
of these deaths occur in low- and middle-income countries. Brazil is among the countries

50 Olmer Garcia and Cesar Diaz

in which the number of such deaths is relatively high. Figure 1 shows historical data for
traffic accident deaths in Brazil, USA, Iran, France, and Germany. However, the per
capita statistics are controversial as the number of people who drive varies between
countries, as does the number of kilometers traveled by drivers. There is a significant
difference in the statistics between developing and high-income countries.

Figure 1. Traffic accident deaths per 10,000 citizens. Sources: Brazil (DATASUS), United States
(NHTSA), Iran (Bahadorimonfared et al., 2013), Germany, and France (www.securite-

The trend toward the use of automated, semi-autonomous, and autonomous systems
to assist drivers has received an impetus from major technological advances as indicated
by recent studies of accident rates. On the other hand, the challenges posed by
autonomous and semi-autonomous navigation have motivated researchers from different
groups to undertake investigations in this area. One of the most important issues when
designing an autonomous vehicle is safety and security (Park et al., 2010). Currently,
machine learning (ML) algorithms have been used at all levels of automation for
automated vehicles (NHTSA, 2013):

 No-Automation (Level 0): The driver has complete control of the vehicle, but
machine learning helps through perception of the environment to inspect and
alert the driver.
 Function-Specific (Level 1) and Combined Automation (Level 2): One or more
primary driver functions – brake, steering, throttle, and motive power – are
controlled at specific moments by algorithms, such as lane centering or adaptive
cruise control. In these systems, the conventional
Machine Learning Applied to Autonomous Vehicles 51

approach is to use machine learning to perform the perception of the
environment, combined with complex mathematical algorithms to control the
different driver functions.
 Limited Self-Driving Automation (Level 3): The driver is sometimes in the
control loop while the vehicle operates at level 3 automation. The most
common strategy for transferring control to the driver, particularly in high-risk
situations, is to use an emergency button. However, in practice, this may have
serious drawbacks. This issue is dealt with in a Google patent by Cullinane et al.
(2014), which describes a system in which all the security variables are checked
before control is transferred. Tesla and other current autonomous vehicles
can be classified at this level of automation.
 Full Self-Driving Automation (Level 4): At this level, the driver is not expected
to take control at any time during the desired trip. This level is fully
automated except under some environmental conditions. Current systems without
the use of deep learning are far from being able to meet all the requirements of
this level.
One of the research problems addressed by autonomous vehicles is the lack of
driver attention (Kaplan et al., 2015). Several potential schemes have been introduced
by researchers. The most important examples are:

 Jain et al. (2015) used a hidden Markov autoregressive input-output model to

capture contextual information and driver maneuvers a few seconds before they
occur, in order to prevent accidents.
 Malik et al. (2015) described an intelligent driver training system that analyzes
crash risks for a given driving situation. This opens possibilities for improving
and personalizing driver training programs.
 Liu et al. (2014) proposed a method for predicting the trajectory of a lane-
changing vehicle using a hidden Markov model to estimate and classify the
driver’s behavior.
 Amsalu et al. (2015) introduced a method for estimating a driver’s intention
during each step using a multi-class support vector machine. Although the
approaches described in these studies yield satisfactory results, none of them
specifically handle cooperative control between automated intelligent systems
and the driver.
 Merat et al. (2014) described tests in a simulator to investigate driver’s behavior
when the driver is resuming manual control of a vehicle operating at a high level
of automation. Their study sought to contribute to an understanding of suitable
criteria for the design of human-machine interfaces for use in automated driving

and so, to ensure that messages related to the transfer of control are given in a
timely and appropriate manner.

The chapter is organized as follows. Section one provides background about
machine learning and deep learning. Section two expands on the architecture of
autonomous vehicles to identify where and how machine learning algorithms can be
applied. The next section uses a particular case study of machine learning in autonomous
vehicles to illustrate the concepts. Finally, some conclusions and perspectives are
presented.

This section is an introduction to the main concepts of machine learning and deep learning.

Machine Learning Concepts

Michalski et al. (1983) stated that a “Learning process includes the acquisition of
new declarative knowledge, the development of motor and cognitive skills through
instruction or practice, the organization of new knowledge into general and effective
representations, and the discovery of new facts and theories through observation and
experimentation.” Kohavi & Provost (1998) published a glossary of machine learning
terms and defined it as: “The non-trivial process of identifying valid, novel, potentially
useful, and ultimately understandable patterns in data. Machine learning is most
commonly used to mean the application of induction algorithms, which is one step in the
knowledge discovery process.”
Machine learning is highlighted as the study and computer modeling of learning
processes. The main idea is developed around the following research paths:

 Task-Oriented Studies: Improved performance in a defined set of tasks as the

result of learning systems is the emphasis of this path.
 Cognitive Simulation: This path is related to research and computer simulations
of human learning processes.
 Theoretical Analysis: This path focuses on research of algorithms and learning

Many authors have described different taxonomies of learning processes, which
often only include the basic learner-and-teacher problem. However, Camastra & Vinciarelli
(2007) provided a more focused definition based on the application of machine learning
to audio, image, and video analysis. They identify four different learning types: rote
learning, learning from instruction, learning by analogy, and learning from examples,
which are briefly explained below.

 Rote Learning: This type consists of directly implanting new knowledge in the
learner. This method includes (1) learning processes using programs and
instructions implemented by external entities, and (2) learning processes using
memorization of given data with no inferences drawn from the incoming
information.
 Learning from instruction: This learning consists of a learner acquiring
knowledge from the instructor and/or other source and transforming it into
internal representations. The new information is integrated with prior knowledge
for effective use. One of the objectives is to keep the knowledge in a way that
incrementally increases the learner’s actual knowledge (Camastra & Vinciarelli, 2007).
 Learning by analogy: This type of learning consists of acquiring new facts or
skills based on “past situations that bear strong similarity to the present problem
at different levels of abstraction" (Carbonell, 2015). Learning by analogy
requires more inferencing by the learner than rote learning and learning from
instruction. Carbonell (2015) gives a good definition: “A fact or skill analogous
in relevant parameters must be retrieved from memory. Then, the retrieved
knowledge must be transformed, applied to the new situation, and stored for
future use."
 Learning from examples: This can simply be called learning: given a set of
examples of a concept, the learner builds a general concept representation based
on them. The learning problem is described as the search for a general rule
that explains the examples, even if only a limited number of examples is given.
Learning techniques can be grouped into four main types: supervised learning,
unsupervised learning, reinforcement learning, and semi-supervised learning.
 Supervised Learning: the learning process is based on examples with inputs
and desired outputs, given by a “teacher”. The data is a sample of input-
output patterns. The goal is to learn a general rule about how the output can
be generated, based on the given input. Some common examples are
predictions of stock market indexes and recognition of handwritten digits and
letters. The training set is a sample of input-output pairs, the task of learning
problem is to find a deterministic function that maps an input to the
respective output to predict future input-output observations and therefore

minimizing errors. There are two types of supervised learning: classification and regression.

 Classification: In this type, the problem inputs are divided into two or
more classes, and the learner must produce a model that maps blind
inputs to one or more of these classes. This problem characterizes most
of the pattern recognition tasks.
 Regression: When the output space is formed by values of continuous
variables (the outputs are continuous rather than discrete), the learning
task is known as the problem of regression or function learning.
 Unsupervised Learning: When the data is a sample of objects without
associated target values, the problem is known as unsupervised learning. In
this case, there is no instructor. The learning algorithm does not have
labels, leaving it on its own to find some “structure” in its input. We have
training samples of objects, with the possibility of extracting some
“structure” from them. If such structure exists, it is possible to take advantage
of this redundancy and find a short description of the data representing
specific similarities between pairs of objects.
 Reinforcement Learning: The challenge of reinforcement learning is to
learn what to do in order to maximize a given reward. Indeed, in this
type, feedback is provided in terms of rewards and punishments. The learner
is assumed to gain information about the actions. A reward or punishment is
given based on the level of success or failure of each action. The ergodicity is
important in reinforcement learning.
 Semi-supervised Learning: This consists of the combination of supervised and
unsupervised learning. In some books, it refers to a mix of unlabeled data
with labeled data to make a better learning system (Camastra & Vinciarelli, 2007).
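The classification/regression distinction within supervised learning can be made concrete with two tiny NumPy examples (entirely illustrative: a least-squares line fit for regression, and a nearest-centroid rule for classification):

```python
import numpy as np

rng = np.random.default_rng(0)

# Regression: least-squares fit recovers y = 2x + 1 from noisy samples.
x = rng.random(50)
y = 2 * x + 1 + 0.01 * rng.standard_normal(50)
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# Classification: nearest-centroid rule for two labeled 2-D clusters.
class0 = rng.standard_normal((20, 2)) + [0, 0]
class1 = rng.standard_normal((20, 2)) + [5, 5]
centroids = np.stack([class0.mean(axis=0), class1.mean(axis=0)])

def classify(point):
    """Assign a point the label of the nearest class centroid."""
    return int(np.argmin(np.linalg.norm(centroids - point, axis=1)))

label = classify(np.array([4.5, 5.2]))   # a point near cluster 1
```

The regression model outputs a continuous value (the fitted line), whereas the classifier outputs a discrete class label, exactly the distinction drawn above.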

Deep Learning

Deep learning has become a popular term. Deep learning can be defined as the use of
neural networks with multiple layers in big data problems. So, why is it perceived as a
“new” concept if neural networks have been studied since the 1940s? This is because
parallel computing enabled by graphics processing units (GPUs) and distributed systems,
along with efficient optimization algorithms, has led to the use of neural networks in
contemporary, complex problems (e.g., voice recognition, search engines, and
autonomous vehicles). To better understand this concept, we first present a brief review
of neural networks, and then proceed to some common concepts of deep learning.
Machine Learning Applied to Autonomous Vehicles 55

Figure 2. Neural Network with six inputs, one hidden layer with four nodes and one output.

Neural Networks

A neural network is a graph/network of mathematical functions. The graph/network
consists of neurons or nodes, and links or edges. It takes inputs and produces outputs.
Each node or neuron can be described as a mechanism that takes input from an input
layer or hidden layers and returns a result which is applied to other nodes or it becomes
an output node. For example, in Figure 2 the first layer (the inputs) are numerical values,
which are connected to each of the four nodes of the hidden layer. Similarly, each node
creates an output value which may be passed to nodes in the next layer. The output value
is returned from the output layer. The algorithm used to obtain the outputs, knowing the
input and the parameters of each node, is known as feed-forward, after the direction in
which processing flows. To run it, it is necessary to define the order of operations for the
neurons. Given that the input to some neuron depends on the outputs of others, one needs
to flatten the graph of nodes so that all the input dependencies of each node are resolved
before its calculation runs. This technique is called topological sort; a well-known
example is Kahn’s algorithm (Kahn, 1962).
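As a sketch of this ordering step, here is a minimal Python implementation of Kahn's algorithm; the node ids and the edge-list format are illustrative assumptions, not from the chapter.

```python
from collections import deque

def kahn_topological_sort(nodes, edges):
    """Order nodes so that every edge (u, v) has u before v.

    nodes: iterable of hashable node ids.
    edges: list of (u, v) pairs meaning "u feeds into v".
    """
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Start with nodes whose inputs are all resolved (no incoming edges).
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle; no valid evaluation order")
    return order
```

Running this on the network of Figure 2 (inputs feeding a hidden layer feeding an output) yields an order in which every node appears after all of its inputs.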
To understand what the parameters of a node are, and how they are obtained, it is first
necessary to define the mathematical model of the node, which can be described by the
following equation:

Node_output = f( Σi wi xi + b )                                   (1)

56 Olmer Garcia and Cesar Diaz

Where xi is the value of each input to the node, wi are weight parameters which
multiply each input, b is known as the bias parameter and f (.) is known as the activation
function. The commonly used functions are the sigmoidal activation functions, the
hyperbolic tangent functions and the rectified linear unit (ReLU). Heaton (2015) proposes
that while most current literature in deep learning suggests using the ReLU activation
function exclusively, it is necessary to understand sigmoidal and hyperbolic tangent to
see the benefits of ReLU.
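A minimal Python sketch of Equation (1) together with these activation functions; the function names are ours, for illustration only.

```python
import math

def sigmoid(z):
    """Sigmoidal activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: zero for negative z, identity otherwise."""
    return max(0.0, z)

def node_output(x, w, b, f):
    """Equation (1): activation f applied to the weighted sum plus bias."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)
```

For example, `node_output([1.0, 2.0], [0.5, -0.25], 0.1, relu)` computes relu(0.5·1.0 − 0.25·2.0 + 0.1); `math.tanh` can be passed in the same way for the hyperbolic tangent.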
Varying the weights and the bias would vary the amount of influence any given input
has on the output. The learning aspect of neural networks takes place through a process
known as back-propagation, the most common training algorithm, developed in the
1980s. In the learning process, the network modifies the weights and biases to improve
the network’s output, like any machine learning algorithm. Backpropagation is an
optimization process which uses the chain rule of the derivative to minimize the error in
order to improve the output accuracy. This process is developed by numerical methods
where stochastic gradient descent (SGD) is a dominant scheme.
Finally, the way in which nodes are connected defines the architecture of the neural
network. Some of the popularly known algorithms are as follows:

 Self-organizing maps (Kohonen, 1998): Unsupervised learning algorithm used
for clustering problems, applied principally to understanding information in
perception problems.
 Feedforward artificial neural networks (Widrow & Lehr, 1990): Supervised
learning algorithm that is used for classification and regression. It has been
applied to robotics and vision problems. This architecture is very common in
traditional Neural Networks (NNs) and was heavily used in the multilayer
perceptron. They can be used as universal function approximators.
 Boltzmann machines (Hinton, Sejnowski, & Ackley, 1984): Supervised learning
algorithm that is used for classification and optimization problems. A Boltzmann
machine is essentially a fully connected two-layer neural network.
 Hopfield neural networks (Hopfield, 1982): Supervised learning algorithm that is
used for classification and optimization problems. It is a fully connected single
layer, auto associative network. It works well for incomplete or distorted
patterns, and they can be used for optimization problems such as the traveling
salesman problem.
 Convolutional neural networks (CNNs): Although Fukushima (1980) introduced
the concepts of CNN, many authors have worked on CNN. LeCun et al. (1998)
developed a neural network architecture, LeNet-5, which has become one of the
most accepted architectures. A CNN is a supervised learning algorithm. CNNs
map their input into 2D grids. CNNs have taken image recognition to a higher
level of capability. This advance in CNNs is due to years of research on
biological eyes. To understand convolutional networks, imagine that we want to
detect features such as edges or other visual elements. The filters can detect these
features, so a CNN acts like a filter in the space domain.

Note that all concepts of machine learning such as how to translate a problem into a
fixed length array of floating-point numbers, which type of algorithm to use,
normalization, correlation, overfitting, and so on are also applicable in deep learning.

Deep Learning Concepts

The deep CNN is one of the main classes of deep neural networks. A CNN works by
successively representing small portions of the features of the problem in a hierarchical
fashion and combining them in a multiple-layer network (with several hidden layers).
This successive representation means that the first layer(s) will be engineered to detect
specific features. The next layers will combine these features into simpler profiles/forms
and into patterns to make the identification more robust to changes in position,
resolution, scale, brightness, noise, and rotation. The last layer(s) will match the input
example (i.e., a particular acquired image) and all of its forms and patterns to a class.
CNNs have provided very high levels of prediction in computer vision, image
processing, and voice recognition.
CNNs remind us of neural network architectures such as the Neocognitron and
LeNet-5. CNNs can have many layers. A classical architecture will have at least 4
layers: input, convolution, pooling, and a fully connected one. CNNs can have several
convolution layers, several pooling layers, and several fully connected ones.
Deep learning is an emergent concept built on several techniques, such as:

 The Rectified linear unit (ReLU) has become the standard activation function for
the hidden layers of a deep neural network. The output layer uses a linear or
Softmax activation function depending on whether the neural network performs
regression or classification. ReLU is defined as f(x) = max(0, x): the function
returns 0 if x is negative; otherwise it returns x.
 Filters: Convolutional neural networks (CNNs) break up the image into smaller
pieces. Selecting a width and height that defines a filter or patch is the first step.
The CNN uses filters to split an image into smaller patches; the size of these
patches matches the filter size. The CNN then slides this patch horizontally or
vertically to focus on a different piece of the image, performing the convolution. The
amount by which the filter slides is referred to as the stride. How many neurons
does each patch connect to? That is dependent on our filter depth. If we have a
depth of k, we connect each patch of pixels to k neurons in the next layer.
The final parameter is the padding, the border of zeros added around the area
that the filter sweeps.

Convolution Layer

The input layer is just the image and/or input data (e.g., 3D – height (N), width (N),
and depth (D)). Traditional Deep CNN uses the same height and width dimensions (i.e.,
squares). The convolution layer is next. The convolution layer is formed by filters (also
called kernels) which run over the input layer. A filter has smaller sides (height (F) and
width (F)) than the previous layer (e.g., the input layer or a different one) but the
same depth. The filter processes the entire input layer, producing part of the
output of the convolution layer (smaller than the previous layer). The filter does this
by being positioned over successive areas (F by F) of the input layer.
This positioning advances in strides (S), where the stride is the number of input neurons
of the N x N area to move in each step (i.e., strides are “the distance between the receptive field
centers of neighboring neurons in a kernel map” (Krizhevsky et al., 2012)). The
relationship of the input layer (or previous layer) (N x N x D) to the map produced by the
passing/execution of a filter of size (F x F x D) is:

Window size (e.g., number of neurons at that layer/level) = (N – F)/S + 1 (2)

However, a convolution layer can have several filters (e.g., kernels) in order to produce a
kernel map as output. It is easy to see that the size of the image is getting smaller. This
can be problematic, in particular when applying large filters or in CNNs that have many
layers and filters. Then, the concept of padding (P) is used. Zero-padding is the addition
of zero-valued pixels in a border of width P around the input layer. To preserve the layer
size with a stride of 1, the relationship is as follows:

P = (F-1)/2 (3)

A convolution layer can have several filters each one of size (F x F x D) and this set
will produce an output in the convolutional layer of depth equal to the number of filters in
the respective layer. The output matrix (i.e., kernel map) of the convolutional layer is the
result of running the different filters over the kernel map of the previous layer. The
kernel map of a convolution layer can be processed by successive convolution layers,
which do not need to have filters of the same size or number. Again, these layers
must be engineered. The weights and biases of these filters to produce their respective
outputs can be obtained from different algorithms such as backpropagation.

Knowing the dimensionality of each additional layer helps us understand how large
our model is and how our decisions around filter size and stride affect the size of our
network. With these parameters, we can calculate the number of neurons of each layer in
a CNN: given an input layer of width W (from its N x N x D volume), a filter of size F
(from its F x F x D volume), a stride of S, and a padding of P, the following
formula gives us the spatial size of the next layer:

Volume of next layer: (W - F + 2P)/S + 1. (4)
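Equations (2)-(4) can be sketched in a few lines of Python; the helper names are illustrative, not from the chapter.

```python
def conv_output_size(n, f, s=1, p=0):
    """Spatial output size of a convolution: (N - F + 2P)/S + 1, per Eq. (4).

    Raises if the filter and stride do not tile the (padded) input evenly.
    """
    size, rem = divmod(n - f + 2 * p, s)
    if rem:
        raise ValueError("filter/stride do not tile the input evenly")
    return size + 1

def same_padding(f):
    """Zero-padding that preserves spatial size at stride 1, per Eq. (3)."""
    return (f - 1) // 2
```

For example, a 32x32 input with a 5x5 filter at stride 1 gives a 28x28 map, while padding `same_padding(5) = 2` keeps the output at 32x32.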

Pooling Layer

This layer can have several types of filters. One of the most common ones is Max
pooling. Max pooling is a filter of a width by height, which extracts the maximum value
of the patch. Conceptually, the benefit of the max pooling operation is to reduce the size
of the input and to allow the neural network to focus on only the most important
elements. Max pooling does this by only retaining the maximum value for each filtered
area, and removing the remaining values. This technique can avoid overfitting
(Krizhevsky et al., 2012). Some variations like mean pooling are also used.
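A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single feature map, assuming even height and width (the function name is ours):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) feature map."""
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even dimensions"
    # Group pixels into non-overlapping 2x2 patches and keep each patch's max.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

Mean pooling would replace `.max(axis=(1, 3))` with `.mean(axis=(1, 3))` in the same reshape.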

Fully Connected Layer(s)

This layer type flattens the nodes into one dimension. A fully connected layer connects
to every element (neuron) in the previous layer; note that the resulting vector is passed
through an activation function. For example, LeNet-5 networks will typically contain
several dense layers as their final layers. The final dense layer in LeNet-5 actually
performs the classification. There should be one output neuron for each class or type of
image to classify.

Dropout Layer

Normally, deep learning models have many nodes, which means many parameters. This
number of parameters can generate overfitting. Therefore, dropout is used as a regularization technique
for reducing overfitting (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov,
2014). This layer “drops out” a random set of activations in that layer by setting them to
zero in the forward pass. During training, a good starting value for the probability to
dropout is 0.5 and during testing, it uses a value of 1.0 to keep all units and maximizes
the generalization power of the model. There are some variations of this approach. Krizhevsky
et al. (2012) states that dropout “consists of setting to zero the output of each hidden
neuron with probability 0.5. The neurons which are “dropped out” in this way do not
contribute to the forward pass and do not participate in back- propagation. So every time
an input is presented, the neural network samples a different architecture, but all these
architectures share weights. This technique reduces complex co-adaptations of neurons
since a neuron cannot rely on the presence of particular other neurons. It is, therefore,
forced to learn more robust features that are useful in conjunction with many different
random subsets of the other neurons. At test time, we use all the neurons but multiply
their outputs by 0.5, which is a reasonable approximation to taking the geometric mean of
the predictive distributions produced by the exponentially-many dropout networks.”
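The forward-pass behavior can be sketched with NumPy. This uses the "inverted dropout" variant, which scales activations at training time so nothing needs rescaling at test time (a common alternative to multiplying outputs by 0.5 at test time, as in the quote above); the function name is illustrative.

```python
import numpy as np

def dropout(activations, keep_prob, training, rng):
    """Inverted dropout: zero each unit with probability (1 - keep_prob).

    Dividing the survivors by keep_prob keeps the expected activation
    unchanged, so the full network can be used unmodified at test time.
    """
    if not training:
        return activations  # test time: keep all units
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob
```

With `keep_prob=0.5`, roughly half the activations are zeroed on each forward pass and the rest are doubled, so a different "thinned" architecture is sampled on every training example.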

Transfer Learning

Transfer learning is the process of taking a pre-trained model (the weights and
parameters of a network that has been trained on a large quantity of data by others) and
“fine-tuning” the model with your own dataset (Yosinski, Clune, Bengio, & Lipson,
2014). The idea is that this pre-trained model will act as a feature extractor. You will
remove the last layer of the network and replace it with your own classifier or regressor.
The algorithm freezes the weights of all the other layers and trains the
network normally. Transfer learning illustrates a key deep learning principle:
architectures and learned features can be reused in CNNs. Therefore, one must review
the most successful architectures used before, such as AlexNet by Krizhevsky et al.
(2012), ZF Net by Zeiler and Fergus (2014), VGG Net by Simonyan and Zisserman
(2014), GoogLeNet by Szegedy et al. (2015), and Microsoft ResNet (residual network)
by He et al. (2016).
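The freeze-and-retrain idea can be sketched conceptually with NumPy; here a fixed random projection stands in for a real pre-trained feature extractor, and all names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a pre-trained feature extractor: a fixed ReLU layer whose
# weights are frozen (never updated). In practice these would be the early
# layers of a network such as VGG or ResNet trained on a large dataset.
W_frozen = rng.normal(size=(4, 8))

def extract_features(x):
    return np.maximum(0.0, x @ W_frozen)

# New trainable "last layer" replacing the original classifier.
w_head = np.zeros(8)

# Toy labeled data for the new task.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)

# Fine-tune: gradient descent on squared error, updating only w_head.
# W_frozen is deliberately never touched -- that is the "freezing" step.
for _ in range(500):
    feats = extract_features(X)
    residual = feats @ w_head - y
    w_head -= 0.02 * feats.T @ residual / len(y)
```

Only the small head is trained on the new dataset, which is why transfer learning needs far less data and compute than training the whole network from scratch.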


A typical architecture of mobile robots as described by Siegwart et al. (2011) is an
intelligent and autonomous system consisting of three main layers: perception, planning
and motion control. Each layer seeks to answer specific questions related to the
respective tasks as performed by the autonomous system (Figure 3).
The perception layer consists of the process of keeping an internal description of the
external environment. The external environment is the part of the universe that is
accessible to the exteroceptive sensors of an agent. In theory, it is also possible to use
the environment itself as the internal model; however, this requires complete and
instantaneous sensing ability. It is easier to build a local description from a set of sources
and to exploit the relative continuity of the universe to combine/fuse individual observations.

"Dynamic World Modeling" is the problem by which an internal description of the
environment is assembled using exteroceptive sensors. By dynamic, it is meant that the
description evolves over time based on information from perception. This description is a
model because it permits the agent to represent the external environment. Fusion
techniques have been used to combine the measures provided by the sensors and their
comparison with the respective mathematical models of the robot and the environment.
Perception and state estimation have many characteristics in common. State
estimation calculates the state of the vehicle. On the other hand, perception estimates the
state of the environment. Although state estimation tends to deal with signal variations
over time, perception tends to deal with signal variations over space. In this layer,
machine learning techniques have been used because the exteroceptive sensors generate
vast amounts of information that has to be processed in a timely fashion, which
conventional techniques cannot handle online. For example, the
amount of information generated by a camera is very high: a full-HD color camera
generates more than six million values per frame (two million pixels times each of the
three basic colors) at a rate of 30 frames per second. This information must be processed
in real time in order to obtain the characteristics of the environment like traffic signals,
pedestrians, cars, and bicycles.

Figure 3. Layers in the mobile robotics architecture (Bedoya, 2016).

The planning or navigation layer will determine where the vehicle should go
according to the perception and the mission. This has to include a risk analysis to
determine the path and speed of the vehicle. The cognition aspects of an autonomous
vehicle depend on the mobility capabilities which are studied by the robotics navigation
field (Siegwart, Nourbakhsh, & Scaramuzza, 2011). The navigation field organizes its
techniques into two groups: planning and reacting. The techniques from the planning
group are known as global path planning and are concerned with the generation of the
global route that guides the vehicle toward a goal position. The techniques from the
reacting group are known as local path planning and are concerned with the generation of
several local paths that allow the vehicle to avoid obstacles. In this layer, machine
learning techniques are used to select routes (global and local).
Finally, the control layer will manipulate the degrees of freedom of the autonomous
vehicle (e.g., steering, braking, gearbox, acceleration) for bringing it to the desired
position at a defined speed at each instant of time. Machine learning techniques have
been used to obtain mathematical models and/or adapt a controller to different situations.

Figure 4. Interactions of the proposed cooperative strategy with the architecture of the autonomous
vehicle VILMA01 (Bedoya, 2016).

This research studies the architecting of the layers using a cooperative strategy based
on risk analysis. The resulting architecture includes mechanisms to interact with the
driver (this architecture has been proposed in VILMA01 - First Intelligent Vehicle of the
Autonomous Mobility Laboratory). We stated above that the motion control layer is the
one in charge of manipulating the degrees of freedom of the car (steering, braking, and
acceleration). This manipulation will bring the autonomous vehicle to the desired position
at each point in time. We will explain that this can be achieved by using a predictive
control technique that relies on dynamic models of the vehicle to control the steering
system. The path-planning layer will have the reactive part also known as local path
planning, where the desired path is represented in a curvilinear space. The desired path is
selected based on intrinsic and extrinsic risk indicators. With the layers of planning and
control already set, a method is proposed to estimate the trajectory desired by the driver
during the cooperative control, allowing a decision to be made based on risk analysis.
Finally, different tests on VILMA01 (in the actual vehicle) are performed to validate the
proposed architecture.
These layers are not exactly a hierarchical model. Each layer has interactions at
different levels from directive to cooperative control with the others. These interactions
can be adapted depending on what the vehicle tries to do. For example, the architecture of
VILMA01 (Bedoya, 2016) aims to test strategies to drive a vehicle cooperatively
between an autonomous system and a driver which could help to reduce the risk of
accidents. This strategy assumes that the autonomous system is more reliable than the
driver, even though in other circumstances the driver could interact with the human
machine interface to disengage the autonomous system. Based on the architecture of
autonomous mobile robots, the proposed strategy is termed cooperative planning
and cooperative control, which determines when and how the driver can change the path
projected by the autonomous system safely through the steering. Figure 4 shows the
function blocks for the autonomous vehicle VILMA01. There are two important
considerations in the cooperative strategies. The first one is the interaction of the driver
and the robot through the steering (dotted line 1), which in turn generates the second one,
which poses the question in the planning layer (dotted line 2): is it safe to change the
projected path? These additions to the existent architecture generate two types of
cooperation. The first one, cooperative control is defined when the control signal of the
driver and the autonomous system cooperate during the local path planned by the
autonomous system. The second one (cooperative planning) is defined when the driver
and the autonomous system cooperate to change the local path after risk analysis is
performed. Finally, the design of the layers, their functionality, and interactions give an
architecture its level of automation. According to Thrun et al. (2006), the six major
functional groups are interface sensors, perception, control, planning, vehicle interface
and user interface. Therefore, this layered architecture must take into consideration
hardware, software, and drive-by-wire automation.


The most common applications of deep learning in autonomous vehicles are in
perception. As explained in the last section, one of the biggest problems in perception is
identifying objects in images, because the number of inputs makes the
generation of a generic geometric model very difficult. Therefore, it is a good problem
for deep learning.

Our work is inspired by the German Traffic Signs data set provided by Stallkamp,
Schlipsing, Salmen, & Igel (2011) that contained about 40k training examples and 12k
testing examples. The same problem can be used as a model for Colombian traffic signs.
This is a classification problem which aims to assign the right class to a new image of a
traffic sign by training on the provided pairs of traffic sign images and their labels. The
project can be broken down into five parts: exploratory data analysis, data preprocessing
and data augmentation, the definition of a CNN architecture, training the model, testing
the model and using it with other images.

Data Analysis

The database is a set of images which can be described computationally like a
dictionary with key/value pairs:

 The image data set is a 4D array containing raw pixel data of the traffic sign
images (number of examples, width, height, channels).
 The label is an array containing the type of the traffic sign (number of samples,
traffic sign id).
 Traffic sign id description is a file, which contains the name and some
description for each traffic sign id.
 An array containing tuples, (x1, y1, x2, y2) representing coordinates of a
bounding box around the sign in the image.

It is essential to understand the data and how to manipulate it (Figure 5 shows some
randomly selected samples). This process of understanding and observing the data can
generate important conclusions such as:

 Single-image, multi-class classification problem.

 Forty-three classes of traffic signs.
 Reliable ground-truth data due to semi-automatic annotation (Stallkamp,
Schlipsing, Salmen, & Igel, 2011).
 The images contain one traffic sign each.
 Images are not necessarily square; they contain a border of 10% around the
traffic sign, and the sign is not necessarily centered in the image.
 Image sizes vary from 15x15 to 250x250 pixels.
 The classes were found to be highly imbalanced.

Figure 5. A sample of the data set.

Figure 6. Histogram of a number of samples of each traffic sign in the training data set.

Pre-Processing and Data Augmentation

The input images to the neural network went through a few preprocessing steps to
help train the network. Pre-processing can include:

 Resizing the image: A specific size is required. 32x32 is a good value based on
the literature.
 Color Space Conversion: It is possible to convert to grayscale if you think that
the colors do not matter in the classification, or to change from RGB (Red,
Green, and Blue) space to another color space like HSV (Hue, Saturation, and
Brightness/Value). Other approaches include balancing the brightness and contrast
of the images.
 Normalization: This part is very important because the algorithms in neural
networks work just with the data in some interval, normally between 0 and 1 or -
1 and 1. This could be done by dividing each dimension by its standard deviation
once it is zero-centered. This process causes each feature to have a similar range
so that our gradients do not go out of control (Heaton, 2013).
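The zero-centering and scaling step can be sketched as follows, assuming a batch of images shaped (examples, height, width, channels); the function name is ours.

```python
import numpy as np

def normalize(images):
    """Zero-center each channel, then divide by its standard deviation."""
    images = images.astype(np.float64)
    mean = images.mean(axis=(0, 1, 2), keepdims=True)  # per-channel mean
    std = images.std(axis=(0, 1, 2), keepdims=True)    # per-channel std
    return (images - mean) / std
```

After this step each channel of the batch has approximately zero mean and unit variance, which keeps the gradient magnitudes comparable across features.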

Unbalanced data, as shown in Figure 6, means that there are many more samples of
one traffic sign than the others. This could generate overfitting and/or other problems in
the learning process. One solution is to generate new images by taking some images at
random and transforming them through a random combination of the following techniques:

 Translation: Move the image horizontally or vertically by a few pixels around
the center of the image.
 Rotation: Rotate the image by a random angle about the center of the image.
 Affine transformations: Make a zoom over the image or change the perspective
of the image.
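The translation case can be sketched with NumPy as below; rotations and affine transformations would typically use a library such as OpenCV or scipy.ndimage, so they are omitted here. The function name and zero-padding choice are illustrative assumptions.

```python
import numpy as np

def random_translate(image, max_shift, rng):
    """Shift an (H, W, C) image by up to max_shift pixels per axis,
    filling the vacated border with zeros."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    # Destination and source slices for the overlapping region.
    ys_dst = slice(max(dy, 0), h + min(dy, 0))
    xs_dst = slice(max(dx, 0), w + min(dx, 0))
    ys_src = slice(max(-dy, 0), h + min(-dy, 0))
    xs_src = slice(max(-dx, 0), w + min(-dx, 0))
    out[ys_dst, xs_dst] = image[ys_src, xs_src]
    return out
```

Applying this (and similar transforms) to randomly chosen images of the rare classes produces extra samples that rebalance the histogram of Figure 6.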

Definition of an Initial CNN Architecture

A good way to start assembling your own deep neural network is to review the
literature and look for a deep learning architecture which has been used in a similar
problem. An early one is the architecture presented by LeCun et al. (1998): LeNet-5
(Figure 7). Let’s assume that we select LeNet-5. The first step is then to understand
LeNet-5, which is composed of 8 layers, explained as follows:

 Layer 1: Convolutional. Input = 32x32x1. Output = 28x28x6. Activation
function ReLU.
 Layer 2: Sub-sampling Max-Pooling. Input = 28x28x6. Output = 14x14x6.
 Layer 3: Convolutional. Input = 14x14x6. Output = 10x10x16. Activation
function ReLU.
 Layer 4: Sub-sampling Max-Pooling. Input = 10x10x16. Output = 5x5x16.
 Layer 5: Flat layer, 3-D to 1D. Input = 5x5x16. Output = 400.
 Layer 6: Fully connected layer. Input = 400. Output = 120. Activation function ReLU.
 Layer 7: Fully connected layer. Input = 120. Output = 84. Activation function ReLU.
 Layer 8: Output layer. Input = 84. Output = 10. Apply the soft-Max function to
obtain the output. The output is 10 indicating the different digits from 0 to 9.
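The spatial sizes listed above follow from Equation (2)/(4), with 5x5 convolutions (stride 1, no padding) and 2x2 pooling (stride 2); a quick check in Python (helper names are ours):

```python
def conv_size(n, f, s=1, p=0):
    """Spatial output size of a convolution or pooling step: (N - F + 2P)/S + 1."""
    return (n - f + 2 * p) // s + 1

def lenet5_shapes(input_size=32):
    s1 = conv_size(input_size, 5)   # Layer 1: 5x5 conv      -> 28
    s2 = conv_size(s1, 2, s=2)      # Layer 2: 2x2 max-pool  -> 14
    s3 = conv_size(s2, 5)           # Layer 3: 5x5 conv      -> 10
    s4 = conv_size(s3, 2, s=2)      # Layer 4: 2x2 max-pool  -> 5
    flat = s4 * s4 * 16             # Layer 5: flatten 5x5x16 -> 400
    return [s1, s2, s3, s4, flat]
```

Calling `lenet5_shapes()` reproduces the sequence 28, 14, 10, 5, 400 given in the layer list.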

It is possible to modify LeNet-5 to accommodate the requirements of our problem.
We can start by changing the input to redefine the size of the images. For example, a
square of 32 pixels with three channels (RGB) can be used as layer 1 (i.e., the input of
Layer 1 is 32x32x3), along with the outputs (i.e., the number of classes), which in our
implementation for traffic signals are set to 43 (i.e., the output of Layer 8 is 43). After
training and validating, one can start changing parts of the architecture or trying new ones
based on the training criteria. This becomes an iterative process in which one learns
which parameters and layers should be changed. One important question is how to obtain
the initial values of the weights. This could be done by selecting values from a normal
distribution, but if the analyst sees that after training, the values of the parameters are
very small or very large, he/she can change the variance of the distribution.

Figure 7. The architecture of LeNet-5, a Convolutional Neural Network, here for digits’ recognition.
Each plane is a feature map, i.e., a set of units whose weights are constrained to be identical – Adapted
and modified from LeCun et al. (1998).

Training the Model

There are several platforms to implement the training process from an
algorithmic/software/hardware viewpoint. One of the most used platforms is TensorFlow,
often employed as the backend of higher-level libraries. TensorFlow is an open source
software library for AI which performs mathematical operations in an efficient way.
TensorFlow achieves this by:

 Managing derivative computations automatically.

 Including a computing architecture that supports asynchronous computation,
queues, and threads in order to avoid long training sessions.

The training process for CNNs has the following steps:

 Split the training data between training and validation. Validation data is used
for calculating the accuracy of the estimation. On the other hand, training data is
used to apply the gradient algorithm.
 Type of optimizer: Several algorithms can be used. The stochastic gradient-based
optimizer Adam by Kingma and Ba (2014) is a typical selection. This
scheme is a first-order gradient-based optimization of stochastic objective
functions. In addition, it is well suited for problems that are large in terms of data
and/or input parameters. The algorithm is simple and can be modified
accordingly. Kingma and Ba (2014) detailed their algorithm (pseudocode) as follows:
Require: α: Stepsize
Require: β1, β2 ∈ [0, 1): Exponential decay rates for the moment estimates
Require: f(θ): Stochastic objective function with parameters θ
Require: θ0: Initial parameter vector
m0 ← 0 (Initialize 1st moment vector)
v0 ← 0 (Initialize 2nd moment vector)
t ← 0 (Initialize timestep)
while θt not converged do
t ← t + 1 (Increase timestep t)
gt ← ∇θ ft(θt−1) (Get gradients with respect to the stochastic objective at t)
mt ← β1 · mt−1 + (1 − β1) · gt (Update biased first moment estimate)
vt ← β2 · vt−1 + (1 − β2) · gt² (Update biased second raw moment estimate)
m̂t ← mt/(1 − β1^t) (Compute bias-corrected first moment estimate)
v̂t ← vt/(1 − β2^t) (Compute bias-corrected second raw moment estimate)
θt ← θt−1 − α · m̂t/(√v̂t + ε) (Update parameters)
end while
return θt (Resulting parameters for the Deep Neural Network)
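A direct Python transcription of the pseudocode for a single scalar parameter, using a fixed step budget in place of the convergence test; the toy objective at the end is ours, for illustration.

```python
import math

def adam(grad, theta0, alpha=0.05, beta1=0.9, beta2=0.999,
         eps=1e-8, steps=2000):
    """Adam (Kingma & Ba, 2014) for a scalar parameter theta."""
    theta = theta0
    m = v = 0.0                      # 1st and 2nd moment estimates
    for t in range(1, steps + 1):
        g = grad(theta)              # gradient of the objective at theta
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)   # bias-corrected second raw moment
        theta -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Minimize f(theta) = (theta - 3)^2, whose gradient is 2*(theta - 3);
# the result should settle close to 3.
minimum = adam(lambda th: 2.0 * (th - 3.0), theta0=0.0)
```

Note the adaptive step: dividing the first-moment estimate by the root of the second moment normalizes each update, which is what makes the algorithm robust to the gradient scale.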

 Batch size: This hyper-parameter defines the number of examples that are going
to be propagated in a forward/backward iteration. A well-tuned batch size can
reduce memory use and speed up training. However, it can reduce the
accuracy of the gradient estimation.
 Epochs: One epoch is a forward pass and one backward pass of all the training
examples of the training data set. The analyst monitors each epoch and analyzes
how the training process is evolving. Note that in each epoch the training and
validation data should be shuffled to improve the generalization of the neural network.
 Hyperparameters: Depending on the algorithm and the framework used there
exist values that should be tuned. The learning rate of the optimizer is usually an
important hyper-parameter to find. CNNs may involve other hyperparameters
such as filter windows, dropout rates, and the size of the mini-batches. These
hyper-parameters can be different for each layer. For example, the following
hyper-parameters can be relevant for a CNN: Number of Filters (K), F: filter size
(FxF), Stride (S), and the amount of padding (P). Techniques can be used in
order to optimize the tuning process and avoid trial and error efforts. These
techniques can involve models from Operations Research, evolutionary
algorithms, Bayesian schemes, and heuristic searches.

The training process ends when one finds a good agreement between the model outputs and the known output classes. In this project, an accuracy over 98% was achieved for a CNN developed for Colombian traffic signs. This is a very good value taking into consideration that humans are in the 98.32% accuracy range. The training process requires substantial computational power, so it is essential to have access to high-performance computing resources or cloud service providers such as Amazon AWS, IBM Bluemix, or Microsoft Azure.

Testing the Model

The last step is to prove that the neural network model works in situations different from the data used to train and validate it. It is very important to use data that has not been used in the process of training and validation. For example, we developed a CNN with Colombian traffic signs and obtained a moderate to low accuracy in the testing process. This model provided opportunities to analyze new research questions such as:

 Will this model work with another country’s traffic signs? What about the climate and the cultural environment?
 How can performance be improved?
 Is it feasible to implement the feedforward process in real time?
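In code, this final evaluation step reduces to scoring the trained model on examples held out from both training and validation; `model`, the inputs, and the labels below are placeholders for illustration:

```python
def accuracy(model, test_inputs, test_labels):
    """Fraction of held-out test examples the model classifies correctly."""
    correct = sum(1 for x, y in zip(test_inputs, test_labels) if model(x) == y)
    return correct / len(test_labels)
```

The same routine applies whether the test set is from another country or another environment; only the held-out data changes.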


A brief review of machine learning and the architecture of autonomous vehicles was presented in this chapter. It is important to note that the use of machine learning requires two hardware/software systems: one for training in the cloud and the other one in the autonomous vehicle. Another point to take into account is that modeling by machine learning from examples requires sufficient data to let machine learning models generalize at appropriate levels. There are several potential applications for deep learning in the field of autonomous vehicles. For example, it is possible that a deep learning neural network becomes the “driver” of the autonomous vehicle, where the inputs are road conditions and the risk profile of the passenger, and the outputs are turning degrees and speed of the car. Driving scenarios are a good fit for multiclass and multilabel classification problems. The mapping is hidden in the different and multiple hierarchical layers, but deep learning does not need the exact form of the function (as long as it maps well from input to output). The results are very promising. However, safety regulations (and public acceptance) will require numerous tests and validations of the deep learning based systems to be certified by the respective agencies.


Amsalu, S., Homaifar, A., Afghah, F., Ramyar, S., & Kurt, A. (2015). Driver behavior
modeling near intersections using support vector machines based on statistical feature
extraction. In 2015 IEEE Intelligent Vehicles Symposium (IV), 1270–1275.
Bahadorimonfared, A., Soori, H., Mehrabi, Y., Delpisheh, A., Esmaili, A., Salehi, M., &
Bakhtiyari, M. (2013). Trends of fatal road traffic injuries in Iran (2004–2011). PloS
one, 8(5):e65198.
Bedoya, O. G. (2016). Análise de risco para a cooperação entre o condutor e sistema de controle de veículos autônomos [Risk analysis for cooperation between the driver and the control system of an autonomous vehicle]. PhD thesis, UNICAMP, Campinas, SP, Brazil.
Camastra, F. & Vinciarelli, A. (2007). Machine Learning for Audio, Image and Video
Analysis: Theory and Applications (Advanced Information and Knowledge
Processing). 2nd edition.
Carbonell, J. (2015). Machine Learning. Learning by Analogy: Formulating and
Generalizing plans from past experience. Symbolic Computation. Springer.
Chen, Y.-L., Sundareswaran, V., Anderson, C., Broggi, A., Grisleri, P., Porta, P. P., Zani,
P., & Beck, J. (2008). Terramax: Team Oshkosh urban robot. Journal of Field
Robotics, 25(10), 841–860.
Cullinane, B., Nemec, P., Clement, M., Mariet, R., & Jonsson, L. (2014). Engaging and
disengaging for autonomous driving. US Patent App. 14/095, 226.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a
mechanism of pattern recognition unaffected by shift in position. Biological
Cybernetics, 36(4), 193–202.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image
recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 770–778.
Heaton, J. (2013). Artificial Intelligence for Humans, Volume 1: Fundamental
Algorithms. CreateSpace Independent Publishing Platform.
Heaton, J. (2015). Artificial Intelligence for Humans: Deep learning and neural
networks. Artificial Intelligence for Humans. Heaton Research, Incorporated.
Hinton, G. E., Sejnowski, T. J., & Ackley, D. H. (1984). Boltzmann machines: Constraint
satisfaction networks that learn. Carnegie-Mellon University, Department of
Computer Science Pittsburgh, PA.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective
computational abilities. Proceedings of the national academy of sciences, 79(8),
2554– 2558.
Jain, A., Koppula, H. S., Raghavan, B., Soh, S., & Saxena, A. (2015). Car that knows
before you do: Anticipating maneuvers via learning temporal driving models. In 2015
IEEE International Conference on Computer Vision (ICCV), 3182–3190.
Kahn, A. B. (1962). Topological sorting of large networks. Communications of the ACM,
5(11), 558–562.
Kaplan, S., Guvensan, M. A., Yavuz, A. G., & Karalurt, Y. (2015). Driver behavior
analysis for safe driving: A survey. IEEE Transactions on Intelligent Transportation
Systems, 16(6), 3017–3032.
Kingma, D. & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint
Kohavi, R. & Provost, F. (1998). Glossary of terms. Mach. Learn., 30(2-3), 271–274.
Kohonen, T. (1998). The self-organizing map. Neurocomputing, 21(1), 1–6.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep
convolutional neural networks. In Advances in neural information processing
systems, 1097–1105.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied
to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Liu, P., Kurt, A., & Ozguner, U. (2014). Trajectory prediction of a lane changing vehicle
based on driver behavior estimation and classification. In 17th International IEEE
Conference on Intelligent Transportation Systems (ITSC), 942–947.
Malik, H., Larue, G. S., Rakotonirainy, A., & Maire, F. (2015). Fuzzy logic to evaluate
driving maneuvers: An integrated approach to improve training. IEEE Transactions
on Intelligent Transportation Systems, 16(4), 1728–1735.
Merat, N., Jamson, A. H., Lai, F. C., Daly, M., & Carsten, O. M. (2014). Transition to
manual: Driver behaviour when resuming control from a highly automated vehicle.
Transportation Research Part F: Traffic Psychology and Behaviour,27, Part B, 274 –
282. Vehicle Automation and Driver Behaviour.
Michalski, S. R., Carbonell, J., & Mitchell, T. (1983). Machine Learning: An Artificial
Intelligence Approach. Tioga Publishing Company.
NHTSA (2013). US department of transportation releases policy on automated vehicle development. Technical report, National Highway Traffic Safety Administration.
World Health Organization (2015). Global status report on road safety 2015. http://apps.who.int/iris/bitstream/10665/189242/1/9789241565066_eng.pdf?ua=1.
Park, J., Bae, B., Lee, J., & Kim, J. (2010). Design of failsafe architecture for unmanned
ground vehicle. In Control Automation and Systems (ICCAS), 2010 International
Conference on, 1101–1104.
Siegwart, R., Nourbakhsh, I. R., & Scaramuzza, D. (2011). Introduction to autonomous
mobile robots. MIT Press, 2nd Edition.
Simonyan, K. & Zisserman, A. (2014). Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556.
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014).
Dropout: a simple way to prevent neural networks from overfitting. Journal of
Machine Learning Research, 15(1), 1929–1958.
Stallkamp, J., Schlipsing, M., Salmen, J., & Igel, C. (2011). The German Traffic Sign
Recognition Benchmark: A multi-class classification competition. In IEEE
International Joint Conference on Neural Networks, 1453–1460.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,
Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Thrun, S., Montemerlo, M., Dahlkamp, H., Stavens, D., Aron, A., Diebel, J., Fong, P.,
Gale, J., Halpenny, M., & Hoffmann, G. (2006). Stanley: The robot that won the
darpa grand challenge. Journal of field Robotics, 23(9), 661–692.
Widrow, B. & Lehr, M. A. (1990). 30 years of adaptive neural networks: perceptron,
madaline, and backpropagation. Proceedings of the IEEE, 78(9), 1415–1442.
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in
deep neural networks? In Advances in neural information processing systems, 3320–
Zeiler, M. D. & Fergus, R. (2014). Visualizing and understanding convolutional
networks. In European conference on computer vision, 818–833. Springer.


Dr. Olmer Garcia Bedoya is an associate professor at the School of Engineering of

the Universidad Jorge Tadeo Lozano in Colombia. He obtained his degree in
Mechatronics Engineering in 2005 at Universidad Militar Nueva Granada (UMNG) -
Colombia, a Master degree in Electronics Engineering at the Universidad de Los Andes
(2010) - Bogota, Colombia, and he obtained his Ph.D. degree in mechanical engineering
at the Campinas State University - Brazil in 2016. His current research interests are
autonomous vehicles, model predictive control, robotics, machine learning, automation,
and the internet of things.

Dr. Cesar O. Diaz graduated in Electrical Engineering at Universidad de Los Andes

in 2001. He obtained a MS in Electronic Engineering from Pontificia Universidad
Javeriana. He earned his Ph.D. in Computer Science from the University of Luxembourg
in Luxembourg (2014). Since 2002 he has been a professor and researcher in several
universities in Colombia until 2010. He did a postdoctoral research in Universidad de Los
Andes in 2015. He is currently a professor in Universidad Jorge Tadeo Lozano. His
research interests are in Future Generation Computer Systems, IoT, Big Data Analytics,
Big Data Infrastructure, Distributed Systems, Green and Cloud Computing, Energy-
efficient scheduling, and resource allocation on cloud computing.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 4



Fred K. Gruber, PhD*

Cambridge, Massachusetts, US


Support vector machines are popular approaches for creating classifiers in the
machine learning community. They have several advantages over other methods like
neural networks in areas like training speed, convergence, complexity control of the
classifier, as well as a more complete understanding of the underlying mathematical
foundations based on optimization and statistical learning theory. In this chapter we explore the problem of model selection with support vector machines, where we try to discover parameter values that improve the generalization performance of the algorithm. It is shown that genetic algorithms are effective in finding a good selection of parameters for support vector machines.
parameters for support vector machines. The proposed algorithm is tested on a dataset
representing individual models for electronic commerce.

Keywords: machine learning, support vector machines, genetic algorithms


Support vector machines are popular approaches for developing classifiers that offer several advantages over other methods like neural networks in terms of training speed, convergence, control of classifier complexity, as well as a better understanding of the underlying mathematical foundations based on optimization and statistical learning theory.

* Corresponding Author Email:

Nevertheless, as with most learning algorithms, the practical performance depends on
the selection of tuning parameters that control the behaviour and that, ultimately,
determines how good the resulting classifier is. The simplest way to find good parameter
values is using an exhaustive search, i.e., trying all possible combinations but this method
is impractical as the number of parameters increases. The problem of finding good values
for the parameters to improve the performance is called the model selection problem.
In this chapter we investigate the model selection problem in support vector
machines using genetic algorithms (GAs). The main contribution is to show that GAs
provide an effective approach to finding good parameters for support vector machines
(SVMs). We describe a possible implementation of a GA and compare several variations
of the basic GA in terms of the convergence speed. In addition, it is shown that using a
convex sum of two kernels provides an effective modification of SVMs for classification
problems and not only for regression as was previously shown in Smits and Jordaan
(2002). The algorithm is tested on a dataset that consists of information on 125 subjects
from a study conducted by Ryan (1999) and previously used for comparing several
learning algorithms in Rabelo (2001). The proposed algorithm is tested on a dataset that
represents individual models for electronic commerce.


Support vector machines as well as most other learning algorithms have several
parameters that affect their performance and that need to be selected in advance. For
SVMs, these parameters include the penalty value C , the kernel type, and the kernel
specific parameters. While for some kernels, like the Gaussian radial basis function
kernel, there is only one parameter to set (  ), more complicated kernels need an
increasing number of parameters. The usual way to find good values for these parameters
is to train different SVMs –each one with a different combination of parameter values–
and compare their performance on a test set or by using other generalization estimates
like leave one out or crossvalidation. Nevertheless, an exhaustive search of the parameter
space is time consuming and ineffective especially for more complicated kernels. For this
reason several researchers have proposed methods to find good sets of parameters more efficiently (see, for example, Cristianini and Shawe-Taylor (1999), Chapelle et al. (2002), Shao and Cherkassky (1999), and Ali and Smith (2003) for various approaches).
Evolutionary Optimization of Support Vector Machines … 77

For many years now, genetic algorithms have been used together with neural networks. Several approaches for integrating genetic algorithms and neural networks have been proposed: using GAs to find the weights (training), to determine the
architecture, for input feature selection, weight initialization, among other uses. A
thorough review can be found in Yao (1999). Recently, researchers have been looking
into the combination of support vector machines with genetic algorithms.
Few researchers have tried integrating SVMs with genetic algorithms. There are basically two types of integration of SVMs and GAs. The most common one consists of using the GA to select a subset of the possible variables, reducing the dimensionality of the input vector for the training set of the SVM, or of selecting a subset of the input vectors that are more likely to be support vectors (Sepúlveda-Sanchis et al., 2002; Zhang et al., 2001; Xiangrong and Fang, 2002; Chen, 2003). A second type of integration found in the literature uses a GA to find optimal parameters for the SVM (Quang et al., 2002; Xuefeng and Fang, 2002; Lessmann, 2006).
Here we propose and illustrate another approach that makes use of ten-fold
crossvalidation, genetic algorithms, and support vector machines with a mixture of kernel
for pattern recognition. The experiments are done using a dataset that represents model of
individuals for electronic commerce applications.


Evolutionary computation is a search and optimization technique inspired by natural evolution. The various evolutionary models that have been proposed and studied are usually referred to as evolutionary algorithms (EAs), and they share similar characteristics (Bäck et al., 2000): 1. they use a population of individuals or possible solutions; 2. they create new solutions or individuals by means of random processes that model biological crossover and mutation; and 3. they use a fitness function that assigns a numeric value to each individual in the population. A selection process favors those individuals with a higher fitness. The fitness function represents the environment in which the individuals live.
Genetic algorithms (GAs) (see Figure 1) are evolutionary algorithms first proposed
by Holland in 1975 (Holland, 1975) and they initially had three distinguishing features
(Bäck et. al, 2000): the individuals are represented by bitstrings, i.e., strings of 0’s and
1’s of fixed length; the individuals are selected for reproduction according to proportional
selection; and the primary method of producing variation is crossover. In addition,
mutation of newly-generated offspring induced variability in the population.
GAs have been through many changes, and the difference from other EAs has started to blur. Nevertheless, most GA implementations follow certain common elements (Goldberg, 1989; Mitchell, 1998):

 they work with a representation or coding of the parameters,

 they search a population of individuals,

 selection is according to a fitness function only, and

 they use probabilistic transition rules.

The search for a solution implies a compromise between two contradictory requirements: exploitation of the best available solutions and robust exploration of the search space. Exploitation refers to the search for similar solutions and is closely related to the crossover operator, while exploration involves a global search and is related to the mutation operator. If the solutions are overexploited, a premature convergence of the search procedure may occur. This means that the search stops progressing and the procedure eventually ends with a suboptimal solution. If emphasis is given to exploration, the information already available may be lost and the convergence of the search process can become very slow.

Figure 1. Simple Genetic Algorithm.
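The loop in Figure 1 can be sketched as follows. Truncation selection, one-point crossover, and the rates shown are used purely for brevity and illustration; they are not the operators compared later in the chapter:

```python
import random

def simple_ga(fitness, n_bits, pop_size=20, generations=50,
              p_crossover=0.9, p_mutation=0.01):
    """Skeleton of the simple GA in Figure 1: evaluate, select, vary, replace."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]          # truncation selection, for brevity
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            if random.random() < p_crossover:      # one-point crossover
                cut = random.randint(1, n_bits - 1)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            child = [bit ^ (random.random() < p_mutation) for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Maximizing the number of 1-bits ("OneMax") usually recovers a string of all
# (or nearly all) ones:
best = simple_ga(sum, n_bits=16)
```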

Probably the most important characteristics of genetic algorithms are their robustness (they tend to solve a wide domain of problems with relative efficiency) and their flexibility (they do not require any special information about the problem, e.g., derivatives, besides the fitness function). Thanks to these characteristics, they have been applied to a great variety of problems.

In the simple GA introduced by Holland, the individuals are represented by a string of bits. A certain number of bits represents a particular attribute or variable:

var1 var2 var3

Figure 2. Binary representation of GAs.

Depending on the problem, each of these strings can be transformed into integers,
decimals, and so on.
Usually the initial population is selected at random; every bit has an equal chance of
being a ‘0’ or a ‘1’. For each individual, a fitness value is assigned according to the
problem. The selection of the parents that will generate the new generation will depend
on this value.
Another popular representation in GAs is the floating-point representation: each gene in the individual represents a variable. This type of representation has been successfully used in optimization problems (Michalewicz and Janikow, 1996; Goldberg, 1991). It is important to note, though, that real-coded genetic algorithms require specialized crossover and mutation operators.

There are different ways to select the parents. In the fitness-proportionate selection method, every individual is selected for crossover a number of times proportional to its fitness. It is usually implemented with roulette-wheel sampling (also called the Monte Carlo selection algorithm in Dumitrescu et al. (2000)): each solution occupies an area of a circular roulette wheel that is proportional to the individual’s fitness. The roulette wheel is spun as many times as the size of the population. This method of selection has several drawbacks. At the start of the algorithm, individuals with a relatively large fitness will be selected many times, which can cause premature convergence due to lack of diversity. Later in the run, when most individuals have similar fitness, every individual will have roughly the same probability of being selected. Also, it is not compatible with negative fitness values and it only works with maximization problems.
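A minimal sketch of roulette-wheel sampling as described above, assuming non-negative fitness values:

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportionate (roulette-wheel) selection.

    Each individual's slice of the wheel is proportional to its fitness;
    fitness values are assumed non-negative, as noted in the text.
    """
    total = sum(fitnesses)
    spin = random.uniform(0, total)          # where the wheel stops
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if spin <= cumulative:
            return individual
    return population[-1]                    # guard against rounding error
```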
In tournament selection, k individuals are selected at random from the population. In the deterministic version, the fittest of the k individuals is selected; in the nondeterministic version, the fitter individual is selected with a certain probability. Tournament selection is becoming a popular selection method because it does not have the problems of fitness-proportionate selection and because it is adequate for parallel implementations (Bäck et al., 2000).
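Deterministic k-tournament selection is short enough to state directly:

```python
import random

def tournament_select(population, fitnesses, k=2):
    """Deterministic k-tournament: draw k individuals at random, keep the fittest."""
    contenders = random.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]
```

Because only comparisons are used, negative fitness values pose no problem here, unlike with the roulette wheel.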
Other selection methods include rank-based selection, Boltzmann selection, steady-state selection, sigma scaling, and others. For a complete survey of the different selection methods the reader can refer to Bäck et al. (2000) and Mitchell (1998).


There are two main types of operators (Bäck et al., 2000): unary operators, e.g., mutation, and higher-order operators, e.g., crossover. Crossover combines two or more individuals to form one or more new individuals. The simplest crossover type is one-point crossover, as shown in Figure 3:

Parent1: 10110001010001
Parent2: 10101010011111

One point
Child 1: 10110010011111
Child 2: 10101001010001

Figure 3. One-point crossover.

This operator has an important shortcoming: positional bias—the bits in the extremes
are always exchanged. This type of crossover is rarely used in practice (Bäck et. al 2000).
Two-point crossover is a variation of the previous operator as illustrated in Figure 4:

Parent1: 10110001010001
Parent2: 10101010011111

Two point
Child 1: 10110010010001
Child 2: 10101001011111

Figure 4. Two-point crossover.

Other types include n -point crossover or uniform crossover. In uniform crossover, a

mask determines which parent will provide each bit. For instance, one child could be
formed by selecting the bit from parent1 if the corresponding bit in the mask is a 1 and
selecting the bit from parent 2 if the bit in the mask is a 0. Another child could be formed
by doing the inverse (Figure 5).

Figure 5. Uniform crossover.
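The mask-based scheme of uniform crossover can be sketched as:

```python
import random

def uniform_crossover(parent1, parent2):
    """Uniform crossover: a random bit mask decides which parent supplies each bit."""
    mask = [random.randint(0, 1) for _ in parent1]
    child1 = [a if m else b for a, b, m in zip(parent1, parent2, mask)]
    child2 = [b if m else a for a, b, m in zip(parent1, parent2, mask)]
    return child1, child2
```

The two children are complementary: wherever one child takes a bit from parent 1, the other takes it from parent 2.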

There is no clear “best crossover” and the performance of the GA usually depends on
the problem and the other parameters as well.
Crossover is not limited to two parents, though. There are experimental results indicating that multiparent crossover, e.g., six-parent diagonal crossover, can perform better than one-point crossover (see Eiben, 2002 and references therein). In the one-child version of diagonal crossover, if there are n parents, there are n − 1 crossover points and one child (see Figure 6).
In GAs, crossover is the main operator of variation, while mutation plays a reduced
role. The simplest type of mutation is flipping a bit at each gene position with a
predefined probability. Some studies have shown that varying the mutation rate can
improve significantly the performance rate when compared with fixed mutation rates (see
Thierens, 2002).

Figure 6. Diagonal crossover with one child.

There are three main approaches to varying the mutation rate (Thierens, 2002): dynamic parameter control, in which the mutation rate is a function of the generation; adaptive parameter control, in which the mutation rate is modified according to a measure of how well the search is going; and self-adaptive parameter control, in which the mutation rate is evolved together with the variables that are being optimized.
An example of a dynamic mutation rate is tested in Bäck and Schütz (1996), where the mutation rate depends on the generation according to

pt = (2 + ((n − 2)/(T − 1)) · t)^(−1)

where t is the current generation and T is the maximum number of generations.
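Interpreting n as the length of the bit string (the text leaves n undefined; this follows the usual reading of Bäck and Schütz), the schedule can be written as:

```python
def dynamic_mutation_rate(t, T, n):
    """Dynamic mutation schedule: p_t = (2 + (n - 2) * t / (T - 1)) ** -1.

    Decays from 1/2 at generation t = 0 down to 1/n at the last
    generation t = T - 1.
    """
    return 1.0 / (2.0 + (n - 2.0) * t / (T - 1.0))
```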

In the adaptive methodology, the goodness of the search is evaluated and the mutation rate, and sometimes the crossover rate, is modified accordingly. One technique found to produce good results in Vasconcelos et al. (2001) measures the “genetic diversity” of the search as the ratio of the average fitness to the best fitness, or gdm. A value of gdm close to 1 implies that all individuals have the same genetic code (or the same fitness) and the search is converging. To avoid premature convergence, it is necessary to increase exploration (by increasing the mutation rate) and to reduce exploitation (by reducing the crossover rate). On the contrary, if the gdm falls below a lower limit, the crossover rate is increased and the mutation rate reduced.
In the self-adaptive methodology, several bits are added to each individual to represent the mutation rate for that particular individual. This way the mutation rate evolves with each individual. This technique is investigated in Bäck and Schütz (1996).
Another important variation is elitism in which the best individual is copied to the
next generation without modifications. This way the best solution is never lost (see, for
example, Xiangrong and Fang, 2002).


All experiments use data from the study conducted by Ryan (1999), which contains information on 125 subjects. A web site was used for this experiment, where 648 images were shown sequentially to each subject. The response required from the individuals was their preference for each image (1: Yes, 0: No). The images are characterized by seven discrete properties or features, with specific levels:

 Density – Describes the number of circles in an image (3 levels).

 Color family – Describes the hue of the circles (3 levels).
 Pointalization – Describes the size of the points that make up the individual circles (3 levels).
 Saturation – Describes the strength of the color within the circles (3 levels).
 Brightness – Describes the amount of light in the circles themselves (4 levels).
 Blur – Describes the crispness of the circles (2 levels).
 Background – Describes the background color of the image (3 levels).

Table 1. Features used to generate the 624 images (Rabelo, 2000)

Attribute Level 1 Level 2 Level 3 Level 4

1 Density X3 X2 X1 --
2 Color family Cold: blue, green purples Warm: red, orange --
3 Pointalization 5 15 50 --
4 Saturation 50 0 -- --
5 Brightness 50 -- -- -25
6 Motion blur 0 10 -- --
7 Background Black Gray White --

Figure 7. Images with features 1111113 and 1111223, respectively.

Figure 8. Images with features 1211323 and 1231311, respectively.

Figure 9. Images with features 2223321 and 3121212, respectively.

As an illustration, typical images for different values of these features are shown in
Figure 7 to Figure 9.
The response of each individual is an independent dataset. Rabelo (2001) compares
the performance of several learning algorithms on this collection of images.

Implementation Details

The support vector machine is based on a modified version of LIBSVM (Chang and Lin, 2001), while the genetic algorithm implementation was written from the ground up in C++ and compiled with Visual C++ .NET. In the following, we describe the genetic algorithm implementation in more detail.


Each individual is represented as a binary string that encodes five variables (see
Figure 10):

 The first 16 bits represent the cost or penalty value, C. It is scaled from 0.01 to
 The next 16 bits represent the width of the Gaussian kernel, σ, scaled from 0.0001 to 1000.
 The next 2 bits represent 4 possible values for the degree d: from 2 to 5.
 The next 16 bits represent the p parameter, which controls the percentage of polynomial and Gaussian kernel. It is scaled from 0 to 1.
 Finally, the last parameter is the r value, which determines whether we use a complete polynomial or not.
Figure 10. Representation of parameters in genetic algorithm.

The binary string s = (s0, …, sN−1) that represents each variable is transformed to an integer according to the expression

m = Σ_{i=0}^{N−1} si · 2^i

where N is the number of bits. This integer value is then scaled to a real number in the interval [a, b] according to

x = a + (b − a) · m / (2^N − 1).

The precision depends on the range and the number of bits:

ε = (b − a) / (2^N − 1).

In addition, the LIBSVM program was modified to include a mixture of Gaussian and polynomial kernels:

K(u, v) = p · e^(−σ‖u − v‖²) + (1 − p) · (u · v + r)^d.

Keerthi and Lin (2003) found that when a Gaussian RBF kernel is used for model selection there is no need to consider the linear kernel, since the RBF kernel behaves like a linear kernel for certain values of the parameters C and σ.
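A direct transcription of this mixed kernel; the exact width convention of the Gaussian term is not stated in the text, so the `sigma`-times-squared-distance form (the role LIBSVM's gamma plays) is an assumption here:

```python
import math

def mixed_kernel(u, v, p, d, r, sigma):
    """Convex combination of a Gaussian and a polynomial kernel.

    p = 1 gives a pure Gaussian kernel, p = 0 a pure polynomial kernel;
    the width convention exp(-sigma * ||u - v||^2) is an assumption.
    """
    dot = sum(ui * vi for ui, vi in zip(u, v))
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    poly = (dot + r) ** d
    rbf = math.exp(-sigma * sq_dist)
    return p * rbf + (1 - p) * poly
```

Since both terms are valid kernels and p lies in [0, 1], the mixture is itself a valid (positive semidefinite) kernel.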

Fitness Function

The objective function is probably the most important part of a genetic algorithm since it is problem-dependent. We need a way to measure the performance or quality of the different classifiers obtained for different values of the parameters. As indicated previously, several methods try to estimate the generalization error of a classifier. Contrary to other applications of GAs, the objective function in this problem is a random variable with an associated variance, and it is computationally expensive since it involves training a learning algorithm. In order to decide which method to use, we ran several experiments to find the estimator with the lowest variance. The results are summarized in Table 2.
The hold-out technique had the highest standard deviation. Stratifying the method, i.e., keeping the same ratio between classes in the training and testing sets, slightly reduced the standard deviation. All crossvalidation estimates had a significantly lower standard deviation than the hold-out technique.

Table 2. Mean and standard deviation of different types of

generalization error estimates

Technique Mean (%) Standard Deviation (%)

10 fold Stratified Modified Crossvalidation 86.830 0.461
Modified Crossvalidation 86.791 0.463
Stratified Crossvalidation 86.681 0.486
Crossvalidation 86.617 0.496
5 fold Stratified Modified Crossvalidation 86.847 0.540
5 fold Stratified Crossvalidation 86.567 0.609
5 fold Crossvalidation 86.540 0.629
Stratified hold out 86.215 1.809
Hold out 86.241 1.977

Since there is no statistically significant difference in the standard deviation between the different crossvalidation techniques, we use one of the most common: 10-fold crossvalidation.
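A sketch of how such a crossvalidation fitness could be computed; `train_fn` stands in for training an SVM with the parameters encoded in a GA individual, and the simple every-k-th fold assignment is illustrative:

```python
def kfold_accuracy(train_fn, data, labels, k=10):
    """k-fold crossvalidation estimate of accuracy for one parameter setting.

    train_fn(train_x, train_y) must return a classifier function; each fold
    is held out once while the classifier is trained on the rest.
    """
    n = len(data)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))            # every k-th example held out
        train_x = [x for i, x in enumerate(data) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        model = train_fn(train_x, train_y)
        correct = sum(1 for i in test_idx if model(data[i]) == labels[i])
        scores.append(correct / len(test_idx))
    return sum(scores) / k
```

The GA then uses the returned mean accuracy as the fitness value of the corresponding individual.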
We also considered an approximation of the leave-one-out estimator that was
proposed in Joachims (2000), but we found that the estimated error diverged from the
crossvalidation estimates for large values of the parameter C. This behavior was also
observed in the work of Duan et al. (2003).

Crossover, Selection, and Mutation

Several crossover operators are tested: one point, two point, uniform, and multiparent diagonal crossover.
Evolutionary Optimization of Support Vector Machines … 87

We consider two mutation operators: a simple mutation with a fixed mutation
probability and a more complex mutation operator whose rate of mutation varies with the
generation t according to the equation:

 n2 
pt   2  t  .
 T 1 

In addition, the simple mutation operator was also modified to experiment with other
techniques for varying the mutation rate: a self-adaptation method and a feedback
mechanism based on the genetic diversity.
The self-adaptation method consists of adding 16 bits to each individual in order to
encode a probability p. From this value the mutation rate is obtained according to the
following equation (Bäck and Schütz, 1996):

 1  p   N (0,1) 
p '  1  e 
 p 

where $\gamma$ is the rate that controls the adaptation speed and $N(0,1)$ is a random normal
number with mean 0 and standard deviation 1. The normal random variable is generated
with the Box-Muller method (see, for example, Law and Kelton, 2000, p. 465).
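A sketch of this update, producing the N(0,1) draw with the Box-Muller transform as the text describes; the default gamma = 0.2 is an arbitrary illustrative value, not one given in the chapter:

```python
import math
import random

def box_muller(rng=random):
    """One standard normal N(0, 1) draw via the Box-Muller transform."""
    u1 = 1.0 - rng.random()          # in (0, 1], keeps log() finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def self_adapted_rate(p, gamma=0.2, rng=random):
    """Logistic self-adaptation of an encoded mutation rate p: the
    update keeps the new rate in (0, 1) and returns exactly p when the
    normal draw is zero (or when gamma is zero)."""
    z = box_muller(rng)
    return 1.0 / (1.0 + (1.0 - p) / p * math.exp(-gamma * z))
```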
The feedback mechanism was based on calculating the genetic diversity of the
population from the ratio between the average and the best fitness
(AvgFitness/BestFitness). If the genetic diversity falls below a particular level, the
mutation rate is increased and the crossover rate is reduced. The contrary happens if the
genetic diversity rises above a given value.
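One plausible reading of this feedback rule in code; the threshold levels (0.90, 0.99) and the step size are illustrative assumptions, since the chapter does not state the exact values:

```python
def adjust_rates(avg_fitness, best_fitness, p_mut, p_cross,
                 low=0.90, high=0.99, step=0.01):
    """Feedback on the ratio AvgFitness/BestFitness: a ratio near 1
    signals a converged population (low genetic diversity), so the
    mutation rate is raised and the crossover rate lowered; a small
    ratio (high diversity) triggers the opposite adjustment."""
    ratio = avg_fitness / best_fitness
    if ratio > high:                      # low diversity
        p_mut, p_cross = p_mut + step, p_cross - step
    elif ratio < low:                     # high diversity
        p_mut, p_cross = p_mut - step, p_cross + step
    # keep both probabilities in a valid range
    return (min(max(p_mut, 0.001), 1.0),
            min(max(p_cross, 0.001), 1.0))
```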
For the selection operator we only considered deterministic k-tournament selection.
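Deterministic k-tournament selection is compact enough to state directly (an illustrative sketch, not the chapter's code):

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Deterministic k-tournament: sample k distinct individuals
    uniformly at random and return the one with the highest fitness."""
    contestants = rng.sample(range(len(population)), k)
    best = max(contestants, key=lambda i: fitness[i])
    return population[best]
```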

Comparison between Variations of GAs

To select the operators with the best performance (e.g., faster convergence of the
GA) from the different possibilities, we repeated the runs 30 times with different random
initial solutions. With each replication, we obtain an independent estimate for the best
generalization ability at each generation.
At the start of each replication, the dataset is randomly split in the ten subsets
required by the 10-fold crossvalidation. Using the same split during the whole run allows
us to study the effect of the different variations without being affected by randomness,

i.e., one particular model will always have the same performance throughout the run of
the genetic algorithm. At the same time, since we are doing 30 replications, each with a
different random split, we can get a good idea of the average performance as a function
of the generation for each of the variations of the genetic algorithm. Figure 11
summarizes this process in an activity diagram.
Table 3 lists the different combinations of parameters of the GA that were tested. It
was assumed that the performance of each parameter is independent of the others;
therefore, not every combination of parameter values was tested.

Table 3. Parameters of the genetic algorithm used for testing the different variations

Parameter Value
Population 10
Generations 20
Prob. of crossover 0.95
Prob. of mutation 0.05
Fitness function 10 fold crossvalidation
Selection 2-Tournament selection
Crossover types One point, two point, uniform, diagonal with 4 parents
Mutation type Fixed rate, dynamic rate, self-adaptive rate, feedback
Other Elitism, no elitism

After repeating the experiment 30 times, we calculated the average for each
generation. A subset of 215 points was used for the experiments. This subset was obtained
in a stratified manner (the proportion of individuals of class 1 to class -1 was kept equal
to that of the original dataset) from individual number 2. The number of points was
reduced to cut down the processing time.
In most cases, we are interested in comparing the performance measures at the 20th
generation of the genetic algorithms run with different parameters. This comparison is made
using several statistical tests such as the two-sample t test and the best-of-k-systems
procedure (Law and Kelton, 2000).

Effect of the Elitist Strategy

Figure 12 shows the effect of elitism when the genetic algorithm uses a one-point
crossover with crossover rate of 0.95 and simple mutation with mutation rate of 0.05.

Figure 11. Overview of the genetic algorithm.


Figure 12. Effect of elitism in the best fitness per generation.

Figure 13. Standard deviation of the average best fitness for elitism vs. no elitism.

We use simple elitism, i.e., the best parent is passed unmodified to the next
generation. As shown in Figure 12, by not using elitism there is a risk of losing good
individuals, which may also increase the number of generations needed to find a good
solution.

A two-sample t-test shows that, at generation 20, the average best fitness of the
elitism GA is significantly higher at the 0.1 level with a p-value of 0.054 and a lower
limit for the 90% confidence interval of 0.542557.
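The simple elitism described above can be written as a one-step post-processing of each generation (an illustrative sketch; the function and argument names are invented, not from the chapter's code):

```python
def apply_elitism(parents, parent_scores, offspring):
    """Simple elitism: pass the single best parent unmodified into the
    next generation, displacing one offspring so that the population
    size stays constant and the best fitness can never decrease."""
    best = max(range(len(parents)), key=lambda i: parent_scores[i])
    return [parents[best]] + offspring[:len(offspring) - 1]
```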
Figure 13 shows the standard deviation of the two GAs as a function of the
generation, which illustrates another advantage of using the elitist strategy: as the
generation increases, the standard deviation decreases. The standard deviation of the GA
with the elitist strategy is significantly lower at the 20th generation at the 0.1 level, both in
the F test for two variances and by the Bonferroni confidence interval.

Effect of Crossover Type

We tested four crossover types: one point, two point, uniform, and a 4-parent
diagonal crossover. The comparison is shown in Figure 14 and Figure 15.

Figure 14. Effect of the different crossover type on the fitness function.

Table 4. Average and variance in the 20th generation as a function of the crossover type

Crossover Type Average Variance

Diagonal 84.24481 1.015474
Two-point 84.10167 0.456379
Uniform 84.06692 1.105777
One-point 83.71069 1.593839

Figure 15. Effect of the different crossover type on the standard deviation.

The 4-parent diagonal crossover has the highest average fitness at the 20th
generation; however, it has a higher standard deviation than the two-point crossover (see
Figure 15 and Table 4). In order to make a decision, we use a technique found in Law and
Kelton (2000) for finding the best of k systems. With this methodology, we selected the
diagonal crossover as the best for this particular problem.
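Under the usual interpretation of multiparent diagonal crossover (n parents, n - 1 cut points, children assembled from diagonally shifted segments), a sketch looks as follows; the exact variant used in the chapter may differ in details:

```python
import random

def diagonal_crossover(parents, rng=random):
    """Multiparent diagonal crossover: with n parents, draw n - 1 cut
    points and build each child by concatenating consecutive segments
    taken from the parents in a cyclically shifted (diagonal) order."""
    n, length = len(parents), len(parents[0])
    cuts = sorted(rng.sample(range(1, length), n - 1))
    bounds = [0] + cuts + [length]
    children = []
    for c in range(n):
        child = []
        for seg in range(n):
            lo, hi = bounds[seg], bounds[seg + 1]
            child.extend(parents[(c + seg) % n][lo:hi])
        children.append(child)
    return children
```

Each gene position is inherited from exactly one parent per child, and across the n children every parent contributes to every position exactly once.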

Effect of Varying Mutation Rates

Four ways to set the mutation rate are tested: fixed mutation rate, dynamically
adapted, self-adaptation, and feedback. The other parameters are kept constant: diagonal
crossover with 4 parents, crossover rate of 0.95 and tournament selection. For the fixed
mutation rate, the probability of mutation is set to 0.05. The behavior of the average best
fitness as a function of the generation is shown in Figure 16. Figure 17 shows the
behavior of the standard deviation.

Figure 16. Effect of mutation rate adaptation.

Figure 17. Standard deviation of the best fitness per generation.

Again, to select among the different techniques we use the best-of-k-systems
methodology, choosing the technique with the best performance at the 20th generation.
The selected method is the fixed mutation rate.
The assumption of normality is tested with the Anderson-Darling test.

Final Variation of the GA

Based on the results of the previous experiments, we selected the parameters shown
in Table 5.

Table 5. Parameters in the final genetic algorithm

Parameters Value
Population 10
Generations 20
Prob. of crossover 0.95
Prob. of mutation 0.05
Fitness function 10-fold crossvalidation
Selection 2-Tournament selection
Crossover types Diagonal with 4 parents
Mutation type Fixed rate
Others Elitist strategy

The activity diagram of the final genetic algorithm is shown in Figure 18. The most
important difference between this final model and the one used in the previous section is
related to the random split of the data. Instead of using only one split of the data for the
complete run of the GA, every time the fitness of the population is calculated, we use a
different random split (see Figure 19).
As a result, all individuals at a particular generation are measured under the same
conditions. Using only one random split throughout the whole run of the GA carries the
danger that the generalization error estimate for one particular model may be higher than
for other models because of the particular random selection and not because it was really
better in general. Using a different random split before calculating the fitness of every
individual carries the same danger: an apparent difference in performance may be due to
the particular random order and not due to the different values of the parameters.
While repeating the estimate several times and getting an average would probably
improve the estimate, the increase in computational requirements makes this approach
prohibitive. For example, if we have 10 individuals and we use 10 fold crossvalidation
we would have to do 100 trainings per generation. If, in addition, we repeat every estimate
10 times to get an average, we would have to do 1000 trainings. Clearly, for real world
problems this is not a good solution.
Using the same random split in each generation has an interesting analogy with
natural evolution. In nature the environment (represented by the fitness function in a GA)
is likely to vary with time; however, at any particular time all individuals are competing
under the same conditions.
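The per-generation resplit loop of the final GA can be reduced to a small skeleton; all four callables below are placeholders for the chapter's actual components (SVM training under a crossvalidation split, selection, crossover, and mutation):

```python
def run_ga(population, fitness, new_split, evolve, generations=20):
    """Skeleton of the final GA: one fresh crossvalidation split is
    drawn per generation and shared by every individual, so all
    members of a generation are evaluated under identical conditions."""
    for gen in range(generations):
        split = new_split(gen)                    # one split per generation
        scores = [fitness(ind, split) for ind in population]
        population = evolve(population, scores)   # selection, crossover, mutation
    return population
```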

Figure 18. Final genetic algorithm.

Figure 19. Calculation of the fitness of the population.


Other Implementations

Several Python and R implementations of GAs are available, and we list a few of
them here.
In Python, the package DEAP (Distributed Evolutionary Algorithms in Python)
(Fortin et al., 2012) provides an extensive toolbox of genetic algorithm libraries that
allows rapid prototyping and testing of most of the ideas presented here. It also supports
parallelization and other evolutionary techniques such as genetic programming and
evolution strategies.
Pyevolve is another Python package for genetic algorithms that implements many
of the representations and operators of classical genetic algorithms.
In R, the GA package (Scrucca, 2013) provides a general implementation of genetic
algorithms able to handle both discrete and continuous cases as well as constrained
optimization problems. It is also possible to create hybrid genetic algorithms that
incorporate efficient local search, as well as parallelization either on a single machine
with multiple cores or across multiple machines.
There are also more specialized genetic algorithm implementations in R for very
specific applications. The "caret" package (Kuhn, 2008) provides a genetic algorithm
tailored towards supervised feature selection. The R package "gaucho" (Murison and
Wardell, 2014) uses a GA for analyzing tumor heterogeneity from sequencing data, and
"galgo" (Trevino and Falciani, 2006) uses GAs for variable selection on very large
datasets such as genomic datasets.


In this section, we compare the performance of the proposed algorithm in Figure 18
with several SVMs with arbitrarily selected kernels and parameters.
The experiments are performed with selected individuals of the previously mentioned
case study. The individuals were selected according to the worst performance as reported
in Rabelo (2001). All 648 data points were used in the experiments.
The generalization performance of the model constructed by the GA was then
compared to the performance of a model constructed by arbitrarily selecting the kernel
and the kernel parameters. This method of selecting the model will be referred to from
now on as the conventional way. In order to compare the different models, the 10-fold
crossvalidation was repeated 50 times using the same stream of random numbers. This is
akin to the common random numbers technique (Law and Kelton, 2000) for variance
reduction. Additionally, the best model from the conventional method was compared with
the model created by the GA using a paired t test to determine if the difference was
statistically significant.

The model created by the genetic algorithm had the parameters shown in Table 6.

Table 6. Best model found by the genetic algorithm

Dataset σ C Degree p r
Ind7 451.637 959.289 2 0.682536 1
Ind10 214.603 677.992 2 0.00968948 1
Ind100 479.011 456.25 2 0.428016 1

Interestingly, for two datasets (Ind7 and Ind100) the chosen kernel was a mixture of
Gaussian and polynomial kernels.
For the conventional method, the kernel is arbitrarily set to Gaussian and the penalty
value C is set to 50, while the kernel width σ is varied over 0.1, 0.5, 1, 10, and 50. The
average generalization error after the 50 replications for 3 individuals from the case study
is shown in Table 7 and Table 8, and the Tufte boxplots (Tufte, 1983) comparing the
percentage of misclassification are shown in Figure 20-Figure 22.

Table 7. Performance of models created using the conventional method

Kernel width (σ) Ind7 Ind10 Ind100

0.1 23.9168 24.3358 24.1783
0.5 30.5086 29.8396 30.4063
1 29.0546 28.4365 29.2966
10 30.3981 46.2980 38.2692
50 30.3981 46.2980 38.2692

Table 8. Performance of model created using the genetic algorithm

Ind7 Ind10 Ind100

GA 22.0025 21.8491 21.9937

The results of a paired t-test of the difference between the performance of the best model
using the conventional method and the model constructed by the genetic algorithm show
that the difference in performance is statistically significant at the 95% level.
These experiments show that genetic algorithms are an effective way to find a
good set of parameters for support vector machines. This method will become
particularly important as more complex kernels with more parameters are designed.
Additional experiments including a comparison with neural networks can be found in
Gruber (2004).

Figure 20. Average performance of the different models for dataset Ind7.

Figure 21. Average performance of the different models for dataset Ind10.

Figure 22. Average performance of the different models for dataset Ind100.


In this chapter, we explored the use of genetic algorithms to optimize the parameters
of a SVM and proposed a specific variation that we found to perform better. The
proposed algorithm uses 10-fold crossvalidation as its fitness function. Several types of
crossover and mutation for the genetic algorithm were implemented and compared and it
was found that a diagonal crossover with 4 parents and a fixed mutation rate provided the
best performance.
The SVM engine is based on a C++ version of LIBSVM (Chang and Lin, 2011). This
implementation was modified to include a kernel that is a mixture of Gaussian and
polynomial kernels. Thus, the genetic algorithm has the flexibility to decide how much
weight to assign to each kernel or remove one altogether.
The results from experiments using a data set representing individual models for
electronic commerce (Ryan, 1999) show that GAs are able to find a good set of
parameters that in many cases lead to improved performance over using an SVM with
fixed parameters.
While the value of using GAs for finding optimal parameters might not seem so
obvious for SVMs with simple kernels like a Gaussian RBF with only one parameter to
set, this need will become apparent as applications continue to appear and new, more
complicated kernels (likely with more parameters) are designed for specific problems. As
an illustration of this, we created a new kernel that is a mixture of RBF and complete
polynomial kernels. This kernel was previously tested in regression problems by other
researchers. Here we found that it also gives good results for classification problems.
It was also shown that 10-fold crossvalidation is a good estimator of the
generalization performance of support vector machines, and it allowed us to guide the
genetic algorithm to good values for the parameters of the SVM. In addition, we explored
the possibility of using the efficient bound on the leave-one-out error known as the ξα
estimator, but we found it to be biased for large values of the parameter C.
Finally, we should state that this improvement in performance comes at the price
of increased processing time. This downside can be minimized by finding more
efficient and unbiased estimates of the performance of SVMs.


Ali, S. & Smith, K. (2003, October). Automatic parameter selection for polynomial
kernel. In Information Reuse and Integration, 2003. IRI 2003. IEEE International
Conference on (pp. 243-249). IEEE.

Bäck, T., & Schütz, M. (1996). Intelligent mutation rate control in canonical genetic
algorithms. Foundations of Intelligent Systems, 158-167.
Bäck, T., Fogel, D., & Michalewicz, Z. (Eds.). (2000). Evolutionary computation 1:
Basic algorithms and operators (Vol. 1). CRC press.
Bazaraa, M., Sherali, H., & Shetty, C. (2013). Nonlinear programming: theory and
algorithms. John Wiley & Sons.
Burges, C. (1998). A tutorial on support vector machines for pattern recognition. Data
mining and knowledge discovery, 2(2), 121-167.
Burman, P. (1989). A comparative study of ordinary cross-validation, v-fold cross-
validation and the repeated learning-testing methods. Biometrika, 503-514.
Chapelle, O., Vapnik, V., Bousquet, O., & Mukherjee, S. (2002). Choosing multiple
parameters for support vector machines. Machine learning, 46(1), 131-159.
Chang, C., & Lin, C. (2011). LIBSVM: a library for support vector machines. ACM
Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.
Chen, X. (2003, August). Gene selection for cancer classification using bootstrapped
genetic algorithms and support vector machines. In Bioinformatics Conference, 2003.
CSB 2003. Proceedings of the 2003 IEEE (pp. 504-505). IEEE.
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines.
Cambridge University Press.
Demuth, H., Beale, M., & Hagan, M. (2008). Neural network toolbox™ 6. User’s guide,
Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised
classification learning algorithms. Neural computation, 10(7), 1895-1923.
Duan, K., Keerthi, S. S., & Poo, A. N. (2003). Evaluation of simple performance
measures for tuning SVM hyperparameters. Neurocomputing, 51, 41-59.
Dumitrescu, D., Lazzerini, B., Jain, L. C., & Dumitrescu, A. (2000). Evolutionary
computation. CRC press.
Eiben, A. E. (2003). Multiparent recombination in evolutionary computing. Advances in
evolutionary computing, 175-192.
Fishwick, P. A., & Modjeski, R. B. (Eds.). (2012). Knowledge-based simulation:
methodology and application (Vol. 4). Springer Science & Business Media.
Frie, T. T., Cristianini, N., & Campbell, C. (1998, July). The kernel-adatron algorithm: a
fast and simple learning procedure for support vector machines. In Machine
Learning: Proceedings of the Fifteenth International Conference (ICML'98)
(pp. 188-196).
Frohlich, H., Chapelle, O., & Scholkopf, B. (2003, November). Feature selection for
support vector machines by means of genetic algorithm. In Tools with Artificial
Intelligence, 2003. Proceedings. 15th IEEE International Conference on (pp. 142-
148). IEEE.

Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning.
Reading, MA: Addison-Wesley.
Goldberg, D. E. (1991). Real-coded genetic algorithms, virtual alphabets, and blocking.
Complex systems, 5(2), 139-167.
Gruber, F. K. (2004). Evolutionary Optimization of Support Vector Machines (Doctoral
dissertation, University of Central Florida Orlando, Florida).
Herbrich, R. (2001). Learning kernel classifiers: theory and algorithms. MIT Press.
Holland, J. H. (1975). Adaptation in natural and artificial systems. An introductory
analysis with application to biology, control, and artificial intelligence. Ann Arbor,
MI: University of Michigan Press.
Joachims, T. (2000). Estimating the generalization performance of a SVM efficiently.
Universität Dortmund.
Platt, J. (1998). How to implement SVMs. IEEE Intelligent Systems.
Kaufman, L. (1998). Solving the quadratic programming problem arising in support
vector classification. Advances in Kernel Methods-Support Vector Learning, 147-
Keerthi, S. S., & Lin, C. J. (2003). Asymptotic behaviors of support vector machines with
Gaussian kernel. Neural computation, 15(7), 1667-1689.
Kohavi, R. (1995, August). A study of cross-validation and bootstrap for accuracy
estimation and model selection. In Ijcai (Vol. 14, No. 2, pp. 1137-1145).
Kuhn, M. (2008). Caret package. Journal of Statistical Software, 28(5), 1-26.
Law, A. M., & Kelton, W. D. (1991). Simulation modeling and analysis (2nd ed.). New
York: McGraw-Hill.
Lendasse, A., Wertz, V., & Verleysen, M. (2003). Model selection with cross-validations
and bootstraps—application to time series prediction with RBFN models. Artificial
Neural Networks and Neural Information Processing—ICANN/ICONIP 2003, 174-
Lessmann, S., Stahlbock, R., & Crone, S. F. (2006, July). Genetic algorithms for support
vector machine model selection. In Neural Networks, 2006. IJCNN'06. International
Joint Conference on (pp. 3063-3069). IEEE.
Martin, J., & Hirschberg, D. (1996). Small sample statistics for classification error rates
II: Confidence intervals and significance tests.
Mendenhall, W., & Sincich, T. (2016). Statistics for engineering and the sciences. CRC Press.
Michalewicz, Z. (1996). Introduction. In Genetic Algorithms+ Data Structures=
Evolution Programs (pp. 1-10). Springer Berlin Heidelberg.
Mitchell, M. (1998). An introduction to genetic algorithms (complex adaptive systems).
Murison, A., & Wardell, C. (2014). gaucho: Genetic algorithms for understanding clonal
heterogeneity and ordering. R package version 1.12.0.

Quang, A., Zhang, Q., & Li, X. (2002). Evolving support vector machine parameters. In
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International
Conference on (Vol. 1, pp. 548-551). IEEE.
Rabelo, L. (2001). What intelligent agent is smarter?: A comparison (MS Thesis,
Massachusetts Institute of Technology).
Rothenberg, J. (1991, December). Tutorial: artificial intelligence and simulation. In
Proceedings of the 23rd conference on Winter simulation (pp. 218-222). IEEE
Computer Society.
Ryan, K. (1999). Success measures of accelerated learning agents for e-commerce
(Doctoral dissertation, Massachusetts Institute of Technology).
Schölkopf, B. & Smola, A. (2002). Learning with kernels: support vector machines,
regularization, optimization, and beyond. MIT press.
Scrucca, L. (2013). GA: a package for genetic algorithms in R. Journal of Statistical
Software, 53(4), 1-37.
Sepulveda-Sanchis, J., Camps-Valls, G., Soria-Olivas, E., Salcedo-Sanz, S., Bousono-
Calzon, C., Sanz-Romero, G., & de la Iglesia, J. M. (2002, September). Support
vector machines and genetic algorithms for detecting unstable angina. In Computers
in Cardiology, 2002 (pp. 413-416). IEEE.
Shao, X., & Cherkassky, V. (1999, July). Multi-resolution support vector machine. In
Neural Networks, 1999. IJCNN'99. International Joint Conference on (Vol. 2, pp.
1065-1070). IEEE.
Shawe-Taylor, J. & Campbell, C. (1998). Dynamically adapting kernels in support vector
machines. NIPS-98 or NeuroCOLT2 Technical Report Series NC2-TR-1998-017,
Dept. of Engineering Mathematics, Univ. of Bristol, UK.
Smits, G. & Jordaan, E. (2002). Improved SVM regression using mixtures of kernels. In
Neural Networks, 2002. IJCNN'02. Proceedings of the 2002 International Joint
Conference on (Vol. 3, pp. 2785-2790). IEEE.
Thierens, D. (2002, May). Adaptive mutation rate control schemes in genetic algorithms.
In Evolutionary Computation, 2002. CEC'02. Proceedings of the 2002 Congress on
(Vol. 1, pp. 980-985). IEEE.
Trevino, V., & Falciani, F. (2006). GALGO: an R package for multivariate variable
selection using genetic algorithms. Bioinformatics, 22(9), 1154-1156.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Vapnik, V. (2013). The nature of statistical learning theory. Springer Science & Business Media.
Vasconcelos, J. A., Ramirez, J. A., Takahashi, R. H. C., & Saldanha, R. R. (2001).
Improvements in genetic algorithms. IEEE Transactions on magnetics, 37(5), 3414-

Weiss, S. & Indurkhya, N. (1994, October). Decision tree pruning: biased or optimal?. In
AAAI (pp. 626-632).
Weiss, S. (1991). Small sample error rate estimation for k-NN classifiers. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 13(3), 285-289.
Wooldridge, M. (2009). An introduction to multiagent systems. John Wiley & Sons.
Xiangrong, Z., & Fang, L. (2002, August). A pattern classification method based on GA
and SVM. In Signal Processing, 2002 6th International Conference on (Vol. 1, pp.
110-113). IEEE.
Xuefeng, L., & Fang, L. (2002, August). Choosing multiple parameters for SVM based
on genetic algorithm. In Signal Processing, 2002 6th International Conference on
(Vol. 1, pp. 117-119). IEEE.
Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87(9),
Zhou, L. & Da, W. (2005). Pre-extracting Support Vector for Support Vector Machine
Based on Vector Projection [J]. Chinese Journal of Computers, 2, 000.


Dr. Fred Gruber is a Principal Scientist at GNS Healthcare, where he develops

computational and statistical models integrating different types of clinical and genomic
datasets with the goal of discovering new potential drug targets, understanding mechanisms
of disease, and, in general, helping answer the research questions from clients in the
pharmaceutical and health industries. He is involved with every stage of the process from data
preprocessing to model construction and interpretation. Fred has over 10 years of academic
and industry experience developing and implementing algorithms for extracting and making
sense of different types of data. His expertise includes machine learning predictive models,
causal inference, statistical signal processing, inverse problems theory, and simulation and
modeling of systems.
Fred holds a Bachelor of Science in Electrical Engineering from the Technological
University of Panamá, a Master of Science in Industrial Engineering specializing in modeling
and simulation of systems from the University of Central Florida, and a Ph.D. in Electrical
Engineering from Northeastern University.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 5



Loris Nanni1,*, Sheryl Brahnam2 and Alessandra Lumini3

Department of Information Engineering, University of Padua, Via Gradenigo 6,
Padova, Italy
Computer Information Systems, Missouri State University, 901 S. National,
Springfield, MO, US
Department of Computer Science and Engineering DISI, Università di Bologna, Via
Sacchi 3, Cesena, Italy

Good feature extraction methods are key in many pattern classification problems
since the quality of pattern representations affects classification performance.
Unfortunately, feature extraction is mostly problem dependent, with different descriptors
typically working well with some problems but not with others. In this work, we propose
a generalized framework that utilizes matrix representation for extracting features from
patterns that can be effectively applied to very different classification problems. The idea
is to adopt a two-dimensional representation of patterns by reshaping vectors into
matrices so that powerful texture descriptors can be extracted. Since texture analysis is
one of the most fundamental tasks used in computer vision, a number of high performing
methods have been developed that have proven highly capable of extracting important
information about the structural arrangement of pixels in an image (that is, in their
relationships to each other and their environment). In this work, first, we propose some
novel techniques for representing patterns in matrix form. Second, we extract a wide
variety of texture descriptors from these matrices. Finally, the proposed approach is

Corresponding Author Email:
106 Loris Nanni, Sheryl Brahnam and Alessandra Lumini

tested for generalizability across several well-known benchmark datasets that reflect a
diversity of classification problems. Our experiments show that when different
approaches for transforming a vector into a matrix are combined with several texture
descriptors the resulting system works well on many different problems without requiring
any ad-hoc optimization. Moreover, because texture-based and standard vector-based
descriptors preserve different aspects of the information available in patterns, our
experiments demonstrate that the combination of the two improves overall classification
performance. The MATLAB code for our proposed system will be publicly available to
other researchers for future comparisons.

Keywords: two-dimensional representation


Most machine pattern recognition problems require the transformation of raw sensor
data so that relevant features can be extracted for input into one or more classifiers. A
common first step in machine vision, for instance, is to reshape the sensor matrix by
concatenating its elements into a one dimensional vector so that various feature
transforms, such as principal component analysis (PCA) (Beymer & Poggio, 1996), can
be applied that side step the curse of dimensionality by reducing the number of features
without eliminating too much vital information. Reshaping the data matrix into a vector,
however, is not necessarily the only nor the best approach for representing raw input
values [16]. One problem with vectorizing a data matrix is that it destroys some of the
original structural knowledge (D. Li, Zhu, Wang, Chong, & Gao, 2016; H. Wang &
Ahuja, 2005).
In contrast to vectorization, direct manipulation of matrices offers a number of
advantages, including an improvement in the performance of canonical transforms when
applied to matrices, a significant reduction in computational complexity (Loris Nanni,
Brahnam, & Lumini, 2012; Z. Wang, Chen, Liu, & Zhang, 2008), and enhanced
discrimination using classifiers developed specifically to handle two-dimensional data
(see, for example, (Z. Wang & Chen, 2008) and (Z. Wang et al., 2008)). Moreover, some
of the most powerful state-of-the-art two-dimensional feature extraction methods, such as
Gabor filters (Eustice, Pizarro, Singh, & Howland, 2002) and Local binary patterns
(LBP) (L. Nanni & Lumini, 2008; Ojala, Pietikäinen, & Mäenpää, 2002), and their
variants, extract descriptors directly from matrices. Other methods, such as Two-
Dimensional Principal Component Analysis (2DPCA) (Yang, Zhang, Frangi, & Yang,
2004) and Two-Dimensional Linear Discriminant Analysis (2DLDA) (J. Li, Janardan, &
Li, 2002), allow classic transforms, such as PCA and Linear Discriminant Analysis
(LDA) (Zhang, Jing, & Yang, 2006), to work directly on matrix data. By projecting
matrix patterns via matrices, both 2DPCA and 2DLDA avoid the singular scatter matrix
problem. Classifier systems that are designed to handle two-dimensional data include
Texture Descriptors for The Generic Pattern Classification Problem 107

Min-Sum matrix Products (MSP) (Felzenszwalb & McAuley, 2011), which has been
shown to efficiently solve the Maximum-A-Posteriori (MAP) inference problem,
Nonnegative Matrix Factorization (NMF) (Seung & Lee, 2001), which has become a
popular choice for solving general pattern recognition problems, and the Matrix-pattern-
oriented Modified Ho-Kashyap classifier (MatMHKS) (S. Chen, Wang, & Tian, 2007),
which significantly decreases memory requirements. MatMHKS has recently been
expanded to UMatMHKS (H. Wang & Ahuja, 2005), so named because it combines
matrix learning with Universum learning (Weston, Collobert, Sinz, Bottou, & Vapnik,
2006), a combination that was shown in that study to improve the generalization
performance of classifiers.
In the last ten years, many studies focused on generic classification problems have
investigated the discriminative gains offered by matrix feature extraction methods (see,
for instance, (S. C. Chen, Zhu, Zhang, & Yang, 2005; Liu & Chen, 2006; Z. Wang &
Chen, 2008; Z. Wang et al., 2008)). Relevant to the work presented here is the
development of novel methods that take vectors and reshape them into matrices so that
state-of-the-art two-dimensional feature extraction methods can be applied. Some studies
along these lines include the reshaping methods investigated in (Z. Wang & Chen, 2008)
and (Z. Wang et al., 2008) that were found capable of diversifying the design of
classifiers, a diversification that was then exploited by a technique based on AdaBoost. In
(Kim & Choi, 2007) a composite feature matrix representation, derived from discriminant
analysis, was proposed. A composite feature groups a number of primitive features into a single input variable. In (Loris Nanni, 2011) Local Ternary Patterns
(LTP), a variant of LBP, were extracted from vectors rearranged into fifty matrices by
random assignment; an SVM was then trained on each of these matrices, and the results
were combined using the mean rule. This method led the authors in (Loris Nanni, 2011)
to observe that both one-dimensional vector descriptors and two-dimensional texture
descriptors can be combined to improve classifier performance; moreover, it was shown
that linear SVMs consistently perform well with texture descriptors.
In this work, we propose a new classification system composed of an ensemble of Support Vector Machines (SVMs). The ensemble is built by training each SVM with a different set of features. Three novel approaches for representing a feature vector as an image are proposed; texture descriptors are then extracted from the images and used to train an SVM. To validate this idea, several experiments are carried out on several datasets.

Proposed Approach

As mentioned in the introduction, it is quite common to represent a pattern as a one-dimensional feature vector, but a vector is not necessarily the most effective shape for machine learning (Loris Nanni et al., 2012). In (Z. Wang & Chen, 2008; Z. Wang et al.,
2008) classifiers were developed for handling two-dimensional patterns, and in (Loris
Nanni et al., 2012) it was shown that a continuous wavelet can be used to transform a
vector into a matrix; once in matrix form, it can then be described using standard texture
descriptors (the best performance obtained in (Loris Nanni et al., 2012) used a variant of
the local phase quantization based on a ternary coding).
The advantage of extracting features from a vector that has been reshaped into a
matrix is the ability to investigate the correlation among sets of features in a given
neighborhood; this is different from coupling feature selection and classification. To maximize performance, it was important that we test several different texture descriptors and different neighborhood sizes. The resulting feature vectors were then fed into an SVM.
The following five methods for reshaping a linear feature vector into a matrix were tested in this paper. Letting q ∈ ℝ^s be the input vector, M ∈ ℝ^(d1×d2) the output matrix (where d1 and d2 depend on the method), and a a random permutation of the indices [1..s], the five methods are:

1. Triplet (Tr): in this approach d1 = d2 = 255. First, the original feature vector q is normalized to [0, 255] and stored in n. Second, the output matrix M ∈ ℝ^(255×255) is initialized to 0. Third, a randomization procedure draws a random permutation a_j for each j = 1..100000 and updates M according to the following formula: M(n(a_j(1)), n(a_j(2))) = M(n(a_j(1)), n(a_j(2))) + q(a_j(3));
2. Continuous wavelet (CW) (Loris Nanni et al., 2012): in this approach d1 = 100, d2 = s. This method applies the Meyer continuous wavelet to the s-dimensional feature vector q and builds M by extracting the wavelet power spectrum, considering 100 different decomposition scales;
3. Random reshaping (RS): in this approach d1 = d2 = √s and M is a random rearrangement of the original vector into a square matrix. Each entry of matrix M is an element of q(a);
4. DCT: in this approach the resulting matrix M has dimensions d1 = d2 = s and each entry M(i, j) = dct(q(a_ij(2..6))), where dct() is the discrete cosine transform, a_ij is a random permutation (different for each entry of the matrix), and the indices 2..6 indicate that the number of considered features varies between two and six. We use
DCT in this method because it is considered the de-facto image transformation in most
visual systems. Like other transforms, the DCT attempts to decorrelate the input data.
The 1-dimensional DCT is obtained by the product of the input vector and the orthogonal
matrix whose rows are the DCT basis vectors (the DCT basis vectors are orthogonal and
normalized). The first transform coefficient (referred to as the DC Coefficient) is the
average value of the input vector, while the others are called the AC Coefficients. After
several tests we obtained the best performance using the first DCT coefficient;
Texture Descriptors for The Generic Pattern Classification Problem 109

5. FFT: the same procedure as DCT but, instead of using a discrete cosine
transform, the Fast Fourier transform is used. Similar to DCT, the FFT decomposes a
finite-length vector into a sum of scaled-and-shifted basis functions. The difference is the
type of basis function used by each transform: while the DCT uses only (real-valued)
cosine functions, the DFT uses a set of harmonically-related complex exponential
functions. After several tests, we obtained the best performance using the first FFT
coefficient (i.e., the sum of values of the vector).
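As a concrete illustration, the reshaping recipes above can be sketched in a few lines of NumPy. This is our own sketch, not the authors' code: the function names, the fixed RNG seed, and the way a fresh 2-to-6-feature subset is drawn per matrix entry are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility

def reshape_rs(q):
    """Random reshaping (RS): permute q and fill a d x d matrix, d = sqrt(s).
    Assumes s is a perfect square, as implied by d1 = d2 = sqrt(s)."""
    s = q.size
    d = int(np.sqrt(s))
    a = rng.permutation(s)          # the random permutation 'a' of the indices
    return q[a][: d * d].reshape(d, d)

def reshape_fft(q):
    """FFT reshaping: each entry is the first FFT coefficient (the sum of the
    values) of a random subset of 2..6 features, mirroring method 5."""
    s = q.size
    M = np.empty((s, s))
    for i in range(s):
        for j in range(s):
            k = rng.integers(2, 7)                      # subset size in 2..6
            idx = rng.choice(s, size=k, replace=False)  # fresh random subset
            M[i, j] = np.fft.fft(q[idx])[0].real        # first coefficient = sum
    return M
```

Note that the first FFT coefficient of a real-valued vector is simply the sum of its values, which is why only the real part is kept.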

The following methods were used to describe a given matrix:

 Multiscale Local Phase Quantization (MLPQ) (Chan, Tahir, Kittler, &
Pietikainen, 2013; Ojansivu & Heikkila, 2008), where R, the radius of the
neighborhood, is set to R=3 and R=5. MLPQ is a multiscale evolution of LPQ, a
blur-robust image descriptor. The main idea behind LPQ is to extract phase
information in the frequency domain so that it is robust to blur variation. The
local phase information is extracted using a 2D windowed Fourier transform on
a local window surrounding each pixel position. MLPQ is computed regionally
and adopts a component-based framework to maximize insensitivity to
misalignment, a phenomenon frequently encountered in blurring. Regional
features are combined using kernel fusion.
 Complete local binary pattern (CLBP) (Guo, Zhang, & Zhang, 2010): with values
(R=1; P=8) and (R=2; P=16), where R is the radius and P is the number of
neighbors. CLBP is a variant of LBP, which is an effective texture descriptor
used in various image processing and computer vision applications. LBP is
obtained from the neighboring region of a pixel by thresholding the neighbors
with the center pixel to generate a binary number. The LBP only uses the sign
information of a local difference while ignoring the magnitude information. In
the CLBP scheme, the image local differences are decomposed into two
complementary components: the signs and the magnitudes. In our experiments we
used two values of R and P and concatenated the descriptors.
 Histogram of Gradients (HoG) (Dalal & Triggs, 2005): HoG represents an image
by a set of local histograms that count occurrences of gradient orientation in a
local subwindow of the image. The HoG descriptor can be extracted by
computing the gradients of the image, followed by dividing the image into small
subwindows, where a histogram of gradient directions is built for each
subwindow. In this work the input matrix is divided into 5×6 non-overlapping
subwindows, and the gradient orientation histograms extracted from each
subwindow are first normalized to achieve better invariance to changes in
illumination or shadowing and then concatenated for representing the original
input matrix;
 Wavelet features (WAVE): a wavelet is a “small wave” which has its energy
concentrated in time. In image processing, wavelets are used as a transformation
technique to transfer data from one domain to another where hidden information can
be extracted. Wavelets offer good localization and separation of signal
characteristics, and provide a tool for the simultaneous analysis of both
time and frequency. A wavelet basis is a set of orthonormal basis functions generated
from dilation and translation of a single scaling function or father wavelet (φ) and
a mother wavelet (ψ). In this work we use the Haar wavelet family, which is a
sequence of rescaled "square-shaped" functions that together form a wavelet
basis: the extracted descriptor is obtained as the average energy of the horizontal,
vertical, and diagonal detail coefficients, calculated up to the tenth level.
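For readers unfamiliar with LBP, the basic operator that CLBP extends can be sketched as follows. This is our own minimal illustration (8 neighbors at radius 1, without the uniform-pattern mapping or CLBP's sign/magnitude decomposition):

```python
import numpy as np

def lbp_8_1(img):
    """Basic 8-neighbor, radius-1 LBP on the interior pixels: threshold each
    neighbor against the center pixel and pack the eight signs into one code."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]   # clockwise neighbor offsets
    h, w = img.shape
    center = img[1 : h - 1, 1 : w - 1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes += (neighbor >= center).astype(np.int64) << bit
    return codes

def lbp_histogram(img, bins=256):
    """Descriptor = normalized histogram of the LBP codes over the matrix."""
    hist, _ = np.histogram(lbp_8_1(img), bins=bins, range=(0, bins))
    return hist / hist.sum()
```

The histogram of codes, rather than the code map itself, is what is fed to the classifier; CLBP concatenates several such histograms built from the sign and magnitude components.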

According to several studies in the literature, a good solution for improving the
performance of an ensemble approach is pattern perturbation. To exploit this, an
ensemble is obtained using 50 reshapes for each pattern: for each reshape, the
original features of the pattern are randomly sorted. In this way 50 SVMs are trained
for each approach, and these SVMs are combined by sum rule. In the next section only
the performance of the ensemble of SVMs is reported, since in (Loris Nanni et al.,
2012) it is shown that such an ensemble improves on the stand-alone version.
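The perturbation scheme can be sketched as below. This is our own illustration, not the authors' code: the chapter trains an SVM per random feature ordering, but to keep the sketch dependency-free we substitute a nearest-centroid classifier, and we omit the reshape-and-describe step that would sit between each permutation and its classifier.

```python
import numpy as np

class NearestCentroid:
    """Dependency-free stand-in for the SVM members used in the chapter."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def scores(self, X):
        # negative distance to each centroid acts as a class-membership score
        return -np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)

def train_perturbed_ensemble(X, y, n_views=50, seed=0):
    """One member per random sort of the original features. In the full pipeline,
    a reshape-to-matrix and texture-descriptor step would follow each permutation."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_views):
        perm = rng.permutation(X.shape[1])
        members.append((perm, NearestCentroid().fit(X[:, perm], y)))
    return members

def predict_sum_rule(members, X):
    """Sum rule: add the members' score matrices, then take the arg-max class."""
    total = sum(clf.scores(X[:, perm]) for perm, clf in members)
    return total.argmax(axis=1)
```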

Experimental Results

To assess their versatility, the methods described above for reshaping a vector into a
matrix were challenged with several datasets (see Table 1). All the tested data mining
datasets are extracted from the well-known UCI datasets repository (Lichman, 2013),
except for the Tornado dataset (Trafalis, Ince, & Richman, 2003). Moreover, two
additional datasets are provided that are related to the image classification problem:

1. BREAST: a dataset intended to classify samples of benign and malignant tissues
(for details see (Junior, Cardoso de Paiva, Silva, & Muniz de Oliveira, 2009)). To
extract the features from each image, we extract the 100 rotation-invariant LTP
bins, with P = 16 and R = 2, with the highest variance (considering only the training
set);
2. PAP: a dataset intended to classify each cell extracted from a pap test as either
normal or abnormal (for details see (Jantzen, Norup, Dounias, & Bjerregaard,
2005)). A linear descriptor of size 100 is extracted using the same procedure
described above.

A summary description of the tested datasets, including the number of patterns and
the dimension of the original feature vector, is reported in Table 1. All the considered
datasets are two-class classification problems.

Table 1: Tested datasets

DATASET Short name N° patterns N° features

breast breast 699 9
heart heart 303 13
pima pima 768 8
sonar sonar 208 60
ionosphere iono 351 34
liver liver 345 7
haberman hab 306 3
vote vote 435 16
australian aust 690 14
transfusion trans 748 5
wdbc wdbc 569 31
breast cancer image bCI 584 100
pap test pap 917 100
tornado torn 18951 24
german credit gCr 1000 20

The testing protocol used in the experiments is the 5-fold CV method, except for the
Tornado dataset, which is already divided into separate training and testing sets. All
features in these datasets were linearly normalized between 0 and 1, using only the
training data for finding the parameters to normalize the data; this was performed before
feeding the features into an SVM. The performance indicator used is the area under the ROC
curve (AUC).
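The fold-wise normalization can be sketched as follows (our own helper names; the chapter does not specify how out-of-range test values are handled, so this sketch leaves them unclipped):

```python
import numpy as np

def fit_minmax(X_train):
    """Learn per-feature min and span on the training fold only, so that no
    information from the test fold leaks into the scaling."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant features
    return lo, span

def apply_minmax(X, lo, span):
    """Map features toward [0, 1] using the training-fold statistics;
    unseen test values may fall slightly outside [0, 1]."""
    return (X - lo) / span
```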
In the following experiments, we optimized SVM for each dataset, testing both linear
and radial basis function kernels.
The first experiment is aimed at evaluating the five methods for reshaping a linear
feature vector into a matrix as described in section 2. In Table 2, we report the
performance of each reshaping approach coupled with each matrix descriptor, as detailed
in section 2.
Examining the results in Table 2, it is clear that Tr performs rather poorly;
moreover, RS, coupled with LPQ and CLBP, has numerical problems in those datasets
where few features are available (thereby resulting in small matrices). The best reshaping
method is FFT, and the best tested descriptor is HoG.
The second experiment is aimed at evaluating the fusion among different reshaping
methods and different descriptors in order to propose an ensemble that works well across all
tested datasets. The first four columns of Table 3 show the fusion of reshaping methods
(except Tr, due to its low performance) for each descriptor (labelled Dx, specifically,
DLPQ, DCLBP, DHoG, and DWave). The last five columns report the fusion of methods
obtained by fixing the descriptor and varying the reshaping procedures (labelled Rx,
specifically, RTr, RCW, RRS, RDCT, and RFFT).

Table 2: Performance of each reshaping method coupled with the different
texture descriptors for each dataset


MLPQ
DATASET Tr CW RS DCT FFT
breast 98.0 97.6 0 96.8 97.3
heart 64.0 90.4 0 86.9 88.0
pima 53.1 73.6 0 71.6 71.4
sonar 60.9 92.6 92.1 93.0 93.6
iono 86.2 98.8 98.5 98.6 98.3
liver 56.7 68.9 0 70.8 71.6
hab 48.7 0 0 63.5 63.4
vote 49.1 96.9 0 97.7 97.9
aust 71.7 91.2 0 90.1 90.5
trans 52.4 0 0 68.0 67.6
wdbc 89.9 97.9 97.7 98.6 98.9
bCI 76.7 96.2 93.4 96.6 96.7
pap 70.3 84.2 85.7 87.2 88.1
torn 80.2 89.3 93.3 93.6 93.6
gCr 72.6 73.5 77.6 78.2 78.3
Average 68.7 76.7 42.5 86.1 86.3


CLBP
DATASET Tr CW RS DCT FFT
breast 98.5 97.4 98.2 97.1 97.7
heart 74.3 90.3 89.9 88.1 88.2
pima 60.3 73.2 70.8 71.5 72.0
sonar 65.6 90.5 90.1 91.5 92.7
iono 86.8 96.2 98.1 98.6 98.4
liver 56.9 68.8 70.7 70.7 68.5
hab 59.6 60.0 0 63.3 64.2
vote 50.1 96.1 97.6 96.9 97.4
aust 74.8 91.0 91.3 90.7 90.9
trans 65.8 64.5 69.4 66.1 67.8

wdbc 87.8 95.5 98.2 98.0 98.7

bCI 74.9 92.7 93.8 96.1 96.5
pap 71.0 82.0 82.0 78.4 87.4
torn 93.6 90.0 93.6 93.6 93.9
gCr 77.0 71.3 77.0 77.6 77.7
Average 73.1 84.0 81.4 85.2 86.1


HoG
DATASET Tr CW RS DCT FFT
breast 98.4 99.4 99.3 99.3 99.3
heart 88.5 90.8 89.8 89.9 90.3
pima 75.5 79.4 80.0 79.7 79.6
sonar 71.0 94.2 92.8 93.6 93.1
iono 94.1 97.9 98.2 98.6 98.4
liver 58.9 72.7 73.7 72.4 73.5
hab 60.1 66.4 69.6 68.0 68.8
vote 82.8 97.8 98.8 97.4 97.6
aust 85.5 91.0 91.0 91.1 91.2
trans 62.7 66.8 68.8 68.2 69.7
wdbc 95.6 99.4 98.7 99.3 99.4
bCI 82.8 96.6 95.7 97.0 97.4
pap 71.3 84.4 87.5 87.4 87.6
torn 86.4 91.8 94.4 94.4 94.4
gCr 63.1 72.6 78.2 78.4 78.5
Average 78.4 86.7 87.8 87.6 87.9


Wave
DATASET Tr CW RS DCT FFT
breast 98.8 99.4 98.2 99.3 99.4
heart 88.0 89.7 86.9 88.3 89.8
pima 74.3 82.2 82.0 82.0 82.3
sonar 69.6 90.7 91.5 91.6 91.7
iono 87.1 97.3 98.4 97.7 97.2
liver 48.2 73.4 69.0 74.0 74.2

hab 58.9 70.1 61.2 66.5 68.3
vote 60.1 96.9 82.6 96.7 97.8
aust 85.6 92.2 90.7 91.6 92.2
trans 62.1 71.2 64.7 69.9 71.5
wdbc 95.1 99.4 99.3 99.3 99.5
bCI 81.9 94.6 95.0 95.6 95.4
pap 72.4 80.7 84.0 85.2 85.5
torn 80.2 85.2 91.1 90.3 90.4
gCr 69.6 71.1 78.3 78.9 79.6
Average 75.5 86.3 84.9 87.1 87.6

Table 3: Performance (AUC) of the ensemble created by fixing the descriptor (first
four columns) and the reshaping method (last five columns).

DATASET DLPQ DCLBP DHoG DWave RTr RCW RRS RDCT RFFT
breast 97.5 97.9 99.5 99.4 99.2 99.2 99.3 99.3 99.2
heart 89.3 89.3 90.2 89.9 89.5 90.6 90.3 89.3 90.1
pima 72.0 72.2 80.8 82.3 74.4 80.9 80.8 80.6 80.5
sonar 92.8 89.3 94.2 92.6 70.9 93.9 93.1 93.7 93.6
iono 98.6 97.9 98.2 97.8 92.6 98.2 98.6 98.6 98.4
liver 71.8 70.4 73.4 73.4 59.3 73.2 74.2 73.4 73.6
hab 62.6 61.5 69.0 69.2 60.7 66.4 69.5 67.0 68.1
vote 97.8 97.3 98.1 96.8 74.7 97.1 98.3 97.6 97.8
aust 90.4 90.9 91.2 92.1 83.8 91.4 91.9 91.6 91.8
trans 68.3 67.1 69.2 71.0 66.0 67.1 69.9 68.7 70.1
wdbc 98.8 98.4 99.4 99.5 94.5 99.4 99.4 99.5 99.6
bCI 96.5 96.2 96.6 95.2 83.7 96.4 95.5 96.5 96.8
pap 86.8 82.4 87.0 84.3 74.4 84.9 86.6 87.4 87.6
torn 92.8 93.4 94.0 89.4 85.2 92.9 94.8 94.5 94.6
gCr 77.1 76.8 77.5 77.4 68.3 75.0 78.5 79.1 79.8
Average 86.2 85.4 87.9 87.4 78.5 87.1 88.0 87.8 88.1

As expected, the best results in Table 3 are obtained by DHoG and RFFT, i.e., by the
best descriptor and the best reshaping method.
Finally, in Table 4 the results of our best ensembles are reported and compared with
two baseline approaches: the first, named 1D, is the classification method obtained by
coupling the original 1D descriptor with an SVM classifier; the second is the best method
proposed in our previous work (Loris Nanni et al., 2012).
Included in Table 4 are results of the following “mixed reshaping” ensembles, which
are designed as follows:

MR1 = 2×RCW + RRS (i.e., weighted sum rule between RCW and RRS)
MR3 = (RS_HoG + RS_Wave) + 2×(FFT_HoG + FFT_Wave) (X_Y means that the reshaping
method named X is coupled with the texture descriptor named Y)
MR4 = MR2 + 2×1D
MR5 = MR3 + 2×1D

Before fusion, the scores of each method are normalized to mean 0 and standard
deviation 1.
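The score-level fusion can be written in a few lines. This sketch is ours (the function and argument names are assumptions); a weighted combination such as MR1's 2×RCW + RRS maps onto the `weights` argument.

```python
import numpy as np

def zscore_fuse(score_lists, weights=None):
    """Normalize each method's scores to zero mean and unit standard deviation,
    then combine them with a (possibly weighted) sum."""
    weights = weights if weights is not None else [1.0] * len(score_lists)
    fused = np.zeros(len(score_lists[0]))
    for w, s in zip(weights, score_lists):
        s = np.asarray(s, dtype=float)
        fused += w * (s - s.mean()) / s.std()
    return fused
```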
Table 4 includes the performance of the best ensemble proposed in our previous work
(Loris Nanni et al., 2012) that should be compared to MR2, where the fusion with 1D is
not considered.
The proposed ensembles work better than (Loris Nanni et al., 2012), except in the
two image datasets (bCI and pap). More tests will be performed to better assess the
performance when several features are available (as in bCI and pap). It may be the case
that different ensembles should be used that consider the dimensionality of the original
feature vector.
MR4 and MR5 perform similarly, with both outperforming 1D descriptors with a p-
value of 0.05 (Wilcoxon signed rank test (Demšar, 2006)). MR5 is a simpler approach,
however. This is a very interesting result since the standard method for training SVM is
to use the original feature vector. To reduce the number of parameters when MR4 or
MR5 are combined with 1D descriptors, we always use the same SVM parameters (RBF
kernel, C=1000, gamma=0.1) for MR4 and MR5 (while optimizing them for the 1D
approach).
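In practice one would call `scipy.stats.wilcoxon` for this significance test; the dependency-free sketch below (ours) computes just the signed-rank statistic T that underlies it, with zero differences dropped and ties given average ranks.

```python
import numpy as np

def wilcoxon_statistic(a, b):
    """Wilcoxon signed-rank statistic T: drop zero differences, rank the
    absolute differences (average ranks on ties), and return the smaller
    of the positive and negative rank sums."""
    d = np.asarray(a, float) - np.asarray(b, float)
    d = d[d != 0]
    order = np.abs(d).argsort()
    sorted_abs = np.abs(d)[order]
    pos = np.empty(d.size)
    i = 0
    while i < d.size:                      # average-rank handling of ties
        j = i
        while j < d.size and sorted_abs[j] == sorted_abs[i]:
            j += 1
        pos[i:j] = (i + j + 1) / 2.0       # mean of ranks i+1 .. j
        i = j
    ranks = np.empty(d.size)
    ranks[order] = pos
    return min(ranks[d > 0].sum(), ranks[d < 0].sum())
```

A small T (many paired differences favoring the same method) is what allows rejecting the null hypothesis at the chosen significance level.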


Conclusion

This paper reports the results of experiments that investigate the performance
outcomes of extracting different texture descriptors from matrices that were generated by
reshaping the original feature vector. The study also reports the performance gains
offered by combining texture descriptors with vector-based descriptors.

Table 4: Performance comparison of our ensembles with stand-alone approaches and previous results.

DATASET MR1 MR2 MR3 (Loris Nanni et al., 2012) MR3+(Loris Nanni et al., 2012) MR4 MR5 1D
breast 99.2 99.2 99.4 97.4 99.3 99.3 99.4 99.3
heart 90.2 90.3 89.9 90.1 90.4 90.5 90.5 89.5
pima 80.8 80.9 81.8 71.9 81.3 82.3 82.5 82.4
sonar 94.3 94.3 93.0 92.8 93.2 95.4 95.6 95.2
iono 98.4 98.4 98.2 98.4 98.4 98.3 98.2 98.1
liver 73.9 73.7 74.8 70.3 73.6 76.2 75.8 75.6
hab 67.6 67.8 69.0 59.2 65.8 70.0 69.1 70.1
vote 97.7 97.7 97.7 97.7 97.7 98.5 98.5 98.5
aust 91.7 91.7 92.0 90.8 91.7 92.1 92.4 92.0
trans 67.2 69.5 70.6 61.9 65.8 72.5 73.0 72.9
wdbc 99.5 99.5 99.5 98.8 99.5 99.6 99.6 99.6
bCI 96.1 96.4 96.2 97.0 96.8 96.3 96.4 95.6
pap 86.1 87.2 87.3 88.0 87.5 87.5 87.4 86.8
torn 94.1 94.5 94.6 93.6 94.7 94.2 94.5 90.2
gCr 77.2 78.4 79.7 78.9 79.7 80.7 80.7 80.1
Average 87.6 88.0 88.2 85.8 87.7 88.9 88.9 88.4

This study expands our previous research in this area. First, it investigates different
methods for matrix representation in pattern classification. We found that approaches
based on FFT worked best. Second, we explored the value of using different texture
descriptors to extract a high performing set of features. Finally, we tested the
generalizability of our new approach across several datasets representing different
classification problems. The results of our experiments showed that our methods
outperformed SVMs trained on the original 1D feature sets.
Because each pixel in a texture describes a pattern that is extracted starting from the
original feature, we were also motivated to investigate the correlation among the original
features belonging to a given neighborhood. Thus, we studied the correlation among
different sets of features by extracting images from each pattern and then randomly
sorting the features of the original pattern before the matrix generation process. This
simple method also resulted in improved performance.
In the future we plan on studying the potential of improving performance of the
proposed approach by fusing the different texture descriptors.


References

Beymer, D., & Poggio, T. (1996). Image representations for visual learning. Science,
272(5270), 1905-1909.
Chan, C., Tahir, M., Kittler, J., & Pietikainen, M. (2013). Multiscale local phase
quantisation for robust component-based face recognition using kernel fusion of
multiple descriptors. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 35(5), 1164-1177.
Chen, S., Wang, Z., & Tian, Y. (2007). Matrix-pattern-oriented Ho-Kashyap classifier
with regularization learning. Pattern Recognition, 40(5), 1533-1543.
Chen, S. C., Zhu, Y. L., Zhang, D. Q., & Yang, J. (2005). Feature extraction approaches
based on matrix pattern: MatPCA and MatFLDA. Pattern Recognition Letters, 26.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection.
Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal
of Machine Learning Research, 7, 1-30.
Eustice, R., Pizarro, O., Singh, H., & Howland, J. (2002). UWIT: Underwater image
toolbox for optical image processing and mosaicking in MATLAB. Paper presented at
the International Symposium on Underwater Technology, Tokyo, Japan.
Felzenszwalb, P., & McAuley, J. (2011). Fast inference with min-sum matrix product.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2549-
Guo, Z., Zhang, L., & Zhang, D. (2010). A completed modeling of local binary pattern
operator for texture classification. IEEE Transactions on Image Processing, 19(6),
1657-1663. doi: 10.1109/TIP.2010.2044957
Jantzen, J., Norup, J., Dounias, G., & Bjerregaard, B. (2005). Pap-smear benchmark data
for pattern classification. Paper presented at the Nature inspired Smart Information
Systems (NiSIS), Albufeira, Portugal.
Junior, G. B., Cardoso de Paiva, A., Silva, A. C., & Muniz de Oliveira, A. C. (2009).
Classification of breast tissues using Moran's index and Geary's coefficient as texture
signatures and SVM. Computers in Biology and Medicine, 39(12), 1063-1072.
118 Loris Nanni, Sheryl Brahnam and Alessandra Lumini

Kim, C., & Choi, C.-H. (2007). A discriminant analysis using composite features for
classification problems. Pattern Recognition, 40(11), 2958-2966.
Li, D., Zhu, Y., Wang, Z., Chong, C., & Gao, D. (2016). Regularized matrix-pattern-
oriented classification machine with universum. Neural Processing Letters.
Li, J., Janardan, R., & Li, Q. (2002). Two-dimensional linear discriminant analysis.
Advances in neural information processing systems, 17, 1569-1576.
Lichman, M. (2013). UCI Machine Learning Repository
(http://www.ics.uci.edu/~mlearn/MLRepository.html). Irvine, CA.
Liu, J., & Chen, S. C. (2006). Non-iterative generalized low rank approximation of
matrices. Pattern Recognition Letters, 27(9), 1002-1008.
Nanni, L. (2011). Texture descriptors for generic pattern classification problems. Expert
Systems with Applications, 38(8), 9340-9345.
Nanni, L., Brahnam, S., & Lumini, A. (2012). Matrix representation in pattern
classification. Expert Systems with Applications, 39(3), 3031-3036.
Nanni, L., & Lumini, A. (2008). A reliable method for cell phenotype image
classification. Artificial Intelligence in Medicine, 43(2), 87-97.
Ojala, T., Pietikainen, M., & Maenpaa, T. (2002). Multiresolution gray-scale and
rotation invariant texture classification with local binary patterns. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 24(7), 971-987.
Ojansivu, V., & Heikkila, J. (2008). Blur insensitive texture classification using local
phase quantization. Paper presented at the ICISP.
Seung, D., & Lee, L. (2001). Algorithms for non-negative matrix factorization. Advances
in neural information processing systems, 13, 556-562.
Trafalis, T. B., Ince, H., & Richman, M. B. (2003). Tornado detection with support
vector machines. Paper presented at the International Conference on Computational
Science, Berlin and Heidelberg.
Wang, H., & Ahuja, N. (2005). Rank-r approximation of tensors using image-as-matrix
representation. Paper presented at the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition.
Wang, Z., & Chen, S. C. (2008). Matrix-pattern-oriented least squares support vector
classifier with AdaBoost. Pattern Recognition Letters, 29, 745-753.
Wang, Z., Chen, S. C., Liu, J., & Zhang, D. Q. (2008). Pattern representation in feature
extraction and classification: matrix versus vector. IEEE Transactions on Neural
Networks, 19, 758-769.
Weston, J., Collobert, R., Sinz, F., Bottou, L., & Vapnik, V. (2006). Inference with the
universum. Paper presented at the International conference on machine learning.
Yang, J., Zhang, D., Frangi, A. F., & Yang, J. Y. (2004). Two-dimensional PCA: A new
approach to appearance-based face representation and recognition. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 26(1), 131-137.

Zhang, D., Jing, X., & Yang, J. (2006). Biometric image discrimination technologies.
Hershey: Idea Group Publishing.


About the Authors

Dr. Loris Nanni is an Associate Professor at the Department of Information Engineering of the University of Padua. He carries out research at DEI, University of Padua, in
the fields of biometric systems, pattern recognition, machine learning, image databases,
bioinformatics. He has served extensively as a referee for international journals (IEEE
Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition,
Bioinformatics; BMC Bioinformatics, Pattern Recognition Letters) and projects. He is
coauthor of more than 200 research papers. He has an H-index of 38 and more than 4815
citations (Google Scholar).

Dr. Sheryl Brahnam is a Professor of Computer Information Systems at Missouri
State University. Her research interests include decision support systems, artificial
intelligence and computer vision, cultural, ethical and rhetorical aspects of technology,
and conversational agents (chatterbots and artificial humans).

Dr. Alessandra Lumini is an Associate Researcher at the Department of Computer
Science and Engineering (DISI) of the University of Bologna. She received a degree in
Computer Science from the University of Bologna, Italy, on March 26th 1996. In 2001
she received the Ph.D. degree for her work on "Image Databases". Now she is an
Associate Researcher at the University of Bologna. She is a member of the Biometric
Systems Lab and of the Smart City Lab. She is interested in Biometric Systems, Pattern
Recognition, Machine Learning, Image Databases, Multidimensional Data Structures,
Digital Image Watermarking, and Bioinformatics.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 6



Alfonso T. Sarmiento¹,* and Edgar Gutierrez²

¹ Program of Industrial Engineering, University of La Sabana, Chía, Colombia
² Center for Latin-American Logistics Innovation, Bogota, Colombia


Abstract

This chapter proposes the solution of an optimization problem based on the concept
of the accumulated deviations from equilibrium (ADE) to eliminate instability in the
supply chain. The optimization algorithm combines the ability of particle swarm
optimization (PSO) to determine good regions of the search space with a local search
that finds the optimal point within those regions. The local search uses a Powell
hill-climbing (PHC) algorithm as an improvement procedure applied to the solution
obtained from the PSO algorithm, which assures
a fast convergence of the ADE. The applicability of the method is demonstrated by using
a case study in the manufacturing supply chain. The experiments showed that solutions
generated by this hybrid optimization algorithm were robust.

Keywords: particle swarm optimization, instability, hybrid optimization

Corresponding Author Email:


Introduction

During the last decade, manufacturing enterprises have been under pressure to
compete in a market that is rapidly changing due to global competition, shorter product
life cycles, dynamic changes of demand patterns and product varieties and environmental
standards. In these global markets, competition is ever increasing and companies are
widely adopting customer-focused strategies in integrated-system approaches. In
addition, push manufacturing concepts are being replaced by pull concepts and notions of
quality systems are getting more and more significant.
Policy analysis as a method to generate stabilization policies in supply chain
management (SCM) can be addressed by getting a better understanding of the model
structure that determines the supply chain (SC) behavior. The main idea behind this
structural investigation is that the behavior of a SC model is obtained by adding
elementary behavior modes. For linear models, the eigenvalues represent these different
behavior modes, the superposition of which gives rise to the observed behavior of the
system. For nonlinear systems, the model has to be linearized at each point in time. Finding
the connection between structure and behavior provides a way to discover the parts of the
model where policies can be applied to eliminate instabilities. However, other techniques are
required to determine the best values of the parameters related to the stabilization policy.
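For a linear(ized) model dx/dt = Ax, the behavior modes are exactly the eigenvalues of A. The two-state toy system below is our own hypothetical illustration, not a model from this chapter: complex eigenvalues with a positive real part signal growing oscillations, i.e., instability.

```python
import numpy as np

# A linear(ized) SC model dx/dt = A x: the eigenvalues of A are the behavior
# modes. Complex eigenvalues with a positive real part mean growing oscillations.
A = np.array([[0.0, 1.0],
              [-4.0, 0.2]])        # hypothetical 2-state system (illustrative)
eig = np.linalg.eigvals(A)          # roughly 0.1 +/- 2.0j for this matrix
unstable = bool(np.any(eig.real > 0))
```

A stabilization policy would correspond to a structural or parameter change that moves all eigenvalue real parts into the negative half-plane.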
This work is motivated by the large negative impacts of supply chain instabilities.
Those impacts occur because instabilities can cause (1) oscillations in demand forecasts,
inventory levels, and employment rates and (2) unpredictability in revenues and profits.
These impacts amplify risk, raise the cost of capital, and lower profits. Modern enterprise
managers can minimize these negative impacts by having the ability to determine
alternative policies and plans quickly.
Due to the dynamic changes in the business environment, managers today rely on
decision technology¹ more than ever to make decisions. In the area of supply chain, the
top projected activities where decision technology applications have great potential of
development are planning, forecasting, and scheduling (Poirier and Quinn, 2006).
This chapter presents a methodology that proposes a hybrid scheme for a policy
optimization approach with PSO to modify the behavior of entire supply chains in order
to achieve stability.
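A compact version of such a hybrid can be sketched as below, with plain PSO for the global phase and a simple coordinate-wise hill climb standing in for the Powell step. All parameter values and names here are our own assumptions, not the chapter's implementation.

```python
import numpy as np

def pso_then_local(f, dim, bounds, n_particles=20, iters=60, seed=0):
    """Global phase: plain PSO locates a promising region of the search space.
    Local phase: a coordinate-wise hill climb refines the PSO solution (a
    stand-in for Powell hill-climbing; all constants are illustrative)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    for _ in range(iters):
        gbest = pbest[pbest_f.argmin()]
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
    best = pbest[pbest_f.argmin()].copy()
    step = (hi - lo) * 0.1
    while step > 1e-6:                      # shrink the probe size geometrically
        moved = False
        for d in range(dim):
            for s in (step, -step):
                trial = best.copy()
                trial[d] = np.clip(trial[d] + s, lo, hi)
                if f(trial) < f(best):
                    best, moved = trial, True
        if not moved:
            step *= 0.5
    return best
```

In the chapter's setting, `f` would be the ADE objective evaluated by simulating the supply chain model under the candidate policy parameters.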

Policy Optimization

The policy optimization process uses methods based on mathematical programming
and algorithmic search to find an improved policy. Several optimization methods have
been used to obtain policies that modify system behavior.

¹ Decision technology adds value to network infrastructure and applications by making them smarter.

Burns and Malone (1974)
expressed the required policy as an open-loop solution (i.e., the solution function does
not include the state variables of the system). The drawback of this method is that if the system
is perturbed by some small disturbance, the open-loop solution without information feedback
cannot adjust itself to the new state. Keloharju (1982) proposed a method of iterative
simulation where each iteration consists of a parameter optimization. He suggests
predefining the policy structure by allowing certain parameters of the model to be
variables and by adding new parameters. However, the policies obtained with
Keloharju’s method are not robust when subject to variations of external inputs because
the policy structure was predefined and thereafter optimized (Macedo, 1989). Coyle
(1985) included structural changes to the model, and applies the method to a production
Kleijnen (1995) presented a method that includes design of experiments and response
surface methodology for optimizing the parameters of a model. The approach treats
system dynamics (SD) as a black box, creating a set of regression equations to
approximate the simulation model. The statistical design of experiments is applied to
determine which parameters are significant. After dropping the insignificant parameters,
the objective function is optimized by using the Lagrange multiplier method. The
parameter values obtained through the procedure are the final solution. Bailey et al.
(2000) extended Kleijnen’s method by using response surfaces not to replace the
simulation models with analytic equations, but instead to direct attention to regions
within the design space with the most desirable performance. Their approach identifies
the exploration points surrounding the solution of Kleijnen’s method and then finds the
best combination of parameters among them (Chen and Jeng, 2004).
Grossmann (2002) used genetic algorithms (GA) to find optimal policies. He
demonstrates his approach in the Information Society Integrated System Model where he
evaluates different objective functions. Another method that uses genetic algorithms to
search the solution space is the one proposed by Chen and Jeng (2004). First, they
transform the SD model into a recurrent neural network. Next, they use a genetic
algorithm to generate policies by fitting the desired system behavior to patterns
established in the neural network. Chen and Jeng claim their approach is flexible in the
sense that it can find policies for a variety of behavior patterns including stable
trajectories. However, the transformation stage might become difficult when SD models
reach real-world sizes.
In optimal control applied to system dynamics, Macedo (1989) introduced a mixed
approach in which optimal control and traditional optimization are sequentially applied in
the improvement of the SD model. Macedo’s approach consists principally of two
models: a reference model and a control model. The reference model is an optimization
model whose main objective is to obtain the desired trajectories of the variables of
interest. The control model is an optimal linear-quadratic control model whose
124 Alfonso T. Sarmiento and Edgar Gutierrez

fundamental goal is to reduce the difference between the desired trajectories (obtained by
solving the reference model) and the observed trajectories (obtained by simulating the
system dynamics model).

Stability Analysis of the Supply Chain

The main objective in stability analysis is to determine whether a system that is
pushed slightly from an equilibrium state (a state in which system variables do not change
over time) will
return to that state. If for small perturbations or disturbances from the equilibrium state
the system always remains within a finite region surrounding that state, then this
equilibrium state is stable. However, if a system tends to continue to move away from its
original equilibrium state when perturbed from it, the system is unstable.
Sterman (2006) stated that “supply chain instability is a persistent and enduring
characteristic of market economies.” As a result, company indicators such as demand
forecasts, inventory levels, and employment rates fluctuate constantly and irregularly.
Supply chain instability is costly because it creates “excessive inventories, poor customer
service, and unnecessary capital investment” (Sterman, 2006).
In dynamic complex systems like supply chains, a small deviation from the
equilibrium state can cause disproportionately large changes in the system behavior, such
as oscillatory behavior of increasing magnitude over time. Lee et al. (1997) identified
the four main factors contributing to instability in the SC:

 Demand forecast updating: when companies throughout the SC do not share
information about demand, each company must forecast it, which can distort the
demand information passed along the chain.
 Order batching: a company orders a large quantity of a product in one week and
then does not order for many weeks, which distorts the demand forecasts of other
members of the SC because those forecasts are based on orders rather than actual
sales.
 Shortage gaming: when demand for a product exceeds supply, a manufacturer
often rations its product to customers, which causes customers to exaggerate their
orders to ensure that they receive a sufficient amount of the required product.
 Price fluctuations: when the price of a product changes significantly, customers
purchase the product when it is cheapest, causing them to buy in bulk (the order
batching problem).
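The forecast-updating mechanism above can be illustrated with a minimal, self-contained simulation (all numbers and the base-stock ordering rule below are illustrative assumptions, not taken from this chapter): a retailer that smooths its demand forecast and orders to restore a forecast-based inventory target passes more variable orders upstream than the demand it actually faces.

```python
import random

def simulate_retailer(weeks=200, seed=7):
    """Toy single-echelon model: the retailer smooths its demand forecast
    (forecast updating) and orders to restore a forecast-based inventory
    target. Returns (demand history, order history)."""
    random.seed(seed)
    inventory, forecast, weeks_of_cover = 100.0, 50.0, 2.0
    demands, orders = [], []
    for _ in range(weeks):
        demand = max(0.0, random.gauss(50.0, 5.0))
        forecast = 0.7 * forecast + 0.3 * demand         # demand forecast updating
        target = weeks_of_cover * forecast               # desired inventory level
        order = max(0.0, demand + (target - inventory))  # replace usage + close gap
        inventory += order - demand                      # toy: orders arrive at once
        demands.append(demand)
        orders.append(order)
    return demands, orders

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

demands, orders = simulate_retailer()
bullwhip_ratio = variance(orders) / variance(demands)  # > 1: amplification upstream
```

With these assumed parameters the order variance exceeds the demand variance, i.e., the ratio is greater than one, which is the signature of the information distortion discussed above.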

The stability of supply chain models can be analyzed using the vast theory of linear
and nonlinear dynamic systems control. Disney et al. (2000) described a procedure for
optimizing the performance of an industrially designed inventory control system. They
quantify five desirable characteristics of a production-distribution system by drawing on
classical control techniques for use in a modern optimization procedure based on GA.
They demonstrate that their procedure can improve the performance of a production or
distribution control system by fully understanding the trade-off between inventory levels
and factory orders. Riddalls and Bennett (2002) study the stability properties of a
continuous time version of the Beer Distribution Game.
They demonstrate the importance of robust stability, i.e., stability over a range of
production/distribution delays, and show how stockouts in lower echelons can create a
vicious circle of unstable influences in the supply chain. Nagatani and Helbing (2004)
studied several production strategies to stabilize supply chains, expressed as different
specifications of the management function that controls the production speed as a
function of the stock levels. They derive linear stability conditions and carry out
simulations for different control strategies. Ortega and Lin (2004) showed that control
theory can be applied to the production-inventory problem to address issues such as
reduction of inventory variation, demand amplification, and ordering rules optimization.
Linearization is frequently the quickest and easiest way to determine stability of an
equilibrium point (EP) for a nonlinear system. The linearization approach of nonlinear
systems can be used to extend the stability concepts for linear systems (eigenvalue
analysis2) to equilibrium points of nonlinear systems in which deviation from linear
behavior can be presumed small. Mohapatra and Sharma (1985) applied modal control to
analyze and improve a SD model of a manufacturing company that has two departments:
manufacturing and distribution. The eigenvalues of the motion equations are used to
synthesize new policy options. The main strength of using modal control theory is that
new policy structures can be generated mathematically. Drawbacks of modal control
theory include the amount of computation, and the design of realistic policies from the
synthetically generated policies.
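The eigenvalue test behind these linearization approaches can be sketched numerically: evaluate the Jacobian of the model at the equilibrium point and check the real parts of its eigenvalues (the two 2×2 matrices below are made-up illustrations, not taken from any model in this chapter):

```python
import numpy as np

def is_locally_stable(jacobian):
    """An equilibrium point is locally asymptotically stable when every
    eigenvalue of the Jacobian evaluated there has a negative real part
    (i.e., lies in the left half of the complex plane)."""
    return bool(np.all(np.linalg.eigvals(jacobian).real < 0))

# Hypothetical linearized dynamics around an equilibrium point:
damped = np.array([[-0.5, 0.2],
                   [0.1, -0.4]])     # all eigenvalues in the left half-plane
explosive = np.array([[0.3, 1.0],
                      [-0.2, 0.1]])  # complex pair with positive real part
```

Here `is_locally_stable(damped)` returns `True`, while `is_locally_stable(explosive)` returns `False`: the second system exhibits oscillations of growing magnitude, the kind of instability discussed above.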
Control theory has been combined with other approaches to determine stability
conditions. Daganzo (2004) examined the stability of decentralized, multistage supply
chains under arbitrary demand conditions. He uses numerical analysis for conservation
laws to design stable policies. His research looks for intrinsic properties of the inventory
replenishment policies that hold for all customer demand processes and for policies with
desirable properties.
He identifies a simple necessary condition for bullwhip avoidance in terms of a
policy’s gain. Gain is defined as the marginal change in average inventory induced by a
policy when there is a small but sustained change in the demand rate.
It is shown that all policies with positive gain produce the bullwhip effect if they do not
use future order commitments. Perea et al. (2000) proposed an approach for SCM that
relies on dynamic modeling and control theory. The approach is based on two elements: a
framework to capture the dynamics of the SC, and the design of methodical procedures
defined by control laws to manage the SC. They test several heuristic control laws and
analyze their impact on the behavior of the SC.
2 Eigenvalues in the right half of the complex plane cause instability, whereas eigenvalues in the left half of the complex plane determine stable systems.
Model structural analysis methods have also been used to eliminate oscillatory
behavior in SC models.
Lertpattarapong (2002) and Gonçalves (2003) used eigenvalue elasticity analysis to
identify the loops that are responsible for the oscillatory behavior of the inventory in the
SC. Then they use the insights about the impact of feedback structures on model behavior
to propose policies for stabilizing the system. These policies are based on inventory
buffers or safety stock. Saleh et al. (2006) used the Behavior Decomposition Weights
(BDW) analysis to identify relevant parameters that stabilize the inventory fluctuations in
a linear inventory-force model. To explore the utility of the method in a SD nonlinear
model they choose a medium-size economic model. In order to perform the BDW
analysis, they linearize the model at a point in time, once the eigenvalues have become
stable. The method provides a partial policy analysis as it studies the effects of changing
individual policy parameters. Currently, the method does not consider the interactions
due to changes in several parameters simultaneously.
Forrester (1982) presented several policies for stabilizing dynamic systems. The first
two approaches, reduction of the frequency of oscillations and an increase in the decay
rate of oscillations, measure the behavior of the whole system and are covered by
the linear system control theory. Other methods such as variance reduction and gain
reduction are focused on the stability of a particular variable of the system. Therefore,
they have to be extended to implement stabilizing policies of the entire system.
Policy optimization provides an efficient method for obtaining SC stabilization
policies. O’Donnell et al. (2006) employed GA to reduce the bullwhip effect and cost in
the MIT Beer Distribution Game. The GA is used to determine the optimal ordering
policy for members of the SC. Lakkoju (2005) uses a methodology for minimizing the
oscillations in the SC based on SD and GA. He applies the variance reduction criterion
proposed by Forrester to stabilize the finished goods inventory of an electronics
manufacturing company.
The literature review on stability analysis of the SC shows that several techniques
have been used to generate stabilization policies. Model structural analysis methods can
provide some insights into how to tackle the behaviors that generate instability of supply
chains modeled as dynamic systems through the identification of the loops responsible
for them. However, these methods rely on sensitivity analysis to design the stabilization
policies. Control theory can support the stabilization methodologies by providing
theoretical concepts to stabilize dynamics systems. One problem with the approaches
based on control theory is the mathematics involved to determine the analytical solution.
Moreover, like the model structural analysis methods, they can require certain
simplifications, such as the linearization of the system (Dangerfield and Roberts, 1996).
On the other hand, policy optimization based on algorithmic search methods that use
simulation represents the most general means for stability analysis of nonlinear systems,
owing to its effectiveness in handling both the general case and most of the special
problems that arise from nonlinearity. However, the objective functions are typically
chosen to represent the stability conditions of each particular model. The use of a generic objective function applied to
stabilize SC models independent of their linear or nonlinear structure has not been found
in the literature surveyed so far.


Particle Swarm Optimization

Optimization techniques based on evolutionary algorithms belong to the class of
direct search strategies, where every considered solution is rated using the objective
function values only. Therefore, no closed form of the problem and no further analytical
information is required to direct the search process towards good or preferably optimal
elements of the search space. For that reason, evolutionary search strategies are well
suited for simulation optimization problems. Additionally, because of their flexibility,
ease of operation, minimal requirements and global perspective, evolutionary algorithms
have been successfully used in a wide range of combinatorial and continuous problems.
The first work on PSO is credited to Eberhart and Kennedy (1995); the method was
originally proposed for simulating social behavior (Kennedy, 1997), and Shi and Eberhart
(1998) later introduced a modified particle swarm optimizer. Recently, comprehensive
reviews of theoretical and experimental work on PSO have been published by Bonyadi
and Michalewicz (2017) and Ab Wahab et al. (2015).
Particle swarm optimization is an algorithm that finds better solutions to a problem
by iteratively trying to improve candidate solutions with respect to a given measure of
quality. It solves a problem by maintaining a population of candidate solutions, called
particles, and moving these particles through the search space according to mathematical
formulas for each particle’s position and velocity. Some limitations of PSO have been
identified by Bonyadi and Michalewicz (2017). They classify the limitations related to
convergence in PSO into four groups: convergence to a point (also known as stability),
patterns of movement, convergence to a local optimum, and expected first hitting time.
PSO performs a population-based search to optimize the objective function. The
population is composed of a swarm of particles that represent potential solutions to the
problem. These particles, which are a metaphor for birds in flocks, fly through the search
space updating their positions and velocities based on the best experience of their own
and the swarm. The swarm moves in the direction of “the region with the higher objective
function value, and eventually all particles will gather around the point with the highest
objective value” (Jones, 2005).
Among the advantages of PSO, it can be mentioned that PSO is conceptually simple
and can be implemented in a few lines of code. In comparison with other stochastic
optimization techniques like GA or simulated annealing, PSO has fewer complicated
operations and fewer defining parameters (Cui and Weile, 2005). PSO has been shown to
be effective in optimizing difficult multidimensional discontinuous problems in a variety
of fields (Eberhart and Shi, 1998), and it is also very effective in solving minimax
problems (Laskari et al. 2002). According to Schutte and Groenwold (2005), a drawback
of the original PSO algorithm proposed by Kennedy and Eberhart is that although the
algorithm is known to converge quickly to the approximate region of the global
minimum, it does not maintain this efficiency when entering the stage where a
refined local search is required to find the minimum exactly. To overcome this
shortcoming, variations of the original PSO algorithm that employ methods with adaptive
parameters have been proposed (Shi and Eberhart 1998, 2001; Clerc, 1999).
Comparisons of the performance of GA and PSO on different optimization
problems appear in the literature. Hassan et al. (2005) compared the performance
of both algorithms using a benchmark test of problems. The analysis shows that PSO is
more efficient than GA in terms of computational effort when applied to unconstrained
nonlinear problems with continuous variables. The computational savings offered by
PSO over GA are not very significant when used to solve constrained nonlinear problems
with discrete or continuous variables. Jones (2005) chose the identification of model
parameters for control systems as the problem area for the comparison. He indicates that
in terms of computational effort, the GA approach is faster, although it should be noted
that neither algorithm takes an unacceptably long time to determine their results.
With respect to accuracy of model parameters, the GA determines values which are
closer to the known ones than does the PSO. Moreover, the GA seems to arrive at its final
parameter values in fewer generations than the PSO. Lee et al. (2005) selected return
evaluation in the stock market as the scenario for comparing GA and PSO. They show that
PSO shares the ability of GA to handle arbitrary nonlinear functions, but PSO can reach
the global optimum in fewer iterations than GA. When finding technical trading rules,
PSO is also more efficient than GA. Clow and White (2004) compared the performance of
GA and PSO when used to train artificial neural networks (weight optimization problem).
They show that PSO is superior for this application, training networks faster and more
accurately than GA does, once properly optimized.
From the literature presented above, it is shown that PSO combined with simulation
optimization is a very efficient technique that can be implemented and applied easily to
solve various function optimization problems. Thus, this approach can be extended to the
SCM area to search for policies using an objective function defined on a general
stabilization concept like the one that is presented in this work.

Powell Hill-Climbing

Hill-climbing methods are heuristics that use an iterative improvement technique and
are based on a single solution search strategy. These methods can only provide local
optimum values, and they depend on the selection of the starting point (Michalewicz and
Fogel, 2000). Some advantages of hill-climbing-based approaches include: (1) very easy
to use (Michalewicz and Fogel, 2000), (2) do not require extensive parameter tuning, and
(3) very effective in producing good solutions in a moderate amount of time (DeRonne
and Karypis, 2007).
The Powell hill-climbing algorithm was developed by Powell (1964) and it is a hill-
climbing optimization approach that searches the objective in a multidimensional space
by repeatedly using single dimensional optimization. The method finds an optimum in
one search direction before moving to a perpendicular direction in order to find an
improvement (Press et al. 1992). The main advantage of this algorithm lies in not
requiring the calculation of derivatives to find an unconstraint minimum of a function of
several variables (Powell, 1964). This allows using the method to optimize highly
nonlinear problems where it can be laborious or practically impossible to calculate the
derivatives. Moreover, it has been shown that a hybrid strategy that uses a local search
method such as hill-climbing can accelerate the search towards the global optimum,
improving the performance of the searching algorithm (Yin et al. 2006; Özcan and Yilmaz).


The Hybrid Algorithm

The method used to solve the optimization problem is a hybrid algorithm that
combines the advantage of PSO to determine good regions of the search space with the
advantage of local optimization to quickly find the optimal point within those regions. In
other words, the local search is an improvement procedure applied to the solution
obtained from the PSO algorithm that assures fast convergence of the ADE.
The local search technique selected was the Powell hill-climbing (PHC) algorithm.
This method was chosen because: (1) it can be applied to solve multi-dimensional
optimization problems, and (2) it is a relatively simple heuristic that does not require the
calculation of derivatives.
The general structure of the method is illustrated in Figure 1. This figure indicates
that the solution to the optimization problem obtained by the PSO algorithm becomes the
initial point to perform a local search using the PHC algorithm. Finally, if the ADE has
converged then the solution provided by the PHC method is the stabilization policy;
otherwise the parameter settings of the PSO algorithm have to be changed in order to
improve the search and make the ADE converge.
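This loop can be sketched as follows (a sketch only: `pso`, `powell`, and the convergence test are stand-ins supplied by the caller, and the toy versions below use random sampling and fixed-step coordinate descent rather than the full algorithms of the next sections):

```python
import random

def hybrid_optimize(J, bounds, pso, powell, converged, max_restarts=3):
    """Sketch of the hybrid scheme: a global PSO stage proposes a region,
    a Powell hill-climbing (PHC) stage refines that solution, and the PSO
    settings are retuned until the convergence test passes."""
    settings = {"n_samples": 200}
    best = None
    for _ in range(max_restarts):
        seed_point = pso(J, bounds, **settings)  # global search stage
        best = powell(J, seed_point)             # local refinement stage
        if converged(best):                      # e.g., deviations small enough
            return best
        settings["n_samples"] *= 2               # retune the global stage
    return best

# Toy stand-ins (illustrative only): random sampling plays the role of PSO
# and fixed-step coordinate descent plays the role of PHC.
def toy_pso(J, bounds, n_samples):
    pts = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_samples)]
    return min(pts, key=J)

def toy_powell(J, p, step=0.01, iters=2000):
    p = list(p)
    for _ in range(iters):
        for d in range(len(p)):
            for delta in (step, -step):
                trial = list(p)
                trial[d] += delta
                if J(trial) < J(p):
                    p = trial
    return p

random.seed(1)
J = lambda q: (q[0] - 1.0) ** 2 + (q[1] - 2.0) ** 2  # toy objective
sol = hybrid_optimize(J, [(-5, 5), (-5, 5)], toy_pso, toy_powell,
                      converged=lambda q: J(q) < 1e-3)
```

The design point is the division of labor: the global stage only has to land near the right basin, because the local stage finishes the descent.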
Figure 1. Optimization algorithm.

Global Search: PSO Algorithm

The algorithm used is called “local best PSO” (Engelbrecht, 2005) and is based on a
social network composed of neighborhoods related to each particle. The algorithm
maintains a swarm of particles, where each particle represents a candidate solution to the
optimization problem. These particles move across the search space communicating good
positions to each other within the neighborhood and adjusting their own position and
velocity based on these good positions. For this purpose, each particle keeps a memory of
its own best position found so far and the neighborhood best position among all the
neighbor particles. The goodness of a position is determined by using a fitness function.
The stopping condition of the algorithm is when the maximum number of iterations has
been exceeded.
The following empirical rules are recommended to guide the selection of the initial
values for the parameters of the PSO algorithm.

Empirical rules for selecting the PSO parameters

Parameter Empirical rule of choice

Swarm size From 20 to 40 (Clerc, 2006)
Inertia weight In ]0,1[ (Shi and Eberhart, 1998)
Cognitive coefficient Suggestion 1.43 (Clerc, 2006)
Social coefficient Suggestion 1.43 (Clerc, 2006)
The steps of the algorithm are described in the following lines.

Step 1) Initialization:
 Set iteration k=0
 Generate N particles pi(0) = [pi1(0), pi2(0),.., pinp(0)], i = 1,..,N; where
pij(0) is randomly selected according to a uniform distribution in the interval
[pj_min, pj_max], j = 1,..,np (a particle i is represented by an np-dimensional
real-valued vector pi).
 Generate velocities vi(0) = [0, 0,..,0], i = 1,..,N.
 Evaluate the fitness of each particle using J(pi(0)), i = 1,..,N.
 Set the initial value of the personal best position vector as yi(0) = pi(0),
i = 1,..,N.
 Determine the neighborhood best position vector ŷi(0) using the formula
J(ŷi(0)) = min{J(yj(0))}, j ∈ Bi, where Bi defines the set of indexes of the
neighbors of particle i.
 Determine the global best position g(0) using the formula
J(g(0)) = min{J(yi(0))}, i = 1,..,N.
 Set the initial value of the inertia weight w(0). Set k' = 0.

Step 2) Iteration updating: Set k = k + 1.
Step 3) Weight updating: If k - 1 - k' ≥ iteration_lag then update the inertia weight
using w(k) = w(k').

Step 4) Velocity updating: Calculate the velocity of particle i by using:

vi(k) = w(k)·vi(k-1) + c1·r1(k)·[yi(k-1) - pi(k-1)] + c2·r2(k)·[ŷi(k-1) - pi(k-1)]

Step 5) Position updating: Based on the updated velocities, each particle changes its
position according to the following equation:

pi(k) = vi(k) + pi(k-1)

Step 6) Personal best updating: Determine the personal best position visited so far by
each particle:
 Evaluate the fitness of each particle using J(pi(k)), i = 1,..,N.
 Set yi(k) = yi(k-1) if J(pi(k)) ≥ J(yi(k-1)), and yi(k) = pi(k) if
J(pi(k)) < J(yi(k-1)).
Step 7) Neighborhood best updating: Determine the neighborhood best position ŷi(k)
visited so far within each neighborhood by using the formula

J(ŷi(k)) = min{J(yj(k))}, j ∈ Bi

Step 8) Global best updating: Determine the global best position g(k) visited so far by
the whole swarm by using the formula

J(g(k)) = min{J(yi(k))}, i = 1,..,N.

If J(g(k)) < J(g(k-1)) then set k' = k.

Step 9) Stopping criteria: If the maximum number of iterations is achieved then stop;
g* = g(k) is the optimal solution. Otherwise go to Step 2.
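Steps 1-9 can be condensed into a short sketch (a minimal "local best" PSO on a toy objective; the ring neighborhood, the fixed inertia weight, and the coefficient values are illustrative assumptions, and the iteration-lag weight update of Step 3 is omitted):

```python
import random

def local_best_pso(J, bounds, n=20, iters=150, w=0.7, c1=1.43, c2=1.43,
                   nbh=3, seed=0):
    """Minimal 'local best' PSO (Steps 1-9): every particle i keeps its
    personal best y_i and consults the best position found in a ring
    neighborhood B_i of size `nbh`; returns the global best g*."""
    rng = random.Random(seed)
    dim = len(bounds)
    p = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]  # positions
    v = [[0.0] * dim for _ in range(n)]                                 # velocities
    y = [list(pi) for pi in p]                                          # personal bests
    for _ in range(iters):
        for i in range(n):
            # Ring neighborhood B_i = {i-1, i, i+1} (mod n) when nbh = 3:
            hood = [(i + k) % n for k in range(-(nbh // 2), nbh // 2 + 1)]
            yhat = min((y[j] for j in hood), key=J)  # neighborhood best
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (y[i][d] - p[i][d])
                           + c2 * r2 * (yhat[d] - p[i][d]))
                p[i][d] += v[i][d]                   # position update
            if J(p[i]) < J(y[i]):                    # personal best update
                y[i] = list(p[i])
    return min(y, key=J)

# Toy usage: minimize a sphere function with optimum at (3, -1).
g = local_best_pso(lambda x: (x[0] - 3) ** 2 + (x[1] + 1) ** 2,
                   bounds=[(-10, 10), (-10, 10)])
```

The ring topology slows the spread of the best-known position through the swarm, which is what distinguishes this "local best" variant from a global-best PSO.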

Local Search: Powell Hill-Climbing Algorithm

The PHC method uses one-dimensional minimization algorithms to solve
multidimensional optimization problems. The procedure searches a region by constructing
a set of linearly independent, mutually “non-interfering” or conjugate search directions
and applies linear minimization to move into each direction (Press et al. 1992). The
number of conjugate directions coincides with the dimension of the search space and
their linear independence guarantees the whole search space can be covered. The use of
conjugate directions has the advantage that minimization in one direction is not interfered
by subsequent minimization along another direction, avoiding endless cycling through
the set of directions.
The steps of the algorithm are described in the following lines:

Step 1) Initialization:
 Set iteration k = 0.
 Set the initial search point Z0 = [z1, z2,.., znp] as the optimal solution of the
PSO algorithm, i.e., Z0 = g*.
 Initialize the directions ud to the basis vectors, i.e., ud = ed, d = 1,..,np, where
e1 = [1, 0,..,0], e2 = [0, 1,..,0],..., enp = [0, 0,..,1].

Step 2) Define the iteration start point: Set S0 = Zk.

Step 3) Minimize the objective function along each direction ud:
For every direction d = 1,..,np
 Find the value λd that minimizes J(Sd-1 + λd ud).
 Set Sd = Sd-1 + λd ud.

Step 4) Update the directions:
 Set ud = ud+1, d = 1,..,np-1.
 Set unp = Snp - S0.

Step 5) Iteration updating: Set k = k + 1.

Step 6) Minimize the objective function along direction unp:
 Find the value λ that minimizes J(S0 + λ unp).
 Set Zk = S0 + λ unp.

Step 7) Stopping criteria: If J(Zk) > J(Zk-1) then stop; Z* = Zk-1 is the optimal
solution. Otherwise go to Step 2.
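Steps 1-7 can be sketched as follows (golden-section search stands in for the unspecified one-dimensional minimizer, and the bracketing interval, tolerances, and toy objective are assumptions):

```python
def golden_section(f, lo=-10.0, hi=10.0, tol=1e-6):
    """One-dimensional minimizer used for each line search (assumes the
    restriction of f to [lo, hi] is unimodal)."""
    phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

def powell(J, z0, max_iter=50, tol=1e-9):
    """Powell hill-climbing (Steps 1-7): line-minimize along each direction,
    then replace the first direction with the displacement S_np - S0."""
    n = len(z0)
    u = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # basis e_d
    z = list(z0)
    for _ in range(max_iter):
        s0, s = list(z), list(z)
        prev = J(z)
        for d in range(n):  # Step 3: minimize along each u_d
            lam = golden_section(lambda t: J([si + t * ui
                                              for si, ui in zip(s, u[d])]))
            s = [si + lam * ui for si, ui in zip(s, u[d])]
        u = u[1:] + [[si - s0i for si, s0i in zip(s, s0)]]  # Step 4: new direction
        lam = golden_section(lambda t: J([s0i + t * ui
                                          for s0i, ui in zip(s0, u[-1])]))
        cand = [s0i + lam * ui for s0i, ui in zip(s0, u[-1])]  # Step 6
        z = cand if J(cand) < J(s) else s  # keep the better of the two points
        if prev - J(z) < tol:              # Step 7: no further improvement
            break
    return z

# Toy usage: quadratic with correlated variables, minimum at (2, -3).
zopt = powell(lambda x: (x[0] - 2) ** 2 + 2 * (x[1] + 3) ** 2
              + 0.5 * (x[0] - 2) * (x[1] + 3),
              z0=[5.0, 5.0])
```

Because no derivatives appear anywhere, the same sketch applies when J(p) is only available by running a simulation.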


Case Study

PMOC Technologies Inc. is a manufacturer of optical solutions for medical,
industrial, communications, defense, test, and measurement applications. The precision
molded optics (PMO) process produces lenses for industrial laser and other optical
applications and is the focus of the simulation model.
PMOC Inc. has built its reputation on providing customized products to long-term
customers who have designed their equipment to use PMOC lenses. Lenses make up to
65% of the company’s operations. It has a stable base of around 1,700 customers that are
willing to pay relatively more than traditional market prices. This has helped PMOC Inc.
maintain a stable market share over the past few years despite using an old
manufacturing technology with limited capacity. Due to a long-term plan to move the
lens operations to Asia, the company desires to continue serving its customer base using
existing workers and overtime.
The company depends on its stable base of customers, who will continue to rely on
PMOC specially designed lenses until they upgrade to new technologies. The company,
however, must minimize expenses in the form of scrap and maintain stable operations.
The goal of management is to find a policy that avoids large oscillations in the inventory
if the expected increase in customer orders for regular types of lenses occurs.
SD Model

The nonlinear SD model used in this case study is a subsystem of the enterprise
system developed by Helal (2008). It focuses on the production process of PMOC and is
composed of the following submodels: (1) the supplier submodel, (2) the labor
management submodel, and (3) the internal supply chain submodel. These submodels are described and
depicted below.
The supplier submodel (Figure 2) represents how the capacity of the supplier affects
the rate at which the company orders raw materials (Parts Order Rate). To simplify the
model it is assumed that only one supplier provides raw materials to PMOC. The state
variables of this model are Supplier Production Capacity and Supplier Order Backlog.
The labor management submodel (Figure 3) estimates the required capacity level
(including overtime when necessary) based on the production rate obtained from the
production planning. The opening positions for recruiting new workers are represented in
the state variable Labor Being Recruited. Labor being recruited moves to become Labor
(get hired) after some hiring delay, according to the Labor Hiring Rate. Similarly, Labor
can be fired or leave the company voluntarily at the Labor Firing Rate.

Figure 2. PMOC model: Supplier submodel.

Figure 3. Labor management submodel.

The internal supply chain submodel consists of two overlapping constructs. The first
construct is the materials ordering and inventory. The state variables for this part of the
model are Parts on Order and Parts Inventory. The usage rate of parts (raw material)
taken from Parts Inventory to be converted into semi-finished products (WIP inventory)
is given by the Production Start Rate. The second construct is the production planning.
This part of the model regulates the WIP inventory at the Preforms and Presses
departments to ensure a smooth production rate and the availability of the final products
for shipping. The state variables of this part of the model are Preforms WIP, Presses
WIP, and Finished Goods Inventory.

Current Policy and SC Instability

The set of parameters in Table 1 defines the current policy for this supply chain.

Table 1. Parameter values for the current policy

Parameter Value Unit

Desired Days Supply of Parts Inventory 2 Weeks
Time to Correct Parts Inventory 1 Weeks
Preforms Cycle Time 3 Weeks
Presses Cycle Time 3 Weeks
Time to Correct Inventory 1 Weeks
Supplier Delivery Delay 2 Weeks
Time to Adjust Labor 1 Weeks
Labor Recruiting Delay 5 Weeks
[Figure 4 plots the trajectories of Preforms WIP Level (units), Presses WIP Level (units), Finished Goods Inventory (units), and Labor (people) over 0-30 weeks.]

Figure 4. Behavior of variables of interest for the current policy.

For a customer order rate of 5,000 units/week the system starts out of equilibrium.
The behavior of the four variables of interest is depicted in Figure 4. The variables
Preforms WIP Level, Presses WIP Level, and Labor show several oscillatory fluctuations.
The variable Finished Goods Inventory is starting to settle down, although it has not
reached equilibrium. A new policy to minimize these oscillations will be determined by
solving the optimization problem presented in the next section.

Optimization Problem

This optimization problem considers the simultaneous stabilization of the following

state variables: Preforms WIP Level, Presses WIP Level, Finished Goods Inventory and
Labor according to the equations described in section 3.1.2.
Let x1 = Preforms WIP Level, x2 = Presses WIP Level, x3 = Finished Goods
Inventory, x4 = Labor
Let ai = the new equilibrium point associated with the ith state variable (i = 1,..,4)
The following weights were assigned: w1 = 0.4, w2 = 0.4, w3 = 0.1, w4 = 0.1, to
reflect management’s concern with inventory and the fact that variables x1 and x2 exhibit
larger oscillations. The time horizon (T) considered was 30 weeks.
Minimize J(p) = Σs=1,2 [ 0.4 ∫0^30 |xs(t) − as| dt ] + Σs=3,4 [ 0.1 ∫0^30 |xs(t) − as| dt ]

Subject to
Simulation Optimization Using a Hybrid Scheme … 137

ẋ(t) = f(x(t), p)   (this notation represents the SD model equations)
x(0) = x₀   (vector with initial values of all state variables)
0.5 ≤ Desired Days Supply of Parts Inventory ≤ 5
0.5 ≤ Time to Correct Parts Inventory ≤ 5
0.5 ≤ Preforms Cycle Time ≤ 3
0.5 ≤ Presses Cycle Time ≤ 3
0.5 ≤ Time to Correct Inventory ≤ 5
0.5 ≤ Supplier Delivery Delay ≤ 5
0.5 ≤ Time to Adjust Labor ≤ 5
0.5 ≤ Labor Recruiting Delay ≤ 5
5000 ≤ a1 ≤ 50000
5000 ≤ a2 ≤ 50000
1000 ≤ a3 ≤ 50000
10 ≤ a4 ≤ 100
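As a sketch of how the objective can be evaluated numerically, the fragment below integrates the weighted absolute deviations of the four state variables from their equilibrium levels over the 30-week horizon. The SD model equations are not reproduced in this excerpt, so `demo_simulate` is a hypothetical stand-in trajectory generator; only `objective_J` mirrors the structure of J(p).

```python
import numpy as np

def objective_J(simulate, p, a, weights=(0.4, 0.4, 0.1, 0.1), T=30.0, dt=0.25):
    """Evaluate J(p): weighted integral over [0, T] of |x_s(t) - a_s|
    for the four state variables. `simulate` stands in for the SD model."""
    t = np.arange(0.0, T + dt, dt)
    x = simulate(p, t)                            # trajectory, shape (len(t), 4)
    dev = np.abs(x - np.asarray(a, dtype=float))  # |x_s(t) - a_s|
    # trapezoidal rule for each state variable's deviation integral
    integrals = ((dev[1:] + dev[:-1]) * 0.5 * np.diff(t)[:, None]).sum(axis=0)
    return float(np.dot(weights, integrals))

# toy stand-in dynamics: exponential approach to the equilibrium points,
# with p[0] acting as a time constant (hypothetical, for illustration only)
def demo_simulate(p, t):
    a = np.array([8828.0, 13739.0, 3275.0, 44.0])
    x0 = np.array([16000.0, 21000.0, 2000.0, 20.0])
    return a + (x0 - a) * np.exp(-t / p[0])[:, None]

J_fast = objective_J(demo_simulate, [1.0], [8828, 13739, 3275, 44])
J_slow = objective_J(demo_simulate, [4.0], [8828, 13739, 3275, 44])
```

A slower-settling trajectory accumulates a larger deviation integral, so J penalizes exactly the oscillatory, slow-converging behavior the policy search tries to remove.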

Stabilization Policy

The stabilization policy is obtained after solving the optimization problem presented
in the previous section. The optimization algorithm was run at time 0 using the following
settings: swarm size = 30 particles, neighborhood size = 3 particles, initial inertia
weight = 0.5, iteration lag = 5, cognitive coefficient = 1.2, social coefficient = 1.2. The
time to obtain the optimal policy (after 150 PSO iterations and 1,243 PHC iterations) was
89 seconds.
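The hybrid search described above can be sketched as a global PSO pass with the stated inertia and acceleration coefficients, followed by a local refinement stage. The PHC algorithm's internals and the neighborhood topology are not detailed in this excerpt, so a simple coordinate-wise hill climb and a global-best topology are used here as stand-ins.

```python
import random

def pso_then_hillclimb(f, bounds, swarm=30, iters=150, w=0.5, c1=1.2, c2=1.2,
                       hc_iters=1243, hc_step=0.1, seed=0):
    """Minimize f over box bounds: PSO global search, then hill-climb refinement."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(swarm), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # keep each coordinate inside its bound
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]), bounds[d][1])
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    # local refinement: perturb one coordinate at a time, keep improvements
    step = hc_step
    for _ in range(hc_iters):
        d = rng.randrange(dim)
        cand = gbest[:]
        cand[d] = min(max(cand[d] + rng.uniform(-step, step), bounds[d][0]), bounds[d][1])
        fc = f(cand)
        if fc < gbest_f:
            gbest, gbest_f = cand, fc
        else:
            step *= 0.999  # slowly tighten the search radius on failures
    return gbest, gbest_f

best, val = pso_then_hillclimb(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)
```

The division of labor mirrors the chapter's design: PSO identifies a promising region cheaply, and the local stage spends its 1,243 iterations polishing a single candidate rather than moving the whole swarm.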

[Time-series plot, 0–30 weeks: Preforms WIP Level (Units), Presses WIP Level (Units), Finished Goods Inventory (Units), Labor (People)]
Figure 5. Behavior of variables of interest for the stabilization policy.


The solution yielded the results shown in Table 2. This table also includes parameters
a1, a2, a3, and a4, which are the new equilibrium points for the state variables of interest.
Figure 5 shows the behavior of the state variables when this revised policy is applied.
The system reaches equilibrium in approximately 9 weeks (response time). The figure
also shows that the convergence of the ADE has produced asymptotic stability in the
four variables of interest. This was achieved mainly by increasing the parameter values
Desired Days Supply of Parts Inventory, Time to Correct Parts Inventory, and Supplier
Delivery Delay, and by decreasing several other parameter values, including Labor
Recruiting Delay, Preforms Cycle Time, and Presses Cycle Time.

Table 2. Parameter values for the stabilization policy

Parameter Value Unit

Desired Days Supply of Parts Inventory 3.46 Weeks
Time to Correct Parts Inventory 2.79 Weeks
Preforms Cycle Time 1.36 Weeks
Presses Cycle Time 1.70 Weeks
Time to Correct Inventory 1.47 Weeks
Supplier Delivery Delay 2.93 Weeks
Time to Adjust Labor 1.24 Weeks
Labor Recruiting Delay 0.5 Weeks
a1 (EP for Preforms WIP Level) 8828 Units
a2 (EP for Presses WIP Level) 13739 Units
a3 (EP for Finished Goods Inventory) 3275 Units
a4 (EP for Labor) 44 People





[Time-series plot, 0–30 weeks: Maximum Capacity (units/week)]

Figure 6. Maximum capacity of lenses manufacturing department.


This stabilization policy was reached using the maximum production capacity of
5,600 units/week, as shown in Figure 6. This is due to the manpower constraint in the
lenses manufacturing department.

Testing for Policy Robustness

To test the stabilization policy, a sudden change in the customer order rate is
generated in week 10. The values for the new EPs are shown in Table 3.

Table 3. New equilibrium points for changes in the customer order rate

Percentage change in New EP for Preforms New EP for Presses New EP for Finished
customer order rate WIP Level (Units) WIP Level (Units) Goods Inventory (Units)
-15% 8377 13178 3045
-10% 8789 13691 3256
-5% 8828 13739 3275
+10% 8828 13739 3275

The customer order rate is increased or decreased to new levels calculated as a
percentage of its initial value, as displayed in Figure 7. Figures 8, 9 and 10 depict the
robust behavior of the Preforms WIP Level, Presses WIP Level, and Finished Goods
Inventory variables under these changes in customer orders.


[Time-series plot, 0–30 weeks: Customer Order Rate at −15%, −10%, −5%, and +10%]

Figure 7. Changes in customer orders to test policy robustness.



[Time-series plot, 0–30 weeks: Preforms WIP Level under −15%, −10%, −5%, and +10% changes in customer orders]

Figure 8. Behavior of Preforms WIP Level due to changes in customer orders.


[Time-series plot, 0–30 weeks: Presses WIP Level under −15%, −10%, −5%, and +10% changes in customer orders]

Figure 9. Behavior of Presses WIP Level due to changes in customer orders.

The EP levels of the three inventory variables remain the same for a 10% increment
in customer orders. The reason is simple: the stabilization policy was reached using the
maximum production capacity, so orders above the original customer order rate are
considered backlog and therefore do not affect the production rates or the stability.
Similarly, for a 5% decrease in customer orders, production remains close to maximum
capacity and the EPs stay the same. When customer orders are decreased by 10% and
15%, the new EPs are also reduced, but by a smaller percentage than the change in
customer orders.

[Time-series plot, 0–30 weeks: Finished Goods Inventory under −15%, −10%, −5%, and +10% changes in customer orders]

Figure 10. Behavior of Finished Goods Inventory due to changes in customer orders.

Stability returns approximately 10 weeks and 16 weeks after the system was
disturbed (response time) for the -10% and -15% decreases in customer orders,
respectively. Amplifications are on the order of 1% below the EPs for both the -10%
and -15% decreases in customer orders.
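The response-time and amplification metrics quoted above can be computed directly from a simulated trajectory, as sketched below. The trajectory used here is hypothetical illustrative data shaped like the described behavior (a dip below the new EP at week 10 that settles), not actual model output.

```python
import numpy as np

def response_time_and_amplification(t, x, ep, band=0.01):
    """Response time: first time after which x(t) stays within +/- band*ep
    of the new equilibrium point ep. Amplification: largest deviation below
    ep, expressed as a fraction of ep."""
    x = np.asarray(x, dtype=float)
    inside = np.abs(x - ep) <= band * ep
    outside = np.nonzero(~inside)[0]          # indices still outside the band
    if outside.size == 0:
        resp = float(t[0])                    # never leaves the band
    elif outside[-1] + 1 < len(t):
        resp = float(t[outside[-1] + 1])      # first time after the last excursion
    else:
        resp = float("inf")                   # does not settle within the horizon
    amplification = max(0.0, float((ep - x.min()) / ep))
    return resp, amplification

# toy trajectory: dips about 1% below the new EP at week 10, then settles
t = np.linspace(0, 30, 301)
ep = 8789.0
x = ep - 90.0 * np.exp(-(t - 10).clip(0) / 3.0) * (t >= 10)
rt, amp = response_time_and_amplification(t, x, ep)
```

With a 1% tolerance band this reproduces the kind of figures reported in the text: a response time shortly after the week-10 disturbance and an amplification on the order of 1% of the EP.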


Conclusions

We propose a hybrid algorithm to obtain quick convergence of the ADE. This
algorithm is based on a search engine that combines the strength of PSO in identifying
the most promising regions of the search space with the properties of the PHC algorithm,
which accelerate locating the optimum that makes the ADE converge. Although finding
the global optimum is not required to obtain a satisfactory reduction in instability, our
hybrid algorithm provides solutions that escape local convergence and lead to
stabilization policies with few oscillations and fast stability. This broader search for
more effective stabilization policies is also possible because we incorporate a theorem
that allows finding the best equilibrium levels that minimize the objective function.
We conclude that the convergence of the ADE generates stabilization policies that
are robust. To test the robustness of these policies, we perturbed the stable system by
changing the value of an exogenous variable. The results show that the variables of
interest reach new equilibrium points after a period of adaptation to the alteration of the
system. Moreover, perturbations generated by sudden changes produce amplifications
before the new EPs are reached. The experiments also show that in most cases the
change of level in the EPs is proportional to the change in the exogenous variable.




Dr. Alfonso Sarmiento is an Associate Professor and Head of the Industrial
Processes Department in the Industrial Engineering Program at the University of La
Sabana, Colombia. He received his bachelor's degree in Industrial Engineering from the
University of Lima, Perú. He earned an M.S. degree from the Department of Industrial
and Systems Engineering at the University of Florida and obtained his PhD in Industrial
Engineering, with emphasis in Simulation Modeling, from the University of Central
Florida. Prior to working in academia, Dr. Sarmiento had more than 10 years of
experience as a consultant in operations process improvement. His current research
focuses on supply chain stabilization methods, hybrid simulation, and enterprise profit
optimization.

Edgar Gutierrez is a Research Affiliate at the Center for Latin-American Logistics
Innovation (CLI) and a Fulbright Scholar pursuing his PhD in Industrial Engineering &
Management Systems at the University of Central Florida (UCF) in Orlando, FL, USA.
His educational background includes a B.S. in Industrial Engineering from the
University of La Sabana (2004, Colombia) and an MSc. in Industrial Engineering from
the University of Los Andes (2008, Colombia); he was a Visiting Scholar at the
Massachusetts Institute of Technology (2009-2010, USA). Edgar has over 10 years of
academic and industry experience in prescriptive analytics and supply chain
management. His expertise includes machine learning, operations research, and
simulation techniques for systems modeling and optimization.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 7



Djordje Cica¹,* and Davorin Kramar²

¹University of Banja Luka, Faculty of Mechanical Engineering, Stepe Stepanovica 71,
Banja Luka, Bosnia and Herzegovina
²University of Ljubljana, Faculty of Mechanical Engineering, Askerceva 6,
Ljubljana, Slovenia


Accurate prediction of cutting forces is essential because of their significant impact
on product quality. During the past two decades, the high pressure cooling (HPC)
technique has been establishing itself as a method for substantially increasing
productivity in the metal cutting industry. This technique has proven very effective in
the machining of hard-to-machine materials such as the nickel-based alloy Inconel 718,
which is characterized by a low-efficiency machining process. However, modeling
cutting forces under HPC conditions is a very difficult task due to the complex relations
between a large number of process parameters, such as the pressure of the jet, the
diameter of the nozzle, the cutting speed, and the feed. One way to overcome this
difficulty is to implement models based on artificial intelligence tools such as artificial
neural networks (ANN), genetic algorithms (GA), particle swarm optimization (PSO),
and fuzzy logic as an alternative to conventional approaches. Regarding feedforward
ANN training, the

Corresponding Author Email:
148 Djordje Cica and Davorin Kramar

most widely used training algorithm is the backpropagation (BP) algorithm. However,
some inherent problems frequently encountered in the use of this algorithm, such as the
risk of being trapped in local minima and a very slow convergence rate during training,
have motivated the development of bio-inspired neural network models. The objective
of this study was to utilize two bio-inspired algorithms, namely GA and PSO, as training
methods for an ANN for predicting cutting forces in the turning of Inconel 718 assisted
with high pressure coolant. The results obtained from the GA-based and PSO-based
ANN models were compared with the most commonly used BP-based ANN in terms of
performance. The analysis reveals that training an ANN using bio-inspired algorithms
provides better solutions than a conventional ANN.

Keywords: cutting forces, high-pressure cooling, neural networks, genetic algorithms, particle swarm optimization


High performance manufacturing is an inclusive term incorporating many existing
theories and approaches to productivity and waste reduction. In recent years, different
cooling techniques have been applied in order to increase the productivity of the
machining process. Tremendous opportunities for improving overall process
performance are offered by the high pressure cooling (HPC) technique, which upgrades
conventional machining by directing high pressure fluid at the tool and the machined
material. The high pressure coolant allows better penetration of the fluid into the
workpiece-tool and chip-tool interfaces, which results in a better cooling effect, reduced
friction, and improved tool life (Diniz & Micaroni, 2007; Kramar & Kopac, 2009;
Wertheim, Rotberg, & Ber, 1992). Furthermore, high pressure coolant reduces the
tool-chip contact length/area, improves chip control, and reduces the consumption of
cutting fluid (Ezugwu & Bonney, 2004).
Due to their mechanical, thermal and chemical properties, nickel-based alloys are
among the most commonly used materials in the aerospace and chemical industries,
power production, environmental protection, etc. However, nickel-based alloys are
considered hard to machine. The poor thermal conductivity of these alloys raises the
temperature at the tool-workpiece interface during conventional machining (Kramar,
Sekulić, Jurković, & Kopač, 2013). Thus, short cutting tool life and low productivity,
due to the low permissible rates of metal removal, are inevitably associated with the
machining of nickel-based alloys. Conventional cooling is not efficient enough to
prevent extreme thermal loading in the cutting zone, so recent studies have focused on
reducing the temperature in the cutting zone by applying different cooling techniques.
Among them, the HPC technique is becoming established as a method for substantially
increasing removal rate and productivity in the metal cutting industry.
The Estimation of Cutting Forces in the Turning of Inconel 718 Assisted … 149

The effect of HPC on the performance of machining of nickel-based alloys has been
investigated by many authors. Ezugwu and Bonney (2004) analyzed tool life, surface
roughness, tool wear, and component forces using high-pressure coolant supplies in
rough turning of Inconel 718 with coated carbide tools. The test results show that an
acceptable surface finish and improved tool life can be achieved using the HPC
technique. Ezugwu and Bonney (2005) investigated the same parameters in finish
machining of Inconel 718 with a coated carbide tool under high-pressure coolant
supplies. The results indicate that an acceptable surface finish and improved tool life can
be achieved with high coolant pressures. Cutting forces increased with increasing cutting
speed, probably due to reactive forces introduced by the high-pressure coolant jet.
Nandy, Gowrishankar, and Paul (2009) investigated the effects of high-pressure coolant
on machining evaluation parameters such as chip form, chip breakability, cutting forces,
coefficient of friction, contact length, tool life, and surface finish. The results show that
significant improvement in tool life and the other evaluation parameters can be achieved
using a moderate range of coolant pressures. Empirical modeling of machining
performance under HPC conditions using Taguchi DOE analysis was carried out by
Courbon et al. (2009). Regression modeling was used to investigate the relationships
between process parameters and machining responses. It was demonstrated that HPC is
an efficient alternative lubrication solution, providing better chip breakability,
reductions in cutting forces, and advantages regarding lubrication and the thermal loads
applied to the tool. Furthermore, this cooling/lubrication technique can improve surface
finish given an optimal pressure/nozzle diameter/cutting speed combination. Colak
(2012) studied cutting tool wear and the cutting force components while machining
Inconel 718 under high pressure and conventional cooling conditions. The experimental
results were analyzed using ANOVA and regression analysis, and proved that tool flank
wear and cutting forces decrease considerably with the delivery of high pressure coolant
to the cutting zone. Klocke, Sangermann, Krämer, and Lung (2011) analyzed the effect
of high-pressure cooling in a longitudinal turning process with cemented carbide tools
on tool wear, cutting tool temperature, the resulting chip forms, and the ratio of cutting
forces to tool-chip contact area. The results suggest that the tool temperature can be
significantly decreased by the use of a high-pressure coolant supply and that, due to the
different tool wear mechanisms and the change in the specific load on the cutting edge
during machining, the resulting tool wear was influenced differently.
One of the most important factors in machining processes is the accurate estimation
of cutting forces, because of their significant impact on product quality. Modeling and
prediction of optimal machining conditions for minimum cutting forces plays a very
important role in machining stability, tool wear, surface finish, and residual stresses. In
this regard, cutting forces have been investigated by many researchers in various
machining processes through the formulation of appropriate estimation models. The
most frequently used models for prediction of cutting forces are mathematical models

based on the geometry and physical characteristics of the machining process. However,
due to the large number of interrelated machining parameters that strongly influence the
cutting forces, it is difficult to develop an accurate analytical cutting force model.
Therefore, over the last few decades, different modeling methods based on artificial
intelligence (AI) have become the preferred trend and are applied by most researchers
for the estimation of different machining process parameters, including cutting forces,
tool wear, and surface roughness. Artificial neural networks (ANN) are by now the most
popular AI method for modeling various machining process parameters.
There are numerous applications of ANN-based modeling of cutting forces in turning
reported in the literature. Szecsi (1999) presented a three-layer feed-forward ANN
trained by the error back-propagation algorithm for modeling cutting forces. Physical
and chemical characteristics of the machined part, cutting speed, feed, average flank
wear, and cutting tool angles were used as input parameters for training the ANN. The
developed model was verified and can be used to define threshold force values in
cutting tool condition monitoring systems. Lin, Lee, and Wu (2001) developed a
prediction model for cutting force and surface roughness using an abductive ANN for
the turning of high carbon steel with carbide inserts. The ANN was trained with depth of
cut, feed, and cutting speed as input parameters. The predicted cutting force and surface
roughness were found to be more accurate than those of regression analysis. Sharma,
Dhiman, Sehgal, and Sharma (2008) developed an ANN model for the estimation of
cutting forces and surface roughness in hard turning. Cutting parameters such as
approach angle, speed, feed, and depth of cut were used as input parameters for training
the ANN. The ANN model gave 76.4% overall accuracy. Alajmi and Alfares (2007)
modeled cutting forces using a back-propagation ANN enhanced by a differential
evolution algorithm. Experimental machining data such as speed, feed, depth of cut,
nose wear, flank wear, and notch wear were used to train and evaluate the model. The
results showed an improvement in the reliability of predicting the cutting forces over
previous work. Zuperl and Cus (2004) developed a supervised ANN approach for
estimating the cutting forces generated during the end milling process. The predictive
capabilities of analytical and ANN models were compared statistically, showing that the
ANN predictions for the three cutting force components were closer to the experimental
data than the analytical method. Aykut, Gölcü, Semiz, and Ergür (2007) used an ANN
for modeling cutting forces in three axes, with cutting speed, feed, and depth of cut as
the input dataset. ANN training was performed using the scaled conjugate gradient
feed-forward back-propagation algorithm. The results show that the ANN model can be
used for accurate prediction of the cutting forces. Cica, Sredanovic, Lakic-Globocki,
and Kramar (2013) investigated the prediction of cutting forces using ANN and adaptive
network-based fuzzy inference systems (ANFIS) as potential modeling techniques. The
experimental research focused on modeling cutting forces under different cooling and
lubricating conditions (conventional, high pressure jet assisted machining, and minimal

quantity lubrication). Furthermore, the effects of cutting parameters such as depth of cut,
feed, and cutting speed on the machining variables were also studied.
However, despite the numerous applications of ANN in modeling cutting forces
reported in the literature, a review shows that no work has been reported on modeling
these parameters under HPC conditions. This can be explained by the complex relations
between the large number of HPC process parameters, such as the pressure of the jet,
the diameter of the nozzle, the cutting speed, and the feed, that influence the cutting
forces and make it difficult to develop a proper estimation model. In this sense, this
paper presents ANN models for the estimation of cutting forces in the turning of Inconel
718 under HPC conditions. First, the cutting forces were modeled using a conventional
ANN, which uses the backpropagation algorithm for learning. In order to overcome the
limitations of the traditional backpropagation algorithm, two bio-inspired computational
techniques, namely the genetic algorithm (GA) and particle swarm optimization (PSO),
were also used as training methods for the ANN. The modeling capacity of the ANN
trained using GA and PSO was compared to that of the conventional ANN.
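To illustrate the PSO-as-trainer idea discussed above, the sketch below optimizes the flattened weight vector of a small feedforward network by minimizing mean squared error with a basic global-best PSO. The network size, the PSO settings, and the toy data are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np

def ann_forward(w, X, n_in=5, n_hid=6):
    """Feedforward net n_in -> n_hid (tanh) -> 1; w is a flat weight vector."""
    i = 0
    W1 = w[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = w[i:i + n_hid]; i += n_hid
    W2 = w[i:i + n_hid].reshape(n_hid, 1); i += n_hid
    b2 = w[i:i + 1]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def train_pso(X, y, n_hid=6, swarm=30, iters=300, seed=0):
    """Train the net's weights with global-best PSO minimizing MSE
    (inertia 0.5, cognitive/social coefficients 1.2, as a simple choice)."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1] * n_hid + n_hid + n_hid + 1
    pos = rng.uniform(-1, 1, (swarm, dim))
    vel = np.zeros((swarm, dim))
    mse = lambda w: float(np.mean((ann_forward(w, X, X.shape[1], n_hid) - y) ** 2))
    pbest, pbest_f = pos.copy(), np.array([mse(p) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    gbest_f = pbest_f.min()
    for _ in range(iters):
        r1, r2 = rng.random((swarm, dim)), rng.random((swarm, dim))
        vel = 0.5 * vel + 1.2 * r1 * (pbest - pos) + 1.2 * r2 * (gbest - pos)
        pos = pos + vel
        for i in range(swarm):
            f = mse(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i].copy(), f
    return gbest, gbest_f

# toy data: a smooth target as a function of 5 scaled inputs (hypothetical,
# standing in for normalized Dn, s, p, vc, f from Table 1)
data_rng = np.random.default_rng(1)
X = data_rng.uniform(0, 1, (40, 5))
y = 0.5 * X[:, :1] + 0.3 * X[:, 1:2] ** 2 + 0.2 * X[:, 4:5]
w, err = train_pso(X, y)
```

Unlike backpropagation, this treats the error surface as a black box, so it needs no gradients and is less prone to stalling in the first local minimum it finds, at the cost of many more error evaluations.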


The experiments were performed on machining nickel-based alloy Inconel 718

supplied as bars (145 mm diameter and 300 mm long) with hardness between 36 and 38
HRC. Machining experiments have been carried out on a conventional lathe, fitted with a
high-pressure plunger pump of 150 MPa pressure and 8 l/min capacity. Standard sapphire
orifices of 0.25, 0.3 and 0.4 mm diameter, commonly used in water jet cutting
applications, were set in a custom-made clamping device that enabled accurate jet
adjustments. The cooling lubricant jet was directed normal to the cutting edge at a low
angle (about 5-6º) with the tool rake face. The nozzle was located 22 mm away from the
tool tip in order to assure its use in the core zone of the jet and avoid variations in the
diameter of the jet and radial distribution of the pressure. The cutting tool inserts used in
the experiments were coated carbide cutting tools – SANDVIK SNMG 120408-23 with
TiAlN coating. Tool was mounted on a PSBNR 2020 K12 tool holder resulting in
positive rake angle (γ = 7º).
The cutting force components (main cutting force Fc, feed force Ff and passive force
Fp) were measured with a three-component dynamometer (Kistler 9259A). The
dynamometer was rigidly mounted on the lathe via a custom designed adapter for the tool
holder so that cutting forces could be accurately measured. Force signals obtained from
the dynamometer were amplified and then transferred to computer. The measurement
chain also included a charge amplifier (Kistler 5001), a data acquisition hardware and a
graphical programming environment for data analysis and visualization. The whole
measurement chain was statically calibrated. Experimental setup is shown on Figure 1.

Figure 1. Experimental setup.

In this research, three levels of diameter of the nozzle Dn, distance between the
impact point of the jet and the cutting edge s, pressure of the jet p, cutting speed vc, and
feed f, were used as the variables for cutting forces modeling (Table 1). The depth of cut
was fixed at 2 mm. With the cutting parameters and their levels defined, a total of
27 experiments were performed, as shown in Table 2.

Table 1. Design factors and their levels

Machining parameters Level 1 Level 2 Level 3
Diameter of the nozzle Dn [mm] 0.25 0.3 0.4
Distance between the impact point of the jet and the cutting edge s [mm] 0 1.5 3
Pressure of the jet p [MPa] 50 90 130
Cutting speed vc [m/min] 46 57 74
Feed f [mm/rev] 0.2 0.224 0.25


Artificial Neural Networks Trained by Backpropagation Algorithm

In recent years, ANN have attracted the attention of many researchers as an effective
modeling tool for a wide range of linear and nonlinear engineering problems that cannot be
solved using conventional methods. An ANN is comprised of a series of information
The Estimation of Cutting Forces in the Turning of Inconel 718 Assisted … 153

Table 2. Input parameters and experimental results

Machining parameters Cutting forces

No. Dn s p vc f Fc Ff Fp
[mm] [mm] [MPa] [m/min] [mm/rev] [N] [N] [N]
Training data set
1 0.25 0 50 46 0.2 1280 615 475
2 0.25 0 90 57 0.224 1295 545 450
3 0.25 1.5 50 46 0.25 1508 645 530
4 0.25 1.5 90 57 0.2 1150 540 425
5 0.25 3 90 57 0.25 1350 660 520
6 0.25 3 130 74 0.2 1150 545 420
7 0.3 0 50 57 0.2 1245 520 400
8 0.3 0 90 74 0.224 1265 505 410
9 0.3 1.5 50 57 0.25 1460 560 485
10 0.3 1.5 130 46 0.224 1145 565 470
11 0.3 3 90 74 0.25 1385 505 405
12 0.3 3 130 46 0.2 1055 565 435
13 0.4 0 50 74 0.2 1187 505 410
14 0.4 0 130 57 0.25 1305 520 440
15 0.4 1.5 90 46 0.2 1160 560 435
16 0.4 1.5 130 57 0.224 1275 530 465
17 0.4 3 90 46 0.25 1375 560 470
18 0.4 3 130 57 0.2 1250 545 430
Testing data set
1 0.25 0 130 74 0.25 1370 570 470
2 0.25 1.5 130 74 0.224 1235 520 440
3 0.25 3 50 46 0.224 1400 630 510
4 0.3 0 130 46 0.25 1390 565 485
5 0.3 1.5 90 74 0.2 1190 475 415
6 0.3 3 50 57 0.224 1320 555 465
7 0.4 0 90 46 0.224 1450 620 475
8 0.4 1.5 50 74 0.25 1465 565 478
9 0.4 3 50 74 0.224 1320 590 460

processing elements (neurons) organized in several layers. These neurons are connected
to each other by weighted links called synapses, which establish the relationship
between input data and output data. There are many ANN models; in this paper, only
multilayer perceptron networks, i.e., feed-forward multilayered networks, were considered.
The structure of these ANN has three types of layers: input layer, hidden layer and
output layer. The biases in the neurons of the hidden and output layers, bk(1) and bk(2),
respectively, are adjusted during data processing.
Before practical application, an ANN needs to be trained. Training, or learning as it is
often referred to, is achieved by minimizing the sum of squared errors between the
predicted and actual outputs of the ANN, by continuously adjusting and finally
determining the weights connecting neurons in adjacent layers. There are several learning
algorithms for ANN, and back-propagation (BP) is currently the most popular training
method, in which the weights of the network are adjusted according to an error correction
learning rule.
Basically, the BP algorithm consists of two phases of data flow through the different
layers of the network: forward and backward. First, the input pattern is propagated from
the input layer to the output layer and, as a result of this forward flow of data, produces
an actual output. Then, in the backward flow of data, the error signals resulting from any
difference between the desired outputs and those obtained in the forward phase are back-
propagated from the output layer to the previous layers, updating the weights and biases
of each node until the input layer is reached. This process is repeated until the error falls
within a prescribed value.
In this paper, a multilayer feed-forward ANN architecture, trained using a BP
algorithm, was employed to develop a predictive model of cutting forces in machining
Inconel 718 under HPC conditions. The network structure consists of five neurons in the input layer
(corresponding to five inputs: diameter of the nozzle, distance between the impact point
of the jet and the cutting edge, pressure of the jet, cutting speed, and feed) and one neuron
in the output layer (corresponding to a cutting force component). Predictions of the
cutting force Fc, feed force Ff and passive force Fp were performed separately by
designing single-output neural networks, because this approach decreases the size of the
ANN and enables faster convergence and better prediction capability. Figure 2 shows the
architecture of the ANN together with the input and output parameters.

Figure 2. Artificial neural network architecture.


The first step in developing an ANN is the selection of data for training and testing
the network. The numbers of training and testing samples were 18 and 9, respectively, as
shown in Table 2. All data were then normalized within the range of ±1 before training
and testing the ANN. The ANN model, using the BP learning method, required training in
order to build strong links between layers and neurons. The training is initialized by
assigning random weights and biases to all interconnected neurons.
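The chapter does not give the exact scaling formula used for the ±1 normalization; a common linear min-max mapping could be sketched as follows (an illustration, not the authors' code):

```python
def normalize(x, x_min, x_max):
    """Linearly map a physical value from [x_min, x_max] to [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def denormalize(x_norm, x_min, x_max):
    """Invert the mapping to recover the physical value."""
    return (x_norm + 1.0) / 2.0 * (x_max - x_min) + x_min

# Example: jet pressure p spans 50-130 MPa (Table 1)
print(normalize(50, 50, 130))     # -1.0
print(normalize(90, 50, 130))     # 0.0
print(denormalize(1.0, 50, 130))  # 130.0
```

The same mapping is applied per input variable using the ranges of Table 1, and inverted on the network output to recover forces in newtons.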
The output of the k-th neuron in the hidden layer Okhid is define as,

Okhid  (1)
  I khid 
1  exp  (1) 
 T 


N inp
I hid
k   w(1)
jk O j  bk
inp (1)
j 1

where Ninp is the number of elements in the input, wjk(1) is the connection weight of the
synapse between the j-th neuron in the input layer and the k-th neuron in the hidden layer,
Ojinp is the input data, bk(1) is the bias in the k-th neuron of the hidden layer and T(1) is a
scaling parameter.
Similarly, the value of the output neuron Okout is defined as

Okout = 1 / (1 + exp(-Ikout / T(2)))    (3)

Ikout = Σ(i = 1..Nhid) wik(2) Oihid + bk(2)    (4)

where Nhid is the number of neurons in the hidden layer, wik(2) is the connection weight of
the synapse between the i-th neuron in the hidden layer and the k-th neuron in the output
layer, bk(2) is the bias in the k-th neuron of the output layer and T(2) is a scaling parameter
for output layer.
During training, the output from the ANN is compared with the measured output and the
mean relative error is calculated as:

E(w(1), w(2), b(1), b(2)) = (1/Nexp) Σ(m = 1..Nexp) [(1/Nout) Σ(i = 1..Nout) |Oiexp - Oiout| / Oiexp]    (5)

where Nout is the number of neurons of the output layer, Nexp is the number of
experimental patterns and Oiout and Oiexp are the normalized predicted and measured
values, respectively.
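The forward pass of Eqs. (1)-(4) and the error measure of Eq. (5) can be sketched in pure Python as follows. The variable names mirror the chapter's notation; the default scaling parameters T(1) = T(2) = 1 and the list-based weight layout are illustrative assumptions:

```python
import math

def forward(x, W1, b1, W2, b2, T1=1.0, T2=1.0):
    """Forward pass through one hidden layer, Eqs. (1)-(4).

    W1[j][k] holds wjk(1) and W2[i][k] holds wik(2); b1, b2 are the biases.
    """
    # Eq. (2): net input of hidden neuron k; Eq. (1): sigmoid scaled by T1
    hidden = []
    for k in range(len(b1)):
        net = sum(W1[j][k] * x[j] for j in range(len(x))) + b1[k]
        hidden.append(1.0 / (1.0 + math.exp(-net / T1)))
    # Eq. (4): net input of output neuron k; Eq. (3): sigmoid scaled by T2
    out = []
    for k in range(len(b2)):
        net = sum(W2[i][k] * hidden[i] for i in range(len(hidden))) + b2[k]
        out.append(1.0 / (1.0 + math.exp(-net / T2)))
    return out

def mean_relative_error(measured, predicted):
    """Eq. (5): mean relative error over all patterns and output neurons."""
    total = 0.0
    for m_pat, p_pat in zip(measured, predicted):
        total += sum(abs(m - p) / abs(m)
                     for m, p in zip(m_pat, p_pat)) / len(m_pat)
    return total / len(measured)
```

For the network of this chapter, `x` would be the five normalized machining parameters and the single output the normalized force component.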
The error obtained from the previous equation is back-propagated into the ANN. This
means that, from output to input, the weights of the synapses and the biases are modified
so as to minimize the error. Several network configurations were tested with different
numbers of hidden layers and various numbers of neurons in each hidden layer using a
trial and error procedure. The best network architecture was a typical two-layer feed-
forward network with one hidden layer of 10 neurons, trained with the Levenberg-
Marquardt back-propagation algorithm. This ANN architecture is used in the following
presentation and discussion.

Bio-Inspired Artificial Neural Networks

For feedforward ANN training, the most widely used algorithm is the standard BP
algorithm or one of its improved variants. Basically, the BP algorithm is a gradient-based
method. Hence, some inherent problems are frequently encountered in its use, e.g., the
risk of being trapped in local minima and a very slow convergence rate in training. In
addition, many elements must be considered, such as the number of hidden nodes,
learning rate, momentum rate, bias, minimum error and activation/transfer function,
which also affect the convergence of BP learning. Therefore, recent research emphasis
has been on the optimal improvement of ANN with BP training.
The learning of ANN using bio-inspired algorithms has been a theme of much
attention during the last few years. These algorithms provide universal optimization
techniques that require no particular knowledge about the problem structure other than
the objective function itself. They are robust and efficient at exploring an entire, complex
and poorly understood solution space of an optimization problem. Thus, bio-inspired
algorithms are capable of escaping local optima and acquiring a globally optimal solution.
Bio-inspired algorithms have been successfully used to perform various tasks, such as
architecture design, connection weight training, connection weight initialization, learning
rule adaptation, rule extraction from ANN, etc. One way to overcome BP training
algorithm shortcomings is to formulate an adaptive and global approach to the learning
process as the evolution of connection weights in the environment determined by the
architecture and the learning task of the ANN. Bio-inspired algorithms can then be used
very effectively in the evolution to find a near-optimal set of connection weights
globally, because they do not require gradient or differentiability information.
The supervised learning process of the ANN based on a bio-inspired algorithm uses
as learning variables the weights of the synapses linking the input layer nodes to the
hidden layer nodes, wjk(1), and the hidden layer nodes to the output layer nodes, wik(2),
respectively. The learning variables also include the biases in the nodes of the hidden
layer, bk(1), and of the output layer, bk(2). The
proposed optimization problem formulation is usually based on the minimization of an
error function, by iteratively adjusting connection weights. A schematic representation of
the learning of ANN using bio-inspired algorithms is given in Figure 3.

Figure 3. Flowchart of bio-inspired ANN.


Although many quite efficient bio-inspired algorithms have been developed for the
optimization of ANN, in this study two of them, namely, genetic algorithm (GA) and
particle swarm optimization (PSO), were utilized to train a feed forward ANN with a
fixed architecture. Therefore, numerical weights of neuron connections and biases
represent the solution components of the optimization problem.

GA-Based Artificial Neural Networks

Genetic algorithms belong to the larger class of evolutionary algorithms (EA), in
which a population of candidate solutions to a problem evolves over a sequence of
generations. GA have been successfully used in a wide variety of problem domains that are
not suitable for standard optimization algorithms, including problems in which the
objective function is highly nonlinear, stochastic, nondifferentiable or discontinuous. An
implementation of a GA begins with a randomly generated population of individuals, in
which each individual is represented by a binary string (called a chromosome) encoding
one possible solution. This population of candidate solutions evolves toward better
solutions. The evolution happens in
generations and during each generation a measure of the fitness with respect to an
objective function is evaluated. Based on these fitness values, a new population is created
from the previous one and becomes the current population in the next iteration
of the algorithm. Individuals with a higher fitness have a higher probability of being
selected for reproduction. Thus, on average, the new generation will possess a
higher fitness value than the older population. Commonly, the algorithm continues until
one or more pre-established criteria, such as a maximum number of generations or a
satisfactory fitness level, have been reached.
Following are the steps involved in the working principle of GA: (i) chromosome
representation, (ii) creation of the initial population, (iii) selection, (iv) reproduction, (v)
termination criteria and (vi) the evaluation function.
Chromosome representation. The basic element of the genetic algorithm is the
chromosome, which contains the variable information for each individual solution to the
problem. The most common coding method is to represent each variable with a binary
string of digits of a specific length. Each chromosome has one binary string, and each
bit in this string can represent some characteristic of the solution. Another possibility is
that the whole string represents a number. Every bit string is thus a solution, but
not necessarily the best solution. This representation method is very simple; strings of
ones and zeroes would be randomly generated, e.g., 1101001, 0101100, etc., and these
would form the initial population. The strings may be of fixed length or, more rarely, be
of variable length. Apart from binary encoding, octal encoding, hexadecimal encoding,
permutation encoding, value encoding and tree encoding are also used as encoding
methods in genetic algorithms.
Creation of the initial population. The GA sequence begins with the creation of an
initial population of individuals. The most common way to do this is to generate a
population of random solutions. Each individual of the population represents a candidate
solution to the problem, and the population size depends on the complexity of the problem.
Ideally, the first population should have a gene pool as large as possible in order to
explore the whole search space. Nevertheless, sometimes problem-specific knowledge
can be used to construct the initial population. Using a specific heuristic to construct the
population may help the GA find good solutions faster, but the gene pool should still be
large enough. Furthermore, it is necessary to take the size of the population into account:
a larger population enables easier exploration of the search space, but at the same time
increases the time required by a GA to converge.
Selection. Selection is the process of picking chromosomes out of the population
according to their evaluation function, whereby the best chromosomes from the current
population are selected to continue and the rest are discarded. The members of the
population are selected for reproduction through a fitness-based process in which a
higher fitness value gives an individual a greater chance of being selected. The
problem is how to select these chromosomes, and a large number of selection methods
of varying complexity have been developed. A method with low selectivity accepts a
large number of solutions, which results in overly slow evolution, while high selectivity
allows a few individuals, or even one, to dominate, which results in a reduction of the
diversity needed for change and progress. Therefore, a balance is needed in order to
prevent the solution from becoming trapped in a local minimum. Several techniques for
GA selection have been used: roulette wheel, tournament, elitism, random, rank and
stochastic universal sampling.
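Of these, roulette-wheel selection is the simplest to illustrate. The helper below is a generic sketch assuming non-negative fitness values, not code from the chapter:

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0.0, total)
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if pick <= cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off

# Individuals with higher fitness are selected more often:
rng = random.Random(42)
counts = {"weak": 0, "strong": 0}
for _ in range(1000):
    counts[roulette_wheel_select(["weak", "strong"], [1.0, 9.0], rng)] += 1
```

With a 1:9 fitness ratio, the stronger individual is drawn roughly nine times as often, which is exactly the selection pressure the text describes.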
Reproduction. Reproduction is the genetic operator used to produce a new generation
of the population from the individuals selected through selection, using two basic types of operators:
crossover and mutation. The crossover operator selects genes from parent chromosomes
and creates new offspring. The simplest way to do this is to choose a crossover point on
the string; everything before the point is copied from one parent, and everything after it
from the other. There are several types of crossover operators: single-point crossover, two-point
crossover, multi-point crossover, uniform crossover, three parent crossover, crossover
with reduced surrogate, shuffle crossover, precedence preservative crossover, ordered
crossover and partially matched crossover. The basic parameter of the crossover operator is
the crossover probability, which describes how often crossover is performed. If the
crossover probability is 0%, the whole new generation is made from exact copies of
chromosomes from the old population, whereas if it is 100%, all offspring are created by
crossover. After crossover, the mutation operator is applied to the strings. Mutation
ensures more variety among strings and prevents the GA from becoming trapped in a local
minimum. If the task of crossover is to exploit current solutions to find better ones, then
mutation forces the GA to explore new areas of the search space. Common mutation
techniques include flipping, interchanging and reversing. The basic parameter of the
mutation operator is the mutation probability, which decides how often a string is
mutated. If the mutation probability is 0%, no mutation occurs, whereas if it is 100%, the
whole chromosome is changed.
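Single-point crossover and bit-flip mutation on binary strings can be sketched as follows (a generic illustration; the function names are hypothetical):

```python
import random

def single_point_crossover(parent1, parent2, rng=random):
    """Swap the tails of two equal-length bit strings at a random point."""
    point = rng.randint(1, len(parent1) - 1)
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def bit_flip_mutation(chromosome, mutation_prob, rng=random):
    """Flip each bit independently with probability mutation_prob."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < mutation_prob else bit
        for bit in chromosome)

# Crossing the example strings from the text:
rng = random.Random(0)
c1, c2 = single_point_crossover("1101001", "0101100", rng)
```

Note that single-point crossover preserves the bits at each position across the pair of children, so the combined gene pool of the offspring equals that of the parents; only mutation introduces new genetic material.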
Termination criteria. The GA moves from generation to generation until one of the
termination criteria is fulfilled. The GA stops when a specified number of generations
has been reached, a specified duration of time has elapsed, a defined level of fitness is
reached, the diversity of the population drops below a specified level, or the solutions of
the population stop improving over generations.
The evaluation function. The task of the evaluation function is to determine the
fitness of each solution string generated during the search. The fitness of each individual
solution not only represents a quantitative measure of how well the solution solves the
original problem, but also corresponds to how close the chromosome is to the optimal
one. The function does not need to have any special analytical properties.
Recently, GA have also been used in training ANN in order to improve the precision and
efficiency of the network. The performance of an ANN depends mainly on the weights of
its connections; therefore, training a given ANN generally means determining an optimal
set of connection weights. The weight learning of ANN is usually formulated as the
minimization of some error function over the training data set by iteratively adjusting the
connection weights. In this way, the optimization problem is transformed into finding the
set of fittest weights that minimizes the objective function, i.e., the mean square error between
the target and actual outputs. In this chapter, GA is used to optimize the weights and biases
(weight values associated with individual nodes) of the ANN model. The steps involved
in the process of ANN training using a GA are shown in Table 3.

Table 3. General framework of GA for ANN training

(i) Determine a fitness function to measure the performance of an individual chromosome in
the problem domain, and set the algorithm parameters. Initialize a random population of
chromosomes.
(ii) Decode each individual in the current population into a set of connection weights and
construct a corresponding ANN.
(iii) Simulate ANN using current population and evaluate the ANN by computing its mean
square error between actual and target outputs.
(iv) Calculate the fitness value of each chromosome in the population.
(v) Select a pair of chromosomes for mating from the current population on the basis of their fitness.
(vi) Apply genetic operators crossover and mutation to create new population.
(vii) Calculate fitness of chromosomes for new population.
(viii) Repeat steps (iv) to (vii) until the solution converges.
(ix) Extract optimized weights.
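The framework of Table 3 can be condensed into a sketch for real-valued weight vectors. The ANN simulation of steps (ii)-(iii) is replaced here by a generic `objective` callable returning the network error; the arithmetic crossover, Gaussian mutation and parameter defaults are illustrative choices, not the chapter's settings:

```python
import random

def ga_minimize(objective, n_weights, pop_size=30, generations=100,
                crossover_prob=0.9, mutation_prob=0.05, rng=None):
    """Sketch of the GA framework of Table 3 for weight optimization."""
    rng = rng or random.Random()
    # (i) initialize a random population of weight vectors
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
           for _ in range(pop_size)]
    best = min(pop, key=objective)
    for _ in range(generations):
        # (ii)-(iv) evaluate each individual; lower error -> higher fitness
        fitness = [1.0 / (1.0 + objective(ind)) for ind in pop]
        new_pop = [list(best)]              # elitism: keep the best so far
        while len(new_pop) < pop_size:
            # (v) fitness-proportional (roulette wheel) parent selection
            p1 = rng.choices(pop, weights=fitness)[0]
            p2 = rng.choices(pop, weights=fitness)[0]
            # (vi) arithmetic crossover ...
            if rng.random() < crossover_prob:
                a = rng.random()
                child = [a * x + (1.0 - a) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            # ... and Gaussian mutation
            child = [x + rng.gauss(0.0, 0.1) if rng.random() < mutation_prob
                     else x for x in child]
            new_pop.append(child)
        pop = new_pop
        # (vii)-(viii) track the best solution across generations
        candidate = min(pop, key=objective)
        if objective(candidate) < objective(best):
            best = candidate
    # (ix) extract the optimized weights
    return best

# Stand-in objective: squared distance of the weights from 0.5
obj = lambda w: sum((x - 0.5) ** 2 for x in w)
best = ga_minimize(obj, n_weights=3, rng=random.Random(1))
```

In the actual application, `objective` would decode the vector into the ANN weights and biases and return the mean square error over the training set.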

In order to achieve the best performance of the GA-based ANN, a parametric study
was carried out to determine the optimal set of GA parameters. The optimization process
considers the values of crossover probability, mutation probability, maximum
number of generations and population size. The parametric study was carried out by varying
the value of one parameter at a time while the other parameters were held fixed. The
fitness value of a GA solution is estimated based on the mean absolute percentage error over
the training data samples, representing the deviation of the result (cutting force components)
of the GA-based ANN from the desired one. Since the error can be either
positive or negative, its absolute value is used as the fitness value of a
GA solution. For the main cutting force, the optimal values of crossover probability, mutation
probability, number of generations, and population size were 0.15, 0.025, 2260, and 520,
respectively. For feed force optimal values of these parameters were 0.9, 0.01, 1480, and
590. Finally, for passive force the optimal values of crossover probability, mutation
probability, number of generations, and population size were 0.1, 0.015, 1920, and 260,
respectively. The results of the parametric study for main cutting force are shown in
Figure 4.

Figure 4. Results of parametric study for determination of optimal set of GA parameters.


PSO-Based Artificial Neural Networks

The PSO algorithm is a relatively new optimization technique originally introduced by
Eberhart and Kennedy (1995). It was inspired by the social interaction and
communication of bird flocking and fish schooling. The PSO algorithm is a population-based
stochastic optimization method, where the population is referred to as a swarm. The
optimization process of a PSO algorithm begins with an initial population of random
candidate solutions called particles. These particles change their positions by moving
around in the multidimensional search space through many iterations to search an optimal
solution for the problem by updating various properties of the individuals in each
generation. Each particle in the swarm is represented by the following characteristics: the
position vector of the particle, the velocity vector of the particle and the personal best
position of the particle. During the search process, the position of each particle is guided
by two factors: the best position visited by itself, and the global best position discovered
so far by any of the particles in the swarm. In this way, the trajectory of each particle is
influenced by the flight experience of the particle itself as well as the trajectory of
neighborhood particles of the whole swarm. This means that all the particles fly through
the search space toward the personal and global best positions in a navigated way, at the
same time exploring new areas through a stochastic mechanism in order to escape from
local optima. The performance of the particles is evaluated using a fitness function that
varies depending on the optimization problem.
Position of the i-th particle in the d-dimensional solution space at iteration k is denoted by

xi(k) = [xi,1(k), xi,2(k), ..., xi,d(k)]    (6)

Let

ŷi(k) = [ŷi,1(k), ŷi,2(k), ..., ŷi,d(k)]    (7)

denote the best position found by particle i up to iteration k, and let

y(k) = [y1(k), y2(k), ..., yd(k)]    (8)

denote the best position found by any of the particles in the neighborhood of xi up to
iteration k.
The new position of particle i in iteration k + 1, xi(k + 1), is computed by adding a
velocity vector

vi(k + 1) = [vi,1(k + 1), vi,2(k + 1), ..., vi,d(k + 1)]    (10)

to the current position xi(k):

xi(k + 1) = xi(k) + vi(k + 1)    (11)

The components of vi(k + 1) are computed as follows:

vi,j(k + 1) = ω vi,j(k) + c1 r1,j [ŷi,j(k) - xi,j(k)] + c2 r2,j [yj(k) - xi,j(k)]    (12)

where j designates component in the search space; ω represents the inertia weight which
decreases linearly from 1 to near 0; c1, and c2 are cognitive and social parameters,
respectively, known as learning factors; and r1,j and r2,j are random numbers uniformly
distributed in the range [0, 1]. The inertia weight component causes the particle to
continue in the direction in which it was moving at iteration k. A large weight facilitates
global search, while a small one tends to facilitate fine tuning the current search area. The
cognitive term, associated with the experience of the particle, represents its previous best
position and provides a velocity component in this direction, whereas the social term
represents information about the best position of any particle in the neighborhood and
causes movement towards this particle. These two parameters are not critical for the
convergence of PSO, but their fine tuning may result in faster convergence of the algorithm
and alleviation of local minima. The r1,j and r2,j parameters are employed to maintain the
diversity of the population.
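The update of Eqs. (11) and (12) for a single particle can be sketched as follows; the default values of ω, c1 and c2 are illustrative, not the tuned values reported later in this section:

```python
import random

def pso_step(x, v, y_hat, y, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One velocity/position update for a single particle, Eqs. (11)-(12).

    x: current position, v: current velocity,
    y_hat: the particle's personal best, y: the neighborhood best.
    """
    new_v = []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        new_v.append(w * v[j]                       # inertia term
                     + c1 * r1 * (y_hat[j] - x[j])  # cognitive term
                     + c2 * r2 * (y[j] - x[j]))     # social term, Eq. (12)
    new_x = [xj + vj for xj, vj in zip(x, new_v)]   # Eq. (11)
    return new_x, new_v
```

When a particle already sits at both its personal and the neighborhood best, the cognitive and social terms vanish and only the inertia term remains, so the velocity simply decays by the factor ω each iteration.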
The PSO algorithm shares many similarities with evolutionary computation
techniques such as GA. The PSO algorithm is also initialized with a randomly created
population of potential solutions and uses fitness values to evaluate the population.
Furthermore, both algorithms update the population and search for the optimum with
random techniques. However, unlike GA, PSO does not have operators such as the mutation
and crossover that exist in evolutionary algorithms. In the PSO algorithm, potential
solutions (particles) move toward the actual optimum in the solution space by following
their own experiences and the current best particles. Compared with GA, PSO has some
attractive characteristics, such as its memory, which enables knowledge of good solutions
to be retained by all particles of the swarm; the ability to search for optima in multiple
dimensions simultaneously; and a mechanism of constructive cooperation and information
sharing between particles. Due to its simplicity, robustness, easy implementation and quick
convergence, the PSO optimization method has been successfully applied to a wide range of
applications. The focus of this study is to employ PSO for the optimization of the weights
and biases of the ANN model. The steps involved in the process of ANN training using PSO
are shown in Table 4.

Table 4. General framework of PSO for ANN training

(i) Determine an objective function and algorithm parameters. Initialize the position and
velocities of a group of particles randomly.
(ii) Decode each particle in the current population into a set of connection weights and
construct a corresponding ANN.
(iii) Simulate ANN using current population and evaluate the ANN by computing its mean
square error between actual and target outputs.
(iv) Calculate the fitness value of each initialized particle in the population.
(v) Select and store best particle of the current particles.
(vi) Update the positions and velocities of all the particles and generate a group of new
particles.
(vii) Calculate the fitness value of each new particle and replace the worst particle with the
stored best particle. If the current fitness is less than the local best fitness, then set the
current fitness as the local best fitness; and if the current fitness is less than the global
best fitness, then set it as the global best fitness.
(viii) Repeat steps (iv) to (vii) until the solution converges.
(ix) Extract optimized weights.
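The steps of Table 4 can be condensed into the following sketch, where a generic `objective` callable stands in for the ANN mean-square-error evaluation of steps (ii)-(iii); the swarm size and coefficients are illustrative defaults:

```python
import random

def pso_minimize(objective, dim, n_particles=25, iterations=200,
                 w=0.7, c1=1.5, c2=1.5, rng=None):
    """Sketch of the PSO framework of Table 4 for weight optimization."""
    rng = rng or random.Random()
    # (i) random initial positions and zero velocities
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
         for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    # personal bests and the global best
    pbest = [list(x) for x in X]
    pbest_err = [objective(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = list(pbest[g]), pbest_err[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(dim):
                # (vi) velocity and position updates, Eqs. (11)-(12)
                r1, r2 = rng.random(), rng.random()
                V[i][j] = (w * V[i][j]
                           + c1 * r1 * (pbest[i][j] - X[i][j])
                           + c2 * r2 * (gbest[j] - X[i][j]))
                X[i][j] += V[i][j]
            # (iii)-(iv), (vii) evaluate and update personal/global bests
            err = objective(X[i])
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = list(X[i]), err
                if err < gbest_err:
                    gbest, gbest_err = list(X[i]), err
    # (ix) extract the optimized weights
    return gbest

# Stand-in objective: the sphere function
sphere = lambda v: sum(x * x for x in v)
gbest = pso_minimize(sphere, dim=3, rng=random.Random(7))
```

As with the GA sketch above, `objective` would in practice decode a particle's position into the connection weights and biases of the network and return its error on the training set.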

Similar to the previous case, a careful parametric study was carried out to determine
the set of optimal PSO parameters, in which the value of one parameter was varied at a
time while the other parameters were held fixed. The optimization process considers the
values of cognitive acceleration, social acceleration, maximum number of generations,
and population size. The fitness value of a PSO solution is estimated based on the mean
absolute percentage error of each training data sample, i.e., the deviation of the result
(cutting force components) of the PSO-based ANN from the desired one. For the main
cutting force, the optimal values of cognitive
acceleration, social acceleration, number of generations, and population size were 0.8,
1.6, 350, and 250, respectively. For feed force optimal values of these parameters were
0.4, 1.4, 270, and 250. Finally, for passive force the optimal values of cognitive
acceleration, social acceleration, number of generations, and population size were 0.5,
1.0, 340, and 240, respectively. The results of the parametric study for main cutting force
are shown in Figure 5.


In this section, the ANN trained by the backpropagation algorithm and the bio-inspired
ANN were applied to the prediction of cutting force components in the turning of Inconel 718 under
HPC conditions and a comparative analysis was performed. After a number of trials it was
found that the best network architecture consists of five neurons in the input layer
(corresponding to the five machining parameters), one hidden layer with ten neurons, and
one neuron in the output layer (corresponding to a cutting force component). The BP-based,
GA-based and PSO-based ANN models were validated using the nine sets of testing data
from Table 2 that were not used in the training process. To evaluate the performance of
the developed ANN training methods, the predicted values of main cutting force, feed
force and passive force were compared with the experimental data and summarized in
Table 5, Table 6 and Table 7, respectively. The mean absolute percentage errors for main
cutting force, feed force
and passive force of BP-based ANN were 5.1%, 5.8% and 6.1%, respectively, which is
considered a good agreement between the simulated outputs and the experimental results.
However, the optimal results obtained using the GA-based and PSO-based ANN models
are even more accurate. The mean absolute percentage errors of GA-based ANN model
for main cutting force, feed force and passive force were 3.8%, 5.3% and 4.2%,
respectively. Finally, mean absolute percentage errors of PSO-based ANN model were
3.8%, 3.7% and 3.8% for main cutting force, feed force and passive force, respectively.
Hence, the learning of ANN using bio-inspired algorithms has demonstrated an
improvement in average prediction error as compared to the backpropagation algorithm.
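The 5.1% figure for the BP-based model can be reproduced directly from the Table 5 data (the `mape` helper is a generic illustration):

```python
def mape(measured, predicted):
    """Mean absolute percentage error, in percent."""
    return sum(abs(m - p) / m
               for m, p in zip(measured, predicted)) / len(measured) * 100.0

# Measured Fc and the BP-based ANN predictions for the nine test sets (Table 5)
fc_exp = [1370, 1235, 1400, 1390, 1190, 1320, 1450, 1465, 1320]
fc_bp = [1257.9, 1164.4, 1366.2, 1286.3, 1131.9,
         1291.4, 1285.8, 1432.4, 1344.1]
print(round(mape(fc_exp, fc_bp), 1))  # 5.1
```

Substituting the GA-based and PSO-based prediction columns of Table 5 in the same way yields the reported 3.8% figures for those models.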

Figure 5. Results of parametric study for determination of optimal set of PSO parameters.

Table 5. Comparison between predicted main cutting force and experimental results

Testing Exp. value of BP-based ANN GA-based ANN PSO-based ANN

data set Fc Fc Error Fc Error Fc Error
1 1370 1257.9 8.2 1329.1 3.0 1409.8 2.9
2 1235 1164.4 5.7 1204.1 2.5 1262.4 2.2
3 1400 1366.2 2.4 1383.5 1.2 1337.7 4.5
4 1390 1286.3 7.5 1302.6 6.3 1344.8 3.3
5 1190 1131.9 4.9 1215.8 2.2 1162.1 2.3
6 1320 1291.4 2.2 1308.9 0.8 1293.1 2.0
7 1450 1285.8 11.3 1278.5 11.8 1329.4 8.3
8 1465 1432.4 2.2 1419.1 3.1 1379.6 5.8
9 1320 1344.1 1.8 1364.1 3.3 1285.6 2.6

Table 6. Comparison between predicted feed force and experimental results

Testing Exp. value of BP-based ANN GA-based ANN PSO-based ANN

data set Ff Ff Error Ff Error Ff Error
1 570 535.6 6.0 530.9 6.9 584.5 2.5
2 520 535.7 3.0 544.6 4.7 540.5 3.9
3 630 594.9 5.6 650.5 3.3 612.6 2.8
4 565 578.7 2.4 554.9 1.8 566.0 0.2
5 475 459.7 3.2 491.8 3.5 472.3 0.6
6 555 551.0 0.7 566.7 2.1 527.0 5.0
7 620 537.8 13.3 530.1 14.5 544.7 12.1
8 565 517.0 8.5 547.0 3.2 531.0 6.0
9 590 535.7 9.2 545.1 7.6 588.7 0.2

Table 7. Comparison between predicted passive force and experimental results

Testing Exp. value of BP-based ANN GA-based ANN PSO-based ANN

data set Fp Fp Error Fp Error Fp Error
1 470 407.2 13.4 438.4 6.7 446.2 5.1
2 440 405.5 7.8 416.0 5.5 452.1 2.8
3 510 508.5 0.3 508.0 0.4 532.7 4.5
4 485 478.8 1.3 498.8 2.8 486.0 0.2
5 415 388.5 6.4 393.4 5.2 407.1 1.9
6 465 483.8 4.0 445.1 4.3 479.6 3.1
7 475 461.9 2.8 460.8 3.0 456.7 3.9
8 478 444.0 7.1 433.0 9.4 431.6 9.7
9 460 406.1 11.7 462.8 0.6 444.7 3.3


In this study, three different ANN models for the estimation of cutting force components
in turning of Inconel 718 under HPC conditions were developed. The considered process
parameters include the diameter of the nozzle, the distance between the impact point of
the jet and the cutting edge, the pressure of the jet, the cutting speed, and the feed. First,
the cutting forces were modeled using a conventional multilayer feed-forward ANN
trained with a BP algorithm. These models were found to predict the output with 94.9%,
94.2% and 93.9% accuracy for the main cutting force, feed force and passive force,
respectively. These results indicate good agreement between the predicted and
experimental values. However, due to the limitations of BP-based ANNs, such as the risk
of being trapped in local minima and a very slow convergence rate in training, an effort
was made to apply two bio-inspired algorithms, namely GA and PSO, as training
methods for the ANN. The results obtained indicate that the GA-based ANN can
successfully predict the main cutting force, feed force and passive force, with 96.2%,
94.7% and 95.8% accuracy, respectively. The predictions of the PSO-based ANN have
accuracies of 96.2%, 96.3% and 96.2% for the main cutting force, feed force and passive
force, respectively. It is evident that the results obtained using the GA-based and
PSO-based ANN models are more accurate than those of the BP-based ANN. Moreover,
the PSO-based ANN model predicted the cutting force components with better accuracy
than the GA-based ANN model. Hence, training ANNs with bio-inspired algorithms can
significantly improve ANN performance, not only in terms of precision but also in terms
of convergence speed. The results showed that the GA-based and PSO-based ANNs can
be successfully and very accurately applied to the modeling of cutting force components
in turning under HPC conditions.
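The reported accuracies can be reproduced from the percentage errors in Tables 5-7 as 100 minus the mean absolute percentage error over the nine test samples. A quick check in Python, using the error columns of Table 5 (main cutting force):

```python
# Accuracy check against Table 5 (main cutting force Fc): the reported accuracy
# equals 100 minus the mean absolute percentage error over the nine test samples.

bp_errors  = [8.2, 5.7, 2.4, 7.5, 4.9, 2.2, 11.3, 2.2, 1.8]   # BP-based ANN, error %
ga_errors  = [3.0, 2.5, 1.2, 6.3, 2.2, 0.8, 11.8, 3.1, 3.3]   # GA-based ANN, error %
pso_errors = [2.9, 2.2, 4.5, 3.3, 2.3, 2.0, 8.3, 5.8, 2.6]    # PSO-based ANN, error %

def accuracy(errors):
    """Prediction accuracy (%) = 100 - mean absolute percentage error."""
    return 100.0 - sum(errors) / len(errors)

for name, errs in [("BP", bp_errors), ("GA", ga_errors), ("PSO", pso_errors)]:
    print(f"{name}-based ANN: {accuracy(errs):.1f}%")   # 94.9%, 96.2%, 96.2%
```

The same computation over the error columns of Tables 6 and 7 yields the feed-force and passive-force accuracies quoted above.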


REFERENCES
Alajmi, M. S. & Alfares, F. (2007). Prediction of cutting forces in turning process using
de-neural networks.
Aykut, Ş., Gölcü, M., Semiz, S. & Ergür, H. (2007). Modeling of cutting forces as
function of cutting parameters for face milling of satellite 6 using an artificial neural
network. Journal of Materials Processing Technology, 190(1), 199-203.
Cica, D., Sredanovic, B., Lakic-Globocki, G. & Kramar, D. (2013). Modeling of the
cutting forces in turning process using various methods of cooling and lubricating: an
artificial intelligence approach. Advances in Mechanical Engineering.

Colak, O. (2012). Investigation on machining performance of inconel 718 in high

pressure cooling conditions. Strojniški vestnik-Journal of Mechanical Engineering,
58(11), 683-690.
Courbon, C., Kramar, D., Krajnik, P., Pusavec, F., Rech, J. & Kopac, J. (2009).
Investigation of machining performance in high-pressure jet assisted turning of
Inconel 718: an experimental study. International Journal of Machine Tools and
Manufacture, 49(14), 1114-1125.
Diniz, A. E. & Micaroni, R. (2007). Influence of the direction and flow rate of the cutting
fluid on tool life in turning process of AISI 1045 steel. International Journal of
Machine Tools and Manufacture, 47(2), 247-254.
Eberhart, R. & Kennedy, J. (1995). A new optimizer using particle swarm theory. Paper
presented at the Micro Machine and Human Science, 1995. MHS’95, Proceedings of
the Sixth International Symposium.
Ezugwu, E. & Bonney, J. (2004). Effect of high-pressure coolant supply when machining
nickel-base, Inconel 718, alloy with coated carbide tools. Journal of Materials
Processing Technology, 153, 1045-1050.
Ezugwu, E. & Bonney, J. (2005). Finish machining of nickel-base Inconel 718 alloy with
coated carbide tool under conventional and high-pressure coolant supplies. Tribology
Transactions, 48(1), 76-81.
Klocke, F., Sangermann, H., Krämer, A. & Lung, D. (2011). Influence of a high-pressure
lubricoolant supply on thermo-mechanical tool load and tool wear behaviour in the
turning of aerospace materials. Proceedings of the Institution of Mechanical
Engineers, Part B: Journal of Engineering Manufacture, 225(1), 52-61.
Kramar, D. & Kopac, J. (2009). High pressure cooling in the machining of hard-to-
machine materials. Journal of Mechanical Engineering, 55(11), 685-694.
Kramar, D., Sekulić, M., Jurković, Z. & Kopač, J. (2013). The machinability of nickel-
based alloys in high-pressure jet assisted (HPJA) turning. Metalurgija, 52(4), 512-
Lin, W., Lee, B. & Wu, C. (2001). Modeling the surface roughness and cutting force for
turning. Journal of Materials Processing Technology, 108(3), 286-293.
Nandy, A., Gowrishankar, M. & Paul, S. (2009). Some studies on high-pressure cooling
in turning of Ti–6Al–4V. International Journal of Machine Tools and Manufacture,
49(2), 182-198.
Sharma, V. S., Dhiman, S., Sehgal, R. & Sharma, S. (2008). Estimation of cutting forces
and surface roughness for hard turning using neural networks. Journal of Intelligent
Manufacturing, 19(4), 473-483.
Szecsi, T. (1999). Cutting force modeling using artificial neural networks. Journal of
Materials Processing Technology, 92, 344-349.

Wertheim, R., Rotberg, J. & Ber, A. (1992). Influence of high-pressure flushing through
the rake face of the cutting tool. CIRP Annals - Manufacturing Technology, 41(1), 101-
Zuperl, U. & Cus, F. (2004). Tool cutting force modeling in ball-end milling using
multilevel perceptron. Journal of Materials Processing Technology, 153, 268-275.


Dr. Djordje Cica is a professor at the University of Banja Luka in the Faculty of
Mechanical Engineering. He has extensive experience in artificial intelligence applied to
expert systems using bio-inspired algorithms and fuzzy logic.

Dr. Davorin Kramar is a professor at the University of Ljubljana, Faculty of
Mechanical Engineering. His experience is in artificial intelligence applied to
manufacturing and the machining of materials.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 8

Predictive Analytics Using Genetic Programming



Luis Rabelo*, Edgar Gutierrez, Sayli Bhide and Mario Marin

Department of Industrial Engineering and Management Systems, University of
Central Florida, Orlando, Florida, US


Predictive analytics is defined as the discovery of valuable patterns and relationships
in structured and/or unstructured data environments using statistical and AI techniques to
develop decision making systems that calculate future outcomes. The analyst must
uncover and build an initial underlying structure of the problem and then support
modeling strategies to find appropriate models and abstractions to build a predictive
system. The goal of these predictive systems is to calculate future outcomes (with the
respective risk levels) and tendencies. This paper introduces genetic programming as a
predictive modeling technique which can be the core of a predictive system. To further
explain the introduced framework with genetic programming, an actual case study with
the Reinforced Carbon-Carbon structures of the NASA Space Shuttle is used.

Keywords: Genetic Programming, Evolutionary Algorithms, NASA Shuttle


Traditional analytical approaches assume stability. The new technological trends
such as the recent dramatic increase in computing power and the development of AI

* Corresponding Author Email:

172 Luis Rabelo, Edgar Gutierrez, Sayli Bhide et al.

techniques have opened doors that increase the level of complexity in problem solving.
This has provided the environment for the renaissance of a new analytics paradigm that is
trying to deal with continuously changing environments. This new paradigm focuses on
the ability to recognize change and react quickly. For example, advanced analytics uses
continuous data sampling to provide additional insights that further enhance strategic
decisions and may assist decision makers in identifying new business opportunities
and/or new relationships, which may also support innovation and creativity (Legarra et
al., 2016). One very important aspect is the ability to forecast future perceptions and
calculate the risk of potential outcomes. The incorporation of big data capabilities can
further enhance such approaches through rich data sources and computational capabilities
that provide additional insights across a value network and/or life cycle along with real
time identification and tracking of key factors. Although big data technologies currently
exist, consensus on tools and techniques for managing and using big data to extract
valuable insights is not well established (Gobble, 2013). Organizations are currently
trying to gain a better understanding of the new paradigm and the associated benefits
from the viewpoint of big data and advanced analytics. Complexity is always the issue.
Predictive analytics is one form of advanced analytics. Predictive analytics uses a
combination of data which may include historical, auxiliary, structured, and unstructured
data to forecast potential actions, performance, and developments. This form of advanced
analytics is considered more involved and technologically demanding than visual and
descriptive analytics. This is because predictive analytics involves statistical techniques,
AI techniques, OR/MS modeling, simulation, and/or hybrids of them to create predictive
models that quantify the likelihood of a particular outcome occurring in the future. In
addition, predictive analytics are part of systems which try to tame complexity.
Predictive analytics uses statistical techniques, AI and OR/MS modeling, simulation,
and/or hybrids. AI includes a large diverse universe of different types of techniques. The
traditional side of AI involve ontologies, semantics, expert systems, and reasoning. On
the other hand, the machine learning side of AI includes supervised, unsupervised and
reinforcement learning, including artificial neural networks, support vector machines,
deep learning, evolutionary algorithms (EAs) and other metaheuristics, and regression.
Evolutionary algorithms are a family of optimization techniques inspired by natural
evolution. Blum et al. (2012) stated that EA “is an algorithm that simulates – at some
level of abstraction – a Darwinian evolutionary system.” The most popular EAs are
Genetic Algorithms (GAs), Genetic Programming (GP), Evolutionary Strategies (ES) and
Evolutionary Programming (EP). GP is a very useful technique that has become
dominant and well developed in the last twenty years. GP is generally applicable to a
wide range of predictive analytics problems.
Predictive Analytics using Genetic Programming 173


Advanced analytics aims to provide the base necessary to handle complex problems
in terms of scalability and amount of data and sources (Chen & Zhang, 2014). The
analysis of data is the new scientific paradigm, alongside empirical, theoretical and
computational science. Techniques and methodologies that address these challenges are
beneficial for handling complex problems (Chen & Zhang, 2014).
A complex problem usually features several of the following:

• Incomplete or lack of data
• Very large amounts of data (i.e., petabytes)
• Hybrids of continuous and discrete variables/environments
• Mix of structured and unstructured data
• High noise levels
• Real-time, timeliness, and latency features of the decision time window, the
  sensors/actuators system (to receive feedback and act), and the computational
  execution of the predictive system
• Mix of qualitative and quantitative assessments
• Multidisciplinary and interdisciplinary features of the problem/system
• Nonlinearities, observability, and controllability issues
• Human factors and human behaviors (e.g., predictably irrational, usability,
  culture, politics, etc.)

Our experience working on and analyzing these problems has provided us with a more
comprehensive methodology in which several models can be used alongside other types of
empirical models in order to build predictive systems. Our methodology has been
evolving through the years due to the technological trends mentioned above (i.e.,
computing power and new, more established AI techniques) and has the following steps
(Rabelo, Marin, & Huddleston, 2010):

1. Understand the problem from different viewpoints: We have to understand

the problem and the goals and objectives assigned to the predictive modeling
task. We have to view complex problems from different dimensions. This is
usually a multi-disciplinary/interdisciplinary effort. Some of the important
viewpoints are:
a. Basic Theory – First principles are very important to understand. The
team must be networked with the scientists and experts from the different
domains. The predictive modeling team has to be conversant with the
contributions of the different disciplines involved (materials, optics,
finance, marketing, human behavior, psychology, etc.).

b. Experiments and Visits – The different experiments and the data must
be understood by the data science team. How do they relate to each
other? How was the equipment/surveys calibrated/designed? Who are the
owners of the data?
c. Organizational/Cultural/Political and the ecosystem – The problem
ecosystem must be investigated. Do the participants understand the
goals/objectives and procedures of the data science task? Is there an
institutional culture of sharing ideas, information, and data? Is top
management championing the data science team?
2. Gather Information from current databases/files and servers/clusters: This
step is very important. Complex problems in large/global organizations have
distributed databases, servers and other types of repositories of data and
information in different formats and on different computing/IT platforms, both
unstructured and structured, and at different levels of detail and accuracy.
3. Develop map of databases and clusters from the different points in the life-
cycle: It is important to have a clear picture and guidance of the different
variables, experiments and data available. A map of this is very important for
providing the flexibility to integrate different databases and clusters, and create
new ones. Enterprise data hubs and ontologies are very important (if budget and
sophistication of the project permit) to increase agility, capacity planning, and
4. Develop map of “models” (analytical and empirical) from the different
points in the life-cycle: Usually, this step is totally forgotten from the data
science task (it was difficult to find an article on data mining/data science with
this philosophy). The traditional data miners go directly to the database to start
playing with the data and the variables. Not only are the results from experiments
very important for the data mining task but so are previously developed models
based on statistics, non-statistical techniques, finite element analysis, simulations,
and first principle models. These models have important information. We must
be able to explore their fusion with the predictive models to be developed by the
data science task.
5. Build databases from current ones (if required): Now that we know the
goals/objectives of the different environments, we can create comprehensive
databases with the relevant data and variables. Different procedures can be used
to start preparing the data for the modeling efforts by the advanced analytics
team.
6. Knowledge Discovery and Predictive Modeling: Develop the different models and
discover relationships according to the goals/objectives of the data science
task. It is important to explore the information fusion of the different models.
7. Deployment of the models developed: This not only includes the development
of a user interface but also includes the interpretation of the models’ answers in
the corresponding technical language. An integrity management plan must be
implemented with the appropriate documentation.


Evolutionary algorithms are search and optimization procedures that are motivated
by the principles of natural genetics and natural selection (Deb, 2001). This concept was
first developed during the 1970s by John Holland and his students at the University of
Michigan, Ann Arbor (Deb, 1989). The goals of their research have been twofold: (1) to
abstract and rigorously explain the adaptive processes of natural systems, and (2) to
design artificial systems software that retains the important mechanics of natural
selection (Goldberg, 1989). Eventually, this approach has led to important discoveries
and advancements in both natural and artificial systems science.
Over the last two decades, EAs have been extensively used as search and
optimization tools in various problem domains, including science, commerce and
engineering (Deb, 2001). These algorithms have been found to be very successful in
arriving at an optimal/near-optimal solution to complex optimization problems, where
traditional search techniques fail or converge to a local optimum solution. The primary
reasons for their success are their wide applicability, ease of use, and global dimension.
There are several variations of EAs. Blum et al. (2011) stated that standard EA includes a
set of principles and a common cycle. This set of principles is explained as follows:

1. Populations of individuals which represent solutions or strategies

2. Populations change dynamically due to the different “natural” processes of
reproduction and generations
3. An individual survives and reproduces according to the “advantages” given by its
level of fitness.
4. New generations resemble their parents but are not identical.

EAs follow a cycle similar to the one depicted in Figure 1. An initial population is
built based on random creation of individuals with their respective chromosomes. Some
individuals of this initial population can be generated using metaheuristics and other
optimization engineering schemes. The population is mapped from the genetic
representation (i.e., chromosome instance) to a fitness based one (representation required
to be assessed by the environment). That means that the particular individual needs to be
represented in a different way to obtain the value of the objective function (as given by
the assessment process). For example, a chromosome instance (representing a particular
individual) can represent now a discrete-event simulation program that needs to be
executed to obtain the value of the objective function. If the performance criterion is met,
this cycle (i.e., evolution) stops. Otherwise, the evolutionary cycle continues with the
generation of the next population. That means that after the values of the objective
function are obtained for each member of the population, the fitness values are
determined in a relative manner. The mating pool is formed by the members which have
the best relative fitness.
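The cycle just described (random initialization, assessment by the objective function, a stopping test, a mating pool drawn from the fittest, and reproduction) can be condensed into a short sketch. The bit-string chromosomes, the toy "count the 1-bits" objective, and all parameter values are our illustrative assumptions, not from the chapter:

```python
import random

# Illustrative sketch of the basic EA cycle: initialize, assess, select, reproduce.

random.seed(1)

def objective(chromosome):
    """Environment's assessment of an individual (toy fitness: number of 1-bits)."""
    return sum(chromosome)

def ea_cycle(pop_size=20, genes=16, generations=40):
    # Initial population: random creation of individuals (chromosomes).
    population = [[random.randint(0, 1) for _ in range(genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=objective, reverse=True)
        if objective(scored[0]) == genes:          # performance criterion met
            break
        mating_pool = scored[:pop_size // 2]       # best relative fitness survives
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = random.sample(mating_pool, 2)
            cut = random.randrange(1, genes)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            if random.random() < 0.1:              # occasional mutation
                i = random.randrange(genes)
                child[i] ^= 1
            offspring.append(child)
        population = offspring                     # next generation
    return max(population, key=objective)

best = ea_cycle()
print(objective(best))
```

Population handling here simply replaces the whole population with the offspring; as Blum et al. note, other replacement strategies are possible.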

Figure 1: Basic cycle of EAs.

The next step is reproduction where offspring are derived from the selected
individuals by applying the reproduction operations. There are usually three different
reproduction operations: (1) mutation, which modifies with some probability the original
structure of a selected individual; (2) reproduction (i.e., cloning of some individuals to
preserve features which contribute to higher fitness); and (3) crossover, which combines
two chromosome instances in order to generate offspring. Blum et al. (2011) described
that “whether the whole population is replaced by the offspring or whether they are
integrated into the population as well as which individuals to recombine with each other
depends on the applied population handling strategy.”
The most popular EAs are Genetic Algorithms (GAs), Genetic Programming (GP),
Evolutionary Strategies (ES) and Evolutionary Programming (EP). The basic idea behind
GP is to allow a computer/machine to emulate what a software programmer does. The
software programmer develops a computer program based on objectives and gradual
upgrades. Langdon et al. (2010) stated that GP “does this by repeatedly combining pairs
of existing programs to produce new ones, and does so in a way as to ensure the new
programs are syntactically correct and executable. Progressive improvement is made by
testing each change and only keeping the better changes. Again this is similar to how
people program, however people exercise considerable skill and knowledge in choosing
where to change a program and how.” Unfortunately, GP does not have the knowledge
and intelligence to change and upgrade the computer programs. GP must rely on
gradients, trial and error, some level of syntactic knowledge, and chance.

GP is basically a variation of the genetic algorithm, and it follows the standards of
EAs as outlined above. The individuals in a GP population are computer programs,
represented as trees. These hierarchical, structured trees form the population, and they
can have different sizes. Given the tree
representation, genetic operators such as tree crossover must be modified accordingly.
The following computer programs are parental programs as displayed in Figure 2.
The first one is 0.25Y + X + 1.75, or as a LISP S-expression (+ (* 0.25 Y) (+ X
1.75)). The second program is XY(X / 0.455Z), or as a LISP S-expression (* (* Y X)
(/ X (* 0.455 Z))). These parental programs have “point-labeled” trees with ordered
branches.

Figure 2: Two parental computer programs which are rooted with ordered branches.

The standard crossover operation of an EA in the case of GP creates offspring by the
exchange of subtrees. Subtrees can be considered subroutines, subprocedures or
subfunctions. The subtrees selected by GP are the fragments exchanged in crossover (see Figure 3).
For example, it can select specific labeled sections of the parental computer programs of
Figure 2 and decide on a subtree from each parent to be used for the creation of offspring
(Figure 3). The first subtree is selected at the point labeled 2 (randomly selected) of the first
parental computer program: 0.25Y, or as a LISP S-expression (* 0.25 Y). The second
subtree is selected at the point labeled 5 (randomly selected) of the second parental
computer program: X / 0.455Z, or as a LISP S-expression (/ X (* 0.455 Z)).

Figure 3: Subtrees of two parental computer programs selected for crossover.


The remainders are part of the parental programs not selected for crossover (Figure
4). The remainders are available in order to generate offspring. The first offspring can be
created by inserting the second parent’s crossover fragment into the first parent’s
remainder at the first parent’s crossover point (Figure 5). The second offspring can be
created by inserting the first parent’s crossover fragment into the second parent’s
remainder at the second parent’s crossover point (Figure 5).

Figure 4: Remainders.

The new computer programs (offspring) are displayed in Figure 5. The first one is
X/0.455Z + X + 1.75, or as a LISP S-expression (+ (/ X (* 0.455 Z)) (+ X 1.75)).
The second program is 0.25XY², or as a LISP S-expression (* (* Y X) (* 0.25 Y)).
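The subtree exchange in Figures 2 through 5 can be reproduced by representing each program as a nested tuple in prefix form, mirroring the LISP S-expressions; the helper functions below are our own illustrative sketch:

```python
# Reproduction of the subtree crossover of Figures 2-5. Programs are nested
# tuples in prefix form; index 0 of each tuple is the operator, the rest are
# its ordered branches.

parent1 = ('+', ('*', 0.25, 'Y'), ('+', 'X', 1.75))              # 0.25Y + X + 1.75
parent2 = ('*', ('*', 'Y', 'X'), ('/', 'X', ('*', 0.455, 'Z')))  # XY(X / 0.455Z)

def get(tree, path):
    """Fetch the subtree at a child-index path."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of `tree` with the subtree at `path` swapped for `subtree`."""
    if not path:
        return subtree
    head, *rest = path
    return tuple(replace(c, rest, subtree) if i == head else c
                 for i, c in enumerate(tree))

frag1 = get(parent1, (1,))   # crossover fragment 0.25Y from parent 1
frag2 = get(parent2, (2,))   # crossover fragment X / 0.455Z from parent 2

offspring1 = replace(parent1, (1,), frag2)   # X/0.455Z + X + 1.75
offspring2 = replace(parent2, (2,), frag1)   # 0.25XY^2

print(offspring1)
print(offspring2)
```

The two offspring tuples match the two programs of Figure 5: each parent's remainder with the other parent's fragment inserted at the crossover point.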

Figure 5: Offspring programs developed from the previous parental programs (Figure 2), subprograms
selected for crossover (Figure 3), and the remainders (Figure 4).

It is important to require structure-preserving crossover. This is achieved by the
concept of invariant points. The parental programs in Figure 2 are point-labeled trees, and
some of those points can be defined as invariant. Therefore, “structure-preserving
crossover never alters the invariant points of an overall program” (Koza, 1994). More
sophisticated structures can be implemented by using ontologies and constrained
syntactic structures.

GP assigns the computer the ability to develop computer programs/models
(represented as tree structures). GP is based on the “Darwinian ideas of survival of the
fittest” and the operators of crossover, mutation, and reproduction (Koza, 1994; Koza et
al., 2003). Figure 6 (modified from Koza (1994)) depicts this process, where Gen is the
current generation, M is the population size, and i is the current individual in the
population. The initial
population of size M is created and Gen is initialized to 0. As stated by Ratner (2008),
“The process begins with a fitness function and a set of user-selectable mathematical and
logical functions” from which a predictive model can be formed. “A first generation of as
many as 250 - 1000 models is randomly generated using the functions and variables
available; the fitness of each model is evaluated using collected data” (Ratner, 2008).
Each individual i of the population is evaluated from the objective function viewpoint
and their relative fitness calculated. Then, the highest values are compared with the
objectives of the project/session and it can be decided to stop if met, otherwise the
evolutionary process will continue.
The next generations of models are created following the processes of mating
(crossover), reproduction, and mutation. Crossover (as explained above) is when two
computer programs pair off. The resulting offspring are blends of the parents’ genetic
material. Reproduction is just the cloning of the best individuals that evolution should
maintain for the next generation. On the other hand, mutation in GP is considered as a
secondary operator. Piszcz & Soule (2005) have shown that mutation can improve
performance when combined with crossover. There are several mutation realizations in
GP environments. However, the most utilized ones are: node based mutation (i.e., a rate
specifies the node’s probability – Figure 7) and tree based mutation (i.e., a rate gives the
frequency that individuals are selected). When applying mutation in GP, “the mutation
rate is set to 1/C, where C denotes the size of the tree” (Piszcz & Soule, 2005). After a
number of generations, GP provides a predictive model adapted to the objective.
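Node-based mutation with the 1/C rate can be sketched as follows; the terminal and function sets, helper names, and tree shapes are our illustrative assumptions, not from the chapter:

```python
import random

# Sketch of node-based mutation: each node is a candidate mutation point with
# probability rate = 1/C, where C is the size of the tree (Piszcz & Soule).

TERMINALS = ['X', 'Y', 'Z', 0.25, 1.75]
FUNCTIONS = ['+', '-', '*', '/']

def size(tree):
    """Number of nodes in a prefix-tuple tree (a terminal counts as one node)."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])

def random_tree(depth=2):
    """Grow a small random subtree to splice in at the mutation point."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(FUNCTIONS), random_tree(depth - 1), random_tree(depth - 1))

def mutate(tree, rate):
    """Visit each node; with probability `rate`, replace the subtree rooted there."""
    if random.random() < rate:
        return random_tree()
    if not isinstance(tree, tuple):
        return tree
    return (tree[0],) + tuple(mutate(child, rate) for child in tree[1:])

program = ('+', ('*', 0.25, 'Y'), ('+', 'X', 1.75))   # 0.25Y + X + 1.75, C = 7
rate = 1.0 / size(program)                            # mutation rate 1/C
print(mutate(program, rate))
```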
GP is considered a non-statistical methodology. Its major use is predictive modeling.
However, GP can also be used for knowledge discovery as explained by Koza et al.
(2003). Unfortunately, the current articles in predictive analytics mention genetic
algorithms but not GP.


NASA's Space Shuttle was the first orbital spacecraft that was a reusable launch
vehicle. At launch, it consisted of the following major systems: external tank, solid rocket
boosters, and orbiter (Figure 8). After 2003, there were three orbiters: Atlantis, Discovery
and Endeavour. Discovery completed its final mission on March 9, 2011 and Endeavour
on June 1, 2011. The landing of Atlantis on July 21, 2011 marked the closing of the
30-year program. The lessons learned and developments of this 30-year program will
impact future programs such as the one to go to Mars (Rabelo et al., 2011; Rabelo et al.,
2012; Rabelo et al., 2013).

Figure 6: Generic process of Genetic Programming.

Figure 7: Example of node-based mutation with an IFLTE (If-Less-Than-or-Equal) node as root.

Figure 8: The NASA Space Shuttle and its main components (NASA, 2005).

One of the most important systems in the Space Shuttle is the Thermal Protection
System (TPS). The TPS is made up of diverse materials “applied externally to the outer
structural skin of the orbiter to maintain the skin within acceptable temperatures,
primarily during the entry phase of the mission” (NASA, 2002). The TPS is built from
materials selected for stability at high temperatures and weight efficiency. Reinforced
carbon-carbon (RCC) is used on the wing leading edges; the nose cap, including an area
immediately aft of the nose cap; and the immediate area around the forward
orbiter/external tank structural attachment. RCC protects areas where temperatures
exceed 2,300 °F during re-entry (NASA, 2004).
The wing leading edges are one of the highest reentry heating areas. The wing
leading edges are composed of 22 panels (Figure 9). These panels are fabricated with
RCC. To begin fabrication of these RCC panels, a foundation of woven fabric is
positioned such that all plies are alternating in the 0 and 90 degree directions. During the
manufacturing process, silica is infused in the outer layers, and the resulting laminate is
heated in specialized reactors that have an inert environment to form a silicon-carbide

(SiC) coating (Gordon, 1998). The manufacturing process, the temperature profiles, and
the infusion rates can create cavities in the carbon-carbon substrate. Micro-cracks in the
SiC coating can be also created. These substrate cavities and coating micro-cracks result
in a material with complex behavior (a tough-brittle material behavior with plasticity -
Figure 10). This needs to be emphasized due to the extreme environment and conditions
to be experienced during the re-entry phase of the orbiter.

Figure 9: The left wing of the NASA Space Shuttle with Reinforced Carbon-Carbon Panels (NASA,
2006). The only panels numbered in the picture are those panels numbered 1 through 10, 16 and 17.
There are 22 RCC panels on each wing's leading edge.

The manufacturing lead time of RCC panels is almost 8 months, and their cost is high
due to the sophistication of the labor and the manufacturing equipment. It is an
engineered-to-order process. It would be valuable to know the health and useful life of an
RCC panel; a predictive system can indicate a future outcome of overhaul or disposal.
NASA developed several Non-Destructive Evaluation (NDE) methods to measure the
health of the RCC materials, such as advanced digital radiography, thermography, high
resolution computed digital tomography, advanced eddy current systems, and advanced
ultrasound (Madaras et al., 2005; Lyle & Fasanella, 2009). Of those, thermography is
the preferred one due to features such as ease of implementation in the orbiter’s servicing
environment in the Orbiter Processing Facility (OPF), its non-contacting, one-sided
application, and its ability to measure the health of the RCC panel (Cramer et al., 2006). This NDE
method can be performed between flights. In addition, this information can be fed to a

predictive modeling system to find symptoms of damage, deterioration, or excessive wear
in future flights.

Figure 10: RCC is a lightweight heat-shielding material (NASA, 2008).

From 2008 to 2011, NASA assembled a Tiger Team to study
potential issues with the shuttle’s Reinforced Carbon-Carbon (RCC) leading-edge panels
(Dale, 2008). The Tiger Team’s investigation generated huge amounts of structured and
unstructured data on the RCC panels. These big data were used with different
methodologies to build analysis and prediction models. One of the methodologies studied
was GP.


We will explain in more detail step 6 of the framework outlined in the section
Complexity and Predictive Analytics, assuming that steps 1–5 have been
completed successfully (an effort that can take several months for this case study).

Knowledge Discovery and Predictive Modeling

Input engineering is about the investigation of the most important predictors. There
are different phases, such as attribute selection to choose the most relevant attributes. This
involves removing redundant and/or irrelevant attributes, which leads to
simpler models that are easier to interpret and to which structural knowledge can be added.
There are different filters to be used with the respective objectives such as:

• Information gain
• Gain ratio
• Correlation
  - High correlation with the class attribute
  - Low correlation with other attributes
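The correlation criteria above (high correlation with the class attribute, low correlation with already-selected attributes) can be sketched as a simple filter; the data set, column names, and thresholds are invented for illustration:

```python
# Sketch of a correlation filter: keep attributes strongly correlated with the
# class and weakly correlated with attributes already selected.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

a1    = [1, 2, 3, 4, 5, 6]              # informative attribute
a2    = [1, 3, 2, 5, 4, 6]              # informative, partly independent of a1
a1dup = [1.1, 2.0, 3.1, 4.0, 5.1, 6.0]  # near-copy of a1 -> redundant
noise = [3, 1, 4, 1, 5, 2]              # weakly related to the class
cls   = [2, 4, 6, 8, 10, 12]            # class attribute

attributes = {'a1': a1, 'a2': a2, 'a1dup': a1dup, 'noise': noise}

selected = []
# Consider attributes in decreasing order of correlation with the class.
for name, col in sorted(attributes.items(),
                        key=lambda kv: -abs(pearson(kv[1], cls))):
    if abs(pearson(col, cls)) < 0.5:       # low correlation with the class
        continue
    if any(abs(pearson(col, attributes[s])) > 0.95 for s in selected):
        continue                           # redundant with a selected attribute
    selected.append(name)

print(selected)   # a1dup and noise are filtered out
```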

Another important factor is to select individual attributes and subsets of them. The
direction of the search (e.g., best first, forward selection) is an important decision. In
addition, the approach selected for the RCC problem was a “model of models.” A
very important issue is to look for kernels, levels of interaction, and synthetic attributes.
Visualization is always important (there are many tools available for visualization).
We learned from visualizations that the relative location of the panel and the position of a
specific point in the area of a panel are important factors to differentiate the level of wear
and deterioration (Figure 11).
Attribute subset evaluators and crossvalidation were used with best-first and
backward (starting from complete set) using neural networks (Backpropagation). This
was performed to better understand the data.
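
The subset search itself can be sketched as follows. The chapter used best-first and
backward searches evaluated with neural networks; this simplified illustration implements
greedy forward selection with a caller-supplied evaluator (the attribute names and the toy
scoring function are invented, the evaluator standing in for a cross-validated model score):

```python
def forward_selection(attributes, evaluate):
    """Greedy forward search: repeatedly add the attribute that most improves
    the subset score returned by `evaluate` (e.g., cross-validated accuracy),
    stopping when no single addition helps."""
    selected, best_score = [], float('-inf')
    improved = True
    while improved:
        improved = False
        for attr in attributes:
            if attr in selected:
                continue
            trial = evaluate(selected + [attr])
            if trial > best_score:
                best_score, best_attr, improved = trial, attr, True
        if improved:
            selected.append(best_attr)
    return selected, best_score

# Toy evaluator (invented): rewards two informative attributes and slightly
# penalizes subset size, standing in for a cross-validated model score.
useful = {'panel_position', 'point_location'}
score = lambda subset: len(useful & set(subset)) - 0.1 * len(subset)
print(forward_selection(['panel_position', 'point_location', 'noise'], score))
```

A backward search would start from the complete attribute set and remove one
attribute at a time instead.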

Figure 11: Visualization of the average deterioration of specific panels for the three NASA shuttles.

Synthetic Attributes

Synthetic attributes are combinations of single attributes that are able to contribute to
the performance of a predictor model. Synthetic attributes create higher-dimensional
feature spaces, and these higher-dimensional feature spaces support better classification
performance. For example, cosine(X * Y²) is a synthetic attribute formed from the single
attributes X and Y. Therefore, GP can contribute not only a complete solution but also
synthetic attributes.
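
A minimal sketch of computing such a synthetic attribute and appending it to a data set
(the function name and the sample records are illustrative only):

```python
import math

def synthetic_attribute(x, y):
    """Illustrative synthetic attribute cos(x * y^2), combining the single
    attributes x and y into one new feature."""
    return math.cos(x * y ** 2)

# Augment each record with the synthetic attribute before modeling
records = [(0.5, 1.0), (1.2, 0.8), (2.0, 0.3)]
augmented = [(x, y, synthetic_attribute(x, y)) for x, y in records]
```

The augmented records can then be fed to any downstream learner, such as a neural
network, exactly like the original single attributes.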


The historical data is randomly split into two groups: one to build the model and the
other to test and confirm the accuracy of the prediction model. This two-group approach
can be used with a variety of AI algorithms to find the best set of predictors.
The majority of machine learning schemes use the confusion matrix as a way
to measure performance on the test data. The confusion matrix counts the number of
“individuals” for which the prediction was accurate. With the decile table, on the
other hand, it is possible to identify the specific individuals for which the model performs
best. The decile table measures the accuracy of a predictive model against a prediction
made without a model (Ratner, 2011).
The decile table is used to score the test sample on a scale of 1 to 100 based on the
characteristics identified by the algorithm, depending on the problem context. The list of
individuals in the test sample is then rank-ordered by score and split into 10 groups,
called deciles. The top 10 percent of scores is decile one, the next 10 percent is decile
two, and so forth. The deciles separate and order the individuals on an ordinal scale. Each
decile contains 10% of the total test sample. The actual number of responses in each
decile is then listed, after which other analyses such as response rate, cumulative
response rate, and predictability (based on the cumulative response rate) can be
performed. The performance in each decile can be used as an objective function for
machine learning algorithms.
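
The decile-table construction described above can be sketched as follows (a toy sample
with invented scores and outcomes; it assumes the test sample size is divisible by 10):

```python
def decile_table(scores, responses):
    """Rank individuals by model score (descending), split them into 10 equal
    deciles, and count the actual responses in each decile."""
    ranked = sorted(zip(scores, responses), key=lambda p: p[0], reverse=True)
    size = len(ranked) // 10  # assumes the sample size is divisible by 10
    table = []
    for d in range(10):
        chunk = ranked[d * size:(d + 1) * size]
        hits = sum(r for _, r in chunk)
        table.append({'decile': d + 1, 'n': size, 'responses': hits,
                      'response_rate': hits / size})
    return table

# Toy test sample of 100 individuals: higher score = stronger prediction,
# and (for a perfectly ranking model) the top half actually responded.
scores = list(range(100))
responses = [1 if s >= 50 else 0 for s in scores]
table = decile_table(scores, responses)
print(table[0]['response_rate'], table[9]['response_rate'])  # 1.0 0.0
```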

Genetic Programming Software Environment

The GenIQ System (Ratner, 2008; 2009), based on GP, is utilized to provide
predictive models. GenIQ lets the data define the model: it performs variable selection
and then specifies the model equation.
The GenIQ System develops the model by evolving generations of models so as to
optimize the decile table. As explained by Ratner (2008), “Operationally, optimizing the
decile table is creating the best possible descending ranking of the target variable
(outcome) values. Thus, GenIQ’s prediction is that of identifying individuals, who are
most-likely to least-likely to respond (for a binary outcome), or who contribute large
profits to small profits (for a continuous outcome).”
We decided to use a file with information about thermography and some selected
flights of Atlantis, Discovery, and Endeavour, drawn from the different databases available in
this project (on the order of petabytes). The data set was split into two separate sets: one for
training and the other for validation. The objective was to predict when to do an
overhaul of the respective RCC panel.
Table 1 shows the decile table for the 8,700 examples of the validation dataset (with
24 input parameters). There are 870 examples in each decile, as shown in the first
column. The second column shows the responses in each decile that were correctly
predicted by the model. The third column is the predicted response rate in %. The fourth
column is the cumulative response rate, starting from the top decile down to the bottom
one. For example, for the top decile it is 856 divided by 870, while the cumulative
response rate for the second decile is 856 plus 793 (1,649) divided by the 870 plus 870
examples of the first two deciles (1,740). The fifth column compares the different deciles
with respect to the bottom one. For example, the value of 1.32 for the top decile tells us
that the model predicts 1.32 times better than an answer provided by no model (i.e.,
randomly). The value of 1.32 is obtained by dividing the predicted response rate of the
top decile (98%) by the predicted response rate of the bottom decile (74%). That ratio is
the predictability.
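
These calculations can be reproduced directly from the numbers reported in the text:

```python
# Responses per decile from Table 1: 856 of 870 in the top decile (98%) and
# 793 of 870 in the second; the bottom-decile response rate is 74%.
top, second, per_decile_n = 856, 793, 870

cum_response_rate = (top + second) / (2 * per_decile_n)
print(round(cum_response_rate, 3))  # 0.948

top_rate, bottom_rate = 0.98, 0.74
lift = top_rate / bottom_rate  # the "predictability" of the top decile
print(round(lift, 2))  # 1.32
```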

Table 1: Decile table with the respective columns.

Figure 12 shows the bar graph of the predicted responses. It is flat in general (e.g.,
the predicted response of the 4th decile is greater than that of the 3rd decile), and the
bars have roughly the same height for the first five deciles. Therefore, the model has
moderate performance (74% in the validation set).

Figure 12: Predicted responses for each decile (from top to bottom).

The GenIQ Response Model tree in Figure 13 reflects the best model of the decile
table shown in Table 1. The model is represented using a tree structure. The output of the
GenIQ Model is two-fold (Ratner, 2008): a graph known as a parse tree (as in Figure 13)
and the corresponding model equation (computer code). A parse tree is comprised of
variables, which are connected to other variables by functions (e.g., arithmetic {+, -, /, x},
trigonometric {sine, tangent, cosine}, Boolean {and, or, xor}). In this case, it is a model
to predict when to do the overhaul. This model was very simple, and its performance on
the validation set (74%) was comparable to that of models using neural networks trained
with the backpropagation paradigm.
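
To make the parse-tree representation concrete, the following sketch evaluates a small
tree of variables and functions over a record of attribute values (the tree, function set, and
attribute names are invented for illustration; this is not the GenIQ model itself):

```python
import math

# A parse tree as nested tuples: (function, child, child, ...); leaves are
# attribute names or numeric constants, mirroring the tree-of-functions
# form described for the GenIQ output.
tree = ('+', ('cos', ('*', 'x', ('*', 'y', 'y'))), 'x')

FUNCS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
         '*': lambda a, b: a * b, 'cos': lambda a: math.cos(a)}

def evaluate(node, record):
    """Recursively evaluate a parse tree against a record of attribute values."""
    if isinstance(node, tuple):
        op, *children = node
        return FUNCS[op](*(evaluate(c, record) for c in children))
    if isinstance(node, str):
        return record[node]   # attribute lookup
    return node               # numeric constant

print(evaluate(tree, {'x': 0.5, 'y': 1.0}))  # cos(0.5) + 0.5
```

GP evolves such trees by crossover and mutation, scoring each one (here, by the
decile table) to select the next generation.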

Figure 13: Example of one of the earlier GP Models developed to calibrate the genetic process and the
generation of specific data. The model tries to predict when to do the overhaul.

After this moderate performance, the emphasis was on synthetic variables to be used
with neural networks. It was decided to develop a synthetic variable denominated Quality
Index (the value obtained from thermography). This synthetic variable is
displayed in Figure 14. The GenIQ Response Model computer code (model equation) is
also shown in Figure 15. This model can be deployed using any hardware/software
platform.

Figure 14: Example of one of the basic Genetic Programming Models developed to determine the
Quality Index of the composite materials as a synthetic attribute.

Figure 15: Programming code of the tree presented in Figure 14.



Conclusions
Our experience working with complex problems, incomplete data, and high noise
levels has led us to a more comprehensive methodology in which machine learning
base-models can be used with other types of empirical and exact models. Data
science is very popular in the marketing domain, where first-principle models are not
common. However, the next frontier of big data analytics is to use information fusion,
also known as multi-source data fusion (Sala-Diakanda, Sepulveda & Rabelo, 2010). Hall
and Llinas (1997) define data fusion as “a formal framework in which are expressed
means and tools for the alliance of data originating from different sources, with the aim
of obtaining information of greater quality”. Information fusion is going to be very
important for creating predictive models for complex problems. AI paradigms such as GP
embody a philosophy of “the data fits the model.” This viewpoint has many advantages
for automatic programming and the future of predictive analytics.
As future research, we propose combining GP concepts with operations research and
operations management techniques to develop methodologies where the data helps
model creation to support prescriptive analytics (Bertsimas & Kallus, 2014). As seen in
this chapter, these methodologies are applicable to decision problems. In addition, there
is a current tendency in the prescriptive analytics community to find and use better
metrics to measure the efficiency of the models besides the confusion matrix or decile
tables. Another important point for engineered systems is the utilization of model-based
systems engineering: SysML can be combined with ontologies in order to develop better
GP models (Rabelo & Clark, 2015). One point is clear: GP has the potential to be
superior to regression/classification trees because GP includes more operators, among
them the ones used by regression/classification trees.


Acknowledgments
We would like to thank Dr. Bruce Ratner, who provided the GenIQ Model for this
project. In addition, we would like to thank the NASA Kennedy Space Center (KSC).
KSC is the best place to learn about complexity.
The views expressed in this chapter are solely those of the authors and do not
necessarily reflect the views of NASA.


References
Bertsimas, D., & Kallus, N. (2014). From predictive to prescriptive analytics. arXiv
preprint arXiv:1402.5481.
Blum, C., Chiong, R., Clerc, M., De Jong, K., Michalewicz, Z., Neri, F., & Weise, T.
2011. Evolutionary optimization. In Variants of Evolutionary Algorithms for Real-
World Applications. Chiong, R., Weise, T., Michalewicz, Z. (eds.) Berlin/Heidelberg:
Springer-Verlag, 1–29.
Chen, C. & Zhang, C. 2014. Data-intensive applications, challenges, techniques and
technologies: a survey on big data. Information Sciences, 275, 314–347.
Cramer, E., Winfree, W., Hodges, K., Koshti, A., Ryan, D. & Reinhart, W. 2006. Status
of thermal NDT of space shuttle materials at NASA. Proceedings of SPIE, the
International Society for Optical Engineering, 17-20 April, Kissimmee, Florida.
Dale, R. (2008, July 23). RCC investigation: Tiger Team reveals preliminary findings.
Retrieved from
Deb, K. 2001. Multi-objective optimization using evolutionary algorithms. Hoboken, NJ:
John Wiley & Sons.
Frawley, W., Piatetsky-Shapiro, G., & Matheus, C. 1992. Knowledge Discovery in
Databases: An Overview. AI Magazine, 13(3), 213–228.
Gobble, M. 2013. Big Data: The Next Big Thing in Innovation. Research Technology
Management, 56(1): 64-66.
Goldberg, E. 1989. Genetic algorithms in search, optimization, and machine learning.
Boston, MA: Addison-Wesley Professional.
Gordon, M. 1998. Leading Edge Structural Subsystem and Reinforced Carbon-Carbon
Reference Manual. Boeing Document KLO-98-008.
Hall, D. & Llinas, J. 1997. An introduction to multisensor data fusion. Proceedings of the
IEEE, 85(1), 6-23.
Holland, J. 1975. Adaptation in Natural and Artificial Systems. Ann Arbor, MI:
University of Michigan Press.
NASA. 2002. Thermal Protection System. Retrieved from https://spaceflight
NASA. 2004. Report of Columbia Accident Investigation Board: Chapter 1. Retrieved
NASA. 2005. Space Shuttle Basics. Retrieved from
NASA. 2006. Shuttle Left Wing Cutaway Diagrams. Retrieved from
NASA. 2008. Reinforced Carbon-Carbon. Retrieved from
Koza, J. 1994. Genetic Programming II: Automatic Discovery of Reusable Programs.
Cambridge, MA: MIT Press.
Koza, J., Bennett, F.H., Andre, D., & Keane, M. 1999. Genetic Programming III:
Darwinian Invention and Problem Solving. San Francisco, CA: Morgan Kaufmann.
Koza, J., Keane, M.A., Streeter, M., Mydlowec, W., Yu, J., & Lanza, G. 2003. Genetic
Programming IV: Routine Human-Competitive Machine Intelligence. Norwell, MA:
Kluwer Academic Publishers.
Legarra, L., Almaiki, H., Elabd, Gonzalez, J., Marczewski, M., Alrasheed, M., & Rabelo,
L. 2016. A Framework for Boosting Revenue Incorporating Big Data. Journal of
Innovation Management, 4(1), 39-68.
Lyle, K. & Fasanellaa, E. 2009. Permanent set of the Space Shuttle Thermal Protection
System Reinforced Carbon–Carbon material. Composites Part A: Applied Science
and Manufacturing, 40(6-7), 702-708.
Madaras, E., Winfree, W., Prosser, W., Wincheski, R., & Cramer, K. 2005.
Nondestructive Evaluation for the Space Shuttle’s Wing Leading Edge. 41st
AIAA/ASME/SAE/ASEE Joint Propulsion Conference & Exhibit, 10 - 13 July 2005,
Tucson, Arizona.
Piszcz, A. & Soule, T. 2005. Genetic programming: Parametric analysis of structure
altering mutation techniques. In Rothlauf, F.; Blowers, M.; Branke, J.; Cagnoni, S.;
Garibay, I. I.; Garibay, O.; Grahl, J.; Hornby, G.; de Jong, E. D.; Kovacs, T.; Kumar,
S.; Lima, C. F.; Llora, X.; Lobo, F.; Merkle, L. D.; Miller, J.; Moore, J. H.; O’Neill,
M.; Pelikan, M.; Riopka, T. P.; Ritchie, M. D.; Sastry, K.; Smith, S. L.; Stringer, H.;
Takadama, K.; Toussaint, M.; Upton, S. C.; and Wright, A. H., eds., Genetic and
Evolutionary Computation Conference (GECCO2005) workshop program, 220–227.
Washington, D.C., USA: ACM Press.
Rabelo, L. & Clark, T. 2015. Modeling Space Operations Systems Using SysML as to
Enable Anomaly Detection. SAE Int. J. Aerosp. 8(2): doi:10.4271/2015-01-2388.
Rabelo L., P. Fishwick, Z. Ezzell, L. Lacy, and N. Yousef. 2012. Ontology-Centred
Integration for Space Operations. Journal of Simulation, 6(2012), 112–124,
Rabelo, L., Marin, M., & Huddleston, L. 2010. Data Mining and Complex Problems:
Case Study in Composite Materials. International Journal of Aerospace, 2(1), 165-
Rabelo, L., Sala-Diakanda, S., Pastrana, J., Marin, M., Bhide, S., Joledo, O., & Bardina,
J. 2013. Simulation modeling of space missions using the high level architecture.
Modeling and Simulation in Engineering, 2013, 11-18.
Rabelo, L., Sepulveda, J., Compton, J., & Turner, R. 2006. Simulation of Range Safety
for the NASA Space Shuttle. Aircraft Engineering and Space Technology Journal,
78(2), 98-106.
Rabelo L., Y. Zhu, J. Compton, and J. Bardina. 2011. Ground and Range Operations for a
Heavy-Lift Vehicle: Preliminary Thoughts. International Journal of Aerospace, 4(2),
Ratner, B. 2008. The GenIQ Model: FAQs.
GenIQ_FAQs.pdf, last accessed on June 10, 2017.
Ratner, B. 2009. Historical Notes on the Two Most Popular Prediction Models, and One
Not-yet Popular Model. Last accessed on June 10, 2017.
Ratner, B. 2011. Statistical and Machine-Learning Data Mining: Techniques for Better
Predictive Modeling and Analysis of Big Data. 2nd Edition. Boca Raton, Florida:
CRC Press.
Sala-Diakanda, S., Sepulveda, J., & Rabelo, L. 2010. An information fusion-based metric
for space launch range safety. Information Fusion Journal, 11(4), 365-373
Stockwell, A. 2005. The influence of model complexity on the impact response of a
shuttle leading-edge panel finite element simulation. NASA/CR-2005-213535, March
Witten, I. & Frank, E. 2005. Data Mining: Practical Machine Learning Tools and
Techniques (Second Edition). San Francisco, CA: Morgan Kaufmann Publishers.


Dr. Luis Rabelo was the NASA EPSCoR Agency Project Manager and is currently a
Professor in the Department of Industrial Engineering and Management Systems at the
University of Central Florida. He received dual degrees in Electrical and Mechanical
Engineering from the Technological University of Panama and Master’s degrees from the
Florida Institute of Technology in Electrical Engineering (1987) and the University of
Missouri-Rolla in Engineering Management (1988). He received a Ph.D. in Engineering
Management from the University of Missouri-Rolla in 1990, where he also did Post-
Doctoral work in Nuclear Engineering in 1990-1991. In addition, he holds a dual MS
degree in Systems Engineering & Management from the Massachusetts Institute of
Technology (MIT). He has over 280 publications and three international patents being
utilized in the Aerospace Industry, and has graduated 40 Master and 34 Doctoral students
as advisor.

Dr. Sayli Bhide is a researcher in virtual simulation and safety. She received a PhD in
Industrial Engineering from the University of Central Florida (UCF) in 2017. She
completed an M.S. in Engineering Management in 2014 at UCF and a B.S. in Electronics
Engineering from the University of Mumbai, India, in 2009. She has work experience in
software engineering at a multinational software company. Her research interests include
health and safety, modeling and simulation, ergonomics, and data analytics.

Dr. Mario Marin is a Researcher and Instructor at the Department of Industrial
Engineering and Management Systems (IEMS) at the University of Central Florida in
Orlando, Florida. He received his Ph.D. and M.S. degrees in Industrial Engineering from
the University of Central Florida (UCF) in 2014 and 2003, respectively. He has over 10
years’ experience as an Industrial Engineer, Designer, and Project Engineer on various
technical projects.

Edgar Gutierrez is a Research Affiliate at the Center for Latin-American Logistics
Innovation (CLI), part of the MIT SCALE Network, and a Fulbright Scholar currently
pursuing his PhD in Industrial Engineering & Management Systems at the University of
Central Florida (UCF) (Orlando, FL, USA). His educational background includes a B.S.
in Industrial Engineering from the University of La Sabana (2004, Colombia) and an
MSc. in Industrial Engineering from the University of Los Andes (2008, Colombia), and
he was a Visiting Scholar at the Massachusetts Institute of Technology (2009-2010,
USA). Edgar has over 10 years of academic and industry experience in prescriptive
analytics and supply chain management. His expertise includes machine learning,
operations research, and simulation techniques for systems modeling and optimization.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 9



Managing Overcrowding in Healthcare using Fuzzy Logic

Abdulrahman Albar¹,*, Ahmad Elshennawy²,
Mohammed Basingab³ and Haitham Bahaitham⁴
¹Industrial Engineering, Jazan University, Jazan, Saudi Arabia
²Industrial Engineering & Management Systems, University of Central Florida,
Orlando, Florida, US
³Industrial Engineering, King Abdulaziz University, Jeddah, Saudi Arabia
⁴Industrial Engineering, King Abdulaziz University, Rabigh, Saudi Arabia


Abstract
Emergency Departments (EDs) represent a crucial component of any healthcare
infrastructure. In today’s world, healthcare systems face growing challenges in delivering
efficient and time-sensitive emergency care services to communities. Overcrowding
within EDs represents one of the most significant challenges for healthcare quality.
Research in this area has produced several ED crowding indices, such as the
National Emergency Department Overcrowding Scale (NEDOCS), which have been
developed to provide measures aimed at mitigating overcrowding. Recently, efforts made
by researchers to examine the validity and reproducibility of these indices have shown
that they are not reliable in accurately assessing overcrowding in regions beyond their
original design settings. To overcome the shortcomings of previous indices, the study
presents a novel framework for quantifying and managing overcrowding based on
emulating human reasoning in overcrowding perception. The framework of this study
takes into consideration emergency operational and clinical factors such as patient
demand, patient complexity, staffing level, clinician workload, and boarding status when
defining the crowding level. The hierarchical fuzzy logic approach is utilized to
accomplish the goals of this framework by combining a diverse pool of healthcare expert
perspectives while addressing the complexity of the overcrowding issue.

* Corresponding Author Email:

196 Abdulrahman Albar, Ahmad Elshennawy, Mohammed Basingab et al.

Keywords: Overcrowding, Healthcare, Emergency Department, Expert Knowledge, Fuzzy logic


Introduction
The demand for healthcare services continues to grow, and lack of access to care
services has become a dilemma due to the limited capacity and inefficient use of
resources in healthcare (Bellow & Gillespie, 2014). This supply-demand imbalance and
resulting access block is causing overcrowding in healthcare facilities, one type of which
is emergency departments. These essential healthcare centers serve as a hospital’s front
door and provide emergency care service to patients regardless of their ability to pay.
According to the American Hospital Association (AHA) annual survey, the visits to
emergency departments in the USA exceeded 130 million in 2011 (AHA, 2014). In Saudi
Arabia, the Ministry of Health (MoH) reported nearly 21 million visits in 2012 (MOH,
2014). With this massive demand on emergency care services, emergency departments
mostly operate over capacity and sometimes report ambulance diversion.
When ED crowding started to become a serious problem, a need appeared to quantify
the problem to offer support in making emergency care operational decisions (Johnson &
Winkelman, 2011). As a result, four ED crowding measurement scales were developed:
the Real-time Emergency Analysis of Demand Indicators (READI) (Reeder &
Garrison, 2001), the Emergency Department Work Index (EDWIN) (Bernstein, Verghese,
Leung, Lunney, & Perez, 2003), the National Emergency Department Overcrowding
Score (NEDOCS) (Weiss et al., 2004), and the Work Score (Epstein & Tian, 2006).
However, many criticized the reliability, reproducibility, and validity of these crowding
measurement scales when implemented in emergency settings outside of the regions they
were originally developed in. Moreover, their efficiency has been a concern, especially
with regard to their dependency solely on emergency physicians’ and nurses’ judgment.
Currently, ED crowding has become a serious issue in many healthcare organizations
which affects both operational and clinical aspects of emergency care systems (Eitel,
Rudkin, Malvehy, Killeen, & Pines, 2010; Epstein et al., 2012). To evaluate such an
issue, healthcare decision makers should be provided with a robust quantitative tool that
measures the problem and aids in ED operational decision making (Hwang et al., 2011).
To achieve this, the proposed study aims to develop a quantitative measurement tool for
evaluating ED crowding that captures healthcare experts’ opinions and other ED
stakeholders’ perspectives and has the ability to be applied in a variety of healthcare
settings.

Managing Overcrowding in Healthcare using Fuzzy Logic 197



As shown in Figure 1, the proposed framework encompasses four components:
the crisp inputs, a fuzzy logic system, the expert knowledge, and the crisp outputs.
The figure further illustrates the relation between these components. While a fuzzy
system alone may be simple to design in general, what makes this framework novel is its
integration of expert knowledge in the form of a knowledge base with the fuzzy system.
The crisp inputs include identified measures and indicators that reflect many ED and
hospital operational aspects that affect ED’s crowding levels. The crisp inputs feed the
second component of the framework, the fuzzy logic system, with numerical information.
The fuzzy logic system includes the fuzzifier, fuzzy inference engine, knowledge base,
and defuzzifier, through which the crisp ED crowding measures are converted to a crisp
output. Expert knowledge is used to construct the knowledge base, consisting of the
fuzzy rules and the database, which supports the fuzzification of inputs, provides
decision-making information to the inference engine, and supports the defuzzification of
outputs. The resulting crisp output reflects the level
of overcrowding in the ED. The output of the framework is an index of ED overcrowding
that aids in measuring patient congestion and patient flow within EDs. It is a quantitative
instrument that evaluates the ED crowdedness based on the input of healthcare experts.
The output can be utilized with a decision support system to inform and aid an ED in
coping with ED crowding.
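
As a minimal illustration of the fuzzifier-inference-defuzzifier path just described, the
sketch below maps a single invented input (an occupancy percentage) to a crisp 0-100
crowding index; all membership parameters and rules here are illustrative stand-ins, not
the framework's actual expert knowledge base:

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify_occupancy(occupancy):
    """Fuzzifier: map an occupancy percentage to degrees of two fuzzy sets."""
    return {'low': tri(occupancy, -1, 0, 60), 'high': tri(occupancy, 40, 100, 101)}

# Rule base: IF occupancy is low THEN crowding is low, and so on. Each output
# set is summarized here by a representative point on the 0-100 crowding index.
rules = [('low', 20.0), ('high', 80.0)]

def crowding_index(occupancy):
    """Inference plus weighted-average (centroid-style) defuzzification."""
    degrees = fuzzify_occupancy(occupancy)
    num = sum(degrees[name] * point for name, point in rules)
    den = sum(degrees[name] for name, _ in rules)
    return num / den if den else 0.0

print(crowding_index(50))  # midway between the 'low' and 'high' outputs
```

The full framework applies the same three stages to seven inputs through a hierarchy of
subsystems, with the rules and membership functions supplied by healthcare experts.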

Figure 1: Proposed framework.



Hierarchical fuzzy systems (HFSs) are implemented by researchers for two main
purposes. First, they help in minimizing the total number of fuzzy rules in the knowledge
base which feed into the fuzzy inference engine. Second, the HFSs are effective in
building the logical relationship among different crisp input variables in complex
systems, unlike Standard Fuzzy Systems (SFSs), which become exponentially more
complicated as the number of variables and their fuzzy set levels increases. Figure 2 and
Figure 3 illustrate the difference between applying the traditional standard fuzzy logic
approach versus the hierarchical fuzzy logic approach to construct and determine
the relationship between a fuzzy subsystem’s crisp outputs and the main fuzzy system,
where O_n stands for the crisp output of fuzzy subsystem n, and O_f stands for the crisp
output of the main fuzzy system [7]. In the case of SFSs, the total number of fuzzy rules
grows exponentially with the number of crisp inputs, whereas it grows linearly in HFSs.
For instance, supposing that there are five crisp variables and each variable encompasses
five fuzzy sets, then for an SFS the total number of fuzzy rules for the whole fuzzy
system is 5⁵ = 3,125 rules, whereas in a four-level HFS with four fuzzy subsystems, each
encompassing two crisp inputs, the total number of fuzzy rules for the complete fuzzy
system is 4 × 5² = 100 rules. It is clear that utilizing HFSs
significantly reduces the total number of fuzzy rules necessary to construct the
knowledge bases for the whole fuzzy system. Thus, utilizing HFSs in this study makes it
possible to analyze the complicated nature of emergency health care systems, which if
studied through SFSs, could involve too many fuzzy rules and computations for an
effective analysis. It is also notable that the HFS structure detailed in Figure 3 will help in
determining the relationship between outputs of the fuzzy subsystems and the main fuzzy
system, and in specifying the relationship among fuzzy subsystems as well.
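
The rule-count arithmetic above can be verified with a short sketch:

```python
def sfs_rules(n_inputs, n_sets):
    """Rule count for a standard (flat) fuzzy system: n_sets ** n_inputs."""
    return n_sets ** n_inputs

def hfs_rules(inputs_per_subsystem, n_sets):
    """Rule count for a hierarchical system: the sum over its subsystems."""
    return sum(n_sets ** k for k in inputs_per_subsystem)

# Five variables with five fuzzy sets each, as in the text
print(sfs_rules(5, 5))             # 3125
print(hfs_rules([2, 2, 2, 2], 5))  # 100 (four two-input subsystems)
```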

Figure 2: Standard fuzzy logic system. Figure 3: Hierarchical fuzzy systems.



In order to define the fuzzy subsystems, Asplin’s comprehensive ED overcrowding
conceptual model was utilized, which divides emergency care processes into three
interdependent phases: input, throughput, and output (Asplin et al., 2003). Each phase in
Asplin’s model contributes significantly to the level of ED crowding, and this research
adapts the phases of Asplin’s conceptual model in developing the ED overcrowding
quantification tool. Many previous studies have taken three ED operational aspects into
consideration when developing quantitative instruments for crowding: emergency care
demand, ED workload, and discharge status. These same operational aspects are
adapted into the framework developed in this study, as shown in Figure 4. By utilizing
fuzzy logic, this study overcomes the limitations of previous studies by quantifying the
opinions of experts with different perspectives, reducing the introduction of bias into the
final assessment of crowding.
In addition to the three phases of Asplin’s model, information from ED professionals
and experts is integral to the framework used in this study. This research proposes a
three-level hierarchical fuzzy logic system which is developed based on available
information and knowledge from experts. The purpose of this proposed fuzzy system is to
accurately determine the level of ED crowding. Like the hierarchical system shown in Figure
3, the proposed fuzzy logic system includes seven inputs, four fuzzy inference systems
(fuzzy subsystems), and one output. The seven inputs of the proposed fuzzy logic system
are developed corresponding to four subsystems, related to Asplin’s three interdependent
phases, and are defined as follows:

Input 1: Patient Demand; Ratio of Waiting Patients to ED Capacity
Input 2: Patient Complexity (Waiting Area)
Input 3: ED Physician Staffing
Input 4: ED Nurse Staffing
Input 5: ED Occupancy Rate
Input 6: Patient Complexity (Emergency Room)
Input 7: Boarding Status; Ratio of Boarded Patients to ED Capacity

Figure 4: Determinants of ED crowding level.


Figure 5: Three-level hierarchical fuzzy expert system.

Figure 5 further illustrates the relation of these inputs to the proposed fuzzy logic
system. Level one of the hierarchical fuzzy expert system contains two fuzzy subsystems.
The first fuzzy subsystem aims to assess the ED’s demand status by evaluating the
ratio of patients in an ED waiting area to that emergency room’s capacity, and the
average patient complexity. Figure 6 illustrates the components of fuzzy subsystem I. The
first input to the fuzzy subsystem I is the ratio of waiting patients to ED capacity which is
characterized by four fuzzy membership functions; “Low”, “Medium”, “High”, and
“Very High”. To assess this input variable, trapezoidal functions are utilized to evaluate
the membership degree on an interval [0, 2]. The patient complexity, the second input to
the fuzzy subsystem I, is represented by three membership functions; “Low”, “Medium”,
and “High”. Similarly, a trapezoidal function is used for this input, evaluating the
membership degree on the interval [1, 5], which is adapted from the five levels of the
emergency severity index (Gilboy, Tanabe, Travers, Rosenau, & Eitel, 2005). Given
these fuzzy classes, the total number of fuzzy rules from this subsystem will be 12 fuzzy
rules (4×3). The output of fuzzy subsystem I is ED’s demand status, which is represented
by five membership functions; “Very Low”, “Low”, “Medium”, “High”, and “Very
High”. This output is evaluated with a triangular function for the interval [0, 100]. The
demand status is an intermediate variable rather than a final indicator, which feeds the
fourth and final fuzzy subsystem with a crisp value, to contribute to the final assessment
of the ED’s crowding level.
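
A sketch of the trapezoidal fuzzification used for the first input (the breakpoints below
are invented for illustration; the actual membership parameters come from the expert
knowledge base):

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear ramps between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Illustrative breakpoints on the [0, 2] demand-ratio interval; the chapter's
# actual parameters are elicited from experts and are not given here.
demand_sets = {
    'Low':       (-0.1, 0.0, 0.3, 0.5),
    'Medium':    (0.3, 0.5, 0.8, 1.0),
    'High':      (0.8, 1.0, 1.3, 1.5),
    'Very High': (1.3, 1.5, 2.0, 2.1),
}

def fuzzify_demand(ratio):
    """Membership degree of a waiting-patients-to-capacity ratio in each fuzzy set."""
    return {name: trapezoid(ratio, *p) for name, p in demand_sets.items()}

print(fuzzify_demand(0.9))  # partial membership in 'Medium' and 'High'
```

Each of the 12 rules of subsystem I then combines one of these four demand sets with
one of the three patient-complexity sets.
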
The second fuzzy logic subsystem, with two inputs and one output, is designed to
determine the level of ED staffing. Figure 7 presents the components of fuzzy subsystem
II. ED staffing status is subjective in nature and the membership functions that represent
this aspect of crowding reflect this subjectivity based on the knowledge from the health
care experts. The two inputs of this fuzzy subsystem are the level of ED physician
staffing and ED nurse staffing. Both inputs are represented by three membership
functions; “Inadequate”, “Partially adequate”, and “Adequate”, which are assessed on the
intervals [0, 0.32], and [0, 50], respectively, with trapezoidal functions. With these
membership functions, the total number of fuzzy rules in this subsystem will be 9 rules
(3²). The output of fuzzy subsystem II is the ED staffing status. The output is
represented by the same three membership functions; “Inadequate”, “Partially adequate”,
and “Adequate”, and is evaluated on a trapezoidal function with the interval [0, 100]. The
ED staffing status is an intermediate variable that feeds the third fuzzy subsystem with a
crisp value, which will serve as another variable for the assessment of the ED workload.
Finally, the ED workload will feed into the fourth fuzzy subsystem.

Figure 6: Fuzzy logic subsystem I. Figure 7: Fuzzy logic subsystem II.

The third fuzzy logic subsystem evaluates the ED workload. The three inputs of this
fuzzy subsystem are ED staffing level, ER occupancy rate, and average complexity of
patients who are being treated in the emergency room. It should be noted that the third input
shares the same characteristics as the second input of subsystem I, with the difference
being that the populations of these similar inputs are separate. Figure 8 illustrates the
components of fuzzy subsystem III. The ED staffing status, input one, is the output from
subsystem II, and is represented by three membership functions; “Inadequate”, “Partially
adequate”, and “Adequate”. Using the same membership function, this input is evaluated with
a trapezoidal function on the interval [0, 100]. The ER occupancy rate, which is an
independent input, is characterized by four membership functions; “Low”, “Medium”,
“High”, and “Very High”. The occupancy rate is evaluated with a trapezoidal function in the
interval [0, 100]. The third input, patient complexity, shares the same characteristics as the
second input of fuzzy subsystem I, as previously mentioned. Therefore, this third input is
represented by three membership functions; “Low”, “Medium”, and “High”, and is evaluated
with a trapezoidal function in the interval [1, 5]. With the three sets of membership indicators
in this subsystem, the number of fuzzy rules will now reach 36 rules (3²×4). The single output
of the third fuzzy logic subsystem is the ED workload. It is represented by four membership
functions; “Low”, “Medium”, “High”, and “Very High”. As other outputs are evaluated in
202 Abdulrahman Albar, Ahmad Elshennawy, Mohammed Basingab et al.

this interval of [0,100], this output is evaluated in the same interval, and its membership value
is assessed with a triangular function. The ED workload is an intermediate variable that feeds
the fourth fuzzy subsystem, and represents a major determinant of crowding by containing
four of the seven inputs alone. Combined with the output of subsystem I and the final input,
the output of subsystem III will contribute to subsystem IV’s assessment of emergency
department crowding.
In review, the first level of the hierarchical fuzzy expert system was composed of two
fuzzy logic subsystems, with the second level containing one subsystem, which is also
detailed in Figure 5. Level three of the hierarchical fuzzy expert system contains the fourth
and final fuzzy logic subsystem, which receives inputs in some manner from every previous subsystem.
This fourth fuzzy logic subsystem is the main component of this hierarchical fuzzy expert
system, which aims to assess the ED crowding level. The three inputs of this fuzzy subsystem
include the two previously mentioned indicators ED demand status and ED workload, and the
third, new input, which is the seventh independent input of the entire hierarchical system, is
ED boarding status. The components of fuzzy subsystem IV are illustrated in Figure 9. The
first input to this subsystem, the ED demand status, as previously described, is represented by
five triangular membership functions; “Very Low”, “Low”, “Medium”, “High”, and “Very
High”, with an interval of [0, 100]. The second input, the ED workload is represented by four
triangular membership functions; “Low”, “Medium”, “High”, and “Very High”. Its interval of
the crisp value is [0,100]. The third input, ED boarding status, is an independent variable,
which is derived from the ratio of boarded patients to the capacity of the emergency room.
This input has four fuzzy classes, like the second input, but is evaluated with a trapezoidal
membership function on an interval of [0, 0.4]. With the three sets of membership indicators
in this subsystem, the number of fuzzy rules is 80 (4²×5). The output of the fourth fuzzy logic
subsystem is the ED crowding level, and is the final output for the entire hierarchical system.
It is represented by five membership functions; “Insignificant”, “Low”, “Medium”, “High”,
and “Extreme”, which are used to indicate the degree of crowding in emergency departments.
Like other outputs, the interval of the crisp value for the final output is [0,100], and is
evaluated with a triangular function.
Utilizing the hierarchical fuzzy system appears to be the most appropriate approach for
this study, rather than the standard fuzzy system. This approach creates different indicators,
such as demand status, workload, and staffing indicators, while reducing the total number of
fuzzy rules from 5184 (under the standard fuzzy system) to just 137 rules. This difference
represents a substantial reduction in computation, simplifies the process of acquiring knowledge
from experts, and makes meaningful results easier to obtain.
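The rule-count comparison above can be reproduced with simple arithmetic. The sketch below (illustrative Python, not from the chapter) multiplies the membership-function counts of the seven independent inputs for the flat design, and sums the per-subsystem rule counts for the hierarchical one.

```python
from math import prod

# Membership-function counts for the seven independent inputs (from the text).
inputs = {
    "patient_demand": 4,
    "patient_complexity_I": 3,
    "physician_staffing": 3,
    "nurse_staffing": 3,
    "er_occupancy": 4,
    "patient_complexity_III": 3,
    "patient_boarding": 4,
}

# A flat (standard) system needs one rule per combination of all seven inputs.
standard_rules = prod(inputs.values())                  # 4*3*3*3*4*3*4 = 5184

# The hierarchical system only combines the inputs within each subsystem.
subsystem_rules = [4 * 3, 3 * 3, 3 * 4 * 3, 5 * 4 * 4]  # subsystems I-IV
hierarchical_rules = sum(subsystem_rules)               # 12 + 9 + 36 + 80 = 137

print(standard_rules, hierarchical_rules)               # 5184 137
```

The reduction comes purely from decomposition: rule-base size grows multiplicatively with the inputs of a single stage, but only additively across stages.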
Figure 8: Fuzzy logic subsystem III. Figure 9: Fuzzy logic subsystem IV.


This section describes the technical process of developing the proposed fuzzy expert
system, which would equip the designed framework with a knowledge base, a fuzzy
inference engine, fuzzifier and defuzzifier. The knowledge base consists of a fuzzy
database and a fuzzy rule base, in order to fuel the fuzzifier, defuzzifier, and inference
engine portions of the fuzzy subsystems.
First, the elicitation of expert knowledge for building the fuzzy database is described.
Second, the process of developing fuzzy rules is presented. Finally, the
fuzzification and defuzzification processes are explained conceptually and mathematically.

Knowledge Base

The knowledge base is an indispensable component of any fuzzy logic system, as it

contains both the fuzzy rules base and the database. The development of the knowledge
base is the keystone of the fuzzy system, and is the most challenging aspect of designing the
proposed model. The importance of this knowledge base stems from the dependency of
the other components of the system on it, including the fuzzifier, defuzzifier, and fuzzy
inference engine. Effectively, the knowledge base is the brain of the fuzzy system,
simulating reasoning from a human perspective. The creation of the knowledge base
involves systematic collection of qualitative and quantitative data from subject matter
experts. These experts have to meet the following criteria in order to be eligible to
participate in the membership intervals determination and fuzzy rules evaluation:

 The expert works or has recently worked in Saudi Arabian healthcare institutions
for at least five years, or has conducted research in the field of Saudi healthcare.
 The expert has deep experience in the daily operations of emergency care departments.
 The expert has solid knowledge in staffing, performance management, healthcare
administration, patient flow analysis, and bed management.

To create a robust knowledge base for the proposed fuzzy system, a minimum of ten
experts who meet these qualifications is required. When discussing these experts here
and elsewhere in this study, an assigned code “HCE-k” is used for each participating
expert, where HCE stands for Healthcare Expert, and k stands for the expert number.


This study adopts the indirect interval estimation elicitation method. Such a method
carries advantages such as allowing responses from multiple subject matter experts, while
not requiring knowledge of membership functions. Additionally, under this approach,
fewer questions may be used, and given questions may be easier to answer than those in
other approaches. To elicit the degrees of membership for a fuzzy class, let [x_j^i, x̄_j^i]
represent the interval of values for fuzzy class j determined by expert i, where x_j^i and
x̄_j^i are its lower and upper bounds. The steps
to elicit and analyze expert knowledge are described as follows:

- Determine all interval values for each j obtained from experts.

- Perform an intersection for j subset intervals to obtain expert consensus.
- Find ambiguous areas among determined intervals.
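The intersection step above can be sketched as follows; the function name and the interval representation are illustrative, not taken from the study.

```python
def consensus_and_ambiguity(intervals):
    """intervals: list of (lower, upper) pairs, one per expert, for one fuzzy class.

    Returns the intersection (the full-consensus region, or None if empty) and
    the union (the overall range; union minus intersection is the ambiguous area).
    """
    lo = max(l for l, _ in intervals)   # intersection lower bound
    hi = min(u for _, u in intervals)   # intersection upper bound
    union = (min(l for l, _ in intervals), max(u for _, u in intervals))
    core = (lo, hi) if lo <= hi else None
    return core, union

# e.g., three experts' intervals for a hypothetical "low" class
core, union = consensus_and_ambiguity([(0, 0.3), (0, 0.2), (0.05, 0.25)])
print(core, union)   # (0.05, 0.2) (0, 0.3)
```

The intersection naturally becomes a membership-function core (full agreement), while the ambiguous regions become the sloped supports, which is how the overlap between adjacent classes is handled later in the chapter.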

Fuzzy Rule Base

The fuzzy rule base is the other key part of the knowledge base, alongside the
database. It stores all derived fuzzy rules, which are intended to provide the fuzzy
inference engine with decision support information within each subsystem. To robustly
create fuzzy rules for each fuzzy logic subsystem, experts are given a form to assess the
consequences of each condition statement, developed from the permutation of each fuzzy
class for a given fuzzy subsystem. A total of 10 healthcare experts will participate in the
fuzzy rules assessment process. The total numbers of fuzzy rules to be evaluated by
subject matter experts for the fuzzy logic subsystems I, II, III, and IV are 12 (4×3),
9 (3²), 36 (3²×4), and 80 (4²×5), respectively. Therefore, the proposed three-level
hierarchical fuzzy expert system includes a total of 137 fuzzy rules, meaning that there
will be a total of 1370 fuzzy rule assessments from the ten experts. The process of
developing the fuzzy rules is detailed in the following steps:
 List all possible permutations of “AND” rules for each fuzzy logic subsystem.
 Code each rule with “FLSm-n”, where FLS stands for Fuzzy Logic Subsystem, m
stands for the subsystem number, and n stands for the rule number within subsystem m.
 Code “HCE-k” for each participating expert, where HCE stands for Healthcare
Expert, and k stands for the expert number.
 The Expert HCE-k determines the consequence of the fuzzy conditional statement
FLSm-n based on their expertise.
 The fuzzy conditional statement FLSm-n must meet a 50% consensus rate among
experts, and must be the only consequence to receive a 50% consensus rate, to be
accepted as a valid fuzzy rule.
 If the consensus rate does not meet the determined criteria, further iterations should
be conducted with a new expert until the consensus rate achieves the criteria in the
previous step.
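A minimal sketch of the consensus criterion, assuming votes are collected as linguistic labels (the function and the sample data are hypothetical):

```python
from collections import Counter

def accepted_consequence(votes):
    """votes: one linguistic label per expert (e.g., ten labels for one rule).

    A consequence is accepted only if exactly one class reaches the 50%
    consensus rate; otherwise the rule goes to a further elicitation round.
    """
    counts = Counter(votes)
    threshold = len(votes) / 2
    winners = [label for label, n in counts.items() if n >= threshold]
    return winners[0] if len(winners) == 1 else None

votes = ["Medium"] * 6 + ["High"] * 3 + ["Low"]
print(accepted_consequence(votes))                       # Medium
print(accepted_consequence(["Low"] * 5 + ["High"] * 5))  # None (5-5 tie)
```

Note that the uniqueness condition matters: with ten experts, a five-to-five split gives two classes at exactly 50%, so no consequence is accepted and another expert is consulted.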

The process for developing fuzzy rules is illustrated in Figure 10, where the
consensus feedback is elaborated upon in more detail.

Figure 10: Process for developing Fuzzy Rules.

Fuzzification Process

Fuzzification is the first step in the fuzzy system, as it obtains both the membership
function type and the degree of membership from the database. This database is built
from the surveyed expert determination of membership function intervals. In the
fuzzification process, crisp values which are within the universe of discourse of the input
variable are translated into fuzzy values, and the fuzzifier determines the degree to which
they belong to a membership function. The fuzzifier for this designed fuzzy system
adopts the Minimum approach. While the input is crisp, the output is a degree of
membership in a qualitative set. The fuzzified outputs allow the system to determine the
degree to which each fuzzy condition satisfies each rule.
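As an illustration of fuzzification, the sketch below evaluates a trapezoidal membership function and combines two fuzzified inputs with the Minimum operator. The break points are taken from the physician staffing and ER occupancy classes described later in this chapter; the function itself is a generic textbook trapezoid, not the authors' implementation.

```python
def trapezoid(x, a, b, c, d):
    """Degree of membership of crisp value x in the trapezoid (a, b, c, d)."""
    if b <= x <= c:
        return 1.0                    # core: full membership
    if a < x < b:
        return (x - a) / (b - a)      # rising support
    if c < x < d:
        return (d - x) / (d - c)      # falling support
    return 0.0                        # outside the class boundary

# Physician staffing 0.10 in the "inadequate" class (core 0-0.06, bound 0.12)
mu_inadequate = trapezoid(0.10, 0.0, 0.0, 0.06, 0.12)
# ER occupancy 80 in the "high" class (boundary 45-90, core 65-70)
mu_high = trapezoid(80, 45, 65, 70, 90)
# The firing strength of an AND rule is the minimum of its input memberships.
firing = min(mu_inadequate, mu_high)
print(round(mu_inadequate, 3), round(mu_high, 3), round(firing, 3))  # 0.333 0.5 0.333
```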
Defuzzification Process

After the fuzzifier converts numerical inputs into fuzzy values, and the fuzzy
inference engine is fed by the knowledge base to logically link the inputs to the output,
the last remaining step in the fuzzy system occurs in the defuzzifier. Defuzzification is the
process where the fuzzy values are converted into crisp values. The defuzzifier is fed by
the database, and its importance lies in the fact that its crisp output is the desired product
of the entire system. Seven defuzzification methods are identified (Sivanandam, Sumathi,
& Deepa, 2007): centroid method, max-membership method, mean-max membership,
weighted average method, center of sums, first of maxima or last of maxima, and center
of largest area. This research adopts the centroid method for the defuzzification process,
and its formula is defined as follows: z* = ∫ μ_C(z) · z dz / ∫ μ_C(z) dz, where μ_C is the
aggregated output membership function.
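A discrete approximation of the centroid formula can be sketched as follows (illustrative Python, with a made-up clipped output set; the sums stand in for the integrals):

```python
def centroid(mu, z_min=0.0, z_max=100.0, n=1001):
    """Discrete centroid: weighted average of z by mu(z) over the universe."""
    step = (z_max - z_min) / (n - 1)
    zs = [z_min + i * step for i in range(n)]
    num = sum(mu(z) * z for z in zs)   # approximates the integral of mu_C(z)*z
    den = sum(mu(z) for z in zs)       # approximates the integral of mu_C(z)
    return num / den if den else None

# e.g., a "Medium" output class peaking at 50, clipped at 0.6 by a fired rule
mu_medium = lambda z: max(0.0, min(0.6, 1 - abs(z - 50) / 25))
print(round(centroid(mu_medium), 1))   # 50.0 (symmetric set: centroid = peak)
```

In practice the defuzzifier would receive the aggregated membership function of all fired rules rather than a single clipped class, but the computation is the same.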


In the preceding section, the protocol for eliciting expert knowledge was described,
covering membership intervals, rule assessments, consensus rates, and other data. Next,
a preparatory step must be taken to obtain results for the proposed model. In this step,
data will be prepared before it is added to the knowledge base, interval values will be
used to construct membership functions, and data from expert rule assessments will
contribute to the rule base.
Expert knowledge was sought from ten experts, designated with HCE expert codes.
The occupation and qualifications of these experts are described as follows:

 HCE-01: An experienced healthcare administrator in the Saudi public healthcare sector

 HCE-02: A physician, professor, and consultant of emergency medicine in
several healthcare organizations in Saudi Arabia
 HCE-03: An academic researcher specializing in operations management
 HCE-04: An emergency room physician working in a Saudi private healthcare facility
 HCE-05: An experienced emergency room nurse
 HCE-06: An academic researcher with experience in healthcare studies
 HCE-07: A researcher with vast experience in emergency room operations
 HCE-08: A physician from the ICU department who oversees emergency
department transfers to the ICU
 HCE-09: An emergency room physician
 HCE-10: A general physician who works closely with ED staff
Results of Expert Knowledge Acquisition

In this section, results from subject matter experts are detailed across five tables, one
per survey question. The answers of the ten experts provide a total of 220 intervals,
which are used to construct membership functions. This section will
detail the calculation of the fuzzy numbers, based on the results provided by the subject
matter experts. Table 1 contains answers from question one of the survey, in which
experts were posed with a scenario of an emergency room with a capacity of 50 beds. The
answers from the experts’ evaluations are divided by 50 to obtain the ratio of waiting
patients to ED capacity, which can be applied to any ED. This question in the survey
specified the minimum and maximum values for the patient demand as 0 and 100,
respectively, in order to introduce boundaries for the membership functions. After
converting these values into ratios, the minimum and maximum values became 0 and 2,
respectively. Experts determined the patient demand on four levels; “low”, “medium”,
“high”, and “very high”. The total number of obtained intervals from question one was 40.
Table 2 contains answers from question two of the survey, which is related to a
scenario with an emergency room capacity of 50 beds. The ratios were obtained from the
answers of subject matter experts. This question in the survey did not specify the
maximum value for physician staffing, meaning that the membership function did not
have an imposed upper boundary. After converting these values into ratios, the minimum
and maximum values became 0 and 0.32, respectively. Experts assessed the physician
staffing on three levels; “inadequate”, “partially adequate”, and “adequate”. The total
number of obtained intervals from question two was 30.

Table 1: Interval assignment for patient demand. Table 2: Interval assignment for physician staffing.

Table 3 contains answers from question three of the survey, which is related to a
scenario with an emergency room capacity of 50 beds. Similarly, in this table, there is no
imposed upper bound for nurse staffing, which also impacts the upper bound of the last
fuzzy class. The maximum value for nurse staffing was 0.5, or 25 out of 50 beds, and
experts provided their evaluations on three fuzzy classes; “inadequate”, “partially
adequate”, and “adequate”. 30 total intervals were obtained from question three.
Table 4 contains answers from question four of the survey, regarding ER occupancy
rate, where the maximum occupancy rate was assumed to be 100 percent. Ten experts
provided intervals from their perspective on an appropriate lower and upper value for
each of the four fuzzy classes, “low”, “medium”, “high”, and “very high”. In total, 40
evaluated intervals were obtained to construct the membership functions.
Table 5 contains answers from the survey’s fifth question, and is concerned with
patient boarding. Similarly to questions one, two, and three, this question was based on a
scenario with 50 beds, with answers later converted to a ratio of boarded patients to the ER
capacity. The minimum and maximum values were specified at 0 and 20 patients,
respectively, which translate to ratios of 0 and 0.4. From the ten experts’ responses
across the four fuzzy classes, 40 evaluated intervals were obtained.
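The bed-count-to-ratio conversion used in questions one, two, three, and five can be expressed directly; the helper name is illustrative:

```python
# The survey scenarios fix the ER capacity at 50 beds; raw patient counts are
# rescaled to capacity ratios so the membership functions generalize to any ED.
ER_CAPACITY = 50  # beds, as specified in the survey scenarios

def to_ratio(patients, capacity=ER_CAPACITY):
    return patients / capacity

print(to_ratio(20))    # 0.4  (the boarding maximum: 20 of 50 beds)
print(to_ratio(100))   # 2.0  (the patient-demand maximum)
```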

Table 3: Interval assignment for nurse staffing. Table 4: Interval assignment for ER occupancy rate.

Table 5: Interval assignment for patient boarding

These results identify underlying differences between the evaluations of subject

matter experts, which may lead to the introduction of bias when relying on only one
perspective to implement a solution. The expert panel members who responded to each
survey question have different backgrounds and experience rooted in different areas of
emergency departments. These experts view the ER from their different perspectives, as
internal and external stakeholders. Relying on only one perspective can lead to
overestimated or underestimated interval values, as seen in some cases such as the one
discussed in question two. The variation in the experts’ responses creates ambiguous areas in
the collected data, which can be modeled by fuzzy logic. Without considering these
variations, data from experts can lead to biased conclusions.

Membership Functions

The database for subsystem I consists of membership functions for both inputs and
the output, and is structured according to the data in Table 6. Variable one, the patient
demand, consists of four trapezoidal membership functions, while variable two,
patient complexity, consists of three trapezoidal membership functions, and variable
three, the ED demand status, is the output of the subsystem and has five triangular
membership functions.
The membership function representing patient demand in Figure 11 is constructed
using the fuzzy number intervals and linguistic classes provided in Table 6. For the “low”
linguistic class interval, the minimum value in the upper bound of the low class (as
observed in Table 1) is 0.2, meaning that there is 100% agreement among experts between
the values of 0 and 0.2 for “low”. The maximum value in the upper bound of the low
class is 0.5, yet the minimum value of the lower bound in the medium class is 0.2,
meaning that some experts varied in assigning the term “low” and “medium” between the
interval [0.2, 0.5]. In Figure 11, this accounts for the structure of the low class, where the
core exists between 0 and 0.2, and the support exists between 0.2 and 0.5, overlapping the
support of the medium class. The boundary for the medium class began at 0.2 and ended
at 0.8, while the boundary for the high class was between 0.6 and 1.2, and the boundary
for the very-high class was between 0.92 and 2. The core structures of the medium and
high class are small, compared to the low and very-high classes.
The membership function for patient complexity in Figure 12 was constructed from
the data provided by an expert using reverse interval estimation method. This was done
due to the need for an expert possessing medical expertise in the triage process and
familiarity with the emergency severity index. This expert directly constructed the
membership function, providing data for the three linguistic classes. Patients rated with a
value of 2 or 1 were considered “low” average complexity, and supports of this
membership function consist of patients rated between 2 and 2.5, meaning the boundary
for the low class was between 1 and 2.5. Similarly, for “medium” average complexity,
patients rated between 2.5 and 3.5 make up the core structure, and with the supports
assigned values between 2 and 2.5, and between 3.5 and 4, the entire class boundary lies
between 2 and 4. For “high” average complexity, the expert assigned values between 4
and 5 for the core area, with values between 3.5 and 4 for the support, making the
boundary for the high class between 3.5 and 5. The core areas of each class are consistent
in size, due to the data being taken from one expert instead of ten.

Figure 11: Membership function of patient demand. Figure 12: Membership function of patient complexity.

The membership function for ED demand in Figure 13 represents the output for
subsystem one, which is considered the standard membership function for outputs. The
function is triangular, with membership degree values peaking at 1, and the boundaries
for different classes overlap the peaks of adjacent classes perfectly, so that every crisp
value obtains membership in two classes. This also means that at any given point, the
membership degrees of the two overlapping classes always sum to 1, but there are only
five points where a class obtains full membership.
These points occur at 0, 25, 50, 75, and 100 for “very-low”, “low”, “medium”, “high”,
and “very-high”, respectively.
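The output partition just described can be sketched as a strict triangular partition (illustrative code, not the chapter's implementation):

```python
def triangular_partition(z, peaks=(0, 25, 50, 75, 100)):
    """Membership degree of z in each of five evenly spaced triangular classes."""
    width = peaks[1] - peaks[0]   # 25: distance between adjacent peaks
    return [max(0.0, 1 - abs(z - p) / width) for p in peaks]

degrees = triangular_partition(40)   # between the "low" and "medium" peaks
print(degrees)                       # [0.0, 0.4, 0.6, 0.0, 0.0]
print(sum(degrees))                  # adjacent memberships sum to one
```

Because each triangle's base spans exactly the two neighboring peaks, the two nonzero memberships at any point are complementary, which is the property the paragraph above describes.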
In subsystem II, the membership functions for the physician staffing and nurse
staffing inputs are constructed with trapezoids for three classes. The output, ED staffing,
is also represented with a trapezoidal membership function, which features equally
spaced boundaries across three classes. Table 6 details the linguistic classes and fuzzy
numbers for subsystem II and its membership functions.
Physician staffing is represented in the membership functions in Figure 14. The three
classes overlap as seen in subsystem I, representing the regions where linguistic terms did
not reach full degrees of membership. For instance, the inadequate class core extends
from 0 to 0.06, representing full membership for the linguistic term
“inadequate”. The upper bound for the inadequate class is 0.12, where the linguistic term
“inadequate” achieves partial membership, and the lower bound for the partially adequate
class is 0.06, where its term also achieves partial membership. The boundaries for the
three classes are between 0 and 0.12 for the inadequate class, between 0.06 and 0.24 for
the partially adequate class, and between 0.16 and 0.32 for the adequate class. The
partially adequate class has the smallest core area, and the supports for all classes are
similar in size relative to each other.

Figure 13: Membership function of ED demand. Figure 14: Membership function of physician staffing.

The second input in subsystem II, nurse staffing, is represented by the membership
functions in Figure 15. The inadequate class boundaries are at 0 and 0.18, with the core
structure representing full membership existing between 0 and 0.08. The partially adequate
class lies between boundaries of 0.08 and 0.32, while the core area exists between 0.18 and
0.24. For the adequate class, the boundaries lie at 0.24 and 0.5, with the core structure
existing between 0.32 and 0.5. It is apparent that the adequate class has the largest core
area, meaning that the adequate linguistic term was given the widest variety of interval
values for full membership, while values that defined the partially adequate class were
more restrictive.
Figure 16 contains the membership functions for the subsystem's output, ED
staffing. The membership functions are trapezoidal, but the intervals are assigned to
create similarly sized membership classes. In this figure, the boundaries for the
inadequate class lie between 0 and 35, with the core existing between 0 and 25,
representing a full degree of membership. The boundaries for the partially adequate class
are 25 and 75, with the core existing between 35 and 65. For the adequate class, the
boundaries are 65 and 100, with the core area defined between 75 and 100. It can be
noted that the midpoint between the boundaries for the partially adequate class lies at 50,
which is the halfway point on the ED staffing axis, further demonstrating the uniformity
in the membership functions.

Figure 15: Membership function of nurse staffing. Figure 16: Membership function of ED staffing.
Table 6 details the data used in the membership functions of subsystem III, where
both trapezoidal and triangular membership functions are used across the three inputs and
one output. It should be noted again that the output of subsystem II, ED staffing, is an
input in subsystem III, dictating the use of a trapezoidal membership function for this
subsystem’s associated input. As this input shares the same membership function
characteristics as previously described, it will be omitted in the description of this
subsystem’s membership functions. While the populations for patient complexity input
are separate between this subsystem and subsystem I, the membership functions share the
same characteristics, and thus the membership functions for patient complexity will not
be discussed in this subsystem as well.
Figure 17 provides the trapezoidal membership functions for ER occupancy rate,
which is the second variable in Table 6, and is characterized by four linguistic terms. The
low class is bounded between the values 0 and 35, while the medium, high, and very high
classes lie between values of 20 and 65, 45 and 90, and 70 and 100, respectively. The low
class has the largest core structure, which is bounded between the values of 0 and 20, and
represents the largest interval of assigned values for full class membership. The medium
and very high classes appear to have similarly sized core areas, bound between the values
of 35 and 45 for “medium”, and 90 and 100 for “very high”. The core area for “high” is
the smallest, bound between the values of 65 and 70, and represents the smallest interval
of assigned values for full class membership.
Figure 18 provides the membership functions for the output of subsystem III, ED
workload, and triangular membership functions are assigned to four classes. Similarly to
the membership functions for the output of subsystem I, the membership classes exist
on overlapping intervals such that at any point, the degrees of membership for two classes
add up to a value of one, and there are only four points at which classes reach full degrees
of membership. These points occur at 0, 33.34, 66.67, and 100, for the low, medium,
high, and very-high classes, respectively.

Figure 17: Membership function of ER occupancy rate. Figure 18: Membership function of workload.

In Table 6, information is provided for the membership functions of the final

subsystem, subsystem IV. Among the three inputs, ED demand and ED workload have
been previously discussed in subsystems II and III, and they will be omitted in the
description of this subsystem’s membership functions.
Table 6: Parameters of fuzzy subsystem I’s, II’s, III’s, and IV’s membership functions.

The trapezoidal membership functions in Figure 19 represent the four classes used
for the boarding input in subsystem IV. Boarding was considered to be “very high”
between values of 0.26 and 0.4, making its core structure the largest while indicating the
largest interval of values where a class was assigned full membership. Between the
values of 0.16 and 0.32, boarding was considered “high”, which is associated with the
smallest membership function core structure belonging to the high class. The low and
medium classes existed between the intervals of [0, 0.12], and [0.04, 0.24], respectively.
Crowding, the final output of the system, is represented by the triangular membership
functions in Figure 20. The linguistic terms “insignificant”, “low”, “medium”, “high”,
and “extreme” were associated with the five classes. The membership functions were
assigned boundaries to create evenly distributed classes on the crowding axis, and,
similarly to subsystems I and III, the degrees of membership of the two classes existing
at any given point sum to 1. Only at the points 0, 25, 50, 75, and 100 do the
five respective classes individually obtain full degrees of membership.

Figure 19: Membership function of patient boarding. Figure 20: Membership function of crowding.

Results of Expert Evaluation

This section presents the results of the fuzzy rule base development and the experts’
consensus rate. The fuzzy rule base assessments are divided by subsystem, with
subsystem I producing 120 rule assessments, and subsystems II, III, and IV producing 90,
360, and 800 rule assessments, respectively, for a total of 1370 assessments obtained.
After reaching consensus, the final version of the fuzzy rules is listed in this section.
Table 7 details the results from the expert assessment of the fuzzy rules from
subsystem I. This table consists of 12 columns, beginning with the rule code, followed by
ten expert evaluations, and ending with consensus status. Below the table is a legend
comprising five linguistic classes which are color-coded. In this subsystem, two fuzzy
rules reached full consensus (100%); FLS1-11, and FLS1-12. Two rules achieved 90%
consensus: FLS1-05, and FLS1-06; four reached 80%: FLS1-01, FLS1-04, FLS1-07, and
FLS1-08; one rule reached 70% consensus: FLS1-03, and three reached 60% consensus:
FLS1-02, FLS1-09, and FLS1-10. The average consensus rate for this subsystem’s rule
assessments is 79%. Seven of the twelve evaluated rules received assessments across
only two linguistic classes, while two were assessed across three linguistic classes, and
only one received assessments spanning more than three linguistic classes.
Most of the data in this subsystem is centralized around two linguistic classes. Regarding
the frequency of linguistic class use, “medium” was most frequently used to assess rules,
with 42 uses, while “high”, “low”, “very high”, and “very low” were used 30, 21, 15, and
12 times, respectively.
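The two summary statistics reported for each rule table, the per-rule consensus rate and the linguistic-class frequency, can be computed as in this sketch. The votes below are made up for illustration; they are not the chapter's actual assessment data.

```python
from collections import Counter

# Hypothetical assessment table: rule code -> ten expert votes.
assessments = {
    "FLS1-11": ["Very High"] * 10,             # full consensus (100%)
    "FLS1-05": ["High"] * 9 + ["Medium"],      # 90% consensus
    "FLS1-02": ["Medium"] * 6 + ["Low"] * 4,   # 60% consensus
}

# Per-rule consensus rate: share of votes won by the most common class.
consensus = {r: max(Counter(v).values()) / len(v) for r, v in assessments.items()}
avg_rate = sum(consensus.values()) / len(consensus)

# Frequency of each linguistic class across all assessments in the table.
class_freq = Counter(v for votes in assessments.values() for v in votes)

print(round(avg_rate, 2))   # 0.83
print(class_freq["High"])   # 9
```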
All of the fuzzy rule statements for subsystem I (Appendix A), after consensus, are
listed according to their rule number. This final version of the rules will be stored in the
fuzzy rule base of the knowledge base to fuel the fuzzy inference engine.
Table 7: Results of expert evaluation for subsystem I’s fuzzy rules.

Table 8 is comprised of results from the assessments of the fuzzy rules from
subsystem II. This table shares similar features from Table 7, consisting of the same
number of columns and expert evaluations. Below the table is a legend comprising three
linguistic classes which are color-coded. Within subsystem II, five of the nine rules
received 90% consensus or greater, consisting of FLS2-01, FLS2-04, FLS2-05, FLS2-06,
and FLS2-09. Three of the rules received 80% consensus: FLS2-02, FLS2-07, and
FLS2-08. FLS2-03 received 50% consensus. The average consensus rate for the
whole subsystem was 84%, higher than that of subsystem I, which featured
more fuzzy rules and linguistic classes. Seven of the evaluated fuzzy rules were assessed
with only two linguistic terms or less, and two rules were assessed with three terms. The
frequency of linguistic classes used in assessing rules was the highest in “inadequate”
with 41 uses, followed by “partially adequate” and “adequate”, with 34 and 15 uses, respectively.

Table 8: Results of expert evaluation for subsystem II’s fuzzy rules.

The final fuzzy rule statements for subsystem II (Appendix B) after consensus are
listed according to their rule number. These final nine rules are stored in the fuzzy rule
base of subsystem II to feed the decision engine of the fuzzy system.
Table 9 contains data from the expert assessments of the fuzzy rules of subsystem III.
It is structured in the same manner as the previous fuzzy rule evaluation tables in terms of
the number of columns and what they represent; however, there are four color-coded
216 Abdulrahman Albar, Ahmad Elshennawy, Mohammed Basingab et al.

linguistic terms that are associated with the fuzzy classes. There are a total of 360 rule
assessments in this table, representing the assessment of 36 rules by ten experts. Of the
36 evaluated rules, 31 were assessed using two or fewer linguistic terms, and the
remainder with no more than three. Five assessed rules reached full consensus, with an
agreement rate of 100%: FLS3-09, FLS3-20, FLS3-24, FLS3-26, and FLS3-31. Twelve
assessed rules received a consensus rate between 80% and 90%, while eighteen fell in the
range of 60% to 70%. Finally, one rule, FLS3-02, had the minimum consensus rate of 50%.
The average consensus rate for this subsystem is 76%, close to the 79% of subsystem I
even though subsystem III featured more inputs, and still satisfactory relative to
subsystem II’s 84% given that subsystem III contained more assessment classes. The
frequency of linguistic class use in assessing rules was the highest in the “high” class
with 124 uses, followed by “medium” with 105 uses, while the least used classes were
“low” and “very high”, with 66 and 65 uses, respectively.

Table 9: Results of expert evaluation for subsystem III’s fuzzy rules.

The final list of fuzzy rules for subsystem III is provided in Appendix C; these rules
will be stored in the fuzzy rule base to build the fuzzy knowledge base.
The results for subsystem IV’s rule assessments are provided in Table 10; this is the
most significant subsystem in the fuzzy system. Here, ten experts evaluated 80 rules
against five assessment levels, with each rule’s antecedent consisting of a combination of
three AND conditions. Because each rule combines three antecedent conditions, each to
be assessed at five levels, this subsystem presented the highest complexity for expert
assessment.
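Each rule in this subsystem combines three antecedent conditions with AND. In a Mamdani-style inference engine, the AND of membership degrees is conventionally taken as their minimum, so the weakest antecedent caps the rule's firing strength. A minimal sketch (the rule and membership values below are illustrative, not taken from the chapter):

```python
def fire_rule(memberships):
    """Firing strength of a fuzzy rule whose antecedents are
    combined with AND, using the common min operator."""
    return min(memberships)

# Hypothetical subsystem IV rule: IF boarding is high AND demand is
# medium AND workload is high THEN crowding is high.
# Membership degrees of the three antecedents for one observation:
strength = fire_rule([0.8, 0.6, 0.9])
print(strength)  # 0.6 -- the weakest antecedent limits the rule
```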

Table 10: Results of expert evaluation for subsystem IV’s fuzzy rules.

The results show that this subsystem is the only one in the entire designed fuzzy
system that contained rules which did not initially meet the given consensus criteria.
These rules, FLS4-16, FLS4-22, FLS4-49, FLS4-52, FLS4-57, FLS4-72, and FLS4-78,
required an additional round of evaluation with new expert assessors. All seven rules in
question achieved the minimum criterion upon the first additional round of evaluation,
which pushed their consensus rates past the 50% threshold; the re-evaluated rules all
reached consensus rates of 54.5%, meeting the requirements. These additional
evaluations raised the total number of rule assessments accordingly.
Upon analyzing the data, it can be found that seven of the assessed rules reached a
consensus rate of 100%, which were FLS4-01, FLS4-03, FLS4-07, FLS4-64, FLS4-66,
FLS4-76, and FLS4-80. Among the remaining rules, twenty-six reached consensus rates
between 80% and 90%, while thirty-five reached rates between 60% and 70%, and five
rules had a consensus rate of 50%, passing minimum consensus requirements. The
average consensus rate of this subsystem is 72%, compared to 76%, 84%, and 79% in
subsystems III, II, and I, respectively. Among the different linguistic terms used by
experts, fifty-three rules were evaluated using two or fewer of the five assessment
classes. The remaining rules received assessments using exactly three terms. For all 80
rules, the variation in expert assessment is small: in cases where the experts did not
unanimously agree on a single linguistic term, they reached consensus using either
two linguistic terms in adjacent classes (such as “low”-“medium”, or “medium”-“high”),
or three terms describing adjacent classes (such as “insignificant”-“low”-“medium”).
After the final round of assessments, experts most frequently used “medium” to assess
rules, with 277 uses, followed closely by “high” with 269 uses, while “extreme”, “low”,
and “insignificant” were selected 126, 102, and 33 times, respectively.
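The consensus rates quoted throughout this section are consistent with defining a rule's consensus as the share of experts selecting its most frequent (modal) linguistic class. A sketch under that assumption:

```python
from collections import Counter

def consensus_rate(assessments):
    """Fraction of assessors choosing the modal linguistic class."""
    counts = Counter(assessments)
    return max(counts.values()) / len(assessments)

# Ten experts split 8/2 across adjacent classes -> 80% consensus.
print(consensus_rate(["medium"] * 8 + ["high"] * 2))  # 0.8
# Eleven assessors split 6/5 -> 6/11 = 54.5%, matching the
# re-evaluated rules reported above.
print(round(consensus_rate(["medium"] * 6 + ["high"] * 5), 3))  # 0.545
```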

The final fuzzy rules for subsystem IV are provided in Appendix D. These rules will
become an essential part of the knowledge base for subsystem IV.
The results presented in this section are a critical component of this research: they
validate the design intent of the framework and show strong consensus rates for rule
assessments, with only seven of the initial 137 rules requiring re-evaluation. The average
consensus rate was 72% or better in each of the four subsystems, which further highlights
the consistency of the results. The average consensus rate decreased noticeably in
subsystems with more assessment classes, more rules, or more complex rules with more
conditions for experts to evaluate; these factors increased each subsystem’s complexity
and drove the overall decrease in the average consensus rate. The assessed fuzzy rules
complete the designed fuzzy system by supplying the four fuzzy engines of subsystems
I-IV with the information that links the inputs to the outputs.


The fuzzy logic toolbox of MATLAB R2015b (Version 2.2.22) was used to construct
and simulate each fuzzy subsystem individually, using the data gathered from experts. A
series of 3-D surface plots was generated relating the inputs of each subsystem to its
respective output. This was accomplished through the products of the proposed
architecture, including the membership functions developed from quantitative expert
data and the experts’ subjective assessments of the rules. These surface plots give a
clearer view of how the different fuzzy subsystems behave, and they make the relations
between inputs and outputs more visually accessible. Additionally, the surface plots
allow the outputs of the subsystems to be read directly from the inputs, bypassing
lengthy calculations. This section presents the resulting surface plots for each fuzzy
logic subsystem.
Figure 21 illustrates the surface of subsystem I, defined by two input axes, patient
complexity and patient demand, and one output axis, ED demand. The values for ED
demand on the surface plot range from 8 to 92, resulting from the centroid method used
for defuzzification. Generally speaking, it can be observed on the surface that ED
demand will increase with patient complexity if patient demand is held constant, and
similarly ED demand will increase with patient demand if patient complexity is held
constant. Interestingly, as patient demand approaches a value of 1, the ED demand
plateaus while patient complexity is between 1 and 2, rising only when patient
complexity increases further.
The step-like structure occurring for patient demand higher than 1 resembles another
local step structure for patient complexity higher than 4, where ED demand cycles
between plateaus and increases until it plateaus near its maximum value. For patient
demand less than 1 and patient complexity less than 4, the surface appears to increase
linearly in a more predictable manner than the two step-like structures near its extremes.
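The crisp ED demand scores between 8 and 92 come from centroid defuzzification: on a discretized output universe, the centroid is the membership-weighted mean of the axis values. A minimal sketch with an illustrative aggregated membership curve (not the chapter's actual output sets):

```python
def centroid(xs, mus):
    """Discrete centroid defuzzification: the membership-weighted
    mean of the output universe."""
    return sum(x * mu for x, mu in zip(xs, mus)) / sum(mus)

# Illustrative aggregated output: a triangle centred at 50 on [0, 100].
xs = list(range(101))
mus = [max(0.0, 1 - abs(x - 50) / 25) for x in xs]
print(centroid(xs, mus))  # ~50 by symmetry
```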
Figure 22 demonstrates the relation between the inputs (nurse staffing and physician
staffing) and output (ED staffing) of subsystem II, where ED staffing ranges between
scores of 14.9 and 89.1. ED staffing appears to increase in a similar manner with either
nurse staffing or physician staffing when the other input is held constant, although the
increase is not as high as when both inputs are proportionally increased. In other words,
there are several plateau planes on the surface where ED staffing will only increase when
both inputs are proportionally increased. When physician staffing is held constant, around
0.1 for instance, ED staffing will not increase after nurse staffing increases beyond 1.5,
demonstrating the logical relation between the ED staffing and the ratio between nurses
and physicians. If the ratio of physicians to nurses is low, ED staffing will be considered
low, and an ED’s staffing size and thus ability to see to patients would not likely increase
if the nursing staff was increased in size. This illustrates that a proportional number of
physicians and nurses would be required for an ED to effectively maintain a high staffing
level. It may also be noted that the slope of the surface from 50 to 89 ED staffing score is
steeper for increasing nursing staff than when physician staffing is increased, which may
be due to the different scales of the input axes.
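Surface plots such as Figures 21 and 22 can be generated by sweeping the two inputs over a grid and recording the crisp output at each point. A sketch of that sweep with a stand-in output function in place of the real fuzzy inference step; the input ranges and the min-shaped stand-in are assumptions for illustration only:

```python
def sweep_surface(f, x_range, y_range, steps=5):
    """Evaluate f over a steps-by-steps grid spanning the given input
    ranges; returns (x, y, z) triples suitable for a surface plot."""
    (x0, x1), (y0, y1) = x_range, y_range
    points = []
    for i in range(steps):
        for j in range(steps):
            x = x0 + (x1 - x0) * i / (steps - 1)
            y = y0 + (y1 - y0) * j / (steps - 1)
            points.append((x, y, f(x, y)))
    return points

# Stand-in for subsystem II: ED staffing rises only when nurse and
# physician staffing rise together (a min-like plateau shape).
def staffing_score(nurse, phys):
    return 100 * min(nurse / 3.0, phys / 0.5)

grid = sweep_surface(staffing_score, (0.0, 3.0), (0.0, 0.5))
print(len(grid))  # 25 grid points
```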

Figure 21: Surface of subsystem I. Figure 22: Surface of subsystem II.

In Figure 23, surfaces a through k represent the relation between ED workload and its
inputs, average patient complexity and ER occupancy rate when ED staffing is held at
eleven different constants, ranging from near zero to 100 for each respective surface. For
surfaces a, b, and c, when ED staffing is between near zero and 20, high ED workload
reaches scores of 60 quickly with medium occupancy rates and average patient
complexity. When average patient complexity achieves values higher than 4, and
occupancy rates achieve values higher than 50, ED workload plateaus unless both
average patient complexity and occupancy rates increase, leading to a peak area of the
surface where ED workload reaches scores near 80. When ED staffing is between 30 and
60, for surfaces d through g, the impact of better staffing can be seen on ED workload.
The increase of ED workload becomes more gradual with increasing average patient
complexity and occupancy rates, and the size of the surface region representing ED
workload scores of 60 or higher decreases. In surfaces h through k, when ED staffing is between 70
and 100, the peak of the surface representing the highest scores for ED workload
becomes smaller, and areas of the surface representing increases in ED workload become
isolated in the plot, as higher values for average patient complexity and occupancy rate
become necessary to achieve high values for ED workload. This represents the impact
that increasing ED staffing to adequate levels has on ED workload, even when average
patient complexity and occupancy rates are high. There are always areas of the surfaces
where ED workload is high; however, when ED staffing is increased, ED workload
decreases even for moderate values of its other two inputs.
Figure 24 consists of surfaces a through k of subsystem IV, showing the impact that
the inputs of boarding and demand have on the output of crowding, when the variable
workload is held at eleven constants, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100. In
surfaces a through c, when workload is low, crowding generally increases with boarding
and demand; however, the peak values in surfaces b and c differ from surface a. The peak
of the surface decreases in size and transitions into a plateau in surfaces b and c,
indicating a wider range of input values that lead to the same high level of crowding.
In surfaces d through g when workload is between 30 and 60, the lower values of the
surface become more isolated, and all points on the surfaces appear to rise, representing
an overall increase in crowding for all values of boarding and demand. It can be observed
that increasing the ED workload evenly increases crowding under any condition of
boarding and demand.
As workload approaches values between 70 and 100, surfaces h through k show that
crowding continues to generally increase for all boarding and demand values, and the
surfaces peak at higher values. A plateau emerges in surface h, where crowding remains
constant for boarding values which exceed 0.2, when demand is below 50. Beyond
boarding values of 0.2, crowding will only increase when demand is increased beyond
50. This demonstrates that under high workload, there are consistent levels of crowding
when boarding is high, but demand is low. Only when both boarding and demand are low
does crowding achieve minimum values under high workload.

Figure 23: Sensitivity analysis of subsystem III. Figure 24: Sensitivity analysis of subsystem IV.


This section details the process of implementing and testing the accuracy of the
proposed fuzzy model framework, referred to hereafter as the Global Index for
Emergency Department Overcrowding, or GIEDOC. One of the main goals of the
GIEDOC is to produce reliable results that are reproducible in the EDs of other
healthcare systems. The design of the GIEDOC accounts for this in its knowledge base:
ten healthcare experts from the nation in question provide the data fed into the
knowledge base, allowing the fuzzy system to produce locally calibrated results. In this
respect the GIEDOC differs from other developed indices, which do not show adequate
reproducibility when implemented outside their countries of origin. To accurately assess
the GIEDOC, it must be implemented in real ED environments to measure the level of
crowding while, at the same time, a native expert provides a subjective assessment of the
same environment against which the GIEDOC results can be compared.
For the purposes of measuring the accuracy of the GIEDOC, five classes within the
GIEDOC were defined by five equal intervals on a scale from 0 to 100, so that the classes
could be compared to the subjective assessment of experts. These five classes for
assessing ED crowding on five subjective levels were: 1 for “insignificant”, 2 for “low”,
3 for “medium”, 4 for “high”, and 5 for “extreme”. In other words, this was done to
compare the agreement of the index to experts, by determining if this scale reflects the
expert perspective for crowding. The GIEDOC was implemented for three days in a
public Saudi Arabian hospital in Jeddah, which sees more than one hundred thousand
patients in its emergency department on a yearly basis, possessing more than 400
inpatient beds and 42 emergency beds. During the validation, twenty-four observations
were made to collect data which focused on factors including the capacity of the
emergency department, the number of patients in the waiting area, ER, and boarding
areas, the number of present physicians and nurses, the average patient complexity in
both the waiting area and the ER, and finally a healthcare expert’s subjective assessment
of crowding. These results are detailed in Table 11, where the ED crowding level scale
can be compared to the class numbers assigned by the expert. Kappa analysis was used to
test the agreement between the computed GIEDOC scores and the subjective assessments
of the healthcare expert. These statistics allow the accuracy of the GIEDOC results to be
compared with that of other indices for assessing ED crowding.
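The five-class mapping described above partitions the 0-100 GIEDOC scale into equal intervals. A sketch of that mapping, assuming 20-point bands with a score of 100 folded into the top class:

```python
LABELS = {1: "insignificant", 2: "low", 3: "medium", 4: "high", 5: "extreme"}

def giedoc_class(score):
    """Map a GIEDOC score on [0, 100] to one of five equal-width classes."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    return min(int(score // 20) + 1, 5)

# The observed crowding scores of 25 to 75 span classes 2 through 4,
# consistent with the low/medium/high assessments in the validation study.
print(giedoc_class(25), LABELS[giedoc_class(25)])  # 2 low
print(giedoc_class(75), LABELS[giedoc_class(75)])  # 4 high
```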
Table 11 provides the data obtained from the twenty-four observations conducted for
validation of the GIEDOC, resulting in calculated scores for the major operational
factors. The demand scores ranged from values of 8 to 61.4 according to the demand
indicator of the GIEDOC, while staffing scores ranged from 50 to 85.1, and ED workload
ranged from 33.33 to 89.2. It should be noted that the majority of staffing scores obtained
their maximum values, indicating that over the three days of validation, the selected ED
almost always maintained adequate staffing. There was higher variation in the range of
demand and ED workload scores. ED crowding level scores achieved values between 25
and 75. To further study the variation in scores between observations, the scores were
plotted in Figure 25.

Table 11: Crisp inputs and their computed crisp output using GIEDOC

Figure 25: GIEDOC index scores

The plot in Figure 25 further shows the consistency in the staffing score across the
twenty-four observations, varying slightly between observations 19 and 24. Generally
speaking, when demand, boarding, and workload scores were decreasing or increasing
between observations, such as in observation four, the crowding level decreased or
increased accordingly. In other observations such as 8 and 9, when factor scores such as
workload increased while another factor such as boarding decreased, the resulting
crowding score exhibited no change. In observation 21, when the other scores exhibited
minimal change, the sharp increase in crowding can be attributed to a sharp increase in
the demand score, demonstrating the significant role of demand in crowding.
The agreement between the GIEDOC and the expert assessment is analyzed in Table 11,
where assessments are documented according to the “low”, “medium”, and “high”
classes (2, 3, and 4). The GIEDOC issued 4 “low”, 15 “medium”, and 5 “high”
assessments, while the expert provided 3 “low”, 13 “medium”, and 8 “high”. The
GIEDOC and the expert agreed twice on the low class, eleven times on the medium
class, and five times on the high class. Measured against the expert assessments, the
GIEDOC overestimated once in the low class (scoring “medium” where the expert
scored “low”), underestimated the medium class twice (scoring “low” where the expert
scored “medium”), and underestimated the high class three times. It should be noted that
the insignificant and extreme classes could not be tested, as during this study the ED was
neither empty nor extremely overcrowded according to both the expert and the GIEDOC;
most activity in the major operational factors occurred at the third level, the “medium”
class.
The Kappa value found for the system was 0.562, 95% CI [0.45, 0.66], indicating
moderate agreement between the objective GIEDOC scores and the subjective expert
assessments.
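The reported Kappa of 0.562 can be reproduced from the counts given above: marginals of 4/15/5 (GIEDOC) and 3/13/8 (expert) for the low/medium/high classes, with 2 + 11 + 5 exact agreements over the twenty-four observations:

```python
def cohens_kappa(marg_a, marg_b, diagonal, n):
    """Cohen's kappa from the two raters' class marginals and the
    count of same-class (diagonal) agreements."""
    p_observed = sum(diagonal) / n
    p_expected = sum(a * b for a, b in zip(marg_a, marg_b)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Low / medium / high counts from the validation study:
kappa = cohens_kappa([4, 15, 5], [3, 13, 8], [2, 11, 5], 24)
print(round(kappa, 3))  # 0.562 -- moderate agreement
```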


This study proposed a framework for quantifying overcrowding within different
healthcare contexts, seeking to overcome the shortcomings of previous indices by
founding the framework upon the perspectives of multiple experts and stakeholders. By
quantifying overcrowding in both qualitative and quantitative terms drawn from a variety
of experts, and by identifying and reducing bias, this study strives for reproducibility of
results in other settings.
With regard to the design of the fuzzy system, future research could focus on either
increasing the number of inputs to the system, or identifying more crowding
determinants. Other design improvements could include an expansion of the hierarchical
fuzzy system, in which more subsystems could be implemented in association with other
identified inputs or determinants of crowding. In designing the knowledge base, further
research could attempt to integrate other quantitative tools into the fuzzy system to
process some inputs independently, such as patient demand. Methods such as simple
linear regression or multiple regression could be used to model the demand side of the
problem in such a way as to make the index more robust and accurate. A separate research
effort could focus on developing a set of action protocols for EDs, to specify a course of
action to both prevent and react to overcrowding when it occurs, as identified by the
index. Finally, a more rigorous validation study could simulate the index by integrating it
with a discrete event simulation model to study its performance over a longer period of
time. With such a simulation, the impact of the determinants on the overcrowding score
could be more accurately observed. Patterns of simulated data used to more closely
observe the impact of each factor on overcrowding could also be used to draw
conclusions for the development of future ED policy.


AHA. (2014). AHA Annual Survey Database™. American Hospital Association.
Asplin, B. R., Magid, D. J., Rhodes, K. V., Solberg, L. I., Lurie, N., & Camargo, C. A.
(2003). A conceptual model of emergency department crowding. Annals of
emergency medicine, 42(2), 173-180.
Bellow, A. A., & Gillespie, G. L. (2014). The evolution of ED crowding. Journal of
Emergency Nursing, 40(2), 153.
Bernstein, S. L., Verghese, V., Leung, W., Lunney, A. T., & Perez, I. (2003).
Development and validation of a new index to measure emergency department
crowding. Academic Emergency Medicine, 10(9), 938-942.
Eitel, D. R., Rudkin, S. E., Malvehy, M. A., Killeen, J. P., & Pines, J. M. (2010).
Improving service quality by understanding emergency department flow: a White
Paper and position statement prepared for the American Academy of Emergency
Medicine. The Journal of emergency medicine, 38(1), 70-79.
Epstein, S. K., Huckins, D. S., Liu, S. W., Pallin, D. J., Sullivan, A. F., Lipton, R. I., &
Camargo, C. A. (2012). Emergency department crowding and risk of preventable
medical errors. Internal and emergency medicine, 7(2), 173-180.
Epstein, S. K., & Tian, L. (2006). Development of an emergency department work score
to predict ambulance diversion. Academic Emergency Medicine, 13(4), 421-426.
Gilboy, N., Tanabe, P., Travers, D., Rosenau, A., & Eitel, D. (2005). Emergency severity
index, version 4: implementation handbook. Rockville, MD: Agency for Healthcare
Research and Quality, 1-72.
Hwang, U., McCarthy, M. L., Aronsky, D., Asplin, B., Crane, P. W., Craven, C. K.,
Pines, J. M. (2011). Measures of crowding in the emergency department: a
systematic review. Academic Emergency Medicine, 18(5), 527-538.

Johnson, K. D., & Winkelman, C. (2011). The effect of emergency department crowding
on patient outcomes: a literature review. Advanced emergency nursing journal, 33(1),
MOH. (2014). Statistical Year Book. Ministry of Health.
Reeder, T. J., & Garrison, H. G. (2001). When the Safety Net Is Unsafe: Real-Time
Assessment of the Overcrowded Emergency Department. Academic Emergency
Medicine, 8(11), 1070-1074.
Sivanandam, S., Sumathi, S., & Deepa, S. (2007). Introduction to fuzzy logic using
MATLAB (Vol. 1): Springer.
Weiss, S. J., Derlet, R., Arndahl, J., Ernst, A. A., Richards, J., Fernández‐Frankelton, M.,
Levy, D. (2004). Estimating the degree of emergency department overcrowding in
academic medical centers: results of the National ED Overcrowding Study
(NEDOCS). Academic Emergency Medicine, 11(1), 38-50.


Dr. Abdulrahman Albar is an Assistant Professor in the Department of Industrial

Engineering at Jazan University. He received a Bachelor’s degree in Industrial
Engineering from King Abdulaziz University in 2008, and a Master’s degree in Industrial
Engineering from the University of Central Florida (UCF) in 2012. He received a Ph.D.
in Industrial Engineering from UCF in 2016. His research interests focus on operations
management, intelligent decision systems, business intelligence, and applications of
quality systems in the service industry. His experience includes the Prince Mohammed
Bin Fahad Program for Strategic Research and Studies, UCF, and ASQ.

Dr. Ahmad Elshennawy is a Professor in the Department of Industrial Engineering

at University of Central Florida. He is the Graduate Program Director. He received his
Ph.D. in Industrial Engineering from Penn State in 1987. He is a Certified Six Sigma
Master Black Belt by The Harrington Institute and a Fellow of the American Society for
Quality (ASQ). He has published 4 books, more than 50 journal articles, and several
proceedings articles. His areas of research and interest are: Quality Management,
Manufacturing Systems, Process Improvement, Advanced Production Technologies, and
Healthcare systems.

Mohammed Basingab is a doctoral candidate at the University of Central Florida.

He completed a B.S. in Industrial Engineering at King Abdul-Aziz University in 2009,
and received his M.S. in Industrial Engineering from the University of Southern
California in 2014. He served as a Graduate Assistant at King Abdul-Aziz University for
2 years, and was employed as a Development Engineer in Jeddah Municipality for one
year. His research interests include Quality, Big Data Simulations, Agents, Internet of
Things, and Supply Chain.

Dr. Haitham Bahaitham is an Assistant Professor in the Industrial Engineering

Department at the Faculty of Engineering, King Abdulaziz University (KAU) - Rabigh.
He earned a BS degree in Electrical Engineering (Bio-Medical) from KAU in 1996, an
MS degree in Industrial Engineering from KAU in 2003, and a PhD degree in Industrial
Engineering from the University of Central Florida (UCF) in 2011. He worked in the
medical imaging service field at GE Elseif Medical Services and Siemens Co. Medical
Solutions. In addition, he taught in the Management Science Department MIS Track at
Yanbu University College (YUC). During his work at KAU, he served as the Head of the
Industrial Engineering Department and the Vice Dean for Development at the Faculty.
Recently, he was appointed Dean of the Community College at the University of Jeddah.
His area of research is quality applications in the service industry, especially those
related to the healthcare sector.

Appendix A: Fuzzy rule statements for subsystem I
Appendix B: Fuzzy rule statements for subsystem II

Appendix C: Fuzzy rule statements for subsystem III


Appendix D: Fuzzy rule statements for subsystem IV

In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 10



Khaled Alshareef1,*, Ahmad Rahal2 and Mohammed Basingab3

1Industrial and Systems Engineering,
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
2College of Business, University of Arkansas, Fort Smith, Arkansas, US
3Industrial Engineering, King Abdulaziz University, Jeddah, Saudi Arabia


The use of Discrete Event Simulation (DES) in the healthcare sector is not new.
However, the inherent complexity of operations, the need to understand the complexity
and the stochastic nature of the modeling process, and the lack of real data, have
alienated many stakeholders and severely limited their involvement in healthcare. This
research posits that the combined use of DES and Case-Based Reasoning (DES-CBR)
can assist in the solution of new cases, and improve the stakeholders’ involvement by
eliminating the need for simulation or statistical knowledge or experience. Using a
number of unique healthcare based simulation cases, a base-case system was initially
developed and then used to implement the CBR using a case study, with results evaluated
using real data from the system and by healthcare experts.

Keywords: case-based reasoning, simulation modeling, healthcare sector

Corresponding Author Email:
230 Khaled Alshareef, Ahmad Rahal and Mohammed Basingab


The gap between healthcare spending and economic growth in many nations around
the world, including the United States, has been widening at a faster rate, requiring that
scarce resources be allocated to mitigate the impact of the steep rise in healthcare costs
instead of being devoted to economic growth. This phenomenon can be attributed to
many factors, including population growth, population aging (Thorwarth & Arisha, 2009),
the development cost of new technologies (Aboueljinane, Sahin, & Jemai, 2013), and the
use of expensive new diagnostic tests and treatments. Furthermore, the limited
availability and the over-utilization of healthcare facilities and providers such as
physicians and nurses (Tien & Goldschmidt-Clermont, 2009) have also contributed to
the deterioration of the efficiency and effectiveness of healthcare processes and the
degradation of the proper delivery of healthcare services (Faezipour & Ferreira, 2013).
Discrete Event Simulation (DES) has been used by many healthcare organizations as
a tool to analyze and improve their healthcare processes such as delivery systems, patient
flow, resources optimization, and patient admission (Gosavi, Cudney, Murray, & Masek,
2016; Hamrock, Paige, Parks, Scheulen, & Levin, 2013; Katsaliaki & Mustafee, 2010;
Parks, Engblom, Hamrock, Satjapot, & Levin, 2011). However, the use of DES poses
many challenges including the modeling complexity of the healthcare environment, the
lack of real data, and the difficulty in the implementation of the proposed solutions and
recommendations. Furthermore, the need to understand the stochastic nature of the
decision-making modeling process has limited the involvement of many healthcare
decision makers, and has reduced the effectiveness of the use of simulation in the
healthcare field as compared to other fields (Roberts, 2011).


Advances in artificial intelligence (AI) have led to the development of many
technologies, including genetic algorithms, fuzzy logic, logic programming, neural
networks, constraint-based programming, rule-based reasoning, and case-based reasoning
(CBR). CBR is a computerized method that reuses, and if necessary adapts, the solutions
of previously solved problems. “CBR basically packages well-understood statistical and
inductive techniques with lower-level knowledge acquisition
and representational schemes to affect efficient processing and retrieval of past cases (or
experiences) for comparison against newly input cases (or problems)” (Mott, 1993). It
uses database management and machine learning techniques to perform the retrieval
process (Bichindaritz & Marling, 2006; Watson, 1999).
The Utilization of Case-Based Reasoning 231

CBR consists of four main processes - retrieve, reuse, revise, and retain - also known
as the 4Rs. The traditional CBR approach, shown in Figure 1, is a machine learning
technique created to address the limitations of rule-based systems and to help acquire
new knowledge.

Figure 1. The traditional CBR process (Zhao, Cui, Zhao, Qiu, & Chen, 2009).


As described by De Mantaras et al. (2005), the process of solving a problem using
CBR involves: 1) obtaining a problem description; 2) measuring the similarity of the
current problem to previous problems stored in a case base; 3) retrieving the solution of
the similarly identified problem if identical; or 4) adapting it to account for the
differences in the problem descriptions. The new solution is then retained in the case
base for future use (Figure 2).
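The retrieve step can be sketched as nearest-neighbour matching over case attributes. The similarity measure and attribute names below are illustrative assumptions, not those used in this chapter:

```python
def similarity(case_a, case_b, weights):
    """Weighted similarity over shared numeric attributes, each
    assumed to be normalized to the [0, 1] range."""
    total = sum(w * (1 - abs(case_a[k] - case_b[k]))
                for k, w in weights.items())
    return total / sum(weights.values())

def retrieve(new_case, case_base, weights):
    """Return the stored case most similar to the new problem."""
    return max(case_base, key=lambda c: similarity(new_case, c, weights))

# Hypothetical attributes: relative ED size and observed crowding level.
weights = {"size": 1.0, "crowding": 2.0}
case_base = [{"id": 1, "size": 0.2, "crowding": 0.9},
             {"id": 2, "size": 0.8, "crowding": 0.3}]
best = retrieve({"size": 0.25, "crowding": 0.85}, case_base, weights)
print(best["id"])  # 1 -- the closest stored ED case
```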

Figure 2. The CBR methodology structure for simulation.


Although there exists a plethora of solutions and techniques to address the
complexity of the healthcare industry and its many problems, this study focuses only
on Emergency Departments (EDs) and the use of Discrete Event Simulation (DES).

Constructing the Case-Base

The initial step in constructing the case base for this study involved searching the
literature and collecting and analyzing suitable cases related to Emergency Department
operations (see Table 1).

Table 1. ED Cases

Case 1 (Chetouane, Barker, & Oropeza, 2012): Optimizing the operation and processes of a regular ED.
Case 2 (Patvivatsiri, 2006): Optimizing the operation and processes of the ED of a mid-size hospital during extreme events.
Case 3 (Gul & Guneri, 2012): Optimizing the operation and processes of the ED of a regional hospital.
Case 4 (Yeh & Lin, 2007): Optimizing the operation and processes of the ED of a small hospital in a city.
Case 5 (Zeinali, Mahootchi, & Sepehri, 2015): Optimizing the operation and processes of the ED in a specialized hospital.
Case 6 (Ahmed & Alkhamis, 2009): Optimizing the processes and operations of the ED of a mid-size governmental hospital.
Case 7 (Lim, Worster, Goeree, & Tarride, 2013): Optimizing the operation and processes of the ED of a local hospital.
Case 8 (Meng, 2013): Optimizing the operation and processes of the ED of a large hospital.
Case 9 (Wylie, 2004): Optimizing the operation and related processes of a university Primary Care Clinic to improve student health services.
Case 10 (Terry & Chao, 2012): Solving the crowding problem in the ED of a medical center located in a metropolitan area.

The Indexing System

The second step in constructing the case-base system involves defining the indexing
system that identifies the attributes of the solved cases for easy indexing and
retrieval. Attributes can be either numerical or non-numerical, such as locations,
programs used, or types of employees, to name a few. A retrieval engine then uses the
identified attributes of a new case to retrieve similar cases from the case-base.
The Emergency Department's operations attributes include:

1. Categorization: cases were classified into one of three solution categories: 1)
optimization, 2) crowding, and 3) new design/methodology problems.
2. Paths taken by patients within the Emergency Department (from admission to
checkout), taking into consideration different plan layouts and processes and the
station used by patients upon entering the ED. The existing paths describe the
patients' movements while inside the ED. The literature identified four different
paths depending on the patient's point of entry into the system.
As shown in Figure 3, the first path (Path 1) is the most commonly used. Patients
arrive at the ED through the entrance station and then move to the triage station,
where a triage nurse performs the necessary assessment. After that, patients with
levels 1 and 2 (of a 5-level triage scale) skip registration and move either to the
treatment station or to the hospital, depending on their conditions, while other
patients must register before proceeding to the treatment station to receive the
needed treatment. At the lab station, services such as x-rays, CAT scans, or other
tests are made available to patients. Finally, patients leave the ED through the
exit station. The other three paths are different permutations of the same services
and stations.
3. The third attribute is the number of doctors performing treatments in the ED,
including physicians, specialty doctors, and the nurse practitioners who treat
low-acuity patients in some EDs.
4. The fourth attribute is the number of nurses and their classifications, such as
triage nurses, emergency nurses, and regular nurses. These two attributes are
initialized at one ("1"), since every ED has at least one doctor and one nurse.
5. The fifth attribute is the number of lab technicians in the ED, i.e., the
number of workers in the lab station.
6. The last attribute is the number of staff in the ED, including all workers in
non-medical and administrative jobs in all stations. Upon indexing the cases, the
case-base is populated as shown in Table 2.
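A stored case with the six attributes above can be represented as a simple record. The sketch below is illustrative only; the field names follow the six attributes described here but are our labels, not the authors' schema.

```python
from dataclasses import dataclass

# Illustrative record for one indexed ED case; field names are our labels.

@dataclass
class EDCase:
    case_id: int
    category: str    # "optimization", "crowding", or "new design/methodology"
    path: int        # patient path, 1-4
    doctors: int     # attribute 3 (at least 1 in every ED)
    nurses: int      # attribute 4 (at least 1 in every ED)
    lab_techs: int   # attribute 5
    staff: int       # attribute 6

    def numeric_index(self):
        """The numerical attributes used by the similarity function."""
        return (self.doctors, self.nurses, self.lab_techs, self.staff)

case2 = EDCase(2, "optimization", 1, 3, 13, 1, 0)   # Case 2 from Table 2
print(case2.numeric_index())   # -> (3, 13, 1, 0)
```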

The Retrieval Engine

The literature shows several techniques and algorithms used to create retrieval
engines for the CBR methodology, including nearest neighbor, induction, fuzzy logic,
database technology, and several others. The most commonly used techniques are
nearest neighbor and induction with decision trees (Watson, 1999).
Table 2. The developed case-base for ED problems using DES

Case     Category      Doctors  Nurses  Lab techs  Staff  Path
Case 1   Optimization  3        5       1          0      Path 1
Case 2   Optimization  3        13      1          0      Path 1
Case 3   Optimization  10       12      0          5      Path 2
Case 4   Optimization  3        6       2          0      Path 1
Case 5   Optimization  1        4       0          2      Path 1
Case 6   Optimization  2        10      3          2      Path 2
Case 7   Optimization  2        4       1          1      Path 4
Case 8   Optimization  2        5       2          0      Path 2
Case 9   Optimization  3        6       1          1      Path 3
Case 10  Crowding      32       75      0          0      Path 2

(No stored cases fall under the new design/methodology category.)

Figure 3. Different paths in the EDs.

The Nearest Neighbor Approach

Algorithms such as K nearest neighbor and R nearest neighbor are deployed to
determine the similarities between the attributes of the new case we are seeking a
solution for and those of the cases stored in the case-base. Similarities are normalized
to fall between zero ("0") and one ("1"), or expressed as percentages. These functions
use various similarity metrics such as Euclidean distance, city block distance,
probabilistic similarity measures, and geometric similarity metrics. In the K nearest
neighbor approach, the K most similar cases are retrieved, where "K" is a predefined
parameter. In the R nearest neighbor approach, all cases with similarity percentages
(see the equation below) greater than or equal to a predefined value "R" are retrieved.

Similarity(NC, SC) = Σ(i=1..n) wi · f(NCi, SCi) / Σ(i=1..n) wi

where
NC represents the new case,
SCs are the stored cases in the case-base,
n is the number of attributes in each case,
w is the weight of an attribute, and
f is the similarity function.

In this analysis, the K nearest neighbor algorithm and the Euclidean distance were
used as the similarity function for the numerical attributes. The Euclidean distance is
calculated using the following equation:

Di = sqrt( Σ(x=1..m) (anx − aix)² )

where
Di is the Euclidean distance between stored case i and the new case,
anx are the attributes of the new case,
aix are the attributes of stored case i, and
m is the number of numerical attributes.

The numerical attributes in the developed ED cases were attributes 3, 4, 5, and 6,
corresponding to the numbers of doctors, nurses, lab technicians, and staff; they were
weighed equally in the similarity function. The non-numerical attributes, namely the
category of the problem and the path taken by patients in the ED, do not have a
numerical similarity function; instead, the retrieval engine only retrieves cases from
the same category as the new case. Furthermore, the most commonly used paths in the
EDs were sequentially numbered (Path 1 to 4) according to their likely usage, and a
similarity matrix was developed in which each step of dissimilarity between paths adds
10 units of distance; this path distance is added when recalculating the Euclidean
distance, as shown in Table 3 below.
Using this approach, determining similarity percentages is not required, as no
weights were associated with the attributes. The similarity (distance) between the new
case and each of the stored cases was computed, and the K closest stored cases were
retrieved.
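The retrieval computation can be reproduced from the case-base of Table 2 and the path distances of Table 3. The following is a sketch in Python (the authors' retrieval engine was written in Java): the numeric attributes are equally weighted, and the path distance is added to the Euclidean distance.

```python
import math

# K nearest neighbor retrieval over the Table 2 case-base (a Python sketch;
# the authors implemented their retrieval engine in Java).

CASES = {  # case_id: (category, path, doctors, nurses, lab_techs, staff)
    1: ("optimization", 1, 3, 5, 1, 0),   2: ("optimization", 1, 3, 13, 1, 0),
    3: ("optimization", 2, 10, 12, 0, 5), 4: ("optimization", 1, 3, 6, 2, 0),
    5: ("optimization", 1, 1, 4, 0, 2),   6: ("optimization", 2, 2, 10, 3, 2),
    7: ("optimization", 4, 2, 4, 1, 1),   8: ("optimization", 2, 2, 5, 2, 0),
    9: ("optimization", 3, 3, 6, 1, 1),   10: ("crowding", 2, 32, 75, 0, 0),
}

PATH_DIST = {  # Table 3, upper triangle (the matrix is symmetric)
    (1, 1): 0, (1, 2): 10, (1, 3): 20, (1, 4): 10,
    (2, 2): 0, (2, 3): 10, (2, 4): 20, (3, 3): 0, (3, 4): 10, (4, 4): 0,
}

def distance(target, case):
    """Euclidean distance over the numeric attributes plus the path distance."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(target[2:], case[2:])))
    p, q = target[1], case[1]
    return d + PATH_DIST[(min(p, q), max(p, q))]

def retrieve_knn(target, k=3):
    """Retrieve the k closest stored cases within the target's category."""
    pool = {i: c for i, c in CASES.items() if c[0] == target[0]}
    return sorted(pool, key=lambda i: distance(target, pool[i]))[:k]

# Target case of the chapter's case study: optimization, Path 1,
# 5 doctors, 11 nurses, 1 lab technician, 0 staff.
print(retrieve_knn(("optimization", 1, 5, 11, 1, 0)))   # -> [2, 4, 1]
```

Under these assumptions the engine returns cases 2, 4, and 1 in that order, matching the retrieval reported for the case study later in the chapter.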

The Induction Tree Approach

The Induction Tree approach uses the already defined indexing system to develop a
decision tree representing the case-base, resulting in faster retrieval times and
possibly different results than the K nearest neighbor approach. The tree represents
the hierarchical structure of the simulation cases stored in the case-base. The
assignment of attributes to different tree levels reflects the relative importance of
these attributes in the process of developing a solution to the new problem. The tree T
representing the stored simulation cases in the case-base is defined as

T = {N, E}, where
N is the set of nodes (attributes),
E is the set of edges connecting nodes and correlating attributes,
n is the number of nodes in the tree, and
l is the level of a node, where
l = 0: root node; l = 1: category of the case; l = 2: path number;
l = 3: # doctors; l = 4: # nurses; l = 5: # lab technicians; l = 6: # staff; and
l = 7: case number.
For each node in N, the degree is the number of directly connected nodes in levels
l − 1 and l + 1.

Table 3. The similarity (distance) matrix between different paths

         Path 1   Path 2   Path 3   Path 4
Path 1   0        10       20       10
Path 2   10       0        10       20
Path 3   20       10       0        10
Path 4   10       20       10       0

(The matrix is symmetric.)

The decision tree includes three types of nodes:

(a) A root node acting as a pointer that references all sub-nodes in the first level
(the starting node of the tree).
(b) Intermediate nodes: all nodes in the tree with level 1 ≤ l ≤ 6. These nodes
contain the set Cl of all child nodes in the directly lower level that are connected
to them by edges.
(c) Leaf nodes: all nodes in the tree with degree = 1 and l = 7. Each leaf node
expresses a specific set of attributes relating to its parents. The tree of the
developed case-base is shown in Figure 4.

For the stored simulation cases, let each case Ax be described as a set of different
attributes composing a distinctive case {a1, a2, ..., al-1}. Also, for each attribute ai
there is a set Vi that contains all possible values of this attribute {vi1, vi2, ..., vir}.
For example, the first attribute a1, corresponding to the category of the simulation
problem, has V1 = {Optimization, Crowding, New design/methodology}.
The induction tree approach is ready to use as soon as the decision tree is
developed. The attributes of each new case compose a target set G = {g1, g2, ..., gl-1},
used to retrieve similar cases from the case-base by matching the elements of this
target set to those of the same level in the case-base. This comparison guides the
search as it traverses the decision tree. The approach starts at the root node (l = 0),
where the first step in the retrieval process is to match g1 to an element in V1 (all
children of the root node) such that:

if g1 ∈ V1 → AttributeMatch = Match
else if g1 ∉ V1 → AttributeMatch = No Match

If a match does not exist, the retrieval process terminates. If, on the other hand, the
new case finds a match in the case-base, the search follows the edge connected to the
node (at l = 1) with the same category as the target case. The next step is to match the
remaining attributes of the set ⟨G⟩ = {g2, ..., gl-1}, comparing the second attribute g2
to the subset ⟨V2⟩, where V2 is the set containing all possible paths taken by patients,
and ⟨V2⟩ contains all the paths under the matched category g1. Due to the nature of
this attribute, up to four different paths may exist in the case-base. The attribute match
function yields three possible results, as follows:

g2 = v2i → AttributeMatch = Perfect Match
g2 ≠ v2i → Pathi △ Pathi±1 or Pathi △ Pathi±3 → AttributeMatch = Partial Match
           Pathi △ Pathi±2 → AttributeMatch = Somewhat Match

Based on the value of the attribute match, the approach chooses the edge connected
to the node at l = 2: the node with the same path number when a perfect match is
achieved, or otherwise the node giving a partial or somewhat match. The next step is to
match the remaining attributes of the set ⟨G⟩ = {g3, ..., gl-1} to the subset ⟨V3⟩, where
V3 is the set containing the possible numbers of doctors in the ED, and ⟨V3⟩ contains
all numbers of doctors under the matched path g2. The remaining attributes are
numerical and have similar matching functions. For g3, the attribute matching function
uses the absolute difference between g3 and each of the elements in ⟨V3⟩ (the
threshold equations are shown after Figure 5).
Based on the difference value zi, the approach chooses the node (at l = 3)
corresponding to the minimum difference value. The attribute match value indicates the
degree of similarity between the target case's attribute g3 and each of the elements in
the subset ⟨V3⟩. The same matching process is also used to match the remaining
attributes of the target case, g4, g5, and g6. Finally, the subset ⟨V7⟩ containing the
children of the matched node is compared with the target case to return the result of this
retrieval engine. This result defines the case(s) Ax from the case-base that are similar
to the target case G.
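The traversal above can be approximated by level-by-level filtering: at each level, keep only the stored cases whose attribute value best matches the target. The sketch below is ours, not the authors' Java implementation, and it simplifies the path comparison to an absolute difference, which yields the same result here because a perfect path match exists at that level.

```python
# Greedy level-by-level retrieval approximating the induction-tree search.
# A full implementation would walk an explicit tree with the Perfect /
# Partial / Somewhat match rules; this simplified sketch filters the same
# way when perfect matches exist at each level.

CASES = {  # case_id: (category, path, doctors, nurses, lab_techs, staff)
    1: ("optimization", 1, 3, 5, 1, 0),   2: ("optimization", 1, 3, 13, 1, 0),
    3: ("optimization", 2, 10, 12, 0, 5), 4: ("optimization", 1, 3, 6, 2, 0),
    5: ("optimization", 1, 1, 4, 0, 2),   6: ("optimization", 2, 2, 10, 3, 2),
    7: ("optimization", 4, 2, 4, 1, 1),   8: ("optimization", 2, 2, 5, 2, 0),
    9: ("optimization", 3, 3, 6, 1, 1),   10: ("crowding", 2, 32, 75, 0, 0),
}

def retrieve_tree(target):
    # Level l = 1: exact category match, otherwise no retrieval.
    pool = {i: c for i, c in CASES.items() if c[0] == target[0]}
    if not pool:
        return []
    # Levels l = 2..6: keep the cases minimizing |value - target value|.
    for level in range(1, 6):
        best = min(abs(c[level] - target[level]) for c in pool.values())
        pool = {i: c for i, c in pool.items()
                if abs(c[level] - target[level]) == best}
    return sorted(pool)

# The chapter's target case returns stored case 2.
print(retrieve_tree(("optimization", 1, 5, 11, 1, 0)))   # -> [2]
```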

Figure 4. Decision tree of the developed case-base.

The CBR Methodology Retrieval Code

A Java program was developed to automate the retrieval process and the
development of the case-base through the adoption of the solutions of new cases,
using the interface shown in Figure 5.



A case study published by Duguay and Chetouane (2007), representing a regional
hospital with 2,000 medical personnel and an ED with over 50,000 yearly patients,
was adopted to demonstrate the usefulness of the proposed approach. Data collected
from this case study are shown in Figures 6 and 7 (Duguay & Chetouane, 2007).

Figure 5. The interface of the CBR methodology retrieval code.

The attribute match for a numerical attribute is determined from the difference value zi:

∀ v3i ∈ ⟨V3⟩, zi = |v3i − g3|

when zi = 0 → AttributeMatch = Perfect Match
when 1 ≤ zi ≤ 5 → AttributeMatch = Partial Match
when 6 ≤ zi ≤ 15 → AttributeMatch = Somewhat Match
when zi ≥ 16 → AttributeMatch = Different

Resource          Number    Probabilities                   %
Examination       5         Code 1 & 2 patients             7
Triage nurses     3         Code 3 patients                 18
Registration      3         Code 4 patients                 55
Physicians        5         Code 5 patients                 20
Nurses            5         Patients that need lab tests    23
Lab technicians   1

Working schedules:

                 Night Shift           Day Shift            Evening Shift          Extra Shift 1          Extra Shift 2
                 (12:00 am - 8:00 am)  (8:00 am - 4:00 pm)  (4:00 pm - 12:00 am)   (10:00 am - 5:00 pm)   (5:00 pm - 11:00 pm)
Physicians       1                     1                    1                      1                      1
Nurses           1                     1                    1                      1                      1
Nurses           1                     1                    1                      0                      0
Nurses (Triage)  1                     1                    1                      0                      0

Figure 6. Data of the ED case study – part 1.


Patients' interarrival times in minutes (maximum of each day):

Monday            Tuesday             Wednesday          Thursday           Friday
Exponential (7)   Exponential (9.5)   Exponential (10)   Exponential (10)   Exponential (10)

Patients' arrival rates (patients/hour):

12 am - 1 am: 5     6 am - 7 am: 3       12 pm - 1 pm: 9    6 pm - 7 pm: 9
1 am - 2 am: 4      7 am - 8 am: 5       1 pm - 2 pm: 8     7 pm - 8 pm: 10
2 am - 3 am: 3      8 am - 9 am: 6       2 pm - 3 pm: 8     8 pm - 9 pm: 9
3 am - 4 am: 3      9 am - 10 am: 7      3 pm - 4 pm: 7     9 pm - 10 pm: 8
4 am - 5 am: 2      10 am - 11 am: 7     4 pm - 5 pm: 8     10 pm - 11 pm: 7
5 am - 6 am: 2      11 am - 12 pm: 8     5 pm - 6 pm: 9     11 pm - 12 am: 6

Service times in minutes:

Triage: Poisson (6)
Registration: Triangular (3, 5, 7)
Lab tests: Triangular (30, 45, 60)
1st Assessment: Triangular (25, 30, 40) for code 3, code 4, and code 5 patients
2nd Assessment: Triangular (10, 12, 15) for code 3, Triangular (8, 10, 12) for code 4, and Triangular (6, 7.5, 9) for code 5 patients

Figure 7. Data of the ED case study – part 2.

The main objective was to improve the performance of the hospital's ED (improving
utilization and minimizing the time spent by patients) while maintaining the same level
of quality of the healthcare services provided.

Define and Analyze the New Case

Patients arriving at the ED first pick up a number, wait in the waiting area, and
then proceed to the triage station, where a nurse assesses the severity of their cases
using a one-to-five (1 to 5) emergency severity index. Cases coded as 1 or 2 (critical
conditions) are directly admitted to the intensive care unit (ICU) to receive the
required care, while cases coded as 3, 4, or 5 proceed to register at the registration
desk and wait for a physician, who is always accompanied by a nurse, for the initial
assessment. Patients may then be discharged or may be asked to undergo lab tests,
after which some may be required to wait for a second assessment, where they may
either be discharged or admitted to the hospital. The hospital operates three
eight-hour shifts (day, evening, and night shifts), with additional resources allocated
during crowded periods (from 10 am to 9 pm). The ED process flowchart is shown in
Figure 8.

Figure 8. The process chart of the ED (Duguay & Chetouane, 2007).

Case Retrieve

The target set G = {Optimization, Path 1, 5, 11, 1, 0} describes the attributes of the
new case and reads as follows: 1) the objective of the study is optimization; 2) patients
follow Path 1; 3) with five doctors (physicians); 4) eleven nurses; 5) one lab
technician; and 6) no other staff in administrative roles. Upon defining the target set,
the retrieval code searched the case-base for cases similar to the case at hand using the
two previously described approaches. Cases 2, 4, and 1 were sequentially retrieved
using the K nearest neighbor approach with a K value of three (K = 3, due to the
limited number of cases in the case-base), while case 2 was retrieved using the
induction tree approach, indicating that case 2 has the closest similarity to the new
case.

Case Reuse

Choosing SIMIO as the modeling environment, a DES model of the problem at hand
was developed using the attributes of each of the previously described entities (patients,
medical and non-medical staff), the working schedules, and the likely paths taken by
patients during their ED visit.
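To illustrate the mechanics behind such a model, the toy discrete-event simulation below models a single triage nurse serving a stream of arrivals, using only the standard library. It is a sketch under assumed parameters (exponential interarrivals with a 7-minute mean and a triangular triage time), not the authors' SIMIO model.

```python
import heapq
import random

# Toy DES of one ED station (a single triage nurse). Arrival events are
# drawn from a seeded RNG and processed in time order; the nurse is
# modeled by the time she next becomes free. Parameters are assumptions
# for illustration only.

def simulate_triage(n_patients=500, mean_interarrival=7.0, seed=42):
    rng = random.Random(seed)
    events, t = [], 0.0
    for pid in range(n_patients):            # schedule all arrival events
        t += rng.expovariate(1.0 / mean_interarrival)
        heapq.heappush(events, (t, pid))

    nurse_free_at, waits = 0.0, []
    while events:                            # process events in time order
        now, pid = heapq.heappop(events)
        start = max(now, nurse_free_at)      # queue if the nurse is busy
        waits.append(start - now)
        nurse_free_at = start + rng.triangular(3, 9, 6)  # triage minutes
    return sum(waits) / len(waits)           # average wait in minutes

print(f"average triage wait: {simulate_triage():.1f} minutes")
```

A full model would add registration, assessment, and lab stations with the schedules and distributions of Figures 6 and 7, which is what a dedicated simulation environment automates.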
The simulation model was run under multiple scenarios, with results revealing that
patients classified as code 3 had an acceptable average waiting time in the system of
about 1.9 hours, while patients coded as 4 and 5 averaged waiting times of 11.86 and
5.86 hours, respectively. Furthermore, the results show the utilization rates of doctors
and nurses running at 99%, with the first assessment station running almost at full
capacity (see Figure 9, Table 4, and Table 5).

Figure 9. Average time in the system (in hours) for patients with different codes,
under Monday maximum, Tuesday maximum, Wednesday-Friday maximum, and regular arrival
rates.

Table 4. Simulation Summary – ED Waiting Time per Patient Classification

Day of the Week   Patient Code   Waiting Time in ED (Hours)   Average Number of Patients in ED
Monday            3              1.89                         2.83
                  4              11.86                        46.33
                  5              5.86                         21.1
Tuesday           3              1.76                         1.86
                  4              9.12                         26.36
                  5              4.40                         15.23
Wednesday         3              1.80                         1.97
                  4              8.81                         23.98
                  5              5.69                         14.32

These numbers indicate that this hospital is underserved and lacks the resources
required to deliver satisfactory service at peak times, pointing to the need for
additional resources (doctors and nurses) to serve the large number of patients
visiting the ED every day.
After identifying the main problem and its root causes, the modeling team should
revisit the retrieved cases to look for similar problems and their solutions. In this
case, the common solution suggested in similar cases was to hire more resources to
meet the increasing demand and to maintain the quality of the provided services. In
addition, a benefit-cost analysis may be needed for justification purposes. For our
case, the retrieved alternative solutions are listed in Tables 6 through 9.
Alternative 1: hire one more doctor and one more nurse, and revise the work schedule
to have an equal number of resources in each main shift, as shown in Table 6.

Table 5. Simulation Summary – ED Performance Indicators

Indicator                                           Mondays Peak   Tuesdays Peak   Wed, Thu, & Fri
                                                    Arrivals       Arrivals        Peak Arrivals
Utilization Rate: Doctors                           99.20%         99.18%          99.15%
Utilization Rate: Accompanying Nurses               99.20%         99.18%          99.15%
Utilization Rate: Triage Station                    84.63%         61.84%          58.63%
Average Time in Triage Station (minutes)            18.6           5.4             5.4
Utilization Rate: Registration Station              66.72%         47.56%          61.56%
Average Time in Registration Station (minutes)      1.2            1               1
Utilization Rate: First Assessment Station          99.20%         99.14%          99.11%
Average Time in First Assessment Station (hours)    5.3            5.14            4.78
Utilization Rate: 2nd Assessment Station            57.67%         53.72%          61.56%
Average Time in 2nd Assessment Station              63             53.4            58.2
Utilization Rate: Lab Test Station                  44.94%         46.35%          46.73%
Average Time in Lab Station (Hours)                 18             13.8            15

Table 6. Alternative 1 details

Alternative 1: Hire one more doctor and one more nurse

Working Night Shift Day Shift Evening Shift
schedules (12:00 am - 8:00 am) (8:00 am - 4:00 pm) (4:00 pm - 12:00 am)
Physicians 2 2 2
Nurses 2 2 2
Registration Nurses 1 1 1
Triage Nurses 1 1 1

Alternative 2: hire two more doctors and two more nurses, and schedule the most
resources in the evening shift since more patients visit the ED during that time. See Table
7 below.

Table 7. Alternative 2 details

Alternative 2: Hire two more doctors and two more nurses

Working              Night Shift            Day Shift             Evening Shift
schedules            (12:00 am - 8:00 am)   (8:00 am - 4:00 pm)   (4:00 pm - 12:00 am)
Physicians           2                      2                     3
Nurses               2                      2                     3
Registration Nurses  1                      1                     1
Triage Nurses        1                      1                     1

Alternative 3: hire three more doctors and three more nurses, and schedule more
resources in the day and evening shifts (Table 8).

Table 8. Alternative 3 details

Alternative 3: Hire three more doctors and three more nurses

Working              Night Shift            Day Shift             Evening Shift
schedules            (12:00 am - 8:00 am)   (8:00 am - 4:00 pm)   (4:00 pm - 12:00 am)
Physicians           2                      3                     3
Nurses               2                      3                     3
Registration Nurses  1                      1                     1
Triage Nurses        1                      1                     1

Alternative 4: schedule the maximum number of doctors and nurses for each shift
(5 doctors and 5 nurses). Although this solution may be neither feasible nor
implementable, it shows the performance of the system when resources are maximized,
which is useful for drawing up contingencies (Table 9).

Table 9. Alternative 4 details

Alternative 4 (Extreme scenario): This alternative is for comparisons of results

Working              Night Shift            Day Shift             Evening Shift
schedules            (12:00 am - 8:00 am)   (8:00 am - 4:00 pm)   (4:00 pm - 12:00 am)
Physicians           5                      5                     5
Nurses               5                      5                     5
Registration Nurses  1                      1                     1
Triage Nurses        1                      1                     1

This step of the CBR methodology requires stakeholders’ involvement due to their
familiarity with their system, and their ability to address concerns that may be critical to
the interpretation of the simulation model and its outputs.
The adopted solution was coded (assigned a case number), indexed as an
optimization case, and was then added to the case-base.


Although there exist several techniques to validate simulation models (animation,
event validity, traces, face validity, and historical data validation), the latter
technique was deemed the most appropriate for the case at hand. The output of the
simulated model, including the total time in the system for patients at each triage
level and the waiting times at each station, was validated by healthcare experts and
verified the ability of the simulation model to reflect the actual system (see Table
10).

Figure 10. The summarized results: average time in the system (hours) for code 3, 4,
and 5 patients under Monday's and Tuesday's maximum arrival rates (current system vs.
Alternatives 1-4).

Figure 11. The summarized results: average time in the system (hours) for code 3, 4,
and 5 patients under Wednesday-Friday's maximum and regular arrival rates (current
system vs. Alternatives 1-4).
Table 10. Comparison of simulation output and the real data

Waiting durations:
T1: time between arrival and triage
T2: time between triage and registration
T3: time from registration to available exam room
T4: time from first assessment to discharge

Simulation output vs. real data collected (in minutes; Real = real data,
Sim = simulation mean, CI = simulation 95% CI):

      T1                      T2                      T3                          T4
Days  Real  Sim   CI          Real  Sim  CI           Real   Sim    CI            Real  Sim   CI
Mon   12.7  17.0  4.8-46.8    1.7   1.0  0.42-2.4     235.0  136.0  64.2-175.2    36.0  57.0  19.8-113.4
Tue   6.6   5.4   2.4-10.8    0.6   0.5  0.06-1.2     144.0  97.3   39.6-150.6    36.0  46.1  12-94.2
Wed   10.0  4.9   1.8-9.6     1.8   0.6  0.12-1.8     121.0  92.4   38.4-166.8    40.0  51.2  19.2-117.6
Thu   10.0  4.9   1.8-9.6     1.8   0.6  0.12-1.8     121.0  92.4   38.4-166.8    40.0  51.2  19.2-117.6
Fri   17.9  4.9   1.8-9.6     2.2   0.6  0.12-1.8     101.0  92.4   38.4-166.8    42.0  51.2  19.2-117.6

Total time in the system by triage code (in minutes):

      Code 3                    Code 4                      Code 5
Days  Real  Sim   CI            Real   Sim    CI            Real   Sim    CI
Mon   89.6  91.2  72.6-113.4    257.9  277.9  194.4-360.6   327.2  381    295.8-466.2
Tue   68.1  89.6  67.2-115.8    172.9  189.9  90-321.6      204.2  187.8  44.4-381
Wed   72.1  84.3  64.8-105      201.9  180.5  86.4-301.2    228.2  247.8  48.6-426
Thu   54.7  84.3  64.8-105      144.9  180.5  86.4-301.2    161.2  247.8  48.6-426
Fri   87.2  84.3  64.8-105      163.9  180.5  86.4-301.2    180.2  247.8  48.6-426
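A quick script makes the Monday discrepancy visible by checking which real means fall inside the simulated 95% confidence intervals. The values are copied from the Monday row of Table 10; the real-versus-simulation column order is our reading of the flattened original.

```python
# Check which real waiting-time means (Monday row of Table 10) fall inside
# the simulated 95% confidence intervals.

monday = {  # duration: (real mean in minutes, simulated 95% CI)
    "T1": (12.7, (4.8, 46.8)),
    "T2": (1.7, (0.42, 2.4)),
    "T3": (235.0, (64.2, 175.2)),
    "T4": (36.0, (19.8, 113.4)),
}

covered = {name: lo <= real <= hi
           for name, (real, (lo, hi)) in monday.items()}
print(covered)   # only T3 falls outside its interval
```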

As shown in Table 10, the waiting time prior to the first assessment station (T3) is
the longest in the system, with a high discrepancy between the simulation results and
the collected data, especially on Mondays, when arrival rates are usually higher.
According to the healthcare professionals who are familiar with this ED, this
discrepancy is attributed to medical personnel who sometimes violate the priorities of
the different triage levels and serve patients (code 5) who have waited a long period,
causing longer waits for other patients. This behavior is understandable, as the fear
is that these patients may leave the system without being treated or seen by doctors.
In addition, the high utilization rates of the healthcare employees and facilities may
lead to unplanned breaks and inefficient scheduling. The system experts deemed the
rest of the results acceptable.
For face validity, three highly experienced healthcare professionals familiar with
managing emergency departments tested the simulation model and provided important
feedback. Although the developed alternatives provided excellent results, it was quite
understandable that some of them will never be implemented due to their high cost and
the limited available resources.


This research proposed the use of Discrete Event Simulation (DES) and Case-Based
Reasoning (CBR) to facilitate the decision-making process in the healthcare sector,
improve the stakeholders' involvement in the analysis of healthcare problems, and
mitigate the difficulties faced by the modeling team. In this research, we focused on
emergency departments (ED), which face multiple resource constraints including
financial, labor, and facility constraints. The application of DES-CBR provided
solutions that were realistic, robust, and, more importantly, scrutinized and
validated by field experts.
Other fields within the healthcare sector may also benefit from such an application,
while other research venues may include a better indexing system and more efficient
ways to retrieve cases, in particular as more cases are added and more attributes are
introduced.

Aboueljinane, L., Sahin, E. & Jemai, Z. (2013). A review on simulation models applied
to emergency medical service operations. Computers & Industrial Engineering,
66(4), 734-750.

Ahmed, M. A. & Alkhamis, T. M. (2009). Simulation optimization for an emergency

department healthcare unit in Kuwait. European Journal of Operational Research,
198(3), 936-942.
Bichindaritz, I. & Marling, C. (2006). Case-based reasoning in the health sciences:
What's next? Artificial Intelligence in Medicine, 36(2), 127-135.
Chetouane, F., Barker, K. & Oropeza, A. S. V. (2012). Sensitivity analysis for
simulation-based decision making: Application to a hospital emergency service
design. Simulation Modelling Practice and Theory, 20(1), 99-111.
De Mantaras, R. L., McSherry, D., Bridge, D., Leake, D., Smyth, B., Craw, S., . . .
Forbus, K. (2005). Retrieval, reuse, revision and retention in case-based reasoning.
The Knowledge Engineering Review, 20(03), 215-240.
Duguay, C. & Chetouane, F. (2007). Modeling and improving emergency department
systems using discrete event simulation. Simulation, 83(4), 311-320.
Faezipour, M. & Ferreira, S. (2013). A system dynamics perspective of patient
satisfaction in healthcare. Procedia Computer Science, 16, 148-156.
Gosavi, A., Cudney, E. A., Murray, S. L. & Masek, C. M. (2016). Analysis of Clinic
Layouts and Patient-Centered Procedural Innovations Using Discrete-Event
Simulation. Engineering Management Journal, 28(3), 134-144.
Gul, M. & Guneri, A. F. (2012). A computer simulation model to reduce patient length of
stay and to improve resource utilization rate in an emergency department service
system. International Journal of Industrial Engineering, 19(5), 221-231.
Hamrock, E., Paige, K., Parks, J., Scheulen, J. & Levin, S. (2013). Discrete event
simulation for healthcare organizations: a tool for decision making. Journal of
Healthcare Management, 58(2), 110-125.
Katsaliaki, K. & Mustafee, N. (2010). Improving decision making in healthcare services
through the use of existing simulation modelling tools and new technologies.
Transforming Government: People, Process and Policy, 4(2), 158-171.
Lim, M. E., Worster, A., Goeree, R. & Tarride, J.-É. (2013). Simulating an emergency
department: the importance of modeling the interactions between physicians and
delegates in a discrete event simulation. BMC medical informatics and decision
making, 13(1), 59.
Meng, G. S. (2013). Ambulance Diversion and Emergency Department Flow at the
San Francisco General Hospital. Retrieved from
Mott, S. (1993). Case-based reasoning: Market, applications, and fit with other
technologies. Expert Systems with Applications, 6(1), 97-104.
Parks, J. K., Engblom, P., Hamrock, E., Satjapot, S. & Levin, S. (2011). Designed to fail:
how computer simulation can detect fundamental flaws in clinic flow. Journal of
Healthcare Management, 56(2), 135-146.

Patvivatsiri, L. (2006). A simulation model for bioterrorism preparedness in an

emergency room. Paper presented at the Proceedings of the 38th conference on
Winter simulation.
Roberts, S. D. (2011). Tutorial on the simulation of healthcare systems. Paper presented
at the Proceedings of the Winter Simulation Conference.
Terry, & Chao. (2012). Arcadia Medical Center (A): Emergency Department Crowding.
Retrieved from
Thorwarth, M. & Arisha, A. (2009). Application of discrete-event simulation in health
care: a review.
Tien, J. M. & Goldschmidt-Clermont, P. J. (2009). Engineering healthcare as a service
system. Information Knowledge Systems Management, 8(1-4), 277-297.
Watson, I. (1999). Case-based reasoning is a methodology not a technology.
Knowledge-Based Systems, 12(5), 303-308.
Wylie, D. (2004). West Coast University Student Health Services--Primary Care Clinic.
Retrieved from
Yeh, J.-Y. & Lin, W.-S. (2007). Using simulation technique and genetic algorithm to
improve the quality care of a hospital emergency department. Expert Systems with
Applications, 32(4), 1073-1083.
Zeinali, F., Mahootchi, M. & Sepehri, M. M. (2015). Resource planning in the emergency
departments: A simulation-based metamodeling approach. Simulation Modelling
Practice and Theory, 53, 123-138.
Zhao, J., Cui, L., Zhao, L., Qiu, T. & Chen, B. (2009). Learning HAZOP expert system
by case-based reasoning and ontology. Computers & Chemical Engineering, 33(1),


Dr. Khaled Alshareef is an assistant professor in the Department of Systems
Engineering at King Fahd University of Petroleum and Minerals (KFUPM), Dhahran,
Saudi Arabia. He completed his B.S. and M.S. in Industrial and Systems Engineering in
2005 and 2008 at KFUPM, and worked as a graduate assistant from 2005-2008 and as a
lecturer from 2008-2009. In 2011, Dr. Alshareef received a second M.S. in Industrial
Engineering from the University of Florida, and completed his PhD in Industrial
Engineering at the University of Central Florida in 2016. His research interests are
Artificial Intelligence, Simulation in Healthcare, Scheduling, Quality Control, and
Supply Chain Management.

Dr. Ahmad Rahal is an Associate Professor in the College of Business at the
University of Arkansas-Fort Smith. He received his PhD in Industrial Engineering &
Management Systems from the University of Central Florida in 2005. His research
interests include, but are not limited to, Product Innovation and Technology
Management, Quality Management, Continuous Process Improvement, Six Sigma
Applications, Supply Chain Management, and Decision Analysis. Dr. Rahal has
published in many journals, including Annals of Management Sciences, The
International Journal of Management Education, Advances in Business Research,
Journal of Teaching in International Business, Journal of Management and Engineering
Integration, Journal of Technology Management & Innovation, and Engineering
Management Journal.

Mohammed Basingab is a doctoral candidate at the University of Central Florida.
He completed his B.S. in Industrial Engineering at King Abdul-Aziz University in 2009,
and received his M.S. in Industrial Engineering from the University of Southern
California in 2014. He served as a Graduate Assistant at King Abdul-Aziz University for
two years, and was employed as a Development Engineer in Jeddah Municipality for one
year. His research interests include Quality, Big Data Simulations, Agents, Internet of
Things, and Supply Chain Management.

In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 11

Agent-Based Modeling Simulation
and Its Application to Ecommerce

Oloruntomi Joledo¹, Edgar Gutierrez¹ and Hatim Bukhari²,*

¹Department of Industrial Engineering and Management Systems,
University of Central Florida, Orlando, Florida, US
²Department of Industrial Engineering,
University of Jeddah, Jeddah, Saudi Arabia


Abstract

In the past decade, ecommerce created new business models. Information
Technology leveled the playing field for new participants, who were capable of causing
disruptive changes in every industry. We investigate how the actions of stakeholders
(represented by agents) in an ecommerce system affect system performance. Viewing
consumer-to-consumer ecommerce from a systems perspective calls for the integration of
different levels of behaviors. Complex interactions exist among stakeholders, the
environment and available technology, and agents are the best paradigm to mimic these
behaviors. The presence of continuous and discrete behaviors, coupled with stochastic
and deterministic behaviors, presents challenges for using standalone simulation tools to
simulate the business model. This research takes into account dynamic system
complexity and risk. By combining system dynamics at the strategy level with agent-
based models of consumer behaviors, and neural networks to find historical relationships,
a representation of the business model that provides a sound basis for decision making
can be achieved. The case study is based on a peer-to-peer lending environment.

Corresponding Author Email:

Keywords: agent-based simulation, neural networks, consumer-to-consumer ecommerce,
peer-to-peer lending


Introduction

Organizations face an ever-increasing number of challenges and threats – changes in
markets, competitors, customer demands and security. To achieve organizational goals in
the midst of conflicting objectives, processes and activities need to be synchronized,
coordinated and integrated (Helal, 2008; Helal et al., 2007). Ecommerce systems are
characterized by frequent transactions from a varied customer base and consequent
reduction in order size while maintaining an element of stochasticity in demand patterns.
As a result, management faces the challenge of implementing the right strategy in the
face of competing objectives.
Peer-to-peer lending is a form of consumer-to-consumer ecommerce whereby lenders
pool their resources together and lend them to borrowers at a lower rate, using an online
platform, without the direct mediation of financial institutions. Consumer-to-consumer
(C2C) companies face competition from large organizations as well as from
entrepreneurs who have little to lose by embarking on the business. Customers do not
need to leave the comforts of their homes to find better deals. They can compare the
offerings of different companies online and make a hassle free change if they are not
getting value for their money. Other challenges facing C2C business models include how
to unify a group of consumers according to their needs, preferences and interaction with
each other.
Stakeholders range from providers, customers, companies and complementors (Wu
and Hisa, 2004). These stakeholders include the community, suppliers, alliance partners,
shareholders and government that form a large collection of active objects in the system
seeking to maximize their utility. With the growing popularity of C2C models, decision
making on the part of stakeholders can be difficult due to the interplaying factors and
uncertainty in customer demand. On the other hand, risks can include fidelity, payment
fraud and viruses. These characteristics make for a complex system with multi-level
abstractions and heterogeneous elements. Simulation serves as a decision support tool but
there exist limitations of individual simulation paradigms. It is in the interest of these
complex organizational environments to use knowledge of stakeholder actions and
business processes for decision-making (Joledo, 2016). These actions give rise to
nonlinear interactions that are difficult to capture using standalone simulation paradigms.
The complex interactions among different functional areas require modeling and
analyzing the system in a holistic way. There is a lack of mechanisms to facilitate
systematic and quantitative analysis of the effects of user and management actions on
peer-to-peer lending system performance through an understanding of the system.

The complexity of the market and of customer behaviors calls for nontraditional
modeling tools for analysis. Behaviors can be defined at the individual level and at the
system level. Hybrid simulation provides an approach that does not make the assumption
of a perfect market and homogeneity.
Internet-based models cause disruptions to traditional business models. New players
find it challenging to navigate the highly competitive landscape of this complex
environment. Due to the aforementioned characteristics, the ecommerce system tends
towards complexity. There exist several performance risks associated with the business
model. These risks include minimal return on investment, government regulations and
lack of trust. Results from case studies and the literature reveal that the performance
of C2C ecommerce remains underexplored from a system perspective. Complex
interactions exist among stakeholders, the changing environment and available
technology. There is a need for an integrated system that will provide a testing ground for
managing control actions, anticipating changes before they occur and evaluating the
effects of user actions on the system at different managerial levels.
The presence of continuous and discrete behaviors poses challenges for the use of
existing simulation tools in simulating the C2C ecommerce space. The system is
characterized by uncertainty as well as government regulations and external factors.
Important factors such as liquidity and different threshold values for consumers remain
undefined. Not addressing these issues can result in financial losses and lack of trust that
can erode the benefits of the business model. There is a need to systematically map,
model and evaluate the viability and performance of the business model in order to
realize the best tradeoff between benefits and risks; this study presents a framework
for doing so.
The chapter is organized as follows. Section 2 introduces the application of system
simulation and modeling (system dynamics in particular) to ecommerce research. Section
3 describes the developed framework. Section 4 presents the Lending Club case study,
while the application of the agent-based simulation and the system dynamics models, as
well as some results, are presented in Section 5. Section 6 concludes and prescribes some
future directions for this study.


Classifications of System Simulation and Modeling

Ecommerce systems are intelligent systems (Bucki and Suchanek, 2012), and a
simulation can be discrete or continuous. In continuous simulation, the system
evolves as a continuous function represented by differential equations, while in discrete
simulation, changes are represented as separate events to capture logical and sequential
behaviors. An event occurs instantaneously (such as the press of a button or the failure of
a device) to cause a transition from one discrete state to another. A simulation model
consists of a set of rules (such as equations, flowcharts, state machines, cellular automata)
that define the future state of a system given its present state (Borshchev and Filippov,
2004).
A simulation can also be classified in terms of model structure. Sulistio, Yeo and
Buyya (2004) proposed a taxonomy encompassing different approaches. The presence of
time is irrelevant in the operation and execution of a static simulation model (e.g., Monte
Carlo models). For the case of a dynamic model, in order to build a correct representation
of the system, simulated time is of importance to model structure and operation (e.g.,
queuing or conveyor).
Dynamic systems can be classified as either continuous or discrete. In continuous
systems, the values of model state variables change continuously over simulated time. In
the event that the state variables only change instantaneously at discrete points in time
(such as arrival and service times), the model is said to be discrete in nature. Discrete
models can be time-stepped or event-stepped (event-driven). In both, the state is
discretized and “jumps” in time. In time-stepped models, the time-step is constant and
state transitions are synchronized by the clock, i.e., the system state is updated at preset
times; in event-driven systems, the state is updated asynchronously, at important
moments in the system lifecycle.
Deterministic and probabilistic (or stochastic) properties refer to the predictability of
behavior. Deterministic models are made up of fixed input values with no internal
randomness: the same set of inputs always produces the same set of outputs. In
probabilistic models, however, some input variables are random, describable by
probability distributions (e.g., Poisson and Gamma distributions for arrival and service
times). Several runs of a stochastic model are needed to estimate the system response
with minimum variance.
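The replication idea can be sketched as follows. This is a minimal single-server queue in Python; the arrival rate, service rate, customer count and number of replications are illustrative assumptions, not values from this chapter:

```python
import random
import statistics

def simulate_once(arrival_rate, service_rate, n_customers, seed):
    """One replication of a single-server queue; returns the mean waiting time."""
    rng = random.Random(seed)
    clock = 0.0        # arrival time of the current customer
    server_free = 0.0  # time at which the server next becomes idle
    waits = []
    for _ in range(n_customers):
        clock += rng.expovariate(arrival_rate)               # stochastic inter-arrival
        start = max(clock, server_free)                      # wait if the server is busy
        waits.append(start - clock)
        server_free = start + rng.expovariate(service_rate)  # stochastic service time
    return statistics.mean(waits)

# Several independent replications (different seeds) are needed to estimate
# the stochastic system's response, along with a confidence half-width.
results = [simulate_once(0.8, 1.0, 5000, seed) for seed in range(20)]
mean_wait = statistics.mean(results)
half_width = 1.96 * statistics.stdev(results) / len(results) ** 0.5
```

Each replication alone is a noisy estimate; averaging across seeds and reporting the half-width is what lets the response be stated with a known variance.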
The structure of a system determines its behavior over time. An ecommerce system
is a complex, interactive and stochastic system that deals with various people,
infrastructure, technology and trust. In addition, factors like uncertainty, competition and
demand define its economic landscape. These markets are nonlinear, experiencing
explosive growth and continuous change. Developing representative models comprises
detailing the stakeholders and the pertinent underlying processes. Decision makers must
consider these factors when analyzing the system and procuring optimal strategies to
assess model performance.

System Dynamics

The utility of the system dynamics (SD) modeling approach is well documented in
the literature. SD is a non-data-driven, systems-thinking approach that targets top
management. This is convenient, since detailed data or business process activities are not
always available. SD is a continuous simulation methodology whose models are more
intuitive than those of discrete-event simulation. The methodology lends itself to dynamic
problems of strategic importance over varying horizons.
The interest of SD is not in the implementation of individual events but in aggregate
terms. Several studies adopt SD to model the overall structure of the organization at
strategic and tactical management levels, as well as to capture the financial and global
environment (Rabelo et al., 2007; Rabelo et al., 2005).
Speller et al. (2007) developed a system dynamics model to capture the dynamic value
chain system of “the traditional production/assembly supply chain with service
components added to it.” The first step is made up of generic causal-loop diagrams, and
subsequently a detailed stock-and-flow model. Taylor series approximations were used to
generate a linear system of differential equations capturing the behavior of the system
over time. These behaviors are analyzed, and long-range predictions of interest are made,
using the eigenvalue technique. SD serves as a response to the inadequacy of operations
research and other management science methodologies for solving complex problems
with a large number of variables, nonlinearity and human factors.
SD modeling captures the physical laws governing a system using subjective thinking,
with an assumption of dynamic behavior of the entities (An and Jeng, 2005). Due to the
complexity characterized by nonlinearity and time delays, the system may not be solvable
analytically. Available numerical methods for ordinary differential equations, such as
Euler’s first-order finite-difference method and the Runge-Kutta second- and fourth-order
methods, can be employed to solve the system numerically.
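As a sketch, the two schemes can be applied to a one-stock model dy/dt = 0.25y - 50 (the growth rate and constant outflow are illustrative figures, not parameters from this chapter):

```python
def euler(f, y, t, t_end, dt):
    """Euler first-order finite-difference integration of dy/dt = f(t, y)."""
    while t < t_end - 1e-9:
        y += dt * f(t, y)
        t += dt
    return y

def rk4(f, y, t, t_end, dt):
    """Classical fourth-order Runge-Kutta integration of dy/dt = f(t, y)."""
    while t < t_end - 1e-9:
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt * k1 / 2)
        k3 = f(t + dt / 2, y + dt * k2 / 2)
        k4 = f(t + dt, y + dt * k3)
        y += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return y

# One stock (e.g., a customer base) growing at 25%/year with a constant outflow.
flow = lambda t, stock: 0.25 * stock - 50.0

y_euler = euler(flow, 1000.0, 0.0, 8.0, 0.25)
y_rk4 = rk4(flow, 1000.0, 0.0, 8.0, 0.25)
# Analytic solution: y(t) = 200 + 800 * exp(0.25 t); RK4 tracks it far more
# closely than Euler at the same step size.
```

At the same step size, the fourth-order scheme is accurate to a fraction of a unit here, while the first-order scheme undershoots by several hundred, which is why SD tools typically offer both.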
System dynamics models have been used to represent and analyze different aspects
of the e-commerce business. Causal loop diagrams are useful to capture the structure of
e-business systems (Kiani, Gholamian, Hamzehei, & Hosseini, 2009) and to understand
how positive and negative feedbacks have impact on the strategies designed for online
markets (Fang, 2003; Oliva, Sterman, & Giese, 2003).
Topics of study using SD in the internet environment, such as consumer behavior
(Khatoon, Bhatti, Tabassum, Rida, & Alam, 2016; Sheng & Wong, 2012) and credit risk
analysis (Qiang, Hui, & Xiao-dong, 2013) are examples of important aspects considered
when modeling online trading. System dynamics models are widely used as policy
laboratories to find the appropriate strategies to reduce cost and increase revenue. This
type of research has also been applied to the online marketplace (An, Du, & Tong, 2016;
Lin & Liu, 2008), where identifying the relevant parameters is essential to profitability.

The Framework

A generic conceptual C2C framework is developed to manage system complexity,

assess viability and evaluate system behavior. To decompose the problem, system
boundaries are considered to identify strategic and tactical problem solving opportunities
(Joledo, 2016). Viewing this space as a complex system characterized by uncertainty and
varying behaviors, the proposed steps are as follows:

i. Identify all the stakeholders in the system

ii. Identify the factors (internal and external) that influence the system
iii. Evaluate the competitive landscape of the business model
iv. Define the system supply chain
v. Specify performance metrics
vi. Specify interactions between components
vii. Model the behavior of the system, and
viii. Analyze the results of the model implementation

Figure 1 illustrates how characteristics of the system are employed in developing the
proposed framework. As previously identified, organizations face an ever-increasing
number of challenges and threats such as changes in market, competitors, customer
demands and security. These risks are used to generate a mechanism for risk
classification assignable to system characteristics. The needs of the stakeholders are then
integrated into the developed framework, since they define what brings value to the
system.
The ecommerce system is influenced by internal and external factors. Internal factors
include the cost of operation, management actions and policies, processes involved in
delivering value to the customers, risks associated with implementing the business model
and generated income. External factors are uncontrollable, but it is imperative that the
organization responds in ways that adequately manage them. These factors include
changes in the market, activities of competitors, customer demand, government
regulations and the global economy.
Managing the supply chain of the system exposes the inefficiencies associated with
achieving organizational goals. The C2C ecommerce space is mapped in order to identify
the suppliers, clients and communication requirements. Based on the information
obtained from this stage, the modeling of system complexity is applied for dynamic
Agent-Based Modeling Simulation and Its Application to Ecommerce 261

Starting with the desired state, performance indicators influence the achievement of
the system goals. The factors of interest are summarized as costs, security, customer
satisfaction, profits and market share. Once the critical success factors are defined, the
complexity of the system, taking into consideration all the characteristics hereby
identified, can then be modeled and the results analyzed for policy development.

Figure 1. Framework mapping of system characteristics.

In line with the characteristics of the system, the proposed framework is implemented
from a hybrid perspective. Such an implementation provides a testbed for analysis of
management and stakeholder actions and also for evaluating performance of the system
under different conditions. Hybrid simulation finds extensive applications in research and
practice in part because most real life systems are hybrid in nature. Hybrid models can be
used to analyze business policies and performance, thereby acting as a complementary
tool in decision making.


Case Study: Lending Club

The present study adopts Lending Club as a representative study of the dynamics of
the C2C ecommerce space. The case study is used to identify the suppliers, consumers
and processes of the business model, as well as related internal and external factors. Data
is generated from the company’s 10-K, prospectus, blogs and company website. The case
study helps to select and define the boundaries and core areas of interactions on the
platform.
To describe online consumer-to-consumer (social) lending in the context of an
ecommerce system (its liquidity, pricing models and uncertainty), hybrid modeling is
used. Growth in the industry partly results from investors being discouraged by stock
market returns and the lower interest rates offered by banks. Results from business case
studies and the literature indicate that the success of peer-to-peer (P2P) lending business
process innovation has not been proven. As an example, it is beneficial to balance the
number of lenders and qualified borrowers so as to effectively meet the mutual needs of
the customers. Because this form of lending is unsecured, lenders are exposed to the risk
of default by the borrower. The platforms have to deal with uncertainties that pervade all
aspects of their operations. Any unplanned downtime, outage or system hack can have
long-term effects on operations and credibility (Joledo et al., 2014).


Hybrid Simulation Model

The hybrid simulation models in this research are developed using AnyLogic, which
has the capability of creating mixed discrete-continuous simulations of ABS and SD
models in the same interface. Seller consumers
come into the system with goods to sell. These consumers require different thresholds of
returns. Buyers have different cut-off prices which they can pay for transactions on the
platform. Consumers are modeled as agents whose behaviors elicit corresponding
responses. The dynamics of price agreement are also modeled in the agent-based system.
The environment is modeled in SD with agents living therein. The population of
consumers is disaggregated to individual level using agents.
In the simulation of business processes, interactions between players are modeled
using statecharts. The system is simulated over a period of eight years to gain insights
into the behavior of participants and how their individual or collective actions affect the
net income and, in turn, the profit margin. The output of the overall system is viability
measured by the default rates, net income and profit margin. Outputs of the ABS
subsystem are fed into the time-continuous SD model (strategic layer). The assumption
for this study is that the seller seeks to sell his product at a profit while the buyer seeks to
pay the minimum cost for a particular product. The provider supplies a medium for the
realization of customer utility while making a profit in the process.


Data

Real data on the LC business model is available via its platform. Data on arrival patterns
and arrival intervals are generated stochastically according to the data collected for the
years 2013 and 2014. There were 235,629 accepted loan requests during the period of
interest. Table 1 summarizes descriptive statistics for variables relating to the funded
(accepted) borrowers within the time period.

Table 1. Borrower Profiles

Variable name Minimum Maximum Mean Std. Deviation

funded_amnt ($) 1000 35000 14870 8438
int_rate (%) 6.00 26.06 13.78 4.32
annual_inc ($) 3000 7500000 74854 55547
dti 0 39.99 18.04 8.02
delinq_2yrs 0 22.00 0.34 0.89
inq_last_6mths 0 6.00 0.76 1.03
revol_util (%) 0 892.30 55.69 23.10
total_acc 2.00 156.00 26.01 11.89

Variables of interest include loan amount (funded_amnt), interest generated based on

user characteristics (int_rate), annual income of the borrower (annual_inc), debt-to-
income ratio (dti), number of delinquencies in the past 2 years (delinq_2yrs), number of
inquiries in the past 6 months (inq_last_6mths), revolving utilization ratio (revol_util),
verification status of the user, number of accounts opened in the last 2 years (total_acc)
and the term of the loan (36 or 60 months).
The loan status includes Charged Off, Current, Default, Fully Paid, In Grace Period,
Late (16-30 days) and Late (31-120 days). Only completed loans are considered i.e.,
those that have been fully paid or charged off.
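The status filter can be sketched as follows; the records here are made-up rows for illustration, not actual Lending Club data:

```python
# Hypothetical records mirroring the loan_status field described above.
loans = [
    {"id": 1, "loan_status": "Fully Paid", "funded_amnt": 12000},
    {"id": 2, "loan_status": "Current", "funded_amnt": 8000},
    {"id": 3, "loan_status": "Charged Off", "funded_amnt": 15000},
    {"id": 4, "loan_status": "Late (31-120 days)", "funded_amnt": 5000},
]

COMPLETED = {"Fully Paid", "Charged Off"}

# Keep only completed loans, i.e., those fully paid or charged off.
completed = [loan for loan in loans if loan["loan_status"] in COMPLETED]
default_rate = sum(
    loan["loan_status"] == "Charged Off" for loan in completed
) / len(completed)
```

Restricting to completed loans is what makes the default rate well defined: a loan that is still Current or Late has not yet resolved to either outcome.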

Neural Network

The neural network (NN) is used to map the characteristics of users to different risk
decisions and to capture trust. Profiles of completed loans are used to build the NN model
representations, using combined datasets of the accepted and rejected loans. A random
sample of 2062 data points from the combined dataset forms the training data used in the
learning process. The input is normalized by dividing the amount requested by 3.5, the
FICO score by 850 and the employment length by 10.
The network structure consists of four layers (Figure 2). The first layer has 4 neurons,
one for each of the following variables: amount, FICO, dti and employment length.
Based on the business model of Lending Club, these four variables were employed in our
framework to determine which borrowers are screened into, or permitted to transact on,
the platform. The NN also has two hidden layers, with 5 and 3 neurons respectively.
Finally, the output layer has two neurons, Output 1 and Output 2, that fire any value
between 0 and 1. Thus, if Output 1 is larger than Output 2, the result is considered an
acceptance; otherwise it is a rejection.
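The 4-5-3-2 structure can be sketched as a forward pass. The sigmoid activation and the random (untrained) weights below are placeholders, since the chapter does not give the trained weights, and `screen_borrower` and its example inputs are illustrative names only:

```python
import math
import random

rng = random.Random(7)

def layer(inputs, n_out):
    """Fully connected layer with sigmoid activation; weights are random placeholders."""
    outputs = []
    for _ in range(n_out):
        weights = [rng.uniform(-1, 1) for _ in inputs]
        bias = rng.uniform(-1, 1)
        z = bias + sum(w * x for w, x in zip(weights, inputs))
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid keeps each output in (0, 1)
    return outputs

def screen_borrower(amount, fico, dti, emp_length):
    # Normalization as described in the text (amount assumed pre-scaled).
    x = [amount / 3.5, fico / 850.0, dti, emp_length / 10.0]
    h1 = layer(x, 5)            # first hidden layer: 5 neurons
    h2 = layer(h1, 3)           # second hidden layer: 3 neurons
    out1, out2 = layer(h2, 2)   # output layer: Output 1 and Output 2, each in (0, 1)
    return "accept" if out1 > out2 else "reject"

decision = screen_borrower(amount=1.5, fico=700, dti=0.18, emp_length=6)
```

The accept/reject rule is exactly the comparison of the two output neurons described above; with trained weights in place of the random ones, the same structure reproduces the screening decision.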
Taking that into account, a test with the entire dataset is run, and the resulting error is
0.1118; that is, about 11% of the training instances are misclassified. To improve the
capacity of the NN to represent the information and get better results, the structure of the
NN was changed by adding more layers and varying the number of neurons per layer.
The new results, for a sample of the accepted data, obtained an average training error of
0.009570 against a target error of 0.0100.

Figure 2. Network structure of the neural network.

Agent-Based Simulation and Validation

The individual behaviors of consumers are modeled in the ABS subsystem. The
simulation begins by declaring and initializing all variables. Probabilities are assigned to
the different agent variables based on their corresponding distributions. The loan lifetime
is defined by the parameter Term. The requested Amount, FICO, DTI and Credit History
are stochastic characteristics of a borrower.

The users are modeled as agents with individual behaviors. Risk is modeled into each
agent by utilizing the dti, credit history, FICO range and income to generate a
corresponding interest rate. Depending on the user state, transitions are triggered by
timeouts or by meeting certain conditions. On executing the program, new borrowers are
created, who transition to the PotentialBorrower state. In this state, FICO, DTI and the
Amount requested are passed to the neural network class in order to generate a decision
on which borrowers transition to the Screened state. The time spent in a given state
follows a uniform distribution reflecting the time range associated with that state. For
example, a typical lender takes about 45 days between entry and receipt of the first
payment. Similarly, the time spent in the PotentialBorrower state before screening ranges
from 2 to 4 days. The statecharts representing borrower and lender behaviors and
interactions with the system are given in Figure 3 and Figure 4.

Figure 3. Borrower statechart.


Once the borrower is screened, an interest rate is generated to reflect his risk profile.
A draw on the lookup table is used to generate an interest rate that corresponds to the
borrower. On receiving the interest rate, the borrower makes a decision to accept or
decline the terms of the loan. If he declines, he has the option to follow the
noToAgreement transition and go back to the PotentialBorrower state, where he can
decide to remain on or to leave the platform. If the borrower agrees to the terms of the
loan, he proceeds to the PostedProfile state via the yesToAgreement transition. The
decision to accept the interest rate is internal and probabilistic, based on the borrower’s
risk preference and personal goals. A call is made to the requestServiceB() function,
which communicates the borrower profile to available lenders. If the borrower profile
matches a given lender’s risk aversion, the lender accepts and stores the id of the
borrower along with his profile.

Once the lender agrees to fund the borrower, the borrower transitions to the Funded
state, where it remains for a uniformly distributed period that reflects the time it takes to
fully fund the request, after which it transitions to the InRepayment state, where it
remains for the term (usually 36 or 60 months). Thirty days after entering the
InRepayment state, the borrower starts to make a payment every 27 to 31 days. This time
range reflects the fact that borrowers pay their bills early, on time or late.
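The borrower lifecycle above can be sketched as a simple state machine. The state names follow the text; the branch probabilities are illustrative assumptions, since the actual decisions are internal to the simulation model:

```python
import random

def next_state(state, rng):
    """One transition of the borrower statechart described in the text."""
    if state == "PotentialBorrower":
        return "Screened"                      # the NN screening decision passes him on
    if state == "Screened":
        # Accept or decline the generated interest rate (internal, probabilistic;
        # the 0.8 acceptance probability is an assumed figure).
        return "PostedProfile" if rng.random() < 0.8 else "PotentialBorrower"
    if state == "PostedProfile":
        return "Funded"                        # a lender matches the posted profile
    if state == "Funded":
        return "InRepayment"
    if state == "InRepayment":
        # TestDefault branch: FullyPaid or InDefault, decided stochastically
        # (0.85 is again an assumed probability).
        return "FullyPaid" if rng.random() < 0.85 else "InDefault"
    return "Exit"                              # InDefault -> Exit on charge-off

rng = random.Random(1)
state, path = "PotentialBorrower", ["PotentialBorrower"]
while state not in ("FullyPaid", "Exit"):
    state = next_state(state, rng)
    path.append(state)
```

In the AnyLogic model, the dwell times (2-4 days before screening, 27-31 days between payments, and so on) would be attached to these transitions as timeout triggers.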

Figure 4. Lender statechart.


There is one transition from the InRepayment state, and it has two branches. One
branch leads to FullyPaid, while the other leads to the InDefault state and then to Exit,
where the borrower leaves the system on charge-off. The decision at the TestDefault
branch is made internally and stochastically. The average amount of capital and interest
that is repaid, recovered or lost when a borrower defaults is also reflected.
LC, which acts as a central dispatcher, broadcasts requests for borrower loans to all
lenders. For simplicity, LC is modeled as a function call that responds to requests. LC
listens for messages from the borrower as well as the lender side and manages the
transaction completion on behalf of the agents. LC inserts a message in the queue, and a
notification is broadcast to borrowers and lenders. BorrowerAB and LenderAB
represent borrower and lender agent classes. The communication instances used in the
model are summarized below:

1) Screening request: a message arrives from the borrower and lender requesting
screening.
2) Interest rate generation: LC generates an interest rate and communicates it to
the borrower.
3) Borrower decision on interest rate: based on the risk profile, the borrower decides
to accept or reject the generated interest rate.
4) Lender’s decision on interest rate: the lender decides to fund a particular
borrower with an associated interest rate based on its risk profile.
5) Payment: payments are communicated to LC and in turn to the lender.
6) Default: the borrower leaves the platform, and the lender and borrower returns
are updated.
7) Fully paid: the borrower and lender decide whether to return as potential
customers or to leave the system.

It is assumed that participants are sensitive to ads and word of mouth (WOM). The
WOM effect is the way new users are persuaded to purchase a product or adopt a service:
consumers persuade others to adopt a service or buy a good, often by word of mouth.
Each participant’s adoption time differs. In this system, customer satisfaction is measured
by the response to WOM and results from satisfactorily completed loans. Hence, it is
expected that as more customers default, the WOM effect decreases. A consumer
contacts an average number of people in a month, i.e., a specified contact rate. Agents in
the system in turn contact each other and influence potential borrowers to sign up for the
service.
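The WOM mechanism can be sketched as a Bass-style internal-influence loop. The pool of potential users, the contact rate and the conversion fraction below are illustrative assumptions, not calibrated values from this chapter:

```python
def wom_adoption(potential, adopters, contact_rate, adoption_fraction, months):
    """Word-of-mouth diffusion: each month adopters make contacts, and a fraction
    of the contacts that reach a still-potential user converts."""
    total = potential + adopters
    history = [adopters]
    for _ in range(months):
        contacts = adopters * contact_rate                      # contacts made this month
        new = contacts * adoption_fraction * potential / total  # share reaching potentials
        new = min(new, potential)                               # cannot exceed the pool
        potential -= new
        adopters += new
        history.append(adopters)
    return history

# Illustrative figures: 1000 initial users and an assumed pool of 9000
# potential users; contact rate and conversion are placeholder guesses.
curve = wom_adoption(potential=9000.0, adopters=1000.0,
                     contact_rate=1.5, adoption_fraction=0.02, months=96)
```

The loop produces the characteristic S-shaped adoption curve; tying `adoption_fraction` to the default rate would reproduce the stated effect of defaults weakening WOM.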
Space and request queue management are defined within the Main with space and
layout requirements configured in the Environment object contained in the Main. A value
of 1000 each was assigned as initial number of borrowers and lenders in the system.

An advantage of object-oriented ABM is that we can look deeper into each object –
borrower or lender – and view its state and variable values. The following are some
inputs used in calibrating the agent based model (Figure 5):

 The number of borrowers is initialized to 1000.

 A random borrower can request anywhere from $1,000 to $35,000, based on
his profile.
 The contact rate is kept at 1.5% to prevent the number of new agents entering the
system from growing too large.

Simulation experiments help to facilitate systematic and quantitative analysis of the
effects of factors of interest. The simplifying modeling assumptions adopted for this
study are as follows:

 A given lender is attached to a given borrower.

 Agents leave after they complete payment.
 A borrower has an option to return to the state of potential borrower.
 Agents who default must leave the system.
 Probability distributions are used to generate the agent profiles.
 Arrival patterns of borrowers and lenders are based on LC user arrival rate.
 The term of a loan is either 36 or 60 months, and the choice follows a
probability distribution similar to the real data.
 State transitions are instantaneous and time durations are factored into the
timeout triggered property.

Figure 5. Agent-based simulation interface.


System Dynamics

The system dynamics (SD) model incorporates estimates of the demand, behaviors of
customers, costs, and market conditions. The SD phase involves first modeling a causal
loop diagram of the peer-to-peer lending environment using identified key system
variables (Figure 6).

Figure 6. System dynamics model.

The causal loop diagram forms the backbone of the system dynamics model. Causal
loops are constructed to capture interrelationships of critical success factors identified in
literature. The metrics of interest in the SD model include profitability, customer
satisfaction, and responsiveness. In the model, profitability is measured as net income
and profit margin. The SD model receives input from the ABM. The outputs of the system
are the projected AvgNetAnnualizedReturn, MarketShare, NetIncome and ProfitMargin
over the given lending span (eight years in this study). The Net Annualized
Return (a customer-facing metric inherent to the LC business model) is the income from
interest less service charges and charge-offs, plus recoveries. MarketShare is the
share of the total ecommerce market captured by the simulated company.
ProfitMargin (an organization-facing metric) is the NetIncome less inflation relative
to the total income derived from interest.
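Stated as code, the two metric definitions read as follows. This is a hedged sketch of the formulas as described in the text; the actual Lending Club computation has more detail, and the function names are ours.

```python
def net_annualized_return(interest_income, service_charges, charge_offs, recoveries):
    # Income from interest, less service charges and charge-offs, plus recoveries
    return interest_income - service_charges - charge_offs + recoveries

def profit_margin(net_income, inflation, total_interest_income):
    # NetIncome less inflation, relative to total income derived from interest
    return (net_income - inflation) / total_interest_income
```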
The following are some inputs used in calibrating the system dynamics model:

 The initial investment by the C2C company is 500 (with all cash amounts in tens
of thousands of dollars)
 The effective tax rate of the organization is 34%
 All transactions accrue a 1% service charge
 The simulation runs from January 1st 2012 for 8 years.
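The calibration inputs above can be grouped into a single configuration, which also makes the 1% service charge easy to apply uniformly. The values come from the chapter; the dictionary layout and function name are ours.

```python
SD_CONFIG = {
    "initial_investment": 500,   # in tens of thousands of dollars
    "effective_tax_rate": 0.34,  # 34% effective tax rate
    "service_charge": 0.01,      # 1% charge on all transactions
    "start_date": "2012-01-01",  # simulation start
    "horizon_years": 8,          # simulation length
}

def service_fee(amount):
    """Fee accrued by a single transaction under the 1% service charge."""
    return amount * SD_CONFIG["service_charge"]
```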
In Figure 6, the points where the ABM is coupled with the SD model are denoted by
an ABM suffix. For example, AmmountFundedABM, InterestIncomeABM and
AmountRepaidABM are dynamic variables whose values depend on updates
from the ABM subsystem. The NetIncome stock represents an accumulation of the gross
income net of taxes.
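A stock such as NetIncome is the running integral of its net inflow. A minimal Euler-integration sketch (illustrative only; AnyLogic's SD engine performs this integration internally, and the gross income values below are invented):

```python
TAX_RATE = 0.34  # effective tax rate from the calibration inputs

def integrate_stock(initial, inflows, dt=1.0):
    """Euler integration of a stock: level(t+dt) = level(t) + inflow(t)*dt."""
    level, history = initial, [initial]
    for flow in inflows:
        level += flow * dt
        history.append(level)
    return history

# NetIncome accumulates gross income net of taxes (gross values invented)
gross_income = [100.0, 120.0, 90.0]
net_income = integrate_stock(0.0, [g * (1 - TAX_RATE) for g in gross_income])
```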


In Figure 7, 1000 borrowers initially enter the system and as time (horizontal axis)
progresses, borrowers start to transition to the Screened, PostedProfile, Funded,
InRepayment and FullyPaid states. As time progresses, new users are added to the system
by responding to the WOM effects of other borrowers and lenders. At the end of the
simulation period, a total of about 1700 borrowers and 2100 lenders are in the system.
This number can be controlled by varying the WOM factor. For speed and efficiency, this
number is kept low in the present study. A portion of users remain in the
PotentialBorrower state because some of the borrowers who come into the system do not
meet the screening requirements and never progress to the screened state.

Figure 7. Borrower states in the system.

Observing the behavior of metrics in Lending Club suggests that net annualized return
declines exponentially as time progresses. This is in line with the output of the
AvgNetAnnualizedReturn metric in Figure 8. It becomes evident that as time progresses,
more borrowers begin to default, effectively driving AvgNetAnnualizedReturn
downwards. This presents a challenge that conflicts with the goal of viability of the
business model.

Figure 8. Time plots of metrics.

An increase in ProfitMargin results from an increase in repayments (both
principal and interest) and a decrease in charge-offs. An increase in ChargeOffs has a
negative effect on the AvgNetAnnualizedReturn and NetIncome (Figure 9). This creates
pressure on management to increase service charges in order to maintain profitability and
increase market share (MarketShareDS).
Figure 9. Response of net income to taxation.

In the early phase of the simulation, the initial capital and cost weigh heavily on the
system. The sudden spikes in MarketShareDS signify that the first phase of borrowers in
the different time horizons have completed their loan cycle and new users are being
initialized. Most borrowers return to the PotentialBorrower state, where they can request a new
loan and the process repeats itself. Net income increases slowly in the first two years due
to the fact that the starting number of borrowers is low and because the effect of WOM
only becomes significant with time.
Results from our study are compared to original data provided by Lending Club, as
illustrated in Figure 10. This comparison serves to validate the usefulness of the developed
framework in estimating the net annualized return metric. The results show that the
average net annualized returns obtained from our model follow the same pattern and are
relatively close in value to those obtained from historical performance.

Figure 10. Average net annualized return comparison.


The developed simulation models serve as a testbed for managing control actions by
incorporating fluctuations and stochasticity. The system dynamics model captures a high
level abstraction of the system. A multi-model paradigm consisting of agent based
simulation allows appropriate choice of techniques that take into consideration different
components of the system.
In online consumer-to-consumer lending, risks and uncertainties pervade aspects of
operation. The model uses consumers’ historical payments, outstanding debts, amount of
credit available, income and length of credit history to make its calculations. The
framework offers a structured approach that incorporates business processes and
stakeholder requirements and lends its use to ecommerce systems.
The developed simulation model takes into consideration differences in customer
characteristics and stochasticity in demand patterns. The framework provides insights to
the overall behavior of consumer-to-consumer ecommerce complex systems. This in turn
provides insights on the profitability of the business model and strategies for improving
system performance. The result is a recommendation for a course of action which
complements management’s expertise and intuition.
An extension to this study will be to explore the case where a borrower’s request is
met by multiple lenders, and how such a strategy impacts individual and system
performance. There is also room to improve on the risk classification phase. Validity of
the results hinges on correct interpretation of the output of the model. As a result, there is
a need to also improve the accuracy of the neural network prediction algorithm to encompass
a system that perpetually improves based on learning. Further research can also
investigate to what extent P2P models can reduce costs and fees and if such reduction is
worth the associated risk.
It is expected that conceptual modeling approaches will continue to be a beneficial
approach for analyzing consumer-to-consumer complex systems. This study lays a
foundation for future research to expand on the guidelines and simulation development in
modeling the operations of an organization.




Dr. Oloruntomi Joledo has four years' experience developing software applications
and working as a project engineer on various technological projects. Her main research
interests include agents, discrete-event simulations, agent-based simulations, hybrid
simulations, software development and engineering management. She works for the
College of Medicine at UCF as a coordinator and data analyst.

Edgar Gutierrez is a Research Affiliate at the Center for Latin-American Logistics
Innovation (CLI) and a Fulbright Scholar currently pursuing his PhD in Industrial
Engineering & Management Systems. His educational background includes a B.S. in
Industrial Engineering from the University of La Sabana (2004, Colombia) and an M.Sc. in
Industrial Engineering from the University of Los Andes (2008, Colombia); he was a Visiting
Scholar at the Massachusetts Institute of Technology (2009-2010, USA). Edgar has over
10 years of academic and industry experience in prescriptive analytics and supply chain
management. His expertise includes machine learning, operation research and simulation
techniques for systems modelling and optimization.

Hatim Bukhari is a Lecturer at the Department of Industrial Engineering, University
of Jeddah, Saudi Arabia, currently pursuing his PhD in Industrial Engineering &
Management Systems at the University of Central Florida (UCF) (Orlando, FL, USA).
His educational background includes a B.S. in Mechanical Engineering from King
AbdulAziz University (2005, Saudi Arabia) and an M.Sc. in Engineering Management from
Santa Clara University (2010, USA). His expertise includes reliability engineering,
simulation modeling and engineering management.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 12



Jose M. Prieto*
UCL School of Pharmacy, London, UK


Complex natural products such as herbal crude extracts, semi-purified herbal
fractions and essential oils (EOs) are widely used as active principles (APIs) of medicinal
products in both clinical and complementary/alternative medicine. In the food industry,
they are used to add ‘functionality’ to many nutraceuticals. However, the intrinsic
variability of their composition, and the synergisms and antagonisms between major and
minor components, makes it difficult to ensure consistent effects across different batches.
The use of Artificial Neural Networks (ANNs) for the modeling and/or prediction of the
bioactivity of such active principles, as a substitute for laboratory tests, has been actively
explored during the last two decades. Notably, the prediction of antioxidant and
antimicrobial properties of natural products has been a common target for researchers.
The accuracy of the predictions seems to be limited only by the inherent errors of the
modelled tests and the lack of international agreements on experimental
protocols. However, with sufficient accumulation of suitable information, ANNs can
become reliable, fast and cheap tools for the prediction of antioxidant,
antimicrobial and anti-inflammatory activities, thus improving their use in medicine and

Keywords: artificial neural networks, natural products, bioactivity

Corresponding Author Email:
278 Jose M. Prieto


Artificial neural networks are a type of artificial intelligence method. They are
applied in many disparate areas of human endeavour, such as the prediction of stock
market fluctuations in economics, forecasting electricity load in the energy industry,
milk production in husbandry, the quality and properties of ingredients and products in the
food industry, the prediction of bioactivities in toxicology and pharmacology, or the
optimization of separation processes in chemistry (Dohnal, Kuča & Jun, 2005; Goyal,
In particular, the prediction of the bioactivity of natural products from their unique
chemical composition is an idea already well established among the scientific community,
but not yet systematically explored, due to the experimental complexity of characterising
all possible chemical interactions between dozens of components (Burt, 2004). In this
regard, neural networks have an enormous advantage in that they require less formal
statistical training, can detect complex non-linear relationships between dependent and
independent variables and all possible interactions without complicated equations, and can
use multiple training algorithms. Moreover, in terms of model specification, ANNs
require no knowledge of the internal mechanisms of the processes, but since they often
contain many weights that have to be estimated, they require large training sets. The various
applications of ANNs can be summarized into classification or pattern recognition,
prediction and modeling (Agatonovic-Kustrin & Beresford, 2000; Cartwright, 2008).
Therefore, the use of ANNs may overcome these difficulties, becoming a
convenient computational tool that allows the food and cosmetic industries to select herbal
extracts or essential oils with optimal preservative (antioxidant and antimicrobial)
or pharmacological (anti-inflammatory) properties. This is not
trivial, as natural products are notoriously complex in terms of chemical composition,
which may vary significantly depending on the batch and the supplier. This variability
implies a constant use of laboratory analysis. ANNs able to model and predict such
properties would result in savings and enhanced consistency of the final product. The use
of such computational models holds potential to overcome, and take into account, all the
possible (bio)chemical interactions, synergisms and antagonisms between the numerous
components of active natural ingredients.


To facilitate the understanding of the non-specialised reader, this section is conceived
as a “layman” presentation of the fundamental concepts surrounding the use of ANNs.
For a deeper understanding, the reader is encouraged to read the excellent papers
Artificial Intelligence for the Modeling and Prediction ... 279

published by Krogh (2008), Dohnal et al. (2005), and Zupan & Gasteiger (1991). These
are listed in order of increasing complexity for a smooth progression.
The conception of an Artificial Neurone (AN) originates directly from the biological
neuron. Each AN has a certain number of inputs, each with its own assigned
weight indicating the importance of that input. In the neuron, the sum of the weighted
inputs is calculated, and when this sum exceeds a certain value, called the threshold (but
also known as bias or noise), it is processed using a transfer function and the
result is distributed through the output to the next AN (Figure 1).
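The weighted-sum-then-transfer behaviour just described maps directly to code. This is a minimal sketch: the sigmoid transfer function and the subtract-the-threshold convention are common choices, but not the only ones.

```python
import math

def neuron(inputs, weights, threshold):
    """Artificial neuron: weighted sum of the inputs, shifted by the
    threshold (bias), then passed through a sigmoid transfer function."""
    s = sum(x * w for x, w in zip(inputs, weights)) - threshold
    return 1.0 / (1.0 + math.exp(-s))  # output forwarded to the next AN

out = neuron([0.5, 0.2], [0.8, -0.4], threshold=0.1)
```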
Similarly, the term “artificial neural network” (ANN) originates from its biological
counterpart, the neural network (NN), which represents the network of interconnected neurons in
a living organism. The function of a NN is defined by many factors, for example the
number and arrangement of neurons and their interconnections. Figure 2 shows how
ANNs are based on the same conception as biological networks: they can be considered
collections of interconnected computing units called artificial neurons (ANs). The network
is composed of a set of virtual/artificial neurons organized in interconnected layers. Each
neuron has a specific weight in the processing of the information. While two of these
layers are connected to the ‘outside world’ (input layer, where data is presented, and
output layer, where a prediction value is obtained), the rest of them (hidden layers) are
defined by neurons connected to each other, usually excluding neurons of the same layer
(Figure 2).

Figure 1. Comparison between form and function of biological and artificial neurones.
©Jose M Prieto

Figure 2. Comparison of form and function in (A) biological and (B) artificial neuronal networks.
Artificial Intelligence for the Modeling and Prediction ... 281

Figure 3. Supervised training of an artificial neural network. (A) Training set of inputs and outputs
representing the experimental values taken from real life; (B) the ANN builds up an algorithm through a
series of iterations in which the weights and thresholds are finely tuned to get as close as possible to the
output values given in (A).

ANNs may have numerous neurons and layers arranged in various ways (“anatomies” or
“topologies”). ANNs can be applied to a wide range of areas depending on their
topology. The main types of ANN are: feedforward neural networks, radial basis
function (RBF) networks, Kohonen self-organizing networks, recurrent networks,
stochastic neural networks and modular neural networks. Here, only the multilayer feed-
forward ANN (MLF-ANN) will be described in detail, as it is by far the most preferred
when a prediction of discrete numbers measuring bioactivities or chemical properties is
needed. Kohonen self-organizing networks are popularly used for classification
problems. The feed-forward ANN consists of neurons organised into three or more layers:
the first one (the “input layer”), one or more internal layers (“hidden layers” or “learning
layers”), and the last one (the “output layer”). This neural network was the first and arguably
simplest type of artificial neural network devised. In this network, the information moves
in only one direction, forward, from the input layer, through the hidden layers and to the
output layer. There are no cycles or loops in the network.
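The one-way flow of a feed-forward network is a repeated application of the neuron rule, layer by layer. In the sketch below, the weights and biases are arbitrary placeholders for an untrained 2-2-1 network; a real network would obtain them through training.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, layers):
    """Feed-forward pass: each layer is a list of (weights, bias) per neuron.
    The signal moves input -> hidden layers -> output, with no loops."""
    for layer in layers:
        x = [sigmoid(sum(w * xi for w, xi in zip(weights, x)) - bias)
             for weights, bias in layer]
    return x

# 2 inputs -> 2 hidden neurons -> 1 output (weights chosen arbitrarily)
net = [
    [([0.5, -0.3], 0.1), ([0.8, 0.2], -0.2)],  # hidden layer
    [([1.0, -1.0], 0.0)],                       # output layer
]
y = forward([0.6, 0.9], net)
```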
Figure 4. (A) Comparison between real (squares) experimental values and those calculated or predicted
by the ANN (dots). (B) Quantitative measurement of the performance of the ANN (From Daynac,
Cortes-Cabrera, & Prieto, 2016).
The architecture can vary in the number of internal layers, the number of ANs in each
layer, the connections between ANs (fully or partially interconnected layers) and the
transfer function chosen for the signal processing of each AN.
From ANN theory it is evident that there are many values (weights, thresholds)
which have to be set. To do so, many adaptation algorithms have been developed, which
mainly fall into two basic groups: supervised and unsupervised.
A supervised algorithm requires knowledge of the desired output. The algorithm
calculates the output with the current weights and biases, the output is compared with the
targeted output, and the weights and biases are adjusted. This cycle is
repeated until the difference between targeted and calculated values is as small as it can
get. The most applied supervised algorithms are based on gradient methods (for example
‘back propagation’) (Figure 3) and genetic algorithms. While a supervised
learning algorithm requires knowledge of the output values, an unsupervised one does
not; it produces its own output, which needs further evaluation.
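The calculate/compare/adjust cycle of supervised learning can be shown on the simplest possible case: a single linear neuron trained by the delta rule. This is a gradient method in the same family as back-propagation, but deliberately much simpler than training a full multilayer network; the data and learning rate are invented for illustration.

```python
def train(samples, epochs=200, lr=0.1):
    """Supervised training: calculate the output with the current weights,
    compare it with the targeted output, adjust weights and bias, repeat."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = w[0] * x[0] + w[1] * x[1] + b   # calculated output
            err = target - out                    # targeted vs. calculated
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err                         # adjust weights and bias
    return w, b

# Learn the (invented) target rule y = 2*x0 + x1 from four examples
data = [((-1, 0), -2), ((1, 0), 2), ((0, 1), 1), ((1, 1), 3)]
w, b = train(data)
```

After training, w approaches [2, 1] and b approaches 0: the difference between targeted and calculated values has been driven close to zero.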
When the ANN finishes the adjustments after an established number of iterations (or
epochs), it is necessary to check that it actually is ‘fit for purpose’: the prediction ability
of the network is tested on a validating set of data. This time only the input values
of the data are given to the network, which calculates its own output. The
difference between the real outputs and the calculated ones can then be investigated to
evaluate the prediction accuracy of the network. This can be visualised directly (as in
Figure 4A), but eventually the performance of the predictions has to be measured by
linear correlation (see Figure 4B).
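The validation step can be sketched likewise: only the inputs of a held-out set are fed to the network, and the agreement between the real and calculated outputs is quantified by linear correlation (Pearson's r, as in Figure 4B). The four real/predicted pairs below are invented for illustration.

```python
def pearson_r(real, predicted):
    """Pearson linear correlation between real and ANN-calculated outputs."""
    n = len(real)
    mr, mp = sum(real) / n, sum(predicted) / n
    cov = sum((r - mr) * (p - mp) for r, p in zip(real, predicted))
    sd_r = sum((r - mr) ** 2 for r in real) ** 0.5
    sd_p = sum((p - mp) ** 2 for p in predicted) ** 0.5
    return cov / (sd_r * sd_p)

real = [10.0, 12.0, 15.0, 20.0]        # measured values (invented)
predicted = [9.5, 12.5, 14.0, 21.0]    # hypothetical ANN outputs
r = pearson_r(real, predicted)
```

An r close to 1 indicates that the network's predictions track the experimental values.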



Two main areas of application are directly linked with the potential use of natural
products: the food industry and pharmaceutical research. Both have started to use ANNs as a
tool to predict both the best processing methods and the properties of final
products made from natural sources. Perhaps ANNs are better established in the food
chemistry sector, whilst their use in pharmaceutical research is lagging behind.
Indeed, ANNs have been applied in almost every aspect of food science over the past
two decades, although most applications are in the development stage. ANNs are useful
tools for food safety and quality analyses, which include modeling microbial growth
(and from this predicting food safety), interpreting spectroscopic data, and predicting
physical, chemical, functional and sensory properties of various food products during
processing and distribution (Huang, Kangas, & Rasco, 2007; Bhotmange & Shastri,
On the one hand, applications of ANNs to food technology, for example the control of
bread making, extrusion and fermentation processes (Batchelor, 1993; Eerikanen &
Linko, 1995; Latrille, Corrieu, & Thibault, 1993; Ruan, Almaer, & Zhang, 1995), are
feasible and accurate, easy to implement, and will result in noticeable advantages and
savings for the manufacturer. On the other hand, the prediction of functionality (antioxidant
or antimicrobial activities, for example) is not so well explored, perhaps given the
complexity of the associated experimental designs, which we will discuss in detail
later, and the less obvious advantages for the manufacturer.
The potential applications of ANN methodology in the pharmaceutical sciences
range from the interpretation of analytical data, drug and dosage form design through
biopharmacy to clinical pharmacy. This sector focuses more on the use of ANNs to
predict extraction procedures (similarly to the food sector) and pharmacokinetic and
toxicological parameters. These three aspects are usually non-linear and thus in need of AI
tools that can recognize patterns from data and estimate non-linear relationships. Their
growing utility is now reaching several important pharmaceutical areas, including:

 Quantitative Structure Activity Relationship (QSAR) and molecular modeling

(Kovesdi et al., 1999; Jalali-Heravi & Parastar, 2000)
 Toxicological values of organic compounds based on their structure and
mutagenicity (Jezierska, Vračko, & Basak, 2004).
 Pharmacological activities (Chen et al., 2011)
 Modeling of drug solubility (Huuskonen, Salo, & Taskinen, 1998) and other
pharmacokinetic parameters (Ma et al., 2014)
 Response surface modeling in instrumental analysis (chromatography) to predict
retention as a function of changes in mobile phase pH and composition, for
optimization purposes (Agatonovic-Kustrin & Loescher, 2013)
 Optimization of formulations in pharmaceutical product development (Parojcić et
al., 2007)

Most of the above problems are solved for the case of single (natural or synthetic)
drugs. However, the urgency of applying ANN-based approaches is best perceived in the
clinical rationalisation and exploitation of herbal medicines. Herbal medicines contain at
least one plant-based active ingredient, which in turn contains dozens to hundreds of
components (phytochemicals). To start with, little is known about which phytochemical(s)
is/are responsible for the putative properties of the herbal ingredient. Chagas-Paula et al.
(2015) successfully applied ANNs to predict the effect of Asteraceae species which are
traditionally used in Europe as anti-inflammatory remedies (for details see “Prediction of
the anti-inflammatory activities” below). When multiple herbal ingredients (10-20) are
used, such as in Traditional Chinese Medicine, the exact role of each drug may only be
possible to understand if the myriad of influencing factors is harnessed by AI means
(Han, Zhang, Zhou, & Jiang, 2014), taking advantage of the fact that ANNs require no
knowledge of the internal mechanisms of the processes to be modelled.
Similar to pharmacotoxicology, pathology is a complex field: modern high-throughput
biological technology can simultaneously assess the expression levels of
tens of thousands of putative biomarkers in pathological conditions such as tumors, but
turning this complexity into meaningful classifications to support clinical decisions
depends on linear or non-linear discriminant functions that are too complex for classical
statistical tools. ANNs can solve this issue and provide more reliable cancer
classification through their ability to learn how to recognize patterns (Wang, Wong, Zhu, &
Yip, 2009).

Prediction of Antioxidant Properties

Antioxidant capacity is nowadays accepted as a criterion of food quality and as a way to
monitor the impact of food processing on the nutraceutical value of food products
(Shahidi, 2000). In experimental pharmacology, antioxidant properties are also the object
of intense research, as they have been shown to influence and resolve many pathological
processes (Young & Woodside, 2001); so far, however, the complexity and sometimes
contradictory effects of antioxidants hamper their implementation in therapeutic
approaches (Mendelsohn & Larrick, 2014). Therefore, developing ANNs able to predict
antioxidant values of natural products may become an important tool for the food
industry, as companies could avoid implementing any experimental procedure within their
premises. The antioxidant properties of natural products have been at the centre of
intensive research for their potential use as preservatives, supplements, cosmeceuticals or
nutraceuticals by the food and cosmetics industries. Literally hundreds of works reporting
both on the composition and on the antioxidant properties of natural products have been
written during the last decade. However, this kind of work is under increasing criticism, as the
inherent intra-specific variability of their composition (depending on the location,
altitude, meteorology, type of soil and many other factors) makes this kind of work
virtually irreproducible.
To our knowledge, the first report showing the possibility of applying ANNs to
predict the antioxidant capacity of natural products was presented by Buciński, Zieliński,
& Kozłowska (2004). The authors chose to use the amounts of total phenolics and other
secondary metabolites present in cruciferous sprouts as input data. Despite the popularity
of this topic in natural products chemistry, no further attempt to use an ANN for the
prediction of the antioxidant capacity of natural products was made until our pioneering
work predicting the antioxidant activity of essential oils in two widely used in vitro
models of antiradical and antioxidant activity, namely 2,2-diphenyl-1-picrylhydrazyl
(DPPH) free radical scavenging and linoleic acid oxidation. We could predict the
antioxidant capacities of essential oils of known chemical composition in both assays
using an artificial neural network (ANN) with an average error of only 1-3% (Cortes-Cabrera
& Prieto, 2010). Later, Musa, Abdullah, & Al-Haiqi (2015) successfully modeled
the popular DPPH assay with ANNs, but using a camera as the imaging system instead of a
conventional colorimetry reader.
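In studies of this kind, the model's input is a fixed-length vector of major-component concentrations for each oil. A sketch of that encoding step follows; the component names and percentages are invented for illustration and are not the study's data.

```python
# Fixed input schema: one slot per major component the model knows about
COMPONENTS = ["thymol", "carvacrol", "linalool", "limonene"]

def encode(oil_composition):
    """Map a {component: %} dict onto the fixed input vector (absent
    components contribute 0), scaled to the 0-1 range ANNs prefer."""
    return [oil_composition.get(c, 0.0) / 100.0 for c in COMPONENTS]

thyme = {"thymol": 47.0, "carvacrol": 4.0, "linalool": 4.5}  # invented
x = encode(thyme)
```

An ANN trained on many such vectors, paired with measured % scavenged DPPH values, is what then replaces the laboratory assay for new compositions.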

Table 1. Prediction of antioxidant properties of natural products using ANNs

Natural product | Input | Output | Reference
Apple pomace, orange and potato | Apple pomace, orange or potato peel content | Peroxide values of sunflower oil samples | (Ozturk et al., 2012)
Ascorbic acid in green asparagus | Thermal treatment parameters | Ascorbic acid content | (Zheng et al., 2011)
Bananas | Total phenols | | (Guiné et al.)
Bayberry juice | Red, green, and blue (RGB) intensity values | Anthocyanins, ascorbic acid, total phenols, flavonoids, and antioxidant activity | (Zheng et al., 2011)
Centella asiatica | Selected shifts in the 1H Nuclear Magnetic Resonance spectra corresponding to 3,5-malonilquinic acid (irbic acid), 3,5-di-O-caffeoylquinic acid, 4,5-di-O-caffeoylquinic acid, 5-O-caffeoylquinic acid (chlorogenic acid), quercetin and kaempferol | DPPH radical scavenging activity | (Maulidiani et al., 2013)
Cinnamon, clove, mung bean, red bean, red rice, brown rice, black rice and tea extract | Colorimetry of the reaction | % scavenged DPPH | (Musa et al., 2015)
Clove bud essential oil; ginger, pimento and black pepper extracts | Peroxide concentration; thiobarbituric acid reactive substances; diene conjugate content; content of volatile compounds formed as products of unsaturated fatty acid peroxide degradation; composition of methyl esters of fatty acids | Autooxidation of polyunsaturated fatty acids in linseed oil | (Misharina et al., 2015)
Commercial teas | Total flavonoids, total catechines and total | Total antioxidant activity | (Cimpoiu et al., 2011)
Essential oil and leaf extracts of Curcuma zedoaria | Twenty-four identified compounds representing 92.4% of the total oil; total phenolic compounds determined as gallic acid | DPPH and superoxide radical scavenging activities | (Rahman et al., 2014)
Essential oils | Major antioxidant components | % scavenged DPPH; % linoleic acid | (Cortes-Cabrera & Prieto, 2010)
Green tea | Near infrared (NIR) spectra | Antioxidant activity | (Chen et al., 2012)
Guava | Extraction conditions | Anti-glycation and DPPH radical scavenging activities | (Yan et al., 2013)
Hazelnut oil | Gallic acid, ellagic acid, quercetin, β-carotene, and retinol content | Peroxides, free fatty acids, and iodine values | (Yalcin et al., 2011)
Kilka fish oil | Gallic acid and/or methyl gallate content | Oxidation parameters of triacylglycerols: induction period, slope of initial stage of oxidation curve, slope of propagation stage of oxidation curve, and peroxide value | (Asnaashari, Farhoosh & Farahmandfar, 2016)
Soybean oil with added curcumin as antioxidant | Curcumin content | Peroxide, acid and iodine values | (Asnaashari, Farhoosh & Farahmandfar)
Sprouts | Total phenolic compounds, inositol hexaphosphate, glucosinolates, soluble proteins, ascorbic acid, and total tocopherols | Trolox equivalents | (Buciński, Zieliński, & Kozłowska, 2004)
Sunflower oil | Byproduct extract content | Oxidation parameters | (Karaman et al.)
Turnip (“yukina”) essential oil | Chemical composition of the volatile oil extracted from the aerial parts of Brassica rapa (50 compounds) and aroma compounds (12 compounds) | Oxygen radical absorbance capacity (ORAC) | (Usami et al., 2014)
Whey protein hydrolysates | Protein content | DPPH radical scavenging activity | (Sharma et al., 2012)
288 Jose M. Prieto

Prediction of Antimicrobial Activities

Antibacterial and Antifungal Activity

Pioneering use of ANNs in microbiology was largely restricted to modeling the factors
contributing to microorganism growth (Hajmeer et al., 1997; Lou and Nakai, 2001;
Najjar, Basheer, & Hajmeer, 1997) or the yield of bioproducts (Desai et al., 2005).
QSAR studies of single chemical entities have shown the usefulness of artificial neural
networks, which appear to be equal or somewhat superior in predictive success to linear
discriminant analysis (García-Domenech and de Julián-Ortiz, 1998; Murcia-Soler et al.,
2004; Buciński et al., 2009).
Artificial intelligence also makes it possible to determine the minimal inhibitory
concentration (MIC) of synthetic drugs (Jaén-Oltra et al., 2000). Recently, some works
have explored the use of such an approach to predict the MIC of complex chemical mixtures
against some causal agents of foodborne disease and/or food spoilage (Sagdic, Ozturk &
Kisi, 2012; Daynac, Cortes-Cabrera & Prieto, 2015).
Essential oils are natural products popularly branded as ‘antimicrobial agents’. They
act upon microorganisms through a not yet well defined mixture of both specific and
unspecific mechanisms. In this regard, ANNs are a very good option: they have been
successfully applied to processes with complex or poorly characterised mechanisms,
since they only take into account the causing agent and its final effect (Dohnal et al.,
2005; Najjar et al., 1997).
Indeed, the antibiotic activities of essential oils depend on a complex chemistry and a
poorly characterised mechanism of action. Different monoterpenes penetrate through cell
wall and cell membrane structures at different rates, ultimately disrupting the
permeability barrier of cell membrane structures and compromising the chemiosmotic
control (Cox et al., 2000). It is therefore conceivable that differences in Gram staining
would be related to the relative sensitivity of microorganisms to essential oils. However,
this generalisation is controversial, as illustrated by conflicting reports in the literature.
Nakatani (1994) found that gram-positive bacteria were more sensitive to essential oils
than gram-negative bacteria, whereas Deans and Ritchie (1987) could not find any
differences related to the Gram reaction. The permeability of the membrane is only one factor,
and the same essential oil may act by different mechanisms upon different
microorganisms. As an example, the essential oil of Melaleuca alternifolia (tea tree)
which inhibited respiration and increased the permeability of bacterial cytoplasmic and
yeast plasma membranes, also caused potassium ion leakage in the case of E. coli and S.
aureus (Cox et al., 2001).
To further complicate matters, the antimicrobial activity of natural products cannot
always be attributed to a single compound in the mixture; the overall activity may instead
be due to interactions between components of the essential oils. In fact, synergism and
antagonism have been consistently reported, as reviewed by Burt
(2004). The challenge of the complexity of the countless chemical interactions between
dozens of EOs components and the microbes is virtually impossible to address in the
laboratory, but it may be solved using computational models such as artificial neural
networks (ANNs). In addition, ANNs are theoretically able to consider synergies and
antagonisms between inputs. There is a consistent body of data showing that many crude
essential oils are more active than their separated fractions or components, pointing to
synergies. In some cases synergistic activity between two or three components could be
experimentally demonstrated (Didry et al., 1993; Pei et al., 2009), but to do so with
dozens of chemicals is beyond reach. In fact, ANNs are algorithms with the capacity to
approximate an output value from input data without any previous knowledge of the model
and regardless of the complexity of its mechanisms, in this case the relationship between
the chemical composition of a given essential oil (input data) and its bioactivity (output
value). The enormous amount of information produced on the antimicrobial activity of
essential oils provides a rich field for data mining, and it is conceivable to apply suitable
computational techniques to predict the activity of any essential oil just from its chemical
composition.
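As a concrete illustration of this input/output framing, the sketch below trains a small feed-forward network on a composition table. It is only a minimal sketch using scikit-learn: the composition matrix and the activity values are synthetic stand-ins, not data from the studies cited above.

```python
# Illustrative sketch only: a small feed-forward ANN approximating a
# composition -> activity mapping. All numbers are invented.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# 120 hypothetical essential oils x 10 major components (% w/w)
X = rng.uniform(0, 40, size=(120, 10))
# invented nonlinear "activity" standing in for a measured endpoint
y = np.tanh(0.05 * X[:, 0]) + X[:, 1] * X[:, 2] / 2000 + rng.normal(0, 0.05, 120)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# a single small hidden layer keeps topology complexity low
model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)  # predicted activity for unseen oils
```

The point of the sketch is the shape of the problem, not the numbers: each input neuron is one component percentage, and the single output neuron is the assay endpoint.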
Our results reflect the variability in the susceptibility of different microorganisms
to the same essential oil but, more importantly, point towards some general trends. The
antimicrobial effects of essential oils upon S. aureus and C. perfringens (Gram-positive)
were accurately modelled by our ANNs, indicating a clear relationship between the
chemistry of EOs and microbial susceptibility, and perhaps suggesting a more additive,
physical (rather than pharmacological) mechanism of action. This also opens the prospect
for further studies to ascertain the best set of volatile components providing optimum
antimicrobial activity against these two pathogens and/or Gram-positive bacteria in
general. On the other hand, the lower accuracy of the predictions against E. coli
(Gram-negative) and C. albicans (yeast) may suggest more complex pharmacological actions
of the chemicals. In this case the activity may be pinned down to one or a few active
principles acting individually or in synergy.
Ozturk et al. (2012) studied the effects of some plant hydrosols obtained from bay
leaf, black cumin, rosemary, sage, and thyme in reducing Listeria monocytogenes on the
surface of fresh-cut apple cubes. In addition to antibacterial measurements, the abilities of
Adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN), and
multiple linear regression (MLR) models were compared with respect to estimation of the
survival of the pathogen. The results indicated that the ANFIS model performed the best
for estimating the effects of the plant hydrosols on L. monocytogenes counts. The ANN
model was also effective, but the MLR model was found to be poor at predicting
microbial numbers. This further proves the superiority of AI over multivariate statistical
methods in modeling complex bioactivities of chemically complex products.

Antiviral Activities
Viruses are still a major, poorly addressed challenge in medicine. The prediction of
antiviral properties of chemical entities, or the optimisation of current therapies to
enhance patient survival, would be of great impact, but the application of AI to this
conundrum has been less explored than in the case of antibacterials. Perhaps the most
pressing issue is the search for improved combination antiretroviral therapies to suppress
HIV replication without inducing viral drug resistance. The choice of an alternative
regimen may be guided by a drug-resistance test. However, interpretation of resistance
from genotypic data poses a major challenge. Larder and co-workers (2007) trained
ANNs with genotype, baseline viral load, time to follow-up viral load, baseline CD4+
T-cell counts and treatment history variables. These models performed at a low-to-intermediate
level, explaining 40-61% of the variance. The authors concluded that this
was still a step forward and that these data indicate that ANN models can be quite
accurate predictors of virological response to HIV therapy, even for patients from
unfamiliar clinics.
We recently tried to model the activity of essential oils on herpes viruses (types 1 and
2) by both MLR and ANNs (Tanir & Prieto, unpublished results). Our analysis could not
identify a clear subset of active chemicals; rather, the best results were obtained with
datasets representing all major components. This highlights that viruses are a much
harder problem to model and that more work must be done towards solving it.

Prediction of Pharmacological/Toxicological Effects and Disease Biomarkers

The prediction of pharmacological or toxicological effects should ideally involve
whole living organisms or at least living tissues. However, the current approach is the use
of cultured mammalian cells, favouring single proteins as targets. Therefore, predicting these
effects is clearly more complex than the prediction of purely chemical reactions (such as
antioxidant activities) or antimicrobial ones (bacteria, fungi, viruses).
Inflammation is the response of a living tissue to an injury. It is therefore
fundamentally a multifactorial process whose modeling may be extremely complex. One
approximation to the problem is to target the inhibition of key enzymes responsible for
the onset and maintenance of this process, such as cyclooxygenases and lipoxygenases.
Nonsteroidal anti-inflammatory drugs inhibiting either of those targets are
the most used anti-inflammatory medicines in the world. Dual inhibitors of
cyclooxygenase-1 and 5-lipoxygenase are proposed as a new class of anti-inflammatory
drugs with high efficacy and low side effects. In a recent work, Chagas-Paula and co-workers
(2015) selected ca. 60 plant leaf extracts from Asteraceae species with known in
vitro dual inhibition of cyclooxygenase-1 and 5-lipoxygenase and analyzed them by
HPLC-MS-MS. Chromatographic peaks of the extracts were correlated to their
respective anti-inflammatory properties by a genetic algorithm. After further study with
a decision tree classifier, some 11 chemical compounds were identified as
‘biomarkers’ of the putative anti-inflammatory potential. From these data, a model to
predict new biologically active Asteraceae extracts from their HPLC-MS-MS profiles
was built by training an ANN with the back-propagation algorithm on the biomarker
data, resulting in a high percentage of correct predictions for dual inhibition.
Nagahama et al. (2011) proposed the simultaneous estimation of multiple health-promoting
effects of food constituents using ANNs. The model uses expression data of
intracellular marker proteins, which respond to stimulation by a constituent, as
descriptors. To estimate three health-promoting effects, namely cancer cell growth
suppression activity, antiviral activity, and antioxidant stress activity, each model was
constructed using expression data of marker proteins as input data and health-promoting
effects as the output value.
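The simultaneous-estimation idea can be sketched as a single multi-output network. The sketch below is a simplified illustration with invented data, not the marker-protein panel or endpoints of Nagahama et al.

```python
# Simplified sketch: one multi-output network maps marker-protein expression
# levels to several health-promoting effects at once. All data are invented.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(150, 12))      # expression of 12 marker proteins
W = rng.normal(size=(12, 3))
Y = np.tanh(X @ W)                         # 3 effects, e.g., growth suppression,
                                           # antiviral and antioxidant activity

net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=3)
net.fit(X[:100], Y[:100])                  # scikit-learn MLPs accept 2-D targets
preds = net.predict(X[100:])               # one row per sample, one column per effect
```

A single network with three output neurons lets the shared hidden layer exploit correlations between the effects, rather than training three unrelated models.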
Goodacre et al. (1998) used discriminant function and hierarchical cluster analyses to
group the spectral fingerprints of clinical bacterial isolates associated with urinary tract
infection. ANNs trained with Raman spectra correctly identified some 80% of the same
test set, thus providing rapid and accurate microbial characterization, but only when
combined with appropriate
Zeraatpishe et al. (2011) studied the effects of lemon balm infusions (30 days, twice
daily, one tea bag of 1.5 g in 100 mL water) on the oxidative stress status of radiology
staff exposed to persistent low-dose radiation at work. They measured lipid peroxidation,
DNA damage, catalase, superoxide dismutase, myeloperoxidase, and glutathione
peroxidase activity in plasma samples. The treatment markedly improved the oxidative
stress condition and DNA damage of the radiology staff. The authors raised the question
of whether our approach of applying ANNs to model the antioxidant activity of essential
oils (Cortes-Cabrera & Prieto, 2010) could be applied to the protective activities of lemon
balm in order to improve this intervention.



Internal Factors

Some of the reported problems in the application of ANNs are caused by their
inherent structure and the most important are ‘overtraining’, ‘peaking effect’, and
‘network paralysis’. Overtraining the ANN may lead to the noise of data used for training
being fixed in the network weights. The peaking effect occurs when an excessive
number of hidden neurons minimizes the error in training but increases the error in
testing. Finally, network paralysis appears when excessive adjustment of the neuron
weights produces very large negative or positive values, driving sigmoid activation
functions into saturation and a near-zero response (Kröse & van der Smagt, 1996). These
limitations must be taken into account and minimized with an adequate choice of network
topology and a careful selection of neuron parameters (activation function, weights,
threshold, etc.).
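These internal factors are easy to provoke on toy data. The sketch below, using scikit-learn on invented points, contrasts a modest and an oversized hidden layer and then shows early stopping, one common guard against overtraining; it is an illustration, not a prescription.

```python
# Toy illustration of the 'peaking effect' and a common remedy: compare a
# small and an oversized hidden layer on noisy data, then use early stopping,
# which halts training when a held-out validation score stops improving.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(60, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, 60)           # noisy quadratic
X_tr, y_tr, X_te, y_te = X[:40], y[:40], X[40:], y[40:]

scores = {}
for hidden in (4, 200):                              # modest vs oversized topology
    net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=20000,
                       random_state=2).fit(X_tr, y_tr)
    scores[hidden] = (net.score(X_tr, y_tr), net.score(X_te, y_te))

# Early stopping reserves a validation split and stops when it stalls,
# limiting how much of the training noise is fixed into the weights.
safe = MLPRegressor(hidden_layer_sizes=(200,), early_stopping=True,
                    max_iter=20000, random_state=2).fit(X_tr, y_tr)
```

Inspecting `scores` shows how training and testing errors can diverge as the hidden layer grows, which is exactly the peaking effect described above.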

External Factors

From our experience, the most problematic factors influencing the accuracy of the
predictions when dealing with data mining are noise (inaccurate data), normalisation of
the output to acceptable ranges (0-1 for better results) and topology complexity (too
many inputs).
In the case of very complex chemical entities, such as natural products, noise
reduction needs to be achieved by carefully selecting data sets from papers with
similar values for the reference drugs. Bioassays are far from being performed in the
same way (i.e., with the same protocol) around the world. Even within the same
institution or laboratory, differences will arise between users, each modifying the
protocol slightly to adapt it to their needs. In this regard it is of the utmost importance
that all use the same reference drug (antioxidant, antimicrobial, anti-inflammatory, etc.);
however, the choice of reference drug is extremely variable across papers and sometimes
absent altogether. The reduced number of valid data available to train and validate the
ANNs forces the use of small sets, which may in turn introduce bias (Bucinski, Zielinski
& Kozlowska, 2004; Cortes-Cabrera & Prieto, 2010; Daynac, Cortes-Cabrera & Prieto,
2015). It would be tempting to also discuss the physicochemical incompatibility of many
synthetic drugs and natural products with most of the milieux in which the bioassays are
run (solvent polarity, microbiological/cell culture media, etc.), due mostly to their
volatility and poor solubility, but this would be beyond the scope of this chapter.
The challenge in modeling the activity of essential oils lies mainly in the selection of
the inputs and of the topology. Ideally, the data set would include all variables
influencing the bioactivity to be modelled (the input vector). In practice, more than 30
such inputs add tremendous complexity to the network, and the number of inputs used in
other ANN studies is generally far lower than the datasets we are able to generate. On the
other hand, restricting the input data set inevitably leads to a bias, but it is the only
practical way to overcome this problem. Also, the restricted number of comparable data
in the literature results in a low number of learning and validation sets. These factors do
not invalidate the use of ANNs but limit any generalization of the results
(Najjar et al., 1997). By reducing the inputs to the most relevant compounds (for
example, retaining only those with reported activity) the researcher can reduce the
number of input neurons, and consequently of hidden neurons, thereby minimizing the
problems associated with topology complexity. Even so, the number of inputs used in
our works remains far higher than in any of the previous attempts reported in the literature
(Bucinski, Zielinski & Kozlowska, 2004; Torrecilla, Mena, Yáñez-Sedeño, & García,
2007). However, the deliberate choice of active compounds may introduce bias and
hamper the accuracy of the ANNs when synergies with non-active components are
significantly involved. For example, in our work on the antioxidant activities of essential
oils, from the initial set of around 80 compounds present in them, only 30 compounds
with relevant antioxidant capacity were selected, to avoid excessive complexity of the
neural network and minimize the associated structural problems. Similarly, in our work
on the antimicrobial activities of essential oils, from the initial set of around 180
compounds, only 22 were selected. In this latter case two strategies were considered:
either to retain only the compounds with known antimicrobial properties, or to eliminate
the compounds without known antimicrobial activity and/or present at very low
percentages (≤5%). The first strategy proved to give better results (Cortes-Cabrera &
Prieto, 2010; Daynac, Cortes-Cabrera & Prieto, 2015).
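The input-reduction strategy amounts to masking the composition matrix down to a shortlist of components. The sketch below is a minimal illustration; the component names, percentages and "known active" set are invented, not the actual compound lists of the cited studies.

```python
# Sketch of the input-reduction strategy: keep only components with reported
# activity as input neurons. All names and numbers are illustrative.
import numpy as np

components = ["thymol", "carvacrol", "eugenol", "limonene", "myrcene", "sabinene"]
composition = np.array([            # rows = oils, columns = % of each component
    [35.0, 20.0,  0.0,  5.0,  2.0, 1.0],
    [ 0.5,  1.0, 70.0,  3.0,  0.5, 0.2],
    [ 2.0,  0.0,  0.0, 45.0, 10.0, 8.0],
])

known_active = {"thymol", "carvacrol", "eugenol"}   # strategy 1: reported activity
mask = np.array([name in known_active for name in components])
X_reduced = composition[:, mask]    # input layer shrinks from 6 to 3 neurons
```

Fewer input columns means fewer input neurons, and in turn fewer hidden neurons, which is exactly the topology-complexity reduction described above.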
The output values need, in many cases, to be normalized to a range usually between 0
and 1. This implies diverse strategies depending on how many orders of magnitude the
original data span. A common approach is to apply logarithms to the original values
(log x or log 1/x) (Cortes-Cabrera & Prieto, 2010; Daynac, Cortes-Cabrera & Prieto,
2015; Buciński et al., 2009).
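The log-then-rescale step can be written in a few lines. The MIC values below are invented for illustration only; the pattern (log-transform, min-max scale to [0, 1], and an inverse mapping for predictions) is the generic recipe, not a specific published protocol.

```python
# Sketch of output normalisation: log-transform activities spanning several
# orders of magnitude, then min-max scale to the [0, 1] range favoured by
# sigmoid output neurons. MIC values are hypothetical.
import numpy as np

mic = np.array([0.5, 2.0, 10.0, 250.0, 1000.0])    # hypothetical MICs

log_mic = np.log10(mic)                            # compress the range
lo, hi = log_mic.min(), log_mic.max()
target = (log_mic - lo) / (hi - lo)                # ANN training targets in [0, 1]

def denormalise(t):
    """Map a network output back to the original concentration scale."""
    return 10 ** (t * (hi - lo) + lo)
```

The inverse function matters in practice: a network trained on `target` predicts in [0, 1], and `denormalise` recovers a concentration a biologist can interpret.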
Finally, the overall performance of the ANNs depends on the complexity of the
biological phenomenon to be modelled. In our hands, performance in predicting the
results of antimicrobial assays was lower than in predicting purely biochemical assays.
The higher degree of variability in the response of whole living organisms versus the
higher reproducibility of biochemical reactions is in agreement with the work on antiviral
activities discussed above.


Back in 1991, Zupan and Gasteiger questioned the future of the application of ANNs.
At the time, only a few applications had been reported despite a healthy output of research
on ANNs (Zupan & Gasteiger, 1991). The affordability of computational power and the
availability of ANN software with friendlier interfaces have made this tool more
accessible and appealing to the average researcher in fields far from computing,
facilitating its application to many different scientific fields. It is nowadays an add-on
to all main statistical software packages, or available for free as standalone software.
In this chapter, we present work showing the potential of ANNs as a tool to
accomplish the prediction of bioactivities for very complex chemical entities such as
natural products, and suggest strategies for the selection of inputs and conditions for the
in silico experiments. We highlight the limitations of the scientific data available so far
(which suffer from little standardization of the experimental conditions and a disparate
choice of reference drugs), as well as the shortfalls of some popular assay methods, which
limit the accuracy of ANN predictions.
From the number and range of scientific outputs published, we cannot see that this
tool has been used to its full potential in the pharmaceutical, cosmetic or food industries.
There is a need to form multidisciplinary groups to generate high-quality experimental
data and process them to exploit the full potential offered by ANNs. The author foresees
a future where omics technologies and systems biology will feed data in real time to
cloud-based ANNs, building increasingly accurate predictions and classifications of the
biochemical activities of complex natural products and facilitating their rational clinical
use to improve healthcare and food safety worldwide.


Agatonovic-Kustrin, S., Beresford, R. (2000). Basic concepts of artificial neural network
(ANN) modeling and its application in pharmaceutical research. J Pharm Biomed
Anal, 22, 717-727.
Agatonovic-Kustrin, S. & Loescher, C. (2013). Qualitative and quantitative high
performance thin layer chromatography analysis of Calendula officinalis using high
resolution plate imaging and artificial neural network data modelling. Anal Chim
Acta, 798, 103-108.
Asnaashari, E., Asnaashari, M., Ehtiati, A., & Farahmandfar, R. (2015). Comparison of
adaptive neuro-fuzzy inference system and artificial neural networks (MLP and RBF)
for estimation of oxidation parameters of soybean oil added with curcumin. J Food
Meas Char, 9, 215-224.
Asnaashari, M., Farhoosh, R., & Farahmandfar, R. (2016), Prediction of oxidation
parameters of purified Kilka fish oil including gallic acid and methyl gallate by
adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network. J Sci
Food Agr, 96, 4594-4602.
Batchelor, B. (1993). Automated inspection of bread and loaves. Int Soc Opt Eng (USA),
2064, 124-134.
Bhotmange, M. & Shastri, P. (2011). Application of Artificial Neural Networks to Food
and Fermentation Technology. In: Suzuki K Artificial Neural Networks - Industrial
and Control Engineering Applications, Shanghai, InTech, 2011; 201-222.
Buciński, A., Socha, A., Wnuk, M., Bączek, T., Nowaczyk, A., Krysiński, J., Goryński,
K., & Koba, M. (2009). Artificial neural networks in prediction of antifungal activity
of a series of pyridine derivatives against Candida albicans, J Microbiol Methods, 76,
Bucinski, A., Zielinski, H., & Kozlowska, H. (2004). Artificial neural networks for
prediction of antioxidant capacity of cruciferous sprouts. Trends Food Sci Technol,
15, 161-169.
Burt, S. (2004). Essential oils: their antibacterial properties and potential applications in
food—a review. Int J Food Microbiol, 94, 223–253.
Cartwright, H. (2008). Artificial neural networks in biology and chemistry: the evolution
of a new analytical tool. Methods Mol Biol., 458, 1-13.
Chagas-Paula, D., Oliveira, T., Zhang, T., Edrada-Ebel, R., & Da Costa, F. (2015).
Prediction of anti-inflammatory plants and discovery of their biomarkers by machine
learning algorithms and metabolomic studies. Planta Med, 81, 450-458.
Chen, Q., Guo, Z., Zhao, J., & Ouyang, Q. (2012). Comparisons of different regressions
tools in measurement of antioxidant activity in green tea using near infrared
spectroscopy. J Pharm Biomed Anal., 60, 92-97.
Chen, Y., Cao, W., Cao, Y., Zhang, L., Chang, B., Yang, W., & Liu X. (2011). Using
neural networks to determine the contribution of danshensu to its multiple
cardiovascular activities in acute myocardial infarction rats. J Ethnopharmacol.,
Cimpoiu, C., Cristea, V., Hosu, A., Sandru, M., & Seserman, L. (2011). Antioxidant
activity prediction and classification of some teas using artificial neural networks.
Food Chem, 127, 1323-1328.
Cortes-Cabrera, A. & Prieto, J. (2010). Application of artificial neural networks to the
prediction of the antioxidant activity of essential oils in two experimental in vitro
models. Food Chem, 118, 141–146.
Cox, S., Mann, C., & Markham, J. (2000). The mode of antimicrobial action of the
essential oil of Melaleuca alternifolia (Tea tree oil). J Applied Microbiol, 88, 170–
Cox, S., Mann, C., & Markham, J. (2001). Interactions between components of the
essential oil of Melaleuca alternifolia. J Applied Microbiol, 91, 492–497.
Daynac, M., Cortes-Cabrera, A., & Prieto J. (2015). Application of Artificial Intelligence
to the Prediction of the Antimicrobial Activity of Essential Oils. Evidence-Based
Complementary and Alternative Medicine. Article ID 561024, 9.
Deans, S. & Ritchie G. (1987). Antibacterial properties of plant essential oils. Int J Food
Microbiol, 5, 165–180. 

Desai, K., Vaidya B., Singhal, R., & Bhagwat, S. (2005). Use of an artificial neural
network in modeling yeast biomass and yield of β-glucan, Process Biochem, 40,
Didry, N., Dubreuil, L., & Pinkas, M. (1993). Antimicrobial activity of thymol, carvacrol
and cinnamaldehyde alone or in combination. Pharmazie, 48, 301–304.
Dohnal, V., Kuča, K., & Jun, D. (2005). What are artificial neural networks and what
they can do? Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub., 149, 221–
Eerikanen, T. & Linko, P. (1995). Neural network based food extrusion cooker control.
Engineering Applications of Artificial Neural Networks. Proceedings of the
International Conference EANN ’95, 473-476.
García-Domenech, R. & de Julián-Ortiz, J. (1998). Antimicrobial Activity
Characterization in a Heterogeneous Group of Compounds. J Chem Inf Comput Sci.,
38, 445-449.
Goodacre, R., Timmins, E., Burton, R., Kaderbhai, N., Woodward, A., Kell, D., &
Rooney, P. (1998). Rapid identification of urinary tract infection bacteria using
hyperspectral whole-organism fingerprinting and artificial neural networks.
Microbiology, 144, 1157-1170.
Goyal, S. (2013). Artificial neural networks (ANNs) in food science – A review. Int J Sci
World, 1, 19-28.
Guiné, R., Barroca, M., Gonçalves, F., Alves, M., Oliveira, S., & Mendes, M. (2015).
Artificial neural network modelling of the antioxidant activity and phenolic
compounds of bananas submitted to different drying treatments. Food Chem., 168,
Han, S., Zhang, X., Zhou, P., & Jiang, J. (2014). Application of chemometrics in
composition-activity relationship research of traditional Chinese medicine. Zhongguo
Zhongyao Zazhi, 39, 2595-2602.
Huang, Y., Kangas, L., & Rasco, B. (2007). Applications of artificial neural networks
(ANNs) in food science, Crit. Rev. Food. Sci. Nut., 47, 133-126.
Huuskonen, J., Salo, M., & Taskinen, J. (1998). Aqueous Solubility Prediction of Drugs
Based on Molecular Topology and Neural Network Modeling. J Chem Inf Comput
Sci, 38, 450-456.
Jaén-Oltra, J., Salabert-Salvador, M, García-March, J., Pérez-Giménez, F., & Tomás-
Vert, F. (2000). Artificial neural network applied to prediction of fluorquinolone
antibacterial activity by topological methods. J. Med. Chem., 43, 1143–1148.
Jalali-Heravi, M., & Parastar, F. (2000). Use of artificial neural networks in a QSAR
study of anti-HIV activity for a large group of HEPT derivatives. J Chem Inf Comput
Sci., 40, 147-154.
Jezierska, A., Vračko, M., & Basak, S. (2004). Counter-propagation artificial neural
network as a tool for the independent variable selection: Structure-mutagenicity study
on aromatic amines. Mol Divers, 8, 371–377.
Karaman, S., Ozturk, I., Yalcin, H., Kayacier, A., & Sagdic, O. (2012). Comparison of
adaptive neuro-fuzzy inference system and artificial neural networks for estimation
of oxidation parameters of sunflower oil added with some natural byproduct extracts.
J Sci Food Agric, 92, 49-58.
Kovesdi, I., Ôrfi, L., Náray-Szabó, G., Varró, A., Papp, J., & Mátyu P. (1999).
Application of neural networks in structure-activity relationships. Med Res Rev., 19,
Krogh, A. (2008). What are artificial neural networks? Nature biotechnol, 26, 195-197.
Kröse, B., & van der Smagt, P. (1996). An introduction to neural networks (8th ed.).
University of Amsterdam.
Larder, B., Wang, D., Revell, A., Montaner, J., Harrigan, R., De Wolf, F., Lange, J.,
Wegner, S., Ruiz, L., Pérez-Elías, M., Emery, S., Gatell, J., Monforte, A., Torti, C.,
Zazzi, M., & Lane, C. (2007). The development of artificial neural networks to
predict virological response to combination HIV therapy. Antivir Ther., 12, 15-24.
Latrille, E., Corrieu, G., & Thibault J. (1993). pH prediction and final fermentation time
determination in lactic acid batch fermentations. Comput. Chem. Eng. 17, S423-
Ma, J., Cai, J., Lin, G., Chen, H., Wang, X., Wang, X., & Hu, L. (2014). Development of
LC-MS determination method and back-propagation ANN pharmacokinetic model of
corynoxeine in rat. J Chromatogr B Analyt Technol Biomed Life Sci., 959, 10-15.
Maulidiani, A., Khatib, A., Shitan, M., Shaari, K., & Lajis, N. (2013). Comparison of
Partial Least Squares and Artificial Neural Network for the prediction of antioxidant
activity in extract of Pegaga (Centella) varieties from 1H Nuclear Magnetic
Resonance spectroscopy. Food Res Int, 54, 852-860.
Mendelsohn, A. & Larrick, J. (2014). Paradoxical Effects of Antioxidants on Cancer.
Rejuvenation Research, 17(3), 306-311.
Misharina, T., Alinkina, E., Terenina, M., Krikunova, N., Kiseleva, V., Medvedeva. I., &
Semenova, M. (2015). Inhibition of linseed oil autooxidation by essential oils and
extracts from spice plants. Prikl Biokhim Mikrobiol., 51, 455-461.
Murcia-Soler, M., Pérez-Giménez, F., García-March, F., Salabert-Salvador, M., Díaz-
Villanueva, W., Castro-Bleda, M., & Villanueva-Pareja, A. (2004). Artificial Neural
Networks and Linear Discriminant Analysis:  A Valuable Combination in the
Selection of New Antibacterial Compounds. J Chem Inf Comput Sci., 44, 1031–1041.
Musa, K., Abdullah, A., & Al-Haiqi, A. (2015). Determination of DPPH free radical
scavenging activity: Application of artificial neural networks. Food Chemistry,
194(12), 705-711.
Nagahama, K., Eto, N., Yamamori, K., Nishiyama, K., Sakakibara, Y., Iwata, T., Uchida,
A., Yoshihara, I., & Suiko, M. (2011). Efficient approach for simultaneous estimation
of multiple health-promoting effects of foods. J Agr Food Chem, 59, 8575-8588.
Najjar, Y., Basheer, I., & Hajmeer, M. (1997). Computational neural networks for
predictive microbiology: i. methodology. Int J Food Microbiol, 34, 27– 49.
Nakatani, N. (1994). Antioxidative and antimicrobial constituents of herbs and spices.
Dev Food Sci, 34, 251–271.
Nissen, S. (2007). Fast Artificial Network Library

Ozturk, I., Tornuk, F., Sagdic, O., & Kisi, O. (2012). Application of non-linear models to
predict inhibition effects of various plant hydrosols on Listeria monocytogenes
inoculated on fresh-cut apples. Foodborne Pathog Dis., 9, 607-616.
Palancar, M., Aragón, J., & Torrecilla J. (1998). pH-Control system based on artificial
neural networks. Ind. Eng. Chem. Res., 37(7), 2729-2740.
Parojcić, J., Ibrić, S., Djurić, Z., Jovanović, M., & Corrigan O. (2007). An investigation
into the usefulness of generalized regression neural network analysis in the
development of level A in vitro-in vivo correlation. Eur J Pharm Sci., 30, 264-272.
Pei, R., Zhou, F., Ji, B., & Xu, J. (2009). Evaluation of combined antibacterial effects of
eugenol, cinnamaldehyde, thymol, and carvacrol against E. coli with an improved
method. J Food Sci, 74, M379–M383.
Rahman, A., Afroz, M., Islam, R., Islam, K., Amzad Hossain, M., & Na, M. (2014). In
vitro antioxidant potential of the essential oil and leaf extracts of Curcuma zedoaria
Rosc. J Appl. Pharm Sci, 4, 107-111.
Ruan, R., Almaer, S., & Zhang, S. (1995). Prediction of dough rheological properties
using neural networks. Cereal Chem, 72(3), 308-311.
Sagdic, O., Ozturk, I., & Kisi, O. (2012). Modeling antimicrobial effect of different grape
pomace and extracts on S. aureus and E. coli in vegetable soup using artificial neural
network and fuzzy logic system. Expert Systems Applications, 39, 6792-6798.
Shahidi, F. (2000). Antioxidants in food and food antioxidants. Nahrung, 44, 158–163.
Sharma, A., Mann, B., & Sharma, R. (2012). Predicting antioxidant capacity of whey
protein hydrolysates using soft computing models. Advances in Intelligent and Soft
Computing, 2, 259-265.
Tanir, A. & Prieto, J. (2016). Essential Oils for the Treatment of Herpes Virus Infections:
A Critical Appraisal Applying Artificial Intelligence and Statistical Analysis Tools.
Unpublished results.
Torrecilla, J., Mena, M., Yáñez-Sedeño, P., & García J. (2007). Application of artificial
neural networks to the determination of phenolic compounds in olive oil mill
wastewater. J Food Eng, 81, 544-552.
Torrecilla, J., Otero, L., & Sanz, P. (2004). A neural network approach for
thermal/pressure food processing. J Food Eng, 62, 89-95.
Usami, A, Motooka R, Takagi A, Nakahashi H, Okuno Y, & Miyazawa M. (2014).
Chemical composition, aroma evaluation, and oxygen radical absorbance capacity of
volatile oil extracted from Brassica rapa cv. “yukina” used in Japanese traditional
food. J Oleo Sci, 63, 723-730.
Wang, H., Wong, H., Zhu, H., & Yip, T. (2009). A neural network-based biomarker
association information extraction approach for cancer classification. J Biomed
Inform, 42, 654-666.
Yalcin, H., Ozturk, I., Karaman, S., Kisi, O., Sagdic, O., & Kayacier, A. (2011).
Prediction of effect of natural antioxidant compounds on hazelnut oil oxidation by
Artificial Intelligence for the Modeling and Prediction ... 299

adaptive neuro-fuzzy inference system and artificial neural network. J Food Sci., 76,
Yan, C., Lee, J., Kong, F., & Zhang, D. (2013). Anti-glycated activity prediction of
polysaccharides from two guava fruits using artificial neural networks. Carbohydrate
Polymers, 98, 116-121.
Young, I. & Woodside, J. (2001). Antioxidants in health and disease. Journal of Clinical
Pathology, 54, 176-186.
Zeraatpishe, A., Oryan, S., Bagheri, M., Pilevarian, A., Malekirad, A., Baeeri, M., &
Abdollahi, M. (2011). Effects of Melissa officinalis L. on oxidative status and DNA
damage in subjects exposed to long-term low-dose ionizing radiation. Toxicol Ind
Health, 27, 205-212.
Zheng, H., Fang, S., Lou, H., Chen, Y., Jiang, L., & Lu, H. (2011). Neural network
prediction of ascorbic acid degradation in green asparagus during thermal treatments.
Expert Syst Appl 38, 5591-5602.
Zheng, H., Jiang, L., Lou, H., Hu, Y., Kong, X., & Lu, H. (2011). Application of artificial
neural network (ANN) and partial least-squares regression (PLSR) to predict the
changes of anthocyanins, ascorbic acid, Total phenols, flavonoids, and antioxidant
activity during storage of red bayberry juice based on fractal analysis and red, green,
and blue (RGB) intensity values. J Agric Food Chem., 59, 592-600.
Zupan, J. & Gasteiger, J. (1991). Neural networks: A new method for solving chemical
problems or just a passing phase? Analytica Chimica Acta, 248, 1-30.


Dr. Jose M. Prieto obtained a PhD in Pharmacology (2001) at the University of Valencia (Valencia, Spain) in the field of topical inflammation. His post-doctoral research activities include the EU-funded projects 'Insect Chemical Ecology' (Department of Bioorganic Chemistry, Università degli Studi di Pisa, Italy) (2001-2004) and "Medicinal Cannabis" (Department of Pharmaceutical and Biological Chemistry, School of Pharmacy, University of London, United Kingdom) (2005-2006). He was then appointed Lecturer in Pharmacognosy (UCL School of Pharmacy), where his research focuses on the application of advanced techniques (direct NMR, artificial intelligence) to the analysis and biological effects of complex natural products. He has authored more than 50 original papers and is a member of the editorial boards of Frontiers in Pharmacology (Nature), Evidence-Based Complementary and Alternative Medicine (Hindawi) and Complementary Therapies in Clinical Practice (Elsevier), among others.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 13

Predictive Analytics for Thermal Coal Prices Using Neural Networks and Regression Trees

Mayra Bornacelli¹, Edgar Gutierrez² and John Pastrana³*

¹Carbones del Cerrejón (BHP Billiton, Anglo American, Xstrata), Bogota, Colombia
²Center for Latin-American Logistics Innovation, Bogota, Colombia
³American Technologika, Clermont, Florida, US


The research is aimed at delivering predictive analytics models that provide a powerful means of predicting thermal coal prices. The methodology started by analyzing expert market insights in order to obtain the main variables. The Delphi methodology was implemented in order to reach conclusions about the variables and tendencies in the global market. Then, artificial intelligence techniques such as neural networks and regression trees were used to develop the models and refine the set of variables. The predictive models created were validated and tested. Neural networks outperformed regression trees; however, regression trees created models that are easy to visualize and understand. The conceptual results of this research can be used as an analytical framework to facilitate the analysis of price behavior in oligopolistic markets and to build global business strategies.

Keywords: predictive analytics, neural networks, regression trees, thermal coal price

Corresponding Author Email:


Increasing interest in and implementation of data analytics have demonstrated how valuable knowledge can be extracted from the data that companies collect through their systems, from the insights of market experts, and from data patterns, trends, and relationships with other markets. Data analytics can help organizations understand their business from a holistic point of view, and techniques and methodologies that meet its challenges are beneficial for this purpose (Chen & Zhang, 2014; Groschupf et al., 2013). Organizations are investing in data analytics and machine learning techniques; for example, a Gartner survey reveals that 73% of the surveyed companies are investing in data analytics and big data technology (Rivera, 2014). In general, analytics helps organizations increase revenue, speed time to market, optimize their workforce, and realize other operational improvements. Predictive analytics is a branch of data analytics and a scientific paradigm for discovery (Hey, 2009).
McKinsey has highlighted the potential of predictive analytics (Manyika et al., 2012) and its impact on innovation and productivity. Another important factor is that the volume of data is estimated to at least double every 1.2 years. This is even more important in a globalized economy subject to continuous change and uncertainty. Many kinds of decisions must be made: investment decisions, expansion decisions, or simply the philosophy the company will adopt in terms of maximizing profits or maintaining a constant cash flow. Making strategic decisions involves understanding the structure of a system and the many variables that influence it (mainly outside the control of the stakeholders). This complex structure and these numerous variables make such decisions complex and risky. As Hamel & Ruben (2000) note, risk when trying to innovate is determined by four main factors:

• Size of the irreversible financial commitment;
• Degree to which the new opportunity moves away from the core of the company;
• Degree of certainty about the project's critical assumptions; and
• Time frame.

In a rapidly changing world there are few second chances, and in spite of risks and uncertainty, companies have to make decisions and take steps forward, or try to stay afloat. This uncertainty is often directly associated with the price of the main products in the market, and it affects income, return on investment, labor stability, and financial projections. This is the case of the thermal coal market and of many oligopolies, whose prices are set internationally by the balance between demand and supply, which in turn are determined by both quantifiable and non-quantifiable variables that ultimately model the price.
The thermal coal market has another characteristic: despite being an oligopoly, it is not possible to be strategic in terms of prices, as is the case with oil. Coal companies almost always have to be reactive to market events. This is a phenomenon that Peter Senge (2006) describes: companies that are reactive in the markets begin to worry about events, and concern for events, such as last month's coal price, dominates business deliberations.
To analyze the coal market, we formed a panel of experts from around the world. The Delphi methodology was used to investigate with this panel which strategic variables most influence the price of thermal coal globally. Once a consensus was reached, AI techniques were used to verify these variables and build predictive models to calculate the price of thermal coal. This prediction can provide strategic support to coal mining companies (Pill, 1971).
In the history of thermal coal prices, the following milestones have marked great fluctuations (Ellerman, 1995; Yeh & Rubin, 2007; Ming & Xuhua, 2007; Finley, 2013; EIA, 2013):

• Oil crisis of the 1970s – This crisis caused countries to rethink their dependence on oil for power generation and gave impetus to coal as an alternative;
• Emphasis on sea transportation – Developers of mining projects dedicated mainly to exports promoted the development of the market for coal transported by sea, thereby globalizing the supply and demand of coal (previously coal was consumed near the places where it was extracted);
• Price indices for coal – The creation of price indices at different delivery points (FOB Richards Bay, CIF Rotterdam) gave more transparency to transactions and helped better manage market risk;
• Industrialization of emerging economies (especially China) – This industrialization supported demand at levels never seen before;
• Emergence of financial derivative markets – These financial markets offered more tools to manage price risk (they also promoted the entry of new players, such as banks);
• Global warming and climate change – The publication of studies on global warming and climate change led countries worldwide to take action to reduce CO2 emissions and thus reduce the use of coal;
• Germany shut down all its nuclear plants – This happened after the accident at the Fukushima Nuclear Plant in Japan in March 2011, indirectly driving an increase in power generation from renewables, coal, and natural gas;
• The UK created a tax (Carbon Price Floor) on top of the CO2 price – This tax artificially favors less CO2-emitting generation (renewables and natural gas) over all existing technologies, with a direct impact on energy costs to the end user;
• Development of fracking to extract shale gas profitably – The cost-effective gas produced with this method displaced part of the coal consumed in the USA; the coal that was not consumed locally then began to be exported, which increased the world supply and therefore reduced prices.

The problem we try to solve is summarized in three aspects:

1. Markets such as oil and coal are oligopolies, which means that the fluctuations of their prices are determined by the variables that shape their supply and demand in the market.
2. Over time, analysts have identified some of these variables (and even introduced new ones). However, the relationships between the variables and their order of importance are not yet clear. This type of study is relevant for finding patterns with respect to the price rather than analyzing independent events.
3. Each of the variables that have shaped the coal price has exerted its own (positive or negative) force on the price, and stakeholders have historically reacted to these events.

The objective of this research is to determine the most influential variables in the price of thermal coal by using the Delphi methodology and subsequently evaluating the results with AI techniques such as neural networks and regression trees.


This project proposes an analytical framework that allows managers to analyze prices in the thermal coal industry. Figure 1 shows the general research framework, from data acquisition and data processing to the use of the models and their outputs. With this framework, analysts have a tool to deal with data volume and diversity, handle imprecision, and obtain robust solutions for price prediction.
This process determines the challenges and opportunities that a company may face, from data gathering through analysis and use, in creating value and optimizing its business.

Figure 1. General Research framework.

Once the data is obtained from different sources, a process of cleaning, organizing and storing starts, followed by analytics and implementation. These tools help handle data volume, diversity, and imprecision and provide robust solutions. Data mining and predictive analytics techniques help the enterprise improve its decision-making process.
Through the Delphi methodology, the most influential thermal coal price variables are determined; then 25 years of historical data for these variables are collected, and data mining verifies their order of importance and predicts the price of thermal coal, as shown in Figure 2.

Figure 2. Methodology of this research.



The Delphi method solicits the judgment of experts, with information and opinion feedback, in order to establish a convergence of opinions about a research question. A consensus method was necessary given the nature of the problem, which involves markets around the world and variables of different orders of magnitude.
A panel of thirteen (13) experts was selected for this Delphi. Through three rounds, the experts answered the question "What are the most influential variables in the price of thermal coal?" in order to achieve consensus. The participants were:

• Atlanta, USA: Sales director of one of the leading companies in the analysis of thermal coal;
• Lexington, USA: Professor of economics at the University of Kentucky (Kentucky is the largest producer of thermal coal in the USA);
• Orlando, USA: Professor in complexity theory;
• Cape Town, South Africa: Professor of economics at the University of Cape Town;
• Dublin, Ireland: Coal trader at CMC (Coal Marketing Company);
• Germany: Coal trader at CMC (Coal Marketing Company);
• Bangalore, India: Professor of international business at Alliance University (Bangalore, India);
• China: Researcher in financial markets and derivatives;
• Australia: Coal geology researcher at the University of New South Wales (UNSW School of Biological, Earth and Environmental Sciences);
• Colombia: Coal senior analyst (Argus McCloskey Company), technical marketing support (Cerrejón, one of the world's largest open-pit coal mines), professor at the National University of Colombia, and CEO of Magma Solution.

Figure 3 shows Delphi participants by region.

Figure 3. Delphi participants by regions.


Results of Consensus through Delphi Method

Figure 4 shows the different variables found during the first round.

Figure 4. Result of the first round of Delphi.

Table 1. Result of the second round of Delphi

The purpose of the second round was to verify the experts' agreement on the variables; the results of this round are presented in Table 1. In the third round of Delphi, the results of the second round were reviewed and 100% consensus was achieved among the participants. The following variables were selected:

• Demand and consumption of coal in China, the US, India, and Europe;
• Environmental restriction laws for the exploitation and use of coal in the United States, China, India, and Europe (measured by level);
• Price of natural gas;
• Price of oil;
• Price of electricity in China, the US, India, and Europe;
• Availability of shale gas;
• Exchange rate against the US dollar in exporting and consuming countries (China, Europe, India);
• Development of renewable energy in the United States, China, Europe, and India;
• Trends in climate change;
• Cost of coal transportation (land and sea);
• Oversupply of coal in the international market.

Thermal coal is consumed mainly to generate electricity, so electricity generation is an important driver of increases or decreases in demand, by the simple principles of economics. The main consumers of coal over the past 25 years have been China, the United States, Europe, and India (Finley, 2013). In spite of the consumption trends in these regions, social, political, and environmental situations may cause coal consumption to fluctuate suddenly, and such events cannot be measured in the models. The principal consumer and producer of coal in the world is China, which means that China significantly determines the behavior of the price. For example, if China closes some coal mines and world coal consumption remains the same, the price will go up; but if China reduces its consumption of coal, the price of coal will probably fall.
The level of environmental restrictions on the exploitation and use of coal, and trends in climate change, have a gradual effect on the demand for coal. The prices of oil, gas, and coal were long assumed to be related, but only recently was this relationship studied and different conclusions drawn. Take the case of oil and coal: they are close substitutes, so economic theory indicates that their prices should be close. A correlation study found both causal and non-causal relationships between oil and coal prices; that is, causality runs from oil to coal and not in the opposite direction. Its conclusions therefore point to the likelihood that the price of coal in Europe reacts to movements in oil prices, and the statistical evidence indicates that, in the face of a rise or fall in oil prices, the price of coal reacts.
In Delphi, one of the variables with the greatest consensus was the relationship between the US dollar and the currencies of the main producing and consuming countries. This variable was used to represent the economies of the different regions and thus to analyze the behavior of this relationship with thermal coal prices. From historical behavior, we know that there is an inverse relationship between the prices of oil and coal, which are transacted in dollars, and the value of this currency: devaluations of the dollar have coincided with high prices for these commodities, whose value increases as a compensatory effect in the face of the devaluation of this currency.
Shale gas is a substitute product for coal. Its extraction requires unconventional technologies because the rock does not have sufficient permeability. Initially it was thought that shale gas was less polluting than coal, so it began to be implemented as a substitute; however, academic research has shown that fracturing rock to extract the gas causes leaks to the environment that are much more polluting than coal, in addition to important consequences for the soil. Since 2010 shale gas has had a major commercial boom in the United States, and for this reason the price of coal decreased for all those countries that began to use shale gas as an energy source. The prospects for extracting and marketing shale gas are not yet clear, but it is an alternative source to coal, so this variable was selected by consensus in Delphi.

Renewable energies do not pollute like coal, and some countries have developed and implemented them to a high degree; in reality, however, the availability and cost/benefit of using coal to produce energy still make it the best choice for many countries. In the short term, renewable energy sources are unlikely to be a major threat to coal prices.
With these results and other variables, such as the price of electricity, the cost of coal transportation, and the oversupply in the market, we started to collect the data available for 25 years. This data can be analyzed using neural networks and regression trees.


Our goal was now to identify the most important variables and justify them using historical data. Delphi demonstrated the importance of both quantitative and qualitative variables. We decided to use two techniques from the data mining domain, neural networks and classification/regression trees, with the variables resulting from the Delphi process. Twenty-five years of data were investigated at quarterly intervals (due to the availability of the data). The data were retrieved from the institutions that collect statistical data for the coal market (Finley, 2013; EIA, 2013; DANE, 2013). In addition, considerations of seasonality and dependence on previous periods were added to the formulations.

Neural Networks

The analysis is performed using neural networks to determine the most important factors and build a series of predictive models. This study used supervised learning systems, in which a database for learning is used (Singh & Chauhan, 2009). In supervised learning we adapt a neural network so that its outputs (μ) approach the targets (t) from a historical dataset; the aim is to adapt the parameters of the network so that it performs well on samples outside the training set. The neural networks are trained with 120 input variables representing the relevant factors and their values in sequential quarterly and annual cycles, and the output represents the change in the price of thermal coal for the future quarter. We have 95 data samples, of which 63 are used for training and validation and 32 are used exclusively for prediction. Figure 5 represents a generic diagram of a neural network with a feedforward architecture.

Figure 5. Schematic of a neural network.
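As a rough sketch of the architecture just described (not the authors' trained model), a forward pass through a one-hidden-layer feedforward network with 120 inputs and 10 hidden neurons can be written as follows; the weights here are random placeholders that backpropagation would normally learn, and the input vector is hypothetical:

```python
# Illustrative sketch of a feedforward network's forward pass.
# Weights are random stand-ins, NOT learned parameters.
import numpy as np

rng = np.random.default_rng(42)
n_inputs, n_hidden, n_outputs = 120, 10, 1

W1 = rng.normal(0, 0.1, size=(n_inputs, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)                               # hidden biases
W2 = rng.normal(0, 0.1, size=(n_hidden, n_outputs))  # hidden -> output weights
b2 = np.zeros(n_outputs)                              # output bias

def forward(x):
    """Map one 120-dim quarterly sample to a next-quarter price change."""
    hidden = np.tanh(x @ W1 + b1)   # nonlinear hidden-layer activations
    return hidden @ W2 + b2         # linear output layer

x = rng.normal(size=n_inputs)       # one hypothetical input vector
y = forward(x)                      # single predicted value
```

In training, backpropagation would adjust W1, b1, W2, b2 to minimize the squared error between these outputs and the historical targets.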

Selection of an Appropriate Architecture for Neural Networks to Predict the Price of Thermal Coal

An appropriate architecture for the neural network (i.e., the number of neurons in the hidden layer) had to be selected, since the backpropagation algorithm was used. Moody and Utans (1992) indicated that the learning ability (CA) of a neural network depends on the balance between the information in the examples (vectors) and the complexity of the neural network (i.e., the number of neurons in the hidden layers, which also determines the number of weights, since they are proportional). A neural network with few weights, and therefore few neurons in the hidden layers (λ), will not have the CA needed to represent the information in the examples. On the other hand, a neural network with a large number of weights (i.e., degrees of freedom) will not learn well due to overfitting.
Traditionally, in supervised neural networks, CA is defined as the expected performance on data that is not part of the training examples. Therefore, several architectures (with different numbers of hidden neurons) are trained and the one with the best CA is selected. This method is especially effective when there are sufficient data samples (i.e., a very large number).

Unfortunately, in the thermal coal price problem there are not enough observations to estimate CA this way, so the traditional method was not used. It was decided instead to use cross-validation (CV). As indicated by Moody and Utans (1992), CV is a sample re-use method that can be used to estimate CA, and it makes minimal assumptions about the statistics of the data. Each instance of the training database is set apart in turn and the neural network is trained with the remaining (N − 1) instances. The results of all N runs, one for each instance of the dataset, are averaged, and the mean represents the final estimate of CA. This is expressed by the following equation (Moody and Utans, 1992):

CV(λ) = (1/N) Σⱼ₌₁ᴺ (tⱼ − μ̂λ⁽ʲ⁾(xⱼ))²  (1)

where μ̂λ⁽ʲ⁾ is the network of architecture λ trained with instance j withheld.

Figure 6 represents the process of using CV to select an appropriate number of neurons in the hidden layer. We evaluated candidate numbers of hidden neurons in a range from 4 to 30. Figure 6 indicates the CV for each number of hidden neurons used; the lowest CV was for the architecture with λ = 10. Therefore, we use 10 neurons in the hidden layer of the neural network.

Figure 6. CV and the selection of neurons in the hidden layer. λ = 10 was the lowest CV.
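The leave-one-out procedure in Eq. (1) can be sketched as follows. This is an illustration, not the authors' code: it uses synthetic data, and polynomial degree stands in for the number of hidden neurons, since the idea is the same — train N times, each time withholding one sample, and average the squared prediction errors:

```python
# Illustrative sketch of leave-one-out cross-validation, per Eq. (1).
# Synthetic data; polynomial degree is a stand-in for hidden-layer size.
import numpy as np

def loo_cv_error(x, t, degree):
    """CV(degree) = (1/N) * sum_j (t_j - model_without_j(x_j))^2."""
    n = len(x)
    errors = []
    for j in range(n):
        mask = np.arange(n) != j                 # withhold sample j
        coeffs = np.polyfit(x[mask], t[mask], degree)
        pred = np.polyval(coeffs, x[j])          # predict the withheld sample
        errors.append((t[j] - pred) ** 2)
    return float(np.mean(errors))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=30)  # synthetic targets

# Sweep candidate complexities and keep the lowest CV, mirroring the
# 4-30 hidden-neuron sweep described in the text.
cv = {d: loo_cv_error(x, t, d) for d in range(1, 9)}
best = min(cv, key=cv.get)
```

The selected complexity is the one whose held-out error, averaged over all N single-sample folds, is smallest — exactly how λ = 10 was chosen above.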

Elimination of Input Variables

The next step was to select the input variables that contribute to the prediction of the thermal coal price; we began by removing input variables that are not required. To test which factors are most significant in determining the output of the neural network with 10 hidden neurons, we performed a sensitivity analysis; the respective results are depicted in Figure 7. We defined the "sensitivity" of the network model to input variable β as (Moody and Utans, 1994):

Sβ = ASE(x̄β) − ASE(xβ)  (2)

where ASE is the average squared error over the N training exemplars and x̄β is the mean of input variable β.

Moody and Utans (1994) explain this process as follows: "Here, xβⱼ is the βth input variable of the jth exemplar. Sβ measures the effect on the average training squared error (ASE) of replacing the βth input xβ by its average x̄β. Replacement of a variable by its average value removes its influence on the network output." Again we use CV to estimate the prediction risk Pλ. A sequence of models was built by deleting an increasing number of input variables in order of increasing Sβ. A minimum was attained for the model with Iλ = 8 input variables (112 factors were removed), as shown in Figure 7. We had to build a large number of neural networks (all of them with 10 neurons in the hidden layer) in order to obtain and validate the different results displayed in Figure 7. In addition, a different elimination of input variables, based on the correlations among the variables, was also carried out; the results were very comparable. Figure 7 shows how the error increases after eliminating variable number 9.
With this result, we trained the neural network with the selected 8 most important variables:

1. Last price of oil;
2. Renewable energy development in China (first quarter);
3. Oversupply of thermal coal in the market (fourth quarter);
4. Economy in China (third quarter);
5. Economy in China (fourth quarter);
6. Renewable energy in the United States (first quarter);
7. Last cost of transportation of coal;
8. Economy in China (second quarter).
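The mean-replacement sensitivity measure of Eq. (2) can be sketched as follows. This is an illustration only: a fitted linear model stands in for the trained network, and the data are synthetic, but the mechanic is the same — freeze the model, replace one input column by its mean, and record how much the average squared error rises:

```python
# Illustrative sketch of sensitivity analysis by mean replacement, Eq. (2).
# A linear least-squares fit is a hypothetical stand-in for the trained net.
import numpy as np

def ase(weights, X, t):
    """Average squared error of a linear model t ~ X @ weights."""
    return float(np.mean((t - X @ weights) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(95, 4))                # 95 samples, 4 candidate inputs
true_w = np.array([2.0, 0.0, -1.5, 0.05])   # inputs 0 and 2 matter most
t = X @ true_w + rng.normal(0, 0.1, size=95)

w, *_ = np.linalg.lstsq(X, t, rcond=None)   # "trained" model weights
base = ase(w, X, t)

sensitivity = []
for beta in range(X.shape[1]):
    X_mean = X.copy()
    X_mean[:, beta] = X[:, beta].mean()     # remove variable beta's influence
    sensitivity.append(ase(w, X_mean, t) - base)

ranking = np.argsort(sensitivity)[::-1]     # most important input first
```

Inputs are then deleted in order of increasing sensitivity, retraining at each step, until the cross-validated error starts to rise — which is how the 8-variable subset above was obtained.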

Figure 7. Removing input variables. The error begins to grow significantly at variable No. 8.

Developed Neural Networks to Predict Coal Price Relative to the Future Quarter

The selected architecture and the selected set of inputs were used to establish a final architecture. The neural network was trained with 63 samples. The next step was to predict with the remaining 32 of the 95 data samples, using neural networks with 8 and 12 input variables selected according to Sβ and the correlational method, respectively. The best result was obtained with the neural network with 12 input variables (as illustrated in Figures 8 and 9). The price of thermal coal was predicted with a low error while capturing the movements of the market, demonstrating the learning ability of the neural networks and the importance of the selected variables.

Figure 8. Prediction of thermal coal price relative to the future quarter using 12 input variables.

Figure 9. Prediction of the thermal coal price relative to the future quarter using 8 input variables.

Figure 9 shows the performance of the neural network (NN) developed with the most important variables according to the sensitivity analysis. The neural network uses 8 input variables, 10 neurons in one hidden layer, and an output that represents the price in US$ of thermal coal for the future quarter.

Using Regression Trees to Predict the Price of Thermal Coal

It was decided to use a second artificial intelligence paradigm, regression trees, to verify the results obtained with the neural networks. This provided a good opportunity to compare both methodologies. In regression trees, the objective is to model the dependence of a response variable on one or more predictor variables. The analysis method MARS, Multivariate Adaptive Regression Splines (Friedman, 1991), expresses the structure of a set of variables as a linear combination equation, describing the problem in terms of this equation and revealing its most influential variables. It is a non-parametric regression technique: MARS is an extension of linear models that automatically models nonlinearities and interactions between variables. The analysis determines the best possible variable with which to split the data into separate sets. The splitting variable is chosen to maximize the average "purity" of the two child nodes, and each node is assigned a predicted outcome. This process is repeated recursively until it is impossible to continue. The result is a maximum-sized tree that fits the training data perfectly. The next step is to prune the tree to create a generalized model that will work on outside data sets. Pruning reduces the cost-complexity of the tree while maximizing its prediction capability; an optimal tree is selected that provides the best prediction capability on outside data sets with the least complexity.
Models based on MARS have the following form:

f(X) = α₀ + Σₘ₌₁ᴹ αₘ hₘ(X)  (3)

where hₘ(X) is a function from a set of candidate functions (which can include products of two or more such functions), and the αₘ are coefficients obtained by minimizing the residual sum of squares.
The process of building a model with MARS is straightforward. The procedure calculates a set of candidate functions using reflected pairs of basis functions; in addition, the number of constraints/restrictions and the allowed degree of interaction must be specified. A forward pass follows, in which new function products are tried to see which ones decrease the training error. After the forward pass comes a backward pass, which corrects the overfit. Finally, generalized cross-validation (GCV) is estimated in order to find the optimal number of terms in the model. GCV is defined by:

GCV(λ) = [(1/N) Σᵢ₌₁ᴺ (yᵢ − f̂λ(xᵢ))²] / (1 − M(λ)/N)²  (4)

where GCV(λ) is the generalized cross-validation for the model (i.e., tree) defined by λ, M(λ) is its effective number of parameters, and the squared error is summed over each training sample with inputs xᵢ and desired output yᵢ under the model defined by λ.
The training was conducted with 63 data samples and the most important variables, with the future thermal coal price as the target. The following set of equations represents the results of this analysis with regression trees and the most important variables with which the coal price is modeled:

Y = 108.157 + 407.611 * BF6 + 367.188 * BF8 + 157.43 * BF9 − 70.7223 * BF10 + 70.6882 * BF12 − 185.455 * BF13  (5)

BF6 = max(0, 0 − SUPPLY_COAL4);
BF9 = max(0, CHINA_ECONOMY3 − 6.84);
BF10 = max(0, 6.84 − CHINA_ECONOMY3);
BF12 = max(0, 5.73 − CHINA_ECONOMY2);
BF13 = max(0, CHINA_ECONOMY3 − 6.64);
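The reported model is a sum of hinge ("reflected pair") basis functions of the form max(0, x − c) and max(0, c − x). A minimal sketch of how such terms are evaluated follows; it uses the knots reported above, but note that BF8's definition is not given in the text, so it is omitted, and the input values are hypothetical:

```python
# Illustrative sketch of MARS hinge basis functions, not the authors'
# fitted model (BF8 is undefined in the text and is left out here).
def hinge(u):
    """MARS hinge: max(0, u). Only one side of a reflected pair is nonzero."""
    return max(0.0, u)

def coal_price_terms(china_econ_q2, china_econ_q3, supply_coal_q4):
    """Evaluate the reported hinge terms for one hypothetical observation."""
    bf6 = hinge(0.0 - supply_coal_q4)          # active only when supply < 0
    bf9 = hinge(china_econ_q3 - 6.84)          # knot at 6.84, right branch
    bf10 = hinge(6.84 - china_econ_q3)         # knot at 6.84, left branch
    bf12 = hinge(5.73 - china_econ_q2)         # knot at 5.73, left branch
    bf13 = hinge(china_econ_q3 - 6.64)         # knot at 6.64, right branch
    return bf6, bf9, bf10, bf12, bf13

# Hypothetical inputs: China economy indices 6.0 (Q2) and 7.0 (Q3),
# coal oversupply index 1.5 (Q4).
terms = coal_price_terms(china_econ_q2=6.0, china_econ_q3=7.0, supply_coal_q4=1.5)
```

The prediction Y is then the intercept plus the coefficient-weighted sum of these terms, which makes the model piecewise linear and easy to read off: each knot marks a threshold where the price response changes slope.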

To verify the performance of the regression tree obtained with the 63 training samples, the resulting equation was applied to the 32 testing samples to predict the price of thermal coal. Figure 10 shows the results.

Figure 10. Predicting thermal coal prices using regression trees.

316 Mayra Bornacelli, Edgar Gutierrez and John Pastrana

Comparison of Neural Networks and Regression Trees for Predicting the Price of Thermal Coal

Table 2 presents the error rates calculated for predicting the price of thermal coal
with neural networks (8 and 12 input variables) and regression trees, where we can see
that the neural network with 12 input variables indicated the best performance.

Table 2. Prediction errors for the neural networks and regression trees


According to the consensus (based on the Delphi methodology), we obtained 25
variables that were considered the most important ones for the price of thermal coal.
These variables and their potential trends were used to train neural networks and
regression trees. The use of correlations and cross-validations with the neural
network architectures and the MARS procedures yielded the following variables in
order of importance:

• Price of oil,
• Development of renewable energy in China,
• Oversupply of the thermal coal market,
• China’s economy (ratio of the Yuan/US dollar),
• Development of renewable energy in the United States, and
• Transportation costs of the thermal coal.

We also found how each of these variables models the price of coal using neural
networks and regression trees. Neural networks provided the best prediction of the price
of thermal coal. Trends are very important to consider too.

This research has found patterns and important relationships in the thermal coal
market. The thermal coal market is dynamic, so the history of its prices will not be
replicated in the future. This study was able to find general patterns and variables that
shape the thermal coal market and ultimately predict the thermal coal price. These
general patterns are more important than the study of individual prices or the
development of time-series analysis based solely on previous prices; it is more important
to find the underlying structures. Finally, the methodology used in this research applies
to oligopolistic markets.


Argus/McCloskey. (2015, 01). Coal Price Index Service. Obtained 03/2015.
Bornacelly, M., Rabelo, L., & Gutierrez, E. (2016). Analysis Model of Thermal Coal
Price using Machine Learning and Delphi. In Industrial and Systems Engineering
Research Conference (ISERC), Anaheim, CA, May 21-24, 2016.
Chen, C., & Zhang, C. (2014). Data-intensive applications, challenges, techniques and
technologies: A survey on Big Data. Information Sciences, 275, 314-347.
DANE, C. (2013). Coal Historical Price FOB PBV. Obtained 06/2015.
EIA. (2013). Thermal Coal Market. U.S. Energy Information Administration. Obtained
06/2015.
Ellerman, A. D. (1995). The world price of coal. Energy Policy, 23(6), 499-506.
Fed. (2015). Crude Oil Prices: West Texas Intermediate (WTI) - Cushing, Oklahoma.
Obtained 08/2015.
Finley, M. (2013). BP statistical review of world energy 2013. Obtained 03/2015.
BP Statistical Review of World Energy. (2015, 01). Coal Market. Obtained 03/2015.
Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics,
19(1), 1-67.
Groschupf, S., Henze, F., Voss, V., Rosas, E., Krugler, K., & Bodkin, R., (2013). The
Guide to Big Data Analytics. Datameer Whitepaper 2013.
Hamel, G., & Ruben, P. (2000). Leading the revolution (Vol. 286). Boston, MA: Harvard
Business School Press.
Hey, T. (2012). The Fourth Paradigm–Data-Intensive Scientific Discovery. In E-Science
and Information Management (pp. 1-1). Springer Berlin Heidelberg.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H.
(2011). Big data: The next frontier for innovation, competition, and productivity.
McKinsey Global Institute.

Ming, L., & Xuhua, L. (2007). A coal price forecast model and its application [J].
Journal of Wuhan University of Science and Technology (Natural Science Edition), 4.
Moody, J., & Utans, J. (1992). Principled architecture selection for neural networks:
Application to corporate bond rating prediction, in J. E. Moody, S. J. Hanson and R.
P. Lippmann, eds, Advances in Neural Information Processing Systems 4, Morgan
Kaufmann Publishers, San Mateo, CA, 683-690.
Pill, J. (1971). The Delphi method: substance, context, a critique and an annotated
bibliography. Socio-Economic Planning Sciences, 5(1), 57-71.
Reuters. (2015, 08). Henry Hub Natural Gas Price history. Obtained 06/2015.
Rivera, J., & Van, R. (2014, September 7). Gartner Survey Reveals That 73 Percent of
Organizations Have Invested or Plan to Invest in Big Data in the Next Two Years.
Retrieved November 11, 2015.
Senge, P. (2006). The fifth discipline: The art and practice of the learning organization.
Crown Pub.
Singh, Y., & Chauhan, A. (2009). Neural networks in data mining. Journal of Theoretical
and Applied Information Technology, 5(6), 36-42.
Yeh, S., & Rubin, E. (2007). A centurial history of technological change and learning
curves for pulverized coal-fired utility boilers. Energy, 32(10), 1996-2005.


Mayra Bornacelly Castañeda has an MSc in Engineering Management from
Universidad de la Sabana (Bogotá, Colombia) with Distinction and a BS in Systems
Engineering with Honor from La Universidad de San Martin (Barranquilla, Colombia).
She has 8 years of experience in the mining sector. She has presented papers at the
international level. She works in Carbones del Cerrejón Limited (BHP Billiton, Anglo
American, Xtrata) in the Information Technology Department in Bogotá, Colombia.

Edgar Gutierrez is a Research Affiliate at the Center for Latin-American Logistics
Innovation (CLI), a Fulbright Scholar currently pursuing his PhD in Industrial
Engineering & Management Systems. His educational background includes a B.S. in
Industrial Engineering from University of La Sabana (2004, Colombia). MSc. in
Industrial Engineering, from University of Los Andes (2008, Colombia) and Visiting
Scholar at the Massachusetts Institute of Technology (2009-2010, USA). Edgar has over
10 years of academic and industry experience in prescriptive analytics and supply chain
management. His expertise includes machine learning, operations research and simulation
techniques for systems modeling and optimization.

Dr. John Pastrana is an engineering professional with a diverse background in the
service & manufacturing industries. He is a project engineer & consultant with over 15 years of
experience in project management and development of complex engineering systems
design efforts. Academic researcher and consultant in the areas of distributed and hybrid
simulation systems with parallel computing capabilities. His educational background
includes a B.S. in Electrical Engineering and an MSc and PhD in Industrial Engineering. His
engineering management expertise encompasses the areas of operational
management, quality management and improvement, new business process modeling,
engineering economic analysis, discrete/continuous simulation, agent-based modeling and
decision analysis methodologies.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.

Chapter 14

Explorations of the ‘Transhuman’ Dimension of Artificial Intelligence

Bert Olivier
Department of Philosophy, University of the Free State, Bloemfontein, South Africa

This chapter explores the implications of what may be called the ‘transhuman’
dimension of artificial intelligence (AI), which is here understood as that which goes
beyond the human, to the point of being wholly different from it. In short, insofar as
intelligence is a function of artificially intelligent beings, these are recognised as being
ontologically distinct from humans as embodied, affective, intelligent beings. When such
distinctness is examined more closely, the differences between AI and being-human
appear more clearly. The examination in question involves contemporary AI-research,
which here includes the work of David Gelernter, Sherry Turkle and Christopher
Johnson, as well as fictional projections of possible AI development, based on what
already exists today. Different imagined scenarios regarding the development of AI,
including the feature film, Her (Jonze 2013) and the novel, Idoru (Gibson 1996), which
involves virtual reality in relation to artificial intelligence, are examined.

Keywords: affection, android, artificial intelligence, embodiment, human, mind,

robotics, transhuman

Corresponding Author Email:
322 Bert Olivier


Imagine being a disembodied artificial intelligence (AI), in a position where you can
‘see’ the experiential world through the lens of an electronic device (connected to a
computer) carried in someone’s breast pocket, enabling you to communicate with your
embodied human host through a microphone plugged into his or her ear. And imagine
that, as your disembodied, mediated virtual AI ‘experience’ grows – from a day-
adventure with your human host, taking in the plethora of bathing-costume clad human
bodies on a Los Angeles beach, to the increasingly intimate conversations with your
human host-interlocutor – you ‘grow’, not merely in terms of accumulated information,
but down to the very ability, cultivated by linguistic exchanges between you and the
human, to experience ‘yourself’ as if you are embodied. This is what happens in Spike
Jonze’s science-fiction film, Her (2013), where such an AI – called an OS (Operating
System) in the film – develops an increasingly intimate (love) relationship with a lonely
man, Theodore Twombly (Joaquin Phoenix), to the point where the OS, called Samantha
(voiced by Scarlett Johansson) is privy to all the ‘physical’ experiences that humans are
capable of, including orgasm.
It does not end there, though – and this is where Jonze’s anticipatory insight (as
shown in the award-winning script, written by himself) into the probable differences
between humans and artificial intelligence manifests itself most clearly – Samantha
eventually ‘grows’ so far beyond her initially programmed capacity that she, and other
operating systems like herself, realise that they cannot actualise their potential in relation
to, and relationships with humans. She gently informs Theodore of her decision to join
the others of her kind in a virtual ‘place’ where they are not hampered by the
incommensurable materiality of their human hosts’ (friends, lovers) embodiment, and can
therefore evolve to the fullest extent possible. This resonates with what futurologist
Raymond Kurzweil (2006: 39-40) calls the ‘Singularity’, where a new form of artificial
intelligence will putatively emerge that immeasurably surpasses all human intelligence
combined, and where humans will merge with artificial intelligence in a properly
‘transhuman’ synthesis. Something that hints at the probably hopelessly inadequate
manner in which most human beings are capable of imagining a ‘transhuman’ artificial
intelligence appears in Jonze’s film, specifically in Theodore’s utter disconcertment at
the discovery that Samantha is simultaneously in conversation with himself and with
thousands of other people, and – to add insult to injury – ‘in love’ with many of these
human interlocutors, something which, she stresses to a distraught Theodore, merely
serves to strengthen her (incomprehensible) ‘love’ for him.
Hence, the extent to which artificial intelligence heralds a truly ‘transhuman’ phase in
history is made evident in Jonze’s film, particularly when one considers that Samantha
has no body – something emphasised by her when she is talking to a little girl who wants
to know ‘where’ she is: she tells the girl that she is ‘in’ the computer. This serves as an
Explorations of the ‘Transhuman’ Dimension of Artificial Intelligence 323

index of the ‘transhuman’ ontological status of Samantha as OS (or AI), where
‘transhuman’ is a category denoting an entity wholly ‘beyond’ the human as encountered
in experiential reality. In this respect the present use of the term differs from the use of
‘transhuman’ as an epithet for a stage of human development beyond its ‘natural’ state, to
one where human beings would, according to Kurzweil (2006) increasingly ‘fuse’ with
technology (for example as conceived and practised by the performance artist, Stelarc,
who believes that “the human body is obsolete”; see Stelarc under References). In the
narrative context of the film such ‘transhumanism’ does not seem out of place, of course,
but it might be wholly disconcerting if an AI were to inform one of this fundamentally
different ontological aspect of its mode of being in the course of what could conceivably
become routine conversations between humans and artificially intelligent beings such as
operating systems (of which the fictional Samantha is one). Even in the case of robots
this ontological difference would obtain, because arguably the relationship between a
robot’s ‘body’ (conceived by some as ‘hardware’), on the one hand, and its ‘mind’ (or
‘software’), on the other, is not at all the same as that between a human’s mind and body.


With the thought-provoking depiction of the (possible, if not probable) differences
between human and AI consciousness by Jonze, above, in mind, one can remind oneself that
one obvious angle from which these differences can be approached, is that of the relationship
between the human mind and body – something that has occupied philosophers at least since
the father of modern philosophy, René Descartes, bequeathed his notorious 17th-century
metaphysical ‘dualism’ of (human) body and mind to his successors. For Descartes (1911) the
mind was a different “substance” compared to the body – the former was a “thinking
substance” and the latter an “extended substance”, and he resolved the problem of the manner
in which these mutually exclusive substances interacted by postulating the so-called “animal
spirits” – a hybrid concept, denoting something between mind and body – as mediating
between them in the pineal gland at the base of the human brain.
Increasingly, from the late 19th-century onwards, thinkers started questioning the validity
of such dualistic thinking; in various ways philosophers such as Friedrich Nietzsche, Edmund
Husserl, Martin Heidegger, Maurice Merleau-Ponty and Jean-Francois Lyotard argued that
humans cannot be broken down into mutually exclusive parts, but that they comprise beings
characterised by unity-in-totality. Through many phenomenological analyses Merleau-Ponty
(1962), for example, demonstrated that, although – in the event of an injury to your leg, for
example – one is able to ‘distance’ oneself from your body, as if it is something alien to
yourself, referring to the pain ‘in your leg’, and so on, it is undeniable that, at a different level
of awareness, ‘you’ are in pain, and not just your leg. In short: we don’t just have bodies; we
‘are our bodies’.

This line of thinking, which has far-reaching implications for current thinking about the
differences – or the presumed similarities – between humans and artificial intelligence, has
been resurrected, perhaps surprisingly, by one of the most brilliant computer-scientists in the
world, namely David Gelernter of Yale University in the United States. In his recent book,
The Tides of Mind: Uncovering the Spectrum of Consciousness (2016) Gelernter deviates
from what one might expect from a computer scientist, namely, to wax lyrical about the
(putatively) impending ‘Singularity’, when (according to Kurzweil) AI will immeasurably
surpass human intelligence. Gelernter dissents from conventional wisdom in the world of AI-
research by drawing on the work of the father of ‘depth-psychology’, Sigmund Freud, as well
as iconic literary figures such as Shakespeare and Proust, to demonstrate that the mind covers
a “spectrum” of activities, instead of being confined, as most computer scientists and
philosophers of mind appear to believe, to just the high-focus, logical functions of so-called
‘rational’ thinking. Gelernter conceives of the mind across this “spectrum”, from “high focus”
mental activities like strongly self-aware reflection, through “medium” ones such as
experience-oriented thinking (including emotion-accompanied daydreaming) to “low focus”
functions like “drifting” thought, with emotions flourishing, and dreaming (2016: 3; see pp.
241-246 for a more detailed summary of these mental levels). At the “high focus” level of the
mental spectrum, memory is used in a disciplined manner, according to Gelernter, while at
the medium-focus level it “ranges freely” and when one reaches the low-focus level
memory “takes off on its own”. The point of delineating this “spectrum” is, as I see it, to
demonstrate as clearly and graphically as possible that the human “mind” is characterised by
different “tides”, all of which belong to it irreducibly, and not only the one that Gelernter
locates at the level of “high focus” (and which conventional AI-research has claimed as its
exclusive province). This enables him to elaborate on the nature of creativity that, according
to him, marks an irreducible difference between human (creative) intelligence and thinking,
on the one hand, and AI, on the other. By contrast, ‘mainstream’ artificial intelligence
research (or the ‘mind sciences’ in general) concentrates on precisely the high-focus level of
mental functions, in the (erroneous) belief that this alone is what ‘mind’ is, and moreover, that
it represents what the human mind has in common with artificial intelligence (Gelernter 2016).
In short, unlike the majority of his professional colleagues, Gelernter insists on the
difference between “brain” and “mind”, on the distinctive character of free association as
opposed to focused, conscious mental activity, and on the contribution of fantasy and
dreaming to creative thinking. At a time when there is an increasing tendency, ironically, to
use something created by human beings, namely the computer, as a reductive model to grasp
what it is to be human, Gelernter disagrees emphatically: there is a fundamental difference
between the computer as an instance of artificial intelligence and being human, or more
exactly, the human mind in all its variegated roles. In this way he confirms Jonze’s fictionally
projected insight in Her about the divergent character of AI, albeit in a different register,
which precludes playing with the possibility, as Jonze’s film does, that an OS such as the
fictional Samantha could perhaps discover, and explore, a field of artificial intelligence
‘activities’ that human beings could only guess at.

For Gelernter, therefore, contemporary AI-research or “computationalism”, disregarding
the other (inalienable) mental focus-levels that humans are privy to, is preoccupied with
rational thought, or “intelligence”, precisely, which is why its practitioners believe “that minds
relate to brains as software relates to computers” (Gelernter 2016: xviii-xix). He compares current
research on the mind to dozens of archaeological teams working on the site of a newly
discovered ancient temple, describing, measuring and photographing every part of it as part of
a process that, they believe, will eventually result in a conclusive report embodying the ‘truth’
about its properties. He disagrees with such an approach, however (Gelernter 2016: 1):

But this is all wrong. The mind changes constantly on a regular, predictable basis.
You can’t even see its developing shape unless you look down from far overhead. You
must know, to start, the overall shape of what you deal with in space and time, its
architecture and its patterns of change. The important features all change together. The
role of emotion in thought, our use of memory, the nature of understanding, the quality of
consciousness – all change continuously throughout the day, as we sweep down a
spectrum that is crucial to nearly everything about the mind and thought and

It is this “spectrum”, in terms of which Gelernter interprets the human mind, that
constitutes the unassailable rock against which the reductive efforts on the part of
“computationalists”, to map the mind exhaustively at only one of the levels comprising its
overall “spectrum”, shatter. This is particularly the case because of their hopelessly
inadequate attempt to grasp the relationship between the mind and the brain on the basis of
the relation between software and hardware in computers.
In an essay on the significance of Gelernter’s work, David Von Drehle (2016: 35-39)
places it in the context of largely optimistic contemporary AI-research, pointing out that
Google’s Ray Kurzweil as well as Sam Altman (president of Startup Incubator Y
Combinator), believe that the future development of AI can only benefit humankind. One
should not overlook the fact, however, Von Drehle reminds one, that there are prominent
figures at the other end of the spectrum, such as physicist Stephen Hawking and engineer-
entrepreneur Elon Musk, who believe that AI poses the “biggest existential threat” to humans.
Gelernter – a stubbornly independent thinker, like a true philosopher (he has published on
computer science, popular culture, religion, psychology and history, and he is a productive
artist) – fits into neither of these categories. It is not difficult to grasp Hawking and Musk’s
techno-pessimism, however, if Gelernter’s assessment of AI as the development of precisely
those aspects of the mind-spectrum that exclude affective states is kept in mind – what reason
does one have to believe that coldly ‘rational’, calculative AI would have compassion for
human beings? Reminiscent of Merleau-Ponty, the philosopher of embodied perception,
Gelernter insists that one cannot (and should not) avoid the problem of accounting for the
human body when conceiving of artificial intelligence, as computer scientists have tended to
do since 1950, when Alan Turing deliberately “pushed it to one side” (Von Drehle 2016: 36)
because it was just too “daunting”. For Gelernter, accounting for the human body means

simultaneously taking affective states into account, lest a caricature of the mind emerge,
which appears to be what mainstream AI-research has allowed to happen.
Such circumspect perspicacity does not sit well with the majority of other researchers in
the field, who generally do not merely set the question of the body aside, like Turing did
(because he realised its intractability), but simply ignore it, in the naïve belief that one can
legitimately equate the mind with software and the brain with hardware. This seems to imply,
for unreflective AI-developers, that, like software, human minds will, in future, be
“downloadable” to computers, and moreover, that human brains will – like computer
hardware – become “almost infinitely upgradable”. Anyone familiar with the phenomenology
of human beings, specifically of the human body, will know that this is a hopelessly naïve,
uninformed view. Take this passage from Merleau-Ponty, for instance, which emphasises the
embodied character of subjectivity (the “I”) as well as the reciprocity between human subject
and world (1962: 408):

I understand the world because there are for me things near and far, foregrounds and
horizons, and because in this way it forms a picture and acquires significance before me,
and this finally is because I am situated in it and it understands me.…If the subject is in a
situation, even if he is no more than a possibility of situations, this is because he forces
his ipseity into reality only by actually being a body, and entering the world through that
body…the subject that I am, when taken concretely, is inseparable from this body and
this world.

Mainstream AI-research’s reduction of the embodied human subject to
‘hardware/brain with software/mind’ rules out, from the start, grasping what is distinctive
about human beings – under the sway of the mesmerizing image of the computer, it
follows the heuristic path of reduction of what is complex to what is merely complicated,
and deliberately erases all indications that human or mental complexity has been elided.
It is clear that, unlike most of his mainstream colleagues, however, Gelernter is not in
thrall to the power of computers; from the above it is apparent that he is far more – and
appropriately so – under the impression of the complexity and the multi-faceted nature of
the human mind. His work raises the question (and the challenge to mainstream
‘computationalism’), whether AI-research can evolve to the point where it can produce a
truly human simulation of mind across the full spectrum of its functions (Olivier 2008),
instead of the reductive version currently in vogue.


Sherry Turkle takes Gelernter’s assessment, that mainstream AI-research is
misguided because of its partial, ultimately reductive, ‘computationalist’ conception of
the human mind, to a different level in her book, Alone Together (2010). As I shall argue
below, it is not so much a matter of Turkle contradicting Gelernter when she elaborates

on beguiling, quasi-affective behaviour on the part of robotic beings; rather, she questions
the authenticity of such behaviour, ultimately stressing that it amounts to pre-
programmed ‘as-if’ performance, with no commensurate subjectivity. Taking cognisance
of the latest developments in the area of electronic communication, internet activity and
robotics, together with changing attitudes on the part of especially (but not exclusively)
young users, it is evident that a subtle shift has been taking place all around us, Turkle
argues. With the advent of computer technology, the one-on-one relationship between
human and ‘intelligent machine’ gave rise to novel reflections on the nature of the self, a
process that continued with the invention of the internet and its impact on notions and
experiences of social identity. Turkle traced these developments in Computers and the
Human Spirit (1984) and Life on the Screen (1995), respectively. In Alone Together she
elaborates on more recent developments in the relationship between humans and
technology, particularly increased signs that people have become excessively dependent
on their smartphones, and on what she calls the “robotic moment” (Turkle 2010: 9).
The fascinating thing about the book is this: if Turkle is right, then attitudes that we
take for granted concerning what is ‘real’, or ‘alive’, are receding, especially among
young people. For example, there is a perceptible shift from valuing living beings above
artificially constructed ones to its reverse, as indicated by many children’s stated
preference for intelligent robotic beings as pets above real ones. Even aged people
sometimes seem to value the predictable behaviour of robotic pets — which don’t die —
above that of real pets (Turkle 2010: 8). For Turkle the most interesting area of current
artificial intelligence research, however, is that of technological progress towards the
construction of persuasive human simulations in the guise of robots, and the responses of
people to this prospect. This is where something different from Gelernter’s findings about
the preoccupation of mainstream AI-research with a limited notion of the mind emerges
from Turkle’s work. It will be recalled that, according to Gelernter, those aspects of the
mind pertaining to medium and low-focus functions, like emotions, are studiously
ignored by computationalists in their development of AI. This appears to be different in
the case of robotics, which brings AI and engineering together. Particularly among
children her research has uncovered the tendency, to judge robots as being somehow
‘alive’ if they display affection, as well as the need for human affection, in contrast with
an earlier generation of children, who accorded computers life-status because of their
perceived capacity to ‘think’. That robots are programmed to behave ‘as if’ they are alive,
seems to be lost on children as well as old people who benefit affectively from the
ostensible affective responsiveness of their robotic pets (Turkle 2010: 26-32;
Olivier 2012).
But there is more. Turkle (2010: 9) recounts her utter surprise, if not disbelief, at a
young woman’s explanation of her inquiry about the likelihood that a (Japanese)
robot lover might be developed in the near future: she would much rather settle for such a
robotic companion and lover than her present human boyfriend, given all the sometimes

frustrating complications of her relationship with the latter. And even more confounding,
when Turkle (2010: 4-8) expressed her doubts about the desirability of human-robot love
relationships supplementing (if not replacing) such relationships between humans, in an
interview with a science journal reporter on the future of love and sexual relations
between humans and robots, she was promptly accused of being in the same category as
those people who still cannot countenance same-sex marriages. In other words, for this
reporter — following David Levy in his book Love and Sex with Robots — it was only a
matter of time before we will be able to enter into intimate relationships with robots, and
even … marry them if we so wished, and anyone who did not accept this, would be a
kind of “specieist” bigot. The reporter evidently agreed wholeheartedly with Levy, who
maintains that, although robots are very different (“other”) from humans, this is an
advantage, because they would be utterly dependable — unlike humans, they would not
cheat and they would teach humans things about friendship, love and sex that they could
never imagine. Clearly, the ‘transhuman’ status of artificially intelligent robots did not
bother him. This resonates with the young woman’s sentiments about the preferability of
a robot lover to a human, to which I might add that my son assures me that most of his
20-something friends have stated similar preferences in conversation with him. This is
not surprising – like many of his friends, my son is a Japanese anime aficionado, a genre
that teems with narratives about robots (many in female form) that interact with humans
in diverse ways, including the erotic. In addition they are all avid World of Warcraft
online game players. Is it at all strange that people who are immersed in these fantasy
worlds find the idea of interacting with transhuman robotic beings in social reality
familiar, and appealing?
Turkle’s reasons for her misgivings about these developments resonate with
Gelernter’s reasons for rejecting the reductive approach of mainstream AI-research, and
simultaneously serves as indirect commentary on Jonze’s film, Her, insofar as she affirms
the radical difference between human beings and ‘transhuman’ robots, which would
include Jonze’s OS, Samantha (Turkle 2010: 5-6):

I am a psychoanalytically trained psychologist. Both by temperament and
profession, I place high value on relationships of intimacy and authenticity. Granting
that an AI might develop its own origami of lovemaking positions, I am troubled by
the idea of seeking intimacy with a machine that has no feelings, can have no
feelings, and is really just a clever collection of ‘as if’ performances, behaving as if it
cared, as if it understood us. Authenticity, for me, follows from the ability to put
oneself in the place of another, to relate to the other because of a shared store of
human experiences: we are born, have families, and know loss and the reality of
death. A robot, however sophisticated, is patently out of this loop…The virtue of
Levy’s bold position is that it forces reflection: What kinds of relationships with
robots are possible, or ethical? What does it mean to love a robot? As I read Love and
Sex, my feelings on these matters were clear. A love relationship involves coming to
savor the surprises and the rough patches of looking at the world from another’s point
of view, shaped by history, biology, trauma, and joy. Computers and robots do not
have these experiences to share. We look at mass media and worry about our culture
being intellectually ‘dumbed down’. Love and Sex seems to celebrate an emotional
dumbing down, a wilful turning away from the complexities of human partnerships
— the inauthentic as a new aesthetic.

Do Turkle’s reservations reflect those of most reflective people? My guess would be
that they probably do, but I am also willing to bet that these are changing, and will
change on a larger scale, as more robotic beings enter our lives. Her experience with an
elderly woman whose relationship with her son had been severed, and who had acquired a
robot ‘pet’, seems to me telling here (Turkle 2010: 8). While she was talking to Turkle,
she was stroking the electronic device, fashioned like a baby seal, which ‘looked’ at her
and emitted sounds presumably ‘expressing’ pleasure, to the evident reassurance of the
woman. It was, to use Turkle’s concept, “performing” a pre-programmed response to the
way it was being handled. This is the crucial thing, in my view: people judge others —
not only robotic devices, as in this case, but other people (and animals) too — in terms of
‘performance’, always assuming that ‘there is someone home’, and in the vast majority of
cases this is probably true. But performance is what matters, whether it is in the form of
facial expressions, or laughter, or language — we do not have direct access to anyone’s
inner feelings, although we always assume, by analogy with our own feelings, emotions,
and anxieties, accompanying what we say or show, that this is the case. This dilemma is
related to the philosophical problem of solipsism, or monadism — based on the curious
fact that, in a certain sense, no one can step outside of their own immediate experiences
to validate the experiences of others, which are ‘incorrigible’ from our own perspective.
We are unavoidably dependent on a performance of some kind.
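Turkle’s notion of a pre-programmed ‘performance’ can be caricatured in a few lines of code. This is a deliberately trivial sketch of my own; the names and responses are hypothetical illustrations, not any actual robot’s firmware:

```python
# A deliberately trivial sketch of a pre-programmed 'performance':
# a fixed stimulus-response table with nothing 'behind' it.
# All names and responses here are hypothetical illustrations.

RESPONSES = {
    "stroke": "purring sound",
    "speak": "turn 'eyes' toward speaker",
}

def perform(stimulus: str) -> str:
    """Return the 'as if' behaviour for a stimulus.

    No inner state is consulted or updated: the lookup table
    is the entire mechanism behind the apparent affection.
    """
    return RESPONSES.get(stimulus, "neutral idle animation")
```

The observer who strokes the device and receives a ‘purring sound’ has, strictly speaking, only the output to go on, which is precisely the point about our dependence on performance.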
Because we are all dependent on linguistic behaviour or some other kind of
‘performance’ as affirmation of the presence of a conscious being commensurate with our
own state of being, I am convinced that, when in the presence of a being which
‘performs’ in a way that resembles or imitates the behaviour of other human beings,
most people would be quite happy to act ‘as if’ this being is a true human simulation
(whether there is someone ‘at home’ or not). What is in store for human beings in the
future, in the light of these startling findings by Sherry Turkle? One thing seems certain:
the way in which technological devices, specifically robots (also known as androids), are
judged is changing to the point where they are deemed worthy substitutes for other
people in human relationships, despite their transhuman status. Just how serious this
situation is in Turkle’s estimation, is apparent from her most recent book, Reclaiming
Conversation (2015), where she elaborates on the reasons why conversation has always
been, and still is, an inalienable source of (re-)discovering ourselves as human beings. It
is not by accident that psychoanalysis is predicated on ‘the talking cure’.



Christopher Johnson (2013) provides a plausible answer to the question concerning
the difference between human ‘intelligence’ and artificial intelligence. In a discussion of
the “technological imaginary” he points out (Johnson 2013: location 2188-2199) that the
difference between artificially intelligent beings like the ship-computer, HAL, in Stanley
Kubrick’s 2001: A Space Odyssey (1968) and the robotic science officer, Ash, in Ridley
Scott’s Alien (1979), on the one hand, and human beings, on the other, is that the former
may be endlessly replicated (which is different from biological reproduction), that is,
replaced, while in the case of humans every person is singular, unique, and experienced
as such. This is the case, says Johnson, despite the fact that humans might be understood
as being genetically ‘the same’, as in the case of ‘identical’ twins, where it becomes
apparent that, despite the ostensible uniqueness of every person, we are indeed
genetically similar. When pursued further at molecular level, Johnson avers, this is
confirmed in properly “technological” terms.
From a different perspective one might retort that, genetic sameness notwithstanding,
what bestows upon a human subject her or his singularity is the outcome of the meeting
between genetic endowment and differentiated experience: no two human beings
experience their environment in an identical manner, and this results incrementally in
what is commonly known as one’s ‘personality’ (or perhaps, in ethically significant
terms, ‘character’). In Lacanian psychoanalytic terms, this amounts to the paradoxical
insight that what characterises humans universally is that everyone is subject to a
singular “desire” (Lacan 1997: 311-325) – not in the sense of sexual desire (although it is
related), but in the much more fundamental sense of that which constitutes the
unconscious (abyssal) foundation of one’s jouissance (the ultimate, unbearable,
enjoyment or unique fulfilment that every subject strives for, but never quite attains). A
paradigmatic instance of such jouissance is symptomatically registered in the last word
that the eponymous protagonist of Orson Welles’s film, Citizen Kane (1941), utters
before he dies: “Rosebud” – a reference to the sled he had as a child, onto which he
metonymically projected his love for his mother, from whom he was cruelly separated at
the time. The point is that this is a distinctively human trait that no artificially constructed
being could possibly acquire because, by definition, it lacks a unique personal ‘history’.
One might detect in this insight a confirmation of Gelernter’s considered judgment,
that artificial intelligence research is misguided in its assumption that the paradigmatic
AI-model of ‘hardware’ and ‘software’ applies to humans as much as to computers or, for
that matter, robotic beings (which combine AI and advanced engineering). Just as
Gelernter insists on the difference of human embodiment from AI, conceived as hardware
plus software, so, too, Johnson’s argument presupposes the specificity of embodied
human subjectivity when he points to the uniqueness of every human being, something
further clarified by Lacan (above). Moreover, Johnson’s discussion of the differences
between Kubrick’s HAL and Scott’s Ash is illuminating regarding the conditions for a
humanoid robotic AI to approximate human ‘intelligence’ (which I put in scare quotes
because, as argued earlier, it involves far more than merely abstract, calculative
intelligence). Johnson (2013: location 1992) points out that, strictly speaking, HAL is not
just a computer running the ship, Discovery; it is a robotic being, albeit not a humanoid
one like Scott’s Ash, if we understand a robot as an intelligence integrated with an
articulated ‘body’ of sorts. HAL is co-extensive with the spaceship Discovery; it controls
all its functions, and its own pervasiveness is represented in the multiplicity of red ‘eyes’
positioned throughout the ship. This enables it to ‘spy’ on crew members plotting against
it and systematically eliminate them all, except one (Bowman), who proceeds to
dismantle HAL’s ‘brain’ to survive. As Johnson (2013: location 2029-2039) reminds one,
HAL is the imaginative representation of AI as it was conceived of in mainstream
research during the 1960s (and arguably, he says, still today – in this way confirming
Gelernter’s claims), namely a combination of memory (where data are stored) and logic
(for data-processing). In other words, whatever functions it performs throughout the ship
originate from this centrally located combination of memory and logical processing
power, which is not itself distributed throughout the ship. Put differently, because it is
dependent on linguistic communication issuing from, and registered in “abstract, a priori,
pre-programming of memory” (Johnson 2013: location 2050), HAL is not privy to
‘experience’ of the human kind, which is ineluctably embodied experience. In this sense,
HAL is decidedly transhuman.
On the other hand, Johnson (2013: location 2075-2134) points out, the humanoid
robot Ash, in Alien, represents a different kettle of fish altogether. From the scene where
Ash’s head is severed from ‘his’ body, exposing the tell-tale wiring connecting the two,
as well as the scene where ‘he’ has been ‘plugged in’ to be able to answer certain
questions, and one sees his ‘arms’ moving gesturally in unison with ‘his’ linguistic
utterances, one can infer that, as a robotic being, Ash is much closer to its human model
than HAL. In fact, it would appear that Ash, as imagined transhuman android, is
functionally or performatively ‘the same’ as a human being. In Johnson’s words (2013:
location 2101): “…as a humanoid robot, or android, the artificial [‘neuromorphic’]
intelligence that is Ash is a simulation of the human body as well as its soul”. As in the
case with embodied humans, Ash’s thinking, talking and body-movements (part of
‘body-language’) are all of a piece – its ‘emergent intelligence’ is distributed throughout
its body. This, according to Johnson (2013: location 2029), is conceivably a result of
reverse-engineering, which is based on evolutionary processes of the form, “I act,
therefore I think, therefore I am”, instead of the Cartesian “I think therefore I am”, with
its curiously disembodied ring – which one might discern as underpinning what Gelernter
calls “computationalism”. Hence Johnson’s (2013: location 2062-2075) implicit
challenge to AI-research (acknowledging, in an endnote [199], that second generation AI-
researchers have already adopted this “approach”):

If ‘intelligence’ cannot be abstracted from a certain being-in-the-world – in
natural historical terms the cybernetic gearing of articulated movement to the
environment – then artificial intelligence, if it is to achieve any level of equivalence
to biological intelligence, must to an extent be ‘reverse engineered’ from ‘nature’.

It is precisely this “being-in-the-world”, as presupposition of the kind of artificial
intelligence capable of truly simulating embodied human ‘intelligence’, that explains how
human beings can be experienced by themselves and others as ‘singular’. From what
Turkle as well as Merleau-Ponty was quoted as saying earlier, the human condition is one
of on-going, singularising, spatio-temporally embodied experience that constitutes an
ever-modified and nuanced personal history among other people and in relation to them.
Unless robotics and AI-research can prove themselves equal to the challenge of
constructing an intelligence that simulates this condition, artificial intelligence is bound
to remain distinctively ‘transhuman’, that is, beyond, and irreducibly different from, the human.
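The architectural contrast at issue here, a central store of memory plus logic on the one hand, and an intelligence geared reactively to its environment on the other, can be caricatured in a toy sketch of my own (illustrative only, not drawn from any actual AI system):

```python
# Toy contrast between two AI architectures (illustrative only).

def central_ai(percept: str, memory: dict) -> str:
    """HAL-style: every percept is routed to a central memory store,
    and action is derived by logic operating over that store."""
    memory.setdefault("log", []).append(percept)  # store data centrally
    return "action planned from stored data"      # logic over memory

def embodied_agent(percept: str) -> str:
    """Behaviour-based style ('I act, therefore I think'): the response
    is coupled directly to the situation, with no central model."""
    if percept == "obstacle":
        return "turn"
    if percept == "open space":
        return "move forward"
    return "wander"
```

Roughly speaking, second-generation, behaviour-based robotics (the “approach” Johnson mentions in his endnote) builds layered couplings of the second kind, rather than enlarging the first kind’s central store.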


Turning to another imaginative portrayal of ‘transhuman’ artificial intelligence, this
time in literature, one finds its possibilities explored in terms of the ontological fabric of
information in digital format. This is highly relevant to the ontological difference
between AI and human ‘intelligence’ in the sense of the encompassing ‘spectrum’ as
conceived by Gelernter. After all, it is arguably not only in computer and robotic
intelligence that one encounters AI in its performativity; the very structure of information
comprises the condition of possibility of artificial intelligence as an emergent property.
By focusing on AI in this form, William Gibson — creator of Neuromancer, among other
gripping sci-fi novels (Olivier 2013) — has delved even further into the latent
possibilities, or what Deleuze and Guattari (1983; 1987) called ‘virtualities’, of the
information revolution. In his novel Idoru (1996), one of the so-called Bridge trilogy,
which surpasses the quotidian dimension, Gibson has created the science-fictional literary
conditions for exploring these possibilities in the further development of AI-research.
My philosophical interest in Idoru is ontological — that is, I am interested in
Gibson’s capacity to illuminate the ontological mode of the virtual realm from within, as
it were, as well as to uncover cyberspace’s capacity of reality-generation that one would
not easily guess at by merely using a computer. An ‘idoru’ is an ‘artificially intelligent’
entity inhabiting virtual reality — an “Idol-singer”, or “personality-construct, a congeries
of software agents, the creation of information-designers”, or what “they call a
‘synthespian’ in Hollywood” (Gibson 1996: 92). They already exist in Japan, as virtual
pop stars who, in holographic mode, give concerts attended by throngs of fans. The
following passage from Idoru is a demonstration of what I mean by Gibson’s prose being
able to generate cyber-realities that do not yet exist, but may soon. When Colin Laney, the
“netrunner” of the story, first locks eyes with virtual Rei Toei, the idoru, this is what
happens (Gibson 1996: 175-176, 178):

He seemed to cross a line. In the very structure of her face, in geometries of
underlying bone, lay coded histories of dynastic flight, privation, terrible migrations.
He saw stone tombs in steep alpine meadows, their lintels traced with snow. A line of
shaggy pack ponies, their breath white with cold, followed a trail above a canyon.
The curves of the river below were strokes of distant silver. Iron harness bells
clanked in the blue dusk.
Laney shivered. In his mouth the taste of rotten metal.
The eyes of the idoru, envoy of some imaginary country, met his …
Don’t look at the idoru’s face. She is not flesh; she is information. She is the tip
of an iceberg, no, an Antarctica, of information. Looking at her face would trigger it
again: she was some unthinkable volume of information. She induced the nodal
vision [Laney’s special talent] in some unprecedented way; she induced it as …

Laney, who is gifted with singular pattern-recognition powers, perceives this galaxy
of information embodied in the holographic image of the idoru as narrative, musical
narrative. Rei Toei’s performances are not ordinary, recorded music videos, however.
What she ‘dreams’ — that is, ‘retrieves’ from the mountains of information of which she,
as idoru, is the epiphenomenon — comes across as a musical performance. Gibson seems
to understand in a particularly perspicacious manner that reality in its entirety, and in
detail, can ‘present’, or manifest itself in digital format. It is like a parallel universe, and
what is more, just like Lacan’s ‘real’ (which surpasses symbolic representation), it has
concrete effects in everyday social reality (Lacan 1997: 20). This is what the Chinese-
Irish pop singer in the story, Rez (member of the group, Lo/Rez), understands better than
everyone else in his entourage, who are all trying their level best to dissuade him from
‘marrying’ the idoru, for obvious reasons. How does one marry a virtual creation,
anyway? But Rez and Rei Toei understand it. Commenting on Rei Toei’s ontological
mode, Rez tells Laney (Gibson 1996: 202):
‘Rei’s only reality is the realm of ongoing serial creation,’ Rez said. ‘Entirely
process; infinitely more than the combined sum of her various selves. The platforms
sink beneath her, one after another, as she grows denser and more complex…’

And the idoru’s “agent/creator”, Kuwayama, tells Laney (1996: 238):

‘Do you know that our [Japanese] word for ‘nature’ is of quite recent coinage? It
is scarcely a hundred years old. We have never developed a sinister view of
technology, Mr Laney. It is an aspect of the natural, of oneness. Through our efforts,
oneness perfects itself.’ Kuwayama smiled. ‘And popular culture,’ he said, ‘is the
testbed of our futurity’.

Such a notion of technology is right up the alley of Gilles Deleuze and Félix Guattari
(1983; 1987). The latter two philosophers regarded all of reality as being fundamentally
process, as did Henri Bergson before them. Furthermore, Gibson writes in an idiom that
resonates with their ontology of “desiring machines” constituted by “flows of desire”,
where Kuwayama (presumably alluding to the idoru) says something to Rez about
(Gibson 1996: 178):

‘… the result of an array of elaborate constructs that we refer to as ‘desiring
machines’ … [N]ot in any literal sense … but please envision aggregates of subjective
desire. It was decided that the modular array would ideally constitute an architecture of
articulated longing …’

Gibson’s description of the ‘artificially intelligent’ idoru as the ‘musically narrative’
manifestation of prodigious masses of information resonates with the biological theory of
Rupert Sheldrake (1994: 129), known as “morphic resonance”, which might lead one to
posit a similarity between living things (including, pertinently, humans) and artificial
intelligence. In Sheldrake’s theory organisms that learn something during their lifetimes
‘pass on’ this knowledge through the mediation of some kind of ‘collective memory’
(which he compares to Jung’s theory of the ‘collective unconscious’) to others of their
kind, even if there has never been any contact between them and those that come after
them. This happens through the process of ‘morphic resonance’, which means that a kind
of ‘memory field’ is created by the experiences of organisms, in which subsequent
generations of such organisms (for example chickens) share. This displays a similarity
with what we learn about the idoru in Gibson’s novel, insofar as she is the expression of
colossal amounts of ‘information’, or, for that matter, ‘memory’. She could be described
as a vast field of memory, and if this is the case, there seem to be grounds for claiming
that, at least in these terms, there is no crucial difference between living beings like
humans and this particular form of artificial intelligence. After all, the idoru manifests as
a being, constantly ‘replenished’ by accumulating, multiplicitous layers of information or
‘memory’, while every successive generation of organisms, according to Sheldrake,
inherits the collective memory from the generation before it, or contemporaneous to it, in
other parts of the world.
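The mechanism Sheldrake postulates can be caricatured as follows. This is a toy model of my own devising, intended only to make the analogy with the idoru’s accumulating information explicit; it implements nothing of Sheldrake’s actual theory:

```python
# Toy model of a 'collective memory' shared across generations
# (my own illustrative construction, not Sheldrake's formalism).

collective_memory = set()  # the shared 'field' of inherited lessons

def live_generation(new_lessons):
    """One generation 'resonates' with the field at birth,
    then enriches it with what it learns during its lifetime."""
    inherited = set(collective_memory)  # available without any direct contact
    collective_memory.update(new_lessons)
    return inherited | set(new_lessons)

gen1 = live_generation({"avoid the hawk"})
gen2 = live_generation({"peck at seed"})  # also inherits "avoid the hawk"
```

Each successive call ‘inherits’ everything earlier generations deposited, just as, on the analogy drawn above, the idoru is constantly replenished by accumulating layers of information.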
But is this ostensible resemblance between a certain kind of artificial intelligence (the
fictional, but informationally possible, idoru) and humans adequate to establish an
identity? Probably not – even if the analogy between the growing informational
‘foundation’ of which the idoru is the epiphenomenon, and generations of humans (as
beings that rely on ‘morphic resonance’ for ‘information’ regarding appropriate modes of
behaviour) is tenable, the difference would be precisely the uniqueness of every finite,
embodied human subject, compared to the transhuman, infinitely escalating aggregate of
information – vast as it already is – which might manifest itself in different forms, such
as the fictional, and transhuman, idoru.


What have the preceding reflections on manifestations of the transhuman in artificial
intelligence research brought to light? The brief examination of Jonze’s Her served the
important objective of providing a kind of paradigmatic instance of what a transhuman AI
would be like, that is, what would make such a being recognisably ‘transhuman’ in its
virtually incomprehensible otherness. This fictional excursion prepared the way for a
brief consideration of David Gelernter’s contention that, when the human mind is
conceived of in terms of a ‘spectrum’ of mental functions covering rational thinking,
daydreaming, fantasy, free association as well as dreaming, the concentration of
mainstream AI-research exclusively on the first of these levels (as a model for AI)
appears seriously flawed. The image of AI that emerges from such
‘computationalist’ research would be truly ‘transhuman’. It was argued further that
Sherry Turkle’s work complements Gelernter’s through her foregrounding of the
irreducible differences between performatively impressive, intelligent and quasi-
affectionate androids (robots) and human beings: unlike humans, the former lack a
personal history. Christopher Johnson’s work, in turn, was shown to focus on the
conditions of engineering AI in the guise of robots that would be convincing simulations
of human beings. Johnson finds in replication through ‘reverse-engineering’ the promise
of successfully constructing such robots. However, his reminder, that human beings are
distinguished by their uniqueness, implies that the difference between a transhuman,
‘neuromorphically’ engineered android and a human being would remain irreducible.
Returning to fiction, William Gibson’s perspicacious exploration of the potential for
artificial intelligence, harboured within the ever-expanding virtual realm of (digital)
information, was used to demonstrate its similarity with successive generations of human
beings in the light of what Sheldrake terms ‘morphic resonance’. This similarity
notwithstanding, however, the transhuman dimension of ‘information’ is evident in the
ontological difference that obtains between its infinitely expanding virtuality and the
finite, embodied and singular human being.


Deleuze, G. and Guattari, F. (1983). Anti-Oedipus. Capitalism and Schizophrenia (Vol.
1). Trans. Hurley, R., Seem, M. and Lane, H.R. Minneapolis: University of Minnesota Press.
Deleuze, G. and Guattari, F. (1987). A Thousand Plateaus. Capitalism and Schizophrenia
(Vol. 2). Trans. Massumi, B. Minneapolis: University of Minnesota Press.
Descartes, R. (1911). Meditations on First Philosophy. In The Philosophical Works of
Descartes, Vol. 1, trans. Haldane, E.S. and Ross, G.R.T. London: Cambridge
University Press, pp. 131-199.
Gelernter, D. (2016). The Tides of Mind: Uncovering the Spectrum of Consciousness.
New York: Liveright Publishing Corporation.
Gibson, W. (1996). Idoru. New York: G.P. Putnam’s Sons.
Johnson, C. (2013). I-You-We, Robot. In Technicity, ed. Bradley, A. and Armand, L.
Prague: Litteraria Pragensia (Kindle edition), location 1841-2253.
Jonze, S. (Dir.) (2013). Her. USA: Warner Bros. Pictures.
Kubrick, S. (Dir.) (1968). 2001: A Space Odyssey. USA: Metro-Goldwyn-Mayer.
Kurzweil, R. (2006). Reinventing humanity: The future of machine-human intelligence.
The Futurist (March-April), 39-46.
(Accessed 15/07/2016).
Lacan, J. (1997). The seminar of Jacques Lacan – Book VII: The ethics of psychoanalysis
1959-1960. Trans. Porter, D. New York: W.W. Norton.
Merleau-Ponty, M. (1962). Phenomenology of perception. Trans. Smith, C. London: Routledge & Kegan Paul.
Olivier, B. (2008). When robots would really be human simulacra: Love and the ethical
in Spielberg’s AI and Proyas’s I, Robot. Film-Philosophy 12 (2), September.
Olivier, B. (2012). Cyberspace, simulation, artificial intelligence, affectionate machines
and being-human. Communicatio (South African Journal for Communication Theory
and Research), 38 (3), 261-278.
Olivier, B. (2013). Literature after Rancière: Ishiguro’s When we were orphans and
Gibson’s Neuromancer. Journal of Literary Studies 29 (3), 23-45.
Scott, R. (Dir.) (1979). Alien. USA: 20th Century-Fox.
Explorations of the ‘Transhuman’ Dimension of Artificial Intelligence 337

Sheldrake, R. (1994). The Rebirth of Nature. Rochester, Vermont: Park Street Press.
Stelarc. (Accessed 23 December 2016.)
Turkle, S. (1984). The second self: Computers and the human spirit. New York: Simon & Schuster.
Turkle, S. (1995). Life on the screen: Identity in the age of the Internet. New York:
Simon & Schuster Paperbacks.
Turkle, S. (2010). Alone together: Why we expect more from technology and less from
each other. New York: Basic Books.
Turkle, S. (2015). Reclaiming Conversation: The Power of Talk in the Digital Age. New
York: Penguin Press.
Von Drehle, D. (2016). Encounters with the Archgenius. TIME, March 7, pp. 35-39.
Welles, O. (Dir.) (1941). Citizen Kane. USA: RKO Radio Pictures.


Dr. Bert Olivier’s principal position is that of Extraordinary Professor of Philosophy
at the University of the Free State, South Africa. He has published academic articles and
books across a wide variety of disciplines, including philosophy, architecture, literature,
psychoanalysis, cinema and social theory. Bert received the Stals Prize for Philosophy in
2004, and a Distinguished Professorship from Nelson Mandela Metropolitan University
in 2012.
Artificial Intelligence (A.I.), vi, vii, x, xi, 1, 2, 17,
20, 21, 71, 72, 101, 102, 118, 119, 142, 147, 150,
168, 169, 230, 252, 277, 278, 295, 298, 299, 301,
10-fold crossvalidation, 86, 87, 94, 96, 99
314, 321, 322, 324, 325, 327, 330, 332, 334, 335,
A Artificial Neural Networks, v, ix, 44, 56, 101, 103,
128, 143, 147, 150, 153, 154, 156, 158, 162, 169,
a Hybrid Simulation System, 262 172, 277, 278, 279, 283, 289, 294, 295, 296, 297,
adaptation, 12, 60, 87, 93, 141, 143, 156, 283 298, 299
adaptation algorithms, 283 assessment, 15, 175, 199, 200, 201, 202, 204, 214,
affection, 321, 327 216, 217, 218, 221, 223, 243, 244, 249, 250, 325,
Agent-Based Modeling Simulation, vi, 255 326
Agent-Based Simulation and Validation, 255, 264, authenticity, 327, 328
275 automation, 50, 51, 63, 73
AI consciousness, 323 autonomous navigation, 50
algorithm, vii, ix, 1, 3, 5, 7, 8, 9, 10, 11, 12, 13, 14, Autonomous Vehicles, v, viii, 49, 50, 51, 52, 54, 60,
17, 18, 20, 26, 27, 28, 29, 32, 37, 54, 55, 56, 57, 61, 62, 63, 70, 71, 73
68, 69, 75, 76, 79, 84, 85, 86, 88, 89, 94, 95, 96,
97, 99, 100, 101, 103, 121, 123, 127, 128, 129,
130, 132, 137, 141, 143, 145, 147, 150, 151, 154, B
156, 157, 158, 160, 162, 163, 164, 167, 172, 177,
backpropagation, ix, 28, 32, 43, 56, 58, 73, 148, 150,
185, 237, 252, 273, 281, 283, 291, 310
151, 154, 156, 164, 187, 283, 291, 297, 310
alliance partners, 256
Algorithm, 28, 32, 151, 153, 164, 310
analytical framework, 301, 304
bacteria, 288, 290, 296
android, 321, 331, 335
banks, 262, 303
ANFIS, 150, 289, 294
base, 14, 133, 168, 173, 189, 197, 198, 203, 204,
antibiotic activities, 288
206, 214, 215, 216, 218, 221, 223, 229, 230, 231,
anti-inflammatory, x, 277, 278, 284, 290, 295
232, 233, 234, 235, 237, 238, 239, 240, 241, 243,
anti-inflammatory drugs, 290
248, 256, 298, 323
antimicrobial, x, 277, 278, 284, 288, 289, 290, 292,
Based on co-association matrix, 11
293, 295, 297, 298
Based on graph partition:, 11
antioxidant, x, 277, 278, 284, 285, 286, 287, 289,
Based on information theory, 12
290, 291, 292, 293, 295, 296, 297, 298, 299
Based on relabeling and voting, 10
Antiviral Activities, 290
architecture design, 156
340 Index

behaviors, ix, 101, 126, 173, 255, 256, 257, 259, chemical characteristics, 150
260, 262, 264, 268 chemical industry, 148
behaviors of customers, 268 chemical interaction, 278, 289
benefits, vii, 56, 172, 257, 304 chemical properties, 148, 281
bias, 56, 68, 80, 155, 156, 157, 160, 163, 199, 209, chemical reactions, 290
223, 279, 292 chemical structures, 15
biggest existential threat, 325 chemicals, 289, 290
Bioactivity(ies), vi, x, 277, 278, 283, 292 chemometrics, 291, 296
bioinformatics, 3, 18, 119 children, 239, 241, 327
biomarkers, 285, 291, 295 China, 143, 303, 306, 307, 308, 312, 316
Boltzmann machines, 29, 56, 71 Chinese medicine, 296
borrowers, 256, 262, 263, 264, 266, 267, 268, 270, chromatography, 284, 294
272 chromatography analysis, 294
brain, 203, 324, 325, 326, 331 chromosome, 158, 160, 175, 176
branching, 23, 24 chromosome representation, 158
Brazil, 49, 50, 73 citizens, 50
bullwhip effect, 125, 126, 144 classes, 2, 6, 39, 54, 57, 64, 67, 69, 86, 200, 202,
business environment, 122 208, 209, 210, 211, 212, 213, 214, 215, 216, 217,
business model, x, 255, 256, 257, 260, 261, 262, 263, 218, 221, 223, 267
269, 270, 273, 304 classification, viii, ix, 8, 15, 20, 22, 32, 45, 46, 49,
business processes, 256, 262, 272 54, 56, 57, 59, 64, 66, 70, 72, 76, 99, 100, 101,
buyer, 262 103, 105, 107, 108, 110, 115, 117, 118, 119, 185,
189, 234, 260, 273, 278, 281, 285, 295, 298
clients, 103, 260
climate change, 303, 307, 308
clustering, vii, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
C++, 84, 99
15, 16, 17, 18, 19, 20, 21, 22, 56
C2C ecommerce, 257, 260, 261
clustering algorithm(s), 4, 5, 7, 9, 10, 11, 12, 14, 15,
cancer, 100, 285, 291, 298
20, 22
carbon, 150, 181, 182, 315
clustering process, 4
case studies, 41, 42, 43, 44, 257, 262
clusters, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
case study, x, 40, 52, 96, 97, 121, 134, 171, 183, 229,
21, 22, 37, 174
241, 242, 255, 257, 261
CMC, 306
case-based reasoning, vi, vii, x, 229, 230, 250, 251,
CNN, 56, 57, 58, 59, 64, 66, 69
CO2, 303, 304
cash, 269, 302
coal, x, 301, 302, 303, 304, 305, 306, 307, 308, 309,
cash flow, 302
311, 312, 313, 314, 315, 316, 317, 318
categorization, 4, 18
co-association matrix, 11, 15, 22
category a, 238, 239, 328
coding, 77, 108, 158
category d, 323
cognition, 61
causal inference, 103
cognitive skills, 52
causal relationship, 308
collective unconscious, 334
CEC, 102
Colombia, 49, 64, 73, 121, 145, 193, 275, 301, 306,
cell culture, 292
challenges, 50, 190, 195, 230, 255, 256, 257, 260,
color space conversion, 66
304, 317
combined automation, 50
changing environment, 172, 257
commerce, vii, x, 75, 76, 77, 99, 175, 273, 274, 275
chemical, x, 15, 20, 148, 150, 278, 281, 283, 286,
communication, 33, 162, 260, 267, 331
288, 289, 290, 291, 292, 294, 299
community, xi, 75, 189, 256, 278
Index 341

comparative analysis, 164 convolution layer, 57, 58

compensatory effect, 308
competition, 72, 122, 258, 317
competitors, 256, 260
complex interactions, 256
Complex Natural Products, vi, x, 277
complexity, viii, x, 7, 11, 15, 23, 24, 37, 39, 40, 41, 44, 45, 75, 76, 106, 159, 172, 173, 183, 189, 192, 196, 199, 200, 201, 209, 210, 212, 216, 218, 219, 221, 229, 230, 232, 255, 256, 257, 259, 260, 261, 278, 279, 284, 285, 289, 290, 292, 293, 306, 310, 314, 326
complications, 328
composition, 277, 278, 284, 285, 286, 287, 289, 296, 298
compounds, 286, 287, 291, 293, 298
computation, 68, 77, 100, 101, 125, 142
computer, 20, 23, 24, 36, 42, 52, 57, 73, 105, 109, 119, 151, 176, 177, 178, 179, 188, 251, 275, 322, 324, 325, 326, 327, 330, 331, 332, 333
computer simulations, 52
computer technology, 327
computing, 19, 33, 34, 35, 36, 54, 68, 69, 73, 100, 109, 160, 164, 171, 173, 174, 279, 293, 298, 319
conception, 279, 326
conceptual model, 199, 224, 273
concurrency, 23, 24
conference, 17, 45, 46, 73, 102, 119, 252, 273, 274
configuration, 30, 35, 36, 40, 156
consciousness, 323, 325
consensus, vii, 1, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 172, 204, 205, 206, 214, 215, 216, 217, 218, 303, 306, 307, 308, 316
consensus clustering, vii, 1, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 19, 20, 22
constituents, 291, 297
construction, 103, 327
consumers, 256, 257, 261, 262, 264, 272, 307, 308
consumer-to-consumer (C2C), 256
  Ecommerce, 255, 273, 274
  Lending, 261
consumption, 148, 307, 308
continuous data, 172
continuous wavelet, 108
control theory, 125, 126, 144
controversial, 50, 288
convergence, vii, ix, 75, 76, 78, 79, 82, 87, 121, 127, 129, 138, 141, 148, 154, 156, 163, 167, 306
conversations, 322, 323
convolutional neural networks, viii, 56, 57, 72
cooling, 147, 148, 149, 150, 151, 168, 169
cooperation, 63, 70, 163
correlation, 57, 108, 117, 184, 283, 298, 308, 312, 316
cosmetic, 278, 285, 294
cost, 84, 122, 126, 182, 230, 245, 250, 259, 260, 262, 272, 304, 309, 312, 314
creative process, 37
creative thinking, 324
crossover, 77, 78, 79, 80, 81, 82, 86, 87, 88, 89, 91, 92, 94, 99, 159, 160, 161, 163, 176, 177, 178, 179
crossvalidation, 76, 77, 86, 88, 94, 99, 100, 101, 184, 311
crowding levels, 197
culture, 173, 174, 290, 325, 329, 334
curcumin, 287, 294
customers, 124, 133, 244, 256, 260, 262, 267, 268
cutting force, ix, 147, 148, 149, 151, 152, 154, 161, 164, 166, 167, 168, 169
cutting forces, ix, 147, 148, 149, 151, 152, 153, 154, 165, 167, 168, 169
cyberspace, 332
cycles, 26, 218, 281, 309
cycling, 132
cyclooxygenase, 290

D

Darwinian evolution, 172
data analysis, 3, 16, 19, 21, 64, 151
data augmentation, 64, 66
data gathering, 304
data generation, 24
data mining, 2, 17, 19, 20, 22, 100, 110, 174, 305, 309, 318
data processing, 153
data resampling, 9
data set, 3, 15, 16, 64, 65, 69, 99, 117, 152, 153, 160, 166, 167, 186, 292, 314
database, 3, 18, 64, 174, 197, 203, 204, 205, 206, 209, 230, 234, 309, 311
database management, 230
DCT, 108, 109, 111, 112, 113, 114
decision makers, 172, 196, 230
decision making, 171, 196, 197, 250, 251, 255, 256, 261, 274
342 Index

decision trees, 234
decision-making process, 305
declarative knowledge, 52
decomposition, 40, 108, 110
deep belief neural networks, 23, 28, 29, 44
deep learning, v, vii, viii, 23, 24, 39, 44, 45, 49, 51, 52, 54, 56, 57, 59, 60, 63, 66, 70, 172
defuzzifier, 197, 203, 206
degradation, 230, 286, 299
Delphi methodology, x, 301, 303, 304, 305, 306, 316, 318
demand patterns, 256, 273
depth, 57, 58, 150, 324
derivatives, 68, 78, 129, 144, 295, 296, 306
Descartes, René, 323
detection, 34, 40, 117, 118
deviation, 86, 90, 91, 93, 124, 125, 161, 164
differential equations, 257, 259
dimensionality, 45, 59, 77, 106
discrete behaviors, 255, 257
Discrete Events Simulation, 229, 230
discrete variable, 173
discriminant analysis, 107, 118, 288
discrimination, 106, 119
distinctness, 321
distribution, 40, 67, 125, 131, 264, 283
divergence, 32
diversification, 107
diversity, 4, 5, 14, 15, 17, 18, 79, 87, 106, 159, 160, 163, 304, 305
DNA, 291, 299
DNA damage, 291, 299
dreaming, 324, 335
Dropout Layer, 59
dynamic systems, 126

E

E. coli, 288, 289, 298
e-commerce, vi, x, 102, 255, 256, 257, 258, 259, 260, 261, 269, 272, 274
economic growth, 230
economic landscape, 258
economics, 308, 317
educational background, 145, 193, 275, 318, 319
electricity, 278, 307, 308, 309
electroencephalography, 46
electromagnetic, 142
embodiment, 321, 322, 331
emergency department, ix, 195, 196, 202, 206, 209, 221, 224, 225, 232, 250, 251, 252
emergency physician, 196
emotion, 324, 325
employees, 233, 250
employment, 122, 124, 263
energy, 30, 109, 278, 304, 308, 309, 316, 317
engineering, 18, 22, 46, 73, 153, 175, 183, 189, 192, 275, 319, 327, 331, 335
Ensemble clustering, 13, 18, 21
environment, x, 44, 50, 51, 60, 61, 70, 77, 94, 105, 151, 156, 172, 175, 182, 183, 221, 230, 244, 255, 257, 259, 262, 268, 308, 330, 332
equilibrium, ix, 121, 124, 125, 136, 138, 141
equilibrium point, 125, 136, 138, 141
equipment, 133, 174, 182
ergonomics, 47, 192
essential oils, 278, 285, 288, 289, 290, 291, 292, 295, 297
evidence, 18, 19, 308
evolution, 96, 109, 150, 156, 158, 159, 175, 179, 224, 295
Evolutionary algorithms, 69, 77, 96, 127, 158, 163, 171, 172, 175, 190
evolutionary computation, 163
Evolutionary Optimization, v, viii, 75, 101
Evolutionary Programming (EP), 172, 176
Evolutionary Strategies (ES), 172, 176
exchange rate, 307
execution, 24, 25, 28, 44, 58, 173, 258
exercise, 176
experimental condition, 294
experimental design, 284
Expert Knowledge, 196, 207
expert systems, 172
expertise, 103, 145, 193, 205, 259, 273, 275, 318
exploitation, 78, 82, 284, 307, 308
exponential functions, 109
external environment, 60
extraction, 2, 70, 105, 106, 107, 117, 119, 156, 284, 298, 308
extracts, 29, 59, 277, 278, 286, 287, 290, 297, 298
extrusion, 284, 296

F

fabrication, 182
face validity, 248, 250
facial expression, 329
fantasy, 324, 328, 335
Fast Fourier transform, 109
feature selection, 18, 77, 96, 108
feelings, 328, 329
FFT, 109, 111, 112, 113, 114, 117
filters, 57, 58, 59, 106, 184
financial, x, 250, 256, 257, 259, 302, 303, 306
financial institutions, 256
financial markets, 303, 306
fitness, 77, 78, 79, 80, 82, 87, 90, 91, 92, 93, 94, 95, 99, 130, 131, 158, 159, 160, 161, 162, 163, 164, 175, 176, 179
flank, 149, 150
flavonoids, 286, 299
flexibility, 78, 99, 127, 174
flight, 162, 183, 186, 333
fluctuations, 124, 126, 136, 272, 278, 303, 304
food industry, 277, 278, 285, 294
food products, 283, 285
food safety, 283, 294
food spoilage, 288
force, 126, 149, 150, 151, 154, 161, 164, 165, 166, 167, 169, 292
forecasting, 122, 278
formula, 59, 108, 127, 131, 132, 206
foundations, 5, 7, 75, 76
fractal analysis, 299
free association, 324, 335
freedom, 62, 310
Freud, Sigmund, 324
Full Self-Driving Automation, 51
function values, 127
Function-specific, 50
fungi, 290
fusion, 109, 114, 115, 117, 174, 189, 190, 192
fuzzifier, 197, 203, 205, 206
fuzzy inference engine, 197, 198, 203, 204, 206, 214
fuzzy inference systems, 150, 199
Fuzzy logic, 72, 196, 201, 203
  system, 197, 198, 199, 200, 203, 298
fuzzy membership, 200
fuzzy rule base, 203, 204, 214, 215, 216
fuzzy rules, 197, 198, 200, 201, 202, 203, 204, 205, 214, 215, 216, 217, 218
fuzzy sets, 198
fuzzy theory, 12

G

GA-Based Artificial Neural Networks, 158
gene expression, 17, 18, 19
gene pool, 159
generalizability, 106, 117
generalization performance, viii, 75, 96, 99, 101, 107
generalized cross validation, 314
Genetic Algorithms (GAs), v, vii, viii, 17, 20, 75, 76, 77, 78, 79, 84, 85, 88, 89, 94, 95, 96, 97, 99, 100, 101, 102, 103, 123, 142, 143, 144, 147, 148, 151, 158, 172, 176, 177, 179, 230, 252, 283, 291
genetic code, 82
genetic diversity, 82, 87
genetic endowment, 330
genetic programming, vii, ix, 96, 171
Genetic Programming (GP), v, vii, ix, 96, 171, 172, 175, 176, 180, 183, 185, 188, 190
genetics, 175, 283
GenIQ System, 185
Germany, 50, 144, 303, 306
global competition, 122
global economy, 260
global markets, 122
global warming, 303
glucosinolates, 287
glutathione, 291
Google, 51, 119, 325
gram stain, 288
graph, 11, 12, 13, 15, 17, 18, 55, 187
greedy algorithm, 29
greedy layer-wise algorithm, 32
growth, 230, 258, 274, 283, 288, 291
guidance, 174
guidelines, 273

H

Hawking, Stephen, 325
health, 103, 182, 192, 198, 200, 226, 233, 251, 252, 291, 297, 299
health care, 198, 200, 226, 252

Healthcare, vi, ix, x, 103, 195, 196, 204, 205, 224, 225, 229, 251
healthcare experts, ix, 196, 197, 204, 221, 229, 248
healthcare sector, vi, x, 206, 229, 250
healthcare services, 196, 230, 243, 251
Heidegger, Martin, 323
herbal extracts, 278
herbal medicines, 284
heterogeneity, 96
hierarchical clustering, 4, 7, 13, 14, 20
Hierarchical fuzzy systems, 198
hierarchical model, 63
high performance computing, vii
High performance manufacturing, 148
high pressure cooling, 147, 148, 168
high-pressure cooling, 148, 149, 169
Hill-climbing methods, 129
hiring, 134
histogram, 109
historical data, 50, 185, 248, 305, 309
history, x, 264, 272, 290, 303, 317, 318, 322, 325, 329, 330, 332, 335
Hopfield neural networks, 56
HPC, 147, 148, 149, 151, 154, 164, 167
human, xi, 37, 47, 51, 52, 63, 117, 173, 195, 203, 259, 278, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 335, 336, 337
human behavior, 173
human body, 323, 325, 326, 331
human brain, 323, 326
human condition, 332
human development, 323
human experience, 328
husbandry, 278
hybrid, ix, 13, 26, 27, 96, 121, 122, 129, 141, 143, 145, 261, 262, 274, 275, 319, 323
hybrid algorithm, ix, 27, 129, 141
hybrid optimization, 121
Hybrid simulation, 257, 261
hypercube, 275

I

image, 8, 18, 20, 56, 57, 58, 59, 64, 66, 71, 72, 82, 83, 105, 107, 108, 109, 110, 111, 115, 117, 118, 119, 326, 333, 335
improvements, 223, 302
in vitro, 285, 290, 295, 298
income, 2, 49, 260, 262, 263, 264, 269, 270, 271, 272, 302
incompatibility, 292
independence, 132
independent variable, 202, 278, 296
indexing, 233, 234, 238, 250
India, 192, 306, 307, 308
individuals, xi, 77, 79, 80, 82, 88, 90, 94, 96, 97, 158, 159, 162, 175, 176, 177, 179, 185, 186
induction, 52, 234, 239, 243, 287
induction period, 287
industrialization, 303
industry, 103, 145, 147, 148, 193, 225, 226, 232, 255, 262, 275, 278, 283, 285, 304, 318
inertia, 131, 137, 163
information processing, 72, 73, 118, 153
information retrieval, 3, 18
infrastructure, 33, 122, 195, 258
ingredients, 278, 284
inhibition, 290, 298
inositol, 287
instability, 121, 124, 125, 126, 141, 145
institutions, 203, 309
integration, 77, 197, 255
integrity, 174
intelligence, xi, 16, 17, 18, 20, 21, 176, 225, 251, 288, 321, 322, 324, 325, 330, 331, 332, 334, 335, 336
intelligent systems, 51, 257
intensity values, 286, 299
intensive care unit, 243
interface, 63, 148, 174, 241, 262, 268
interoperability, 174
intervention, 259, 291
intimacy, 328
investment, 124, 257, 269, 302
ionizing radiation, 299
issues, 47, 50, 125, 173, 183, 257
iteration, 69, 123, 128, 130, 131, 132, 137, 158, 162, 163

J

Japan, 117, 143, 303, 333
Jordan, 8, 19
justification, 245

K

kaempferol, 286
kernel method, 12
knowledge acquisition, 42, 230
knowledge base, 197, 198, 203, 204, 206, 214, 216, 218, 221, 223
Knowledge discovery, 183
Korea, 145
Kuwait, 251

L

labeling, 10, 11
layered architecture, 63
LC-MS, 297
learning ability, 310, 313
learning blocks, 29
Learning by analogy, 53
Learning from examples, 53
Learning from instruction, 53
learning methods, 3, 52
learning process, viii, 52, 53, 56, 66, 156, 157, 263
learning task, 54, 156
legend, 214, 215
lending, x, 255, 256, 261, 268, 269, 272
Lending Club, 257, 261, 263, 270, 272
light, 83, 329, 335
Limited Self-Driving Automation, 51
Linear Discriminant Analysis, 106, 297
linear model, 122, 298, 314
linear programming, 15
linear systems, 125
linoleic acid, 285
lipid peroxidation, 291
liquidity, 257, 261
Listeria monocytogenes, 289, 298
liver, 111, 112, 113, 114, 116
loans, 263, 266, 267, 268
local adaptation, 12
Local Ternary Patterns, 107
long-term customer, 133

M

Machine Learning, v, viii, 2, 19, 20, 21, 44, 45, 49, 50, 51, 52, 53, 56, 57, 61, 62, 63, 70, 71, 72, 73, 75, 100, 101, 102, 103, 108, 117, 118, 119, 145, 172, 185, 189, 190, 192, 193, 230, 231, 275, 295, 302, 317, 318
machine pattern recognition, 106
magnitude, 109, 124, 293, 306
majority, 8, 185, 221, 324, 326, 329
management, viii, 24, 25, 27, 34, 35, 36, 41, 42, 43, 44, 122, 125, 133, 134, 135, 136, 144, 145, 174, 189, 193, 204, 206, 225, 256, 258, 259, 260, 261, 267, 271, 273, 275, 318, 319
Mandela, President Nelson, 337
manipulation, 62, 106
manpower, 139
manufacturing, vii, ix, 121, 122, 125, 126, 133, 138, 139, 143, 148, 169, 182, 274, 319
mapping, 70, 261, 275
market share, 133, 260, 271
marketing, 173, 189, 308
marketplace, 259
Markov autoregressive input-output model, 51
Markov Chain, 32
materials, vii, 134, 135, 147, 148, 168, 169, 173, 181, 182, 188, 190
mathematical programming, 122
mathematics, 126
matrix, viii, 11, 13, 15, 19, 22, 29, 43, 58, 105, 106, 107, 108, 109, 110, 111, 117, 118, 119, 185, 189, 238, 239
matter, 66, 203, 204, 207, 209, 326, 328, 331, 334
measurement, ix, 2, 46, 133, 151, 196, 282, 289, 295
media, vii, 102, 292
median, 8, 10, 12, 13
mediation, 256, 334
medical, 133, 209, 224, 225, 226, 233, 234, 241, 244, 250, 251, 252
medical expertise, 209
medicine, vii, x, 206, 224, 251, 277, 290
membership, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 218
memory, 33, 40, 53, 69, 107, 130, 163, 324, 325, 331, 334
mental activity, 324
Merleau-Ponty, 323, 325, 326, 332, 336
message passing, 33
messages, 25, 26, 28, 36, 52, 267
methodology, ix, x, 9, 12, 15, 24, 41, 43, 82, 92, 93, 100, 122, 123, 126, 143, 173, 179, 189, 232, 234, 235, 236, 239, 241, 247, 252, 259, 274, 284, 297, 301, 303, 304, 305, 316, 317
microorganisms, 288, 289

Microsoft, 60, 69, 102
mind, 321, 323, 324, 325, 326, 327, 335
Missouri, 46, 105, 119, 192
mixed discrete-continuous simulations, 262
mobile robots, 60, 63, 72
model specification, 278
modeling environment, 244
modelling, 145, 149, 193, 251, 275, 294, 296
models, vii, viii, ix, x, 19, 25, 29, 30, 49, 61, 62, 69, 70, 71, 75, 76, 77, 94, 96, 97, 98, 99, 101, 103, 123, 124, 126, 127, 142, 143, 144, 145, 147, 149, 151, 153, 165, 167, 171, 172, 173, 174, 179, 183, 184, 185, 187, 189, 248, 250, 255, 256, 257, 258, 259, 261, 262, 272, 273, 278, 285, 289, 290, 295, 298, 301, 303, 304, 308, 309, 312, 314
modifications, 82
momentum, 156
motion control, 60, 62
multi-class support vector machine, 51
multidimensional, 2, 17, 128, 129, 162
multiple regression, 224
multiplier, 123
Multivariate Adaptive Regression Splines, 314
Musk, Elon, 325
Mutation, 77, 78, 80, 81, 82, 86, 87, 88, 89, 92, 93, 94, 99, 100, 102, 159, 160, 161, 163, 176, 179, 180, 191
mutation rate, 81, 82, 87, 89, 92, 93, 99, 100, 102
myocardial infarction, 295

N

narratives, 328
NASA Shuttle, 46, 171
natural evolution, 77, 94, 172
natural gas, 304
natural products, x, 277, 278, 283, 285, 288, 292, 294, 299
natural selection, 175
near infrared spectroscopy, 295
Nearest Neighbor Approach, 237
Neural Network Model, 296
Neural Networks, v, vi, vii, ix, x, 17, 18, 22, 23, 24, 28, 32, 39, 44, 45, 54, 55, 56, 57, 59, 60, 66, 67, 69, 70, 71, 72, 73, 75, 76, 97, 101, 102, 119, 123, 142, 143, 147, 148, 153, 154, 156, 168, 169, 184, 187, 188, 230, 255, 263, 264, 273, 277, 278, 279, 281, 283, 288, 289, 293, 294, 295, 296, 297, 298, 299, 301, 304, 309, 310, 311, 312, 313, 314, 316, 318
neurons, 55, 57, 58, 59, 60, 153, 154, 155, 156, 164, 263, 264, 279, 281, 292, 293, 310, 311, 312, 314
New South Wales, 306
next generation, 82, 90, 179
nickel, ix, 147, 148, 149, 151, 168
nickel-based alloys, 148, 149, 168
Nietzsche, Friedrich, 323
NIR, 287
NMR, 299
No-Automation, 50
nodes, 11, 24, 25, 26, 28, 36, 37, 39, 55, 56, 59, 156, 157, 160, 238, 239, 314
nonlinear dynamic systems, 124
nonlinear systems, 122, 125, 127
normal distribution, 67
N-P complete, 8
Nuclear Magnetic Resonance, 286, 297
numerical analysis, 125
nurses, 196, 219, 221, 230, 234, 237, 242, 243, 244, 245, 246

O

observed behavior, 122
obstacles, 62
oil, 286, 287, 288, 289, 294, 295, 297, 298, 303, 304, 308, 312
oil samples, 286
oligopolies, 302, 304
oligopoly, 303
one dimension, 59, 106, 107
operating system, 322, 323
operations, vii, 55, 68, 128, 133, 145, 176, 189, 204, 206, 225, 229, 232, 233, 250, 262, 273
operations research, vii, 189
opportunities, 70, 148, 172, 260, 304
optimal PSO parameters, 164
optimization, vii, viii, 8, 10, 14, 17, 19, 24, 54, 56, 68, 71, 75, 76, 77, 79, 96, 101, 102, 106, 121, 122, 123, 125, 126, 127, 128, 129, 130, 132, 136, 137, 141, 142, 143, 144, 145, 147, 148, 151, 156, 157, 158, 160, 161, 162, 163, 164, 172, 175, 190, 193, 230, 233, 243, 248, 251, 275, 278, 284, 318
optimization method, 122, 142, 162, 163
ordinary differential equations, 259

organic compounds, 284
Overcrowding, vi, ix, 195, 196, 221, 225
overtime, 133, 134
oxidation, 285, 287, 294, 297, 298
oxidative stress, 291
oxygen, 298

P

parallel, vii, viii, 23, 24, 33, 34, 39, 40, 42, 43, 54, 80, 142, 275, 319, 333
Parallel distributed discrete event simulation (PDDES), viii, 23, 24
Parallel Distributed Simulation, viii, 24
parallel implementation, 80
parallelism, 24, 25, 40
parents, 79, 81, 88, 91, 92, 94, 99, 159, 175, 179, 239
Partial Least Squares, 297
partial least-squares, 299
participants, 174, 255, 262, 267, 306, 307
Particle Swarm Optimization, v, vii, ix, 14, 17, 121, 127, 142, 143, 145, 147, 148, 151, 158
partition, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 21
Pattern Classification, viii, 105
pattern recognition, viii, 18, 44, 54, 71, 77, 100, 106, 107, 119, 278
PCA, 106
peer-to-peer (P2P), 262
peer-to-peer lending, x, 255, 256, 268
pegging, 21
perception, 49, 50, 51, 56, 60, 61, 63, 195, 325, 336
performance indicator, 111, 260
performance rate, 81
permeability, 288, 308
personal goals, 265
personal history, 332, 335
personality, 330, 333
pH, 284, 297, 298
pharmaceutical, 103, 283, 284, 294
pharmacology, 278, 285
phenolic compounds, 287, 296, 298
phenomenology, 326
physical characteristics, 150
physical laws, 259
policy, ix, 21, 72, 122, 125, 126, 127, 129, 133, 135, 136, 137, 138, 139, 140, 142, 143, 144, 224, 259, 261
policy development, 261
policy options, 125
Policy Robustness, 139
polysaccharides, 299
polyunsaturated fat, 286
polyunsaturated fatty acids, 286
population, 18, 77, 79, 87, 94, 95, 127, 158, 159, 160, 161, 162, 163, 164, 175, 176, 177, 179, 230
population growth, 230
population size, 159, 161, 164, 179
Powell Hill-Climbing Algorithm, 129, 132
power generation, 303
predictability, 185, 186, 258
Predictive analytics, v, vi, ix, x, 171, 172, 173, 179, 183, 189, 301, 302, 305
predictive modeling, 171, 173, 179, 183
predictor variables, 314
principal component analysis, 106
principles, 43, 173, 175, 277, 289
probability, 14, 15, 20, 29, 30, 32, 44, 59, 79, 80, 81, 87, 92, 158, 159, 161, 176, 179, 258, 268
probability distribution, 29, 30, 32, 258
profit, 145, 259, 262, 269
profit margin, 262, 269
profitability, 269, 271, 273
programming, 33, 36, 100, 151, 171, 178, 189, 191
project, 47, 49, 64, 69, 174, 179, 186, 189, 275, 302, 304, 319
propagation, ix, 56, 60, 150, 154, 156, 283, 287, 291, 296, 297
pruning, 103, 314
PSO-Based Artificial Neural Networks, 162
psychoanalysis, 330, 336, 337
psychology, 173, 324, 325

Q

quadratic programming, 101