An intelligent simulation model of online consumer behavior

Article  in  Journal of Intelligent Manufacturing · August 2012

DOI: 10.1007/s10845-010-0439-7



2 authors:

Gültekin Çağil Mehmet Bilgehan Erdem

Sakarya University University of Ontario Institute of Technology


J Intell Manuf (2012) 23:1015–1022
DOI 10.1007/s10845-010-0439-7

An intelligent simulation model of online consumer behavior

Gültekin Çağil · Mehmet Bilgehan Erdem

Received: 14 July 2009 / Accepted: 12 July 2010 / Published online: 30 July 2010
© Springer Science+Business Media, LLC 2010

Abstract This paper describes the design of an Intelligent Simulation Model of Online Consumer Behavior (ISMOCB) that incorporates a knowledge base built with Artificial Intelligence methods such as the Naïve Bayes Classifier and Artificial Neural Networks. The study models online consumer behavior using demographic characteristics such as age, gender, marital status, educational status, monthly income and number of people in the family. The model makes it possible to produce more synthetic data and to create an “Artificial Database” that includes the demographics of online consumers and their purchase transactions. The model is built for online shopping based on empirical data gathered in Turkey via an online survey. Two different inference systems are used to determine which product group is chosen by consumers with which demographic characteristics. The quality of the data, gathered exclusively for this project, allows a fine validation of the simulation results.

Keywords Intelligent simulation · Data mining · Artificial Database · Bayesian classification · Artificial Neural Networks · Consumer behavior

G. Çağil · M. B. Erdem (corresponding author)
Department of Industrial Engineering, Sakarya University, Adapazari, Esentepe, Turkey

Introduction

Online sellers have long been able to design a useful and aesthetically appealing Web site, collect detailed customer information, and develop electronic relationships with customers and suppliers. These practices are established, well tested and maturing, so electronic marketing is tackling larger problems these days. The factors that affect online customers’ buying provide valuable quality assurance, service and marketing data. The challenge, however, is to use the data to make decisions that result in substantive action (Xue-wu et al. 2006). To use research data to solve problems in design, marketing, installation, distribution and after-sale use and maintenance, one should have a basic understanding of the relationship between purchasing behavior and consumer demographics.

E-commerce is a type of electronic trade based on advanced computer network and communication technology platforms, which brings a dramatic qualitative change to the concept of the traditional market in both the time scale and the spatial dimension. E-commerce blurs the market boundaries that arise from differences in geography, political opinion and ideology. The rapid development of e-commerce has drawn vast attention from government, enterprises and academic scholars, and research focuses on two aspects: electronic commerce technology and marketing management in the e-commerce environment. For marketing management, the key research point lies in the modeling of consumer behavior. However, consumer demand and behavior are so complicated that it is almost impossible to establish a completely quantitative model, because we can hardly collect all the precise data on product sales revenue, related marketing activities and the large quantity of consumer information. We can only do quantitative analysis to make all the possible predictions of consumer behaviors based on many suppositions (Liu 2007).

For this kind of complex, hybrid system, which includes continuous events, discrete events and decision-making events, the modeling process involves many nonstructural, non-quantitative matters that are too complex to describe with traditional mathematical methods; artificial intelligence


technologies can be used to solve this complex problem. The Intelligent Simulation Model of Online Consumer Behavior (ISMOCB) can handle this nonstructural, qualitative matter and reach the goal through reasoning.

The decision process of the online consumer models in the simulation is rendered by different inference systems. Online consumers moving through the intelligent simulation model are free to buy or not to buy, and to choose among alternatives. This model therefore makes it possible to derive significant data from limited survey data. An “Artificial Database” that includes the demographic attributes of online consumers can be built with this synthetic dataset.

Unlike previous studies on demographic modeling of online consumer behavior, this study considers the demographic attributes age, gender, marital status, educational status, monthly income, occupational status and number of people in the family at the same time.

The paper is structured as follows: first a brief introduction; second, a review of related work; third, the definition of the demographic attributes; fourth, the ISMOCB with different artificial intelligence techniques (Bayesian Classification and Artificial Neural Networks) and their implementations; and finally the conclusions.

Related work

When investigating the behavior of complex systems, the choice of an appropriate modeling technique is very important (Robinson 2004). To inform the choice of modeling technique, the relevant literature spanning the fields of Economics, Social Science, Psychology, Retail, Marketing, OR, Artificial Intelligence, and Computer Science has been reviewed. Within these fields a wide variety of approaches are used, which can be classified into three main categories: analytical methods, heuristic methods, and simulation. Often a combination of these is used within a single model (Greasley 2005; Schwaiger and Stahmer 2003; Siebers et al. 2009). After a thorough investigation of the relevant literature, simulation has been identified as the most appropriate approach for our purposes.

Although expectations of computer simulation were quite high in the 1960s, it did not flourish in the social and behavioral sciences. Only in the 1990s, as computer processing speed increased, did many scholars again pick up simulation as a tool to study different social processes. Owing to the breadth of the social sciences, many simulation models have emerged recently, ranging from pure game-theory simulation to micro-world simulation. However, no clear group of social scientists working on computer simulation has come into being. Axelrod thought that this was because simulation efforts were scattered across journals outside the social sciences, and that the purpose of simulation focuses on three points: process forecast, principle test and result discovery (Axelrod 1997).

Demographic attributes

Before building a simulation model one needs to understand the particular problem domain (Chick 2006). In order to gain this understanding, some demographic attributes have been investigated.

The search for the right demographic variables can be seen in some of the latest efforts by e-retailers to attract and retain their customers. Wallace (2000) notes the emphasis on women “as the salvation for some dot-commerce companies”. The increasing use of the Internet by women, their growing economic power and their dominant influence on household shopping behaviors are positive reasons for this emphasis (Wallace 2000). Others are concerned that women-focused sites run the risk of missing out on male shoppers. One such site, for instance, was strongly positioned for the women’s apparel market, but disappointing results ended its tenure.

A brief literature review supporting the hypotheses regarding the following demographic characteristics of a consumer (gender, age, education, total gross monthly household income, and family composition) is reported below.

Gender: Women and men seem to differ in their shopping orientation. Although sex roles have blurred, shopping is still a gendered activity, particularly in married households (Fisher and Arnold 1990). For working women, shopping, along with other household tasks, becomes a particular challenge (Thompson 1996) and can be associated with negative feelings. Time-pressured working women, in particular, have been observed and targeted by direct marketers (Schiffman and Kanuk 1997). Shopping is also recreational for women, generating positive feelings (Bellenger and Korgaonkar 1980). The influence of gender on shopping is likely to be quite complex.

Age: Consumers’ needs, interests, and resources vary according to age. Previous studies indicate that consumer innovativeness is lower among older consumers. Older consumers also tend to be satisfied with conventional shopping methods. May and Greyser (1989), on the other hand, report mixed evidence regarding the influence of age on consumers’ tendency towards non-store shopping.

Education: Adoption behavior is likely to be influenced by education, as innovators tend to be more educated than non-adopters. Previous studies have found a positive relationship between in-home shopping and education (Darian 1987; Donthu and Garcia 1999). What is more, Internet users


have above-average education (Darian 1987; Hoffman et al. 1996).

Income: Previous studies support the positive relationship between in-home shopping and consumer income. Darian (1987) found that households in the middle income groups were most likely to be in-home shoppers. Internet-using households are also likely to have higher than average income (Darian 1987; Hoffman et al. 1996).

Family Composition: The number and age of children constitute a major component of the family and household life cycle concept (Gilly and Enis 1982; Schaninger and Danko 1993). The presence or absence of children, as well as the age of the youngest child, has a significant influence on households’ needs, resources and expenditures (Solomon 1999). Darian (1987) found housewives and part-time female workers with preschool children to be one group of potential in-home shoppers.

Intelligent simulation model

Methods and concepts

Simulation can be used to analyze the operation of dynamic and stochastic systems, showing their development. There are many different types of simulation, each of which has its specific field of application (Siebers et al. 2009). There has been a fair amount of modeling and simulation of operational management practices, but people management practices have often been neglected, although research suggests that they crucially impact upon an organization’s performance (Schiffman and Kanuk 1997). One reason for this relates to the key component of people management practices, an organization’s people, who may often be unpredictable in their individual behavior.

In the recent past, interest in the research and application of Artificial Intelligence (AI) simulation models has increased substantially. Successful application areas include aircraft control, missile control, human behavior representation, and computer generated forces (CGF) (Liu 2007).

The overall aim of our study is to investigate the link between different demographic attributes of online consumers and the product groups chosen by them. This strand of work focuses on the inference system that decides whether or not a consumer buys a product from a product group. This paper aims to understand and predict how the Intelligent Simulation Model of Online Consumer Behavior (ISMOCB) can determine the decision processes of new consumers in the model. In this manner more synthetic data can be derived from the knowledge base of the intelligent model.

The inference systems used in the model are introduced below. These are Bayesian Classification and Artificial Neural Networks.

Simulation strategy

The system simulation strategy consists of producing a consumer population and letting its members make choices. Each time the simulation clock is advanced by one simulation unit, the consumer makes purchase and choice decisions. The expert knowledge base and reasoning engine find the events whose occurrence conditions are satisfied, handle them accordingly, and change the values of the system state variables (each demographic attribute of the consumer, the product groups, etc.). The system state changes from S(C) to S(C + ΔC). As the simulation clock is advanced in this way, the simulation system runs continuously. The simulation flow diagram is shown in Fig. 1.

Fig. 1 System simulation flow diagram (boxes, in order: initialize system data, set the simulation clock, initialize the system’s state and events; use the knowledge base and inference engine to find which transactions will occur; deal with events; advance the simulation clock; update system state and statistic data; Continue? [YES loops back]; print simulation results)

Naïve Bayes Classifier

Bayesian decision theory is the basis of statistical classification methods (Duda and Hart 1973). It provides the fundamental probability model for well-known classification procedures such as statistical discriminant analysis.

The Naive Bayes Classifier technique is based on the so-called Bayesian theorem and is particularly suited when the


dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods (Cagil et al. 2008).

Naive Bayes classifiers can handle an arbitrary number of independent variables, whether continuous or categorical. Given a set of variables, $X = \{x_1, x_2, x_3, \ldots, x_n\}$, we want to construct the posterior probability for the event $C_j$ among a set of possible outcomes $C = \{c_1, c_2, c_3, \ldots, c_d\}$. In more familiar language, $X$ is the set of predictors and $C$ is the set of categorical levels present in the dependent variable. Using Bayes’ rule:

\[ p(C_j \mid x_1, x_2, x_3, \ldots, x_n) \propto p(x_1, x_2, x_3, \ldots, x_n \mid C_j)\, p(C_j) \]

where $p(C_j \mid x_1, x_2, x_3, \ldots, x_n)$ is the posterior probability of class membership, i.e., the probability that $X$ belongs to $C_j$. Since Naive Bayes assumes that the conditional probabilities of the independent variables are statistically independent, we can decompose the likelihood into a product of terms:

\[ p(X \mid C_j) \propto \prod_{k=1}^{n} p(x_k \mid C_j) \]

and rewrite the posterior as:

\[ p(C_j \mid X) \propto p(C_j) \prod_{k=1}^{n} p(x_k \mid C_j) \]

Using Bayes’ rule above, we label a new case $X$ with the class level $C_j$ that achieves the highest posterior probability.

Although the assumption that the predictor (independent) variables are independent is not always accurate, it does simplify the classification task dramatically, since it allows the class conditional densities $p(x_k \mid C_j)$ to be calculated separately for each variable, i.e., it reduces a multidimensional task to a number of one-dimensional ones. In effect, Naive Bayes reduces a high-dimensional density estimation task to one-dimensional kernel density estimation. Furthermore, the assumption does not seem to greatly affect the posterior probabilities, especially in regions near decision boundaries, thus leaving the classification task unaffected (Statsoft 2008).

Intelligent simulation of online consumer behavior with Naïve Bayes Classifier

The demographic attributes considered in this paper are gender, age, marital status, educational status, occupation, monthly income and number of people in the family. The online consumers’ demographic attributes are presented in the columns of the matrix in Table 3. Gender groups online consumers as male or female, while age assigns consumers to one of the groups in Table 1.

Table 1 Age group

Group  Age
1      <18
2      18–24
3      25–34
4      35–44
5      44–59
6      >59

Table 2 Product group

Group  Product group
1      Cells and Mobile Phones
2      Computers and Comp. Parts
3      Books
4      Health and Cosmetics
5      Electronics
6      Clothes
7      Mp3 Players
8      Music
9      E-Ticket
10     Others

The marital status is defined as single, married or other. The educational status is defined as primary school, secondary school, high school, college, undergraduate, and graduate or higher. The occupational status is defined as public, private, own, student and unemployed.

The product groups used in the model are categorized as in Table 2.

Empirical data was collected through a mail survey on shopping behavior in Turkey. The simulation model is based on this empirical data. An example view of this dataset is given in Table 3.

The simulation model is built using this empirical data. Each attribute’s data was inspected to determine which statistical distribution it fits. Based upon these statistical distributions, online consumers with various attributes are generated randomly. Furthermore, the Naive Bayes Classification method is used to determine which product group is chosen by consumers with which demographic characteristics.

In the simulation model, an online consumer, the consumer’s choice, and the number of consumers that have the same attributes and made the same decision are represented as a row of the matrix. A brief example of simulation outputs is shown in Table 4.
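The Naive Bayes decision rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the training rows are the five example rows visible in Table 3 (far too few for a real model), and the Laplace smoothing term `alpha` is an added assumption that keeps unseen attribute values from producing zero probabilities.

```python
from collections import Counter, defaultdict
import math

def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(k, c)][v]: number of class-c rows whose k-th attribute equals v
    value_counts = defaultdict(Counter)
    levels = defaultdict(set)  # distinct values observed for attribute k
    for x, c in zip(rows, labels):
        for k, v in enumerate(x):
            value_counts[(k, c)][v] += 1
            levels[k].add(v)
    return class_counts, value_counts, levels

def log_posteriors(x, model, alpha=1.0):
    """Unnormalised log p(C_j) + sum_k log p(x_k | C_j).

    alpha is a Laplace smoothing constant (an assumption, not from the paper).
    """
    class_counts, value_counts, levels = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)              # log prior
        for k, v in enumerate(x):
            num = value_counts[(k, c)][v] + alpha
            den = n_c + alpha * len(levels[k])
            score += math.log(num / den)           # log class-conditional term
        scores[c] = score
    return scores

def predict(x, model):
    """Return the class with the highest posterior score."""
    scores = log_posteriors(x, model)
    return max(scores, key=scores.get)

# Toy data: the example rows shown in Table 3
# (gender, age, marital, education, occupation, income, family) -> product group
rows = [
    (2, 1, 2, 4, 1, 3, 3),
    (1, 3, 1, 2, 4, 7, 4),
    (2, 2, 2, 3, 4, 9, 4),
    (1, 4, 2, 3, 2, 1, 3),
    (1, 4, 1, 1, 3, 5, 2),
]
labels = [1, 2, 3, 4, 2]

model = fit_naive_bayes(rows, labels)
print(predict((1, 3, 1, 2, 4, 7, 4), model))
```

Working in log space avoids numerical underflow when many attribute terms are multiplied, and the proportionality sign in the posterior means the scores never need to be normalised before taking the maximum.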


Table 3 Example of empirical dataset collected through online survey

Gender  Age  Marital  Educational  Occupational  Monthly  Family       Product
             status   status       status        income   composition  group
2       1    2        4            1             3        3            1
1       3    1        2            4             7        4            2
2       2    2        3            4             9        4            3
1       4    2        3            2             1        3            4
…       …    …        …            …             …        …            …
1       4    1        1            3             5        2            2

Table 4 A brief example of simulation outputs

ID   Gender  Age  Marital  Educational  Occupational  Monthly  Family       Product  Quantity
                  status   status       status        income   composition  group
546  1       2    2        3            4             3        5            8        33
547  1       2    2        3            4             4        4            1        15
548  1       2    2        3            4             4        4            3        170
549  1       2    2        3            4             4        4            5        16
550  1       2    2        3            4             5        5            2        31
551  1       2    2        3            4             5        5            2        31
552  1       2    2        3            4             5        5            3        35
553  1       2    2        3            4             6        4            1        23
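The simulation strategy and the aggregated output format of Table 4 can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the attribute weights are invented rather than fitted to the survey, only three of the seven attributes are used, and `choose_product_group` is a hypothetical random placeholder standing in for the knowledge base and inference engine.

```python
import random
from collections import Counter

# Hypothetical marginal distributions (category code -> weight). In the paper
# these would be fitted to the survey data; the numbers below are made up.
ATTRIBUTE_WEIGHTS = {
    "gender": {1: 0.6, 2: 0.4},
    "age": {1: 0.05, 2: 0.35, 3: 0.35, 4: 0.15, 5: 0.07, 6: 0.03},
    "marital": {1: 0.55, 2: 0.40, 3: 0.05},
}

def generate_consumer(rng):
    """Draw one synthetic consumer, sampling each attribute independently."""
    return tuple(
        rng.choices(list(w), weights=list(w.values()))[0]
        for w in ATTRIBUTE_WEIGHTS.values()
    )

def choose_product_group(consumer, rng):
    """Placeholder inference engine (assumption): product group 1..10 or None."""
    if rng.random() < 0.3:      # assumed probability of not buying
        return None
    return rng.randint(1, 10)

def run_simulation(n_ticks, consumers_per_tick, seed=0):
    rng = random.Random(seed)
    state = Counter()           # (attributes..., product group) -> quantity
    clock = 0
    while clock < n_ticks:      # the "Continue?" test of the flow diagram
        for _ in range(consumers_per_tick):
            consumer = generate_consumer(rng)
            product = choose_product_group(consumer, rng)
            if product is not None:
                state[consumer + (product,)] += 1   # update state and statistics
        clock += 1              # advance the simulation clock
    return state

output = run_simulation(n_ticks=50, consumers_per_tick=20)
for row_id, (key, quantity) in enumerate(sorted(output.items())[:5], start=1):
    print(row_id, *key, quantity)
```

Each printed row has the same shape as a row of Table 4: an identifier, the attribute codes, the chosen product group, and the number of simulated consumers sharing that combination. Replacing the placeholder with a trained classifier would turn the sketch into an inference-driven generator.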

Artificial Neural Networks

Traditional statistical classification procedures such as discriminant analysis are built on Bayesian decision theory (Duda and Hart 1973). In these procedures, an underlying probability model must be assumed in order to calculate the posterior probability upon which the classification decision is made. One major limitation of the statistical models is that they work well only when the underlying assumptions are satisfied. The effectiveness of these methods depends to a large extent on the various assumptions or conditions under which the models are developed. Users must have a good knowledge of both data properties and model capabilities before the models can be successfully applied.

Neural networks have emerged as an important tool for classification. The recent vast research activities in neural classification have established that neural networks are a promising alternative to various conventional classification methods. The advantage of neural networks lies in the following theoretical aspects. First, neural networks are data-driven, self-adaptive methods in that they can adjust themselves to the data without any explicit specification of functional or distributional form for the underlying model. Second, they are universal functional approximators in that neural networks can approximate any function with arbitrary accuracy (Cybenko 1989; Hornik 1991). Since any classification procedure seeks a functional relationship between the group membership and the attributes of the object, accurate identification of this underlying function is doubtlessly important. Third, neural networks are nonlinear models, which makes them flexible in modeling complex real-world relationships. Finally, neural networks are able to estimate the posterior probabilities, which provide the basis for establishing classification rules and performing statistical analysis (Richard and Lippmann 1991).

Artificial Neural Networks are relatively crude electronic networks of “neurons” based on the neural structure of the brain. They process records one at a time, and “learn” by comparing their classification of the record (which, at the outset, is largely arbitrary) with the known actual classification of the record. The errors from the initial classification of the first record are fed back into the network and used to modify the network’s algorithm the second time around, and so on for many iterations.

In the training phase, the correct class for each record is known, and the output nodes can therefore be assigned “correct” values: “1” for the node corresponding to the correct class, and “0” for the others. It is thus possible to compare the network’s calculated values for the output nodes to these “correct” values, and calculate an error term for each node. These error terms are then used to adjust the


weights in the hidden layers so that, hopefully, the next time around the output values will be closer to the “correct” values.

Table 5 Parameters of ANN architecture

Network type                  Feed Forward Back Propagation (FFBP)
Training function             Levenberg–Marquardt (TRAINLM)
Learning function             Gradient Descent with Momentum (GDM)
Performance function          Mean Square Error (MSE)
Transfer function             Tangent Sigmoid (TANSIG)
Number of hidden layers       1
Number of neurons each layer  10 for input, 10 for hidden, 1 for output
Epoch size                    35

Fig. 2 a Training, validation and test graph. b Training sets and validation checks

Intelligent simulation of online consumer behavior with ANNs

Backpropagation is the generalization of the Widrow-Hoff learning rule to multiple-layer networks and nonlinear differentiable transfer functions. Input vectors and the corresponding target vectors are used to train a network until it can approximate a function, associate input vectors with specific output vectors, or classify input vectors in an appropriate way as defined. Networks with biases, a sigmoid layer, and a linear output layer are capable of approximating any function with a finite number of discontinuities.

Properly trained backpropagation networks tend to give reasonable answers when presented with inputs that they have never seen. Typically, a new input leads to an output


similar to the correct output for input vectors used in training that are similar to the new input being presented. This generalization property makes it possible to train a network on a representative set of input/target pairs and get good results without training the network on all possible input/output pairs.

The online survey dataset consists of 527 consumers and their transactions. In total, 756 subjects participated in the survey; 7 of them were discarded because of missing data, leaving 749 respondents in the final sample. Of the sample, 70.36% (n = 527) reported that they had experience in purchasing online, and 29.64% (n = 222) reported that they had not experienced online shopping for various reasons. About 85% of the dataset (450 records) is used as the training set and the rest (77 records) is used as the test dataset.

A feed forward back propagation network with three layers is used in the model. The input layer has 10 neurons, the hidden layer has 10 neurons and the output layer has one neuron. The Levenberg-Marquardt (trainlm) training algorithm is used in the model. The number of hidden layers and the number of neurons per hidden layer were determined by trial and error. The network ran through about 35 training epochs in 11 s. The architecture of the ANN model and its parameters are shown in Table 5.

There are generally four steps in the training process:

1. Assemble the training data.
2. Create the network object.
3. Train the network.
4. Simulate the network response to new inputs.

The network achieved a significant training success, as seen in Fig. 2a, b.

The Neural Network simulation results are better than those of the Naive Bayes Classifier technique. The reason is the more advanced learning algorithm of ANNs. The performance of the ANN can be improved by changing its parameters through trial and error. The FFBP Neural Network does not require any rule base or expert opinion. With an appropriate dataset and learning rules, the ANN gives better results. The experimental results show that Neural Network Classification performance converges to 89% real

Conclusions

Unlike previous research on demographic attribute-based simulation, various attributes such as age, gender, marital status, educational status, monthly income, occupational status and number of people in the family have been considered at the same time. Furthermore, different inference systems, namely the Naïve Bayes Classifier and Artificial Neural Networks, are used to determine which product group will be chosen by online consumers with which demographic attributes. As a consequence of our Intelligent Simulation Model of Online Consumer Behavior (ISMOCB), the decision process handled by the inference system provided independent and varied synthetic data. It is now possible to build an “Artificial Database” that includes the demographic attributes and purchase transactions of online consumers.

References

Axelrod, R. (1997). The complexity of cooperation: Agent-based models of competition and collaboration. Princeton, NJ: Princeton University Press.
Bellenger, D. N., & Korgaonkar, P. K. (1980). Profiling the recreational shopper. Journal of Retailing, 56, Fall, 77–92.
Cagil, G., Erdem, M. B., & Topal, B. (2008). Analysis of purchasing behavior of online consumers. YA/EM 2008.
Chick, S. E. (2006). Six ways to improve a simulation analysis. Journal of Simulation, 1, 21–28.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2, 303–
Darian, J. C. (1987). In-home shopping: Are there consumer segments? Journal of Retailing, 63(2), 163–186.
Donthu, N., & Garcia, A. (1999). The internet shopper. Journal of Advertising Research, 39(3), 52–58.
Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.
Fisher, E., & Arnold, S. J. (1990). More than a labor of love: Gender roles and Christmas shopping. Journal of Consumer Research, 17(December), 333–345.
Gilly, M. C., & Enis, B. M. (1982). Recycling the family life cycle: A proposal for redefinition. In A. A. Mitchell (Ed.), Advances in consumer research (Vol. 9, pp. 271–276). Ann Arbor, MI: Association for Consumer Research.
Greasley, A. (2005). Using DEA and simulation in guiding operating units to improved performance.
Hoffman, D. L., Kalsbeek, W. D., & Novak, T. P. (1996). Internet and web use in the United States: Baselines for commercial development. Special Section on “Internet in the Home,” Communications of the ACM, 39(December), 36–46.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4, 251–257.
Liu, M. (2007). A study on the qualitative simulation of consumer behavior under the environment of electronic commerce. ISKE-2007 Proceedings, Advances in Intelligent Systems Research.
May, E. G., & Greyser, S. A. (1989). From-home shopping: Where is it leading? In L. Pellegrini & S. K. Reddy (Eds.), Retail and marketing channels: Economic and marketing perspectives on producer-distributor relationships (pp. 216–233). London: Routledge.
Richard, M. D., & Lippmann, R. (1991). Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation, 3, 461–483.
Robinson, S. (2004). Simulation: The practice of model development and use. Chichester, UK: Wiley.
Schaninger, C. M., & Danko, W. D. (1993). A conceptual and empirical comparison of alternative household life cycle models. Journal of Consumer Research, 19(March), 580–594.
Schiffman, L. G., & Kanuk, L. L. (1997). Consumer behavior (6th ed.). NJ: Prentice Hall.


Schwaiger, A., & Stahmer, B. (2003). SimMarket: Multi-agent based customer simulation and decision support for category management. In M. Shillo et al. (Eds.), Applying agents for engineering of industrial automation systems, LNAI 2831. Berlin: Springer.
Siebers, P., Aickelin, U., Celia, H., & Clegg, C. (2009). Simulating customer experience and word-of-mouth in retail: A case study. Simulation: Transactions of the Society for Modeling and Simulation International.
Solomon, M. (1999). Consumer behavior (4th ed.). New Jersey: Prentice Hall.
Statsoft. (2008). Text book: stnaiveb.html. Accessed 11 December 2008.
Thompson, C. J. (1996). Caring consumers: Gendered consumption meanings and the juggling lifestyle. Journal of Consumer Research.
Wallace, P. (2000). Forget kids: Women rule on the web.
Xue-wu, S., Gui-hua, N., & Ling, S. (2006). Gender-based differences in the effect of web advertising in e-business. International Conference on Management Science and Engineering, 78–83.

