
International Journal of Information Technology & Decision Making

Vol. 16 (2017)
© World Scientific Publishing Company
DOI: 10.1142/S0219622017500286

Thinking and Modeling for Big Data from the Perspective of the I Ching

Chuang Lin*,† and Guoliang Li†,‡


*Department of Computer Science, Shenzhen University
Shenzhen, P. R. China


†Department of Computer Science, Tsinghua University
Beijing, P. R. China

liguoliang@tsinghua.edu.cn

Zhiguang Shan
State Information Center of China, Beijing, P. R. China

Yong Shi
University of Chinese Academy of Sciences, Beijing, P. R. China

Published 25 July 2017

Data is growing faster than ever before and is changing our daily life. However, it is rather
challenging to manage big data [F. H. Cate, The big data debate, Science 346 (2014) 810;
J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh and A. H. Byers, Big Data:
The Next Frontier for Innovation, Competition, and Productivity (McKinsey Global Institute,
2011); S. Lohr, The Age of Big Data (New York Times, 2012), p. 11; L. Einav and J. Levin,
Economics in the age of big data, Science 345 (2014) 715; M. J. Khoury and J. P. A. Ioannidis,
Big data meets public health, Science 346 (2014) 1054–1055; V. Marx, Biology: The big
challenges of big data, Nature 498(7453) (2013) 255–260]. In this paper, we propose big
data thinking and modeling techniques from the perspective of the I Ching, a famous
imaginal-thinking theory from China with a 3,000-year history. The I Ching has proven
useful and practical in many domains, e.g., the thirty-six stratagems.
First, inspired by the three components of the I Ching, image, number and principle, we
propose a new three-cycle way of thinking about big data: from data to phenomenon, from
phenomenon to correlation, and from correlation to knowledge. It generalizes the fourth
paradigm (from causality to correlation) proposed by Jim Gray.
Second, inspired by the three entities of the I Ching, heaven, earth and human, we propose a
new big data modeling method. We use the three entities to represent the big data. We map the
4Vs of big data (volume, variety, velocity, veracity) to four opposition and unity relations in the
I Ching, and generate the eight diagrams. By capturing the relationships between the eight
diagrams, we generate the 64 hexagrams and use them to model big data. We also provide the
principle rules for understanding the knowledge generated by the model.
Third, we discuss how to utilize our model to describe big-data management tools, including
MapReduce, Spark, and Storm. We also provide a new model for handling distributed data
streams.

‡ Corresponding author.


We believe that we provide a new and practical way of thinking about and modeling big data,
and that this work will open up many new research directions on big data.

Keywords: Big data; I Ching; data modeling; data thinking.

1. Introduction
More and more data are being generated in every field, changing our daily life as we enter the
era of big data.1–7 However, it is rather challenging to manage big data8–10 because of:
(i) large volume: data scale increases from TB to PB and even EB, and the data generated in the
last two years exceeds the data generated in the previous 1,000 years; (ii) high velocity: the data
is generated continuously and instantly; for example, several thousand tweets are posted every
second; (iii) variety: the data is heterogeneous, including unstructured, structured, and
semi-structured data; and (iv) veracity: the truth is hidden under the big data and is rather hard
to identify.
To address these challenges, we propose big data thinking and modeling techniques, which
provide new perspectives for understanding and managing big data. Different from most
existing computing models, which utilize a formal logical system from the perspectives of
dialectical thinking and system thinking, the I Ching adopts imaginal thinking. Dialectical
thinking and system thinking model the way western people think, while imaginal thinking
models the way eastern people, especially Chinese, think. The former is deterministic, local,
and accurate. It addresses a problem from the local to the global, like deep learning, and its
advantages include logical reasoning, system analysis, and quantitative analysis. It emphasizes
how to accurately formulate a problem. The latter is nondeterministic, global, and fuzzy. It
addresses a problem globally, and its advantages are property analysis and design principles.
It emphasizes perception.
Obviously, it is hard to accurately address big-data problems using dialectical thinking and
system thinking, because they may go astray due to the complicated data. For example, in deep
learning, even a minor disturbance in the data (e.g., slightly changing the face of a dog toward
a cat) may generate very different results (e.g., the model may then classify it as a cat). On the
contrary, imaginal thinking is naturally suited for designing effective techniques for big data.
Note that imaginal thinking has been widely accepted in many fields. For example, Albert
Einstein said:
"Development of Western science is based on two great achievements: the invention of the
formal logical system (in Euclidean geometry) by the Greek philosophers, and the discovery of
the possibility to find out causal relationships by systematic experiment (during the
Renaissance). In my opinion one has not to be astonished that the Chinese sages have not made
those steps. The astonishing thing is that those discoveries were made at all."11

The I Ching is a representative theory of imaginal thinking and has a history of 3,000 years.
It is not only widely accepted in China but also very famous around the world. It has been
translated into English, German, Japanese, French, and many other languages. The I Ching
has been called a "universal algebra", and Leibniz, who was corresponding with Jesuits in
China, wrote the first European commentary on the I Ching in 1703, arguing that it proved the
universality of binary numbers and theism.12 The I Ching has proven to be very useful and
practical in many domains. For example, the thirty-six stratagems on tactics are inspired by
the I Ching, and they have proven very useful in Chinese history since 420 AD. In the
thirty-six stratagems, the tactics are hidden in, and implied by, objective laws. Stratagems
cannot be designed arbitrarily by humans: a stratagem that does not conform to the objective
laws will fail in practice. Thus their basic idea is inspired by the I Ching. Note that western
people utilize a formal logical system from the perspectives of dialectical thinking and system
thinking, while Chinese people, and the I Ching in particular, adopt imaginal thinking, which
emphasizes perception and gives a high-level idea for addressing hard problems.13
In this paper, we utilize the I Ching to facilitate big data thinking and modeling. First,
inspired by the three components of the I Ching, image, number and principle, we propose a
new three-cycle way of thinking about big data: from data to phenomenon, from phenomenon
to correlation, and from correlation to knowledge. It generalizes the fourth paradigm (from
causality to correlation) proposed by Jim Gray.14 Second, inspired by the three entities of the
I Ching, heaven, earth and human, we propose a new big data modeling method. We use the
three entities to represent the big data. We map the 4Vs of big data (volume, variety,
velocity, veracity) to four opposition and unity relations in the I Ching, and thus generate the
eight diagrams. We utilize the 64 hexagrams to capture the relationships between the eight
diagrams. We also provide the principle rules for understanding the knowledge generated by
the model. Third, we discuss how to utilize our model to describe big-data management tools,
including MapReduce, Spark, and Storm. We also provide a new model for handling
distributed data streams.
To summarize, we make the following contributions.

(1) We provide a new three-cycle way of big-data thinking inspired by the I Ching and its
imaginal thinking: from data to phenomenon, from phenomenon to correlation, and from
correlation to knowledge. To address any big-data problem, we need to follow these three
steps when designing the methodology.
(2) We propose a new big-data modeling framework using the I Ching. We use heaven, earth
and human to model the big data. We map the 4Vs of big data (volume, variety, velocity,
veracity) to four opposition and unity relations in the I Ching, and use the eight diagrams
to model the big data. We utilize the 64 hexagrams to capture the relationships between
the eight diagrams and give the principle rules for understanding the knowledge.

(3) We discuss how to utilize our thinking and modeling methods to explain existing
big-data processing platforms, and also provide a new model for processing distributed
streams.

2. Big Data Thinking


2.1. The fourth paradigm
In 2007, Jim Gray proposed the fourth paradigm,14,15 which can be taken as a way of data
thinking. Historically, three research paradigms can be summarized. The first is experiment,
e.g., making fire by rubbing sticks. The second is theory, e.g., Newton's second law. The third
is computer simulation, e.g., weather forecasting. All three paradigms are led by humans:
humans need to do the experiments, think up new theories, and provide new models for
computing. However, in the big data era, it is rather hard to do experiments on big data, or to
derive theories and models from such a large amount of data. Thus, a new way is to identify
them automatically from the big data. In other words, big data processing should be led by the
big data instead of by humans.
Although the fourth paradigm provides a new way of data thinking, it only offers a high-level
idea and does not provide any details on how to understand big data. In traditional data
processing, people decide how to process the data. For example, in databases, people embed a
schema into the data and pose SQL queries, and the database system returns the results. Take
a search engine as another example: the search engine first crawls the data from the web and
indexes it; then, given a query, it finds the documents most similar to the query. These
traditional processes have an important common feature: people design an automatic
data-processing flow and the machines process the data following that flow. This is consistent
with computational simulation. However, for big data, it is rather hard to give such a flow in
advance, because people do not know how to do it, or even what to do. Thus, the flow should
be detected automatically from the big data.
To address this problem, in this section we propose a novel way of data thinking to help users
understand and model the big data.

2.2. Three-cycle data thinking


Data Thinking. Data thinking is a philosophical principle, which aims to identify phenomena
and objective laws from the big data. Formally, given a big data set, data thinking is the way
of thinking about how to identify the big value hidden in the big data.
Inspired by the I Ching, we propose a new data thinking approach with three cycles.

(1) From Data to Phenomenon: It is very important to detect phenomena from the data; any
data processing operation aims to detect some phenomenon from the big data. For
example, Google predicts influenza activity for more than 25 countries based on users'
queries posted to Google. The idea behind Google Flu Trends (GFT) is that, by
monitoring millions of users' health-tracking behaviors online, the large number of
Google search queries gathered can be analyzed to reveal the presence of flu-like illness
in a population. Google Flu Trends compares these findings to a historic baseline level of
influenza activity for the corresponding region and then reports the activity level as
minimal, low, moderate, high, or intense. These estimates have been generally consistent
with conventional surveillance data collected by health agencies, both nationally and
regionally. Many studies point out that it is very important to identify phenomena from
the big data.5,16–18 Thus, we regard identifying the phenomenon from the big data as the
first step.

(2) From Phenomenon to Correlation: In the fourth paradigm, Jim Gray points out that we
need to shift our data thinking from causation to correlation.5,14 We share this idea but
in a different way: we should detect the correlation from the phenomenon. In the Google
Flu Trends example, we know that the flu and the keywords searched by users are highly
correlated (a small illustrative sketch of this step is given at the end of this subsection).
Taking product recommendation as another example, recommendation systems such as
Amazon usually recommend to a user products that were bought by other users who buy
similar products, i.e., they recommend products based on highly correlated users.
Identifying the correlation is fairly important for understanding the big data. Thus, we
regard identifying the correlation from the big data as the second step.
(3) From Correlation to Knowledge: Based on the correlations harvested from the big data,
it is important to generate knowledge or principles,5,19 which can in turn improve data
understanding and management in the future. For example, to improve search quality,
Google builds a knowledge base from the big data, which helps Google better understand
the data. Obviously, a knowledge base can be used to capture real-world facts and to do
reasoning based on the relationships between facts. In addition, it is rather important to
generate knowledge in biology2 and medicine, so that doctors know how to provide better
treatment. Thus, we regard identifying the knowledge from the big data as the third step,
which can also benefit the first two steps.

The three cycles should form a closed loop. In other words, the knowledge can in turn benefit
detecting phenomena from the big data. To manage big data, we should follow this way of data
thinking, which guides users in understanding and managing the big data.
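To make the phenomenon-to-correlation step concrete, the following is a minimal Python sketch assuming hypothetical weekly counts in the spirit of the Google Flu Trends example; the variable names and numbers are illustrative only, not real GFT data.

```python
# Minimal sketch of the "phenomenon to correlation" step with hypothetical data.

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Hypothetical weekly counts of flu-related search queries and reported cases.
query_volume = [120, 150, 300, 800, 950, 700, 400, 180]
reported_cases = [10, 14, 33, 90, 110, 80, 45, 20]

r = pearson(query_volume, reported_cases)
print(f"correlation between queries and cases: {r:.2f}")
# A strong positive r suggests the search-query phenomenon is correlated with
# flu activity; the next step distills this correlation into knowledge.
```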

2.3. Data thinking implementation from the I Ching perspective


The I Ching contains three main components, image, number, and principle, which exactly
match the three cycles of data thinking. They provide effective ways to implement the three
cycles and thus can be used to understand all things in the universe.

Image: It corresponds to the phenomenon and is the core of the I Ching. Everything has its
own phenomenon. If we can capture the image, we can understand and explain phenomena in
various domains. Image can be understood as the abstract features of a domain, and it is
important to capture the image and use it to understand the phenomenon. An important
concept is the gram, corresponding to Yin (0) and Yang (1). Three levels of grams generate
the eight diagrams. Each diagram can map to different entities in different domains; for
example, Yang refers to heaven and Yin refers to earth in the cosmic domain.
As the eight diagrams do not consider the relationships between different entities, they are
extended to 64 hexagrams, each superimposing two layers of eight diagrams. The I Ching uses
the 64 hexagrams to capture phenomena, and they act like 64 functions: given some input,
e.g., the data, the 64 hexagrams tell the phenomenon. The 64 hexagrams can cover every
aspect of social science, natural science, and the cosmos.
For example, a very famous saying in China, "bad surroundings make bad civilians", is an
image, which has been verified over China's 3,000-year history. Based on such images, we can
summarize useful principles and avoid bad images to benefit mankind and the environment.
Thus, the I Ching implies that it is rather important to extract the phenomenon from the big
data, which is consistent with our analysis. It also provides a way, using the 64 hexagrams, to
detect the phenomenon.
Number: Number is the mathematical system of the I Ching, used for reasoning within it. It
can be understood as the various mathematical models. The I Ching uses a binary system,
similar to the computer: it uses two numbers, Yin (0) and Yang (1), which are the two options
of each gram in the eight diagrams and the 64 hexagrams. Yin and Yang also reflect the
positions of the different grams in the eight diagrams and 64 hexagrams. For example, in the
eight diagrams, we have eight images: 000, 001, 010, 011, 100, 101, 110, and 111. The positions
are rather important in the eight diagrams and 64 hexagrams. A hexagram contains six layers
of grams, and each gram has two options.
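As a concrete illustration, the following minimal Python sketch enumerates the eight diagrams as 3-bit Yin/Yang codes; stacking six grams instead of three would give the 64 hexagrams. The encoding is illustrative only.

```python
# Minimal sketch of the binary "number" system: the eight diagrams as 3-bit
# Yin(0)/Yang(1) codes, where the position of each gram matters.
from itertools import product

YIN, YANG = 0, 1

# Every combination of three grams gives one of the eight diagrams: 000 ... 111.
eight_diagrams = ["".join(str(g) for g in grams)
                  for grams in product((YIN, YANG), repeat=3)]
print(eight_diagrams)  # ['000', '001', '010', '011', '100', '101', '110', '111']
# Stacking six grams (repeat=6) would enumerate the 2**6 = 64 hexagrams.
```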
There are many important numbers in the I Ching, including the two forms Yin and Yang, the
four images, the eight diagrams, and the 64 hexagrams. Image and number can be used
together to generalize more images from the basic images. This is consistent with the idea
that all kinds of things can be generated from the images; in other words, heaven gives birth
to 1, 1 gives birth to 2, 2 gives birth to 3, and 3 gives birth to all things of the world. It can
also build connections between different images.
Principle: It is the philosophy of the I Ching model, referring to the laws of the natural world
and of human social activities. It is used to explain the reasons in various domains and can be
understood as prediction functions. Based on the images, we can derive important principles.
For example, based on the image "heaven maintains vigor through movement", we can derive
the principle "a gentleman should constantly strive for self-perfection". Based on the image
"earth's condition is receptive devotion", we can derive the principle "a gentleman should
hold the outer world with a broad mind". These two principles are used as the motto of
Tsinghua University.
There are four types of principles. First, the I Ching principle can model everything and
complies with natural science. Second, the I Ching principle can model the cosmos and
society, and thus complies with cosmology and social science. Third, the I Ching can model
the relationships between diagrams; a famous saying, "After a storm comes a calm", means
that the diagrams can be mutually transformed. Fourth, the gram number and gram position
are rather important; the saying "couple hardness with softness and follow the doctrine of the
mean" discusses the importance of gram positions.
Principle and number can be used together. For example, we can use different mathematical
models and different functions, and then generate different results.
Relationship Between Image, Number and Principle: Image, number, and principle are
strongly connected and cannot be separated. Image refers to the phenomenon, and number
refers to the way the ancients described the world. Principle refers to the knowledge or truth
extracted based on image and number. A famous saying is: "number implies tactics and
methods, and methods also rely on number; Yin and Yang can change the principle, and the
changing mechanism plays an important role. However, the mechanism cannot be
presupposed, otherwise it does not work." For example, any circumstance hitting a limit will
begin to change; change will in turn lead to an unimpeded state, and then to continuity.
A Thinking Principle: Simplicity, Variance, Consistency. The I Ching also has important
thinking principles. (1) Simplicity: If we can capture the essential features, everything is
simple and we do not need many complicated models. The I Ching uses the 64 hexagrams to
capture the main features, and thus it is simple and easy to understand. This corresponds to
imaginal thinking. (2) Variance: Everything changes. The I Ching contains a circle and a
curve; the curve divides the circle into Yin and Yang, which can change into each other. The
curve denotes that everything can change. This corresponds to dialectical thinking.
(3) Consistency: Although everything changes, the changing mechanism does not change. The
change is the surface phenomenon and the consistency is the inherent law. This corresponds
to system thinking. Thus, the I Ching covers the three types of thinking. The
simplicity-variance-consistency principle also explains image, number and principle: image
can deduce number, which in turn deduces the principles, and the principles are infinite. In
other words, we can understand the content from the external imagery; from the status, we
can understand the property; and from the property, we can know the essence.
Summary. Based on the above discussion, we can use the I Ching to facilitate big data
thinking. We use image to capture the phenomenon of big data, number to capture the
correlation, and principle to capture the knowledge. In any big data processing operation, we
need to utilize this three-cycle model to understand the big data.

3. Data Modeling from the Perspective of the I Ching


The 4Vs of big data, volume, variety, velocity, and veracity, are four surface features, and we
should not be puzzled by the 4Vs when processing the big data. Instead, our goal is to detect
the inherent features of big data and identify the phenomenon. We aim to find a new data
model to understand the big data and thus address big data problems.

3.1. Data representation: Three entities principle



There are three dimensions in data processing: the micro dimension, the medium dimension,
and the macro dimension. The micro dimension models individuals, the macro dimension
models the entirety, and the medium dimension models a partial group. In big data, it is more
important to capture the entirety, and thus the macro dimension matters most. The I Ching is
also good at modeling the entirety: it uses heaven, earth, and human to model everything.
This is a philosophical abstraction model; it has no numerical form but uses Yin/Yang. The
three entities give birth to 1, 1 gives birth to 2, 2 gives birth to 3, and 3 gives birth to all
things of the world.
Note that each of the three entities has two options: heaven is Yin/Yang, earth is hard/soft,
and human is kindheartedness/justice. Based on these three dimensions, we can build the
eight diagrams. Thus, we can use the three entities to represent the big data: for any type of
data, we can model it as a set of triples, where each triple consists of heaven, earth, and
human. In different domains, we can utilize different strategies, as sketched below.
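For illustration, the following minimal Python sketch shows one possible encoding of such a triple, assuming the two options per entity listed above; the field names and the example item are hypothetical.

```python
# Minimal sketch of the three-entities representation with hypothetical names.
# Each entity takes one of its two options, so a triple maps to one of the
# eight diagrams.
from dataclasses import dataclass

HEAVEN = {"yin": 0, "yang": 1}
EARTH = {"soft": 0, "hard": 1}
HUMAN = {"kindheartedness": 0, "justice": 1}

@dataclass
class Triple:
    heaven: str
    earth: str
    human: str

    def diagram(self) -> str:
        """Encode the triple as a 3-bit image, e.g., '101'."""
        return f"{HEAVEN[self.heaven]}{EARTH[self.earth]}{HUMAN[self.human]}"

# A hypothetical data item modeled as a triple.
item = Triple(heaven="yang", earth="soft", human="justice")
print(item.diagram())  # '101'
```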

3.2. The 4Vs: Eight-diagram principle


We can use the eight-diagram principle to capture the features of big data. To generate the
eight diagrams, it is important to extract four opposition and unity relations. To this end, we
derive four relations based on the 4Vs.
(1) Volume: For large volume, it is important to process large amounts of data. There are
usually two types of methods: centralized or distributed. Distributed computing can improve
performance, but at the expense of weak consistency; centralized computing can guarantee
strong consistency, but at the expense of low performance. So there is a tradeoff between
centralized and distributed computing for large-volume data.
(2) Velocity: For high velocity, it is important to process the data instantly. There are two
ways to handle high velocity: continuous or discrete. The former processes continuous data
while the latter handles discrete data, so there is a tradeoff between continuous and discrete
data for high velocity. To address velocity, we require efficient methods to process the data.
However, we do not need to compute the result every second, because the result is binary and
we only need to detect the changing point. This is also consistent with consistency, variance,
and simplicity in the I Ching.
(3) Variety: It is rather hard to handle variety, as the data is heterogeneous, multi-sourced,
and full of noise. So we have two ways: hard or soft. The hard mode processes each type of
data exactly, while the soft mode handles the variety approximately, so there is a tradeoff
between hard and soft for variety. Obviously, the heaven-earth-human triple is a good model
for variety. The big data contains unstructured, semi-structured, and structured data, and we
can use "simplicity" to model the variety: natural law is simple, and everything can be
simplified. Structured data is well organized and easy to process, but obtaining structured
data is hard. On the contrary, unstructured data is not well structured, and processing it is
hard, requiring complicated techniques (e.g., distributed machine learning); but the law
hidden in the data is simple, and the processing theory and philosophy are also simple. Thus,
moving from processing structured data to processing unstructured data requires finding a
new computing approach that uses all the features of unstructured data. If we can find that
approach, it should also be simple.
(4) Veracity: It is fairly important to know whether the data is true or false, so there is a
tradeoff between true and false for veracity. First, the quality of the data is uneven, so its
value also differs, and the truth may be affected by erroneous data. Second, the whole data
set is rather important for detecting the truth. An example is a questionnaire on the web: the
users have different experiences, ages, and backgrounds, and if we do not have the whole
data, the results are unreliable or biased toward the skewed data. Third, veracity is similar to
Yin/Yang: it should be stable, and there must exist a law for detecting the truth.
Based on the four opposite relationships, centralized/distributed, continuous/discrete,
hard/soft, and true/false, we can build an eight-diagram model as shown in Fig. 1. In this
way, we can capture the main features of big data and use the eight diagrams to represent the
image of big data.

Fig. 1. Eight diagrams for big data.
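As an illustration only, the following minimal Python sketch encodes a workload's position on the four opposition and unity relations as one bit each; the axis order, the helper profile_image, and the example workload are hypothetical, and how such profiles are grouped into the eight diagrams of Fig. 1 is not reproduced here.

```python
# Minimal sketch of the four opposition-unity relations as binary axes,
# with hypothetical axis labels and workload.

OPPOSITIONS = {
    "volume": ("centralized", "distributed"),
    "velocity": ("discrete", "continuous"),
    "variety": ("hard", "soft"),
    "veracity": ("false", "true"),
}

def profile_image(choices: dict) -> str:
    """Encode a workload's choice on each relation as one bit per V."""
    return "".join(str(OPPOSITIONS[v].index(choices[v])) for v in OPPOSITIONS)

# A hypothetical stream-analytics workload: distributed, continuous, soft, true.
workload = {"volume": "distributed", "velocity": "continuous",
            "variety": "soft", "veracity": "true"}
print(profile_image(workload))  # '1111'
```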

3.3. Big data modeling: 64 hexagram principle


From Eight Diagrams to 64 Hexagrams. To capture the relationships between the eight
diagrams, the 64 hexagrams are used; each hexagram is composed of two layers of eight
diagrams. For example, the correlation behind the data can be captured by the 64 hexagrams.
The hierarchical design of computers and networks can also be deduced from the I Ching.20
Thus, we use the 64 hexagrams to model the big data. For each type of query processing, we
need to find an appropriate hexagram to address it.
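For concreteness, here is a minimal Python sketch of how the 64 hexagrams arise from stacking two eight-diagram layers (8 x 8 = 64); the bit-string encoding is an illustrative choice.

```python
# Minimal sketch: build the 64 hexagrams by pairing an upper and a lower diagram.
from itertools import product

eight_diagrams = ["".join(bits) for bits in product("01", repeat=3)]

# Each hexagram is (upper diagram, lower diagram); its code concatenates the bits.
hexagrams = {upper + lower: (upper, lower)
             for upper, lower in product(eight_diagrams, repeat=2)}
print(len(hexagrams))       # 64
print(hexagrams["111101"])  # ('111', '101'), i.e., the upper and lower layers
```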
Principle Rules of Knowledge. From the images/hexagrams, we can derive very important
principles. For example, based on the image "heaven maintains vigor through movement", we
can derive the principle "a gentleman should constantly strive for self-perfection"; based on
the image "earth's condition is receptive devotion", we can derive the principle "a gentleman
should hold the outer world with a broad mind". A famous image is "After a storm comes a
calm", which means that the diagrams can be mutually transformed. Note that the gram
number and gram position are rather important; an image is "couple hardness with softness
and follow the doctrine of the mean". For example, the very famous Chinese saying "bad
surroundings make bad civilians" is an image, verified over China's 3,000-year history. Based
on such images, we can summarize useful principles and avoid bad images to benefit mankind
and the environment.

4. From I Ching to Big Data Computing Paradigms


In this section, we discuss how to utilize the I Ching to address big data problems. Usually,
we can utilize a hexagram to model a big data problem and then use the explanations of the
hexagram to guide the big data processing. Here, we take three existing big data processing
platforms as examples and show that they are implied by the I Ching. Then, inspired by the
I Ching, we propose a new model to process a new application.

4.1. Three big data computing paradigms


4.1.1. MapReduce
MapReduce is a disk-based distributed data processing framework for batch computing. Its
basic idea is to utilize multiple computer nodes to process the big data in parallel. If the data
has no strong correlations, e.g., web pages, the framework is very effective. However, if the
data has strong correlations, e.g., a graph, the framework is not effective.
The 14th image (called DaYou) in the 64 hexagrams implies the basic idea of MapReduce. In
this image, the top is distributed and the bottom is centralized. An explanation of this
hexagram, shown in Fig. 2, is: "Given the products in a big cart, if the products are not
strongly connected, we can use multiple small trolleys to carry the products without any
loss." This implies that we should utilize a distributed environment to process the centralized
data. To this end, we partition the data across different nodes (Map), distribute the
intermediate data to different nodes (Shuffle), and process the data in each node (Reduce).
Thus, the 14th image provides a good hint for batch computing in the big data era.
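To make the Map/Shuffle/Reduce flow concrete, here is a minimal single-process Python sketch of word counting; it illustrates the flow only and is not Hadoop's or any framework's actual API.

```python
# Minimal sketch of the map/shuffle/reduce flow: count words across documents.
from collections import defaultdict

def map_phase(doc_id, text):
    # Map: emit (word, 1) pairs from one partition of the data.
    for word in text.split():
        yield word, 1

def shuffle_phase(mapped_pairs):
    # Shuffle: group intermediate pairs by key before reduction.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the values for one key.
    return key, sum(values)

docs = {"d1": "big data big value", "d2": "data changes life"}
mapped = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'changes': 1, 'life': 1}
```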

4.1.2. Spark
Spark is an in-memory distributed data processing framework for iterative computing. It aims
to improve on MapReduce by avoiding expensive disk-based processing and synchronization
steps. To this end, it processes the data iteratively using a directed acyclic graph (DAG),
saving synchronization time, and it provides lazy processing to optimize multiple operations
together.
The 35th image (called Jin) in the 64 hexagrams implies the basic idea of Spark. In this
image, both the top and the bottom are distributed. An explanation is: "Given a cart and
multiple persons pulling the cart, if all the persons go in the same direction, the cart can be
pulled; if some pull from the front and some push at the back, the cart can be pulled easily."
That is to say, we should keep thinking and come up with new ideas. In big data computing,
we have two effective strategies. The first is to trade space for time: we can use memory in
place of disk and thus reduce disk latency. The second is to optimize multiple operations in a
delayed-execution manner (i.e., given multiple operations, we do not execute them
immediately; instead, we batch them together and execute them in an optimized way when the
results are actually needed). This implies that we should utilize a distributed environment to
process distributed data iteratively, trade space for time, and optimize the query processing.
In addition, the image also implies that, to pull the cart, the persons should be evenly
distributed; that is, in Spark we need to balance the workload across the system. This also
suggests improving Spark by enabling load balancing. Obviously, the 35th image provides a
good hint for iterative data processing.
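The following is a minimal Python sketch of these two strategies, keeping the data in memory and recording operations for delayed, batched execution; this toy LazyPipeline is for illustration only and is not Spark's actual API.

```python
# Minimal sketch: keep data in memory and delay execution so that chained
# operations can be batched and applied only when results are requested.

class LazyPipeline:
    def __init__(self, data):
        self.data = data          # kept in memory ("trade space for time")
        self.ops = []             # operations recorded, not yet executed

    def map(self, fn):
        self.ops.append(("map", fn))
        return self

    def filter(self, pred):
        self.ops.append(("filter", pred))
        return self

    def collect(self):
        # Only now are the batched operations applied.
        result = self.data
        for kind, fn in self.ops:
            result = ([fn(x) for x in result] if kind == "map"
                      else [x for x in result if fn(x)])
        return result

nums = LazyPipeline(range(10))
out = nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0).collect()
print(out)  # [0, 4, 16, 36, 64]
```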

4.1.3. Storm
Storm is a stream processing framework for streaming computing. Its basic idea is to use a
distributed environment to handle centralized streaming data.
The fifth image (called Xu) in the 64 hexagrams implies the basic idea of Storm. In this
image, both the top and the bottom are serial. An explanation is: "Take precautions and act
according to circumstances." That is to say, we need to have full preparation before action. In
big data computing, we first use a master node to monitor the requests, accepting and
distributing them, and utilize multiple slave nodes to serve the requests. Usually, we need to
preprocess for the coming requests, e.g., build indexes and routing rules for the queries.
Obviously, the fifth image provides a good hint for stream processing.

Fig. 2. Examples from the I Ching: (a) DaYou for MapReduce; (b) Jin for Spark; (c) Xu for
Storm; (d) Bi for distributed streams.
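As an illustration of the master/worker preparation described above, here is a minimal Python sketch using a thread-safe queue; it is a toy sketch rather than Storm's actual API, and the worker count and request names are hypothetical.

```python
# Minimal sketch: prepare workers in advance, then let a master distribute requests.
import queue
import threading

requests = queue.Queue()

def worker(worker_id):
    while True:
        req = requests.get()
        if req is None:            # shutdown signal
            break
        print(f"worker {worker_id} served {req}")
        requests.task_done()

# "Full preparation before action": start the workers before requests arrive.
workers = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for w in workers:
    w.start()

# The master accepts incoming requests and distributes them via the queue.
for req in ["q1", "q2", "q3", "q4", "q5"]:
    requests.put(req)
requests.join()

for _ in workers:                  # stop the workers
    requests.put(None)
for w in workers:
    w.join()
```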

4.2. A new application: Distributed streams

Storm usually processes a centralized stream, i.e., a stream generated from a single source.
However, if the streams come from multiple sources, we need a new model. Note that many
applications require monitoring multiple streams, e.g., Industry 4.0 and forest fire monitoring
systems.
The eighth image (called Bi) can handle this case. In this image, the top is distributed and
the bottom is serial. An explanation is: "head to tail, orderly transition." That is to say, when
there are multiple incoming streams, we need to combine them to take action, and the order in
which we use them is rather important. For example, in fire monitoring, there are multiple
sensors, e.g., a temperature sensor, a humidity sensor, and a smoke sensor. We need to
monitor all of them, consider their order, and raise an alarm only after processing all of them.
Thus, we need to synchronize the multiple streams. Obviously, the eighth image provides a
good hint for distributed stream processing.
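For illustration, here is a minimal Python sketch of merging several timestamped sensor streams in global time order and raising an alarm only when all of them agree; the sensor names, thresholds, and readings are hypothetical.

```python
# Minimal sketch: ordered processing over multiple distributed sensor streams,
# in the spirit of the fire-monitoring example. Each stream yields
# (timestamp, sensor, value) tuples, already sorted by timestamp.
import heapq

temperature = [(1, "temperature", 25), (4, "temperature", 62), (7, "temperature", 80)]
humidity    = [(2, "humidity", 40), (5, "humidity", 12), (8, "humidity", 10)]
smoke       = [(3, "smoke", 0.01), (6, "smoke", 0.35), (9, "smoke", 0.60)]

THRESHOLDS = {"temperature": 60, "humidity_below": 15, "smoke": 0.3}

def fire_alarm(streams):
    state = {}
    # heapq.merge keeps the global timestamp order across the streams.
    for ts, sensor, value in heapq.merge(*streams):
        state[sensor] = value
        hot = state.get("temperature", 0) > THRESHOLDS["temperature"]
        dry = state.get("humidity", 100) < THRESHOLDS["humidity_below"]
        smoky = state.get("smoke", 0) > THRESHOLDS["smoke"]
        if hot and dry and smoky:
            return ts              # alarm only after all streams agree
    return None

print(fire_alarm([temperature, humidity, smoke]))  # 6
```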

5. Conclusion and Future Work


In this paper, we discuss how to utilize the I Ching to facilitate data thinking and data
modeling. We provide a new three-cycle way of big-data thinking inspired by the I Ching and
its imaginal thinking: from data to phenomenon, from phenomenon to correlation, and from
correlation to knowledge. To address any big-data problem, we need to follow these three
steps when designing the methodology. We propose a new big-data modeling framework using
the I Ching. We use heaven, earth and human to model the big data. We map the 4Vs of big
data (volume, variety, velocity, veracity) to four opposition and unity relations in the I Ching,
and thus generate the eight diagrams. We utilize the 64 hexagrams to capture the
relationships between the eight diagrams and give the principle rules for understanding the
knowledge. We discuss how to utilize our thinking and modeling methods to explain existing
big-data processing platforms, and also provide a new model for processing distributed
streams.
In future work, we aim to utilize more hexagrams to provide big data processing tools. We
believe that we have provided a new and practical way of thinking about and modeling big
data, and that this will open up new research directions on big data.

References

1. J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh and A. H. Byers,
Big Data: The Next Frontier for Innovation, Competition, and Productivity (McKinsey
Global Institute, 2011).
2. V. Marx, Biology: The big challenges of big data, Nature 498(7453) (2013) 255–260.
3. F. H. Cate, The big data debate, Science 346 (2014) 810.
4. S. Lohr, The Age of Big Data (New York Times, 2012), p. 11.
5. Y. Shi, Big data: History, current status, and challenges going forward, The Bridge, The
US National Academy of Engineering 44 (2014) 6–11.
6. L. Einav and J. Levin, Economics in the age of big data, Science 345 (2014) 715.
7. M. J. Khoury and J. P. A. Ioannidis, Big data meets public health, Science 346 (2014)
1054–1055.
8. A. Jacobs, The pathologies of big data, Communications of the ACM 52(8) (2009) 36–44.
9. G. E. Hinton and R. R. Salakhutdinov, Reducing the dimensionality of data with neural
networks, Science 313(5786) (2006) 504–507.
10. J. Mervis, Science 336(6077) (2012) 22.
11. A. Einstein, Letter to J. S. Switzer, April 23, 1953; Einstein Archive 61-381. Available at:
http://www.autodidactproject.org/quote/einstn2.html.
12. G. W. Leibniz, in Die Philosophischen Schriften von Gottfried Wilhelm Leibniz, Vol. VII,
ed. C. I. Gerhardt (Berlin, 1875–1890); originally published in the Mémoires de
l'Académie Royale des Sciences.
13. R. Wilhelm and C. F. Baynes, The I Ching or Book of Changes (Pantheon Books, 1951).
14. C. A. Lynch, Jim Gray's fourth paradigm and the construction of the scientific record,
in The Fourth Paradigm: Data-Intensive Scientific Discovery (Microsoft Research, 2009),
pp. 177–183.
15. A. Labrinidis and H. V. Jagadish, Challenges and opportunities with big data, VLDB
5(12) (2012) 2032–2033.
16. I. C.-H. Fung, Z. T. H. Tse and K.-W. Fu, Converting big data into public health,
Science 347 (2015) 6220–6222.
17. D. Lazer et al., Computational social science, Science 323 (2009) 721–723.
18. D. Lazer, R. Kennedy, G. King and A. Vespignani, The parable of Google Flu: Traps in big
data analysis, Science 343(6176) (2014) 1203–1205.
19. D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining (John
Wiley & Sons, US, 2014).
20. C. Lin, Philosophical principles of computer architecture design in the I Ching, Acta
Electronica Sinica 44(8) (2016) 1777–1783.
