Introduction
This report is part of an information-visualization (InfoVis) project for the IN4086 Data Visualization
course at Delft University of Technology in the Netherlands. Its primary goal is to put into practice
the principles and concepts of data visualization introduced in class.
The aim of the project is to visually encode complex datasets using existing data visualization tools
while taking human (visual) perception factors into account, so that the raw, complex data is
represented in a way that facilitates the extraction of valuable knowledge and insights that could
not be directly derived or perceived from the non-visualized data.
The subject of our project is the 2014 Ebola outbreak in West Africa, and in particular the
geographic and demographic impact of Ebola on the epicenter countries of the outbreak (Guinea,
Liberia and Sierra Leone) between 2014 and 2015.
In the first part of the report, we give an overview of the global geographic spread of the 2014
Ebola outbreak and the escalation of its impact on the epicenter countries.
In the second part, we perform a visual analysis in two main directions: first, the spread of
Ebola differentiated by gender and, second, the spread of Ebola across different age groups in the
epicenter countries.
Under the temporal scope, we only visualize the impact of the 2014 Ebola outbreak between
2014 and 2015. After researching the available Ebola data for 2016, we found inconsistencies
between the data available from different sources, and thus we decided to exclude the 2016 data
from the scope of our design study.
Under the thematic scope, since official reports investigating the health, social and economic
impact of the Ebola Virus Disease (EVD) were already available, we decided to focus the design and
analysis of our project on the geographic and demographic scope of Ebola, while restricting our
analysis to the impact of Ebola on the epicenter countries rather than its global effect.
Regarding the geographic impact, we visualize the impact of Ebola on a global scale so that the
affected areas can be compared and the urgency of the outbreak can be clearly perceived. For the
demographic impact, we only consider the impact of Ebola on the epicenter countries of West Africa
and its effect on 3 different age groups and the 2 genders.
Validity of Drawn Conclusions: When visualizing data about a global public health threat like
Ebola, the volume of the data as well as its accuracy and timeliness may strongly affect the
accuracy and truthfulness of the final observations. That is why we concluded that it was
necessary to restrict the scope of our analysis and specify clear goals. However, the dataset we
assessed is still a subset (sample) of the available data, so we acknowledge that a significantly
larger sample might show the impact of Ebola in more detail and with greater validity.
Visualization Tool: We wanted a visualization tool that would let us easily create plots from the
data and, at the same time, combine different plots into an overview in which the data is
interactively linked across plots, e.g. at the click of the mouse. Tableau was quite helpful in
this respect, as we could create a dashboard with different sheets and visualize data from
different datasets in order to reveal important correlations among them.
Data Acquisition and Data Formatting: Although Ebola was a major threat to public health and
multiple sources of data were available, selecting and filtering the data was a major challenge
because the available datasets differed in their measurements.
Data Acquisition
In our report, we use datasets acquired from the websites of major humanitarian and research
organizations such as the World Health Organization (WHO), the World DataBank, the
Humanitarian Data Exchange and WorldPop, which value open data practices and offer a high level
of reliability and variety in the data collected.
When we started working on the project, we also aspired to a more in-depth analysis. For example,
we wanted to zoom further into the regions of the epicenter countries and visualize the impact of
Ebola within a specific country, but unfortunately we could not find enough data to perform this
task. Even when we did find data on Ebola cases per district for the epicenter countries, the
period it covered differed from the timeframe we had already set for our project. We therefore
had to give up this idea, even though we had spent quite some time working in that direction.
Data Formatting
In order to create data visualizations in Tableau later on, we first had to format the acquired
data into a dataset that would be functional and useful for our project scope and analysis. For
the demographic analysis, for example, we collected data from the World Health Organization: we
first downloaded the individual data points for various dates from December 2014 to December 2015,
and then merged and organized them using Microsoft Visual Basic so that our dataset would be ready
for visualization in Tableau. An example of our formatting is shown in Figure 1.
Figure 1 shows, at the top, the individual data points for a specific date in Excel format as acquired from the
World Health Organization website and, at the bottom, the formatted dataset for Guinea after merging and
cleaning the individual datasets for different dates between December 2014 and December 2015 in Microsoft
Visual Basic.
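The merging step described above can be sketched in a few lines of Python. This is only an illustration and not the Visual Basic procedure we used: the column names (country, cases, deaths), the snapshot dates and all values are hypothetical placeholders. It assumes that every downloaded snapshot is a small table for one reporting date, and it stacks the snapshots into a single long-format table with an explicit date column, which is the shape Tableau handles most easily.

```python
import csv
import io


def merge_snapshots(snapshots):
    """Stack per-date snapshots into one long-format table.

    `snapshots` maps an ISO date string to the CSV text downloaded for that date.
    Rows without a country name are dropped as part of the cleaning step.
    """
    rows = []
    for date, csv_text in snapshots.items():
        for record in csv.DictReader(io.StringIO(csv_text)):
            country = record["country"].strip()
            if not country:
                continue  # skip empty or malformed rows
            rows.append({
                "date": date,
                "country": country,
                "cases": int(record["cases"]),
                "deaths": int(record["deaths"]),
            })
    # ISO dates sort chronologically when compared as strings.
    rows.sort(key=lambda r: (r["country"], r["date"]))
    return rows


# Hypothetical snapshots; the numbers are placeholders, not WHO figures.
snapshots = {
    "2015-01-07": "country,cases,deaths\nGuinea,10,6\nLiberia,20,9\n",
    "2014-12-31": "country,cases,deaths\nGuinea,8,5\n ,0,0\nLiberia,18,8\n",
}
merged = merge_snapshots(snapshots)
guinea = [r for r in merged if r["country"] == "Guinea"]
```

Filtering the merged table by country, as in the last line, yields a per-country time series of the kind shown at the bottom of Figure 1.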
Our Tool
Tableau is an existing data visualization tool that enables powerful and interactive analysis of
complex data. It offers a user-friendly, drag-and-drop interface, and a lot of documentation and
training videos are available online. As soon as we decided to use Tableau for our project, we
started experimenting with the different visualization possibilities and learning the advantages
and disadvantages of alternative visual encodings. We came to understand that, depending on the
dataset and the category of data we were using, there was one visualization that would confirm or
refute our initial hypothesis in the way that was most faithful to reality.
Our Approach
Our main approach to the data analysis was:
a. Research on the Web and formulation of an initial hypothesis (an intuitive, shortsighted assumption),
b. Experimentation with alternative correlations of the data (interactive queries or calculations) and
observation of the resulting visualizations,
c. Final decision on the most beneficial visualization technique to confirm or reject the initial
hypothesis.
Visualization Techniques
By focusing on the geographic and demographic data analysis of the Ebola outbreak, we had the
chance to experiment with different information visualization techniques:
a. For the geographic analysis, we used a multiple-views dashboard where the user can perform a
dynamic (visual) query on the data and interact with it by simply selecting a range of interest
(in our case the country). In this section, we mostly focused on multiple and interactive data
representations rather than on extensive operations on the data.
b. For the demographic analysis, we went beyond filtering by aggregating multiple datasets and
normalizing them to a common scale in order to confirm or reject our initial assumptions and be
able to draw stable conclusions.
In short, from a perceptual point of view, part 1 promotes data exploration (exploratory
visualization) through interaction with the data and through the color saturation that represents
the escalation of the Ebola spread and impact, whereas part 2 verifies or refutes preconceived
hypotheses by showing the data distribution and its pattern of evolution over time. Both parts
serve the purpose of presenting the data in a more communicative, visual way.
Techniques
Multiple linked views on a dashboard (Map and stacked bar charts)
Treemaps. We used treemaps to visualize the mortality rate of Ebola per country. The color
saturation along with the area of each rectangle highlights the impact of Ebola per country in
terms of fatality.
Bar charts. We used bar charts to compare values and study their distribution over time in
order to uncover any pattern in the occurrence of Ebola in people of different age groups
and gender.
Stacked bar charts. Differences in color and length show the evolution of cases per country
while also highlighting the fatality rate among the total cases of Ebola, namely the
part-to-whole relationship between cases of infected people and fatal cases of Ebola.
Motion chart. Used for visualizing the evolution of the rate of affected population as well as
the effect of Ebola on different age groups.
We used a different color for each country; each country has the same color in every bar
chart and in the motion chart visualization of the demographic analysis.
Geographic Impact
For the geographic data analysis, we designed a Tableau dashboard of interlinked multiple views
that enables interaction with the data. It helps to make better sense of the Ebola impact on the
epicenter countries compared to the global spread of Ebola, in terms of the cumulative number of
affected people (cases of people infected by Ebola plus fatal cases) and of whether new cases of
Ebola were present over the last 21 days in our data subset.
Although our figures mostly show the impact of the outbreak in Guinea, Liberia and Sierra Leone,
we mapped the global impact of the outbreak so that the countries could be compared more easily
through statistics and color density (saturation). On the right of the multiple-views dashboard,
there is a red color scale whose saturation encodes the impact of Ebola as the combined number of
infections and deaths per country.
The user can apply filters on the global map in order to see the geographical and cumulative
distribution of specific features of the data across the globe and the impact of Ebola per region.
In the map view, the redder a country is, the greater the impact of the Ebola outbreak and thus
the more urgent the situation has been. At the same time, the user can judge the urgency of the
situation per country by checking the national fatality rate and whether cases of Ebola were
still detected in the selected country over the last 21 days.
Figure 2: By selecting filters on the right, the user can explore the geographic distribution of
metrics such as the Ebola cases, the fatality rate, etc., along with their cumulative distribution
over time. An important observation is the striking variation in the number of affected people: in
some countries the number of total cases is only 3, whereas in the epicenter countries this number
exceeds 2.8 million affected people. The redder a country is on the global map, the more affected
it is by Ebola.
Figure 2 shows the interlinked multiple views dashboard for the Ebola worldwide impact.
Figure 3: With a click of the mouse, the user can see the geographic impact of Ebola per country,
along with the total number of infected people and of people who died of Ebola, in yellow and red
respectively, while also checking the number of affected people over the last 21 days.
Figure 3 shows the interlinked multiple views dashboard for the Ebola impact on the epicenter countries in West
Africa.
Figure 4: Although numerous countries were affected by Ebola, the disease proved deadly only in
the countries shown in Figure 4. Of all these countries, Mali has the highest mortality rate.
Figure 4 shows the mortality rate for the Ebola affected countries worldwide.
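The ranking behind a treemap like Figure 4 can be sketched as follows, assuming the mortality rate is the part-to-whole ratio of fatal cases to total cases per country. The country names and counts below are hypothetical placeholders rather than values from our dataset, and the function names are our own.

```python
def fatality_rate(cases, deaths):
    """Share of reported cases that were fatal (deaths / cases)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    if not 0 <= deaths <= cases:
        raise ValueError("deaths must be between 0 and cases")
    return deaths / cases


# Hypothetical (cases, deaths) pairs; placeholders, not reported figures.
counts = {
    "Country A": (100, 40),
    "Country B": (8, 6),
    "Country C": (20, 8),
}

rates = {country: fatality_rate(c, d) for country, (c, d) in counts.items()}
# Countries ordered from the highest to the lowest fatality rate.
ranked = sorted(rates, key=rates.get, reverse=True)
```

In this sketch the country with the fewest cases ranks first, which illustrates why a rate should be read together with the absolute number of cases: a high rate computed from a handful of cases is far less stable than one computed from thousands.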
Demographic Impact
In this part, in order to explore the data in more depth, we go beyond filtering and interaction:
we perform calculations on the data and visualize its distribution over time so that we can derive
meaningful insights and patterns. More specifically, we analyzed the spread of Ebola in terms of
age and gender distribution for the epicenter countries of Guinea, Liberia and Sierra Leone.
The main questions we aimed to answer were:
Does Ebola show a pattern related to the age group that people belong to?
What is the infection rate among children and the elderly? Are they less or more affected?
Does Ebola affect women or men more? Is there any pattern in the occurrence of the
disease with respect to gender?
Is Ebola actually an epidemic?
Although Ebola first appeared in the 1970s, it was not until the outbreak of 2014 that it was
considered an epidemic disease, as the number of cases (people infected) and the number of deaths
due to the disease increased significantly, alerting the global community. According to
Wikipedia, the West African Ebola virus epidemic was the most widespread outbreak of Ebola virus
disease in history, causing major loss of life and socioeconomic disruption in the region, mainly
in the countries of Guinea, Liberia, and Sierra Leone [1].
Initially, when we looked at the statistics of the affected population and computed the affected
population as a percentage of the general population of the West African countries, the
percentage of affected people seemed to be relatively low. That is why we wondered why Ebola was
defined as an epidemic.
After visualizing the data and searching the Web for how an epidemic is defined, we observed
that, according to a commonly used definition of an epidemic, Ebola has been highly contagious
and dangerous for people, as shown in Figure 5.
More specifically, according to Principles of Epidemiology as cited on Wikipedia, an epidemic is
the rapid spread of infectious disease to a large number of people in a given population within a
short period of time, usually two weeks or less. An attack rate in excess of 15 cases per 100,000
people for two consecutive weeks is considered an epidemic [2]. In Figure 5, it is clear that the
disease far exceeded 15 cases per 100,000 people for 2 consecutive weeks, so we can conclude that
Ebola was indeed an epidemic outbreak in the West African countries.
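The threshold check described above can be made explicit with a short sketch. It assumes that the attack rate is computed from the new cases of each week, which for a cumulative series such as the one in Figure 5 would be the difference between consecutive weekly totals. The population and the weekly counts are hypothetical placeholders, and the function names are our own.

```python
THRESHOLD_PER_100K = 15.0  # threshold quoted from the definition in [2]


def attack_rate_per_100k(new_cases, population):
    """New cases in one week per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases / population * 100_000


def exceeds_epidemic_threshold(weekly_new_cases, population,
                               threshold=THRESHOLD_PER_100K, weeks=2):
    """True if the attack rate stays above `threshold` for `weeks` consecutive weeks."""
    run = 0  # length of the current streak of weeks above the threshold
    for cases in weekly_new_cases:
        if attack_rate_per_100k(cases, population) > threshold:
            run += 1
            if run >= weeks:
                return True
        else:
            run = 0
    return False


# Hypothetical example: a population of one million and six weeks of new cases.
population = 1_000_000
weekly = [50, 160, 120, 200, 210, 90]
is_epidemic = exceeds_epidemic_threshold(weekly, population)
```

Here only the fourth and fifth weeks form a streak of two consecutive weeks above the threshold; the single high value in the second week would not be enough on its own.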
Figure 5 showcases the cumulative number of Ebola cases in West African countries over 2014 - 2015.
Figure 6 shows the increasing trend of the 2014 Ebola outbreak in the male and female population between
December 2014 and December 2015.
Figure 7: Our initial intuition from the motion chart in Figure 6 was that Ebola occurred more
frequently in the female population. In order to test this hypothesis, we plotted the number of
people affected by Ebola broken down by gender. A country may seem to have more female cases of
Ebola simply because its female population is larger. To avoid drawing wrong conclusions caused by
such a potentially unequal gender distribution, we normalized the data over the population of
each country.
Figure 7 shows the people affected by Ebola per gender, normalized with respect to the total population of
a country. The bar charts refer to the year 2015.
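A minimal sketch of this kind of normalization is given below. It assumes that each gender's case count is divided by the population of that gender, which is what the example of a larger female population suggests; dividing by the total population of the country instead would only change the denominator. All numbers are hypothetical placeholders.

```python
def cases_per_100k(cases, population):
    """Cases per 100,000 people of the given population group."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Hypothetical country: more female cases in absolute terms,
# but also a larger female population.
by_gender = {
    "female": {"cases": 1200, "population": 6_000_000},
    "male": {"cases": 1100, "population": 5_000_000},
}

normalized = {
    gender: cases_per_100k(group["cases"], group["population"])
    for gender, group in by_gender.items()
}
```

In this made-up example the raw counts suggest that women are more affected, while the normalized rates point the other way, which is exactly the kind of misreading the normalization is meant to prevent.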
Figure 8: For the analysis per age group, our initial hypothesis was that there would be more
affected people in the most socially active age group (15-44), namely the workers and the people
who take care of the more vulnerable age groups, e.g. children (0-14) and elderly people (45+).
We plotted the Ebola-affected population, classified into 3 age groups, in a motion chart. The
chart shows that people in the middle age group (15-44) are the most affected.
Figure 8 visualizes the pattern of the Ebola impact over the age groups 0-14, 15-44 and 45+ for the epicenter
countries.
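The three age groups can be expressed as a simple binning rule. The sketch below assumes that individual ages are available, which may not be the case when the source already reports counts per age group; the sample ages are hypothetical.

```python
from collections import Counter


def age_group(age):
    """Map an age in years to one of the three groups used in the analysis."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 14:
        return "0-14"
    if age <= 44:
        return "15-44"
    return "45+"


# Hypothetical ages of affected people, chosen to hit the group boundaries.
ages = [3, 14, 15, 30, 44, 45, 70]
group_counts = Counter(age_group(a) for a in ages)
```

Because the groups span different numbers of years and different shares of the population, counts per group are best compared after normalizing by the size of each age group.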
After May 2015, we also observed a sudden decrease in the number of Ebola-affected males and
females (as shown in Figure 6). Through our analysis and findings, we learned that the World
Health Organization (WHO) carried out a series of vaccinations in three phases in the affected
African countries, which resulted in a decrease in the affected population.
References
[1] https://en.wikipedia.org/wiki/West_African_Ebola_virus_epidemic. Accessed: 16.12.2016.
[2] https://en.wikipedia.org/wiki/Epidemic. Accessed: 16.12.2016.
[3] A. Vilanova. Information Visualization.
https://blackboard.tudelft.nl/webapps/blackboard/execute/content/file?cmd=view&content_id=_2901901_1&course_id=_56543_1. Accessed: 10.12.2016.
https://blackboard.tudelft.nl/webapps/blackboard/execute/content/file?cmd=view&content_id=_2904070_1&course_id=_56543_1. Accessed: 12.12.2016.
Dataset Sources:
[1] https://en.wikipedia.org/wiki/Epidemic
[2] http://apps.who.int/ebola/en/status-outbreak/situation-reports/ebola-situation-report-31-december-2014
[3] https://data.humdata.org/dataset
[4] http://www.worldpop.org.uk/data/data_sources/
[5] http://datatopics.worldbank.org/gender/indicators
[6] http://databank.worldbank.org/data/reports.aspx?source=global-bilateral-migration