Table of Contents
Executive summary
Introduction
Main body
Conclusion
Project Overview
For the project, we hope to use analytical tools like Tableau and R to achieve two goals:
1. Perform business intelligence analysis on the data set to determine the factors related to
death
2. Provide recommendations on business analysis strategy implementation
Introduction
Every year, the Centers for Disease Control and Prevention (CDC) releases the country's most
detailed report on death. The mortality dataset contains records of every death in the
country in 2014, including information about demographic background and causes
of death. The U.S. government uses the data to determine life expectancy and to
understand the complex circumstances of death across the country. In our study, we will
apply key business analytics concepts to analyze the DeathRecord dataset, assess the BA
maturity of the CDC, and provide our insights.
Main Body
The dataset contains four main tables:
DeathRecord: the primary table containing all the pertinent information in a
single row per death
EntityAxisConditions: ordered list of causes of death
RecordAxisConditions: unordered list of causes of death
Lookup Tables: a reference to show the code and description for each column
name from DeathRecord table
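As a minimal sketch of how a lookup table can be attached to DeathRecord, the base R
snippet below joins coded values to their descriptions. The toy data, the column names
(MannerOfDeath, Code, Description) and the code values are illustrative assumptions, not
the actual contents of the dataset.

```r
# Toy stand-in for DeathRecord: one row per death, manner of death stored as a code.
# All names and values here are assumptions for illustration only.
death_record <- data.frame(
  Id = 1:4,
  MannerOfDeath = c(2, 7, 1, 7)
)

# Toy stand-in for a lookup table: code plus human-readable description
manner_lookup <- data.frame(
  Code = c(1, 2, 7),
  Description = c("Accident", "Suicide", "Natural"),
  stringsAsFactors = FALSE
)

# Left join: keep every death record and attach the matching description
labelled <- merge(death_record, manner_lookup,
                  by.x = "MannerOfDeath", by.y = "Code", all.x = TRUE)

# merge() reorders rows by the join key, so restore the original record order
labelled <- labelled[order(labelled$Id), ]
print(labelled)
```

Using all.x = TRUE keeps records whose code has no lookup entry (their description
becomes NA), so no deaths are silently dropped from the analysis.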
In our study, we will apply the following topics from the course: Lead/Lag Data, Critical
Success Factor, SMART, Rockart Model, Insights, Explorative model, and BI Maturity,
to analyze this Kaggle dataset.
Lead/Lag data:
We have lag data on age and year of death, from which insights can be drawn
about deaths in a particular year for a particular age group.
Using the age data, patterns between deaths in a specific age group and a specific
manner of death can be evaluated, e.g. how often the manner of death is suicide
for a particular age group.
The association between age group and gender can be evaluated using the lag data on
age and gender from DeathRecord.
The manner-of-death lag data can be used to infer the relation between the age of a
person and the manner of death.
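The comparisons described above amount to cross-tabulating age group against manner of
death and against gender. A minimal base R sketch follows; the toy data, the column names
and the age-group boundaries are assumptions chosen for illustration.

```r
# Toy stand-in for DeathRecord; column names and values are illustrative assumptions
deaths <- data.frame(
  Age = c(19, 23, 34, 47, 52, 68, 71, 85),
  Sex = c("M", "F", "M", "M", "F", "F", "M", "F"),
  MannerOfDeath = c("Suicide", "Accident", "Suicide", "Natural",
                    "Natural", "Natural", "Accident", "Natural"),
  stringsAsFactors = FALSE
)

# Bin ages into groups; right = FALSE makes each interval closed on the left,
# e.g. [25, 45), so the labels match the whole-year ages they cover
deaths$AgeGroup <- cut(
  deaths$Age,
  breaks = c(0, 25, 45, 65, Inf),
  labels = c("0-24", "25-44", "45-64", "65+"),
  right = FALSE
)

# Number of deaths for each combination of age group and manner of death
age_by_manner <- table(deaths$AgeGroup, deaths$MannerOfDeath)
print(age_by_manner)

# Number of deaths for each combination of age group and sex
age_by_sex <- table(deaths$AgeGroup, deaths$Sex)
print(age_by_sex)
```

The same tables can be passed to prop.table() to compare shares rather than raw counts
when the age groups differ greatly in size.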
Explorative Analysis:
Explorative Analysis is an approach to analyzing data sets to summarize their main
characteristics with visual methods. By exploring this data, it is possible to formulate
hypotheses that can lead to new data collection, thus providing lead data for future
analysis.
The DeathRecord table contains 38 variables and 1,048,576 observations. Before conducting a
BI analysis, we first need to perform a data cleansing process. Using R, we randomly select
a sample of 2,000 observations from the dataset [1] and use it to represent the whole
population.
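A simple random sample of this kind can be drawn as sketched below. This is a base R
illustration on generated toy data, not the project's own script; the object names and
the seed are assumptions.

```r
set.seed(42)  # arbitrary seed, used only to make the sample reproducible

# Toy stand-in for the full DeathRecord table (the real table would be read from file)
n_total <- 10000
death_record <- data.frame(
  Id = seq_len(n_total),
  Age = sample(0:100, n_total, replace = TRUE)
)

# Draw 2,000 distinct row indices, then keep only those rows
sample_size <- 2000
sample_rows <- sample(nrow(death_record), size = sample_size, replace = FALSE)
death_sample <- death_record[sample_rows, ]

# Quick check that the sample resembles the full table on a key variable
summary(death_record$Age)
summary(death_sample$Age)
```

Sampling without replacement guarantees that no death record appears twice in the
sample, and comparing summaries is a cheap way to spot an unrepresentative draw.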
Data Reduction
1. Find the most significant measurements: In our dataset, the following variables are
considered the most significant for measuring the required KPIs: Age, Education
Status, Causes of Death, and Marital Status.
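In R, this reduction step comes down to keeping only the chosen columns. The sketch below
uses toy data, and the column names are assumed stand-ins for the four variables listed
above.

```r
# Toy sample with one extra column; all names and values are illustrative assumptions
death_sample <- data.frame(
  Id = 1:4,
  Age = c(34, 67, 81, 45),
  Education = c("Bachelors", "8th grade", "High school", "Graduate"),
  CauseOfDeath = c("I21", "C34", "I50", "X70"),
  MaritalStatus = c("M", "W", "M", "D"),
  DayOfWeekOfDeath = c(2, 5, 7, 1),
  stringsAsFactors = FALSE
)

# Keep only the variables judged most significant for the KPIs
key_vars <- c("Age", "Education", "CauseOfDeath", "MaritalStatus")
reduced <- death_sample[, key_vars]

str(reduced)
```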
Cluster Analysis
1. Grouping/Segmentation: Group together deaths with the same cause and analyze them by
various factors: Age Group, Marital Status, and Education. For example, marital status
values (married, divorced, single, etc.) were grouped to allow a clearer analysis.
Education was grouped into levels (8th grade, bachelor's, graduate, etc.) to understand
how deaths are distributed across education levels. Here, we will use Tableau to
visualize the groups.
2. Allows for targeting: after grouping the dataset, we set the following targets:
a. The relationship between death and marital status
b. Top causes of death
c. Death and seasonality
[1] Please refer to the Appendix for the R syntax used to draw the sample of 2,000
observations.
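The three targets can each be approached with a simple frequency count over the grouped
data. The base R sketch below uses toy data; the column names, the marital status codes
and the cause labels are assumptions for illustration, and this grouping is plain
recoding and counting rather than a statistical clustering algorithm.

```r
# Toy stand-in for the sampled data; names, codes and values are assumptions
deaths <- data.frame(
  MaritalStatus = c("M", "M", "D", "S", "W", "M", "S", "D"),
  Cause = c("Heart disease", "Cancer", "Heart disease", "Accident",
            "Heart disease", "Cancer", "Suicide", "Cancer"),
  MonthOfDeath = c(1, 1, 2, 7, 12, 12, 7, 1),
  stringsAsFactors = FALSE
)

# Recode marital status codes into readable groups
status_labels <- c(M = "Married", D = "Divorced", S = "Single", W = "Widowed")
deaths$MaritalGroup <- unname(status_labels[deaths$MaritalStatus])

# a. Deaths by marital status
by_marital <- table(deaths$MaritalGroup)
print(by_marital)

# b. Causes of death ranked from most to least frequent
top_causes <- sort(table(deaths$Cause), decreasing = TRUE)
print(top_causes)

# c. Deaths per calendar month; fixing the levels keeps months with zero deaths visible
by_month <- table(factor(deaths$MonthOfDeath, levels = 1:12, labels = month.abb))
print(by_month)
```

Counts like these can be exported with write.csv() and loaded into Tableau for the
visualizations mentioned above.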
Appendix
a. Indicate which of the course materials were used during the project. For each item,
briefly describe the context in which it was used (1 to 2 bullet points is sufficient).
b. Hours spent on different project tasks by each team member. Anything else that is
relevant to the project but is not appropriate for the main body of the Project Summary
Report
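# Install data.table (with its dependencies) only if it is not already available
if (!requireNamespace("data.table", quietly = TRUE)) {
  install.packages("data.table", dependencies = TRUE)
}
library(data.table)  # fast file reading and table manipulation
library(MASS)        # distributed with standard R installations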