HS791: Data Presentation

Ambarish Kunwar
Department of Biosciences and Bioengineering
Indian Institute of Technology Bombay

Outline of this Lecture


Types of variables and data
Presenting your data using figures
Making proper figures and tables
Types of error bars
Proper use of different types of error bars

Dr. Ambarish Kunwar, Department of Biosciences and Bioengineering, IIT Bombay

Variables and Data


VARIABLES: Quantities whose values vary from one
measurement to another are called variables.
Examples: gender and age of people at a party.
Two types: qualitative and quantitative.
DATA: Data are the values of qualitative or quantitative
variables belonging to a set of measurements.
Qualitative data: no numbers can be assigned.
Quantitative data: numbers can be assigned.

Quantitative Variable
A quantitative variable may be continuous, i.e. it can
assume any value within a certain range,
or it may assume only integer values (whole numbers)
and not fractions of integers.
Continuous variables are usually measurements.
Examples: heights, weights, lengths.
Integer (discontinuous) variables are usually counts.
Examples: number of petals on a flower, number of fish
in a pond.


Qualitative Variable
Qualitative data arise when the observations fall into
separate distinct categories.
Examples:
Colour of eyes: blue, green, brown
Exam result: pass or fail
Socio-economic status: lower, middle or upper.
Qualitative data are classified as:
Nominal if there is no natural order between the categories
(e.g. eye color)
Ordinal if an ordering exists (e.g. grades, pain level)

Tutorial Assignment I
Please indicate the type of each variable by checking the two
appropriate boxes from: integer, continuous, quantitative,
qualitative, ordinal, and nominal.


Presenting your data (Discrete)


A simple and effective way of summarizing discrete
(qualitative or quantitative) data is to count the
number of observations falling into each category.
The number associated with each category is called the
frequency, and the collection of frequencies over all
categories gives the frequency distribution of that
variable.

The relative frequency is a number that describes the
proportion of observations falling in a given category.


Presenting your data (Discrete)


Illustrated using data on the number of students in different
disciplines in a university
Discipline     Total Frequency   Relative Frequency   Percentage
Physics              70               0.175              17.5
Chemistry            85               0.225              22.5
Mathematics          30               0.075               7.5
Biology              90               0.225              22.5
Statistics           45               0.1125             11.25
History              32               0.0875              8.75
Economics            34               0.100              10
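
As a minimal sketch of how frequencies, relative frequencies and percentages relate, the Python snippet below recomputes the last two columns from the counts in the table above. The function name `frequency_table` is an illustrative choice, not part of the lecture. Note that the listed counts sum to 386, whereas the printed relative frequencies appear to correspond to a total of 400 (for example 70/400 = 0.175), so the recomputed values differ slightly from the printed ones.

```python
# Counts copied from the table above; the function name is illustrative.
counts = {
    "Physics": 70,
    "Chemistry": 85,
    "Mathematics": 30,
    "Biology": 90,
    "Statistics": 45,
    "History": 32,
    "Economics": 34,
}


def frequency_table(counts):
    """Return {category: (frequency, relative frequency, percentage)}."""
    total = sum(counts.values())
    return {
        name: (freq, freq / total, 100 * freq / total)
        for name, freq in counts.items()
    }


table = frequency_table(counts)
for name, (freq, rel, pct) in table.items():
    print(f"{name:<12} {freq:>4} {rel:>8.4f} {pct:>7.2f}")
```

By construction the relative frequencies sum to 1 and the percentages to 100, which is a quick consistency check for any frequency table.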

The frequency distribution of a variable is often presented
graphically as a Bar Chart.

Presenting your data (Bar Chart)


The vertical axis can show:
frequencies,
or relative frequencies,
or percentages.

On the horizontal axis:
All bars should have the same width.
Leave gaps between the bars (there is no connection between the categories).
Bars can be in any order.
An alternative way of displaying the data is a Pareto Chart.
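
A Pareto chart is conventionally a bar chart whose bars are sorted in descending order of frequency, often with a cumulative-percentage line on top. The sketch below only prepares the numbers for such a chart; the function name `pareto_order` and the example counts are hypothetical and chosen purely for illustration.

```python
def pareto_order(counts):
    """Sort categories by descending frequency and attach cumulative percentages."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for name, freq in ordered:
        running += freq
        rows.append((name, freq, 100 * running / total))
    return rows


# Hypothetical counts, chosen only to illustrate the ordering.
example = {"A": 10, "B": 40, "C": 30, "D": 20}
for name, freq, cum_pct in pareto_order(example):
    print(f"{name}: {freq:>3}  cumulative {cum_pct:5.1f}%")
```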

Presenting your data (Pie Chart)


For relative frequencies or percentages, however,
a Pie Chart is often more appropriate.
Each slice represents a proportion of the total.


Presenting your data (Scatter Plot)


Bar charts and pie charts are often used to describe the structure within a
particular data set.
Often, occasions arise where we want to examine the relationships
between two or more quantitative data sets. The data can be continuous or
integer.
Examples:
1. Is the number of caterpillars on an oak leaf (integer variable) related to
the size of the leaf (continuous variable) for a sample of 100 leaves?
2. Is the number of sparrows (integer variable) in a particular village
determined by the number of houses (integer variable)?
3. Is the length of the upper arm bone (continuous variable) related to the
length of the upper leg bone (continuous variable) in a group of 100
students?

Presenting your data (Scatter Plot)


The first thing to think about is whether you believe one of the
variables (the INDEPENDENT variable) has the main effect on
the values of the other variable (the DEPENDENT variable), but
not vice versa.
If so, plot the independent variable on the x-axis and the dependent
variable on the y-axis.

Identify the independent variable and the dependent variable in the
last three examples.
If neither variable is clearly dependent or independent, then one
can choose either variable to be the dependent variable.

Presenting your data (Scatter Plot)


Consider another scatter plot: the relationship between the
number of fish and pond size.
As pond size increases, there is a tendency for the number of
fish to also increase. The relationship isn't perfect (the points
do not all lie on a single line), but the overall trend is clear.

A point lying well away from all the other points is called an
OUTLIER, and care should always be taken to identify outliers.

Presenting your data (Scatter Plot)


Why is this point different?
Possible reasons:
Number of fish was overestimated?
Area of the pond was underestimated?
Measurements were correct, but the water level of the pond was higher
than in the other ponds and so it supported more fish?

We can't tell by looking at the data; we would need
to study this pond in detail to find out.
However, the scatter plot has helped us to visualize the
data and identify this outlier.

Presenting your data (Continuous Data)


The frequency distribution of a discrete quantitative variable may
be summarized in a bar chart, or as relative percentages in a pie chart.
Similarly, the frequency distribution of a continuous quantitative
variable can be constructed in the same way by first grouping the
observations. Example: body weight of 100 individuals

73.26  71.87  67.88  87.78  64.00  65.78  61.45  65.64  69.77  70.39
71.46  81.55  82.45  79.77  81.38  87.35  68.76  69.00  83.67  81.81
66.54  69.87  77.46  79.93  65.69  68.62  76.50  78.97  87.98  82.76
84.38  86.35  65.76  82.00  83.38  78.32  65.69  69.63  71.67  78.66
80.45  69.97  84.39  77.55  78.26  77.65  82.99  69.69  64.66  78.50
81.95  72.49  79.27  69.13  64.29  75.67  82.25  87.18  64.20  65.58
81.45  72.89  68.25  66.53  65.39  82.32  85.69  75.63  74.67  78.36
71.59  80.95  79.72  67.32  69.93  64.39  75.62  82.75  69.97  81.49
64.35  66.16  72.07  72.77  71.38  76.39  74.76  79.00  73.67  80.81
65.87  68.99  67.95  79.93  82.69  78.62  69.50  68.97  78.89  80.76

Put them in bins, then find the frequency of each bin.
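
The binning step can be sketched as follows. This is a minimal illustration, not the procedure used to draw the lecture's histogram: the helper `bin_counts`, the bin start (60) and the bin width (5) are assumed choices, and only the first ten weights listed above are used to keep the example short.

```python
import math


def bin_counts(values, start, width, n_bins):
    """Count values in half-open bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = math.floor((v - start) / width)
        if 0 <= i < n_bins:  # values outside the covered range are ignored
            counts[i] += 1
    return counts


# First ten weights from the data above; bin start and width are assumptions.
weights = [73.26, 71.87, 67.88, 87.78, 64.00, 65.78, 61.45, 65.64, 69.77, 70.39]
counts = bin_counts(weights, start=60, width=5, n_bins=6)
for i, c in enumerate(counts):
    lo = 60 + 5 * i
    print(f"{lo}-{lo + 5}: {c}")
```

Changing the bin width changes the shape of the resulting frequency distribution, so the chosen bins should be reported along with the plot.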


Presenting your data (Histogram)


Charts which show the frequency distributions of continuous
variables are called histograms.
Unlike bar charts, they are drawn without gaps between the bars
because the x-axis is used to represent the class intervals

Not so good in Excel!

Sometimes bin centers are shown


Tutorial Assignment II
A particular data set is described to you and you must choose
which of the following four graphs (Bar Chart, Histogram, Pie Chart
and Scatter Plot) is most appropriate for presentation of this data


How to make good figures


First, let's look at a bad figure.

Problems with the figure?

Appropriate axes and origin
Tick marks
Legends


Anatomy of a Figure

Do not use figures unnecessarily: they take up space and are costly.
Avoid them especially when the result can be described in a single line of text.

Figure source: http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtablefigs.html

Anatomy of a Table

Use a figure rather than a table if the same information can be
conveyed by a figure. Tables often take more space than figures.

Table source: http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtablefigs.html

Error Bars
Researchers are often unsure how error bars
should be used and interpreted.
This part of the lecture discusses some basic features of
error bars and how they can help communicate data and
assist correct interpretation.
Error bars may show confidence intervals, standard
errors, standard deviations, or other quantities.
Different types of error bars give quite different
information, so figure legends must make
clear what the error bars represent.

Error Bars (Descriptive and Inferential)


Error bars, if used properly:
EITHER give information describing the data
(Descriptive Error Bars)
OR, give information about what conclusions,
or inferences, are justified
(Inferential Error Bars)


Error Bars (Descriptive)

Range and standard deviation (SD) are descriptive error bars
because they show how the data are spread.
Range error bars encompass the lowest and highest values.
Standard deviation (SD) roughly gives the average or typical
difference between the data points and their mean, M.
For approximately normally distributed data, about two thirds of
the data points lie within the mean ± 1 SD region, and ~95% of the
data points lie within mean ± 2 SD.

Figure source: Cumming et al., JCB, Vol. 177, 7 (2007)
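
The descriptive quantities above can be computed with Python's standard `statistics` module. This is a minimal sketch: the helper name `describe` and the data values are hypothetical, and the sample SD (dividing by n - 1) is assumed.

```python
import statistics


def describe(values):
    """Return mean, sample SD, range, and the fraction of points within mean ± 1 SD."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    inside = sum(1 for v in values if m - sd <= v <= m + sd)
    return {
        "mean": m,
        "sd": sd,
        "range": (min(values), max(values)),
        "within_1sd": inside / len(values),
    }


# Hypothetical measurements, used only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = describe(data)
print(summary)
```

For this small sample, 6 of the 8 points fall within mean ± 1 SD. The "two thirds" figure is a rule of thumb for roughly normal data, not a guarantee for every sample.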

Error Bars (Descriptive)


The mean of your results (M) will tend to get closer and closer
to the true mean μ as you increase the size of your
sample, or repeat the experiment more times. Therefore,
we can use M as our best estimate of the unknown μ.
Similarly, if you repeat an experiment more and more times,
the SD of your experimental results will tend to more and
more closely approximate the true standard deviation (σ)
that you would get if the experiment were performed an
infinite number of times.
The SD of the experimental results will be approximately
equal to σ, whether n is large or small.

Error Bars (Inferential)


In biology and some other fields it is common to compare
samples from two groups, to see if they are different.
Examples: wild-type vs mutant mice, or an experimental result
vs a control.
To make inferences from the data, i.e. to judge whether the
results are significantly different or whether the
differences are due to random fluctuation or
chance, a different type of error bar can be used (inferential).
These inferential error bars are standard error (SE/SEM)
bars and confidence intervals (CI/95% CI).

Error Bars (Inferential)


The mean (M) of the data, with SE or CI error bars, gives an
indication of the region where you can expect the mean (μ) of
the whole possible set of results to lie.
This region defines the values
that are most plausible for μ.
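
As a hedged sketch of how these inferential bars are usually computed: the standard error is SE = SD/√n, and a 95% CI is M ± t × SE, where t is the two-sided 95% critical value of Student's t with n - 1 degrees of freedom. The helper name `se_and_ci`, the small lookup table of t values and the data are assumptions made for illustration. The slides themselves do not give these formulas.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for a few degrees of freedom
# (standard table values; extend as needed).
T_CRIT_95 = {2: 4.303, 4: 2.776, 9: 2.262}


def se_and_ci(values):
    """Return (mean, SE, (CI low, CI high)) for independent results."""
    n = len(values)
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # SE = SD / sqrt(n)
    t = T_CRIT_95[n - 1]  # raises KeyError if n - 1 is not in the small table
    return m, se, (m - t * se, m + t * se)


# Hypothetical results from n = 5 independent experiments.
results = [4.0, 5.0, 6.0, 5.0, 5.0]
m, se, (low, high) = se_and_ci(results)
print(f"M = {m:.2f}, SE = {se:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because t is much larger for small n (about 4.3 for n = 3 versus about 2.3 for n = 10), a 95% CI bar is considerably wider than an SE bar when only a few independent results are available.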

Figure source: Cumming et al., JCB, Vol. 177, 7 (2007)


Types of Error Bars


Error bars can be descriptive or inferential, and could be any of
the bars discussed on previous slides (or something else too).

Error bars are meaningless and misleading if the figure
legend does not state what kind they are.
Take Home Message I
When showing error bars, always describe in the figure
legend what they are.
Use error bars with caution, especially when reporting data from
replicate measurements and representative experiments.

Replicates or Independent Samples


Scientists handle the wide variation that occurs in nature by
measuring a number of independently sampled individuals,
independently conducted experiments, or independent
observations (n).

Take Home Message II
The value of n (sample size or number of independently
performed experiments) must be stated in the figure legend.
It is very important that n (the number of independent
results) is carefully distinguished from the number of
replicates.

Replicates: Example
Replicates: repetition of a measurement on one individual in a
single condition, or multiple measurements of the same or
identical samples.
Consider a lab experiment to determine whether deletion of a
gene in mice affects tail length.
Option 1: choose one mutant mouse and one wild-type mouse, and
perform 10 measurements of each of their tails.
Option 2: measure the tail lengths of 10 wild-type mice and 10
mutant mice.


Replicates: Example
Option 1 cannot answer the central question, whether gene
deletion affects tail length, because n = 1 for each genotype,
no matter how often each tail was measured.
To address this question successfully we must distinguish
the possible effect of gene deletion from natural animal-to-animal variation.
Therefore, Option 2 is the correct experiment to do, as n > 1
(here n = 10 for each genotype).
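
The distinction between replicates and independent samples can be made concrete in code. In this sketch, replicate measurements are first averaged within each animal, and n is then the number of animals. The helper `summarize_group` and all tail lengths are hypothetical, and three mice are used instead of ten only to keep the example short.

```python
import math
import statistics


def summarize_group(measurements_per_animal):
    """Average replicates within each animal, then summarize across animals.

    n is the number of animals (independent samples), not the number of measurements.
    """
    animal_means = [statistics.mean(reps) for reps in measurements_per_animal]
    n = len(animal_means)
    m = statistics.mean(animal_means)
    # SE is undefined for n = 1: one animal says nothing about animal-to-animal variation.
    se = statistics.stdev(animal_means) / math.sqrt(n) if n > 1 else None
    return n, m, se


# Hypothetical tail lengths (cm). Option 1: one mouse measured three times.
option1 = [[8.1, 8.0, 8.2]]
# Option 2: three mice, each measured once.
option2 = [[8.1], [7.6], [8.6]]

print(summarize_group(option1))
print(summarize_group(option2))
```

However many times the single tail in Option 1 is measured, n stays at 1 and no inferential error bar can be computed. Only Option 2 yields an estimate of the variation between animals.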


Representative Experiments
Sometimes a figure shows only the data for a representative
experiment. This implies that several other similar
experiments were also performed. If a representative
experiment is shown, then n = 1, and no error bars should
be shown
Take Home Message III
Show error bars only for independently repeated
experiments, and never for replicates. Data from a
representative experiment should not have error bars,
because in such an experiment, n = 1


Comparing Results-Which Error Bars to use?


Which error bars should be used when comparing experimental
results?
Biologists usually try to compare experimental
results with controls.
It is therefore usually appropriate to show inferential error bars,
such as SE or CI, rather than SD when comparing experimental
results with controls.


Summary - Common Error Bars

Table source: Cumming et al., JCB, Vol. 177, 7 (2007)

Take Home Message IV



Acknowledgments and Useful Resources


I would like to thank all those whose scientific papers,
lecture notes and other course materials available on
the internet have directly/indirectly helped me to
prepare these lecture slides.
Some of the useful resources are listed below:
1. Cumming, G.; Fidler, F.; Vaux, D. L. (2007). "Error bars in experimental biology".
   The Journal of Cell Biology 177 (1): 7-11. doi:10.1083/jcb.200611141
2. http://www.dur.ac.uk/stat.web/variab.htm
3. http://www.dur.ac.uk/~dbl0www6/cont_pres.htm
4. http://www.ruf.rice.edu/~bioslabs/tools/tools.html
5. http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtablefigs.html


Thank You !