
How to Lie with Statistics

Econ 105 (2012-2013)-Last Class

Many ways of misusing statistics


Only a few of them are covered here:

Misinformation
Fabricating your own data
Playing with graphs
Relying on absolute numbers rather than relative ones (ignoring the baseline)
Arbitrary or wrong comparisons
Outliers
Causality versus correlation
Sampling bias or selection bias

Misinformation
You assume that your claim is supported by the available statistics without referring to any of them.

Example: "The Turkish economy did not experience a crisis in 2009." If you take the time to check the data, it is clear that, by the definition of a crisis used in economics, the Turkish economy went through a deep crisis in 2009, very similar to the 2001 crisis in terms of the decline in GDP growth.

Fabricating your own data


You can simply pretend to know the data and/or fabricate artificial data. There are many examples of artificial survey data sets created to support a particular hypothesis.

Playing with Data (choosing a specific presentation method to support your argument)
[Figure: bar chart "A Comparison of Population of Some Cities" for Istanbul, Ankara, and Adana; vertical axis from 0 to 16,000,000]

In our case,
[Figure: bar chart "A Comparison of Population of Some Cities" for Istanbul, Ankara, and Adana; vertical axis from 0 to 70,000,000]
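The effect of the axis range on how different the bars look can be sketched with a few lines of Python. This is only an illustration: the two axis maxima (16,000,000 and 70,000,000) come from the charts above, but the population values are round hypothetical figures and the helper names `bar_fraction` and `visible_gap` are made up for this sketch.

```python
def bar_fraction(value, axis_min, axis_max):
    """Fraction of the plot height that a bar for `value` occupies."""
    return (value - axis_min) / (axis_max - axis_min)

# Hypothetical round population figures, for illustration only
# (they are NOT taken from the slides).
populations = {"Istanbul": 13_000_000, "Ankara": 4_500_000, "Adana": 2_000_000}

def visible_gap(axis_max, axis_min=0):
    """Difference in drawn height between the tallest and the shortest bar."""
    heights = [bar_fraction(p, axis_min, axis_max) for p in populations.values()]
    return max(heights) - min(heights)

gap_tight_axis = visible_gap(16_000_000)  # axis range of the first chart
gap_wide_axis = visible_gap(70_000_000)   # axis range of the second chart

print(f"gap on 0-16M axis: {gap_tight_axis:.2f} of the plot height")
print(f"gap on 0-70M axis: {gap_wide_axis:.2f} of the plot height")
```

The data are identical in both cases; only the axis changes. Stretching the axis makes the cities look far more alike than they are.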

An example from a web page (the weight of the pumpkins of three farmers):

What if you wanted to convince people that all the pumpkins were about the same size? You could redraw the graph:

Ignoring bases
Among the cities below, which is the most dangerous in Turkey? Adana? Ankara? Çankırı? Gümüşhane? Istanbul?

Total Convicts

City       Total    Male  Female
Turkey     88480   84956    3524
Istanbul   13057   12583     474
Ankara      4662    4504     158
Giresun      392     376      16
Adana       2956    2830     126

Relying on absolute numbers rather than relative numbers (ignoring bases)


Total Convicted People in Different Cities in 2011

Per person = convicts / city population; per male = male convicts / male population; per female = female convicts / female population (rates in percent).

City       Total    Male  Female  Per person  Per male  Per female
Turkey     88480   84956    3524       0.118     0.226       0.009
Istanbul   13057   12583     474       0.096     0.184       0.007
Ankara      4662    4504     158       0.095     0.185       0.006
Çankırı       92      91       1       0.052     0.103       0.001
Giresun      392     376      16       0.296     0.561       0.024
Adana       2956    2830     126       0.140     0.269       0.012
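A short sketch of how the ranking changes once the baseline is taken into account. The counts and per-person figures are copied from the table above. The variable names are only illustrative.

```python
# Total convicts per city (absolute numbers, from the table).
convicts = {
    "Istanbul": 13057,
    "Ankara": 4662,
    "Çankırı": 92,
    "Giresun": 392,
    "Adana": 2956,
}

# Convicts relative to city population (per-person column, from the table).
per_person = {
    "Istanbul": 0.096,
    "Ankara": 0.095,
    "Çankırı": 0.052,
    "Giresun": 0.296,
    "Adana": 0.140,
}

# Rank the cities from highest to lowest under each measure.
by_count = sorted(convicts, key=convicts.get, reverse=True)
by_rate = sorted(per_person, key=per_person.get, reverse=True)

print("Ranked by absolute number:", by_count)
print("Ranked by relative number:", by_rate)
```

Ranked by absolute numbers, Istanbul comes first. Ranked by the relative figures, Giresun comes first and Istanbul drops to the middle of the list.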

Example: The statement "Istanbul is the most dangerous of these cities, whereas Giresun is the safest" is wrong. Relative numbers (the baseline!) matter.

In fact, many newspaper columns are simply wrong because they do not pay attention to relative numbers. Any other examples?

Arbitrary or wrong comparisons

Remember our example: to support the argument that the Turkish current account deficit is not very bad, you can pick a couple of countries whose current accounts are worse than Turkey's. Or, to support the argument that the Turkish current account is very bad, you can pick only a couple of countries whose current accounts are better than Turkey's!

Outlier Problem
Name                  Age
Ali                    18
Mehmet                 20
Selin                  19
Dilem                  24
Sevgi                  23
Murat                  69
Mean                28.83
Standard Deviation  18.09

We may need to get rid of outliers to interpret our data sets better!
Name                  Age
Ali                    18
Mehmet                 20
Selin                  19
Dilem                  24
Sevgi                  23
Mean                20.80
Standard Deviation   2.32
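The two sets of summary statistics above can be reproduced with Python's standard library. This is a minimal sketch. It uses the population standard deviation (`statistics.pstdev`, which divides by n), because that is the version that matches the figures on the slide. The sample standard deviation (`statistics.stdev`) would give somewhat larger values.

```python
from statistics import mean, pstdev

# Ages from the table above; Murat (69) is the outlier.
ages = {"Ali": 18, "Mehmet": 20, "Selin": 19, "Dilem": 24, "Sevgi": 23, "Murat": 69}

def summarize(values):
    """Return (mean, population standard deviation), rounded to 2 decimals."""
    values = list(values)
    return round(mean(values), 2), round(pstdev(values), 2)

with_outlier = summarize(ages.values())
without_outlier = summarize(age for name, age in ages.items() if name != "Murat")

print("With Murat:   ", with_outlier)
print("Without Murat:", without_outlier)
```

A single observation moves the mean by about eight years and multiplies the standard deviation several times over.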

Causality versus Correlation


Remember our discussion: relying on figures or on correlation/econometric results, some people may simply argue that event A is caused by event B. However, correlation does not necessarily mean causality. A simple correlation statistic or a sophisticated econometric result is only indicative and does not by itself imply a causal relationship among variables!

[Figure: "Inflation and Population": line chart of inflation and population growth (%), 1995-2011; two vertical axes, one from 0 to 100 and one from 0.8 to 1.7]
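Two series that both trend in the same direction over time will show a high correlation even if neither has anything to do with the other. The sketch below illustrates this with synthetic data. The two series are invented for the example and are not the actual inflation or population growth figures. The helper names `pearson` and `differences` are also only illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def differences(xs):
    """Period-to-period changes, which remove a common linear trend."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]

# Synthetic data (an assumption of this sketch): two unrelated series
# that both happen to decline over 17 periods, each with its own wobble.
periods = range(17)
series_a = [90 - 5 * t + 3 * (-1) ** t for t in periods]
wobble = [1, -1, 0]
series_b = [1.7 - 0.05 * t + 0.02 * wobble[t % 3] for t in periods]

corr_levels = pearson(series_a, series_b)
corr_changes = pearson(differences(series_a), differences(series_b))

print(f"correlation of levels:  {corr_levels:.2f}")
print(f"correlation of changes: {corr_changes:.2f}")
```

The levels are almost perfectly correlated because both series share a downward trend. Once the trend is removed by looking at changes, the correlation disappears. A high correlation between two trending series is therefore no evidence that one causes the other.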

Selection bias or sampling problem


This problem may be an unintentional result of using complex data. Alternatively, it can be a very sophisticated technique for manipulating the public. It arises when a sample that may not represent the whole population is used to discuss an issue concerning the whole population.

Example of Sample Bias


Suppose that you want to conduct a survey about the possible results of the next election. You get 50,000 online responses to your questionnaire. You claim that your results are very reliable because your data set is larger than the data set used by any other study. What is the problem with your study and data?

Sampling problem (Selection Bias)


You only reached those who had access to the internet. However, this group of people may not be representative enough!
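A small simulation can show why a large but biased sample can be worse than a small random one. Everything in this sketch is an assumption made for illustration: the population size, the share of people with internet access (50%), and the support for a candidate among internet users (60%) and non-users (40%) are invented numbers, and `simulate_poll` is a hypothetical helper.

```python
import random

def simulate_poll(seed=0):
    """Compare a large online-only sample with a small random sample."""
    rng = random.Random(seed)

    # Assumed population: 200,000 voters, half of them with internet access.
    # Assumed preferences: 60% support among internet users, 40% among others.
    population = []
    for _ in range(200_000):
        online = rng.random() < 0.5
        supports = rng.random() < (0.6 if online else 0.4)
        population.append((online, supports))

    true_share = sum(s for _, s in population) / len(population)

    # Large sample, but drawn only from people who can answer online.
    online_voters = [s for o, s in population if o]
    online_sample = rng.sample(online_voters, 50_000)

    # Small sample, but drawn at random from the whole population.
    random_sample = [s for _, s in rng.sample(population, 1_000)]

    return {
        "truth": true_share,
        "online": sum(online_sample) / len(online_sample),
        "random": sum(random_sample) / len(random_sample),
    }

result = simulate_poll()
for name, share in result.items():
    print(f"{name:>6}: {share:.3f}")
```

Under these assumptions the 50,000 online responses miss the true share by about ten percentage points, while the 1,000 random responses land much closer to it. Sample size does not fix a sample that leaves part of the population out.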
