2
Data Collection
Data Vocabulary
Level of Measurement
Time Series and Cross-sectional Data
Sampling Concepts
Sampling Methods
Data Sources
Survey Research
Data Vocabulary
Data is the plural form of the Latin datum (a given
fact).
In scientific research, data arise
from experiments whose results
are recorded systematically.
In business, data usually arise from
accounting transactions or
management processes.
Types of Data
Attribute (qualitative)
Numerical (quantitative)
Example of a rating scale: Terrible, Poor, Adequate, Good, Excellent
Cross-sectional Data
Each observation represents a different individual
unit (e.g., person) at the same point in time
(e.g., monthly VISA balances).
We are interested in
- variation among observations or in
- relationships.
We can combine the two data types to get pooled
cross-sectional and time series data.
Sampling Concepts
Sample or Census?
A sample involves looking only at some items
selected from the population.
A census is an examination of all items in a
defined population.
Why can't the United States Census survey every
person in the population?
- Mobility
- Illegal immigrants
- Budget constraints
- Incomplete responses or nonresponses
Sampling Concepts
Situations Where A Sample May Be Preferred:
Infinite Population
No census is possible if the population is infinite or of indefinite size
(an assembly line can keep producing bolts, a doctor can keep
seeing more patients).
Destructive Testing
The act of sampling may destroy or devalue the item (measuring
battery life, testing auto crashworthiness, or testing aircraft turbofan
engine life).
Timely Results
Sampling may yield more timely results than a census (checking
wheat samples for moisture and protein content, checking peanut
butter for aflatoxin contamination).
Accuracy
Sample estimates can be more accurate than a census. Instead of
spreading limited resources thinly to attempt a census, our budget
of time and money might be better spent to hire experienced staff,
improve training of field interviewers, and improve data safeguards.
Cost
Even if it is feasible to take a census, the cost, either in time or
money, may exceed our budget.
Sensitive Information
Some kinds of information are better captured by a well-designed
sample, rather than attempting a census. Confidentiality may also
be improved in a carefully-done sample.
Sampling Concepts
Situations Where A Census May Be Preferred
Small Population
If the population is small, there is little reason to sample, for the effort of
data collection may be only a small part of the total cost.
Large Sample Size
If the required sample size approaches the population size, we might as
well go ahead and take a census.
Database Exists
If the data are on disk we can examine 100% of the cases. But auditing or
validating data against physical records may raise the cost.
Legal Requirements
Banks must count all the cash in bank teller drawers at the end of each
business day. The U.S. Congress forbade sampling in the 2000 decennial
population census.
Sampling Concepts
Parameters and Statistics
Statistics are computed from a sample of n items,
chosen from a population of N items.
Statistics can be used as estimates of parameters
found in the population.
Symbols are used to represent population
parameters and sample statistics.
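To make the distinction concrete, here is a minimal Python sketch. The population of account balances, the seed, and the sizes N = 1,000 and n = 50 are made-up values for illustration, not data from these slides. The population mean plays the role of the parameter and the sample mean plays the role of the statistic.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of N = 1,000 account balances (made-up values).
N = 1000
population = [round(rng.uniform(0, 5000), 2) for _ in range(N)]

# Parameter: describes the entire population (usually unknown in practice).
mu = statistics.mean(population)

# Statistic: computed from a sample of n items drawn without replacement.
n = 50
sample = rng.sample(population, n)
x_bar = statistics.mean(sample)

print(f"population mean (parameter) mu    = {mu:.2f}")
print(f"sample mean (statistic)     x_bar = {x_bar:.2f}")
```

In real work only the sample mean would be available, and it would serve as the estimate of the unknown population mean.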
Parameter or Statistic?
Parameter Any measurement that describes an entire population.
Usually, the parameter value is unknown since we
rarely can observe the entire population. Parameters
are often (but not always) represented by Greek
letters.
Statistic Any measurement computed from a sample. Usually,
the statistic is regarded as an estimate of a population
parameter. Sample statistics are often (but not
always) represented by Roman letters.
Sampling Concepts
Parameters and Statistics
The population must be carefully specified and the
sample must be drawn scientifically so that the
sample is representative.
Target Population
The target population is the population we are
interested in (e.g., U.S. gasoline prices).
The sampling frame is the group from which we
take the sample (e.g., 115,000 stations).
The frame should not differ from the target
population.
Sampling Concepts
Finite or Infinite?
A population is finite if it has a definite size, even if
its size is unknown.
A population is infinite if it is of arbitrarily large
size.
Rule of Thumb: A population may be treated as
infinite when N is at least 20 times n (i.e., when
N/n ≥ 20).
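The rule of thumb can be written as a one-line check. This is a small sketch that reads "at least 20 times" as N ≥ 20n; the function name may_treat_as_infinite is hypothetical.

```python
def may_treat_as_infinite(N: int, n: int) -> bool:
    """Rule of thumb: treat the population as infinite when N is at least 20 times n."""
    if n <= 0 or N < n:
        raise ValueError("need 0 < n <= N")
    return N >= 20 * n


# 875 customers, sample of 10: N/n = 87.5, so the rule is satisfied.
print(may_treat_as_infinite(875, 10))
# 875 customers, sample of 50: N/n = 17.5, so the rule is not satisfied.
print(may_treat_as_infinite(875, 50))
```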
Sampling Methods
Probability Samples
We rely on random
numbers to select a
name.
=RANDBETWEEN(1,48)
Sampling Methods
Random Number Tables
A table of random digits used to select random
numbers between 1 and N.
Each digit 0 through 9 is equally likely to be
chosen.
Setting Up a Rule
For example, NilCo wants to award cash prizes to
10 of its 875 loyal customers.
To get 10 three-digit numbers between 001 and
875, we define any consistent rule for moving
through the random number table.
Sampling Methods
Setting Up a Rule
Randomly point at the table to choose a starting
point.
Choose the first three digits of the selected five-
digit block, move to the right one column, down
one row, and repeat.
When we reach the end of a line, wrap around to
the other side of the table and continue.
Discard any number greater than 875 and any
duplicates.
Table of 1,000 Random Digits
82134 14458 66716 54269 31928 46241 03052 00260 32367 25783
07139 16829 76768 11913 42434 91961 92934 18229 15595 02566
45056 43939 31188 43272 11332 99494 19348 97076 95605 28010
10244 19093 51678 63463 85568 70034 82811 23261 48794 63984
12940 84434 50087 20189 58009 66972 05764 10421 36875 64964
84438 45828 40353 28925 11911 53502 24640 96880 93166 68409
98681 67871 71735 64113 90139 33466 65312 90655 75444 30845
43290 96753 18799 49713 39227 15955 46167 63853 03633 19990
96893 85410 88233 22094 30605 79024 01791 38839 85531 94576
75403 41227 00192 16814 47054 16814 81349 92264 01028 29071
78064 92111 51541 76563 69027 67718 06499 71938 17354 12680
26246 71746 94019 93165 96713 03316 75912 86209 12081 57817
98766 67312 96358 21351 86448 31828 86113 78868 67243 06763
37895 51055 11929 44443 15995 72935 99631 18190 85877 31309
27988 81163 52212 25102 61798 28670 01358 60354 74015 18556
19216 53008 44498 19262 12196 93947 90162 76337 12646 26838
28078 86729 69438 24235 35208 48957 53529 76297 41741 54735
34455 61363 93711 68038 75960 16327 95716 66964 28634 65015
53510 90412 70438 45932 57815 75144 52472 61817 41562 42084
30658 18894 88208 97867 30737 94985 18235 02178 39728 66398
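The rule described above can be sketched in Python using the table as printed. Two things are assumptions made for illustration: the starting point is taken to be the top-left block (the slides say to point at the table at random), and the walk also wraps from the bottom row back to the top. The function name pick_from_table is hypothetical.

```python
RANDOM_DIGITS = """
82134 14458 66716 54269 31928 46241 03052 00260 32367 25783
07139 16829 76768 11913 42434 91961 92934 18229 15595 02566
45056 43939 31188 43272 11332 99494 19348 97076 95605 28010
10244 19093 51678 63463 85568 70034 82811 23261 48794 63984
12940 84434 50087 20189 58009 66972 05764 10421 36875 64964
84438 45828 40353 28925 11911 53502 24640 96880 93166 68409
98681 67871 71735 64113 90139 33466 65312 90655 75444 30845
43290 96753 18799 49713 39227 15955 46167 63853 03633 19990
96893 85410 88233 22094 30605 79024 01791 38839 85531 94576
75403 41227 00192 16814 47054 16814 81349 92264 01028 29071
78064 92111 51541 76563 69027 67718 06499 71938 17354 12680
26246 71746 94019 93165 96713 03316 75912 86209 12081 57817
98766 67312 96358 21351 86448 31828 86113 78868 67243 06763
37895 51055 11929 44443 15995 72935 99631 18190 85877 31309
27988 81163 52212 25102 61798 28670 01358 60354 74015 18556
19216 53008 44498 19262 12196 93947 90162 76337 12646 26838
28078 86729 69438 24235 35208 48957 53529 76297 41741 54735
34455 61363 93711 68038 75960 16327 95716 66964 28634 65015
53510 90412 70438 45932 57815 75144 52472 61817 41562 42084
30658 18894 88208 97867 30737 94985 18235 02178 39728 66398
"""
TABLE = [line.split() for line in RANDOM_DIGITS.strip().splitlines()]


def pick_from_table(table, n_wanted, upper, start_row=0, start_col=0):
    """Walk the table one column right and one row down per step."""
    n_rows, n_cols = len(table), len(table[0])
    row, col = start_row, start_col
    chosen = []
    # Cap the number of steps: a diagonal walk eventually revisits cells.
    for _ in range(n_rows * n_cols):
        value = int(table[row][col][:3])  # first three digits of the block
        # Discard 000, anything above the upper limit, and duplicates.
        if 1 <= value <= upper and value not in chosen:
            chosen.append(value)
            if len(chosen) == n_wanted:
                break
        col = (col + 1) % n_cols  # wrap to the other side of the table
        row = (row + 1) % n_rows  # assumed: also wrap from bottom to top
    return chosen


winners = pick_from_table(TABLE, n_wanted=10, upper=875)
print(winners)
```

Starting from the top-left block, this walk selects customers 821, 168, 311, 634, 580, 535, 653, 638, 855, and 290. A different starting point gives a different sample, and because the diagonal path repeats after 20 steps, some starting points may yield fewer than 10 usable numbers under this particular rule.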
Sampling Methods
With or Without Replacement
If we allow duplicates when sampling, then we are
sampling with replacement.
Duplicates are unlikely when n is much smaller
than N.
If we do not allow duplicates when sampling, then
we are sampling without replacement.
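The two schemes map onto two functions in Python's standard random module. This is a brief sketch that assumes the 875 NilCo customers are numbered 1 to 875; the seed is arbitrary.

```python
import random

rng = random.Random(7)      # arbitrary seed, for repeatability only
customers = range(1, 876)   # customer numbers 1 through 875

# With replacement: the same customer may be drawn more than once.
with_replacement = rng.choices(customers, k=10)

# Without replacement: every drawn customer is distinct.
without_replacement = rng.sample(customers, k=10)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

With n = 10 and N = 875, duplicates in the first list are possible but unlikely, which is the point made above about n being much smaller than N.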
Sampling Methods
Computer Methods
Excel - Option A: Enter the Excel function =RANDBETWEEN(1,875)
into 10 spreadsheet cells. Press F9 to get a new sample.
Excel - Option B: Enter the function =INT(1+875*RAND()) into 10
spreadsheet cells. Press F9 to get a new sample.
Internet: The web site www.random.org will give you many
kinds of excellent random numbers (integers, decimals, etc.).
Minitab: Use Minitab's Random Data menu with the Integer
option.
Software generators such as those in Excel and Minitab are pseudo-random:
they follow a deterministic algorithm, and even the best algorithms
eventually repeat themselves.
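For readers without Excel or Minitab, the two Excel formulas have close analogues in Python's standard library. This is a sketch, not a statement about how Excel works internally; the seed value 2024 is arbitrary. Re-seeding the generator also shows what "pseudo-random" means in practice: the same seed reproduces the same numbers.

```python
import random

rng = random.Random(2024)  # arbitrary seed

# Analogue of =RANDBETWEEN(1,875): an integer from 1 to 875, both ends included.
option_a = [rng.randint(1, 875) for _ in range(10)]

# Analogue of =INT(1+875*RAND()): random() lies in [0, 1), so the result is 1..875.
option_b = [int(1 + 875 * rng.random()) for _ in range(10)]

# A generator started from the same seed repeats the same sequence exactly.
rerun = random.Random(2024)
repeat_a = [rerun.randint(1, 875) for _ in range(10)]

print(option_a)
print(option_b)
print(option_a == repeat_a)
```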
Using MINITAB to generate random numbers.
Sampling Methods
Row/Column Data Arrays
When the data are arranged in a rectangular array,
an item can be chosen at random by selecting a
row and column.
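A short sketch of the idea, assuming a hypothetical array of 4 rows and 5 columns with placeholder item names: pick a random row index and a random column index, then read the item at that position.

```python
import random

rng = random.Random(11)  # arbitrary seed

# Hypothetical 4 x 5 rectangular array of items (placeholder labels).
data = [[f"item_r{r}_c{c}" for c in range(1, 6)] for r in range(1, 5)]

row = rng.randrange(len(data))     # random row index, 0..3
col = rng.randrange(len(data[0]))  # random column index, 0..4
chosen = data[row][col]

print(f"row {row + 1}, column {col + 1}: {chosen}")
```

Because every row has the same number of columns, each item in the array has the same chance of being chosen.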
Sampling Methods
Cluster Sample
Example (figure): 4 elements sampled from each of 3 randomly chosen
clusters (two-stage cluster sampling).
Cluster sampling is useful when
- Population frame and stratum characteristics are
not readily available
- It is too expensive to obtain a simple or stratified
sample
- The cost of obtaining data increases sharply with
distance
- Some loss of reliability is acceptable
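Two-stage cluster sampling can be sketched as follows: first choose clusters at random, then choose elements at random within each chosen cluster. The 8 clusters of 20 units each are invented for illustration, and the function name two_stage_cluster_sample is hypothetical.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    return {cid: rng.sample(clusters[cid], n_per_cluster) for cid in chosen_ids}


rng = random.Random(3)  # arbitrary seed

# Hypothetical population: 8 clusters (e.g., neighborhoods) of 20 units each.
clusters = {
    f"cluster_{k}": [f"c{k}_unit{i}" for i in range(1, 21)]
    for k in range(1, 9)
}

# 4 elements from each of 3 randomly chosen clusters.
sample = two_stage_cluster_sample(clusters, n_clusters=3, n_per_cluster=4, rng=rng)
for cluster_id, units in sample.items():
    print(cluster_id, units)
```

Only the 3 selected clusters need to be visited, which is where the cost savings come from when data collection gets more expensive with distance.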
Sampling Methods
Judgment Sample
A nonprobability sampling method that relies on
the expertise of the sampler to choose items that
are representative of the population.
Can be affected by subconscious bias (i.e.,
nonrandomness in the choice).
Quota sampling is a special kind of judgment
sampling, in which the interviewer chooses a
certain number of people in each category.
Sampling Methods
Convenience Sample
Take advantage of whatever sample is available at
that moment. A quick way to sample.
Sample Size
Sample size depends on the inherent variability of
the quantity being measured and on the desired
precision of the estimate.
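One common way to express this dependence, which is not given in these slides and is offered only as an illustration, is the textbook formula for estimating a mean: n = (z·σ/E)², where σ is the standard deviation of the quantity, E is the desired margin of error, and z is the normal quantile for the chosen confidence level (1.96 for 95%). The function name and the numbers below are assumptions.

```python
import math


def sample_size_for_mean(sigma: float, margin: float, z: float = 1.96) -> int:
    """Assumed textbook formula n = (z * sigma / E)^2, rounded up."""
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    return math.ceil((z * sigma / margin) ** 2)


# More variability (larger sigma) or more precision (smaller E) raises n.
print(sample_size_for_mean(sigma=10, margin=2))  # baseline
print(sample_size_for_mean(sigma=20, margin=2))  # twice the variability
print(sample_size_for_mean(sigma=10, margin=1))  # twice the precision
```

Doubling the variability or halving the margin of error roughly quadruples the required sample size under this formula.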
Data Sources
Useful Data Sources
Type of Data          Examples
U.S. general data     Statistical Abstract of the U.S.
U.S. economic data    Economic Report of the President
Almanacs              World Almanac, Time Almanac
Periodicals           Economist, Business Week, Fortune
Indexes               New York Times, Wall Street Journal
Databases             CompuStat, Citibase, U.S. Census
World data            CIA World Factbook
Web                   Google, Yahoo, MSN
Survey Research
Basic Steps of Survey Research
Step 1: State the goals of the research