Lesson 2
The Quartiles
Q1: The 1st Quartile
o Has 25% of the observations in the ordered data set below
it and 75% above it
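Q2: The 2nd Quartile
o Median (M): 50% of the observations below it and 50% above it
Q3: The 3rd Quartile
o Has 75% of the observations below it and 25% above it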
Computing Q1 and Q3
o Order the data and find the median (it doesn't have to be an
actual data point)
o If the number of data points (n) is odd, leave the overall
median out of the computation; if n is even, use all
observations
o Q1 = median of the lower half of the data and Q3 = median of
the upper half of the data
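The rule above can be sketched in a few lines of Python. This is only an illustration of the median-of-halves method described in these notes; the function name `quartiles` and the sample data are made up, and other software may use a different quartile rule and give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from these notes."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]  # observations below the median position
    # Odd n: skip the overall median. Even n: the two halves use every observation.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

print(quartiles([7, 1, 3, 9, 5, 11, 13]))  # odd n  -> (3, 7, 11)
print(quartiles([2, 4, 6, 8, 10, 12]))     # even n -> (4, 7.0, 10)
```

In the even case the median (7.0) is not an actual data point, which matches the first step above.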
5 Number Summary and Box Plot
Box plot: shows the symmetry of the box and whiskers
o Widths of the box and whiskers give info about spread (measure of
spread = quartiles)
o Central box spans the quartiles (Q1 to Q3)
o Center line of box = median
o 50% of the data above the median and 50% below
o Whisker on left longer = left skewed
o Range: entire length of the box plot, from min to max
o IQR: Q3 - Q1
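As a rough sketch, the numbers behind a box plot can be computed like this. The helper `five_number_summary` and the `scores` data are hypothetical, and the quartiles use the median-of-halves rule from Lesson 2 (leave out the overall median when n is odd).

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) with the median-of-halves quartile rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

scores = [2, 10, 12, 13, 14, 15, 16]  # made-up data with a long left tail
minimum, q1, med, q3, maximum = five_number_summary(scores)

iqr = q3 - q1                  # length of the central box
data_range = maximum - minimum # entire length of the box plot
left_whisker = q1 - minimum
right_whisker = maximum - q3

print(minimum, q1, med, q3, maximum)  # 2 10 13 15 16
print(iqr, data_range)                # 5 14
print(left_whisker > right_whisker)   # True: longer left whisker, suggests left skew
```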
Standard Deviation
Affected by strong skewness and outliers
Has the same unit of measurement as the original observations
Is either 0 or positive
o If S = 0, the data don't vary and all the numbers in the data
set are equal
* 1-Var Stats L1 gives the mean (x̄), the standard deviation (Sx), and
the 5-number summary
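A small sketch of these properties, assuming S here means the sample standard deviation (dividing by n - 1, which is what Sx on the calculator reports). The function `sample_sd` and all the data below are made up for illustration; Python's `statistics.stdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev

def sample_sd(data):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    x_bar = mean(data)
    squared_deviations = sum((x - x_bar) ** 2 for x in data)
    return sqrt(squared_deviations / (len(data) - 1))

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
constant = [3, 3, 3, 3]            # no variation at all
with_outlier = values + [40]       # same data plus one extreme value

sd_values = sample_sd(values)
sd_constant = sample_sd(constant)
sd_outlier = sample_sd(with_outlier)

print(sd_values, stdev(values))  # the two results should agree
print(sd_constant)               # 0.0 because every value is equal
print(sd_outlier > sd_values)    # True: one outlier inflates S
```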
Steps of Statistical Problem-Solving
Lesson 3
Explanatory variable (x): may explain/influence changes in the response
variable
Comes first and predicts the response variable
Response variable (y): measures the outcome on each individual
Scatterplots
To graph bivariate data, determine scales on the x and y axes (they don't
have to be the same scale) and plot each (x, y) pair
Direction
o Positive: upward slope; as x increases, y increases
o Negative: downward slope; as x increases, y decreases
Form
o Linear (straight line)
o Nonlinear (curvature): quadratic, exponential, etc.
Strength of relationship
o Strong: points concentrated about the form
o Weak: points widely scattered about the form
Correlation Coefficient (r)
A number that gives a measure of the direction and strength of
the linear relationship between two quantitative variables, x and
y
o Direction: + or -
o Strength: |r| = 0, weakest; |r| = 1, strongest
-1 ≤ r ≤ 1
o r = -1: points on a straight line with negative slope
o r = 0: no linear association/pattern; points appear randomly scattered
o r = 1: points on a straight line with positive slope
Has no units of measure
Doesn't distinguish between explanatory and response variables
NOT appropriate for curvilinear relationships
Is affected by outliers
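A minimal sketch of how r behaves, assuming the usual formula r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y). The function name `correlation` and the data sets are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r as the average product of z-scores (n - 1 divisor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 6, 8, 10])    # points on a line with positive slope
r_down = correlation(xs, [10, 8, 6, 4, 2])  # points on a line with negative slope

# A perfect curve (y = x squared) still gives r near 0, which is why r
# is not appropriate for curvilinear relationships.
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# Swapping the roles of x and y does not change r.
ys = [1, 3, 2, 5, 4]
r_xy = correlation(xs, ys)
r_yx = correlation(ys, xs)

print(round(r_up, 6), round(r_down, 6), round(r_curve, 6))
print(round(r_xy, 6) == round(r_yx, 6))
```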
Matched Pairs
Special case of a randomized block design (RBD)
Explanatory variable: two treatments (which make a pair); more
than 2 = normal RBD
o Ex. twins each get a treatment, 2 treatments for each
individual, measurements before and after treatment on
each individual
Cautions about Experimentation
Hidden bias: bias that's introduced by not treating all individuals
equally after treatments are applied
Placebo effect: positive response in subjects taking a placebo due
to confidence in the doctor and hope in the medication; that's why
a control group with a placebo is necessary
Observational Studies
Researchers observe individuals and record info about variables
of interest
No treatment is imposed! Individuals self-select which
treatment to receive
o Observe them in a neutral way without changing the course of
their lives
Influential factors cannot be controlled
All sample surveys are observational
Often have confounding = fail to give clear causal conclusions
Bad Sampling: non-probability sample chosen using personal
judgement or human subjectivity
Convenience sample: chosen because it is easy to reach = produces an
unrepresentative sample
Voluntary response sample: individuals choose themselves =
consists of people with really strong opinions who are more likely to
respond
Mall-intercept sample: mall shoppers are interviewed = retired,
middle class, and teens are overrepresented whereas the poor are
underrepresented
Quota sample: individuals selected to fill quotas; most
interviewers use their own preferences in choosing individuals to
sample
Simple Random Samples (SRS)
A sample of size n chosen from the population in such a way
that every set of n individuals has an equal chance to be
the sample actually selected
In other words, every possible sample of size n from the population has
an equal chance of selection
Sample taken from the entire population
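One way to draw an SRS in Python is `random.sample`, which picks n distinct individuals so that every set of n is equally likely. The population of 20 student labels below is hypothetical, and the seed is fixed only so the run is repeatable.

```python
import random

# Hypothetical population: 20 labeled individuals.
population = [f"student_{i:02d}" for i in range(1, 21)]

rng = random.Random(42)              # fixed seed so the example is repeatable
srs = rng.sample(population, k=5)    # draw n = 5 without replacement

print(srs)
```

Because the draw is made from the whole list at once, no individual can appear twice and nobody's judgement decides who gets in.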
Stratified Sample
Lesson 5
Probability
Describes what happens in many trials and how likely an event is
to occur; describes what happens in the long run
P = 0: event is impossible and will never occur
P = 1: event is certain and will occur on every trial
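The long-run idea can be illustrated with a simulated fair coin, assuming P(heads) = 0.5. The function `relative_frequency` is a made-up helper; the exact proportions depend on the seed, but they should settle near 0.5 as the number of trials grows.

```python
import random

def relative_frequency(trials, seed=0):
    """Proportion of heads in a given number of simulated fair coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# Few trials: the proportion can be far from 0.5. Many trials: it settles near 0.5.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```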
Discrete probability models
Sample space is made up of a list of individual, discrete outcomes
(e.g., whole numbers)
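As an illustration, here is a discrete model for the sum of two fair dice (an example chosen for this sketch, not taken from the notes). Each listed outcome gets a probability, and the probabilities add up to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) rolls.
rolls = list(product(range(1, 7), repeat=2))

# Sample space for the sum: the whole numbers 2 through 12.
counts = Counter(a + b for a, b in rolls)
model = {total: Fraction(count, len(rolls))
         for total, count in sorted(counts.items())}

for total, probability in model.items():
    print(total, probability)

total_probability = sum(model.values())                      # 1
p_at_least_10 = sum(p for t, p in model.items() if t >= 10)  # 1/6
print(total_probability, p_at_least_10)
```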
Continuous Probability Models
Probabilities = area under a density curve
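A rough sketch of "probability = area", using a uniform density on the interval 0 to 2 as an assumed example (height 0.5, so the total area is 1). The helper `area_under` is hypothetical and approximates the area with a midpoint sum.

```python
def area_under(density, a, b, steps=10_000):
    """Approximate the area under a density curve between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

def uniform_density(x):
    """Assumed example: uniform density on [0, 2], height 0.5."""
    return 0.5 if 0 <= x <= 2 else 0.0

total_area = area_under(uniform_density, 0, 2)    # whole curve: about 1
p_middle = area_under(uniform_density, 0.5, 1.5)  # P(0.5 <= X <= 1.5): about 0.5

print(round(total_area, 6), round(p_middle, 6))
```

For this flat density the same answers come from width times height: 2 * 0.5 = 1 and 1 * 0.5 = 0.5.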
Lesson 6
Normal Distributions