
Hi Class - Answer three of the following DQs. Two responses are due by Day 3 and one is due by Day 6. Please keep in mind that some of these are from the objectives for Week 5. There is no participation requirement or DQs for Week 5.

1.) When and how do you use a chi-square distribution to test if two variables
are independent? Also, provide an example showing how to use the contingency
table to find expected frequencies.

You can use the chi-square test to see if two categorical variables are
independent of each other. The null hypothesis is always that there is no
relationship between the two variables. Other conditions and properties are:

The population is at least 10 times the sample size.

The samples are drawn by simple random sampling from the same population.

The χ² distribution is positively skewed but approaches the normal distribution as
the number of degrees of freedom increases.

The χ²-value is always positive.

To calculate the expected frequency for each cell in the contingency table, you
multiply that cell's row total by its column total and then divide by the grand total:
E = (row total × column total) / grand total. For example, if a cell's row total is 30,
its column total is 40, and the grand total is 100, the expected frequency for that
cell is (30 × 40) / 100 = 12.

State the H0 and Ha: the null hypothesis states that the two variables are
statistically independent. This means that knowledge of one variable does
not help in predicting the other variable. H0 – the variables are independent (no
relationship). Ha – the variables are dependent (there is a relationship).
Select the level of significance (α).
State the decision rule by defining the rejection region.
You must know the level of significance (α) and the degrees of freedom (df) to
find the critical value from the χ² table:
df = (number of rows – 1) × (number of columns – 1) from the contingency table
   = (r – 1)(k – 1)

The top row of the χ² table shows the significance level and the first column
contains the number of degrees of freedom. Because the χ² distribution is
positively skewed, the critical value will always be positive and in the right-hand
tail of the curve. The non-rejection region for H0 runs from the left tail of the
curve to the χ² critical value; to the right lies the rejection region. We reject
H0 if the χ² test statistic > the χ² table value.
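As a minimal sketch of the whole procedure in Python (the 2×3 table of counts below is made up purely for illustration), scipy's chi2_contingency computes the expected frequencies, the χ² statistic, and the p-value in one call:

import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical 2x3 contingency table of observed counts
# (rows: two groups, columns: three response categories)
observed = np.array([[30, 20, 10],
                     [20, 30, 10]])

# Returns the test statistic, p-value, degrees of freedom, and the
# expected frequencies (row total * column total / grand total per cell)
stat, p_value, df, expected = chi2_contingency(observed)
print("Expected frequencies:\n", expected)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.4f}")

# Decision rule: reject H0 if the statistic falls in the right-tail
# rejection region beyond the critical value
alpha = 0.05
critical = chi2.ppf(1 - alpha, df)
print(f"critical value at alpha = {alpha}: {critical:.3f}")
print("Reject H0" if stat > critical else "Fail to reject H0")

The same decision can be read off the p-value directly: reject H0 whenever p < α.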
 
2.)   How do you use the chi-square distribution to test if a frequency distribution
fits a claimed distribution? Provide an example showing the steps of the chi-
square Goodness-of-Fit Test.

A goodness-of-fit test is an inferential procedure used to determine whether a
frequency distribution follows a claimed distribution; basically, it is a test for
comparing observed frequencies with theoretically predicted frequencies.

Goodness-of-fit tests can be used when one has more than 2 categories. Also,
goodness-of-fit tests can be used when you expect something other than equal
frequencies in the groups.

This example will combine both of these:

Suppose that we already know that in the US, 50% of the population has
brown/black hair, 40% has blonde hair, and 10% has red hair.

We want to see if this is true in Europe too: we sample 80 Europeans and record
their hair color:

            Black/Brown   Blonde   Red
Observed        38          36      6
Expected        40          32      8

H0: the observed frequencies = the expected frequencies

H1: the observed frequencies ≠ the expected frequencies

Calculate: χ² = Σ((O − E)²/E)

where O = the observed frequency in each category
      E = the expected frequency in each category

χ² = ((38 − 40)²/40) + ((36 − 32)²/32) + ((6 − 8)²/8)
χ² = 0.1 + 0.5 + 0.5
χ² = 1.1

The goodness-of-fit test has df = k – 1 degrees of freedom
(where k = the number of categories).

In the current example, k = 3 (the 3 categories: Black/Brown hair, Blonde hair,
Red hair), thus we have df = 3 – 1 = 2.

If the calculated χ² equals or exceeds the critical value, then we reject H0.

If the calculated χ² does NOT equal or exceed the critical value, then we fail to
reject H0.

The critical value (from the χ² table, using α = .05 and df = 2) is 5.99.

Our obtained χ² of 1.1 does not equal or exceed this value. Therefore we fail to
reject H0.

Our conclusion:

“The distribution of hair color in Europe is not different from the distribution of hair
color in the US, χ²(2, N = 80) = 1.1, p > .05.”
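As a minimal sketch in Python, scipy's chisquare function reproduces the calculation above from the observed and expected counts in this example:

import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([38, 36, 6])  # sampled European hair colors
expected = np.array([40, 32, 8])  # 80 * (0.50, 0.40, 0.10) from the US claim

# chisquare computes sum((O - E)^2 / E) with df = k - 1 by default
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")  # chi-square = 1.10

# Compare against the critical value at alpha = .05 with df = 2
critical = chi2.ppf(0.95, df=2)
print(f"critical value = {critical:.2f}")  # about 5.99
print("Reject H0" if stat >= critical else "Fail to reject H0")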

We need a new distribution to do this: the χ² distribution.

The χ² distribution:

--A family of distributions, based on different df
--Positively skewed (though the degree of skew varies with df)
--See Table E.1 for critical values
--Critical values all lie in the upper tail of the distribution
--When H0 is true, each observed frequency approximately equals the
corresponding expected frequency, and χ² is close to zero

3.)   Does correlation equal causality? Why or why not?

Regardless of the calculated strength, correlation does not necessarily imply a
cause-and-effect relationship between the variables. Another way to look at this
issue is that not every correlation reflects a cause-and-effect relationship, but
every cause-and-effect relationship does assume that a prior correlation exists.
Many people conclude incorrectly that if two variables are correlated, one must
be the cause of the other. This is not necessarily true. It can be argued that
correlation is a necessary but not sufficient condition for establishing a causal
relationship. It is difficult to imagine, although not impossible under laboratory
conditions, a situation where one variable would be the cause of another but
where the two variables would not show a correlation. A correlation between
two variables is only a starting point for examining causality. A strong and
significant correlation can support an argument for causality and should
certainly lead to further exploration of the possibility of causation.

Correlation does not equal causality example: just because A is positively
correlated with B does not mean that A caused B, even when that relationship is
statistically significant. Correlational evidence is never sufficient grounds, all by
itself, to infer causation. We may have perfect correlation, yet resist making a
causal inference. For example, despite a perfect correlation between the wind
blowing hard and the lack of crop failure due to stampeding elephants, the causal
conclusion that winds prevent crop-destroying elephant stampedes is
unwarranted. Likewise, every time you go to work you may brush your teeth, but
it hardly follows that brushing your teeth causes you to go to work.

To avoid confusing correlation with causation, it is helpful to realize that
correlation is a symmetrical relation, while causation is asymmetrical. If A is
correlated with B, B is likewise correlated with A. But this symmetry does not
exist with causal relations: if A caused B, B did not cause A.

For example, it’s very probable that you could demonstrate a strong positive
correlation between the variable of average daily temperature and the variable
number of juvenile offenses committed in most communities in the U.S. Having
discovered such a strong positive correlation, however, does not mean that a
high temperature causes the commission of juvenile offenses. In fact, we know
from a broad array of sociological studies that other factors that vary with the
seasons account for much of this association.
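A tiny simulation (with made-up numbers, for illustration only) makes the point concrete: two variables that both respond to a third, seasonal factor correlate strongly with each other even though neither causes the other:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confounder: seasonal warmth over one year
season = np.sin(np.linspace(0, 2 * np.pi, 365))

# Both variables are driven by the season, not by each other
ice_cream_sales = 50 + 30 * season + rng.normal(0, 5, 365)
outdoor_incidents = 20 + 10 * season + rng.normal(0, 3, 365)

# A strong correlation appears despite there being no causal link
r = np.corrcoef(ice_cream_sales, outdoor_incidents)[0, 1]
print(f"correlation = {r:.2f}")  # close to +1, produced by the confounder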

4.)   What is the difference between strong positive and strong negative
correlation? Is there value in a strong negative correlation? If so, what?

Correlational claims are based on observations of two distinct objects or events,
let’s say event A and event B. Event A may be positively correlated with B,
negatively correlated with B, or show no correlation with B. Event A is positively
correlated with B when an increase in As is accompanied by an increase in Bs;
that is, if there are more As, then there are more Bs. A is negatively correlated
with B if the reverse holds, such that the more As there are, the fewer Bs. If how
many As there are has no bearing on how many Bs there are, we say there is no
correlation; the two are unrelated. Having said that, however, a strong positive or
negative correlation might serve as an indication of possible causality in some
situations. A strong negative correlation is just as valuable as a strong positive
one: it is equally strong evidence of an association and equally useful for
prediction, since knowing that one variable is high tells you that the other is
likely to be low.
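A minimal sketch with made-up data shows how the sign of the correlation coefficient distinguishes the two cases:

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)

# Strong positive correlation: y rises as x rises
y_pos = 2 * x + rng.normal(0, 1, 100)

# Strong negative correlation: y falls as x rises
y_neg = -2 * x + rng.normal(0, 1, 100)

print(f"r(x, y_pos) = {np.corrcoef(x, y_pos)[0, 1]:+.2f}")  # near +1
print(f"r(x, y_neg) = {np.corrcoef(x, y_neg)[0, 1]:+.2f}")  # near -1

Either way, the magnitude of r measures the predictive value; the sign only tells you the direction of the relationship.
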
5.)   What does zero correlation tell you?

When a correlation coefficient is not statistically different from zero, the two
measures are said to be uncorrelated. Technically, this means that knowing
the value of one measure does not allow you to predict the value of the second
measure with accuracy greater than chance. If the correlation between two
variables is zero, no statistical relationship is present – a value on one measure
reveals nothing about the other measure. For example, if there is zero correlation
between SAT scores and GPA, then knowing a person’s SAT score tells you
nothing about that person’s GPA.

A zero correlation tells us that the variables are not linearly related, and finding a
zero correlation would not be helpful for drawing most types of inferences. Such a
result could be the product of many influences, including a lack of reliability in
your assessment instrument. When the correlation coefficient is zero, the two
estimated regression lines (y on x and x on y) are parallel to the axes. Some
examples of data having zero correlation are shown below:

[Figure: three scatterplots of y against x, each with zero correlation; in the last,
y depends on x nonlinearly.]
The last of these diagrams serves to emphasize that the correlation coefficient is
a statistic that is specifically concerned with linear relationships; zero correlation
does not necessarily imply that x and y are unrelated.
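A quick sketch of that last point: a perfect but nonlinear (quadratic) relationship can still produce a Pearson correlation of essentially zero:

import numpy as np

# y is completely determined by x, yet the relationship is not linear
x = np.linspace(-3, 3, 101)
y = x ** 2

# Pearson r measures only linear association, so it comes out ~0
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.4f}")  # approximately 0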

A good rule of thumb to remember is: “Independence implies zero correlation, but
zero correlation does not imply independence” (Härdle & Simar, 2003).

Zero correlation means that there is no connection at all. If one correlated the
length of people’s noses with their intelligence, the result would be neither
positive nor negative – an example of zero correlation. The connection is
meaningless, and the correlation would be zero, as intelligence and nose length
are not related.

Reference: Härdle, W., & Simar, L. (2003). Applied multivariate statistical
analysis. Springer-Verlag.

This measure does not mean that the one series of measurements is the cause
of the other, but it offers an indication that there is some probable connection
between the two phenomena measured. Thus if we know that the correlation
between tested intelligence and the length of schooling is +.8, say, then we
have some grounds for believing that the two may be connected in a causal way.

6.)    How is linear regression used? What are its limitations?

7.)   What is the relationship between the algebraic equation for a straight line
and the linear regression equation?

