All rights reserved.

Commercial exploitation or transformation of this work is not permitted. Printing it in its entirety is permitted.
Statistics II (TESTS)
Group ARA, Year I

Psychology, University of Valencia
Course 2021 - 2022
Professor: Pedro Valero
Student: Sabina Rodríguez
Index
Test 1. Causality
Test 2. Foundations of psychology
Test 3. Scientific method
Test 4. Rules for test selection
Test 5. Steps for applying tests
Test 6. T-one sample tests
Test 7. T-tests for independent groups
Test 8. Anova for independent variables
Test 9. T-test for paired samples
Test 10. Anova repeated measures
Test 1. Causality
1. Ivtzan, I., Young, T., Martman, J., Jeffrey, A., Lomas, T., Hart, R., & Eiroa-Orosa, F. J.
(2016). Integrating mindfulness into positive psychology: A randomized controlled trial of an
online positive mindfulness program. Mindfulness, 7(6), 1396-1407.
Search the article to answer the following questions. You will find most of the answers in the
method section but some of them will be in other parts of the article.
a. The study is a randomized clinical trial.

b. The control group comes from people on the waiting list.
c. The treatment consists of the positive mindfulness program that is applied in groups
of 10 people who attend a mindfulness center.
d. The treatment was applied online.
e. All participants were volunteers.
f. The effect was measured using self-report scales that were filled out in the presence
of the investigators.
g. One of the measures was the MAAS (Mindful Attention Awareness Scale).
h. Only 53 people completed the mindfulness program (experimental group).
i. Only 168 people completed all the questionnaires.
j. No significant differences were observed between the control group and the
experimental group with respect to sociodemographic variables at the beginning of
the study.
- TRUE: This one was easy: it is in the title.

- TRUE: Indeed, the control group comes from people on the waiting list.
- FALSE: The program was not applied in groups but individually (online).
- TRUE: The treatment was applied online, indeed.
- TRUE: All participants were volunteers.
- FALSE: The study is online and, consequently, when the subjects filled in the
questionnaires, they were not in the presence of the researcher.
- FALSE: No. The MAAS was not one of the measures (although it is a widely used
scale for mindfulness).
- TRUE: Only 53 people completed the mindfulness program (experimental group).
- TRUE: Only 168 people completed all the questionnaires.
- TRUE: No significant differences were observed between the control group and the
experimental group with regard to sociodemographic variables at the beginning of the
study.
2. Ivtzan, I., Young, T., Martman, J., Jeffrey, A., Lomas, T., Hart, R., & Eiroa-Orosa, F. J.
(2016). Integrating mindfulness into positive psychology: A randomized controlled trial of an
online positive mindfulness program. Mindfulness, 7(6), 1396-1407.
Search the article to answer the following questions. You will find most of the answers in the
method section but some of them will be in other parts of the article.
a. The study does not find that the control group is different from the experimental
group in the sociodemographic variables.
b. As all participants are volunteers, the results may not apply to people who would
never want to participate in a mindfulness program.
c. References seem adequate.
d. The percentage of women participating was higher than that of men.
e. Most of the participants had university studies.
f. The effect on mindfulness was measured using the FMI scale (Freiburg mindfulness
inventory).
g. The effect on happiness was measured using the Pemberton Happiness Index (PHI).
h. The results may be confounded: perhaps those who dropped out of the program
were the most unsatisfied with it, and only those who were doing well remained, so
the evaluations are positively biased.
i. With these results, we can recommend this positive mindfulness program to
everyone.
j. With these results, it is not proven that this program works.
- FALSE: Actually, the study does not give information about that. This is probably the
biggest problem with this study: many subjects did not finish the program, perhaps
because they felt that it was not working. Consequently, those who stayed were most
likely the ones who were more satisfied with it, and the average score of the
responses was inflated.
- TRUE: Since all the participants are volunteers, we don’t know whether the program
would be of any use to someone who would not be interested in participating in it.
This is a problem with many therapies: if those who start them are not predisposed
enough, the therapies may not work for them.
- TRUE: The references seem adequate.
- TRUE: Indeed, the percentage of women who volunteered to participate was higher
than that of men.
- TRUE: Indeed, most of the participants had university studies.
- TRUE: The effect on mindfulness was measured using the FMI (Freiburg mindfulness
inventory) scale. This is stated at the bottom of page 1401, in the Measures section.
- TRUE: The effect on happiness was measured using the Pemberton Happiness
Index (PHI). It is mentioned at the bottom of page 1401, in the Measures section. By the
way, the authors of the referenced article (Hervás and Vázquez, 2013) are from the
Complutense University, so that scale is available in Spanish.
- TRUE: I am afraid that, if I myself have understood the article correctly, the results
may be muddled because those who dropped out of the program might be the ones
who did the worst and only those who did well remained. That would cast doubt on
the results unless we studied the reasons for abandoning the program earlier and
found that they were unrelated to the outcome.
- FALSE: This study did not handle well the people who dropped out of the program.
For this reason, it is not very clear whether the program works or not, since we only have
data from a part of the participants in the treatment group, which may be biased.
- TRUE: Indeed, although this study offers promising results, it is not conclusive (at
least this is my opinion, but let me know yours if you feel otherwise).
3. Mensink, M.C., & Dodge, L. (2014). Music and memory: Effects of listening to music while
studying in college students.
Search the article to answer the following questions, indicating if the answer is correct
(several alternatives may be correct). Most of the answers are in the method section but
there may be others in other parts of the article.
a. The study has a reference section that seems quite reasonable.

b. The study reports the mean effect of treatments.
c. The article mixes the two styles of music (classical and pop) to make a new
treatment: people who heard any style of music.
d. The conclusion of the study is that it is better not to listen to music when reading.
e. The author of the article seems very competent and professional.
f. The study could have failed because the students were not sufficiently motivated.
g. The author of the article is a student.
h. Students at the University of Wisconsin-Stout had very similar success rates listening
to pop music or in the silence condition.
i. After reading this article it is clear that everyone should be recommended to listen to
classical music to study.
j. After reading this article I know that studying for several hours with the music on will
not affect my performance.
- TRUE: Yes. The study has a reference section that seems quite reasonable.
- TRUE: The study reports the mean effect of treatments on page 209.
- TRUE: The article makes two comparisons: between reading in silence and listening
to some kind of music, and then between the two styles of music and silence.
- FALSE: There are no significant differences in reading comprehension between the
music condition and the silence condition.
- TRUE: It sounds like a student’s report but her competence and professionalism
seem to be excellent.
- TRUE: The authors of the article say so in the discussion (p. 210).
- TRUE: The author is a student.
- TRUE: Indeed, it seems that pop music did not affect the participants as much as
classical music. Perhaps familiarity is an important factor that was not taken into
account in the study. The authors mention this factor too.
- FALSE: I wouldn’t say that.
- FALSE: They only read for five minutes, and the authors point out that this is a
limitation of the study. It would be interesting to do this study with longer stretches of
studying and music listening.
4. Mensink, M.C., & Dodge, L. (2014). Music and memory: Effects of listening to music while
studying in college students.
Search the article to answer the following questions, indicating if the answer is correct
(several alternatives may be correct). Most of the answers are in the method section but
there may be others in other parts of the article.
a. The study had a control group.

b. The study had at least one treatment.
c. The effect that we wanted to study is the ability to understand a text from a book by
Bill Bryson (which by the way I have read and I recommend reading because it is
very funny).
d. One treatment was listening to classical music.
e. One treatment was doing math exercises.
f. One treatment was listening to a Lady Gaga song.
g. One treatment was silent reading.
h. Participants were randomly assigned to one of the treatments or to the control group.
i. The authors say that the reason why classical music is the worst is because of the
lack of familiarity of the participants with it.
j. Music preferences could affect reading comprehension.
- TRUE: YES, the study had a control group.

- TRUE: The study had at least one treatment.
- TRUE: Indeed, the effect studied is the ability to understand a text from a book by Bill
Bryson.
- TRUE: Specifically, Piano Sonata No. 11 in A major (Mozart, 1783).
- FALSE: This was not a treatment, as all subjects did it. I think it was done, although it
is not explained, to make the task of remembering the text a little bit more difficult.
- TRUE: That’s right.
- TRUE: Yes.
- TRUE: Like any self-respecting experiment.
- TRUE: It says so on page 211, near the end of the article.
- TRUE: It says so on page 211, at the top of that page.
5. Click here and read the paper in order to answer the following questions, indicating if the
answer is correct (several alternatives may be correct).
a. The study is a randomized clinical trial.

b. This is a randomized study.
c. This is a review of the scientific literature.
d. The authors appear to be experts on the subject.
e. The article demonstrates that music has the same effect on anxiety levels as a
massage.
f. The article shows that music improves brain functions.
g. There is a control group and a treatment group.
h. Wearing headphones improves performance on final exams.
i. The study is supported by Scott Christ’s research showing 20 health benefits of music.
j. The article states that if you are healthy you listen to music.
- FALSE: Not even remotely.

- FALSE: That is mumbo jumbo.
- FALSE: A review is something more serious than this.
- FALSE: The marketing team? Experts? Oh yeah!
- FALSE: The article says that music has the same effect on anxiety levels as a
massage. It does not provide any proof of that.
- FALSE: Here we go again.

- FALSE: None in sight.
- FALSE: It may or may not, but this article does not test that.
- FALSE: The Scott Christ article they cite is a list of purported benefits but there are
no proofs or tests.
- FALSE: I must admit that this one is not within the set of dubious statements that this
article includes.
6. This Wikipedia page (https://es.wikipedia.org/wiki/Sugestopedia) describes a method for
using music for learning. Answer the following questions using the information available in it
or from other sources if you wish.
a. There are experimental studies in Google Scholar backing up the Suggestopedia
method.
b. Baroque music is used because it has the effect of relaxing the students.
c. The method affirms that the relaxation produced by the music and the slow reading
improve the ability to learn foreign languages.
d. The best way to demonstrate if the method works would be to apply it in a language
center to see what happens.
e. The article mentions that Suggestopedia is a pseudoscience.
f. If you recommend this method to me I think I would try it.
g. The explanation of the method is logical: as children learn languages more easily, it
is good to bring the student to a childish state so that he learns more easily.
h. I want to sign up for the SALT method (Suggestive Accelerated Learning and
Teaching) as soon as possible.
i. The study is supported by basic psychology research.
j. It is obvious that a teacher who has or uses the five most important factors described
by Lozanov will be successful in teaching languages.
- FALSE: I haven’t found any but if you have, tell me so I know.

- TRUE: Indeed, in this method, as in many places, it is proposed that baroque music
relaxes. However, although many things have been said about the effect of this type
of music, the truth is that its benefits are not demonstrated.
- TRUE: Indeed, the effect of music on language learning would be indirect, through
relaxation.
- FALSE: That would be the best way to see if it works commercially, but as a
demonstration of the effectiveness of the method, I don’t see much of a future for it.
- TRUE: Indeed, the article on Wikipedia says that Suggestopedia is a pseudoscience.
- FALSE: I do not see that the version for children is better than the one for adults,
really.
- TRUE: Pseudoscientific theories usually have a more or less logical explanation, but
they usually miss some critical details. In this case, taking the student to an infantile
state seems logical, but can that really be achieved?
- FALSE: The fact that a company sells it is not sufficient proof that something works.
There are many miraculous methods and therapies that do not have any kind of
scientific support and have commercial success for a stretch of time until they do not.
- FALSE: As far as I can tell, the method is not supported by basic psychology
research, but if you find any, tell me so I know.
- FALSE: Whether something is evident is in the eye of the beholder. In my opinion
that is not evident at all.
7. Search for the following article in Google Scholar and answer the questions. Most of the
answers can be found in the methods section.
Serpil, U. (2015). An analysis of the academic performance of students who listen to music
while studying. Educational Research and Reviews, 10(6), 728-732.
a. This article studies the effect of listening to music on academic achievement.

b. Subjects were randomly assigned to the control group and the treatment group.
c. There is a control group of subjects who did not listen to music while studying.
d. There is a treatment group of subjects who listened to music while studying.
e. This article describes the academic performance of those who listen to music.
f. There is an adequate description of the differences between the control group and
the treatment group.
- FALSE: Only people who listen to music are included in the study, so we have no
information on those who do not listen to music, and without a comparison no causal
inferences can be drawn.
- FALSE: I do not see treatment and control anywhere.
- FALSE: There is no control group that I know of.
- FALSE: There is no treatment group.
- TRUE: Indeed, the only thing that is done is to describe the academic performance of
those who listen to music.
- FALSE: There is no description of the differences between the control group and the
treatment group that I see.
Test 2. Foundations of psychology
1. The standard error of estimation of a mean calculated on a sample depends on:

a. The standard deviation of the population and the square root of the mean.
b. The mean calculated on the sample.
c. How well the data sample was collected.
d. The estimate of the standard deviation in the population calculated on the sample
and the square root of the sample size that has been used for the estimate
Student’s great discovery was the formula $\sigma_{\bar{X}} = \frac{s_{n-1}}{\sqrt{n}}$, which is used to calculate the
standard error. Here $s_{n-1}$ is the estimate of the standard deviation in the population and $n$ is the
sample size.
- False: The square root of the mean has nothing to do with the standard error.
- False: No, the mean is not part of the formula for calculating the standard error of the
mean (weird, isn’t it?).
- False: No. Whether the data was collected well or not has little to do with this matter.
- Correct: Indeed.
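As a minimal sketch of the standard error formula discussed in this question, the snippet below computes the sample standard deviation (with n-1 in the denominator) and divides it by the square root of the sample size. The function name `standard_error` and the scores are hypothetical, chosen only for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: s_(n-1) / sqrt(n)."""
    s = statistics.stdev(sample)          # sample SD, divides by n - 1
    return s / math.sqrt(len(sample))

scores = [12, 15, 11, 18, 14, 16, 13, 17, 15]   # hypothetical scores, n = 9
se = standard_error(scores)

# Adding a constant changes the mean but not the spread, so the
# standard error stays the same: the mean is not part of the formula.
se_shifted = standard_error([x + 100 for x in scores])

print(f"standard error = {se:.3f}, after shifting the mean = {se_shifted:.3f}")
```

Shifting every score by 100 leaves the standard error unchanged, which is one way to see why alternative b is false.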
2. The population census is:

a. Where you look up the place where you have to vote.
b. The person who applies censorship in a WhatsApp group.
c. The count of all the individuals in a country.
d. The mechanism to prevent people from writing things with political content.
The count of all individuals in a country with additional information such as social, economic,
or home conditions. You can read more about the Spanish census here (if you want).
- False: This is the electoral census.
- False: No, this is the censor.
- Correct: Congratulations!
- False: No, this is censorship
3. A sample taken from a population is:

a. The best way to study a population.
b. The most scientific way to study a population.
c. The most realistic way to study a population.
d. The optimal way to study a population.
Taking population samples is the most realistic way we have to study a population (unless
you have superpowers).
- False: The best is an ambiguous word, the best for what?
- False: Scientists often use samples, but non-scientists do too.
- Correct: Sampling is the realistic way to study populations.
- False: Optimal is an ambiguous word: Optimal in which sense?
4. Does reducing the sample variance have an effect on the standard error?
a. Yes. But only when the sample is small.
b. Yes. The larger the variance the larger the standard error.
c. Yes. The larger the variance, the larger the standard error, so the smaller the
variance, the smaller the standard error.
d. Yes. But only when the sample is greater than 1,000 cases.
If you remember the standard error formula you will see that we divide the standard
deviation of the sample (although using n-1) by the square root of the number of cases.
$\sigma_{\bar{X}} = \frac{s_{n-1}}{\sqrt{n}}$
The larger the variance, the larger the standard deviation, and since it is in the numerator of
the standard error formula, the standard error will also increase.
- False: It is true that, if the sample is small, a larger variance will have a higher impact
on the result of an experiment. However, although this has consequences for
experimental design, it is a topic for another course (one about Experimental
Design).
- False: see the explanation in the solution.
- Correct: As the solution indicates.
- False: Variance always has an effect, but if the sample is large, the effect of variance
diminishes, so the truth is the opposite of what is written in this alternative.
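A short worked example may help here. The numbers are hypothetical (a sample of n = 25 cases with a standard deviation of either 10 or 20); they are only meant to show how the standard error follows the variance when the sample size is held fixed.

```latex
\begin{aligned}
s_{n-1} = 10 \;(\text{variance } 100):\quad & \sigma_{\bar{X}} = \frac{10}{\sqrt{25}} = 2 \\
s_{n-1} = 20 \;(\text{variance } 400):\quad & \sigma_{\bar{X}} = \frac{20}{\sqrt{25}} = 4
\end{aligned}
```

Quadrupling the variance doubles the standard deviation and therefore doubles the standard error.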
5. In most studies, the number of samples usually taken is:

a. Three.
b. One.
c. Several hundred.
d. Two
Only one sample is taken in each study (although sometimes we speak of subsamples when
it is possible to distinguish between subgroups within the full sample). However, sometimes,
the same study or a similar one is replicated, but this is exceptional as there are always
small variations in the conditions, so we usually do not say that we are drawing a new
sample, but that we are undertaking a new study. Of course, this is in Psychology: It is
possible that drawing different samples from a population under comparable conditions is
carried out in other sciences (think of Chemistry or Biology).
6. To get “the probable error of a mean” you can draw many samples of a given size from a
population and:
a. Calculate the mean on each sample and then take the means of those means.
b. Calculate the mean in each sample and then take the standard deviation of those
means.
c. Calculate the probability of each mean in the sample.
d. Calculate the mean for each sample
Calculate the mean in each sample and then take the standard deviation of those means:
Student called it the empirical solution and carried it out with a table of correlations of the
measurements of the middle finger of 3000 criminals, which makes me wonder whether
someone at some point in history thought that this measurement might be related to being a
criminal.
- False: The mean of the means approximates the population mean; it does not tell us how much the means vary.
- Correct: and its name is standard error.

- False: Huh?
- False: This would be a first step
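The empirical solution described above can be sketched as a small simulation: draw many samples of a fixed size, compute the mean of each, and take the standard deviation of those means. The population here is simulated (3000 values from a normal distribution with mean 100 and SD 15), and the sample size of 4 and the 2000 repetitions are assumptions made for illustration, not Student's actual data.

```python
import random
import statistics

def empirical_standard_error(population, sample_size, n_samples, seed=0):
    """SD of the means of many random samples drawn from the population."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

# Hypothetical population of 3000 measurements.
pop_rng = random.Random(42)
population = [pop_rng.gauss(100, 15) for _ in range(3000)]

empirical = empirical_standard_error(population, sample_size=4, n_samples=2000)
# Formula-based value for comparison: population SD / sqrt(n).
theoretical = statistics.pstdev(population) / 4 ** 0.5

print(f"empirical = {empirical:.2f}, formula = {theoretical:.2f}")
```

The two values should come out close to each other, which is the point of the formula: it gives the result of this laborious procedure from a single sample.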
9. Those who make political predictions make up for the variability in the results in different
samples by:
a. Taking the average of several surveys.
b. Increasing the size of the sample until it is equal to that of the population.
c. Trusting experts who know how to interpret people’s feelings.
d. Making up the result (as everyone in the media usually does).
The (weighted) average of several polls has lately established itself as the method of making
electoral predictions in the media. These averages are more accurate than individual survey
results because they incorporate more information and make up for errors or biases in
individual surveys.
- Correct: Congratulations!
- False: Working with the entire population is generally unfeasible, although of course
there are always exceptions
- False: Maybe yes, but I hope not.
- False: Hmm. Rather conspiratorial, isn't it?
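As a rough illustration of a weighted average of polls, the sketch below weights each poll by its sample size. This weighting scheme is an assumption (the text does not say how the media weight polls), and the three polls are invented numbers.

```python
def weighted_poll_average(polls):
    """polls: list of (estimated share, sample size) pairs."""
    total_n = sum(n for _, n in polls)
    return sum(share * n for share, n in polls) / total_n

# Hypothetical polls for one party: (share of the vote, respondents).
polls = [(0.40, 500), (0.44, 1000), (0.38, 1500)]
average = weighted_poll_average(polls)

print(f"weighted average share = {average:.3f}")
```

Larger polls pull the average toward their own estimate, so a single small poll with an unusual result has limited influence.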
10. The sampling error is explained in relation to:

a. The mean and standard deviation.
b. The mean of the population.
c. Any parameter that we want to estimate in the population using samples.
d. Standard deviation.
Although the mean and standard deviation are the parameters that we will use the most, the
sampling error problem applies to any situation where we want to estimate something about
the population and we use samples: the correlation, the percentiles, the median.
- False: This is correct but there is an even better one.
- Correct: Bravo! Indeed we can calculate the standard error of many more things
besides the standard error of the mean, but this is the one we use to introduce the
concept.
11. To calculate a 95% confidence interval of the value of the mean in a population from the
mean in a sample:
a. We multiply 2.56 and -2.56 by the standard error and then add and subtract the
obtained value from the sample mean.
b. We calculate the standard error and multiply it by the value of t for the sample size
(minus 1) that leaves 95% of the possible values in the middle (that value is usually
about 2). Then we add and subtract the obtained value from the sample mean.
c. We multiply 2.56 and -2.56 by the standard error and then add and subtract the value
obtained from the population mean.
d. I would request the statistical package to calculate it for me.
To make inferences about populations from values in samples, we use the value of t
multiplied by the standard error for a given level of confidence (usually 95%). Adding this
product to the sample value and subtracting it from the sample value gives an interval
within which we are confident the population value lies.
- False: 2.56 and -2.56 are roughly the values (more precisely, about ±2.58) that leave 99%
of the values within when using the z distribution. Sometimes we will use those values to
calculate confidence intervals because the t distribution with very large samples ends up
being the z distribution, but this is not the right answer in general.
- Correct: This is the correct answer.
- False: In no case can we subtract the values from the mean value in the population
because that value is always unknown.
- False: Nice try.
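The procedure in the correct alternative can be sketched as follows. The function name `confidence_interval` and the data are hypothetical; the default `t_crit=2.0` is the rule-of-thumb value mentioned in the question, and 2.262 is the tabled two-sided 95% value of t for 9 degrees of freedom.

```python
import math
import statistics

def confidence_interval(sample, t_crit=2.0):
    """Mean +/- t * standard error; t_crit defaults to the rough value 2."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = t_crit * se
    return mean - margin, mean + margin

sample = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]      # hypothetical data, n = 10
low, high = confidence_interval(sample, t_crit=2.262)   # t for 9 df, 95%

print(f"95% CI: ({low:.2f}, {high:.2f})")    # prints 95% CI: (4.69, 7.31)
```

Note that the interval is built around the sample mean, never around the population mean, which is the unknown quantity being estimated.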
12. In a study on the effects of online positive psychology, volunteers were randomly
assigned either to treatment or to the waiting list (control group). The study found that
practically all participants in both samples were women with a high educational level. This
poses a problem of:
a. Something, but I don’t know what.
b. Representativeness.
c. Confounding variables.
d. Concept.
13. The means obtained in small samples (say 4 cases) differ from the population mean
more than the means obtained in large samples (say 100 cases) because:
a. Small values are more likely when the sample is small than when it is large.
b. The differences between the value of the sample mean and that of the population
mean when the sample size is only 4 cases can be much larger than when the
sample size is 100 cases.
c. The differences between the value of the sample mean and that of the population
mean when the sample size is 4 cases are always larger than when the sample size is
100 cases.
d. The sample of 4 cases has values more clustered than when the sample is of 100 cases.
When the samples are small, if, by chance, there is a value that deviates a lot from the
central value, its influence on the result will be great, and, therefore, the sample mean is
more likely to be off-center with respect to the population value. So, with small samples,
there is a higher chance of getting results that are far from the population’s mean. This is
shown in the example below and corresponds to an explanation in the slides, in which we
have samples of size 4 on the left drawn from a population with mean=28 (yellow line) and
samples of size 100 on the right. The means of the samples are drawn in red. You will see
that the red line fails to coincide with the yellow line more often in samples of size 4 than in
samples of size 100.
- False: No. Small values are not more likely when the sample is small than when it is
large, what made you think that this could be the case?
- Correct: When the sample is small, the differences from the population value can be
larger (although not necessarily) than when the sample is large.
- False: It says always and that is not correct for sure. It may happen that by chance
the mean of a sample of 100 cases is away from the population’s mean, but it is not
normal.
- False: I don’t really know how “clustered” can be interpreted, so I hope you haven’t
answered this alternative.
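The contrast between alternatives b and c can be sketched with a simulation. The population mean of 28 comes from the example in the text; the normal shape, the standard deviation of 5, and the 1000 repetitions are assumptions made only for this illustration.

```python
import random
import statistics

POP_MEAN, POP_SD = 28, 5     # mean from the text; SD is an assumed value

def sample_mean(rng, n):
    """Mean of one random sample of size n from the assumed population."""
    return statistics.mean([rng.gauss(POP_MEAN, POP_SD) for _ in range(n)])

rng = random.Random(1)
# Each pair holds one mean from a sample of 4 and one from a sample of 100.
pairs = [(sample_mean(rng, 4), sample_mean(rng, 100)) for _ in range(1000)]

avg_dist_small = statistics.mean([abs(m4 - POP_MEAN) for m4, _ in pairs])
avg_dist_large = statistics.mean([abs(m100 - POP_MEAN) for _, m100 in pairs])

# How often is the large-sample mean the one that lands farther away?
share_large_farther = sum(
    abs(m100 - POP_MEAN) > abs(m4 - POP_MEAN) for m4, m100 in pairs
) / len(pairs)

print(f"average distance, n=4:   {avg_dist_small:.2f}")
print(f"average distance, n=100: {avg_dist_large:.2f}")
print(f"share of pairs where n=100 is farther: {share_large_farther:.2f}")
```

On average the means of size-4 samples fall much farther from 28, yet in a minority of pairs the size-100 mean is the one that is farther away, which is why "can be larger" is correct and "always larger" is not.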
14. The main contribution of this guy to the world was:

a. How to calculate the probable error of a result calculated in a sample when
estimating the value in a population.
b. Some tables with many numbers in columns.
c. Turn a hauntingly colored drink into an unforgettable
experience.
d. How to calculate the mean of the results found in
several samples to estimate the value of the
population.
Student worked for a beer company and, although he undoubtedly helped to improve its
brewing somewhat, there is no brand called “Student’s” (although that sounds like a good
name to me). The Student’s t-tables were probably not made by him. As for calculating the
mean of several samples to estimate the value of the population, I don’t think it occurred to
him, or if it did, he was so modest that he would not have thought of claiming it as his own.
What Student did come up with was how to calculate the standard deviation of the different
mean values obtained in each sample, which allowed him to calculate the standard error (or
mean error, or probable error, as you wish to call it).
- Correct.
- False: Some students might answer this but their future grade would look bleak in
such cases.
- False: Unforgettable?
- False: Nope.
15. Student’s t distribution is:

a. 95%
b. A model, in standard scores, of the distribution of error one might expect to have in
estimating a mean in a population from samples
c. -2 and 2.
d. The standard error.
Student’s t distribution tells us what percentage of values will be below a given value,
normally expressed in standard scores. From that we can calculate the percentage of values
that will be between two standard (z) scores. Since the distribution of means calculated in
samples of a certain size drawn from a population follows the t distribution, we can say that
it is the right model.
- False: 95% is the size of the confidence interval that we most commonly use, but it is
not the t distribution.
- Correct: If you have been led by the intuition that the longest alternative is the correct
one, this time it has worked out (but do not put too much trust in that rule because I
also know it).
- False: -2 and 2 are values that we often use for t because they allow us to calculate
an approximate 95% confidence interval without looking at tables. But they are not
the t distribution.
- False: We use the value of t (as an approximation we can use 2 and -2 for 95%
confidence) together with the standard error to calculate 95% confidence intervals, but
the standard error is not the t distribution.
Test 3. Scientific method
1. The results part of the report of a study:

a. It is the one most related to Statistics.
b. It is the one that explains the conclusions of the studies.
c. The technical information to understand this part is usually explained in courses on
Psychometrics or Research Design.
d. It is the only one worth reading of any study.
The results part is where the statistical analyses performed on the data are explained. It is a
fairly technical part that often uses plenty of statistical analyses.
- Correct: The contents of the Statistics subject are usually applied in this section, both
to learn how to write it and to understand the analyses carried out by others in their
reports.
- False: The results section should contain only brief comments on the results of the
statistical tests: the conclusions part is where you discuss your results in relation to
the theory presented in the introduction and even possible future work.
- False: Psychometrics is related to measurement and, unless the work is focused on
improving or developing a questionnaire, the part where measurement is described is
usually in the method section. And so it happens with Research Design.
- False: If you are a Statistics professor answering this exam, this answer may be
correct; for the rest of the world, the answer is false: the other parts of a study are
also important.
2. If a study produces results against our theory:

a. We must accept that we were wrong.
b. We reject our theory.
c. We try to accommodate the results within our theory if it is possible.
d. We reject our study.
In many sciences, theories are not so well specified that a single study can completely refute
them. Therefore, there is always room to adjust the theory so that the results fit into
it. However, if the negative results keep piling up, then it is probably high time to start
afresh with a new theory.
- False: Although this alternative is marked as false, under certain conditions and on
certain occasions it could be correct. For example, a) if we have very little confidence
in our theory, or b) if the theory is very well detailed because then the test can be
more conclusive.
- False: This is very radical. You usually require more than one study to reject a theory
(if it is a serious one, of course).
- Correct: This is the correct alternative.
- False: That would be killing the messenger, wouldn’t it?
3. Research design:
a. It is usually taught in courses other than Statistics.
b. It is irrelevant as long as the data are well analyzed.
c. It is a part of the Statistics courses.
d. It cannot be taught as it relies only on the brilliance of the researcher.
In Psychology, Statistics and Research Designs are regarded as different topics, although
closely related to each other. If an investigation is poorly designed, not even the most
sophisticated statistical analyzes will lead to sound conclusions.
- Correct: Indeed, Statistics and Research Designs are usually two separate topics in
Psychology (although perhaps they should not be)
- False: If a research design is not correct, statistical analysis alone will seldom save
the day.
- False: Statistics courses do not usually deal with how to carry out
research designs in detail.
- False: Although the brilliance of some researchers often achieves wonderful research
designs that no professor could ever have taught them, there are aspects of design
that are well established and can be learned in the right books and courses
4. In the material of the course, in the section about the scientific method, there are several
journalistic articles that refer to surveys or studies at a national or international level. Using
this link related to water El mito de los 8 vasos de agua al día (nationalgeographic.com.es),
answer the question below:
According to that article, there is a myth that it is necessary to drink 8 glasses of water daily.
Suppose that —although it is not true— we suspect that young people have bought it and
college students have developed the habit of drinking 8 or more glasses of water a day…
What null hypothesis would we use in this case to test that using a random sample of
students?
a. That students drink 7 glasses of water daily or fewer than 7.

b. That students drink eight glasses of water daily or fewer than 8.
c. That students drink more than 8 glasses of water daily.
The question says “if they drink eight or more glasses of water” so the null hypothesis to test
is to drink 7 or fewer than 7 glasses of water.
- True: The question says “if they drink eight glasses of water or more” so the null
hypothesis to test is to drink 7 or fewer than 7 glasses of water.
- False: The null hypothesis to test is to drink less than 8 glasses of water, so this
answer is not correct. To support that they drink 8 or more we should put in the null
hypothesis that they drink 7 or fewer than 7 glasses of water.
- False: This would be very wrong. We think they drink 8 or more so the null
hypothesis should be 7 or fewer than 7 glasses of water
5. Using this link related to smoking, El consumo de tabaco en España y el mundo, en datos y gráficos
(epdata.es), answer the following question:
If we believed that university students are heavier smokers than the normal population, what
value would we place as the null hypothesis of a hypothesis test about the percentage of
students who smoke daily?
a. That 25% of the students smoke.

b. That more than 22% of the students smoke.
c. That 25% or more of the students smoke.
d. That 22% or less of the students smoke
The text indicates that 22% of the Spanish population claims to smoke daily (the source is
AECC/INE, so it seems reasonable to trust it). The null hypothesis is the opposite of what we
think so in this case it is 22% or less (since we think they smoke more and we would aim to
reject that).
- False: 25% are those who declare themselves ex-smokers, so you have looked at
the wrong percentage. Also, this would be a test of one specific value rather than of whether
the students smoke more than a particular value, so this response is wrong for that reason
too.
- False: This would be the study hypothesis or alternative (what we want to confirm).
- False: 25% are those who declare themselves ex-smokers, so you have looked at
the wrong percentage
- True: Well done
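As a rough sketch of how this one-sided test could be computed, the snippet below calculates an exact binomial p-value for the null hypothesis that 22% or less of the students smoke daily. Only the 22% comes from the text; the sample (58 daily smokers among 200 students) and the function name are invented for illustration.

```python
from math import comb

def binomial_p_value_greater(successes, n, p_null):
    """P(X >= successes) for X ~ Binomial(n, p_null): a one-sided (upper-tail) p-value."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )

# Hypothetical sample (invented numbers): 58 daily smokers among 200 students.
p_value = binomial_p_value_greater(58, 200, 0.22)
print(f"one-sided p-value = {p_value:.4f}")
# A small p-value would lead us to reject "22% or less" in favour of "more than 22%".
```

The upper tail is used because the study hypothesis has a direction (students smoke more than the general population).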
6. Using this link related to smoking, El consumo de tabaco en España y el mundo, en datos y gráficos
(epdata.es), answer the following question:
If we believed that the difference between males and females studying at the university with
regard to smoking is higher than it is in the general population, what hypothesis of the study
could we use to test this?
a. That university male students smoke more than university female students in a
percentage greater than 25%.
b. That university male students smoke the same as university female students in a
percentage of 7% or less.
c. That university male students smoke more than university female students in a
percentage greater than 7%.
d. That university male students smoke less than university female students in a
percentage greater than 25%
As you can see in the text, a quarter of men in Spain declare themselves to be smokers,
versus 18% of women. This puts men 7 percentage points above women in smoking. If we think
that the gap between male and female university students is larger than that 7%, we should
put as the study hypothesis that male university students smoke more than female university
students by a margin greater than 7%, and as the null hypothesis that the margin is 7% or
less.
- False: 25% is the percentage of male smokers but the hypothesis is about the
difference with females.
- False: This would be the null hypothesis.
- True: Good answer!
- False: 25% is the percentage of male smokers but the hypothesis is about the
difference with females. Besides, “less” does not make much sense either.
7. As the teacher is a bit of a smartass, he has decided to change the name of one of the
hypotheses and use a different one in his classes. Do you remember which one he
renamed?
a. Everyone except the teacher calls the study hypothesis the alternative hypothesis.
b. The teacher would never rename the hypotheses because their names perfectly
reflect their intrinsic meanings.
c. Everyone except the teacher calls the null hypothesis the alternative hypothesis.
d. Everyone except the professor calls the study hypothesis the null hypothesis
Everyone except the professor calls the study hypothesis the alternative hypothesis.
- Correct: This alternative would be correct.
- False: “Null hypothesis” more or less reflects the intrinsic sense of being the
hypothesis of no effect or no difference, but the teacher believes that “alternative
hypothesis”, the name traditionally used for the hypothesis with which we most agree,
is not quite right: it seems to imply that it is an alternative to the one we prefer
when in fact it is the one that we prefer.
- False: Everyone, including the professor, calls the null hypothesis the null hypothesis
(although it’s a name that he is not very happy with either).
- False: The study hypothesis is called the alternative hypothesis by everyone except
the professor.
8. Induction means:
a. Deducing the consequences of the laws of science.
b. Inferring theories from repeated observations of reality.
c. Proving that the theories are true.
d. Rejecting hypotheses that are not true.
9. IMRD is the name of the sections that:

a. Followed by the theoretical academic papers in Psychology.
b. Followed by the scientific-empirical articles in sciences such as biology, psychology,
or chemistry.
c. Followed by Statistics academic papers.
d. Followed by the scientific method
IMRD are the initials of Introduction, Method, Results, and Discussion which is the basic
structure used in scientific-empirical articles in sciences such as biology, psychology, or
chemistry.
- False: Theoretical articles do not follow these steps.
- Correct: This is the correct alternative.
- False: Statistical papers don’t use these steps so often.
- False: Although IMRD is a scheme that has a lot to do with the scientific method
itself, they are not the same thing.
10. Popper’s falsificationist logic indicates that:

a. Scientists must attempt to refute the theories that they consider correct.
b. It is necessary to verify if the data of the investigations are false.

c. You have to study false theories to understand their errors.
d. Scientists must try to refute theories other than those they consider correct
Popper argued that in order to reinforce the theories they believe to be correct, scientists
must actively try to find evidence that refutes them. If they can’t, then the theories will come
out reinforced.
- True
- False
- False
- False
11. The results part of the report of a study:

a. It is the one that explains the conclusions of the studies.
b. It is the one most related to Statistics.
c. The technical information to understand this part is usually explained in courses on
Psychometrics or Research Design.
d. It is the only one worth reading of any study.
The results part is where the statistical analyses performed on the data are explained. It is a
fairly technical part that often relies on plenty of statistical analyses.
- False: The results part should contain no more than brief comments on the results of the
statistical tests: the conclusions part is where you discuss your results in relation to
the theory presented in the introduction and even possible future work.
- Correct: The contents of the Statistics subject are usually applied in this section, both
to learn how to write it and to understand the analyses carried out by others in their
reports.
- False: Psychometrics is related to measurement and, unless the work is focused on
improving or developing a questionnaire, the part where measurement is described is
usually in the method section. And so it happens with Research Design.
- False: If you are a Statistics professor answering this exam, this answer is maybe
correct: for the rest of the world, the answer is false, the other parts of a study are
also important.
12. The significance value is:

a. The probability of obtaining the results that have occurred in a sample if the study
hypothesis (for the entire population) is true.
b. The probability of obtaining the results that have occurred in a sample if the null
hypothesis (for the entire population) is true.
c. The probability that the null hypothesis is true.
d. The probability that the study hypothesis is true
The significance value is calculated by extracting samples from a population with
parameters equal to those stated in the null hypothesis. It therefore indicates the probability
of obtaining the results that have occurred in a sample if the null hypothesis (for the entire
population) is true. Note that we usually want that probability to be low because that way we
will reject the null hypothesis.
- False: In a study, the hypothesis we test is the null hypothesis, not the study
hypothesis.
- Correct: Indeed, the significance value is the probability of obtaining the results that
have occurred in a sample if the null hypothesis (for the entire population) is true.
- False: In the hypothesis testing procedure we use, we cannot calculate whether a
hypothesis, null or study, is true. What we can do is calculate the probability of
obtaining the results that have occurred in a sample if the null hypothesis (for the
entire population) is true.
- False: In the hypothesis testing procedure we use, we cannot calculate whether a
hypothesis, null or study, is true. What we can do is calculate the probability of
obtaining the results that have occurred in a sample if the null hypothesis (for the
entire population) were true.
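The idea of extracting samples from a population that matches the null hypothesis can be sketched with a small simulation. This is only an illustration under stated assumptions: a normally distributed population, a one-sided test on the mean, and invented numbers (null mean 100, standard deviation 15, sample size 25, observed mean 106). The function name is hypothetical.

```python
import random

def simulated_p_value(observed_mean, null_mean, sd, n, n_sims=10_000, seed=1):
    """Share of samples drawn under the null whose mean is at least as large as the observed one."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # Draw one sample of size n from a population where the null hypothesis is true.
        sample = [rng.gauss(null_mean, sd) for _ in range(n)]
        if sum(sample) / n >= observed_mean:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

# Invented example: how often does a null population produce a sample mean of 106 or more?
print(simulated_p_value(observed_mean=106, null_mean=100, sd=15, n=25))
```

The returned proportion estimates the probability of the observed result if the null hypothesis is true. It says nothing directly about the probability that either hypothesis is true.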
13. The steps of the scientific method are usually:

a. Perform experiments in the laboratory to find the solution.
b. Demonstrate with data the null and study hypotheses.
c. Theory, hypothesis, method, analysis of results, and conclusions.
d. Develop a hypothesis, prove a theory, and perform statistical analysis.
The steps of the scientific method are Theory, hypothesis, method, analysis of results and
conclusions
- The scientific method is not only applied within the laboratory. There is also science
beyond the laboratory.
- This alternative doesn’t really make much sense: proving the null and study
hypotheses is not the goal of the scientific method.
- This alternative is the correct one.
- This option is not complete: statistical analyses alone are usually not enough to
reach conclusions, and there is no mention of this last step. Also, the hypotheses
are drawn from theory, so those two steps are not in the correct order.
14. Psychology is a … science:

a. Empirical
b. Medical
c. Formal
d. Philosophical
Psychology is an empirical science because its theories need to be tested against reality to
determine if they are valid.
- True
- False
- False
- False
Test 4. Rules for test selection
1. You have a file with data from a survey on social issues in the USA (GSS93) in the
course’s material. (There is a version in Spanish and one in English, if you can’t find the one
in your favorite language, please let me know). You can open the file with JASP or Jamovi,
but SPSS has an advantage because it allows you to see the information about the variables
in a more compact way.
Let’s suppose that you want to analyze if the zodiac sign (zodiac) influences the region in
which you live (region), what statistical test would you use?
a. Pearson’s correlation.
b. Pearson’s Chi-Square test.
c. Simple regression.
d. An analysis of variance test
The two variables are categorical so, in this case, Pearson’s Chi-Square test is the
appropriate technique for testing the association between them.
- False: This alternative is for numerical variables and the two variables are
categorical.
- Correct: This would be the best.
- False: This alternative is for numerical variables and the two variables are
categorical.
- False: No. This alternative would not work
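To make the chi-square idea concrete, here is a minimal sketch that computes Pearson's chi-square statistic from a table of counts. The counts are invented (three zodiac signs and two regions instead of the full GSS93 categories), and the function name is an assumption, not something from the course material.

```python
def chi_square_statistic(table):
    """Pearson's chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows are three zodiac signs, columns are two regions.
counts = [[30, 20], [25, 25], [20, 30]]
chi2, df = chi_square_statistic(counts)
print(f"chi2 = {chi2:.2f}, df = {df}")  # chi2 = 4.00, df = 2
```

A statistical package such as JASP or Jamovi would then turn the statistic and its degrees of freedom into a significance value.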
Suppose that you want to analyze whether marital status (married, single, etc.) influences
the level of life satisfaction of the people (measured with a Likert scale). What statistical
technique would be appropriate to analyze the relationship between these two variables?
a. An analysis of variance test.

b. The Kruskal-Wallis test
c. The Friedman’s test
d. The analysis of variance test or the Kruskal-Wallis test.
We want to study how satisfaction depends on marital status, so satisfaction is the
dependent variable in this case (and marital status is the independent variable).
The marital status variable is categorical with several categories and the level of satisfaction
is ordinal (ordered categories). In this case, looking at the table in the theory chapter, the
Kruskal-Wallis test is the one usually recommended. However, as we will see later, the results
differ little from those of the analysis of variance, the test for a numerical dependent
variable, and using it instead of the Kruskal-Wallis test has some advantages.
For this reason, there are two alternatives that are acceptable, but the one that combines
both tests is the best.
- False: This alternative is not bad but there is a better one.
- False: The Friedman test works for repeated samples, which is a concept that I
haven’t explained yet but will come up later in the course
Let’s say that you want to analyze whether age makes people spend more hours watching
television. What statistical technique would be suitable for this goal?
a. An analysis of variance test.

b. Pearson’s correlation.
c. Multiple regression
d. Simple regression
We want to study how TV hours depend on age, so TV is the dependent variable (and the
other one is the independent variable).
The TV hours variable is numerical, and so is the age of the respondents. In this case,
simple regression is the appropriate technique.
- False: No. This alternative would not work.
- False: This alternative would tell us the association, but we want to analyze the
dependency (although both tests are closely linked TBH).
- False: Multiple regression with one independent variable is simple regression after
all, but it is better to use the correct names.
- Correct: This would be the best
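A minimal sketch of what simple regression estimates, assuming invented data for five respondents (the variable names and values are hypothetical, not taken from the GSS93):

```python
from statistics import mean

def simple_regression(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical respondents: age in years and TV hours per day (invented values).
age = [20, 30, 40, 50, 60]
tv_hours = [2.0, 2.5, 3.0, 3.5, 4.0]
intercept, slope = simple_regression(age, tv_hours)
print(f"tv_hours = {intercept:.2f} + {slope:.3f} * age")
```

The slope is the expected change in the dependent variable (TV hours) for each additional year of the independent variable (age), which is why regression answers a dependency question rather than just an association one.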
Let’s suppose that you want to analyze whether the Academic level is related to the level of
satisfaction of people with their lives (life). What statistical technique would be appropriate to
analyze the relationship between these two variables?
a. Pearson correlation.
b. Simple regression.
c. Ordinal correlation (Spearman).
d. Pearson correlation or Spearman correlation.

The two variables are ordinal (ordered categories), and the question is about “the
relationship”. In this case, an ordinal correlation would be the most appropriate, but, if the
sample is large, the result will not be very different from the Pearson correlation, so both
techniques would be reasonable.
- False: Simple regression needs one independent and one dependent variable, and
the question’s write-up does not hint at a test about whether one depends on the
other.
- False: This alternative is not bad but there is a better one
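The relation between the two correlations can be sketched as follows: Spearman's coefficient is Pearson's coefficient computed on ranks instead of raw values. The data and the numeric codes below are invented for illustration, and the helper names are assumptions.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical ordinal codes: academic level (0 to 4) and life rating (1 to 3).
degree = [0, 1, 1, 2, 2, 3, 3, 4, 4, 4]
life = [1, 1, 2, 2, 2, 2, 3, 2, 3, 3]
print(round(pearson(degree, life), 3), round(spearman(degree, life), 3))
```

Running both on the same data is a quick way to check how far apart the two coefficients are for a given sample.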
Suppose that you want to analyze if the average age of the respondents is 25 years…what
statistical test would be appropriate in this case?
a. Pearson’s correlation.
b. The two-sample t-test.
c. A binomial test.
d. The one-sample t-test.
We only have one sample on one variable so the proper technique is the one-sample t-test.
- False: This alternative would test the association between two variables and we only
have one.
- False: This alternative is not correct.
- False: No. This alternative would not work.
- Correct: This would be the correct one
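The answers to the five questions above can be summarized as a small lookup. This sketch only encodes the cases worked through in this test, not the full table from the theory chapter; the function name and the labels for goals and variable types are made up for the example.

```python
def suggest_test(goal, dependent=None, independent=None):
    """Return the test suggested in the answers above for a few variable-type combinations.

    goal: "association", "dependency" or "one_sample_mean".
    Variable types: "categorical", "ordinal" or "numerical".
    """
    if goal == "one_sample_mean":
        return "one-sample t-test"
    pair = (independent, dependent)
    if goal == "association":
        if pair == ("categorical", "categorical"):
            return "Pearson's Chi-Square test"
        if pair == ("ordinal", "ordinal"):
            return "Spearman correlation (Pearson is close with large samples)"
    if goal == "dependency":
        if pair == ("categorical", "ordinal"):
            return "Kruskal-Wallis test or analysis of variance"
        if pair == ("numerical", "numerical"):
            return "simple regression"
    return "not covered by these examples"

# Marital status (categorical) as independent, life satisfaction (ordinal) as dependent.
print(suggest_test("dependency", dependent="ordinal", independent="categorical"))
```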
6. Congratulations, you have been hired as the person in charge of the students’ satisfaction
at a Valencian university. As a first step, you are planning to carry out a survey in which you
are going to ask a sample of 1000 students how satisfied they are with the vegetarian menu
in the cafeteria. Please, indicate which format would be the most suitable for this question.
a. An open question in which you would allow each student to answer verbally for a
maximum of 15 minutes, you will record what they say, and then you will listen to the
answers calmly afterward to draw your conclusions.
b. A numerical scale called a satisfaction-meter in which students must accurately
indicate their level of satisfaction on a scale of 1 to 100.
c. A series of questions using a Likert-type scale with five points of evaluation (from not
at all satisfied to very satisfied): For example, a question could be: how would you
rate the quality of the broccoli on the vegetarian menu from one to five?
d. A list of adjectives with different emotional content that you will then interpret
depending on whether the tone is positive or negative.
Unless your life is really boring and you are thinking that listening to the recordings of 1000
students is the way to make it more interesting, or you want to develop a new psychological
theory about the relationship between words and food, I recommend that you use a Likert
scale. Using a numerical scale is not unreasonable, but do you think anyone can be that
accurate on broccoli?
- False: 15 minutes for each of a thousand students? By the time you have finished listening to
the recordings, the course will be over.
- False: This alternative is not unreasonable, but do you think that someone can be
very precise in something like this? It is better to ask with an ordinal scale.
- Correct: This is the most boring option of all, but it is the one that makes the most
sense.
- False: Inventing theories is what psychologists are most famous for, but that doesn’t
mean it’s a good idea to feed the flames.
7. In this link there is a compressed data file that has a series of files by country that come
from the European Social Survey and that you can use to write the report if you find
something that interests you. If you open any file and examine the variables you will see that
I have selected many that have to do with mood, life satisfaction, etc. In addition, starting
with one called ipcrtiv (Important to think of new ideas and being creative) you will find a
series of questions about how important the subject considers that value in their life. Those
questions correspond to the Schwartz values questionnaire, which is a well-known theory of
values that you may have seen in Social Psychology class but that is explained in many
places anyway.
By the way, if the file does not include anything that convinces you, you can see all the
topics covered in the survey at this link.
For this exercise I only ask you the following: In this document it is indicated that
questions about values number 3 (ipeqopt: Important that people are treated equally and
have equal opportunities), 8 (ipudrst: Important to understand different people) and 19
(impenv: Important to care for nature and environment) score on the universalism value. If
we calculate a sum of these three questions that indicates universalism and we want to see
how it depends on the age of the respondent, what statistical technique can we use?
a. Age is a numerical variable and universalism is numerical so we can use a Pearson

correlation.
b. Age is a numerical variable and universalism is ordinal so we can use a Spearman
correlation.
c. Age is a numerical variable and universalism is categorical so we can use an
analysis of variance.
d. I don’t think age has anything to do with universalism.
When we add several questions from a Likert scale that are on the same scale, we can treat
it as if it were numerical. Since age is numerical we can calculate a Pearson correlation.
- Correct: Great!
- False: When we add several questions from a Likert scale that are on the same
scale, we can treat it as if it were numerical.
- False: When we add several questions from a Likert scale that are on the same
scale, we can treat it as if it were numerical.
- False: Well. Ops, this…
8. In this link, 2021-22 Estadística II Gr.B (36245) (uv.es), there is a compressed data file that
has a series of files by country that come from the European Social Survey (European Social
Survey | European Social Survey (ESS)) and that you can use to write the report if you find
something that interests you.
If you open any file and examine the variables, you will see that I have selected many that have
to do with mood, life satisfaction, etc. In addition, starting with one called ipcrtiv (Important to
think of new ideas and being creative) you will find a series of questions about how important
the subject considers that value in their life. Those questions correspond to the Schwartz
values questionnaire, which is based on a well-known theory of values that you may have
seen in Social Psychology class but that is explained in many places anyway.
By the way, if the file does not include anything that convinces you, you can see all the
topics covered in the survey at this link: Data and Documentation by Theme | European
Social Survey (ESS).
For this exercise I only ask you the following: In this document,
ESS_computing_human_values_scale.pdf (europeansocialsurvey.org), it is indicated that
questions about values number 3 (ipeqopt: Important that people are treated equally and
have equal opportunities), 8 (ipudrst: Important to understand different people) and 19
(impenv: Important to care for nature and environment) score on the universalism value.
How would we know if someone is considered high in that value? Note: In the previous
document there is a more complicated method to obtain these scores that we will not use for
now.
a. Asking him, because asking is the best way to go to Rome.

b. Reading each person’s answers to the questions.
c. Adding the answers of that person in the three questions: The higher the result, the
more universalist this person would be considered. For example, somebody scoring
15 would be the universalist in chief.
d. Adding the answers of that person in the three questions: The higher the result, the
less universalist this person would be considered. For example, somebody scoring 15
would be the least universalist.
If you look at the response scale, you will see that answering 1 means feeling that you are
close to that value, while 5 means not feeling that you are as indicated. Adding the questions
will produce a score that summarizes the universalist value of people but higher scores
would mean lower universalism.
- False: Well, it’s not false, but you want to pass the course, not to go to Rome.
- False: Well. Yes but…
- False: The response categories are inverted, so high values indicate lower
universalism.
- Correct: Great!
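The scoring rule can be sketched in a few lines. The three item names come from the question. The respondents are invented, and the answers are coded from 1 (close to the value) to 5 (not close), following the description of the response scale given above.

```python
ITEMS = ("ipeqopt", "ipudrst", "impenv")

def universalism_sum(respondent):
    """Sum of the three universalism items; with this coding a LOWER sum means MORE universalism."""
    return sum(respondent[item] for item in ITEMS)

# Hypothetical respondents (invented answers on the scale described in the text).
ana = {"ipeqopt": 1, "ipudrst": 1, "impenv": 1}
ben = {"ipeqopt": 5, "ipudrst": 5, "impenv": 5}
print(universalism_sum(ana), universalism_sum(ben))  # 3 and 15
```

Here the hypothetical respondent with a sum of 3 would be the most universalist and the one with 15 the least, because the response categories are inverted.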
Test 5. Steps for applying tests
1. The GSS93 is a survey we have used for previous exercises that you should be able to
find and open with your favorite data analysis software. One of the questions in this survey is
one called life that reads “Is your life Exciting or Dull?” and has three alternatives for
responding: Dull (1), Routine (2), and Exciting (3). The goal of this exercise is to explore this
variable and others related to it as the first step before performing statistical tests. This step
will be carried out with graphics that I assume you learned how to make in the first part of
the course, but if you do not know or do not remember, let me know.
Draw a bar chart of the life variable and select which alternative below is correct.
a. There must be many people that did not respond to this question.
b. The total number of responses is 400.
c. Many more people have exciting lives versus routine lives.
d. The number of people with Dull lives is higher than those with routine lives.
Statistical packages may or may not show the missing values (people who did not respond). In
this case, there are 1500 people in the GSS93 but only 997 responded to the life question. If we
were working for the company collecting this data, we might want to know why so many
people did not respond, as this could be a problem for the analyses.
[Figure: bar chart of the life variable, “Is your life Exciting or Dull?”]

- Correct: Yes. Only 997 people responded to the question out of 1500 people in the
survey.
- False: No. There were 997 who responded.
- False: There is a small advantage but I would not say “many”.
- False: This is simply not true.
Draw a boxplot of the income91 variable split by the life satisfaction categories to check if
there is something relevant about it.
a. There is absolutely no relationship between life and family income (but we need to
test for significance before claiming it).
b. It looks like people who rate their lives as less exciting are in families that have lower
incomes (but we need to test for significance before claiming it).
c. It looks like people who rate their lives as more exciting are in families that have lower
incomes (but we need to test for significance before claiming it).
d. The total number of responses is 400.
The boxplots show an increase in family income across the categories of life excitement. The
plots suggest an association between these two variables (but we need to test for significance
before claiming it).
- False: “Absolutely no relationship” seems excessive. Although we cannot be completely sure
before testing for significance, there seems to be some relationship between life
excitement and family income.
- Correct: Yes, you are right!
- False: No. It rather looks like the more the excitement, the higher the income.
- False: Still not true.
Suppose that you have the theory that 50 years old is the limit to have an exciting life and
consequently you set the hypothesis that people who claim to have a Dull life are older than
those who have an Exciting life. Looking at the table below, do you think that the results
support it?
a. Looking at the second line, I see that I can reject the null hypothesis so the results do
support this theory.
b. Looking at the third line, I see that I can not reject the null hypothesis so the results
do not support this theory.
c. Looking at the first line, I see that I can reject the null hypothesis so the results
d. Looking at the first line, I see that I can reject the null hypothesis so the results do not
The right line for this test is the second, because the study hypothesis
associated with our theory would be μDull − μExciting > 0 and consequently the null hypothesis
would be μDull − μExciting ≤ 0. In this case, the difference is 8 years and is significant, so
you can reject the null hypothesis and conclude that people who say they have a Dull
life are older than people who say they have an Exciting life.
- False:
- False:
- False
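A directional test like μDull − μExciting > 0 can be sketched as below. This is only an illustration under stated assumptions: the ages are invented, there are far fewer cases than in the GSS93, and the p-value uses a normal approximation that is reasonable only for large samples (statistical packages use the t distribution instead). The function name is hypothetical.

```python
from statistics import NormalDist, mean, variance

def one_sided_two_sample_test(group_a, group_b):
    """Welch-type statistic for H1: mean(a) - mean(b) > 0, with a normal-approximation p-value."""
    diff = mean(group_a) - mean(group_b)
    se = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    t = diff / se
    p_one_sided = 1 - NormalDist().cdf(t)  # upper tail only, because H1 has a direction
    return diff, t, p_one_sided

# Hypothetical ages (invented values).
dull = [62, 55, 48, 70, 66, 59, 51, 64]
exciting = [41, 38, 52, 47, 35, 44, 50, 39]
diff, t, p = one_sided_two_sample_test(dull, exciting)
print(f"difference = {diff:.1f} years, t = {t:.2f}, approximate one-sided p = {p:.4f}")
```

Swapping the two groups reverses the sign of the difference and gives a large p-value, which is why the order of the groups in the hypothesis matters when reading the output table.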
Suppose that you have the theory that 50 years old is the limit to have an exciting life and
consequently you set the hypothesis that people who claim to have a Routine life are older
than those who have an Exciting life. Looking at the table below, do you think that the results
support it?
a. The differences are significant but the null hypothesis is not the right one.
b. The differences are significant, the null hypothesis is the right one but the effect size
is not high.
c. The differences are significant, the null hypothesis is the right one and the effect size
is very high.
d. The differences are significant, the study hypothesis is not the right one but the
effect size is not high.
The null hypothesis of the table is the right one according to the question, and consequently
so is the study hypothesis. The differences are significant but the d that measures effect size
is not very large (remember that a d of about 0.2 is regarded as small).
- False:
- False:
- False
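For reference, the effect size d can be sketched as the difference in means divided by the pooled standard deviation. The ages below are invented, and the function name is an assumption. Cohen's conventional benchmarks are roughly 0.2 for a small effect, 0.5 for a medium one and 0.8 for a large one.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical ages (invented): a one-year difference against a standard deviation of four.
routine = [44, 48, 52]
exciting = [43, 47, 51]
print(cohens_d(routine, exciting))  # 0.25
```

A difference can be statistically significant and still correspond to a small d, especially in large samples.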
Suppose that you have the theory that people who have a Dull life have a very sad life in
general compared with people who have an Exciting life, and you want to show that this
happens in several aspects, namely their income, family income, the number of hours they
watch television per day and the age at which they got married.
a. The results do support the hypotheses set in the question.

b. The results do not support the hypotheses set in the question.
c. The sample size is 400.
d. The results support partially the hypotheses set in the question.
The differences are significant in all cases and they are in the direction that supports the idea
that people who label their lives as Dull have less income, live in families with less income,
watch more TV and get married a little bit younger (although you might dispute that this is
bad, of course). Besides, the effect sizes are large in general. Altogether, you might claim
that people who say that their lives are Dull may have a point.
- Correct: Yes, you are right
- False:
- False: Well…
- False
Exploring the variable tv hours, you decide to test whether the average number of hours for the
people who say that their lives are Dull is five. Using the wondrous graphic that the teacher has
shown a couple of times in class, what could you say about the results of this test?
a. The effect size is very low so the results are not reliable.
b. The average number of hours of the sample is five because the red “thing” is just
over 5.
c. We can not reject the null hypothesis that the average number of hours of the sample
is five. Also, there must be somebody who does not ever turn the TV off.
d. The sample size is 400
The blue line representing the average of the sample overlaps with the violin plot (the red
“thing”), which means that there are no differences with respect to the value set in the null
hypothesis. Besides, the boxplot shows that there are two outliers: one watches TV 24 hours
per day and the other 20 hours per day. As these outliers may move the sample’s average up
and the sample is not huge, it might be interesting to check what happens if you remove them,
as shown in the plot below, in which you can see that after removing them the sample’s
average seems to be below the hypothesized value of 5 hours per day.
- False:
- False:
- False: No. It is 64.
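The effect of a couple of outliers on a one-sample test can be sketched as follows. The data are invented (ten hypothetical respondents, two of them with extreme values like those described above), and the function name is an assumption.

```python
from statistics import mean, stdev

def one_sample_t(sample, null_value):
    """t statistic for H0: population mean == null_value (degrees of freedom = n - 1)."""
    return (mean(sample) - null_value) / (stdev(sample) / len(sample) ** 0.5)

# Hypothetical TV hours per day, including two extreme values.
tv_hours = [2, 3, 3, 4, 4, 5, 5, 6, 20, 24]
trimmed = [h for h in tv_hours if h < 20]  # same data without the two outliers

for label, data in (("all cases", tv_hours), ("without outliers", trimmed)):
    print(f"{label}: mean = {mean(data):.2f}, t vs 5 = {one_sample_t(data, 5):.2f}")
```

In this invented example the mean drops from above 5 to below 5 once the two extreme cases are removed, which is the kind of change the explanation above describes.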
Draw a histogram of the income91 variable to check if there is something relevant about it.
a. There are more families earning low salaries than high salaries.
b. Most of the people earn salaries that are in the middle of the scale of salaries.
c. There are fewer families earning low salaries than high salaries.
d. The total number of responses is 400.
The variable looks asymmetric, with the peak on the right side of the data. This suggests that
there are fewer families earning low salaries than high salaries.
- False: I do not see that.
- False: The middle of the scale is ten but the bars there are not the tallest. More
families earn between 17 and 21.
- False: This answer was not right for the previous question and it is not for this one.
Suppose that you have the theory that 50 years old is the limit to having an exciting life and
consequently that people who claim to have a Dull life are on average over 50 years old.
Looking at the table below, do you think that the results support it?
a. Looking at the first line, I see that I can reject the null hypothesis so the results do not
b. Looking at the second line, I see that I can not reject the null hypothesis so the
results do not support this theory.
c. Looking at the first line, I see that I can reject the null hypothesis so the results
d. Looking at the third line, I see that I can not reject the null hypothesis so the results
do not support this theory.
The right line for this test is the second and the reason is that the study hypothesis
associated with our theory would be H1:μ>50 and consequently the null hypothesis would be
H0:μ<=50. In this case, although the “Dull” people in the sample have an average age over 50,
you cannot reject the null hypothesis in any case (but all the results are non-significant so
this is the easiest part of this question).
- False:
- False:
- False
Let’s say that you have the theory that the people with Exciting lives are younger than those
with Dull lives
a. You are not sure of the differences so consequently your test should be bilateral
b. The hypotheses should be unilateral because you believe that young people have
more exciting lives than older people (BTW, I beg to differ).
c. The total number of responses is 400.
d. There should be a hypothesis of the type null and another of the type alternative
The hypothesis should be unilateral because you believe that young people have more
exciting lives than old people (but I do not).
- False: You may not be sure of the result, but you should be of your hypotheses.
- Correct: Yes, unilateral it is.
- False: Still not true.
- False: There will always be one null and one alternative, but I am asking specifically
about this case.
you set the hypothesis that people who affirm to have an Exciting life are on average under
50 years old. Looking at the table below, do you think that the results support your theory?
a. Looking at the second line, I see that I can not reject the null hypothesis so the
results do not support this theory.
b. Looking at the second line, I see that I can reject the null hypothesis so the results do
not support this theory.
c. Looking at the third line, I see that I can reject the null hypothesis so the results do
d. Looking at the first line, I see that I can reject the null hypothesis so the results
The right line for this test is the third and the reason is that the study hypothesis associated
with our theory would be H1:μ<50 and consequently the null hypothesis would be H0:μ>=50. In
this case, you can reject the null hypothesis, which supports that the average age of people who
affirmed having an Exciting life was under 50.
- False:
- False:
- False
Test 6. One sample tests
1. In the Data section of the course, there is a data file called ESSEspanya.sav, which has
variables belonging to the European Social Survey referring to Spain. In it there are
variables related to the state of mind of the Spaniards, happiness, values, attitudes towards
social issues (gender, emigration, etc.), politics, and many others. That file should be a good
source for your course report.
Look at this news item and find out what percentage of young people between 15 and 19 years
old suffer from anxiety according to the WHO (if you do not know how to select those
younger than 19, it is time to ask). Check if that percentage is the same in Spain. For this, you
can use the question fltanx. I understand that if someone answers that question “Most of the
time” or “All, or almost all the time” then he/she is anxious. NOTE: I have combined the
two previous responses into the “Most of the time” alternative in the data file.
Check if the percentage of adolescents aged between 15 and 19 years who say in the link
that they are anxious coincides with the value that appears in the ESS for Spain.
a. The percentage of people aged 15-19 suffering anxiety in the article is 10% but that
of the sample is only 0.03, so the information is clearly wrong.
b. The percentage of people aged 15-19 suffering anxiety in the article is 4.6%, and in
the sample it is 3.4%. We cannot reject the null hypothesis of no difference.
c. There are only 3 subjects who claim to be anxious in the sample, so we cannot draw
valid conclusions.
d. The effect size is very large, so there must be differences between the two values.
As we can see in the table below, 3 out of 87 subjects under 19 years of age stated that they
had felt anxious in the previous week. That makes 3.4% of the total, which is close
to the WHO figure (4.6%). Rejecting the null hypothesis is not possible as p=0.8,
so we can say that our sample is consistent with the WHO’s value for the world.
- FALSE.
- TRUE. This is correct.
- FALSE.
- FALSE
Felt anxious, how often in the past week
2. One of the problems that the aliens have is that their cholesterol level goes up a lot when
they eat every day in the spaceship canteen because they can only eat processed food. To
solve this problem, a sample of aliens who had recently made interspatial trips in different
ships was put on a diet. This sample represented a random sample of all aliens doing this
type of trip. Blood samples were taken on days 2, 4, and 14 just after their trips to examine if
they passed the recommended maximum cholesterol levels (which interestingly enough
matches the levels for humans: 200 mgs/dL). Cholesterol was also measured before setting
off for the trip (CONTROL). In this case, our aim is to show that the cholesterol levels at the
CONTROL moment were correct (that is, to reject H0:μ>=200) and the opposite in the rest of
the tests, that is, that their levels were incorrect. Cholesterol test results are found in the
course material (Alien Cholesterol).
a. We cannot reject that aliens, in general, have a cholesterol level equal to 200 since the
confidence interval for those we measure in CONTROL includes that value. Also, for that
hypothesis test p=.102, so we cannot reject the null hypothesis.
b. The results are not conclusive because the
variables have many strange values. There are also
missing values. In spite of that, the impression that it
conveys is that the mean cholesterol of the aliens in the
population will be correct on day 14 but not earlier, since
on day 2 the mean is 253.93.
c. The mean cholesterol of the alien population is
above 200 even up to day 14 after landing, but before the
trip (CONTROL variable) it was perfectly fine.
d. The results show that the aliens always have too
low cholesterol, both before making the trip (which is in
the CONTROL variable) and after dieting (the rest of the
variables).
e. The variables are sufficiently symmetric and do
not show very strange values. The results of the
hypothesis tests show that the alien sample had
statistically significant high cholesterol on days 2, 4 and
14, although it seems that their levels decreased as time
went by. We cannot reject that the aliens had high
cholesterol in CONTROL too.
The number of cases in the sample is 30, but if you look closely at the data, you will see
that there are some missing cases marked as NA in the table, so there are actually fewer
cases for the analyses.
Box plots give the impression that all variables are fairly
symmetric. There are some extreme values but they are
not extremely large and consequently we can be
confident in the results.
The next step is to set the hypotheses:
In the variables DAY2, DAY4 and DAY14, what we want is to show that their levels were
incorrect, so our null hypothesis is that they have cholesterol at or below 200
(H0:μ<=200). If we reject this null hypothesis, we have evidence in favor of H1:μ>200,
that is, that our subjects have excessively high cholesterol levels.
In the case of the CONTROL variable, if what we are looking for is to show that their
cholesterol levels were adequate (below 200), one way to approach this is to see if we
can reject the null hypothesis that their levels were too high, and for this we set
H0:μ>=200. If we reject this hypothesis we have evidence in favor of H1:μ<200.
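To make the direction of each one-tailed test concrete, here is a rough sketch of a one-sample t-test written with the standard library only. The cholesterol values are invented for illustration (they are NOT the course data), the function names are hypothetical, and the t distribution is integrated numerically with Simpson's rule, which is adequate for small samples like this one but is not how statistical packages do it.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] plus the symmetry of the distribution."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t(data, mu0, alternative):
    """One-tailed one-sample t-test; missing values (None) are dropped first."""
    values = [x for x in data if x is not None]
    n = len(values)
    t = (mean(values) - mu0) / (stdev(values) / math.sqrt(n))
    df = n - 1
    if alternative == "greater":   # H0: mu <= mu0  vs  H1: mu > mu0
        p = 1 - t_cdf(t, df)
    elif alternative == "less":    # H0: mu >= mu0  vs  H1: mu < mu0
        p = t_cdf(t, df)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return t, df, p


# Hypothetical measurements (None marks a missing case, like NA in the table).
day2 = [250, 262, None, 241, 270, 255, 248, 259, 266, 238]
control = [188, 201, 195, None, 190, 204, 185, 197, 192, 199]

t_day2, df_day2, p_day2 = one_sample_t(day2, 200, "greater")    # H0: mu <= 200
t_ctrl, df_ctrl, p_ctrl = one_sample_t(control, 200, "less")    # H0: mu >= 200
print(f"DAY2:    t({df_day2}) = {t_day2:.2f}, one-tailed p = {p_day2:.4f}")
print(f"CONTROL: t({df_ctrl}) = {t_ctrl:.2f}, one-tailed p = {p_ctrl:.4f}")
```

Note how the same data would give a p-value of one minus the value above if the opposite one-tailed hypothesis were chosen, which is why the hypotheses must be fixed before looking at the table.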
The test results are shown in the table below.
T-test table
For DAY2, DAY4, and DAY14, the alien sample’s cholesterol is above 200. Now, as our
questions are about the population from which those samples were drawn, we need
hypothesis tests to confirm that the differences are significant. The null hypothesis is that the
population mean is at or below 200, but as can be seen, the sample mean is above
that value of 200 in all three cases. This suggests that the null hypothesis is not true,
since the probability of obtaining sample means like these if the null hypothesis were true
is very low (<0.05) in all the tests.
For CONTROL the result is different. First of all, the null hypothesis we test is that the
cholesterol level of the aliens is high. We would clearly reject that null hypothesis if the
sample’s mean were much lower than 200, but in this case, the result is not very conclusive.
The sample mean is 193.13, somewhat below 200 but not by much. Because it is quite close
to 200, the p-value of the test is 0.051, which is quite close to the threshold that we usually
use (0.05) but does not fall below it. Furthermore, looking at the 95% confidence interval, we
can see that the cholesterol value of the alien population could even reach a value of 200.05
according to this result. Thus, we cannot completely rule out that aliens as a whole have
slightly high cholesterol values even before space traveling, and that it would therefore be a
good idea that all of them, and not just those who travel to space, watch their diet a little bit more.
Finally, note that the cholesterol levels of those on a diet seem to go down progressively. It
looks like a diet free of processed foods has the desired effect. However, at this time it is not
possible to make a diagnosis of whether the change is really statistically significant. You will
learn about this type of analysis in other exercises of the course.
- This answer is FALSE since it is based on two-tailed hypothesis testing and the
statement of the exercise requires one-tailed testing.
- This answer is FALSE. The part of the strange values, although it is true that there
are some marks like this in the graph, is somewhat overstated. There are also
missing values but in this course we do not give them too much importance. In the
hypothesis tests, the cholesterol level on day 14 is still high (we reject H0:μ<=200 with
p=0.022) and on the other days it is even worse.
- This answer is FALSE although just barely: the population that the CONTROL sample
represents borders on excessive cholesterol levels. Look at the solution of the exercise
to see the explanation of the result in more detail.
- This answer is FALSE. The results show that the aliens always had their cholesterol
too high, not too low, during the time they were dieting, and they were very close to
having it too high when they were measured for the CONTROL measure.
- This answer is CORRECT. The results show that the variables are sufficiently
symmetric and do not show very strange values. The results of the hypothesis tests
show that the alien population has high cholesterol on days 2, 4, and 14, although it
shows a decreasing trend.
Look at this piece of news and find out what percentage of young people below 19 years of
age suffer from depression. To make this calculation, keep in mind that you have to select
only adolescents (if you don’t know how to do it, it’s time to ask). I have selected those under
19 years old and I get 65. In that file there is a variable called fltdpr that corresponds to the
question: “Felt depressed, how often past week”. I understand that if someone answers that
question “Most of the time” or “All, or almost all the time” then he/she is depressed. NOTE: I
have combined the responses from the two previous responses into the “Most of the time”
alternative.
Check if the percentage of adolescents who say in the news that they are depressed in
Spain coincides with the value that appears in the ESS.
a. The percentage of depressed people in the article is 10% but that of the sample is
only 0.01, so it is very conspicuous that fewer persons are depressed in the sample
and, therefore the press should not be trusted at all.
b. Although the difference between the proportions is small, the effect size is very large,
so there must be differences between the two values.
c. There are only 6 subjects who claim to be depressed in the sample, so we cannot
draw valid conclusions.
d. The proportion of depressed people in the article is 0.1 and in the sample, it is 0.095.
Since the difference is very small, we cannot reject the null hypothesis that there is
no difference.
As we can see in the table below, 6 out of 63 subjects under 19 years of age stated that they
had feelings of depression in the previous week. That makes almost 10% (9.5% to be exact),
which almost completely coincides with the headline of the newspaper (1 out of 10).
Therefore, rejecting the null hypothesis is not possible and we could say that our sample
coincides (as it should) with the estimate published in the newspaper. The relative risk is also
very close to one, which means that the proportion in the null hypothesis (symbolized by p0) is
very similar to the observed one (symbolized by p1).
Felt depressed, how often past week
- FALSE. This alternative mixes proportions with percentages. Do not forget that the
analyses are usually carried out with proportions, but percentages are often used to
communicate the results.
- FALSE. The size of the effect is measured by the relative risk. To that extent, a
relative risk of 1 means no effect. There is an effect when it is clearly greater or less than
1, and we don’t have strict criteria about when that effect is large or
small (it depends on how good or bad the thing we are talking about is).
- FALSE. I do not see why.
- TRUE. This statement is valid.
In this document from the WHO (Salud mental del adolescente (who.int)) there is a mention
of the percentage of depressed people between 15 and 19 years old. You can check if this
percentage is the same in the ESS for Spain (although we do not have 15-year-olds in our
sample, this is OK). There is a variable called fltdpr in that file that corresponds to the
question: “Felt depressed, how often past week”. I understand that if someone answers that
question “Most of the time” or “All, or almost all the time” then he or she is depressed.
NOTE: I have combined the two previous responses into the “Most of the time” alternative.
To make this calculation, keep in mind that you have to select only those under 20 years of
age (if you don’t know how to do it, it’s time to ask).
Check if the percentage of adolescents mentioned in the piece of the news who are
depressed in the world coincides with the value that appears in the ESS.
a. The proportion of depressed people according to the WHO is 2.8% and in the sample it is
9.5%. The difference is not significant and Spain has a lower percentage of young
people suffering from depression than mentioned by the WHO.
b. There are only 6 subjects who claim to be depressed in the sample, so we cannot
draw valid conclusions.
c. The proportion of depressed people in the article is 2.8% and in the sample it is
9.5%. The difference is not significant and in the Spanish sample the risk of
depression is similar to that indicated by the WHO.
d. The proportion of depressed people in the link is 2.8% and in the Spanish sample it is
9.5%. The difference is significant and in the Spanish sample the risk of depression
would be more than three times greater than that indicated by the WHO (ugh!)
As we can see in the table below, 6 of 63 subjects under 19 years of age stated that they
had feelings of depression in the previous week. That makes practically 10% (9.5% to be
exact) which is quite different from what appears in the WHO link. In this case, we reject the
null hypothesis and see that the risk is much greater in the Spanish sample.
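The relative risk mentioned in these exercises is just the observed proportion (p1) divided by the reference proportion (p0). A small sketch of that arithmetic, using only the figures quoted in the two depression exercises (the function name is made up for this example):

```python
def relative_risk(observed_count, n, reference_proportion):
    """Observed sample proportion (p1) divided by the reference proportion (p0)."""
    p1 = observed_count / n
    return p1 / reference_proportion


# 6 depressed subjects out of 63 in the Spanish sample.
rr_newspaper = relative_risk(6, 63, 0.10)   # newspaper headline: 1 out of 10
rr_who = relative_risk(6, 63, 0.028)        # WHO document: 2.8%

print(f"sample proportion        = {6 / 63:.3f}")
print(f"relative risk vs. 10%    = {rr_newspaper:.2f}")
print(f"relative risk vs. 2.8%   = {rr_who:.2f}")
```

A value close to 1 means that the sample and the reference tell the same story; a value above 3 is what lies behind the "more than three times greater" statement in the correct alternative.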
Felt depressed, how often past week
5. For aliens, totamine is part of what is sometimes called the happiness quartet, mediating
feelings such as love, pleasure, and sexuality, though it may also have to do with addictions.
Low totamine levels can make aliens less likely to work for a purpose. After studying in depth
the work problems of a group of aliens who work mainly the night shift,
psychologists wonder if their average levels of totamine are altered due to
their lifestyle. The average totamine level of normal aliens is set at 10
mgs/dl.
Note that the purpose of this study is to test whether the night shift aliens
can be considered “normal” aliens or not: that is, whether they can be
considered a random sample that could have been drawn from a population
of normal subjects. Would you say that is so?
The data of these aliens are in the table below.
a. The subjects that were captured are a biased sample that we can
say included those with less totamine.
b. The mean totamine of the subjects is 10.33 and a significance test
with H0:μ<=10 would lead to no rejection of this null hypothesis with
(p=.190).
c. The mean of the sample is 10, but since there is a very high extreme value, we
should not trust that value since it may not generalize well to the population.
d. Since the sample only includes night shift subjects, this study cannot be performed.
e. The sample of subjects evaluated has a normal totamine since there are several
subjects who have less than 15.
T-test table
6. Returning to our planet, the data in this exercise correspond to one of the darkest
moments in human history. In 1945, in the city of Nuremberg, the first trial began against the
highest officials of the Nazi government (those who were captured alive, of course). This trial
had many ramifications, both legal and political, as such a trial posed challenges from the
point of view of international law… but it also aroused much interest among psychologists
struggling to understand explanations for behavior as malignant as those shown by those
leaders during that period. For this reason, two specialists in human behavior carried out
tests and interviews with these leaders to gain an in-depth understanding of their
psychological characteristics. A summary of these studies can be seen for example here but
if you do an internet search you will find many other sources.
The IQ_Nuremberg.sav has the results of the IQ measurements carried out by two
specialists. In it you can see the names of the subjects and their level of intelligence (IQ). IQ
is a measure of general intelligence that can be measured with different tests and that,
despite all its caveats, is widely used in a variety of places. In general, an average IQ has a
value of 100, and the standard deviation is usually normalized to 15. Assuming that
intelligence follows the normal distribution, then 115 is high, 130 is very high, 145 is
extremely high, and 187 is Sheldon.
One of the arguments used by some of the defendants in their defense was simply “playing
dumb” by saying that they did not know, that they did not suspect, that they had no
decision-making capacity, etc. One way to answer those arguments would be to study those
intelligence tests, but that raises some issues, and therefore not all answers in the list below
are equally correct, so do your analysis of the data in IQ_Nuremberg.sav and choose the
alternative or alternatives that seem most correct to you.
a. The mean intelligence of the subjects is 128 and a significance test with H0:μ<=100
would lead to rejecting the null hypothesis that they had low intelligence. However,
these results are based on considering that these data are a random sample drawn
from a population of Nazis with characteristics similar to those captured, which is a
questionable assumption.
b. The sample of subjects evaluated has a normal intelligence since there are several
subjects who have IQs lower than 115.
c. The mean of the sample is 128, but since there is a very low extreme value, we
should not trust that value since it may not generalize well to the population.
d. The subjects that were captured are a biased sample that we can say included the
least intelligent, since the most intelligent were surely able to escape and set up
secret organizations that are responsible for climate change and COVID.
The number of cases in the sample is 21. There is an extreme case with intelligence of
“only” 106 but it does not seem that this should affect the average too much. This can be
seen in the box plot below.
First of all, the sample mean is 128 and therefore it seems that the IQ they showed is quite
high. Considering them as “fools” does not seem to be right.
However, using inferential statistics in this case seems a bit complicated to justify since it is
not clear which population the sample represents. Does the sample refer to the entire
population of leading Nazis? Or only those who were initially selected to stand trial? Or since
there were many who died or escaped before, were those who were captured special in
some way? Keep in mind that all these considerations are very relevant when it comes to
appreciating research that justifies psychological theories: many times, the subjects to which
researchers or therapists have access are very limited (for example, patients with very
specific problems, such as Freud treating only aristocratic upper-class people) and therefore
their conclusions should be assessed within that context. For this reason, although studying
the type of Nazis who were tried in Nuremberg is not without interest, we must be cautious
and avoid making generalizations to populations that perhaps do not exist.
Anyway, below I have put the results of a hypothesis test with the null hypothesis that the
subjects were at or below average (100). It is interesting to see that Cohen’s d is very high,
which shows that this group of subjects greatly exceeded the level of 100. I have also put 115
as the null hypothesis value to test if they were more than one standard deviation above the
mean, and the results are still significant as you will see. Finally, a test using 130 (two standard
deviations) gives non-significant results, which means that we cannot conclude that these
subjects were, on average, more than two standard deviations above the mean.
T-test table
- TRUE.
- FALSE. Just because there are some subjects below 115 does not mean that the
sample is normal as a whole.
- FALSE. The outlier value is not so large.
- FALSE. I can recommend you a good psychologist if you wish.
This article provides data on the percentage of people age 65 and older who reported having
chronic sleep problems. In the slprl question you can see similar information for Spain. I
understand that if someone answers that question “Most of the time” or “All, or almost all the
time” he or she has a chronic sleep problem. NOTE: I have combined the responses from
the two previous responses into the “Most of the time” alternative.
Check if the percentage of people over 65 years old who say in the article that they have
sleep problems coincides with the value that appears in the ESS. I hope you know how to
filter out the group of people over 65, but if you do not, let me know.
a. There are only 6 subjects who state that they have sleep problems in the sample, so
we cannot draw valid conclusions.
b. The size of the effect is small, so although the differences are significant, it must be
considered that they are not important.
c. The percentage of people who have sleep problems according to the article is 50%,
but in the sample, it is lower (22% rounded), and the differences are significant.
d. The percentage of people who have sleep problems according to the article is 50%
but in the sample, it is higher, although the differences are not significant.
As we see in the table below, the proportion of people who answered that they had sleep
problems in the sample was 22% compared to 50% in the article mentioned above. The
differences appear significant and the size of the effect is quite large (there would be more or
less half the risk in the sample compared with that indicated in the article).
Sleep was restless, how often the past week
- FALSE. This one is just filler, as I hope you have noticed.
- FALSE. The size of the effect is measured by the relative risk. To that extent, a
relative risk of 1 means no effect. There is an effect when it is clearly greater or
less than 1, and we don’t have strict criteria about when that effect is large or small (it
depends on how good or bad the thing we are talking about is).
- FALSE. The differences are significant.
8. This exercise uses some data about a race of aliens that, interestingly enough, have
many similarities to the beings that inhabit the planet Earth. So even though many of the
theories, hypotheses, and results mentioned might resemble what we have about human
beings, don’t trust your instincts as aliens may be different and therefore we need to do
statistical tests to make sure of that. Below is an example of the above.
For aliens, totamine is part of what is sometimes called the happiness quartet, mediating
feelings such as love, pleasure, and sexuality, though it may also have to do with addictions.
Low totamine levels can make aliens less likely to work for a purpose. An alien psychologist
believes that the inhabitants of the planet Terrum have totamine levels too low and that is
why their work performance is very low. To solve this problem, he has designed a treatment
based on meditation and concentration that, according to him, would increase the totamine
levels of the aliens, but before applying this method, he needs to demonstrate that the
problem actually exists, and for this he has measured the totamine of a sample of 10,000
subjects from that planet. The average totamine level of normal aliens is set to 10 mgs/dl.
Would you say that the aliens living on the planet Terrum have too low average totamine
levels from the results shown below?
The boxplot for all subjects is shown below.
The hypothesis test using the population value is below. Note that I have put all the possible
hypotheses that could be used in this case so you must pay attention and choose the
appropriate one for the problem.
a. Mean totamine is 9.89 mgs/dl. Using H0:μ>=10 we would reject the null hypothesis with
a significance value of < .001 which would mean that the mean totamine is indeed
too low. However, since we see that the sample is very large and that the difference
is actually very small (-0.11), it is appropriate to take a look at Cohen’s d to assess
this result. Since this value is very small, our conclusion should be that the difference,
although it exists, is too small to believe that it has consequences.
b. The sample of aliens tested has a mean totamine of 19.89, which is clearly over
normal totamine levels. The conclusion is that they have too
much totamine and they should be under treatment soon.
c. Mean totamine is 9.89 mgs/dl. Using H0:μ<=10 we see that we cannot reject the null
hypothesis.
d. The mean totamine of the aliens is 9.89 and the two-tailed hypothesis test gives a
p-value of < .001. Since we don’t know anything about the biology
of the aliens, it is acceptable that we do a two-tailed test and therefore that result is
the one we should use.
e. The 95% confidence interval of the mean totamine of a supposed population from which the
sample would have been drawn is [9.87,9.91]. However, considering that there is a
very high outlier and the sample is not very large, that confidence interval should not
be taken too seriously.
The mean totamine is 9.89 mgs/dl. Using H0:μ>=10 we would reject the null hypothesis with a
significance value of < .001 which would mean that the mean totamine is indeed too low.
However, since we see that the sample is very large and that the difference is actually very
small (-0.11), it is appropriate to take a look at Cohen’s d to assess this result. Since that
value is very small, our conclusion should be that the difference, although it exists, is too
small to believe that it has consequences.
- TRUE. This statement is valid. Although there is a small difference that is significant
due to the large sample size, it is necessary to look at the effect indicators when the
sample is very large.
- FALSE. The mean is not correct if you look closely at the results.
- FALSE. The sample is very large. You have to look at the size of the effect to be able
to assess the result.
- FALSE. Although this answer is valid with respect to the hypothesis test, the effect
size part needs to be assessed: When the sample is very large, you have to look at
the effect size to be able to assess the result.
- FALSE. Although there are some extreme values, with a sample of that size there is
no need to worry that they will alter the result. The t-test is robust to this type of
deviation.
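The interplay between sample size, significance, and effect size in this exercise can be sketched numerically. The only figures taken from the exercise are the difference (-0.11), the sample size (10,000) and the confidence interval [9.87, 9.91]; the standard deviation is NOT reported, so it is back-calculated here from the rounded interval, which makes it a rough assumption. The function names are invented for the example.

```python
import math


def cohens_d(mean_difference, sd):
    """Standardized effect size for a one-sample comparison."""
    return mean_difference / sd


def t_from_d(d, n):
    """One-sample t statistic implied by an effect size d and a sample size n."""
    return d * math.sqrt(n)


mean_difference = -0.11          # sample mean minus the reference value of 10
n = 10_000

# Assumption: half-width of the 95% CI is about 1.96 * sd / sqrt(n).
half_width = (9.91 - 9.87) / 2
sd_assumed = half_width * math.sqrt(n) / 1.96

d = cohens_d(mean_difference, sd_assumed)
print(f"assumed sd = {sd_assumed:.2f}, Cohen's d = {d:.2f}")

# The same tiny effect looks very different depending on the sample size.
for size in (25, 100, 10_000):
    print(f"n = {size:>6}: t = {t_from_d(d, size):.2f}")
```

With the same small d, the t statistic grows with the square root of n, which is why a difference that would go unnoticed with 25 subjects becomes highly significant with 10,000.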
9. Among the aliens, it is quite common to take a kind of cooked cereal for breakfast that
apparently increases the level of totamine in their blood. The same counselor who is hell-bent
on selling aliens a treatment for totamine has thought that those who are intolerant to that
cereal might be good customers. To find out, he has selected a sample of aliens with this
problem to see if they actually have a low totamine level.
You must remember that, for aliens, totamine is part of what is sometimes called the happiness
quartet, and is responsible for mediating feelings such as love, pleasure and sexuality, although it
may also have to do with addictions. Low totamine levels can make aliens less likely to work
toward a goal. An alien psychologist believes that the inhabitants of the planet Terrum have
totamine levels too low and therefore their work performance is very low. To solve this
problem, he has designed a treatment based on meditation and concentration that,
according to him, would increase the totamine levels of the aliens, but before applying this
method, he needs to demonstrate that the problem actually exists, and for this he has
measured the totamine of a sample of 50 subjects of that planet. The average totamine level
of normal aliens is set to 10 mgs/dl.
The box plot is beside.
The hypothesis test is set below.
T-test table
a. Mean totamine is 7.67 mgs/dl. Using H0:μ>=10 we would reject the null hypothesis
with a significance value of < .001 which would mean that the mean totamine is
indeed too low.
b. Mean totamine is 7.67 mgs/dl. Using H0:μ<=10 we see that we cannot reject the null
hypothesis.
c. The sample of aliens tested has a mean totamine of 17.67 which is clearly above
normal totamine levels. The conclusion is that they have too much totamine and they
should follow the recommended treatment.
d. The mean totamine of the aliens is 7.67 but since there is a value highlighted in the
results, that value is not well estimated and we should not carry out the analysis.
e. The 95% confidence interval of the mean totamine of a supposed population from which the
sample would have been drawn is [7.4,7.94]; this means that we are not sure that the
sample of aliens has less totamine than necessary.
The mean totamine is 7.67 mgs/dl. Using H0:μ>=10 we would reject the null hypothesis with a
significance value of < .001 which would mean that the mean totamine of this group of aliens
is indeed too low. The standardized effect size is also very high, which suggests that if the
only thing that differentiates this group of aliens from the rest of the aliens is the diet they
follow, they are really going to feel very sad about having such a low totamine.
- FALSE. The null hypothesis that we should reject is H0:μ>=10, not H0:μ<=10 as stated
in the alternative.
- FALSE. That average is not correct if you look closely at the results.
- FALSE. With 50 cases, the t-test is sufficiently robust if there are any extreme values,
which in this case may or may not be true.
- FALSE. The confidence interval for the mean does not include 10 so we are
confident that the subjects in the sample have less totamine than 10 overall.
10. In the GSS93sp.sav file there is a variable called zodiac that collects the zodiac sign of
the participants. Although my ignorance about astrology is total and my lack of interest in it is
so great that I haven’t even bothered to Google the subject, I believe that the distribution of
the signs of the zodiac is homogeneous so that in any random sample of people there
should be a similar percentage of people of the same zodiac sign. To check it, calculate a
binomial test using the corresponding test value.
a. The test value to use is 1/12=0.08. Using this value, it can be observed that there are
two signs of the zodiac that occur in a percentage significantly higher than expected
(Pisces and Leo) and another that occurs less than expected (Taurus).
b. The test value to use is 1/12=0.08. Using this value, it can be observed that there are
two signs of the zodiac that occur in a percentage significantly higher than expected
(Pisces and Leo) and another that occurs less than expected (Taurus), but I have the
feeling that this answer is not the correct one, although I don’t know why.
c. Looking at the size of the effect, it is straightforward to see that there are five signs
that occur in a percentage significantly higher than expected, and seven that occur
less.
Indeed, this question is tricky since it introduces a new concept that I have not presented in
class but that we will see on other occasions. When we make many comparisons with an
error level of 5% (it is called the error level because, even if there were no differences
in the population, we would still erroneously reject the null hypothesis 5% of the time),
the probability of making at least one such error goes up, and consequently the overall error
level becomes higher than originally set.
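The growth of the error level described above can be sketched numerically. This is only an illustration: it assumes the comparisons are independent, which is a simplification (the twelve zodiac proportions are not strictly independent), and the function name `familywise_error` is made up for this example.

```python
# Probability of at least one false rejection across m tests
# when every null hypothesis is true.
# Assumption: the m tests are independent (a simplification).

def familywise_error(alpha: float, m: int) -> float:
    """Return 1 - (1 - alpha)^m, the chance of one or more false positives."""
    return 1.0 - (1.0 - alpha) ** m


alpha = 0.05
for m in (1, 2, 12):
    # For each number of comparisons, show the overall error level.
    print(m, round(familywise_error(alpha, m), 3))
```

Under that assumption, twelve comparisons at the 5% level give a chance of roughly 46% of rejecting at least one true null hypothesis.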
That effect is clearly seen in the table below. Despite the fact that in principle it is expected
that about 8% of the sample will be of a given sign, and, therefore there should be 124
people with each sign out of the total of 1500, we can observe that there are some signs
under/over that number. These variations are unsurprising since a sample will always
present variations on the ideal. Besides, some of the numbers exceed what is expected so
much that the differences appear as statistically significant.
One solution to this problem is to adjust the significance level based on the number of
comparisons (categories in this case). When the categories are few, this adjustment is not
very important, but, in this case, in which we have twelve signs, making this adjustment is
quite effective. There are several methods, but in this case I have used Holm’s method, which
is in the p.adj (adjusted p value) column. As you will see, none of the comparisons is
significant in that column and, in fact, many of them give a value close to the maximum of
p = 1.
Zodiac signs
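As a rough sketch of how an adjustment of the Holm type works, the snippet below implements the usual step-down rule: sort the p values, multiply the smallest by the number of comparisons, the next by one less, and so on, never letting an adjusted value fall below the previous one or exceed 1. The p values used here are invented for illustration and are not the ones from the zodiac table, and `holm_adjust` is a hypothetical helper name.

```python
# Holm step-down adjustment of a list of p values (illustrative sketch).

def holm_adjust(p_values):
    """Return Holm-adjusted p values in the same order as the input."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical unadjusted p values for four comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 3) for p in holm_adjust(raw)])
```

With twelve comparisons, the smallest unadjusted p value is multiplied by 12, so it would have to be below about 0.004 to stay significant at the 5% level.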
- FALSE. This alternative is not correct but it is true that until now I have not taught
how to do this analysis correctly.
- TRUE. This statement is valid. See the solution for explanation
- FALSE. Effect size is not used to test whether a difference is significant.
Test 7. T-tests for independent groups
1. One study compared the cholesterol levels of two groups of people living in different parts
of a Central American country. One group lived in a rural setting and the other lived in the
city. The theory behind this comparison is that those who eat a more “natural” diet will tend
to have better health and consequently lower cholesterol levels. The data is on the website
in the file Cholesterol.sav. Note that the variable cholesterol appears as the logarithm of the
cholesterol value (using logarithms is a way to reduce the asymmetry of a variable that we
will not see in detail in class but that will occasionally appear in some examples). Apply the
appropriate statistical technique to compare the two groups and indicate whether the initial
hypothesis is true.
a. There are several outliers in the data so the differences, while significant, should not
be taken seriously.
b. The group that comes from the urban environment has higher cholesterol than the
rural group, also the effect size is quite large.
c. The average cholesterol of the two groups is similar, so the differences are not
significant.
d. The group that comes from the urban environment has higher cholesterol than the
rural group but the effect is very small since the difference is one-third of a point.
The two groups have similar variance and are of the same size as we see in the graphs. The
theory indicates that the cholesterol levels in the urban world will be higher than in the rural
world, and, indeed, the averages indicate that this is how it happens. The differences
between the means are significant and, furthermore, the effect size is very large. No doubt
the urban group would see some benefits from a changed diet.
T-test table
Note that the unstandardized effect size (that is, the difference between the means) may
seem small because it is shown in logarithms, but it is actually larger than it seems. If you do
the reverse transformation, 5.36 becomes 212.7249464 and 5.05 becomes 156.0224645.
The difference is 56.702482. Keep in mind that above 200 mg/dl doctors advise taking
medicines, so in the city there would be quite a few that would get this recommendation, as
you can see in the plot below.
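The reverse transformation can be checked with a few lines of code. The sketch assumes natural logarithms (which is what reproduces the figures quoted above) and assigns the larger mean to the urban group, as the solution states. Note that the exponential of a mean of logarithms is a geometric mean, so the back-transformed values are approximations to the typical cholesterol level rather than arithmetic means.

```python
import math

# Group means on the log scale, taken from the solution text.
mean_log_urban = 5.36
mean_log_rural = 5.05

# Undo the natural logarithm with the exponential function.
urban = math.exp(mean_log_urban)
rural = math.exp(mean_log_rural)

print(round(urban, 2), round(rural, 2), round(urban - rural, 2))
```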
- FALSE. I don’t see many outliers and besides the t-test is robust with sample sizes
like the ones shown in the study.
- RIGHT. This statement is valid. See the solution for an explanation.
- FALSE. The differences are clearly significant. A look at the graph should have
shown this to you. Do not forget that cholesterol is measured on a logarithmic scale,
which leads to small values.
- FALSE. The difference may seem small because of the logarithms, but if you convert
the values back to the original scale you will see that it is not (dif = 56.702482).
2. Taking Ginkgo has been associated with cognitive improvements and is sold as a
“traditional” medicine (the quotes are because I refuse to believe that something that comes
out of a bottle can be called “traditional”). In this experiment it was tested with a randomized
clinical trial if this drug really had an effect.
The Memory variable is the difference in the recall of a series of elements before and after
following the treatment, so both positive and negative values can appear.
The data is in the ginkgo.sav file.
Answer which alternatives below are correct (there is more than one).
a. The correct test in this case should be μ0≤0.
b. The effect size according to Cohen’s d was 0.22.
c. The placebo group scored higher in memory although the differences are not significant.
d. A mean’s difference test is not appropriate in this case because there are many outliers.
If you are thinking that Ginkgo could help you remember better, according to this study, you
can forget it: the differences are not significant and the effect is small (and leaning to the
placebo group).
T-test table
- FALSE. No. In this case the correct null hypothesis is H0:μ1−μ2=0, which is the one
that appears in the data table.
- FALSE. There are no outliers as far as I can see.
3. A series of tests and psychological tests were carried out in a hypothetical school to
improve the way in which certain topics are approached in the subject of sports. Specifically,
there was interest in physical performance and its perception by students. Four variables
were collected in a sample of 100 12-year-old boys and girls.
BMI: Body Mass Index, a body mass indicator about which there is a lot of information on the
internet along with how to calculate it.
Body satisfaction: The results of a test in which questions about that topic are asked and
then added up.
Self-assessment of resistance: A questionnaire in which questions are asked about how
capable each one feels of enduring in effort and then added up.
PACER: A physical test that consists of going around a circuit for a given time. The
measurement is the number of turns.
The Body Satisfaction data file (BodysatisfactionEndurance.sav) has data from 100 boys and
girls in the variables BMI (body mass), body satisfaction (satisfaction), self-assessment of
resistance (Resistance) and number of laps in a test of resistance (PACER) collected in a
school. In this case, the interest is to find out if there are differences between genders to
guide the teaching of the subjects.
a. There are significant differences between genders in PACER and Resistencia.
b. There are significant differences in body mass and body satisfaction between the two
genders.
c. Men have a higher BMI than women.
d. The highest difference between genders is the one found in the body satisfaction
variable.
It is convenient that you look at the statistical graphs before proceeding with the tests. There
are some outliers and lack of homogeneity of variance, but in general it is not excessive.
The two variables in which the differences are significant are PACER, with a fairly high
standardized effect, and Resistencia, with a somewhat lower effect. In both cases, males
have higher scores. In the other two variables the differences are not significant.
T-test table
- TRUE. This statement is valid. See the solution for an explanation.

- FALSE. The body mass variable takes into account the height of the subjects, so the
body mass between genders does not have to be different (and it is not). There are
no significant differences in body satisfaction either.
- FALSE. This is not the case in these data and, in principle, since body mass is a
measure that takes into account the height of people, you should not take for granted
to find such differences.
- FALSE. There are no significant differences in this variable
4. Congratulations, you have been hired as the personnel manager of a famous telephone
company and since the directors are interested in creating a system of incentives for workers
linked to productivity, they have decided to entrust you to design how these bonuses will be
distributed. In principle, the two fundamental criteria in which the company is interested are
the satisfaction of the customers with the service and the length of the calls. However, when
the proposal has been brought to the union representatives, they have found it inappropriate,
since it does not take into account the different working conditions, seniority, gender, the type
of call that workers receive, their qualification, if shifts vary, etc. In addition, they say that
using satisfaction as answered by those who have received the call is not a good indicator
and they demand that a more in-depth evaluation be performed based on the judgment of
experts assessing how well the phone calls are managed by the worker.
To study whether the unions are right, you decide to randomly sample 110 calls and evaluate
them anonymously by company experts to judge their quality (on a scale of 1 to 7).
Satisfaction with the call is collected by asking customers to provide a score of 1 to 10 when
ending the call. The duration of the call is recorded automatically (in minutes). The gender of
the client is deduced from his/her voice and the rest of the variables are information about
the workers you can obtain easily.
Your task is to decide which factors of those indicated by the unions affect the satisfaction
ratings, the quality scores, and the time of the call. From your statistical analysis, indicate if
there are arguments that support the thesis of the unions by demonstrating it with the
corresponding statistical results. The data is in the SatProducti.sav file. You have a link to
this data in section 9.3 of the course.
Analyze the effect of being a permanent or temporary worker in this case. Answer the
questions below after doing the analyses.
a. Temporary workers are more attentive and get better quality according to experts and
better satisfaction ratings from customers.
b. The results show that the resolution time and satisfaction according to the customer
ratings is greater in temporary workers than in permanent ones, but not in terms of
quality.
c. The satisfaction variable is highly asymmetric so no conclusions can be drawn from
the hypothesis tests.
d. The permanent workers are slower than the temporary workers but they get better
satisfaction ratings than those. The experts do not find that any group has better
quality in the solutions they provide.
There are significant differences in time and satisfaction, but not in quality. The standardized
effect size is also quite large for the two variables with significant differences between
groups. The temporaries are faster but customers are less satisfied with them. The quality
according to the experts does not present differences between groups.
T-test table
- FALSE. Nope.
- FALSE. Nope.
- FALSE. There is some asymmetry in some variables but it is not a big deal.
5. To study whether the unions are right, you decide to randomly sample 110 calls and evaluate
them anonymously by company experts to judge their quality (on a scale of 1 to 7).
Satisfaction with the call is collected by asking customers to provide a score of 1 to 10 when
ending the call. The duration of the call is recorded automatically (in minutes). The gender of
the client is deduced from his/her voice and the rest of the variables are information about
the workers you can obtain easily.
Your task is to decide which factors of those indicated by the unions affect the satisfaction
ratings, the quality scores, and the time of the call. From your statistical analysis, indicate if
there are arguments that support the thesis of the unions by demonstrating it with the
corresponding statistical results. The data is in the SatProducti.sav file. You have a link to
this data in section 9.3 of the course.
Analyze the effect of gender of the workers in this case. Answer the questions below after
doing the analyses.
a. The results show that the resolution time and satisfaction according to the clients’
score is different between men and women. Women are valued better than men.
b. The results show that the quality of care assessed by experts is different between
men and women. Women are valued better than men.
c. The results show that there are no differences between men and women in any of
the three criteria.
d. The results show that there is a relationship between the quality evaluated by experts
and the satisfaction with the customer calls.
There are significant differences in quality but not in the other variables. The standardized
effect size is also quite large.
T-test table
- FALSE. Nope.
- TRUE. This statement is valid. See the solution for explanation.
- FALSE. Nope.
- FALSE. This alternative has nothing to do with the question so even if it is true, which
I have not checked yet, it is beside the point
6. To study whether the unions are right, you decide to randomly sample 110 calls and evaluate
them anonymously by company experts to judge their quality (on a scale of 1 to 7).
Satisfaction with the call is collected by asking customers to provide a score of 1 to 10 when
ending the call. The duration of the call is recorded automatically (in minutes). The gender of
the client is deduced from his/her voice and the rest of the variables are information about
the workers you can obtain easily.
Your task is to decide which factors of those indicated by the unions affect the satisfaction
ratings, the quality scores, and the time of the call. From your statistical analysis, indicate if
there are arguments that support the thesis of the unions by demonstrating it with the
corresponding statistical results. The data is in the SatProducti.sav file. You have a link to
this data in section 9.3 of the course.
Analyze the effect of having a university diploma (Licenciado2) in this case. Answer the
questions below after doing the analyses.
a. The results show that quality and satisfaction as rated by customers are not different
between those with a degree and those without.
b. The satisfaction produced by the graduates is greater than that of the non-graduates
and the size of the effect is quite good. The quality is also better and the
standardized effect is also quite good. There are no significant differences in
resolution time with a two-sided test.
c. The results show that the unions were wrong as qualification does not lead to
better work.
d. The results show that the graduates are better than the non-graduates in general and
that the effect is robust in all cases.
There are significant differences in Satisfaccion and Calidad, but not in TiempoResolution,
although the result is close to significance. Furthermore, the effect size is moderately high in
the cases where the differences are significant. In general, it seems that owning a diploma is
associated with better performance at this job.
T-test table
- FALSE. Nope.
- TRUE. This statement is correct. Look at the solution for the explanation
- FALSE. This alternative has nothing to do with the question so even if it is true, which
has not been verified at the moment, it is beside the point.
- FALSE. In general they are quite good but not faster (which is an important point for
the company, of course)
7. One of the oldest psychological theories in history is the one that relates physical aspects
of people’s heads to their character, intelligence, or whatever. At the end of the 19th century,
phrenology became quite popular and more recently neuroscience has applied that same
idea by taking advantage of new technologies.
The following data test a specific hypothesis about the relationship between brain and
intelligence. The description of the experiment can be found here and, in summary, allows us
to test the hypothesis of whether being big headed (pun intended) is related to intelligence.
The data is in the file Brain size (there are two others very similar in name, the one I am
referring to is almost at the end of the entire list in section 15).
Unfortunately, as in this case what I need is a categorical variable with two categories to use
as an example, the hypothesis that we are going to test is whether men and women differ in
some of the measures that are in these data and which are:
Intel: A measure of general intelligence based on the Wechsler IQ test.
IntVerbal: A measure of verbal intelligence based on a subscale of the Wechsler IQ test.
PesoCorporal (BodyWeight): A measure of weight based on putting a person on a scale (in
pounds, not kilos; it is not that everyone in the study is overweight).
TamaCerebroResMagnet: A measure of brain size based on the number of pixels in the image.
AlturaPulgadas (HeightInches): The height in inches.
a. The results show that the height of the people is not related to their intelligence.
b. The results show that the brain size of the people is not related to their intelligence.
c. There are significant differences in intelligence and verbal intelligence.
d. There are no significant differences in intelligence and verbal intelligence.
The results show that there are no significant differences between genders in the variables
of intelligence and verbal intelligence, but there are in weight, height and brain size.
T-test table
Test 8. Anova for independent variables
1. An area of research and application in Psychology is the one related to perception through
our senses. Perception requires mental processing and many psychological disorders that
affect mental capacity alter perception. Likewise, having problems with perception and
sensation can have psychological effects of many kinds. As an example, it seems that
people’s personality can influence their perception of whether a hearing aid works well
(Personality, Hearing Problems, and Amplification Characteris... : Ear and Hearing
(lww.com))
Technological aids can be a lifesaver for many, but tuning them up is often not as easy as
desired.
Hearing-enhancing devices must be individually fitted. One way to check that a device is
working well for a patient is to have them listen to a recording with 25 words spoken clearly
but loudly. However, there are words that are easier to recognize than others, so it is
important that the lists have the same difficulty. Another problem is that hearing aids amplify
both the correct sound and background noise. Four lists were tested to see if they were
equally difficult to recognize when there is background noise. As having heard a list before
makes it easier to recognize the words, any time that adjustments are made to the
apparatus, the list must be changed to do the tests. In the experiment, 96 subjects with
normal hearing listened to lists of words in English to verify that they were of the same
difficulty. Each group of 24 subjects heard a different list.
The data is in the file oirentre.sav (Section 9.4 Escuchar)
Analyze the data and indicate what you would do with the words’ lists (there is more than
one possible solution).
a. Only use lists 1 and 4.

b. Make two groups of lists (1 with 2 on one side, and 3 and 4 on the other).
c. Use only list number 2.
d. Do not use list 1.
e. Make two groups of lists (1 with 4 on one side, and 2 and 3 on the other).
The independent variable is the type of list, and the number of words recognized, a numeric
variable, is the dependent variable. The appropriate technique to see if there are differences
in recognition between lists is the analysis of variance.
The first step is to make a box plot to see if there are outliers, equality of variances,
asymmetry, etc.
The plot shows that list 1 seems to be easier than the other three since, on average, the
subjects who listened to it recognized more words. The second list is better than the third
and fourth, and the last two seem to be very similar.
It is interesting to see the descriptive statistics of the data. We see that the means follow the
order shown in the boxplots. The main part, the boxes, display similar variability, and both
the standard deviation and the standard error in the descriptives table below confirm this
impression. The means of lists 3 and 4 are practically the same, but the mean of list 2 is
somewhat higher and the mean of list 1 is quite different from the others, with a difference of
7 more words recognized on average than in lists 3 and 4.
The plot shows that the differences are significant, and also what the distributions of F and
η² would be if the null hypothesis were true. The box plots are similar to the ones we have
seen before, but we also see a representation of the variability of the mean of all the data
(the red almond) and the position of the means of the lists. This plot illustrates that the lists
are further apart than might be expected by chance, but it is best to refer to the value in the
analysis of variance table shown below to confirm it. The distributions of F and η² tell us the
same thing: the observed values of F and η² are greater than 95% of the values of F or η² that
we could find at random if the differences between the groups were zero. This means that
there are indeed significant differences between the groups.
However, we still need to check the results in the tables and we must also determine
between which groups the differences occur, and although it can be seen that both 1 and 2
could be different from the other two, we need to confirm such observation with hypothesis
tests.
The analysis of variance confirms that there are differences between the groups. This
appears both in the F test, which does not control for non-homogeneity of variance, and in
the Welch test, which corrects for non-homogeneity of variance. In this case, we can see
that both results are very similar.
The value of η² is 0.14. That means that, although there are differences, the effect size is
not very large.
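To make the meaning of F and η² more concrete, here is a minimal sketch of how both can be computed for independent groups. It uses the standard definitions (η² is the between-groups sum of squares divided by the total sum of squares) and a tiny invented data set, not the data in oirentre.sav. The function name `one_way_anova` is hypothetical, and this is the classical F, not the Welch version.

```python
# One-way ANOVA for independent groups: F statistic and eta squared.

def one_way_anova(groups):
    """Return (F, eta_squared) for a list of lists of scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of the scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Invented scores for three small groups (illustration only).
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, eta_squared = one_way_anova(scores)
print(round(f_stat, 2), round(eta_squared, 4))
```

Read this way, a value of η² = 0.14 means that 14% of the total variability in the scores is associated with the groups.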
Which group is different from the others is determined using the pairwise tests. We can
display the result on the boxplot to see it more clearly. The table shows that list 1 is different
from 3 and 4, but not from 2. List 2 is not different from any of the lists since it is in an
intermediate position. The boxplot shows only the differences that are significant.
IN SUMMARY, we could do two things: keep lists 2, 3, and 4 since there are no significant
differences between them, or, group 1 and 2 on the one hand since there are no differences
between them, and group 3 and 4 on the other.
OR ALSO, most likely in this case, we might exchange words from one list to another
seeking to make them more similar, although for this we would need a theory about which
words are the most difficult to understand (we would have to review scientific knowledge
about what makes some words more difficult than others).
- FALSE: Look up the solution for the explanation.

- TRUE. This option is valid.
- FALSE: Look up the solution for the explanation.
- TRUE. This option is valid.
- FALSE: Look up the solution for the explanation.
2. Pain is a very complicated symptom because although connected with physiological

causes, it often interacts with psychological characteristics. Furthermore, the only way to
assess pain is through subjective scales based on each person’s experience of pain.
One study tested the effectiveness of three drugs (medication) in reducing headache pain
(labeled A, B, and C). High scores indicated greater pain. Twenty-seven patients randomly
assigned to each of the drugs were used. Subjects had to take the drug at the next migraine
attack and indicate their pain level 30 minutes later, from 1 to 10 where 1 is no pain and 10 is
extreme pain.
Drug A is the cheapest of all. The second in price is the B, and the most expensive is the C.
What medication would you recommend that doctors in your autonomous community
prescribe taking into account the results that can be seen below?
The data is in the file DrogaPain.sav (Section 9.4 Pain and pills)
a. Medications are useless to remove the pain because they do not go to the root of the
problem, and if you endure it, in the end it goes away by itself.
b. The A.
c. The C in first place and the B in second place as long as the money lasts.
d. The B in first place and the C in second place as long as the money lasts.
e. It is best to take one pill of each because this method never fails.
Looking at the box plots, drug A seems to work best. Drug B has an outlier. It could be
interesting to repeat the analysis without that subject to see what happens (I have done it but
I have not shown it, I leave it as an individual exercise to verify that the results are still
similar).
The descriptives confirm that drug A leads to lower pain values.
The analysis of variance confirms that the differences are significant. The effect size
is large. It seems that the drugs really do have different effects.
The pairwise tests show that drug A produces lower levels of pain than the other two, and
that these differences are significant. The differences in the pain scale are two points
between drug A and the other two, which seems to be a fairly important difference.
- FALSE: Very brave indeed.

- RIGHT. This is a valid possibility.
- FALSE: Drug A is better and cheaper, why would you recommend this?
- FALSE: Drug A is better and cheaper, why would you recommend this?
- FALSE: Medicines often have side effects that have usually been studied to
determine if they are serious. Drug combinations tend to be less well studied so that
strategy is very risky.
3. The use of a new method of teaching mathematics in fourth grade is proposed in a school
district. There were fifteen schools and you were interested in checking if the schools have
different or similar levels before applying the teaching method. To verify this, a test was
applied to 120 students (8 randomly selected students per school). Check if the schools
have a similar level in that test.
The data is in the file Escuelas.sav (Section 9.4 Más Escuelas)
a. There are no significant differences in performance between the different schools.

b. There is a school that has the highest average grade.
c. There is a school that has the lowest average grade.
d. The smallest schools have the highest grades.
e. The smallest schools have the lowest grades.
There are some outliers in the boxplots. In addition, there are schools that have less
variability than others, so in this case it is especially interesting to use the formulas that
correct for the lack of equality of variances.
The descriptives confirm the box-plots’ information.
The analysis of variance shows that the differences between the schools are not significant.
Interestingly, the p-value calculated using Fisher’s method is different from Welch’s, but it
does not change the conclusions.
Normally it would not be necessary to carry out pairwise tests when the analysis of variance
has given non-significant results, but I include them so that you can see that none of the
comparisons is significant (the table is very long).
- RIGHT. Bingo!
- FALSE: What a surprise, right? There is always one who is the best.
- FALSE: What a surprise, right? There is always one who is the worst.
- FALSE: That’s what they told Gates and they spent a lot of money for nothing.
- FALSE: I don’t know where you get that from.
+ GAMES HOWELL TEST

● comparison
○ A
○ B
● cases
○ n1
○ n2
● dif
○ x1-x2
● 95% CI
○ lower
○ upper
● Test
○ t
○ df
○ pTukey
4. In a study conducted at a university in the United States (Bauman and Jones, Purdue
University, cited in Moore and McCabe 1989), the effect of three study methods on improving
reading comprehension in children was studied. Participants were assessed on two
measures of reading comprehension before being trained on the methods (Pre1 and Pre2)
and three measures after being trained (Post1, Post2, Post3). In this case, we want to check
if the subjects in the groups that were going to be trained in each of the methods were
similar to each other before being trained with them, in order to be more certain of the effects
of the methods.
The three methods that were tested were called the Basal (Control group), the DRTA, and
the Strat. I have no idea what they consist of, but we will see which one works best
assuming that higher scores on the Pre1 and Pre2 measures are better than lower scores.
The exercise now is to test if Pre1 and Pre2 are similar in the three groups or not.
The data is in the file ComprLectora.sav (Section 9.4).
a. There are no significant differences between the groups.

b. There are differences in the Pre1 measurement but not in Pre2.
c. All groups are different from each other.
d. Subjects in the DRTA group are better than the other two.
e. Subjects in the Strat group are better than the other two.
There are some outliers in the boxplots. In addition, there are schools that have less
variability than others, so in this case it is especially interesting to use the formulas that
correct for the lack of equality of variances.
The descriptives confirm the box-plots’ information.
The analysis of variance shows that the differences between the schools are not significant.
Interestingly, the p-value calculated using Fisher’s method is different from Welch’s, but it
does not change the conclusions.
Normally it would not be necessary to carry out pairwise tests when the analysis of variance
has given non-significant results, but I include them so that you can see that none of the
comparisons is significant (the table is very long).
- RIGHT. Bingo!
- FALSE: What a surprise, right? There is always one who is the best.
- FALSE: What a surprise, right? There is always one who is the worst.
- FALSE: That’s what they told Gates and they spent a lot of money for nothing.
- FALSE: I don’t know where you get that from.
5. There are three things in life…the song says…health, money and love. A popular theory
about happiness is that having these three aspects of life well covered is the best path to
happiness (and it is probably not far wrong).
This theory can be tested in Spain using data from the European Social Survey archive
(ESSEspanya.sav). In this survey there is a variable called “happy” that holds the answer to
the question “How happy are you?” on a scale of 0 to 10. Although it is obviously an
ordinal variable, it is acceptable to analyze it as a numerical variable.
In this case I will analyze the variable marsts (Legal marital status) as an indicator of “love”,
or at least, as an indicator of sentimental relationship. There are five states in the database
and the hypothesis that could be tested is that those who are in a romantic relationship will
be happier than those whose romantic relationship has ended or failed. As for those who have
never been in any relationship…I don’t have a clear hypothesis drawn from my “theory”.
The data is in the file ESSEspanya.sav (Section 15 European Social Survey Spain).
a. It doesn’t matter if you’re married or single if you know how to manage yourself.
b. Singles are happier than married people.
c. Divorced people are the happiest.
d. Singles are happier than divorcees and widowers.
e. Being separated or divorced is worse than being married
The box plots show that the groups are quite similar. Singles and married people seem to be a
little above the others, although the differences do not stand out much.
The descriptives confirm that the means are quite similar but we need to confirm it with the
significance tests. As a curiosity, there are very few legally married or separated in the
sample and many never married.
The analysis of variance shows that there are significant differences, although the
standardized effect size is very small. Since there are many cases in the survey it is almost
inevitable that the value of p will be significant so we will be better off paying attention to the
value of η² rather than to the value of p.
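The point about sample size can be checked with a small sketch in plain Python. The numbers are invented (they are not the ESSEspanya.sav data) and the function name is hypothetical. Each score is simply repeated 200 times, so the pattern of the data stays the same while the number of cases grows.

```python
from statistics import mean


def anova_summary(groups):
    """Return (F, eta squared) for a one-way design with independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((x - grand) ** 2 for g in groups for x in g)
    ss_within = ss_total - ss_between
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f, ss_between / ss_total


# Two invented groups whose means differ by one point on a 0-10 scale.
small = [[4, 6, 8, 10], [5, 7, 9, 11]]
# Same scores repeated 200 times each: 1600 cases instead of 8.
large = [g * 200 for g in small]

f_small, eta_small = anova_summary(small)
f_large, eta_large = anova_summary(large)
print(f"n = 8:    F = {f_small:.2f}, eta squared = {eta_small:.3f}")
print(f"n = 1600: F = {f_large:.2f}, eta squared = {eta_large:.3f}")
```

F, and with it the p-value, changes dramatically with the number of cases, while η² describes the same small share of explained variance in both runs.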
Pairwise comparisons show that the only significant differences are between those who are
divorced and those who are single, and between those who are widowed and those who are
single. Obviously, in the latter case there is an effect of age that should be taken into
account: for example, these analyses could be done by age group to compare the effect of
widowhood or other variables with people of a similar age who are still married. Similarly,
being single may be associated with happiness at certain ages and not at others.
Actually, as you can imagine, happiness is a somewhat more complicated matter than the
song suggests and therefore, in order to make your report on the subject, my suggestion is
that you document yourself in a more serious way than I have done in this case.
- FALSE: Oh yeah, whatever.

- FALSE: See the solution for the explanation.
- RIGHT. In principle it is correct but see the solution for the explanation why this result
must be qualified.
- FALSE: See the solution for the explanation.
Test 9. T-test for paired samples
1. Nicotine goes to nicotinic receptors in the brain, increasing the release of numerous
neurotransmitters. Cotinine is a product of the body’s transformation of nicotine that
remains in the body for a long time, so it is used to measure exposure to nicotine, since
nicotine itself disappears more quickly from the body (that is why nicotine tests are usually
actually cotinine tests).
Nicotine may play a role in certain mental illnesses. For example, people with schizophrenia
have much higher rates of tobacco use than the general population, and although the causes
of this are difficult to discern, there is some evidence of underlying biological factors.
The amount of nicotine that a smoker metabolizes can depend on their genetics and also
their diet. One factor that can affect it is menthol products, which are added to certain
cigarettes. There are suspicions that these products can increase tobacco consumption, or
make it more addictive, so it is interesting to study their influence on the body.
In this study the effect of a mentholated drink on the amount of nicotine, cotinine, and their
ratio (nicotine/cotinine) in smokers was investigated. Subjects spent a week with three mint
drinks a day and then a week without any menthol (well, it’s a bit more complicated but you
can read the details in the article if you wish). The urine samples were analyzed and there
was interest in comparing whether there was a difference in the amount of nicotine, cotinine,
and the Nicotine/Cotinine ratio of the subjects.
The data is in the file Menta.sav (Section 9.4).
a. Mint drinks lead to more nicotine, less cotinine, and a higher Nicotine/Cotinine ratio
than drinks without menthol.
b. As the samples are very large, you have to look at the size of the effect, which in this
case is negative, and therefore taking mint drinks does not affect the nicotine tests at
all.
c. Mint drinks lead to less nicotine, more cotinine, and a lower Nicotine/Cotinine ratio
than drinks without menthol.
d. Mint drinks do not affect the metabolism of nicotine.
e. Chewing gum is the best way to hide that you have smoked.
It is interesting that you look at how the data is organized. For analysis of paired, dependent,
or related measures (these names are equivalent), the data is more often than not organized
in columns as shown below.
To do the analysis, the first thing is to make a box plot to see if there are strange values,
equality of variances, asymmetry, etc. Note that each variable is plotted next to its pair since
we have three sets of variables.
The output below shows the descriptives for all the variables and the tests between the pairs
of variables. All comparisons are significant.
The cotinine level is higher in the no-mint condition. Since the body converts nicotine into
cotinine, this would mean that nicotine is metabolized more slowly when mint drinks have been
taken than when they have not.
The level of nicotine is lower when you do not take mint drinks for the same reason that
cotinine is higher: without mint drinks, nicotine is metabolized faster.
The ratio between nicotine and cotinine is where the differences are largest (we see it in the
effect size which is 1.21, the largest of all). There is more nicotine relative to cotinine with
mint drinks than without them.
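To make the paired comparison concrete, here is a minimal sketch in plain Python of how a paired t statistic and a standardized effect size can be obtained from data laid out in two columns. The values are invented (they are not the Menta.sav data), the function name is hypothetical, and the effect size is assumed to be Cohen's d computed on the differences, which may not be the exact definition used in the course output.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-samples t test statistic. Returns (t, df, Cohen's d of the differences)."""
    if len(first) != len(second):
        raise ValueError("Paired samples need one value per subject in each column")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d = mean(diffs) / stdev(diffs)  # standardized mean difference
    t = d * sqrt(n)                 # same as mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1, d


# Invented Nicotine/Cotinine ratios: one row per smoker, one column per condition.
ratio_mint = [0.42, 0.51, 0.38, 0.47, 0.55, 0.40, 0.49, 0.44]
ratio_no_mint = [0.35, 0.40, 0.36, 0.39, 0.41, 0.37, 0.45, 0.33]

t, df, d = paired_t(ratio_mint, ratio_no_mint)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Because each subject acts as their own control, only the column of differences enters the calculation, which is why the data must be organized with one column per condition.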
- RIGHT. This is correct. See the solution for the explanation.

- FALSE: This is so far astray that I don’t understand how you still answered it.
- FALSE. It is just the other way around but since I have put words in bold, you may
have been fooled into believing that it was the correct one.
- FALSE: I have no idea but in any case, you’d be better off not smoking.
2. Anorexia is a mental illness with a very high mortality rate.
There are several types of therapies for anorexia, among which we will focus in this exercise
on family therapy.
In a study by Professor Brian Everitt* and described in Hand, DJ et al. (p. 229)** the
weights in kilograms of a group of young women who received three types of treatment for
anorexia were analyzed. Unfortunately, there is not much information about where these data
come from or the conditions of the study, apart from what has been said, but Professor B.
Everitt worked all his life at the Institute of Psychiatry at King’s College London and I
suppose that these data come from some study carried out in this center.
The file is in section 15 and is called Anorexia. The goal of this analysis is to compare the
weights of the patients before and after receiving family therapy.
a. The therapy had a detrimental effect on the patients.

b. It cannot be denied that the therapy had a detrimental effect on the patients.
c. The therapy did not have a beneficial effect on the patients.
d. The therapy had a beneficial effect on the patients.
The first step of the analysis is to make a box plot to check if there are strange values,
equality of variances, asymmetry, etc. In this case, it seems that the effect is important
although there are three or four patients who still have very low weight after the therapy.
A graph that is interesting to check is the so-called parallel coordinates graph. In this graph,
we link each patient with a line before and after the therapy to visualize the change.
In this plot, you can see that some of the subjects did not gain weight, but actually lost it, and
there are four particularly worrying cases that ended up with quite low weight. Although the
median weight after therapy is higher as we saw in the box plot, there are some cases where
it does not seem to work at all.
After therapy, the patients were 3.27 kilograms heavier on average, and the differences were
significant. The effect is very large so we can be satisfied with the result.
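The two views used above, the individual trajectories and the test on the mean change, can be sketched together in plain Python. The weights are invented for ten hypothetical patients (they are not the Anorexia data), and the critical value 1.833 is the tabled one-sided t for α = .05 with 9 degrees of freedom, so it applies only to this sample size.

```python
from math import sqrt
from statistics import mean, stdev

# Invented weights in kilograms: one entry per patient, before and after therapy.
before = [38.0, 37.5, 39.2, 36.8, 40.1, 37.9, 38.6, 39.0, 36.5, 38.3]
after = [41.5, 42.0, 40.0, 36.0, 44.2, 41.1, 43.0, 38.2, 40.4, 42.6]

# What the parallel coordinates graph shows: the change of each patient.
changes = [a - b for b, a in zip(before, after)]
n_lost = sum(1 for c in changes if c < 0)

# One-sided paired t test. H0: mean change <= 0. H1: mean change > 0.
n = len(changes)
t = mean(changes) / (stdev(changes) / sqrt(n))
T_CRIT_ONE_SIDED = 1.833  # tabled value for alpha = .05, df = 9 (assumes n = 10)
reject_null = t > T_CRIT_ONE_SIDED

print(f"Mean change: {mean(changes):.2f} kg, patients who lost weight: {n_lost}")
print(f"t({n - 1}) = {t:.2f}, reject H0 (one-sided): {reject_null}")
```

A significant positive mean change and a few patients who lost weight can coexist, which is why it is worth looking at the individual lines as well as the average.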

- FALSE. If you state the hypothesis in one-sided terms, the null hypothesis would be
that the therapy had no effect or a detrimental effect on the patients, and the study
hypothesis that it was positive. In this case, we reject the null hypothesis and the
difference is on the benefit side, so this statement is false.
- RIGHT. This is correct. See the solution for the explanation
on cognitive therapy.
The file is in section 15 and is called Anorexia. The goal of this analysis is to compare the
weights of the patients before and after receiving cognitive therapy
a. The therapy had a beneficial effect on the patients.

b. The therapy did not have a beneficial effect on the patients.
c. The therapy had a detrimental effect on the patients.
d. It cannot be denied that the therapy had a detrimental effect on the patients
The first step is to make a box plot to see if there are strange values, equality of variances,
asymmetry, etc. In this case, it seems that the effect is not too large since the medians are
quite similar. In addition, there is a little more variance after therapy.
In this graph, it can be seen that several patients had a fairly notable increase in weight, but
many remained more or less stable and some even lost weight.
After therapy, the patients were 1.35 kilograms heavier on average, and the differences were
significant. The effect is moderate compared to the family therapy so it seems that family
therapy worked better.
- RIGHT. This is correct. See the solution for the explanation.

- FALSE. If you state the hypothesis in one-sided terms, the null hypothesis would be
that the therapy had no effect or a detrimental effect on the patients, and the study
hypothesis that it was positive. In this case, we reject the null hypothesis and the
difference is on the benefit side, so this statement is false.
on NO therapy.
The file is in section 15 and is called Anorexia. Compare the weights of patients who did not
receive therapy (i.e., they were in the Control group) before and after therapy.
a. Not applying therapy had a detrimental effect on the patients.

b. No therapy had no effect on the patients.
c. Not applying therapy had a beneficial effect on the patients.
The first step is to make a box plot to see if there are strange values, equality of variances,
asymmetry, etc. In this case, it seems that the effect is null with similar weights in the
patients before and after.
Although the average effect is null, what we see in this graph is a large number of different
trajectories, with some subjects improving a lot and others getting much worse than they were
before. After therapy, patients were 0.2 kilograms lighter on average in the sample and the
differences
were not significant. The effect is null, so it seems that family and cognitive therapy worked
better than no therapy.

- TRUE. This is correct. See the solution for the explanation.
Test 10. Anova repeated measures
1. 12 aliens are given a list of 10 objects to memorize and are asked to repeat it in one
minute. The exercise is performed three times: in the morning, in the afternoon and then
again in the evening.
The data is in the Memory link (Memory_Recall.sav)
a. There are no significant differences between the times of the day in terms of memory.
b. There are differences, but they are only between morning and night.
c. The best time of day for intellectual work is in the morning, as everybody knows
To do the analysis, the first thing is to make a box plot to see if there are strange values,
equality of variances, asymmetry, etc.
There are several peculiar things in that graph. In the morning there is an alien that rocks,
but most of them don’t seem to work very well. In the afternoon they improve somewhat in
general but there is a lot of variability. They are definitely best in the evening.
The results of the repeated measures Anova are shown below.
The results show that the differences are significant but the effect size is not too large. In this
case it is appropriate to see the post hoc comparisons.
You can see that the only significant comparison looking at the p.adj column is between
morning and night. This can be seen in the table or, more easily, in the graph.
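For readers who want to see where the repeated measures F comes from, the following is a minimal sketch in plain Python. The recall scores are invented (they are not the Memory_Recall.sav data), the function name is hypothetical, and the effect size shown is partial η², which may not be the measure reported in the course output. No sphericity correction is applied.

```python
from statistics import mean


def rm_anova(scores):
    """One-way repeated measures ANOVA.

    scores: one row per subject, one column per condition.
    Returns (F, df_conditions, df_error, partial eta squared).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    cond_means = [mean(row[j] for row in scores) for j in range(k)]
    subj_means = [mean(row) for row in scores]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # Variability between subjects is removed from the error term.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error, ss_cond / (ss_cond + ss_error)


# Invented recall scores: columns are morning, afternoon, evening.
recall = [
    [4, 5, 7],
    [3, 6, 6],
    [5, 5, 8],
    [2, 4, 7],
    [4, 7, 6],
]
f, df1, df2, eta_p = rm_anova(recall)
print(f"F({df1}, {df2}) = {f:.2f}, partial eta squared = {eta_p:.2f}")
```

The key difference from the Anova for independent groups is the subtraction of the between-subjects sum of squares from the error, which is what makes the paired design more sensitive.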

- FALSE: For humans maybe, but remember these are aliens.
2. 10 aliens are offered a meditation course aimed at improving their reasoning abilities. This
is measured by a reasoning questionnaire that, as it is alien, we are not able to understand.
These measures are taken every two weeks until the sixth week (4 measures in total).
The data is in the Reasoning file (Reasoning_Ability.sav).
a. Meditation has no significant effect on the aliens’ reasoning.

b. There are differences, but they only occur in the sixth week and with respect to time
zero and two weeks.
c. Differences only occur between week six and week zero.
To do the analysis, the first thing is to make a box plot to see if there are strange values,
equality of variances, asymmetry, etc.
In this case, the boxplots look quite symmetrical. It seems that there is some progress in
reasoning ability but it is necessary to see the significance tests to confirm that there is
progress.
The results show that the differences are significant but the effect size is not too large. In
this case it makes sense to see the post hoc comparisons.
It can be seen that the comparisons are significant between the sixth week and both the
initial measure and week two. In principle, the treatment seems to have an effect, although only
after a month of practice (and if you are an alien).
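With four measures there are six pairwise comparisons, so the raw p-values need to be adjusted before they are read. The source does not say which correction produced the adjusted p-values, so the sketch below shows two common options, Bonferroni and Holm, in plain Python. The raw p-values and the function names are invented for illustration.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next one by m - 1, and so on.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Invented raw p-values for the six pairs of four time points.
pairs = ["0-2", "0-4", "0-6", "2-4", "2-6", "4-6"]
raw = [0.40, 0.03, 0.001, 0.20, 0.004, 0.06]

for pair, p, p_bonf, p_holm in zip(pairs, raw, bonferroni(raw), holm(raw)):
    print(f"weeks {pair}: p = {p:.3f}, Bonferroni = {p_bonf:.3f}, Holm = {p_holm:.3f}")
```

Holm is never more conservative than Bonferroni, so a comparison that survives Bonferroni also survives Holm.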

- FALSE: For humans maybe, but remember that these are aliens so who knows
3. The aliens are quite sensitive to temperature so they often complain that when the
spacecraft has the thermostat turned down too low, “it’s hard for them to think.” An alien
psychologist wants to test this problem and has asked a random sample of aliens to perform
a series of mental tests under three conditions (hot, wet, and cold). The measurements are
taken with tests of similar difficulty and the subjects go through the three conditions in
different series to reduce the effect of the order in which the exercises are performed.
The data is in the file Hot Cold Humid (HotColdHumid.sav).
a. There are differences, but they are only between the cold temperature and the others.

b. There are no significant differences between the different conditions.
c. The aliens think better when the temperature is pleasant.
Contrary to what the aliens say, cold weather seems to be the best thing for them. Humid
weather on the other hand shows a lot of variability.
The results of the repeated measures Anova are shown below.
The results show that the differences are not significant. There is no point in performing post
hoc tests.

- FALSE: We do not have a “pleasant” temperature condition so this cannot be the
correct answer.
4. The resistance of the skin of the aliens is associated with sweat and this with states of
psychological activation. That is why this measure is used in Psychology to, for example,
teach subjects to control their anxiety. This study wanted to test whether five different types
of electrodes worked the same way or produced different measurements. The data is in the
Resistencia file (Resistencia.sav).
a. There are no significant differences between electrodes 1, 3 and 5, but there are between
the others.

b. There are significant differences between the electrodes but it is not possible to determine
exactly in which ones they occur.
c. There are no significant differences between the electrodes.
To do the analysis, the first thing is to make a box plot to see if there are strange values,
equality of variances, asymmetry, etc.
The box plot shows that there are a couple of odd values. The subject that looks the weirdest
is number 15.
In the parallel coordinate diagram you can see that subject 15 stands out quite a bit in his
results with electrodes 2 and 3. There is another subject that is also a bit strange in three
electrodes (subject 2).
Below are the results of the Anova. It can be seen in the descriptives that electrodes 2 and 3
give very high measurements compared to the other 3 but still the differences are not
significant according to the usual criterion (p < .05).
The results show that the differences are not significant but it would be interesting to see if
they remain non-significant without subject number 15.
Without subject number 15 the differences are still not significant so, in principle, we cannot
say that the electrodes work differently. However, the effect size is medium, which suggests
that by increasing the sample size, the results could possibly become significant.

- TRUE. This is correct. See the solution for the explanation
5. In a study of the effectiveness of training to improve the sexual attitudes and behavior of a
group of adolescent aliens, the frequency of unprotected sex was evaluated 6 months before
the intervention, 6 months after, between 6 and 12 months, and from 12 to 18 months. The data
is in section 9.4, in Protected Sex (file ProtectedSex.sav). Analyze if there are differences
between the frequency before the treatment and the rest of the measures.
The variables that we will use are Pre (six months before), Post (six months after), FU6 (from
6 to 12 months), and FU12 (12 to 18 months).
a. There is no change in the frequency of protected sex between any of the conditions (times).

b. There are changes in the frequency of protected sex between each of the conditions (times).
c. There is no change in the frequency of protected sex between Pre and FU12.
It is interesting that you look at how the data is organized. For analysis of paired, dependent,
or related measures (all such variations are used), the data is organized in columns as follows.
To do the analysis, the first thing is to make a box plot to see if there are strange values,
equality of variances, asymmetry, etc.
It is possible to see that the four columns are quite symmetrical. There is more variability
before treatment although it seems to decrease as time goes by. In addition, it seems that the
treatment has had an effect as there is evidence of some reduction in this dangerous behavior.
However, as the data does not have a control group to compare with, we do not know if this
evolution also occurs in aliens who do not attend courses to improve their attitudes on these
issues.
The means drop as time goes by but the result of the analysis of variance is not significant.
In this case it makes no sense to look at the pairwise comparisons, and therefore there is no
support that the treatment had an effect on the frequency of safe sex (these teenage aliens…).
