
Assignment n. 5 (due November 2)
Readings
Chapter 5 - Estimation and Inference, pages 75-85, from Hand.
Chapter 2, Sections 1, 2, 3, 4, 5, 8.1 and 8.2 (only confidence intervals) from Weisberg.
Optional
Appendix A.3 Least Squares for simple regression, from W.
Appendix A.10 Maximum Likelihood Estimates, from W.

Problems
Solve problems 2.1.1, 2.1.2, and 2.4.2 from W.
Consider problem 2.1 in Weisberg and assume the true mean function is linear.
a) Provide a confidence interval for the slope under the assumption that the response
variable is conditionally normally distributed.
b) Provide a confidence interval for the slope under the assumption that the response
variable is conditionally normally distributed, using a confidence level different from the
one used in a).
c) Provide a Wald asymptotic confidence interval for the slope.
d) Write an overall comment on your findings.
e) List the assumptions needed in order to properly derive a confidence interval.
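
For parts a) to c), the mechanics can be sketched in R as follows. This is only a minimal sketch on simulated data: the variables x and y and all numbers are hypothetical stand-ins, not the data of problem 2.1. It uses confint() for the t-based intervals and builds the Wald interval by hand from the estimate, its standard error, and a normal quantile.

```r
## Simulated stand-in data (hypothetical, not the data of problem 2.1)
set.seed(1)
x <- runif(30, 0, 10)
y <- 2 + 0.5 * x + rnorm(30)

fit <- lm(y ~ x)

## t-based intervals for the slope (conditionally normal response assumed)
ci95 <- confint(fit, "x", level = 0.95)
ci99 <- confint(fit, "x", level = 0.99)

## Wald asymptotic interval: estimate +/- z * standard error
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]
wald95 <- est + c(-1, 1) * qnorm(0.975) * se

ci95
ci99
wald95
```

With a small sample the Wald interval is slightly narrower than the t-based interval at the same level, because the normal quantile is smaller than the corresponding t quantile.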
Optional
2.3.1 from W.
Prove that the LS estimate of beta1 in a simple linear regression model is given by SXY/SXX.

Assignment n. 4 (due October 26)


Readings
Chapter 1 (except Section 1.6), from Weisberg's 2005 textbook.
To understand the difference between mean and median, read the Section "Averages"
starting on page 27 from Hand.
Read about least absolute deviations and the Median regression function in Section 5 in
"Robust Regression in R", by John Fox & Sanford Weisberg, available at:
http://socserv.mcmaster.ca/jfox/Books/Companion/appendix/Appendix-Robust-Regression.pdf
Problems
1. Do problems 1.2, 1.3, 1.4 (from Weisberg 2005). To obtain the datasets, the R package
"alr3" is needed.
2. Recreate the graphic given in "homework_species.pdf" (see Assignment n.1), but now
superimpose on the plot:
a) the regression line estimated via ordinary least squares on the whole dataset
b) the regression line estimated via least absolute deviations on the whole dataset
c) the regression line estimated via ordinary least squares on the dataset without outliers
d) the regression line estimated via least absolute deviations on the dataset without outliers.
Write a couple of sentences to explain the differences, if any, among the four solutions
above.
Useful commands are:
## LAD regression
install.packages("quantreg")
library(quantreg)
lad <- rq(y ~ x)   # median (LAD) regression; tau = 0.5 is the default
plot(y ~ x)
abline(lad$coefficients[1], lad$coefficients[2], col = 2)
Assignment n. 3 (due October 20)
Readings
An introductory lecture on R, by Alfonso Iodice D'Enza
Problems
1. Recall that the lecture planned for next Monday, 19 October, has been CANCELED!
2. The revised Homework number 2 will be available on Monday in the teacher's mailbox on the
fifth floor.
3. On Tuesday 22 September 2015 data were collected in class through an anonymous
questionnaire. The data set is available in Homework 2. Let us consider how father's Years
of Education is related to mother's Years of Education.
a) Identify the predictor and the response. Motivate your choice.
b) Load the data in R and draw the proper scatterplot.
c) Consider a parametric linear model, draw the corresponding regression line on the plot,
and provide a value for the intercept and the slope.
d) Explain what these values tell us.
e) Draw the scatterplot again, but now give a different colour to points corresponding to
different student citizenships. Write a brief comment on what this plot suggests to you.
4. Unfortunately, some students did not answer the questions on their parents' Years of
Education.
a) Impute the missing values in the parents' Years of Education in some way. Write a sentence
to clearly explain your imputation method.
b) Estimate the same model used in Problem 3.c on the data set obtained in 4.a. Compare
the results.
5. Do your best to recreate the graphic given in "homework_species.pdf" (if not done yet,
see Assignment n.1).
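
The mechanics of Problems 3 and 4 can be tried out first on simulated data. The following is a minimal sketch only: the data frame d, the Citizenship column, and all values are hypothetical stand-ins for the questionnaire data, the choice of response is arbitrary here, and mean imputation is just one of many possible methods.

```r
## Simulated stand-in for the questionnaire data (hypothetical values)
set.seed(22)
d <- data.frame(
  EduFather   = round(rnorm(30, 13, 3)),
  Citizenship = factor(sample(c("A", "B"), 30, replace = TRUE))
)
d$EduMother <- round(0.6 * d$EduFather + rnorm(30, 5, 2))
d$EduMother[c(4, 11)] <- NA   # pretend two answers are missing

## Scatterplot with one colour per citizenship
plot(EduMother ~ EduFather, data = d, col = as.integer(d$Citizenship))
legend("topleft", legend = levels(d$Citizenship), col = 1:2, pch = 1)

## Complete-case fit (lm drops rows with NA by default)
fit_cc <- lm(EduMother ~ EduFather, data = d)
abline(fit_cc)

## One simple imputation: replace missing values by the observed mean
d_imp <- d
d_imp$EduMother[is.na(d_imp$EduMother)] <- mean(d$EduMother, na.rm = TRUE)
fit_imp <- lm(EduMother ~ EduFather, data = d_imp)

## Compare intercept and slope of the two fits
rbind(complete_case = coef(fit_cc), mean_imputed = coef(fit_imp))
```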
Assignment n. 2 (due October 12)

Readings
Collecting good data, pages 36-42 from Hand
Survey sampling, pages 51-54 from Hand
Chapter 4 - Probability, with focus on "Random variables and their distributions", from Hand.
Problems
1. On Tuesday 22 September 2015 data were collected in class through an anonymous
questionnaire. All the students in class were interviewed. Briefly discuss whether these data
refer to a population or to a sample.
2. On Tuesday 22 September 2015 data were collected in class through an anonymous
questionnaire. Assume data refers to a population, and let us focus on the distance ("How
far do you live from the University building ?") and time ("How long does it take to you to
reach the University building ?") variables. The data set is available here.
a) Identify the predictor and the response.
b) Load the data in R and draw a scatterplot of Distance on the vertical axis and Time on the
horizontal axis.
c) Discuss whether the population mean function is a straight line. You may use a
nonparametric regression estimator to support your opinion (to do so, plot your data and
then use the command >lines(lowess(y~x,f=1/6)) ).
d) Consider a parametric linear model and provide a value for the intercept and the slope
(use the command >lm(y~x)$coef ).
3. Discuss whether the variable Years of Education of the father ("EduFather") is normally
distributed. A histogram may help: type > hist(variablename).
4. Discuss whether the variable Years of Education of the mother ("EduMother") is normally
distributed.
5. Discuss any difference you may observe between the distributions of the years of
education of the students' fathers and mothers. Use both a graphical and a numerical
comparison - use the command > summary(variablename).
6. Using the command line > sample(variablename, 10), sample 10 values from a variable
of your choice within the student data set. Provide the maximum likelihood estimate of the
mean of the population.
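
The commands mentioned in Problems 2 to 6 can be tried out on simulated data before loading the class questionnaire. This is a minimal sketch under stated assumptions: the data frame students and all its values are hypothetical, and the last step assumes a normal model, under which the maximum likelihood estimate of the mean is the sample mean.

```r
## Simulated stand-in for the class data (hypothetical values)
set.seed(2015)
students <- data.frame(Time = runif(40, 5, 90))
students$Distance  <- 0.4 * students$Time + rnorm(40, sd = 3)
students$EduFather <- round(rnorm(40, mean = 13, sd = 3))

## Problem 2: scatterplot, lowess smooth, and fitted straight line
plot(Distance ~ Time, data = students)
lines(lowess(students$Time, students$Distance, f = 1/6), col = 2)
fit <- lm(Distance ~ Time, data = students)
abline(fit, col = 4)
coef(fit)

## Problems 3 to 5: histogram and numerical summary
hist(students$EduFather)
summary(students$EduFather)

## Problem 6: under a normal model the ML estimate of the mean
## is the sample mean of the drawn values
s <- sample(students$EduFather, 10)
mu_hat <- mean(s)
mu_hat
```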

Optional
1. Let Y be a Normal random variable. Derive the maximum likelihood estimate of the
parameter mu if a sample of size n is drawn.
Assignment n. 1 (due October 5)
Readings
To have an idea of possible applications of statistics, please read the Section "Examples"
starting on page 13 from Hand (2008).
To get an introduction to maximum likelihood estimates, read Point Estimation - pages 76-78
from Hand.
As a reminder that missing values exist and must be handled with care, read "Incomplete Data" pages 37-40 from Hand, and Section 4.5.1 in Weisberg (2005).
Optional
To have a technical approach to maximum likelihood estimates, please read the first pages
of Chapter 7 of the book "Probability and Statistics for Engineers and Scientists" by S. Ross.
Problems
1. PLEASE, recall that I really do not care whether you complete the homework. I do care that
you work on it! Just do your best.
2. (R class) ## Recreate the graphic given in "homework_species.pdf".
###### For now, just consider the line on the full data set.
## Proceed in the following way:
# a. Find and import the data set.
# b. Plot the data (plot function). Start with the basics and improve.
# c. Estimate two simple linear regression models (one for the full data set, and one where
the three outlying observations have been removed) between the brain weight (dependent
variable) and the body weight (independent variable) using the lm function. Add the
corresponding regression lines to the plot.
# The following functions might be useful: par, plot , axis, points, abline, text, legend

3. Let us assume that a random sample of size n=6 has been drawn from a certain
population of students. They have been asked if they have ever worked with people from
other countries.
The sample {Yes, No, Yes, Yes, Yes, No} has been observed. Let p be the probability of a
single student answering "Yes". That is, let p=P("Yes").
Compute:
a) the probability of observing such a sample if p=0.2
b) the probability of observing such a sample if p=0.3
c) the probability of observing such a sample if p=0.8
d) the probability of observing such a sample if p=0.9
Write the likelihood function.
Provide the value of p that maximizes the function.
4. Let us now assume that the sample {Yes, No, No, Yes, No, No} has been observed.
Compute:
a) the probability of observing such a sample if p=0.2
b) the probability of observing such a sample if p=0.3
c) the probability of observing such a sample if p=0.8
d) the probability of observing such a sample if p=0.9
Write the likelihood function.
Provide the value of p that maximizes the function.
5. Using the software R, draw the likelihood function of Problem 3. Just download the
software, open it and write the command line:
curve(x^4 * (1-x)^2, xlim=c(0, 1))

Then, draw a series of plots by substituting the values "4" and "2" with other numbers of
your choice within the formula. Observe what you obtain, and write a small sentence about
your findings (you do not need to report the plots within your solution).
6. Explain in your own words what the likelihood function represents and why the value that
maximizes it is of some interest.
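
Problems 3 to 5 all rest on the same likelihood, p^k * (1-p)^(n-k), for an ordered sample with k "Yes" answers among n independent ones. A minimal sketch in R, assuming a hypothetical sample with 3 "Yes" out of 5 (not one of the samples above); the function name lik is an illustrative choice, not part of the assignment.

```r
## Likelihood of an ordered sample with k "Yes" out of n answers
lik <- function(p, k, n) p^k * (1 - p)^(n - k)

## Hypothetical sample: 3 "Yes" out of 5 (not one of the samples above)
k <- 3
n <- 5

## Probability of the sample at a few candidate values of p
p_grid <- c(0.2, 0.3, 0.8, 0.9)
lik(p_grid, k, n)

## Draw the likelihood and locate its maximum numerically
curve(lik(x, k, n), xlim = c(0, 1), xlab = "p", ylab = "likelihood")
p_hat <- optimize(lik, interval = c(0, 1), k = k, n = n, maximum = TRUE)$maximum
abline(v = p_hat, lty = 2)
p_hat
```

The numerical maximum should agree, up to the tolerance of optimize(), with the sample proportion k/n.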
Optional
7. A yellow candy was drawn from a box. Write the corresponding likelihood function with
the aim of estimating the fraction of the yellow candies within the box. Draw it by hand.
8. Eat some candies from time to time (of any colour).
