a) In the specified model, both racial-identity variables (black and hispan) are
statistically insignificant at the conventional significance levels, i.e. at 10%,
5% and 1%. Further, as shown by the F-test, they are also jointly insignificant. Hence, we
conclude that, on average, race is not a significant predictor of a baseball player’s salary.
Similarly, three of the four variables that capture player performance are statistically
insignificant at conventional levels. These are bavg, hrunsyr and rbisyr. However, allstar is a
statistically significant predictor of salary at the 99.5% confidence level. This suggests that MLB team
owners treat selection for the all-star team as the dominant proxy for player performance,
rather than raw batting statistics. Thus, we find that for each additional year that a player is
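As an illustration of the joint test referred to above, a minimal sketch of the F-statistic for excluding a group of regressors is given below. It uses synthetic data rather than the MLB sample, and the helper names (ssr, joint_f_stat) and the simulated variables are our own assumptions. Only the standard formula F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)] is taken as given.

```python
import numpy as np


def ssr(y, X):
    """Sum of squared residuals from OLS of y on X plus an intercept.

    Also returns the number of estimated parameters (k + 1).
    """
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid), Z.shape[1]


def joint_f_stat(y, X_unrestricted, drop_cols):
    """F-statistic for H0: the coefficients on drop_cols are all zero."""
    X_restricted = np.delete(X_unrestricted, drop_cols, axis=1)
    ssr_ur, n_params = ssr(y, X_unrestricted)
    ssr_r, _ = ssr(y, X_restricted)
    q = len(drop_cols)              # number of restrictions
    df = len(y) - n_params          # n - k - 1
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df), q, df


# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 300
black = (rng.random(n) < 0.3).astype(float)
hispan = ((rng.random(n) < 0.2) & (black == 0)).astype(float)
allstar = rng.poisson(2.0, size=n).astype(float)

# By construction, race has no effect on this simulated log salary.
lsalary = 13.0 + 0.1 * allstar + rng.normal(size=n)
X = np.column_stack([allstar, black, hispan])
f_stat, q, df = joint_f_stat(lsalary, X, drop_cols=[1, 2])

# A second outcome in which race does matter, for contrast.
lsalary_alt = lsalary + 2.0 * black
f_alt, _, _ = joint_f_stat(lsalary_alt, X, drop_cols=[1, 2])

print(f"F({q}, {df}) with no race effect: {f_stat:.3f}")
print(f"F({q}, {df}) with a race effect:  {f_alt:.3f}")
```

The statistic would then be compared with the critical value of the F(q, n - k - 1) distribution at the chosen significance level.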
b) It is very likely that the variables indicating a player’s performance are highly correlated.
For instance, if a player hits more home runs in a given year (hrunsyr), then it directly
increases both his batting average (bavg) and runs batted in (rbisyr) for that year, and vice
versa. Consequently, the player’s likelihood of being selected for the all-star team rises,
thereby leading to potentially higher values of allstar. We evaluate this claim using a test
for Variance Inflation, which yields an especially high vif for rbisyr, indicating that it is
comparing their vif to a predetermined threshold, with variables having a high vif
being dropped from the model. Subsequently, the vif of remaining regressors goes
down due to a fall in the R² of the auxiliary regressions, which reduces the standard
errors on our estimates of the beta coefficients, thereby making them more precise.
ii. Conducting Principal Component Analysis on the model, through which it is
possible to collapse multiple highly correlated regressors (x1, … x4) into another
variable, known as their principal component (z). This variable replaces (x1, … x4)
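The two remedies listed above can be sketched as follows. This is a minimal illustration on synthetic data, not the MLB sample. The function names (vif, drop_high_vif, principal_components), the simulated regressors and the default threshold of 10 are our own assumptions. The vif is computed from auxiliary regressions as 1 / (1 - R²), and the PCA is run on the correlation matrix of the correlated regressors only.

```python
import numpy as np


def vif(X):
    """VIF of each column of X, from auxiliary OLS regressions on the others."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def drop_high_vif(X, names, threshold=10.0):
    """Drop the regressor with the largest VIF until all VIFs are <= threshold."""
    X, names = X.copy(), list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


def principal_components(X):
    """PCA on the correlation matrix: eigenvalues (descending) and scores."""
    Zs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], Zs @ eigvecs[:, order]


# Synthetic regressors (hypothetical): two of them share a common factor.
rng = np.random.default_rng(0)
n = 300
power = rng.normal(size=n)
hrunsyr = power + 0.1 * rng.normal(size=n)
rbisyr = power + 0.1 * rng.normal(size=n)
bavg = rng.normal(size=n)
names = ["bavg", "hrunsyr", "rbisyr"]
X = np.column_stack([bavg, hrunsyr, rbisyr])

# Remedy i: drop regressors whose vif exceeds the threshold.
vifs = vif(X)
X_kept, kept = drop_high_vif(X, names)

# Remedy ii: collapse the correlated pair into its principal component(s),
# retaining those with an eigenvalue above 1.
eigvals, scores = principal_components(X[:, 1:])
n_retained = int(np.sum(eigvals > 1.0))
z = scores[:, :n_retained]

print(dict(zip(names, np.round(vifs, 2))))
print("kept after vif screening:", kept)
print("components retained:", n_retained)
```

In this sketch z would take the place of the correlated columns in the salary regression, while the uncorrelated regressor is left as it is.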
i. By setting our arbitrary threshold at vif <= 10, we drop rbisyr from the model. This
is because its vif of 19.01 indicates that ~94.7% of the variation in this variable can
However, this comes at the cost of endogeneity in the model, since the omitted rbisyr
is absorbed into the error term, which will then be correlated with the other regressors.
ii. The PCA approach, by design, aims to capture the maximum possible variation in
the correlated regressors through the principal component (factor). This is done by
ensuring that the eigenvalue of the selected component(s) is greater than 1, meaning
that each explains more variation than a single standardized regressor would. Hence, the issue of