1. Assume that there is a population regression model
y = β0 + β1x1 + β2x2 + β3x3 + u
and that the model satisfies assumptions MLR1 through MLR5 in the population.
Indicate, without explanation, whether the following statements are true or false (each answer is worth 1 point).
a. If you take a random sample from the population, and estimate an OLS regression with y as the dependent variable and x1, x2 and x3 as the independent variables, the estimated coefficients of x1, x2 and x3 will be unbiased estimates of β1, β2, and β3.
b. If you take a random sample from the population, and estimate an OLS regression with y as the dependent variable and x1, x2 and x3 as the independent variables, the estimated coefficients of x1, x2 and x3 will be equal to β1, β2, and β3.
c. If you take a random sample from the population, and estimate an OLS regression with y as the dependent variable and x1, x2 and x3 as the independent variables, the estimated coefficients of x1, x2 and x3 will be statistically significant.
d. If you take two random samples from the same population, and use each of the samples to estimate the population model using OLS, you will get the same β estimates from each regression.
e. If β1 is a positive number, then β̂1 (the OLS estimate of β1 that you would see on the Stata output after estimating a regression) may be a positive or a negative number.
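One way to build intuition for the statements above is to draw many random samples from a known population model and see how the OLS estimate of β1 behaves across samples. The following is a minimal Stata sketch, not part of the assignment. The coefficient values (1, 0.5, -0.3, 0.2), the sample size of 100, the 1000 replications, the seed, and the program name olsdraw are all assumptions chosen for illustration.

```stata
* Illustrative simulation; every numeric value below is an assumption.
clear all
set seed 12345

* Hypothetical helper: draw one sample from the assumed population model,
* run OLS, and return the estimated coefficient on x1.
program define olsdraw, rclass
    drop _all
    set obs 100
    generate x1 = rnormal()
    generate x2 = rnormal()
    generate x3 = rnormal()
    generate u  = rnormal()
    generate y  = 1 + 0.5*x1 - 0.3*x2 + 0.2*x3 + u
    regress y x1 x2 x3
    return scalar b1 = _b[x1]
end

* Repeat the draw 1000 times and collect the estimates of beta1.
simulate b1 = r(b1), reps(1000) nodots: olsdraw

* Compare the mean, spread, minimum, and maximum of the estimates
* with the assumed true value of 0.5.
summarize b1
```

Changing the assumed coefficient on x1 or the sample size and rerunning the simulation shows how the distribution of the estimates responds.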
2. Provide a short answer (one to three sentences) to each of the following questions. (It will sometimes help to think about the meaning and implications of the MLR assumptions for the properties of the OLS estimators, that is, whether the estimated β coefficients are unbiased, how efficient they are, and whether the OLS standard errors are biased or unbiased.) Each part is worth 3 points.
a. Suppose you have a sample that tells you the life expectancy of a 60-year-old male in each of the 50 states. It also tells you the average amount of education completed and the average level of income for people over 60 in that state. You are interested in using regression analysis to estimate the effect of education on life expectancy. Is it a good idea to also include the variable measuring average income in this regression? Discuss the costs and/or benefits of doing so.
b. Consider the following population regression model that determines the score on a standardized test for elementary school children
score = β0 + β1classiz + β2faminc + u
where classiz is the size of the student’s class and faminc is annual income of the student’s family. The expected value of u is the same across all levels of class size and family income, but the variance of u is different for different classrooms. Does this cause the OLS estimators for the β coefficients to be biased? Explain.
c. Consider the regression model
Where cigs is daily cigarette consumption for an individual, price is price per pack of cigarettes in the local area, and income is the individual’s annual income. Suppose we have a random sample of 500 people. For the purposes of estimating β1, would it be better if there was a lot of correlation in the sample between price and income, or only a little correlation?
d. Consider the population model estimated for the previous homework assignment:
lpfries = β0 + β1prpblck + β2lincome + β3prppov + u
where lpfries is the log of the price of a small order of fries, prpblck is the proportion of people living in the restaurant’s zip code who are black, lincome is the log of median income in the zip code, and prppov is the proportion of people living in the restaurant’s zip code who are below the poverty line. Obviously, lincome and prppov are highly correlated with one another. Does this violate one of the MLR assumptions?
3. Use the data in APPLE.DTA to answer this question. These are experimental data in the sense that each family was offered a price pair (ecoprc, regprc), which are prices per pound of (hypothetical) “ecologically friendly” apples and regular apples, respectively. They then reported how many pounds of each type of apple they would buy at the given prices. The variable reglbs measures the reported demand for regular apples, and the variable ecolbs measures the reported demand for ecologically friendly apples. (Each part is worth 3 points.)
a. Which is larger: the proportion of families reporting that they would buy no regular apples, or the proportion of families reporting that they would buy no eco-friendly apples? (Use the Stata commands tab reglbs and tab ecolbs.)
b. Estimate the simple regression of ecolbs on ecoprc. Interpret the coefficient on ecoprc.
c. In the simple regression just estimated, test whether the estimated coefficient of ecoprc is statistically significant at the 1% level when the null hypothesis is βecoprc ≥ 0.
d. Now add regprc and faminc to the regression. Conduct the hypothesis test from part c again.
e. Interpret the coefficient of regprc. Does the sign of the coefficient make sense in terms of economic theory?
f. Now estimate the regression
reg ecolbs ecoprc regprc faminc numgt64 num18_64 num5_17 numlt5 hhsize
Stata drops one of the variables. Which one, and why?
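Stata's regression output reports two-sided p-values, so the one-sided test in parts c and d needs a small extra step. The following is a minimal sketch of one way to do it, assuming APPLE.DTA is in the current working directory; the scalar name t_eco is illustrative.

```stata
* Assumes APPLE.DTA is in the current working directory.
use APPLE.DTA, clear

* Part c: simple regression, then a left-tailed test of H0: beta_ecoprc >= 0.
regress ecolbs ecoprc
scalar t_eco = _b[ecoprc] / _se[ecoprc]
display "t statistic:       " t_eco
* ttail(df, t) returns the upper-tail probability P(T > t),
* so 1 - ttail(df, t) is the lower-tail p-value needed here.
display "one-sided p-value: " 1 - ttail(e(df_r), t_eco)
* Reject H0 at the 1% level if the t statistic falls below this value.
display "1% critical value: " -invttail(e(df_r), 0.01)

* Part d: the same test after adding regprc and faminc.
regress ecolbs ecoprc regprc faminc
scalar t_eco = _b[ecoprc] / _se[ecoprc]
display "t statistic:       " t_eco
display "one-sided p-value: " 1 - ttail(e(df_r), t_eco)
display "1% critical value: " -invttail(e(df_r), 0.01)
```

The residual degrees of freedom e(df_r) change between the two regressions, which is why the critical value is recomputed after the second one.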
4. Use the data in ELEM94-95 to answer this question. Each observation in this sample is a Michigan elementary school. The dependent variable is lavgsal, which is the log of the average teacher salary in the school, and bs is the ratio of the value of fringe benefits to the value of salary for teachers in the school. (Each part is worth 3 points.)
a. Find the average value of bs in the sample and explain in words what it means.
b. Run the simple regression of lavgsal on bs. Is the estimated coefficient of bs statistically different from zero at the 5% significance level? Is it statistically different from -1 at the 5% significance level? Justify your answers. (Hint: One way to do the second test is with the Stata test command, test bs==-1.)
c. Add the variables lenrol (the log of the number of students at the school) and lstaff (the log of the number of teachers and other adults working at the school) to the regression from (b). Find the p-value for H0: βbs = -1 against a two sided alternative. What do you conclude about H0 now?
d. Why is the standard error on the bs coefficient actually smaller in the part (c) regression than in the part (b) regression? (Hint: what happens to the estimate of σ – in Stata called the “root mean squared error” – when lenrol and lstaff are added?)
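The tests in parts b and c, and the comparison suggested in the hint for part d, can be sketched in Stata as follows. This is a minimal sketch that assumes the ELEM94-95 data are already loaded in memory; the scalar name t_bs is illustrative.

```stata
* Assumes the ELEM94-95 data are already loaded in memory.

* Part a: sample average of the benefits-to-salary ratio.
summarize bs

* Part b: simple regression, then test H0: beta_bs = -1.
regress lavgsal bs
test bs = -1
* The same test by hand: t = (estimate - hypothesized value) / standard error.
scalar t_bs = (_b[bs] - (-1)) / _se[bs]
display "t statistic:       " t_bs
display "two-sided p-value: " 2*ttail(e(df_r), abs(t_bs))
display "Root MSE: " e(rmse) "   se(bs): " _se[bs]

* Part c: add lenrol and lstaff, then repeat the test.
regress lavgsal bs lenrol lstaff
test bs = -1
display "Root MSE: " e(rmse) "   se(bs): " _se[bs]
```

The test command reports an F statistic; with a single restriction it equals the square of the t statistic computed by hand, so the two p-values should agree.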
Now estimate a regression in which math4 is the dependent variable, and lunch, lavgsal, and lenrol are the independent variables. math4 is the percentage of the school’s fourth graders who passed their math proficiency test, while lunch is the percentage of the school’s students who come from low income families, as measured by their eligibility for free or reduced price lunch at school.
e. Interpret the coefficient on lunch. Is it statistically significant? In your opinion, is it large in magnitude? Explain your answer. (It might help to look at the summary statistics for math4 and lunch.)
f. The coefficient on lavgsal is not statistically significant at the 5% level. Assuming that assumptions MLR.1 through MLR.5 are satisfied for this regression, does this allow us to conclude with 95% confidence that raising teacher salaries would have no effect on the math scores of fourth graders? Explain.