ECO 430 – Applied Econometrics
General Writing Guidelines and Grading

Your homework assignments should be a well-written, concise description of your efforts to undertake the task at hand. Your equations, tables and figures should all be presented in a format and style consistent with that appearing in Wooldridge. Your formatting should consist of one-inch margins, double spacing and Times New Roman 12-point font. These formatting requirements are mandatory. Failure to comply with these seemingly basic rules will result in an automatic deduction of 10 points on your homework.

You are free to choose the layout and structure of your homework. However, as a basic outline of what I would consider a successful project, I recommend that at a minimum your paper include:

Introduction: A clear statement of the work being undertaken and the overall aim. To help convey the importance of the topic you are studying, I suggest using Google Scholar to track down appropriate citations. Appropriate use of these additional citations will have an effect on your overall grade.

Data Description: This section should state in detail the data you are using. A useful tool is a table of summary statistics of the data you will be using. Typically this table will include means, medians, minima/maxima and standard deviations of the key variables of interest in the model you will be estimating. Sometimes it is also illuminating to provide subset summaries by considering interesting 'splits' of the data (for example, summary statistics for all variables by the male/female category).

Model Description: This section needs to contain the baseline model that you will be using as well as at least one hypothesis of interest that you will be testing (when applicable).
You need to clearly describe the model, its estimation and the test(s) involved (to the extent that they apply to what you have learned in this class). This is not the section where you describe your estimation of the model; you are merely setting up your paper here so that I know exactly what you will be estimating and how you will go about it. This section should be clear enough that if I gave the paper to another group of students they could immediately determine what is going on without having to ask questions.

Estimation/Inference: Here you will need to carefully and precisely outline how you estimated the model (and its variants) described in the previous section. You will need to provide an interpretation of all estimated coefficients (excluding the intercept) and discuss their individual statistical significance (when applicable). Moreover, you will need to perform at least one joint hypothesis test (when applicable).

Conclusions: This section should summarize the key insights of the topic you are studying and discuss your overall findings from estimating the model.

Bibliography: A reference section that appropriately cites all sources you used or drew from in writing your paper. This should include the paper you are referencing. Failure to include an appropriate bibliography will result in a mandatory 10 point deduction on your final grade.

Also, given that successful projects display grammatically sound writing, I will dock 1 overall point on your paper for every five grammatical mistakes made. As an example, a grammatical mistake can be as simple as using the word 'on' instead of 'in' or typing 'form' instead of 'from'. Moreover, making incorrect economic/econometric statements will result in a mandatory 1 point deduction per mistake. These penalties will quickly add up, so I suggest allowing yourself enough time to proofread and review your work.
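The summary-statistics table recommended under Data Description can be sketched in a few lines. The example below is only an illustration in Python using the standard library; the variable names (wage, female), the records and the helper `summarize` are hypothetical and not part of the course materials, and any statistical package will do the same job.

```python
import statistics


def summarize(values):
    """Return the summary statistics suggested for the data description table."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical records: (hourly wage, female indicator)
records = [(12.0, 0), (15.0, 1), (9.0, 1), (20.0, 0), (14.0, 1), (18.0, 0)]

# Full-sample summary
overall = summarize([wage for wage, _ in records])

# Subset summaries along the male/female 'split'
by_group = {
    label: summarize([wage for wage, female in records if female == flag])
    for label, flag in (("male", 0), ("female", 1))
}

for name, stats in [("all", overall), *by_group.items()]:
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Each printed row corresponds to one row of the table: the full sample first, then each subgroup.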
ECO 430 R – APPLIED ECONOMETRICS
FINAL EXAM – ANSWER KEY
130 POINTS

Instructions: Please answer all of the following questions as best as possible. Points will be deducted for writing irrelevant statements. Partial credit will be awarded when it is earned. The point value for each question is in parentheses. The detail of your answer should correspond to the number of points the question is worth. There are a total of 16 questions on this exam.

Intuitive: Please answer the following questions using logic and economic reasoning.

I.1. (5) Explain why the linear probability model, for some level of the covariates, will always deliver estimated probabilities that are either greater than 1 or less than 0.

The linear probability model can always produce estimated probabilities that are either greater than one or negative because the model is being fit with a line, which is unbounded. Probabilities, by definition, are bounded between 0 and 1. Thus, this is a case where we knowingly have model misspecification, and one of the costs of this misspecification is that we may obtain estimated probabilities that are inconsistent with theory.

I.2. (10) Why do the coefficient estimates in models with nonlinearities in the covariates lose their ceteris paribus interpretation?

Coefficient estimates from models with nonlinear covariates no longer retain their ceteris paribus interpretation because the nonlinearities make it infeasible to hold everything else fixed. For example, if the conditional mean were specified as $\beta_0 + \beta_1 x + \beta_2 x^2$, we cannot interpret $\beta_1$ in a ceteris paribus fashion because it makes no sense to change $x$ while holding $x^2$ fixed. Rather, we can interpret $\beta_1$ directly by anchoring our change to the point $x = 0$, in which case $\beta_1$ is the change in $y$ given a change in $x$, when $x = 0$.

I.3. (5) Comment on the veracity of the following statement: "standard t-statistics are invalid if heteroskedasticity is present."

This statement is true.
When the error terms are heteroskedastic, the estimate of the error variance under the assumption of homoskedasticity is invalid, which implies that the t-statistic is incorrect. Because the formula is invalid, inference based on it is unreliable.

I.4. (15) Define an instrument for the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. Next, comment on how having an instrument does not imply that the IV estimator will be accurate (your answer should discuss both issues of bias and variance).

An instrument for the above model is any variable $z$ such that $E[\varepsilon|z] = 0$. Even if such a variable $z$ exists, we cannot guarantee that the IV estimator will be accurate, as this also depends on the covariance between $x$ (the potentially endogenous variable) and the instrument. Further, our IV estimator will be less precise than the OLS estimator because the IV estimator replaces the full variation that occurs in $x$ with reduced form variation based on the link between $x$ and $z$. Additionally, our IV estimator will not be accurate for a given sample due to the bias that is present in the estimator. It is only for large sample sizes that we can claim the IV estimator is accurate, which in practice may be untenable.

I.5. (10) Suppose I test $H_0: \beta_1 = 0$ against a two-sided alternative. If I fail to reject the null hypothesis at the 10% level, does this also mean I would fail to reject the null hypothesis at the 5% level? What about the 15% level?

If I fail to reject at the 10% level, this means that I must have a test statistic with a p-value that is greater than 0.1. Thus, my p-value is also greater than 0.05, so I would fail to reject the null hypothesis at the 5% level as well. However, without more information I cannot know the outcome of the test if it were conducted at the 15% level, since I only know that p > 0.1.

I.6. (10) More variation in one of my explanatory variables means that the variance of the associated slope coefficient estimator is lower.
Explain the intuition underlying this statement.

This statement touches on the implication that variation in a covariate leads to more accurate estimates of the unknown coefficients of the model. In essence, the OLS estimator's variance depends upon the variation in the error term and the proportion of the variation of the covariate of interest that is not explained by other covariates in the model. In math this is

$Var(\hat{\beta}_j) = \frac{\sigma^2}{TSS_{x_j}(1 - R_j^2)}$.

More variation in $x_j$ implies that $TSS_{x_j}$ is larger, and this in turn implies that the variance of the OLS estimator is smaller.

I.7. (10) Assume I have access to panel data so that my linear regression model is $y_{it} = \beta_0 + \beta_1 x_{it} + \alpha_i + v_{it}$. Explain why a random effects specification requires the assumption that $cov(x_{it}, \alpha_i) = 0$ for all $t$ to produce an unbiased estimator of $\beta_1$, but the fixed effects specification does not. Recall that $\alpha_i$ is the time-constant, individual effect.

Given that the random effects model keeps $\alpha_i$ in the error term, the $cov(x_{it}, \alpha_i) = 0$ assumption is needed; otherwise the conditional mean zero assumption that we need to produce an unbiased estimator, $E(\alpha_i + v_{it}|x) = 0$, is invalidated. That is, this conditional mean zero assumption cannot hold unless $cov(x_{it}, \alpha_i) = 0$ is true. If $cov(x_{it}, \alpha_i) \neq 0$, then we need $\alpha_i$ to appear in the model directly to account for the nonzero correlation.

Quantitative: In a 2014 article, "How University Endowments Respond to Financial Market Shocks: Evidence and Implications," Jeffrey Brown, Stephen Dimmock, Jun-Koo Kang and Scott Weisbenner research how university endowment payments correlate with the return on the endowments.
Their assessment considers the following two models

$\ln y_i = \beta_0 + \beta_1 Return_i + \varepsilon_i$, (1)

$\ln y_i = \beta_0 + \beta_1 Return_i \times D_{Pos,i} + \beta_2 Return_i \times D_{Neg,i} + \varepsilon_i$, (2)

where $\ln y_i$ is the logarithm of the dollar value of the university endowment payout, $Return_i$ is the return on the university endowment, measured in percentage points, and $D_{Pos,i}$ is a dummy variable which is one if the university's return on the endowment was positive and 0 if it was negative. Similarly, $D_{Neg,i}$ is a dummy variable which is one if the university's return on the endowment was negative and 0 if it was positive. Table 1 contains their baseline estimates. Note that the estimate for the intercept is not reported, and heteroskedasticity robust and regular standard errors for each estimate appear underneath in parentheses and brackets, respectively.

Table 1. Relation between payouts from endowments and endowment returns.
Dependent Variable: ln(Endowment payouts in dollars)

Regressors          (1)       (2)
Return              0.35
                    (0.05)
                    [0.04]
Return × DPos                 0.13
                              (0.08)
                              [0.09]
Return × DNeg                 0.82
                              (0.14)
                              [0.16]
Observations        3000      3000
R2                  0.69      0.69

Q.1. (5) Please interpret the coefficient estimate for Return in Model (1).

On average, if an endowment has a return of 10 percent, the payout in the current year increases by 3.5%, ceteris paribus (100 × 0.35 × 0.1 = 3.5).

Q.2. (5) Please interpret the estimate of $\beta_2$ in Model (2).

On average, if an endowment has a decrease in its return of 10 percent, the payout in the current year decreases by 8.2%, ceteris paribus (100 × 0.82 × 0.1 = 8.2).

Q.3. (5) Given that $D_{Pos,i} + D_{Neg,i} = 1$ for all observations, do I need to worry about the dummy variable trap in Model (2)?

No. Only if Return also entered the model would the dummy variable trap be of concern here. Interacting these two dummy variables with Return does not imply perfect collinearity in the model.

Q.4. (5) Suppose I wish to test $H_0: \beta_1 = \beta_2$ in Model (2).
The heteroskedasticity robust p-value from doing so is 0.008 while the p-value assuming homoskedasticity is 0.012. Describe in detail why this difference would not matter if I was testing at the 5% significance level, but it would if I was testing at the 1% level. At the 5% level both tests would reject the null hypothesis of equal payouts since both p-values are less than 0.05. However, if the significance level was set to 1% then which test you selected would matter, as the heteroskedasticity robust p-value is less than the significance level (implying rejection) while
the homoskedasticity p-value is larger than the significance level (implying a failure to reject). In this case one may elect to use the heteroskedasticity robust version of the test, as this test is robust to all forms of heteroskedasticity and remains valid when the errors are in fact homoskedastic.

Q.5. (5) A key conclusion of Brown et al. (2014) is that university endowment payouts respond asymmetrically to positive and negative shocks. In response to positive shocks, universities tend to leave current payouts unchanged, while following negative shocks, universities actively reduce payout rates. Comment on the asymmetry in payout responses uncovered in Model (2) over that in Model (1).

From an economic standpoint we see that when the return to an endowment is positive, the increase in the payout is roughly 6 times smaller than the decrease in the payout when the return to the endowment is negative. Statistically, we see that there is no statistically significant change to the payout when returns are positive, but there is a statistically significant decrease in the payout when returns are negative.

Q.6. (5) The heteroskedasticity robust p-values for statistical significance for $\beta_1$ and $\beta_2$ in Model (2) are 0.104 and 0.000, respectively. If you were conducting these tests at the 5% level, what would your conclusions be regarding these asymmetric effects?

In this case we would fail to reject the null hypothesis that $\beta_1 = 0$ and we would reject the null hypothesis that $\beta_2 = 0$. Thus, individually, the asymmetric effects of negative and positive returns hold at the 5% level.

Q.7. (10) Detail why the two single hypothesis tests you just investigated are inappropriate to test for the asymmetric effect of endowment returns. Set up a proper hypothesis that would allow you to test for the asymmetric effect suggested by Brown et al. (2014).

These two single tests are inappropriate because each one ignores the other hypothesis.
A true test for asymmetric effects would need to hold jointly and would require some form of an F-test. In our setting the appropriate null hypothesis is $H_0: \beta_1 = 0;\ \beta_2 \neq 0$. However, this is a complicated null hypothesis to test.

Mathematical: Please answer the following questions being as statistically precise as possible.

M.1. (10) You have the following regression model, $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$, where $Var(\varepsilon_i|x_i) = \sigma^2 x_i^2$. Write out the feasible GLS regression that would produce OLS parameter estimators that were BLUE.

$y_i/x_i = \beta_0 (1/x_i) + \beta_1 + \beta_2 x_i + \varepsilon_i/x_i$

M.2. (15) Consider a binary dependent variable $y$. Let $\bar{y}$ represent the proportion of ones in the sample. Let $\hat{q}_0 = m_0/n_0$ represent the % correctly predicted for the outcome $y = 0$ via the linear probability model (i.e. a fitted probability less than 0.5) and $\hat{q}_1 = m_1/n_1$ represent the % correctly predicted for the outcome $y = 1$ via the linear probability model (i.e. a fitted probability greater than 0.5), where $n_0$ ($n_1$) is the total number of observations where $y_i = 0$ ($y_i = 1$) and $m_0$ ($m_1$) is the number of those observations for which $\hat{y}_i = 0$ ($\hat{y}_i = 1$). If $\hat{p}$ is the overall % of outcomes that are correctly predicted ($\hat{p} = (m_0 + m_1)/n$), show that

$\hat{p} = (1 - \bar{y})\hat{q}_0 + \bar{y}\hat{q}_1$. (3)

Let $n$ equal the total number of observations, $n_0$ the number of observations where $y_i = 0$ and $n_1$ the number of observations where $y_i = 1$. Then $n = n_0 + n_1$. Further, note that $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i = n_1/n$. This implies that $1 - \bar{y} = n_0/n$. Now, by definition we have

$\hat{p} = \frac{m_0 + m_1}{n}$. (4)

Now, note that $\hat{q}_1$ is the proportion of observations for which $y_i = 1$ that the model correctly predicts, which we can quantify as $\hat{q}_1 = m_1/n_1$. Similarly, $\hat{q}_0$ is the proportion of observations for which $y_i = 0$ that the model correctly predicts, which we can quantify as $\hat{q}_0 = m_0/n_0$. Finally, we have

$\hat{p} = \frac{n_0(m_0/n_0) + n_1(m_1/n_1)}{n} = \frac{n_0\hat{q}_0 + n_1\hat{q}_1}{n} = (n_0/n)\hat{q}_0 + (n_1/n)\hat{q}_1 = (1 - \bar{y})\hat{q}_0 + \bar{y}\hat{q}_1$. (5)

Bonus: Please answer the following question using proper spelling and grammar.

B.1. (5) What does the fox say?
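As a numerical companion to M.2, the identity that the overall share correctly predicted equals the weighted average of the two outcome-specific shares can be checked on a small sample. The sketch below is only an illustration: the outcomes, fitted probabilities and the function name `prediction_shares` are hypothetical, and the 0.5 cutoff follows the classification rule stated in the question.

```python
def prediction_shares(y, fitted, cutoff=0.5):
    """Return (p_hat, q0_hat, q1_hat, y_bar) for binary outcomes and fitted probabilities."""
    n = len(y)
    n1 = sum(y)   # number of observations with y_i = 1
    n0 = n - n1   # number of observations with y_i = 0
    # Predict 1 when the fitted probability exceeds the cutoff, 0 otherwise
    predicted = [1 if f > cutoff else 0 for f in fitted]
    # m0, m1: correctly predicted observations within each outcome group
    m0 = sum(1 for yi, pi in zip(y, predicted) if yi == 0 and pi == 0)
    m1 = sum(1 for yi, pi in zip(y, predicted) if yi == 1 and pi == 1)
    return (m0 + m1) / n, m0 / n0, m1 / n1, n1 / n


# Hypothetical outcomes and fitted probabilities from a linear probability model
y = [0, 0, 0, 0, 0, 1, 1, 1]
fitted = [0.1, 0.2, 0.6, 0.4, 0.3, 0.7, 0.45, 0.9]

p_hat, q0_hat, q1_hat, y_bar = prediction_shares(y, fitted)
weighted = (1 - y_bar) * q0_hat + y_bar * q1_hat

print(p_hat, weighted)
```

The two printed values agree up to floating-point rounding, because the weights (1 - y_bar) and y_bar are exactly the sample shares n0/n and n1/n used in the derivation.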
ECO 430-R – Applied Econometrics
Final Exam (300 Points)

Instructions: Please answer all of the following questions as best as possible. Points will be deducted for writing irrelevant statements. Partial credit will be awarded when it is earned. The point value for each question is in parentheses. The detail of your answer should correspond to the number of points the question is worth. There are a total of 28 questions. All answers should be your own. If tests contain answers deemed "too similar", credit will be divided equally amongst the number of common answers. This exam is due to me by Tuesday, May 5th at 4:30 pm, no exceptions.

The seminal article, "Colonial Origins of Comparative Development," by Daron Acemoglu, Simon Johnson and James Robinson, which appeared in 2001 in the American Economic Review, studied the impact of institutions on economic output. As they note, "Countries with better 'institutions', more secure property rights, and less distortionary policies will invest more in physical and human capital, and will use these factors more efficiently to achieve a greater level of income." To first investigate this claim they gather data on 110 countries from the World Development Indicators. Their model of interest is

$\ln(y_i) = \beta_0 + \beta_1 R_i + \beta_2 Lat_i + \beta_3 D_{Asia,i} + \beta_4 D_{Africa,i} + \beta_5 D_{Other,i} + \varepsilon_i$, (1)

where $y_i$ is income per capita in country $i$ in 1995, $R_i$ is a measure of protection against expropriation (a proxy for the quality of institutions in a country), taken as an average over the 1985-1995 period (measured on a scale from 0 to 10), $Lat_i$ is the latitude of country $i$ (measured as the distance from the equator, and scaled to lie between 0 and 1), $D_{Asia,i}$ is a dummy equal to one if country $i$ resides on the Asian subcontinent, $D_{Africa,i}$ is a dummy equal to one if country $i$ resides on the African continent, and $D_{Other,i}$ is a dummy equal to one if country $i$ resides on some other continent. Acemoglu et al.
(2001) estimate three different versions of (1), all of which appear in Table 1. Note that estimates for the intercept are not reported and common standard errors for each estimate appear underneath in parentheses. Further, in regression (3), the dummy for America is omitted to avoid the dummy variable trap.

Table 1. Dependent variable: ln y

                (1)       (2)       (3)
R               0.54      0.47      0.43
                (0.04)    (0.06)    (0.06)
Lat                       0.89      0.37
                          (0.49)    (0.51)
DAsia                               -0.62
                                    (0.19)
DAfrica                             -1.00
                                    (0.15)
DOther                              -0.25
                                    (0.20)
Observations    110       110       110
R2              0.62      0.63      0.73

Additionally, Acemoglu et al. (2001) consider a subset of their data, consisting of 64 countries for which they have access to other important variables that they use later on in their analysis. Estimating the same three models that appear in Table 1, the estimates for this reduced sample are in Table 2.

Table 2. Dependent variable: ln y

                (1)       (2)       (3)
R               0.52      0.47      0.41
                (0.06)    (0.06)    (0.06)
Lat                       1.60      0.92
                          (0.70)    (0.63)
DAsia                               -0.60
                                    (0.23)
DAfrica                             -0.90
                                    (0.17)
DOther                              -0.04
                                    (0.32)
Observations    64        64        64
R2              0.54      0.56      0.69

Using the smaller set of 64 countries, Acemoglu et al. (2001) estimate the model in (1) using an instrumental variables approach. Their instrument is the logarithm of European settler mortality in the 1800s. The IV estimates of the same three models that appear in Table 2 are in Table 3.

Table 3. Dependent variable: ln y

                (1)       (2)       (3)
R               0.94      1.00      1.10
                (0.16)    (0.22)    (0.46)
Lat                       -0.65     -1.20
                          (1.34)    (1.80)
DAsia                               -1.10
                                    (0.52)
DAfrica                             -0.44
                                    (0.42)
DOther                              -0.99
                                    (1.00)
Observations    64        64        64

Q.1. (5 each) Please interpret all of the coefficient estimates in model (3) in Table 1.

Q.2. (15) Suppose the impact of institutions on growth differed between African countries and the rest of the world. First, comment on why the model in equation (1) cannot capture this effect. Second, write out a model which can allow for this differential African effect. Third, develop a test that would allow you to statistically discern if this effect is actually supported by the data.

Q.3. (5) On average, what is the estimated difference in growth between African and Asian countries, ceteris paribus, from Table 1?

Q.4. (5) Suppose that instead of measuring latitude as absolute distance from the equator we measure it as a raw number. Explain how the estimates, standard errors and $R^2$ would change in model (2) in Table 1.

Q.5. (10) Suppose heteroskedasticity was a concern in Acemoglu et al.'s (2001) framework. Detail how you could test for the presence of heteroskedasticity using model (1) in Table 1.

Q.6. (10) Detail how you would test (including the appropriate regression you would estimate) the hypothesis that geographic location (Latitude) has the same impact on growth as institutions (R) for model (3) in Table 1.

Q.7. (5) Does model (3) nest model (2) in Table 1? Explain carefully.

Q.8. (15) Clearly the quality of a country's institutions is a difficult variable to measure, and $R_i$, taken as an exact measure of institutions, is likely measured with error. In this setting, comment on the implication that $\hat{\beta}_1$ is likely to have more variation due to the measurement error inherent in measuring institutional quality and what this implies for testing $H_0: \beta_1 = 0$ against a two-sided alternative. Would this same implication be true if we thought of $R_i$ as a proxy variable instead of being measured with error?

Q.9. (15) Recall that for $R_i$ to be a valid proxy variable, it must be the case that $E(IQ|R, Lat) = E(IQ|R)$, where $IQ$ is unobserved institutional quality. Comment on the precise meaning of this condition and whether this condition is likely to hold. Hint: Read Acemoglu, Johnson and Robinson.

Q.10. (10) Can the $R^2$s in Table 1 be compared to the corresponding $R^2$s in Table 2? Why or why not?

Q.11. (5) Are the estimates from model (2) in Table 1 closer to the truth than the estimates from model (2) in Table 2? Explain carefully.

Q.12.
(10) Notice that the impact of geographic location, as measured through absolute latitude, has a much larger effect on economic output in models (2) and (3) in Table 2 compared to Table 1. Can you compare the differences in $\hat{\beta}_2$ in Tables 1 and 2? Why or why not?

Q.13. (10) The effect of geographic location has gone from being statistically significant in model (2) in Tables 1 and 2 to being statistically insignificant in model (3) in both tables at the 5% level. Comment on why this is not that surprising.

Q.14. (10) In Table 2 in Acemoglu et al. (2001) they report estimates from model (1) but use a different measure of output per capita; they use output per worker in 1988 as opposed to output per capita in 1995. Their estimates for $\beta_1$ are roughly similar, but $R^2$ is about 10% lower. Does this suggest that this measure of country output is inferior to their measure?

Q.15. (10) Detail two reasons why we should not interpret $\beta_1$ as a causal effect. Hint: Read Acemoglu, Johnson and Robinson.

Q.16. (5) If OLS estimation of $\beta_1$ is likely to suffer from endogeneity bias, comment on the likely direction of this bias.

Q.17. (15) Acemoglu et al. (2001) suggest using settler mortality as an instrument for institutions to estimate a causal relationship between economic growth and institutions. However, suppose that no endogeneity bias existed, but instead an omitted interaction between institutions and geographic location arose. First, explain how this omitted nonlinearity could lead to biased estimates. Second, construct a model that incorporates this interaction and discuss if this model has a ceteris paribus interpretation for the effect of institutions on growth. Lastly, detail how you could statistically test whether this specific interaction belongs in the model.

Q.18. (15) Consider the hypothesis that institutions differ from a continental standpoint. Using model (1) from Table 1, detail how you could test this hypothesis at the 10% level.
Further, suppose that if you were to conduct this test you obtained a p-value of 0.023. Discuss what this implies about the estimates in model (1) in Table 1 as it pertains to bias and variance.

Q.19. (5) Suppose you wished to test the continental hypothesis on model (1) from Table 2 instead. Why might this not be such a good idea from a practical standpoint? Hint: Think about degrees of freedom.

Q.20. (10) Detail the two main conditions that settler mortality must satisfy for it to be considered a valid instrument. Be precise.

Q.21. (5 each) Please interpret both coefficient estimates in model (2) in Table 3.

Q.22. (10) Can we say that the estimates of $\beta_1$ in Table 3 are closer to the true value of $\beta_1$ than those in either Table 1 or 2? Why or why not?

Q.23. (15) Note that in models (1)-(3) in Table 1, the estimates of $\beta_1$ are larger than their counterparts in Table 2. Comment on the theoretical implications that these larger estimates carry. Hint: Read Acemoglu, Johnson and Robinson.

Q.24. (15) For models (1)-(3) in Table 3, the standard errors for $\hat{\beta}_1^{IV}$ are larger than their counterparts in Table 2. Detail why this is not that surprising to you.

Q.25. (10) The estimates of $\beta_2$ which appear in Table 3 now have the wrong sign and are statistically insignificant with p-values larger than 0.15. Comment on the implication of this as it pertains to correlation between institutional quality and settler mortality.

Q.26. (10) If you disagreed with Acemoglu et al.'s (2001) assertion that early settler mortality was a valid instrument for current institutional quality, explain how you are hamstrung from a statistical standpoint.

Q.27. (10) McArthur and Sachs (2001) suggest that the 'disease environment' and health characteristics of a country belong in the Acemoglu et al. (2001) model. If disease environment was positively correlated with institutional quality and had a negative impact on growth, comment on the likely bias of Acemoglu et al.'s (2001) estimates. Further, discuss, given the magnitude found, why this may not be a valid concern pertaining to ruling out institutions as a driver of cross-country economic growth.

Q.28. (10) Having considered the estimates appearing in Tables 1 through 3, detail to the best of your ability the likely impact that institutional quality has on economic growth.
Be careful not to overstep your bounds, but to also say something with economic and statistical substance.