Videos and questions for Chapter 3 of the course "Empirical Economics with R" at Ulm University (taught by Sebastian Kranz)
Will the code lm(exam ~ homework) in the video run without error?
Will the code lm(y ~ x) in the video run without error?
If we increase the sample size n from 20 to 80, how will the standard deviation (standard error) of our OLS estimator ˆβ1 approximately change?
How will the standard error of ˆβ1 change if we increase the standard deviation sd(u) of our error term?
How will the standard error of ˆβ1 change if we increase the standard deviation sd(x) of the explanatory variable x?
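The three questions above can be explored with a small Monte Carlo experiment. The following is only a sketch under assumed values: the function name sim_sd_beta1, the true coefficients (β0 = 1, β1 = 2), and the number of replications are illustrative choices, not taken from the lecture. It repeatedly draws samples, computes the OLS slope, and reports how much the estimates vary across samples.

```r
# Monte Carlo sketch: how the spread of the OLS slope estimate depends on
# n, sd(u) and sd(x). All parameter values are illustrative assumptions.
set.seed(1)

sim_sd_beta1 = function(n, sd_u = 1, sd_x = 1, reps = 2000) {
  beta1_hat = replicate(reps, {
    x = rnorm(n, 0, sd_x)
    u = rnorm(n, 0, sd_u)
    y = 1 + 2 * x + u        # assumed true model: beta0 = 1, beta1 = 2
    cov(x, y) / var(x)       # OLS slope of the simple regression
  })
  sd(beta1_hat)              # spread of the estimates across samples
}

sd_n20 = sim_sd_beta1(n = 20)
sd_n80 = sim_sd_beta1(n = 80)
sd_u2  = sim_sd_beta1(n = 20, sd_u = 2)
sd_x2  = sim_sd_beta1(n = 20, sd_x = 2)

round(c(n20 = sd_n20, n80 = sd_n80,
        sd_u_doubled = sd_u2, sd_x_doubled = sd_x2), 3)
```

Comparing each entry with the n20 baseline shows the direction and rough size of the change when one ingredient is varied at a time.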
Assume we have only a single sample with n observations. Can we obtain a sensible estimate of the standard error of ˆβ1 from that single sample?
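As a hedged illustration of what such a single-sample estimate looks like in R, the sketch below uses simulated data (the data generating process is an assumption for illustration only). It compares the standard error reported by summary() for an lm fit with the textbook formula for the simple regression under homoskedastic errors, which divides the estimated error standard deviation by the square root of the sum of squared deviations of x.

```r
# Sketch: standard error of the slope estimated from ONE sample.
# The simulated data below are an illustrative assumption.
set.seed(2)
n = 80
x = rnorm(n)
y = 1 + 2 * x + rnorm(n)

reg = lm(y ~ x)

# Standard error as reported by R
se_lm = summary(reg)$coefficients["x", "Std. Error"]

# Same quantity computed by hand (homoskedastic errors assumed)
u_hat = resid(reg)
sigma_hat = sqrt(sum(u_hat^2) / (n - 2))       # estimated sd of the error term
se_manual = sigma_hat / sqrt(sum((x - mean(x))^2))

c(se_lm = se_lm, se_manual = se_manual)
```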
As the sample size grows large, will the estimate ˆβ1 converge (in probability) to the true value β1 in our example? Make a guess.
We found the following regression result:
Does that mean we are 95% confident that submitting an additional homework problem set causes an increase in the average exam score between 0.22 and 0.679?
We run the following simulation:
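n = 10000                                # sample size
alpha0 = 0; alpha1 = 1; alpha2 = 1       # true coefficients
u = rnorm(n,0,1)                         # error term
x2 = rnorm(n,0,1)                        # second explanatory variable
x1 = x2+rnorm(n,0,1)                     # x1 is x2 plus independent noise
y = alpha0 + alpha1*x1 + alpha2*x2 + u   # outcome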
Does alpha1 = 1 measure the causal effect of x1 on y in our simulation?
One can draw the causal relationships in the data generation process above as follows:
Now assume we estimate the short regression:
y = β0 + β1·x1 + ε
For a large sample size, will the OLS estimator ˆβ1 of the short regression converge to the causal effect α1 of x1 on y in our example? In other words, is β∗1 = α1 in the short regression?
Still assume that we generate the data with our simulation and estimate the short regression:
y = β0 + β1·x1 + ε
Is the OLS estimator ˆβ1 a consistent estimator of the causal effect α1 = 1 of x1 on y in our example?
Now assume we would estimate the long regression:
y = β0 + β1·x1 + β2·x2 + η
Is the OLS estimator ˆβ1 of the long regression a consistent estimator of the causal effect α1=1 of x1 on y in our example?
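One way to check your answers to the short and long regression questions is to run both regressions on the simulated data. The sketch below repeats the simulation from above; the seed and the variable names beta1_short, beta1_long and ovb_limit are additions for illustration. The last line evaluates the standard omitted variable bias expression α1 + α2·cov(x1, x2)/var(x1) on the sample.

```r
# Sketch: short vs. long regression on the simulated data.
# The seed and the helper names are illustrative assumptions.
set.seed(3)
n = 10000
alpha0 = 0; alpha1 = 1; alpha2 = 1
u = rnorm(n, 0, 1)
x2 = rnorm(n, 0, 1)
x1 = x2 + rnorm(n, 0, 1)
y = alpha0 + alpha1 * x1 + alpha2 * x2 + u

# Slope on x1 when x2 is omitted (short) and when it is included (long)
beta1_short = coef(lm(y ~ x1))["x1"]
beta1_long  = coef(lm(y ~ x1 + x2))["x1"]

# Omitted variable bias expression evaluated with sample moments
ovb_limit = alpha1 + alpha2 * cov(x1, x2) / var(x1)

round(c(short = unname(beta1_short),
        long = unname(beta1_long),
        ovb_limit = ovb_limit), 3)
```

Comparing the three numbers with α1 = 1 shows which of the two regressions recovers the causal effect in this data generating process.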
Assume we again estimate the short regression y = β0 + β1·x1 + ε. Is the OLS estimator ˆβ1 a consistent estimator of β1?
What do you think is the opinion of your lecturer?
Make a guess: what happens to the bias of ˆβ1 if we add the noisy proxy (compared to the short regression without the proxy)?
How will the bias of ˆβ1 in our regression with the proxy variable change if the standard deviation of the noise in the proxy variable goes down? Make a guess.
How will the bias of ˆβ1 in our regression change if we reduce the standard deviation of the sources of exogenous variation in x1? Make a guess.
What about the bias of ˆβ1 if we have a very precise proxy but also almost no exogenous variation in x1?
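After making your guesses, the proxy variable questions can be checked with a simulation. The sketch below is an assumption-laden illustration and not the lecture's own code: the function sim_bias, its arguments sd_exo and sd_noise, and the data generating process (a confounder x2, a regressor x1 equal to x2 plus exogenous variation, and a proxy equal to x2 plus measurement noise) are all hypothetical choices.

```r
# Sketch: bias of the x1 coefficient with and without a noisy proxy for x2.
# The data generating process and all names are illustrative assumptions.
set.seed(4)

sim_bias = function(sd_exo = 1, sd_noise = 1, use_proxy = TRUE, n = 100000) {
  x2 = rnorm(n)                          # unobserved confounder
  x1 = x2 + rnorm(n, 0, sd_exo)          # sd_exo: exogenous variation in x1
  proxy = x2 + rnorm(n, 0, sd_noise)     # sd_noise: noise in the proxy
  y = 1 * x1 + 1 * x2 + rnorm(n)         # true causal effect of x1 is 1
  reg = if (use_proxy) lm(y ~ x1 + proxy) else lm(y ~ x1)
  unname(coef(reg)["x1"]) - 1            # estimate minus true effect
}

bias = c(
  no_proxy                 = sim_bias(use_proxy = FALSE),
  noisy_proxy              = sim_bias(),
  precise_proxy            = sim_bias(sd_noise = 0.1),
  little_exo_variation     = sim_bias(sd_exo = 0.1),
  precise_proxy_little_exo = sim_bias(sd_exo = 0.1, sd_noise = 0.1)
)
round(bias, 2)
```

With n this large, each entry is close to the asymptotic bias for that scenario, so the entries can be compared directly.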
Assume intelligence is the main confounder and we have some sources of exogenous variation. How would we expect the estimator ˆβ1 for the causal effect of edu to change if we add the IQ score as a control variable?
Here are the results of our 3 regressions:
Dependent variable: ecolbs

|              | (1)       | (2)       | (3)       |
|--------------|-----------|-----------|-----------|
| ecoprc       | -0.845**  | -2.926*** | -2.949*** |
|              | (0.331)   | (0.588)   | (0.593)   |
| regprc       |           | 3.029***  | 3.060***  |
|              |           | (0.711)   | (0.715)   |
| male         |           |           | -0.108    |
|              |           |           | (0.227)   |
| inseason     |           |           | -0.176    |
|              |           |           | (0.206)   |
| hhsize       |           |           | 0.053     |
|              |           |           | (0.069)   |
| age          |           |           | 0.001     |
|              |           |           | (0.007)   |
| faminc       |           |           | 0.003     |
|              |           |           | (0.003)   |
| Constant     | 2.388***  | 1.965***  | 1.703***  |
|              | (0.372)   | (0.380)   | (0.591)   |
| Observations | 660       | 660       | 660       |
| R2           | 0.010     | 0.036     | 0.041     |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Here are 5 quiz questions related to these regression results. You can find more detailed explanations of the answers in the lecture slides.
a) Part 1: If we have a well-randomized experiment, is the OLS estimator in our second regression ecolbs = β0 + β1·ecoprc + β2·regprc + u consistent?
a) Part 2: Are the signs ˆβ1 < 0 and ˆβ2 > 0 consistent with what we would expect from economic theory?
b) If we don't add regprc (see first regression), does the OLS estimator seem to be biased? If yes, in which direction?
c) Looking at the regression results, what is the likely sign of the correlation between the two prices ecoprc and regprc in the experiment?
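A small worked calculation can help with question c). For OLS regressions with an intercept on the same sample, the short-regression slope equals the long-regression slope plus the coefficient of the omitted variable times δ, where δ is the slope from regressing the omitted variable (regprc) on the included one (ecoprc). The sketch below plugs in the rounded coefficients from columns (1) and (2) of the table and solves for δ; the variable names are illustrative and the result is only as precise as the rounded table values.

```r
# Sketch: back out the implied slope of regprc on ecoprc from the table.
# Uses the in-sample identity  b1_short = b1_long + b2_long * delta.
b1_short = -0.845   # ecoprc coefficient, regression (1)
b1_long  = -2.926   # ecoprc coefficient, regression (2)
b2_long  =  3.029   # regprc coefficient, regression (2)

delta = (b1_short - b1_long) / b2_long
round(delta, 3)
```

Since δ has the same sign as the covariance between the two prices, its sign indicates the sign of their correlation in the experiment.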
d) Assume you were not sure whether the prices were indeed correctly randomized over households, i.e. chosen independently of household characteristics. Which of the following results suggest that we indeed had proper randomization?
Great, you have finished the video lectures for this quite long Chapter 3!
Maybe after a short break, it is a good time to start with the RTutor problem set of this chapter.