Videos and questions for Chapter 1c of the course "Market Analysis with Econometrics and Machine Learning" at Ulm University (taught by Sebastian Kranz)

Multiple Linear Regression with Control Variables

Consider the model from the video above and assume prices \(p_t\) are uncorrelated with \(u_t\) but positively correlated with \(s_t\). Do we get a consistent estimator \(\hat \beta_1\) if we estimate the short regression \(q_t = \beta_0 + \beta_1 p_t + \varepsilon_t\) via OLS?

Assume we estimate the multiple regression \[q_t = \beta_0 + \beta_1 p_t + \beta_2 s_t + u_t\] where \(p_t\) is uncorrelated with the error term \(u_t\) but correlated with the other explanatory variable \(s_t\). Is \(p\) then exogenous or endogenous?

Now assume we change the model in the video such that \(p\) is uncorrelated with both \(u\) and \(s\). Assume we again estimate the short regression \(q_t = \beta_0 + \beta_1 p_t + \varepsilon_t\). Is the estimator \(\hat \beta_1\) now consistent?

Assume you have a perfectly randomized pricing experiment. Do you need to add control variables to your regression to consistently estimate the causal effect of price on expected demand?

Nevertheless, control variables are often added in randomized experiments: first, because perfect randomization is often not possible, and second, because control variables can reduce the standard errors of your estimator.

If you add control variables to the regression, you can also run non-perfectly randomized price experiments. For example, if you already know that it is more profitable to set higher prices on sunny days, you may want to run an experiment where you set higher prices on sunny days than on non-sunny days but still add some random price fluctuation each day. If you then control in the regression for sunny and non-sunny days, you can still consistently estimate the causal effect of prices.
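The following simulation sketches this idea with made-up numbers (all coefficients and variable names are hypothetical): prices are systematically higher on sunny days, yet controlling for sunny days recovers the true causal price effect.

```r
# Simulated sketch: prices are set higher on sunny days, with some random
# day-to-day variation. The true causal price effect is -1.
set.seed(1)
T <- 10000
sunny <- rbinom(T, 1, 0.5)           # 1 on sunny days, 0 otherwise
p <- 10 + 2*sunny + runif(T, 0, 2)   # higher prices on sunny days + random fluctuation
u <- rnorm(T)
q <- 100 - 1*p + 5*sunny + u         # demand is also higher on sunny days

coef(lm(q ~ p))["p"]          # biased: omits sunny, which shifts both p and q
coef(lm(q ~ p + sunny))["p"]  # close to the true effect -1
```

The short regression is biased because sunny days shift both price and demand; adding the control removes that correlation between price and the error term.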

Matrix formula for multiple linear regression

Take a look at the lecture slides for chapter 1c to see that the OLS estimator \(\hat \beta\) can be computed with the same matrix formula that we introduced for the simple linear regression. One only needs to add one column to the matrix \(X\) for each additional explanatory variable.
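As a quick sketch with simulated data (hypothetical coefficients), here is the matrix formula \(\hat \beta = (X'X)^{-1}X'y\) computed by hand and compared with `lm`:

```r
# OLS via the matrix formula beta_hat = (X'X)^{-1} X'y on simulated data
set.seed(1)
n <- 1000
p <- runif(n, 5, 15)
s <- rnorm(n)
q <- 50 - 2*p + 3*s + rnorm(n)

X <- cbind(1, p, s)                        # intercept column plus one column per regressor
beta_hat <- solve(t(X) %*% X, t(X) %*% q)  # solve the normal equations
beta_hat
coef(lm(q ~ p + s))                        # lm gives the same estimates
```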

For more advanced intuition about multiple regression, the following regression anatomy result is useful.

Regression Anatomy

Illustration of Regression Anatomy in R

On the slides you find more details on and an interpretation of the regression anatomy. If you are new to econometrics, this may all sound a bit abstract. But if you already have some experience, the regression anatomy can be really helpful for a better intuitive understanding of what it means to add a control variable.
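A minimal sketch of the regression anatomy result with simulated data (hypothetical coefficients): the coefficient on \(p\) in the multiple regression equals the coefficient from regressing \(q\) on the residual of a regression of \(p\) on the control \(s\).

```r
# Regression anatomy: residualize p on s, then regress q on that residual
set.seed(1)
n <- 1000
s <- rnorm(n)
p <- 10 + 0.5*s + runif(n)    # p is correlated with s
q <- 50 - 2*p + 3*s + rnorm(n)

p.tilde <- resid(lm(p ~ s))       # the part of p uncorrelated with s
coef(lm(q ~ p + s))["p"]          # coefficient from the multiple regression
coef(lm(q ~ p.tilde))["p.tilde"]  # identical coefficient
```

The two coefficients agree exactly (not just asymptotically), which is why adding a control can be interpreted as using only the variation in \(p\) that is unrelated to \(s\).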

But let's move on...

Which control variables to include...

And finally an example that is not about estimating ice cream demand!

Here are some quiz questions about which control variables you would include in a regression.

Assume you want to study whether and how much wage discrimination based on gender exists. Would you add channel variables, like having studied a quantitative subject, to your regression?

Assume you want to analyse the effect of obtaining a university degree on wages. Would you add as a control variable whether people later hold a management position in their firm?

Assume you want to analyse the effect of obtaining a university degree on wages. Would you add high-school grades as a control variable?

Video remark on the difficulty of estimating the causal effect of a university degree on wages:

Assume a producer sells his product at a producer price \(p^p\) to stores and the stores set a retail price \(p^r\) for final customers. You are the producer and want to estimate how the demand for your product depends on your producer price \(p^p\). Should you add the retail price \(p^r\) as a control variable to your regression?

And a last quiz on this page...

Assume you know that prices \(p\) depend on variables describing the demand conditions and on costs \(c\). You also know that costs are uncorrelated with demand conditions. You want to estimate a demand function. Would you add costs \(c\) as control variable to your regression?

Controlling by running separate regressions for subsets of data
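One can sketch this approach with simulated data (hypothetical coefficients): for a binary variable \(s\), instead of adding \(s\) as a control, we run one regression per subset of the data.

```r
# Separate regressions for the two subsets defined by a binary variable s
set.seed(1)
n <- 2000
s <- rbinom(n, 1, 0.5)
p <- 10 + s + runif(n)
q <- 50 - (2 + s)*p + 3*s + rnorm(n)   # true price effect: -2 if s=0, -3 if s=1

coef(lm(q ~ p, subset = s == 0))["p"]  # close to -2
coef(lm(q ~ p, subset = s == 1))["p"]  # close to -3
```

Running separate regressions allows all coefficients, not just the intercept, to differ across the two groups.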

Heterogeneous Effects and Interaction Terms

R Example about Estimating Heterogeneous Effects

Consider the regression with interaction effects:

\[q = \beta_0 + \beta_1 p + \beta_2 s + \beta_3 p \cdot s + \varepsilon\]

To what will the estimated coefficient \(\hat \beta_1\) in the regression with interaction terms be equal?
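In R, the interaction regression above can be estimated with the `p*s` formula shorthand. A sketch with simulated data (hypothetical coefficients):

```r
# Interaction regression q ~ p + s + p:s, written compactly as q ~ p*s
set.seed(1)
n <- 5000
s <- rbinom(n, 1, 0.5)
p <- runif(n, 8, 12)
q <- 50 - 2*p + 3*s - 1*p*s + rnorm(n)  # price effect: -2 if s=0, -3 if s=1

coef(lm(q ~ p*s))  # "p" estimates the price effect at s = 0, "p:s" the difference
```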

Non-linear Effects

Assume we estimate a regression in which an explanatory variable \(x\) has a quadratic effect: \[y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon\] Is this still called a linear regression?
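Such a quadratic specification can be estimated with `lm` by adding the squared term via `I()`. A sketch with simulated data (hypothetical coefficients):

```r
# Quadratic effect of x: the model is non-linear in x but linear in the parameters
set.seed(1)
n <- 1000
x <- runif(n, 0, 4)
y <- 1 + 2*x - 0.5*x^2 + rnorm(n)

coef(lm(y ~ x + I(x^2)))  # I(x^2) adds the squared term as an extra regressor
```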

Instrumental Variable Estimation

R Example Instrumental Variable Estimation via Two-Stage Least Squares
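A minimal sketch of two-stage least squares with simulated demand data (hypothetical coefficients): price is endogenous because it is correlated with the demand shock, and cost serves as instrument.

```r
# 2SLS by hand: first stage predicts the endogenous price from the instrument
set.seed(1)
n <- 10000
cost <- runif(n, 0, 10)            # cost shifter, uncorrelated with the demand shock
eps <- rnorm(n)
p <- 5 + 0.5*cost + 0.5*eps        # price reacts to costs AND to the demand shock
q <- 100 - 2*p + eps               # true causal price effect: -2

coef(lm(q ~ p))["p"]               # OLS is inconsistent: p correlates with eps
p.hat <- fitted(lm(p ~ cost))      # first stage: keep only cost-driven price variation
coef(lm(q ~ p.hat))["p.hat"]       # second stage: close to -2
```

Note that the second-stage standard errors reported by `lm` are not correct for 2SLS; in practice one uses a dedicated IV routine, as shown further below.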

Conditions for a valid instrumental variable

An instrumental variable (in the example above, costs) must satisfy two conditions: an exogeneity condition and a relevance condition.

What do you think is the exogeneity condition that an instrumental variable must satisfy?

What do you think is the relevance condition that an instrumental variable must satisfy?

Which condition of an instrument \(z\) for an endogenous variable \(x\) could you check statistically with a real world data set?

Is the exogenous explanatory variable \(s\) in the demand function \[q = \beta_0 + \beta_1 p + \beta_2 s + \varepsilon\] a valid instrument for the endogenous price in this demand function?

Directly estimating IV Regression with ivreg
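The `ivreg` function from the AER package estimates the IV regression in a single call and reports correct standard errors; the instruments are listed after the `|` in the formula. A sketch with simulated demand data as above (hypothetical coefficients; requires `install.packages("AER")`):

```r
# IV regression in one call with ivreg from the AER package
library(AER)
set.seed(1)
n <- 10000
cost <- runif(n, 0, 10)
eps <- rnorm(n)
p <- 5 + 0.5*cost + 0.5*eps
q <- 100 - 2*p + eps          # true causal price effect: -2

# formula: dependent ~ explanatory variables | instruments and exogenous variables
iv <- ivreg(q ~ p | cost)
coef(iv)["p"]                 # close to the true causal effect -2
```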