Market Analysis 2a

Videos and questions for Chapter 2a of the course "Market Analysis with Econometrics and Machine Learning" at Ulm University (taught by Sebastian Kranz)

Predicting y vs estimating \(\beta\)

Can a linear regression model also be used for pure prediction?

yes

Machine Learning

Can machine learning methods like random forests or lasso regression only be used for prediction problems, or can they also help to estimate causal effects?

Only for prediction

They can also help to estimate causal effects

Polynomial Example

Which model will make the best predictions for the training data set?

The linear model 1

The quadratic model 2

The octic model 3

Which model will make the worst predictions for the test data set? Make a guess...

The linear model 1

The quadratic model 2

The octic model 3
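The comparison between the three polynomial models can be tried out with a short simulation. The following is only a sketch: the data-generating process, the sample size and the 50/50 split are assumptions made for illustration and are not the ones used in the video.

```r
# Simulated data (assumed data-generating process, for illustration only)
set.seed(1)
n <- 40
x <- runif(n, -2, 2)
y <- 1 + x - 0.5 * x^2 + rnorm(n, sd = 1)
dat <- data.frame(x = x, y = y)

# Random split into a training and a test data set
train_rows <- sample(n, n / 2)
train <- dat[train_rows, ]
test  <- dat[-train_rows, ]

rmse <- function(y_hat, y) sqrt(mean((y_hat - y)^2))

# Fit a polynomial of degree 1, 2 and 8 on the training data only
degrees <- c(linear = 1, quadratic = 2, octic = 8)
res <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(train = rmse(predict(fit, train), train$y),
    test  = rmse(predict(fit, test),  test$y))
})

# Rows: data set used for evaluation, columns: model
print(round(res, 3))
```

Because the three models are nested, the training error cannot increase with the polynomial degree. The test error carries no such guarantee, which is what the second question asks about.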

Root mean squared error

Instead of using the MSE to assess prediction accuracy on the test sample, one often uses the so-called root mean squared error (RMSE). What is the formula for the sample RMSE? Make a guess:

\(RMSE = \frac {1} {n} \sum_{i=1}^n \sqrt{(\hat y_i - y_i)^2}\)

\(RMSE = \sqrt{\frac {1} {n} \sum_{i=1}^n (\hat y_i - y_i)^2}\)
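To see how the two candidate formulas differ, one can evaluate both on a small made-up example. The numbers below are hypothetical and chosen only so that the arithmetic is easy to follow.

```r
# Hypothetical outcomes and predictions, chosen only for illustration
y     <- c(1, 2, 3, 4)
y_hat <- c(1, 2, 3, 8)

# First candidate: the square root is taken inside the sum
cand_1 <- mean(sqrt((y_hat - y)^2))

# Second candidate: the square root is taken of the mean squared error
cand_2 <- sqrt(mean((y_hat - y)^2))

# sqrt(e^2) equals |e|, so the first candidate is the mean absolute error
mae <- mean(abs(y_hat - y))

print(c(candidate_1 = cand_1, candidate_2 = cand_2, mae = mae))
# candidate_1 = 1, candidate_2 = 2, mae = 1
```

The single large error of 4 has a larger weight in the second expression than in the first.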

Lasso Regression

Correction: In the video we have a wrong sign in front of the \(\lambda\) of the lasso minimization problem. It must be "+" instead of "-". The lasso estimator solves:

\[\min_{\hat \beta} \sum_{i=1}^n {\hat \varepsilon_i(\hat \beta)^2} + \lambda \sum_{k=1}^K {|\hat \beta_k|}\]

(There is a similar error later in the videos for ridge and elastic net regression.)

Assume we estimate a lasso regression with regularization parameter \(\lambda=0\). Would the lasso estimator then be identical to the OLS estimator?

yes
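The minimization problem stated above can be written down directly as an R function. This is a minimal sketch under two assumptions: the matrix X contains no intercept column, and the simulated data are made up for illustration. The function name `lasso_objective` is hypothetical.

```r
# Lasso objective: sum of squared residuals plus lambda times the
# sum of absolute coefficients (assumption: X has no intercept column)
lasso_objective <- function(beta, X, y, lambda) {
  resid <- as.vector(y - X %*% beta)
  sum(resid^2) + lambda * sum(abs(beta))
}

# Simulated data, for illustration only
set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)
y <- as.vector(X %*% c(2, 0, -1) + rnorm(200))

# OLS without intercept
ols <- lm(y ~ X - 1)
beta_ols <- coef(ols)

# With lambda = 0 the penalty term vanishes and the objective
# is just the sum of squared residuals that OLS minimizes
obj_0 <- lasso_objective(beta_ols, X, y, lambda = 0)
rss   <- sum(resid(ols)^2)
print(c(objective_lambda_0 = obj_0, ols_rss = rss))

# With lambda > 0 the same coefficients get a strictly larger value
obj_1 <- lasso_objective(beta_ols, X, y, lambda = 1)
```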

Note that the R code of the simulation studied in the following videos is available on Moodle. (This time there is no need to hide the code since the RTutor problem set is fairly different.)

Why do we only see two estimated coefficients in our output of the lasso model?

All other estimated lasso coefficients have the value 0 and are omitted.

The tidy function shows by default only 2 coefficients.

We made an error when constructing the matrix X.

What will be the outcome if we estimate the lasso model again with a lower value of lambda (just 0.1 instead of 1 as before)?

Now it could be that even fewer coefficients are non-zero.

We should get again two non-zero coefficients.

It is likely that we now get more than two non-zero coefficients.

Which coefficients will be selected for a very small lambda close to 0?

Typically all coefficients that are not equal to zero in the true model.

Typically all coefficients including those that are zero in the true model.
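How the number of non-zero coefficients reacts to lambda can be explored with a small hand-written lasso solver. The sketch below is not the code from the videos: it uses plain coordinate descent on the objective stated earlier (sum of squared residuals plus lambda times the sum of absolute coefficients, no intercept), and the simulated data, the helper names `soft_threshold` and `lasso_cd`, and the lambda grid are all assumptions. Software packages may scale the sum of squared residuals differently, so the lambda values here are not comparable with those in the videos.

```r
# Soft-thresholding operator: shrinks a towards 0 by t
soft_threshold <- function(a, t) sign(a) * pmax(abs(a) - t, 0)

# Coordinate descent for: sum(resid^2) + lambda * sum(abs(beta))
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  K <- ncol(X)
  beta <- rep(0, K)
  z <- colSums(X^2)
  for (iter in seq_len(n_iter)) {
    for (k in seq_len(K)) {
      # Partial residual that excludes the contribution of regressor k
      r_k <- y - X[, -k, drop = FALSE] %*% beta[-k]
      rho <- sum(X[, k] * r_k)
      # Exact minimizer of the objective in beta[k], holding the rest fixed
      beta[k] <- soft_threshold(rho, lambda / 2) / z[k]
    }
  }
  beta
}

# Simulated data: only the first 2 of 10 true coefficients are non-zero
set.seed(1)
n <- 200
K <- 10
beta_true <- c(3, -2, rep(0, K - 2))
X <- matrix(rnorm(n * K), ncol = K)
y <- as.vector(X %*% beta_true + rnorm(n))

# Count the non-zero estimates for a large, a medium and a tiny lambda
lambdas <- c(200, 20, 0.01)
n_nonzero <- sapply(lambdas, function(l) sum(lasso_cd(X, y, l) != 0))
names(n_nonzero) <- paste0("lambda_", lambdas)
print(n_nonzero)
```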

Here you see again the RMSE we computed for the 4 considered values of lambda:

unlist(rmse.li)
lambda_0.001  lambda_0.01   lambda_0.1   lambda_0.5 
 0.010079667  0.010153556  0.005719416  0.002839564 

Which of the 4 values of lambda would be the best according to the results shown above?

0.001

0.01

0.1

0.5

Did we actually compute everything correctly or did we make some error?

Everything is correct.

We should predict on the training data set instead of the test data set.

We made a computation error in the formula for the RMSE.

We made an error in the code to estimate the models.

Assume now we estimate our models with a much smaller training data set (only 100 observations). What do you think will now be the best regularization parameter for an optimal out-of-sample prediction?

Now the smallest lambda is probably clearly the best regularization parameter.

It should remain the same one as before.

Probably a larger lambda is now better.

Wow, a lot of videos and questions about the lasso. You get the Cowboy award! Only one more video to go. Ride on to the next section...

Parameter Tuning & Cross Validation

Below is a video giving a brief summary of parameter tuning and cross validation. Please read the lecture slides carefully for more details.
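As a rough illustration of the idea, the following sketch tunes lambda by 5-fold cross validation. It is not taken from the slides: the lasso solver is a simple hand-written coordinate descent for the objective "sum of squared residuals plus lambda times the sum of absolute coefficients" without intercept, and the simulated data, the lambda grid, the number of folds and all helper names are assumptions.

```r
soft_threshold <- function(a, t) sign(a) * pmax(abs(a) - t, 0)

# Simple coordinate descent lasso (no intercept), for illustration only
lasso_cd <- function(X, y, lambda, n_iter = 100) {
  beta <- rep(0, ncol(X))
  z <- colSums(X^2)
  for (iter in seq_len(n_iter)) {
    for (k in seq_len(ncol(X))) {
      r_k <- y - X[, -k, drop = FALSE] %*% beta[-k]
      beta[k] <- soft_threshold(sum(X[, k] * r_k), lambda / 2) / z[k]
    }
  }
  beta
}

rmse <- function(y_hat, y) sqrt(mean((y_hat - y)^2))

# Simulated training data: only 2 of 10 true coefficients are non-zero
set.seed(1)
n <- 100
K <- 10
X <- matrix(rnorm(n * K), ncol = K)
y <- as.vector(X %*% c(3, -2, rep(0, K - 2)) + rnorm(n))

# Assign every observation randomly to one of the folds
lambdas <- c(0.01, 1, 10, 50, 200)
n_folds <- 5
fold <- sample(rep(seq_len(n_folds), length.out = n))

# For each lambda: estimate on all folds but one, predict the left-out
# fold, and average the resulting RMSE over the folds
cv_rmse <- sapply(lambdas, function(lambda) {
  fold_rmse <- sapply(seq_len(n_folds), function(f) {
    val <- fold == f
    beta <- lasso_cd(X[!val, , drop = FALSE], y[!val], lambda)
    rmse(as.vector(X[val, , drop = FALSE] %*% beta), y[val])
  })
  mean(fold_rmse)
})
names(cv_rmse) <- paste0("lambda_", lambdas)
print(round(cv_rmse, 3))

# Candidate with the lowest cross-validated RMSE
best_lambda <- lambdas[which.min(cv_rmse)]
print(best_lambda)
```

The same fold assignment is used for every lambda so that the candidates are compared on identical splits. The test data set is not touched during tuning.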

Ridge Regression and Elastic Net

Also take a look at the lecture slides on ridge regression, which is a variant of lasso regression.
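For reference, here is the ridge minimization problem in the same notation as the lasso problem, with the "+" sign in front of \(\lambda\). This is the standard textbook form and is not copied from the slides. Ridge regression penalizes the squared coefficients instead of their absolute values:

\[\min_{\hat \beta} \sum_{i=1}^n {\hat \varepsilon_i(\hat \beta)^2} + \lambda \sum_{k=1}^K {\hat \beta_k^2}\]

Elastic net combines both penalties. One common way to write it uses a mixing weight \(\alpha \in [0,1]\); this parametrization is an assumption, and slides or software packages may scale the two terms differently:

\[\min_{\hat \beta} \sum_{i=1}^n {\hat \varepsilon_i(\hat \beta)^2} + \lambda \sum_{k=1}^K {\left( \alpha |\hat \beta_k| + (1-\alpha) \hat \beta_k^2 \right)}\]

With \(\alpha = 1\) this expression reduces to the lasso problem and with \(\alpha = 0\) to the ridge problem.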

RTutor problem set for this chapter

The RTutor problem set differs a bit more from the video lectures and slides than in previous chapters. I wanted to illustrate how to estimate OLS and lasso prediction models for a real-world data set about used cars. Before one can build a reasonable prediction model with a real-world data set, one typically needs to spend considerable time on data preparation and cleaning. So those steps will be a large part of the RTutor problem set.