Videos and questions for Chapter 2a of the course "Market Analysis with Econometrics and Machine Learning" at Ulm University (taught by Sebastian Kranz)

Predicting y vs estimating \(\beta\)

Can a linear regression model also be used for pure prediction?
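
As a quick refresher of the mechanics in R, here is a minimal sketch with made-up data (not code from the course): we estimate \(\hat \beta\) with lm, but then use the fitted model only to predict y for new observations via predict.

set.seed(1)
dat = data.frame(x = runif(100))
dat$y = 2 + 3*dat$x + rnorm(100, sd = 0.5)

# Estimate the linear model by OLS ...
reg = lm(y ~ x, data = dat)

# ... and use it purely to predict y at new x values
predict(reg, newdata = data.frame(x = c(0.2, 0.8)))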

Machine Learning

Can machine learning methods like random forests or lasso regression only be used for prediction problems, or can they also help to estimate causal effects?


Polynomial Example

Which model will make the best predictions for the training data set?

Which model will make the worst predictions for the test data set? Make a guess...
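
If you want to check your guesses numerically, here is a small simulation sketch (the quadratic data-generating process and all numbers are made up for illustration, not taken from the video):

set.seed(1)
n = 50
x = runif(n, -1, 1)
y = x - x^2 + rnorm(n, sd = 0.3)                 # training data
x.test = runif(n, -1, 1)
y.test = x.test - x.test^2 + rnorm(n, sd = 0.3)  # test data

mse = function(y, y.hat) mean((y - y.hat)^2)

# Compare in-sample and out-of-sample MSE for polynomials of different degree
for (deg in c(1, 2, 10)) {
  reg = lm(y ~ poly(x, deg))
  cat("degree", deg,
      " training MSE:", mse(y, fitted(reg)),
      " test MSE:", mse(y.test, predict(reg, data.frame(x = x.test))), "\n")
}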

Root mean squared error

Instead of using the MSE to assess prediction accuracy on the test sample, one often uses the so-called root mean squared error (RMSE). What is the formula for the sample RMSE? Make a guess:
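
(For reference once you have made your guess: with \(\hat \varepsilon_i\) denoting the prediction error for observation \(i\), the sample RMSE is simply the square root of the sample MSE:)

\[RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^n {\hat \varepsilon_i^2}}\]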


Lasso Regression

Correction: In the video the sign in front of the \(\lambda\) term of the lasso minimization problem is wrong. It must be "+" instead of "-". The lasso estimator solves:

\[\min_{\hat \beta} \sum_{i=1}^n {\hat \varepsilon_i(\hat \beta)^2} + \lambda \sum_{k=1}^K {|\hat \beta_k|}\]

(There is a similar error later in the videos for ridge and elastic net regression.)

Assume we estimated a lasso regression with a regularization parameter \(\lambda=0\). Would the lasso estimator then be identical to the OLS estimator?
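
You can check your answer numerically. Here is a sketch using the glmnet package with made-up data (note that glmnet's coordinate descent algorithm can leave tiny numerical differences):

library(glmnet)
set.seed(1)
X = matrix(rnorm(200*5), 200, 5)
y = X %*% c(1, -2, 0.5, 0, 0) + rnorm(200)

# Compare OLS coefficients with lasso coefficients at lambda = 0
cbind(ols = coef(lm(y ~ X)),
      lasso = as.numeric(coef(glmnet(X, y, alpha = 1, lambda = 0))))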

Note that the R code of the simulation studied in the following videos is available on Moodle. (This time there is no need to hide the code since the RTutor problem set is fairly different.)

Why do we see only two estimated coefficients in the output of our lasso model?

What will be the outcome if we estimate the lasso model again with a lower value of lambda (just 0.1 instead of 1 as before)?

Which coefficients will be selected for very small lambda close to 0?
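
The actual simulation code is on Moodle; the following is only a rough sketch of this type of experiment (the data-generating process and all numbers are made up):

library(glmnet)
set.seed(1)
n = 1000; K = 20
X = matrix(rnorm(n*K), n, K)
beta = c(5, 5, rep(0.1, K-2))  # two large and many small true coefficients
y = X %*% beta + rnorm(n)

# Count how many coefficients the lasso selects for different lambda
for (lambda in c(1, 0.1, 0.001)) {
  fit = glmnet(X, y, alpha = 1, lambda = lambda)
  cat("lambda =", lambda, "->", sum(coef(fit)[-1] != 0), "selected coefficients\n")
}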

Here you see again our computed RMSE for the 4 considered values of lambda:

unlist(rmse.li)
lambda_0.001  lambda_0.01   lambda_0.1   lambda_0.5
 0.010079667  0.010153556  0.005719416  0.002839564

According to the results shown above, which of the 4 values of lambda would be best?

Did we actually compute everything correctly or did we make some error?

Assume now we estimate our models with a much smaller training data set (only 100 observations). What do you think will now be the best regularization parameter for optimal out-of-sample prediction?

Wow, a lot of videos and questions about the lasso. You get the Cowboy award! Only one more video to go. Ride on to the next section...


Parameter Tuning & Cross Validation

Below is a brief video with a short summary of parameter tuning and cross validation. You are asked to read the lecture slides carefully for more details.
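
To give a rough idea how cross validation for choosing lambda looks in code, here is a sketch using cv.glmnet with made-up data (not the code from the slides):

library(glmnet)
set.seed(1)
X = matrix(rnorm(500*20), 500, 20)
y = X %*% c(2, -1, rep(0, 18)) + rnorm(500)

# 10-fold cross validation over a grid of lambda values
cv = cv.glmnet(X, y, alpha = 1, nfolds = 10)
cv$lambda.min  # lambda with the smallest cross-validated prediction error
cv$lambda.1se  # largest lambda within one standard error of that minimum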

Ridge Regression and Elastic Net

Also take a look at the lecture slides about ridge regression and elastic net regression. Ridge regression is a variant of lasso regression that penalizes the sum of squared coefficients \(\lambda \sum_{k=1}^K {\hat \beta_k^2}\) instead of the sum of absolute coefficients; the elastic net combines both penalty terms.
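
For example, in the glmnet package the argument alpha selects the penalty; a sketch with made-up data:

library(glmnet)
set.seed(1)
X = matrix(rnorm(500*20), 500, 20)
y = X %*% c(2, -1, rep(0, 18)) + rnorm(500)

ridge = glmnet(X, y, alpha = 0)    # ridge: penalizes sum of squared coefficients
lasso = glmnet(X, y, alpha = 1)    # lasso: penalizes sum of absolute coefficients
enet  = glmnet(X, y, alpha = 0.5)  # elastic net: weighted mix of both penalties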

RTutor problem set for this chapter

The RTutor problem set differs a bit more from the video lectures and slides than in previous chapters. I wanted to illustrate how to estimate OLS and lasso prediction models for a real-world data set about used cars. Before one can build a reasonable prediction model with a real-world data set, one typically needs to spend considerable time on data preparation and cleaning. Those steps will therefore be a large part of the RTutor problem set.