Videos and questions for Chapter 2a of the course "Market Analysis with Econometrics and Machine Learning" at Ulm University (taught by Sebastian Kranz)

Can a linear regression model also be used for pure prediction?

Can machine learning methods like random forests or lasso regression only be used for prediction problems, or can they also help to estimate causal effects?

Which model will make the *best* predictions for the *training data set*?

Which model will make the *worst* predictions for the *test data set*? Make a guess...

Instead of using the MSE to assess prediction accuracy on the test sample, one often uses the so-called root mean squared error (RMSE). What is the formula for the sample RMSE? Make a guess:

**Correction:** In the video we have a wrong sign in front of the \(\lambda\) of the lasso minimization problem. It must be "+" instead of "-". The lasso estimator solves:

\[\min_{\hat \beta} \sum_{i=1}^n {\hat \varepsilon_i(\hat \beta)^2} + \lambda \sum_{k=1}^K {|\hat \beta_k|}\]

(There is a similar error later in the videos for ridge and elastic net regression.)
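To make the penalized objective above concrete, here is a minimal coordinate-descent sketch of the lasso problem in base R. This is only a toy illustration, not the course code (the videos use a proper lasso implementation); all function and variable names are made up.

```r
# Soft-thresholding operator: shrinks z towards 0 and sets it exactly to 0
# if |z| <= g. This is what creates the exact zeros in lasso estimates.
soft.threshold = function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Toy coordinate descent for: min sum(eps^2) + lambda * sum(|beta_k|)
lasso.cd = function(X, y, lambda, n.iter = 500) {
  beta = rep(0, ncol(X))
  for (it in 1:n.iter) {
    for (k in 1:ncol(X)) {
      # partial residual leaving out variable k
      r = y - X[, -k, drop = FALSE] %*% beta[-k]
      # minimizing the objective coordinate-wise soft-thresholds
      # the OLS update at lambda/2
      beta[k] = soft.threshold(sum(X[, k] * r), lambda / 2) / sum(X[, k]^2)
    }
  }
  beta
}
```

Larger values of `lambda` shrink more coefficients exactly to zero, which is why the lasso also performs variable selection.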

Assume we estimated a lasso regression with regularization parameter \(\lambda=0\). Would the lasso estimator then be identical to the OLS estimator?

Note that the R code of the simulation studied in the following videos is available on Moodle. (This time there is no need to hide the code since the RTutor problem set is fairly different.)

Why do we only see two estimated coefficients in our output of the lasso model?

What is the outcome if we estimate the lasso model again with a lower value of `lambda` (just 0.1 instead of 1 as before)?

Which coefficients will be selected for a very small `lambda` close to 0?

Here you see again the computed RMSE for the 4 considered values of `lambda`:

```
unlist(rmse.li)
lambda_0.001  lambda_0.01   lambda_0.1   lambda_0.5
 0.010079667  0.010153556  0.005719416  0.002839564
```
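As a rough sketch of how such a comparison can be produced, the following simulates a small data set, fits a lasso for each of the 4 values of `lambda` using the `glmnet` package, and computes the out-of-sample RMSE on a held-out test sample. The data-generating process and variable names are illustrative, not the actual simulation code from Moodle, so the resulting numbers will differ from the table above.

```r
library(glmnet)  # assumed to be installed; alpha = 1 selects the lasso
set.seed(42)

# simulate training and test data with 2 relevant and 18 irrelevant variables
n = 1000; K = 20
X = matrix(rnorm(n * K), n, K)
beta = c(1, -1, rep(0, K - 2))
y = X %*% beta + rnorm(n)
train = 1:500

rmse = function(y, y.hat) sqrt(mean((y - y.hat)^2))

# out-of-sample RMSE for each considered lambda
lambdas = c(0.001, 0.01, 0.1, 0.5)
rmse.li = lapply(lambdas, function(lambda) {
  fit = glmnet(X[train, ], y[train], alpha = 1, lambda = lambda)
  rmse(y[-train], predict(fit, newx = X[-train, ]))
})
names(rmse.li) = paste0("lambda_", lambdas)
unlist(rmse.li)
```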

According to the results shown above, which of the 4 values of `lambda` would be best?

Did we actually compute everything correctly or did we make some error?

Assume now we estimate our models with a much smaller training data set (only 100 observations). What do you think will now be the best regularization parameter for an optimal out-of-sample prediction?

Wow, a lot of videos and questions about the lasso. You get the **Cowboy** award! Only one more video to go. Ride on to the next section...

Below is a short video summarizing parameter tuning and cross validation. Please read the lecture slides carefully for more details.

Also take a look at the lecture slides on ridge regression, a close relative of lasso regression that penalizes squared instead of absolute coefficients.
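As a sketch of how parameter tuning via cross validation looks in practice (assuming the `glmnet` package, which implements both lasso and ridge), `cv.glmnet` automates the search over a grid of `lambda` values; the simulated data below is purely illustrative.

```r
library(glmnet)  # assumed to be installed
set.seed(1)

# simulate data: only the first two variables matter
X = matrix(rnorm(500 * 10), 500, 10)
y = X[, 1] - 2 * X[, 2] + rnorm(500)

# 10-fold cross validation over a grid of lambda values (alpha = 1: lasso)
cv = cv.glmnet(X, y, alpha = 1, nfolds = 10)

cv$lambda.min              # lambda with the lowest cross-validated MSE
coef(cv, s = "lambda.min") # coefficients at that lambda
```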

The RTutor problem set differs a bit more from the video lectures and slides than in previous chapters. I wanted to illustrate how to estimate OLS and lasso prediction models for a real-world data set about used cars. Before one can build a reasonable prediction model with a real-world data set, one typically needs to spend considerable time on data preparation and cleaning, so these steps make up a large part of the RTutor problem set.
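To give a flavor of such preparation steps, here is a hypothetical base R sketch; the column names and cleaning rules are made up for illustration and are not the actual used-car data from the problem set.

```r
# toy data frame standing in for a raw used-car data set
cars = data.frame(
  price   = c(12000, NA, 8500, 250000, 7000),
  mileage = c(60000, 30000, 120000, 10, 90000),
  year    = c(2015, 2018, 2012, 2021, 2013)
)

# drop rows where the dependent variable (price) is missing
cars = cars[!is.na(cars$price), ]

# remove implausible outliers before estimating OLS or lasso models
cars = cars[cars$price < 100000 & cars$mileage > 100, ]

# derive a new feature: car age (reference year is illustrative)
cars$age = 2022 - cars$year

nrow(cars)  # number of observations left after cleaning
```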