Multivariable calculus: Training and validation: solutions

Training and validation sets: solutions

Homework exercise:

Outline how you would use training and validation sets to choose the degree for a polynomial model. Illustrate this process using an example from your Mathematica work above. Feel free to import images to support your work.

Bonus question: How can you use the validation mean squared error to estimate how good your predictions would be?

Solution

Here is the general outline:

  1. Start by fitting a polynomial of degree 1 using the training data. Compute the MSE of your model using the training data and also the validation data.
  2. Increase the degree of the polynomial and repeat the process. Again, compute the MSE using training data and then using the validation data.
  3. Stop when the MSE of your model computed on the validation data stops decreasing. Alternatively, try every degree up to some maximum and choose the one for which the validation MSE is smallest. The MSE computed on the validation data is a proxy for how well you will be able to make predictions for values of \(x\) not in your data set, and this is ultimately what we would like to minimize.
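The steps above can be sketched in code. The following is a minimal illustration in Python, not the Mathematica work the exercise refers to. It assumes a synthetic data set (a cubic plus Gaussian noise, invented here), and the helper names `fit_polynomial`, `mse`, and `choose_degree` are hypothetical. The fit solves the least-squares normal equations directly, which is adequate for low degrees but not numerically robust for high ones.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[degree]*x**degree."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs

def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial with the given coefficients."""
    def predict(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def choose_degree(train, valid, max_degree):
    """Fit degrees 1..max_degree on train; return the degree with smallest validation MSE."""
    results = []
    for degree in range(1, max_degree + 1):
        coeffs = fit_polynomial(train[0], train[1], degree)  # fit on training data only
        results.append((degree, mse(coeffs, *train), mse(coeffs, *valid)))
    best = min(results, key=lambda r: r[2])
    return best[0], results

# Synthetic data, assumed for illustration: a cubic signal plus Gaussian noise.
random.seed(0)
def noisy_cubic(x):
    return 4 * x ** 3 - 3 * x + random.gauss(0, 0.1)

train_x = [-1 + 2 * i / 29 for i in range(30)]
valid_x = [-0.97 + 1.94 * i / 19 for i in range(20)]
train = (train_x, [noisy_cubic(x) for x in train_x])
valid = (valid_x, [noisy_cubic(x) for x in valid_x])

best_degree, results = choose_degree(train, valid, max_degree=6)
for degree, train_mse, valid_mse in results:
    print(f"degree {degree}: training MSE {train_mse:.4f}, validation MSE {valid_mse:.4f}")
print("chosen degree:", best_degree)
```

This sketch uses the second stopping rule (smallest validation MSE over all degrees tried). Note that the validation data is never used to compute the coefficients, only to compare the fitted models.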

Here is an image of the validation and training error for polynomials fit to one data set I created. Based on the above algorithm, I would choose degree nine.

In general, the MSE on the training data will decrease, or at least never increase, as you increase the degree of the polynomial. There is a simple explanation: a degree two polynomial is just a degree three polynomial with an additional \(0 \cdot x^3\) term, so the best degree three fit has every degree two fit available to it. The additional flexibility of allowing the coefficient of \(x^3\) to be something other than zero may decrease the training error, and it certainly can’t do worse! This is not true for validation data: the coefficients are chosen without looking at the validation set, so the validation error can begin to increase at any time.
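The same argument can be written as an inequality. The notation here is introduced only for illustration: \(P_d\) is the set of polynomials of degree at most \(d\), and \((x_i, y_i)\), \(i = 1, \dots, n\), are the training points. Since \(P_d \subseteq P_{d+1}\), minimizing over the larger set cannot give a larger minimum:

\[
\min_{p \in P_{d+1}} \frac{1}{n} \sum_{i=1}^{n} \bigl(p(x_i) - y_i\bigr)^2 \;\le\; \min_{p \in P_d} \frac{1}{n} \sum_{i=1}^{n} \bigl(p(x_i) - y_i\bigr)^2 .
\]

No such inequality is available for the validation MSE, because the minimizing polynomial on each side is selected using the training sum and is only afterwards evaluated on the validation points.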

Bonus answer: The mean squared error, when computed on the validation data, estimates the squared error you can expect from a prediction on as-yet-unseen data. Judging how reliable this estimate is requires a more formal course in either machine learning or statistics: a coming attraction.
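Because the MSE is in squared units, its square root is often easier to interpret as a typical size of a prediction error, in the same units as \(y\). A small Python sketch, using made-up validation values and predictions purely for illustration:

```python
import math

# Hypothetical validation targets and model predictions, invented for illustration.
valid_y = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

# Validation MSE: the average squared prediction error on held-out data.
validation_mse = sum((p - y) ** 2 for p, y in zip(predicted, valid_y)) / len(valid_y)

# Root mean squared error: a rough typical error magnitude in the units of y.
typical_error = math.sqrt(validation_mse)
print(validation_mse, typical_error)
```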