Overfitting
Suppose that we have a data set of \(k\) input-output pairs:
\[ \mathcal{D} : (x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k)\]
By minimizing the mean squared error (MSE), we have developed a way of finding a polynomial of any degree that “best” fits that data set \(\mathcal{D}\). The higher the degree, the more intricate our optimization problem will be, but if a computer is doing the heavy lifting, this is not something we should worry about. So suppose we plot the points in \(\mathcal{D}\) and obtain:

[Figure: scatter plot of the points in \(\mathcal{D}\)]
We will use a Mathematica notebook to generate a synthetic data set consisting of a few dozen points and then compute polynomials of a variety of degrees. Use the notebook as you work through the following exercises.
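The notebook itself is not reproduced here, so the following is a minimal Wolfram Language sketch of the same setup; the underlying signal \(\sin(2x)\), the noise level, and the number of points are illustrative assumptions, and the helper names `polyFit` and `mse` are ours, not the notebook's.

```mathematica
(* Illustrative synthetic data: noisy samples of an assumed underlying signal *)
SeedRandom[1];
f[x_] := Sin[2 x];  (* hypothetical "true" signal; any smooth function works *)
data = Table[
   {x, f[x] + RandomVariate[NormalDistribution[0, 0.2]]},
   {x, Sort[RandomReal[{0, 3}, 30]]}];

(* Least-squares polynomial of best fit of a given degree *)
polyFit[deg_] := Fit[data, Table[x^j, {j, 0, deg}], x];

(* Mean squared error of a fitted polynomial over the data set *)
mse[poly_] := Mean[((poly /. x -> #[[1]]) - #[[2]])^2 & /@ data];
```

A quick `Show[ListPlot[data], Plot[Evaluate[polyFit[2]], {x, 0, 3}]]` then overlays the quadratic fit on the scatter plot.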
Generate a data set \(\mathcal{D}\) and compute the quadratic polynomial of best fit, then the cubic polynomial of best fit.
- What happened to the mean squared error as you increased the polynomial’s degree?
- What do you expect to happen to the mean squared error as you increase the degree of the polynomial even more? Fit polynomials of several different degrees and test your hypothesis (one way to do this is sketched after this list).
- Can you explain why mean squared error behaves this way as the degree of the polynomial increases?
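Continuing the sketch above, here is one possible way to run the experiment; the degree range 1 through 12 is an arbitrary choice, and very high degrees may trigger ill-conditioning warnings from `Fit`:

```mathematica
(* Overlay the quadratic and cubic fits on the data; Evaluate forces the
   fits to be computed symbolically before Plot substitutes numbers for x *)
Show[ListPlot[data],
 Plot[Evaluate[{polyFit[2], polyFit[3]}], {x, 0, 3}]]

(* Tabulate the MSE as the degree grows *)
Table[{d, mse[polyFit[d]]}, {d, 1, 12}] // TableForm
```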
Solution
In general, increasing the degree of the polynomial will give a smaller MSE. Simply put, there is more flexibility when choosing a degree three polynomial instead of a degree two polynomial, and with more flexibility you should expect the error to be smaller. In fact, a degree two polynomial is just a degree three polynomial whose leading coefficient is forced to be zero!
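To spell the nesting argument out in symbols (the notation here is ours, matching the data set above): the best cubic can never do worse than the best quadratic, because setting the cubic coefficient to zero recovers the quadratic problem exactly:
\[ \min_{a_0, a_1, a_2, a_3} \frac{1}{k} \sum_{i=1}^{k} \left( a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 - y_i \right)^2 \leq \min_{a_0, a_1, a_2} \frac{1}{k} \sum_{i=1}^{k} \left( a_0 + a_1 x_i + a_2 x_i^2 - y_i \right)^2 \]
The same argument works at every degree, so the minimal MSE can only stay the same or shrink as the degree grows.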
This seems to suggest that we should always use high-degree polynomials! Hmmmm…