Machine learning: I. Regression and classification

Supervised learning

In our context, the goal of supervised learning will be to find a function \(\hat{f}\) so that

\[\hat{f}(\vec{x}) \approx \vec{y}\] for pairs \((\vec{x},\vec{y})\) in some data set \(\mathcal{D}\), allowing us to use the input vector \(\vec{x}\) to make a prediction about the output vector \(\vec{y}\). It makes sense to look for an \(\hat{f}\) that is continuous, but how does one search over completely generic continuous functions? Instead, we will look for functions in a hypothesis class \(\mathcal{H}\) with two properties:

  • \(\mathcal{H}\) is sufficiently rich, so that every continuous function can be approximated by a sequence of functions in \(\mathcal{H}\), and
  • each function in \(\mathcal{H}\) can be described by a finite number of real-valued parameters; thus approximating \(\hat{f}\) boils down to finding good values of these parameters (see the sketch below).
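
As a toy illustration of the second point (a minimal numpy sketch, not part of the course text; the cubic degree and the cosine target are illustrative choices), each candidate function below is determined by four coefficients, so "learning" reduces to choosing those four numbers:

    import numpy as np

    # Hypothesis class: polynomials of degree <= 3 on the real line.
    # Each function is pinned down by 4 real parameters (its coefficients).
    def f_hat(theta, x):
        """Evaluate sum_k theta[k] * x**k at the points x."""
        return sum(theta[k] * x**k for k in range(len(theta)))

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 50)
    y = np.cos(np.pi * x) + 0.05 * rng.standard_normal(50)    # noisy targets

    # Choose the parameters by least squares: minimize sum_i (f_hat(x_i) - y_i)^2.
    V = np.vander(x, 4, increasing=True)                      # columns 1, x, x^2, x^3
    theta, *_ = np.linalg.lstsq(V, y, rcond=None)
    print("fitted coefficients:", theta)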

As we will see, polynomials, trigonometric polynomials, and dense as well as convolutional neural networks all form examples of such hypothesis classes. We will focus on two main machine learning problems:

Regression: Given an input vector \(\vec{x} \in \mathbb{R}^n\), predict an output vector \(\vec{y} \in \mathbb{R}^m\). For instance, one can hope to predict the sales price of a piece of real estate, or the annual crop yield of a farm, given a sufficiently detailed feature vector \(\vec{x}\).
Classification: Given an input vector \(\vec{x} \in \mathbb{R}^n\), predict the class of the underlying individual. For instance, one can hope to recognize a handwritten digit from the vector of pixel values of its image.
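
To make the two problem types concrete, here is a small self-contained sketch on synthetic data (the features, targets, and the nearest-centroid rule are illustrative choices, not methods prescribed by the course):

    import numpy as np

    rng = np.random.default_rng(1)

    # Regression: predict a real-valued output from a feature vector.
    # Hypothetical data: two features per property, price as the target.
    X = rng.random((100, 2))
    y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(100)
    w, *_ = np.linalg.lstsq(np.c_[X, np.ones(100)], y, rcond=None)
    print("predicted price for features (0.5, 0.2):", np.array([0.5, 0.2, 1.0]) @ w)

    # Classification: predict a discrete label, here with a nearest-centroid rule.
    labels = (X[:, 0] > 0.5).astype(int)            # two synthetic classes
    centroids = np.array([X[labels == c].mean(axis=0) for c in (0, 1)])
    x_new = np.array([0.7, 0.3])
    print("predicted class:", np.argmin(np.linalg.norm(centroids - x_new, axis=1)))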

This section of the course is divided into three parts.

  • Approximation theory will develop tools and techniques for evaluating the expressiveness of a hypothesis class of functions. We will cover the Weierstrass Approximation Theorem for polynomials and its multivariate and trigonometric generalizations, and later in the course we will extend these techniques to analyze more complex models such as neural networks.
  • Generalized regression will address techniques for finding functions in a hypothesis class that minimize predictive error on a data set \(\mathcal{D}\). We will begin with the classical study of linear regression and generalize to multivariate polynomial regression.
  • Empirical risk minimization will describe iterative optimization tools designed to efficiently find models that minimize predictive error. While we will focus on gradient-based methods (a toy sketch follows below), there will be some discussion of other, more ad hoc machine learning models.
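
As a small preview of that last part (a minimal sketch under the simplest assumptions: a linear model \(f(x) = wx + b\) and squared error), gradient descent repeatedly steps the parameters against the gradient of the empirical risk:

    import numpy as np

    # Empirical risk minimization by gradient descent for the linear model
    # f(x) = w*x + b with squared-error risk
    #   R(w, b) = (1/N) * sum_i (w*x_i + b - y_i)^2.
    rng = np.random.default_rng(2)
    x = rng.random(200)
    y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(200)

    w, b = 0.0, 0.0
    lr = 0.5                                    # step size (learning rate)
    for _ in range(500):
        residual = w * x + b - y                # f(x_i) - y_i for every sample
        w -= lr * 2.0 * np.mean(residual * x)   # dR/dw
        b -= lr * 2.0 * np.mean(residual)       # dR/db

    print(f"learned w = {w:.3f}, b = {b:.3f}")  # should approach 2 and 1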