Machine learning: 1. Clustering

Dimension reduction and clustering

Commandeering the computational resources of his internship sponsor Ayasdi, Muthu Alagappan analyzed a data set containing a season’s worth of individual NBA player statistics. What he discovered was surprising: although basketball traditionally identifies five player positions, the data suggested that there are at least thirteen. His work, summarized in the following image, won the award for best Evolution of Sport at the 2012 MIT Sloan Sports Analytics Conference.


image by Muthu Alagappan

It was a perfect example of what is called unsupervised learning: finding structure in data that carries no labels. We will begin our work on unsupervised learning by thinking about two questions:

  • Given a high-dimensional data set, how should one find a useful low-dimensional representation of this data? Clearly, the answer will depend heavily on our definition of useful.

  • How can one describe and analyze algorithmic methods to detect groups of points? This problem is known as clustering.
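
To make the two questions concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions, not necessarily the approach developed in this course. It uses principal component analysis (via the singular value decomposition) as one possible notion of a "useful" low-dimensional representation, and Lloyd's k-means algorithm with a farthest-point initialization as one possible clustering method. The function names `pca_project` and `kmeans` and the synthetic two-blob data set are hypothetical, chosen purely for this example.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal directions (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points and moving centers."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization (an illustrative choice): start from a
    # random point, then repeatedly add the point farthest from the centers
    # chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic data (an assumption for illustration): two well-separated
# Gaussian blobs of 50 points each in 10 dimensions.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 10))
blob_b = rng.normal(loc=8.0, scale=1.0, size=(50, 10))
X = np.vstack([blob_a, blob_b])

Z = pca_project(X, n_components=2)   # 10 dimensions -> 2 dimensions
labels, centers = kmeans(Z, k=2)     # group the projected points
```

Here `Z` holds a two-dimensional representation of each point and `labels` holds the cluster index assigned to it. On data this cleanly separated, the two blobs end up in different clusters; real data sets, such as the NBA statistics above, are rarely so cooperative, which is why the choice of representation and of clustering method deserves careful study.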

Labs and exercises

1. MNIST and Fashion MNIST examples
2. Does dimension reduction really matter?
3. Voronoi cells