Unsupervised Learning - Expectation Maximization

Expectation Maximization (EM) is a popular algorithm for fitting generative models; it elegantly handles multivariate distributions and tries to find a close approximation to the underlying data-generating distribution.

For fun, I derived the multivariate Expectation Maximization algorithm using a Mixture of Gaussians; interested readers can find it here.

In the following I will consider a particular member of the exponential family, the Normal distribution, which for 1-dimensional data $x_i$ is

$$\mathcal{N}(x_i \mid \mu_k, \sigma_k^2) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{(x_i - \mu_k)^2}{2\sigma_k^2}\right)$$

(subscripts $i = 1, \dots, N$ are for the $N$ data points and $k = 1, \dots, K$ for the $K$ Normal distributions which try to approximate the data distribution).

For multidimensional data $x_i \in \mathbb{R}^d$ (s.t. $d > 1$), the Normal distribution is

$$\mathcal{N}(x_i \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2} (x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k)\right)$$

where $\Sigma_k$ is the covariance matrix for Gaussian $k$ (or cluster $k$).
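As a quick sanity check of the 1-D density above, a direct evaluation can be compared against `scipy.stats.norm` (the particular numbers below are illustrative, not taken from the post):

```python
import numpy as np
from scipy.stats import norm

# Evaluate the 1-D Normal density by hand and compare against scipy.
x, mu_k, sigma_k = 1.5, 0.0, 2.0
manual = np.exp(-(x - mu_k) ** 2 / (2 * sigma_k ** 2)) / np.sqrt(2 * np.pi * sigma_k ** 2)
assert np.isclose(manual, norm.pdf(x, loc=mu_k, scale=sigma_k))
```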

For the following discussion, we will mainly consider 1-dimensional data $x$ (start from something simple and grow complex!).

The EM algorithm looks like this (a minimal code sketch of these updates follows the list):

Repeat until convergence:

  • find $w_{ik} = \dfrac{\phi_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \phi_j \, \mathcal{N}(x_i \mid \mu_j, \sigma_j^2)}$
    (Can be interpreted as the probability of Gaussian $k$ (or cluster $k$), given that data point $x_i$ is observed.)

  • find $N_k = \sum_{i=1}^{N} w_{ik}$
    (Can be interpreted as the expected number of data points which belong to Gaussian $k$ (or cluster $k$).)

  • find $\phi_k = \dfrac{N_k}{N}$
    (Updated prior of Gaussian $k$ (or cluster $k$).)

  • find $\mu_k = \dfrac{1}{N_k} \sum_{i=1}^{N} w_{ik} \, x_i$
    (Weighted mean of Gaussian $k$ (or cluster $k$).)

  • find $\sigma_k^2 = \dfrac{1}{N_k} \sum_{i=1}^{N} w_{ik} \, (x_i - \mu_k)^2$ [1]
    (Weighted variance of Gaussian $k$ (or cluster $k$).)
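To make these updates concrete, here is a minimal NumPy sketch of 1-D EM for a Mixture of Gaussians. The function name `em_gmm_1d`, the random initialization, and the fixed iteration count are illustrative choices, not details from the derivation linked above:

```python
import numpy as np

def em_gmm_1d(x, K=5, n_iters=100, seed=0):
    """Fit a K-component 1-D Gaussian mixture to the data x with EM."""
    rng = np.random.default_rng(seed)
    N = x.shape[0]
    # Initialization: uniform priors, K distinct data points as means,
    # and the global variance for every component.
    phi = np.full(K, 1.0 / K)
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, x.var())

    for _ in range(n_iters):
        # E-step: responsibilities w[i, k] = P(Gaussian k | x_i).
        log_pdf = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        w = phi * np.exp(log_pdf)
        w /= w.sum(axis=1, keepdims=True)

        # M-step, following the updates listed above.
        Nk = w.sum(axis=0)                                   # expected points per cluster
        phi = Nk / N                                         # updated priors
        mu = (w * x[:, None]).sum(axis=0) / Nk               # weighted means
        var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nk  # weighted variances

    return phi, mu, var, w
```

In practice one would also monitor the log-likelihood to decide convergence and guard against a component's variance collapsing toward zero.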


Now let's see this algorithm fit 5 Gaussians (K = 5) to the gray-scale image shown below. The size of this image is 300 KB.

[Image: original gray-scale image, 300 KB]

After running the EM algorithm on this image to fit 5 Gaussians, we get the following compressed image of 100 KB. (Note that the actual storage size on disk is < 10 KB, as this image is actually stored as:

1. an array of 8-bit integers, each location specifying which cluster it belongs to (that is, the most likely Gaussian for that data point), and
2. only 15 64-bit floating point numbers: 5 for $\phi_k$, 5 for $\mu_k$, and 5 for $\sigma_k^2$.)

[Image: compressed image after fitting 5 Gaussians]
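A sketch of how such a storage scheme could look in code, reusing the parameters returned by `em_gmm_1d` above (`compress_image` and `decompress_image` are hypothetical names, not from the original post):

```python
import numpy as np

def compress_image(img, phi, mu, var):
    """Replace each pixel with the index of its most likely Gaussian."""
    x = img.ravel().astype(np.float64)
    log_pdf = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
    # Posterior is proportional to phi_k * N(x | mu_k, var_k); argmax picks the cluster.
    labels = np.argmax(np.log(phi) + log_pdf, axis=1).astype(np.uint8)
    # Storage: one 8-bit label per pixel plus 15 floats (phi, mu, var).
    return labels.reshape(img.shape), (phi, mu, var)

def decompress_image(labels, mu):
    """Reconstruct the image: each pixel takes the mean of its Gaussian."""
    return mu[labels]
```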

Following is a statistical analysis of what the original image distribution was, how the 5 Gaussians were fitted, and the distribution after sub-sampling, to demonstrate that only a few unique values are sufficient to describe the compressed image.

[Image: histograms of the original image, the 5-Gaussian fit, and the sub-sampled image]

As seen above, the 5-Gaussian fit (red dotted line in the top half) matches the actual data distribution closely, but is of course a smoothed approximation.
The bottom half of the picture above compares the histogram of the sub-sampled image (each pixel value replaced with its most likely Gaussian) with the histogram of the original image.
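For completeness, a sketch of how such a two-panel comparison could be produced with matplotlib (`plot_fit` and its arguments are hypothetical; `x` is the flattened array of original pixel values and `labels` the cluster indices from above):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_fit(x, phi, mu, var, labels):
    grid = np.linspace(x.min(), x.max(), 500)
    # Mixture density: sum_k phi_k * N(grid | mu_k, var_k).
    mixture = sum(p * np.exp(-(grid - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
                  for p, m, v in zip(phi, mu, var))
    fig, (top, bottom) = plt.subplots(2, 1, figsize=(8, 6))
    top.hist(x, bins=256, density=True, label="original pixel values")
    top.plot(grid, mixture, "r--", label="5-Gaussian fit")
    top.legend()
    bottom.hist(mu[labels].ravel(), bins=256, density=True, label="sub-sampled image")
    bottom.hist(x, bins=256, density=True, alpha=0.5, label="original image")
    bottom.legend()
    plt.show()
```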

[1]: it is important to interpret $\sigma_k^2$ as a weighted variance. Let us say $w_{ik} = P(\text{Gaussian } k \mid x_i)$ (and of course also $\sum_{k=1}^{K} w_{ik} = 1$). So,

$$\sigma_k^2 = \frac{\sum_{i=1}^{N} w_{ik} \, (x_i - \mu_k)^2}{\sum_{i=1}^{N} w_{ik}}$$

is the average squared deviation from $\mu_k$, where each data point contributes in proportion to how likely it is to belong to Gaussian $k$.

Written on July 19, 2017