Homework 4: Available 10/22/2016, Due 11/05/2016.
NOTE: Only the software portion may be done in teams of up to three people; your report must be your own work.
Handing in assignment on compass:
  1. Report in PDF format. File name should be net_id_hw4_report.pdf (e.g. yeh17_hw4_report.pdf)
  2. Code for both the code-from-scratch and TensorFlow portions in TGZ or ZIP format. File name should be net_id_hw4_code (e.g. yeh17_hw4_code.zip)
  3. Include your team members' names in the submission description and in the report.
  4. Make sure the uploaded files are correct. Missing or corrupted files will be considered late. (You can download the uploaded files to verify.)

Data

MNIST handwritten digit database [Link]. TensorFlow also has an API for downloading and loading the data [Link]; a minimal loading sketch follows.
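For reference, this is a minimal sketch using TensorFlow's bundled MNIST helper from the 2016-era releases this assignment targets; the "data/" directory is an assumed location (see the relative-path requirement under Software Requirements):

    # Downloads the four MNIST files into data/ on the first run, then reuses them.
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets("data/", one_hot=True)
    train_images = mnist.train.images  # shape (55000, 784), pixels scaled to [0, 1]
    train_labels = mnist.train.labels  # shape (55000, 10), one-hot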

Software Requirements

  1. Do NOT use IPython notebooks.
  2. Modularize your code; a single monolithic script is difficult to read.
  3. Use a relative path when accessing data. Clearly write in the README file where to store the data relative to the base directory.

Pencil-and-Paper

In this portion of the assignment, you will derive the update equations for a binary-binary RBM. Let $(V,H)$ denote the visible and hidden random variables, which take values $v \in \{0,1\}^{m}$ and $h \in \{0,1\}^{n}$. The joint probability distribution is $p(v,h; \theta) = \frac{1}{Z} e^{-E(v,h; \theta)}$, where $E$ is the energy function $E(v,h; \theta) = -\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{m} w_{ij} h_i v_j - \sum\limits_{j=1}^{m} b_j v_j - \sum\limits_{i=1}^{n} c_i h_i$, the partition function is $Z = \sum\limits_{v}\sum\limits_{h}e^{-E(v,h; \theta)}$, and $\theta = (W,b,c)$.
  1. Find $p(v|h, \theta)$ and $\mathbb{E}[v|h, \theta]$. Show that $p(v_j=1|h)$ has the form $\mathrm{sigmoid}(\sum\limits_{i=1}^n w_{ij}h_i+b_j)$.
  2. Find $p(h|v, \theta)$ and $\mathbb{E}[h|v, \theta]$. Show that $p(h_i=1|v)$ has the form $\mathrm{sigmoid}(\sum\limits_{j=1}^m w_{ij}v_j+c_i)$.
  3. Now, you are given a dataset $D = \{v_1, v_2, \ldots, v_N\}$. The log likelihood of the dataset can be computed as $\mathcal{L}(D|\theta) = \sum\limits_{n=1}^{N} \log p(v_n; \theta) = \sum\limits_{n=1}^{N} \log \sum\limits_{h} e^{-E(v_n,h; \theta)} - N \log Z$. Compute $\frac{\partial \mathcal{L}(D|\theta)}{\partial \theta}$, and express it in terms of $p(h|v)$, $p(v,h)$, and $\frac{\partial E(v,h)}{\partial \theta}$.
  4. Use the results in (2) and (3) to compute $\frac{\partial \mathcal{L}(D|\theta)}{\partial W_{ij}}$, $\frac{\partial \mathcal{L}(D|\theta)}{\partial b_j}$, and $\frac{\partial \mathcal{L}(D|\theta)}{\partial c_i}$.
  5. The update equations derived above contain computationally intractable terms. Which terms are intractable, and how do we approximate them? (Hint: contrastive divergence; see the sketch after this list.)
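For context on the hint in (5): contrastive divergence (CD-$k$) replaces the intractable expectation under the model distribution with samples from a short Gibbs chain started at a data point, alternating between the two conditionals from (1) and (2). A hedged sketch of the chain (the notation $v^{(t)}, h^{(t)}$ is illustrative, not part of the assignment):

$v^{(0)} = v_n, \qquad h^{(t)} \sim p(h|v^{(t)}, \theta), \qquad v^{(t+1)} \sim p(v|h^{(t)}, \theta), \qquad t = 0, \ldots, k-1$

The model expectation is then estimated at the chain's endpoint; CD-1 ($k=1$) is the usual choice in practice.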

Code-From-Scratch

In this portion of the assignment, you will implement a binary-binary RBM and train it on the MNIST dataset.
Each MNIST image is $28\times28\times1$; reshape it into a $784 \times 1$ vector, which is $v$. Choose $h$ to be of dimension $200$. Use the update equations you derived in the pencil-and-paper part to train an RBM; a hedged sketch of the CD-1 update follows.
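This is a minimal NumPy sketch of one CD-1 update under the shapes above (200 hidden, 784 visible); the function name, learning rate, and per-example updating are illustrative assumptions, not a prescribed implementation:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(v0, W, b, c, lr=0.01):
        """One CD-1 step on a single binary image v0.
        Shapes: v0 (784,), W (200, 784), b (784,), c (200,); lr is illustrative."""
        # Positive phase: sample h0 ~ p(h | v0) using the conditional from part (2).
        ph0 = sigmoid(W @ v0 + c)
        h0 = (np.random.rand(ph0.size) < ph0).astype(float)
        # Negative phase: one Gibbs step to a reconstruction v1, then p(h | v1).
        pv1 = sigmoid(W.T @ h0 + b)
        v1 = (np.random.rand(pv1.size) < pv1).astype(float)
        ph1 = sigmoid(W @ v1 + c)
        # Gradient ascent: data statistics minus reconstruction statistics.
        W += lr * (np.outer(ph0, v0) - np.outer(ph1, v1))
        b += lr * (v0 - v1)
        c += lr * (ph0 - ph1)
        return W, b, c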
What to turn in:

TensorFlow

In this portion of the assignment, you will use an RBM, a stack of RBMs, and PCA to perform dimensionality reduction on the raw pixels of the MNIST dataset. From the reduced features, you will perform 10-way classification using multi-class logistic regression.

Report training and testing accuracy for the following:
  1. Train a 10-way multi-class logistic regression on the raw-pixel data.
  2. Train an RBM with 200 hidden units. Perform 10-way multi-class logistic regression on the resulting 200-dimensional features.
  3. Use PCA to reduce the input dimension to 200. Perform 10-way multi-class logistic regression on the reduced features.
  4. Train a stack of RBMs with hidden units [500, 200] (i.e. train an RBM with 500 hidden units, then use the output of the first RBM to train a second RBM with 200 hidden units). Perform 10-way multi-class logistic regression on the resulting 200-dimensional features.
Note: For the classification part, you are allowed to use higher-level APIs such as TFLearn. This will greatly reduce the amount of code you need to write. A hedged sketch of the logistic-regression head appears below.
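For orientation, this is a minimal sketch of the 10-way logistic-regression head written against the 2016-era (graph-and-session style) TensorFlow API. The names `dim`, `train_feats`, `train_labels`, `test_feats`, `test_labels`, and the hyperparameters are illustrative assumptions; the features fed in can come from any of the four pipelines above:

    import tensorflow as tf

    dim = 200  # use 784 for the raw-pixel baseline in (1)
    x = tf.placeholder(tf.float32, [None, dim])
    y = tf.placeholder(tf.float32, [None, 10])  # one-hot labels

    W = tf.Variable(tf.zeros([dim, 10]))
    b = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(x, W) + b

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
    train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    with tf.Session() as sess:
        # Initializer name varies across 0.x releases
        # (tf.initialize_all_variables in older versions).
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            # train_feats / train_labels are assumed NumPy arrays holding the
            # reduced-dimension features and one-hot labels (names hypothetical).
            sess.run(train_op, feed_dict={x: train_feats, y: train_labels})
        print(sess.run(accuracy, feed_dict={x: test_feats, y: test_labels}))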

What to turn in: