Homework 4: Available 10/22/2016, Due 11/05/2016.
NOTE: Only the software portion may be done in teams of up to three people; your report must be your own work.
Handing in assignment on compass:
  1. Report in PDF format. File name should be net_id_hw4_report.pdf (e.g. yeh17_hw4_report.pdf)
  2. Code for both the code-from-scratch and TensorFlow portions in TGZ or ZIP format. File name should be net_id_hw4_code (e.g. yeh17_hw4_code.zip)
  3. Include your team members' names in the submission description and in the report.
  4. Make sure the uploaded files are correct. Missing or corrupted files will be considered late. (You can download the uploaded files to verify.)

Data

MNIST handwritten digit database [Link]. TensorFlow also has an API for downloading and loading the data [Link]; a minimal loading sketch follows.
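For reference, this is a minimal sketch using TensorFlow's bundled MNIST helper from the 2016-era releases this assignment targets; the "data/" directory is an assumed location (see the relative-path requirement under Software Requirements):

    # Downloads the four MNIST files into data/ on the first run, then reuses them.
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets("data/", one_hot=True)
    train_images = mnist.train.images  # shape (55000, 784), pixels scaled to [0, 1]
    train_labels = mnist.train.labels  # shape (55000, 10), one-hot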

Software Requirements

  1. Do NOT use IPython notebooks.
  2. Modularize your code; a single monolithic script is difficult to read.
  3. Use a relative path when accessing data. Clearly write in the README file where to store the data relative to the base directory.

Pencil-and-Paper

In this portion of the assignment, you will derive the update equations for a binary-binary RBM. Let $(V,H)$ denote the visible and hidden random variables, which take values $v \in \{0,1\}^{m}$ and $h \in \{0,1\}^{n}$. The joint probability distribution is $p(v,h; \theta) = \frac{1}{Z} e^{-E(v,h; \theta)}$, where $E$ is the energy function $E(v,h; \theta) = -\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{m} w_{ij} h_i v_j - \sum\limits_{j=1}^{m} b_j v_j - \sum\limits_{i=1}^{n} c_i h_i$, the partition function is $Z = \sum\limits_{v}\sum\limits_{h}e^{-E(v,h; \theta)}$, and $\theta = (W,b,c)$.
  1. Find $p(v|h, \theta)$ and $\mathbb{E}[v|h, \theta]$. Show that $p(v_j=1|h)$ has the form $\mathrm{sigmoid}(\sum\limits_{i=1}^n w_{ij}h_i+b_j)$.
  2. Find $p(h|v, \theta)$ and $\mathbb{E}[h|v, \theta]$. Show that $p(h_i=1|v)$ has the form $\mathrm{sigmoid}(\sum\limits_{j=1}^m w_{ij}v_j+c_i)$.
  3. Now, you are given a dataset $D = \{v_1, v_2, \ldots, v_N\}$. The log likelihood of the dataset can be computed as $\mathcal{L}(D|\theta) = \sum\limits_{n=1}^{N} \log p(v_n; \theta) = \sum\limits_{n=1}^{N} \log \sum\limits_{h} e^{-E(v_n,h; \theta)} - N \log Z$. Compute $\frac{\partial \mathcal{L}(D|\theta)}{\partial \theta}$, and express it in terms of $p(h|v)$, $p(v,h)$, and $\frac{\partial E(v,h)}{\partial \theta}$.
  4. Use the results in (2) and (3) to compute $\frac{\partial \mathcal{L}(D|\theta)}{\partial W_{ij}}$, $\frac{\partial \mathcal{L}(D|\theta)}{\partial b_j}$, and $\frac{\partial \mathcal{L}(D|\theta)}{\partial c_i}$.
  5. The update equations derived above contain computationally intractable terms. Which terms are intractable, and how do we approximate them? (Hint: contrastive divergence; see the sketch after this list.)
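For context on the hint in (5): contrastive divergence (CD-$k$) replaces the intractable expectation under the model distribution with samples from a short Gibbs chain started at a data point, alternating between the two conditionals from (1) and (2). A hedged sketch of the chain (the notation $v^{(t)}, h^{(t)}$ is illustrative, not part of the assignment):

$v^{(0)} = v_n, \qquad h^{(t)} \sim p(h|v^{(t)}, \theta), \qquad v^{(t+1)} \sim p(v|h^{(t)}, \theta), \qquad t = 0, \ldots, k-1$

The model expectation is then estimated at the chain's endpoint; CD-1 ($k=1$) is the usual choice in practice.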

Code-From-Scratch

In this portion of the assignment, you will implement a binary-binary RBM and train it on the MNIST dataset.
Each MNIST image is $28\times28\times1$; reshape it into a $784 \times 1$ vector, which is $v$. Choose $h$ to be of dimension $200$. Use the update equations you derived in the pencil-and-paper part to train an RBM; a hedged sketch of the CD-1 update follows.
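This is a minimal NumPy sketch of one CD-1 update under the shapes above (200 hidden, 784 visible); the function name, learning rate, and per-example updating are illustrative assumptions, not a prescribed implementation:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(v0, W, b, c, lr=0.01):
        """One CD-1 step on a single binary image v0.
        Shapes: v0 (784,), W (200, 784), b (784,), c (200,); lr is illustrative."""
        # Positive phase: sample h0 ~ p(h | v0) using the conditional from part (2).
        ph0 = sigmoid(W @ v0 + c)
        h0 = (np.random.rand(ph0.size) < ph0).astype(float)
        # Negative phase: one Gibbs step to a reconstruction v1, then p(h | v1).
        pv1 = sigmoid(W.T @ h0 + b)
        v1 = (np.random.rand(pv1.size) < pv1).astype(float)
        ph1 = sigmoid(W @ v1 + c)
        # Gradient ascent: data statistics minus reconstruction statistics.
        W += lr * (np.outer(ph0, v0) - np.outer(ph1, v1))
        b += lr * (v0 - v1)
        c += lr * (ph0 - ph1)
        return W, b, c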
What to turn in:

TensorFlow

In this portion of the assignment, you will use an RBM, a stack of RBMs, and PCA to perform dimensionality reduction on the raw pixels of the MNIST dataset. From the reduced features, you will perform 10-way classification using multi-class logistic regression.

Report training and testing accuracy for the following:
  1. Train a 10-way multi-class logistic regression on the raw-pixel data.
  2. Train an RBM with 200 hidden units. Perform 10-way multi-class logistic regression on the resulting 200-dimensional features.
  3. Use PCA to reduce the input dimension to 200. Perform 10-way multi-class logistic regression on the reduced features.
  4. Train a stack of RBMs with hidden units [500, 200] (i.e. train an RBM with 500 hidden units, then use the output of the first RBM to train a second RBM with 200 hidden units). Perform 10-way multi-class logistic regression on the resulting 200-dimensional features.
Note: For the classification part, you are allowed to use higher-level APIs such as TFLearn. This will greatly reduce the amount of code you need to write. A hedged sketch of the logistic-regression head appears below.
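For orientation, this is a minimal sketch of the 10-way logistic-regression head written against the 2016-era (graph-and-session style) TensorFlow API. The names `dim`, `train_feats`, `train_labels`, `test_feats`, `test_labels`, and the hyperparameters are illustrative assumptions; the features fed in can come from any of the four pipelines above:

    import tensorflow as tf

    dim = 200  # use 784 for the raw-pixel baseline in (1)
    x = tf.placeholder(tf.float32, [None, dim])
    y = tf.placeholder(tf.float32, [None, 10])  # one-hot labels

    W = tf.Variable(tf.zeros([dim, 10]))
    b = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(x, W) + b

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
    train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    with tf.Session() as sess:
        # Initializer name varies across 0.x releases
        # (tf.initialize_all_variables in older versions).
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            # train_feats / train_labels are assumed NumPy arrays holding the
            # reduced-dimension features and one-hot labels (names hypothetical).
            sess.run(train_op, feed_dict={x: train_feats, y: train_labels})
        print(sess.run(accuracy, feed_dict={x: test_feats, y: test_labels}))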

What to turn in: