Homework 2: Available 9/16/2016, Due 10/1/2016.
NOTE: Only the software portion may be done in teams of up to three people; your report must be your own work.
Handing in the assignment on Compass:
  1. Report in PDF format. The file name should be net_id_hw2_report (e.g. yeh17_hw2_report.pdf).
  2. Code for both the code-from-scratch and TensorFlow parts in TGZ or ZIP format. The file name should be net_id_hw2_code (e.g. yeh17_hw2_code.zip).
  3. Include your team members' names in the submission description and in the report.
  4. Make sure the uploaded files are correct. Missing or corrupted files will be considered late. (You can download the uploaded files to verify.)

Data

The dataset is here; it is the same file as in assignment 1. Refer to the README file for more details.

Software Requirements

  1. Do NOT use IPython notebooks.
  2. Modularize your code; a single monolithic script is difficult to read.
  3. Use relative paths when accessing data (a minimal example follows this list). Clearly state in the README file where the data should be stored relative to the base directory.
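For example, one way to resolve the data location relative to the code's base directory (the data/ subdirectory and file name below are hypothetical; document your actual layout in the README):

  import os

  # Resolve paths relative to the directory containing this script.
  BASE_DIR = os.path.dirname(os.path.abspath(__file__))
  # Hypothetical layout: data stored in <base>/data/ next to the code.
  DATA_PATH = os.path.join(BASE_DIR, 'data', 'ee_train.txt')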

Pencil-and-Paper

In this part of the assignment, you will compute the derivatives/gradients and the backpropagated error for the following common modules in a neural network. (A numerical check you can use to verify your answers is sketched after this list.)
  1. Derivative of softmax. Recall the softmax function, $y[k] = \frac{e^{z[k]}}{\sum_{j} e^{z[j]}}$. What is $\frac{\partial y[k]}{\partial z[j]}$?
  2. Negative log likelihood loss for multi-class. Recall the negative log likelihood, $L = -\sum_{i} \sum_{j} y_i[j] \log \hat{y}_i[j]$, where $y_i$ is the one-hot target for example $i$. What is $\frac{\partial L}{\partial \hat{y}_i[j]}$?
  3. Avg-pooling (1D). Recall the avg-pooling (1D) operation with window size W. What is $\frac{\partial y[i]}{\partial x[j]}$?
  4. Max-pooling (1D). Recall the max-pooling (1D) operation with window size W. What is $\frac{\partial y[i]}{\partial x[j]}$?
  5. Convolutional layer (1D). Recall the convolution (1D) operation; assume $\vec{w}$ has length 3 and is zero-indexed at the center. What is $\frac{\partial y[i]}{\partial x[j]}$? What is $\frac{\partial y[i]}{\partial w[j]}$?
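A finite-difference check is a handy way to verify each hand-derived Jacobian before you rely on it in code. Below is a minimal NumPy sketch for the softmax case; eps and the test point z are arbitrary choices, and the same numeric_jacobian helper works for the pooling and convolution modules too.

  import numpy as np

  def softmax(z):
      e = np.exp(z - z.max())          # subtract max for numerical stability
      return e / e.sum()

  def numeric_jacobian(f, z, eps=1e-6):
      """Central-difference estimate of J[k, j] = dy[k] / dz[j]."""
      J = np.zeros((f(z).size, z.size))
      for j in range(z.size):
          zp, zm = z.copy(), z.copy()
          zp[j] += eps
          zm[j] -= eps
          J[:, j] = (f(zp) - f(zm)) / (2 * eps)
      return J

  z = np.array([0.5, -1.0, 2.0])
  print(numeric_jacobian(softmax, z))  # compare against your analytic formula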

Code-From-Scratch

Perform 9-way classification among the 9 letters in the ee-set using a fully-connected neural network with 2 hidden layers. Use the first 70 frames of each example and drop any example that is shorter than 70 frames. Then use mini-batch gradient descent for optimization; you can pick the batch size. The expected test set accuracy is in the range of 35% to 50%. (A starting-point sketch is given after this list.)
  1. Experiment with different numbers of hidden nodes, {10, 50}; assume the two hidden layers have the same size.
  2. Experiment with different types of nonlinearities: {sigmoid, tanh, ReLU}.

  3. What to turn in:
    • Methods:
      1. Describe the functions you wrote and the overall structure of your code.
      2. Describe the model architecture and the specific hyperparameters you have chosen.
      3. Report the total number of weights in the models.
    • Results:
      1. Report training and testing accuracy for your best model.
      2. Report training and testing accuracy for all the models (all combinations of nonlinearities and numbers of hidden nodes).
      3. Report the running time (in seconds) for one iteration of backpropagation on the models with 10 and 50 hidden nodes. Also describe how the running time varies with batch size.
      4. Plot the training and testing classification confusion matrices for your best model.
    • Code: Submit an auxiliary TGZ or ZIP file containing your code. Note: a README.txt describing how to run your code must be included in the zip file.
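For reference, here is a minimal NumPy sketch of one way to structure the forward and backward passes with mini-batch SGD. The ReLU nonlinearity, the 16-features-per-frame input dimension (borrowed from the TDNN spec below), the initialization, the learning rate, and the random stand-in data are all assumptions; substitute the real ee-set frames and labels.

  import numpy as np

  rng = np.random.RandomState(0)
  D, H, C = 70 * 16, 50, 9          # assumes 16 features per frame; adjust to the data

  def init(shape):
      return rng.randn(*shape) * np.sqrt(2.0 / shape[0])

  W1, b1 = init((D, H)), np.zeros(H)
  W2, b2 = init((H, H)), np.zeros(H)
  W3, b3 = init((H, C)), np.zeros(C)

  def forward(X):
      h1 = np.maximum(0.0, X.dot(W1) + b1)     # ReLU; also try sigmoid/tanh
      h2 = np.maximum(0.0, h1.dot(W2) + b2)
      z = h2.dot(W3) + b3
      z -= z.max(axis=1, keepdims=True)        # numerically stable softmax
      p = np.exp(z)
      return h1, h2, p / p.sum(axis=1, keepdims=True)

  def sgd_step(X, y, lr=0.01):
      n = X.shape[0]
      h1, h2, p = forward(X)
      dz = p.copy()
      dz[np.arange(n), y] -= 1.0               # gradient of mean NLL w.r.t. logits
      dz /= n
      dh2 = dz.dot(W3.T) * (h2 > 0)            # backprop through ReLU
      dh1 = dh2.dot(W2.T) * (h1 > 0)
      for P, G in [(W3, h2.T.dot(dz)), (b3, dz.sum(0)),
                   (W2, h1.T.dot(dh2)), (b2, dh2.sum(0)),
                   (W1, X.T.dot(dh1)), (b1, dh1.sum(0))]:
          P -= lr * G                          # in-place mini-batch SGD update

  # Toy usage with random stand-in data; replace with the real ee-set arrays.
  X, y = rng.randn(128, D), rng.randint(0, C, size=128)
  for epoch in range(10):
      for i in range(0, len(X), 32):           # batch size 32 is an arbitrary pick
          sgd_step(X[i:i + 32], y[i:i + 32])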

TensorFlow

Perform 9-way classification among the 9 letters in the ee-set using the TDNN architecture of Waibel, Hanazawa, Hinton, Shikano, and Lang. Use the first 70 frames of each example and drop any example that is shorter than 70 frames. Then use mini-batch gradient descent for optimization; you can pick the batch size. The expected test set accuracy is in the range of 35% to 50%.

Use the following network architecture:
Input layer: (J=16, N=2); Hidden layer 1: (J=8, N=4); Hidden layer 2: (J=3, N=1); Final layer: multi-class logistic regression. A sketch of one possible TensorFlow implementation follows.
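One reading consistent with Waibel et al. is that J is the number of units per frame at a layer and N the number of delayed copies of its output fed upward, so each layer becomes a 1-D convolution over time with kernel width N+1. Below is a minimal TF 1.x-style sketch under that reading; the sigmoid nonlinearity, the learning rate, and averaging the final layer's per-frame logits over time are assumptions, not part of the assignment.

  import tensorflow as tf

  NUM_FRAMES, NUM_FEATURES, NUM_CLASSES = 70, 16, 9

  def var(shape):
      return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

  x = tf.placeholder(tf.float32, [None, NUM_FRAMES, NUM_FEATURES])
  labels = tf.placeholder(tf.int64, [None])

  # Input (J=16, N=2) -> hidden 1: width-3 window over the 16 features.
  h1 = tf.sigmoid(tf.nn.conv1d(x, var([3, 16, 8]), stride=1, padding='VALID')
                  + tf.Variable(tf.zeros([8])))
  # Hidden 1 (J=8, N=4) -> hidden 2: width-5 window over the 8 units.
  h2 = tf.sigmoid(tf.nn.conv1d(h1, var([5, 8, 3]), stride=1, padding='VALID')
                  + tf.Variable(tf.zeros([3])))
  # Hidden 2 (J=3, N=1) -> final layer: width-2 window giving per-frame
  # logits, averaged over time before the softmax (an assumption modeled on
  # Waibel et al.'s time integration).
  logits = tf.reduce_mean(
      tf.nn.conv1d(h2, var([2, 3, NUM_CLASSES]), stride=1, padding='VALID')
      + tf.Variable(tf.zeros([NUM_CLASSES])), axis=1)
  loss = tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                     logits=logits))
  train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)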

Next, try changing the TDNN architecture, or even try a CNN architecture, and see if you can improve the accuracy (the accuracy for this part will not affect your grade). Here is a TensorFlow tutorial with a CNN on the MNIST dataset.
What to turn in: