Due Date
Homework 1 released 8/28/2016; due 9/16/2016.
Data
The dataset is here. The data include audio waveforms (and cepstral coefficients) for letters of the alphabet. The task is to separate the ee-set (b, c, d, e, g, p, t, v, z) from the eh-set (f, l, m, n, s, x). The dataset is divided into train, dev (development test), and eval (evaluation test) sets. Read the file README.txt for more information. Note: the dataset contains some corrupted data; skip over those examples (a hedged loading sketch follows).
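Since README.txt describes the actual file format, the loader below is only a minimal sketch: it assumes one whitespace-separated feature vector per line, and it treats any line that fails to parse as one of the corrupted examples to be skipped. The function name and file name are illustrative, not part of the assignment.

```python
import numpy as np

def load_features(path):
    """Load one feature vector per line, skipping corrupted lines.

    Assumes whitespace-separated floats; adapt to the format in README.txt.
    """
    rows = []
    with open(path) as f:
        for line in f:
            try:
                rows.append([float(v) for v in line.split()])
            except ValueError:
                continue  # corrupted example: skip it
    return np.array(rows)

# Example (hypothetical file name):
# X_train = load_features('train.dat')
```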
Pencil-and-paper
Suppose that you have a one-layer neural network of the form $y_i=g(w'x_i+b)$, where $g(\cdot)$ is some nonlinearity, $b$ is a trainable scalar bias parameter, and $w'x_i$ denotes the dot product between the trainable weight vector $w$ and the $i^{\text{th}}$ training vector $x_i$. Suppose you have a training corpus of the form $D=\{(x_1,t_1),\ldots,(x_n,t_n)\}$. Turn in your derivations of the following five things.
- Find the derivatives $\frac{\partial E}{\partial w_j}$ and $\frac{\partial E}{\partial b}$, where $E=\sum_i (t_i-y_i)^2$. Your answer should include the derivative of the nonlinearity, $g'(w'x_i+b)$. (A chain-rule starting point is sketched after this list.)
- Suppose $g(a)=a$. Write $\frac{\partial E}{\partial w_j}$ without $g'(\cdot)$.
- Suppose $g(a)=\frac{1}{1+\exp(-a)}$. In this case, $g'(w'x_i+b)$ can be written as a simple function of $y_i$; write it that way.
- Find the same derivatives using the perceptron error instead: $E=\sum_i \max\left(0,\,-(w'x_i+b)\,t_i\right)$.
- Find the same derivatives using the SVM error instead: $E=\|w\|_2^2+C\sum_i h(x_i,t_i)$, where $h(x_i,t_i)=\max\left(0,\,1-t_i(w'x_i+b)\right)$ is the hinge loss, $C$ is an arbitrary constant, and you may assume $t_i\in\{+1,-1\}$.
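For the first item, a chain-rule starting point (not a full solution) may help: writing $a_i=w'x_i+b$, so that $y_i=g(a_i)$,

$$\frac{\partial E}{\partial w_j}=\sum_i \frac{\partial E}{\partial y_i}\,\frac{dy_i}{da_i}\,\frac{\partial a_i}{\partial w_j}, \qquad \frac{\partial E}{\partial b}=\sum_i \frac{\partial E}{\partial y_i}\,\frac{dy_i}{da_i}\,\frac{\partial a_i}{\partial b},$$

where $\frac{\partial a_i}{\partial w_j}=x_{ij}$, the $j^{\text{th}}$ component of $x_i$, and $\frac{dy_i}{da_i}=g'(a_i)$.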
Code-From-Scratch
This part of the homework can be written in any programming language you like. We recommend Matlab or Python, but other languages are also acceptable.
Write a one-layer neural net that takes a single cepstrum as input and classifies it as ee-set versus eh-set. Write general code for training (using gradient descent) and for testing (reporting both MSE and classification accuracy). Use your code to train four classifiers: linear, logistic, perceptron, and linear SVM. Note that you will have to define the targets $t_i$ to be either $\{+1,-1\}$ or $\{+1,0\}$, depending on the classifier.
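For concreteness, here is a minimal sketch of gradient-descent training for the linear case ($g(a)=a$, squared error), assuming the cepstra are already loaded as rows of a NumPy array `X` with targets `t` in $\{+1,-1\}$. All names are illustrative, and the other three classifiers differ only in their gradients.

```python
import numpy as np

def train_linear(X, t, lr=1e-4, n_iters=1000):
    """Gradient descent on E = sum_i (t_i - y_i)^2 with y_i = w'x_i + b."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        err = t - (X @ w + b)         # residuals t_i - y_i
        w -= lr * (-2.0 * X.T @ err)  # dE/dw from the pencil-and-paper part
        b -= lr * (-2.0 * err.sum())  # dE/db
    return w, b

def error_rate(X, t, w, b):
    """Classification error rate = 1 - accuracy, thresholding y at zero."""
    pred = np.where(X @ w + b >= 0, 1, -1)
    return np.mean(pred != t)
```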
What to turn in:
- Methods: describe the functions you wrote. Specify how particular lines of code implement particular numbered equations from the pencil-and-paper part. Include a figure showing either the dependency tree of your classes and methods, a flowchart of training and testing, or something similarly descriptive.
- Results (note: error rate = 1 - accuracy):
- Provide one figure with four subfigures, showing convergence plots for all four classifiers (abscissa = training iteration, ordinate = training-corpus error rate).
- Provide a figure showing the training-corpus and development-test-corpus error rates of SVM training runs with at least five different values of the $C$ parameter (abscissa = $C$, ordinate = error rate).
- Provide a table showing the evaluation-test-corpus error rates of all four classifiers (including whichever SVM has the lowest error rate on the development-test corpus).
- Provide a scatter plot of 300 randomly selected training tokens in the space defined by their first two principal components, with 'o' marking ee-set tokens and 'x' marking eh-set tokens, and with the decision boundaries of the four classifiers drawn as lines in four different colors. (A PCA-projection sketch follows this list.)
- Code: submit an auxiliary TGZ or ZIP file containing your code.
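One possible way to build the scatter-plot figure, sketched under the assumption that `X` holds the training cepstra as rows and `ee_mask` is a boolean array marking ee-set tokens (both names hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

def pca_scatter(X, ee_mask, n_points=300):
    """Scatter n_points random tokens in the space of the first two PCs."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:2].T                        # project onto first two PCs
    idx = np.random.choice(len(X), n_points, replace=False)
    Z, ee = Z[idx], ee_mask[idx]
    plt.scatter(Z[ee, 0], Z[ee, 1], marker='o', label='ee-set')
    plt.scatter(Z[~ee, 0], Z[~ee, 1], marker='x', label='eh-set')
    plt.xlabel('PC 1'); plt.ylabel('PC 2'); plt.legend(); plt.show()
```

Each classifier's boundary $w'x+b=0$ can be drawn in the same plane by plotting the line $(V^Tw)\cdot z+(w'\bar{x}+b)=0$, where the columns of $V$ are the two principal directions and $\bar{x}$ is the training mean.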
TensorFlow
Train a one-layer neural net with logistic output nodes, using TensorFlow. Hint: this will be very similar to the MNIST for Beginners tutorial, but with different data. This part of the assignment must be written in Python.
Note that a logistic node is exactly equivalent to a two-class softmax node. Cross-entropy loss is not the same as MSE loss; you may use whichever you prefer. (A minimal sketch follows.)
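A minimal sketch in the style of the MNIST-for-beginners tutorial, assuming the features and one-hot targets have already been loaded into NumPy arrays `X_train` and `T_train` (both names hypothetical):

```python
import tensorflow as tf

# Assumed data (not provided here): X_train is (n, d) floats,
# T_train is (n, 2) one-hot targets [ee-set, eh-set].
d = X_train.shape[1]

x = tf.placeholder(tf.float32, [None, d])
t = tf.placeholder(tf.float32, [None, 2])

W = tf.Variable(tf.zeros([d, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.nn.softmax(tf.matmul(x, W) + b)   # two-class softmax == logistic node

# Cross-entropy loss, as in the MNIST tutorial.
cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(t * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

sess = tf.Session()
sess.run(tf.initialize_all_variables())
for _ in range(1000):
    sess.run(train_step, feed_dict={x: X_train, t: T_train})

# Error rate = 1 - accuracy.
correct = tf.equal(tf.argmax(y, 1), tf.argmax(t, 1))
accuracy = sess.run(tf.reduce_mean(tf.cast(correct, tf.float32)),
                    feed_dict={x: X_train, t: T_train})
print('training error rate:', 1.0 - accuracy)
```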
What to turn in:
- Methods: discuss which TensorFlow functions you used, and how.
- Results: provide a convergence figure (abscissa = training iteration, ordinate = training-corpus error rate). Provide a table showing error rates on the training corpus and on the evaluation-test corpus.
- Code: submit an auxiliary TGZ or ZIP file containing your code. Don't include the TensorFlow source.