Almost every lecture will be accompanied by an article. Exams will have two parts, each worth roughly half the points: (1) short-answer or multiple-choice questions about the assigned articles, and (2) long-answer questions based on the homework.
Supplementary readings are provided for your information only; exam questions will not be drawn from them.
- Topic 1: Supervised Learning
- August 23: SVD, pseudo-inverse, linear regression. M. Planitz, Inconsistent Systems of Linear Equations, The Mathematical Gazette 63(425):181-185, 1979.
- Supplementary reading: Christopher Bishop, Neural Networks for Pattern Recognition
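As an illustrative sketch of the August 23 topic (toy data, not course code): solving an inconsistent overdetermined system Ax = b in the least-squares sense via the SVD-based pseudo-inverse.

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns (illustrative data).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.1, 3.9])

# Pseudo-inverse from the SVD: A+ = V diag(1/s) U^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Least-squares solution of the inconsistent system.
x = A_pinv @ b
```

The same `x` is what `np.linalg.lstsq` or `np.linalg.pinv` would return; the point is that the SVD makes the least-squares solution explicit.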
- August 25: logistic regression. Frank Baker, The Basics of Item Response Theory, second edition, chapter 3. Publisher: ERIC Clearinghouse on Assessment and Evaluation.
- Supplementary reading: Christopher Bishop, Neural Networks for Pattern Recognition
- August 30: no lecture.
- September 1: perceptron. Mehryar Mohri and Afshin Rostamizadeh, Perceptron Mistake Bounds, arXiv:1305.0208, 2013 (only sections 1-2 required)
- Supplementary reading: Christopher Bishop, Neural Networks for Pattern Recognition
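A minimal sketch of the mistake-driven perceptron update analyzed in the Mohri-Rostamizadeh reading (toy separable data, not from the paper):

```python
import numpy as np

# Linearly separable toy set; labels in {+1, -1}.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)
mistakes = 0
for _ in range(20):                 # repeated passes over the data
    updated = False
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:      # mistake: point on wrong side of hyperplane
            w += yi * xi            # perceptron update: w <- w + y x
            mistakes += 1
            updated = True
    if not updated:                 # a full pass with no mistakes: converged
        break
```

The mistake bounds in the paper limit how large `mistakes` can grow as a function of the data's margin and radius.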
- September 6: support vector machine. C.J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Knowledge Discovery and Data Mining 2(2), 1998 (only sections 3.1 and 3.5 required)
- September 8: logistic regression, multivariate logistic regression [lecture slides]
- September 13: principal component analysis, python + numpy + TensorFlow tutorial [lecture slides] [ipython notebook] [ipython pdf]
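In the spirit of the September 13 python + numpy tutorial, here is a short illustrative PCA-via-SVD sketch (toy data; not the course notebook):

```python
import numpy as np

# Toy data: 100 points in 3-D with very different variances per axis.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                 # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                          # principal directions (rows)
explained_var = s ** 2 / (len(X) - 1)    # variance captured by each direction
Z = Xc @ Vt[:2].T                        # projection onto the top-2 components
```

The singular values come out sorted, so `explained_var` is automatically in decreasing order, and the first row of `Vt` is the direction of maximum variance.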
- September 15: No lecture
- September 20: back-propagation. David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams, Learning representations by back-propagating errors, Nature 323:533-536, 1986.
- Supplementary reading: JSALT Slides on Neural Networks, Artificial Neural Networks (ANN)
- Christopher Bishop, Neural Networks for Pattern Recognition (Chapter 4)
- September 22: convolutional neural networks. Yoshua Bengio and Yann LeCun, Scaling Learning Algorithms Towards AI, in Large-Scale Kernel Machines, L. Bottou, O. Chapelle, D. Decoste and J. Weston, Eds., 2007 (only section 6.2 required)
- September 27: time-delay neural networks. Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J. Lang, Phoneme Recognition Using Time-Delay Neural Networks, IEEE Trans. ASSP 37:328-339, 1989 (only section II required)
- Supplementary reading: CNN Lecture Slides
- September 29: Exam 1 review
- October 4: Exam 1
- Topic 2: Unsupervised Learning
- October 6: Expectation maximization. A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B 39(1):1-38, 1977
- October 11: Gaussian mixture models. Jeff Bilmes, A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, International Computer Sciences Institute TR-97-021, 1997
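The EM recipe in Bilmes's tutorial can be sketched for a two-component 1-D Gaussian mixture as follows (toy data and initialization chosen for illustration, not course code):

```python
import numpy as np

# Toy data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 200)])

# Initial mixture parameters: weights, means, variances.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(t, m, v):
    return np.exp(-0.5 * (t - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    r = np.stack([pi[k] * gauss(x, mu[k], var[k]) for k in range(2)], axis=1)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, variances from responsibilities.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
```

Each iteration provably does not decrease the data log-likelihood, which is the point of the Dempster-Laird-Rubin paper from the previous lecture.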
- October 13: multivariate Gaussians. Sam Roweis, EM Algorithms for PCA and SPCA, Caltech research note, 1999 (required: section 2, definition of PCA)
- October 17: Gaussian mixture models, continued.
- October 20: restricted Boltzmann machines. Smolensky, Harmony Theory, pages 213-220. (Pages 213-220 explain what the variables mean, and are especially useful for understanding WHY we use RBMs. Pages 220-232 describe HOW we use RBMs, specifically how to compute the hidden vector from the observed vector: log probability on p. 221, joint probability on p. 228, posterior probability on pp. 231-232.)
- October 25: contrastive divergence. Geoffrey E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation 14(8):1771-1800, 2002 (required: sections 2 and 7).
- October 27: Parzen windows. Emanuel Parzen, On Estimation of a Probability Density Function and Mode, Annals of Mathematical Statistics 33(3):1065-1076, 1962.
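A minimal sketch of Parzen-window density estimation (illustrative; Gaussian window, toy data):

```python
import numpy as np

# Toy data: 500 draws from a standard normal.
rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, 500)
h = 0.3                                   # window width (bandwidth)

def parzen(t, data, h):
    # Average of Gaussian windows centered at each sample point.
    k = np.exp(-0.5 * ((t - data[:, None]) / h) ** 2)
    return k.mean(axis=0) / (h * np.sqrt(2 * np.pi))

grid = np.linspace(-4.0, 4.0, 81)
p_hat = parzen(grid, samples, h)          # estimated density on the grid
```

As Parzen's paper shows, with the window width shrinking appropriately as the sample size grows, this estimate converges to the true density.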
- Topic 3: Time
- November 1: back-propagation through time. Paul Werbos, Back-propagation through time: What it is and why we do it, Proceedings of the IEEE 78(10):1550-1560, 1990 (Required: sections II.D-E, III.A-C)
- November 3: LSTM (long short-term memory network). Sepp Hochreiter and Jürgen Schmidhuber, Long Short-Term Memory, Neural Computation 9(8):1735-1780, 1997 (Required: Section 4 only)
- November 8: hidden Markov models. Lawrence R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proc. IEEE 77(2):257-286, 1989 (Required: Sections 1-3 only).
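The forward algorithm from Rabiner's tutorial (Problem 1: computing P(O | model)) can be sketched for a tiny 2-state discrete HMM (toy parameters, not from the paper):

```python
import numpy as np

A = np.array([[0.7, 0.3],       # transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],       # emission probabilities b_i(o), 2 symbols
              [0.2, 0.8]])
pi0 = np.array([0.6, 0.4])      # initial state distribution
obs = [0, 1, 0]                 # observed symbol sequence

# Initialization: alpha_1(i) = pi_i * b_i(o_1)
alpha = pi0 * B[:, obs[0]]
# Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
# Termination: P(O | model) = sum_i alpha_T(i)
likelihood = alpha.sum()
```

This takes O(N^2 T) work instead of enumerating all N^T state paths; summing the path probabilities by brute force gives the same number.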
- November 10: Hybrid NN-HMM.
- Nelson Morgan and Herve Bourlard, Neural networks for statistical recognition of continuous speech, Proceedings of the IEEE 83(5):742-770, 1995. (Required: only Section VII.B, pp. 756-7)
- Yoshua Bengio, Renato de Mori, Giovanni Flammia and Ralf Kompe, Global Optimization of a Neural Network-Hidden Markov Model Hybrid, 1992. Hybrid LSTM-HMM. (Optional)
- Alex Graves, Navdeep Jaitly and Abdel-rahman Mohamed, Hybrid Speech Recognition with Deep Bidirectional LSTM, Proc. IEEE ASRU, 2013 (Optional)
- November 15: Lecture canceled.
- November 17: generative adversarial networks and variational autoencoders. (Optional) [lecture slides]
- November 29: finite state transducers. Mehryar Mohri, Fernando Pereira and Michael Riley, Weighted finite-state transducers in speech recognition, Computer Speech and Language 16:69-88, 2002 (Optional)
- December 1: polychronization. Eugene M. Izhikevich, Polychronization: Computation with Spikes, Neural Computation 18:245-282, 2006. (Optional)
- December 6: Exam 3 review.
- December 15, 19:00-22:00: Exam 3
Extra Readings
Here are some papers that are not critical to the course material (not covered in any quiz) but might be of interest to some of you.
- self-training. H.J. Scudder, Probability of Error of Some Adaptive Pattern-Recognition Machines, IEEE Trans. Information Theory 11:363-371, 1965
- G. Hinton, L. Deng, D. Yu, G.E. Dahl, A.R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine 29(6):82-97, 2012
- simulated annealing. Bruce Hajek, A Tutorial Survey of Theory and Applications of Simulated Annealing, Proc. 24th Conference on Decision and Control, 755-760, 1985
- higher-order learning. Diederik P. Kingma and Jimmy Lei Ba, Adam: A method for stochastic optimization, arXiv:1412.6980, 2014
- transfer learning. Y. Bengio, "Deep learning of representations for unsupervised and transfer learning," JMLR: Proceedings of Unsupervised and Transfer Learning Challenge and Workshop, pp. 17-36, 2012
- multi-task learning. R. Caruana, "Multitask learning", Machine Learning 28(1):41-75, 1997
- Spiking neurons. Fred Rieke, D. Warland, and W. Bialek, Coding efficiency and information rates in sensory neurons, Europhys. Lett. 22:151-6, 1993.
- encoder-decoder networks. William Chan, Navdeep Jaitly, Quoc V. Le, and Oriol Vinyals, Listen, Attend and Spell, arXiv:1508.01211, 2015.