Almost every lecture will be accompanied by an article. Exams will have two parts, each worth roughly half the points: (1) short-answer or multiple-choice questions about the assigned articles, and (2) long-answer questions based on the homework.
Supplementary readings are provided for your information only; exam questions will not be drawn from them.
- Topic 1: Supervised Learning
- August 23: SVD, pseudo-inverse, linear regression. M. Planitz, Inconsistent Systems of Linear Equations, The Mathematical Gazette 63(425):181-185, 1979.
- Supplementary reading: Christopher Bishop, Neural Networks for Pattern Recognition
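As an illustrative sketch of the August 23 topic (toy data, not course code): solving an inconsistent overdetermined system Ax = b in the least-squares sense via the SVD-based pseudo-inverse.

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns (illustrative data).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.1, 3.9])

# Pseudo-inverse from the SVD: A+ = V diag(1/s) U^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Least-squares solution of the inconsistent system.
x = A_pinv @ b
```

The same `x` is what `np.linalg.lstsq` or `np.linalg.pinv` would return; the point is that the SVD makes the least-squares solution explicit.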
- August 25: logistic regression. Frank Baker, The Basics of Item Response Theory, second edition, chapter 3. Publisher: ERIC Clearinghouse on Assessment and Evaluation.
- Supplementary reading: Christopher Bishop, Neural Networks for Pattern Recognition
- August 30: no lecture.
- September 1: perceptron. Mehryar Mohri and Afshin Rostamizadeh, Perceptron Mistake Bounds, arXiv:1305.0208, 2013 (only sections 1-2 required)
- Supplementary reading: Christopher Bishop, Neural Networks for Pattern Recognition
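A minimal sketch of the mistake-driven perceptron update analyzed in the Mohri-Rostamizadeh reading (toy separable data, not from the paper):

```python
import numpy as np

# Linearly separable toy set; labels in {+1, -1}.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)
mistakes = 0
for _ in range(20):                 # repeated passes over the data
    updated = False
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:      # mistake: point on wrong side of hyperplane
            w += yi * xi            # perceptron update: w <- w + y x
            mistakes += 1
            updated = True
    if not updated:                 # a full pass with no mistakes: converged
        break
```

The mistake bounds in the paper limit how large `mistakes` can grow as a function of the data's margin and radius.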
- September 6: support vector machine. C.J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Knowledge Discovery and Data Mining 2(2), 1998 (only sections 3.1 and 3.5 required)
- September 8: logistic regression, multivariate logistic regression [lecture slides]
- September 13: principal component analysis, python + numpy + TensorFlow tutorial [lecture slides] [ipython notebook] [ipython pdf]
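In the spirit of the September 13 python + numpy tutorial, here is a short illustrative PCA-via-SVD sketch (toy data; not the course notebook):

```python
import numpy as np

# Toy data: 100 points in 3-D with very different variances per axis.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                 # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                          # principal directions (rows)
explained_var = s ** 2 / (len(X) - 1)    # variance captured by each direction
Z = Xc @ Vt[:2].T                        # projection onto the top-2 components
```

The singular values come out sorted, so `explained_var` is automatically in decreasing order, and the first row of `Vt` is the direction of maximum variance.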
- September 15: No lecture
- September 20: back-propagation. David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams, Learning representations by back-propagating errors, Nature 323:533-536, 1986.
- Supplementary reading: JSALT Slides on Neural Networks, Artificial Neural Networks (ANN)
- Christopher Bishop, Neural Networks for Pattern Recognition (Chapter 4)
- September 22: convolutional neural networks. Yoshua Bengio and Yann LeCun, Scaling Learning Algorithms Towards AI, in Large-Scale Kernel Machines, L. Bottou, O. Chapelle, D. Decoste and J. Weston, Eds., 2007 (only section 6.2 required)
- September 27: time-delay neural networks. Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J. Lang, Phoneme Recognition Using Time-Delay Neural Networks, IEEE Trans. ASSP 37:328-339, 1989 (only section II required)
- Supplementary reading: CNN Lecture Slides
- September 29: Exam 1 review
- October 4: Exam 1
- Topic 2: Unsupervised Learning
- October 6: Expectation maximization. A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B 39(1):1-38, 1977
- October 11: Gaussian mixture models. Jeff Bilmes, A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, International Computer Sciences Institute TR-97-021, 1997
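The EM recipe in Bilmes's tutorial can be sketched for a two-component 1-D Gaussian mixture as follows (toy data and initialization chosen for illustration, not course code):

```python
import numpy as np

# Toy data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 200)])

# Initial mixture parameters: weights, means, variances.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(t, m, v):
    return np.exp(-0.5 * (t - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    r = np.stack([pi[k] * gauss(x, mu[k], var[k]) for k in range(2)], axis=1)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, variances from responsibilities.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
```

Each iteration provably does not decrease the data log-likelihood, which is the point of the Dempster-Laird-Rubin paper from the previous lecture.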
- October 13: multivariate Gaussians. Sam Roweis, EM Algorithms for PCA and SPCA, Caltech research note, 1999 (required: section 2, definition of PCA)
- October 17: Gaussian mixture models, continued.
- October 20: restricted Boltzmann machines. Smolensky, Harmony Theory, pages 213-220. (Pages 213-220 explain what the variables mean, and are especially useful for understanding WHY we use RBMs. Pages 220-232 describe HOW we use RBMs, specifically how to compute the hidden vector from the observed vector: log probability on p. 221, joint probability on p. 228, posterior probability on pp. 231-232.)
- October 25: contrastive divergence. Geoffrey E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation 14(8):1771-1800, 2002 (required: sections 2 and 7).
- October 27: Parzen windows. Emanuel Parzen, On Estimation of a Probability Density Function and Mode, Annals of Mathematical Statistics 33(3):1065-1076, 1962.
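A minimal sketch of Parzen-window density estimation (illustrative; Gaussian window, toy data):

```python
import numpy as np

# Toy data: 500 draws from a standard normal.
rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, 500)
h = 0.3                                   # window width (bandwidth)

def parzen(t, data, h):
    # Average of Gaussian windows centered at each sample point.
    k = np.exp(-0.5 * ((t - data[:, None]) / h) ** 2)
    return k.mean(axis=0) / (h * np.sqrt(2 * np.pi))

grid = np.linspace(-4.0, 4.0, 81)
p_hat = parzen(grid, samples, h)          # estimated density on the grid
```

As Parzen's paper shows, with the window width shrinking appropriately as the sample size grows, this estimate converges to the true density.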
- Topic 3: Time
- November 1: back-propagation through time. Paul Werbos, Back-propagation through time: What it is and why we do it, Proceedings of the IEEE 78(10):1550-1560, 1990 (Required: sections II.D-E, III.A-C)
- November 3: LSTM (long short-term memory network). Sepp Hochreiter and Jürgen Schmidhuber, Long Short-Term Memory, Neural Computation 9(8):1735-1780, 1997 (Required: Section 4 only)
- November 8: hidden Markov models. Lawrence R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proc. IEEE 77(2):257-286, 1989 (Required: Sections 1-3 only).
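The forward algorithm from Rabiner's tutorial (Problem 1: computing P(O | model)) can be sketched for a tiny 2-state discrete HMM (toy parameters, not from the paper):

```python
import numpy as np

A = np.array([[0.7, 0.3],       # transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],       # emission probabilities b_i(o), 2 symbols
              [0.2, 0.8]])
pi0 = np.array([0.6, 0.4])      # initial state distribution
obs = [0, 1, 0]                 # observed symbol sequence

# Initialization: alpha_1(i) = pi_i * b_i(o_1)
alpha = pi0 * B[:, obs[0]]
# Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
# Termination: P(O | model) = sum_i alpha_T(i)
likelihood = alpha.sum()
```

This takes O(N^2 T) work instead of enumerating all N^T state paths; summing the path probabilities by brute force gives the same number.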
- November 10: Hybrid NN-HMM.
- Nelson Morgan and Herve Bourlard, Neural networks for statistical recognition of continuous speech, Proceedings of the IEEE 83(5):742-770, 1995. (Required: only Section VII.B, pp. 756-7)
- Yoshua Bengio, Renato de Mori, Giovanni Flammia and Ralf Kompe, Global Optimization of a Neural Network-Hidden Markov Model Hybrid, 1992. Hybrid LSTM-HMM. (Optional)
- Alex Graves, Navdeep Jaitly and Abdel-rahman Mohamed, Hybrid Speech Recognition with Deep Bidirectional LSTM, Proc. IEEE ASRU, 2013 (Optional)
- November 15: Lecture canceled.
- November 17: generative adversarial networks and variational autoencoders. (Optional) [lecture slides]
- November 29: finite state transducers. Mehryar Mohri, Fernando Pereira and Michael Riley, Weighted finite-state transducers in speech recognition, Computer Speech and Language 16:69-88, 2002 (Optional)
- December 1: polychronization. Eugene M. Izhikevich, Polychronization: Computation with Spikes, Neural Computation 18:245-282, 2006. (Optional)
- December 6: Exam 3 review.
- December 15, 19:00-22:00: Exam 3
Extra Readings
Here are some papers that are not critical to the course material (not covered in any quiz) but might be of interest to some of you.
- self-training. H.J. Scudder, Probability of Error of Some Adaptive Pattern-Recognition Machines, IEEE Trans. Information Theory 11:363-371, 1965
- G. Hinton, L. Deng, D. Yu, G.E. Dahl, A.R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine 29(6):82-97, 2012
- simulated annealing. Bruce Hajek, A Tutorial Survey of Theory and Applications of Simulated Annealing, Proc. 24th Conference on Decision and Control, 755-760, 1985
- higher-order learning. Diederik P. Kingma and Jimmy Lei Ba, Adam: A method for stochastic optimization, arXiv:1412.6980, 2014
- transfer learning. Y. Bengio, "Deep learning of representations for unsupervised and transfer learning," JMLR: Proceedings of Unsupervised and Transfer Learning Challenge and Workshop, pp. 17-36, 2012
- multi-task learning. R. Caruana, "Multitask learning", Machine Learning 28(1):41-75, 1997
- Spiking neurons. Fred Rieke, D. Warland, and W. Bialek, Coding efficiency and information rates in sensory neurons, Europhys. Lett. 22:151-6, 1993.
- encoder-decoder networks. William Chan, Navdeep Jaitly, Quoc V. Le, and Oriol Vinyals, Listen, Attend and Spell, arXiv:1508.01211, 2015.