Useful Articles and Books
- MP2, PCA: Matthew Turk and Alex Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience 3(1):71-86, 1991.
- MP3, Cepstrum and MFCC:
  - Cepstrum and liftering: A.V. Oppenheim, R.W. Schafer and T.G. Stockham, Jr., Nonlinear Filtering of Multiplied and Convolved Signals, IEEE Transactions on Audio and Electroacoustics 16(3):437-466, 1968, section IV. Recommended reading: pp. 437-442, 444, and 460-464.
  - MFCC: Steven Davis and Paul Mermelstein, Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, IEEE Transactions on Acoustics, Speech and Signal Processing 28(4):357-366, 1980. The part on MFCC is required material; the part on DTW is optional.
- MP4, Gaussian Mixture Models:
  - Things you need to know: GMMs. Biing-Hwang Juang, Stephen E. Levinson and Man Mohan Sondhi, Maximum Likelihood Estimation for Multivariate Mixture Observations of Markov Chains, IEEE Transactions on Information Theory 32(2):307-309, 1986.
  - Optional: optimal detection and Bayesian classifiers. J. Neyman and E.S. Pearson, On the Problem of the Most Efficient Tests of Statistical Hypotheses, Philosophical Transactions of the Royal Society of London, Series A, 231:289-337, 1933 (read pages 289-303).
- MP5, Hidden Markov Models: Lawrence R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proc. IEEE 77(2):257-286, 1989. You should read at least pages 257-264, 266-267, 272-273, and 278-279.
- MP6, AdaBoost Object Detection:
  - Required (how to do it): Paul Viola and Michael Jones, Rapid Object Detection using a Boosted Cascade of Simple Features, CVPR 2001.
  - Optional (why it works): Yoav Freund and Robert E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, 1995.
- MP7, Image Animation:
Lecture Notes
- Thomas Huang, ECE 417 Lecture Notes, 2012
- Mark Hasegawa-Johnson, Lecture Notes in Speech Production, Speech Coding, and Speech Recognition, 1999.
Books
- MP1, MP2: Gilbert Strang, Introduction to Linear Algebra.
- MP3, MP4, MP5: L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.
Useful Links
- Kevin Murphy, Matlab Tips
- Mike Brooks, Voicebox Toolbox for Matlab
- MIT Open Courseware Introduction to Matlab