Due to Covid-19 this course will be taught as a synchronous online class via Zoom.
Lectures: Tue/Thu 9.30-10.45am US Central Standard Time (Urbana-Champaign time)
Zoom info: Zoom link (needs Illinois credentials to log in), meeting id: 922 4939 6027
Lecture recordings: available for Illinois students at Mediaspace channel for the course
Anonymous attendance: Synchronous lectures will be recorded live. If you want to stay anonymous you can turn off video and change your name. See instructions.
- Tuesday, August 25. Introduction, logistics, streaming and sampling
- Thursday, August 27. Randomized algorithms: quick sort
- Tuesday, Sept 1. Probabilistic inequalities
- Thursday, Sept 3. Probabilistic counting (Morris counter)
- Tuesday, Sept 8. Intro to frequency moments in streaming, Distinct element estimation
- Thursday, Sept 10. Limited independence and Hashing, Distinct elements via pairwise independent hashing
- Tuesday, Sept 15. AMS estimator for frequency moments, F2 estimation
- Thursday, Sept 17. Heavy hitters and deterministic Misra-Greis algorithm
- Tuesday, Sept 22. Count-Min and Count Sketches
- Thursday, Sept 24. Sketching for range queries, Sparse recovery
- Tuesday, Sept 29. JL Lemma and Dimensionality Reduction, Subspace Embeddings
- Thursday, Oct 1. Finish Subspace Embeddings, Application to Least Squares Regression
- Tuesday, Oct 6. No lecture in lieu of midterm exam
- Thursday, Oct 8. Similarity estimation
- Tuesday, Oct 13. Locality Sensitive Hashing
- Thursday, Oct 15. LSH for Euclidean distances
- Tuesday, Oct 20. Quantiles and Selection
- Thursday, Oct 22. Finish previous lecture. Selection in Random Order Streams
- Tuesday, Oct 27. Topics in Streaming (F_p estimation, priority sampling, \ell_2 sampling, \ell_0 sampling)
- Thursday, Oct 29. Continue previous lecture
- Slides (clean, scribbles), scrubbed video
- Slide notes by McGregor for sampling.
- Section 4 in chapter on signals in the draft book by McGregor-Muthu.
- Paper on precision sampling by Andoni, Krauthgamer, Onak
- Paper on \ell_p sampling by Monemizadeh and Woodruff.
- Paper on near-optimal \ell_p sampling by Jowhari, Saglam, Tardos which has the simple \ell_0 sampling
- Tuesday, Nov 3. Election day, no lecture
- Thursday, Nov 5. Graph streaming and sketching for connectivity
- Tuesday, Nov 10. Graph streaming: matchings and sparsification
- Thursday, Nov 12. Approximate Matrix Multiplication
- Tuesday, Nov 17. SVD and low-rank approximation
- Thursday, Nov 19. No lecture due to FOCS
- Tuesday/Thursday, Nov 24, 26: Thanksgiving Break
- Tuesday, Dec 1. Fast/approximate NLA
- Thursday, Dec 3. Compressed Sensing, Coreset for MEB
- Tuesday, Dec 8. Coresets for clustering. Wrap up
References and study material
Relevant Books, Surveys, Collected Lecture Notes
- Small Summaries for Big Data by Graham Cormode and Ke Yi. Cambridge University Press, Dec 2020.
- (Evolving) Course notes of Amit Chakrabarti focusing mostly on streaming algorithms.
- Foundations of Data Science, by Avrim Blum, John Hopcroft and Ravi Kannan. Broad technical introduction covering foundations of several topics. UIUC students should have access to the digital version but there is also a free draft version available here.
- Mathematical Foundations of Data Analysis, Jeff Phillips.
- Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman and Jeff Ullman. Emphasis on the practical aspects
- Sketching as a Tool for Numerical Linear Algebra, a survey by David Woodruff. Copy for personal use from author's website
- Survey on sketching by Graham Cormode
- Survey on Graph Streams, Andrew MacGregor.
- Lecture Notes on Randomized Linear Algebra by Michael Mahoney. See also an older survey by same author.
- A book in preparation on data stream algorithms by Andrew McGregor and Muthu Muthukrishnan
- Surveys on core-sets. Core-sets: An updated survey by Dan Feldman. Coresets and Sketches by Jeff Phillips. Another one by Muneanu and Schwiegelshohn. Original one by Agarwal, Har-Peled and Varadarajan (see also Sariel's book).
- Lecture notes on Massively Parallel Computing model and algorithms by Mohsen Ghaffari
- Sublinear Algorithms is a topic that we will not cover in this course, however it is a related area and there are many resources collected at this website.
Resources on probability and randomized algorithms
- Algorithms for Big Data: Chandra (UIUC), Fall 2014
- Data Stream Algorithms: Amit Chakrabarti (Dartmouth), Spring 2020
- Algorithms for Big Data: Jelani Nelson (Harvard), Fall 2015. Includes Video Lectures
- Algorithms for Massive Data, Alex Andoni (Columbia), Spring 2019.
- Algorithms for Big Data, David Woodruff (CMU), Fall 2019
- Sketching Algorithms for Big Data: Piotr Indyk (MIT) and Jelani Nelson (Harvard), Fall 2017
- Algorithmic Techniques for Big Data: Moses Charikar (Stanford), Winter 2020
- Data Stream Algorithms, Andrew MacGregor (UMass Amherst), Spring 2018.
- Models of Computation for Massive Data: Jeff Philips (Utah)