**Lectures:** Tue/Thu 9.30-10.45am US Central Standard Time (Urbana-Champaign time)

**Zoom info: ** Zoom link (needs Illinois credentials to log in), meeting id: 922 4939 6027

**Lecture recordings:** available for Illinois students at Mediaspace channel for the course

**Anonymous attendance: **Synchronous lectures will be recorded live. If you want to stay anonymous you can turn off video and change your name. See instructions.

**Tuesday, August 25.**Introduction, logistics, streaming and sampling**Thursday, August 27.**Randomized algorithms: quick sort- Slides (clean, scribbles), scrubbed video

**Tuesday, Sept 1.**Probabilistic inequalities- Slides (clean, scribbles), scrubbed video
- Sariel Har-Peled's notes on randomized algorithms, in particular Chapter 7 on Chernoff inequality and a cheat sheet
- Randomized algorithms books and references below

**Thursday, Sept 3.**Probabilistic counting (Morris counter)- Slides (clean, scribbles), scrubbed video
- Lecture 1 from Fall 2014
- Lecture 1 from Nelson/Indyk course from 2017.
- Chapter 4 in Amit Chakrabarti's notes.
- Counting large numbers of events in small registers by Morris, CACM 1978
- Approximate counting: a detailed analysis by Flajolet.

**Tuesday, Sept 8.**Intro to frequency moments in streaming, Distinct element estimation- Slides (clean, scribbles), scrubbed video
- Lecture notes from Nelson/Indyk course.
- Lecture 2 from Fall 2014
- Chapters 2 and 3 in Amit Chakrabarti's notes.
- Original Flajolet-Martin paper based on hashing
- Hyperlolog in practice
- A theoretically optimal algorithm in terms of space

**Thursday, Sept 10.**Limited independence and Hashing, Distinct elements via pairwise independent hashing- Slides for hashing (clean, scribbles), slides for distinct elements (clean, scribbes), scrubbed video
- Avrim Blum's notes
- Salil Vadhan's book on pseudorandomness (Section 3.5 on pairwise independence)
- Mikkel Thorup's article on hashing
- A nice introduction to finite fields (see Chapter 7)
- Fast hashing with Strong Concentration Bounds, STOC 2020 paper motivated by streaming applications.

**Tuesday, Sept 15.**AMS estimator for frequency moments, F2 estimation- Slides (clean, scribbles), scrubbed video
- Lecture 3 from Fall 2014
- Lecture 4 from Fall 2014

**Thursday, Sept 17.**Heavy hitters and deterministic Misra-Greis algorithm- Slides (clean, scribbes), scrubbed video
- Chapter1 in Amit Chakrabarti's notes.

**Tuesday, Sept 22.**Count-Min and Count Sketches- Slides (clean, scribbles), scrubbed video
- Lecture 6 from Fall 2014

**Thursday, Sept 24.**Sketching for range queries, Sparse recovery- Slides (clean, scribbles), scrubbed video
- Lecture 6 from Fall 2014

**Tuesday, Sept 29.**JL Lemma and Dimensionality Reduction, Subspace Embeddings- Slides (clean, scribbles), scrubbed video
- Lecture 4 from Fall 2014
- Nelson slides on sparse JL.

**Thursday, Oct 1.**Finish Subspace Embeddings, Application to Least Squares Regression- Slides (clean, scribbles), scrubbed video

**Tuesday, Oct 6.**No lecture in lieu of midterm exam**Thursday, Oct 8.**Similarity estimation- Slides (clean, scribbles), scrubbed video
- Chapter 3 "Finding Similar Items" in MMDS book.
- Moses Charikar's paper Similarity estimation techniques from rounding algorithms
- Min-Wise Independent Permutations by Broder, Charikar, Frieze, Mitzenmacher.
- Piotr Indyk's paper on small min-wise hash families

**Tuesday, Oct 13.**Locality Sensitive Hashing- Slides (clean, scribbles), scrubbed video
- LSH webpage including downloadable code and papers and slides
- Sasho Nikolov's notes on LSH
- Survey Approximate Nearest Neighbor Search in High Dimensions by Andoni, Indyk and Razenshteyn
- Sariel Har-Peled's book on Geometric Approximation Algorithms covers ANN in both low and high-dimensions

**Thursday, Oct 15.**LSH for Euclidean distances- Slides (clean, scribbles), scrubbed video

**Tuesday, Oct 20.**Quantiles and Selection- Slides (clean, scribbles), scrubbed video
- Kent Quanrud's notes from Spring 2019
- Lecture 10 from 2014
- Survey by Greenwald and Khanna on Quantiles
- Application of weighted quantiles to an ML system. See paper of Chen, Guestrin.

**Thursday, Oct 22.**Finish previous lecture. Selection in Random Order Streams- Slides (clean, scribbles), scrubbed video
- Munro-Paterson paper
- Guha-MacGregor paper
- Secretary problem

**Tuesday, Oct 27.**Topics in Streaming (F_p estimation, priority sampling, \ell_2 sampling, \ell_0 sampling)**Thursday, Oct 29.**Continue previous lecture- Slides (clean, scribbles), scrubbed video
- Slide notes by McGregor for sampling.
- Section 4 in chapter on signals in the draft book by McGregor-Muthu.
- Paper on precision sampling by Andoni, Krauthgamer, Onak
- Paper on \ell_p sampling by Monemizadeh and Woodruff.
- Paper on near-optimal \ell_p sampling by Jowhari, Saglam, Tardos which has the simple \ell_0 sampling

**Tuesday, Nov 3.**Election day, no lecture**Thursday, Nov 5.**Graph streaming and sketching for connectivity- Slides (clean, scribbles), scrubbed video
- Survey by Andrew McGregor
- Chapters in Amit Chakrabartis notes

**Tuesday, Nov 10.**Graph streaming: matchings and sparsification- Slides (clean, scribbles), scrubbed video
- Survey by Andrew McGregor
- Chapters in Amit Chakrabartis notes

**Thursday, Nov 12.**Approximate Matrix Multiplication- Slides (clean, scribbles), scrubbed video
- Lecture 11 from Indyk-Nelson course
- Mahoney's lecture notes

**Tuesday, Nov 17.**SVD and low-rank approximation- Slides (clean, scribbles), scrubbed video
- Chapter 3 in Foundations of Data Science (online draft)

**Thursday, Nov 19.**No lecture due to FOCS**Tuesday/Thursday, Nov 24, 26:**Thanksgiving Break**Tuesday, Dec 1.**Fast/approximate NLA- Slides (clean, scribbles), scrubbed video
- Notes from Jelani Nelsons's course (see 16 to 20) and related lectures from the same course on Fast Jonhonson-Lindenstrauss Transform etc.
- Low Rank Approximation and Regression in Input Sparsity Time by Clarkson and Woodruff and Woodruff's talk slides.
- Woodruff's manuscript on sketching for NLA.
- Low-distortion Subspace Embeddings in Input-sparsity Time and Applications to Robust Linear Regression by Meng and Mahoney
- OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings by Nelson and Ngyuen

**Thursday, Dec 3.**Compressed Sensing, Coreset for MEB- Slides (clean, scribbles), scrubbed video
- Compressed sensing: lecture 19-21 from Jelani Nelsons course
- Patrick Lin's slides on coresets from Spring 2019
- Survey on coresets and sketching by Jeff Phillips
- Survey on coresets by Feldman
- Ke Chen's PhD Thesis
- Sariel Har-Peleds book on geometric approximation
- Several recent papers on coresets

**Tuesday, Dec 8.**Coresets for clustering. Wrap up

- Small Summaries for Big Data by Graham Cormode and Ke Yi. Cambridge University Press, Dec 2020.
- (Evolving) Course notes of Amit Chakrabarti focusing mostly on streaming algorithms.
- Foundations of Data Science, by Avrim Blum, John Hopcroft and Ravi Kannan. Broad technical introduction covering foundations of several topics. UIUC students should have access to the digital version but there is also a free draft version available here.
- Mathematical Foundations of Data Analysis, Jeff Phillips.
- Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman and Jeff Ullman. Emphasis on the practical aspects
- Sketching as a Tool for Numerical Linear Algebra, a survey by David Woodruff. Copy for personal use from author's website
- Survey on sketching by Graham Cormode
- Survey on Graph Streams, Andrew MacGregor.
- Lecture Notes on Randomized Linear Algebra by Michael Mahoney. See also an older survey by same author.
- A book in preparation on data stream algorithms by Andrew McGregor and Muthu Muthukrishnan
- Surveys on core-sets. Core-sets: An updated survey by Dan Feldman. Coresets and Sketches by Jeff Phillips. Another one by Muneanu and Schwiegelshohn. Original one by Agarwal, Har-Peled and Varadarajan (see also Sariel's book).
- Lecture notes on Massively Parallel Computing model and algorithms by Mohsen Ghaffari
- Sublinear Algorithms is a topic that we will not cover in this course, however it is a related area and there are many resources collected at this website.

- Books on probability and computing: Probability and Computing (Mitzenmacher-Upfal), Randomized Algorithms (Motwani-Raghavan), The Probabilistic Method (Alon-Spencer), Concentration of Measure (Dubhashi-Panconesi)
- A survey on concentration inequalities by Fan Chung and Linyuan Li.
- Probabilistic Tools for the Analysis of Randomized Optimization Heuristics by Benjamin Doerr
- Pseudorandomness by Salil Vadhan.
- High Speed Hashing for Integers and Strings by Mikkel Thorup.

- Algorithms for Big Data: Chandra (UIUC), Fall 2014
- Data Stream Algorithms: Amit Chakrabarti (Dartmouth), Spring 2020
- Algorithms for Big Data: Jelani Nelson (Harvard), Fall 2015.
**Includes Video Lectures** - Algorithms for Massive Data, Alex Andoni (Columbia), Spring 2019.
- Algorithms for Big Data, David Woodruff (CMU), Fall 2019
- Sketching Algorithms for Big Data: Piotr Indyk (MIT) and Jelani Nelson (Harvard), Fall 2017
- Algorithmic Techniques for Big Data: Moses Charikar (Stanford), Winter 2020
- Data Stream Algorithms, Andrew MacGregor (UMass Amherst), Spring 2018.
- Models of Computation for Massive Data: Jeff Philips (Utah)