This course will describe some algorithmic techniques developed for handling large amounts of data that is often available in limited ways. Topics that will be covered include data stream algorithms, sampling and sketching techniques, and sparsification, with applications to signals, matrices, and graphs. Emphasis will be on the theoretical aspects of the design and analysis of such algorithms.

This version of the course is directed at senior level undergraduate students and beginning graduate students, and hence will not assume background in randomized algorithms. The pace of the course and the topics will be commensurately adjusted.

**Instructor:** Chandra Chekuri (3228 Siebel Center, chekuri at illinois.edu).

**Teaching assistants:** Patrick Lin (plin15 at illinois.edu) and Kent Quanrud (quanrud2 at illinois.edu)

**Office hours:** Siebel 3rd floor theory loung. Chandra (Fri 1:00-2:00pm), Kent (Tue 3:00 - 4:00 ), Patrick (Mon 3:00 - 4:00)

**Grading policy:** 50% Homework, 25% Midterm, 25% Project

**Prerequisites: ** CS 374 and CS 361, or comparable understanding and facility with algorithms and probability.

**Links: ** Piazza; Gradescope (self-enrollment code: **9NY3ZN**)

- Homework 0. Optional, but recommended for students less comfortable with probability.
- Homework 1. Due 10am, Feb 6.
- Homework 2. Due 10am, Feb 22.

**Tuesday, January 15.**Lecture 1 from Fall 2014**Thursday, January 17.**Intro to randomized algorithms and Quick Sort (slides)**Tuesday, January 22.**Probabilistic inequalities (slides)- Sariel Har-Peled's notes on randomized algorithms, in particular Chapter 7 on Chernoff inequality
- Randomized algorithms books and references below

**Thursday, January 24.**Pairwise independence and Hashing (slides)- Avrim Blum's notes
- Salil Vadhan's book on pseudorandomness (Section 3.5 on pairwise independence)
- Mikkel Thorup's article on hashing
- Summer school on hashing (currently the material is missing, have written to them)
- A nice introduction to finite fields

**Tuesday, January 29.**Finish hashing discussion, Probabilistic Counting (slides)- Lecture 1 from Fall 2014
- Lecture 1 from Nelson/Indyk course from 2017.
- Counting large numbers of events in small registers by Morris, CACM 1978
- Approximate counting: a detailed analysis by Flajolet.

**Thursday, January 31.**Finish probabilistic counting. Intro to Frequency Moments and Distinct Elements (slides)- Lecture 2 from Fall 2014
- Lecture notes from Nelson/Indyk course.
- Chapters 2 and 3 in Amit Chakrabarti's notes.

**Tuesday, February 5.**Finish distinct elements. AMS-Sampling for frequency moment estimation (slides)- Lecture 3 from Fall 2014

**Thursday, February 7.**Finish AMS-Sampling. F_2 estimation and intro to sketching (slides)- Lecture 4 from Fall 2014

**Tuesday, February 12.**Frequent items and Misra-Gries algorithm (slides)**Thursday, February 14.**CountMin and Count Sketches (slides)- Lecture 6 from Fall 2014

- Algorithms for Big Data: Chandra (UIUC), Fall 2014
- Algorithms for Big Data: Jelani Nelson (Harvard), Fall 2015.
**Includes Video Lectures** - Sketching Algorithms for Big Data: Piotr Indyk (MIT) and Jelani Nelson (Harvard), Fall 2017
- Algorithmic Techniques for Big Data: Moses Charikar (Stanford)
- Data Stream Algorithms: Amit Chakrabarti (Dartmouth)
- Sub-linear Algorithms: Piotr Indyk and Ronitt Rubenfeld (MIT)
- Randomized Algorithms for Matrices and Data: Michael Mahoney (Stanford)
- Algorithms for Modern Data Models: Ashish Goel (Stanford)
- Algorithmic Techniques for Big Data Analysis: Barna Saha (U. Minnesota)
- Models of Computation for Massive Data: Jeff Philips (Utah)
- A book in preparation on data stream algorithms by Andrew McGregor and Muthu Muthukrishnan
- A useful book with an emphasis on the practical aspects: Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman and Jeff Ullman.
- Foundations of Data Science, an almost done book by Avrim Blum, John Hopcroft and Ravi Kannan
- Survey on sketching by Graham Cormode
- Sketching as a Tool for Numerical Linear Algebra, a survey by David Woodruff. Copy for personal use from David's website
- Books on randomization: Probability and Computing (Mitzenmacher-Upfal), Randomized Algorithms (Motwani-Raghavan), The Probabilistic Method (Alon-Spencer), Concentration of Measure (Dubhashi-Panconesi)
- A survey on concentration inequalities by Fan Chung and Linyuan Li.
- Probabilistic Tools for the Analysis of Randomized Optimization Heuristics by Benjamin Doerr
- Course material from Summer School on Hashing: Theory and Applications
- Open Problems in Sublinear Algorithms