CS 598: Algorithms for Big Data
This course will describe some algorithmic techniques developed for handling large amounts of data that is often available in limited ways. Topics that will be covered include data stream algorithms, sampling and sketching techniques, and sparsification, with applications to signals, matrices, and graphs. Emphasis will be on the theoretical aspects of the design and analysis of such algorithms. Prerequisites: CS 573, good background in (discrete) probability
Instructor: Chandra Chekuri- 3228
Office Hours: Wed 10-11am, and by appointment
Grading Policy: There will be 4-5 homeworks, roughly once every two weeks and final course project. Course projects could involve research on a specific problem or topic, a survey of several papers on a topic (summarized in a report and/or talk), or an experimental evaluation. I also expect students to scribe one lecture in latex.
Prerequisites: This is a graduate level class and a reasonable background in algorithms and discrete mathematics would be needed. Knowledge and exposure to probability and linear algebra is necessary.
Note: The above list is suggestive/tentative and we will cover only a subset of the topics.
Homework 1 given on Thursday 09/2/14, due in class on Thursday, 9/11/14.
Homework 2 given on Friday 09/12/14, due on Thursday, 9/23/14.
Homework 3 given on Friday 10/10/14, due on Thursday, 10/23/14.
Homework 4 given on Friday 11/7/14, due on Thursday, 11/20/14.
Warning: Notes may contain errors. Please bring those to the attention of the instructor.
Lecture 1: 8/26/14, Introduction, basics of probability, probabilistic counting (Morris's algorithm), reservoir sampling
Lecture 2: 8/28/14, Estimating Number of Distinct Elements in a Stream
Lecture 3: 9/2/14, Estimating F_k norms via AMS sampling
Lecture 4: 9/4/14, Estimating F_2 norm, Sketching, Johnson-Lindenstrauss Lemma
Lecture 5: 9/9/14, Estimating F_p norm for 0 < p < 2, Misra-Greis algorithm for frequent items
Lecture 6: 9/11/14, Count and Count-Min Sketches
Lecture 7: 9/16/14, Sparse recovery via Count-Sketch (see notes from previous lecture)
Lecture 8: 9/18/14, \ell_2 sampling and application to near-optimal F_k estimation for k > 2
Lecture 9: 9/23/14, \ell_0 sampling, and priority sampling
Lecture 10: 9/25/14, Quantiles and selection in multiple passes
Lecture 11: 9/30/14, Continue previous lecture.
No lecture on 10/2/14, discuss home work problems.
Lecture 12: 10/07/14, Graph Streams: Connectivity, Cut/Spectral sparsifiers
Lecture 13: 10/09/14, Graph Streams: Spanners, Matchings, Sketching for graphs
Lecture 14: 10/14/14, Finish graph streams. (2+\eps) for k-center clustering in streaming setting
Lecture 15: 10/16/14, Lower bounds for streaming via communication complexity
Lecture 16: 10/21/14, Lower bound on communication complexity of INDEX, lower bounds for graph streaming problems
Lecture 17: 10/23/14, Similarity estimation and Locality sensitive hashing
Lecture 18: 10/28/14, Approximate Nearest Neighbor Search via Locality Sensitive Hashing
Lecture 19: 10/30/14, Approximate matrix multiplication.
Lecture 20: 11/4/14, Singular Value Decomposition
Lecture 21: 11/6/14, Fast Deterministic Low-Rank Approximation
Lecture 22: 11/18/14, Subspace embeddings and Fast Least Squares Regression
Lecture 23: 11/20/14, Compressed Sensing
Course Project Information