Administrivia


About the Class

This course will use real data from several domains (including resiliency and healthcare science) to instill the following data-science expertise in students:

  • Data management
  • Feature engineering by identifying key data characteristics
  • Problem formulation, machine learning models and validation
  • Design, construction, and assessment of end-to-end application workflows by using data-driven insights
  • Generating application domain insights that can improve the understanding of the domain problem

Through this course, students will develop expertise and the ability to form intuitions that apply more broadly. This expertise will prepare students for data science engineering roles.

Course Description

Many modern application domains require engineers and domain experts to work together to design and analyze heterogeneous datasets, often with the objective of automating decision making (sometimes referred to as actionable intelligence). Extracting the right level of knowledge from these datasets to generate actionable intelligence is a compelling problem. This course addresses the problem by giving students the opportunity to build analysis workflows that use data management, feature engineering, and supervised and unsupervised learning to derive real-world insights. Students will work on real-world applications while interacting with domain experts.

This course instills the skill sets required to construct end-to-end, real-world analysis workflows through lectures, labs, and mini-projects that allow students to derive domain insights. The course will use real-world examples (measurement logs from supercomputers, data from large clinical trials) to teach data management, feature engineering, supervised/unsupervised learning, and testing and validation techniques. Students will gain hands-on implementation experience by completing three mini-projects. Each mini-project requires them to analyze substantial amounts of high-fidelity, real-world measurement data from application domains such as computer system design and healthcare. Each workflow is end-to-end, and students will gain a deeper understanding of the methods as they progress through the projects. Students will also learn to create quantifiable, domain-specific metrics of interest (cost models) and to use techniques that quantitatively assess their results. Students will develop expertise and the ability to form intuitions that are applicable in other domains.

This course will feature several guest lectures from domain experts. They will demonstrate how innovative data-analysis techniques have been transformative and, as a result, have generated significant societal impact.

Prerequisites

Basic probability and basic computer programming skills are essential. Students should have completed ECE 313 or CS 361 (Statistics and Probability) and should have exposure to the basics of a scripting language (such as Python). Knowledge of operating systems (e.g., ECE 391 or an equivalent course) is beneficial.

Timeline

The course comprises 45 contact hours over 15 weeks in the spring semester:
  • 30 hours of classroom lectures, quizzes, and presentations
  • 15 hours of hands-on data analytics labs

Lecture Electronics Policy

During lectures and data analytics labs, cell phones and similar non-class use of electronics are NOT allowed. If, due to unforeseen circumstances, a student needs access to their cell phone, they must inform the instructor at the beginning of the lecture. They should also sit where they will not disturb the students behind them (typically furthest from the board).

Attendance Policy

Attendance at all lectures and data analytics labs is required. There will be in-class assignments, and class participation is graded. Students who must miss a lecture due to unforeseen circumstances should contact both the TAs and the instructor via a private post on Piazza before the lecture begins. The instructor and TAs reserve the right to take attendance. Class attendance includes data analytics lab hours.

Evaluation

The course is offered for 3 hours for undergraduates and 1 unit for graduate students. Graduate students complete self-proposed extensions, approved by the instructor(s), to two of the three mini-projects.

We will compute the final grade using the following table:

Activity                        Weight   Details
Mini-Projects 1, 2, 3           55%      TBD
Midterm and Final               25%
Final Project (extra credit)    30%
Class Participation             10%      May include quizzes
Homework                        10%

In-class Assignments

TBD