Final assignment

Due May 9 at 4:30PM

Instructions

You should do this homework on your own -- one submission per student, and by submitting you are certifying the homework is your work.

Submission: Homework submission will be via Compass. You should have been signed up automatically; if not, please email Rick.

Your submission must be typed and submitted as a PDF. No handwritten/scanned solutions. See the submission guidelines at the end of the page for additional details.

Problems

To get started please consult the lecture notes and the textbook. You may ignore any data that is in a MODEL folder.

First, obtain the activities of daily life dataset from the UC Irvine machine learning website (https://archive.ics.uci.edu/ml/datasets/Dataset+for+ADL+Recognition+with+Wrist-worn+Accelerometer; data provided by Barbara Bruno, Fulvio Mastrogiovanni and Antonio Sgorbissa).

Build a classifier that classifies the given files into the appropriate activity ('Use_telephone', 'Standup_chair', 'Walk', 'Climb_stairs', 'Sitdown_chair', 'Brush_teeth', 'Comb_hair', 'Eat_soup', 'Pour_water', 'Descend_stairs', 'Eat_meat', 'Drink_glass', 'Getup_bed', 'Liedown_bed').

The data items in this case are the files themselves; the classifier you learn will be able to take an activity file of arbitrary length and classify it with one of the activity labels. You might notice that each file is of a different length, which is why we will use vector quantization to turn each file into a fixed-length feature vector.
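
As a rough illustration of the fixed-length idea, the sketch below cuts a signal of any length into equal-sized segments. It assumes each file can be loaded as an array with one row per time step and three accelerometer columns; the function name `segment_signal`, the default `segment_len=32`, and the synthetic data are hypothetical choices for illustration, not part of the assignment.

```python
import numpy as np

def segment_signal(signal, segment_len=32, step=None):
    """Cut an (n_samples, n_channels) signal into flattened fixed-length segments.

    Returns an array of shape (n_segments, segment_len * n_channels).
    Trailing samples that do not fill a whole segment are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    if step is None:
        step = segment_len  # non-overlapping segments by default
    starts = range(0, len(signal) - segment_len + 1, step)
    segments = [signal[s:s + segment_len].ravel() for s in starts]
    if not segments:
        return np.empty((0, segment_len * signal.shape[1]))
    return np.stack(segments)

# Two synthetic "files" of different lengths, each with 3 axes (assumed layout).
rng = np.random.default_rng(0)
short_file = rng.integers(0, 64, size=(100, 3))
long_file = rng.integers(0, 64, size=(250, 3))

short_segments = segment_signal(short_file)
long_segments = segment_signal(long_file)
print(short_segments.shape)  # (3, 96)
print(long_segments.shape)   # (7, 96)
```

Files of different lengths yield different numbers of segments, but every segment has the same dimension, which is what makes clustering them possible.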

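For your classifier's features, you should use vector quantization: cut each file into fixed-length samples, assign each sample to its nearest cluster center, and use the histogram of those assignments as the feature vector for that data item. You should use k-means to construct the pattern vocabulary (the set of cluster centers). You may use whichever multi-class classifier you wish.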
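
One possible shape for the vector quantization step is sketched below, using a small hand-written k-means on synthetic data. The helper names (`kmeans`, `assign`, `vq_histogram`), the choice of `k=8`, and the 96-dimensional segments are assumptions made for illustration; a library implementation of k-means would serve equally well.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct points drawn from the data.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers

def vq_histogram(segments, centers):
    """Normalized histogram of nearest-center counts for one file's segments."""
    labels = assign(segments, centers)
    counts = np.bincount(labels, minlength=len(centers)).astype(float)
    return counts / counts.sum()

# Synthetic stand-ins for the segments of two files of different lengths.
rng = np.random.default_rng(1)
file_a = rng.normal(0.0, 1.0, size=(40, 96))
file_b = rng.normal(5.0, 1.0, size=(15, 96))

# The vocabulary is learned from the pooled segments of all training files.
centers = kmeans(np.vstack([file_a, file_b]), k=8)
hist_a = vq_histogram(file_a, centers)
hist_b = vq_histogram(file_b, centers)
print(hist_a.shape, hist_b.shape)
```

Both histograms have length `k` regardless of how many segments each file produced, so they can be fed directly to any multi-class classifier. Normalizing the counts is one design choice among several; it keeps long and short files on a comparable scale.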

Please hand in the following:

  1. Report your total error rate and the class confusion matrix for your classifier.
  2. Then, improve your classifier by (a) modifying the number of cluster centers in your k-means and (b) modifying the size of the fixed-length samples that you use.
  3. Hand in your source code, your total error rate, and the class confusion matrix for your final classifier, with an explanation of how you selected your parameters and why your chosen parameters performed well. Your explanation should not be sparse: a one-sentence explanation would be considered insufficient and will be graded as such. Instead, describe the experiments you conducted to obtain your parameters and why you believe they are optimal with respect to the data and the operation of your classifier. We don't expect your explanation to be more than 1 page in length, though you're welcome to expand it further if you wish. Likewise, if your explanation can be adequately presented in less than 1 page, that is reasonable as well.
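
For reference, the two reported quantities can be computed from held-out predictions roughly as follows. This is a minimal sketch: the three-class label subset and the predictions are made-up placeholders, and the convention that rows are true classes and columns are predicted classes is an assumption you should state in your report.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix with rows = true class, columns = predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t], index[p]] += 1
    return matrix

def total_error_rate(matrix):
    """Fraction of items that fall off the diagonal (misclassified)."""
    return 1.0 - np.trace(matrix) / matrix.sum()

# Placeholder labels for illustration only; not real results.
classes = ["Walk", "Brush_teeth", "Pour_water"]
y_true = ["Walk", "Walk", "Brush_teeth", "Pour_water", "Pour_water", "Walk"]
y_pred = ["Walk", "Pour_water", "Brush_teeth", "Pour_water", "Walk", "Walk"]

cm = confusion_matrix(y_true, y_pred, classes)
error = total_error_rate(cm)
print(cm)
print(error)
```

In this toy example two of the six items are misclassified, so the total error rate is 1/3. Computing these numbers on data that was not used for training (for example, a held-out split or cross-validation folds) gives a more honest estimate than evaluating on the training files.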

----------------------------------

Submission Guidelines

As part of your final submission, please submit two files:

  1. A PDF file with all of the above written deliverables or output deliverables (e.g., error rate and confusion matrix).
  2. A compressed file that contains all of the source code used to complete the assignment.

Some additional notes: If you choose to present code inline in your PDF report (e.g., by exporting an IPython notebook as a PDF), then it should be immediately apparent where each deliverable is located in the report (e.g., by putting a bold title to identify which parts of the report are which). This note doesn't apply to those who choose to separate all code from all written report details.

Any infraction of the submission guidelines will be met with grading penalties.