ECE398BD: Fundamentals of Machine Learning (Labs)

 
  Week   | Topic                            | Quiz / Answer | Feedback for Quiz | Lab    | Assigned | Due              | Feedback for Lab
  Week 1 | Introduction to Python           | None          | No Feedback       | [link] | Aug 29   | Not Graded       | No Feedback
  Week 2 | Classification, Part 1           | [link]        | [link]            | [link] | Sep 5    | Sep 12, 11:59 PM | [link]
  Week 3 | Classification, Part 2           | [link]        | [link]            | [link] | Sep 12   | Sep 19, 11:59 PM | [link]
  Week 4 | Linear Regression and Clustering | [link]        | No Feedback       | [link] | Sep 19   | Sep 26, 11:59 PM | [link]
  Week 5 | Principal Component Analysis     | [link]        | [link]            | [link] | Sep 26   | Oct 3, 11:59 PM  | [link]


Here are two sample quizzes from previous years: [sample 1], [sample 2]. Sample 2 is more representative of the level of the first quiz.

Lab 5

  • This is the last week of this section of the course. It is best to have your lab completed before the start of the next lab session, so that you do not fall behind in the second part of the course.

Lab 4

  • When calculating J*(K), call kMeans 100 times (with niter=100) and take the minimum value of J_K as your estimate of J*(K). This way, even if some runs start from bad initial cluster centers, you still get a good estimate of J*(K). A minimal sketch of this restart loop appears after this list.

  • This lab shouldn't take that long to run; if you're having issues with running time, chances are the hint from Lab 2 on scipy.spatial.distance.cdist will help.

  • Note that scipy.spatial.distance.cdist returns the Euclidean distance, not its square, unless you pass 'sqeuclidean'.
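
Here is a minimal sketch of how these hints fit together: many random restarts of k-means, with cdist computing squared Euclidean distances via 'sqeuclidean'. The function names, signatures, and the simple initialization below are illustrative assumptions, not the interface from the lab template.

    import numpy as np
    from scipy.spatial.distance import cdist

    def kmeans_once(X, K, niter=100, rng=None):
        # One run of k-means from a random initialization (illustrative, not the lab's kMeans).
        rng = np.random.default_rng() if rng is None else rng
        centers = X[rng.choice(X.shape[0], size=K, replace=False)].astype(float)
        for _ in range(niter):
            # Squared Euclidean distance from every point to every center, in one call.
            labels = cdist(X, centers, 'sqeuclidean').argmin(axis=1)
            for k in range(K):
                if np.any(labels == k):
                    centers[k] = X[labels == k].mean(axis=0)
        J_K = cdist(X, centers, 'sqeuclidean').min(axis=1).sum()
        return centers, J_K

    def estimate_J_star(X, K, restarts=100):
        # Take the minimum objective over many restarts, so that a bad
        # initialization on some runs does not ruin the estimate of J*(K).
        return min(kmeans_once(X, K)[1] for _ in range(restarts))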

Lab 3

  • Read the lab directions carefully. Make sure you are not training on your test data! As stated at the top of the lab, this will be penalized heavily. If you are calling .fit() on something that doesn't have train in the name, you're doing something wrong.

  • In the last problem, your error in the second-to-last part may come out to be zero depending on which algorithm you pick. This is an (unintentional) peculiarity of this data set: it is actually noted in the data set's documentation, but I missed it, and it will rarely happen with other data sets. Had I noticed this, I would have picked a different data set. So, for the last part of the last problem, just pretend that the error was something small but non-zero when writing your answer.

  • There are many ways to split up the data into folds in problem 2. One simple way is to make a vector with indices 0,…,N-1, remove the indices corresponding to the held-out fold with numpy.setdiff1d, and use the remaining indices to index the data. Another straightforward way is to make an array of size (4/5*N, d) and fill it in with the folds by slicing. Worst case, you can hardcode the folds and the data outside the folds. A sketch of the setdiff1d approach appears after this list.

  • Do not email me the data sets.
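
As a concrete illustration of the setdiff1d approach (and of fitting only on training data), here is a minimal sketch of 5-fold cross-validation with contiguous folds. The classifier, function name, and fold layout are assumptions for illustration, not the lab's specification.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier  # stand-in; use whatever classifier the problem asks for

    def cross_validation_error(X, y, n_folds=5):
        # Split indices 0, ..., N-1 into contiguous folds; numpy.setdiff1d removes
        # the held-out fold's indices to get the training indices.
        N = X.shape[0]
        all_idx = np.arange(N)
        fold_size = N // n_folds
        errors = []
        for f in range(n_folds):
            val_idx = all_idx[f * fold_size:(f + 1) * fold_size]
            train_idx = np.setdiff1d(all_idx, val_idx)
            clf = KNeighborsClassifier()
            # Fit only on the training indices, never on the held-out fold.
            clf.fit(X[train_idx], y[train_idx])
            errors.append(1.0 - clf.score(X[val_idx], y[val_idx]))
        return np.mean(errors)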

Lab 2

Hints:

  • If you're having trouble with broadcasting, read the help page (or search the internet for examples). Basically, dimensions have to match according to a certain set of rules, which are described in the links above.

  • A vector (in the notes, or in math in general) is a column vector. You can't just take an equation in the notes (which takes in one feature vector and classifies it) and plug in a matrix full of data and expect it to work (the dimensions of the resulting expressions will make no sense, for one thing).

  • In problem 2, the prior comes out close to round numbers in a way that might be somewhat confusing: you should get approximately [0.5, 0.33, 0.17]. That is just the nature of this particular training set (see the short example after this list).

  • In problem 3, scipy.spatial.distance.cdist can calculate all the distances between the training data and the testing data in one call (see the sketch after this list).

  • Read the problems carefully, and make sure you answer each part of what needs to be done.
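
For the prior in problem 2: if the prior is estimated as class frequencies, the computation is a one-liner. The label vector below is made up purely to show the shape of the calculation (it is chosen so the values mirror the ones mentioned above).

    import numpy as np

    # Toy label vector, chosen only to illustrate the computation.
    y_train = np.array([0, 0, 0, 1, 1, 2])
    priors = np.bincount(y_train) / y_train.size
    # priors is now approximately [0.5, 0.33, 0.17]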
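
And here is a small sketch of the cdist and broadcasting hints in code, with made-up array names and data; it is only meant to show the shapes involved, not a required approach.

    import numpy as np
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 3))      # (N, d) training features (made-up data)
    y_train = rng.integers(0, 3, size=100)   # (N,) training labels
    X_test = rng.normal(size=(20, 3))        # (M, d) test features

    # One call gives the full (M, N) matrix of test-to-train distances.
    D = cdist(X_test, X_train)

    # Example use: nearest-neighbor labels for every test point at once, no Python loop.
    y_pred = y_train[D.argmin(axis=1)]

    # Broadcasting example: subtracting a (d,) vector from an (N, d) matrix works
    # because the trailing dimensions match; an (N,) vector would not broadcast this way.
    X_centered = X_train - X_train.mean(axis=0)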

Lab 1

Solution: [link]

Hints:

  • Exercises 5 and 6 will be building blocks for the first problem in Lab 2 (where you can use part (a) or part (b) of both exercises). You should be able to do part (a) of both exercises in a straightforward manner. As stated in the lab, part (b) is optional, but good to know. If you're stuck on part (b), make sure to write out the matrices and you should be able to construct the appropriate matrix multiplication. If you do not solve part (b), do not worry about it. But, you really should solve part (a) of both Exercises 5 and 6.

  • A better hint for Exercise 6(b) might be: “You can do this with the np.dot, elementwise multiplication and np.sum (along an axis) operations.” (A generic illustration of this pattern appears after this list.)
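
Purely as a generic illustration of the pattern that hint names (the actual exercise may ask for something different), the combination of np.dot, elementwise multiplication, and np.sum along an axis computes things like row-wise quadratic forms without a loop:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 3))   # e.g. a data matrix, one row per point
    B = rng.normal(size=(3, 3))   # e.g. some fixed square matrix

    # a_i^T B a_i for every row a_i of A: np.dot for the matrix product,
    # elementwise multiplication, then a sum along axis 1.
    q = np.sum(A.dot(B) * A, axis=1)

    # The same computation with an explicit loop, for comparison.
    q_loop = np.array([a.dot(B).dot(a) for a in A])
    assert np.allclose(q, q_loop)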

Please follow the Python instructions to get started with Jupyter notebooks. You should not need to install any additional packages for this portion of the course if you have installed Anaconda or Canopy.

The following other Python tutorials may be helpful:

And a few links on writing code concisely: