Activity 1: Your Most Similar Classmate
Due: On git by Tuesday, January 24, 2017 at 9:30am
Team: This is a solo assignment.
Grading: This assignment is the post-lecture activity for Experience 1, worth 25 points. There is no partial credit.

Overview

In lecture on Thursday, you learned about distance metrics and using Python to read CSV files. For this activity, you will use both of these concepts to find the classmate who is most similar to you.

If you were in class on Tuesday, you're ready to go!

If you're just joining us, you will need to first add yourself by adding a new row to the "Welcome" Google Sheet. The linked sheet is a version that has already had data cleaning applied. If someone already picked your answer, please make sure you match their answer (including capitalization).


Setting Up Files

A new directory for this assignment has been created in the release git repository. To merge it into your repository, navigate to your workbook directory using a command line and run the following commands:
   git fetch release
   git merge release/activity1 master -m "merge"

Once you have the activity1 directory, download the "Welcome to CS 205" Google Sheet as a CSV file from below and place it into the res directory of activity1 (this is the same thing you did in class!): "Welcome" Google Sheet.


Part 0: Thinking of the Algorithm

In lecture, we discussed six distance metrics. In all of them, we compare feature vectors between two samples (rows of data). To find the person most similar to you, you will need to compare yourself with every other person in the class.

As an algorithm, we can divide this into two major steps:

  1. Go through all the rows to find you!
  2. Go through all the rows again, finding who is most similar to you.

Part 1: Finding You!

Inside of compute.py in your Python directory, you will find the code we had at the end of lecture.

Inside the for-loop (for row in data), add a conditional to check whether the name in the current record is yours. If it is, save the row to a variable (possibly like the following):
  myrow = row

After the for-loop has completed, the variable myrow will contain all of your answers. Let's verify this before we continue.
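
As a rough sketch of this step, the loop and conditional might look like the following. It assumes the lecture code builds data with csv.DictReader, which may not match your compute.py exactly. The sample rows and the MY_NAME variable are hypothetical stand-ins so the example runs on its own; only the column names come from the sheet.

```python
import csv
import io

# Hypothetical sample standing in for the CSV file in the res directory.
SAMPLE_CSV = """Name,Current Major,Cats or Dogs?
Alex,Computer Science,Dogs
Blake,Statistics,Cats
"""

# Hypothetical; replace with your name exactly as it appears in the sheet.
MY_NAME = "Blake"

# Assumption: the lecture code reads rows as dictionaries keyed by column name.
data = csv.DictReader(io.StringIO(SAMPLE_CSV))

myrow = None  # stays None if no row matches
for row in data:
    if row["Name"] == MY_NAME:
        myrow = row

if myrow is None:
    print("No row found for " + MY_NAME + "; check spelling and capitalization.")
```

Starting myrow at None makes a misspelled name easy to spot, instead of causing an error later when myrow is used.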

Outside of the for-loop, at the end of your program, print out a few things about you through myrow, for example:
print( myrow["Current Major"] )
print( myrow["Cats or Dogs?"] )

Run your program by moving into your activity1 directory and running:
python py/compute.py


Part 2: Finding Your Most Similar Classmate!

At this point, you've found you! Now, we need to find who's most similar to you.

A straightforward way to do this is to go through the data again, by making a second for-loop that runs only after you have found yourself.
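
One thing to watch for, if data in your compute.py is a csv.DictReader object (an assumption about the lecture code): a reader can only be looped over once, so a second for-loop over it would see no rows. A possible workaround, sketched below with hypothetical sample data, is to copy the rows into a list first.

```python
import csv
import io

# Hypothetical sample standing in for the real CSV file.
SAMPLE_CSV = """Name,Current Major
Alex,Computer Science
Blake,Statistics
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))

# Copy the rows into a list so they can be looped over as many times as needed.
data = list(reader)

first_pass = [row["Name"] for row in data]
second_pass = [row["Name"] for row in data]

# The reader itself is now exhausted, so looping over it again yields nothing.
leftover = [row["Name"] for row in reader]

print(first_pass)
print(second_pass)
print(leftover)
```

If your data is already a list, both loops will work without this change.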

As you go through the data a second time, compare the data in row to myrow. For example, if row["Current Major"] and myrow["Current Major"] are the same, then you know you and the current row have the same major!
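
One possible way to turn these comparisons into a number is to count the columns where both rows have the same answer. This is only a sketch of a simple matching count, not necessarily one of the metrics from lecture. The similarity function, the columns list, and the sample rows are hypothetical.

```python
def similarity(row_a, row_b, columns):
    """Count how many of the given columns have identical answers in both rows."""
    score = 0
    for column in columns:
        if row_a[column] == row_b[column]:
            score += 1
    return score


# Hypothetical rows; in compute.py these would be myrow and the current row.
myrow = {"Name": "Alex", "Current Major": "Computer Science", "Cats or Dogs?": "Dogs"}
row = {"Name": "Casey", "Current Major": "Computer Science", "Cats or Dogs?": "Cats"}

# Leave out "Name": it identifies a person rather than describing them.
columns = ["Current Major", "Cats or Dogs?"]

similarityScore = similarity(myrow, row, columns)
```

Here a higher score means more similar. A distance metric works the other way around: a lower value means more similar.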

After finding how similar you are to the current row, print it out to the screen using something similar to the following:
  print( "Similarity with " + row["Name"] + ": " + str(similarityScore) )

Once your program works, you should see how similar you are to everyone else in the class! Find the classmate you're most similar to and add your result to the Activity 1 Results sheet (linked in the Submission section).
...and try to eliminate any ties!
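
Rather than scanning the printed output by eye, the program can pick out the top score and report any ties. The sketch below assumes a higher score means more similar and that the scores were collected in a dictionary during the second loop. The scores dictionary and the names in it are hypothetical.

```python
# Hypothetical scores, as if collected during the second loop.
scores = {"Blake": 1, "Casey": 3, "Devon": 3, "Emery": 2}

best_score = max(scores.values())

# Everyone who shares the top score; more than one name means a tie.
most_similar = sorted(name for name, score in scores.items() if score == best_score)

if len(most_similar) > 1:
    print("Tie at " + str(best_score) + ": " + ", ".join(most_similar))
else:
    print("Most similar: " + most_similar[0] + " (" + str(best_score) + ")")
```

If a tie shows up, comparing more columns from the sheet is one way to try to break it.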


Getting Stuck?

  1. Check out the Course Resources for help with Python programming.
  2. Search and/or post on CS 205's Piazza.

Submission

You will need to do two things to complete this activity:

  1. Post your results to the Activity 1 Results Google Sheet, including a 1-2 sentence description of what distance/similarity metric you used. (I've already added my result as an example.)
  2. Push your work to your git repo. Detailed instructions on how to do this are found here.