In lecture on Thursday, you learned about distance metrics and using Python to read CSV files. For this activity, you will use both of these concepts to find the classmate who is most similar to you.
If you were in class on Tuesday, you're ready to go!
If you're just joining us, you will first need to add yourself as a new row in the "Welcome" Google Sheet. The linked sheet is a version that has already had data cleaning applied. If someone has already entered the same answer as yours, please make sure yours matches theirs exactly (including capitalization).
A new directory has been created on the release git repository, and you should complete this assignment in it. To merge it into your repository, navigate to your workbook directory using a command line and run the following commands:
git fetch release
git merge release/activity1 master -m "merge"
Once you have the activity1 directory, download the "Welcome to CS 205" Google Sheet as a CSV file from below and place it into the res directory of activity1 (this is the same thing you did in class!): "Welcome" Google Sheet.
In lecture, we discussed six distance metrics. In all of them, we compare feature vectors between two samples (rows of data). To find the person most similar to you, you will need to compare you with every other person in the class.
As an algorithm, we can divide this into two major steps:
Inside of compute.py in your Python directory, you will find the code we had at the end of lecture.
Inside of the for row in data for-loop, add a conditional to check whether the name of the current record is yours. If it is, save the row to a variable (possibly like the following):
myrow = row
After the for-loop has completed, the variable myrow will contain all of your answers. Let's verify this before we continue.
Outside of the for-loop, at the end of your program, print out a few things
about you through myrow, for example:
print( myrow["Current Major"] )
print( myrow["Cats or Dogs?"] )
Run your program by moving into your activity1 directory and running:
python py/compute.py
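Putting the steps so far together, here is a minimal sketch of finding your own row. It is not the lecture code: it assumes the data was read with csv.DictReader, so each row behaves like a dictionary, and it uses a small made-up CSV string in place of the real file in res. The names SAMPLE_CSV, MY_NAME, and find_my_row are hypothetical and only for illustration.

```python
import csv
import io

# Hypothetical stand-in for the downloaded CSV (the real file lives in res/).
SAMPLE_CSV = """Name,Current Major,Cats or Dogs?
Alex,Computer Science,Cats
Sam,Statistics,Dogs
"""

# Assumption: replace this with your name exactly as it appears in the sheet.
MY_NAME = "Alex"


def find_my_row(data, my_name):
    """Return the row whose "Name" equals my_name, or None if no row matches."""
    myrow = None
    for row in data:
        if row["Name"] == my_name:
            myrow = row
    return myrow


# Read every row into a list so the data can be looped over more than once.
data = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
myrow = find_my_row(data, MY_NAME)

# Verify that the right row was found.
print(myrow["Current Major"])
print(myrow["Cats or Dogs?"])
```

Storing the rows in a list matters if you plan to loop a second time: a csv.DictReader can only be iterated once, so looping over the reader itself again would yield nothing.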
At this point, you've found you! Now, we need to find who's most similar to you.
A straightforward way to do this is to go through the data again, by making a second for-loop that runs only after you have found your own row.
As you go through the data a second time, compare the data in row to myrow. For example, if row["Current Major"] and myrow["Current Major"] are the same, then you know you and the current row have the same major!
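One simple way to turn this comparison into a number is to count how many answers match. The sketch below does that for two rows; it is only one possible scoring scheme, not necessarily one of the metrics from lecture. The function name similarity, the COLUMNS list, and the sample rows are assumptions made for illustration.

```python
# Hypothetical list of the columns to compare; use the real column names
# from your CSV file.
COLUMNS = ["Current Major", "Cats or Dogs?"]


def similarity(row, myrow, columns):
    """Count the columns in which row and myrow have the same answer."""
    score = 0
    for column in columns:
        if row[column] == myrow[column]:
            score += 1
    return score


# Made-up rows standing in for two records from the data.
myrow = {"Name": "Alex", "Current Major": "Computer Science", "Cats or Dogs?": "Cats"}
row = {"Name": "Sam", "Current Major": "Statistics", "Cats or Dogs?": "Cats"}

similarityScore = similarity(row, myrow, COLUMNS)
print("Similarity with " + row["Name"] + ": " + str(similarityScore))
```

A higher score here means more shared answers. A distance metric works the other way around (a smaller value means more similar), so keep track of which direction your own score runs.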
After finding how similar you are to the current row, print it out to
the screen using something similar to the following:
print( "Similarity with " + row["Name"] + ": " + str(similarityScore) )
Once your program works, you should see how similar you are to everyone else
in the class! Find the one who you're most similar to and add it to the
Activity 1 Results sheet (linked in the Submission section).
...and try to eliminate any ties!
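If reading through the printed scores by hand gets tedious, you could let the program pick out the top score and report any ties. The following is a rough sketch under the same assumptions as before (rows are dictionaries with a "Name" column, and the score is a count of matching answers). The function most_similar and the sample data are hypothetical.

```python
def most_similar(data, my_name, columns):
    """Return (best score, sorted list of names that achieve it)."""
    # Find your own row first; this raises StopIteration if you are missing.
    myrow = next(row for row in data if row["Name"] == my_name)

    scores = {}
    for row in data:
        if row["Name"] == my_name:
            continue  # do not compare yourself with yourself
        # True counts as 1 and False as 0, so sum() counts the matches.
        scores[row["Name"]] = sum(row[c] == myrow[c] for c in columns)

    best = max(scores.values())
    winners = sorted(name for name, score in scores.items() if score == best)
    return best, winners


# Made-up data standing in for the class CSV.
data = [
    {"Name": "Alex", "Current Major": "Computer Science", "Cats or Dogs?": "Cats"},
    {"Name": "Sam", "Current Major": "Statistics", "Cats or Dogs?": "Cats"},
    {"Name": "Jo", "Current Major": "Computer Science", "Cats or Dogs?": "Dogs"},
    {"Name": "Pat", "Current Major": "Statistics", "Cats or Dogs?": "Dogs"},
]

best, winners = most_similar(data, "Alex", ["Current Major", "Cats or Dogs?"])
print("Best score: " + str(best))
print("Most similar: " + ", ".join(winners))
if len(winners) > 1:
    print("Tie! Consider comparing more columns or weighting some more heavily.")
```

With only two columns, ties are very likely, as this sample shows. Comparing more columns, or giving some questions a larger weight than others, is one way to separate classmates who would otherwise score the same.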
You will need to do two things to complete this activity: