Homework 10

Due November 27 at 11:59PM

Instructions

You may do this homework in pairs. If you do, choose your partner wisely, as both of you will receive the same credit. By submitting, you certify that the homework is the work of your team.

Submission: Homework 10 will be submitted via Compass. You should have been signed up automatically; if not, please email Rick. Submit your answers, graphs, and other responses as a PDF.

Problems

  1. Do problems 11.1 and 11.2

Additional clarification: for the SVM dataset, use the data file named "wdbc.data". Use the file "wdbc.names" to guide your data cleaning and preprocessing. Which columns to drop will become apparent if you read "wdbc.names" carefully.
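As a rough illustration of the preprocessing step, here is a minimal sketch that parses comma-separated rows, drops chosen columns, and separates a label column from the numeric features. It assumes the file has no header row. The helper name `load_table`, its parameters, and the sample rows are all hypothetical, and the column indices are placeholders, so you still need to read "wdbc.names" to decide what to pass in.

```python
import csv
import io


def load_table(lines, drop_columns=(), label_column=None):
    """Parse headerless comma-separated rows into (features, labels).

    drop_columns and label_column are zero-based column indices chosen
    by the caller; this helper does not know anything about wdbc.data.
    """
    skip = set(drop_columns)
    if label_column is not None:
        skip.add(label_column)

    features, labels = [], []
    for row in csv.reader(lines):
        if not row:  # ignore blank lines
            continue
        if label_column is not None:
            labels.append(row[label_column])
        # Keep every remaining column and convert it to a float.
        features.append([float(v) for i, v in enumerate(row) if i not in skip])
    return features, labels


# Made-up rows for illustration only; these are NOT taken from wdbc.data.
sample = io.StringIO("101,A,1.5,2.0\n102,B,3.0,4.5\n")
X, y = load_table(sample, drop_columns=[0], label_column=1)
```

For a real file you could pass an open file object instead of the `io.StringIO` sample, for example `with open("wdbc.data", newline="") as f: X, y = load_table(f, ...)`, filling in the indices yourself.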

Another clarification: for the random forest question, please use an 80-20 training/test split of your data. This means that 80% of your data will be used for fitting (training) your random forest classifier, and the remaining 20% will be used to determine the accuracy and the class confusion matrix of your classifier.
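The mechanics of the split and of the evaluation can be sketched as follows. This is only an illustration using the Python standard library; the function names, the fixed seed, and the toy labels are assumptions, and the random forest itself is deliberately left out. The `predicted` list stands in for whatever your classifier outputs on the test set.

```python
import random


def split_80_20(rows, labels, seed=0):
    """Shuffle the data, then use 80% for training and 20% for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    cut = (len(indices) * 4) // 5         # first 80% of the shuffled indices
    train_idx, test_idx = indices[:cut], indices[cut:]
    X_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, y_train, X_test, y_test


def accuracy(actual, predicted):
    """Fraction of test examples whose predicted label is correct."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_matrix(actual, predicted, classes):
    """matrix[i][j] counts examples of true class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Toy data for illustration only: 10 one-feature rows with made-up labels.
rows = [[float(i)] for i in range(10)]
labels = ["B", "M"] * 5
X_train, y_train, X_test, y_test = split_80_20(rows, labels)

# Placeholder predictions; in the homework these come from your classifier.
actual = ["M", "B", "B", "M"]
predicted = ["M", "B", "M", "M"]
acc = accuracy(actual, predicted)
cm = confusion_matrix(actual, predicted, classes=["B", "M"])
```

Only the test portion should be passed to `accuracy` and `confusion_matrix`, so that the reported numbers reflect data the classifier did not see during training.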

As was said during the discussion section, for this homework we would like you to upload three items to Compass as part of your submission. The first item is a .txt file with the names and NetIDs of you and your partner. The second item is a .pdf report of your answers for 11.1 and 11.2. The third item is a compressed file containing all of your code file(s). The compressed file should include only the code file(s), not the .txt file or the .pdf file.

You can upload multiple items in one submission to Compass by clicking the "Browse my computer" button multiple times (it won't replace the previous file).

Good luck, and remember to get started early!