CS 205 (Fall 2015): Data Driven Discovery

About the CS 205 Workbook

Throughout CS 205, the workbook will be used as a platform to run your code and will serve as a portfolio of all your work. As a general rule, you will synchronize the workbook with the "master" workbook to get the latest demo, lab, and assignment. The master workbook contains only the descriptions of each demo, lab, and assignment and you will complete them in your own version of the workbook.

Installing the prerequisites

In order to get started with the CS 205 workbook, your computer must have the software that the workbook relies on already installed. To install everything that is needed, complete the following steps:

  1. Download Anaconda (Anaconda is a collection of Python libraries)
    • Go to http://continuum.io/downloads and click to download the default installer.
      • Windows: this is the "Windows 64-bit Python 2.7 Graphical Installer"
      • Mac OS X: this is the "Mac OS X 64-bit Python 2.7 Graphical Installer"
  2. Install Anaconda
    • Run the downloaded installer.
      • You do NOT need to follow any instructions on the Anaconda web page to get any additional components installed. A default install is all that is needed.
      • Windows: In Advanced Options, make sure "Add Anaconda to my PATH environment variable" is selected. 
      • Mac OS X: default options for everything are fine.
  3. Open a Command Line Interface: The next step requires you to type a command into a command line interface. You must open a new command line interface after installing Anaconda, so do not do this step before completing the previous steps.
    • Windows: press Start and type cmd. (Just start typing after you press Start; Windows will know what you're doing.)
    • Mac OS X: open the App Launcher, type terminal, and launch the Terminal App.
  4. Add Flask to Anaconda (Flask is the framework or skeleton that we build our app on)
    1. Open command line interface if not open.
    2. Type the following and then press enter: conda install flask
      • When asked to proceed, type y and press enter.
  5. Install git (Git is the system we will use to keep track of our code and the changes we make to it)
    • Windows: go to http://git-scm.com/download/win to get the git program. Once downloaded, install it; again, default options are fine for everything.
    • Mac OS X: type git into your Terminal App and press enter. OS X will usually pop up a message that allows you to install git as part of OS X.
  6. Create Heroku Account and Install Heroku Toolbelt (Heroku is a hosting service that we will host our app on)
    1. Sign up at https://www.heroku.com
    2. Windows: go to https://toolbelt.heroku.com/windows to download the toolbelt. Once downloaded, install it.
    3. Mac OS X: go to https://toolbelt.heroku.com/osx to download the toolbelt. Once downloaded, install it.
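
Once all six steps are done, a quick way to confirm that the tools can be reached from a new command line interface is to look for them on your PATH. The following is a minimal sketch, not part of the course materials: the command names (python, conda, git, heroku) are assumptions based on the steps above, and the helper names are made up for illustration. It uses only plain standard-library calls, so it should run on the Python 2.7 that Anaconda installs as well as on Python 3.

```python
import os

# Command names assumed from the installation steps above.
REQUIRED_COMMANDS = ["python", "conda", "git", "heroku"]


def on_path(command):
    """Return True if an executable called `command` is found on the PATH."""
    # On Windows, executables carry extensions such as .EXE listed in PATHEXT.
    extensions = [""] + os.environ.get("PATHEXT", "").split(os.pathsep)
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        for extension in extensions:
            candidate = os.path.join(folder, command + extension)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return True
    return False


def find_missing(commands):
    """Return the commands from `commands` that are not on the PATH."""
    return [name for name in commands if not on_path(name)]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_COMMANDS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All prerequisites were found.")
```

If a command is reported as missing right after installing it, close the command line interface and open a new one before trying again.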

Setting up your Gitlab

  1. Sign into Gitlab (Gitlab is where our code will be hosted)
  2. Fork the class Gitlab repository (Forking makes a copy of the repository in your account)
    • Go to https://gitlab-beta.engr.illinois.edu/cs205dev/workbook and click 'Fork'.
    • You can ignore the warning at the top: "You won't be able to pull or push project code via SSH until you add an SSH key to your profile."
      • (OPTIONAL) Set up an SSH key to skip entering your password when connecting to Gitlab.
  3. Add the course staff to your newly forked project (This is for grading purposes)
    1. Go back to your projects by clicking the logo on the upper left.
    2. Go to the newly forked project under Personal Projects on the right side; it should be NETID/Workbook
    3. Click 'Members' on the left sidebar
    4. Add the following as 'Reporters':
      • c-heeren
      • ajmori2
      • hanchen2
      • waf

(Optional) Add SSH key to Gitlab profile

Mac

  1. Open a Command Line Interface: If you forgot how, see step 3 of "Installing the prerequisites". If you just installed git, you will need to close your old Command Line Interface and re-open it for it to know about git.
  2. (Optional) Check for old SSH keys
    1. Type the following command: ls -al ~/.ssh
    2. If any of the following files is listed, you can skip to step 4
      • id_rsa.pub
      • id_ed25519.pub
      • id_ecdsa.pub
      • id_dsa.pub
  3. Generate a new SSH key
    1. Type the following command: ssh-keygen -t rsa -C "your_email@example.com"
    2. When asked "Enter file in which to save the key:", leave it blank and press ENTER
    3. Enter a passphrase of your choice
  4. Start your ssh-agent
    1. To start the ssh-agent, type the following command: eval $(ssh-agent)
    2. To add your SSH key to the agent, type the following command: ssh-add ~/.ssh/id_rsa
      1. If your SSH key is in a different file, check the name from step 2 and replace id_rsa
  5. Add SSH Key to your Gitlab account
    1. To copy your SSH key, type the following command: pbcopy < ~/.ssh/id_rsa.pub
    2. Log on to your Gitlab account https://gitlab-beta.engr.illinois.edu/users/sign_in
    3. Select "Profile settings"
    4. Select "SSH Keys" on the top menu bar
    5. Select "Add SSH Key"
    6. Paste key into the Key box
    7. Select "Add Key"

Windows

  1. Open Git Bash
    • Right click any blank area on your desktop and select 'Git Bash'
  2. (Optional) Check for old SSH keys
    1. Type the following command: ls -a ~/.ssh
    2. If any of the following files is listed, you can skip to step 4
      • id_rsa.pub
      • id_ed25519.pub
      • id_ecdsa.pub
      • id_dsa.pub
  3. Generate a new SSH key
    1. Type the following command: ssh-keygen -t rsa -C "your_email@example.com"
    2. When asked "Enter file in which to save the key:", leave it blank and press ENTER
    3. Enter a passphrase of your choice (Your passphrase is hidden as you type)
  4. Start your ssh-agent
    1. To start the ssh-agent, type the following command: eval $(ssh-agent)
    2. To add your SSH key to the agent, type the following command: ssh-add ~/.ssh/id_rsa
      1. If your SSH key is in a different file, check the name from step 2 and replace id_rsa
  5. Add SSH Key to your Gitlab account
    1. To copy your SSH key, type the following command: clip < ~/.ssh/id_rsa.pub
    2. Log on to your Gitlab account https://gitlab-beta.engr.illinois.edu/users/sign_in
    3. Select "Profile settings"
    4. Select "SSH Keys" on the top menu bar
    5. Select "Add SSH Key"
    6. Paste key into the Key box
    7. Select "Add Key"
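
Step 2 on either platform (checking for old SSH keys) can also be done from Python. The following is a minimal sketch under the assumption that your keys live in the default ~/.ssh folder; the function name existing_public_keys is made up for illustration.

```python
import os

# Public key file names taken from step 2 above.
PUBLIC_KEY_NAMES = ["id_rsa.pub", "id_ed25519.pub", "id_ecdsa.pub", "id_dsa.pub"]


def existing_public_keys(ssh_dir):
    """Return the known public key files that are present in ssh_dir."""
    if not os.path.isdir(ssh_dir):
        return []
    present = set(os.listdir(ssh_dir))
    return [name for name in PUBLIC_KEY_NAMES if name in present]


if __name__ == "__main__":
    found = existing_public_keys(os.path.expanduser("~/.ssh"))
    if found:
        print("Existing public keys: " + ", ".join(found))
    else:
        print("No existing public keys found; generate one as in step 3.")
```

If the script lists a file other than id_rsa.pub, use that name in place of id_rsa in steps 4 and 5.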

Downloading the CS 205 Workbook the first time

Before downloading the CS 205 Workbook, you will need to decide where you want it to "live" on your computer. For most people, a folder on the desktop makes the most sense. You will be working within it a lot, so it should be somewhere easy to access.

Mac

  1. Go to your project on Gitlab
    • On Gitlab, go to your copy of the workbook; it should be NETID/workbook.
  2. Get your project's address
    • At the top, select HTTPS instead of SSH if you did not set up an SSH key.
    • Copy the address
      • It should look like https://gitlab-beta.engr.illinois.edu/NETID/FA15workbook.git (HTTPS) or git@gitlab-beta.engr.illinois.edu:hanchen2/workbook_dev.git (SSH)
  3. Open a Command Line Interface: If you forgot how, see step 3 of "Installing the prerequisites". If you just installed git, you will need to close your old Command Line Interface and re-open it for it to know about git.
  4. Navigate to your CS 205 folder on the Command Line Interface
    • If your folder is on your desktop and called cs205, you can do that by typing cd Desktop, pressing enter, typing cd cs205, and pressing enter again.
      • cd stands for "change directory"; use it to navigate through folders. Type cd "FOLDER NAME" to enter a folder or cd .. to back out of a folder.
  5. Clone a copy of the workbook for yourself
    • Type the following command: git clone COPIED ADDRESS (for example, git clone https://gitlab-beta.engr.illinois.edu/NETID/FA15workbook.git)
      • This creates a new folder within your current folder that contains the CS 205 workbook. The folder is named after the repository, for example "workbook".
  6. Navigate into the folder
    • Type the following command: cd workbook
  7. Add the course's original repository as a source (New assignments will be released through here)
    • Type the following command: git remote add release https://gitlab-beta.engr.illinois.edu/cs205dev/workbook.git
      • 'git remote' is the set of commands regarding repositories.
      • 'add' adds a repository
      • 'release' is what we are naming the repository; it can be named anything
  8. (Optional) View the repositories associated with your workbook
    • Type the following command: git remote -v
      • One set of entries is labeled 'origin', which is your forked repository
      • One set of entries is labeled 'release', which is what you just added.

Windows

  1. Go to your project on Gitlab
    • On Gitlab, go to your copy of the workbook; it should be NETID/workbook.
  2. Get your project's address
    • At the top, select HTTPS instead of SSH if you did not set up an SSH key.
    • Copy the address
      • It should look like https://gitlab-beta.engr.illinois.edu/NETID/FA15workbook.git (HTTPS) or git@gitlab-beta.engr.illinois.edu:hanchen2/workbook_dev.git (SSH)
  3. Navigate to your CS 205 folder on your computer
  4. Open Git Bash
    • Right click any blank area within the folder and select 'Git Bash'
  5. Clone a copy of the workbook for yourself
    • Type the following command: git clone COPIED ADDRESS (for example, git clone https://gitlab-beta.engr.illinois.edu/NETID/FA15workbook.git)
      • To paste the address within the bash window:
        1. Right click the top of the bash window
        2. Go to 'Edit', then select 'Paste'
      • Type yes if asked "Are you sure you want to continue connecting (yes/no)?"
      • This command creates a new folder within your current folder that contains the CS 205 workbook. The folder is named after the repository, for example "workbook".
  6. Navigate into the folder
    • Type the following command: cd workbook
  7. Add the course's original repository as a source (New assignments will be released through here)
    • Type the following command: git remote add release https://gitlab-beta.engr.illinois.edu/cs205dev/workbook.git
      • 'git remote' is the set of commands regarding repositories.
      • 'add' adds a repository
      • 'release' is what we are naming the repository; it can be named anything
  8. (Optional) View the repositories associated with your workbook
    • Type the following command: git remote -v
      • One set of entries is labeled 'origin' which is your forked repository
      • One set of entries is labeled 'release' which is what you just added.
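
To make the last step concrete, the sketch below shows how the output of git remote -v can be read to confirm that both 'origin' and 'release' are present. It is only an illustration: the sample output is hypothetical (your origin address will contain your own NetID), and the function names are made up for this example.

```python
# Hypothetical output of "git remote -v" after completing the steps above.
SAMPLE_OUTPUT = (
    "origin\thttps://gitlab-beta.engr.illinois.edu/NETID/FA15workbook.git (fetch)\n"
    "origin\thttps://gitlab-beta.engr.illinois.edu/NETID/FA15workbook.git (push)\n"
    "release\thttps://gitlab-beta.engr.illinois.edu/cs205dev/workbook.git (fetch)\n"
    "release\thttps://gitlab-beta.engr.illinois.edu/cs205dev/workbook.git (push)\n"
)


def parse_remotes(text):
    """Map each remote name to its address, given "git remote -v" output."""
    remotes = {}
    for line in text.splitlines():
        parts = line.split()
        # Each line reads: NAME ADDRESS (fetch) or NAME ADDRESS (push).
        if len(parts) >= 2:
            remotes[parts[0]] = parts[1]
    return remotes


def missing_remotes(remotes, expected=("origin", "release")):
    """Return the expected remote names that are not configured."""
    return [name for name in expected if name not in remotes]


if __name__ == "__main__":
    remotes = parse_remotes(SAMPLE_OUTPUT)
    for name in sorted(remotes):
        print(name + " -> " + remotes[name])
    print("Missing: " + (", ".join(missing_remotes(remotes)) or "none"))
```

If 'release' is reported as missing, repeat step 7 from inside the workbook folder.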

Updating the CS 205 Workbook to the latest version

New assignments are released as additions to the CS 205 Workbook, so you will need to update your workbook to the latest version for each assignment.

To update it:

Mac

  1. Open a Command Line Interface: If you forgot how, see step 3 of "Installing the prerequisites". If you just installed git, you will need to close your old Command Line Interface and re-open it for it to know about git.
  2. Navigate to your CS 205 workbook folder on the Command Line Interface
    • If your folder is on your desktop and called cs205, you can do that by typing cd Desktop, pressing enter, typing cd cs205, and pressing enter again. Then enter the cloned workbook folder the same way (cd workbook).
  3. Fetch the latest updates
    • Type the following command: git fetch release
  4. Type the following command: git checkout master
    • Continue if you see "Already on 'master'"
  5. Type the following command: git merge release/master

Windows

  1. Navigate to your CS 205 workbook folder (the folder created by git clone) on your computer
  2. Open Git Bash
    • Right click any blank area within the folder and select 'Git Bash'
  3. Fetch the latest updates
    • Type the following command: git fetch release
  4. Type the following command: git checkout master
    • Continue if you see "Already on 'master'"
  5. Type the following command: git merge release/master
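
Because the same three commands are run for every assignment, they can be collected in a small script. The following is a minimal sketch, assuming the remote is named 'release' and the branch is 'master' as in the steps above; the function names update_commands and run_all are made up for illustration.

```python
import subprocess


def update_commands(remote="release", branch="master"):
    """Return the git commands from the update steps, in order."""
    return [
        ["git", "fetch", remote],                  # download the latest updates
        ["git", "checkout", branch],               # switch to your own branch
        ["git", "merge", remote + "/" + branch],   # merge the updates into it
    ]


def run_all(commands, cwd=None):
    """Run each command in turn and stop at the first one that fails."""
    for command in commands:
        print("$ " + " ".join(command))
        subprocess.check_call(command, cwd=cwd)
```

Calling run_all(update_commands()) from inside the workbook folder would run the three commands; check_call raises an error as soon as one of them fails, so a failed fetch is never followed by a merge.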

Running the CS 205 Workbook

To run your code, and the CS 205 Workbook as a whole, you need to start the CS 205 Workbook. To do this:

  1. Open a command line interface and navigate to your CS 205 workbook folder
  2. Run app.py with Python
    • Run the following command: python app.py
  3. View workbook in browser
    • Open a web browser and type 127.0.0.1:5000 into your address bar
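
If the browser shows nothing at that address, it can help to check whether anything is listening there at all. The following is a minimal sketch that only tests whether a connection to 127.0.0.1 on port 5000 is accepted; it does not check that the program answering is the workbook. The function name is_listening is made up for illustration.

```python
import socket


def is_listening(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port is accepted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()


if __name__ == "__main__":
    if is_listening():
        print("Something is listening on 127.0.0.1:5000.")
    else:
        print("Nothing is listening on 127.0.0.1:5000; is app.py still running?")
```

Run this in a second command line interface while python app.py is still running in the first one.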

Text Processing in the CS 205 Workbook

In order to run any text processing code (starting February 17), you need to install the Natural Language Toolkit (nltk) for Python. To do this:

  1. Open a command line interface
  2. Run the following command: conda install nltk
  3. Run the following command: python -m nltk.downloader punkt

Web Scraping

In order to run the web scraping code (starting April 2), you need to install the Scrapy package for Python. To do this:

  1. Open a command line interface
  2. Run the following command: conda install scrapy
  3. When prompted to proceed, choose: y

Creating a Scraper

To create each scraper, you should:

  1. Rename your scraping folder to olympics_original
  2. Open a command line interface
  3. Change directory to the py folder for your lab or demo
  4. Run the following command: scrapy startproject projectname, replacing projectname with the desired project name. For example, for the Olympics lab, you would type scrapy startproject olympics
  5. Either in the terminal or in your file manager, copy the following files from the olympics_original directory into the same folder in the olympics directory you just created:
    • items.py, pipelines.py, settings.py, and olympics_scraper.py (or lyrics_scraper.py, or whatever the scraper in the spiders folder is named)
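
The copy in step 5 can also be scripted. The following is a minimal sketch, assuming that olympics_original has the same folder layout as the project created by scrapy startproject, so that each file can be copied to the same relative location. The function name copy_scraper_files is made up for illustration, and the file list should be adjusted if your scraper has a different name.

```python
import os
import shutil

# File names taken from step 5 above; change the scraper name if yours differs.
FILES_TO_COPY = ("items.py", "pipelines.py", "settings.py", "olympics_scraper.py")


def copy_scraper_files(original_root, new_root, names=FILES_TO_COPY):
    """Copy the named files to the same relative folder under new_root.

    Returns the sorted relative paths of the files that were copied.
    """
    copied = []
    for folder, _subfolders, files in os.walk(original_root):
        relative = os.path.relpath(folder, original_root)
        for name in files:
            if name not in names:
                continue
            target_dir = os.path.normpath(os.path.join(new_root, relative))
            if not os.path.isdir(target_dir):
                os.makedirs(target_dir)
            shutil.copy(os.path.join(folder, name), os.path.join(target_dir, name))
            copied.append(os.path.normpath(os.path.join(relative, name)))
    return sorted(copied)
```

For the Olympics lab, calling copy_scraper_files("olympics_original", "olympics") from the py folder would perform the copy and return the list of files it replaced, which you can compare against the list in step 5.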

Committing your work to Git, pushing to Gitlab, or updating the Heroku webpage

To turn in assignments in this class, you need to upload (or 'push') the changes you made to Gitlab. You will first commit your changes, then push them to Gitlab. Think of a commit as a collection of changes; you can commit multiple times before you push.

Mac OS X

  1. Open a Command Line Interface: If you forgot how, see step 3 of "Installing the prerequisites". If you just installed git, you will need to close your old Command Line Interface and re-open it for it to know about git.
  2. Navigate to your CS 205 workbook on the Command Line Interface
    • If your folder is on your desktop and called cs205, you can do that by typing cd Desktop, pressing enter, typing cd cs205, and pressing enter again. Then enter the cloned workbook folder the same way (cd workbook).
  3. (Optional) List new or changed files
    • Run the following command: git status
  4. Add changes to the next commit
    • Run the following command: git add FILENAME (replace FILENAME with the name of the new or modified file that you want to commit)
      • This adds all the changes within the file to the next commit.
      • Repeat for every new or modified file you want to commit.
  5. Commit changes to repository
    • Run the following command: git commit -m "USEFUL MESSAGE" (replace USEFUL MESSAGE with a description of the changes you made to the files)
      • If you ever need to roll back to an earlier revision, the message will help you determine which revision you are looking for.
  6. (Optional) Update Heroku Website
    • Run the following command: git push heroku master
      • 'heroku' is the repository name.
      • 'master' is the branch; every repository starts with the master branch.
  7. (Optional) Push to Gitlab
    • Run the following command: git push origin master
      • 'origin' is the repository name.
      • 'master' is the branch; every repository starts with the master branch.

Windows

  1. Navigate to your CS 205 workbook on your computer
  2. Open Git Bash
    • Right click any blank area within the folder and select 'Git Bash'
  3. (Optional) List new or changed files
    • Run the following command: git status
  4. Add changes to the next commit
    • Run the following command: git add FILENAME (replace FILENAME with the name of the new or modified file that you want to commit)
      • This adds all the changes within the file to the next commit.
      • Repeat for every new or modified file you want to commit.
  5. Commit to repository
    • Run the following command: git commit -m "USEFUL MESSAGE" (replace USEFUL MESSAGE with a description of the changes you made to the files)
      • If you ever need to roll back to an earlier revision, the message will help you determine which revision you are looking for.
  6. (Optional) Push to Gitlab
    • Run the following command: git push origin master
      • 'origin' is the repository name.
      • 'master' is the branch; every repository starts with the master branch.
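
Steps 3 and 4 on either platform (listing changed files, then adding each one) can be linked together: git status --porcelain prints the same list in a short, script-friendly form, with a two-character status code followed by the file name. The following is a minimal sketch of reading that output and building the matching git add commands. The sample output and file names are hypothetical, the function names are made up for illustration, and file names that git prints in quotes (for example, names with unusual characters) are not handled.

```python
# Hypothetical output of "git status --porcelain": one modified file
# and one new (untracked) file.
SAMPLE_STATUS = " M app.py\n?? notes.txt\n"


def changed_files(porcelain_text):
    """Return the file names listed in "git status --porcelain" output."""
    files = []
    for line in porcelain_text.splitlines():
        if len(line) <= 3:
            continue
        # Columns 1-2 hold the status code, column 3 is a space.
        path = line[3:]
        # A renamed file is shown as "OLD -> NEW"; keep the new name.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        files.append(path)
    return files


def add_commands(files):
    """Return one "git add" command per file, as in step 4."""
    return [["git", "add", name] for name in files]


if __name__ == "__main__":
    for command in add_commands(changed_files(SAMPLE_STATUS)):
        print(" ".join(command))
```

Printing the commands instead of running them lets you review the list first, which matters because everything you add becomes part of the next commit.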

Deploying Workbook on Heroku

To deploy the app online, you should:

Mac OS X

  1. Open a Command Line Interface: If you forgot how, see step 3 of "Installing the prerequisites". If you just installed git, you will need to close your old Command Line Interface and re-open it for it to know about git.
  2. Navigate to your CS 205 workbook on the Command Line Interface
    • If your folder is on your desktop and called cs205, you can do that by typing cd Desktop, pressing enter, typing cd cs205, and pressing enter again. Then enter the cloned workbook folder the same way (cd workbook).
  3. If this is your first time, log in to Heroku
    • Run the following command: heroku login, and enter your credentials.
  4. Create Heroku Instance
    • Run the following command: heroku create
  5. Commit any changes to Git
  6. Push to Heroku Instance
    1. Run the following command: git push heroku master (This pushes the workbook to Heroku; it might take a while)
    2. Run the following command: heroku ps:scale web=1 (This scales the Heroku instance to one (free) worker; any more will cost money)
  7. Open in browser
    • Run the following command: heroku open

Windows

  1. Navigate to your CS 205 workbook on your computer
  2. Open Git Bash
    • Right click any blank area within the folder and select 'Git Bash'
  3. If this is your first time, log in to Heroku
    • Run the following command: heroku login, and enter your credentials.
  4. Create Heroku Instance
    • Run the following command: heroku create
  5. Commit any changes to Git
  6. Push to Heroku Instance
    1. Run the following command: git push heroku master (This pushes the workbook to Heroku; it might take a while)
    2. Run the following command: heroku ps:scale web=1 (This scales the Heroku instance to one (free) worker; any more will cost money)
  7. Open in browser
    • Run the following command: heroku open