Final Project

Ferocious Final Projects

Assignment Description

In a team of three or four students, you will propose and produce a final project worth a total of 220 points. As a team, you have considerable freedom to choose a project of interest to you – while we have some suggested projects through the Project Goals links, you are strongly encouraged to propose your own.

Team Formation (Due March 11th)

To participate in the CS 225 Final Project, you must create your own team of three or four students. Once you have formed a team, each member of the team must complete the team submission form below with the netid of each team member (and your unique team name). If you do not fill out the form by the deadline, you will instead have to take the final exam at the end of the semester. (If you would rather take the final, you do not need to do anything at all here!)

Team Submission Form

You may choose your own team name. If you don’t have one, you can use the following link to generate a “unique” identifier for your group: Get Unique ID.

Be sure to correctly input both your team name and your teammates’ netids! An incorrectly typed team name can delay or even prevent you from completing the final project. Once teams are formed, you will be notified of your project mentor (who will create a git repository that you will use to submit deliverables and receive feedback).

Team Contract (Due March 25th)

As a team, you must submit a 1-2 page document as a Markdown (.md) file which formalizes your team’s views on both core logistical issues and common pitfalls you may encounter over the course of your project. Once signed by each member of your team, it should be considered a binding agreement for all parties. Breaches of this contract can and should be brought up internally and – if not resolved – brought to the attention of course staff. Accordingly, the document should – at minimum – address the following major Communication and Collaboration issues.

Communication

Determining how to communicate with your teammates as well as how often you should be communicating is key to a successful remote project. Discuss with your team and draft a statement detailing the following:

  1. Team Meetings When and how often will your team meet? How long should each meeting last? What software or tool will you use to host these meetings? Will someone take notes (record minutes)?

  2. Assistance How will your teammates be able to contact you if they need your help or opinion on a task? How quickly should you be expected to respond?

  3. Respect An effective team needs to have an environment which encourages open expression of ideas. How will you ensure that every member has an opportunity to speak and, more importantly, that every member will actively listen and engage with the thoughts of others?

Collaboration

The final project tasks you with finding a fair distribution of labor where each student has some role in the development of each deliverable. However, the details of this distribution are up to you. Discuss with your team and draft a statement detailing the following:

  1. Work Distribution How will you assign workload for this project? How will you address unexpected complications or unforeseen work? You are encouraged to identify the strengths and desires of each team member when distributing work. You do not need to all work equally on a particular deliverable – it is the overall work that should be largely equal.

  2. Time Commitment How many hours of work per week are expected of each group member? Are there prior time commitments that need to be accounted for? How will you address new conflicts or commitments when they do inevitably occur?

  3. Conflict Resolution How will the team resolve situations where there is a disagreement between members? Situations where one or more members have not accomplished their tasks? Situations where one or more members are habitually late? Are there other hypothetical situations that you as an individual or as a team want to discuss ahead of time? When issues occur, you are strongly encouraged to inform course staff, but only after first trying to resolve the issue as a team in a respectful manner.

To receive credit for your team contract, each individual must electronically sign the document. This must be done by making a single git commit that modifies the file to include your name and netid. In other words, your contract file should have 3-4 commits (one for each team member)!

This will both demonstrate that you as an individual have agreed to the contract and ensure that you experience and overcome some of the common problems that arise when multiple people edit a single document. Hint: Be sure to git pull before you git push so you don’t overwrite your teammates’ signatures!

An example contract can be found HERE. You should not copy this contract directly, but use it as an example of the format and content you should be discussing as a team.

Final Project Proposal (Due March 25th)

Even if you choose to use one or more of the suggested example project goals, as a team you are responsible for submitting a project proposal of no more than two pages that contains the following information:

  1. Leading Question Your final project should have a clear conclusion or target goal – given a dataset and a code base that implements some graph algorithms, what can you learn from the dataset? Are you hoping to solve a specific problem? Are you hoping to produce a general search tool? You should clearly describe how your team will use your dataset and algorithms to answer your leading question. Be thorough in your description – this is the foundation of your project and if your mentor cannot follow your logic, you will not be able to proceed further on the project. NOTE: Not every algorithm implemented in the project must directly answer this question, but you must answer the question using the algorithms you have selected.

  2. Dataset Acquisition and Processing Your final project must use at least one publicly accessible dataset and your proposal must clearly describe what dataset you have chosen to use. This includes succinctly describing:

    • Data format. In roughly a paragraph, you should describe to the best of your ability the specifics of your input dataset. At minimum this includes: What is the source of the dataset and what is its input format? How big is the dataset? Do you plan to use all of the data or only a subset? If only a subset, how will you define it?

    • Data Correction. In a paragraph or two, you should describe how you will parse the input data and what checks you will perform to ensure the input data is error-free. At minimum this should discuss how you will check for missing entries and how you will correct such instances when you find them. Depending on the dataset, it is also reasonable to check for values that are not physically possible or values which are statistical outliers. Note: These are just suggestions – you may have many other ideas for how to find and correct problems in your dataset.

    • Data Storage. In a paragraph or two, you should describe what data structures you will use to store the data within your code. If you need any auxiliary data structures or preprocessed tables, you should also discuss them here. As part of this proposal you must include an estimate of the total storage costs for your dataset in Big O notation.

  3. Graph Algorithms In no more than a few paragraphs, describe what algorithms you will use to answer the leading question. You should spend some time considering what algorithms you might try and, for all major functions you plan to use, include the following details in your proposal:

    • Function Inputs What are the expected inputs for your algorithm? Do you have to do anything to convert your stored dataset into a usable input for the algorithm described? (Ex: A graph algorithm would require making the input into a graph.) For the more complex algorithms, be sure to include as part of the input any additional information you might need. For example, A* search requires a heuristic. If you choose to do A*, what are some possible heuristics you might use?

    • Function Outputs What is the expected output for your algorithm? How will you store, print, or otherwise visualize the outcome?

    • Function Efficiency Your algorithm likely has a theoretically optimal Big O that you can find online. But most algorithms also have multiple implementations and there is no guarantee that your implementation of this algorithm is optimal. As part of this proposal you must include an estimate or target goal on the Big O efficiency of your algorithm in both time and memory.

    NOTE: To be considered a valid final project, your team must implement at least two graph data algorithms as well as a graph traversal from the list of example goals, or you must propose an algorithm or set of graph algorithms that represents an equivalent amount of coding development.

  4. Timeline As a team, identify a list of tasks such as data acquisition, data processing, completion of each individual algorithm, production of final deliverables, etc… and write a proposed timeline for the completion of these tasks. You are not required to adhere strictly to this timeline but it should represent a reasonable set of benchmarks to strive for. For example, stating that you will finish all graph algorithms over the span of a single week is not reasonable. At least one proposed task must be completed before the mid-project checkin – part of the mid-project grade will be based on whether or not this target goal was met.
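As an illustration of the Data Correction and Data Storage items in the proposal, here is a minimal C++ sketch. It assumes a hypothetical edge-list file with one "source,destination,weight" row per line (your dataset’s real format will almost certainly differ), skips and counts rows with missing fields, non-numeric weights, or negative weights, and stores the cleaned edges in an adjacency list whose storage cost is O(V + E). The names Graph, LoadResult, and loadEdges are illustrative only.

```cpp
#include <cmath>
#include <cstddef>
#include <exception>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Adjacency list: vertex name -> outgoing (neighbor, weight) edges.
// Every vertex and every kept edge is stored once, so storage is O(V + E).
using Graph = std::unordered_map<std::string, std::vector<std::pair<std::string, double>>>;

struct LoadResult {
    Graph graph;
    std::size_t skippedRows = 0;  // rows rejected by the checks below
};

// Reads rows in the hypothetical format "source,destination,weight".
// Rows with a missing field, a non-numeric weight, or a negative or
// non-finite weight are skipped and counted instead of being stored.
LoadResult loadEdges(std::istream& in) {
    LoadResult result;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files

        std::stringstream row(line);
        std::string src, dst, weightText;
        bool complete = std::getline(row, src, ',') && std::getline(row, dst, ',') &&
                        std::getline(row, weightText);
        if (!complete || src.empty() || dst.empty() || weightText.empty()) {
            ++result.skippedRows;  // missing entry
            continue;
        }

        double weight = 0.0;
        try {
            std::size_t used = 0;
            weight = std::stod(weightText, &used);
            if (used != weightText.size()) throw std::invalid_argument("trailing characters");
        } catch (const std::exception&) {
            ++result.skippedRows;  // weight is not a number
            continue;
        }
        if (!std::isfinite(weight) || weight < 0.0) {
            ++result.skippedRows;  // physically impossible for a distance-like weight
            continue;
        }

        result.graph[src].emplace_back(dst, weight);
        result.graph[dst];  // also register sink vertices that have no outgoing edges
    }
    return result;
}
```

Because loadEdges takes any std::istream, the same function can read a std::ifstream for the full dataset and a std::istringstream for small hand-written test inputs. Reporting skippedRows alongside your results is one simple way to document how much data the cleaning step removed.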

Your overall grade for the proposal will be based on what you turn in by the deadline. If your proposal is not accepted (or you would like to resubmit for a regrade), there will be a minor regrade penalty for each additional attempt at submitting.
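The Function Inputs item above asks what heuristics you might use if you choose A* search. For an OpenFlights-style dataset where each airport has a latitude and longitude, one common choice is great-circle distance, sketched below with the haversine formula. The Airport struct is a hypothetical stand-in for however you store airports, and the heuristic only stays admissible (never overestimates the remaining cost) if your edge weights are themselves great-circle distances in kilometers computed with the same Earth radius.

```cpp
#include <cmath>

// Hypothetical airport record: coordinates in degrees.
struct Airport {
    double latitudeDeg;
    double longitudeDeg;
};

// Great-circle distance in kilometers using the haversine formula.
// As an A* heuristic, h(n) = haversineKm(n, goal) is a lower bound on the
// remaining path cost when every edge weight is a great-circle distance
// computed with the same radius (the direct arc is never longer than a
// multi-leg route over the sphere).
double haversineKm(const Airport& a, const Airport& b) {
    constexpr double kEarthRadiusKm = 6371.0;  // mean Earth radius
    constexpr double kPi = 3.14159265358979323846;
    auto toRad = [](double deg) { return deg * kPi / 180.0; };

    double dLat = toRad(b.latitudeDeg - a.latitudeDeg);
    double dLon = toRad(b.longitudeDeg - a.longitudeDeg);
    double h = std::sin(dLat / 2) * std::sin(dLat / 2) +
               std::cos(toRad(a.latitudeDeg)) * std::cos(toRad(b.latitudeDeg)) *
                   std::sin(dLon / 2) * std::sin(dLon / 2);
    // fmin guards against h drifting slightly above 1 from rounding.
    return 2.0 * kEarthRadiusKm * std::asin(std::sqrt(std::fmin(1.0, h)));
}
```

Each heuristic call is O(1), so it does not change the overall Big O of A* that you estimate in the Function Efficiency item.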

Development Log (Due weekly after March 25th)

A successful final project is built slowly over many weeks, not thrown together at the last minute. To incentivize good project pacing and to let your project mentor stay informed about the status of your work, each week you should add an entry to a log.md file in an obvious location in your git repo (such as a documents folder).

Each entry should describe:

  1. What goals you had set for the week and whether they were accomplished or not
  2. What specific tasks each member of your team accomplished in the week
  3. What problems you encountered (if any) that prevented you from meeting your goals
  4. What you plan to accomplish next week

The development log will be graded for completion, detail, and honesty – not progress. It is much better to truthfully evaluate the work you completed in a week than to lie to make the project sound further along than it really is. It is totally acceptable to have an entry that says you tried nothing and accomplished nothing. However, if every week starts to say that, both you and your project mentor will be able to identify the issue before it becomes impossible to fix.

Mid-Project Checkin (April 18th)

A few weeks into the final project, you are required to meet with your project mentor for a check-in meeting. You do not need to prepare a presentation but should come prepared to summarize your progress as well as have a frank discussion about any issues or concerns you have encountered as a team or as an individual team member. The goal here is to ensure that forward progress is being made and to address any issues that are impeding progress while there is still time to correct and recover. To that end, you should be up front and honest about your current progress.

While a significant portion of the points for the checkin meeting is awarded for attending as a team, for full credit in the mid-project meeting you must also have completed at least one of your chosen algorithms or have a thorough data parsing pipeline (with all corrections / cleaning steps functional). You will be expected to demonstrate in the meeting the tests you have written proving that the algorithm works. This is to encourage you to start working on the final project long before the final weeks and to ensure that you are writing real tests for your code as you develop it.
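To give a sense of what a test “proving that the algorithm works” can look like, here is a minimal sketch of a breadth-first traversal paired with a hand-checked test case. It uses plain assert to stay self-contained; the integer vertex IDs and the names bfsOrder and testBfsOnSmallGraph are illustrative, and your team may prefer a dedicated test framework.

```cpp
#include <cassert>
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using AdjacencyList = std::unordered_map<int, std::vector<int>>;

// Breadth-first traversal from `start`; returns vertices in visit order.
// Each reachable vertex is enqueued once and each of its edges scanned once,
// so this runs in O(V + E) time with O(V) extra memory.
std::vector<int> bfsOrder(const AdjacencyList& graph, int start) {
    std::vector<int> order;
    std::unordered_set<int> visited{start};
    std::queue<int> frontier;
    frontier.push(start);
    while (!frontier.empty()) {
        int current = frontier.front();
        frontier.pop();
        order.push_back(current);
        auto it = graph.find(current);
        if (it == graph.end()) continue;  // vertex with no outgoing edges
        for (int neighbor : it->second) {
            if (visited.insert(neighbor).second) {  // true only on first discovery
                frontier.push(neighbor);
            }
        }
    }
    return order;
}

// Small graph whose answer can be checked by hand:
// 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, and vertex 4 is unreachable from 0.
void testBfsOnSmallGraph() {
    AdjacencyList graph{{0, {1, 2}}, {1, {3}}, {2, {3}}, {4, {0}}};
    std::vector<int> expected{0, 1, 2, 3};
    assert(bfsOrder(graph, 0) == expected);             // level order, 3 visited once
    assert(bfsOrder(graph, 3) == std::vector<int>{3});  // a sink reaches only itself
}
```

Tests like this one are most convincing when they cover edge cases (unreachable vertices, shared neighbors, sinks) on graphs small enough that the expected output can be worked out on paper.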

Final Project Deliverables (Due May 6th)

There are four main deliverables for this final project. As a team, you are expected to distribute work on each deliverable fairly. This means that each student should be responsible for some part of each of the following:

  1. A functional code-base. Your code must be written in C++ and should be compilable and runnable on the EWS machines (or a VM equivalent). It will be tested for reproducibility of your original results and its capacity to run on datasets of our choosing that exactly match your proposed formatting. Your code will be graded based on the following metrics:

    • Code Execution How easy is it to run your code? For full credit, your code should be runnable using simple command line arguments, which include the ability to alter or adjust the input data or output location.

    • Code Efficiency Does your code match your target Big O efficiencies? For full credit, your code should have no obvious inefficiency in implementation and be capable of running to completion on your proposed dataset using reasonable hardware resources.

    • Code Organization Is your code human-readable? For full credit, all your variables, functions, and classes should be named appropriately and organized comments should detail the input, output, and intended behavior of major code blocks. Additionally, your final submission should be devoid of unnecessary or obsolete code.

    • Code Completion Have you completed all your algorithms? For full credit, your code must be able to run all the proposed algorithms on the full dataset and have tests proving that the algorithms worked.

  2. A descriptive README. In addition to the code itself, you must include a human-readable README.md which describes:

    • GitHub Organization – You should describe the physical location of all major files and deliverables (code, tests, data, the written report, the presentation video, etc…)

    • Running Instructions – You should provide full instructions on how to build and run your executable, including how to define the input data and output location for each method. You should also have instructions on how to build and run your test suite, including a general description of what tests you have created. It is in your best interest to make the instructions (and the running of your executables and tests) as simple and straightforward as possible.

  3. A written report. In addition to your code, your GitHub repository must contain a results.md file which describes:

    • The output and correctness of each algorithm – You should summarize, visualize, or highlight some part of the full-scale run of each algorithm. Additionally, the report should briefly describe what tests you performed to confirm that each algorithm was working as intended.

    • The answer to your leading question – You should directly address your proposed leading question. How did you answer this question? What did you discover? If your project was ultimately unsuccessful, give a brief reflection about what worked and what you would do differently as a team.

  4. A final presentation. In addition to your project write-up, you should submit a short video (10 minutes or less) describing your project. Your presentation should include slides or other visual aids and include the following content:

    • Your Goals (Suggested time: 1-2 minutes) The presentation should begin with a summary of your proposed goals and a short statement about what you successfully accomplished and, if necessary, what you were ultimately unable to complete.

      Tip: Think of this as ‘setting the stage’ for your presentation, letting the viewer know what you will be discussing for the rest of the talk.

    • Your Development (Suggested time: 2-3 minutes) The presentation should include a high level overview of the work you put into the project. This is not meant to be a line by line recounting of your code but a highlight reel of the various design decisions you made and the challenges you encountered – and hopefully overcame – while working on the project.

      If you were unable to complete one of your goals, this is the best opportunity to explain what you did that didn’t work out, how you tried to address the problem, and what you might do in the future if you were tasked to do this or a similar project again.

      Tip: If you are struggling to identify content here, ask yourself questions like: “How did we get the data we wanted?”, “How did we choose our implementation strategy for an algorithm?”, “How did we ultimately test our code to ensure that it is working?”

    • Your Conclusions (Suggested time: 3-5 minutes) The presentation should end by answering the ‘leading question’ you were hoping to solve. This may include details such as the final or full-scale input dataset you used and the output of each of your algorithms, but ambitious teams should focus on how these results led you to discover something interesting involving your real-world dataset. For example, a traversal algorithm on OpenFlights data may be used to identify the shortest path between two airports that your team would like to visit.

    In addition to quantitative results, your conclusions should also end with some individual thoughts you had about the project. What did you learn, what did you like or dislike, and what would you explore or implement next if given more time?

    To submit your final project video, you may either include it in your GitHub repository or include a direct link to the video in your team’s GitHub repository. Videos can be hosted through Zoom cloud recordings, YouTube, Google Drive, etc…
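For the Code Execution metric under the functional code-base deliverable, a small argument parser is one way to let graders change the input data and output location from the command line. The flag names --input and --output and the default paths below are hypothetical choices, not course requirements, and the sketch uses std::optional, so it assumes a C++17 compiler.

```cpp
#include <optional>
#include <string>

// Hypothetical run options; the default paths are placeholders.
struct Options {
    std::string inputPath = "data/edges.csv";
    std::string outputPath = "results/output.txt";
};

// Parses "--input <path>" and "--output <path>" (in any order) from argv.
// Returns std::nullopt on an unknown flag or a flag missing its value,
// so the caller can print usage instructions and exit.
std::optional<Options> parseArgs(int argc, const char* const argv[]) {
    Options options;
    for (int i = 1; i < argc; ++i) {
        std::string flag = argv[i];
        if (i + 1 >= argc) return std::nullopt;  // every flag needs a value
        if (flag == "--input") {
            options.inputPath = argv[++i];
        } else if (flag == "--output") {
            options.outputPath = argv[++i];
        } else {
            return std::nullopt;  // unknown flag
        }
    }
    return options;
}
```

In main(int argc, char* argv[]), you might call parseArgs(argc, argv) and, if it returns std::nullopt, print a one-line usage message and exit with a nonzero status before running any algorithm. Whatever flags you choose, the README’s Running Instructions should list them.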