MAT 180 Group Project Instructions
Project Due Date: December 2
1 Proposal
You may choose your own group of two or three students (no individual projects). Ideally
at least one person per group will have some experience using GitHub, so choose your group
accordingly. By November 9 your group must submit a project proposal on Gradescope.
One student per group should submit the proposal. The proposal itself should be a Markdown
file named README.md that includes the following:
1. The names of all group members
2. The name of the project
3. An outline of the project's goals, i.e., answers to the following questions:
(a) How will the data be collected?
(b) What task do you want to accomplish with the data?
(c) What kind of learning algorithm do you propose using to accomplish this task?
(d) How will you measure your performance on the task?
2 GitHub
We will use GitHub to organize the group projects. Once you submit a proposal, a folder
named after the project name in your proposal will be created inside the GitHub repository here.
The README.md file you provided will be included, and will act as the starting
point for your final README.md file which gives a full description of your project (see next
section). One group member should create a fork of the repository, and add the other group
members as collaborators to that fork (see details here). Each group member must commit
changes to this forked repository while working on the project. While working individually,
I suggest creating your own branch of the forked repository and using pull requests to merge
your changes into the main branch of the fork. It is very important that you only make
changes inside your own group's project folder and not in any other group's project (doing so
would result in a penalty). Once your project is complete, you will submit it by creating
a pull request to the original repository by December 2.
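
Before opening the final pull request, it may help to confirm that every changed file sits inside your group's folder. The following is only a minimal sketch of such a check: the function name `paths_outside_folder` and the folder name `MyProject` are hypothetical, and the list of changed paths is assumed to come from something like the output of `git diff --name-only`.

```python
from pathlib import PurePosixPath


def paths_outside_folder(changed_paths, project_folder):
    """Return the changed paths that do not lie inside project_folder.

    Paths are treated as repository-relative, forward-slash paths.
    """
    folder = PurePosixPath(project_folder)
    outside = []
    for p in changed_paths:
        # A file is inside the folder only if the folder is one of its parents.
        if folder not in PurePosixPath(p).parents:
            outside.append(p)
    return outside


# Hypothetical list of changed files, for illustration only.
changed = [
    "MyProject/README.md",
    "MyProject/Scripts/model.py",
    "OtherGroup/notes.md",
]
offending = paths_outside_folder(changed, "MyProject")
print(offending)
```

An empty result suggests the pull request touches only your own project folder; any path that is listed should be reverted before submitting.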
3 Project Guidelines
Your grade will be assigned according to your group’s adherence to the following guidelines.
1. Solve a “real-world” problem using a learning algorithm which you implement yourself.
You may argue for using a built-in implementation if the algorithm involved is
overly complicated but necessary to the task. After solving the task with your own
implementation, you may also, in a separate notebook, use a built-in implementation
and compare performance.
2. Create an organized system of Jupyter notebooks and Python scripts in your project
folder which work together to accomplish your task. Function definitions should be
written in Python scripts and imported into your notebooks when needed. Your project
folder should be well organized (e.g. Jupyter notebooks in a subdirectory called
‘Notebooks’, Python scripts in a subdirectory called ‘Scripts’, and your data in a
subdirectory called ‘Data’).
3. Provide a statistical argument that your model not only performs well on your dataset
but also generalizes well to new data (this should be explained in the README.md
file in your project folder, referencing the notebook files which show evidence of this).
This means you should compute an explicit approximation of the generalization error
as measured by the performance measure in your proposal (by correctly using training
data, validation data, and test data).
4. Your README.md file (located at the root of your project directory) should completely
describe the project, what it accomplishes, and how well it performs, and it should walk
through how a new user can use your model on their own data after cloning the repository.
This means, by following instructions in your README.md file, I should easily
be able to run your algorithm on a new example not in your dataset and see that
the model accomplishes what it promised to accomplish (at least up to your claimed
performance measure).
5. The project needs to be your own original work performed during this quarter. Any
plagiarism will result in a grade of 0. There are many publicly available machine
learning projects online. Do not think you can just copy one of these and be done. All
code must be written by group members.
6. The project itself should be reasonably original and ambitious (it should really take three
weeks of hard work) and the topic should be something you find personally interesting.
There are many project ideas that could be completed in an afternoon, e.g. taking an
often-used public housing dataset and finding the best polynomial model to predict the
price, or using clustering/SVD to classify handwritten digits. Please do not just repeat
a classic model which is a common example found online or in textbooks.
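
Guideline 3 asks for an estimate of the generalization error obtained by correctly using training, validation, and test data. The sketch below shows one possible way to organize that, under several assumptions: the data is synthetic placeholder data, the model is a small hand-written ridge regression standing in for whatever algorithm your group implements, and mean squared error stands in for the performance measure from your proposal. The helper names (`split_indices`, `fit_ridge`, `mse`) are illustrative, not required.

```python
import numpy as np


def split_indices(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle range(n) and return disjoint train/validation/test index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = perm[:n_test]
    val = perm[n_test:n_test + n_val]
    train = perm[n_test + n_val:]
    return train, val, test


def fit_ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def mse(X, y, w):
    """Mean squared error of the linear predictor X @ w."""
    return float(np.mean((X @ w - y) ** 2))


# Synthetic placeholder data (an assumption, not a real dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=300)

train, val, test = split_indices(len(y))

# Choose the hyperparameter using the validation set only.
candidates = [0.01, 0.1, 1.0, 10.0]
val_errors = {
    lam: mse(X[val], y[val], fit_ridge(X[train], y[train], lam))
    for lam in candidates
}
best_lam = min(val_errors, key=val_errors.get)

# Evaluate the selected model once on the held-out test set; this number
# serves as the estimate of the generalization error.
w = fit_ridge(X[train], y[train], best_lam)
test_error = mse(X[test], y[test], w)
print(f"selected lam={best_lam}, estimated generalization MSE={test_error:.4f}")
```

The key design choice is that the test set plays no role in fitting or in choosing `lam`, so the test error is not biased by model selection. In a project laid out as in guideline 2, functions like these would live in a script under ‘Scripts’ and be imported into a notebook.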