7/1/2021 Final Project Description - Summer 2021 Principles of Data Science (MATH-575A-01, MATH-488P-01, MATH-488P-02, MATH-590S-01)
https://brightspace.binghamton.edu/d2l/le/content/53741/viewContent/60761/View 1/9
Final Project Description

Where to find data?
Grading
Examples for inspiration
Important Dates
About project proposal
About exploratory analysis
About blog posts
About the analysis document
A bunch of interesting data sets available online
Final Project
Math 488P/575A: Principles of Data Science
Your final project is to do a novel data analysis to answer a question and write about it. This can be
interpreted broadly and the requirements are discussed below.
The rough outline of the project is:
Start with a question.
Find data that might get at that question.
Play around with the data.
Attempt to answer the question.
Iterate.
Communicate.
Your project should have one significant aspect to it. Examples might include:
putting together a novel data set (e.g. scraping something from the web)
answering an interesting question
building a “sophisticated” statistical/machine learning model
making a really compelling visualization
You can work solo or work in groups of up to 3 people. I can generate an initial non-binding group
assignment. You could take my recommendation or totally ignore it and find your own
teammates. See below for grading details and the group work policies.
Final deliverable
There are two final deliverables: a blog post and the analysis document. The final project is due
Tuesday July 6th at 11:59pm.
Blog post
Write a blog post in R Markdown aimed at a general audience (think 538
(https://fivethirtyeight.com/)). The post should
be 1000-1500 words
have at least two figures
See the section “About blog posts” below.
Analysis document
All analysis should be posted and well documented. The main technical results (plots,
regressions, etc) should be written up in a well documented, supporting technical document
(using R Markdown). You might also include R scripts for cleaning data or helper functions.
See the section “About the analysis document” below.
Where to find data?
You can find a seriously large amount of data online. I encourage you to “gather your own data
online” by doing something like scraping Twitter (http://varianceexplained.org/r/trump-tweets/)
though this is not expected.
There are some obvious places to look like data.gov. I’ve put together a collection of
interesting data sets you can find online at the bottom of this page.
If you are already doing research with a data set you are welcome to use it, but you have to do
something new.
Grading
Your team’s grade will be 50% blog post and 50% analysis document. Your individual grade will be
weighted by your team members’ reviews.
The project will be graded on
Communication
Clear writing (both in the blog post and in the supporting technical document)
Documented code
Accuracy
Did you use reasonable statistics?
Does your final code run?
How well do your findings support your conclusions? Note that “The evidence is
inconclusive” is a very possible, and completely acceptable answer.
Ambition
The project should take some creativity and effort, i.e. it should be more than a matter
of copy/pasting code.
Groupwork
You will anonymously rate your team members and yourself on team citizenship
(e.g. attends meetings, does what they promise, etc), not on ability.
Final grades will be adjusted based on peer ratings.
Individual grades are based on the project grade and a multiplier computed
from the peer ratings. This multiplier will range from 1.05 (for people who go
above and beyond) down to 0 (for people who don’t participate).
As a last resort you may fire a team member who refuses to participate. Please
contact the instructor well before it comes to this.
If you are fired you must start a new project and your peer rating multiplier will
take a hit.
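To make the grading arithmetic concrete, here is a minimal sketch in R. It assumes the individual grade is simply the team grade times the peer-rating multiplier; the description above only says the grade is "based on" both, so the exact formula, the function names, and the example scores are all hypothetical.

```r
# Hypothetical helper: team grade is 50% blog post and 50% analysis document.
team_grade <- function(blog, analysis) {
  0.5 * blog + 0.5 * analysis
}

# Hypothetical helper: scale the team grade by a peer-rating multiplier.
# ASSUMPTION: a plain product, with the multiplier clamped to [0, 1.05].
individual_grade <- function(team, multiplier) {
  multiplier <- min(max(multiplier, 0), 1.05)
  team * multiplier
}

team <- team_grade(blog = 90, analysis = 80)  # made-up scores, gives 85
individual_grade(team, 1.05)                  # above and beyond: 89.25
individual_grade(team, 0)                     # did not participate: 0
```

How the multiplier itself is derived from the peer ratings is not specified here, so the sketch takes it as an input.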
Examples for inspiration
These are some examples of interesting analyses. Many of these examples would take longer than
you have for the final project. These are meant to be inspirations but not expectations.
Blog posts from polygraph (https://pudding.cool/)
David Robinson’s text analysis of Trump tweets (http://varianceexplained.org/r/trump-tweets/)
genre classification (http://josh-jacobson.github.io/genre-classification/)
538 on how baby boomers get high (https://fivethirtyeight.com/datalab/how-baby-boomers-get-high/)
538 on Bob Ross (https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/)
see this page
(https://d1b10bmlvqabco.cloudfront.net/attach/icf0cypdc3243c/hcwsitww5k95ka/ii7mfqhc946l/CS1
for links to the final projects from CS109
(http://cs109.github.io/2015/pages/projects.html) (warning: a couple of links are broken).
Important Dates
Initial project proposal: due 6/23 at 11:59pm
Describe your proposed project
Who is on your team?
What question(s) will you try to answer?
What data sets will you use? You should have already found and taken a first
look at the data sets.
How will you use the data to try to answer the question?
Project proposals should be submitted as Piazza questions for all other students to
see. I will comment on these proposals. Note that these comments are meant
to help you refine your goals. You are not obligated to complete all tasks that you
promised in the proposal.
Exploratory analysis: due 6/30 at 11:59pm
Write up your initial findings in an R Markdown document.
You should have at least N plots (I am still deciding on N, but it will be greater
than 3).
Analysis document: due 7/6 at 11:59pm
Write up your technical results in an R Markdown document.
Provide detailed comments so that it is clear to me what you have done.
Put all code, data, etc together.
Blog post: due 7/6 at 11:59pm
should be 1000-1500 words
have at least two figures
target general audience
About project proposal
Write a project proposal with your team.
You should brainstorm a long list of ideas, then narrow it down to a couple that are feasible given
your knowledge of R, the time constraints, and the available data. Write the proposal for one of
these ideas, but you should keep a couple backups in case the original project doesn’t work out
for some reason.
The point of this exercise is to think through a reasonable project (and get feedback from the
instructor). You will not be held to doing exactly what you say you will do in this proposal; expect
to adapt your project as you continue to work on it (just ask Robert Burns
(https://en.wiktionary.org/wiki/best_laid_plans) or Mike Tyson
(http://articles.sun-sentinel.com/2012-11-09/sports/sfl-mike-tyson-explains-one-of-his-most-famous-quotes-20121109_1_mike-tyson-undisputed-truth-famous-quotes)).
The more you put into the proposal, however, the better your life will be 2 weeks from now.
Deliverable
Write a one page proposal posted on Piazza which discusses:
What questions will you try to answer? List 5-10 possible questions.
What data sets will you use? You should have already found and taken a first look at the
data sets. Make sure the data is clean enough to reasonably use and actually has the
information content to answer your questions.
What are some things you will do with the data to get at your questions? For example, what
are some plots you might make?
Include a list of 3 backup ideas you brainstormed, with a couple bullet points of detail. Just in
case.
Advice
Meet once very early for an initial brainstorm. Have everyone go off and explore some ideas. Meet
again for a final brainstorm. Then write the proposal.
Look at the data sets you plan on using to make sure they aren’t awful. If you plan on creating a
data set (e.g. by scraping a website) convince me this will be feasible (you don’t have to have the
scraper working perfectly).
About exploratory analysis
By this point you should have done an exploratory analysis and have initial results. What this
means will vary from project to project so there aren’t many formal requirements. The point of
this is to: take stock of where you are, show me that you have made good progress and convince
me you will be able to finish the project.
Basically we expect to see that you
have the data
have asked/answered a bunch of questions by making lots of plots and computing statistics
have narrowed down the scope of the project to something coherent and manageable
have some initial results
What “initial results” means will also vary from project to project. For example, if the project is to
build a model to predict Y based on X then you should have looked at a few simple models.
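As a rough illustration of what "a few simple models" might look like, the sketch below fits three small linear models in base R. The built-in mtcars data set is only a stand-in (Y = mpg, X = wt and hp are arbitrary choices, not part of the assignment), and comparing AIC is one possible first check, not a required method.

```r
# Stand-in data: mtcars ships with R. Y = mpg; X = wt and hp (illustrative only).
baseline <- lm(mpg ~ 1, data = mtcars)        # intercept-only baseline
simple   <- lm(mpg ~ wt, data = mtcars)       # one predictor
larger   <- lm(mpg ~ wt + hp, data = mtcars)  # two predictors

# Compare in-sample fit; lower AIC is better (this is not a held-out test).
AIC(baseline, simple, larger)

# Share of variance explained by the one-predictor model.
summary(simple)$r.squared
```

For your own project, swap in your data and response; the point at this stage is to have something fitted and compared, not a final model.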
Deliverables
Gather everything into one folder called n_eda (where n = your group number, which I will assign
to your group). This folder should have four subfolders: /summary, /initial_results, /everything, and
/data. Please zip the n_eda folder and submit it to the Google Form that I will set up.
1. Write a summary of what you have tried, what you found and what you have left to do. This
document should be about a page and can be mostly bullet points. Put this document into
a folder called summary.
2. Have some form of initial results. This could be a .Rmd document with a couple plots. The
initial results should be short and to the point. Put the initial results into a folder called
initial_results.
3. Include the rest of the work you have done. Simply gather all the scripts/.Rmd files you
have so far from each team member and put them into one folder (called everything). This
is just so I can see all the work you have done.
4. Include the data.
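The folder layout described above can be set up in a few lines of base R. This is only a sketch: the group number 7 is hypothetical, and the skeleton is created under tempdir() so that running it does not touch your real project directory.

```r
# Hypothetical group number; replace with the one assigned to your group.
n <- 7
root <- file.path(tempdir(), paste0(n, "_eda"))  # tempdir() keeps the sketch side-effect free

# The four subfolders named in the deliverables list.
subfolders <- c("summary", "initial_results", "everything", "data")
for (sub in subfolders) {
  dir.create(file.path(root, sub), recursive = TRUE, showWarnings = FALSE)
}

# Confirm what was created.
list.dirs(root, full.names = FALSE, recursive = FALSE)
```

The finished folder could then be compressed by hand or, for example, with utils::zip(), which relies on an external zip program being installed.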
About blog posts
Write a blog post explaining what you found. It should answer:
1. What question(s) did you try to answer? Why should someone care?
2. What is the data/how did you get it?
3. How did you answer the questions (e.g. what statistical techniques, etc)?
4. What are your findings?
Points 1 and 4 are the most important for the blog post (your analysis document focuses on 2 and
3). This blog post should be aimed at a general audience who is not afraid of graphs/a little data
(think 538 (https://fivethirtyeight.com/)). The vast majority of the technical details should be in
the analysis document.
Requirements for the blog post
The post should be 1000-1500 words.
Include a title and your names.
Don’t display R code unless it is used to convey a point.
There should be at least 2 visualizations.
Make sure to describe the figures somewhere in the text.
These plots should be communicatory plots, not exploratory plots.
The post should be submitted as an .html file (probably written in R Markdown).
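One way to check the 1000-1500 word requirement before submitting is a rough word count over the .Rmd source. The sketch below is an approximation under stated assumptions: count_words is a hypothetical helper, it skips a leading YAML header and fenced code chunks, and it still counts inline code and markdown syntax as words, so a grader's count may differ.

```r
# Hypothetical helper: rough word count for R Markdown source lines.
count_words <- function(lines) {
  fence <- strrep("`", 3)  # the three-backtick marker that opens/closes a chunk
  in_yaml <- FALSE
  in_chunk <- FALSE
  keep <- character(0)
  for (i in seq_along(lines)) {
    line <- lines[i]
    # A "---" on line 1 opens the YAML header; the next "---" closes it.
    if (grepl("^---\\s*$", line) && (i == 1 || in_yaml)) {
      in_yaml <- !in_yaml
      next
    }
    # Toggle on chunk fences and skip the fence line itself.
    if (startsWith(line, fence)) {
      in_chunk <- !in_chunk
      next
    }
    if (!in_yaml && !in_chunk) keep <- c(keep, line)
  }
  words <- unlist(strsplit(keep, "\\s+"))
  sum(nzchar(words))
}

# Tiny made-up document: header and chunk are ignored, prose is counted.
post <- c("---", "title: My post", "---", "",
          "Some intro text here.",
          paste0(strrep("`", 3), "{r}"), "plot(1:10)", strrep("`", 3),
          "Four more words here.")
count_words(post)  # 8
```

On a real post you would pass in readLines() of your own .Rmd file and check that the result falls between 1000 and 1500.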
Submission
Include everything that went into creating this blog post in a folder called n_blog (where n = your
group number). You can name the blog post whatever you want, just make sure it is a .html
document. Please compress n_blog and submit it to the Google Form that I will set up.
I plan on posting these blog posts and your analyses on the internet. If you do not want your
name associated with the post (or if you don’t want even an anonymous version of the post
displayed to the outside world on the internet), please let me know as soon as possible.
Grading
Communication (80%)
Does your main point come through (e.g. see here
(http://www.storytellingwithdata.com/blog/2017/3/22/so-what))?
Is the document written well and clearly? Yes, spelling and grammar matter.
Quality of the figures.
Effective communication? Ask your parents or friends to read your post and have
them give you feedback.
Accuracy (10%)
Do you accurately convey a rigorous argument?
Ambition (10%)
Bonus points
Your team will get up to 5 extra points on the final project grade if you do the following:
make a webpage using GitHub Pages (see here (https://pages.github.com/))
GitHub Pages sites are very easy to make. The webpage should show off all aspects of your project
including the blog post and technical analysis. The better this page is the more points you will get.
About the analysis document
Using R Markdown, write a document called process_notebook describing the process you used to
conduct your analysis (note this description is borrowed from here
(http://cs109.github.io/2015/pages/projects.html)). The process_notebook is the core
document for the analysis. It should show the code for the entire analysis you did and include text
justifying decisions you made (e.g. why did you remove certain observations, why median instead
of mean, how did you select the variables for a model, etc). The target audience is: someone who
knows R/statistics, but is unfamiliar with your project (i.e. the graders or even yourself three
months from now).
The process_notebook should detail the steps you took to develop a solution. This includes
where you got the data, other solutions you tried, the statistical methods you chose and your
findings. How you got to your conclusions is as important as the conclusions. This is where you
can show all the work you put into this project. You should have lots of visualizations in the
notebooks. Your discussion should hit on the following topics (depending on the project some of
these will be more important than others):
Abstract: one paragraph at the very beginning of the document summarizing everything.
Overview and Motivation: Provide an overview of the project goals and the motivation for
it. Consider that this will be read by people who did not see your project proposal.
Related Work: Anything that inspired you, such as a paper, a web site, or something we
discussed in class.
Initial Questions: What questions are you trying to answer? How did these questions evolve
over the course of the project? What new questions did you consider in the course of your
analysis?
Data: Source, scraping method, cleanup, storage, etc.
Exploratory Data Analysis: What visualizations did you use to look at your data in different
ways? What are the different statistical methods you considered? Justify the decisions you
made, and show any major changes to your ideas. How did you reach these conclusions?
Final Analysis: What did you learn about the data? How did you answer the questions? How
can you justify your answers?
Make sure the reader can answer the question “What is the point?” (e.g. see here
(http://www.storytellingwithdata.com/blog/2017/3/22/so-what)).
Submission
Gather everything into a folder called n_analysis (where n = your group number). This folder
should have three sub-folders: /data, /results, /everything_else. Compress n_analysis and submit
it to the Google Form.
1. /results: The /results folder should have an R Markdown document called process_notebook
(include both the .Rmd and .html documents) and possibly several supporting .R scripts for
helper functions you wrote.
If you write helper functions (recommended) you should include them in separate .R
scripts. The .Rmd document should assume the working directory is the n_analysis folder
and should load the data accordingly (e.g. read_csv("data/my_cool_dataset.csv")). We may
knit the process_notebook.Rmd and it should run!
The process_notebook should be mostly a matter of copy/pasting your analysis into a .Rmd
document then adding discussion (discussion should be in text, not in comments).
2. /data: Put the data sets you used in this folder.
If you started with a messy data set and did significant processing then you should include
both the raw and the cleaned data sets in separate sub-folders i.e. /data/raw/ and
/data/clean/.
3. /everything_else: You probably did a lot of stuff that didn’t make it into your final analysis.
Include anything you did that you want to get credit for in this folder. If you have a lot of
material in here that you want us to look at then you should include a text document in
this folder pointing us to what you want us to look at.
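The working-directory convention in item 1 can be sketched as follows. Everything here is illustrative: the group number 7 and the toy data are made up, the layout is built under tempdir() so the example is self-contained, and base R's read.csv stands in for readr's read_csv so no extra package is needed.

```r
# Build a throwaway copy of the layout so the sketch is self-contained.
root <- file.path(tempdir(), "7_analysis")  # "7" is a hypothetical group number
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3, y = c(2.5, 4.1, 6.2)),
          file.path(root, "data", "my_cool_dataset.csv"), row.names = FALSE)

# Inside process_notebook.Rmd, write paths relative to n_analysis,
# never as absolute paths that exist only on your own machine.
old_wd <- setwd(root)  # stand-in for knitting from the n_analysis folder
dat <- read.csv(file.path("data", "my_cool_dataset.csv"))
setwd(old_wd)          # restore the previous working directory

str(dat)
```

One thing to watch for: by default knitr evaluates chunks in the directory that contains the .Rmd file, which here would be /results rather than n_analysis. The root.dir option of knitr::opts_knit is one way to point it at the project folder; whether the graders rely on that is an assumption you should confirm.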
Grading
The analysis is 50% of the final project grade. The main criteria it is graded on will be: accuracy,
ambition, and communication.
Accuracy (70%)
Did you do a correct statistical analysis?
Does your code run?
How well do your findings support your conclusions? “The evidence is inconclusive” is
a very possible, and completely acceptable answer.
Ambition (20%)
Did you choose the appropriate complexity and level of difficulty of your project?