Date: 2023-04-09
Final Project Report
Final Data Analysis Report
Due: April 12, 2023 by 11:59PM ET on Quercus
No late submissions will be accepted
Goal of the Assessment:
Part 2 of the Final Project is your opportunity to demonstrate all that you
have learned throughout the course. This will be done by showing the
teaching team that you can use the methods and techniques learned in the
course appropriately. You can use the feedback that you have received in
Part 1 to write a report that is in a common research paper format (IMRD:
Introduction, Methods, Results, Discussion). Writing these kinds of reports is
likely something that, as a graduate student or a statistician working in
industry, you will find yourself doing occasionally.
Since this assignment is used to assess how familiar you are with the use of
the tools and methods from this course only, you should NOT use materials
that were not covered in this course. Instead, focus on showing us how much
you know about everything we have discussed throughout the term.
The finished report can also be used as part of a dossier when applying for
jobs to showcase your abilities as a statistician and data analyst.
General Instructions:
Using only methods and techniques presented in the lecture slides
throughout the term, you are tasked with answering your proposed research
question by creating the ‘best’ linear regression model that meets the
requirements of your research question. You will then need to write a report
(details below) that (i) introduces your research question and presents some
background, (ii) outlines the steps in your analysis that you followed to reach
the ‘best’ model, (iii) presents the results of your analysis and describes and
justifies the decisions you made, and finally (iv) discusses the final model, its
interpretation and its limitations in terms of its ability to meet your research
goals. It should be made clear whether you are aiming for a model that
makes good predictions, or a model that is more descriptive and easier to
interpret, or some combination of both.
The work you have put into Part 1 of the final project should help you
structure your report in a professional and easy-to-read fashion, as well as
provide you with a good beginning to your introduction section. You may
want to consider adding some additional background research or more
discussion about how your research question is important and different from
the background you present. The EDA portion of Part 1 should be helpful in
writing the beginning of the results section, where you display the
characteristics of the data you will use to answer your question.
How to present your final report:
Once you have decided upon the ‘best’ model to fulfill the goal of the
project, you must write up a short scientific report. There should be 4 main
sections of your report:
• Introduction section: where you introduce the purpose
and relevance/importance of the project and provide some
relevant background information on the topic (no results or data
should be presented here).
• Methods section: where you describe and explain the
methods, tools and techniques used to arrive at your final model
(no results or data should be presented here, but you can tell us
where you found your data and what variables it contains).
• Results section: where you present a numerical/graphical
description of your study sample and important results that led
you to make crucial decisions in building your model (following
the methods you outline in the earlier section), followed by the
final model and any other important results.
• Discussion section: where you interpret your final model
and describe why it answers the research question and why it is
important, as well as discuss any limitations that still exist based
on your results.
You may use tables and plots to help present your results, but they must be
relevant and well-thought-out to convey as much information as possible
without being too overwhelming or confusing. When explaining your
methods, do not just state that you used a specific method; also explain
how it is used to achieve a specific task. When presenting
your results, avoid repeating exactly what you wrote in your methods
section. Instead, focus on the results of the process you described earlier,
and use numerical values/graphical results to support the decisions you
made in arriving at your final model. See the rubric for more information
regarding the various report components.
If you want more information about how to structure your report and what
should be contained in each section, see this cheat sheet and this outline for
reports (you may ignore the abstract portion since you do not need one).
Note that not all the elements in these resources need to be included in your
report. But you can use these to better understand how to structure your
submission.
Finally, if you use any external resources outside of the lecture slides, e.g. to
provide background on your topic, you should include a reference section at
the end of your report. You may follow APA citation styles to help format
your references. For some resources on how to cite, see the library page on
citations.
What to do if you want to change your dataset or research question:
If you wish to change your dataset or research question from what was
originally proposed in Part 1, you are allowed to do so. However, you will
need to provide a written statement that proposes the change you wish to
make. In order to change your dataset or research question, you will need to
submit a 1-page document (to be submitted by April 6 at 11:59PM ET on
Quercus) that answers the following two questions:
1. Why are you changing your topic or dataset? Elaborate on
what made your original dataset or topic not appropriate for the
final project.
2. What makes your new topic and/or dataset more
appropriate than the previous one? Be sure to clearly state your
new research question and provide a short, written description of
where you located your dataset and what information it contains.
The instructor will then approve or provide suggestions to improve your new
dataset/research question.
Technical Requirements of the Final Report:
Your report should be typed using whatever software you prefer but must be
saved and submitted as a PDF or .docx file on Quercus. Your report must
meet the following requirements:
• Font: 12-point font in a style similar to Times New Roman
(this is the default in R Markdown)
• Spacing: single-spaced
• Word count: up to a maximum of 1200 words in total. This
does not include captions on figures and tables; however, captions
should not be excessively long or contain information that isn’t
mentioned in the main text. We will still accept a report that
exceeds the word limit by no more than 200 words.
• Number of tables/figures in the main report: 5 in total,
but you may use any combination of tables and figures
• Figure and table captions: all figures and tables included
should include a caption that describes what is being presented
(caption not included in the word count).
• Captions should not contain information that is not
also discussed in the main report
• Figure properties:
• All plots should have an appropriate title and axis
labels, avoiding the use of variable names as they appear
in the dataset
• A figure may include multiple individual plots but
they should be related to each other and make sense as to
why they are being presented together
• Avoid having too many plots in the same
figure to ensure that they are legible and clear.
• Reference list or bibliography at the end of the report (will
not count towards word count), using appropriate citation style
• Appendix: you may add an appendix at the end of your
report to include some additional tables or figures that were not
important enough to be part of the main report, but still relevant
to your analysis:
• up to 3 additional tables/figures but they should
only be included if they are relevant to the analysis and
are referred to in the main text.
• R code: In a separate file (i.e., an RMD file), you should upload
your cleaned and complete version of the R code that was used to
conduct your analysis. The R code should be well-organized and
commented appropriately to indicate what each line/section of
code is doing.
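As a rough illustration of the figure and code requirements above, the following base R sketch places two related plots in one figure, gives each a title and readable axis labels instead of raw variable names, and comments each section. The built-in `mtcars` data and the label wording are assumptions chosen only for illustration; substitute your own dataset and labels.

```r
# Illustrative sketch only: the built-in mtcars data stands in for your dataset.

# Put two related plots side by side in a single figure.
old_par <- par(mfrow = c(1, 2))

# Plot 1: response against a predictor, with readable labels
# instead of the raw variable names (mpg, wt).
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel Efficiency vs. Weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Fuel efficiency (miles per gallon)")

# Plot 2: distribution of the response.
hist(mtcars$mpg,
     main = "Distribution of Fuel Efficiency",
     xlab = "Fuel efficiency (miles per gallon)",
     ylab = "Number of cars")

# Restore the previous graphics settings.
par(old_par)
```

Keeping the number of panels small, as here, helps each plot stay legible once the figure is placed in the report.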
Checklist for submitting final project report:
1. Your final written report which follows the requirements
above.
2. Your R code that shows your complete analysis (this will be
used to verify the results displayed in your written report and will
not be assessed for content).
Things to keep in mind while writing your final report:
• You do not need to write out the results of every step you
took in your analysis as this will make your report too long.
• Instead, focus on summarizing the most important
results, especially where a big decision was made. You
need to justify any big decisions.
• For the rest of your results, very short mentions of
the process with a brief piece of evidence provided are
enough to allow your reader to follow your analysis and
understand how you arrived at the final model.
• Rather than presenting the results of each step separately
(e.g., creating separate tables for each), consider putting together
one larger table that you can refer to in your discussion of many
steps in your analysis so that you don’t use too much space.
• For example, if you are selecting between a few
different models, you could consider presenting a table
that includes many different summaries of the fit of each
model and refer to each part as needed in the text, instead
of making individual tables for each component.
• Avoid using R output taken directly from R/RStudio.
Instead create your own tables where you select only the relevant
pieces of the output to display.
• Generally, the methods and results sections tend to be the
longest sections, while the introduction and discussion tend to be
shorter.
• Keep this in mind when deciding how much
background to provide in your introduction. Often just a
paragraph or two is plenty, given the word limits in this
project.
• However, make sure you leave yourself enough
space for a solid discussion where you can discuss the
impact of the limitations that may exist in your model.
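As one possible illustration of the advice above about using a single larger table and avoiding raw R output, the following R sketch fits a few candidate models and collects selected fit summaries into one data frame. The built-in `mtcars` data, the three candidate models, the helper name `summarise_fit`, and the particular summaries shown (adjusted R-squared, AIC, BIC) are all assumptions made for illustration; use your own data and only the summaries covered in lecture.

```r
# Illustrative sketch only: mtcars and these candidate models are placeholders,
# not part of the assignment.
candidates <- list(
  "Weight only"                        = lm(mpg ~ wt, data = mtcars),
  "Weight + horsepower"                = lm(mpg ~ wt + hp, data = mtcars),
  "Weight + horsepower + displacement" = lm(mpg ~ wt + hp + disp, data = mtcars)
)

# Hypothetical helper: keep only the pieces of the output worth reporting.
summarise_fit <- function(fit) {
  s <- summary(fit)
  data.frame(
    Predictors = length(coef(fit)) - 1,  # number of slopes (intercept excluded)
    Adj_R2     = round(s$adj.r.squared, 3),
    AIC        = round(AIC(fit), 1),
    BIC        = round(BIC(fit), 1)
  )
}

# Stack one row per model into a single comparison table.
comparison <- do.call(rbind, lapply(candidates, summarise_fit))
comparison <- data.frame(Model = names(candidates), comparison, row.names = NULL)
print(comparison)
```

The resulting data frame can then be formatted as a report table, for example with `knitr::kable(comparison)` in R Markdown, and referred to at several points in the text rather than building a separate table for each step.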