Date: 2024-05-22
THE AUSTRALIAN NATIONAL UNIVERSITY
RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS
STAT8130/4030
Generalised Linear Modelling
Final Project: Due at 3pm, Tuesday, 11/06/2024
1. This final project is worth 60% of your final grade and is compulsory.
2. Maximum marks: 60.
3. Project reports can only be submitted via Turnitin on Wattle.
4. Please sign the declaration form on Wattle and include it as the cover page
of your submission. Check the quality of the file when preparing your submission
so that it is clearly legible.
5. File size limit for Turnitin submission: 50MB.
6. Several trial submissions are strongly recommended before the due date. If
you encounter any problem during a trial, please report it by email before the due
date. Late submissions will not be accepted and will receive a mark of 0.
Please be ready to submit your report at least 30 minutes before the deadline.
7. While you may use course material, computer software, the internet, or other
resources, you must complete the final project individually. Identical text in two
submissions, even a single sentence, is treated as cheating.
8. You can use any result, formula or statement from the course material without
proof. In fact, doing this will help your project.
DATA
This individual final project asks you to apply the material in this course to analyse
one or two real-world datasets of your own choosing. Several broad
types of real data are available without charge. For example:
• COVID19 data for Australia (https://www.covid19data.com.au/data-notes);
• Australian government data (https://data.gov.au);
• New South Wales government data (https://data.nsw.gov.au);
• Victoria government data (https://www.data.vic.gov.au);
• ACT government data (https://www.data.act.gov.au/browse);
• Australian government statistics and datasets (e.g., https://www.abs.gov.au,
https://www.abs.gov.au/statistics/microdata-tablebuilder);
• Australian central bank datasets (http://www.rba.gov.au);
• US Census data (https://www.census.gov/data.html);
• Federal Reserve Economic Data (FRED) (https://fred.stlouisfed.org/);
• Country level data sets provided by the National Bureau of Economic Research
(http://www.nber.org/data/);
• World Bank datasets (https://www.doingbusiness.org/en/data);
• United Nations (international) demographics data (http://data.un.org/); and
• ANU library (https://anulib.anu.edu.au/find-access/e-resources-databases),
e.g., DatAnalysis Premium, Factiva (global news database), Connect4.
You may also consider some well-established datasets, for instance, datasets
from academics (via their university websites), e.g., Ken French
(http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html), and datasets from
https://archive.ics.uci.edu. However, since these datasets usually come from
refereed journal articles or academic books, if you choose to use one or two of
them, please clearly indicate the difference, novelty or
improvement in your analysis and report compared with the existing literature. You may
also use data from your industry experience, if you have any (in that case you need to
submit only the report, not the data). Note that the one or two datasets that you choose
cannot be the same as any dataset used in the lectures, tutorials, assignments or
other assessments of this course or of your other courses.
Based on the one or two datasets that you choose, you need to consider at least
two types of models to fit your data from the list below (each bullet point can only
count as one type):
• Linear Mixed Effects Model;
• Binary Regression/Binomial Logistic Regression;
• Poisson Log-Linear Regression/Log-linear Regression with Extra-Poisson Variation;
• Multicategory Logistic Regression; and
• Gamma/Exponential Generalised Linear Model.
When fitting the two types of models that you choose, you
may use a single dataset and select two different response variables,
one for each type of model; or you
may use two datasets, one for each type of model.
Note that the choice of response variables needs to be meaningful
and useful in real practice. If you include the fitting results for only one type
of model in your report, 30 marks will be deducted from this final
project.
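As a rough illustration of the single-dataset option above, the following minimal sketch fits two of the listed model types to two different response variables in one dataset. It assumes R's built-in mtcars data purely as a stand-in (it is not a suggestion of a suitable project dataset), and the choice of response and explanatory variables is hypothetical.

```r
# Stand-in data: R's built-in mtcars (an assumption for illustration only).
cars <- mtcars

# Model type 1: binary logistic regression.
# Response: am (0 = automatic, 1 = manual); explanatory variable: wt (weight).
fit_binary <- glm(am ~ wt, family = binomial(link = "logit"), data = cars)

# Model type 2: Gamma GLM with a log link.
# Response: mpg (positive and continuous); explanatory variables: wt and hp.
fit_gamma <- glm(mpg ~ wt + hp, family = Gamma(link = "log"), data = cars)

# Coefficient tables for each fit.
summary(fit_binary)$coefficients
summary(fit_gamma)$coefficients

# Exponentiated coefficients: odds ratios for the logistic model,
# multiplicative effects on the mean for the Gamma model with log link.
exp(coef(fit_binary))
exp(coef(fit_gamma))
```

The log link for the Gamma model is one possible choice made here for ease of interpretation; other links may suit other data better.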
REPORT (60 marks)
Report Format – PDF or Word Upload
Written reports for this project must be submitted via Turnitin. The limits are
10 pages for the main manuscript and 20 pages for the appendix, in the format
below, and all R code must be placed in the appendix.
A Turnitin similarity check will be conducted on all submitted reports.
Please use Australian English spelling. All pages (uploaded in PDF or Word form)
must be formatted as follows:
• Black type, or occasional coloured type for highlighting purposes;
• Single column;
• White A4 size paper with a margin of at least 0.5 cm on each side, top and bottom;
• Text must be 12 point Times New Roman, or an equivalent size, before
converting to PDF format and must be legible to assessors;
• Only the references and appendices may be in 10 point Times New Roman or equivalent.
Report Guideline
In this project, you need to submit a report that analyses one or two datasets and
considers at least two types of models to fit your data; see the “DATA”
section above for details. Your final project report should be structured as follows.
Main Manuscript (10 pages maximum)
Please make your main manuscript precise and concise because of the
10-page limit!
1. INTRODUCTION
State the objectives of the project and provide adequate background on the data.
This section may also include: variable descriptions, where your data come from,
which scientific question(s) your analysis in the following sections can answer,
what contributions the real data analyses in the following sections may
make (to the literature or to real practice), etc.
2. DATA CHARACTERISTICS
This section may include: exploratory data analysis (EDA), descriptive statistics,
etc. However, your analysis in this section should have a clear connection
to, or provide a motivation for, the following model fitting section. Otherwise, you
may consider selecting only the important results to report and shortening
this section because of the 10-page limit for the main manuscript. Results
should be clear and concise.
This is an official report. Please do not paste any R code or R output
(from the R console) into the main manuscript, as this is
unprofessional. If you want to report a table, please use a Word table rather
than pasted R output (from the R console)! If R code or R output (from
the R console) appears in the main manuscript, 10 marks will be deducted
from this final project. The same rules apply to the following
sections of the main manuscript.
You may have many tables and figures to report; however, because of the
10-page limit, please report only the most important ones in the main
manuscript. Other useful tables and figures mentioned in the main manuscript
may be relegated to the appendix. You may also need to adjust table and
figure sizes to satisfy the page limit. If you think
some tables or figures are not useful, please do not include them in the report
at all (either in the main manuscript or in the appendix). The same rules
apply to the following sections of the main manuscript.
You may summarise your results in tables and figures; however, please
analyse your results in your own words rather than simply stacking results in the
main manuscript, since it is an official report. The same rules apply to
the following sections of the main manuscript.
3. MODEL FITTING AND INTERPRETATION
Based on the one or two datasets that you choose, you need to consider at least
two types of models to fit your data from the list below (each bullet point can only
count as one type):
• Linear Mixed Effects Model;
• Binary Regression/Binomial Logistic Regression;
• Poisson Log-Linear Regression/Log-linear Regression with Extra-Poisson Variation;
• Multicategory Logistic Regression; and
• Gamma/Exponential Generalised Linear Model.
When fitting the two types of models that you choose, you
may use a single dataset and select two different response variables,
one for each type of model; or you
may use two datasets, one for each type of model.
Note that the choice of response variables needs to be meaningful
and useful in real practice. If you include the fitting results for only one type
of model in your report, 30 marks will be deducted from this final
project.
For each type of model, you can use all the other available variables in the
dataset (that is, all except the response variable) as explanatory variables, or you can use
a subset of these variables, interaction terms, or transformations of
these variables in your model fitting, but you need to indicate clearly
which variables and interaction terms you use in the model. Please then explain
why you chose all variables, a subset of variables, variables plus some interaction
terms, or transformations of variables.
For each type of model, please select at least one fitted model to report.
Please explain how you obtained this model. In addition, please explain why you selected
this model to report. The reasons may include one or more of the following:
(i) The reported model passes the model diagnostics but the other models may
not;
(ii) The AIC/BIC of the reported model is the smallest;
(iii) Some hypothesis testing shows that the reported model can be better than
the other models;
(iv) Some variables included in the reported model are important in the
interpretation of real practice, and hence cannot be eliminated;
(v) The interpretations of the interaction/square/cubic/polynomial terms in the
reported model may have some special meanings in real practice;
(vi) and more ...
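Reasons (ii) and (iii) above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not a prescribed workflow: it uses R's built-in warpbreaks data purely as a stand-in, and the two candidate Poisson models (m_main and m_int) are hypothetical names chosen for the example.

```r
# Stand-in data: R's built-in warpbreaks (an assumption for illustration only).
# Response: breaks (a count); explanatory variables: wool and tension (factors).
m_main <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
m_int  <- glm(breaks ~ wool * tension, family = poisson, data = warpbreaks)

# Reason (ii): compare information criteria; a smaller value is preferred.
aic_table <- AIC(m_main, m_int)
bic_table <- BIC(m_main, m_int)
aic_table
bic_table

# Reason (iii): likelihood-ratio (deviance) test for the nested pair.
# A small p-value suggests the interaction terms improve the fit.
lrt <- anova(m_main, m_int, test = "Chisq")
lrt
```

The likelihood-ratio test applies here because m_main is nested in m_int; for non-nested candidates, AIC or BIC is the more natural comparison.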
Please also interpret and discuss the model fitting outcome for the fitted models
that you select to report. For example, you could report which
variables are statistically significant and discuss whether this significance can be
explained by the background of the data. You can also consider other discussions and
interpretations as long as they are practically useful given the data background.
Beyond the compulsory parts above, you may also report additional model
fitting and interpretations as long as they are useful in real practice and are within
the page limit.
4. LIMITATION
Please clearly discuss the possible limitations of the fitted models that you select
to report, e.g., model diagnostics, problems in real practice, etc. If you cannot identify
any, please explain why the fitted models that you select to report are
perfect.
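One diagnostic-based limitation that can be checked directly is overdispersion in a Poisson model. The sketch below is a minimal example under stated assumptions: it again uses R's built-in warpbreaks data as a stand-in, and refitting with the quasipoisson family is shown as one common way to allow for extra-Poisson variation, not necessarily the approach taken in this course.

```r
# Stand-in data: R's built-in warpbreaks (an assumption for illustration only).
fit_pois <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Pearson chi-square statistic divided by the residual degrees of freedom.
# Values well above 1 point to extra-Poisson variation (overdispersion).
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
dispersion

# Refit allowing the variance to be a multiple of the mean.
# Coefficient estimates are unchanged; standard errors are rescaled.
fit_quasi <- update(fit_pois, family = quasipoisson)
summary(fit_quasi)$dispersion
summary(fit_quasi)$coefficients
```

If the estimated dispersion is far from 1, the standard errors from the plain Poisson fit are likely to be misleading, which is worth stating as a limitation.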
5. CONCLUSION
Please give a short paragraph to summarise your findings of this project.
Appendix (20 pages maximum)
6. APPENDIX
This section should include: R code (NOT R output!). This section may include:
additional important tables/figures which are mentioned in the main manuscript, etc.
7. REFERENCE
Please ensure that every reference (if you have any) cited in the text of the main
manuscript also appears in the reference list in the appendix (and vice versa).
Report Rubric
Your score for the final project report will be calculated using this rubric. Each of
the criteria below is marked out of 12. The following table indicates
the meaning of each mark range:
Marks        Rating     Meaning
1-2 marks    Very bad   Minimal or no effort.
3-4 marks    Poor       Needs improvement.
5-6 marks    Fair       OK, but with many problems.
7-8 marks    Good       OK, but several major problems.
9-10 marks   Very good  Only minor problems.
11-12 marks  Excellent  No problems.
1. The project is comprehensive and complete.
e.g. Have the required steps in the instructions been followed? Has every aspect of
the project been thought through thoroughly and explained? Is the workflow complete,
with a clear logic?
1-2 marks 3-4 marks 5-6 marks 7-8 marks 9-10 marks 11-12 marks
2. The report is well-written.
e.g. Is the analysis cogent? Is the report concise and precise? Is the summary
of results accurate and neat? Do grammatical mistakes affect the understanding of
this report? Is the report well organised? Are the transitions in the report smooth
enough to show the logic of the analysis?
1-2 marks 3-4 marks 5-6 marks 7-8 marks 9-10 marks 11-12 marks
3. The analysis is correct.
e.g. Is the content statistically correct? Are technical terms used properly? Is the
analysis consistent with the principles discussed in class? Is the methodology proper
for the data? Is the method relevant to this course?
1-2 marks 3-4 marks 5-6 marks 7-8 marks 9-10 marks 11-12 marks
4. The interpretation is insightful.
e.g. Do you bring insight to the conclusions reached? Is the analysis accurately
grounded in the background of the data? Does the report show a good
interpretation of the output?
1-2 marks 3-4 marks 5-6 marks 7-8 marks 9-10 marks 11-12 marks
5. Overall impression.
e.g. Does the report make a contribution to real practice? Is the interpretation
of the statistical analysis useful in reality? Does the analysis address some scientific
questions of interest? Does the report show effort?
1-2 marks 3-4 marks 5-6 marks 7-8 marks 9-10 marks 11-12 marks