Date: 2023-04-19
STATS 762 Learning Objectives for Sets 1–6
This document lists everything a student should be able to do after the first half of the course. Each question in a mid-semester test or exam should relate directly to one or more of these objectives. These objectives were written by Ben and may not apply when others teach the course.
Set 1
By the end of this handout, students should be able to:
• Write down the specification of a generalised linear model using equations, including in matrix notation.
• List the assumptions of a generalised linear model.
• Describe the following components of a generalised linear model:
– The response variable
– The response distribution
– The link function
– The explanatory variables
• Fit generalised linear models in R using lm() (for linear models, the special case with a normal response and identity link) and glm().
• For a linear model, calculate the following in R by directly coding up the required equations, and describe what we can infer about the population from each:
– The estimated coefficients, β̂, given a design matrix X and a response vector Y.
– The estimated variance of the errors.
– The variance-covariance matrix for β̂.
– The residuals.
– Confidence intervals for each coefficient in β̂.
– A test statistic and p-value for a hypothesised value of a coefficient in β̂.
– Point predictions for the response variable, given specified value(s) of the explanatory variable(s).
– Prediction intervals for a response, given specified value(s) of the explanatory variable(s).
– A test statistic and p-value for an added-variable F-test. (For generalised linear models, the equivalent test is the analysis of deviance.)
• Use standard R functions (e.g., lm(), summary()) to extract all of the above for a linear model.
• Use standard R functions to carry out the equivalent steps for a generalised linear model, again describing what we can infer about the population from each.
• Describe the procedure for calculating β̂ for a generalised linear model.
• Use the anova() function to carry out a series of hypothesis tests for a fitted model, and interpret the output.
• Define under- and overdispersion, and identify when we need to adjust our models for them.
• Describe the similarities and differences between standard generalised linear models and their quasilikelihood counterparts, including what changes and what stays the same when we switch from a standard model to a quasilikelihood model.
• Fit models with negative binomial responses using glm.nb() (from the MASS package) and interpret the output.
• Define offsets and identify when they are required in a generalised linear model.
• Fit a model with an offset, and interpret the output.
• Summarise the inference from a fitted model that can be used to answer the questions of interest.
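As an illustration of "directly coding up the required equations" for a linear model, the sketch below computes β̂ = (XᵀX)⁻¹XᵀY, the error-variance estimate σ̂² = RSS/(n − p), the variance-covariance matrix σ̂²(XᵀX)⁻¹, the residuals, standard errors and t statistics for H₀: β_j = 0, and a point prediction x₀ᵀβ̂. It is written in plain Python as an assumption (the course itself works in R), and the helpers `solve`, `inverse`, `fit_linear_model` and `predict` are hypothetical names for this sketch only. Confidence intervals, p-values and prediction intervals would additionally need t-distribution quantiles (e.g. R's `qt()` and `pt()`), which are omitted here.

```python
# Sketch only (assumption): a standard-library Python stand-in for the R calculations.

def solve(a, b):
    """Solve the square system a v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(a):
    """Invert a square matrix by solving against each column of the identity."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return transpose(cols)

def fit_linear_model(X, y):
    """Least-squares fit computed directly from the normal equations."""
    n, p = len(X), len(X[0])
    Xt = transpose(X)
    XtX_inv = inverse(matmul(Xt, X))
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    beta = [sum(a * b for a, b in zip(row, Xty)) for row in XtX_inv]
    fitted = [sum(b * xij for b, xij in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # unbiased estimate of the error variance
    cov_beta = [[sigma2 * v for v in row] for row in XtX_inv]
    se = [cov_beta[j][j] ** 0.5 for j in range(p)]
    t_stats = [b / s for b, s in zip(beta, se)]  # test statistics for H0: beta_j = 0
    return {"beta": beta, "sigma2": sigma2, "cov_beta": cov_beta,
            "resid": resid, "se": se, "t": t_stats}

def predict(beta, x0):
    """Point prediction x0' beta; x0 includes the leading 1 for the intercept."""
    return sum(b * x for b, x in zip(beta, x0))
```

For example, with an intercept column, x = 0, 1, 2, 3 and Y = (1, 3, 2, 5), this gives β̂ = (1.1, 1.1), σ̂² = 1.35, estimated variances 0.945 and 0.27 for the intercept and slope, and a point prediction of 5.5 at x = 4. In R, `solve(t(X) %*% X, t(X) %*% Y)` computes the same β̂.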
Set 2
By the end of this handout, students should be able to:
• Identify when conducting a bootstrap is helpful, and explain why.
• Describe how parametric and nonparametric bootstrapping work, and highlight the key differences between the two, including their relative strengths and weaknesses.
• Write R code to conduct bootstrapping for a generalised linear model.
• Use a bootstrap procedure to calculate standard errors and confidence intervals, and to carry out hypothesis tests, for parameters (or functions of parameters) of interest.
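The contrast between the two bootstrap schemes can be sketched as follows. This assumes a simple linear regression (a GLM with a normal response and identity link) so that each refit is a closed-form formula; for a genuine GLM each replicate would instead be refitted with something like glm() in R. The nonparametric version resamples whole cases, while the parametric version simulates new responses from the fitted model and so relies on that model being correct. Function names are hypothetical, and the percentile interval is only one of several bootstrap interval types.

```python
# Sketch only (assumption): simple linear regression stands in for a GLM so that
# each refit is a closed-form formula; the course would refit with glm() in R.
import random
import statistics

def fit_slr(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def nonparametric_boot(x, y, n_boot=2000, seed=1):
    """Resample (x, y) cases with replacement and refit; no distributional assumption."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # slope is undefined if every resampled x is equal
            continue
        slopes.append(fit_slr(xs, [y[i] for i in idx])[1])
    return slopes

def parametric_boot(x, y, n_boot=2000, seed=1):
    """Simulate responses from the fitted normal model and refit."""
    rng = random.Random(seed)
    b0, b1 = fit_slr(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sigma = (rss / (len(x) - 2)) ** 0.5
    slopes = []
    for _ in range(n_boot):
        y_star = [fi + rng.gauss(0.0, sigma) for fi in fitted]
        slopes.append(fit_slr(x, y_star)[1])
    return slopes

def summarise(slopes, level=0.95):
    """Bootstrap standard error and a simple percentile confidence interval."""
    s = sorted(slopes)
    alpha = (1 - level) / 2
    lo = s[int(alpha * (len(s) - 1))]
    hi = s[int((1 - alpha) * (len(s) - 1))]
    return statistics.stdev(s), (lo, hi)
```

Both functions return bootstrap replicates of the slope, so the same summary works for either scheme; for a function of parameters, the relevant quantity would be computed inside the loop in place of the slope.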
Set 3
By the end of this handout, students should be able to:
• Define and explain the following terms: outliers, high leverage points, influential points, multicollinearity.
• Identify when outliers, high leverage points, influential points, and multicollinearity exist in a data set using the diagnostic tools discussed in the lectures.
• Directly calculate leverage, different types of residuals, and Cook's distance for a linear model in R.
• Describe and discuss properties of the different types of residuals, and calculate residuals directly in R for a linear model.
• For a generalised linear model, directly calculate:
– Pearson residuals, given an observed response and a fitted value.
– Cook's distance, given Pearson residuals and the hat matrix.
– The change in deviance, given the deviance and Pearson residuals and the hat matrix.
• Create and interpret GAM plots to check for curvature in a regression surface.
• Use the deviance to test goodness-of-fit for a GLM, using either a chi-squared sampling distribution or a parametric bootstrap, as appropriate.
• Decide whether or not a fitted model is appropriate using the diagnostic techniques described in this lecture set, and explain why or why not.
• Describe the effect a violated assumption may have on any inference obtained from a model.
• Propose modifications to a model to fix any problems identified by a diagnostic technique.
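The direct calculations above can be sketched for the simplest case. This assumes a straight-line model y = b₀ + b₁x, for which the leverages have the closed form hᵢᵢ = 1/n + (xᵢ − x̄)²/Sₓₓ; a general design matrix would need the full hat matrix X(XᵀX)⁻¹Xᵀ. For the GLM part it assumes a Poisson response (variance function V(μ) = μ) for the Pearson residuals, and uses Dᵢ = r²_{P,i} hᵢᵢ / (φ p (1 − hᵢᵢ)²) for Cook's distance, which we believe matches the form used by R's cooks.distance() for glm objects; check it against the lecture notes. The change-in-deviance calculation is not sketched, and all function names are hypothetical.

```python
# Sketch only (assumption): straight-line model for the linear-model diagnostics,
# Poisson variance function for the GLM Pearson residuals.
import math

def slr_diagnostics(x, y):
    """Leverage, standardised residuals and Cook's distance for y = b0 + b1 * x."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in resid) / (n - p)
    # Diagonal of the hat matrix X(X'X)^{-1}X', in closed form for a straight line.
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # Standardised (internally studentised) residuals: e_i / (sigma_hat * sqrt(1 - h_ii)).
    std_resid = [e / math.sqrt(sigma2 * (1 - h)) for e, h in zip(resid, lev)]
    # Cook's distance: r_i^2 * h_ii / (p * (1 - h_ii)).
    cooks = [r ** 2 * h / (p * (1 - h)) for r, h in zip(std_resid, lev)]
    return lev, std_resid, cooks

def poisson_pearson_residuals(y, mu):
    """Pearson residuals (y - mu) / sqrt(V(mu)), with Poisson variance V(mu) = mu."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]

def glm_cooks_distance(pearson, hat, p, dispersion=1.0):
    """Cook's distance r_P^2 * h / (dispersion * p * (1 - h)^2) from Pearson residuals and hat values."""
    return [r ** 2 * h / (dispersion * p * (1 - h) ** 2) for r, h in zip(pearson, hat)]
```

With x = 0, 1, 2, 3 and y = 1, 3, 2, 5, the leverages are 0.7, 0.3, 0.3 and 0.7, summing to p = 2 as they must. Passing the raw residuals as "Pearson" residuals with the dispersion set to σ̂² makes the GLM formula reproduce the linear-model Cook's distances, which is a useful consistency check.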
Set 4
By the end of this handout, students should be able to:
• Directly calculate AIC and BIC, given a model's maximised log-likelihood, number of parameters, and sample size.
• Use AIC and BIC to assess the relative support for different models.
• Discuss similarities and differences between information criteria such as AIC and BIC, along with their strengths and weaknesses.
• Carry out model search strategies and interpret their output.
• Discuss strengths and weaknesses of different search strategies.
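The two criteria are AIC = −2ℓ̂ + 2k and BIC = −2ℓ̂ + k log n, where ℓ̂ is the maximised log-likelihood, k the number of estimated parameters and n the sample size. A minimal Python sketch follows (in R, AIC() and BIC() do this for fitted models); the two models and their log-likelihoods are made up for illustration. Note that k must count every estimated parameter: for an lm() fit, R's logLik() counts the error variance as well, so k = p + 1.

```python
# Sketch only (assumption): Python stand-in for the formulas; R provides AIC() and BIC().
import math

def aic(loglik, k):
    """AIC = -2 * maximised log-likelihood + 2 * number of estimated parameters."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """BIC = -2 * maximised log-likelihood + log(sample size) * number of parameters."""
    return -2.0 * loglik + k * math.log(n)

def differences(scores):
    """Each model's score minus the smallest; 0 marks the best-supported model."""
    best = min(scores.values())
    return {name: s - best for name, s in scores.items()}

# Hypothetical models: (maximised log-likelihood, number of parameters), n = 100.
models = {"small": (-120.3, 3), "large": (-118.9, 5)}
n = 100
print(differences({m: aic(ll, k) for m, (ll, k) in models.items()}))
print(differences({m: bic(ll, k, n) for m, (ll, k) in models.items()}))
```

For these hypothetical models the AIC differences are 0 and 1.2, while the BIC differences are 0 and about 6.4: both favour the smaller model, but BIC more strongly, because its per-parameter penalty log(100) ≈ 4.6 exceeds AIC's 2 (log n > 2 whenever n ≥ 8).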
Set 5
By the end of this handout, students should be able to:
• Describe descriptive models and causal models, and discuss their differences.
• Create a causal diagram from a description of the direct effects that are believed to exist amongst a set of variables.
• Using a causal diagram, identify
– which variables have direct effects on others,
– which variables have indirect effects on others,
– which variables are confounders when considering the effect of one variable on another,
– which variables are colliders when considering the effect of one variable on another,
– which pathways are confounding pathways when considering the effect of one variable on another, and
– which pathways are colliding pathways when considering the effect of one variable on another.
• Using a causal diagram, propose models that can estimate
– direct effects of explanatory variables on a response variable, and
– the total effect of a particular variable on a response variable.
• Define what an effect modifier is, and propose models that are appropriate when an effect of interest is affected by a modifier.
• Fit models to estimate direct effects and total effects, and interpret these estimated effects.
• Describe the impact of a missing variable on the inference obtained from a causal model.
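A small simulation can make the confounder and collider ideas concrete. The sketch below assumes two made-up causal diagrams with linear effects and standard normal noise: Z → X, Z → Y, X → Y, where Z is a confounder and the true effect of X on Y is 2; and X → C ← Y, where C is a collider and X has no effect on Y. Under these assumed coefficients, in large samples regressing Y on X alone gives about 3.5 in the first case, while adjusting for Z recovers about 2; in the second case regressing Y on X alone gives about 0, while adjusting for the collider C induces a spurious coefficient of about −0.5. The coefficients, seeds and sample size are arbitrary, and the function names are hypothetical.

```python
# Sketch only (assumption): made-up linear causal diagrams with standard normal noise.
import random

def slope(y, x):
    """Least-squares slope from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def coef_two(y, x1, x2):
    """Coefficients of x1 and x2 from regressing y on x1 and x2 (with an intercept)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # solve the 2x2 centred normal equations
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def confounder_demo(n=20000, seed=1):
    """Z -> X, Z -> Y, X -> Y; returns (unadjusted, Z-adjusted) estimates for X."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [2 * xi + 3 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return slope(y, x), coef_two(y, x, z)[0]

def collider_demo(n=20000, seed=2):
    """X -> C <- Y with no X -> Y effect; returns (unadjusted, C-adjusted) estimates for X."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]
    return slope(y, x), coef_two(y, x, c)[0]
```

Here X has no mediators, so its total and direct effects coincide; adjusting for a confounder removes bias, whereas adjusting for a collider creates it.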
Lab 1
Following this lab, students should be able to:
• Critique a model based on its description, identify when the fitted response distribution is potentially inappropriate, and propose a better alternative.
Lab 2
Following this lab, students should be able to:
• Identify when a model might be inappropriate by visually inspecting the data and comparing them with data simulated from the fitted model.
• Propose ways to improve a model when this approach shows it to be inappropriate.
Lab 3
Following this lab, and using their understanding from both Labs 2 and 3, students should be able to:
• Discuss strengths and weaknesses of parametric and nonparametric bootstrapping.
Lab 4
Following this lab, students should be able to:
• Discuss scenarios in which ChatGPT is a useful tool for applying the methods and techniques covered in this course, and scenarios in which it is unhelpful or misleading.
• Provide examples of queries that fall into the helpful and unhelpful/misleading categories.
Lab 5
Following this lab, students should be able to:
• Discuss the performance of information-theoretic criteria when presented with a model-selection problem.