MATH5945: Categorical Data Analysis
Term 3, 2023
Assignment 2
Submission deadline: Thursday 26 October, 2:05pm
Deliverables: 2 files uploaded to Moodle: (1) PDF file of your worked solutions, and (2)
SAS file for ALL computations. File names should be surname firstname z123456789 ASS2.
Assignment length: There is a 5 page limit and minimum 12pt font size. Any pages
exceeding this limit or submissions with smaller font sizes will not be marked. Handwritten
assignments will not be accepted. This does not include a SAS file of your code. Your
document should begin with the Plagiarism Statement below (copy-and-paste it).
SAS code: All computations must be performed using SAS. Your SAS code must run as
is and I should not need to modify your code in any way to make it work. You may create
a library to import data, but any other code should only use the WORK library (you may
assume data files of the same name are in my WORK library). SAS should be used for
computing only and answers given only within SAS code will not be marked.
Penalties: Failure to adhere to instructions will result in a minimum 5% mark reduction.
Name: Student Number:
I declare that this assessment item is my own work, except where acknowledged,
and has not been submitted for academic credit elsewhere, and acknowledge that
the assessor of this item may, for the purpose of assessing this item:
• Reproduce this assessment item and provide a copy to another member of the
University; and/or,
• Communicate a copy of this assessment item to a plagiarism checking service
(which may then retain a copy of the assessment item on its database for the
purpose of future plagiarism checking).
I certify that I have read and understood the University Rules in respect of Student
Academic Misconduct.
Signed: Date:
1. Suppose that you and a friend have a discussion about movies you like. You start by
naming a movie you like and, if your friend has seen that movie, your friend can either
agree with you or not. You take turns going first until you have rated 25 movies in
total. You repeat this process but instead of naming movies you like, you both name
movies you dislike. In total, you should have jointly rated n = 50 movies and this data
can be organised in the 2 × 2 table shown below.

                    You
Friend      Like    Dislike    Total
Like        a       b          n1
Dislike     c       d          n2
Total       m1      m2         n

For this scenario, we are interested in how much you and your friend agree about
movies and to test, on average, whether you agree about movies. Note that this is
fundamentally different than the usual null hypothesis significance testing where the
goal would be to reject H0. Instead, this problem will take you through the steps for
developing a measure of agreement often called Cohen’s κ.
(a) First, using the below representation of the 2 × 2 table probability distribution,

                    You
Friend      Like    Dislike    Total
Like        π11     π12        π1+
Dislike     π21     π22        π2+
Total       π+1     π+2        π++

determine πa which is the probability you and your friend agree. This is also
known as accuracy or the observed proportionate agreement.
(b) Note that there will be some amount of agreement, i.e., πa > 0, even if the two
ratings are independent. So, determine πe the expected probability of agreement
under an assumption of independence.
(c) The idea behind the κ statistic

    κ = (πa − πe) / (1 − πe)

is to measure agreement while accounting for any chance agreement. Suggest
estimators for πa and πe and, therefore, κ.
(d) Derive the variance for the estimator κ̂ assuming π̂e is a constant, i.e., only π̂a
has variability, and determine its asymptotic distribution.
(e) With a friend, family member or classmate, collect data on 50 movies as described
above. Input this data into SAS and compute the κ statistic and 95% confidence
interval. Do you think there is evidence that you agree or disagree about movies?
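
As an illustration of how the pieces in parts (a) to (d) fit together numerically, the sketch below computes κ from the four cell counts of a 2 × 2 table. It is written in Python purely for illustration (the assignment itself requires SAS), the function name `cohen_kappa` and the cell counts are hypothetical, and the standard error uses the simplification from part (d) that the chance-agreement estimate is treated as a constant. It is a minimal sketch under those assumptions, not a model answer.

```python
import math


def cohen_kappa(a, b, c, d, z=1.96):
    """Cohen's kappa for the 2x2 table [[a, b], [c, d]] with a Wald interval.

    The standard error treats the chance-agreement estimate as a fixed
    constant, so only the observed agreement contributes variability.
    """
    n = a + b + c + d
    p_a = (a + d) / n                    # observed agreement: diagonal cells
    n1, n2 = a + b, c + d                # row totals
    m1, m2 = a + c, b + d                # column totals
    p_e = (n1 * m1 + n2 * m2) / n ** 2   # agreement expected under independence
    kappa = (p_a - p_e) / (1 - p_e)
    se = math.sqrt(p_a * (1 - p_a) / n) / (1 - p_e)
    return kappa, (kappa - z * se, kappa + z * se)


# Hypothetical counts for illustration only (not collected data)
kappa, (lower, upper) = cohen_kappa(a=20, b=5, c=10, d=15)
print(f"kappa = {kappa:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The default `z=1.96` corresponds to a two-sided 95% interval based on a normal approximation; a different confidence level only requires a different quantile.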
2. In Lecture 5, we demonstrated that for independent Poisson sampling for an r × c
contingency table, the maximum likelihood estimate was µ̂ij = xij or, equivalently,
π̂ij = xij/n. When constrained by a null hypothesis of independence, the MLE is
µ̂ij = xi+ x+j / x++ or, equivalently, π̂ij = π̂i+ π̂+j.
(a) Derive the unrestricted MLEs for πij under multinomial sampling.
(b) Using your results from part (a), deduce that the generalised likelihood ratio
statistics G² for Poisson and multinomial sampling schemes are identical.
(Some hints are given in the lecture notes.)
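
The claim in part (b) can be checked numerically before it is derived. The sketch below, in Python for illustration only (the assignment requires SAS), evaluates the likelihood ratio statistic for independence twice on a small made-up table: once from the full Poisson deviance with fitted means xi+ x+j / x++, and once from the multinomial log-likelihood ratio with proportions. The function names and the table are hypothetical, and a numerical agreement is of course not a substitute for the derivation.

```python
import math


def margins(table):
    """Row totals, column totals and grand total of a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)


def g2_poisson(table):
    """Poisson deviance of the independence fit mu_ij = x_i+ * x_+j / x_++."""
    rows, cols, n = margins(table)
    total = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            mu = rows[i] * cols[j] / n
            log_term = x * math.log(x / mu) if x > 0 else 0.0  # 0*log(0) = 0
            total += log_term - (x - mu)
    return 2 * total


def g2_multinomial(table):
    """Multinomial likelihood ratio: unrestricted versus independence."""
    rows, cols, n = margins(table)
    total = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            if x > 0:
                p_unrestricted = x / n
                p_independence = rows[i] * cols[j] / n ** 2
                total += x * math.log(p_unrestricted / p_independence)
    return 2 * total


# Hypothetical 2 x 3 table for illustration only
example = [[12, 5, 7], [3, 9, 14]]
print(g2_poisson(example), g2_multinomial(example))
```

The two printed values should agree up to floating-point error, because the fitted means reproduce the grand total and the extra Poisson term sums to zero.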
3. The following is a five-way table summarising a survey of alcohol, cigarette and
marijuana use among high school seniors, by gender and race.

                      Marijuana Use
                      White                       Other
Alcohol  Cigarette    Female        Male          Female        Male
Use      Use          Yes    No     Yes    No     Yes    No     Yes    No
Yes      Yes          405    268    453    228    23     23     30     19
         No           13     214    28     201    2      19     1      18
No       Yes          1      17     1      17     0      1      1      8
         No           1      117    1      133    0      12     0      17

We would like to fit log-linear models to the data in this table. For ease of interpreta-
tion, denote these variables as A (alcohol), C (cigarette), M (marijuana), G (gender),
and R (race), respectively, in model shorthand and use numbers 1, 2 to identify the
levels of a variable. For example, τA1 represents past alcohol use.
(a) Create a SAS data file for the data in this table. As a suggestion, you can use a
combination of formats and do-loops to more efficiently read in the data.
(b) Check the goodness of fit of the following hierarchical models:
• (M1) main effects only
• (M2) all two-way interaction terms
• (M3) all three-way interaction terms
What is the lowest order model that reasonably fits the data? Give reasons.
(c) What is the error message if you fit the four- and five-way interaction terms?
What may be causing this?
(d) Based on the model chosen in part (b), perform forward selection using partitioned
G² statistics to choose a best model. Justify your steps in a manner similar to the
lecture notes.
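
The loop-based data entry suggested in part (a) amounts to pairing each count with its five factor levels, giving one record per cell. The sketch below shows that layout in Python for illustration only (the assignment requires SAS, where nested do-loops in a data step play the same role). The variable names `ROWS`, `COLUMNS` and `records` are hypothetical, and the counts are copied as they appear in the table above.

```python
from itertools import product

# Counts copied row by row from the table. Each row is one (alcohol, cigarette)
# combination; within a row the column order is
# race (White, Other) > gender (Female, Male) > marijuana (Yes, No).
ROWS = {
    ("Yes", "Yes"): [405, 268, 453, 228, 23, 23, 30, 19],
    ("Yes", "No"): [13, 214, 28, 201, 2, 19, 1, 18],
    ("No", "Yes"): [1, 17, 1, 17, 0, 1, 1, 8],
    ("No", "No"): [1, 117, 1, 133, 0, 12, 0, 17],
}

# product() varies the last factor fastest, matching the column order above
COLUMNS = list(product(("White", "Other"), ("Female", "Male"), ("Yes", "No")))

records = []
for (alcohol, cigarette), counts in ROWS.items():
    for (race, gender, marijuana), count in zip(COLUMNS, counts):
        records.append(
            {"A": alcohol, "C": cigarette, "M": marijuana,
             "G": gender, "R": race, "count": count}
        )

total = sum(r["count"] for r in records)
print(len(records), "cells;", total, "respondents")
```

Checking the number of cells and the grand total against the printed table is a quick way to catch a misordered loop before any model is fitted.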