Date: 2022-08-20
ASSESSMENT TASK 1 Quiz 2
Total Marks – 85 marks (15%)
Dataset Description
The data set being analysed comes from a study of 49 candidates for recruitment as emergency
health workers. Each recruit was given a psychological assessment of the following psychometric
traits (Temperament and Character Inventory traits, TCIs):

• Novelty seeking (NS) trait, denoted by NSTCI
• Harm avoidance (HA) trait, denoted by HATCI
• Reward dependence (RD) trait, denoted by RDTCI
• Persistence (P) trait, denoted by PTCI

The aim of the study was to establish which traits, and which combination of them, best identify the
optimal recruits for the job.
Question 1:
Use the study context described above and the PROC UNIVARIATE output below (its plots and
numerical values) to answer the following questions.
a) What are p and n in the context of this multivariate data set?
b) Write down the vector of sample means for the multivariate data (NS, HA, RD, P).
c) On which of the four traits (TCIs) do the individuals score highest? Justify your answer
using the numerical summary values shown in the output below.
d) On which of the four traits (TCIs) do the individuals score lowest? Justify your answer
using the numerical summary values shown in the output below.
e) For each variable (trait, TCI), identify the highest-ranked individual in the study and give
their attained value for that trait.
f) For each variable (trait, TCI), identify the lowest-ranked individual in the study and give
their attained value for that trait.
g) Which of the variables (TCIs), if any, are normally distributed? Justify your answer from the
plots and output provided, stating which plot you used and why.
h) Briefly explain the concept of normal quantiles and how they are used below.
i) Write down the vector of sample medians for the multivariate data (NS, HA, RD, P).
0.5+1.5+1+1+1.5+1.5+4+2+2 = 15 marks
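For parts a), b), h) and i), the quantities involved can be sketched in Python. This is a minimal sketch, assuming the data are held as one row per recruit in the order (NS, HA, RD, P); the four rows below are hypothetical and are not the study data. The normal quantiles use the plotting positions (i − 0.375)/(n + 0.25), which, as far as I know, is the convention used by the QQPLOT statement of PROC UNIVARIATE; other conventions such as (i − 0.5)/n give slightly different values.

```python
from statistics import NormalDist, mean, median

# HYPOTHETICAL TCI scores, one row per recruit: (NS, HA, RD, P).
# Illustrative values only; they are NOT the study data.
rows = [
    (2.1, 3.0, 3.4, 3.9),
    (2.8, 2.5, 3.7, 3.1),
    (3.3, 3.6, 2.9, 3.5),
    (2.5, 3.2, 3.1, 2.7),
]

n = len(rows)        # number of individuals (observations)
p = len(rows[0])     # number of variables (traits)

# Column-wise summaries give the p-dimensional mean and median vectors.
columns = list(zip(*rows))
mean_vector = [mean(col) for col in columns]
median_vector = [median(col) for col in columns]

def normal_quantiles(values):
    """Pair each ordered value with a standard normal quantile.

    Uses Blom-type positions (i - 0.375) / (m + 0.25), i = 1..m.
    """
    ordered = sorted(values)
    m = len(ordered)
    z = [NormalDist().inv_cdf((i - 0.375) / (m + 0.25)) for i in range(1, m + 1)]
    return list(zip(z, ordered))

print(n, p)
print(mean_vector)
print(median_vector)
for z, x in normal_quantiles(columns[0]):
    print(f"{z:+.3f}  {x}")
```

Points (z, x) lying close to a straight line are what a normal Q-Q plot reads as evidence of normality.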
Code:







Outputs:
NSTCI:






HATCI:







RDTCI:








PTCI:






Question 2:
Using the SAS code and outputs below, answer the following questions.
a) Which pair of variables is most positively correlated? Give the associated correlation
and p-value.
b) Which pair of variables is most negatively correlated (if any), or otherwise least
correlated? Give the associated correlation and p-value.
c) What null hypothesis is tested in PROC CORR?
d) Name the parameter being tested and its value under the null hypothesis in PROC CORR.
1+1+2+1 = 5 marks
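The test behind the correlation p-values can be sketched as follows. This is a minimal sketch, assuming the usual Pearson test of H0: ρ = 0, whose statistic t = r√(n − 2)/√(1 − r²) has a t distribution with n − 2 degrees of freedom under H0 (to my understanding, this is how PROC CORR computes its default Pearson p-values). The paired values are hypothetical, and only the statistic is computed, because the Python standard library has no t-distribution CDF.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# HYPOTHETICAL paired scores for two traits (not the study data).
x = [2.1, 2.8, 3.3, 2.5, 3.0, 2.2]
y = [3.0, 2.5, 3.6, 3.2, 3.4, 2.9]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(f"r = {r:.4f}, t = {t:.4f} on {len(x) - 2} df")
```

A large |t| relative to the t distribution with n − 2 df corresponds to a small p-value and evidence against ρ = 0.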
Code:

Output:


Code:

Output:










Question 3:

The ODS Graphics code below uses PROC CORR to create both 80% and 90% prediction ellipses for the
specified variables.

a) What is the centroid of the ellipses, written as a numerical vector?
b) How many outliers are there with respect to the 80% and 90% prediction ellipses?
c) Hypothetically, if the prediction ellipses were created for 70% and 90%, how would the
coverage change?
d) Briefly explain what alpha (α) represents with respect to the prediction ellipses drawn.
e) Using the outputs in Question 1 and Question 2, calculate Hotelling’s T² statistic, which is
traditionally used to test a hypothesised null mean vector μ₀′ = [2.82, 3.36].
HINT: Give the equation for T² and show all your working: the formula, each step, and the
calculated value.
2+2+2+2+7 = 15 marks
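The arithmetic in part e) follows T² = n(x̄ − μ₀)′S⁻¹(x̄ − μ₀), where x̄ is the sample mean vector and S the sample covariance matrix of the two variables (p = 2, since μ₀ has two components). A minimal sketch that inverts the 2 × 2 covariance matrix explicitly; n = 49 and μ₀ come from the text, while the sample mean vector and covariance matrix are hypothetical placeholders for the values read from the Question 1 and Question 2 output. It also converts T² to F = (n − p)/((n − 1)p) · T², which has an F(p, n − p) distribution under H0 for multivariate normal data.

```python
def hotelling_t2_2d(n, xbar, mu0, s):
    """Hotelling's T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0) for p = 2.

    s is the 2x2 sample covariance matrix [[s11, s12], [s12, s22]].
    """
    (s11, s12), (_, s22) = s
    det = s11 * s22 - s12 * s12          # |S|, must be > 0
    # Inverse of a 2x2 matrix: (1/|S|) [[s22, -s12], [-s12, s11]]
    inv = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]
    d = [xbar[0] - mu0[0], xbar[1] - mu0[1]]
    quad = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    return n * quad

def t2_to_f(t2, n, p):
    """Convert T^2 to an F(p, n - p) statistic."""
    return (n - p) / ((n - 1) * p) * t2

n = 49                             # sample size from the dataset description
mu0 = [2.82, 3.36]                 # null mean vector from part e)
xbar = [2.90, 3.30]                # HYPOTHETICAL sample means
S = [[0.20, 0.05], [0.05, 0.25]]   # HYPOTHETICAL covariance matrix

t2 = hotelling_t2_2d(n, xbar, mu0, S)
print(f"T^2 = {t2:.4f}, F = {t2_to_f(t2, n, 2):.4f} on (2, {n - 2}) df")
```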


Code:

Output:




Question 4:
Using the two sets of code and output below, answer the following questions succinctly.
a) What are the aims of the two sets of code, and what analysis is performed on which
variables?
b) Write down the steps, and briefly explain what each of the two PROC steps below does.
c) What does mahala denote? Write down its mathematical formula.
d) Report how many observations lie outside the ellipse, and how many have smaller values and
lie inside the ellipse.
e) What conclusions can you make based on the number of observations inside and outside the
ellipse?
f) Briefly explain how the value χ²₂(0.5) = 1.39 in the SAS code relates to the distribution used
in this procedure.
g) What is p for the mathematical formulation used in the code below for the analyses in this
question?
2+3+2+2+2+3+1 = 15 marks
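The squared Mahalanobis distance d²ᵢ = (xᵢ − x̄)′S⁻¹(xᵢ − x̄) of each observation can be compared with a χ² quantile. With 2 degrees of freedom the χ² CDF is 1 − e^(−x/2), so the quantile has the closed form −2 ln(1 − q), and q = 0.5 gives 2 ln 2 ≈ 1.386, consistent with the 1.39 quoted in part f). A minimal sketch, assuming bivariate data; the observations are hypothetical and the function names are illustrative, not the names used in the SAS code. For roughly bivariate normal data, about half the observations would be expected inside this 50% contour.

```python
import math

def mean_and_cov(rows):
    """Sample mean vector and covariance matrix (divisor n - 1) for 2 columns."""
    n = len(rows)
    m = [sum(r[j] for r in rows) / n for j in range(2)]
    def cov(a, b):
        return sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
    return m, [[cov(0, 0), cov(0, 1)], [cov(1, 0), cov(1, 1)]]

def mahalanobis_sq(x, m, s):
    """Squared Mahalanobis distance (x - m)' S^{-1} (x - m) for p = 2."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    d0, d1 = x[0] - m[0], x[1] - m[1]
    return (s[1][1] * d0 * d0 - (s[0][1] + s[1][0]) * d0 * d1
            + s[0][0] * d1 * d1) / det

def chi2_df2_quantile(q):
    """Quantile of chi-square with 2 df: solves 1 - exp(-x / 2) = q."""
    return -2.0 * math.log(1.0 - q)

# HYPOTHETICAL bivariate observations (not the study data).
rows = [(2.1, 3.0), (2.8, 2.5), (3.3, 3.6), (2.5, 3.2),
        (3.0, 3.4), (2.2, 2.9), (2.9, 3.1), (2.6, 2.7)]

m, s = mean_and_cov(rows)
cutoff = chi2_df2_quantile(0.5)            # about 1.386
d2 = [mahalanobis_sq(r, m, s) for r in rows]
inside = sum(1 for v in d2 if v <= cutoff)
print(f"cutoff = {cutoff:.3f}; inside = {inside}, outside = {len(rows) - inside}")
```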
Code:

Output:

Code:

Output:


Question 5:
Use the SAS code and the output below to answer the following questions.
a) What variables are being analysed in the procedure below?
b) Explain the steps and what the procedure does in the code below.
c) What conclusions can you make from the plot below? Give justification for your answer.
1+5+4 = 10 marks
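Assuming the plot below is a chi-square Q-Q plot of squared Mahalanobis distances (an assumption, consistent with the chiplot data set named in Question 7), its points are commonly built by ordering the distances d²(1) ≤ … ≤ d²(n) and pairing each with the χ²ₚ quantile at probability (j − 0.5)/n. Points close to a straight line through the origin with slope 1 support multivariate normality. A minimal sketch for p = 2, where the closed-form quantile −2 ln(1 − q) applies; the distances are hypothetical, and the SAS code may use a different plotting-position convention.

```python
import math

def chi2_df2_quantile(q):
    """Chi-square (2 df) quantile: inverse of F(x) = 1 - exp(-x / 2)."""
    return -2.0 * math.log(1.0 - q)

def chi_square_plot_points(d2):
    """Pair ordered squared distances with chi-square(2) quantiles.

    Uses plotting positions (j - 0.5) / n, j = 1..n.
    """
    ordered = sorted(d2)
    n = len(ordered)
    return [(chi2_df2_quantile((j - 0.5) / n), v)
            for j, v in enumerate(ordered, start=1)]

# HYPOTHETICAL squared Mahalanobis distances (not from the study).
d2 = [0.4, 2.9, 1.1, 0.2, 5.6, 1.8, 0.7, 3.3]

for q, v in chi_square_plot_points(d2):
    print(f"{q:6.3f}  {v:6.3f}")
```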
Code:

Output:

Question 6:
Using the sample mean vector and sample covariance matrix obtained in Questions 1 and 2, derive
the formula and show your working to determine the mean and variance of the new difference
variable “HATCI - PTCI”.
10 marks
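Writing the difference as a linear combination Y = a′X with a′ = [0, 1, 0, −1] and X ordered as (NS, HA, RD, P), the standard results are ȳ = a′x̄ and s²_Y = a′Sa, which reduces to s_HH + s_PP − 2s_HP. A minimal sketch of that arithmetic; the mean vector and covariance matrix are hypothetical placeholders for the values from Questions 1 and 2.

```python
def linear_combination_stats(a, xbar, s):
    """Mean a'xbar and variance a'Sa of the linear combination Y = a'X."""
    p = len(a)
    mean_y = sum(a[i] * xbar[i] for i in range(p))
    var_y = sum(a[i] * s[i][j] * a[j] for i in range(p) for j in range(p))
    return mean_y, var_y

# Coefficients for HATCI - PTCI with X ordered as (NS, HA, RD, P).
a = [0, 1, 0, -1]

# HYPOTHETICAL sample mean vector and covariance matrix (not the study values).
xbar = [2.7, 3.1, 3.3, 3.3]
S = [
    [0.30, 0.04, 0.02, 0.01],
    [0.04, 0.25, 0.03, -0.05],
    [0.02, 0.03, 0.20, 0.06],
    [0.01, -0.05, 0.06, 0.28],
]

mean_y, var_y = linear_combination_stats(a, xbar, S)
# Equivalent shortcut for a difference: s_HH + s_PP - 2 s_HP
shortcut = S[1][1] + S[3][3] - 2 * S[1][3]
print(mean_y, var_y, shortcut)
```

The general a′Sa form is kept because it works unchanged for any linear combination, not only differences.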


Question 7:
(Part 1)
Use the code and output below to answer the following questions.
a) Which of the newly created difference variables is most positively correlated (if any) with
which of the original TCI variables (NSTCI, HATCI, RDTCI, PTCI)? Give the associated
correlation and p-value.
b) Which of the newly created difference variables is most negatively correlated (if any) with
which of the original TCI variables (NSTCI, HATCI, RDTCI, PTCI)? Give the associated
correlation and p-value.
c) Which pair of difference variables is most positively correlated? Give the associated
correlation and p-value. Is the correlation statistically significant, and why? Explain whether
you think a positive correlation is to be expected in this case.
d) Which pair of difference variables is most negatively correlated (if any)? Give the associated
correlation and p-value. Is the correlation statistically significant, and why? Explain whether
you think a negative correlation is to be expected in this case.
e) What is p for the mathematical formulation used in the SAS code for the analysis in this
question?
1+1+3+3+1 = 9 marks
Dataset:
The dataset called “Dataset_new” is created below. This dataset has three newly created variables,
“x1_x2”, “x1_x4”, and “x2_x3”.
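Because the new variables are linear combinations of the originals, their sample correlations are determined by the sample covariance matrix S alone: r(a′X, b′X) = a′Sb / √((a′Sa)(b′Sb)). This can help in reasoning about the signs asked for in parts c) and d). A minimal sketch, assuming (this mapping is not stated in the text) that x1 to x4 are NSTCI, HATCI, RDTCI and PTCI in that order; the covariance matrix is hypothetical.

```python
import math

def cov_lin(a, b, s):
    """Covariance a'Sb between linear combinations a'X and b'X."""
    p = len(a)
    return sum(a[i] * s[i][j] * b[j] for i in range(p) for j in range(p))

def corr_lin(a, b, s):
    """Correlation between a'X and b'X implied by covariance matrix S."""
    return cov_lin(a, b, s) / math.sqrt(cov_lin(a, a, s) * cov_lin(b, b, s))

# ASSUMPTION: x1..x4 are NSTCI, HATCI, RDTCI, PTCI in that order.
x1_x2 = [1, -1, 0, 0]
x1_x4 = [1, 0, 0, -1]
x2_x3 = [0, 1, -1, 0]
ns = [1, 0, 0, 0]   # the original NSTCI variable itself

# HYPOTHETICAL covariance matrix (not the study values).
S = [
    [0.30, 0.04, 0.02, 0.01],
    [0.04, 0.25, 0.03, -0.05],
    [0.02, 0.03, 0.20, 0.06],
    [0.01, -0.05, 0.06, 0.28],
]

for name, b in [("x1_x4", x1_x4), ("x2_x3", x2_x3), ("NSTCI", ns)]:
    print(f"corr(x1_x2, {name}) = {corr_lin(x1_x2, b, S):+.3f}")
```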

Code:




Question 7:
(Part 2)
What is being tested for in the following code and analysis?
a) What is p for the mathematical formulation used in the code below for the analysis?
b) What conclusions can you make from the plot below? Give justification for your answer.
c) Compare the plot produced by proc sgplot data=chiplot at the end of this question with the
plot reported in Question 5. What can you conclude about the distributions by comparing the
plots?
0.5+3+2.5 = 6 marks
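When a chi-square plot involves a number of variables p other than 2, the χ²ₚ quantiles no longer have the closed form −2 ln(1 − q). A minimal sketch using only the Python standard library: the CDF is the regularised lower incomplete gamma function P(p/2, x/2), evaluated by its power series, and the quantile is found by bisection. This numerical method is swapped in for illustration and suits moderate x; in practice a library routine such as SciPy's `scipy.stats.chi2.ppf` would normally be used. The value of p below is illustrative only.

```python
import math

def reg_lower_gamma(a, x, tol=1e-15, max_iter=10_000):
    """Regularised lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, max_iter):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, df):
    """Chi-square CDF with df degrees of freedom."""
    return reg_lower_gamma(df / 2.0, x / 2.0)

def chi2_quantile(q, df):
    """Chi-square quantile by bisection on the CDF, for 0 < q < 1."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < q:      # widen the bracket until it contains q
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 49    # number of recruits in the study
df = 3    # ILLUSTRATIVE value of p; use the p of the analysis in question
qs = [chi2_quantile((j - 0.5) / n, df) for j in range(1, n + 1)]
print([round(v, 3) for v in qs[:5]])
```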
Code: