THE UNIVERSITY OF NEW SOUTH WALES

DEPARTMENT OF STATISTICS

MATH3811/MATH3911- Statistical Inference/Higher Statistical Inference

ASSIGNMENT 2

Please add a cover page containing a copy of your ID card and write, in your own handwriting:

“I declare that this assignment is my own work, except where acknowledged and I have read

and understood the University rules regarding Academic Misconduct”, and sign it.

Assignment due: Friday, 16th April 2021, 5 pm at the latest.

MATH3811: Attempt the first four questions. MATH3911: Attempt all questions.

1. Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables from $N(\mu, \sigma_0^2)$, each with density

$$f(x;\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left(-\frac{(x-\mu)^2}{2\sigma_0^2}\right).$$

Suppose that $\sigma_0$ is known.

a) Argue that the joint density of $X_1, X_2, \ldots, X_n$ has monotone likelihood ratio in $\sum_{i=1}^{n} X_i$.

b) Derive the UMP unbiased size $\alpha = 0.05$ test $\varphi^*$ of $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$.

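c) Show that the power function of this test is

$$E_\mu \varphi^* = 1 - \Phi\left(1.96 - \frac{\sqrt{n}(\mu - \mu_0)}{\sigma_0}\right) + \Phi\left(-1.96 - \frac{\sqrt{n}(\mu - \mu_0)}{\sigma_0}\right)$$

with $\Phi$ denoting the cdf of the standard normal distribution.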

d) Set $n = 10$, $\sigma_0 = 2$, $\mu_0 = 3$. Evaluate numerically the power function for $\mu = 1, 2, 3, 4, 5$ and draw its graph on the real axis using R.

e) Calculate the density $f_{X_{(3)}}(x)$ of the third order statistic $X_{(3)}$ under $H_0$. Hence find numerically $P(X_{(3)} < 2)$. (You could use the integrate function in R.)
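As an illustration of how the power formula in part c) can be evaluated in R, here is a minimal sketch. The function name `power_fn`, its argument names, and the plotting range are assumptions made for this example; only the formula and the values n = 10, sigma0 = 2, mu0 = 3 come from the question.

```r
# Power of the two-sided test, written directly from the formula in part c).
# `power_fn` and its argument names are illustrative, not part of the assignment.
power_fn <- function(mu, n = 10, sigma0 = 2, mu0 = 3, z = 1.96) {
  delta <- sqrt(n) * (mu - mu0) / sigma0   # standardized shift of the mean
  1 - pnorm(z - delta) + pnorm(-z - delta)
}

mu_grid <- 1:5
print(round(power_fn(mu_grid), 4))

# Graph of the power function; the range 0 to 6 is an arbitrary choice.
curve(power_fn(x), from = 0, to = 6, xlab = expression(mu), ylab = "Power")
points(mu_grid, power_fn(mu_grid), pch = 19)
abline(h = 0.05, lty = 2)   # the size of the test, attained at mu = mu0
```

At mu = mu0 the shift is zero, so the function returns approximately 0.05, which is a quick sanity check on the signs in the formula.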

2. In a sequence of consecutive years $1, 2, \ldots, T$, an annual number of high-risk events is recorded by a bank. The random number $N_t$ of high-risk events in a given year is modelled via a Poisson($\lambda$) distribution. This gives a sequence of independent counts $n_1, n_2, \ldots, n_T$. The prior on $\lambda$ is Gamma($a, b$) with known $a > 0$, $b > 0$:

$$\tau(\lambda) = \frac{\lambda^{a-1} e^{-\lambda/b}}{\Gamma(a)\, b^a}, \quad \lambda > 0.$$

a) Determine the Bayesian estimator of the intensity $\lambda$ with respect to quadratic loss.

b) Assume $a = 3$, $b = 2$. If the counts within the last seven years were 2, 4, 7, 3, 4, 4, 5, find the estimate of $\lambda$ for this data.

c) The bank claims that the yearly intensity $\lambda$ is less than 4. Test the bank's claim via Bayesian testing with a zero-one loss, using the data from b).

Hint: You may find it helpful to use the R function pgamma in your answer.
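The hint can be made concrete with a short sketch. It assumes the standard Gamma-Poisson conjugacy result (to be derived in part a), not taken from the question text): with the scale parametrization above, the posterior is Gamma with shape a + sum of the counts and scale b / (Tb + 1). All variable names are illustrative.

```r
# Posterior quantities for the Poisson model with a Gamma(a, b) prior (b = scale).
# The conjugate update used here is an assumption to be justified in part a).
a <- 3
b <- 2
counts <- c(2, 4, 7, 3, 4, 4, 5)
T_years <- length(counts)

post_shape <- a + sum(counts)
post_scale <- b / (T_years * b + 1)

# Under quadratic loss the Bayes estimator is the posterior mean.
post_mean <- post_shape * post_scale
print(post_mean)

# Posterior probability of the bank's claim (lambda < 4); under zero-one loss
# the claim is accepted when this probability exceeds 1/2.
post_prob_lt4 <- pgamma(4, shape = post_shape, scale = post_scale)
print(post_prob_lt4)
```

Note that `pgamma` takes either `rate` or `scale`; passing `scale` explicitly avoids mixing up the two parametrizations.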

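3. Important measures in exploratory data analysis are the skewness

$$\gamma_1 = \frac{E(X - E(X))^3}{Var(X)^{3/2}}$$

and the kurtosis

$$\gamma_2 = \frac{E(X - E(X))^4}{Var(X)^2} - 3.$$

One way of estimating them is by using their empirical counterparts

$$\hat\gamma_1 = \frac{\sqrt{n}\,\sum_{i=1}^{n}(X_i - \bar{X})^3}{\left(\sum_{i=1}^{n}(X_i - \bar{X})^2\right)^{3/2}} \quad \text{and} \quad \hat\gamma_2 = \frac{n\,\sum_{i=1}^{n}(X_i - \bar{X})^4}{\left(\sum_{i=1}^{n}(X_i - \bar{X})^2\right)^2} - 3,$$

respectively.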

a) Write down two R functions myskewness and mykurtosis to get $\hat\gamma_1$ and $\hat\gamma_2$.

b) Load the library MASS in R and find the data galaxies (this variable describes the velocities of 82 galaxies taken in the Corona Borealis region, a small constellation in the northern sky). Use your functions to estimate $\gamma_1$ and $\gamma_2$ for the variable galaxies.

c) Bootstrap the $\hat\gamma_1$ and $\hat\gamma_2$ estimators by using B = 2000 replicates and report the resulting 95% confidence intervals using first principles.

d) For a normal population, theoretical skewness and kurtosis are both equal to zero. If the

95% confidence interval for either skewness or kurtosis excludes zero, the normality is

in doubt. What is your conclusion about the normality of the galaxies data? Include

your coding, and the output containing the confidence intervals, in your assignment.
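A minimal sketch of how the empirical formulas and a bootstrap could be coded is given below. The function names myskewness and mykurtosis come from the question; the seed, the helper names, and the choice of a percentile interval as the "first principles" interval are assumptions, and other interval constructions are possible.

```r
library(MASS)  # provides the `galaxies` data

# Empirical skewness and kurtosis, transcribed from the formulas in Question 3.
myskewness <- function(x) {
  d <- x - mean(x)
  sqrt(length(x)) * sum(d^3) / sum(d^2)^(3 / 2)
}
mykurtosis <- function(x) {
  d <- x - mean(x)
  length(x) * sum(d^4) / sum(d^2)^2 - 3
}

print(c(skew = myskewness(galaxies), kurt = mykurtosis(galaxies)))

set.seed(1)  # arbitrary seed, chosen only to make the sketch reproducible
B <- 2000
boot_stats <- replicate(B, {
  xb <- sample(galaxies, replace = TRUE)   # resample with replacement
  c(skew = myskewness(xb), kurt = mykurtosis(xb))
})

# Percentile intervals: 2.5% and 97.5% quantiles of the bootstrap replicates.
ci <- apply(boot_stats, 1, quantile, probs = c(0.025, 0.975))
print(ci)
```

Here `boot_stats` is a 2 x B matrix, one row per statistic, so `apply` over rows returns one column of interval endpoints for each of skewness and kurtosis.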

4. We simulate an example to demonstrate the strength of the LTS procedure in isolating outliers. Suppose your student number contains the numbers XXXXXXX. So that you generate data that is unique to your student number, include the student number in the starting seed for random number generation as shown below. After setting the initial seed, generate pairs of observations $(x_i, y_i)$, $i = 1, 2, \ldots, 100$, of which 70% are scattered around the line $y = x + 2$ and 30% are clustered around (6, 3).

> set.seed(round(log(XXXXXXX)))
> x70 <- runif(70, 0.5, 4)
> e70 <- rnorm(70, mean = 0, sd = 0.2)
> y70 <- 2 + x70 + e70
> x30 <- rnorm(30, mean = 6, sd = 0.5)
> y30 <- rnorm(30, mean = 3, sd = 0.5)
> x <- c(x70, x30)
> y <- c(y70, y30)
> simuldata <- data.frame(x, y)
...

Using the above commands as a starter and the help of R, produce and include in your

assignment the following graphs and comment on your findings.

i) Graph 1. Plot the x,y data to produce a scatterplot. Study the help and examples of

the abline command to superimpose three regression lines: the ordinary least squares line,

the default M-estimate line and the default LTS line using the lqs function. Label the lines clearly.

ii) Graph 2. Using the instructions from the Computing exercise on robust regression

you can override the default value using the quantile statement. Modify the default LTS

regression by asking that only 70 residuals be included in the calculation. Redraw the graph

with the new LTS regression line replacing the old one.

iii) Suppose however that you did not know the amount of contamination and used 85

residuals instead of 70 in ii) (i.e., some outliers are still influencing the LTS fit). Try the

LTS estimator again. Does it deliver a good fit? Attach the graph (Graph 3).
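The fitting calls involved can be sketched as follows. This is an outline under stated assumptions, not a model answer: the student number 1234567 is a hypothetical placeholder, the object names and line styles are arbitrary, and `rlm` from MASS is assumed to be the intended default M-estimator.

```r
library(MASS)  # rlm (M-estimation) and lqs (LTS)

student_number <- 1234567  # hypothetical placeholder, replace with your own
set.seed(round(log(student_number)))
x70 <- runif(70, 0.5, 4)
e70 <- rnorm(70, mean = 0, sd = 0.2)
y70 <- 2 + x70 + e70
x30 <- rnorm(30, mean = 6, sd = 0.5)
y30 <- rnorm(30, mean = 3, sd = 0.5)
simuldata <- data.frame(x = c(x70, x30), y = c(y70, y30))

fit_ols   <- lm(y ~ x, data = simuldata)                    # ordinary least squares
fit_m     <- rlm(y ~ x, data = simuldata)                   # default M-estimate
fit_lts   <- lqs(y ~ x, data = simuldata, method = "lts")   # default LTS
fit_lts70 <- lqs(y ~ x, data = simuldata, method = "lts", quantile = 70)
fit_lts85 <- lqs(y ~ x, data = simuldata, method = "lts", quantile = 85)

# Graph 1 style plot: scatterplot with three labelled regression lines.
plot(y ~ x, data = simuldata, main = "OLS, M-estimate and default LTS fits")
abline(coef = coef(fit_ols), lty = 1)
abline(coef = coef(fit_m), lty = 2)
abline(coef = coef(fit_lts), lty = 3)
legend("topleft", legend = c("OLS", "M-estimate", "LTS (default)"), lty = 1:3)

# Coefficients of the two modified LTS fits, for comparison with y = x + 2.
print(rbind(lts70 = coef(fit_lts70), lts85 = coef(fit_lts85)))
```

Passing `coef = coef(fit)` to `abline` works uniformly for `lm`, `rlm` and `lqs` objects, which keeps the three plotting calls identical in form.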

5. (*) A random sample $X = (X_1, X_2, X_3)$ of size $n = 3$ is taken from a population with density

$$f(x) = 2x, \quad x \in [0, 1].$$

a) Evaluate the covariance between $X_{(1)}$ and $X_{(3)}$.

b) Evaluate the correlation between $X_{(1)}$ and $X_{(3)}$.

c) Find the density of the range $R = X_{(3)} - X_{(1)}$ and show that $E(R) = 0.4$ holds.
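An analytic answer to this question can be sanity-checked by simulation. The sketch below is only a numerical check, not a derivation; it assumes inverse-cdf sampling (the cdf is F(x) = x^2 on [0, 1], so X = sqrt(U) with U uniform), and the seed, sample size, and variable names are arbitrary.

```r
# Monte Carlo check for the order statistics of a sample of size 3 from f(x) = 2x.
set.seed(2021)   # arbitrary seed
n_sim <- 100000  # arbitrary number of simulated samples

# Inverse-cdf sampling: F(x) = x^2 on [0, 1], hence X = sqrt(U).
samples <- matrix(sqrt(runif(3 * n_sim)), ncol = 3)

x_min <- pmin(samples[, 1], samples[, 2], samples[, 3])  # X_(1)
x_max <- pmax(samples[, 1], samples[, 2], samples[, 3])  # X_(3)
range_r <- x_max - x_min

print(mean(range_r))      # should be close to the stated E(R) = 0.4
print(cov(x_min, x_max))  # simulation estimate for part a)
print(cor(x_min, x_max))  # simulation estimate for part b)
```

The simulated covariance and correlation carry Monte Carlo error, so they are useful for catching algebra slips rather than for reporting exact values.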
