Date: 2022-11-08
Research School of Finance, Actuarial Studies and Statistics
PAST FINAL EXAMINATION 4 SOLUTIONS
STAT7055 Introductory Statistics for Business and Finance
Writing Time: 180 minutes
Reading Time: 15 minutes
Exam Conditions:
Central examination.
Students must return the examination paper at the end of the examination.
This examination paper is not available to the ANU Library archives.
Materials Permitted in the Exam Venue:
(No electronic aids are permitted, e.g., laptops, phones).
Calculator (non-programmable).
Two A4 pages with notes on both sides.
Unannotated paper-based dictionary (no approval required).
Materials to be Supplied to Students:
Script book.
Scribble paper.
Instructions to Students:
Please write your student number in the space provided on the front of the script book.
Attempt all 5 questions.
Start your solution to each question on a new page and clearly label each solution with the corresponding
question number.
To ensure full marks show all the steps in working out your solutions. Marks may be deducted for failure
to show working or formulae.
Selected statistical tables are attached to the back of the examination paper.
If a required degree of freedom is not listed in a statistical table, please use the closest degree of freedom.
Unless otherwise stated, use a significance level of α = 5%.
Round all numeric answers to 4 decimal places.
Question: 1 2 3 4 5 Total
Marks: 21 21 18 21 22 103
Question 1 [21 marks]
There are many things which can affect the price of a second hand car. Data was
collected on 105 second hand car sales. A multiple linear regression model was fitted
with sale price as the dependent variable (Y), and the odometer reading (X1), the
odometer reading squared ($X_1^2$), the age (X2) and an indicator of whether the car has
an automatic transmission (Z = 1 if the car has an automatic transmission and Z = 0
otherwise) as the independent variables. That is, the following model was fitted:
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \beta_4 Z + \varepsilon \]
Note that sale price was measured in thousands of dollars (e.g., Y = 32 corresponds to a
sale price of 32 000 dollars), odometer reading was measured in thousands of kilometres
(e.g., X1 = 19.1 corresponds to 19 100 kilometres) and age was measured in years. The
regression output, which includes some missing entries, is displayed below:
Predictor    Coef      SE Coef   T       p-value
Intercept    57.5629   5.7896    9.94    0.0000
Odometer     ?         ?         −2.03   0.0455
Odometer²    ?         ?         −1.25   0.2136
Age          −0.1216   0.1441    −0.84   0.4009
Z            −0.1707   0.6040    −0.28   0.7780

Analysis of Variance
Source           DF   SS         MS   F   p-value
Regression       ?    ?          ?    ?   ?
Residual Error   ?    ?          ?
Total            ?    5783.875
(a) [4 marks] The adjusted $R^2$ for the model is equal to 0.8544375. Test the overall
significance of the model. Clearly state your hypotheses and use a significance level
of α = 5%.
Solution:
The hypotheses are:
H0 : β1 = β2 = β3 = β4 = 0
H1 : Not all coefficient parameters are equal to 0.
In order to calculate the F-statistic, we need to know the SSR and the SSE.
From the given values of the adjusted $R^2$ = 0.8544375, SS(Total) = 5783.875,
Past Final Examination 4 Page 2 of 22 STAT7055
n = 105 and k = 4, we can calculate the SSE:
\begin{align*}
\text{adj } R^2 &= 1 - \frac{SSE/(n-k-1)}{SS(\text{Total})/(n-1)} \\
\Rightarrow\ 0.8544375 &= 1 - \frac{SSE/(105-4-1)}{5783.875/(105-1)} \\
\Rightarrow\ SSE &= (1 - 0.8544375) \times \frac{5783.875}{104} \times 100 \\
\Rightarrow\ SSE &= 809.5339
\end{align*}
We can then calculate the SSR:
\[ SSR = SS(\text{Total}) - SSE = 4974.3411 \]
Now we can calculate our F-statistic:
\[ F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)} = \frac{4974.3411/4}{809.5339/100} = 153.6174 \]
We need to compare this to an F-distribution with k = 4 numerator degrees
of freedom and n − k − 1 = 100 denominator degrees of freedom and reject H0
when $F > F_{0.05,4,100} = 2.46$. Since 153.6174 > 2.46, we reject H0 and conclude
that the overall regression model is significant.
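The arithmetic in part (a) can be checked with a few lines of Python. This is only a sketch of the calculation above; the variable names are illustrative and do not come from the exam paper.

```python
# Recover SSE, SSR and the overall F-statistic from the adjusted R^2.
n, k = 105, 4          # sample size and number of predictors
adj_r2 = 0.8544375     # adjusted R^2 given in the question
ss_total = 5783.875    # SS(Total) from the ANOVA table

# adj R^2 = 1 - [SSE/(n-k-1)] / [SS(Total)/(n-1)], rearranged for SSE
sse = (1 - adj_r2) * ss_total / (n - 1) * (n - k - 1)
ssr = ss_total - sse
f_stat = (ssr / k) / (sse / (n - k - 1))

print(round(sse, 4), round(ssr, 4), round(f_stat, 4))
```

The critical value 2.46 is still read from the F table; the sketch only reproduces the test statistic.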
(b) [2 marks] What do you conclude about the relationship between sale price and
odometer reading squared? Clearly state your hypotheses and use a significance
level of α = 5%.
Solution:
The hypotheses are:
H0 : β2 = 0
H1 : β2 ≠ 0
From the output, we see that the p-value for this test is 0.2136, which is greater
than 0.05, so we fail to reject H0 and we conclude that β2 = 0. That is, once
the other independent variables have been included in the model, there is no
significant linear relationship between sale price and odometer reading squared.
(c) [2 marks] Test whether a different intercept is needed for cars that have an
automatic transmission. Clearly state your hypotheses and use a significance level of
α = 5%.
Solution:
To see if a different value of the intercept is needed for cars that have an
automatic transmission, we are testing:
H0 : β4 = 0
H1 : β4 ≠ 0
From the output, we see that the p-value for this test is 0.7780, which is greater
than 0.05, so we fail to reject H0 and we conclude that β4 = 0. That is, once
the other independent variables have been included in the model, a different
intercept is not needed for cars that have an automatic transmission.
(d) [3 marks] Test whether the expected change in sale price when the age increases
by 1 year (all other variables held constant) is less than +0.15 (that is, less than
positive 150 dollars). Clearly state your hypotheses and use a significance level of
α = 5%.
Solution:
The expected change in sale price when the age increases by 1 year (all other
variables held constant) is, by definition, β3. So the hypotheses are:
H0 : β3 = 0.15
H1 : β3 < 0.15
We can no longer use the p-value from the output to perform this test and we
must manually calculate the T -statistic:
\[ T = \frac{\hat{\beta}_3 - c}{s_{\hat{\beta}_3}} = \frac{-0.1216 - 0.15}{0.1441} = -1.8848 \]
We need to compare this to a t-distribution with n − k − 1 = 100 degrees of
freedom and reject H0 when $T < -t_{0.05,100} = -1.660$. Since −1.8848 < −1.660,
we reject H0 and we conclude that the expected change in sale price when the
age increases by 1 year (all other variables held constant) is less than +0.15.
(e) [5 marks] Using the estimated regression model, the predicted sale price for a car
that is 4 years old, travelled 32 000 km (X1 = 32) and has a manual transmission
is $\hat{y} = 22.81569$ and the predicted sale price for a car that is 7 years old, travelled
29 000 km (X1 = 29) and has an automatic transmission is $\hat{y} = 26.22919$. Based on
this information, calculate the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$.
Solution:
Using the estimated regression model, we get the following equations from the
predicted sale prices for the two cars:
\begin{align*}
22.81569 &= 57.5629 + \hat{\beta}_1 \times 32 + \hat{\beta}_2 \times 32^2 - 0.1216 \times 4 - 0.1707 \times 0 \\
26.22919 &= 57.5629 + \hat{\beta}_1 \times 29 + \hat{\beta}_2 \times 29^2 - 0.1216 \times 7 - 0.1707 \times 1
\end{align*}
Some algebra reduces these equations to:
\begin{align*}
32\hat{\beta}_1 + 1024\hat{\beta}_2 &= -34.2608 \tag{1} \\
29\hat{\beta}_1 + 841\hat{\beta}_2 &= -30.3118 \tag{2}
\end{align*}
From the first equation we get:
\[ \hat{\beta}_1 = \frac{-34.2608 - 1024\hat{\beta}_2}{32} \tag{3} \]
Substituting this into the second equation and solving for $\hat{\beta}_2$ we get:
\[ \hat{\beta}_2 = -0.0085 \]
Substituting the unrounded value of $\hat{\beta}_2$ back into the third equation we get:
\[ \hat{\beta}_1 = -0.7996 \]
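As a check on the algebra in part (e), the two equations can be solved directly with Cramer's rule. The sketch below is illustrative only: the helper `rhs` and the variable names are assumptions, not part of the exam solution.

```python
# Solve x*b1 + x^2*b2 = rhs for the two cars (Cramer's rule on a 2x2 system).
b0, b3, b4 = 57.5629, -0.1216, -0.1707   # known estimates from the output

def rhs(y_hat, age, z):
    """Move every known term of the fitted equation to the right-hand side."""
    return y_hat - b0 - b3 * age - b4 * z

x_a, x_b = 32.0, 29.0                    # odometer readings of the two cars
r_a = rhs(22.81569, age=4, z=0)          # manual car
r_b = rhs(26.22919, age=7, z=1)          # automatic car

det = x_a * x_b**2 - x_a**2 * x_b        # determinant of [[x_a, x_a^2], [x_b, x_b^2]]
beta1 = (r_a * x_b**2 - x_a**2 * r_b) / det
beta2 = (x_a * r_b - x_b * r_a) / det

print(round(beta1, 4), round(beta2, 4))
```

Solving both unknowns at once avoids the rounding issue that arises if the rounded value −0.0085 is substituted back into equation (3).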
When people look to buy second hand cars, the odometer reading is generally the first
thing they check. Given this, a simple linear regression was fitted with sale price (Y )
as the dependent variable and the odometer reading (X1) as the independent variable.
The regression output is given below:
Predictor    Coef      SE Coef   T
Intercept    63.3897   1.5936    39.78
Odometer     −1.2906   0.0520    −24.83
(f) [2 marks] Test the overall significance of the model. Clearly state your hypotheses
and use a significance level of α = 5%.
Solution:
The hypotheses are:
H0 : β1 = 0
H1 : β1 ≠ 0
From the output, we see that the T-statistic for this test is equal to T = −24.83.
We need to compare this to a t-distribution with n − 2 = 103 degrees of freedom
and reject H0 when $T > t_{0.025,103} \approx t_{0.025,100} = 1.984$ or $T < -t_{0.025,103} \approx -1.984$.
Since −24.83 < −1.984, we reject H0 and we conclude that the overall regression
model is significant.
(g) [3 marks] The following sample statistics for the odometer readings are given:
$\bar{X}_1 = 30.19046$ and $s^2_{X_1} = 28.60936$. Calculate a 90% prediction interval for the sale
price of a car that has travelled 39 000 km (X1 = 39) given that the standard error
of estimate is s = 2.835572.
Solution:
A 90% prediction interval for the sale price of a car that has travelled 39 000
km is given by:
\[ \hat{y}_g \pm t_{\alpha/2,\,n-2} \times s \sqrt{1 + \frac{1}{n} + \frac{(x_{1g} - \bar{X}_1)^2}{(n-1)s^2_{X_1}}} \]
where $\hat{y}_g = 63.3897 - 1.2906 \times 39 = 13.0563$, $t_{0.05,103} \approx t_{0.05,100} = 1.660$, $x_{1g} = 39$,
$\bar{X}_1 = 30.19046$, n = 105, $s^2_{X_1} = 28.60936$ and s = 2.835572. Plugging this all in we get:
(8.2662, 17.8464)
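The interval in part (g) can be reproduced numerically. The following is a minimal sketch that hard-codes the table value 1.660 used in the solution; the variable names are illustrative.

```python
import math

n = 105
b0, b1 = 63.3897, -1.2906            # fitted intercept and slope
x_bar, s2_x = 30.19046, 28.60936     # sample mean and variance of X1
s = 2.835572                         # standard error of estimate
t_crit = 1.660                       # t_{0.05,100}, taken from the t table
x_g = 39.0                           # odometer reading of interest

y_hat = b0 + b1 * x_g                # point prediction
half_width = t_crit * s * math.sqrt(
    1 + 1 / n + (x_g - x_bar) ** 2 / ((n - 1) * s2_x)
)
lower, upper = y_hat - half_width, y_hat + half_width

print(round(y_hat, 4), round(lower, 4), round(upper, 4))
```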
Question 2 [21 marks]
The 100 metre sprint is one of the most watched events in any Olympic Games. Sprinters
will often go to great lengths to improve their times, looking into factors such as the
equipment they use and the coaches they hire. Listed in the table below are the personal
best times for the 100 metre sprint for 27 sprinters that are in training. The 27 sprinters
were chosen as follows: For each of three shoe brands (the Fast, the Quick and the
Speedy brand), 9 sprinters who used that particular brand were randomly selected. The
sample variances of the times for each shoe brand are also listed in the table.
Shoe Brand   Times                                                            s²
Fast         9.82   10.22   10.15   9.77    10.12   9.86    9.87    10.11   10.18   0.0313111
Quick        9.67   9.95    10.10   10.16   9.98    10.09   10.20   10.68   10.29   0.0753278
Speedy       9.69   9.75    9.49    10.03   9.66    10.24   10.04   9.87    10.01   0.0557528
(a) [6 marks] Test whether the mean time for sprinters using the speedy brand is more
than 0.2 seconds faster than for sprinters using the quick brand. Clearly state your
hypotheses and use a significance level of α = 5%. Clearly state any assumptions
you have made (without testing them) when performing this test.
Solution:
Let $\mu_s$ and $\sigma_s^2$ denote the population mean and variance, respectively, of times
for the speedy brand and let $\mu_q$ and $\sigma_q^2$ denote the corresponding values for the
quick brand. We will use a T-statistic to compare the population means and
the main assumption we are making is that the two population variances are
equal, i.e., $\sigma_s^2 = \sigma_q^2$. The hypotheses we want to test are:
H0 : µq − µs = 0.2
H1 : µq − µs > 0.2
Letting Y represent times, from the table we can calculate $\bar{Y}_s = 9.8644$ and
$\bar{Y}_q = 10.1244$. From the given sample variances for the speedy and quick brands,
we can calculate the pooled sample variance:
\[ s_p^2 = \frac{(n_q - 1)s_q^2 + (n_s - 1)s_s^2}{n_q + n_s - 2} = \frac{8 \times 0.0753278 + 8 \times 0.0557528}{16} = 0.0655 \]
The test statistic is therefore:
\[ T = \frac{(\bar{Y}_q - \bar{Y}_s) - D_0}{\sqrt{s_p^2 \left( \frac{1}{n_q} + \frac{1}{n_s} \right)}} = \frac{(10.1244 - 9.8644) - 0.2}{\sqrt{0.0655 \left( \frac{1}{9} + \frac{1}{9} \right)}} = 0.4972 \]
We need to compare this to a t-distribution with $n_q + n_s - 2 = 16$ degrees of
freedom and reject H0 when $T > t_{0.05,16} = 1.746$. Since 0.4972 < 1.746, we fail
to reject H0 and we conclude that the mean time for sprinters using the speedy
brand is not more than 0.2 seconds faster than those using the quick brand.
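The pooled-variance T-statistic from part (a) can be recomputed from the raw times with Python's standard `statistics` module. This is a sketch under the same equal-variance assumption; the list names are illustrative.

```python
import math
from statistics import mean, variance

quick = [9.67, 9.95, 10.10, 10.16, 9.98, 10.09, 10.20, 10.68, 10.29]
speedy = [9.69, 9.75, 9.49, 10.03, 9.66, 10.24, 10.04, 9.87, 10.01]
d0 = 0.2                                   # hypothesised difference under H0

n_q, n_s = len(quick), len(speedy)
# Pooled sample variance (assumes equal population variances)
s2_p = ((n_q - 1) * variance(quick) + (n_s - 1) * variance(speedy)) / (n_q + n_s - 2)
t_stat = (mean(quick) - mean(speedy) - d0) / math.sqrt(s2_p * (1 / n_q + 1 / n_s))

print(round(s2_p, 4), round(t_stat, 4))
```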
A one-way ANOVA was performed on this data, using shoe brand as the factor. The
partially filled ANOVA table is provided below:
Source       Sum of squares   Degrees of freedom   Mean squares   F
Shoe Brand   0.3059           ?                    ?              ?
Error        ?                ?                    ?
Total        ?                ?
(b) [2 marks] Calculate the sum of squares for error for the one-way ANOVA.
Solution:
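The sum of squares for error for the one-way ANOVA can be calculated from the given
sample variances:
\begin{align*}
SSE &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 \\
&= \sum_{i=1}^{n_f} (Y_{if} - \bar{Y}_f)^2 + \sum_{i=1}^{n_q} (Y_{iq} - \bar{Y}_q)^2 + \sum_{i=1}^{n_s} (Y_{is} - \bar{Y}_s)^2 \\
&= (n_f - 1) \times s_f^2 + (n_q - 1) \times s_q^2 + (n_s - 1) \times s_s^2 \\
&= 8 \times (0.0313111 + 0.0753278 + 0.0557528) \\
&= 1.2991
\end{align*}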
(c) [3 marks] Test whether there is a difference in the mean times between the three
shoe brands. Clearly state your hypotheses and use a significance level of α = 5%.
Solution:
Let µf , µq and µs denote the population mean times for the fast, quick and
speedy shoe brands, respectively. The hypotheses are:
H0 : µf = µq = µs
H1 : At least two population means differ.
From the ANOVA table we know that SST = 0.3059 and from part (b) we know
that SSE = 1.2991. Since n = 27 and k = 3, we can calculate the F -statistic:
\[ F = \frac{MST}{MSE} = \frac{SST/(k-1)}{SSE/(n-k)} = \frac{0.3059/2}{1.2991/24} = 2.8256 \]
We need to compare this to an F-distribution with k − 1 = 2 numerator and
n − k = 24 denominator degrees of freedom, and reject H0 when $F > F_{0.05,2,24} = 3.40$.
Since 2.8256 < 3.40, we fail to reject H0 and we conclude that there is no
difference in mean times between the three shoe brands.
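The one-way ANOVA quantities in parts (b) and (c) can also be computed straight from the table of times. The sketch below uses illustrative names; because it works with unrounded sums of squares, its F-statistic (about 2.825) differs slightly in the fourth decimal place from the 2.8256 obtained with the rounded values.

```python
from statistics import mean

times = {
    "Fast":   [9.82, 10.22, 10.15, 9.77, 10.12, 9.86, 9.87, 10.11, 10.18],
    "Quick":  [9.67, 9.95, 10.10, 10.16, 9.98, 10.09, 10.20, 10.68, 10.29],
    "Speedy": [9.69, 9.75, 9.49, 10.03, 9.66, 10.24, 10.04, 9.87, 10.01],
}
all_times = [t for group in times.values() for t in group]
grand_mean = mean(all_times)
n, k = len(all_times), len(times)

# Between-treatment and within-treatment sums of squares
sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in times.values())
sse = sum((t - mean(g)) ** 2 for g in times.values() for t in g)
f_stat = (sst / (k - 1)) / (sse / (n - k))

print(round(sst, 4), round(sse, 4), round(f_stat, 4))
```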
For each shoe brand, suppose the first 3 sprinters in the table were trained by coach
Andy, the next 3 sprinters were trained by coach Bobby and the last 3 sprinters were
trained by coach Carl. A two-way ANOVA was performed on the same data, using shoe
brand and coach as factors. The partially filled ANOVA table is displayed below:
Source        Sum of squares   Degrees of freedom   Mean squares   F
Coach         ?                ?                    ?              ?
Shoe Brand    ?                ?                    ?              ?
Interaction   0.2970           ?                    ?              ?
Error         0.6781           ?                    ?
Total         ?                ?
(d) [3 marks] Test whether there is an interaction between coach and shoe brand.
Clearly state your hypotheses and use a significance level of α = 5%.
Solution:
The hypotheses are:
H0 : There is no interaction between coach and shoe brand.
H1 : There is an interaction.
The SS(Int) and the SSE are given in the ANOVA table, and we can calculate
df(Int) = (3 − 1) × (3 − 1) = 4 and df(E) = 27 − 3 × 3 = 18. Therefore, the
F-statistic is:
\[ F = \frac{MS(\text{Int})}{MSE} = \frac{SS(\text{Int})/df(\text{Int})}{SSE/df(E)} = \frac{0.2970/4}{0.6781/18} = 1.9709 \]
We need to compare this to an F-distribution with 4 numerator and 18 denominator
degrees of freedom, and reject H0 when $F > F_{0.05,4,18} = 2.93$. Since
1.9709 < 2.93, we fail to reject H0 and we conclude that there is no interaction.
(e) [3 marks] Test whether there is a difference in the mean times between the three
shoe brands. Clearly state your hypotheses and use a significance level of α = 1%.
Solution:
The hypotheses are:
H0 : µf = µq = µs
H1 : At least two population means differ.
The SS(Shoe) in the two-way ANOVA is the same as the SST from the one-way
ANOVA of part (c) so the SS(Shoe) = 0.3059. The F -statistic is then equal to:
\[ F = \frac{MS(\text{Shoe})}{MSE} = \frac{SS(\text{Shoe})/df(\text{Shoe})}{SSE/df(E)} = \frac{0.3059/2}{0.6781/18} = 4.0600 \]
We need to compare this to an F-distribution with 2 numerator and 18 denominator
degrees of freedom, and reject H0 when $F > F_{0.01,2,18} = 6.01$. Since
4.0600 < 6.01, we fail to reject H0 and we conclude that there is no difference
in mean times between the three shoe brands.
(f) [4 marks] Test whether there is a difference in the mean times between the three
coaches. Clearly state your hypotheses and use a significance level of α = 1%.
Solution:
Let µa, µb and µc denote the population mean times for coach Andy, Bobby and
Carl, respectively. The hypotheses are:
H0 : µa = µb = µc
H1 : At least two population means differ.
In order to determine the SS(Coach), we need to find the total sum of squares
SS(Total). Since we are using the same data, the SS(Total) for the two-way
ANOVA is the same as the SS(Total) for the one-way ANOVA. Hence from the
one-way ANOVA:
SS(Total) = SST + SSE = 0.3059 + 1.2991 = 1.6050
From the SS(Total), we can now determine the SS(Coach):
SS(Coach) = SS(Total)− SS(Shoe)− SS(Int)− SSE
= 1.6050− 0.3059− 0.2970− 0.6781
= 0.3240
Hence we calculate the F -statistic to be:
\[ F = \frac{MS(\text{Coach})}{MSE} = \frac{SS(\text{Coach})/df(\text{Coach})}{SSE/df(E)} = \frac{0.3240/2}{0.6781/18} = 4.3003 \]
We need to compare this to an F-distribution with 2 numerator and 18 denominator
degrees of freedom, and reject H0 when $F > F_{0.01,2,18} = 6.01$. Since
4.3003 < 6.01, we fail to reject H0 and we conclude that there is no difference
in mean times between the three coaches.
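The two-way ANOVA sums of squares used in parts (d) to (f) can be rebuilt from the raw data, taking the first, middle and last three times in each row as coach Andy, Bobby and Carl. This is an illustrative sketch (names such as `cells` are assumptions), not the method prescribed by the exam.

```python
from statistics import mean

coaches = ["Andy", "Bobby", "Carl"]
times = {
    "Fast":   [9.82, 10.22, 10.15, 9.77, 10.12, 9.86, 9.87, 10.11, 10.18],
    "Quick":  [9.67, 9.95, 10.10, 10.16, 9.98, 10.09, 10.20, 10.68, 10.29],
    "Speedy": [9.69, 9.75, 9.49, 10.03, 9.66, 10.24, 10.04, 9.87, 10.01],
}
r = 3  # replicates per (brand, coach) cell
# cells[(brand, coach)] holds the r times for that combination
cells = {(b, c): row[r * j: r * j + r]
         for b, row in times.items() for j, c in enumerate(coaches)}

grand = mean(t for row in times.values() for t in row)
brand_mean = {b: mean(row) for b, row in times.items()}
coach_mean = {c: mean(t for (_, cc), v in cells.items() if cc == c for t in v)
              for c in coaches}

ss_total = sum((t - grand) ** 2 for row in times.values() for t in row)
ss_shoe = len(coaches) * r * sum((m - grand) ** 2 for m in brand_mean.values())
ss_coach = len(times) * r * sum((m - grand) ** 2 for m in coach_mean.values())
ss_error = sum((t - mean(v)) ** 2 for v in cells.values() for t in v)
ss_int = ss_total - ss_shoe - ss_coach - ss_error

df_error = len(cells) * (r - 1)          # 27 - 3*3 = 18
f_coach = (ss_coach / (len(coaches) - 1)) / (ss_error / df_error)

print(round(ss_coach, 4), round(ss_int, 4), round(ss_error, 4), round(f_coach, 4))
```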
Question 3 [18 marks]
We have a room of 6 people, all of whom are celebrating their birthday today. Each
person’s age (the age that they turned today) is listed below:
31, 18, 21, 21, 18, 29
(a) [3 marks] A study has shown that 70% of people younger than 22 are likely to
hold a party on their birthday, whereas only 22% of people older than 22 are likely
to hold a party on their birthday. For a person randomly selected from the room
of 6 people, find the probability that they will be holding a birthday party today.
Solution:
Let BP denote the event that the selected person will hold a birthday party
today and let the age of the selected person be denoted by the random variable
X. We can use the law of total probability by considering the events {X < 22},
{X = 22} and {X > 22}. For a person randomly selected from the room, we
know that $P(X < 22) = \frac{2}{3}$, $P(X = 22) = 0$ and $P(X > 22) = \frac{1}{3}$. So we get:
\begin{align*}
P(BP) &= P(BP \cap \{X < 22\}) + P(BP \cap \{X = 22\}) + P(BP \cap \{X > 22\}) \\
&= P(BP \mid \{X < 22\})P(X < 22) + P(BP \mid \{X = 22\})P(X = 22) + P(BP \mid \{X > 22\})P(X > 22) \\
&= 0.7 \times \tfrac{2}{3} + P(BP \mid \{X = 22\}) \times 0 + 0.22 \times \tfrac{1}{3} \\
&= \tfrac{27}{50} = 0.54
\end{align*}
Suppose that we select a sample of 2 people from this room (without replacement).
(b) [2 marks] Find the probability that the sample of size 2 contains the oldest person
in the room.
Solution:
Let SO denote the event that the sample contains the oldest person in the room
and let Xi denote the age of the ith person selected in the sample. Since 31 is the
oldest age in the room, the event SO can be denoted as {X1 = 31}∪{X2 = 31}.
Therefore:
\begin{align*}
P(SO) &= P(\{X_1 = 31\} \cup \{X_2 = 31\}) \\
&= P(X_1 = 31) + P(X_2 = 31) - P(\{X_1 = 31\} \cap \{X_2 = 31\}) \\
&= \tfrac{1}{6} + \tfrac{1}{6} - 0 \\
&= \tfrac{1}{3} = 0.3333
\end{align*}
(c) [2 marks] Find the probability that the oldest person in the sample of size 2 is
younger than 24 years old.
Solution:
Let Xi be defined as in part (b). For the oldest person in the sample to be
younger than 24, both the first person selected and the second person selected
must be younger than 24. Therefore we want to find the following probability:
\begin{align*}
P(\{X_1 < 24\} \cap \{X_2 < 24\}) &= P(\{X_2 < 24\} \mid \{X_1 < 24\})P(X_1 < 24) \\
&= \tfrac{3}{5} \times \tfrac{4}{6} \\
&= \tfrac{2}{5} = 0.4
\end{align*}
(d) [4 marks] Determine the sampling distribution of the age of the oldest person in
the sample of size 2.
Solution:
Let Y denote the age of the oldest person in the sample. The following table
lists all possible samples of size 2 selected without replacement, along with the
age of the oldest person in each sample:
Sample     y    Sample     y
(31, 18)   31   (18, 29)   29
(31, 21)   31   (21, 21)   21
(31, 21)   31   (21, 18)   21
(31, 18)   31   (21, 29)   29
(31, 29)   31   (21, 18)   21
(18, 21)   21   (21, 29)   29
(18, 21)   21   (18, 29)   29
(18, 18)   18
Each sample is equally likely to be selected, so we obtain the sampling
distribution of Y by counting the possible outcomes.
y      18     21            29     31
p(y)   1/15   5/15 = 1/3    4/15   5/15 = 1/3
(e) [3 marks] Calculate the expected value and variance of the sampling distribution
of the age of the oldest person in the sample of size 2.
Solution:
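\begin{align*}
E(Y) &= \sum_{\text{all } y} (y \times p(y)) \\
&= 18 \times \tfrac{1}{15} + 21 \times \tfrac{1}{3} + 29 \times \tfrac{4}{15} + 31 \times \tfrac{1}{3} \\
&= \tfrac{394}{15} = 26.2667
\end{align*}
\begin{align*}
V(Y) &= E(Y^2) - (E(Y))^2 \\
&= \sum_{\text{all } y} \left( y^2 \times p(y) \right) - \left( \tfrac{394}{15} \right)^2 \\
&= 18^2 \times \tfrac{1}{15} + 21^2 \times \tfrac{1}{3} + 29^2 \times \tfrac{4}{15} + 31^2 \times \tfrac{1}{3} - \left( \tfrac{394}{15} \right)^2 \\
&= \tfrac{5234}{225} = 23.2622
\end{align*}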
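The sampling distribution in part (d) and the moments in part (e) can be verified by enumerating all 15 samples with exact fractions. This is a small sketch with illustrative names; index pairs are used so that people with equal ages are treated as distinct.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ages = [31, 18, 21, 21, 18, 29]

# All unordered samples of 2 distinct people (15 in total)
samples = list(combinations(range(len(ages)), 2))
counts = Counter(max(ages[i], ages[j]) for i, j in samples)

# Each sample is equally likely, so p(y) = count / number of samples
pmf = {y: Fraction(c, len(samples)) for y, c in sorted(counts.items())}
mean_y = sum(y * p for y, p in pmf.items())
var_y = sum(y * y * p for y, p in pmf.items()) - mean_y ** 2

print(pmf, mean_y, var_y)
```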
(f) [4 marks] Suppose we took 10 samples of size 2, without replacement, from the
room of 6 people (but before each new sample was taken, the previous sample was
returned to the room). Find the probability that in more than 7 samples the oldest
person in the sample was older than 30.
Solution:
Let W denote the number of the 10 samples for which the oldest person in the
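sample was older than 30. Then $W \sim \text{Bin}(n = 10,\ p = P(Y > 30) = \frac{1}{3})$.
Therefore we can calculate:
\begin{align*}
P(W > 7) &= P(W = 8) + P(W = 9) + P(W = 10) \\
&= \frac{10!}{8! \times 2!} \times \left( \tfrac{1}{3} \right)^8 \times \left( \tfrac{2}{3} \right)^2 + 10 \times \left( \tfrac{1}{3} \right)^9 \times \tfrac{2}{3} + \left( \tfrac{1}{3} \right)^{10} \\
&= 0.0034
\end{align*}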
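The binomial tail probability in part (f) can be evaluated exactly. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 3)     # 10 independent samples, P(Y > 30) = 1/3

# P(W > 7) = P(W = 8) + P(W = 9) + P(W = 10)
prob = sum(comb(n, w) * p**w * (1 - p) ** (n - w) for w in range(8, n + 1))

print(prob, round(float(prob), 4))
```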
Question 4 [21 marks]
Let X and Y be independent continuous random variables with the following probability
density functions:
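\[
f(x) = \begin{cases} \frac{4}{9}, & 0 \le x < 1 \\ -\frac{2}{9}x + \frac{6}{9}, & 1 \le x < 2 \\ \frac{2}{9}, & 2 \le x \le 3 \end{cases}
\quad \text{and} \quad
f(y) = \begin{cases} -y + 1, & 0 \le y < 1 \\ y - 1, & 1 \le y \le 2 \end{cases}
\]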
(a) [3 marks] Find the probability that X is between 1.2 and 2.2.
Solution:
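[Figure: graph of f(x) against x for 0 ≤ x ≤ 3, with the region between x = 1.2 and x = 2.2 shaded red.]
We want to find the red shaded area, which we can break into regions (either a
triangle and two rectangles or a trapezium and a rectangle):
\begin{align*}
P(1.2 < X < 2.2) &= P(1.2 < X < 2) + P(2 < X < 2.2) \\
&= \text{Area of trapezium} + \text{Area of rectangle} \\
&= \left( \tfrac{1}{2} \times (\text{Height at } x = 1.2 + \text{Height at } x = 2) \times (2 - 1.2) \right) + \left( \tfrac{2}{9} \times (2.2 - 2) \right) \\
&= \left( \tfrac{1}{2} \times \left( \tfrac{2}{5} + \tfrac{2}{9} \right) \times 0.8 \right) + \tfrac{2}{45} \\
&= \tfrac{22}{75} = 0.2933
\end{align*}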
(b) [3 marks] Find the probability that X is between 1.2 and 2.2, given that it is
larger than 1.5.
Solution:
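[Figure: graph of f(x) against x for 0 ≤ x ≤ 3, with shaded regions bounded at x = 1.2, 1.5, 2 and 2.2.]
We want to find the red and blue shaded area, then divide that by the blue
shaded area:
\begin{align*}
P(\{1.2 < X < 2.2\} \mid \{X > 1.5\}) &= \frac{P(\{1.2 < X < 2.2\} \cap \{X > 1.5\})}{P(X > 1.5)} \\
&= \frac{P(1.5 < X < 2.2)}{P(X > 1.5)} \\
&= \frac{P(1.5 < X < 2) + P(2 < X < 2.2)}{P(1.5 < X < 2) + P(X > 2)} \\
&= \frac{\left( \tfrac{1}{2} \times \left( \tfrac{1}{3} + \tfrac{2}{9} \right) \times 0.5 \right) + \tfrac{2}{45}}{\left( \tfrac{1}{2} \times \left( \tfrac{1}{3} + \tfrac{2}{9} \right) \times 0.5 \right) + 1 \times \tfrac{2}{9}} \\
&= \tfrac{33}{65} = 0.5077
\end{align*}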
(c) [3 marks] Find the probability that Y is between 0.7 and 1.1 or between 1.2 and
1.9.
Solution:
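[Figure: graph of f(y) against y for 0 ≤ y ≤ 2, with the regions 0.7 < y < 1.1 and 1.2 < y < 1.9 shaded.]
We want to find the red and blue shaded areas:
\begin{align*}
P(\{0.7 < Y < 1.1\} \cup \{1.2 < Y < 1.9\}) &= P(0.7 < Y < 1.1) + P(1.2 < Y < 1.9) \\
&= P(0.7 < Y < 1) + P(1 < Y < 1.1) + P(1.2 < Y < 1.9) \\
&= \tfrac{1}{2} \times 0.3 \times 0.3 + \tfrac{1}{2} \times 0.1 \times 0.1 + \tfrac{1}{2} \times (0.2 + 0.9) \times 0.7 \\
&= \tfrac{87}{200} = 0.435
\end{align*}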
(d) [3 marks] Find the probability that X is between 1.2 and 2.2 or Y is between 0.7
and 1.1.
Solution:
Since we are told that X and Y are independent, using the Addition Rule we
get:
\begin{align*}
P(\{1.2 < X < 2.2\} \cup \{0.7 < Y < 1.1\}) &= P(1.2 < X < 2.2) + P(0.7 < Y < 1.1) - P(1.2 < X < 2.2) \times P(0.7 < Y < 1.1) \\
&= \tfrac{22}{75} + \tfrac{1}{20} - \tfrac{22}{75} \times \tfrac{1}{20} \\
&= \tfrac{493}{1500} = 0.3287
\end{align*}
(e) [2 marks] Find the probability that X is between 1 and 1.9, given that Y is
between 0.7 and 1.1.
Solution:
We want to find P ({1 < X < 1.9}|{0.7 < Y < 1.1}). However, since X and Y
are independent, this probability is equal to P (1 < X < 1.9).
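[Figure: graph of f(x) against x for 0 ≤ x ≤ 3, with the region between x = 1 and x = 1.9 shaded red.]
Therefore, we want to find the red shaded area, which we can calculate as the
area of a trapezium:
\[ P(1 < X < 1.9) = \tfrac{1}{2} \times \left( \tfrac{4}{9} + \tfrac{11}{45} \right) \times 0.9 = \tfrac{31}{100} = 0.31 \]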
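The areas found geometrically in parts (a), (b) and (e) can be cross-checked by integrating f(x) piece by piece into a cumulative distribution function. The sketch below uses exact fractions; `cdf_x` and `prob_x` are hypothetical helper names introduced only for this check.

```python
from fractions import Fraction as F

def cdf_x(x):
    """P(X <= x) for the piecewise density f(x) given in Question 4."""
    x = F(x)
    if x <= 0:
        return F(0)
    if x < 1:
        return F(4, 9) * x
    if x < 2:
        # 4/9 plus the integral of (-2t/9 + 6/9) from 1 to x
        return F(4, 9) - (x**2 - 1) / 9 + F(6, 9) * (x - 1)
    if x <= 3:
        return F(7, 9) + F(2, 9) * (x - 2)
    return F(1)

def prob_x(lo, hi):
    """P(lo < X < hi) as a difference of CDF values."""
    return cdf_x(hi) - cdf_x(lo)

p_a = prob_x("1.2", "2.2")                          # part (a)
p_b = prob_x("1.5", "2.2") / (1 - cdf_x("1.5"))     # part (b)
p_e = prob_x("1", "1.9")                            # part (e)

print(p_a, p_b, p_e)
```

Passing the endpoints as strings lets `Fraction` parse decimals such as 1.2 exactly, which avoids floating-point noise in the comparison with 22/75, 33/65 and 31/100.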
For parts (f) and (g), we can assume that $\mu_X = E(X) = \frac{34}{27}$ and $\sigma_X^2 = V(X) = 0.69204$.
(f) [3 marks] Find the probability that, from a sample of size n = 50 taken from the
distribution of X, the sample mean lies between 1.2 and 1.5.
Solution:
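From the CLT, we know that approximately $\bar{X} \sim N\left( \mu_{\bar{X}} = \mu_X = \frac{34}{27},\ \sigma^2_{\bar{X}} = \frac{\sigma_X^2}{n} = \frac{0.69204}{50} \right)$.
\begin{align*}
P(1.2 < \bar{X} < 1.5) &= P\left( \frac{1.2 - \frac{34}{27}}{\sqrt{\frac{0.69204}{50}}} < \frac{\bar{X} - \mu_X}{\sqrt{\frac{\sigma_X^2}{n}}} < \frac{1.5 - \frac{34}{27}}{\sqrt{\frac{0.69204}{50}}} \right) \\
&= P(-0.50 < Z < 2.05) \\
&= P(Z < 2.05) - P(Z < -0.50) \\
&= 0.9798 - 0.3085 \\
&= 0.6713
\end{align*}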
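The normal probabilities in part (f) come from the standard normal table. As a rough check, the sketch below evaluates the normal CDF with `math.erf` and rounds the z-values to two decimals to mimic a table lookup; the function name `phi` is illustrative.

```python
import math

def phi(z):
    """Standard normal CDF, P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, var, n = 34 / 27, 0.69204, 50
se = math.sqrt(var / n)                   # standard deviation of the sample mean

# Round to 2 decimals, as one would before using a printed Z table
z_lo = round((1.2 - mu) / se, 2)
z_hi = round((1.5 - mu) / se, 2)
prob = phi(z_hi) - phi(z_lo)

print(z_lo, z_hi, round(prob, 4))
```

Without the rounding step the probability comes out slightly higher (roughly 0.672), which is why answers computed by software can differ from table-based answers in the third decimal place.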
(g) [4 marks] Suppose a sample of size 50 taken from the distribution of Y produced
the following sample statistics: $\sum_{i=1}^{50} Y_i = 44$ and $\sum_{i=1}^{50} (Y_i - \bar{Y})^2 = 45.08$. Based
on this data, test whether the expected value of Y is less than the expected value
of X. Clearly state your hypotheses and use a significance level of α = 5%.
Solution:
We want to test the following hypotheses:
\[ H_0 : \mu_Y = \tfrac{34}{27} \qquad H_1 : \mu_Y < \tfrac{34}{27} \]
From the data we can calculate $\bar{Y} = \frac{44}{50} = 0.88$ and $s_Y^2 = \frac{45.08}{49} = 0.92$. We use a
T-statistic to test these hypotheses:
\[ T = \frac{\bar{Y} - \mu_0}{\sqrt{s_Y^2 / n}} = \frac{0.88 - \frac{34}{27}}{\sqrt{0.92 / 50}} = -2.7959 \]
We need to compare this to a t-distribution with n − 1 = 49 degrees of freedom
and reject H0 when $T < -t_{0.05,49} \approx -t_{0.05,50} = -1.676$. Since −2.7959 < −1.676,
we reject H0 and we conclude that the expected value of Y is less than
the expected value of X.
Question 5 [22 marks]
Suppose X, the amount of time a person stays in bed after their alarm goes off, is
uniformly distributed between 5 and a minutes. Also, suppose Y , the number of minutes
they are late to work is uniformly distributed between 0 and b minutes. Over 100 days,
how long they slept past their alarm (X) and how late they were to work (Y ) were
recorded (in minutes). However, only the number of days for which X and Y fell within
certain ranges was reported in the table below:
            5 < X < 7   7 < X < 8   8 < X < 10   Totals
0 < Y < 2   18          9           12
2 < Y < 3   9           4           12
3 < Y < 5   13          9           14
Totals                                           100
(a) [5 marks] Based on the data given above, test whether a is equal to 10. Clearly
state your hypotheses and use a significance level of α = 5%.
Solution:
The given ranges of X values can be thought of as a categorical variable with
3 categories. If X was uniformly distributed between 5 and a = 10 minutes,
then the observed count in each range should be close to the expected count
we would see under the given uniform distribution. Since we are comparing
observed counts to expected counts (for a single categorical variable), we should
be performing a Chi-squared goodness-of-fit test. The hypotheses are:
H0 : a = 10 (i.e., X is uniformly distributed between 5 and 10)
H1 : a ≠ 10 (i.e., X is not uniformly distributed between 5 and 10)
We need to calculate the probability of falling into each category under the
assumption that a = 10. We use the uniform probability density function to
calculate the probabilities:
\[ p_1 = P(5 < X < 7) = \tfrac{2}{5}, \quad p_2 = P(7 < X < 8) = \tfrac{1}{5}, \quad p_3 = P(8 < X < 10) = \tfrac{2}{5} \]
So now we can also state our hypotheses as:
H0 : $p_1 = \frac{2}{5}$, $p_2 = \frac{1}{5}$, $p_3 = \frac{2}{5}$
H1 : The probabilities do not match those given above.
From H0, we calculate the expected counts to be e1 = 40, e2 = 20 and e3 = 40.
The observed counts are obtained from the column totals and are equal to
f1 = 40, f2 = 22, f3 = 38. The $\chi^2$-statistic is therefore equal to:
\[ \chi^2 = \sum_{i=1}^{3} \frac{(f_i - e_i)^2}{e_i} = \frac{(40-40)^2}{40} + \frac{(22-20)^2}{20} + \frac{(38-40)^2}{40} = 0.3 \]
We need to compare this to a chi-squared distribution with 3 − 1 = 2 degrees
of freedom and reject H0 when $\chi^2 > \chi^2_{0.05,2} = 5.99$. Since 0.3 < 5.99, we fail to
reject H0 and we conclude that a = 10.
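The goodness-of-fit statistic in part (a) can be reproduced from the table of counts. This is an illustrative sketch; the names `edges` and `probs` are assumptions, and the expected proportions come from the Uniform(5, 10) density under H0.

```python
# Observed counts: rows are Y ranges, columns are X ranges
observed = [[18, 9, 12],
            [9, 4, 12],
            [13, 9, 14]]

col_totals = [sum(col) for col in zip(*observed)]   # observed counts for X ranges
n = sum(col_totals)

a = 10                                    # value of a under H0
edges = [5, 7, 8, 10]                     # boundaries of the X ranges
probs = [(hi - lo) / (a - 5) for lo, hi in zip(edges, edges[1:])]
expected = [n * p for p in probs]

chi2 = sum((f - e) ** 2 / e for f, e in zip(col_totals, expected))
df = len(col_totals) - 1                  # categories minus 1

print(col_totals, expected, round(chi2, 4), df)
```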
(b) [3 marks] Calculate a 90% confidence interval for the probability (population pro-
portion) that the person sleeps in between 5 and 7 minutes and is between 0 and 2
minutes late for work.
Solution:
Let p denote the probability that a person sleeps in between 5 and 7 minutes
and is between 0 and 2 minutes late for work, i.e.:
\[ p = P(\{5 < X < 7\} \cap \{0 < Y < 2\}) \]
From the table of counts, we calculate the sample proportion to be $\hat{p} = \frac{18}{100} = 0.18$.
Hence a 90% confidence interval for p is given by:
\[ \hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.18 \pm 1.645 \times \sqrt{\frac{0.18 \times 0.82}{100}} = (0.1168, 0.2432) \]
(c) [4 marks] Test whether the probability (population proportion) that the person
is between 2 and 3 minutes late for work is greater than 0.18. Clearly state your
hypotheses and use a significance level of α = 1%.
Solution:
Let p denote the probability that the person is between 2 and 3 minutes late
for work. The hypotheses are:
H0 : p = 0.18
H1 : p > 0.18
From the data we can calculate the sample proportion, $\hat{p} = \frac{25}{100} = 0.25$. The
test statistic is given by:
\[ Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{0.25 - 0.18}{\sqrt{\frac{0.18 \times (1 - 0.18)}{100}}} = 1.8220 \]
As this is an upper-tailed test, the rejection region is given by $Z > z_{0.01} = 2.33$.
Since 1.8220 < 2.33, we fail to reject H0 and we conclude that the probability
is not greater than 0.18.
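The confidence interval in part (b) and the Z-statistic in part (c) use the same standard-error formula with different proportions plugged in. A short sketch, with illustrative names and the table value 1.645 hard-coded:

```python
import math

n = 100

# Part (b): 90% confidence interval, standard error based on p_hat
p_hat_b = 18 / n
z_crit = 1.645                                    # z_{0.05} from the normal table
half_width = z_crit * math.sqrt(p_hat_b * (1 - p_hat_b) / n)
ci = (p_hat_b - half_width, p_hat_b + half_width)

# Part (c): upper-tailed test, standard error based on p0 under H0
p0 = 0.18
p_hat_c = 25 / n
z_stat = (p_hat_c - p0) / math.sqrt(p0 * (1 - p0) / n)

print(tuple(round(v, 4) for v in ci), round(z_stat, 4))
```

The interval uses the sample proportion in the standard error, whereas the test uses the hypothesised value p0; here both happen to equal 0.18.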
(d) [5 marks] Based on the data above, test whether X and Y are independent (that
is, whether the rows and columns are independent). Clearly state your hypotheses
and use a significance level of α = 5%.
Solution:
We should perform a Chi-squared test of a contingency table. The hypotheses
are:
H0 : X and Y are independent.
H1 : X and Y are not independent.
Under the assumption of independence, the expected counts are calculated using
the row and column totals and are given in the table below:
            5 < X < 7   7 < X < 8   8 < X < 10
0 < Y < 2   15.6        8.58        14.82
2 < Y < 3   10          5.5         9.5
3 < Y < 5   14.4        7.92        13.68
Each expected count is (row total × column total)/n; for example, the first cell
is 39 × 40/100 = 15.6. The test statistic is given by:
\[ \chi^2 = \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} = 2.3842 \]
We need to compare this to a chi-squared distribution with (3 − 1) × (3 − 1) = 4
degrees of freedom and reject H0 when $\chi^2 > \chi^2_{0.05,4} = 9.49$. Since 2.3842 < 9.49,
we fail to reject H0 and we conclude that X and Y are independent based on
the data.
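The expected counts and the chi-squared statistic for the contingency table in part (d) can be recomputed as follows. This is a minimal sketch with illustrative names.

```python
# Observed counts: rows are Y ranges, columns are X ranges
observed = [[18, 9, 12],
            [9, 4, 12],
            [13, 9, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, e_ij = (row total i) * (column total j) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((f - e) ** 2 / e
           for f_row, e_row in zip(observed, expected)
           for f, e in zip(f_row, e_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(row_totals, col_totals, round(chi2, 4), df)
```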
The following sample statistics were produced from the sample of 100 sleep-in times and
the sample of 100 late-for-work times: $\bar{X} = 7.51092$, $s_X^2 = 1.894077$, $\bar{Y} = 2.488363$ and
$s_Y^2 = 1.999385$. For part (e), assume that the samples of sleep-in times and late-for-work
times are independent, and that $\sigma_X^2 = \sigma_Y^2$.
(e) [5 marks] Test whether a exceeds b by more than 3 minutes. Clearly state your
hypotheses and use a significance level of α = 5%.
Solution:
The hypotheses are:
H0 : a− b = 3
H1 : a− b > 3
However, since the data given are the sample means, we should express the
hypotheses in terms of the population means, $\mu_X$ and $\mu_Y$. As X and Y are
uniformly distributed, we know that $\mu_X = \frac{a+5}{2}$ and $\mu_Y = \frac{b}{2}$. Rearranging
these equations we obtain $a = 2\mu_X - 5$ and $b = 2\mu_Y$, which means
$a - b = 2(\mu_X - \mu_Y) - 5$. Therefore we can also express the hypotheses as:
H0 : µX − µY = 4
H1 : µX − µY > 4
Since the two sample sizes are equal, the pooled sample variance is
$s_p^2 = \frac{1.894077 + 1.999385}{2} = 1.9467$. Now we calculate a T-statistic:
\[ T = \frac{(\bar{X} - \bar{Y}) - D_0}{\sqrt{s_p^2 \left( \frac{1}{n_X} + \frac{1}{n_Y} \right)}} = \frac{(7.51092 - 2.488363) - 4}{\sqrt{1.9467 \left( \frac{1}{100} + \frac{1}{100} \right)}} = 5.1823 \]
We need to compare this to a t-distribution with $n_X + n_Y - 2 = 198$ degrees
of freedom and reject H0 when $T > t_{0.05,198} \approx t_{0.05,200} = 1.653$. Since 5.1823 >
1.653, we reject H0 and we conclude that a exceeds b by more than 3 minutes.
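The translation from a − b to µX − µY and the resulting pooled T-statistic in part (e) can be sketched in Python. The variable names are illustrative, and the equal-variance assumption is the one stated in the question.

```python
import math

n_x = n_y = 100
x_bar, s2_x = 7.51092, 1.894077
y_bar, s2_y = 2.488363, 1.999385

# a - b = 2(mu_X - mu_Y) - 5, so a - b = 3 corresponds to mu_X - mu_Y = (3 + 5) / 2
d0 = (3 + 5) / 2

# Pooled variance under the stated equal-variance assumption
s2_p = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)
t_stat = (x_bar - y_bar - d0) / math.sqrt(s2_p * (1 / n_x + 1 / n_y))

print(d0, round(s2_p, 4), round(t_stat, 4))
```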
END OF EXAMINATION