CC0294 Semester 1 – 2016 Page 1 of 14
SEAT NUMBER:
LAST NAME:
FIRST NAME:
SID:
CONFIDENTIAL EXAM PAPER
This paper is not to be removed from the examination room.
ECMT1010
Introduction to Economic Statistics
End of Semester Examination
Semester 1 – 2016
Total Duration: 2 hours and 10 minutes
Writing Time: 2 hours
INSTRUCTIONS TO CANDIDATES
1. This is a closed book exam. Electronic devices, apart from a simple, non-programmable
calculator, are not permitted. Formulas, definitions, and distribution tables are provided
at the end of the exam.
2. This exam contains 30 Multiple Choice Questions and 2 Problems with multiple parts.
3. Multiple Choice Questions are worth 0.5 marks each. Problems are worth 20 marks each.
The marks for each part of the Problems are indicated. Marks total 55.
4. Answer all Multiple Choice Questions on the answer sheet provided for that purpose.
Answer the Problems in the Exam Booklets provided.
5. This question paper must be returned with the Multiple Choice answer sheet and the
Please check your examination paper is complete (14 pages) and indicate you have done
this by signing below.
I have checked the examination paper and affirm it is complete.
Student Signature: Date:
30 Multiple Choice Questions [15 marks total—suggested time approx. 32 minutes].
1. A bank reports that 30% of households have a MasterCard, 20% have an American Express card, and
25% have a Visa card. Eight percent of households have both a MasterCard and an American Express
card. Twelve percent have both a Visa card and a MasterCard. Six percent have both an American
Express card and a Visa card. If a household has a MasterCard, what is the probability it also has a
Visa card?
A) 0.12
B) 0.25
C) 0.40
D) 0.43
E) 0.48
2. Lenovo Group Limited, a Hong Kong IT company, has a 30% share of the Hong Kong PC market.
Suppose 10 new PC buyers are selected at random from the Hong Kong population. What is the
probability that fewer than 3 bought their PC from Lenovo?
A) 0.028
B) 0.121
C) 0.233
D) 0.267
E) 0.382
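A minimal Python sketch of the binomial arithmetic behind Question 2, assuming each of the 10 buyers independently buys Lenovo with probability 0.30. The helper name `binom_pmf` is illustrative only.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.30  # 10 buyers, 30% Lenovo market share
# "Fewer than 3" means X = 0, 1 or 2.
prob_fewer_than_3 = sum(binom_pmf(k, n, p) for k in range(3))
print(f"P(X < 3) = {prob_fewer_than_3:.4f}")
```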
3. A pair of (fair) dice is rolled once. What is the probability that the sum of the values on the two die
faces is 7?
A) 1/6
B) 7/36
C) 1/2
D) 1/18
E) 1/3
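One way to check Question 3 (and, through the complement rule, Question 11) is to enumerate all 36 equally likely ordered outcomes of two fair dice. This is a sketch using exact fractions.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered pairs
sevens = [o for o in outcomes if sum(o) == 7]    # (1,6), (2,5), ..., (6,1)
p_seven = Fraction(len(sevens), len(outcomes))
print(p_seven)       # 1/6
print(1 - p_seven)   # 5/6, the complement used in Question 11
```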
4. Calculate the expected value (mean) of the following discrete probability distribution:
x 1 2 3 4
p(x) 0.16 0.26 0.26 0.32
A) 1
B) 2.5
C) 1.86
D) 2.74
E) 2.0
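The expected value in Question 4 is the probability-weighted sum $\sum x\,p(x)$ from the formula sheet. A short sketch of that sum:

```python
xs = [1, 2, 3, 4]
ps = [0.16, 0.26, 0.26, 0.32]
assert abs(sum(ps) - 1) < 1e-9  # probabilities must sum to 1

mean = sum(x * p for x, p in zip(xs, ps))  # 0.16 + 0.52 + 0.78 + 1.28
print(round(mean, 2))  # 2.74
```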
5. A psychologist is interested in testing whether the mean IQ of statisticians is higher than 100. Based on a
random sample of 100 statisticians, the sample mean of IQ is 120. What is the p-value of the test
assuming a population standard deviation of 100?
A) 0.0000
B) 0.0228
C) 0.0456
D) 0.4207
E) 0.8414
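A sketch of the one-sided z test in Question 5, assuming the stated population standard deviation is known, so that $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$ follows N(0,1). The normal CDF is built from `math.erf`.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, xbar, sigma, n = 100, 120, 100, 100
z = (xbar - mu0) / (sigma / sqrt(n))  # 20 / 10 = 2.0
p_value = 1 - normal_cdf(z)           # right tail, since Ha is mu > 100
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```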
Scenario 1 In a marketing research project, a major supermarket wants to study the relationship between
the annual consumption of ramen noodles ($Y$, in number of packs) and the annual income level of
consumers ($X$, in thousands of dollars). Based on a random sample of 100 customers, the linear regression model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
is estimated with the following result:
Variable | Coefficient | Standard Error
Intercept | 55.4 | 32.3
Annual income ($000s) | −0.22 | 0.1
6. Refer to Scenario 1. What is the predicted annual consumption (in number of packs) of ramen
noodles for a consumer who earns $100,000 a year?
A) 2.2
B) 22.0
C) 33.4
D) 53.2
E) 55.4
7. Refer to Scenario 1. To see whether income level has an effect on the consumption of ramen noodles,
Adam, Simon and Tim consider the following hypotheses:
$H_0: \beta_1 = 0$    $H_a: \beta_1 \neq 0$
They arrive at the following conclusions:
Adam: The null hypothesis is rejected at the 5% significance level.
Simon: The null hypothesis is rejected at the 2% significance level.
Tim: The null hypothesis is rejected at the 1% significance level.
Who is/are correct?
A) Tim only
B) Simon only
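A sketch of the Scenario 1 arithmetic: the prediction for Question 6 and the two-sided t tests for Question 7. The critical values are copied from the df = 98 row of the t table at the end of the paper (df = n − 2 = 98); a two-sided test at level α uses the right-tail α/2 column.

```python
b0, b1, se_b1 = 55.4, -0.22, 0.1
income = 100  # $100,000 expressed in thousands of dollars
y_hat = b0 + b1 * income
print(f"predicted packs: {y_hat:.1f}")  # 33.4

t_stat = b1 / se_b1  # t statistic for H0: beta1 = 0
# alpha -> t* for the right-tail alpha/2 column, df = 98
crit = {0.05: 1.984, 0.02: 2.365, 0.01: 2.627}
decisions = {alpha: abs(t_stat) > t_star for alpha, t_star in crit.items()}
for alpha, reject in decisions.items():
    print(f"alpha = {alpha:.2f}: |t| = {abs(t_stat):.2f} -> {'reject' if reject else 'fail to reject'}")
```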
8. Alcohol content in beer is believed to follow a normal distribution. A chemist takes a sample of
9 bottles of beer and measures the alcohol content, finding a sample mean of 7.5% and a sample
standard deviation of 1%. The chemist wishes to compute a 90% confidence interval for the mean.
However, the chemist mistakenly treats the sample standard deviation as if it were the population
standard deviation. What is the confidence interval constructed by the chemist?
A) (6.647, 8.153)
B) (6.952, 8.048)
C) (6.880, 8.120)
D) (7.073, 7.927)
E) (7.034, 7.966)
9. Suppose the chemist in the previous question realises he has made a mistake. If he corrects his mistake
and recalculates the confidence interval using the same sample, how will the new confidence interval
compare to the previous one?
A) The new interval will be the same width as the previous one and will be shifted to the left to
account for small sample bias.
B) The new interval will be wider than the previous one and will be centered around the same
point estimate.
C) The new interval will be narrower than the previous one and will be centered around the same
point estimate.
D) The new interval will be narrower than the previous one and will be shifted to the left to account
for small sample bias.
E) This cannot be determined from the data given.
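A sketch comparing the two intervals in Questions 8 and 9. The "wrong" interval treats s as σ and uses z* = 1.645; the corrected one uses t* = 1.860 from the df = 8 row of the t table. Both are centred on the same sample mean.

```python
from math import sqrt

xbar, s, n = 7.5, 1.0, 9
se = s / sqrt(n)  # 1/3
z_star = 1.645    # 90% confidence, N(0,1) table
t_star = 1.860    # 90% confidence, t table with df = n - 1 = 8

wrong = (xbar - z_star * se, xbar + z_star * se)
right = (xbar - t_star * se, xbar + t_star * se)
print("z-based:", tuple(round(v, 3) for v in wrong))
print("t-based:", tuple(round(v, 3) for v in right))
```

Because t* exceeds z* for the same confidence level, the corrected interval is wider around the same centre.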
10. A lecturer hires a tutor to mark exam papers. To ensure that the tutor is grading correctly, the
lecturer marks a few exam papers herself and compares her mark with the mark given by the tutor.
She chooses these papers by physically going through the pile of exams and pulling out a paper “when
she feels like it”. This corresponds to which form of sampling?
A) Judgement sampling
B) Simple random sampling
C) Systematic sampling
D) Cluster sampling
E) Snowball sampling
11. A pair of (fair) dice is rolled once. What is the probability that the sum of the values on the two die
faces is not a 7?
A) 1/6
B) 7/36
C) 1/3
D) 1/2
E) 5/6
Scenario 2 The following table is derived from the Banerjee et al. (2010) study on vaccination rates in
India.
Outcome | Control Group | Treatment Only | Treatment plus Incentive
Children not fully immunised | 810 | 311 | 234
Children fully immunised | 50 | 68 | 148
12. Refer to Scenario 2. What is the point estimate of the proportion of children that were fully
immunised in villages that received only the treatment and not the additional incentives?
A) 0.613
B) 0.219
C) 0.179
D) 0.821
E) 0.387
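The point estimate in Question 12 uses only the "Treatment Only" column: fully immunised children divided by all children in that column. A one-line check:

```python
not_immunised, immunised = 311, 68  # "Treatment Only" column of Scenario 2
p_hat = immunised / (immunised + not_immunised)  # 68 / 379
print(f"{p_hat:.3f}")  # 0.179
```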
13. Refer to Scenario 2. Suppose a researcher wants to test the null hypothesis that the true proportion
of children that were fully immunised in villages that received only the treatment was exactly 0.2 at
the 1% level of significance. What critical value will the researcher have to look up in the appropriate
statistical table in order to do this?
A) 1.28
B) 1.645
C) 1.96
D) 2.33
E) 2.575
14. Complete the following sentence to arrive at the correct statement of the Central Limit Theorem: “If
samples of size $n$ are drawn randomly from a population with mean $\mu$ and standard deviation $\sigma$ . . . ”
A) then repeated observations of the sample mean $\bar{x}$ will follow a normal distribution with mean
$\mu$ and standard deviation $\sigma$, regardless of the underlying distribution.
B) then if the sample size is sufficiently large ($n \geq 30$), repeated observations of the sample mean
$\bar{x}$ will follow a normal distribution with mean $\mu$ and standard deviation $\sigma$, regardless of the
underlying distribution.
C) then if the sample size is sufficiently large ($n \geq 10$), repeated observations of the sample mean
$\bar{x}$ will follow a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, regardless of
the underlying distribution.
D) then if the sample size is sufficiently large ($n \geq 30$), repeated observations of the sample mean
$\bar{x}$ will follow a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, regardless of
the underlying distribution.
E) then if the sample size is sufficiently large ($n \geq 30$), repeated observations of the sample mean
$\bar{x}$ will follow a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, as long as the
underlying distribution is normal.
15. A researcher is interested in the following hypotheses about the mean of a population:
$H_0: \mu \leq 2$    $H_a: \mu > 2$
Based on a sample of 45 observations, the researcher calculates a t statistic of 2.2. At the 1% level
of significance, what is the researcher’s conclusion?
A) The researcher is unable to reject a false null hypothesis.
B) The researcher fails to reject the null hypothesis.
C) The researcher accepts the null hypothesis.
D) The researcher rejects the null hypothesis.
E) The researcher commits a type I error.
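Question 15 needs df = 44, which is not a row in the t table provided. A sketch of a bracketing argument, assuming only that t critical values decrease as df increases: the df = 44 value must lie between the df = 38 and df = 98 entries in the 0.01 column. The function name `one_sided_decision` is hypothetical.

```python
def one_sided_decision(t_stat: float, crit_fewer_df: float, crit_more_df: float) -> str:
    """Right-tailed t test when the exact df row is missing from the table.

    crit_fewer_df / crit_more_df come from rows with fewer / more degrees of
    freedom than needed; the exact critical value lies between them.
    """
    lower, upper = crit_more_df, crit_fewer_df
    if t_stat > upper:
        return "reject H0"
    if t_stat < lower:
        return "fail to reject H0"
    return "inconclusive from table"

# Right-tail 0.01 column: df = 38 -> 2.427, df = 98 -> 2.365
print(one_sided_decision(2.2, crit_fewer_df=2.427, crit_more_df=2.365))
```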
Scenario 3 On the first day of class, students in an introductory economics course were asked their sex
and eye color. The results are summarized in the table below.
Sex | Blue | Brown | Green | Hazel | All
Female | 24 | 21 | 10 | 11 | 66
Male | 20 | 17 | 8 | 10 | 55
Total | 44 | 38 | 18 | 21 | 121
16. Refer to Scenario 3. What is the probability that a randomly selected student in the class is a female
or has brown eyes?
A) 0.660
B) 0.860
C) 0.314
D) 0.545
E) 0.686
17. Refer to Scenario 3. What is the probability that a randomly selected student in the class is a female
and has hazel eyes?
A) 0.634
B) 0.091
C) 0.149
D) 0.174
E) 0.545
18. Refer to Scenario 3. What is the probability that a randomly selected student is a male, if we know
that they have hazel eyes?
A) 0.476
B) 0.182
C) 0.083
D) 0.078
E) 0.455
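A sketch that reads Questions 16 to 18 straight off the Scenario 3 table with exact fractions: the addition rule for "or", a single cell for "and", and a column restriction for the conditional probability.

```python
from fractions import Fraction

counts = {
    "Female": {"Blue": 24, "Brown": 21, "Green": 10, "Hazel": 11},
    "Male":   {"Blue": 20, "Brown": 17, "Green": 8,  "Hazel": 10},
}
total = sum(sum(row.values()) for row in counts.values())  # 121
female = sum(counts["Female"].values())                     # 66
brown = sum(row["Brown"] for row in counts.values())        # 38
hazel = sum(row["Hazel"] for row in counts.values())        # 21

# Q16: P(F or Brown) = P(F) + P(Brown) - P(F and Brown)
p_f_or_brown = Fraction(female + brown - counts["Female"]["Brown"], total)
# Q17: joint probability is a single cell over the grand total
p_f_and_hazel = Fraction(counts["Female"]["Hazel"], total)
# Q18: conditioning on hazel eyes restricts attention to the Hazel column
p_m_given_hazel = Fraction(counts["Male"]["Hazel"], hazel)

for name, p in [("Q16", p_f_or_brown), ("Q17", p_f_and_hazel), ("Q18", p_m_given_hazel)]:
    print(name, p, round(float(p), 3))
```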
19. An article published in the Canadian Journal of Zoology presented a method for estimating the body
fat percentage of North American porcupines. The method was illustrated with a sample of n = 25
porcupines. Based on this sample, a 95% bootstrap confidence interval for the average body fat
percentage of porcupines is 17.4% to 25.8%. Which of the following null hypotheses would be
rejected based on this confidence interval?
A) $H_0: \mu = 18.6\%$.
B) $H_0: \mu = 26.6\%$.
C) $H_0: \mu = 20.0\%$.
D) $H_0: \mu = 22.9\%$.
E) $H_0: \mu = 24.6\%$.
Scenario 4 Admissions records at MIT indicate that 6.7% of the graduate students enrolled are from Canada.
20. Refer to Scenario 4. What is the minimum sample size for which the Central Limit Theorem applies
in this case?
A) n= 30.
B) n= 40.
C) n= 50.
D) n= 100.
E) n= 200.
21. Refer to Scenario 4. Find the mean and standard error of the sample proportion of Canadian students
in random samples of 100 graduate students at MIT.
A) $\hat{p} = 0.067$, SE = 0.0625.
B) $\hat{p} = 0.067$, SE = 0.006.
C) $\hat{p} = 0.067$, SE = 0.025.
D) $\hat{p} = 0.670$, SE = 0.250.
E) $\hat{p} = 0.067$, SE = 0.0067.
22. Refer to Scenario 4. Roughly what percentage of samples of 100 randomly selected graduate students
at MIT will have at least 10% of students from Canada?
A) 5%.
B) 6.7%.
C) 10%.
D) 18%.
E) 25%.
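A sketch of the Scenario 4 calculations. For Question 20 it assumes the common rule of thumb that the normal approximation for a proportion needs $np \geq 10$ and $n(1-p) \geq 10$; the paper does not state which rule it intends. Questions 21 and 22 use $SE = \sqrt{p(1-p)/n}$ and a normal tail area.

```python
from math import ceil, erf, sqrt

def normal_cdf(z: float) -> float:
    return 0.5 * (1 + erf(z / sqrt(2)))

p = 0.067
# Q20 (assumed rule): smallest n with n*p >= 10 and n*(1 - p) >= 10
n_min = max(ceil(10 / p), ceil(10 / (1 - p)))
# Q21: the sample proportion is centred at p with this standard error
n = 100
se = sqrt(p * (1 - p) / n)
# Q22: share of samples with at least 10% Canadian students
z = (0.10 - p) / se
share = 1 - normal_cdf(z)
print(n_min, round(se, 3), round(share, 3))
```

Under this rule the threshold is n = 150, so the smallest listed option that satisfies it is n = 200.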
23. For a N(0, 1) density, what is the area to the left of z = −1.645?
A) 2.5%.
B) 3.5%.
C) 5%.
D) 10%.
E) 11%.
24. For a N(0, 1) density, what is the area outside the interval from z = −2.326 to z = 1.282?
A) 2.5%.
B) 3.5%.
C) 5%.
D) 10%.
E) 11%.
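Questions 23 and 24 are tail areas of the standard normal. A sketch using `math.erf`: the area outside an interval is the left tail below the lower bound plus the right tail above the upper bound.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    return 0.5 * (1 + erf(z / sqrt(2)))

left_tail = normal_cdf(-1.645)                               # Q23
outside = normal_cdf(-2.326) + (1 - normal_cdf(1.282))       # Q24: about 1% + 10%
print(f"{left_tail:.3f} {outside:.3f}")
```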
25. A sample of 148 university students reports sleeping an average of 6.85 hours on weeknights. The
sample size is large enough to use the normal distribution, and a bootstrap distribution shows that
the standard error is SE = 0.175. Use a normal distribution to construct a 95% confidence interval
for the mean amount of weeknight sleep students get at this university.
A) 6.68 to 7.03 hours.
B) 6.51 to 7.19 hours.
C) 6.50 to 7.20 hours.
D) 6.52 to 7.21 hours.
E) 6.85 to 7.85 hours.
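The interval in Question 25 is statistic ± z* × SE with z* = 1.96 for 95% confidence. A short check:

```python
xbar, se, z_star = 6.85, 0.175, 1.96
lo, hi = xbar - z_star * se, xbar + z_star * se  # margin of error 0.343
print(f"{lo:.2f} to {hi:.2f} hours")
```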
26. Suppose that a 95% confidence interval for $\mu$ is (54.8, 60.8). Which of the following is most likely
the p-value for the test of $H_0: \mu = 56$ versus $H_a: \mu \neq 56$?
A) 0.031
B) 0.001
C) 0.016
D) 0.231
E) 0.05
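A sketch for Question 26, assuming the interval is symmetric and normal-based, so its centre is the point estimate and its half-width equals 1.96 × SE. Inverting the interval gives an approximate z statistic and two-sided p-value.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    return 0.5 * (1 + erf(z / sqrt(2)))

lo, hi, mu0 = 54.8, 60.8, 56
center = (lo + hi) / 2     # point estimate, 57.8
se = (hi - lo) / 2 / 1.96  # half-width = 1.96 * SE
z = (center - mu0) / se
p_value = 2 * (1 - normal_cdf(abs(z)))  # two-sided test
print(f"z = {z:.2f}, p-value ~ {p_value:.3f}")
```

Since 56 lies inside the 95% interval, the p-value must exceed 0.05; the sketch lands near 0.24.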
27. The randomization distribution for testing the hypotheses $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$ is
provided. The sample statistic is $\bar{x}_1 - \bar{x}_2 = -2.5$. Use the provided randomization distribution
(based on 100 samples) to estimate the p-value for this test.
A) 10%
B) 2%
C) 5%
D) 1%
E) 4%
Scenario 5 Refer to the following probability tree diagram to find the requested probabilities. (Round your
28. Refer to Scenario 5. What is P(Y|A)?
A) 0.60
B) 0.50
C) 0.40
D) 0.20
E) 0.06
29. Refer to Scenario 5. What is P(A|Y)?
A) 0.82
B) 0.30
C) 0.20
D) 0.18
E) 0.06
30. Refer to Scenario 5. What is P(X)?
A) 0.66
B) 0.50
C) 0.48
D) 0.42
E) 0.44
Problem 1 [20 marks total—suggested time approx. 44 minutes]
Using data from the United States for 1970–2009, a researcher obtains the following regression output for a
model to predict life expectancy based on the total number of vehicles produced (measured in thousands).
Predictor | Coefficient | SE coef. | t stat
Intercept | 65.8455 | 0.2434 | 270.5326
Vehicles | 0.05015 | 0.001286 | 38.9868
Regression statistics
R square | 0.9756
SD error | 0.3311
Observations | 40
Analysis of variance
Source | df | SS
Regression | 1 | 166.6377
Residual | 38 | 4.1660
Total | 39 | 170.8037
a) What is the correlation between vehicles produced and life expectancy? [2 marks]
b) Test whether the correlation between vehicles and life expectancy is statistically significant at the 1%
level. Show all your steps. [3 marks]
c) State in words your conclusion from the correlation test of significance. [2 marks]
d) Give an interpretation of the slope coefficient. [2 marks]
e) Test whether the slope coefficient is statistically significant at the 1% level. Show all your steps. [3 marks]
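A sketch of the arithmetic behind parts a), b) and e), using only numbers from the regression output. It assumes r takes the sign of the (positive) slope and uses the two-sided 1% critical value 2.712 from the df = 38 row of the t table (right-tail 0.005).

```python
from math import sqrt

n = 40
r_squared = 0.9756
b1, se_b1 = 0.05015, 0.001286
t_crit = 2.712  # two-sided 1% test: right-tail 0.005, df = n - 2 = 38

# (a) r = +sqrt(R^2) because the fitted slope is positive
r = sqrt(r_squared)
# (b) t statistic for H0: rho = 0
t_r = r * sqrt(n - 2) / sqrt(1 - r_squared)
# (e) t statistic for H0: beta1 = 0
t_b1 = b1 / se_b1

print(f"r = {r:.4f}")
print(f"t (correlation) = {t_r:.2f}, t (slope) = {t_b1:.2f}, critical = {t_crit}")
```

In simple regression the two t statistics agree up to rounding of the reported inputs, which is why both tests reach the same conclusion.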
The researcher uses the bootstrap to investigate the regression slope estimate. The following shows the
results from 1,000 bootstrap samples.
f) Briefly explain the purpose of the bootstrap distribution in this context. [2 marks]
g) Use the bootstrap distribution to build a 99% confidence interval for the slope parameter. [2 marks]
h) Comment on your findings in b), e), and g). [2 marks]
i) What do you think about the overall validity of this study? [2 marks]
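For part g), one common convention is the percentile method: sort the bootstrap slopes and cut off (1 − level)/2 of them in each tail. The sketch below is an assumption about that convention (others interpolate between order statistics), and `percentile_ci` is a hypothetical helper. The 1,000 actual bootstrap slopes are not reproduced in this paper, so the usage line feeds it illustrative values only.

```python
def percentile_ci(boot_stats, level=0.99):
    """Percentile bootstrap interval: trim (1 - level)/2 of the values from each tail."""
    s = sorted(boot_stats)
    b = len(s)
    k = round((1 - level) / 2 * b)  # number of values dropped from each tail
    return s[k], s[b - 1 - k]

# Illustrative input only (not the study's bootstrap slopes): 1,000 values 1..1000
print(percentile_ci(list(range(1, 1001))))  # (6, 995): 990 of 1,000 values kept
```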
Problem 2 [20 marks total—suggested time approx. 44 minutes]
An upcoming biology quiz has 10 multiple choice questions, each with 4 choices. Eugene has not studied
for the quiz. In fact, he hasn’t even opened the textbook since the beginning of term. In short, he knows
nothing about biology and will have to guess the answer to every question. As it happens, Eugene is very
good at statistics and he is going to compute the probability that he passes the quiz (5 or more correct
answers). Let X be the number of questions Eugene correctly guesses on the biology quiz.
a) What is the name of the distribution of X? Specify the parameters of X. [2 marks]
b) Compute the mean of X. [1 mark]
c) What is the probability that Eugene gets at least 1 answer correct? [2 marks]
d) What is the probability that Eugene passes the quiz? [3 marks]
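A sketch of parts a) to d), assuming Eugene guesses each of the 10 questions independently with a 1-in-4 chance of being correct, so that X ~ Binomial(n = 10, p = 0.25).

```python
from math import comb

n, p = 10, 0.25  # 10 questions, 4 equally likely choices each

def pmf(k: int) -> float:
    """P(X = k) for X ~ Binomial(10, 0.25)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

mean = n * p                                   # (b)
p_at_least_one = 1 - pmf(0)                    # (c) complement of "none correct"
p_pass = sum(pmf(k) for k in range(5, n + 1))  # (d) 5 or more correct
print(mean, round(p_at_least_one, 4), round(p_pass, 4))
```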
Eugene’s biology lecturer Sandy, who is also very good at statistics, wants to evaluate whether the marks
on the quiz have improved since another 10-question quiz carried out earlier in the term. The table below
gives a sample of 10 grades on the two quizzes. Sandy is interested in testing whether the mean mark on
the second quiz is significantly higher than the mean mark on the first quiz.
First quiz 7 9 6 9 8 10 7 7 8 6
Second quiz 8 9 7 9 8 9 9 8 9 7
e) Clearly define your notation and state the null and alternative hypotheses, assuming that the marks
from the first quiz come from a random sample of 10 students in the class and the grades on the
second quiz come from a different random sample of 10 students in the class. [2 marks]
f) Complete the test in (e) and clearly state the conclusion. [3 marks]
g) Clearly define your notation and state the null and alternative hypotheses, assuming that the marks
recorded for the first quiz and second quiz are from the same 10 students (so that the first student
got 7 on the first quiz and 8 on the second quiz, and so on). [2 marks]
h) Complete the test in (g) and clearly state the conclusion. [3 marks]
i) Why are the hypothesis test results so different? Which is a better way to collect the data to answer
the question of whether grades are higher on the second quiz? [2 marks]
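A sketch contrasting the two tests in parts f) and h). It assumes a 5% one-sided test (the question does not fix a level), so t* = 1.833 from the df = 9 row; the independent-samples version uses df = min(n1, n2) − 1 as on the formula sheet, and the paired version is a one-sample t test on the differences.

```python
from math import sqrt
from statistics import mean, stdev

first = [7, 9, 6, 9, 8, 10, 7, 7, 8, 6]
second = [8, 9, 7, 9, 8, 9, 9, 8, 9, 7]
t_crit = 1.833  # right-tail 0.05, df = 9 (assumed 5% level)

# (f) independent samples: SE built from both sample variances
se_ind = sqrt(stdev(first) ** 2 / len(first) + stdev(second) ** 2 / len(second))
t_ind = (mean(second) - mean(first)) / se_ind

# (h) paired samples: one-sample t test on the within-student differences
diffs = [b - a for a, b in zip(first, second)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

print(f"independent: t = {t_ind:.2f}; paired: t = {t_paired:.2f}; critical = {t_crit}")
```

The mean difference is 0.6 in both cases; pairing removes the between-student variation, which shrinks the standard error and moves the statistic past the critical value.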
END OF THE EXAM
Formulas, definitions, and distribution tables
Population and sample statistics.
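Statistic | Population | Sample
size | $N$ | $n$
mean | $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ | $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
standard deviation | $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ | $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
correlation | $\rho = \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - \mu_x)}{\sigma_x} \frac{(y_i - \mu_y)}{\sigma_y}$ | $r = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(x_i - \bar{x})}{s_x} \frac{(y_i - \bar{y})}{s_y}$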
Descriptive statistics.
Statistic | Definition
z-score | $z_i = \frac{x_i - \bar{x}}{s}$
range | range = max − min
inter-quartile range | IQR = $Q_3 - Q_1$
outliers | $x_i < Q_1 - 1.5(\text{IQR})$ or $x_i > Q_3 + 1.5(\text{IQR})$
95% rule | $\bar{x} \pm 2s$
interval estimate | statistic ± margin of error
95% confidence interval | statistic ± 2 × SE
Standard deviations and standard errors for various statistics.
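Statistic | Standard deviation | Standard error
$\bar{x}$ | $\frac{\sigma}{\sqrt{n}}$ | $\frac{s}{\sqrt{n}}$
$\hat{p}$ | $\sqrt{\frac{p(1-p)}{n}}$ | $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
$\bar{x}_1 - \bar{x}_2$ | $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ | $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
$\hat{p}_1 - \hat{p}_2$ | $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$ | $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$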
Confidence intervals.
100(1−α)% confidence interval: statistic $\pm\, z^*_{\alpha/2} \times SE$ for the N(0,1) distribution
statistic $\pm\, t^*_{\alpha/2} \times SE$ for the t distribution with df = n − 1
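Null hypothesis | Test statistic
$H_0: \mu = \mu_0$ | $\frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$
$H_0: p = p_0$ | $\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \sim N(0,1)$
$H_0: \mu_1 - \mu_2 = 0$ | $\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t_{n-1}$ where $n = \min(n_1, n_2)$
$H_0: p_1 - p_2 = 0$ | $\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0,1)$ where $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$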
Selected percentiles from the N(0,1) distribution.
Right-tail probability Confidence level z∗
0.10 80% 1.282
0.05 90% 1.645
0.025 95% 1.960
0.01 98% 2.326
0.005 99% 2.575
Selected percentiles from t distributions with various degrees of freedom.
Right-tail probability
df 0.05 0.025 0.01 0.005
8 1.860 2.306 2.896 3.355
9 1.833 2.262 2.821 3.250
38 1.686 2.024 2.427 2.712
98 1.661 1.984 2.365 2.627
Probability rules.
Conditional probability: $P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$
Multiplicative rule: P(A and B) = P(A|B)P(B)
Independence: P(A|B) = P(A)
Mutual exclusion: P(A and B) = 0
Law of total probability.
P(A) = P(A and B) + P(A and (not B))
$P(A) = P(A \text{ and } B_1) + P(A \text{ and } B_2) + \cdots + P(A \text{ and } B_k)$ where $(B_1, B_2, \ldots, B_k)$ are disjoint
Bayes’ rule for two cases.
$P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\text{not } A)P(\text{not } A)}$
Bayes’ rule for $j = 1, 2, \ldots, k$.
$P(A_j|B) = \frac{P(B|A_j)P(A_j)}{P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + \cdots + P(B|A_k)P(A_k)}$ where $(A_1, A_2, \ldots, A_k)$ are disjoint.
Population statistics for a discrete random variable X with probability function p(x).
Mean: $\mu = \sum_{i=1}^{n} x_i \, p(x_i)$
Standard deviation: $\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 \, p(x_i)}$
Suppose X follows a binomial distribution with parameters n and p.
Binomial probability: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}$
Expected value: $n \times p$
Standard deviation: $\sqrt{np(1-p)}$
Simple linear regression.
Population regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
Sample regression model: $\hat{y} = b_0 + b_1 x$
100(1−α)% confidence interval for $\beta_k$: $b_k \pm t^*_{df,\alpha/2} \times SE_{b_k}$ for $k = 0, 1$ with df = n − 2
t statistic for $H_0: \beta_k = 0$: $t = \frac{b_k}{SE_{b_k}}$ for $k = 0, 1$ with df = n − 2
t statistic for $H_0: \rho = 0$: $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with df = n − 2
Goodness-of-fit: $R^2 = r^2 = \frac{SSR}{SST}$
Standard deviation of the error: $s_\varepsilon = \sqrt{\frac{SSE}{n-2}}$