STA258 Week 10 Practice Questions
Solution
1. 20 toddlers were observed over several months at a nursery school. “Time” is the average number of
minutes a child spent at the table when lunch was served. “Calories” is the average number of calories
the child consumed during lunch, which was calculated from careful observation of what the child ate
each day.
a. Refer to the scatterplot of calories consumed during lunch time and time (in min) spent at the lunch
table. Describe the overall pattern of this plot. That is, describe the direction, form, and strength of the
relationship between the two variables.
There appears to be a moderate to strong negative
linear association between the number of calories consumed
by children during lunch and the average time (in minutes)
that they spent at the lunch table.
Longer times spent at the lunch table are associated with
fewer calories consumed at lunch.
As time (in min) spent at the lunch table increases, the
number of calories consumed tends to decrease.
b. Suppose a researcher wants to investigate the relationship between the calories consumed during lunch
time and time (in min) spent at the lunch table. Identify the response variable and the explanatory
variable in the context of this study.
Response variable: Calories consumed during lunch time
Explanatory variable: Time spent at the lunch table
c. The estimated correlation coefficient is r = -0.65. Interpret this number in the context of this problem.
If a child’s time spent at the lunch table is one standard deviation above the mean, their number of
calories consumed during lunch is estimated to be 0.65 standard deviations below the mean number of
calories consumed during lunch, on average.
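In symbols, this interpretation can be written as a prediction on the standardized scale. The notation below (z-scores for time and calories) is an assumption introduced here for illustration and is not part of the original question; it relies on the standard result that the least-squares line in z-score units has slope r.

```latex
\hat{z}_{\text{calories}} = r \, z_{\text{time}} = (-0.65)(1) = -0.65
```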
2. A survey was conducted in the United States and 10 countries of Western Europe to determine the
percentage of teenagers who used marijuana and other drugs. This means that sample size n is 11.
a. Refer to the scatterplot of percentage who have used marijuana and percent who have used other drugs.
Describe the nature of the relationship between the two variables. Describe the overall pattern of this
plot. That is, describe the direction, form, and strength of the relationship between the two variables.
There appears to be a positive, linear, moderate to strong
association between percentage who have used other drugs
and percentage who have used marijuana.
b. Refer to the summary statistics table below.
Summary statistics:
Column                                          n    Sum
Z Scores Marijuana %                            11   -7.7715612e-16
Z Scores Other Drugs %                          11   0
Z Scores Marijuana % * Z Scores Other Drugs %   11   9.3410002
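Recall that a numerical measure of the direction and strength of a linear association is the correlation
coefficient, r, which can be calculated using this formula: $r = \dfrac{\sum z_x z_y}{n - 1}$,
where $z_x$ and $z_y$ are the z-scores of the two variables.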
i. Find the estimated correlation, r.
$r = \dfrac{9.3410002}{11 - 1} = 0.93410002 \approx 0.934$
ii. Interpret this value in the words of the problem.
If a country’s marijuana % is 1 standard deviation above the mean, its other drug usage % is
estimated to be 0.934 standard deviations above the mean, on average.
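The calculation in part (b)(i) can be checked with a short Python sketch. The sum of products and n come from the summary table above; the helper names and the small x, y data set are hypothetical and serve only to show that the z-score formula behaves as expected on perfectly linear data.

```python
import math

def z_scores(values):
    """Standardize values using the sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation_from_z(x, y):
    """r = (sum of products of paired z-scores) / (n - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Values taken from the summary statistics table in the question.
sum_of_products = 9.3410002
n = 11
r_table = sum_of_products / (n - 1)
print(round(r_table, 3))  # 0.934

# Hypothetical, perfectly linear data (not from the survey): r should be 1.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_demo = correlation_from_z(x, y)
print(r_demo)
```

Dividing by n - 1 rather than n matches the use of the sample standard deviation when the z-scores are computed.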
3. A study finds that high school students who have a computer at home get higher grades on average than
students who do not. Does this mean that parents who can afford it should buy a computer to enhance
their children's chances of academic success? Briefly explain/justify your answer.
Not necessarily. While it is possible that some students are doing better academically and therefore
getting into university because of their computers, it is also possible that their parents have enough
money to buy them a computer, and also enough money to pay for their education. It may also be that
an academically able student, who is more likely to go to university, will want a computer more, and
therefore be more likely to get one somehow. Therefore, the study does not provide good evidence that
a computer at home will enhance chances of academic success.
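The role of a lurking variable in this argument can be illustrated with a small, entirely invented data set. In the sketch below, family income (a hypothetical lurking variable) drives both computer ownership and grades, while owning a computer has no effect on grades within either income group. None of these numbers come from the study.

```python
# Each record: (income_group, has_computer, grade). All values are invented.
students = (
    [("low", True, 65)] * 2 + [("low", False, 65)] * 6
    + [("high", True, 80)] * 6 + [("high", False, 80)] * 2
)

def mean_grade(records, has_computer, income=None):
    """Mean grade for owners or non-owners, optionally within one income group."""
    grades = [g for inc, comp, g in records
              if comp == has_computer and (income is None or inc == income)]
    return sum(grades) / len(grades)

# Ignoring income, computer owners appear to do better on average.
overall_gap = mean_grade(students, True) - mean_grade(students, False)

# Within each income group, owners and non-owners have the same mean grade.
low_gap = mean_grade(students, True, "low") - mean_grade(students, False, "low")
high_gap = mean_grade(students, True, "high") - mean_grade(students, False, "high")

print(overall_gap)        # 7.5
print(low_gap, high_gap)  # 0.0 0.0
```

The overall gap appears only because high-income students are more likely to own a computer and also have higher grades in this invented example.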
4. The United Nations Development Programme (UNDP) uses the Human Development Index (HDI) in an
attempt to summarize in one number, the progress in health, education, and economics of a country. The
gross domestic product per capita (GDPPC) is used to summarize the overall economic strength of a
country. Below is a scatterplot of GDP per capita against HDI for 169 countries throughout the world.
a. Is it appropriate to summarize the strength of association with a correlation? Answer Yes or No.
Briefly Explain/Justify your answer.
No. It is not appropriate to summarize
the strength of the association
between GDP per capita and HDI value
with a correlation, since the association
is not linear (the pattern is curved, e.g., exponential).
b. GDPPC is measured in dollars. Incomes and other economic measures tend to be highly right
skewed. Taking logs often makes the distribution more unimodal and symmetric. Suppose that we
use the log re-expression to make a scatterplot of log(GDPPC) against HDI. Comment on the effects
of re-expression.
The plot is much straighter than the original.
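The straightening effect of the log re-expression can be sketched in Python. The HDI and GDPPC values below are hypothetical (generated from an assumed exponential relationship with a small deterministic wiggle) and are not the UNDP data; the helper pearson_r is also introduced here only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical HDI values from 0.30 to 0.95 (not real data).
hdi = [0.30 + 0.05 * i for i in range(14)]

# Hypothetical GDPPC that grows roughly exponentially with HDI.
gdppc = [math.exp(3.0 + 8.0 * h + 0.2 * math.sin(10 * h)) for h in hdi]
log_gdppc = [math.log(g) for g in gdppc]

r_raw = pearson_r(hdi, gdppc)      # curved relationship
r_log = pearson_r(hdi, log_gdppc)  # close to linear after taking logs

print(f"r with GDPPC:      {r_raw:.3f}")
print(f"r with log(GDPPC): {r_log:.3f}")
```

In this sketch the correlation is closer to 1 after the re-expression, which is consistent with the plot of log(GDPPC) against HDI being straighter than the original.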
5. The International CensusAtSchool Project collects data on primary and secondary school students from
various countries, including Canada. A random sample of 111 Canadian secondary school students aged
14 and over was selected; the data include gender, age, height (in cm), armspan (in cm),
wristbone (in cm), middlefinger (in cm), and foot (in cm).
a. Refer to the pair-wise correlation (correlation matrix) plot below.
b. Which two variables possess the strongest relationship? Justify your answer.
The relationship between armspan and height is the strongest, since the points in that panel are the
most tightly clustered about a line (least scattered on the plot).
c. Refer to the pairwise correlation matrix table below. Which estimated correlation coefficient indicates
the strongest relationship between two variables? Interpret the number in the words of this study.
r = 0.781 (height and armspan)
Interpretation (any of the following):
• High values of height (in cm) are associated with high values of armspan (in cm).
• As height values (in cm) increase, armspan values (in cm) tend to increase.
• If a student’s height is 1 standard deviation above the mean, their armspan is estimated to be
0.781 standard deviations above the mean, on average.
d. Suppose a researcher is interested in investigating whether there is a relationship between height (in cm)
and armspan (in cm). Identify the role of each variable (e.g., explanatory or response variable) in the
researcher’s investigation.
Explanatory variable: Height (in cm)
Response variable: Armspan (in cm)
e. Suppose that information on another variable, gender, is included in the analysis.
i. Does gender appear to be an important variable in understanding the relationship between
armspan and height? Answer Yes or No. Justify your answer.
The relationship is similar for both genders, but it looks like the points for females (the red points)
are closer to a straight line than those for males (the blue points). This means the relationship
between armspan and height is stronger for females (r = 0.75) than for males (r = 0.64).
ii. If we omitted the information about gender in the plot of armspan and height, it would
become a lurking variable. Fill in the blank.