Date: 2021-04-19

STA258 Week 10 Practice Questions

Solution

1. 20 toddlers were observed over several months at a nursery school. “Time” is the average number of

minutes a child spent at the table when lunch was served. “Calories” is the average number of calories

the child consumed during lunch, which was calculated from careful observation of what the child ate

each day.

a. Refer to the scatterplot of calories consumed during lunch time and time (in min) spent at the lunch

table. Describe the overall pattern of this plot. That is, describe the direction, form, and strength of the

relationship between the two variables.

There appears to be a moderate to strong negative linear association between the number of calories consumed by children during lunch and the average time (in minutes) that they spent at the lunch table.

Longer times spent at the lunch table are associated with fewer calories consumed. As time (in min) spent at the lunch table increases, the number of calories consumed tends to decrease.

b. Suppose a researcher wants to investigate the relationship between the calories consumed during lunch

time and time (in min) spent at the lunch table. Identify the response variable and the explanatory

variable in the context of this study.

Response variable: Calories consumed during lunch time

Explanatory variable: Time spent at the lunch table

c. The estimated correlation coefficient is r = -0.65. Interpret this number in the context of this problem.

If a child’s time spent at the lunch table is one standard deviation above the mean, their number of calories consumed during lunch is estimated to be, on average, 0.65 standard deviations below the mean number of calories consumed during lunch.
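
The interpretation above rests on the least-squares line in standardized units, where the predicted z-score of the response equals r times the z-score of the explanatory variable. The following is a minimal Python sketch of that arithmetic; the function name `predicted_z` is a hypothetical helper introduced here for illustration and is not part of the worksheet.

```python
def predicted_z(r: float, z_x: float) -> float:
    """Predicted z-score of the response, given the correlation r and the
    z-score of the explanatory variable (least-squares line in z units)."""
    return r * z_x


# Question 1c: r = -0.65, time at the table one SD above its mean.
z_calories = predicted_z(-0.65, 1.0)
print(z_calories)  # predicted calories, in SDs relative to the mean

# Assumed extra case (not in the worksheet): time two SDs above its mean.
print(predicted_z(-0.65, 2.0))
```

A negative result means the predicted value lies below the mean of the response; the size of the result is the distance from the mean in standard deviations.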

2. A survey was conducted in the United States and 10 countries of Western Europe to determine the

percentage of teenagers who used marijuana and other drugs. This means that sample size n is 11.

a. Refer to the scatterplot of percentage who have used marijuana and percent who have used other drugs.

Describe the nature of the relationship between the two variables. Describe the overall pattern of this

plot. That is, describe the direction, form, and strength of the relationship between the two variables.

There appears to be a positive, linear, moderate to strong

association between percentage who have used other drugs

and percentage who have used marijuana.

b. Refer to the summary statistics table below.

Summary statistics:

Column                                           n    Sum
Z Scores Marijuana %                             11   -7.7715612e-16
Z Scores Other Drugs %                           11   0
Z Scores Marijuana % * Z Scores Other Drugs %    11   9.3410002

Recall that a numerical measure of the direction and strength of a linear association can be calculated using this formula for the correlation coefficient, r, where z_x and z_y are the z-scores of the two variables:

$$r = \frac{\sum z_x z_y}{n - 1}$$

i. Find the estimated correlation, r.

$$r = \frac{9.3410002}{11 - 1} = 0.93410002 \approx 0.934$$
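
The calculation above can be sketched in Python as follows. The first part reproduces the worksheet arithmetic from the summary table (sum of z-score products 9.3410002, n = 11). The second part runs the full procedure on a small toy data set; `toy_x` and `toy_y` are hypothetical values chosen only for illustration, and the helper names are assumptions, not part of the worksheet.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def z_scores(values):
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]


def correlation_from_z(x, y):
    # r = (sum of products of z-scores) / (n - 1)
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Worksheet arithmetic: values taken from the summary statistics table.
sum_of_products = 9.3410002
n = 11
r_worksheet = sum_of_products / (n - 1)
print(round(r_worksheet, 3))

# Hypothetical toy data, NOT the survey data.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.0, 4.0, 5.0, 4.0, 5.0]
r_toy = correlation_from_z(toy_x, toy_y)
print(r_toy)
```

The z-scores of each variable sum to zero apart from rounding error, which is why the table reports sums of -7.7715612e-16 and 0 for the two z-score columns.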

ii. Interpret this value in the words of the problem.

If a country’s marijuana % is 1 standard deviation above the mean, its other drug usage % is estimated to be 0.934 standard deviations above the mean, on average.

3. A study finds that high school students who have a computer at home get higher grades on average than

students who do not. Does this mean that parents who can afford it should buy a computer to enhance

their children's chances of academic success? Briefly explain/justify your answer.

Not necessarily. While it is possible that some students are doing better academically, and therefore getting into university, because of their computers, it is also possible that their parents have enough money to buy them a computer and also enough money to pay for their education (family income acting as a lurking variable). It may also be that an academically able student, who is more likely to go to university, will want a computer more and therefore be more likely to get one somehow. Therefore, the study does not provide good evidence that a computer at home will enhance chances of academic success.
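
The argument above can be illustrated with a small Python sketch. The records in `students` are entirely hypothetical and constructed so that grades depend only on family income; the names `students` and `mean_grade` are assumptions made for this example. The sketch shows how a gap in average grades between computer owners and non-owners can appear overall even when there is no gap within either income group.

```python
# Hypothetical records: (income_level, has_computer, grade).
# Grades depend ONLY on income in this constructed example.
students = [
    ("high", True, 80), ("high", True, 80), ("high", True, 80), ("high", False, 80),
    ("low", True, 60), ("low", False, 60), ("low", False, 60), ("low", False, 60),
]


def mean_grade(rows, has_computer, income=None):
    """Mean grade for students with or without a computer,
    optionally restricted to one income level."""
    grades = [
        grade
        for level, computer, grade in rows
        if computer == has_computer and (income is None or level == income)
    ]
    return sum(grades) / len(grades)


# Gap when income is ignored (it acts as a lurking variable).
overall_gap = mean_grade(students, True) - mean_grade(students, False)

# Gaps within each income level.
gap_high = mean_grade(students, True, "high") - mean_grade(students, False, "high")
gap_low = mean_grade(students, True, "low") - mean_grade(students, False, "low")

print(overall_gap, gap_high, gap_low)
```

In this constructed data set the overall gap comes entirely from computer owners being concentrated in the high-income group, so buying a computer would not change any student's grade.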

4. The United Nations Development Programme (UNDP) uses the Human Development Index (HDI) in an attempt to summarize, in one number, the progress in health, education, and economics of a country. The gross domestic product per capita (GDPPC) is used to summarize the overall economic strength of a country. Below is a scatterplot of GDP per capita against HDI for 169 countries throughout the world.

a. Is it appropriate to summarize the strength of association with a correlation? Answer Yes or No.

Briefly Explain/Justify your answer.

No. It is not appropriate to summarize the strength of the association between GDP per capita and HDI value with a correlation, since the association is not linear (it is curved, e.g., exponential).

b. GDPPC is measured in dollars. Incomes and other economic measures tend to be highly right

skewed. Taking logs often makes the distribution more unimodal and symmetric. Suppose that we

use the log re-expression to make a scatterplot of log(GDPPC) against HDI. Comment on the effects

of re-expression.

The plot of log(GDPPC) against HDI is much straighter than the original, so the association is much closer to linear.
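
The effect of the log re-expression can be sketched in Python. The data below are hypothetical: `hdi` is an evenly spaced grid and `gdppc` is generated from an exactly exponential curve with made-up constants (200 and 6), so this is not the UNDP data. Under that assumption, the correlation of HDI with GDPPC understates the strength of the relationship, while the correlation of HDI with log(GDPPC) is essentially 1.

```python
from math import exp, log, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values: HDI from 0.30 to 0.90, GDPPC exactly exponential in HDI.
hdi = [0.30 + 0.05 * i for i in range(13)]
gdppc = [200.0 * exp(6.0 * h) for h in hdi]

r_raw = pearson_r(hdi, gdppc)                    # curved relationship
r_log = pearson_r(hdi, [log(g) for g in gdppc])  # straightened by the log

print(r_raw, r_log)
```

Real data would not fall exactly on a curve, so the improvement after taking logs would be smaller than in this idealized case.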

5. The International CensusAtSchool Project collects data on primary and secondary school students from various countries, including Canada. A random sample of 111 Canadian secondary school students age 14 and over was selected, which includes data on gender, age, height (in cm), armspan (in cm), wristbone (in cm), middlefinger (in cm), and foot (in cm).

a. Refer to the pair-wise correlation (correlation matrix) plot below.

b. Which two variables possess the strongest relationship? Justify your answer.

Armspan and height have the strongest relationship, since their points are the most tightly clustered (least scattered) on the plot.

c. Refer to the pairwise correlation matrix table below. Which estimated correlation coefficient indicates the strongest relationship between two variables? Interpret the number in the words of this study.

r = 0.781 (between height and armspan)

Interpretation (any of the following):

• High values of height (in cm) are associated with high values of armspan (in cm).

• As height values (in cm) increase, armspan values (in cm) tend to increase.

• If a student’s height is 1 standard deviation above the mean, their armspan is estimated to be 0.781 standard deviations above the mean, on average.

d. Suppose a researcher is interested in investigating whether there is a relationship between height (in cm) and armspan (in cm). Identify the role of each variable (e.g., explanatory or response variable) in the researcher’s investigation.

Explanatory variable: Height (in cm)

Response variable: Armspan (in cm)

e. Suppose the information on another variable, gender, is included in the analysis.

i. Does gender appear to be an important variable in understanding the relationship between

armspan and height? Answer Yes or No. Justify your answer.

The relationship is similar for both genders, but it looks like the points for females (the red points) are closer to a straight line than those for males (the blue points). This means the relationship between armspan and height is stronger for females (r = 0.75) than for males (r = 0.64).
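
A comparison of this kind can be sketched in Python by computing the correlation separately within each gender. The rows in `students` are hypothetical measurements invented for illustration, so the resulting values will not match the 0.75 and 0.64 reported above; the helper names are likewise assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical rows: (gender, height_cm, armspan_cm). NOT the CensusAtSchool data.
students = [
    ("F", 155.0, 153.0), ("F", 160.0, 161.0), ("F", 165.0, 164.0), ("F", 170.0, 170.0),
    ("M", 165.0, 170.0), ("M", 170.0, 166.0), ("M", 175.0, 178.0), ("M", 180.0, 176.0),
]


def correlation_by_group(rows):
    """Return {group: r} for height vs. armspan within each group."""
    groups = {}
    for gender, height, armspan in rows:
        heights, armspans = groups.setdefault(gender, ([], []))
        heights.append(height)
        armspans.append(armspan)
    return {g: pearson_r(h, a) for g, (h, a) in groups.items()}


r_by_gender = correlation_by_group(students)
print(r_by_gender)
```

A larger r within a group corresponds to points that sit closer to a straight line in that group's portion of the scatterplot.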

ii. Fill in the blank: if we omitted the information about gender in the plot of armspan and height, it would become a lurking variable.
