“Final” Homework for MATH 7343
Clearly state your method of analysis and your results for each problem. If you include any
computer printouts in your answer, clearly label which problem each part of the printout
belongs to. Also summarize your conclusions in plain language that non-statisticians can
understand.
1. (40 points)
The file detroit.txt contains data on the number of homicides per 100,000 population
(variable homicide) in the city of Detroit from 1961 to 1973. (See pages 468-469
of the textbook.) The other variables in the data set are police, the number of full-time
police officers per 100,000 population; unemp, the percentage of adults who are unemployed;
register, the number of handgun registrations per 100,000 population; and weekly, the
average weekly earnings for city residents. We are interested in the effect of these variables
on the homicide rate.
(a) Draw scatter plots of homicide versus each of the four other variables. Are there
any apparent linear relationships in the graphs?
(b) Using homicide as the response variable, fit four simple linear regressions, one for each
of the four variables. Which of the four variables, on its own, has a statistically significant
effect on the homicide rate? Which of the four variables, on its own, explains the largest
percentage of variation in the homicide rate?
(c) Using the stepwise selection method, find the best multiple regression model. Which
variables are selected? Which of the selected variables, after the other variables' effects are
considered, explains the largest percentage of variation in the homicide rate?
(d) Suppose in a particular year, the four variables police, unemp, register and weekly have
the same values as in the year 1970. Construct a 95% prediction interval for the homicide
rate.
(e) Evaluate the fit of the model to the data by creating a plot of the residuals and a normal
probability plot of the residuals. Discuss your findings.
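As a rough illustration of the quantities part (b) asks about, the sketch below fits a
simple linear regression by least squares and reports the slope, its t statistic, and the
fraction of variation explained. It is written in Python using only the standard library.
The function name simple_linear_regression is hypothetical, and the numbers are placeholders,
not values from detroit.txt.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = a + b*x.

    Returns (intercept, slope, r_squared, t_slope), where t_slope is the
    t statistic for the slope on n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = syy - slope * sxy              # residual sum of squares
    r_squared = 1.0 - sse / syy          # share of variation explained
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return intercept, slope, r_squared, slope / se_slope


# Placeholder numbers for illustration only; these are NOT the Detroit data.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.0, 4.0, 5.0, 8.0]

intercept, slope, r_squared, t_slope = simple_linear_regression(toy_x, toy_y)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"R^2 = {r_squared:.4f}, t for slope = {t_slope:.2f}")
```

To use it on the real data, read each column of detroit.txt into a list and call the
function once per predictor with homicide as y.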
2. (30 points)
Insurance adjusters from the Shrewd Insurance Company are concerned about the high
estimates they are receiving from Garage Elegance for auto repairs compared to Joe’s
Garage. To verify their suspicions, each of 15 cars recently involved in an accident was
taken to both garages for separate estimates of repair costs. The data are shown below. (The
repair estimates are in hundreds of dollars.) The data are also available in Garage.txt.
Car   Garage Elegance   Joe’s Garage
  1         7.6              7.3
  2        10.2              9.1
  3         9.5              8.4
  4         1.3              1.5
  5         3.0              2.7
  6         6.3              5.8
  7         5.3              4.9
  8         6.2              5.3
  9         2.2              2.0
 10         4.8              4.2
 11        11.3             11.0
 12        12.1             11.0
 13         6.9              6.1
 14         7.6              6.7
 15         8.4              7.5
Find the appropriate statistical method to solve this problem. Report your analysis.
What can you say to the insurance adjusters about your result?
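Because every car was taken to both garages, one candidate approach is to analyze the
within-car differences. The Python sketch below (standard library only) computes the mean
difference and the paired t statistic from the table above. Treating this as a paired
t-test is an assumption of the sketch, not a statement of the required method, and the
p-value would still need a t table or statistical software.

```python
import math

# Repair estimates in hundreds of dollars, cars 1-15, copied from the table.
elegance = [7.6, 10.2, 9.5, 1.3, 3.0, 6.3, 5.3, 6.2,
            2.2, 4.8, 11.3, 12.1, 6.9, 7.6, 8.4]
joes = [7.3, 9.1, 8.4, 1.5, 2.7, 5.8, 4.9, 5.3,
        2.0, 4.2, 11.0, 11.0, 6.1, 6.7, 7.5]

# One difference per car: Garage Elegance minus Joe's Garage.
diffs = [e - j for e, j in zip(elegance, joes)]
n = len(diffs)
mean_diff = sum(diffs) / n
sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))

# Paired t statistic for H0: mean difference = 0, on n - 1 degrees of freedom.
t_stat = mean_diff / (sd_diff / math.sqrt(n))

print(f"n = {n}, mean difference = {mean_diff:.3f}")
print(f"t = {t_stat:.2f} on {n - 1} df")
```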
3. (30 points)
Investors use many “indicators” to predict the stock market. Yale Hirsch coined one such
indicator, the “January indicator”, in his 1972 book. It states that if the S&P 500 index is
up in the month of January, then the S&P 500 index will be up for the rest of the year. On
the other hand, if it is down in January, then it will be down for the rest of the year. The
following table gives the data for the 60 years from 1951 to 2010:
                       Jan.
Feb.-Dec.         Up        Down
Up                33         11
Down               3         13
(a) State the appropriate null hypothesis and the alternative hypothesis.
(b) I would like to use the χ²-test to solve this problem. However, this is a one-sided
problem, while the χ²-test is usually used as a two-sided test. What should I do?
(c) Carry out the test at the α = 0.05 level.
(d) Explain what your analysis means to the investors.
(e) If you were a portfolio manager, how would you use this January indicator?
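For a 2×2 table, the χ² statistic with 1 degree of freedom is the square of a standard
normal z statistic, which is one common way to relate the two-sided χ²-test to a one-sided
question. The Python sketch below (standard library only) computes the uncorrected Pearson
statistic for the table above, its two-sided p-value, and a one-sided p-value from the
signed square root. Omitting the continuity correction and using the signed root are
choices made for this sketch.

```python
import math

# Rows: Feb.-Dec. up / down; columns: January up / down (from the table).
table = [[33, 11],
         [3, 13]]
(a, b), (c, d) = table
n = a + b + c + d

# Pearson chi-square for a 2x2 table, without continuity correction.
chi_sq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
p_two_sided = math.erfc(math.sqrt(chi_sq / 2))

# Signed root: positive when the two periods tend to move in the same direction.
z = math.copysign(math.sqrt(chi_sq), a * d - b * c)
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))

print(f"chi-square = {chi_sq:.3f}, z = {z:.3f}")
print(f"two-sided p = {p_two_sided:.2e}, one-sided p = {p_one_sided:.2e}")
```

When the observed association points in the hypothesized direction, the one-sided p-value
is half the two-sided one; when it points the other way, the signed root gives a one-sided
p-value above 0.5.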
4. (30 points)
To demonstrate the effect of nematodes (microscopic worms) on plant growth, a botanist
prepares 16 identical planting pots and then introduces different numbers of nematodes
into the pots. A tomato seedling is transplanted into each pot. Here are data on the increase
in height of the seedlings (in centimeters) 16 days after planting.
Nematodes   Seedling Growth
     0      10.8    9.1   13.5    9.2
  1000      11.1   11.1    8.2   11.3
  5000       5.4    4.6    7.4    5.0
 10000       5.8    5.3    3.2    7.5
(a) I would like to know if the nematodes have any effect on plant growth at all. Carry
out an appropriate statistical analysis. What is your conclusion?
(b) Before conducting the experiment, I decided to test whether the introduction of
nematodes reduces the plant growth. State the appropriate contrast to test this hypothesis,
and carry out the appropriate test.
(c) Looking at the data, it seems that the biggest drop in plant growth is between the group
treated with 1000 nematodes per plant and the group treated with 5000 nematodes per plant.
If I then decide to test whether these two groups do have different plant growth, what is the
appropriate contrast for this test? Carry out the analysis.
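The mechanics behind parts (a) to (c) can be sketched as a one-way ANOVA followed by a t
statistic for a linear contrast of the group means. The Python code below uses only the
standard library; the function names one_way_anova and contrast_t are hypothetical, and
the weights shown are just one example of a contrast, not necessarily the one each part
calls for. The contrast t statistic is unadjusted, so any multiple-comparison adjustment
for a contrast chosen after looking at the data is not included.

```python
import math

# Seedling growth (cm) by number of nematodes per pot, copied from the table.
growth = {
    0: [10.8, 9.1, 13.5, 9.2],
    1000: [11.1, 11.1, 8.2, 11.3],
    5000: [5.4, 4.6, 7.4, 5.0],
    10000: [5.8, 5.3, 3.2, 7.5],
}


def one_way_anova(groups):
    """Return (group means, MS between, MS within, df between, df within)."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return means, ss_between / df_between, ss_within / df_within, df_between, df_within


def contrast_t(groups, means, ms_within, weights):
    """Unadjusted t statistic for sum(w_i * mean_i); weights must sum to zero."""
    estimate = sum(w * m for w, m in zip(weights, means))
    se = math.sqrt(ms_within * sum(w ** 2 / len(g) for w, g in zip(weights, groups)))
    return estimate / se


groups = list(growth.values())
means, ms_between, ms_within, df_between, df_within = one_way_anova(groups)
f_stat = ms_between / ms_within
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) df")

# Example weights only: compares the 1000 group with the 5000 group.
example_weights = [0, 1, -1, 0]
t_contrast = contrast_t(groups, means, ms_within, example_weights)
print(f"contrast t = {t_contrast:.2f} on {df_within} df")
```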
5. (40 points)
A study was carried out to see how previous work experience improves the ability of a
programmer to complete a complex programming task. 25 randomly chosen programmers
were asked to complete a complex programming task within a specified time period. For
each programmer, the length of previous programming work experience (in months) was
recorded, as well as whether the task was successfully completed (1 indicates success). The
data are presented in the table below (and in programmer.txt).
(a) Does work experience improve the programmer’s ability? Carry out an appropriate
statistical analysis. What is your conclusion?
(b) With 95% confidence, give an interval estimate of the improvement in the odds of
completing the task within the specified time period for each extra year of work experience.
(Note that experience is recorded in months.)
(c) The employer values the employee according to the probability of finishing the given
task within the time period. A person with 100% probability of finishing the given task
within the time period is worth $90,000 per year; a person with 80% probability
of finishing is worth $72,000; and so on. Knowing only the work history of an applicant, what
salary (X dollars per year) should the employer pay a programmer with 24 months of
previous work experience? If a second programmer with 18 months of work experience is
willing to accept a yearly salary of $(X-10,000), is the second programmer a better deal for
the company according to the analysis?
Experience (months) Success
14 0
29 0
6 0
25 1
18 1
4 0
18 0
12 0
22 1
6 0
30 1
11 0
30 1
5 0
20 1
13 0
9 0
32 1
24 0
13 1
19 0
4 0
28 1
22 1
8 1
14 0
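With a binary outcome and a numeric predictor, one candidate model is a logistic
regression of success on months of experience. The Python sketch below (standard library
only) fits it by Newton-Raphson on the rows listed above and converts the per-month slope
into a per-year odds ratio with a Wald 95% interval. The function name fit_logistic, the
fixed iteration count, and the use of a Wald interval are assumptions of this sketch.

```python
import math

# Experience (months) and success indicator, copied row by row from the table.
months = [14, 29, 6, 25, 18, 4, 18, 12, 22, 6, 30, 11, 30,
          5, 20, 13, 9, 32, 24, 13, 19, 4, 28, 22, 8, 14]
success = [0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
           0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0]


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, se_b1); se_b1 comes from the inverse information matrix
    evaluated in the last iteration.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix entries.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information inverse) times score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, math.sqrt(h00 / det)


b0, b1, se_b1 = fit_logistic(months, success)

# Odds ratio for 12 extra months, with a Wald 95% interval.
or_year = math.exp(12 * b1)
or_low = math.exp(12 * (b1 - 1.96 * se_b1))
or_high = math.exp(12 * (b1 + 1.96 * se_b1))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f} (SE {se_b1:.4f})")
print(f"odds ratio per year = {or_year:.2f}, 95% CI ({or_low:.2f}, {or_high:.2f})")
```

The fitted probability for a given number of months m is 1 / (1 + exp(-(b0 + b1 * m))),
which is the quantity part (c) relies on.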
6. (20 points) Do 21.5.8 on page 512.
7. For the following data, we wish to test whether X has the same mean in group A and group B.
(a) (3 points) Conduct the two-sample t-test for this hypothesis.
(b) (7 points) Conduct a permutation test using the t-test statistic from part (a). Submit
your R code for this permutation test, the R output, and the results of your test. (Hint: while
you can mimic the example R code for the permutation correlation test, please note that the
way of permuting is different. See the lecture notes at the end of the non-parametric test
module.)
The data are also in the file PermutationTestData.txt.
X Group
1 9.02 A
2 7.88 A
3 10.80 A
4 7.97 A
5 7.81 A
6 8.91 A
7 8.43 A
8 8.88 A
9 8.91 A
10 9.42 A
11 13.16 A
12 9.05 A
13 7.32 B
14 7.95 B
15 7.32 B
16 8.61 B
17 7.01 B
18 7.09 B
19 9.17 B
20 7.78 B
21 7.51 B
22 10.92 B
23 8.09 B
24 7.07 B
25 7.02 B
26 8.46 B
27 9.23 B
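The logic of the permutation test in part (b) is to pool the 27 values, shuffle the group
labels repeatedly, and recompute the t statistic each time. The sketch below shows that
logic in Python (standard library only) rather than in R, so it illustrates the idea and
is not the R code the problem asks for. The pooled-variance form of the t statistic, the
number of permutations, the seed, and the function names pooled_t and permutation_p_value
are all assumptions of this sketch.

```python
import math
import random

# Values of X by group, copied from the table.
group_a = [9.02, 7.88, 10.80, 7.97, 7.81, 8.91, 8.43, 8.88, 8.91, 9.42, 13.16, 9.05]
group_b = [7.32, 7.95, 7.32, 8.61, 7.01, 7.09, 9.17, 7.78,
           7.51, 10.92, 8.09, 7.07, 7.02, 8.46, 9.23]


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def permutation_p_value(a, b, n_perm=10000, seed=7343):
    """Two-sided permutation p-value obtained by shuffling group membership."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    pooled = list(a) + list(b)
    t_obs = pooled_t(a, b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # reassign values to groups at random
        t_perm = pooled_t(pooled[:len(a)], pooled[len(a):])
        if abs(t_perm) >= abs(t_obs):
            extreme += 1
    # Count the observed arrangement as one of the permutations.
    return t_obs, (extreme + 1) / (n_perm + 1)


t_obs, p_value = permutation_p_value(group_a, group_b)
print(f"observed t = {t_obs:.3f}, permutation p-value = {p_value:.4f}")
```

Unlike a permutation test for correlation, which shuffles one variable against the other,
here the shuffling reassigns observations to the two groups while keeping the group sizes
at 12 and 15.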