
Date: 2021-04-16

University of Toronto Mississauga
STA310 H5S: Bayesian Statistics in Forensic Science -
Winter 2021
Instructor: Dr. Ramya Thinniyam
Midterm Test
(Administered on Quercus)
February 22, 2021
SOLUTIONS
INSTRUCTIONS:
Test Duration/Submission Period:
• The test will be open on Quercus for 24 hours from Feb 22nd 9:00am EST to Feb 23rd
9:00am EST. You may submit your answers any time during the open period.
• The actual test duration is at most 90 minutes (if you were writing it in person, you would
not be given more than 90 mins). I am leaving the test open for 24 hours to accommodate
online test writing conditions, time zone differences, technical difficulties, etc.
• Do not leave the test to the last minute. It is your responsibility to make sure you have a
stable internet connection and the required materials to complete the test.
Test Policies:
• You will be required to sign an Honour Pledge to confirm that you have maintained academic
honesty during this assessment and submitted your own work. You must complete the
test individually. You are not allowed to discuss the test questions/content with any student
in the course or anyone beyond the course during the open period of the test. You will get
randomized questions/output, etc. so do not try to commit an academic offense by sharing
your answers with others - anomalies will be investigated.
• You may use a calculator and your notes as aids. You cannot use other resources such as
the internet or solutions from this course/other courses to copy answers (this is considered
plagiarism).
• You cannot base your justification on any procedure/fact that is not covered in the lectures.
• Questions will be randomized, locked (so you can’t go back to the previous question), and
displayed one at a time.
**If you skip/submit a question by mistake or submit the whole test by mistake without
finishing, you CANNOT go back. BE CAREFUL! You will get warning messages each time
you click Next. You cannot skip questions or go back**.
• For some Short Answer questions, you will be asked to upload your answers from a file. You
can either type up your answers or clearly write them using dark pen/pencil and then
scan/take photographs. Format your solution neatly, make sure to label the question
numbers/part letters (1, 2, 3, a, b, c, etc.), and put all the parts together and then upload ONE
file. Your writing/scan/upload should be legible and clear for the marker to read.
• Numerical answers should be rounded to 4 decimal places where appropriate.
TOTAL: 50 marks
BEST WISHES!
STA310 - Winter 2021 Midterm Test Solutions Page 1 of 11
[1 mark - 1m for typing in the pledge statement that was given and including
student’s full name and student number. ]
0. HONOUR PLEDGE
[5 marks - 1m each part. 5 parts randomly assigned ]
1. True/False: If the statement is true under all conditions, select T; otherwise select F.
a) The total sum of squares will change according to the model that is fit. T F
b) The Likelihood is a measure of the relative strength of evidence in favour of one hypothesis
against another. T F
c) The Bayes Factor is a probability. T F
d) The likelihood ratio is equal to the posterior probability when prior odds are 1:1. T F
e) Consider a One-Way ANOVA with only two levels (Group 1, Group 2). Conducting the
One-Way ANOVA F-test is equivalent to testing H0 : µ1 = µ2 vs Ha : µ1 ≠ µ2 using a
t-test. T F
f) Least squares means are equal to arithmetic means in a One-Way ANOVA model. T F
g) Bonferroni is more conservative than Tukey’s method for pairwise comparisons of group
means when the design is balanced. T F
h) Tukey can be used for multiple tests that are not pre-planned. T F
[14 marks - 1m each part. 14 parts randomly assigned ]
2. Fill in the Blanks: Refer to the Fingerprint Matching based on Minutiae study. Fill in the
blank with the correct word/number. For numerical values, use 4 decimal places where appropriate.
a) Fill in the following missing number from the output: (A) = 2
b) Fill in the following missing number from the output: (B) = 21
c) Fill in the following missing number from the output: (C) = 2,931.9
d) Fill in the following missing number from the output: (D) = 12.2789
e) Fill in the following missing number from the output: (E) = 0.0103
f) Fill in the following missing number from the output: (F) = 0.0458
g) Is the design balanced or unbalanced? Balanced [write either ‘balanced’ or
‘unbalanced’ ].
h) What percent of variability in quality of fingerprints is accounted for by the dominant
minutia type used to match them? 53.9 %.
i) Give an unbiased estimate for the common standard deviation of the error terms from
model1: 15.4524 or 15.4525 or 15.45 .
j) Suppose we want to do all pairwise comparisons between minutia types using the Bonferroni
method. Write the p-value that corresponds to this question of interest: “Do fingerprints
that use bifurcations for matching have different quality than those that use island
minutia?” 0.0309
k) What p-value corresponds to testing the question of interest? 0.0003
l) Based on the analyses conducted, which dominant minutia type is the worst choice for
matching poor quality fingerprints? Island
m) Suppose we want to do all pairwise comparisons between minutia types using the Bonferroni
method. Write the p-value that corresponds to this question of interest: “Do fingerprints
that use ridge endings for matching have poorer quality than those that use island
minutia?” 0.0001
n) Consider this model: $Y_{gi} = \beta_0 + \beta_1 X_{B,i} + \beta_2 X_{I,i} + e_{gi}$, where $g = 1, 2, 3$,
$$X_{B,i} = \begin{cases} 1, & \text{if the } i\text{th fingerprint used bifurcation minutia as the dominant type to match} \\ -1, & \text{if the } i\text{th fingerprint used ridge ending minutia as the dominant type to match} \\ 0, & \text{otherwise} \end{cases}$$
and
$$X_{I,i} = \begin{cases} 1, & \text{if the } i\text{th fingerprint used island minutia as the dominant type to match} \\ -1, & \text{if the } i\text{th fingerprint used ridge ending minutia as the dominant type to match} \\ 0, & \text{otherwise} \end{cases}$$
Give a point estimate for $\beta_0$. 29.6997
o) Consider this model: $Y_{gi} = \beta_0 + \beta_1 X_{B,i} + \beta_2 X_{I,i} + e_{gi}$, where $g = 1, 2, 3$,
$$X_{B,i} = \begin{cases} 1, & \text{if the } i\text{th fingerprint used bifurcation minutia as the dominant type to match} \\ -1, & \text{if the } i\text{th fingerprint used ridge ending minutia as the dominant type to match} \\ 0, & \text{otherwise} \end{cases}$$
and
$$X_{I,i} = \begin{cases} 1, & \text{if the } i\text{th fingerprint used island minutia as the dominant type to match} \\ -1, & \text{if the } i\text{th fingerprint used ridge ending minutia as the dominant type to match} \\ 0, & \text{otherwise} \end{cases}$$
Give a point estimate for $\beta_1$. -1.7877
p) Consider this model: $Y_{gi} = \beta_0 + \beta_1 X_{B,i} + \beta_2 X_{I,i} + e_{gi}$, where $g = 1, 2, 3$,
$$X_{B,i} = \begin{cases} 1, & \text{if the } i\text{th fingerprint used bifurcation minutia as the dominant type to match} \\ -1, & \text{if the } i\text{th fingerprint used ridge ending minutia as the dominant type to match} \\ 0, & \text{otherwise} \end{cases}$$
and
$$X_{I,i} = \begin{cases} 1, & \text{if the } i\text{th fingerprint used island minutia as the dominant type to match} \\ -1, & \text{if the } i\text{th fingerprint used ridge ending minutia as the dominant type to match} \\ 0, & \text{otherwise} \end{cases}$$
Give a practical interpretation for β0 (your answer should be in terms of this particular
case study).
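It is the overall mean quality score of latent fingerprints regardless of the dominant minutia
used to match them (the unweighted average of the three group means, which equals the mean
of all 24 prints because the design is balanced).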
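q) Consider this model: $Y_{gi} = \beta_0 + \beta_1 X_{B,i} + \beta_2 X_{I,i} + e_{gi}$, where $g = 1, 2, 3$,
$$X_{B,i} = \begin{cases} 1, & \text{if the } i\text{th fingerprint used bifurcation minutia as the dominant type to match} \\ -1, & \text{if the } i\text{th fingerprint used ridge ending minutia as the dominant type to match} \\ 0, & \text{otherwise} \end{cases}$$
and
$$X_{I,i} = \begin{cases} 1, & \text{if the } i\text{th fingerprint used island minutia as the dominant type to match} \\ -1, & \text{if the } i\text{th fingerprint used ridge ending minutia as the dominant type to match} \\ 0, & \text{otherwise} \end{cases}$$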
Give a practical interpretation for β1 (your answer should be in terms of this particular
case study).
It is the difference between the mean quality score of latent fingerprints that used
bifurcation as dominant minutia and the mean of all fingerprints (regardless of minutia).
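r) Consider this model: $Y_{gi} = \beta_0 + \beta_1 X_{B,i} + \beta_2 X_{I,i} + e_{gi}$, where $g = 1, 2, 3$,
$$X_{B,i} = \begin{cases} 1, & \text{if the } i\text{th fingerprint used bifurcation minutia as the dominant type to match} \\ -1, & \text{if the } i\text{th fingerprint used ridge ending minutia as the dominant type to match} \\ 0, & \text{otherwise} \end{cases}$$
and
$$X_{I,i} = \begin{cases} 1, & \text{if the } i\text{th fingerprint used island minutia as the dominant type to match} \\ -1, & \text{if the } i\text{th fingerprint used ridge ending minutia as the dominant type to match} \\ 0, & \text{otherwise} \end{cases}$$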
Give a practical interpretation for β2 (your answer should be in terms of this particular
case study).
It is the difference between the mean quality score of latent fingerprints that used island
as dominant minutia and the mean of all fingerprints (regardless of minutia).
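As a cross-check on parts (a) to (i), the blanked entries can be recomputed from the sums of squares and group sizes printed in the R output. The following is a minimal Python sketch that assumes only those printed values. The variable names are illustrative.

```python
import math

# Values copied from anova(model1) and tapply(...) in the R output.
ss_reg = 5863.8   # Sum Sq for minutia_type
ss_res = 5014.3   # Sum Sq for Residuals
n_total = 24      # 8 prints in each group
k_groups = 3      # ridge ending, bifurcation, island

df_reg = k_groups - 1          # (A): numerator degrees of freedom
df_res = n_total - k_groups    # (B): residual degrees of freedom
ms_reg = ss_reg / df_reg       # (C): mean square for minutia_type
mse = ss_res / df_res          # residual mean square (printed as 238.78)
f_stat = ms_reg / mse          # (D): ANOVA F statistic

r_squared = ss_reg / (ss_reg + ss_res)   # part (h): proportion of variability explained
sigma_hat = math.sqrt(mse)               # part (i): residual standard error

print(f"(A) = {df_reg}, (B) = {df_res}, (C) = {ms_reg:.1f}, (D) = {f_stat:.4f}")
print(f"R^2 = {r_squared:.4f}, sigma_hat = {sigma_hat:.4f}")
```

Using the unrounded MSE (5014.3/21) rather than the printed 238.78 is why (D) comes out as 12.2789.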
[20 marks]
3. Short Answer: Show your work and explain your answers. Answers (even correct ones)
without justification will not receive marks. Refer to Fingerprint Matching based on Minutiae
study. Recall that the question of interest is whether the quality of latent fingerprints differs
by the dominant minutia type used to match them.
[3m-1m for indicators and defining them, 1m for proper notation of parame-
ters, 0.5m for response, 0.5m for errors]
a) Write out the theoretical model that is being fitted in model1 in the R output. Define
any variables you include.
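The model being fitted in model1 is:
$$Y_k = \beta_0 + \beta_1 I_{I,k} + \beta_2 I_{R,k} + e_k, \quad k = 1, 2, \ldots, 24,$$
where $Y_k$ is the quality score for the $k$th fingerprint, $e_k \overset{\text{iid}}{\sim} N(0, \sigma^2)$, and
$$I_{I,k} = \begin{cases} 1, & \text{if the } k\text{th fingerprint used island as the dominant minutia for matching} \\ 0, & \text{otherwise} \end{cases}$$
$$I_{R,k} = \begin{cases} 1, & \text{if the } k\text{th fingerprint used ridge ending as the dominant minutia for matching} \\ 0, & \text{otherwise} \end{cases}$$
Bifurcation is the baseline level, so $\beta_0$ is the mean quality score for prints matched on bifurcations.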
[2m - 1m for correct values of parameter estimates, 1m for proper notation
with hat and not using error term ]
b) Write out the fitted model from model1 in the R output. Define any new variables you
introduce.
$$\hat{y}_k = 27.912 + 21.763\, I_{I,k} - 16.4\, I_{R,k}$$
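The fitted model in (b) also determines the three group means, and from those the effect-coded parameters in Question 2 (n) to (r). The following is a minimal Python sketch that assumes the rounded estimates printed by summary(model1). It relies on a standard property of effect (sum-to-zero) coding: the group means are β0 + β1 (B), β0 + β2 (I) and β0 − β1 − β2 (R), so β0 is their unweighted average.

```python
# Treatment-coded estimates from summary(model1); bifurcation (B) is the baseline level.
b0, b_island, b_ridge = 27.912, 21.763, -16.400

# Group means implied by yhat = b0 + b_island * I_I + b_ridge * I_R.
mean_b = b0
mean_i = b0 + b_island
mean_r = b0 + b_ridge

# Effect coding from Question 2 (n)-(r): ridge ending is coded -1 on both X_B and X_I,
# so beta0 is the unweighted average of the three group means and each slope
# is that group's mean minus beta0.
beta0 = (mean_b + mean_i + mean_r) / 3
beta1 = mean_b - beta0           # bifurcation effect
beta2 = mean_i - beta0           # island effect
ridge_effect = -(beta1 + beta2)  # implied ridge-ending effect

for name, value in [("mean_B", mean_b), ("mean_I", mean_i), ("mean_R", mean_r),
                    ("beta0", beta0), ("beta1", beta1), ("beta2", beta2)]:
    print(f"{name} = {value:.4f}")
```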
[2m - 1m using proper notation and correct statement in H0, 1m for using
proper notation and correct statement in Ha]
c) Write out the appropriate null and alternative hypotheses that would be needed to test
the question of interest using the linear regression model that was fitted in model1. Use
proper notation. (You do not have to conduct the actual test.)
$H_0: \beta_1 = \beta_2 = 0$ vs $H_a: \beta_j \neq 0$ for at least one $j \in \{1, 2\}$
[6m - 1m for each step- hypotheses, test stat, distribution, conclusion, and
practical conclusion with proper words]
d) Test the question of interest using the ANOVA test at the 5% significance level. Include
all the necessary steps for the hypothesis test (and include a practical conclusion).
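$H_0: \mu_B = \mu_I = \mu_R$ vs $H_a: \mu_i \neq \mu_j$ for at least one pair $i \neq j$, where $i, j \in \{B, I, R\}$
$$F = \frac{MS_{Reg}}{MSE} = \frac{5863.8/2}{238.78} = 12.2787 \sim F_{2,21} \text{ under } H_0$$
(Using the unrounded $MSE = 5014.3/21 = 238.7762$ gives $F = 12.2789$, the value in 2(d).)
$$p = P(F_{2,21} > 12.2787) = 0.0003 \;\Rightarrow\; \text{Reject } H_0$$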
There is very strong evidence to conclude that the quality scores of latent fingerprints vary by
the dominant minutia used for matching.
(Hypotheses are in terms of the means, not the regression parameters. In practical conclusion,
underlined words or synonyms for them should be used.)
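The p-value in (d) can be checked without statistical tables. When the numerator degrees of freedom equal 2, the F survival function has the closed form $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$. This follows because a $\chi^2_2$ variable is exponential. The following sketch applies the formula with $\nu = 21$. The helper name f2_upper_tail is hypothetical, and it is valid only for a numerator df of 2.

```python
def f2_upper_tail(f, df_den):
    """P(F > f) for F ~ F(2, df_den), via the closed form (1 + 2f/df_den)^(-df_den/2)."""
    return (1.0 + 2.0 * f / df_den) ** (-df_den / 2.0)

ms_reg = 5863.8 / 2      # MSReg from anova(model1)
mse = 5014.3 / 21        # MSE from anova(model1)
f_stat = ms_reg / mse
p_value = f2_upper_tail(f_stat, 21)

alpha = 0.05
print(f"F = {f_stat:.4f}, p = {p_value:.6f}, reject H0 at 5%: {p_value < alpha}")
```

For other numerator degrees of freedom, a library routine such as `scipy.stats.f.sf(f, dfn, dfd)` would be needed instead.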
[4m]
e) Based on the Post-Hoc Analysis conducted, what do you conclude? Be specific and
name the methods that you are referring to. Which of the procedures carried out is most
appropriate in this example? Justify. Give an overall conclusion to this study.
• Since the design is balanced, Tukey’s HSD method is more powerful (less conservative) than
Bonferroni for pairwise comparisons between group means, so it is the most appropriate
procedure here. (1m)
• Based on Tukey’s HSD, there is moderate to strong evidence (p = 0.0268) of a difference between
the quality of fingerprints that used island and those that used bifurcation, and very strong
evidence (p = 0.0002) of a difference between fingerprints that used island and those that used
ridge ending. There is insufficient evidence (p = 0.1093) of a difference between fingerprints
that used bifurcation and those that used ridge ending.
(2m)
• Overall, ridge ending appears to be the best dominant minutia for matching poor quality prints:
prints matched on ridge endings had the lowest quality scores but were still matched successfully. (1m)
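To see why Tukey is preferred here, the Bonferroni-adjusted p-values can be set beside the Tukey-adjusted values from TukeyHSD. The Bonferroni values are three times the unadjusted pairwise.t.test values, capped at 1. The following sketch also reproduces the Bonferroni answers to 2(j) and 2(m). For 2(m), it assumes that the one-sided p-value is half the two-sided one. This is valid because the estimated difference R − I is negative, which is the hypothesized direction.

```python
# Unadjusted pairwise p-values from pairwise.t.test(..., p.adj = "none"):
# (E) = 0.0103 and (F) = 0.0458 match the coefficient t-tests in summary(model1).
unadjusted = {"I-B": 0.0103, "R-B": 0.0458, "R-I": 6.9e-05}
# Tukey-adjusted p-values from the TukeyHSD output.
tukey = {"I-B": 0.0267884, "R-B": 0.1093058, "R-I": 0.0001967}

def bonferroni(p, m):
    """Bonferroni adjustment: multiply by the number of comparisons, cap at 1."""
    return min(1.0, m * p)

m = len(unadjusted)  # three pairwise comparisons
bonf = {pair: bonferroni(p, m) for pair, p in unadjusted.items()}

for pair in unadjusted:
    print(f"{pair}: Bonferroni {bonf[pair]:.4f}   Tukey {tukey[pair]:.4f}")

# 2(m) is one-sided ("poorer quality"); the estimate R - I is negative, so halve.
one_sided_r_less_than_i = bonf["R-I"] / 2
print(f"2(m): {one_sided_r_less_than_i:.4f}")
```

Every Bonferroni value exceeds its Tukey counterpart, which is what "more conservative" means in practice for this balanced design.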
[4m]
f) Does there appear to be any violation of the model assumptions? Discuss in detail with
reference to the appropriate diagnostic plots/summary statistics. If any of the assumptions
may be of concern, pick one and give a practical reason (in terms of this scenario) for why
this may be the case.
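• The QQ-plot of the residuals shows a slight curve/S shape, suggesting that the residuals may be
skewed and not normally distributed. The residual quartiles in the output are also asymmetric
about the median (1Q = −8.238, median = −1.994, 3Q = 11.797). (1m)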
• There does not seem to be a major violation of the constant variance assumption when
looking at the boxplot and plot of residuals vs fitted values.
The ratio of the largest standard deviation to the smallest is
$$\frac{s_I}{s_R} = \frac{20.78583}{11.78830} = 1.76 < 2.$$
(1m)
• The time series plot of the residuals appears to have a pattern (the residuals are not randomly
scattered about 0), so there could be a problem with independence. Quality scores could be
correlated because some fingerprints may have been collected together, affected by the same
collection method, measurement method, or laboratory, or collected from similar crime scenes,
etc. (2m)
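The equal-variance rule of thumb in (f), and its link to the residual standard error, can be checked with a few lines of Python. The following sketch uses the group standard deviations and sizes from tapply(). With equal group sizes, the pooled standard deviation should match the residual standard error of 15.45.

```python
import math

# Group standard deviations and sizes from tapply(...) in the R output.
sds = {"B": 12.05481, "I": 20.78583, "R": 11.78830}
n_per_group = 8

# Rule of thumb used in the solution above: largest SD / smallest SD < 2
# suggests no serious violation of constant variance.
ratio = max(sds.values()) / min(sds.values())

# Pooled SD = sqrt(sum((n_g - 1) * s_g^2) / (N - k)); with equal n this should
# equal the residual standard error from summary(model1).
df_error = len(sds) * (n_per_group - 1)
pooled_sd = math.sqrt(sum((n_per_group - 1) * s ** 2 for s in sds.values()) / df_error)

print(f"SD ratio = {ratio:.2f} ({'OK' if ratio < 2 else 'concern'})")
print(f"pooled SD = {pooled_sd:.4f}")
```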
[10 marks]
4. Short Answer: For each of the following questions, show your work and explain your
answers. Answers (even correct ones) without justification will not receive marks.
(Important ideas/parts of the solution are underlined for general solution. Specific examples
will be different but should include parts of solution but with different scenarios and numbers.)
[2m]
a) Explain in YOUR OWN words what “Prosecutor’s Fallacy” means.
The Prosecutor’s Fallacy is usually committed by the prosecution when they wrongfully conclude
that the chance of the suspect being innocent based on the evidence is very small when in
reality it is the probability of the observed evidence if the suspect is innocent that is very small.
[2m]
b) Explain in YOUR OWN words what “Defender’s Fallacy” means.
The Defender’s Fallacy is usually committed by defense lawyers who argue that the evidence
is irrelevant and has no value because many other people in the population would also match
it. This argument is acceptable only if the suspect was identified solely on the basis of that one
piece of forensic evidence; often, however, the suspect was also identified through other evidence.
[2m]
c) Make up YOUR OWN ORIGINAL example of a case and show how both of the above
fallacies would be used in your example. Use only ONE example that shows both fallacies
(not two different examples).
**You may NOT use any of the cases we covered in the course (State vs Skipper from
lecture, Murder vs SIDS from lecture, Blood Stain example from HW). You may not copy
an example from the internet; you have to make up your own to demonstrate knowledge of
the concepts.**
To answer this question, include the following:
• Make up a realistic scenario (1m - cannot be same as one of the examples listed above)
• State probabilities (make probabilities realistic/sensible but they don’t have to be
accurate) (0.5m)
• Clearly define events/hypotheses (0.5m - use proper notation)
• State what the prosecutor and defense lawyer would say to commit the above fallacies
in this scenario (2m -wordings should be accurate and in terms of the case example,
not general)
• Rewrite both fallacies as probabilistic statements using the events/hypotheses you de-
fined (1m - use proper notation and distinguish between conditional, inverse conditional,
and unconditional probabilities)
• Explain why these fallacies are problematic for a court case (1m - practical reason about
court cases should be given such as the reasons listed below)
(Here is an example. Each student’s example will vary but should include relevant numbers
and explanations:)
Scenario: A crime has been committed and a blood stain is left behind at the crime scene.
A suspect, whose blood type matches that of the stain found at the crime scene, is arrested.
Only 1 in 100,000 of the population has this rare blood type found at the crime scene (and
in the suspect).
Hypotheses: Let Hs be the hypothesis that the blood stain came from the suspect and Ho
be the hypothesis that the blood stain was left by someone other than the suspect. Let E
represent the blood test evidence.
Defense Lawyer’s claim: “In a city like this with a population of 30,000,000 people who may
have committed the crime, this blood type would be found in approximately 300 people. So
the evidence merely shows that the suspect is one of 300 people in the city who might have
committed this crime. The blood test evidence has provided a probability of guilt of 1 in
300, which is negligible and cannot prove the suspect is guilty.”
Stated as Probabilities: P (Hs|E) = 1/300 and P (Hs) = 1/30,000,000 (before the evidence, the
suspect is treated as just one of 30,000,000 possible sources)
Prosecutor’s claim: “The chance of observing this blood type if the blood came from someone
other than the suspect is 1 in 100,000. Therefore, the chance that the blood came from
someone other than the suspect is only 1 in 100,000. The probability is so small that the
defendant must have committed the crime.”
Stated as Probabilities: P (E|Ho) = 0.00001 and P (Ho|E) = 0.00001
The Defender’s fallacy is problematic because it dismisses the evidence as worthless, when in
fact the evidence is important: it narrowed the pool of possible sources from 30,000,000 people
to about 300. The Prosecutor’s fallacy is problematic because, mathematically, equating P (Ho|E)
with P (E|Ho) amounts (approximately) to assuming a 50% chance that the suspect is guilty
before the evidence is introduced. This goes against the legal principle of “innocent until
proven guilty” that should be followed in our court system.
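Bayes’ theorem makes both fallacies concrete. The following sketch assumes, as in the example, that P (E|Hs) = 1, P (E|Ho) = 1/100,000, and that there are 30,000,000 possible sources. The function name posterior_source is hypothetical. A prior of 1/30,000,000 reproduces the defence’s figure of roughly 1 in 300. By contrast, the prosecutor’s P (Ho|E) ≈ 0.00001 follows only from prior odds of 1:1.

```python
def posterior_source(prior_hs, p_e_given_hs, p_e_given_ho):
    """P(Hs | E) by Bayes' theorem for two exhaustive hypotheses Hs and Ho."""
    numerator = p_e_given_hs * prior_hs
    return numerator / (numerator + p_e_given_ho * (1.0 - prior_hs))

match_prob = 1 / 100_000       # P(E | Ho): a random person has this blood type
population = 30_000_000        # possible sources in the city

# Defender's implicit prior: the suspect is one of 30,000,000 equally likely sources.
p_hs_defence = posterior_source(1 / population, 1.0, match_prob)

# The prosecutor's conclusion P(Ho|E) ~ P(E|Ho) requires prior odds of 1:1.
p_ho_prosecution = 1.0 - posterior_source(0.5, 1.0, match_prob)

print(f"Uniform prior: P(Hs|E) = {p_hs_defence:.5f} (about 1 in {1 / p_hs_defence:.0f})")
print(f"1:1 prior:     P(Ho|E) = {p_ho_prosecution:.2e}")
```

The posterior of about 1/301 is far larger than the prior of 1/30,000,000. That increase is exactly the value of the evidence that the Defender’s fallacy ignores.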
Fingerprint Matching based on Minutiae:
Latent fingerprints (fingerprints left at a crime scene) are often used to identify
suspects by matching them to digital prints from a database. There are different
features called “minutiae” on latent prints that can provide useful information
to a forensic examiner. There are several types of minutiae (see photos). A
new technology has been proposed to rate the quality of fingerprints based on
gradient quality (by converting the latent fingerprint into a digital copy and
then examining pixels around the image). The quality score ranges from 0 to
100, where higher scores indicate higher quality. For the purposes of this test,
you do not need to know the specific technical details of this algorithm. This
study considered only prints of poor quality that are damaged by substances
from crime scenes and so are naturally harder to match. Amongst these poor
quality prints, only ones that were matched properly to digital prints of suspects
based on one dominant minutia (either ridge ending, bifurcation, or island)
were analyzed. The question of interest is whether the quality of latent fingerprints
differs by the dominant minutia type used to match them. Refer to the R output.
The variables are:
‘quality score’ (score from 0 to 100)
‘minutia type’ (R-ridge ending, B-bifurcation, I-island)
R OUTPUT ON NEXT PAGE
Fingerprint Matching based on Minutiae: R OUTPUT
> tapply(quality_score,minutia_type,sd)
B I R
12.05481 20.78583 11.78830
> tapply(quality_score,minutia_type,length)
B I R
8 8 8
> model1 <- lm(quality_score ~ minutia_type)
> summary(model1)
Call:
lm(formula = quality_score ~ minutia_type)
Residuals:
Min 1Q Median 3Q Max
-28.875 -8.238 -1.994 11.797 29.725
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.912 5.463 5.109 4.63e-05 ***
minutia_typeI 21.763 7.726 2.817 0.0103 *
minutia_typeR -16.400 7.726 -2.123 0.0458 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.45
Multiple R-squared: 0.539, Adjusted R-squared: 0.4951
> anova(model1)
Analysis of Variance Table
Response: quality_score
Df Sum Sq Mean Sq F value Pr(>F)
minutia_type (A) 5863.8 (C) (D) 0.000294 ***
Residuals (B) 5014.3 238.78
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> bonf = pairwise.t.test(quality_score, minutia_type, p.adj = "none")
> bonf
Pairwise comparisons using t tests with pooled SD
data: quality_score and minutia_type
B I
I (E) -
R (F) 6.9e-05
P value adjustment method: none
> tukeyCIs = TukeyHSD(aov(model1),factor="minutia_type")
> tukeyCIs
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = model1)
$minutia_type
diff lwr upr p adj
I-B 21.7625 2.288026 41.236974 0.0267884
R-B -16.4000 -35.874474 3.074474 0.1093058
R-I -38.1625 -57.636974 -18.688026 0.0001967
