FIT2086 Exam Revision
Supplementary Questions
Daniel F. Schmidt
November 8, 2022
Contents
1 Introduction
2 Maximum Likelihood Estimation
3 Confidence Intervals and p-values
4 Logistic Regression
5 Bias and Variance
6 Appendix I: Standard Normal Distribution Table
7 Appendix II: Formulae
1 Introduction
This document contains some extra examples of the types of questions you will be asked on the exam.
2 Maximum Likelihood Estimation
A random variable Y is said to follow a Gamma distribution with integer shape parameter α and rate parameter β if its probability density function is
$$
p(y \mid \alpha, \beta) = \frac{\beta^{\alpha}}{(\alpha - 1)!}\, y^{\alpha - 1} \exp(-\beta y)
$$
for $y > 0$. Imagine we observe a sample of n positive real numbers $\mathbf{y} = (y_1, \ldots, y_n)$ and want to model them using a Gamma distribution. (Hint: remember that the data is independently and identically distributed.)
1. Write down the Gamma distribution likelihood function for the data y (i.e., the joint probability of the data
under a Gamma distribution with shape parameter α and rate parameter β).
A: The data is independently and identically distributed, so the likelihood is the product of the probability
for each data point
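$$
\begin{aligned}
p(\mathbf{y} \mid \alpha, \beta) &= \prod_{i=1}^{n} \frac{\beta^{\alpha}}{(\alpha - 1)!}\, y_i^{\alpha - 1} \exp(-\beta y_i) \\
&= \frac{\beta^{n\alpha}}{((\alpha - 1)!)^{n}} \left( \prod_{i=1}^{n} y_i^{\alpha - 1} \right) \left( \prod_{i=1}^{n} \exp(-\beta y_i) \right) \\
&= \frac{\beta^{n\alpha}}{((\alpha - 1)!)^{n}} \left( \prod_{i=1}^{n} y_i \right)^{\alpha - 1} \exp\left( -\beta \sum_{i=1}^{n} y_i \right)
\end{aligned}
$$
where we use the facts that $e^{-a} e^{-b} = e^{-a-b}$ and $a^{b} c^{b} = (ac)^{b}$.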
2. Write down the negative log-likelihood function of the data y under a Gamma distribution with shape parameter α and rate parameter β.
A: Taking the negative logarithm of the above likelihood, we have
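$$
\begin{aligned}
-\log p(\mathbf{y} \mid \alpha, \beta) &= -\log \left[ \frac{\beta^{n\alpha}}{((\alpha - 1)!)^{n}} \left( \prod_{i=1}^{n} y_i \right)^{\alpha - 1} \exp\left( -\beta \sum_{i=1}^{n} y_i \right) \right] \\
&= -\log \frac{\beta^{n\alpha}}{((\alpha - 1)!)^{n}} - \log \left( \prod_{i=1}^{n} y_i \right)^{\alpha - 1} + \beta \sum_{i=1}^{n} y_i \\
&= -n\alpha \log \beta + n \log (\alpha - 1)! - (\alpha - 1) \log \left( \prod_{i=1}^{n} y_i \right) + \beta \sum_{i=1}^{n} y_i
\end{aligned}
$$
where we use the facts $\log(ab) = \log a + \log b$, $\log a^{b} = b \log a$ and $\log e^{a} = a$.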
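As a numerical sanity check on the likelihood and log-likelihood algebra above, the sketch below evaluates the negative log-likelihood three ways: as minus the log of the product of densities, as minus the sum of per-point log-densities, and via the final closed form. The sample `y`, the parameter values and the helper names are hypothetical, chosen only for illustration; `math.lgamma(alpha)` stands in for $\log(\alpha-1)!$, which holds for a positive integer α.

```python
import math

# Illustrative helper names (hypothetical, not from any library).
def gamma_density(y: float, alpha: int, beta: float) -> float:
    """Gamma density beta^alpha / (alpha-1)! * y^(alpha-1) * exp(-beta*y), for y > 0."""
    return beta**alpha / math.factorial(alpha - 1) * y**(alpha - 1) * math.exp(-beta * y)

def gamma_nll_closed_form(y: list[float], alpha: int, beta: float) -> float:
    """-n*alpha*log(beta) + n*log((alpha-1)!) - (alpha-1)*log(prod y_i) + beta*sum(y_i)."""
    n = len(y)
    log_prod_y = sum(math.log(v) for v in y)  # log of the product, computed as a sum of logs
    return (-n * alpha * math.log(beta) + n * math.lgamma(alpha)
            - (alpha - 1) * log_prod_y + beta * sum(y))

# Hypothetical data and parameter values (illustration only)
y = [0.8, 1.5, 2.3, 0.6, 1.9]
alpha, beta = 3, 2.0

from_product = -math.log(math.prod(gamma_density(v, alpha, beta) for v in y))
from_sum = -sum(math.log(gamma_density(v, alpha, beta)) for v in y)
closed_form = gamma_nll_closed_form(y, alpha, beta)

print(from_product, from_sum, closed_form)  # equal up to floating-point rounding
```

For large n the product of densities can underflow to zero in floating point, which is one practical reason to work with the (negative) log-likelihood instead.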
3. Assume that α is known (i.e., we do not have to estimate it but it is a given constant). Derive the maximum
likelihood estimator for β.
A: Differentiate the negative log-likelihood with respect to β:
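$$
\begin{aligned}
\frac{d}{d\beta} \left\{ -\log p(\mathbf{y} \mid \alpha, \beta) \right\} &= -\frac{d}{d\beta} \{ n\alpha \log \beta \} + \frac{d}{d\beta} \{ n \log (\alpha - 1)! \} - \frac{d}{d\beta} \left\{ (\alpha - 1) \log \left( \prod_{i=1}^{n} y_i \right) \right\} + \frac{d}{d\beta} \left\{ \beta \sum_{i=1}^{n} y_i \right\} \\
&= -n\alpha \frac{d}{d\beta} \{ \log \beta \} + \sum_{i=1}^{n} y_i \frac{d}{d\beta} \{ \beta \} \\
&= -\frac{n\alpha}{\beta} + \sum_{i=1}^{n} y_i
\end{aligned}
$$
where the second and third terms vanish because they do not depend on β, and we use $d \log x / dx = 1/x$. Now set the derivative to zero and solve for β:
$$
\begin{aligned}
-\frac{n\alpha}{\beta} + \sum_{i=1}^{n} y_i = 0
&\;\Rightarrow\; -n\alpha + \beta \sum_{i=1}^{n} y_i = 0 \\
&\;\Rightarrow\; \beta \sum_{i=1}^{n} y_i = n\alpha \\
&\;\Rightarrow\; \hat{\beta} = \frac{n\alpha}{\sum_{i=1}^{n} y_i}
\end{aligned}
$$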
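A minimal sketch of this estimator, using a hypothetical sample and treating α as known: it computes $\hat{\beta} = n\alpha / \sum_i y_i$, checks that the derivative $-n\alpha/\beta + \sum_i y_i$ is numerically zero there, and confirms that the negative log-likelihood is larger a little to either side of $\hat{\beta}$. The function names are illustrative only.

```python
import math

# Illustrative helper names (hypothetical, not from any library).
def gamma_nll(y, alpha, beta):
    """Negative log-likelihood of a Gamma(alpha, beta) model (alpha a positive integer)."""
    n = len(y)
    return (-n * alpha * math.log(beta) + n * math.lgamma(alpha)
            - (alpha - 1) * sum(math.log(v) for v in y) + beta * sum(y))

def gamma_rate_mle(y, alpha):
    """Maximum likelihood estimate of the rate beta when the shape alpha is known."""
    return len(y) * alpha / sum(y)

def nll_derivative(y, alpha, beta):
    """d/d(beta) of the negative log-likelihood: -n*alpha/beta + sum(y)."""
    return -len(y) * alpha / beta + sum(y)

y = [0.8, 1.5, 2.3, 0.6, 1.9]  # hypothetical sample
alpha = 3                       # shape assumed known

beta_hat = gamma_rate_mle(y, alpha)
print(beta_hat)                              # n*alpha/sum(y) = 15/7.1
print(nll_derivative(y, alpha, beta_hat))    # approximately zero at the estimate
print(gamma_nll(y, alpha, beta_hat) < gamma_nll(y, alpha, 0.9 * beta_hat))
print(gamma_nll(y, alpha, beta_hat) < gamma_nll(y, alpha, 1.1 * beta_hat))
```

Since $\sum_i y_i = n\bar{y}$, the estimate can equivalently be written $\hat{\beta} = \alpha/\bar{y}$; and because the second derivative of the negative log-likelihood, $n\alpha/\beta^2$, is positive, this stationary point is a minimum.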
3 Confidence Intervals and p-values
A car company runs a fuel efficiency test on a new model of car. They perform 6 tests, and in each test they drive the car until the fuel tank is empty, then calculate the litres of fuel consumed per one hundred kilometers of distance covered. The observed efficiencies (in litres per 100 kilometers, L/100km) were:
$$\mathbf{y} = (7.87, 8.10, 9.07, 8.83, 7.60, 8.91).$$
From previous efficiency experiments, the car company has estimated the population standard deviation of fuel efficiency recordings (i.e., the experimental error) to be 0.3 L/100km. We can assume that a normal distribution is appropriate for our data, and that the population standard deviation of fuel efficiency recordings in our experiment is the same as in previous experiments.
1. Using our sample, estimate the population mean fuel efficiency for this brand of car. Calculate a 95% confidence interval for the population mean fuel efficiency and summarise your results appropriately.
A: We begin by computing the mean, which is
$$\hat{\mu} = (7.87 + 8.10 + 9.07 + 8.83 + 7.60 + 8.91)/6 \approx 8.396$$
We are assuming that the population standard deviation is known and is σ = 0.3. To compute the 95% confidence interval we use the formula
$$
\text{CI}_{95\%} = \left( \hat{\mu} - 1.96 \frac{\sigma}{\sqrt{n}},\; \hat{\mu} + 1.96 \frac{\sigma}{\sqrt{n}} \right)
$$
where our sample size n = 6. We therefore have:
$$
\text{CI}_{95\%} = \left( 8.396 - 1.96 \frac{0.3}{\sqrt{6}},\; 8.396 + 1.96 \frac{0.3}{\sqrt{6}} \right) = (8.156, 8.636)
$$
We can summarise this by saying: “The estimated population mean fuel efficiency of this brand of car is 8.396
L/100km. We are 95% confident that the population mean efficiency for this brand of car is between 8.156
L/100km and 8.636 L/100km.”
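The calculation can be reproduced with a short sketch; the helper name `known_sigma_ci` is hypothetical. It does not round $\hat{\mu}$, so it gives $\hat{\mu} = 50.38/6 \approx 8.3967$ and an interval of roughly (8.1566, 8.6367), matching the values above to within the rounding of $\hat{\mu}$.

```python
import math

def known_sigma_ci(sample, sigma, z=1.96):
    """Hypothetical helper: CI for the mean when the population SD sigma is known.

    Returns (mu_hat, (lower, upper)) with mu_hat +/- z * sigma / sqrt(n);
    z = 1.96 gives a 95% interval.
    """
    n = len(sample)
    mu_hat = sum(sample) / n
    half_width = z * sigma / math.sqrt(n)
    return mu_hat, (mu_hat - half_width, mu_hat + half_width)

y = [7.87, 8.10, 9.07, 8.83, 7.60, 8.91]   # fuel efficiencies in L/100km
mu_hat, (lower, upper) = known_sigma_ci(y, sigma=0.3)
print(f"mu_hat = {mu_hat:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```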
2. The car company runs the same set of tests, on the same set of cars, but with a different brand of fuel. The
new observed fuel efficiencies (again, in L/100km) were
$$\mathbf{y}_B = (7.74, 7.74, 8.22, 7.88, 7.85, 8.27).$$
The company wants to know if this fuel has made any difference to the fuel efficiency. Again, we can assume
the population standard deviation for this new set of fuel efficiency measurements is known to be 0.3 L/100km.
Using this information, please provide a p-value for testing the null hypothesis that the mean fuel efficiency
for the two fuel types is the same. Please interpret this p-value.
A: Let $\mu_A$ be the population mean fuel efficiency for our first brand of fuel, and $\mu_B$ be the population mean fuel efficiency for the second brand of fuel. We want to test the hypothesis
$$H_0 : \mu_A = \mu_B \quad \text{vs} \quad H_A : \mu_A \neq \mu_B$$
that is, our null hypothesis is that there is no difference in fuel efficiency between the two fuels. To test this, we need an estimate of $\mu_A$, which we have from above ($\hat{\mu}_A = 8.396$), and an estimate of the population mean fuel efficiency for fuel type B, which is
$$\hat{\mu}_B = (7.74 + 7.74 + 8.22 + 7.88 + 7.85 + 8.27)/6 = 7.95$$
Again we are assuming the population standard deviation is known and is σ = 0.3. So we need to calculate a z-score for the difference of two means with known variances, which has the formula
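$$
z_{\hat{\mu}_A - \hat{\mu}_B} = \frac{\hat{\mu}_A - \hat{\mu}_B}{\sqrt{\dfrac{\sigma^2}{n_A} + \dfrac{\sigma^2}{n_B}}}
$$
where $n_A$ is the sample size for the first fuel type ($n_A = 6$) and $n_B$ is the sample size for the second fuel type ($n_B = 6$). We then have
$$
z_{\hat{\mu}_A - \hat{\mu}_B} = \frac{8.396 - 7.95}{\sqrt{\dfrac{0.3^2}{6} + \dfrac{0.3^2}{6}}} \approx 2.575.
$$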
To calculate the p-value we use the formula
$$p = 2P(Z < -|z_{\hat{\mu}_A - \hat{\mu}_B}|).$$
To do this, use the Standard Normal Distribution table in the Appendix. Find the value closest to $|z_{\hat{\mu}_A - \hat{\mu}_B}| = 2.575$ in the |z| column: this is 2.605. Then, we see that P(Z < −2.605) = 0.004598, so we can calculate our p-value to be approximately
$$p \approx 2 \times 0.004598 \approx 0.0092$$
We can conclude then that: “We have strong evidence to reject the null hypothesis that the two fuel types are the same. If the two fuel types were the same, then the probability of seeing a difference in average fuel efficiency as large as, or larger than, the one we observed in our experiment is approximately 1 in 110, which is quite unlikely.”
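A minimal sketch of the same test in code, assuming the standard-library `math.erfc` is used to evaluate the standard normal CDF instead of the table; the helper names are illustrative. Because it uses the unrounded means and the exact CDF rather than the nearest table entry, it gives z ≈ 2.579 and p ≈ 0.0099, slightly different from the table-based 0.0092 but leading to the same conclusion.

```python
import math

# Illustrative helper names (hypothetical, not from any library).
def norm_cdf(x):
    """Standard normal CDF P(Z < x), written via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def two_sample_z_test(a, b, sigma_a, sigma_b):
    """Two-sided z-test of H0: mu_a = mu_b with known population SDs."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    se = math.sqrt(sigma_a**2 / len(a) + sigma_b**2 / len(b))
    z = (mean_a - mean_b) / se
    p = 2 * norm_cdf(-abs(z))
    return z, p

y_a = [7.87, 8.10, 9.07, 8.83, 7.60, 8.91]   # first brand of fuel
y_b = [7.74, 7.74, 8.22, 7.88, 7.85, 8.27]   # second brand of fuel
z, p = two_sample_z_test(y_a, y_b, sigma_a=0.3, sigma_b=0.3)
print(f"z = {z:.3f}, p = {p:.4f}")
```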
4 Logistic Regression
Imagine that we have built a logistic regression model to predict the probability of heart disease, H, given a person's age (AGE, in years) and their cholesterol level (CHOL, in mg/dl):
$$\text{logOdds}(H) = -7 + 0.0082\,\text{CHOL} + 0.1\,\text{AGE}$$
1. From this model, what is the effect of age on heart disease?
A: For every additional year of age, holding cholesterol fixed, the log-odds of heart disease increase by 0.1.
2. If a person is 70 years old and has 260 mg/dl cholesterol, what is the probability that they will have heart
disease?
A: Plug the numbers into the above formula to get the log-odds:
logOdds(H) = −7 + 0.0082 · 260 + 0.1 · 70 ≈ 2.132
Then we can use the logistic transformation to get the probability:
$$P(H = 1) = \frac{1}{1 + e^{-2.132}} \approx 0.894$$
3. Using this probability, if you were asked to predict whether they have heart disease, what would you say?
A: The probability of having heart disease is 0.894 > 0.5, so we would predict that this individual has heart disease.
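The log-odds, probability and classification steps can be sketched as follows. The coefficients are those of the fitted model above and the 0.5 threshold is the one used in the answer; the function names are illustrative only. The sketch also computes $e^{0.1} \approx 1.105$, the factor by which the odds of heart disease are multiplied for each additional year of age, holding cholesterol fixed.

```python
import math

# Illustrative helper names (hypothetical, not from any library).
def heart_disease_log_odds(chol, age):
    """Fitted model from the text: logOdds(H) = -7 + 0.0082*CHOL + 0.1*AGE."""
    return -7 + 0.0082 * chol + 0.1 * age

def logistic(x):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-x))

log_odds = heart_disease_log_odds(chol=260, age=70)
prob = logistic(log_odds)
predicts_disease = prob > 0.5               # classify using a 0.5 threshold
odds_ratio_per_year = math.exp(0.1)         # odds multiplier per extra year of age

print(f"log-odds = {log_odds:.3f}, P(H=1) = {prob:.3f}, predict disease: {predicts_disease}")
print(f"odds ratio per year of age = {odds_ratio_per_year:.3f}")
```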
4. If you suspected cholesterol was non-linearly related to the log-odds of having heart disease, what could you
do to try and improve the model?
A: There are two possible approaches:
• We could add non-linear transformations of CHOL to the logistic regression model as new predictors; for example, the square or cube of CHOL (or higher-order polynomial terms), or the log of CHOL;
• We could switch from logistic regression to a decision tree or random forest (i.e., a model that naturally allows for non-linearities).
5 Bias and Variance
Imagine you have n independent random variables $Y_1, \ldots, Y_n$, each with mean $E[Y_i] = \mu$ and variance $V[Y_i] = \sigma^2$. Consider the sample mean
$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i.$$
Answer the following questions:
1. Prove that the sample mean $\bar{Y}$ is an unbiased estimator of the population mean.
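A: The bias of the sample mean is
$$\text{bias} = E[\bar{Y}] - \mu$$
so we have
$$
E[\bar{Y}] = E\left[ \sum_{i=1}^{n} \frac{Y_i}{n} \right] = \frac{1}{n} E\left[ \sum_{i=1}^{n} Y_i \right] = \frac{1}{n} \sum_{i=1}^{n} E[Y_i] = \frac{1}{n} \sum_{i=1}^{n} \mu = \mu
$$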
Plugging this into the above formula for bias yields a bias of zero, which proves that the sample mean is unbiased.
2. Prove that the sample mean $\bar{Y}$ has variance $V[\bar{Y}] = \sigma^2/n$.
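A: The variance of the sample mean is:
$$
V[\bar{Y}] = V\left[ \sum_{i=1}^{n} \frac{Y_i}{n} \right] = \frac{1}{n^2} V\left[ \sum_{i=1}^{n} Y_i \right] = \frac{1}{n^2} \sum_{i=1}^{n} V[Y_i] = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 = \frac{\sigma^2}{n}
$$
where pulling the constant 1/n out of the variance squares it, and the variance of the sum equals the sum of the variances because the $Y_i$ are assumed to be independent.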
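Both results can be checked by simulation. The sketch below is a Monte Carlo experiment, assuming independent normal draws with hypothetical values µ = 5, σ = 2 and n = 10 (the helper name is also illustrative); over many replications, the average of the sample means should be close to µ and their variance close to σ²/n = 0.4.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Hypothetical helper: draw `reps` independent samples of size n from N(mu, sigma^2)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

mu, sigma, n = 5.0, 2.0, 10          # hypothetical population and sample size
means = simulate_sample_means(mu, sigma, n, reps=20_000)

print("average of sample means:", statistics.fmean(means), "vs mu =", mu)
print("variance of sample means:", statistics.pvariance(means), "vs sigma^2/n =", sigma**2 / n)
```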
3. Prove that the sample mean is a consistent estimator of the population mean.
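A: An estimator $\hat{\theta}$ is a consistent estimator of θ if
$$E\left[ (\theta - \hat{\theta})^2 \right] \to 0 \text{ as } n \to \infty$$
Remember the mean-squared error can be written as the sum of the squared bias and the variance, so using the two results above:
$$E\left[ (\mu - \bar{Y})^2 \right] = \left( E[\bar{Y}] - \mu \right)^2 + V[\bar{Y}] = 0 + \frac{\sigma^2}{n} = \frac{\sigma^2}{n}$$
As $n \to \infty$, $\sigma^2/n \to 0$, so the sample mean is a consistent estimator of the population mean.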
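To see consistency in action, the following sketch estimates the mean-squared error of $\bar{Y}$ by simulation for increasing sample sizes and compares it to σ²/n. It assumes independent normal draws with hypothetical µ = 5 and σ = 2, and the helper name is illustrative; the estimated MSE should shrink roughly in proportion to 1/n.

```python
import random
import statistics

def simulated_mse(mu, sigma, n, reps, rng):
    """Hypothetical helper: Monte Carlo estimate of E[(mu - Ybar)^2]
    for samples of size n from N(mu, sigma^2)."""
    errors = []
    for _ in range(reps):
        y_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        errors.append((mu - y_bar) ** 2)
    return statistics.fmean(errors)

rng = random.Random(1)
mu, sigma = 5.0, 2.0                  # hypothetical population values
results = {}
for n in (10, 100, 1000):
    results[n] = simulated_mse(mu, sigma, n, reps=1000, rng=rng)
    print(f"n = {n:4d}: simulated MSE = {results[n]:.5f}, sigma^2/n = {sigma**2 / n:.5f}")
```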
6 Appendix I: Standard Normal Distribution Table
|z| P(Z < −|z|) P(Z < |z|) |z| P(Z < −|z|) P(Z < |z|)
0.000 0.500000 0.500000 2.047 0.020353 0.979647
0.093 0.462943 0.537057 2.140 0.016196 0.983804
0.186 0.426204 0.573796 2.233 0.012789 0.987211
0.279 0.390096 0.609904 2.326 0.010020 0.989980
0.372 0.354912 0.645088 2.419 0.007790 0.992210
0.465 0.320924 0.679076 2.512 0.006009 0.993991
0.558 0.288375 0.711625 2.605 0.004598 0.995402
0.651 0.257471 0.742529 2.698 0.003491 0.996509
0.744 0.228382 0.771618 2.791 0.002630 0.997370
0.837 0.201237 0.798763 2.884 0.001965 0.998035
0.930 0.176125 0.823875 2.977 0.001457 0.998543
1.023 0.153093 0.846907 3.070 0.001071 0.998929
1.116 0.132151 0.867849 3.163 0.000781 0.999219
1.209 0.113273 0.886727 3.256 0.000565 0.999435
1.302 0.096403 0.903597 3.349 0.000406 0.999594
1.395 0.081455 0.918545 3.442 0.000289 0.999711
1.488 0.068326 0.931674 3.535 0.000204 0.999796
1.581 0.056894 0.943106 3.628 0.000143 0.999857
1.674 0.047024 0.952976 3.721 0.000099 0.999901
1.767 0.038577 0.961423 3.814 0.000068 0.999932
1.860 0.031410 0.968590 3.907 0.000047 0.999953
1.953 0.025381 0.974619 > 4.000 < 0.000032 > 0.999968
Table 1: Cumulative Distribution Function for the Standard Normal Distribution Z ∼ N(0, 1)
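Entries of this table can be reproduced with the complementary error function from Python's standard library, as in the sketch below, which is handy for |z| values that fall between rows. The function names are illustrative, and computed values may differ from the table in the last digit or two.

```python
import math

# Illustrative helper names (hypothetical, not from any library).
def lower_tail(z):
    """P(Z < -|z|) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))

def cdf_abs(z):
    """P(Z < |z|) = 1 - P(Z < -|z|)."""
    return 1 - lower_tail(z)

for z in (0.0, 1.953, 2.605):
    print(f"{z:.3f}  {lower_tail(z):.6f}  {cdf_abs(z):.6f}")
```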
7 Appendix II: Formulae
Formulae are collated on this page; some of them may be useful in answering the exam.
Probability and Random Variables
Expectation of a RV: $E[X] = \sum_{x \in \mathcal{X}} x \cdot P(X = x)$
Marginal probability formula: $P(Y = y) = \sum_{x \in \mathcal{X}} P(Y = y, X = x)$
Conditional probability formula: $P(Y = y \mid X = x) = \dfrac{P(Y = y, X = x)}{P(X = x)}$
Bayes’ Rule: $P(Y = y \mid X = x) = \dfrac{P(X = x \mid Y = y)\, P(Y = y)}{P(X = x)}$
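The four formulas can be exercised on a small discrete example. The joint distribution below is hypothetical, chosen only for illustration, as are the helper names; exact fractions are used so that the conditional-probability and Bayes' rule results can be compared exactly.

```python
from fractions import Fraction

# Hypothetical joint distribution P(Y = y, X = x) for binary X and Y; keys are (y, x)
joint = {
    (0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

def p_x(x):
    """Marginal P(X = x) = sum over y of P(Y = y, X = x)."""
    return sum(p for (_, xv), p in joint.items() if xv == x)

def p_y(y):
    """Marginal P(Y = y) = sum over x of P(Y = y, X = x)."""
    return sum(p for (yv, _), p in joint.items() if yv == y)

def p_y_given_x(y, x):
    """Conditional probability formula: P(Y = y, X = x) / P(X = x)."""
    return joint[(y, x)] / p_x(x)

def p_x_given_y(x, y):
    """Conditional probability the other way round: P(Y = y, X = x) / P(Y = y)."""
    return joint[(y, x)] / p_y(y)

def bayes_y_given_x(y, x):
    """Bayes' rule: P(X = x | Y = y) P(Y = y) / P(X = x)."""
    return p_x_given_y(x, y) * p_y(y) / p_x(x)

expected_y = sum(y * p_y(y) for y in (0, 1))   # E[Y] = sum over y of y * P(Y = y)

print(p_y_given_x(1, 1), bayes_y_given_x(1, 1), expected_y)  # prints: 4/5 4/5 3/5
```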
Differentiation
$$\frac{d}{dx}\{a f(x)\} = a \frac{d}{dx}\{f(x)\}$$
$$\frac{d}{dx}\{x^k\} = k x^{k-1}$$
$$\frac{d}{dx}\{\log x\} = \frac{1}{x}$$
Chain rule:
$$\frac{d}{dx}\{f(g(x))\} = \frac{d}{d\,g(x)}\{f(g(x))\} \cdot \frac{d}{dx}\{g(x)\}$$
Confidence Interval and Hypothesis Test for Mean with Known Variance
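Let $\hat{\mu}$ be the sample mean of a sample of size n with population variance $\sigma^2$. Then a 100(1 − α)% confidence interval for µ is
$$\left( \hat{\mu} - z_{\alpha/2} \sqrt{\frac{\sigma^2}{n}},\; \hat{\mu} + z_{\alpha/2} \sqrt{\frac{\sigma^2}{n}} \right)$$
where $z_{\alpha/2}$ is the 100(1 − α/2)-percentile of the standard normal distribution N(0, 1). To test the null hypothesis $H_0 : \mu = \mu_0$, calculate
$$
p = \begin{cases}
2P(Z < -|z_{\hat{\mu}}|) & \text{if } H_0 : \mu = \mu_0 \text{ vs } H_A : \mu \neq \mu_0 \\
1 - P(Z < z_{\hat{\mu}}) & \text{if } H_0 : \mu \leq \mu_0 \text{ vs } H_A : \mu > \mu_0 \\
P(Z < z_{\hat{\mu}}) & \text{if } H_0 : \mu \geq \mu_0 \text{ vs } H_A : \mu < \mu_0
\end{cases}
$$
where $Z \sim N(0, 1)$, and
$$z_{\hat{\mu}} = \frac{\hat{\mu} - \mu_0}{\sqrt{\sigma^2/n}}.$$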
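The three cases can be expressed as one function. This is a minimal sketch: the function names, the `alternative` argument and its values are hypothetical names introduced here, and the example applies it to the fuel-efficiency sample from Section 3 with a hypothetical reference value µ0 = 8.0 L/100km purely for illustration.

```python
import math

# Illustrative helper names (hypothetical, not from any library).
def norm_cdf(x):
    """Standard normal CDF P(Z < x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def z_test_p_value(mu_hat, mu0, sigma, n, alternative="two-sided"):
    """p-value for a hypothesis about mu when the population variance sigma^2 is known."""
    z = (mu_hat - mu0) / math.sqrt(sigma**2 / n)
    if alternative == "two-sided":   # H0: mu = mu0  vs HA: mu != mu0
        return 2 * norm_cdf(-abs(z))
    if alternative == "greater":     # H0: mu <= mu0 vs HA: mu > mu0
        return 1 - norm_cdf(z)
    if alternative == "less":        # H0: mu >= mu0 vs HA: mu < mu0
        return norm_cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

mu_hat = sum([7.87, 8.10, 9.07, 8.83, 7.60, 8.91]) / 6
p_two = z_test_p_value(mu_hat, 8.0, 0.3, 6)                          # mu0 = 8.0 is hypothetical
p_greater = z_test_p_value(mu_hat, 8.0, 0.3, 6, alternative="greater")
p_less = z_test_p_value(mu_hat, 8.0, 0.3, 6, alternative="less")
print(p_two, p_greater, p_less)
```

When z > 0, as here, the "greater" p-value is half the two-sided p-value, and the "greater" and "less" p-values sum to one.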
Confidence Interval and Hypothesis Test for Difference of Means with Known Variances
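Let $\hat{\mu}_x, \hat{\mu}_y$ be the sample means from two samples of sizes $n_x, n_y$, and $\sigma_x^2, \sigma_y^2$ be the known population variances of the two samples. The 100(1 − α)% confidence interval for $\mu_x - \mu_y$ is
$$\left( \hat{\mu}_x - \hat{\mu}_y - z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}},\; \hat{\mu}_x - \hat{\mu}_y + z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} \right)$$
where $z_{\alpha/2}$ is the 100(1 − α/2)-percentile of the standard normal distribution N(0, 1). To test the null hypothesis $H_0 : \mu_x = \mu_y$, calculate
$$
p = \begin{cases}
2P(Z < -|z_{\hat{\mu}_x - \hat{\mu}_y}|) & \text{if } H_0 : \mu_x = \mu_y \text{ vs } H_A : \mu_x \neq \mu_y \\
1 - P(Z < z_{\hat{\mu}_x - \hat{\mu}_y}) & \text{if } H_0 : \mu_x \leq \mu_y \text{ vs } H_A : \mu_x > \mu_y \\
P(Z < z_{\hat{\mu}_x - \hat{\mu}_y}) & \text{if } H_0 : \mu_x \geq \mu_y \text{ vs } H_A : \mu_x < \mu_y
\end{cases}
$$
where $Z \sim N(0, 1)$, and
$$z_{\hat{\mu}_x - \hat{\mu}_y} = \frac{\hat{\mu}_x - \hat{\mu}_y}{\sqrt{\sigma_x^2/n_x + \sigma_y^2/n_y}}$$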
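As an illustration, the sketch below applies the confidence-interval formula to the two fuel samples from Section 3 with $\sigma_x = \sigma_y = 0.3$; the helper name is hypothetical. The resulting 95% interval for $\mu_A - \mu_B$, roughly (0.107, 0.786) L/100km, excludes zero, which agrees with rejecting $H_0 : \mu_A = \mu_B$ at the 5% level.

```python
import math

def diff_means_ci(a, b, sigma_a, sigma_b, z=1.96):
    """Hypothetical helper: CI for mu_a - mu_b with known population SDs; z = 1.96 gives 95%."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = math.sqrt(sigma_a**2 / len(a) + sigma_b**2 / len(b))
    return diff, (diff - z * se, diff + z * se)

y_a = [7.87, 8.10, 9.07, 8.83, 7.60, 8.91]   # first brand of fuel (L/100km)
y_b = [7.74, 7.74, 8.22, 7.88, 7.85, 8.27]   # second brand of fuel (L/100km)
diff, (lower, upper) = diff_means_ci(y_a, y_b, 0.3, 0.3)
print(f"difference = {diff:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```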