ST305 Statistical Inference
Why study statistics?
Collecting, analyzing, and interpreting data is growing
more important every year in nearly every field.
Many important real-world decisions hinge on conflicting
claims about “what the data show,” such as:
Does raising the minimum wage increase unemployment?
Is a new cancer drug more effective than the current
standard of care?
What are the most likely ecological or agricultural effects of
climate change, and how do we mitigate them?
Interpreting data will also increasingly become part of your
personal decision making.
Knowledge of statistics will empower us to be active
participants in the data-driven arguments that drive
decisions and shape our understanding of the world.
Chapter 1 Probability Theory
“You can, for example, never foretell what any one man will
do, but you can say with precision what an average
number will be up to. Individuals vary, but percentages
remain constant. So says the statistician.”
Sherlock Holmes
The Sign of Four
Overview
Probability theory
the foundation upon which all of statistics is built
has a long and rich history, dating back at least to the
17th century
The aims of this chapter:
to outline some of the basic ideas that are fundamental to
the study of statistics
not to give a thorough introduction to probability theory
Set theory
Sample space S: the set of all possible outcomes of an
experiment
Example 1: An experiment consists of flipping two coins.
The sample space
S = {(h, h), (h, t), (t, h), (t, t)},
where h = head and t = tail.
Event: any subset of the sample space
An event is a set consisting of some possible outcomes of
the experiment.
Example 1: An experiment consists of flipping two coins.
The sample space
S = {(h, h), (h, t), (t, h), (t, t)},
E = {(h, h), (h, t)} is an event (the event that a head
appears on the first coin).
Exercise: An experiment consists of flipping one coin. What is
the sample space? List one event.
Union
Union of events E and F : E ∪ F
E ∪ F : the event that consists of all outcomes that are
either in E
or in F
or in both E and F
Venn diagram
Example 1: Suppose E = {(h, h), (h, t)} and
F = {(t, h), (h, h)}.
Then E ∪ F = {(h, h), (h, t), (t, h)}.
The union ∪_{n=1}^∞ E_n: the event that consists of all outcomes
that are in at least one of the events E_1, E_2, . . .
Intersection
Intersection of events E and F : E ∩ F or EF
E ∩ F : the event that consists of all outcomes that are
both in E and in F
Venn diagram
Example 1: Suppose E = {(h, h), (h, t)} and
F = {(t, h), (h, h)}.
Then E ∩ F = {(h, h)}.
The intersection ∩_{n=1}^∞ E_n: the event that consists of those
outcomes that are in all of the events E_1, E_2, . . .
If E ∩ F does not contain any outcomes, it is the null event
and is denoted by ∅
Complementation
Complementation. A^c, the complement of A, is the set of
all elements that are not in A:
A^c = {x : x ∉ A}
Exercise
Exercise: Suppose A = {(h, h), (h, t)}, B = {(t, h)} and
C = {(h, h), (h, t), (t, t)}. Find A ∩ B, A ∪ C, A ∩ C and C^c.
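A quick way to check answers like these is to model outcomes as tuples and events as Python sets; a minimal sketch (the sets below are the ones from this exercise):

# Two-coin sample space; 'h' = head, 't' = tail
S = {('h', 'h'), ('h', 't'), ('t', 'h'), ('t', 't')}
A = {('h', 'h'), ('h', 't')}
B = {('t', 'h')}
C = {('h', 'h'), ('h', 't'), ('t', 't')}

print(A & B)  # intersection A ∩ B (empty here)
print(A | C)  # union A ∪ C
print(A & C)  # intersection A ∩ C
print(S - C)  # complement of C relative to S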
Some rules for the operations of events
Commutative laws
E ∪ F = F ∪ E (Exercise: show this by using Venn diagrams)
E ∩ F = F ∩ E (Exercise: show this by using Venn diagrams)
Associative laws
(E ∪ F) ∪ G = E ∪ (F ∪ G)
(E ∩ F) ∩ G = E ∩ (F ∩ G)
Distributive laws:
(E ∪ F) ∩ G = (E ∩ G) ∪ (F ∩ G) (Exercise: show this by using Venn diagrams)
(E ∩ F) ∪ G = (E ∪ G) ∩ (F ∪ G) (Exercise: show this by using Venn diagrams)
DeMorgan’s laws
DeMorgan’s laws:
a) (∪_{i=1}^n E_i)^c = ∩_{i=1}^n E_i^c
b) (∩_{i=1}^n E_i)^c = ∪_{i=1}^n E_i^c
Proof of a):
Suppose x ∈ (∪_{i=1}^n E_i)^c
=⇒ x ∉ ∪_{i=1}^n E_i
=⇒ x ∉ E_i for all i = 1, 2, . . . , n
=⇒ x ∈ E_i^c for all i = 1, 2, . . . , n
=⇒ x ∈ ∩_{i=1}^n E_i^c
Each implication is reversible, so the reverse inclusion also holds and the two sets are equal.
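The laws can also be checked mechanically on small finite sets; a minimal sketch (the universe U and the three events are arbitrary choices):

# Check De Morgan's laws on a small finite universe
U = set(range(10))
events = [{1, 2, 3}, {2, 4, 6}, {3, 6, 9}]

def complement(A):
    return U - A

# (union of the E_i)^c equals the intersection of the E_i^c
assert complement(set().union(*events)) == set.intersection(*[complement(E) for E in events])
# (intersection of the E_i)^c equals the union of the E_i^c
assert complement(set.intersection(*events)) == set().union(*[complement(E) for E in events])
print("De Morgan's laws hold for these sets")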
Disjoint (mutually exclusive) and partition
If E ∩ F = ∅, then E and F are said to be disjoint or
mutually exclusive
The events E_1, E_2, . . . are pairwise disjoint (or mutually
exclusive) if E_i ∩ E_j = ∅ for all i ≠ j.
If the events E_1, E_2, . . . are pairwise disjoint and
∪_{i=1}^∞ E_i = S, then the collection E_1, E_2, . . . forms a
partition of S.
Partitions are useful because they allow us to divide the sample
space into small, non-overlapping pieces.
Example: Define E_i = [i, i + 1) for i = 0, 1, 2, . . .
Then E_0, E_1, . . . are pairwise disjoint.
E_0, E_1, . . . form a partition of [0, ∞).
Sigma algebra (or Borel field)
Let B be a collection of subsets of S. It is a sigma algebra
(or Borel field) if all of the following hold:
a. ∅ ∈ B
b. B is closed under complementation: if A ∈ B, then A^c ∈ B
c. B is closed under countable unions: if A_1, A_2, . . . ∈ B, then
∪_{i=1}^∞ A_i ∈ B.
Example 1: {∅, S}
Example 2: Let S = (−∞, ∞) and let B be the smallest sigma
algebra that contains all sets of the form [a, b], (a, b], (a, b) and [a, b).
Probability function
Given a sample space S and an associated sigma algebra
B, a probability function P is a function with domain B
such that
Axiom 1: P(E) ≥ 0 for all E ∈ B
Axiom 2: P(S) = 1
Axiom 3: For any sequence of pairwise disjoint events E_1,
E_2, . . . (E_i E_j = ∅ when i ≠ j),
P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i)
The three properties given above are also referred to as
the Kolmogorov Axioms (after A. Kolmogorov, one of the
fathers of probability theory).
Any function P that satisfies the Axioms of Probability is
called a probability function.
Example
Suppose a die (with 6 sides) is rolled and that all 6 sides are
equally likely to appear. What is the probability of rolling an
even number?
Answer.
The event of rolling an even number: {2, 4, 6}
We have
P({1}) = P({2}) = P({3}) = P({4}) = P({5}) = P({6}) = 1/6.
Note {2}, {4} and {6} are mutually exclusive events.
So by Axiom 3, the probability of rolling an even number:
P({2, 4, 6}) = P({2} ∪ {4} ∪ {6}) = P({2}) + P({4}) + P({6}) = 1/2.
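The same answer can be checked by Monte Carlo simulation; a rough sketch (the number of trials is an arbitrary choice, so the output is only an approximation of 1/2):

import random

# Estimate P(even) for a fair six-sided die
trials = 10**6
evens = sum(1 for _ in range(trials) if random.randint(1, 6) % 2 == 0)
print(evens / trials)  # should be close to 0.5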
A common method of defining a legitimate
probability function
Theorem: Let S = {s_1, . . . , s_n} be a finite set. Let B be any
sigma algebra of subsets of S. Let p_1, . . . , p_n be
nonnegative numbers with Σ_{i=1}^n p_i = 1. For any A ∈ B,
define P(A) by
P(A) = Σ_{i: s_i ∈ A} p_i.
(The sum over an empty set is defined to be 0.)
Then P is a probability function on B.
This remains true if S = {s1, s2, . . .} is a countable set.
(See next slide for a proof)
Proof:
For any A ∈ B, P(A) = Σ_{i: s_i ∈ A} p_i ≥ 0
=⇒ Axiom 1 is true.
P(S) = Σ_{i: s_i ∈ S} p_i = Σ_{i=1}^n p_i = 1
=⇒ Axiom 2 is true.
Let A_1, . . . , A_k denote pairwise disjoint events. (B contains only
a finite number of sets, so we need only consider finite disjoint
unions.) Then,
P(∪_{i=1}^k A_i) = Σ_{j: s_j ∈ ∪_{i=1}^k A_i} p_j = Σ_{i=1}^k Σ_{j: s_j ∈ A_i} p_j = Σ_{i=1}^k P(A_i),
where the second equality holds because the A_i are disjoint, so
each s_j in the union belongs to exactly one A_i.
=⇒ Axiom 3 is true.
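The construction in the theorem translates directly into code; a minimal sketch (a fair six-sided die provides one valid choice of the weights p_i):

# P(A) = sum of p_i over the outcomes s_i in A, as in the theorem
def prob(A, p):
    # p maps each outcome s_i to its weight p_i; A is any subset of outcomes
    return sum(p[s] for s in A)  # the empty sum is 0, matching the convention

p = {s: 1/6 for s in range(1, 7)}  # a fair six-sided die
print(prob({2, 4, 6}, p))  # P(even) = 1/2
print(prob(set(), p))      # 0 for the null event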
Some properties
P(∅) = 0
P(E) ≤ 1
P(E^c) = 1 − P(E), or equivalently, P(E) + P(E^c) = 1.
If E ⊂ F, then P(E) ≤ P(F).
Proof.
Note F = (E ∪ E^c)F = (EF) ∪ (E^c F) = E ∪ (E^c F), since EF = E when E ⊂ F.
(Exercise: use a Venn diagram to show this.)
Note E and E^c F are mutually exclusive.
By Axiom 3:
P(F) = P(E) + P(E^c F) ≥ P(E)
(since P(E^c F) ≥ 0 by Axiom 1)
Some properties
P(E ∩ F^c) = P(E) − P(E ∩ F).
Proof.
E = (E ∩ F) ∪ (E ∩ F^c) (Exercise: show this with a Venn
diagram.)
E ∩ F and E ∩ F^c are mutually exclusive, so by Axiom 3
P(E) = P(E ∩ F) + P(E ∩ F^c),
and rearranging gives the result.
Some properties
P(E ∪ F) = P(E) + P(F) − P(EF).
Proof.
E ∪ F = S(E ∪ F) = (E ∪ E^c)(E ∪ F) = E ∪ (E^c F)
E and E^c F are mutually exclusive, so by Axiom 3
P(E ∪ F) = P(E ∪ (E^c F)) = P(E) + P(E^c F)
Note F = (EF) ∪ (E^c F), with EF and E^c F mutually exclusive, so by Axiom 3
P(F) = P(EF) + P(E^c F), i.e., P(E^c F) = P(F) − P(EF).
Substituting gives P(E ∪ F) = P(E) + P(F) − P(EF).
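The identity is easy to verify by exact enumeration on a small sample space; a minimal sketch (the two-dice events are arbitrary illustrative choices):

from fractions import Fraction

# Equally likely ordered pairs from rolling two fair dice
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = lambda A: Fraction(len(A), len(S))

E = {s for s in S if s[0] + s[1] == 6}  # sum of the dice is 6
F = {s for s in S if s[0] == 4}         # first die shows 4
assert P(E | F) == P(E) + P(F) - P(E & F)
print(P(E | F))  # 5/36 + 6/36 - 1/36 = 10/36 = 5/18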
Some properties
P(A) = Σ_{i=1}^∞ P(A ∩ C_i) for any partition C_1, C_2, . . .
Proof.
A = A ∩ S = A ∩ (∪_{i=1}^∞ C_i) = ∪_{i=1}^∞ (A ∩ C_i)
where the last equality follows from the Distributive Law.
Thus,
P(A) = P(∪_{i=1}^∞ (A ∩ C_i))
Note the C_i are disjoint =⇒ the sets A ∩ C_i are also disjoint
(Exercise: show this with a Venn diagram.)
By Axiom 3,
P(A) = P(∪_{i=1}^∞ (A ∩ C_i)) = Σ_{i=1}^∞ P(A ∩ C_i)
Boole’s inequality
Boole’s inequality: P(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i).
(The finite case follows by taking A_i = ∅ for i > n.)
Proof.
Construct a disjoint collection A*_1, A*_2, . . . with the property that
∪_{i=1}^∞ A*_i = ∪_{i=1}^∞ A_i in the following way:
A*_1 = A_1, A*_i = A_i \ (∪_{j=1}^{i−1} A_j), i = 2, 3, . . .
Notation: A \ B = A ∩ B^c.
Since the A*_i are pairwise disjoint, Axiom 3 gives
P(∪_{i=1}^∞ A_i) = P(∪_{i=1}^∞ A*_i) = Σ_{i=1}^∞ P(A*_i)
By construction A*_i ⊂ A_i =⇒ P(A*_i) ≤ P(A_i)
Thus,
Σ_{i=1}^∞ P(A*_i) ≤ Σ_{i=1}^∞ P(A_i)
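The disjointification step at the heart of the proof can be written out explicitly; a minimal sketch (the input events are arbitrary finite sets):

# Build A*_i = A_i \ (A_1 ∪ ... ∪ A_{i-1}) as in the proof
def disjointify(events):
    seen, stars = set(), []
    for A in events:
        stars.append(A - seen)  # remove everything already covered
        seen |= A
    return stars

A = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
stars = disjointify(A)
print(stars)  # [{1, 2, 3}, {4}, {5}]: pairwise disjoint
# Same union as the original events, which is what the proof needs
assert set().union(*stars) == set().union(*A)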
Conditional Probabilities
The conditional probability that E occurs given that F has
occurred:
P(E |F )
Definition of conditional probability: If P(F) > 0,
P(E|F) = P(EF)/P(F).
Conditional Probability-Example
Joe is 80% certain that his missing key is in one of the two
pockets of his hanging jacket, being 40% certain it is in the
left-hand pocket and 40% certain it is in the right-hand pocket.
If a search of the left-hand pocket does not find the key, what is
the conditional probability that it is in the other pocket?
Answer:
L: the event that the key is in the left-hand pocket
R: the event that the key is in the right-hand pocket
The desired probability (note R ⊂ L^c, since the key cannot be in
both pockets, so P(RL^c) = P(R)):
P(R|L^c) = P(RL^c)/P(L^c) = P(R)/(1 − P(L)) = 40%/(1 − 40%) = 2/3.
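The answer can be confirmed by simulation; a rough sketch (the three weights encode the 40%/40%/20% split in the problem):

import random

trials, in_right, not_in_left = 10**6, 0, 0
for _ in range(trials):
    key = random.choices(['left', 'right', 'elsewhere'], weights=[0.4, 0.4, 0.2])[0]
    if key != 'left':              # condition on the left pocket being empty
        not_in_left += 1
        in_right += (key == 'right')
print(in_right / not_in_left)      # should be close to 2/3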
Bayes’ rule
Bayes’ rule: Let F_1, F_2, . . . , F_n be a partition of the sample
space, and let E be any event with P(E) > 0. Then,
P(F_j|E) = P(EF_j)/P(E) = P(E|F_j)P(F_j) / Σ_{i=1}^n P(E|F_i)P(F_i).
Example
Suppose that we have 3 cards that are identical in form, except
that both sides of the first card are colored red, both sides of the
second card are colored black, and one side of the third card is
colored red and the other side black. The 3 cards are mixed up
in a hat, and 1 card is randomly selected and put down on the
ground. If the upper side of the chosen card is colored red,
what is the probability that the other side is colored black?
Solution
R: the event that the upturned side of the chosen card is red
RR (BB, RB): the event that the chosen card is all red (all black, red and black)
The desired probability: P(RB|R)
P(RB|R) = P((RB) ∩ R)/P(R)
= P(R|RB)P(RB) / [P(R|RR)P(RR) + P(R|RB)P(RB) + P(R|BB)P(BB)]
= (1/2)(1/3) / [(1)(1/3) + (1/2)(1/3) + (0)(1/3)] = 1/3. (Activity)
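A simulation of the experiment gives the same answer; a rough sketch (each card is modelled as an ordered pair of side colours):

import random

cards = [('red', 'red'), ('black', 'black'), ('red', 'black')]
trials, red_up, other_side_black = 10**6, 0, 0
for _ in range(trials):
    up, down = random.sample(random.choice(cards), 2)  # random card, random side up
    if up == 'red':
        red_up += 1
        other_side_black += (down == 'black')
print(other_side_black / red_up)  # should be close to 1/3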
Independence between two events
Definition: E is (statistically) independent of F if
P(E|F) = P(E), or equivalently, P(EF) = P(E)P(F).
Definition: Two events E and F that are not independent
are said to be dependent.
If E and F are independent, then the following pairs are
also independent
E and F^c.
Proof. We want to show P(EF^c) = P(E)P(F^c).
Note P(E) = P(EF) + P(EF^c) = P(E)P(F) + P(EF^c) =⇒
P(E) − P(E)P(F) = P(EF^c) =⇒
P(EF^c) = P(E)(1 − P(F)) = P(E)P(F^c)
E^c and F.
E^c and F^c.
Example
Suppose that we toss 2 fair dice. Let E1 (E2) denote the event
that the sum of the dice is 6 (7), and F denote the event that
the first die equals 4. (i) Is E1 independent of F? (ii) Is E2
independent of F?
Sol. (i) P(E_1 F) = P({(4, 2)}) = 1/36
P(E_1)P(F) = P({(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}) · P(F)
= (5/36)(1/6) = 5/216
So P(E_1 F) ≠ P(E_1)P(F) =⇒ E_1 and F are NOT independent.
(ii) P(E_2 F) = P({(4, 3)}) = 1/36
P(E_2)P(F) = P({(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}) · P(F)
= (6/36)(1/6) = 1/36
So P(E_2 F) = P(E_2)P(F) =⇒ E_2 and F are independent.
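Both parts can be verified by exact enumeration with fractions; a minimal sketch:

from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]  # two fair dice
P = lambda A: Fraction(len(A), len(S))

F = {s for s in S if s[0] == 4}
E1 = {s for s in S if sum(s) == 6}
E2 = {s for s in S if sum(s) == 7}
print(P(E1 & F) == P(E1) * P(F))  # False: E1 and F are dependent
print(P(E2 & F) == P(E2) * P(F))  # True: E2 and F are independent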
Independence among multiple events
Definition: Three events E , F , and G are said to be
independent if
P(EFG) = P(E)P(F )P(G)
P(EF ) = P(E)P(F )
P(EG) = P(E)P(G)
P(FG) = P(F )P(G)
The events E_1, E_2, . . . , E_n are said to be independent if for
every subset E_{i_1}, E_{i_2}, . . . , E_{i_r}, r ≤ n, of these events,
P(E_{i_1} E_{i_2} · · · E_{i_r}) = P(E_{i_1})P(E_{i_2}) · · · P(E_{i_r})
Random variables
A random variable is a function from a sample space S
into the real numbers.
Example Suppose that our experiment consists of tossing
3 fair coins. If we let Y denote the number of heads that
appear, then Y is a random variable.
Suppose we have a sample space S = {s_1, · · · , s_n} with a
probability function P. Define a random variable X with range
{x_1, · · · , x_m}. Let P_X be defined by
P_X(X = x_i) = P({s_j ∈ S : X(s_j) = x_i}).
Then,
P_X is an induced probability function on the range of X, defined
in terms of the original function P.
We will simply write P(X = x_i) rather than P_X(X = x_i).
Example
Suppose that our experiment consists of tossing 3 fair coins. If
we let Y denote the number of heads that appear, then Y is a
random variable.
Solution.
The random variable Y defined as above takes one of the
values 0, 1, 2, and 3 with respective probabilities:
P_Y(Y = 0) = P({(t, t, t)}) = 1/8
P_Y(Y = 1) = P({(t, t, h), (t, h, t), (h, t, t)}) = 3/8
P_Y(Y = 2) = P({(t, h, h), (h, t, h), (h, h, t)}) = 3/8
P_Y(Y = 3) = P({(h, h, h)}) = 1/8
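These probabilities can be read off by enumerating the 8 equally likely outcomes; a minimal sketch:

from fractions import Fraction
from itertools import product

outcomes = list(product('ht', repeat=3))  # all 2^3 outcomes of three tosses
for k in range(4):
    count = sum(1 for s in outcomes if s.count('h') == k)
    print(k, Fraction(count, len(outcomes)))  # 1/8, 3/8, 3/8, 1/8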
Cumulative distribution function
Definition: The function defined by
F_X(x) = P_X(X ≤ x), −∞ < x < ∞,
is called the cumulative distribution function (cdf) or,
more simply, the distribution function of X.
Example - Tossing three coins
Consider the experiment of tossing three fair coins, and let X =
number of heads observed.
The cdf of X:
F_X(x) = 0 if −∞ < x < 0
F_X(x) = 1/8 if 0 ≤ x < 1
F_X(x) = 1/2 if 1 ≤ x < 2
F_X(x) = 7/8 if 2 ≤ x < 3
F_X(x) = 1 if 3 ≤ x < ∞
This is a step function.
Example - Some observations
F_X is defined for all values of x, not just those in
X = {0, 1, 2, 3}.
For example,
F_X(2.5) = P(X ≤ 2.5) = P(X = 0, 1, or 2) = 7/8
F_X has jumps at the values x_i ∈ X, and the size of the
jump at any point x is equal to P(X = x).
F_X(x) = 0 for x < 0 since X cannot be negative.
F_X(x) = 1 for x ≥ 3 since X is certain to be less than or
equal to such a value.
F_X can be discontinuous, with jumps at certain values of x.
By the way in which F_X is defined, at a jump point F_X
takes the value at the top of the jump; that is, F_X is
continuous when a point is approached from the right.
Cumulative distribution function
Theorem: The function F(x) is a cdf if and only if the
following three conditions hold:
lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.
F(x) is a nondecreasing function of x.
F(x) is right-continuous; that is, for every number x_0,
lim_{x↓x_0} F(x) = F(x_0).
Discrete Random Variables
Definition: A random variable X is discrete if F_X(x) is a
step function of x.
A discrete random variable can take on at most a
countable number of possible values.
For a discrete r.v. X, the probability mass function (pmf) is
defined by:
f_X(a) = P(X = a)
If X must assume one of the values x_1, x_2, . . ., then
f_X(x_i) ≥ 0 for i = 1, 2, . . .
f_X(x) = 0 for all other values of x
Σ_{i=1}^∞ f_X(x_i) = 1
The values of a discrete random variable’s mass function
equal the jump sizes of its distribution function.
Example 2a
The probability mass function of a random variable X is given by
p(i) = e^{−λ} λ^i / i!, i = 0, 1, 2, . . . (the Poisson pmf).
Find (a) P(X = 0) and (b) P(X > 2).
Solution.
a.
P(X = 0) = p(0) = e^{−λ} λ^0 / 0! = e^{−λ}
b.
P(X > 2) = 1 − P(X = 0) − P(X = 1) − P(X = 2)
= 1 − e^{−λ} − λe^{−λ} − λ^2 e^{−λ}/2
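For a concrete value of λ the two answers are easy to evaluate numerically; a minimal sketch (λ = 2 is an arbitrary choice):

from math import exp, factorial

lam = 2.0  # arbitrary choice of lambda
pmf = lambda i: exp(-lam) * lam**i / factorial(i)
print(pmf(0))                        # P(X = 0) = e^{-lambda}
print(1 - pmf(0) - pmf(1) - pmf(2))  # P(X > 2)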
Continuous random variables
Definition: A random variable X is continuous if F_X(x) is a
continuous function of x.
For a continuous random variable X, its distribution
function can be expressed in the form
F(a) := P(X ≤ a) = ∫_{−∞}^{a} f_X(x) dx, −∞ < a < ∞,
for some function f_X ≥ 0.
Then
f_X is the probability density function (pdf)
Example
Let X be a random variable with the pdf:
f(x) = 0 if x < 0, and f(x) = λe^{−λx} if x ≥ 0 (for some λ > 0)
Then, its cdf is:
F(x) = 0 if x < 0, and F(x) = 1 − e^{−λx} if x ≥ 0
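The relationship between this pdf and cdf can be checked by numerical integration; a rough sketch using a midpoint Riemann sum (λ = 1.5 and a = 2.0 are arbitrary choices):

from math import exp

lam, a, n = 1.5, 2.0, 100_000
dx = a / n
# Integrate the pdf over [0, a]; the pdf is 0 for x < 0
integral = sum(lam * exp(-lam * (i + 0.5) * dx) * dx for i in range(n))
print(integral)           # numerical value of the integral
print(1 - exp(-lam * a))  # closed-form cdf F(a); the two should agree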
Properties of pdf and pmf
Theorem: A function f_X(x) is a pdf (or pmf) of a random
variable X if and only if
f_X(x) ≥ 0 for all x
Σ_x f_X(x) = 1 (pmf) or ∫_{−∞}^{∞} f_X(x) dx = 1 (pdf)
Identically distributed random variables
Definition: Let B^1 be the smallest sigma algebra containing
all the intervals of real numbers of the form (a, b), [a, b),
(a, b] and [a, b]. The random variables X and Y are
identically distributed if, for every set A ∈ B^1,
P(X ∈ A) = P(Y ∈ A).
Note: two random variables that are identically distributed
are not necessarily equal.
Example (Identically distributed random variables)
Consider the experiment of tossing a fair coin three times.
Define the random variables X and Y by
X= number of heads observed and Y = number of tails
observed.
We can verify that P(X = k) = P(Y = k) for k = 0, 1, 2, 3.
=⇒ X and Y are identically distributed.
However, for no sample point s do we have X(s) = Y(s), since
X(s) + Y(s) = 3.
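Both claims can be verified by enumeration; a minimal sketch:

from itertools import product

outcomes = list(product('ht', repeat=3))  # three tosses of a fair coin
X = lambda s: s.count('h')  # number of heads
Y = lambda s: s.count('t')  # number of tails

# Identically distributed: the counts agree for every value k
print(all(sum(X(s) == k for s in outcomes) == sum(Y(s) == k for s in outcomes)
          for k in range(4)))               # True
print(any(X(s) == Y(s) for s in outcomes))  # False: X(s) = Y(s) never holds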