Date: 2023-09-04
ECMT 1010
Introduction to economic statistics
Semester 2, 2023
Cynthia Wen
statistics uses data to answer questions . . .
. . . data are a collection of variables measured on
individual cases or observations
• each variable contains specific information on each case
• data are often organized into a spreadsheet (matrix)
• questions are usually framed in terms of a hypothesis
EXAMPLES http://www.lock5stat.com/datapage.html
• countries dataset
AllCountries.xlsx
• student survey dataset
StudentSurvey.xls
. . . types of variable
• a categorical variable: defines groups
• e.g., gender, award, year
• a quantitative variable: numerical measure
• e.g., SAT, height, pulse, year
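As a minimal sketch of the two variable types, the snippet below stores a tiny dataset with one case per row and one variable per key. The four rows and the helper names are invented for illustration and are not taken from StudentSurvey.xls. A categorical variable is summarized by counting group membership, a quantitative one by averaging.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows (made up for illustration): each dict is one case,
# each key is one variable measured on that case.
students = [
    {"gender": "F", "year": "first", "height_cm": 165, "pulse": 72},
    {"gender": "M", "year": "second", "height_cm": 180, "pulse": 64},
    {"gender": "F", "year": "first", "height_cm": 158, "pulse": 80},
    {"gender": "M", "year": "first", "height_cm": 175, "pulse": 68},
]


def summarize_categorical(rows, variable):
    """Count how many cases fall into each group of a categorical variable."""
    return Counter(row[variable] for row in rows)


def summarize_quantitative(rows, variable):
    """Average a numerical measure across all cases."""
    return mean(row[variable] for row in rows)


gender_counts = summarize_categorical(students, "gender")
mean_pulse = summarize_quantitative(students, "pulse")
print(gender_counts)
print(mean_pulse)
```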
. . . relationships between variables
• we often use one variable, the explanatory variable,
to understand or predict the values of another, the
response variable
• for example:
• does meditation help reduce stress?
• does sugar consumption increase hyperactivity?
• does the interest rate affect the exchange rate?
THE key concepts
• a population includes all the individuals
or objects of interest
• a sample consists of the cases selected into a dataset;
a sample is a subset of the population
• the process of using a sample to gain information
about the population is called inference
[Diagram: sampling goes from the population to the sample; inference goes from the sample back to the population]
Most famous stuff-up in stats history
The newspaper headline “Dewey Defeats Truman” was printed before the
results of the 1948 U.S. presidential election were known, based on
the results of a large telephone poll.
The poll showed that Thomas Dewey would easily
defeat Harry Truman.
• the problem is: Truman won the election.
• what went wrong?
sampling bias occurs when the method used to
select the sample causes it to differ from the
population in a relevant way
• if sampling bias exists, we cannot trust any
generalization from the sample to the population
• that is, we will make incorrect inferences
[Diagram: two samples drawn from the population]
• how can we avoid sampling bias?
• imagine putting the names of the entire
population into a hat, and drawing out 2,000
names at random
• (we can use technology to do this)
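One way to let technology play the role of the hat is sketched below with Python's standard `random` module. The population of ID strings, the function name `draw_random_sample`, and the seed are assumptions made for illustration; the key point is that every member has the same chance of being drawn and nobody is drawn twice.

```python
import random


def draw_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical population: 100,000 people identified only by an ID string.
population = [f"person_{i}" for i in range(100_000)]

# Draw 2,000 names "out of the hat".
sample = draw_random_sample(population, 2_000, seed=1010)
print(len(sample), sample[:3])
```

Fixing the seed is only for reproducibility; in a real survey the draw would not be repeated with the same seed.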
take a RANDOM sample
Before the 2008 U.S. election, Gallup (a polling firm)
took a random sample of 2,847 Americans.
• 52% of those sampled supported Obama
• 53% voted for Obama in the election
• in this case, the inference was accurate
random sampling
random versus non-random sampling
• random samples (usually) provide accurate
information on the population
• non-random samples (usually) suffer from sampling
bias; any implied population information may be wrong
• non-random samples cannot be trusted to make
generalizations about the population
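The contrast between the two kinds of sample can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the population of weekly study hours is synthetic, and the non-random sample is taken only from students who study more than 12 hours a week, standing in for "whoever is in the library".

```python
import random
from statistics import mean

rng = random.Random(2023)

# Hypothetical population: weekly study hours for 20,000 students
# (synthetic values, truncated at zero).
population = [max(0.0, rng.gauss(10, 4)) for _ in range(20_000)]

# Random sample: every student is equally likely to be chosen.
random_sample = rng.sample(population, 500)

# Non-random sample: only students found in the library, assumed here
# to be those who study more than 12 hours a week.
library_students = [hours for hours in population if hours > 12]
library_sample = rng.sample(library_students, 500)

true_mean = mean(population)
random_mean = mean(random_sample)
library_mean = mean(library_sample)

print(f"population mean:     {true_mean:.2f}")
print(f"random sample mean:  {random_mean:.2f}")
print(f"library sample mean: {library_mean:.2f}")
```

Under these assumptions the random sample lands close to the population mean, while the library sample overstates it, because the selection rule is related to the variable being measured.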
Reality check . . .
• a random sample is ideal, but may not be feasible
• you may have to alter the ‘target population’ to get
something feasible to sample
EXAMPLE: suppose you are interested in all student
opinions, but you only have data from one class
• inferences are limited to the population sampled
EXAMPLE – sampling bias
Suppose you want to estimate the average number of
hours students spend studying each week
Which is the best method of sampling?
1. Go to the library and ask all the students how much they study.
2. Email all students asking how much they study and use the responses.
3. Hand out a questionnaire in class and make every student respond.
4. Stand outside Manning Bar and ask the people going in.
Bad methods of sampling – 1
• sampling based on something obviously related to
the variable(s) of interest
• e.g., sampling students in the library (or pub) about study
habits
EXAMPLE: many online surveys
• sydneycyclingclub.org.au or www.mynrma.com.au
Bad methods of sampling – 2
• allowing the sample to be made up of whoever
chooses to participate (volunteer bias)
• e.g., email all students and base your analysis on the
replies
• responders may not be representative of the population
EXAMPLE: sites posting reviews, e.g., Google
[Diagram: population and sample; sampling bias? other sources of bias?]
association vs causation
two variables are associated if their
values are related to one another
two variables are causally associated
if the value of one variable influences
the value of the other
another example of
association
• TVs do not cause people
to live longer
• again, not a causal
association
[Figure: scatterplot of life expectancy (vertical axis, 40 to 80) against TVs per 1000 people (horizontal axis, 0 to 1000), one labelled point per country, from Angola to Yemen; r = 0.74]
→ association does not imply causal association
What’s going on with TVs and life expectancy?
[Diagram: wealth is linked to both the number of TVs per capita and life expectancy]
→ wealth is a confounding variable
a third variable associated with both the
explanatory variable and the response
variable is called a confounding variable
• confounding variables are a major problem when
you are trying to establish causal association
• causal association cannot be determined when
confounding variables are present
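The TV example can be mimicked with a small simulation. This is a sketch on synthetic numbers, not the AllCountries data: wealth is assumed to drive both TV ownership and life expectancy, and neither of those two variables affects the other. The helper `pearson_r` computes the usual correlation coefficient r.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(42)
n = 2_000

# Hypothetical confounder: a wealth index between 0 and 1.
wealth = [rng.uniform(0, 1) for _ in range(n)]

# TVs and life expectancy each depend on wealth plus independent noise.
# Neither one appears in the formula for the other.
tvs = [800 * w + rng.gauss(0, 100) for w in wealth]
life = [45 + 35 * w + rng.gauss(0, 5) for w in wealth]

r_all = pearson_r(tvs, life)

# Hold the confounder roughly fixed: keep only cases with similar wealth.
band = [(t, l) for w, t, l in zip(wealth, tvs, life) if 0.45 <= w <= 0.55]
r_band = pearson_r([t for t, _ in band], [l for _, l in band])

print(f"r over all cases:        {r_all:.2f}")
print(f"r within a wealth band:  {r_band:.2f}")
```

Across all cases the two variables are strongly associated, yet once wealth is held roughly constant the association largely disappears, which is the signature of a confounding variable in this setup.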
EXAMPLE 1: effect of exercise on dementia in the elderly
[Diagram: lifestyle choices are linked to both exercise and dementia]
how can we eliminate confounding variables?
by RANDOMLY assigning the values of the explanatory variable
. . . a process referred to as randomization
[Diagram: exercise, dementia and lifestyle choices, with exercise set by random assignment]
EXAMPLE: randomly assign elderly people to either an
exercise program or not
randomized experiment or trial
• different levels of the explanatory variable are called
treatments
• we randomly divide subjects into groups, and assign a
different treatment to each group
• because the groups are chosen randomly, they should look
(roughly) similar in every aspect except the treatment
• emergent group differences may be attributed to the different
treatments
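Random assignment of subjects to treatments might be sketched as follows. The function name `randomly_assign`, the subject IDs, and the two treatment labels are illustrative assumptions; the idea is simply to shuffle the subjects and deal them out to the groups in turn.

```python
import random


def randomly_assign(subjects, treatments, seed=None):
    """Shuffle subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for position, subject in enumerate(shuffled):
        treatment = treatments[position % len(treatments)]
        groups[treatment].append(subject)
    return groups


# Hypothetical study with 40 subjects and two treatments.
subjects = [f"subject_{i}" for i in range(40)]
groups = randomly_assign(subjects, ["exercise", "no exercise"], seed=7)

for treatment, members in groups.items():
    print(treatment, len(members), members[:3])
```

Dealing from a shuffled list keeps the group sizes equal (or within one of each other), which flipping a coin per subject would not guarantee.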
if a randomized experiment yields a strong
association, we may establish causation from the
explanatory to the response variable
randomized experiments are very powerful
because they allow us to infer causality
• an experimental study is a setting where the researcher
controls the explanatory variable along with random
assignment
• e.g., randomly assign exercise/non-exercise groups of elderly
• e.g., randomly assign textbook/non-textbook groups of students
• an observational study uses information gathered from
observed behaviour as it naturally exists
• e.g., look at exercise habits and dementia among the elderly
• e.g., look at whether a student buys a textbook and their mark
EXAMPLE: Exercise and the brain
Reynolds, “Phys Ed: Your Brain on Exercise”, NY Times, July 7, 2010.
An experiment to determine whether exercise changes the brain
Step 1: Assign rats randomly to one of two groups: a treatment group and a control group
Step 2: Measure brain activity and IQ in the two groups
• can this experiment establish a causal link between exercise and the brain?
[Diagram: an “energetic” gene linked to both exercise and IQ, with exercise set by random assignment]
• an observational study has a confounding variable
• an experiment eliminates it, so causality may be established
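The difference between the two study designs can be seen in a simulation. The setup is entirely hypothetical: an “energetic” gene raises the score and, in the observational version, also makes exercise more likely, while exercise itself is given no effect at all. That zero effect is an assumption chosen to make the point visible, not a claim about real exercise.

```python
import random
from statistics import mean


def mean_gap(records):
    """Mean score of exercisers minus mean score of non-exercisers."""
    exercisers = [score for exercised, score in records if exercised]
    others = [score for exercised, score in records if not exercised]
    return mean(exercisers) - mean(others)


def simulate(n, randomized, rng):
    """Generate (exercised, score) pairs under one of the two designs."""
    records = []
    for _ in range(n):
        gene = rng.random() < 0.5  # hypothetical confounder
        if randomized:
            # Experiment: a coin flip decides who exercises.
            exercised = rng.random() < 0.5
        else:
            # Observational: gene carriers choose to exercise more often.
            exercised = rng.random() < (0.8 if gene else 0.2)
        # The score depends on the gene only; exercise has no effect here.
        score = 100 + (10 if gene else 0) + rng.gauss(0, 5)
        records.append((exercised, score))
    return records


rng = random.Random(31)
observational_gap = mean_gap(simulate(4_000, randomized=False, rng=rng))
experimental_gap = mean_gap(simulate(4_000, randomized=True, rng=rng))

print(f"observational gap: {observational_gap:.2f}")
print(f"experimental gap:  {experimental_gap:.2f}")
```

In this setup the observational comparison shows a clear gap that comes entirely from the gene, whereas under random assignment the gap is close to zero, matching the true (assumed) effect.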
EXAMPLE: Knee Surgery for Arthritis (1)
Researchers conducted a study on the effectiveness of a knee
surgery to cure pain from arthritis
• whether people got knee surgery was randomly determined
• the surgery group reported less pain than the control group
Is this evidence that the surgery causes a decrease in pain?
• not necessarily . . . due to the placebo effect
• patients believe they are better because they have been treated
EXAMPLE: Knee Surgery for Arthritis (2)
“The Placebo Prescription,” NY Times Magazine, 1/9/00
In another study, the control group received fake knee surgery
(patients were anaesthetized and cut open, but no surgery was
performed)
• both groups are subject to the same placebo effect, so it cancels out of the comparison
• the reported pain reduction was the same for both groups!
Conclusion: the knee surgery is not effective
randomness in data collection
• is the sample randomly selected?
• Yes: possible to generalize to the population
• No: cannot generalize to the population
• is the explanatory variable randomly assigned?
• Yes: possible to make conclusions about causality
• No: cannot make conclusions about causality
does this mean observational studies are useless?
• a random sample is not always achievable (e.g., in economics)
o if the focus is estimating a statistic about a population, you
need a random sample but not a randomized experiment
• e.g., election polling, GDP, unemployment, etc.
o if the focus is establishing causality, you need a randomized
experiment
• e.g., drug testing