Date: 2022-04-27
Lecture 3: Confounding factors and human behaviour
DATA5207: Data analysis in the social sciences
Dr Shaun Ratcliff
The world is not simple.
[Figure: Earnings by height. Annual earnings (1991 US$) plotted against height (inches).]
Confounding factors.
[Figure: Earnings by height and gender. Annual earnings (1991 US$) plotted against height (inches), coloured by gender (Female, Male).]
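This figure was made with just this code:
library(ggplot2)
library(scales)  # provides dollar() for the y-axis labels

# earnings.data: the lecture's earnings dataset (columns earn, height, gender)
ggplot(earnings.data,
       aes(y = earn, x = height, colour = gender, fill = gender)) +
  # pooled regression line for everyone (group = 1), drawn in black
  geom_smooth(method = "lm", fullrange = TRUE,
              colour = "black", se = FALSE, aes(group = 1)) +
  # separate regression lines, with confidence bands, for each gender
  geom_smooth(method = "lm", fullrange = TRUE) +
  # jittered points so overlapping observations stay visible
  geom_jitter(alpha = .5, width = 2, height = 1000) +
  scale_color_manual(values = c("red", "blue")) +
  labs(title = "Earnings by height and gender",
       y = "Annual earnings (1991 US$)",
       x = "Height (inches)") +
  scale_y_continuous(labels = dollar,
                     breaks = c(0, 25000, 50000, 75000, 100000)) +
  coord_cartesian(ylim = c(0, 100000), xlim = c(58, 80)) +
  theme_bw()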
Confounding factors.
Linear regression
y = α + βx + ϵ
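To make the equation concrete, here is a minimal sketch that simulates data from y = α + βx + ϵ with made-up values α = 2 and β = 3 (hypothetical, chosen only for illustration). It computes the least-squares estimates from the closed-form formulas β̂ = cov(x, y) / var(x) and α̂ = ȳ − β̂x̄, and checks that lm() returns the same intercept and slope.

```r
# Simulated data only: alpha = 2 and beta = 3 are arbitrary illustrative values.
set.seed(1)
n <- 500
x <- rnorm(n, mean = 10, sd = 2)
eps <- rnorm(n, sd = 1)            # the error term, epsilon
y <- 2 + 3 * x + eps

# Closed-form least-squares estimates for a single predictor.
beta_hat <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta_hat * mean(x)

# lm() fits the same model and should give the same two numbers.
fit <- lm(y ~ x)
print(c(alpha_hat = alpha_hat, beta_hat = beta_hat))
print(coef(fit))
```

The estimates will not equal 2 and 3 exactly, because ϵ adds noise; they get closer as n grows.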
Assuming y is your dependent variable, x is your substantive
predictor and X is the set of all your controls (the other independent variables):
When you fit a regression, the coefficient on x shows the estimated change in y
when x increases by 1, holding X constant (for example, at zero or at their
baseline categories).
If some subset of X is a confounding factor for the relationship between x and y,
including these controls will change the coefficient on x compared with a
regression fitted without them.
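A minimal simulated sketch of this point, loosely mirroring the height, gender and earnings example. Everything here is hypothetical: the variable `male`, the effect sizes and the sample size are invented for illustration and are not the lecture's data. Gender affects both height and earnings, so leaving it out lets the height coefficient absorb part of the gender effect.

```r
# Simulated data only: names and effect sizes are hypothetical.
set.seed(5207)
n <- 2000
male <- rbinom(n, size = 1, prob = 0.5)             # the confounder
height <- 64 + 5 * male + rnorm(n, sd = 2.5)        # men are taller on average
earn <- 20000 + 500 * (height - 64) +               # true height effect: 500 per inch
  10000 * male + rnorm(n, sd = 15000)               # plus a direct gender effect
sim <- data.frame(earn, height, male)

naive <- lm(earn ~ height, data = sim)              # omits the confounder
adjusted <- lm(earn ~ height + male, data = sim)    # controls for it

print(coef(naive)["height"])
print(coef(adjusted)["height"])
```

With these settings the naive slope is, in expectation, about 500 + 10000 × 1.25 / 12.5 = 1500 per inch (the omitted-variable bias), while the adjusted slope should land near the true 500.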
Fitting a linear regression in R
library(arm)  # provides display()

earnings.model <- lm(earn ~ z.height + race2 + gender + z.age,
                     data = earnings.data)
display(earnings.model)
## lm(formula = earn ~ z.height + race2 + gender + z.age, data = earnings.data)
## coef.est coef.se
## (Intercept) 16444.11 758.48
## z.height 4557.52 1427.38
## race2Black -2632.00 1742.56
## race2Other 1620.40 3590.42
## race2Hispanic -4072.34 2141.91
## genderMale 11240.25 1450.25
## z.age 4252.46 1137.31
## ---
## n = 1377, k = 7
## residual sd = 18354.27, R-Squared = 0.14
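If arm is not available, the same quantities that display() prints can be pulled from a base R lm fit. The sketch below uses simulated, hypothetical data (a predictor `x` and a two-level factor `group` with "A" as the baseline, standing in for a categorical control such as gender). It also shows the common rule of thumb that the estimate ± 2 standard errors gives roughly a 95% interval.

```r
# Hypothetical simulated data, not the lecture's earnings data.
set.seed(42)
n <- 1000
group <- factor(sample(c("A", "B"), n, replace = TRUE))  # baseline category "A"
x <- rnorm(n)
y <- 100 + 20 * x + 50 * (group == "B") + rnorm(n, sd = 40)
fit <- lm(y ~ x + group)

# Estimates and standard errors: the coef.est and coef.se columns of display().
est_se <- summary(fit)$coefficients[, c("Estimate", "Std. Error")]
print(round(est_se, 2))

# Rule of thumb: estimate +/- 2 standard errors is roughly a 95% interval.
approx_ci <- cbind(lower = est_se[, 1] - 2 * est_se[, 2],
                   upper = est_se[, 1] + 2 * est_se[, 2])
print(round(approx_ci, 2))
print(round(confint(fit), 2))   # exact t-based 95% intervals, for comparison

# Residual sd and R-squared, also reported at the bottom of display().
print(sigma(fit))
print(summary(fit)$r.squared)
```

The `groupB` row is read like `genderMale` above: the estimated difference in y between category B and the baseline A, holding x constant.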
Regression as a tool
▶ Why use regression?
▶ Three major uses, in academia and in the private and public
sectors:
▶ Controlling for confounding factors.
▶ Smoothing.
▶ Prediction.
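As a sketch of the prediction use, the example below fits a model to simulated, hypothetical data (the coefficients and the two new people are invented for illustration) and uses predict() to estimate earnings for new observations, with prediction intervals.

```r
# Hypothetical simulated data, not the lecture's earnings data.
set.seed(7)
n <- 1000
gender <- factor(sample(c("Female", "Male"), n, replace = TRUE))
height <- ifelse(gender == "Male", 70, 64.5) + rnorm(n, sd = 2.5)
earn <- 5000 + 300 * height + 10000 * (gender == "Male") +
  rnorm(n, sd = 15000)
fit <- lm(earn ~ height + gender)

# Two hypothetical new people; factor levels must match the fitted data.
new_people <- data.frame(height = c(64, 70),
                         gender = factor(c("Female", "Male"),
                                         levels = levels(gender)))

# Point predictions plus 95% prediction intervals (columns fit, lwr, upr).
pred <- predict(fit, newdata = new_people, interval = "prediction")
print(round(pred))
```

Prediction intervals are much wider than confidence intervals for the mean, because they include the individual-level noise in earnings.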
The research project
In the labs.
Labs
▶ Fitting linear regressions in R.
▶ Reading regression output.
▶ Standardising variables.
▶ Plotting regression estimates.
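A minimal sketch of standardising a predictor, assuming the `z.` prefix means "centre on the mean and divide by one standard deviation". The lecture's exact convention may differ; for example, arm's rescale() divides by two standard deviations. The data are simulated and hypothetical.

```r
# Hypothetical simulated data, not the lecture's earnings data.
set.seed(3)
n <- 1000
height <- rnorm(n, mean = 66.5, sd = 3.8)
earn <- 2000 + 300 * height + rnorm(n, sd = 15000)

# Assumed definition of z.height: mean-centred, divided by one SD.
z.height <- (height - mean(height)) / sd(height)

raw_fit <- lm(earn ~ height)   # slope: change in earnings per inch
z_fit <- lm(earn ~ z.height)   # slope: change in earnings per one-SD change

# The standardised slope equals the raw slope times sd(height),
# and the intercept becomes the mean of earn (predicted y at average height).
print(coef(raw_fit)[["height"]] * sd(height))
print(coef(z_fit)[["z.height"]])
print(c(intercept = coef(z_fit)[["(Intercept)"]], mean_earn = mean(earn)))
```

Standardising does not change the fit or the R-squared; it only rescales coefficients so that continuous predictors are on a comparable footing and the intercept is meaningful.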