Date: 2021-02-25
ECO2008 Assignment 1
Alan Fernihough
IMPORTANT
It is vital that you are working with your assigned year. Please see the accompanying file on Canvas where
your student number is matched to a year in these data.
Data Description
The file OECDdata.csv is a panel dataset containing information on the following variables:
Name        Description                                       doi
LOCATION    Country Isocode
TIME        Year
POPULATION  Population in millions                            link
FERTILITY   Children/Woman                                    link
GNI         Gross national income in millions of US Dollars   link
Setup
Step 1
Load the data with the read.csv() function.
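As a minimal sketch of this step, the snippet below writes a tiny stand-in file and reads it back with read.csv(). The two rows and the object name oecd are invented for illustration only; in the assignment you would instead point read.csv() at OECDdata.csv, assuming it sits in your working directory.

```r
# Hypothetical stand-in for OECDdata.csv: same column names, made-up values
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "LOCATION,TIME,POPULATION,FERTILITY,GNI",
  "AAA,1970,10,2.5,50000",
  "BBB,1970,50,1.8,900000"
), tmp)

# read.csv() returns a data frame; the first line of the file becomes the column names
oecd <- read.csv(tmp)

# Inspect column names and types before doing anything else
str(oecd)
```

With the real file, the call would look like read.csv("OECDdata.csv").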
Step 2
Create a new “GNI per capita” variable called “GNIPERCAP”. This variable should measure Gross National
Income per capita (in thousands of US dollars). Because GNI and POPULATION are both recorded in millions,
a country with GNI = 15,000 and POPULATION = 1 has GNIPERCAP = 15,000 / (1 × 1,000) = 15, that is,
$15,000 per capita. Use the mutate() function from the package dplyr.
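The unit conversion can be sketched as follows, assuming dplyr is installed. The data frame toy and its values are hypothetical; only the column names come from the data description. Dividing millions of dollars by millions of people gives dollars per person, and the extra factor of 1,000 converts that to thousands.

```r
library(dplyr)

# Hypothetical two-country data frame (values are made up for illustration)
toy <- data.frame(
  LOCATION   = c("AAA", "BBB"),
  TIME       = c(1970, 1970),
  POPULATION = c(1, 50),          # millions of people
  GNI        = c(15000, 900000)   # millions of US dollars
)

# GNI / POPULATION is dollars per person; dividing by 1,000 gives thousands
toy <- toy %>%
  mutate(GNIPERCAP = GNI / (POPULATION * 1000))

toy$GNIPERCAP
```

The first row reproduces the worked example in the text: 15,000 / (1 × 1,000) = 15.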
Step 3
Filter your data so that the data frame you analyze consists only of observations for your given year. Use the
filter() function for this.
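A minimal sketch of the filtering step, assuming dplyr is installed. The data frame toy, its values, and the variable my_year are hypothetical placeholders; replace 1970 with the year assigned to you.

```r
library(dplyr)

# Hypothetical panel: two countries observed in two years (made-up values)
toy <- data.frame(
  LOCATION  = c("AAA", "AAA", "BBB", "BBB"),
  TIME      = c(1970, 1971, 1970, 1971),
  FERTILITY = c(2.5, 2.4, 1.8, 1.7)
)

my_year <- 1970  # placeholder for your assigned year

# Keep only the rows whose TIME equals the assigned year
dataslice <- toy %>%
  filter(TIME == my_year)

dataslice
```

After this step the data frame is a single cross-section, with one row per country.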
Question 1
What are the mean and standard deviation of the fertility and GNI per capita variables? What are the
standard errors of these two sample means?
Use mean() and sd(). Remember that the standard error of the sample mean estimate is $SE_{\bar{x}} = SD/\sqrt{n}$,
and you can find out the sample size of a data variable with the function length().
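The standard error calculation can be sketched in base R as below. The vector fert holds made-up values and is not taken from the OECD data; the same three lines would apply to any numeric column.

```r
# Hypothetical fertility values (made up for illustration)
fert <- c(2.5, 1.8, 3.1, 2.0, 2.6)

n         <- length(fert)        # sample size
fert_mean <- mean(fert)          # sample mean
fert_sd   <- sd(fert)            # sample standard deviation (n - 1 denominator)
fert_se   <- fert_sd / sqrt(n)   # standard error of the sample mean

c(mean = fert_mean, sd = fert_sd, se = fert_se)
```

Note that length() counts missing values too, so if a column contains NAs it may be safer to use sum(!is.na(x)) for n and pass na.rm = TRUE to mean() and sd().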
Question 2
Construct a 95% confidence interval around the mean of the fertility variable from Question 1. The formula is
$\bar{x} \pm t^{*}_{n-1} SE_{\bar{x}}$, and you can look up the critical value $t^{*}_{n-1}$ using the statistical
tables or with the function qt(0.975, df = n - 1), where n is the sample size.
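As a sketch of the interval calculation, the snippet below uses made-up fertility values. It takes n - 1 degrees of freedom for the critical value, matching the subscript on t* in the formula, and compares the result with the interval that t.test() reports.

```r
# Hypothetical fertility values (made up for illustration)
fert <- c(2.5, 1.8, 3.1, 2.0, 2.6)

n      <- length(fert)
se     <- sd(fert) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)   # two-sided 95% critical value, n - 1 degrees of freedom

# Lower and upper bounds: mean -/+ critical value times standard error
ci <- mean(fert) + c(-1, 1) * t_crit * se
ci

# Cross-check against the interval computed by t.test()
t.test(fert)$conf.int
```

The two intervals should agree, which is a quick way to confirm that the degrees of freedom are set correctly.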
Question 3
Is fertility below replacement? Evaluate this claim using a hypothesis test where $H_0: \bar{f} \geq 2.1$, where f is the
total fertility rate.
What is the t-test statistic? Do you reject the null with $\alpha = 0.05$? What is the p-value?
Use the t.test() function.
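A one-sided test of this kind might be set up as in the sketch below. The values in fert are invented for illustration. Because the null is that mean fertility is at least 2.1, the alternative is that it is lower, which corresponds to alternative = "less" in t.test().

```r
# Hypothetical fertility values (made up for illustration)
fert <- c(1.9, 1.6, 2.2, 1.5, 1.8, 2.0)

# H0: mean >= 2.1 against H1: mean < 2.1
res <- t.test(fert, mu = 2.1, alternative = "less")

t_stat  <- unname(res$statistic)   # t-test statistic
p_value <- res$p.value             # one-sided p-value
reject  <- p_value < 0.05          # TRUE if H0 is rejected at alpha = 0.05

# The same statistic computed by hand, for comparison
t_manual <- (mean(fert) - 2.1) / (sd(fert) / sqrt(length(fert)))

c(t = t_stat, t_manual = t_manual, p = p_value)
```

A negative statistic with a small p-value is evidence against the null; for these invented values the null is rejected at the 5% level.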
Question 4
Is fertility lower in wealthier countries? Let’s model fertility as a function of GNI per capita:
$\text{FERTILITY} = \beta_0 + \beta_1 \text{GNIPERCAP} + \epsilon$.
Estimate the bivariate model above using the lm() function. What is the estimated slope, $\hat{\beta}_1$?
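The estimation step can be sketched as follows. The data frame toy and its five observations are hypothetical; only the column names follow the assignment. The slope is also computed by hand as the sample covariance divided by the variance of the regressor, which is what lm() returns in the bivariate case.

```r
# Hypothetical cross-section (made-up values, not OECD data)
toy <- data.frame(
  GNIPERCAP = c(5, 10, 15, 20, 25),
  FERTILITY = c(3.0, 2.6, 2.4, 2.1, 1.9)
)

# Regress fertility on GNI per capita; the intercept is included by default
fit <- lm(FERTILITY ~ GNIPERCAP, data = toy)

# Extract the estimated slope
beta1_hat <- unname(coef(fit)["GNIPERCAP"])

# Bivariate OLS slope by hand: cov(x, y) / var(x)
beta1_manual <- cov(toy$GNIPERCAP, toy$FERTILITY) / var(toy$GNIPERCAP)

c(lm = beta1_hat, manual = beta1_manual)
```

Calling summary(fit) prints the full coefficient table, including standard errors and t-values.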
Question 5
Generate the previous model’s residuals $\hat{\epsilon}$ using residuals() and then use these to calculate the model
$RSS = \sum \hat{\epsilon}^2$.
Calculate the total sum of squares: $TSS = \sum (y - \bar{y})^2$.
Use the TSS and RSS to calculate the explained sum of squares ESS.
What is the $R^2$ statistic?
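The sums of squares can be sketched as below, again on a hypothetical data frame toy with made-up values. The sketch relies on the decomposition TSS = ESS + RSS, which holds for least squares with an intercept, and checks the resulting R-squared against the value reported by summary().

```r
# Hypothetical cross-section (made-up values, not OECD data)
toy <- data.frame(
  GNIPERCAP = c(5, 10, 15, 20, 25),
  FERTILITY = c(3.0, 2.6, 2.4, 2.1, 1.9)
)
fit <- lm(FERTILITY ~ GNIPERCAP, data = toy)

e_hat <- residuals(fit)                                # model residuals
rss   <- sum(e_hat^2)                                  # residual sum of squares
tss   <- sum((toy$FERTILITY - mean(toy$FERTILITY))^2)  # total sum of squares
ess   <- tss - rss                                     # explained sum of squares
r2    <- ess / tss                                     # share of variation explained

c(RSS = rss, TSS = tss, ESS = ess, R2 = r2)

# Cross-check against the R-squared computed by summary()
summary(fit)$r.squared
```

If the hand-computed value and summary(fit)$r.squared disagree, the usual culprits are missing values or a model fitted without an intercept.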
Hint
A good habit for checking whether your regression estimates are sensible is to eyeball the data using a
scatterplot. The code below produces the scatterplot image at the top of this page. As you can see, there is a
slight negative relationship between income and fertility in 1970.
# load dplyr (for the %>% pipe), ggplot2 and ggrepel
library(dplyr)
library(ggplot2)
library(ggrepel)
# build the plot; dataslice is the data frame filtered to 1970
dataslice %>%
  ggplot(aes(y = FERTILITY, x = GNIPERCAP, label = LOCATION)) +
  geom_smooth(method = "lm", se = FALSE, col = "red") +   # fitted OLS line, no confidence band
  geom_point(col = "steelblue", alpha = 0.5, size = 2) +
  ylim(c(0, 10)) +
  xlab("GNI per Capita (000s)") +
  ylab("Total Fertility Rate") +
  ggtitle("Income-Fertility Scatterplot in 1970") +
  geom_text_repel()                                       # non-overlapping country labels
# save the most recent plot to disk
ggsave("scatter1970.png", device = "png", width = 6, height = 5)
Data Sources
OECD (2020), Gross national income (indicator). doi: 10.1787/8a36773a-en (Accessed on 11 December 2020)
OECD (2020), Population (indicator). doi: 10.1787/d434f82b-en (Accessed on 11 December 2020)
OECD (2020), Fertility rates (indicator). doi: 10.1787/8272fb01-en (Accessed on 11 December 2020)