ECO2008 Tutorial 2
Alan Fernihough
2021-02-05
Data
By definition, unofficial economic activity in what is called the black economy escapes being recorded in a
country’s GDP. It is conceivable that official and unofficial employment are substitutes: if one goes up, the
other comes down. Table 1 displays data on unemployment (%) and an estimate of the size of the black
economy (%) for 7 large economies from the year 1999.
Table 1: Unemployment and the Black Economy
Country          Unemployment (%)   Black Economy (%)
United Kingdom   9.5                7
United States    6.5                7
Italy            11.5               20
Japan            3                  4
France           12                 8
Germany          9                  9
Spain            23                 25
Create Variables as Vectors
Country <- c("United Kingdom","United States","Italy","Japan","France","Germany","Spain")
Unemply <- c(9.5,6.5,11.5,3,12,9,23)
Blackeco <- c(7,7,20,4,8,9,25)
Some Basic Statistics
mean(Unemply)
## [1] 10.64286
sd(Unemply)
## [1] 6.256425
quantile(Unemply, probs = seq(0, 1, 0.2))
##   0%  20%  40%  60%  80% 100%
##  3.0  7.0  9.2 10.7 11.9 23.0
summary(Unemply)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    3.00    7.75    9.50   10.64   11.75   23.00
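As a cross-check on sd(), the sample standard deviation can be computed by hand. The sketch below assumes the usual n - 1 (sample) denominator, which is what R's sd() uses. The names n, dev and sd_manual are introduced here for illustration only.

```r
# Unemployment rates (%) from Table 1
Unemply <- c(9.5, 6.5, 11.5, 3, 12, 9, 23)

n <- length(Unemply)                      # number of observations
dev <- Unemply - mean(Unemply)            # deviations from the mean
sd_manual <- sqrt(sum(dev^2) / (n - 1))   # sample standard deviation

sd_manual
all.equal(sd_manual, sd(Unemply))         # compares the manual value with sd()
```

Dividing by n instead of n - 1 would give the population standard deviation, which is slightly smaller.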
Creating a Data Frame
Data frames are useful structures in which to store data. Here we will create one using the data above.
data <- data.frame(Country, Unemply, Blackeco)
data
##          Country Unemply Blackeco
## 1 United Kingdom     9.5        7
## 2  United States     6.5        7
## 3          Italy    11.5       20
## 4          Japan     3.0        4
## 5         France    12.0        8
## 6        Germany     9.0        9
## 7          Spain    23.0       25
summary(data)
##    Country             Unemply         Blackeco
##  Length:7           Min.   : 3.00   Min.   : 4.00
##  Class :character   1st Qu.: 7.75   1st Qu.: 7.00
##  Mode  :character   Median : 9.50   Median : 8.00
##                     Mean   :10.64   Mean   :11.43
##                     3rd Qu.:11.75   3rd Qu.:14.50
##                     Max.   :23.00   Max.   :25.00
Indexing the Data Frame
Use the dollar sign $ to index, i.e. refer to a specific variable in the data frame.
data$Unemply
## [1] 9.5 6.5 11.5 3.0 12.0 9.0 23.0
mean(data$Blackeco)
## [1] 11.42857
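Besides $, base R offers double-bracket and row/column indexing. The sketch below rebuilds the same data frame so that it runs on its own. The name high_unemp and the 10% cut-off are chosen here purely for illustration.

```r
# Rebuild the tutorial's data frame
data <- data.frame(
  Country  = c("United Kingdom", "United States", "Italy", "Japan",
               "France", "Germany", "Spain"),
  Unemply  = c(9.5, 6.5, 11.5, 3, 12, 9, 23),
  Blackeco = c(7, 7, 20, 4, 8, 9, 25)
)

# Double brackets return the same column as data$Unemply
data[["Unemply"]]

# data[rows, columns]: keep rows where unemployment exceeds 10%, all columns
high_unemp <- data[data$Unemply > 10, ]
high_unemp

# Pick one row and one column: Japan's black economy estimate
data[data$Country == "Japan", "Blackeco"]
```

Leaving the column position empty after the comma keeps every column, which is why high_unemp is still a data frame.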
Create New Variable
Let’s say we want to create a new variable that is a transformed version of an existing variable. Let’s
express unemployment as a share (a proportion between 0 and 1) rather than as a percentage.
data$UnemplyShare <- data$Unemply/100
data
##          Country Unemply Blackeco UnemplyShare
## 1 United Kingdom     9.5        7        0.095
## 2  United States     6.5        7        0.065
## 3          Italy    11.5       20        0.115
## 4          Japan     3.0        4        0.030
## 5         France    12.0        8        0.120
## 6        Germany     9.0        9        0.090
## 7          Spain    23.0       25        0.230
Tables
table(data$Country)
##
##         France        Germany          Italy          Japan          Spain
##              1              1              1              1              1
## United Kingdom  United States
##              1              1
Europe Dummy Variable
europe <- c("United Kingdom","Italy","France","Germany","Spain")
data$Europe <- ifelse(data$Country %in% europe, 1, 0)
summary(data)
##    Country             Unemply         Blackeco      UnemplyShare
##  Length:7           Min.   : 3.00   Min.   : 4.00   Min.   :0.0300
##  Class :character   1st Qu.: 7.75   1st Qu.: 7.00   1st Qu.:0.0775
##  Mode  :character   Median : 9.50   Median : 8.00   Median :0.0950
##                     Mean   :10.64   Mean   :11.43   Mean   :0.1064
##                     3rd Qu.:11.75   3rd Qu.:14.50   3rd Qu.:0.1175
##                     Max.   :23.00   Max.   :25.00   Max.   :0.2300
##      Europe
##  Min.   :0.0000
##  1st Qu.:0.5000
##  Median :1.0000
##  Mean   :0.7143
##  3rd Qu.:1.0000
##  Max.   :1.0000
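A dummy variable is mainly useful for comparing groups. As a minimal sketch, the code below rebuilds the data frame and the Europe dummy, then computes group means with base R's tapply() and aggregate(). The names unemp_by_group and group_means are introduced here for illustration.

```r
# Rebuild the data frame and the Europe dummy from the tutorial
data <- data.frame(
  Country  = c("United Kingdom", "United States", "Italy", "Japan",
               "France", "Germany", "Spain"),
  Unemply  = c(9.5, 6.5, 11.5, 3, 12, 9, 23),
  Blackeco = c(7, 7, 20, 4, 8, 9, 25)
)
europe <- c("United Kingdom", "Italy", "France", "Germany", "Spain")
data$Europe <- ifelse(data$Country %in% europe, 1, 0)

# Mean unemployment within each group; the names "0" and "1" are the dummy values
unemp_by_group <- tapply(data$Unemply, data$Europe, mean)
unemp_by_group

# Means of both variables by group, returned as a data frame
group_means <- aggregate(cbind(Unemply, Blackeco) ~ Europe, data = data, FUN = mean)
group_means
```

The mean of the dummy itself (0.7143 in the summary above) is the share of countries in the European group, here 5 out of 7.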
Simple Scatterplot
We are going to use the ggplot2 package to make beautiful graphics.
library(ggplot2)
ggplot(data, aes(x=Unemply, y=Blackeco)) +
geom_point()
[Figure: scatterplot of Blackeco (y-axis) against Unemply (x-axis).]
Let’s add loads of bells and whistles. Look at all of the different options you have here:
https://ggplot2-book.org/index.html
https://rdpeng.github.io/Biostat776/lecture-the-ggplot2-plotting-system-part-1.html
https://tidyverse.github.io/ggplot2-docs/reference/
# A reusable custom theme: light grid, grey border, legend at the bottom
theme_af <- theme(
  legend.position = "bottom",
  panel.background = element_rect(fill = NA),
  panel.border = element_rect(fill = NA, color = "grey75"),
  axis.ticks = element_line(color = "grey85"),
  # In ggplot2 3.4.0 and later, use linewidth instead of size for lines
  panel.grid.major = element_line(color = "grey95", size = 0.2),
  panel.grid.minor = element_line(color = "grey95", size = 0.2),
  legend.key = element_blank(),
  text = element_text(size = 14)
)
library(ggrepel)
ggplot(data, aes(x = Unemply, y = Blackeco, label = Country)) +
  geom_smooth(method = "lm", se = FALSE, col = "red") +
  geom_point(col = "steelblue", alpha = 0.5, size = 4) +
  # x is Unemply and y is Blackeco, so label the axes accordingly
  xlab("Unemployment (%)") +
  ylab("Black Economy (%)") +
  geom_text_repel() +
  theme_af
## `geom_smooth()` using formula 'y ~ x'
[Figure: scatterplot of Black Economy (%) (y-axis) against Unemployment (%) (x-axis), with a red fitted regression line and each point labelled by country.]
ggsave("blackeconunemploy.png", device = "png", width=6, height=4)
## `geom_smooth()` using formula 'y ~ x'
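The red line drawn by geom_smooth(method = "lm") is an ordinary least squares fit of y on x. As a rough sketch, the same relationship can be quantified with base R's cor() and lm(). The name fit is introduced here for illustration, and with only 7 observations the numbers should be read as descriptive rather than as evidence of substitution.

```r
# Data from Table 1
Unemply  <- c(9.5, 6.5, 11.5, 3, 12, 9, 23)
Blackeco <- c(7, 7, 20, 4, 8, 9, 25)

# Pearson correlation between unemployment and the black economy
cor(Unemply, Blackeco)

# Linear fit with the same y ~ x formula used for the plotted line
fit <- lm(Blackeco ~ Unemply)
coef(fit)   # intercept and slope of the fitted line
```

A positive slope means that countries with higher unemployment tend to have a larger estimated black economy in this sample, which is at least consistent with the substitution idea in the Data section.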
Tasks for Students
Let’s look at a dataset containing the album names, release year, and Pitchfork.com album review score for
the American band Deerhunter.
Deerhunter Albums and Pitchfork Reviews
Album Title                   Year   Score   Best New Music
Fluorescent Grey EP           2007   8.8     Y
Cryptograms                   2007   8.9     Y
Microcastle/Weird Era Cont.   2008   9.2     Y
Rainwater Cassette Exchange   2009   7.5     N
Halcyon Digest                2010   9.2     Y
Itunes Live from Soho         2011   8.2     N
Monomania                     2013   8.3     Y
Fading Frontier               2015   8.4     Y
1. What are the median, sd, max and min of the “Score” variable?
2. Create a data.frame consisting of the year, score, and best new music variables.
3. Create a dummy variable indicating if the album was awarded a “Best New Music” tag.
4. What is the average score for albums awarded “Best New Music” compared to the two albums that were
not?
5. Create a barplot that shows the individual albums’ scores (see, e.g.,
https://www.r-graph-gallery.com/218-basic-barplots-with-ggplot2.html). Please see below for an
example. Note the “fill” color is steelblue.
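A minimal sketch of the bar-plot pattern, applied to the country data from earlier rather than to the album data, so the task itself is left open. It assumes ggplot2 is installed. The name p, the use of reorder() to sort the bars, and coord_flip() are choices made here for illustration.

```r
library(ggplot2)

# Country data from Table 1
data <- data.frame(
  Country = c("United Kingdom", "United States", "Italy", "Japan",
              "France", "Germany", "Spain"),
  Unemply = c(9.5, 6.5, 11.5, 3, 12, 9, 23)
)

# geom_col() draws one bar per row, with height taken from the y aesthetic
p <- ggplot(data, aes(x = reorder(Country, Unemply), y = Unemply)) +
  geom_col(fill = "steelblue") +   # a fixed fill colour goes outside aes()
  coord_flip() +                   # horizontal bars keep long labels readable
  xlab("Country") +
  ylab("Unemployment (%)")

p
```

The same structure should carry over to the album data by swapping in the album title for the x aesthetic and the score for y.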