STA130 Midterm Code & Analysis Document
Winter 2022
Dr. Samantha-Jo Caetano & Dr. Scott Schwartz
March 2, 2022
Important notes
1. Do not assume that all aspects of each analysis shown are relevant or appropriate.
2. We highly recommend that, BEFORE you start the timed component, you spend some time
identifying which analyses are being conducted in Sections A & B.
3. You must carefully follow instructions about rounding numeric answers.
• Example 1: If a question states that you must round to 3 decimal places and your values were
0.222500 and 0.330000, you should input 0.223 and 0.330. You need to do this carefully or
Quercus will not mark your answer as correct. There will be no part marks for incorrect rounding
or formatting.
• Example 2: If a question states that you must round to 3 decimal places and your value was
7.96376 you should input 7.964. You need to do this carefully or Quercus will not mark your
answer as correct. There will be no part marks for incorrect rounding or formatting.
• Other than the decimal point, there should be no other punctuation included or any letters.
• Do not convert to percentages, unless instructed to.
4. In Section B, some of the analyses use new functions: specifically, rank(), unite(), and nchar().
Understanding these functions may aid you in reading through this document. Please see Section B1 for
a thorough explanation of each of these functions.
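The rounding rule in note 3 can be sketched in R as follows. This is only an illustration and not part of the exam materials; format_answer is a hypothetical helper name. It relies on formatC so that trailing zeros are kept.

```r
# Hypothetical helper (not from the course materials): format a value
# to a fixed number of decimal places, keeping trailing zeros.
format_answer <- function(x, digits = 3) {
  formatC(x, format = "f", digits = digits)
}

format_answer(0.33)     # "0.330" (trailing zero kept)
format_answer(7.96376)  # "7.964"
```

Plain round(0.33, 3) would print as 0.33, which is why the sketch formats to a string instead.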
Introducing the data
Context
These data contain information from a sample of board games as of December 2021, that were rated on a
board game rating website called Board Game Geek. This information was collected from Kaggle, a data
hosting site.
Data dictionary
variable       description
name           Name of the board game.
description    Description of the board game.
yearpublished  Year that the board game was published.
minplayers     Minimum number of players required to play the board game.
maxplayers     Maximum number of players allowed to play the board game.
minplaytime    Minimum required amount of time (in minutes) to play the board game.
maxplaytime    Maximum required amount of time (in minutes) to play the board game.
minage         Minimum age (in years) required to play the board game.
medical        Indicates whether the board game has a medical theme.
cardgame       Indicates whether the board game is considered a card game.
cooperative    Indicates whether the board game is a cooperative multiplayer board game.
dice           Indicates whether the board game requires players to roll dice to play.
memory         Indicates whether the board game requires players to memorize game play to
               progress through the game play.
owned          The number of copies of the board game owned.
rank           The rank of the board game on the Board Game Geek website.
average        The average rating of the board game.
users_rated    The number of users who have rated the board game.
Taking a look at the data
glimpse(boardgames)
## Rows: 500
## Columns: 17
## $ name "Space Mission", "Castles of Caladale", "Pandemic: Risin~
## $ description "In Space Mission, players explore eight planets (random~
## $ yearpublished 2011, 2017, 2017, 2003, 1150, 2011, 2001, 2017, 1996, 20~
## $ minplayers 2, 1, 2, 3, 2, 2, 2, 1, 2, 2, 3, 2, 2, 2, 2, 1, 2, 2, 1,~
## $ maxplayers 5, 4, 5, 4, 2, 4, 5, 6, 4, 5, 16, 4, 4, 6, 4, 4, 4, 4, 4~
## $ minplaytime 45, 30, 45, 60, 30, 60, 15, 15, 60, 90, 90, 15, 45, 180,~
## $ maxplaytime 45, 30, 45, 60, 30, 90, 15, 20, 60, 150, 90, 15, 45, 180~
## $ minage 10, 8, 8, 10, 6, 10, 4, 6, 10, 14, 12, 0, 8, 12, 8, 13, ~
## $ medical FALSE, FALSE, NA, FALSE, FALSE, FALSE, FALSE, FALSE, FAL~
## $ cardgame FALSE, FALSE, NA, FALSE, FALSE, FALSE, FALSE, TRUE, FALS~
## $ cooperative FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, F~
## $ dice FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, ~
## $ memory FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, ~
## $ owned 529, 1639, 5194, 2328, 9279, 2493, 2598, 1989, 2393, 430~
## $ rank 7525, 3907, 749, 2726, 21800, 2081, 14087, 4278, 3001, 3~
## $ average 5.97, 6.41, 7.60, 6.44, 4.89, 6.73, 5.59, 6.55, 6.42, 7.~
## $ users_rated 455, 882, 2341, 1773, 7549, 1657, 1223, 595, 1438, 3464,~
head(boardgames)
## # A tibble: 6 x 17
## name description yearpublished minplayers maxplayers minplaytime maxplaytime
##
## 1 Space~ In Space M~ 2011 2 5 45 45
## 2 Castl~ Descriptio~ 2017 1 4 30 30
## 3 Pande~ It is the ~ 2017 2 5 45 45
## 4 Indus~ Descriptio~ 2003 3 4 60 60
## 5 Check~ Abstract s~ 1150 2 2 30 30
## 6 Panth~ From BGG N~ 2011 2 4 60 90
## # ... with 10 more variables: minage , medical , cardgame ,
## # cooperative , dice , memory , owned , rank ,
## # average , users_rated
Part A
Section A1: Descriptive Figures & Summaries
Figure A1: Histogram of Average Rating of Cooperative Games
boardgames %>% filter(cooperative=="TRUE") %>%
  ggplot(aes(x = average)) +
  geom_histogram(bins=10) +
  theme_minimal() +
  labs(x = "Average Rating", y="frequency")
[Plot: histogram; x-axis "Average Rating" (ticks 6 to 9), y-axis "frequency" (ticks 0 to 15)]
Figure A2: Histogram of Average Rating of Non-Cooperative Games
boardgames %>% filter(cooperative=="FALSE") %>%
  ggplot(aes(x = average)) +
  geom_histogram(bins=20) +
  theme_minimal() +
  labs(x = "Average Rating", y="frequency")
[Plot: histogram; x-axis "Average Rating" (ticks 3 to 6), y-axis "frequency" (ticks 0 to 40)]
Figure A3: Boxplot of Average Rating for board games that do and do not use Dice
boardgames %>% filter(!is.na(dice)) %>%
  ggplot(aes(x = average, y=dice)) +
  geom_boxplot() +
  theme_minimal() +
  labs(x = "Average Rating", y="Dice Game")
[Plot: boxplots; x-axis "Average Rating" (ticks 4 to 8), y-axis "Dice Game" (FALSE, TRUE)]
Figures A4: Boxplot of Average Rating for Medical and Non-Medical board games
boardgames %>%
  ggplot(aes(x = average, y=medical)) +
  geom_boxplot() +
  theme_minimal() +
  labs(x = "Average Rating", y="Medical Game")
[Plot: boxplots; x-axis "Average Rating" (ticks 4 to 8), y-axis "Medical Game" (FALSE, TRUE, NA)]
boardgames %>% filter(!is.na(medical)) %>%
  ggplot(aes(x = average, y=medical)) +
  geom_boxplot() +
  theme_minimal() +
  labs(x = "Average Rating", y="Medical Game")
[Plot: boxplots; x-axis "Average Rating" (ticks 4 to 8), y-axis "Medical Game" (FALSE, TRUE)]
Table A1: Summaries for Cooperative
boardgames %>%
  group_by(cooperative) %>%
  summarise(x1 = n(),
            x2 = min(average),
            x3 = max(average),
            x4 = mean(average),
            x5 = median(average),
            x6 = quantile(average, 0.25, na.rm=TRUE),
            x7 = quantile(average, 0.75, na.rm=TRUE),
            x8 = sd(average, na.rm=TRUE))
## # A tibble: 3 x 9
## cooperative x1 x2 x3 x4 x5 x6 x7 x8
##
## 1 FALSE 428 2.46 7.66 5.72 5.76 5.3 6.23 0.741
## 2 TRUE 62 5.72 9.34 7.70 7.74 7.37 7.99 0.669
## 3 NA 10 NA NA NA NA NA NA NA
Table A2: Summaries for Dice
boardgames %>%
  group_by(dice) %>%
  summarise(y1 = n(),
            y2 = min(average),
            y3 = max(average),
            y4 = mean(average),
            y5 = median(average),
            y6 = quantile(average, 0.25, na.rm=TRUE),
            y7 = quantile(average, 0.75, na.rm=TRUE),
            y8 = sd(average, na.rm=TRUE))
## # A tibble: 3 x 9
## dice y1 y2 y3 y4 y5 y6 y7 y8
##
## 1 FALSE 367 2.65 9.24 5.91 5.84 5.36 6.34 0.961
## 2 TRUE 123 2.46 9.34 6.15 6 5.42 6.65 1.04
## 3 NA 10 NA NA NA NA NA NA NA
Section A2: Testing
Code for Means
set.seed(130)
repetitions <- 1000
simulated_values <- rep(NA, repetitions)
boardgames_noNA <- boardgames %>% filter(!is.na(cooperative))
## If interested in difference of Means
for(i in 1:repetitions){
  simdata <- boardgames_noNA %>%
    mutate(cooperative = sample(cooperative)) %>%
    group_by(cooperative) %>%
    summarise(x1 = mean(average)) %>% # MEAN
    summarise(value1 = diff(x1))
  simulated_values[i] <- as.numeric(simdata)
}
sim1 <- tibble(mean_diff = simulated_values)
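The simulation above only produces the shuffled differences; comparing them with an observed difference is a separate step. The following is a minimal base R sketch of that step on made-up data, not the course's own code. The toy data frame and the helper name mean_diff are assumptions for illustration, and the difference is taken as TRUE minus FALSE, which is the order diff() yields after group_by() because FALSE sorts before TRUE.

```r
# Toy data standing in for `boardgames_noNA` (values are made up).
set.seed(1)
toy <- data.frame(
  cooperative = rep(c(FALSE, TRUE), times = c(40, 10)),
  average     = c(rnorm(40, mean = 5.7, sd = 0.7),
                  rnorm(10, mean = 7.7, sd = 0.7))
)

# Hypothetical helper: difference in group means, TRUE minus FALSE.
mean_diff <- function(average, cooperative) {
  mean(average[cooperative]) - mean(average[!cooperative])
}

observed <- mean_diff(toy$average, toy$cooperative)

# Shuffle the labels to simulate the null hypothesis of no difference.
repetitions <- 1000
simulated <- rep(NA, repetitions)
for (i in 1:repetitions) {
  simulated[i] <- mean_diff(toy$average, sample(toy$cooperative))
}

# Two-sided p-value: share of simulated differences at least as far
# from zero as the observed difference.
p_value <- mean(abs(simulated) >= abs(observed))
```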
Figure A4: Histograms of simulated values
## Histogram
sim1 %>% ggplot(aes(x=mean_diff)) +
  geom_histogram(bins=30,color="black", fill="grey") +
  theme_minimal()
[Plot: histogram; x-axis "mean_diff" (ticks -0.25 to 0.50), y-axis "count" (ticks 0 to 75)]
Figure A5: Histograms of simulated values
## Extending the x-axis limits (zooming out a bit)
sim1 %>% ggplot(aes(x=mean_diff)) +
  geom_histogram(bins=50,color="black", fill="grey") +
  theme_minimal() +
  xlim(-2, 2)
[Plot: histogram; x-axis "mean_diff" (ticks -2 to 2), y-axis "count" (ticks 0 to 250)]
Code for Medians
set.seed(130)
repetitions <- 1000
simulated_values <- rep(NA, repetitions)
boardgames_noNA <- boardgames %>% filter(!is.na(cooperative))
## If interested in difference of Medians
for(i in 1:repetitions){
  simdata <- boardgames_noNA %>%
    mutate(cooperative = sample(cooperative)) %>%
    group_by(cooperative) %>%
    summarise(x1 = median(average)) %>% # MEDIAN
    summarise(value1 = diff(x1))
  simulated_values[i] <- as.numeric(simdata)
}
sim2 <- tibble(median_diff = simulated_values)
Figure A6: Histograms for simulated values
## Histogram
sim2 %>% ggplot(aes(x=median_diff)) +
  geom_histogram(bins=30,color="black", fill="grey") +
  theme_minimal()
[Plot: histogram; x-axis "median_diff" (ticks -0.50 to 0.50), y-axis "count" (ticks 0 to 120)]
Figure A7: Histograms for simulated values
## Extending the x-axis limits (zooming out a bit)
sim2 %>% ggplot(aes(x=median_diff)) +
  geom_histogram(bins=50,color="black", fill="grey") +
  theme_minimal() +
  xlim(-2, 2)
[Plot: histogram; x-axis "median_diff" (ticks -2 to 2), y-axis "count" (ticks 0 to 200)]
Section A3: Alternative Data
Here we found data on additional board games. They are available in three differently formatted data sets
(ExtraGames1, ExtraGames2, and ExtraGames3), presented below.
ExtraGames1
First 10 rows of the ExtraGames1 data set, looking at the year the game was published, whether or not it’s
a card game and average rating of the other games.
name          yearpublished  cardgame_Yes  cardgame_No  average
Tiny Ninjas   2019           1             0            NA
MindTrap II   1997           0             1            5.25
Big Monster   2018           1             0            7.26
Zoowaboo      2009           0             1            6.38
Scuttle!      2016           1             0            6.41
Land vs Sea   2021           0             1            7.60
Huggermugger  1989           0             1            5.20
Draco & Co    2001           1             0            5.50
Albion        2009           0             1            6.06
Homeworlds    2001           0             1            7.61
ExtraGames2
First 10 rows of the ExtraGames2 data set, looking at the year the game was published, minimum number
of players, maximum number of players, whether or not it’s a card game and average rating of the other
games.
name          yearpublished  minplayers  maxplayers  cardgame  average
Tiny Ninjas   2019           1           2           TRUE      NA
MindTrap II   1997           2           10          FALSE     5.25
Big Monster   2018           2           NA          TRUE      7.26
Zoowaboo      2009           2           4           FALSE     6.38
Scuttle!      2016           1           5           TRUE      6.41
Land vs Sea   2021           2           4           FALSE     7.60
Huggermugger  1989           2           4           FALSE     5.20
Draco & Co    2001           3           6           TRUE      5.50
Albion        2009           2           4           FALSE     6.06
Homeworlds    2001           2           6           FALSE     7.61
ExtraGames3
First 10 rows of the ExtraGames3 data set, looking at the minimum number of players, maximum number
of players, year the game was published, and average rating of the other games.
name         player_variable  player_number  yearpublished  average
Tiny Ninjas  minplayers       1              2019           NA
Tiny Ninjas  maxplayers       2              2019           NA
MindTrap II  minplayers       2              1997           5.25
MindTrap II  maxplayers       10             1997           5.25
Big Monster  minplayers       2              2018           7.26
Big Monster  maxplayers       NA             2018           7.26
Zoowaboo     minplayers       2              2009           6.38
Zoowaboo     maxplayers       4              2009           6.38
Scuttle!     minplayers       1              2016           6.41
Scuttle!     maxplayers       5              2016           6.41
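As a rough illustration of how the ExtraGames2 and ExtraGames3 layouts relate, the sketch below reshapes three rows of ExtraGames2 (typed in by hand from the table above) into one row per game per player variable. It is an assumption that the data sets were built this way; the object names extra2 and extra3 are hypothetical. The calls used are select from dplyr and pivot_longer from tidyr.

```r
library(dplyr)
library(tidyr)

# First three rows of ExtraGames2, copied by hand from the table.
extra2 <- tibble(
  name          = c("Tiny Ninjas", "MindTrap II", "Big Monster"),
  yearpublished = c(2019, 1997, 2018),
  minplayers    = c(1, 2, 2),
  maxplayers    = c(2, 10, NA),
  cardgame      = c(TRUE, FALSE, TRUE),
  average       = c(NA, 5.25, 7.26)
)

# Drop cardgame, then stack the two player-count columns into
# name/value pairs: one row per game per player variable.
extra3 <- extra2 %>%
  select(-cardgame) %>%
  pivot_longer(cols = c(minplayers, maxplayers),
               names_to = "player_variable",
               values_to = "player_number")
```

Each game now occupies two rows, so yearpublished and average are repeated, as in the ExtraGames3 table.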
Part B
Section B1: New Functions
For the analysis used in the midterm exam we will use these new functions; namely, the rank, unite, and
nchar functions. These functions are introduced and explained below.
The rank Function
Ranks are another way of expressing percentiles. Ranks imply ordering, and a lower-numbered rank is typically
better than a higher-numbered rank, e.g., “1st place”, “2nd place”, etc.
The rank function can be used to label the smallest to largest values in a column, as in the example below.
The rank function does not reorder the rows of a tibble, so if you want to actually order the rows you still need
to use the arrange function, as in the example below.
boardgames %>% tail() %>% select(owned) %>% mutate(rank(owned))
## # A tibble: 6 x 2
## owned `rank(owned)`
##
## 1 1112 2
## 2 970 1
## 3 4851 5
## 4 3016 4
## 5 4908 6
## 6 1890 3
# Using the `rank` function does not sort the data; it only marks what the sorted order would be
# To actually sort the data we would apply the `arrange` function
boardgames %>% tail() %>% select(owned) %>% mutate(rank(owned)) %>%
  arrange(owned)
## # A tibble: 6 x 2
## owned `rank(owned)`
##
## 1 970 1
## 2 1112 2
## 3 1890 3
## 4 3016 4
## 5 4851 5
## 6 4908 6
(Optional) This link has some more information about the rank() function if you’d like to look more into it
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/rank.
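Two details of rank() that the example above does not show are how ties are handled and how to rank from largest to smallest. The short base R sketch below uses made-up ratings purely for illustration.

```r
ratings <- c(5.97, 7.60, 4.89, 7.60)  # made-up values with one tie

rank(ratings)                       # 2.0 3.5 1.0 3.5 (ties share the average rank)
rank(ratings, ties.method = "min")  # 2 3 1 3
rank(-ratings)                      # 3.0 1.5 4.0 1.5 (largest value gets the lowest rank)
```

Negating the values before ranking is one simple way to make rank 1 correspond to the largest value.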
The unite Function
The unite function allows one to paste together columns of strings into a single column. This is a
shortcut for using mutate to paste the string values of columns together with the paste function, as seen in the
examples below.
# names of games
boardgames %>% select(name)
## # A tibble: 500 x 1
## name
##
## 1 Space Mission
## 2 Castles of Caladale
## 3 Pandemic: Rising Tide
## 4 Industria
## 5 Checkers
## 6 Pantheon
## 7 Hisss
## 8 The Lady and the Tiger
## 9 Entdecker
## 10 The Great Zimbabwe
## # ... with 490 more rows
# reversing the names of games and pasting them together with `mutate` and `paste`
reverse_string <- stringi::stri_reverse
boardgames %>%
  mutate(new_column = paste(name, ' >>--|--<< ', reverse_string(name), sep='')) %>%
  select(new_column)
## # A tibble: 500 x 1
## new_column
##
## 1 Space Mission >>--|--<< noissiM ecapS
## 2 Castles of Caladale >>--|--<< eladalaC fo seltsaC
## 3 Pandemic: Rising Tide >>--|--<< ediT gnisiR :cimednaP
## 4 Industria >>--|--<< airtsudnI
## 5 Checkers >>--|--<< srekcehC
## 6 Pantheon >>--|--<< noehtnaP
## 7 Hisss >>--|--<< sssiH
## 8 The Lady and the Tiger >>--|--<< regiT eht dna ydaL ehT
## 9 Entdecker >>--|--<< rekcedtnE
## 10 The Great Zimbabwe >>--|--<< ewbabmiZ taerG ehT
## # ... with 490 more rows
# doing the same thing with `unite`
boardgames %>% mutate(eman = reverse_string(name)) %>%
  unite(new_column, c(name, eman), sep=' >>--|--<< ') %>% select(new_column)
## # A tibble: 500 x 1
## new_column
##
## 1 Space Mission >>--|--<< noissiM ecapS
## 2 Castles of Caladale >>--|--<< eladalaC fo seltsaC
## 3 Pandemic: Rising Tide >>--|--<< ediT gnisiR :cimednaP
## 4 Industria >>--|--<< airtsudnI
## 5 Checkers >>--|--<< srekcehC
## 6 Pantheon >>--|--<< noehtnaP
## 7 Hisss >>--|--<< sssiH
## 8 The Lady and the Tiger >>--|--<< regiT eht dna ydaL ehT
## 9 Entdecker >>--|--<< rekcedtnE
## 10 The Great Zimbabwe >>--|--<< ewbabmiZ taerG ehT
## # ... with 490 more rows
(Optional) This link has some more information about the unite() function if you’d like to look more into
it https://www.rdocumentation.org/packages/tidyr/versions/0.8.1/topics/unite.
The nchar Function
The nchar function counts the number of characters (i.e., the “nchar”s) in a character string, as seen in the
example below.
boardgames %>% mutate(nchar(description)) %>%
  select(`nchar(description)`, description)
## # A tibble: 500 x 2
## `nchar(description)` description
##
## 1 2784 In Space Mission, players explore eight planets (random~
## 2 829 Description from the publisher: In a forgotten~
## 3 1557 It is the dawn of the Industrial Age in the Netherlands~
## 4 1354 Description from designer Michael Schacht (swiped from ~
## 5 1385 Abstract strategy game where players move disc-shaped p~
## 6 866 From BGG News (Eric Martin): "In Pantheon, players~
## 7 2507 The players try to form snakes, which are as long as po~
## 8 705 The Lady and the Tiger is five games in one! D~
## 9 1410 [Careful: there are two similar games with Entdecker in~
## 10 3198 The Great Zimbabwe is a game about building a trade bas~
## # ... with 490 more rows
(Optional) This link has some more information about the nchar function if you’d like to look more into it
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/nchar.
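For a quick check on short strings, nchar() can also be called directly on character values. The game names below are taken from the data shown earlier; the sketch is only an illustration.

```r
nchar("Space Mission")         # 13 (the space counts as a character)
nchar(c("Checkers", "Hisss"))  # 8 5 (works element by element on a vector)
nchar("")                      # 0
```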
Section B2: Six-Sided Die
Six-Sided Die Null Sampling Distribution for n=500
This is the simulated sampling distribution of a proportion under a null hypothesis which assumes that the chance
of a “success” for the outcome in question is equal to the chance of rolling any specific side
of a fair six-sided die. E.g., Pr(“random game uses dice”) = 1/6.
bgg_sample_n500 <- boardgames
set.seed(10)
n <- 500
r <- 10000
simulated_test_statistics <- rep(NA, r)
for (i in 1:r){
  sample(c(0,1), size=n, prob=c(5/6,1/6), replace=TRUE) %>%
    sum()/n -> simulated_test_statistics[i]
}
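The loop above draws n zero/one outcomes and takes their mean, r times. An equivalent way to think about it, sketched here as an assumption rather than as the course's method, is that each simulated count is one binomial draw. The names p0 and simulated_props are hypothetical, and because rbinom uses the random number stream differently, the values will not match the loop's values draw for draw.

```r
set.seed(10)
n  <- 500    # sample size
r  <- 10000  # number of simulated samples
p0 <- 1/6    # null value of Pr("dice game")

# Each draw is the number of successes in n trials; dividing by n
# turns it into a simulated sample proportion.
simulated_props <- rbinom(r, size = n, prob = p0) / n

mean(simulated_props)    # close to 1/6
sd(simulated_props)      # close to the theoretical value on the next line
sqrt(p0 * (1 - p0) / n)  # about 0.0167
```

The last line gives the theoretical standard deviation of a sample proportion, which is a useful sanity check on the spread seen in the histogram.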
Figure B1: Histogram of simulated proportions of dice board games
tibble(`Proportion of 500 samples that use dice if Pr("dice game") = 1/6` =
         simulated_test_statistics) %>%
  ggplot(aes(x=`Proportion of 500 samples that use dice if Pr("dice game") = 1/6`)) +
  geom_histogram(bins=10) +
  labs(title = 'Sampling Distribution for a proportion simulation test
statistic under a null hypothesis that assumes
Pr("dice game") = 1/6')
[Plot: histogram titled 'Sampling Distribution for a proportion simulation test statistic under a null hypothesis that assumes Pr("dice game") = 1/6'; x-axis 'Proportion of 500 samples that use dice if Pr("dice game") = 1/6' (ticks 0.10 to 0.20), y-axis "count" (ticks 0 to 3000)]
Figure B2: Barplot of “dice” Variable
This is the number of games which use dice (dice=TRUE) and do not use dice (dice=FALSE) in our n=500
sample of games from Board Game Geek.
bgg_sample_n500 %>%
  select(dice) %>%
  filter(dice|!dice) %>% # TRUE for TRUE or FALSE, NA for NA, so rows with missing `dice` are dropped
  ggplot(aes(x=dice)) +
  geom_bar() +
  labs(x = "The game uses dice?") +
  labs(title = 'Values of the "dice" variable in our sample n=500
games from BoardGameGeek')
[Plot: bar chart titled 'Values of the "dice" variable in our sample n=500 games from BoardGameGeek'; x-axis "The game uses dice?" (FALSE, TRUE), y-axis "count" (ticks 0 to 300)]
Section B3: Coin-Flip
Coin-Flip Sampling Distribution for n=500
This is the simulated sampling distribution of a proportion under a null hypothesis which assumes that the chance
of a “success” for the outcome in question is equal to the chance of flipping a fair coin. E.g.,
Pr(“I rank something higher than you rank something”) = 1/2.
set.seed(20)
n <- 500
r <- 10000
simulated_test_statistics <- rep(NA, r)
for (i in 1:r){
  sample(c(0,1), size=n, replace=TRUE) %>% sum()/n -> simulated_test_statistics[i]
}
sampling_distribution <- tibble(simulated_test_statistic = simulated_test_statistics)
Comparisons of Sample Ranks Test Statistic
This test statistic is based on “The rank Function”. The rank function is distinct from the “rank” variable,
which is used below and is the “overall game rank on Board Game Geek”. The other data variable used below
is the “average” variable, which is the “average rating of game by board game geek users”. The “average”
variable is measured on a ten-point scale, ranging from 0 to 10 (from bad to good), so rank(10-average) is
used below so that the rank function assigns better (lower-numbered) ranks to larger “average” variable scores.
test_statistic <- bgg_sample_n500 %>%
  filter(!is.na(rank) & !is.na(average)) %>%
  mutate(`Rank of 'user average' relative to the sample` = rank(10-average),
         `Rank of 'BGG rank' relative to the sample` = rank(rank),
         `"average" sample rank better than "rank" sample rank` =
           as.numeric(`Rank of 'user average' relative to the sample` <
                        `Rank of 'BGG rank' relative to the sample`)) %>%
  summarise(p_hat = sum(`"average" sample rank better than "rank" sample rank`)/n())
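To make the logic of this test statistic easier to follow, here is the same idea on four made-up games in base R. The values and the names avg_rank and site_rank are assumptions for illustration only.

```r
average  <- c(6.0, 7.5, 5.0, 6.8)    # made-up ratings; higher is better
bgg_rank <- c(900, 1500, 4000, 300)  # made-up site ranks; lower is better

avg_rank  <- rank(10 - average)  # 3 1 4 2 (best rating gets rank 1)
site_rank <- rank(bgg_rank)      # 2 3 4 1

# Proportion of games whose within-sample rating rank is better
# (lower) than their within-sample site rank.
p_hat <- mean(avg_rank < site_rank)  # 0.25
```

Only the second game has a better rating rank than site rank here, so the proportion is 1 out of 4.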
Figure B3: Histogram for sampling distribution rank Comparisons versus coin flipping
Here we examine the test statistic in “Comparisons of Sample Ranks Test Statistic” relative to the sampling
distribution under the null hypothesis in “Coin-Flip Sampling Distribution for n=500”.
sampling_distribution %>%
  ggplot(aes(x=simulated_test_statistic)) +
  geom_histogram(bins=20) +
  geom_vline(aes(xintercept=as.numeric(test_statistic))) +
  labs(x = 'Proportion of 500 samples where average users rank is better
than BGG rank\nif for any one sample the chance is
Pr(average user rank < BGG rank) = 1/2',
       title = 'Sampling Distribution for a proportion simulation test statistic
under a null hypothesis that assumes
Pr(average user rank < BGG rank) = 1/2')
[Plot: histogram with a vertical line at the test statistic, titled 'Sampling Distribution for a proportion simulation test statistic under a null hypothesis that assumes Pr(average user rank < BGG rank) = 1/2'; x-axis 'Proportion of 500 samples where average users rank is better than BGG rank if for any one sample the chance is Pr(average user rank < BGG rank) = 1/2' (ticks 0.45 to 0.55), y-axis "count" (ticks 0 to 1500)]
Table B1: p-value calculation
sampling_distribution %>%
  summarise(`test statistic`=as.numeric(test_statistic),
            `p-value` = mean(abs(simulated_test_statistic-0.5) >
                               abs(as.numeric(test_statistic-0.5))))
## # A tibble: 1 x 2
## `test statistic` `p-value`
##
## 1 0.433 0.0032
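The p-value above measures how often a simulated proportion lands farther from the null value 0.5 than the observed one. The base R sketch below repeats that comparison in a self-contained form. It is an assumption-laden illustration: it uses rbinom instead of the loop in this section, takes the rounded value 0.433 from Table B1 as the observed statistic, and the names p_strict and p_inclusive are hypothetical, so its result will be close to, but not identical to, the value in the table.

```r
set.seed(20)
n <- 500
r <- 10000
simulated <- rbinom(r, size = n, prob = 0.5) / n  # null: Pr(success) = 1/2

observed <- 0.433  # rounded test statistic reported in Table B1

# Distance from the null value 0.5, compared strictly (>) as in the
# table above, and inclusively (>=), a common alternative convention.
p_strict    <- mean(abs(simulated - 0.5) >  abs(observed - 0.5))
p_inclusive <- mean(abs(simulated - 0.5) >= abs(observed - 0.5))
```

The two versions can differ only when a simulated value is exactly as far from 0.5 as the observed one.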
Section B4: Qualitative Game Characterizations
Here we create a single string characterizing a game with respect to the following variables:
• “medical” (boolean): Indicates whether the board game has a medical theme
• “memory” (boolean): Indicates whether the board game requires players to memorize game play to progress
through the game play
• “cooperative” (boolean): Indicates whether the board game is a cooperative multiplayer board game
• “cardgame” (boolean): Indicates whether the board game is considered a card game
• “dice” (boolean): Indicates whether the board game requires players to roll dice to play
For example, “cardsdice” indicates that a game uses both cards and dice, but is not a memory or cooperative
game, and is not medically themed; whereas, “medmemcoop” is a cooperative memory based game with a
medical theme, but does not use cards or dice.
Note: the details of the implementation are given below; however, they are NOT critical for the purposes of
the examination and are only included here for completeness.
# Convert TRUE/FALSE game information into short string representations and
# combine these strings together to make a single column of game information
bgg_sample_n500_gametypes <- bgg_sample_n500 %>%
  mutate(med=case_when(medical~"med"),
         mem=case_when(memory~"mem"),
         coop=case_when(cooperative~"coop"),
         cards=case_when(cardgame~"cards"),
         dice=case_when(dice~"dice")) %>%
  unite(`Game Type`, c(med,mem,coop,cards,dice), sep="", na.rm=TRUE) %>%
  mutate(`Game Type` = case_when(`Game Type`=="" ~ "NONE", TRUE~`Game Type`))
# The `unite` function is described in the `New Functions` section above, and
# "TRUE~`Game Type`" keeps everything not assigned to "NONE" at its original value
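The same pattern can be traced by hand on a three-row toy tibble. This is a sketch with made-up values and hypothetical column names (d, type), not part of the exam code; it assumes a tidyr version in which unite accepts na.rm, as the code above does.

```r
library(dplyr)
library(tidyr)

toy <- tibble(medical = c(TRUE, FALSE, NA),
              dice    = c(TRUE, FALSE, FALSE))

toy_types <- toy %>%
  mutate(med = case_when(medical ~ "med"),  # FALSE or NA matches nothing, so NA
         d   = case_when(dice ~ "dice")) %>%
  unite(type, c(med, d), sep = "", na.rm = TRUE) %>%  # NA pieces are dropped
  mutate(type = case_when(type == "" ~ "NONE", TRUE ~ type))

toy_types$type  # "meddice" "NONE" "NONE"
```

A row with no TRUE flags ends up as an empty string after unite, which is why the final step relabels "" as "NONE".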
Table B2: Sample Sizes of Different Game Types
Our sample of games bgg_sample_n500 can be divided into types defined by the Game Type variable created
above.
bgg_sample_n500_gametypes_n <- bgg_sample_n500_gametypes %>%
  group_by(`Game Type`) %>%
  summarise(n=n())
bgg_sample_n500_gametypes_n
## # A tibble: 15 x 2
## `Game Type` n
##
## 1 cards 128
## 2 cardsdice 14
## 3 coop 22
## 4 coopcards 14
## 5 coopcardsdice 1
## 6 coopdice 22
## 7 dice 82
## 8 med 2
## 9 medcoop 1
## 10 medmemcoop 1
## 11 mem 4
## 12 memcards 10
## 13 memcoop 1
## 14 memdice 4
## 15 NONE 194
Note: the “NONE” category of the created Game Type variable indicates the subset of games in our n = 500
sample which do not use cards or dice, are not cooperative or memory based games, and are not medically
themed.
Figure B4: Barplot of different Game Types
Visually, the sample sizes of the different Game Type subsets (sorted from smallest to largest) are as follows.
[Plot: horizontal bar chart; y-axis "Game Type" in the order coopcardsdice, medcoop, medmemcoop, memcoop, med, mem, memdice, memcards, cardsdice, coopcards, coop, coopdice, dice, cards, NONE; x-axis "(Sub)Sample Size" (ticks 0 to 200)]
Figure B5: Boxplot of game Title Length for different Game Types
When considering the different populations of games implied by the Game Type variable, one (admittedly
silly) thing we could ask is: “Are the lengths of game titles the same between the different game types?”
To allow us to begin addressing this question, we now create the variable Title Length.
# Create a column storing the number of characters (i.e., length) of each game name
bgg_sample_n500_gametypes <- bgg_sample_n500_gametypes %>%
  mutate(`Title Length` = nchar(name)) # `nchar` counts the characters in a string
The distributions of the created Title Length variable observed in each of our subsamples (sorted from
smallest to largest subsample size) are as follows.
[Plot: horizontal boxplots, one per Game Type, labelled coopcardsdice (n=1), medcoop (n=1), medmemcoop (n=1), memcoop (n=1), med (n=2), mem (n=4), memdice (n=4), memcards (n=10), cardsdice (n=14), coopcards (n=14), coop (n=22), coopdice (n=22), dice (n=82), cards (n=128), NONE (n=194); x-axis "Title Length" (ticks 0 to 60)]
Table B3: Bootstrapping 95% Confidence Intervals within Game Type
Since each unique value in the Game Type variable corresponds to a subset of our n = 500 sample, we can
analyze each of these subsets as its own sample. Therefore, we will compute 95% confidence intervals of the
average number of characters (i.e., length) of game names for each of the populations implied by
the Game Type variable. The lengths of game titles observed in our sample have been stored in the Title
Length variable created above.
Usually when we want to separate data into groups we use the powerful group_by function; however, because
the task we want to perform within each group is complicated (i.e., we will be making bootstrap confidence
intervals for each subset of the data), we will approach this problem using a “double for loop”.
Note: the “double for loop” code producing bootstrap confidence intervals for each population implied by
the Game Type variable is somewhat advanced, and the implementation details are NOT critical for the questions
of the examination. They are thus only included here for your review in case they are independently of interest
to you. If you continue further into computational disciplines you can look forward to seeing many more
“double for loops”!
## # A tibble: 15 x 3
## `Game Type` `95% CI LB` `95% CI UB`
##
## 1 NONE 12.3 14.6
## 2 coop 19.0 28.3
## 3 cards 13.9 17.0
## 4 dice 17.6 23.0
## 5 cardsdice 17.9 27.1
## 6 memdice 8.5 21.8
## 7 memcards 9.00 20.3
## 8 coopdice 16.8 28.4
## 9 coopcards 15.2 28.2
## 10 mem 16 24.2
## 11 memcoop 7 7
## 12 med 5 22
## 13 medmemcoop 6 6
## 14 coopcardsdice 20 20
## 15 medcoop 8 8
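For readers curious about what a “double for loop” bootstrap might look like, here is a minimal base R sketch on made-up title lengths. It is not the code that produced Table B3; the group values, the number of resamples B, and the names title_lengths and results are all assumptions for illustration.

```r
set.seed(130)

# Made-up title lengths for two hypothetical game types.
title_lengths <- list(
  dice    = c(8, 21, 14, 30, 17, 12, 25, 19),
  memcoop = c(7)  # a single game: every resample is identical
)

B <- 1000  # bootstrap resamples per group
results <- data.frame(game_type = names(title_lengths), lb = NA, ub = NA)

for (g in seq_along(title_lengths)) {  # outer loop: one pass per group
  x <- title_lengths[[g]]
  boot_means <- rep(NA, B)
  for (b in 1:B) {                     # inner loop: one resample per pass
    # Resample positions rather than values, because sample(7) would
    # draw from 1:7 instead of returning the single value 7.
    idx <- sample(length(x), replace = TRUE)
    boot_means[b] <- mean(x[idx])
  }
  results$lb[g] <- quantile(boot_means, 0.025)
  results$ub[g] <- quantile(boot_means, 0.975)
}

results
```

In this sketch a group with a single observation always resamples to the same value, so its lower and upper bounds coincide, which is the same pattern seen in the n=1 rows of Table B3.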
Figure B6: Visual Bootstrap 95% Confidence Intervals within Game Type
Here are the bootstrap confidence intervals created in the previous section.
[Plot: one interval per Game Type, labelled coopcardsdice (n=1), medcoop (n=1), medmemcoop (n=1), memcoop (n=1), med (n=2), mem (n=4), memdice (n=4), memcards (n=10), cardsdice (n=14), coopcards (n=14), coop (n=22), coopdice (n=22), dice (n=82), cards (n=128), NONE (n=194); x-axis "95% Confidence Intervals" (ticks 0 to 60)]