MGT 205 R Project - Output
Prof. Rich
Overview
In this project, you will use a data set of property values from
2007 - 2019. The data contain sales prices for houses and units
with 1, 2, 3, 4, and 5 bedrooms.
The data are: date of sale; price in dollars; property type (unit or house); number
of bedrooms
START HERE
Run the code block below.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Load the property.csv file into a data frame.
prop <- read_csv("property.csv")
## Rows: 347 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): saledate, type
## dbl (2): price, bedrooms
## lgl (2): V1, V2
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
1. How many rows and columns are there in the data set?
dim(prop)
## [1] 347 6
2. Calculate the summary statistics for the “price” column.
summary(prop$price)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 316751 427681 507596 547434 626106 1017752 1
3. Remove the unwanted variables V1 and V2 from the data set.
Show the head() of your data set to confirm that the unwanted
variables are gone.
prop<-prop[,-(5:6)]
head(prop)
## # A tibble: 6 × 4
## saledate price type bedrooms
##
## 1 30/06/2007 421291 house 3
## 2 30/06/2007 548969 house 4
## 3 30/06/2007 368817 unit 2
## 4 30/06/2008 441854 house 2
## 5 30/06/2008 419628 house 3
## 6 30/06/2008 559580 house 4
4a. Are there any NA values in the data set?
any(is.na(prop))
## [1] TRUE
4b. If any NA values are in the “price” column, replace it with the
mean price.
prop$price[which(is.na(prop$price))] <- mean(prop$price, na.rm = TRUE)
4c. If any NA values are in any of the other columns, remove the
entire row from the data set.
prop<-na.omit(prop)
4d. Show that there are no more NA values in the data set.
any(is.na(prop))
## [1] FALSE
5. Rename the “saledate” column to “date”. Show the head() of
your data set to confirm the change.
colnames(prop)[1]<-"date"
head(prop)
## # A tibble: 6 × 4
## date price type bedrooms
##
## 1 30/06/2007 421291 house 3
## 2 30/06/2007 548969 house 4
## 3 30/06/2007 368817 unit 2
## 4 30/06/2008 441854 house 2
## 5 30/06/2008 419628 house 3
## 6 30/06/2008 559580 house 4
6. Run the code below to convert the variable “bedrooms” into a
factor variable using the as.factor() function. Replace “prop” in
the code with whatever you called your data set. You do not need
to do anything else in this question after you edit and run the
code.
prop$bedrooms<-as.factor(prop$bedrooms)
7a. Calculate and show the mean price of the house properties
and unit properties.
prop %>% group_by(type) %>% summarise(mean_price =mean(price))
## # A tibble: 2 × 2
## type mean_price
##
## 1 house 626587.
## 2 unit 439743.
7b. Display the two values from 7a as a bar chart. Be sure to
include useful information to help understand your chart.
8. Calculate the total income for each of the bedroom types (hint:
group by). Sort the output from highest to lowest total income.
## # A tibble: 5 × 2
## bedrooms total_income
##
## 1 3 54340744
## 2 2 43868210
## 3 5 41619884.
## 4 4 34159994
## 5 1 15970772
9. Which property and bedroom type COMBINATION draws the
highest income?
## `summarise()` has grouped output by 'type'. You can override using the `.groups` argument.
## # A tibble: 1 × 3
## # Groups: type [1]
## type bedrooms total_income
##
## 1 house 5 41619884.
10. Run the code below to create a new column “year” which is
the year from the sales date. Replace “prop” in the code with
whatever you called your data set. You do not need to do
anything else in this question after you edit and run the code.
prop$year<-format(as.Date(prop$date, format="%d/%m/%Y"),"%Y")
11. Find the years having a total income of more than
$16,000,000 from selling properties.
## # A tibble: 3 × 2
## year total_income
##
## 1 2016 16459234
## 2 2017 17077926
## 3 2018 17359589
12. How many houses have been sold after 2015 for a price
higher than $1,000,000?
## # A tibble: 1 × 1
## n
##
## 1 6
13. Display the income distribution in each property type over the
years using a stacked column chart. Be sure to include useful
information to help your TA understand your chart.
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
14. Find the number of UNIT sales per year for the year range
2010 - 2019. Sort the output from lowest to highest sales.
## # A tibble: 10 × 2
## year n
##
## 1 2019 9
## 2 2010 12
## 3 2011 12
## 4 2012 12
## 5 2013 12
## 6 2014 12
## 7 2015 12
## 8 2016 12
## 9 2017 12
## 10 2018 12
## 1 30/06/2007 421291 house 3
## 2 30/06/2007 548969 house 4
## 3 30/06/2007 368817 unit 2
## 4 30/06/2008 441854 house 2
## 5 30/06/2008 419628 house 3
## 6 30/06/2008 559580 house 4
4a. Are there any NA values in the data set?
any(is.na(prop))
## [1] TRUE
4b. If any NA values are in the “price” column, replace it with the
mean price.
prop$price[which(is.na(prop$price))] <- mean(prop$price, na.rm = TRUE)
4c. If any NA values are in any of the other columns, remove the
entire row from the data set.
prop<-na.omit(prop)
4d. Show that there are no more NA values in the data set.
any(is.na(prop))
## [1] FALSE
5. Rename the “saledate” column to “date”. Show the head() of
your data set to confirm the change.
colnames(prop)[1]<-"date"
head(prop)
## # A tibble: 6 × 4
## date price type bedrooms
##
## 1 30/06/2007 421291 house 3
## 2 30/06/2007 548969 house 4
## 3 30/06/2007 368817 unit 2
## 4 30/06/2008 441854 house 2
## 5 30/06/2008 419628 house 3
## 6 30/06/2008 559580 house 4
6. Run the code below to convert the variable “bedrooms” into a
factor variable using the as.factor() function. Replace “prop” in
the code with whatever you called your data set. You do not need
to do anything else in this question after you edit and run the
code.
prop$bedrooms<-as.factor(prop$bedrooms)
7a. Calculate and show the mean price of the house properties
and unit properties.
prop %>% group_by(type) %>% summarise(mean_price =mean(price))
## # A tibble: 2 × 2
## type mean_price
##
## 1 house 626587.
## 2 unit 439743.
7b. Display the two values from 7a as a bar chart. Be sure to
include useful information to help understand your chart.
8. Calculate the total income for each of the bedroom types (hint:
group by). Sort the output from highest to lowest total income.
## # A tibble: 5 × 2
## bedrooms total_income
##
## 1 3 54340744
## 2 2 43868210
## 3 5 41619884.
## 4 4 34159994
## 5 1 15970772
9. Which property and bedroom type COMBINATION draws the
highest income?
## `summarise()` has grouped output by 'type'. You can override using the `.groups` argument.
## # A tibble: 1 × 3
## # Groups: type [1]
## type bedrooms total_income
##
## 1 house 5 41619884.
10. Run the code below to create a new column “year” which is
the year from the sales date. Replace “prop” in the code with
whatever you called your data set. You do not need to do
anything else in this question after you edit and run the code.
prop$year<-format(as.Date(prop$date, format="%d/%m/%Y"),"%Y")
11. Find the years having a total income of more than
$16,000,000 from selling properties.
## # A tibble: 3 × 2
## year total_income
##
## 1 2016 16459234
## 2 2017 17077926
## 3 2018 17359589
12. How many houses have been sold after 2015 for a price
higher than $1,000,000?
## # A tibble: 1 × 1
## n
##
## 1 6
13. Display the income distribution in each property type over the
years using a stacked column chart. Be sure to include useful
information to help your TA understand your chart.
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
14. Find the number of UNIT sales per year for the year range
2010 - 2019. Sort the output from lowest to highest sales.
## # A tibble: 10 × 2
## year n
##
## 1 2019 9
## 2 2010 12
## 3 2011 12
## 4 2012 12
## 5 2013 12
## 6 2014 12
## 7 2015 12
## 8 2016 12
## 9 2017 12
## 10 2018 12
学霸联盟