R代写 - STA302H1F / 1001HF Autumn 2020 Assignment
STA302H1F / 1001HF Autumn 2020 Assignment # 2 A Simple Linear Model for Toronto and Mississauga House Prices Posted by: Dr. Shivon Sue-Chee on Saturday, October 10, 2020 Due: In Quercus by 8pm on Saturday, October 24, 2020. Late assignments will be subjected to a penalty of 20% per day late. Submissions will not be accepted beyond 48 hours of the due date. Email submissions are not allowed. 1 Instructions • Use R Studio to create two files: 1. An Rmarkdown file with your codes according to the standard format in the A2 a20 RMForm.Rmd file. 2. The corresponding report in html or pdf format. • Create a video presentation of no longer than 5 minutes with you presenting your written report. Your face should be shown at least once during your presentation. Save your video presentation as an MP4 file and upload it into your UofT MyMedia account. I suggest that you use Zoom to create your video. Here are two demonstration videos of how to record and download your presentation in Zoom: – https://www.youtube.com/watch?v=P6cTbnUPwfY – https://kb.siue.edu/61721 Here is documentation on using UofT MyMedia: – https://www.oise.utoronto.ca/online/Instructors/Video_server_-_MyMedia/index.html • Into Quercus Assignment 2, submit the following three items: 1. A MyMedia link to your video presentation 2. An Rmd file with your RMarkdown codes 3. A Portrait picture of yourself with your T-card • Note that for a separate participation activity, which would be announced later, you would be asked to upload your video presentation to peer Scholar for peer reviews. For the sake of privacy, please avoid revealing your full identity in your video presentation. You could use your initials and up to the last four digits of your student number, if you like. • Presentation of your report is very important. Do not show R codes unless it is required for your solutions. Only required numbers and plots should be shown. Extraneous output should be hidden. Use options, include=FALSE, echo=FALSE, message=FALSE, where necessary. • Write and present your own work. For instance, personalized your code as much as possible, using your initials. All plots produced must be given a title with the last 4 digits of your student number. • Use a benchmark significance level of 5%. Report p-values to 4 decimal places. 1 2 Grading Scheme Grading rubrics will be posted in Quercus for the video presentation and the RMarkdown file. Note that if a portrait picture of yourself with a clear view of your T-card is not received by the due date or if your picture and T-card do not correspond to our other records, a mark of zero will be given for the entire assignment. 3 The Data First-time home buying is currently a major federal issue. Prices for detached houses have been at an all￾time high during the current COVID-19 period. Data for this assignment was obtained from the Toronto Real Estate Board (TREB) on detached hourses in two separate neighbourhoods- one in the city of Toronto and another in the city of Mississauga. Data is contained in the file “real20.csv” on the assignment 2 page. The variables in the dataset are: • ID: property identification • sold: the actual sale price of the property in millions of Canadian dollars • list: the last list price of the property in millions of Canadian dollars • taxes: previous year’s property tax in Canadian dollars • location: M- Mississauga Neighbourhood, T- Toronto Neighbourhood For this assignment, we are interested in establishing a simple linear model that home buyers can use to determine the expected sale price of detached, single family homes in the two neighbourhoods in the Greater Toronto Area. 4 The Analysis Set the seed of your randomization to be the last 4 digits of your student number. Randomly select a sample of 200 cases. Based on your sample data, complete an RMarkdown file and the corresponding report with the following sections. I. Exploratory Data Analysis section. • Use a single plot to describe your data. Create a subset of your data by removing at most two cases and briefly explain your choice. • Then using the data subset for this and the remaining parts of this assignment, draw two scat￾terplots of the response variable- sale price by (i) list price and then (ii) taxes. In each plot, distinguish between properties in neighbourhood M and those in neighbourhood T, and include a legend/key. • Interpret each of the three plots produced in this part, that is, describe at least one major highlight from each plot. Each highlight should differ from the other. II. Methods and Model section. • Carry out three simple linear regressions (SLR) for sale price from list price, one for all data, one for properties of neighbourhood M and another for properties of neighbourhood T. In a table, give the values of the following for each of these regressions: 2 – R2 – the estimated intercept – the estimated slope – the estimate of the variance of the error term – the p-value for the test with null hypothesis that the slope is 0 – a 95% confidence interval for the slope parameter • Interpret and compare the three R2 values. Give a briefly explanation why they appear similar or different. • Briefly discuss whether a pooled two-sample t-test can be used to determine if there is a statisti￾cally significant difference between the slopes of the simple linear models for the two neighbour￾hoods. (Note: You do not need to carry out a pooled two-sample t- test here.) III. Discussion and Limitations section. A sensible data science approach is to base inferences or conclusions only on valid models. • Select one of the three fitted models in part II and give a brief explanation for your choice. • Discuss whether there are any violations of the normal error SLR assumptions for your selected model. Use at most two plots. • Identify two potential numeric predictors (other than those given in the data set) that could be used to fit a multiple linear regression for sale price. 3