The University of Sydney
School of Mathematics and Statistics

Assignment 2 Part B: Statistics
MATH1062: Mathematics 1B
Semester 1, 2025
Lecturers: Tiangang Cui and Jun Yong Park

This individual assignment is due by 11:59pm Sunday 11 May 2025, via Canvas. Late assignments will receive a penalty of 5% per day until the closing date. Your answers must be uploaded to Canvas following the instructions at the beginning of each Part of this document.

Please make sure you review your submission carefully. What you see is exactly how the marker will see your assignment. Submissions can be overwritten until the due date. To ensure compliance with our anonymous marking obligations, please do not under any circumstances include your name in any area of your assignment.

The School of Mathematics and Statistics encourages some collaboration between students when working on problems, but students must write up and submit their own version of the solutions. Even though the use of AI is allowed, it is better for your learning to do your own work to complete the assignment.

If you have technical difficulties with your submission, see the University of Sydney Canvas Guide, available from the Help section of Canvas.

This assignment has two parts. It is worth a total of 5% + 5% = 10% of your final assessment for this unit. Please cite any resources used, including AI, and show all working. Present your arguments clearly using words of explanation and diagrams where relevant. The marker will give you feedback and allocate an overall mark to your assignment using the following criteria:

Copyright © 2025 The University of Sydney

Part B: Statistics

Solutions to the statistics part (Part B) must be prepared in written form and uploaded as a single PDF file to https://canvas.sydney.edu.au/courses/64063/assignments/598720. The statistics part of Assignment 2 contains three questions, each with multiple parts.
Background

In this assignment, we will use simulated climate data based on real observations from the Bureau of Meteorology at Canterbury Racecourse AWS (station 066194), collected in 2023. The simulated dataset includes various daily measurements over a period of 117 days. For this assignment, we will focus on the daily morning temperature (morning.temp), daily maximum temperature (max.temp), and daily relative humidity (humidity). Boxplots of these variables are shown below.

Temperatures are measured in degrees Celsius, and relative humidity is expressed as a percentage (taking values between 0 and 100). To avoid confusion, please focus on the given data values rather than their units when answering the following questions.

1. (a) Using the following R output, write down, step by step, the equation of the linear regression model that predicts the value of max.temp given the value of morning.temp. Round the slope and intercept to two decimal places.

    > summary(morning.temp)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      17.10   19.98   21.00   21.10   21.93   25.70
    > summary(max.temp)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      20.52   25.75   26.99   27.37   29.02   33.87
    > sd(morning.temp)
    [1] 2.218058
    > sd(max.temp)
    [1] 2.875926
    > cor(max.temp, morning.temp)
    [1] 0.9386172

(b) The plot below shows the residuals after fitting the regression line in (a). Comment on whether the regression line is a good fit.

2. A relative humidity above 65 is often considered high and may pose potential health risks. Using historical data from the early 1900s, researchers at the Bureau of Meteorology estimated that 18% of days had relative humidity exceeding 65. They claimed that the current proportion of days with risky humidity levels (i.e., relative humidity exceeding 65) is consistent with the level observed in the early 1900s. We want to test whether the data provided in this assignment are consistent with their claim that “18% of days have relative humidity exceeding 65”. The following R output is useful.
    > sum(humidity > 65)
    [1] 30
    > length(humidity)
    [1] 117
    > round(qnorm(c(0.95, 0.955, 0.96, 0.965, 0.97)), 3)
    [1] 1.645 1.695 1.751 1.812 1.881
    > round(qnorm(c(0.975, 0.98, 0.985, 0.99, 0.995)), 3)
    [1] 1.960 2.054 2.170 2.326 2.576

(a) State the null and alternative hypotheses for this test. In answering, introduce an appropriate parameter, and state your null and alternative hypotheses in terms of this parameter.

(b) Calculate the expected value and standard error of the sample proportion, assuming the null hypothesis is true. Round your calculations to three decimal places.

(c) Assuming the Central Limit Theorem holds, calculate the two-sided 98% prediction interval that can be used to test whether the data are consistent with the claimed 18% in the null hypothesis. You can use the provided R output. Round your calculations to three decimal places.

(d) First, use the provided R output to calculate the observed sample proportion, and then compute the P-value based on this observed proportion. You may need to use the pnorm() function in R to calculate the P-value. You must include the R command and its output in your submission, either as a screenshot or written by hand.

(e) What is the conclusion of your hypothesis test at the 2% significance level? Is the observed sample proportion significantly different from 18%? What assumptions do we need about our data to make our hypothesis test valid? Your answer should have three parts:
   1. At most one sentence stating the conclusion of your hypothesis test.
   2. A reason for your conclusion, in at most two sentences.
   3. At most two sentences explaining what assumptions we used during our hypothesis test.

(f) Without calculating the actual confidence interval, will the 98% confidence interval computed from the observed sample proportion cover the claimed 18%? You should provide a Yes/No answer and justify your response.

3.
Suppose we now want to determine whether the current proportion of days with risky humidity levels (i.e., relative humidity exceeding 65) is higher than the level observed in the early 1900s. How would you formulate the hypothesis test, and what would be the conclusion? We will use the same R output from Question 2 to answer this question.

(a) State the null and alternative hypotheses for this test. You may use the same parameter defined in Question 2. In addition, state which values of the test statistic argue against the null hypothesis.

(b) Using the P-value you calculated in Q2(d), calculate the P-value for this new test. What is the conclusion of your hypothesis test at the 2% significance level?
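The normal-approximation calculations that Questions 2 and 3 ask for can be sketched in code. This sketch is written in Python rather than R, with `statistics.NormalDist` standing in for R's `qnorm()` and `pnorm()`; the function names are hypothetical. It assumes, as the questions do, that the Central Limit Theorem lets us treat the sample proportion as approximately normal with mean p0 and standard error sqrt(p0(1 - p0)/n) under the null hypothesis. It shows how the two-sided P-value (Question 2) and the upper-tail P-value (Question 3) come from the same z statistic.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution: .cdf() plays the role of R's pnorm(),
# .inv_cdf() plays the role of R's qnorm().
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def null_mean_and_se(p0: float, n: int) -> tuple[float, float]:
    """Expected value and standard error of the sample proportion if p = p0."""
    return p0, sqrt(p0 * (1 - p0) / n)


def prediction_interval(p0: float, n: int, level: float) -> tuple[float, float]:
    """Two-sided interval that contains the sample proportion with
    probability `level` under the null hypothesis (normal approximation)."""
    mean, se = null_mean_and_se(p0, n)
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)  # e.g. qnorm(0.99) for 98%
    return mean - z_crit * se, mean + z_crit * se


def proportion_z_test(successes: int, n: int, p0: float) -> dict[str, float]:
    """Observed proportion, z statistic and P-values for H0: p = p0."""
    p_hat = successes / n
    _, se = null_mean_and_se(p0, n)
    z = (p_hat - p0) / se
    upper_tail = 1 - STD_NORMAL.cdf(z)  # P(Z >= z), for H1: p > p0
    lower_tail = STD_NORMAL.cdf(z)      # P(Z <= z), for H1: p < p0
    return {
        "p_hat": p_hat,
        "z": z,
        "p_upper": upper_tail,
        "p_two_sided": 2 * min(upper_tail, lower_tail),  # for H1: p != p0
    }


if __name__ == "__main__":
    # Counts taken from the R output in Question 2.
    n, successes, p0 = 117, 30, 0.18
    mean, se = null_mean_and_se(p0, n)
    print(f"E = {mean:.3f}, SE = {se:.3f}")
    print("98% prediction interval:",
          [round(b, 3) for b in prediction_interval(p0, n, 0.98)])
    print(proportion_z_test(successes, n, p0))
```

Because the sketch does not round intermediate values, its interval endpoints can differ in the third decimal place from a hand calculation that first rounds the standard error and the critical value to three decimal places.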