Date: 2025-04-10

MATH1041 Statistics for Life and Social Science
Term 1, 2025
MATH1041 Assignment
Data: Together with this document, you should have received your unique dataset in an e-mail sent to your
official university email address. The data (i.e., your dataset) are available in a text file with the name 5428814.csv.
If you have not received your dataset (double check your UNSW email inbox and the spam folder), please contact
your lecturer.
Submission due date: Tuesday 15th April (Week 9) before 11:59 PM (Sydney time, AEST). Note that
a late penalty of 5% of the maximal possible mark per day will apply. No assignment will be accepted more
than five days after the deadline.
Your submission must contain your full name and student zID at the top of your assignment. Submit your
assignment through Turnitin via Moodle. See the “Assessments Hub” section on Moodle for further information
regarding online submission.
Please submit a neatly typed assignment either as a Microsoft Word document (.doc or .docx) or as a PDF
document (.pdf) created, for instance, using Google Docs, LaTeX, RMarkdown or similar tools. See the
information and help about the assignment in the assessment section on Moodle. For your convenience, a
Microsoft Word template that is already in a format appropriate for this assignment can be downloaded
from Moodle.
Verify that your assignment has been submitted correctly by downloading the submission receipt and clicking on
the link to check that it displays correctly in the Turnitin viewer. If not, it is your responsibility to make the
necessary amendment.
Typesetting (*) /2
Q1 /5
Q2 /9
Q3 /13
Q4 /17
Q5 /15
Q6 /4
Total /65
(*) See the next pages and the “Assessments Hub” on Moodle for details, help and explanations about the assignment and
typesetting.
Note that your assignment and dataset are unique. You cannot show your dataset or your
assignment to anyone. It is your responsibility to keep your dataset and your assignment secret.
Also, your assignment must be your own work. You cannot get any outside help in
any form. If you have a question about the assignment, the only places where you can ask it are
the MATH1041 Assignment help FORUM, provided you do not reveal your data, and a staff
consultation.
Computing assignment format
Keep in mind that this assignment is not only about assessing your Statistical skills; it is also about giving you
feedback on your Mathematical writing skills. The assignment must be typeset correctly and provide complete
but concise explanations in complete English sentences and paragraphs. Think of this as practice for a document
you might produce in your future studies or career that includes mathematical explanations.
Here are some more details that may assist you:
• Regarding the overall assignment structure, please answer all questions in the given order (that is, 1.a.,
1.b., ... etc). Do not rewrite the assignment questions, only their labels (write “3.e.” for
instance when you start question 3.e.). Keep your answers brief, clear and concise.
DO NOT reproduce the cover sheet, i.e., the first 5 pages of the pdf file sent to you, in your assignment.
• Start your answer to each Question (1, 2, etc.) on a new page. Each Question should start on a
new page, but sub-parts of a Question (such as Question 3.d., 3.e.) should continue on the same page.
• You are required to type up your entire assignment (in Microsoft Word, Google Docs, LaTeX, Overleaf
or RMarkdown) including any equations. The only exceptions are the plots produced by RStudio: you
can save the figures (use “export” in the bottom right window in RStudio) and then paste them into
your assignment. Nothing can be handwritten and then scanned. As a UNSW student, you can
download Microsoft Word for free, see: https://www.myit.unsw.edu.au/software-students.
• As in any properly typeset document containing mathematical symbols, you must use an equation editor
for all maths symbols. For instance, you should write “$X$ is normal”, rather than “X is normal” (Notice
how the ‘X’ looks different?) and you should write “$t_{\text{obs}} = 1.23$”, rather than “tobs = 1.23”.
The marking scheme for this criterion is the following: Are mathematical symbols typeset using the
equation editor (or LaTeX)? 2 marks for ‘almost always’, 1 mark for ‘sometimes’, 0 marks for ‘rarely’.
Help about the Microsoft equation editor can be found in a document called Microsoft Word Equation editor
help for MATH1041, located in Moodle’s Assignment (20%) section within the Assessments Hub section.
• You should add some working out for the questions involving calculations; do not just give the final answer.
Note that you may get partial marks for clear explanations and a correct method even if you get the wrong
answer. However, try to keep your solutions brief and concise. Depending on what the question is asking,
your working out could consist of RStudio commands, a formula, or perhaps the main steps explaining
how you arrived at your answer. Only include key R commands, using a different font or colour.
• Keeping your results to 3 or 4 significant figures should be fine. If there are multiple steps in a calculation,
do not round any numbers until you have reached the final step. To help you do calculations correctly in
RStudio without rounding, values should be stored as variables, rather than copying the output number
into a further calculation. For example, if you are constructing a confidence interval and need to calculate $t^*$,
you should write the code: tstar <- qt(0.975, df = 10) and then use the variable tstar in calculating
your confidence interval, rather than pasting in the number 2.228139.
• There is no requirement for font size and line spacing but please make sure your assignment is readable —
do not make the font size too small or the spacing too compact.
• If the question asks you to produce a graph/plot, you should always include that graph in your answer,
unless otherwise specified.
• It is FORBIDDEN to use functions from the Tidyverse suite of packages (e.g., ggplot, etc.).
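The earlier advice about storing intermediate values as variables can be sketched in base R as follows. This is only a minimal illustration on made-up numbers, not on your dataset: the vector x and the names n, xbar, s and ci are hypothetical, and df = n - 1 is the usual choice for a one-sample t interval.

```r
# Toy sample (made-up values, NOT the assignment data)
x <- c(12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9, 9.5, 12.4, 10.8, 11.1)

n    <- length(x)   # sample size (11 here, so df = 10)
xbar <- mean(x)     # sample mean, kept at full precision
s    <- sd(x)       # sample standard deviation

# Store the critical value instead of retyping 2.228139
tstar <- qt(0.975, df = n - 1)

# 95% confidence interval built from stored, unrounded values
ci <- xbar + c(-1, 1) * tstar * s / sqrt(n)

# Round only when reporting the final answer
print(signif(ci, 4))
```

Because every intermediate quantity is stored, the only rounding happens in the final signif() call.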
Scenario (do NOT copy-paste the data shown in this section)
A group of research ecologists were interested in
studying the impacts of climate change on different
species of plants that grow in New South Wales,
Australia. Some of these plants are native to Australia
while others are non-native (exotic).
To obtain their data, the research team decided to collect
a random sample of n plants from a national park.
Some measurements were then taken in 2024 on each
plant. The random sample of data consists of plant
height measurements (measured in centimeters), dry
weight measurements (measured in grams), whether
the plant was native or non-native to Australia and
the pollination mode of the plant (this could be one
of four types: wind, water, insect and self-pollination).
A limited number of rows of your unique personal
dataset is shown below. Your data set contains
four variables:
• Height, which corresponds to the height of a
plant,
• Weight, which corresponds to the dry weight of a
plant,
• Type, which corresponds to plant type (native =
0 and exotic = 1),
• Polin, which corresponds to the pollination
mode of the plant (Wind, Water, Insect and Self).
Your job is to assist the research team by analysing
the data set provided to you.
## Weight Height Type Polin
## 18.63 117.27 0
## 26.38 167.34 0 Self
## 21.21 144.77 1 Water
## 21.20 150.80 0 Self
## 23.66 160.88 0 Self
## 23.22 171.89 0 Self
## 27.36 198.51 1 Water
## 21.62 134.98 0 Self
## 28.66 212.83 1 Water
## 26.61 171.97 1 Water
## 15.98 129.58 0 Self
## 21.45 199.92 0 Insect
## 25.11 153.30 0 Self
## 27.25 217.86 1 Insect
## 30.90 214.63 1 Water
## 20.25 111.85 0 Self
## 29.68 226.12 1 Insect
## 22.53 189.70 1 Wind
## 20.33 131.83 0 Insect
## 21.94 166.34 1 Insect
## 20.27 155.97 0 Self
## 22.27 175.60 1 Wind
## 30.41 229.32 1 Insect
## 19.31 128.13 0 Self
## 33.10 234.04 1 Wind
## 23.13 138.08 0 Insect
## 21.90 160.67 0 Self
## 14.96 118.57 0 Insect
## 35.81 246.04 1 Wind
## 21.07 138.20 0 Self
## 28.40 210.89 1 Insect
## 15.79 153.13 0 Self
## 22.87 185.12 1 Water
## 20.28 152.31 0 Self
## 23.97 162.33 0 Insect
## 22.67 177.58 1 Insect
## 21.60 158.10 1 Water
## 32.00 202.77 1 Insect
## 18.96 142.85 1 Wind
## 20.75 135.65 0 Self
## .........................................
Reading the data into RStudio
The data are in a text file with the name 5428814.csv. This file was sent to you by e-mail (see the Data paragraph
at the start of this document). To complete this assignment, you need to use the FULL dataset provided to you
by email. Do NOT copy and paste the data shown in the Scenario section as your dataset.
The first step is to read the data into RStudio. The data format is like what you have already worked with
in the Weekly Mobius lessons. Follow the instructions given in section R1.4 “How to import a text file into
RStudio” of the RStudio “How-To-Manual” available on Moodle. Alternatively, you can also review your lecture
slides to find the R function to use to import a CSV file into RStudio. Two arguments that are often used when
calling this function are header = TRUE (to indicate whether names of variables are present on the first line in
the file) and row.names = 1 (to indicate that names of the cases are provided in the first column). Another
very important argument is colClasses which takes a vector of classes to be assumed for the columns (such as
"character" for strings, "factor" for a categorical variable, and "numeric" for a quantitative variable). Once
you have imported the data, you are ready to start your analysis!
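The three arguments described above can be sketched on a toy file. This is a hedged illustration only: it assumes the import function is base R's read.csv() (the handout leaves you to look it up), it writes a made-up CSV to a temporary file rather than touching 5428814.csv, and the ID column and the object name toy.data are hypothetical. Whether your own file contains a column of case names is not stated here, so row.names = 1 is shown purely for illustration.

```r
# Write a tiny made-up CSV to a temporary file (NOT the assignment data)
toy.file <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,Weight,Height,Type,Polin",
  "p1,18.5,120.3,0,Self",
  "p2,26.1,170.2,1,Water",
  "p3,21.7,NA,0,Insect"
), toy.file)

# colClasses lists one class per column of the file, ID column included
toy.data <- read.csv(
  toy.file,
  header     = TRUE,   # first line holds the variable names
  row.names  = 1,      # first column holds the case labels
  colClasses = c("character", "numeric", "numeric", "factor", "factor")
)

str(toy.data)          # check the class R assigned to each column
```

Inspecting the result with str() is a quick way to confirm that each variable was read with the class you intended.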
Checkpoint: To make sure everything is all right, we suggest that you first calculate the average of
the n = 500 values read from your file 5428814.csv for each quantitative variable, and check that
they match the values given below. If your data have been stored in an R object called student.data,
you can type
print(colMeans(student.data[, unlist(lapply(student.data, is.numeric))], na.rm = TRUE), digits = 5)
where na.rm = TRUE indicates that non-available (NA), i.e., missing, values are removed, if there are
any.
## Weight Height Type
## 23.302 164.100 0.464
They do? It means you imported the data correctly in RStudio. You are ready to start!
IMPORTANT: Completing this checkpoint is essential. If you load in the dataset incorrectly, you will have
incorrect answers throughout the entire assignment, and you will have marks removed for every incorrect answer.
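To see what the checkpoint command does step by step, here is a minimal sketch on a made-up data frame. The object toy.data and its values are hypothetical and are not taken from any student dataset.

```r
# Toy data frame (made-up values, NOT the assignment data)
toy.data <- data.frame(
  Weight = c(18.5, 26.1, 21.7, 23.0),
  Height = c(120.3, 170.2, NA, 150.5),
  Polin  = factor(c("Self", "Water", "Insect", "Self"))
)

# TRUE for columns stored as numbers, FALSE otherwise
is.num <- unlist(lapply(toy.data, is.numeric))

# Column means over the numeric columns only, skipping NA entries
toy.means <- colMeans(toy.data[, is.num], na.rm = TRUE)
print(toy.means, digits = 5)
```

The factor column is excluded by is.num, and the missing Height is dropped from its mean rather than turning the result into NA.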
The Analysis Tasks
The questions below are structured in a logical sequence, applicable to analysing a wide range of real-world data
sets. Working through these questions will deepen your understanding of key concepts covered in the lecture
slides and help you locate specific information within them, a skill that will prove invaluable for the final exam.
While the slides will be available during the exam, the volume of content makes it challenging to quickly find the
necessary information unless you have spent substantial time reviewing and familiarising yourself with them
beforehand. Therefore, it is strongly recommended that you carefully review the lecture slides as you complete
this assignment.
PART I: Study Design
Q1. In this question, you will think about the research questions and aspects of study design. For all parts
in Q1, your answers should be no more than one sentence long.
1.a. Briefly explain the research question that the stakeholders are interested in, based on what is
described in the scenario. Keep this in mind when you analyse the data in Parts II and III.
1.b. What is the population that is of interest to researchers?
1.c. What are the cases here? (We do not expect a list of all cases here.)
1.d. Is it an observational study or is it an experiment? Provide a brief justification for your answer.
With the markers in mind, in your assignment, please start every question on a new page.
Q2. In this question, you will describe the organisation of the data. For each of parts 2.a to 2.c, your answer
should be no more than two sentences.
2.a. Your data are provided to you in a specific file format. What is the extension of the data file and what
does the extension stand for?
2.b. What is the sample size? (We expect a value here.)
2.c. Are there any IDs (labels) provided in this data set? If not, how could you define them?
2.d. Complete the table below so that it lists all of the variables that are contained in the dataset and the
type of each variable. You should add rows to the table as required. When describing the type of each
variable, you should be more specific than just saying that the variable is categorical or quantitative,
i.e., you should specify what kind of categorical or quantitative variable it is.
Table 1: Table to be completed and submitted with your assignment.
Variable Name Variable Type
PART II: Exploratory Data Analysis
Q3. Your second task, as for any statistician, is to explore your data with univariate analyses to gain a good
understanding of each variable in the data set. This is always a good strategy to help you detect problems
in a data set, and also to learn enough about your data to better answer the research questions.
3.a. Let us deal with missing values first, if any. How many missing values are there in your dataset? You
can determine this using the R function is.na(). (They are indicated by NA entries after import
into R, a code meaning “Not Available”.) Just state the number of missing values.
3.b. When doing initial data exploration, it is always good to consider the potential reasons for missing
data and where they appear in the data set. Sometimes missing values are handled by replacing
all of them with a suitably chosen value. Other times, it is more appropriate to leave them as they
are. Which variable or variables contain missing values, if any? What is the appropriate strategy
here? Justify your answer. Your answer should be no more than two sentences.
3.c. We now move on to univariate graphical summaries. Create a boxplot of the variable Weight. Include
it in your submitted assignment properly labelled.
3.d. Comment on the presence or absence of outliers in the boxplot you produced in part 3.c (in no more
than one sentence).
3.e. Create an appropriate graphical summary for the variable Polin. Include it in your submitted
assignment properly labelled.
3.f. Comment in no more than one sentence on the graphical summary in part 3.e.
3.g. We now move on to univariate numerical summaries. Compute an appropriate numerical summary
for the variable Type.
3.h. In no more than one sentence, comment on the result of part 3.g.
3.i. Compute the five number summary of variable Height for all plants combined (native and
exotic). (Do NOT use the fivenum() function since it returns Tukey's hinges, which are not always equal to the quartiles.)
3.j. In no more than one sentence, comment on the result of part 3.i.
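Two of the mechanics mentioned in Q3, counting NA entries with is.na() and the difference between quartiles and what fivenum() returns, can be sketched on made-up data. This is an illustration under stated assumptions: toy.data is hypothetical, and quantile() is used here with its default settings, which may or may not be the quartile definition used in your lecture slides.

```r
# Toy data (made-up values, NOT the assignment data)
toy.data <- data.frame(
  Height = c(120, 135, 150, 160, 175, 190),
  Polin  = c("Self", NA, "Water", "Insect", NA, "Wind")
)

# is.na() gives TRUE/FALSE per cell; sum() counts the TRUEs
n.missing     <- sum(is.na(toy.data))       # whole data set
n.missing.col <- colSums(is.na(toy.data))   # per variable

# quantile() with default settings versus fivenum() (Tukey's hinges)
q  <- quantile(toy.data$Height, probs = c(0, 0.25, 0.5, 0.75, 1))
f5 <- fivenum(toy.data$Height)

print(n.missing.col)
print(q)
print(f5)
```

On this toy vector the second and fourth values of q and f5 differ, which is the kind of discrepancy the warning about fivenum() refers to.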
Q4. 4.a. We now move on to bivariate graphical summaries. We want to study the relationship between the
variables Height and Weight. What type of graphical summary is appropriate for this? Just state
the name of the summary. (Do not give the graphical summary at this point. You will be asked to do
so in part 4.d.)
4.b. It is sometimes appropriate to add a least-squares line to graphical summaries of the kind referred to
in part 4.a. Is it the case here? Just answer yes or no for this part.
4.c. Justify your answer to part 4.b. Write no more than 3 sentences.
4.d. Now, produce the graphical summary referred to in part 4.a. Ensure that your plot is properly labelled
and include it in your assignment.
4.e. Describe the nature of the relationship observed on the plot you produced in part 4.d, using the four
adjectives (or their antonyms) given in the lecture slides. Your answer should be no more than four
sentences, but writing only one sentence should suffice. Do you notice some unusual observations on
this plot?
4.f. What is an appropriate numerical summary to describe the relationship between Weight and
Height? Just state the name of the numerical summary.
4.g. Compute the value of the numerical summary referred to in part 4.f. Give your answer to at least
two decimal places.
4.h. Comment on the value of the numerical summary you computed in part 4.g in no more than one
sentence.
4.i. Produce an appropriate graph to study the association between the variables Polin and Type. Ensure
that your plot is properly labelled and include it in your assignment.
4.j. Comment on the plot you produced in part 4.i in no more than two sentences.
PART III: Modeling and Inference
Q5. Now, we are going to do some modeling and statistical inference.
5.a. Let $\mu$ be the current population mean plant height (in centimeters) in the national park.
The research team decided to compare the current mean plant height with the mean from 20 years
ago using plant height data obtained from the same national park. It is known that the mean plant
height from 20 years ago is $\mu_0 = 190$ centimeters. Recall the name of the hypothesis test you
can use here (just state the name of the test).
5.b. Perform an appropriate hypothesis test to compare the true means of Height between the two periods
of time. You must summarise all steps in your solution:
• (1) state the null (give both $H_0$ and $\tilde{H}_0$) and alternative ($H_a$) hypotheses relevant to the research
objectives stated in this scenario,
• (2) write down an expression/formula for a suitable test statistic,
• (3) give its observed value in the sample,
• (4) give the null distribution for this statistic,
• (5) give the expression of the P-value,
• (6) give the numerical value of the P-value,
• (7) give your interpretation of the P-value and
• (8) give your conclusion in plain language.
5.c. Some assumptions need to be made for the sampling distribution of the test statistic of part 5.b to be
valid. State these assumptions and briefly discuss, with justification, whether each of them is
reasonable here.
Q6. 6.a. Produce a 95% confidence interval for $\mu$, the present mean plant height. For this question you may assume
that it is appropriate to use a t-distribution.
6.b. Does this confidence interval include the value $\mu_0$ given in part 5.a?
6.c. Is your answer to 6.b consistent with your conclusions from the hypothesis test in part 5.b?
6.d. Referring back to the scenario, write a one-sentence plain-language interpretation of the confidence
interval obtained above.
END OF ASSIGNMENT