ETF5912 Assignment 2
Date: 2025-05-10
ETF5912 Data Analysis in Business
Semester 1, 2025
Assignment 2
Due: Friday 30 May 2025 (Week 12), 11:55 PM
This assignment is worth 25 marks. You are required to upload only ONE .docx (Microsoft
Word) file through Moodle submission.
A group of up to four students may work together, and all group members are in the same
tutorial. Groups must consist only of students enrolled in the same unit; that is, ETF2121
and ETF5912 students cannot mix when forming a group. Your Word (.docx) file needs to
be uploaded by only one member of each group. However, all group members must click the
"Submit Assignment" button on Moodle and accept the University’s submission statement.
This is essential, so please make sure to do this.
We strongly encourage students to attempt Questions 1, 2, and 3 themselves (independently)
before consulting with their group members. Question 4 encourages collaboration
among group members from the outset. Each group member will also be required
to complete an anonymous peer evaluation survey. The survey will be done via Feedback
Fruits on Moodle. You will be asked to rate your group members' participation and effort.
These surveys may also be used to adjust your assignment marks (for example, if one group
member is deemed to only have contributed half as much as everyone else, they will only get
half of the assignment mark). Failure to complete the survey will result in a loss of marks.
Further details regarding Feedback Fruits will be released via Moodle in due course.
It is very simple to copy R output and paste it directly into a Word document for writing
up your assignment answers. To copy the R output, you can use the "Snipping Tool" to capture
screenshots on a Windows computer. Just search for "snipping tool" on Google or YouTube.
Or you can take a screenshot on your Mac; see this website:
https://support.apple.com/en-au/102646
Question 1. ETF5912 students only [4 marks = 0.5+0.5+0.5+0.5+0.5+0.5+1]
The dataset q1.xlsx contains 15393 individuals and includes the following variables:
gender = 1 if male
       = 2 if female
salary = hourly salary in dollars
The data span 1992 to 2008. Use the dplyr library to answer parts (a) and (c), and
provide the required R code and the R output.
The following R code returns all rows (individuals) with the following 3 columns: "year",
"gender" and "salary".
library(dplyr)   # data-manipulation verbs and the %>% pipe
library(readxl)  # read_excel() for reading .xlsx files
mydata <- read_excel("q1.xlsx")
mydata %>%
  select(year, gender, salary)
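As a rough illustration of what select() does, the sketch below applies the same pipe to a small made-up table. The object names toy and toy_selected, the extra "id" column, and all values are hypothetical and are not taken from q1.xlsx.

```r
library(dplyr)

# Hypothetical toy data: the values and the extra "id" column are made up
# for illustration only; they are not taken from q1.xlsx.
toy <- tibble(
  id     = 1:4,
  year   = c(1992, 1992, 1996, 1996),
  gender = c(1, 2, 1, 2),
  salary = c(10.5, 9.0, 12.0, 11.0)
)

# select() keeps every row but only the named columns, in the order given.
toy_selected <- toy %>%
  select(year, gender, salary)

print(toy_selected)
```

The "id" column is dropped while all four rows remain, which is the behaviour described for the assignment code above.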
(a) Group by "gender" and calculate the average (mean) salary for each group.
(b) Fill in the blanks: Based on your answer to part (a), the average salary for males is
$______, and for females is $______.
(c) Group by "gender" and "year", then compute the average salary for each group.
(d) Fill in the blanks: Based on your answer to part (c), the average salary for males in
1992 is $______, and for females in 1992 is $______.
(e) The gender gap in salary refers to the difference in salary between males and females.
Manually calculate (without using R) the gender gap in 1992, defined as the average
salary for males in 1992 minus the average salary for females in 1992.
(f) Manually calculate (without using R) the gender gap in 1996, 2000, 2004, and 2008.
(g) Interpret the findings from part (f).
Question 2. ETF5912 students only [4 marks = 0.5 x 8]
Load the mtcars dataset and use the dplyr library to answer the following questions. Provide
the required R code. You are not required to include the R output for Question 2.
(a) Return all rows (cars) with the following 2 columns: "hp" and "wt".
(b) Return all rows (cars) where the column "am" equals 0.
(c) Fill in the blanks: Based on your answer to part (b), the "mpg" for Merc 240D is
______.
(d) True or false? Justify your answer by including your R code. Referring to part (b), your
boss claims that among cars with automatic transmissions (am == 0), the Merc 240D
has the highest "mpg", indicating that Merc 240D has the highest fuel efficiency.
(e) Add a new column called "power_to_weight", defined as power_to_weight = hp / wt
(i.e. "hp" divided by "wt").
(f) True or false? Justify your answer by including your R code. Referring to part (e), your
boss claims that Merc 240D has the lowest "power_to_weight" ratio, indicating that
Merc 240D accelerates slower because there is less power available for its weight.
(g) Group by "cyl" and then calculate the mean "hp".
(h) Referring to part (g), which group has the highest mean "hp"? Justify your answer by
providing the required R code.
Just for fun: the Merc 240D is an executive car produced by Mercedes-Benz.
Visit the link below to view a photo of one:
https://www.hagerty.com/media/video/the-mercedes-benz-240d-has-always-been-for-outsiders/
Question 3. ETF5912 students only [7 marks]
You are not required to provide the R code for Question 3. This question will be marked
out of 21, and this mark will then be converted to a mark out of 7.
It is widely believed that workers with more education have, on average, higher wages than
workers with less education. The data set comprises a random sample of 399 full-time workers
and is stored in the Excel file q3.xlsx. It includes the following variables for the workers.
wage = hourly wages in dollars
educ = years of education
exper = years of job experience
Consider the following linear regression model:
$$\text{wage}_i = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \varepsilon_i.$$
(a) [1 mark] Use R to run a regression of wage on educ and exper. Provide the R output.
Write down the sample regression line. Report the results to 5 decimal places.
(b) [0.5 marks] Jenny has 12 years of education and 10 years of job experience. What is
Jenny’s predicted hourly wage?
(c) [0.5 marks] John has 13 years of education and 10 years of job experience. What is
John’s predicted hourly wage?
(d) [0.5 marks] Calculate the difference in predicted hourly wage between John and Jenny;
that is, John's predicted hourly wage minus Jenny's predicted hourly wage.
(e) [0.5 marks] In part (a), what is the value of $\hat{\beta}_1$?
(f) [2 marks] Compare your answers in parts (d) and (e). Are they the same? Explain why
or why not.
(g) [2 marks] Interpret the coefficient $\beta_2$.
Consider the following non-linear regression model:
$$\text{wage}_i = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \beta_3 \text{exper}_i^2 + \varepsilon_i.$$
(h) [1 mark] Use R to run a regression of wage on educ, exper and $\text{exper}^2$. Provide the R
output. Write down the sample regression line.
(i) [3 marks] Which model fits the data better, the estimated model in part (a) or part (h)?
Explain.
(j) [2 marks] Using $\alpha = 0.05$ and the p-value approach, test
$H_0: \beta_1 = 0$
$H_A: \beta_1 \neq 0$
(k) [2 marks] Using $\alpha = 0.05$ and the p-value approach, test
$H_0: \beta_2 = \beta_3 = 0$
$H_A:$ at least one $\beta_i$ ($i = 2, 3$) is not equal to zero
Consider the following regression model:
$$\ln(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \varepsilon_i.$$
(l) [3 marks] Use R to run a regression of ln(wage) on educ and exper. Provide the R output.
Interpret the coefficient $\beta_1$.
(m) [3 marks] If exper increases from 20 to 22, can you determine how wage (but not
ln(wage)) is expected to change on average, holding educ fixed? If not, explain why
not; if yes, explain why and how.
Note: 6.123e-02 means $6.123 \times 10^{-2}$, which is just 0.06123. The e-02 means "move the
decimal point 2 places to the left."
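The note above can be checked directly in R. The following is a minimal sketch, with illustrative variable names (x, same_value, plain), that compares the e-notation literal with the plain decimal and prints the value in fixed notation.

```r
x <- 6.123e-02

# The e-notation literal and the plain decimal are the same number.
same_value <- x == 0.06123

# format() with scientific = FALSE returns the value in fixed notation.
plain <- format(x, scientific = FALSE)

print(same_value)
print(plain)
```

The same idea applies when reading coefficient estimates and p-values in R regression output, which R often prints in e-notation.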
Question 4. ETF5912 students only [10 marks]
You are employed as an analyst in a consulting firm in Melbourne. Your consulting firm
has a consulting contract with a major housing construction company. You have been asked
by your manager to write a report that uses statistical techniques that you have learnt in
ETF5912 seminars from Week 1 to Week 9 (inclusive) to characterise the housing market in
Melbourne.
The construction company wants to target its housing building plans. The company is
interested in knowing how housing prices (dependent variable) are affected by the size of
house and the number of bedrooms. In particular, the company wants to know which of
the two variables (size of house or number of bedrooms) is relatively more important in
determining the price of a house, and why. The company also wants to know the
expected change in housing price from building a 25-square-meter addition to a house, and
the difference in price between a house that has an ocean-view and a house that does
not have an ocean-view.
The construction company has given you a data set in an Excel file (etf5912_q4.xlsx) that
contains information about 88 randomly selected houses in Melbourne to undertake this
assignment. The Excel file contains the following variables:
1. price is the house price in thousands of dollars
2. size is the size of house in square-meters
3. bdrms is the number of bedrooms
4. aircon = 1 if a house has central air conditioning
          = 0 if a house does not have central air conditioning
5. ocean = 1 if a house has an ocean-view
         = 0 if a house does not have an ocean-view
Your report should include all empirical results obtained using R. For example, it may
contain data wrangling, graphs, simple and multiple regression models, hypothesis testing,
and interpretation of the empirical results.
The aim of this report is to allow students to undertake statistical analysis by using the
techniques taught in class to investigate a real-world problem. This question is intentionally
open-ended and so there are not necessarily "right or wrong answers". The quality of
your brief report counts. For example, writing "So many people wear heavy coats
during winter because they want to stay warm" would receive more marks than
writing "So many people wear heavy coats during winter because they are fashion-conscious".
Remember you are an analyst writing a report for your boss :) Of most importance is a
correct justification of your empirical results.
Although there is no strict word limit for your report, we strongly suggest that it does
not exceed 900 words, excluding tables, graphs and appendix. Any R code not included in
the main body of the report may be incorporated in an appendix. The appendix will not
be marked and will be used only to check whether the R output included in the report is
reported correctly.