QBUS2820 Assignment 1 (30 marks)
August 23, 2024
1 Background
Developing a predictive model for building heating load is essential in energy efficiency
management. Suppose you work for an energy efficiency consulting firm, and your task is to
optimize the heating system operations of buildings by predicting their daily heating load
requirements.
The variable HeatingLoad in the dataset HeatingLoad_training.csv represents the
daily energy required to maintain comfortable indoor temperatures in buildings. This data
includes several predictors that influence heating load, such as building characteristics,
environmental conditions, and occupancy. The response variable and covariates are detailed
in the table below.
Variable             Description
HeatingLoad          Total daily heating energy required (in kWh)
BuildingAge          Age of the building (in years)
BuildingHeight       Height of the building (in meters)
Insulation           Insulation quality (1 = Good, 0 = Poor)
AverageTemperature   Average daily temperature (in °C)
SunlightExposure     Solar energy received per unit area (in W/m²)
WindSpeed            Wind speed at the building's location (in m/s)
OccupancyRate        Proportion of the building that is occupied (percentage)
Table 1: Description of Variables
Your task is to develop a regression model to predict HeatingLoad based on these
covariates. Additionally, you are provided with the dataset HeatingLoad_test_without_HL.csv,
which is the real test dataset HeatingLoad_test.csv with the HeatingLoad column
removed. The test dataset HeatingLoad_test.csv (not provided) has the same structure as
the training data HeatingLoad_training.csv.
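The relationship between the training file and the file without the response can be checked before any modelling. The following is a minimal sketch, not part of the assignment template: the helper names `read_header` and `columns_match` are hypothetical, and the check assumes the remaining columns keep the same order as in the training file, which the brief does not state explicitly. The column order in the example lists follows Table 1 for illustration only.

```python
import csv

RESPONSE = "HeatingLoad"  # response column named in the brief


def read_header(path):
    """Return the header row (column names) of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def columns_match(train_header, test_header, response=RESPONSE):
    """True if the test header equals the training header minus the response.

    Assumption: column order is preserved between the two files.
    """
    expected = [name for name in train_header if name != response]
    return list(test_header) == expected


# Illustrative headers; the real column order in the CSV files may differ.
train_header = [
    "HeatingLoad", "BuildingAge", "BuildingHeight", "Insulation",
    "AverageTemperature", "SunlightExposure", "WindSpeed", "OccupancyRate",
]
test_header = train_header[1:]
print(columns_match(train_header, test_header))
```

With the actual files, the two headers would come from `read_header(...)` applied to each CSV file in the working folder.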
1.1 Test Error
To measure prediction accuracy, please use mean squared error (MSE) on the test data. Let
$\hat{y}_i$ be the prediction of $y_i$, where $y_i$ is the i-th HeatingLoad in the test data. The test error
is computed as follows:
$$\text{Test error} = \frac{1}{n_{\text{test}}} \sum_{y_i \in \text{test data}} (\hat{y}_i - y_i)^2,$$
where $n_{\text{test}}$ is the number of observations in the test data.
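The test error formula above can be sketched in plain Python as follows. This is an illustration only: the function name `compute_test_error` and the three example values are hypothetical and are not taken from the assignment data.

```python
def compute_test_error(y_true, y_pred):
    """Mean squared error: average of (prediction - actual)^2 over the test data."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n_test = len(y_true)
    if n_test == 0:
        raise ValueError("the test data must contain at least one observation")
    squared_errors = [(pred - actual) ** 2 for actual, pred in zip(y_true, y_pred)]
    return sum(squared_errors) / n_test


# Hypothetical values: squared errors are 1, 0 and 4, so the MSE is 5/3.
y_true = [10.0, 12.0, 9.0]
y_pred = [11.0, 12.0, 7.0]
test_error = compute_test_error(y_true, y_pred)
print(f"{test_error:.4f}")  # prints 1.6667
```

The result is formatted to four decimal places, matching the reporting precision requested for the document file.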
2 Submission Instructions
1. Please submit THREE files (or more if necessary) via the Canvas site:
• A document file named SID_Assignment1_document.pdf, reporting your data
analysis procedure and results. You should replace “SID” with your student ID.
• A Python file named SID_Assignment1_implementation.ipynb that
implements your data analysis procedure and produces the test error. You may submit
additional files if needed, following the format SID_Assignment1_ .
• A CSV file SID_Assignment1_HL_prediction.csv containing the predictions
of HeatingLoad for the dataset HeatingLoad_test_without_HL.csv. This CSV
file should have only one column, named HeatingLoad, which holds the
predicted values.
2. Regarding your document file SID_Assignment1_document.pdf:
• Detail your data analysis procedure: how the Exploratory Data Analysis (EDA)
was conducted, the methods/predictors used, and the reasoning behind them.
The description should be thorough enough for other data scientists in your field
to understand and replicate the task. All numerical results should be reported to
four decimal places.
• Present relevant graphs and tables clearly and appropriately.
• The page limit is 15 pages, including everything: appendices, computer output,
graphs, tables, etc.
3. The Python file must be a Jupyter Notebook and must assume that all necessary data
files (HeatingLoad_training.csv and HeatingLoad_test.csv) are in the same folder
as the notebook.
• The Python file SID_Assignment1_implementation.ipynb must include the
following code in the last code cell:
import pandas as pd
HeatingLoad_test = pd.read_csv("HeatingLoad_test.csv")
# YOUR CODE HERE: code that produces the test error test_error
print(test_error)
The marker expects to see the same test error you would obtain if you were
provided with the complete test data. The file should contain enough explanations
for the marker to run your code.
• Use only the methods covered in the lectures and tutorials. You are free to use any
Python libraries to implement your models as long as they are publicly available.
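The single-column CSV format required for the prediction file can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `write_predictions` and `check_prediction_file` are hypothetical, the prediction values are placeholders rather than model output, and the file is written to a temporary folder for illustration. The standard-library `csv` module is used here; with pandas, `pd.DataFrame({"HeatingLoad": predictions}).to_csv(path, index=False)` produces a comparable file.

```python
import csv
import tempfile
from pathlib import Path


def write_predictions(predictions, path):
    """Write predictions to a CSV with a single column named HeatingLoad."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["HeatingLoad"])  # header: the only column name
        for value in predictions:
            writer.writerow([value])


def check_prediction_file(path, expected_rows):
    """Check the header, the row count, and that every row is one numeric value."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows or rows[0] != ["HeatingLoad"]:
        return False
    body = rows[1:]
    if len(body) != expected_rows:
        return False
    try:
        for row in body:
            if len(row) != 1:
                return False
            float(row[0])  # raises ValueError if the entry is not numeric
    except ValueError:
        return False
    return True


# Placeholder predictions; in practice these come from the fitted model,
# one per row of HeatingLoad_test_without_HL.csv.
placeholder_predictions = [21.5, 18.25, 30.0]
with tempfile.TemporaryDirectory() as tmp:
    # Replace "SID" with the student ID, as the instructions require.
    out_path = Path(tmp) / "SID_Assignment1_HL_prediction.csv"
    write_predictions(placeholder_predictions, out_path)
    print(check_prediction_file(out_path, expected_rows=3))  # prints True
```

Checking the row count against the number of rows in the file without the response is a cheap guard against a misaligned or truncated submission.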
3 Marking Criteria
This assignment is worth 30 marks in total, with 18 marks allocated to the content of
SID_Assignment1_document.pdf and 12 marks to the Python implementation. The
marking breakdown is as follows:
1. Prediction accuracy: Your test error will be compared against the smallest test error
among all submissions, including the teaching team's.
The marker first runs SID_Assignment1_implementation.ipynb.
• If the file runs smoothly and produces a test error, up to 12 marks will be awarded
based on prediction accuracy relative to the smallest MSE and the appropriateness
of your implementation.
• If the marker cannot run SID_Assignment1_implementation.ipynb or if no
test error is produced, partial marks (maximum 4) may be awarded based on the
appropriateness of the file.
2. Report described in SID_Assignment1_document.pdf: Up to 18 marks are
allocated based on:
• The appropriateness of the chosen prediction method.
• The detail, discussion, and explanation of your data analysis procedure.
See the Marking Criteria for more details.
3. CSV File Submission: Up to 2 marks will be deducted if you fail to upload the CSV
file in the correct format.
4 Errors
If you believe there are errors in this assignment, please contact the teaching team.