ACS61013 - Python assignment writing
Date: 2022-11-22
Module title: ACS61013
Assignment name: Coursework 1
Person responsible and contact details: Dr John Oyekan
Assignment weighting: 60%
Assignment released: 18th of Nov 2022
Assignment hand in: 9th of Dec 2022
Assignment due date: Hand in by 11pm on the 9th of December. This coursework makes
up 60% of your total module mark. Submit your report as a PDF file on Blackboard. Also
include your Orange (.ows), Python and MATLAB files as part of your submission.
Unfair Means: The assignment should be completed individually. You should not discuss
the assignment with other students and should not work together in completing the
assignment. The assignment must be wholly your own work. Any suspicions of the use of
unfair means will be investigated and may lead to penalties. See
http://www.shef.ac.uk/ssid/exams/plagiarism for more information.
Penalties for Late Submission: Late submissions will incur the usual penalties of a 5%
reduction in the mark for every working day (or part thereof) that the assignment is late and
a mark of zero for submissions more than 5 working days late.
Extenuating Circumstances: If you have any extenuating circumstances (medical or special
circumstances) that might have affected your performance on the assignment, please follow
the guidance at https://www.sheffield.ac.uk/ssid/forms/circs
Help: This assignment briefing and the lecture notes provide all the information that is
required to complete this assignment. It is not expected that you should need to ask further
questions. However, if you need clarifications on the assignment then please discuss the
issue with me after a lab, or email me at j.oyekan@sheffield.ac.uk.
Specific assignment information and instructions
The challenge: You have been provided with a dataset made up of the energy usage of a
domestic dwelling as well as the weather conditions in the area. The dataset contains the
time at which the data was collected (in Unix epoch timestamp format; see this link), the
energy usage of the various appliances and rooms in the dwelling, the amount of energy
generated by an installed solar panel, and various weather conditions such as cloud cover,
wind speed and precipitation. In total, there are 30 features in the dataset, most of which
are self-explanatory. In addition to your domain analysis, study the Appendix as well as the
columns in the dataset to understand the meaning of the features.
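As an illustration of the timestamp format mentioned above, the following is a minimal sketch of converting Unix epoch seconds into calendar features with pandas. The column name `time` and the three sample values are assumptions made for the example; they are not taken from the coursework dataset, and the dataset's timestamps may need a different unit or time zone.

```python
import pandas as pd

# Hypothetical sample: three Unix epoch timestamps in seconds
# (invented for illustration, not read from the coursework dataset).
df = pd.DataFrame({"time": [1451624400, 1451624460, 1451624520]})

# Interpret the integers as seconds elapsed since 1970-01-01 00:00:00 UTC.
df["datetime"] = pd.to_datetime(df["time"], unit="s", utc=True)

# Derive calendar features, which are usually easier for a model to use
# than a raw, ever-increasing counter.
df["hour"] = df["datetime"].dt.hour
df["day_of_week"] = df["datetime"].dt.dayofweek  # Monday = 0
df["month"] = df["datetime"].dt.month

print(df)
```

Features such as hour of day or day of week may or may not be useful here; that is a judgement to justify from the domain analysis.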
Tools to use: The majority of the MATLAB code you need to complete the assignment is
available from the various lab sessions. If you are comfortable using Python, you are free to
use it. You are also free to use Orange for various aspects of the coursework as required.
Tasks and Mark Scheme: The aim of this coursework is to design, implement and evaluate
effective machine-learning pipelines for various tasks. The specific tasks and the
corresponding marking schemes are given in the table below. It is up to you to decide how
you approach the various tasks, design a solution and write up your results. For each task,
the mark within the grade boundary will be based on the discussion in your report and the
results obtained.
Task/Assessment Description | Mark Range | Level of achievement
Task 1: Conduct and write a domain analysis that discusses the important weather features
that affect the energy usage of a house, as well as the weather features that could affect
the energy generated by the solar panel attached to the house.
Discuss how the findings of your domain analysis will support, and be carried over to,
other parts of your work.
Mark range: 0-10% | Level of achievement: 1
Task 2: Achieve level 1 as well as conduct data cleaning, pre-processing and feature
engineering.
Discuss how you used your understanding of the domain from level 1 to support this task.
This should also involve discussion of which features to drop and which relevant features
to keep. Support your explanation by applying dimension reduction techniques (e.g. PCA or
Hierarchical Clustering Analysis).
Mark range: 10-20% | Level of achievement: 2
Task 3: Achieve all the previous levels as well as build a regression model (decide which
hypothesis function is best to use, e.g. polynomial or linear) or a neural network model to
predict the value of energy usage from at least two weather features you deemed important
to keep in Task 2.
Mark range: 20-30% | Level of achievement: 3
Task 4: Achieve all the previous levels as well as use learning curves to discuss how
effective your regression machine-learning pipeline is at preventing overfitting and
underfitting.
Mark range: 30-45% | Level of achievement: 4
Task 5: Achieve all the previous levels plus discuss which cross-validation technique you
applied in Task 4 above and why.
Mark range: 45-50% | Level of achievement: 5
Task 6: Using the features you consider most important to this challenge, apply a
classification machine-learning methodology (e.g. Decision Trees or Neural Network) to build
a model that predicts when the energy usage of a house will be LOW, MEDIUM or HIGH.
Use the classification metrics of confusion matrix, accuracy, precision and recall to
explain your results.
Can your model explain or highlight which appliance is used most when the energy usage is
LOW, MEDIUM or HIGH?
Mark range: 50-65% | Level of achievement: 6
Task 7: From Task 6, compare the results of the Decision Tree methodology with those of the
Neural Network methodology, using the classification metrics of confusion matrix, accuracy,
precision and recall to explain your results.
Demonstrate and explain how model complexity (for both the Decision Tree and the Neural
Network) affects the results you obtain.
Mark range: 65-80% | Level of achievement: 7
Task 8: Achieve all the previous levels and the below:
- Using the dataset given, compare the results of the machine learning algorithms above
  with the results of two other algorithms that we have not covered in class.
- Discuss the mathematical peculiarities of the algorithms you have chosen (strengths and
  weaknesses) and how they impact the results you obtain.
- Apply the appropriate metrics to compare the algorithms you have chosen with the ones we
  have used in class.
Mark range: 80-100% | Level of achievement: 8
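To make the classification metrics named in Tasks 6 and 7 concrete, the sketch below computes a confusion matrix, accuracy, precision and recall for a three-class decision tree at several depths, using scikit-learn. Everything about the data is an assumption: the two weather features, the invented relationship to usage, and the choice to define LOW, MEDIUM and HIGH as terciles of usage are placeholders, not the coursework dataset or a prescribed method.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the coursework dataset): two hypothetical
# weather features and an invented relationship to energy usage.
rng = np.random.default_rng(0)
n = 600
temperature = rng.uniform(-5, 30, n)
cloud_cover = rng.uniform(0, 1, n)
usage_kw = 2.0 - 0.04 * temperature + 0.5 * cloud_cover + rng.normal(0, 0.2, n)
X = np.column_stack([temperature, cloud_cover])

# Assumption: LOW / MEDIUM / HIGH are defined as terciles of usage.
labels = ["LOW", "MEDIUM", "HIGH"]
y = pd.qcut(pd.Series(usage_kw), q=3, labels=labels).astype(str).to_numpy()

# Stratified 5-fold cross-validation: each sample is predicted by a model
# that was fitted without it, so the metrics are out-of-sample.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for depth in (1, 4, None):  # None lets the tree grow until leaves are pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    y_pred = cross_val_predict(model, X, y, cv=cv)
    results[depth] = {
        # Rows are true classes, columns are predicted classes.
        "confusion": confusion_matrix(y, y_pred, labels=labels),
        "accuracy": accuracy_score(y, y_pred),
        # average=None returns one value per class, in the order of `labels`.
        "precision": precision_score(y, y_pred, labels=labels,
                                     average=None, zero_division=0),
        "recall": recall_score(y, y_pred, labels=labels,
                               average=None, zero_division=0),
    }

for depth, r in results.items():
    print(f"max_depth={depth}: accuracy={r['accuracy']:.3f}")
    print(r["confusion"])
    print("precision per class:", np.round(r["precision"], 3))
    print("recall per class:   ", np.round(r["recall"], 3))
```

Varying `max_depth` is one simple way to look at model complexity: a depth-1 tree has only two leaves, so each fitted tree can predict at most two of the three classes, while an unrestricted tree can memorise its training folds. The same metric calls work unchanged on the predictions of any other classifier, which keeps comparisons between algorithms like for like.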
Technical Report and code
Write up your results in a technical report of no more than 15 pages. Make sure your report
has a table of contents, sections, discussion and conclusions.
You must create MATLAB (or Python) code and an Orange pipeline design for your
solution(s). Support your report with an Orange pipeline design and MATLAB code. Make
sure you provide comments in your MATLAB code as well as instructions on how to run it.
Hand in your report (.pdf) and software (Orange and MATLAB (or Python)) via Blackboard by
11pm on the 9th of December 2022. This coursework makes up 60% of your total module
mark.
Appendix
Most of the features in the dataset are self-explanatory. However, I have highlighted a few
below:
Feature | Description
Time | The time is provided in Unix epoch timestamp format. See this link.
Use [kW] | This is the energy usage of the house. This is similar to the house overall [kW] column.
Gen [kW] | This is the energy generated by the solar panel attached to the building. This feature is similar to the Solar [kW] column.
Weather Icon | These are weather icons used to indicate weather conditions.
PrecipIntensity | Precipitation intensity
PrecipProbability | Precipitation probability
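The table above notes that some columns are similar to one another. One possible way to check for such overlap is to flag pairs of columns whose absolute correlation is very close to 1. The sketch below does this on synthetic data. The column spellings are assumed from the table and may not match the dataset exactly, the duplicated columns are exact copies created for the example, and the 0.99 threshold is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the coursework dataset). Two columns are
# exact copies of others to mimic the overlap described in the Appendix.
rng = np.random.default_rng(1)
n = 200
use = rng.uniform(0.2, 3.0, n)
gen = rng.uniform(0.0, 0.6, n)
df = pd.DataFrame({
    "Use [kW]": use,
    "House overall [kW]": use,   # assumed duplicate of "Use [kW]"
    "Gen [kW]": gen,
    "Solar [kW]": gen,           # assumed duplicate of "Gen [kW]"
    "PrecipIntensity": rng.uniform(0.0, 0.1, n),
})

# Absolute pairwise Pearson correlations between all columns.
corr = df.corr().abs()

# Keep only the strict upper triangle so each pair is inspected once and
# the first column of a correlated pair is the one that is retained.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.99  # arbitrary cut-off chosen for this illustration
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = df.drop(columns=to_drop)

print("Dropped:", to_drop)
print("Kept:   ", list(reduced.columns))
```

A high correlation only suggests redundancy. Whether a column should actually be dropped is a decision to justify from the domain analysis.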