Group Project
COMP9417: Machine Learning & Data Mining
Aims
• Acquire more hands-on experience with ML techniques
• Gain more practical skills in handling ML problems
• Exercise communication skills in motivating, reporting and
summarising work done on an ML task
1COMP9417
Group Formation
Group formation must be completed and registered by 5 pm, Sunday,
12 March, on Moodle.
Project Scope
Each team identifies a predictive problem irrespective of domain or
application and goes through the steps:
• Identifying the problem where predictive modeling is needed.
• Finding, cleaning and preparing relevant datasets.
• Exploratory data analysis, feature engineering, modeling and
validation.
• Story-telling & presentation to articulate the value and
contribution of their work.
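
The modeling and validation step above can be sketched as follows. This is only a minimal illustration: the toy dataset, the function names `train_validation_split` and `majority_baseline`, and the 80/20 split are assumptions made for the example, not course requirements. It shows a held-out validation set and a majority-class baseline that a real model would be expected to beat.

```python
import random
from collections import Counter


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def majority_baseline(train_rows):
    """Return a predictor that always outputs the most common training label."""
    most_common_label = Counter(label for _, label in train_rows).most_common(1)[0][0]
    return lambda features: most_common_label


def accuracy(predict, rows):
    """Fraction of rows whose label is predicted correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Hypothetical toy data: (features, label) pairs for a yes/no problem.
data = [((i, i % 3), "yes" if i % 4 else "no") for i in range(100)]

train, validation = train_validation_split(data)
baseline = majority_baseline(train)
print(f"train={len(train)} validation={len(validation)}")
print(f"baseline validation accuracy: {accuracy(baseline, validation):.2f}")
```

Reporting a trivial baseline alongside the final model makes it easier to argue how much the modeling work actually contributed.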
Project Scope
Teams can pick a problem from any domain or industry - environment,
architecture, law, finance, engineering, arts, medicine, etc. The goal is to
encourage diverse thinking and innovative application of predictive modeling in
solving problems that matter within those domains or industries.
The predictive problem can be from any domain or industry but must be one of
the following six types of predictive problems:
• Yes or No Prediction - i.e., 2-class classification
• Multiple class classification
• How Much - i.e., regression analysis to predict quantities.
• What Next - i.e., recommendation problems for the next best action,
product or offer
• When - i.e., time or date-based forecasting of anticipated temporal
events.
• What’s Odd - i.e., anomaly detection
Project Topic: Topic 1 – Propose your own
The objective of this topic is to propose a machine learning problem, source the
dataset(s) and implement a method to solve it. This will typically come from an
area of work or research in which you have some previous experience and for
which you have access to some data (this could be a public dataset).
• It must involve some practical work with some implementation
of machine learning.
• You must email the course admin (use the class account) a description
of what you are planning (a couple of paragraphs should be enough).
The plan must be approved in an emailed reply before you start.
• It must not involve double-dipping, i.e., be part of a project for another
course. For research postgrads, it must include a statement to the
effect that it is not part of the main work planned for the thesis (although
it can be related).
• If you choose to do topic 1, the deadline to propose a project is the
Friday of week 5 (17th March 2023).
Project Topic: Topic 2 – Challenges and Competitions
Kaggle competitions are hosted here. You may only work on competitions that
are labelled Featured, Research or Analytics. You can select one
from either Active or Completed competitions to work on.
• Carefully assess the time you will need to understand the competition
requirements, get familiar with the data and run the algorithm(s) you
plan to use.
• For live competitions you can include your submission’s placing on the
leaderboard at submission time! Note, however, that your grade will not
be determined solely by your leaderboard ranking. Of course, it will be
great to do well in the competition, but we are mainly grading you based
on your approach and final report.
• You do not need admin approval for this topic. You must include a link to
the competition on the first page of your report. Failure to do so will
result in an immediate 2-mark penalty.
Note: some of the datasets on Kaggle are big. You can sample a
subset of the data for your project, but make sure that how you do this
sampling is detailed in your group’s report.
Please discuss carefully with your group if you want to do this topic, and
also search Kaggle for more options before you make your selection.
Highlight in your report how your approach differs from the available
implementations.
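
The sampling mentioned in the note above could be done reproducibly along the following lines. This is a sketch under stated assumptions: the helper `stratified_sample`, the 10% fraction, the seed value and the toy rows are all hypothetical, and stratifying by class is one reasonable choice rather than a course requirement. Fixing the seed means the exact subset can be regenerated and described in the report.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=42):
    """Sample roughly `fraction` of the rows from each class, reproducibly."""
    rng = random.Random(seed)  # fixed seed so the subset can be regenerated
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    sample = []
    for label in sorted(by_label):  # sorted for a deterministic class order
        group = by_label[label]
        k = max(1, round(len(group) * fraction))  # keep at least one row per class
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical data: 1,000 rows with a 90/10 class imbalance.
rows = [{"id": i, "label": "rare" if i % 10 == 0 else "common"} for i in range(1000)]
subset = stratified_sample(rows, label_of=lambda r: r["label"], fraction=0.1)
print(len(subset), sum(r["label"] == "rare" for r in subset))
```

Sampling within each class keeps the class proportions of the subset close to those of the full dataset, which matters when one class is rare.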
Project Topic: General Considerations
• Do not choose a project that needs a significant amount of data processing,
or that requires you to 'create' a dataset, as we are primarily interested in
machine learning in this course, not data cleaning. Of course, most tasks will
require some preprocessing.
• A larger group is expected to achieve more, and group size will be taken
into consideration when assigning marks for achievement and extra
features.
• Choose a topic that interests you, but be pragmatic when it comes to time
requirements and difficulty of the project.
• Use common sense when choosing competitions/datasets/models. Do not
expect a good grade if you choose a very simple task.
Project Schemes
You can do your project as part of:
• Berrijam Jam competition with prizes
• or no competition.
Berrijam Jams are competitions with cash prizes and awards to encourage the
creative application of machine learning and data science among students and
aspiring data scientists. Berrijam is an AI company that sponsors prizes and
awards for the best projects. However, to do your project as part of Berrijam
Jam, your project needs to comply with their requirements and all team
members must sign the Berrijam Jam Terms and Conditions of the competition,
which can be found on their website (see next slide).
Note: Berrijam competition team members will receive an email to sign the
T&C for the competition.
Berrijam Jam Information
• The Jams webpage contains all information and FAQs, and will be updated
throughout the jam. Some example questions:
– What data can I use?
– What are the data licenses allowed?
• The Terms and Conditions (T&Cs) of the Jam.
• A video from the Berrijam founder discussing the Jam.
• Next steps if you choose to do the Jam:
– Declare your team members and preference for Berrijam on Moodle (we
will make an announcement soon)
– Berrijam will contact you to fill out the T&Cs
– Any Jam specific questions can be asked on the Jams webpage
• The rank in the Berrijam competition does not directly affect your total mark
(e.g., there might be a very good report).
Project Scheme
The project scheme choice needs to be finalized through the Moodle
link (will be announced) by 5 pm, Sunday, 12 March.
Submission
The final group project submission has two parts:
1. Presentation and Code+Data
o Code must be combined into a single tar or zip archive
o Presentation videos need to be in .mp4 format
o Data should be included. If data files are too large, please
host them on OneDrive and provide a link in your submission.
2. Report
o must be a single document in PDF format.
o must include names and zIDs of ALL team members
All submissions will be via the Moodle page.
Note: ONLY ONE person on the team submits both parts of the
assignment.
Marking
Total: 30 marks available
1. Report (22 marks)
2. Presentation and achievement (5 marks)
3. Code (3 marks)
Group Configuration
Each team comprises 3-5 group members, and this group must be
declared on Moodle under Group Project Member Selection by 5 pm,
Sunday, 12 March.
• Teams can consist of students from different tutorials, and groups can
consist of PG and UG students.
• Larger teams are expected to do more (achievement grades will be
affected by this).
• Individual contributions to the project will be assessed through a
peer-review process which will be announced later after the reports are
submitted. This will be used to scale marks based on contribution.
Anyone who does not complete the peer review by 5 pm Thursday, 27
April will be deemed to have not contributed to the assignment. Peer
review is confidential, and group members are not allowed to disclose
their reviews to their peers.
Member Contributions
• Please note that 80% of your mark will be weighted based on your
individual contribution. Individual contributions will be assessed
through a peer-review process.
• We expect all group members to contribute equally to any work
submitted.
• If the group feels that one or more of the students have not
contributed sufficiently, we will take steps to re-distribute the marks
accordingly.
• Some good advice: Keep a record of your contributions throughout
the project. Keep a record of all communications with other group
members (emails/chat), etc. In the event of a group dispute, we will
request evidence from all group members about their contribution.
Deliverables
Presentation
Each team has to submit a 2-minute video presentation of their project:
1. Tell a story about the problem, why it is important to solve it, the
data used, how it was modelled, how it was evaluated and
what was discovered. Explain how the work went beyond the previous
models (if applicable).
2. Submit the PowerPoint, Google Slides or PDF used in the video.
Code
All code files (in Python) that are required to recreate the findings should
be submitted in .zip format. Code is expected to be well-organised
and well-commented. It is suggested to include a readme file with
instructions on how to run the code to replicate the findings.
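
One way to make findings replicable is a single entry point that takes its random seed as an argument, as in the hedged sketch below. The function `run_experiment` is a hypothetical stand-in for real training and evaluation code, and the `--seed` flag is an assumption made for this example, not a required interface.

```python
import argparse
import random


def run_experiment(seed):
    """Stand-in for training and evaluation; returns a seed-dependent score."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    return round(sum(rng.random() for _ in range(10)) / 10, 6)


def main(argv=None):
    """Parse command-line options, run the experiment and print the score."""
    parser = argparse.ArgumentParser(description="Reproduce the reported results.")
    parser.add_argument("--seed", type=int, default=0, help="random seed to use")
    args = parser.parse_args(argv)
    score = run_experiment(args.seed)
    print(f"seed={args.seed} score={score}")
    return score


if __name__ == "__main__":
    main()
```

A readme could then state the exact command used for the reported results, for example `python main.py --seed 0`, where the filename `main.py` is hypothetical.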
Presentation & Code Submission Deadline
The deadline to submit the group presentation video and code is
Friday, 14 April, 5:00 pm.
For the Berrijam competition, two identical submissions are expected
by the same deadline: one on Moodle and one on the Berrijam platform.
Report Structure
Each team must write a detailed report outlining their exploration of the data
and approach to modelling. The report is expected to be 10-12 pages (with a
single column, 1.5 line spacing) and easy to read. The body of the report
should contain the main parts of the presentation, and any supplementary
material should be deferred to the appendix. For example, only include a plot if
it is important to get your message across.
The guidelines for the report are as follows:
1. Title Page: title of the project, name of the group and all group members
(names and zIDs).
2. Introduction: a brief summary of the task, the main issues for the task
and a short description of how you approached these issues.
Report Structure
3. Exploratory Data Analysis: this could be a crucial aspect of this project
and should be done carefully. Some (potential) questions for
consideration: are all features relevant? How can we represent the data
graphically in an informative way? What is the distribution of the
classes? What are the relationships between the features? …
4. Methodology: A detailed explanation and justification of methods
developed, method selection, feature selection, hyper-parameter tuning,
evaluation metrics, design choices, etc. State which method has been
selected for the final test and its hyper-parameters.
5. Results: Include the results achieved by the different models
implemented in your work using a sensible evaluation metric. Be sure
to explain how each model was trained and how you chose your final
model.
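
Two of the exploratory questions in item 3, the distribution of the classes and the relationships between features, can be answered with very little code. The sketch below is illustrative only: the toy `height`, `weight` and `labels` values are invented, and Pearson correlation is just one possible measure of a relationship between two numeric features.

```python
from collections import Counter
from math import sqrt


def class_distribution(labels):
    """Map each class to its share of the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy dataset: two numeric features and a binary label.
height = [150, 160, 170, 180, 190]
weight = [50, 58, 69, 80, 92]
labels = ["no", "no", "yes", "yes", "yes"]

print(class_distribution(labels))
print(round(pearson(height, weight), 3))
```

A strongly skewed class distribution suggests that plain accuracy may be a misleading metric, and highly correlated features are candidates for removal or combination during feature selection.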
Report Structure
6. Discussion: Compare different models, their features and their
performance. What insights have you gained?
7. Conclusion: Give a brief summary of the project and your findings, and
what could be improved on if you had more time.
8. References: a list of all literature that you have used in your project, if any.
You are encouraged to go beyond the scope of the course content for
this project.
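
The hyper-parameter tuning and model comparison described in items 4 to 6 are commonly done with cross-validation, so that the final test data plays no part in model selection. The sketch below assumes a hand-written k-nearest-neighbours classifier, a made-up two-cluster dataset and an arbitrary grid of `k` values. All of these are illustrative choices, and in practice a library such as scikit-learn would normally be used instead.

```python
import random
from collections import Counter


def knn_predict(train, point, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(rows, k_neighbours, n_folds=5):
    """Mean validation accuracy of k-NN over n_folds cross-validation folds."""
    scores = []
    for fold in range(n_folds):
        validation = rows[fold::n_folds]  # every n_folds-th row, offset by fold
        train = [row for i, row in enumerate(rows) if i % n_folds != fold]
        correct = sum(knn_predict(train, x, k_neighbours) == y for x, y in validation)
        scores.append(correct / len(validation))
    return sum(scores) / n_folds


# Hypothetical toy data: two noisy 2-D clusters, shuffled with a fixed seed.
rng = random.Random(0)
rows = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(50)]
rows += [((rng.gauss(3, 1), rng.gauss(3, 1)), "b") for _ in range(50)]
rng.shuffle(rows)

# Candidate hyper-parameter values (an assumption; choose your own grid).
results = {k: cross_val_accuracy(rows, k) for k in (1, 3, 5, 7)}
best_k = max(results, key=results.get)
for k, score in results.items():
    print(f"k={k}: mean CV accuracy = {score:.3f}")
print(f"selected k = {best_k}")
```

A table of per-setting scores like the one printed here is the kind of evidence the Results and Discussion sections can use to justify the final model.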
Deadline
The deadline to submit the group report is on
Monday, 17 April 5:00 pm.
Project Help
Consult the online documentation of Python packages for how to use methods,
metrics and scores. There are many other resources on the Internet
and in the literature related to classification. When using these resources,
please keep in mind the guidance regarding plagiarism in the course
introduction. General questions regarding the group project should be
posted in the Group project forum on the course Moodle page. For any
questions about the Berrijam competition, please first check their page
and the FAQ section, before posting to the forum.
Peer Review
Individual contributions to the project will be assessed through a peer-
review process which will be announced later after the reports are
submitted. This will be used to scale marks based on contribution.
Anyone who does not complete the peer review by 5 pm Thursday of
Week 11 (27 April) will be deemed to have not contributed to the
assignment. Peer review is a confidential process, and group members
are not allowed to disclose their reviews to their peers.