Cardiff School of Computer Science and Informatics
Coursework Assessment Pro-forma
Module Code: CMT307
Module Title: Applied Machine Learning
Lecturer: Yuhua Li, Yukun Lai
Assessment Title: Coursework 2 Machine learning project
Assessment Number: 2
Date Set: 19 March 2021
Submission Date and Time: 14 May 2021 at 9:30am
Return Date: 14 June 2021
This assignment is worth 50% of the total marks available for this module. If coursework is
submitted late (and where there are no extenuating circumstances):
1. If the assessment is submitted no later than 24 hours after the deadline,
the mark for the assessment will be capped at the minimum pass mark;
2. If the assessment is submitted more than 24 hours after the deadline, a
mark of 0 will be given for the assessment.
This will apply to any of the three parts to be submitted as part of this assignment.
Your individual submission must include the official Coursework Submission Cover sheet,
which can be found here:
https://docs.cs.cf.ac.uk/downloads/coursework/Coversheet.pdf
Submission Instructions
This coursework submission consists of a group submission (note: group submission refers
to the submission of the three files for Part 1 and Part 2 hereafter) and an individual
submission. The group submission will be submitted in Learning Central by a nominated
team member, and the individual submission will be submitted in Learning Central by
individuals.
The group submission (from Part 1 and Part 2 of this assignment) consists of three files:
1. A single PDF file for your group report (up to 4500 words) on a specific machine
learning project.
2. A zip file containing all source code of your group project.
3. A single PDF file for the slides of your group presentation, which should include the
link to the video of the recorded group presentation on the first slide.
All group members must have seen and agreed to the final version of the submission.
The individual submission (from Part 3 of this assignment) consists of a single PDF file for
the self-reflection and peer assessment proforma.
The files to be submitted are listed below (description, type and file name):
Part 1 (group submission)
- Compulsory: One PDF (.pdf) file for group report (up to 4500 words). Name: groupreport_[group number].pdf
- Compulsory: One ZIP (.zip) file containing the Python code. Name: Groupcode_[group_number].zip
Part 2 (group submission)
- Compulsory: One PDF (.pdf) file for the presentation slides which also contains the link to the video of your group presentation. Name: groupslides_[group_number].pdf
Part 3 (individual submission)
- Compulsory: One PDF (.pdf) file for the individual peer assessment proforma. Name: peerassessment_[student number].pdf
- Compulsory: One PDF (.pdf) file for Cover sheet (to be individually submitted with peer assessment proforma). Name: [student number].pdf
Note: This coursework consists of three parts: Part 1 and Part 2 are for the group report and
presentation. Part 3 is for individual work.
Part 1: Group report and project code. The deliverable includes a zip file with the code, and
a single PDF file for the written summary (up to 4500 words) describing solutions, design
choices, evaluation and a reflection on the main challenges faced during development and
insights gained throughout the process. Prior to handing in, make sure all documentation has
been collected. Additional supporting material, such as sources or data, may also be submitted
if appropriate along with the code zip file. Any code submitted will be run in Python 3 and
must be submitted as stipulated in the instructions. Make sure the title page of your report
clearly states your group number, the project title, the name of your supervisor and the
student ID numbers of all members of the group.
Part 2: Group presentation. The slides for the presentation should be submitted as a single
PDF document in Learning Central (group assignment) by the same nominated team member
as in Part 1. The link to the video of the group presentation should be given on the first slide
of the presentation.
Part 3: Peer assessment. Part 3 consists of a peer assessment proforma where students
reflect on individual contributions and assign marks to the other members of their group. Each
individual will submit a cover sheet together with the peer assessment proforma in Learning
Central by the deadline.
Any deviation from the submission instructions above (including the number and types of
files submitted) will result in a reduction of 20% of the mark.
Staff reserve the right to invite students to a meeting to discuss coursework submissions.
Assignment
In this coursework, students demonstrate their familiarity with the topics covered in the
module via a group project. This coursework consists of three parts: Part 1 and Part 2 are for
group report and presentation. Part 3 is for individual work.
Marks will be awarded to the individual student based on the quality of the group report, the
presentation and their contribution.
Part 1: Group report (70%)
In Part 1, students will be allocated to groups to design a machine learning project on one
specific topic. The list of all topics along with their descriptions is available in Appendix A.
Each group is given a specific dataset and a supervisor. The task of each group consists of
developing a whole machine learning pipeline that attempts to solve the task. The usage of
neural networks as methods/baselines is not mandatory but will be positively assessed; the
non-usage of neural methods should be properly justified.
Throughout the course the groups will have several milestones and should present their
progress to their supervisor in each session. Finally, the group will write a report summarizing
the steps followed and the main insights gained as part of the process.
As part of the group decisions, each student will be allocated to one of the following tasks:
- Descriptive analysis of the dataset + Error analysis
- Preprocessing + Literature review
- Implementation + Results
Each of these tasks will have a minimum of two students involved (except in exceptional cases
when this is not possible), who will work together on the specific task and as part of the group.
The structure of the report will be decided by the group members. In Appendix B, students
can find some guidelines to write the report, including some of the common sections that
groups may want to include in their report.
Note: These are just guidelines and students are not forced to follow this structure. New
sections may be added or adjusted if necessary.
Each student will also be involved in all group activities/tasks and will share responsibility for the
smooth functioning and coordination of the team.
Deliverables
The deliverables for this part include a report of no more than 4500 words and a zip file with
all the Python code and a README file. The group report must have the first page from
Appendix C. The code and README should contain three specific parts:
(1) Code to get the statistics used to complement the descriptive analysis of the dataset.
(2) Code to train one of the best performing models on the training set and evaluate it on
the test set. This code should also include all steps for preprocessing the original
dataset, if necessary.
(3) A README file explaining how to run the code for each of the two parts above.
The code will not be marked separately and will only be used as a complement to assess
specific parts of the report.
Assessment
The final mark for this part (70% of the total marks) will result from the following items:
- Descriptive analysis of the dataset + result analysis (17%)
- Preprocessing + Literature review (18%)
- Implementation + Results (18%)
- Group report as a whole, including its coherence and structure (17%)
Note: Normally every member of the group will receive the same mark for the group report
and presentation. In some cases, marks might be weighted by the individual contribution to
the project. This would be based on peer assessment, for which instructions are given in
Part 3.
Credit will be awarded against the following criteria.
Each criterion is graded in four bands: Fail (0-49%), Pass (50-59%), Merit (60-69%) and Distinction (>=70%).
Descriptive analysis of the dataset + result analysis (17%)
- Fail (0-49%): No or arbitrary data exploration. No or little meaningful result analysis and discussion.
- Pass (50-59%): Suitable but limited data exploration. General result analysis and discussion.
- Merit (60-69%): Good data exploration but misses some insightful analysis. Good result analysis and discussion but lacking depth.
- Distinction (>=70%): Thorough and insightful data exploration. Insightful result analysis and discussion.
Preprocessing + Literature review (18%)
- Fail (0-49%): No or very little data pre-processing and literature review.
- Pass (50-59%): Some necessary pre-processing and basic literature review are conducted.
- Merit (60-69%): Adequate pre-processing to prepare the data for model development. Adequate literature review.
- Distinction (>=70%): Extensive pre-processing to deal with all aspects of non-ideal characteristics of the data with an aim to achieve a best classification performance. Extensive and insightful literature review.
Implementation + Results (18%)
- Fail (0-49%): Unsuitable ML method is chosen. The models are not correctly implemented and optimised. Little/improper performance evaluation.
- Pass (50-59%): The models are implemented but not properly/sufficiently trained. Performance evaluation using metrics without considering data characteristics.
- Merit (60-69%): The implemented models are properly trained and optimised. Good performance evaluation with suitable metrics.
- Distinction (>=70%): All models are excellently implemented, properly trained and systematically optimised. Comprehensive model evaluation for best results.
Group report as a whole, including its coherence and structure (17%)
- Fail (0-49%): The report is poorly presented. No or little meaningful discussion.
- Pass (50-59%): The report is acceptable in terms of technical contents and structure. General discussion and vague conclusion.
- Merit (60-69%): The report is well written. Good discussion but lacking depth.
- Distinction (>=70%): The report is professionally and cohesively presented. Insightful discussion and clear conclusions.
Part 2: Group presentations (10%)
In Part 2, students are asked to present their projects as a group. The presentation should be
recorded and the video should be stored in the cloud, e.g., Office 365 OneDrive or Google Drive.
Specific guidelines for the presentations will be available in Appendix D.
The presentation weighs 10% of the total marks.
The main assessment criteria for the presentation will be based on the communication skills
of the presenters as a whole.
Part 3: Self-reflection and peer assessment (20%)
In Part 3, students are asked to do a self-reflection and peer assessment using the proforma
in Appendix E. In the proforma, you must discuss your and each member’s contribution to the
group project and to the overall group report and presentation. You must show that you
contributed to the group report and presentation. Discuss what tasks you have performed
and provide evidence of your work (you may refer to the group report for the actual
work/results). Discuss how you approached these tasks and how you interacted with other
members, both in sharing your results and in organising the team's activities. Consider how
well your existing skills were utilised and what new skills you have learned. Then reflect on
your overall performance and role in the team and suggest what went well and what changes
you will be making to improve (1) your performance in particular, and (2) the performance
and results of methods and analyses performed as part of the project. You may also reflect
on how your perspective and approach changed over time and adapted to improve your work.
Note: Please indicate the information about your group (group number, project name) in the
proforma.
This part weighs 20% of the total marks.
Contribution of group members
This is a team project and this assignment is assessed as a team, apart from the individual
mark for the supporting evidence for each team member’s service. Each team member is
expected to contribute to the project for the tasks that are agreed in the group.
You will also be asked to submit a peer assessment form. You will evaluate the contribution
of each group member and your own contribution to the deliverables. Normally every
member of the group will receive the same mark for the group components except in the case
where a group member’s contribution and/or quality of work falls significantly below that of
the rest of the group, in which case the marks will be adjusted accordingly.
Please inform the module leader if there are any problems with any group members not
engaging in group tasks or missing group meetings. The teaching team will check on the
engagement of the group members in the contact session and review meetings. Students
should therefore inform the module leader (and the other team members, if appropriate) if
circumstances arise that are likely to affect their engagement with their work and/or
attendance at weekly meetings with the rest of the team.
The teaching team will provide formative feedback during contact sessions. Your team should
also meet regularly outside of these.
Learning Outcomes Assessed
1. Implement and evaluate machine learning methods to solve a given task.
2. Explain the basic principles underlying common machine learning methods.
3. Choose an appropriate machine learning method and data pre-processing strategy to
address the needs of a given application setting.
4. Reflect on the importance of data representation for the success of machine learning
methods.
5. Critically appraise the ethical implications and societal risks associated with the
deployment of machine learning methods.
6. Explain the nature, strengths and limitations of an implemented machine learning
technique to an audience of non-specialists.
Criteria for assessment
Criteria for each individual part are provided separately in the previous sections. The final
mark will be obtained from a weighted sum of the three parts: Part 1 - 70%; Part 2 - 10%;
Part 3 - 20%.
The grade range is divided into:
Distinction (70-100%)
Merit (60-69%)
Pass (50-59%)
Fail (0-49%)
Feedback
Feedback on your coursework will address the given criteria. Feedback and marks will be
returned by 7 June via Learning Central. There will be opportunity for individual and group
feedback during an agreed time.
Appendix A: CMT307 Group Projects
Note 1: Datasets are provided for all projects
Note 2: Not all datasets contain train/dev/test splits. It is up to the group members to
decide a suitable split in those cases (or cross-validation).
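As an illustration of Note 2, the following is a minimal sketch of the two options it mentions: a random train/dev/test split and k-fold cross-validation. It uses only the Python standard library. The function names, the default fractions and the seed are assumptions chosen for this example and are not prescribed by the coursework; groups may equally use a library such as scikit-learn, and may need a stratified or time-ordered split depending on their dataset.

```python
import random


def train_dev_test_split(items, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of `items` and cut it into train/dev/test lists.

    The fractions and the seed are illustrative defaults (assumptions).
    """
    if dev_frac < 0 or test_frac < 0 or dev_frac + test_frac >= 1:
        raise ValueError("dev_frac and test_frac must be >= 0 and sum to < 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_dev = round(len(shuffled) * dev_frac)
    n_test = round(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for held_out in range(k):
        validation = folds[held_out]
        train = [idx for f, fold in enumerate(folds) if f != held_out
                 for idx in fold]
        yield train, validation


if __name__ == "__main__":
    train, dev, test = train_dev_test_split(range(100), 0.1, 0.2)
    print(len(train), len(dev), len(test))
    for fold_no, (tr, va) in enumerate(k_fold_indices(10, k=5)):
        print(fold_no, sorted(va))
```

Whichever option is chosen, the test portion should be held back until the final evaluation, with model selection done on the dev set or the cross-validation folds.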
1. Hyperpartisan news detection
The task consists of detecting polarized pieces of news. Dataset from the SemEval 2019 task
on hyperpartisan news detection (https://pan.webis.de/semeval19/semeval19-web/).
Direct link to download the dataset:
https://drive.google.com/file/d/1tD1bCYmF5G3PlsEmh-jPsJLrrhBalS9f/view?usp=sharing
2. Emoji prediction
The goal of this task is to predict an emoji (e.g. ) given a tweet. Dataset based on the
SemEval 2018 task on emoji prediction
(https://competitions.codalab.org/competitions/17344).
Direct link to download the dataset: emoji_prediction.zip
3. Word sense disambiguation
Given a word in context, the task of word sense disambiguation consists of finding its most
suitable meaning from a pre-defined set of possible meanings (e.g. mouse can be a
computer device or an animal). Each dataset contains ten different ambiguous words and
sentences where they occur.
- Group G24: https://drive.google.com/file/d/1zPcYrtN6cmRupitLr--jR24m8THEWikD/view?usp=sharing
- Group G25: https://drive.google.com/file/d/1vUqRkMDMreXSclixOYs5MNBB3e5Dqjz-/view?usp=sharing
4. Definition extraction
The task of definition extraction consists of finding definitions (e.g. “a computer is an
electronic device for storing and processing data”) from an unlabeled corpus of text.
Specifically, this project treats the problem as a binary classification problem where given a
sentence, the task is to decide whether such a sentence is a definition or not.
Link to download the dataset:
https://drive.google.com/file/d/1eFQsAdLayy5jM_CYWPkr4nsGhhcYcTWe/view?usp=sharing
5. Text categorization
Text categorization (also referred to as text classification) consists of associating a document
with a given topic (e.g. sports, politics, etc.). The dataset is 20 Newsgroups
(http://qwone.com/~jason/20Newsgroups/).
Direct link to download the dataset (bydate version):
http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz
6. Hate speech detection
Given a tweet or a comment, the task of hate speech detection consists of predicting whether
the given text represents hate speech or not, and classifying it accordingly. This task is based
on the SemEval 2019 task on detection of hate speech against immigrants and women in
Twitter (https://competitions.codalab.org/competitions/19935).
Direct link to download the dataset:
https://drive.google.com/file/d/1Cn60H0klYNRNI_q5SeghzFlu6eMYmMOO/view?usp=sharing
7. Lie detection
The goal of this task is to detect deceptive language, or false statements, in a dataset of
reviews. Available datasets and links:
- Hotel: https://drive.google.com/file/d/19ZkFhP8Mw1vbFnJAwibiBLj3YbKXxuEx/view?usp=sharing
- Doctor/restaurant and hotel: https://drive.google.com/file/d/1XYa3d5ebsp7TWCQFfFKjpmXW57JKUR9f/view?usp=sharing
8. Opinion vs. factual news stories
When building models to analyse news data at scale, it is important to distinguish between
articles where the author is reporting "facts" (e.g. that something happened, or that a
person said something) and those where the author is reporting their own opinion (e.g. that
they think something will happen or that something happening is a good thing). Using a
dataset of financial news articles, the task consists of building a classifier which labels an
article as being fact or opinion. (Note: this dataset may contain noisy labels as they were
obtained semi-automatically).
Direct link to download the dataset (dataset provided by AYLIEN):
https://drive.google.com/file/d/1Mqoh7gG-g3Sc3Zh6zEO8o2sKBpdSeFiJ/view?usp=sharing
9. Urban Sound Classification
Dataset from here: https://urbansounddataset.weebly.com/urbansound8k.html . The goal
of this task is to classify different urban sounds (e.g. dog bark or street horn) into their
correct classes.
Direct link to download the dataset:
https://drive.google.com/file/d/15fojQ3xKcPMLwIm6s8hoyM0xJjZ6SOZU/view?usp=sharing
Note: the size of this dataset is over 6GB (a portion of it may be used if hard to process).
Therefore an important part of this project is the preprocessing and handling of the data.
The following repository includes a machine learning model that can be used as a
guide/reference at the beginning: https://github.com/AmritK10/Urban-Sound-Classification
10. Fine-grained image classification
The goal of this task is to develop an algorithm to learn to classify images containing objects
of the same category (e.g. birds, dogs) into specific sub-categories, i.e. specific species.
Datasets are available:
Caltech-UCSD Birds-200-2011: 200 categories of birds
http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
Stanford dogs: 120 categories of dogs
http://vision.stanford.edu/aditya86/ImageNetDogs/
11. Image emotion recognition
Images often trigger different emotions in the viewer. The goal of this task is to develop an
automatic algorithm to recognise the emotion that a specific image exhibits.
Dataset available at:
https://cf-my.sharepoint.com/:f:/g/personal/laiy4_cardiff_ac_uk/ElhtEyhDRa1GqeD_Y43ZSL8BcBRiKZ8N3kNgdZa79ZNaaw?e=xlvJPE
Some explanation for the dataset: https://arxiv.org/pdf/1605.02677.pdf
12. Detection and recognition of traffic signs
Detection and recognition of traffic signs is an important task for autonomous driving. The
task is to identify traffic signs in images and recognise them.
Dataset available at
https://sid.erda.dk/public/archives/daaeac0d7ce1152aea9b61d9f1e19370/published-archive.html
(German Traffic Sign Recognition Benchmark)
13. Object localisation
It is straightforward for people to locate objects in an image, but can you develop a system
to learn to do this? Given an image, the task is to find all the instances of relevant objects
(as bounding boxes).
Dataset available at http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
It contains multiple datasets, and you have the flexibility to choose the ones that satisfy
your project needs. So the following are just suggestions, and you may come up with your
own approach, as long as it is reasonable and well justified.
For object detection/localisation, the most relevant dataset is the one used for
Detection: Predicting the bounding box and label of each object from the twenty
target classes in the test image.
You can see example images for this task at
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/examples/index.html
14. Energy usage prediction
Prediction of building energy consumption is important to electricity distribution,
management and the environment. This project will develop a machine learning model to
predict energy usage based on historic usage rates and observed weather. Dataset available
at https://www.kaggle.com/c/ashrae-energy-prediction/overview.
15. Stock price prediction
Machine learning can be useful for algorithmic trading. It can help to develop a profitable
trading strategy by predicting a stock price in the future based on historical daily OHLC price
data. Data can be readily downloaded from Yahoo Finance using the Python package yfinance:
https://pypi.org/project/yfinance/
-----------------------------------
For groups working on image-related tasks (Projects 10-13):
Some tutorials for basic image processing with Keras/Tensorflow:
https://stackabuse.com/image-recognition-in-python-with-tensorflow-and-keras/
https://developer.ibm.com/technologies/artificial-intelligence/articles/image-recognition-challenge-with-tensorflow-and-keras-pt1
https://www.tensorflow.org/tutorials/keras/classification
Appendix B: Group report guidelines
In this document we detail some of the common sections involved in a machine learning
project report. These sections are only presented as a guideline, but the report may have a
different structure both in terms of sections and order.
1. Introduction
Summary of the task and main goals/contributions/insights of the project.
2. Description of the task/dataset
Description of the task and dataset, including relevant statistics of dataset splits.
3. Methodology
Description of the machine learning methods used in the project.
4. Experimental setting
Description of the specific details of the evaluation (e.g. parameter tuning, usage of the
development set).
5. Results
Final results of the experiments, including baselines and table/s with
precision/recall/accuracy/f1, etc.
6. Analysis
Analysis of the results, error analysis (investigate the types of errors the system makes, etc.).
7. Literature review / Related work
Overview of the related work most connected to the methods and tasks of the projects.
Explain the differences and the connection between works in the literature with respect to
the employed method (e.g. advantages/disadvantages, ideas you exploited, etc.).
Tip: Google Scholar is an excellent resource to find relevant articles to any of the topics.
8. Conclusion and future work
Summary of the main conclusions and takeaways from the experiments. Explain ways to
investigate or improve the method in the future.
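The metrics named under Results above (precision, recall, accuracy and F1) can be computed directly from the confusion counts. The following is a minimal sketch for a binary classification task, using only the Python standard library. The function name `binary_metrics`, the choice of `1` as the positive label and the convention of returning 0.0 when a denominator is zero are assumptions made for this example; libraries such as scikit-learn provide equivalent functions, and multi-class tasks would additionally need per-class or averaged scores.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    `positive` marks the class treated as the positive one (assumption:
    label 1). Undefined ratios (zero denominator) are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 0, 1, 0, 1, 0, 0, 1]
    for name, value in binary_metrics(gold, predicted).items():
        print(f"{name}: {value:.2f}")
```

On imbalanced datasets accuracy alone can be misleading, which is one reason to report precision, recall and F1 alongside it.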
Appendix C: Group report front page
You will include the following information on the first page of your group report.
CMT307 Coursework 2 Group Project
Group number
Project title
Supervisor
Appendix D: Presentation Guide
In this document we include guidelines for the presentation of the projects.
Audience
The goal of the presentation is to explain the project and what the group has done to an
audience of imagined students and module instructors. Presenters can assume that the
audience is knowledgeable about machine learning but does not necessarily know about the
specific topic. Therefore, a clear introduction to the task and challenges is required.
Number of presenters
A minimum of three presenters, at least one for each of the allocated tasks. The presenters
can be decided among group members. However, we would expect all group members to
contribute to the presentation (slides, help rehearsals, etc.). Given the situation, the pre-
recorded video could consist of a concatenation of several pieces recorded individually
(individual pieces could consist of slides and audio, for example) - group members do not
need to meet physically.
The presentation video should be stored in a cloud service such as Office 365 OneDrive or Google
Drive and be made accessible to all module instructors including module lecturers and
teaching assistants.
Slides
We encourage all groups to use slides for the presentations as a visual aid, but other means
are also allowed. The slides should be submitted in Learning Central as a PDF. Note that the link to
the presentation video should be included on the first slide.
Time
Each presentation will have a maximum duration of 15 minutes.
Appendix E: Peer Assessment and Self-reflection
Notes for completing the peer assessment form
1. Each row of the form is for one member
2. In the first column, add full names of all other group members (excluding yourself) in
alphabetical order by surname.
3. In the second column, evaluate contributions of each member of the group to the project
(up to 150 words for each member). This evaluation should be based on evidence, such
as tasks completed, interaction with other members, participation in the group activities,
team working skills, etc.
4. In the third column, fairly and honestly allocate marks to each member based on your
evaluation in the second column. You have a total of 20 marks to be allocated to all
other members. For example, for a group with 4 other members, an allocation of the 20
marks to those 4 members could be 7, 3, 10, 0, respectively. A mark of 0 indicates that the
member has contributed nothing to the group project. If you feel that all members contributed
equally, you can award the same mark to all other students in your group.
5. Total peer assessment (TPA) marks of a member will be obtained by aggregating marks of
all peer assessment forms in the group, and a member can have a maximum peer
assessment mark of 20 (even if her/his aggregated TPA is greater than 20).
6. The final peer assessment (FPA) marks of a member will be weighted by the total group
mark from Part 1 and Part 2. For example, if the total group mark is 57 out of 80 marks
(70 for Part 1 plus 10 for Part 2) and a member called Boris has a TPA of 16, then Boris’s
FPA will be (57/80) x 16 = 11.4, rounded to the nearest integer, i.e. 11. Finally, Boris’s overall
mark for this coursework will be 57 + 11 = 68%.
7. If you fail to submit the Peer Assessment and Self-reflection Proforma or the submitted
proforma is not usable (e.g., clearly unfair mark allocation), your FPA will be zero.
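The calculation in notes 5 and 6 above can be sketched in a few lines of Python. This is an illustration only, not an official marking tool: the function names are hypothetical, and the handling of exact halves (rounded up here) is an assumption, since the notes only say to round to the nearest integer.

```python
import math

GROUP_MAX = 80  # 70 marks for Part 1 plus 10 marks for Part 2
TPA_CAP = 20    # maximum total peer assessment mark (note 5)


def total_peer_assessment(marks_received):
    """Aggregate the marks a member received from peers, capped at 20."""
    return min(sum(marks_received), TPA_CAP)


def final_peer_assessment(group_mark, tpa):
    """Weight the TPA by the group mark and round to the nearest integer.

    Assumption: exact halves round up; the notes do not specify this.
    """
    return math.floor(group_mark / GROUP_MAX * tpa + 0.5)


def overall_mark(group_mark, tpa):
    """Overall coursework mark: group mark plus final peer assessment."""
    return group_mark + final_peer_assessment(group_mark, tpa)


if __name__ == "__main__":
    # The worked example from note 6: group mark 57/80, TPA of 16.
    print(final_peer_assessment(57, 16))  # (57/80) x 16 = 11.4 -> 11
    print(overall_mark(57, 16))           # 57 + 11 = 68
```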
Peer Assessment and Self-reflection Proforma
Student ID:
Group number:
Project title:
Supervisor:
Peer assessment
Member name Contribution and justification Marks
Self-reflection
[write your self-reflection here, up to 300 words. Your self-reflection will be used to cross-
check if you have been fairly assessed by your peers]