COMP3425/COMP8410 Data Mining, Semester 1 2021
22 February to 16 April 2021. Hybrid mode (remote or in-person).
This course is an introduction to data mining and the broad skills for selecting and applying data
mining algorithms. We cover a breadth of common and emerging techniques from statistics and
computer science with the aim of understanding where they might be useful and how to use them
properly, while understanding limitations. There is an emphasis in the course on practical work
over the mathematical and statistical foundations, although a conceptual
understanding of methods is expected. Detailed course content will be made available during the
course at the Wattle site https://wattlecourses.anu.edu.au/course/view.php?id=33447
Quick Reference
Mode of Delivery: Hybrid, 12 weeks plus exam. Course material is presented on-line,
  supplemented with a 1-hour (real-time on-line and also recorded) lecture and
  2-hour (real-time online or in-person) lab in most weeks.
Prerequisites:
  COMP8410: (COMP7240 or COMP6240) and (COMP6730 or COMP7230 or COMP6710).
  COMP3425: (COMP1100 or COMP1130 or COMP1730) and COMP2400.
Incompatible courses: COMP3420, COMP8400
Co-Taught courses: COMP3425, COMP8410
Course Convenor: Prof Kerry Taylor with co-convenor Dr Pouya Omran
  Phone: 6125 8560
  Email: comp8410@anu.edu.au
  Office hours for consultation: Immediately following Friday lecture until 1pm
  Research Interests: Semantic Web, Machine Learning, Spatial and IoT data analysis
Administrator: CECS Student Services
  Email: studentadmin.cecs@anu.edu.au
Lecturer: Kerry Taylor & Pouya Omran
Lead Tutor: Pouya Omran
  Email: comp8410@anu.edu.au
  Phone: No phone contact
Other Tutors: Muhammad Salman, Ehsan Emamirad, Shuaiqun Pan, Nicholas Burrell
Textbook (not required): Han, Kamber & Pei, Data Mining: Concepts and Techniques, 3rd
  Edition, 2011. www.elsevier.com/books/>and-techniques/han/978-0-12-381479-1 . It is
  available in the university library in soft and hard copy and at the on-campus
  bookshop. The second edition would also be adequate.
Other recommended references:
  Graham Williams, Data Mining with Rattle and R, The Art of Excavating Data for
  Knowledge Discovery, Springer 2011. http://www.springer.com/gp/book/9781441998897
  Witten, Frank, Hall and Pal, Data Mining, Practical Machine Learning Tools and
  Techniques, 4th Edition, Elsevier 2017. https://www.elsevier.com/books/>804291-5
Mode of delivery
We aim to provide a uniform learning environment for all students, whether local or remote, in order
to direct our limited resources equitably to all students. Lectures will be delivered in dual mode,
for both in-person and remote real-time attendance, usually for one hour per week. The format is
interactive and we need physical attendance at those lectures from local students so that students
not attending can also benefit from the interactive style. Lectures will be recorded. Please refer to
the University timetable for time and location, and watch for advice on remote real-time
participation. Most laboratories will be conducted on-line with real-time tutor support although an
option for on-campus laboratory attendance is available. One laboratory class will be recorded each
week. Exams will be conducted remotely online at a time to be announced. As the global
environment and our own understanding of needs and best solutions change, there may be
considerable operational change during the semester.
Throughout the course all times are given in the Canberra time zone, i.e. at first AEDT (UTC+11) but
changing to AEST (UTC+10) on Sunday 4th April 2021, see
https://www.timeanddate.com/worldclock/australia/canberra .
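For remote students converting class times, the following is a minimal sketch of the offset change described above. It assumes the changeover year is 2021 (taken from the course title) and, as a simplification, treats the whole of 4 April as AEST. The names `canberra_tz` and `to_utc` are hypothetical helpers written for this illustration only.

```python
from datetime import date, datetime, timedelta, timezone

# Offsets stated in this outline: AEDT is UTC+11, AEST is UTC+10.
AEDT = timezone(timedelta(hours=11), "AEDT")
AEST = timezone(timedelta(hours=10), "AEST")

# Changeover date given in this outline (year assumed to be 2021).
CHANGEOVER = date(2021, 4, 4)


def canberra_tz(day: date) -> timezone:
    """Return the fixed offset for a calendar day.

    Simplification: the whole changeover day is treated as AEST.
    """
    return AEDT if day < CHANGEOVER else AEST


def to_utc(local: datetime) -> datetime:
    """Convert a naive Canberra wall-clock time to UTC."""
    aware = local.replace(tzinfo=canberra_tz(local.date()))
    return aware.astimezone(timezone.utc)


# A Friday 10am lecture before and after the changeover.
print(to_utc(datetime(2021, 2, 26, 10, 0)))  # 2021-02-25 23:00:00+00:00
print(to_utc(datetime(2021, 4, 9, 10, 0)))   # 2021-04-09 00:00:00+00:00
```

The same Friday 10am slot therefore falls one hour later in UTC once AEST applies.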
p/g only COMP8410 Learning Outcomes
Upon successful completion of this course, students will:
1. Critically analyse and justify the steps involved in the data mining process,
2. Anticipate and identify data issues related to data mining,
3. Research, test and apply the principal algorithms and techniques used in data mining,
4. Justify suitable techniques to use for a given data mining problem,
5. Appraise and reflect upon the results of a data mining project using suitable measurements,
6. Investigate application areas and current research directions of data mining,
7. Reflect upon ethical and social impacts of data mining.
u/g only COMP3425 Learning Outcomes
Upon successful completion of this course, students will:
1. Critically analyse and justify the steps involved in the data mining process,
2. Anticipate and identify data issues related to data mining,
3. Test and apply the principal algorithms and techniques used in data mining,
4. Justify suitable techniques to use for a given data mining problem,
5. Appraise and reflect upon the results of a data mining project using suitable measurements,
6. Reflect upon ethical and social impacts of data mining.
Assessment Scheme
Assessment components, weighting and due dates
Assessment Task        Value %   Due Date                              Learning outcomes
Weekly online quiz     1         11:59pm Wednesdays                    All
Assignment 1 (Essay)   15        9am Monday Week 4, 15 March           1, 6, 7
Mid-term exam          20        Week 6, TBA                           1, 2, 3, 4, 5
Assignment 2 (Prac)    20        9am Monday Week 10, 10 May            1, 2, 3, 4, 5
Final exam (hurdle)    44        Exam period 3 June to 19 June, TBA    All
OVERALL MARK           100
• For assignment topics please see the Wattle site.
• Online quizzes will be offered for most of the learning weeks. Quizzes open at 8am on
Monday of the relevant week and close at 11:59pm of the Wednesday of the following week
(i.e., open for 10 days). Quizzes are open-book and primarily intended for self-assessment
but also for exam practice. Automated feedback on answers is given and multiple attempts
are permitted. Marks for all such quizzes will be totalled and scaled to contribute 1% to the
overall course mark. If you do not
attempt the quiz before closing time, it will not be available to you for subsequent
revision.
• The Mid-term (1.5 hour) and Final (3 hour) exams will be closed-book with no personal notes
or online materials permitted. However, access to the online course material distributed via
Wattle will be provided. They will be conducted remotely online under supervision. Zoom is
required. The date and time will be announced by the Exams Office. Detailed information
will be provided via the Wattle News Forum.
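The quiz opening and closing rule in the list above can be sketched as a small date calculation. This is an illustration only: `quiz_window` is a hypothetical helper, the example Monday (22 February 2021) is used purely to show the arithmetic, and the change of clock offset during the semester is ignored.

```python
from datetime import date, datetime, time, timedelta


def quiz_window(week_monday: date) -> tuple:
    """Return (opens, closes) for the quiz of the week starting on week_monday."""
    if week_monday.weekday() != 0:
        raise ValueError("week_monday must be a Monday")
    # Opens at 8am on Monday of the relevant week.
    opens = datetime.combine(week_monday, time(8, 0))
    # Wednesday of the following week is 9 days after that Monday.
    closes = datetime.combine(week_monday + timedelta(days=9), time(23, 59))
    return opens, closes


opens, closes = quiz_window(date(2021, 2, 22))
print(opens)           # 2021-02-22 08:00:00
print(closes)          # 2021-03-03 23:59:00
print(closes - opens)  # 9 days, 15:59:00
```

The window spans ten calendar days, which matches the "open for 10 days" description.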
Overall course mark
• The final exam component is a “hurdle” under the ANU Rules. A student must achieve at
least 40% in the final exam to pass the course.
• At least 50% overall is required to pass the course.
• A supplementary exam will be offered to any student who has an overall mark of at least
45% and who either
o has an overall mark of less than 50%; or
o has failed a hurdle assessment.
• Marks may be moderated so that raw marks for assessment components as well as overall
marks may be scaled by the convenor or as a result of school or college academic review.
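The rules above, combined with the weights in the assessment table, can be sketched as follows. This is an illustration under stated assumptions, not the official calculation: each component mark is assumed to be a percentage from 0 to 100, moderation and scaling are ignored, and the function and key names are hypothetical.

```python
# Weights (percent of the overall mark) from the assessment table in this outline.
WEIGHTS = {
    "quizzes": 1,
    "assignment1": 15,
    "midterm": 20,
    "assignment2": 20,
    "final": 44,
}

HURDLE_MIN = 40  # minimum final exam percentage (hurdle)
PASS_MIN = 50    # minimum overall mark to pass
SUPP_MIN = 45    # minimum overall mark for a supplementary exam


def overall_mark(marks: dict) -> float:
    """Weighted overall mark; each entry in `marks` is a percentage (0-100)."""
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


def outcome(marks: dict) -> str:
    """Apply the pass, hurdle and supplementary rules, before any moderation."""
    overall = overall_mark(marks)
    hurdle_failed = marks["final"] < HURDLE_MIN
    if overall >= PASS_MIN and not hurdle_failed:
        return "pass"
    if overall >= SUPP_MIN and (overall < PASS_MIN or hurdle_failed):
        return "supplementary exam offered"
    return "fail"


example = {"quizzes": 100, "assignment1": 70, "midterm": 60,
           "assignment2": 65, "final": 38}
print(overall_mark(example))  # 53.22
print(outcome(example))       # supplementary exam offered
```

In the example the overall mark is above 50%, but the final exam mark of 38% misses the hurdle, so the student would be offered a supplementary exam rather than passing outright.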
Policy on late assessment and re-marking
• Assessment submitted after the due date and time will not be accepted and will not be
marked.
• Extensions to the due date for submission will only be granted if requests are made to the
convenor at comp8410@anu.edu.au well in advance, stating the reasons for requiring the
extension, evidence to support the reason (usually a medical certificate), and the extension
period requested.
• Students may consider applying for special consideration. An application form must be
completed and lodged online within three business days of the original due date of the
assessment.
• Any appeal or request for reconsideration regarding an assessment piece must be
submitted within two weeks of the assessment result being released. The procedure for such
requests will be advised on the Wattle News forum.
Academic Misconduct
Students are expected to have read the ANU Academic Misconduct Rule before commencement of
the course. No group work is permitted in any part of the assessment in this course. Plagiarism will
not be tolerated and University procedures will be applied ruthlessly. Therefore your contributions
are expected to be yours alone, except for work that is clearly and appropriately attributed. You may
find this a helpful guide to understanding what constitutes plagiarism and how seriously various
violations will be treated:
http://thevisualcommunicationguy.com/2014/09/16/did-i-plagiarize-the-types-and-severity-of-plagiarism-violations/
Every student is expected to be able to explain and defend a submitted assessment item. The course
convenor may conduct or initiate an additional interview about any submitted assessment item at
any time. If there is a significant discrepancy between the submitted work and the student’s
explanation of it, it will be treated as a case of suspected academic
misconduct.
Support for Students
The University offers a number of support services for students. Information on these is available.
Organisation of Course
Please spend a while familiarising yourself with the Wattle course site.
You will see that there is a section for each week of the course. Sections may not be visible until the
respective time period commences, to help you pace your way through the course. You are expected
to work through the course notes by self-study or in self-organised study groups if you prefer.
Each section includes a description of the topic to be covered and extensive course notes. Most,
but not all, of the course material is sourced from the course text, Han, Kamber and Pei. References
to relevant sections of the text are given so that you may refer to the text for alternative
explanations and extension material. For some topics, additional reading or detailed video
explanations are prescribed.
Usually there are paper-based exercises embedded within the notes to assist you to understand the
course topics. All the exercises are considered mandatory and examinable and you will have trouble
if you do not keep up with them. Some exercises build on the results of previous exercises.
There are also software-based practical exercises (titled “practical exercises”) embedded in the
course notes. You may do these exercises at your own pace if you prefer, or they may be
undertaken in labs with the support of your tutor. If you do not have time to complete them in your
scheduled lab, please do complete them outside classes. Either way, the practical exercises are
mandatory components of the course. In week 1 you will be asked to enrol in a lab at a time to suit
you and no labs will be held in week 1. Most labs will be of 1.5 hours’ duration but some should be
completed within an hour. Because extra lab work may be needed during the semester,
you must be prepared to attend a 1.5-hour laboratory (remotely or in person) every week.
Most sections also include an open-book self-assessment online quiz that you are advised to
attempt as many times as you need to gain confidence in more theoretical aspects of the course
topic. Each quiz is marked automatically and your final mark for it contributes to the weekly quiz
component of the assessment scheme. You should attempt the quiz prior to your allocated lab class
and your tutor will work with you to clarify any issues in the class.
Lectures are scheduled for 2 hours on Fridays at 10am. Usually the lecture will be of only one hour
duration, with the second hour timetabled but seldom used. The first hour of the lecture will
normally be an interactive Q&A session for the topic of the week, and, falling after the weekly labs, is
expected to be the final activity for that week’s topic, aimed at revision and consolidation.
Attendance at and participation in the lecture are highly recommended for your own learning, as this
activity has been strongly appreciated by students in the past. However, the session will not be effective
for learning if there is insufficient participation. In this case, the lecture will be cancelled and
replaced by similar on-line video material. Remote students can participate fully via live streaming.
All lectures will be recorded and available within a few hours at the top of the Wattle site.
Please be aware that there is an additional 2-hour lecture scheduled for Monday of Week 1 only.
This lecture will orient you to the course and provide valuable concepts for understanding data
mining and for succeeding in the course. In addition, the first assignment will be discussed.
Communication and getting help
The course Discussion forum is the primary mechanism to raise questions or observations on the
course material and this will be monitored very frequently by the course convenor or tutors. Please
pay attention to course announcements on the News forum as these are sometimes critical for
course completion and assessment. Feedback will be provided for submitted assignments, generally
within two weeks of the due date.
Unless you are specifically directed, please do not contact course tutors outside scheduled classes
as they have been engaged to assist on specific tasks in the course. You may contact the course
convenor for private or personal matters using the contact information given at the top of this
document, but, to repeat, the Discussion forum is to be used as the primary method for engagement
with course staff. Generally, if you find that you do not understand something, or that it might be
erroneous, or that something is particularly interesting, many of your co-students will also find it
confusing, wrong or interesting, and we can all benefit from your post.
ANU is committed to the demonstration of educational excellence and regularly seeks feedback
from students. One of the key ways students have to provide feedback is through Student
Experience of Learning Support (SELS) surveys. The feedback given in these surveys is anonymous
and provides the Colleges, University Education Committee and Academic Board with opportunities
to recognise excellent teaching, and opportunities for improvement. For more information on
student surveys at ANU and reports on the feedback provided on ANU courses, see
http://unistats.anu.edu.au/surveys/selt/students/ and
http://unistats.anu.edu.au/surveys/selt/results/learning/ .
Once or twice during the course students may also be asked to complete a survey on specific
matters that can inform the remainder of the course or future course design.
Workload
An ANU 6-unit course is designed for around 130 hours of student effort over the 12 weeks. For this
course, this includes 3 hours per week of semester for self-study, when you are expected to work
through the extensive course materials posted on Wattle. Typically, 3 hours per week of lecture and
laboratory work are also required, although there is less in some weeks. The time budget also
includes assignment work. Any remaining time should be used for additional reading such as the
textbook and recommended papers, further self-study, and review and reflection.
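As a rough planning aid, the time budget above can be worked through as simple arithmetic. This sketch assumes the 3 hours of lecture and laboratory work apply in each of the 12 weeks, which is an upper bound since some weeks have less; `workload_budget` is a hypothetical helper.

```python
TOTAL_HOURS = 130        # nominal effort for a 6-unit course
WEEKS = 12
SELF_STUDY_PER_WEEK = 3  # working through the course notes on Wattle
CONTACT_PER_WEEK = 3     # typical lecture plus lab time; less in some weeks


def workload_budget(total=TOTAL_HOURS, weeks=WEEKS,
                    self_study=SELF_STUDY_PER_WEEK, contact=CONTACT_PER_WEEK):
    """Return (scheduled_hours, remaining_hours) for the semester."""
    scheduled = weeks * (self_study + contact)
    return scheduled, total - scheduled


scheduled, remaining = workload_budget()
print(scheduled, remaining)  # 72 58
```

Under these assumptions about 58 hours remain for the two assignments, exam preparation, additional reading and reflection.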
Required Resources
A laptop or desktop with a reliable internet connection is required for accessing the course material
on Wattle and for completing the practicals, assignments and labs. Rattle and R will be used
extensively in this course so being able to install freely available software will be necessary. An
alternative is to have access to a laptop or desktop where appropriate software is already installed,
such as ANU CSIT student laboratories. Course software is also available by installing the VMware
Horizon software and logging in to the CS Virtual Desktop Infrastructure, and this may be the most
convenient option for students with a good internet connection. A smart phone or tablet is unlikely to be satisfactory.
Additional Course Costs
You could purchase one or more of the recommended books for this course. Successful completion
of the course does not require such purchase.
And finally, the convenor’s expectations of you as a learner
Kindly refer to the Learning Expectations for students of the School of Computing, provided on the
course Wattle site. Despite best efforts, and especially this semester, some errors or confusing
messages will slip through in the course materials and in the course administration, and we
encourage you to assist in their resolution or improvement. If you need clarification or
correction of administrative matters or the course material, please ask publicly via the
Discussion forum or in lectures or labs so that we can share the answer for the benefit of all of us.
We are intolerant of questions that have already been addressed in this course outline, lectures, or
forums, as we consider such questions to demonstrate a lack of responsibility for your learning and
disrespect for the teaching staff. We expect this semester will hold new challenges for the learning
and teaching for all of us engaged in this endeavour, and we look forward to working with you as a
team!