Date: 2022-10-30
ENGG 5103 Assignment 2
Deadline: HKT 23:59, Nov 6, 2022
Exercise 1
Data Preprocessing (Total: 25%)
• Clean the data by handling the missing values. Output the new table at this step. (5%)
• There is a duplicate record. Please remove it and output the new table at this step. (5%)
• There is a redundant feature. Please remove that feature and select useful features.
Output the new table at this step. (5%)
• Convert categorical values to numerical values. Output the new table at this step. (10%)
Table 1: Initial data records
Course ID  Course Name                       Instructor Gender  Average GPA  Required Course?
CSC236     Theory of Computing               Male               3.2          Yes
CSC263     Data Structures                   Female             3.2          Yes
CSC373     Algorithms                        Female             3.8          No
CSC463     Complexity Theory                 Male                            No
CSC258     Computer Organizations            Male               3.8          No
CSC165     Computer Logic                    Male               2.0          No
CSC411     Machine Learning                  Female             2.0          No
CSC108     Introduction to Computer Science  Male               4.0          No
CSC108     Introduction to Computer Science  Male               4.0          No
CSC384     Artificial Intelligence           Female             3.7
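
The four preprocessing steps can be sketched with pandas as follows. This is only one possible approach, not a model answer, and it rests on several assumptions: rows with missing values are dropped rather than imputed, "Course Name" is taken to be the feature that is redundant with "Course ID", and the two binary categorical columns are mapped to 0/1. The variable names (step1 to step4) are illustrative.

```python
import pandas as pd

# Table 1, with the two empty cells entered as None.
records = [
    ("CSC236", "Theory of Computing", "Male", 3.2, "Yes"),
    ("CSC263", "Data Structures", "Female", 3.2, "Yes"),
    ("CSC373", "Algorithms", "Female", 3.8, "No"),
    ("CSC463", "Complexity Theory", "Male", None, "No"),
    ("CSC258", "Computer Organizations", "Male", 3.8, "No"),
    ("CSC165", "Computer Logic", "Male", 2.0, "No"),
    ("CSC411", "Machine Learning", "Female", 2.0, "No"),
    ("CSC108", "Introduction to Computer Science", "Male", 4.0, "No"),
    ("CSC108", "Introduction to Computer Science", "Male", 4.0, "No"),
    ("CSC384", "Artificial Intelligence", "Female", 3.7, None),
]
columns = ["Course ID", "Course Name", "Instructor Gender",
           "Average GPA", "Required Course?"]
df = pd.DataFrame(records, columns=columns)

# Step 1 (assumption): drop rows that contain a missing value.
# Imputing the GPA mean or the most frequent category is an alternative.
step1 = df.dropna().reset_index(drop=True)

# Step 2: remove the exact duplicate record.
step2 = step1.drop_duplicates().reset_index(drop=True)

# Step 3 (assumption): "Course Name" duplicates the information in "Course ID".
step3 = step2.drop(columns=["Course Name"])

# Step 4: encode the binary categorical columns as integers.
# "Course ID" is kept as an identifier rather than encoded.
step4 = step3.copy()
step4["Instructor Gender"] = step4["Instructor Gender"].map({"Male": 0, "Female": 1})
step4["Required Course?"] = step4["Required Course?"].map({"No": 0, "Yes": 1})

# Output the table after each step, as the exercise asks.
for name, table in [("Step 1", step1), ("Step 2", step2),
                    ("Step 3", step3), ("Step 4", step4)]:
    print(name)
    print(table.to_string(index=False))
    print()
```

If imputation is preferred over deletion in Step 1, the later steps stay the same, but the row counts of the intermediate tables change.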
Exercise 2
Clustering (Total: 75%)
• Write pseudocode for two clustering algorithms. (25%)
• Find a publicly available dataset and apply both clustering algorithms to it. Using a
suitable evaluation measure, compare the two clustering algorithms with each other
(25%) and compare each of them with the ground truth. (25%)
Submission list: (1) pseudocode for the two algorithms; (2) a working link to download the
dataset; (3) a brief comparison report. For the clustering algorithms, you may call functions
predefined in any Python or R package, or you may implement them yourself.
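
As an illustration of the comparison described above, the following minimal sketch uses scikit-learn. Everything specific in it is an assumption and not part of the assignment: the Iris dataset stands in for "a publicly available dataset", k-means and Ward agglomerative clustering stand in for the two algorithms, and the adjusted Rand index (ARI) stands in for the "suitable measurement".

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Assumed dataset: Iris, which ships with scikit-learn and has ground-truth labels.
X, y_true = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # put features on a common scale

# Assumed pair of algorithms; any two clustering methods could be swapped in.
models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative (Ward)": AgglomerativeClustering(n_clusters=3, linkage="ward"),
}

# Fit each model and keep its cluster assignments.
labels = {name: model.fit_predict(X_scaled) for name, model in models.items()}

# Comparison with the ground truth: ARI is invariant to how clusters are numbered.
ari = {name: adjusted_rand_score(y_true, pred) for name, pred in labels.items()}
for name, score in ari.items():
    print(f"{name}: ARI vs ground truth = {score:.3f}")

# Comparison between the two algorithms: ARI of one labeling against the other.
agreement = adjusted_rand_score(labels["k-means"], labels["agglomerative (Ward)"])
print(f"ARI between the two clusterings = {agreement:.3f}")
```

ARI is used here because cluster indices are arbitrary, so plain accuracy against the class labels would be misleading. Measures that need no labels, such as the silhouette score, could complement it in the report.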