Date: 2022-11-09
Preview Test: INFS7203 Semester Two Final Examination 2020
Test Information
Description
Instructions
Multiple Attempts: Not allowed. This test can only be taken once.
Force Completion: This test can be saved and resumed later.
Your answers are saved automatically.
Undertaking this online examination deems your commitment to UQ’s academic
integrity pledge as summarised in the following declaration:
"I certify that I have completed this examination in an honest, fair and trustworthy manner,
that my submitted answers are entirely my own work, and that I have neither given nor
received any unauthorised assistance on this examination".
You need to answer all of the questions in the Blackboard Test.
For the toolbar, press ALT+F10 (PC) or ALT+FN+F10 (Mac).
QUESTION 1
a. [2 marks] Simply list two applications of Association Rule Mining.
b. [4 marks] Briefly describe the Apriori Principle. Explain the benefits
of applying the Apriori Principle in the context of the Apriori Algorithm
for Association Rule Mining.
6 points
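As background for Question 1b, the following is a minimal sketch of level-wise frequent itemset generation with Apriori-style pruning. The transactions, the `min_support` threshold, and the helper name `support` are hypothetical and chosen only for illustration; they are not part of the exam. The prune step keeps a candidate k-itemset only when every one of its (k-1)-subsets is already frequent, which is the Apriori Principle in action.

```python
from itertools import combinations

# Hypothetical toy transactions, for illustration only (not from the exam).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
min_support = 3  # assumed threshold: minimum number of supporting transactions


def support(itemset):
    """Count the transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


items = sorted(set().union(*transactions))
# frequent[k-1] holds the frequent k-itemsets.
frequent = [{frozenset([i]) for i in items if support({i}) >= min_support}]

k = 2
while frequent[-1]:
    prev = frequent[-1]
    # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    # Prune step (Apriori Principle): drop any candidate that has an
    # infrequent (k-1)-subset, without counting its support at all.
    pruned = {
        c for c in candidates
        if all(frozenset(s) in prev for s in combinations(c, k - 1))
    }
    frequent.append({c for c in pruned if support(c) >= min_support})
    k += 1

for level, itemsets in enumerate(frequent, start=1):
    print(level, sorted(sorted(s) for s in itemsets))
```

In this toy run, candidates such as {bread, diapers, beer} are discarded in the prune step because {bread, beer} is infrequent, so their support is never counted against the transactions.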
QUESTION 2
a. [2 marks] Briefly describe the main differences between the two
major data mining tasks: classification and clustering.
b. [5 marks] List the main procedures of the DBSCAN algorithm.
c. [3 marks] Briefly discuss whether cross-validation can be used in
clustering algorithms to determine parameters.
10 points
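As background for Question 2b, the following is a minimal sketch of the DBSCAN procedure under stated assumptions: Euclidean distance, a brute-force neighbour search, and a neighbourhood that counts the point itself. The function name `dbscan`, the sample points, and the values of `eps` and `min_pts` are hypothetical and not taken from the exam.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE        # may later be relabelled as a border point
            continue
        labels[i] = cluster          # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical sample: two dense groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sample, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```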
QUESTION 3
a. [2 marks] What is the relationship between outliers and noise?
b. [4 marks] List the main procedures of the Local Outlier Factor (LOF)
method for the detection of outliers.
c. [2 marks] List one strength of the LOF method with justification.
8 points
QUESTION 4
You are working on a spam classifier. Assume that "Spam" is the
positive class (y=1) and "Not Spam" is the negative class (y=0).
Additionally, an algorithm "A" is used to classify the test set. The
following table shows the corresponding classification results
obtained by the algorithm, where the left column is the ground truth
class and the right column is the predicted class. Answer the following
questions.
Ground truth  Predicted
Spam          1
Not Spam      0
Spam          0
Spam          1
Not Spam      1
Spam          1
Not Spam      0
Not Spam      0
Spam          1
Spam          1
a. [1 mark] Evaluate the algorithm A under the metric accuracy.
b. [1 mark] Evaluate the algorithm A under the
metric precision (round to 2 decimal places).
c. [1 mark] Evaluate the algorithm A under the metric recall (round
to 2 decimal places).
d. [1 mark] Evaluate the algorithm A under the
metric F1 measurement (round to 2 decimal places).
e. [3 marks] Briefly describe the meaning of generalization and
overfitting in classification. How are these two concepts related to
each other?
7 points
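The arithmetic behind parts a to d of Question 4 can be sketched as follows. The lists `y_true` and `y_pred` are transcribed from the table in the question (Spam = 1, Not Spam = 0); the variable names are illustrative, and the standard definitions of accuracy, precision, recall and F1 are assumed.

```python
# Ground truth and predictions transcribed from the table (Spam = 1).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam flagged as spam
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-spam flagged as spam
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that was missed
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-spam left alone

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F1={f1:.2f}")
```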
QUESTION 5
We have some data about whether people go to watch a football game. The data
includes three attributes: whether the game is on a weekend or not, whether the
person has friends to watch the game together or not, and whether there is any
football star in the game or not:
Tid  Weekend  Friends  Star there  Watch?
1    Yes      No       No          No
2    Yes      Yes      No          No
3    Yes      Yes      No          Yes
4    Yes      Yes      Yes         Yes
5    Yes      No       Yes         Yes
6    Yes      No       Yes         No
7    Yes      Yes      No          No
8    Yes      Yes      Yes         Yes
9    No       Yes      Yes         No
10   No       Yes      No          No
11   No       No       Yes         No
a. [4 marks] Construct a decision tree from the provided dataset
to predict whether people will watch the football game given
three attributes "Weekend", "Friends" and "Star there", using
the GINI index-based splitting criterion.
b. [2 marks] Briefly describe the pruning procedure and its
purpose in decision tree classifiers.
c. [2 marks] Briefly describe the random forest method.
8 points
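The first step of Question 5a, choosing the root split, can be sketched as below. The rows are transcribed from the table in the question; the helper names `gini` and `weighted_gini` are illustrative. The sketch assumes binary splits on each Yes/No attribute and only evaluates the root; a full tree would repeat the same computation on each resulting subset.

```python
from collections import Counter

# Rows transcribed from the table: (Weekend, Friends, Star there, Watch?)
rows = [
    ("Yes", "No", "No", "No"),
    ("Yes", "Yes", "No", "No"),
    ("Yes", "Yes", "No", "Yes"),
    ("Yes", "Yes", "Yes", "Yes"),
    ("Yes", "No", "Yes", "Yes"),
    ("Yes", "No", "Yes", "No"),
    ("Yes", "Yes", "No", "No"),
    ("Yes", "Yes", "Yes", "Yes"),
    ("No", "Yes", "Yes", "No"),
    ("No", "Yes", "No", "No"),
    ("No", "No", "Yes", "No"),
]
attributes = ["Weekend", "Friends", "Star there"]


def gini(labels):
    """Gini index of a list of class labels: 1 - sum of squared proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(rows, col):
    """Size-weighted average Gini of the partitions induced by column col."""
    total = len(rows)
    score = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        score += len(subset) / total * gini(subset)
    return score


scores = {name: weighted_gini(rows, i) for i, name in enumerate(attributes)}
best = min(scores, key=scores.get)  # lowest weighted Gini is the best split

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
print("root split:", best)
```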
QUESTION 6
Given the data set below, answer the following questions.
Tid  Home Owner  Marital Status  Annual Income  Class
1    Yes         Single          125K           No
2    No          Married         120K           No
3    No          Single          70K            No
4    Yes         Married         120K           No
5    No          Divorced        95K            Yes
6    No          Married         60K            No
7    Yes         Divorced        120K           No
8    No          Married         85K            Yes
9    No          Married         75K            No
10   No          Single          120K           Yes
a. [2 marks] Why is the Naïve Bayesian classifier named “Naïve”?
b. [3 marks] Predict the class label of a given test record X: (Home
Owner: No, Marital Status: Married, Annual Income: 120K) using
the Naïve Bayesian Classifier.
c. [1 mark] List one strength of the Naïve Bayesian classifier compared
to a decision tree.
6 points
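One possible way to set up the computation in Question 6b is sketched below. It rests on two assumptions that the question does not state: Annual Income is treated as a categorical attribute (exact value match), and no smoothing is applied. Modelling income with a Gaussian density is another common choice and would give different numbers. The records are transcribed from the table; the function name `score` is illustrative.

```python
# Records transcribed from the table:
# (Home Owner, Marital Status, Annual Income, Class)
records = [
    ("Yes", "Single", "125K", "No"),
    ("No", "Married", "120K", "No"),
    ("No", "Single", "70K", "No"),
    ("Yes", "Married", "120K", "No"),
    ("No", "Divorced", "95K", "Yes"),
    ("No", "Married", "60K", "No"),
    ("Yes", "Divorced", "120K", "No"),
    ("No", "Married", "85K", "Yes"),
    ("No", "Married", "75K", "No"),
    ("No", "Single", "120K", "Yes"),
]
x = ("No", "Married", "120K")  # test record X from the question


def score(cls):
    """Prior times the product of per-attribute conditional frequencies.

    Assumption: every attribute, including Annual Income, is categorical,
    and probabilities are raw relative frequencies (no smoothing).
    """
    in_class = [r for r in records if r[-1] == cls]
    prob = len(in_class) / len(records)  # prior P(class)
    for i, value in enumerate(x):
        matches = sum(1 for r in in_class if r[i] == value)
        prob *= matches / len(in_class)  # P(attribute = value | class)
    return prob


scores = {cls: score(cls) for cls in ("Yes", "No")}
prediction = max(scores, key=scores.get)

for cls, value in scores.items():
    print(f"P(X | {cls}) * P({cls}) = {value:.4f}")
print("predicted class:", prediction)
```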
QUESTION 7
Given the two-dimensional data set below, answer the following
questions.
(x, y) Class
(1, 5) +
(2, 3) +
(4, 3) -
(5, 5) -
(6, 7) -
(8, 5) -
(3, 7) +
(4, 10) -
(7, 9) +
a. [3 marks] Given the above data set as the training set, classify a
new data point p = (4, 5) using the Manhattan distance and majority
vote in the following two cases: 1-nearest neighbor and 2-nearest
neighbors.
b. [2 marks] If the data is high-dimensional, briefly discuss the impact
of the high dimensionality on the k-NN method.
5 points
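The distance computation and vote in Question 7a can be sketched as follows. The training points are transcribed from the table in the question; the function names `manhattan` and `knn` are illustrative. The sketch assumes ties in distance or in the vote do not occur, which holds for this data with k = 1 and k = 2; other settings would need an explicit tie-breaking rule.

```python
from collections import Counter

# Training set transcribed from the table: ((x, y), class)
training = [
    ((1, 5), "+"),
    ((2, 3), "+"),
    ((4, 3), "-"),
    ((5, 5), "-"),
    ((6, 7), "-"),
    ((8, 5), "-"),
    ((3, 7), "+"),
    ((4, 10), "-"),
    ((7, 9), "+"),
]
p = (4, 5)  # query point from the question


def manhattan(a, b):
    """Manhattan (L1) distance between two 2-D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def knn(k):
    """Majority class among the k training points closest to p."""
    nearest = sorted(training, key=lambda item: manhattan(item[0], p))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


for point, label in training:
    print(point, label, manhattan(point, p))
print("1-NN:", knn(1))
print("2-NN:", knn(2))
```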
QUESTION 8
Please use this space to specify any assumptions you have made in
completing the exam and which questions those assumptions relate
to. You may also include queries you may have made with respect to a
particular question, should you have been able to ‘raise your hand’ in
an examination room.
0 points