Date: 2021-09-24

Monash University
Faculty of Information Technology
FIT3080 Artificial Intelligence
2nd Semester 2021

Assignment 2:

This assignment is worth 24% of your final mark (subject to the hurdles described in the
FIT3080 Unit Guide, the Moodle preview and other locations). Among other things (see below),
note that you must hit the ‘Submit’ button.

Due Date: Thursday 14th October 2021, 11:55pm (Melbourne, Australia time)

Method of submission: Your submission should consist of at least 1 and at most 4 files:

1. A text-based .pdf file named: FamilyName-StudentId-2ndSem2021FIT3080_2.pdf

2. A .py Python file named: FamilyName-StudentId-2ndSem2021FIT3080_2_Qu4.py

3. A .py Python file named: FamilyName-StudentId-2ndSem2021FIT3080_2_Qu7.py

(The Python .py files are only for Question 4 and Question 7. Question 4 and Question 7
should be answered in both the .pdf and the .py files.)

4. A .xls (or .xlsx) file named: FamilyName-StudentId-2ndSem2021FIT3080_2_Qu5.xls
or FamilyName-StudentId-2ndSem2021FIT3080_2_Qu5.xlsx.

All the files must be uploaded to the FIT3080 Moodle site by the due date and time. The
text-based .pdf file will undergo a similarity check by Turnitin at the time you submit to
Moodle. Please read the submission instructions at the end of this document carefully regarding
the use of Moodle.

The details of the submission instructions may be subject to change. In the event of any
change, students will be notified.

Total available marks: 10 + 15 + 15 + 15 + 15 + 15 + 20 = 105 marks, to a maximum of 100
marks. Anyone achieving 100 or more marks will be given 100 marks.

Note 1: Please recall the Academic Integrity exercises from week 2 and the start of semester.
In submitting this assignment, you acknowledge both that you are familiar with the relevant
policies, rules and regulations regarding Academic Integrity and also that you are familiar with
the consequences of being deemed to be in contravention of these policies.
Note 2: A reminder not to post even part of a proposed partial solution to a forum or other
public location. If you ask a question in public, please make sure that it does not contain
even part of a proposed partial solution. It will often help to phrase it as a general question
(not directly pertaining to the Assignment) and to post it in the General category at Ed
Discussion. You are reminded that Monash University takes academic integrity very
seriously.
Note 3: As previously advised, it is your responsibility to be familiar with the special
consideration policies and special consideration process.
Note 4: As a general rule, please don’t just give a number or an answer like ‘Yes’ or ‘No’
without at least some clear and sufficient explanation; otherwise, you risk being awarded
0 marks for the relevant exercise. Make sure to explain your answer and show your working in
all parts of all questions. Make it easy for the person marking your work to follow your
reasoning. Your .pdf should typically cross-reference any corresponding answer in your
(Question 4 and Question 7) Python .py files; without a clear cross-reference between the .pdf
and the .py, such an exercise may be awarded 0 marks. Your .pdf should typically also
cross-reference any corresponding answer in your (Question 5) .xls (or .xlsx) spreadsheet
file; without a clear cross-reference between the .pdf and the .xls (or .xlsx), such an
exercise may be awarded 0 marks.
Note 5: The only questions requiring any submitted programming are Question 4, Question 5
and Question 7. Any programming for Question 4 and Question 7 should be done in Python
and submitted as a .py file, with corresponding parts in the .pdf file. Any spreadsheet
programming for Question 5 should be done in Microsoft Excel and submitted as a .xls (or
.xlsx) file, with corresponding parts in the .pdf file.
Note 6: As a general rule, if there is an elegant way of answering a question without
unnecessary extra work, try to do it that way. More generally, more elegant solutions are
preferable - and might at least sometimes be given more marks, possibly many more marks.
Show your working and make your answers clear. (Another way to think of this is to try to put
yourself in the marker’s situation.)
Note 7: All of your submitted work must be in machine-readable form, and none of it should be
handwritten; any handwritten work may be awarded 0 marks. Show your working and make your
answers clear.
Note 8: If you want your work to be marked without accruing (possibly considerable) late
penalties, make sure to upload the correct files and to hit ‘Submit’ (rather than leaving your
files as a Draft), so that your work is actually submitted.

----


Question 1 (5 + 5 = 10 marks)
For a modified version of rock (R), paper (P), scissors (S), we have the following payoff
matrix:
                Player II
                R          P          S
Player I   R    (0, 0)     (-1, 1)    (4, -4)
           P    (1, -1)    (0, 0)     (-2, 2)
           S    (-4, 4)    (2, -2)    (0, 0)

An entry (x, y) means that x is the payoff to Player I and y is the payoff to Player II.
Show your working in answering the questions below.

(a) If Player II plays (R, P, S) with the corresponding probabilities (4/7, 1/7, 2/7), then what is
Player I's expectimax strategy and what is the expected payoff?

(b) If Player II plays (R, P, S) with the corresponding probabilities (2/7, 4/7, 1/7), then what is
Player I's expectimax strategy and what is the expected payoff?
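The expectimax calculation in parts (a) and (b) amounts to computing, for each of Player I's pure strategies, the expected payoff against Player II's fixed mixed strategy, then choosing the largest. A minimal Python sketch of that calculation follows; the function name `expectimax_response` is hypothetical, and exact fractions are used to avoid rounding. For illustration it is applied to the standard (unmodified) rock-paper-scissors matrix, not the one above.

```python
from fractions import Fraction


def expectimax_response(payoffs, opponent_probs):
    """Return (best_row_index, expected_payoffs) for the row player.

    payoffs[i][j] is the row player's payoff when the row player picks
    action i and the column player picks action j; opponent_probs[j] is
    the probability that the column player picks action j.
    """
    # Expected payoff of each pure strategy against the mixed strategy.
    expected = [
        sum(p * q for p, q in zip(row, opponent_probs))
        for row in payoffs
    ]
    # Pick the row with the largest expected payoff (first one on ties).
    best = max(range(len(expected)), key=lambda i: expected[i])
    return best, expected


# Usage: standard rock-paper-scissors payoffs for the row player.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
print(expectimax_response(rps, [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]))
```

Showing the full list of expected payoffs, not only the maximum, is what makes the choice of strategy verifiable.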

----

Question 2 (5 + 5 + 5 = 15 marks)

In the following 2-person zero-sum game, players alternate their moves.
Player I seeks to maximise the stated value, and Player II seeks to minimise the stated value.
Consider the following (sub-)tree emanating from a particular node in the game, with an
evaluation given in each leaf:
                         I
          /              |              \
         II              II              II
       /  |  \         /  |  \         /  |  \
       2  1  3        -1  4  8         6 -2  I
       a  b  c         d  e  f         g  h / \
                                           5   7
                                           i   j

Put another way, I has three available moves. After I’s 1st move, II has 3 choices (with
evaluations 2, 1 and 3, respectively). After I’s 2nd move, II has 3 choices (with evaluations
-1, 4 and 8, respectively). After I’s 3rd move, II has 3 choices: after II’s first 2 choices,
the evaluations are 6 and -2, respectively; after II’s 3rd choice, I has two choices, with
evaluations 5 and 7, respectively.

Label the paths in the (sub-)tree systematically. For example, the bottom rightmost leaf node (j),
with evaluation 7, has the path I 3 (3rd of 3 possible moves), II 3 (3rd of 3 possible moves), I 2
(2nd of 2 possible moves), and so could be described as I 3 II 3 I 2 or even as (3, 3, 2).

Assume an appropriate search strategy, with optimal (or rational) play by both sides.

(a) How many leaf nodes need to be explored without use of alpha-beta pruning?
Show these nodes on your search tree. Show the order in which the nodes are searched.

(b) With use of alpha-beta pruning, how many leaf nodes need to be explored?
Show these nodes on your search tree. Show the order in which the nodes are searched.

(c) Using an appropriate search strategy, with optimal (or rational) play by both sides (i.e.,
player I plays maximin and player II plays minimax), what is the path to the resulting node and
what is the pay-off for player I?
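The difference between parts (a) and (b) can be checked mechanically. Below is a minimal sketch, assuming a game tree encoded as nested Python lists whose leaves are hypothetical `(label, value)` pairs and whose levels alternate between MAX (Player I) and MIN (Player II); leaves may sit at different depths. Both functions visit children left to right and record the labels of the leaves they evaluate. The alpha-beta version stops examining a node's remaining children as soon as alpha ≥ beta. The toy tree in the usage is illustrative and is not the tree above.

```python
import math


def minimax(node, maximizing, visited):
    """Plain minimax: evaluates every leaf, left to right."""
    if not isinstance(node, list):          # a leaf is a (label, value) pair
        label, value = node
        visited.append(label)
        return value
    values = [minimax(child, not maximizing, visited) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, visited, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning; records leaves in the order visited."""
    if not isinstance(node, list):
        label, value = node
        visited.append(label)
        return value
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, visited, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:               # MIN above will never allow this
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, visited, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:                   # MAX above already has better
            break
    return value


# Usage on a toy two-level tree (MAX at the root, MIN below).
toy = [[("a", 3), ("b", 5)], [("c", 2), ("d", 9)], [("e", 1), ("f", 7)]]
seen = []
print(alphabeta(toy, True, seen), seen)
```

On this toy tree, plain minimax visits all six leaves, while alpha-beta skips `d` and `f` because each MIN node's first leaf already falls below the 3 that MAX can guarantee.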

----


Question 3: Modified Vacuum Cleaner World (6 + 3 + (2+2+2) = 6 + 3 + 6 = 15 marks)

Make sure to show your working and explain your answer to all parts of this question.

Recall the modified vacuum cleaner world example discussed in Lab 6a. For this assignment, it
is changed as follows: after any action, exactly one of the two blocks becomes dirty at random,
with probabilities ? and ? for the left block (?) and the right block (?), respectively.
(Note that ? + ? = 1.)
There are two actions:
1) stay in the current block and vacuum (abbreviated as “S&V”);
2) move to the other block and vacuum (abbreviated as “M&V”).
Using these two actions, the vacuum cleaner should clean the dirt and receive a reward.
The reward for staying in the same block and vacuuming (“S&V”), if the block is dirty, is two
times the square root of the respective probability (i.e., if in ? then 2?? and if in ? then 2??).
The reward for moving to the other block and vacuuming (“M&V”), if the block is dirty, is equal
to the maximum of the two probabilities ? and ? (i.e., max(?, ?)).
If the block where the vacuum cleaner moves is not dirty, you will receive a zero reward and
the game ends. The goal is to maximise the cumulative sum of discounted rewards.

(a) Assuming the probabilities ? and ? are given, formulate this modified vacuum cleaner
world as a Markov Decision Process (an MDP), writing down the reward function R(s, a, s′)
and the transition function T(s, a, s′) as tables. (6 marks)

(b) Assume that the vacuum cleaner always starts from the right block, ?. Moreover, assume that
if the vacuum cleaner takes the “M&V” action at the beginning, it receives a non-zero reward
equal to 1.5?.

Based on the above, calculate the exact values of ?, ?, and the transition and reward
matrices. (3 marks)

Note: You can use a calculator and round the numbers to 3 decimal places, e.g., 2.4557 can be
displayed as 2.456.

(c) Taking the initial values of both blocks to be 0, assuming that the discount factor for the
Bellman equation is 0.8, and starting from ?,
(i) calculate the values over four iterations of value iteration. (2 marks)
(ii) Also, find the best policy after these four iterations. (2 marks)
(iii) Can we assume that we achieve the optimal policy after 4 iterations?
Please explain your answer. (2 marks)

Note: You can use a calculator and round the numbers to 3 decimal places, e.g., 2.4557 can be
replaced by 2.456.
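Part (c) repeatedly applies the Bellman update V_{k+1}(s) = max_a Σ_{s′} T(s, a, s′) [R(s, a, s′) + γ V_k(s′)], starting from V_0 = 0, where γ is the discount factor (0.8 here). Below is a minimal tabular sketch. It assumes the MDP is given as Python dictionaries: `T[(s, a)]` lists hypothetical `(next_state, probability)` pairs, `R[(s, a, next_state)]` gives the reward, and `None` stands for the terminal "game over" outcome, whose value is fixed at 0. The greedy policy is extracted by a one-step lookahead on the final value function, which is one common convention. The one-state toy MDP in the usage is illustrative only and does not encode the vacuum world.

```python
def q_value(s, a, V, T, R, gamma):
    """Expected return of taking action a in state s, then valuing by V."""
    return sum(
        p * (R[(s, a, s2)] + gamma * (0.0 if s2 is None else V[s2]))
        for s2, p in T[(s, a)]
    )


def value_iteration(states, actions, T, R, gamma, n_iters):
    """Return [V_0, ..., V_n] and the greedy policy with respect to V_n.

    T[(s, a)]     : list of (next_state, probability) pairs (assumed format)
    R[(s, a, s2)] : reward for that transition
    A next_state of None denotes termination (value fixed at 0).
    """
    history = [{s: 0.0 for s in states}]          # V_0(s) = 0 for every state
    for _ in range(n_iters):
        V = history[-1]
        # Synchronous Bellman backup: every V_{k+1}(s) uses the old V_k.
        history.append({s: max(q_value(s, a, V, T, R, gamma) for a in actions)
                        for s in states})
    V = history[-1]
    policy = {s: max(actions, key=lambda a: q_value(s, a, V, T, R, gamma))
              for s in states}
    return history, policy


# Usage on a hypothetical one-state MDP: "stay" pays 1 and loops,
# "quit" pays 0.5 and ends the game.
T = {("A", "stay"): [("A", 1.0)], ("A", "quit"): [(None, 1.0)]}
R = {("A", "stay", "A"): 1.0, ("A", "quit", None): 0.5}
history, policy = value_iteration(["A"], ["stay", "quit"], T, R, 0.5, 3)
print(history, policy)
```

Printing every V_k, not just the last one, matches what part (c)(i) asks to be shown.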

Question 4 (8 + 5 + 2 = 15 marks)
The linear regression model is a commonly used supervised learning method for predictive data
analysis. Given the Prostate Cancer Dataset from a study by Stamey et al. (1989), the goal is to
predict the level of prostate-specific antigen (PSA) based on a number of clinical measures. The
dataset consists of 97 subjects who were about to receive a radical prostatectomy.
The data can be accessed from
http://web.stanford.edu/~hastie/ElemStatLearn//datasets/prostate.data
Write Python code to build a multivariate linear regression model to predict the log of PSA
(lpsa) using the following 8 clinical measures as predictors (features):
● lcavol: log cancer volume
● lweight: log prostate weight
● age: age of patient
● lbph: log of the amount of benign prostatic hyperplasia
● svi: seminal vesicle invasion
● lcp: log of capsular penetration
● gleason: Gleason score
● pgg45: percent of Gleason scores 4 or 5

(a) Divide the dataset into a training set (subjects 1-67) and a test set (68-97). Fit a linear
regression model on the training set, using the lpsa as target (or dependent) variable and the 8
clinical measures as predictor (independent) variables. (8 marks)
(b) Use the fitted model to predict the lpsa values in the test set. (5 marks)
(c) Compute the mean squared error (MSE) between the predicted and the true lpsa
values in the test set. (2 marks)

Hint: You can use the scikit-learn library in Python.

Stamey, T., Kabalin, J., McNeal, J., Johnstone, I., Freiha, F., Redwine, E. and Yang, N.
(1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the
prostate. II. Radical prostatectomy treated patients, Journal of Urology 141(5): 1076–1083.
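The fitting, prediction and MSE steps in parts (a) to (c) can be sketched as follows. This minimal version fits ordinary least squares with an intercept using NumPy's `np.linalg.lstsq` instead of scikit-learn. scikit-learn's `LinearRegression` fits the same least-squares model, and `sklearn.metrics.mean_squared_error` computes the same MSE. The loader assumes that `prostate.data` is tab-separated with the subject number in its first column and reads it with pandas. Its extra `train` column is ignored, because the question specifies the split by subject number. All helper names are hypothetical.

```python
import numpy as np

FEATURES = ["lcavol", "lweight", "age", "lbph", "svi", "lcp", "gleason", "pgg45"]
TARGET = "lpsa"


def load_prostate(url="http://web.stanford.edu/~hastie/ElemStatLearn//datasets/prostate.data"):
    """Load the data (assumed tab-separated, row index in the first column)."""
    import pandas as pd  # only needed for loading
    return pd.read_csv(url, sep="\t", index_col=0)


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefs)."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend a column of ones
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def predict(intercept, coefs, X):
    return intercept + X @ coefs


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def evaluate_split(X, y, n_train=67):
    """Train on the first n_train rows, test on the rest."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    X_train, y_train = X[:n_train], y[:n_train]   # subjects 1..n_train
    X_test, y_test = X[n_train:], y[n_train:]     # remaining subjects
    intercept, coefs = fit_ols(X_train, y_train)
    y_pred = predict(intercept, coefs, X_test)
    return intercept, coefs, y_pred, mse(y_test, y_pred)
```

For example, with `df = load_prostate()`, calling `evaluate_split(df[FEATURES].to_numpy(dtype=float), df[TARGET].to_numpy(dtype=float))` would train on rows 1-67 and test on rows 68-97, assuming the rows are stored in subject order.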

----


Question 5 (11 + 4 = 15 marks)
Consider a set of 10 two-dimensional data points, shown in the table below:
i                  1   2   3   4   5   6   7   8   9   10
1st coordinate     5   4   1   8   7   6   10  2   9   3
2nd coordinate     3   2   2   7   9   5   9   1   8   1

Apply the k-means clustering algorithm on the data above. Assume the number of clusters is
k = 2, and the initial centroids chosen are and . Show the calculations for two
iterations of the k-means algorithm. Use Euclidean distance to compute the distance between
two data points. You are required to show your working in a spreadsheet (.xls or .xlsx) and to
make your answers clear.

(a) Show the calculations for two iterations of the k-means algorithm, and write down the
estimated centroids and cluster membership for each data point. Use Euclidean distance to
compute the distance between two data points. (11 marks)
(b) Using the estimated centroids in part (a) above, compute the sum of squared distances
from each data point to its closest centroid. (4 marks)
Hint: You might wish to be guided by the template Excel file in the Week 8 lecture resources.
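Each k-means iteration in part (a) alternates an assignment step, where each point joins its nearest centroid under Euclidean distance, and an update step, where each centroid moves to the mean of its members. Part (b) then sums each point's squared distance to its nearest centroid. A minimal Python sketch, which may be useful for cross-checking a spreadsheet, follows. Ties are broken towards the lower-numbered cluster, and an empty cluster keeps its old centroid; both are assumptions, as are the function names. `math.dist` requires Python 3.8 or later. The points in the usage are illustrative, not the table above.

```python
import math


def kmeans(points, centroids, n_iters):
    """Run n_iters iterations of k-means from the given initial centroids.

    Returns a list with one (labels, new_centroids) entry per iteration:
    labels are assignments to the centroids at the start of that
    iteration, new_centroids are the updated means.
    """
    history = []
    for _ in range(n_iters):
        # Assignment step: nearest centroid (ties go to the lower index).
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members)
                                           for coord in zip(*members)))
            else:
                new_centroids.append(c)   # empty cluster: keep old centroid
        centroids = new_centroids
        history.append((labels, centroids))
    return history


def sse(points, centroids):
    """Sum of squared Euclidean distances to the closest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)


# Usage on four illustrative points with two initial centroids.
pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
hist = kmeans(pts, [(0, 0), (10, 0)], 2)
print(hist[-1], sse(pts, hist[-1][1]))
```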

----

Question 6 (3 + 5 + 7 = 15 marks)
Suppose that we want to build a (simple) Bayesian network model for COVID-19 (or
Covid19, or Covid, or covid) risk assessment. We hypothetically know that Covid19 may
cause loss of smell and taste. A Covid19 test can help determine whether a person is positive
or not. After careful analysis, our modelling leads us to develop a hypothetical Bayesian
network and to learn its model parameters (hypothetically) as follows:

Based on this hypothetical learned model, we want to answer the following questions:

a. What is the probability of having a severe status if one has loss of taste and smell, and tests
negative? (3 marks)
b. What is the covid status (i.e., the probabilities of each status) if one has loss of taste and
smell, and tests negative? (5 marks)

We now modify the question just given and further hypothesize that Covid19 may cause
cough. Our model is revised to include the cough condition, giving the new Bayesian network
as follows:

c. What is the covid status (i.e., the probabilities of each status) if one has cough, and tests
negative? (7 marks)
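Queries like these can be answered by enumeration: multiply the prior of each covid status by the probability of each observed variable given that status, then normalise. The sketch below assumes that the covid status is the single parent of each observed variable (loss of taste and smell, the test result and, in part c, cough), so that the observations are conditionally independent given the status. That structure is an assumption and should be checked against the network given in the question. Under it, unobserved leaf variables sum out and can simply be left out of `evidence`. All names and numbers are hypothetical.

```python
from fractions import Fraction


def posterior(prior, likelihoods, evidence):
    """P(status | evidence) for an assumed structure: status -> each child.

    prior[s]               : P(status = s)
    likelihoods[var][s][v] : P(var = v | status = s)
    evidence[var]          : observed value v of child variable var
    """
    unnormalised = {}
    for s, p in prior.items():
        for var, v in evidence.items():
            p *= likelihoods[var][s][v]   # multiply in each observed child
        unnormalised[s] = p
    z = sum(unnormalised.values())        # P(evidence)
    return {s: p / z for s, p in unnormalised.items()}


# Usage with made-up numbers (two statuses, one observed symptom).
prior = {"none": Fraction(1, 2), "severe": Fraction(1, 2)}
lik = {"loss": {"none": {True: Fraction(1, 4), False: Fraction(3, 4)},
                "severe": {True: Fraction(3, 4), False: Fraction(1, 4)}}}
print(posterior(prior, lik, {"loss": True}))
```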

----


Question 7: Training an artificial neuron from 2D point data (5 + 10 + 5 = 20 marks)

Write an executable script (i.e., Python code) to train an artificial neuron with a sigmoid
activation function to classify two classes of 2D (two-dimensional) point data drawn from
two Gaussian distributions, as follows:
i) For the first Gaussian distribution, the x and y variables are drawn i.i.d. from a Normal
distribution with mean -0.5 and standard deviation 0.7 (zero correlation between the x and
y dimensions, i.e., an isotropic Gaussian).
ii) For the second Gaussian distribution, the x and y variables are drawn i.i.d. from a Normal
distribution with mean 1.5 and standard deviation 0.6 (zero correlation between the x and
y dimensions, i.e., an isotropic Gaussian).

From the above introductory description, we now provide some coding exercises.

(a) Generate 100 samples randomly for each class and provide their output labels.
Randomly initialize all the weights for the neuron between 0 and 1 and visualize the data points
for each class using different colours/markers. (5 Marks)

Hint: you can use NumPy’s “np.random.randn” to generate the random data points and
“np.random.rand” to initialize the weights.

(b) Assuming a learning rate of 0.1, train the model using gradient descent (steps 1, 2 and 3
in the Lecture 9 materials, approximately slides 17-20) for 30 iterations, and report the final
values of the weights and the total error (use the values at iteration 30, where iteration 0
denotes the beginning). (10 Marks)

(c) Report the output values of the model for the following 4 test data points (use the
values at iteration 30, assuming that iteration 0 denotes the beginning)
Data point 1 = (-1, -1)
Data point 2 = (0.2, 0.2)
Data point 3 = (0.5, 0.5)
Data point 4 = (1.5, 2)
(5 Marks)
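Parts (a) to (c) can be sketched with NumPy as follows. This minimal version assumes class labels 0 (first Gaussian) and 1 (second Gaussian), and a bias weight initialised alongside the two input weights. It also assumes the total error E = ½ Σ (t − y)² and full-batch gradient descent on E. The exact steps in the Lecture 9 materials may differ, so treat this as a sketch under those assumptions rather than the expected solution. Function names are hypothetical, and matplotlib is imported only inside the plotting helper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_data(n_per_class=100, rng=np.random):
    # Class 0: isotropic Gaussian, mean -0.5, std 0.7 in both x and y.
    x0 = -0.5 + 0.7 * rng.randn(n_per_class, 2)
    # Class 1: isotropic Gaussian, mean 1.5, std 0.6 in both x and y.
    x1 = 1.5 + 0.6 * rng.randn(n_per_class, 2)
    X = np.vstack([x0, x1])
    t = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, t


def total_error(w, X, t):
    """E = 0.5 * sum of squared errors; w = [w1, w2, bias]."""
    y = sigmoid(X @ w[:2] + w[2])
    return 0.5 * np.sum((t - y) ** 2)


def train(X, t, lr=0.1, n_iters=30, rng=np.random):
    w = rng.rand(3)                                 # uniform in [0, 1)
    for _ in range(n_iters):
        y = sigmoid(X @ w[:2] + w[2])               # forward pass
        delta = (y - t) * y * (1.0 - y)             # dE/dz for each sample
        grad = np.append(X.T @ delta, delta.sum())  # dE/d[w1, w2, bias]
        w = w - lr * grad                           # gradient descent step
    return w, total_error(w, X, t)


def predict(w, points):
    points = np.asarray(points, dtype=float)
    return sigmoid(points @ w[:2] + w[2])


def plot_data(X, t):
    import matplotlib.pyplot as plt  # only needed for plotting
    plt.scatter(*X[t == 0].T, c="tab:blue", marker="o", label="class 0")
    plt.scatter(*X[t == 1].T, c="tab:red", marker="x", label="class 1")
    plt.legend()
    plt.show()
```

With `rng = np.random.RandomState(seed)`, calling `X, t = make_data(100, rng)`, `w, err = train(X, t, rng=rng)` and `predict(w, [(-1, -1), (0.2, 0.2), (0.5, 0.5), (1.5, 2)])` gives the weights and total error after 30 updates (iteration 30) and the outputs for the four test points; `plot_data(X, t)` shows the two classes. Whether the gradient should be summed or averaged over the samples is a design choice worth checking against the lecture, since it changes the effective step size.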

--------

Instructions:
You are to upload your submission on the FIT3080 Moodle site and should include the
following:
1. A text-based .pdf document (save as: FamilyName-StudentId-2ndSem2021FIT3080_2.pdf) that
includes all your answers to Questions 1 to 7;
2. (if you did Question 4 and produced a .py file) a .py file (save as:
FamilyName-StudentId-2ndSem2021FIT3080_2_Qu4.py);

3. (if you did Question 5 and produced a .xls file) a .xls file (save as:
FamilyName-StudentId-2ndSem2021FIT3080_2_Qu5.xls), or
(if you did Question 5 and produced a .xlsx file) a .xlsx file (save as:
FamilyName-StudentId-2ndSem2021FIT3080_2_Qu5.xlsx); and
4. (if you did Question 7 and produced a .py file) a .py file (save as:
FamilyName-StudentId-2ndSem2021FIT3080_2_Qu7.py).

If you attempted Question 4, then submit both files (.pdf and .py). In that case, your answer
to Question 4 should appear in both files.
If you attempted Question 5, then submit both files (.pdf and .xls or .xlsx). In that case, your
answer to Question 5 should appear in both files.
If you attempted Question 7, then submit both files (.pdf and .py). In that case, your answer
to Question 7 should appear in both files.
Question 4 requires a .py file, Question 5 requires a .xls (or .xlsx) file, and Question 7
requires a .py file.
If you did not attempt any of Questions 4, 5 and 7 (and attempted nothing other than
Questions 1, 2, 3 and 6), then you can submit just the .pdf file.

Recall that the text-based .pdf will undergo a similarity check by Turnitin at the time you
upload your assignment to Moodle. We also intend to perform such a check, at the same time,
on any other files you submit.

(This largely ends the submission instructions. Please read them and the notes at the beginning
of this document carefully. Also recall that, as a general rule, when answering questions, you
should not just give a number or an answer like ‘Yes’ or ‘No’ without at least some clear and
sufficient explanation. If the details of the submission instructions change, students will be
notified.)

Late penalties:
Work submitted after the deadline (possibly with a small amount of grace time) will be
subject to late penalties in accordance with the FIT3080 Unit Guide and Faculty and
University policies, possibly 10% per calendar day and certainly no less than 5% per calendar
day. (This percentage is taken from the total marks for the Assignment, not from the
student’s mark. So, as an example, if the student gets 70% and then gets a 10% late penalty
then the mark becomes 70% - 10% = 60% and does not become 70% x 90% = 63%.)
If you do not submit matching .pdf and .py files (e.g., if you submit two files but one is blank
or unreadable, or if you only submit one file) and/or if you do not submit matching .pdf and
.xls (or .xlsx) files, then any affected work will be deemed late - and will be subject to the
relevant penalties, possibly receiving a mark of 0.
Work submitted 10 or more calendar days after the deadline will possibly be given a mark of
0.

Plagiarism declaration:
You are required to state explicitly that you have done your own work, in whatever way the
Moodle assignment submission settings allow you to declare this.
For example, if you are presented with an 'Assignment Electronic Plagiarism Statement', then
you are required to complete the 'Assignment Electronic Plagiarism Statement' quiz on the
FIT3080 Moodle site and accept the Student Statement (electronic version of the Assignment
cover sheet). If you do not accept the Student Statement, then your assignment may not be
marked, and you may be given a mark of 0.

Recall the instructions above and the notes at the beginning of this document (including, but
not only, Academic Integrity, Special Consideration and making sure to hit the ‘Submit’ button),
and please follow them carefully.

The details of the submission instructions are possibly subject to change. In that event,
students will be notified.

And a reminder, as per Note 2, not to post even part of a proposed partial solution to a forum
or other public location. If you ask a question in public, please make sure that it does not
contain even part of a proposed partial solution. It will often help to phrase it as a general
question (not directly pertaining to the Assignment) and to post it in the General category at
Ed Discussion. You are reminded that Monash University takes academic integrity very
seriously.

Again, as stated at the beginning of this document,
Total available marks: 10 + 15 + 15 + 15 + 15 + 15 + 20 = 105 marks, to a maximum of 100
marks.
Anyone achieving 100 or more marks will be given 100 marks.

*** END FIT3080 Assignment 2 Faculty of I.T., Monash University 2nd semester 2021 ***