Date: 2022-05-03
THE UNIVERSITY OF NEW SOUTH WALES
School of Computer Science and Engineering



Final Examination – Term 1, 2022
3rd of May 2022


COMP9321 Data Service Engineering

Total Exam Mark: 40
Total Number of Questions: 30 (Part A) + 6 (Part B)
Exam Duration: 2 hours + 15 minutes (reading and submitting)

**** IMPORTANT NOTICE ****
There are two parts in this exam paper: Part A - Multiple Choice Questions, and Part B - Written Answer
Questions. Plan your time wisely and attempt to complete all parts.
You may submit your solutions as many times as you like. Only the last submission will be marked.
Questions (and sub-questions) are not worth equal marks. Answer all questions.
For multiple choice questions, select the response that best answers the question. Keep your written
answers clear and coherent. Messy or irrelevant answers will not be marked.
Your answers must reflect your own effort and be in your own words. If you do not follow
these instructions, you will receive zero marks for the exam and may face a charge of academic
misconduct.
Part A: Multiple Choice Questions (Total 15 Marks)
Use the Moodle quiz to answer all 30 questions. Only the last submission will be marked. Make
sure you click Submit at the end so that your submission is counted. If time runs out before you click
Submit, your previous attempt will be marked.
https://moodle.telt.unsw.edu.au/mod/quiz/view.php?id=4650935
Part B: Written Answer Questions (Total 25 Marks)
The written answer paper is to be submitted through the Give system as a PDF file named z{id}.pdf.
Remember that the file size limit is 1.5MB. There are 6 questions in total; attempt to answer them all.
Most of the questions are analytical and scenario-based, so manage your time well. You can use Python
code, pseudocode, or explain your answer as a series of steps. If you use code, you do not need to
follow exact syntax, but you MUST include proper comments explaining each step.

Question 1 (2 marks)
You are building a data service that provides sellers with information about purchase orders and their status.
In your own words, briefly describe two approaches you would follow to secure your data service,
and explain why you prioritized them.
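As one illustration of two common measures, authentication and authorization, the sketch below verifies an HMAC-signed token for a seller and then checks that the seller only reads their own purchase orders. It uses Python's standard `hmac` and `hashlib` modules; `SECRET_KEY`, `sign`, and `is_authorized` are hypothetical names, and this is a minimal sketch rather than a complete scheme (a real service would also serve traffic over HTTPS and use tokens that expire).

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load this from secure config.
SECRET_KEY = b"replace-with-a-server-side-secret"


def sign(seller_id):
    """Issue a token for a seller: an HMAC-SHA256 signature of the seller id."""
    return hmac.new(SECRET_KEY, seller_id.encode(), hashlib.sha256).hexdigest()


def is_authorized(seller_id, token, order_seller_id):
    """Authenticate the caller, then authorize access to one purchase order."""
    # Authentication: the token must be a valid signature for the claimed seller.
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(sign(seller_id), token):
        return False
    # Authorization: a seller may only read purchase orders that belong to them.
    return seller_id == order_seller_id


token = sign("seller-42")
print(is_authorized("seller-42", token, "seller-42"))  # True
print(is_authorized("seller-42", token, "seller-7"))   # False
```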

Question 2 (2 marks)
You are using machine learning to apply some analytics to a dataset. You have properly split the
dataset into a training set and a test set, trained the model on the training set, and evaluated its
performance using accuracy as the metric. You obtained an accuracy of 93% on the test set. After
deploying the model in production, you notice that it performs very poorly on new instances.
A. What could have caused this issue? Explain. (1 mark)
B. How would it be possible to overcome this issue? (1 mark)
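One possible reason why accuracy alone can mislead (others include a shift between test data and production data) is class imbalance: if most test instances share one label, a model that ignores its input can still score highly. The sketch below computes the accuracy of such a majority-class baseline; the label values and the 93/7 split are hypothetical, chosen only to mirror the 93% figure in the question.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a 'model' that always predicts the most frequent label."""
    # most_common(1) returns [(label, count)] for the most frequent label.
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical test set: 93 instances of one class and 7 of another.
test_labels = ["normal"] * 93 + ["faulty"] * 7
print(majority_baseline_accuracy(test_labels))  # 0.93
```

Comparing a trained model against this kind of baseline, and reporting per-class metrics alongside accuracy, is one way to tell whether a high test score reflects real predictive power.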
Question 3 (6 marks)
A manufacturing company has three datasets: one about the machines, their location, their operator, and
their scheduled maintenance; one about machine breakdowns; and one with personal information about
the operators. The organization wants to draw insights about the breakdowns of each machine and how
they relate to when the machine went through scheduled maintenance and who its operator is.
In light of the dataset snippets shown below and what we covered in the course material:
1. What pre-processing (cleansing and manipulation) is needed to make sure that the factory can
conduct the required task? Explain each step in light of the datasets provided. Be advised
that the organization is low on resources (e.g., storage, memory, CPU), so this needs to be
considered in the pre-processing. (3 marks)
2. How would you approach the problem to help the organization draw insights that help
them maximize productivity and keep production lines working? Explain what you are
going to use and the rationale behind your choice. Explain any limitations of your
approach. (3 marks)
Dataset 1
Machine ID | Scheduled Maintenance Date/Time | Location | Operator
B1834 | 2019-01-16:23:59:12 | L17-401-08 | Albert
B9872 | 2019-01-03:09:15:17 | Warehouse B | Albert
N2543 | 2019-01-27:06:39:01 | L17-502-12 | Jill
n/a | 2019-01-18:06:39:01 | NaN | NaN
M4328 | 2019-03-27:09:30:01 | W17-401-09 | Chris
B9872 | 2019-01-29:08:19:17 | M17-401-08 | Albert

Dataset 2
Machine ID | Breakdown Date/Time | Investigated by
B1834 | 2019-21-01:11:59:12 AM | Morty
N2543 | 2019-01-03:03:39:01 PM | Chris
M4328 | 2019-23-05:01:30:01 PM | Morty
B9872 | 2019-16-03:08:19:17 AM | Albert
M4328 | 2019-23-05:01:30:01 PM | -

Dataset 3
Operator Name | Date of Birth | Home address
Albert | 01/08/1980 | Australia, NSW, Kensington, Barker street, unit23
Jill | 23/02/1982 | Australia, NSW, Kensington, Barker street, unit22
Chris | 03/03/1983 | N/A
Barry | 05/05/1981 | Australia, NSW, Parramatta, Lamont street, unit 72
Morty | 01/10/1970 | Australia, NSW, Kingsford, Some street, house#1
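The timestamp columns in the snippets above do not share one format. As a minimal sketch, assuming Dataset 1 uses year-month-day with a 24-hour clock and Dataset 2 uses year-day-month with a 12-hour clock (an inference from values such as `21` and `23`, which cannot be months), both can be parsed into a common representation with Python's standard `datetime.strptime`. `parse_scheduled` and `parse_breakdown` are hypothetical helper names.

```python
from datetime import datetime

# Assumed formats, inferred from the snippets rather than stated in the question:
# Dataset 1: year-month-day:hour(24h):minute:second, e.g. "2019-01-16:23:59:12"
# Dataset 2: year-day-month:hour(12h):minute:second AM/PM, e.g. "2019-21-01:11:59:12 AM"
DATASET1_FORMAT = "%Y-%m-%d:%H:%M:%S"
DATASET2_FORMAT = "%Y-%d-%m:%I:%M:%S %p"


def parse_scheduled(value):
    """Parse a Dataset 1 maintenance timestamp; return None if missing or invalid."""
    try:
        return datetime.strptime(value, DATASET1_FORMAT)
    except (TypeError, ValueError):
        return None


def parse_breakdown(value):
    """Parse a Dataset 2 breakdown timestamp; return None if missing or invalid."""
    try:
        return datetime.strptime(value, DATASET2_FORMAT)
    except (TypeError, ValueError):
        return None


# Normalize both to ISO 8601 so the datasets can be compared or joined on time.
print(parse_scheduled("2019-01-16:23:59:12").isoformat())    # 2019-01-16T23:59:12
print(parse_breakdown("2019-21-01:11:59:12 AM").isoformat())  # 2019-01-21T11:59:12
print(parse_breakdown("2019-23-05:01:30:01 PM").isoformat())  # 2019-05-23T13:30:01
```

The Dataset 2 value `2019-01-03` is a valid date under either ordering, so the assumed format would need to be confirmed with whoever produced the data before relying on it.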



Question 4 (4 marks)
You run a coffee service that handles coffee orders (drink type, size, number of shots):
A. Consider the following HTTP request invoking a POST method of the Coffee RESTful API.
Write down the issues (if any) that you notice in the request, and the proposed fixes (if
needed). (1.5 marks)
POST /order/add/12
HTTP/1.1
Host: api.coffeehouse.com
Content-Type: application/xml
{
“id” : 12
“drink” : ”latte”,
“size” : ”small”,
“shots” : “2”
}

B. What would you return as a response to this request? Explain. (1 mark)

C. In light of part A of the question, consider the following HTTP request invoking a PATCH
method of the Coffee RESTful API. Explain whether there are any issues with the request,
recommend fixes (if needed), and give the rationale behind your recommendation. (1.5 marks)
PATCH /update_orders/orders?id=12
HTTP/1.1
Host: api.coffeehouse.com
Content-Type: application/xml
{
“id” = 13
“size” = “large”
}
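One way to make the issues in part A concrete is to check the request body programmatically. The sketch below uses only Python's standard `json` module. `validate_order` and `ALLOWED_SIZES` are hypothetical names, and the rules it enforces (a fixed set of sizes, an integer shot count, a server-assigned id) are assumptions for illustration, not a specification of the Coffee API.

```python
import json

# Hypothetical validation rules. Field names come from the question (drink, size,
# shots); the allowed sizes and the server-assigned id are assumptions.
ALLOWED_SIZES = {"small", "medium", "large"}


def validate_order(content_type, body):
    """Return a list of problems with an order request (empty means it looks valid)."""
    problems = []
    if content_type != "application/json":
        problems.append("Content-Type does not match a JSON body; use application/json")
    try:
        order = json.loads(body)
    except json.JSONDecodeError as err:
        problems.append(f"body is not valid JSON: {err.msg} (line {err.lineno})")
        return problems
    if not isinstance(order, dict):
        problems.append("body must be a JSON object")
        return problems
    if "id" in order:
        problems.append("id is supplied by the client; let the server assign it")
    if order.get("size") not in ALLOWED_SIZES:
        problems.append("size must be one of: " + ", ".join(sorted(ALLOWED_SIZES)))
    # bool is a subclass of int in Python, so exclude it explicitly.
    shots = order.get("shots")
    if not isinstance(shots, int) or isinstance(shots, bool):
        problems.append("shots should be a JSON number, not a string")
    return problems


# Body from part A, retyped with straight quotes (note the missing comma after 12).
body_a = '{\n  "id" : 12\n  "drink" : "latte",\n  "size" : "small",\n  "shots" : "2"\n}'
for problem in validate_order("application/xml", body_a):
    print("-", problem)
```

Because the body fails to parse, the later field checks never run for this request. Once the syntax is fixed, the same function would also flag the client-supplied `id` and the string-typed `shots`.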

Question 5 (7 marks)

You are helping an imaging laboratory find a way to screen patients with potentially malignant
tumours for immediate follow-up. You have acquired a historical dataset of patients (Dataset 1
below) and another dataset with the patients' personal information (Dataset 2 below). The lab
wants new test results to be fed in, and initial screening outcomes to be obtained, online by
integrating this new system with other medical systems and applications. The laboratory's top
priority is to minimize the chance that the service misses a patient with a potentially malignant
tumour, who would then not go through a follow-up consultation.

A. Describe how you will approach the problem, what you will use, and why. Explain your
approach and any data preparation needed, and mention any limitations or requirements.
(2.5 marks)
B. In light of the scenario and part A above, what metric(s) will you use for evaluation?
Explain the rationale behind your choice. (2 marks)
C. Describe how you will allow the service to be consumed. Give an example of how you will
feed in the information needed for new patients and get the screening outcome. Note that
the imaging laboratory is very conscious of data privacy and health record
protection. (2.5 marks)

Dataset 1
Patient ID | Diagnosis (the diagnosis of breast tissues: M = malignant, B = benign) | Date of diagnosis | Mass radius | Mass texture | Mass compactness
7234 | M | 2021-01-28 | 22.1 | 10.2 | 0.2
2434 | B | 2020-04-02 | 9.3 | 19.3 | N/A
7272 | M | 2019-03-03 | N/A | 8.3 | 0.3
1010 | M | 2019-05-01 | 15.2 | 20 | 0.28

Dataset 2
Patient ID | Name | Address | Date of birth | Family member with cancer | Smoking
7234 | John | NSW, Sydney | 1950-09-28 | Yes | n/a
2434 | Jane | NSW, Parramatta | 1980-07-07 | No | Yes
7272 | Jill | NSW, Fairfield | 1940-06-01 | N/A | No
1010 | Jones | NSW, Kingsford | 1948-01-19 | Yes | Yes
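The stated priority, not missing a patient with a malignant tumour, corresponds to keeping false negatives on the malignant class low, which is what recall (sensitivity) measures. The sketch below computes recall and precision with `M` treated as the positive class. The true labels are the Diagnosis column of Dataset 1; the predictions are hypothetical values made up for illustration, and the function names are likewise hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="M"):
    """Count (true positives, false positives, false negatives, true negatives)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # a malignant case the screen missed
        elif pred == positive:
            fp += 1  # a benign case sent for follow-up
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(y_true, y_pred, positive="M"):
    """Fraction of actual positives that were flagged: TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else 0.0


def precision(y_true, y_pred, positive="M"):
    """Fraction of flagged cases that were actually positive: TP / (TP + FP)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp) if tp + fp else 0.0


y_true = ["M", "B", "M", "M"]  # Diagnosis column of Dataset 1
y_pred = ["M", "M", "B", "M"]  # hypothetical screening outputs, not from the source
print(round(recall(y_true, y_pred), 3))     # 0.667
print(round(precision(y_true, y_pred), 3))  # 0.667
```

scikit-learn's `sklearn.metrics.recall_score` with `pos_label="M"` computes the same recall value. Precision is shown alongside it because pushing recall up usually means sending more benign cases to follow-up.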

Question 6 (4 marks)

Consider a database containing information about movies: genre, director, and decade of release.
We also have information about which users have watched each movie. A user's rating of a
movie is either 0 or 1. Here is a summary of the database:

Movie | Release decade | Genre | Director | Total number of ratings
A | 1970s | Drama | D1 | 50
B | 2010s | Drama | D1 | 150
C | 2000s | Action | D2 | 100
D | 1990s | Action | D2 | 18
E | 2010s | Drama | D3 | 1
F | 2000s | Comedy | D4 | 150
G | 1990s | Fantasy | D2 | 300



Consider that user U1 is interested in the 1990s, the director D2, and the genre Action. An
existing recommender system R recommended movie D to user U1.

The recommender system R could be one or more of the following options:
• User-based collaborative filtering
• Item-based collaborative filtering
• Content-based recommender system

1. Given the above dataset, which one(s) do you think R could be? (If more than one option is
possible, state them all.) Explain your answer. (2 marks)
2. What are the disadvantages of recommender system R in this question? Recommend
how you could mitigate these disadvantages. (2 marks)
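A simple attribute-overlap score, one possible content-based scheme and not necessarily what R actually uses, shows how the movies in the table line up with U1's stated interests. The movie attributes are copied from the table above; `MOVIES`, `U1_PROFILE`, and `content_score` are hypothetical names.

```python
# Movie attributes copied from the table in Question 6.
MOVIES = {
    "A": {"decade": "1970s", "genre": "Drama", "director": "D1"},
    "B": {"decade": "2010s", "genre": "Drama", "director": "D1"},
    "C": {"decade": "2000s", "genre": "Action", "director": "D2"},
    "D": {"decade": "1990s", "genre": "Action", "director": "D2"},
    "E": {"decade": "2010s", "genre": "Drama", "director": "D3"},
    "F": {"decade": "2000s", "genre": "Comedy", "director": "D4"},
    "G": {"decade": "1990s", "genre": "Fantasy", "director": "D2"},
}

# U1's stated interests from the question.
U1_PROFILE = {"decade": "1990s", "genre": "Action", "director": "D2"}


def content_score(movie, profile):
    """Count how many of the user's stated preferences the movie's attributes match."""
    return sum(1 for key, value in profile.items() if movie[key] == value)


# Rank movies by overlap with the profile (highest first).
ranked = sorted(MOVIES, key=lambda m: content_score(MOVIES[m], U1_PROFILE), reverse=True)
print(ranked[0], content_score(MOVIES[ranked[0]], U1_PROFILE))  # D 3
```

The table gives only total rating counts, not which users rated which movies, so a collaborative-filtering explanation cannot be reproduced from it in the same direct way.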

