Python assignment help - fit1043
Date: 2020-11-27
Instructions
You must write all your answers in the Script-book and clearly indicate
which question you are answering.
You can write in pen or pencil. Marks are indicated next to each question. This
exam paper consists of 2 parts and is worth 65 marks in total.
Part 1 (15 marks in total)
Multiple Choice Questions: This section is worth 15 marks. Each question is
worth 1 mark. Identify the choice that best completes the statement or answers
the question. There is only one best answer for each question. Sometimes two
answers may appear feasible, but you are to pick the one you believe is the
best. If you change your selection during the review of your paper, prior to the
end of the Examination, make sure that the alteration is clear.
Marking Scheme for Multiple Choice Questions:
• 1 mark for a correct answer
• 0 marks for a wrong answer or for more than one answer
• 0 marks for no answer
Please note that this is a sample exam. The questions provided give you an
insight into the types of questions you will see in your final exam. There
are fewer questions here than in the final exam, which will have 15
multiple choice questions and 25 short answer questions.
QUESTION 1.1:
What is Hadoop?
A. An abbreviation for “Hadrian's Loop”, a firewall management system
B. A programming language designed for agile development
C. An encryption system used extensively at Google
D. A system for partitioning computation across a compute cluster
QUESTION 1.2:
Which of the following is true about “open data”?
A. Open data is both private and machine readable
B. Open data is always useful
C. Open data is machine-readable data that is publicly available
D. None of the above options
QUESTION 1.3:
Which of the following statements about data wrangling tools is TRUE?
A. Python and R are general purpose languages that cannot be used for data
wrangling.
B. All data wrangling tools require users to write code.
C. Data wrangling tools are all open source.
D. ALL of the above statements are false.
QUESTION 1.4:
The 3Vs of big data are important because:
A. they are an industry standard
B. they are the basis for the development of more Vs (e.g. Value)
C. they are used to describe in what way a dataset may be too big to handle
D. they come from the influential Gartner Inc.
QUESTION 1.5:
The growth of NoSQL databases occurred because:
A. they were better suited for distributed implementation
B. the variety, volume and specific processing demands of some classes of data
challenge RDBMSs
C. they were more easily integrated with web client applications
D. enterprising database developers expanded in the niche markets of NoSQL
QUESTION 1.6:
Which is the best explanation of the following code?
titanic.groupby(['sex','class'])['age']
A. It groups the data by sex and class and returns the
average
B. It shows the average of age in each class and sex
C. It groups the data based on Sex and Class
D. It first groups the data by sex, then shows the average age in the different
classes
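Study note: to experiment with the expression above, the following is a minimal sketch. It assumes `titanic` is a pandas DataFrame with `sex`, `class` and `age` columns. The four rows of toy data and the names `grouped` and `mean_age` are made up for illustration and are not part of the exam.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic dataset; the column
# names match the snippet in the question, the values are invented.
titanic = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "class": ["First", "First", "Third", "Third"],
    "age": [40.0, 30.0, 20.0, 10.0],
})

# The expression from the question, stored so it can be inspected.
grouped = titanic.groupby(["sex", "class"])["age"]
print(type(grouped).__name__)

# A statistic is only computed when an aggregation is requested explicitly.
mean_age = grouped.mean()
print(mean_age)
```

Comparing the printed type with the result of calling `.mean()` shows what the expression does on its own and what an explicit aggregation adds.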
QUESTION 1.7:
Which one of the following is correct about R and Python?
A. Python is more powerful compared to R as it has more libraries
B. R supports dataframes while Python has no concept of a dataframe
C. Both R and Python can be used for data wrangling
D. R and Python are not comparable as they cannot be used for the same
purposes
Part 2 (50 marks in total)
Short Answer Questions: This section is worth 50 marks and each question
is worth 2 marks. Your answer should be written in clear, simple English and
should be complete enough in addressing the question. Extensive prose is not
required. Structured bullet points are acceptable.
QUESTION 2.1:
Explain what big data is. Consider the four V’s of big data and explain veracity
in a few words.
QUESTION 2.2:
Name two typical tasks performed while “wrangling data”.
QUESTION 2.3:
Assume you are collecting data about traffic accidents in Melbourne to develop
a predictive model. Would it be better to collect “more data” (e.g. the locations
of accidents over many years) or “more types of data” (e.g. the types of vehicles
involved, the weather conditions, etc)? Give a brief justification.
QUESTION 2.4:
Explain the differences between classification and regression. Which one
can be used to predict a salary based on age and job title of a person?
QUESTION 2.5:
Would you consider users' emails to be sensitive information? Why or why
not?
QUESTION 2.6:
Name two different data science roles (jobs) and explain their responsibilities.
QUESTION 2.7:
Explain the k-means algorithm.
END OF EXAM