Python assignment help - FIT1043
Instructions

You must write all your answers in the Script-book and clearly indicate which question you are answering. You can write in pen or pencil. Marks are indicated next to each question. This exam paper consists of 2 parts and the total marks for the exam are 65 marks.

Part 1 (15 marks in total)

Multiple Choice Questions: This section is worth 15 marks. Each question is worth 1 mark. Identify the choice that best completes the statement or answers the question. There is only one best answer for each question. Sometimes two answers may appear feasible, but you are to pick the one you believe is the best. If you change your selection during the review of your paper, prior to the end of the Examination, make sure that the alteration is clear.

Marking Scheme for Multiple Choice Questions:
• 1 mark for a correct answer
• 0 marks for a wrong answer or more than one answer
• 0 marks for no answer

Please note that this is a sample exam. The questions provided are intended to give you an insight into the type of questions you will have in your final exam. The number of questions is fewer than what you will see in your final exam. You will have 15 multiple choice questions and 25 short answer questions in your final exam.

QUESTION 1.1: What is Hadoop?
A. An abbreviation for “Hadrian's Loop”, a firewall management system
B. A programming language designed for agile development
C. An encryption system used extensively at Google
D. A system for partitioning computation across a compute cluster

QUESTION 1.2: Which of the following is true about “open data”?
A. Open data is both private and machine readable
B. Open data is always useful
C. Open data is machine-readable data that is publicly available
D. None of the above options

QUESTION 1.3: Which of the following statements about Data Wrangling tools is TRUE?
A. Python and R are general purpose languages that cannot be used for data wrangling.
B. All data wrangling tools require users to write code.
C. Data wrangling tools are all open source.
D. ALL of the above statements are false.

QUESTION 1.4: The 3Vs of big data are important because:
A. they are an industry standard
B. they are the basis for the development of more Vs (e.g. Value)
C. they are used to describe in what way a dataset may be too big to handle
D. they are from the influential Gartner Inc

QUESTION 1.5: The growth of NoSQL databases occurred because:
A. they were better suited for distributed implementation
B. the variety, volume and specific processing demands of some classes of data challenged RDBMSs
C. they were more easily integrated with web client applications
D. enterprising database developers expanded in the niche markets of NoSQL

QUESTION 1.6: What is the proper explanation for the following code?
titanic.groupby(['sex','class'])['age']
A. It groups the data by sex and class and returns the average
B. It shows the average age in each class and sex
C. It groups the data based on sex and class
D. It first groups the data by sex, then shows the average age in different classes

QUESTION 1.7: Which one is correct about R and Python?
A. Python is more powerful compared to R as it has more libraries
B. R supports dataframes while Python does not have the concept of a dataframe
C. Both R and Python can be used for data wrangling
D. R and Python are not comparable as they cannot be used for the same purposes

Part 2 (50 marks in total)

Short Answer Questions: This section is worth 50 marks and each question is worth 2 marks. Your answer should be written in clear, simple English and should be complete enough in addressing the question. Extensive prose is not required. Structured bullet points are acceptable.

QUESTION 2.1: Explain what big data is. Consider the four V’s of big data and explain veracity in a few words.

QUESTION 2.2: Name two typical tasks performed while “wrangling data”.

QUESTION 2.3: Assume you are collecting data about traffic accidents in Melbourne to develop a predictive model. Would it be better to collect “more data” (e.g. the locations of accidents over many years) or “more types of data” (e.g. the types of vehicles involved, the weather conditions, etc.)? Give a brief justification.

QUESTION 2.4: Explain the difference between classification and regression. Which one can be used to predict a salary based on the age and job title of a person?

QUESTION 2.5: Would you consider users’ emails to be sensitive information? Why or why not?

QUESTION 2.6: Name two different data science roles (jobs) and explain their responsibilities.

QUESTION 2.7: Explain the k-means algorithm.

END OF EXAM
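
Study note for Question 1.6: the expression in that question can be tried out directly. The following is a minimal sketch, not part of the exam paper. It assumes pandas is installed and uses a small made-up DataFrame as a hypothetical stand-in for the titanic data (the exam does not say where that data comes from). It separates the grouping step from a later aggregation step so the two can be inspected independently.

```python
import pandas as pd

# Hypothetical miniature stand-in for the titanic data; all values are made up.
titanic = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female"],
    "class": ["First", "First", "Third", "Third", "Third"],
    "age": [40.0, 30.0, 20.0, 10.0, 40.0],
})

# The expression from Question 1.6: group by sex and class, then select 'age'.
grouped = titanic.groupby(["sex", "class"])["age"]

# Inspect what kind of object the expression produces on its own.
print(type(grouped).__name__)

# An aggregation is a separate, explicit step applied to the grouped object.
mean_age = grouped.mean()
print(mean_age)
```

Comparing the two printed results shows the difference between building the groups and computing a statistic over them.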