Date: 2022-05-16

STATS 220 Final Assessment Semester 1, 2020
Time Allowed: This Final Assessment has been designed so that a well-prepared student
could complete it within two hours. From the 1pm release time you will have 24 hours to
complete and submit your assessment. No marks will be deducted for taking longer than two
hours within that 24-hour period, but you must submit before the deadline.
INSTRUCTIONS
• This assessment is open book; you are permitted to access your course manuals and
  other written material, including online resources.
• Calculators and computers are permitted.
• It is your responsibility to ensure your assessment is successfully submitted on time.
  We recommend you aim to submit a couple of hours in advance of the deadline, to allow
  time to deal with any technical issues that might arise.
• We STRONGLY recommend you download your submitted document from Canvas,
  after submitting it, to verify you have uploaded the correct document.
• This examination consists of 5 questions.
• The marks for all questions sum to 100.
• You should attempt ALL questions.
• Questions are NOT worth equal marks.
• For questions where you are required to write computer code, if you do not know the
  exact code, you can still gain some of the marks by writing an approximation of what
  the code should be.
Page 1 of 13
Support
• If you have any concerns regarding your Final Assessment, please call the Contact
  Centre for advice, rather than your instructors.
• The Contact Centre can be reached on these numbers:
  – Auckland: 09 373 7513
  – Outside Auckland: 0800 61 62 63
  – International: +64 9 373 7513
• For any Canvas issues, please use 24/7 help on Canvas by chat or phone.
• If any corrections are announced during the 24 hours of the final assessment, you will
  be notified by a Canvas Announcement. Please ensure your notifications are turned on
  during this period.
Question Interpretation
Please note that during the final assessment period you cannot contact your instructors for
clarification on how to interpret the wording of any specific questions or to verify that your
answer is correct. Interpreting wording and making appropriate assumptions is part of what is
being assessed. You will need to interpret the question yourself and check your own answers.
If you believe there is a typo, first re-read the question to check you have not misunderstood
the question, as it is very common for students to misread questions. If you still believe there
is a typo, please phone the Contact Centre.
Academic honesty declaration
By completing this assessment, I agree to the following declaration:
I understand the University expects all students to complete coursework with integrity and
honesty. I promise to complete all online assessment with the same academic integrity
standards and values. Any identified form of poor academic practice or academic misconduct
will be followed up and may result in disciplinary action.
As a member of the University’s student body, I will complete this assessment in a fair, honest,
responsible and trustworthy manner. This means that:
• I declare that this assessment is my own work, except where acknowledged appropriately
  (e.g., use of referencing).
• I will not seek out any unauthorised help (i.e., anyone other than the course lecturer or
  tutor) in completing this assessment.
• I declare that this work has not been submitted for academic credit in another University
  of Auckland course, or elsewhere.
• I am aware the University of Auckland may use Turnitin or any other plagiarism
  detecting methods to check my content.
• I will not discuss the content of the assessment with anyone else in any form, including
  Canvas, Piazza, Facebook, Twitter or any other social media, within the assessment
  period.
• I will not reproduce the content of this assessment anywhere in any form.
• I declare that I generated the calculations and data in this assessment independently,
  using only the tools and resources defined for use in this assessment.
• I declare that I composed the writing and/or translations in this assessment
  independently, using only the tools and resources defined for use in this assessment.
Question 1 [14 marks]
Write HTML code that would produce the web page shown in Figure 1 when displayed in a
web browser. Make sure to consider the following points:
• Your code should have correct syntax, and produce no errors or warnings in an HTML
  validator.
• Your code should have good style, including sensible indenting and at least one comment.
• You should use CSS to control all formatting.
• Do not worry about where the text wraps, the font size of the paragraphs, or the fonts
  themselves.
• The big heading is in italics.
• The image URL is
  https://upload.wikimedia.org/wikipedia/commons/9/9a/BTC_Logo.svg, its alt-text
  is Bitcoin Logo, and its width is 50 pixels.
• The link points to
  https://lbry.tv/@3Blue1Brown:b/ever-wonder-how-bitcoin-and-other:1
• There is an &mdash; element somewhere in the text.
• The text in the table is left-justified, and the cells were padded using padding: 0 20px 0 0;
• The column headings in the table are bold.
You do not have to write out all the text. Just write out parts near where some HTML is
used.
Figure 1: The web page you need to reproduce for Question 1.
Question 2 [12 marks]
Answer the following questions in a few sentences each.
(i) [3 marks] Give an example of a problem that could occur if the DRY principle is not
followed.
(ii) [3 marks] Give an example of a problem that could occur if data is stored in a text
format, when a binary format would have been more appropriate.
(iii) [3 marks] Give an example of a problem that could occur if data is stored in an open
standard binary format, when a text format would have been more appropriate.
(iv) [3 marks] Give an example of a problem that could occur if data is stored in a proprietary
binary format, when an open standard binary format would have been more appropriate.
Question 3 [12 marks]
The beginning of an XML document is shown below. The document is intended to store data
about spiders, including their scientific names and common names.

name ID #REQUIRED>
name ID #REQUIRED>
name ID #REQUIRED>
location CDATA #IMPLIED>
]>
(i) [6 marks] Write XML code to represent the following information in the XML document.
• Spiders in the Sparassidae family are commonly known as Huntsman Spiders.
• Spiders in the Linyphiidae family are known as Money Spiders in Australia, Portugal,
  and the United Kingdom.
• Within the Araneidae family, there is a genus called Argiope, and within that genus there
  is a species called keyserlingi. This species is sometimes known as the Crucifix Spider.
  In Australia it is also known as the St. Andrews Cross Spider.
(ii) [3 marks] Suppose each family, genus, or species of spider can optionally be categorised as
small, medium, or large. Suggest changes to the DTD that would allow this.
(iii) [3 marks] Write XML code to represent the fact that Huntsman spiders are large and Money
spiders are small.
Database for Question 4
Question 4 refers to the following database, which contains information about drugs and
categories of drugs.
drugs
------------------------------------------------------------
id (pk)  name         prescription_needed  cents_per_pill
------------------------------------------------------------
1        Ritalin      1                    50
2        Prozac       1                    50
3        Zoloft       1
4        Amoxycillin  1                    20
5        Penicillin   1
6        Panadol      0                    45
7        Nurofen      0                    80
categories
--------------------------
id       name
--------------------------
1        analgesic
2        psychiatric
3        stimulant
4        antidepressant
5        antibiotic
drug_categories
-------------------------------------
id (pk)  drug (fk)  category (fk)
-------------------------------------
1        1          3
2        2          4
3        3          4
4        2          2
5        1          2
6        3          2
7        7          1
8        6          1
9        4          5
10       5          5
------------------------------------
The foreign key `drug` refers to `id` in the drugs table.
The foreign key `category` refers to `id` in the categories table.
Question 4 [12 marks]
(i) [2 marks] The database contains a drugs table, where each row is about a drug, and a
categories table, where each row is about a category. Explain why it also contains a
drug_categories table.
(ii) [3 marks] Write SQL code to find the name and cents_per_pill of each drug, ordered
from cheapest to most expensive. Do not worry about NULL values. The output should
look like this:
name         cents_per_pill
-----------  --------------
Zoloft
Penicillin
Amoxycillin  20
Panadol      45
Ritalin      50
Prozac       50
Nurofen      80
(iii) [3 marks] Write SQL code to find the average cents_per_pill for drugs whose name
starts with P, and rename the output column to average. Again, do not worry about
NULL values. The result should look like this:
average
----------
47.5
(iv) [4 marks] Write SQL code to list all categories in alphabetical order, along with the
average cost per pill of all drugs in the category. Do not worry about NULL values, and
rename the output columns so the output would look like this:
category        average_cost
--------------  ------------
analgesic       62.5
antibiotic      20.0
antidepressant  50.0
psychiatric     50.0
stimulant       50.0
Question 5 [50 marks]
This question is about computer code in the R language. This question relates to a data set
of pedestrian counts in the city of Melbourne, Australia (between January 2019 and April
2020). The data reside in the data/ folder with 16 files, "pedestrians-2019-1.csv" to
"pedestrians-2020-4.csv". Each file represents hourly tallies of pedestrians during one
month.
Within each file, each row of data gives information on the number of pedestrians passing by
a sensor location:
• The sensor contains only one location (Southbank);
• The time (hour of the day), wday (day of the week), mday (day of the month), month, and
  year describe the temporal components of a timestamp;
• The count records the total number of pedestrians at hourly intervals. There are a few
  missing values (NA) in count, when the sensor is down.
The first few lines of the file "pedestrians-2019-1.csv" are shown in Figure 2.
sensor,time,wday,mday,month,year,count
Southbank,0,Tue,1,1,2019,2802
Southbank,1,Tue,1,1,2019,6096
Southbank,2,Tue,1,1,2019,2711
Southbank,3,Tue,1,1,2019,1347
Southbank,4,Tue,1,1,2019,544
Southbank,5,Tue,1,1,2019,283
Figure 2: The first few lines of the file "pedestrians-2019-1.csv".
The following code reads the file "pedestrians-2019-4.csv" into R and shows the first few
rows of the resulting data frame.
> april2019 <- read.csv("data/pedestrians-2019-4.csv")
> head(april2019)
     sensor time wday mday month year count
1 Southbank    0  Mon    1     4 2019   131
2 Southbank    1  Mon    1     4 2019   104
3 Southbank    2  Mon    1     4 2019    35
4 Southbank    3  Mon    1     4 2019    24
5 Southbank    4  Mon    1     4 2019    31
6 Southbank    5  Mon    1     4 2019   115
(i) [3 marks] Write down the result of the following three R expressions.
rownames(april2019) # write down the first 5
colnames(april2019)
dim(april2019)
HINT: refer to part (iv) for the number of rows of april2019.
(ii) [3 marks] Write R code to subset the row containing the maximum pedestrian count
in April 2019 and assign the result to the symbol maxCount.
The symbol maxCount would print like this:
> maxCount
       sensor time wday mday month year count
114 Southbank   17  Fri    5     4 2019  4351
(iii) [10 marks] Explain the purpose of each line of the following R code as well as the overall
purpose of the code. Identify all data structures involved, describe the semantics of
each function call, and describe any output that is produced (though exact numeric
values are not necessary).
mondays <- april2019$wday == "Mon"
daytime <- april2019$time > 8 & april2019$time < 18
aboveavg <- april2019$count > mean(april2019$count, na.rm = TRUE)
subset <- april2019[mondays & daytime & aboveavg, ]
c(avgCount=mean(subset$count, na.rm=TRUE),
  totalCount=sum(subset$count, na.rm=TRUE))
(iv) [15 marks] Write R code to read all 16 files and print out a message reporting the
number of observations (rows) in each file.
Your code would produce output like this:
data/pedestrians-2019-1.csv  contains 744 observations
data/pedestrians-2019-2.csv  contains 672 observations
data/pedestrians-2019-3.csv  contains 744 observations
data/pedestrians-2019-4.csv  contains 721 observations
data/pedestrians-2019-5.csv  contains 744 observations
data/pedestrians-2019-6.csv  contains 720 observations
data/pedestrians-2019-7.csv  contains 744 observations
data/pedestrians-2019-8.csv  contains 744 observations
data/pedestrians-2019-9.csv  contains 720 observations
data/pedestrians-2019-10.csv contains 743 observations
data/pedestrians-2019-11.csv contains 720 observations
data/pedestrians-2019-12.csv contains 744 observations
data/pedestrians-2020-1.csv  contains 744 observations
data/pedestrians-2020-2.csv  contains 696 observations
data/pedestrians-2020-3.csv  contains 744 observations
data/pedestrians-2020-4.csv  contains 721 observations
NOTE: there are two spaces after most of the file names, but only one space after
three of the file names (so the word contains lines up on all rows).
(v) [7 marks] Identify seven different types of R expressions in the following code.
weekdays <- april2019$wday
factor(weekdays[weekdays == "Tue"], levels = unique(weekdays))
(vi) [5 marks] Identify the syntax error in each of the following R expressions.
april2019["Count"]
sum(april2019$count, na.rm=true)
pedestrians-2019 <- nrow(april2019)
cat(paste("There were", nrow(april2019), "pedestrians")))
paste("The highest\lowest count was", max(april2019))
(vii) [3 marks] Write down the output of the following R paste() expression.
> firstObs <- april2019[1, ]
> firstObs
     sensor time wday mday month year count
1 Southbank    0  Mon    1     4 2019   131
> paste(firstObs$count, "pedestrians passing by", firstObs$sensor, "on",
        paste(firstObs$year, firstObs$month, firstObs$mday, sep = "-"))
(viii) [4 marks] Explain what type coercion is and how it was used in part (vii).