MATH1712 Probability and Statistics II
Practical 2
http://www1.maths.leeds.ac.uk/~arief/MATH1712/
Arief Gusnanto, MATH1712@leeds.ac.uk
2020/21, semester 2
In this practical we consider a data set about UK fishing vessels from the module external webpage
https://www1.maths.leeds.ac.uk/~arief/MATH1712
under the heading Data. The data set originally comes from the UK government website at
https://www.gov.uk/government/statistical-data-sets/vessel-lists-over-10-metres
The UK government web page provides several versions of the data set and for the practical we
will use the data from January 2020. The practical counts 10% towards the final mark of the
module.
Please read the instructions on how to write your report at the end of this document.
You can get help with the use of R in the timetabled practical sessions. These sessions take place
in the week beginning 8th March. You can also refer to the ‘Introduction to R’ on the module
web page for more explanations about the required R commands.
Task 1. Import the data into R and give an overview of the data, using summary statistics as
appropriate. If you access the data directly from the UK government webpage, it may be easiest
to first load the data into Microsoft Excel and then to save the relevant section of the spreadsheet
as a csv file. Make sure to read the explanations in the Excel file.
As hinted at in the original spreadsheet, rows are duplicated, with the duplicates differing only
in the column ‘Licence Category’. Since we are not interested in licence categories, remove the
column ‘Licence Category’ from the data, and then use the command unique() to remove the
excess rows. (You can type ‘help(unique)’ to learn how this command works.) Your report
should at least address the following issues:
• Give some general information about the data set.
• Convince the reader that you have imported the data correctly.
• State how many rows were removed as duplicates.
• What is the most common vessel name?
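The steps of Task 1 can be sketched on a small made-up data frame. This is only an illustration: the column names (Vessel.Name, Home.Port, Overall.Length, Licence.Category), the values and the file name in the comment are assumptions, and the names in the real file may differ.

```r
# Toy data standing in for the vessel list; all column names and values
# here are made up for illustration and will differ from the real file.
vessels <- data.frame(
  Vessel.Name      = c("AURORA", "AURORA", "BOY JOHN", "AURORA", "CELTIC STAR"),
  Home.Port        = c("NEWLYN", "NEWLYN", "ARDGLASS", "ARDGLASS", "NEWLYN"),
  Overall.Length   = c(12.5, 12.5, 18.0, 21.3, 15.2),
  Licence.Category = c("A", "B", "A", "A", "A"),
  stringsAsFactors = FALSE
)
# With the real data one would instead use something like
#   vessels <- read.csv("vessels.csv")   # hypothetical file name

# Quick checks that the import looks sensible.
str(vessels)
summary(vessels)

n_before <- nrow(vessels)

# Drop the licence column, then keep only one copy of each remaining row.
vessels$Licence.Category <- NULL
vessels <- unique(vessels)

# Number of rows removed as duplicates.
n_removed <- n_before - nrow(vessels)
n_removed

# Frequency table of names, sorted so the most common name comes first.
name_counts <- sort(table(vessels$Vessel.Name), decreasing = TRUE)
head(name_counts, 3)
```

Note that unique() compares complete rows, so two rows count as duplicates only if they agree in every remaining column.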
Task 2. For the remaining tasks we will consider only vessels where the home port is either
Ardglass (Northern Ireland) or Newlyn (Cornwall). From the full data set, extract two subsets,
corresponding to all vessels which have Ardglass or Newlyn as their home ports, respectively. Your
report should at least address the following issues:
• Explain how you split out the rows corresponding to a given home port.
• How many vessels have Ardglass as their home port?
• How many vessels have Newlyn as their home port?
• Comparing the two subsets to the full data set, would you consider vessels from the two
ports ‘typical’?
Task 3. We want to compare overall lengths and engine powers of vessels based in the two
ports. As a first step, plot histograms of overall length and engine power for both ports (i.e. four
histograms in total). Plot your histograms so that it is easy to compare the two ports.
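Histograms are easiest to compare when they share the same break points and the same horizontal axis. The following sketch shows one way to arrange this for a single variable, using simulated values in place of the real lengths. The variable names and the number of bins are arbitrary choices.

```r
# Simulated toy values, not the real data.
set.seed(1)
len_a <- rnorm(40, mean = 18, sd = 4)   # stand-in for Ardglass lengths
len_n <- rnorm(60, mean = 14, sd = 3)   # stand-in for Newlyn lengths

# Common break points covering both samples, so that the two panels
# use identical bins and an identical x-axis.
rng    <- range(c(len_a, len_n))
breaks <- seq(floor(rng[1]), ceiling(rng[2]), length.out = 16)

old_par <- par(mfrow = c(2, 1))   # two panels stacked vertically
hist(len_a, breaks = breaks, xlim = range(breaks),
     main = "Ardglass", xlab = "overall length")
hist(len_n, breaks = breaks, xlim = range(breaks),
     main = "Newlyn", xlab = "overall length")
par(old_par)                      # restore the previous layout
```

If the two groups differ a lot in size, adding freq = FALSE to both hist() calls plots densities instead of counts, which can make the shapes easier to compare.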
Task 4. Still considering Ardglass and Newlyn, use a t-test to test the following hypotheses at
the 5% level:
H0 : overall vessel lengths at both ports have the same mean
H1 : overall vessel lengths at both ports have different means
and
H0 : vessel engine powers at both ports have the same mean
H1 : vessel engine powers at both ports have different means.
For both tests, first compute the test statistic yourself (using simple R commands), and then re-do
the test using the R function t.test() (see help(t.test) for how to use this function). Make
sure that both methods give the same result. Comment on the applicability of the chosen test,
and discuss the results of the test.
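The following sketch compares a hand-computed test statistic with the output of t.test() on simulated data. It assumes that the pooled-variance (equal variances) two-sample t-test is the intended version, which this sheet does not state. By default t.test() performs the Welch test with unequal variances, so var.equal = TRUE is needed for the two computations to agree.

```r
# Simulated toy samples, not the real data.
set.seed(2)
x <- rnorm(40, mean = 18, sd = 4)
y <- rnorm(60, mean = 14, sd = 4)

n_x <- length(x)
n_y <- length(y)

# Pooled variance estimate (assumes both groups share one variance).
s2_pooled <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)

# Test statistic and its degrees of freedom.
t_stat <- (mean(x) - mean(y)) / sqrt(s2_pooled * (1 / n_x + 1 / n_y))
df     <- n_x + n_y - 2

# Two-sided critical value at the 5% level, and the p-value.
t_crit  <- qt(0.975, df)
p_value <- 2 * pt(-abs(t_stat), df)

# The built-in test; var.equal = TRUE requests the pooled version.
res <- t.test(x, y, var.equal = TRUE)

# Both pairs of numbers should agree.
c(by_hand = t_stat, built_in = unname(res$statistic))
c(by_hand = p_value, built_in = res$p.value)
```

H0 is rejected at the 5% level if abs(t_stat) exceeds t_crit, or equivalently if p_value is below 0.05.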
b) Don’t forget to attach the academic integrity form and submit the report as a single pdf file
c) Your report must be typeset (not handwritten) and must not exceed four pages, including
all figures and R code. Since space is limited, focus your discussions on the essentials and
does not count towards the page limit.
d) Write complete sentences, including correct punctuation.
e) Use plots to illustrate your results, and discuss each plot you include in the report. Take
care to use good axis labels, captions, etc. for your plots and make sure that your plots are
meaningful and easy to interpret.
f) Include and explain all R code you use to derive your results, but do not include unused or
unnecessary R code. Include the code in the main text rather than in an appendix. If you
use LaTeX, put the R code in verbatim mode. If you use Microsoft Word, put the R code
in Courier New font type and a font size of 10 (only for the R code and its output, not
the whole report).