MATH1712 Probability and Statistics II

Practical 2

http://www1.maths.leeds.ac.uk/~arief/MATH1712/

Arief Gusnanto, MATH1712@leeds.ac.uk

2020/21, semester 2

In this practical we consider a data set about UK fishing vessels, available from the module external webpage

https://www1.maths.leeds.ac.uk/~arief/MATH1712

under the heading Data. The dataset originally comes from the UK government website at

https://www.gov.uk/government/statistical-data-sets/vessel-lists-over-10-metres

The UK government web page provides several versions of the data set; for this practical we will use the data from January 2020. The practical counts 10% towards the final mark of the module.

The deadline for handing in your report is Wednesday, 17th March, 12 noon. Instructions on how to write your report are given at the end of this document.

You can get help with the use of R in the timetabled practical sessions. These sessions take place in the week beginning 8th March. You can also refer to the ‘Introduction to R’ on the module web page for more explanations about the required R commands.

Task 1. Import the data into R and give an overview of the data, using summary statistics as appropriate. If you access the data directly from the UK government webpage, it may be easiest to first load the data into Microsoft Excel and then save the relevant section of the spreadsheet as a csv file. Make sure to read the explanations in the Excel file.

As hinted at in the original spreadsheet, rows are duplicated, with the duplicates differing only in the column ‘Licence Category’. Since we are not interested in licence categories, remove the column ‘Licence Category’ from the data, and then use the command unique() to remove the excess rows. (You can type help(unique) to learn how this command works.) Your report should at least address the following issues:

Give some general information about the data set.

Convince the reader that you have imported the data correctly.

State how many rows were removed as duplicates.

What is the most common vessel name?
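The de-duplication step in Task 1 can be sketched on a toy data frame as follows. This is only an illustration under stated assumptions: the vessel names and values are made up, and the column names (Vessel.Name, Home.Port, Overall.Length, Licence.Category) are guesses at what read.csv() would produce from the spreadsheet headers, so they need to be checked against the real file.

```r
# Made-up stand-in for the imported spreadsheet. In practice the data would
# come from something like read.csv("vessels.csv"); that file name and the
# column names below are assumptions, not taken from the real data.
vessels <- data.frame(
  Vessel.Name      = c("Aurora", "Aurora", "Boy John", "Aurora", "Celtic Dawn"),
  Home.Port        = c("Newlyn", "Newlyn", "Ardglass", "Ardglass", "Newlyn"),
  Overall.Length   = c(12.5, 12.5, 18.0, 21.3, 10.9),
  Licence.Category = c("A", "B", "A", "A", "A"),
  stringsAsFactors = FALSE
)

# Drop the licence column, then remove rows that are now exact duplicates.
no_licence <- vessels[, names(vessels) != "Licence.Category"]
deduped    <- unique(no_licence)

# Number of rows removed as duplicates.
n_removed <- nrow(no_licence) - nrow(deduped)

# Most common vessel name: tabulate, sort in decreasing order, take the first.
name_counts <- sort(table(deduped$Vessel.Name), decreasing = TRUE)
most_common <- names(name_counts)[1]

str(deduped)      # column types, a quick check that the import looks right
summary(deduped)  # summary statistics for each column
```

Comparing nrow() before and after unique() gives the number of removed rows directly. If several names are tied for the highest count, names(name_counts)[1] reports only one of them, so it is worth looking at the head of name_counts as well.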

Task 2. For the remaining tasks we will consider only vessels where the home port is either Ardglass (Northern Ireland) or Newlyn (Cornwall). From the full data set, extract two subsets: one containing all vessels with Ardglass as their home port, and one containing all vessels with Newlyn as their home port. Your report should at least address the following issues:

Explain how you split out the rows corresponding to a given home port.

How many vessels have Ardglass as their home port?

How many vessels have Newlyn as their home port?

Comparing the two subsets to the full data set, would you consider vessels from the two ports ‘typical’?
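One possible way to split out the rows for a given home port is logical indexing, sketched below on made-up data. The column names (Home.Port, Overall.Length, Engine.Power) and the spelling of the port names are assumptions; in the real file the ports may be written differently (for example in capitals), which unique(vessels$Home.Port) would reveal.

```r
# Made-up stand-in for the de-duplicated data; column names are assumptions.
vessels <- data.frame(
  Home.Port      = c("Newlyn", "Ardglass", "Newlyn", "Peterhead", "Ardglass", "Newlyn"),
  Overall.Length = c(11.2, 19.8, 14.6, 27.4, 22.1, 10.4),
  Engine.Power   = c(120, 310, 175, 560, 400, 95),
  stringsAsFactors = FALSE
)

# Logical indexing: keep the rows whose home port matches, keep all columns.
ardglass <- vessels[vessels$Home.Port == "Ardglass", ]
newlyn   <- vessels[vessels$Home.Port == "Newlyn", ]

nrow(ardglass)  # number of vessels based in Ardglass
nrow(newlyn)    # number of vessels based in Newlyn

# Compare each subset with the full data set, here for overall length.
summary(vessels$Overall.Length)
summary(ardglass$Overall.Length)
summary(newlyn$Overall.Length)
```

If the home port column contains missing values, the comparison with == yields NA and produces rows of NAs in the result; subset(vessels, Home.Port == "Ardglass") drops such rows instead.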

Task 3. We want to compare overall lengths and engine powers of vessels based in the two ports. As a first step, plot histograms of overall length and engine power for both ports (i.e. four histograms in total). Plot your histograms so that it is easy to compare the two ports.
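One way to make the four histograms easy to compare is to draw them in a 2 x 2 grid and use the same bins for both ports. The sketch below uses simulated numbers so that it runs without the real data; the data frames ardglass and newlyn and their column names are assumptions, and the axis labels should be extended with the units stated in the spreadsheet.

```r
set.seed(1)  # simulated values only, so the sketch runs without the real data
ardglass <- data.frame(Overall.Length = runif(30, 10, 30),
                       Engine.Power   = runif(30, 80, 600))
newlyn   <- data.frame(Overall.Length = runif(40, 10, 25),
                       Engine.Power   = runif(40, 60, 450))

# Shared break points per variable, so both ports use identical bins.
len_breaks <- pretty(range(c(ardglass$Overall.Length, newlyn$Overall.Length)), n = 10)
pow_breaks <- pretty(range(c(ardglass$Engine.Power, newlyn$Engine.Power)), n = 10)

# 2 x 2 grid filled row by row: one row per port, one column per variable.
old_par <- par(mfrow = c(2, 2))
hist(ardglass$Overall.Length, breaks = len_breaks, freq = FALSE,
     main = "Ardglass", xlab = "Overall length")
hist(ardglass$Engine.Power, breaks = pow_breaks, freq = FALSE,
     main = "Ardglass", xlab = "Engine power")
hist(newlyn$Overall.Length, breaks = len_breaks, freq = FALSE,
     main = "Newlyn", xlab = "Overall length")
hist(newlyn$Engine.Power, breaks = pow_breaks, freq = FALSE,
     main = "Newlyn", xlab = "Engine power")
par(old_par)  # restore the previous plotting layout
```

With this layout the two histograms of the same variable sit directly above each other on a common horizontal axis. Setting freq = FALSE plots densities rather than counts, which keeps the shapes comparable when the two ports have different numbers of vessels.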

Task 4. Still considering Ardglass and Newlyn, use a t-test to test the following hypotheses at the 5% level:

H0 : overall vessel lengths at both ports have the same mean

H1 : overall vessel lengths at both ports have different means

and

H0 : vessel engine powers at both ports have the same mean

H1 : vessel engine powers at both ports have different means.

For both tests, first compute the test statistic yourself (using simple R commands), and then re-do the test using the R function t.test() (see help(t.test) for how to use this function). Make sure that both methods give the same result. Comment on the applicability of the chosen test, and discuss the results of the test.
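The comparison between a hand-computed statistic and t.test() might look like the sketch below, which uses simulated samples x and y in place of the real measurements. The task does not say which variant of the two-sample t-test is intended, so both are shown as an assumption: the pooled version, which assumes equal variances and corresponds to var.equal = TRUE, and the Welch version, which is what t.test() computes by default.

```r
set.seed(2)  # simulated samples standing in for one variable at the two ports
x <- rnorm(30, mean = 18, sd = 5)  # e.g. values for the first port
y <- rnorm(40, mean = 15, sd = 4)  # e.g. values for the second port

n1 <- length(x)
n2 <- length(y)

# Pooled two-sample t statistic (assumes equal population variances).
sp2      <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
t_pooled <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_pooled <- 2 * pt(-abs(t_pooled), df = n1 + n2 - 2)  # two-sided p-value

# Welch statistic (no equal-variance assumption).
t_welch <- (mean(x) - mean(y)) / sqrt(var(x) / n1 + var(y) / n2)

# Built-in versions for comparison.
res_pooled <- t.test(x, y, var.equal = TRUE)
res_welch  <- t.test(x, y)  # default is var.equal = FALSE, i.e. Welch

# Each comparison should report TRUE if the two methods agree.
all.equal(unname(res_pooled$statistic), t_pooled)
all.equal(res_pooled$p.value, p_pooled)
all.equal(unname(res_welch$statistic), t_welch)
```

H0 is rejected at the 5% level when the two-sided p-value is below 0.05. Whichever variant is used by hand, the var.equal argument of t.test() has to match it, otherwise the two results will differ.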

Writing your report:

a) Clearly mark your report with your name and your student ID.

b) Don’t forget to attach the academic integrity form and submit the report as a single pdf file to Gradescope via Minerva.

c) Your report must be typeset (not handwritten) and must not exceed four pages, including all figures and R code. Since space is limited, focus your discussions on the essentials and think about what is most important to include in your report. The academic integrity form does not count towards the page limit.

d) Write complete sentences, including correct punctuation.

e) Use plots to illustrate your results, and discuss each plot you include in the report. Take care to use good axis labels, captions, etc. for your plots and make sure that your plots are meaningful and easy to interpret.

f) Include and explain all R code you use to derive your results, but do not include unused or unnecessary R code. Include the code in the main text rather than in an appendix. If you use LaTeX, put the R code in verbatim mode. If you use Microsoft Word, set the R code in Courier New at font size 10 (this applies only to the R code and its output, not to the whole report).
