Date: 2021-02-01

ST404 Assignment 1 2021

ST404 Applied Statistical Modelling 2021

1 Assignment 1

Assignment 1 counts for 25% of the module mark and consists of two deliverables:

1. An exploratory data analysis report in PDF format (20 marks).

Deadline for report submission: Monday, 8 February 12:00.

2. A slide presentation in Week 6 (5 marks).

Deadline for presentation submission: Monday, 8 February 12:00.

Assignment submission is via Moodle. Remember the advice given on report writing, proofreading and avoiding plagiarism! All sources used, whether online or on paper, need to be acknowledged and appropriately referenced. The data analysis report will be submitted to TurnItIn UK for a plagiarism check.

1.1 The data

The data to be used for assignments 1 and 2 is the USA Crime data. A full description of the data set can be found in a separate file on Moodle.

The data concerns crime rates in a sample of USA counties. A county is an administrative and political sub-division of a US state. Counties vary widely in both geographical area and population size. Each row of the data gives summary statistics for one county.

This kind of aggregated data is sometimes called “ecological”. This terminology may be confusing, as it has nothing to do with the science of ecology. The relationships between aggregated variables may be very different from the relationships at an individual level. You should be careful not to draw conclusions about individuals from your analysis of aggregated data. Drawing such conclusions is called the “ecological fallacy”.
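The gap between aggregated and individual-level relationships can be sketched with a toy example. The numbers below are invented purely for illustration and have nothing to do with the USA Crime data; the county labels and the helper `pearson` are hypothetical, and Python is used only as a convenient notation.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented individual-level (x, y) pairs, grouped by hypothetical county.
counties = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Individual level: correlation of x and y inside each county.
within = {name: pearson(*zip(*rows)) for name, rows in counties.items()}

# Aggregated level: one row per county, holding the county means.
mean_x = [mean(x for x, _ in rows) for rows in counties.values()]
mean_y = [mean(y for _, y in rows) for rows in counties.values()]
between = pearson(mean_x, mean_y)

print("within-county correlations:", within)
print("correlation of county means:", between)
```

In this constructed case the relationship is negative inside every county but positive across the county means, so a conclusion about individuals drawn from the aggregated rows would point in the wrong direction.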

One consequence of the ecological fallacy is that we cannot draw conclusions about what causes crime from these data. However, if we can identify factors associated with lower or higher rates of crime, we can understand how state resources are allocated to deal with the consequences of crime and identify priority areas for crime prevention programmes.

1.2 The task

The aim of this analysis is to investigate the factors that are associated with both violent and non-violent crime in different counties. Before building a formal statistical model we need to use exploratory data analysis (EDA) techniques to understand the distribution of the variables and the relationships between them. This assignment is all about EDA. The task of building a statistical model is deferred until assignment 2 and should not be done in this assignment.

Some key questions to ask are:

• Which variables show a strong relationship with the outcome variables?

  ◦ Can the relationship be characterized as linear?

  ◦ Does the relationship appear to be homoscedastic?

  ◦ What transformations, if any, might be applied to resolve any issues?

  ◦ Are there any other approaches that could be taken to tackle these issues?

• Which variables, if any, have a highly skewed distribution? What transformations might be applied to reduce skewness and stabilize the spread of the observations?

• Do any of the variables have outlying values? How should outliers be treated?

• Which variables are highly correlated with each other? Are there variables that represent different ways of measuring the same thing?

• Given all of the above, what recommendations would you suggest for preparing these data in order to fit a linear model?
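As a rough illustration of the questions on skewness, transformations and outliers, the sketch below computes a moment-based skewness coefficient and applies Tukey's boxplot rule before and after a log transformation. The data values are invented, the names `skewness` and `iqr_outliers` are hypothetical helpers, and the cut-off k = 1.5 is the conventional boxplot choice rather than anything prescribed by this assignment. Python is used only for illustration.

```python
import math
from statistics import mean, quantiles


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    m3 = mean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's boxplot rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


# Invented right-skewed values, standing in for something like population size.
population = [1_000, 2_000, 4_000, 8_000, 16_000, 32_000, 64_000, 128_000, 1_024_000]
log_population = [math.log10(v) for v in population]

print("skewness, raw scale:", round(skewness(population), 2))
print("skewness, log10 scale:", round(skewness(log_population), 2))
print("flagged on raw scale:", iqr_outliers(population))
print("flagged on log10 scale:", iqr_outliers(log_population))
```

With these made-up numbers the largest value is flagged on the raw scale but not after the log transformation, which shows why the treatment of outliers and the choice of transformation are best considered together.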

Some of the data values are missing. You should also investigate possible missing data mechanisms:

• Can you suggest a mechanism for missing data (MCAR/MAR/MNAR)?

• What should be done with missing values when you come to build your statistical model?
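One simple way to start probing the missing data mechanism is to compare a fully observed variable between the rows where another variable is missing and the rows where it is present. The sketch below does this on invented rows; the variable names and values are hypothetical and are not taken from the USA Crime data. Python is used only for illustration.

```python
from statistics import mean

# Invented rows: (population_in_thousands, median_income).
# None marks a missing income value.
rows = [
    (12, None), (15, None), (20, 31.0), (25, None), (40, 35.5),
    (60, 38.0), (90, 41.2), (150, 44.9), (300, 47.3), (800, 52.0),
]


def split_by_missingness(rows):
    """Split the fully observed first variable according to whether
    the second variable is missing or present in the same row."""
    missing = [x for x, y in rows if y is None]
    present = [x for x, y in rows if y is not None]
    return missing, present


missing, present = split_by_missingness(rows)
share_missing = len(missing) / len(rows)

print(f"share of rows with missing income: {share_missing:.0%}")
print(f"mean population when income is missing: {mean(missing):.1f}")
print(f"mean population when income is present: {mean(present):.1f}")
```

A clear difference between the two groups, as in this constructed example, argues against MCAR. It cannot by itself distinguish MAR from MNAR, because that distinction depends on the unobserved values.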

The assignment is deliberately open-ended. You should not assume that there is a single “correct” answer. It is expected that different teams will come to different conclusions. You will be judged by your ability to use sound statistical methodology to extract meaningful conclusions from real-world data, and to communicate your findings clearly to the target audience. Details of the marking scheme are given below.

Note that the emphasis of this part of the assignment is on exploration of the data before developing a more rigorous statistical model. Explore the various EDA tools from the lectures and motivate/support your conclusions with appropriate numerical and graphical evidence.

1.3 The report

The report should be written to a professional standard, use an 11pt font or larger, and contain the following sections:

1. Executive summary (maximum 3 bullet points): The aim of the executive summary is to give a succinct overview of the key messages of your report that is accessible to a lay audience.

2. Findings (maximum 2 pages): This section should contain a brief description of your main findings.

3. Statistical methodology (maximum 5 pages): This section should give a description of the EDA with appropriate numerical and graphical information. (It should also include an appropriately referenced bibliography.) The section may include, but should not necessarily be restricted to, a discussion of the following questions:

a) What issues did you identify in the initial exploratory analysis? How will these impact on modelling decisions when you come to build your statistical model?

b) Did you transform any of the variables? Why or why not?

c) Can you exclude any of the variables from the model building process? Why or why not?

4. A paragraph or table on "authors' contributions": a very brief description of what each team member contributed to the project (this is common practice in journals that publish multidisciplinary research) and the proposed mark weighting for each student (see below).

5. References: a bibliography containing references cited in the text.

6. Appendix: comprehensive and annotated R code that allows the initial exploratory analysis and model development to be reproduced.

1.4 The presentation

The oral presentation must be no more than 12 minutes long, and all students in a group should spend a roughly equal amount of time speaking.

The slides should contain a brief description of your methodology and your findings, and be as visually appealing as possible.

1.5 Marking criteria (Total 25 points)

Executive Summary [3 points]:

1. relevance of presented information;

2. appropriateness, clarity and correctness of language.

Findings [5 points]:

1. clarity and accuracy of overview;

2. quality and relevance of numerical and graphical output;

3. quality of conclusions presented;

4. appropriateness, clarity and correctness of language.

Statistical methodology [10 points]:

1. quality of exploratory analysis;

2. relevance and quality of numerical and graphical evidence;

3. structure and clarity, appropriate use of terminology, correctness of English.

Appendix [2 points]: appropriately annotated and complete.

Presentation [5 points]:

Slides

1. Layout, structure and visual appeal;

2. Accuracy and relevance of content.

Oral Presentation

1. Fluidity;

2. Persuasiveness;

3. Appropriate use of language;

4. Response to targeted questions, where appropriate.

1.6 Further Guidelines and Instructions

1.6.1 Layout

The report should be written in a font size of 11 or higher with 1.5 line spacing. All figures and tables should be numbered, have captions and be of appropriate size. Text included in figures (such as titles and axis labels) should be readable under the same conditions as the rest of the text. Margins should be sensible.

1.6.2 Penalties

Late submission (-5% per working day), over page limit (-5%), not using the prescribed layout (-5%).

1.6.3 Marks

For the delivery during the poster session students will receive an individual mark. The other deliverables (report and the poster itself) receive a group mark which will be distributed across team members using the weighting algorithm described below.

1.6.4 Peer Review

Each team should decide how to distribute the group mark by allocating to each team member a share of a total of n × 100%, where n is the number of students in the team. Each share acts as a weighting factor that converts the group mark into an individual mark. For example, suppose the group mark is 70% and a team of 5 students decides to allocate 100% to each team member; then each member receives a mark of 70%. On the other hand, if the team decides to allocate 108% to one team member and 98% to each of the other four members, then the former receives a mark of 75.6% and the latter four team members receive a mark of 68.6%. The maximum weighting factor that can be awarded is 110%; the minimum weighting factor is 90%. The module leader reserves the right to moderate the weighting factors, impose equal weighting factors and/or request further evidence.
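The weighting arithmetic described above can be sketched as a small function. The function name `individual_marks`, its parameter names and the rounding to one decimal place are assumptions made for illustration; only the 90% to 110% limits, the n × 100% total and the worked figures come from the text. Python is used only as a convenient notation.

```python
def individual_marks(group_mark, weights, min_weight=90.0, max_weight=110.0):
    """Convert a group mark (in %) into individual marks.

    `weights` holds one weighting factor (in %) per team member.
    The factors must lie within the allowed limits and sum to n x 100%.
    """
    n = len(weights)
    for w in weights:
        if not min_weight <= w <= max_weight:
            raise ValueError(f"weight {w}% is outside {min_weight}% to {max_weight}%")
    if abs(sum(weights) - 100.0 * n) > 1e-9:
        raise ValueError("weights must sum to n x 100%")
    # Rounding to one decimal place is an assumption, not a stated rule.
    return [round(group_mark * w / 100.0, 1) for w in weights]


# The two scenarios from the text, for a team of 5 and a group mark of 70%.
print(individual_marks(70, [100, 100, 100, 100, 100]))
print(individual_marks(70, [108, 98, 98, 98, 98]))
```

The second call reproduces the figures in the example: 70 × 1.08 = 75.6 for one member and 70 × 0.98 = 68.6 for each of the other four.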
