Date: 2023-04-17
Chapter 2 - Organising and Visualising Data
STAT1008
Quantitative Research Methods
Review question – Survey error
1. The Crime Victimisation Survey is
designed to provide statistics on crime-related
events. Respondents are asked to
recall their experiences of events that
occurred in the last 12 months. What is a
potential source of survey error from this
method of data collection?
Review question – Survey Error
2. A survey is conducted to measure the
level of physical activity among year 12
students. Prior to completing the survey,
students are given information on the
benefits of physical activity. What is a
potential source of survey error from reading
the additional information prior to completing
the survey?
Summarising Categorical Data
• Data: GRADES_Ch2.xls (from the textbook)
• Summary Table: gives the frequency/count/proportion
of the data in each level of the categorical variable
SUMMARY TABLE
Grades    Frequency   Percentage
HD             4           7%
DIST           7          13%
CREDIT        11          20%
PASS          24          44%
FAIL           9          16%
Total         55         100%
In Excel – use COUNTIF() function
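For readers working outside Excel, the same summary table can be sketched in Python. This is only an illustration: the list `grades` is rebuilt from the counts shown in the table (the raw GRADES_Ch2.xls column is not reproduced here), and `summary_table` is a hypothetical helper name.

```python
from collections import Counter

# Grade counts copied from the summary table; this list stands in for the
# Grades column of GRADES_Ch2.xls, which is not reproduced here.
grades = (["HD"] * 4 + ["DIST"] * 7 + ["CREDIT"] * 11
          + ["PASS"] * 24 + ["FAIL"] * 9)


def summary_table(values):
    """Return {level: (frequency, percentage)} for a categorical variable."""
    counts = Counter(values)            # plays the role of COUNTIF per level
    total = sum(counts.values())
    return {level: (n, 100 * n / total) for level, n in counts.items()}


table = summary_table(grades)
for level, (freq, pct) in table.items():
    print(f"{level:<7}{freq:>4}{pct:>6.0f}%")
print(f"{'Total':<7}{len(grades):>4}")
```

Rounding each percentage to a whole number reproduces the Percentage column above.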
Visualising Categorical Data
• Bar Chart: Graphical representation of summary table.
• What conclusions can you draw from the bar chart of
grades?
In Excel – insert 2D column chart
[Figure: bar chart "Grade Percentage"; x-axis: FAIL, PASS, CREDIT, DIST, HD; y-axis: 0% to 50%]
Summarising Two Categorical Variables
• Two-way summary table (contingency table) (already
created in data sheet)
Visualising Two Categorical Variables
• Create side-by-side bar charts
• Data: ROAD_FATALITIES_example.xls. Interpret the
graph.
[Figure: side-by-side bar chart "Road Fatalities by Age and Gender"; x-axis age groups: < 10, 10 to < 20, 20 to < 30, 30 to < 40, 40 to < 50, 50 to < 60, 60 to < 70, 70 to < 80, 80 to < 90, 90 or more, Unknown; y-axis: 0% to 30%; series: Male, Female]
In Excel – insert 2D column chart > Select Data > add two series of data
Summarising Two Categorical Variables
• Data: Property.xls
Create a Percentage Contingency Table for number of
bedrooms and location
Count of Bedrooms (percentage of grand total), Location by Bedrooms
Location        1       2       3       4       5       6       8   Grand Total
Rural       2.00%   5.00%  16.00%  10.00%   1.00%   0.00%   0.00%      34.00%
Town        4.00%  14.00%  29.00%  14.00%   3.00%   1.00%   1.00%      66.00%
Grand Total 6.00%  19.00%  45.00%  24.00%   4.00%   1.00%   1.00%     100.00%
In Excel: Insert > Pivot Table > drag variable names into the Row/Column fields >
change the value field settings
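As a rough non-Excel counterpart to the pivot table, a percentage contingency table can be sketched in Python. The records below are invented purely for illustration and are not the contents of Property.xls; `percentage_contingency` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (location, bedrooms) records, invented for illustration only;
# they are NOT the contents of Property.xls.
records = [
    ("Rural", 2), ("Rural", 3), ("Rural", 3), ("Town", 1),
    ("Town", 2), ("Town", 3), ("Town", 3), ("Town", 3),
    ("Town", 4), ("Rural", 4),
]


def percentage_contingency(pairs):
    """Return {(row, col): percentage of the grand total} for paired categories."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {cell: 100 * n / total for cell, n in counts.items()}


cells = percentage_contingency(records)
rows = sorted({r for r, _ in cells})
cols = sorted({c for _, c in cells})

# Row totals give the marginal percentage for each location.
row_totals = {r: sum(cells.get((r, c), 0.0) for c in cols) for r in rows}

for r in rows:
    line = "  ".join(f"{cells.get((r, c), 0.0):5.1f}%" for c in cols)
    print(f"{r:<6} {line}  | {row_totals[r]:5.1f}%")
```

Each cell is a share of the grand total, so all cells together sum to 100%, matching the layout of the table above.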
• IGNORE PIE CHART
• IGNORE STEM AND LEAF PLOT
• ORDERED ARRAYS – means sort the
data in order of magnitude
Summarising Numerical Data
• Data: GRADES_Ch2.xls (from the textbook)
• Frequency Table/Distribution – analogous to summary
table for categorical data
Step 1: Bin numerical values into intervals
Example
• Consider the variable 'semester mark'.
• Range = maximum – minimum = 49 – 12 = 37
• Desired number of bins = 5 (this can vary; choose the
number of bins so that the number of observations per bin is
neither too small nor too large)
• Interval width = 37/5 = 7.4 (not a user-friendly value).
Let's round up to 10.
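The arithmetic in Step 1 can be sketched in Python as follows. The minimum, maximum and desired bin count come from the example; rounding up to a multiple of 10 and starting the first interval at 11 are assumptions chosen to match the intervals used in the slides, and `round_up_to_multiple` is a hypothetical helper.

```python
import math

minimum, maximum = 12, 49       # semester mark extremes from the example
desired_bins = 5

data_range = maximum - minimum              # 49 - 12 = 37
raw_width = data_range / desired_bins       # 37 / 5 = 7.4


def round_up_to_multiple(x, step):
    """Round x up to the next multiple of step (hypothetical helper)."""
    return step * math.ceil(x / step)


width = round_up_to_multiple(raw_width, 10)     # 7.4 -> 10

# Inclusive integer intervals, each starting one above a multiple of the width.
start = (minimum - 1) // width * width + 1      # 11
bins = []
lower = start
while lower <= maximum:
    bins.append((lower, lower + width - 1))
    lower += width

print(data_range, raw_width, width, bins)
```

Note that rounding the width up to 10 covers the data with four intervals rather than the five originally aimed for, which is acceptable because the number of bins is only a guide.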
Step 2: Create Frequency Table
Grade Interval (inclusive)
Lower   Upper   Freq   Rel. Freq   Percentage   Cum Percent
11      20        6    0.11        11%           11%
21      30        9    0.16        16%           27%
31      40       29    0.53        53%           80%
41      50       11    0.20        20%          100%
Total            55
In Excel: use the COUNTIF() function or the Histogram tool in the Data Analysis ToolPak
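The derived columns of the frequency table can be checked with a short Python sketch. The intervals and frequencies are copied from the table; the variable names are illustrative only.

```python
from itertools import accumulate

# Interval bounds and frequencies copied from the frequency table.
intervals = [(11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [6, 9, 29, 11]

total = sum(freqs)                                    # 55
rel_freqs = [f / total for f in freqs]                # relative frequency
percents = [100 * f / total for f in freqs]           # percentage
cum_percents = [100 * c / total for c in accumulate(freqs)]   # running total

for (lo, hi), f, rf, p, cp in zip(intervals, freqs, rel_freqs,
                                  percents, cum_percents):
    print(f"{lo}-{hi}  {f:>3}  {rf:.2f}  {p:3.0f}%  {cp:3.0f}%")
```

The cumulative percentage is computed from the running total of frequencies rather than by adding rounded percentages, which avoids accumulating rounding error.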
Visualising Numerical Data
• Histogram: A bar plot of the frequency table for
numerical data
Interpret the histogram. Which ranges of values are less likely to be observed?
Where is the grade distribution concentrated?
[Figure: "Histogram of Semester Mark"; x-axis intervals: 11-20, 21-30, 31-40, 41-50; y-axis: frequency, 0 to 35]
• IGNORE PERCENTAGE POLYGON and
OGIVE
Visualising Two Numerical Variables
• Scatter Plot – a plot of one numerical
variable (x-axis) versus another numerical
variable (y-axis)
• Data: GRADES_CH2.xls – draw a scatter
plot between Exam Mark and Semester
Mark. Comment on the association
between the two variables.
In Excel: Insert > Scatter Plot
Comment on the association between the two variables.
[Figure: scatter plot of Exam Mark (y-axis, 0 to 50) against Semester Mark (x-axis, 0 to 60), with two annotated lines: the line of best fit (regression line) and the line Semester mark = Exam mark]
Time series plot
• Plot the data vs time (x-axis). Can observe patterns in
the value of a variable over time.
[Figure: time series plot of FXRUSD; x-axis: date, tick labels 01-Jan-…, 01-May-…, 01-Sep-… repeating (years truncated in the source); y-axis: 0.0000 to 1.2000]
• IGNORE SECTION 2.6 – just contains
more definitions on subfields of analytics
and associated system setups
Misleading graphs
• Consider 5 sales agents denoted as persons A, B, C, D
and E. The bar chart shows their sales volumes for the
past month. Which sales agents show outstanding
performance relative to the others?
[Figure: bar chart of sales volume for agents A, B, C, D and E; y-axis: 8600 to 9800]
Consider the revised graph. Which sales agents show
outstanding performance relative to the others?
[Figure: revised bar chart of sales volume for agents A, B, C, D and E; y-axis: 0 to 10000]
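Why the two charts give such different impressions can be shown with a small Python sketch. The sales figures below are hypothetical, since the slides do not give the bar heights numerically; only the two axis starting points (8600 and 0) come from the charts.

```python
# Hypothetical sales volumes, invented for illustration; the bar heights
# in the slides are not given numerically.
sales = {"A": 9000, "B": 9600}


def drawn_height(value, axis_min):
    """Height of a bar when the vertical axis starts at axis_min."""
    return value - axis_min


def height_ratio(axis_min):
    """Ratio of B's drawn bar to A's drawn bar for a given axis start."""
    return drawn_height(sales["B"], axis_min) / drawn_height(sales["A"], axis_min)


truncated = height_ratio(8600)   # axis starting at 8600, as in the first chart
full = height_ratio(0)           # axis starting at 0, as in the revised chart

print(f"axis from 8600: B's bar is {truncated:.2f}x A's")
print(f"axis from 0:    B's bar is {full:.2f}x A's")
```

With these assumed values, a difference of under 7% in sales is drawn as a bar 2.5 times as tall when the axis starts at 8600, which is how a truncated axis exaggerates small differences.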
Misleading use of graphs
Misleading use of statistics
Kellogg's Mini-Wheats lawsuit.
• https://www.npr.org/sections/thesalt/2013/05/30/187330235/no-frosted-mini-wheats-don-t-make-your-kids-smarter
Kellogg's claim:
"Based upon independent clinical research, kids who ate
Frosted Mini Wheats cereal for breakfast had up to 18%
better attentiveness three hours after breakfast than kids
who ate no breakfast".
Kellogg's agreed to a $4m settlement in a class-action
lawsuit over the deceptive marketing campaign.
Ethical Issues (section 3.6 of text)
• Results should be presented in a “fair, objective
and neutral manner”.
• It is unethical to choose “an inappropriate
summary measure … to distort the facts to
support a particular position.”
• Results should not be selectively quoted to
support a particular conclusion, or to exaggerate
certainty, for example by showing only results
from businesses where a new technology gave
improvements, or by showing data from a
conveniently selected time range.

