113P-Python assignment ghostwriting
Date: 2023-11-03
Commonwealth of Australia
WARNING
This material has been reproduced and communicated to you by or on behalf of the Navigator Union Pty Ltd in
accordance with section 113P of the Copyright Act 1968 (Act).
The material in this communication may be subject to copyright under the Act. Any further reproduction or
communication of this material by you may be the subject of copyright protection under the Act.
QBUS6860
Visual Data Analytics
2023 S2 – Group Assignment
Prepared by 糖糖
Assignment:
Background
In this group assignment you will work on retail turnover and consumer price index
data from 2010-2023 in Australia. You will be given specific key business questions to
address. You must present your results in a 10-page report to be uploaded to Canvas.
You should also submit any Python code used to generate your visualisations.
The brief
You are working for a consulting company who have been asked to prepare a report
that focuses on the following key business questions (KBQs).
1) What is the relationship between retail turnover and inflation in Australia, where
inflation is measured via the change in the CPI index? Which seems to lead the
relationship, i.e. is turnover affected by inflation, or vice versa? And at what time
lag(s)?
2) Are any such relationships uniform in nature over different industry sectors and
over the states, or does the relationship between turnover and inflation behave
differently in different industry sectors and/or in different states?
3) What was the effect of the Covid period on the relationships between retail
turnover and inflation? Did it affect different sectors or different states differently?
Finally,
4) In light of the intricate relationship between retail turnover and inflation, please
formulate your own distinctive key business research question to address. Please
approach this task highlighting your originality in both the KBQ and how you address
it.
Key Information
1. This is a group assignment. You may form groups of three or four students however
we strongly recommend groups of three. Instructions to register your group can be
found here. It is entirely your responsibility to get yourself into a group of 3 or 4
students (no exceptions).
2. You are required to submit:
   1. ONE written report (Word or PDF format, through Canvas).
   2. Python “.py” or Jupyter Notebook “.ipynb” files. You may submit multiple
   files.
3. The late penalty for the assignment is 5% of the assigned mark per
calendar day, starting after 23:59 on 3rd November 2023. We will be
lenient with assignments that are marginally late due to slow uploads.
4. The main text of your report (including everything except for possible
appendices) should have a maximum of 10 pages in 12-point Times New
Roman (or Calibri) fonts and single line spacing, including all the plots,
figures and tables (if any). Any cover page, appendices and/or a list of
references are NOT counted towards the 10 page limit.
5. By 23rd October you should submit a progress report. This only
requires you to submit details of early group meetings (in particular who
attended and what each team member’s responsibilities will be). This
will not be marked, but will be used as evidence if there are complaints
about free riders in the group.
Data
Retail turnover
Retail turnover for different industry sectors and different states are collected and
published by the Australian government and made available via the ABS, see
https://data.gov.au/data/dataset/retail-trade. The data for your assignment were
downloaded from data.gov.au, a platform for sharing open-source data. The data
contain the following columns:
•DATAFLOW: A fixed tag for the whole dataset
•MEASURE: Measure: A fixed tag for the measure used (prices) in the dataset
•INDUSTRY: Industry: A number and a name for the industry sector.
•TSEST: Adjustment Type: A fixed number and name indicating the original
data, not seasonally adjusted, is given.
•REGION: Region: The name of the region the data applies to, e.g. the state of
Australia
•FREQ: Frequency: A fixed name indicating the data are on the monthly
frequency.
•TIME_PERIOD: Time Period: Year and month the data apply to.
•OBS_VALUE: The actual retail turnover data
•UNIT_MEASURE: Unit of Measure: A fixed name indicating the data are in
Australian dollars
•UNIT_MULT: Unit of Multiplier: A fixed name indicating the data are in
millions of dollars
Consumer Price Index (CPI)
The Consumer Price Index (CPI) measures quarterly changes in the price of a 'basket'
of goods and services which account for a high proportion of expenditure by the CPI
population group (i.e. metropolitan households). The data contain the following
columns:
•DATAFLOW: A fixed tag for the whole dataset,
•MEASURE: Measure: A fixed tag for the measure used (Index numbers) in the
dataset,
•INDEX: Index: A number and the name of the industry sector the data pertains
to,
•TSEST: Adjustment Type: A fixed number and name indicating the original
data, not seasonally adjusted, is given,
•REGION: Region: The name of the region the data applies to, e.g. the state of
Australia
Group assignment rubric
Quality of writing: Report is clearly written, well structured and conveys information in an effective and
succinct fashion.
Addresses Key Business Questions (KBQs): Clearly addresses key business questions (KBQs) and
visualisations are carefully chosen to align with KBQs and provide insight.
Data cleaning and preparation: Analysis is reproducible in the sense that all data cleaning steps are
clearly described and justified. Python code can be run without errors.
Principles of Visualisation: Principles of good visualisation are followed.
Sophistication of Visuals: Diverse use of plots. Highly sophisticated plots are used, for example those
showing the spatial nature of data or a large number of variables.
Thoroughness of analysis: Sensitivity of results to different measures/assumptions is investigated but in
a way that does not distract from the flow of the main findings (e.g. through the use of appendices).
Discussion of Shortcomings: Shortcomings of data and shortcomings of analysis are discussed in an
insightful way.
Key Concepts Review
Week 2: Data Preparation
1. Operations on Data Frames
Six "simple machines" of data frames:
Transforming: Create a new variable based on values of existing variables.
Sorting: Sort the data according to one of the variables.
Filtering: Select only some subset of the data (select rows, select columns, or select by a logical condition).
Group by/aggregate: e.g. compare total points scored by players' country of birth.
  The groupby function tells us the variable to group on: Country of Birth.
  The agg function tells us which variable to aggregate: points.
Reshaping (melting and casting): To produce the visualisation we may need to reshape the data.
  The function melt converts the data from wide to long.
  The function pivot converts the data from long to wide.
Joining (merging): Bring two data frames together, similar to a VLOOKUP-type function in spreadsheet programs such as Excel.
  Different types of merge:
  Left: Keep all entries from the first data frame
  Right: Keep all entries from the second data frame
  Inner: Keep all entries that appear in both data frames
  Outer: Keep all entries that appear in either data frame
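The six operations can be sketched in pandas as follows. This is only an illustration: the player table, the team table and all column names are made up for the example and are not part of the assignment data.

```python
import pandas as pd

# Hypothetical player data, invented for illustration only.
players = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "country": ["USA", "USA", "Australia", "Australia"],
    "points": [30, 10, 20, 25],
    "minutes": [30, 20, 25, 25],
})

# Transforming: create a new variable from existing variables.
players["points_per_min"] = players["points"] / players["minutes"]

# Sorting and filtering (here, filtering by a logical condition).
ranked = players.sort_values("points", ascending=False)
high_scorers = players[players["points"] >= 20]

# Group by / aggregate: total points by country of birth.
totals = players.groupby("country").agg(total_points=("points", "sum"))

# Reshaping: melt goes wide -> long, pivot goes long -> wide.
long_df = players.melt(id_vars="player", value_vars=["points", "minutes"],
                       var_name="measure", value_name="value")
wide_df = long_df.pivot(index="player", columns="measure", values="value")

# Joining: the `how` argument selects the type of merge.
teams = pd.DataFrame({"player": ["A", "B", "E"], "team": ["X", "Y", "Z"]})
left = players.merge(teams, on="player", how="left")    # all rows of players
right = players.merge(teams, on="player", how="right")  # all rows of teams
inner = players.merge(teams, on="player", how="inner")  # players in both
outer = players.merge(teams, on="player", how="outer")  # players in either
```

Comparing the row counts of `left`, `right`, `inner` and `outer` is a quick way to check that a merge kept the entries you expected.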
2 Data Profiling
Some issues
2.1 Duplicated entries
Can be removed using drop_duplicates function in pandas.
2.2 Data entry errors
For example cannot score 20000 points in 3 minutes.
Requires domain knowledge.
These issues can be discovered during visualisation.
2.3 Standardisations
Steph Curry may appear elsewhere in the data as
"Stephen Curry"
"Wardell Stephen Curry II" (his full name).
Similar things happen with company names
Facebook/ Meta, General Motors/ GM etc.
Requires domain knowledge and some extra coding.
2.4 Missing Data
In Python, missing values are represented by NaN.
Missing values are often replaced with a number, but this is a bad strategy:
Do not use "0" for "missing"; the zeros will distort the mean.
Do not use "-999" for "missing", even though a completely implausible
number such as -999 is sometimes used to denote missing.
This can lead to strange visualisations.
Dealing with missing data
Only use complete cases
Impute missing values
With a random value
With mean/median or mode
More complicated models
Report/Visualise missing data
By reporting missing data this gives a better idea of uncertainty.
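A minimal sketch of the duplicate and missing-data steps above, assuming a small made-up table of scores (the column names and values are hypothetical). It removes duplicates, reports the number of missing values, and compares complete cases, median imputation and the discouraged zero fill.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one duplicated row and one missing value.
scores = pd.DataFrame({
    "player": ["A", "B", "B", "C", "D"],
    "points": [10.0, 20.0, 20.0, np.nan, 30.0],
})

# 2.1 Duplicated entries: drop exact duplicate rows.
deduped = scores.drop_duplicates()

# Report missing data rather than hiding it.
n_missing = int(deduped["points"].isna().sum())

# Option 1: only use complete cases.
complete = deduped.dropna(subset=["points"])

# Option 2: impute with the median of the observed values.
imputed = deduped["points"].fillna(deduped["points"].median())

# Replacing missing values with 0 pulls the mean down.
zero_filled_mean = deduped["points"].fillna(0).mean()
complete_mean = complete["points"].mean()
```

Here `complete_mean` is 20 while `zero_filled_mean` is 15, which shows how a zero placeholder distorts the mean.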
Week 3: Visualising One Variable
1. Categorical variables
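Categorical variables describe the category something belongs to, e.g. gender, subject, nationality.
1.1 Bar chart
One axis shows the categories and the other shows the frequency.
Bar charts can be used for comparing or ranking.
Color: the more categories there are, the more distracting multiple colors become; a single color is often better.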
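Ordinal data: Always order according to the categories of the variable. If there is a rating, sort by the rating; e.g. diamond
clarity grades are ordered IF, VVS1, VVS2, VS1, VS2, SI1, SI2, I1, and should not be sorted alphabetically or by magnitude.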
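One way to keep ordinal categories in their natural order is an ordered categorical in pandas. The sketch below assumes a small made-up sample of diamond clarity grades; only the grade order itself comes from the notes.

```python
import pandas as pd

# Clarity grades in their natural (ordinal) order.
clarity_order = ["IF", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2", "I1"]

# Hypothetical sample of diamonds, invented for illustration.
diamonds = pd.DataFrame(
    {"clarity": ["SI1", "IF", "VS2", "SI1", "I1", "VVS1", "VS2", "SI1"]}
)

# Declare the column as an ordered categorical so sorting and grouping
# follow the grade order instead of alphabetical order.
diamonds["clarity"] = pd.Categorical(
    diamonds["clarity"], categories=clarity_order, ordered=True
)

# Frequencies per grade, in grade order, including grades with zero count.
counts = diamonds.groupby("clarity", observed=False).size()

# Sorting the column now also respects the grade order.
sorted_clarity = diamonds["clarity"].sort_values().tolist()
```

Passing `counts` to a bar chart then draws the bars from IF to I1 rather than alphabetically.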
1.2 Lollipop charts
When to use:
The number of categories is large and the category frequencies are similar.
1.3 Pie charts
Generally regarded as poor practice; avoid them where possible.
It is difficult to compare sizes of angles.
It is difficult to make comparisons unless categories are close.
They do not handle large numbers of categories.
A donut chart is a pie chart with a hole. It is even worse than a pie chart.
1.4 Tree maps
A treemap consists of packed rectangles where the area of a rectangle corresponds to the size of a
particular measure.
Attributes include:
The size (area) of rectangles
The color of the rectangles
When to use:
Use when the number of categories is truly huge.
Particularly well suited when categories follow a hierarchy.
Drawback: it is hard to compare the exact areas of the rectangles.
2. Numerical Data
2.1 Histogram
The equivalent of a bar chart for numerical data is a histogram.
The area of each bar represents the frequency within a certain interval.
If all bars have equal width then frequency is mapped to the length of the bars too.
Zero should always be included on the y axis (but not necessarily x axis).
Example features seen in a histogram: right skew; a few big outliers; a spike (second 'mode') at around $50.
The number of bins can be adjusted.
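The point that area, not height, represents frequency can be checked numerically. The sketch below uses NumPy with made-up spending values and deliberately unequal bin widths; both the data and the bin edges are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical spending amounts in dollars.
spend = np.array([5, 12, 18, 22, 25, 31, 38, 47, 52, 95], dtype=float)

# Unequal-width bins: three of width 20 and one of width 40.
edges = np.array([0.0, 20.0, 40.0, 60.0, 100.0])

# Raw counts per bin.
counts, _ = np.histogram(spend, bins=edges)

# With density=True the bar heights are scaled so that the areas sum to 1.
density, _ = np.histogram(spend, bins=edges, density=True)
widths = np.diff(edges)
areas = density * widths  # area of each bar = relative frequency
```

The last bin holds one observation out of ten, so its area is 0.1, but because it is twice as wide its height is only a quarter of the height of the neighbouring bin that holds two observations.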
2.2 KDE plot
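A kernel density estimate (KDE) estimates the probability density function (pdf) of data.
For data x1, x2, …, xn, the KDE is given by
$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$
where K is a kernel function (for example the Gaussian density).
The function depends on a bandwidth h. If h is too large, it no longer meets the requirement that h tends to 0; if h is
too small, very few points are actually used to estimate f(x).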
The bandwidth h controls whether the mountain of sand is 'peaked' or 'flat' .
For small bandwidth the mountain of sand is more peaked and the KDE is more wiggly.
For large bandwidth the mountain of sand is more flat and the KDE is more smooth.
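The effect of the bandwidth can be sketched with a hand-written Gaussian-kernel KDE in NumPy. The function name `kde_gaussian`, the data and the two bandwidths are assumptions made for this example; plotting libraries such as seaborn provide their own KDE routines.

```python
import numpy as np

def kde_gaussian(x_grid, data, h):
    """Evaluate a Gaussian-kernel KDE with bandwidth h at each point of x_grid."""
    x_grid = np.asarray(x_grid, dtype=float)
    data = np.asarray(data, dtype=float)
    # Scaled distance between every grid point and every observation.
    u = (x_grid[:, None] - data[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale by the bandwidth.
    return kernel.sum(axis=1) / (len(data) * h)

# Hypothetical data with two clusters.
data = np.array([1.0, 1.2, 1.9, 4.0, 4.1, 4.3])
grid = np.linspace(-10.0, 15.0, 2501)

narrow = kde_gaussian(grid, data, h=0.2)  # small bandwidth: peaked, wiggly
wide = kde_gaussian(grid, data, h=2.0)    # large bandwidth: flat, smooth

# Both curves are densities, so each integrates to approximately 1.
step = grid[1] - grid[0]
area_narrow = (narrow * step).sum()
area_wide = (wide * step).sum()
```

With the small bandwidth the curve has a much higher maximum and separate bumps around the two clusters, while the large bandwidth spreads the same total area into one low, smooth hill.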
2.3 Boxplot
This type of graph is used to show the shape of the distribution, its central value, and its variability.
Suitable for a numerical variable combined with a categorical variable.
The boxplot is a summary of five statistics
Median; First Quartile; Third Quartile; Minimum; Maximum
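The five statistics can be computed directly with NumPy. The values below are made up for illustration. Note that many plotting libraries draw the whiskers to the most extreme points within 1.5 times the interquartile range and show points beyond that as outliers, rather than extending the whiskers to the minimum and maximum; the fence calculation is included for that reason.

```python
import numpy as np

# Hypothetical sample with one large value.
values = np.array([2, 4, 4, 5, 7, 9, 10, 12, 30])

# Five-number summary: minimum, first quartile, median, third quartile, maximum.
minimum, q1, median, q3, maximum = np.percentile(values, [0, 25, 50, 75, 100])

# Common outlier rule: points more than 1.5 * IQR beyond the quartiles.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]
```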
2.4 Violin plot
A violin plot mirrors a KDE and fills it in.
It is particularly useful for making comparisons of density according to a grouping variable.
In addition to the five summary numbers shown in a box plot, violin plots also show the peaks, valleys, and tails of
each variable's density.
2.5 Rug plot
Shows each value as a mark on the axis; can show how densely the data are concentrated and highlight outliers.
Drawback: it is harder to understand the shape of the distribution using a rug plot, especially for large sample
sizes.
As a univariate plot, a jittered rug plot (strip plot) works better.
Week 4: Visualising Two Variables
1. Categorical v Categorical variables
1.1 Bar charts
plot the frequency of each category.
Variants: grouped bar charts, stacked bar charts
[Figure panels: grouped bar chart; stacked bar chart; percentage stacked bar chart]
2. Categorical v Numeric
Side-by-side boxplots; side-by-side violin plots;
side-by-side strip plots; side-by-side swarm plots (slow)
3. Numerical v Numerical
3.1 Scatter plot
Both the x axis and the y axis are numerical variables; a scatter plot can show distribution, outliers and relationships.
[Figure panels: adding a rug plot; adding a regression line]
Regression: X is usually the independent variable and Y the dependent variable.
Note: scatterplots (similar to linear regressions) can only show correlation, but not
causality!
[Figure panels: example of negative correlation; example of non-linear relationship]
Locally Weighted Scatterplot Smoothing (LOWESS)
▪ This technique combines the idea of nearest neighbours with regression.
▪ The smoothness of this function can be tuned by changing the number of nearest
neighbours.
▪ Local regressions estimated by weighted least squares so that closer neighbours are
given more influence.
▪ LOWESS is not ideal for large datasets.
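The idea of combining nearest neighbours with weighted regression can be sketched in plain NumPy. This is a simplified illustration, not a full LOWESS implementation: the function name `lowess_sketch` and the `frac` parameter are choices made for this example, tricube weights are assumed, and the robustness iterations of standard LOWESS are omitted. In practice a library routine (for example the `lowess` function in statsmodels) would normally be used.

```python
import numpy as np

def lowess_sketch(x, y, frac=0.5):
    """Local linear fits on the nearest neighbours, weighted by distance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Number of nearest neighbours used in each local regression.
    k = min(n, max(2, int(np.ceil(frac * n))))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]              # the k nearest neighbours
        d_max = dist[idx].max()
        scaled = dist[idx] / d_max if d_max > 0 else np.zeros(k)
        w = (1.0 - scaled**3) ** 3              # tricube: closer points weigh more
        # Weighted least squares for a local straight line.
        design = np.column_stack([np.ones(k), x[idx]])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted

# Sanity check on exactly linear data: the smoother should reproduce the line.
x = np.linspace(0.0, 10.0, 21)
y = 2.0 * x + 1.0
smooth = lowess_sketch(x, y, frac=0.5)
```

A larger `frac` uses more neighbours and gives a smoother curve. The loop runs one regression per point, which is one reason this kind of smoother is slow on large datasets.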
Log transform
▪ For positively skewed data it is often worth looking at a log transformation.
▪ It particularly makes sense for relationships best understood in terms of percentage
changes.
▪ For example, when prices change by 1%, demand changes by 5%.
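The link between logs and percentage changes can be illustrated with a small made-up demand curve. The prices, the constant 1000 and the exponent of -5 (chosen so that demand falls as prices rise) are assumptions for this sketch. After taking logs of both variables the relationship is a straight line, and its slope is the percentage change in demand per 1% change in price.

```python
import numpy as np

# Hypothetical prices and a demand curve with constant elasticity of -5.
price = np.array([1.0, 1.1, 1.2, 1.3, 1.4])
demand = 1000.0 * price ** -5.0

# Fit a straight line to log(demand) against log(price).
slope, intercept = np.polyfit(np.log(price), np.log(demand), 1)

# The slope is the elasticity: a 1% price rise changes demand by about slope %.
elasticity = slope
```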
Overplotting
▪ With datasets with a large number of observations, points are plotted on top of one
another.
Remedies: smaller points; transparent points; KDE contours; shading; binning
Week 5: Visualising Many Variables
1. Color: using color to add a dimension
1.1 Examples
[Figure panels: numerical; categorical]
1.2 Types of colormaps
1.2.1 Sequential (gradient) colormaps
Suitable for: ordinal or numeric variables
Advantages: Perceptually uniform
Large range
Work when printed in black and white
Accessible to colorblind people
Colorful and pretty
[Figure: Jet v Viridis]
1.2.2 Divergent colormaps
Features: Central point (usually white or a light color).
Different colors to the left and right.
Colors become darker for more extreme values.
Well suited to data that can be positive or negative (e.g. profit/loss, returns,
growth, trade balances).
Match white with 0
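One simple way to match the central color with 0 is to make the color limits symmetric around zero. The helper name `symmetric_limits` and the profit figures below are assumptions for illustration; the resulting limits would then be passed to the plotting call as its minimum and maximum color values (for example the `vmin` and `vmax` arguments in matplotlib or seaborn).

```python
import numpy as np

def symmetric_limits(values):
    """Return (vmin, vmax) centred on zero and covering all values."""
    bound = float(np.nanmax(np.abs(values)))
    return -bound, bound

# Hypothetical profit/loss figures.
profit = np.array([-2.0, 0.5, 3.0, 7.5])
vmin, vmax = symmetric_limits(profit)

# Position of zero on the 0-1 color scale; 0.5 is the midpoint of the colormap.
zero_position = (0.0 - vmin) / (vmax - vmin)
```

Without symmetric limits the midpoint of the colormap would sit at 2.75 for this data, so small profits would be drawn in the "loss" color.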
1.2.3 Cyclical colormaps
Features: 'wraps' back around on itself.
Useful for displaying data about angles (e.g. wind direction) or calendar effects.
1.2.4 Qualitative Colormap
Features: Used for nominal variables
Ideal for colors to be very different (especially those adjacent on legend)
Avoid reds and greens to make as colorblind friendly as possible.
There is a subtle difference between the default qualitative scheme in Seaborn and
the colorblind version.
[Figure panels: default palette; colorblind palette]
2 Size and shape
To add a fourth, fifth or even more variables to a plot, the following methods can be used:
Change the size: Size of the point (bubble plot)
Change the shape: Shape of the point
Add text: Use text
2.1 Bubble plot
2.2 Point shapes
2.3 Facetting
With too many colors and marker shapes, a bubble plot becomes overly confusing.
Facetting:use many simpler plots rather than one complicated plot
Facetting can show roughly which categories have the most observations.
Can show categories with most outliers.
Can show us if and how a relationship between variables can depend on a
nominal variable
We can facet with one variable or with two (across rows and columns).
[Figure panels: facetting by a single variable; facetting with wrap; other plots]
2.4 Pair plots
Combines scatter plots with histograms or KDEs
Can also use color.
3 Other plots
3.1 Parallel coordinates plot
3.2 Spider/radar chart
Week 6: Visualising Time Series
1 Line plot
- The line plot is the most common plot of a time series
- It shows a single variable on the vertical axis against time on the horizontal axis
- Using this plot we can see
▪ Trend
▪ Seasonal patterns
▪ Calendar effects
▪ Outliers
▪ Volatility clustering
2 Trend
▪ This is an example of data with a trend.
▪ Life expectancy goes up over time.
▪ There is no seasonality (regular repeating pattern).
▪ There are no cycles (irregular repeating pattern).
3 Trend and Cycle
▪ As well as trend, the series goes up and down.
▪ These are known as cycles.
▪ The cycles are irregular
▪ Some are longer than others
▪ The peaks and troughs are not always the same
size and do not grow or shrink in a systematic way.
4 Seasonality
▪ The data roughly repeat every 12 periods.
▪ The pattern is amplified over time (which is in
line with the increasing trend of the data).
▪ For higher frequency data there may be
multiple seasonalities (e.g, day of week and
month of year).
5 Stocks v flows
Stocks: data measured at a single instant of time.
Flows: changes to a stock over a period of time.
e.g. The amount of money in my bank account is a stock, my spending during the
week is a flow.
This motivates looking at first differences, or percentage changes in data.
[Figure panels: plotting change; percentage change]
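First differences and percentage changes are one-liners in pandas. The bank balance series below is made up for illustration.

```python
import pandas as pd

# Hypothetical end-of-month bank balance (a stock).
balance = pd.Series(
    [100.0, 110.0, 99.0, 120.0],
    index=["2023-01", "2023-02", "2023-03", "2023-04"],
)

# First difference: the change in the stock over each month (a flow).
change = balance.diff()

# Percentage change relative to the previous month.
pct_change = balance.pct_change() * 100
```

The first entry of both results is NaN because there is no earlier value to compare with.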
7 Autocorrelation
▪ When discussing time series, the concept of autocorrelation is important.
▪ This is the idea that a time series is correlated with its own past values.
▪ Line plots can indicate whether data are positively autocorrelated, negatively autocorrelated
or not autocorrelated.
7.1 Three types
[Figure panels: positively autocorrelated; negatively autocorrelated; not autocorrelated]
Interpreting autocorrelation
▪ For positively autocorrelated data, there will be runs where the series is above or
below its mean.
▪ For negatively autocorrelated data, the series oscillates above and below the mean.
▪ For data with no autocorrelation, the series does not have these patterns.
7.2 Scatterplot
▪ A scatterplot of a variable against its first lag can also indicate positive
autocorrelation
▪ A scatterplot against other lags can show seasonality
▪ There are other ways to plot the autocorrelation function that are covered in other
courses on time series
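The number behind a lag scatterplot is the correlation between the series and a lagged copy of itself. The sketch below computes it with NumPy; the helper name `lag_correlation` and the three toy series are assumptions for illustration.

```python
import numpy as np

def lag_correlation(series, lag=1):
    """Correlation between a series and its own values `lag` periods earlier."""
    s = np.asarray(series, dtype=float)
    return float(np.corrcoef(s[lag:], s[:-lag])[0, 1])

# A steadily rising series: long runs on one side of the mean.
trending = np.arange(20, dtype=float)

# A series that flips sign every period: oscillates around the mean.
alternating = np.array([1.0, -1.0] * 10)

# A pattern that repeats every 4 periods (e.g. quarterly seasonality).
seasonal = np.tile([1.0, 3.0, 2.0, 5.0], 6)

r_trend = lag_correlation(trending, lag=1)        # close to +1
r_alt = lag_correlation(alternating, lag=1)       # close to -1
r_season = lag_correlation(seasonal, lag=4)       # close to +1 at the seasonal lag
```

A high correlation at the seasonal lag, as in `r_season`, is the numerical counterpart of the tight diagonal cloud seen in a scatterplot against that lag.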
8 Volatility clustering
▪ Sometimes there is no correlation in the mean, but there is correlation in the
variance
▪ Volatile periods more likely to follow volatile periods
▪ Calm periods more likely to follow calm periods
▪ This can be seen using a line plot
▪ It is common with financial returns data
9 Issues with axes
9.1 Should zero be on y axis?
▪ For a line plot, to decide whether to include zero, think about whether 0 is a
sensible value for the y variable to take. If only the changes are of interest, the zero value is not necessarily important.
▪ For bar plots always include zero. Note that for bar plots, it is more natural to
interpret length of the bar rather than position on the y axis.
9.2 The x-axis
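Pay attention to the range of the x axis (e.g. 1 month v 5 years).
9.3 Both axes
Pay attention to the ratio of the x-axis and y-axis lengths.
9.4 Banking to 45 degrees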
▪ As a rough guide, consider lines making up a line plot.
▪ The average angle of these should be about 45 degrees
▪ Good software packages will do this by default.
▪ If you resize an image things may change.
▪ Always look at the axes of a line plot!
Problems
▪ Natural to look at and interpret 'crossing points'
▪ Crossing point nearly always meaningless;
▪ Arbitrarily defined by changing y axis,
▪ Could be different if different units are used.
▪ Put multiple lines on a lineplot ONLY when all variables measured in same units.
Week 7: Visualising Spatial Data
1 Geographical Information Systems (GIS)
It involves complicated problems including how coordinates are obtained, how they
can be projected in different ways, etc.
Packages: geopandas; geoplot
2 World map projection
There are different ways to project a sphere onto a flat screen.
All create distortion.
2.1 Mercator projection
The Mercator projection makes areas near the poles look bigger.
2.2 Robinson Projection
Robinson projection helps identify large countries near the North pole.
2.3 Albers projection
For North America it is common to use an Albers projection.
2.4 Lambert azimuthal projection
For Europe it is common to use a Lambert azimuthal projection.
3 Example: Australia
A shapefile is actually multiple files that together store all the information about borders.
Shapefiles are a standard format used globally.
SA4 areas of Australia:
Statistical Areas Level 4 (SA4) are geographical areas. The SA4 regions are the largest
sub-State regions in the Main Structure of the Australian Statistical Geography
Standard (ASGS). The ASGS is a hierarchical geographical classification, defined by the
Australian Bureau of Statistics (ABS), which is used in the collection and
dissemination of official statistics.
The shapefiles can be downloaded from the Australian Bureau of Statistics.
[Figure panels: Mercator; Robinson]
The Mercator projection looks OK for Australia.
3 Choropleth
3.1 Example: Australia
Data: mortgage repayments in Sydney (from the Australian Bureau of Statistics)
We merge this with the geopandas data frame
Spatially extensive data: the sum of the properties of elements that make up the
unit. For example, totals are the sum of the items counted in the unit.
Spatially intensive data: values such as population density or cancer rates can
describe any part of the unit. These statistics do not depend on the size of the unit. If
you divide the unit, the value will stay the same.
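The difference between the two kinds of data shows up when regions are combined. In the sketch below the two regions and their figures are made up: population (extensive) can simply be added, whereas density (intensive) has to be recomputed from the totals.

```python
import pandas as pd

# Hypothetical regions, invented for illustration.
regions = pd.DataFrame({
    "region": ["A", "B"],
    "population": [1000, 3000],   # spatially extensive: a total
    "area_km2": [10.0, 60.0],
})

# Spatially intensive measure derived from the totals.
regions["density"] = regions["population"] / regions["area_km2"]

# Merging the regions: totals add up...
total_population = regions["population"].sum()

# ...but densities do not; recompute from the combined totals instead.
combined_density = regions["population"].sum() / regions["area_km2"].sum()
naive_sum_of_densities = regions["density"].sum()
```

Here the combined density is about 57 people per square kilometre, not the 150 obtained by adding the two densities. This is one reason an intensive measure is usually the safer choice for shading a choropleth when regions differ greatly in size.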
3.2 Misleading choropleths
▪ In many countries, the population density is not uniform
▪ This includes Australia, the US and China, where most of the population lives on the
coast, as well as India and France as other examples with less densely populated
regions.
▪ This commonly leads to misinterpretation of data.
4 Heatmaps
▪ A heatmap is a similar idea with areas shown using different colors.
▪ However, a heatmap is used to visualise numbers in a matrix.
[Figure panels: a data matrix; a correlation matrix]
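A correlation matrix of the kind shown in a heatmap can be computed with pandas. The three columns below are made up for illustration. The commented-out plotting lines show one possible way to draw it with seaborn, assuming seaborn is installed; correlations range from -1 to 1, so a divergent colormap centred on 0 is a natural choice.

```python
import pandas as pd

# Hypothetical data: y rises with x, z falls with x.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 4, 3, 2, 1],
})

# Pairwise Pearson correlations between all columns.
corr = df.corr()

# One possible way to plot it (assumes seaborn is available):
# import seaborn as sns
# sns.heatmap(corr, cmap="RdBu", vmin=-1, vmax=1, annot=True)
```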