INSTITUTE OF FINANCIAL TECHNOLOGY
CEGE0117: DATA ANALYTICS & MACHINE LEARNING
5. Data Visualisation
Dr. Brian Healy
Institute of Financial Technology
CEGE0117: Data Analytics & Machine Learning
11th January 2022
Summary
Visualisation is a fundamental part of the data scientist’s toolkit
It is very easy to create visualisations, but much harder to
produce good ones
There are two primary uses for data visualisation:
1 To explore data
2 To communicate data
Explore The Data
Explore First, Model Later
Once you know the questions you’re trying to answer and have got
your hands on some data, the temptation is to start building models
immediately.
You should resist this urge and spend some time exploring your
data.
One-Dimensional Data
The simplest situation is one-dimensional data such as:
- The daily average number of minutes each user spends on your site
- The number of times each of a collection of data science tutorial
videos was watched
- The number of pages of each of the data science books in your data
science library.
A good first step is to compute some summary statistics. You’d like
to know how many data points you have, the smallest, the largest,
the mean, and the standard deviation.
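As a minimal sketch of those summary statistics, the snippet below uses Python's standard statistics module on a small made-up list. The name daily_minutes and its values are illustrative assumptions, not data from the course.

```python
import statistics

# Hypothetical one-dimensional data: minutes per day spent on a site
daily_minutes = [12.5, 30.0, 7.25, 45.0, 22.0, 18.75, 30.0]

summary = {
    "count": len(daily_minutes),
    "min": min(daily_minutes),
    "max": max(daily_minutes),
    "mean": statistics.mean(daily_minutes),
    # statistics.stdev is the sample standard deviation (divides by n - 1)
    "std": statistics.stdev(daily_minutes),
}

for name, value in summary.items():
    print(f"{name:>5}: {value:.2f}")
```

If the data are already in a NumPy array or pandas Series, the equivalent built-in methods would do the same job; the standard library is used here only to keep the sketch dependency-free.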
Matplotlib
matplotlib is an older but very useful library
It can produce complicated plots of many types
You can have plots within plots
You can easily change all the formatting options
Plots can be interactive
The only way to learn is to experiment
Simple Line Plot
from matplotlib import pyplot as plt

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# create a line chart, years on x-axis, gdp on y-axis
plt.plot(years, gdp, color='green', marker='o', linestyle='solid')

# add a title
plt.title("Nominal GDP")

# add a label to the y-axis
plt.ylabel("Billions of $")
plt.show()
A Line Plot
As we saw already, we can make line charts using plt.plot. These are
a good choice for showing trends
variance = [1, 2, 4, 8, 16, 32, 64, 128, 256]
bias_squared = [256, 128, 64, 32, 16, 8, 4, 2, 1]
# total error is the element-wise sum of the two series
total_error = [x + y for x, y in zip(variance, bias_squared)]
# xs holds the index of each point
xs = [i for i, _ in enumerate(variance)]

total_error
Output:
[257, 130, 68, 40, 32, 40, 68, 130, 257]
xs
Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8]
Simple Line Plot
Make multiple calls to plt.plot to show multiple series on the same
chart
Use a different colour for each line
Lines can also be dotted, dashed etc
We use our xs variable as the positions along the horizontal axis (x-axis)
# green solid line
plt.plot(xs, variance, 'g-', label='variance')

# red dot-dashed line
plt.plot(xs, bias_squared, 'r-.', label='bias^2')

# blue dotted line
plt.plot(xs, total_error, 'b:', label='total error')
A Line Plot
We can also make further improvements by:
1 Adding a label to the x-axis
2 Including a legend which will by default use the labels we provided
3 Removing the index values from the x-axis, as they are not informative in this
case
4 Adding a title
plt.xlabel("Model Complexity")
plt.legend(loc="upper center")
plt.xticks([])
plt.title("The Bias-Variance Tradeoff")
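Putting the pieces from the preceding slides together, one possible end-to-end script is sketched below. The Agg backend line and the output filename bias_variance.png are assumptions added so the script runs without a display; in a notebook you would drop them and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
from matplotlib import pyplot as plt

variance = [1, 2, 4, 8, 16, 32, 64, 128, 256]
bias_squared = [256, 128, 64, 32, 16, 8, 4, 2, 1]
total_error = [v + b for v, b in zip(variance, bias_squared)]
xs = list(range(len(variance)))

# three series on the same chart, each with its own colour and line style
plt.plot(xs, variance, 'g-', label='variance')
plt.plot(xs, bias_squared, 'r-.', label='bias^2')
plt.plot(xs, total_error, 'b:', label='total error')

plt.xlabel("Model Complexity")
plt.legend(loc="upper center")   # legend text comes from the label arguments
plt.xticks([])                   # hide the uninformative index ticks
plt.title("The Bias-Variance Tradeoff")

plt.savefig("bias_variance.png")  # hypothetical output filename
```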
Next Steps with Line Plots
There are further things we can do
1 Embed plots in the notebook
2 Use one of the many nice styles available
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# from Matplotlib 3.6 onwards this style is named 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
Next Steps with Line Plots
fig = plt.figure()  # create an empty figure
ax = plt.axes()     # add a set of axes to it
Next Steps with Line Plots
In Matplotlib, a figure is a container for all the objects corresponding
to a plot:
Axes
Lines
Legends
Labels
Background colour
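To make the container idea concrete, the sketch below builds a figure through Matplotlib's object-oriented interface and then inspects what it holds. plt.subplots and the Axes methods used are standard Matplotlib calls; the Agg backend line is an assumption so that it runs without a display, and the plotted data are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()          # one figure containing one set of axes
fig.set_facecolor("lightgrey")    # the background colour belongs to the figure

x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), label="sin(x)")  # a line lives inside the axes
ax.set_xlabel("x")                     # so do the axis labels
ax.set_ylabel("sin(x)")
ax.legend()                            # and the legend

print(fig.axes)         # the axes held by the figure
print(ax.get_lines())   # the line objects held by the axes
```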
Next Steps with Line Plots
Here is a simple plot of the sin() function.
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)  # 1000 evenly spaced points from 0 to 10
plt.plot(x, np.sin(x));  # the trailing ; suppresses the text output in a notebook
Next Steps with Line Plots
Let’s add a second series, cos()
Note how matplotlib chooses colours for us
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x));
Changing Colours and Styles in Plots
We have full control over the colour and have many ways of
specifying it:
1 By name (all HTML colour names supported)
2 By short colour code (r,g,b,c,m,y,k)
3 Using grayscale (0.0 to 1.0)
4 By RGB hex code ('#RRGGBB', each pair from 00 to FF)
5 By RGB tuple (0.0 to 1.0, 0.0 to 1.0, 0.0 to 1.0)
plt.plot(x, np.sin(x - 0), color='blue')           # by name
plt.plot(x, np.sin(x - 1), color='g')              # short colour code
plt.plot(x, np.sin(x - 2), color='0.75')           # grayscale
plt.plot(x, np.sin(x - 3), color='#FFDD44')        # hex code
plt.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3)); # RGB tuple
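One way to check that these specifications all describe ordinary RGB colours is to convert them with matplotlib.colors.to_rgb, which is part of Matplotlib's public API. The sketch below simply reuses the five colours from the plotting example; the variable name specs is an illustrative choice.

```python
from matplotlib.colors import to_rgb

# the same five colour specifications used in the plot calls
specs = ['blue', 'g', '0.75', '#FFDD44', (1.0, 0.2, 0.3)]

for spec in specs:
    # to_rgb returns a (red, green, blue) tuple of floats between 0 and 1
    print(repr(spec), '->', to_rgb(spec))
```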
Changing Colours and Styles in Plots
We can change the line styles:
1 Solid line ('solid' or '-')
2 Dashed line ('dashed' or '--')
3 Dash-dot line ('dashdot' or '-.')
4 Dotted line ('dotted' or ':')
plt.plot(x, np.sin(x - 0), linestyle='solid')
plt.plot(x, np.sin(x - 1), linestyle='dashed')
plt.plot(x, np.sin(x - 2), linestyle='dashdot')
plt.plot(x, np.sin(x - 3), linestyle='dotted');
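Line style and colour can also be combined into a single short format string passed as the third positional argument to plt.plot, the same shorthand used earlier with 'g-', 'r-.' and 'b:'. The sketch below assumes a headless Agg backend so it runs without a display; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

# plt.plot returns a list of Line2D objects; unpack the single line each time
solid, = plt.plot(x, np.sin(x - 0), '-g')     # solid green
dashed, = plt.plot(x, np.sin(x - 1), '--c')   # dashed cyan
dashdot, = plt.plot(x, np.sin(x - 2), '-.k')  # dash-dot black
dotted, = plt.plot(x, np.sin(x - 3), ':r')    # dotted red

for line in (solid, dashed, dashdot, dotted):
    print(line.get_linestyle(), line.get_color())
```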
Changing The Axes in Plots
Let’s go back to a simple plot of sin() and alter the axes
1 Change the x-axis to run from -1 to 11
2 Change the y-axis to run from -1.5 to 1.5
plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);
Changing The Axes in Plots
Sometimes we wish to flip an axis, which we do by giving the limits in reverse order:
1 Change the x-axis to run from 10 to 0
2 Change the y-axis to run from 1.2 to -1.2
plt.plot(x, np.sin(x))
plt.xlim(10, 0)
plt.ylim(1.2, -1.2);
Changing The Axes in Plots
We can also set both axes in a single call to plt.axis, passing a
list of the form [xmin, xmax, ymin, ymax].
1 Change the x-axis to run from -1 to +11 and the y-axis to run from
-1.5 to +1.5
plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5]);
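As a quick check on the argument order, the sketch below sets the limits with plt.axis and reads them back with plt.xlim() and plt.ylim(), which return the current limits when called without arguments. The Agg backend line and the variable names are assumptions for a display-free run.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.plot(x, np.sin(x))

# the order is [xmin, xmax, ymin, ymax]
plt.axis([-1, 11, -1.5, 1.5])

# called without arguments, xlim and ylim report the current limits
x_limits = plt.xlim()
y_limits = plt.ylim()
print(x_limits, y_limits)

# giving the x limits in reverse order flips the axis, as with plt.xlim(10, 0)
plt.axis([11, -1, -1.5, 1.5])
flipped = plt.gca().xaxis_inverted()
print(flipped)
```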
Plot Labelling
We should give our plots the following labels:
1 A title for the plot
2 A name for the x-axis
3 A name for the y-axis
plt.plot(x, np.sin(x))
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)");
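When working with an explicit axes object rather than the plt functions, the same labels (and the axis limits) can be set together through Axes.set. The sketch below is one way to write it; the limits chosen and the Agg backend line are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# each keyword maps onto the matching set_* method, e.g. title -> set_title
ax.set(xlim=(0, 10), ylim=(-2, 2),
       xlabel="x", ylabel="sin(x)",
       title="A Sine Curve")
```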