Date: 2023-05-24
CSC8621: Computing Foundations of Data Science
Chapter 1
CSC8621: Computing Foundations of Data Science
Note
Colophon
This is the print version of my slides. It should be the same as the main content, but I haven’t checked it extensively.
1.1 Programming
• Programs are primarily written for humans to understand
• Computers come second
• It must be clear to others (including your future self)
• how the program works
• in order to modify, fix or extend the program
1.2 Programming
• Two key points to learning to program
• You need to learn what you want to say
• And how to say it.
• With normal language, first is easy, second is hard
• With computers, it is the other way around.
1.3 Programming
• There are many programming languages
• They differ in grammar, vocabulary and libraries
• Some programmers learn one language then stick to it
• Others use the one that is most suited to their task
1.4 Programming
• In this module, we focus on Python
• Will give overview, don’t worry about things you do not understand
1.5 Why learn Python?
• Easy to use, allows quick development
• Relatively clean, so easy to read after 2 years
• Good scientific libraries (http://www.scipy.org/)
1.6 Why learn Python?
• It is interpreted i.e. no need to compile it
• It can support Object-Orientation (OO)
• It is a Beginner’s Language
• It does have disadvantages
1.7 Why learn Python?
• It is cross-platform
• Runs on Windows, Mac and Linux
• In this course, we will use it on Windows and Linux
• At home, you can:
– Use the Web (https://repl.it/languages/python3)
– Install it on Windows/Mac/Linux
– Use a Virtual Machine (VM)
– Buy a Raspberry Pi
1.8 Summary (Programming)
• Computer languages are designed for humans
• There are many languages
• Learn the programming fundamentals (using Python)
• We start with "procedural" and "functional", then move to "OO"
1.9 Values and Variables
Topics:
• hello_world.py
• Values and Operators
• Types
• Variables
1.10 hello_world
• All computing starts with hello_world
• https://en.wikibooks.org/wiki/Computer_Programming/Hello_world
print("Hello World")
Outputs:
Hello World
1.11 On the use of code
• Programming is like learning a human language
• Be a parrot
• Fiddle with things
• Try changing it a bit ("Goodbye World")
• Type (NOT cut-and-paste) code in
Note
The advice to type things in sounds daft; it’s quicker, easier and more accurate to cut-and-paste. But by typing you will be learning a number of things. You will get to appreciate the syntax of python; you will learn the colour scheme of your IDE; you will learn all the keywords. Most importantly, though, you will get it wrong. Try and fix it yourself. Eventually, if you really can’t get it to work, try cutting-and-pasting. And get the source code (they are hyperlinked from the book versions of this document).
If it works then, compare it line by line ("diff" is a good tool for this, once you understand the output) and work out what you did wrong. Remember the error message for next time.
1.12 broken_world
• And what happens when it goes wrong
• Note: Python files which are supposed to break are commented
## Status: Crash
prin("Hello World")
Crashes:
Traceback (most recent call last):
  File "broken_world.py", line 3, in <module>
    prin("Hello World")
NameError: name 'prin' is not defined
1.13 The python shell
• A "shell" is something you type things into and get a response
• Also called REPL (Read-Eval-Print Loop)
• Python’s shell is mostly useless in serious development
• But nice to try things out
• Avoids lots of print statements
1.14 Python as a calculator
• In this case 2 is a value
• While + is an operator
• >>> is the prompt and is printed by python
• Note: Python files which are supposed to be typed in the shell are commented
>>> ## Status: Shell
>>> 2 + 2
4
1.15 Python as a calculator
• We can also add comments with #
• Comments are important! Code is for people to read.
>>> ## Status: Shell
>>> # This is a comment
>>> 2 + 2
4
>>> 2 + 2 # and a comment on the same line as code
4
1.16 Python as a calculator
• Values have types
• So far we have created integer values
• Python has other numeric types also
>>> ## Status: Shell
>>> ## decimal "floating points"
>>> (50-5*6)/4
5.0
>>> 3 * 3.75 / 1.5
7.5
>>> ## complex numbers
>>> 1j * 1j
(-1+0j)
1.17 Python and strings
• We can enter strings in many ways
• ... is another prompt, when python is expecting more input
>>> ## Status: Shell
>>> ## python has lots of ways of giving strings
>>> "a" + "b"
'ab'
>>> ## Single quotes hide double
>>> 'Here is a "word"'
'Here is a "word"'
>>> ## Triple (single or double) quotes hide new lines
>>> """Now is the winter of our discontent,
... Made glorious by the sun over Newcastle"""
'Now is the winter of our discontent,\nMade glorious by the sun over Newcastle'
1.18 Python and strings
• Python will coerce between numeric types
• But not strings and numerics
## Status: Crash
2 + "2"
Crashes:
Traceback (most recent call last):
  File "type_mismatch.py", line 3, in <module>
    2 + "2"
TypeError: unsupported operand type(s) for +: 'int' and 'str'
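The coercion rule above can be sketched as follows: mixing an integer and a float converts automatically, while mixing a number and a string needs an explicit conversion with int() or str(). The variable names are made up for illustration.

```python
# Numeric types are coerced automatically: int + float gives a float.
mixed = 2 + 2.5

# Strings and numbers are never mixed implicitly; convert explicitly instead.
as_number = 2 + int("2")    # treat the string as a number
as_text = str(2) + "2"      # treat the number as a string

print(mixed, as_number, as_text)
```

Which conversion is right depends on whether you meant addition or concatenation, which is why python refuses to guess.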
1.19 Variables
• We now have some useful programs
• They automate things
• But always the same thing!
1.20 Variables
• We need to be able to store a value
• And change these values on each invocation
• So, we try and calculate something more dynamic
1.21 Variables
• We can assign a value to a variable
• In python a variable is defined using a name (or "identifier")
1.22 Variables
x = 200
• x — the identifier
• = — the assignment
• 200 — the value
Note
Most languages have all of the features shown here, but the details vary. Python uses implicit line termination; most languages have explicit ones such as ";". Some use words (such as end). Some use whitespace or end of line. Some are completely free-form and just work it out.
The assignment operator of "=" is almost universal. Most languages that you are likely to see use this as the main form of assignment. Except for R, which uses <- (although nowadays you can use = if you want, although many people don’t). R was written by statisticians, which explains a lot.
Most languages have identifiers, although all languages seem to have different rules about what characters are legal in them, as well as different conventions about how you use them. With python, in general, lower_case, and underscore_separated_words.
Some languages have explicit types; python just knows we are talking about integers here, while languages like Java force you to declare these up front. There are lots of arguments about whether this is a good thing or not.
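A small sketch of what "python just knows" means in practice: the type belongs to the value, not to the identifier, so no declaration is needed. The names first_type and second_type are introduced here only for illustration.

```python
# Python works out the type from the value; nothing is declared up front.
x = 200
first_type = type(x).__name__

# The same identifier can later be bound to a value of another type.
x = "two hundred"
second_type = type(x).__name__

print(first_type, second_type)
```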
1.23 Variables
• Aside: most examples are now not using the shell
• If we wish to see things in the output, we need to print them
#!/usr/bin/python3
x = 1
y = 2
print(x + y)
Outputs:
3
1.24 Variables: identifiers
• Getting the naming wrong is a common mistake
• Check if error messages contain the word "name" (as in NameError)
• Normally, this just means a spelling mistake
1.25 Variables: identifiers
• Follow coding conventions, which are stricter than the language rules
• Always try and follow the coding conventions for the language you are using
– Use snake_case with underscores
– Start with lower_case rather than Upper_case
– https://docs.python.org/3/reference/lexical_analysis.html#identifiers
• Use descriptive names
1.26 Variables
• We try and write some code to calculate some percentages
• This code is ugly because the print lines are too long
#!/usr/bin/python3
## Status: Crash
students_in_room = 40
students_awake = 20
students_listening = 4
print("Percentage of students awake is:" + (students_awake/students_in_room * 100))
print("Percentage of students listening is:" + (students_listening/students_in_room * 100))
1.27 Variables
• Style is important
• We can do multiple lines
#!/usr/bin/python3
## Status: Crash
students_in_room = 40
students_awake = 20
students_listening = 4
print("Percentage of students awake is:"
+ (students_awake/students_in_room * 100))
print("Percentage of students listening is:"
+ (students_listening/students_in_room * 100))
1.28 Variables
• But this code crashes
• Why?
#!/usr/bin/python3
## Status: Crash
students_in_room = 40
students_awake = 20
students_listening = 4
print("Percentage of students awake is:"
+ (students_awake/students_in_room * 100))
print("Percentage of students listening is:"
+ (students_listening/students_in_room * 100))
Crashes:
Traceback (most recent call last):
  File "students_awake_2.py", line 9, in <module>
    print("Percentage of students awake is:"
TypeError: can only concatenate str (not "float") to str
1.29 Variables
• If we separate with a comma, it all works
• print is a variadic function (it accepts any number of arguments)
#!/usr/bin/python3
students_in_room = 40
students_awake = 20
students_listening = 4
print("Percentage of students awake is:",
students_awake/students_in_room * 100)
print("Percentage of students listening is:",
students_listening/students_in_room * 100)
Outputs:
Percentage of students awake is: 50.0
Percentage of students listening is: 10.0
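Another way round the earlier crash, sketched here as an alternative rather than as the recommended fix, is to convert the number to a string with the built-in str() before concatenating. The variable message is introduced only for this illustration.

```python
students_in_room = 40
students_awake = 20

# str() turns the float into text, so + now joins two strings.
message = "Percentage of students awake is: " + str(students_awake / students_in_room * 100)
print(message)
```

With concatenation you have to supply the space yourself; print with commas inserts one between its arguments.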
1.30 Variables
• Our variables are not very variable
• Still hard-coded in the source file
• Let’s try reading from a file instead
• Variables can have values of many types!
#!/usr/bin/python3
f = open('x_plus_y.in', 'r')
x = f.readline()
y = f.readline()
f.close()
print(x + y)
1.31 Variables
f = open( "x_plus_y.in", "r" )
• f --- a variable
• = --- assignment
• open --- "open" a file
• "x_plus_y.in" --- the name of the file
• "r" --- read from the file
1.32 Variables
x = f.readline()
• read the line from the file in f
• assign the value of the line to x
1.33 Variables
• What will this print?
#!/usr/bin/python3
f = open('x_plus_y.in', 'r')
x = f.readline()
y = f.readline()
f.close()
print(x + y)
1.34 Variables
• Probably not what you were expecting
• readline returns strings
• readline includes the newline!
• So, we are concatenating two strings
#!/usr/bin/python3
f = open('x_plus_y.in', 'r')
x = f.readline()
y = f.readline()
f.close()
print(x + y)
Outputs:
2
3
1.35 Variables
• Here is a working version
• int function converts string to a number
#!/usr/bin/python3
f = open('x_plus_y.in', 'r')
x = int( f.readline() )
y = int( f.readline() )
f.close()
print(x + y)
Outputs:
5
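A self-contained sketch of the same idea. The slides assume that x_plus_y.in already exists; here the file is created first in a temporary directory (an assumption made so the example runs anywhere), and a with block is used so the file is closed automatically.

```python
import os
import tempfile

# Create a small input file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "x_plus_y.in")
with open(path, "w") as f:
    f.write("2\n3\n")

# "with" closes the file for us, even if an error occurs.
with open(path, "r") as f:
    x = int(f.readline())   # int() ignores the trailing newline
    y = int(f.readline())

total = x + y
print(total)
```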
1.36 Summary (Values and Variables)
• Python programs manipulate values
• Operators can combine or alter values
• Values have a type
• Variables can hold any type of value
• We can assign to a variable with =
• We can access the value implicitly
1.37 Composite Data Types
• What if we want to store many values?
• But we need to know how many in advance
#!/usr/bin/python3
students_in_room = 100
student1 = "John"
student2 = "Paul"
student3 = "George"
student4 = "Ringo"
1.38 Another way
• We could do this
• Ugly and doesn’t work for all kinds of value
#!/usr/bin/python3
students_in_room = 100
student_names = "John:Paul:Ringo:George"
1.39 Lists
• We can use lists instead
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> students_in_room = 100
>>> student_names = ["John", "Paul", "George", "Ringo"]
1.40 Lists
• Think of it like a set of pigeon holes
• Lists are values, so can be held by variables
• We can access individual values by index
• Lists start from 0
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> students_in_room = 100
>>> student_names = ["John", "Paul", "George", "Ringo"]
>>> student_names[0]
'John'
>>> student_names[3]
'Ringo'
Note
The reasons that lists start with zero are somewhat lost in time, although Wikipedia has a brief discussion. Unfortunately, this varies between languages, with some starting at 1. Again, Wikipedia has a reasonable list. Most of the languages you will meet will be zero-indexed, with R being a notable exception. For a somewhat wryer investigation of the issue, try XKCD. It would be more amusing if it were not a source of so many programming errors that we have a name for them: "off-by-one".
The only excuse that language designers have for this is that it is not unique to programming: floor numbering in the UK is zero (or "Ground") indexed, while in the US it starts at one.
1.41 Lists
• We can slice and dice lists
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> students = ["John", "Paul", "George", "Ringo"]
>>> ## access elements
>>> students[0]
'John'
>>> students[3]
'Ringo'
>>> ## from the end
>>> students[-2]
'George'
>>> ## slicing
>>> students[1:3]
['Paul', 'George']
>>> students[1:-1]
['Paul', 'George']
Note
If you come from a Java background all of this might be quite a surprise, because lists in Java do, well, very little. Java
collections are more fully featured, and can do most of the things that python lists can; however, they are all based around
method calls on objects, while python has syntax. This might not seem important, but if you are writing heavily numeric code
(and much scientific code is heavily numeric), having an easy-to-read syntax, is important.
Python is quite restrained, compared to R. Consider:
z[z*z > 8]
which means "all the elements whose square is greater than 8".
This kind of functionality is available in the NumPy library, so it’s possible to do in python.
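Without NumPy, the same selection can be sketched in plain python using a list comprehension with a condition. The list z and its values are made up for illustration.

```python
z = [1, 2, 3, 4, 5]

# Keep only the elements whose square is greater than 8.
big = [v for v in z if v * v > 8]
print(big)
```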
1.42 Lists
• Operators work on lists
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> students = ["John", "Paul", "George", "Ringo"]
>>> students[:2] + ["four", 12]
['John', 'Paul', 'four', 12]
>>> 3 * students[:-2]
['John', 'Paul', 'John', 'Paul', 'John', 'Paul']
>>> ar = students[:2] + ["four", 12]
>>> ar[3] ** 2
144
Note
If you come from an R background, then python list operators will not do what you expect. So
[1,2,3] + [4,5,6]
in python will return
[1,2,3,4,5,6]
while the equivalent in R would return "5,7,9".
[1,2,3] + 4
crashes in python but returns "5,6,7" in R.
Python’s list operators work only on the list not the values in the list as R’s operators do. This means that Python operators
make sense for any kind of value, while R’s are fairly handy for numeric values, which is what R was designed for.
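If you do want the R-style element-wise behaviour in plain python, one possible sketch uses zip and a list comprehension. The names a, b, joined, summed and shifted are illustrative only.

```python
a = [1, 2, 3]
b = [4, 5, 6]

# + joins the two lists end to end.
joined = a + b

# zip pairs up the elements, giving element-wise addition.
summed = [x + y for x, y in zip(a, b)]

# Adding a single number to every element has to be spelled out too.
shifted = [x + 4 for x in a]

print(joined, summed, shifted)
```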
1.43 Lists
• A list knows how long it is
• In python this is not that useful
>>> ## Status: Shell
>>> students_in_room = 100
>>> student_names = ["John", "Paul", "George", "Ringo"]
>>> len(student_names)
4
1.44 Dictionaries
• Dictionaries are also very useful
• Map between keys and values
• As with a list, you can access individual elements
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> student = {"john":1940,
... "paul":1942,
... "george":1943,
... "ringo":1940
... }
>>> student
{'john': 1940, 'paul': 1942, 'george': 1943, 'ringo': 1940}
>>> student['john']
1940
>>> student['pete'] = 1941
>>> student.keys()
dict_keys(['john', 'paul', 'george', 'ringo', 'pete'])
>>> del student['ringo']
>>> student
{'john': 1940, 'paul': 1942, 'george': 1943, 'pete': 1941}
Note
Dictionaries are also called "hashes", "maps" or "tables" in other languages. Some languages also use them to provide lists,
just using numeric keys. With the combination of dictionaries and lists you can do just about anything.
1.45 Dictionaries
• The key-value pairs are unordered
• Sort of
– In Python 3.7 and upward, they are ordered
– In Python 3.6, they happened to be ordered
– In Python 3.5 and previous, they were not ordered
• The distinction between ordered and unordered is important
• What code happens to do, and what it guarantees to do
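A short sketch of the guarantee in Python 3.7 and upward: keys come back in the order they were inserted, but that order plays no part when two dictionaries are compared. The names names and same are introduced for illustration.

```python
student = {"john": 1940, "paul": 1942}
student["george"] = 1943

# Python 3.7+ guarantees that keys come back in insertion order.
names = list(student.keys())

# Order is still ignored when comparing two dictionaries for equality.
same = student == {"george": 1943, "paul": 1942, "john": 1940}

print(names, same)
```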
1.46 Tuples
• There are two lesser used structures, tuples and sets
• Tuples are sequences (like lists)
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> t = (1,2,3,4,5)
>>> t[0]
1
>>> t[1:3]
(2, 3)
1.47 Tuples
• But they are immutable
#!/usr/bin/python3
## Status: Crash
t = (1,2,3,4,5)
t[2] = 4
Crashes:
Traceback (most recent call last):
  File "tuples_break.py", line 6, in <module>
    t[2] = 4
TypeError: 'tuple' object does not support item assignment
1.48 Sets
• Sets are unordered and contain no duplicates
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> {1,2,3}
{1, 2, 3}
>>> {5,4,3,2,1}
{1, 2, 3, 4, 5}
>>> {1,2,3,4,5,1,2,3,4,5}
{1, 2, 3, 4, 5}
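Sets are mostly used for membership tests and for the usual mathematical set operations. A minimal sketch, with made-up set contents:

```python
beatles = {"John", "Paul", "George", "Ringo"}
guitarists = {"John", "George", "Eric"}

# Membership tests are the most common use of a set.
has_ringo = "Ringo" in beatles

# Sets support the usual mathematical operations.
both = beatles & guitarists          # intersection
either = beatles | guitarists        # union
only_beatles = beatles - guitarists  # difference

# sorted() is used for printing because sets have no defined order.
print(has_ringo, sorted(both), len(either), sorted(only_beatles))
```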
1.49 Operations
• Most common thing with a list is to do something to every element
• There are several ways to do this
• List comprehensions are the most concise
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> y = [1,2,3,4,5]
>>> ## square every element in y
>>> [x**2 for x in y]
[1, 4, 9, 16, 25]
>>> ## or in the given list
>>> [x**2 for x in [1,2,3,4,5]]
[1, 4, 9, 16, 25]
>>> ## we can use any variable for the iterator variable
>>> [n**2 for n in y]
[1, 4, 9, 16, 25]
>>> ## even this!
>>> [y**2 for y in y]
[1, 4, 9, 16, 25]
1.50 Operations
• Combined with the range function can be powerful
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> [x**2 for x in range(100)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400,
441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296,
1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500,
2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096,
4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084,
6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464,
8649, 8836, 9025, 9216, 9409, 9604, 9801]
1.51 Operations
• The other way is to use a for loop
#!/usr/bin/python3
x = [1,2,3,4,5]
for i in x:
    print(i, end=" ")
print()
print()
for i in range(100):
    print(i, end=" ")
Outputs:
1 2 3 4 5
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91
92 93 94 95 96 97 98 99
1.52 Blocks (a brief diversion)
• In the last example, we have 4 code blocks
• The list creation
• The first for loop
• Two print statements
• The second for loop
1.53 Blocks
• Blocks are an important syntactic concept
• You will see them used in other areas
• Python uses indentation based blocks
• All code in one block is indented to the same level
• The python designers still think this is a good idea
Note
The idea was, basically, this. Many languages use curly braces ({}) to show blocks. This is fine for a computer but horrible
for the programmer. So, programmers indent their code to show the block structure. This leads to a class of errors where the
indentation suggests one set of blocks, while the curly braces suggest another. The idea with python was to say, well, let’s
make them the same.
I think this is bogus though. Almost everyone nowadays uses their editor to do the indentation; so you can use the automated
indentation to check that your blocks are what you meant them to be.
Worse, because you are using indentation, python has to guess where the end of the block is. This seems like a small issue
but is not. This leads to some awfulness like the pass statement, but worse, when you are cutting-and-pasting, you, the
programmer, don’t know whether you have cut the end of the block or not, and when you paste in, the editor doesn’t really know
how far to indent.
Python enthusiasts always seem to emphasize that keeping the block structure as simple as possible makes simpler code.
True. But I still don’t think this is an attractive feature of python. No language is perfect.
1.54 Operations
• Sometimes you need the index as well
• This works because python supports multiple return values
#!/usr/bin/python3
for i, v in enumerate(['tic', 'tac', 'toe']):
    print(i, v)
Outputs:
0 tic
1 tac
2 toe
Note
Generally, you don’t, but in many languages, you have to use the index. Java used to be like this, although now this has been
fixed.
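To see what the loop above is unpacking, this sketch turns the result of enumerate into a list. Each item is an (index, value) tuple, and "i, v" splits one tuple into two variables. The names words and pairs are illustrative.

```python
words = ['tic', 'tac', 'toe']

# enumerate produces (index, value) tuples...
pairs = list(enumerate(words))
print(pairs)

# ...and "i, v" unpacks one tuple into two variables.
i, v = pairs[0]
print(i, v)
```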
1.55 Operations
• There are some other things you can do with a list
• sorting
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> x = [5,4,3,2,1]
>>> x
[5, 4, 3, 2, 1]
>>> ## returns a *new* sorted list
>>> sorted( x )
[1, 2, 3, 4, 5]
>>> ## original is unaffected
>>> x
[5, 4, 3, 2, 1]
1.56 Operations
• And reversing
>>> #!/usr/bin/python3
>>> ## Status: Shell
>>> list(reversed([1,2,3,4,5]))
[5, 4, 3, 2, 1]
>>> ## or in place
>>> x = [1,2,3,4,5]
>>> x.reverse()
>>> x
[5, 4, 3, 2, 1]
1.57 Operations
• There are more things that you might want to do
• But for these, we need functions which we look at next.
1.58 Summary (Composite Data Types)
• List, dictionary, tuple and set
• List comprehension
• Iteration using the for loop and enumerate
• Sorting and reversing
1.59 Functions
• Functions are a critical part of programming
• Most programs consist of a large number of functions
• You have already used functions
– int, list and sorted are examples
1.60 Functions
def add(x,y):
    return x + y
• def— function definition coming up
• add— an identifier or name for the function
• x,y— the parameters of the function
• :— here comes a block
• return— return the result of this statement
• x+y— add x and y!
1.61 Functions
• A function takes a number of parameters and returns a value
#!/usr/bin/python3
def add(x,y):
    return x+y
print(add(2, 3))
Outputs:
5
1.62 Functions
• add just does what the + operator does
• Let’s try a function which does something new
• This calculates percentages
#!/usr/bin/python3
def percentage(x,y):
    return x / y * 100
print(percentage(2,4))
Outputs:
50.0
1.63 Functions
• We can call percentage from many places
#!/usr/bin/python3
def percentage(x,y):
    return x / y * 100
print(percentage(2,4))
print(percentage(1,10))
print(percentage(56,60))
Outputs:
50.0
10.0
93.33333333333333
1.64 Functions (scope)
• It is possible to have two variables with the same name
• This seems an obscure thing to do, but is very useful
• Has important consequences
#!/usr/bin/python3
def percentage(x,y):
    return x / y * 100
print(percentage(2,4))
Outputs:
50.0
1.65 Functions (scope)
• The fact that we have defined x and y outside of the function makes no difference
• The x outside of the function, and the one inside are different variables
• The inner x masks the outer.
#!/usr/bin/python3
def percentage(x,y):
    return x / y * 100
x = 2
y = 4
print(percentage(x,y))
Outputs:
50.0
Note
Scoping is there to make your life easier and not harder; it means that we do not have to keep track of the variable names that we have used in each function.
Consider this metaphor. If two students have exactly the same name, it is going to make life very difficult, in terms of marking and keeping things straight. But we only need to worry if they are doing the same module or degree, so it is normally not a problem. Imagine the situation, though, where we had to worry if two students in one University had the same name; or in the country, or the world.
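A minimal sketch of the masking described above, using a hypothetical rename function: assigning to the parameter inside the function changes only the inner variable, and the outer one keeps its value.

```python
def rename(name):
    # This "name" is local to the function.
    name = "Paul"
    return name

name = "John"
result = rename(name)

# The outer "name" is untouched by the assignment inside the function.
print(name, result)
```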
1.66 Functions (scope)
• Variables defined outside a block are visible inside
• Useful, but can be a source of errors
• In Python, best practice is to do this only for constants
#!/usr/bin/python3
z = 4
def percentage(x):
    return x / z * 100
x = 2
print(percentage(x))
Outputs:
50.0
1.67 Functions
• In Python, a function:
– is an instance of Object
– can be assigned to a variable
– can be passed as a parameter to other functions
– can be returned as values from other functions
– can be stored in data structures
• Therefore, called First-Class functions
1.68 Functions
• Functions are also values
• They can be passed to other functions
• Reduce applies a function to first two elements
• Then applies to result and next element
• And so on
#!/usr/bin/python3
from functools import reduce
def add(x,y):
    return x+y
l = [1,2,3,4,5]
print(reduce(add,l))
Outputs:
15
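To make the steps concrete, this sketch writes out by hand what reduce does with add on the same list, then checks it against the real functools.reduce. The variable step is introduced for illustration.

```python
from functools import reduce

def add(x, y):
    return x + y

l = [1, 2, 3, 4, 5]

# What reduce(add, l) does, written out step by step:
step = add(l[0], l[1])   # 1 + 2  = 3
step = add(step, l[2])   # 3 + 3  = 6
step = add(step, l[3])   # 6 + 4  = 10
step = add(step, l[4])   # 10 + 5 = 15

print(step, reduce(add, l))
```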
1.69 Functions
• Or using range
#!/usr/bin/python3
from functools import reduce
def add(x,y):
    return x+y
print(reduce(add,range(100)))
Outputs:
4950
1.70 Modules (a brief diversion)
• You probably noticed these statements earlier
from functools import reduce
• Python has a module system
• This works rather like an address
• Within Newcastle "Science Central" is enough, but having world-wide unique names would be hard.
1.71 Modules
• You can import many things at once: import functools
• Or just specific parts: from functools import reduce
• At first, it’s easier to import as much as possible
• But this tends to lead to bugs and name clashes.
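The two import styles can be sketched side by side. With the whole-module import the function is reached through the module name, which keeps its origin visible; with the specific import the bare name is used. The names total_a and total_b are illustrative.

```python
import functools
from functools import reduce

def add(x, y):
    return x + y

# Whole-module import: the function is reached through the module name.
total_a = functools.reduce(add, [1, 2, 3])

# Specific import: the name reduce can be used directly.
total_b = reduce(add, [1, 2, 3])

print(total_a, total_b)
```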
1.72 Functions
• Functions can use values which are not parameters
• Map applies a function to every element in a list
#!/usr/bin/python3
def by_ten(x):
    return x * 10
print(by_ten(3))
l = [1,2,3,4,5]
j = list(map(by_ten,l))
print(l)
print(j)
Outputs:
30
[1, 2, 3, 4, 5]
[10, 20, 30, 40, 50]
1.73 Functions
• Functions can use other functions
• Can introduce new variables
• Can cover multiple lines
#!/usr/bin/python3
from functools import reduce
def add(x,y):
    return x+y
def mean(l):
    tot = reduce(add,l)
    return tot/len(l)
l = [1,2,3,4,5]
print(mean(l))
print(mean(range(1000)))
Outputs:
3.0
499.5
1.74 Functions
• New variables are locally scoped
• They cannot be accessed outside the function
#!/usr/bin/python3
## Status: Crash
from functools import reduce
def add(x,y):
    return x+y
def mean(l):
    tot = reduce(add,l)
    return tot/len(l)
print(mean([1,2,3,4,5]))
print(tot)
Crashes:
Traceback (most recent call last):
  File "average_crash.py", line 15, in <module>
    print(tot)
NameError: name 'tot' is not defined
1.75 Functions
• There is more to be said on functions
• But first conditions
1.76 Conditions
• Sometimes, want to do different things at different times
• We have a condition "if this then do that"
#!/usr/bin/python3
def greater_than_ten(x):
    if( x > 10 ):
        return True
print(greater_than_ten(20))
print(greater_than_ten(2))
Outputs:
True
None
1.77 Conditions
• This doesn’t feel right
• True or None?
#!/usr/bin/python3
def greater_than_ten(x):
    if( x > 10 ):
        return True
print(greater_than_ten(20))
print(greater_than_ten(2))
Outputs:
True
None
1.78 Conditions
• We can also do "if this then do that, else do the other"
#!/usr/bin/python3
def greater_than_ten(x):
    if( x > 10 ):
        return True
    else:
        return False
print(greater_than_ten(20))
print(greater_than_ten(2))
Outputs:
True
False
1.79 Conditions
• Although in this case, it’s not needed
#!/usr/bin/python3
def greater_than_ten(x):
    if( x > 10 ):
        return True
    return False
print(greater_than_ten(20))
print(greater_than_ten(2))
Outputs:
True
False
1.80 Functions
• In general, functions should be self-contained
• They should do one thing and only one thing
• Operating only on parameters
• This maximises the opportunities for reuse
• It minimises risks of errors
• If errors occur, it makes them easier to find.
• Scoping is a language feature which enables this.
1.81 Functions (organisation)
Imagine a single block of code. This has problems:
• Hard-to-read code — you have to read it all to understand what it does
• Hard-to-reuse code
• Hard-to-modify code
• Code duplication
• It is hard to design programs in this way.
1.82 Functions
To buy the food on your shopping list, you need to:
• Put on your shoes
• Cycle/Bus/Metro/Drive to the shops
• Put the food into your basket
• Pay for the food
• Cycle/Bus/Metro/Drive home.
1.83 Functions
Hard-to-read: The shopping list would get long if it had shoelace tying instructions, a bus timetable, and the terms and conditions for your bank card.
Hard-to-reuse: You are probably wearing shoes now, and you must have got here in some way. Instructions about shoelaces and
bus timetables are needed for many activities besides shopping.
Hard-to-modify: As well as being hard to find the right instruction, it’s also hard to test. As it stands, if you buy some new shoes, you have to go shopping to test it all still works. This is made worse by...
Code duplication: The instructions for going to the shops are going to be largely the same as instructions for coming from it.
Hard-to-design: You need to think of everything at once.
1.84 Functions
• Most programs have this form of organisation
• Main program is normally very short
• In python, a function must be defined before being called
def fun1():
    pass
def fun2():
    pass
def fun3():
    pass
## main program
statement
statement
statement
1.85 Functions
• Although you can refer to functions before defining them.
#!/usr/bin/python3
def fun1():
    return fun2()
def fun2():
    return fun1()
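The two functions above would call each other forever if either were actually run. A sketch of the same idea that does terminate, using hypothetical is_even and is_odd functions (for non-negative integers only):

```python
def is_even(n):
    if n == 0:
        return True
    # is_odd is not defined yet at this point in the file; that is fine,
    # because the name is only looked up when is_even is actually called.
    return is_odd(n - 1)

def is_odd(n):
    if n == 0:
        return False
    return is_even(n - 1)

print(is_even(10), is_odd(7))
```

The call on the last line works because both definitions have been executed by the time it runs.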
1.86 Functions
• Functions can also refer to themselves
• Called a recursive function
• They are very useful
• They tend not to fit well into the way people think, so can confuse
• For some problems, they produce very elegant solutions
Common Uses:
• Many sort functions are recursive (quicksort and merge sort, for example)
• Operating over graphs and trees
– Operating over web pages
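As a sketch of recursion over a tree, a nested list can stand in for the tree: each sub-list is a branch and each number is a leaf. The function total is hypothetical and written only for this illustration.

```python
def total(tree):
    """Add up every number in a nested list of numbers."""
    result = 0
    for item in tree:
        if isinstance(item, list):
            result = result + total(item)   # recurse into the sub-tree
        else:
            result = result + item
    return result

print(total([1, [2, 3], [4, [5, 6]]]))
```

The function never needs to know how deeply the lists are nested, which is what makes the recursive version short.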
1.87 Functions
#!/usr/bin/python3
def lift_off(countdown):
    if(countdown>0):
        print(countdown)
        lift_off(countdown-1)
    else:
        print("lift off")
lift_off( 10 )
Outputs:
10
9
8
7
6
5
4
3
2
1
lift off
1.88 Functions
• Python does a few things that other languages don’t
• Useful to know for completeness
• Python can return several values from functions
• enumerate does this
#!/usr/bin/python3
def swap(x,y):
    return y,x
print(swap(1,2))
x,y = swap(1,2)
print(x)
print(y)
Outputs:
(2, 1)
2
1
1.89 Functions
• Parameters by keyword, rather than position
• You can mix them, but all the position args must come first
#!/usr/bin/python3
def named_parameters(one,two,three):
    print(one)
    print(two)
    print(three)
print("Positional")
named_parameters("one", "two", "three")
print("Named")
named_parameters(three="three",two="two",one="one")
print("Mixed")
named_parameters("one",three="three",two="two")
Outputs:
Positional
one
two
three
Named
one
two
three
Mixed
one
two
three
1.90 Functions
• You can add doc strings!
#!/usr/bin/python3
def boring():
    """This function really does very little and is
    therefore quite a dull function"""
    pass
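A doc string is more than a comment: python stores it on the function, where tools such as the built-in help() can find it. A small sketch, reusing the percentage function from earlier with a doc string added for illustration:

```python
def percentage(x, y):
    """Return x as a percentage of y."""
    return x / y * 100

# The doc string is stored on the function; help(percentage) displays it too.
doc = percentage.__doc__
print(doc)
```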
1.91 Functions (long example)
• Finish functions with a longish example
• Need one more language feature — the while loop
#!/usr/bin/python3
def lift_off(countdown):
    while countdown > 0:
        print(countdown)
        countdown = countdown - 1
    print("lift off")
lift_off( 10 )
Outputs:
10
9
8
7
6
5
4
3
2
1
lift off
1.92 Functions (long example)
#!/usr/bin/python3
## Code modified from:
## http://rosettacode.org/wiki/Sorting_algorithms/Bubble_sort#Python
def bubble_sort(seq):
    """Inefficiently sort the mutable sequence (list) in place.
    seq MUST BE A MUTABLE SEQUENCE.
    """
    changed = True
    while changed:
        changed = False
        for i in range(len(seq) - 1):
            if seq[i] > seq[i+1]:
                seq[i], seq[i+1] = seq[i+1], seq[i]
                changed = True
    return seq
print(bubble_sort([2,4,1,3,9]))
Outputs:
[1, 2, 3, 4, 9]
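"In place" means the caller's list is changed, not copied. The following sketch repeats bubble_sort and compares it with the built-in sorted(), which returns a new list. The variable names are chosen here for illustration only.

```python
def bubble_sort(seq):
    """Inefficiently sort the mutable sequence (list) in place."""
    changed = True
    while changed:
        changed = False
        for i in range(len(seq) - 1):
            if seq[i] > seq[i+1]:
                seq[i], seq[i+1] = seq[i+1], seq[i]
                changed = True
    return seq

numbers = [2, 4, 1, 3, 9]
result = bubble_sort(numbers)

# The caller's list has been changed, and result is the very same object
print(numbers)
print(result is numbers)

# The built-in sorted() returns a new list and leaves its argument alone
original = [2, 4, 1, 3, 9]
new_list = sorted(original)
print(original)
print(new_list)
```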
1.93 Summary (Functions)
• Functions are a key part of any program
• Functions take parameters
• Functions return values
• Functions can be called from many places
1.94 Reading and Writing
• At some level, most programs need to read input
• And write output
• Here we will cover some more techniques
1.95 Writing
• The main mechanism is the "print" function
print("Hello World")
Outputs:
Hello World
1.96 Writing
• This is a bit limited
• Most common requirement is "variable interpolation"
• "Old" way (which most people use) — the % operator
#!/usr/bin/python3
who = "Jennifer"
print("Hello %s!" % (who))
Outputs:
Hello Jennifer!
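The % operator also accepts several values at once, passed as a tuple and matched left to right. A small sketch, with count and average invented for illustration:

```python
who = "Jennifer"
count = 3        # illustrative value
average = 2.5    # illustrative value

# %s inserts a string, %d an integer, %.1f a float with 1 decimal place
message = "Hello %s, you have %d messages (average length %.1f)" % (who, count, average)
print(message)
```

The number of placeholders must match the number of values in the tuple, otherwise Python raises a TypeError.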
1.97 Writing
• The format method is richer still
• The arguments are placed in order
#!/usr/bin/python3
print("Hello, {} and {}.".format("John", "Paul"))
Outputs:
Hello, John and Paul.
1.98 Writing
• And if we need to repeat an arg, or they are not in order
#!/usr/bin/python3
print("Hello, {0} and {1} (and {0} again).".format("John", "Paul"))
Outputs:
Hello, John and Paul (and John again).
1.99 Writing
• Keyword arguments are also possible
#!/usr/bin/python3
print("Hello, {guitar} and {bass}.".format(guitar="John", bass="Paul"))
Outputs:
Hello, John and Paul.
1.100 Writing
• And controlling the values (in this case to 3 decimal places)
#!/usr/bin/python3
import math
print('The value of PI in Python is {}.'.format(math.pi))
print('The value of PI is approximately {0:.3f}.'.format(math.pi))
Outputs:
The value of PI in Python is 3.141592653589793.
The value of PI is approximately 3.142.
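The part after the colon can also control width and alignment, which helps when printing simple tables. This is a small sketch; the column widths are arbitrary choices.

```python
import math

# {:<6}    left-align in a field 6 characters wide
# {:>10.3f} right-align in 10 characters, with 3 decimal places
row = "{:<6}|{:>10.3f}|".format("pi", math.pi)
print(row)
```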
1.101 Writing
• We covered reading before
• Files need to be "opened" before use
• "w" stands for writing
• "r" just for reading — helps to prevent over-writing files
• And close when finished!
#!/usr/bin/python3
f = open("outfile.txt","w")
f.write( "Hello file!\n")
f.close()
• which writes the following to outfile.txt
Hello file!
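To check what actually reached the file, it can be re-opened for reading. The sketch below writes three lines and reads them back. It uses a temporary directory (an assumption made here so that no existing file is overwritten).

```python
import os
import tempfile

# Write into a temporary directory so no real file is overwritten
folder = tempfile.mkdtemp()
path = os.path.join(folder, "outfile.txt")

f = open(path, "w")
for number in [1, 2, 3]:
    # write() does not add a newline, so we add "\n" ourselves
    f.write("Line %s\n" % (number))
f.close()

# Re-open for reading to see what ended up in the file
f = open(path, "r")
contents = f.read()
f.close()
print(contents)
```

Unlike print, write adds nothing to the text it is given, so each "\n" has to be written explicitly.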
1.102 Reading
• For reading, need to open.
• Then there are a variety of ways to get at the data
#!/usr/bin/python3
f = open("mab.txt","r")
for line in f:
    print(line)
f.close()
Outputs:
O, then, I see Queen Mab hath been with you.

She is the fairies' midwife, and she comes

In shape no bigger than an agate-stone

On the fore-finger of an alderman,

Drawn with a team of little atomies
1.103 Reading
• This might not be doing what you expect
• Each line read from the file ends with a newline, and print adds another
#!/usr/bin/python3
f = open("mab.txt","r")
for line in f:
    print(line.strip())
f.close()
Outputs:
O, then, I see Queen Mab hath been with you.
She is the fairies' midwife, and she comes
In shape no bigger than an agate-stone
On the fore-finger of an alderman,
Drawn with a team of little atomies
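The hidden newline can be made visible with repr(). This sketch first creates a two-line stand-in for mab.txt in a temporary directory (an assumption, so that the example is self-contained), then compares the raw and stripped lines.

```python
import os
import tempfile

# Create a small stand-in for mab.txt
path = os.path.join(tempfile.mkdtemp(), "mab.txt")
f = open(path, "w")
f.write("O, then, I see Queen Mab hath been with you.\n")
f.write("She is the fairies' midwife, and she comes\n")
f.close()

raw_lines = []
clean_lines = []
f = open(path, "r")
for line in f:
    raw_lines.append(line)
    clean_lines.append(line.strip())
    # repr() shows the trailing newline as \n instead of printing it
    print(repr(line))
f.close()
```

Note that strip() removes whitespace from both ends of the line, so any leading spaces are lost as well.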
1.104 Reading
• An alternative way is to use "slurping"
• You can then operate on all the lines like a list
• But have to hold them all in memory
#!/usr/bin/python3
f = open("mab.txt","r")
lines = f.readlines()
for line in lines[1:3]:
    print(line.strip())
f.close()
Outputs:
She is the fairies' midwife, and she comes
In shape no bigger than an agate-stone
1.105 Reading
• With all of these, really, you should call close()
• The with keyword does it for you
#!/usr/bin/python3
with open("mab.txt") as f:
    for line in f:
        print(line.strip())
    print()
    print("Closed? %s" % (f.closed))
print()
print("Closed? %s" % (f.closed))
Outputs:
O, then, I see Queen Mab hath been with you.
She is the fairies' midwife, and she comes
In shape no bigger than an agate-stone
On the fore-finger of an alderman,
Drawn with a team of little atomies
Closed? False
Closed? True
1.106 Reading
• Python can also read keyboard input
• input treats the entered data as a string
• This is a common source of errors
#!/usr/bin/python3
x = input("Please enter a value between 1 and 10: ")
print(x)
print(type(x))
Outputs:
Please enter a value between 1 and 10: 10
10
<class 'str'>
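Why this causes errors can be sketched without the keyboard. Here the variable x simply stands in for whatever input() would return (an assumption, so that the example runs unattended).

```python
# x stands in for the value returned by input(); it is always a string
x = "10"
print(type(x))

# + on two strings joins them instead of adding
print(x + x)

# int() converts the text to a number, after which + adds
number = int(x)
print(number + number)
```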
1.107 Reading and Writing (Summary)
• Python can read and write from files
• Files must be opened first
• And closed when finished with
• Strings can be formatted in many ways
• Lines can be read in many ways
• It often breaks!
1.108 Errors
• In life, things break often
• In programming, things break even more often
• How do we deal with errors?
1.109 Errors
• Python uses an error mechanism called "exceptions"
#!/usr/bin/python3
## Status: Crash
10 * (1/0)
Crashes:
Traceback (most recent call last):
  File "exceptions.py", line 5, in <module>
    10 * (1/0)
ZeroDivisionError: division by zero
1.110 Errors
• An error message in detail
Traceback (most recent call last):
  File "exceptions.py", line 5, in <module>
    10 * (1/0)
ZeroDivisionError: division by zero
• Traceback — where the error came from
• What the error was
• And a description
1.111 Errors
• Traceback trivial in the last case
• Here is a more useful one
#!/usr/bin/python3
## Status: Crash
def fun1():
    10/0
def fun2():
    fun1()
def fun3():
    fun2()
fun3()
Crashes:
Traceback (most recent call last):
  File "exceptions_4.py", line 14, in <module>
    fun3()
  File "exceptions_4.py", line 12, in fun3
    fun2()
  File "exceptions_4.py", line 9, in fun2
    fun1()
  File "exceptions_4.py", line 6, in fun1
    10/0
ZeroDivisionError: division by zero
1.112 Errors
• And different errors cause different types of exception
• The NameError here indicates a programming error
#!/usr/bin/python3
## Status: Crash
4 + spam*3
Crashes:
Traceback (most recent call last):
  File "exceptions_2.py", line 5, in <module>
    4 + spam*3
NameError: name 'spam' is not defined
1.113 Errors
• As does the TypeError
#!/usr/bin/python3
## Status: Crash
'2' + 2
Crashes:
Traceback (most recent call last):
  File "exceptions_3.py", line 5, in <module>
    '2' + 2
TypeError: can only concatenate str (not "int") to str
1.114 Errors
• Exceptions work like "hot potatoes"
• The hot potato is raised at the point of error
• It’s then thrown upward
• Until someone catches it
1.115 Errors
• Catching happens with the except keyword
#!/usr/bin/python3
try:
    print(10/0)
except ZeroDivisionError:
    print("An attempt was made to divide by zero")
Outputs:
An attempt was made to divide by zero
1.116 Errors
• It is possible to catch different kinds of exception
• The code in the try block after the error does not run!
#!/usr/bin/python3
try:
    print(10/0)
    print("2" + 2)
except (ZeroDivisionError, TypeError):
    print("Something went wrong")
Outputs:
Something went wrong
1.117 Errors
• Or to catch several types independently
#!/usr/bin/python3
try:
    print(10/0)
    print("2" + 2)
except ZeroDivisionError:
    print("Divide by zero!")
except TypeError:
    print("Type error")
Outputs:
Divide by zero!
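Only one handler runs: the first one that matches the exception raised. The sketch below wraps the same two except clauses in a helper so that each kind of failure can be tried separately. The helper attempt and the three small functions are hypothetical names introduced for illustration.

```python
def attempt(action):
    """Call action() and report which kind of error, if any, occurred."""
    try:
        action()
    except ZeroDivisionError:
        return "Divide by zero!"
    except TypeError:
        return "Type error"
    return "No error"

def divide():
    return 10 / 0

def add_text_and_number():
    return "2" + 2

def harmless():
    return 1 + 1

print(attempt(divide))
print(attempt(add_text_and_number))
print(attempt(harmless))
```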
1.118 Errors
• Or catch any kind of error
• This is normally a mistake
• It will hide programming errors silently
#!/usr/bin/python3
try:
    10 * (1/0)
except:
    print("Arithmetic was wrong")
try:
    4 + spam*3
except:
    print("Program was wrong")
Outputs:
Arithmetic was wrong
Program was wrong
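How a bare except hides a programming error can be sketched with a deliberate typo. Both functions below are hypothetical examples. The first misspells len, and the bare except swallows the resulting NameError, so the function quietly returns the wrong answer.

```python
def mean_hiding_bugs(values):
    try:
        return sum(values) / lenght(values)   # typo: should be len
    except:
        # Meant for the empty list, but it also hides the NameError
        return None

def mean_narrow(values):
    try:
        return sum(values) / len(values)
    except ZeroDivisionError:
        # Only the expected failure (an empty list) is handled
        return None

print(mean_hiding_bugs([1, 2, 3]))
print(mean_narrow([1, 2, 3]))
print(mean_narrow([]))
```

Had mean_narrow contained the same typo, the NameError would have escaped and produced a traceback pointing at the faulty line.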
1.119 Errors
• try also supports an else clause
• "Do this so long as there is no exception"
#!/usr/bin/python3
file_name = "missing.txt"
try:
    f = open(file_name, "r")
except IOError:
    print("%s cannot be opened for reading" % (file_name))
else:
    print("%s has %s lines" % (file_name,len(f.readlines())))
Outputs:
missing.txt cannot be opened for reading
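The else branch can be seen running when the file does exist. This sketch wraps the same pattern in a hypothetical function count_lines and tries it on one present and one missing file, both inside a temporary directory (an assumption, so that the example is self-contained).

```python
import os
import tempfile

def count_lines(file_name):
    """Return the number of lines in the file, or None if it cannot be opened."""
    try:
        f = open(file_name, "r")
    except IOError:
        return None
    else:
        # Only reached when open() succeeded, so f is safe to use
        lines = f.readlines()
        f.close()
        return len(lines)

folder = tempfile.mkdtemp()
present = os.path.join(folder, "present.txt")
f = open(present, "w")
f.write("one\ntwo\n")
f.close()

print(count_lines(present))
print(count_lines(os.path.join(folder, "missing.txt")))
```

Keeping the file-reading code in else, rather than in try, means that only the failure of open() is treated as "cannot be opened".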
1.120 Errors
• You can raise your own exceptions
• And interrogate exceptions that are produced
#!/usr/bin/python3
try:
    raise Exception("Problem")
except Exception as inst:
    print(type(inst)) # the exception instance
    print(inst.args) # arguments stored in .args
Outputs:
<class 'Exception'>
('Problem',)
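In practice, raise usually sits inside a function that has been given a value it cannot work with. A minimal sketch, using a hypothetical function check_percentage and the built-in ValueError:

```python
def check_percentage(value):
    """Return value unchanged, or raise ValueError if it is not in 0..100."""
    if value < 0 or value > 100:
        raise ValueError("percentage out of range: %s" % (value))
    return value

print(check_percentage(50))

try:
    check_percentage(150)
except ValueError as inst:
    # str() of an exception gives the message it was raised with
    message = str(inst)
    print(type(inst))
    print(message)
```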
1.121 Errors
• Like a hot potato, exceptions can be handled anywhere
• Here, the error happens in a function, but we handle it outside
• We can separate error handling from the rest of our code
• But it can make it hard to understand what code will do
#!/usr/bin/python3
def this_fails():
    x = 1/0
try:
    this_fails()
except ZeroDivisionError as detail:
    print('Handling run-time error:', detail)
Outputs:
Handling run-time error: division by zero
1.122 Errors
• finally allows you to run code, whether an exception happens or not
#!/usr/bin/python3
## Status: Crash
try:
    raise KeyboardInterrupt
finally:
    print('Goodbye, world!')
Crashes:
Traceback (most recent call last):
  File "exception_finally.py", line 6, in <module>
    raise KeyboardInterrupt
KeyboardInterrupt
1.123 Errors
• finally allows you to run code, whether an exception happens or not
#!/usr/bin/python3
## Status: Crash
try:
    raise KeyboardInterrupt
finally:
    print('Goodbye, world!')
Outputs:
Goodbye, world!
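The order of events can be made explicit by recording each step in a list. In this sketch (the names events and risky are invented for illustration) the finally block runs before the exception reaches the handler outside the function.

```python
events = []

def risky():
    try:
        events.append("start")
        return 10 / 0
    finally:
        # Runs even though the line above raised an exception
        events.append("cleanup")

try:
    risky()
except ZeroDivisionError:
    events.append("handled")

print(events)
```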
1.124 Errors
• Main use for this is closing resources
• Often using with can do the same thing more easily.
#!/usr/bin/python3
with open("mab.txt") as f:
    for line in f:
        print(line.strip())
    print()
    print("Closed? %s" % (f.closed))
print()
print("Closed? %s" % (f.closed))
Outputs:
O, then, I see Queen Mab hath been with you.
She is the fairies' midwife, and she comes
In shape no bigger than an agate-stone
On the fore-finger of an alderman,
Drawn with a team of little atomies
Closed? False
Closed? True
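The with statement also closes the file when an exception interrupts the block, which is what makes it a substitute for finally. The sketch below forces an error while the file is open. It creates a one-line stand-in for mab.txt in a temporary directory (an assumption, so that the example is self-contained).

```python
import os
import tempfile

# Create a small stand-in for mab.txt
path = os.path.join(tempfile.mkdtemp(), "mab.txt")
with open(path, "w") as out:
    out.write("O, then, I see Queen Mab hath been with you.\n")

try:
    with open(path) as f:
        first = f.readline().strip()
        10 / 0    # deliberate error while the file is still open
except ZeroDivisionError:
    # with has already closed the file by the time we get here
    print("Closed after error? %s" % (f.closed))
```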
1.125 Errors (Summary)
• Errors happen
• Python uses exceptions which are raised
• And can be caught elsewhere