CAB402 Programming Paradigms
Assignment 1
Due: 23rd April 2021
Worth: 30%
The
purpose of this assignment is to develop your functional programming
skills by implementing a non-trivial application using
F#. The
principal focus will be on learning to program in a “pure” functional
style – i.e., without making use of any mutable state.
In order to
contrast this pure functional style of programming with the more
traditional imperative (impure) and object-oriented
paradigms, you will develop two separate implementations of the Genetic Algorithm library:
1. A pure functional implementation using F#
2. An object-oriented C# implementation
Required Software
1. Latest version of Visual Studio 2019 (with .NET desktop development and .NET 5.0)
Genetic Algorithms
For many so-called Combinatorial Problems, there exist an exponential number of possible solutions. For sufficiently large
problems
it therefore becomes prohibitively time consuming to perform a brute
force analysis of every possible solution. We must
resort to some
form of heuristic algorithm that aims to find a good solution, but not
necessarily the best possible solution. One
such class of heuristic
algorithms is Genetic algorithms which are based on the process of
evolution and natural selection as occurs
in the wild. We start
with some population of individuals, those individuals mate to produce
offspring. This mating process creates
children that have some combination of genes derived from their parents. During this reproductive process, some genes might
also
mutate (i.e., accidently change at random). Such random mutations are
generally harmful to the child, but occasionally they
might result
in a fitter mutant offspring. Natural selection means that fitter
individuals are more likely to mate and pass on their
genetic material to the next generation, so over time, the population as a whole tends to get fitter.
In
a genetic algorithm, each individual in the population represents a
particular proposed solution to a specific combinatorial
problem. So, to use a genetic algorithm to solve a combinatorial problem we must first decide how we will encode a proposed
problem solution as a sequence of genes. Each gene will represent some particular aspect of the solution. We can use whatever
data
type we like to represent each gene. For example, we could use binary
bits, integers, real numbers, characters, etc. The choice
will depend on the particular problem we are trying to solve.
If
natural selection involves “survival of the fittest”, we need to
provide a means for determining which individuals are fitter or
better
than others. We need to provide a fitness function that takes a
sequence of genes as input and returns a numeric fitness
value (where higher means better). The fitness function depends on what aspect of the problem we are trying to optimize. If we
want
to optimize for some different property, then we just need to change
the fitness function used and the evolutionary process
will then tend to find better and better solutions according to that fitness criteria.
In this assignment we are going to use Genetic Algorithms to solve two different combinatorial problems:
1. Travelling Salesperson Problem (https://en.wikipedia.org/wiki/Travelling_salesman_problem)
The
first is the Travelling Salesperson Problem (TSP), a well-known NP-hard
combinatorial problem. Given a set of cities at specific
locations, the challenge is for the salesperson to visit each city precisely once and then return to where they started, while
travelling
the least total distance. Visiting the cities in different orders will
result in different total distances, so when encoding a
particular
solution to the TSP problem as a sequence of genes, it is natural to use
a sequence of integers representing the order
in which the cities
will be visited. For example, the sequence [2,4,3,1,5] means that we
would start at city 2, then travel to city 4,
then to city 3 and so
on before eventually returning to city 2. Given a particular order in
which cities will be visited we can compute
the total distance
travelled, so the fitness function is based on total distance travelled,
however, since we want to minimize the
total distance travelled,
we actually use the negative of the total distance travelled as our
fitness value, since we want a higher
fitness to correspond to a better solution.
2. The Resource Constrained Scheduling Problem (RCSP) (https://en.wikipedia.org/wiki/Genetic_algorithm_scheduling)
The resource constrained scheduling problem (RCSP) is another well known and important combinatorial problem. We are given
a
set of jobs (or projects) that need to be completed. There are also a
finite number of various types of resources that are needed
to help complete the jobs. Each job requires a particular number of these various resources. For example, job 1 may require 2
resources
of type A and 1 resource of type B, while job 2 might require just 1
resource of type A. Each job also has a specific
duration. The
resources used by a job need to be available for the entire duration of
that job, i.e., they can not be used by any
other job during that time period. The resources are renewable, meaning that once we are finished using them for one job, we
can
use them again for some other job. We are also given an objective
function that we are trying to optimize. Often, we are just
trying
to minimize the total duration required to complete all jobs. A solution
to the RCSP involves determining a start time for
each of the jobs and an allocation of resources.
One popular method for creating such a solution is the Serial Schedule Generation Scheme. This algorithm involves scheduling
each
of the jobs, one by one in a particular order. For each job we seek to
determine the earliest possible time that the job could
be feasibly scheduled given the jobs that have already been scheduled. When we say feasibly scheduled, we mean that for the
entire
duration of that job (from start time to start time plus the job
duration), that we can allocate the required resources for
that job
such that those allocated resources are not being used by any of the
other jobs during that time period. If we change the
order in which
we schedule the jobs, then the resulting schedule will change and
possibly have a better or worse total duration
required to complete all jobs.
So,
in searching for the best possible schedule, rather than considering
every possible schedule, we instead just need to determine
the
order in which we should schedule the jobs, because a given scheduling
order leads deterministically to a specific schedule.
Solving a
RCSP problem using a genetic algorithm is therefore very similar to
solving a TSP problem. In both cases we are seeking
an ordering (or
permutation). In one case the order in which cities should be visited,
and in the other the order in which we should
serially schedule the
jobs. So, we use the same genetic representation as for TSP, a sequence
of integers (which must always be a
permutation). The only
difference is the fitness function. In the case of the RCSP problem, we
take the job order provided by a
particular individual’s genes and
use it to conduct a serial schedule generation. From that we can
determine the finish time of the
latest job and return the negative of that time value (since a shorter total duration is considered better).
During
serial schedule generation, determining the earliest possible start
time for each event can be computationally expensive
as it may
involve considering many possible start times before we eventually find
one that works. In order to make this algorithm
more efficient, we
exploit the following observation: The earliest possible start time for a
job is always going to be either time 0
(if it can be started
immediately), or it will be the precise finish time of some other job
that has already been scheduled. So, rather
than testing every
possible potential start time, we only need to consider either 0 or the
distinct finish times of jobs that have
already been scheduled. But we can leverage this idea even further. In determining whether some job can start at say time t0
(where t0 is either 0 or one of the previous finish time), we then need to ensure that the resources that we are proposing to
allocate
to this job are available for the entire duration of that job (i.e.,
times t such t0 <= t < t0 + duration). However, based on
the
above idea, we do not actually need to check at every time point between
t0 and t0+duration. We need only check at the time
points within
that period which also happen to be a finish time of one of the previous
jobs – because it is only at those time points
that resource usage changes. So, we can save the time checking at all the other time points within that period.
If
job 1 requires one resource of type A and there are a total of 3
resources of type A, then in developing a schedule we need to
decide
not only the start time of job 1, but also which resource of type A we
will allocate to it. Otherwise, it would be difficult to
ensure
that particular resource instance will be able to be used solely by job 1
for its entire duration. We therefore number each
resource instance from 0 up to N-1 where N is the finite number of resources of that type available.
Athletics Event Scheduling
The resource constrained scheduling problem is applicable to many real-world problems, from scheduling exams or staff rosters
or
factory manufacturing schedules. In this assignment we are going to
consider a real-world problem of scheduling the events for
a junior
athletics competition. We are given a list of events that each age
group of athletes will compete in during the competition.
For example:
U15-17G Javelin
U8B 70m Sprint
U13-14B Long Jump
U15-17G 700m Walk
etc …
Each
event requires two resources. It requires the group of athletes in that
age group and it requires a suitable location to conduct
the
event. Each different type of event requires a different type of
location. For example, the 70m Sprint and the 700m Walk both
require a running track, while the Long Jump event require a sand pit. For a given competition arena, there are a fixed set of
locations which are available for each type of event. For example, there may be 4 sand pits, but only 1 running track. When
scheduling
an athletics event (using serial schedule generation) we must ensure
that both required “resources” are available for
the entire
duration of that event: the group of athletes in that age group need to
be available and the location that will be allocated
to that event,
e.g., sand pit number 2, needs to be available. Each event will have a
fixed duration which will depend on the event
and the number of
athletes in that age group. The result of the scheduling process will be
a start and finish time for each event
and an indication of which
location resource was allocated to that event. The age group is
considered a “resource” but we don’t
need to worry about which instance to allocate as there is only one instance of each age group at the competition.
Provided Skeleton Solution
For
this assignment you have been provided with a skeleton solution (in the
form of a Visual Studio 2019 Solution) that you must
use. The Solution consists of 9 Visual Studio Projects as illustrated below.
The three projects with a red border are the ones you need to complete:
1. FSharpGeneticAlgorithm (an F# library for implementing a Genetic Algorithm)
2. ScheduleModel (which includes the fitness function to be used for Athletics Scheduling)
3. CSharpGeneticAlgorithm (an alternative C# library for implementing a Genetic Algorithm).
The skeleton code included in each of those projects provide more detailed implementation instructions.
The remaining six projects have already been implemented and may not be altered in any way. They include:
• An F# library called RandomMonad that your F# implementation of the Genetic Algorithm will use when it needs to make
random choices (see explanation on next page)
• An F# library called TSPModel that includes the fitness function to be used for the TSP problem.
• Two separate sets of unit tests for the FsharpGeneticAlgorithm library and the ScheduleModel library. If you have
implemented those libraries correctly, all unit tests should pass.
• Two separate GUI applications (TSPView and ScheduleView) which provide a dynamic visualization of the TSP and
Athletics Scheduling problems as they are being optimized. Both GUI applications are built using .NET Window
Presentation Foundation (WPF) and use the MVVM pattern.
MVVM (Model-View-ViewModel)
The GUI apps uses the Model-View-ViewModel (MVVM) architectural pattern. See:
• https://en.wikipedia.org/wiki/Model-view-viewmodel,
• https://docs.microsoft.com/en-us/windows/uwp/data-binding/data-binding-and-mvvm,
• https://docs.microsoft.com/en-us/xamarin/xamarin-forms/enterprise-application-patterns/mvvm,
• https://blogs.msdn.microsoft.com/msgulfcommunity/2013/03/13/understanding-the-basics-of-mvvm-design-pattern/)
The GUI apps are split into 3 components: The Model, The View and the ViewModel. The model provides data structures for
representing objects in the domain as well as basic “business logic” for the application domain. The View is the part of the
application that determines what the user sees on the screen and what user interactions (clicks, keyboard, gestures, etc) are
allowed. The ViewModel acts as a bridge between the View and the Model:
https://documentation.devexpress.com/WPF/15112/MVVM-Framework
• The View is included in the GUI app in the form of a WPF window named MainWindow.xaml. The ViewModel is also contained
in the GUI app via a class named AppViewModel.cs. The Model is implemented in the TSPModel and ScheduleModel libraries
respectively, together with the Genetic Algorithm libraries. Both GUI apps are set up so the user can dynamically switch from
using your F# version of the Genetic Algorithm to your C# version of the Genetic Algorithm. Input files for ScheduleView can
be found in the TestCompetitions folder.
Figure 1: TSPView GUI app
Figure 2: ScheduleView GUI app
The Rand Monad
One
of the F# modules provided to you is RandMonad. We will talk about
Monads later in the semester, but for now it is sufficient
to understand that this Monad is designed to allow our Genetic Algorithm to make random decisions. However, one of the key
advantages of functional programming is the use of pure functions. A pure function will produce the same result every time if
called again with the same parameters. Any function that makes random decisions is therefore not going to be a pure function.
To produce random values, we need a random number generator. A random number generator can generate a sequence of
pseudo random numbers. In .NET we can create a random number generator by calling new System.Random(seed), where
seed is an integer. If we call new System.Random multiple times, but with the same seed, then we will get the same sequence of
pseudo random numbers every time. So, given a particular seed, the system becomes deterministic. The other problem with
random number generators is that they are stateful.
Consider:
var random = new System.Random(42);
var v1 = random.Next();
var v2 = random.Next();
Each
time we call the Next method the internal state of the random object is
changed, so the second call to random.Next() is likely
to return a different value than the first call to random.Next (even though we are using the same random number generator and
have not altered the parameters). We can think of the random number generator as generating an infinite sequence of numbers.
The
first call to random.Next takes the first value in that sequence and
the next call takes the second value in this sequence.
In order to make our genetic algorithm implementation more deterministic and purer, we are going to abstract the use of the
random number generator by wrapping it inside a monad which enable us to avoid directly creating or using random number
generators. In F# to use a Monad we make use of a Computational Expression. Our RandomMonad provides a
ComputationalBuilder called rand to create Computational Expressions. The following example from TSPModel illustrates how
we make use of the random monad:
let randomCity width height: Rand
=
rand {
let! randomX = intRange 0 width
let! randomY = intRange 0 height
return { x = randomX; y = randomY }
}
First, notice that the code in the function is wrapped in rand { … }. This produces a Rand value, in this case a Rand. The
intRange function from the RandomMonad module returns a value of type Rand, i.e., a random integer. A random integer is
different
to a regular integer. We cannot for example add or subtract random
integers. Notice the exclamation mark after the let.
If we had instead said (without the exclamation mark):
let random1 = intRange 0 width
then random1 would be of type Rand. However, by using let!, random1 gets an integer value. The let! allows us to
“unwrap” the random monad and allows us to get our hands on the actual random value hidden inside. But we can only use let!
within a rand computational expression. So, the final result ends up being a random value, in this case a randomly generated
City.
So, the monad provides us a way to unwrap, various random values, but
the final result gets wrapped up again. Ultimately,
we need to provide a random number generator for such functions to be able to be evaluated, but that is done just once at the
root our application – and if we use the same seed each time we run our application then the entire program will run
deterministically
every time. Importantly, this allows us to create deterministic unit
tests for individual functions and for the entire
program.
The other important thing to note about the Rand monad is that the order in which we list the let! statements is significant,
because each time we unwrap a random value we are implicitly consuming values from a stateful random number generator. For
example, the following function definition would produce different results to the one above (even if the same random number
generator is used):
let randomCity width height: Rand =
rand {
let! randomY = intRange 0 height
let! randomX = intRange 0 width
return { x = randomX; y = randomY }
}
Details of the Genetic Algorithm Implementation
We start with a randomly generated population of individuals. In our case, each individual will be a randomly generated
permutation of integers. All individuals in a population always have the same number of genes.
In
each generation we create offspring by selecting parents via a process
of natural selection. There are many different selection
algorithms
(https://en.wikipedia.org/wiki/Selection) , but in our case we will use
a process called tournament selection. From the
entire population
we select a field of N individuals who will compete. The individual with
the highest fitness value in the field is
then selected as the winner and gets to mate. In our implementation, the parameter N will be 2. i.e., two individuals will be
selected
to compete in each tournament. Each of the N individuals is selected
independently, so there is a small chance that the
same individual
may be selected twice and hence compete with itself (which is not a
problem). This tournament selection process
is carried out twice to select both parents – there is no notion of gender.
Once we have two parents, we combine their genetic material via a cross over process to create the offspring. There are many
different
crossover algorithms (https://en.wikipedia.org/wiki/Crossover), but in
our case the cross over will happen via a process
called Ordered
Crossover (OX). We randomly split the first parent’s genes into two
sequences such that the first part and the last
part each have at
least one gene. The first part of the first parent’s genes becomes the
first part of the offspring’s genes. The last
part of the
offspring’s genes is determined by adding genes from the last part of
the first parent but reordered according to the
order in which they
appear in the second parent. For example, if parent1 = [0,3,5,4,2,1,6]
and parent2 = [6,4,2,1,0,3,5] and the
split point is 4 then the
first 4 genes come from parent1 [0,3,5,4] and the remaining genes
[2,1,6] are ordered according to parent2
i.e. [6,2,1] because 6 comes before 2 and 2 comes before 1 in parent2. So, the child in this example will be [0,3,5,4,6,2,1].
Once we have combined the genetic material from the two parents, sometimes (with probability = 0.15) a mutation occurs which
randomly changes the makeup of the offspring’s genes. There are many different mutation algorithms
(https://en.wikipedia.org/wiki/Mutation), but in our case we will perform a reverse mutation. We randomly split the genes into
three
sequences, start, middle and end (such that the middle section has at
least two genes. The start and end sequences remain
unchanged, but the genes in the middle are reversed in order. For example, if we randomly split at positions 2 and 4, then
[0,3,5,4,2,1,6] would mutate into [0,3,2,4,5,1,6].
Once a specified number of offspring have been produced, the next generation is then formed by combining the offspring with
some
of the individuals from the previous generation. Many different
algorithms exist, but in our case, we will use a process called
Elitism
Selection. We combine all the offspring together with the N fittest
individuals from the previous generation (where N = 10).
This ensures that we do not lose the best solution found so far.
There are also various methods for deciding how many generations to evolve before terminating the application. Given the
heuristic nature of the optimization process, we may never know for sure when we have found the best possible solution. We
might
search forever and never find a better solution or a better solution
might be just a generation away. We therefore leave it
to the
end-user to decide when to terminate the search. Our algorithm will
keep searching forever, producing an infinite sequence
of the best individuals from each successive generation.
To obtain good results with genetic algorithms, it is important to experiment with various parameters. In our case we will
generally
use an initial population of 100 individuals. Using more individuals
creates more genetic diversity but takes longer to
simulate each generation.
The Array Type
Unlike
lists and sequences, arrays in F# are mutable, meaning you can update
the values of individual elements after the array has
been created. For example, the following changes the 3rd element of the array to 32.
myArray.[2] <- 32
Such
impure F# code is forbidden in this assignment. The goal of this
assignment is to learn the art of pure functional programming.
We do however use the array type extensively in the Genetic Algorithm module. We do not use arrays because they can be updated
after
they have been created; we use them only because they are more
efficient that lists to lookup a particular element. Element
lookup
with arrays in O(1) whereas it is O(N) for lists and sequences. i.e.,
with a list or sequence, if you wanted to access the 100th
element the system would need to traverse element by element from the start until it got to the 100th element, whereas with an
array it can jump straight to the 100th element in constant time.
The following are some useful Array functions that I have used in my sample solution:
• Array.append
• Array.except
• Array.filter
• Array.init
• Array.init
• Array.isEmpty
• Array.map
• Array.map2
• Array.maxBy
• Array.rev
• Array.sortBy
• Array.sum
• Array.take
• Array.truncate
• Array.unfold
What to Submit
1. A 1-to-2-page PDF report discussing your experience and comparing the two approaches (F# pure functional and C#
object-oriented imperative). Your comparison should include a discussion on the efficiency and effectiveness of the two
approaches. You should discuss which was easier to program, which code is easier to read, understand and maintain,
which is more concise, and which produces more efficient code. Provide measures were applicable to support your
claims.
2. A zip file containing your entire Visual Studio Solution folder.
You should work with the skeleton Visual Studio solution provided and make changes only to the following projects:
• FSharpGeneticAlgorithm
• CSharpGeneticAlgorithm
• ScheduleModel
Note:
• All occurrences of // TODO: add correct implementation here comment should be replaced by an actual
implementation.
• Extra helper functions and methods should be added as needed. Complex functions and methods should be
broken down into simpler functions. Ensure you use meaningful names for all functions, methods, parameters,
and variables.
• Ideally, all 8770 unit-tests should pass.
To reduce file size when submitting you should remove the following:
• .vs folder
• bin folders
• obj folders
• Visual Studio solution files except for CAB402GeneticAlgorithm.sln
Everything remaining should be added to a compressed zip file with a name of the form n1234567.zip
Note: this must be a zip file, not a tar file, not a rar file, not an iso file, etc.
When we mark your assignment, we will:
1. Unzip your zip file
2. Double click on CAB402GeneticAlgorithm.sln to open your solution in Visual Studio 2019
3. Perform a Rebuild Solution (this should cause any required packages to download as necessary).
4. Run all the unit tests via the Test Explorer Window
5. Start your TSPView and ScheduleView applications and test that they behave as expected, trying both the F#
and C# versions of the Genetic Algorithm library.
6. We will then browse your source code and assess it against the assessment criteria.