School of Electronic Engineering - Dublin City University - Module EE544
1
Jan
2021
© 2021 Paul F Whelan
Computer Vision Assignment 20/21
Assignment Due Week 11 (see footnote 1)
All assignment work must use the Python/Keras (see footnote 2) development environment.
Deliverables
The main purpose of this assignment is to introduce students to the practical aspects of developing deep learning
based computer vision systems within the Python/Keras environment. The assignment is worth 40% of your overall
module mark (see footnote 3).
Implement the tasks outlined and record your observations. Answer all questions and develop appropriate code
based solutions when requested. Submit a report detailing the rationale, design and testing of the approach taken in
developing your solution. This document (see footnote 4) should include your report details (introduction, rationale,
design, testing [including the test procedures used to evaluate the effectiveness of the method chosen], and conclusion).
A key element of this assignment is your ability to design and implement your own test strategy. Please submit
(see footnote 5) a single PDF document via Loop (Moodle).
Students should include full text code listings in the appendix, plus a link within the PDF report to an online
drive containing all assignment related material (code / data / test results).
Please use the naming convention
_.pdf for your submission. A completed
Coursework Assignment Cover Sheet (this can be found in the shared folder: CV_coversheet.pdf) must accompany
each assignment submission for it to be considered valid (this should be the first page of your report).
The assignment (including all code submitted) must be your own original work. Selected students will be subjected
to interview and/or demonstration. The use of third party source code to directly address the task outlined in this
assignment is not permitted and will result in at least a zero mark. If in doubt, ask your module tutor/coordinator.
Footnote 1: Please refer to the Module Protocol for the strict no late assignment policy. Please refer to Loop for
the exact submission deadline and the detailed marking scheme (rubric).
Footnote 2: This is installed on all the PCs in the School's own Computer Labs. These are available for use by ALL
students registered on this module.
Footnote 3: While I will endeavour to get your coursework marks back to you as soon as possible, this is resource
dependent and may not occur prior to the exam board.
Footnote 4: The format is as a technical report. While there is no page quantity requirement, the final report
should normally be no more than 40 pages. See Appendix A.
Footnote 5: All computer accounts should use the usernames and passwords issued by ISS at registration (notify
Paul Wogan if you have any problems with this).
Note:
Screenshots of code will not be accepted (this could result in a zero mark for your CA submission). All code
must be in text form in your final report. No reuse of 3rd party code – all coding solutions presented for
grading must be your own original work.
Submission of compressed files (e.g. zip, rar) will not be accepted (this could result in a zero mark for your
CA submission). Please use PDF submissions.
High URKUND (text matching) scores will automatically result in a zero CA mark.
All third party material must be referenced correctly and explicitly. All ideas and paraphrases of other people's
words must be correctly attributed in the body of the report and in the references.
Getting Started
Read and understand the Python/Keras requirements/installation instructions as indicated on the relevant websites.
Programme development (see footnote 6) should use the training (see footnote 7) and validation (pseudo test data
that allows tuning of the hyperparameters) datasets. Note that the final system accuracy should be measured only on
the previously unused test dataset, applied to your finalised system (this dataset is not to be used for
programme development). Do not edit the datasets; they must be used as presented.
The assignment (see footnote 8) is in three sections. The first focuses on developing a basic VGG-lite CNN trained
from scratch. The second focuses on developing a high-accuracy and computationally efficient solution using
fine-tuning based transfer learning (pre-trained on ImageNet). The third covers an image segmentation task using
UNet.
1: Multi-class Image Classification [Training from Scratch using ImageNette]
Dataset #1 (ImageNette): Based on fast.ai’s ImageNette [1] dataset, a subset of 10 easily classified objects from
ImageNet [2]. In this assignment, we will use a smaller subset of 4 classes [church, garbage_truck, gas_pump and
parachute]. This dataset is on the EE544 module drive in the assignment folder (imagenette_4class.zip). Data is
organised in three main sub-folders: train, validation and test. The training data consists of 1200 images for each
of the classes, with 100 validation images and 50 test images per class.
Complete all sections, justifying all engineering design choices. In particular, discuss the relevance of your network
architecture, your choice of optimizer and all the hyper-parameters used. In addition to the training and validation
accuracy and loss diagrams your solution must also produce all the metrics listed below (see scikit-learn metrics for
details) for each section.
Footnote 6: Using a CPU-only PC will result in long run times for these tasks. If you have access to a GPU
[Dettmers, servethehome] on your PC then this is a much better option. Alternatively, you can make use of
cloud-based services such as Colab (Colaboratory: a free but time-limited Jupyter notebook environment) and AWS
(Amazon Web Services, paid). For details on using Keras with a GPU on Amazon EC2 see [CS231n, Brownlee].
Limited access to a high-end Nvidia Tesla K40C GPU is available via the School of Electronic Engineering via Linux
(Ubuntu). See https://sites.google.com/dcu.ie/ee-gpgpu/ for details. Contact conor.mcardle@dcu.ie for any
additional information.
Footnote 7: Training Dataset: data samples used to fit the model. Validation Dataset: data samples used to evaluate
the model fit on the training data as we tune model hyperparameters. Test Dataset: data samples that are set aside
until we wish to generate an unbiased evaluation of a final model fit on the training data.
Footnote 8: “If you're using Colab and you feel like training your model on GPU is slow, switch to the TPU runtime
and tune the "steps_per_execution" parameter in compile(). Seeing a 5-10x speedup is pretty common”. François Chollet
Required Diagrams: Illustrate your answer with appropriate training and validation accuracy and loss diagrams.
Required Metrics: Final validation accuracy and loss, final test accuracy and loss, confusion matrix and the network's
computational cost.
Required Models: All models developed should be saved in Hierarchical Data Format (HDF5) (.h5) format.
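As an illustration of the confusion matrix metric, the following minimal sketch builds one in plain Python from lists of true and predicted class indices. The helper names and the label values are hypothetical and exist only for illustration; they are not real results. Rows index the true class and columns the predicted class, which is the same layout that scikit-learn's confusion_matrix function produces.

```python
# Hypothetical example: class order follows the four ImageNette classes used in Section 1.
CLASSES = ["church", "garbage_truck", "gas_pump", "parachute"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Return an n_classes x n_classes table: rows = true class, columns = predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up labels purely for illustration (not measured results).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]

cm = confusion_matrix(y_true, y_pred, len(CLASSES))
for name, row in zip(CLASSES, cm):
    print(f"{name:>14}: {row}")
print("accuracy:", accuracy(cm))
```

In a real evaluation, y_pred would typically come from taking the arg-max of the network's class probabilities on the test set.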
Layer                                    Number of output filters
2D (3x3) convolution layer               32
2D (3x3) convolution layer               32
2D (2x2) Pooling
2D (3x3) convolution layer               64
2D (3x3) convolution layer               64
2D (2x2) Pooling
Flatten
Fully-connected NN layer                 512
Fully-connected NN layer (Prediction)    Number of classes
Figure 1: Baseline VGG-lite (see footnote 9) CNN Structure.
a) Design and develop a simple VGG-lite (based on [3], which in turn is based on VGG-16D CNN [4]) baseline
convolutional neural network architecture as per the CNN structure illustrated in Figure 1 (and as detailed
below) to classify our 4-class ImageNette data. Illustrate the network's performance using the required diagrams
and metrics listed previously. Save your final model as an HDF5 file.
Network details:
Convolution layers should use Xavier (glorot) uniform kernel initialization, a dilation rate of 1, ‘same’
padding, strides of (height, width) = (1,1) and ‘relu’ activation. Note that the first layer will need to account
for the input shape of the data being examined.
Pooling layers will use (2x2) max pooling (i.e. a pool size of (vertical, horizontal) = (2,2)).
Fully-connected layers will also use Xavier (glorot) uniform kernel initialization and ‘relu’ activations.
[Marks: 5/40]
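One part of the network's computational cost is its number of trainable parameters, which can be estimated by hand before any training is run. The sketch below does this arithmetic for the Figure 1 structure. It assumes a 224x224 input with 3 colour channels, 4 output classes, and bias terms on every layer (the Keras default); the channel count and the function names are assumptions made for illustration. With ‘same’ padding and unit strides the convolutions preserve the spatial size, so only the two (2x2) pooling layers halve it.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square-kernel 2D convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully-connected layer."""
    return n_inputs * n_outputs + n_outputs


def vgg_lite_params(height=224, width=224, channels=3, n_classes=4):
    """Return (flattened feature length, total trainable parameters)."""
    total = 0
    in_channels = channels
    for filters in (32, 64):              # two blocks: conv, conv, pool
        for _ in range(2):
            total += conv2d_params(3, in_channels, filters)
            in_channels = filters
        height //= 2                      # (2x2) max pooling halves each dimension
        width //= 2
    flat = height * width * in_channels   # size after the Flatten layer
    total += dense_params(flat, 512)
    total += dense_params(512, n_classes)
    return flat, total


flat, total = vgg_lite_params()
print("flattened features:", flat)
print("trainable parameters:", total)
```

Under these assumptions almost all of the parameters sit in the first fully-connected layer, which is a useful observation when discussing computational cost.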
b) Without adding any additional convolutional or fully-connected layers, experiment with ways to improve the
baseline VGG-lite network's performance. Each change to the original simple network outlined in (a) must be
clearly justified in terms of the theoretical concept and supported by appropriate experimental results. Illustrate
the network's performance using the required diagrams and metrics listed previously. Save your final model as an
HDF5 file.
[Marks: 5/40]
Footnote 9: The default input size for this model is 224x224.
2: Dog Breed Classification using Fine-Tuning based Transfer Learning (see footnote 10)
Dataset #2 (ImageWoof): In this section we will use fast.ai’s ImageWoof [1] dataset, a subset of 10 harder to classify
classes [Australian terrier, Border terrier, Samoyed, beagle, Shih-Tzu, English foxhound, Rhodesian ridgeback, dingo,
golden retriever, Old English sheepdog] from ImageNet [2]. The original data has a train/validation split of 1300/50
images, although this will be changed as part of the assignment.
Complete all sections, justifying all engineering design choices. In particular, discuss the relevance of your network
architecture, your choice of optimizer and all the hyper-parameters used. In addition to the training and validation
accuracy and loss diagrams, your solution must also produce all the metrics listed below for each section.
Required Diagrams: Illustrate your answer with appropriate training and validation accuracy and loss diagrams.
Required Metrics: Final validation accuracy and loss, final test accuracy and loss, confusion matrix and the network's
computational cost.
Required Models: All models developed should be saved in Hierarchical Data Format (HDF5) (.h5) format.
a) The original data has a train/validation split of 1300/50 images. This will need to be reorganized into an
appropriate train/validation/test split before you train your network models. The details of the split are left to
you, but you must fully justify any final split used in your evaluation.
[Marks: 3/40]
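One way to make such a split reproducible is to shuffle the file names of each class with a fixed seed and then slice the result. The sketch below shows only the mechanics: the 15%/15% validation/test fractions, the file names and the function name are hypothetical placeholders, not a recommended split, and the choice of fractions still has to be justified in your report.

```python
import random


def split_items(items, val_fraction, test_fraction, seed=0):
    """Deterministically split one class's file names into train/validation/test."""
    items = sorted(items)                   # fixed starting order, independent of listing order
    random.Random(seed).shuffle(items)      # seeded shuffle so the split can be reproduced
    n_val = int(len(items) * val_fraction)
    n_test = int(len(items) * test_fraction)
    return {
        "validation": items[:n_val],
        "test": items[n_val:n_val + n_test],
        "train": items[n_val + n_test:],
    }


# Hypothetical pool of file names standing in for one class (1300 + 50 images).
files = [f"img_{i:04d}.JPEG" for i in range(1350)]
split = split_items(files, val_fraction=0.15, test_fraction=0.15)
for part, names in split.items():
    print(part, len(names))
```

Applying the same function to each class folder in turn keeps the class proportions the same in all three subsets.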
b) Implement a ResNet-50 (see footnote 11) [5] fine-tuning based transfer learning CNN architecture to optimise the
Dog Breed classification task, using the new data split developed in (a). During fine-tuning, retrain only the res5c
block (freeze the weights of the first 174-33=141 layers of the base model). Illustrate the network's performance
using the required diagrams and metrics listed previously. Save your final model as an HDF5 file.
[Marks: 10/40]
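The layer arithmetic in (b) can be sketched as follows. In Keras, each layer object exposes a boolean trainable attribute, and freezing a layer means setting it to False. So that the example runs without downloading any weights, it uses simple stand-in objects in place of a real base model; the helper name and the stand-in layers are hypothetical, and the counts 174 and 33 are taken from the task description rather than read from an actual model.

```python
from types import SimpleNamespace


def freeze_first_layers(layers, n_frozen):
    """Freeze the first n_frozen layers, leave the rest trainable; return the trainable count."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= n_frozen
    return sum(1 for layer in layers if layer.trainable)


TOTAL_LAYERS = 174       # base model layer count, as stated in the task
RETRAINED_LAYERS = 33    # layers left trainable, as stated in the task
N_FROZEN = TOTAL_LAYERS - RETRAINED_LAYERS

# Stand-in objects with a 'trainable' attribute, mimicking Keras layers (assumption).
stand_in_layers = [SimpleNamespace(name=f"layer_{i}", trainable=True)
                   for i in range(TOTAL_LAYERS)]

n_trainable = freeze_first_layers(stand_in_layers, N_FROZEN)
print("frozen:", N_FROZEN, "trainable:", n_trainable)
```

With a real Keras base model the same helper could be given the model's layers list; it is worth checking the actual layer count of your installed version first, and the model should be compiled after the trainable flags are changed so that the change takes effect.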
c) Illustrate the performance of your network (as developed in part 2(b)) by applying it to previously unseen
images of the Dog Breed classes that you have acquired yourself. Comment on the performance of the network
when applied to these previously unseen “in the wild” images.
[Marks: 2/40]
Footnote 10: Pretrained on ImageNet.
Footnote 11: The default input size for this model is 224x224.
3: Image segmentation using UNet
Dataset #3 (Oxford Pet dataset): In this section, we will implement the original UNet [6] based image segmentation
model trained from scratch on the Oxford-IIIT Pet dataset [7]. The data splitting is left up to you, but this must be
fully justified. The Oxford-IIIT Pet Dataset is a 37 category pet dataset with roughly 200 images for each class,
created by the Visual Geometry Group at Oxford. The images have large variations in scale, pose and lighting. All
images have an associated ground truth annotation of breed, head ROI, and pixel level trimap (see footnote 12)
segmentation (which we will use to address this task).
Figure 2: Random Oxford-IIIT Pet dataset image and its associated (trimap) segmentation mask.
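The trimap masks label each pixel as 1 (foreground), 2 (background) or 3 (not classified). Many loss functions that take integer labels expect class indices starting at 0, so a common preprocessing step is to shift the labels down by one, or to collapse the mask to a binary foreground map. The sketch below shows both options on a tiny hand-written mask; the function names are hypothetical, and how the "not classified" pixels should be treated is a design decision for you to justify, not something the dataset prescribes.

```python
FOREGROUND, BACKGROUND, NOT_CLASSIFIED = 1, 2, 3


def to_zero_based(trimap):
    """Shift labels {1, 2, 3} to {0, 1, 2} for use as integer class indices."""
    return [[value - 1 for value in row] for row in trimap]


def to_binary_foreground(trimap, unclassified_as_foreground=False):
    """Return 1 for foreground pixels and 0 otherwise.

    Whether 'not classified' pixels count as foreground is an assumption
    controlled by the flag; the default treats them as non-foreground.
    """
    positive = {FOREGROUND}
    if unclassified_as_foreground:
        positive.add(NOT_CLASSIFIED)
    return [[1 if value in positive else 0 for value in row] for row in trimap]


# Tiny made-up 2x3 mask purely for illustration.
sample = [[1, 2, 3],
          [3, 1, 2]]
print(to_zero_based(sample))
print(to_binary_foreground(sample))
print(to_binary_foreground(sample, unclassified_as_foreground=True))
```

The same remapping applies unchanged to full-size masks once they are loaded as nested lists or converted from arrays.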
Complete all sections, justifying all engineering design choices. In particular, discuss the relevance of your network
architecture, your choice of optimizer and all the hyper-parameters used. In addition to the training and validation
accuracy and loss diagrams, your solution must also produce all the metrics listed below for each section.
a) Implement the original UNet image segmentation architecture [6] on the Oxford-IIIT Pet Dataset [7]. The details
of the train/validation/test split are your decision, but you must fully justify any final split used in your
evaluation. Evaluate the performance of your baseline network.
[Marks: 10/40]
b) What changes can be applied to your baseline network (implemented in (a)) to improve the network's
performance?
[Marks: 3/40]
c) Generate predictions for all images in the test set. How well does the network work on previously unseen “in the
wild” image data?
[Marks: 2/40]
Footnote 12: Trimap annotations are provided for every image in the dataset. Pixel annotations: 1: Foreground,
2: Background, 3: Not classified.
References
1. Jeremy Howard (2019), “Fast.ai Imagenette”, https://github.com/fastai/imagenette. Please refer to the usage
LICENSE.
2. Deng, J. and Dong, W. and Socher, R. and Li, L.-J. and Li, K. and Fei-Fei, L. (2009), “ImageNet: A Large-Scale
Hierarchical Image Database”, CVPR09
3. Adrian Rosebrock (2019), Deep Learning for Computer Vision with Python.
4. Karen Simonyan, Andrew Zisserman (2014), “Very Deep Convolutional Networks for Large-Scale Image
Recognition”, arXiv:1409.1556
5. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (2015), “Deep Residual Learning for Image Recognition”,
arXiv:1512.03385
6. O. Ronneberger, P. Fischer, T. Brox (2015), “U-Net: Convolutional Networks for Biomedical Image
Segmentation”, arXiv:1505.04597
7. O. M. Parkhi, A. Vedaldi, A. Zisserman, C. V. Jawahar (2012), “Cats and Dogs”, IEEE Conference on Computer Vision
and Pattern Recognition
8. Francois Chollet (2017), Deep Learning with Python
9. Aurelien Geron (2019), Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools,
and Techniques to Build Intelligent Systems
Appendix A: Assignment Report Details
The technical report structure [i.e. introduction, rationale, design, testing, and conclusion] is just for guidance. The
key to your assignment report is that you fully and clearly answer each question given in your assignment sheet
and provide the appropriate evidence of any conclusion through experimentation / testing.
Key to any technical report is the precision of your answer. As well as the engineering content, the report is
evaluated based on the clarity and communication of the ideas involved. Please reference any work that is not your
own (there are many bibliography formats, select one and stick to it).
All quantitative results, including the test procedures used to evaluate the effectiveness of the method chosen,
should also be included in the report. A key element of this assignment is your ability to design and implement your
own test strategy where required.
This is an individual assignment and students must produce their own report.
Selected students are subject to interview and/or demonstration.
REMEMBER:
Everything you write has to be in your own words.
All ideas and paraphrases of other people's words must be correctly attributed in the body of the report and in
the references.
No reuse of 3rd party code – all coding solutions presented for grading must be your own original work.