ENEL 645 Project: Object classification in remote sensing images with machine learning
1 Dec, 2020
Motivation: Satellite and aerial remote sensing imagery is used for natural disaster planning and recovery, urban development, surveillance, law enforcement, and sustainability. Object recognition is a key aspect of these analytics. This project aims to guide you through the process of applying machine learning to a practical topic of interest, including data acquisition, model training, and analysis of results.
Task: Object recognition. You will be given a dataset from a known sensor. You will train a classifier to automatically recognize objects such as vehicles, ships, buildings (commercial, residential), roads, water, and land (forest, grass, etc.). To guide you, the two parts below are recommended.
Deliverables:
- A final report covering the following
o Title page (group members) and chosen sensor
o Part I: Data processing and understanding
▪ Explain the data, bands, algorithm, and training
o Part II: Results of chosen algorithm/model with evaluation
- Links to code, results and trained models (and run instructions)
- Links to the datasets gathered for training and test
Part I: Data preprocessing and understanding:
1) Download the data from the respective links. The assigned image will be your test image for evaluation. Use software such as QGIS, Google Earth, or Python to visualize the data. Label/format/crop the data into a format suitable for your algorithms (a minimal cropping/tiling sketch follows this list). Search for useful data to train your algorithms (either publicly available or labelled yourself).
2) Select the objects you want to classify, depending on the image(s), their resolution, and the relevant literature. The output of the algorithm will be the class of the object(s) in the image. The more classes you can obtain and label, the better.
- In low-resolution imagery covering large regions, roads, buildings, water, and vegetation types may be classified (see the Land Use and Land Cover Classification System).
- High-resolution imagery may allow vehicles within a parking lot or ships in a harbour to be recognized.
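For reference, here is a minimal Python sketch of the cropping/tiling step in step 1. It assumes the rasterio package is installed; the path "scene.tif" and the tile size are placeholders you would replace with your own data.

import os
import numpy as np
import rasterio
from rasterio.windows import Window

TILE = 256  # tile size in pixels; match your model's input size

os.makedirs("tiles", exist_ok=True)
with rasterio.open("scene.tif") as src:
    for row in range(0, src.height - TILE + 1, TILE):
        for col in range(0, src.width - TILE + 1, TILE):
            window = Window(col, row, TILE, TILE)
            tile = src.read(window=window)           # shape: (bands, TILE, TILE)
            tile = np.moveaxis(tile, 0, -1)          # -> (TILE, TILE, bands)
            tile = tile.astype(np.float32) / 255.0   # 8-bit data to [0, 1]
            np.save(f"tiles/tile_{row}_{col}.npy", tile)

The same windowed loop can be reused in Part II to split the test image into sub-images and recombine the per-tile predictions.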
Part II. You are free to choose any algorithm(s), any preprocessing, and any bands of your choice for the assigned image. Evaluation on the test image will be based on the metrics below. Your test image should be labelled according to the objects you choose.
1) Export your test image in a compatible format for evaluation. This could involve splitting it into smaller sub-images and combining the results.
2) Evaluate your algorithm(s) using the following metrics (a small sketch of both metrics follows this list):
a. Accuracy of classification via a confusion matrix.
b. If your algorithm outputs bounding boxes, detection accuracy at 50% overlap (IoU ≥ 0.5) between predicted boxes and the labels on your test images.
3) Show/attach sample outputs versus the labelled ground truth to demonstrate that your algorithm is working as intended.
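As a sketch of both metrics (assuming scikit-learn is available; the labels and boxes below are placeholder data, not results from any real run):

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# (a) Classification accuracy via a confusion matrix
y_true = ["airplane", "ship", "vehicle", "ship"]   # ground-truth labels
y_pred = ["airplane", "ship", "ship", "ship"]      # model predictions
print(confusion_matrix(y_true, y_pred, labels=["airplane", "ship", "vehicle"]))
print("accuracy:", accuracy_score(y_true, y_pred))

# (b) A predicted box counts as correct when IoU >= 0.5.
# Boxes follow the [xmin, xmax, ymin, ymax] convention used in the primer below.
def iou(a, b):
    ix = max(0, min(a[1], b[1]) - max(a[0], b[0]))   # overlap width
    iy = max(0, min(a[3], b[3]) - max(a[2], b[2]))   # overlap height
    inter = ix * iy
    area = lambda r: (r[1] - r[0]) * (r[3] - r[2])
    return inter / (area(a) + area(b) - inter)

pred, label = [180, 400, 380, 510], [190, 410, 375, 505]
print("correct detection:", iou(pred, label) >= 0.5)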
Note: The image processing algorithms may require considerable computational power; feel free to search for or use known techniques to reduce the input size, such as dimensionality reduction (a sketch follows this note). In addition, you may have free access to high-end GPUs with Google Colab/cloud; please make backup plans and files as needed:
https://colab.research.google.com/notebooks/intro.ipynb
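One common reduction is PCA over the spectral bands; here is a minimal sketch, assuming scikit-learn is installed (the random array stands in for your own multi-band image):

import numpy as np
from sklearn.decomposition import PCA

bands = np.random.rand(800, 800, 8)         # placeholder: 8 spectral bands
flat = bands.reshape(-1, bands.shape[-1])   # (H*W, 8): one sample per pixel
pca = PCA(n_components=3)
reduced = pca.fit_transform(flat).reshape(800, 800, 3)
print(pca.explained_variance_ratio_)        # variance kept per component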
Note: Labelling your own images: you can use LabelImg, LabelMe, other software, or an ad-hoc approach. The tools are quite intuitive and easy to learn.
Background/Primer: Image processing:
Figure 1 and Figure 2 show sample outputs of models.
An image is simply a stack of matrices, where each pixel stores an intensity per colour channel (for example Red, Green, Blue, each with 256 different intensities) from 0 to 255 (or 0 to 1 if normalized).
The top-left corner is (x,y) = (0,0). The bottom-right corner is (x,y) = (width, height) of the image. So this image here is (800,800) pixels. It is an RGB image represented as a matrix of size [x,y] × 3, one layer for each of the R, G, B channels, i.e. [800,800,3].
Fig 1. Sample image. This [800,800,3] image is passed to the algorithm, which outputs class probabilities and boxes. For this image, we only display the airplane probability and ignore the vehicle/ship classes.
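To make the matrix view concrete, a tiny NumPy illustration (the pixel values are made up):

import numpy as np

img = np.zeros((800, 800, 3), dtype=np.uint8)   # black 800x800 RGB image
img[0, 0] = [255, 0, 0]        # top-left pixel (x,y) = (0,0) set to red
print(img.shape)               # (800, 800, 3)
print(img[0, 0])               # [255   0   0]
normalized = img.astype(np.float32) / 255.0     # rescale to [0, 1]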
Background on object detection: Object detection is the task of localizing (where is the object?) and identifying (what is the object?) multiple objects in a single image.
Localizing: find bounding boxes, i.e., the coordinates where the object is: [xmin, xmax, ymin, ymax]. So one of the airplane locations is roughly [180, 400, 380, 510]; this is already drawn as the green box in the figure above.
Classifying: for each box, also output the softmax probabilities of the classes. Here, the corresponding probability is 36% for the airplane class (the other 64% is split between the vehicle and ship classes, since we trained a 3-class object classifier).
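For illustration, here is how raw class scores for one box become softmax probabilities (the scores are made up to land near the 36% example above):

import numpy as np

classes = ["airplane", "vehicle", "ship"]
logits = np.array([0.2, 0.1, 0.0])          # placeholder scores for one box
probs = np.exp(logits) / np.exp(logits).sum()
for c, p in zip(classes, probs):
    print(f"{c}: {p:.0%}")                  # roughly 37% / 33% / 30%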
Sometimes the resolution or location does not permit "ships/airplanes" as objects, so we consider buildings or land types as objects instead. For example, see the SAR image near the end of this article:
https://medium.com/gsi-technology/a-beginners-guide-to-segmentation-in-satellite-images-9c00d2028d52
A SAR image (left) rendered to 0-255, and (right) a sample building segmentation, where white corresponds to building pixels.
Fig 2: A common land use land cover map derived from LANDSAT over Mississippi. (Public domain)
https://www.usgs.gov/media/images/5-maps-dates-land-use-and-land-cover-data