MCEN90032 Sensor Systems
Workshop 3: Sensor Data Processing
Prepared by: Nandakishor Desai, Noor E Karishma Shaik, and Marimuthu Palaniswami
1 Introduction
Sensors are vital components of an autonomous vehicle, with cameras, light detection and ranging (LiDAR), radar,
global positioning system (GPS) and inertial measurement unit (IMU) among the most common. Sensors play a
crucial role in the perception and localisation of vehicles to achieve safe and reliable navigation. The localisation
system identifies the vehicle’s location in a reference coordinate system, while the perception system evaluates the
driving environment around the vehicle and identifies road objects such as other road users, traffic signals and obstacles.
Further, information from multiple sensors can be combined to achieve the same objective more effectively and to reduce the uncertainties that arise when sensors act separately. In this workshop, you will work with data collected by a ground vehicle (GV). The GV is equipped with LiDAR and camera sensors and navigates through an urban environment.
• In part A, you will work with sample image data collected by GV and investigate how to relate two images
collected from two different viewpoints to describe vehicular motion.
• In part B, you will read sample LiDAR data collected by GV and perform spatial transformations to understand how vehicular movement affects the perception of the external environment, and investigate the relationship between two pointclouds of the same object collected from two viewpoints.
• In Part C, you will need to design and develop a perception system for GV using its LiDAR data to identify
everyday urban objects in its environment, such as pedestrians, bikes, cars, and others.
• In part D (optional), you will investigate basic concepts associated with sensor fusion and work on simple
operations to combine information from LiDAR and camera onboard the GV.
2 Assessment
• You will need to complete and submit this work individually.
• Write three separate MATLAB scripts (partA.m, partB.m, partC.m) for the three tasks. You may have additional
scripts for supporting functions that you call from your task scripts.
• The workshop assignment is worth 15 marks.
• The demonstrations are due by week 11. You need to show the working of your code to demonstrators, explain
its structure and answer their questions. Demonstration is worth 12 marks.
• Submit your well-documented code as a single zip file. Ensure the MATLAB script files contain your name and ID at the top. Name the zip file with your student ID and submit it to Canvas before 16/05/2023, 11:59 PM. Submission of well-documented code is worth 3 marks.
• A submission will only be considered if its working has been demonstrated to the demonstrators. Late submissions will not be considered.
3 Learning Objectives
• Feature descriptors from images and feature matching.
• Structure of LiDAR pointclouds and storage management.
• Aligning two LiDAR pointclouds of the same scene using iterative closest point algorithm.
• Using images to design perception system for an autonomous vehicle.
4 Technical Skills
• MATLAB programming.
• Digital Image Processing.
• Machine Learning.
5 Components Required (minimum)
• MATLAB 2021b or higher.
• MATLAB toolboxes: Computer Vision Toolbox, Statistics and Machine Learning Toolbox, Signal Processing Toolbox, and other supporting toolboxes.
6 Project Instructions
A ground vehicle GV navigates through an urban environment and is equipped with LiDAR and camera sensors, among other sensors. GV acquires pointclouds and images from different locations as it navigates. Perform the tasks described below to process the data acquired by GV for decision making.
Part A: Feature Descriptors and Feature Matching [5 marks]
Feature matching is an essential step in several computer vision applications; it involves establishing correspondences between two images of the same object or scene. The first step is to detect a set of interest points and associate them with image descriptors. Once the features and descriptors are extracted from the sequence of images, the next step is to match the feature descriptors to establish correspondences.
Feature Descriptors
Feature descriptors need to be robust and invariant to changes in scale, rotation, and minor viewpoint changes. One
widely used descriptor is the scale-invariant feature transform (SIFT). SIFT is a local feature detector and descriptor built on the difference-of-Gaussians (DoG) scale-space pyramid. It includes fixing a reproducible
orientation based on information from a circular region around the interest point and then extracting the descriptor
from a square region constructed in alignment with the selected orientation.
A.1 The image acquired by GV’s camera from location A is given in file1.png. Extract SIFT feature keypoints and display the ten strongest (based on the Metric property of the SIFTPoints object) keypoints by overlaying them on the image. Assume the number of layers per octave to be 1 and the sigma of the Gaussian applied to the input image at the zeroth octave to be 1.6.
A.2 Repeat A.1 with the number of layers per octave increased to 5. Compare the ten strongest keypoints from A.1 with those from A.2 and comment (a minimal sketch of A.1 and A.2 follows below).
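A minimal MATLAB sketch for A.1 and A.2 is given below. It assumes file1.png is on the MATLAB path and is converted to grayscale before detection; the comparison and commentary for A.2 are left to you.

% A.1: detect SIFT keypoints and overlay the ten strongest on the image
I = im2gray(imread('file1.png'));
ptsA1 = detectSIFTFeatures(I, 'NumLayersInOctave', 1, 'Sigma', 1.6);
strongestA1 = selectStrongest(ptsA1, 10);     % ranked by the Metric property of SIFTPoints
figure; imshow(I); hold on; plot(strongestA1);
% A.2: repeat with 5 layers per octave and compare the two sets of keypoints
ptsA2 = detectSIFTFeatures(I, 'NumLayersInOctave', 5, 'Sigma', 1.6);
strongestA2 = selectStrongest(ptsA2, 10);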
Feature Matching
Feature matching involves comparing feature descriptors from two images of the same scene. A brute-force method
employs a distance metric to exhaustively compare each feature from one image with all the features from the other
to establish correspondences.
A.3 The images acquired by GV’s camera from locations A and B are given in file1.png and file2.png. Extract SIFT feature descriptors from both images assuming the number of layers per octave to be 3 and the sigma of the Gaussian applied to the input image at the zeroth octave to be 1.6. Write a MATLAB script (without directly using the matchFeatures command from MATLAB) to establish unique correspondences between the feature descriptors of the two images. Employ exhaustive search with the sum of squared differences as the distance metric. Unique correspondences imply that the matched features should be one-to-one (i.e., the feature mapping from image 1 to image 2 should be the same as from image 2 to image 1). Ensure that your features are L2-normalised before performing the feature matching.
A.4 Implement feature matching using the built-in MATLAB function to establish the unique correspondences between the feature descriptors (assume MatchThreshold to be 100 and MaxRatio to be 1.0). Verify against your matched features from A.3. Display the matched feature points overlaid on the images (see the sketch below).
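A sketch of the brute-force matcher in A.3 and the built-in verification in A.4 is shown below. Using a mutual-nearest-neighbour test for uniqueness is one reasonable interpretation of one-to-one matching.

% A.3: exhaustive SSD matching on L2-normalised SIFT descriptors
I1 = im2gray(imread('file1.png'));  I2 = im2gray(imread('file2.png'));
p1 = detectSIFTFeatures(I1, 'NumLayersInOctave', 3, 'Sigma', 1.6);
p2 = detectSIFTFeatures(I2, 'NumLayersInOctave', 3, 'Sigma', 1.6);
[f1, vp1] = extractFeatures(I1, p1);
[f2, vp2] = extractFeatures(I2, p2);
f1 = double(f1) ./ vecnorm(double(f1), 2, 2);    % L2-normalise each descriptor (row)
f2 = double(f2) ./ vecnorm(double(f2), 2, 2);
D = pdist2(f1, f2, 'squaredeuclidean');          % SSD between every descriptor pair
[~, idx12] = min(D, [], 2);                      % best match in image 2 for each image-1 feature
[~, idx21] = min(D, [], 1);  idx21 = idx21(:);   % best match in image 1 for each image-2 feature
mutual = idx21(idx12) == (1:size(f1,1))';        % keep only one-to-one (mutual) matches
m1 = vp1(mutual);  m2 = vp2(idx12(mutual));
% A.4: verify against the built-in matcher and display the correspondences
pairs = matchFeatures(f1, f2, 'MatchThreshold', 100, 'MaxRatio', 1.0, 'Unique', true);
figure; showMatchedFeatures(I1, I2, vp1(pairs(:,1)), vp2(pairs(:,2)), 'montage');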
Part B: LiDAR Pointcloud Matching [5 marks]
LiDAR Data and Preprocessing
LiDAR measures the position of points in the three-dimensional world using spherical coordinates: the radial distance from the sensor origin to the 3D point, the elevation angle measured up from the sensor XY plane, and the azimuth angle measured counterclockwise from the x-axis of the sensor. The spherical coordinates are converted to Cartesian coordinates x, y, and z and stored as a 3-by-1 column vector. The points are stacked into a matrix that stores the scanned points. Sometimes there can be a 4th column in the matrix storing the amplitude of the returned signal. One matrix of LiDAR points is usually referred to as a pointcloud, and a pointcloud can represent an entire object, part of an object, or the surrounding environment.
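For reference, the conversion from the spherical measurements described above to a Cartesian point can be written as follows; the numerical values are only illustrative.

% Spherical-to-Cartesian conversion for a single LiDAR return
r    = 12.5;                % radial distance (m), illustrative value
elev = deg2rad(2.0);        % elevation measured up from the sensor XY plane
azim = deg2rad(30.0);       % azimuth measured counterclockwise from the sensor x-axis
x = r * cos(elev) * cos(azim);
y = r * cos(elev) * sin(azim);
z = r * sin(elev);
p = [x; y; z];              % stored as a 3-by-1 column vector
% MATLAB's built-in sph2cart uses the same convention:
% [x, y, z] = sph2cart(azim, elev, r);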
B.1 The file ‘pointcloudA.mat’ contains the LiDAR pointcloud of an urban road object obtained by the GV’s LiDAR when it is at location A. The mat file contains the location coordinates and the amplitude of the returned signal. MATLAB stores LiDAR scan points in a data structure pointCloud consisting of the Cartesian coordinates of the scanned points and associated metadata. Read ‘pointcloudA.mat’ into the MATLAB data structure, inspect its properties, and visualise the pointcloud.
B.2 LiDARs can acquire a large number of points (up to a million at times) per second, which may increase computational complexity. The pointclouds can be downsampled to reduce the complexity without affecting overall performance. Downsample ‘pointcloudA’ using a 3D box grid filter of size 0.25 such that the overall shape of the original pointcloud is largely preserved. Display the downsampled and original pointclouds and verify (a sketch of B.1 and B.2 follows this task).
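A sketch of B.1 and B.2 is given below. The variable stored inside pointcloudA.mat is an assumption; inspect it with whos('-file','pointcloudA.mat') and adjust accordingly.

% B.1: load the scan, wrap it in MATLAB's pointCloud structure and visualise it
S = load('pointcloudA.mat');
fn = fieldnames(S);  raw = S.(fn{1});           % assumed N-by-4 matrix: x, y, z, intensity
pcA = pointCloud(raw(:,1:3), 'Intensity', raw(:,4));
disp(pcA);                                      % inspect Count, XLimits, Intensity, etc.
figure; pcshow(pcA); title('pointcloudA');
% B.2: downsample with a 3-D box grid filter of size 0.25 (grid averaging preserves shape)
pcA_ds = pcdownsample(pcA, 'gridAverage', 0.25);
figure; pcshowpair(pcA, pcA_ds); title('Original vs downsampled');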
Spatial Transformations and Pointcloud Registration
Objects in the world mostly stay put while the reference frame attached to the vehicle moves, observing them from
different perspectives. When the vehicle moves, it will affect the perception of all the scanned points in the point cloud.
A combination of translation and rotation operations can describe the vehicular movement between two locations.
Given two point clouds in two different coordinate frames, and with the knowledge that they correspond to or contain
the same object in the world, optimal translation and rotation between the two reference frames can be computed to
minimise the distance between the two point clouds. The process of finding a spatial transformation (rotation and
translation) that aligns the two point clouds is known as point-set registration. A popular algorithm is the iterative
closest point (ICP) procedure. The basic intuition behind ICP is that when we find the optimal translation and
rotation, we can use them to transform one point cloud into the coordinate frame of the other such that the pairs of
points that truly correspond to each other will be the ones that are closest to each other in a Euclidean sense.
B.3 GV moves from location A to location B, affecting the perception of the pointcloud obtained in B.1. The changes can be described by a rigid transformation that includes a 30◦ rotation and a translation of (2, 4, 0). Apply this transformation to ‘pointcloudA’ from B.1, without using the built-in MATLAB transformation command, to construct the transformed pointcloud ‘pointcloudB ’. Verify the transformation using the built-in command.
B.4 Employ the ICP procedure between ‘pointcloudA’ and ‘pointcloudB ’, without using the built-in MATLAB command, to recover the transformation (rotation and translation) applied in B.3. Verify your results using the built-in command (a sketch covering B.3 and the built-in checks follows below).
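A sketch of B.3 and the built-in verification for B.3 and B.4 follows. Since the task gives a single angle, the sketch assumes the 30-degree rotation is about the z-axis; a from-scratch ICP loop (nearest-neighbour pairing, SVD-based pose update, iterate) still needs to be written for B.4.

% B.3: apply the rigid transform manually, then verify with pctransform
S = load('pointcloudA.mat');  fn = fieldnames(S);  raw = S.(fn{1});   % as in B.1
XA = raw(:,1:3);
theta = deg2rad(30);
R = [cos(theta) -sin(theta) 0; sin(theta) cos(theta) 0; 0 0 1];       % rotation about z (assumed axis)
t = [2; 4; 0];
XB = (R*XA' + t)';                          % rotate then translate every point
pointcloudB = pointCloud(XB);
tformAB = rigid3d(R', t');                  % rigid3d uses the post-multiply convention X*R + t
pcB_check = pctransform(pointCloud(XA), tformAB);    % should match pointcloudB
% B.4: after implementing ICP from scratch, verify the recovered pose with the built-in
tformICP = pcregistericp(pointCloud(XA), pointcloudB);
disp(tformICP.Rotation); disp(tformICP.Translation);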
Part C: LiDAR-based Perception Module [5 marks]
In this task, you will need to design and implement a perception system for vehicle GV that operates on LiDAR
pointclouds obtained by the onboard sensor. The perception system must be able to identify urban road objects in
the surrounding environment, such as bikes, pedestrians, trees, buildings and others. You can use the pointclouds of
urban objects obtained by GV and their corresponding labels and employ supervised machine learning to develop and
validate the perception system.
The pointclouds and their road object names are available in two ‘.mat’ files: lidarData.mat and lidarLabel.mat. The file lidarData.mat contains a cell array consisting of 523 elements corresponding to 523 LiDAR scans of different objects. Each element of the cell array is a 2D matrix with 4 columns (the x, y, z coordinates of a world object, with the fourth column giving the intensity of the returned laser signal). The number of rows in each element varies and represents the number of 3D points obtained for the given scan. The file lidarLabel.mat contains a cell array consisting of 523 class names corresponding to the 523 LiDAR scans. These 523 scans belong to a total of 10 classes: building, car, pedestrian, pillar, pole, traffic lights, traffic sign, tree, trunk and van. Load the data and labels and inspect the content and dimensions before proceeding further.
Feature Extraction
A primary step in developing a supervised machine learning object recognition algorithm involves extracting feature descriptors from LiDAR pointclouds that can efficiently represent the different categories in the data. The pointcloud features available in the literature can be roughly divided into two classes: global features for the whole object and local features for each point inside the object pointcloud. A simple object-level global feature can be the intensity of the returned laser signal. Different objects with different characteristics and views reflect the laser signal differently. The intensity values (4th column in the LiDAR pointcloud matrix) can be used to extract simple features for our recognition task.
C.1 Extract three scalar features from the intensity values for all the pointclouds in the given data and store them
in a 2D matrix.
The intensity features alone cannot capture all the properties of the objects from the pointclouds. Features extracted from the 3D coordinates (x, y, z) of the object can provide information on its physical structure. A simple shape feature can be the spread of the pointcloud data in different directions, which can be calculated using an unsupervised machine learning technique called principal component analysis. The eigenvectors of the covariance matrix give the spread directions, and the eigenvalues give the spread along each direction.
C.2 Compute a three-dimensional shape feature that describes the spread along the three principal (orthogonal) directions (a sketch of C.1 and C.2 follows below).
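One possible sketch of C.1 and C.2 is shown below. The variable names loaded from the .mat files (lidarData, lidarLabel) and the particular intensity statistics (mean, standard deviation, maximum) are assumptions; any three sensible scalar intensity features are acceptable.

% C.1/C.2: build a 523-by-6 feature matrix (3 intensity features + 3 shape features)
load('lidarData.mat');                      % assumed to contain the cell array lidarData
load('lidarLabel.mat');                     % assumed to contain the cell array lidarLabel
nScans = numel(lidarData);
features = zeros(nScans, 6);
for k = 1:nScans
    scan = lidarData{k};                    % M-by-4 matrix: x, y, z, intensity
    I = scan(:,4);
    features(k,1:3) = [mean(I), std(I), max(I)];          % C.1: intensity features
    lambda = sort(eig(cov(scan(:,1:3))), 'descend');      % C.2: spread along principal directions
    features(k,4:6) = lambda';
end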
Data Split
An essential step before training a machine learning classifier is to split the dataset into training and test subsets. The
training subset data and labels are used to train the classifier, which is then evaluated using the test subset data and
labels to assess its classification performance. The split into training and test subsets is usually done using stratified
sampling to ensure all the classes in the dataset are represented in both subsets.
C.3 After C.1 and C.2, you have a feature matrix of intensity and shape features. Perform a stratified split to randomly sample 70% of the data points into your training subset and the remainder into the test subset. Verify that the training and test subsets each contain LiDAR scans from all ten classes (a sketch follows below).
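A sketch of the stratified split, continuing from the feature matrix and labels built above; the seed shown is a placeholder for your student ID.

% C.3: stratified 70/30 hold-out split, reproducible via the student-ID seed
rng(123456);                                % replace 123456 with your student ID
labels = categorical(lidarLabel);           % 523-by-1 class labels
cvp = cvpartition(labels, 'HoldOut', 0.3);  % stratified by class because labels are supplied
Xtrain = features(training(cvp), :);  ytrain = labels(training(cvp));
Xtest  = features(test(cvp), :);      ytest  = labels(test(cvp));
% verify every class appears in both subsets
disp(countcats(ytrain)');  disp(countcats(ytest)');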
Classification
Once the features are extracted from pointclouds, the subsequent step in the object recognition task is to train the
classifier using the training data and labels. A widely used classifier is the support vector machine (SVM). SVM
classification is usually formulated for a 2-class (binary) scenario. An SVM classifies the data by finding the best
hyperplane that separates all data points of one class from those of the other. The best hyperplane for an SVM is the
one with the largest margin between the two classes, where the margin is defined as the maximal width of the slab
parallel to the hyperplane with no interior data points. A binary SVM can then be extended to a multi-class SVM by training K separate SVMs (one-versus-the-rest) for a K-class classification problem.
C.4 Use the training-subset feature matrix and class labels (features from C.1 and C.2, split in C.3) to train a linear multi-class SVM. Evaluate the trained classifier using the test data and labels and determine the classification accuracy.
C.5 Retrain the classifier in C.4 using only the intensity features and determine the test accuracy. Now, incrementally add one shape feature at a time and repeat the training-testing process. You will have four test accuracy values after this exercise. Inspect the test accuracies and comment on the results (a sketch for C.4 and C.5 follows below).
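A sketch for C.4, continuing from the split above. One-versus-all coding matches the multi-class SVM description given earlier; C.5 repeats the same train/test call on growing feature subsets (columns 1:3, 1:4, 1:5, 1:6).

% C.4: train a linear multi-class SVM and evaluate on the test subset
tmpl = templateSVM('KernelFunction', 'linear', 'Standardize', true);
mdl = fitcecoc(Xtrain, ytrain, 'Learners', tmpl, 'Coding', 'onevsall');
yhat = predict(mdl, Xtest);
acc = mean(yhat == ytest);
fprintf('Test accuracy with all 6 features: %.1f%%\n', 100*acc);
% C.5: repeat with features(:,1:3), then 1:4, 1:5 and 1:6, recording the four accuracies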
Part D: Combining LiDAR and Camera Data [Optional]
Sensor fusion combines data from multiple sensors to provide a more accurate and complete understanding of the
environment. It aims to overcome the limitations of individual sensors by gathering and fusing data from multiple
sensors to produce more reliable information with less uncertainty. For example, LiDAR sensors and cameras are
commonly used together in autonomous driving applications because a LiDAR sensor collects 3D spatial information
while a camera captures the appearance and texture of that space in 2D images. Data from these sensors are then
aligned to ensure they correspond to the same view of the surrounding environment and subsequently combined to
get a complete picture.
The LiDAR pointcloud of a scene and its corresponding image of a simulated environment are provided in ‘partD.mat’
and ‘partD.png’, respectively. The rigid LiDAR to camera transformation matrix for GV is
[  0   0   1   0
  -1   0   0   0
   0  -1   0   0
   0   0   0   1 ]
The camera has a focal length of 1109 pixels, and its principal point is at (640, 360) pixels. It produces images with a width of 1280 pixels and a height of 720 pixels.
Align LiDAR and Camera Coordinates
A basic step in combining LiDAR and camera data is to project the LiDAR points onto the image plane, using the LiDAR-to-camera transformation and the camera intrinsics, such that the pointcloud coordinates (x, y) are aligned with the image plane coordinates (x, y). The z coordinate of the pointcloud provides depth information for the image, and colour information from the image can be assigned to the pointcloud to obtain complementary information.
D.1 Use the rigid LiDAR-to-camera transformation matrix to project the LiDAR point cloud onto the image. Do not use the built-in MATLAB command. Retain only the points in front of the camera plane (positive depth) and filter out the projected points that fall outside the image. Verify your projections using the built-in commands (a sketch follows below).
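A manual-projection sketch for D.1 is given below. It assumes the 4-by-4 matrix is applied with the row-vector (post-multiply) convention used by MATLAB's rigid transform objects, that the intrinsics are in pixel units, and that partD.mat stores an N-by-3 or N-by-4 matrix of points; adjust the loading step and the transform direction if the data or convention differ.

% D.1: project LiDAR points onto the image plane using the extrinsics and intrinsics
T  = [0 0 1 0; -1 0 0 0; 0 -1 0 0; 0 0 0 1];      % given LiDAR-to-camera rigid transform
f  = 1109;  cx = 640;  cy = 360;                  % focal length and principal point (pixels)
imgW = 1280;  imgH = 720;
S = load('partD.mat');  fn = fieldnames(S);  xyz = S.(fn{1});  xyz = xyz(:,1:3);
camPts = [xyz, ones(size(xyz,1),1)] * T;          % row-vector convention; use (T*pts')' if pre-multiplying
front  = camPts(:,3) > 0;                         % keep points in front of the camera
camPts = camPts(front, 1:3);
u = f * camPts(:,1) ./ camPts(:,3) + cx;          % pinhole projection
v = f * camPts(:,2) ./ camPts(:,3) + cy;
inImg = u >= 1 & u <= imgW & v >= 1 & v <= imgH;  % drop projections outside the image
figure; imshow(imread('partD.png')); hold on;
plot(u(inImg), v(inImg), '.r', 'MarkerSize', 2);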
Fuse Camera Intensity and Pointclouds
A simple way to combine LiDAR and camera data is to impart the colour information from the image to the corresponding projected point cloud and obtain a complete view of the object with depth and colour information.
D.2 After projecting the LiDAR points onto the image plane in D.1, assign the image intensities to the corresponding LiDAR points and display the point cloud. You can crop the fused point cloud to contain only the points present in the field of view of the camera. Perform the fusion without using the built-in command and later verify with the built-in command (a sketch follows below).
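A fusion sketch for D.2, continuing directly from the D.1 sketch above (it reuses camPts, u, v, inImg, imgW and imgH).

% D.2: colour each retained LiDAR point with the image pixel it projects onto
img  = imread('partD.png');
rows = round(v(inImg));  cols = round(u(inImg));
idx  = sub2ind([imgH, imgW], rows, cols);
Rch = img(:,:,1);  Gch = img(:,:,2);  Bch = img(:,:,3);
colors = [Rch(idx), Gch(idx), Bch(idx)];          % per-point RGB sampled from the image
fusedPC = pointCloud(camPts(inImg,:), 'Color', colors);
figure; pcshow(fusedPC); title('Camera colours fused onto the LiDAR pointcloud');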
7 Notes and tips
• Implement the algorithms from scratch if specifically asked. Otherwise, you can use built-in MATLAB
commands.
• Pay special attention to arguments inside a MATLAB command to configure your code for the task.
• You may have to standardise your feature data for part C.
• For Part C, set a random generator using your student ID as the seed to ensure the code is reproducible.
• Refer to tutorial 7 for relevant machine learning basics and developing a classifier for your LiDAR data.
• Talk to your demonstrators if you need clarification on anything.
8 Academic Integrity
We take academic integrity seriously. Please note that while students may discuss and help each other in the thinking process, you must work on your assignment individually. Details about academic integrity can be found at http://academicintegrity.unimelb.edu.au/. Please check with the tutors or the lecturer if you are in doubt. Ignorance is not a valid reason for academic misconduct.

