ECMM426
UNIVERSITY OF EXETER
COLLEGE OF ENGINEERING, MATHEMATICS AND PHYSICAL SCIENCES
COMPUTER SCIENCE
Examination, May 2020
Computer Vision
Module Leader: Dr. Anjan Dutta
Duration: TWO HOURS + 30 MINUTES UPLOAD TIME

Answer ALL the questions. Question 1 is worth 80 marks, while question 2 is worth 20 marks.
The marks for this module are calculated from 40% of the percentage mark for this paper plus 60% of the percentage mark for associated coursework.
This is an OPEN BOOK examination.

SECTION A (Multiple Choice Questions)

Question 1
There are FORTY multiple choice questions with several possible choices each. Clearly mark or write all the correct choices. Please note these questions might have multiple correct answers, with partial marking.

1. Consider a grayscale image of size 200 × 300. How much space in kilobytes (KB) would this image require for storage on a disk?
(i) 20 KB
(ii) 60 KB
(iii) 300 KB
(iv) 100 KB
(2 marks)

2. Which of the following is a challenge when dealing with computer vision problems?
(i) Variations due to geometric changes (like pose, scale etc.)
(ii) Variations due to photometric factors (like illumination, appearance etc.)
(iii) Background clutter
(iv) All of the above
(2 marks)

3. Convolution of a Gaussian filter with another Gaussian filter generates:
(i) Box filter
(ii) Unsharp filter
(iii) Gaussian filter
(iv) None of the above
(2 marks)

4. Suppose we have the following noisy image (Figure 1):
Figure 1: salt and pepper noise
This type of noise in the image is called ‘salt & pepper’ noise. Which type of filter should be applied to denoise the image?
(i) Linear filter
(ii) Median filter
(iii) Sobel filter
(iv) None of the above
(2 marks)

5. ‘Ringing’ is an image artefact generated by:
(i) Box filter
(ii) Gaussian filter
(iii) Unsharp filter
(iv) All of the above
(2 marks)

6.
What would be the relation between the original and modified image if the original image is convolved with the following filter (Figure 2)?
Figure 2: filter
(i) Blurred image
(ii) Sharpened image
(iii) Inverted image
(iv) Rotated image
(2 marks)

7. If we convolve an image with the filter given below (Figure 3), what would be the relation between the original and modified image?
Figure 3: filter
(i) The original image will be shifted to the right by 1 pixel
(ii) The original image will be shifted down by 1 pixel
(iii) The original image will be shifted to the left by 1 pixel
(iv) The original image will be shifted up by 1 pixel
(2 marks)

8. In Canny edge detection, we will get more continuous edges if we make the following change to the hysteresis thresholding:
(i) increase the high threshold
(ii) decrease the high threshold
(iii) increase the low threshold
(iv) decrease the low threshold
(2 marks)

9. In the following image (Figure 4), you can find an edge labelled in the red region. Which form of discontinuity creates this kind of edge?
Figure 4: chair
(i) Depth discontinuity
(ii) Surface colour discontinuity
(iii) Illumination discontinuity
(iv) None of the above
(2 marks)

10. What kind of edges would the Canny edge detector generate without doing the non-maximum suppression step?
(i) Very thin edges
(ii) Thick edge regions
(iii) Perfect edges
(iv) None of the above
(2 marks)

11. What are the main benefits of detecting image edges using the zero-crossings of the Laplacian of Gaussian (LoG) of the image rather than thresholding its gradient magnitude?
(i) Zero-crossing produces contours instead of regions
(ii) Zero-crossing is less sensitive to image noise
(iii) Zero-crossing is independent of threshold parameter
(iv) All of the above
(2 marks)

12.
Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of the second order moment matrix $M$, from which we can compute the measure for detecting Harris corners as $R = \lambda_1 \lambda_2 - k(\lambda_1 + \lambda_2)^2$, where $k$ is a small constant. What are the criteria, in terms of $R$, for rejecting a region when detecting corners?
(i) $R > 0$
(ii) $|R|$ is small
(iii) $R < 0$
(iv) All of the above
(2 marks)

13. Which of the following transformations is the Harris corner detector invariant to?
(i) Translation
(ii) Scaling
(iii) Rotation
(iv) Photometric
(2 marks)

14. Let $f_1^1$ be a SIFT descriptor from an image $I_1$, and $f_2^1$ and $f_2^2$ be two SIFT descriptors from another image $I_2$, which are respectively the nearest and second nearest neighbours (in L2 distance) of $f_1^1$ in $I_2$. $f_1^1$ from $I_1$ is said to be matched to $f_2^1$ in $I_2$ if it satisfies the following criteria, where $\|\cdot\|$ denotes L2 distance:
(i) $\frac{\|f_1^1 - f_2^1\|}{\|f_1^1 - f_2^2\|} \approx 0$
(ii) $\frac{\|f_1^1 - f_2^1\|}{\|f_1^1 - f_2^2\|} \approx 1$
(iii) $\frac{\|f_1^1 - f_2^1\|}{\|f_1^1 - f_2^2\|}$ 1
(iv) $\frac{\|f_1^1 - f_2^1\|}{\|f_1^1 - f_2^2\|}$ 1
(2 marks)

15. Suppose you have to rotate an image (Figure 5). Image rotation is nothing but multiplication of the image by a specific matrix to get a new transformed image.
Figure 5: rotation
For simplicity, consider rotating one point in the image from co-ordinates (1, 0) to co-ordinates (0, 1). Which of the following matrices would we have to multiply by?
(i) $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$
(ii) $\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$
(iii) $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$
(iv) $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
(2 marks)

16. The Cartesian coordinate of the homogeneous coordinate $(x, y, w)$ is
(i) $\left(\frac{x}{w}, \frac{y}{w}\right)$
(ii) $\left(\frac{x}{w}, \frac{y}{w}, 1\right)$
(iii) $(x, y, 1)$
(iv) $(x, y)$
(2 marks)

17. Let $R_1$ and $R_2$ be two matrices that define two different rotation transformations. Which one of the following is true about them?
(i) $R_1 R_2 \neq R_2 R_1$
(ii) $R_1 R_2 R_1 = R_2 R_1 R_2$
(iii) $R_2 R_1 > R_1 R_2$
(iv) $R_1 R_2 < R_2 R_1$
(2 marks)

18.
In a 2D coordinate system, mirroring about the line $y = x$ can be achieved by the following transformation matrix:
(i) $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
(ii) $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$
(iii) $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$
(iv) $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$
(2 marks)

19. Let $O$ be the origin of a 2D coordinate system $C$ and $P$ ($\neq O$) be any point in $C$. We further assume that $R$ is a rotation about $O$ and $T$ is the translation from the point $P$ to $O$. The transformation matrix that achieves rotation $R$ about the point $P$ can be written as:
(i) $R T R^{-1}$
(ii) $T^{-1} R T^{-1}$
(iii) $T^{-1} R T$
(iv) $T R T$
(2 marks)

20. Which of the following could affect the intrinsic parameters of a camera?
(i) A crooked lens system
(ii) Diamond/Rhombus shaped pixels with non right angles
(iii) The aperture configuration and construction
(iv) Any offset of the image sensor from the lens’s optical centre
(2 marks)

21. Which of the following statements describes an affine camera but not a general perspective camera?
(i) Relative sizes of visible objects in a scene can be determined without prior knowledge
(ii) Can be used to determine the distance from an object of a known height
(iii) Approximates the human visual system
(iv) An infinitely long plane can be viewed as a line from the right angle
(2 marks)

22. Let us assume that P number of unknown 3D points are projected into F number of images where the 2D coordinates of those P points and their correspondences are known. Assuming W (shape: 2F × P) as the 2D coordinates of those P points in F images, R (shape: 2F × 3) as the camera rotation matrix for F images and S as the reconstructed 3D real world points, their relation can be expressed as W = R × S, where W, R are known and S is unknown. The solution of S can be given by:
(i) $R^{-1} W$
(ii) $W^T R^{-1} W^T W$
(iii) $W^T R^{-1} W^T R^{-1}$
(iv) None of the above
(2 marks)

23. Let us assume that P number of unknown 3D points are projected into F number of images where the 2D coordinates of those P points and their correspondences are known.
Assuming W (shape: 2F × P) as the 2D coordinates of those P points in F images, R (shape: 2F × 3) as the camera rotation matrix for F images and S as the reconstructed 3D real world points, their relation can be expressed as W = R × S, where W is known and R, S are unknown. The solutions of R and S can be estimated by:
(i) Random matrices that satisfy the expression
(ii) Singular value decomposition (SVD) and then selecting appropriate submatrix depending on matrix rank
(iii) Selecting those rows and columns that respectively maximise and minimise the matrix rank
(iv) None of the above
(2 marks)

24. Recognising an ‘Armchair’ among a collection of ‘Wing chair’, ‘Deck chair’, ‘Desk chair’, ‘Barber chair’, ‘Operator chair’, ‘Armchair’, ‘Executive chair’, ‘Garden chair’ is known as:
(i) Instance recognition
(ii) Category recognition
(iii) Deep recognition
(iv) None of the above
(2 marks)

25. Which one of the following steps is not involved in bag-of-words model?
(i) Feature extraction
(ii) Feature quantisation
(iii) Non-maximum suppression
(iv) Visual vocabulary creation
(2 marks)

26. Let us assume that for creating a bag-of-visual-words (BoVW) model, we have created a visual vocabulary of size 300. Now if we want to create a bag-of-visual-words image descriptor with a 4 × 4 spatial pyramid, the dimension of the feature should be:
(i) 4800
(ii) 1200
(iii) 300
(iv) 2400
(2 marks)

27. In a bag-of-visual-words model, the optimal size of the visual vocabulary should be determined on the evaluation performance on the following data split:
(i) Train set
(ii) Validation set
(iii) Test set
(iv) Both Validation and Test set
(2 marks)

28. What is the regular practice to use linear SVM to classify two classes that are not linearly separable?
(i) Cross validation
(ii) Kernel trick
(iii) Neural neighbour trick
(iv) None of the above
(2 marks)

29.
In the Viola-Jones face detection algorithm, how does one implement a ‘weak classifier’?
(i) SIFT feature with thresholding
(ii) Rectangular feature with thresholding
(iii) HOG feature with SVM
(iv) Rectangular feature with SVM
(2 marks)

30. Suppose we have the following image (Figure 6):
Figure 6: image
Our task is to segment the objects in the image. A simple way to do this is to represent the image in terms of pixel intensity and then cluster the pixels according to their values. On doing this, we got the following histogram (Figure 7) of pixel intensity.
Figure 7: histogram
Suppose we choose k-means clustering to solve the problem. What would be the appropriate value of k from just a visual inspection of the pixel intensity histogram?
(i) 1
(ii) 2
(iii) 3
(iv) 4
(2 marks)

31. Which of the following is a representation learning algorithm?
(i) Neural network
(ii) Random Forest
(iii) k-Nearest neighbour
(iv) None of the above
(2 marks)

32. Which of the following gives non-linearity to a neural network?
(i) Stochastic Gradient Descent
(ii) Rectified Linear Unit (ReLU)
(iii) Sigmoid
(iv) None of the above
(2 marks)

33. Suppose you have 5 convolutional kernels of size 7 × 7 with zero padding and stride 1 in the first layer of a convolutional neural network. You pass an input of dimension 224 × 224 × 3 through this layer. What are the dimensions of the data which the next layer will receive?
(i) 217 × 217 × 5
(ii) 217 × 217 × 8
(iii) 218 × 218 × 5
(iv) 220 × 220 × 7
(2 marks)

34. Which of the following options can be used to reduce overfitting in deep learning models?
    1. Add more data
    2. Use data augmentation
    3. Use architecture that generalises well
    4. Add regularisation
    5. Reduce architectural complexity
(i) 1, 2, 3
(ii) 1, 4, 5
(iii) 1, 3, 4, 5
(iv) All of these
(2 marks)

35. Suppose an input to an average pooling layer is given below.
The pooling size of neurons in the layer is (3, 3):
    3 4 6
    5 7 3
    4 3 7
What would be the output of this pooling layer?
(i) 3
(ii) 14
(iii) 5.5
(iv) 7
(2 marks)

36. Which of the following is a data augmentation technique used in image recognition tasks?
    1. Horizontal flipping
    2. Random cropping
    3. Random scaling
    4. Colour jittering
    5. Random translation
    6. Random shearing
(i) 1, 2, 4
(ii) 2, 3, 4, 5, 6
(iii) 1, 3, 5, 6
(iv) All of these
(2 marks)

37. What are the steps for using a gradient descent algorithm?
    1. Calculate error between the actual value and the predicted value
    2. Reiterate until you find the best weights of network
    3. Pass an input through the network and get values from output layer
    4. Initialise random weight and bias
    5. Go to each neuron which contributes to the error and change its respective values to reduce the error
(i) 1, 2, 3, 4, 5
(ii) 5, 4, 3, 2, 1
(iii) 3, 2, 1, 5, 4
(iv) 4, 3, 1, 5, 2
(2 marks)

38. While training a neural network for an image recognition task, we plot the graph of training error (loss) and validation error for debugging as follows (Figure 8):
Figure 8: training curve
What is the best place in the graph to stop the training of the neural network?
(i) A
(ii) B
(iii) C
(iv) D
(2 marks)

39. What is the sequence of the following tasks in a perceptron?
    1. Initialise weights of perceptron randomly
    2. Go to the next batch of dataset
    3. If the prediction does not match the output, change the weights
    4. For a sample input, compute an output
(i) 1, 2, 3, 4
(ii) 4, 3, 2, 1
(iii) 3, 1, 2, 4
(iv) 1, 4, 3, 2
(2 marks)

40. Assume a simple MLP model (single layer) with 3 hidden units with inputs x = (3, 2, 1). The current weights and bias of the input units are respectively w = (4, 5, 6) and b = 7. Assume the activation function is a linear constant value of σ = 5. What will be the output?
(i) 64
(ii) 96
(iii) 175
(iv) 435
(2 marks)

(Total 80 marks)

SECTION B (TRUE or FALSE)

Question 2
Read each of the TWENTY statements below carefully. Clearly write TRUE if you think a statement is TRUE and FALSE if you think the statement is FALSE.

1. To blur an image, you can use any linear filter.
(1 mark)

2. A box filter is a spatial domain linear filter in which each pixel in the resulting image has a value equal to the average value of its neighbouring pixels in the input image.
(1 mark)

3. Convolving twice with a Gaussian kernel of width σ is the same as convolving once with a Gaussian kernel of width 2σ.
(1 mark)

4. Thresholding is a linear filter.
(1 mark)

5. Convolution in spatial domain is equivalent to multiplication in frequency domain, which is one of the advantages of Fourier transform for performing convolution on images.
(1 mark)

6. An alternative and computationally cheaper way of detecting corners involves computing the cornerness measure as $R = \mathrm{trace}(M) - k \det(M)^2$, where $k$ is a small constant.
(1 mark)

7. Blob detector is invariant to scaling but variant to illumination.
(1 mark)

8. Scale Invariant Feature Transform (SIFT) is a feature descriptor that computes histogram of gradients in 8 directions within a local patch which is divided into 4 × 4 grids.
(1 mark)

9. With homogeneous coordinates, all the transformations can be expressed as linear mappings and be computed as matrix multiplication.
(1 mark)

10. In a pinhole camera, too large a diameter limits the amount of light entering the camera and causes light diffraction, which eventually blurs the image.
(1 mark)

11. The assumption that corresponding pixel values remain the same in two consecutive frames of a video is called the brightness constancy constraint.
(1 mark)

12. In computer vision, the aperture problem refers to the relative darkness that appears in an image due to the small aperture in a camera.
(1 mark)

13.
Training a linear support vector machine is the process of finding the hyperplanes that equally maximise the distance between the positive and negative examples.
(1 mark)

14. In an integral image, each pixel represents the cumulative sum of a corresponding input pixel with all pixels above and to the left of the input pixel.
(1 mark)

15. An attentional cascade of classifiers is built with a series of classifiers starting with the simpler ones which reject many of the negative sub-windows while correctly detecting almost all the positive responses which trigger the evaluation of a second and more complex classifier, and so on.
(1 mark)

16. The difference between deep learning and machine learning algorithms is that there is no need of feature engineering in machine learning algorithms, whereas it is recommended to do feature engineering first and then apply deep learning.
(1 mark)

17. Increase in size of a convolutional kernel would necessarily increase the performance of a convolutional neural network.
(1 mark)

18. Suppose we have a neural network with Rectified Linear Unit (ReLU) activation function, which can approximate an XNOR function. Now, if we replace the Rectified Linear Unit (ReLU) activations by linear activations, the resulting neural network would not be able to approximate the XNOR function anymore.
(1 mark)

19. The number of neurons in the output layer should match the number of classes (where the number of classes is greater than 2) in a supervised learning task.
(1 mark)

20. The function $f(x) = ax^3 + bx^2 + cx + d$ can be represented by a single fully connected hidden layer without any non-linear activation.
(1 mark)

(Total 20 marks)

End of Paper