Student number: ______________

Semester 2, 2021
Computing and Information Systems
COMP90086 - Computer Vision

Reading time: 15 minutes
Writing time: 2 hours

Permitted Materials
• Calculator

Instructions to Students
• This paper has 7 pages including this cover page.
• There are 9 questions in the exam worth a total of 120 marks, making up 50% of the total assessment for the subject.
• Please answer all questions in this examination paper in the spaces provided.
• Your writing should be clear; illegible answers will not be marked.
• You may not remove any part of this examination paper from the examination room.

Section A. Short Answer Questions

Answer each of the questions in this section as briefly as possible. Expect to answer the text response questions in no more than 2-3 sentences.

Question 1: Short Answer Questions [35 marks]

(a) (3 marks) Why are image borders a problem for convolution? Explain two options for handling the image borders when doing convolution.

(b) (3 marks) Why is a Gaussian filter preferred to a box filter (e.g., the filter shown below) for blurring images?

1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9

(c) (3 marks) Suppose that you convolve an image I with a filter f, then convolve that output with a second filter g:

    (I ∗ f) ∗ g     (1)

Which of the following would do the equivalent filtering operation on this image in the Fourier domain? (There may be multiple answers; select all that apply.) Notation: ∗ denotes convolution, ⊙ denotes element-wise multiplication, FT[x] is the Fourier transform of x, and FT⁻¹[x] is the inverse Fourier transform of x.

○ FT[I] ⊙ FT[f ⊙ g]
○ FT[I] ∗ FT[f ⊙ g]
○ FT[I ∗ f] ⊙ FT[g]
○ FT[I] ⊙ FT[f] ⊙ FT[g]

(d) (4 marks) If two different objects are photographed under exactly the same lighting conditions with the same camera and produce the same RGB values, can we conclude that both objects have the same spectral power distribution? Why or why not?
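[Aside on (d): the effect at issue, metamerism, is easy to demonstrate numerically. The sensor curves and reflectance spectra below are invented purely for illustration; the two surfaces differ only between 550 and 600 nm, a band where all three toy sensors have zero sensitivity, so both surfaces integrate to identical RGB triples despite having different spectra.]

```python
import numpy as np

# Wavelength samples (nm) and three toy camera sensitivity curves.
wl = np.arange(400, 701, 10)
r_sens = np.where(wl >= 610, 1.0, 0.0)                   # "red" sensor
g_sens = np.where((wl >= 500) & (wl <= 540), 1.0, 0.0)   # "green" sensor
b_sens = np.where(wl <= 480, 1.0, 0.0)                   # "blue" sensor

# Two different reflectance spectra that disagree only in 550-600 nm,
# where every sensor above is zero -> metamers for this camera.
surf_a = np.ones_like(wl, dtype=float)
surf_b = surf_a.copy()
surf_b[(wl >= 550) & (wl <= 600)] = 0.2

def rgb(reflectance):
    # Flat illuminant; each channel integrates reflectance x sensitivity.
    return tuple((reflectance * s).sum() for s in (r_sens, g_sens, b_sens))

print(rgb(surf_a), rgb(surf_b))  # identical RGB triples, different spectra
```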
(e) (3 marks) Which of the following statements are TRUE of the ReLU activation function? (There may be multiple answers. Select all that apply.)

○ It helps reduce the vanishing gradient problem, compared to other activation functions like sigmoid.
○ It allows for faster training of deep CNNs compared to other activation functions like sigmoid.
○ It adds a non-linearity to the output of a convolutional kernel.
○ It reduces the dimensions of the output of a convolutional kernel.

(f) (3 marks) The Canny edge detector has two threshold parameters that must be set by the user. What is the role of each threshold?

(g) (4 marks) Consider the image of a golf ball shown below. Describe two cues that can be used to infer the 3D surface shape of the golf ball from the image and explain what can be computed from each cue.

(h) (4 marks) Assume that you have trained a deep CNN on an image dataset for classification, and you observe that the training accuracy is very high but the testing accuracy is very low. Name three possible strategies that can address this problem.

(i) (4 marks) Why are generative adversarial networks (GANs) prone to mode collapse? Describe a method to detect mode collapse.

(j) (4 marks) Compare and contrast the region-merging and normalised cuts approaches to image segmentation. How are they similar and where do they differ?

Section B. Methodological Questions

In this section you are asked to demonstrate your conceptual understanding of a subset of the methods that we have studied in this subject.

Question 2: Corner detection [12 marks]

The following questions relate to corner detection.

(a) (6 marks) Would the patch w in the image above be considered a "corner" by the Harris corner detection algorithm? Why or why not? Justify your answer in terms of the corner response function.

(b) (6 marks) Is the corner response function invariant to:
• translation?
• image-plane rotation?
• scale?
In addition to providing a yes/no response for each property, briefly justify your answer.

Question 3: Convolutional neural networks [12 marks]

The architecture of U-Net is shown below.

(a) (4 marks) Give an example of a task that is suited to the U-Net architecture and explain why.

(b) (4 marks) Why does the U-Net architecture contain downsampling stages followed by upsampling stages?

(c) (4 marks) Why does the U-Net architecture contain the "bypass" arrows that concatenate activations from the downsampling side with upsampled activations?

Question 4: Texture synthesis [13 marks]

(a) (5 marks) How does parametric texture synthesis differ from non-parametric texture synthesis? What is an advantage of each approach over the other?

(b) (4 marks) The non-parametric texture synthesis algorithms discussed in class (Efros & Leung, 1999; Efros & Freeman, 2001) have patch size as a free parameter. What is the effect of decreasing patch size?

(c) (4 marks) How would you choose an appropriate patch size to correctly synthesize the texture shown below? Be specific, referencing the image.

Question 5: Object detection [13 marks]

(a) (4 marks) Briefly explain the difference between region-proposal-based and single-stage object detectors and the relative advantage of each approach.

(b) (4 marks) In a region-proposal-based network, how is a region of interest different from a bounding box prediction, and how are they related?

(c) (5 marks) Why is class imbalance a problem for CNN-based object detectors? Explain how this problem is handled by a region-proposal-based method and a single-stage method.

Section C. Algorithmic Questions

In this section you are asked to demonstrate your understanding of a subset of the methods that we have studied in this subject, in being able to perform algorithmic calculations.

Question 6: Stereo disparity [6 marks]

Assume the two images shown below were taken from a calibrated pair of stereo cameras.
Each camera has a focal length of 30 mm and produces a 100 x 100 mm image. The two cameras are at the same height and each has its optical centre in the centre of the image (at the point (50,50) in the image). The image planes of the cameras are parallel to each other and to the baseline, which is 500 mm. What is the depth (distance to the baseline) of the indicated point x, which is located at (31,30) in the left camera's image and (29,30) in the right camera's image? Show your work.

Question 7: Epipolar Geometry [8 marks]

The essential matrix for a pair of cameras, mapping points in camera 1 to lines in camera 2, is:

    E = [  3  −4   4 ]
        [  5   0   0 ]
        [ −4  −3   3 ]

There are three points of interest:

    p1 = (0, 0),  p2 = (1, 0),  p3 = (0, 1)

(a) (6 marks) Which of these three points is the epipole in camera 1? Show your working.

(b) (2 marks) Which of these three points corresponds to the point q = (3, −4) in camera 2? Show your working.

Question 8: Convolutional neural networks [15 marks]

In the following CNN network, the input is an RGB image with both height and width equal to 224. The convolution operation in the convolutional layer is standard 2D convolution (i.e., each kernel has the same number of channels as the input, and the kernel slides over local patches of the input, computing an element-wise multiplication and taking the sum). "FC10" denotes a fully connected layer with 10 units. Answer the following questions (show your working):

(a) (3 marks) Compute the size of the feature maps output by the convolutional layer and max-pooling layer. (The output size should be in the format height × width × number of channels.)

(b) (6 marks) Compute the number of parameters and multiplications of the convolutional layer and max-pooling layer (ignore the bias).
(c) (6 marks) If the standard 2D convolution in the convolutional layer is replaced with depthwise separable convolution (with the same padding, stride, and output feature map size), what is the number of parameters and multiplications in the layer?

Question 9: Transposed convolution [6 marks]

Compute the result of performing a transposed convolution on the 2 × 2 input with the 3 × 3 kernel shown below

(a) (3 marks) with a stride of 2
(b) (3 marks) with a stride of 1

Express each result as a matrix and include the trimming step.

Input:
4 7
6 3

Kernel:
0 1 0
1 2 1
0 1 0
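[Aside on Question 9: transposed convolution can be sanity-checked by pasting a copy of the kernel, scaled by each input value, at stride-spaced offsets and summing the overlaps. The numpy sketch below (not part of the paper) prints the untrimmed outputs; the trimming step, whose convention follows the course notes, is deliberately omitted.]

```python
import numpy as np

def transposed_conv2d(x, k, stride):
    """Transposed convolution: for each input value, add a scaled copy of
    the kernel into the output at a stride-spaced offset."""
    n, m = x.shape
    kh, kw = k.shape
    out = np.zeros((stride * (n - 1) + kh, stride * (m - 1) + kw))
    for i in range(n):
        for j in range(m):
            out[stride*i : stride*i + kh, stride*j : stride*j + kw] += x[i, j] * k
    return out

x = np.array([[4., 7.], [6., 3.]])
k = np.array([[0., 1., 0.], [1., 2., 1.], [0., 1., 0.]])

print(transposed_conv2d(x, k, stride=2))  # 5x5 output before trimming
print(transposed_conv2d(x, k, stride=1))  # 4x4 output before trimming
```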
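[Aside on Question 7: the two checks the question relies on are generic. In homogeneous coordinates, the camera-1 epipole e satisfies Ee = 0, and a camera-2 point q can correspond to a camera-1 point p only if q lies on the epipolar line l = Ep, i.e. qᵀ(Ep) = 0. A minimal numpy sketch of both checks, using the matrix from the question (illustrative, not a model answer):]

```python
import numpy as np

# Essential matrix from Question 7.
E = np.array([[ 3., -4.,  4.],
              [ 5.,  0.,  0.],
              [-4., -3.,  3.]])

def homog(p):
    """Lift a 2D image point to homogeneous coordinates (x, y, 1)."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def is_epipole(E, p):
    # The camera-1 epipole satisfies E p = 0 (every epipolar line passes
    # through it, so its image in camera 2 is degenerate).
    return np.allclose(E @ homog(p), 0.0)

def on_epipolar_line(E, p, q):
    # q (camera 2) can match p (camera 1) only if q lies on l = E p.
    return np.isclose(homog(q) @ (E @ homog(p)), 0.0)
```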