3551 Trousdale Pkwy, University Park, Los Angeles, CA
1. Object Representation (8 pts): We can represent an object by its boundary $(x(s), y(s))$, $0 \le s \le S$, where $S$ is the length of the object's boundary and $s$ is the distance along that boundary from some arbitrary starting point. We can combine $x$ and $y$ into a single complex function $z(s) = x(s) + j\,y(s)$. The Discrete Fourier Transform (DFT) of $z$ is

$$Z(k) = \sum_{s=0}^{S-1} z(s)\, e^{-2\pi j k s/S}, \qquad 0 \le k \le S-1.$$

We can use the coefficients $Z(k)$ to represent the object boundary. The upper limit on $s$ is $S-1$ because, for a closed contour, $z(S) = z(0)$. The Inverse Discrete Fourier Transform is

$$z(s) = \frac{1}{S} \sum_{k=0}^{S-1} Z(k)\, e^{+2\pi j k s/S}, \qquad 0 \le s \le S-1.$$

a. Suppose that the object is translated by $(\Delta x, \Delta y)$, that is, $z'(s) = z(s) + \Delta x + j\,\Delta y$. How is the DFT of $z'$, $Z'(k)$, related to $Z(k)$?

b. What object has $z(s) = R\cos\frac{2\pi s}{S} + jR\sin\frac{4\pi s}{S}$? Sketch it.

c. What is $Z(k)$ corresponding to $z(s)$ from Part b? Hint: Most coefficients are 0.

2. Interpretation Tree (5 pts): We have a choice between matching detected image elements (edges) to the model and matching model elements to the object. Let $E$ be the set of detected image edges and $M$ the set of model edges. In the first case, matching image edges to model edges, we generate a tree of depth $|E|$ and breadth $|M|$, with tree size $|M|^{|E|}$. In the second case, matching model edges to image edges, we generate a tree of size $|E|^{|M|}$. We expect many more image elements than model elements: a cluttered scene may contain many candidate image edges but only a small number of model edges.

a. Which approach is preferable, matching image edges to model edges or model edges to image edges? You might consider, for example, the case where there are 12 image edges and 5 model edges.

b. One advantage of the interpretation tree approach is that it can match an unknown object in the image to a model even if the object is partially occluded. We do this by allowing an object element to match a "null element" in the model. Does this change your answer to part a? How and why, or why not?

3. Stereo via Singular Value Decomposition (8 pts): Assume the usual stereo geometry, where the left and right cameras are offset by a baseline $\vec{B}$ that is perpendicular to the common focal vector $\vec{F}$. Then the stereo imaging equations are

$$\vec{X}_L = \frac{|\vec{F}|^2}{\vec{F}\cdot\vec{X}_W}\left(\vec{X}_W + \frac{\vec{B}}{2}\right), \qquad \vec{X}_R = \frac{|\vec{F}|^2}{\vec{F}\cdot\vec{X}_W}\left(\vec{X}_W - \frac{\vec{B}}{2}\right).$$

In the presence of imaging errors or noise, these equations may not hold exactly; they hold only approximately:

$$\vec{X}_L - \frac{|\vec{F}|^2}{\vec{F}\cdot\vec{X}_W}\left(\vec{X}_W + \frac{\vec{B}}{2}\right) \approx \vec{0}, \qquad \vec{X}_R - \frac{|\vec{F}|^2}{\vec{F}\cdot\vec{X}_W}\left(\vec{X}_W - \frac{\vec{B}}{2}\right) \approx \vec{0}.$$

a. Show that these equations can be written as a $4 \times 4$ matrix operating on a column vector in homogeneous coordinates:

$$\begin{bmatrix} -f & 0 & x_L & -fb/2 \\ 0 & -f & y_L & 0 \\ -f & 0 & x_R & fb/2 \\ 0 & -f & y_R & 0 \end{bmatrix} \begin{bmatrix} x_W \\ y_W \\ z_W \\ 1 \end{bmatrix} \approx \vec{0}$$

Hint: Combine the approximate imaging equations into a single matrix equation, multiply to eliminate the denominators, and simplify, not necessarily in that order!

b. The above equation can be written as $A\tilde{X}' \approx \vec{0}$. We can use SVD to find the singular vector $\tilde{X}'$ that minimizes $|A\tilde{X}'|^2$ subject to $|\tilde{X}'|^2 = 1$. Express the world point $\vec{X}_W = [x, y, z]^T$ in terms of $\tilde{X}' = [x', y', z', w']^T$.

c. When $y_L = y_R$, show that the equation in part a gives $z_W = \dfrac{fb}{d}$, where $d = x_L - x_R$ is the disparity.

4. Binary Image Matching (2 pts): Let $I_1$ and $I_2$ be binary images. Show that

$$|I_1 - I_2|^2 = \text{number of pixels where } I_1 \ne I_2,$$

where $|I|^2 = \sum_{i,j} I_{ij}^2$ is the sum of the squares of all pixels in $I$.
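For Problem 1, here is a minimal numerical sketch of the DFT pair, written directly from the two sums (O(S²), no FFT). It assumes the boundary has already been resampled at $S$ equally spaced arc-length positions $s = 0, 1, \dots, S-1$. The names `dft`, `idft`, `sample_contour`, and `translate` are hypothetical helpers for illustration, not part of any course code. Comparing `dft(z)` with `dft(translate(z, dx, dy))`, and listing which entries of `dft(sample_contour(S))` are nonzero, gives a numerical check on answers to parts a and c.

```python
import cmath
import math


def dft(z):
    """Z(k) = sum_{s=0}^{S-1} z(s) exp(-2*pi*j*k*s/S), for 0 <= k <= S-1."""
    S = len(z)
    return [sum(z[s] * cmath.exp(-2j * math.pi * k * s / S) for s in range(S))
            for k in range(S)]


def idft(Z):
    """z(s) = (1/S) sum_{k=0}^{S-1} Z(k) exp(+2*pi*j*k*s/S), for 0 <= s <= S-1."""
    S = len(Z)
    return [sum(Z[k] * cmath.exp(2j * math.pi * k * s / S) for k in range(S)) / S
            for s in range(S)]


def sample_contour(S, R=1.0):
    """Hypothetical helper: samples the Part b boundary
    z(s) = R cos(2*pi*s/S) + j R sin(4*pi*s/S) at s = 0, 1, ..., S-1."""
    return [complex(R * math.cos(2 * math.pi * s / S), R * math.sin(4 * math.pi * s / S))
            for s in range(S)]


def translate(z, dx, dy):
    """Hypothetical helper: z'(s) = z(s) + dx + j*dy, as in Part a."""
    return [zs + complex(dx, dy) for zs in z]


if __name__ == "__main__":
    S = 16
    z = sample_contour(S, R=2.0)
    Z = dft(z)
    # Round trip: the inverse DFT recovers the samples up to rounding error.
    assert all(abs(a - b) < 1e-9 for a, b in zip(idft(Z), z))
    # Which coefficients change when the contour is translated (Part a)?
    Zt = dft(translate(z, 3.0, -1.0))
    print("changed by translation:", [k for k in range(S) if abs(Zt[k] - Z[k]) > 1e-9])
    # Which coefficients of the Part b contour are nonzero (Part c)?
    print("nonzero coefficients:", [k for k in range(S) if abs(Z[k]) > 1e-9])
```

The 1e-9 threshold only separates true zeros from floating-point noise; any small tolerance well above machine precision works for modest $S$.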
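For Problem 2, the tree sizes can be compared directly. In this sketch, "tree size" is taken to mean the number of leaves (complete interpretations), breadth raised to the power depth, as in the problem statement. The brute-force enumerator lists every assignment of source elements to target elements with no geometric pruning, so it is the unpruned tree, not a full interpretation-tree search. The `allow_null` flag, which adds a "null element" as one extra choice at every level, is an illustrative assumption. The function names are hypothetical.

```python
from itertools import product


def tree_leaves(breadth, depth):
    """Number of leaves (complete interpretations) of an unpruned tree."""
    return breadth ** depth


def enumerate_interpretations(sources, targets, allow_null=False):
    """Brute force: every way to assign each source element one target element.

    With allow_null=True each source element may also map to None, standing in
    for the "null element" (an illustrative assumption, not the course's code).
    """
    choices = list(targets) + ([None] if allow_null else [])
    for assignment in product(choices, repeat=len(sources)):
        yield dict(zip(sources, assignment))


if __name__ == "__main__":
    n_image, n_model = 12, 5
    print("image-to-model leaves |M|^|E|:", tree_leaves(n_model, n_image))
    print("model-to-image leaves |E|^|M|:", tree_leaves(n_image, n_model))
    # Sanity check on a tiny case: the brute-force count matches breadth ** depth.
    small_E, small_M = ["e1", "e2", "e3"], ["m1", "m2"]
    assert sum(1 for _ in enumerate_interpretations(small_E, small_M)) == tree_leaves(2, 3)
```

Enumeration is only practical for tiny inputs; for the 12-edge, 5-model case use `tree_leaves` rather than iterating.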
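For Problem 3, the triangulation step can be checked numerically. This sketch assumes $\vec{F} = [0, 0, f]^T$ and $\vec{B} = [b, 0, 0]^T$, which is the geometry that yields the rows of the $4 \times 4$ matrix in part a. To stay dependency-free, it does not call an SVD routine. Instead it takes the eigenvector of $A^T A$ with the smallest eigenvalue, computed with cyclic Jacobi rotations. In exact arithmetic this is the right singular vector of $A$ for its smallest singular value; with NumPy, the last row of `Vh` from `numpy.linalg.svd(A)` plays the same role. `build_A`, `smallest_right_singular_vector`, and `triangulate` are hypothetical names.

```python
import math


def build_A(xl, yl, xr, yr, f, b):
    """4x4 matrix from part a; rows act on the homogeneous vector [x_W, y_W, z_W, 1]."""
    return [
        [-f, 0.0, xl, -f * b / 2],
        [0.0, -f, yl, 0.0],
        [-f, 0.0, xr, f * b / 2],
        [0.0, -f, yr, 0.0],
    ]


def smallest_right_singular_vector(A, sweeps=50):
    """Unit vector v minimizing |A v|^2: the eigenvector of A^T A with the smallest
    eigenvalue, found by cyclic Jacobi rotations (stand-in for an SVD call)."""
    n = len(A[0])
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    V = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulated rotations
    for _ in range(sweeps):
        if sum(M[i][j] * M[i][j] for i in range(n) for j in range(n) if i != j) < 1e-30:
            break  # M is numerically diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if M[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (M[q][q] - M[p][p]) / (2.0 * M[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # M <- M J (columns p and q)
                    M[k][p], M[k][q] = c * M[k][p] - s * M[k][q], s * M[k][p] + c * M[k][q]
                for k in range(n):  # M <- J^T M (rows p and q)
                    M[p][k], M[q][k] = c * M[p][k] - s * M[q][k], s * M[p][k] + c * M[q][k]
                for k in range(n):  # V <- V J
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    i_min = min(range(n), key=lambda i: M[i][i])  # diagonal now holds the eigenvalues
    return [V[k][i_min] for k in range(n)]


def triangulate(xl, yl, xr, yr, f, b):
    """World point [x, y, z] recovered from the homogeneous vector [x', y', z', w']."""
    xp, yp, zp, wp = smallest_right_singular_vector(build_A(xl, yl, xr, yr, f, b))
    return [xp / wp, yp / wp, zp / wp]


if __name__ == "__main__":
    f, b = 1.0, 0.2
    xw, yw, zw = 0.3, -0.1, 2.0
    # Noise-free synthetic projections from the imaging equations (F along z, B along x).
    xl, yl = f * (xw + b / 2) / zw, f * yw / zw
    xr, yr = f * (xw - b / 2) / zw, f * yw / zw
    print("triangulated:", triangulate(xl, yl, xr, yr, f, b))
    print("f*b/d:", f * b / (xl - xr))
```

Adding small perturbations to `xl`, `yl`, `xr`, `yr` shows the least-squares behavior: the rows no longer share an exact null vector, but the returned vector still minimizes $|A\tilde{X}'|^2$ on the unit sphere.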
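For Problem 4, a brute-force check of the identity on random binary images can build confidence before writing the proof. The sketch assumes images are stored as nested Python lists with pixel values 0 or 1. `sq_norm`, `diff`, and `mismatches` are hypothetical names.

```python
import random


def sq_norm(I):
    """|I|^2: the sum of all pixel values squared."""
    return sum(v * v for row in I for v in row)


def diff(I1, I2):
    """Pixelwise difference I1 - I2."""
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(I1, I2)]


def mismatches(I1, I2):
    """Number of pixels where I1 != I2."""
    return sum(a != b for r1, r2 in zip(I1, I2) for a, b in zip(r1, r2))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for repeatability
    h, w = 8, 10
    I1 = [[rng.randint(0, 1) for _ in range(w)] for _ in range(h)]
    I2 = [[rng.randint(0, 1) for _ in range(w)] for _ in range(h)]
    assert sq_norm(diff(I1, I2)) == mismatches(I1, I2)
    print(sq_norm(diff(I1, I2)), mismatches(I1, I2))
```

Replacing the 0/1 values with general grayscale values makes the assertion fail in general, which is a quick way to see that the identity depends on the images being binary.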