COMP-381 Introduction to Machine Learning 2021
Assignment # 3

NAME:                Signature:                STD. NUM:

1. Image Compression: Convert a photo of yours to .png format. Then load it with Python and compute its SVD compression as explained in the lecture notes and in class. Choose a number of singular values to keep (the truncation threshold) and explain why you chose this value below. How much compression have you achieved? Answer this question below and hand in all code and your image before and after compression. (A minimal code sketch appears after Problem 3 below.)

2. PCA: As explained in class in the context of digit images (the images of 2's), implement a PCA projection of about 10 or 20 data instances of your liking (images, text documents, objects in an Excel document, Wikipedia entities, etc.) to a 2D PCA layout (like the ones for countries, hand gestures and numbers in the lectures). If the data instances are images, transform them to be of the same size, say 40 by 40 pixels, and grayscale. For help on generating 2D PCA layouts, see the posting of Bobak Shahriari on "annotating scatter plots with images" in the Google group or visit http://matplotlib.org/examples/pylab_examples/demo_annotation_box.html. Hand in all the code and your 2D PCA layout. (A layout sketch also appears after Problem 3.)

For example: for a hotel browsing app, I would select 10 Vancouver hotels on TripAdvisor. As the attributes for each hotel, I would use the 5 traveler ratings (excellent, very good, average, poor, terrible), the TripAdvisor rank and the cheapest room price. I would then form a matrix of 10 rows by 7 columns, and use the SVD to compute the 2D PCA components U_2 \Sigma_2. Finally, I would do a scatter plot of the hotels and, at the location of the hotel point in 2D, I would insert either the name of the hotel or the image of the hotel. Be creative and enjoy the exercise!

3. Learning Bayesian networks: For the Fritz network that was discussed in the lecture notes, the joint distribution of the i-th observation is

    P(T_i, M_i, S_i, F_i \mid \theta, \alpha, \gamma_{1:2}, \beta_{1:4}) = P(T_i \mid S_i, F_i, \beta_{1:4}) \, P(F_i \mid M_i, \gamma_{1:2}) \, P(M_i \mid \theta) \, P(S_i \mid \alpha),

where each distribution is Bernoulli:

    P(M_i \mid \theta) = \theta^{I(M_i=1)} (1-\theta)^{I(M_i=0)}
    P(S_i \mid \alpha) = \alpha^{I(S_i=1)} (1-\alpha)^{I(S_i=0)}
    P(F_i \mid M_i=0, \gamma_1) = \gamma_1^{I(F_i=1|M_i=0)} (1-\gamma_1)^{I(F_i=0|M_i=0)}
    P(F_i \mid M_i=1, \gamma_2) = \gamma_2^{I(F_i=1|M_i=1)} (1-\gamma_2)^{I(F_i=0|M_i=1)}
    P(T_i \mid S_i=0, F_i=0, \beta_1) = \beta_1^{I(T_i=1|S_i=0,F_i=0)} (1-\beta_1)^{I(T_i=0|S_i=0,F_i=0)}
    P(T_i \mid S_i=0, F_i=1, \beta_2) = \beta_2^{I(T_i=1|S_i=0,F_i=1)} (1-\beta_2)^{I(T_i=0|S_i=0,F_i=1)}
    P(T_i \mid S_i=1, F_i=0, \beta_3) = \beta_3^{I(T_i=1|S_i=1,F_i=0)} (1-\beta_3)^{I(T_i=0|S_i=1,F_i=0)}
    P(T_i \mid S_i=1, F_i=1, \beta_4) = \beta_4^{I(T_i=1|S_i=1,F_i=1)} (1-\beta_4)^{I(T_i=0|S_i=1,F_i=1)}

(a) Derive the ML estimates of \theta, \alpha, \gamma_{1:2}, \beta_{1:4} for the dataset in the slides. Hint: we did some of these estimates already in class.

(b) Derive the posterior mean estimates of \theta, \alpha, \gamma_{1:2}, \beta_{1:4} assuming that we use the same Beta(1,1) prior for each of the parameters. Start by writing each of the 8 posterior distributions. For example,

    P(\theta \mid M_{1:5}) \propto \prod_{i=1}^{5} P(M_i \mid \theta) \, p(\theta) \propto \theta^4 (1-\theta)^1 \, \theta^{1-1} (1-\theta)^{1-1} = \theta^{5-1} (1-\theta)^{2-1},

and, consequently, the posterior mean estimate of \theta is E(\theta \mid M_{1:5}) = \frac{5}{5+2} = 5/7. (The generic pattern is summarized after this problem.)

(c) Calculate P(M \mid T=0) from the ML estimates of the parameters.

(d) Calculate P(M \mid T=0) from the posterior mean estimates of the parameters. (An enumeration sketch for (c) and (d) follows this problem.)

(e) As explained in the last slide of the lectures on Bayesian network learning, assume we have another model with an arrow from M to S. In this case, P(S_i \mid \alpha) gets replaced by P(S_i \mid M_i, \alpha_1, \alpha_2). For the validation dataset in that slide, which model provides a better prediction? Hint: evaluate \prod_{i=1}^{2} P(T_i, M_i, S_i, F_i \mid \theta, \alpha, \gamma_{1:2}, \beta_{1:4}), where i is the index over the two items in the validation dataset, for both models. Note that for model 2, you first have to learn \alpha_{1:2}.
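A minimal sketch of the Problem 1 workflow, assuming numpy and Pillow are available; the filename photo.png and the cutoff k = 30 are placeholders, not prescribed values.

    # Sketch for Problem 1: rank-k SVD compression of a grayscale PNG.
    # "photo.png" and k = 30 are placeholders; justify your own truncation threshold.
    import numpy as np
    from PIL import Image

    A = np.asarray(Image.open("photo.png").convert("L"), dtype=float)  # image as a matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 30
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k reconstruction

    # Storing the truncated factors takes k*(m+n+1) numbers instead of m*n.
    m, n = A.shape
    print(f"stored fraction: {k * (m + n + 1) / (m * n):.1%}")

    Image.fromarray(np.clip(A_k, 0, 255).astype(np.uint8)).save("photo_compressed.png")

One common way to justify k is to plot the singular values s and keep enough of them to capture most of the spectrum before it flattens out.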
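For Problem 2, a sketch of the hotel example under the stated 10-by-7 setup; the random matrix X and the labels are placeholders for your own data.

    # Sketch for Problem 2: 2D PCA layout via the SVD (the U_2 Sigma_2 components).
    # X and the labels are placeholders; substitute your 10 instances x 7 attributes.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    X = rng.random((10, 7))                       # placeholder data matrix
    labels = [f"hotel {i+1}" for i in range(10)]  # placeholder names

    Xc = X - X.mean(axis=0)                       # centre each attribute
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = U[:, :2] * s[:2]                     # 2D PCA components U_2 Sigma_2

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1])
    for (x, y), name in zip(coords, labels):
        ax.annotate(name, (x, y))                 # text labels at each hotel's 2D point
    fig.savefig("pca_layout.png")

To place image thumbnails instead of text labels, as in the linked demo, replace ax.annotate with matplotlib.offsetbox.OffsetImage wrapped in an AnnotationBbox.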
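For parts (a) and (b) of Problem 3, every parameter follows the same Bernoulli/Beta pattern as the worked \theta example above. Stated generically, for a parameter \phi with n_1 "successes" out of n relevant observations (counts to be read off the slides):

    \hat{\phi}_{\mathrm{ML}} = \frac{n_1}{n},
    \qquad
    E(\phi \mid \mathcal{D}) = \frac{n_1 + 1}{n + 2}
    \quad \text{(posterior mean under a } \mathrm{Beta}(1,1) \text{ prior).}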
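For parts (c) and (d), a brute-force enumeration sketch over the factorization above; all parameter values below are placeholders to be replaced by your estimates from (a) or (b).

    # Sketch for Problem 3(c)/(d): P(M=1 | T=0) by enumerating the joint
    # P(T, M, S, F) = P(T|S,F) P(F|M) P(M) P(S).
    # All 0.5 values are placeholders; plug in your ML or posterior-mean estimates.
    from itertools import product

    theta, alpha = 0.5, 0.5                      # P(M=1), P(S=1)
    gamma = {0: 0.5, 1: 0.5}                     # P(F=1 | M=m)
    beta = {(0, 0): 0.5, (0, 1): 0.5,
            (1, 0): 0.5, (1, 1): 0.5}            # P(T=1 | S=s, F=f)

    def joint(t, m, s, f):
        pm = theta if m else 1 - theta
        ps = alpha if s else 1 - alpha
        pf = gamma[m] if f else 1 - gamma[m]
        pt = beta[(s, f)] if t else 1 - beta[(s, f)]
        return pt * pf * pm * ps

    num = sum(joint(0, 1, s, f) for s, f in product((0, 1), repeat=2))
    den = sum(joint(0, m, s, f) for m, s, f in product((0, 1), repeat=3))
    print("P(M=1 | T=0) =", num / den)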
4. Bayesian linear regression: Let X \in \mathbb{R}^{n \times d} and y \in \mathbb{R}^{n \times 1}. Assume the likelihood is Gaussian:

    p(y \mid X, \theta, \Sigma) = |2\pi\Sigma|^{-1/2} e^{-\frac{1}{2}(y - X\theta)^T \Sigma^{-1} (y - X\theta)}.

Assume that the prior for \theta is also Gaussian:

    p(\theta) = |2\pi\Delta|^{-1/2} e^{-\frac{1}{2}\theta^T \Delta^{-1} \theta}.

(a) Using Bayes rule and completing squares, derive an expression for the posterior distribution of \theta. In this part, assume that the covariance \Sigma is given. State clearly what the mean and variance of the posterior are.

(b) State the conditions under which the posterior mean would be equivalent to the ridge and maximum likelihood estimators.
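For reference on Problem 4(a), the standard Gaussian-conjugacy result that the completing-the-square derivation should arrive at, stated without the intermediate algebra so you can check your work:

    p(\theta \mid y, X) = \mathcal{N}(\theta \mid \mu, V),
    \qquad
    V = \left( X^T \Sigma^{-1} X + \Delta^{-1} \right)^{-1},
    \qquad
    \mu = V X^T \Sigma^{-1} y.

For part (b), consider what \mu becomes when \Sigma = \sigma^2 I and \Delta = \tau^2 I (it takes the ridge form with \lambda = \sigma^2 / \tau^2), and when the prior precision \Delta^{-1} vanishes (the maximum likelihood estimator is recovered).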

































































