First Semester 2019
Statistical Machine Learning (COMP4670/8600)

Writing period: 3 hours duration.
Study period: 15 minutes duration.
Permitted materials: one A4 sheet, written on one side.
Total marks: 60. Answer all questions. All questions are to be completed in the script book provided. Please submit the A4 sheet with the script book.

1 Logistic Regression (14 marks total)

In the following questions we consider logistic regression without quadratic regularisation. For the questions which require code, use Python 3 and include comments which specify the input and output formats for your functions. Note that marks are allocated for documentation (0.5 marks for having sufficient comments in the code). You may assume the following preamble:

    import numpy as np

a) (1 mark) For binary logistic regression (as discussed in the lecture and used in the tutorial), how are the labels encoded?

b) (1.5 marks) There are three possible output types that a logistic regression classifier can produce, corresponding to the linear model, the probabilistic output and the classification. The three output types are called decision_function, predict and predict_proba in the scikit-learn interface (in no particular order). Explain (using both text and equations) the three output types.

c) (1 mark) Define and explain the purpose of a confusion matrix (using both text and equations).

d) (1 mark) Write down the mathematical definition of the sigmoid function y = σ(x).

e) (1 mark) Write down the mathematical definition of the cost function for binary logistic regression (average cross-entropy). Do not forget to define your notation.

f) (2 marks) Calculate the mathematical form of the gradient (with respect to the model parameters) of the cost function above.

g) (1 mark) Write code for a function sigmoid which operates element-wise on its vector input.

h) (1 mark) Write code for a function cost calculating the cost function as defined above. Note: ensure that the finite floating-point precision of a real computer does not destroy your result.

i) (1 mark) Write code for a function cost_grad calculating the gradient of the cost function as defined above.

j) (1 mark) Assume you have found the best parameter theta_best for some training data with input X and targets t using

    theta_best = opt.fmin_bfgs(cost, theta, fprime=cost_grad, args=(X, t))

Write code for a function predict which, given a set of test data X_test, returns an array of predictions.

k) (2 marks) Consider a single (input, target) pair. Draw graphical models which depict i) the logistic regression model, and ii) the naive Bayes classifier model.
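For reference, here is a minimal sketch of the kind of code parts g) to j) ask for. It is an illustration rather than an official solution, and it assumes labels encoded as t ∈ {0, 1} and a design matrix X with one example per row (both of which the question itself asks you to state and justify):

    import numpy as np

    def sigmoid(x):
        """Element-wise logistic sigmoid.
        Input:  x, array of any shape.
        Output: array of the same shape, values in (0, 1)."""
        return 1.0 / (1.0 + np.exp(-x))

    def cost(theta, X, t):
        """Average cross-entropy for binary logistic regression.
        Input:  theta (K,), design matrix X (N, K), targets t (N,) in {0, 1}.
        Output: scalar cost. Clipping the probabilities guards against
        log(0) under finite floating-point precision."""
        p = np.clip(sigmoid(X @ theta), 1e-12, 1.0 - 1e-12)
        return -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))

    def cost_grad(theta, X, t):
        """Gradient of cost with respect to theta.
        Input:  as for cost.  Output: array of shape (K,)."""
        return X.T @ (sigmoid(X @ theta) - t) / len(t)

    def predict(theta_best, X_test):
        """Classify test points by thresholding the probability at 0.5.
        Input:  theta_best (K,), X_test (M, K).
        Output: predicted labels in {0, 1}, shape (M,)."""
        return (sigmoid(X_test @ theta_best) >= 0.5).astype(int)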
2 Kernel Linear Regression (14 marks total)

Consider the function f(x, w) = φ(x)^T w, where x ∈ R^D is the input vector, w ∈ R^K is the parameter vector and φ is a feature function which maps from R^D to R^K. Let the kernel function be given by k(x, x′) = φ(x)^T φ(x′). Assume we are given N (input, target) training pairs, D = {(x_1, t_1), (x_2, t_2), ..., (x_N, t_N)}. For the questions which require code, use Python 3 and include comments which specify the input and output formats for your functions. You may assume the following preamble:

    import numpy as np

    def xval_inds(ndata, nfolds):
        """
        ndata:   the number of data points
        nfolds:  the number of cross-validation folds
        returns: a list of (train_ind, test_ind) pairs (of lists of integers)
        """
        assert ndata % nfolds == 0
        nchunk = int(ndata / nfolds)
        itest = [list(range(i * nchunk, (i + 1) * nchunk))
                 for i in range(int(nfolds))]
        itrain = [sorted(set(range(ndata)).difference(set(i))) for i in itest]
        return list(zip(itrain, itest))

a) (1 mark) Write the sum of squared errors objective J(w), assuming that our prediction for t_n is f(x_n, w). Include the regularisation term (λ/2) w^T w.

b) (1.5 marks) Show that the parameter vector w* which minimises J takes the form w* = Σ_{n=1}^{N} a_n φ(x_n), and give a_n in terms of w*.

c) (1.5 marks) Write f(x, w*) in terms of a = (a_1, a_2, ..., a_N)^T and k(·, ·), thereby eliminating φ(·).

d) (1.5 marks) Write J(w*) in terms of a and the kernel function k(·, ·) (or one or more matrices defined in terms of the kernel function), thereby eliminating φ(·) and w*.

e) (1.5 marks) Give a simplified, closed-form expression for the optimal a.

f) (1 mark) Write a function fit which takes D, k and λ and returns the optimal a.

g) (1 mark) Write a function predict which takes D, k, a and a matrix X_test of test points, and returns the values resulting from evaluating the regression function at each test point.

h) (2 marks) Write a function cross_validate which takes D, k, λ and nfolds, and returns a cross-validated estimate (with nfolds cross-validation folds) of the expected mean squared error on an independent test set. Use the function xval_inds above.

i) (3 marks) Let f ∈ R^N. Specify a likelihood function p(f | x_1, x_2, ..., x_N, k, a) and corresponding prior distribution p(f | λ) for which the maximum a posteriori (MAP) f given D corresponds to the solution found above. That is, the n-th element of the MAP f should equal f(x_n, Σ_{n=1}^{N} a*_n φ(x_n)), where a* is the optimal solution found in part e) above. Where possible, express your answer using well-known distribution(s) with standard parametrisation(s).
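For reference, a minimal sketch of parts f) to h). It assumes the standard kernel ridge regression solution a = (K + λI)^{-1} t (the usual answer to part e), where K is the N×N Gram matrix with K_{nm} = k(x_n, x_m)), represents D as an array X of inputs (one per row) and a vector t of targets, and uses xval_inds from the preamble above; it is an illustration, not an official solution:

    import numpy as np

    def gram(k, X, Z):
        """Gram matrix G[i, j] = k(X[i], Z[j]) for row-wise data matrices X, Z."""
        return np.array([[k(x, z) for z in Z] for x in X])

    def fit(X, t, k, lam):
        """Return the optimal dual coefficients a = (K + lam I)^{-1} t.
        Input:  inputs X (N, D), targets t (N,), kernel k, regulariser lam.
        Output: coefficients a, shape (N,)."""
        K = gram(k, X, X)
        return np.linalg.solve(K + lam * np.eye(len(X)), t)

    def predict(X, k, a, X_test):
        """Evaluate the regression function sum_n a_n k(x_n, x) at each test point.
        Output: predictions, shape (M,) for M test points."""
        return gram(k, X_test, X) @ a

    def cross_validate(X, t, k, lam, nfolds):
        """Cross-validated estimate of the expected mean squared error."""
        errs = []
        for itrain, itest in xval_inds(len(X), nfolds):
            a = fit(X[itrain], t[itrain], k, lam)
            y = predict(X[itrain], k, a, X[itest])
            errs.append(np.mean((y - t[itest]) ** 2))
        return float(np.mean(errs))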
3 Graphical Models - Alarm (10 marks total)

John and Mary live in a house which is equipped with a burglar alarm system. The situation can be described by the following Bayesian network with the random variables B (a Burglar entered the house), E (an Earthquake occurred), A (the Alarm in the house rings), R (Radio news report about an earthquake), J (John calls the police), and M (Mary, who listens to the radio, calls the police). The domain of each random variable is 𝔹 = {0, 1}, encoding False (= 0) and True (= 1).

[Figure: the Bayesian network structure over the nodes B, E, A, R, J and M.]

(a) (2 marks) Write out the joint distribution p(B, E, A, R, J, M) in its factored form according to this Bayesian network structure.

Express the answers to the following three queries only in terms of marginalisations "Σ_{X∈𝒳}" and maximisations "max_{X∈𝒳}", where each X is a random variable and 𝒳 the corresponding set of possible values X can take. For example, the following identity is acceptable:

    Σ_{B,A,R,J,M∈𝔹} p(B, E = 1, A, R, J, M) = p(E = 1)

Wherever possible, simplify your expressions by exploiting the structure of the above graphical model.

(b) (1 mark) The probability that the alarm will ring, given no observations.

(c) (1.5 marks) The joint probability that Mary called the police and John did not, when there was an earthquake observed.

(d) (1.5 marks) The most probable values for the tuple (E, B) if both John and Mary called the police and at least one of the events E, B did happen.

(e) (2 marks) Write down all conditional independence relations in the Bayesian network when only A is observed.

(f) (2 marks) Write down all conditional independence relations in the Bayesian network when only E is observed.
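Because the network figure is not reproduced above, the following sketch of query (b) by brute-force enumeration ASSUMES the edges suggested by the story, B→A, E→A, E→R, A→J, R→M, so that p(B, E, A, R, J, M) = p(B) p(E) p(A|B, E) p(R|E) p(J|A) p(M|R). All conditional probability tables are made-up placeholders, not values from the exam:

    import numpy as np
    from itertools import product

    pB = np.array([0.99, 0.01])                  # p(B); placeholder numbers
    pE = np.array([0.98, 0.02])                  # p(E)
    pA = np.zeros((2, 2, 2))                     # p(A | B, E), indexed [B, E, A]
    pA[0, 0] = [0.999, 0.001]; pA[1, 0] = [0.06, 0.94]
    pA[0, 1] = [0.71, 0.29];   pA[1, 1] = [0.05, 0.95]
    pR = np.array([[0.9, 0.1], [0.2, 0.8]])      # p(R | E), indexed [E, R]
    pJ = np.array([[0.95, 0.05], [0.10, 0.90]])  # p(J | A), indexed [A, J]
    pM = np.array([[0.99, 0.01], [0.30, 0.70]])  # p(M | R), indexed [R, M]

    # Query (b): p(A = 1) = sum_{B,E} p(B) p(E) p(A = 1 | B, E).
    # The factors in R, J and M each marginalise to one and drop out,
    # which is exactly the structural simplification the question asks for.
    p_alarm = sum(pB[b] * pE[e] * pA[b, e, 1]
                  for b, e in product([0, 1], repeat=2))
    print(p_alarm)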
4 Sampling - Gaussian Mixture (10 marks)

a) (1 mark) Define the setup for and the goal of rejection sampling.

b) (2 marks) Write down all steps of the rejection sampling algorithm. You are encouraged to provide precise mathematical formulations and/or pseudo-code where appropriate, and also to explain your answer.

c) (1 mark) What are some limitations of rejection sampling?

d) (1 mark) Define the setup for and the goal of ancestral sampling.

e) (2 marks) Write down all steps of the ancestral sampling algorithm. You are encouraged to provide precise mathematical formulations and/or pseudo-code where appropriate, and also to explain your answer.

f) (1 mark) What are some limitations of ancestral sampling?

g) Consider the mixture of three Gaussians,

    p(x) = (3/10) 𝒩_{5,0.5}(x) + (3/10) 𝒩_{9,2}(x) + (4/10) 𝒩_{2,20}(x),

where 𝒩_{µ,σ} is a Gaussian distribution with mean µ and standard deviation σ.

g1) (1 mark) Discuss which sampling method (rejection or ancestral) is more appropriate for the above distribution, and why.

g2) (1 mark) Derive, explain and write code in Python 3 to draw 1000 samples from the given distribution using whichever of the two methods you deem most appropriate. Assume that your code starts with

    import numpy as np
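For g2), ancestral sampling is the natural choice: the mixture is already a two-stage generative model (first draw a component index, then draw from that component's Gaussian), so it can be sampled exactly, with no rejections and no envelope distribution. A sketch, using the component parameters as given above:

    import numpy as np

    rng = np.random.default_rng(0)

    weights = np.array([0.3, 0.3, 0.4])   # mixing coefficients
    means   = np.array([5.0, 9.0, 2.0])   # component means
    stds    = np.array([0.5, 2.0, 20.0])  # component standard deviations

    # Ancestral sampling: sample the latent component z ~ Categorical(weights),
    # then sample x | z ~ N(means[z], stds[z]^2), vectorised over 1000 draws.
    z = rng.choice(len(weights), size=1000, p=weights)
    samples = rng.normal(means[z], stds[z])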
5 Approximate Inference (12 marks total)

a) (2.5 marks) In the context of Bayesian machine learning, describe i) the Laplace approximation and ii) variational approximations. Compare and contrast the two methods.

b) (2 marks) Consider the problem of approximating some p(Z) with the factorised distribution

    q(Z) = ∏_{i=1}^{N} q(Z_i),

where Z = (Z_1, Z_2, ..., Z_N). By holding q(Z_i), i ≠ j, fixed, derive expressions for each of the following variational approximation updates:

    q*_L(Z_j) ≡ argmin_{q(Z_j)} KL[q(Z) ∥ p(Z)]
    q*_R(Z_j) ≡ argmin_{q(Z_j)} KL[p(Z) ∥ q(Z)]

where KL[p(x) ∥ q(x)] ≡ ∫ p(x) (log p(x) − log q(x)) dx.

c) (1 mark) Describe the Expectation Propagation algorithm, using the equations from the previous question if needed.

d) (3 marks) Consider the distribution defined by

    p(A, B, C) = f_1(A, B) f_2(B, C),

where A, B and C are discrete (categorical) random variables. Assume a factorising approximation for f_1 and f_2, so that

    f_1(A, B) ≈ f̃_1(A, B) ≡ f̃_1A(A) f̃_1B(B),

and similarly f̃_2(B, C) ≡ f̃_2B(B) f̃_2C(C). Consider the case where the first factor f̃_1(A, B) is being refined. Derive the Expectation Propagation updates using f̃_1A(A), f̃_1B(B), f̃_2B(B) and f̃_2C(C).

e) (1.5 marks) Consider the same p(A, B, C) (still with categorical A, B and C) from the previous question, but instead assume a factorising approximating distribution,

    q(A, B, C) ≡ q(A) q(B) q(C).

By optimising KL[q(A, B, C) ∥ p(A, B, C)] with respect to each of the three factors (holding the other two fixed), derive the variational mean-field updates.

f) (2 marks) In this question we consider a Gaussian approximation to the unknown distribution p(x) on continuous random variables. By treating the covariance Σ as fixed, derive an expression for the mean µ of the (multivariate normal, or multivariate Gaussian) approximating distribution q(x) ≡ 𝒩(x | µ, Σ) which minimises the objective KL[p(x) ∥ q(x)].
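As a pointer for part f), the standard moment-matching argument runs as follows (this is the well-known result, not an official solution). Only the cross-entropy term of the KL divergence depends on µ:

    KL[p(x) ∥ q(x)] = ∫ p(x) log p(x) dx − ∫ p(x) log 𝒩(x | µ, Σ) dx
                    = const + (1/2) ∫ p(x) (x − µ)^T Σ^{-1} (x − µ) dx

Setting the gradient with respect to µ to zero,

    ∇_µ KL = −Σ^{-1} (E_p[x] − µ) = 0   ⟹   µ = E_p[x],

i.e. minimising KL[p ∥ q] over the mean matches the first moment of p.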
End of exam.