
CSE 513T Theory of Artificial Intelligence & Machine Learning — October 5, 2021
Homework 5 — Instructor: Brendan Juba — Due: October 19, 2021, 11:59 PM

Reminder: You may work in groups and use outside sources. But you must write up solutions in your own words and properly reference your sources for each problem. This includes listing your collaborators and properly citing any sources you use. Solutions to each problem must be electronically typeset and submitted online via Gradescope.

1. The sample complexity of confidence.

In lecture we saw that in order to learn a class of VC-dimension d with confidence 1 − δ and error bounded by ε, it is sufficient to find a representation h that is consistent with O((1/ε)(d log(1/ε) + log(1/δ))) examples. We also saw that Ω(d/ε) examples were necessary to guarantee an error rate of ε with confidence 1 − δ₀ for some constant δ₀ > 0. In this problem we will extend this lower bound to include a dependence on δ when we wish to learn to arbitrarily high confidence 1 − δ.

(a) Show that for any class of VC-dimension at least 2, Ω((1/ε) log(1/δ)) examples are necessary for an algorithm to identify a representation that has error at most ε with probability at least 1 − δ.

(b) Now, using the lower bound we proved in lecture, conclude that for any algorithm to find a representation with error at most ε with probability 1 − δ when the data is labeled by a class of VC-dimension d, Ω((1/ε)(d + log(1/δ))) examples are required (as d → ∞, ε → 0, δ → 0).

Remark. Hanneke recently improved the analysis of the upper bound, obtaining that O((1/ε)(d + log(1/δ))) examples suffice to find a representation in a class of VC-dimension d that is correct with probability 1 − ε, with confidence 1 − δ. Thus, the sample complexity of learning a class of VC-dimension d is asymptotically Θ((1/ε)(d + log(1/δ))) examples.

2. VC-dimension bounds for nonzero error.

We can derive analogous sample complexity bounds in cases where no representation h from the class H fits the data perfectly, perhaps due to noise in the data. Suppose a class H of representations of VC-dimension d is given. Show that if we take a sample of m = O((1/ε²)(d log(1/ε) + log(1/δ))) examples from a distribution D, then with probability 1 − δ (δ < 1/2) every representation h ∈ H has an empirical error (1/m) Σ_{j=1}^{m} I[h(x⁽ʲ⁾) ≠ b⁽ʲ⁾] within an additive ε (ε < 1/2) of its true error Pr_D[h(x) ≠ b]. (Here I[φ] denotes the 0/1-valued indicator function that is 1 iff φ is true.)

(Hint. You can reuse the lemmas stated in lecture! Try to follow the analogous argument, except that instead of choosing a set of m out of 2m elements uniformly at random, consider a pair of m-element samples in which you independently swap the i-th elements of the two samples with probability 1/2.)
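(The following is not part of the required solution.) The uniform-convergence claim in Problem 2 can be sanity-checked numerically. The sketch below uses the class of threshold functions h_t(x) = I[x ≥ t] on [0, 1] (VC-dimension 1), with D taken to be uniform and labels given by the target threshold t = 0.5; all of these choices, and the discretization of the class to 101 thresholds, are illustrative assumptions, not specified by the problem.

```python
import random

def true_error(t, target=0.5):
    # Pr_{x ~ U[0,1]}[ I[x >= t] != I[x >= target] ] = |t - target|
    # (the two thresholds disagree exactly on the interval between them)
    return abs(t - target)

def empirical_error(t, sample, target=0.5):
    # Fraction of sample points on which h_t disagrees with the target labels.
    return sum(((x >= t) != (x >= target)) for x in sample) / len(sample)

def max_deviation(m, thresholds, seed=0):
    # Largest |empirical error - true error| over the (discretized) class,
    # on a single sample of m points drawn from U[0,1].
    rng = random.Random(seed)
    sample = [rng.random() for _ in range(m)]
    return max(abs(empirical_error(t, sample) - true_error(t))
               for t in thresholds)

thresholds = [i / 100 for i in range(101)]
for m in (100, 1000, 10000):
    print(f"m = {m:5d}: max |empirical - true| = {max_deviation(m, thresholds):.3f}")
```

Consistent with the bound m = O((1/ε²)(d log(1/ε) + log(1/δ))), the worst-case deviation over the class should shrink on the order of 1/√m as the sample grows.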