
STAT 426
1.3 Statistical Inference for Categorical Data (Part I)

Maximum likelihood estimation

We will mostly discuss maximum likelihood estimation. Assuming certain regularity conditions, maximum likelihood estimators have the following properties:
- Large-sample normal distributions
- Consistency (they converge to the population value)
- Asymptotic efficiency (asymptotic variance no larger than that of other estimators)

Let $\beta$ be a generic unknown parameter and $\hat\beta$ the parameter estimate.
- Likelihood function $\ell(\beta)$: the probability of observing the sample, viewed as a function of the unknown parameter.
- Maximum likelihood (ML) estimate: the parameter value that maximizes the likelihood function.

If $\hat\beta$ maximizes the likelihood function $\ell(\beta)$, then $\hat\beta$ also maximizes the logarithm of the likelihood function, $L(\beta) = \log \ell(\beta)$. The maximum likelihood estimate is then a solution of $\partial L(\beta)/\partial\beta = 0$. If the parameter is multidimensional, we denote the parameter vector by $\boldsymbol\beta$ and obtain $\hat{\boldsymbol\beta}$ as the solution of a system of equations.

The kernel of $\ell(\beta)$ includes only the factors that depend on $\beta$. Inference will involve only the kernel, so $L(\beta)$ need only be specified up to an additive constant.

Covariance of the ML estimators

Let $\mathrm{cov}(\hat{\boldsymbol\beta})$ be the covariance matrix of $\hat{\boldsymbol\beta}$. Under some regularity conditions, the covariance matrix is the inverse of the information matrix, whose $(j,k)$ element is
$$\iota(\boldsymbol\beta)_{jk} = -E\left(\frac{\partial^2 L(\boldsymbol\beta)}{\partial\beta_j\,\partial\beta_k}\right).$$
The standard errors (SE) of $\hat{\boldsymbol\beta}$ are the square roots of the diagonal elements of the covariance matrix. The greater the curvature of the log-likelihood, the smaller the standard errors.

Exercise: Find the likelihood function and the ML estimate of the binomial parameter.

Wald test

These tests use the asymptotic normality of the maximum likelihood estimators. We want to test the null hypothesis $H_0: \beta = \beta_0$. The test statistic
$$z = \frac{\hat\beta - \beta_0}{SE},$$
for a nonzero SE, has an approximate standard normal distribution when $\beta = \beta_0$. We can obtain one-sided or two-sided P-values from the standard normal table. For the two-sided test, the statistic $z^2$ has a chi-squared distribution with 1 degree of freedom, and the P-value is the right-tail area above the observed value. This type of statistic is called the Wald statistic.

Multivariate extension of the Wald test: to test $H_0: \boldsymbol\beta = \boldsymbol\beta_0$, the Wald statistic can be written as
$$W = (\hat{\boldsymbol\beta} - \boldsymbol\beta_0)'\,[\mathrm{cov}(\hat{\boldsymbol\beta})]^{-1}\,(\hat{\boldsymbol\beta} - \boldsymbol\beta_0).$$
The asymptotic normal distribution of $\hat{\boldsymbol\beta}$ implies an asymptotic chi-squared distribution for $W$, with degrees of freedom $\mathrm{rank}(\mathrm{cov}(\hat{\boldsymbol\beta}))$.

For our purposes, $L(\beta)$ will be well defined and at least twice continuously differentiable. A maximum likelihood estimate (MLE) $\hat\beta$ maximizes $\ell(\beta)$; $\hat\beta$ is usually the (unique) solution of $L'(\beta) = 0$. Note: an MLE also maximizes the kernel.

The score function is
$$u(\beta) = \frac{\partial L(\beta)}{\partial\beta}.$$
The (Fisher) information is
$$\iota(\beta) = -E\left(\frac{\partial^2 L(\beta)}{\partial\beta^2}\right),$$
where the expectation is over the assumed distribution for the data when the parameter value is $\beta$. Note: both can be found even when $L(\beta)$ is known only up to an additive constant.

If the data are from a sample of size $n$, we consider asymptotic behavior as $n \to \infty$. Typically
$$\big(\iota(\beta)\big)^{-1} = \text{asymptotic variance of the MLE } \hat\beta,$$
in the sense that using it to "standardize" $\hat\beta$ results in an asymptotic limit (often normal) with variance 1. Also,
$$\sigma(\hat\beta) = \sqrt{\big(\iota(\beta)\big)^{-1}} = \text{asymptotic standard error}.$$

One can also show that
$$E\big(u(\beta)\big) = 0, \qquad \mathrm{var}\big(u(\beta)\big) = \iota(\beta),$$
where the expectations are over the assumed distribution for the data when the parameter value is $\beta$. When the parameter value is $\beta$, $u(\beta)$ is often asymptotically normal (after appropriate standardization).
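As a concrete illustration of these definitions (and previewing the binomial example that follows), here is a minimal numerical sketch, not part of the original slides: it finds the MLE and the asymptotic standard error for made-up binomial data ($y = 13$ successes in $n = 20$ trials), using a generic optimizer and a finite-difference second derivative in place of the closed-form solutions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical data: y successes in n independent trials (made-up values).
y, n = 13, 20

def neg_log_lik(p):
    # Negative of the binomial log-likelihood kernel y*ln(p) + (n-y)*ln(1-p).
    return -(y * np.log(p) + (n - y) * np.log(1 - p))

# Numerical MLE: maximize L(p), i.e. minimize -L(p), over (0, 1).
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
p_hat = res.x                     # close to the closed-form MLE y/n = 0.65

# Observed information -L''(p_hat) via a central finite difference;
# the standard error is the inverse square root of the information.
h = 1e-5
info = (neg_log_lik(p_hat + h) - 2 * neg_log_lik(p_hat)
        + neg_log_lik(p_hat - h)) / h**2
se = 1 / np.sqrt(info)

print(p_hat, se)                  # ~0.65 and ~sqrt(0.65*0.35/20) ~ 0.107
```

For the binomial this reproduces the closed-form answers derived next: $\hat\pi = y/n$ and $\sigma(\hat\pi) = \sqrt{\hat\pi(1-\hat\pi)/n}$.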
Example (Binomial Probability)

$Y \sim \text{binomial}(n, \pi)$, $0 < \pi < 1$, with $n$ known and $\pi$ unknown. We can take
$$L(\pi) = \ln\big(\pi^y (1-\pi)^{n-y}\big) = y \ln \pi + (n-y)\ln(1-\pi),$$
so that
$$u(\pi) = \frac{\partial L}{\partial \pi} = \frac{y}{\pi} - \frac{n-y}{1-\pi} = \frac{y - n\pi}{\pi(1-\pi)}.$$
Note $E\big(u(\pi)\big) = 0$.

Solving $u(\pi) = 0$ gives the MLE
$$\hat\pi = \frac{y}{n} = \text{proportion of "successes"}$$
whenever $0 < y < n$. (We will also formally allow $y = 0$ and $y = n$, even though $\hat\pi = 0$ and $\hat\pi = 1$ are outside the parameter space.)

The information is
$$\iota(\pi) = -E\left(\frac{\partial^2 L}{\partial \pi^2}\right) = E\left(\frac{Y}{\pi^2} + \frac{n-Y}{(1-\pi)^2}\right) = \frac{n\pi}{\pi^2} + \frac{n(1-\pi)}{(1-\pi)^2} = \frac{n}{\pi} + \frac{n}{1-\pi} = \frac{n}{\pi(1-\pi)}.$$

Moreover,
$$E(\hat\pi) = \pi, \qquad \mathrm{var}(\hat\pi) = \mathrm{var}(Y/n) = \frac{n\pi(1-\pi)}{n^2} = \frac{\pi(1-\pi)}{n} = \big(\iota(\pi)\big)^{-1},$$
so the variance is exactly the inverse information in this case, though in general that is only approximately true. We write
$$\sigma(\hat\pi) = \sqrt{\pi(1-\pi)/n}.$$
By the LLN, $\hat\pi$ is consistent. By the CLT, $\hat\pi$ is asymptotically normal.

Note also that
$$\mathrm{var}\big(u(\pi)\big) = \mathrm{var}\left(\frac{Y - n\pi}{\pi(1-\pi)}\right) = \frac{n\pi(1-\pi)}{\pi^2(1-\pi)^2} = \frac{n}{\pi(1-\pi)} = \iota(\pi).$$
So $\mathrm{var}\big(u(\pi)\big) = \iota(\pi)$ exactly, as is generally true.
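The identities above can also be checked by simulation. The following sketch is my addition, with arbitrary values $n = 50$ and $\pi = 0.3$: it verifies numerically that $E(\hat\pi) = \pi$, that $\mathrm{var}(\hat\pi) = \pi(1-\pi)/n$ (the inverse information), and that the standardized estimator is roughly standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n, pi = 50, 0.3                         # made-up sample size and true parameter

# Draw many binomial samples and form the MLE pi_hat = Y/n for each.
y = rng.binomial(n, pi, size=100_000)
pi_hat = y / n

print(pi_hat.mean())                    # ~0.3: E(pi_hat) = pi
print(pi_hat.var(), pi * (1 - pi) / n)  # both ~0.0042: inverse information

# Standardized by the asymptotic SE, pi_hat is roughly N(0, 1) (CLT).
z = (pi_hat - pi) / np.sqrt(pi * (1 - pi) / n)
print(z.mean(), z.std())                # ~0 and ~1
```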
Likelihood Inference

Back to the general model with parameter $\beta$. How can we test
$$H_0: \beta = \beta_0 \quad \text{vs.} \quad H_a: \beta \ne \beta_0,$$
or form a confidence interval (CI) for $\beta$? There are three main likelihood-based approaches: Wald, likelihood ratio, and score.

Wald

The Wald statistic is
$$z_W = \frac{\hat\beta - \beta_0}{SE}, \qquad SE = \frac{1}{\sqrt{\iota(\hat\beta)}}.$$
(Note that the SE uses $\hat\beta$, not $\beta_0$.) Usually
$$z_W \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty \text{ under } H_0: \beta = \beta_0,$$
so reject if $|z_W| \ge z_{\alpha/2}$ for a two-sided level-$\alpha$ test. The Wald test also has a chi-squared form, using
$$z_W^2 = \frac{(\hat\beta - \beta_0)^2}{1/\iota(\hat\beta)} \overset{\cdot}{\sim} \chi^2_1 \quad \text{under } H_0.$$

Likelihood Ratio

Let $\Lambda = \ell(\beta_0)/\ell(\hat\beta)$, where $\ell(\beta_0)$ is the maximized value of the likelihood under $H_0$ and $\ell(\hat\beta)$ is the maximized value over the entire parameter space. The ratio $\Lambda$ cannot exceed 1. The likelihood-ratio test (LRT) chi-squared statistic is
$$-2\ln\Lambda = -2\ln\big(\ell(\beta_0)/\ell(\hat\beta)\big) = -2\big(L(\beta_0) - L(\hat\beta)\big).$$
It has an approximate $\chi^2_1$ distribution under $H_0: \beta = \beta_0$, and otherwise tends to be larger. Thus, reject $H_0$ if $-2\ln\Lambda \ge \chi^2_1(\alpha)$, the upper-$\alpha$ chi-squared quantile.

Score

The score statistic is
$$z_S = \frac{u(\beta_0)}{\sqrt{\iota(\beta_0)}}.$$
(This is the score standardized under $H_0$.) Under $H_0: \beta = \beta_0$, its distribution is approximately $N(0, 1)$; otherwise it tends to be further from zero. Thus, reject $H_0$ if $|z_S| \ge z_{\alpha/2}$. There is also a chi-squared form:
$$z_S^2 = \frac{u(\beta_0)^2}{\iota(\beta_0)} \overset{\cdot}{\sim} \chi^2_1 \quad \text{under } H_0.$$

All three kinds tend to be "asymptotically equivalent" as $n \to \infty$. For smaller $n$, the likelihood-ratio and score methods are preferred.
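To make the three procedures concrete, here is a sketch (my addition) computing all three chi-squared statistics for the binomial example, using the same hypothetical data $y = 13$, $n = 20$ and a null value $\pi_0 = 0.5$; the closed-form score and information from the example above stand in for generic derivatives.

```python
import numpy as np
from scipy.stats import chi2

y, n, p0 = 13, 20, 0.5      # hypothetical data and null value pi_0
p_hat = y / n

def L(p):
    # Binomial log-likelihood kernel.
    return y * np.log(p) + (n - y) * np.log(1 - p)

# Wald: (p_hat - p0)^2 times the information evaluated at p_hat.
wald = (p_hat - p0) ** 2 * n / (p_hat * (1 - p_hat))

# Likelihood ratio: -2 ln Lambda = -2 (L(p0) - L(p_hat)).
lrt = -2 * (L(p0) - L(p_hat))

# Score: u(p0)^2 divided by the information evaluated at p0 (the null value).
u0 = (y - n * p0) / (p0 * (1 - p0))
score = u0 ** 2 * (p0 * (1 - p0)) / n

for name, stat in [("Wald", wald), ("LRT", lrt), ("Score", score)]:
    print(name, round(stat, 3), round(chi2.sf(stat, df=1), 3))
```

For these made-up data the three statistics come out to roughly 1.98, 1.83, and 1.80, with P-values near 0.16 to 0.18: close agreement, as the asymptotic equivalence suggests.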

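Since the slides also mention forming a confidence interval, one brief sketch of that, again my addition with the same hypothetical data: inverting the two-sided Wald test (collecting all $\pi_0$ it does not reject) gives the familiar Wald interval $\hat\pi \pm z_{\alpha/2}\sqrt{\hat\pi(1-\hat\pi)/n}$.

```python
import numpy as np
from scipy.stats import norm

y, n = 13, 20                           # hypothetical data, as above
alpha = 0.05

p_hat = y / n
se = np.sqrt(p_hat * (1 - p_hat) / n)   # SE = 1/sqrt(information at p_hat)
z = norm.ppf(1 - alpha / 2)             # z_{alpha/2}, about 1.96

# 95% Wald CI: all null values not rejected by the two-sided Wald test.
print(p_hat - z * se, p_hat + z * se)   # roughly (0.44, 0.86)
```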
