MATH1312 Week 10 – Generalised Linear Models

(Source: Linear Regression Analysis, 5th edition, Montgomery, Peck & Vining.)

Impact of Autocorrelation

1. The OLS coefficient estimates remain unbiased, but they are no longer minimum variance.
2. If the errors are positively autocorrelated, OLS underestimates sigma^2:
   - standard errors will be too small,
   - confidence intervals will be too short,
   - hypothesis tests will be misleading.
3. The confidence intervals, prediction intervals, and tests based on the t and F distributions are no longer exact procedures.

Dealing with Autocorrelation

1. Find and include the missing variable(s).
2. Use weighted (generalised) least squares.
3. Use a model that incorporates the autocorrelation and estimate the parameters of that model.

Generalized Linear Models

Traditional applications of linear models, such as design of experiments and multiple linear regression, assume that the response variable is
- normally distributed,
- of constant variance, and
- independent.

There are many situations where these assumptions are inappropriate:
- the response is binary (0, 1), or a count;
- the response is continuous, but non-normal.

Some Approaches to These Problems

- Data transformation: induce approximate normality, stabilise the variance, simplify the model form.
- Weighted least squares: often used to stabilise the variance.
- Generalized linear models (GLMs): unify linear and nonlinear regression models; the response distribution is a member of the exponential family (normal, exponential, gamma, binomial, Poisson).

Generalized Linear Models (continued)

- The original applications were in the biopharmaceutical sciences; there is a lot of recent interest in GLMs in industrial statistics.
- GLMs are simple models that include linear regression and OLS as a special case.
- Parameter estimation is by maximum likelihood (the response distribution is assumed known).
- Inference on the parameters is based on large-sample (asymptotic) theory.
- We will consider logistic regression, Poisson regression, and then the GLM.

Binary Response Variables

- The outcome (or response, or endpoint) values 0 and 1 can represent "success" and "failure".
- Binary responses occur often in the biopharmaceutical field: dose-response studies, bioassays, clinical trials.
- Industrial applications include failure analysis, fatigue testing, and reliability testing. For example, functional electrical testing of a semiconductor device can yield "success" (the device works) or "failure" (due to a short, an open, or some other failure mode).

Binary Response Variables: A Possible Model

The response $y_i$ is a Bernoulli random variable:

$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \quad y_i = 0 \text{ or } 1,$

$P(y_i = 1) = \pi_i \ \text{with} \ 0 \le \pi_i \le 1, \qquad P(y_i = 0) = 1 - \pi_i,$

$E(y_i) = \mu_i = \pi_i = \mathbf{x}_i'\boldsymbol{\beta}, \qquad \operatorname{Var}(y_i) = \sigma_{y_i}^2 = \pi_i(1 - \pi_i).$

Problems With This Model

- The error terms take on only two values, so they cannot be normally distributed.
- The variance of the observations is a function of the mean (see above).
- A linear response function could produce predicted values outside the 0-1 range, which is impossible because $0 \le E(y_i) = \mu_i = \pi_i = \mathbf{x}_i'\boldsymbol{\beta} \le 1$.

Binary Response Variables – The Challenger Data

[Figure: scatter plot of O-ring failure (0/1) against launch temperature, roughly 50-80 deg F.]

Data for space shuttle launches and static tests prior to the launch of Challenger:

Temperature    At least one       Temperature    At least one
at launch      O-ring failure     at launch      O-ring failure
53             1                  70             1
56             1                  70             1
57             1                  72             0
63             0                  73             0
66             0                  75             0
67             0                  75             1
67             0                  76             0
67             0                  76             0
68             0                  78             0
69             0                  79             0
70             0                  80             0
70             1                  81             0

Binary Response Variables (continued)

There is a lot of empirical evidence that the response function should be nonlinear; an "S" shape is quite logical. The logistic response function is a common choice:

$E(y) = \frac{\exp(\mathbf{x}'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}'\boldsymbol{\beta})} = \frac{1}{1 + \exp(-\mathbf{x}'\boldsymbol{\beta})}.$
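To make these points concrete, here is a minimal R sketch (not part of the original slides) that enters the Challenger data from the table above and overlays a straight-line fit with an S-shaped logistic curve. The object and column names (challenger, temp, fail) are arbitrary choices, and the logistic curve uses the coefficient values reported in the Minitab output further below.

# Challenger data from the table above (24 launches/static tests).
challenger <- data.frame(
  temp = c(53, 56, 57, 63, 66, 67, 67, 67, 68, 69, 70, 70,
           70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81),
  fail = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
           1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
)

plot(fail ~ temp, data = challenger, xlim = c(30, 85),
     xlab = "Temperature at launch (deg F)",
     ylab = "At least one O-ring failure")

# A straight-line (linear probability) fit can leave the [0, 1] range ...
abline(lm(fail ~ temp, data = challenger), lty = 2)

# ... whereas the logistic response exp(x'b)/(1 + exp(x'b)) = plogis(x'b)
# always stays between 0 and 1 (coefficients taken from the fit reported below).
curve(plogis(10.875 - 0.17132 * x), add = TRUE)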
The Logistic Response Function

The logistic response function can be easily linearised. Let $\pi = E(y)$ and let $\eta = \mathbf{x}'\boldsymbol{\beta}$ be the linear predictor. Define

$\eta = \ln\!\left(\frac{\pi}{1 - \pi}\right).$

This is called the logit transformation.

Logistic Regression Model

Model: $y_i = E(y_i) + \varepsilon_i$, where

$E(y_i) = \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}.$

The model parameters are estimated by the method of maximum likelihood (MLE).

A Logistic Regression Model for the Challenger Data (Using Minitab)

Binary Logistic Regression: O-Ring Fail versus Temperature

Link Function: Logit

Response Information
Variable   Value   Count
O-Ring F   1           7   (Event)
           0          17
           Total      24

Logistic Regression Table
                                                Odds     95% CI
Predictor      Coef   SE Coef      Z      P    Ratio   Lower   Upper
Constant     10.875     5.703   1.91  0.057
Temperat   -0.17132   0.08344  -2.05  0.040     0.84    0.72    0.99

Log-Likelihood = -11.515

A Logistic Regression Model for the Challenger Data (continued)

Test that all slopes are zero: G = 5.944, DF = 1, P-Value = 0.015

Goodness-of-Fit Tests
Method            Chi-Square   DF      P
Pearson               14.049   15  0.522
Deviance              15.759   15  0.398
Hosmer-Lemeshow       11.834    8  0.159

The fitted function is

$\hat{y} = \frac{\exp(10.875 - 0.17132\,x)}{1 + \exp(10.875 - 0.17132\,x)}.$

Note that the fitted function has been extended down to 31 deg F, the temperature at which Challenger was launched. (An R sketch reproducing this fit with glm() appears after the parameter-interpretation slides below.)

Maximum Likelihood Estimation in Logistic Regression

The distribution of each observation $y_i$ is

$f_i(y_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}, \quad i = 1, 2, \ldots, n.$

The likelihood function is

$L(\mathbf{y}, \boldsymbol{\beta}) = \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}.$

We usually work with the log-likelihood:

$\ln L(\mathbf{y}, \boldsymbol{\beta}) = \sum_{i=1}^{n} \ln f_i(y_i) = \sum_{i=1}^{n} y_i \ln\!\left(\frac{\pi_i}{1 - \pi_i}\right) + \sum_{i=1}^{n} \ln(1 - \pi_i).$

Maximum Likelihood Estimation in Logistic Regression (continued)

The maximum likelihood estimators (MLEs) of the model parameters are those values that maximise the likelihood (or log-likelihood) function. Maximum likelihood often gives estimators that are intuitively pleasing, and MLEs have nice properties:
- unbiased (for large samples),
- minimum variance (or nearly so),
- an approximate normal distribution when n is large.

Maximum Likelihood Estimation in Logistic Regression (continued)

If we have $n_i$ trials at each observation, we can write the log-likelihood as

$\ln L(\mathbf{y}, \boldsymbol{\beta}) = \mathbf{y}'\mathbf{X}\boldsymbol{\beta} - \sum_{i=1}^{n} n_i \ln\!\left[1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})\right].$

The derivative of the log-likelihood is

$\frac{\partial \ln L(\mathbf{y}, \boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \mathbf{X}'\mathbf{y} - \sum_{i=1}^{n} n_i \left[\frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}\right] \mathbf{x}_i = \mathbf{X}'\mathbf{y} - \sum_{i=1}^{n} n_i \pi_i \mathbf{x}_i = \mathbf{X}'\mathbf{y} - \mathbf{X}'\boldsymbol{\mu}$

(because $\mu_i = n_i \pi_i$).

Maximum Likelihood Estimation in Logistic Regression (continued)

Setting this last result to zero gives the maximum likelihood score equations

$\mathbf{X}'(\mathbf{y} - \boldsymbol{\mu}) = \mathbf{0}.$

These equations look easy to solve; we have actually seen them before in linear regression. Since $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and $\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$,

$\mathbf{X}'(\mathbf{y} - \boldsymbol{\mu}) = \mathbf{X}'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{0}, \quad \text{so} \quad \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y} \quad \text{and} \quad \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y},$

which is the result from OLS or maximum likelihood with normal errors (the normal-theory MLE).

Maximum Likelihood Estimation in Logistic Regression (continued)

Solving the ML score equations in logistic regression is not quite as easy, because logistic regression is a nonlinear model. The solution is based on iteratively reweighted least squares, or IRLS (see Appendix for details).
- An iterative procedure is necessary because the parameter estimates must be updated from an initial "guess" through several steps.
- Weights are necessary because the variance of the observations is not constant.
- The weights are functions of the unknown parameters:

$\mu_i = n_i \pi_i = \frac{n_i}{1 + \exp(-\mathbf{x}_i'\boldsymbol{\beta})}, \quad i = 1, 2, \ldots, n.$

Interpretation of the Parameters in Logistic Regression

The log-odds at $x$ is

$\hat{\eta}(x) = \ln\!\left(\frac{\hat{\pi}(x)}{1 - \hat{\pi}(x)}\right) = \hat{\beta}_0 + \hat{\beta}_1 x.$

The log-odds at $x + 1$ is

$\hat{\eta}(x + 1) = \hat{\beta}_0 + \hat{\beta}_1 (x + 1).$

The difference in the log-odds is

$\hat{\eta}(x + 1) - \hat{\eta}(x) = \hat{\beta}_1.$
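The Minitab fit above can be reproduced with R's glm(), which carries out the IRLS algorithm just described. This is a rough sketch rather than part of the course materials; it assumes the challenger data frame from the earlier sketch, and small numerical differences from the Minitab output are possible.

# Maximum likelihood fit of the logistic regression (glm uses IRLS internally),
# assuming the 'challenger' data frame defined earlier.
fit <- glm(fail ~ temp, family = binomial(link = "logit"), data = challenger)

summary(fit)   # coefficients approx. 10.875 and -0.17132, as in the Minitab table
logLik(fit)    # approx. -11.515

# Odds ratio for a one-degree increase in temperature: exp(beta1_hat), approx. 0.84.
exp(coef(fit)["temp"])

# Estimated failure probability at 31 deg F, the Challenger launch temperature
# (a large extrapolation below the observed 53-81 deg F range).
predict(fit, newdata = data.frame(temp = 31), type = "response")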
Interpretation of the Parameters in Logistic Regression (continued)

The odds ratio is found by taking antilogs:

$\hat{O}_R = \frac{\text{odds}_{x+1}}{\text{odds}_x} = e^{\hat{\beta}_1}.$

The odds ratio is interpreted as the estimated multiplicative change in the odds of "success" associated with a one-unit increase in the value of the predictor variable.

Odds Ratio for the Challenger Data

$\hat{O}_R = e^{-0.17132} = 0.84.$

This implies that every decrease of one degree in temperature increases the odds of O-ring failure by about 1/0.84 = 1.19, or 19 percent. The temperature at the Challenger launch was 22 degrees below the lowest observed launch temperature, so now

$\hat{O}_R = e^{22(-0.17132)} = 0.0231.$

This corresponds to an increase in the odds of failure of 1/0.0231 = 43.34, or about 4200 percent! There is a big extrapolation here, but if you had known this prior to launch, what decision would you have made?

Inference on the Model Parameters

The likelihood ratio test compares a "full" model to a "reduced" model, analogous to the "extra sums of squares" method:

$LR = 2 \ln\!\left[\frac{L(FM)}{L(RM)}\right] = 2\left[\ln L(FM) - \ln L(RM)\right].$

For large samples, when the reduced model is correct, LR follows a chi-square distribution with $p_{FM} - p_{RM}$ degrees of freedom. When LR exceeds $\chi^2_{\alpha,\,p_{FM} - p_{RM}}$, reject the claim that the reduced model is appropriate.

Inference on the Model Parameters (continued)

To test for significance of regression, compare the full model to the reduced model with a constant probability of success, i.e. the logistic regression model with no regressors. The ML estimate of that constant probability is $\hat{\pi} = y/n$, where $y = \sum_{i=1}^{n} y_i$, which gives the maximum value of the likelihood for the reduced model,

$L(RM) = \left(\frac{y}{n}\right)^{y} \left(1 - \frac{y}{n}\right)^{n - y}.$

The test statistic is

$G = 2\left[\ln L(FM) - \ln L(RM)\right].$

Minitab calls this statistic "G". (An R sketch of this calculation follows the Wald-inference slide below.)

Goodness of Fit

Goodness of fit is also assessed using a likelihood procedure, which compares the current model to a saturated model in which each observation (or group of observations) has its own parameter. The deviance is

$D = 2 \ln\!\left[\frac{L(\text{saturated model})}{L(\hat{\boldsymbol{\beta}})}\right].$

When the fit is adequate and the sample size is large, D follows a chi-square distribution with $n - p$ degrees of freedom. A large p-value indicates that the fit is satisfactory, and $D/(n - p)$ should be close to 1.

Pearson Chi-Square Goodness of Fit

The Pearson chi-square statistic compares the observed and expected probabilities of success and failure at each group of observations ($n - p$ degrees of freedom). A large p-value indicates that the fit is satisfactory.

Hosmer-Lemeshow Goodness of Fit

The Hosmer-Lemeshow test is used when there are no replicates on the regressor variables. Observations are grouped on the basis of their estimated probabilities, usually into g = 10 groups, and observed frequencies are compared with expected frequencies; the statistic has $g - 2$ degrees of freedom. A large p-value indicates that the fit is satisfactory, and HL/df should be close to 1.

Likelihood Inference on the Model Parameters

The deviance can also be used to test hypotheses about subsets of the model parameters (analogous to the extra-sum-of-squares method). Procedure:

- The full model is $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2$, where the full model has $p$ parameters and $\boldsymbol{\beta}_2$ has $r$ parameters. This full model has deviance $\lambda(\boldsymbol{\beta})$.
- The hypotheses are $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ versus $H_1: \boldsymbol{\beta}_2 \neq \mathbf{0}$.
- The reduced model is $\boldsymbol{\eta} = \mathbf{X}_1\boldsymbol{\beta}_1$, with deviance $\lambda(\boldsymbol{\beta}_1)$.
- The difference in deviance between the full and reduced models is $\lambda(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) = \lambda(\boldsymbol{\beta}_1) - \lambda(\boldsymbol{\beta})$, with $r$ degrees of freedom.
- $\lambda(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)$ has a chi-square distribution under $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$; large values of $\lambda(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)$ imply that $H_0$ should be rejected.

Inference on the Model Parameters (continued)

Tests on individual model coefficients can also be done using Wald inference. This uses the result that the MLEs have an approximate normal distribution, so the distribution of

$Z_0 = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)}$

is standard normal if the true value of the parameter is zero.
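To illustrate the likelihood-ratio ideas above, the G statistic and the nested-model deviance test can be computed from the fitted glm object. This is a hedged sketch (not from the slides) that assumes the fit object created in the earlier sketch.

# Assuming 'fit' from the earlier sketch.

# G compares the fitted model with the intercept-only (constant-probability) model:
# G = 2[ln L(FM) - ln L(RM)] = null deviance - residual deviance.
G <- fit$null.deviance - fit$deviance
G                                        # approx. 5.944, matching Minitab's G
pchisq(G, df = 1, lower.tail = FALSE)    # approx. 0.015

# The same comparison, expressed as a difference-in-deviance test between nested models.
fit0 <- update(fit, . ~ 1)               # reduced model with no regressors
anova(fit0, fit, test = "Chisq")

# Residual deviance and its degrees of freedom (the deviance goodness-of-fit quantities).
c(deviance = fit$deviance, df = fit$df.residual)

Note that for the ungrouped 0/1 responses used here the residual deviance is about 23.0 on 22 degrees of freedom; the deviance of 15.759 on 15 degrees of freedom in the Minitab output appears to be computed after grouping observations with the same temperature (17 distinct covariate patterns, giving 17 - 2 = 15 degrees of freedom).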
Some computer programs report the square of $Z_0$ (which is chi-square), and others calculate the P-value using the t distribution.

Another Logistic Regression Example: The Pneumoconiosis Data

A 1959 article in Biometrics reported the data. [The data table and the fitted model shown on the original slides are not reproduced in this extract.]

Confidence Intervals

For the parameter $\beta_j$:

$\hat{\beta}_j - z_{\alpha/2}\operatorname{se}(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + z_{\alpha/2}\operatorname{se}(\hat{\beta}_j).$

For the odds ratio:

$\exp\!\left[\hat{\beta}_j - z_{\alpha/2}\operatorname{se}(\hat{\beta}_j)\right] \le O_R \le \exp\!\left[\hat{\beta}_j + z_{\alpha/2}\operatorname{se}(\hat{\beta}_j)\right].$

Confidence intervals can also be constructed for the linear predictor $\mathbf{x}_0'\boldsymbol{\beta}$ and for the probability of success at $\mathbf{x}_0$; the latter is obtained by transforming the interval on the linear predictor through the logistic function. (An R sketch of these calculations follows the reading list below.)

Lab Questions – Week 10

Reading: Chapter 13, Sections 13.1 – 13.2.4 of Montgomery et al., Linear Regression Analysis.
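As a starting point for the lab questions, here is a hedged R sketch of the Wald-type intervals described in the Confidence Intervals slide above, again assuming the fit object from the earlier sketches. These are Wald intervals, matching the formulas above; profile-likelihood intervals are a common alternative.

# Assuming 'fit' from the earlier sketches.
z   <- qnorm(0.975)                      # z_{alpha/2} for a 95% interval
est <- coef(summary(fit))["temp", "Estimate"]
se  <- coef(summary(fit))["temp", "Std. Error"]

# 95% Wald CI for beta_1, and for the odds ratio exp(beta_1).
beta_ci <- est + c(-1, 1) * z * se
beta_ci
exp(beta_ci)                             # roughly (0.72, 0.99), as in the Minitab table

# CI for the probability of failure at x0 = 31 deg F: form the interval on the
# linear predictor (logit scale), then transform it through the logistic function.
lp <- predict(fit, newdata = data.frame(temp = 31), type = "link", se.fit = TRUE)
plogis(lp$fit + c(-1, 1) * z * lp$se.fit)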