程序代写案例-STAT 505
时间:2022-04-06
Topic 3 Part 1 - Model Selection
STAT 505 - Time Series Analysis
Lecture: TR 13:30-14:45
[1]
Order Selection for an ARMA Models
Using ACF and PACF to identify the order of AR(p), MA(q) models.
Characteristics of ACF and PACF
Process ACF PACF
AR(p) Tails off as exponential decay or
damped sine wave
Cuts off after lag p
MA(q) Cuts off after lag q Tails off as exponential
decay of damped sine wave
ARMA(p,q) Tails off Tails off
[2]
> par(mfrow=c(3,2))
> ACF = ARMAacf(ar=-0.7,lag.max=24)
> PACF = ARMAacf(ar=-0.7, lag.max=24,pacf=TRUE)
> plot(ACF, type="h", xlab="lag", ylim=c(-1,1),main="True ACF for ARMA(1,0)");
> abline(h=0)
> plot(PACF, type="h", xlab="lag", ylim=c(-1,1),main="True PACF for ARMA(1,0)");
> abline(h=0)
> ACF = ARMAacf(ma=-0.7,lag.max=24)
> PACF = ARMAacf(ma=-0.7, lag.max=24,pacf=TRUE)
> plot(ACF, type="h", xlab="lag", ylim=c(-1,1),main="True ACF for ARMA(0,1)");
> abline(h=0)
> plot(PACF, type="h", xlab="lag", ylim=c(-1,1),main="True PACF for ARMA(0,1)");
> abline(h=0)
> ACF = ARMAacf(ar=-0.7,ma=-0.5,lag.max=24)
> PACF = ARMAacf(ar=-0.7,ma=-0.7, lag.max=24,pacf=TRUE)
> plot(ACF, type="h", xlab="lag", ylim=c(-1,1),main="True ACF for ARMA(1,1)");
> abline(h=0)
> plot(PACF, type="h", xlab="lag", ylim=c(-1,1),main="True PACF for ARMA(1,1)");
> abline(h=0)
[3]
5 10 15 20 25

1.
0
0.
0
1.
0
True ACF for ARMA(1,0)
lag
AC
F
5 10 15 20

1.
0
0.
0
1.
0
True PACF for ARMA(1,0)
lag
PA
CF
5 10 15 20 25

1.
0
0.
0
1.
0
True ACF for ARMA(0,1)
lag
AC
F
5 10 15 20

1.
0
0.
0
1.
0
True PACF for ARMA(0,1)
lag
PA
CF
5 10 15 20 25

1.
0
0.
0
1.
0
True ACF for ARMA(1,1)
lag
AC
F
5 10 15 20

1.
0
0.
0
1.
0
True PACF for ARMA(1,1)
lag
PA
CF
[4]
[5]
There are commonly two approaches: Try-and-error and information criteria
1. Try-and-error to select ARMA models
• For the order selection of ARMA models, we may use try-and-error to find the order of p and
q. For any pair of (p, q), check the residuals to see whether the ACF and PACF display the
patterns for a AR or MA model. Then continue the fitting process until the residual is a white noise.
• According to the large-sample result, if the residuals Zt ( obtained from the modeling of a time
series {Xt} ) is a iid sequence, then the components of ACF ρk and PACF φˆmm of {Zt}
follows approximately normal(0, 1/n) distribution. That is for k > 0, φˆkk will fall between the
bounds ±1.96n−1/2 with probability close to 0.95. This suggests that φˆkk < 1.96n−1/2 for
all k > 0, as a preliminary estimator of p and q.
• R-code example for a data set, say W1, we fit ARMA(2,1) model:
library(stats)
w1.fit <- arima(data, order=c(2, 0, 1))
#The output of this code provides the log-likelihood and AIC value.
[6]
# You can also compute BIC values.
data_res <- w1.fit$residual
plot(data_res)
acf(data_res)
pacf(data_res)
tsdiag(data_res) # We may use this to analyze model residual
Box.test(data_res, lag = 20, type=`Ljung')
2. Using information criteria: A more systematic approach to order selection is to find the values
of p and q that minimize the certain information criteria such as AIC or BIC.
(a) AIC (or AICc) and BIC
AIC = 2k − 2(log likelihood)
where k = p+ q + 1 is the number of parameters.
AICc(l) = 2(p+ q + 1)n/(n− p− q − 2)− 2(log likelihood)
BIC(l) = k ln(n)− 2(log likelihood)
where n is the number of observations.
(b) To apply the AIC (or AICc) and BIC criteria, we choose the value of p and q to minimize AIC or
BIC.
[7]
(c) AIC and BIC are usually given in the model output.
> Dirdata="~/OneDrive - University of Calgary//MyCoursesThierry/STAT505/Winter2019/DataandITSM/ITSM2000-2/"
> sunspot=scan(paste(Dirdata,"sunspots.tsm",sep=""))
> sunspot.ts=ts(data=sunspot, frequency = 1,start = 1770,end = 1869)
> par(mfrow=c(1,3))
> plot.ts(sunspot.ts,lwd=1.5)
> acf(sunspot.ts,lag.max = 50,lwd=1.5)
> pacf(sunspot.ts,lag.max = 50,lwd=1.5)
> AIC1=BIC1=matrix(0,4,4)
> for (i in 1:4){ for (j in 1:4){
+ fit=arima(sunspot.ts, order = c(i-1, 0,j-1))
+ AIC1[i,j]=AIC(fit);
+ BIC1[i,j]=BIC(fit); }}
> AIC1
[,1] [,2] [,3] [,4]
[1,] 1010.9298 906.9696 852.8261 843.4655
[2,] 903.8076 848.8172 838.9291 839.7595
[3,] 837.2348 833.0532 834.6754 836.6731
[4,] 834.9896 834.7797 836.6897 838.4875
> BIC1
[,1] [,2] [,3] [,4]
[1,] 1016.1401 914.7851 863.2468 856.4913
[8]
[2,] 911.6231 859.2379 851.9549 855.3905
[3,] 847.6555 846.0791 850.3064 854.9093
[4,] 848.0155 850.4107 854.9259 859.3288
> idxminA=which(AIC1 == min(AIC1), arr.ind = TRUE)
> idxminA
row col
[1,] 3 2
> idxminB=which(BIC1 == min(BIC1), arr.ind = TRUE)
> idxminB
row col
[1,] 3 2
Time
su
n
sp
ot
.ts
1780 1800 1820 1840 1860
0
50
10
0
15
0
0 10 20 30 40 50

0.
2
0.
0
0.
2
0.
4
0.
6
0.
8
1.
0
Lag
AC
F
Series sunspot.ts
0 10 20 30 40 50

0.
5
0.
0
0.
5
Lag
Pa
rti
al
A
CF
Series sunspot.ts
> fit=arima(sunspot.ts, order = c(idxminA[1]-1, 0,idxminA[2]-1))
> library(lmtest) # install.packages("lmtest")
[9]
> coeftest(fit)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 1.22482 0.11310 10.8291 < 2.2e-16 ***
ar2 -0.56008 0.10830 -5.1716 2.321e-07 ***
ma1 0.38467 0.13342 2.8832 0.003937 **
intercept 48.46201 6.01870 8.0519 8.152e-16 ***
---
Signif. codes: 0
> tsdiag(fit)
[10]
Standardized Residuals
Time
1780 1800 1820 1840 1860

2
0
2
0 5 10 15 20

0.
2
0.
4
1.
0
Lag
AC
F
ACF of Residuals
2 4 6 8 10
0.
0
0.
4
0.
8
p values for Ljung−Box statistic
lag
p
va
lu
e
[11]
You may also use R code: auto.arima (from the r package forecast) function for modeling.
For seasonal models, we will discuss later on. You can still use the same function for models with
seasonality.
> library(forecast)
> auto.arima(sunspot.ts)
Series: sunspot.ts
ARIMA(2,0,1) with non-zero mean
Coefficients:
ar1 ar2 ma1 mean
1.2248 -0.5601 0.3847 48.4620
s.e. 0.1131 0.1083 0.1334 6.0187
sigma^2 estimated as 222.7: log likelihood=-411.53
AIC=833.05 AICc=833.69 BIC=846.08
[12]
Case Study: Analysis of the monthly accidental deaths
(Modified from Example 6.5.4)
> Dirdata="~/OneDrive - University of Calgary/MyCoursesThierry/STAT505/Winter2019/DataandITSM/ITSM2000-2/"
> death=scan(paste(Dirdata,"deaths.tsm",sep=""))
> death.ts=ts(data=death, frequency = 12,start = c(1973,1),end = c(1978,12))
> par(mfrow=c(3,1))
> plot.ts(death.ts,main="The monthly accidental deaths data, 1973-1978"
+ ,ylab="(thousands)",lwd=1.5)
> acf(death.ts,lag.max = 48,xlab="Year",lwd=1.5)
> pacf(death.ts,lag.max = 48,xlab="Year",lwd=1.5)
[13]
The monthly accidental deaths data, 1973−1978
Time
(th
ou
sa
nd
s)
1973 1974 1975 1976 1977 1978 1979
70
00
10
00
0
0 1 2 3 4

0.
4
0.
2
0.
8
Year
AC
F
Series death.ts
0 1 2 3 4

0.
4
0.
2
Year
Pa
rti
al
A
CF
Series death.ts
[14]
> NoSeaso.death=diff(death.ts,lag=12)
> # We use the operator 1-B^{12} to remove the seasonality
> par(mfrow=c(3,1))
> plot.ts(NoSeaso.death,main="The differenced series (1-B^{12})X_t"
+ ,ylab="(thousands)",lwd=1.5)
> acf(NoSeaso.death,lag.max = 60,lwd=1.5)
> pacf(NoSeaso.death,lag.max = 60,lwd=1.5)
> #The seasonal component evident previously is absent from the graph,
> # However there appears to be a nondecreasing trend. It can be removed
> # by applying the operator 1-B
[15]
The differenced series (1−B^{12})X_t
Time
(th
ou
sa
nd
s)
1974 1975 1976 1977 1978 1979

10
00
0
0 1 2 3 4 5

0.
2
0.
4
1.
0
Lag
AC
F
Series NoSeaso.death
0 1 2 3 4 5

0.
2
0.
2
0.
6
Lag
Pa
rti
al
A
CF
Series NoSeaso.death
[16]
> NoTrend.Seaso.death=diff(NoSeaso.death,lag=1)
> par(mfrow=c(3,1))
> plot.ts(NoTrend.Seaso.death,main="The differenced series (1-B)(1-B^{12})X_t"
+ ,ylab="(thousands)",lwd=1.5)
> acf(NoTrend.Seaso.death,lag.max = 48,xlab="Year",lwd=1.5)
> pacf(NoTrend.Seaso.death,lag.max = 48,xlab="Year",lwd=1.5)
[17]
The differenced series (1−B)(1−B^{12})X_t
Time
(th
ou
sa
nd
s)
1974 1975 1976 1977 1978 1979

50
0
50
0
0 1 2 3 4

0.
4
0.
2
0.
8
Year
AC
F
Series NoTrend.Seaso.death
0 1 2 3 4

0.
3
0.
0
Year
Pa
rti
al
A
CF
Series NoTrend.Seaso.death
[18]
> fit1=arima(NoTrend.Seaso.death,order=c(0,0,1),
+ seasonal = list(order=c(0,0,1),period=12))
> library(lmtest)
> coeftest(fit1)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ma1 -0.49626 0.13512 -3.6726 0.0002401 ***
sma1 -0.61460 0.19324 -3.1805 0.0014700 **
intercept 21.02204 11.99207 1.7530 0.0796029 .
---
Signif. codes: 0
> tsdiag(fit1,lwd=1.5)
> print(AIC(fit1))
[1] 856.5324
> print(BIC(fit1))
[1] 864.8426
[19]
Standardized Residuals
Time
1974 1975 1976 1977 1978 1979

2
0
1
2
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4

0.
2
0.
4
1.
0
Lag
AC
F
ACF of Residuals
2 4 6 8 10
0.
0
0.
4
0.
8
p values for Ljung−Box statistic
lag
p
va
lu
e
[20]
> fit2=arima(NoTrend.Seaso.death,order=c(1,0,1),
+ seasonal = list(order=c(1,0,1),period=12))
> coeftest(fit2)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 0.447582 0.120267 3.7216 0.000198 ***
ma1 -0.999911 0.074372 -13.4448 < 2.2e-16 ***
sar1 0.232117 0.161734 1.4352 0.151235
sma1 -0.998135 0.695532 -1.4351 0.151268
intercept 20.750229 3.430254 6.0492 1.456e-09 ***
---
Signif. codes: 0
> tsdiag(fit2,lwd=1.5)
> print(AIC(fit2))
[1] 855.1901
> print(BIC(fit2))
[1] 867.6553
[21]
Standardized Residuals
Time
1974 1975 1976 1977 1978 1979

1
1
2
3
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4

0.
2
0.
4
1.
0
Lag
AC
F
ACF of Residuals
2 4 6 8 10
0.
0
0.
4
0.
8
p values for Ljung−Box statistic
lag
p
va
lu
e
[22]
> fit3=arima(NoTrend.Seaso.death,order=c(1,0,1),
+ seasonal = list(order=c(1,0,0),period=12))
> coeftest(fit3)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 0.431752 0.122144 3.5348 0.0004081 ***
ma1 -0.999997 0.055902 -17.8883 < 2.2e-16 ***
sar1 -0.399054 0.121840 -3.2752 0.0010557 **
intercept 20.611825 3.306490 6.2337 4.554e-10 ***
---
Signif. codes: 0
> tsdiag(fit3,lwd=1.5)
> print(AIC(fit3))
[1] 857.9437
> print(BIC(fit3))
[1] 868.3314
[23]
Standardized Residuals
Time
1974 1975 1976 1977 1978 1979

1
1
2
3
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4

0.
2
0.
4
1.
0
Lag
AC
F
ACF of Residuals
2 4 6 8 10
0.
0
0.
4
0.
8
p values for Ljung−Box statistic
lag
p
va
lu
e
[24]
> fit4=arima(NoTrend.Seaso.death,order=c(0,0,1),
+ seasonal = list(order=c(1,0,0),period=12))
> coeftest(fit4)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ma1 -0.51869 0.13490 -3.8450 0.0001205 ***
sar1 -0.35165 0.12671 -2.7753 0.0055155 **
intercept 22.46605 16.43886 1.3666 0.1717374
---
Signif. codes: 0
> tsdiag(fit4,lwd=1.5)
> print(AIC(fit4))
[1] 860.6085
> print(BIC(fit4))
[1] 868.9187
[25]
Standardized Residuals
Time
1974 1975 1976 1977 1978 1979

2
0
1
2
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4

0.
2
0.
4
1.
0
Lag
AC
F
ACF of Residuals
2 4 6 8 10
0.
0
0.
4
0.
8
p values for Ljung−Box statistic
lag
p
va
lu
e

essay、essay代写