/* This program estimates the Deterministic Trend / Deterministic Seasonal
model for the Plano Sales Tax Revenue data set. Proc Autoreg is used because
the errors of the regression are autocorrelated, and statistical inference
on any of the coefficients cannot be conducted without proper adjustment for
this autocorrelation. The program also tests for the presence of seasonality
while adjusting for autocorrelation in the errors, and it shows that a quadratic
term is not needed to model the trend in the data. Obviously, we are using an
outdated model here, because the trend in the data is more likely to be
stochastic than deterministic. */
data Plano;
input month $ 1-3 yr 4-5 rev;
title 'Plano Sales Tax Revenue Data';
title2 'By Month';
datalines;
Feb90 2068592
Mar90 867387
Apr90 791878
May90 1731316
Jun90 911839
Jul90 909258
Aug90 1826999
Sep90 964868
Oct90 1020941
Nov90 1881435
Dec90 1075607
Jan91 964977
Feb91 2699324
Mar91 884494
Apr91 1035007
May91 1930143
Jun91 1124814
Jul91 1098136
Aug91 1812798
Sep91 1095294
Oct91 1163039
Nov91 1920424
Dec91 1000743
Jan92 1075763
Feb92 2341127
Mar92 1062449
Apr92 1120898
May92 1939866
Jun92 1316907
Jul92 1284888
Aug92 2098891
Sep92 1375423
Oct92 1201251
Nov92 2165295
Dec92 1301110
Jan93 1251165
Feb93 2986796
Mar93 1271028
Apr93 1228055
May93 2349629
Jun93 1385267
Jul93 1537452
Aug93 2576586
Sep93 1642938
Oct93 1577049
Nov93 2765401
Dec93 1940847
Jan94 1640531
Feb94 3271545
Mar94 1383909
Apr94 1495825
May94 2772734
Jun94 1592051
Jul94 1560732
Aug94 2773904
Sep94 1523255
Oct94 2013622
Nov94 2957306
Dec94 1789103
Jan95 1848972
Feb95 3507801
Mar95 1821378
Apr95 1930585
May95 2823010
Jun95 1970356
Jul95 1970534
Aug95 2982305
Sep95 1795240
Oct95 2145180
Nov95 3021075
Dec95 1908781
Jan96 1957956
Feb96 3955970
Mar96 2119970
Apr96 2208176
May96 3063504
Jun96 2190613
Jul96 2197082
Aug96 3085586
Sep96 2642591
Oct96 2550586
Nov96 3230872
Dec96 2482466
Jan97 2315274
Feb97 4388396
Mar97 2335249
Apr97 1956240
May97 3183566
Jun97 2421722
Jul97 1879301
Aug97 3094563
Sep97 2599894
Oct97 2320012
Nov97 3518486
Dec97 2407487
Jan98 2291118
Feb98 4813948
Mar98 2380134
Apr98 2223477
May98 3378416
Jun98 2876314
Jul98 2650942
Aug98 3788448
Sep98 2651506
Oct98 2450710
Nov98 4118992
Dec98 2434040
Jan99 2763878
Feb99 5227962
Mar99 2762093
Apr99 2528931
May99 4040412
Jun99 2883152
Jul99 3100274
Aug99 4149743
Sep99 3061236
Oct99 2805394
Nov99 3962285
Dec99 3197688
Jan00 3149649
Feb00 5401137
Mar00 3393528
Apr00 2852524
May00 4708691
Jun00 3567883
Jul00 3405732
Aug00 4885709
Sep00 4142396
Oct00 3564755
Nov00 4794159
Dec00 3459785
Jan01 3600702
Feb01 5789400
Mar01 3283596
Apr01 3411052
May01 4783941
Jun01 3706871
Jul01 3756080
Aug01 4318154
Sep01 3201376
Oct01 3502712
Nov01 4864603
Dec01 3108517
Jan02 3357796
Feb02 5904823
Mar02 2951480
Apr02 3185525
May02 4729624
Jun02 3282329
Jul02 3271971
Aug02 4559047
Sep02 3350292
Oct02 3286394
Nov02 4566940
Dec02 2863028
Jan03 3049842
Feb03 5780438
Mar03 3286533
Apr03 3016081
May03 4533575
Jun03 3296881
Jul03 3535071
Aug03 5290070
Sep03 3323063
Oct03 3318144
Nov03 5206490
Dec03 3240679
Jan04 3673046
Feb04 6166054
Mar04 3573983
Apr04 2999256
May04 5177550
Jun04 3845943
Jul04 3492933
Aug04 4975878
Sep04 3531498
Oct04 3611446
Nov04 5145814
Dec04 3260597
Jan05 3715755
Feb05 6239931
Mar05 3730730
Apr05 3431157
May05 5404423
Jun05 4049371
Jul05 3648390
Aug05 5394527
Sep05 3968853
Oct05 3970771
Nov05 5384216
;
DATA plano;
SET plano;
t = _n_;
t2 = t*t;
d1 = (month='Jan');
d2 = (month='Feb');
d3 = (month='Mar');
d4 = (month='Apr');
d5 = (month='May');
d6 = (month='Jun');
d7 = (month='Jul');
d8 = (month='Aug');
d9 = (month='Sep');
d10 = (month='Oct');
d11 = (month='Nov');
d12 = (month='Dec');
run;
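/* A quick sanity check (a sketch, not part of the original analysis): every
observation should match exactly one of the twelve month dummies. If a month
label were misspelled or capitalized differently (e.g. 'FEB'), all twelve
dummies would be 0 for that row and the mistake would go unnoticed in the
regressions below. Any unmatched rows are written to the log. */
data _null_;
   set plano;
   /* d1-d12 is a numbered range list; exactly one dummy should equal 1 */
   if sum(of d1-d12) ne 1 then put 'Unmatched month label: ' month= t=;
run;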
/* Here we estimate the "Relative to January" deterministic trend model by OLS, assuming that the
errors are not autocorrelated; that is, we are ASSUMING that there is no cyclical movement in the data.
The F-test for seasonality is reported, but it should not be used for statistical inference if the
errors are autocorrelated. */
proc reg data = plano;
model rev = t t2 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 / dwprob;
test d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12;
run;
/* The previous OLS regression indicates that there is substantial autocorrelation
in the errors of the model. The Durbin-Watson statistic computed from the OLS residuals
is low, and the corresponding p-value for Pr < DW (the test for positive autocorrelation)
is very small, indicating the presence of autocorrelation in the OLS residuals.
Therefore, we leave Proc Reg and use Proc Autoreg instead. Since we don't know the
extent of the autocorrelation in the data, we let Proc Autoreg choose the AR(p) model
for the errors that works best. The order of the AR(p) model is chosen by a backward
elimination search with a significance level for staying in the model of 0.05
(slstay=0.05), starting from p = 12 (nlag=12). We start at lag 12 because the seasonal
dummies might not pick up all of the seasonality in the data, so the error in a given
month may still be correlated with the error in the same month 12 months earlier. */
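/* Optional diagnostic (a sketch added for illustration, not part of the original
analysis): before letting backward elimination choose the AR lags, look at the
sample ACF and PACF of the OLS residuals. Spikes at or near lag 12 would support
starting the search at nlag=12. The data set name olsres and the variable name
resid are arbitrary, hypothetical choices. */
proc reg data = plano noprint;
   model rev = t t2 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12;
   /* save the OLS residuals for inspection */
   output out = olsres r = resid;
run;
quit;
proc arima data = olsres;
   /* ACF, PACF and IACF of the residuals out to two years */
   identify var = resid nlag = 24;
run;
quit;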
proc autoreg data = plano;
model rev = t t2 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12/ nlag = 12 DW=4 DWPROB
method=ml backstep slstay=0.05;
run;
/* The above run of Proc Autoreg suggests that, once the test statistics are adjusted
for autocorrelation, the residuals of the estimated model are white noise. In this
context the coefficient of the t2 variable is not statistically significant at
conventional levels, so we conclude that the quadratic t2 term is no longer needed.
So let's reexamine the model without the quadratic (i.e., curvature) term. */
proc autoreg data = plano;
model rev = t d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12/ nlag = 12 DW=4 DWPROB
method=ml backstep slstay=0.05;
run;
/* Given the results immediately above, it appears that the AR(1,3,5,8,10,12) model
is appropriate to explain the cyclical behavior in the Plano Sales Tax Revenue data,
in addition to the linear trend and the seasonal effects, some of which are highly significant. */
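/* A sketch for comparison, assuming the backward elimination retains lags 1, 3, 5,
8, 10 and 12 as stated above: the same subset AR error model can be specified
directly by listing those lags in NLAG=, which skips the search. With the same
lags and method=ml, this run should essentially reproduce the final backstep model. */
proc autoreg data = plano;
   model rev = t d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 / nlag = (1 3 5 8 10 12)
         method = ml;
run;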
/* Now in the below code let us once again look at the F-test for Seasonality but this time
adjust for the autocorrelation in the residuals of the model. */
proc autoreg data = plano outest = coeff;
model rev = t d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12/ nlag = 12
method=ml backstep slstay=0.05;
test d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12;
run;
/* Note that the p-value of the F-statistic for testing H0: no seasonality versus
H1: seasonality is present is less than 0.05 (F = 36.02 with p < 0.0001). Thus,
we conclude that the data has seasonal variation in it, which therefore needs to be
modeled by including seasonal dummy variables. */
/* In this next section of the program we create "standardized" versions of the
seasonal coefficients so that their signs and magnitudes can be compared with each
other. A positive standardized coefficient represents a "strong" month, and the larger
the coefficient, the stronger that month's seasonal effect. A negative standardized
coefficient represents a "weak" month, and the more negative the coefficient, the
weaker that month's seasonal effect. */
/* Remember that in the "Relative to January" parametrization, the intercept is the January
intercept, while the February coefficient is the INCREMENT to January's intercept;
therefore, the February intercept is equal to the SUM of the intercept estimate and
the February coefficient. The same holds for March through December. */
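/* To make the parametrization concrete, this data step (a sketch; the data set
monthint, the arrays incr and monint, and the variables jan-dec are hypothetical
names) turns the OUTEST= estimates into the twelve monthly intercepts: January
uses the intercept alone, and every other month adds its dummy coefficient to it. */
data monthint;
   set coeff;
   array incr{11} d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12;   /* increments to January */
   array monint{12} jan feb mar apr may jun jul aug sep oct nov dec;
   monint{1} = intercept;                              /* January intercept */
   do j = 2 to 12;
      monint{j} = intercept + incr{j-1};               /* intercept + month increment */
   end;
   keep jan feb mar apr may jun jul aug sep oct nov dec;
run;
proc print data = monthint;
run;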
data coeff;
set coeff;
seasonsum = (12*intercept + d2 + d3 + d4 + d5 + d6 + d7 + d8 + d9 + d10 + d11 + d12);
seasonave = seasonsum/12;
d1a = (intercept - seasonave)/seasonave;
d2a = (intercept + d2 - seasonave)/seasonave;
d3a = (intercept + d3 - seasonave)/seasonave;
d4a = (intercept + d4 - seasonave)/seasonave;
d5a = (intercept + d5 - seasonave)/seasonave;
d6a = (intercept + d6 - seasonave)/seasonave;
d7a = (intercept + d7 - seasonave)/seasonave;
d8a = (intercept + d8 - seasonave)/seasonave;
d9a = (intercept + d9 - seasonave)/seasonave;
d10a = (intercept + d10 - seasonave)/seasonave;
d11a = (intercept + d11 - seasonave)/seasonave;
d12a = (intercept + d12 - seasonave)/seasonave;
sum = d1a + d2a + d3a + d4a + d5a + d6a + d7a + d8a + d9a + d10a + d11a + d12a;
run;
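/* Why the standardized effects sum to zero: write b1 = intercept (January) and
bj = intercept + dj for j = 2, ..., 12, so that seasonave = (b1 + ... + b12)/12.
Each standardized effect is (bj - seasonave)/seasonave, hence
   d1a + ... + d12a = (b1 + ... + b12 - 12*seasonave)/seasonave = 0,
up to floating-point rounding in the printed value of sum. */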
title1 'Standardized seasonal effects by month. They sum to zero.';
title2 'Strong months are positive and weak months are negative.';
title3 'Their magnitudes can be compared.';
proc print data = coeff;
var sum d1a d2a d3a d4a d5a d6a d7a d8a d9a d10a d11a d12a;
run;
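/* Optional: rank the months from strongest to weakest. This sketch (the data set
seas and the variables dummy and effect are hypothetical names) transposes the
twelve standardized effects into one row per month and sorts them in descending
order, so the strongest month is listed first. */
proc transpose data = coeff out = seas(rename = (col1 = effect)) name = dummy;
   var d1a d2a d3a d4a d5a d6a d7a d8a d9a d10a d11a d12a;
run;
proc sort data = seas;
   by descending effect;
run;
title1 'Standardized seasonal effects ranked from strongest to weakest month';
title2;
proc print data = seas;
run;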