/* Data obtained for the Principles of Econometrics website */ data cars; input mpg cyl eng wgt; datalines; 18 8 307 3504 15 8 350 3693 18 8 318 3436 16 8 304 3433 17 8 302 3449 15 8 429 4341 14 8 454 4354 14 8 440 4312 14 8 455 4425 15 8 390 3850 15 8 383 3563 14 8 340 3609 15 8 400 3761 14 8 455 3086 24 4 113 2372 22 6 198 2833 18 6 199 2774 21 6 200 2587 27 4 97 2130 26 4 97 1835 25 4 110 2672 24 4 107 2430 25 4 104 2375 26 4 121 2234 21 6 199 2648 10 8 360 4615 10 8 307 4376 11 8 318 4382 9 8 304 4732 27 4 97 2130 28 4 140 2264 25 4 113 2228 19 6 232 2634 16 6 225 3439 17 6 250 3329 19 6 250 3302 18 6 232 3288 14 8 350 4209 14 8 400 4464 14 8 351 4154 14 8 318 4096 12 8 383 4955 13 8 400 4746 13 8 400 5140 18 6 258 2962 22 4 140 2408 19 6 250 3282 18 6 250 3139 23 4 122 2220 28 4 116 2123 30 4 79 2074 30 4 88 2065 31 4 71 1773 35 4 72 1613 27 4 97 1834 26 4 91 1955 24 4 113 2278 25 4 97.5 2126 23 4 97 2254 20 4 140 2408 21 4 122 2226 13 8 350 4274 14 8 400 4385 15 8 318 4135 14 8 351 4129 17 8 304 3672 11 8 429 4633 13 8 350 4502 12 8 350 4456 13 8 400 4422 19 3 70 2330 15 8 304 3892 13 8 307 4098 13 8 302 4294 14 8 318 4077 18 4 121 2933 22 4 121 2511 21 4 120 2979 26 4 96 2189 22 4 122 2395 28 4 97 2288 23 4 120 2506 28 4 98 2164 27 4 97 2100 13 8 350 4100 14 8 304 3672 13 8 350 3988 14 8 302 4042 15 8 318 3777 12 8 429 4952 13 8 400 4464 13 8 351 4363 14 8 318 4237 13 8 440 4735 12 8 455 4951 13 8 360 3821 18 6 225 3121 16 6 250 3278 18 6 232 2945 18 6 250 3021 23 6 198 2904 26 4 97 1950 11 8 400 4997 12 8 400 4906 13 8 360 4654 12 8 350 4499 18 6 232 2789 20 4 97 2279 21 4 140 2401 22 4 108 2379 18 3 70 2124 19 4 122 2310 21 6 155 2472 26 4 98 2265 15 8 350 4082 16 8 400 4278 29 4 68 1867 24 4 116 2158 20 4 114 2582 19 4 121 2868 15 8 318 3399 24 4 121 2660 20 6 156 2807 11 8 350 3664 20 6 198 3102 19 6 232 2901 15 6 250 3336 31 4 79 1950 26 4 122 2451 32 4 71 1836 25 4 140 2542 16 6 250 3781 16 6 258 3632 18 6 225 3613 16 8 302 4141 13 8 350 4699 14 8 318 
4457 14 8 302 4638 14 8 304 4257 29 4 98 2219 26 4 79 1963 26 4 97 2300 31 4 76 1649 32 4 83 2003 28 4 90 2125 24 4 90 2108 26 4 116 2246 24 4 120 2489 26 4 108 2391 31 4 79 2000 19 6 225 3264 18 6 250 3459 15 6 250 3432 15 6 250 3158 16 8 400 4668 15 8 350 4440 16 8 318 4498 14 8 351 4657 17 6 231 3907 16 6 250 3897 15 6 258 3730 18 6 225 3785 21 6 231 3039 20 8 262 3221 13 8 302 3169 29 4 97 2171 23 4 140 2639 20 6 232 2914 23 4 140 2592 24 4 134 2702 25 4 90 2223 24 4 119 2545 18 6 171 2984 29 4 90 1937 19 6 232 3211 23 4 115 2694 23 4 120 2957 22 4 121 2945 25 4 121 2671 33 4 91 1795 28 4 107 2464 25 4 116 2220 25 4 140 2572 26 4 98 2255 27 4 101 2202 17.5 8 305 4215 16 8 318 4190 15.5 8 304 3962 14.5 8 351 4215 22 6 225 3233 22 6 250 3353 24 6 200 3012 22.5 6 232 3085 29 4 85 2035 24.5 4 98 2164 29 4 90 1937 33 4 91 1795 20 6 225 3651 18 6 250 3574 18.5 6 250 3645 17.5 6 258 3193 29.5 4 97 1825 32 4 85 1990 28 4 97 2155 26.5 4 140 2565 20 4 130 3150 13 8 318 3940 19 4 120 3270 19 6 156 2930 16.5 6 168 3820 16.5 8 350 4380 13 8 350 4055 13 8 302 3870 13 8 318 3755 31.5 4 98 2045 30 4 111 2155 36 4 79 1825 25.5 4 122 2300 33.5 4 85 1945 17.5 8 305 3880 17 8 260 4060 15.5 8 318 4140 15 8 302 4295 17.5 6 250 3520 20.5 6 231 3425 19 6 225 3630 18.5 6 250 3525 16 8 400 4220 15.5 8 350 4165 15.5 8 400 4325 16 8 351 4335 29 4 97 1940 24.5 4 151 2740 26 4 97 2265 25.5 4 140 2755 30.5 4 98 2051 33.5 4 98 2075 30 4 97 1985 30.5 4 97 2190 22 6 146 2815 21.5 4 121 2600 21.5 3 80 2720 43.1 4 90 1985 36.1 4 98 1800 32.8 4 78 1985 39.4 4 85 2070 36.1 4 91 1800 19.9 8 260 3365 19.4 8 318 3735 20.2 8 302 3570 19.2 6 231 3535 20.5 6 200 3155 20.2 6 200 2965 25.1 4 140 2720 20.5 6 225 3430 19.4 6 232 3210 20.6 6 231 3380 20.8 6 200 3070 18.6 6 225 3620 18.1 6 258 3410 19.2 8 305 3425 17.7 6 231 3445 18.1 8 302 3205 17.5 8 318 4080 30 4 98 2155 27.5 4 134 2560 27.2 4 119 2300 30.9 4 105 2230 21.1 4 134 2515 23.2 4 156 2745 23.8 4 151 2855 23.9 4 119 2405 20.3 5 131 2830 17 6 163 
3140 21.6 4 121 2795 16.2 6 163 3410 31.5 4 89 1990 29.5 4 98 2135 21.5 6 231 3245 19.8 6 200 2990 22.3 4 140 2890 20.2 6 232 3265 20.6 6 225 3360 17 8 305 3840 17.6 8 302 3725 16.5 8 351 3955 18.2 8 318 3830 16.9 8 350 4360 15.5 8 351 4054 19.2 8 267 3605 18.5 8 360 3940 31.9 4 89 1925 34.1 4 86 1975 35.7 4 98 1915 27.4 4 121 2670 25.4 5 183 3530 23 8 350 3900 27.2 4 141 3190 23.9 8 260 3420 34.2 4 105 2200 34.5 4 105 2150 31.8 4 85 2020 37.3 4 91 2130 28.4 4 151 2670 28.8 6 173 2595 26.8 6 173 2700 33.5 4 151 2556 41.5 4 98 2144 38.1 4 89 1968 32.1 4 98 2120 37.2 4 86 2019 28 4 151 2678 26.4 4 140 2870 24.3 4 151 3003 19.1 6 225 3381 34.3 4 97 2188 29.8 4 134 2711 31.3 4 120 2542 37 4 119 2434 32.2 4 108 2265 46.6 4 86 2110 27.9 4 156 2800 40.8 4 85 2110 44.3 4 90 2085 43.4 4 90 2335 36.4 5 121 2950 30 4 146 3250 44.6 4 91 1850 33.8 4 97 2145 29.8 4 89 1845 32.7 6 168 2910 23.7 3 70 2420 35 4 122 2500 32.4 4 107 2290 27.2 4 135 2490 26.6 4 151 2635 25.8 4 156 2620 23.5 6 173 2725 30 4 135 2385 39.1 4 79 1755 39 4 86 1875 35.1 4 81 1760 32.3 4 97 2065 37 4 85 1975 37.7 4 89 2050 34.1 4 91 1985 34.7 4 105 2215 34.4 4 98 2045 29.9 4 98 2380 33 4 105 2190 33.7 4 107 2210 32.4 4 108 2350 32.9 4 119 2615 31.6 4 120 2635 28.1 4 141 3230 30.7 6 145 3160 25.4 6 168 2900 24.2 6 146 2930 22.4 6 231 3415 26.6 8 350 3725 20.2 6 200 3060 17.6 6 225 3465 28 4 112 2605 27 4 112 2640 34 4 112 2395 31 4 112 2575 29 4 135 2525 27 4 151 2735 24 4 140 2865 36 4 105 1980 37 4 91 2025 31 4 91 1970 38 4 105 2125 36 4 98 2125 36 4 120 2160 36 4 107 2205 34 4 108 2245 38 4 91 1965 32 4 91 1965 38 4 91 1995 25 6 181 2945 38 6 262 3015 26 4 156 2585 22 6 232 2835 32 4 144 2665 36 4 135 2370 27 4 151 2950 27 4 140 2790 44 4 97 2130 32 4 135 2295 28 4 120 2625 31 4 119 2720 ; data cars; set cars; cyl_eng = cyl*eng; cyl_wgt = cyl*wgt; eng_wgt = eng*wgt; cyl2 = cyl*cyl; eng2 = eng*eng; wgt2 = wgt*wgt; run; /* Here we are going to use the "response surface modeling" (RSM) approach (Box and 
Wilson, 1951) to building a model for mpg. We include all explanatory
   variables together with their cross-product and squared terms, and then
   use backward selection to see which variables remain. */

proc reg data=cars;
   model mpg = eng wgt cyl cyl_eng cyl_wgt eng_wgt cyl2 eng2 wgt2
         / selection=backward slstay=0.05;
   title 'Response Surface Regression';
run;

/* As it turns out, the variables that are retained are eng, wgt and
   eng_wgt. */

/* We look at the OLS estimates together with the
   heteroskedasticity-consistent (HC) standard errors and their t-values
   (WHITE option) to cover ourselves against possible heteroskedasticity.
   Judging from the OLS residual plots of this equation and White's
   heteroskedasticity test (SPEC option), we probably need to adjust for
   heteroskedasticity. */

proc reg data=cars;
   model mpg = eng wgt eng_wgt / white spec;
   title 'Backward Selected Regression';
   title2 'with Heteroskedasticity Robust Standard Errors';
run;

/* The above equation seems to be a reasonable one, except that we have to
   adjust for heteroskedasticity by calculating HC standard errors for the
   OLS coefficient estimates. */

/* Let us try to reduce the heteroskedasticity in our model by modeling the
   log of mpg. This sometimes works. */

data cars;
   set cars;
   lmpg = log(mpg);   /* natural log of mpg */
run;

proc reg data=cars;
   model lmpg = eng wgt eng_wgt / white spec;
   title 'Log MPG Equation';
run;

/* The log transformation of mpg did not get rid of the heteroskedasticity
   problem. See the result of White's test for heteroskedasticity (SPEC). */

/* Let's compare the distributions of mpg and lmpg, starting with mpg. */

proc sgplot data=cars;
   title 'MPG Distribution';
   histogram mpg;
   density mpg;                  /* fitted normal density */
   density mpg / type=kernel;    /* kernel density estimate */
   keylegend / location=inside position=topright;
run;

/* Now let's look at the distribution of lmpg.
*/

proc sgplot data=cars;
   title 'Log MPG Distribution';
   histogram lmpg;
   density lmpg;                 /* fitted normal density */
   density lmpg / type=kernel;   /* kernel density estimate */
   keylegend / location=inside position=topright;
run;

/* Let's look at the skewness and kurtosis of mpg versus lmpg and do some
   formal statistical testing for normality. */

proc univariate data=cars normal plot;
   var mpg lmpg;
   qqplot mpg  / normal(mu=est sigma=est color=red l=1);
   qqplot lmpg / normal(mu=est sigma=est color=green l=1);
   title 'Statistical Tests for Normality of mpg and lmpg';
run;

/* According to the tests reported here, the log transform of mpg does not
   induce normality. This is not such an important consideration, however,
   given that the sample is moderate in size (N=392). For the test
   statistics in the regression equations we have run, the Central Limit
   Theorem should be sufficient to make their sampling distributions
   approximately valid. */

/* To illustrate the weighted least squares approach, let us assume (after
   looking at the plot of the least squares residuals versus the scale
   variable wgt (weight)) that the heteroskedasticity is of the parametric
   form
        Var(e) = (sigma^2)*(1/wgt).
   The weight in SAS should then be proportional to the reciprocal of the
   error variance:
        weight = 1/(1/wgt) = wgt.
   We know from Aitken's theorem that if the weights for the observations
   are proportional to the reciprocals of the error variances, then the
   weighted least squares (WLS) estimators are best linear unbiased
   estimators (BLUE). */

proc reg data=cars;
   model mpg = eng wgt eng_wgt;
   weight wgt;
   title 'Weighted Least Squares Estimation of mpg Regression Equation';
run;

/* Comparing the WLS results with the OLS results that use HC standard
   errors, the two are very similar and lead to basically the same
   conclusions about the determinants of mpg in this sample of cars. */
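
/* The WLS step above rests on a plot of the least squares residuals against
   wgt. The following is a minimal sketch of how such a plot could be
   produced; it is not part of the original program. The data set name
   ols_out and the variable names ols_resid and ols_pred are hypothetical
   names introduced here for illustration only. If the assumed variance form
   (sigma^2)*(1/wgt) is reasonable, the vertical spread of the residuals
   should narrow as wgt increases. */

proc reg data=cars noprint;
   model mpg = eng wgt eng_wgt;
   /* save OLS residuals and fitted values (hypothetical names) */
   output out=ols_out r=ols_resid p=ols_pred;
run;
quit;

proc sgplot data=ols_out;
   title 'OLS Residuals versus Weight';
   scatter x=wgt y=ols_resid;
   refline 0 / axis=y;   /* zero line to help judge the spread */
run;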