data hprice;
  input price assess bdrms lotsize sqrft colonial lprice lassess llotsize lsqrft;
  datalines;
300 349.1 4 6126 2438 1 5.703783 5.855359 8.720297 7.798934
370 351.5 3 9903 2076 1 5.913503 5.86221 9.200593 7.638198
191 217.7 3 5200 1374 0 5.252274 5.383118 8.556414 7.225482
195 231.8 3 4600 1448 1 5.273 5.445875 8.433811 7.277938
373 319.1 4 6095 2514 1 5.921578 5.765504 8.715224 7.82963
466.275 414.5 5 8566 2754 1 6.144775 6.027073 9.055556 7.92081
332.5 367.8 3 9000 2067 1 5.80664 5.907539 9.10498 7.633853
315 300.2 3 6210 1731 1 5.752573 5.704449 8.733916 7.456455
206 236.1 3 6000 1767 0 5.327876 5.464255 8.699514 7.477038
240 256.3 3 2892 1890 0 5.480639 5.546349 7.969704 7.544332
285 314 4 6000 2336 1 5.652489 5.749393 8.699514 7.756196
300 416.5 5 7047 2634 1 5.703783 6.031887 8.860357 7.876259
405 434 3 12237 3375 1 6.003887 6.073044 9.412219 8.12415
212 279.3 3 6460 1899 0 5.356586 5.632287 8.773385 7.549083
265 287.5 3 6519 2312 1 5.57973 5.661223 8.782476 7.745868
227.4 232.9 4 3597 1760 1 5.426711 5.450609 8.187856 7.473069
240 303.8 4 5922 2000 0 5.480639 5.71637 8.68643 7.600903
285 305.6 3 7123 1774 1 5.652489 5.722277 8.871084 7.480992
268 266.7 3 5642 1376 1 5.590987 5.586124 8.637994 7.226936
310 326 4 8602 1835 1 5.736572 5.786897 9.05975 7.5148
266 294.3 3 5494 2048 1 5.583496 5.684599 8.611412 7.624619
270 318.8 3 7800 2124 1 5.598422 5.764564 8.961879 7.661057
225 294.2 3 6003 1768 0 5.416101 5.68426 8.700015 7.477604
150 208 4 5218 1732 0 5.010635 5.337538 8.55987 7.457032
247 239.7 3 9425 1440 1 5.509388 5.479388 9.151121 7.272398
275 294.1 3 6114 1932 0 5.616771 5.68392 8.718336 7.566311
230 267.4 3 6710 1932 0 5.438079 5.588746 8.811355 7.566311
343 359.9 3 8577 2106 1 5.83773 5.885826 9.05684 7.652546
477.5 478.1 7 8400 3529 1 6.168564 6.16982 9.035987 8.16877
350 355.3 4 9773 2051 1 5.857933 5.872962 9.187379 7.626083
230 217.8 4 4806 1573 1 5.438079 5.383577 8.47762 7.36074
335 385 4 15086 2829 0 5.81413 5.953243 9.621523 7.947679
251 224.3 3 5763 1630 1 5.525453 5.412984 8.659213 7.396335
235 251.9 4 6383 1840 1 5.459586 5.529032 8.761394 7.517521
361 354.9 4 9000 2066 1 5.888878 5.871836 9.10498 7.633369
190 212.5 4 3500 1702 0 5.247024 5.358942 8.160519 7.439559
360 452.4 4 10892 2750 1 5.886104 6.114567 9.295784 7.919356
575 518.1 5 15634 3880 1 6.35437 6.250168 9.657204 8.263591
209.001 289.4 4 6400 1854 1 5.342339 5.66781 8.764053 7.525101
225 268.1 2 8880 1421 0 5.416101 5.59136 9.091557 7.259116
246 278.5 3 6314 1662 1 5.505332 5.629418 8.750525 7.415777
713.5 655.4 5 28231 3331 1 6.570182 6.485246 10.24818 8.111028
248 273.3 4 7050 1656 1 5.513429 5.61057 8.860783 7.41216
230 212.1 3 5305 1171 0 5.438079 5.357058 8.576406 7.065613
375 354 5 6637 2293 1 5.926926 5.869297 8.800415 7.737616
265 252.1 3 7834 1764 1 5.57973 5.529826 8.966228 7.475339
313 324 3 1000 2768 0 5.746203 5.780744 6.907755 7.92588
417.5 475.5 4 8112 3733 0 6.034285 6.164367 9.0011 8.224967
253 256.8 3 5850 1536 1 5.53339 5.548297 8.674197 7.336937
315 279.2 4 6660 1638 1 5.752573 5.631928 8.803875 7.401231
264 313.9 3 6637 1972 1 5.575949 5.749074 8.800415 7.586803
255 279.8 2 15267 1478 0 5.541264 5.634075 9.633449 7.298445
210 198.7 3 5146 1408 1 5.347107 5.291796 8.545975 7.249926
180 221.5 3 6017 1812 1 5.192957 5.400423 8.702344 7.502186
250 268.4 3 8410 1722 1 5.521461 5.592478 9.037177 7.451241
250 282.3 4 5625 1780 1 5.521461 5.64297 8.634976 7.484369
209 230.7 4 5600 1674 1 5.342334 5.441118 8.630522 7.422971
258 287 4 6525 1850 1 5.552959 5.659482 8.783396 7.522941
289 298.7 3 6060 1925 1 5.666427 5.69944 8.709465 7.562681
316 314.6 4 5539 2343 0 5.755742 5.751302 8.619569 7.759187
225 291 3 7566 1567 0 5.416101 5.673323 8.931419 7.356918
266 286.4 4 5484 1664 1 5.583496 5.65739 8.60959 7.41698
310 253.6 6 5348 1386 1 5.736572 5.535758 8.584478 7.234177
471.25 482 5 15834 2617 1 6.155389 6.177944 9.669915 7.869784
335 384.3 4 8022 2321 1 5.81413 5.951424 8.989944 7.749753
495 543.6 4 11966 2638 1 6.204558 6.298213 9.389825 7.877776
279.5 336.5 4 8460 1915 1 5.633002 5.818598 9.043104 7.557473
380 515.1 4 15105 2589 1 5.940171 6.244361 9.622781 7.859027
325 437 4 10859 2709 0 5.783825 6.079933 9.292749 7.904335
220 263.4 3 6300 1587 1 5.393628 5.573674 8.748305 7.369601
215 300.4 3 11554 1694 0 5.370638 5.705115 9.354787 7.434848
240 250.7 3 6000 1536 1 5.480639 5.524257 8.699514 7.336937
725 708.6 5 31000 3662 0 6.586172 6.563291 10.34174 8.205765
230 276.3 3 4054 1736 1 5.438079 5.621487 8.307459 7.459339
306 388.6 2 20700 2205 0 5.723585 5.962551 9.937889 7.698483
425 252.5 3 5525 1502 0 6.052089 5.531411 8.617039 7.314553
318 295.2 4 92681 1696 1 5.762052 5.687653 11.43692 7.436028
330 359.5 3 8178 2186 1 5.799093 5.884714 9.009203 7.689829
246 276.2 4 5944 1928 1 5.505332 5.621125 8.690138 7.564239
225 249.8 3 18838 1294 0 5.416101 5.52066 9.843632 7.165493
111 202.4 4 4315 1535 1 4.70953 5.310246 8.369853 7.336286
268.125 254 3 5167 1980 1 5.591453 5.537334 8.550048 7.590852
244 306.8 4 7893 2090 1 5.497168 5.726196 8.973732 7.644919
295 318.3 3 6056 1837 1 5.686975 5.762994 8.708805 7.515889
236 259.4 3 5828 1715 0 5.463832 5.558371 8.670429 7.447168
202.5 258.1 3 6341 1574 0 5.31074 5.553347 8.754792 7.361375
219 232 2 6362 1185 0 5.389072 5.446737 8.758098 7.077498
242 252 4 4950 1774 1 5.488938 5.529429 8.507143 7.480992
;
run;

/* Set up a house id so that, after the observations are sorted, we will
   still know which house in the original data set each one is. */
data hprice;
  set hprice;
  houseid = _n_;
run;

/* Here we get measures of the influence of each observation in the
   multiple regression:
     leverage (of the i-th observation) = x(i)'[inv(X'X)]x(i),
     rstudent = studentized residual, with the error variance estimated
                from all observations EXCEPT the i-th observation,
     student  = studentized residual, with the error variance estimated
                from all observations,
     dffits   = standardized difference between the fitted value for the
                i-th observation computed with all observations and
                computed with all observations EXCEPT the i-th observation.
   For more discussion of these influence measures, see SAS/STAT User's
   Guide, Version 6, Fourth Edition, pp. 1418-1419. The larger the
   (absolute) value of each of these measures for a given observation,
   the more influential that observation is. Belsley, Kuh, and Welsch
   (1980) suggest looking more closely at observations with
     leverage > 2*K/N,
     abs(rstudent) > 2, and
     abs(dffits) > 2*sqrt(K/N),
   where K = the number of parameters in the regression model (including
   the intercept) and N = the number of observations.

   David A. Belsley, Edwin Kuh, and Roy E. Welsch, "Regression
   Diagnostics: Identifying Influential Data and Sources of Collinearity,"
   New York: Wiley, 1980. */
proc reg data=hprice;
  model price = lotsize sqrft / stb influence;
  output out=results h=leverage rstudent=rstudent student=student
         dffits=dffits;
run;

/* Here we do a "descending" sort of the data by the influence measure
   leverage (i.e., from the observation with the highest leverage value
   to the observation with the lowest). The DESCENDING option is needed
   because PROC SORT sorts in ascending order by default. */
proc sort data=results;
  by descending leverage;
run;

/* "Prune" the results data set so that we only have a few variables to
   look at. */
data results;
  set results;
  keep houseid leverage rstudent student dffits;
run;

/* Print the sorted data so that we can identify the houses (from their
   house ids) that most strongly influence the multiple regression
   results. In the present case, K = 3 (intercept, lotsize, sqrft) and
   N = 88, so outliers are those that satisfy the criteria
   leverage > 2*3/88 = 0.068, abs(rstudent) > 2, and
   abs(dffits) > 2*sqrt(3/88) = 0.37. Observation houseid = 77 satisfies
   ALL of these criteria. */
proc print data=results;
run;

/* Drop the most influential observation (houseid = 77) from the original
   data set, hprice, and form a new data set, hprice2. */
data hprice2;
  set hprice;
  if houseid = 77 then delete;
run;

/* Run the original multiple regression on the data set without the
   outlier (houseid = 77).
   How did the coefficients change after you dropped the outlier house,
   which has a huge lot size but, relatively speaking, a low price? */
proc reg data=hprice2;
  model price = lotsize sqrft / stb;
run;

/* Finally, Wooldridge (p. 188) points out that taking the logarithm of
   the dependent variable can reduce the influence of outliers. Let's see
   if that is the case here. Recall that in the present case, outliers are
   those that satisfy the criteria leverage > 0.068, abs(rstudent) > 2,
   and abs(dffits) > 0.37. */
data hprice;
  set hprice;
  lprice = log(price);   /* lprice is already in the input data; this
                            recomputes it from price */
run;

proc reg data=hprice;
  model lprice = lotsize sqrft / influence;
  output out=results h=leverage rstudent=rstudent student=student
         dffits=dffits;
run;

proc sort data=results;
  by descending leverage;
run;

data results;
  set results;
  keep houseid leverage rstudent student dffits;
run;

proc print data=results;
run;

/* As we can see, the log transformation of the dependent variable did not
   reduce the influence of the observation with houseid = 77 very much.
   Its leverage is unchanged, because leverage depends only on the
   regressors (lotsize and sqrft), not on the dependent variable, and its
   rstudent value still exceeds the Belsley, Kuh, and Welsch (1980)
   guideline. Because dffits = rstudent*sqrt(leverage/(1 - leverage)), an
   observation with leverage > 0.068 and abs(rstudent) > 2 must have
   abs(dffits) > 0.54, so houseid = 77 also still exceeds the dffits
   cutoff of 0.37. We should probably look closely at this observation and
   decide whether it is "from another population" and should therefore be
   dropped from the current analysis of factors that affect housing
   prices. */
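
/* A minimal sketch, not part of the original analysis: rather than typing
   the cutoffs in by hand, compute the Belsley, Kuh, and Welsch (1980)
   cutoffs from the data and keep only the observations that exceed all
   three. It assumes the data set RESULTS produced by the log-price
   regression above (variables houseid, leverage, rstudent, dffits) and
   assumes K = 3 parameters (intercept, lotsize, sqrft). The data set name
   FLAGGED and the variables n, k, hcut, dfcut, and dffits_chk are
   hypothetical names chosen for this sketch. Running the same step right
   after the levels regression would apply it to that model instead. */
data flagged;
  set results nobs=n;      /* n = number of observations in RESULTS */
  k     = 3;               /* assumed: intercept + lotsize + sqrft */
  hcut  = 2*k/n;           /* leverage cutoff, 2*K/N */
  dfcut = 2*sqrt(k/n);     /* dffits cutoff, 2*sqrt(K/N) */
  /* dffits recomputed from rstudent and leverage; it should agree with
     the dffits value from PROC REG up to rounding */
  dffits_chk = rstudent*sqrt(leverage/(1 - leverage));
  /* subsetting IF: keep only observations that exceed all three cutoffs */
  if leverage > hcut and abs(rstudent) > 2 and abs(dffits) > dfcut;
run;

proc print data=flagged;
  var houseid leverage rstudent dffits dffits_chk hcut dfcut;
run;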