regress salary roe sales
* As the scatter plots below indicate, there are a few outliers in the data
scatter salary roe
scatter salary sales
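* Get the studentized residuals of the above regression
predict rstu, rstudent
* Show the 5 lowest and 5 highest studentized residuals along with salary
* (hilo is a user-written command, not part of official Stata; try -findit hilo-)
hilo rstu salary, show(5)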
* Drop the observations whose studentized residuals are 2.0 or more in absolute value
keep if abs(rstu) < 2.00
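* A minimal sketch, assuming the dataset might not already contain the logged
* variables used below: create lsalary and lsales as natural logs only if
* they are missing. (This guard is an addition, not one of the original steps.)
capture confirm variable lsalary
if _rc {
    generate lsalary = ln(salary)
}
capture confirm variable lsales
if _rc {
    generate lsales = ln(sales)
}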
* Let's look at histograms of salary and lsalary
* Log salary looks more normally distributed
histogram salary
histogram lsalary
* Let's look at histograms of sales and lsales
* Log sales looks more normally distributed
histogram sales
histogram lsales
* Let's look at the scatter plots with salary and sales logged.
* The scatter plots look better with log transforms.
scatter lsalary roe
scatter lsalary lsales
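* Here we use the Box-Cox transformation test to see whether log
* transformations of salary and sales are suggested.
* Theta = 0 implies the log transformation log(x)
* Theta = 1 implies no transformation
* Theta = -1 implies the reciprocal transformation x^-1.
* In the output, the parameter is labeled theta when only the left-hand side
* is transformed and lambda otherwise; both are read the same way.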
boxcox salary sales, notr(roe) model(lhs)
boxcox salary sales, notr(roe) model(rhs)
boxcox salary sales, notr(roe) model(lambda)
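* A possible additional check (a sketch, not one of the original steps):
* model(theta) transforms both sides with separate parameters, theta for
* salary and lambda for sales, so each can be compared with 0 individually.
boxcox salary sales, notrans(roe) model(theta)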
* We settle on the log transformations (Theta = 0 for both salary and sales)
regress lsalary roe lsales
* Let's get a plot of the residuals of the model.
* Do they look normally distributed?
predict e, residual
histogram e, normal
* Here is a formal test for normality of the residuals. They appear to have a
* little excess kurtosis, but not enough to reject normality outright.
sktest e
* Let's go ahead and test for heteroskedasticity in the residuals.
* The test fails to reject homoskedasticity, so the usual OLS standard
* errors are OK for inference.
estat hettest roe lsales
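* As a possible complement (a sketch, not one of the original steps), White's
* general test does not assume a particular form for the heteroskedasticity.
estat imtest, white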
* Now interpret the coefficients of the model.
*
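* A minimal sketch of reading the estimates, assuming the regression of
* lsalary on roe and lsales above is still the active estimation result.
* lsales is in logs, so its coefficient is an elasticity; roe is in levels,
* so 100 times its coefficient is the approximate percent change in salary
* for a one-unit increase in roe.
display "Elasticity of salary with respect to sales: " _b[lsales]
display "Approx. % change in salary per one-unit increase in roe: " 100*_b[roe]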
* One last thing.
* Let's convince ourselves of the "partialling out" interpretation of
* multiple linear regression. Let's see if we can get the coefficients
* of the above multiple regression model by doing some "partial" regressions.
regress roe lsales
predict e_roe, residual
regress lsalary e_roe
* Hey, guess what. The coefficient estimate on e_roe is the same as
* the coefficient for roe produced in the multiple regression above.
regress lsales roe
predict e_lsales, residual
regress lsalary e_lsales
* Hey, guess what. The coefficient estimate on e_lsales is the same as
* the coefficient for lsales produced in the multiple regression above.
* Yeah, we have shown that the "partialling-out" interpretation of multiple
* linear regression is true!
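* A minimal sketch of a numerical check, assuming e_roe and e_lsales from the
* partial regressions above still exist. The scalar names and the 1e-4
* tolerance are arbitrary choices; the tolerance allows for rounding in the
* stored residuals.
quietly regress lsalary roe lsales
scalar b_roe = _b[roe]
scalar b_lsales = _b[lsales]
quietly regress lsalary e_roe
assert reldif(_b[e_roe], b_roe) < 1e-4
quietly regress lsalary e_lsales
assert reldif(_b[e_lsales], b_lsales) < 1e-4
* Only the point estimates match; the standard errors from the partial
* regressions generally differ from those of the multiple regression.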