regress salary roe sales

* As indicated by the scatter plots below, there are a few outliers in the data
scatter salary roe
scatter salary sales

* Get the studentized residuals of the above regression
* (hilo is a user-written command; install it first if it is not available)
predict rstu, rstudent
hilo rstu salary, show(5)

* Drop the observations with studentized residuals of 2.0 or more in absolute value
keep if abs(rstu) < 2.00

* Let's look at histograms of salary and lsalary
* Log salary looks more normally distributed
histogram salary
histogram lsalary

* Let's look at histograms of sales and lsales
* Log sales looks more normally distributed
histogram sales
histogram lsales

* Let's look at the scatter plots with salary and sales logged.
* The scatter plots look better with log transforms.
scatter lsalary roe
scatter lsalary lsales

* Here we use the Box-Cox transformation test to see whether log
* transformations of salary and sales are suggested.
* Theta = 0 implies the log transformation log(x)
* Theta = 1 implies no transformation
* Theta = -1 implies the reciprocal transformation x^-1.
boxcox salary sales, notr(roe) model(lhs)
boxcox salary sales, notr(roe) model(rhs)
boxcox salary sales, notr(roe) model(lambda)

* We settle on the log transformations (Theta = 0 for both salary and sales)
regress lsalary roe lsales

* Let's get a plot of the residuals of the model.
* Do they look normally distributed?
predict e, residual
histogram e, normal

* Here is a formal test for normality of the residuals. They appear to have a little
* excess kurtosis but not enough to reject normality outright.
sktest e

* Let's go ahead and test for heteroskedasticity in the residuals.
* The test does not reject homoskedasticity, so OLS is OK for inference.
estat hettest roe lsales

* Now interpret the coefficients of the model.
*
* One last thing.
* Let's convince ourselves of the "partialling out" interpretation of
* multiple linear regression. Let's see if we can get the coefficients
* of the above multiple regression model by doing some "partial" regressions.
regress roe lsales
predict e_roe, residual
regress lsalary e_roe
* Hey, guess what. The coefficient estimate on e_roe is the same as
* the coefficient for roe produced in the multiple regression above.

regress lsales roe
predict e_lsales, residual
regress lsalary e_lsales
* Hey, guess what. The coefficient estimate on e_lsales is the same as
* the coefficient for lsales produced in the multiple regression above.

* Yeah, we have shown that the "partialling-out" interpretation of multiple
* linear regression is true!
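
* A minimal sketch (an addition, not part of the original analysis) that
* checks the partialling-out result numerically rather than by eye.
* It assumes e_roe and e_lsales from the partial regressions above are
* still in memory. The scalar names b_roe and b_lsales are hypothetical
* and chosen only for illustration.
* Each pair of numbers should agree up to rounding: predict stores
* residuals as float by default, so the last few digits may differ.
* Only the coefficients match; the standard errors from the partial
* regressions are not the same as in the multiple regression.

* Re-run the multiple regression and save its slope coefficients
quietly regress lsalary roe lsales
scalar b_roe = _b[roe]
scalar b_lsales = _b[lsales]

* Compare with the slope from the partial regression on e_roe
quietly regress lsalary e_roe
display "roe:    multiple = " b_roe "   partial = " _b[e_roe]

* Compare with the slope from the partial regression on e_lsales
quietly regress lsalary e_lsales
display "lsales: multiple = " b_lsales "   partial = " _b[e_lsales]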