* A description of the data
describe

* Report summary statistics for the variables
summarize

* More detailed information on the variables
codebook

* You can summarize variable information by categories
summarize salary if finance == 1
summarize salary if finance == 0

* Here is the regression of salary on roe in Chapter 2 of the Wooldridge
* textbook
regress salary roe

* Retrieve the predicted values and the residuals of the regression
predict salaryhat
predict e, residual

* Plot the fitted regression. There are some pretty obvious outliers in the
* regression fit.
scatter salary roe || line salaryhat roe

* Examine the largest and smallest studentized residuals to
* determine whether there are any strong outliers.
predict rstu, rstu

* The "hilo" command has to be downloaded from
* http://www.ats.ucla.edu/stat/stat/ado/analysis
* It is not part of the standard installation of Stata.
hilo rstu salary, show(5)

* Plot the histogram of the residuals and compare to the normal distribution
histogram e, normal

* Test the residuals for normality. They are not normally distributed. See
* the sktest result below.
sktest e

* Produce a box plot of the residuals
graph box e

* One possible way to treat the outliers in the original fit is to look at
* the salary variable for unusual observations.
* Notice that the salary variable is not normally distributed.
histogram salary, normal

* Let us try to address the outlier issue by taking the log of salary
* as the dependent variable. Notice there are still a few very large salaries.
* See the histogram below.
histogram lsalary, normal

* Now let us use lsalary as the dependent variable from here on
regress lsalary roe

* Examine the largest and smallest studentized residuals to
* determine whether there are any strong outliers.
predict rstu_log, rstu

* The "hilo" command has to be downloaded from
* http://www.ats.ucla.edu/stat/stat/ado/analysis
* It is not part of the standard installation of Stata.
* Use rstu_log here: rstu holds the residuals from the earlier salary regression.
hilo rstu_log lsalary, show(5)

* Let us eliminate the observations whose studentized residuals are 2.33 or
* greater in absolute value.
keep if abs(rstu_log) < 2.33

* Produce a histogram for the lsalary variable and superimpose a normal curve
* on the histogram with the outliers eliminated.
histogram lsalary, normal

* After deleting outliers the lsalary variable appears to be normally
* distributed. See the histogram above and the sktest for normality below.
sktest lsalary

* Produce a box plot for the lsalary variable.
* Notice there are no outliers for the lsalary variable.
graph box lsalary

*
* As it turns out, the roe variable is highly significant and of the correct
* a priori sign in the lsalary regression without outliers.
* Retrieve the predicted values and the residuals of the lsalary regression
* without the outliers.
regress lsalary roe
predict lsalaryhat
predict loge, residual

* Plot the fitted regression
scatter lsalary roe || line lsalaryhat roe

* Plot the histogram of the residuals and compare to the normal distribution
histogram loge, normal

* Test the residuals for normality. They appear to be normally distributed.
sktest loge

* Plot the residuals as a function of the roe variable. This is a visual check
* for heteroskedasticity in the residuals.
scatter loge roe

* Here is a formal test for heteroskedasticity. There does not appear to
* be heteroskedasticity in the residuals of the regression according to
* the Breusch-Pagan test.
regress lsalary roe
estat hettest

* Just for your information, if there had been heteroskedasticity in the
* residuals of the regression we could have gotten consistent estimates
* of the standard errors of the OLS estimates by using White's
* robust standard errors.
regress lsalary roe, vce(robust)

* Just for the fun of it let's do a multiple regression analysis of the ceo
* data, but first let's do a matrix correlation plot looking at the
* relationships between lsalary and the continuous variables roe, lsales and ros.
graph matrix lsalary roe lsales ros

* Also let's do a multicollinearity check of the continuous explanatory
* variables roe, lsales and ros. Multicollinearity does not seem to pose a
* significant problem here. Also see the VIF report below. None of the VIFs
* are greater than 10.
pwcorr lsalary roe lsales ros, sig

* Now on to the multiple regression analysis
regress lsalary roe lsales ros indus finance consprod
estat hettest roe lsales ros
estat vif

* There seems to be heteroskedasticity in the residuals, so
* let us proceed with OLS and heteroskedasticity-robust (White's) standard
* errors.
regress lsalary roe lsales ros indus finance consprod, vce(robust)

* Using the backward selection technique we can eliminate the ros variable.
regress lsalary roe lsales indus finance consprod, vce(robust)

* All remaining coefficients look significant. This is our final equation.
*
* Interpretation of coefficients:
* Notice roe is measured in percentages. A one percentage point increase
* (decrease) in roe gives rise (on average) to a 1.45 percent increase
* (decrease) in salary.
* Because both salary and sales are in logs, the lsales coefficient is an
* elasticity: a 1% increase (decrease) in sales implies (on average) a 0.23%
* increase (decrease) in salary.
* The lowest CEO salaries are in the utilities and transportation industries.
* The highest CEO salaries are in the finance industry.
* If one goes from being a utilities CEO to an industrial CEO, one expects
* (on average) roughly a 27% increase in salary.
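
* The percentage effects in the interpretation above read the coefficients
* directly, which is only an approximation for dummy variables when the
* dependent variable is in logs. A minimal sketch of the exact conversion,
* 100*(exp(b) - 1), follows. It assumes the final regression above is the
* most recent estimation command, so that _b[] refers to its coefficients.
display "indus vs. utilities:    " 100*(exp(_b[indus]) - 1) " percent"
display "finance vs. utilities:  " 100*(exp(_b[finance]) - 1) " percent"
display "consprod vs. utilities: " 100*(exp(_b[consprod]) - 1) " percent"

* The same idea for a large change in sales: because lsales enters in logs,
* doubling sales multiplies predicted salary by 2^b rather than adding 100*b
* percent.
display "doubling sales:         " 100*(2^_b[lsales] - 1) " percent"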