* A description of the data
describe
* Report summary statistics for the variables
summarize
* More detailed information on the variables
codebook
* You can summarize variable information by categories
summarize salary if finance == 1
summarize salary if finance == 0
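* As an alternative sketch, tabstat can report the same salary statistics
* for both groups in a single table (the choice of statistics here is
* illustrative, not part of the original steps).
tabstat salary, by(finance) statistics(n mean sd min max)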
* Here is the regression of salary on roe in Chapter 2 of the Wooldridge
* textbook
regress salary roe
* Retrieve the predicted values and the residuals of the regression
predict salaryhat
predict e, residual
* Plot the fitted regression. There are some pretty obvious outliers in the
* regression fit.
scatter salary roe || line salaryhat roe
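* A possible shortcut, not part of the original steps: twoway lfit draws the
* fitted OLS line directly, without needing the salaryhat variable.
twoway (scatter salary roe) (lfit salary roe)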
* Examine the largest and smallest of studentized residuals to
* determine if there are some strong outliers.
predict rstu, rstu
* The "hilo" command has to be downloaded from
* http://www.ats.ucla.edu/stat/stata/ado/analysis
* It is not a part of the standard installation of Stata
hilo rstu salary, show(5)
histogram e, normal
* Test the residuals for normality. They are not normally distributed. See
* the sktest result below.
sktest e
* Produce a Box Plot of the residuals
graph box e
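* A quantile-normal plot is another visual check of normality. This is an
* optional addition to the histogram and box plot above.
qnorm e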
* One possible way to treat the outliers in the original fit is to look at
* the salary variable for unusual observations
* Notice that the salary variable is not normally distributed
histogram salary, normal
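* The steps that follow use lsalary. A minimal sketch, assuming lsalary may
* not already be in the data set: create it as the natural log of salary
* only if it does not exist yet.
capture confirm variable lsalary
if _rc != 0 {
    generate lsalary = ln(salary)
}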
* Let us try to address the outlier issue by taking the log of salary
* as the dependent variable. Notice there are still a few very large salaries.
* See the below histogram.
histogram lsalary, normal
* Now let us use lsalary for our subsequent dependent variable
regress lsalary roe
* Examine the largest and smallest of studentized residuals to
* determine if there are some strong outliers.
predict rstu_log, rstu
* The "hilo" command has to be downloaded from
* http://www.ats.ucla.edu/stat/stata/ado/analysis
* It is not a part of the standard installation of Stata
hilo rstu_log lsalary, show(5)
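* Before dropping anything, it can help to count how many non-missing
* studentized residuals fall outside the 2.33 cutoff used by the keep
* command below (an illustrative check only).
count if abs(rstu_log) >= 2.33 & !missing(rstu_log)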
* Let us eliminate the observations whose studentized residuals are 2.33 or
* greater in absolute value.
keep if abs(rstu_log) < 2.33
* Produce a histogram for the lsalary variable and superimpose a normal curve
* on the histogram with the outliers eliminated.
histogram lsalary, normal
* After deleting outliers the lsalary variable appears to be normally
* distributed. See the above histogram and the below sktest for normality.
sktest lsalary
* Produce a Box Plot for the lsalary variable
* Notice there are no outliers for the lsalary variable
graph box lsalary
*
* As it turns out the roe variable is highly significant and of the correct
* a priori sign in the lsalary regression without outliers.
* Retrieve the predicted values and the residuals of the lsalary regression
* without the outliers.
regress lsalary roe
predict lsalaryhat
predict loge, residual
* Plot the fitted regression
scatter lsalary roe || line lsalaryhat roe
* Plot the histogram of the residuals and compare to the normal distribution
histogram loge, normal
* Test the residuals for normality. They appear to be normally distributed.
sktest loge
* Plot the residuals against the roe variable as a visual check for
* heteroskedasticity in the residuals
scatter loge roe
* Here is a formal test for heteroskedasticity. There does not appear to
* be heteroskedasticity in the residuals of the regression according to
* the Breusch-Pagan test
regress lsalary roe
estat hettest
* Just for your information, if there had been heteroskedasticity in the
* residuals of the regression we could have gotten consistent estimates
* of the standard errors of the OLS estimates by using White's
* robust standard errors.
regress lsalary roe, vce(robust)
* Just for the fun of it let's do a multiple regression analysis of the ceo
* data, but first let's do a scatterplot matrix looking at the
* relationships between lsalary and the continuous variables roe, lsales and ros
graph matrix lsalary roe lsales ros
* Also let's do a multicollinearity check of the continuous explanatory
* variables roe lsales ros. Multicollinearity does not seem to pose a
* significant problem here. Also see the VIF report below. None of the VIFs
* are greater than 10.
pwcorr lsalary roe lsales ros, sig
* Now on to the multiple regression analysis
regress lsalary roe lsales ros indus finance consprod
estat hettest roe lsales ros
estat vif
* There seems to be heteroskedasticity in the residuals so
* let us proceed with OLS and heteroskedasticity robust (White's) standard
* errors.
regress lsalary roe lsales ros indus finance consprod, vce(robust)
* Using the backward selection technique we can eliminate the ros variable.
regress lsalary roe lsales indus finance consprod, vce(robust)
* Everything looks to be significant. This is our final equation.
*
* Interpretation of coefficients:
* Notice roe is measured in percent. A one percentage point increase
* (decrease) in roe gives rise (on average) to a 1.45 percent increase
* (decrease) in salary.
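* A 1% increase (decrease) in sales implies (on average) a 0.23% increase
* (decrease) in salary, i.e. an elasticity of about 0.23. The approximation
* is less accurate for large changes such as a doubling of sales.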
* Lowest CEO salaries are in the utilities and transportation industries.
* The highest CEO salaries are in the finance industry.
* If one goes from being a utilities CEO to an industrial CEO, one expects
* (on average) a 27% increase in salary.
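* The percentages above rely on the approximation 100*b for a log dependent
* variable. A minimal sketch of the exact calculations, assuming the last
* regression run is the final equation above:
* Exact percent change in salary from a utilities CEO to an industrial CEO
display 100*(exp(_b[indus]) - 1)
* Exact percent change in salary when sales double (a 100% increase)
display 100*(2^_b[lsales] - 1)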