*This code uses the data set from Chapter 8 of the Franses and Paap Book to conduct survival analysis. global group display feature featdispl insheet using "C:\data\CHAPTER8.csv" describe summarize *Generate the variable of the first difference of the log price gen logprice = ln(price) gen dlogprice = logprice - logprice[_n-1] if householdid - householdid[_n-1] ==0 drop if dlogprice==. *Scale some variables following the instruction on page 176 (line 3) in the Franses and Paap Book. replace nondetergexpend = nondetergexpend/100 replace volprevpurchase = volprevpurchase/32 *Set data as the Survival Analysis data *Note censdum=1 means that a failure event happened (in other words, this observation is not censored) stset interpurchtimes,failure(censdum) stdescribe *There are three popular ways to conduct Survival Analysis. *There are Non-Parametric, Semiparametric and Parametric. *We will show each of them in the following. ************************************************** ***********Non-Parametric Analysis**************** ************************************************** *obtain Kaplan-Meier survival estimates sts list *graph the Kaplan-Meier survival function. (i.e. graph the overall estimated survivor function for the data) sts graph *graph the Kaplan-Meier survival functions by group sts graph, by($group) *compare the survivor curves of (display==1) versus (display==0) sts graph,by(display) *compare the survivor curves of (feature==1) versus (feature==0) sts graph,by(feature) *compare the survivor curves of (featdispl==1) versus (featdispl==0) sts graph,by(featdispl) *plot the Nelson-Aalen cumulative hazard sts graph, cumhaz *plot an estimate of the hazard function sts graph, hazard *Test for equality of survival functions between groups sts test display,logrank sts test feature,logrank sts test featdispl,logrank *Test for equality of survival functions across all groups at once sts test $group **************************************************** *************Semi-Parametric Models***************** **************************************************** *Cox proportional hazards model - a Semiparametric Model (report the coefficients) *In this model, the baseline hazard function is unspecified. *In addition, this model does not have intercept because the intercept term is unidentified. stcox householdsize nondetergexpend volprevpurchase dlogprice display feature featdispl,nohr * Here we use the Proportional hazards test of the proportional-hazards assumption used in the Cox model estat phtest, detail *Cox proportional hazards model (report the hazard ratio) stcox householdsize nondetergexpend volprevpurchase dlogprice display feature featdispl *After we obtained the estimates for the Cox PH model, we can plot the estimates for the baseline cumulative hazard and survivor function *in order to check what the baseline hazard looks like. *Obtain the estimate of the baseline cumulative hazard predict H0,basechazard line H0 _t,c(J) sort *Obtain the estimated baseline survivor function predict S0,basesurv line S0 _t,c(J) sort ************************************************************************ ******************************Parametric Models************************* ************************************************************************ *************Part I: **************************************************** *************Parametric proportional hazards (PH) model****************** *Weibull Proportional Hazards Model (Weibull_PH), both coefficient and hazard rate forms streg householdsize nondetergexpend volprevpurchase dlogprice display feature featdispl,nohr dist(weibull) streg householdsize nondetergexpend volprevpurchase dlogprice display feature featdispl,dist(weibull) *Exponential Proportional Hazards Model, both coefficient and hazard rate forms streg householdsize nondetergexpend volprevpurchase dlogprice display feature featdispl,nohr dist(exponential) streg householdsize nondetergexpend volprevpurchase dlogprice display feature featdispl,dist(exponential) *************Part II: **************************************************** *************Parametric Accelerated Failure Time(AFT) model*************** *Weibull Accelerated Lifetime model (Table 8.2 in the Franses and Paap Book) *Note: the option "time" specifies that the model is in the Accelerated Failure Time (AFT) metric. *Note: the relationship between the coefficient in the following model (Weibull_AFT) *and the coefficient in the previous Weibull proportional hazards model (Weibull_PH) is that Beta_Weibull_AFT*P=Beta_Weibull_PH, *where P is the estimated parameter for the Weibull distribution. streg householdsize nondetergexpend volprevpurchase dlogprice display feature featdispl,dist(weibull) time *Log-Logistic AFT model *Note: in Stata, the loglogistic model is implemented only in the AFT form; *and loglogistic model has no natural PH interpretation. See An Introducution to Survival Analysis Using Stata by Cleves, et. al. * for more discussion on this point. streg householdsize nondetergexpend volprevpurchase dlogprice display feature featdispl,dist(loglogistic)