### Survival Analysis

Survival analysis concerns the investigation of time-to-event data and commonly arises in medical prognosis. For instance, one may be interested in predicting heart attacks or organ failure in individuals with specific characteristics. Survival analysis also allows investigating the extent to which particular factors contribute to the occurrence of these events. In this chapter, I will describe how survival analyses can be performed in R. First, however, I will explain what shape survival data takes and how it can be generated. The generated data can then serve as verifiable input for the subsequent steps.

Typical survival data contains a set of patient characteristics, for instance covariates such as age and gender, or the treatment group in a trial. We denote these characteristics as *X*. In addition, survival data contains information about event times, which may be represented by two variables indicating the reference time point (e.g. a surgery) and the time at which an event occurs (e.g. a heart attack). Often, however, these two time points are combined into a single duration *T*, denoting the time until the event. Bender et al. (2005) presented a technique to generate survival times based on a general formula that applies to exponential, Weibull and Gompertz distributions. This technique can be implemented in R as a function, called `generateSurvivalData` in the example below, that returns a covariate value and a simulated event time for each subject.
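A minimal sketch of such a function is given here. It is a hypothetical reconstruction, not necessarily the original implementation. It assumes a single binary covariate *x* (treatment arm, allocated 1:1) and draws event times by inverse-transform sampling: with *U* uniform on (0, 1), a Weibull baseline hazard with scale `lambda` and shape `v` gives *T* = (−log(*U*) / (`lambda` · exp(`beta` · *x*)))^(1/`v`). The `alpha` argument for the Gompertz case and the default values are my own assumptions.

```r
# Hypothetical sketch: simulate survival times under a Cox proportional
# hazards model using the inverse-transform formulas of Bender et al. (2005).
generateSurvivalData <- function(N, beta, lambda, v = 1, alpha = NULL,
                                 method = c("Weibull", "exponential", "Gompertz")) {
  method <- match.arg(method)

  # Assumption: one binary covariate (0 = control, 1 = treatment), 1:1 allocation
  x <- rbinom(N, size = 1, prob = 0.5)

  # Uniform draws that are transformed into event times
  u <- runif(N)

  # Subject-specific scale: baseline scale times the relative hazard exp(beta * x)
  scale <- lambda * exp(beta * x)

  if (method == "Gompertz" && is.null(alpha)) {
    stop("alpha must be supplied for the Gompertz distribution")
  }

  time <- switch(method,
    exponential = -log(u) / scale,
    Weibull     = (-log(u) / scale)^(1 / v),
    Gompertz    = (1 / alpha) * log(1 - alpha * log(u) / scale)
  )

  data.frame(x = x, time = time)
}

# Usage: 1000 subjects, hazard ratio 0.5, Weibull baseline
set.seed(1)
ds <- generateSurvivalData(N = 1000, beta = log(0.5), v = 1.45,
                           lambda = 0.07, method = "Weibull")
head(ds)
```

The function returns only the covariate and the event time, so an event indicator (and, later, censoring) has to be added separately.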

The following example generates a fictional trial in which a hazard ratio of 0.5 is assumed, i.e. at any given time, subjects in the treatment group die at half the rate of subjects in the control group.

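```r
library(survival)  # provides Surv() and survfit()

# Simulate 1000 subjects with a hazard ratio of 0.5 (Weibull baseline hazard)
dsI <- generateSurvivalData(N = 1000, beta = log(0.5), v = 1.45,
                            lambda = 0.07, method = "Weibull")

# Add an event indicator: all events are observed (no censoring)
dsU <- as.data.frame(cbind(dsI, 1))
colnames(dsU) <- c("x", "time", "status")

# Kaplan-Meier estimate per arm
sfit <- survfit(Surv(time, status) ~ x, data = dsU)

# ggkm() is not part of the survival package; it is assumed to be
# defined or loaded separately
ggkm(sfit, timeby = 5)
```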
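As a sanity check, one can verify that simulated data of this kind reproduce the assumed hazard ratio. The sketch below is self-contained and is not part of the original example. It assumes the Weibull inverse-transform formula with the same parameter values as above, a 1:1 binary treatment covariate, and no censoring, and it fits a Cox model with `coxph` from the `survival` package. The seed and variable names are arbitrary.

```r
library(survival)

set.seed(42)
n      <- 1000
beta   <- log(0.5)   # assumed log hazard ratio
lambda <- 0.07       # Weibull scale
v      <- 1.45       # Weibull shape

# Binary treatment indicator and Weibull event times (inverse transform)
x      <- rbinom(n, size = 1, prob = 0.5)
u      <- runif(n)
time   <- (-log(u) / (lambda * exp(beta * x)))^(1 / v)
status <- rep(1, n)  # all events observed

# Cox proportional hazards fit; exp(coef) estimates the hazard ratio
fit <- coxph(Surv(time, status) ~ x)
hr  <- exp(coef(fit))
ci  <- exp(confint(fit))

print(hr)
print(ci)
```

With 1000 fully observed events, the estimated hazard ratio should lie close to 0.5, although the exact value varies with the random seed.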

The resulting Kaplan-Meier curves for the treatment and control arms are shown below.

Censoring

Bender R, Augustin T, Blettner M. *Generating survival times to simulate Cox proportional hazards models*. Statistics in Medicine 2005; **24**: 1713-1723.