The Kaplan-Meier method (Kaplan & Meier, 1958), also known as the “product-limit method”, is a nonparametric method used to estimate the probability of survival past given time points (i.e., it calculates a survival distribution). Furthermore, the survival distributions of two or more groups of a between-subjects factor can be compared for equality.

For example, in a study on the effect of drug dose on cancer survival in rats, you could use the Kaplan-Meier method to understand the survival distribution (based on time until death) for rats receiving one of four different drug doses: “40 mg/m²/d”, “80 mg/m²/d”, “120 mg/m²/d” and “160 mg/m²/d”. Here, the survival time variable would be “time to death” and the between-subjects factor would be “drug dose”. You could then compare the survival distributions (experiences) of the four doses to determine if they are equal. Alternatively, you could use the Kaplan-Meier method to determine whether the (distribution of) time to failure of a knee replacement differs based on exercise impact amongst young patients (i.e., the survival time would be “time to knee replacement failure” and the between-subjects factor would be “exercise impact”, which has three groups: “sedentary”, “low impact” and “high impact”).
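
To make the idea of estimating "the probability of survival past given time points" concrete, the product-limit calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPSS Statistics procedure used in this guide: the function name `kaplan_meier`, the 0/1 coding of the event status and the six-case data set are all hypothetical. At each distinct event time, the running survival probability is multiplied by the proportion of at-risk cases that did not experience the event.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- survival times (any consistent unit)
    events -- 1 if the event occurred at that time, 0 if the case was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # events + censored cases leaving at each time
    at_risk = len(times)          # everyone is at risk at the start
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the share of at-risk cases surviving t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Cases censored at t counted as at risk at t, and are removed afterwards
        at_risk -= leaving[t]
    return curve

# Hypothetical data: six cases, time in days (1 = event, 0 = censored)
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Note that censored cases never lower the curve directly; they only shrink the number at risk for later event times.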

## Assumptions

In order to run a Kaplan-Meier analysis, there are six assumptions that must be met. These are:

• Assumption 1: The event status should consist of two mutually exclusive and collectively exhaustive states: “censored” or “event” (where the “event” can also be referred to as “failure”). The event status is mutually exclusive because a case is either censored or has experienced the event. It cannot be both. For example, imagine that we were interested in the survival times of people suffering from skin cancer, where the event is (sadly) “death”. If the length of the experiment was 5 years, at the end of the 5-year period, all participants would either be “censored” or “dead”. Therefore, the two states should not only be mutually exclusive, but also collectively exhaustive (i.e., at least one of these states, censored or event, must occur).
• Assumption 2: The time to an event or censorship (known as the “survival time”) should be clearly defined and precisely measured. The Kaplan-Meier method, unlike some other approaches to survival analysis (e.g., the actuarial approach), requires the survival time to be recorded precisely (i.e., exactly when the event or censorship occurred) rather than simply recording whether the event occurred within some predefined interval (e.g., only recording that a death or censorship occurred sometime within a 1-, 2-, 3-, 4- or 5-year follow-up). In addition, the survival time should be clearly defined, whether this is measured in days, weeks, months, years, or some other time-based measurement.
• Assumption 3: Where possible, left-censoring should be minimized or avoided. Left-censoring occurs when the starting point of an experiment is not easily identifiable. For example, imagine that we were interested in the survival times of people suffering from skin cancer. The “ideal” starting point would be to measure the survival time from the very moment that the participant developed skin cancer. However, it is more likely that the first time the participant knew they had cancer was the moment it was diagnosed, such that the “diagnosis” acts as the starting point for the experiment. Even if we restricted our sample to a “Stage 1” cancer diagnosis, there would still be differences between participants. For example, some participants may have had a suspicious mole that they did not get checked for some time, whilst other participants may have regular check-ups such that a diagnosis was made much earlier. Therefore, the time between the participant developing skin cancer and the diagnosis is unknown and is not included in the Kaplan-Meier analysis. The result is that this data (known as left-censored data) does not reflect the true survival time. Instead, the survival time recorded will be less than (or equal to) the true survival time. As such, the goal is to avoid left-censoring as much as possible.
• Assumption 4: There should be independence of censoring and the event. This means that the reason why cases are censored does not relate to the event. For example, imagine that we were again interested in the survival times of people suffering from cancer, where the event is “death”. For the assumption of independent censoring to be met, we need to be confident that when we record that a participant is “censored”, this is not because they were at greater risk of the event occurring (i.e., death being the “event” in this case). Instead, there may be many other reasons why a participant is “legitimately censored”, including: (a) natural dropout or withdrawal (e.g., perhaps because the participant does not want to take part in the experiment any more or moves from the area); and (b) the event not occurring by the end of the experiment (e.g., if the follow-up period for the experiment is 5 years, any participant still alive at this point will be recorded as “censored”). Independent censoring is important because the Kaplan-Meier method is based on observed data (i.e., observed events) and assumes that censored cases behave in the same way as uncensored cases after the point of censoring. However, if the censoring does relate to the event (e.g., a participant who was recorded as being censored died due to the cancer or perhaps even something related to the cancer), this introduces serious bias to the results (e.g., over-estimating 5-year survival rates from skin cancer amongst participants).
• Assumption 5: There should be no secular trends (also known as secular changes). A characteristic of many studies that involve survival analysis is that: (a) there is often a long time period between the start and end of the experiment; and (b) not all cases (e.g., participants) tend to start the experiment at the same time. For example, the starting point in our hypothetical experiment was when participants were “diagnosed” with skin cancer. However, imagine that we wanted a sample of 500 participants in our experiment. It may take a number of months to recruit all of these participants, who each would have different starting points (i.e., the dates when they were diagnosed), but we would “pool” the starting and subsequent times (e.g., everybody’s first diagnosis would be time point 0). However, if over this period of time, factors have changed that affect the likelihood of the event then we may introduce bias. For example, death rates for skin cancer may have gone down following the introduction of new drugs, improving survival rates amongst participants joining the experiment later on (i.e., increasing right-censoring). Alternately, the introduction of a national skin screening programme may have led to faster diagnoses, increasing the appearance of better survival rates (i.e., reducing left-censoring). These factors (e.g., new drugs or better screening) are examples of secular trends that can bias the results.
• Assumption 6: There should be a similar amount and pattern of censorship per group. One of the assumptions of the Kaplan-Meier method and the statistical tests for differences between group survival distributions (e.g., the log rank test, which we discuss much later in the guide) is that censoring is similar in all groups tested. This includes a similar “amount” of censorship per group and similar “patterns” of censorship per group. Failure to meet the assumption can lead to incorrect conclusions as will be discussed later (Bland & Altman, 2004; Hosmer et al., 2008; Norušis, 2012).
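
Two of these assumptions lend themselves to a quick data check before running the analysis: the two-state coding of the event status (Assumption 1) and the amount and pattern of censorship per group (Assumption 6). The following Python sketch is a hypothetical helper, not part of SPSS Statistics; the function name `censoring_summary`, the 0/1 coding and the example data are assumptions made for illustration. It rejects any status other than "censored" or "event" and reports, for each group, the proportion of censored cases and when the censoring occurred.

```python
from collections import defaultdict

def censoring_summary(groups, times, events):
    """Per group: number of cases, proportion censored and sorted censoring times."""
    by_group = defaultdict(list)
    for g, t, e in zip(groups, times, events):
        # Assumption 1: exactly two states, 0 = censored and 1 = event
        if e not in (0, 1):
            raise ValueError(f"event status must be 0 or 1, got {e!r}")
        by_group[g].append((t, e))

    summary = {}
    for g, cases in by_group.items():
        censored_times = sorted(t for t, e in cases if e == 0)
        summary[g] = {
            "n": len(cases),
            "prop_censored": len(censored_times) / len(cases),  # "amount"
            "censored_times": censored_times,                   # "pattern"
        }
    return summary

# Hypothetical data: two groups, time in months
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
times = [3, 7, 12, 12, 2, 5, 9, 12]
events = [1, 0, 1, 0, 1, 1, 0, 0]
for group, stats in censoring_summary(groups, times, events).items():
    print(group, stats)
```

Large differences between groups in the proportion censored, or censoring that clusters early in one group and late in another, would be a reason to look more closely at Assumption 6. What counts as "similar" is a judgment call that this sketch does not make for you.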

## Interpreting Results

After running the Kaplan-Meier test procedures in the previous section, SPSS Statistics will have generated a number of tables and graphs that contain all the information you need to report the results of your Kaplan-Meier analysis. We show you how to interpret these results and how to write up this output as you work through the section.

There are two stages to interpreting the results from a Kaplan-Meier analysis: (a) determining whether there are statistically significant differences between the survival distributions; and (b) if there are statistically significant differences between the survival functions, carrying out pairwise comparisons to determine where such differences are. To recap:

• First, you need to determine whether there are statistically significant differences between the survival functions: Before doing this, it is useful to interpret the plot of the (cumulative) survival functions for the groups of your between-subjects factor (e.g., the three groups of our between-subjects factor, intervention, which were: the “hypnotherapy programme”, “nicotine patch” and “e-cigarette”). To build on this plot and get another ‘feel’ for the results, it is a good idea to view the descriptive statistics that are produced, which illustrate how survival times vary between the groups. You can then use the SPSS Statistics output from the three statistical tests that can be run to determine whether the survival distributions are equal (i.e., the log rank test, Breslow test and Tarone-Ware test). Ultimately, the results from these tests will determine whether there are any statistically significant differences in survival distribution between the groups of your between-subjects factor.
• Second, if you have statistically significant differences between the survival functions, you can carry out pairwise comparisons: If you already ran the pairwise comparison procedure, you can go straight to interpreting the SPSS Statistics output from this. We will show you how to interpret the pairwise comparisons for the log rank test. This will tell you which of the groups of your between-subjects factor differed from each other (e.g., whether there was a difference in the survival distribution for those participants that underwent the hypnotherapy programme compared to those using nicotine patches).
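
To give a sense of what a pairwise log rank comparison computes, here is a minimal Python sketch of the standard two-group log rank statistic. It is an illustration under stated assumptions rather than a reproduction of the SPSS Statistics output: the function name `logrank_test`, the 0/1 event coding and the example data are hypothetical, and no adjustment for multiple comparisons is applied. At each event time, the observed number of events in the first group is compared with the number expected if both groups shared the same survival distribution.

```python
import math
from itertools import combinations

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi_square, p_value) on 1 degree of freedom."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e == 1}
        | {t for t, e in zip(times_b, events_b) if e == 1}
    )
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A just before t
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B just before t
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n        # observed minus expected events in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log rank statistic is undefined for these data")
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical data for three groups: (survival times, event status)
data = {
    "hypnotherapy": ([2, 4, 6, 8, 10], [1, 1, 1, 0, 1]),
    "nicotine patch": ([5, 9, 12, 14, 15], [1, 1, 0, 1, 0]),
    "e-cigarette": ([3, 5, 7, 11, 13], [1, 0, 1, 1, 1]),
}
for g1, g2 in combinations(data, 2):
    chi2, p = logrank_test(*data[g1], *data[g2])
    print(f"{g1} vs {g2}: chi-square = {chi2:.3f}, p = {p:.3f}")
```

Because several pairs are tested, the unadjusted p-values printed here would normally need some correction for multiple comparisons before being interpreted.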