Outcomes from clinical and other healthcare trials of most interest to patients and health systems are usually increases in the length and quality of life as a result of treatment. This poses a problem, because estimating overall survival (OS) with sufficient accuracy often requires long-term follow-up. OS may be extrapolated directly to encompass those patients who had not died by the end of the follow-up period. Although this is the preferred method of the United Kingdom's National Institute for Health and Care Excellence (NICE), its use is not always satisfactory (Reference Latimer2;Reference Latimer, Abrams and Lambert3). An alternative is to use progression-free survival (PFS) as a surrogate outcome. Trials with an adequate surrogate end point can be shorter and involve fewer patients, and can thus help to bring a new drug or treatment to the market sooner, or allow products to be brought to market where the costs of a trial using OS would not be justified by the expected returns. This is beneficial to patients and health systems, and improves returns to manufacturers.
However, these benefits are achieved at the expense of a less accurate measure of final outcome than would occur by waiting for data such as OS to become available. Thus there is a trade-off between time elapsed from the end of study (excluding follow-up) before information becomes available, and the accuracy of the information about the benefits of treatment. This study analyses the methodologies and challenges faced when using PFS as a surrogate for OS in oncology.
METHODS
Davis et al. (Reference Davis, Tappenden and Cantrell1) conducted a literature review on the use of surrogate end points in oncology up to the end of 2011. They identified 266 articles, using citation searching to identify relevant papers from an initial list of three papers already known to the authors. They said a systematic literature review was not feasible, because an exploratory search returned a very large number of references (over 3,000), and because any attempt to make the search more specific resulted in many relevant papers being excluded.
Davis et al. (Reference Davis, Tappenden and Cantrell1) included all reviews that examined a statistical relationship between OS and either PFS or time to progression (TTP) and considered any form of treatment where curing the disease was not expected. Nineteen key articles concerning the relationship between PFS/TTP and OS in advanced/metastatic cancer were included.
We updated the review conducted by Davis et al. (Reference Davis, Tappenden and Cantrell1) to 2016 using similar search methods and selection criteria to preserve as much comparability as possible, and using the nineteen papers identified by them as our key papers upon which to base our citation search. We considered only articles in which PFS was mentioned, and excluded those articles that analyzed only TTP. Davis et al. (Reference Davis, Tappenden and Cantrell1) did not include radiographic progression-free survival studies, and so neither did we. A previous follow-up of Davis et al. (Reference Davis, Tappenden and Cantrell1) was carried out by Ciani et al. (Reference Ciani, Davis and Tappenden4). Our analysis differs from that of Ciani et al. (Reference Ciani, Davis and Tappenden4) as our aim has additionally been to examine the statistical methodologies most commonly applied as well as the main challenges faced by authors when assessing the validity of PFS as a surrogate.
In August 2016, we conducted a Google Scholar citation search from January 2012 to June 2016, identifying a total of 790 articles which had cited any of the original nineteen key articles identified by Davis et al. (Reference Davis, Tappenden and Cantrell1). We applied four inclusion criteria: (i) mentioned PFS and OS in the title, or (ii) mentioned PFS as a surrogate (including surrogate + outcome/end point/measure), or (iii) analysis of possible surrogate measures in cancer, or (iv) analyzed end points for cancer. An additional seven articles were excluded because they are reviews of previous studies of PFS surrogacy. After applying a series of exclusion criteria (Figure 1), forty-eight articles were included in the analysis.
The nineteen papers reviewed by Davis et al. (Reference Davis, Tappenden and Cantrell1) and the forty-eight in this study were mostly summaries of many trials. Each trial summarized within a paper contributed a single aggregate pair of data points: the average/median PFS and the average/median OS. This pair was then used as one observation in an analysis that pooled all such pairs from the trials considered in the paper. The sample size for the estimated correlations or regression parameters therefore equals the number of trials considered by the paper. Such data are known as aggregated clinical trial data (ACTD); they ignore large amounts of data (all the data points of individuals within each trial). In both Davis et al. (Reference Davis, Tappenden and Cantrell1) and our paper, a minority of analyses used unaggregated source data. That is, each patient within the study has a pair of observations, and these pairs are brought together in a sample, the size of which is the number of patients, for estimating correlations and regression parameters. Such data are known as individual patient data (IPD). Information regarding methodology, data, and factors affecting the relationship between PFS and OS was extracted. Author affiliation and publication journal were also collected by K.H.V. The quality and accuracy of the extraction were verified by A.F.
RESULTS
The 2012–16 results from this review are very similar to those of Davis et al. (Reference Davis, Tappenden and Cantrell1). Davis et al. (Reference Davis, Tappenden and Cantrell1) usually found a positive correlation between PFS/TTP and OS for individual patients, individual trial arms and the treatment effect between trial arms: only 10.5 percent (2/19) of articles did not support the idea that PFS/TTP could be a useful surrogate for OS. However, the size of the correlation and its statistical significance varied considerably across studies, particularly between cancer types. The authors attributed this variation to the dissimilarities in patient characteristics from study to study, such as tumor type, line of therapy, and diversity of treatment methods.
We classified the results into three groups: (i) papers that explicitly mentioned that PFS is a good surrogate for OS, (ii) those that indicated that PFS could be a good surrogate only under certain conditions, and (iii) those that concluded that PFS is not a good surrogate for OS. Davis et al. (Reference Davis, Tappenden and Cantrell1) (7/12) and our (17/32) analysis both indicate that around 55 percent of the articles using ACTD from multiple trials support PFS surrogacy. Eight of the remaining fifteen articles in our study do not support surrogacy, while seven articles support PFS surrogacy only under particular conditions (e.g., treatment line).
The lack of IPD is evident if we consider that around 35 percent of the articles found both by Davis et al. (Reference Davis, Tappenden and Cantrell1) (7/19) and by this review (16/48) include IPD data (Table 1). Among the ten articles that used solely IPD in our review, four supported surrogacy, five did not and one supported surrogacy for treatments that have a major impact on PFS. Moreover, six of the ten IPD articles in our review were based on information collected in a single Japanese institution, which indicates that the conclusions should be viewed with caution (Reference Imai, Mori and Ono5–Reference Shitara, Matsuo, Muro, Doi and Ohtsu10).
Note. Source: Authors’ search.
a Excluding the seven articles that are reviews of previous studies.
b Three of the Davis et al. (Reference Davis, Tappenden and Cantrell1) studies include more than one cancer type.
c Correlation between PFS and OS or between PFS/TTP when the first was not reported.
ACTD, aggregated clinical trial data; AP, Appropriate surrogate; DPF, Depends on particular factors; IPD, individual patient data; NoAP, PFS is not an appropriate surrogate; NSCLC, non-small cell lung cancer; SCLC, small cell lung cancer
Methodologies and Statistical Results
In analyzing the relationship between PFS and OS, the most common approaches have been correlation measures such as Spearman, Pearson, or Kendall's τ (71 percent; 34/48) and weighted or unweighted linear regression (73 percent; 35/48) (Table 2), comparable to the findings of Davis et al. (Reference Davis, Tappenden and Cantrell1). Moreover, like Davis et al. (Reference Davis, Tappenden and Cantrell1), we found that many different variations in methodology have been applied, which makes it difficult to compare the results of studies (e.g., Aboshi et al. (Reference Aboshi, Kaneko and Narukawa11) and Bria et al. (Reference Bria, Massari and Maines12), Table 2).
Note. Source: authors’ research.
a Weighted linear regression.
b Value not significant at the 5% significance level.
c Factor which determines the validity of the PFS surrogacy.
Copula R², weighted (by trial size) correlation of the joint copula effects; EIV, errors in variables; HR, hazard ratio; NSCLC, non-small cell lung cancer; OS, overall survival; PFS, progression-free survival; PPS, postprogression survival; ROC, receiver operating characteristic; SCLC, small cell lung cancer; Sp, Spearman; Pe, Pearson; WLS R², weighted (by trial size) least squares regression of marginal Cox model effects.
Additionally, of the nine articles that used Pearson correlation, seven support surrogacy and two support surrogacy only under particular factors. However, from the twenty-one articles that used Spearman correlation only seven support surrogacy. This suggests an effect of the correlation test selected. Pearson correlations are affected by outliers more strongly than Spearman correlations; the use of the Pearson correlation without considering outliers could lead to misleading conclusions.
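The sensitivity of the two coefficients to a single outlying trial can be illustrated with a minimal sketch. The trial values below are hypothetical and are not taken from any reviewed article; the helper names `pearson`, `ranks`, and `spearman` are ours, written in plain Python so that the calculation is visible.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical median PFS and OS (months) for eight trials with no clear trend.
pfs = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
os_ = [12.0, 10.0, 13.0, 9.0, 12.5, 10.5, 11.0, 12.0]

# The same trials plus one hypothetical outlying trial.
pfs_out = pfs + [15.0]
os_out = os_ + [30.0]

print("without outlier:", pearson(pfs, os_), spearman(pfs, os_))
print("with outlier:   ", pearson(pfs_out, os_out), spearman(pfs_out, os_out))
```

With these made-up numbers both coefficients are close to zero before the extra trial is added; afterwards the Pearson coefficient rises to roughly 0.9 while the Spearman coefficient stays near 0.3. A single extreme trial can therefore be enough to change the apparent strength of the association when Pearson correlation is used.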
A common practice for testing whether the surrogate is capable of predicting the clinical end point is to analyze the relationship between the actual values of the PFS and OS. Those articles that include ACTD mostly use median PFS and median OS, but in some cases the estimation uses a logarithmic transformation of the variables (Reference Giessen, Laubender and Ankerst27). In the case of IPD articles, the relationship between the actual values per patient of PFS and OS is used to estimate the predictive capacity of PFS.
Whether the effect of treatment on PFS predicts the effect of treatment on OS is explored through the analysis of ACTD. Here it is common to compare hazard ratios (HRs); however, we also identified articles in which the differences in median PFS and OS were examined (Reference Delea, Khuu, Heng, Haas and Soulieres13;Reference Adunlin, Cyrus and Dranitsaris33;Reference Ciani, Buyse, Garside, Peters, Saad and Stein41).
By comparing the distribution of the observations by Surrogacy (Appropriate surrogate [AP]; Depends on particular factors [DPF]; PFS is not an appropriate surrogate [NoAP]) (Table 1) with their distribution by Type of data (ACTD, IPD, or Both), we observe a significant relationship (Fisher's exact test, p = 0.039; Pearson's chi-squared test, p = 0.043) between Surrogacy and Type of data, which suggests that the availability of IPD affects the final conclusion.
Surrogate Threshold Effect
Surrogate threshold effect (STE) is defined as the minimum treatment effect on the surrogate necessary to predict a nonzero effect on the true end point (Reference Mauguen, Pignon and Burdett47). It is normally presented as an HR. For instance, an STE equal to 0.8 means that a PFS HR smaller than 0.8 would need to be observed to predict an OS HR of less than 1.0. This concept has the advantage of not giving a yes or no answer to the question of surrogacy, but rather a threshold for the treatment effect on PFS that, if achieved, would indicate that a statistically significant effect of the treatment on OS can be predicted. STE has been used relatively more frequently since 2012. In Davis et al. (Reference Davis, Tappenden and Cantrell1), STE was reported in only two of nineteen papers; we identified it in eleven of forty-eight articles (including five of the six articles that included both IPD and ACTD). However, it is not clear whether STE is affecting authors’ final recommendations. For instance, Foster et al. (Reference Foster, Renfro and Schild49) and Shi et al. (Reference Shi, De Gramont and Grothey50) supported PFS surrogacy with an STE for PFS HR smaller than 0.70 while Ciani et al. (Reference Ciani, Buyse, Garside, Peters, Saad and Stein41) rejected surrogacy with an STE of 0.80 (Table 3).
Note. Source: authors’ research.
a “Both” means ACTD together with IPD.
ACTD, aggregated clinical trial data; HR, hazard ratio; IPD, individual patient data; NSCLC, non-small cell lung cancer; OS, overall survival; PFS, progression-free survival; SCLC, small cell lung cancer; STE, surrogate threshold effect.
b The ratios of the median between the control and experimental arms in each trial were used to summarise treatment effects because the HRs were not always available. STE corresponds to the upper 95% limit and a median OS ratio equal to 1.
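To make the STE concept concrete, the following is a minimal sketch of one way such a threshold could be obtained from trial-level data. It makes several assumptions that are ours and not those of any reviewed article: an unweighted ordinary least squares fit of log OS HR on log PFS HR, a two-sided 95 percent prediction band built with a normal quantile rather than a t quantile, and ten hypothetical trials. The function names are illustrative only.

```python
import math
from statistics import NormalDist

def fit_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    resid_sd = math.sqrt(sse / (n - 2))
    return intercept, slope, resid_sd, mx, sxx, n

def upper_prediction_limit(x0, fit, level=0.95):
    """Upper limit of the prediction interval for log OS HR at log PFS HR = x0.

    Simplifying assumption: a normal quantile is used instead of a t quantile.
    """
    intercept, slope, resid_sd, mx, sxx, n = fit
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * resid_sd * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
    return intercept + slope * x0 + half_width

def surrogate_threshold_effect(fit, lo=math.log(0.2), hi=0.0):
    """PFS HR at which the upper prediction limit for the OS HR equals 1.

    Found by bisection on the log scale; returns None if the limit does not
    cross zero between the two starting points.
    """
    if not (upper_prediction_limit(lo, fit) < 0 < upper_prediction_limit(hi, fit)):
        return None
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_prediction_limit(mid, fit) < 0:
            lo = mid
        else:
            hi = mid
    return math.exp(lo)

# Hypothetical trial-level hazard ratios (not taken from any reviewed article).
pfs_hr = [0.50, 0.55, 0.62, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.05]
os_hr = [0.68, 0.74, 0.72, 0.83, 0.80, 0.90, 0.88, 0.97, 0.95, 1.04]

fit = fit_ols([math.log(h) for h in pfs_hr], [math.log(h) for h in os_hr])
ste = surrogate_threshold_effect(fit)
print("STE (PFS HR):", ste)
```

Under these assumptions, a new trial whose observed PFS HR is below the printed value would have an upper prediction limit for the OS HR below 1. Published analyses typically weight trials by size and account for estimation error in the HRs, which this sketch does not attempt.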
Weaknesses of the Current Approaches
Lack of Rigor in Applying Methodology
Weighted linear regression, the most frequently used method of analysis, is based on assumptions that are not tested in the majority of articles when analyzing surrogacy. In only a few such cases was the type of model mentioned (Reference Li, Liu, Gu and Wang14). Exceptions are Félix et al. (Reference Félix, Aragão and Almeida16), who use the generalized method of moments to control for heteroscedasticity, and Johnson et al. (Reference Johnson, Liauw and Lassere35), whose results showed unsatisfactory diagnostics with non-normality and heteroscedasticity in the residuals.
Although the assumptions of linear regression may be considered too well known to need reporting, their absence leaves many analyses open to the suggestion that complications, such as the presence of outliers, were not handled. Clear outliers are shown in Yoshino et al. (Reference Yoshino, Imai and Mori9) and Moriwaki et al. (Reference Moriwaki, Yamamoto and Gosho38). Only 23 percent (11/48) of studies consider or mention outliers. In five of the eleven cases, authors test the sensitivity of the results by applying a “leave-one-out” strategy (Reference Chen, Sun and Chen26;Reference Mauguen, Pignon and Burdett47;Reference Foster, Renfro and Schild49;Reference Michiels, Pugliano and Marguet52). The six remaining articles test the sensitivity of the results by excluding those trials that are considered outliers (Reference Delea, Khuu, Heng, Haas and Soulieres13;Reference Félix, Aragão and Almeida16;Reference Flaherty, Hennig and Lee23;Reference Singh, Wang and Law24;Reference Petrelli and Barni32;Reference Özer-Stillman, Strand, Chang, Mohamed and Tranbarger-Freier36).
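A leave-one-out check of a regression weighted by trial size can be sketched as follows. The data are hypothetical, with the last trial deliberately placed off the trend, and the function names `weighted_fit` and `leave_one_out` are ours; the reviewed articles may have implemented the strategy differently.

```python
def weighted_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, weighted R^2).
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

def leave_one_out(x, y, w):
    """Refit the weighted regression once per trial, omitting that trial."""
    fits = []
    for i in range(len(x)):
        keep = [j for j in range(len(x)) if j != i]
        fits.append(weighted_fit([x[j] for j in keep],
                                 [y[j] for j in keep],
                                 [w[j] for j in keep]))
    return fits

# Hypothetical median PFS and OS (months) and trial sizes used as weights.
median_pfs = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
median_os = [10.0, 12.2, 13.8, 16.1, 18.0, 12.0]  # last trial is off-trend
trial_size = [200, 150, 300, 250, 180, 120]

full = weighted_fit(median_pfs, median_os, trial_size)
loo = leave_one_out(median_pfs, median_os, trial_size)

print("all trials: slope = %.2f, R^2 = %.2f" % (full[1], full[2]))
for i, (_, slope, r2) in enumerate(loo):
    print("omit trial %d: slope = %.2f, R^2 = %.2f" % (i, slope, r2))
```

If the slope or the weighted R² changes markedly when one particular trial is omitted, as it does for the last trial in this constructed example, that trial is driving the result and the conclusion on surrogacy deserves closer scrutiny.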
Publication bias could also have a significant impact on the results, particularly for ACTD. Of the thirty-one articles that include systematic literature reviews, eleven mentioned publication bias as a possible limitation of the study while an additional six articles included a step to overcome the possible bias. Among the six, some articles considered both published and unpublished clinical trials (Reference Shitara, Ikeda and Yokota15;Reference Hotta, Suzuki and Di Maio30;Reference Shitara, Matsuo, Muro, Doi and Ohtsu40); the others analyzed the extent to which bias represents a problem using Egger's regression test (Reference Delea, Khuu, Heng, Haas and Soulieres13;Reference Singh, Wang and Law24;Reference Ciani, Buyse, Garside, Peters, Saad and Stein41).
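For readers unfamiliar with Egger's regression test, a minimal sketch is given below. It regresses each trial's standardized effect (effect divided by its standard error) on its precision (the reciprocal of the standard error); an intercept far from zero suggests funnel-plot asymmetry. The log hazard ratios and standard errors are hypothetical and were constructed so that smaller trials show more favorable effects; the function name `egger_test` is ours.

```python
import math

def egger_test(effects, std_errors):
    """Egger's regression: standardized effect regressed on precision.

    Returns (intercept, standard error of the intercept, t statistic).
    The t statistic would be compared with a t distribution on n - 2
    degrees of freedom.
    """
    precision = [1.0 / se for se in std_errors]
    standardized = [e / se for e, se in zip(effects, std_errors)]
    n = len(effects)
    mx = sum(precision) / n
    my = sum(standardized) / n
    sxx = sum((p - mx) ** 2 for p in precision)
    sxy = sum((p - mx) * (z - my) for p, z in zip(precision, standardized))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((z - intercept - slope * p) ** 2
              for p, z in zip(precision, standardized))
    resid_sd = math.sqrt(sse / (n - 2))
    se_intercept = resid_sd * math.sqrt(1.0 / n + mx ** 2 / sxx)
    return intercept, se_intercept, intercept / se_intercept

# Hypothetical log hazard ratios and their standard errors for six trials.
log_hr = [-0.195, -0.2575, -0.296, -0.355, -0.391, -0.512]
std_err = [0.10, 0.15, 0.20, 0.25, 0.30, 0.40]

intercept, se_intercept, t_stat = egger_test(log_hr, std_err)
print("Egger intercept:", intercept, "SE:", se_intercept, "t:", t_stat)
```

In this constructed example the intercept is clearly negative, which is the pattern expected when small trials with unfavorable results are missing. With the small numbers of trials typical of surrogacy analyses, the test has limited power, so a non-significant intercept does not rule out bias.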
Apparently Inconsistent Conclusions
The German Institute for Quality and Efficiency in Health Care (IQWiG) framework considers “low correlation” to be present when the upper limit (95 percent confidence interval) of the correlation is under 0.70 (Pearson R² smaller than 0.49) (53). Two of the five articles concerning colorectal cancer that support surrogacy have correlation values lower than 0.70 (Reference Petrelli and Barni19;Reference Shi, De Gramont and Grothey50). For renal carcinoma, Delea et al. (Reference Delea, Khuu, Heng, Haas and Soulieres13) and Halabi et al. (Reference Halabi, Rini, Escudier, Stadler and Small44) support PFS as a surrogate, despite values of association lower than 0.7 (Table 2).
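The "low correlation" rule described above can be sketched in a few lines. The threshold of 0.70 comes from the text; the use of the Fisher z-transformation to obtain the confidence interval is our assumption (it presumes approximate bivariate normality), and the correlation values and number of trials are hypothetical.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation with n observations (here, trials).
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - q * se), math.tanh(z + q * se)

def is_low_correlation(r, n, threshold=0.70):
    """True when the upper confidence limit falls below the threshold."""
    _, upper = correlation_ci(r, n)
    return upper < threshold

# Hypothetical examples, each based on 30 trials.
for r in (0.30, 0.60):
    lower, upper = correlation_ci(r, 30)
    print("r = %.2f, 95%% CI = (%.2f, %.2f), low correlation: %s"
          % (r, lower, upper, is_low_correlation(r, 30)))
```

The sketch shows that the classification depends on the number of trials as well as on the point estimate: with few trials the interval is wide, so a modest correlation may not be classified as low even though it gives little support for surrogacy.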
Petrelli and Barni (Reference Petrelli and Barni20) stand out because they support PFS surrogacy despite particularly low values of R² (equal to 0.00) and of correlation (treatment effect correlation 0.59 and actual values correlation 0.26). They observed a weak correlation between PFS and OS for NSCLC but still supported PFS as a surrogate for OS, a decision that appears to be influenced by the slope of the linear regression, which suggests that a 1-month gain in PFS will be linked to 3 weeks’ prolongation in OS. However, the reasons for concluding that the surrogacy is supported remain unclear.
Table 2 shows inconsistencies between IPD and aggregate approaches (Reference Shitara, Matsuo, Muro, Doi and Ohtsu40;Reference Terashima, Yamashita and Takata42). Foster et al. (Reference Foster, Renfro and Schild49), based on ACTD, supported surrogacy, while IPD data from the same trial were less conclusive.
Finally, it is possible that different studies analyze PFS surrogacy for a particular cancer type by including a similar list of clinical trials. However, it is not possible to observe whether similar lists of clinical trials could result in different conclusions because the references of the included trials are not mentioned in some of the papers (Reference Li, Liu, Gu and Wang14).
Challenges for Analyzing PFS as a Surrogate of OS
Based on the variables included as part of sensitivity analyses or that have been included in multivariate analyses in the papers included in this review, we identify a group of factors that could affect the relationship between PFS and OS and appear in at least five studies.
Type of Treatment and/or Therapy
The literature suggests that the relationship between PFS and OS can be different within the same cancer trial depending on the treatment applied or the therapy selected (Reference Aboshi, Kaneko and Narukawa11–Reference Beauchemin, Cooper, Lapierre, Yelle and Lachaine22;Reference Chen, Sun and Chen26;Reference Petrelli and Barni32;Reference Adunlin, Cyrus and Dranitsaris33;Reference Johnson, Liauw and Lassere35;Reference Özer-Stillman, Strand, Chang, Mohamed and Tranbarger-Freier36;Reference Moriwaki, Yamamoto and Gosho38;Reference Shitara, Matsuo, Muro, Doi and Ohtsu40–Reference Terashima, Yamashita and Takata42;Reference Mauguen, Pignon and Burdett47;Reference Foster, Renfro and Schild49;Reference Shi, De Gramont and Grothey50).
Treatment Line
In some cases, the analysis cannot validate the surrogacy for first line therapy, as distinct from the second or third line therapy, mainly because postprogression survival obscures the results of the first line treatments. Petrelli and Barni (Reference Petrelli and Barni32), who analyzed twenty first-line clinical trials, proposed that the decrease in correlation between PFS HR and OS HR observed in recent years is likely to be due to the influence of postprogression treatments (Reference Li, Liu, Gu and Wang14;Reference Sidhu, Rong and Dahlberg21;Reference Beauchemin, Cooper, Lapierre, Yelle and Lachaine22;Reference Cartier, Zhang and Rosen25;Reference Chen, Sun and Chen26;Reference Petrelli, Coinu, Borgonovo, Cabiddu and Barni28;Reference Adunlin, Cyrus and Dranitsaris33;Reference Hotta, Kato and Leighl34;Reference Özer-Stillman, Strand, Chang, Mohamed and Tranbarger-Freier36;Reference Shitara, Matsuo, Muro, Doi and Ohtsu40;Reference Ciani, Buyse, Garside, Peters, Saad and Stein41;Reference Foster, Renfro and Schild49;Reference Michiels, Pugliano and Marguet52).
Year of the Trial
The importance of the year in which the clinical trial was conducted or published was explained by the number of drugs available having increased (eleven) and because the criteria applied to measure progression have changed (e.g., RECIST published in 2000 was modified in 2010 to mRECIST (Reference Lencioni and Llovet54)) (Reference Aboshi, Kaneko and Narukawa11;Reference Delea, Khuu, Heng, Haas and Soulieres13;Reference Shitara, Ikeda and Yokota15;Reference Félix, Aragão and Almeida16;Reference Han, Ren and Wick18;Reference Petrelli and Barni19;Reference Beauchemin, Cooper, Lapierre, Yelle and Lachaine22;Reference Petrelli, Coinu, Borgonovo, Cabiddu and Barni28;Reference Kawakami, Okamoto and Hayashi31–Reference Hotta, Kato and Leighl34;Reference Shitara, Matsuo, Muro, Doi and Ohtsu40).
Sub-group of Patients or Tumor Type
As in Davis et al. (Reference Davis, Tappenden and Cantrell1), the results from the validation of PFS as a surrogate end point for OS vary substantially between cancer types (Reference Shitara, Matsuo, Muro, Doi and Ohtsu10;Reference Aboshi, Kaneko and Narukawa11;Reference Félix, Aragão and Almeida16;Reference Han, Ren and Wick18;Reference Sidhu, Rong and Dahlberg21;Reference Chen, Sun and Chen26;Reference Petrelli and Barni32;Reference Hotta, Kato and Leighl34;Reference Petrelli and Barni39;Reference Michiels, Pugliano and Marguet52). Six of sixteen articles for lung cancer conclude that PFS is not an appropriate surrogate for OS, and consistency does not improve when we consider the line of therapy and the phase of the clinical trial. This might be related to the fact that the criteria for supporting surrogacy differ considerably between studies. Additionally, it might not be just the observed relationship that is changing between studies. This suggests a need to standardize criteria.
Definition of PFS and Other Measures
Disease progression is often defined differently between clinical trials. This heterogeneity pertains to the period within which patients are evaluated; time intervals between radiological and clinical assessments; and what constitutes patient progression (e.g., variation of the size of the tumor, and tumor characteristics). There are many other forms of heterogeneity, some of which have been mentioned in four sub-sections above. The inclusion of Phase II trials is also likely to increase heterogeneity. This occurs because at most one arm of a dose-determining trial should be featured in a subsequent Phase III trial. Arms in which patients are over-dosed and on average die sooner than those in the control group should not feature in Phase III trials. Their inclusion in papers that measure the correlation between PFS and OS will reduce the estimated correlation.
Twenty-two articles mentioned the problem of heterogeneity as a limitation, but did not adjust the methodology in response to the problem. A further seven studies adjusted the methodology (Reference Han, Ren and Wick18;Reference Flaherty, Hennig and Lee23;Reference Chen, Sun and Chen26;Reference Kawakami, Okamoto and Hayashi31–Reference Adunlin, Cyrus and Dranitsaris33;Reference Terashima, Yamashita and Takata42), two of which included only clinical trials that have the same set of progression criteria (RECIST criteria) (Reference Flaherty, Hennig and Lee23;Reference Terashima, Yamashita and Takata42). Three of the seven included variables such as presence of measurable lesions and tumor response in sensitivity analysis (Reference Han, Ren and Wick18;Reference Kawakami, Okamoto and Hayashi31;Reference Adunlin, Cyrus and Dranitsaris33). Finally, two of the seven studies used established definitions to extract the information collected from the clinical trials, regardless of the terminology used by the original authors (Reference Chen, Sun and Chen26;Reference Petrelli and Barni32).
Additionally, nineteen of the thirty-two studies based on trial data combined PFS and TTP into a single surrogate measure. In addition to progression, PFS includes death as a result of any cause while in the case of TTP the event of interest is only disease progression, although some authors consider, as part of TTP, deaths caused by the disease in question (e.g., Burzykowski et al. [Reference Burzykowski, Buyse and Piccart-Gebhart55] identified by Davis et al. [Reference Davis, Tappenden and Cantrell1]). All-cause mortality can dilute the association between PFS/TTP and OS. Nine of the nineteen articles that include PFS and TTP analyze the sensitivity of the results by breaking down the articles into those that measure PFS and those that measure TTP (Reference Delea, Khuu, Heng, Haas and Soulieres13;Reference Shitara, Ikeda and Yokota15;Reference Beauchemin, Cooper, Lapierre, Yelle and Lachaine22;Reference Cartier, Zhang and Rosen25;Reference Kawakami, Okamoto and Hayashi31–Reference Adunlin, Cyrus and Dranitsaris33;Reference Moriwaki, Yamamoto and Gosho38;Reference Terashima, Yamashita and Takata42). In contrast to what we would expect, Delea et al. (Reference Delea, Khuu, Heng, Haas and Soulieres13), Petrelli and Barni (Reference Petrelli and Barni32), and Shitara et al. (Reference Shitara, Ikeda and Yokota15) found that studies that include PFS have a higher correlation with OS compared with studies that include TTP. However, Moriwaki et al. (Reference Moriwaki, Yamamoto and Gosho38) found a slightly lower correlation when TTP trials were excluded.
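The difference between the two end points can be made explicit with a small sketch of how an event time and event indicator might be derived for one patient. It uses the stricter definition of TTP given above, in which only progression counts as an event and death without progression is treated as censoring; as noted, some authors also count disease-related deaths. The function names and the use of `None` for "not observed" are our own conventions.

```python
from typing import Optional, Tuple

def pfs_endpoint(progression: Optional[float], death: Optional[float],
                 last_followup: float) -> Tuple[float, bool]:
    """PFS: the event is progression or death from any cause, whichever is first.

    Returns (time in months, event observed).
    """
    times = [t for t in (progression, death) if t is not None]
    if times:
        return min(times), True
    return last_followup, False

def ttp_endpoint(progression: Optional[float], death: Optional[float],
                 last_followup: float) -> Tuple[float, bool]:
    """TTP: the event is progression only; death without progression censors."""
    if progression is not None and (death is None or progression <= death):
        return progression, True
    if death is not None:
        return death, False
    return last_followup, False

# A hypothetical patient who died at 5 months without documented progression.
print(pfs_endpoint(None, 5.0, 12.0))  # counted as a PFS event
print(ttp_endpoint(None, 5.0, 12.0))  # censored for TTP
```

The two end points coincide for patients whose progression is documented before death and differ only for those who die without documented progression, so pooling PFS and TTP trials mixes two definitions whose gap depends on how common such deaths are in each trial.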
Geographical Context (Reference Aboshi, Kaneko and Narukawa11;Reference Li, Liu, Gu and Wang14;Reference Shitara, Ikeda and Yokota15;Reference Petrelli, Coinu, Borgonovo, Cabiddu and Barni28;Reference Adunlin, Cyrus and Dranitsaris33;Reference Shitara, Matsuo, Muro, Doi and Ohtsu40)
A reason given to explain geographical differences in trial results is the variation in comparator (i.e., standard) treatments between Asian and occidental countries. In addition, in advanced gastric cancer, Shitara et al. (Reference Shitara, Ikeda and Yokota15) pointed out several differences in tumor characteristics and practice patterns (e.g., surgery and chemotherapy) that have been identified between Asian and occidental countries.
Crossover (Reference Delea, Khuu, Heng, Haas and Soulieres13;Reference Flaherty, Hennig and Lee23;Reference Hotta, Suzuki and Di Maio30;Reference Adunlin, Cyrus and Dranitsaris33;Reference Ciani, Buyse, Garside, Peters, Saad and Stein41)
Some clinical trials allow crossover to the experimental regimen upon disease progression. This hinders the analysis of the treatment effect on OS. Eighteen of the thirty-two articles that use ACTD mentioned crossover, while six articles considered it during the estimation. For renal cell carcinoma, Delea et al. (Reference Delea, Khuu, Heng, Haas and Soulieres13) indicate that the link between the effect of the treatment on PFS and the effect of the treatment on OS was stronger in studies that did not allow crossover. In melanoma, Flaherty et al. (Reference Flaherty, Hennig and Lee23) suggested that correlation coefficients for the nine trials without crossover were significant and more than 7 percentage points higher than with crossover. Hotta et al. (Reference Hotta, Suzuki and Di Maio30), in studying NSCLC, suggest that for clinical trials in which the median proportion of crossover was lower than 1 percent, the association between the HRs of PFS and OS was strong.
Kim and Prasad (Reference Kim and Prasad56), identified by Davis et al. (Reference Davis, Tappenden and Cantrell1), evaluated previous publications to assess the strength of the surrogate-survival correlation among cancer drugs approved. They found no significant differences in survival benefit between clinical trials with or without crossover. They suggest that the results are opposed to the commonly shared idea that crossover masks OS benefits, possibly because crossover prevents observation of late toxicity. Contrary to other studies, in an analysis of colorectal cancer trials, Adunlin et al. (Reference Adunlin, Cyrus and Dranitsaris33) found that among crossover trials the strength of the association between PFS and OS was higher.
Characteristics of Postprogression Survival
Characteristics of postprogression survival (treatment line 1st/2nd/3rd, year of the clinical trial, crossover between control and treatment arms, newly diagnosed vs recurrent, and sub-group of patients) and the fact that a substantial number of articles analyzed postprogression survival (PPS) together with PFS suggest that PPS has a role in the discussion of the validation of PFS as a surrogate of OS. Amir et al. (Reference Amir, Seruga, Kwong, Tannock and Ocaña29) indicate that when PPS is short, the correlation between OS and PFS is higher than when PPS is long.
For patients with advanced NSCLC, Suzuki et al. (Reference Suzuki, Hirashima and Okamoto37) identified the optimal point of correlation of the HR for PFS and the HR for OS by analyzing PPS at 1-month intervals. They found that the correlation between the HR for PFS and for OS increases for a PPS of less than 6 months and then decreases (<4 months 0.70; <6 months 0.77; <9 months 0.46). Of the sixteen articles that analyzed PPS, thirteen suggest that the relationship between OS and PPS is stronger than between OS and PFS and one of the remaining three pointed out a high correlation between PFS and PPS.
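One way to see why a short PPS would be expected to strengthen the PFS–OS correlation is the following simplified decomposition. It assumes that OS is the sum of PFS and PPS for each patient and ignores censoring; the symbols \sigma_P and \sigma_S (standard deviations of PFS and PPS) and \rho (correlation between PFS and PPS) are our notation and do not come from the reviewed articles.

```latex
\begin{align*}
\mathrm{OS} &= \mathrm{PFS} + \mathrm{PPS},\\
\operatorname{Cov}(\mathrm{PFS},\mathrm{OS}) &= \sigma_P^{2} + \rho\,\sigma_P\sigma_S,\\
\operatorname{Var}(\mathrm{OS}) &= \sigma_P^{2} + \sigma_S^{2} + 2\rho\,\sigma_P\sigma_S,\\
\operatorname{Corr}(\mathrm{PFS},\mathrm{OS}) &= \frac{\sigma_P + \rho\,\sigma_S}{\sqrt{\sigma_P^{2} + \sigma_S^{2} + 2\rho\,\sigma_P\sigma_S}}.
\end{align*}
% Special case \rho = 0 (PFS and PPS uncorrelated):
\[
\operatorname{Corr}(\mathrm{PFS},\mathrm{OS}) = \frac{1}{\sqrt{1 + \sigma_S^{2}/\sigma_P^{2}}}.
\]
```

In the uncorrelated case the correlation falls as the variability of PPS grows relative to that of PFS. If a long PPS also tends to be a more variable PPS, which is an assumption rather than a result of this review, the decomposition is consistent with the pattern reported by Amir et al.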
A group of Japanese researchers specializing in a study of factors that affect the relationship between PPS and OS (Reference Imai, Mori and Ono5–Reference Shitara, Matsuo, Muro, Doi and Ohtsu10) suggests that the significant factors to explain the effect of PPS on OS are: (i) number of regimens used after progression, (ii) response to the second or third-line treatment, (iii) performance status at progression, (iv) PFS of first line chemotherapy, (v) tumor stage after initial treatment, (vi) presence of distant metastases at recurrence.
DISCUSSION
The percentage of articles that conclude that PFS is an appropriate surrogate for OS (52 percent) is higher than the percentage of those that do not support surrogacy (25 percent). An additional 23 percent of the articles suggest that surrogacy depends on factors such as the length of the PPS and whether the treatment was first or subsequent line. In such a complicated area, it is no wonder that simple rules of thumb for determining whether a surrogate end point can replace OS will not work in all situations. Additionally, it seems that different investigators use different rules of thumb in the same circumstances.
The first set of criteria to establish whether a surrogate would be an adequate replacement for OS was proposed by Prentice (Reference Prentice57). His criteria have been amended and elaborated since then. Ciani et al. (Reference Ciani, Davis and Tappenden4) summarized three different frameworks that are currently applied to validate the strength of the evidence (IQWiG, Biomarker-Surrogacy Evaluation Schema, and Elston and Taylor's frameworks). All include the Prentice (Reference Prentice57) criteria, but also analyze factors that could influence the strength of the relationship, such as the quality of the data and characteristics of the clinical trial.
Ciani et al. (Reference Ciani, Buyse and Drummond58) highlight three further conditions. First, the strength of the association between the surrogate end point and the final outcome should be measured through approaches such as regression and meta-analysis. Second, it is necessary that the effect on the final outcome can be predicted and quantified based on the effect on the surrogate. The effect of the treatment on PFS must be large enough to predict an improvement in OS. Third, the level of evidence supporting the relationship between the surrogate end point and the desired outcome needs to be considered. A strong correlation should be observed between the surrogate and the end point based on individual patient data as well as between the treatment effect on the surrogate end point and the final outcomes across multiple randomized trials.
Similarly, Buyse et al. (Reference Buyse, Molenberghs, Burzykowski, Renard and Geys59) propose that validating a surrogate end point requires analysis of both individual and trial level data. ACTD are important for testing the relationship between the treatment effect on PFS and the treatment effect on OS, whereas IPD allow analysis of the relationship between the actual value of PFS and the actual value of OS. Yet despite the existence of recognized surrogate validation criteria, which have been developed over time, no consistent application of these criteria was observed in the studies we reviewed; different authors made different, often arbitrary, assessments of surrogate adequacy.
The lack of any substantial proportionate increase in the number of articles including IPD between the Davis et al. (Reference Davis, Tappenden and Cantrell1) study and our analysis suggests that progress in this field is being hampered by the unwillingness of most pharmaceutical companies to provide IPD to independent and well-qualified researchers for analysis, or even to report analyses based on IPD when those data have routinely been collected. Nevertheless, some progress has been observed. For example, firms have joined recent initiatives such as clinicalstudydatarequest.com and Project Data Sphere, which allow researchers to analyze pooled IPD sets.
The heterogeneity in the definition of progression among clinical trials and the lack of clear information in clinical trial reports as to how disease progression was evaluated (Reference Shitara, Ikeda and Yokota15;Reference Giessen, Laubender and Ankerst17;Reference Shitara, Matsuo, Muro, Doi and Ohtsu40) indicate that there is a need to standardize clinical trial protocols to provide comparability between trials for the same cancer type.
Finally, our search process found additional evidence that went beyond the scope of the analysis by Davis et al. (Reference Davis, Tappenden and Cantrell1). Stevens et al. (Reference Stevens, Philipson, Wu, Chen and Lakdawalla60) outlined an economic approach that bears on both clinical effectiveness and cost-effectiveness, suggesting a framework for factoring the use of surrogates into the decision-making process. Perhaps surprisingly, the benefits (or costs) of the earlier adoption of a new technology that a surrogate end point will usually allow are not taken into consideration when assessing new treatments. The longer the lag between the results of a trial using a surrogate end point and those of a trial using OS, the greater the additional benefits of using a surrogate should be, provided the surrogate is valid and that subsequent treatments do not act as confounders. However, as Davis et al. (Reference Davis, Tappenden and Cantrell1) explain, even when strong consistent evidence supporting a correlation between the treatment effects is available, it is unclear how that should be converted into a quantified relationship between PFS and OS treatment effects within a cost-effectiveness model.
CONCLUSION
The analysis strongly suggests that the use of IPD to assess surrogacy should increase. A case could be made for release of all IPD as a condition of publication. As in Davis et al. (Reference Davis, Tappenden and Cantrell1), our findings show that the availability of such information has been limited, though recent data-sharing initiatives may be changing that.
There is high variation in the characteristics of the methodologies and little apparent consistency in what should be considered appropriate statistical estimation methodology. There is therefore a need for standardization that allows for more consistent results. Standardization, in the form of adhering to common definitions, statistical techniques, and a checklist of necessary items in reporting results, would often be virtually costless. This could facilitate the use of PFS by policy makers, if it were deemed appropriate, based upon standardized validation methodology, and could increase both the speed and accuracy of their decision making.
Many of the factors that affect the validation of surrogacy are related to the length and characteristics of postprogression survival. Procedures for gathering information on factors affecting the postprogression management of a disease should be described in protocols for following up clinical trial patients, making it possible to derive stronger conclusions from statistical analysis.
Some limitations of the study need to be mentioned. First, it is not a full literature review. We conducted a citation search based on the nineteen studies from Davis et al. (Reference Davis, Tappenden and Cantrell1), and we assume that this search captured all relevant articles. Discussions with experts and comparisons with previous systematic reviews suggest that no relevant article has been excluded from the analysis. Second, this is also not a systematic literature review of any particular cancer type. Therefore, analyzing whether PFS should or should not be used in any particular case was outside the scope of this analysis. It is recommended that the factors that affect the relationship between PFS and OS be analyzed by cancer type to understand the particular challenges faced in each case. Third, for pragmatic reasons, we excluded TTP, which ignores the possibility that the terms TTP and PFS have been used interchangeably in error (Reference Saad and Katz61).
Finally, in addition to using one of the frameworks promoted by Ciani et al. (Reference Ciani, Davis and Tappenden4) to ensure a higher standard of validation of the strength of evidence, both researchers and policy makers in an area that makes use of surrogate end points need to be aware that the statistical methodology must be properly understood and documented. Validating PFS as a surrogate for OS may allow patients to access new health technologies more quickly, and that potential should not be undermined by poor knowledge of the methodology applied. The results of this study are broadly in line with those of Kemp and Prasad (Reference Kemp and Prasad62), who concluded that the use of surrogate outcomes should be limited to situations where a surrogate has demonstrated robust ability to predict meaningful benefits, or where cases are dire, rare, or with few treatment options.
CONFLICTS OF INTEREST
Dr. Hernandez-Villafuerte reports grants from The Pharmaceutical Oncology Initiative of the Association of the British Pharmaceutical Industry during the conduct of the study. Dr. Fischer has nothing to disclose. Dr. Latimer reports grants from the Office of Health Economics during the conduct of the study; personal fees from BMS, personal fees from Pfizer, personal fees from Merck EMD Serono, personal fees from Celgene, personal fees from Janssen, and personal fees from AstraZeneca outside the submitted work.