Introduction
Palliative care clinicians are frequently asked to predict how long their patients will live. The prediction of survival in patients with advanced cancer is a practical and important issue for patients, their families, and medical staff. An accurate estimation of survival facilitates clinical decision making, such as the transition to hospice and palliative care. Although a number of prognostic models have been developed and validated, there are multiple barriers to their use, and busy clinicians still rely mostly on clinical judgment in their daily practice (Morita et al., 1999; Pirovano et al., 1999; Gwilliam et al., 2011; Scarpi et al., 2011; Hamano et al., 2015; Hui, 2015). Currently, the two most common approaches to clinician prediction of survival (CPS) are the temporal question (TQ) and “the surprise question” (SQ). TQ involves providing a specific duration of survival (in days, weeks, months, or years), whereas SQ involves a “Yes” or “No” answer to the question “Would I be surprised if this patient died in [specific time frame]?”
SQ was originally developed to assess when a patient should be referred for palliative care (Weeks et al., 1998). Most studies on SQ have used a 1-year time frame and reported an accuracy of approximately 70% (Pattison and Romer, 2001; Moroni et al., 2014). However, a shorter time frame of prediction (e.g., four weeks) is likely more suitable for many clinical decisions related to the care of patients in the far advanced cancer setting (Hui, 2015), particularly since the median survival of palliative care patients with advanced cancer is less than 1–2 months in many countries (Hyodo et al., 2010; Perez-Cruz et al., 2014). To date, only a handful of studies have examined a shorter time frame of SQ prediction. A Japanese study examined SQ with one-week and one-month survival and concluded that they were satisfactory as screening tools (Hamano et al., 2015). Although some have believed SQ to be more accurate than TQ, no studies have directly compared the performance of these two questions. A better understanding of the accuracy of TQ and SQ, especially for short time frames in hospitalized patients, would help palliative care clinicians understand how best to deploy these questions. The purpose of this prospective multicenter study was to examine the prognostic accuracy of SQs using 7-, 21-, and 42-day time frames in patients with advanced cancer and to compare it with that of TQ.
Methods
This is a preplanned analysis of a prospective study to investigate the spiritual well-being of palliative inpatients in South Korea. We previously published an article about spirituality and survival in advanced cancer patients (Shin et al., 2018).
Study setting and population
Patients with advanced cancer were eligible for the study if they were admitted to receive palliative care, were aged ≥18 years, and had an expected survival of <3 months. We excluded patients who were still receiving disease-modifying treatment. Written informed consent was obtained from either patients or their families. All study procedures were approved by the Institutional Review Board of each institution (15Yeon IRB017-2).
Study design and procedure
This was a prospective multicenter study of adult inpatients with advanced cancer in seven hospital-based palliative care units (PCUs) in South Korea. All seven hospitals have hospice wards and are nationally designated hospice institutions. Five were university hospitals located in metropolitan cities such as Seoul and Incheon, and two were general hospitals located in Gyeonggi Province. We followed participants until death or discharge from the hospital. The study was performed from May 2015 through August 2016. Study coordinators in each participating institution approached all eligible patients consecutively, explained the study purpose, and enrolled those who agreed to participate.
Data collection
We collected the baseline characteristics of the subjects, such as gender, age, site of primary cancer, and palliative performance scale (PPS) score. The attending palliative care physicians were asked to estimate their patients’ survival in weeks using the TQ and to answer SQs for three time frames: “Would I be surprised if this patient died in the next 7 days?” (7-day SQ), “Would I be surprised if this patient died in the next 21 days?” (21-day SQ), and “Would I be surprised if this patient died in the next 42 days?” (42-day SQ). These time frames were chosen because inpatients of PCUs in South Korea have a median survival time of three weeks. A “Yes” answer suggests that the clinician expects the patient to live longer than the pre-specified time frame, while a “No” answer denotes the opposite.
Statistical analysis
The data were categorized into two groups according to the reply to each SQ (“yes, surprised”/“no, not surprised”). A “no” answer was coded as correct if the patient died within the specified time frame (true positive), and a “yes” answer was coded as correct if the patient did not die within the specified time frame (true negative).
We also dichotomized the answers to TQ to facilitate comparison with SQ for one, three, and six weeks of survival. TQ was coded as correct if the answer was less than or equal to the pre-specified time frame and the patient died within that time frame (true positive), or if the answer was greater than the time frame of interest and the patient did not die within it (true negative).
For both TQ and SQ, we applied a 2 × 2 table to calculate the sensitivity, specificity, positive predictive value, negative predictive value, and overall accuracy.
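The coding rules and the 2 × 2 calculation described above can be sketched in a few lines of Python. This is only an illustration, not the authors' SPSS or R procedure: the function names are hypothetical, and the example counts for the 7-day SQ are back-calculated from figures reported in the Results (7 of 15 deaths predicted, a specificity of 88.7%, and 130 patients), so they should be treated as an assumption rather than as the contents of Table 2.

```python
from typing import Dict, Iterable, Tuple


def sq_predicts_death(answer: str) -> bool:
    """A 'no, not surprised' answer counts as a positive prediction of death."""
    return answer.strip().lower().startswith("no")


def tq_predicts_death(estimate_weeks: float, frame_days: int) -> bool:
    """A TQ estimate at or below the time frame counts as a positive prediction."""
    return estimate_weeks * 7 <= frame_days


def two_by_two(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) from (predicted_death, died_in_frame) pairs."""
    tp = fp = fn = tn = 0
    for predicted, died in pairs:
        if predicted and died:
            tp += 1
        elif predicted:
            fp += 1
        elif died:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard diagnostic accuracy measures from a 2 x 2 table."""
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Illustrative 7-day SQ counts (assumed, back-calculated from reported values).
counts = two_by_two(
    [(True, True)] * 7        # not surprised, died within 7 days
    + [(False, True)] * 8     # surprised, died within 7 days
    + [(True, False)] * 13    # not surprised, survived beyond 7 days
    + [(False, False)] * 102  # surprised, survived beyond 7 days
)
seven_day_sq = metrics(*counts)
```

Treating “no, not surprised” and a TQ estimate at or below the cutoff as the positive test result keeps the two questions on the same footing, which is what allows their sensitivities and specificities to be compared directly.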
The survival was calculated from the date each patient was recruited into the study until the date of death while in hospital. We were unable to follow up on patients who were discharged alive. Thus, this analysis focused only on patients who died in the hospital.
The concordance indices (c-indices) of the SQs and TQs were calculated using Harrell's c-index method. We selected the c-index instead of the area under the curve (AUC), which examines sensitivity against the false positive rate (1 − specificity) (White et al., 2017), because the c-index is known to be better suited than the AUC to handling continuous data. The c-index measures predictive discrimination, defined as the proportion of patient pairs in which the predicted and observed survival outcomes are concordant (Harrell et al., 1996).
All statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS) for Windows, version 21.0 (IBM, Armonk, NY, USA), and the R project for statistical computing, version 3.6.0 for Windows. The significance level was set at p < 0.05.
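As a rough illustration of the pairwise definition above, the following Python sketch computes a Harrell-type c-index for a dichotomous prediction. It is not the R code used in this study, which is not reported; the function name `harrell_c`, the handling of tied survival times (skipped here for simplicity), and the toy data are all assumptions made for the example.

```python
from itertools import combinations
from typing import Sequence


def harrell_c(
    times: Sequence[float],
    events: Sequence[bool],
    risk: Sequence[float],
) -> float:
    """Harrell-type concordance index.

    A pair is usable when the survival times differ and the patient with the
    shorter time actually died. The pair is concordant when that patient also
    had the higher predicted risk; equal risks count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs tied on survival time are skipped
        shorter, longer = (i, j) if times[i] < times[j] else (j, i)
        if not events[shorter]:
            continue  # shorter time was censored, so the ordering is unknown
        usable += 1
        if risk[shorter] > risk[longer]:
            concordant += 1.0
        elif risk[shorter] == risk[longer]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (hypothetical): survival in days, all deaths observed,
# risk = 1 for "no, not surprised" and 0 for "yes, surprised".
toy_times = [3, 10, 25, 40, 60]
toy_events = [True, True, True, True, True]
toy_risk = [1, 1, 0, 1, 0]
toy_c = harrell_c(toy_times, toy_events, toy_risk)
```

Because a yes/no answer produces many pairs with identical predictions, and each such pair contributes 0.5, the c-index of a dichotomous question is pulled toward 0.5 whenever most patients receive the same answer.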
Results
We enrolled a total of 204 patients in the original study. We excluded 61 patients who were discharged from the PCUs and 13 patients who were still alive at the end of the study; thus, 130 patients remained in this analysis. The patient characteristics are summarized in Table 1. Their mean age was 66.0 ± 12.2 years, and half were men (50.8%). The most common primary cancers were lung (24.6%), colorectal (20.8%), and liver/biliary tract (14.6%). Over 40% (43.7%) of participants had relatively good performance status (PPS ≥ 60) at enrollment. Median survival was 21.0 days (range: 0–146 days).
Table 1. Characteristics of participants (n = 130)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20240220134524831-0404:S1478951521000766:S1478951521000766_tab1.png?pub-status=live)
SD, Standard Deviation.
Among patients who died within each time frame, the numbers with the response “no, not surprised” to the SQs were 7/15 (46.7%) for 7-day survival, 37/70 (52.9%) for 21-day survival, and 87/106 (82.1%) for 42-day survival (Table 2). Table 3 shows the sensitivity, specificity, PPV, NPV, and accuracy of the 7-, 21-, and 42-day SQs and TQs in parallel. The specificity of the 7-day TQ was the highest of all values, at 98.3% [95% confidence interval (CI): 93.9–99.8%]. The specificities of the 7-day SQ and the 21-day TQ were also high, at 88.7% (95% CI: 81.5–93.8%) and 93.3% (95% CI: 83.8–98.2%), respectively. The overall accuracies of the SQs were very similar to those of the TQs for each time frame. The SQs and TQs were also compared using c-indices (Table 4). The c-index of the 7-day SQ was 0.662 (95% CI: 0.539–0.785), which was significantly better than that of the 7-day TQ, 0.521 (95% CI: 0.464–0.579). In contrast, the c-index of the 42-day TQ was 0.616 (95% CI: 0.569–0.663), which was significantly higher than that of the 42-day SQ, 0.554 (95% CI: 0.509–0.599).
Table 2. Responses to “the surprise questions” and temporal questions by 7-, 21-, and 42-day survival time frames (n = 130)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20240220134524831-0404:S1478951521000766:S1478951521000766_tab2.png?pub-status=live)
SQ, “The Surprise Question”; TQ, Temporal Question.
Table 3. Performance of “the surprise question” and temporal question for 7-, 21-, and 42-day survival
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20240220134524831-0404:S1478951521000766:S1478951521000766_tab3.png?pub-status=live)
Data in parentheses are 95% confidence intervals.
Prevalence is defined as the number of deaths in each time frame divided by the total study population.
PPV, Positive Predictive Value; NPV, Negative Predictive Value; SQ, “The Surprise Question”; TQ, Temporal Question.
Table 4. c-indices for “the surprise questions” and temporal questions for 7-, 21-, and 42-day survival
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20240220134524831-0404:S1478951521000766:S1478951521000766_tab4.png?pub-status=live)
Data in parentheses are 95% confidence intervals.
c-indices were calculated using Harrell's c-index method by the R project for statistical computing.
SQ, “The Surprise Question”; TQ, Temporal Question.
Discussion
This is the first study to examine both TQ and SQ with short prediction time frames in palliative care inpatients with only weeks of survival. Surprisingly, the two approaches had similar overall accuracies when the same clinicians were asked to make predictions for the same patients and the same time frames. TQ was generally more specific. We also found that the accuracies for 7-day survival were generally higher than those for 21-day and 42-day survival. In particular, the specificities of the 7-day SQ and the 7- and 21-day TQs were around 90%, suggesting that these questions may be helpful to rule in death if positive.
Our findings contrast with those of the Japanese study, which demonstrated sensitivities of more than 80–90% for SQs predicting 7- and 30-day survival (Hamano et al., 2015). This difference may be related to differences in patient populations. Specifically, our study was confined to PCUs, while the Japanese study included PCUs, home hospice, and hospital-based palliative care teams (Hamano et al., 2015). Other reasons may include differences in the prevalence of death or in prognostication style according to the culture of each country. The experience of clinicians and their threshold for considering a death to be surprising may also contribute to this discrepancy. Our study highlights that the performance (i.e., sensitivity, specificity) of SQ may vary widely even in seemingly similar populations. Thus, the performance of SQ should ideally be examined before it is applied in clinical practice and research studies (e.g., as eligibility criteria).
The relatively low sensitivities of TQ and SQ highlight clinicians’ overestimation of survival. It is already well established that TQ is often overly optimistic (Christakis and Lamont, 2000; Amano et al., 2015). The low sensitivity may be related to the unexpected nature of some deaths due to acute catastrophic events (Ekstrom et al., 2016), lack of detection of some tell-tale prognostic signs (Hui et al., 2014a, 2014b, 2019a), and/or clinicians’ reluctance to acknowledge that their patients have a short survival. Across the three time frames of prediction, the sensitivity of SQ appeared to be slightly higher than that of TQ; however, the specificity of TQ appeared to be better than that of SQ. Nevertheless, the sensitivity of both approaches was low to moderate, suggesting that one cannot use either approach to rule out death with a negative answer.
According to a previous study, the accuracy of CPS expressed as TQ was lower than that of the prognostic score. Vigano et al. reported that CPS had low sensitivity in detecting patients who died within shorter time frames (≤2 months) and also noted clinicians’ tendency to overestimate survival (Vigano et al., 1999). The sensitivity of our 42-day SQ was 82.1%, which can be compared with the sensitivity of the Palliative Prognostic Index (PPI) for six-week prognosis, which ranged from 62% to 80% (Morita et al., 1999; Subramaniam et al., 2015). A sensitivity of over 80% for the 42-day SQ is relatively good considering its simplicity. It is consistent with a recent meta-analysis that reported a pooled sensitivity of 67% (95% CI: 55.7–76.7%), although the time frames in that analysis ranged from 6 to 18 months (Downar et al., 2017). Unfortunately, the specificity of the 42-day SQ was 45.8%, which was relatively low compared with the 70–90% reported for 1-year prognosis prediction (Moss et al., 2010; Moroni et al., 2014). For predicting intermediate survival, few useful parameters have been available. The rate of deterioration of patients might not be distinct in this intermediate phase, which would make survival prediction inaccurate (Stiel et al., 2010).
We found that TQ had high specificity for the 7- and 21-day cutoff and SQ also had a specificity of around 90% for the 7-day question. A high specificity is useful to rule in death if positive. However, the PPV was low (particularly, for 7-day survival), which is explained by the low prevalence of patients who died in this time frame. The accurate prediction of patients who will likely die in 7 or 21 days has important clinical implications. For example, patients expected to die in seven days may shift their focus entirely to comfort care. Clinicians may recommend keeping them in the PCUs instead of planning for home discharge. Furthermore, family members may be encouraged to travel to say goodbye if they have not already done so.
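The dependence of PPV on prevalence noted above follows from Bayes' theorem and can be checked numerically. The Python sketch below is a hedged illustration only: the function name `ppv_from_prevalence` is hypothetical, the inputs are the 7-day SQ sensitivity (46.7%) and specificity (88.7%) reported in the Results with a prevalence of 15/130, and the 50% prevalence is an invented comparison value, not a study finding.

```python
def ppv_from_prevalence(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence.

    PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    denominator = true_positive_rate + false_positive_rate
    if denominator == 0:
        raise ValueError("no positive predictions are possible with these inputs")
    return true_positive_rate / denominator


# 7-day SQ values as reported in the text; prevalence = 15 deaths / 130 patients.
ppv_observed = ppv_from_prevalence(0.467, 0.887, 15 / 130)

# Same sensitivity and specificity at a hypothetical prevalence of 50%.
ppv_hypothetical = ppv_from_prevalence(0.467, 0.887, 0.50)
```

With identical sensitivity and specificity, the PPV at the observed 7-day prevalence is roughly 0.35, whereas at the hypothetical 50% prevalence it is roughly 0.81, which shows how a highly specific question can still yield many false positives when few patients die within the time frame.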
How can sensitivity, specificity, and misclassification error be improved in this setting? Prognostic models have been validated and may be useful to improve accuracy, such as the Palliative Prognostic Score, PPI, Prognosis in Palliative Care Study, or Objective Prognostic Score (Suh et al., 2010; Jho et al., 2016; Yoon et al., 2017). There are also websites available to facilitate the computation and interpretation of these prognostic scores (e.g., predictsurvival.com). SQ may be incorporated into web-based calculators in the near future to compare its accuracy with that of TQ and other prognostic indices. This would help to enhance accuracy and reproducibility, as well as understanding of and communication with patients and families about the uncertainty of prognostication (Hui et al., 2019b).
The SQ relies on the intuition of clinicians to predict survival as a counter-question. These predictions may vary according to the knowledge, experience, and subspecialty of the clinician. One study showed that palliative care physicians provided more accurate predictions than referring physicians and oncologists (Amano et al., 2015). Oncologists are usually more focused on patients’ treatment, such as chemotherapy, than on life expectancy very near to death (Kao et al., 2009). It will be necessary to explore the relationship between these SQs and clinicians’ experience, knowledge, and subspecialties in the near future (Yoon et al., 2015). The accuracy of the SQ may also vary with the communication styles of clinicians.
The limitations of this study are as follows. First, this was a multicenter study conducted in South Korea. It is unclear whether the results would be similar if these SQs were investigated in other countries with different cultural and medical environments. Second, our study was performed in PCUs only. Thus, our findings might differ in other palliative care settings such as home hospice and nursing homes. Third, all predictions were made by palliative care physicians with backgrounds in family medicine or medical oncology. Thus, clinicians of different specialties might yield different results. Subsequent studies comparing the treatment provided and the timing of advance care planning (ACP) discussions according to whether or not the SQs are asked may be useful. Fourth, we were unable to follow up on patients discharged from the hospital alive. The exclusion of this population may bias our findings toward sicker individuals and affect the prognostic performance of the SQ and TQ.
In conclusion, after controlling for the time frame of prediction, clinician, and patient population, we found that SQ and TQ were very similar in accuracy. The overall level of accuracy was low to moderate even when the survival estimates were made by palliative care specialists, with somewhat higher accuracy for the shortest time frame (7-day prediction). Our findings suggest that both SQ and TQ have some limitations, and that prognostic models may potentially be helpful to augment CPS.
Author contributions
All authors contributed to the conception, analysis, interpretation, revision, and final approval of the manuscript. SYS served as the principal investigator and had full access to all of the data in the study. SHK wrote the manuscript as first author. SHK, SJY, JP, YJK, BDK, YMP, JHK, KOP, and JYK collected data; HNC and HYA performed the statistical analysis; JH and DH interpreted the results and revised the manuscript. SYS takes responsibility for the integrity of the data and the accuracy of the data analysis. All authors read and approved the final manuscript.
Ethics approval and consent to participate
All study procedures were approved by the Institutional Review Board of each institution. Written informed consent was obtained from patients or their families.
Funding
This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.
Conflict of interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.