Introduction
It has been reported that about 60–80% of patients with cancer suffer from a wide variety of physical and psychological symptoms (Seow et al., 2011; Breitbart et al., 2018). According to the WHO (https://www.who.int/cancer/palliative/definition/en/), the goal of palliative and supportive care is to relieve or prevent such symptoms, thereby improving quality of life (QoL). Similar to surgery, chemotherapy, and radiotherapy, palliative and supportive care has increasingly been recognized as one of the mainstays of treatment because of its ethical, medico-economic, and possible survival benefits (Temel et al., 2010; May et al., 2018). As a result, its integration into not only oncologic but also cardiologic and neurological practice is being promoted worldwide (Ferrell et al., 2017; The Lancet Neurology, 2017; Jordan et al., 2018; Sobanski et al., 2020).
The progression of this integration has made the periodic evaluation of QoL and symptoms necessary for quality improvement, clinical audits, and research (Antunes et al., 2018). The “gold standard” method for assessing QoL and symptoms is the use of patient-reported outcome measures (PROMs). However, as the end of life approaches, most patients become too ill to respond to self-report measures (Neumann et al., 2000; Jones et al., 2011; Usman et al., 2018; Robertson et al., 2020). Therefore, PROMs are often substituted with clinician-reported outcome measures (Clin-ROMs), even though few Clin-ROMs have been validated or confirmed to follow good practice guidelines (Evans et al., 2013; Bausewein et al., 2016). Given the rapid aging of many populations, the development and validation of Clin-ROMs are urgent.
One of the few tools that have been validated is the Integrated Palliative care Outcome Scale (IPOS), which was devised to comprehensively measure physical and psychological symptoms, the anxiety experienced by family and friends, access to various types of aid, medical information, and spiritual needs (Murtagh et al., 2019). A patient version of the IPOS, the IPOS-Patient, is used as a PROM, whereas a staff version, the IPOS-Staff, is used as a Clin-ROM. The objective of the present study was to assess the validity and reliability of the Japanese version of the IPOS-Staff. By so doing, this study adds a nuanced analysis of Clin-ROMs to existing knowledge, enabling international comparisons in future studies.
Methods
This multicenter, cross-sectional, observational study was conducted concurrently with the validation of the IPOS-Patient (Sakurai et al., 2019). Patients and staff at the following six facilities participated between August 2015 and March 2017: two cancer centers (the Cancer Institute Hospital of Japanese Foundation of Cancer Research and the National Kyushu Cancer Center), two general hospitals (Seirei Mikatahara General Hospital and Japanese Red Cross Medical Center), and two regional hospitals (Heiwa Hospital and Tsujinaka Hospital).
Patients, staff, and procedure
The inclusion criteria for patients were having a cancer diagnosis, being aged ≥20 years, receiving palliative care, and possessing the ability to read and write in Japanese. The exclusion criteria were having consciousness disturbance and physical or psychological symptoms that markedly affect activities and concentration, as determined by the attending physician. If a patient could not provide responses to the IPOS-Patient independently due to a physical problem, a family or staff member could provide assistance.
The inclusion criterion for staff was being a staff member who provided care for one of the recruited patients. Any staff member who assisted a patient with providing responses was excluded. The IPOS-Staff was completed by two staff members: the primary and secondary evaluators. Evaluating multiple patients was permitted. Each investigator at each facility could serve as an evaluator. The primary evaluator completed the IPOS-Staff a second time on the day after the first assessment, unless the patient's condition had changed.
We did not prepare any special guidance for family or staff so that the study could be conducted in a manner similar to an actual clinical setting. In addition, we did not forbid them from filling in the blanks or correcting multiple answers while assisting, as long as they checked with the patient. If patients could not decide, the blanks were left empty. A quiet environment was prepared for the completion of the measure. Patients and staff were asked to submit the survey sheet in a sealed envelope.
Clinical data
Patients’ characteristics and clinical data, including age, sex, site of cancer origin, disease status, Eastern Cooperative Oncology Group Performance Status (ECOG-PS), current treatment, palliative care setting, and staff profession, were obtained.
Measures
Japanese versions of IPOS-Patient and IPOS-Staff
The IPOS integrates the Palliative care Outcome Scale (POS) with its symptom version, the POS-Symptom (Hearn and Higginson, 1999). The IPOS is composed of physical items (Pain, Shortness of Breath, Weakness, Nausea, Vomiting, Poor Appetite, Constipation, Sore Mouth, Drowsiness, and Poor Mobility), psychological items (Anxiety and Depression), items regarding family and friends (Family Anxiety and Sharing Feelings), a spirituality item (Feeling at Peace), and other items (Information and Practical Problems), each referring to the preceding three days. All items were rated using a 5-point Likert scale ranging from 0 (affected not at all) to 4 (overwhelmingly). A score of 2 (moderately) or higher indicates the need for an additional treatment or care plan.
The validity and reliability of the Japanese version of the IPOS-Patient have been confirmed (Sakurai et al., 2019). The IPOS-Staff is similar to the IPOS-Patient, except that for the IPOS-Staff, the subject “you” was changed to “the patient.”
Sample size
Based on the previous versions of the IPOS, the Support Team Assessment Schedule study (Higginson and McCarthy, 1993; Miyashita et al., 2010), and the POS studies (Hearn and Higginson, 1999; Eisenchlas et al., 2008), we determined that a sample of 150 patient–staff pairs, 70 retests, and 70 staff pairs was required.
Statistical analysis
For statistical analysis, we assessed the characteristics of the patients and staff members as well as the missing values of the IPOS-Staff and compared all items in terms of means, standard deviations, prevalence of treatment need (defined as the percentage of respondents scoring 2 or higher on each item; Murtagh et al., 2019), and total IPOS scores. The total score was the sum of all items, except for “how responses were given” and free descriptions.
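As an illustration of the descriptive statistics named above, the following Python sketch computes the mean, standard deviation, and prevalence of treatment need for one item, together with one respondent's total score. It is not the authors' SPSS procedure. The function names, the use of None for a missing response, the choice of observed responses as the prevalence denominator, and returning no total when any item is missing are all assumptions made for this example, and the scores are invented.

```python
from statistics import mean, stdev

# Cutoff taken from the Measures section: 2 (moderately) or higher.
TREATMENT_NEED_CUTOFF = 2


def item_summary(scores):
    """Summarize one IPOS item across respondents; None marks a missing response."""
    observed = [s for s in scores if s is not None]
    n_need = sum(1 for s in observed if s >= TREATMENT_NEED_CUTOFF)
    return {
        "n": len(observed),
        "missing": len(scores) - len(observed),
        "mean": mean(observed),
        "sd": stdev(observed),
        # Assumption: the denominator is the number of observed responses.
        "prevalence_pct": 100.0 * n_need / len(observed),
    }


def total_score(item_scores):
    """Sum the rated items for one respondent.

    Assumption: returns None when any item is missing, since the source
    reports no scoring rule for incomplete forms.
    """
    values = list(item_scores.values())
    if any(v is None for v in values):
        return None
    return sum(values)


# Invented scores for a single item (0-4 scale) across eight respondents.
pain_scores = [0, 1, 2, 3, None, 2, 4, 1]
summary = item_summary(pain_scores)
print(summary)

# Invented responses for one respondent (only three items shown).
print(total_score({"Pain": 2, "Nausea": 0, "Anxiety": 3}))
```

The free-text field and the "how responses were given" item are simply left out of the dictionary passed to `total_score`, mirroring the exclusion described in the text.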
There is no scoring manual for the IPOS regarding missing values; therefore, we did not adjust for missing items and simply excluded them from the subsequent validity and reliability analyses.
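One way to read this exclusion rule is item-wise deletion: for a given item, a pair of ratings enters the analysis only if both ratings are present. The source does not say whether exclusion was applied per item or per questionnaire, so the sketch below is an assumption, as are the function name and the use of None for a missing value.

```python
def complete_pairs(first, second):
    """Keep only the positions where both ratings are present (None = missing)."""
    return [
        (a, b)
        for a, b in zip(first, second)
        if a is not None and b is not None
    ]


# Invented patient and staff ratings for one item.
patient = [0, None, 2, 3]
staff = [1, 2, None, 3]
print(complete_pairs(patient, staff))
```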
For the analysis of criterion validity, we calculated intraclass correlation coefficients (ICCs) with the IPOS-Patient as the gold standard for the IPOS-Staff. We also calculated ICCs for the intra-rater (two days) and inter-rater (two staff members) reliability analyses. We performed subgroup analysis based on disease status and ECOG-PS. We did not use a kappa coefficient because we considered the 5-point Likert scale an interval scale. Qualitative ratings of ICCs were based on a previously established guideline (Koo and Li, 2016).
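The text does not state which ICC form was computed in SPSS. As a sketch only, the Python below assumes ICC(2,1), that is, a two-way random-effects model with absolute agreement and single measures, and pairs it with the Koo and Li cutoffs (below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, above 0.9 excellent). The function names, the handling of values exactly on a cutoff, and the example ratings are assumptions for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    ratings: list of rows, one row per patient and one column per rater
    (for example, [patient_score, staff_score]), with no missing values.
    The result is undefined (division by zero) if every rating is identical.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters or occasions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def koo_li_rating(icc):
    """Qualitative label following the Koo and Li cutoffs.

    Assumption: a value exactly on a cutoff goes to the higher band,
    except 0.9 itself, which is labeled good.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Invented patient-staff pairs for one item (0-4 scale).
pairs = [[0, 0], [1, 2], [3, 3], [2, 1], [4, 4], [0, 1]]
value = icc_2_1(pairs)
print(round(value, 3), koo_li_rating(value))
```

A consistency-type or average-measures ICC would give different values from the same data, so the form chosen should be reported alongside the estimate.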
SPSS Statistics version 24.0 (IBM Corp., Armonk, NY) was used for all data analyses, with the level of significance set at 0.05.
Ethics
This study was approved by the institutional ethical review board of each facility. Following approval, the investigators provided information about the study to the patients and staff members, and then obtained written consent. This study was conducted in compliance with the Declaration of Helsinki and the Ethical Guidelines for Epidemiological Research (https://www.mhlw.go.jp/seisakunitsuite/bunya/hokabunya/kenkyujigyou/i-kenkyu/dl/02-02.pdf).
Results
Patient and staff characteristics
Among the 146 patients who met the inclusion criteria, one did not provide consent to participate and two dropped out due to tiredness while responding. Ultimately, 143 patients and 79 medical staff members gave their consent and completed the IPOS-Patient and IPOS-Staff, respectively. Of the 143 patients, 120 (83.9%) responded on their own, while 23 (16.1%) responded with help from their family, friends, or staff. The characteristics of the patients and staff are summarized in Table 1.
Table 1. Patient characteristics and staff profession

a Multiple answers permitted.
b Evaluating multiple patients permitted.
Missing values
On the IPOS-Staff, missing values were most common for Family Anxiety (3.5%) and Sharing Feelings (3.5%; see Table 2).
Table 2. Missing values, prevalence, mean, and criterion (patient–staff) validity

a Intraclass correlation.
b Number missing at least one IPOS item.
* p < 0.001.
Prevalence of treatment need
In Table 2, the IPOS-Staff and IPOS-Patient items are matched. Over half of the patients gave themselves scores of 2 or higher (moderate or worse) for Poor Mobility (52.5%), Anxiety (51.1%), and Family Anxiety (66.5%). Similarly, more than half of the staff members scored patients moderate or worse in Weakness (51.4%), Anxiety (56.0%), and Family Anxiety (69.6%).
Criterion validity (patient–staff agreement)
In total, 143 matched patient–staff pairs were analyzed. The ICC ranged from 0.114 to 0.826; the lowest score was for Sharing Feelings, and the highest was for Nausea (Table 2).
Intra-rater reliability (two days agreement) and inter-rater reliability (two staff agreement)
In total, 61 two-day pairs and 68 two-staff pairs for the intra- and inter-rater reliability analyses, respectively, were available. The ICCs for intra- and inter-rater reliability ranged from 0.720 (Anxiety) to 0.933 (Nausea) and −0.038 (Practical Problems) to 0.830 (Nausea), respectively (Table 3).
Table 3. Intra-rater and inter-rater reliability

ICC, Intraclass correlation.
Subgroup analysis
ECOG-PS 2–4 patients with stage III or IV cancers as well as their staff were analyzed. In total, 81 patient–staff pairs for criterion validity, 41 two-day pairs for intra-rater reliability, and 43 two-staff pairs for inter-rater reliability were available. The ICCs ranged from 0.012 (Sharing Feelings) to 0.812 (Nausea), 0.715 (Anxiety) to 0.929 (Nausea), and −0.048 (Practical Problems) to 0.842 (Nausea), respectively (Table 4).
Table 4. Subgroup analysis for ECOG-PS 2–4 among stage III or IV patients

ICC, Intraclass correlation.
* p < 0.01.
** p < 0.05.
Discussion
The results of this study indicate that the IPOS-Staff is easy to respond to; it has fair validity and reliability for physical items but poor validity and reliability for psycho-social items. This nuance must be handled with attention; the IPOS-Staff should be chosen as a Clin-ROM when PROMs are a burden for patients.
Missing values comprised no more than 5% of any evaluated item. One possible reason for the lower response rates for Family Anxiety and Sharing Feelings is that some patients had no visits from friends or family members in the previous three days. Given that approximately 2% and 6% of the values were missing in the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 studies (Kobayashi et al., 1998; Fayers et al., 2001) and an IPOS study for chronic heart failure (Kane et al., 2017), in addition to other recommendations (Schulz and Grimes, 2002; Preston et al., 2013), the use of the IPOS-Staff was considered feasible and acceptable.
Regarding prevalence, patients provided more severe scores than did staff members for most of the items. Differences over 10% were observed for Constipation, Drowsiness, Depression, Feeling at Peace, and Information.
In the criterion validity analysis, most items on the IPOS-Staff, except Sharing Feelings and Information, were poorly to moderately correlated; the same results were observed in the English, French, and Portuguese versions (Antunes et al., 2016; Murtagh et al., 2019; Sterie et al., 2019). It has been reported that observers tend to underestimate others’ subjective experiences (Jones et al., 2011; Roydhouse et al., 2018; Usman et al., 2018). Murtagh et al. (2019) suggested that one of the possible explanations for this observation is that the IPOS assesses influence rather than severity. Another possibility is that these symptoms could have fluctuated more frequently than expected by medical staff members. Nevertheless, further studies to bridge these gaps are needed.
In the intra-rater analysis, good or excellent reliability was confirmed. Consistent with the previous versions of the IPOS in different languages (Hearn and Higginson, 1999; Eisenchlas et al., 2008), retests were carried out on the following day. In a test–retest analysis, both anchoring to previous responses and changes in symptoms between the first and second tests should be avoided. Thus, taking a retest on the day immediately following the first was considered to be the shortest interval that could limit the impact of these concerns.
In the inter-rater analysis, poor ICCs were observed for Family Anxiety, Feeling at Peace, Sharing Feelings, and Practical Problems (ICC: −0.038–0.386), while moderate to excellent correlations were observed for the other items (ICC: 0.532–0.830). The same discrepancy has been reported for the items related to emotion, communication, and problems in other languages as well as for other proxy rating scales for other disease entities (Murtagh et al., 2019; Sterie et al., 2019; Robertson et al., 2020). This may be affected greatly by the profession and experience of the evaluator, their level of involvement in daily care, and the amount of patient family support. To overcome these discrepancies, it is crucial for staff members to continuously share different viewpoints in clinical care among themselves. Which characteristics of medical staff, other than their profession, affect outcome measures is a future research question.
The subgroup analysis showed the same tendency as the analysis of the whole sample. Poor correlations were observed for psycho-social items. In the subgroup criterion validity analysis for physical items, Weakness and Poor Mobility had the lowest correlations. This suggests that, as ECOG-PS worsens, patients feel a greater lack of energy or are more bed bound than staff members perceive.
This study had several limitations. First, we targeted patients who were receiving palliative care because the trend of starting palliative care from the early stage is being increasingly promoted in Japan; even so, the number of early-stage patients was limited. Second, most patients with severe symptoms or consciousness disturbance (who are the true target of Clin-ROMs) were excluded. This selection bias was believed to be appropriate from an ethical perspective. Third, because response assistance and correction based on the patient's check were permitted, which is similar to the original IPOS, response bias cannot be excluded completely. Considering that over 80% of patients responded on their own, this is unlikely to have had a significant impact on the results. Finally, we targeted only inpatients and outpatients with cancer in this study, while the IPOS in other languages was validated among patients and staff members at nursing care facilities and in home care settings as well as among those with other disease entities (Ellis-Smith et al., 2017, 2018; Kane et al., 2017; Lind et al., 2018). This is a task for further research.
Conclusion
The IPOS-Staff is easy to respond to; it has fair validity and reliability for physical items but not for psycho-social items. By defining the context and objectives of its use and interpretation, the IPOS-Staff can be a useful tool for measuring outcomes in adult patients with cancer who cannot complete self-evaluations. Further research regarding the generalizability of our results to other conditions, populations, and settings is needed.
Acknowledgments
We thank all patients and medical staff who provided responses for this study. We also acknowledge Dr. Yasushi Ishida, Dr. Sachiko Ohde, Dr. Tatsuya Hashimoto, Dr. Keiko Tanaka, Dr. Nobuhisa Nakajima, Dr. Mariko Shuto, Dr. Jun Shirahama, Dr. Hiroaki Goto, Dr. Tomohiro Nishi, Dr. Takeshi Hirohashi, Dr. Yoshinori Saeki, and Dr. Yoshihisa Matsumoto for providing valuable advice.
Funding
This work was supported by the Sasakawa Health Foundation (Grant Nos. 2013a001 and 2016a003) and JSPS KAKENHI (Grant No. JP 15KK0326).
Conflict of interest
There are no conflicts of interest.