Breast cancer is not a single disease but rather a group of tumors that are heterogeneous at the molecular level (Reference Perou, Sørlie and Eisen1). Because certain biological features of cancers may indicate an increased likelihood of rapid growth and metastasis (in particular, distant recurrence), gene expression profiling (GEP) and expanded immunohistochemistry (IHC) (or protein expression) tests have been developed. These tests aim to improve the targeting of chemotherapy in breast cancer by stratifying patients and identifying those who will gain most benefit from adjuvant chemotherapy. They either measure the risk of cancer recurrence (by incorporating a wider range of biomarkers with prognostic significance than standard clinico-pathological algorithms) or aim to identify breast cancer subtypes that may influence recurrence risk and guide treatment decisions.
In current practice, treatment regimens are tailored according to traditional clinical characteristics, such as age, tumor size, and grade, together with the tumor's molecular signature based on estrogen receptor (ER), progesterone receptor (PR), and HER2 receptor status (Reference Senkus, Kyriakides and Ohno2), although guidelines may differ slightly between countries.
The purpose of this systematic review was to evaluate the clinical effectiveness of GEP and expanded IHC tests in guiding the use of adjuvant chemotherapy in women with early breast cancer. A summary of the evaluated gene expression profiling and expanded immunohistochemistry tests is presented in Table 1. This review was originally undertaken to inform the UK National Institute for Health and Care Excellence's (NICE) assessment of GEP (MammaPrint, OncotypeDX) and IHC (IHC4 and Mammostrat) tests to guide selection of chemotherapy regimens in breast cancer management (Reference Ward, Scope and Rafia3), but has been updated with new evidence up to May 2016.
Table 1. Summary of Evaluated Gene Expression Profiling and Expanded Immunohistochemistry Tests
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170615085354-22997-mediumThumb-S0266462317000034_tab1.jpg?pub-status=live)
RS, recurrence score; ROR, risk of recurrence score; ER+, estrogen receptor positive; ER-, estrogen receptor negative; LN-, lymph node negative; LN 1–3, one to three lymph nodes involved; IHC, immunohistochemistry.
METHODS
A systematic review of the evidence was undertaken according to the general principles recommended in the Centre for Reviews and Dissemination (CRD) (4) guidance for undertaking systematic reviews, and was reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Reference Liberati, Altman and Tetzlaff5) and the NICE Diagnostic Assessment Programme Interim Methods Statement (6).
Data Sources and Searches
Ten electronic databases were searched: Medline and Medline in Process via Ovid SP; Embase via Ovid SP; the Cochrane Library databases via Wiley, namely the Cochrane Central Register of Controlled Trials (CENTRAL), Cochrane Database of Systematic Reviews (CDSR), Database of Abstracts of Reviews of Effects (DARE), Health Technology Assessment Database (HTA), and NHS Economic Evaluation Database (NHS-EED); and the Web of Science databases via Thomson Reuters, namely BIOSIS Previews, Science Citation Index Expanded (SCI-Expanded), and the Conference Proceedings Citation Index-Science (CPCI-S).
The search strategy used free-text and thesaurus terms and combined breast cancer-related synonyms (e.g., breast neoplasm) with terms related to gene expression profiling tests or biomarkers (e.g., OncotypeDX or “gene?twentyone”). A publication date limit of January 2002 was applied. This was the date on which the longest-standing test in the review had been devised, as confirmed by manufacturers’ submissions to NICE as part of the original review, so no evidence could predate it. For the OncotypeDX and MammaPrint tests, the current review used two previous systematic reviews (Reference Marchionni, Wilson and Marinopoulos7;Reference Smartt8) to identify included studies, so searches for these tests were limited to records from January 2009 onward (the last search date of the earlier reviews). Although several other systematic reviews of GEP tests have been reported, these two reviews (Reference Marchionni, Wilson and Marinopoulos7;Reference Smartt8) were considered the most appropriate to update: they were assessed as being of high quality and, in particular, as having complete search strategies. No other limits were applied to the searches. An update search was conducted in Medline and Medline in Process covering January 2013 to May 2016.
Supplementary search techniques were also used to augment the topic searches. These included hand searching of relevant journals, citation searching of papers included in the review, and searching of conference proceedings; in addition, experts in the field were contacted for suggestions of relevant evidence.
Study Selection
The inclusion of potentially relevant articles followed a two-stage process: first, all titles and abstracts were screened for inclusion; then full manuscripts were assessed. Both stages were undertaken by one reviewer, and any uncertainties in the selection process were resolved through discussion with a second reviewer. All study designs were included. Eligible studies included adult patients diagnosed with early invasive breast cancer. The index test was OncotypeDX, MammaPrint, IHC4, or Mammostrat. The comparator was standard care, which could include the use of Adjuvant! Online (AoL) and/or the Nottingham Prognostic Index (NPI) to predict the risk of recurrence and survival for patients with early breast cancer. The outcome measure was clinical utility (the test's ability to discriminate between those who will have more or less benefit from a therapeutic intervention) (Reference Marchionni, Wilson and Marinopoulos7;Reference Smartt8); specifically, (i) the ability of the test to predict treatment effect with adjuvant chemotherapy, and (ii) the extent to which test results are used in treatment decisions. Studies published in languages other than English were excluded unless no other comparable data existed. Abstracts were considered but included only if they represented significant new knowledge, such as evidence from a prospective randomized controlled trial (RCT).
Data Extraction and Quality Assessment
Data on study design, methodological quality, and outcomes were extracted by one reviewer into a standardized data extraction form and independently checked for accuracy by a second reviewer. Discrepancies were resolved by discussion. The methodological quality of each included study was assessed by two reviewers using the criteria recommended by Altman (2001) (Reference Altman9) for assessing the internal validity of prognostic (predictive factor) studies.
Data Synthesis and Analysis
Although a meta-analysis was planned, it was not considered appropriate because of the high degree of heterogeneity between and within studies in, for example, study populations, outcomes, and diagnostic thresholds. Data were therefore tabulated and synthesized narratively.
RESULTS
PRISMA Flow
Figure 1 summarizes the process of identifying and selecting relevant literature. Of the 7,064 citations retrieved, twenty-nine new studies (30 citations) met the inclusion criteria and were added to the eleven studies from the previous systematic reviews.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170615085354-66264-mediumThumb-S0266462317000034_fig1g.jpg?pub-status=live)
Figure 1. PRISMA diagram.
Study and Patient Characteristics
Forty studies (forty-one citations) were included in the review. All studies were published between 2002 and May 2016.
Most of the evidence was related to the OncotypeDX (thirty-two studies). Four studies related to the prediction of treatment effect with adjuvant chemotherapy, with the remaining twenty-eight studies relating to evidence on the test result leading to changes in treatment decisions. Six studies were identified for MammaPrint, all relating to evidence on the test result leading to changes in treatment decisions. Only one relevant study was identified for IHC4, and one for Mammostrat. The IHC4 study provided evidence relating to the test leading to changes in treatment decisions, whereas the Mammostrat study provided evidence on the prediction of treatment effect with adjuvant chemotherapy. Details of the study and patient characteristics, together with key findings of the included studies are provided in Tables 2 and 3.
Table 2. Summary of Patient Characteristics, Study Characteristics, and Key Findings Relating to the Prediction of Treatment Effect with Adjuvant Chemotherapy
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170615085354-57809-mediumThumb-S0266462317000034_tab2.jpg?pub-status=live)
RS, recurrence score; RR, relative risk; HR, hazard ratio; AoL, Adjuvant! Online; ER+, estrogen receptor positive; ER-, estrogen receptor negative; LN-, lymph node negative; LN 1–3, one to three lymph nodes involved; HT, hormone therapy; CHT, chemotherapy; CMF/MF, CAF, specific chemotherapy regimens; DRFS, distant recurrence free survival; DRFI, distant recurrence free interval; DFS, disease free survival; OS, overall survival.
Table 3. Summary of Patient Characteristics, Study Characteristics, and Key Findings Relating to Changes in Treatment Recommendations
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170615085354-68501-mediumThumb-S0266462317000034_tab3.jpg?pub-status=live)
RS, recurrence score; AoL, Adjuvant! Online; NPI, Nottingham Prognostic Index; ER+, estrogen receptor positive; ER-, estrogen receptor negative; LN-, lymph node negative; LN 1–3, 1–3 lymph nodes involved; HT, hormone therapy; CHT, chemotherapy.
Quality Assessment
Limitations in the clinical data were identified for all tests. No study had a prospective RCT design, and only five studies included a prospective analysis of archived tissue samples from a previous RCT (OncotypeDX n = 4; Mammostrat n = 1). For these four OncotypeDX studies and the one Mammostrat study, which provided evidence on the prediction of treatment effect with adjuvant chemotherapy, the overall risk of bias was judged to be moderate: although these were retrospective analyses of archived tissue samples, the evidence was derived from relatively large-scale RCTs. The remaining twenty-eight OncotypeDX studies, which provided evidence on changes in treatment recommendations, were mainly small-scale studies (n = 25–979). Fifteen had a retrospective design, and fourteen did not provide full details of patient characteristics. Similarly, two of the six MammaPrint studies had a retrospective design, and some lacked full details of patient characteristics. The IHC4 study was prospective in design; however, its sample size was relatively small (n = 124). Overall, particularly among the studies of the tests' effect on treatment decisions, there was a high level of clinical heterogeneity, both within each test and across the four tests.
NARRATIVE DATA SYNTHESIS
Prediction of Treatment Effect with Adjuvant Chemotherapy
OncotypeDX
Studies by Paik et al. (Reference Paik, Tang and Shak10), Albain et al. (Reference Albain, Barlow and Shak11), and Tang et al. (Reference Tang, Shak and Paik12;Reference Tang, Constantino, Crager, Shak and Wolmark13) assessed the predictive ability of OncotypeDX using archived tissue samples collected during RCTs comparing tamoxifen with tamoxifen plus chemotherapy. The strongest evidence appeared to come from Paik et al. (Reference Paik, Tang and Shak10). In estrogen receptor positive (ER+), lymph node negative (LN-) breast cancer patients, the OncotypeDX recurrence score (RS) was correlated with chemotherapy benefit, defined in terms of 10-year distant recurrence-free survival (DRFS): the benefit from chemotherapy was significantly greater in the OncotypeDX high-risk group than in the low-risk group. However, in a multivariate analysis, the benefit from chemotherapy in the low and intermediate RS risk groups was unclear because of wide confidence intervals. Albain et al. (Reference Albain, Barlow and Shak11) demonstrated that the RS was prognostic for tamoxifen-treated patients with positive nodes and predicted significant benefit from chemotherapy in tumors with a high RS. They concluded that a low score could identify women who might not benefit from anthracycline-based chemotherapy, despite positive nodes.
Tang et al. (Reference Tang, Shak and Paik12) also reported that both RS and AoL provided strong independent prognostic information in tamoxifen-treated patients, and that RS used alone remained the best predictor of chemotherapy benefit in ER+, LN- breast cancer (Reference Tang, Constantino, Crager, Shak and Wolmark13).
Of the four studies reporting evidence that OncotypeDX predicts benefit from chemotherapy, only one, in a lymph node positive (LN+) population (Reference Albain, Barlow and Shak11), presented data that had not come from the National Surgical Adjuvant Breast and Bowel Project (NSABP) cohorts. However, this study had limitations: its sample size was only moderate, and the period over which tumor samples were collected was not reported, so the diagnostic criteria applied may have differed across samples. Two other studies (Reference Tang, Shak and Paik12;Reference Tang, Constantino, Crager, Shak and Wolmark13) reported the same trial data as Paik et al. (Reference Paik, Tang and Shak10) from the NSABP cohorts, introducing bias from double counting in the evidence base as a whole. In addition, the study by Paik et al. (Reference Paik, Tang and Shak10) may itself have been subject to bias, as some patients in the validation dataset were also in the training dataset, which may partly explain the treatment interaction observed with OncotypeDX.
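The distinction between a prognostic score (identifying who is at higher baseline risk) and a predictive score (identifying who gains more from chemotherapy) can be made concrete with a small worked example. The sketch below compares the effect of adding chemotherapy to tamoxifen in two risk groups. The counts are invented for illustration and are not taken from any included study, and the calculation is deliberately simplified: it uses crude event proportions, whereas the trials reported time-to-event outcomes with hazard ratios and formal interaction tests. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArmCounts:
    """Distant recurrences observed in one treatment arm of one risk group."""
    events: int  # patients with a distant recurrence during follow-up
    total: int   # patients randomized to this arm

    @property
    def risk(self) -> float:
        return self.events / self.total


def chemo_benefit(tamoxifen_alone: ArmCounts, tamoxifen_plus_chemo: ArmCounts) -> dict:
    """Absolute and relative effect of adding chemotherapy within one risk group."""
    arr = tamoxifen_alone.risk - tamoxifen_plus_chemo.risk  # absolute risk reduction
    rr = tamoxifen_plus_chemo.risk / tamoxifen_alone.risk   # relative risk; < 1 favours chemotherapy
    return {"arr": arr, "rr": rr}


# Hypothetical counts for illustration only; NOT data from any included study.
groups = {
    "low": (ArmCounts(events=6, total=200), ArmCounts(events=6, total=200)),
    "high": (ArmCounts(events=60, total=200), ArmCounts(events=24, total=200)),
}

results = {name: chemo_benefit(*arms) for name, arms in groups.items()}
for name, r in results.items():
    print(f"{name}-risk: ARR = {r['arr']:.2f}, RR = {r['rr']:.2f}")

# Ratio of relative risks (high vs low group). A value far from 1 means the
# treatment effect differs by risk group, i.e. the score is predictive as well
# as prognostic. A real analysis would also need a formal interaction test.
ratio_of_rrs = results["high"]["rr"] / results["low"]["rr"]
print(f"ratio of RRs = {ratio_of_rrs:.2f}")
```

With these invented counts the score is prognostic (baseline risk of 3 percent versus 30 percent) and also predictive: the low-risk group shows no relative benefit (RR = 1.00), while the high-risk group shows an RR of 0.40, so the ratio of relative risks is 0.40.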
Mammostrat
No prospective studies of the impact of Mammostrat on long-term outcomes, such as overall survival, were identified. Initial evidence on the predictive ability of Mammostrat from one study (Reference Ross, Kim and Tang14) suggests that both low- and high-risk groups benefited from chemotherapy, with high-risk patients benefiting more than low-risk patients. The intermediate-risk group did not appear to benefit.
Changes in Treatment Recommendations as a Result of Testing
OncotypeDX
Twenty-eight studies (Reference Oratz, Paul, Cohn and Sedlacek15–Reference Ozmen, Atasoy and Gokmen42) (see Table 3) provided evidence on the impact of OncotypeDX on clinical decision making. These studies indicated that the use of OncotypeDX led to changes in treatment recommendations for between 21 percent and 74 percent of all patients tested. Three studies (Reference Rayhanabad, Difronzo, Haigh and Romero17;Reference Joh, Esposito and Kiluk24;Reference Partin and Mamounas25) did not report whether the changes led to increased or decreased use of chemotherapy. Where this was reported, the number of patients recommended chemotherapy declined after testing in most studies, with the change from chemotherapy to no chemotherapy ranging from 6 percent to 51 percent of all patients tested. In one study, however, more chemotherapy was used after the introduction of OncotypeDX (Reference Biroschak, Schwartz and Palazzo28). In many of the studies, it was not clear whether these figures represented actual changes in the treatments patients received.
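The change rates above are proportions of all patients tested, not of those whose recommendation changed. A minimal sketch, assuming each patient's recommendation is reduced to a binary chemotherapy or no-chemotherapy decision (the included studies also recorded other changes, such as regimen switches, which this ignores), shows how the overall and directional rates follow from paired pre-test and post-test recommendations. The function name and the ten-patient cohort are hypothetical.

```python
from collections import Counter


def recommendation_changes(pairs: list[tuple[str, str]]) -> dict:
    """Summarise paired (before_test, after_test) recommendations.

    Each recommendation is "chemo" or "no_chemo". All percentages use the
    number of patients tested as the denominator.
    """
    n = len(pairs)
    counts = Counter(pairs)
    chemo_to_none = counts[("chemo", "no_chemo")]  # chemotherapy no longer recommended
    none_to_chemo = counts[("no_chemo", "chemo")]  # chemotherapy newly recommended
    return {
        "any_change_pct": 100 * (chemo_to_none + none_to_chemo) / n,
        "chemo_to_no_chemo_pct": 100 * chemo_to_none / n,
        "no_chemo_to_chemo_pct": 100 * none_to_chemo / n,
    }


# Hypothetical cohort of ten tested patients; illustrative only.
cohort = (
    [("chemo", "no_chemo")] * 3
    + [("no_chemo", "chemo")] * 1
    + [("chemo", "chemo")] * 2
    + [("no_chemo", "no_chemo")] * 4
)
print(recommendation_changes(cohort))
# {'any_change_pct': 40.0, 'chemo_to_no_chemo_pct': 30.0, 'no_chemo_to_chemo_pct': 10.0}
```

Reporting the directional rates separately matters because an overall change rate alone cannot show whether testing reduced or increased chemotherapy use, which three of the studies left unreported.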
MammaPrint
Six studies provided evidence on changes in treatment recommendations as a result of MammaPrint (Reference Bueno-De-Mesquita43–Reference Exner, Bago-Horvath and Bartsch49) (see Table 3). These studies indicated that using MammaPrint in addition to clinicopathological factors led to changes in treatment recommendations for between 18 percent and 40 percent of all patients tested, and that between 2 percent and 32 percent of all patients tested would be recommended to change from chemotherapy to no chemotherapy. One of these studies (Reference Gevensleben, Gohring and Buttner45) reported that the use of MammaPrint, compared with AoL, would result in altered treatment advice for 40 percent of patients. However, this figure assumed that all patients classified as high risk would receive chemotherapy and all patients classified as low risk would not. Again, in several of these studies it is not clear whether actual treatment changes occurred following introduction of the test.
A prospective observational study (Reference Bueno-De-Mesquita43) showed that adjuvant treatment was recommended for 48 percent of patients based on the Dutch Institute for Healthcare Improvement (CBO) guidelines (2004) alone, increasing to 62 percent when MammaPrint was added. This increased the number of patients receiving adjuvant systemic therapy by 20 (5 percent). For the other guidelines assessed (St Gallen guidelines, the NPI, and AoL), less adjuvant chemotherapy would be given if the prognostic signature alone were used. A 5-year follow-up study (Reference Drukker, Bueno-De-Mesquita and Retel44) showed that 15 percent of MammaPrint low-risk patients received adjuvant chemotherapy, compared with 81 percent of high-risk patients. The 5-year distant recurrence-free interval probabilities were 97 percent for MammaPrint low-risk patients and 91.7 percent for high-risk patients. Actual treatment decisions were based on restrictive CBO guidelines and on doctors' and patients' preferences, limiting the generalizability of these findings.
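Under the assumption used in the AoL comparison above (high risk means chemotherapy, low risk means none), the proportion of patients with altered advice is simply the proportion whose MammaPrint and AoL risk classes disagree. The sketch below makes that reasoning explicit and splits the discordant patients by direction. The function names and the eight-patient example are hypothetical and do not reproduce any study's data.

```python
def advice(risk: str) -> str:
    """Map a risk class to treatment advice under the stated assumption."""
    # Assumption: high risk -> chemotherapy, low risk -> no chemotherapy.
    return "chemo" if risk == "high" else "no_chemo"


def altered_advice(aol_risk: list[str], mammaprint_risk: list[str]) -> dict:
    """Percentages of patients whose advice differs between AoL and MammaPrint."""
    if len(aol_risk) != len(mammaprint_risk):
        raise ValueError("both tools must classify the same patients")
    n = len(aol_risk)
    pairs = [(advice(a), advice(m)) for a, m in zip(aol_risk, mammaprint_risk)]
    spared = pairs.count(("chemo", "no_chemo"))  # AoL advises chemo, MammaPrint does not
    added = pairs.count(("no_chemo", "chemo"))   # MammaPrint advises chemo, AoL does not
    return {
        "altered_pct": 100 * (spared + added) / n,
        "spared_pct": 100 * spared / n,
        "added_pct": 100 * added / n,
    }


# Hypothetical risk classes for eight patients; illustrative only.
aol = ["high", "high", "low", "low", "high", "low", "high", "low"]
mammaprint = ["high", "low", "low", "high", "high", "low", "high", "low"]
print(altered_advice(aol, mammaprint))
# {'altered_pct': 25.0, 'spared_pct': 12.5, 'added_pct': 12.5}
```

Because the advice is derived mechanically from the risk class, a figure computed this way measures classification discordance rather than any change in the treatment patients actually received.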
IHC4
Evidence from one prospective study (Reference Yeo, Zabaglo and Hills50) demonstrated that the IHC4 test led to changes in treatment recommendations for 34 percent of patients, with 25 percent recommended to switch from chemotherapy to no chemotherapy. Because only one small study (n = 124) is available, it is difficult to generalize from this evidence. Again, it is not clear whether the treatment actually given was changed.
DISCUSSION
OncotypeDX currently has the largest body of evidence on clinical utility of the four tests included in this review. However, no prospective studies reporting the impact of OncotypeDX on long-term outcomes, such as overall survival, yet exist. The study by Paik et al. (Reference Paik, Tang and Shak10) represented the most robust evidence of clinical utility, showing a smaller relative benefit of chemotherapy in the lower-risk groups. However, the cancers in the low-risk groups were less likely to respond to chemotherapy, independent of actual survival probability. Other specific limitations include the fact that, in one study (Reference Holt, Bertelli and Humphreys32), the chemotherapy regimens used have since been superseded by more effective regimens, and more than 44 percent of patients were aged under 50 years, limiting the generalizability of the findings.
The evidence base for MammaPrint rests primarily on small studies (n < 427). Some studies were retrospective in design and had heterogeneous patient populations. Some included only premenopausal women, which may overestimate the benefit of MammaPrint in the early breast cancer population as a whole, given that younger women are likely to be at higher risk of recurrence and more likely to be classified as having a poor prognosis by MammaPrint. Further evidence is required to clarify whether using the test will improve the use of adjuvant chemotherapy in the management of breast cancer. It is also unclear to what extent MammaPrint risk groups are predictive of chemotherapy benefit, or whether the use of MammaPrint will improve patient outcomes through increases in disease-free and overall survival.
One study on Mammostrat (Reference Ross, Kim and Tang14) provides evidence on the benefit of chemotherapy by risk group. However, it indicates that both low- and high-risk groups benefit, while it is unclear how those in the intermediate-risk group would be affected. Further evidence is required; in particular, there was no published evidence on the impact of the test on decision making.
One clinical utility study was available for IHC4 (Reference Yeo, Zabaglo and Hills50). It provided evidence that the test influenced decision making, leading to reductions in the amount of chemotherapy recommended. Although the design was prospective, the study included a relatively small sample of patients.
Limitations
The varied nature of the evidence base makes comparisons between tests difficult. The studies across all tests were characteristically heterogeneous, and a large proportion were small. Many studies used old archived tumor samples, and some used retrospective chart review to elicit treatment recommendations before and after testing. There was a lack of standardized decision-making tools both within and between studies, and patient selection methods were not standardized. Furthermore, several of the studies of OncotypeDX and MammaPrint were funded by the manufacturer, raising potential conflicts of interest and the possibility of publication bias.
Conclusion and Implications
One of the tests (OncotypeDX) has a reasonably large evidence base, although this evidence has some methodological weaknesses in terms of heterogeneity of patient cohorts and retrospective study design. The previous systematic reviews (Reference Marchionni, Wilson and Marinopoulos7;Reference Smartt8) on which our update was based reported that OncotypeDX was furthest along the validation pathway, and that the recurrence score was significantly correlated with disease-free survival and overall survival. There was also some evidence of a significant benefit from chemotherapy in the OncotypeDX high-risk group, although it was acknowledged that this study may have been subject to bias. Our previous review (Reference Ward, Scope and Rafia3) and this update demonstrate that further, larger studies have now reported that support the prognostic capability of the OncotypeDX test, and that the evidence base has been extended to include the LN+ population.
Further studies have also presented evidence on the impact of OncotypeDX on clinical decision making. The previous reviews (Reference Marchionni, Wilson and Marinopoulos7;Reference Smartt8) indicated that evidence on the clinical validity of MammaPrint was not always conclusive or supportive of the prognostic value of the test, and identified one study suggesting that MammaPrint had an impact on clinical decision making. Our previous review (Reference Ward, Scope and Rafia3), together with this update, identified studies showing that the MammaPrint score is a strong independent prognostic factor and may add value to standard clinicopathological measures, although the populations in all of these studies were relatively small. Further studies on the clinical utility of MammaPrint reported on test reclassification against currently used guidelines, finding that treatment advice may change for a proportion of patients. However, none of the studies provided evidence of actual changes in treatment decisions following introduction of the test.
This update has demonstrated that, compared with our original review (Reference Ward, Scope and Rafia3), several new studies have emerged that assess the effect of the tests on clinical decision making. However, most of these studies are small scale, and further robust evidence on the clinical utility of all of these tests is still needed. This would include studies investigating predictive ability and prospective studies investigating how the tests will be used in clinical practice. Two ongoing trials, of OncotypeDX (51) and MammaPrint (52), have been designed to address some of these issues, specifically the effect of these tests on patient outcomes and their ability to predict treatment response. The TAILORx trial (51) aims to demonstrate that endocrine treatment alone is noninferior to chemoendocrine treatment in women with an intermediate OncotypeDX score. Patients allocated to the intermediate-risk group on the basis of the recurrence score will receive endocrine therapy and be randomly assigned to chemotherapy or no chemotherapy. The MINDACT trial (52) aims to assess the value of MammaPrint, compared with AoL, in predicting which patients would benefit from chemotherapy. Patients assessed as high risk by one method and low risk by the other will be randomized to follow the treatment indicated by either MammaPrint or AoL.
Two further objectives of the MINDACT trial, relating to the efficacy of different chemotherapy agents and endocrine treatment strategies, are addressed by two further stages of randomization. These trials should provide direct evidence on whether the use of these tests in breast cancer patients improves outcomes, through RCTs comparing the outcomes of patients receiving standard management with those of patients managed with the aid of the expression-based assays. All tests would benefit from further evidence demonstrating how they will be used in the current decision-making process and, especially, how this will affect patient management decisions.
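The discordant-risk randomization in MINDACT described above can be sketched as a simple allocation rule. This is a simplified illustration, not the trial protocol: it assumes that patients classified as high risk by both MammaPrint and AoL are advised chemotherapy and those classified as low risk by both are not (a rule not stated in this review), and it uses plain 1:1 per-patient randomization rather than whatever stratified or blocked scheme the trial actually used. The function name is hypothetical.

```python
import random


def mindact_style_allocation(mammaprint_risk: str, aol_risk: str,
                             rng: random.Random) -> tuple[str, bool]:
    """Return (arm, chemotherapy_advised) for one patient.

    Hypothetical helper. Handling of concordant patients is an assumption;
    discordant patients are randomized to follow one tool's risk assessment.
    """
    if mammaprint_risk == aol_risk:
        # Both tools agree: follow the shared risk class (assumed rule).
        return ("concordant", mammaprint_risk == "high")
    # Discordant risk: randomize which tool's assessment guides treatment.
    arm = rng.choice(["follow_mammaprint", "follow_aol"])
    followed = mammaprint_risk if arm == "follow_mammaprint" else aol_risk
    return (arm, followed == "high")


rng = random.Random(2016)  # seeded so the allocation sequence is reproducible
for mp, aol in [("high", "high"), ("low", "low"), ("low", "high"), ("high", "low")]:
    print(mp, aol, mindact_style_allocation(mp, aol, rng))
```

Only the discordant patients contribute to the randomized comparison, which is why such a design can test whether acting on the genomic result, rather than on the clinical tool, changes outcomes.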
CONFLICTS OF INTEREST
The authors have no conflicts of interest to report.