Introduction
We sought to replicate a comparison of the outcomes of cognitive–behavioural therapy (CBT), person-centred therapy (PCT) and psychodynamic or psychoanalytic therapy (PDT) as delivered in routine primary-care mental health practice within the UK National Health Service (Stiles et al. 2006). CBT, PCT, and PDT are distinct approaches in terms of their usual repertoires of interventions and their assumptions about the nature and sources of psychopathology. Each encompasses a range of techniques and should be considered a family of treatments rather than a specific treatment protocol. Each is considered by its practitioners as widely applicable to the varied problems presented for psychotherapeutic treatment (Gabbard et al. 2005; Feltham & Horton, 2006).
There is strong evidence for the efficacy and effectiveness of CBT for a wide variety of disorders (Dobson, 1989; Clark et al. 1999; Ladouceur et al. 2000; Teasdale et al. 2000; Hollon et al. 2002; Hollon & Beck, 2004; Ma & Teasdale, 2004; Ehlers et al. 2005; Westbrook & Kirk, 2005; Whittal et al. 2005; Butler et al. 2006). Fewer studies have systematically examined outcomes of the other two approaches, but available evidence similarly supports the efficacy and effectiveness of at least some varieties of PCT (Greenberg & Watson, 1998; Ward et al. 2000; Elliott et al. 2004; Goldman et al. 2006) and PDT (Leichsenring, 2001; Leichsenring & Leibing, 2003; Leichsenring et al. 2004). Clinical trials comparing alternative approaches (Elkin et al. 1989; Shapiro et al. 1994; Barkham et al. 1996; Ward et al. 2000) and broadly based reviews (Wampold, 2001; Roth & Fonagy, 2004) have concluded that bona fide therapies that have been actively researched tend to be similarly effective. This is the equivalence paradox: many psychotherapies appear to have equivalently positive outcomes despite manifestly non-equivalent theories and techniques. The paradox is expressed by the Dodo verdict, ‘Everybody has won, and all must have prizes’ (Carroll, 1865/1946, p. 28; italics in original). This verdict has been quoted by psychotherapy researchers for more than 70 years, although debate about it continues (Rosenzweig, 1936; Luborsky et al. 1975; Stiles et al. 1986; Beutler, 1991; Norcross, 1995; Seligman, 1995; Hunsley & Di Giulio, 2002).
Despite such indications of equivalent outcomes across many treatments, the predominance of published research on CBT has given CBT a greater credibility than the other approaches. In the USA, the great majority of approaches on the list of empirically supported treatments produced by the American Psychological Association's Division 12 Task Force (Chambless & Hollon, 1998; Chambless et al. 1998) were in the CBT family, while in the UK, guidelines proposed by the National Institute for Clinical Excellence (NICE 2004a, b, 2005) and a widely discussed proposal by Layard (2006) for improving access to psychological therapies concentrated on CBT. However, these endorsements were based explicitly on the quantity and quality of research about CBT approaches rather than their demonstrated superiority to alternative bona fide treatments (Hunot et al. 2007).
Our approach followed the logic of clinically representative, or effectiveness, research, in which the risks of selection biases associated with lack of randomization and the lack of assurance that the treatments were delivered in a standard way are balanced by the greater realism, or external validity (Seligman, 1995). Results address the effects of treatments as routinely delivered, using practitioners' versions of the treatments and the patients who typically receive them (Shadish et al. 2000; Street et al. 2000; Stirman et al. 2003).
In our earlier study (Stiles et al. 2006), we found negligible differences in effectiveness among these alternative approaches (CBT, PCT, and PDT) as delivered in routine NHS practice. The practical importance of the question of differential effectiveness, however, mandates replication. The present comparison drew from a similar population using a later, non-overlapping sample of patients, which was more than four times larger (5613 v. 1309). In addition, whereas the previous sample included some patients treated in secondary and tertiary care, all in the present sample were treated in primary-care mental health services, eliminating a possible source of confounding.
Method
Participants
We studied data from 5613 adult patients who received CBT, PCT, or PDT and completed the Clinical Outcomes in Routine Evaluation – Outcome Measure (CORE-OM; Barkham et al. 1998, 2001, 2005; Evans et al. 2000, 2002, 2006; Cahill et al. 2006; Mellor-Clark et al. 2006; Connell et al. 2007) at the beginning and end of their treatment. These data were collected during a 3-year period (April 2002–September 2005) at 32 National Health Service (NHS) primary-care services delivering counselling and psychological therapy as part of routine evaluation and outcome auditing. The services each contributed from 3 to 669 of these patients (median=85.5; 11 of the services each contributed 200 or more of the patients and five of the services each contributed fewer than 20 of the patients). These patients were seen by 399 therapists, who each saw 3–153 patients (median=6; 90 of the therapists each saw 20 or more of these patients and 145 of the therapists each saw 3 or fewer of these patients). Therapist characteristics were not recorded.
Of these 5613 patients, 70.7% (n=3970) were female, and their mean age was 40.7 years (s.d.=12.7, range 16–99). Patients presented a variety of psychological problems, as described later. Over half of the patients (n=2989, 53.3%) were taking prescribed psychotropic medications at the start of therapy, most commonly antidepressants and anxiolytics/hypnotics.
CORE system measures
Self-report outcome measure
The CORE-OM comprises 34 items addressing domains of subjective well-being, symptoms (anxiety, depression, physical problems, trauma), functioning (general functioning, close relationships, social relationships) and risk (risk to self, risk to others), half low intensity (e.g. ‘I feel anxious/nervous’) and half high intensity (e.g. ‘I feel panic/terror’). Items are scored on a 0–4 scale, anchored Not at all, Only occasionally, Sometimes, Often, and All or most of the time. CORE clinical scores, computed as the mean of all completed items multiplied by 10, can range from 0 to 40. A recommended cut-off between clinical and normal populations is 10 (Connell et al. 2007). Forms are considered valid if up to three items are omitted (Evans et al. 2002). Internal consistency reliability was α=0.93 based on pre-treatment forms in the present study (n=5613). Test–retest reliability was r=0.88 over 1 month in a clinical sample (n=89, Barkham et al. 2007).
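The scoring rule just described can be illustrated with a minimal sketch. The function name and the use of `None` for an omitted item are our own illustrative choices, not part of the CORE system, and the sketch assumes that responses are already keyed so that higher values indicate greater distress.

```python
from typing import Optional, Sequence

N_ITEMS = 34      # number of CORE-OM items, as stated in the text
MAX_OMITTED = 3   # a form with more omitted items is treated as invalid


def core_om_clinical_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return the clinical score (0-40), or None if the form is invalid.

    `responses` holds one entry per item: an integer from 0 to 4,
    or None where the patient omitted the item (illustrative convention).
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    completed = [r for r in responses if r is not None]
    if any(r not in range(5) for r in completed):
        raise ValueError("item responses must be integers from 0 to 4")
    if N_ITEMS - len(completed) > MAX_OMITTED:
        return None  # too many omissions: form not valid
    # Mean of the completed items, multiplied by 10
    return 10 * sum(completed) / len(completed)
```

A patient answering ‘Sometimes’ (2) to every item would thus obtain a clinical score of 20, well above the recommended cut-off of 10.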
Therapist assessments
The CORE Assessment (Mellor-Clark et al. 1999; Mellor-Clark & Barkham, 2006) comprises the Therapist Assessment form, completed at intake, and the End of Therapy form. On the Therapist Assessment form, therapists gave referral information, patient demographics, and data on the nature, severity, and duration of presenting problems using 14 categories: depression, anxiety, psychosis, personality problems, cognitive/learning difficulties, eating disorder, physical problems, addictions, trauma/abuse, bereavement, self-esteem, interpersonal problems, living/welfare and work/academic.
On the End of Therapy form, therapists indicated which type(s) of therapy was (were) undertaken with the patient – as many as appropriate. Categories were psychodynamic, psychoanalytic, cognitive, behavioural, cognitive/behavioural, structured/brief, person-centred, integrative, systemic, supportive, art, and other. Therapists also reported the number of sessions attended and other aspects of the treatments.
Procedure
Data collection
Patients attending for psychological assessment or therapy at services using the Personal Computer format of the CORE System (CORE-PC; Mellor-Clark et al. 2006) were asked to complete a CORE-OM before treatment began – during screening or assessment or immediately before the first therapy session. Services were asked to administer CORE system measures to all such patients, although compliance was not monitored. Patients were allocated to treatments and to therapists following services' normal procedures. Services were instructed to give the post-treatment CORE-OM at the last session; the timing and specific procedures were determined by what worked best for each service administratively and were not recorded. Therapists completed the Therapist Assessment form after the intake session and the End of Therapy form when the patient was discharged or stopped attending for therapy. Patients and therapists completed forms by hand, and the data were then entered onto proprietary CORE-PC software to prepare standardized reports for the services. Each patient was allocated a unique code number by the service. With the agreement of the NHS services, the anonymized data were stored by an independent IT management system for use in research. Data collection complied with data protection procedures for the use of routinely collected clinical data. Ethics approval for this study was covered by NHS COREC application 05/Q1206/128.
Selection of patients
The 5613 patients studied were selected from the CORE National Research Database-2005 (CORE NRD-2005; Mellor-Clark et al. 2006), which includes information on 33 587 patients (69.4% female; mean age=38.5, s.d.=13.1) whose therapist returned a CORE Assessment form at one of 34 NHS primary-care counselling services during the 3-year period we considered. These services had been using CORE-PC for at least 2 years and represented approximately 50% of the NHS primary-care psychological therapy services using the CORE common methodology (Evans et al. 2006). The CORE NRD-2005 comprises data donated by services that were engaged in ongoing data collection, rather than a time-limited study.
Of the 33 587 patients in the CORE NRD-2005, 5327 did not return any valid CORE-OM forms; 569 returned post-therapy but not pre-therapy forms; and 14 945 returned pre-therapy but not post-therapy forms. The latter, largest category included patients who did not attend any sessions, patients who attended sessions but left without completing the final form, and patients who had not ended their treatment by the closing date of data collection. The therapists of 584 of the remaining 12 746 patients failed to complete the End of Therapy form (which included information on treatment approaches used).
From the 12 162 patients who completed valid pre- and post-treatment CORE-OM forms and whose therapist completed End of Therapy forms, we selected six treatment groups based on therapists' reports on the End of Therapy form regarding the type(s) of therapy undertaken. A majority of therapists indicated more than one of the 12 categories provided (mean=1.85, range=1–9). Following the same criteria as previously (Stiles et al. 2006), we classified the three targeted approaches as follows:
• CBT, cognitive, behavioural, and/or cognitive/behavioural;
• PCT, person-centred;
• PDT, psychodynamic and/or psychoanalytic.
Using these targeted approaches, we defined six groups of patients. Three groups included those whose therapists specified therapies belonging to one and only one of the targeted approaches – CBT, PCT, or PDT. The other three groups were those whose therapists specified one of the targeted approaches plus one treatment not included in the targeted approaches (i.e. one of the following: structured/brief, integrative, systemic, supportive, art, or other), abbreviated CBT+1, PCT+1, and PDT+1, respectively. We reasoned that the latter three groups offered comparisons among the targeted approaches that were parallel to those among the pure groups but, depending on one's perspective, somewhat diluted or somewhat enhanced.
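The grouping rule can be sketched as follows. This is an illustration of the criteria as stated above, not the procedure actually used to process the database; the function name, the category spellings, and the use of `None` for patients falling outside the six groups are our assumptions.

```python
from typing import Iterable, Optional

# End of Therapy categories belonging to each targeted approach
TARGETED = {
    "CBT": {"cognitive", "behavioural", "cognitive/behavioural"},
    "PCT": {"person-centred"},
    "PDT": {"psychodynamic", "psychoanalytic"},
}
# Categories not included in any targeted approach
OTHER = {"structured/brief", "integrative", "systemic", "supportive", "art", "other"}


def assign_group(categories: Iterable[str]) -> Optional[str]:
    """Map the categories ticked by a therapist to one of the six groups.

    Returns None when the patient falls outside the six groups.
    """
    ticked = set(categories)
    known = OTHER.union(*TARGETED.values())
    if not ticked <= known:
        raise ValueError(f"unknown categories: {sorted(ticked - known)}")
    approaches = [name for name, cats in TARGETED.items() if ticked & cats]
    n_other = len(ticked & OTHER)
    if len(approaches) != 1:
        return None  # none, or more than one, of the targeted approaches
    if n_other == 0:
        return approaches[0]          # pure group
    if n_other == 1:
        return approaches[0] + "+1"   # targeted approach plus one other treatment
    return None  # more than one additional treatment
```

Under this sketch a therapist who ticked both cognitive and behavioural is classified as CBT, because both categories belong to the same targeted approach, whereas one who ticked person-centred and integrative is classified as PCT+1.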
Of the 12 162 patients for whom the required information was available, 5613 adult patients met specifications for one of the six groups (see Table 1). In this study, we did not give further consideration to patients whose therapists indicated none of the targeted approaches (n=3800) or more than one of the targeted approaches (n=1849) or more than one treatment in addition to one of the targeted approaches (n=894). Each of these residual categories comprised a large variety of the approaches listed on the CORE End of Therapy form. We also excluded six patients whose recorded age was below 16.
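As a check on the selection flow, the counts reported in this section can be tallied step by step. The sketch below only restates figures given in the text; the variable names are illustrative, and it assumes that the six patients under 16 were removed after the other exclusions.

```python
def selection_flow() -> dict:
    """Tally the patient counts reported in the text, step by step."""
    total = 33_587            # patients in the CORE NRD-2005
    no_valid_forms = 5_327    # returned no valid CORE-OM form
    post_only = 569           # post-therapy form but no pre-therapy form
    pre_only = 14_945         # pre-therapy form but no post-therapy form
    both_forms = total - no_valid_forms - post_only - pre_only

    no_end_of_therapy_form = 584
    eligible = both_forms - no_end_of_therapy_form

    no_targeted_approach = 3_800
    several_targeted_approaches = 1_849
    several_additional_treatments = 894
    under_16 = 6              # assumed to be removed last
    analysed = (eligible - no_targeted_approach - several_targeted_approaches
                - several_additional_treatments - under_16)

    return {"both_forms": both_forms, "eligible": eligible, "analysed": analysed}


if __name__ == "__main__":
    for step, n in selection_flow().items():
        print(f"{step}: {n}")
```

The tally reproduces the 12 746 patients with both forms, the 12 162 with complete information, and the 5613 patients analysed.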
Table 1. CORE-OM clinical scores for treatment groups: pre- and post-therapy means, differences and confidence intervals for the differences
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042853625-0305:S0033291707001511:S0033291707001511_tab1.gif?pub-status=live)
CBT, Cognitive, behavioural, or cognitive/behavioural therapy; PCT, person-centred therapy; PDT, psychodynamic/psychoanalytic therapy; CBT+1, CBT combined with one other therapy; PCT+1, PCT combined with one other therapy; PDT+1, PDT combined with one other therapy; Effect size, calculated as the mean difference divided by the pooled pre-therapy standard deviation.
Results
Outcomes of treatment in NHS settings
Patients in these treatments showed very substantial gains, with patients improving, on average, from 17.60 (s.d.=6.33) to 8.77 (s.d.=6.43) on the CORE-OM, a difference of 8.83 (s.d.=6.64). The overall treatment effect size, calculated as the mean pre/post difference divided by the pre-therapy s.d. was 1.39.
Table 1 shows the mean pre-treatment and post-treatment CORE-OM clinical scores for each of the six groups, mean differences across treatment, and effect sizes. A one-way analysis of variance (ANOVA) comparing the pre-therapy means across the six groups was not significant [F(5, 5607)=1.90, p=0.091, partial η2=0.002 (partial η2, calculated as effect variance divided by effect plus error variance)], indicating that all groups began treatment with approximately equivalent levels of disturbance.
To assess treatment outcomes, we conducted a repeated-measures (pre-treatment v. post-treatment) ANOVA, with treatment approach (CBT v. PCT v. PDT) and degree of purity (pure v. ‘+1’) as fixed factors. Results showed a very large overall within-patients main effect of treatment [F(1, 5607)=6805.63, p<0.001, partial η2=0.548], indicating that improvement across treatment accounted for a large proportion of the variation in the obtained CORE-OM scores. In this analysis, a differential treatment effect appears as a treatment by occasion of assessment (pre/post) interaction. This effect was not significant [F(2, 5607)=0.81, p=0.446, partial η2<0.001]. The comparative effectiveness of the pure versus ‘+1’ forms of treatments (the purity by occasion of assessment interaction) also failed to reach significance [F(1, 5607)=3.23, p=0.073, partial η2=0.001], as did the three-way treatment by purity by occasion interaction, which would have indicated that purity was differentially important for the therapies [F(2, 5607)=0.58, p=0.561, partial η2<0.001].
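For readers who wish to check the reported effect sizes, partial η2 can be recovered from an F ratio and its degrees of freedom through the standard identity η2 = (F × df effect)/(F × df effect + df error). The following minimal sketch applies that identity to the values reported above; the function name is ours.

```python
def partial_eta_squared(f_ratio: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    Equivalent to effect variance / (effect variance + error variance),
    because F * df_effect / df_error equals SS_effect / SS_error.
    """
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


if __name__ == "__main__":
    # F ratios and degrees of freedom as reported in the text
    reported = {
        "occasion (pre/post)": (6805.63, 1, 5607),
        "treatment x occasion": (0.81, 2, 5607),
        "purity x occasion": (3.23, 1, 5607),
    }
    for label, (f_ratio, df1, df2) in reported.items():
        print(f"{label}: {partial_eta_squared(f_ratio, df1, df2):.4f}")
```

Applied to the main effect of occasion, the identity returns 0.548, matching the reported value; the two interaction terms fall at or below 0.001.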
Fig. 1 depicts the distributions of the pre/post differences in CORE-OM clinical scores for each group in the form of notched box plots, which indicate the median, middle 50%, and range. Although these change scores ranged widely (from −29 to 40 out of the possible CORE-OM range of −40 to 40), the medians and distributions of all six groups were very similar.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042853625-0305:S0033291707001511:S0033291707001511_fig1g.gif?pub-status=live)
Fig. 1. Notched box plots showing pre/post differences in CORE-OM clinical scores. The notch shows the 95% confidence interval around the median. The boxes show the middle 50% of the distribution. The whiskers show the range, except that observations falling 1.5 times the interquartile range or more away from the top or bottom of the box are considered outliers and are shown separately. CBT, Cognitive, behavioural, or cognitive/behavioural therapy (n=1045); PCT, person-centred therapy (n=1709); PDT, psychodynamic/psychoanalytic therapy (n=261); CBT+1, CBT combined with one other therapy (n=1035); PCT+1, PCT combined with one other therapy (n=1033); PDT+1, PDT combined with one other therapy (n=530).
As an additional approach to the same question, following Jacobson & Truax (1991), we distinguished patients who had achieved reliable and clinically significant improvement (RCSI) as those who met two criteria. First, patients must show reliable improvement, defined as a pre/post difference that, when divided by the standard error of the difference, is equal to or greater than 1.96. Calculating the standard error of the difference using our s.d.diff=6.64 and pre-treatment internal consistency reliability, α=0.93, yielded a reliable change index of 4.9 (see Jacobson & Truax, 1991 for formulae). Second, the patient must show clinically significant improvement – entering treatment scoring within the clinical population and leaving treatment in the normal population – defined as moving from above to below the recommended CORE-OM clinical cut-off score of 10 (Connell et al. 2007).
Table 2 shows the number and percentage of patients in each treatment group who (a) achieved RCSI (CORE-OM scores decreasing from ⩾10 to <10, having dropped by 4.9 or more), (b) achieved reliable improvement only (decrease of 4.9 or more that did not fall below the cut-off of 10), (c) showed no reliable change, and (d) showed reliable deterioration, defined as an increase of 4.9 or more points on the CORE-OM. We restricted this analysis to the 4954 patients whose pre-therapy CORE-OM scores were ⩾10, insofar as patients whose initial scores were below the clinical cut-off could not, by definition, achieve clinically significant improvement (Barkham et al. 2006). The results showed substantial RCSI rates in all groups. A 2×6 χ2 comparing the rates of RCSI versus non-RCSI across the six treatment groups was not significant [χ2(5)=4.14, p=0.530, n=4954], reflecting the very similar improvement rates across the six groups.
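The reliable change threshold and the four outcome categories can be sketched as follows. The threshold uses the Jacobson & Truax (1991) formula, in which the standard error of the difference is √2 times the standard error of measurement, with the values stated above (s.d.=6.64, α=0.93). The function names and category labels are illustrative, and the sketch is not the analysis code used for Table 2.

```python
import math


def reliable_change_index(sd: float, reliability: float, z: float = 1.96) -> float:
    """Smallest pre/post difference counted as reliable (Jacobson & Truax, 1991)."""
    se_measurement = sd * math.sqrt(1 - reliability)
    se_difference = math.sqrt(2) * se_measurement
    return z * se_difference


def classify_change(pre: float, post: float,
                    rci: float = 4.9, cutoff: float = 10.0) -> str:
    """Classify one patient's outcome from CORE-OM clinical scores.

    The analysis in the text is restricted to patients with pre >= cutoff.
    """
    change = pre - post  # positive values indicate improvement
    if change >= rci:
        if pre >= cutoff and post < cutoff:
            return "RCSI"
        return "reliable improvement only"
    if change <= -rci:
        return "reliable deterioration"
    return "no reliable change"
```

With these inputs the threshold rounds to 4.9, and a patient moving from the sample mean pre-treatment score of 17.60 to the mean post-treatment score of 8.77 would be classified as having achieved RCSI.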
Table 2. Reliable and clinically significant improvement (RCSI) by treatment group
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042853625-0305:S0033291707001511:S0033291707001511_tab2.gif?pub-status=live)
n=4954 patients whose pre-therapy CORE-OM scores were at or above the recommended cut-off of 10 (Connell et al. 2007).
RCSI, Reliable and clinically significant improvement; Reliable improvement only, decrease of 4.9 or more that did not fall below the cut-off of 10 on the CORE-OM; CBT, cognitive, behavioural, or cognitive/behavioural therapy; PCT, person-centred therapy; PDT, psychodynamic/psychoanalytic therapy; CBT+1, CBT combined with one other therapy; PCT+1, PCT combined with one other therapy; PDT+1, PDT combined with one other therapy.
Characteristics of patients allocated to treatment groups
Table 3 shows the distributions of presenting problems within treatment groups, as reported by the therapists just following their first contact with the patients. Note that therapists often indicated multiple problems. Broadly similar distributions of problems were treated within each group, but there appeared to be a few small differences. For example, patients seen in CBT were less likely to be reported as presenting with interpersonal problems.
Table 3. Percentage of patients in each treatment group with indicated presenting problems
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042853625-0305:S0033291707001511:S0033291707001511_tab3.gif?pub-status=live)
Based on n=5613 patients.
CBT, Cognitive, behavioural, or cognitive/behavioural therapy; PCT, person-centred therapy; PDT, psychodynamic/psychoanalytic therapy; CBT+1, CBT combined with one other therapy; PCT+1, PCT combined with one other therapy; PDT+1, PDT combined with one other therapy.
Columns in the presenting problems section add to more than 100% because therapists often indicated multiple problems for some patients.
Table 4 shows the gender, age, mean number of problems indicated, and mean number of sessions attended for patients in each treatment group. Gender was somewhat unevenly distributed across groups [χ2(5)=25.14, p<0.001, n=5613], with lower percentages of female patients in the CBT and PDT groups and higher percentages in the PCT+1 and PDT+1 groups. Mean age was similar across the six groups. A one-way ANOVA was nominally significant [F(5, 5607)=2.39, p=0.036, partial η2=0.002]; however, none of the groups was significantly different from the others in post-hoc tests. The numbers of presenting problems varied significantly across the six groups [F(5, 5607)=31.42, p<0.001, partial η2=0.027], reflecting somewhat larger numbers of problems being attributed to patients in the PDT and PDT+1 groups. There was also significant variation in numbers of sessions attended [F(5, 5447)=25.10, p<0.001, partial η2=0.023], as patients in PDT averaged a somewhat higher number of sessions than patients in other groups.
Table 4. Demographics and mean numbers of presenting problems and sessions attended by treatment groups
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042853625-0305:S0033291707001511:S0033291707001511_tab4.gif?pub-status=live)
Based on n=5613 patients.
CBT, Cognitive, behavioural, or cognitive/behavioural therapy; PCT, person-centred therapy; PDT, psychodynamic/psychoanalytic therapy; CBT+1, CBT combined with one other therapy; PCT+1, PCT combined with one other therapy; PDT+1, PDT combined with one other therapy.
Discussion
The design improvements in this replication (larger sample and restriction to primary-care treatments) in comparison to the previous study (Stiles et al. 2006) yielded, if anything, a closer approximation to equivalent outcomes. For patients who completed the pre- and post-treatment forms, these therapies appeared effective, with mean pre/post improvements that approached those observed in efficacy trials (M. Barkham et al. unpublished observations; Hunsley & Lee, 2007). The mean differences among the three targeted approaches, CBT, PCT, and PDT, did not approach significance, despite the high statistical power of this test, nor did the differences between the pure and the diluted/enhanced (‘+1’) treatment groups (Table 1). The RCSI rates too were closely similar across these treatments (Table 2).
In replicating the earlier results, the present results uphold the 70-year-old Dodo verdict on this issue (cf. Rosenzweig, 1936). More immediately, they extend findings from randomized trials conducted within the NHS (Barkham et al. 1996; Ward et al. 2000), confirming that the equivalence of these treatments may also be observed in routine practice. These replicated results may be of particular interest to practitioners of PCT and PDT, insofar as these approaches' comparable effectiveness to CBT in routine practice may have been unappreciated (cf. Holmes, 2002).
As Fig. 1 shows, the distributions of change scores were mostly overlapping. There was a great deal of variation in outcomes within each group, however, contrasting with the lack of differences between groups. Thus, much scope remains for research on sources of variation in psychotherapy outcome, including differences among therapists (Lutz et al. 2007), patients, problems, and contexts.
In addition to the noted design improvements, there were incremental improvements in data gathering in the CORE NRD-2005, compared with the database used in the previous study (Stiles et al. 2006). A larger percentage of patients returned valid pre-treatment and post-treatment CORE-OM forms than previously (38% or 12 746/33 587 v. 33% or 3424/10 351), and the larger sample was drawn from a smaller set of services (34 v. 58), representing more therapists from each site and larger percentages of therapists' caseloads.
Limitations and alternative accounts
Despite this study's improvements, many of the limitations and caveats of the previous study (Stiles et al. 2006) apply to this one as well. All are common in research on routine treatments (Shadish et al. 2000; Westbrook & Kirk, 2005).
Limited specification of treatments and therapist responsiveness
As previously (Stiles et al. 2006), we had no independent check on whether or how the therapists delivered the treatments they indicated, no precise descriptions of what the treatments comprised, and no details of the therapists' qualifications. The therapists received no special training for this study. The observed equivalent outcomes could, in principle, reflect a lack of differences in how the treatments were conducted. Or, systematic failure to implement a treatment correctly could account for any particular treatment having failed to prove superior.
On the other hand, we know of no reason why the therapists would have misrepresented the theoretical approach from which they worked. The most plausible assumption, we suggest, is that they sincerely sought to implement the approach they specified. To put it another way, for better or worse, these treatments represent the way CBT, PCT, and PDT are currently practised in these settings.
Presumably, any concern about deviations from the indicated approach(es) applies to all of the treatments, although proponents may be differentially sensitive to possible deviations within their favoured approach. The observation that patients in the diluted/enhanced treatments improved as much as those in the pure treatments suggests that, across the range of purity represented by our comparisons and given the limitations of self-report, greater purity did not yield better outcomes.
Even psychotherapies that specify component techniques are far from standardized. More than most medical treatments, psychotherapies must be adapted to the emerging needs of varied patients in ways that are not specified in a protocol but depend on the skill and interpersonal responsiveness of the therapist (Stiles et al. 1998). Therapists responsively vary their interventions depending on patients' backgrounds, circumstances, capacities, and personalities as well as on the nature and severity of patients' presenting problems (Hardy et al. 1998; M. Barkham et al. unpublished observations).
Non-random assignment and equivalence of treatment groups
Because patients were not randomly assigned to treatments, we cannot rule out biased assignment as an explanation for the observed equivalence. In principle, assigning the most disturbed patients to the most effective therapies and vice versa might mask outcome differences that would emerge in a randomized trial. Alternatives to the Dodo verdict could be built on scenarios that explain how a selective bias differentially penalized an approach that would otherwise have proved superior, so that it appeared equivalent to the other approaches.
Most importantly, however, the groups had equivalent initial scores on the CORE-OM – more equivalent than is typically the case in small-sample randomized trials – indicating that the groups began with similar overall levels of disturbance (Table 1). There were some other modest differences in the patient mix across groups. Replicating the previous results (Stiles et al. 2006), the largest proportion of males was assigned to the CBT group, the largest average numbers of problems were in the PDT and PDT+1 groups, and patients in the PDT group tended to receive more sessions than patients in the other groups (Table 4). Distributions of presenting problems were broadly similar across groups (Table 3) and in any case were based on therapists' reports, so they could reflect differences in interviewing styles or case conceptualization (e.g. PCT and PDT therapists might be more inclined than CBT therapists to consider interpersonal problems as central). To us, the nature and magnitude of these differences did not suggest compelling scenarios in which otherwise superior treatments were differentially penalized.
Lack of experimental control
Because there was no control group of untreated patients, we cannot attribute patients' improvement to the treatments with certainty. Of course, this is usual in naturalistic studies (cf. Corney & Simpson, Reference Corney and Simpson2005). In another NHS primary-care sample, however, CORE-OM scores remained reasonably stable over periods of up to a year among patients waiting to be assessed for therapy (high test–retest correlations and negligible mean change from screening scores among patients who returned for treatment; Barkham et al. Reference Barkham, Mullin, Leach, Stiles and Lucock2007).
In the same vein, we had little control over the procedures used to collect data, which were gathered by participating services for other purposes as part of an ongoing evaluation process. Although we had no indication of biased procedures that would have produced a spurious equivalence of outcomes, we cannot rule out this possibility.
Incomplete data, the possibility of selective reporting, and the case for improved access
Whereas 38% of patients represented in the CORE NRD-2005 (12 746/33 587) returned valid pre-treatment and post-treatment CORE-OM forms, 44% (14 945/33 587) returned valid pre-therapy but not post-therapy forms. Incomplete data are common in routine practice settings (Stiles et al. Reference Stiles, Leach, Barkham, Lucock, Iveson, Shapiro, Iveson and Hardy2003; Gilbert et al. Reference Gilbert, Barkham, Richards and Cameron2005; Greasley & Small, Reference Greasley and Small2005), and we emphasize that our conclusions are limited to patients who completed post-treatment forms. Among patients who do receive treatment, those who complete post-treatment measures are more likely to have agreed with their therapist about when treatment should end (Barkham et al. Reference Barkham, Connell, Stiles, Miles, Margison, Evans and Mellor-Clark2006) and more likely to have improved more during treatment (Stiles et al. Reference Stiles, Leach, Barkham, Lucock, Iveson, Shapiro, Iveson and Hardy2003) than are patients who fail to complete them.
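The proportions quoted above can be checked directly from the reported counts. The following is a minimal sketch; the variable names are ours, and the residual category (patients with neither pattern of valid forms) is derived by subtraction and is not itself described in this section.

```python
# Counts as reported in the text for the CORE NRD-2005.
total_patients = 33_587   # all patients represented in the data set
pre_and_post = 12_746     # valid pre- and post-treatment CORE-OM forms
pre_only = 14_945         # valid pre-therapy form but no valid post-therapy form


def percentage(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


pct_pre_and_post = percentage(pre_and_post, total_patients)
pct_pre_only = percentage(pre_only, total_patients)

# Derived by subtraction; the text does not characterize this remainder.
remainder = total_patients - pre_and_post - pre_only

print(f"pre and post: {pct_pre_and_post:.1f}%")
print(f"pre only:     {pct_pre_only:.1f}%")
print(f"remainder:    {remainder} patients")
```

Rounded to whole numbers, the two percentages agree with the 38% and 44% given in the text.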
The low percentage of patients with post-treatment CORE-OM scores raised concerns about possible selective reporting. In an analysis not yet published, we addressed one concern by examining data from the 343 therapists who saw 15 or more of the patients; these included 31 966 (95%) of the patients in the CORE NRD-2005. If therapists were selectively influencing their good-outcome patients to complete post-treatment forms, improvement rates would be negatively correlated with reporting rates, i.e. the more selective therapists would tend to have relatively better improvement rates. Results showed that the therapists varied widely in their reporting rates (proportions of patients with post-treatment forms) and in their patients' rates of improvement (mean CORE-OM change scores and proportions of patients who achieved RCSI). The return rates were essentially uncorrelated with improvement rates across therapists, however, suggesting either that there was little or no selective reporting of good-outcome cases or that therapists tried to report selectively but failed. Successful selective reporting would require therapists to know which of their cases would do well or poorly on their post-treatment CORE-OM. However, other research suggests that most therapists think most of their cases did very well and have no accurate idea of which of their patients did well or poorly (Hunsley et al. Reference Hunsley, Aubry, Verstervelt and Vito1999; Hannan et al. Reference Hannan, Lambert, Harmon, Nielsen, Smart, Shimokawa and Sutton2005). If therapists cannot discriminate successful from unsuccessful patients, as these results suggest, they could not select the successful ones.
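The logic of this therapist-level check can be illustrated with a short sketch. It is not the analysis code used in the study: the record layout, the function names, and the toy data are assumptions made for illustration, and the toy data are constructed to show the pattern that selective reporting would produce (a negative correlation), not the essentially zero correlation actually observed.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def therapist_rates(records, min_caseload=15):
    """Map therapist id -> (reporting rate, mean change score).

    records: iterable of (therapist_id, change), where change is the
    pre-post CORE-OM change score, or None if no post-treatment form
    was returned (hypothetical layout). Therapists with fewer than
    min_caseload patients, or with no returned forms, are skipped.
    """
    by_therapist = defaultdict(list)
    for therapist_id, change in records:
        by_therapist[therapist_id].append(change)

    rates = {}
    for therapist_id, changes in by_therapist.items():
        if len(changes) < min_caseload:
            continue
        returned = [c for c in changes if c is not None]
        if not returned:
            continue
        reporting_rate = len(returned) / len(changes)
        mean_change = sum(returned) / len(returned)
        rates[therapist_id] = (reporting_rate, mean_change)
    return rates


# Invented toy data: therapists who return fewer forms show larger gains.
toy_records = (
    [("A", 8.0)] + [("A", None)] * 3
    + [("B", 6.0)] * 2 + [("B", None)] * 2
    + [("C", 4.0)] * 3 + [("C", None)]
    + [("D", 1.0)]  # caseload too small; excluded
)

rates = therapist_rates(toy_records, min_caseload=4)
reporting = [rate for rate, _ in rates.values()]
improvement = [change for _, change in rates.values()]
r = pearson(reporting, improvement)
print(f"therapists included: {sorted(rates)}; r = {r:.2f}")
```

In the toy data the correlation is strongly negative, the signature that successful selective reporting would leave. A correlation near zero, as reported above, gives no such indication.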
Patients who drop out before completing treatment or who never begin treatment in the first place are of great policy interest and deserve attention by researchers. The substantial mean improvement we observed among those who did complete treatment (and post-treatment measures) is a strong argument for improving access to psychotherapy (cf. Layard, Reference Layard2006), working to overcome economic, social, and personal barriers to treatment.
Restriction to one self-report measure
In principle, self-report instruments are vulnerable to distortions. Patients may exaggerate their distress before treatment or exaggerate their improvement following treatment. The CORE-OM is a broad-spectrum measure and, in our use of it, did not focus on patients' specific problems. Qualitatively different results of the different treatments might have shown up on more targeted measures.
Arguably, however, subjective symptoms and distress define the need for psychotherapeutic treatment in most cases. The CORE-OM is highly correlated with other self-report and clinician-rated measures of outcome (Evans et al. Reference Evans, Connell, Barkham, Margison, Mellor-Clark, McGrath and Audin2002; Leach et al. Reference Leach, Lucock, Barkham, Noble, Clarke and Iveson2005, Reference Leach, Lucock, Barkham, Stiles, Noble and Iveson2006; Cahill et al. Reference Cahill, Barkham, Stiles, Twigg, Hardy, Rees and Evans2006). Additional measures would be informative, but the gain must be balanced against the burden they would impose on the patients and the system.
Investigator allegiance
In a review of psychotherapy outcome studies, Luborsky et al. (Reference Luborsky, Diguer, Seligman, Rosenthal, Krause, Johnson, Halperin, Bishop, Berman and Schweizer1999) reported that the investigator's allegiance, assessed by ratings of previous publications, ratings by colleagues, and self-ratings, was strongly correlated with the outcomes of the treatments in published reports (r=0.85, p<0.001, n=29 studies). The present study's first author has published papers dealing with the equivalence paradox, so in a sense this report, like the previous one (Stiles et al. Reference Stiles, Barkham, Twigg, Mellor-Clark and Cooper2006), fits the pattern. He has a particular interest in the PCT approach. The second author is an accredited CBT therapist and has delivered manualized versions of both CBT and PDT in randomized trials. The third and fourth authors are not qualified therapists, although the third belongs to an organization whose members have predominantly PCT orientations.
Self-regulation as an account of equivalent outcomes
Our results underline the call for more research on non-CBT approaches (Hunot et al. Reference Hunot, Churchill, Silva de Lima and Teixeira2007). Equivalent outcomes, such as those replicated in this study, are often ascribed to common factors in the relationship, such as the alliance, empathy, and collaborative involvement (Norcross, Reference Norcross2002). To us, however, the outcomes of these treatments seemed so remarkably similar that we wondered if active self-regulatory processes could be responsible (Stiles et al. Reference Stiles, Honos-Webb and Surko1998).
Perhaps the equivalent results reflect therapists and patients responsively optimizing gains. In analyses not yet published, we found that average patient outcomes in the CORE NRD-2005 were equivalent or declined across increasing numbers of sessions. That is, the average outcomes of patients who had one or two sessions were at least as positive as those of patients who had 15 or 16 sessions. This replicated result (cf. Barkham et al. Reference Barkham, Connell, Stiles, Miles, Margison, Evans and Mellor-Clark2006) may seem paradoxical and surprising if treatment is considered as an independent variable in an experimental manipulation, but it seems clinically sensible if patients and therapists are considered as responsively ending treatment when they have reached a satisfactory balance between gains achieved and further effort required. Insofar as different patients begin at different points and change at different rates, patients achieve satisfactory gains at different treatment durations.
If active, responsive self-regulation determines the level of gains achieved, then the type of approach (CBT, PCT, PDT, or other) might be incidental to the gross degree of improvement. Yet, the logic of self-regulation does not contradict the treatment theories. Alternative approaches may offer equally effective but different solutions to psychological problems (Stiles, Reference Stiles1983).
Acknowledgements
Michael Barkham and Janice Connell were supported by the Priorities and Needs Research and Development Levy from Leeds Community Mental Health & Teaching Trust.
Declaration of Interest
Michael Barkham and John Mellor-Clark received funding from the UK Mental Health Foundation to develop the CORE-OM, a measure used in this study. John Mellor-Clark runs a company that supplies training, software support, and data analysis and benchmarking services to users of the CORE system.