Introduction
Cognitive-behavioural (CBT) supervision was specified in seminal articles (Padesky, 1996; Liese and Beck, 1997) and updated in 2010 (Newman, 2010). However, the actual practice of CBT supervision has not demonstrated good fidelity to this model. In a repeat survey of CBT supervisors after a 5-year interval, Townend, Iannetta, Freeston and Harvey (2007) found continued evidence of poor fidelity, especially as regards the limited use of direct observation, few standardized competency ratings (such as the CTS), and minimal use of experiential methods (role play and behavioural rehearsal). Similarly, observational research on the CBT supervision model indicated that it had rarely been implemented with fidelity, leading to the view that manual-based, N = 1 analyses are needed to evaluate its effectiveness (Milne, 2008).
Competency statements (Roth and Pilling, 2008) may improve fidelity, but the field still lacks instruments to operationalize the practice of CBT supervision (Milne and Reiser, 2011). For example, the Improving Access to Psychological Therapies (IAPT) Education and Training Group (2011) noted: “. . .reliable scales of supervision competence have yet to be developed. . .” (p. 3). Fidelity may also be improved by having a supervision manual. Watkins (1998) has made the case for a manualized approach, together with other efforts to address the supervision paradox (i.e. a weak scientific knowledge-base yet huge importance within professional practice). He concluded that “something does not compute” (Watkins, 1997, p. 604). As a result of this paradox, it is thought that, in place of training, most supervisors rely on imitating how they themselves were supervised, or on transferring what they do in therapy to their supervision (Townend, Iannetta and Freeston, 2002). In their consensus statement, Falender et al. (2004) urged that, in order to support the emergence of supervision as a distinct competency, “A range of research procedures should be employed, including, for example, self-report, experimental, single-subject repeated measures. . .” (p. 775).
To contribute to improved fidelity, the present study adopts an N = 1, multiple baseline design to assess CBT supervision in comparison to Evidence-Based Clinical Supervision (EBCS), a science-informed approach to conducting and evaluating supervision (Milne, 2009). This new approach enhances CBT supervision by addressing the above objections through a developmentally-informed framework, explicit emotional processing of the material presented by the supervisee, higher levels of challenging and experiential learning, an integrated programme of research, systematic supervisor training (Milne, 2010), and the operationalization of EBCS through a manual and an observational instrument for rating supervisory competence (Milne, Reiser, Cliffe and Raine, 2011). This approach is consistent with Bellg et al.'s (2004) recommendations in the area of delivery of treatment: “The gold standard to ensure satisfactory delivery is to evaluate or code intervention sessions (observed in vivo or video- or audiotaped) according to a priori criteria.” (p. 446).
While a limited amount of literature exists reviewing the overall outcomes of CBT supervision (Milne and James, 2000; Townend et al., 2002; Wheeler and Richards, 2007; Milne et al., 2010), the literature comparing outcomes of different supervision methods is even more restricted. In a randomized controlled trial comparing CBT and psychodynamic supervision, Bambling, King, Raue, Schweitzer and Lambert (2006) found no difference in patients’ scores on a standardized depression measure at the end of an 8-session treatment period. However, analysis of outcomes of patients assigned to a third arm of the study, who were treated in a problem-solving therapy without supervision, suggested that supervision of either kind improved patient outcome. Similarly, Uys, Minnaar, Simpson and Reid (2005) found that the two supervision approaches that they compared (i.e. a developmental model, and one based on Holloway's 1995 matrix model) both produced similarly significant improvements in supervisee ratings of supervision.
A number of previous studies of supervision have used a single subject (N = 1) design in order to evaluate the effectiveness of EBCS and CBT supervision. Milne and Westerman (2001) studied the effects of fortnightly consultation (supervision-of-supervision) on the clinical supervision of three supervisees over an 8-month period, using a multiple baseline design. Results of the study suggested that targeted supervisor behaviours (i.e. increased use of guided experiential learning) increased in frequency over time, and across supervisees and phases. In a second N = 1 study, Milne and James (2002) analyzed the impact of consultancy on the effectiveness of routine CBT supervision with one supervisor and six supervisees. The results again indicated that the supervisor could develop a CBT approach with the aid of consultancy, adding corroborating evidence in terms of the supervisees’ increased experiential learning. However, to our knowledge there has not yet been a direct comparison between CBT and another method of supervision.
Hypotheses of the study
This paper builds on these prior N = 1 studies by reporting a comparative evaluation. Our major hypotheses were as follows:
1. We would be able to reliably manipulate the EBCS and CBT supervision conditions (i.e. fidelity would be demonstrated); and
2. EBCS supervision would demonstrate significantly stronger effects, in terms of enhanced supervisee engagement in experiential learning.
Method
Design
We utilized a multiple-baseline across participants, alternating treatments design. This design is considered appropriate for this kind of comparison (Oliver and Fleming, 1997; Borckardt et al., 2008), especially given that both approaches were still in need of systematic operationalization, and that only a few rigorous N = 1 evaluations have been conducted (Milne, 2008).
To explore the hypotheses, we utilized ABA and ABAB phases across three clients (see Figure 1). These phases included 37 consecutive, audio-taped sessions of supervision over an 11-month period. A minimum of three participants is recommended, to allow for accurate visual inspection (Hersen and Barlow, 1976); whilst at least 20 data points are recommended to detect a large effect size by statistical means (Onghena and Edgington, 2005).
Figure 1. The multiple baseline across participants (clients) N = 1 research design, with the ratings of supervision (SAGE) during the CBT and EBCS phases of the study. This Figure displays the data for the supervisor (i.e. SAGE items 1–18).
Instruments
We rated audio-taped sessions of supervision using a direct observation instrument called SAGE (Supervision: Adherence and Guidance Evaluation: Milne, Reiser, Cliffe and Raine, 2011). SAGE is a competence rating tool, consisting of 23 supervisor and supervisee behaviours. The first 18 items of SAGE were used to rate the supervisor's competence (e.g. “agenda-setting” and “giving feedback”). Each item was rated on a 7-point competence rating scale, ranging from “incompetent” to “expert”, following the revised cognitive therapy scale (Blackburn et al., 2001). These ratings were then aggregated into an overall “supervisor competence score”, by summing up individual scores for items 1–18, and expressing them as a percentage of the total possible score (18 × 7 = 126). These data allowed us to test the first hypothesis. The final five SAGE items (19–23) reflected supervisees’ engagement in experiential learning within each supervision session (for example, signs of the supervisee “reflecting” and “conceptualizing”), and were similarly aggregated to give a total percentage “supervisee learning” score for each supervision session. This allowed us to test the second hypothesis. SAGE has promising inter-rater reliability (r = .82) and validity, following Landis and Koch (1977), as detailed in Milne et al. (2010). For example, in terms of content validity, the 23 items of SAGE were derived from two existing instruments: “Teacher's PETS” (Milne, James, Keegan and Dudley, 2002) and “CBT STARS” (James, Blackburn, Milne and Freeston, 2005). Construct validity was assessed by factor analysis, using Principal Axis Factoring. The KMO measure of Sampling Adequacy was 0.96. Examination of the Eigenvalues (19.06, 1.15, 0.71, 0.62 for the first four, respectively) and the Scree plot indicated a single factor, Supervisory Competence, accounting for 76.6% of the variance. Internal consistency was 0.98.
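The aggregation of SAGE ratings into percentage scores can be illustrated with a minimal Python sketch. It assumes that each of the 23 items carries a maximum rating of 7 (consistent with the stated total of 18 × 7 = 126) and that the supervisee items are scaled against 5 × 7 = 35 in the same way; the function name `sage_percentages` and the example ratings are hypothetical and are not taken from the study data.

```python
def sage_percentages(ratings, max_per_item=7):
    """Return (supervisor %, supervisee learning %) for one rated session.

    `ratings` holds the 23 SAGE item ratings in order.
    Assumption: every item has the same maximum rating (7 by default).
    """
    if len(ratings) != 23:
        raise ValueError("SAGE has 23 items; got %d ratings" % len(ratings))

    def as_percentage(items):
        # Sum of ratings expressed as a percentage of the total possible score.
        return 100.0 * sum(items) / (len(items) * max_per_item)

    supervisor_items = ratings[:18]   # items 1-18: supervisor competence
    supervisee_items = ratings[18:]   # items 19-23: supervisee learning
    return as_percentage(supervisor_items), as_percentage(supervisee_items)


# Hypothetical session: supervisor items all rated 5, supervisee items all rated 4.
supervisor_pct, learning_pct = sage_percentages([5] * 18 + [4] * 5)
print(round(supervisor_pct, 1), round(learning_pct, 1))
```

Keeping the two subscales separate mirrors the way the scores are used here: the supervisor percentage bears on the first hypothesis and the supervisee learning percentage on the second.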
Participants
The study site was a community-based psychology training clinic in the US serving adults presenting with complex mental health problems. The participants were one male consultant (i.e. the supervisor of the supervisor), one male supervisor, one female therapist (i.e. supervisee) and three clients (two males and one female) presenting with anxiety and depression. The consultant was a doctoral-level, licensed (i.e. registered) clinical psychologist, with over 30 years of experience in providing supervision and training to psychologists and other mental health professionals in CBT. The supervisor was a licensed psychologist certified by the Academy of Cognitive Therapy with over 20 years of clinical experience; and a training clinic director with over 6 years of experience in providing doctoral-level training. The supervisee was a post-doctoral student.
This was a convenience sample, having been selected on the pragmatic grounds that the supervisor had invited the consultant to engage in a collaborative research study. Similarly, the supervisees were currently working with the supervisor at the time of the study; and the three clients who completed all study phases (three clients discontinued therapy or did not consent) were seen by the second supervisee (i.e. the therapist). There were no exclusion criteria and the sample was considered to be representative of supervisees and clients in this clinic. The study was approved by the Palo Alto University Institutional Review Board in the USA; and by the Research and Development Department of the consultant's employing Trust in the UK. This Department deemed that additional ethical clearance was not required, as this was a “service improvement” project. Informed consent was obtained from the supervisees and the clients.
Procedure
Supervisor training took place over 5 consecutive weeks immediately prior to the initial study baseline, following the consultancy procedure described below. Training practices were consistent with Bellg et al.'s (2004) recommendations for improving provider training, including: “. . . using standardized training materials, conducting role-playing, and observing actual intervention and evaluating adherence to protocol.” (p. 446). The supervisor quickly acquired competence, as indicated by an improvement from a pre-training SAGE score of 55% to a score of 80% by the end of the training, indicating proficiency.
During the training, baseline and experimental phases, consultancy entailed fortnightly, phone-based, hour-long reviews of the preceding week's tape-recorded supervision (i.e. supervision-of-supervision). Consultancy involved support and training utilizing didactic methods, including reference to supervision guidelines and to other supportive materials (such as a client experiencing scale, a supervisee's learning outcomes list, and illustrative video material: Milne, 2009). During the CBT training phase, supervision guidelines were based on authoritative texts (Padesky, 1996; Liese and Beck, 1997), scientific papers (e.g. Pretorius, 2006), and guidance from The British Association for Behavioural and Cognitive Psychotherapy (e.g. Lewis, 2005). During the EBCS training phase, a supervisor training manual, including supervision guidelines (Milne, 2010), was supplemented by experiential work including educational role-plays and behavioural rehearsal. Corrective feedback to the supervisor was based on the consultant's SAGE ratings of his supervision, made from audiotapes of the supervision sessions. During the CBT (baseline) phases, the supervisor adhered to the CBT supervision condition (e.g. the supervisee would not be exposed to the same emotional processing or experiential learning as in the experimental phase). The objective was to demonstrate experimental control over these two approaches (i.e. high fidelity).
All supervision sessions were recorded on audiotape by the supervisor. A total of 37 supervision sessions were recorded, each lasting approximately one hour. This sample was selected as it included all recorded supervision sessions in which there was a discussion of one or more of the three clients, during the entire 11-month study period.
Statistical analysis
The current convention within N = 1 statistical analysis is a combination of visual inspection of the descriptive (longitudinal) data and inferential statistics. However, visual inspection is prone to low reliability and Type I errors, as are the usual parametric tests. Therefore, experts recommend conservative inspection (e.g. summarizing differences in mean scores between phases) and statistical tests that are appropriate for N = 1 data and that address the problem of auto-correlation (serial dependence) within longitudinal data (Borckardt et al., 2008). We therefore provide summary mean scores alongside the raw N = 1 data, and apply the non-parametric PAND statistic (Percentage of All Non-Overlapping Data: Parker, Hagan-Burke and Vannest, 2007). This provides a quantitative comparison by determining how many observations overlap between the ABAB phases. The percentage overlap converts to phi, allowing significance testing by calculating confidence intervals using an online resource (Pezzullo, 2008).
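To make the overlap calculation concrete, the following Python sketch shows one way of computing PAND for two phases. It is an illustration under stated assumptions rather than the procedure used in this study: it assumes that higher scores are expected in the second phase, it treats tied scores across phases as overlapping, and the function name `pand` and the example scores are hypothetical.

```python
import math


def pand(baseline, treatment):
    """Return (PAND as a proportion, minimum number of points removed).

    PAND is based on the smallest number of data points that must be removed,
    from either phase, so that every remaining baseline score lies strictly
    below every remaining treatment score.
    Assumption: improvement means higher scores in the treatment phase.
    """
    n_total = len(baseline) + len(treatment)
    # Candidate cut points: each observed score, plus infinity (keep all baseline points).
    cuts = sorted(set(baseline) | set(treatment)) + [math.inf]
    min_removed = min(
        sum(a >= cut for a in baseline) + sum(b < cut for b in treatment)
        for cut in cuts
    )
    return (n_total - min_removed) / n_total, min_removed


# Hypothetical scores: one baseline point (72) overlaps with the treatment phase.
proportion, removed = pand([50, 55, 60, 72], [65, 70, 75, 80, 85])
print(f"PAND = {100 * proportion:.1f}%, points removed = {removed}")
```

Searching over every candidate cut point is affordable for the short series typical of N = 1 designs, and it avoids having to judge the overlapping points by eye.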
Results
Hypothesis 1
As illustrated in Figure 1, visual inspection suggests that the EBCS phases yielded the highest SAGE scores for all three clients, though there is considerable variability. Mean values for clients in each of the phases (CBT or EBCS) are indicated by the broken horizontal lines in Figure 2, and support this interpretation (i.e. in every instance, the EBCS phases have higher mean SAGE scores than the CBT phases), indicating good experimental control of the alternating interventions: CBT supervision and EBCS. A sub-analysis of the full data set (i.e. considering all 23 individual items within SAGE) supported this interpretation, with ratings 0.63 points higher, on average, during EBCS phases for the key SAGE variables of “emotional engagement”, “challenging”, “training” and “experiencing”. Conversely, CBT supervision was rated more highly for “conceptualizing” (mean: 0.2), reflecting the relative emphasis on case discussion within this approach.
Figure 2. Data for the supervisee's learning (i.e. SAGE items 19–23) during the CBT (white) and EBCS (shaded) phases of clinical supervision
We next analyzed the SAGE data using an inferential statistic. Participants contributed 8, 5 and 7 data points respectively in the first phase, and 14, 7, and 7 (respectively) in the second (see Figure 1). PAND was 88.2% and the non-overlap beyond chance level (50%) was 38.2%; phi = 0.762. The confidence intervals (95%) for the phi coefficient were 0.605 and 0.862. As these confidence intervals do not include 0, this indicates a statistically significant difference between the phases (p < .05). Although guidelines for the interpretation of effect sizes in single case research are not yet well established (Parker et al., 2007), a phi of this size would elsewhere be considered a “moderate” effect size (Cohen, 1988). In summary, the quantitative (SAGE) data indicated that the two supervision approaches were implemented with fidelity.
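The relationship between overlap, non-overlap beyond chance and phi can be sketched as follows. The non-overlap beyond chance is simply PAND minus 50% (here 88.2% − 50% = 38.2%). For phi, the sketch assumes one common construction in which the overlapping points are split evenly between the two off-diagonal cells of a 2 × 2 table (phase by low/high score); this is an assumption for illustration, the function name and the example counts are hypothetical, and the sketch is not intended to reproduce the values reported above.

```python
import math


def phi_from_overlap(n_baseline, n_treatment, n_overlap):
    """Phi coefficient for a 2 x 2 table built from overlap counts.

    Assumption: the overlapping points are divided equally between the
    two off-diagonal cells (baseline/high and treatment/low).
    """
    half = n_overlap / 2.0
    a, b = n_baseline - half, half     # baseline phase: low, high
    c, d = half, n_treatment - half    # treatment phase: low, high
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


def beyond_chance(pand_proportion):
    """Non-overlap beyond the 50% chance level."""
    return pand_proportion - 0.5


# Hypothetical series: 4 baseline points, 5 treatment points, 1 overlapping point.
print(round(phi_from_overlap(4, 5, 1), 3))    # 0.775
print(round(100 * beyond_chance(0.882), 1))   # 38.2
```

With no overlapping points the table is perfectly diagonal and phi equals 1, which provides a quick sanity check on the construction.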
Hypothesis 2
As regards the relative effectiveness of EBCS over CBT supervision, quantitatively (i.e. SAGE ratings) we obtained trends for each successive client's supervision that indicated consistently that EBCS resulted in greater engagement in experiential learning by the supervisee. Visual inspection of Figure 2 suggests that the best results occurred during EBCS, and similarly that the mean values within the EBCS phases all exceeded those within the CBT phases. The two phases, CBT supervision and EBCS, were then compared across all three clients combined (to increase power), in terms of the supervisee's learning data (i.e. the relevant SAGE items, 19–23: see Figure 2), using the non-parametric statistic PAND.
Participants contributed the same data points as already noted, and the PAND value was again 88.2% in both cases, giving the same confidence intervals, phi coefficient and statistically significant difference between the supervisee's learning in the CBT and EBCS phases (p < .05). This would be considered a “moderate” effect size (Cohen, 1988). Expressed in terms of the observed differences in the SAGE learning scores between phases for all three clients, EBCS was associated with enhanced engagement in experiential learning on four of the five SAGE learning items (mean: 32% improvement).
Discussion
Our first hypothesis was that the two supervision approaches, CBT and EBCS, could be reliably manipulated. Such fidelity is a basic logical requirement for process-outcome research, also increasing power (Lipsey, 1990). However, it is rare for studies in the supervision field to include demonstrations of fidelity (Watkins, 1997).
Using quantitative ratings of supervision, based on direct observation (i.e. SAGE), we were able to show that the CBT and EBCS approaches could be manipulated with good fidelity between the experimental phases. Our second main finding was that the CBT and EBCS approaches appeared to yield predictably different effects, as measured in terms of the supervisee's engagement in experiential learning. Visual inspection and non-parametric statistical testing suggested that EBCS promoted significantly more engagement in experiential learning. However, it is evident from Figure 1 that the supervisor became more proficient over time across both conditions. A similar finding was reported by Milne and James (2002), who explained the progression in terms of a lag effect (i.e. the study design did not allow sufficient time for the full socialization of supervisor and supervisees to the new approach).
We should also recognize a number of additional methodological weaknesses within the present analysis, including the need to do further psychometric work on SAGE (we have commenced a generalizability study); to add complementary instruments, such as objective assessments of learning and the transfer of the supervisees’ learning to therapy; and to refine training in EBCS (see, for example, Culloty, Milne and Sheikh, 2010). Another weakness was our use of an experienced supervisor who, with an established allegiance to his usual approach (CBT supervision), might be less markedly influenced by training and consultancy, a common finding within therapist-training studies (Beidas and Kendall, 2010). A related confound was the respective guidelines: in the case of EBCS there was a formally-developed supervisor training manual (Milne, 2010), whereas in the CBT supervision we relied on a range of informal guidance (e.g. book chapters: Padesky, 1996; Liese and Beck, 1997). On the other hand, both approaches were guided by the consultant's experience of both EBCS and CBT supervision. Finally, we acknowledge that there are inherent methodological weaknesses in N = 1 designs, including significant limitations on the generalizability of the study and, in the absence of randomization, an inability to rule out confounding variables or alternative explanations for the results (Harris et al., 2006).
A more substantive issue concerns the differentiation of our two conditions, CBT supervision and EBCS. It could be argued that EBCS is simply CBT supervision done properly (i.e. EBCS is not conceptually different from CBT supervision, even if there are differences in implementation). After all, authoritative accounts of CBT supervision advocate much that is in EBCS (e.g. Liese and Beck, 1997; Padesky, 1996), such as educational role-play and mutual feedback. However, in addition to a differential emphasis on some shared variables (e.g. EBCS stresses the behavioural and affective aspects of supervision), EBCS appears to be conceptually distinct in terms of drawing on ideas about adult learning from beyond the scope of CBT supervision literature (Milne, 2009).
A final reason to believe that CBT supervision and EBCS are different is the recurring finding of non-equivalence in the respective processes and outcomes across the various measures, as summarized above, and related findings from a parallel qualitative evaluation (i.e. episode and content analysis, plus interviews: Milne, Reiser, Cliffe, Breese et al., 2011). On the other hand, a more pragmatic stance is to view EBCS as an enhanced form of CBT supervision, on the logic that conceptually EBCS is compatible with the principles of CBT (e.g. the empirical emphasis) and because practically the technology that supports EBCS can readily support the development of CBT supervision (e.g. the supervisor training manual). This seems to us to be a reasonable position, given the methodological limitations of the present study and the pressing need for progress.
Suggestions for future research
Future research might draw on these methodological points, so that more systematic comparisons can be undertaken. This may help identify the key conceptual and procedural features (e.g. parallel manuals for CBT and comparison approaches), relevant instruments, and the most effective elements within such approaches (Kazdin, 1998). We can envisage how this improved operationalization might build on the emergent competencies frameworks (Roth and Pilling, 2011; Falender et al., 2004), as might suitable instruments, such as SAGE. In turn, this will facilitate supervisor training. Indeed, one might helpfully view the challenge of advancing supervision research from the perspective of treatment fidelity (Bellg et al., 2004), as requiring an integrated approach to the successive tasks of intervention design, practitioner training, delivering competent supervision, and achieving relevant short-term and longer-term outcomes (e.g. generalization of supervision to therapy). Developments within the field, such as clinical outcome-monitoring (e.g. Reese et al., 2009), should be wedded to this strategy. In addition, it should be noted that other statistical approaches to the analysis of N = 1 data are available and offer certain advantages, particularly for longitudinal analyses (McKnight, McKean and Huitema, 2000) and for estimating effect sizes (Parker, Vannest and Davis, 2011). It is possible that such an analysis would yield different conclusions in the present study.
Conclusions
This intensive case study indicates the potential for multiple-baseline N = 1 designs to advance our understanding of CBT supervision, moving supervision towards its rightful place as a science-informed specialization within professional practice. The present analysis was novel in demonstrating the fidelity and comparative effectiveness of two closely-related supervision approaches. However, as this was a preliminary, small-scale analysis, it is appropriate to next undertake improved N = 1 studies, before proceeding to larger sample evaluations. The present evaluation indicates that the methodology is viable, illuminating, and capable of fulfilling the expectations placed on it (e.g. Falender et al., 2004).