Introduction
Surgical education has long relied on the model of apprenticeship, in which an expert surgeon guides trainees during cadaveric dissection and later during surgery. However, exclusive reliance on apprenticeship is becoming impractical in the current climate, where demands on surgeons' time are increasing, legislation has limited trainees' working hours and cadaveric material is less available. Virtual reality simulation is one of the more promising new approaches to surgical training because it offers an environment in which repeated, risk-free practice can be undertaken, and trainees can be exposed to a wide range of cases, including rare pathologies.[1, 2]
Currently, simulation-based training still requires the presence of expert trainers, because most simulators provide limited or no automated guidance during training. This limits the scope and flexibility of simulation in training. Therefore, we and others have begun to explore the provision of automated guidance in simulated environments. For example, Rhienmora et al. provided real-time feedback on individual performance metrics such as force, position and orientation in a dental simulator, by making comparisons with average metric values obtained from expert performances.[3] Fried et al. quantitatively defined a range of errors for surgical performance (violation of tissue, violation of instrument tolerances, force patterns and so on) and provided feedback in an endoscopic sinus surgery simulator, by making comparisons with metrics from pre-recorded performances.[4] The metrics used were able to differentiate an expert's performance from that of a novice, and revealed improvement in novices' performance with repetition. Zhou et al. provided guidance that emulated the advice of a human expert by developing multivariate models of expert and trainee behaviour, and comparing the trainee's drilling technique in real time to these models.[5, 6] These metrics were used successfully to provide feedback on surgical technique.
Most automated real-time guidance in surgical simulation has so far focused on surgical technique (based on motor skills), and less attention has been paid to how a procedure can be taught in such an environment. We have found that the provision of automated feedback on technical aspects of surgery does not necessarily teach trainees how to perform an operation.[7] This suggests that additional guidance, perhaps pertaining to the sequence of steps undertaken by the surgeon, may be needed. This we refer to as procedural skills training, and here we investigate its importance and how it relates to automated guidance on surgical technique.
Procedural advice has typically been provided to students prior to training on a simulator in the form of instructions (e.g. video tutorials).[8–10] However, for trainees who are not familiar with the procedure, this may not be sufficient, and additional guidance during the procedure may be beneficial. The literature suggests 'path following', where visual cues guide the trainee on the steps pertaining to a surgical procedure, as a way to overcome this issue.[11, 12] Passmore et al. provided such visual guidance using a spline that indicated the path to follow in a virtual laparoscopic simulator.[13] Botden et al. presented visual cues such as arrows to guide trainees performing a suturing task.[14] In this paper, we investigate ways of implementing procedural guidance using such visual cues within the context of temporal bone surgery simulation. We explore the effectiveness and usefulness of these forms of presentation, when provided in parallel with surgical technique guidance, through randomised controlled trials.
Materials and methods
Simulation platform
The simulation platform used in this study is a virtual reality temporal bone surgery simulator (Figure 1). It presents the surgeon with two slightly offset images to produce the illusion of a three-dimensional (3D) operating space, when viewed through 3D glasses. Major anatomical structures that must be identified without injury during surgery, such as the facial nerve, sigmoid sinus, dura, ossicles and the labyrinth, are represented in the virtual temporal bone. The surgeon interacts with the virtual temporal bone using a pen-like haptic device (surgical drill) that provides force feedback in three dimensions.

Fig. 1 A surgeon performing temporal bone surgery on the temporal bone simulator.
The simulator saves a stream of data (such as drill position, force, speed, proximity to anatomical structures and so on) at regular intervals. Surgical technique feedback was provided by identifying expert and trainee behaviour patterns in these data based on an existing algorithm.[5–7]
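The data stream described above can be pictured as a sequence of timestamped samples. The following sketch is purely illustrative: the field names, units and the proximity-flagging helper are our assumptions, not the simulator's actual schema.

```python
from dataclasses import dataclass

@dataclass
class DrillSample:
    """One record of the simulator's data stream (illustrative fields only)."""
    t: float                 # seconds since the start of the procedure
    position: tuple          # drill-tip position (x, y, z)
    force: float             # applied force
    speed: float             # burr speed
    nearest_structure: str   # e.g. 'facial nerve'
    distance: float          # proximity to that structure (mm)

def close_calls(stream, threshold=1.0):
    """Flag samples where the drill came within `threshold` mm of a structure."""
    return [s for s in stream if s.distance < threshold]

# Two toy samples recorded at a regular interval
stream = [
    DrillSample(0.00, (1, 2, 3), 0.5, 60000, 'dura', 4.0),
    DrillSample(0.02, (1, 2, 2), 0.7, 60000, 'facial nerve', 0.8),
]
```

Behaviour-pattern algorithms of the kind cited above would consume such samples as feature vectors; here the record simply mirrors the quantities listed in the text.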
In order to provide procedural guidance during surgery, the performance of an expert otologist was used as a reference. The areas of the virtual temporal bone drilled by the expert were highlighted during the procedure. These areas were presented in two different ways, and each was tested in a randomised controlled trial (RCT). In the first presentation method, the complete area drilled by the expert surgeon was highlighted all at once. In the second, the expert procedure was manually segmented into a sequence of logical steps and shown to the trainee one at a time, following the surgical flow of the operation. See Figure 2 for examples of the two presentation modalities.

Fig. 2 Procedural guidance provided in the form of coloured overlays on top of the virtual temporal bone: (a) all drillable areas highlighted at once, and (b) drillable areas highlighted step by step, with the next step shown once 75 per cent of the current step has been completed.
Study design
The studies discussed in this paper were approved by the Human Research Ethics Committee of the University of Melbourne, Australia (committee approval number: 1135497).
Initially, we tested whether procedural guidance where all drillable areas were highlighted at once (Figure 2a) was effective in improving surgical performance. This involved an RCT of 30 medical students (trial 1, Figure 3a). Participants were shown a video tutorial on how to perform a cortical mastoidectomy, taught how to use the simulator and given 5 minutes to familiarise themselves with the virtual reality environment.

Fig. 3 Flow diagram of: (a) trial 1 and (b) trial 2. N = control group; T = technical guidance only group; PT = technical and procedural guidance group
Participants were then stratified based on year of study and randomly allocated to one of three groups: (1) the control group, which did not receive automated guidance during drilling; (2) the technique guidance group, which received only feedback on surgical technique; and (3) the technique and procedural guidance group, which received guidance on where to drill and feedback on surgical technique.
The participants then performed the virtual mastoidectomy procedure twice. In order to test the usability of the automated procedural guidance, those in the technique and procedural guidance group were given the option to turn the procedural feedback on or off, as desired. We reasoned that if the participants found the procedural guidance useful and engaging, they would use it when given the option.
A second implementation of the procedural feedback was developed to be more interactive (Figure 2b), and this was tested in a second RCT (trial 2, Figure 3b). This feedback provided step-by-step guidance, where the trainee was shown each step of the procedure once 75 per cent of the previous step had been completed. This implementation was tested with a group of 20 participating medical students. The design was similar to that of trial 1, except that in trial 2 there was no group that received technique feedback only. This was because technique-only guidance showed no effect in trial 1, as presented in the Results below. As in trial 1, the usefulness of the automated guidance method was evaluated in terms of the percentage of time that participants used it when given the option during the second run through the simulation.
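As a concrete illustration of the step-by-step presentation, the advance rule can be sketched as follows. This is a minimal sketch under stated assumptions: the voxel-set representation and the `current_step` function are our invention for illustration; only the 75 per cent threshold comes from the trial design.

```python
ADVANCE_THRESHOLD = 0.75  # fraction of a step that must be drilled before the next is shown

def current_step(steps, drilled):
    """Return the index of the step to highlight.

    steps   -- list of sets of voxel ids, in surgical order
    drilled -- set of voxel ids the trainee has removed so far
    """
    for i, step in enumerate(steps):
        completed = len(step & drilled) / len(step)
        if completed < ADVANCE_THRESHOLD:
            return i          # this step is not yet 75 per cent complete
    return len(steps) - 1     # procedure finished; keep the last step visible

# Usage: three toy steps of the virtual mastoidectomy
steps = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10}]
drilled = {1, 2, 3}           # 75 per cent of the first step removed
```

With `drilled` as above, `current_step` moves the highlight on to the second step, matching the behaviour described in the text.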
Evaluation method
To evaluate the effectiveness of procedural guidance, we compared the quality of the temporal bone dissection performed by participants who received procedural guidance with that of participants who did not. The quality of dissection was assessed by an expert otologist, who was blinded to the identity of the participants and the study groups. This assessment utilised a validated tool, the Welling scale.[15] This is a 35-item binary (0 or 1) grading instrument that evaluates different aspects of dissection, such as the adequacy of saucerisation and skeletonisation.
Data from a previous study informed the sample size calculations.[7] That study had 12 participants per arm, and found a mean difference of 21.25 per cent in the outcome measure between the feedback and no-feedback groups.[7] The standard deviation for the groups varied from 13.43 to 15.19 per cent. Assuming the worst-case scenario of a standard deviation of 15.19 per cent, p < 0.05 and power = 0.8, a difference between the groups of 20.2 per cent could be detected with 10 participants per arm (t-test, Minitab® statistical software, version 17). Assuming that similar effects could be observed for quality of dissection scores on the 35-point Welling scale,[15] this translates to the trials being able to detect effects (differences between the groups) of 7.4 points.
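The power calculation above can be reproduced approximately with open-source tools. The sketch below uses SciPy's non-central t distribution to find the smallest detectable standardised effect for 10 participants per arm; the original analysis used Minitab, so small numerical differences are expected.

```python
import numpy as np
from scipy import optimize, stats

def power_two_sample_t(d, n, alpha=0.05):
    """Power of a two-sided, two-sample t-test with n per arm and effect size d."""
    df = 2 * n - 2
    ncp = d * np.sqrt(n / 2)                    # non-centrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

# Smallest standardised effect detectable at n = 10 per arm with 80 per cent power
d = optimize.brentq(lambda x: power_two_sample_t(x, 10) - 0.8, 0.1, 3.0)

sd = 15.19                                      # worst-case standard deviation (per cent)
detectable_diff = d * sd                        # detectable difference in percentage points
```

Here `detectable_diff` comes out close to the 20.2 per cent figure reported in the text.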
As the experimental conditions changed between runs 1 and 2 (in the technique and procedural guidance group, guidance was turned on throughout the procedure in run 1, while participants were given the option to turn it on when needed in run 2), the quality of dissection scores from each run were analysed separately. The Kruskal–Wallis test was used because these data were not normally distributed. Post-hoc analyses were conducted using a Bonferroni adjustment. Differences in the quality of dissection scores between runs within each study group were evaluated using the Wilcoxon signed-rank test. The percentage of the total procedure time for which students used the procedural guidance was calculated, in order to evaluate the usefulness of the presentation method. A 95 per cent confidence level was used when testing for significance (p ≤ 0.05). Data analysis was performed using Matlab® R2014a software.
Results
In trial 1, all of the areas to be drilled were highlighted simultaneously (Figure 2a). During run 1, use of the procedural guidance was mandated, and there was a significant difference between the end-product scores of the groups (χ2 (2) = 6.83, p = 0.03). A post-hoc analysis revealed that participants who received technique and procedural feedback (technique and procedural guidance group) performed the surgery significantly better than those who received technique feedback only (technique guidance group) (Figure 4). During run 2, use of the procedural feedback was optional, and could be toggled on or off by the participant. Optional usage of the automated guidance was extremely low: 6 out of 10 students did not use the procedural guidance at all, and the remaining 4 used it for only 3.73 per cent of the total time. There was no significant difference in performance between the users and non-users of the feedback in run 2.

Fig. 4 Quality of dissection scores of the groups for run 1 in trial 1. N = control group; T = technical guidance only group; PT = technical and procedural guidance group
Participants' quality of dissection scores were compared between runs for each of the three groups. The group that received no feedback (control group) showed a significant improvement in scores (Wilcoxon signed-rank test, p = 0.03). The scores of the technique guidance group and the technique and procedural guidance group did not improve significantly between runs.
We surmised that although the procedural feedback delivered in trial 1 had been effective in improving performance, it had not engaged the user, given the low rate of voluntary usage. Hence, we devised the step-by-step guidance approach (Figure 2b), and then designed a second study (trial 2) to test its effectiveness and acceptance. We reasoned that the most efficient experimental control would be to compare both procedural and technique feedback (technique and procedural guidance group) against no feedback, given that: (1) the technique and procedural guidance group performed better than those who received technique guidance alone in trial 1; and (2) in practice, both procedural and technique feedback would be provided together, so it was best to compare this with the no-feedback condition.
In trial 2, all 10 students in the technique and procedural guidance group used procedural guidance during run 2, for an average of 60.40 per cent of the total time. This group performed significantly better with respect to the quality of dissection scores, when compared to the group that received no guidance (control group), in both runs (χ2 (1) = 14.16, p < 0.001 in run 1, and χ2 (1) = 9.35, p = 0.002 in run 2; Figure 5).

Fig. 5 Quality of dissection scores of the groups in: (a) run 1 and (b) run 2, for trial 2. N = control group; PT = technical and procedural guidance group
The quality of dissection scores of participants who did not receive guidance (control group) was also found to improve between the two runs (Wilcoxon signed-rank test, p = 0.03). No significant difference was observed between runs in the technique and procedural guidance group.
Discussion
Measures of surgical simulation validity assess whether the simulation is actually teaching or evaluating what was intended.[16] Validity is multifaceted, and includes measures such as face validity (realism of the simulator), content validity (appropriateness of the simulator to teach the subject matter), construct validity (ability to distinguish experienced and inexperienced surgeons) and criterion validity (the effectiveness of the simulator when compared to an alternative technique, e.g. conventional training). In this paper, we assessed objective criterion validity to investigate the effectiveness of an automated guidance system in teaching surgical procedures.
The study design used here is similar to that used in previous studies, specifically those that evaluated guidance and feedback techniques. For example, Wijewickrema et al. conducted an RCT of medical students to evaluate the effectiveness of automated technical feedback.[7] Strandbygaard et al. investigated the impact of instructor feedback versus no instructor feedback when training participants on a complex operational task using a laparoscopic virtual reality simulator, within a randomised trial.[17] Similarly, Kruglikova et al. conducted an RCT to assess the impact of external feedback on learning curves when using a virtual reality colonoscopy simulator.[18]
The results of this study indicate that real-time procedural guidance was effective in improving participants' quality of dissection scores. Both the full and step-by-step presentation methods of procedural guidance significantly improved the performance of trainees as compared to the control group. However, when the use of procedural feedback was optional, the step-by-step guidance method was used more often, and for longer, than the first method trialled (10 out of 10 students, for 60.40 per cent of the total time on average, compared with 4 out of 10 students, for only 3.73 per cent of the total time). This suggests that the interactive step-by-step guidance was considered more useful by the trainees. Furthermore, trainees receiving the stepwise guidance maintained their performance advantage over controls when given the option to use it (Figure 5). These findings suggest that when providing feedback, the presentation modality should be considered carefully. The more interactive and relevant to the task at hand the presentation method is, the higher the level of trainee engagement in the procedure.
Previous literature suggested that the provision of real-time procedural feedback would not necessarily lead to improved surgical performance. In a systematic review of the effect of feedback on the acquisition of motor skills, Hatala et al.[8] cautioned that in several surgical simulations (e.g. knot-tying,[19] colonoscopy[20] and joint mobilisation[21]), the provision of concurrent (immediate) advice may lead to over-reliance on this feedback, which could result in a decrement in performance over the long term, as suggested by Salmoni et al.[23] and Wulf and Shea.[22] The improved performance seen in the present study shows this may not be the case when feedback is applied in an engaging manner. However, further research is required to determine how the provision of this feedback influences the retention or transfer of skills.
In order to understand why participants may have found the step-by-step feedback more engaging, it is helpful to reflect upon the nature of the surgery. Drilling of a temporal bone involves the sequential exposure of a series of anatomical structures. The step-by-step method followed this sequence, 'showing the way' for the trainee. In contrast, the first method presented all the bone to be drilled in one block. Hence, it could be argued that more information was in fact provided in the second implementation; not only did this method provide information on where the surgeon should drill, but also when. The improved usage and surgical outcomes with this approach are indirect evidence that these characteristics are both valued and constructive for surgeons in training. This method also lends itself to implementation in simulation with more complex anatomical variations, where a step-by-step manner of guidance may aid the learner in understanding a logical approach to variations in anatomy.
This study provided not just one, but two forms of concurrent real-time feedback. A previous study demonstrated that the drill-handling technique of inexperienced surgeons could be driven towards that exhibited by experts by the provision of real-time technique feedback.[7] That feedback was provided as a series of oral 'suggestions', and was optimised to be helpful but not obtrusive. The provision of both procedural and technique feedback in this study raised some interesting questions concerning implementation. Cognitive load theory suggests that the working memory used in learning is limited and, as such, only a certain amount of information can be processed simultaneously.[24, 25] It was not clear in advance whether the participants would be able to assimilate the simultaneous provision of both types of feedback. In order to minimise this risk, we chose to present the technique and the procedural feedback through different sensory modalities. Technique feedback was provided through verbal instruction and procedural guidance was presented as visual cues in order to be as unobtrusive as possible. This approach appears to have been effective.
In both the trials discussed here, a significant learning effect was observed between runs in the group that received no feedback (control group). This is consistent with the predictions of the theory of deliberate practice, which states that in order for a novice to become an expert, an individual is required to undertake tasks with the explicit intent of improving skills.[26] It calls for the trainee to focus on a particular task in order to improve specific aspects of performance, and involves repeated practice with coaching and immediate feedback on performance.[27] It is interesting, however, that the participants receiving feedback (the technique and procedural guidance group) did not show a statistically significant learning effect from one repetition to the next, even though their scores were better than those achieved by non-feedback controls. This might be a result of the divided attention associated with students having to focus on multiple aspects of the procedure at once.
A limitation of this study was that there was no testing of how multiple types of feedback might affect students' performance, and further studies are needed to resolve this question. Further, given that this was a pilot study conducted to evaluate the usability of two presentation modes for automated feedback, there was no baseline assessment of participant performance or testing of skill retention. These would be incorporated into future studies with the ultimate goal of developing a fully automated surgical training module to teach temporal bone surgery. Initial work towards this goal, to automate the segmentation of the surgical trajectory in providing procedural guidance, is discussed in Wijewickrema et al.[28]
• Virtual reality simulation based surgical training still requires the presence of expert trainers
• This is because most simulators only provide limited or no automated guidance during training
• Two different methods of presenting automated procedural guidance for temporal bone surgery were explored
• Both the display of all drillable areas and a step-by-step display of regions led to improved performance
• Step-by-step display of procedural guidance was more engaging and hence useful for teaching a surgical procedure
In conclusion, the results of the studies discussed here indicate that real-time guidance on procedure in simulated temporal bone surgery is effective in teaching the steps of a surgical procedure. Such guidance, when coupled with other forms of feedback, such as that on surgical technique, can be used to reduce the burden placed on human experts during training and pave the way for the development of fully self-directed platforms for training surgeons.
Acknowledgements
This work was funded by an Australian Research Council Linkage Project Grant, in which Cochlear Ltd is a partner (grant number: LP130100806). The Biostatistics Hub at the Eastern Hill/St Vincent's Academic Centre (University of Melbourne) was consulted for statistical advice.