INTRODUCTION
Egocentric spatial neglect is characterised by an inability to respond to stimuli presented in the contralesional hemifield (Heilman, Bowers, Valenstein, & Watson, 1987). Neglect is commonly diagnosed using a test battery including cancellation tasks (Albert, 1973; Azouvi et al., 2006; Donnelly et al., 1999; Rorden & Karnath, 2010; Vaes et al., 2015; Weintraub & Mesulam, 1988). In these tasks, patients with egocentric neglect cancel fewer targets on the contralesional than on the ipsilesional side of space, while healthy controls cancel a high number of targets evenly throughout the search array (Dalmaijer, Stigchel, Nijboer, Cornelissen, & Husain, 2014; Donnelly et al., 1999; Gauthier, Dehaut, & Joanette, 1989).
Performance on cancellation tasks can be quantified by comparing the number of cancelled targets on the left and right side of space (R-L score) or with the Centre of Cancellation (CoC), which represents the average location of cancelled targets (e.g. Azouvi et al., 2006; Demeyere & Gillebert, 2019; Demeyere, Riddoch, Slavkova, Bickerton, & Humphreys, 2015; Robertson et al., 1994; Rorden & Karnath, 2010; Vaes et al., 2015). These measures aim to capture the spatial asymmetry typical of neglect, that is, the difference in the probability to cancel targets between the left and right side of the cancellation display (Huygelier & Gillebert, 2018). To establish whether a patient has neglect, the R-L or CoC score is compared to a single impairment threshold, without considering the total number of cancelled targets (e.g. Demeyere et al., 2015; Rorden & Karnath, 2010). We refer to this type of threshold as a fixed normative cutoff. Fixed normative cutoffs are based on percentiles of test scores obtained in neurologically healthy individuals or stroke patients without egocentric neglect (e.g. Demeyere et al., 2015; Rorden & Karnath, 2010). These percentiles represent the most extreme test scores that can be observed in individuals without neglect and are expected to limit false-positive diagnoses.
Measurement Precision of Cancellation Performance Depends on Non-Spatial Impairments
Despite the popularity of cancellation tasks, few studies have investigated their measurement precision (Bailey, Riddoch, & Crome, 2004; Machner, Mah, Gorgoraptis, & Husain, 2012). Measurement precision is important as it affects the certainty with which conclusions can be drawn from test scores (Crawford & Garthwaite, 2002; Lord, 1952; Lord, Novick, & Birnbaum, 1968; Slick, 2006). In classic test theory, it is assumed that the observed test score of a person at a specific moment can be divided into two parts: the true underlying test score (Footnote 1) that remains stable across test moments and the measurement error that leads to probabilistic variation of observed test scores across moments (Novick, 1966). Thus, the observed score of an individual is considered an estimate of the true underlying score (Lord, Novick, & Birnbaum, 1968; Slick, 2006), and measurement precision refers to the extent to which an observed score reflects the true score. When measurement precision is low, the same score is unlikely to be observed across repeated assessments.
Bailey et al. (2004) and Machner et al. (2012) estimated the measurement error of the total number of cancellations and of the R-L score divided by the total number of cancellations by studying the variability in these scores across repeated assessments in neglect patients. The authors observed considerable variability in cancellation scores across repeated testing. However, by estimating the measurement error across all patients, they treated the error as a constant property, reflecting the assumption that each patient's observed score is associated with the same level of error. Measurement error is indeed considered a constant property of scores in classic test theory (Novick, 1966), but this principle may not apply to cancellation scores. That is, in contrast to continuous outcome measures, responses on a cancellation task are discrete: each target is either cancelled or omitted.
Given their discrete nature, cancellation responses are best described by a binomial distribution with mean NP and variance NP(1 − P), where N is the number of targets and P is the probability to cancel a target. The association between the mean and variance of the distribution (Brown, Thomas, & Patt, 2017; Lord, 1952; Lord et al., 1968; McDonald, 2011) implies that the measurement error of cancellation test scores is variable. Applying the binomial formula to a cancellation task with 50 targets reveals, for instance, an expected variance of 12.5 cancelled targets across repeated tests for a patient with a 50% chance to cancel each target, and a variance of 4.5 cancelled targets for a patient with a 10% or 90% chance to cancel each target. In other words, the measurement error of the number of cancelled targets is highest when the probability to cancel a target equals 50% and decreases as the probability approaches 0% or 100%. Binomial variance affects test scores that rely on the number of cancelled targets, such as the R-L score, in the same way. The CoC measurement error has already been shown to be variable, with the CoC standard deviation increasing as the number of cancelled targets decreases (Toraldo, Romaniello, & Sommaruga, 2017).
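The mean–variance relation above can be checked in a few lines of Python (a minimal illustration using the 50-target example from the text; the function name is ours):

```python
# Binomial variance of the number of cancelled targets: Var = N * P * (1 - P),
# with N targets and per-target cancellation probability P.
N = 50

def cancellation_variance(p, n=N):
    """Expected variance of the number of cancelled targets across repeated tests."""
    return n * p * (1 - p)

for p in (0.10, 0.50, 0.90):
    print(f"P = {p:.0%}: variance = {cancellation_variance(p):.1f}")
# Variance peaks at P = 50% (12.5) and drops to 4.5 at P = 10% or 90%.
```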
This non-constant error variance may be important, since stroke patients likely vary in their ability to cancel targets in a non-lateralised way, or, in other words, in the number of omissions made across the visual field. We will refer to the reduced probability of cancelling targets irrespective of their location in the visual field as non-spatial errors (Huygelier & Gillebert, 2018). Non-spatial errors can result from a mix of multiple non-spatial impairments, including, among others, impairments in selective and sustained attention (Foldi, Jutagir, Davidoff, & Gould, 1992) and working memory (Husain et al., 2001). The measurement error of cancellation test scores will be higher in stroke patients with than without such non-spatial impairments. Consequently, the most extreme R-L and CoC scores that can be observed in individuals without neglect will depend on non-spatial impairments. It is therefore unclear whether fixed normative cutoffs can successfully limit the rate of false-positive neglect diagnoses, as these cutoffs ignore non-spatial errors.
The chance of false-positive neglect diagnosis also depends on how multiple test instruments inform the diagnosis. Although it is not uncommon to diagnose neglect on a single cancellation task (e.g. Brink, Verwer, Biesbroek, Visser-Meily, & Nijboer, 2017; Demeyere & Gillebert, 2019; Farnè et al., 2004; Nijboer, Kollen, & Kwakkel, 2013), many researchers and clinicians use multiple tasks (e.g. cancellation, line bisection, and figure copying) to diagnose egocentric neglect. There is, however, no gold-standard approach for reconciling conflicting diagnoses from multiple tests, as diagnostic criteria vary considerably across published studies. Some studies required unanimous diagnostic agreement from two cancellation tasks (e.g. Smania et al., 1998), while others required a certain proportion of agreement across multiple tasks, varying from 17% to 67% (e.g. Cazzoli et al., 2012; Dalmaijer et al., 2018; McIntosh, Schindler, Birchall, & Milner, 2005; Plummer, Morris, & Dunai, 2003; Rengachary, He, Shulman, & Corbetta, 2011; Rorden & Karnath, 2010; Urbanski et al., 2010; Verdon, Schwartz, Lovblad, Hauert, & Vuilleumier, 2010). The impact of these different methods on the rate of false positives has not yet been investigated.
The Present Study
Previous studies developed fixed cutoffs for interpreting R-L and CoC scores that aim to keep false positives below a specific threshold (e.g. Demeyere et al., 2015; Rorden & Karnath, 2010). We investigated whether fixed cutoffs can indeed control false positives. We did not study how cutoffs affect the balance between false positives and false negatives, since the optimal balance depends on the diagnostic context (Habibzadeh, Habibzadeh, & Yadollahie, 2016). We simulated cancellation data using a simple probabilistic model (Huygelier & Gillebert, 2018). Our model merely represents non-spatial errors on cancellation tasks and makes no assumptions about the underlying non-spatial impairments or about how search strategy affects performance. We used Monte Carlo simulation, a method for making inferences using random numbers that follow a certain probability distribution (in our case, the binomial distribution) (Beisbart & Norton, 2012). This method allows full control over the true test score underlying the observed test score, making it valuable for psychometric research (Feinberg & Rubright, 2016) and providing theoretical insights that can aid clinical decision making (Beaujean, 2018).
Using these simulated data, we assessed the impact of fixed cutoffs on false-positive diagnosis. We predicted that the measurement precision of R-L scores would be lowest in simulated cases with 50% non-spatial errors, which in turn would result in inflated rates of false-positive neglect diagnosis. For the CoC, we predicted the lowest measurement precision and highest false-positive rates for the highest percentages of non-spatial errors, in line with earlier findings (Toraldo et al., 2017). This procedure was compared to a new approach in which cutoffs were adjusted according to the total number of cancelled targets. We predicted better control over false positives using adjusted cutoffs compared to fixed cutoffs. Moreover, we assessed the impact of different diagnostic methods on false positives. Neglect was diagnosed either on a single test or on multiple tests. In the case of multiple tests, we compared diagnosing neglect based on unanimous versus proportional agreement of test results.
Finally, we aimed to illustrate the real-world impact of these predictions using cancellation data acquired from a cohort of 651 stroke patients. In this analysis, we first assessed the occurrence and distribution of non-spatial errors in our sample of stroke patients. We then assessed the percentage of patients for whom fixed normative versus adjusted cutoffs would lead to a different diagnosis. Finally, we assessed whether fixed normative versus adjusted cutoffs differed most in rates of diagnosing neglect for patients who made approximately 50% non-spatial errors.
METHOD
The Theoretical Impact of Diagnostic Methods on False-Positive Rates
Simulating cancellation data using a binomial model
To simulate cancellation data, we generated 50 uniformly distributed target locations ranging from −1 (left border of the search matrix) to +1 (right border of the search matrix). We chose 50 targets, since this approximates the number of targets of many cancellation tasks: the Bells test with 35 targets (Gauthier et al., 1989), line cancellation with 40 targets (Albert, 1973), diamond cancellation with 48 targets (Vaes et al., 2015), Oxford Cognitive Screen (OCS) cancellation with 50 targets (Demeyere et al., 2015), star cancellation with 54 targets (Halligan, Cockburn, & Wilson, 1991) and letter cancellation with 60 targets (Weintraub & Mesulam, 1985).
Then, cancellation responses for each target were simulated according to a simple binomial model (see Huygelier & Gillebert, 2018). The model assumes that cancellation responses result from a probabilistic process in which each target has a certain probability to be cancelled. In case of egocentric neglect, the probability underlying cancellation responses on the contralesional side is smaller than the probability underlying cancellation responses on the ipsilesional side of the cancellation page. The difference between these two probabilities is referred to as 'spatial asymmetry'. For patients with no neglect, these probabilities are equal. The cancelled targets depend on these two probabilities and are randomly generated using a binomial distribution.
Importantly, our model assumes that the probability to cancel targets across the entire display can be smaller than 1, allowing for non-spatial errors to occur. Note that in case of no true spatial asymmetry, non-spatial errors are directly related to overall cancellation performance. That is, if a patient has a 50% probability to cancel each target across the cancellation array, then the expected proportion of cancelled targets will equal 50%. In case of a true spatial asymmetry, the overall cancellation performance can reflect a combination of non-spatial and spatial errors. Using this model, cancellation responses were simulated for patients with no spatial asymmetry and with varying levels of non-spatial errors. The level of non-spatial errors ranged from 0 to 1 in 10 steps, with 10,000 simulations for each level, producing a dataset with 110,000 simulated observations.
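The simulation just described can be sketched in a few lines of Python with NumPy. This is a simplified re-implementation based on the description above, not the authors' code; function and variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def simulate_cancellation(p_left, p_right, n_per_side=25, rng=rng):
    """Simulate one cancellation task under the binomial model:
    each target is cancelled with probability p_left or p_right,
    depending on which side of the display it falls on."""
    # 25 uniformly placed targets per side, from -1 (left border) to +1 (right border)
    locations = np.concatenate([rng.uniform(-1, 0, n_per_side),
                                rng.uniform(0, 1, n_per_side)])
    p = np.where(locations < 0, p_left, p_right)
    cancelled = rng.random(locations.size) < p   # one Bernoulli draw per target
    return locations, cancelled

# A simulated case with no spatial asymmetry and 50% non-spatial errors:
locations, cancelled = simulate_cancellation(p_left=0.5, p_right=0.5)
```

Repeating this for per-target probabilities from 0 to 1 in steps of .1, with 10,000 runs per level, reproduces the 110,000-observation design described above.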
Finally, the R-L and CoC scores were calculated for each of these simulated cancellation tasks. The R-L score was calculated by subtracting the proportion of cancelled targets for the 25 targets located on the left side from that of the 25 targets located on the right side. The CoC was calculated by averaging the location of cancelled targets and subtracting the average location of all targets. Both R-L and CoC scores ranged from −1 to +1, where negative values indicated more cancelled targets on the left than on the right side and vice versa.
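The two scores can be computed from simulated responses as follows (again a sketch with our own function names; the handling of the zero-cancellation edge case is our convention, not specified in the text):

```python
import numpy as np

def rl_score(locations, cancelled):
    """R-L: proportion of cancelled targets on the right minus that on the left."""
    left, right = locations < 0, locations >= 0
    return cancelled[right].mean() - cancelled[left].mean()

def coc_score(locations, cancelled):
    """CoC: average location of cancelled targets minus the average
    location of all targets (0 for a left-right symmetric display)."""
    if not cancelled.any():
        return 0.0   # undefined without cancellations; our convention
    return locations[cancelled].mean() - locations.mean()

# Everything cancelled on the right, nothing on the left:
locations = np.array([-0.5, -0.5, 0.5, 0.5])
cancelled = np.array([False, False, True, True])
print(rl_score(locations, cancelled))   # 1.0
print(coc_score(locations, cancelled))  # 0.5
```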
Estimating fixed and adjusted cutoffs
The 5th and 95th percentiles of the R-L and CoC scores were estimated based on our simulated dataset. Note that, given the law of large numbers, these percentile estimates closely approximate the true percentiles. We chose the 5th and 95th percentiles in line with the cutoffs reported by Demeyere et al. (2015). Two types of R-L and CoC cutoffs were calculated: fixed and adjusted cutoffs. To estimate the fixed cutoffs, a subset of simulated data in which at least 80% of targets had been cancelled was selected, as this performance is similar to that of neurologically healthy individuals as reported, for instance, by Demeyere et al. (2015). The adjusted cutoffs, in contrast, were based on different expected total performances: they were determined on subsets of simulated data, each with a specific expected proportion of cancelled targets ranging from 0 to 1 in 10 steps.
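As an illustration, adjusted R-L cutoffs for one expected performance level can be approximated by Monte Carlo. Since only the counts per side are needed, the left and right totals can be drawn directly from binomial distributions (our own sketch, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def adjusted_rl_cutoffs(p_cancel, n_sims=100_000, n_per_side=25, rng=rng):
    """5th and 95th percentiles of R-L scores simulated under no true
    spatial asymmetry, for a given per-target cancellation probability."""
    left = rng.binomial(n_per_side, p_cancel, n_sims) / n_per_side
    right = rng.binomial(n_per_side, p_cancel, n_sims) / n_per_side
    return np.percentile(right - left, [5, 95])

print(adjusted_rl_cutoffs(0.9))  # roughly [-.12, .12]
print(adjusted_rl_cutoffs(0.5))  # roughly [-.24, .24]
```

The cutoffs widen as the expected performance approaches 50%, mirroring the pattern of the adjusted cutoffs reported in Table 2.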
False positives for different diagnostic methods
Fixed and adjusted cutoffs were used to interpret the simulated cancellation test scores. Each of the 110,000 simulated individual performances was classified into one of three categories. For a single test administration, observations with an R-L or CoC score greater than the 95th percentile were classified as left neglect, while observations with an R-L or CoC score lower than the 5th percentile were classified as right neglect. Single observations with R-L or CoC scores between the 5th and 95th percentile were classified as no neglect. This classification was performed separately for the fixed and adjusted cutoffs. For unanimous agreement, the number of administered tests varied from 2 to 3 and neglect was only diagnosed when each of the administered tests consistently indicated neglect for the same side of space (i.e. consistent left-sided or consistent right-sided neglect). For proportional agreement, we simulated 5 test administrations and varied the minimum number of positive test results required for diagnosis from 1 to 4. Observed scores across simulated retests varied probabilistically, while the true scores remained constant. Then, for each classification method, we calculated the rate of false-positive diagnoses. Note that false positives should remain below 10% when testing for the presence of left- and right-sided neglect (two-sided testing) with the 5th and 95th percentiles and a single test: as 10% of a control group without neglect obtain test scores beyond these percentiles, a 10% false-positive rate is expected. We checked whether false positives indeed remained below this 10% threshold, as it has been used in previous studies (e.g. Demeyere et al., 2015). For unanimous agreement, false positives should be 1% and 0.1% for 2 and 3 unanimous tests, respectively.
For proportional agreement, false positives should follow 1 − (1 − Q^k)^m, with Q the probability of a false-positive result on a single test, k the minimum number of positive tests required for diagnosis and m the number of administered tests. Note that false positives refer to simulated cases that were classified as neglect patients although their true R-L and CoC score was zero. However, in case of a test that is insufficiently able to detect neglect, a true test score of zero does not necessarily prove the absence of neglect.
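This expression is easily evaluated for the design used here, a 10% per-test false-positive probability and five administered tests (a direct transcription of the formula above):

```python
def expected_false_positive_rate(q, k, m):
    """Authors' expression for the expected false-positive rate when at least
    k of m administered tests (each with false-positive probability q) must
    be positive: 1 - (1 - q**k) ** m."""
    return 1 - (1 - q**k) ** m

for k in range(1, 5):
    print(f"{k}/5 positive tests required: "
          f"{expected_false_positive_rate(0.10, k, 5):.2%}")
```

For instance, requiring only 1/5 positive tests yields 1 − .9⁵ ≈ 41%, matching the rate quoted in the Results.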
The Real-World Impact of Fixed Normative versus Adjusted Cutoffs on Neglect Diagnosis
A consecutive sample of stroke survivors was recruited from the John Radcliffe Hospital (Oxford, UK) between February 2012 and September 2018 in compliance with the regulations of the National Research Ethics Service (11/WM/0299 and 14/LO/0648) and the Declaration of Helsinki. Patients were included if they were able to remain alert for 20 min and to provide informed consent. Participant characteristics are reported in Table 1.
Table 1. Demographic and stroke characteristics of stroke sample
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200810125557142-0748:S1355617720000041:S1355617720000041_tab1.png?pub-status=live)
a Missing numbers represent patients whose stroke characteristics were unreported in their medical notes.
The OCS cancellation task is a search matrix of 150 heart drawings pseudo-randomly scattered across a landscape-orientation A4 page. Two-thirds of these drawings have left or right gaps (distractors) and the remaining third are complete drawings (targets). These drawings are arranged according to a grid pattern ensuring that there is an equal number of targets and distractors across different areas of the page. Patients were asked to cross out all complete hearts. Patients were given two practice trials before proceeding to the full task. Patients who were unable to hold a pen responded by pointing to each stimulus, which was then immediately marked by the examiner. Each patient was allowed 3 min to complete the task. Performance was summarised by subtracting the number of cancelled targets on the right side of the array from that on the left side, divided by the 20 targets per side. This R-L score (Footnote 2) ranged from −1 to 1 and discards the central 10 targets. Demeyere et al. (2015) reported fixed normative cutoffs (5th and 95th percentiles) of −.10 and .15.
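The scoring rule just described amounts to the following (a sketch with hypothetical function names; the classification step applies the fixed normative cutoffs without assigning a side of neglect):

```python
def ocs_rl_score(left_hits, right_hits, n_per_side=20):
    """R-L score for the OCS hearts task: cancelled targets on the right side
    subtracted from those on the left, divided by the 20 targets per side
    (the central 10 targets are discarded)."""
    return (left_hits - right_hits) / n_per_side

def flag_neglect(score, lower=-0.10, upper=0.15):
    """Flag scores outside the fixed normative cutoffs of Demeyere et al. (2015)."""
    return score < lower or score > upper

print(ocs_rl_score(20, 12))                # 0.4
print(flag_neglect(ocs_rl_score(20, 12)))  # True
print(flag_neglect(ocs_rl_score(19, 18)))  # False
```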
Using these empirical data, we evaluated the implications of our simulations. First, we determined the prevalence of non-spatial errors in stroke patients without statistically significant neglect according to the fixed normative cutoffs. For this purpose, we calculated the proportions of patients with a total proportion of cancelled targets of 0–.20, .22–.40, .42–.60, .62–.80 or .82–1.00 and an R-L score ≥ −.10 and ≤ .15. Second, we assessed whether fixed normative versus adjusted cutoffs lead to different diagnostic decisions. To this end, we calculated the proportion of patients classified as having statistically significant neglect based on the fixed normative cutoffs (i.e. −.10 and .15) and based on our adjusted cutoffs. For the adjusted cutoffs, stroke patients were categorised according to their total performance into different groups: 0–.20, .22–.40, .42–.60, .62–.80 and .82–1.00 cancelled targets. Third, we assessed whether diagnostic decisions based on these two types of cutoffs differed most for patients who cancelled around 50% of targets (range: .40–.60), for whom we expected the highest false-positive rates.
RESULTS
Fixed and Adjusted Cutoffs
First, fixed cutoffs were calculated based on the simulated dataset. In simulated cases where at least 80% of targets were cancelled, R-L scores more extreme than ±.12 represented statistically significant neglect impairment at the 10% level (Table 2). These cutoffs align well with published fixed normative cutoffs: Demeyere et al. (2015) reported cutoffs of −.10 and .15 and Robertson et al. (1994) reported ±.10. CoC scores more extreme than ±.05 represented statistically significant neglect impairment at the 10% level based on our simulated data. These CoC cutoffs align well with the ±.08 normative cutoffs reported by Rorden and Karnath (2010).
Table 2. Fixed and adjusted cutoffs for the two measures of cancellation performance
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200810125557142-0748:S1355617720000041:S1355617720000041_tab2.png?pub-status=live)
Note: p = expected total performance. p* = observed total performance. CoC = the standardised average location of all cancelled targets, R-L = the difference between the proportion of cancelled targets on the left versus right visual field. Pc = percentile.
Next, adjusted cutoffs were calculated for each performance level. Adjusted R-L cutoffs ranged from ±.24 in cases where 50% of targets were expected to be cancelled to ±.12 in cases where 90% or 10% of targets were expected to be cancelled (Table 2). Adjusted CoC cutoffs ranged from ±.51 in cases where 10% of targets were expected to be cancelled to ±.05 in cases where 90% of targets were expected to be cancelled. These adjusted cutoffs illustrate that extreme R-L scores were most likely when the probability to cancel targets was between 40% and 60%. Extreme CoC scores were most likely when the probability to cancel targets was equal to 10%.
False-Positive Rates for Different Diagnostic Methods
A single test administration
The rate of false positives was calculated based on fixed and adjusted cutoffs for a single test administration (Figure 1A). When R-L scores and fixed cutoffs were used to diagnose neglect, the rate of false positives increased as the non-spatial errors approached 40–60%. False positives reached a maximum of 30% when the non-spatial errors equalled 50% and exceeded the 10% threshold for non-spatial errors ranging from 20% to 80%. In contrast, when using adjusted cutoffs, false positives remained below the 10% threshold for non-spatial errors between 20% and 80%. When CoC scores and fixed cutoffs were used to diagnose neglect, false positives increased as the non-spatial errors increased. The false positives exceeded the 10% threshold for all non-spatial errors larger than 10% and reached 90% for the highest level of non-spatial errors. The adjusted CoC cutoffs produced better control over false positives, as they remained equal to or below the 10% threshold for non-spatial errors from 20% to 80%.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200810125557142-0748:S1355617720000041:S1355617720000041_fig1.png?pub-status=live)
Fig. 1. The simulated false positives as a function of the expected non-spatial errors for the CoC and R-L score for fixed cutoffs and adjusted cutoffs and for basing the diagnosis on a single test result (A) or on unanimous positive tests results based on two tests (B) and three tests (C). The dashed grey line represents the maximum expected false-positive rate based on the 5th and 95th percentiles (10%). The CoC is the standardised average location of all cancelled targets. The R-L is the difference between the proportion of hits on the left and right side of the cancellation array.
Multiple tests – unanimous agreement
Next, the false-positive rate was calculated in simulated cases where multiple consistent test results were required to diagnose neglect. In simulated cases where unanimous agreement across multiple tests was required for diagnosis, false positives decreased as the number of tests increased both when using fixed and adjusted cutoffs (Figures 1B and 1C). When using fixed cutoffs, R-L false positives dropped below the 10% level when two consistent positive test results were required (Figure 1B). CoC false positives dropped below the 10% threshold when three consistent positive test results were required (Figure 1C). False positives for the adjusted cutoffs were below 1% and 0.1% for 2 and 3 unanimous tests, respectively.
Multiple tests – proportional agreement
Finally, the rate of false positives was calculated for simulated cases in which positive test results from only a proportion of administered tests were required to diagnose neglect. In simulated cases where 1/5 positive test results were required for diagnosis, false positives exceeded the 10% threshold for almost all levels of non-spatial errors when using fixed cutoffs (Figure 2A) and adjusted cutoffs (Figure 2B) for both R-L and CoC scores. More specifically, for fixed cutoffs false positives ranged from 40% to 80% for R-L scores and from 40% to 100% for CoC scores for non-spatial errors ranging from 10% to 90%. Thus, false positives exceeded the expected 10% rate fourfold to ninefold. When fixed cutoffs were used, false positives fell below the 10% threshold for R-L scores when 3/5 positive tests were required, but remained above this threshold for CoC scores even when 4/5 positive tests were required (Figure 2A). Alternatively, when adjusted cutoffs were used, false positives dropped to the 10% level for R-L and CoC cutoffs when at least 2/5 positive results were required to diagnose neglect (Figure 2B). For adjusted cutoffs, false positives approximated the expected rates of 41%, 5%, 0.5%, and 0.05% for 1/5, 2/5, 3/5 and 4/5 positive results, respectively.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200810125557142-0748:S1355617720000041:S1355617720000041_fig2.png?pub-status=live)
Fig. 2. The simulated false positives as a function of the expected non-spatial errors for the CoC and R-L score for fixed cutoffs (A) and adjusted cutoffs (B) and for basing the diagnosis on a minimum of one, two, three, four positive tests out of five administered tests. The dashed grey line represents the maximum expected false-positive rate based on the 5th and 95th percentiles (10%). The CoC is the standardised average location of all cancelled targets. The R-L is the difference between the proportion of hits on the left and right side of the cancellation array.
The Real-World Impact of Fixed Normative versus Adjusted Cutoffs on Neglect Diagnosis
First, we estimated the prevalence of non-spatial errors in stroke patients without spatial asymmetry in cancellation performance. These patients were identified by comparing cancellation responses to the fixed normative cutoffs published by Demeyere et al. (2015). The results showed that 394 out of 651 stroke patients did not obtain an R-L score indicative of egocentric neglect. Figure 3A illustrates that a considerable number of these patients cancelled a low number of targets. More specifically, 28 of these 394 patients with normal R-L scores cancelled between 0% and 20% of targets, 22 patients cancelled between 22% and 40%, 33 patients cancelled between 42% and 60%, 69 patients cancelled between 62% and 80% and 242 patients cancelled more than 80% of targets. Thus, 124 of these 394 patients who were not diagnosed with neglect according to fixed normative cutoffs (i.e. no spatial asymmetry in cancellation performance) cancelled between 20% and 80% of all targets. It is likely that their cancellation performance was affected by non-spatial impairments, as they did not show a large difference in cancelled targets between both sides of the visual field but still failed to cancel many targets. This group of patients represents 19% of our entire stroke cohort. These results suggest that the inflated false-positive rates in our simulations, which applied to simulated cases with non-spatial errors from 20% to 80% and without spatial asymmetries, could apply to 19% of all stroke patients (Footnote 3).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200810125557142-0748:S1355617720000041:S1355617720000041_fig3.png?pub-status=live)
Fig. 3. The relation between the total performance on the hearts cancellation task and R-L scores. The relation of R-L scores and total performance on the hearts cancellation task of 651 stroke patients (A). The grey circles represent data of the stroke patients. The black dashed lines represent the fixed cutoffs −.10 and .15. The dark grey lines represent adjusted cutoffs based on the binomial model. The proportion of patients classified as a neglect patient according to the fixed cutoffs but not according to the adjusted cutoffs is visualised as a function of the total performance on the hearts cancellation task (B).
Patient cancellation responses were also compared to the fixed normative and the new adjusted cutoffs. As previously mentioned, when compared to fixed normative cutoffs, 394 patients did not show statistically significant neglect, implying that 257 patients exhibited statistically significant egocentric neglect. Of these 257 patients, 85 were not diagnosed with neglect according to the adjusted cutoffs. All patients who were diagnosed as neglect patients using the adjusted cutoffs were also diagnosed as neglect patients using the fixed normative cutoffs. The discrepancy between the number of patients diagnosed according to fixed normative versus adjusted cutoffs was highest for patients who cancelled between 40% and 60% of targets (Figure 3B).
DISCUSSION
While fixed normative cutoffs of cancellation test scores are frequently used for diagnosing egocentric neglect, our study shows that they do not adequately consider the discrete nature of cancellation responses. Discrete responses are best modelled by the binomial distribution, whose variance is not constant. This non-constant error variance implies that extreme R-L scores are more likely to occur when the probability of cancelling each target equals 50%. By simulating data using a binomial model, we showed that extreme R-L scores were indeed most probable when 50% of targets were expected to be cancelled. In contrast, extreme CoC scores were more likely to occur when fewer targets were expected to be cancelled. This pattern of CoC error variance can be explained by the fact that the CoC relies on averaging the locations of cancelled targets, and a statistical average is less precise when it is based on a small sample (i.e. a small number of cancelled targets).
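The non-constant error variance of both scores can be reproduced in a minimal simulation. The sketch below assumes a display of 25 targets per side and codes target position simply as left (−1) or right (+1), so the CoC is approximated by the mean signed side of cancelled targets; all names are illustrative and not taken from our analysis code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_side = 25          # assumed: 25 targets on each side of the display
n_sims = 100_000

def simulate_rl_coc(p_cancel):
    """Simulate R-L and (proxy) CoC scores for patients WITHOUT a true
    spatial bias: the cancellation probability is identical on both sides.

    Returns the standard deviations of the two scores across simulations,
    i.e. how spread out they are purely due to binomial noise.
    """
    left = rng.binomial(n_per_side, p_cancel, n_sims)
    right = rng.binomial(n_per_side, p_cancel, n_sims)
    rl = (right - left) / n_per_side          # R-L score, scaled to [-1, 1]
    # CoC proxy: mean signed side (-1 left, +1 right) of cancelled targets.
    total = left + right
    coc = np.where(total > 0, (right - left) / np.maximum(total, 1), 0.0)
    return rl.std(), coc.std()

for p in (0.1, 0.5, 0.9):
    rl_sd, coc_sd = simulate_rl_coc(p)
    print(f"p_cancel={p:.1f}  SD(R-L)={rl_sd:.3f}  SD(CoC)={coc_sd:.3f}")
```

In line with the argument above, the spread of R-L scores peaks when the cancellation probability is 50%, whereas the spread of the CoC proxy is largest when few targets are cancelled and shrinks as the number of cancelled targets grows.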
Our findings on the inflated false-positive rate for the CoC are consistent with an earlier study using a logistic model of spatial neglect (Toraldo et al., Reference Toraldo, Romaniello and Sommaruga2017). Toraldo et al. (Reference Toraldo, Romaniello and Sommaruga2017) predicted inflated false-positive rates as high as 97% when the number of cancelled targets approaches a minimum, and provided a standardised CoC score to correct for this inflation of false positives associated with non-spatial impairments. Our study builds on these earlier findings by revealing the impact of multiple tests on false positives, by providing a direct comparison of false-positive rates for the CoC and R-L scores, and by showing the impact of the non-constant error variance on real-world neglect diagnosis.
Patients with Non-spatial Impairments Are More Likely to Be Misdiagnosed as Neglect Patients
Our simulations showed that false positives depended on non-spatial errors when diagnosis was based on fixed cutoffs. This suggests that fixed cutoffs lead to more liberal diagnostic decisions for certain patients simply because those patients cancel fewer targets overall. In other words, our results suggest that patients with non-spatial impairments are more likely to be misdiagnosed as neglect patients. Real-world data from a large stroke cohort showed that a considerable number of patients likely have non-spatial impairments. Specifically, 119 of the 394 patients without neglect in our stroke cohort showed signs of non-spatial impairments. Moreover, we showed that fixed normative versus adjusted cutoffs led to a difference in the estimated prevalence of neglect of 13%. These findings suggest that inflated false positives associated with non-spatial impairments affect diagnostic specificity for a considerable number of stroke patients. As such, our analyses show that an accurate theory of the measurement precision of cancellation scores has implications for clinical practice. Neglect patients also suffer from non-spatial impairments in, among others, working memory and selective and sustained attention, and these impairments may even interact with spatial impairments (Husain et al., Reference Husain, Mannan, Hodgson, Wojciulik, Driver and Kennard2001; Husain & Rorden, Reference Husain and Rorden2003; Husain, Shapiro, Martin, & Kennard, Reference Husain, Shapiro, Martin and Kennard1997; Robertson, Reference Robertson2001; Robertson, Mattingley, Rorden, & Driver, Reference Robertson, Mattingley, Rorden and Driver1998; Robertson, Tegnér, Tham, Lo, & Nimmo-smith, Reference Robertson, Tegnér, Tham, Lo and Nimmo-smith1995). Therefore, one may wonder why it matters that non-spatial impairments affect the chance of diagnosing a spatial impairment. We argue that it is important to measure constructs independently in order to investigate their interaction: we would not be able to establish the true association between spatial and non-spatial impairments if the measurements of the two impairments are themselves dependent.
Adjusted Cutoffs and Unanimous Test Results Control False Positives
The results of this study provide two concrete ways to limit false positives to the 10% thresholdFootnote 4. First, adjusted cutoffs produced a stable rate of false positives that did not depend on non-spatial errors. Thus, applying adjusted cutoffs leads to diagnostic decisions that are equally liberal for all patients, regardless of whether these patients have non-spatial impairments. Second, requiring unanimous positive test results drastically reduced the rate of false positives, even for fixed cutoffs. For the R-L fixed cutoffs, requiring unanimous agreement of two tests effectively reduced false positives, and for the CoC fixed cutoffs, unanimous agreement of three tests was needed to do so. Some may argue that the ease of interpreting test scores using a fixed cutoff outweighs the benefit of reducing false positives. To illustrate the ease of using adjusted cutoffs, we added an explanation and simulated adjusted cutoffs that can be used to interpret R-L scores of the OCS cancellation task in Supplementary Material 1. Our adjusted cutoffs for R-L scores are equivalent to the z-test of proportions (Newcombe, Reference Newcombe1998). Based on this statistical test, implemented in our online application (http://www.psytests.be/stats/cancellation_task), one can obtain interval estimates of R-L scores for patients.
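As a sketch of how such a test can be computed (the exact implementation lives in the online application; the helper names below are hypothetical), the z-test of proportions compares the proportions of cancelled targets on the left and right sides:

```python
from math import erf, sqrt

def rl_z_test(left_hits, right_hits, n_left, n_right):
    """Two-sample z-test of proportions for a left-right cancellation
    asymmetry (cf. Newcombe, 1998). Positive z indicates a rightward bias.
    """
    p_left = left_hits / n_left
    p_right = right_hits / n_right
    pooled = (left_hits + right_hits) / (n_left + n_right)
    if pooled in (0.0, 1.0):
        return 0.0  # all or no targets cancelled: no asymmetry to test
    se = sqrt(pooled * (1 - pooled) * (1 / n_left + 1 / n_right))
    return (p_right - p_left) / se

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical patient: 10/25 targets cancelled on the left, 24/25 on
# the right. The pooled standard error accounts for total performance,
# which is what makes the cutoff "adjusted" rather than fixed.
z = rl_z_test(10, 24, 25, 25)
print(f"z = {z:.2f}, p = {two_sided_p(z):.5f}")
```

Because the standard error depends on the pooled proportion of cancelled targets, the implied cutoff on the R-L score automatically widens or narrows with total performance, unlike a fixed normative cutoff.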
Multiple Tests Inflate False Positives
Many researchers and clinicians already integrate data from multiple tests to diagnose neglect. However, rather than requiring unanimous agreement of multiple tests, it is not uncommon to use a single positive result out of multiple tests as the criterion to diagnose neglect (e.g. McIntosh et al., Reference McIntosh, Schindler, Birchall and Milner2005; Rengachary et al., Reference Rengachary, He, Shulman and Corbetta2011; Verdon et al., Reference Verdon, Schwartz, Lovblad, Hauert and Vuilleumier2010). The results of our simulations reveal that this method dramatically increased false positives, for both fixed and adjusted cutoffs. This result is not surprising, given that the chance that at least one test result will be a false positive increases with the number of tests administered (Farcomeni, Reference Farcomeni2008; Nichols & Hayasaka, Reference Nichols and Hayasaka2003). The true test scores did not vary in our simulated retests, which differs from clinical practice, where information is integrated across multiple test instruments that differ in construct validity (e.g. line bisection, cancellation, and figure copying tasks). This raises the question to what extent our estimated false positives for multiple tests generalise to clinical practice. Indeed, for patients who have spatial impairments, performance on different tests is associated (e.g. Azouvi et al., Reference Azouvi, Olivier, de Montety, Samuel, Louis-Dreyfus and Tesio2003; Bailey et al., Reference Bailey, Riddoch and Crome2004; Rorden & Karnath, Reference Rorden and Karnath2010; Sperber & Karnath, Reference Sperber and Karnath2016), and combining tests may reduce the chance of a misdiagnosis. However, for patients without a spatial deficit, multiple tests act as independent tests, and their association does not improve when tests that differ in construct validity are combined. Thus, although combining different tests can increase the chance of detecting a true neglect case, it will simultaneously increase false positives if no correction for multiple comparisons is applied.
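For independent tests, the inflation follows directly from elementary probability: with a per-test false-positive rate α, the chance of at least one false positive across k tests is 1 − (1 − α)^k, whereas requiring unanimity reduces it to α^k. A minimal illustration, assuming the 10% threshold discussed above (function names are illustrative):

```python
def fp_at_least_one(alpha, n_tests):
    """False-positive rate when a single positive result out of
    n_tests independent tests suffices for a diagnosis."""
    return 1 - (1 - alpha) ** n_tests

def fp_unanimous(alpha, n_tests):
    """False-positive rate when all n_tests independent tests must
    be positive before a diagnosis is made."""
    return alpha ** n_tests

for k in (1, 2, 3):
    print(f"{k} test(s): any-positive criterion "
          f"{fp_at_least_one(0.10, k):.3f}, "
          f"unanimous criterion {fp_unanimous(0.10, k):.3f}")
```

With α = .10, the any-positive criterion inflates the false-positive rate to about 19% with two tests and 27% with three, while the unanimous criterion shrinks it to 1% and 0.1%, respectively.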
Reducing False Positives Does Not Necessarily Improve Diagnostic Accuracy
Throughout this investigation, we have focused on false-positive diagnoses. However, accurate diagnosis requires a minimal rate of false positives as well as of false negatives. One may wonder whether the adjusted cutoffs are not too conservative, reducing false positives at the cost of detecting true neglect. Indeed, changing cutoffs will only re-weigh the balance between false positives and false negatives; it cannot simultaneously reduce both errors. However, the goal of detecting 100% of true neglect cases cannot entirely justify inflated false-positive rates as high as those we found. That is, consider a scenario in which the true prevalence of neglect equals 20% and we detect 100% of these cases; then the overall percentage of accurate diagnoses drops below the guess rate when false positives exceed 60%. Thus, our results signal an important issue that requires consideration.
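This trade-off can be made explicit with a short calculation. The sketch below assumes that the guess rate refers to chance-level (50%) classification and that sensitivity is perfect; under these assumptions overall accuracy crosses 50% at a false-positive rate of 62.5%, close to the 60% figure above. Function names are illustrative:

```python
def overall_accuracy(prevalence, sensitivity, false_positive_rate):
    """Overall proportion of correct diagnoses in a cohort:
    true positives among cases plus true negatives among non-cases."""
    specificity = 1 - false_positive_rate
    return prevalence * sensitivity + (1 - prevalence) * specificity

# Perfect sensitivity and 20% prevalence: accuracy still degrades as
# the false-positive rate grows, reaching the 50% guess rate at 62.5%.
for fpr in (0.0, 0.3, 0.6, 0.625, 0.9):
    print(f"FPR={fpr:.3f}  accuracy={overall_accuracy(0.2, 1.0, fpr):.3f}")
```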
Since we have not established how adjusted cutoffs affect the balance between false positives and false negatives, we can only recommend the use of adjusted cutoffs in contexts in which false positives need to be controlled. For instance, when including patients in a clinical trial or a costly therapy, one may wish to avoid including patients who may not benefit from treatment. For epidemiological neglect research, by contrast, we advise using cutoffs that give equal weight to false positives and false negatives. Future research is necessary to develop cutoffs that balance false positives and false negatives while considering the impact of non-spatial errors on false positives.
Search Strategy and Performance Measures
Moreover, the current study focused on a simplified representation of clinical decision making. In clinical practice, neglect diagnosis is informed by test scores, but clinicians can integrate behavioural observations with them. Research has shown that the way in which patients search for targets can provide valuable information (Behrmann, Watt, Black, & Barton, Reference Behrmann, Watt, Black and Barton1997; Jalas, Lindell, Brunila, Tenovuo, & Hamalainen, Reference Jalas, Lindell, Brunila, Tenovuo and Hamalainen2002), while current practice mostly relies on performance rather than strategy measures (Dalmaijer et al., Reference Dalmaijer, Stigchel, Nijboer, Cornelissen and Husain2014). Search strategy can indeed affect performance measures. Chatterjee, Mennemeier, and Heilman (Reference Chatterjee, Mennemeier and Heilman1992) showed that instructing a patient to alternate cancelling targets on the left and right side redistributed cancellations, with many cancellations on both sides of the page and few in the centre. Thus, future research into the clinical relevance of strategy measures relative to performance measures is necessary.
Can We Still Consider Cancellation Tasks a Valid Method to Assess Neglect?
There has been debate about which task is most valid to assess neglect. Some have argued that cancellation tasks should be preferred over line bisection tasks (Ferber & Karnath, Reference Ferber and Karnath2001; Sperber & Karnath, Reference Sperber and Karnath2016), but through changing the quantification of line bisection performance, McIntosh et al. found better convergence between cancellation and bisection tasks (McIntosh, Ietswaart, & Milner, Reference McIntosh, Ietswaart and Milner2017; McIntosh et al., Reference McIntosh, Schindler, Birchall and Milner2005). Similarly, the way in which cancellation performance is quantified can affect its construct validity (Huygelier & Gillebert, Reference Huygelier and Gillebert2018). In sum, more research is required to understand how each task and measure represents neglect.
CONCLUSIONS
To conclude, when the aim is to control false-positive neglect diagnoses, we recommend basing the diagnosis on adjusted cutoffs that account for total performance. Alternatively, if fixed cutoffs are preferred, we recommend using the R-L score with a criterion of two unanimous test results, or the CoC score with a criterion of three unanimous test results.
ACKNOWLEDGEMENTS
This work was funded by the Research Foundation Flanders (FWO) with a grant awarded to HH (1711717N) and to CRG (G072517N). The patient data were collected as part of an NIHR program development grant (RP-DG-0610-10046) and NIHR Clinical Research Facility, supported by the National Institute for Health Research Clinical Research Network. ND and MJM are supported by the Stroke Association (TSA LECT 2015/02 and SA PGF 18\100031). The code for this study is available on Figshare (https://doi.org/10.6084/m9.figshare.9962720.v1).
CONFLICT OF INTEREST
The researchers declare no conflicts of interest.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/S1355617720000041