Significant outcomes
• The modified Eriksen Flanker Task provides an error-related negativity (ERN) with excellent reliability; the task can be halved to a patient-friendly length of 200 trials.
Limitations
• Instant feedback does not allow for analysing feedback-related potentials; the sample size, although sufficient for detecting the ERN, does not allow sub-analyses (e.g. gender effects).
Introduction
Distinguishing error from correctness is an essential requirement for learning progress (Reference Holroyd and Coles1). In order to understand the function of error-related brain activity, an event-related potential (ERP) has been investigated in several electroencephalography (EEG) studies: the ERN, a negative deflection appearing within 100 ms after an erroneous response that peaks in fronto-central midline recording sites (Reference Gehring and Coles2,Reference Falkenstein and Hoormann3). To elicit the ERN the Eriksen Flanker Task (Reference Eriksen and Eriksen4) is broadly used (Reference Cassidy, Robertson and O’Connell5–Reference Falkenstein and Christ8) which involves discriminating a central target symbol (e.g. an arrow) from surrounding distracting ‘flanker’ symbols. There is strong evidence that the ERN is generated in the anterior cingulate cortex (Reference Brázdil, Roman, Daniel and Rektor9–Reference Luu, Tucker and Makeig11), an area of the medial prefrontal cortex responsible for the integration of affective and cognitive information (Reference Bush, Luu and Posner12).
A similar, but smaller negative ERP can arise also after correct responses in the same time window and at the same recording sites as the ERN: the correct-related negativity (CRN) (Reference Falkenstein and Christ8,Reference Gehring and Knight13–Reference Vidal, Hasbroucq, Grapperon and Bonnet15). It has been discussed whether the same process (Reference Vidal, Hasbroucq, Grapperon and Bonnet15) or two different processes (Reference Coles, Scheffers and Holroyd16,Reference Yordanova, Falkenstein, Hohnsbein and Kolev17) underlie the ERN and CRN.
The function of the ERN is described in different models with regard to an error detection system (Reference Falkenstein and Hoormann3), reinforcement learning (Reference Holroyd and Coles1) or a general conflict-detection process (Reference Yeung, Botvinick and Cohen18). Recently, the application of a forward model indicated (Reference Joch, Hegele, Maurer, Müller and Maurer19) that the ERN is likely to reflect an error prediction. This is in line with the predicted response-outcome (PRO) model (Reference Alexander and Brown20), which interprets the ERN as a surprise signal caused by non-occurrence of a predicted event.
Several factors have been shown to influence the ERN. Particularly important is the performance of the individual subject: The higher the error rate, the lower the ERN amplitude (Reference Fischer, Klein and Ullsperger21,Reference Hajcak, McDonald and Simons22). In addition, the structure of the task and the instruction are relevant: (a) using congruent stimuli (i.e. target and flanker arrows point to the same direction) leads to increased ERN compared to incongruent stimuli (Reference Scheffers and Coles14); (b) task instruction focusing on accuracy over speed leads to increased ERN (Reference Gehring and Coles2,Reference Falkenstein and Christ8); (c) the ERN scales with the availability of sensory information and the task goal (Reference Brown and Braver23).
Moreover, negative affect (Reference Hajcak, McDonald and Simons24,Reference Luu, Collins and Tucker25) and several psychiatric disorders (Reference Fissler, Winnebeck, Schroeter, Gummbersbach, Huntenburg, Gärtner and Barnhofer26–Reference Meyer, Hajcak, Glenn, Kujawa and Klein29) are related to the ERN amplitude. Recently, it has been demonstrated that the ERN can (a) predict the onset of internalising disorders (Reference Meyer, Danielson, Danzig, Bhatia, Black, Bromet, Carlson, Hajcak, Kotov and Klein30) such as anxiety disorders during adolescence (Reference Meyer31,Reference Meyer, Nelson, Perlman, Klein and Kotov32), (b) provide evidence for therapy responsiveness (Reference Fissler, Winnebeck, Schroeter, Gummbersbach, Huntenburg, Gärtner and Barnhofer26,Reference Rabella, Grasa, Corripio, Romero, Mañanas, Antonijoan, Münte, Pérez and Riba27,Reference Schroder, Moran and Moser33–Reference Hobson, Bonk and Inzlicht35) and (c) help to guide treatment decisions (Reference Gorka, Burkhouse, Klumpp, Kennedy, Afshar, Francis, Ajilore, Mariouw, Craske, Langenecker, Shankman and Luan Phan36).
Particularly the latter case emphasises the clinical relevance of the ERN and makes it a promising candidate as a biomarker for psychiatric disorders. A basic requirement for a biomarker is that it can be measured reliably. Only a few studies have investigated ERN reliability by using different Eriksen Flanker Task variants and found intraclass correlation coefficients (ICCs) between 0.62 and 0.74 (Reference Cassidy, Robertson and O’Connell5,Reference Olvet and Hajcak7,Reference Segalowitz, Santesso, Murphy, Homan, Chantziantoniou and Khan37–Reference Larson, Baldwin, Good and Fair39).
With the present study we seek to investigate test–retest reliability of the ERN by using a modified Eriksen Flanker Task with an adaptive reaction time (RT) deadline (Reference Debener40,Reference Unger, Heintz and Kray41) in two measurement sessions separated by 28 days. The application of an adaptive RT deadline is intended to maximise reliability through a higher error rate (Reference Larson, Baldwin, Good and Fair39), while the ERN amplitude remains significantly different from the CRN amplitude and a potential decrease of the ERN amplitude (Reference Fischer, Klein and Ullsperger21,Reference Hajcak, McDonald and Simons22) is negligible. At the behavioural level it is expected that the accuracy data are constant across sessions due to the adaptive RT deadline, whereas RT is predicted to be faster in session 2 because of training effects (Reference Olvet and Hajcak7). To ensure the validity of the modified Eriksen Flanker Task, we attempt to replicate known correlation patterns: (1) positive correlation of ERN amplitude with number of errors (Reference Fischer, Klein and Ullsperger21,Reference Hajcak, McDonald and Simons22) and (2) negative correlation of ERN amplitude with negative affect, measured by the Positive and Negative Affect Schedule (PANAS) questionnaire (Reference Hajcak, McDonald and Simons24). In order to optimise a potential future clinical use of the task, we determine whether the task can be shortened without significant loss in reliability.
Aims of the study
To quantify test–retest reliability of the ERN evoked by a modified Eriksen Flanker Task with an adaptive RT deadline. We seek to determine whether the task can be shortened without significant loss in reliability.
Materials and methods
Participants
For the pilot study N = 12 healthy participants were recruited to adjust task parameters. Two subjects had to be excluded from analyses due to technical problems. Power estimation for the main study was calculated based on the pilot study results. Using G*Power 3.1.9.2 we calculated a required sample size of N = 11 subjects given a statistical power of 0.80, α = 0.05 (one-tailed) and an effect size of 0.83 for a t-test with dependent means. To compensate for drop-outs a new sample of N = 15 subjects was recruited for the main study. One subject had to be excluded from the main study due to technical problems. Finally, test–retest data from N = 14 subjects (9 F/5 M; mean age = 23.5 years, SD = 2.07 years, range = 20–28 years) were included for main analyses. All participants were tested for mental health by the Mini International Neuropsychiatric Interview (M.I.N.I.), German Version 5.0.0 (Reference Sheehan, Lecrubier, Sheehan, Amorim, Janavs, Weiller, Hergueta, Baker and Dunbar42). Exclusion criteria included current or preceding psychiatric diagnoses. We documented consumption of cigarettes, caffeine (including coffee, coke, or caffeinated tea) and alcohol before the first testing and requested the subjects to appear in a comparable condition for the second testing.
All participants were compensated for their participation and gave written informed consent after detailed explanation of the experimental procedure. The study was approved by the Ethics Committee of the University of Frankfurt and is in accordance with the latest version of the Declaration of Helsinki.
PANAS
The German version of the PANAS is a self-report instrument measuring affect, adapted by Krohne et al. (Reference Krohne, Egloff, Kohlmann and Tausch43) from the English language questionnaire PANAS (Reference Watson, Clark and Tellegen44). The questionnaire consists of 20 adjectives describing different emotions (see Supplementary Material). Ten adjectives each cover the dimensions positive affect and negative affect. Every item can be rated on a five-point Likert-type scale ranging from 1 ‘not at all’ to 5 ‘extremely’. Subjects responded on the basis of their present mood. The sum scores representing negative and positive affect have adequate internal consistency, test–retest reliability, and convergent and discriminant validity (Reference Watson, Clark and Tellegen44).
Subjects completed the PANAS before the Eriksen Flanker Task started at both sessions. To ensure validity by replicating known correlation patterns, state affect was correlated with the ERN amplitude.
Modified Eriksen Flanker Task
Subjects performed a modified arrow version of the Eriksen Flanker Task (Reference Eriksen and Eriksen4) two times (session t1 and session t2) separated by exactly 28 days (Fig. 1) in a dimly illuminated room (subjects of the pilot study finished only session t1). Presentation software Version 18.1 (Neurobehavioral Systems Inc.) was used. The whole task included 411 trials: 12 exercise trials and 399 experimental trials. In order to force many errors, only incongruent stimuli were included. On each trial five horizontally aligned arrows were shown in the middle of the monitor (‘<<><<’ or ‘>><>>’ or ‘><><>’ or ‘<><><’) for 125 ms, followed by a white screen during the RT deadline of maximal 475 ms. Each of the stimulus types was intended to be shown 100 times (due to a technical problem, ‘<<><<’ was only shown 99 times in both sessions and for all subjects). The subject was instructed to respond as fast and accurately as possible with the right or left arrow key using his/her right index finger on a keyboard, congruent to the direction of the central arrow. Immediately after the button press, feedback was presented: a plus (+) sign for correct answers, a minus (−) sign for erroneous answers and an exclamation point (!) when the subject did not answer within the current RT deadline. In order to force quick answers, the RT deadline was adjusted after each trial by a reduction of 25 ms in case the subject reacted correctly within the current RT deadline or an extension by 25 ms in case the response took longer than the current RT deadline. Between trials a white screen without fixation cross was shown for a random interval of 500–1500 ms.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190610144749659-0132:S0924270818000364:S0924270818000364_fig1g.jpeg?pub-status=live)
Fig. 1 Procedure of the Eriksen Flanker Task. RT, reaction time.
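The adaptive deadline rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Presentation script used in the study. Three points are assumptions: that 475 ms acts as an upper bound on the deadline, that an error made within the deadline leaves the deadline unchanged (the text does not specify this case), and the lower bound `MIN_DEADLINE_MS`, which is a hypothetical value added only to keep the deadline positive. The function names are likewise illustrative.

```python
from typing import Optional

STEP_MS = 25            # adjustment step stated in the text
MAX_DEADLINE_MS = 475   # maximal deadline stated in the text (read here as an upper bound)
MIN_DEADLINE_MS = 100   # hypothetical floor, not given in the text


def next_deadline(deadline_ms: int, rt_ms: Optional[int], correct: bool) -> int:
    """Return the RT deadline (ms) to use on the following trial.

    rt_ms is None when no response was given at all.
    """
    too_slow = rt_ms is None or rt_ms > deadline_ms
    if too_slow:
        # Deadline missed: extend it, but never beyond the stated maximum.
        return min(deadline_ms + STEP_MS, MAX_DEADLINE_MS)
    if correct:
        # Correct response in time: tighten the deadline.
        return max(deadline_ms - STEP_MS, MIN_DEADLINE_MS)
    # Assumption: an error made in time leaves the deadline unchanged.
    return deadline_ms


def feedback_symbol(deadline_ms: int, rt_ms: Optional[int], correct: bool) -> str:
    """Return the feedback character shown after the response."""
    if rt_ms is None or rt_ms > deadline_ms:
        return "!"
    return "+" if correct else "-"
```

Because the deadline shrinks after every correct in-time response and grows after every miss, it tracks each subject's speed, which is what keeps accuracy roughly constant across subjects and sessions.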
EEG recording
The EEG was recorded using an elastic head cap with 64 scalp electrodes according to the international 10/20-System. Four additional electrodes were placed to record an electrooculogram, two close to each angulus oculi lateralis, one on the supercilium and one on the palpebrae inferioris. Ground electrode was placed between the FPz and Fz electrode, reference electrode between the Fz and Cz electrode. All signals were digitised with a 64-channel DC-amplifier and the software ‘BrainVision Recorder’ 2.0 (BrainProducts, Munich, Germany) with a sampling rate of 5000 Hz.
Data analysis
EEG data were analysed using the software ‘BrainVision Analyzer’ 2.0 (BrainProducts, Munich, Germany). First, electrodes TP9 and TP10 were disabled, since they are placed on the mastoids and were not used as reference. Data were band-pass filtered with a low cutoff of 0.1 Hz, a high cutoff of 50 Hz and a notch filter of 50 Hz. Blinks and eye movements were corrected based on the method established by Gratton et al. (Reference Gratton, Coles and Donchin45). The algorithm corrects eye artefacts by subtracting the eye channel voltages multiplied by a channel-dependent corrective factor from the respective EEG channels.
Subsequently, data were re-referenced to the average of all electrodes and the former reference was reused as channel FCz. The EEG was segmented response-locked with an entire length of 800 ms, with 400 ms pre- and post-response each. The automatic artefact rejection searched for values exceeding a difference of ±70 µV within 200 ms and excluded data 200 ms before and after the artefact. This procedure did not reveal any artefacts. Afterwards the segments were averaged separately into correct, error and missed trials and a window −400 to −200 ms before the response was used as baseline. The ERP components ERN and CRN were analysed in terms of area and peak measures at electrode sites FCz and Cz. For area measures the mean activity in the interval 0–100 ms after response was calculated; for peak analysis, automatic peak detection identified the largest negativity in the same interval.
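The two ERP measures can be illustrated with a short Python sketch operating on one averaged, response-locked waveform. It is not the BrainVision Analyzer pipeline. The function name, the plain-list input format and the choice to treat window boundaries as inclusive are assumptions made for illustration.

```python
from statistics import fmean


def erp_measures(times_ms, amplitudes_uv, baseline=(-400, -200), window=(0, 100)):
    """Return (area, peak_amplitude, peak_latency_ms) for one averaged waveform.

    times_ms and amplitudes_uv are equally long sequences; time 0 is the response.
    """
    # Baseline correction: subtract the mean of the pre-response window.
    base_values = [v for t, v in zip(times_ms, amplitudes_uv)
                   if baseline[0] <= t <= baseline[1]]
    base = fmean(base_values)
    # Keep only the baseline-corrected samples inside the measurement window.
    in_window = [(t, v - base) for t, v in zip(times_ms, amplitudes_uv)
                 if window[0] <= t <= window[1]]
    # Area measure: mean activity in the window.
    area = fmean([v for _, v in in_window])
    # Peak measure: the largest negativity, i.e. the minimum, in the same window.
    peak_latency, peak_amplitude = min(in_window, key=lambda tv: tv[1])
    return area, peak_amplitude, peak_latency
```

The area measure averages over the whole window and is therefore less sensitive to single noisy samples, whereas the peak measure also yields a latency.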
In the process of our analysis it was necessary to evaluate the EEG data additionally stimulus-locked. The window −400 to −200 ms pre-stimulus was used as baseline and the average time course separated into correct and error as well as sessions t1 and t2 was calculated.
Statistical methods
For statistical calculations IBM SPSS Statistics (version 22) and MATLAB R2017b (The Mathworks, Natick, MA, USA) were used.
For behavioural data we used the Wilcoxon test (α = 0.05; two-tailed) because the data were not normally distributed according to Shapiro–Wilk tests. In order to analyse EEG data, we tested for Gaussian distribution by Shapiro–Wilk tests (all ps >0.42) and calculated a 2 × 2 repeated-measures ANOVA (analysis of variance) with factors (1) accuracy (CRN, ERN) and (2) session (t1, t2). Post hoc dependent t-tests (α = 0.05; two-tailed) were performed in case of significant interaction effects.
Test–retest reliability was assessed by calculating ICC (1,2) for absolute agreement, defined by Shrout and Fleiss (Reference Shrout and Fleiss46) as:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190610144749659-0132:S0924270818000364:S0924270818000364_eqnU1.gif?pub-status=live)
BMS = between-subjects mean square; EMS = error mean square; JMS = session mean square (the original terminology of ‘J’ is ‘Judge’); k = number of repeated sessions and N = number of subjects. Thus, in the current study, k = 2 and N = 14.
Following Shrout and Fleiss (Reference Shrout and Fleiss46) we defined ICC values <0.4 as poor, 0.4–0.75 as fair to good and >0.75 as excellent. Negative ICC values were reset to 0 (Reference Bartko47).
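As a minimal sketch of how such a coefficient can be computed from an N × k score table, the Python code below uses the Shrout and Fleiss absolute-agreement form that is built from exactly the terms listed above: (BMS − EMS) / (BMS + (k − 1)·EMS + k·(JMS − EMS)/N). That this form matches the equation shown in the image is an assumption. The function names are illustrative, and the code is not the SPSS or MATLAB routine used in the study.

```python
def icc_absolute_agreement(scores):
    """Single-measure ICC for absolute agreement from a two-way layout.

    scores: list of N rows (subjects), each holding k session values.
    Negative estimates are reset to 0, as described in the text.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_sessions

    bms = ss_subjects / (n - 1)             # between-subjects mean square
    jms = ss_sessions / (k - 1)             # session ('judge') mean square
    ems = ss_error / ((n - 1) * (k - 1))    # error mean square

    icc = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    return max(icc, 0.0)


def classify_icc(icc):
    """Map an ICC value onto the verbal categories used in the text."""
    if icc < 0.4:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"
```

Because JMS enters the denominator, a systematic shift between sessions lowers this coefficient even when the rank order of subjects is perfectly preserved, which is what distinguishes absolute agreement from consistency.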
For correlation analyses of ERN amplitude, we calculated Spearman correlations (one-tailed), since the scores of negative affect and number of errors were not Gaussian distributed.
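For readers unfamiliar with the rank-based coefficient, the following Python sketch computes Spearman's rho as the Pearson correlation of average ranks, which handles tied scores. It only illustrates the statistic and does not compute the one-tailed p-value; the function names are illustrative and the study itself used SPSS and MATLAB.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

Since only ranks enter the calculation, the coefficient is unaffected by the skewed distributions that ruled out a Pearson correlation here.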
Results
Behavioural results
Participants responded significantly faster in session t2 compared to session t1, for both correct and error trials (Table 1). There was a significant effect on number of correct trials, but not on number of error and missed trials between sessions (Table 1). Across sessions the accuracy was consistent at a level of ∼80% (see Supplementary Fig. 1).
Table 1 Performance data
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190610144749659-0132:S0924270818000364:S0924270818000364_tab1.gif?pub-status=live)
IQR = interquartile range.
Comparing CRN and ERN
Figure 2a shows response-locked ERPs for error and correct trials at FCz electrode averaged over all subjects and trials. As expected, there was a significant difference between CRN and ERN [peak amplitude measures: F(1,13) = 16.673, p <0.001, ηp² = 0.562; area measures: F(1,13) = 10.008, p = 0.007, ηp² = 0.435] with more pronounced negativity for ERN versus CRN. For factor session, there was a significant effect [peak amplitude measures: F(1,13) = 15.282, p = 0.002, ηp² = 0.540; area measures: F(1,13) = 27.924, p <0.001, ηp² = 0.682] with a more pronounced negativity for session 1 versus session 2. In addition, there was a significant interaction of accuracy and session for peak amplitude measures [F(1,13) = 11.484, p = 0.005, ηp² = 0.469] but not for area measures. Post hoc t-tests revealed that the interaction resulted from a significant change in the CRN amplitude (t = −4.270, p <0.001, dz = 1.141) while the ERN amplitude difference was not significant across sessions (t = −1.841, p = 0.089, dz = 0.492).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190610144749659-0132:S0924270818000364:S0924270818000364_fig2g.jpeg?pub-status=live)
Fig. 2 (a) Response-locked time courses of correct and error trials at FCz electrode (±SE) for session t1 and t2. (b) Topographic mapping of correct-related negativity (CRN) and error-related negativity (ERN) and t-map of the difference ERN–CRN (area measure).
The topographies showed a more pronounced negativity in frontal areas for error compared to correct trials and the major difference between CRN and ERN in the central cortex (Fig. 2b).
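The effect sizes reported in this subsection follow directly from the test statistics, which gives a quick consistency check. The sketch below uses the standard conversions: partial eta squared equals F·df_effect / (F·df_effect + df_error), and Cohen's dz for dependent samples equals |t| / √N. The function names are illustrative.

```python
from math import sqrt


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def cohens_dz(t_value, n_pairs):
    """Cohen's dz for a dependent-samples t-test with n_pairs subjects."""
    return abs(t_value) / sqrt(n_pairs)


# Values taken from the text (N = 14 subjects, df = 1 and 13).
print(round(partial_eta_squared(16.673, 1, 13), 3))  # 0.562
print(round(partial_eta_squared(27.924, 1, 13), 3))  # 0.682
print(round(cohens_dz(-4.270, 14), 3))               # 1.141
print(round(cohens_dz(-1.841, 14), 3))               # 0.492
```

All four values agree with the reported effect sizes to three decimals.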
Test–retest reliability
Table 2 shows test–retest reliability indices of ERP measures for error and correct trials at FCz and Cz electrode. Considering the FCz electrode, the ICC of the ERN was excellent (peak amplitude measures: ICC = 0.947, p <0.001; area measures: ICC = 0.806, p <0.001) and the ICC of the CRN was fair to good (peak amplitude measures: ICC = 0.747, p <0.001; area measures: ICC = 0.675, p <0.001). The ICC of the difference ERN–CRN was excellent for peak amplitude measures (ICC = 0.792, p <0.001) and fair to good for area measures (ICC = 0.585, p = 0.013). In contrast, peak latency measures were characterised by a low non-significant reliability of ERN (ICC = 0.143, p = 0.290) and CRN (ICC = 0.347, p = 0.113) but a moderate and significant reliability of ERN–CRN (ICC = 0.690, p = 0.002). For Cz electrode we found comparable results.
Table 2 Test–retest reliability for error-related negativity (ERN) and correct-related negativity (CRN) at FCz and Cz electrode*
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190610144749659-0132:S0924270818000364:S0924270818000364_tab2.gif?pub-status=live)
CI, confidence interval.
* Note that the intraclass correlation coefficients (ICCs) are comparable at C2 electrode where the difference between ERN and CRN was at maximum.
† ICC for absolute agreement.
Validity
Spearman correlation for the ERN amplitude (FCz) with relative number of errors (Fig. 3a) revealed a trend to significance (r = 0.394; p = 0.082). The correlation of negative affect and ERN amplitude (Fig. 3b) reached significance (r = −0.583, p = 0.014). Topographic mappings of the correlations show that the absolute maxima were located at central electrodes (Fig. 3c).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190610144749659-0132:S0924270818000364:S0924270818000364_fig3g.jpeg?pub-status=live)
Fig. 3 (a) Correlation of error-related negativity (ERN) amplitude with absolute number of errors. (b) Correlation of ERN amplitude with negative affect, measured by the Positive and Negative Affect Schedule (PANAS) questionnaire. (c) Topographic mapping of correlation values of ERN amplitude with absolute number of errors (left) and negative affect (right).
A negative deflection preceding the ERN is noticeable in our response-locked time courses (Fig. 2a). To further examine this negative deflection, we analysed the EEG data stimulus-locked (see Supplementary Fig. 2) and identified visual evoked potentials: the negative potential before the ERN and CRN is most likely the N200 which peaks at FCz electrode (correct: 292 ms (t1)/281 ms (t2); error: 291 ms (t1)/277 ms (t2) post-stimulus) (Reference Kopp, Rist and Mattler48).
Can the task be shortened?
Figure 4 shows the ICC values of the ERN and the CRN with increasing number of included trials at FCz electrode. For peak measures, the ICC of the ERN exceeded the threshold of 0.80 once 35 trials were included; for area measures, 45 trials were required.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190610144749659-0132:S0924270818000364:S0924270818000364_fig4g.jpeg?pub-status=live)
Fig. 4 Intraclass correlation coefficients (ICC) values of correct-related negativity (CRN) and error-related negativity (ERN) with increasing number of included trials at FCz electrode.
For the CRN, at least 50 trials were required for the ICC of peak measures to exceed the threshold; for area measures the threshold was not exceeded.
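The trial-count analysis can be sketched as a loop that recomputes the ICC while more and more trials enter each subject's average. The Python code below is a hedged illustration for area-type measures only. It assumes that the first n trials of each session are used (the text does not state how trials were selected) and that one amplitude per trial is available. It uses the Shrout and Fleiss absolute-agreement form built from BMS, EMS and JMS, and all function names are illustrative.

```python
def icc_absolute_agreement(scores):
    """Single-measure absolute-agreement ICC for N subjects x k sessions."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in col_means)
    bms = ss_subjects / (n - 1)
    jms = ss_sessions / (k - 1)
    ems = (ss_total - ss_subjects - ss_sessions) / ((n - 1) * (k - 1))
    denominator = bms + (k - 1) * ems + k * (jms - ems) / n
    if denominator == 0:
        return 0.0  # degenerate table without any usable variance
    return max((bms - ems) / denominator, 0.0)


def min_trials_for_threshold(trials_t1, trials_t2, threshold=0.80):
    """Smallest number of trials whose averages give ICC > threshold, else None.

    trials_t1 / trials_t2: one list of single-trial amplitudes per subject,
    in the same subject order for both sessions.
    """
    max_n = min(len(t) for t in trials_t1 + trials_t2)
    for n in range(1, max_n + 1):
        # Average the first n trials per subject and session (assumption).
        table = [[sum(a[:n]) / n, sum(b[:n]) / n]
                 for a, b in zip(trials_t1, trials_t2)]
        if icc_absolute_agreement(table) > threshold:
            return n
    return None
```

A stricter variant would require the ICC to stay above the threshold for all larger trial counts rather than only at the first crossing; which criterion was applied in the study is not stated.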
Discussion
The overall objective of this study was to establish ERN acquisition in a reliable, time-efficient and patient-friendly way. Therefore, we used a modified Eriksen Flanker Task that increases the number of errors. To ensure external validity we aimed to replicate previously reported correlation patterns of ERN amplitude with number of errors and negative affect. In order to optimise the clinical use of the task, we determined to what extent the task can be shortened while keeping reliability >0.80. Overall, we (A) found excellent reliability of the ERN, which was >0.80 even when the task was reduced to half of the trials and (B) ensured external validity of the ERN assessed by replicating previously reported correlation patterns with internal and external variables.
Reliability and effects of the adaptive RT deadline
Excellent reliability of the ERN was found. For peak measures, reliability is higher compared to other studies (Reference Cassidy, Robertson and O’Connell5,Reference Olvet and Hajcak7,Reference Segalowitz, Santesso, Murphy, Homan, Chantziantoniou and Khan37–Reference Larson, Baldwin, Good and Fair39), with a 95% CI ranging from 0.832 to 0.983. A potential explanation for this is the adaptive RT deadline which produced about twice as many errors in comparison to other studies. For example, the subjects in the study of Larson et al. (Reference Larson, Baldwin, Good and Fair39) made errors in 12% of the incongruent trials on average. In the studies of Weinberg and Hajcak (Reference Weinberg and Hajcak38) and Olvet and Hajcak (Reference Olvet and Hajcak7) the error rates were 11.97 and 11.34%, respectively. In our paradigm, however, the error rate was 20%. (These data refer to the first session, but the second session is comparable.) An increasing number of error trials has been shown to increase the ERN reliability (Reference Larson, Baldwin, Good and Fair39) and power (Reference Fischer, Klein and Ullsperger21,Reference Boudewyn, Luck, Farrens and Kappenman49).
In addition, the adaptive RT deadline counteracts a potential learning effect. According to the PRO model (Reference Alexander and Brown20), the ERN amplitude changes with the likelihood of errors. When a subject performs the same task at two sessions, a learning effect arises and thereby a difference in likelihood for errors between the sessions. However, due to the adaptive RT deadline, the paradigm adapts to the performance level of the subject and the likelihood remains stable despite the learning effect. This may explain the excellent reliability of the ERN as found in our study.
A further advantage of the adaptive RT deadline is performance adjustment across groups. Several studies (Reference Fischer, Klein and Ullsperger21,Reference Hajcak, McDonald and Simons22) demonstrated a negative relationship between number of errors and ERN amplitude. This can lead to biased results when comparing groups with different error rates (Reference Fischer, Klein and Ullsperger21). According to the PRO model (Reference Alexander and Brown20), different performance levels, for example, in healthy controls and patients would lead to different subjects’ expectations of making errors and thus may confound the ERN amplitudes. The adaptive RT deadline can reduce this potential bias because subjects would produce a comparable error rate.
However, there are also potential caveats: a high task performance can be defined not only by the error rate but also by the RT. According to the forward model (Reference Joch, Hegele, Maurer, Müller and Maurer19,Reference Joch, Hegele, Maurer, Müller and Maurer50) better task performance corresponds to more accurate forward model predictions about the performance outcome. This could lead to higher ERN amplitudes in subjects with faster RT. Therefore, differences in RT, for example, between patients and healthy controls might lead to biased ERN comparisons.
Finally, other studies generate a sufficient number of errors by increasing task length [e.g. 900 trials (Reference Larson, Baldwin, Good and Fair39)], while we achieved the high number of errors by a higher error rate. According to the PRO model (Reference Alexander and Brown20) and Fischer et al. (Reference Fischer, Klein and Ullsperger21) a smaller ERN amplitude is then expected. However, this ERN amplitude decrement seems to be negligible in our case since we have detected significant differences between ERN and CRN.
Validity
We found evidence for validity of the recorded ERN by partially replicating known correlation patterns: (A) a trend-wise positive correlation of ERN amplitude with number of errors (Reference Fischer, Klein and Ullsperger21,Reference Hajcak, McDonald and Simons22) and (B) negative correlation of ERN amplitude with negative affect (Reference Hajcak, McDonald and Simons24). In our study, the correlation between ERN amplitude and number of errors showed only a trend to significance. However, this is likely due to the small sample because the effect size we observed is in line with the reported values (Reference Fischer, Klein and Ullsperger21).
An additional aspect supporting the validity of the ERN is the topographic mapping: the ERN peaks in fronto-central midline recording sites as reported by previous studies (Reference Gehring and Coles2,Reference Falkenstein and Hoormann3). However, compared to other ERN studies a negative deflection preceding the ERN is noticeable in our response-locked time courses (Fig. 2a). To further examine this potential, we evaluated stimulus-locked time courses (see Supplementary Figure 2) and identified this negative deflection most likely as the N200. It has been shown in former studies that the N200 appears particularly on incongruent flanker stimuli (Reference Kopp, Rist and Mattler48).
Can the task be shortened?
To determine whether the task can be shortened without significant loss in reliability we analysed from which number of processed trials a reliability >0.80 (Reference Rosaroso51) can be achieved. Our analyses showed that at least 35 error trials are necessary to achieve reliability >0.80 for peak amplitude measures of the ERN. For area measures, 45 error trials are required. A subject made 68 (t1) and 81 (t2) errors on average during the entire task. Therefore it can be concluded that a reduction of the paradigm to approximately half of the trials (= 200) can equally ensure excellent reliability of ERN peak measures. Processing the whole task took on average 16.33 min. Thus, our paradigm can acquire highly reliable ERN within 8 min. This is advantageous in clinical practice as patients often have shorter concentration spans (52).
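The arithmetic behind the shortened task can be made explicit. The sketch below scales the reported averages from the 399 experimental trials to a 200-trial version. It assumes that errors and task time scale proportionally with the number of trials, which the text does not verify; the helper name is illustrative.

```python
FULL_TRIALS = 399    # experimental trials in the full task
SHORT_TRIALS = 200   # proposed shortened task


def scale_to_short_task(value_full_task):
    """Scale a full-task quantity proportionally to the shortened task."""
    return value_full_task * SHORT_TRIALS / FULL_TRIALS


print(round(scale_to_short_task(68), 1))     # mean errors at t1 -> 34.1
print(round(scale_to_short_task(81), 1))     # mean errors at t2 -> 40.6
print(round(scale_to_short_task(16.33), 1))  # minutes -> 8.2
```

Under this proportionality assumption the expected error counts lie close to the 35-trial requirement for peak measures, so individual subjects with few errors may fall below it.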
Limitations and recommendations for future studies
No comparison and reliability assessment of congruent versus incongruent trials could be conducted, because only incongruent stimuli were shown.
Moreover, our modifications of the Eriksen Flanker Task, that is, using only incongruent stimuli and an adaptive RT deadline, might have influenced the ERN. For example, it has been shown that faster RTs are associated with larger ERNs (Reference Quik53) while higher error numbers (Reference Fischer, Klein and Ullsperger21,Reference Hajcak, McDonald and Simons22) and incongruent stimuli (Reference Scheffers and Coles14) lead to reduced ERN amplitudes. In order to investigate these influences systematically, future studies should compare the ERN elicited by a flanker task variant with versus without these modifications.
The instant feedback does not allow for analysing feedback-related potentials (Reference Bismark, Hajcak, Whitworth and Allen54). To achieve this, introducing a delay period between response and feedback would be necessary. Furthermore, contaminations of the response-locked ERP components by the visual feedback cannot be ruled out.
Although sufficient for detecting the ERN with a power >0.80, the current sample size does not allow any sub-analyses, for example, gender effects. Studies focusing on such effects should include larger sample sizes.
In order to use the ERN as a biomarker, e.g. to monitor the course of an intervention (Reference Gorka, Burkhouse, Klumpp, Kennedy, Afshar, Francis, Ajilore, Mariouw, Craske, Langenecker, Shankman and Luan Phan36), it is important to assess and interpret the ERN of a single subject (e.g. assignment into treatment type). However, measuring the ERN in single subjects usually is fairly difficult because of high variance due to diverse pre-analytical and analytical sources (Reference Micheel and Ball55) that all have a potential impact on reliability. Future studies have to investigate further criteria for establishing the ERN (effects of sex, stress, age, pre-existing disease, medication effects, circadian rhythm, etc.) as a trans-diagnostic biomarker, in particular within the Research Domain Criteria (Reference Carcone and Ruocco56) matrix (Reference Ladouceur57,Reference Weinberg, Meyer, Hale-Rude, Perlman, Kotov, Klein and Hajcak58).
Finally, the task and its practicability should be evaluated in patients to examine feasibility and compare reliability.
Conclusion
The present study found excellent reliability of the ERN acquired by a modified Eriksen Flanker Task with adaptive RT deadline, even with only 200 trials, which is time-efficient and clinically feasible. In summary, the present modified task provides a reliable and efficient recording of the ERN, which will facilitate its use in psychiatry.
Supplementary Material
To view supplementary material for this article, please visit https://doi.org/10.1017/neu.2018.36
Acknowledgements
The authors thank Lea Marie Schnetzler for her help. The authors gratefully acknowledge the subjects who participated. Authors’ contributions: All authors (F.S., J.K., A.R., M.M.P.) substantially contributed to the conception and design of the study. F.S. recorded the data. F.S., H.A. and M.M.P. analysed and interpreted the data and wrote the manuscript. All authors critically revised the manuscript and gave their final approval of the version to be published.
Funding
This work was supported by the German Research Association (DFG) [SFB 1193 Z03]; BMBF BipoLife; EU Horizon 2020 [CoCA No 667302, MiND No 643051, Eat2BeNICE No 667302] and EU FP7 Aggressotype [No 602805].
Statement of Interest
The authors declare no financial, professional and personal relationships with the potential to bias the work.
Ethical Standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional guides on the care and use of laboratory animals.