Introduction
For psychologists or psychiatrists conducting forensic evaluations, a forensic psychological report is a work product—one of many reports they will author over the course of their careers. Many forensic evaluators conduct a large number of evaluations per year; for example, Colorado state evaluators conduct an average of 144 competency to stand trial (CST) evaluations per year. 1 For psychologists or psychiatrists who author a large volume of evaluations, some cases may seem routine. Evaluators may fall into a pattern in which many evaluations appear mundane and typical.
However, for the individual being evaluated, each report holds a tremendous amount of influence. Forensic evaluations cover a wide swath of psycholegal referral questions, and they carry a great deal of influence over the lives of those under evaluation. Although forensic evaluators are not triers of fact, judicial decisions are overwhelmingly correlated to opinions of forensic evaluators;Reference Redding, Murrie and Goldstein 2 studies have shown that judges follow the opinions of evaluators in 76%–99% of cases.Reference Acklin, Fuger and Gowensmith 3 – Reference Cruise and Rogers 6 These opinions can be far-reaching. For example, evaluations of adjudicative competency or sanity can influence whether a defendant is temporarily detained in a correctional facility, hospital, or released to the community—and they may also ultimately lead to charges being dismissed.Reference Mowen 7 Criminal responsibility evaluations can provide the tipping point between acquittal and hospital commitment versus a guilty verdict and imprisonment.Reference Eaton 8 Other forensic mental health evaluations may influence whether a parent maintains custody of his/her child, a person is released from a locked facility, a teenager is tried as an adult, a confession is valid, or a plaintiff receives monetary awards. In capital cases, a forensic mental health evaluation can influence whether a defendant is executed.
In addition to the impact on the individual, forensic evaluations have systemic impact. Forensic evaluations that are not conducted within a certain time period can result in a backlog of cases. Many states are currently in the throes of federal lawsuits centered around these delays in evaluation time frames, as defendants with mental illnesses languish in county jails awaiting their evaluations.Reference Gowensmith 9 , Reference Locklair 10 However, when evaluations are conducted too quickly, emerging research shows that forensic opinions are often subject to inaccuracies.Reference Bryson, Boccaccini and Gowensmith 11 , Reference Gowensmith, Metroz and Bratcher 12 Further, reports from state evaluators have been shown to be of mediocre quality in some settings.Reference Nguyen, Acklin and Fuger 13 Reports of poor quality can result in appeals or second opinion requests, compounding the backlog even more. Finally, a great deal of research demonstrates that many evaluators are biased by multiple internal and external factors.Reference Robinson and Acklin 14 – Reference Bergeron 23 Biased, unreliable, or low-quality forensic mental health evaluations erode the fairness of the justice system overall. Given the important systemic and individual impacts of forensic evaluations, it is critical that they be efficient, valid, reliable, and held to high standards of quality. However, an accumulating body of literature suggests that the efficiency, validity, reliability, and quality of these reports have substantial room for improvement.
Efficiency
Efficiency of conducting forensic evaluations is specifically relevant to CST evaluations, the most frequently ordered mental health evaluations by criminal courts.Reference Melton, Petrila and Poythress 25 Courts order an estimated 25,634–51,500 CST evaluations each year nationally, varying from fewer than 50 to approximately 5,000 per year in individual states.Reference Warren, Chuahan and Kois 26 , Reference Fitch 27 Numbers of evaluations continue to increase annually. For example, CST evaluations in Wisconsin increased 32.5% from 2010 through 2015, 28 while evaluations in Washington increased 76.3% from 2001 through 2012. 29 Colorado reported a 206% increase in the number of CST evaluations from 2005 to 2014, 30 and Los Angeles county reported a 273% increase from 2010 to 2015.Reference Sewall 31 Unsurprisingly, this burgeoning need for CST evaluations has led to long waitlists for evaluations, as states across the nation have struggled to keep pace with the rapid growth of demand. Some states have reported waitlists of more than a year for CST evaluations to be conducted, and several other states are operating under federal or state oversight to ensure that evaluation wait times are reasonable.Reference Locklair 10 , Reference Gowensmith, Murrie and Packer 32 Class action lawsuits in the states of Oregon and Washington successfully lobbied for shorter wait times for CST evaluations and access to competency restoration services. 33 , 34 Other states are grappling with consent decrees, lawsuits, and legislation on the issue (eg, Alabama, Colorado, Pennsylvania, Nevada, and California).
Some legislation has resulted in specified time frames for evaluations to be completed. The national average number of days from court order to evaluation report date is 31 days.Reference Gowensmith, Murrie and Packer 32 However, there is considerable variability in these time frames. Oregon and Maryland require CST reports within 7 days, North Carolina has a 10-day time frame for defendants awaiting evaluations in jail, Washington has a 14-day time limit, and Rhode Island mandates reports within 15 days. By contrast, several states (Arkansas, Kansas, Missouri, Montana, among others) extend the deadline to 60 days, while 15 states have no statutorily defined time frame at all. Evaluation time frames are currently being adjusted in several states to accommodate the increasing demand for CST evaluations; Colorado’s current Consent Decree will decrease the current 28-day time frame for evaluations to 21 days in 2020. 1
Decreasing these CST evaluation time frames may seem like a good solution to long waitlists for these evaluations. Hiring more evaluators to conduct evaluations more quickly certainly ensures that defendants will be evaluated more quickly. However, emerging research suggests that systems may experience unintended negative consequences if evaluations are conducted too quickly.
Most CST evaluators find between 20% and 40% of defendants incompetent to stand trial (IST).Reference Pirelli, Gottdiener and Zapf 35 However, recent data shows that the timing of CST evaluations has a substantial impact on evaluators’ opinions. Hawaii data indicates that IST rates were nearly 40% higher in evaluations conducted within 7 days of the court order than those conducted beyond 7 days.Reference Gowensmith, Metroz and Bratcher 12 Washington shows a 50% IST rate for CST evaluations conducted within 7 days, 34 as does the state of Maryland. 36 This trend seems to be especially true for defendants with psychotic and substance use diagnoses. Defendants found IST within 7 days in the Hawaii sample were more likely to have substance-related and/or psychotic disorders. Additionally, data from a large dataset in Texas show that the rate of IST opinions rose approximately 25% in defendants with a schizophrenia-spectrum diagnosis or a substance-related disorder when evaluations were completed within 10 days of the court order.Reference Bryson, Boccaccini and Gowensmith 11 , Reference Bryson, Boccaccini and Gowensmith 37 It seems that quick turnaround time frames for CST evaluations may artificially inflate the numbers of IST opinions in some cases.
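A comparison of IST rates across turnaround windows can be reported either as a difference in percentage points or as a relative change, and the two can look very different. The following is a minimal sketch of that arithmetic, not the analysis used in the cited studies; the function names and all counts are hypothetical and chosen only for illustration.

```python
def ist_rate(n_ist, n_total):
    """Share of evaluations that ended in an IST opinion."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_ist / n_total


def compare_rates(fast, slow):
    """Compare IST rates for quick-turnaround and later evaluations.

    `fast` and `slow` are (n_ist, n_total) pairs. This layout is an
    assumption made for illustration.
    """
    fast_rate = ist_rate(*fast)
    slow_rate = ist_rate(*slow)
    if slow_rate == 0:
        raise ValueError("relative change is undefined when the slow rate is 0")
    return {
        "fast_rate": fast_rate,
        "slow_rate": slow_rate,
        # Absolute gap, in percentage points.
        "point_difference": (fast_rate - slow_rate) * 100,
        # Gap relative to the slower group's rate.
        "relative_change": (fast_rate - slow_rate) / slow_rate,
    }


# Hypothetical counts: 30 of 80 quick evaluations and 20 of 80 later
# evaluations ended in an IST opinion.
result = compare_rates(fast=(30, 80), slow=(20, 80))
print(f"Quick turnaround: {result['fast_rate']:.1%}")
print(f"Later evaluation: {result['slow_rate']:.1%}")
print(f"Difference: {result['point_difference']:.1f} percentage points")
print(f"Relative change: {result['relative_change']:.0%}")
```

In this made-up example the same gap reads as 12.5 percentage points or as a 50% relative increase, which is why stating the metric alongside the figure helps readers interpret it.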
Aside from the effects of quick turnaround evaluation on the accuracy of individual competency opinions, these aggressive time frames may also have systemic effects. If courts and attorneys are assured that they will receive CST opinions within 1–15 days, they may in turn request them more often. Paradoxically, shorter time frames for evaluations may incentivize courts and attorneys to order them more frequently—increasing referral numbers, evaluator caseloads, and ultimately defendant wait times. No data is available currently to monitor this possibility; however, Oregon and Washington’s recent drastic reductions in evaluation time frames would provide a good naturalistic opportunity to review CST evaluation referral rates, evaluator opinion rates, and systemic effects on wait times.
Quality
The forensic evaluator has considerable influence over how information is included and presented to the court. A quality forensic evaluation is defined not only by good grammar, syntax, and readability but also by the inclusion of critical elements and a well-supported answer to the psycholegal question posed. Poorly written reports could result in a myriad of negative consequences, such as an increasing need for second opinions, compounding the problems noted above. Further, providing the trier of fact with inaccurate or incomplete information risks an unfair trial process.
Several reviews of forensic evaluations have revealed deficits in report quality. Robinson and Acklin found that many critical components, such as historical information, collateral information, or a rationale for the forensic opinion, were not included in 150 CST reports, resulting in an overall “poor” quality rating.Reference Robinson and Acklin 14 A similar review of 150 conditional release reports from Hawaii found that evaluators documented informed consent in only half (52%) of reports and provided a rationale for their opinion of dangerousness in only 34.7% of reports.Reference Nguyen, Acklin and Fuger 13 Skeem, Golding, Cohn, and Berge found that CST evaluators in Utah rarely linked the defendant’s clinical presentation to the forensic opinion and rarely explained their rationale for arriving at that opinion.Reference Skeem, Golding and Cohn 38
The rising demand for CST evaluations can further threaten the quality of reports by pressuring state systems to widen the pool of qualified evaluators. However, the discipline of the evaluator likely has less influence on report quality than requisite training. Originally, criminal courts only qualified psychiatrists as forensic experts before slowly including psychologists.Reference Gowensmith, Pinals and Karas 39 Despite early concerns about report quality of psychologists, no substantive differences in report quality have consistently been found between disciplines.Reference Petrella and Poythress 40 , Reference Poythress, Otto and Heilbrun 41 However, both psychiatry and psychology have developed an infrastructure of forensic specialty training through pre- and postdoctoral programs, dedicated sub-specialty professional organizations, and high-impact forensic journals.Reference Gowensmith 9 This training and infrastructure is crucial for evaluators to attain minimum quality standards, as forensic evaluations are often complicated, requiring evaluators to consider difficult psychological, legal, and cultural nuances.Reference Boccaccini, Chevalier and Murrie 15 , Reference Chevalier, Boccaccini and Murrie 17 , Reference Kois, Pearson and Chauhan 42 – Reference Pinals, Tillbrook and Mumley 44 Simply expanding the pool of eligible evaluator disciplines runs the serious risk of experts offering poorly formulated opinions, unless enough forensic training and infrastructure within that discipline exists.
Reliability
When a psychological evaluation is used in either criminal or civil court, it is expected to be objective and reliable. In admitting either psychiatric or psychological testimony, courts consider factors established by the Daubert decision, such as whether it is scientifically valid, whether the “theory or technique can be (and has been) tested,” and whether there is a “known potential rate of error.” 45 Several scholars have argued that psychiatric and psychological testimony does not meet this standard, citing a lack of available base rates, errors in clinical decision making, and a range of theories too diverse and inconsistent to result in reliable opinions.Reference Bergeron 23 , Reference Faust and Ziskin 24 Of course, psychiatric and psychological testimony continued to be present in courtrooms despite these criticisms. In the time since, literature has only accumulated regarding threats to reliability of forensic opinions.
Variability between forensic evaluators is concerningly high. In a sample of 59 evaluators who conducted a total of 4,498 evaluations of legal sanity, seven evaluators opined the individual was sane in 100% of their evaluations, whereas three evaluators opined the individual was sane in 50% of their evaluations.Reference Murrie and Warren 21 In a sample of 15 evaluators who each completed at least 100 CST evaluations, rates of incompetency findings ranged from 1.7% to 27.9%.Reference Murrie, Boccaccini and Zapf 20 Similar discrepancies among evaluators have also been found in the use of forensic assessment instruments. For example, Boccaccini, Turner, and Murrie found that some evaluators assigned consistently higher scores on the PCL-R compared to other evaluators.Reference Boccaccini, Turner and Murrie 46
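The between-evaluator spread described above can be computed from a simple log of opinions. The following is a minimal sketch, not the method used in the cited studies; the record layout, the evaluator labels, the `min_cases` parameter, and all counts are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def incompetency_rates(records, min_cases=1):
    """Return {evaluator: share of opinions that were IST}.

    `records` is an iterable of (evaluator_id, found_ist) pairs; this
    layout is an assumption for illustration. Evaluators with fewer
    than `min_cases` logged evaluations are left out, because rates
    based on a handful of cases are unstable.
    """
    totals = defaultdict(int)
    ist = defaultdict(int)
    for evaluator, found_ist in records:
        totals[evaluator] += 1
        if found_ist:
            ist[evaluator] += 1
    return {e: ist[e] / n for e, n in totals.items() if n >= min_cases}


def rate_range(rates):
    """Lowest and highest rate across the included evaluators."""
    values = list(rates.values())
    return min(values), max(values)


# Hypothetical log: evaluator "A" finds 1 of 10 defendants IST,
# evaluator "B" finds 3 of 10.
records = [("A", i < 1) for i in range(10)] + [("B", i < 3) for i in range(10)]

rates = incompetency_rates(records, min_cases=10)
low, high = rate_range(rates)
print(f"IST rates range from {low:.1%} to {high:.1%}")
```

The `min_cases` threshold mirrors the idea of restricting the comparison to evaluators with a substantial caseload; the appropriate cutoff is a judgment call and is not specified here.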
This variability may simply be caused by differences between evaluators and the sometimes subjective nature of interpreting statutory language of mental health law. For example, one psychologist may have a higher decision threshold for determining a defendant’s “sufficient present ability to consult with his lawyer with a reasonable degree of rational understanding” (emphasis added). 47 Indeed, Mossman found individual differences in decision thresholds between evaluators in CST evaluations.Reference Mossman 48 Though he acknowledged individual differences in feelings and beliefs may contribute to differences in decision thresholds, he also discussed several other influencing variables, such as internal and external expectations and conventions in specific agencies, knowledge of local judicial decision-making trends, or differing understandings of constructs underlying adjudicative competence.
A now well-established threat to reliability is bias of the evaluator toward the side that retained his/her services.Reference Boccaccini, Chevalier and Murrie 15 , Reference Chevalier, Boccaccini and Murrie 17 , Reference Murrie, Boccaccini and Guarnera 19 This “adversarial allegiance” effect is present even when evaluators score objective, structured risk assessments designed to mitigate subjective bias.Reference Murrie, Boccaccini and Turner 49 Earlier research has suggested evaluators’ opinions are also influenced by the fees they earn.Reference Bergeron 23 , Reference Faust and Ziskin 24 , Reference Callahan and Silver 50 Additionally, research has found discrepancies in evaluator opinions related to the racial and/or ethnic background of the defendant.Reference Parker 16 , Reference McCallum, MacLean and Gowensmith 18 , Reference Hagen 22 Clearly, these threats to the reliability of forensic opinions must be addressed if evaluations are to reach the highest standards of objectivity and neutrality. The reliability of evaluator opinions is surprisingly low across nearly all psycholegal referral questions, Reference Gowensmith, Murrie and Boccaccini 51 most likely for many of the reasons articulated previously. However, this poor reliability can cause differential and undue harm to certain defendants. Defendants of color, for example, should not be routinely found IST more often than Caucasian defendants; unfortunately, some scholars have shown this to be true in some samples.Reference Parker 16 , Reference McCallum, MacLean and Gowensmith 18 If race or skin color is truly a differential factor for some CST evaluators, then the notion of objective and reliable opinions is clearly tainted. The final forensic opinion (and subsequent judicial adjudication) should not depend on race, skin color, fees, the individual evaluator, or any other idiosyncratic factors. 
However, if the realities of implicit and explicit bias—as well as other factors related to low reliability across evaluators—are not adequately addressed, then some subpopulations of persons being forensically evaluated may be at risk for injustice.
Validity
The above reliability concerns undoubtedly undermine the validity of forensic evaluations. Unfortunately, research evaluating the accuracy of forensic opinions in the field is substantially lacking because the ground truth is often unknown.Reference Dror and Murrie 52 Forensic evaluators typically never know if they “got it right” with their opinion, but some research has attempted to measure the accuracy of forensic psychologists. In 2010, Mossman et al. found impressively high accuracy in five forensic evaluators asked to review redacted court reports and provide CST opinions.Reference Mossman, Bowen and Vanness 53 However, the evaluators provided opinions on a graded scale, rather than a binary one typical for most CST evaluations (competent or not). Though the finding may be encouraging to some, the incredible variability of forensic evaluator opinions summarized above suggests that validity is likely much poorer in the field. Agreement between evaluators intuitively seems to lead to greater accuracy, and evidence seems to support this. In Hawaii, when three evaluators evaluated the same defendant for conditional release and unanimously opined release was appropriate, most evaluators “got it right”; about 74% accurately predicted those who would be rehospitalized within three years. Interestingly, agreement of evaluators in their predictions was important. In cases in which three evaluators independently opined in favor of release, only 29.6% of defendants were rehospitalized, compared to a 71.4% rehospitalization rate in cases in which evaluators disagreed as to the person’s readiness for release.Reference Gowensmith, Murrie and Boccaccini 54
Corrective strategies
Forensic evaluations play an important role in the due process of adjudicating criminal cases with mental health components. Poor forensic standards can unwittingly collude with discriminatory social and judicial practices, such that certain defendants undergoing poor evaluation practices may be more likely to be found competent, sane, dangerous, and so on. It is critical that forensic evaluations provide the court with objective, comprehensive, and accurate information so that the court can utilize the data and opinions in a just manner. Although we have outlined several potential areas in which evaluations can be misused, we now turn to potential mechanisms to mitigate these threats to evaluation reliability, validity, and quality.
Ensuring high-quality evaluations
One possible strategy to improve quality and reliability is training and certification of forensic evaluators. Approximately half of the states in the US have implemented formal certification processes for psychologists to conduct CST evaluations.Reference Petrella and Poythress 40 In Hawaii, forensic psychologists and psychiatrists who attended a three-day certification training subsequently showed improvement in the quality and reliability of their CST evaluations.Reference Robinson and Acklin 14 , Reference Gowensmith, Sledd and Sessarego 55 However, much more research examining the outcomes of these trainings is needed to truly assess effectiveness.Reference Gowensmith 9 A peer review system for evaluation reports can also be used to identify evaluation areas or specific evaluators that need improvement. Although little empirical evidence exists as to the incremental utility of standardized training and peer reviews of reports, it seems safe to assume that high-quality training, and maintenance of high standards among evaluators and their work, will guard against the misuse of forensic evaluations in many contexts.
Evaluation parameters
The amount of time allotted to complete forensic evaluations appears to matter. Completing evaluations too quickly may lead to inflated rates of incompetent findings among CST evaluations.Reference Bryson, Boccaccini and Gowensmith 11 , Reference Gowensmith, Metroz and Bratcher 12 , Reference Skeem, Golding and Cohn 38 However, completing evaluations too slowly can lead to defendants experiencing unconstitutional wait times in jail.Reference Gowensmith 9 To strike a balance, the National Judicial College recommends that CST evaluations be conducted within 15 to 30 days of the initial court order. 56 Although more research is needed to fully assess the optimal time frame of forensic evaluations, the current evidence suggests 15 to 30 days may be ideal.Reference Gowensmith 9 In addition to time frames, other conditions of evaluations may affect quality. The ideal testing environment for psychological testing (quiet, uninterrupted time, enough working space, few distractions) is often inaccessible in correctional facilities. Although flexibility is important, evaluators must be willing to require minimally acceptable assessment standards within correctional facilities, lest their evaluation results be potentially tainted.
Even with the ideal evaluation parameters in place, proficient evaluators are paramount in ensuring improved report quality. Creating and maintaining an infrastructure of training and certification of forensic evaluators is a critical step. Additionally, reasonable workloads and competitive salaries will also bolster a workforce of capable evaluators.
Identifying and mitigating potential bias
Employing strategies to identify and mitigate against internal and external evaluation biases is important in ensuring optimal levels of evaluation reliability and validity. Self-monitoring should be a systematic, consistent process for evaluators. However, not all self-monitoring processes are created equal; selecting effective mechanisms is critical. Humans are simply not adept at monitoring their own biases through introspection.Reference Gowensmith and McCallum 57 Evaluators, like all people, are not only prone to the effects of bias but also to the “bias blind spot”—an insidious phenomenon in which people do not recognize their own biases.Reference Pronin and Kugler 58 Therefore, simply looking in the metaphorical mirror to check one’s biases is rarely effective.
Instead, following social psychology’s tenets, evaluators should use objective, measurable data to guide the identification and mitigation of biases.Reference Neal and Grisso 59 For example, evaluators could collect and analyze data from their own evaluations to assess measurable outcomes. Brodsky suggested practitioners can use such a database to measure their own objectivity.Reference Brodsky 60 , Reference Brodsky 61 Some practitioners who have engaged in such self-studies have found surprising and illuminating results.Reference Parker 16 , Reference Gowensmith, McCallum and Nadkarni 62 Further, if self-monitoring were a standard of practice and data were aggregated across evaluators and settings, we would have far richer information about base rates of opinions in relation to a myriad of potentially significant factors (e.g., geographic locations, relationships to attorneys, fees, workplace environments).
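A personal database of this kind can be very simple. The following is a minimal sketch of how an evaluator might tabulate base rates of opinions against one recorded factor; it is not a description of any existing tool, and the field names (`retained_by`, `opinion`), the category labels, and every case in the log are hypothetical.

```python
from collections import Counter


def base_rates(cases, factor, outcome="opinion"):
    """Share of each opinion within each level of `factor`.

    `cases` is a list of dicts, one per evaluation. Returns a dict
    keyed by (factor_level, opinion) with the proportion of that
    level's cases that received that opinion.
    """
    pair_counts = Counter((case[factor], case[outcome]) for case in cases)
    level_totals = Counter(case[factor] for case in cases)
    return {
        (level, opinion): n / level_totals[level]
        for (level, opinion), n in pair_counts.items()
    }


# Hypothetical personal log of six evaluations.
cases = [
    {"retained_by": "defense", "opinion": "incompetent"},
    {"retained_by": "defense", "opinion": "competent"},
    {"retained_by": "prosecution", "opinion": "competent"},
    {"retained_by": "prosecution", "opinion": "competent"},
    {"retained_by": "court", "opinion": "competent"},
    {"retained_by": "court", "opinion": "incompetent"},
]

for (level, opinion), rate in sorted(base_rates(cases, "retained_by").items()):
    print(f"{level:12s} {opinion:12s} {rate:.0%}")
```

The same function can be pointed at any other recorded factor (for instance, a hypothetical `setting` or `fee_arrangement` field) by changing the `factor` argument. With a log this small the rates mean little; the point is only that a differential pattern becomes visible once enough cases accumulate.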
Such an approach can be an especially valuable tool in mitigating the criminalization of mental illness in the forensic evaluation context. If robust data is captured on all forensic evaluations submitted to the court (even, for example, within a state-employed forensic evaluator pool), analyses could illuminate areas of potential discrimination or differentiation. Certain jurisdictions might be found to refer a high number of spurious cases. Specific evaluators may have unreasonable thresholds for psycholegal criteria. Perhaps defendants from certain races may be found more dangerous than those from other races. Perhaps fees, the referring attorney, or the setting in which evaluations are conducted lead to differential rates of opinions. This sort of data would be tremendously informative in ensuring that systemic biases are identified and addressed in a practical and tangible way, optimizing the reliability and validity of the forensic evaluation process. In doing so, the evaluation process would be less likely to be an unintentional contributor to the criminalization of persons with mental illness.
Conclusions
An unprecedented number of people with mental illness are being funneled to the criminal court in order to access mental health care. Jails and court dockets are increasingly overwhelmed with cases involving mental illness, state hospitals devote far more beds and resources to forensic cases, and people without a criminal commitment are increasingly left waiting for mental health services as forensic cases are prioritized. Each component of the forensic mental health process likely plays a role in maintaining this trend. However, some forensic evaluators appear to operate in a figurative vacuum, assuming that their roles have little impact on the criminalization of people with mental illness; after all, evaluators do not arrest people, raise forensic referral questions, order evaluations, or make any final judicial decisions. However, poorly conducted forensic evaluations can indeed exacerbate the criminalization of persons with mental illness in many ways. Evaluations that suffer from poor reliability, low quality, or evaluator biases can extend criminal commitments unnecessarily, cause delays in the resolution of the case, or posit inaccurate opinions. Research is consistent that evaluation reliability, quality, validity, and accuracy all have room for improvement. In addition, forensic evaluations are inevitably vulnerable to errors and bias, as is all human decision making. However, when the stakes for individuals and systems are so high, forensic evaluators should always strive for the highest standards. As a field and as individual evaluators, this will involve examining ourselves and our field objectively, ensuring that the standards for our evaluators and our field are as high as possible.
Disclosures
Katherine McCallum and W. Neil Gowensmith report that they have developed an app for mobile phones that allows forensic evaluators to track base rates of forensic opinions. One of the uses of the app is to identify and mitigate against bias. In the current article, the authors encourage practicing forensic evaluators to keep track of their base rates, variables that could affect opinions, and so on. Although the authors do not highlight their app in the current article, and although several options for identifying and mitigating bias were provided, Dr. McCallum and Dr. Gowensmith both state that some readers might equate the recommendation that evaluators monitor their work for bias with the authors’ financial interest in the mobile app. To be clear, their ultimate goal is to encourage evaluators to monitor their work reliably and carefully, regardless of methodologies used.