Introduction
This article describes the history of the Scientific Review Committee (SRC) for DSM-5 that I chaired. The five parts of this article delineate the background of the SRC, its procedures and the process of its deliberations; the conceptual/philosophical framework for its approach; the results of its deliberations; and the most important and/or contentious issues that arose in its work. In the final section, I provide recommendations, based on lessons learned, for similar efforts that might be included in future iterations of our psychiatric nosology.
History and procedures
In the fall of 2010, the then President of the American Psychiatric Association (APA), Dr Carol Bernstein, asked whether I would chair a new committee that would review the scientific justification for proposed changes in diagnostic criteria in DSM-5. The APA leadership felt that the creation of such a committee was important to assure the rigor and consistency of the DSM review process. This committee would function outside the DSM organizational structure and report, in an advisory role, to the APA President and Board of Trustees overseeing the DSM process. Discussions about the constitution and procedures for the SRC ensued with Dr Bernstein and the APA leadership. These discussions concluded with a memo by Dr Bernstein appointing the SRC (see Table 1).
Table 1. Key sections of memo from Dr Carol Bernstein creating the Scientific Review Committee (a)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160712014847-48516-mediumThumb-S0033291713001578_tab1.jpg?pub-status=live)
APA, American Psychiatric Association; NIH, National Institutes of Health; BOT, Board of Trustees.
(a) Reprinted with permission from the American Psychiatric Association. Copyright ©2012.
Dr Bernstein's memo referred to a document – ‘Guidelines for Making Changes to DSM-V’ (hereafter Guidelines) – developed a year earlier that outlined standards for determining which changes to DSM-IV diagnostic criteria would be included in DSM-5 (online Supplementary Appendix I). Developed in an iterative process between K.S.K. and the other co-authors, especially Dr D. Kupfer and Dr D. Regier, Chairperson and Vice-Chairperson of the DSM-5 Task Force, the Guidelines were in the tradition of Robins & Guze (1970) emphasizing the role of validators in evaluating psychiatric disorders, intended to assure that a good diagnosis conveys important objective things about the person so diagnosed. The Guidelines utilized a temporal organization for the validators: antecedent, concurrent and predictive (Kendler, 1980). This document (i) provided a framework of validators to organize data supporting criteria change; (ii) divided proposed changes in DSM-IV criteria into four levels; and (iii) outlined the level of evidential support needed for each level of change. The Guidelines assumed that larger diagnostic changes needed stronger empirical justification and specified four high-priority validator categories: familial/genetic factors; diagnostic stability; course of illness; and response to treatment. The memo, which also provided additional criteria for new disorders, was endorsed by the DSM-5 Task Force.
The initial goals of the SRC were to develop rating forms, a standardized protocol, and a recusal/conflict of interest (COI) policy. While the SRC underwent modest membership changes in its first months, for most of its existence it consisted of eight individuals: Kenneth S. Kendler M.D., Chair; Robert Freedman M.D., Co-Chair; Daniel Blazer M.D.; David Brent M.D.; Ellen Leibenluft M.D.; Paul Summergrad M.D.; Myrna Weissman Ph.D.; and Joel Yager M.D. A memo was prepared for the DSM-5 workgroups (WGs) outlining the recommended documentation for proposals to the SRC, based largely on the Guidelines.
To understand the SRC workings, some knowledge about the DSM process is necessary. Since DSM-III, that process has centered on small groups of experts – termed advisory committees in DSM-III and DSM-III-R, and WGs in DSM-IV and DSM-5 – with responsibility for specific diagnostic areas. They met for many hours, reviewed the diagnostic categories in their charge and recommended possible changes. These suggestions were then reviewed by the DSM leadership that included the Chairperson and Vice-Chairperson, and the Task Force, which consisted of the WG chairs and other individuals chosen for particular expertise. All WG proposals were approved by the Task Force before being forwarded to the SRC for consideration.
The SRC review process, which was novel for the DSM, was modeled on a grant review panel. WG proposals were reviewed independently by at least two SRC members. Based on the evidence supporting the proposed changes, reviewers rated each major section of the proposal on a six-point scale: 1 = strong support; 2 = moderate support (acceptable); 3 = modest support (questionable); 4 = limited support (probably not justified); 5 = poor support (do not include); and 6 = insufficient data. These ratings were discussed in a conference call. When the discussion was concluded, each member scored the proposal by private email to the SRC administrator, Ms Jill Opalesky. Under our COI policy, any SRC member who served on a DSM-5 WG was recused from discussing and voting on all proposals from that WG.
The final reports from each meeting of the SRC were sent to the APA President and President-Elect, and to Dr David Kupfer and Dr Darrel Regier, and consisted of: (i) averaged final scores from all SRC members on each proposal; (ii) de-identified reviewers' summaries; and (iii) a brief rationale for the SRC's recommendations and any special issues that arose during the review summarized by the SRC chairs.
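As an illustration only, the scoring and recusal rules described above can be sketched in Python. The function name, the data layout and the example votes are hypothetical. The handling of a rating of 6 (‘insufficient data’), which this sketch counts separately rather than averaging, is an assumption; the text does not describe how such ratings entered the averaged final scores.

```python
from statistics import mean

# Six-point scale as described in the text.
SCALE = {
    1: "strong support",
    2: "moderate support (acceptable)",
    3: "modest support (questionable)",
    4: "limited support (probably not justified)",
    5: "poor support (do not include)",
    6: "insufficient data",
}


def summarize_votes(votes, member_workgroups, proposal_wg):
    """Average the final scores of members who are not recused.

    votes: dict mapping member name -> score (1..6)
    member_workgroups: dict mapping member name -> set of WG names served on
    proposal_wg: name of the WG that submitted the proposal

    Returns (average of scores 1..5 or None, number of 'insufficient data' votes).
    """
    # COI rule from the text: members serving on the submitting WG do not vote.
    eligible = [
        score
        for member, score in votes.items()
        if proposal_wg not in member_workgroups.get(member, set())
    ]
    for score in eligible:
        if score not in SCALE:
            raise ValueError(f"score {score!r} is not on the six-point scale")

    # Assumption: 6 is a category, not a point on the support continuum,
    # so it is reported as a count instead of being averaged.
    rated = [score for score in eligible if score != 6]
    insufficient = len(eligible) - len(rated)
    average = mean(rated) if rated else None
    return average, insufficient


if __name__ == "__main__":
    # Hypothetical votes; member "C" serves on the submitting WG and is recused.
    example_votes = {"A": 1, "B": 2, "C": 3, "D": 6}
    example_wgs = {"C": {"Mood"}}
    print(summarize_votes(example_votes, example_wgs, "Mood"))
```

Keeping recusal inside the averaging function means a recused member's score can never reach the final report, even if it was recorded by mistake.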
The SRC had 36 teleconferences from 23 March 2011 until 19 November 2012; received a total of 109 WG proposals (many of which had multiple parts); reviewed 153 proposals (including those resubmitted and re-reviewed); wrote 130 reports; and sent 36 memos requesting more information from the WGs. A total of 66 proposals (60%) were submitted and reviewed once, 43 proposals (40%) were revised, submitted and/or reviewed more than once, and nine (8%) were revised and submitted two or more times.
Conceptual background
When I was first approached by Dr Bernstein, I had already served on WGs and on the Task Forces for DSM-III-R and DSM-IV, and had been a member of the DSM-5 Mood Disorder WG. In my judgment, DSM-III-R, DSM-IV and what I had observed of DSM-5 thus far were best characterized as scientifically assisted expert consensus. Discussions typically focused on the opinions of individual members about what diagnostic features should be changed. These opinions were based on a wide range of factors including the extensive clinical experience of the members, their conceptual difficulties with the current criteria and relevant research findings. While detailed literature reviews were sometimes commissioned and utilized in deliberations, research results were typically used less systematically, usually to support particular positions advocated by WG members. Of note, the magnitude and quality of the field's research information differed widely across DSM categories and sometimes the evidence to address important clinical questions was minimal.
In my discussions with Dr Bernstein, I advocated for a scientifically driven expert consensus model in which a comprehensive literature review would play a central role in the DSM-5 deliberative process. In this approach, after the key questions are articulated, the first step taken by the WG would be to conduct a detailed review of the entire available empirical literature. The analysis and interpretation of this review would then form the focus of the deliberative process as the group moves towards forming recommendations. There are surely shades of gray in the distinction between a scientifically assisted and scientifically driven deliberative process that reflects the relative emphasis on opinions, individual studies or kinds of studies versus a more detailed and comprehensive literature review across all available validator classes. For comparison, both the Center for Drug Evaluation and Research (CDER) of the Food and Drug Administration and the Advisory Committee on Immunization Practices (ACIP) of the Centers for Disease Control and Prevention employ the scientifically driven expert consensus model. In both instances, teams of assembled experts serve to review data presented to them and render, based on those findings, expert consensus judgments. However, unlike the DSM, which reviews all of psychiatric nosology, CDER and ACIP receive focused proposals typically prepared only when sufficient data are available to address each specific issue.
In my view, DSM's expert consensus model had three important limitations. First, it was potentially vulnerable to changing opinions. Psychiatry, like all human disciplines, has fashions. Diagnoses change over time in their popularity and formulation. Sometimes these changes are associated with new and robust scientific evidence but sometimes not. In many human endeavors, and psychiatry is no exception, it is common to regard current thinking as inherently superior to what came before.
Second, the outcome of the expert consensus model is quite sensitive to the composition of the WG. If a field is divided in diagnostic opinions, the choice of members for the WG, typically respected experts in their field, could often pre-determine the outcome of the deliberations. It would be naive to suggest that prior opinions do not make an impact on scientifically driven expert consensus. However, as advocated by Bacon at the dawn of the scientific revolution (Bacon, 1620), a focus on empirical evidence, especially using agreed-upon assessment criteria, can move a discussion away from prior subjective beliefs to the more objective process of interpreting the relevant data.
Third, the DSM expert consensus model, as implemented, did not always adequately balance the inherent trend toward making changes built into the DSM process. The individuals serving on DSM WGs are typically busy and highly motivated volunteers who care about their psychiatric diagnostic category and want to ‘do things’ to improve DSM. It is difficult for such individuals to spend dozens of hours over several years in meetings, travel time, reading and writing reports only, at the end, to conclude ‘Nothing needs to change so let's leave well enough alone.’ For WG members, it is a natural source of pride to ‘make a difference’, to ‘put their mark’ on the document. More rarely, individuals working on diagnostic categories not yet in DSM know that acceptance of that category into the next edition would positively affect their career or research. All these factors bias toward initiating changes, which were proposed for over two-thirds of all categories in DSM-5.
However, the downsides of nosologic change are numerous and substantial. Trainees and practitioners have to learn the new criteria. Patients have to be re-diagnosed. Coding forms are changed. Books are rewritten. Research studies are interrupted by requirements to re-diagnose patients or jeopardized because they are using ‘out-dated’ diagnostic criteria. Access to needed public services or support may be put at risk.
It was my view, shared with Dr Bernstein and others in the APA leadership, that moving toward a more scientifically driven expert consensus model could help address these issues. However, there was one further reason why I advocated that the SRC should take this position: the concept of epistemic iteration as articulated by the philosopher and historian of science, Hasok Chang (Chang, 2004; Kendler, 2009). Iteration as a process originates in mathematics as a computational method that, using available data, generates a series of increasingly accurate estimations of a parameter. In an iterative system, each estimate improves on its predecessor so that, with sufficient time, the process asymptotes to a stable and accurate parameter estimate. Chang applied this notion to science and defined epistemic iteration (where ‘epistemic’ refers to the acquisition of knowledge) as an historical process in which successive stages of knowledge build in a sequential manner upon each other. Accordingly, epistemic iteration should lead through successive stages of scientific research toward better and better approximations of reality in ‘a spiral of improvement’, each subsequent stage producing more accurate estimates than the stage that came before. As described elsewhere (Kendler, 2009, 2012), I felt that this model could be usefully applied to psychiatric nosology and represent a potential framework for the future of DSM. But how might we try to ensure that each edition of the DSM produced better and especially more valid diagnoses? The response would be to put all proposals through a rigorous scientific review. This vision for the SRC was supported by Dr Bernstein and the APA leadership.
Results of SRC deliberations
The SRC quickly developed an efficient work pattern. WG proposals were sent to us from Dr Kupfer and Dr Regier, and we tried to review and respond to submissions within 2 weeks. Particularly important was the development of a working consensus on how to approach scoring of the proposals. Over the course of the SRC, the two independent raters differed by more than one point in our scoring only 5.3% of the time. The intra-class correlation for their ratings was +0.86.
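For readers unfamiliar with these agreement statistics, the following Python sketch shows how a discrepancy rate and an intra-class correlation can be computed from paired ratings. It is a minimal illustration under stated assumptions: the text does not say which form of the intra-class correlation was used, so a one-way random-effects, single-rater form is assumed here, and the function names and example ratings are invented.

```python
def fraction_discrepant(pairs, threshold=1):
    """Share of proposals on which the two raters differ by more than `threshold` points."""
    return sum(1 for a, b in pairs if abs(a - b) > threshold) / len(pairs)


def icc_oneway(pairs):
    """One-way random-effects ICC for two raters per proposal (assumed form).

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), with k = 2 raters.
    Requires at least two proposals and some variation in the ratings.
    """
    n = len(pairs)
    k = 2
    row_means = [(a + b) / k for a, b in pairs]
    grand_mean = sum(row_means) / n

    # Between-proposal mean square.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)

    # Within-proposal (between-rater) mean square.
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, row_means)
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Invented ratings for four proposals: (rater 1, rater 2).
    example = [(1, 1), (2, 4), (3, 3), (5, 4)]
    print(fraction_discrepant(example))
    print(icc_oneway(example))
```

Other forms of the intra-class correlation (for example, two-way models that separate out systematic rater differences) would give somewhat different values on the same data.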
Table 2 presents the consensus ratings for our 248 scores, often reflecting multiple ratings for complex proposals. For 38% of the proposals, we judged the empirical data to support the proposed changes well. For 42%, we judged the proposal to have inadequate scientific support. For the remaining 20%, we judged the support to be modest and, in most of these cases, recommended approval of the proposal.
Table 2. Summary of scores (includes subset score from more complex proposals)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160314090204493-0247:S0033291713001578_tab2.gif?pub-status=live)
Conceptual issues in the implementation of the SRC
A number of issues arose during the tenure of the SRC. Here, I review the most prominent.
Objections to the SRC
A frequent objection to our work was, ‘Why do we have to justify changes from DSM-IV when often those DSM-IV criteria were not strongly supported by scientific evidence?’ Indeed, on the face of it, this request seems unfair. The response of the SRC – as charged by the APA leadership – was simple: ‘We agree, but you have to start somewhere.’ That is, if our nosologic process is going to shift from an expert consensus to a more empirically based model, you have to have a turning point. Given that any diagnostic change has a cost, and we were starting with DSM-IV, the obvious choice for this turning point was DSM-IV. Many WG members were dissatisfied with this response and felt defining DSM-IV as the starting benchmark was frustratingly arbitrary.
A second common complaint was that the SRC process was too conservative. Indeed, the SRC did put WGs into a Catch-22. WGs argued: ‘We can only gather good data on the diagnosis after it has been adopted into DSM-5 but you are requiring validating data beforehand.’ However, being conservative regarding nosologic change is not, as noted above, a bad thing. While suboptimal diagnoses are probably less harmful than ineffective drugs, the Food and Drug Administration would not approve drugs without first requiring adequate data, including assessment for adverse effects. In addition, the claim about the Catch-22 is not entirely true. Fields looking at new diagnostic formulations need not wait for DSM; they study them anyway. The SRC reviewed several proposals that we approved for new diagnoses where good empirical information was available despite the fact that the new criteria had never previously been in DSM.
Potential pitfalls
A worry going into the SRC process was the problems inherent in evaluating the psychiatric research literature with respect to the validity of diagnostic changes including how to balance the number of studies versus their quality, the importance of clinical versus epidemiological studies, the value of studies using diagnostic criteria close to but not exactly the same as those proposed, and the relative importance of different validators (e.g. response to treatment, genetics or biological findings). One pleasant surprise of the SRC process was the degree to which the group rapidly developed a congruent approach to the widely divergent nature and quality of research literature that we encountered. Even when reviewing complex proposals with many data points, SRC members working independently, and despite different backgrounds and areas of research expertise, typically (but not always) reached similar conclusions. In informal discussions with SRC members about the reasons for our ability to obtain consensus so frequently, most felt that the most important factor was the clarity of the operationalized criteria that we were asked to apply.
What should be the threshold for a change in a diagnostic category that needed empirical support? We saw a number of WG proposals that were essentially criteria clarification, often reflecting small wording changes. In the spirit of the Guidelines, our rule of thumb was that criteria changes not likely to change caseness did not require substantial empirical support.
All SRC members had busy day-jobs. The SRC work was often intense and time demanding. We expressed, in our meetings, some concern about ‘reviewer fatigue’. We worked hard to prevent criteria creep but objective review might reveal a modest trend for less empirical rigor over the course of our tenure.
When the SRC was constituted, the APA and DSM leadership agreed that we should not have direct contact with the WGs. However, our ‘revise and resubmit’ policy was frequently used for proposals not receiving strong SRC support on their first submission. Sometimes this worked well, where additional data were available and the revised proposals were much stronger so that we were able to give them more support. Sometimes this was less effective, particularly when no further data were available with which the WG could address our concerns.
Our reviews demonstrated clearly the wide variation in the quantity and quality of empirical studies across the diagnostic domains of psychiatry. This confronted the SRC with a dilemma as we struggled to implement our charge. Should we apply the criteria for change similarly across diagnoses, or require less evidence for change in categories with smaller research literatures? Our consensus was, in the main, to keep criteria consistent across the board but incorporate some flexibility so that reviewers could adjust their ratings modestly to account for the paucity of data in particular fields. This was consistent with the Guidelines that allowed for less robust empirical support for changes in diagnoses which had ‘not been widely studied or well validated’. We were aware that our approach, even with the modest ‘wiggle room’ we used, effectively made SRC support more difficult to obtain in diagnostic areas with limited funding or research literatures.
The Guidelines have an extensive set of criteria for the creation of a new DSM disorder (section 2.iii) including demonstrating the need for the new category, its independence from current diagnoses, its potential harm, and the demonstration that it meets criteria for a psychiatric disorder. While rigorous, our view was that these criteria were appropriate and functioned well. The SRC received a fair number of proposals for new disorders, a number of which were supported.
The SRC was well placed within the DSM process to see how differently individual WGs approached the use of empirical data in their deliberations. For some WGs, detailed literature reviews were central to their proposals and the quality and thoroughness of their reviews were typically high. Their approach already represented a scientifically driven expert consensus model and thus fit easily with that of the SRC. Other WGs functioned more within an expert consensus model, where detailed literature reviews were less utilized and informed clinical opinions were more central to their deliberative process. Some WGs fell between these poles. In their proposals to the SRC, WGs that examined disorders with strong research foundations often but not always emphasized the importance of thorough empirical reviews, while those with very limited research literatures tended to adopt the expert consensus model.
Unexpected challenges
Unanticipated disagreement arose about the relative importance of the validity of diagnostic categories versus individual criteria. The SRC received a WG proposal to add a single criterion to a major category. A good deal of scientific evidence supported the validity of this criterion. Furthermore, the WG argued that this criterion was a useful marker of the overall disorder and had historical, conceptual and clinical relevance. However, when this criterion was added to those already present for the disorder, the validity of the diagnosis did not appreciably improve. We did not recommend the proposal, stating that our charge was to evaluate the validity of the diagnostic category as a whole. Further, we argued that the approach of this WG would not generalize well. If DSM focused on the validity of individual criteria and not on how they perform together in diagnostic categories, we would produce criteria sets that were cumbersome and redundant. The WG disagreed with our judgment.
It was important for the WGs to understand that, in evaluating a proposal, the SRC focused narrowly on the question of whether the proposed change increased the validity of the diagnosis. For example, one WG proposal to add new criteria to a disorder presented evidence that in clinical populations, affected patients demonstrated a symptom factor not assessed by DSM-IV criteria. While these data suggested a need for diagnostic criteria reflecting this factor, it did not demonstrate that the addition of these new criteria improved the validity of the diagnosis.
The epistemic iteration model for scientific progress is best suited for evolutionary and not revolutionary change. However, if a field or diagnostic category is in a ‘scientific box canyon’, then a ‘re-boot’ (also known as ‘scientific revolution’; Kuhn, 1996) may be needed rather than small incremental improvements (Kendler, 2012). But how should such large shifts in conceptualization of a diagnostic category be evaluated within the SRC mandate? Validation data could be generated for a substantially different diagnostic approach and compared with that found for the parallel category in DSM-IV. However, this would require time and effort. One proposal for a major diagnostic category within DSM involved such a major conceptual shift. However, a final version of this formulation did not emerge until late in the DSM-5 process, leaving little time for the collection of validating data. While the SRC appreciated the strength of the arguments for the need for the conceptual shift, by our criteria, the proposal was insufficiently supported by validating data.
A challenging aspect of the SRC work was how to balance the importance of scientific evidence against clinical or public health need. This problem arose in several different contexts. First, several proposals addressed clinical issues of public health urgency that also contained a reasonable amount of empirical support. In light of these public health concerns, the SRC gave these proposals slightly more positive scores than warranted by the validating data alone. Second, WGs sometimes submitted proposals for major revisions of the DSM-IV criteria that they believed addressed important conceptual or clinical issues, but for which there was little or no empirical support. We did not give such proposals strong support. However, there were sufficient numbers of such proposals that, several months into our work, we asked the APA and DSM-5 leadership to consider establishing an additional review group to consider these clinical and public health issues. While the SRC was best constituted to address empirical/scientific issues with our focused charge and specific criteria, we agreed that there could be other justifications for changes to DSM-IV. After an extensive discussion, such a committee – the Clinical and Public Health Committee – was constituted and played an important role in subsequent DSM-5 deliberations.
Third, we received WG proposals for ‘small conceptual changes’ or ‘fine-tuning’ of DSM-IV criteria. For example, the SRC received a proposal for a moderate criterion change for a major psychiatric disorder. This change was estimated to exclude from the category around 2% of cases meeting DSM-IV criteria. The WG said that, because of the small proportion of cases involved, no empirical data were needed to address whether the excluded individuals differed systematically from the remaining patients. Their proposal made a reasonable case that the change was conceptually and historically sensible, although the group of qualified experts who created the DSM-IV criteria disagreed. The SRC felt that the justification for the change was insufficient given the lack of empirical support. The WG disagreed, arguing that the increased clarity of the revised criteria was sufficient justification for the change.
In addition to proposals for criteria changes, the WGs sometimes proposed that disorders be assigned to different sections within DSM. These came to be called ‘meta-structure’ issues. Ongoing discussions occurred between the SRC and the APA and DSM leadership about whether meta-structure changes should be reviewed by the SRC and, if so, what level of validating criteria should be required. The decision was that such changes should not be systematically subject to SRC review, in part because of complications this might introduce in efforts to maximize consilience between DSM-5 and ICD-11.
The end game
As the DSM-5 revision process moved toward completion, final decisions needed to be made about which disorders would go into DSM-5. The SRC was one of the voices participating in these discussions, first with the DSM-5 Summit Group (a final advisory panel representing major DSM constituencies), and then with the APA Board of Trustees, who, along with the APA Assembly, were the final arbiters of the DSM process. These discussions were open and often vigorous. In general, the SRC was among the most conservative of voices, typically (although not always) arguing against the inclusion of changes advocated by the WGs. As might be expected, this was not a format in which the specific scientific points that served as the basis for the SRC decisions could be articulated at length. In general, our role was to give global recommendations, sometimes with brief summaries of our justification. Many, but not all, of our recommendations were followed. In such cases, the Board of Trustees had to integrate conflicting advice from the WGs and the SRC, as well as from other important voices including the Clinical and Public Health Committee.
Recommendations
One of the most difficult aspects of the SRC work resulted from the fact that the committee was not constituted until well into the DSM-5 process. Understandably, some WGs were unhappy about the creation of the SRC and felt that the goal posts were moved in the middle of the game.
If an SRC-like body will be part of future nosologic revisions, it should be so from the beginning. The rules for its relationship to the WGs and the DSM leadership should also be developed and accepted at the beginning. Prior to beginning a DSM revision, the leadership should constitute a broad expert committee to update the explicit criteria for change and for the addition of new diagnoses. In other words, this group should promulgate the second edition of the DSM-5 ‘Guidelines’. Furthermore, the procedure for review should be clearly articulated. This would ensure a broad ‘buy-in’ to the review process and specifically to the role of the scientifically driven expert consensus model represented by the SRC. This would also foster a consistent approach across the WGs in how they approach their deliberative task and justify their proposed changes.
The SRC experience reinforced the value of having an independent expert panel convened by the APA outside of the formal DSM process to review all proposals for change. This review has the important virtue of representing a check on the biases toward change that can arise during the revision process. In attempting to provide an objective review of the evidence, an SRC-like body can maximize the consistency of rules for change across DSM categories.
Conclusions
The central role of the SRC was to provide external review for all proposals for diagnostic change in DSM-5, evaluate them on their level of empirical support using objectively structured rules of evidence agreed upon in advance and make appropriate recommendations to the APA leadership. As expected, given the many competing voices in the DSM-5 process, several of its recommendations were not, after vigorous debate, accepted by the Board of Trustees. The creation of the SRC necessitated a great deal of additional work on the part of the SRC, the WGs and the DSM-5 Task Force. However, the SRC succeeded in increasing the focus on empirical standards for nosologic change and providing a greater degree of consistency and objectivity in the DSM review process.
For future iterations of the DSM, the APA must decide whether it wants to continue the process begun in the SRC. Does it wish to continue the increased focus on systematic empirical reviews as the main ‘engine’ driving diagnostic change? How will the DSM process balance the need for scientific evidence versus other clinical, conceptual and public health demands on our nosology? How does the field want to deal with the wide variation in the quality of scientific evidence across our different diagnostic categories? How high do we wish to set the bar for change in our diagnostic criteria? If the APA decides to move in the direction represented by the SRC, then the SRC experience in DSM-5 can be considered a test-run that can provide important lessons about what to do (and not do) in the future. Those who feel that the best interests of our field and of our patients are served by increasingly focusing on the scientific basis of nosologic change using objective and clearly articulated criteria should encourage the incorporation of SRC-like processes into future iterations of the DSM.
Supplementary material
For supplementary material accompanying this paper visit http://dx.doi.org/10.1017/S0033291713001578.
Acknowledgements
The members of the SRC selflessly contributed their time and expertise to the benefit of the DSM process and the psychiatric community more widely. Jill Opalesky M.S. served as the SRC administrator and organized the many required SRC functions. Carol Bernstein M.D., Daniel Blazer M.D. Ph.D., David Brent M.D., Ellen Leibenluft M.D., Myrna Weissman Ph.D. and Joel Yager M.D. provided helpful comments on earlier versions of this article. There was no financial support for this article.
Declaration of Interest
None.