Introduction
This essay, which attempts to sketch a conceptual and historical framework for our efforts to develop a scientifically based psychiatric nosology, examines four themes. First, I develop an historical perspective on the problems of classification using examples from the early history of biological taxonomy. Initial efforts at species definitions used top-down approaches advocated by experts and based on a few a priori essential features. This approach was rejected for both conceptual and practical reasons and replaced by bottom-up approaches using a much wider array of characteristics. In the second section of the essay, I outline parallels between the beginnings of biological taxonomy and psychiatric nosology. Although there are certainly important differences between these two kinds of classification, they are similar in one important respect. As with biological taxonomy, our nosology began largely with ‘expert’ classifications frequently influenced by a few a priori essential features articulated by one or more great 19th-century diagnosticians. In the third section, I describe the underappreciated historically contingent nature of our current psychiatric classification: our nosology is based on a modest number of ‘expert’ classifications, chosen for both substantive and arbitrary reasons. Like biology, we need to move toward more soundly based bottom-up approaches that draw on wider arrays of features. This segues into the fourth section, where I argue that it is important to ensure that our psychiatric nosologic enterprise is cumulative. This can best be achieved through a process of epistemic iteration.†
Episodes from the history of biological taxonomy
In 16th- and 17th-century Europe, the task of biological taxonomy, as it evolved from the pragmatic classifications of herbalists and apothecaries, was to discover the organizational scheme of God's creation (Dear, 2006). This task was approached using ‘downward classification by logical division’ (Mayr, 1982). For example, a particular expert botanist would begin by selecting a single trait on which the plants could be initially divided. This trait was chosen to reflect the essential property of plants, the key feature that made that group of plants what it was. Then essential traits of lesser importance would be used to subdivide the first set of classes, and so forth. The influence of the particular scheme so proposed would reflect the strength of the reputation of the given expert. The first clear articulation of this principle was by the early modern botanical taxonomist Cesalpino (1519–1603), who considered the ‘fructifying characters’ of plants to be means provided ‘by the Grace of God’ for distinguishing between different genera of plants [De Plantis Libri, Florence, 1583; translated by Sloan (1972, pp. 4–5)].
However, a problem arose. Various expert taxonomists proposed different essential characters on which plants should be first classified. Although Cesalpino favored fruits, other botanists advocated general growth patterns (i.e. discriminating trees, shrubs and herbs) or flowers. The result was taxonomic confusion because the various taxonomies agreed poorly with each other. As described by Mayr:
The choice of characters during the early steps of the division results by necessity in entirely different classifications. This is why the systems of the great botanists of the 17th and 18th centuries, who followed in Cesalpino's footsteps, differed so drastically from each other. … In the classification of animals, likewise it led to entirely different classifications whether one chose as the first differentia with blood or without, hairy or hairless, two-footed or four-footed. (Mayr, 1982, p. 161)
John Ray (1627–1705), one of the expert biological taxonomists of the late-17th century, was among the first to articulate this problem (Sloan, 1972; Dear, 2006). If the goal was to select the essential features of plants and animals that accurately reflected God's plan for creation, by what principles was it possible to read the mind of God? Prior discussions had focused on which feature was the most essential. Ray raised the question of whether we could, in principle, determine which of the many diverse traits of plants or animals reflected the organism's true essence.
Ray noted that there was no objective method of deciding whether a particular organismic trait reflected the essential nature of that creature. Consequently, he took the then-radical position that, no matter how skilled and experienced the expert, such man-made schemes were ultimately arbitrary and could not reveal the true organization of nature. His line of reasoning, as summarized by Dear, is of particular interest.
Should, he asked, the whale be counted as a creature most closely related to warm-blooded land animals, such as horses and cows, or should it be classed as a kind of fish? It all depends on which of its characters are taken as the most important. (Dear, 2006, p. 46)
Although God might know that being warm-blooded is more important than having fins, how are humans to perceive this? The English philosopher John Locke, who trained in both medicine and botany, weighed in on this question in his famous 1690 Essay Concerning Human Understanding (Locke, 1870). As summarized by Sloan (1972), Locke concluded:
the very claim that a natural classification was a worthwhile goal of scientific investigation had … rested on the assumption that there is some ‘natural’ arrangement of organisms, and furthermore that this arrangement ultimately can be known by man … [However] the grouping of objects into different classes and kinds cannot claim to be based on the knowledge of some real essence or substantial form … There can be no possibility of weighting one character or structure as being more indicative of the real essence than any other. (Sloan, 1972, pp. 22–26)
These abstract problems became central to a heated botanical taxonomic debate in the late-17th century between Ray, then the leading classificatory biologist of the British Royal Society, and Pitton, the expert ‘demonstrator’ of plants in the French Jardin du Roi (Sloan, 1972). Pitton, a follower of Cesalpino, insisted that the development of a ‘natural’ classification of plants required focusing solely on fruiting bodies. Ray, deeply skeptical about the entire concept of essential features, argued that taxonomists needed to take a much wider set of traits into account, including ecological criteria such as habitat (Sloan, 1972).
History was not kind to the essentialist top-down classification approach advocated by Pitton, where ‘top-down’ refers to the a priori specification of a small number of key traits on which the classification depends. Mayr writes:
Eventually, it became clear that it was futile to attempt to salvage downward, divisional classification by modifying it and that the only way out was to replace it by a completely different method: upward or compositional classification … Not only was the direction of the classificatory steps reversed, but reliance on a single character was replaced by the utilization and simultaneous consideration of numerous characters. (Mayr, 1982, p. 192)
One prominent representative of this line of thought was the French botanist Adanson, who concluded in 1763 that there can be only one natural method, which ‘can only be attained by consideration of the collection of all the plant structures’ (Sloan, 1972, p. 3).
Space precludes a detailed examination of the history of biological taxonomy over the ensuing centuries, dominated by what Mayr calls ‘upward classification by empirical grouping’. The field struggled with how to weight the wide variety of possible characters in the development of species categories and whether the weight of individual characters could change from one taxon to another. After the publication of Darwin's Origin of Species in 1859, the debate shifted substantially. Darwin argued for an entirely different conceptual framework for species, one that was incompatible with the prior essentialist position. Species arose, he suggested, from many accumulated variations selected to ensure survival. Evolution had no essences. The post-Darwinian history of taxonomy, though interesting, is less relevant to our story and is not pursued here. I conclude this section with the following summary by Mayr of Adanson's sophisticated approach to the problem of classification, articulated a century before Darwin's crucial work:
One can summarize Adanson's attitude toward characters by saying that he did indeed favor the weighting of characters, but not on the basis of any preconceived notion or a priori principles … but rather on an a posteriori method based on a comparison of groups that had been [tentatively] … established by inspection. (Mayr, 1982, p. 195)
Parallels in the history of biological and psychiatric taxonomy
Several parallels are evident between the history of psychiatric and biological taxonomy. First, in what Zilboorg (1967) called the ‘Era of Systems’, the 19th and early 20th centuries saw a profusion of proposed psychiatric nosologies, each from a different expert and similar in kind to the botanical and zoological proposals of the 16th to 18th centuries. The proposing authors (including luminaries such as Pinel, Griesinger, Kahlbaum, Krafft-Ebing, Wernicke, Kraepelin and Bleuler) each brought to their classifications both wide clinical experience and a range of assumptions about what constitutes the essential features of psychiatric illness. Kraepelin (1987), for example, focused on the course of illness, whereas Bleuler (1950) assumed that the diverse features of schizophrenia were all manifestations of deeper abnormalities (e.g. his ‘4 As’). Put into our current terminology, these authors emphasized different validators. Just as diverse botanists, using distinct essential criteria, developed incompatible taxonomies, so various expert psychiatrists, using different validators to implement their concept of the essential nature of psychiatric disorders, proposed a bewildering array of different possible nosologic schemes. What Kraam (2008) has called the ‘chaotic psychiatric nosology of the late nineteenth century’ well mirrored the confusion in zoological classifications between experts who emphasized (in Mayr's words) ‘differentia with blood or without, hairy or hairless, two-footed or four-footed.’
Second, the conclusions of Ray and Locke about botanical taxonomy apply to psychiatric nosology. We have no inherent ability to determine a priori those features of psychiatric disorders that reflect their essential nature. Influenced by his experience with dementia paralytica (which, untreated, did have a distinct deteriorative course), Kraepelin followed his intuition that outcome ought to be decisive. Conceptually, this is no different from Cesalpino's judgment that fruits were the critical trait for classifying plants. Kraepelin was a gifted and very experienced clinician who attempted to develop his nosology in a systematic way using his famous method of file cards (Kraepelin, 1987). But even he lacked the ability to see ‘essences’. (It is worth noting that Kraepelin also had more pragmatic goals for his nosologic system, goals that it partly fulfilled.)
The deep problem with such essentialist approaches is that they are not easily open to empirical verification. If expert taxonomist A claims that plants should be initially divided on the basis of their fruiting bodies and expert taxonomist B on the nature and shape of leaves, they will develop internally consistent but distinct taxonomies. On what basis can we compare one with the other, given that each achieves its own internal goals?
Third, this history illustrates well the close connection between nosology and philosophy. Classifiers inherently have a priori concepts about the things they are classifying and about the nature and goals of those classifications. This history should provide a cautionary tale for those who vigorously pursue nosologic goals without a careful examination of their assumptions.
The historical contingency of our classification
In considering the historical origins of our psychiatric nosology, we are faced with an uncomfortable truth. We are justifiably proud of the increasing influence of empirical evidence in our nosologic process. However, this should not blind us to the important role played by historical processes. The creation and revisions of the DSM are firmly embedded in a particular historical context. DSM-III did not begin with a blank slate. Rather, it started with the major categories of psychiatric illness as articulated by European psychiatrists from the late-18th century onward (although sometimes based on much older clinical traditions). DSM-III developed operationalized criteria for these categories, often relying substantially on earlier criteria sets (Feighner et al. 1972; Spitzer et al. 1975). Although this is not necessarily an important limitation, we should recognize the degree of historical contingency in our system. Had we been in a parallel universe in which Emil Kraepelin, Eugen Bleuler, Kurt Schneider and Robert Spitzer never lived, DSM-IV would surely have differed in important ways. In the following sections, I illustrate this issue with an historical sketch and a thought experiment.
An historical sketch: the story of first-rank symptoms
What we now call the Schneiderian approach to the diagnosis of schizophrenia arose from the Heidelberg School beginning in the early 1930s, primarily through the work of Carl Schneider (no relation to Kurt), Hans Gruhle and Willy Mayer-Gross (Shorter, 2005; Wikipedia, 2008; written personal communication, German Berrios, February 2008). Mayer-Gross, who was Jewish, was forced to flee rising Nazi influences in Germany in 1933 and moved to the Maudsley Hospital, London, to work with Edward Mapother, who was then superintendent. With the help of the Rockefeller Foundation, Mapother provided fellowships for German academics fleeing Hitler, one of which supported Mayer-Gross in 1933–34. In his teaching there, and later in Birmingham, he was a strong advocate for the Heidelberg view of schizophrenia.
Kurt Schneider, lacking enthusiasm for the developing tide of psychiatric eugenics championed by the Nazis, left the institute at Heidelberg and served as an army doctor during the war. With the help of the Allies, he took up the Heidelberg chair from Carl Schneider, who, while imprisoned, killed himself in 1946. Kurt Schneider's summaries of the Heidelberg view of schizophrenia were then unknown in the Anglophone world because they had originally been written in a handbook intended for family doctors. They were assembled in his book Clinical Psychopathology, published in German in 1950 and translated into English in 1959.
Several prominent British psychiatrists were influenced by Mayer-Gross, particularly Martin Roth, Eliot Slater and, later, John Wing. When Wing came to develop his Present State Examination (first described in detail in a 1967 publication; Wing et al. 1967), his assessment of the positive symptoms of schizophrenia relied heavily on the symptoms articulated by Schneider and, earlier, by Mayer-Gross.
For the Collaborative Study of Depression, a team of three experienced American psychiatric nosologists, Robert Spitzer, Jean Endicott and Eli Robins, was asked to develop diagnostic criteria for what would become the Research Diagnostic Criteria (Spitzer et al. 1975, 1978). The team worked from the previously published Washington University (or ‘Feighner’) criteria (Feighner et al. 1972), and Spitzer recalls that there was concern that these criteria contained the terms ‘hallucinations’ and ‘delusions’ without any further specification (personal oral communication, February 2008). He had read about Wing's Present State Examination, with its emphasis on Schneiderian symptoms, and had been particularly impressed by the evidence that these could be assessed with high reliability. On that basis, they were adopted into the Research Diagnostic Criteria, which then formed the basis of the symptomatic criteria for schizophrenia in DSM-III and all subsequent DSM editions. In DSM-IV, the imprint of Schneider's work is seen in the ‘note’ in the A criteria that includes a specific description of two forms of special auditory hallucinations (‘running commentary’ and ‘voices conversing’), which are straight out of his first-rank symptoms (Mellor, 1970).
This simplified vignette illustrates the historical contingencies of our diagnostic process. The chain of historical events leading up to the inclusion of the Schneiderian symptoms in DSM-III could have been interrupted in any of a wide variety of ways (if Hitler had not come to power in Germany, if Mayer-Gross had not been Jewish, if Kurt Schneider had been a prominent Nazi sympathizer, if John Wing had not been so influenced by Mayer-Gross and Schneider, if the Collaborative Study of Depression had not been funded, if Robert Spitzer had not been impressed by and sought to emulate aspects of the Present State Examination, etc., etc.).
A thought experiment: rewinding the tape
The historical contingency of our diagnostic system can also be illustrated by a thought experiment detailed elsewhere (Kendler & Parnas, 2008, ch. 9). Assume that we could take the tape of history and rewind it to about 10 000 years ago, when modern humans had developed agriculture and the beginnings of urban civilization. Then we follow the tape forward until modern science and medicine develop, out of which a field something like psychiatry emerges. At some point, this field then develops a formal nosology, a ‘proto-DSM’. If we did this experiment 1000 times, what would we find? How often would the categories in these ‘proto-DSMs’ resemble DSM-IV? Considering medical disorders first, it seems likely that at least some, such as insulin-dependent diabetes, myocardial infarction and peptic ulcer disease, would emerge with considerable consistency in such a process. The structure of the periodic table would almost certainly emerge in a functionally identical fashion each time. However, given the importance of historically arbitrary factors in the development of our own nosologic system, how often would we expect to find syndromes closely resembling schizophrenia, major depression, dysthymia, generalized anxiety disorder, intermittent explosive disorder or narcissistic personality disorder on these ‘tape rewinds’? (Although beyond the scope of this essay, the cross-cultural and historical study of psychiatric disorders suggests that some forms of disorder may be relatively independent of cultural background or historical period, and thus more likely to emerge from multiple ‘rewinds’, whereas others may be much less stable and therefore less likely to be reproducible in different historical scenarios; Edgerton, 1976; Murphy, 1976; Westermeyer, 1979; Hacking, 1998; Kleinman, 2004.)
Importance of recognizing historical contingencies
Many of us trained in psychiatry over the past two generations, especially those under the influence of the ‘DSMs’, have come to consider our major diagnostic categories to be obvious and even ‘natural’. They have become a part of our world view. However, the formulations of the categories in use have been heavily influenced by specific ‘expert’ opinion which, though certainly clinically informed, was also shaped by a priori factors. In this situation, we are vulnerable to these historical contingencies, to arbitrary features of our heritage that reflect a particular essentialist view of psychiatric disorders far better than the empirical reality out there in the world. This would not be a problem if the nature of psychiatric illness were clear and discrete in the world. Were our task to develop the taxonomy of the elements – the periodic table – it would not matter much who the experts were. Once started, we would expect the science to eventually settle down to the correct classification given the robustly discrete nature of the chemical elements. But like species, psychiatric disorders are fuzzy constructs that shift when viewed in different ways. They are vulnerable to the ‘expert’ effect. How can psychiatry, like botany, reduce over time the impact of expert, top-down, a priori nosology and move toward a broader-reaching, empirical, bottom-up nosology? I outline such an approach in the final section.
Epistemic iteration
A defining feature of the mature sciences is their cumulative nature. Knowledge progresses with research programs building on what has gone before. Should psychiatric nosology strive toward such a goal? For critics of psychiatric diagnoses who view them as social constructions, this is an incoherent project. If there is no truth out there, we cannot expect to get closer to it. For those who adopt either realist or pragmatist perspectives on psychiatric nosology – that there are things or inter-related sets of things out there in the real world that correspond to individual psychiatric illnesses – it is a more rational and, I would argue, vital task. I here explicate this approach through the concept of iteration.
The idea of iteration originates in mathematics, where it refers to a computational method that generates a series of increasingly accurate estimates of a desired parameter. In a properly working iterative system, each estimate improves on its predecessor. With a sufficient number of iterations, this process converges to a stable and accurate parameter estimate. Iterative processes are robust in that they can begin with widely divergent starting values and reliably converge to the same correct solution.
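To make this mathematical sense of iteration concrete, consider a minimal sketch, offered purely as an illustration (the function name and the choice of Newton's method for estimating a square root are mine, not drawn from any source discussed here). Each pass replaces the current estimate with a better one, and widely divergent starting guesses converge on the same answer:

def newton_sqrt(target, guess, tolerance=1e-12):
    # Newton's method for the square root of 'target':
    # each iteration replaces the estimate with an improved one.
    estimate = guess
    while abs(estimate * estimate - target) > tolerance:
        estimate = 0.5 * (estimate + target / estimate)
    return estimate

# Deliberately divergent starting points all converge to ~1.41421356…
for start in (0.1, 1.0, 50.0, 1000.0):
    print(start, '->', newton_sqrt(2.0, start))

The point of the sketch is simply that the quality of the final estimate does not depend on the quality of the starting guess, provided each step genuinely improves on the last; this is the property that epistemic iteration, discussed below, seeks to borrow.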
In a recent thoughtful book with important, albeit indirect, lessons for psychiatry, Chang (2004) expands on this concept to develop what he terms ‘epistemic iteration’ and applies it to the history of the science of temperature. According to Chang, epistemic iteration (where ‘epistemic’ refers to the acquisition of knowledge) is an historical and scientific process in which successive stages of knowledge in a given area build sequentially upon each other. Directly analogous to its original mathematical meaning, the process of epistemic iteration, when correctly applied, should lead through successive stages of scientific research toward a better and better approximation of reality in ‘a spiral of improvement’.
How does this apply to the history of thermometers? In the measurement of temperature, someone had to start somewhere with a crude measuring device, in this case an air thermometer developed by Robert Fludd in 1638. This initial approach required all sorts of assumptions that at first were untestable (for example, that the increase in air volume with temperature would be linear across the temperature range under investigation). Over the next two centuries, the accuracy of thermometers was evaluated and improved by cross-calibration with a variety of different physical designs, calibration scales and substances that, because of their expanding–contracting nature, reflected temperature (including, in addition to air, alcohol, mercury and clay) (Chang, 2004). Chang documents the deep controversy between scientists and their followers over the ‘best’ thermometer and the challenges, all eventually overcome, of expanding the range of temperature measurement to the very cold (below the freezing point of most substances) and the very hot (beyond the boiling point of most substances). This in part required the intercalation of measurements from different media and also extensive research on the nature of the boiling and freezing of water, as these temperatures were quickly established as important benchmarks. It was a messy, conflict-filled process including colorful characters and trips to Siberia. Yet, over time, relentlessly, the range and accuracy of the measurement of temperature improved. Chang claims that an inevitably untidy but fundamentally iterative process was at work. The development of the thermometer reflected a series of increasingly successful approximations. Chang concludes from his history that
in an iterative process, point-by-point justification of each and every step is neither possible nor necessary; what matters is that each stage leads on to the next one with some improvement. (Chang, 2004, p. 215)
The concept of epistemic iteration poses a substantial challenge to psychiatric nosology. If our current methods for validating psychiatric disorders, including description, genetics, imaging, treatment response and follow-up studies, reflect aspects of an objective truth out there in the world, and we want our nosology to describe those truths with increasing accuracy, then the only way to achieve this is to assure ourselves that each periodic revision of our manuals improves on its predecessor. That is, changes should be made only on the basis of convincing evidence that, using an agreed-upon set of validators, the new diagnostic criteria improve upon the performance of their predecessors.
We noted above the inherently historically contingent nature of many of our clinical categories. Consider for a moment the influence of Emil Kraepelin. We argued that, despite his deep clinical experience, Kraepelin simply could not know that long-term course should be the defining criterion for psychotic disorders. (Indeed, his concept that dementia praecox was fundamentally a deteriorative disorder may be mistaken; Menezes et al. 2006.) So we cannot be sure that we are starting off at the right spot in our iterative process. However, a wonderful property of iteration is its capacity to reach the correct solution regardless of the starting point. Although the process might be slow, as long as every iteration improves on its predecessor, the logic is relentless.
Epistemic iteration might work well for a clearly defined, unidimensional physical phenomenon such as temperature, where progress is easy to measure (e.g. how well does your thermometer agree with my thermometer?). Psychiatric disorders are ‘messier’ and can be viewed from several perspectives that do not always agree with each other (Kendler, 1990). In addition, historical and cultural factors can influence conceptions of psychopathology. Can the iterative process work well with such complex constructs?
It will not work if our nosologic revisions largely reflect a power struggle between different branches of psychiatry, each with its own essentialist view of the true nature of mental illness (e.g. social, psychodynamic, genetic, neurobiological). In this pessimistic view, different constituencies within psychiatry would, over time, vie with each other for influence and control of the nosologic process and the professional status that it brings. When it gains control, each group in turn reshapes the nosologic system in its own a priori image. Instead of a gradual iterative process bringing us toward increasing validity, we would have wide fluctuations between different systems with divergent theoretical perspectives and no net progress.
However, between this gloomy view of a non-progressive nosologic process and the consistently cumulative progression in the development of thermometers lies a middle ground of wobbly iterations that our nosologic process should continue to occupy. We do not yet, and probably never will, possess a diagnostic construct as simply and clearly measured as temperature. The most important source of wobble will probably be shifts in the importance attached to one validating perspective on psychiatric illness versus others. Because psychiatry is both a science and a practical medical discipline, we must allow for the impact of both empirical and pragmatic factors in our nosology. If these changes are kept modest – small, empirically driven differences in emphasis on one perspective versus another – they will retard but not derail the iterative process. To operationalize this approach will require maturity and a consistency of vision in a rapidly shifting historical landscape. This is the best way in which psychiatry can follow biology in maturing historically from top-down, essentialist views of our categories to bottom-up, empirically defined entities that reflect with increasing accuracy the world as we can best understand it.
Conclusion
Understanding the history of our own nosologic efforts in psychiatry, and of other disciplines, especially biology, can help us to frame the nature of our process and the goals toward which we should strive. Essentialist, top-down, ‘expert-driven’ approaches to taxonomy were rejected in the biological sciences in the 18th and 19th centuries. They are flawed because they are based on the unsupportable assumption that it is possible, a priori, to know the true essence of a category. We cannot develop a progressive, scientifically based nosology shaped by a single expert-driven conception of psychiatric illness, no matter how wise its advocate. Our nosologic system started with such a history, with the work of influential and often insightful expert clinicians. However, the imprint of these few individuals is somewhat arbitrary and reflects the same a priori, top-down approaches advocated in the early days of biology.
I propose the method of epistemic iteration as a way of getting us from our historical starting point toward progressively more accurate approximations of the reality of psychiatric illness. However, for this process to achieve its aim, it will require a relatively stable consensus about the goals of psychiatric diagnosis. Cumulative progress will be difficult if large shifts occur in the theoretical orientation toward psychiatric illness, resulting in the application of very different hierarchies of validators. This approach also places a heavy and conservative burden on the revision process. Iteration only works if the broad thrust of this process is improvement – a better approximation of reality – at each stage.
Acknowledgments
This work was supported in part by NIH grants MH 068643, AA 011408, DA 011287 and MH 41953. Drs P. Zachar, J. Campbell and G. Berrios kindly provided helpful comments on earlier versions of this essay.
Declaration of Interest
None.