‘No other organism is so thoroughly and extensively phenotyped, and the result is bewildering’, wrote cancer biologist Alfred G. Knudson in 1967. One might presume that Knudson was referring to the fruit fly, or perhaps the mouse, canonical model organisms with a wealth of genetic markers derived from what historian Bruno Strasser terms ‘live museums’.Footnote 1 But he was actually talking about Homo sapiens, and reviewing Mendelian Inheritance in Man (MIM), a catalogue of the existing literature on clinically observed genetic disorders compiled by Johns Hopkins cardiologist Victor A. McKusick. Knudson's statement is perhaps less surprising upon further inspection. After all, historians have demonstrated how eugenics research and state-administered policies refined the information practices central to the rise of modern bureaucracies.Footnote 2 MIM was a different enterprise, an index in the most literal sense, but also figuratively. It pointed to a transition under way: the rebranding of genetics as a medical speciality.Footnote 3 Though Knudson saw no need to comment on this point, its two closest antecedents had emerged from the British Eugenics Society and Nazi racial-hygiene projects. MIM channelled the modest aspiration of helping laboratory researchers, physicians and genetic counsellors alike to demarcate their science from white-supremacist projects of racial betterment.
Likened to an encyclopedia, even a bible, MIM was a bibliographic initiative that organized biomedical abstracts under specific genes – a printed card catalogue maintained actively by a leading authority in the field.Footnote 4 Over the course of its twelve print editions from 1966 to 1998 and eventual move online, MIM has grown from 1,486 to more than 25,000 entries and established a widely used classification system based on numerical IDs.Footnote 5 McKusick was the sole attributed author for the majority of its editions, and MIM was central to nearly every one of his many institution-building efforts: from a well-attended medical genetics summer school at Bar Harbor, Maine, to his clinical fellowship programme at Johns Hopkins, to the pivotal gene-mapping workshops he co-sponsored at Yale that formed the basis for the Human Genome Project (HGP).Footnote 6 Consequently, it became the standard reference guide to human genes, helping non-geneticists and a growing profession of genetic counsellors follow the leading edge of a field in rapid advance.Footnote 7
In light of growing scholarship on the relationship between genetics and computing, scientific publishing and information practices in the sciences more broadly, MIM warrants a history of its own. It shaped how countless investigators encountered and made sense of genetics, whether on paper or online.Footnote 8 For these very reasons, MIM's contribution is often cited, although rarely scrutinized.Footnote 9 This article sheds new light on the intersection of genetics and computing by tracing the origins, digitization, reception and eventual online distribution of this catalogue. Following Simon Schaffer, I employ C.A. Bayly's notion of an ‘information order’, in which formal and informal knowledge operate together within a particular social formation, to explore MIM's role in the informational economy of modern genetics.Footnote 10
To be sure, a number of scholars have explored the convergence of genetics with information technology, but most are primarily concerned with the manipulation and circulation of sequences: strings of nucleotides and amino acids represented by a cipher of letters.Footnote 11 Recent studies argue that preoccupation with the vision of an immutable genetic text effaces the bumpy cellular cartography forged in labs and clinics throughout the post-war period with the goal of refining the diagnosis and pathophysiology of well-known human hereditary disorders.Footnote 12 Whether in government-sponsored workshops or samizdat newsletters, these investigators developed their own strategies for sharing information to coordinate a research programme that formed the basis of the HGP.Footnote 13 Focusing on sequencing at the expense of these developments risks making only the most visible efforts to accommodate information technology to genetics appear as the motive force of the entire enterprise. Pursuing an alternative view of the genetic-information revolution from the perspective of a key bibliographic resource – setting aside the aims and intentions of genome boosters who have shaped what we know about genetic databases – highlights a different set of technical and human constraints than the scalar concerns of making sequences available.Footnote 14
In the post-war political economy of biomedical research, clinicians struggled to keep up to date while competing amongst themselves for grants. MIM helped them comprehend the disorderly production of genetic knowledge but in a way that was not as systematic as it appears at first glance. Sceptical of so-called ‘information overload’ as a uniquely modern predicament, historians of science interested in data practices more broadly have interrogated the relationship between technology, scholarship and institutional norms.Footnote 15 MIM seems to fit nicely into Jon Agar's schema for early computerization projects, ‘attempted [only] in settings where there already existed material and theoretical computational practices and technologies’.Footnote 16 That McKusick's own hard-fought access to computer time for parallel projects made the enterprise possible lends credence to Agar's account.
However, MIM provides a different portrait of what was at stake in leveraging computer power. Mathias Grote calls the encyclopedic handbook a ‘paradoxical medium of scientific modernity’: relied upon for comprehensiveness, yet always (already) out of date.Footnote 17 McKusick had a provisional solution to this problem. The impressive amount of information gathered in MIM was a palimpsest rather than a fully fledged systematization project, coming from particular sources like abstract digests, and sometimes including personal communications and observations. Its information order not only helped centre the gene as an object of biomedical scrutiny, but also reinforced a particular temporality of knowledge. Time is money, so they say, and Alex Csiszar argues that catalogues should be understood as ‘technologies of valuation’; rather than merely extending access as unbranded containers of the latest findings, they promote specific visions of who gets to produce knowledge, and get repurposed in unintended ways.Footnote 18 Though edited by a dedicated staff, MIM's successor, Online Mendelian Inheritance in Man (OMIM), bears traces of McKusick's own process and priorities – a ‘legacy system’ that remains integral to the organization of genetic knowledge.Footnote 19 I suggest that MIM shaped the norms of publication around individual genes before the sequencing revolution, but only after abandoning a focus on the individual author as the basis for organization, as well as the very ontology of Mendelian inheritance upon which its underlying technology was based.
This article begins by situating MIM within a genealogy of genetic compendia before recounting how McKusick came to the project as director of Hopkins's genetics clinic. It then turns to consider how the primary record locator of the MIM software shifted from the author to the allele. Summarizing changes over the years within a representative set of disorders, the article highlights broader shifts in MIM's bibliographic information order throughout the rise of genetic sequencing. It concludes by sketching the development of OMIM in the context of national funding for medical-library infrastructure and the rise of information-sharing resources through the HGP.
Eugenics and the cataloguing imperative
McKusick did not hesitate to call MIM ‘an encyclopedia of genes’, invoking the Enlightenment project of Diderot and D'Alembert: ‘a barometer of knowledge in the making, to be constantly revised as experience and knowledge were enlarged and refined’.Footnote 20 He also drew parallels to the Oxford English Dictionary, since MIM was compiled using ‘a historical or diachronic approach rather than by the descriptive or synchronic method used by most encyclopaedias and textbooks’. Yet he was not the first to catalogue human genes, and such rhetorical gestures conceal the work's antecedents, discussed in this section.Footnote 21 In the Treasury of Human Inheritance (1912, 1948), Julia Bell compiled data in order to argue for distinct patterns of genetic inheritance. Otmar Freiherr von Verschuer's Genetik des Menschen (1959) was more of a catalogue but organized its information along the lines of medical speciality and remained committed to a normal/pathological distinction that McKusick would eschew. MIM addressed itself to medical geneticists as an impartial literature review that presented information at the level of individual genes, though it inherited aspects of both.
The Galton Laboratory for National Eugenics was established when the biology of heredity was still in flux. Founding documents defined the institution primarily as ‘a storehouse for statistical material bearing on the mental and physical conditions in man and the relation of these conditions to inheritance and environment’.Footnote 22 For a time, adherents of biometry, a discipline pioneered by Charles Darwin's cousin Francis Galton, carried the flag of eugenics, and such statistical material was amenable to their needs. Biometricians worked to derive laws of inheritance from observable, continuous human characteristics like intelligence and stature, analysing their distribution across populations and within families using mathematical techniques.Footnote 23 In a well-studied episode, Cambridge biologist William Bateson took the biometrical conception of heredity – championed by Karl Pearson and W.F.R. Weldon, his former teacher – to task after successfully replicating the work of Gregor Mendel, becoming Mendel's leading English proponent and naming his science ‘genetics’.Footnote 24 Mendelians would eventually lay claim to the soil of the new field, as well as the culpability for promoting eugenic legislation, both in Britain and the United States.Footnote 25
The key distinction at stake between Mendelism and biometry was whether traits were inherited in a continuous or a discrete manner. This could be difficult to sort out in practice, as conspicuous traits like eye colour appeared to result from blending both parents’ contributions. Before the role of the X chromosome in sex determination was sorted out, this issue also caused trouble for the Mendelian camp, which struggled to explain how the human sex ratio could be maintained if some sex traits were dominant, manifesting more commonly, and others recessive and merely carried. As research on generations of experimentally bred fruit flies and pea plants resolved some of these questions, another lasting contribution to heredity research came from tabulating clinical observations.
While the Galton Laboratory produced a number of signature publications – the journals Biometrika and the Annals of Eugenics, for example – one of its most enduring documents was the meticulous Treasury of Human Inheritance, a project begun in 1909 under Pearson's directorship. As geneticist-cum-historian Peter Harper has argued, many of the early issues, later collected as volumes, were more compilations of data than analyses: pedigrees and photographs of congenital abnormalities aimed toward standardizing eugenics in the clinic.Footnote 26 The demands of the First World War made it difficult for Pearson to keep up with the work, and in the 1920s physician Julia Bell assumed the mantle of the Treasury. With support from Britain's Medical Research Council and Pearson's successor, R.A. Fisher, whose wartime work on blood groups laid the groundwork for more robust study of human heredity, Bell's assiduous efforts turned the endeavour decisively toward a Mendelian approach to inheritance.Footnote 27
The section ‘Dystrophia myotonica and allied diseases’ best captures Bell's approach and doubles as an overview of the disorder I discuss below. A condition originally described in 1830 by neurologist Charles Bell (no relation), myotonic dystrophy is characterized by a lack of control over facial musculature and, as of her writing up in the 1930s and 1940s, had been classified into five putatively distinctive subtypes.Footnote 28 Bell provided a detailed historical overview of the various grouped disorders as well as clinical notes furnished by a neurologist. She analysed 223 pedigrees culled from the literature or contributed directly by neurologist Otto Maas, all of which are published in detail and annotated following an extensive bibliography. Throughout numerous tables that tabulate and collate the pedigree data, Bell singled out Maas's data because he treated two of the conditions, Thomsen's disease and the more prominent dystrophia myotonica, as the same disorder. She apologized that her ‘conclusions and mode of procedure sometimes diverge conspicuously from his expressed and authoritative views’, but insisted on distinguishing between the two on the basis of her own observation of patients at the London Hospital.Footnote 29 Part of the difficulty in carrying out the work was dealing with extra information within the pedigrees, such as notes on cataracts and other mental conditions, not reported consistently by investigators.Footnote 30 Extracting tabular data from pedigrees was by no means straightforward – particularly vexed by inconsistent ages of onset – but Bell ventured to propose a dominant pattern of inheritance that awaited confirmation.Footnote 31
Though outside the anglophone context, the distinction of the first tallied-up compendium of known human genetic disorders belongs to German geneticist Otmar Freiherr (baron) von Verschuer, whose role in the genocide at Auschwitz and subsequent redemption by mainstream medicine has been documented by prior historians.Footnote 32 His 1959 Genetik des Menschen: Lehrbuch der Humangenetik updated an earlier textbook, Erbpathologie (Hereditary Pathology) initially published in 1936, with descriptions of 412 known human mutant genes and a decreased emphasis on applications of genetics, perhaps to distance himself from associations with Nazism.Footnote 33
The book represents a halfway point between Bell's Treasury and McKusick's early work. Verschuer's text is divided into three major parts: a textbook-style ‘Allgemeine Genetik des Menschen’ (General genetics of man), that relates research on human heredity to the latest cellular and organismal research, and two catalogue-like sections on ‘Spezielle Genetik des Menschen’ (Specialized genetics of man), one of ‘Normale Eigenschaften’ (Normal qualities) and a far more substantial section of ‘Krankhafte Eigenschaften’ (Pathological qualities).Footnote 34 Verschuer counted fifty-one ‘normal’ phenotypes in all, largely blood groups and features like hair colour, and frequently engaged in speculations on racial difference.Footnote 35 The ‘pathological’ section is organized by medical speciality, consisting of literature reviews alongside reproduced pedigrees and images, without much further analysis. Finally, Verschuer maintained an extensive author index, which McKusick would also prioritize.
As the need for compendia to keep track of the field became clear in the 1950s, intra-speciality synthesis was de rigueur. McKusick's catalogue took a different tack. He abandoned Verschuer's organization by medical speciality, and distinctions between normal and pathological altogether. With the accelerating tempo of research driven by new heredity clinics and interest in the effects of radiation, McKusick thought a more general tool would help stave off a veritable avalanche of printed phenotypes.Footnote 36
McKusick's catalogue
Victor A. McKusick came to medical genetics as an internist, and his career as the cornerstone of what would become the Moore Clinic at Johns Hopkins University cemented his status as a founding father of the field. McKusick's research and writing were centred on the clinical encounter. Trained as a cardiologist, he established his reputation with the textbook Heritable Disorders of Connective Tissue. Upon taking over Hopkins's genetics clinic in 1956, he began to organize what would become the new discipline's central text. In this section, I discuss McKusick's own ambitions for the cataloguing project in order to frame the technical aspects of its computerization.
In their work on McKusick, Susan Lindee and Andrew Hogan both emphasize his ‘cataloging imperative’, which inspired other clinicians to pool knowledge and see one-off abnormalities as disorders awaiting characterization.Footnote 37 As its sole credited author into the 1990s, when the writing and editing were scaled up and allocated among a team of science writers, McKusick exerted decisive influence over MIM.Footnote 38 While the catalogue undoubtedly helped foster a newfound focus on individual human genes, this change was evolutionary rather than revolutionary. Unlike Margaret O. Dayhoff's Atlas of Protein Sequence and Structure, as discussed by Strasser, MIM did not contain valuable information about sequences that could readily be exploited and analysed through the purchase of computer tape.Footnote 39 But like Dayhoff's Atlas, it formed a kind of publication system in itself: employing an early bibliographic retrieval computer program, it organized the biomedical literature and established a classificatory system for expanding it through a cheap and widely used publication.
MIM grew out of a modest exercise. In 1957, McKusick began holding a monthly journal club at his home for the Moore Clinic fellows to prepare annual reviews of medical genetics for the Journal of Chronic Diseases. Participants summarized the literature on a field with an index card for each relevant article: the full reference on one side, relevant disorders and points of interest on the other.Footnote 40 In the words of one geneticist, these reviews were the most one could ask for in a rapidly changing field whose ‘current state cannot be put into a textbook fast enough – by the time such a text were written and printed, it would be out of date’.Footnote 41 The following year, McKusick published a monograph-length article in the Quarterly Review of Biology on X-linked traits, sorting them based on how conclusive the evidence of linkage was.Footnote 42 Dealing with the other twenty-two chromosomes begat an information management problem that emerged in tandem with visions of a full human genome map.Footnote 43
MIM allegedly began in earnest when David Bolling, McKusick's own ‘computer man’, observed his ritual of sifting through hundreds of notecards to update his reviews and suggested that using the computer to store them on magnetic tape would vastly simplify the process of adding entries for the annual review.Footnote 44 However, it is clear that other factors made the project interesting – and feasible. McKusick had already sought the use of electronic computers for related work. At IBM's 6th Medical Symposium in October of 1964, he discussed two such initiatives: one to aid in the calculation of linkages between different genetic loci using human pedigrees in which multiple traits were present, and another to use computers to record more detailed information on isolated populations to aid in longitudinal studies.Footnote 45
This romanticized version of McKusick trading his notecards for punched cards also differs from the account in the forewords of MIM. According to McKusick writing in 1966, a full catalogue of recessive disorders was essential to his studies of the Amish.Footnote 46 Even when McKusick limited the Amish project to rare, distinctive recessive phenotypes rather than potential chronic conditions that occurred more commonly than one in a thousand people, by 1963 ‘the complexity of the recessives catalog prompted exploration of computer methods for assembling, revising, and indexing’.Footnote 47 Retroactive constructions of the affinity between the computer catalogue and the shaky state of knowledge in the field underestimate the magnitude of the work involved, as well as its centrality to McKusick's own research programme.
We gain more from understanding the catalogue as a tool for executing McKusick's broader vision for medical genetics. To the extent that diseases could be classified as Mendelian, they were typically rare conditions of little consequence to the health of the general population. Therefore medical geneticists had to justify their knowledge as serving a more general project.Footnote 48 In contrast to Verschuer, McKusick presented his work as a comprehensive picture of the human genetic make-up built from pathologies: ‘Discovery of “new” genetic diseases is not mere “stamp-collecting” … The catalogs of simply inherited genetic traits in man are like photographic negatives from which a positive picture of the normal genetic constitution of man can be constructed.’Footnote 49 Increasing the resolution of these negatives required McKusick to convince his medical colleagues that genetic classification had to aim for the exclusive rigour of cleanly delineated phenotypes. In a 1969 article on ‘lumping’ and ‘splitting’ in genetic nosology, McKusick displayed his keen awareness of the politics of taxonomy and implicitly declared himself a splitter.Footnote 50 A comprehensive nosology of Mendelian traits allowed him to leverage an exponentially growing number of discrete clinical studies in favour of this perspective.Footnote 51 He envisioned each chromosome with its own catalogue, which underlines the sway of the chromosomal perspective that would not give way to a genomic one until the late 1980s.Footnote 52 The map, as the saying goes, would not be the territory.
By organizing the literature into discrete genetic entities, provisional as they were, McKusick helped instil a gene-centred view of hereditary disorders before human DNA sequences were even available. However, this path was neither straightforward nor inevitable, and involved translating a bibliographic endeavour organized around medical specialities into a database of distinct genetic disorders, with a unique and stable numerical ID for each gene.
Going digital
Early reviews of MIM evince commingled excitement and anxiety over computerized management of biomedical knowledge. One reviewer praised the self-consciously provisional character of the printed-out electronic database (Figure 1) as offering ‘the hope that the mechanics for keeping it up-to-date are at hand’.Footnote 53 Another was less sanguine about such ‘heavy reliance on the computer’, citing poor editorial oversight – typos and duplicate entries abounded – and insufficient critical attention that allowed speculative publications to be reified into putative biological entities.Footnote 54 Nonetheless, MIM was hailed as a model for leveraging technology to manage an unwieldy literature:
The entire work, which would be impossible without the aid of the computer, is an instance of the organization and presentation of scientific information that must occur in many fields … for the mountainous masses of knowledge that have accumulated must somehow be rendered accessible to would-be users and new investigators.Footnote 55
Taking concerns about ‘information overload’ seriously does not mean taking them at face value. The mundane organization techniques of this emergent information order offer a glimpse at a computerized political economy of knowledge in the making.
This section considers how MIM was compiled with the equipment available to the Moore Clinic. Geneticists working today might be most familiar with MIM for providing the ID numbers used to organize all known human genes and their alleles. The catalogue was alphabetical, organized into three major sections – dominant, recessive, and X-linked traits, indicated by the first numeral of the ID – with long indexes of authors and disorders. However, a closer investigation of source material reveals that this scheme was a contingent feature that emerged over time. Both in McKusick's mind and in the bibliographic software used to compile MIM, author was the primary record locator and organizing principle. Unpacking how MIM was created from this early computer system allows us to see how, in contrast to McKusick's elegant conceptualizations of genetic nosology, the gene was secondary to a bibliographical information order. Lest these organizational changes appear trivial, I conclude by reviewing some of the systems and publications linked to MIM's ID system.
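The three-section scheme described above can be pictured with a minimal sketch. It assumes, purely for illustration, that the numerals 1, 2 and 3 stand for the dominant, recessive and X-linked sections in the order they are listed here; the function name and the sample IDs are hypothetical and are not taken from the catalogue.

```python
# Assumed mapping: the text says only that the first numeral marks the
# section; the digits 1, 2 and 3 follow the order in which it lists them.
SECTION_BY_FIRST_DIGIT = {
    "1": "dominant",
    "2": "recessive",
    "3": "X-linked",
}


def catalogue_section(entry_id: str) -> str:
    """Return the catalogue section implied by an entry ID's first numeral."""
    if not entry_id.isdigit():
        raise ValueError(f"expected a numeric ID, got {entry_id!r}")
    return SECTION_BY_FIRST_DIGIT.get(entry_id[0], "unclassified")


if __name__ == "__main__":
    # Made-up IDs, used only to exercise the lookup.
    for made_up_id in ("10001", "20001", "30001"):
        print(made_up_id, catalogue_section(made_up_id))
```

Encoding the mode of inheritance in the identifier makes the section legible at a glance, but it also ties each ID to a Mendelian classification that later findings could revise.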
At the September 1966 International Congress of Human Genetics in Chicago, where the community focused on issues of standardization, McKusick highlighted the Moore Clinic's bibliographic use of computers rather than his statistical linkage mapping research.Footnote 56 The computer program retained features of McKusick's manual abstract-compiling process: article abstracts were assigned paragraph numbers and index words, and references tied to the number of the paragraph where they were discussed, allowing an alphabetical list of authors and an index to be generated and stored on magnetic tape. While Bell's Treasury synthesized clinical data alongside relevant medical histories, McKusick et al.'s paragraph-length summaries compiled the latest research findings in a largely additive fashion. Because clinical geneticists worked in a system in which credit accrued to individuals, due to their tight social network – ‘one recalls that X.Y.Z. Jones described a peculiar syndrome in 1931’ – the author remained the most important piece of information. A virtue of the system, as McKusick saw it, was the ‘automatic updating of the numbers in the indices each time a new item [was] added to the catalogs … changing the numbering in the text’.Footnote 57 References to stable genetic entities were not built into its design.
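The consequence of that design can be made concrete with a rough sketch. It is not a reconstruction of the Moore Clinic's program: it assumes that entries are numbered sequentially in alphabetical order of title and that the author index simply records those numbers. The function name, entry titles and author names are all invented for the example.

```python
from collections import defaultdict


def build_catalogue(entries):
    """Number entries alphabetically by title and index authors by number.

    `entries` is a list of dicts with 'title' and 'authors' keys
    (a hypothetical record layout chosen for this sketch).
    """
    numbers = {}
    author_index = defaultdict(list)
    ordered = sorted(entries, key=lambda entry: entry["title"])
    for number, entry in enumerate(ordered, start=1):
        numbers[entry["title"]] = number
        for author in entry["authors"]:
            author_index[author].append(number)
    return numbers, dict(sorted(author_index.items()))


if __name__ == "__main__":
    entries = [
        {"title": "Myotonia congenita", "authors": ["Jones"]},
        {"title": "Paramyotonia", "authors": ["Jones", "Smith"]},
    ]
    print(build_catalogue(entries))

    # Adding one item regenerates every number that sorts after it.
    entries.append({"title": "Dystrophia myotonica", "authors": ["Brown"]})
    print(build_catalogue(entries))
```

In this toy version the author index always stays consistent with the text, which is the virtue McKusick named, but an entry number noted down from one run no longer points to the same entry after the next.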
The Moore Clinic shared a system developed by Jane Olmer and Robert Rich for Hopkins's Applied Physics Laboratory (APL) using an IBM 1401 computer. Though the bibliographic program was coded to manage the APL's literature, interest from users like McKusick led them to envision a system ‘general and flexible enough to handle a wide variety of applications in a multiplicity of input and output formats’.Footnote 58 Magnetic tape was the storage medium of choice for a multiple-user system, since the tape could be moved between units for processing or printing.Footnote 59 All documents were ‘key punched and fed directly to the library master tape’, which made ‘searching a matter of selection with no sorting required’.Footnote 60 Though more sophisticated searching was possible, Olmer's technique used a set of thesaurus terms for each user that were appended to individual entries, demarcated by slash marks to set them apart from free-standing text. The approach privileged general usability and portability to a better system, and was chosen over viable alternatives to satisfy the most users and work best with the 1401.
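A small sketch may help picture retrieval by appended thesaurus terms. The record layout is an assumption: free text followed by terms set off with slash marks, in capitals throughout. The function names and sample records are hypothetical, and nothing here reproduces the 1401 program itself.

```python
def split_record(record: str):
    """Split a record into its free text and its slash-delimited terms.

    Assumes the free text itself contains no slash; everything after the
    first slash is read as thesaurus terms.
    """
    text, _, tail = record.partition("/")
    terms = [term.strip() for term in tail.split("/") if term.strip()]
    return text.strip(), terms


def select(records, term: str):
    """Return the free text of every record tagged with `term`.

    One pass over the records in stored order: selection without sorting.
    """
    wanted = term.upper()
    return [
        text
        for text, terms in map(split_record, records)
        if wanted in terms
    ]


if __name__ == "__main__":
    # Invented records, used only to exercise the functions.
    master_tape = [
        "PEDIGREE STUDY OF MYOTONIA /MYOTONIA/DOMINANT/",
        "SURVEY OF BLOOD GROUPS /BLOOD GROUP/",
    ]
    print(select(master_tape, "myotonia"))
```

A single sequential pass suits a tape, which is read from start to finish, and it keeps the search logic independent of whichever vocabulary a given user brings to it.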
In early negotiations with the publisher, Johns Hopkins University Press, one of McKusick's staff argued that the book be compiled straight from computer printouts to speed up the proofing process, and claimed that reviewers who encountered the text in this format failed to note the drawbacks of all-caps text and limited characters.Footnote 61 Photo-offset printing of computer output had been pioneered in the early 1960s by the US Government Printing Office, but was deemed so unreadable that they commissioned an adaptation of Linotype technology for magnetic tape.Footnote 62 For McKusick and his collaborators, convenience and control superseded any aesthetic concerns.
McKusick insisted that scientific advances, rather than the economics of publishing, determine when the catalogue was released and how it was distributed. He wanted to print up to ten thousand copies, but his editor insisted that they limit the first run to three thousand bound copies and two thousand sets of unbound sheets, and suggested that he consider planning for supplements.Footnote 63 McKusick bristled at the suggestion; he preferred to publish a new volume ‘even if it means dumping a thousand copies in the bay’, ensuring that ‘advances in genetic nosology [were] the only consideration in republishing, not the backlog of an unsold prior edition’.Footnote 64 Nonetheless, in 1966, three thousand copies of the 368-page catalogue were bound and sold for only eight dollars, due to low production costs. New editions every few years would be sent to press in March to be bound by July, just in time for the annual course on medical genetics McKusick ran in Bar Harbor, Maine.
The catalogue's popularity meant that the collaboration with the Applied Physics Laboratory would not continue indefinitely. In 1971, the group acquired its own microcomputer, a Datapoint 2200, that allowed them to bring their editing process in-house and eliminate many of the earlier method's steps.Footnote 65 The new machine did not help them appease their editor, eagerly awaiting ‘the day when computers will print both upper and lower case’.Footnote 66 Having an in-house computer did, however, allow McKusick and his team to respond better to user suggestions.
Recalling that McKusick saw the program's ability to constantly update entry numbers as a feature, rather than a bug, it is noteworthy that it took until the third edition for him to change his mind. ‘Some geneticists expressed a desire to use the numbering system of these catalogues as the basis of a diagnostic and bibliographic filing system and were distressed by the change of numbering between the first and second editions’, he noted in his introduction.Footnote 67 McKusick quickly saw the benefits of having stable IDs, an innovation that helped enable cross-referencing and mapping human genes on a large scale. The schema allowed him to construct tables and appendices of valuable biological and clinical markers for quick consultation, which would not be rendered useless by constant re-enumeration. One such appendix was the map of human chromosomes emerging from the first Human Gene Mapping conference at Yale in 1973, updated in all editions following the fourth. MIM became an integral resource for coordinating the mapping work using somatic cell genetics; an appendix was even added listing genes with mutant cell lines available from a repository in New Jersey to facilitate further research.Footnote 68
Though seemingly trivial, the implementation of a standard enumeration scheme for MIM turned this printed database into a publication infrastructure for medical genetics, linking up cutting-edge mapping work with protein biochemistry research and the filing cabinets of clinicians worldwide.Footnote 69 MIM incorporated others’ databases before the hyperlinked online version, including information from Dayhoff's Atlas on protein sequences for haemoglobin and a World Health Organization report on glucose-6-phosphate dehydrogenase diversity in the global population.Footnote 70 It even became the basis for a hospital network computer database for clinical genetics, MEDGEN, run by the University of California, San Francisco.Footnote 71 A classification system developed to keep track of an exploding literature helped reorient biomedical research around distinct gene entities, providing the basis for other systems of storage, circulation and interconnection.
A palimpsest of pathology
While there is little doubt that MIM saw wide circulation and was central to the work of human genetics, one still might ask how its information order shaped the norms of the emerging field, or even how it was put to use. While the latter question remains largely elusive, following changes in an exemplary set of disorders – the myotonic dystrophy family discussed above – through each edition reveals important features of how the catalogue worked, with broad import for the historical analysis of reference material.
First, centralization is never neutral. Cataloguing is a human effort, often bound up in institutions with interests, constraints and values. While McKusick always encouraged readers to submit information, many of the citations in MIM came from his browsing of the Institute for Scientific Information's Current Contents, a weekly digest of cover pages throughout the biomedical sciences spearheaded by Eugene Garfield, a linguist regarded as one of the founders of scientometrics. While contributions came from hundreds of journals, an internal analysis in 1994 revealed that two-thirds of all entries came from just twenty-one high-impact journals.Footnote 72 Though ‘largely a bibliographic task’, the mark of the Moore Clinic and its steady rotation of international and domestic fellows was apparent in the inclusion of ‘unpublished observations’ derived from their own research projects; further, ‘much judgment based on personal experience was necessary in selecting items for inclusion and in deciding the manner in which they should be treated’.Footnote 73 Whether used as filters or merely presented alongside published results, McKusick's own assumptions and clinical data assumed the mantle of fact. The write-up of myotonic dystrophy in the fourth edition discussed the results of a study from his lab without direct attribution, and this very segment was duplicated within the entry in a later edition.Footnote 74
Second, not all entries are created equal. Subjects of controversy tend to attract the most attention, and this is particularly the case with the genetic loci found in MIM. The first edition of MIM (1966) included four variants of myotonic dystrophy (a fifth would later be added; two are shown in Figure 1), and while the entries themselves would remain, the number of references they contained (Figure 2) and their genetic status would come to differ greatly.Footnote 75 When mapping efforts expanded during the 1980s, the literature on myotonic dystrophy grew exponentially as researchers came closer to pinning down its location within the human genome.Footnote 76 The entry became increasingly unwieldy, taking up nearly an entire page; new information was tacked on as it came in, and no distinctions were drawn between the kinds of studies discussed. Finally, McKusick brought on board a team of medical writers who consulted with various specialists and reorganized such entries into sizable review essays.Footnote 77 Other variants did not change substantially over the course of three decades.
Finally, knowledge is sticky. Paying attention to changing entries across editions shows that McKusick largely worked in an additive fashion – a feature of the early bibliographic-entry software – and entries were often more chronological than prescriptive. Only a major upheaval could compel him to make a substantial revision. A fascinating example of this was a problem known as ‘anticipation’: the notion that a genetic disorder could get worse and have an earlier onset in successive generations. Even before MIM's first edition, McKusick had authoritatively cited a prominent article claiming this phenomenon as a statistical artefact.Footnote 78 However, molecular studies showed that successive generations did, in fact, accumulate repeat mutations in the gene, and McKusick was eventually forced to amend the entry.Footnote 79 This raises a more general problem. MIM began as a catalogue of phenotypes – the physical manifestation of an underlying genotype, or presumably unique genetic variant observed in the clinic. But what would happen to entries once a coding sequence for a protein involved in more general pathways was discovered to be responsible? Rather than collapse entries, the catalogue's editors decided to link them, using a hash symbol (#) to indicate phenotypes caused by another mutation.Footnote 80
Eventually, molecular studies led the catalogue to abandon its numbering scheme separating dominant and recessive disorders. Having already appended a sixth digit to all entries and allowed for decimals in order to account for molecular variants, the editors adopted a new digit, 6, to prefix all new entries.Footnote 81 Mendelian Inheritance in Man ceased to be truly Mendelian as it developed in tandem with knowledge in the field, and it increasingly relied on pointers between different databases as knowledge about underlying molecular entities accrued.Footnote 82 Technical fixes like this were the tip of the iceberg; reimagining MIM for a new era required accommodating its information order to work alongside other resources, and without McKusick.
The centre cannot hold
The transition between the physical MIM and its online counterpart entailed more than porting a computerized file to a single computer network, such as the Internet. Rather, it was bound up in negotiations over how to make all manner of new genetic information available over different networks. In the realm of sequences, this took the form of a valuable public–private contract vied for by different biotech start-ups at the outset of the genomic age.Footnote 83 However, the bibliographic information compiled for OMIM had a different trajectory, traced in this final section.
In 1986, the director of Hopkins's Welch Medical Library, Nina Matheson, approached McKusick with the offer of creating an online version of MIM that would serve as a testbed for a contextual search engine, IRX (Information Retrieval Experiment), being developed by the National Library of Medicine (NLM).Footnote 84 Editors at Hopkins Press stood their ground against making the catalogue available through a computer connection, but the Howard Hughes Medical Institute funded the joint venture the following year.Footnote 85 This was a time of rapid growth for medical informatics. The Medical Library Assistance Act of 1965 had substantially broadened the NLM's purview, allowing it to invest in computing technology in addition to serving regional medical library needs.Footnote 86 A 1987 report declared the urgent need for a National Center for Biotechnology Information (NCBI), established the following year, to meet the demands of the growing genetics community.Footnote 87 Constant collaboration between the Welch and the NLM, bolstered by their geographical proximity, supported this multi-institutional effort to keep OMIM available as a public resource, despite changes at the NCBI and in the cost of maintaining the staff and servers to support it.
Throughout the next few years, McKusick remained largely responsible for the content of MIM and its online counterpart. The year 1993 saw the beginning of major changes to the project: entries and names were updated to better interact with the HGP-proximal Genome Data Base (GDB) project, and a full editorial board was implemented to handle different subject areas.Footnote 88 As automatic sequence submission systems were put in place, the OMIM project argued that expert-based data curation was more valuable than ever. It had vocal allies. Stanford human geneticist and eventual scientific director of 23andMe, Uta Francke, and Phyllis J. McAlpine, director of the Human Genome Organization's nomenclature committee, both claimed that OMIM entries often replaced a literature search for overworked researchers struggling to put together grant proposals.Footnote 89 A review in the first issue of Genome Biology declared OMIM's annotations ‘second to none … a result of their policy of manual curation’, and evinced anxiety over ‘the coming deluge of data’.Footnote 90 Users of the platform valued its connectedness and curation equally.
Yet there were always tensions over the project's ownership. Managing a team of science writers and coordinating efforts to stay in sync with other digital resources proved difficult for subsequent directors, who had to contend with McKusick's ongoing presence even as he took the back seat. Throughout his correspondence during these tumultuous years of growth, he expressed constant concern that technical choices and personnel changes would derail his vision: ‘There is concern by many, not only within my group but in the genetics community at large, that we will screw up [in the margins: “destroy?”] what is now a very successful operation. OMIM is too valuable to let that happen.’Footnote 91 Nonetheless, McKusick stepped back over the years and let a new, dedicated team of science writers – with many of whom he collaborated closely – take charge.
By 2005, the NCBI had assumed full control of OMIM, by that point fully hooked into its suite of related resources: ‘genetic databases such as DNA and protein sequence, PubMed references, general and locus-specific mutation databases, HUGO nomenclature, MapViewer, GeneTests, patient support groups and many others’.Footnote 92 Strasser argues that the NCBI's success in uniting publication platforms with sequence databases (and mandatory sequence submission) represented a triumph of open science long resisted by the scientific community and journal publishers.Footnote 93 In practice, however, this consolidation proved to be short-lived for OMIM. Contract restrictions forced it to separate from the NCBI, landing it back where it began at Hopkins. This allowed its team to be more flexible in their information management and website design, but also made the project dependent on grants and donations rather than supported by federal funds earmarked for maintaining open-access scientific resources.Footnote 94 Although MIM helped usher the genomic information order represented by the NCBI's managed databases into being, providing both a scaffold and an inspiration, today it exists independently.
Conclusion
In this paper, I have argued that MIM formed an information order that helped shape the credit system of human genetics around individual genes. This was not one of the project's initial aims – indeed, it had originally been organized by reference to the author of a particular entry – but it emerged as a consequence of evolving technological solutions to problems of information management and the demands of users. In the midst of present debates over open-access journals, government data-rescue initiatives, replication crises, and how to make science more democratic, I concur with Alex Csiszar's claim that historians of science need to ‘attend to the ways in which information technologies built to extend access to knowledge can also become technologies of valuation and exclusion’.Footnote 95 The history recounted here is an effort to disentangle some of the infrastructure that allows patients, physicians, geneticists and even biotech marketers to stay up to date on genetic research, and to attend to how information orders shape and are shaped by the technologies through which they are realized.
Just as the original MIM had been an early example of the use of an electronic digital computer to generate a publication, McKusick called OMIM ‘one of the first electronic resources to exploit the advantages of the Web’.Footnote 96 Rather than matters of bald fact, these priority claims, echoed by others, reflect the self-perception of human geneticists as early adopters of information technology. They also raise the matter of how central McKusick was to the myth-making and community value of MIM and OMIM. He continued to refer to the project idiosyncratically as a ‘knowledgebase’, to the irritation of his systems operators.Footnote 97 Speaking of the editorial reorganization project, one reviewer quipped that the labour should be thought of in terms of ‘Whole McKusick Equivalents’, or WMEs.Footnote 98 Reconstructing the broken links of this digital history makes OMIM appear more and more like a ‘legacy system’ in a dual sense: somewhat outmoded yet providing shape to an organization, and a tribute to the power of personality that brought such resources together in the first place.
Nonetheless, despite its authoritative reputation, OMIM continues to parry threats to its existence. Upon visiting omim.org at present, one is greeted with a pop-up urging private donations to sustain the resource, now back at Hopkins. A number of medical writers are still employed to update this knowledge without publication credits. The value of such translational work is only increasing; biotechnology companies continue to invest in it as they grow their consumer bases. As such, OMIM relies increasingly on licensing fees from biotechnology platforms that make use of its application programming interface (API) to integrate OMIM information. With the resource becoming increasingly marginal, we can see a shift in this synthetic and translational work so central to modern biomedicine toward the private sector.Footnote 99
The quality of information available on omim.org makes it one of the best civilian science resources in biomedicine. What would it look like if the biotechnology companies that drew on it were required to make meaningful intellectual contributions? How would grants look if the definition of publications and original research were broadened to include the kind of synthetic work that could only be taken seriously from someone as creditworthy as McKusick? These questions exceed the boundaries of my study, but in emphasizing the connections between genetics, clinical research, capital and information technology, I want to suggest that despite OMIM's ‘legacy’ status, this kind of reference infrastructure can hardly be a one-person job, and should neither be relegated to charity nor left for the market to determine access.Footnote 100
Acknowledgments
I would like to thank Andy Harrison and Phoebe Evans Letocha at the Alan Mason Chesney Medical Archives of the Johns Hopkins Medical Institutions for their assistance with the Victor McKusick papers, and the staff of OMIM, in particular Joanna Amberger, for their insight. The careful guidance of Angela N.H. Creager was essential at every stage of this paper, inspired by Nick Hopwood's interest in a piece of an earlier project he helped shepherd into being. Participants in the Learning by the Book conference held at Princeton in June 2018 shaped its current iteration, as did insightful feedback from the editor, two anonymous reviewers, Andy Hogan, Alex Csiszar, Erika Milam and her pandemic ‘lab group’. Hospitality and intellectual friendship in and around Baltimore made it all the more worthwhile, so thanks in particular to Janel Jin, Ding Xuan Ng, Julie Barzilay, Jon Phillips, Ayah Nuriddin and Heidi Morefield.