As the fields of digital humanities and digital history have grown in scale and visibility since the 1990s, legal history has largely remained on the margins of those fields.Footnote 1 The move to make material available online in the first decade of the web featured only a small number of legal history projects: Famous Trials; Anglo-American Legal Tradition; The Proceedings of the Old Bailey Online, 1674–1913.Footnote 2 Early efforts to construct hypertext narratives and scholarship also included some works of legal history: “Hearsay of the Sun: Photography, Identity and the Law of Evidence in Nineteenth-Century Courts,” in Hypertext Scholarship in American Studies; Who Killed William Robinson? and Gilded Age Plains City: The Great Sheedy Murder Trial and the Booster Ethos of Lincoln, Nebraska.Footnote 3 In the second decade of the web, the focus shifted from distributing material to exploring it using digital tools.Footnote 4 The presence of digital history grew at the meetings of organizations of historians ranging from the American Historical Association to the Urban History Association, but not at the American Society for Legal History conferences, the annual meetings of the Law and Society Association, or the British Legal History Conference.Footnote 5 Only a few Anglo-American legal historians took up computational tools for sorting and visualizing sources such as data mining, text mining, and topic modeling; network analysis; and mapping.Footnote 6 Paul Craven and Douglas Hay's Master and Servant project text mined a comprehensive database of 2,000 statutes and 1,200,000 words to explore similarities and influence among statutes.Footnote 7 Data Mining with Criminal Intent mined and visualized the words in trial records using structured data from The Proceedings of the Old Bailey Online, 1674–1913. 
Locating London's Past, a project that mapped resources relating to the early modern and eighteenth century city, also made use of the Old Bailey records.Footnote 8 Digital Harlem mapped crime in the context of everyday life in the 1920s.Footnote 9 Only in the past few years has more digital legal history using computational tools begun to appear, and like many of the projects discussed in this special issue, most remain at a preliminary stage.Footnote 10 This article seeks to bring into focus the constraints, possibilities, and choices that shape digital legal history, in order to create a context for the work in this special issue, and to promote discussion of what it means to do legal history in the digital age.
The dearth of digital legal history is particularly striking, given that legal history is better positioned for a digital turn than most historical fields when it comes to the amenability of legal sources to computational analysis and the availability of those sources in digitized forms. The consistent forms of legal sources such as statutes, court records, trials, and judicial opinions give structure to the information they contain. Although those records have changed in shape and substance, they retain sufficient structure to allow them to be compared over time. Legal sources were published frequently, and survive in comprehensive collections. The language of those records is marked by a repetitious and highly specialized vocabulary that gives legal texts many standardized elements. In addition, that language mitigates one of the limitations of computational text analysis: because words and meaning have no easy correspondence, identifying patterns in words does not always offer a clear picture of the information that documents contain. In legal sources, the highly technical nature of the language means that the correlation between words and meaning is much higher than in most textual sources, and the results of computational text analysis are correspondingly more revealing of their contents.
Many legal history sources already exist in digital formats that can be used with computational tools. Legal records were some of the first historical documents transformed by databases and digitization. To take just the American example, LexisNexis and Westlaw introduced computerized databases in the 1970s, and have progressively expanded them to include all published federal court decisions and the decisions of the higher state courts. Gale's Making of Modern Law databases include 22,000 English and American legal treatises, published trials, and United States Supreme Court briefs and records. HeinOnline includes law reports, treatises, and a wide range of session bills, statutes, and published legal sources. Much of the published material from the years before 1923 can also be accessed through HathiTrust's digital library and Google Books.Footnote 11
The nature of the legal sources that have been digitized has contributed to the limited amount of digital legal history. These sources are overwhelmingly from inside the law—case law, statutes, treatises, trials—whereas legal history since the 1970s has been increasingly focused on the relationship between law and the wider society of which it is part.Footnote 12 The “law and…” approach requires additional sources beyond those generated by the legal system. Far less of that material has been digitized. Much of the digital historical record that does exist consists of periodical literature.Footnote 13 The vast majority of archival material has not been digitized.Footnote 14
Although not all the sources for legal history are digitized, the wealth of databases of legal material nonetheless means that scholars who study the law are likely among the historians who frequently conduct searches of databases as part of their research. That is an assumption because, as is the case across historical fields, there is no discussion of search as a research method in legal history scholarship. Nonetheless, studies of the research practices of historians report widespread use of searches, beginning with Google searches to identify sources, and proceeding to full text keyword searching to research within digitized collections and documents.Footnote 15 The failure to discuss these searches in scholarship implicitly treats them as “a finding aid analogous to a catalog.” That characterization was somewhat true when searching focused on metadata. However, from the mid-2000s, the widespread use of optical character recognition (OCR) software to turn images of documents into machine readable text brought a shift to full-text search, “a name for a large family of algorithms that humanists have been using for decades to test hypotheses and sort documents by relevance to their hypotheses.”Footnote 16
Recognizing that full-text search is a computational tool highlights that much legal history is at least inflected with digital history, and requires more attention to be given to how database searches work and what it means to use searching as a research tool.Footnote 17 A search will produce different results depending upon how the searchable text was generated. Although the document—the image of the page—is the same regardless of the database, machine-readable words can be generated from that image in different ways: by transcription (the results of which vary depending upon the individual transcriber) and by OCR, the results of which vary by software and the extent of efforts to correct errors. A search of the same documents in different databases can, therefore, produce different results.Footnote 18 The results a search returns also depend upon what word or phrase is searched for and in which part of a document, and, depending upon the database, whether searches for phrases are “Boolean searches (United AND States) or exact-phrase searches made with database-specific delimiters (United States).”Footnote 19 Searching also struggles to deal with what lies outside a set of results. In returning only the terms one enters, a search filters out any alternative hypotheses. For historians, this poses particular challenges, as the language and ways of organizing knowledge in the past often differ significantly from contemporary terms and patterns of thought. If scholars use the wrong search terms, they literally misread their sources, and might not read them at all. Moreover, when working with interfaces that indicate how many results were found without reference to how many results were possible, it is not always clear just how significant those results might be. As Ted Underwood notes, “in a database containing millions of sentences, full-text search can turn up twenty examples of anything.”Footnote 20
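The difference between Boolean and exact-phrase searching described above can be made concrete in a few lines of code. The toy corpus and function names below are invented for illustration and do not reproduce any vendor's actual search implementation.

```python
# Two hypothetical documents standing in for a digitized collection.
corpus = {
    "doc1": "the states united in opposition to the bill",
    "doc2": "an appeal to the supreme court of the united states",
}

def boolean_and(terms, text):
    """Boolean search (United AND States): every term appears somewhere."""
    tokens = text.split()
    return all(term in tokens for term in terms)

def exact_phrase(phrase, text):
    """Exact-phrase search: the terms appear adjacent and in order."""
    return phrase in text

matches_boolean = [d for d, t in corpus.items() if boolean_and(["united", "states"], t)]
matches_phrase = [d for d, t in corpus.items() if exact_phrase("united states", t)]
```

Both documents satisfy the Boolean query, but only the second contains the exact phrase: the same collection returns different result sets depending on how the query is interpreted, which is precisely the divergence the paragraph above describes.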
In addition to enabling full-text search, the words produced by digitization can be used with other computational tools. As Lara Putnam points out, this “mass data-fication of words” both associates digital history with earlier forms of quantitative history and distinguishes it from them. It is not historians’ ability to undertake computational analysis that is new; it is that the data available for computational analysis now includes words.Footnote 21 However, few legal historians have used computational tools other than searchable databases because most digitized legal sources have not been made available for use with such tools. The databases of vendors such as LexisNexis, HeinOnline, ProQuest, and Gale currently neither offer tools for text mining nor allow the access to their contents that is necessary to obtain data for text mining. As Tanenhaus and Nystrom put it in their article in this issue, “The sort of digital access appropriate for a traditional user, including limitations posed by the quantity of material available, the size and format of permitted downloads, the ease of machine interaction with the download interface, and the cost of licensing, can be a stumbling block for legal historians who want to conduct additional computational analysis.” They were able to obtain the session laws they needed by negotiating to exceed the download limit imposed by HeinOnline, but could not undertake digital analysis of newspapers because their access was blocked after downloading only 18 months of publications.Footnote 22 Currently, vendors such as ProQuest and Gale are imposing additional charges for the access required for text mining, and even then, delivering that content on hard drives, with terms of use that limit the text mining that can be done.Footnote 23 Developing large-scale open access collections as an alternative to proprietary databases is a challenging and expensive activity. 
As Andrew Prescott notes, The Proceedings of the Old Bailey Online, 1674–1913 required funding from four separate agencies and three different universities, and involved a team of twenty-two people, an infrastructure that “has more in common with filmmaking than old-style academic publishing.”Footnote 24 In the absence of sufficient public funding, an alternative is to partner with commercial vendors, as Harvard Law School Library has with Ravel Law to digitize United States case law. This project will make approximately 40,000,000 pages of material freely available; however, the bulk access required for computational analysis will be restricted for 8 years by the exclusive commercial license granted to Ravel.Footnote 25
Given how little of the abundance of digitized material is accessible for computational analysis, and consequently, how important digitization continues to be to digital history, it is appropriate that several of the articles in this special issue discuss projects to make legal history sources available online. Surveying the changes being wrought by digitization in 2003, Roy Rosenzweig asked, “Should the work of collecting, organizing, editing, and preserving of primary sources receive the same kind of recognition and respect that it did in earlier days of the profession?”Footnote 26 As the profession has moved toward the recognition of digital history, that question has been answered in the negative. The American Historical Association, in its recent Guidelines for the Evaluation of Digital Scholarship in History, defines digital history as “scholarship that is either produced using computational tools and methods or presented using digital technologies.” Other projects involving digital tools are bracketed off as service.Footnote 27 One way to bridge that gap is to publish peer-reviewed scholarship about those projects. This issue of Law and History Review is an important venue for such work.
Eiseman and Seipp offer accounts of two different approaches to delivering legal sources online. Comparing those two projects illustrates the different character of digitization undertaken by libraries and archives and by researchers, the consequence of cultural institutions that hold source materials having “their own ways of organising and describing source materials which may be quite different from the information produced by the research process.”Footnote 28 Those differences impact the extent to which the digitized sources can be explored using digital tools. As archivists and librarians, Eiseman and his colleagues began with the notebooks of students at the Litchfield Law School, not their contents, and with data about those legal sources, not the data in the sources. Their approach to making the notebooks accessible to researchers was to focus on the library catalog records, and then create a portal to “enhance access” to the notebooks. The portal includes valuable contextual information, but in its current preliminary form, the data and its format offer only a limited ability to explore the contents of the notebooks. The information about the notebooks is sortable rather than searchable. A user can click on “different legal titles such as Baron and Feme, Real Property, Powers of Chancery” associated with a section of a notebook to see all lectures on that subject across all the notebooks. That format limits a user to following a single path through the collection, the one created by the scholar Whitney Bagnall. Moreover, as Eiseman notes, the subjects are not in a standardized form; therefore, using them to sort the contents of the notebooks does not necessarily gather all the related documents.Footnote 29 A transcription of the first line of each notebook section is included, but the records cannot be sorted based on that text, nor can that text be searched; therefore, those transcriptions cannot be readily used as a way of exploring the notebooks.
Further stages of the project will address some of these limitations, with plans to standardize the subject categories used to describe the contents of the notebooks; however, the emphasis is on using catalogue metadata to expand access to the notebooks, not to expand the ability to explore their contents.Footnote 30
By contrast, Seipp's project is primarily concerned with the data in a source, rather than data about the source and the source itself. To more quickly and easily find information in a corpus of eleven large volumes of black-letter text and sixty-seven modern scholarly editions of reports, Seipp “compil[ed] a database of 22,318 records indexing and paraphrasing every printed Year Book report from the years 1268 to 1535 in England.” Databases require information to be organized into a tabular form following a set of rules—columns for different types of information, rows for each instance or record—and standardized. Many historical sources are difficult to convert into a database resource. They contain unstructured information, and mix different types of information, producing ambiguous and inconsistent data. However, those problems occur less often with legal sources. Consistent forms such as statutes, court records, trials, and judicial opinions give structure to the information they contain. They also employ a repetitious and highly specialized vocabulary that gives legal texts many standardized elements. Those records were published frequently, and survive in comprehensive collections.
Seipp's database reflects these characteristics of legal sources. The Year Books have sufficient structural consistency to allow records that span more than 250 years to be included in a single database. The thirty-nine fields (columns) of data Seipp defined for the reports include fields for types of information produced by the structure of a report: “the name of the court, the writ, the names of the parties, if disclosed in the report, and other persons and places named.” Also included are types of information that are features of legal proceedings: “the full names and abbreviated titles of all judges and lawyers quoted or mentioned every time they appear, statutes mentioned or hinted at…a field of keywords that lists every legal term in the report (in noun form), and a field that I call ‘process’ in which I include legal steps of pleading and procedure.” A field somewhat misleadingly labeled “Commentary” contains “summaries and rough translations.” Those “paraphrases,” which present the information in the reports in unstructured form, highlight that there is a range of information relating to the facts of the cases, rather than the workings of the law, that is less easily organized, and that Seipp has chosen not to include in the database. The remaining fields are devoted to information in the traditional form of citation (akin to the catalog metadata created by Eiseman), and to contextual information and relationships drawn from sources other than the reports themselves.
In describing the database as an index, Seipp signals his focus on using searching to explore the Year Books. The interface offers a full text/keyword search (of all the information in the database), and a search limited to each field. Search results are delivered in a form that mirrors an index: as a list of individual reports, for which only the citation information is displayed, each linked to a single page containing all the fields of information relating to that report, and a link to an image of the document. The online database directs users to individual records, but offers no other ways to explore the reports. However, another way databases can be used to explore sources can be glimpsed in the total number of results that appears at the top of the list of search results.
A count of search results is an instance of how a database can group as well as retrieve information. Aggregating records allows them to be counted and patterns identified within the information. This simple data mining is a quantitative approach, but does not rely on the statistical analysis that characterized the quantitative history of the 1960s and 1970s. As Fred Gibbs and Trevor Owens note, “The mere act of working with data does not obligate the historian to rely on abstract data analysis. Historical data might require little more than simple frequency counts, simple correlations, or reformatting to make it useful to the historian looking for anomalies, trends, or unusual but meaningful coincidences.” Such data mining is a technique focused on discovering and framing research questions, rather than generating evidence to confirm or refute a hypothesis.Footnote 31
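The “simple frequency counts” Gibbs and Owens describe amount to very little code. As a sketch, using invented trial records whose field names are hypothetical:

```python
from collections import Counter

# Invented trial records; the field names are illustrative only.
trials = [
    {"offense": "larceny", "bail": True, "outcome": "acquitted"},
    {"offense": "assault", "bail": True, "outcome": "acquitted"},
    {"offense": "larceny", "bail": False, "outcome": "guilty"},
    {"offense": "larceny", "bail": False, "outcome": "guilty"},
]

# Simple frequency count: how many trials per offense.
offense_counts = Counter(t["offense"] for t in trials)

# Simple cross-tabulation: outcomes among defendants released on bail.
bailed_outcomes = Counter(t["outcome"] for t in trials if t["bail"])
```

Counts like these do not test a hypothesis; they surface patterns, such as an apparent association between bail and acquittal, that then frame questions for further research.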
The Prosecution Project described by Mark Finnane involves creating a database that can be explored to discover patterns in Australian trials. Like Eiseman and Seipp, Finnane describes the creation of this digital resource, but he also offers the results of some preliminary explorations of the database to identify patterns in the information. The information in the database is drawn from registers of trials, a source even more structured than case reports such as those with which Seipp worked. In the example given by Finnane, Victoria, “the data by early twentieth century includes name, committal date and location, trial date and location, judge, prosecutor and defending counsel, names of witnesses (including their title if a police officer or medical expert), plea and outcome including sentence when convicted and appeal outcome when that applies.” Registers of cases also have the advantage of being available for all the Australian states, a comprehensive source that offers the possibility of creating a database of all of the country's criminal trials. However, there is “sufficient variety in format that information across every category is not available for all 52,495 trials [currently in the database].” More broadly, the lack of narrative text that makes the registers relatively easy to convert into a database also makes them less rich in information than a case report.
Notwithstanding the limits on the types of information provided by the registers, Finnane's preliminary data mining identified a range of patterns that provide directions for further research: variations in judicial sentencing, in granting bail, in legal representation for defendants, in which offenses involved co-defendants, and in the proportion of defendants who pled guilty. For example, defendants were “significantly more likely to be released on bail if charged with crimes against the person than crimes against property,” and “being released on bail was very strongly associated with a higher likelihood of acquittal or the abandonment of the case by the prosecution.” For Finnane
such evidence prompts of course further inquiry into the reasons for that association—how far were doubts about the strength of a case already in play at the committal stage, thereby shaping a bail decision? Or, as contemporary criminologists who have discerned similar trends have speculated, did pre-trial release better enable defendants to present themselves and their case in a more favorable light, while prolonged detention hampered their efforts to properly consult with legal counsel and perhaps encouraged them to simply plead guilty?
The Digital Panopticon illustrates another way that databases can be used to explore digital sources, by facilitating record linkage.Footnote 32 The project aims to “trace the criminal and wider life histories of the 90,000 or so offenders sentenced at the Old Bailey to transportation to Australia or imprisonment within Britain between 1780 and 1925.” The evidence for those histories appears in more than forty different sets of judicial and civil records. Organized into databases, those documents can be linked by algorithms that use names to identify all the documents related to an individual. Ward and Williams do not discuss how that record linking is achieved; however, details of this central element of their digital method can be found on the project blog. Historical records are particularly challenging to link because of variations in spelling, the lack of unique identifiers, and imprecise dating. To increase the number of matches, the project has used algorithms that identify names that sound similar when spoken, but might be (accidentally) spelled differently, and that quantify the variation between spellings in order to match names that differ only slightly, with a single letter changed or omitted. Information from other sources is added to try to verify matches, and they are manually checked. An ongoing challenge of record linkage is trying to find the optimal, complementary balance of automated and manual work.Footnote 33
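The article does not specify the Digital Panopticon's actual algorithms, but the two families of techniques it describes, phonetic matching and small spelling variation, can be sketched with a simplified Soundex code and an edit-distance function. Both are standard illustrations, not the project's code.

```python
def soundex(name):
    """Simplified Soundex: names that sound alike map to the same code.
    (The full standard's special handling of h and w is omitted here.)"""
    groups = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
    def code(ch):
        for letters, digit in groups.items():
            if ch in letters:
                return digit
        return ""  # vowels and h, w, y carry no code
    name = name.lower()
    digits = [code(name[0])]
    for ch in name[1:]:
        d = code(ch)
        if d != digits[-1]:  # collapse runs of the same code
            digits.append(d)
    out = name[0].upper() + "".join(d for d in digits[1:] if d)
    return (out + "000")[:4]

def levenshtein(a, b):
    """Edit distance: single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def likely_match(a, b):
    """Candidate link: the names sound alike or differ by at most one letter."""
    return soundex(a) == soundex(b) or levenshtein(a.lower(), b.lower()) <= 1
```

A rule like this only generates candidate links, which, as in Ward and Williams's project, must then be verified against other sources and checked manually.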
The “life archives” produced by the Digital Panopticon’s record linking can be mined for quantitative patterns. In that way, as Ward and Williams note, record linkage combines quantitative breadth with qualitative depth. It allows the recovery of individual lives alongside the patterns shared by thousands of lives. Ward and Williams offer a preliminary exploration of patterns in the current, incomplete data, using data mining in the same manner as Finnane. Only an outline is visible in their article of the complementary approach, being pursued by Tim Hitchcock, of using record linkage to orientate digital history toward the lives of ordinary individuals, and give greater shape to “history from below.”Footnote 34 Somewhat oxymoronically, big data is a powerful tool for producing small stories.
Textual data that has not been converted into a database, that remains unstructured in documents, can also be mined: this subfield of data mining is known as “text mining.”Footnote 35 Whereas patterns in structured data can be found by counting fields in a database, with unstructured text it is necessary to identify what will be counted. “A computer doesn't know what a word is and certainly has no sense of what words might refer to,” as Stefan Sinclair and Geoffrey Rockwell note. “A computer ‘reads’—processes—text as a meaningless string of characters.” To demarcate words, computational tools look for spaces and punctuation, a process called “tokenization.”Footnote 36 In this issue, Tim Hitchcock and William Turkel use text mining to revise a narrative of court behavior being transformed in eighteenth and nineteenth century London by “the development of the ‘adversarial trial’, the changing role of legal counsel, the rise of ‘plea bargaining’ and summary justice, and the evolving functions of both judge and jury.”Footnote 37 They first explore what kind of evidence The Proceedings of the Old Bailey Online, 1674–1913 can provide of court behavior, particularly the extent to which the changing nature of the Proceedings as a document reflects changes in court behavior rather than other forces. The existing understanding of this source is based on small samples and impressions; it is not possible for a researcher to look at all of the 127,000,000 words and 197,745 trials that make up the Proceedings. Text mining allows a comprehensive view: a computer can process every word, making it possible “to locate patterns made invisible by the sheer volume of inherited text.”
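Tokenization itself takes only a line or two of code. The regular expression below is one common choice, not the Old Bailey project's actual pipeline; trial length, the measure Hitchcock and Turkel work with, then reduces to a token count.

```python
import re

def tokenize(text):
    """Split a raw character string into word tokens on whitespace and punctuation."""
    return re.findall(r"\w+", text.lower())

# An invented snippet in the style of a trial report.
report = "The prisoner was indicted for stealing, on the 3rd of May, two shillings."
tokens = tokenize(report)
trial_length = len(tokens)
```

Note that the tokenizer also lowercases the text, so "The" and "the" count as the same word, a common normalization when counting vocabulary.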
Hitchcock and Turkel use trial length—the number of words—in aggregate in each year, and in each trial, to explore the nature of the Proceedings. They found that in the eighteenth century, “the relationship between what was published and what occurred at the Old Bailey changed from decade to decade and from year to year,” which “makes their use as evidence for the rise of legal counsel and the adversarial trial difficult to sustain.” By contrast, text mining showed that the Proceedings give a fuller account of nineteenth century trials, contradicting the impressions of legal historians, which have led them to ignore those later records in favor of those from the eighteenth century. Text mining the nineteenth century Proceedings reveals “a mixture of longer and shorter trial reports between the early 1830s and 1850 with relatively few trials occupying,” and that trials that resulted in verdicts of not guilty in this period are “reported at much greater length than those resulting in a ‘guilty’ verdict.” Together with data mining that shows that guilty pleas and verdicts rose in the same period, Hitchcock and Turkel's text mining confirms the growing importance of plea bargaining. Explaining the forces that brought about this change requires pairing the use of computational tools with close reading and archival research.
David Tanenhaus and Eric Nystrom use another text mining technique, grouping documents together based on a measure of their similarity.Footnote 38 Algorithms measure similarity in different ways. Tanenhaus and Nystrom use the Jaccard coefficient, “which is the number of elements [words or phrases] the two documents have in common, divided by the number of elements found in both documents (with those appearing in both documents only counted once).”Footnote 39 Given the importance of questions about precedent and influence to legal history, and the specialized and repetitive vocabulary used in legal sources and settings, a computational tool that measures similarity has obvious value to legal historians. In this combination of counting and calculation, a computer is performing a form of close reading, and doing so with accuracy that a human would be hard pressed to match.
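On sets of words, the coefficient is a one-line calculation: the size of the intersection divided by the size of the union. Tanenhaus and Nystrom compute it over words or phrases drawn from statutes; the word-set version below is a minimal sketch.

```python
def jaccard(doc_a, doc_b):
    """Jaccard coefficient of two documents treated as sets of words:
    the shared words divided by all distinct words in either document."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    return len(a & b) / len(a | b)
```

Identical documents score 1.0 and documents with no words in common score 0.0, so the coefficient gives a directly comparable measure of how much language two statutes share.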
Tanenhaus and Nystrom's article offers only a glimpse of how measuring similarity can be used for discovery, with text mining playing a minor role in their argument. Although the frame of the article discusses the use of digital tools, the body is a traditional, dense narrative argument. It is digitally inflected history rather than digital history, to borrow a distinction used in teaching. Computational methods clearly play a greater role in their larger project—as they described in another recently published article—and the conclusion of this article lays out how the authors plan to develop their digital tools. However, in the narrative, text mining is used only to confirm the choice of Arkansas as a representative case study, and to point to possible sources for the new language that appeared in state law in 1991. In neither case did the text mining reveal strong similarities. Four state laws passed after the 1997 Arkansas law shared significant elements with that law. Testing phrases did not reveal a source for the new sections for the 1991 law; therefore, Tanenhaus and Nystrom tested for the frequency of individual words, and weighted those common in one document but rare in the corpus more heavily, an approach effective only in giving a sense of common ideas, not whether language was borrowed. That text mining found similarity with many of the transfer laws passed during the mid-1990s, but especially in Virginia, which helped focus the analysis on the role of prosecutors.
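The weighting described above, favoring words common in one document but rare across the corpus, is the familiar tf-idf scheme. The article does not give the authors' exact formula, so the version below is a standard sketch over invented fragments of statutory text with hypothetical names.

```python
import math
from collections import Counter

# Invented fragments standing in for statutes; the names are hypothetical.
corpus = {
    "ark_law": "transfer of juvenile offenders by the prosecutor",
    "va_law": "transfer by the prosecutor of certain offenders",
    "ny_law": "of pleading and practice in the courts",
}

def tf_idf(corpus):
    """Score each word in each document by its term frequency times
    the log of its inverse document frequency."""
    n = len(corpus)
    # Document frequency: in how many documents does each word appear?
    df = Counter(w for text in corpus.values() for w in set(text.split()))
    return {
        name: {w: tf * math.log(n / df[w]) for w, tf in Counter(text.split()).items()}
        for name, text in corpus.items()
    }

scores = tf_idf(corpus)
```

Words such as "the" and "of," which appear in every document, score zero, while distinctive words such as "juvenile" stand out, which is the sense in which this weighting reveals common ideas rather than borrowed language.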
Kellen Funk and Lincoln Mullen's work in progress on the transmission of the Field Code (New York's Code of Civil Procedure) also measures the similarity of documents, but not, as Tanenhaus and Nystrom did, for discovery. Instead, Funk and Mullen seek to answer a specific question: how did the Field Code influence other American jurisdictions? That approach requires knowledge of the nature of a set of sources. Rather than mining all nineteenth century statutes of procedural law, Funk and Mullen worked with only potentially relevant laws: 135 statutes from the nineteenth century, amounting to 7,700,000 words organized into 98,000 regulations. Measuring the similarity of the sections of those laws reveals patterns in how law migrated at several different scales of analysis. An overview of the relationships among codes as a whole shows a network that features several different branches. Looking at borrowings in each code reveals a variety of different patterns in how many sections each code borrowed from another code. Finally, to find small changes in the wording and substance of the law, sections are grouped based on their similarity to one another, regardless of which code they come from, putting them in the context of their particular variations, not particular codes. This clustering shows, for example, that the Field Code expanded witness competency so that it excluded only the insane and very young children, but legislators in California grafted on older racial exclusions from Midwestern states, which were then adopted in the codes of many other Western states. Similar bars on testimony by nonwhites appeared in Iowa's code, which relied exclusively on an understanding of a legal oath to establish competency, a model reproduced in a small number of other states.Footnote 40
Whereas Tanenhaus and Nystrom and Mullen and Funk use computational tools to measure the similarity of documents, Charles Romney measured the similarity of the context in which words appear as a means of discovering the similarity of concepts used in legal decisions.Footnote 41 This technique counts the words that co-occur with a key word. He argues that using this computational tool is an approach analogous to the contextual close reading of Skinner's Cambridge School of intellectual history. Calculating similarity helped him identify both persistent concepts and moments of conceptual change in the law. Specifically, he found in the Hawaii decisions a stable language of liberty across disparate fields of law and different periods of time, and the moment when the legal discourses about labor and habeas corpus crossed.Footnote 42
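Romney's measure can be sketched by building, for each key word, a count of the words that co-occur with it, and then comparing key words by the cosine similarity of those counts. The sentences, the context window (a whole sentence), and the weighting below are invented for illustration; the article does not specify his parameters.

```python
import math
from collections import Counter

# Invented sentences in the style of legal decisions.
sentences = [
    "the petitioner claimed liberty under the contract",
    "the laborer claimed liberty from the contract",
    "the writ of habeas corpus issued for the laborer",
    "the writ of habeas corpus protects liberty",
]

def context_vector(key, sentences):
    """Count every word that co-occurs with the key word in a sentence."""
    vec = Counter()
    for s in sentences:
        words = s.split()
        if key in words:
            vec.update(w for w in words if w != key)
    return vec

def cosine(u, v):
    """Cosine similarity of two co-occurrence vectors."""
    def norm(c):
        return math.sqrt(sum(n * n for n in c.values()))
    dot = sum(u[w] * v[w] for w in u)
    return dot / (norm(u) * norm(v))

sim_liberty_laborer = cosine(context_vector("liberty", sentences),
                             context_vector("laborer", sentences))
sim_liberty_writ = cosine(context_vector("liberty", sentences),
                          context_vector("writ", sentences))
```

In this toy corpus, "liberty" shares more of its contexts with "laborer" than with "writ"; at scale, shifts in such similarities over time are what allow moments of conceptual change to be located.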
It is striking that, to date, legal historians have not used topic modeling, the computational text analysis tool most widely used in the digital humanities.Footnote 43 These algorithms produce possible topics by identifying clusters of words that appear in proximity to each other, that is, in the same contexts. The algorithm divides the texts into as many topics as the user specifies to produce a model of probable topics, not a picture of the topics in a corpus. It is "the task of the interpreter [researcher] to decide, through further investigation, whether a topic's meaning is overt, covert, or simply illusory."Footnote 44 A tool for topic modeling The Proceedings of the Old Bailey Online, 1674–1913 does exist, but as yet no scholarship making use of it has appeared.Footnote 45 Tanenhaus and Nystrom rejected topic modeling because they were concerned that the results of their computational work be reproducible, and topic models "are not designed to give the same answers each time—that is, with the same inputs, and the same set of procedures, the outcomes can vary from one run to the next." However, the results of topic modeling are not answers, in the sense of providing evidence of the meaning of a set of sources. They are a place to begin, a pathway for discovery. Any answers they offer will come only when they are explored through close reading.
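Both the mechanics and the reproducibility concern can be seen in a toy model. The sketch below is a minimal collapsed Gibbs sampler for latent Dirichlet allocation, written in plain Python purely for illustration; real projects would use tools such as MALLET. The invented four-document corpus mixes habeas corpus and labor vocabulary, and the `seed` parameter makes visible why identical inputs can yield different topics from one run to the next.

```python
# Toy topic model: a minimal collapsed Gibbs sampler for LDA, written in
# plain Python for illustration only (real projects use tools like MALLET).
# The random initialization (controlled here by `seed`) is why topic models
# can give different results from one run to the next on the same inputs.
import random
from collections import defaultdict

def toy_lda(docs, n_topics=2, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    vocab = {w for d in docs for w in d}
    doc_topic = [[0] * n_topics for _ in docs]                # topic counts per doc
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics
    z = []  # current topic assignment of every token
    for di, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(n_topics)                       # random initialization
            z[di].append(t)
            doc_topic[di][t] += 1; topic_word[t][w] += 1; topic_total[t] += 1
    for _ in range(n_iters):
        for di, doc in enumerate(docs):
            for wi, w in enumerate(doc):
                t = z[di][wi]                                 # remove token, resample
                doc_topic[di][t] -= 1; topic_word[t][w] -= 1; topic_total[t] -= 1
                weights = [(doc_topic[di][k] + alpha) * (topic_word[k][w] + beta)
                           / (topic_total[k] + beta * len(vocab))
                           for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[di][wi] = t
                doc_topic[di][t] += 1; topic_word[t][w] += 1; topic_total[t] += 1
    # A "topic" is just a ranked word list; deciding whether it is meaningful
    # is the researcher's task.
    return [sorted(tw, key=tw.get, reverse=True)[:3] for tw in topic_word]

docs = [["writ", "habeas", "corpus", "liberty"],
        ["wages", "labor", "contract", "servant"],
        ["habeas", "liberty", "petition", "writ"],
        ["labor", "wages", "master", "servant"]]
print(toy_lda(docs, seed=1))
print(toy_lda(docs, seed=2))  # a different seed may group the words differently
```

A fixed seed makes any single run reproducible, but the model remains a starting point for discovery: the word clusters it proposes only become answers through close reading of the underlying sources.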
Given the current digital projects of legal historians, a more compelling explanation for why topic modeling has not been used in digital legal history is that there is no need for it. The nature of legal sources—their consistent structures, comprehensiveness, and distinctive and repetitive language—means that tools based on word frequency are effective in discovering what is in sources, in revealing semantic information. Historical sources more often lack structure and contain a variety of types of information and language, which makes text mining less revealing of their contents and meaning. It is historians exploring newspapers, magazines, and State Department memoranda and teleconference transcripts who have turned to topic modeling.Footnote 46
The nature of legal sources, together with the questions legal historians ask, likewise explains the relative lack of digital legal history projects using digital mapping tools. Drawn both by spatial questions and by the lack of digitized historical sources, historians have "turned to digital mapping to a greater extent than other disciplines in the digital humanities, adopting it as their favored computational tool."Footnote 47 Neither of the factors that draw historians in other fields to mapping has the same pull on legal historians. The relative wealth of digitized legal sources has instead led digital legal historians toward text mining, and the character of those sources has provoked questions about the law and legal practice. Locating London's Past extends the Proceedings of the Old Bailey Online, 1674–1913 into a mapping project that places the crimes tried at the Old Bailey in the city of London. However, that site has not been as generative of scholarship as the online Proceedings have been. One prominent digital history mapping project, Digital Harlem, which I created with colleagues at the University of Sydney, does map crime, using prosecutors' records of felonies, but it is not a legal history. The focus of Digital Harlem is everyday life, and our concern has been more with what legal records tell us about the neighborhood than with what they reveal about the law and legal process.Footnote 48 For example, our book about numbers gambling—Playing the Numbers: Gambling in Harlem Between the Wars—is a cultural history, exploring how the game permeated all aspects of life. The locations in which bets were placed, winnings collected, and games organized are only a small part of that story, as are the arrests that revealed that information, and the hearings and prosecutions that resulted.Footnote 49
Ng's article points to spatial dimensions of legal practice and the legal process that could be mapped. At this early stage of his project, Ng has mapped only the locations of lawyers' offices. The future directions for the project that he mentions are related to policing; however, there are other legal spaces and places that could be mapped, such as courts, prisons, probation and parole departments, and medical and psychiatric clinics. Mapping these places would highlight how the movement of individuals through the legal process was spatial as well as bureaucratic. Digital Harlem, for example, shows that the two magistrates' courts that processed cases from the neighborhood were located beyond the boundaries of black settlement, to the south and north, in overwhelmingly white neighborhoods. Tim Hitchcock's current work creating a three-dimensional (3D) model of the Old Bailey courtroom is effectively a spatial history of the interior spaces of the legal process, one that places the words recorded in the Proceedings in the spatial context in which they were spoken. The model highlights the changing layout of the court and the relative positions of the different speakers, and allows for an exploration of how the different actors in a legal proceeding heard themselves and each other.Footnote 50
Mapping the legal process does not require the tools and approaches of geographic information systems (GIS) that Ng uses. Much of the concern with geometric and mathematical precision and the technical challenges that occupy Ng in this article relates to working with GIS, which is ultimately unnecessary for the kind of point mapping that he is undertaking. His account reflects the constraints that limited historical mapping before the Internet more than the possibilities opened up by web mapping. Ng is trying to bring "the quantitative methods stemming from geographic information systems (GIS)…to bear on qualitative, historical and interpretive methods from the humanities." Mapping in the digital humanities involves something more, as Todd Presner and David Shepherd argue: a "reconceptualization of the significance of place in relationship to narrative, practices of representation, and digital technologies."Footnote 51 It has been catalyzed by the availability of easy-to-use alternatives to GIS, and by web map mashups such as Digital Harlem, projects built on top of platforms such as Google Maps. A map that effectively shows the patterns Ng discusses could be created with a fraction of the effort it took him, using a free web-mapping platform such as CartoDB, Neatline, Google Maps, or Google Earth. Such a map would be dynamic and interactive, with users able to modify what they see using options such as filters, time sliders, panning, and zooming. It would be a research tool: a means of discovering as well as displaying knowledge.Footnote 52
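The workflow such platforms assume can be as simple as exporting point records to GeoJSON, the standard interchange format that web-mapping tools can display directly. The sketch below, with invented records loosely modeled on the kinds of places discussed above, shows how little is involved compared with a full GIS pipeline.

```python
# Sketch of the lightweight alternative to GIS: export point records to
# GeoJSON, a format web-mapping platforms such as CartoDB and Google Maps
# can ingest directly. The records below are invented examples.
import json

records = [
    {"event": "lawyer's office", "year": 1921, "lat": 40.8116, "lon": -73.9465},
    {"event": "magistrates' court", "year": 1921, "lat": 40.8021, "lon": -73.9493},
]

def to_geojson(records):
    """Convert simple point records into a GeoJSON FeatureCollection."""
    features = [{
        "type": "Feature",
        "geometry": {"type": "Point",
                     "coordinates": [r["lon"], r["lat"]]},  # GeoJSON order is lon, lat
        "properties": {k: v for k, v in r.items() if k not in ("lat", "lon")},
    } for r in records]
    return {"type": "FeatureCollection", "features": features}

print(json.dumps(to_geojson(records), indent=2)[:200])
```

Uploaded to a web-mapping platform, a file like this yields the dynamic, filterable point map described above; the properties carried on each feature (here, `event` and `year`) become the basis for filters and time sliders.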
One way to think about digital mapping is as a form of data visualization “that uses levels of abstraction, scale, coordinate systems, perspective, symbology, and other forms of representation to convey a set of relations.”Footnote 53 Other forms of visualization are deployed in the articles in this issue, most notably the graphs that Hitchcock and Turkel produce to “allow all the available data to be viewed at a single glance, and to facilitate an open-eyed engagement with the patterns revealed.” I could identify only one legal history project centered on visualizations: William Thomas's O Say Can You See? Early Washington, D.C., Law and Family. Dynamic network graphs offer a way to “explore the web of litigants, jurists, attorneys, and community members present in the court records,” and “legal, occupational, family, and social connections to each other.”Footnote 54 However, as Fred Gibbs recently noted, “as the volume of digitized historical data grows, the visualizations that help make sense of data at large scales will play an increasingly significant role in our analyses and interpretations of the historical record. They have a new element of necessity.”Footnote 55 Network graphs have a particular potential for legal history, offering a visualization of relationships that can be applied to questions of precedent and influence.Footnote 56
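The citation networks that make graphs of precedent and influence possible reduce, at their simplest, to a directed graph of which cases cite which. The sketch below uses invented case names and treats in-degree (how often a case is cited) as a crude first measure of influence; projects like Starger's build far richer interactive graphs on the same underlying structure.

```python
# Minimal sketch of a citation network for questions of precedent and
# influence; the case names and citations are invented for illustration.
from collections import defaultdict

citations = [  # (citing case, cited case)
    ("Case C", "Case A"), ("Case D", "Case A"),
    ("Case D", "Case B"), ("Case E", "Case D"),
]

cited_by = defaultdict(list)
for citing, cited in citations:
    cited_by[cited].append(citing)

# In-degree (how often a case is cited) is a crude first measure of influence:
influence = sorted(cited_by, key=lambda c: len(cited_by[c]), reverse=True)
for case in influence:
    print(case, len(cited_by[case]))  # Case A tops the list with 2 citations
```

Layered with dates, courts, and the treatment of each citation (followed, distinguished, overruled), this simple structure becomes the kind of dynamic network graph that O Say Can You See? uses to explore webs of litigants and connections.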
Although data visualization has only a small footprint to date in digital legal history, it is poised to become a prominent part of legal research. Ravel Law is undertaking the digitization, discussed earlier, of Harvard Law Library's collection of American case law to obtain data for a visualization product that creates visual maps of search results and networks of cases, as well as data visualizations of judges' careers, showing all their decisions and citations and the specific language that they use.Footnote 57 That product is only one of many launched recently to visualize legal research.Footnote 58 Individual legal researchers are also developing visualization tools. Colin Starger, for example, has developed software to create interactive visual citation networks for the United States Supreme Court.Footnote 59
These new kinds of data visualizations are helping to renew interest in the possibilities for publishing history in digital formats. Print forms of scholarly communication struggle to accommodate visualizations. The problem is not only the cost of reproducing images in scholarly journals, especially the color images needed to capture visualizations. It is also that many of those visualizations are dynamic, and readers need to be able to interact with them to explore and assess their value as evidence. Currently, the solution is to have authors create and host online supplements.Footnote 60 Taken together with researchers' increasing reliance on online sources—which remains somewhat occluded by the widely shared practice of citing print editions of sources even when an online edition was used—and on accessing secondary sources online, the growing use of visualization means that print forms are not able to accommodate the core elements of much historical scholarship. The limits of what can be done in print are coupled with a new recognition of what is possible in digital forms. Chiel van den Akker has argued that "the digital environment supports, indeed demands, new narrative forms that are more participatory, dialogic, procedural, reciprocal, and spatial."Footnote 61 Thanks to a funding initiative launched by the Andrew W. Mellon Foundation, several university presses are developing digital platforms for scholarly publication. These projects range from a system and framework for publishing born-digital scholarship and a platform for multimedia online journals to iterative, networked electronic versions of scholarly monographs that appear alongside the print edition of the book, platforms for digitally enhanced monographs, and open-access monographs.Footnote 62 However, for the moment, the nearest most historical scholarship gets to a digital format is the PDF file in which print articles are delivered online.
Searching for digital legal history reveals that what currently distinguishes it as a field of digital history is its focus on text mining. Legal sources skew the field toward that approach, which offers a way of exploring questions about the reuse of text and concepts that characterizes key facets of the legal process and legal publication. A wealth of opportunities exists to expand the use of this approach in analyzing bills, statutes, and judicial opinions, the sources discussed in this issue, and, beyond them, trials and treatises. To do so, legal historians must join with academic librarians in pushing commercial vendors to provide bulk access to their products and more information about the quality of their contents. It will also be necessary for legal historians to create and curate digital collections of their own, and here again they will need to work with librarians to ensure that their projects meet data standards. They also need to work with libraries and other institutions undertaking digitization to bridge the gap between digital collections and research data. A team at the Roy Rosenzweig Center for History and New Media that Sean Takats and I are leading is developing software that will help connect the digitization work of researchers and institutions. Tropy provides researchers with a tool to manage the digital photographs that they take in archives, and facilitates sharing that material with the archives that hold those sources.Footnote 63
Other forms of digital history have yet to engage the attention of legal historians, although that may be about to change. Through the multifaceted Digital Panopticon project, the team that played a pivotal role in introducing text mining to legal history through its creation of The Proceedings of the Old Bailey Online, 1674–1913 is poised to play a similar role with regard to data visualization, mapping, and 3D modeling. Developments in legal research are likewise putting visualization tools in front of legal historians. The results will likely echo what happened when LexisNexis and Westlaw introduced computerized databases: legal historians will have access to computational tools ahead of scholars in other fields. As they start using those tools, they need to avoid the mistake we made when we started searching databases as part of our research: legal historians need to explore how new tools are transforming their practice. And while they are doing that, it is important to start discussing how we as historians have been using searching to do legal history, and to bring our current digital research practices and their consequences into our scholarship.