The use of digital equipment, methods, and techniques for recording, describing, and analyzing data has made it possible for archaeologists to take advantage of new sources of data and research opportunities. Digital methods and techniques have increased the amounts and intensity of field and laboratory data collection. However, few have confronted the challenges of digital data access, reuse, and preservation after data have been collected, described, and analyzed.
Before the turn of the twenty-first century, Hedstrom (1998) described the challenges of digital data preservation faced by libraries. Ten years ago, Kintigh (2006) summarized the situation within archaeology. Waters (2007:8) described the increasing availability and “greatly expanded . . . scale of humanistic, social, and scientific data for scholars to digest.” He paired the opportunities presented by digital data with a challenge that their use creates, noting:
Individually and collectively, [scholars] must mobilize their resources to create a dependable, deeply scaled, and flexible infrastructure to help faculty and students interact with the electronic content in all the ways associated with rigorous scholarship, including discovering evidence, aggregating it, arranging and editing it for use, analyzing and synthesizing it, and disseminating results through reports and teaching.
Other recent articles (Hilton et al. 2013; Rumsey 2016; Science 2011) describe the challenges and highlight the need to access and preserve the vast amount of digital data produced by science and humanities disciplines. Reviewing this topic broadly, York and his colleagues (2016:3–4) warn of a “‘stewardship gap’ between the amount of valuable sponsored research data that is produced, and the amount that is effectively stewarded.” Recognizing the key roles of research and publication in the scientific enterprise, Lord and his colleagues at the United Kingdom's Digital Curation Centre (2004) argue strongly for data curation as a third key component in science and scholarship. Figure 1 illustrates the relationship of data curation to the well-recognized “research” and “publication” processes and their interaction. Digital data repositories are a central part of scientific investigations, serving as sources of background research and new hypotheses, as well as curation facilities in which newly generated research data are deposited at the end of an investigation. Repositories are needed not just for data preservation, but to ensure that data are easily discoverable, accessible, and usable as sources for new research.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20241103140419-81110-mediumThumb-S2326376817000183_fig1g.jpg?pub-status=prepub)
FIGURE 1. A model of integrated research, publication, and data curation processes (Lord et al. 2004).
THE DIGITAL DATA DELUGE IN ARCHAEOLOGY
Altschul and Patterson (2010:297) estimated that as much as $1 billion is spent annually in the United States by public agencies and private sector companies involved in cultural resource management (CRM). Although it is impossible to precisely quantify the amount of data produced by these investigations, Altschul's (2016) analysis of reports by federal agencies for the years 1985 to 2012 (Departmental Consulting Archeologist 2009:23–42, 2010:40–74; National Park Service [NPS] Archeology Program 2016) shows that staggering amounts of archaeological data have been collected. During this 27-year period, federal agencies report 851,271 archaeological field studies that surveyed more than 140,000,000 acres and recorded more than 880,000 archaeological sites. Among these field studies are 33,327 data recovery projects. On average, that is more than 30,000 field studies per year, including over 1,000 excavations. These numbers are substantial underestimates, since not all federal agencies reported or fully reported their cultural heritage work, and no state or local projects are included.
Of course, each of these 800,000-plus studies generates one or more digital files (e.g., reports, photographs, and data sets) that describe, analyze, or interpret the resources investigated. Even if each field study reported in Altschul's incomplete inventory produced only a single report, each site merited a single photograph, and each excavation merited a report, a database, and 20 photographs, that would total more than 2,000,000 digital files, with substantial amounts added annually. Suffice it to say that we have a huge backlog of legacy data, and each year, enormous amounts of new digital data are created, all of which need to be managed to ensure access and preservation for future use (Archaeology Data Service and Digital Antiquity [ADS and DA] 2013:9–60, 2016; Clarke 2015:313–318, 324; Lord et al. 2004; Rumsey 2016).
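The lower-bound estimate above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative, the site count is treated as exactly 880,000 although the text says "more than," and each excavation's report is assumed to be counted in addition to its field study report.

```python
# Counts reported for federal agencies, 1985 to 2012 (taken from the text).
FIELD_STUDIES = 851_271
SITES_RECORDED = 880_000          # text says "more than 880,000"; used as a floor
DATA_RECOVERY_PROJECTS = 33_327   # excavations
YEARS = 27

# Minimum output per item, following the scenario in the text.
FILES_PER_STUDY = 1                # a single report
FILES_PER_SITE = 1                 # a single photograph
# Assumption: the excavation report is separate from the field study report.
FILES_PER_EXCAVATION = 1 + 1 + 20  # report, database, 20 photographs


def minimum_file_count() -> int:
    """Lower-bound count of digital files under the stated scenario."""
    return (
        FIELD_STUDIES * FILES_PER_STUDY
        + SITES_RECORDED * FILES_PER_SITE
        + DATA_RECOVERY_PROJECTS * FILES_PER_EXCAVATION
    )


total = minimum_file_count()
print(f"Minimum files: {total:,}")
print(f"Field studies per year: {FIELD_STUDIES / YEARS:,.0f}")
print(f"Excavations per year: {DATA_RECOVERY_PROJECTS / YEARS:,.0f}")
```

Even if the excavation report is not counted separately, the total stays above 2,000,000, so the conclusion in the text does not depend on that assumption.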
The real problem is not that a great deal of digital data is being generated. The real problem is that most data resulting from older and current investigations are difficult or impossible to discover, access, and use. Our discipline's data most certainly suffer from the data “stewardship gap” (York et al. 2016).
Of course, vast quantities of this digital data are never curated and are lost when hard drives fail, servers are replaced, or investigators retire or die. If the digital documents, images, and data sets from cultural heritage studies are curated at all, they are usually deposited with the artifacts and paper records in repositories that focus on maintaining physical objects. Few artifact repositories deal effectively with digital data, either to provide access or to ensure long-term preservation (Childs and Kinsey 2004; Childs et al. 2010:196–197; Faniel and Yakel 2017:109; Watts 2011). Typically, the media containing the digital data are simply numbered, boxed, and curated as physical objects. The data are not accompanied by sufficient metadata for others to find or use them, nor are they easily discoverable or accessible online. They are not checked for readability or migrated as new digital file formats become standard.
The millions of digital files are not only the products of enormous public investments in cultural heritage management; they constitute a treasure trove of information. However, in order to exploit this incredible potential, we must be able to discover the existence of relevant files, and we must be able to acquire them in formats that contemporary software can use. In addition, digital files must be accompanied by the information that makes them discoverable, meaningful, and reusable (ADS and DA 2013:46–52, 2016; Faniel and Yakel 2017:115–116; Faniel et al. 2013:297–303; Kansa et al. 2014; Kintigh 2010; Kintigh et al. 2015:8–11; McManamon 2014). If reports were searchable only by title or author, relevant documents would often not be found. What use can be made of a digital photograph labeled only with an automatically generated image number? A data set with numeric codes for pottery types is useless in the absence of a key explaining the codes. For digital data to be discoverable and useful in future research, they must be documented by and linked with rich contextual and descriptive metadata.
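The point about coded data sets can be made concrete with a small sketch. The codes, type names, and field names below are invented for illustration and do not come from any real data set or repository; the sketch only shows how a coding key turns opaque numbers into usable values, and how missing documentation stays visible when a code has no entry.

```python
# Hypothetical coding key: the codes and type names are invented for
# illustration only.
POTTERY_TYPE_KEY = {
    1: "plain ware",
    2: "corrugated ware",
    3: "black-on-white",
}


def decode(records, key):
    """Return copies of the records with a readable 'pottery_type' label.

    Codes that are missing from the key are labeled 'UNDOCUMENTED' rather
    than dropped, so gaps in the documentation remain visible.
    """
    decoded = []
    for rec in records:
        label = key.get(rec["type_code"], "UNDOCUMENTED")
        decoded.append({**rec, "pottery_type": label})
    return decoded


rows = [
    {"sherd_id": "A-1", "type_code": 2},
    {"sherd_id": "A-2", "type_code": 9},  # code 9 is not in the key
]

for row in decode(rows, POTTERY_TYPE_KEY):
    print(row)
```

Without the key (an empty dictionary in this sketch), every record comes back as undocumented, which is the situation the paragraph above describes.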
Most physical collections repositories lack the expertise, procedures, and resources to systematically acquire, make accessible, manage, and preserve digital files and associated metadata necessary for data sharing and reuse. Disciplinary digital repositories have emerged in archaeology and in other fields to fill these gaps. They have the expertise to provide the stewardship the data require in order to be exploited to advance knowledge. This stewardship is even more important in archaeology, where resources are nonrenewable and data often literally irreplaceable.
What must be done to transform the mass of existing data from an unusable backlog into an actively accessed, rich, and usable archive? What must be done to prevent adding annually to an inaccessible and unusable backlog? Without effective intervention, curation, and active stewardship, these data will soon be forgotten and unavailable for future uses. The investment in human energy, intellectual focus, and funding will be lost. The data and information, both those in the existing backlog and those created by current investigations, must be deposited in a digital archive where they can be discovered, accessed, preserved, and used.
Digital data repositories focused on archaeology and cultural heritage, such as the Digital Archaeological Record (tDAR; Digital Antiquity 2016) in the United States and the Archaeology Data Service (ADS 2017) repository in the United Kingdom, were established as responses to these problems. They constitute established infrastructure that is necessary if we are to take advantage of the wealth of data from past investigations and integrate it with information being created now and in the future. However, while this infrastructure is essential, so are changes in archaeological practice that will ensure the regular and systematic movement of data into these repositories (ADS and DA 2013, 2016; Kintigh et al. 2015; McManamon 2014).
If this tremendous store of data were easily discoverable and accessible, it could be used to streamline the historic preservation review process required for most public undertakings, to create informative new educational materials, to provide critical information for comparative and synthetic research addressing important scientific questions, to support traditional cultural practices, and generally to contribute to the advancement of knowledge.
ACCESSING AND PRESERVING ARCHAEOLOGICAL AND CULTURAL HERITAGE DIGITAL RESOURCES: THE CURRENT LANDSCAPE
The Center for Digital Antiquity (Digital Antiquity) is one of a number of organizations that provide access to digital cultural heritage data. Some focus on particular regions, types of sites, or topics and aggregate relevant data and information in databases that are Internet-accessible. Some, like Digital Antiquity and the ADS in the United Kingdom, also provide long-term data curation and preservation, while others, such as the Chaco Research Archive (CRA) and OpenContext, utilize archival services provided by existing institutional repositories at major research universities.
An example of a project focused on providing access to data is the CRA (Heitman et al. 2016), an online archive and database that integrates much of the widely dispersed archaeological data collected from Chaco Canyon from the late 1890s through the first half of the twentieth century. The CRA (CRA 2017) includes thousands of images, data, reports, and other documents from investigations of ancient sites, architecture, and artifacts related to the ancient culture found in and near Chaco Canyon in the U.S. Southwest. A topically oriented project focused on data sharing is the Digital Archaeological Archive of Comparative Slavery (DAACS). DAACS (2016) is designed for intersite comparative studies focused on slavery and the lives of enslaved people in the United States and the Caribbean. Both DAACS and CRA focus on providing easy access to and fostering research on particular kinds of archaeological and cultural heritage resources and topics.
OpenContext focuses on publishing archaeological data sets, making them more accessible and advocating for greater professional attention to data archiving and publication (Atici et al. 2013; Kansa et al. 2014). Managed and supported by the Alexandria Archive Institute (AAI), OpenContext is “an open access, web-based publication system for archaeology and other field sciences” (AAI 2017). At present, OpenContext lists data from 98 projects, some complete and others in various states of editing. Unlike the CRA and DAACS, OpenContext does not concentrate on one area or topic. Projects in OpenContext show a concentration in the Middle East, but a number are in Europe and the Americas. OpenContext concentrates its publication on data sets, in particular archaeofaunal data. In a series of recent articles, OpenContext leaders Eric C. Kansa and Sarah Whitcher Kansa and colleagues describe research involving reanalysis of archaeofaunal data sets created by different researchers and research projects (e.g., Arbuckle et al. 2014; Atici et al. 2013; Kansa and Kansa 2013). The method for data publication advocated by OpenContext is for data sets submitted by researchers to be reviewed and modified as necessary for structural coherence, consistency, and general and domain-specific quality by OpenContext editors before they are published (Atici et al. 2013:675–677). Once published, the OpenContext data sets are archived with the California Digital Library.
Digital Antiquity and the AAI are sharing links between archaeological site data common in both the Digital Index of North American Archaeology (DINAA) project in OpenContext and tDAR resources.
Institutionally focused data repositories are available for faculty, research staff, and students at a number of major research universities, such as the University of California system, the University of Michigan, Harvard University, Johns Hopkins University, and the University of North Carolina. These “institutional repositories” focus on archiving research data created by their faculty, research staff, and students (Johnson 2002; Lynch 2003). While they may provide access to the archived data, the breadth and ease of access may be limited by the extent to which the data are exposed to external search engines and any institutional limitations placed on access to the data. Lynch (2017) has raised a number of concerns regarding institutional repositories. He notes that faculty use of such repositories is spotty and that as universities come to realize the staff and technological commitment required to run a dedicated, high-quality digital repository, smaller institutions may find the administrative and financial commitment too great (Lynch 2017:126–127). Individuals, CRM firms, and other organizations outside of universities typically may have no way to deposit the digital products of their contracts or other research in institutional repositories. It also may be difficult for external users to find relevant data in institutional repositories, because the diversity of research data created by the wide range of disciplines encompassed by major research universities makes it impossible to supply the level of metadata detail needed for outside scholars to find and access the data (Ember et al. 2013:5).
Domain repositories provide digital data access, curation, and preservation designed to serve specific scientific and scholarly communities (Cambridge Concord Associates 2013; Ember et al. 2013). They are particularly valuable because they combine “domain-specific scientific knowledge, expertise in data stewardship, and close relationships with scientific communities” (Ember et al. 2013:2). Digital Antiquity's tDAR repository is an example, as is the Archaeology Data Service's repository in the United Kingdom (Archaeology Data Service 2017; Richards 2017). Both tDAR and the ADS repository focus on providing access to archaeological data and its reuse, as well as actively curating data for their long-term preservation.
CREATING THE CENTER FOR DIGITAL ANTIQUITY AND tDAR
History
In the last years of the twentieth century, the difficulty of accessing and integrating information from multiple archaeological projects was a major obstacle to the synthetic research being pursued by a group of Arizona State University (ASU) researchers. This ASU team of archaeologists, computer scientists, and digital librarians initiated an effort to develop general-purpose computational tools for synthesizing data sets recorded by multiple investigators using different coding conventions. Their early efforts resulted in a National Science Foundation (NSF)–funded workshop in 2004. The 31 workshop participants, drawn from archaeology and computer science and representing a variety of universities and public agencies, developed recommendations concerning archaeology's need for information infrastructure (Kintigh 2006). The report's recommendations were endorsed by the Society for American Archaeology (SAA), the Society for Historical Archaeology, and the American Association of Physical Anthropologists. Based on these recommendations, NSF funded development of a prototype information infrastructure (from which tDAR ultimately developed) to facilitate synthetic and comparative research.
This work to articulate and address discipline-wide concern caught the eye of program officers at the Andrew W. Mellon Foundation. The Mellon Foundation has supported projects involving archaeological data, including the CRA (2017; Heitman et al. 2016), the DAACS (2016), and scanning of records and notes of decades of excavations at the Athenian Agora and Ancient Corinth (American School for Classical Studies in Athens 2016). In 2006, the foundation's Scholarly Communications program convened a multi-institutional group of archaeologists to consider the feasibility of a large-scale digital repository for archaeological data. The scope of the repository was to be comprehensive, not limited to a single site, time period, culture, or region of the world. Keith Kintigh (ASU), Tim Kohler (Washington State University), Fred Limp (University of Arkansas), and Dean Snow (Pennsylvania State University) were among the participants. In 2007, the Mellon Foundation funded a planning grant to develop a proposal for a self-sufficient, disciplinary, digital repository and an organization with a structure and business model that could support it. Jeff Altschul (SRI Foundation), John Howard (ASU Associate University Librarian), and Julian Richards (ADS and University of York) joined those listed above as coauthors of the proposal.
The following year, the Mellon Foundation funded the implementation proposal submitted by ASU, with Kintigh as the lead principal investigator, in partnership with co-PIs from the additional institutions involved in the original planning. The Mellon grant supported the establishment of Digital Antiquity at ASU to oversee the development of tDAR and manage it as a domain repository for archaeology and cultural heritage. The grant provided initial support to hire staff with the administrative, archaeological, informatics, and programming expertise necessary to transform the NSF-funded prototype into a usable, publicly available, global digital repository.
The tDAR Repository
Since its initial development as an experimental tool for data integration, tDAR has grown into a full-fledged digital repository for storing, discovering, accessing, and using archaeological and cultural heritage documents, images, data sets, and other digital materials. This development effort started by identifying basic design requirements for the repository, developing the software to meet those requirements, and then refining the application to improve and simplify the user experience. Because of the enormous scale of the backlog and the large annual flow of data, the application was designed from the beginning to allow users to contribute data directly, without staff intervention, through logical workflows and easy-to-use forms for uploading and managing a user's materials. Additional features assist in the management of the repository, including tools such as “collections” and “projects” that allow for the organization, management, and display of groups of records. In 2010, tDAR became an operational repository with advanced tools to further archaeological and cultural heritage data management and research.
The tDAR code is open-source and licensed under the Apache 2.0 license. Where possible, the tDAR development team leverages existing standards and conventions, employing common formats and tools rather than inventing new ones. Digital Antiquity is working to ensure that tDAR complies with the full OAIS standard (CCSDA 2012). The repository is a J2EE application written using the Hibernate, Spring, Struts2, jQuery, and Bootstrap open-source libraries to assist in the development of basic infrastructure components. To ensure application quality, the tDAR development environment contains over 1,100 tests that must be passed before a new release is deployed. Wherever feasible, the application is designed to be modular so that components can be replaced as more modern ones are developed.
Digital Antiquity has a professional staff, including an executive director, director of technology, archaeologist/project managers, digital curators, administrative support, and programmers who develop and maintain tDAR's code, features, user interface, curation workflows, and content. Digital Antiquity is overseen by an independent Board of Directors, which includes the archaeologists who were involved in the earlier planning, as well as experts in digital libraries, finance, law, and nonprofit organization management. The organization's mission is to extend knowledge of the human past and improve the management of cultural heritage by permanently preserving digital archaeological data and supporting their discovery, access, and reuse. Digital Antiquity further seeks to transform the practice of archaeology such that the digital data, reports, images, and records, accompanied by appropriate metadata, are routinely discoverable, accessible, used, and preserved.
The immediate operational objectives of Digital Antiquity are to (1) provide long-term preservation and access to the documents and data in tDAR; (2) expand the user-contributed content in tDAR and enlarge the community of tDAR users; (3) build a consistent, regular revenue stream sufficient to sustain the center and repository financially; (4) enhance the experience and research capabilities for tDAR users through a robust and scalable technical foundation; and (5) ensure that the administrative and organizational framework provides a strong foundation for growth and sustainability. As expressed in the organization's mission and objectives, commitment to improvements in professional practice regarding data management and digital curation means that Digital Antiquity and associated experts invest considerable time in promoting best practices. Professional publications and particularly presentations and workshops are among the activities undertaken (e.g., Ellison 2017; Ellison and Brin 2015; Ellison et al. 2016; Kintigh 2010; McManamon 2014, 2017; McManamon and Flores 2016; McManamon and Kintigh 2010a; McManamon and Richards 2015, 2016; McManamon et al. 2010).
Funding and Sustaining Digital Antiquity and tDAR
Adequate funding to support a digital repository is, of course, crucial. This is a challenge for archives and data repositories in many fields. Digital Antiquity's business plan is informed by the growing body of research on sustaining a digital repository (e.g., Guthrie et al. 2008; Maron 2014; Maron et al. 2009). In particular, surveys of existing institutions show that a diverse strategy, securing revenue by providing a range of services and/or from a number of different sources, is common and may be a necessity for financial adaptation (Erway 2012; Erway and Rinehart 2016). Potential sources listed by Maron and colleagues (2009:21–26) include subscriptions, licensing to publishers and users, custom services and consulting, corporate sponsorships and advertising, deposit/upload fees, endowments, grants, and other sources of donated funds.
The Digital Antiquity business model identifies four revenue streams: full-service digital curation; self-service curation; grants; and institutional funds and services. Since Digital Antiquity was created in 2009, most new revenue has come from full-service curation for public agencies that use tDAR to preserve and make their data accessible because they do not have the expertise or staff to do so on their own. Full-service digital curation is charged on an hourly basis for staff time, plus per-file upload fees. As part of full-service projects, Digital Antiquity staff provide the data management and digital curation consultation and carry out related activities for clients. The activities may involve organizing data and files, technical checking of file formats and integrity, file uploads, metadata drafting and editing, and, in some cases, specialized data management or digital curation programming.
We define self-service digital curation as metadata record creation and file upload to tDAR by data contributors without any intervention by Digital Antiquity staff. The tDAR website, which provides a set of metadata record templates and instructions for uploading files, enables this direct, “do-it-yourself” digital curation. Once the metadata records are created and files uploaded, it also is easy for the depositor to manage data. Self-service curation has the greatest potential for long-term revenue growth, as more contributors upload their own data. Digital Antiquity charges $10 per file (up to 10 MB) for individual files. The charge is discounted to $5 per file for purchase of 100 or more files. Grants, notably from the Andrew W. Mellon Foundation, NSF, and, very recently, the National Endowment for the Humanities, directly support tDAR software development and Digital Antiquity operations. Institutional support includes funding and services provided to Digital Antiquity by ASU and the SAA in support of its general operations, not specific project activities.
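The self-service fee schedule described above can be written as a short function. This is a sketch of one reading of the stated prices, not Digital Antiquity's billing logic: it assumes the discounted rate applies to every file in a purchase of 100 or more, and it does not model files larger than 10 MB, which the text does not price. The function and constant names are illustrative.

```python
FULL_RATE_USD = 10        # per file (up to 10 MB), as stated in the text
DISCOUNT_RATE_USD = 5     # per file for a purchase of 100 or more files
DISCOUNT_THRESHOLD = 100  # minimum purchase size for the discount


def self_service_fee(n_files: int) -> int:
    """Estimated self-service upload cost in US dollars.

    Assumption: the discount applies to all files in a qualifying purchase.
    """
    if n_files < 0:
        raise ValueError("n_files must be non-negative")
    if n_files >= DISCOUNT_THRESHOLD:
        rate = DISCOUNT_RATE_USD
    else:
        rate = FULL_RATE_USD
    return n_files * rate


for n in (1, 50, 99, 100, 250):
    print(n, self_service_fee(n))
```

Under this reading, a purchase of 100 files costs less than a purchase of 99, so a depositor with 50 or more files pays no more by buying at the bulk threshold.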
USING tDAR
Digital Antiquity provides digital curation services to individual researchers and organizations that require a repository where they can manage access to and preservation and use of their data. Data in tDAR have been contributed by over 350 individuals and organizations, affiliated with a wide range of colleges and universities, private sector consulting firms, and public agencies; for example, the University of Washington, Washington State University, the University of Arkansas, the Pennsylvania State University, Harvard University, Brockington Associates, Statistical Research Inc., the PaleoResearch Institute, the New York State Museum, the North Carolina State Archaeologist's Office, the Bureau of Land Management, the U.S. Army Corps of Engineers, the U.S. Air Force Environmental Center, and some units of the NPS.
Data are deposited in tDAR for many reasons. Agencies, organizations, and individuals use the repository and services provided by Digital Antiquity to manage documents and data for which they are responsible, including substantial amounts of gray literature and data not destined to appear in traditional scholarly publications. Some contributors deposit data to meet their professional and ethical obligations, or grant or permit requirements for open access to documents or research data. Some tDAR depositors create collections of data to share among their research team members who are located at different institutions.
Depositing Data in tDAR
Depositing data in tDAR enables easy discovery and access and secures the long-term preservation of diverse forms and formats of digital archaeological data, including the most common digital formats for documents, data sets, geospatial data, and 2D and 3D images. It equally serves both newly generated and legacy data. Templates available on the website (Digital Antiquity 2017) guide data contributors through a comprehensive process of metadata entry and file upload.
Digital files deposited in tDAR can be documented thoroughly by administrative and technical metadata for preservation, descriptive metadata for effective resource discovery, and detailed semantic metadata needed to permit sensible scientific reuse of the data. tDAR's metadata includes general and bibliographic components incorporating Dublin Core and the Library of Congress's MODS metadata standard. The technical metadata meets the Library of Congress's PREMIS metadata standard for capturing technical, preservation, and rights information. tDAR includes metadata fields tailored for archaeology such as site numbers, feature types, cultural terms, period names, and investigation types. For data sets, metadata includes detailed information documenting data set tables, columns, and nominal values. Detailed guidance regarding the creation of tDAR metadata records and uploading of data is available in the tDAR “Help and Tutorials” web pages (Digital Antiquity 2017).
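The layers of metadata described above (descriptive, archaeology-specific, and column-level documentation for data sets) can be pictured as a simple record structure. The sketch below is hypothetical: the class names, field names, and the gap check are assumptions made for illustration and are not tDAR's schema, templates, or API.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnDoc:
    """Documentation for one data set column (hypothetical structure)."""
    name: str
    description: str
    coded_values: dict = field(default_factory=dict)  # code -> meaning


@dataclass
class ResourceRecord:
    """Illustrative metadata record; not tDAR's actual schema."""
    # General descriptive fields, in the spirit of Dublin Core
    title: str
    creators: list
    year: int
    # Fields tailored for archaeology, as listed in the text
    site_numbers: list = field(default_factory=list)
    period_names: list = field(default_factory=list)
    investigation_types: list = field(default_factory=list)
    # Column-level documentation for data sets
    columns: list = field(default_factory=list)


def documentation_gaps(record: ResourceRecord) -> list:
    """List the fields a later user would find missing or undocumented."""
    gaps = []
    if not record.title.strip():
        gaps.append("title")
    if not record.creators:
        gaps.append("creators")
    if not record.site_numbers:
        gaps.append("site_numbers")
    for col in record.columns:
        if not col.description.strip():
            gaps.append(f"column:{col.name}")
    return gaps


example = ResourceRecord(
    title="Faunal counts, hypothetical site",
    creators=["A. Researcher"],
    year=2016,
    columns=[
        ColumnDoc("taxon", "Taxonomic identification"),
        ColumnDoc("elem", ""),  # undocumented column
    ],
)
print(documentation_gaps(example))
```

A check of this kind run before deposit would flag the undocumented column and the missing site number while the person who knows the answers is still available.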
Digital Object Identifiers (DOIs) and persistent URLs are assigned automatically to all records. These unique designations provide a permanent means of citing and locating the records. tDAR metadata enables explicit crediting not only of authors or creators of files but also of a variety of other individual and institutional roles, from laboratory analyst to funding agency to landowner.
Access to tDAR content is provided through a web interface with basic and advanced (including spatial) search capabilities. Content is indexed by Google and other major search engines. Apart from legally protected confidential data and data temporarily embargoed by their contributors, all data are freely available over the web to registered tDAR users. Metadata records are publicly available without registration. Legally protected site locations are obfuscated in tDAR's geographical search and display. Data that are not designated as “confidential” or “embargoed” by their depositor may be reused, redistributed, or transformed, subject only to the provision of appropriate credit to the author(s) or data creator(s), proper citation, and indication of any changes made (Creative Commons Attribution 3.0 Unported License). All data downloads include suggested citation information. tDAR's unique data integration tools advance scholars’ capacities for comparative and synthetic research (e.g., Spielmann and Kintigh 2011).
tDAR Content and Uses
Content grows daily in tDAR. As of December 2016, tDAR includes more than 383,000 records of archaeological reports and other documents (over 9,800 of these include a digital full-text file), 20,000 images, 1,100 data sets, 450 geospatial data sets, and 150 landscape (lidar) or 3D object scan files (Figure 2). Data contributors have created over 900 “projects” in tDAR to organize their data and for the ease of data entry that this option provides. Over 2,000 “collections” have been created by data contributors to organize their data administratively or functionally.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20241103140419-91806-mediumThumb-S2326376817000183_fig2g.jpg?pub-status=prepub)
FIGURE 2. Cumulative growth of tDAR contents by year (2009 through end of 2016).
Most of the data in tDAR are from North America. However, the scope of the repository is international, spanning the archaeological and historical records of all continents (Figure 3). In addition to U.S.-based organizations, others located outside of the United States are using tDAR for their digital data management needs. In 2016, Digital Antiquity agreed to serve as the repository for the Field Acquired Information Management Systems Project (FAIMS) of Australia (FAIMS 2017). Digital Antiquity also established an agreement with the Museum of Ontario Archaeology and the University of Western Ontario to explore how their Sustainable Archaeology Program can use tDAR as a repository for access to and preservation of digital information related to its physical collections.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20241103140419-89593-mediumThumb-S2326376817000183_fig3g.jpg?pub-status=prepub)
FIGURE 3. Schematic view of global contents of tDAR showing some examples.
Contributors use tDAR to preserve their information, make it available, and otherwise manage access to and use of it. Professional colleagues report that they use tDAR data for various types of background research. Data depositors add to and use tDAR content for research on a number of topics, including research related to change over time in ancient economies in the U.S. Southwest and Midwest (Neusius 2017; Spielmann and Kintigh 2011). Broadly useful data being uploaded include 32,863 well-documented tree-ring dates and related metadata from across the Southwest, the largest and most comprehensive database of its kind to date (Kohler and Bocinsky 2015).
From nonspecialists, we receive e-mail messages about using tDAR for research on family and local histories, locating researchers whose work is represented in the archive's records, and locating copies of documents for which tDAR has only citation-level data. In 2016, monthly page views of content have typically ranged between 50,000 and 70,000. Since 2014, content downloads consistently have exceeded 1,000 per month. Most content seekers discover tDAR records via simple Internet search engine queries, although the website provides much more sophisticated search capabilities.
Physical artifact curation facilities use tDAR as a “digital annex” for their collections-associated digital files. Digital Antiquity worked with the Maryland Archaeological Conservation Laboratory and Fort Lee Regional Repository to establish an archive in tDAR for digital files from 17 military facilities for which these two repositories curate physical collections and associated paper records (Cofield and Eyre 2014; Department of Defense Legacy Program 2014).
Publishers use tDAR to provide access to data that supplements their publications or to distribute technical reports (notably ones that are out of print). The University of Pennsylvania University Museum Press collection (University of Pennsylvania Museum 2017) provides access to 46 data sets, 75 documents, and over 100 images as data supplemental to their academic publications. Statistical Research Inc., a large CRM firm, provides access to its SRI Press Technical Reports series through tDAR (SRI Press 2017). The NPS Midwest Archeological Center (2017) provides access to its various report series through tDAR, as well.
Research programs use tDAR so that collaborators at different institutions have access to the same data (e.g., Kuril Biocomplexity Research 2017; North Atlantic Ecodynamics 2017). A long-lived research program that is using the archive is the Mimbres Ceramic Database (2017; see also Hegmon et al. 2017:139). Organized by Steven LeBlanc (Harvard) and Michelle Hegmon (ASU), this collection contains over 10,500 images from Mimbres vessels. Part of the collection is accessible to registered users; access to a fuller database requires review and approval by Hegmon. tDAR allows data contributors to permit general public access to part of their collection but also provides for control of broader access to vetted researchers. Public agencies use tDAR to share data among different contractors working on particular programs. One example of this is the Bureau of Land Management's Permian Basin Program (2017), which shares some of its reports (in redacted form) with the general public and limits access to full reports to CRM firms and researchers working with their program.
DIGITAL ANTIQUITY USER EXAMPLES
Digital Antiquity provides a variety of data management and digital curation services. Among the most common are self-service digital curation, in which contributors directly deposit data in tDAR without intervention by Digital Antiquity staff, and full-service digital curation, in which Digital Antiquity staff provide the organization, file check and upload, and metadata drafting and editing for clients. The following are a few examples from among the work Digital Antiquity has done or that is underway.
Self-Service Curation
The Eastern Mimbres Archaeological Project (EMAP) uses self-service curation. EMAP is an extensive and long-term investigation of the post-AD 1000 history of the eastern Mimbres area in southwest New Mexico that is codirected by Drs. Margaret Nelson and Michelle Hegmon. The tDAR collection of data from EMAP began in 2012 (Nelson and Hegmon 2016). Included in this collection are excavation records and reports for the sites studied, analysis tables and data sets, dating results, field notes and maps, photographs, and reports. Most of the EMAP data and information are publicly accessible. This includes about 300 reports, articles, or other documents; about 80 images; and 20 data sets. Postdoctoral fellows and graduate students working for EMAP have become familiar with the tDAR data upload and metadata creation templates by using them on a regular basis. With a little experience, they easily create project records and upload files to the repository. The project principals and affiliated researchers access and manage the records without any direct intervention from Digital Antiquity.
A second example of self-service digital curation is the large collection of research reports (3,298 documents so far) being built by Dr. Linda Scott Cummings of the PaleoResearch Institute and her staff (Cummings 2016). This collection was set up in 2012. Most of the reports are publicly accessible, although a few of the PaleoResearch clients have requested that their reports be held as confidential and only made available following review of the request for access by Dr. Cummings or the clients themselves, a procedure that is easily conducted for files designated by contributors as “confidential” in tDAR.
Full-Service Curation
Beginning in 2011, Digital Antiquity undertook full-service curation work for the CRM program of the Phoenix Area Office (PXAO) of the Bureau of Reclamation (Digital Antiquity 2013). The PXAO is responsible for extensive cultural resource data and records collected and created since the 1970s as part of large investigations carried out on the canal reaches, lands, and reservoirs constructed and managed by the agency in central and southern Arizona. These paper reports, other documents, images, and related data occupied considerable space in PXAO offices, and specific contents of this paper archive were difficult to locate, search, use, and share among staff, contractors, researchers, and the interested public. PXAO faced three problems in managing the information for which it is responsible: (1) how to make the large amount of legacy data useful internally and externally, (2) how to ensure its long-term preservation economically, and (3) how to treat new data in a way that could make it immediately useful within a system that would ensure its preservation.
The PXAO had already digitized many of the paper records related to its older archaeological projects, in particular from various parts of the Central Arizona Project. Together, Digital Antiquity and PXAO staff created a simple full-service workflow to manage the deposit of records into tDAR. Digital Antiquity staff checked the digital documents for completeness and appropriate formatting and passed the PDF files through an optical character recognition (OCR) program so that they could be easily searched. Digital Antiquity's curators reviewed the texts and illustrations to identify sensitive information that might need to be redacted or designated as “confidential.” When such information was encountered, a redacted version of the report, which could be made public, was created with the sensitive information (mainly detailed site location) removed using Adobe Acrobat Pro's redaction tools. The full report, marked as “confidential” in tDAR, is accessible to users authorized by PXAO. Digital curators created appropriate descriptive and administrative metadata for each PXAO file record, then uploaded it to tDAR as “draft” for the PXAO to review. Staff also organized the PXAO digital archive so that documents, images, and data sets could be retrieved easily for administrative, educational, management, or research purposes. Once the PXAO CRM office staff reviewed and approved the tDAR metadata records, curators changed the records’ status to “active,” making the metadata public and the files, if they are not marked “confidential,” freely accessible to users.
Outreach to professionals and the general public became possible when the collections and records were made public (Bureau of Reclamation 2016). A recent example involved the PXAO Theodore Roosevelt Dam Archaeological Project (2017), including detailed technical site reports, synthesis volumes, shorter descriptive articles, and data sets from the Bureau's multiyear set of investigations in the Tonto Basin of central Arizona (McManamon and Kintigh 2016).
Digital Antiquity curators also provided assistance and training to the three “on-call” PXAO contractors who deposit information from new PXAO cultural resource projects through tDAR's self-service website interface. By requiring contractors to enter the data, and by placing that requirement in the on-call contract, the PXAO is able to avoid growth in a backlog of data not properly curated. This workflow is effective and economical and should be adopted widely.
TWENTY-FIRST-CENTURY ARCHIVES: MAKING DIGITAL CURATION A REGULAR PART OF ARCHAEOLOGY
The research value of and critical need for effective digital archiving for archaeological and related data and information are highlighted in the concluding paragraph of a recent article in the Proceedings of the National Academy of Sciences addressing Grand Challenges for Archaeology:
Although new archaeological field work will be needed, the greatest payoff will derive from exploiting the explosion in systematically collected archaeological data that has occurred since the mid-20th century, largely in response to laws protecting archaeological resources. Both the needed modeling and synthetic research will require far more comprehensive online access to thoroughly documented research data and to unpublished reports detailing the contextual information essential for the comparative analyses. Indeed, our survey emphatically reinforced the need for the kinds of online access provided by the Digital Archaeological Record (United States) and the Archaeology Data Service (United Kingdom) [Kintigh et al. 2014a:879, 2014b].
Recognizing that archaeological resources are nonrenewable, SAA's statement of ethical principles includes “Principle No. 7: Records and Preservation: Archaeologists should work actively for the preservation of, and long-term access to, archaeological collections, records, and reports” (Society for American Archaeology 1996; see also Kintigh 2006:571–572; McManamon and Kintigh 2010b).
Cultural Heritage Partners, PLLC (CHP), a DC-based law firm, conducted a legal analysis of the requirements for curation of digital archaeological and cultural heritage data based on the National Historic Preservation Act (NHPA), the Archaeological Resources Protection Act (ARPA), and regulations (36 CFR 1220.1–1220.20) promulgated by the National Archives and Records Administration (NARA). Their report concludes that “relevant federal laws, regulations, and policies mandate that digital archaeological data generated by federal agencies must be deposited in an appropriate repository with the capability of providing long-term digital curation and accessibility to qualified users” (CHP 2012:10). Public agencies responsible for the care of archaeological and cultural heritage resources and related data, whether they have direct resource management or grant-related responsibilities, must enforce these requirements to ensure long-term preservation of and access to digital data. Consequences of the absence of enforcement include the loss of data, less informed and effective research, poorer resource management, and the squandering of scarce public funding.
Maintaining digital data documenting cultural heritage in a robust repository provides important benefits for a broad community of archaeologists, cultural heritage specialists, librarians, ecologists, historians, historical architects, economists, climate scientists, and other researchers who need the unique long-term data that only archaeology can provide. Members of the general public likewise value archaeological information as a means of learning about the ancient or historical past.
Despite compelling research demands, ethical obligations, federal regulations, NSF and NEH requirements for data management plans, government calls for open access (e.g., Office of Science and Technology Policy 2014; White House 2013), and evident benefits to a number of constituencies, appropriate stewardship of archaeological data is far from commonplace. The discipline must embrace project workflows and practices that ensure both newly created and legacy documents and data are fully documented and deposited in a publicly accessible digital repository where they can be discovered, accessed, and reused to enable new insights and build cumulative knowledge. Granting, regulatory, and resource management agencies must ensure that research products of their actions are deposited in public repositories that provide for data discovery, access, reuse, and preservation. Federal, state, tribal, and local agencies sponsoring or permitting archaeological investigations must require the deposit of data from these investigations in such repositories as a part of each project's scope of work. All archaeologists share an obligation for effective stewardship of these irreplaceable data. The public and the profession both benefit when that obligation is met.
Acknowledgments
This material is based upon work supported by the Andrew W. Mellon Foundation, Scholarly Communications Program, by the National Science Foundation under grant numbers 0433959, 0624341, and 1202413, and by a joint award, number PX 50022 09, from the National Endowment for the Humanities and the Joint Information Systems Committee (United Kingdom). No permits were required for the research done as part of this article. The authors thank Rachel Fernandez, Digital Antiquity, for her assistance with the Spanish translation of the abstract. The authors appreciate the detailed comments we have received from the outside experts who reviewed the manuscript. We also appreciate very much the trust placed in Digital Antiquity by our clients. We work diligently and steadily with a focused mission to keep their data safe and manage how it is accessed according to their directions.
Data Availability Statement
Examples of data and collections of data referred to in the text are available in tDAR (the Digital Archaeological Record; www.tdar.org). In most or all cases, the specific links to data collections referred to in the text are included in the References Cited section of this work.