
The Time of Data: Timescales of Data Use in the Life Sciences

Published online by Cambridge University Press: 01 January 2022


Abstract

This article considers the temporal dimension of data processing and use and the ways in which it affects the production and interpretation of knowledge claims. I start by distinguishing the time at which data collection, dissemination, and analysis occur (Data time, or Dt) from the time in which the phenomena for which data serve as evidence operate (Phenomena time, or Pt). Building on the analysis of two examples of data reuse from modeling and experimental practices in biology, I then argue that Dt affects how researchers (1) select and interpret data as evidence and (2) identify and understand phenomena.

Copyright © The Philosophy of Science Association

1. Introduction: Data Time, Phenomena Time, and the Epistemic Role of Data Processing Efforts

Existing analyses of the epistemic status and role of scientific data have focused on synchronous aspects of research, often without considering how the diverse timescales characterizing the handling of data affect processes of inference and knowledge generation. In this article, I analyze the temporality of the data practices required to facilitate data-to-phenomena inferences and its impact on researchers’ inferential reasoning and understanding of the phenomena under study. I argue that concerns around the temporality and historicity of data practices affect any research situation in which data are (re)used at a time and place other than those in which they are generated. This article thus considers the epistemological concerns and challenges involved in processing data to facilitate their preservation and analysis in the long term and in identifying the conditions under which data can be kept, shared, and analyzed through time, thus enabling researchers to build on past efforts and boost future research.

Many philosophical discussions of the temporality of data and its implications for research revolve around the credibility of the evidential strategies employed by the historical sciences—typically defined as sciences that attempt to reconstruct and explain long-lost events and objects (such as extinct organisms, ecosystems, human cultures, and climatic conditions) and that therefore contend with scarce, sporadic, and partial data sources. The differential survival of evidence through time has been argued to provide relatively poor evidential ground for knowledge claims, making the historical sciences hostage to “lucky finds” in terms of what they can and cannot investigate and explain.Footnote 1 In this article, I argue that concerns around whether and how data maintain evidential value through time are not restricted to the historical sciences but are common to any field in which data acquired in previous periods can play a significant role as evidence for subsequent research, or in which investigators spend long periods of time investigating and revisiting the same data sets. These situations occur in both experimental and field-based research, regardless of whether the data in question are quantitative or qualitative. Indeed, I will argue that experimental data are particularly time sensitive due to the ever-changing nature of the know-how and laboratory conditions under which they are produced, which makes these data difficult to preserve as meaningful and reusable sources of evidence. This issue is often underestimated by philosophers who emphasize the degree of experimental control exercised by researchers at the moment of producing data yet disregard the ease with which such control can be lost once the original experimental setup changes or ceases to exist or the data are retrieved and examined by researchers working in different laboratory conditions.Footnote 2

As a starting point for analysis, I propose to distinguish between two types of temporalities involved in knowledge production and interpretation: the temporal dimension of the data practices used to prepare and manage data so that they can be subjected to inferential reasoning (which I will refer to as Data time, or Dt) and that of the phenomena under investigation, for which data are meant to serve as evidence (Phenomena time, or Pt).Footnote 3 Dt is closely associated with the ways in which researchers manage time in their work, particularly with the constraints and opportunities posed by the time spent in the production, dissemination, and analysis of data. Pt refers instead to the assumptions that researchers make about the temporal features of their research targets and the ways in which such assumptions condition their understanding of the natural world as well as their investigative strategies.

The distinction builds heavily on Bogen and Woodward’s (1988) seminal work on the material conditions under which researchers make data-to-phenomena inferences, which stand in striking contrast to the constraints that apply to the development of a priori, logical inferences (Woodward 2000). At the same time, my views on the categories of “data” and “phenomena” differ from Bogen and Woodward’s in two respects. First, I explicitly endorse a relational understanding of the epistemology of data, according to which data are identified and conceptualized in relation to their function within specific situations of inquiry (Leonelli 2015, 2016).Footnote 4 Second, I favor an interpretation of the ontological status of phenomena as human constructs rather than actual features of the world—although, contrary to McAllister’s (2010) antirealist interpretation, I view such constructs as highly constrained by the characteristics of processes and entities in the world and, thus, as reliably capturing aspects of reality as researchers experience it (Massimi 2009; Feest 2011).

These premises are salient to my proposed distinction between Dt and Pt. On the one hand, they are consistent with Dt and Pt being intertwined in scientific practice, with both dimensions typically affected by practical considerations such as the resources, materials, institutional frameworks, and technologies available to researchers. As Griesemer and Yamashita (2005) argued in relation to research on biological model systems, phenomena have no intrinsic timescale: the temporality that researchers ascribe to phenomena depends at least in part on the circumstances of inquiry, which often include issues of data access and data analysis. In a similar way, the temporality of data is defined largely by the research contexts in which they are used, which often include specific conceptualizations of phenomena. On the other hand, the interdependence of Dt and Pt does not make it any less useful to distinguish them analytically. Focusing specifically on Dt means paying attention to the efforts involved in data generation, processing, dissemination, and analysis and the large variability in the stages—and related timescales—through which any given data set is handled and interpreted. This temporal dimension can have a significant impact on how Pt is measured, but it is conceptually separate from Pt: Dt pertains to the realm of inquiry and research processes (the so-called context of discovery), rather than to the knowledge derived from such processes.

In what follows, I use the distinction between Dt and Pt to examine two cases from contemporary biological practice in which researchers attempt to reuse data previously collected by others as evidence for novel claims about phenomena.Footnote 5 The first case involves the construction of models to track and predict the spread of plant pathogens, which is grounded on the retrieval and integration of data from a variety of sources and is highly dependent on the accuracy with which Dt is preserved and managed. The second case is typical of experimental work on regulatory mechanisms within molecular and cell biology and concerns the retrieval and comparison of data collected on two species of yeast to study the role of the cell cycle in regulating transcription in humans, potentially resulting in breakthroughs in the understanding of cancer onset and development.

These examples illustrate how the distinction between Dt and Pt helps to highlight two important features of scientific knowledge production. First, the ways in which researchers acknowledge and document Dt affect the extent to which they can successfully preserve data, integrate them with other data, and (re)use them as evidence for new claims. In other words, knowledge about Dt affects researchers’ ability to identify relevant data and assess their reliability and significance as evidence for a given hypothesis. Second, knowledge of Dt affects researchers’ understanding of the phenomena for which data are taken to serve as evidence and, thus, the content of the knowledge claims derived from data analysis.

2. Data Reuse Case 1: Modeling the Global Spread of Plant Pathogens

My first example concerns contemporary attempts to track the global distribution and movements of plant pathogens (such as the fungus Hemileia vastatrix, responsible for the infamous coffee rust disease, or the various blights severely affecting the cultivation of banana, wheat, and other major crops) over the last century, with the goal of identifying trends that may help to predict crop pathogen spread across the globe and its potential impact on agriculture. Key sources of data underpinning such efforts are observational reports on pathogens. These reports are typically collected by field stations and plant clinics located at various sites around the world and later assembled into a single body of evidence by initiatives such as the Plantwise database of the Centre for Agriculture and Bioscience International (CABI). CABI uses observational reports to produce maps tracking the geographical spread of different pathogens through several decades (e.g., fig. 1).

Figure 1. Global spread of tomato pathogen Oidium neolycopersici in 2007. Source: CABI Head Office, Map no. 1000, edition 1, CAB International, Wallingford, UK, 2007.

These efforts are hampered by the lack of consistent observational data documenting pathogen movements across different parts of the world. Records for low-income regions such as sub-Saharan Africa and South America, for instance, are patchy at best, and significant time intervals are missing even for well-monitored countries. Furthermore, the ways in which different field stations assemble, store, and disseminate pathogen observation data are highly variable, making these data difficult to integrate into global maps. To remedy this situation, researchers have devised modeling tools to infer plausible pathogen movements from environmental factors. These models build on available knowledge of the conditions under which fungi are likely to produce spores and infect their hosts, such as temperature and the availability of water on the leaf surface of the plants in question (i.e., when it is too dry, too hot, or too cold, the spores die). This knowledge enables researchers to infer infection rates from the triangulation of observational data with measurements of air temperature, which are often available thanks to the ubiquity of meteorological stations, and estimates of the amount of water in the crop canopy.
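
To give a concrete sense of the kind of inference involved, here is a minimal sketch in Python of an environmental-suitability rule of the sort just described. The threshold values, function names, and aggregation strategy are illustrative assumptions for exposition only, not the parameters of any actual model used by CABI or the researchers discussed here.

```python
def infection_possible(air_temp_c: float, leaf_wetness_hours: float) -> bool:
    """Crudely judge whether fungal infection is environmentally possible on a
    given day, from air temperature and estimated hours of leaf-surface wetness.
    Spores are assumed to die when it is too hot, too cold, or too dry."""
    TEMP_MIN, TEMP_MAX = 15.0, 30.0  # hypothetical viable temperature range (degrees C)
    MIN_WETNESS_HOURS = 6.0          # hypothetical minimum wetness for sporulation
    return TEMP_MIN <= air_temp_c <= TEMP_MAX and leaf_wetness_hours >= MIN_WETNESS_HOURS


def estimated_infection_rate(daily_records: list[tuple[float, float]]) -> float:
    """Fraction of recorded days on which infection was environmentally possible,
    triangulating temperature measurements with canopy-water estimates."""
    if not daily_records:
        return 0.0
    suitable = sum(infection_possible(temp, wet) for temp, wet in daily_records)
    return suitable / len(daily_records)
```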

Such work can then be used to develop models to predict future trends and target measures to stop the spread of harmful pathogens. It is at this point that this example becomes relevant for an investigation of Dt. This attempt to put old observational data to a new use prompted some of the researchers involved to take a closer look at how the observational data from pathogen reports had been compiled and assembled in the first place and the extent to which they could be reliably aligned with meteorological data. This brought to light two challenges that had not been apparent at the start of the modeling effort, and which I found to underpin most cases of data reuse in biology, with significant implications for data analysis and subsequent interpretations.

The first challenge lies in reconstructing Dt for the key data set underpinning CABI maps (i.e., pathogen reports). This involves several distinct events, which in some cases are separated by one or two decades, including:

Dt1: data collection, for example, the date on which a local farmer brought an affected plant specimen to a plant clinic for pathogen identification;

Dt2: compilation of observational data sets into a consistent report about pathogen spread in the region of interest;

Dt3: official publication of the compiled data, for example, in a journal or a report;

Dt4: use of publications as sources for national maps of pathogen spread;

Dt5: incorporation of national maps into a global digital repository or online database, such as that run by CABI;

Dt6: retrieval of the data from the repository for further analysis.

It turns out that published maps can be unreliable in how they temporally locate data and subsequent processing interventions, often conflating Dt2–Dt6 with Dt1. Particularly in the case of data older than 10 years, extensive efforts are now required to date Dt1 and disentangle it from the other Dts. In the absence of an accurate timeline for data processing, it is hard for researchers to construct reliable predictive models. The lack of certainty around Dt also decreases researchers’ ability to quantify underreporting and thus the extent to which data may be missing for specific areas, periods, or pathogens (Bebber, Holmes, and Gurr 2014).
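
The following is a minimal sketch of how the stages Dt1–Dt6 might be kept distinct in a provenance record, rather than conflated as in the published maps discussed above. The stage labels follow the list in the text; the class, field names, and consistency check are hypothetical illustrations, not the schema of any existing system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PathogenReportProvenance:
    """One observational report, with each Dt stage recorded separately."""
    pathogen: str
    region: str
    dt1_collected: Optional[date] = None  # Dt1: specimen brought to a plant clinic
    dt2_compiled: Optional[date] = None   # Dt2: compiled into a regional report
    dt3_published: Optional[date] = None  # Dt3: official publication
    dt4_mapped: Optional[date] = None     # Dt4: used as source for a national map
    dt5_deposited: Optional[date] = None  # Dt5: incorporated into a global repository
    dt6_retrieved: Optional[date] = None  # Dt6: retrieved for further analysis

    def dt1_is_resolved(self) -> bool:
        """True only if the original collection date is recorded and precedes all
        later processing events, so that Dt1 cannot be conflated with Dt2-Dt6."""
        later_events = [self.dt2_compiled, self.dt3_published, self.dt4_mapped,
                        self.dt5_deposited, self.dt6_retrieved]
        return self.dt1_collected is not None and all(
            d is None or d >= self.dt1_collected for d in later_events)
```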

The second challenge consists in aligning Dt across the diverse data sources at hand, including pathogen location reports, water estimates, and climate data. Given the amount of processing required to prepare data for use in modeling, Dt is much longer and more complicated for pathogen distribution and water estimates than for air temperature—a difference made more dramatic by the recent introduction of direct temperature measurements via satellite. A consequence of this temporal mismatch is that integrating data sources requires considerable labor and expert judgment, with researchers needing to use their knowledge of the territory and the species in question to adjudicate specific cases. The ways in which researchers choose to temporally align these different data sets affect their characterization of the phenomena of interest (such as pathogen spread) and their temporal dimensions (e.g., the rate of spread), which go on to affect the predictive ability of the models into which they are incorporated.
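
To make the alignment problem concrete, the sketch below pairs yearly pathogen observation counts with monthly temperature records by aggregating the latter to a common annual timescale. This is one illustrative alignment choice among several; both the function and the decision to drop years lacking either source are hypothetical, and, as noted above, such choices are judgment calls that shape the resulting characterization of the phenomenon.

```python
from collections import defaultdict


def align_by_year(pathogen_obs: dict[int, int],
                  monthly_temps: dict[tuple[int, int], float]) -> dict[int, tuple[int, float]]:
    """Pair yearly pathogen observation counts with mean annual temperature.
    Keys of monthly_temps are (year, month); years lacking either data source
    are silently dropped -- itself a consequential judgment call."""
    temps_per_year: dict[int, list[float]] = defaultdict(list)
    for (year, _month), temp in monthly_temps.items():
        temps_per_year[year].append(temp)
    aligned: dict[int, tuple[int, float]] = {}
    for year, count in pathogen_obs.items():
        if year in temps_per_year:
            aligned[year] = (count, sum(temps_per_year[year]) / len(temps_per_year[year]))
    return aligned
```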

3. Data Reuse Case 2: Identifying Conserved Regulatory Mechanisms across Species

Shifting now from modeling to experimental practices, my second example concerns the use of experimental data to study the regulatory mechanisms at work in the cell cycle and to assess potential links between defects in protein regulation and the proliferation of tumor cells.Footnote 6 To identify and study regulatory pathways that may be conserved in humans, researchers often resort to analyzing data coming from much simpler forms of life, which are more tractable and easier to study. In the case of regulatory pathways involved in the cell cycle, a successful investigative strategy involves the comparative use of data collected from two types of fungi: fission yeast (Schizosaccharomyces pombe) and baker’s yeast (Saccharomyces cerevisiae). I will now focus on the management of Dt in relation to the collection and dissemination of data from these two organisms, particularly the role played by databases in making them available for comparative analysis.

S. cerevisiae has long been a favorite model in biology, with a vast repertoire of knowledge, databases, and tools available to researchers interested in studying the cell cycle. It thus constitutes an obvious starting point to identify new regulatory functions associated with cell replication. However, cerevisiae spends a lot of time in the G1 phase of the cell cycle, which is problematic for researchers interested in investigating the S and G2 phases of the cycle (see fig. 2). S. pombe, a much simpler form of life with three chromosomes to cerevisiae’s 16, turns out to work as an ideal complement: its S and G2 phases are longer, enabling researchers to scrutinize their potential regulatory functions, and the shared evolutionary history of the two organisms makes it plausible to expect that regulatory mechanisms found in pombe may be conserved in cerevisiae. The systematic comparison of data produced in cerevisiae and pombe yielded findings that turned out to be conserved in humans, making this approach useful for understanding the emergence of cancer (e.g., Caetano et al. 2014).

Figure 2. Different phases of the cell cycle in S. cerevisiae. Source: Michel Durinx (CC BY 4.0), https://datastudies.eu/resources#cc-by.

What makes this investigative strategy possible is the opportunity to retrieve, visualize, and compare yeast data through PomBase and the Saccharomyces Genome Database (SGD), the main databases for pombe and cerevisiae, respectively. Robert de Bruin, a leading scientist involved in this work, describes the strategy as follows:

we used PomBase to find whether that [mechanism] was conserved in fission yeast. We could really easily establish it in fission yeast, and then we could go back to budding yeast now that we knew exactly what we’re looking at, and then found that also in budding yeast. … Now my work in my lab is all focused on that. Without going into fission yeast and having it accessible that easily, I would have never gone in, and I would have completely missed it. And people in budding yeast completely missed even that regulation, let alone the mechanism, and the same in mammalian cells. People have been studying that for decades and completely missed it.

(Transcript PI_8_A in Leonelli [2017])

Without the quick and accurate comparative tools provided by PomBase and SGD, it would have been much harder for researchers to compare data across the two species, potentially hindering the discovery of significant connections between their regulatory systems. As it turns out, making it “really easy” to explore data in this way requires labor-intensive practices of data annotation and curation, which largely determine which data sets are found online, what information about the data is captured and made available within databases, and how the data are presented and retrieved for inferential reasoning. Database curators pay particular attention to the selection and inclusion of information about the provenance of data, such as the time at which data were produced and further processed. This often means consulting directly with the original data producers, who are not always accurate when publishing their data as part of research articles and may quickly lose memory of or interest in these details after the end of their project. This information enhances researchers’ ability to assess the quality of the data and the extent to which each source is comparable to others.

Database curators also participate in the development and application of standard labels identifying the phenomena for which data may serve as evidence, a task made difficult by the diversity of terms used by different biological communities to denote the same genes, gene functions, and phenotypes (most often in the case of groups working on different species; Leonelli and Ankeny 2012). The developers of both PomBase and SGD are deeply engaged in the construction of classification systems, such as the Gene Ontology and the Fission Yeast Phenotype Ontology, that make it possible for researchers to look for data of potential relevance to the phenomena they are interested in. This work requires regular updates of data formats and labels to reflect shifts in the knowledge base and in the technologies used to produce and disseminate data. Much effort is devoted to providing an accurate record of Dt, that is, a timeline for the ways in which data are manipulated to remain accessible and usable. Such a record is indispensable to the comparison of data acquired on different species, especially when—as in our example—researchers are not sure about the relation between the phenomena under investigation in the two types of organisms. In such cases, precise notations about the temporality and provenance of data are crucial to interpretation: researchers need to know when and why a certain data set has come to be associated with a given phenomenon and how such inference may be triangulated with findings coming from complementary approaches (such as the functional and evolutionary data required to establish a mechanism as conserved; Bechtel 2006).
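
As a rough illustration of the curatorial record just described, the sketch below pairs a data set with its production date, a log of curation events, and standard ontology labels. The class, field names, and usage example are hypothetical, and actual PomBase and SGD schemas are far richer; the Gene Ontology identifier shown (GO:0006351, DNA-templated transcription) is, however, a real term.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CuratedDataset:
    """A data set as a database curator might represent it: provenance plus
    standard labels for the phenomena the data may serve as evidence for."""
    dataset_id: str
    species: str
    produced_on: date                                  # start of this data set's Dt
    curation_events: list[tuple[date, str]] = field(default_factory=list)
    ontology_terms: list[str] = field(default_factory=list)  # e.g., Gene Ontology IDs

    def annotate(self, when: date, action: str) -> None:
        """Log each curatorial intervention, keeping the Dt timeline traceable."""
        self.curation_events.append((when, action))


# Hypothetical usage: a curator links a pombe data set to a Gene Ontology term
# and logs a later relabeling, so that future users can reconstruct when and
# why the data came to be associated with a given phenomenon.
record = CuratedDataset("pombe_exp_042", "Schizosaccharomyces pombe",
                        produced_on=date(2013, 5, 2),
                        ontology_terms=["GO:0006351"])
record.annotate(date(2015, 1, 10), "relabeled phenotype using a FYPO term")
```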

Remarkably, the better database curators accomplish these tasks, the more their work remains invisible to database users and research funders. The epistemic significance of these practices becomes visible whenever a lack of information about Dt affects the ability of database users to interpret the data. Because of the strong collaborative ethos and relatively small size characterizing the community of pombe researchers, PomBase curators are highly successful in eliciting accurate and updated information from data producers. This does not work as well in larger research communities, where database curators are confronted with much larger data sets, and data producers have little incentive to participate in data curation. It also fails within the increasingly nested landscape of data infrastructures characterizing contemporary biomedical research, where information about Dt is easily lost when data are passed from one platform to another—leading to a situation of high uncertainty around Dt, similar to what we encountered in the first example. Indeed, the yeast researchers I interviewed reacted strongly against the suggestion that it may be efficient to integrate all data relating to yeast species in one single database, which in their view may entail significant loss of information about data provenance.

4. The Epistemic Significance of Data Time

In the cases described above, Dt may span several different events over an extended period, ranging from the moment in which data are originally collected to the times at which they are modified to make them widely accessible and reusable. Ideally, given the significance of knowledge about Dt for data analysis and interpretation, the researchers and curators involved in data processing (including the compilation and visualization of data in publications, databases, maps, and models) should ensure that information about Dt accompanies the relevant data points in all stages of their travels. This is what happened in my second example, in which attentive data curation and a well-constructed database enable researchers to easily find information about Dt and use it for data analysis. However, particularly in situations in which Dt is recorded under diverse working conditions by individuals with different skills and goals, Dt can be remarkably difficult to track and retrieve. Our first example illustrates the problems that can emerge when researchers have limited access to the history of the data that they are analyzing.

Knowledge or ignorance of Dt can thus affect research processes and outcomes in (at least) two ways. First, it can alter researchers’ perception of which data sets are most reliable as sources of evidence, thus affecting the evidential value attributed to data in any given inquiry. Without access to accurate information about Dt, researchers may need to reject whole data sets or modify the ways in which they analyze them (e.g., by shifting their evidential weight in relation to other sources or seeking to triangulate them with other types of findings). This is clear in the case of pathogen spread, where researchers found it problematic to deal with data sets for which they could not distinguish between Dt1 and other Dts, thus throwing doubt on the reliability of existing data maps and related modeling tools. Although less overtly, this also happens in the cross-species study of transcriptional regulation, where researchers reusing data available online must be able to trace when the data were collected, with which technology, and by whom. These examples show that preserving data is not enough to facilitate their future reuse. Equally relevant is the preservation of information about the temporality of the interventions through which data have been assembled, circulated, and visualized.

Second, knowledge or ignorance of Dt can determine the frame and resolution at which phenomena are studied, thus affecting researchers’ understanding of phenomena—including their perception of what can be known and what aspects of a given target system are worth focusing on. Consider, for instance, the epistemic risk posed by the integration of data acquired on various aspects of a phenomenon of interest by a variety of sources, such as when bringing together genomic, metabolic, environmental, and physiological data collected by different research teams in order to study gene-environment interactions (see also O’Malley and Soyer 2012; Leonelli 2016, chap. 6). Such integration is crucial to providing novel insights into complex processes (particularly in the emerging landscape of “big data” analysis), yet both examples show that it carries the risk of loss of information about Dt, since not all existing information about data is preserved when bringing large and diverse data sets together. This carries significant implications for data analysis and the reliability of subsequent inferential processes. Incorrect information about Dt, or difficulties in aligning different Dts, can result in the predictive failure of models or in misleading cross-species inferences.

5. Conclusion: Timescales of Data Processing and the Limits of Experimental Control

I proposed to focus on the temporal dimensions involved in practices of data processing and distinguish between Dt and Pt to shed light on the extent to which the diverse timescales of data and phenomena affect processes of inference and knowledge production. Through the analysis of two examples from biological practice, I have emphasized the complex set of conditions required to preserve data and related metadata in the long term and argued that data processing and related temporalities are crucial to inferential processes and (re)interpretation. I thus hope to have illustrated how tracing the movements of data through processes of inquiry, particularly the conditions under which data do or do not function as evidence, can help to foster philosophical understanding of how data processing affects the content of knowledge claims. Data are defined by their temporal characteristics as much as by their spatial and morphological ones, and underestimating the challenges and timescales involved in data processing can disrupt inferential reasoning and invalidate the use of data as evidence.

In closing, I briefly examine one implication of this argument, which is that experimental control over data production is not enough to guarantee good, (re)usable data. This is significant when considering philosophical accounts that portray historical and experimental sciences as exemplifying two opposed epistemic situations: one in which researchers have complete control over the range, type, and quantity of data that they can obtain on the phenomena of interest, thus providing strong warrant for inferential claims made on the basis of those data, and one in which researchers cannot control what data are available and are thus “at the mercy of the processes by which time covers her tracks” (Currie 2018, 7), which seems to threaten the epistemic reliability of their claims.Footnote 7

Both Currie and Alison Wylie have critiqued facile dismissals of the epistemic reliability and scope of claims made within the historical sciences, by pointing to the variety of evidence and methods that researchers in those fields can use (e.g., modeling, analogies, various new technologies to study the composition of material traces) as well as the importance of triangulation and consilience in warranting inferences in other fields (Chapman and Wylie 2016; Wylie 2017; Currie 2018). My analysis in this article corroborates their arguments through the following observation: not only is the pessimism about the epistemic status of the historical sciences based on a lack of recognition of their methodological sophistication in processing data, but it is also linked to an exaggerated optimism with respect to the potential and warrants of experimental methods in contemporary science and the extent to which they can really guarantee control over phenomena.

The cases of data handling that I analyzed above, particularly my second example, point to the fact that experimental results are difficult to control, not only at the point at which they are produced but most significantly at the points of dissemination, storage, and reuse. Data can disappear or become unusable very quickly if not properly curated: it only takes a destroyed hard disk, a misleading annotation, or a postdoc changing jobs. Worries about differential survival of evidence and informational destruction are thus arguably as alive in contemporary data collection in the life sciences as they are in the historical sciences and the observational data on which they rely. And because experimentalists today operate in what are often characterized as ideal conditions (particularly in the case of molecular biology, where they can avail themselves of ready-made samples, high-throughput instruments for data production, high computational power, and myriad modeling tools), they themselves tend to underestimate the challenges involved in processing data and related information, which can cause real trouble with data interpretation and inferential reasoning down the line. Both scientists and philosophers can learn from the strategies elaborated within the historical sciences to record and update information about Dt and thus maximize the evidential value of existing data collections.

Footnotes

This research was funded by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement 335925 (project: The Epistemology of Data-Intensive Science) and the ARC Discovery Grant Organisms and Us (DP160102989). I am very grateful to Dan Bebber, Robert de Bruin, Midori Harris, Val Woods, Steve Oliver, and the many others who wish to remain anonymous for taking time from their schedules to discuss their research with me. Many thanks also to the audience and other participants in the symposium Data in Time: The Epistemology of Historical Data at the 2016 PSA/HSS meeting in Atlanta, where this article was presented; to participants in the Biological Interest Group in Exeter, particularly Niccolò Tempini, Brian Rappert, Ann-Sophie Meincke, Dan Nicholson, Giovanna Colombetti, Thomas Bonnin, and Staffan Müller-Wille for their comments; and to Adrian Currie, Alison Wylie, James McAllister, James Griesemer, Rachel Ankeny, William Bechtel, John Dupré, and David Sepkoski for useful discussions.

1. Sober (1988, 1–2), Currie and Turner (2016), and Currie (2018) provide a useful overview of these arguments.

2. As is often the case in contemporary data-centric biology (Leonelli 2016).

3. My position is thus sympathetic to the analysis of the relation between experimental and historical sciences provided by Cleland (2002) and Turner (2004), although their discussion of the role of the temporal asymmetry of underdetermination does not explicitly consider the distinction between Dt and Pt, thus underestimating the relevance of practical issues of data preservation and handling to the warrant available to claims about past and present events.

4. Recent work by Woodward (2010) indicates affinities with this view, yet neither Bogen nor Woodward has devoted much attention to defining what they mean by data.

5. These examples have been researched through an analysis of scientific literature and online tools such as databases, as well as interviews with the scientists involved, which I carried out in 2014 and 2015 and which helped me to reconstruct the activities and reasoning involved in data processing and analysis. Full transcripts of those conversations that interviewees consented to make available online can be found in Leonelli (2017).

6. This line of research was made famous by the work of Paul Nurse, Tim Hunt, and Leland H. Hartwell, earning them the Nobel Prize in Physiology or Medicine in 2001.

7. A similar distinction is frequently made between hermeneutic and quantitative approaches to data reuse (as championed by the social and natural sciences, respectively) and is convincingly challenged by James McAllister in his contribution to the PSA symposium where this article was also presented (McAllister 2018).

References

Bebber, Dan P., Holmes, Timothy, and Gurr, Sarah J. 2014. “The Global Spread of Crop Pests and Pathogens.” Global Ecology and Biogeography 23 (12): 1398–407.
Bechtel, William. 2006. Discovering Cell Mechanisms: The Creation of Modern Cell Biology. Cambridge: Cambridge University Press.
Bogen, James, and Woodward, James. 1988. “Saving the Phenomena.” Philosophical Review 97 (3): 303–52.
Caetano, Catia, Limbo, Oliver, Farmer, Sarah, Klier, Steffi, Dovey, Claire, Russell, Paul, and de Bruin, Robertus A. M. 2014. “Tolerance of De-regulated G1/S Transcription Depends on Critical G1/S Regulon Genes to Prevent Catastrophic Genome Instability.” Cell Reports 9 (6): 2279–89.
Chapman, Robert, and Wylie, Alison. 2016. Evidential Reasoning in Archaeology. London: Bloomsbury.
Cleland, Carol. 2002. “Methodological and Epistemic Differences between Historical Science and Experimental Science.” Philosophy of Science 69 (3): 474–96.
Currie, Adrian. 2018. Rock, Bone, and Ruin: An Optimist’s Guide to the Historical Sciences. Cambridge, MA: MIT Press.
Currie, Adrian, and Turner, Derek. 2016. “Introduction: Scientific Knowledge of the Deep Past.” Studies in History and Philosophy of Science A 55:43–46.
Feest, Uljana. 2011. “What Exactly Is Stabilized When Phenomena Are Stabilized?” Synthese 182 (1): 57–71.
Griesemer, James R., and Yamashita, Grant. 2005. “Zeitmanagement bei Modellsystemen: Drei Beispiele aus der Evolutionsbiologie” [Managing time in model systems: Three examples from evolutionary biology]. In Lebendige Zeit, ed. H. Schmidgen, 213–41. Berlin: Kulturverlag Kadmos.
Leonelli, Sabina. 2015. “What Counts as Scientific Data? A Relational Framework.” Philosophy of Science 82:810–21.
Leonelli, Sabina. 2016. Data-Centric Biology: A Philosophical Study. Chicago: University of Chicago Press.
Leonelli, Sabina. 2017. “[DATA_SCIENCE] Interviews PomBase Users, January–February 2016.” Figshare. doi:10.6084/m9.figshare.5484010.v1.
Leonelli, Sabina, and Ankeny, Rachel A. 2012. “Re-thinking Organisms: The Epistemic Impact of Databases on Model Organism Biology.” Studies in History and Philosophy of Biological and Biomedical Sciences 43 (1): 29–36.
Massimi, Michela. 2009. “From Data to Phenomena: A Kantian Stance.” Synthese 182:101–16.
McAllister, James W. 2010. “The Ontology of Patterns in Empirical Data.” Philosophy of Science 77 (5): 804–14.
McAllister, James W. 2018. “Scientists’ Reuse of Old Empirical Data: Epistemological Aspects.” Philosophy of Science, in this issue.
O’Malley, Maureen A., and Soyer, Orkun S. 2012. “The Roles of Integration in Molecular Systems Biology.” Studies in History and Philosophy of Biological and Biomedical Sciences 43 (1): 58–68.
Sober, Elliott. 1988. Reconstructing the Past: Parsimony, Evolution, and Inference. Cambridge, MA: MIT Press.
Turner, Derek. 2004. “Local Underdetermination in Historical Science.” Philosophy of Science 72 (1): 209–30.
Woodward, James. 2000. “Data, Phenomena, and Reliability.” Philosophy of Science 67 (Proceedings): S163–S179.
Woodward, James. 2010. “Phenomena, Signal, and Noise.” Philosophy of Science 77:792–803.
Wylie, Alison. 2017. “How Archeological Evidence ‘Bites Back’: Strategies for Putting Old Data to Work in New Ways.” Science, Technology and Human Values 42 (2): 203–25.