1. Introduction
Biological development is classically assumed to reflect the expression of information accumulated in the genome during evolution (Mayr 1961; Jacob 1970). Major textbooks and popular science presentations of biology rely on this picture (e.g., Alberts et al. 2013). Leading biologists are also attracted to this view (Williams 1992, 10; Maynard Smith and Szathmary 1995, 2000; Jablonka 2002). On closer scrutiny, however, the role of information in biology seems purely instrumental: it serves either as a metaphor or as a tool for big data analyses; biology does not yet have a theory of life as an information-processing phenomenon (Sarkar 1996; Godfrey-Smith 2000). The aim of this article is to offer some scientific substance to such a theory.
Several theoretical and philosophical approaches have interpreted living systems as information-processing systems. One tradition identifies information with meaning, interpretation, and intentionality (Barbieri 2007; Shea 2007). A second tradition, which I espouse here, identifies information with patterns of association between objects (Dretske 1981).
I start from the sense of information introduced by Crick in his sequence hypothesis and central dogma of molecular biology, which was to become massively influential in biology: “Information … means the precise determination of sequence” (Crick 1958, 153; see Kay 2000). Information here is causal (Šustar 2007). Crick introduced this conception in an attempt to understand how DNA and RNA carry biological specificity for the synthesis of proteins, an idea that parallels the modern contrast philosophers draw between specific causes and other necessary, background factors to obtain an effect (Woodward 2010). Griffiths and Stotz (2013, chap. 4) have argued that Crick’s sense of information vindicates the idea that factors other than DNA are also sources of information for biomolecules, a phenomenon they called ‘distributed specificity’. This idea needs substantiation.
Here I explore this idea and develop an approach to biological information as a measurable and distinctive aspect of biological systems. This approach has two facets, inspired by, respectively, Shannon’s and Kolmogorov’s approaches in information theory. On the one hand, we have a measure of the relative, complementary influence of several causes of the same event (sec. 2). This concerns the choice between alternative objects and is blind to the information content of each object. This approach has been extensively discussed and applied elsewhere. On the other hand, we have measures of the complexity of a single object, independently of any particular set of alternatives (sec. 3). These can measure the information inherent to a biomolecule and the quantity of information in a molecule that can be attributed to a particular source. The computability of these latter measures, however, is problematic; in practice, tentative measures ought to be used. The role of randomness in creating information is outlined (sec. 4). I sketch potential developments for a Kolmogorov-inspired approach (sec. 5) and argue that it is a potentially fruitful yet challenging biological research program (sec. 6). The two approaches are not straightforwardly reducible to one another and are not suggested to exhaust all aspects of biological information.
2. Causal Specificity: Information as Choice
Recent work has defined an information-theoretic measure of the ‘specificity’ of a cause for an effect, the extent to which a cause precisely determines an effect, and applied this measure to biological problems (Griffiths et al. 2015; Pocheville, Griffiths, and Stotz 2017; Weber 2017; Calcott, Pocheville, and Griffiths 2018). This work develops earlier, qualitative discussions of ‘causal specificity’ in philosophy (Woodward 2010) and converges with formal work on causation in complex systems theory (see n. 1).
Causal specificity is measured using Shannon information theory, which conceives information as a reduction in uncertainty (Shannon 1948; Cover and Thomas 2006). Uncertainty, measured in bits, can be understood as the average number of binary (yes/no) questions that are required to determine the value of an unknown variable. A variable is said to share mutual information with another variable when it reduces our uncertainty about that variable. Mutual information measures the association between two variables: the more two variables are associated, the more each of them answers questions about the value of the other. Causal specificity can be measured by the mutual information between values of a cause variable set by an intervention and the value of a putative effect variable (Griffiths et al. 2015, 538). Formally, the causal specificity of C for E when controlling for a putative background B is given by the following formula (Pocheville et al. 2017; n. 1):
$$I(\hat{C}; E \mid \hat{B})$$
The ^ (hat) on a variable is an operator indicating that its value is set by an intervention rather than observed (Pearl 2009; n. 2). This operator transforms the symmetrical mutual information, representing observed association, into an asymmetric measure of causal influence, representing how much experimentally intervening on C while controlling for B affects E. If C is not a cause of E, then $I(\hat{C}; E \mid \hat{B}) = 0$. Reciprocally, if C is a cause of E, then there exists at least one set of background variables B (which can be empty) such that $I(\hat{C}; E \mid \hat{B}) > 0$ (Pocheville 2018a).
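To make the measure concrete, here is a minimal Python sketch of how $I(\hat{C}; E \mid \hat{B})$ can be computed from a joint distribution over interventions on C and B and observations of E; the distribution and variable names are hypothetical illustrations, not data from the works cited above.

```python
from collections import defaultdict
from math import log2

def conditional_mutual_information(p_cbe):
    """I(C; E | B) in bits, from a joint distribution p(c, b, e) given
    as a dict mapping (c, b, e) triples to probabilities. When the
    distribution over (c, b) is produced by interventions, this is the
    causal specificity I(C-hat; E | B-hat)."""
    p_b, p_cb, p_be = defaultdict(float), defaultdict(float), defaultdict(float)
    for (c, b, e), p in p_cbe.items():
        p_b[b] += p
        p_cb[(c, b)] += p
        p_be[(b, e)] += p
    i = 0.0
    for (c, b, e), p in p_cbe.items():
        if p > 0:
            i += p * log2(p * p_b[b] / (p_cb[(c, b)] * p_be[(b, e)]))
    return i

# Toy interventional distribution (hypothetical numbers): C is a promoter
# variant set by intervention, B a background factor, E the transcript.
p = {
    ('c0', 'b0', 'e0'): 0.25, ('c1', 'b0', 'e1'): 0.25,
    ('c0', 'b1', 'e1'): 0.25, ('c1', 'b1', 'e0'): 0.25,
}
print(conditional_mutual_information(p))  # 1.0 bit
```

In this toy distribution, once B is held fixed, C fully determines E, so the specificity is 1 bit, even though C and E would look statistically independent if B were ignored.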
This measure of causal specificity seems to capture one aspect of Crick’s, and the above-cited biologists’, conception of information as ‘precise determination’. It can be used to compare the causal contributions of genetic and epigenetic causes to the production of biomolecules (Griffiths et al. 2015). It can be applied to objects other than biomolecules and is a practical tool for the analysis of biological networks (Tononi et al. 1999; Calcott et al. 2018; Pocheville 2018b).
There is, however, a blind spot in Shannon information theory: it is silent about the information content of the objects themselves. For example, it makes no difference to the amount of information that DNA carries about RNA whether the DNA strands are three or 1,416 nucleotides long. What matters is only the number of values that the variable ‘DNA’ can take and the probability distribution over those values. Arguably, the longer the sequences, the greater the number of possible alternatives, and thus the greater the potential causal specificity of these alternatives (n. 3). Still, in an actual case, the number of alternatives can be zero, and a very long DNA sequence can therefore have null causal specificity for its own transcript. Causal specificity represents a sense of information that enables us (or causes the system) to choose between a set of well-defined alternatives with a well-defined probability distribution. This is ‘information as choice’. If what we are interested in is the information content of a single object, another branch of information theory, Kolmogorov complexity, is more appropriate. It is to this second aspect of information that I now turn.
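Before doing so, a minimal sketch makes the blind spot concrete (the alleles below are hypothetical): the Shannon entropy of the variable ‘DNA’ depends only on the number and probabilities of its alternative values, not on how long those values are.

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Two equiprobable alternatives give 1 bit, whether the values are
# 3-nucleotide or 1,416-nucleotide sequences (hypothetical alleles).
short_alleles = {'ATG': 0.5, 'ATC': 0.5}
long_alleles = {'ATG' * 472: 0.5, 'ATC' * 472: 0.5}
print(entropy(short_alleles.values()))  # 1.0
print(entropy(long_alleles.values()))   # 1.0
```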
3. Kolmogorov Meets Crick: Information as Construction
Kolmogorov complexity can measure the complexity of a single object (Grünwald and Vitányi 2003; Li and Vitányi 2008). The intuitive idea is that the more complex the object, the longer its description needs to be. The Kolmogorov complexity is the length of the shortest description enabling one to reconstruct the object using a computer (or, more precisely, a universal Turing machine; n. 4). Kolmogorov complexity also provides a measure of the amount of information in an object about another object. This is measured by the algorithmic mutual information: it is the amount of program length that one saves when describing one object given a description of the other object for free. Algorithmic mutual information is symmetrical.
Obviously, the length of the shortest description will depend not only on the object at stake but also on the language (the description method) used to write the program generating the object. However, the lengths of the shortest descriptions in two different languages will be the same up to a translation constant that is independent of the object itself. The reason is that the translation from one language to another can itself be described by a program of fixed length (this fixed length gives the translation constant). In this sense, the Kolmogorov complexity is an objective property of the object.
A drawback of Kolmogorov complexity is that it is provably uncomputable: there is no computer program that, given any string as an input, returns its Kolmogorov complexity as an output. In itself, this is an interesting negative result: if what we are interested in is complexity in this sense, then what we want to know is simply uncomputable. In practice, one can bound the complexity of binary objects using diverse lossless compression methods (e.g., those used in the zip file format). Indeed, the compressed object is a (hopefully shorter) description enabling one, together with a decompression program, to reconstruct the initial object. The length of the description is then the length of the compressed file plus a constant, the length of the decompression program. This measurement is tentative, not definitive, as other, potentially unknown compression methods might compress the object more. For the sake of the argument, we assume for the moment that we are given a reasonable compression method.
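Such a tentative measurement can be sketched with a generic lossless compressor (zlib here, standing in for any reasonable compression method); the constant representing the length of the decompression program is, of course, an arbitrary placeholder.

```python
import zlib

DECOMPRESSOR_LENGTH = 1_000  # placeholder constant: length of the decompression program

def k_upper_bound(sequence: str) -> int:
    """Tentative upper bound (in bytes) on the Kolmogorov complexity of a
    sequence: the length of its zlib-compressed form plus a constant for
    the decompression program. Another compressor might do better, so the
    bound is provisional, never definitive."""
    return len(zlib.compress(sequence.encode())) + DECOMPRESSOR_LENGTH

print(k_upper_bound('TTAGGG' * 1000))  # a highly repetitive sequence compresses to a few dozen bytes
```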
Kolmogorov complexity can be used to explore what Crick meant when he described the determination of proteins by nucleic acids as the “detailed residue-by-residue transfer of sequential information” (Crick 1970, 561), where nucleotides would form a quaternary alphabet and amino acids a vigesimal one. Two kinds of questions can be addressed: about how much information there is in a given biological object and, closer to Crick’s thinking, about how much information in an object comes from another (n. 5).
The complexity of a strand of DNA, for instance, can be approached by measuring the length of the compressed sequence. Telomeres provide an interesting limit case. They are nucleotide sequences at the end of chromosomes, consisting of a repetitive pattern (e.g., TTAGGG in humans and many other species). Telomeres are elongated by an enzyme, called telomerase, which embeds an RNA sequence as a template (Hiyama, Hiyama, and Shay 2009). It is not difficult to come up with a program describing a given telomeric sequence in a compact way. Whatever the length of a telomere, it can be described by a template for the repeated pattern and the number of repeats (fig. 1, algorithm 1). A naive observer would surely think that telomeres do not contain much information and, in particular, not much sequential information. The intuition here coincides with the low Kolmogorov complexity of these sequences.
Figure 1. Four algorithms illustrating an algorithmic approach to biological functioning.
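In the spirit of algorithm 1 (my own sketch, not the figure’s exact code): a telomere of arbitrary length is recovered from a two-part description, the template and the number of repeats, so the description grows only with the logarithm of the repeat count.

```python
def telomere(template: str = 'TTAGGG', repeats: int = 2000) -> str:
    """Reconstruct a telomeric sequence from a short two-part description:
    the repeated template and the number of repeats."""
    return template * repeats

seq = telomere('TTAGGG', 2000)   # a 12,000-nucleotide sequence...
description = ('TTAGGG', 2000)   # ...described by a 6-letter template and one integer
print(len(seq), description)
```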
The situation looks quite different for coding sequences. There does not seem to be, at first sight, as easy a way to compress these sequences as we did with telomeres, and their Kolmogorov complexity is probably substantially higher: a program to reconstruct a coding sequence may have to spell it out explicitly—or at least to spell out significant aspects of the sequence (fig. 1, algorithm 2). This lower compressibility coincides with the intuition that coding sequences carry sequential information—and even that it is their function to carry sequential information. However, coding sequences do not carry a maximal amount of sequential information: as an anonymous reviewer noticed, coding sequences are structured and are thus expected to be compressible to some extent—as are noncoding, so-called ‘junk’ DNA sequences containing a significant number of repetitive elements and duplications (n. 6). Note that the intrinsic amount of information in a sequence is independent of whether the sequence is inserted in a region that will actually undergo transcription. Arguably, even a coding sequence carries no information about any transcript if it is not transcribed, but it nevertheless carries sequential information tout court.
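These differing degrees of compressibility can be roughly illustrated with hypothetical sequences and a generic compressor; the exact figures depend on the compressor, but the expected ordering is the point.

```python
import random
import zlib

random.seed(0)
n = 6000
telomeric = 'TTAGGG' * (n // 6)
# Crude stand-in for a coding sequence: biased codon usage gives some
# structure, hence partial compressibility (purely illustrative).
codons = ['ATG', 'GCC', 'GAA', 'CTG', 'AAA', 'GAT', 'TTC', 'CAG']
coding_like = ''.join(random.choices(codons, weights=[8, 4, 4, 3, 2, 2, 1, 1], k=n // 3))
random_seq = ''.join(random.choices('ACGT', k=n))

for name, s in [('telomeric', telomeric), ('coding-like', coding_like), ('random', random_seq)]:
    print(name, len(zlib.compress(s.encode())))
# Expected ordering of compressed lengths: telomeric << coding-like < random.
```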
I now turn to the second question, asking how much information there is in an object about another object. For the sake of the argument, suppose that the world is as Crick supposed in 1958: the accuracy of information transfers is high, which we idealize by assuming that transcription (of DNA into RNA) and translation (of RNA into polypeptides) are error-free, deterministic processes. I ignore splicing and other posttranscriptional processes, which will be treated elsewhere. As described above, one can estimate the amount of information in DNA about RNA by their algorithmic mutual information, that is, $I(DNA : RNA) = K(RNA) - K(RNA \mid DNA)$ (n. 7). The shared information between DNA and RNA is substantial: the transcription process is all about replacing the nucleotides by their complementary ones, with the proviso that A’s in the coding DNA sequence are complementary to U’s (not T’s) in the RNA sequence. To see this sharing of information, compare the lengths of an algorithm spelling out the RNA explicitly (similar to algorithm 2) and one treating transcription generically (fig. 1, algorithm 3). The difference in length would increase with sequence length. This corresponds to the fact that sequential information is transferred from DNA to RNA through transcription. If transcription is errorless, the sequential information in RNA that does not come from DNA, measured by the remainder complexity $K(RNA \mid DNA)$, is a constant, independent of the sequence. Algorithmic mutual information between biological sequences has been used in the past decade with various aims, such as the building of phylogenetic trees according to the amount of information needed to transform one DNA sequence into another (see, e.g., Chen, Kwong, and Li 2000; Li et al. 2001; Chen et al. 2002; and Vinga 2014 for a review).
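The contrast between algorithms 2 and 3 can be sketched as follows (my illustration, under the error-free idealization above): given the DNA for free, a generic, constant-length routine suffices to reconstruct the RNA, whereas without it the RNA would have to be spelled out explicitly.

```python
COMPLEMENT = {'A': 'U', 'T': 'A', 'C': 'G', 'G': 'C'}

def transcribe(dna):
    """Generic routine in the spirit of algorithm 3: replace each
    nucleotide by its RNA complement (A pairing with U rather than T).
    Its length does not depend on the input sequence."""
    return ''.join(COMPLEMENT[nt] for nt in dna)

dna = 'TACCGGCTTTGA'       # hypothetical DNA sequence, given for free
print(transcribe(dna))      # 'AUGGCCGAAACU'
# K(RNA | DNA) is bounded by the (constant) length of this routine,
# while K(RNA) grows with the sequence, so the mutual information
# K(RNA) - K(RNA | DNA) grows with sequence length.
```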
Since the ‘true’ Kolmogorov complexity is uncomputable, an algorithmic approach relies on a bet: that the language of description and compression methods captures interesting and relevant aspects of the object at stake. This is not to say that the approach is necessarily entirely arbitrary: once these methods are agreed on, researchers can agree on the measures obtained for finite sequences. If a particular language gives particularly interesting results (e.g., saving biological appearances, leading to new questions, predictions, and generalizations), then this language becomes a theoretical entity worth discussing in its own right. In the remainder of the article, I outline what I deem desirable features for such a language and substantially develop the algorithmic approach to take into account the fact that biological systems are not, strictly speaking, deterministic, universal Turing machines. What I aim at is not an application of conventional algorithmic information theory to biology, but a specifically biological approach to information inspired by the Kolmogorov branch of information theory.
4. Randomness as the Source of Information
I made several idealizing assumptions in the previous sections. Let us now relax the assumption that cellular processes are deterministic. The argument here will remain theoretical: there is no room to take sides on whether, and how, randomness is actually realized in biology (n. 8).
Random events, by definition, cannot be determined in advance by an algorithm. This means that randomness in the generation of a sequence creates information de novo. In biological terms, this means that any random point mutation, any error of transcription, and so forth, if they are genuinely random, can create information in the Kolmogorov sense. As seen in the previous section, this also means that randomly generated sequences contain more information than highly structured sequences. From an algorithmic point of view, randomness is, ultimately, the only way to create information.
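A quick illustration, with a generic compressor and simulated point substitutions standing in for genuinely random mutations: the more random mutations a highly structured sequence accumulates, the longer its shortest known description becomes.

```python
import random
import zlib

random.seed(1)

def mutate(seq, n_mutations):
    """Apply n random point substitutions (a crude model of genuinely
    random mutation)."""
    s = list(seq)
    for _ in range(n_mutations):
        i = random.randrange(len(s))
        s[i] = random.choice('ACGT'.replace(s[i], ''))
    return ''.join(s)

seq = 'TTAGGG' * 1000                     # highly structured, very compressible
for n in (0, 50, 500):
    print(n, len(zlib.compress(mutate(seq, n).encode())))
# Compressed length grows with the number of random mutations:
# randomness creates information in the Kolmogorov sense.
```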
This information need not always be functional, that is, of any use to the cell. That it may sometimes be so, however, is a reasonable assumption. There are several biological examples suggesting that randomness plays a key role in biological functioning (Kupiec 1983; Heams 2014). Gene shuffling in the immune system of jawed vertebrates provides one such example regarding biological sequences. It enables a great variety of antibodies to be produced, orders of magnitude more numerous than the genes producing them, increasing the chance of matching potentially threatening antigens (Cooper and Alder 2006).
This tension between information and function is why it is crucial to distinguish them. One might be interested in how information flows in biological systems without committing oneself to a particular account of biological function. More importantly, if one is interested in whether and how information leads to function, a concept of biological information as necessarily biologically functional will beg the question.
5. A Language for the Cell
Kolmogorov complexity allowed us to flesh out the idea of information as construction. Now we need to kick that ladder away and ask what information as construction actually looks like in living systems. I suggest that it ought to be measured using a particular programming language: the language of the cell itself, in which available programming functions mimic actual operations by which molecules are produced. It goes without saying that what I evoke here is not the ‘true’ language, but a model of a language of the cell.
The idea of a language of the cell takes us away from treating cells as universal Turing machines and from the genuine Kolmogorov complexity K, to consider a more biological algorithmic complexity, the Kolmogorov complexity in the chosen biological language (hereafter denoted $K_B$). For instance, algorithmic mutual information is symmetric: there is as much information in DNA about RNA as there is in RNA about DNA, that is, $I(DNA : RNA) = I(RNA : DNA)$. But not all operations are possible in a cell. A central feature of molecular biology is that flows of information are asymmetrical. Crick’s ‘central dogma’ (still widely held today) states which flows of information between biomolecules are possible and which are not. If no reverse-transcriptase is present, for instance, no information can flow from RNA to DNA. In ‘biologically’ algorithmic terms, this means that a biological program aiming at reconstructing a DNA sequence being given an RNA sequence as an input would fare no better than a program being given no input, and we would obtain $K_B(DNA \mid RNA) = K_B(DNA)$. This means that we would get, as for a biological analogue of algorithmic mutual information,

$$I_B(RNA \rightarrow DNA) = K_B(DNA) - K_B(DNA \mid RNA) = 0.$$

(The subscript B again denotes that the measure is defined using the chosen biological language, and the arrow now reflects that it can be asymmetric; n. 9.) The reciprocal, as we have seen above, is very different: when DNA is transcribed into RNA, $K_B(RNA \mid DNA) = C$, where C is a constant not depending on the sequences. Assuming, for the sake of presentation, that $C = 0$, we would obtain

$$I_B(DNA \rightarrow RNA) = K_B(RNA) - K_B(RNA \mid DNA) = K_B(RNA).$$
Thus, contrary to its genuine counterpart, biological algorithmic mutual information would not be expected to be always symmetrical, reflecting the directionality of possible information flows.
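The asymmetry can be sketched with a toy ‘biological language’ of my own devising, in which description length is simply the length of the program text; nothing here is meant as the actual language of the cell.

```python
# Toy description language (hypothetical): programs are strings built from
# primitives such as SPELL(...) and TRANSCRIBE(INPUT); description length
# is the length of the program text.

def shortest_description_of_rna(rna, dna_given, has_transcriptase):
    if dna_given and has_transcriptase:
        return 'TRANSCRIBE(INPUT)'     # constant-length program
    return 'SPELL(' + rna + ')'         # spell the sequence out

def shortest_description_of_dna(dna, rna_given, has_reverse_transcriptase):
    if rna_given and has_reverse_transcriptase:
        return 'REVERSE_TRANSCRIBE(INPUT)'
    return 'SPELL(' + dna + ')'         # the RNA input is of no help

dna = 'TACCGG' * 200
rna = 'AUGGCC' * 200

k_b_rna_given_dna = len(shortest_description_of_rna(rna, dna_given=True, has_transcriptase=True))
k_b_dna_given_rna = len(shortest_description_of_dna(dna, rna_given=True, has_reverse_transcriptase=False))
print(k_b_rna_given_dna, k_b_dna_given_rna)  # small constant vs. roughly the sequence length
```

With no reverse transcriptase, $K_B(DNA \mid RNA)$ collapses to $K_B(DNA)$ and the biological mutual information from RNA to DNA vanishes, while the flow from DNA to RNA remains substantial.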
In the same vein, not all sequences can be produced by a given cell. In algorithmic information theory, a universal Turing machine can emulate any other Turing machine, which means that there is no sequence that a particular machine can produce that a universal machine cannot produce. By contrast, if the cell lacks a programming function, for instance, if it lacks a nucleic acid template, or if some nucleotides do not belong to its alphabet, then some sequences may be impossible to produce. In this case, the information needed to produce an impossible sequence is ill-defined; in other words, the amount of information needed to produce the sequence is indefinite. Even on an evolutionary time scale, the amount of information needed to acquire the programming function (if it is acquired) and produce the previously impossible sequence could be orders of magnitude greater than the length of the sequence.
Granted that some operations are impossible, how are we to describe the set of primitive programming functions that, by contrast, are possible?
As we have seen above, the complexity of an object depends on the language used to describe it. An example will flesh out this idea. Assume, say, that ‘Transcribe’ is given for free by the language and that the description of the function is short: say, just a few letters. Contrast this with a DNA sequence of several kilobases. This DNA sequence appears much more informational than the function ‘Transcribe’ (n. 10). Now, imagine that ‘Transcribe’ is not given for free by the language, but that one has to write a program for this function, using other, more primitive, available functions. I exemplify such a program in algorithm 4 (fig. 1); it could be made much longer by describing explicitly the dynamics of chemical bonds in a binary manner (assuming for the sake of the argument that this would be feasible). Conversely, the description of a long DNA sequence can be very short. For instance, nominal genes are usually described not by their full sequence, but by a nickname like ‘p53’. This nickname is enough, on most occasions, for biologists to communicate about the processes at stake. A language can lack the function ‘Transcribe’ but have a built-in function ‘P53’ dedicated to returning the full sequence of the gene. In such a language, descriptions of transcription would be complex (informational) and those of DNA simple. Thus, one needs to be cautious about the language of description before assigning any particular object a privileged informational role, much in the same way that one needs to be cautious about specifying the probability distributions when using Shannon information theory.
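A toy contrast between two hypothetical description languages makes the point; the sequences and primitives below are placeholders, not real biology.

```python
P53_STAND_IN = 'ATG' + 'GCC' * 400   # placeholder for a several-kilobase gene sequence

# Language 1: 'TRANSCRIBE' is a primitive; genes must be spelled out.
descriptions_lang1 = {
    'transcription': 'TRANSCRIBE',
    'p53': 'SPELL(' + P53_STAND_IN + ')',
}

# Language 2: 'P53' is a primitive returning the full sequence; transcription
# must be written out from lower-level operations (crudely abbreviated here).
descriptions_lang2 = {
    'transcription': 'FOR EACH NUCLEOTIDE: OPEN HELIX; PAIR COMPLEMENT (A->U); LIGATE; CLOSE HELIX',
    'p53': 'P53',
}

for name, lang in (('language 1', descriptions_lang1), ('language 2', descriptions_lang2)):
    print(name, {k: len(v) for k, v in lang.items()})
# The same two objects receive opposite informational profiles in the two languages.
```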
I propose that the primitive functions should be those that enable us to understand the processes of interest. Assume, for instance, that our interest lies in understanding the flows of sequential information between biological polymers. Then assuming that ‘Transcribe’ and ‘Translate’ are given as primitive functions is fine: if they are errorless, they are not difference makers with regard to the final sequences of the products (an assumption that I made in algorithm 3). Generally speaking, it makes sense to consider as primitives those operations that are not difference makers with regard to the outputs of interest, and as inputs those very difference makers: the genericity for functions and the specificity for inputs. Incidentally, it is good algorithmic practice to write functions for generic operations and give them specific variables as inputs. This is not unlike causal specificity: once the generic functional relationships in the causal model are set, information flows from difference makers.
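For instance, a generic ‘Translate’ can be written once and handed specific sequences as inputs; the truncated codon table below is purely illustrative.

```python
# Generic operation as a function, specific sequence as its input
# (a sketch; the codon table is truncated for illustration).
CODON_TABLE = {'AUG': 'M', 'GCC': 'A', 'GAA': 'E', 'UAA': '*'}  # '*' = stop

def translate(rna):
    """Generic, sequence-independent operation: read codons until a stop
    codon. The difference maker for the product is the specific RNA handed
    to the function, not the function itself."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == '*':
            break
        protein.append(aa)
    return ''.join(protein)

print(translate('AUGGCCGAAUAA'))  # 'MAE'
```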
6. Payoff of the Approach
The algorithmic approach sketched above may promote research on biological systems, although not without significant challenges.
Even the best currently available, specifically designed compression algorithm may overestimate biological complexity, because it may not have compressed the object as much as possible. On the positive side, compression algorithms may actually tend to parallel the biological processes that have produced the sequences at stake. For instance, if DNA translocation is frequent, then an algorithm that pays due attention to translocation should be more likely to compress a DNA sequence. Conversely, considering that most strings are random in the algorithmic sense, it is highly unlikely that a series of refined algorithms will converge, if they do converge, toward something other than the processes involved in producing the sequences. It is highly unlikely that the cell will, by chance, produce a string that is compressible by means other than some of its own means of production (or the corresponding models of these means). In other words, improving these algorithms may yield a better grasp of functions that are in fact available in the language of the cell.
Just as an algorithm may overestimate complexity, however, it can also underestimate biological complexity. Because cells are not universal Turing machines, a biological sequence may be more complex than its algorithmic counterpart. For instance, a cell may need a complex process to resist random perturbations when duplicating a sequence, while a universal Turing machine, being deterministic, would not. Similarly, a short sequence may require a complex machinery or a complex evolutionary history to produce it. Just as biological complexity can fall short of algorithmic complexity (when a cell generates randomness), it can also exceed it.
7. Conclusion
This article aimed to give substance to the idea of biological information—an idea that has grounded significant aspects of informal biological thought for the past 50 years. Crick’s seminal use, in molecular biology, of the term ‘information’, meaning the precise determination of sequence, is grounded on causation, not meaning or representation. I inflected this idea in two ways, corresponding to two aspects of information theory: the precise determination of a single output from a set of alternatives (‘information as choice’) and the precise determination of the sequence of a single output (‘information as construction’). These two aspects can be traced back to Crick, whose idea of information as construction—to rephrase in our terms—was an attempt to provide an explanation of information as choice, in the sense of biological specificity (Crick 1958, 1970). This suggests that Griffiths and Stotz’s (2013) idea of distributed specificity is theoretically richer than initially envisioned.
Information as choice is captured by causal specificity, proposed elsewhere to be measured by the Shannon mutual information between values of a cause set by an intervention and observations of the effect. This measure can be applied to causal graphs, such as those representing gene regulatory or animal signaling networks, and has numerous potential applications in biology.
Information as construction is captured by the Kolmogorov complexity of a sequence and the algorithmic mutual information between two sequences. These measures capture the intuition that there is something in common between a program generating a sequence and the biological processes of transcription and translation. I insisted, however, that there is more to biology than discrete, deterministic computing: randomness plays a central role in biological functioning. A similar point could be made regarding the nondiscrete nature of biological phenomena. From the point of view of Kolmogorov complexity, randomness creates information. Such information is not necessarily functional, and distinguishing between information and function is a necessary step toward better understanding how information can lead to function.
I proposed that biological algorithmic complexity ought to be measured using a biologically relevant programming language—the language in which the cell performs its own operations. In such a language, some operations, such as reverse translation, will be impossible. This means that the biological complexity of a sequence can far exceed its own length, making it very different from nonbiological algorithmic complexity. In planned future work, I will take up the challenge of fleshing out the ‘language of the cell’ and articulating the choice and construction aspects of biological information.