1. Introduction
In June 1707 John Mill, fellow of Queen's College, Oxford, published his Greek New Testament. The labour of the previous thirty years of his life, Mill's edition was published just two weeks before his death. The text of this edition is of no particular importance, being as it was a mere reprinting of Stephanus's 1550 text. What was noteworthy was what lay beneath it. In his thirty years of work, Mill had managed to collect an estimated 30,000 variants among the witnesses. It was these variants that became the cause of some controversy in the years that followed. Some felt that the presence of so many differences would render the text and therefore the authority of the New Testament insecure.Footnote 1 It was Richard Bentley, the Master of Trinity College, Cambridge, who offered the most substantial response to these concerns in his Remarks upon a Late Discourse of Free-Thinking, first published in 1713 and surviving through eight editions. Bentley pointed out the connection between the number of manuscripts and the number of variants, writing that ‘if more copies are collated, the sum will mount higher’ and that ‘the more copies you call to assistance, the more do the various readings multiply upon you’.Footnote 2
Three hundred years after Bentley penned these words, the number of known copies of the New Testament has increased significantly. Whereas Mill's edition was based on fewer than a hundred Greek manuscripts, the Institut für Neutestamentliche Textforschung (INTF) in Münster, Germany, currently catalogues over 5,600.Footnote 3 Despite this fifty-six-fold increase, the actual sum of variants that Bentley referenced has not risen at the same rate for the simple (but sometimes forgotten) reason that ‘no one has yet been able to count them all’.Footnote 4 Instead, what has increased steadily since Bentley and Mill are estimates about the total number of variants in the New Testament.
One finds these estimates across the literature, in New Testament introductions, exegetical handbooks, and especially in textbooks on textual criticism. The purpose is almost always to raise awareness about the need for textual criticism. Sometimes the point is made with more pessimism, as when Günther Zuntz, for example, says that the total is an ‘unimaginable and unmanageable mass’.Footnote 5 In still other cases, the estimate plays the same role it played in Mill's day: causing concern for some and thus requiring a response from others.Footnote 6 In some cases, attempts to put these estimates in perspective lead to surprising conclusions about the overall transmission of the New Testament text, as when Stanley Porter suggests that ancient manuscript production ‘nearly rivals that sometimes found today in modern print’ or when Craig Blomberg concludes that there may be as few as eight variants per manuscript.Footnote 7
Despite the continuing appeal of such estimates, Eldon Epp is right that ‘there is, however, no reliable estimate of the total number of variants found in our extant witnesses’.Footnote 8 The present essay hopes to provide just such an estimate while offering a few brief comments on how this estimate might be put to good use. Before turning to our own estimate, it will be useful to trace briefly the estimates that have been offered in the past and to demonstrate something of their inadequacy.
2. Past Estimates and Their Problems
2.1 Survey of Estimates
A survey of books and articles from the last 150 years shows how frequently such estimates are appealed to (for the full list, see the Appendix in Section 6). The starting point – or at least the point of comparison – for many of these is the estimated 30,000 variants in Mill's edition.Footnote 9 One of the first attempts to update the estimate is found in F. H. A. Scrivener's Plain Introduction, first published in 1861. After making the same point as Bentley about more manuscripts producing more variants, Scrivener suggests that, if Mill found 30,000 variants in his day, then the total number ‘must at present amount to at least fourfold that quantity’ (= 120,000).Footnote 10 Although he gives no rationale for his degree of increase, his estimate was picked up by others and even enlarged soon afterward by Philip Schaff, who wrote in 1883 that the number ‘now cannot fall much short of 150,000, if we include the variations in the order of words, the mode of spelling, and other trifles which are ignored even in the most extensive critical editions’.Footnote 11 The qualification Schaff attaches to his own increase highlights the importance of definitions, a point we will return to in due course.
The next jump in the estimate comes from B. B. Warfield of Princeton, who adds yet another 30,000+ variants.Footnote 12 Writing just six years after Schaff, Warfield claims that ‘roughly speaking, there have been counted in it [the New Testament] some hundred and eighty or two hundred thousand “various readings” – that is, actual variations of reading in existing documents’.Footnote 13 Aside from its claim to present a ‘count’ rather than an ‘estimate’, Warfield's number is worth noting not only because he is the first to offer an explanation of how the count was done but even more so because the explanation he gives is so strange. Rather than a count of the number of differences among manuscripts, Warfield actually offers us a count of the number of manuscripts that differ from an unstated standard of comparison. The count, he tells us, is conducted in such a way that ‘each place where a variation occurs is counted as many times over, not only as distinct variations occur upon it, but also as the same variation occurs in different manuscripts’.Footnote 14 This would mean that if a hundred manuscripts agreed against the standard, the result would be a hundred variants.
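The gap between Warfield's tally and a count of distinct variants can be made concrete with a small sketch. The manuscript sigla, the readings and the choice of `standard` below are all invented for illustration; Warfield himself names no standard of comparison, and nothing here is drawn from an actual collation.

```python
# Hypothetical variation unit: manuscript siglum -> reading at one place.
# All sigla and readings are invented for illustration only.
readings = {
    "MS1": "en",
    "MS2": "en",
    "MS3": "epi",
    "MS4": "epi",
    "MS5": "epi",
    "MS6": "eis",
}
standard = "en"  # assumed standard of comparison (Warfield states none)


def warfield_style_count(unit, standard_reading):
    """Count every manuscript that departs from the standard reading."""
    return sum(1 for r in unit.values() if r != standard_reading)


def distinct_variant_count(unit, standard_reading):
    """Count each non-standard reading once, however many manuscripts carry it."""
    return len({r for r in unit.values() if r != standard_reading})


print(warfield_style_count(readings, standard))    # 4
print(distinct_variant_count(readings, standard))  # 2
```

On this toy unit the manuscript-by-manuscript tally is already double the number of distinct readings, and the gap widens with every further manuscript that shares a non-standard reading.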
Over the next forty-five years, the estimates range between Scrivener's and Warfield's with the trend towards Warfield's higher numbers, despite his odd way of ‘counting’. Ezra Abbot suggested 150,000 in 1891, Eberhard Nestle gave 120,000–150,000 in 1897, and Marvin Vincent gave 150,000–200,000 two years later.Footnote 15 Only Adolf Jülicher gave a lower number, suggesting either 30,000 or 100,000, but he felt that the choice made no theological difference since the church has never had an errorless copy from which to work.Footnote 16 By 1915, Charles Sitterly offered Warfield's upper limit alone (200,000), though he makes clear that he is not just thinking of the Greek manuscripts.Footnote 17
In 1934, the estimate makes its next major jump in both Louis Pirot and Léon Vaganay, who put the estimate as high as 250,000.Footnote 18 Pirot, we should note, is the first to point out that there are probably more variants than words in the New Testament. Another fifteen years add another 50,000 variants to the estimate when Erwin Nestle gives 250,000–300,000 in 1951, and this just in the Greek manuscripts according to him.Footnote 19
Almost a hundred years after Scrivener, we find only the second estimate after von Maestricht's estimate of Mill that is based on explicit data. With the work of the International Greek New Testament Project (IGNTP) on Luke, Kenneth Clark wrote in 1966 that scholars can now ‘estimate more accurately the scope and character of the textual condition of the Greek NT’.Footnote 20 Using these data, Merrill Parvis concluded that the actual number is perhaps much higher than previous estimates of 150,000–250,000 and Kenneth Clark made clear just how much higher with his own estimate of 300,000.Footnote 21 Following this, older estimates continue to be cited in the literature, but Clark's 300,000 variants slowly begin to dominate. This number is cited, for example, in essays and books by J. K. Elliott and Ian Moir, Eldon Epp, Bart Ehrman and Eckhard Schnabel.Footnote 22 But like all such estimates, this one too was not to last long.
In 2007, Eldon Epp rounded up his previously cited 300,000 variants to one third of a million, but it was Bart Ehrman, in his bestselling Misquoting Jesus, who was the first to suggest that ‘some scholars’ estimate as high as 400,000.Footnote 23 No doubt due to the book's popularity and certainly in keeping with the historic trend, the largest number offered by Ehrman has since been adopted by a number of authors including J. Harold Greenlee, Daniel B. Wallace and Lee Martin McDonald.Footnote 24 But even now, this number risks being superseded by Eldon Epp's self-styled ‘wild guess’ of 400,000–750,000 variants, a number that marks what is, to date, both the highest estimate and the largest single increase from previous estimates.Footnote 25
2.2 Problems
In his entertaining and helpful guide to spotting dubious data, Joel Best sums up his advice in one sentence: ‘We need to be very careful when we can't tell who produced the figures, why, or how, and when we can't be sure whether consistent choices were made in the measurements at different times and places.’Footnote 26 Unfortunately, the estimates offered over the last 150 years all suffer from just these problems.
In the first case, we often have no idea who produced the estimate. The use of the passive voice to introduce these numbers is rampant. Phrases like ‘some say …’Footnote 27 or ‘one speaks of …’Footnote 28 or ‘it has been estimated that …’Footnote 29 or ‘there have been counted …’Footnote 30 pave a long trail of unverified estimates. By citing the number this way, authors are able to make use of it while at the same time avoiding any real responsibility for it. The problem is made worse when the number is presented as one of ‘the best estimates’,Footnote 31 ‘competent estimates’ (kundiger Schätzung)Footnote 32 or the like. The impression on the reader is that someone somewhere has taken the trouble to work out a sound method of estimating; but no such source appears forthcoming.
Not surprisingly, the second problem is that those who cite these statistics never explain how they arrived at their estimate and this despite the fact that the numbers get repeated again and again in the literature. If we judge these estimates by The Chicago Guide to Writing about Numbers when it says that ‘an essential part of writing about numbers is a description of the data and methods used to generate your figures’,Footnote 33 then all previous estimates must be deemed inadequate. Most estimates come with no rationale whatsoever, but even those few that do are problematic. Several estimates are offered as multiplications of Mill's 30,000 variants, but no rationale is given for the rate of multiplication. Worse still, they fail to recognise that their starting number is itself an estimate. Warfield is unique in telling us how his numbers were ‘counted’, but on this point there is every reason not to follow him. The most promising estimates of the bunch are those offered by Parvis and Clark because they were based on fresh collations of a significant number of manuscripts of Luke. But it turns out that neither estimate is based on a count of the variants found in the Luke collations, but only on an estimate of them, and precisely here the two scholars disagree. Whereas Parvis suggests that there are 30,000 variants in 150 of the 300 manuscripts collated, Clark estimates 25,000 variants among all 300 manuscripts.Footnote 34 The fact that Clark derives fewer variants from more manuscripts suggests that something is amiss. This, of course, illustrates the broader problem of basing one estimate on another.
The third problem is that it is not always clear what is being estimated. Is it some differences among some witnesses, some differences among all witnesses, or all differences among all witnesses to the New Testament? Eldon Epp, for example, has elsewhere carefully distinguished ‘textual readings’ from ‘textual variants’ with the latter excluding all ‘nonsense readings’, ‘clearly demonstrable scribal errors’, ‘orthographic differences’ and ‘singular readings’.Footnote 35 But when it comes to his own ‘wild guess’ of 400,000–750,000, which of these does he have in mind? As with so many past estimates, the answer is not clear.
3. Proposing a New Estimate
3.1 Method and Scope
Like any survey of bad statistics, this one may leave the impression that all numbers are meaningless. But this would be unduly cynical. The truth is that the most important feature of good statistics is very simple: they are public – public in the sense that ‘we are told where they come from and how they were produced, but also public in the sense that dissenting views about methods might be taken into account and used to refine definitions and measurement choices.’Footnote 36 It is this quality above all that we attempt to provide in the estimate that follows.
3.1.1 Who
If any estimate is to be useful, it must clearly explain the who, the what and the how that characterise all good statistics. The first is obviously the simplest. The estimate offered below is my own and therefore so is the credit or discredit for its quality.
3.1.2 What
In the second case, I limit my estimate to the number of variants found in the Greek manuscripts only (papyri, majuscules, minuscules and lectionaries). This is not to disparage other witnesses such as the versions, patristic citations, inscriptions etc., but is simply due to the difficulties of translation technique, citation style and, in many cases, the dearth of robust data.
The question of what we are counting is at once complex and simple. It is complex because any decision about what constitutes a difference between any two texts involves the subjectivity of human judgement. Yet it is simple in this particular case because I will be entirely dependent on the collations of others. In looking for good collations to work from I have chosen those that include the most data from the most witnesses in the most accessible form.Footnote 37
The three main sources I have selected are Bruce Morrill's dissertation on John 18, Matthew Solomon's dissertation on Philemon and Tommy Wasserman's work on Jude.Footnote 38 Each of these works offers some of the most extensive collation data available for the Greek New Testament. A fourth resource considered is the Text und Textwert series published from 1987 to 2005 by Kurt Aland and his colleagues at the Institut für neutestamentliche Textforschung.Footnote 39 However, because the Text und Textwert volumes only provide collations in select passages (or Teststellen), they will have to be used with care as will be explained in what follows. For comparison, the relevant features of each of these four sources are listed in Table 1.Footnote 40
The most important aspect of our estimate is, of course, the definition of the term ‘variant’. So far we have used ‘variant’, ‘reading’ and ‘difference’ interchangeably and somewhat imprecisely. But if our estimate is to be useful, we need to be crystal clear about what it is we are estimating. Within the discipline of New Testament textual criticism, a number of attempts have been made to distinguish the terms ‘variant’ and ‘reading’, but a consensus has yet to emerge.Footnote 41 For the present purpose, I will restrict myself to the term ‘textual variant’, which I define as a word or concatenation of words in any manuscript that differs from any other manuscript within a comparable segment of text, excluding only spelling differences and different ways of abbreviating nomina sacra.Footnote 42
Before moving on, two important observations should be made about this definition. First, the definition is relative to the manuscripts themselves rather than to any particular editorial text.Footnote 43 This means that at any point of comparison where there are at least two readings, all of them are counted as ‘variants’, even those that the collator or editor believes to be the original source of the other(s). In this context, then, ‘original’ and ‘variant’ are not mutually exclusive descriptors.
Second, notice should be taken of the important qualification ‘comparable segment of text’ in our definition. This phrase simply designates what textual critics normally refer to as a ‘variant unit’.Footnote 44 Deciding exactly where to place the boundaries of comparable segments is a matter of human judgement and one that, significantly for our purposes, can affect the number of resulting variants.Footnote 45 Exactly how much it may affect the overall results is hard to say with certainty, but my impression from working in multiple datasets is that the more complete the collation, the less effect such decisions have on the overall number of variants. In any case, it must be said that the following estimate is entirely dependent on the judgement of others when it comes to setting these boundaries.
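The definition and its two qualifications can be sketched as a counting rule. This is only an illustration under stated assumptions: the manuscript sigla, the transliterated readings and the `SPELLING` table are hypothetical, and the boundaries of each unit are taken as given, just as the estimate itself takes them from the collators.

```python
# Hypothetical spelling table: maps an orthographic variant to a common form.
SPELLING = {"ihsous": "iesous"}


def normalise(reading):
    """Collapse spelling differences so they are not counted as variants."""
    return SPELLING.get(reading, reading)


def count_textual_variants(variant_units):
    """Count every distinct reading in each unit that has at least two."""
    total = 0
    for unit in variant_units:
        distinct = {normalise(reading) for reading in unit.values()}
        if len(distinct) >= 2:
            # All readings count, including any the editor takes to be original.
            total += len(distinct)
    return total


# Each unit maps an invented manuscript siglum to its reading in that segment.
units = [
    {"MS1": "kai", "MS2": "kai", "MS3": "de"},            # 2 readings
    {"MS1": "autou", "MS2": "autou", "MS3": "autou"},     # no variation
    {"MS1": "iesous", "MS2": "ihsous", "MS3": "kurios"},  # 2 once spelling is collapsed
]

print(count_textual_variants(units))  # 4
```

Because no reading is singled out as the base text, a unit with two readings contributes two variants rather than one. Redrawing the unit boundaries would change the input list and hence, potentially, the total.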
3.1.3 How
Given these collation sources and our definition of what is to be counted, it remains to explain how we will arrive at our overall estimate for the entire New Testament. The first thing to note is that our estimate is not based on another estimate but instead on an actual count of textual variants. In this respect, it differs from all previous estimates. Still, it is an estimate, and every estimate is essentially an extrapolation from one set of data to another. The simplest point of extrapolation in our case is the number of words in each book of the New Testament. Obviously this number depends on the edition we use but, so long as we use the same edition for each side of the formula, the results will be consistent. Because of its close relationship with the Text und Textwert volumes, I have chosen the 27th edition of the Nestle–Aland Novum Testamentum Graece, which has 138,020 words including those in double and single brackets.Footnote 46 If we know the number of variants per word in one section of text, or what we might call the ‘rate of variation’, we can extrapolate from this to the New Testament as a whole. The formula is as follows:
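a ÷ b = y ÷ z, where a is the number of variants in the sample, b the number of words in the sample, y the number of variants in the New Testament and z the number of words in the New Testament (here, NA27).
Since we are interested in the number of variants in the New Testament (y), we can arrange the formula as (a ÷ b) × z = y.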
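As a minimal sketch, the rearranged formula (a ÷ b) × z = y can be written as a short function. The NA27 word count is the figure given above; the sample figures in the usage line are placeholders chosen only to show the arithmetic and are not taken from any of the collations.

```python
NA27_WORDS = 138_020  # z: word count of NA27 as stated in the text


def estimate_total_variants(sample_variants, sample_words, total_words=NA27_WORDS):
    """Extrapolate a sample's rate of variation to the whole text: (a / b) * z = y."""
    if sample_words <= 0:
        raise ValueError("sample_words must be positive")
    rate_of_variation = sample_variants / sample_words  # variants per word (a / b)
    return rate_of_variation * total_words


# Placeholder sample: 300 variants in 100 words (invented, not from any collation).
print(round(estimate_total_variants(300, 100)))  # 414060
```

The function makes the underlying assumption visible: a single rate of variation from the sample is applied uniformly to every word of the edition.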
3.2 Data for the Estimate
To arrive at the rate of variation for each corpus, I carefully combed the selected collations and counted the variants in each one, noting nonsense or singular readings where possible. In some cases the count was aided by the availability of electronic datasets, but otherwise it was done by hand. The raw data from our three main sources are presented in Table 2.
To these data we can add a number of useful points of comparison such as the number of words and the number of variant units. We can also tabulate what percentage of total variants are nonsense and singular readings. These comparisons are given in Table 3 (rates are rounded to the nearest hundredth and percentages to the nearest tenth).
* The last two columns show the percentage of variants that are grammatically or logically nonsensical or that only occur in one of the collated manuscripts. These last two categories are not mutually exclusive.
Before proceeding to our estimate, a few observations are worth making. First, the percentage of singular variants is especially high, averaging just over half of all variants across the three collations and reaching nearly 60 per cent in John 18. The percentage of nonsense variants is not as high but is still significant, averaging over 30 per cent across the three collations and reaching nearly 45 per cent in John 18. Not surprisingly, these last two categories show substantial overlap so that 86.3 per cent of all nonsense variants in John 18 are also singular variants. In Philemon the percentage is 64.2 and in Jude it reaches 84.7 per cent. This confirms that obvious mistakes were the easiest kind for scribes to spot and then correct.
Second, we should consider the relationship between the number of variants and the number of manuscripts. It is true, as Bentley knew, that collating more manuscripts increases the number of variants. But we can also say that the increase is not linear or exponential but rather logarithmic. This is because the majority of manuscripts are Byzantine, which means they are also the most uniform. As more Byzantine manuscripts are collated, they individually contribute fewer and fewer variants. We can see this first of all by noting that the rate of variation (or words-to-variants) is very close between the three collations despite the fact that John 18 has almost three times the number of manuscripts. The reason is that so many of these additional manuscripts are Byzantine. We can observe the same effect if we compare Wasserman's collation of Jude to that of the Editio Critica Maior (ECM).Footnote 47 Although Wasserman collated more than quadruple the number of witnesses, the result was less than double the number of total variants.Footnote 48 The reason is the same: when it comes to Byzantine manuscripts and the number of textual variants, the law of diminishing returns sets in.Footnote 49
3.3 A Proposed Estimate
On the basis of these numbers, we are now in a position to estimate the total number of variants in the Greek New Testament. Our formula again is (Number of variants in the sample ÷ Number of words in sample) × Number of words in NA27 = Estimated number of variants in the New Testament.
Given that these estimates are based on collations from a range of the New Testament (Gospels, Pauline Epistles and Catholic Epistles), they are remarkably similar. If they have a shortcoming, however, it is that they assume a constant rate of variation across the entire New Testament. In order to let the transmission of each book have its due, we could use the data from the Text und Textwert volumes, being aware that they offer data only in the 920 test passages (Teststellen) and that they do not include any nonsense variants.Footnote 50 The data from these volumes are presented in Table 4 and Table 5.
* The number of manuscripts is taken from the test passage in each book with the greatest number of witnesses cited. Omissions that result from either homoeoteleuton or homoeoarchton (designated with ‘U’ or ‘V’ in the apparatus) are counted only where they result in a distinct reading within their variation unit. When multiple such omissions occur in the same variation unit, they are not counted as singular readings. Manuscripts that omit all of Mark 16.9–20 or John 7.53–8.11 are not recounted in subsequent variation units within these passages. A dash marks unavailable data.
* The word counts are taken from the primary line text of each test passage (marked in Text und Textwert by an underline).
To ensure that each book's transmission is treated separately, we applied our formula to each book individually and only then added the totals together.Footnote 51 The result is the highest estimate so far: 591,044 variants for the entire New Testament. Comparing this with the other three estimates, it is striking that the more expansive collations result in lower estimates. How could this be? One explanation might be that John 18, Philemon and Jude were more carefully copied than other parts of the New Testament and therefore exhibit below average rates of variation as compared to the rest of the New Testament. The more likely explanation is found in the selective nature of the Text und Textwert test passages, which may not be as representative of the amount of variation as we might hope. The test passages were not, after all, chosen at random, but were ‘carefully selected’ for the specific purpose of evaluating a manuscript's textual worth (Textwert).Footnote 52 In fact, we do not need to hypothesise this explanation; we can demonstrate it by comparing the overlapping data in Table 6.
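The difference between applying the formula book by book and applying a single pooled rate can be illustrated with a short sketch. The two books and all of their counts below are invented; they stand in for the Text und Textwert data, which are not reproduced here.

```python
def estimate_by_book(samples, book_words):
    """Apply (variants / sample words) * book words to each book, then sum."""
    return sum(
        variants / words * book_words[book]
        for book, (variants, words) in samples.items()
    )


def estimate_pooled(samples, book_words):
    """Apply one pooled rate of variation to the combined word count."""
    total_variants = sum(v for v, _ in samples.values())
    total_sample_words = sum(w for _, w in samples.values())
    return total_variants / total_sample_words * sum(book_words.values())


# Invented figures: book -> (variants in test passages, words in test passages).
samples = {"Book A": (120, 40), "Book B": (50, 50)}
# Invented figures: book -> total words in the edition.
book_words = {"Book A": 1000, "Book B": 3000}

print(estimate_by_book(samples, book_words))        # 6000.0
print(round(estimate_pooled(samples, book_words)))  # 7556
```

In this toy case the pooled rate lets the heavily varied short book inflate the estimate for the long one. Summing per-book estimates avoids that, though it still assumes that the sampled passages are representative of their own book.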
* Nonsense readings are here excluded from Solomon's and Wasserman's data and included for the Text und Textwert data for John 1–10.
In all three cases, the Text und Textwert test passages show above average rates of variation. In the case of John, there are 0.29 more variants per word in the John 1–10 test passages than in Morrill's John 18 collation; in Jude, the rate is 0.74 more variants per word in the test passages; and particularly striking, in Philemon the rate is 1.75 more variants per word. This means that if we were to use the Text und Textwert test passages to estimate the number of variants in all of Philemon and Jude, our estimate would overshoot the actual number of variants by more than 580 and 350, respectively. The difference might seem slight, but if the same rate of overestimation held across the New Testament, the result would be 100,000–240,000 variants too many. Even so, our estimate would not be wildly off the mark, and the benefit of having data from each individual book means that we should not discard the Text und Textwert estimate completely.
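The range of 100,000–240,000 follows from multiplying the excess rates just cited by the NA27 word count. The sketch below reproduces that arithmetic using only figures stated in the text; the variable and function names are my own.

```python
NA27_WORDS = 138_020  # word count of NA27 as stated in the text

# Excess variants per word in the test passages relative to the full collations.
excess_rates = {"John": 0.29, "Jude": 0.74, "Philemon": 1.75}


def excess_variants(excess_rate, words=NA27_WORDS):
    """Variants overestimated if the excess rate held across `words` words."""
    return excess_rate * words


for corpus, rate in excess_rates.items():
    print(corpus, round(excess_variants(rate)))
# John 40026
# Jude 102135
# Philemon 241535
```

The Jude and Philemon rates give the lower and upper ends of the range quoted above; the John rate would imply a smaller overshoot of about 40,000.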
We suggest that a reasonable estimate for the number of textual variants in the Greek New Testament (not including spelling differences) is about 500,000. This estimate – and we emphasise that it is still an estimate – is based on a sample size of about 3 per cent of the entire Greek New Testament and includes minuscules, majuscules and some lectionaries. Except for Revelation, it is based on data from portions of every book and therefore does not assume that all books were copied with the same frequency or the same accuracy. It does not include variants from patristic citations, versions, amulets or inscriptions.
4. The Value of the Estimate
If the preceding estimate is reasonable, what is its value? Some might suggest that there is no value whatsoever. Kenneth Clark is convinced, for example, that ‘counting words is a meaningless measure of textual variation, and all such estimates fail to convey the theological significance of variable readings’.Footnote 53 We may agree with the second claim without agreeing with the first. There is no reason to be so pessimistic that counting and estimating can tell us nothing at all about the overall transmission of the New Testament; we simply need to be careful how we use the data.
By way of negative example, we might be tempted to compare our estimate to the number of extant manuscripts as Craig Blomberg and Stanley Porter have done.Footnote 54 In that case we could conclude that each of our 5,600 manuscripts contributes, on average, only ninety variants. But a moment's reflection reminds us that our Greek manuscripts are of such widely varying length that this is a meaningless comparison – just think of ninety variants in Codex Sinaiticus and P52 alike. What if we used pages instead of manuscripts as our unit of comparison? The homepage for the New Testament Virtual Manuscript Room (NT.VMR) currently lists the number of catalogued pages for the Greek New Testament at 2,111,770 pages.Footnote 55 This would mean, on average, about one variant contributed to the total for every four pages; or 0.25 variants per page. Unfortunately, we still have the problem that a page is not a stable unit of comparison since pages vary both in size and in the amount of text they contain without any necessary correlation between the two. It is, after all, in the process of copying words that scribes introduce variants, not in trimming pages or in binding them together.
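For readers who wish to check the averages quoted in this paragraph, the arithmetic is sketched below using only the figures given in the text (the rounded estimate of 500,000, the 5,600 manuscripts and the NT.VMR page count). The constant names are my own.

```python
ESTIMATED_VARIANTS = 500_000  # proposed estimate, rounded
MANUSCRIPTS = 5_600           # Greek manuscripts catalogued by the INTF
PAGES = 2_111_770             # catalogued pages listed by the NT.VMR

variants_per_manuscript = ESTIMATED_VARIANTS / MANUSCRIPTS
variants_per_page = ESTIMATED_VARIANTS / PAGES
pages_per_variant = PAGES / ESTIMATED_VARIANTS

print(round(variants_per_manuscript, 1))  # 89.3
print(round(variants_per_page, 2))        # 0.24
print(round(pages_per_variant, 2))        # 4.22
```

These are the figures rounded in the text to ninety variants per manuscript and one variant for every four pages. Reproducing them does nothing to rescue the comparisons, which remain misleading for the reasons given.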
As a further example, comparison has often been made between the number of variants and the number of words in the New Testament (presumably in some particular edition).Footnote 56 This leaves one with more variants than there are words, a view of the matter which some seem to find particularly appealing for its ‘shock value’. Despite its popularity, this comparison is the most dubious, at least if it is intended to tell us anything about the transmission of the New Testament. The reason is that it completely fails to recognise that the same process that introduces variants into a textual tradition (i.e. copying) also increases the total number of words that thereby attest to that very same textual tradition. As with the other comparisons considered, this one also fails to recognise that scribes introduce variants only in the process of writing. As before, the result is a false comparison.
Can we, then, say anything meaningful about textual transmission of the New Testament based on the number of estimated variants? We can if we compare the number of variants in our manuscripts, not with the number of manuscripts, pages, or words in the New Testament, but instead with the number of words in the manuscripts from which the variants derive.Footnote 57 Unfortunately, no one knows the number of words in our extant manuscripts and probably no one will for some time still. Nevertheless, we can make such a comparison on a small scale with the data from our three main collation sources. If, for example, we assume that all 1,659 manuscripts collated for John 18 have somewhere between the NA27's 791 words and Robinson-Pierpont's 801 words, this would tell us that scribes contributed, on average, roughly one new variant for every 430 words they copied. This is only slightly lower than what David Parker calculates for two very close members of family 1 in Matthew: one variant for every 550 words.Footnote 58 Turning to Philemon and Jude, the rate drops significantly to about one variant for every 150 words copied in both cases. As before, the difference is surely attributable to the smaller number of Byzantine manuscripts of Philemon and Jude. In all three cases, however, the data confirm that the large number of variants is a reflection of the frequency with which scribes copied more than a reflection of their failure to do so faithfully.Footnote 59
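The comparison for John 18 can be sketched as follows. The manuscript count, the two word counts and the ratio of roughly 430 words per variant come from the text. The variant count is back-calculated from that rounded ratio purely to show how the quantities relate; it is an approximation and not the count reported from Morrill's collation.

```python
JOHN18_MANUSCRIPTS = 1_659
WORDS_NA27 = 791              # John 18 in NA27
WORDS_ROBINSON_PIERPONT = 801
WORDS_PER_VARIANT = 430       # rounded ratio quoted in the text


def words_copied(manuscripts, words_per_manuscript):
    """Total words copied, assuming every manuscript has the full passage."""
    return manuscripts * words_per_manuscript


def words_per_variant(manuscripts, words_per_manuscript, variants):
    """Average number of words copied for each variant in the collation."""
    return words_copied(manuscripts, words_per_manuscript) / variants


low_total = words_copied(JOHN18_MANUSCRIPTS, WORDS_NA27)
high_total = words_copied(JOHN18_MANUSCRIPTS, WORDS_ROBINSON_PIERPONT)
print(low_total, high_total)  # 1312269 1328859

# Approximate variant count implied by the rounded ratio (not a reported figure).
implied_low = low_total / WORDS_PER_VARIANT
implied_high = high_total / WORDS_PER_VARIANT
print(round(implied_low), round(implied_high))
```

The sketch also exposes the simplifying assumption in the text: every collated manuscript is treated as containing the whole of John 18, which overstates the words copied wherever manuscripts are fragmentary.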
Another way our proposed estimate is helpful is that it is founded on qualitative and not merely quantitative data. We can say, for example, that almost 50 per cent of our estimated variants are the kind that many textual critics would deem to be the least likely to be original, namely, singular readings.Footnote 60 We can go further and note that in John 18, 44 per cent of all variants are such that the editor could not make sense of them either logically or grammatically (i.e. ‘nonsense’ variants). In Philemon and Jude, the rates are lower but still amount to 18 and 29 per cent, respectively. This simply confirms what seasoned textual critics have always known and that is that a significant percentage of the variants in our manuscripts have little or no claim to being original.
5. Conclusion
Roughly 150 years after Mill's edition was published with its estimated 30,000 variants, Scrivener suggested that the number should be quadrupled. Now, more than 150 years after Scrivener, we can more than quadruple Scrivener's estimate, although we do so with reference to Greek manuscripts alone. We can also say that all previous estimates have been too low, especially those that claim to include variants from versional and patristic sources. The exception is Eldon Epp's ‘wild guess’ of up to 750,000, which is probably too high, even with the inclusion of patristic and versional evidence. Most importantly, our estimate allows scholars to avoid passing the responsibility for their estimates to silent and invisible sources. The present estimate is based on a clear foundation in the available data and a clear method, both of which are open to public scrutiny. One hopes that these two qualities alone will be enough to discourage all of us from the continued rehashing of unverified and unverifiable information about the transmission of the Greek New Testament.Footnote 61
6. Appendix: Survey of Estimates
The following list offers a survey of estimates in New Testament introductions, dictionary and encyclopedia articles, exegetical handbooks, books on New Testament textual criticism, and books about the origin and formation of the Bible from the last 150 years. Where an author has been cited in the main text, only partial bibliography is given here.