To sum up, we may accept that Zielinski's statistics, while they are far from perfect, do nevertheless give a tolerably accurate picture of Cicero's clausulae … It is conceivable that in the future computer technology may allow accurate statistics to be produced for large amounts of material, such as whole authors, at the touch of a button. But until that day arrives, Zielinski's figures for Cicero's speeches … may suffice. They are the best we have and, until computers come to our aid, will not be improved upon.Footnote 1
For over a century, scholars studying Latin prose rhythm have relied on the statistics generated by Theodor Zielinski's pioneering Das Clauselgesetz in Ciceros Reden.Footnote 2 They have also complained about his methodology and its inadequacies:Footnote 3 Zielinski read his own Russian translations of Cicero's speeches out loud in order to develop a feel for where sense breaks (and so clausulae) occurred in the Latin;Footnote 4 he arbitrarily decided that the cretic was the basis for Latin prose rhythm;Footnote 5 he did not compare his observed frequencies of clausular patterns to any expected values, thus ignoring the naturally occurring rhythms of the Latin language;Footnote 6 he came up with dubious rules for word division and resolutions within his clausular categories.Footnote 7 But this was path-breaking scholarship, for Zielinski had no real predecessors — and he has had no successors either.Footnote 8 In the 115 years since Das Clauselgesetz, no scholar has had the Sitzfleisch to do what Zielinski did: he counted, by hand, 17,902 clausulae in Cicero's speeches. He analysed his results in detail and produced elegant summary tables, all without the aid of electronic calculators. The result is an imposing and apparently authoritative monument.
The real problem with Zielinski's analysis, however, is not its methodological basis. About his methodology Zielinski is an exemplar of openness and honesty: he lays out his assumptions and reasoning at every step of the process, and in exhaustive detail. While all of these have been questioned, no one would expect the first explorer of uncharted terrain to map it perfectly. A bigger problem is that Zielinski's results are unverifiable and unreproducible. He seems to provide a deluge of data, but readers must trust that he has scanned and counted and tabulated correctly, for he provides comprehensive scansion for only one speech. Zielinski was a great scholar, but in most fields of scientific inquiry we do not simply accept unverifiable pronouncements. And yet it is not just Zielinski: all scholars of Latin prose rhythm who give even partial statistics have presented their varying results and varying methodologies from a black box that could not be inspected or verified.Footnote 9 Furthermore, there looms a potentially even bigger problem: if one wanted to modify Zielinski's methodology — disregarding some of his strictures on word division and resolution, say — it would require recounting everything from scratch.
Fortunately, computers have come to our aid.Footnote 10 In this article, we describe a series of interrelated algorithms and modules that can produce a comprehensive analysis of the prose rhythms of a given corpus of Latin literature with a few keystrokes. This digital approach presents entirely new possibilities for the study of prose rhythm. With complete openness and transparency, we can calculate prose rhythm statistics from across the whole of extant Latin literature. Furthermore, we can be absolutely consistent in our procedures and confident in our statistics, and yet we are not bound to any one methodology. If it becomes clear, for example, that we should treat elision differently, we can do so and generate new numbers and new statistics — instantly. Zielinski laboriously counted 17,902 clausulae by hand over years: we count hundreds of thousands of clausulae in seconds. Furthermore, all of our results are verifiable from the highest to the lowest level: we can show how any individual phrase has been scanned and categorised, and all of our code and data are open source. We can thus answer fundamental and challenging questions about prose rhythm, and answer them with speed, consistency and transparency.
I METHODOLOGY
Latin prose rhythm sometimes looks like a species of philological witchcraft, albeit one without the seductive power of most black magic. In part this is because the ancient testimonia on the subject are confused or confusing, and ancient theory does not always seem to match ancient practice.Footnote 11 But it is clear that ancient orators and rhetoricians perceived prose rhythm as a real phenomenon, and they cannot be faulted for failing to reduce a complex and intuitively felt system to a set of clear rules. Indeed, it was not just prose rhythm that caused headaches for ancient linguistic theorists: everything from the Latin stress accent to its ablative case proved obstacles for ancient authorities trying to systematise the properties of their language.Footnote 12 Self-diagnosis is hard.
With the distance of two millennia and a bevy of statistics, we may actually stand a better chance today of describing the practice of ancient prose rhythm. Modern theories continue to proliferate, and we do not propose to adjudicate among them here.Footnote 13 After much trial and error, we have settled on a system that both seems generally reasonable and accounts for the data. Our typology accords fairly well with modern scholarly approaches, but it is fundamentally a pragmatic choice, adopted because it yields useful and interesting results.Footnote 14 It is not meant to be the last word. Again, a virtue of the digital approach is that we can adjust — and have adjusted — our methods and classification as our understanding improves. Looking at the data that we provide, new readers may detect other points of interest which have eluded us.
We divide all possible clausular patterns into seven main categories. Of these seven categories, the first four — cretic-trochaic, double cretic/molossus cretic, ditrochaic and hypodochmiac — are traditionally considered ‘rhythmic’. These are the rhythmic preferences that seem to have been developed for Greek prose by the shadowy Hegesias (third century b.c.) and are exemplified in Latin by Cicero.Footnote 15 When scholars talk about ‘rhythmic’ authors, they usually mean those who follow this system.Footnote 16 Hegesias’ doctrines were very influential and found a number of adherents; as we will see, looking at authors’ differing preferences for so-called ‘rhythmic’ and ‘non-rhythmic’ clausulae has real explanatory power. But we will also see that all Latin authors have their own rhythmic preferences, even those who do not follow this artificial system. Thus, in a slight but significant terminological shift, we will avoid calling authors ‘rhythmic’ and ‘non-rhythmic’, even as we still find it useful to compare the artificially ‘artistic’ rhythms (the first four categories below) with ‘non-artistic’ rhythms (the last three).
1. Cretic-Trochaic: –⏑– –×
Resolved:
a. ⏑⏑⏑– –×
b. –⏑⏑⏑–×
c. –⏑–⏑⏑×
2. Double cretic/molossus cretic: –⏑– –⏑× or – – – –⏑×
Resolved:
a. ⏑⏑⏑– –⏑×
b. –⏑⏑⏑–⏑×
c. –⏑–⏑⏑⏑×
d. ⏑⏑– – –⏑×
e. –⏑⏑– –⏑×
f. – –⏑⏑–⏑×
g. – – –⏑⏑⏑×
h. –⏑– – –⏑×Footnote 17
3. Double trochee: –⏑–×
Resolved:
a. ⏑⏑⏑–×Footnote 18
b. –⏑⏑⏑×Footnote 19
4. Hypodochmiac: –⏑–⏑×Footnote 20
Resolved:
a. ⏑⏑⏑–⏑×
b. –⏑⏑⏑⏑×
5. Spondaic: – – –× (no resolutions)
6. Heroic: –⏑⏑–× (no resolutions)
7. Miscellaneous (everything else)Footnote 21
In the first four categories, we allow one resolution of a long into two shorts. Despite some temptation, we have nowhere permitted two or more resolutions in a single clausula. Once you allow more than one resolution, clausulae quickly begin to lose their individual character: should ⏑⏑⏑⏑⏑–× count as a twice-resolved cretic-trochaic or a once-resolved double trochee? There are ways around this problem, but complications immediately multiply, and we doubt whether something like ⏑⏑⏑⏑⏑⏑⏑× could ever be felt as anything other than a very long series of shorts.Footnote 22
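As an illustration only, the typology above can be written down as a small lookup table. The sketch below is not the code used for this article: the ASCII notation ("-" long, "u" short, "x" final anceps), the name `classify` and the rule of trying longer patterns first are all assumptions made for the example. The last of these matters because some shorter patterns are suffixes of longer ones (a resolved double trochee ⏑⏑⏑–× sits inside the resolved cretic-trochaic –⏑⏑⏑–×).

```python
# Hypothetical encoding of the seven categories; "-" = long, "u" = short,
# "x" = final anceps (matches either quantity).
CATEGORIES = {
    "cretic-trochaic": ["-u--x", "uuu--x", "-uuu-x", "-u-uux"],
    "double cretic/molossus cretic": [
        "-u--ux", "----ux",
        "uuu--ux", "-uuu-ux", "-u-uuux", "uu---ux",
        "-uu--ux", "--uu-ux", "---uuux", "-u---ux",
    ],
    "double trochee": ["-u-x", "uuu-x", "-uuux"],
    "hypodochmiac": ["-u-ux", "uuu-ux", "-uuuux"],
    "spondaic": ["---x"],
    "heroic": ["-uu-x"],
}


def classify(scansion: str) -> str:
    """Return the category whose pattern matches the end of the scansion.

    Assumption: longer patterns take priority over shorter ones.
    """
    patterns = sorted(
        ((p, name) for name, ps in CATEGORIES.items() for p in ps),
        key=lambda item: len(item[0]),
        reverse=True,
    )
    for pattern, name in patterns:
        n = len(pattern)
        # Compare everything except the final syllable, which is anceps.
        if len(scansion) >= n and scansion[-n:-1] == pattern[:-1]:
            return name
    return "miscellaneous"
```

On this encoding `classify("-uuu--")` (the shape of ēssĕ uĭdĕātūr) falls under cretic-trochaic, and a run of eight shorts falls under miscellaneous.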
We have used the Packard Humanities Institute (PHI) Latin texts as our corpus of data.Footnote 23 These texts are of high quality and freely available, although they require extensive preprocessing for machine analysis. First they must be converted to Unicode, and extra spaces and line breaks must be removed, along with section numbers, book divisions and so forth. Then their orthography must be made uniform: we have converted consonantal i and u to j and v throughout, and systematically incorporated certain unusual features of Latin prosody (for example, huius → hujjus). Then the texts must be ‘macronised’: vowels that are long by nature must be so marked. This is a non-trivial process for which we have used the excellent tool of Johan Winge, which shows a remarkably high degree of accuracy for classical Latin texts (95–98 per cent).Footnote 24 This done, the texts must be syllabified, i.e., separated out into their constituent syllables; here again we have made use of an open-source tool, this time from the Classical Language Toolkit (CLTK).Footnote 25 Finally, problematic elements must be removed from our sample and tracked separately: we exclude clausulae that contain abbreviations (most notably proper names), Roman numerals, textual corruptions marked by editors (daggers, brackets and the like) or fewer than four syllables.
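The final exclusion step can be pictured roughly as follows. This is a hedged sketch, not the implementation behind our figures: the function name `exclusion_reason`, the set of corruption marks and the heuristics for spotting abbreviations (a token ending in a full stop) and Roman numerals (an upper-case token of two or more numeral letters) are assumptions chosen for illustration.

```python
import re

# Assumed editorial marks signalling a corrupt or doubtful passage.
CORRUPTION_MARKS = set("†[]<>")
# Heuristic: only upper-case tokens of two or more numeral letters.
ROMAN_NUMERAL = re.compile(r"^[IVXLCDM]{2,}$")


def exclusion_reason(words, n_syllables, min_syllables=4):
    """Return why a clausula is excluded, or None if it is kept."""
    if n_syllables < min_syllables:
        return "too short"
    for word in words:
        if any(ch in CORRUPTION_MARKS for ch in word):
            return "textual corruption"
        if word.endswith("."):
            return "abbreviation"
        if ROMAN_NUMERAL.match(word):
            return "roman numeral"
    return None
```

Returning the reason rather than a bare true/false makes it possible to track the excluded clausulae separately.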
After preprocessing, by default we collect up to thirteen syllables’ worth of clausular data before every mark of ‘heavy’ punctuation, viz. full stops, semicolons, colons, question marks and exclamation marks (. ; : ? !). This is not a perfect method, since clausulae can and do occur where editors tend to punctuate with commas, as well as in places where there is no punctuation at all.Footnote 26 Furthermore, many previous scholars have looked only at clausulae before full stops, question marks and exclamation marks.Footnote 27 Including semicolons and colons by default seems best to us, but within our framework users can decide for themselves which punctuation they would like to consider, and so results with different punctuation patterns can easily be generated.Footnote 28
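A minimal sketch of this collection step, under stated assumptions, might look like the following. The names `clause_ends` and `clausula_window` are hypothetical, and the sketch presumes that abbreviations carrying a full stop have already been dealt with, since it would otherwise split at them.

```python
import re


def clause_ends(text, marks=".;:?!"):
    """Return the stretches of text that precede each chosen punctuation mark.

    Text after the last mark is ignored, since no clausula is closed there.
    """
    cls = re.escape(marks)
    pattern = re.compile(rf"([^{cls}]+)[{cls}]")
    return [m.group(1).strip() for m in pattern.finditer(text)]


def clausula_window(syllables, max_syllables=13):
    """Keep at most the last `max_syllables` syllables of a clause."""
    return syllables[-max_syllables:]
```

Passing `marks=".?!"` reproduces the narrower practice of counting only before full stops, question marks and exclamation marks.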
Then these data must be scanned, sorted and counted. On the one hand, it is easy to write a programme to scan macronised Latin texts. The basic rules are straightforward: if a syllable is closed or ends with a long vowel, it is long. If a syllable is open and ends with a short vowel, it is short. But there are a variety of subtleties that must be accounted for, including elision, instances of mute + liquid and cases of short open syllables before s impura (sc, sm, sp, sq, st, z; so ipse sceleratus); in Cato, at least, one might even countenance ‘sigmatic ecthlipsis’, or loss of final s.Footnote 29 By default we do elide, do lengthen a short final open syllable followed by an s impura, but do not lengthen a short vowel followed by a mute + liquid. We think that this is the most accurate representation of classical Latin pronunciation.Footnote 30 But we also allow users to set these parameters for themselves, and we try to track more fine-grained data as well: so we record whether an elision is of a long vowel/diphthong or of -m or of a short vowel and allow users to choose to elide or not elide in any of these categories.Footnote 31 We furthermore track word division/word shape and word accent, which may be relevant if we wish to consider iambic shortening or rules for resolutions that depend on word division or hypothetical ‘prose ictus’.Footnote 32 In sum, we have built in flexibility to allow users to set their own preferred parameters and slice the data differently.
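The basic rule, with the s impura default, can be sketched as below. This is an illustration rather than our scanner: it assumes input that is already macronised and syllabified, treats only ae, au and oe as diphthongs (a simplification), and leaves elision aside. The function name `quantity` and its parameters are hypothetical.

```python
LONG_VOWELS = set("āēīōūȳ")
VOWELS = set("aeiouy") | LONG_VOWELS
DIPHTHONGS = ("ae", "au", "oe")  # simplified list, an assumption
S_IMPURA = ("sc", "sm", "sp", "sq", "st", "z")


def quantity(syllable, next_word="", lengthen_s_impura=True):
    """Return "-" for a long syllable and "u" for a short one.

    `next_word` should be supplied only for a word-final syllable.
    """
    s = syllable.lower()
    if s[-1] not in VOWELS:
        return "-"  # closed syllable
    if s[-1] in LONG_VOWELS or s.endswith(DIPHTHONGS):
        return "-"  # open syllable, long vowel or diphthong
    if lengthen_s_impura and next_word.lower().startswith(S_IMPURA):
        return "-"  # short open final syllable before s impura
    return "u"
```

With the default setting the final syllable of ipse counts as long before sceleratus; with `lengthen_s_impura=False` it stays short. No special step is needed for mute + liquid on the default described above, provided the syllabifier keeps the cluster with the following syllable.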
With these tools, we can generate all manner of reports in seconds.Footnote 33 After preprocessing, we can show the complete syllabification, scansion and accentuation of any Latin text; we can show those results divided into clausulae; we can produce data on numbers and percentages of individual clausulae within a text; and of course we can combine all this information to yield comprehensive data on the prose rhythms of any set corpus of Latin literature, as we do below. Such reports allow us to ask and answer with ease questions that would have taken weeks and months and years of tedious (and error-prone) calculation before.
Some Limitations
Our method certainly is not perfect. For example, we currently assume that Latin prosody showed no variation or evolution over time. This is manifestly untrue, most obviously perhaps in the case of final -o. We know from verse evidence that in the first century b.c., final -o in most words was regularly long (for example, ergō, so always in Vergil). But by the time of Lucan, and still more so by that of Martial, final -o was usually short. We treat such cases as invariably long. In our current model, we likewise ignore effects like iambic shortening, which presumably was in operation in all ages on at least some words at least some of the time.
With further modifications we could allow users to consider different treatments of prosody, but two risks immediately present themselves: first, if final -o is treated differently in, for example, Cicero and Pliny, it may no longer be legitimate to compare results between the two. Second, and perhaps more seriously, it is hard to know in any individual case whether -o is pronounced short or long. In Silius Italicus we find only ergŏ — except at 16.217 ‘cui nescire licet? quin ergō tristia tandem’. For Silius, metre guarantees prosody in each instance, including when it differs from our expectations. But what do we do with Pliny the Younger? At Ep. 6.19.5 ‘concursant ergo candidati’, ergō gives a ‘better’ clausula (molossus ditrochee), and so perhaps -ō should be preferred there, but there is no metrical guarantee. For now consistent practice throughout seems methodologically safest.Footnote 34
There is also the fact that our output will necessarily be determined by our input. The PHI texts are meticulous reproductions of standard print editions, but they do not include a critical apparatus, and so we cannot take account of variant readings. More importantly, for the past century editors have made decisions among variant readings and competing emendations at least in part based on their understanding of prose rhythm. Indeed, they have also considered prose rhythm in how they punctuate their texts. Thus, to some degree, prose rhythm has already been ‘baked in’ to these texts, and our results could be circular. While this is admittedly true at a local level — that is, in the case of any given sentence — over a large corpus, the vast majority of clausulae will be free from textual troubles, and most editorial decisions concerning choice of reading and choice of punctuation will not have hinged on prose rhythm. This objection is thus more potent in theory than practice.
Finally, the various component parts of our programme occasionally err. Although the macroniser returns correct results 95–98 per cent of the time, the rest of the time it does not.Footnote 35 Even more rarely, sometimes the u/v and i/j converter makes a mistake, as does the syllabifier.Footnote 36 While we have tried to make our algorithms as accurate as we can, some error inevitably remains, and we have not adjusted any of our results by hand. We plead the following:
1. The error is very small by comparison with the enormous amounts of data that we can consider. Our sample size is large enough that we can rely on the central limit theorem to justify our statistical analysis. Put plainly: Big Data eliminates small error as a practical issue.
2. The error will be the same in all of our tests. That is to say, we expect that the same types and proportions of error will be present in a text of Cicero or Caesar or Apuleius. Since we use a uniformly consistent methodology, we will always be comparing like with like.
3. It seems very likely that those who count by hand make mistakes too, although because their results are not easily reproducible, it is very hard to determine what kinds of mistakes they have made and how often they have made them.Footnote 37
Our method is not perfect, but we believe that the advantages of getting very accurate — but not perfect — results on large swaths of data in an instant are bigger than the advantages of getting ‘perfect’ results on small amounts of data that take a long time to compile, which cannot be verified and from which it is hard to generalise.Footnote 38
II DATA
Without further ado, we present in tabular form some of the data that our algorithms have generated. We give first a table of the prose rhythms of most major extant Latin prose authors through the age of Trajan, with Suetonius, Gellius and Apuleius appended. There follow tables of Cicero's speeches, his rhetorical and philosophical works, and his letters. Finally we give detailed results for Tacitus and Pliny, which we will discuss in the next section. The arrangement within each table is broadly chronological, although perfect consistency in arrangement has proved neither possible nor desirable.
Fragmentary and incomplete works have generally been excluded.Footnote 39 We have removed passages in verse from Seneca's Apocolocyntosis and Petronius’ Satyrica, but otherwise have not systematically taken special account of verses or quotations.Footnote 40 In some authors and works particular caution must be exercised. Given the nature of the Suasoriae and Controuersiae, for example, the statistics for Seneca the Elder are probably of little value and are included only for the sake of completeness. Similar warnings apply to certain texts with particularly small sample sizes or those with unusual transmissions. Numbers never absolve readers of the responsibility to think critically, but with the appropriate caveats in mind, we hope that these numbers will be useful.
The columns in the tables are as follows:
A. Author and title of work.
B. Total number of clausulae detected in the work.
C. Total number of clausulae excluded from consideration (those containing abbreviations, editorially marked textual corruptions, fewer than four syllables and so forth).
D. Total number of clausulae considered (= B - C).
E. Percentage of cretic trochees (including resolved forms).
F. Percentage of double cretics and molossus cretics (including resolved forms).
G. Percentage of double trochees (including resolved forms).
H. Percentage of hypodochmiacs (including resolved forms).
I. Percentage of double spondees (no resolutions).
J. Percentage of heroic clausulae (no resolutions).
K. Total percentage of ‘artistic’ clausulae (= E + F + G + H).
L. Total percentage of double spondees and heroic clausulae (= I + J).
M. Total percentage of miscellaneous (that is, all other) clausulae.
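The arithmetic linking the columns can be made explicit with a short sketch. The function name `table_row` and the dictionary keys are hypothetical, and the counts used to try it out below are invented for the illustration, not drawn from any author.

```python
def table_row(detected, excluded, counts):
    """Derive columns D to M from raw counts for the six named categories."""
    considered = detected - excluded  # D = B - C
    pct = {name: 100 * n / considered for name, n in counts.items()}
    artistic = (pct["cretic_trochaic"] + pct["double_cretic"]
                + pct["double_trochee"] + pct["hypodochmiac"])  # K
    spondaic_heroic = pct["spondaic"] + pct["heroic"]  # L
    miscellaneous = 100 - artistic - spondaic_heroic  # M, the remainder
    return {"D": considered, **pct, "K": artistic,
            "L": spondaic_heroic, "M": miscellaneous}


# Invented counts, purely to show the calculation.
row = table_row(1000, 100, {
    "cretic_trochaic": 270, "double_cretic": 180, "double_trochee": 225,
    "hypodochmiac": 45, "spondaic": 90, "heroic": 18,
})
```

Columns K, L and M always sum to 100 per cent, which offers a quick sanity check on any row.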
More detailed tables will be found in the Supplementary Material online (https://doi.org/10.1017/S0075435819000881).
III ANALYSIS
The foregoing tables provide an order of magnitude more information about Latin clausulae than has been available before, and they provide it all in one place with a consistent methodology. We hope that they will prove useful in a variety of research questions, and we give a sample of such questions below. These only scratch the surface of what we think is possible. We begin with a new approach to determining statistical significance in prose rhythm data, and then proceed to specific questions about the prose rhythm practices of individual authors like Cicero, Sallust, Tacitus and Pliny the Younger.
How Do You Tell If Any of These Data Are Meaningful? A New Approach
It is not necessarily obvious that the use of particular sequences of short and long syllables should be regarded as a consciously sought artistic phenomenon in Latin prose. After all, every Latin syllable is long or short, and so every sentence must end with some pattern of longs and shorts.Footnote 41 Furthermore, the character of the Latin language itself will dictate that some patterns occur more frequently than others: long syllables are more common than short, for example, and so it would surprise no one to hear that – – –× is more common than ⏑⏑⏑×. Likewise, many authors favour verbs at the ends of clauses (i.e., in an important clausular position), and the third person and past tense are disproportionately represented in our surviving texts. These and many other tightly intertwined biases make it extremely hard — we think impossible — to establish any kind of ‘baseline’ expected distribution of rhythms. There is simply no way to say that you would ‘expect’ Latin sentences to end with a cretic-trochee 6 per cent of the time: what do you base your expectations on?
Scholars have generally taken one of three approaches to this question. Some, like Zielinski, ignored it altogether, and simply presented absolute numbers and percentages. But from the beginning it was objected that, for example, reporting that clausulae of the ēssĕ uĭdĕātūr type occur with 4.7 per cent frequency in Cicero's speeches whereas the type ōmnēs ēssēnt occurs 6.4 per cent of the time is not in itself useful. What if ēssĕ uĭdĕātūr–type clausulae naturally occur in Latin 2.4 per cent of the time, while the type ōmnēs ēssēnt naturally occurs 23.5 per cent of the time? Then the real point of interest would be that Cicero sought out the former and deliberately avoided the latter, but this is hidden behind the absolute frequencies ‘4.7 per cent’ and ‘6.4 per cent’.Footnote 42 To determine the significance of any observed frequency, it must somehow be compared against an expected baseline.
A second approach has been to calculate an expected value based on a ‘neutral’ sample of Latin. Albert De Groot at one point tried sampling scholarly translations of Greek texts made in the nineteenth century, but it is almost impossible to say how such scholarly Latin would map onto a native speaker's intuitions about rhythm.Footnote 43 François Novotný looked at the distribution of syllables not in clausular position, but this is to compare different things.Footnote 44 Others have tried still other approaches: Henri Bornecque, for example, considered the proportions of various patterns in authors whom he deemed unlikely to be rhythmic.Footnote 45 But this is arbitrary at best, and circular at worst; deciding that Sallust, say, is unrhythmic, and using his numbers as a baseline, is simply to assume your desired conclusion, and it is not much helped if you add a few other authors into the mix.Footnote 46
In response to the problems of external comparison, Tore Janson and his student Hans Aili pioneered a form of ‘internal comparison’.Footnote 47 They looked at a sample of an individual author's clausulae and determined the frequency of longs and shorts in each position (that is, what percentage of penultimate syllables are long, what percentage of antepenultimate syllables are long and so forth). From this they calculated an expected frequency for each type of clausula in that author, which is simply the product of the observed frequencies for each individual syllable.Footnote 48 Then they could compare the observed percentage of a given clausula with its expected value and run statistical tests on their results. This method is ingenious, but it has a fundamental weakness that vitiates any statistics derived from it: these scholars base their ‘expected’ values on the very material that they are trying to observe. If an author systematically seeks certain clausulae and avoids others, those preferences will already be part of the ‘expected’ values and so cannot be called neutral or natural. It is a circular procedure.Footnote 49
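The internal-comparison calculation is easy to state in code, which also makes its circularity visible: the expected value is built from the same clausulae that supply the observed value. The sketch below is our own illustration of the procedure as described above, not Janson's or Aili's code; the function names and the toy sample of three-syllable endings are invented.

```python
from math import prod


def positional_long_freq(clausulae):
    """Proportion of long syllables ("-") at each position of the sample."""
    n = len(clausulae)
    length = len(clausulae[0])
    return [sum(c[i] == "-" for c in clausulae) / n for i in range(length)]


def expected_frequency(pattern, long_freq):
    """Product of the per-position frequencies implied by the pattern."""
    return prod(f if s == "-" else 1 - f for s, f in zip(pattern, long_freq))


# Invented toy sample: "-" long, "u" short.
sample = ["-u-", "-u-", "--u", "uu-"]
freqs = positional_long_freq(sample)
expected = expected_frequency("-u-", freqs)
observed = sample.count("-u-") / len(sample)
```

In this toy sample the pattern "-u-" is observed half the time, but its own occurrences have already raised the positional frequencies from which the expected value is computed.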
We propose a new approach to the question of expected values. We think that the only secure basis for comparison is to look at the tendencies of individual authors and attempt to determine whether there are statistically significant differences in their practices. If so, then we can at least say that the differences among authors are unlikely to be due to random chance. Until now, this task was more or less impossible, because while there exist studies of individual authors’ rhythmic tendencies, the scholars carrying out these studies made different assumptions and employed different methodologies. Our data, by contrast, allow a comparison of like with like across all of Latin prose. Furthermore, in authors with sufficiently large corpora, we can also consider a portion of the corpus and determine whether its rhythmic practices match the rest of the corpus. So with Cicero's speeches, for example, we can consider each individual speech separately and compare it to the rest of his corpus with that speech removed. Indeed, such a comparison can even be applied to individual letters of Cicero's to determine whether it is likely that he paid extra attention to rhythm in them, or, with some further work, to compare the rhythmic practices of speeches and narrative in a historian. We will carry out all of these tests in the following sections.
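The comparison of one work with the rest of a corpus amounts to building a two-by-two table for each work. A minimal sketch, with a hypothetical function name and invented counts, could run as follows.

```python
def one_vs_rest(counts):
    """Build a 2x2 table (work vs. rest of corpus) for every work.

    `counts` maps each work to a pair (artistic, non_artistic).
    """
    total_a = sum(a for a, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    return {
        work: [[a, n], [total_a - a, total_n - n]]
        for work, (a, n) in counts.items()
    }


# Invented counts for three imaginary works.
tables = one_vs_rest({"A": (80, 20), "B": (70, 30), "C": (50, 50)})
```

Each resulting table can then be passed to whatever significance test is preferred.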
Any such statistical tests must be used with appropriate caution, for their results are wholly determined by the data input. Take Varro and his two substantially extant works, De lingua Latina and De re rustica. We could consider his distribution of clausulae in five categories (including resolutions in each): cretic-trochaic, double cretic or molossus cretic, double trochee, hypodochmiac, and ‘everything else’. We would then have a table of data like this:
The most appropriate statistical test to analyse such data, and one with a long history in studies of prose rhythm, is the chi-square test.Footnote 50 The details are available in any statistical handbook,Footnote 51 but in essence, the chi-square test applied to this data will test the null hypothesis that the two rows of data come from the same distribution and that variation between the two is merely due to chance. (This is not a measure of degree of difference between two samples, but a test of whether these differences are unlikely to arise by chance if both samples were drawn from identical populations.) From our chi-square test statistic is derived a p-value; if our p-value is below a certain threshold (in this paper, as often, .05), we reject the null hypothesis and conclude that there is a statistically significant difference between the two rows of data.Footnote 52 Put plainly, the chi-square test allows us to say whether an apparent difference in authors’ use of particular clausulae is in fact statistically significant.Footnote 53
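For readers who wish to see the mechanics, here is a self-contained sketch of the test for a contingency table. It is an illustration under assumptions, not our production code: it applies no continuity correction, the function names are hypothetical, and the p-value uses closed forms that hold only for one degree of freedom or an even number of degrees of freedom. A statistics library would normally be used instead.

```python
from math import erfc, exp, factorial, sqrt


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value(stat, df):
    """Upper-tail probability of the chi-square distribution."""
    half = stat / 2
    if df == 1:
        return erfc(sqrt(half))
    if df % 2 == 0:
        return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))
    raise ValueError("closed form implemented only for df = 1 or even df")
```

As a check on the formulas, a statistic of 0.977 with one degree of freedom gives a p-value of about 0.323, and a statistic of 39.796 with four degrees of freedom gives a p-value well below one in a million.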
If we run a chi-square test on the above five columns of data, we get χ² = 39.796; with four degrees of freedom this results in a p-value near zero.Footnote 54 Such a value indicates that it is almost impossible for the prose rhythms of these two works to belong to the same distribution. But a priori this is very unlikely; Varro wrote both of them, and the rhythms of neither look to be ‘artistically’ rhythmic in the Ciceronian sense of the term. A test treating these five columns of data appears too sensitive. If, however, we pool the data differently and group our ‘artistic’ clausulae (cretic trochees, double cretics, ditrochees and hypodochmiacs) together and our ‘non-artistic’ clausulae (double spondees, heroic clausulae and everything else) together, we can look instead at the two columns of the following table:
A glance at these proportions will show that they are very similar. It is no surprise, then, that a chi-square test on these data yields χ² = 0.977, producing a p-value of about 0.32294. This p-value, by contrast, indicates that it is reasonable to conclude that any deviation in the prose rhythms of these two works is due to random chance. We get the same result if we compare the individual books of De lingua Latina and De re rustica using a chi-square test of ‘artistic’ vs ‘non-artistic’ clausulae: there are no statistically significant differences in preferences for artistic and non-artistic clausulae among the various books.
These two very different results are a salutary warning that statistical tests must be used cautiously, and always with an eye on the underlying data and reasonable expectations.Footnote 55 The choice of collapsing our data into two categories of artistic and non-artistic clausulae is, again, fundamentally a pragmatic one. It produces sensible and interesting results. It has the further virtue of agreeing with many of the theoretical models that have been constructed for Latin prose rhythm. But there may be better — and there are certainly other — ways of dividing the data, and binary tests between ‘artistic’ and ‘non-artistic’ clausulae should simply be seen as one useful tool, not as some kind of definitive measure.
This test also suggests that we should adjust certain assumptions, as another example will make clear. We can compare Varro's De re rustica and Cato's De agri cultura using our ‘artistic’ vs ‘non-artistic’ model as follows:
χ² = 14.463, p-value ≈ 0.00014. These two authors, according to our test, almost certainly show different propensities to artistic clausulae. The commonly accepted prior assumption is that neither Varro nor Cato cares about prose rhythm, but we suggest that this assumption is wrong. It is all but certain that any Latin author had intuitive preferences for some rhythms and unconsciously avoided others. Indeed, this is borne out by our data: when we look at our tables for all authors’ prose rhythm preferences, we nowhere see, even in supposedly ‘unrhythmic’ authors, convergence around particular baseline numbers. This should not be surprising: in English no one would expect Jonathan Franzen and David Foster Wallace to share the same rhythmic tendencies, even if they were contemporaries and friends who wrote in the same genres for similar audiences. All Latin authors have their own rhythmic profiles, and thus no universal expected values can be established. But authors can be compared with each other, and furthermore, authors can be compared with the artificial system of ‘artistic’ clausulae adopted by Cicero and many later writers.
So Varro is consistent with Varro, and Caesar is consistent with Caesar:
χ² = 1.173, p-value ≈ 0.27879: any variation in Caesar's tendency toward artistic clausulae between the Bellum Gallicum and the Bellum ciuile is not statistically significant. By contrast, Varro and Caesar clearly differ from each other:
χ² = 148.224, p-value ≈ 0: these two authors do not have the same preferences at all. If we say that they are ‘not rhythmic’, what we really mean is that they do not follow the distribution of clausulae characteristic of Cicero, because they clearly have their own tendencies in how they distribute longs and shorts.Footnote 56
It is pretty clear from our data that no two authors show the same rhythms, although many authors are consistent with themselves in their preferences (so, for example, Sallust). What also seems pretty clear is that some authors deliberately avoid spondaic, heroic and other unusual clausulae in favour of forms of the ‘artistic’ four (including resolved forms), viz. cretic trochees, double cretics (or molossus cretics), double trochees, and hypodochmiacs. Latin teems with long syllables, and authors who have a markedly lower proportion of – – – × are probably avoiding it deliberately. The effects can be pervasive: in Cicero, for example, audistis is found 72 times, audiuistis 2, audisti 16, audiuisti 0.Footnote 57 Cicero seems to avoid the sequence of four long syllables. So too does Pliny the Younger show a marked aversion to double spondaic clausulae, which occur in his writings only around 6 per cent of the time. Tacitus, by contrast, is much less averse to double spondees, which comprise nearly a quarter of his clausulae.
In addition to double spondees, it is especially relevant to consider the frequency of heroic clausulae (that is, hexameter endings). In most authors these are not very frequent, but in certain authors, like Cicero, they are exceptionally rare.Footnote 58 The sum of double spondaic and heroic clausulae thus provides an approximate index for how ‘artistically’ rhythmic an author is; adding in the rare miscellaneous clausulae makes this measure the precise complement of the artistic four.Footnote 59 Authors who clearly pay attention to the canons of an artificial doctrine of ‘artistic’ prose rhythm include (in parentheses is given the author's percentage of artistic clausulae):
1. Cicero (e.g. 83.42 per cent in the speeches taken together)Footnote 60
2. Velleius Paterculus (79.68 per cent)Footnote 61
3. Seneca the Younger (e.g. 80.92 per cent in the Epistulae morales)Footnote 62
4. Q. Curtius Rufus (85.29 per cent)Footnote 63
5. Pomponius Mela (82.62 per cent)Footnote 64
6. Pliny the Younger (84.87 per cent in Epist. 1–9; 85.36 per cent in Pan.)Footnote 65
7. Suetonius (80.57 per cent in the Vitae)Footnote 66
8. Apuleius (in some works; e.g. 78.50 per cent in Met.)Footnote 67
9. [Quintilian], Declamationes maiores (84.03 per cent)Footnote 68
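The percentages in this list follow from simple bookkeeping, which can be sketched as below. This is a minimal illustration, not the published code: the category labels and the counts are hypothetical, and the only thing taken from the text is the grouping, in which the 'artistic' four are set against double spondaic, heroic and miscellaneous clausulae.

```python
# Hypothetical category labels; the grouping follows the text:
# the 'artistic' four versus everything else.
ARTISTIC = {
    "cretic_trochee",
    "double_or_molossus_cretic",
    "double_trochee",
    "hypodochmiac",
}

def artistic_share(counts):
    """Return the fraction of clausulae that fall in the artistic four."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clausulae counted")
    artistic = sum(n for name, n in counts.items() if name in ARTISTIC)
    return artistic / total

# Invented counts for an imaginary author (NOT data from the article).
example_counts = {
    "cretic_trochee": 300,
    "double_or_molossus_cretic": 250,
    "double_trochee": 200,
    "hypodochmiac": 50,
    "double_spondee": 120,
    "heroic": 50,
    "miscellaneous": 30,
}

share = artistic_share(example_counts)
non_artistic = 1 - share  # double spondaic + heroic + miscellaneous
print(f"artistic: {share:.2%}, non-artistic: {non_artistic:.2%}")
```

Because every clausula falls into exactly one category, the non-artistic share is the exact complement of the artistic share, which is why either figure can serve as the index.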
In the main, our results confirm earlier scholars’ smaller, sample-based studies of individual authors; such replication and verification has long been missing in studies of prose rhythm.Footnote 69 So, for example, Velleius Paterculus shows a remarkable affection for double cretic and molossus cretic rhythms, which comprise some 40 per cent of his clausulae. This striking preference is unexpected, unprecedented and not imitated by later authors. Aili looked at a sample of 500 Velleian clausulae, and, although counting only six syllables and presenting his data somewhat differently, found essentially the same tendency.Footnote 70
The great bulk of Latin prose authors, however, seem to have followed their own rhythmical preferences, not a set of Hellenistic precepts. To this generalisation one special case should be noted: both Sallust and especially Livy must have consciously sought out heroic and spondaic rhythms, and to an extraordinary degree (Sall., Iug.: 34.93 per cent, Cat.: 33.73 per cent; Livy: 43.99 per cent). Livy's preferences moreover intensified over time, being least marked in the first decade (35.41 per cent) but increasingly so in Books 21–30 (48.37 per cent) and 31–40 (49.72 per cent). These authors have deliberately chosen to go in precisely the opposite direction to the Ciceronian system.Footnote 71 Whether Livy's and Sallust's predilection for non-artistic clausulae constitutes a ‘historical style’ is unclear; Tacitus, at any rate, does not follow their example.Footnote 72
In sum, ‘expected values’ for the distribution of rhythms in unmarked Latin prose simply cannot be established on the basis of surviving evidence, for all authors have their own rhythmic preferences. But there are statistically significant differences in these authorial preferences. Furthermore, an important subset of Latin authors adhered in some fashion to a particular ‘artistic’ rhythmic canon, and at least a couple deliberately rebelled against it. It is in this sense that we can claim that Latin prose rhythm is not just a chimera that scholarly syllable counters have been chasing after in vain for over a century.
Authorial Variation and ‘Spurious’ Compositions
Cicero has always provided the notional benchmark against which Latin prose rhythm has been measured, but Cicero's own rhythmical practices vary widely over time and genre and even an individual work. One often reads, for example, that Cicero was less attentive to prose rhythm in his correspondence. While this claim can and should be nuanced, it is clearly right, as can be seen by comparing Cicero's speeches with the Epistulae ad Atticum:
χ2 = 856.038, p-value ≈ 0: these distributions are very different. The letters are markedly less concerned with artificially artistic prose rhythm.
Of course, not all letters are created equal.Footnote 73 When Cicero is writing for a wider audience, as in his long letter of advice to Quintus during the latter's time as a provincial administrator in Asia, he uses markedly different rhythms than when he writes for his brother's ears alone:
χ2 = 35.94, p-value ≈ 0. The polished and public Q. fr. 1.1 was composed with much more attention to pretty clausulae.
Furthermore, it should be observed that even within Cicero's corpus of speeches we find considerable variation. Pro Roscio comoedo, for example, is notably non-artistic in its rhythms, perhaps showing a ‘studied negligence’ in imitation of comedy.Footnote 74 While general trends can be descried — the earliest speeches show fewer cretic trochees, say — there exist occasional counter-examples to almost all of them (so the later Pro Rabirio Postumo shows a very low percentage of cretic trochees). Given all this variation, can we even talk about Cicero's ‘prose rhythm preferences’ as some kind of Platonic form? We are sceptical.
Studies of prose rhythm often hold out the promise of uncovering an author's unique rhythmic fingerprint, a sort of unchanging stylistic essence. Such a fingerprint could be of enormous use in questions of authenticity. Some authors, as we have seen, do present a very consistent fingerprint: Caesar is consistent with Caesar; Varro is consistent with Varro. Other authors, however, are chameleons, adapting their rhythms to circumstances. Cicero is a chameleon. Such authorial variation and adaptability means that we cannot naively rely on prose rhythm to distinguish between genuine and spurious compositions.
This claim is most easily demonstrated by using our artistic vs non-artistic test for each of Cicero's speeches set against the corpus of the rest of his speeches. In effect, we are conducting a thought experiment in which we ask, ‘If this work were not known to be Cicero's, would it fit rhythmically with the rest of his corpus?’ Table 7 shows Cicero's surviving speeches, sorted from most to least artistically rhythmic.
The test that we have just described would identify fully twenty-two of these speeches as suspect:
• Non-artistic to a statistically significant degree (9): Quinct., Rosc. Am., Caecin., Tul., Verr. 2.1 and 2.2, Q. Rosc., Rab. Post., Phil. 8.
• Artistic to a statistically significant degree (13): Leg. Man., Catil. 2 and 4, Arch., Dom., Vat., Prov. cons., Cael., Balb., Planc., Marcell., Phil. 3 and 4.
Now these data are not without use. We have already commented on the exceptional Pro Roscio comoedo, which is, rhythmically speaking, far and away Cicero's ‘least Ciceronian’ speech. It is probably not coincidence that most of the other less ‘artistic’ speeches cluster at the beginning of Cicero's career; it would not be surprising to find that his rhythmic preferences evolved and were refined over time, and any such change has been flattened out in this test. And yet Philippic 8 is rather unexpected; Cicero's tendency towards more artistic clausulae is hardly a fixed law. On the other hand, sometimes Cicero seems to have gone out of his way to be especially ‘artistic’ in his rhythms. Such speeches include some of Cicero's most important, like the Catilinarians (a sign of careful revision?), as well as particularly literary efforts like Pro Archia and Pro Caelio.Footnote 75
But while the data are not useless, a test showing that fully 38 per cent of Cicero's speeches appear ‘non-Ciceronian’ is clearly not the appropriate instrument to determine authorship of a potentially Ciceronian speech.Footnote 76 For Cicero, prose rhythm is not just a signature of authorship; it is in fact a form of content. A too simple application of statistical tests to prose rhythm to resolve questions of authenticity risks conflating variation in content with variation in authorship.
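The thought experiment just described, in which each speech is tested against the pooled remainder of the corpus, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names are our own, the counts are invented, the statistic is a plain Pearson chi-square on a 2×2 table without continuity correction, and the 0.05 threshold is a conventional choice.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denom

def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def leave_one_out(works, alpha=0.05):
    """Test each work's (artistic, non_artistic) counts against all the others."""
    total_art = sum(art for art, _ in works.values())
    total_non = sum(non for _, non in works.values())
    flagged = {}
    for name, (art, non) in works.items():
        rest_art, rest_non = total_art - art, total_non - non
        stat = chi2_2x2(art, non, rest_art, rest_non)
        p = p_value_df1(stat)
        if p < alpha:
            own = art / (art + non)
            rest = rest_art / (rest_art + rest_non)
            direction = "more artistic" if own > rest else "less artistic"
            flagged[name] = (direction, stat, p)
    return flagged

# Invented counts (artistic, non-artistic) for four imaginary speeches.
hypothetical_speeches = {
    "speech_A": (850, 150),
    "speech_B": (400, 100),
    "speech_C": (120, 80),
    "speech_D": (330, 70),
}

for name, (direction, stat, p) in leave_one_out(hypothetical_speeches).items():
    print(f"{name}: {direction} (chi2 = {stat:.3f}, p = {p:.5f})")
```

A test of this shape flags departures in either direction, which is why it marks unusually artistic speeches as 'suspect' just as readily as unusually plain ones.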
We still think that such tests can sometimes be applied with profit, but they must be applied very carefully. They work best with authors who do not appear to vary their rhythmic practices depending on content, like Sallust. As our tables show, Sallust exhibits the same rhythmic profile in all of his historical works, and we shall soon see that he does not evince any differences between his narrative and set-piece speeches within those works either. The author of the pseudo-Sallustian Inuectiua in Ciceronem, on the other hand, has a markedly different set of preferences for artistic and non-artistic clausulae:
χ2 = 4.549, p-value ≈ 0.03293.Footnote 77 One might still try to argue that this is simply an instance of generic differences dictating different rhythms, but in any case we can say that overall propensity to artistic clausulae does not encourage belief in Sallustian authorship.Footnote 78 By contrast, preferences for artistic clausulae at least do not militate against the claim that Sallust wrote the Epistulae ad Caesarem:
χ2 = 0.404, p-value ≈ 0.52503. The rhythms of the Epistulae ad Caesarem are indistinguishable from Sallust in his historical works; if they are not genuine, the imitator showed a remarkably accurate knowledge of Sallust's unusual rhythmic tendencies.Footnote 79
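Readers who wish to check how a chi-square statistic translates into a p-value can do so in a few lines. The sketch below assumes that the artistic vs non-artistic comparison of two texts is a 2×2 table and therefore has one degree of freedom; under that assumption the upper-tail probability is erfc(√(χ²/2)). The two statistics are those quoted above, and the function name is our own.

```python
import math

def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))

# Statistics and p-values as quoted in the text for the two pseudo-Sallustian works.
reported = {
    "Inuectiua in Ciceronem": (4.549, 0.03293),
    "Epistulae ad Caesarem": (0.404, 0.52503),
}

for work, (stat, quoted_p) in reported.items():
    print(f"{work}: chi2 = {stat}, computed p = {p_value_df1(stat):.5f}, quoted p = {quoted_p}")
```

Small discrepancies in the last decimal place are to be expected, since the quoted statistics are themselves rounded to three places.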
But such applications are perhaps more limited than we might want. Rhetorica ad Herennium, for example, is rhythmically indistinguishable from De inuentione, but this is not a function of Ciceronian authorship: one might guess that similarity in content is the reason that their rhythms converge. Tests using this method can measure real differences between texts, and this is of value, but such variation may be tied to any number of factors, most notably variation in content. While in certain circumstances, particularly when an author shows very stable rhythmic practices, these tests can be a piece of evidence in the discussion of authenticity, prose rhythm is very far from a panacea for resolving the attribution of a disputed work.
Variation Within a Text: Speeches vs Narrative in Sallust and Tacitus
We have just seen that some authors vary their prose rhythm practices in different genres (private letters vs public speeches, say), and that indeed some authors show remarkable variation even within a single broad genre (Cicero's orations). This naturally leads to the question of whether authors show different rhythmic practices within an individual work. In Latin historiography, for example, is there a difference in prose rhythm between narrative and inset speeches?Footnote 80 We have looked at the cases of Sallust and Tacitus. For Sallust, the answer is a clear no. For Tacitus the situation is more complex: Tacitus does seem to have different rhythmic profiles, and they do sometimes correlate with the distinction between narrative and speeches — but not always.
To arrive at these answers we must first separate the historians’ corpora into speeches and narrative. While it is perhaps not impossible to do this programmatically, it is a challenge,Footnote 81 and we have simply segregated by hand. We have included only longer instances of direct speech, excluding both short utterances and all indirect discourse.Footnote 82 Our corpora of speeches are as follows:Footnote 83
Sall., Cat. 20, 33, 51, 52, 58; Iug. 10, 14, 31, 85, 102, 110; Hist. Or. Lepidus, Philippus, Cotta, Macer.
Tac., Agr. 30–2, 33–4; Hist. 1.15–16, 29–30, 37–8, 83–4; 2.47, 76–7; 3.2, 20; 4.32, 42, 58, 64–5, 73–4, 77; 5.26; Ann. 1.22, 28, 42–3, 58; 2.37–8, 71, 77; 3.12, 16, 46, 50; 4.8, 34–5, 37, 40; 6.6, 8; 11.24; 12.37; 13.21; 14.43–4, 53–4, 55–6; 15.2, 20; 15.22, 31.
For Sallust the results are plain.Footnote 84 For example, in the Bellum Iugurthinum:
χ2 = 0.977, p-value ≈ 0.32294. The Bellum Catilinae shows an even greater similarity:
χ2 = 0.043, p-value ≈ 0.83573. Even the longer speeches of Sallust's Historiae seem to fit this pattern. We here compare them with the Bellum Iugurthinum and Bellum Catilinae, because the fragmentary state of the remainder of the Historiae makes any inferences drawn against them unreliable at best:
χ2 = 0.328, p-value ≈ 0.56684. Sallust shows an apparently unshakable consistency in his preferences for artistic and non-artistic clausulae, both across his various works and within them, making no distinctions between speeches and narrative.
For Tacitus the story is more nuanced. In the Annales, he shows a slight tendency toward more artistic clausulae in speeches, but it is slight and not statistically significant:
χ2 = 1.523, p-value ≈ 0.21717. In his last work, it appears that Tacitus did not differentiate speeches from narrative rhythmically, or at any rate that any differentiation is so small that it may well have arisen by chance.
But in his earlier works the tendency toward artistic clausulae in speeches is more pronounced. So in the Agricola:
χ2 = 1.31, p-value ≈ 0.25239. The chi-square test statistic here is small both because the difference in the proportion of artistic clausulae is not large and, importantly, because the sample size of speeches in the Agricola is so small. But these proportions are very nearly what we see in the Historiae, where the larger sample size allows for more statistical confidence:
χ2 = 14.969, p-value ≈ 0.00011. The difference between speech and narrative here is large and statistically significant. The narrative portion of the Historiae shows almost the exact same propensity to artistic clausulae as the narrative of the Annales (and the Agricola). The speeches of the Historiae, however, resemble nothing so much as the Dialogus and Germania, from which they are indistinguishable in their preferences for artistic clausulae.
χ2 = 0.207, p-value ≈ 0.64913.
χ2 = 0.157, p-value ≈ 0.69193.
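The point made above about the Agricola, that a small sample keeps the test statistic small even when the proportions differ, can be illustrated with invented numbers. In the sketch below the proportions are held fixed (60 per cent artistic in speeches, 50 per cent in narrative) while every count is multiplied by a scale factor; the counts, the function names and the 2×2 Pearson test without continuity correction are all assumptions made for the illustration.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denom

def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def compare_at_scale(k):
    """Same proportions, k times as many clausulae (invented counts)."""
    speeches = (12 * k, 8 * k)       # 60 per cent artistic
    narrative = (100 * k, 100 * k)   # 50 per cent artistic
    stat = chi2_2x2(*speeches, *narrative)
    return stat, p_value_df1(stat)

small_stat, small_p = compare_at_scale(1)
large_stat, large_p = compare_at_scale(10)
print(f"small sample: chi2 = {small_stat:.3f}, p = {small_p:.5f}")
print(f"large sample: chi2 = {large_stat:.3f}, p = {large_p:.5f}")
```

For fixed proportions the statistic grows in direct proportion to the sample size, so an identical difference in rhythmic preference can be insignificant in a short text and highly significant in a long one.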
What do all these numbers mean? They seem to indicate that while Sallust has a uniformly consistent set of (dis)preferences for artistic clausulae, Tacitus has at least two separate rhythmic profiles that he can use. These two separate profiles sometimes correlate with the distinction between speech and narrative (so in the Dialogus, Agricola and Historiae), but not always: in the Annales, Tacitus shows roughly the same proportion of artistic clausulae in both speech and narrative, and in the Germania, which is exclusively narrative, Tacitus exhibits the rhythmic preferences that he shows elsewhere for speeches. More investigation is needed here, but it is plain that prose rhythm is part of Tacitus’ literary artistry, and that he sometimes varies his practice for some kind of effect. It would certainly be a mistake to claim, as many scholars have, that Tacitus is indifferent to prose rhythm.Footnote 85
Tacitus, Dialogus de oratoribus
We have just seen that Tacitus makes use of a particular rhythmic profile in the Dialogus de oratoribus. Now in that work he imitates Cicero in numerous and varied points of diction. He postpones igitur to second position; he uses the word autem some twenty times (compared to six instances in all of the Historiae and Annales); he indulges in a number of synonymous doublets.Footnote 86 One might wonder whether his rhythmic preferences in the Dialogus are a sought-out imitation of Cicero too, as Gregory Hutchinson claims.Footnote 87
It is in some sense true that the Dialogus is Tacitus’ ‘least Tacitean’ work in its propensity to artistic clausulae. A test of its numbers of artistic and non-artistic clausulae against those of the rest of Tacitus’ corpus marks it as a clear outlier:
χ2 = 27.483, p-value ≈ 0. But as we have already seen, that is only part of the story. The Germania too, for example, shows the same rhythmic profile, as do the speeches in the Agricola and the Historiae.
Moreover, this propensity to artistic clausulae is not necessarily ‘Ciceronian’. The best point of comparison between the Dialogus and ‘Cicero’ is not completely clear. Does the Dialogus map onto the prose rhythm of Cicero's speeches?
χ2 = 51.135, p-value ≈ 0. No, it is not even close. What about Cicero's own dialogues? Here it is hard to know what corpus to pick, but the Dialogus is less artistically rhythmic than any of Cicero's surviving dialogues. If we compare it, for example, with all of Cicero's extant rhetorical and philosophical works pooled together, we get:
χ2 = 62.125, p-value ≈ 0. Again, not even close; even further away, in fact.
The rhythms of the Dialogus are clearly different from the narrative portions of Tacitus’ historical works, but they resemble the Germania and the speeches of the Agricola and the Historiae. What Tacitus is doing with this varying propensity toward artistic clausulae calls for further study, but we can say with confidence that neither in the Dialogus nor anywhere else does he even approach a true rhythmic imitation of Cicero.
Pliny the Younger
Pliny the Younger offers an interesting test case for a variety of questions, not least because he, like Sallust, presents such a consistent set of rhythmic preferences. We can thus use our statistical tests to answer questions such as: do Pliny's private letters (Ep. 1–9) differ from his correspondence with Trajan (Ep. 10)? Is there any variation within the books of private correspondence? Does Trajan's prose rhythm in Ep. 10 differ from Pliny's? And what of the rhythms of the Panegyricus, an epideictic speech perhaps liable to entirely different generic conventions from a book of stylish letters?
In the first instance we can observe that Pliny is an author with a marked preference for artistic rhythms. He shuns spondaic and heroic clausulae (even more than Cicero did in his speeches, to say nothing of his letters), and he favours cretic-trochaic rhythms to an almost unprecedented degree and with remarkable consistency across the private correspondence: they comprise some 40 per cent of the clausulae in Ep. 1–9.Footnote 88 These preferences combine to yield an extraordinarily stable rhythmic profile across the private letters. Indeed, those similarities extend even to the Panegyricus. Consider a detailed chi-square test of the sort that showed different distributions for Varro's two works:
χ2 = 5.902, p-value ≈ 0.20658. The Panegyricus, even on a very fine-grained test, cannot be distinguished from the letters, and the individual books of letters are themselves all but indistinguishable from each other.Footnote 89
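A 'detailed' test of this kind compares two texts across several rhythmic categories at once rather than across the artistic/non-artistic binary. The sketch below shows the general shape of such a test for any table of counts; the counts are invented and the function names are our own. We also note, as an inference of ours and not something stated above, that the quoted pair of statistic and p-value is consistent with four degrees of freedom (for instance a table with two rows and five categories), where the p-value has the closed form e^(−x/2)(1 + x/2).

```python
import math

def chi2_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("empty row or column in the table")
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_even_df(stat, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = stat / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Invented counts for two imaginary texts over five rhythm categories.
hypothetical_table = [
    [400, 250, 200, 50, 100],
    [90, 70, 40, 15, 35],
]
stat, df = chi2_statistic(hypothetical_table)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value_even_df(stat, df):.5f}")

# Consistency check on the figure quoted in the text, assuming df = 4.
print(f"p for chi2 = 5.902 at df = 4: {p_value_even_df(5.902, 4):.5f}")
```

For tables with an odd number of degrees of freedom a statistical library would be needed in place of the closed form used here.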
The exception, of course, is Book 10. Trajan's replies show a clearly different rhythmic fingerprint. If we compare the pooled artistic and non-artistic patterns in Ep. 1–9 with Trajan's replies to Pliny in Book 10, the latter are conspicuously less artistic:
χ2 = 33.083, p-value ≈ 0. Trajan's rhythms in Book 10 are completely different from Pliny's in Books 1–9. Indeed, Trajan's rhythms in Book 10 are completely different from Pliny's in Book 10:
χ2 ≈ 10.53, p-value ≈ 0.00117. Trajan (or his chancery secretary) speaks in his own voice and with his own cadences.
The prose rhythm of Pliny's own letters in Book 10 is only slightly less ‘artistic’ than that of Books 1–9, although the difference does rise to statistical significance:
χ2 = 5.066, p-value ≈ 0.02439. Nevertheless, prose rhythm appears to have been a natural part of Pliny's composition process in a way that it was not for Cicero in his letters, although it must still be a learned part, because his preferences are so distinctive — or, just maybe, he revised Book 10 for publication himself and took some care for its rhythmic properties.Footnote 90
Finally, as we have already seen, although the Panegyricus is a speech, in it Pliny uses almost exactly the same rhythmical patterns as he does in the Epistulae. But to think of the rhythmic preferences of the Panegyricus as the same as those of the Epistulae is probably to put the cart before the horse. In his own lifetime, Pliny was above all an orator, and it is a simple twist of fate that we happen to have ten books of Pliny's letters and only one preserved speech. It seems very likely that the prose rhythms we find in his letters have their origin in the preferences that he developed for his speeches. This is probably a deliberate (and artistic) affectation, since one might have expected his correspondence, like Cicero's, to be looser about such details, and it is another reason we should consider Pliny's letters highly polished literary compositions.
IV CONCLUSIONS
Our algorithms and the data that they generate provide a powerful tool to answer questions like the ones posed above, a list which can be extended indefinitely. Because we are using computers and code, we can change assumptions or look at different texts or divide our existing texts up differently — and immediately generate refreshed data for the entirety of the corpus that we are considering. Furthermore, although it is in most cases impossible to replicate previous scholars’ methodologies with absolute precision, in broad outline we can nevertheless check their results almost instantaneously. This process of replication and verification has long been absent from studies of Latin prose rhythm. Since all our code and data are open source and publicly available, our own results can also be easily checked (and perhaps improved).
Improvements and extensions of these data may take a variety of forms. A different approach to locating clausulae, one that does not rely on punctuation, might help advance exploration of ‘internal’ clausulae, a topic which has thus far resisted rigorous analysis. More extensively marked up texts would facilitate other kinds of investigations: for example, does Cicero use different rhythms in his exordia, or narrationes or perorationes? Annotating his speeches with consistent metadata would allow for more detailed study. More sophisticated data manipulation techniques, like Principal Component Analysis, might give us other profitable ways to categorise our data beyond just ‘artistic’ and ‘non-artistic’.Footnote 91 And this is to say nothing of further work that can be done with the data that we have already collected, like that on word division and word accent in clausulae, which would necessarily be crucial in studying the rhythms of late antique texts as the cursus begins to develop.
Of course, none of the broad brush pictures painted by statistical analysis can give insights at the level of an individual clausula in an individual sentence in an individual author's text. Such an analysis of the details of prose rhythm in the context of a speech or a letter is eminently worthwhile and can have great explanatory power.Footnote 92 So when Cicero describes the same event twice in almost the same words in Pro Milone, he once writes ‘respondit triduo illum aut summum quadriduo esse periturum’ (Mil. 26), but later ‘audistis … periturum Milonem triduo’ (Mil. 44). It seems likely that he wrote esse periturum in the first case because it was in clausular position (= esse uideatur), whereas in the second the infinitive came in the middle of the phrase and so he preferred simply periturum. Prose rhythm is one of the keys to unlocking the secrets of Latin word order and word choice, revealing points of emphasis and rhetorical artifice, and understanding it at the local level is essential for appreciating an author's verbal artistry. Much of this artistry must have been put into practice subconsciously or unconsciously (see, for example, Quint., Inst. 9.4.119–20), and we remain sceptical of accounts that attempt to quantify the force of any individual clausula, but it is clear that ancient authors and ancient audiences could perceive and appreciate rhythmic prose.Footnote 93 Today, without native speaker Sprachgefühl, we can only recover these effects by philological analysis.
While interpreting prose rhythm at the level of the sentence and clause requires close reading and analysis, at the global level, questions of prose rhythm cry out for an open-source, Big Data approach. We have offered one such approach, producing algorithms to detect and categorise the rhythms of any Latin prose text, providing comprehensive data generated by these algorithms for most of extant classical Latin prose, presenting a new statistical approach to analysing the significance of those data, and giving several examples of how to use our data and procedures to answer particular questions about authors’ propensity toward artistic rhythms. For example, we can confirm that Cicero's letters are significantly less concerned with ‘artistic’ prose rhythm than are his speeches, but we can also show how certain letters, like the lengthy and polished Q. fr. 1.1, take particular care to be artistically rhythmical. We can with a few clicks compare the prose rhythms of the perhaps spurious Inuectiua in Ciceronem or Epistulae ad Caesarem senem with those of the undisputedly genuine Sallust: the former does not look at all Sallustian, but the latter actually does. We can compare the rhythms of speeches and narrative in authors like Sallust and Tacitus: Sallust's rhythms never change, but Tacitus has at least two distinct rhythmic profiles (neither of which, even in the Dialogus, counts as ‘Ciceronian’). We can see almost at a glance that Trajan's replies to Pliny's letters in Book 10 have an entirely different rhythmic fingerprint from Pliny's, while in the Panegyricus Pliny mirrors the rhythmic preferences that he shows in the Epistulae. It may be an exaggeration to claim that technology will revolutionise the study of Latin prose rhythm — the fundamental insights as worked out over a century ago seem to stand correct and confirmed — but it will certainly replough the entire field, offering fresh data and the possibility of countless new results. 
Nothing will ever make the study of Latin prose rhythm easy, but computers will certainly make it a lot easier.
SUPPLEMENTARY MATERIAL
For Supplementary Material for this article please visit https://doi.org/10.1017/S0075435819000881.