Agreement syncretization and the loss of null subjects: quantificational models for Medieval French

Alexandra Simonenko; Benoit Crabbé; Sophie Prévost

doi:10.1017/S0954394519000188

Published online by Cambridge University Press: 13 November 2019

Alexandra Simonenko: FWO/UGhent
Benoit Crabbé: LLF, CNRS/Paris Diderot, USPC
Sophie Prévost: Lattice, CNRS/ENS/Université Sorbonne Nouvelle/Université PSL/USPC*

Abstract

This paper examines the nature of the dependency between the availability of null subjects and the “richness” of verbal subject agreement, known as Taraldsen's Generalisation (Adams, 1987; Rizzi, 1986; Roberts, 2014; Taraldsen, 1980). We present a corpus-based quantitative model of the syncretization of verbal subject agreement spanning the Medieval French period and evaluate two hypotheses relating agreement and null subjects: one relating the two as reflexes of the same grammatical property and a variational learning-based hypothesis whereby phonology-driven syncretization of agreement marking creates a learning bias against the null subject grammar. We show that only the latter approach has the potential to reconcile the intuition behind Taraldsen's Generalisation with the fact that it has proven nontrivial to formulate the notion of agreement richness in a way that would unequivocally predict whether a language has null subjects.

Type: Research Article
Information: Language Variation and Change, Volume 31, Issue 3, October 2019, pp. 275–301

Copyright © Cambridge University Press 2019

This paper examines the nature of the relation between the availability of null subjects and the “richness” of verbal subject agreement, known as Taraldsen's Generalisation (Adams, 1987; Rizzi, 1986; Taraldsen, 1980), from the point of view of grammar change in Medieval French. The original generalization is based on synchronic observations. It states that sufficiently discriminating, or nonsyncretic, subject agreement in a language entails the possibility of leaving subjects unexpressed. As for diachronic developments, it has been argued that there is a causal relation between the loss of nonsyncretic subject agreement and the emergence of obligatory subject pronouns (e.g., Ewert, 1943; Vennemann, 1975:298). The underlying intuition is that overt subjects take over the role of identifying the subject's person, a role that verbal inflection can no longer fulfill because of its phonological erosion. As Haspelmath (1999:14) puts it, “… in languages that are losing their rich subject agreement morphology on the verb … speakers will increasingly tend to choose the option of using the personal pronoun, because the verbal agreement does not provide the information required for referent identification in a sufficiently robust way.”

This diachronic scenario has been questioned for Medieval French on the grounds of an apparent temporal lag between the loss of null subjects and the loss of agreement (e.g., Roberts, 2014; Schøsler, 2002). Yet no systematic quantitative study of syncretization has been available, and so opposite assumptions have been made about the temporal sequence of the two changes. We present a corpus-based study spanning the Medieval French period to evaluate two hypotheses. First, we test the predictions of the hypothesis that null subjects and nonsyncretic agreement exponents are related at the clause level, both depending on the same functional head. Second, we explore a hypothesis based on Yang's (2002) variational learning model, whereby agreement exponents and subject expression are not strictly connected at the clause level. Instead, in the process of language learning (possibly over the speaker's lifespan), syncretic endings create a bias against the null subject grammar, which eventually drives it to extinction.
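To make the intuition behind variational learning concrete, the sketch below shows a linear reward-penalty learner of the general kind Yang (2002) builds on. It is an illustration under stated assumptions, not the model evaluated in this paper. The learner holds a probability p of selecting the null subject grammar, and each grammar has a hypothetical penalty probability: the share of input sentences it fails to analyze. The names simulate_lrp, c_null, and c_overt, and all parameter values, are hypothetical.

```python
import random


def simulate_lrp(c_null, c_overt, gamma=0.01, steps=100_000, seed=1):
    """Simulate a linear reward-penalty learner choosing between two grammars.

    p is the probability of selecting the null subject grammar; c_null and
    c_overt are hypothetical penalty probabilities, i.e. the chance that an
    input sentence cannot be analysed by the selected grammar.
    Returns the average of p over the second half of the run.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    p = 0.5
    total = 0.0
    counted = 0
    for step in range(steps):
        if rng.random() < p:                # null subject grammar selected
            if rng.random() < c_null:       # failure: punish it
                p = (1 - gamma) * p
            else:                           # success: reward it
                p = p + gamma * (1 - p)
        else:                               # overt subject grammar selected
            if rng.random() < c_overt:      # failure: shift weight to the null grammar
                p = p + gamma * (1 - p)
            else:                           # success: shift weight to the overt grammar
                p = (1 - gamma) * p
        if step >= steps // 2:              # average only after the burn-in half
            total += p
            counted += 1
    return total / counted
```

Under this update rule the expected value of p settles at c_overt / (c_null + c_overt). For example, simulate_lrp(0.1, 0.3) returns a value close to 0.75. On these assumptions, anything in the input that raises the penalty for the null subject grammar lowers that grammar's long-run share.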

NULL SUBJECTS AND SUBJECT AGREEMENT IN FRENCH

Our estimates are based on the corpus of the project “Modéliser le changement: les voies du français” (MCVF) and the Penn Supplement to MCVF (2010). Together they include 35 syntactically parsed texts (n ≈ 1 million words [Appendix B]). We assume that null subjects correspond to phonologically null personal pronominal elements. On that assumption, we report the emergence of overt subjects as the estimated probability of an overt personal pronominal subject as opposed to a null subject. Demonstrative, nominal, and other kinds of overt subjects are excluded from consideration. The assumption is warranted because the rate of overt subjects that are not personal pronouns stays the same throughout the Medieval period, whereas the rate of overt pronominal subjects increases and the rate of null subjects decreases dramatically. Furthermore, null subjects in pro-drop languages and overt pronominal subjects in obligatory subject languages have been argued to be distributionally equivalent (e.g., Hirschbühler, 1992).

Our dataset includes all finite clauses with either an overt pronominal or a null subject (n = 56615). It excludes imperatives, subject relatives, and wh-questions targeting subjects because of their idiosyncratic subject syntax.¹ We also excluded all coordinated clauses introduced by the coordinating conjunction et and the conjunctive adverb si, since these license subject ellipsis throughout the Medieval period. These connectives are sometimes used even when there is no potential antecedent in the preceding clause. Nevertheless, we take the nearly stable rate of subject omission with et and si (see Appendix C, Figure 1C) to mean that few omissions with these connectives are true null subjects rather than ellipsis. Subject ellipsis under coordination with et is still allowed in Modern French, while si itself fell out of use as a conjunctive adverb.²

Referential subjects were left unexpressed in Medieval French, and in Old French in particular, in contexts where expression would be obligatory in Modern French (e.g., Foulet [1928] and much literature since). Over the Medieval period, nonexpression became increasingly rare in both main and subordinate clauses, as seen in Figure 1. As has been noted before, subordinate clauses favor overt subjects more than main clauses do (e.g., Foulet, 1928; Franzén, 1939; Hirschbühler, 1992; Roberts, 2014; Vance, 1997; Zimmermann, 2014; among others). Even so, null subjects can be found in all types of subordinate clauses (Fontaine, 1985; Hirschbühler & Junker, 1988; Kaiser, 2009; Prévost, 2018; Roberts, 1993).

Note: Absolute numbers of null and overt pronominal subjects for each text are given in Appendix B. We use frequencies to estimate probabilities.

Figure 1. Overt pronominal subjects in main and subordinate clauses (n = 76150).
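Since frequencies serve as probability estimates, the per-text rate of overt pronominal subjects reduces to a ratio of counts. Here is a minimal sketch of that computation. It assumes that clauses have already been extracted from the parsed corpus as (text, subject type) pairs. The function name, the subject-type labels, and the toy data are hypothetical.

```python
from collections import Counter


def overt_rate_by_text(clauses):
    """Relative frequency of overt pronominal (vs null) subjects per text.

    clauses: iterable of (text_id, subject_type) pairs; subject types other
    than "overt_pronoun" and "null" (nominal, demonstrative, ...) are skipped.
    """
    overt = Counter()
    total = Counter()
    for text_id, subject_type in clauses:
        if subject_type not in ("overt_pronoun", "null"):
            continue  # excluded from the overt vs null comparison
        total[text_id] += 1
        if subject_type == "overt_pronoun":
            overt[text_id] += 1
    return {text_id: overt[text_id] / n for text_id, n in total.items()}


# Hypothetical toy data, not corpus counts.
toy = [("text_A", "null"), ("text_A", "overt_pronoun"), ("text_A", "null"),
       ("text_A", "nominal"), ("text_B", "overt_pronoun"), ("text_B", "null"),
       ("text_B", "overt_pronoun"), ("text_B", "overt_pronoun")]
rates = overt_rate_by_text(toy)  # text_A: 1/3, text_B: 0.75
```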

Subject agreement syncretization

French went from a language characterized by nonsyncretic agreement inherited from Late Latin to a language with a largely syncretic agreement paradigm (Bettens, 2015; Buridant, 2000; De Jong, 2006; Dees, Meilink, van Reenen-Stein, & van Reenen, 1980; Foulet, 1935; Marchello-Nizia, 1992; Morin, 2001). In Modern French there is no systematic person marking on the verb, and the only subject agreement feature present is number.³ In contrast, the system of rhymes used in Old French versification shows that verbal paradigms were much less syncretic during that period (e.g., Bettens, 2015).

Overall, three classes of changes resulted in syncretism: the drop of final -t after vowels, e-insertion, and s-insertion. The first two can be seen as related under the hypothesis of Dees et al. (1980) and van Reenen and Schøsler (1987). On this hypothesis, e-insertion was a compensatory process “keeping” root consonants out of final position, where they would have fallen. As we will see below, these two changes are also much closer to each other, both in timing and in how they spread, than to the third, s-insertion. Appendix A details the main changes in verbal agreement by verb Group and tense-aspect form. These are:

A. innovative final -e: 1st person, Group I, present indicative & subjunctive

With Group I verbs, the ending -e began to replace zero in the context of 1st person singular subjects in the 12th century. By the beginning of the 15th century it had generalized to roots ending in a consonant, while the zero ending lingered longer with stems ending in a vowel (Marchello-Nizia, 1992:200). A handful of verbs whose stems etymologically ended in -e, such as monstre-r ‘to show’, were not affected by this change.

B. innovative final -e (i.e., becoming final as a result of the drop of -t): 3rd person, Group I, present indicative & subjunctive

The emergence of final -e as a consequence of the disappearance of the final -t in the context of the 3rd person singular subjects is generally considered to predate the changes in the 1st person singular contexts.

C. innovative final -e: 3rd person, Group II, present subjunctive

The alternation between -et and an innovative -e as the endings of the 3rd person singular present subjunctive in Group II also resulted in syncretism.

D. innovative final -a: 3rd person, Group I, preterite & future indicative (did not result in syncretism)

E. innovative final -a: 3rd person, Group II future indicative (did not result in syncretism)

F. innovative final Ø: 3rd person, Group II, preterite⁶

In Group II, the ending -t alternated with zero in the context of 3rd person singular subjects in the preterite. This case is special: the innovative zero ending was on the rise until the mid-14th century. At that point it suddenly went into a sharp decline, and the old ending reestablished itself completely. In our discussion of the spelling-pronunciation correspondence below, we take this fact to indicate that the mid-14th century was a cut-off point for the correspondence between spelling and pronunciation. This in turn supports the assumption that, until that point, spelling and pronunciation went largely hand in hand.

G. innovative final -s: 1st person, Group II, present and preterite indicative

From the 14th century on, a new, syncretic ending -s varied with nonsyncretic zero with Group II verbs in the context of 1st person singular subjects (Marchello-Nizia, 1992:201). This change is thus delayed compared to the spread of -e. Marchello-Nizia (1992:202) observed that, with stems ending in a vowel, the new variant takes longer to establish itself. There is also a limited number of verbs whose stems end in -s for etymological reasons (e.g., finis < Lat. finisco ‘to finish’).

H. innovative final -s: 1st person, Group I, imperfect and future conditional

I. innovative final -s: 1st person, Group II, imperfect and future conditional

These spelling changes can be used to model phonological changes at least until the 14th century. One of the strongest arguments for relying on spelling in phonological reconstruction is a novel observation, discussed in more detail below. The dropping of final -t in verbs with stems ending in u/i is abruptly arrested and reversed just after the mid-14th century. This is when the French Royal Chancellerie is known (first mention 1342) to have introduced exams requiring scribes to adhere to standardized spelling rules (De Jong, 2006:25). Spelling unification had already been under way for several decades. Even so, De Jong (2006) observed a sharp increase in what she called “parasitic consonants” after around 1340, which she attributed to the prescriptions of the official examiners. Consequently, we can estimate verbal syncretism only from the change trajectory in manuscripts written before that date.

Quantifying the emergence of the new endings

To establish the temporal profile of the surface changes in verbal endings, we calculated, for each text in the corpus, the ratio of the “new” endings to the sum of the new and “old” endings. In order to identify the subject's person automatically, we limited ourselves to clauses with overt nominal or pronominal subjects. This means that we took a subset of all the cases of new endings in the corpus. To determine whether considering only overt subjects skews the results, we examined ending choice in a sample of clauses with null subjects, manually annotated for subject person. We found no significant difference in the rate of new endings between null and overt subject contexts. Thus, we can confidently estimate the rise of the new endings from a sample of clauses with overt subjects. Figures 2 and 3 show the rise of the new endings divided into two major groups. The first comprises final -t deletion and e-insertion, both of which resulted in an innovative -e ending; the second is s-insertion. (The numbers of observations, together with the proportion of new endings in each text, are given in Tables 5B–9B, Appendix B.)
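The per-text ratio of new endings, and a comparison between null and overt subject contexts, can be sketched as follows. The paper does not specify which significance test was used for that comparison. A Pearson chi-square test on a 2×2 table of counts is shown here only as one plausible choice. The function names and the counts are hypothetical.

```python
import math


def new_ending_ratio(n_new, n_old):
    """Share of new endings: new / (new + old)."""
    return n_new / (n_new + n_old)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction).

    Table layout [[a, b], [c, d]]: rows are overt vs null subject contexts,
    columns are new vs old endings. Returns (statistic, p_value).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


share = new_ending_ratio(30, 70)           # 0.3
stat, p = chi_square_2x2(10, 90, 30, 70)   # hypothetical counts: stat = 12.5, p < 0.001
```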

Note: P(END = NEW | DATE = D, GR = I, P = 1) stands for the estimated probability that a Group I verb has a new ending (i.e., -e) in the context of a 1st person singular subject, etc.

Figure 2. Innovative -e (changes A, B, C) and -a endings (changes D, E).

Figure 3. Innovative zero (change F) and -s ending (changes G, H, I).

Comparing now Figures 2 and 3 with Figure 1, on the assumption that the spelling innovations reflected changes in the verbal agreement phonology, there is no reason to assume that there was a temporal lag between the emergence of new syncretic endings and the rise of overt pronominal subjects. However, we see that, while the appearance of new -e and -a endings roughly parallels the emergence of overt subjects, innovative zero and -s follow a very different trend. The next question is whether we can establish a nonaccidental relation between the rise of new endings and overt subjects.

CLAUSE-LEVEL RELATION MODEL

We will first explore a classic line of analysis. It relates null subjects and nonsyncretic agreement via a structural property giving rise to both; let us call it the Agr head. The two changes are thus viewed as consequences of the loss of the grammar with the Agr head. We show that an approach maintaining a clause-level relation between subject expression and type of ending makes incorrect predictions about the rise of the new endings and of overt pronominal subjects. We then suggest a more flexible approach. On this approach, syncretic endings are not a direct manifestation of an alternative structure without Agr. Rather, they are consequences of an independent phonological change that favors the alternative grammar. Thus, the second approach dissociates null subjects from a particular set of endings in terms of surface observations. It nonetheless maintains that syncretization eventually led to the disappearance of a grammar generating null subjects.

AgrP-Grammar

As part of the first model, we assume that the initial grammar was characterized by the presence of a person feature-specified head Agr.⁴ We will assume that person features introduce conditions on the denotation of a pronoun. A long semantic tradition ascribes to such features the status of presupposition triggers (e.g., Cooper, 1983; Heim, 2008; Heim & Kratzer, 1998; Kratzer, 2009; Sauerland, 2008). In addition to that, we will assume that a pronoun needs to be accompanied by an element triggering a presupposition about its reference, whether it comes as part of the morphological form of the pronoun itself or as a verbal ending.⁵ Taking the existence of the constraint for granted, we propose that person features on Agr introduce presuppositions about the subject's reference. In the absence of Agr a pro will be left uninterpreted.

TP-Grammar

We model the replacement of null subjects with overt ones and of old endings with new ones as a passage from the initial AgrP-Grammar to an alternative grammar where verbal endings correspond to the spellout of head T, unspecified for the person feature.⁶ Since T does not carry person features, it does not introduce presuppositions necessary for a felicitous use of a pro.

If TP-Grammar replaces AgrP-Grammar, null subjects will become unavailable. Assuming the Constant Rate Hypothesis of Kroch (1989), our model predicts the same rate of replacement of AgrP-Grammar by TP-Grammar whether it is measured as the rise of overt pronominal subjects or of new syncretic endings. In general terms, the Constant Rate Hypothesis (CRH) states that a grammatical change spreads at the same rate in all grammatical environments. Here the rate is taken to correspond to the slope coefficient of a logistic regression model. However, Kauhanen and Walkden (2017), following up on the discussion in Paolillo (2011), pointed out a problem with the standard way of assessing the statistical significance of a putative Constant Rate Effect (Kroch, 1989; Pintzuk, 1995; Santorini, 1993). They argue that it is statistically unsound: “if the result is not statistically significant, then it is concluded that there is support for a [Constant Rate Effect]. However, it is not sound to treat a nonsignificant value as evidence for the null hypothesis, since it was assumed to begin with.” We will therefore maintain the following: whenever the result of an independence test on regression coefficients is nonsignificant, it does not contradict the CRH, but it does not provide direct evidence for it either.
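The CRH identifies the rate of a change with the slope of a logistic regression, so it may help to see what that slope is. The sketch below fits logit P(new | t) = a + b·t by Newton-Raphson in plain Python. It assumes that dates have been centred (and possibly rescaled) beforehand. fit_logistic is a hypothetical helper, not the software used for the estimates reported here.

```python
import math


def fit_logistic(times, outcomes, iterations=25):
    """Fit P(new | t) = 1 / (1 + exp(-(a + b * t))) by Newton-Raphson.

    times: list of (centred) dates; outcomes: list of 0/1 (old/new form).
    Returns (a, b); b is the slope, i.e. the rate of change under the CRH.
    """
    a = b = 0.0
    for _ in range(iterations):
        # Score vector (g) and Fisher information (h) of the log-likelihood.
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for t, y in zip(times, outcomes):
            p = 1.0 / (1.0 + math.exp(-(a + b * t)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * t
            h_aa += w
            h_ab += w * t
            h_bb += w * t * t
        # Newton step: solve the 2x2 system h * delta = g.
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


times = [-1] * 4 + [1] * 4
outcomes = [1, 0, 0, 0] + [1, 1, 1, 0]
a, b = fit_logistic(times, outcomes)  # a ≈ 0, b ≈ ln 3 ≈ 1.0986
```

The toy data has one new form out of four at t = -1 and three out of four at t = 1. The fitted curve therefore passes through 0.25 and 0.75, which is why b equals ln 3. Comparing such slopes across contexts is what a Constant Rate test does.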

Thus, we expect the rates of the emergence of overt pronominal subjects and of the new endings not to be significantly different. One caveat is that even stable null subject grammars allow for overt subjects. This makes it impossible to classify a given overt pronominal subject as an instance of AgrP-Grammar or TP-Grammar, since both are expected to generate overt pronominal subjects. The only context that clearly sets the two apart is that of expletive subjects, which are consistently null in null subject languages (e.g., Jaeggli & Safir, 1989).⁷ We will therefore compare the rise of overt expletive subjects with the rise of the new endings. There are at least three other immediate predictions.

First, the new endings should rise at the same rate in different contexts. If the emergence of the new endings reflects the disappearance of Agr, then, on the CRH, we do not expect this change to proceed differently depending on the verb type or the subject person.

Second, there should be no increase in new syncretic endings in the context of null subjects. By hypothesis, AgrP-Grammar is the only grammar that can license null subjects. It is associated with spellout rules that do not output syncretic endings, such as -e in the context of 1st and 3rd person subjects, overt or null.

Finally, there should be no increase in subject expression with old, nonsyncretic endings. AgrP-Grammar, associated with nonsyncretic endings, does sometimes generate overt subjects. However, their distribution is governed by constraints that produce the same rate of subject expression throughout the lifetime of AgrP-Grammar.

PERFORMANCE OF THE CLAUSE-LEVEL RELATION MODEL

Testing the main hypothesis

In order to evaluate the hypothesis that the emergence of overt expletive subjects and syncretic verbal endings are two manifestations of the disappearance of the grammar with a person feature-specified Agr head, we fitted the data on the appearance of overt expletives and the new endings to logistic regression models plotted in Figure 4 (parameter estimates in Table 1). The model Ending predicts whether the verbal ending, Y, is new (or syncretic) by contrast with an old or nonsyncretic verbal ending as a function of time.⁸ We compare this model with an Expletive subject model that predicts whether the expletive subject realization, Y, is new (or overt) by contrast with an old realization where the pronominal subject is null. For the sake of comparison, we also plotted the data on the overt personal pronominal subjects.

Figure 4. Spread of new endings and overt pronominal subjects.

Table 1. Logistic regression estimates for the new endings and overt pronominal subjects (numbers of observations of null and overt expletive and personal pronominal subjects are given in Table 4B in Appendix B)

The coefficients are not very different from each other, but they are not identical either. To test the CRH further, we assess the contribution of the slope by comparing two mixed-effects models. The first predicts the new form Y (an overt expletive subject or a syncretic verbal ending) as opposed to an old form (a null subject or a nonsyncretic verbal ending). The prediction is still a function of time, but we also add a random intercept α_c for each context c, which is either a morphological context or a subject context.⁹ Informally, this model lets the global intercept be further parametrized for each specific context, while the slope is constrained to be identical for both contexts. We compare this model to an extended version where we also add a random slope β_c, allowing the slope to vary by context. Since the slope models the rate of change, this second model allows the rate of change to differ across contexts. We test whether the slope introduces a significant difference between the two models using a log-likelihood ratio test, whose statistic is χ² distributed (df = 2). The test yields p = 0.04, so we conclude that adding the random slope significantly improves the fit to the data. On the CRH, then, these results are not compatible with an analysis of the two diachronic phenomena as stemming from the same grammatical change. We identified that change as a passage from a grammar with an Agr head to a grammar without it. In the remainder of this section, we explore three other predictions made by the clause-level relation model and show that none is borne out.
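For reference, the likelihood ratio test can be computed from the maximized log-likelihoods of the two models. With df = 2, the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed. Only the df = 2 setting comes from the text. The function name and the log-likelihood values in the usage line are hypothetical.

```python
import math


def lrt_p_value_df2(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by 2 parameters.

    The statistic 2 * (LL_full - LL_reduced) is referred to a chi-square
    distribution with df = 2, whose survival function is exp(-x / 2).
    Returns (statistic, p_value).
    """
    # A nested full model cannot fit worse; clamp tiny negative values from
    # numerical optimisation to zero.
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return stat, math.exp(-stat / 2.0)


# Hypothetical log-likelihoods of the random-intercept and random-slope models.
stat, p = lrt_p_value_df2(-1000.0, -997.0)  # stat = 6.0, p = exp(-3) ≈ 0.0498
```

Under this closed form, a statistic of about 6.44 corresponds to p ≈ 0.04, the value reported above.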

Syncretization in different contexts

The model for agreement syncretization merges nine different syncretization patterns (see Appendix A). If syncretization is a consequence of the TP-Grammar associated with the new spellout rules winning over the old AgrP-Grammar, then these developments are expected to have the same rate. In order to test this, we modeled them separately, as illustrated in Figure 5 (Table 11B in Appendix B shows the estimates).

Figure 5. Logistic regression models of the emergence of the new endings.

Upon visual inspection, we see that the spread of the new ending -e has more or less the same profile in all of its contexts. In contrast, it differs from the spread of -a and -s, contrary to what was predicted by the clause-level relation model. Thus, individual endings spread at different rates, and the innovations seem to group into classes in terms of their phonological environments.¹⁰

Spread of the new endings with null subjects

Another prediction made by the clause-level relation model is that there should be no increase in the new endings in the context of null subjects. We do find occurrences of -e in the context of 1st or 3rd person singular null subjects (see Table 9B in Appendix B). Yet such occurrences are not frequent: at all times they stay below 20 per text. One way to explain them away is to analyze them as etymological vowels that create noise in the passage from the old to the new endings. However, if this is indeed noise, we do not expect it to become stronger with time. To test this expectation, we fitted the data on the appearance of -e in the context of 1st person singular overt and null subjects to logistic regression models. As Figure 6 shows, the trend is the same in both contexts (see Table 12B in Appendix B for the estimates).

Figure 6. Rise of -e with Group I verbs in the context of the 1st person singular subjects.

This result is unexpected if -e with null subjects is just an etymological residue. Rather, the observation that the new ending spreads at similar rates in the context of null and overt subjects suggests that we are witnessing one and the same (phonological) change operating in different contexts. In other words, the choice of ending is independent of the expression of the subject, contrary to what is predicted by the structural model relating subject expression and ending type as manifestations of a particular grammar. Note that we do not need to check the spread of different types of new endings with overt and null subjects, since the clause-level relation model predicts that no new endings increase with null subjects and is therefore falsified even by one case of the contrary.
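One simple way to compare the rates of two separately fitted trajectories is a Wald z-test on the difference between their slopes. An example would be -e with null versus overt 1st person singular subjects. This test is an assumption for illustration and not necessarily the procedure behind Table 12B. The slopes and standard errors in the usage line are hypothetical.

```python
import math


def compare_slopes(b1, se1, b2, se2):
    """Two-sided Wald z-test for the difference between two slopes.

    Assumes the slopes come from independently fitted models, so the
    variance of their difference is se1**2 + se2**2.
    Returns (z, p_value).
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


z, p = compare_slopes(2.0, 0.3, 1.0, 0.4)  # hypothetical: z = 2.0, p ≈ 0.0455
```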

Spread of overt subjects with old endings

The final prediction we derive from the clause-level relation model is that there should be no increase in subject expression with verbs bearing old, nonsyncretic endings. We compared the rate of subject expression with verbs bearing the old nonsyncretic endings (-t and zero) and with verbs bearing the new syncretic endings (-e and -s). We estimated the probability of an overt pronominal subject for finite clauses with verbs ending in -e (Groups I & II), -t (Groups I & II), -s (Group II), and zero (Groups I & II), as shown in Figure 7 (Table 13B in Appendix B).¹¹ Clearly, the subject expression rate grows over time for the nonambiguous endings.¹² Relatedly, Ranson (2009) concluded, based on the three texts she examined, that ending ambiguity is not a good predictor of subject expression.

Figure 7. Pronominal subject expression with old and new endings.
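The estimates behind a plot like Figure 7 amount to overt-subject proportions per ending and time period. Here is a minimal sketch of that tabulation. It assumes that each clause is coded as (year, ending, overt?) and that years are grouped into 50-year bins. The binning, the function name, and the data are hypothetical.

```python
from collections import defaultdict


def overt_rate_by_ending(clauses, bin_width=50):
    """Overt pronominal subject rate per (ending, period) cell.

    clauses: iterable of (year, ending, is_overt) triples, e.g.
    (1180, "-e", False). Each year is mapped to the start of its bin.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [overt, total]
    for year, ending, is_overt in clauses:
        cell = (ending, year - year % bin_width)
        counts[cell][0] += int(is_overt)
        counts[cell][1] += 1
    return {cell: overt / total for cell, (overt, total) in counts.items()}


# Hypothetical toy data, not corpus counts.
toy_clauses = [(1180, "-t", False), (1190, "-t", True), (1320, "-t", True),
               (1330, "-t", True), (1180, "-e", False)]
cell_rates = overt_rate_by_ending(toy_clauses)
# {("-t", 1150): 0.5, ("-t", 1300): 1.0, ("-e", 1150): 0.0}
```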

In sum, we have shown that several predictions of the clause-level relation model are not borne out. This model assumes that subject expression and agreement type are related at the clause level via a certain functional head. First, new endings spread at different rates depending on the ending type. This is unexpected if they are all generated by a new grammar that, on the CRH, should spread at the same rate in all contexts. Second, new endings spread with both overt and null subjects. This contradicts the model's assumption that null subjects are generated only by the old AgrP-Grammar, where the Agr head is spelled out as old, nonsyncretic endings. Third, the model expects no increase in overt subjects with old, nonsyncretic endings, since by hypothesis these are generated by AgrP-Grammar, which produces overt subjects at a constant (relatively low) rate. This expectation is not borne out either. The overall conclusion is that the diachronic data do not support a strict clause-level dependency between the type of ending used and whether or not the pronominal subject is expressed. However, we need to address another possible explanation for why we do not find a complete parallelism between ending syncretization and the disappearance of pro-drop. It could be that the verb ending changes registered in the written texts do not reflect phonological reality. In that case, they could not be used to evaluate a clause-level relation hypothesis.

SPELLING-PRONUNCIATION PROBLEM

For the purposes of the present study, the correspondence between pronunciation and spelling raises two independent questions. The first is whether the spelling innovations had phonological substance. The second concerns the emergence of phonological innovations behind a conservative orthography. Modern French attests the loss of all the stops and sibilants (at least when words are pronounced in isolation) that used to correspond to present-day word-final consonantal graphemes. This loss affects far more than the final -t whose disappearance we tracked above. Again, judging from the spelling-pronunciation correspondence of Modern French, this change is mostly not reflected in spelling. In the second part of our study, we attempt to estimate the general level of syncretism in the system. For this it is important to know until what point in time we can equate presence in spelling with phonological presence. Fortunately, we can estimate this date quite precisely thanks to the co-occurrence of two independently attested facts. First, there is a historical record of the first centralized spelling standardization in the mid-14th century. Second, our data show a reversal with Group II verbs with unstressed roots ending in -i/-u. Had the disappearance of final -t with these verbs followed a statistically expected trajectory, it would have reached completion around that time. Instead, it was stopped and reversed in the late 14th century (Figure 8). This presumably reflects the effect of the spelling standardization that marked the end of the strict spelling-pronunciation correspondence.

Figure 8. Change reversal for Group II verbs in preterite with 3rd person subject.
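The notion of a “statistically expected” completion date can be made explicit. If a change follows logit(p) = a + b·year, the year at which it reaches a share q is (logit(q) - a)/b. The sketch below assumes uncentred calendar years. Its coefficients are made up and are not the estimates behind Figure 8.

```python
import math


def year_reaching(share, intercept, slope):
    """Year at which logit(p) = intercept + slope * year reaches p = share."""
    logit = math.log(share / (1.0 - share))
    return (logit - intercept) / slope


# Made-up coefficients: midpoint at 1300, slope 0.02 per year.
midpoint = year_reaching(0.5, -26.0, 0.02)    # 1300 (up to floating-point rounding)
near_done = year_reaching(0.99, -26.0, 0.02)  # about 1530
```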

There seems to be a consensus that the rise of the new endings reflected phonological reality rather than simply a change in orthographic conventions (Dees et al., 1980; Fouché, 1931; Goyette, 1993; Marchello-Nizia, 1992; Morin, 2001; van Reenen & Schøsler, 1987). Several groups of arguments tell against the claim that what we observe in texts is just variation in writing conventions. First, consider Group I verbs in the present indicative and subjunctive. The emergence of -e as a final grapheme with 3rd person singular subjects follows a logistic curve whose slope is indistinguishable from that of the curve for -e with 1st person singular subjects.¹³ These results fit well with the hypothesis of Dees et al. (1980) and van Reenen and Schøsler (1987). On that hypothesis, /e/-insertion was a compensatory process, triggered by the instability of final stops, that preserved the integrity of the root. On this view, final -e with 3rd person singular subjects results from the fall of final /t/ (e.g., aimet > aime ‘(he) loves’ and aint > aime ‘(he) would love’). With 1st person singular subjects, by contrast, it results from a compensatory /e/-insertion that kept root-final consonants from falling (e.g., aim > aime ‘(I) love’ and ‘(I) would love’). The quasi-identity of slopes is only indicative, but it is expected if this is a paradigm-wide morphophonological process.
That is, given the CRH, it is entirely expected for a morphophonologically conditioned change to proceed at the same rate in different environments (cf., Fruehwald, Gress-Wright, & Wallenberg, 2009). Second, according to our estimates, syncretization happened earlier in the context of 3rd person subjects than in the context of 1st person singular subjects, which makes sense if the fall of the final stops that were not part of the root (again, aimet > aime ‘(he) loves’ and aint > aime ‘(he) would love’) preceded the emergence of a “compensatory” /e/ following root-final consonants. In contrast, if what we observe were merely changes in spelling conventions, it would be, though not theoretically impossible, a series of strange coincidences that, first, different conventions in different contexts changed at the very same rates and, second, that they changed first in the context of the 3rd and only then in the context of the 1st person subjects. Third, according to Fouché (1931:180) and Marchello-Nizia (1992:201), in the context of 1st person singular subjects in indicative and subjunctive, the -e grapheme spread first with consonant-final roots and only later with vowel-final roots (e.g., cri-er ‘to shout’, where in the context of 1st person singular subjects cri was replaced by crie). Again, this fits a phonology-based account of the change, since the sequence of spreading across contexts can be described in terms of phonologically natural classes, whereas on the spelling convention-based account it would require an inexplicable concerted effort on the part of the scribes.
Lastly, a phonologically motivated change affecting final vowels has a precedent in the history of Late Latin, where all final vowels ended up falling except in those cases where their fall would have led to an unacceptable consonant cluster (see the discussion in Goyette [1993] and references therein). In Old French, the reflexes of this process are the so-called “etymological e,” that is, root-final -e following certain consonant clusters, as in siffle-r (from Latin sibila-re > sifila-re > sifla-re) (cf., don-er from Latin dona-re, which lost its root-final /a/ in Late Latin, unlike siflare). It is not so surprising, then, to see another round of “compensatory” root-final /e/, this time as an epenthetic process meant to keep the root-final consonants from falling. In view of these arguments, none of which supports an account of the new endings in terms of changing spelling conventions, we conclude that our results based on written sources can plausibly be projected onto the phonological reality and thus used to test a model structurally relating syncretism-introducing changes and the disappearance of pro-drop. Similarly, Fruehwald et al. (2009) analyzed data on the loss of final fortition in (Bavarian) Early New High German, observable in the orthographic variation of the period (e.g., tak versus tag ‘day (acc.sg)’, rat versus rad ‘counsel (acc.sg)’), and argued that this variation clearly represents a phonological change in progress rather than a shifting scribal tradition.
When it comes to determining at what point the phonological reality behind conservative spellings changed, reconstructions of the timing of the fall of the final consonants rely mostly on the analysis of rhymes (matching versus nonmatching), hypercorrections (insertions of etymologically absent consonants), omissions of etymologically present consonants, commentaries in the grammars of the time, and analyses of borrowings from French into other languages that likely reflected the spelling at the time of borrowing. The dating question is important to us inasmuch as we want to take final consonant instability into account when evaluating the overall degree of syncretism or ambiguity in the verbal system. De Jong (2006) undertook a statistical analysis of the rhymes in three texts written in the Parisian dialect in the 13th–14th centuries. She compared the frequency of nonmatching rhymes for a given grapheme (e.g., escript ‘text’–(je) pris ‘I take’) with that of matching rhymes (e.g., moult ‘many’–(je) doubt ‘I doubt’), taking higher-than-chance frequencies of nonmatching rhymes to indicate that the grapheme was not pronounced. One of the general conclusions of De Jong (2006:176) is that the nonpronunciation of final consonantal graphemes increased dramatically in the 14th century. This is the period when mismatching rhymes, including mismatches involving our consonants of interest, begin to be observed in her corpus (cf., Foulet, 1935). Importantly, De Jong (2006:174) linked the emerging mismatch between spelling and pronunciation with a particular historical event, namely, the introduction by the Royal Chancellery in Paris of standard exams for scribes in 1342. We found a rather dramatic argument in favor of this hypothesis in the form of the reversal of the final -t disappearance in the preterite forms of certain Group II verbs (Figure 8).
We cannot conceive of any plausible explanation of this development in phonological terms. Rather, it seems to result precisely from an artificially introduced norm.

CHANGE AS A VARIATIONAL LEARNING OUTCOME

As was demonstrated above, the long-standing intuition, going back at least to Foulet (1928), that it was the impoverishment of the verbal endings that triggered the loss of null subjects cannot be implemented as a model in which non-syncretic endings and null subjects are considered manifestations of the same grammatical property. However, given that overall the new endings and overt pronominal subjects (whether personal or expletive) spread at almost the same rates, as illustrated in Figure 4, it would be equally counterintuitive to give up altogether on models that assume a non-accidental relation between the two changes.

Sprouse and Vance (1999) proposed the first, to our knowledge, reinforcement learning model of the loss of null subjects, appealing to the processing difficulty associated with parsing them. In this model, a null subject has a greater chance of inducing a parsing failure than its competitor, an overt pronominal subject. Since, by the authors’ assumption, speakers tend to produce grammatical forms at the frequencies at which they have encountered them in their speech community, failures to parse null subjects lead to a decrease in the frequency of null subjects in the speakers' output, which in turn reduces the ambient frequency of null subjects in the next cycle. The cycle repeats until null subjects vanish from the speech community.
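The feedback loop just described can be made concrete with a toy recurrence. This is our own hypothetical formalization for illustration, not Sprouse and Vance's (1999) actual model: we assume that a fixed share e of null-subject tokens fails to be parsed, that overt subjects always parse, and that each generation reproduces the proportions it parsed successfully. The names `next_null_rate` and `simulate`, and the parameter values, are illustrative assumptions.

```python
def next_null_rate(f, e):
    """Next generation's null-subject rate under the toy assumptions.

    f: current share of null subjects in the ambient input.
    e: assumed probability that a null subject fails to be parsed.
    """
    parsed_null = f * (1 - e)   # null subjects that were parsed successfully
    parsed_overt = 1 - f        # overt subjects are assumed always to parse
    return parsed_null / (parsed_null + parsed_overt)


def simulate(f0=0.7, e=0.1, generations=100):
    """Iterate the cycle; the starting rate and failure rate are arbitrary."""
    rates = [f0]
    for _ in range(generations):
        rates.append(next_null_rate(rates[-1], e))
    return rates


rates = simulate()
```

Under these assumptions the odds of a null subject shrink by the factor (1 − e) in every generation, so the rate follows a falling logistic curve toward zero; the shape, not the particular numbers, is what the sketch is meant to show.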

Below we propose a model of the loss of null subjects that builds on the variational learning model of Yang (2002, 2010). Within this model, ambiguous endings are considered the main factor creating a parsing difficulty for null subjects (contra Sprouse & Vance [1999]).

General framework

Yang's (2002, 2010) model is based on the assumption that children have innate access to multiple grammatical systems and, in the course of language learning, use the input data to probabilistically evaluate the available options. They may either converge on a single grammar or end up, as adults, with multiple grammars used at certain probabilities, which corresponds to synchronic variation. Depending on whether the next generation arrives at the same or at a different probability distribution, we get diachronically stable variation or diachronic change, respectively. By hypothesizing what kind of data contributes to the probabilistic evaluation of the grammars, we can approximate the course of the competition on the basis of corpus distributions of the relevant data.

Formally, we use Yang's (2002, 2010) model as a way to estimate the probabilities P(G = G₁) of using grammar G₁ and P(G = G₂) of using grammar G₂ from a data set X = x₁, …, x_n in which, for a specific example x ∈ X, we do not know which of G₁ or G₂ actually generated x. Informally, the estimation procedure is iterative: it increases P(G = G_i) when G_i successfully parses an example x and correspondingly decreases P(G = G_j) (i ≠ j). The iterative procedure runs as follows:

• Select a clause x at random from the data set X
• Select a grammar G_i at random, in proportion to its probability
• Analyze x with G_i
  ○ If G_i succeeds in analyzing x, give G_i a reward and G_j a penalty: P(G = G_i) increases and P(G = G_j) decreases.
  ○ If G_i fails to analyze x, give G_i a penalty and G_j a reward: P(G = G_i) decreases and P(G = G_j) increases.
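The procedure above can be sketched as a small simulation. The update rule is the linear reward-penalty scheme for two grammars as it is usually stated in Yang's work, with a learning rate gamma whose value is our assumption (the text does not fix one); with only two grammars, rewarding one is the same as penalizing the other. The function name `lrp_learn` and the toy data are hypothetical.

```python
import random


def lrp_learn(data, gamma=0.001, steps=200_000, p1=0.5, seed=0):
    """Two-grammar linear reward-penalty learner (illustrative sketch).

    data:  list of (g1_parses, g2_parses) booleans, one pair per clause x.
    gamma: learning rate; an assumed value, not fixed by the text.
    Returns the trajectory of P(G = G1); P(G = G2) is 1 minus it.
    """
    rng = random.Random(seed)
    trajectory = []
    for _ in range(steps):
        g1_ok, g2_ok = rng.choice(data)       # select a clause x at random
        use_g1 = rng.random() < p1            # pick G_i in proportion to P(G = G_i)
        success = g1_ok if use_g1 else g2_ok  # analyze x with G_i
        if use_g1 == success:
            # G1 succeeded, or G2 failed: reward G1 / penalize G2.
            p1 += gamma * (1 - p1)
        else:
            # G1 failed, or G2 succeeded: penalize G1 / reward G2.
            p1 *= 1 - gamma
        trajectory.append(p1)
    return trajectory


# Toy data: G1 fails on 2 of 10 clauses (c1 = 0.2), G2 on 3 (c2 = 0.3).
data = [(False, True)] * 2 + [(True, False)] * 3 + [(True, True)] * 5
traj = lrp_learn(data)
late = traj[len(traj) // 2:]
print(sum(late) / len(late))  # hovers near c2 / (c1 + c2) = 0.6
```

A small gamma makes the learner's probability fluctuate only slightly around its equilibrium, which is why averaging the second half of the trajectory approximates the limiting value.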

Using the notation G_i ↛ x to indicate that G_i fails to parse x, we can define the penalty probability of a grammar G_i as c_i = P(G_i ↛ x). That is, c_i is the probability that G_i fails to analyze an example in X. This quantity can be estimated simply as the proportion of examples in the data set that the grammar fails to parse. Given this notion, for the case where we have two grammars G₁, G₂ with penalties c₁, c₂, Narendra and Thathachar (1989) proved the following theorem:

$$\lim_{t \to \infty} P(G = G_1 \mid T = t) = \frac{c_2}{c_1 + c_2}; \qquad \lim_{t \to \infty} P(G = G_2 \mid T = t) = \frac{c_1}{c_1 + c_2}$$

In the limit, then, the probability of using a grammar G_i is proportional to the failure probability c_j of its competitor G_j (i ≠ j). Specifically, P(G = G_i) = 1 when G_i never fails while G_j sometimes does (c_i = 0, c_j > 0), and P(G = G_i) = 0 when G_j never fails while G_i sometimes does (c_j = 0, c_i > 0).
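As a worked illustration of the theorem above, the sketch below estimates each penalty probability by counting failures and then computes the limiting probabilities. The helper names and the ten-example parse records are hypothetical; exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction


def penalty(parses):
    """Estimate c_i as the share of examples that grammar G_i fails to parse.

    parses: list of booleans, True where G_i parses the example.
    """
    failures = sum(1 for ok in parses if not ok)
    return Fraction(failures, len(parses))


def limit_probabilities(c1, c2):
    """Limiting (P(G = G1), P(G = G2)) given penalty probabilities c1, c2."""
    if c1 + c2 == 0:
        raise ValueError("undefined when neither grammar ever fails")
    return c2 / (c1 + c2), c1 / (c1 + c2)


# Toy parse records for ten examples (invented for illustration).
c1 = penalty([True] * 8 + [False] * 2)   # G1 fails on 2 of 10
c2 = penalty([True] * 7 + [False] * 3)   # G2 fails on 3 of 10
print(limit_probabilities(c1, c2))       # (Fraction(3, 5), Fraction(2, 5))
```

Setting c1 to zero while c2 stays positive yields (1, 0), the case in which G₁ drives its competitor out entirely.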

Diachronic stability and change

The outcome of the learning process (possibly over the lifespan) may stay the same or change from one generation to the next. In the model we are considering, the only reason why learning may fail to converge on grammar G_i is that its penalty probability c_i is greater than zero, that is, that there is some subset of the input data that G_i fails to parse. Once the c_i associated with G_i becomes greater than zero, a language may leave a diachronically stable state and enter a state of diachronic change. Moreover, an increase in the frequency of data unparseable with G_i in the next generation will lead to an increase in c_i, and so on, to the point where G_i is completely demoted. The emergence of such data may have nothing to do with the grammatical options themselves and may stem from phonological changes as well as from second-language interference.

Applying this to the loss of null subjects in Medieval French, let us assume that the initially winning grammar (AgrP-Grammar) is the one that licenses null pronominal subjects. Its competitor (TP-Grammar) only generates clauses with an overt subject. Notice that this model incorporates the Taraldsen/Rizzi insight about a categorical, core grammar-based dependency between functional head features and null subjects. To model the competition between these two grammars, the crucial parameters are the penalty probabilities of the grammars. By hypothesis, AgrP-Grammar fails each time the information about a subject's reference cannot be retrieved from the verbal ending, which is the case whenever the ending is ambiguous. An ending is classified as ambiguous if the speaker has been exposed to a data sample in which the ending occurs with overt subjects of more than one person specification.
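The ambiguity criterion just stated amounts to a simple grouping operation. In the minimal sketch below, the function name, the person-number labels, and the sample are hypothetical illustrations rather than corpus data.

```python
from collections import defaultdict


def ambiguous_endings(observations):
    """Return the endings observed with overt subjects of more than one person.

    observations: iterable of (ending, person) pairs taken from clauses with
    an overt subject; person labels such as "1sg" or "3sg" are hypothetical.
    """
    persons_by_ending = defaultdict(set)
    for ending, person in observations:
        persons_by_ending[ending].add(person)
    return {ending for ending, persons in persons_by_ending.items()
            if len(persons) > 1}


# Toy sample: -e occurs with both 1sg and 3sg subjects, so it counts as ambiguous.
sample = [("e", "1sg"), ("e", "3sg"), ("ons", "1pl"),
          ("ez", "2pl"), ("t", "3sg"), ("t", "3sg")]
print(ambiguous_endings(sample))  # {'e'}
```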

In the case of ambiguous endings, the Agr head cannot be projected during the parse, since there is not enough information to give it semantic content. In contrast, TP-Grammar fares well with all kinds of endings (as long as tense information can be read off them), but fails when chosen to parse null-subject clauses. In those cases, in the absence of a subject DP providing presupposition-triggering features, the domain of the external argument of the verb is left underspecified, and the composition does not converge. A diachronically stable null-subject situation is thus predicted to obtain when there are no problematic data of the kind described above, that is, when there are no ambiguous endings and the penalty probability c_Agr is 0. In that case AgrP-Grammar never fails and in every generation ends up driving out the competing TP-Grammar, since the latter cannot parse some of AgrP-Grammar's output, namely null-subject clauses.

Estimating failure probabilities

To estimate c_Agr, we exhaustively classify verbal endings as ambiguous or unambiguous. We define an ending as ambiguous if it does not correspond to a unique combination of person and number features (see Appendix A). We coded every finite clause in the corpus (as usual, excluding subject wh-clauses and imperatives) for whether its verbal ending is unambiguous. For endings that were already ambiguous in the earliest texts, all clauses with a finite verb bearing such an ending were coded as ambiguous. For endings classified as having emerging ambiguity, namely, those that became ambiguous later than the earliest texts, we coded clauses dated before the first attested cases of ambiguity as having unambiguous predicates and those dated after its emergence as having ambiguous predicates. The failure probability c_Agr is then estimated as the frequency of clauses with ambiguous predicates at a given date. For TP-Grammar, the estimate of c_TP is even more straightforward: it is the frequency of null-subject clauses.
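The coding and estimation steps just described can be sketched as follows. This is a minimal illustration, assuming clauses are available as (date, ending, null_subject) triples and that dates are grouped into hypothetical century-sized bins; the function names, bin size, and toy clauses are our own and do not reproduce the authors' pipeline.

```python
from collections import defaultdict


def is_ambiguous(ending, date, first_ambiguous):
    """Code a clause's verbal ending as ambiguous at a given date.

    first_ambiguous maps an ending to the earliest date at which its
    ambiguity is attested (use float("-inf") for endings ambiguous already
    in the earliest texts); unlisted endings count as unambiguous.
    """
    return date >= first_ambiguous.get(ending, float("inf"))


def tp_limit_by_period(clauses, first_ambiguous, period=100):
    """Estimate c_Agr, c_TP and the limiting P(TP-Grammar) per time bin.

    clauses: iterable of (date, ending, null_subject) triples.
    """
    bins = defaultdict(list)
    for date, ending, null_subject in clauses:
        bins[date // period * period].append(
            (is_ambiguous(ending, date, first_ambiguous), null_subject))
    result = {}
    for start, coded in sorted(bins.items()):
        n = len(coded)
        c_agr = sum(amb for amb, _ in coded) / n    # AgrP fails on ambiguous endings
        c_tp = sum(null for _, null in coded) / n   # TP fails on null subjects
        total = c_agr + c_tp
        p_tp = c_agr / total if total else None     # undefined if neither fails
        result[start] = (c_agr, c_tp, p_tp)
    return result


# Toy clauses (invented): -e becomes ambiguous in 1200 in this example.
toy = [(1150, "e", True), (1150, "ons", False),
       (1250, "e", False), (1250, "t", True)]
print(tp_limit_by_period(toy, {"e": 1200}))
# {1100: (0.0, 0.5, 0.0), 1200: (0.5, 0.5, 0.5)}
```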

The predicted value of P_TP given c_Agr and c_TP as estimated using the matrix above is plotted in Figure 9 (the numbers of observations on which c_TP and c_Agr are based are given in Appendix B, Tables 4B and 10B, respectively).

Figure 9. Parsing probability of the TP-Grammar.

Discussion

We estimated the parsing probability of the TP-Grammar based on our estimates of the probabilities with which this grammar and its competitor, AgrP-Grammar, encounter data that they cannot parse. The parsing probability of the TP-Grammar grows steadily during the Medieval period.

Recall that the limiting parsing probability of TP-Grammar is the ratio of AgrP-Grammar's failure probability to the sum of the failure probabilities of AgrP-Grammar and TP-Grammar. This means that the greater AgrP-Grammar's probability of failing, the greater, eventually, will be the probability that TP-Grammar is used. Given how we estimate AgrP-Grammar's failure probability, namely as the frequency of ambiguous endings, our estimate of the probability that TP-Grammar is used clearly depends on the frequency of ambiguous endings. Thus, without assuming that ending ambiguity and subject expression depend on the same underlying factor, this model puts the two in a relation of direct dependency. This is a welcome configuration given the desiderata expressed above, namely, finding a model that dissociates the two phenomena at the clause level but relates them in the course of language evolution. It is worth stressing that a given ending in this model does not reveal which grammar was used to generate it: by assumption, the spread of syncretic endings is a phonological phenomenon that is “blind” to the syntactic origins of the string it operates on.

As Figure 10 shows, the curve corresponding to the parsing probability of TP-Grammar is roughly parallel to the estimated probabilities of personal and expletive pronominal subject expression. Intuitively, the probability of expletive subject expression corresponds to the probability of TP-Grammar being chosen for production, since this is the only grammar that can generate overt expletive subjects. What we observe, then, is TP-Grammar's parsing probability lagging significantly behind its production probability. One explanation for this apparent lag is that our estimate of ending ambiguity, to which TP-Grammar's parsing probability is directly related, is overly conservative. We have already mentioned on several occasions that the spelling standardization of the mid-14th century seems to have had a visible effect on the manuscripts on which our corpus is based. The phenomenon we discussed is the reintroduction of the final -t for Group II verbs with unstressed roots in -i/-u. Given the disappearance of the inflectional -t after vowels prior to 1300, the same should have happened to the -t following glides and stops shortly thereafter (cf., the suggestion in Buridant [2000:250]). And we know for a fact that final inflectional stops eventually fell in all environments. However, presumably because of the spelling standardization, we do not observe these changes and therefore cannot take them into account in our estimates of ending ambiguity, which makes the parsing probability derived from them appear to lag well behind the production probability reflected in the rate of overt expletive subjects.

Note: Assuming that, in addition to -e, -s, zero following i/u, and -eies, the endings -t, -eiet, -it, -et, and -at were also ambiguous.

Figure 10. Parsing probability of the TP-Grammar (non-conservative*).

We can see what happens to the parsing probability of TP-Grammar if we make a less conservative assumption about the fall of final consonants. Let us assume that, in addition to the endings -e, -s, zero following i/u, and -eies, the endings -t, -eiet, -it, -et, and -at were ambiguous as well, by virtue of effectively not being pronounced from 1400 on and thus yielding verbal forms homophonous with either 1st or 2nd person singular forms. Figure 10 shows that this less conservative estimate is almost identical to the estimated production probability in the form of overt expletives.
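The effect of this less conservative assumption can be illustrated by recomputing the limiting probability for a single period under the two ambiguity sets named in the text. The label "Ø(i/u)" for the zero ending after i/u, the helper `p_tp_limit`, and the five toy clauses are hypothetical; only the two lists of endings come from the text.

```python
# Hypothetical labels; "Ø(i/u)" stands for the zero ending after i/u.
CONSERVATIVE = {"e", "s", "Ø(i/u)", "eies"}
NON_CONSERVATIVE = CONSERVATIVE | {"t", "eiet", "it", "et", "at"}


def p_tp_limit(clauses, ambiguous):
    """Limiting P(TP-Grammar) for one period.

    clauses: list of (ending, null_subject) pairs from that period.
    ambiguous: set of endings treated as ambiguous.
    Assumes at least one grammar fails somewhere (c_Agr + c_TP > 0).
    """
    n = len(clauses)
    c_agr = sum(ending in ambiguous for ending, _ in clauses) / n
    c_tp = sum(null for _, null in clauses) / n
    return c_agr / (c_agr + c_tp)


# Toy clauses, invented for illustration (not corpus data).
toy = [("e", False), ("t", False), ("t", True), ("ons", False), ("at", False)]
p_conservative = p_tp_limit(toy, CONSERVATIVE)          # c_Agr = 0.2 -> 0.5
p_non_conservative = p_tp_limit(toy, NON_CONSERVATIVE)  # c_Agr = 0.8 -> 0.8
```

Since c_TP is unchanged, widening the ambiguous set can only raise c_Agr and hence the limiting probability of TP-Grammar.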

One may argue, however, that this parallelism is not in itself a particularly interesting result, since, in addition to depending on the frequency of ambiguous endings, TP-Grammar's limiting parsing probability depends inversely on the relative frequency of null subjects, which, of course, decreases over time. The question, then, is whether ending ambiguity actually plays an important role in predicting TP-Grammar's parsing probability.

One way to evaluate the role of the endings in the outcome of the grammar competition is to design a variational learning-based model that makes no reference to them at all and to compare this “ending-less” model with one that does take them into account, such as the one we have just considered. To this end, we use the measure of grammar fitness proposed in Yang (2000) and used, in particular, in variational learning models of the loss of V-to-T raising in Scandinavian (Heycock & Wallenberg, 2013) and of the loss of OV in Latin (Danckaert, 2017). The fitness of a grammar G is defined as the proportion of clauses that only G generates out of all clauses that G generates, that is, the proportion of unambiguous clauses in the output of G.

The fitness of AgrP-Grammar cannot be straightforwardly estimated from our data, since, by hypothesis, all the attested stages of historical French correspond to mixed grammar states, that is, to the outputs of the two competing grammars. This follows from the assumption that, whenever we find overt expletive subjects, TP-Grammar must have been at work, together with the fact that expletive subjects are found in the earliest attested texts (e.g., Prévost, 2018; Zimmermann, 2014). Instead, we can approximate the fitness of AgrP-Grammar on the basis of a language that is currently in a “pure” pro-drop state. The estimated probability of null subjects is around 0.7 for pro-drop languages such as Italian or Spanish (e.g., Bates, 1976; Nagy, Aghdasi, Denis, & Motut, 2011; Otheguy, Zentella, & Livert, 2007:778). That is, by assumption, AgrP-Grammar produces unambiguous clauses with a probability of 0.7. The fitness of TP-Grammar, in turn, corresponds to the estimated probability of expletive subjects in a non-pro-drop language such as English, which obviously cannot be anywhere near 0.7.¹⁴ Given these approximations, Fitness(AgrP-Grammar) >> Fitness(TP-Grammar). By the Fundamental Theorem of Language Change (Yang, 2000:239), which states that the grammar with the greater fitness always wins in the long run, this model predicts that AgrP-Grammar wins hands down, contrary to the historical facts. We thus conclude that a model that factors in ending ambiguity fares better than one that does not, which supports the assumption that there is a causal relation between ending syncretization and null-subject disappearance.
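How the fitness comparison plays out over generations can be sketched by plugging fitness-based penalties into the two-grammar limit theorem, following our reading of Yang's (2000) formulation. If a fraction p of the input comes from AgrP-Grammar, TP-Grammar's penalty is the share of AgrP-only clauses, p times AgrP fitness, and AgrP-Grammar's penalty is (1 − p) times TP fitness. The fitness values 0.7 and 0.05 are the rough approximations from the text; the starting values of p and the helper names are arbitrary.

```python
def next_generation(p_agr, fit_agr, fit_tp):
    """Share of AgrP-Grammar use after one generation (sketch)."""
    c_tp = p_agr * fit_agr          # TP-Grammar fails on AgrP-only (null-subject) clauses
    c_agr = (1 - p_agr) * fit_tp    # AgrP-Grammar fails on TP-only clauses
    return c_tp / (c_tp + c_agr)    # limit theorem: P(AgrP) = c_TP / (c_AgrP + c_TP)


FIT_AGR, FIT_TP = 0.7, 0.05  # rough approximations discussed in the text


def trajectory(p0, generations=10):
    """Iterate the competition from an arbitrary starting share p0."""
    ps = [p0]
    for _ in range(generations):
        ps.append(next_generation(ps[-1], FIT_AGR, FIT_TP))
    return ps


from_even = trajectory(0.5)
from_low = trajectory(0.1)
```

From either starting point, the AgrP-Grammar share climbs toward 1 within a few generations, because the odds against it shrink by the factor 0.05/0.7 each generation. This is the "ending-less" prediction that runs contrary to the historical facts.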

CONCLUSIONS

The goal of this paper was to bring parsed corpus data and statistical modeling to bear on the long-standing puzzle of the relation between the disappearance of null subjects and the syncretization of verbal subject agreement in the historical development of French. We used the Constant Rate Hypothesis to explore a model that treats the two changes as reflexes of one underlying structural shift and showed that it generates predictions that are not supported by the data. The crucial finding behind these failed predictions is that the two developments are independent of each other. Specifically, we found that the increase in overt personal pronominal subjects was uniform across old and new endings, and, likewise, that the spread of the new syncretic endings was uniform across clauses with null and overt subjects.

A second model that we explored related the two changes via language learning: one change (ending syncretization) promotes the appearance of sentences that disadvantage a grammar with a null-subject option and thus automatically favor the overt-subject grammar, causing null subjects to disappear. We approached this model from two perspectives.

First, we focused on what we assumed to be the parsing capacities of the two competing grammars and used the linear reward-penalty theorem to estimate the evolution of the probabilities of the two grammars over time by estimating their failure probabilities at each time point. Crucially, the failure probability of a null-subject grammar is taken to be directly related to the frequency of ambiguous endings in the data. This was the core assumption meant to capture the intuitive link between agreement quality and subject expression in the process of language change. Another crucial assumption, which made the model compatible with the facts concerning the surface-level independence of subject and ending types, was that the type of ending is determined by a phonological process that is entirely independent of which grammar is picked by the speaker to generate a given sentence.

Second, we estimated the fitness of the null- and overt-subject grammars on the basis of data from pure-state languages. This time the measure did not rely on ending ambiguity. Estimated this way, fitness gives an advantage to the null-subject grammar, which by Yang's Fundamental Theorem is expected to win, contrary to the historical facts. We thus conclude that the best model of null-subject disappearance so far is one that factors in the increase in ending ambiguity without assuming a categorical clause-level dependency between the two phenomena.

Synchronic studies of variation in null-subject expression in Romance languages, to the best of our knowledge, fail to establish verbal ending ambiguity as a relevant factor, and thus leave us with a puzzle as to the nature of Taraldsen's generalization. For instance, Nagy and Heap (1998) reported that whether an ending is ambiguous is not a good predictor of subject expression in Francoprovençal. The same conclusion is reached in Carvalho and Child (2011) based on Spanish material. This agrees with Ranson's (2009) conclusions and our own observations concerning the diachronic French data. Our work thus supplements synchronic variationist studies in that we offer a diachronic model that can capture the relation without postulating a clause-level dependency. This suggests that, in some cases, the study of natural language variation must include the temporal dimension, otherwise some potentially highly relevant factors will remain “invisible.”

It has to be noted that our conclusions do not in principle rule out a clause-level dependency between surface forms. Such an outcome could, for instance, be the result of a competition between Agr- and TP-Grammars whereby, by the end of a variational learning cycle, they end up in complementary distribution with respect to tense/aspect environments. An analysis along these lines would need to be worked out for systems where the only contexts disfavoring subject omission are certain tense/aspect forms that are syncretic with respect to subject person, such as some Northern Italian, Franco-Provençal, and Occitan dialects (Manzini & Savoia, 2005), Hebrew (Shlonsky, 2009), Finnish (Koeneman, 2006), Irish (Speas, 1995), and Russian (Bizzarri, 2015).

This study is part of a more general agenda of using diachronic material for the study of interfaces, that is, formal relations between syntax, morphology, phonology, and semantics/pragmatics. Another prominent group of what seems to be parallel and potentially related changes is the disappearance of nominal case marking and word order changes. Simonenko, Crabbé, and Prévost (2015) showed that the remnants of the case opposition in Medieval French disappear within approximately the same timeframe as the possibility of having an OV order. It remains to be seen in further research if any of the models explored above can be used to explore the nature of the relation between these two changes.

SUPPLEMENTAL MATERIALS

Appendices A, B, and C can be found at: https://doi.org/10.1017/S0954394519000188

Footnotes

Sophie Prévost's affiliation has been corrected and funding sources have been added to the acknowledgments. An addendum detailing this change has also been published (doi:10.1017/S0954394519000267).

This work has been supported by the Labex EFL and Research Foundation Flanders. The authors are very thankful to three anonymous reviewers of Language Variation and Change for extremely helpful comments. We would like to express our deep gratitude to Yves Charles Morin, Henri Kauhanen, George Walkden, Paul Hirschbühler, Philippe Schlenker, and Hedde Zeijlstra for discussions and suggestions. The project has benefited from feedback from the audiences at WCCFL 34, DiGS 2016, XLanS: Triggers of language change in the Language Sciences, the workshop Linguistic Knowledge & Patterns of Variation, the workshop Texts, Tools, and Methods in Digital Classics and Medieval Studies, and seminars at Institut Jean Nicod, Université Diderot Paris 7, and the University of Manchester. The first author acknowledges the support of the Research Foundation Flanders. The work of the third author has been supported by a public grant overseen by the French National Research Agency (ANR) as part of the program “Investissements d'Avenir” (reference: ANR-10-LABX-0083). It contributes to the IdEx Université de Paris - ANR-18-IDEX-0001.

1. In addition to demonstratives, we counted as nominal subjects all subject phrases headed by a noun, both animate and inanimate, as well as nominalized adjectives, numerals, quantifiers, and free relatives, disregarding their syntactic position with respect to the finite verb.

2. There are a handful of other (less frequent) conjunctive adverbs, such as puis, as well as the adversative conjunction mais, that seem to license subject ellipsis in Modern French and that we did not exclude from our dataset, since this would require an exhaustive study of ellipsis-licensing conditions in Medieval French.

3. Here we are setting aside the question of whether subject clitics in Modern French function as subject-agreement markers (see De Cat [2005] for a discussion).

4. Importantly, we are not assuming that we can necessarily observe in existing sources the stage of French where the only grammar in use was the one we call here initial, that is, the null-subject grammar without ending syncretism. This stage was likely left undocumented. This is why, we think, Old French is sometimes called a partial or non-pro-drop language (e.g., Kaiser, 2009; Zimmermann, 2009). In terms of the grammar competition model of language change that we assume, data that only partially conform to the criteria of a given grammatical type are modeled as a mix of outputs of two “pure” grammatical types.

5. In this connection, we can invoke a long tradition, going back to Benveniste and supported by typological observations, of ascribing a universal status to person distinctions in natural languages (e.g., Harley & Ritter, 2002; Tvica, 2017).

6. We assume that -ez and -ons endings, distinct from the rest and each other, are exponents of the feature [plural] in the context of 1st and 2nd person plural subjects.

7. As mentioned above, the stage where only AgrP-Grammar is operative is not attested; we find overt expletives in the earliest French documents, albeit at a very low rate.

8. The ENDING model has the form $P(Y = \text{new} \mid \text{Date} = d) = \frac{e^{\alpha + \beta d}}{1 + e^{\alpha + \beta d}}$, where α is the intercept and β the slope. The intercept is interpreted as an abstract indicator of when the change takes place in time, and the slope as the rate of change. The EXPLETIVE SUBJECT model has the same form, but this time Y represents expletive subject realization instead of verbal syncretism. To illustrate both models, we first fitted them separately to the data.

9. The two models are $P(Y = \text{new} \mid \text{Date} = d, \text{Context} = c) = \frac{e^{\alpha + \alpha_c + \beta d}}{1 + e^{\alpha + \alpha_c + \beta d}}$ and $P(Y = \text{new} \mid \text{Date} = d, \text{Context} = c) = \frac{e^{\alpha + \alpha_c + (\beta + \beta_c) d}}{1 + e^{\alpha + \alpha_c + (\beta + \beta_c) d}}$, where α_c and β_c are context-specific adjustments to the intercept and the slope, respectively.
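The difference between the two models in footnote 9 can be illustrated numerically: when contexts share a slope, the log-odds gap between them is the same at every date (the constant-rate property), whereas a context-specific slope makes the gap grow or shrink over time. The parameter values below are hypothetical and chosen only so the curves cross the 12th to 14th centuries.

```python
import math


def p_new(d, alpha, beta, alpha_c=0.0, beta_c=0.0):
    """Logistic probability of the new variant at date d (form of footnotes 8-9)."""
    z = alpha + alpha_c + (beta + beta_c) * d
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# Hypothetical parameters, for illustration only.
ALPHA, BETA = -30.0, 0.025
dates = [1100, 1150, 1200, 1250, 1300]

# Shared slope: contexts differ only in the intercept (alpha_c = 1.0).
gaps_constant = [logit(p_new(d, ALPHA, BETA, alpha_c=1.0)) - logit(p_new(d, ALPHA, BETA))
                 for d in dates]
# Context-specific slope as well (beta_c = 0.005): the gap changes with the date.
gaps_varying = [logit(p_new(d, ALPHA, BETA, alpha_c=1.0, beta_c=0.005)) - logit(p_new(d, ALPHA, BETA))
                for d in dates]
```

Under the first model every entry of `gaps_constant` equals α_c; under the second, the gap drifts by β_c per unit of time, which is what a likelihood comparison of the two fitted models is meant to detect.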

10. One way to interpret these findings is to assume that first there is a fall of the ending-final stops after -e- and a compensatory /e/-insertion following root-final stops, which appears as an innovative -e ending. Second, there is a fall of the ending-final stops after -a-, which appears as an innovative -a ending. Third, there is a fall of the ending-final stops after -i/-u (Group II 3rd person preterite) and /s/-insertion after -i/-u (Group II 1st person present and preterite), which results in innovative zero and -s endings respectively. The latter process can arguably be considered as compensatory in relation to the former in order to keep the 1/3 person distinction. Finally, there is an innovative /s/-insertion after diphthongs (Group I and II first person imperfect and future conditional).

11. We fitted the logistic regression model $P(\text{Pron Sbj} = \text{yes} \mid \text{Date} = d) = \frac{e^{\alpha + \beta d}}{1 + e^{\alpha + \beta d}}$ to each of the seven datasets.

12. Overt pronominal subjects spread at very similar rates with all endings except -t with Group I verbs. A plausible explanation is that -t with Group I verbs virtually disappears after 1200, so what we see past that date is essentially noise that skews the model.

13. We used the same approach as for comparing the slopes of the expletive subject and new ending emergence models; here, p = 0.8.
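Comparing rates of change amounts to comparing logistic slopes. The comparison procedure is not specified in this footnote. One standard option is a Wald z-test on the difference between two independently estimated slopes, given their standard errors; this is an assumption and not necessarily the authors' method. The input values below are hypothetical. A large p-value means that the data give no evidence that the slopes differ, which is not the same as showing that they are equal.

```python
import math

def slope_difference_test(beta_1, se_1, beta_2, se_2):
    """Wald z-test of H0: beta_1 == beta_2 for two independently fitted slopes.

    Returns the z statistic and the two-sided p-value under a standard normal.
    """
    z = (beta_1 - beta_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    # Two-sided standard-normal tail: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Hypothetical slope estimates and standard errors, for illustration only.
z, p = slope_difference_test(0.012, 0.002, 0.010, 0.003)
print(round(z, 3), round(p, 3))
```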

14. According to Chocholoušová (2009), dummy subjects in sentence-initial position in English texts occur at a rate of 0.25% of words (i.e., about 25 per 10,000 words). The author does not give a frequency in terms of sentences, but if we roughly estimate an average English sentence at 20 words, this gives a frequency of about 5%. That is, 0.05 can be used as a (very rough) approximation of the probability of unambiguous output from the TP-Grammar.
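Spelled out, the estimate treats the reported figure as a per-word rate of 0.25% and assumes that a sentence contains at most one sentence-initial dummy subject, so that the expected number per sentence equals the probability that a sentence contains one. The 20-word sentence length is the rough assumption stated in the footnote, not a corpus measurement:

$$P(\text{sentence-initial dummy subject}) \approx 20 \times \frac{25}{10{,}000} = 20 \times 0.0025 = 0.05$$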

References

Adams, Marianne. (1987). From Old French to the theory of pro-drop. Natural Language & Linguistic Theory 5:1–32.

Bates, Elizabeth. (1976). Language and context: The acquisition of pragmatics, volume 13. New York: Academic Press.

Bettens, Olivier. (2015). Chantez-vous français ? Remarques curieuses sur le français chanté du Moyen Âge à la période baroque. URL http://virga.org/cvf/.

Bizzarri, Camilla. (2015). Russian as a partial pro-drop language: Data and analysis from a new study. Annali di Ca' Foscari. Serie occidentale 49.

Buridant, Claude. (2000). Nouvelle grammaire de l'ancien français. Paris: Sedes.

Carvalho, Ana M., & Child, Michael. (2011). Subject pronoun expression in a variety of Spanish in contact with Portuguese. In Selected Proceedings of the 5th Workshop on Spanish Sociolinguistics, 14–25.

Chocholoušová, Bohumila. (2009). Dummy subjects in English, Norwegian and German. A parallel corpus study. Doctoral Dissertation, Masarykova univerzita.

Cooper, Robin. (1983). Quantification and syntactic theory. Dordrecht: Reidel.

Danckaert, Lieven. (2017). The loss of Latin OV: Steps towards an analysis. In Elements of comparative syntax: Theory and description 127:401–44. Berlin: Walter de Gruyter GmbH & Co KG.

De Cat, Cécile. (2005). French subject clitics are not agreement markers. Lingua 115:1195–219.

De Jong, Thera. (2006). La prononciation des consonnes dans le français de Paris aux 13ème et 14ème siècles. Utrecht: LOT/Netherlands Graduate School of Linguistics.

Dees, Anthonij, Meilink, Steintje, van Reenen-Stein, Karin, & van Reenen, Pieter. (1980). Un cas d'analogie: l'introduction de -e à la première personne du singulier de l'indicatif présent des verbes en -er en ancien français. Rapport/Het Franse Boek 50:105–10.

Ewert, Alfred. (1943). The French Language. London: Faber & Faber.

Fontaine, Carmen. (1985). Application de méthodes quantitatives en diachronie: L'inversion du sujet en français. Master's thesis. Université du Québec à Montréal.

Fouché, Pierre. (1931). Le Verbe français: étude morphologique. Paris: Société d'édition “Les Belles Lettres.”

Foulet, Lucien. (1928). Petite syntaxe de l'ancien français. Paris: Champion, 3rd revised edition. Reprinted 1982.

Foulet, Lucien. (1935). L'extension de la forme oblique du pronom personnel en ancien français. Romania 61–62, 257–315.

Franzén, Torsten. (1939). Étude sur la syntaxe des pronoms personnels sujets en ancien français. Doctoral Dissertation. Uppsala.

Fruehwald, Josef, Gress-Wright, Jonathan, & Wallenberg, Joel. (2009). Phonological rule change: The constant rate effect. In Kan, S., Moore-Cantwell, C., & Staubs, R. (Eds.), Proceedings of the 40th Annual Meeting of the North East Linguistic Society, volume 1, 219–30. GLSA Publications.

Goyette, Stéphane. (1993). Le système verbal de Jacques Peletier du Mans, XVIe siècle. Master's thesis. Université de Montréal.

Harley, Heidi, & Ritter, Elizabeth. (2002). Person and number in pronouns: A feature-geometric analysis. Language 78:482–526.

Haspelmath, Martin. (1999). Optimality and diachronic adaptation. Zeitschrift für Sprachwissenschaft 18:180–205.

Heim, Irene. (2008). Features on bound pronouns. In Harbour, D., Adger, D., & Bejar, S. (Eds.), Phi Theory: Phi-features across Modules and Interfaces, 35–52. Oxford: Oxford University Press.

Heim, Irene, & Kratzer, Angelika. (1998). Semantics in Generative Grammar. Oxford: Blackwell.

Heycock, Caroline, & Wallenberg, Joel. (2013). How variational acquisition drives syntactic change. The Journal of Comparative Germanic Linguistics 16:127–57.

Hirschbühler, Paul. (1992). L'omission du sujet dans les subordonnées V1: les CNN de Vigneulles et les CNN anonymes. Travaux de linguistique 24:25–46.

Hirschbühler, Paul, & Junker, Marie-Odile. (1988). Remarques sur les sujets nuls en subordonnées en ancien et en moyen français. Revue québécoise de linguistique théorique et appliquée 7:63–84.

Jaeggli, Osvaldo, & Safir, Kenneth J. (1989). The null subject parameter and parametric theory. In Jaeggli, O., & Safir, K. J. (Eds.), The null subject parameter, 1–44. Dordrecht: Springer.

Kaiser, Georg A. (2009). Losing the null subject. A contrastive study of (Brazilian) Portuguese and (Medieval) French. In Proceedings of the Workshop “Null-subjects, expletives, and locatives in Romance,” 131–56.

Kauhanen, Henri, & Walkden, George. (2017). Deriving the Constant Rate Effect. Natural Language and Linguistic Theory, 1–39.

Koeneman, Olaf. (2006). Deriving the difference between full and partial pro-drop. In Ackema, P., Brandt, P., Schoorlemmer, M., & Weerman, F. (Eds.), Arguments and agreement, 76–100. Oxford: Oxford University Press.

Kratzer, Angelika. (2009). Making a pronoun: Fake indexicals as windows into the properties of pronouns. Linguistic Inquiry 40:187–237.

Kroch, Anthony. (1989). Reflexes of grammar in patterns of language change. Language Variation and Change 1:199–244.

Manzini, Maria Rita, & Savoia, Leonardo Maria. (2005). I dialetti italiani e romanci: morfosintassi generativa, volume 1. Edizioni dell'Orso.

Marchello-Nizia, Christiane. (1992). Histoire de la langue française aux XIVe et XVe siècles. Paris: Dunod.

Morin, Yves Charles. (2001). La troncation des radicaux verbaux en français depuis le Moyen Age. Études diachroniques. Recherches linguistiques de Vincennes 30:63–85.

Nagy, Naomi, & Heap, David. (1998). Franco-provençal null subjects and constraint interaction. CLS 34:151–66.

Nagy, Naomi G., Aghdasi, Nina, Denis, Derek, & Motut, Alexandra. (2011). Null subjects in heritage languages: Contact effects in a cross-linguistic context. University of Pennsylvania Working Papers in Linguistics 17(2):134–44.

Narendra, Kumpati S., & Thathachar, Mandayam A.L. (1989). Learning Automata: An Introduction. Englewood Cliffs, NJ: Prentice-Hall.

Otheguy, Ricardo, Zentella, Ana Celia, & Livert, David. (2007). Language and dialect contact in Spanish in New York: Toward the formation of a speech community. Language 83:770–802.

Paolillo, John C. (2011). Independence claims in linguistics. Language Variation and Change 23:257–74.

Penn Supplement to MCVF. (2010). Penn Supplement to the MCVF Corpus by Anthony Kroch & Beatrice Santorini.

Pintzuk, Susan. (1995). Variation and change in Old English clause structure. Language Variation and Change 7:229–60.

Prévost, Sophie. (2018). Increase of Pronominal Subjects in Old French: Evidence for a Starting-point in Late Latin. In Carlier, A. & Guillot, C. (Eds.), Latin tardif-français ancien: continuités et ruptures, 169–98. Coll. Beihefte zur Zeitschrift für romanische Philologie. Berlin: De Gruyter.

Ranson, Diana L. (2009). Variable subject expression in Old and Middle French prose texts: The role of verbal ambiguity. Romance Quarterly 56:33–45.

Rizzi, Luigi. (1986). Null objects in Italian and the Theory of pro. Linguistic Inquiry 17:501–57.

Roberts, Ian. (1993). Verbs and Diachronic Syntax: A Comparative History of English and French. Dordrecht: Kluwer.

Roberts, Ian. (2014). Taraldsen's Generalization and Language Change: Two Ways to Lose Null Subjects. In Svenonius, P. (Ed.), Functional Structure from Top to Toe: The Cartography of Syntactic Structures, 9, 115–48. Oxford: Oxford University Press.

Santorini, Beatrice. (1993). The rate of phrase structure change in the history of Yiddish. Language Variation and Change 5:257–83.

Sauerland, Uli. (2008). On the semantic markedness of phi-features. In Harbour, D., Adger, D., & Bejar, S. (Eds.), Phi Theory: Phi-features across Modules and Interfaces, 527–82. Oxford: Oxford University Press.

Schøsler, Lene. (2002). La variation linguistique: le cas de l'expression du sujet. In Sampson, R. & Ayres-Bennett, W. (Eds.), Interpreting the History of French, A Festschrift for Peter Rickard on the occasion of his eightieth birthday, 187–208. New York: Rodopi.

Shlonsky, Ur. (2009). Hebrew as a partial null-subject language. Studia Linguistica 63:133–57.

Simonenko, Alexandra, Crabbé, Benoit, & Prévost, Sophie. (2015). Morphological triggers of syntactic changes: Treebank-based information theoretic approach. In Dickinson, M., Hinrichs, E., Patejuk, A., & Przepiórkowski, A. (Eds.), Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), 194–205. Warsaw: Institute of Computer Science, Polish Academy of Sciences.

Speas, Margaret. (1995). Economy, agreement and the representation of null arguments. Ms. University of Massachusetts, Amherst.

Sprouse, Rex, & Vance, Barbara. (1999). An explanation for the decline of null pronouns in certain Germanic and Romance dialects. In DeGraff, M. (Ed.), Language creation and language change: creolization, diachrony and development, 257–84. Cambridge, Mass.: MIT Press.

Taraldsen, Tarald. (1980). On the NIC, vacuous application and the that-trace filter. Indiana University Linguistics Club.

Tvica, Seid. (2017). On minimal person and number features. Unpublished manuscript. https://www.academia.edu/23278325/On_the_universality_of_person_and_number.

van Reenen, Pieter, & Schøsler, Lene. (1987). Le problème de la prolifération des explications. Vrije Universiteit working papers in linguistics 27.

Vance, Barbara. (1997). Syntactic change in medieval French. Studies in Natural Language and Linguistic Theory. Dordrecht/Boston/London: Kluwer.

Vennemann, Theo. (1975). An explanation of drift. In Li, Ch. N. (Ed.), Word Order and Word Order Change, 269–305. Austin: University of Texas Press.

Yang, Charles. (2000). Internal and external forces in language change. Language Variation and Change 12:231–50.

Yang, Charles. (2002). Knowledge and learning in natural language. Oxford: Oxford University Press.

Yang, Charles. (2010). Three factors in language variation. Lingua 120:1160–77.

Zimmermann, Michael. (2009). On the evolution of expletive subject pronouns in Old French. In Remberger, E.-M. & Kaiser, G.A. (Eds.), Proceedings of the workshop “null-subjects, expletives, and locatives in romance.” Arbeitspapier 123, 63–92. Fachbereich Sprachwissenschaft, Universität Konstanz.

Zimmermann, Michael. (2014). Expletive and referential subject pronouns in Medieval French, volume 556. Berlin, Boston: Walter de Gruyter GmbH.

Figure 1. Overt pronominal subjects in main and subordinate clauses (n = 76150).

Note: Absolute numbers of null and overt pronominal subjects for each text are given in Appendix B. We use frequencies to estimate probabilities.

Figure 2. Innovative -e (changes A, B, C) and -a endings (changes D, E).

Note: P(END = NEW | DATE = D, GR = I, P = 1) stands for the estimated probability that a Group I verb has a new ending (i.e., -e) in the context of a 1st person singular subject, and so on.

Figure 3. Innovative zero (change F) and -s ending (changes G, H, I).

Figure 4. Spread of new endings and overt pronominal subjects.

Figure 5. Logistic regression models of the emergence of the new endings.

Figure 6. Rise of -e with Group I verbs in the context of the 1st person singular subjects.

Figure 7. Pronominal subject expression with old and new endings.

Figure 8. Change reversal for Group II verbs in preterite with 3rd person subject.

Figure 9. Parsing probability of the TP-Grammar.

Figure 10. Parsing probability of the TP-Grammar (non-conservative*).

Note: *Assuming that, in addition to -e, -s, zero following i/u, and -eies, the endings -t, -eiet, -it, -et, and -at were also ambiguous.

Simonenko et al. supplementary material 1 (file, 37.4 KB)

Simonenko et al. supplementary material 2 (file, 76.5 KB)

Simonenko et al. supplementary material 3 (file, 292.2 KB)

Simonenko et al. supplementary material 4 (file, 12.5 KB)

An addendum has been issued for this article:

Agreement syncretization and the loss of null subjects: quantificational models for Medieval French (Addendum)

Alexandra Simonenko, Benoit Crabbé and Sophie Prévost

Language Variation and Change, Volume 31, Issue 3
