Discourse motivations for pronominal and zero objects across registers in Vera'a

Published online by Cambridge University Press:  22 May 2018

Stefan Schnell
Affiliation:
The University of Melbourne and Centre of Excellence for the Dynamics of Language
Danielle Barth
Affiliation:
Australian National University and Centre of Excellence for the Dynamics of Language

Abstract

The choice between pronominal and zero form for objects in the Oceanic language Vera'a is investigated quantitatively in texts from two registers with discourse topics of three different ontological class memberships. Discourse topicality is found to predict best the choice between pronoun and zero, outranking the factors of ontological class membership, antecedent form, and antecedent function. Contrary to current models of referent tracking, antecedent distance does not show any effect at all. It is concluded that (a) discourse structure and activation are not universally the most significant factors in referential choice and (b) ontological class and discourse topicality can be teased apart through appropriate text sampling, and it is the latter that is most significant. This bears important implications for the grammaticalization of object agreement and the typology of differential object marking.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2018 

Objects in Vera'a can take forms of the following three types: a full noun phrase (1a), a pronoun (1b), or zero (1c). Vera'a syntax is strictly configurational, and objects are marked solely by their position immediately following the verbal predicate. Like objects, subjects can also be realized as zero, as is the case in all examples here (see footnote 1):

  1. (1)

Following some of the literature on referential choice (e.g., Du Bois, 1987), we classify noun phrases (NPs) as lexical and pronouns and zeroes as nonlexical. Despite some remaining uncertainties and open questions, we assume here that the choice between lexical and nonlexical expression of objects, like that of any other grammatical relation, can largely be accounted for in terms of information packaging, namely referent activation (or “accessibility” in Ariel's work) (Ariel, 1990, 2008; Chafe, 1976; Givón, 1983a, 1998; Prince, 1992, 1981) and information structure (Krifka, 2008; Lambrecht, 1994). Roughly, lexical expressions are used where the speaker does not assume that the addressee can clearly identify an intended participant in a particular role, and nonlexical expressions are used in the reverse situation, where the addressee is assumed to be able to identify the participant in question unambiguously.

The issue we are concerned with in the present paper is the choice between the two types of nonlexical expression for objects. Some scholars acknowledge the essential functional closeness of the two forms, for instance Givón (1998:57), but nonetheless claim that the choice is primarily determined by considerations of information packaging and activation. Thus, Givón (1998:57) postulated a slightly lower degree of “thematic continuity” as triggering the use of a(n unstressed) pronoun, and for Ariel (1990:58–61; 2000:202–210) the choice follows from the same principles of “accessibility” that determine the choice between nonlexical and lexical form. The functional closeness of zero and pronoun is, on the other hand, reflected in the cross-linguistic classification of languages with and without radical pro-drop (see Neeleman and Szendröi [2007] for a relatively recent account; and Huang [2000:50–90] for critical discussion): a language such as English generally requires overt expression of objects, confining zero objects to a restricted set of contexts (see Ariel [1990:59–60] for examples). In Japanese, on the other hand, zero anaphor is the default choice for nonlexical objects, and pronominal objects are confined to contexts where specific aspects of the social relations between interlocutors come into play. Thus, Clancy (1980) found no pronominal objects at all in her Japanese narrative data, a fact she explained by the absence of relevant social considerations that would trigger the use of a pronominal form in this text type. In languages such as English, zero reference is seen as highly constrained by syntactic constellations of clause combining, whereas in languages such as Japanese and Mandarin, the choice has been attributed to complex pragmatic mechanisms of interpretation (Huang [1994] on Mandarin).

For other languages, the proportion of pronominal and zero objects has been found to be more balanced, essentially leading to greater freedom in the realization of nonlexical objects and the corresponding question about what factors determine the choice. The choice between pronoun and zero in objects has received some treatment in variationist studies. For instance, Schwenter (2006, 2014) reported considerable variability between pronominal and zero objects in Brazilian Portuguese and numerous South American varieties of Spanish. He attributed the variation to the semantic saliency features of animacy and specificity. Meyerhoff (2002) reported on constraints on null objects in Bislama: her findings confirm discourse-structural considerations such as activation, as well as priming effects by form and function of an object's antecedent, but she also stated that these effects are not as pronounced in Bislama as they have been found to be for other languages (Meyerhoff, 2002:338).

Language typology has treated the variable encoding of objects under the label of differential object marking (DOM; Aissen [2003] among many others) or conditioned agreement (Siewierska, 2004:148–162). Some corpus studies of lesser-studied languages also deal with the variable expression of objects as pronoun or zero according to semantic parameters such as animacy or specificity typically associated with DOM: for instance, Genetti and Crain (2003) found that in narrative discourse from Nepali, pronominal objects are restricted to human referents. Schwenter (2014) linked his finding that animacy and specificity largely predict the variation between pronominal and zero objects to patterns of case-marking DOM in Spanish: animate specific objects tend to be expressed as pronouns and receive special case marking; animate nonspecific as well as inanimate ones tend to be zero and receive no special case marking. This suggests a deeper relationship between alternations in grammatical relation systems such as DOM and the variable treatment of grammatical relations in discourse.

In the generative-oriented literature, the discussion focuses on the correct classification of zero objects as one type of “anaphora”: ‘Ā-bound variables’ (e.g., Huang [1984] for Chinese) versus pro (e.g., Rizzi [1986] for Italian; Chung [1984] for Chamorro). For the purposes of the current paper, we exclude these theoretical considerations from our discussion (see Huang [2000:79–88] for a succinct critical overview).

Unlike English or Japanese, and more similar to Portuguese, Spanish, Bislama, or Nepali, Vera'a does not show a general preference for either pronominal or zero nonlexical objects. Vera'a discourse is characterized by a roughly even proportion of these two forms, and examples like (1b) and (1c) are equally common in our text data. The aim of the present paper is thus to single out the relevant factors that drive the choice between pronominal and zero form for objects. To this end, we investigate a corpus of richly annotated texts (25,646 words) that was collected by one of the authors over 10 years of documentary work on the Vera'a language, and that is composed not only of oral narratives, but also of oral descriptions of fish and plant species. This minimal variation in register within our text sample has some bearing on our findings, since it helps tease apart the factors of animacy (in particular humanness) and what we call discourse topicality here (see the section on alternations in object realization for explanation), two features that converge in narrative texts. This leads us to the conclusion that the best predictor of pronominal forms in objects in Vera'a is in fact discourse topicality rather than the ontological salience of animacy as such. With our study of Vera'a, we thus also hope to contribute to a better understanding of referential choice in objects in Oceanic languages and beyond.

The paper unfolds as follows: in the next section, we provide a short overview of the structure of simple clauses, grammatical relations, and forms of reference in Vera'a. We then summarize those factors that have commonly been associated with the alternation under investigation, namely those factors pertaining to information packaging and those pertaining to semantic saliency features, most notably animacy. This is followed by an outline of our corpus data, methodological approach, and coding decisions for specific variables and their proportions in the data. We also discuss issues of potential collinearity and interactions. Next, we present findings from our analysis and discuss them. In the last section, we summarize a number of conclusions from our findings that are relevant not only for cross-linguistic findings on the topic, but that also have some bearing on hypotheses of grammaticalization of object agreement.

VERA'A BASIC CLAUSE STRUCTURE AND VARIABLE REALIZATION OF OBJECTS

Vera'a is an Oceanic language from North Vanuatu, spoken by a growing population of approximately 500 on Vanua Lava (Banks Islands) (Schnell, 2011), in close proximity to the language Vurës (Malau, 2016). The language is essentially configurational, and grammatical relations of subject and object are encoded by means of ordering relative to the verb complex (VC; rendered in boldface in this section), with subjects preceding the VC and objects occupying a postverbal position:

  1. (2)

  2. (3)

The VC functions as the predicate of a verbal clause. With the exception of imperative and some nonfinite clause-chaining constructions (see footnote 3), it consists minimally of a marker for tense-aspect-mood-polarity (TAMP) and one verb. As is common for Oceanic languages, multiple verbs can serialize within the same VC, as in (2a), where ma-ma' ‘be dead’ is a serialized verb. The argument structure of a verb serialization is composed of that of the individual verbs and bears a combined transitivity value as well as object relation. In addition, different types of adverb can follow a single verb or a series of verbs. The right periphery of the VC is optionally occupied by two types of directional, namely directional adverbs, like sar in (4a), and directional particles, like ma in (4b):

  1. (4)

Where both types of directional co-occur, the directional particle follows the adverb; directional particles constitute the final element of a VC. As for the realization of objects, the examples in (4) show that full lexical NPs follow directional adverbs and particles and are thus outside the VC. Object pronouns, on the other hand, occur within the VC, where they precede any directional adverb, like sar in (5a), or directional particle, like ma in (5b):

  1. (5)

The paradigm of free personal pronouns is given in Table 1. They distinguish categories of person, number, and clusivity. Some pronominal forms show a general tendency to drop their final vowels; this is a general phonological process found with different lexical elements, not just pronouns. All forms listed in Table 1 are found in different syntactic contexts. Thus, object pronouns are not systematically reduced in terms of their segmental phonological shape, and Vera'a does not have a specialized set of pronominal forms for objects. However, in their VC-internal position, they are by default not accessible to prosodic stress, which falls on the last constituent word of a sentence (‘house’ and the demonstrative, respectively, in (5)). The only instances where object pronouns are stressed are those where they are not followed by anything else and thus happen to occupy a sentence-final position, as in (6):

  1. (6)

More research into the prosodic structure of Vera'a is required to confirm these stress-related observations. Yet, we take examples such as (5) and (6) as essentially suggesting that we are dealing here with by-default unstressed pronouns, much in the sense of Givón (1983a:17) or Ariel (1990:73). Finally, note that object pronouns are mutually exclusive with a coreferent object NP, that is, they never “double” an NP and thus do not constitute verb-object agreement (Corbett, 2003; Haspelmath, 2013).

Table 1. Paradigm of Vera'a personal pronouns

As shown in (1c), objects may be left implicit. Example (7) shows two instances of zero anaphor objects:

  1. (7)

As zero objects, we only count those instances of absence of an overt object expression where an intended referent is recoverable from discourse context. We do not consider semantic entailment a sufficient criterion to assume an unexpressed object argument (Dowty, 1982; see footnote 4). Generally, zero anaphors in Vera'a are pragmatically rather than syntactically conditioned (Huang, 2000). For zero objects, this can be seen from instances of split antecedence, where a zero anaphor relates back to two antecedent expressions, as shown in example (8):

  1. (8)

Vera'a also allows left-dislocations correlating with objects, and in this case, we either find a resumptive pronoun in its regular position, as in (9a), or a zero anaphor, as in (9b):

  1. (9)

Whether a left-dislocated phrase correlates with the object or some other (or no) grammatical relation within the clause is entirely a matter of semantic interpretation and pragmatic inference. Thus, in example (10), an interpretation like ‘But birds, they (human beings or some animals) really like (them, that is, the birds)’ would be entirely consistent (see footnote 5). Only contextual information triggers the correct interpretation of the left-dislocated NP as coreferential with the subject pronoun:

  1. (10)

Note that the same is essentially true for pronominal in situ objects that share the same person and number values with dislocated phrases, as in (9a) and (11): here, the left-dislocated NP can be understood as coreferential with either subject or object, and only context and encyclopedic clues guide the intended interpretation:

  1. (11)

We conclude that left-dislocated NPs in Vera'a are outside the clausal core and do not themselves enter grammatical relations of subject or object. Where coreferent with subjects, they are always resumed by a subject pronoun, but where they are coreferent with the object, this may be pronominal or zero, as in any other sentence without left-dislocation. The fact that this alternation is preserved under left-dislocation entails that these constructions cannot be regarded as a sufficient determining factor for the use of an object pronoun, as may be suggested by Givón's (1976) seminal paper on the development of subject and object agreement. Hence, a broader approach considering a range of possible factors is required to tackle the alternation at hand.

ALTERNATIONS IN OBJECT REALIZATION: RELEVANT FACTORS

In the literature on pronominal versus zero anaphors, there are essentially two approaches: the generativist approach in terms of parametric constraints on (radical) pro-drop (Neeleman & Szendröi, 2007) and the usage-based approach taken for instance by Bickel (2003), who essentially replaces the parametric approach with a corpus-based one. Both approaches are basically holistic, classifying languages according to specific features (see footnote 6). They are not concerned with the factors that drive the formal choice in discourse from individual languages (cf. Neeleman & Szendröi, 2007:673–674).

The factors we considered in our study can be grouped into two categories: (a) Factors pertaining to discourse structure, that is mainly features of the anaphoric relation between object anaphor and its antecedent, but also the global thematic status of object referents, that is to say whether a given text is primarily about this referent. (b) Factors pertaining to inherent semantic properties of objects, that is features that are independent of the specific relationship to discourse context, like person and number, and the ontological class of the discourse entity in question.

The first set of factors comprises antecedent distance, function, and form, as well as discourse topicality. Of these, antecedent distance has probably been the most prominent (see, for instance, studies relating to Givón's concept of look-back: Givón [1983b] and contributions therein), and it is also central to Ariel's (1988, 1990, 2000) Accessibility Theory (AT). AT and similar theoretical approaches predict that the use of a more explicit form correlates with greater antecedent distance. Although AT focuses on the choice between lexical and nonlexical expressions (see Ariel, 1990:58, 2008:46–47), the choice between “high-accessibility markers” of different types of pronoun and zero has also been attributed to accessibility (see Ariel [1990:58–68, 76–79] for detailed discussion). In our case, pronominal objects would, according to AT, be predicted to have antecedents at greater distance than zero objects. Note that since both pronoun and zero anaphor are high-accessibility markers, both forms should show relatively low anaphoric distances, and differences in distance between the two types of expression are expected to be quite minimal. Moreover, since Vera'a pronouns do not mark gender, they are not particularly amenable to distinguishing competing referents. A switch of syntactic function (and related participant role) also diminishes the accessibility of an antecedent, and more explicit forms tend to be used in switch-function contexts; in our case, where an object's antecedent is not itself an object, the object should tend to be realized as a pronoun rather than zero, and vice versa. Finally, the form of the antecedent is relevant in connection with so-called priming effects, that is, the tendency for speakers to reuse the same kinds of structures in series. Research on subject expression shows that the form of an earlier subject may prime a later subject to have the same form (Cameron & Flores-Ferrán, 2004; Torres Cacoullos & Travis, 2013; Travis, 2007). Other studies have shown that a speaker's use of other kinds of variants can prime the further use of that variant over another, such as agreement marking or lack thereof (Poplack, 1980; Scherre & Naro, 1991) or contraction of verbs and auxiliaries (Barth & Kapatsinski, 2017). In our case, this means that the form of the antecedent of an object should trigger the same form in the anaphoric object; thus, zero objects should tend to have zero object antecedents, and pronominal objects pronominal object antecedents.

Another group of factors considered here comprises the more discourse-stable semantic feature of ontological class membership, most notably animacy, already found to be relevant by Schwenter (2006, 2014) for the same alternation in Spanish and Portuguese, and by Genetti and Crain (2003) for Nepali. We distinguish human, animate, and inanimate referents. Drawing on this previous research, we expect pronominal objects to be more likely with human referents than nonhuman ones and more likely with animate than with inanimate ones.

A factor featuring prominently in the discussion of DOM and object agreement is that of (sentence) topicality (Dalrymple & Nikolaeva, 2011; Krifka, 2008). This is related to Givón's (1976) seminal hypothesis that argument agreement arises from so-called topic shift constructions involving the left-dislocation of the object NP (see Givón, 1976:156–160), which in turn would trigger the use of object pronouns that resume the referent in situ. On this view, object pronouns should be particularly frequent with topicalized, left-dislocated NPs, since these objects must be topics.

Finally, we include what we call here discourse topicality. Discourse topic is understood here in the sense of Chafe (1994:120–121): it is that discourse entity that an entire text is about and that makes the text interesting. As such it is different from both sentence topics (Krifka, 2008) and from topicality in the Givónian tradition, which is essentially a measurement of referential continuity of individual participants across sentences in a given (stretch of) discourse (Givón, 1983a). What we are interested in here is the communicative purpose of a text: with regard to what entities is a text relevant? As such, discourse topic is part of what Biber and Conrad (2009:39–47) termed the “situational characteristics” of a text with specific register and genre properties (see footnote 7). The Vera'a corpus for this study consists of three groups of texts (narratives, fish descriptions, plant descriptions) that differ with regard to their discourse topics: for narratives, we assume that all human or human-like referents (including anthropomorphized animals in fables) are the discourse topics, since these are the entities that will be of most concern to the audience, and thus constitute part of a narrative's tellability. For the two types of descriptive texts, the discourse topic is the fish or plant species being described, excluding other fish and plants, since the communicative purpose of producing the text is to provide information about these species. We will see in the results section that discourse topicality is the best predictor for the choice between pronominal and zero objects, with discourse topics favoring expression by pronouns. Although discourse topicality and ontological class/animacy are independent of each other, they tend to converge in human referents. Thus, in order to tease apart these two factors, we require a sample of texts with discourse topics of different ontological classes and an evaluative methodology that can directly compare animacy and discourse topicality.

METHODOLOGY

Corpus data

Data for the present study comes from a total of 36 texts in the Vera'a corpus, collected by the first author from 2007 to 2013 on Vanua Lava, North Vanuatu. The larger part of the data comprises 10 traditional narratives. These are supplemented by descriptive texts, namely 9 descriptions of fish species and 17 descriptions of plant species, which together make up a much smaller part of the data, an overview of which is provided in Table 2. The three types of text differ according to register properties (in the sense of Biber and Conrad [2009:39–47]), including different discourse topics, as we have outlined. Narratives represent a register in itself that has the broad communicative function of relating adventures of human or anthropomorphized protagonists to the addressee who may feel more or less sympathetic with their characters; there may occasionally be a moral to a story. Descriptions are of a different register, bearing the primary communicative function of presenting factual information about the fish or plant species under discussion. Thus, the three types of text have three different discourse topics in terms of ontological class membership: narratives have human beings as their topics, whereas the descriptions have fish and plant species as their topics, respectively.

Table 2. Overview of Vera'a corpus data

Variables considered for statistical analysis

Based on the literature and knowledge of the Vera'a data, we have identified seven variables as having a potential impact on pronominal versus zero object expression: (a) animacy, (b) discourse topicality, (c) form of the antecedent, (d) function of the antecedent, (e) distance of the antecedent, (f) discourse interruption, and (g) speaker.

Animacy

Referents are coded as either animate or inanimate. Humans, spirits, anthropomorphized animals capable of speech, thought, and planning, and ordinary animal referents are coded as animate; plants, plant parts, body parts, and other inanimate objects are coded as inanimate. Animacy may interact with the following predictor, discourse topicality. We are interested in the relative strength of these predictors and in determining when each, if any, matters.

Discourse topicality of the object referent

Referents are coded as either topical or nontopical, in the sense of Chafe (1994), as we have outlined. In narrative texts, human beings, spirits, and anthropomorphized animals are considered to be the discourse topics and are thus coded as topical. In the descriptions of fish and plants, the fish or plant species under consideration in the specific text is considered topical. Referents considered to be nontopical are nonhuman animate beings in narratives and descriptions (for example, a fish or pig in a narrative, or a fish species other than the one described in a descriptive text), inanimate objects in a narrative or description (for example, a bow or a basket, but not the plant species under discussion in a plant text), and body parts and plant parts.
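The coding rule for discourse topicality can be sketched as a small function. This is only an illustration in Python, not the annotation procedure actually used for the corpus: the function name `code_topicality`, its parameters, and the category labels are hypothetical, and the treatment of human referents in descriptive texts (coded as nontopical here) is an assumption, since the text does not address that case.

```python
def code_topicality(text_type, referent_kind, is_described_species=False):
    """Return 'topical' or 'nontopical' for an object referent.

    All labels are hypothetical illustrations:
    text_type: 'narrative' or 'description'.
    referent_kind: e.g. 'human', 'spirit', 'anthropomorphized_animal',
        'animal', 'fish', 'plant', 'inanimate', 'body_part', 'plant_part'.
    is_described_species: True only if the referent is the fish or plant
        species that the descriptive text is about.
    """
    human_like = {"human", "spirit", "anthropomorphized_animal"}
    if text_type == "narrative":
        # Human and human-like protagonists are the discourse topics.
        return "topical" if referent_kind in human_like else "nontopical"
    if text_type == "description":
        # Only the species under description counts as the discourse topic.
        # Assumption: everything else, including humans, is nontopical.
        if referent_kind in {"fish", "plant"} and is_described_species:
            return "topical"
        return "nontopical"
    raise ValueError(f"unknown text type: {text_type!r}")
```

For instance, `code_topicality("description", "fish", is_described_species=True)` yields `'topical'`, whereas a different fish species mentioned in the same text would be coded `'nontopical'`.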

Table 3 shows the distribution of tokens by form, animacy, and topicality. We see in the Total column that animate objects are most often expressed with pronouns (76%, 173 of 227) and inanimate objects are overwhelmingly expressed with zeros (92%, 205 of 223). In the Total rows, we see that topical objects are often pronouns (78%, 186 of 240) and nontopical objects are overwhelmingly zero (98%, 205 of 210). We also see that animate objects are overwhelmingly topical (95%, 215 of 227), while inanimate ones tend to be nontopical (89%, 198 of 223).
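The percentages quoted for Table 3 can be checked directly from the counts given in the text. The short Python sketch below does only that; the dictionary labels and the helper name `percent` are introduced here for illustration.

```python
import math

# Marginal counts quoted in the text for Table 3 (n = 450 object tokens).
counts = {
    "animate objects realized as pronouns": (173, 227),
    "inanimate objects realized as zero": (205, 223),
    "topical objects realized as pronouns": (186, 240),
    "nontopical objects realized as zero": (205, 210),
    "animate objects that are topical": (215, 227),
    "inanimate objects that are nontopical": (198, 223),
}

def percent(part, whole):
    """Share of `part` in `whole` as a whole percent, rounding halves up."""
    return math.floor(100 * part / whole + 0.5)

percentages = {label: percent(k, n) for label, (k, n) in counts.items()}
for label, value in percentages.items():
    print(f"{label}: {value}%")

# Animate + inanimate and topical + nontopical should each total 450.
assert 227 + 223 == 450
assert 240 + 210 == 450
```

The two final checks confirm that the animacy totals and the topicality totals each sum to the 450 tokens reported for the table.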

Table 3. Object form by animacy and discourse topicality (n = 450)

Form of the antecedent

The antecedent may have been expressed as a lexical NP, a pronoun, or zero. An issue here is that, for antecedents that are zeros or pronouns, the form of the antecedent is itself conditioned by other factors. Our analysis is designed to detect the strength of this predictor, given the other predictors. Table 4 shows the distribution of tokens by their form and the form of their antecedent.

Table 4. Object form by form of the antecedent (n = 450)

Note: Percentages appear in parentheses.

Function of the antecedent

The grammatical function of the antecedent was coded as subject (of either a verbal or a nonverbal clause), object, left-dislocated expression in preclause position, or other. The last category comprises oblique arguments, nonverbal predicates, possessors, adjuncts, vocative address terms, and stretches of discourse (discourse-deictic reference); there were relatively few tokens of each of these. Where an object's antecedent is the subject in the same clause, the object is always expressed by a pronoun. This is obviously a rule outside the general mechanisms assumed by accessibility-related models. Since there are only seven instances of such subject-bound object pronouns in the corpus data investigated here, we do not expect this to have any effect on our findings. Note that when the antecedent and target are both objects, they have the same function. Table 5 shows the distribution of objects by their form and the function of the antecedent.

Table 5. Object form by function of antecedent (n = 450)

Note: Percentages appear in parentheses.

Distance of antecedent

Objects were coded for the number of clauses between the object and its antecedent. The value 0 was used in cases where the antecedent occurs in the same clause, either as subject (equivalent to reflexive constructions in other languages) or as possessor in some other constituent NP, and also in cases where the antecedent is a left-dislocated NP in preclausal position, which in fact make up the bulk of this category. The value 1 was used for one clause previous, 2 for two clauses previous, and 3 for three or more clauses previous. We also distinguished objects in relative clauses whose antecedent is the modified noun (see footnote 8). Table 6 shows the distribution of objects by their form and the distance of the antecedent.
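The distance coding can be summarized as a simple mapping from a raw clause count to the categories described here. The following Python sketch is illustrative only; the function name `code_distance`, its parameters, and the label `"relative"` are hypothetical and not taken from the annotation scheme itself.

```python
def code_distance(clauses_back, in_relative_clause=False):
    """Map a raw antecedent distance onto the coding categories.

    clauses_back: how many clauses before the object the antecedent
        occurs (0 = same clause, or a left-dislocated NP).
    in_relative_clause: True if the object sits in a relative clause
        and its antecedent is the modified noun.
    """
    if in_relative_clause:
        return "relative"  # kept apart from the numeric categories
    if clauses_back < 0:
        raise ValueError("distance cannot be negative")
    # 3 stands for 'three or more clauses previous'.
    return min(clauses_back, 3)
```

An antecedent seven clauses back would thus receive the same code as one three clauses back, which is why this predictor is treated as categorical rather than continuous.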

Table 6. Object form by distance of antecedent (n = 449)*

Note: Percentages appear in parentheses. *There was one token of discourse-deictic use, where the object anaphor refers back to a stretch of preceding discourse.

Discourse interruption

Objects were coded for presence or absence of an interruption or disfluency in the text or a shift of levels of narrative representation, for example switching to direct speech in a text (see Lichtenberk, 1996). Having an interruption could theoretically reset the conditioning factors, in particular lowering the antecedent's accessibility. Table 7 shows the frequencies of interruptions and object forms.

Table 7. Object form by discourse interruption

Note: Percentages appear in parentheses.

Speaker

Individuals may make their own contributions to variable patterns of object expression. It may be the case that some people simply have tendencies to express objects as pronouns or zeros. Therefore, we also include speaker (n = 14) as a variable.

Statistical analysis

While logistic regression is an obvious approach to testing the predictors, it is not ideal for the investigation of our data, since animacy and discourse topicality are expected to show multicollinearity due to the inclusion of narrative texts, where human referents are also topical ones. It is the inclusion of the Vera'a plant and fish descriptions that allows for the testing of animacy and discourse topicality against each other, so we also need a statistical method that can test these predictors simultaneously. With recursive partitioning, we are able to test predictors and eliminate those that do not contribute to significant effects in the data without doing a stepwise regression procedure. The suitability of recursive partitioning for our particular data and research question also lies in testing interactions. Some of our predictors have several levels: function of the antecedent has four levels; distance from the antecedent has six. When testing for interactions, it is easy to see that crossing these kinds of predictors (here what would be a six by four matrix) can lead to missing test cases (empty cells), which will cause a typical logistic regression to fail to converge.
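The expected collinearity between animacy and discourse topicality can be made concrete with the counts reported for Table 3. The Python sketch below computes the phi coefficient for the two-by-two cross-tabulation of animacy by topicality; phi is used here merely as a convenient illustration of association strength and is not a measure reported in the study. The function name `phi_coefficient` is introduced for this example.

```python
import math

# Animacy by topicality, derived from the counts quoted for Table 3:
# 215 of 227 animate objects are topical, 198 of 223 inanimate ones are not.
animate_topical = 215
animate_nontopical = 227 - 215
inanimate_nontopical = 198
inanimate_topical = 223 - 198

def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2 x 2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

phi = phi_coefficient(
    animate_topical, animate_nontopical,
    inanimate_topical, inanimate_nontopical,
)
print(f"phi = {phi:.2f}")
```

With these counts phi comes out at roughly 0.84, a strong association between the two predictors, which is in line with the concern about fitting both in a single regression model.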

Two types of recursive partitioning analyses are described in the results section: a binary classification tree built with the partykit package and a variable importance assessment of a random forest built with the party package (Hothorn, Buehlmann, Dudoit, Molinaro, & Van Der Laan, 2006; Hothorn & Zeileis, 2015; Strobl, Boulesteix, Zeileis, & Hothorn, 2007; Strobl, Boulesteix, Kneib, Augustin, & Zeileis, 2008) in R (R Core Team, 2015). A binary classification tree divides the data into two sections based on which data points are most different from each other, using the given variables. The initial split is made across levels of the independent variable that best differentiates the data with respect to the dependent variable, and the data are divided into two nodes. Under each node (or branch of the tree), the data are then split into another two sections based on which data points are most different from each other, using the remaining independent variables and the remaining levels of independent variables from further up the tree. Because each set of data under a node is looked at anew, the same variable can be used again at a lower level of the tree, with different levels. This splitting continues recursively until the data under each final branch are relatively homogeneous. The data in a node are considered homogeneous enough when there are no significant differences (using a p-value) between the data points, given the independent variables. This allows p-values to be provided for each split (cf. Hothorn et al., 2006). The algorithm for creating the tree model will not necessarily use all independent variables listed in the model specification. If there are independent variables that would not make a significant split in the data, they go unused.
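The first step of this procedure, choosing the predictor for a split, can be sketched as follows. This is a minimal illustration in Python and not the partykit implementation used for our analysis: it ranks predictors by a Pearson chi-square statistic, whereas conditional inference trees rely on permutation-based tests, and the function names and toy tokens are hypothetical.

```python
from collections import Counter

def chi_square(rows, predictor, outcome="form"):
    """Pearson chi-square statistic for predictor x outcome in a list of dicts."""
    n = len(rows)
    cells = Counter((r[predictor], r[outcome]) for r in rows)
    row_tot = Counter(r[predictor] for r in rows)
    col_tot = Counter(r[outcome] for r in rows)
    stat = 0.0
    for level in row_tot:
        for value in col_tot:
            expected = row_tot[level] * col_tot[value] / n
            observed = cells.get((level, value), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def best_split(rows, predictors, outcome="form"):
    """Return the predictor most strongly associated with the outcome."""
    return max(predictors, key=lambda p: chi_square(rows, p, outcome))

# Hypothetical toy tokens, not corpus data.
tokens = (
    [{"topical": "yes", "interruption": "no", "form": "pronoun"}] * 8
    + [{"topical": "yes", "interruption": "yes", "form": "zero"}] * 2
    + [{"topical": "no", "interruption": "no", "form": "zero"}] * 7
    + [{"topical": "no", "interruption": "yes", "form": "pronoun"}] * 1
    + [{"topical": "no", "interruption": "yes", "form": "zero"}] * 2
)

print(best_split(tokens, ["topical", "interruption"]))
```

In a full tree, the same selection would be repeated within each resulting partition, and splitting would stop once no predictor reaches the chosen significance criterion.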

We also present a conditional random forest analysis. When many classification trees are averaged, factors can be ranked by their importance, determined by which factors most often make a significant split in the data, especially at higher levels in the tree. The analysis resamples across subsections of the data and uses different samples of the independent variables (here four out of seven possible ones) to increase the variation in the possible trees. Slightly different data subsets and variable combinations will result in potentially different variables performing well (sometimes animacy of the referent, sometimes function or form of the antecedent, etc.), but across many trees (n = 1000), one factor will generally be the best at predicting the outcome of the dependent variable. This factor will be ranked higher than the others in a variable importance ranking. A conditional permutation of the variable importance ranking, although computationally expensive, ensures that the evaluation of a variable's importance takes into consideration its behavior in relation to the other variables in the ranking (Strobl et al., 2008; Tagliamonte & Baayen, 2012).

Potential multicollinearity and interactions

Each pair of predictors was checked for collinearity. Only one pair had r above .5 (or below −.5): animacy and discourse topicality are collinear with r = .84. In an analysis type such as generalized linear regression, we would not be able to felicitously include both of these predictors (see footnote 9). However, in a classification tree, we can see which predictor makes the first (best) split of the data and how the predictors interact. Furthermore, as these kinds of recursive partitioning analyses can handle nonmonotonic effects, we can detect whether a predictor affects the whole dataset or just a portion of it. Because we are interested in the strength of animacy versus discourse topicality, using a classification tree is advantageous, allowing us to compare these collinear predictors directly against each other.
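The pairwise check can be illustrated with a short Python sketch. It computes Pearson's r for two binary-coded predictors and flags the pair if |r| exceeds .5. The ten codings below are invented for illustration and are not the corpus data, and the function name is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical codings: 1 = human / discourse topical, 0 = otherwise.
human = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
topical = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

r = pearson_r(human, topical)
print(round(r, 2), abs(r) > 0.5)  # second value flags pairs with |r| above .5
```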

If structural priming is at work in the conditioning of the object form, we should see an interaction between the form and the function of the antecedent. If there is priming, we would expect a higher rate of pronominal object forms when the antecedent is a pronominal object and a higher rate of zero object forms when the antecedent is a zero object. Higher rates of pronominal objects when the antecedent is a pronominal subject (or higher rates of zero objects when the antecedent is a zero subject) would not be due to structural priming. Table 8 shows the rates of zero and pronominal object forms by antecedent form and function. It is clear from the table that objects with pronominal object antecedents are more likely to be pronouns, but that objects with pronominal subject antecedents are also more likely to be pronouns. Objects with zero antecedents are more likely to be zero, but so are objects with lexical NP object antecedents. The results section shows the significant interaction of these predictors, given the other predictors. As the numbers in Table 8 indicate, structural priming cannot be the (sole) factor at play in the matching of antecedent and target forms.
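The logic of this check can be sketched in Python as a cross-tabulation of pronoun rates by antecedent form and function. The token counts are hypothetical and do not reproduce Table 8, and the function name is our own. If structural priming were the only mechanism, the pronoun rate should be raised for pronominal object antecedents but not for pronominal subject antecedents.

```python
from collections import defaultdict

def pronoun_rates(tokens):
    """Proportion of pronominal objects per (antecedent form, antecedent function) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [pronoun count, total count]
    for ant_form, ant_function, obj_form in tokens:
        cell = counts[(ant_form, ant_function)]
        cell[0] += obj_form == "pronoun"
        cell[1] += 1
    return {key: pron / total for key, (pron, total) in counts.items()}

# Hypothetical tokens: (antecedent form, antecedent function, object form).
tokens = (
    [("pronoun", "object", "pronoun")] * 6 + [("pronoun", "object", "zero")] * 2
    + [("pronoun", "subject", "pronoun")] * 5 + [("pronoun", "subject", "zero")] * 3
    + [("zero", "object", "pronoun")] * 1 + [("zero", "object", "zero")] * 7
)

for cell, rate in sorted(pronoun_rates(tokens).items()):
    print(cell, round(rate, 2))
```

In this toy sample, the pronoun rate is high after pronominal antecedents in both object and subject function, which is the pattern that form matching without structural priming would produce.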

Table 8. Object form by antecedent (n = 450)

RESULTS

Recursive partitioning results

The binary classification tree (Figure 1) and random forest variable importance ranking (Figure 2) show consistent results in modelling the prediction of speakers’ choice to express objects as pronouns or zeros (see footnote 10). The classification tree has splits and nodes. The splits show the predictors that make a significant difference to the classification of pronominal versus zero objects for each subsection of the data. The nodes show the proportion of zero or pronominal expression for the final partitions of the data. This classification tree has 4 splits and 5 nodes. Figure 1 shows that the first split in the data comes from the discourse topicality of the referent. Under split 1, the left branch shows that nontopical referents favor zero object expression, as there is a higher proportion of dark gray than light gray in nodes 3 and 4. Under the right branch of split 1, topical referents (humans, spirits, and anthropomorphized animals in narratives, fish and plants under discussion in descriptive texts) favor pronoun object expression, as seen by the higher proportion of light gray in nodes 7, 8, and 9.

Figure 1. Classification tree of Vera'a object expression.

Figure 2. Variable importance ranking from random forest for Vera'a object expression.

Form of the antecedent results in a further split on both sides of the tree. Under split 2 on the left side of the tree, we see that when an antecedent is a pronoun, the object is more likely to be a pronoun than when the antecedent is a lexical noun phrase or zero. Generally nontopical referents are highly likely to be zeros, but this pattern is attenuated when the antecedent is a pronoun. When the antecedent is a lexical NP or zero, then a nontopical referent is highly unlikely to be expressed pronominally (only two tokens, both with a lexical antecedent). Under split 5 on the right side of the tree, we see the same pattern. Generally topical referents are likely to be expressed pronominally, and this tendency is increased when the antecedent is also a pronoun and slightly less likely when the antecedent is a zero or a lexical NP. A split based on the same configuration of levels for one variable in multiple parts of the tree means that this effect is found throughout the data. There is no interaction with the discourse topicality of the referent, because the pattern remains the same in both splits: zeros and lexical NPs favor zeros to a greater degree and pronouns favor pronouns to a greater degree, in both subsections of the data.

The function of the antecedent results in the last partition of the data under split 6. When the antecedent was an object or a left-dislocated topic (dislocated), the object was more likely to be expressed as a zero. When the function was a subject, or had another function, the object was more likely to be expressed as a pronoun. This shows that there is likely priming by zero object antecedents on zero object targets. Additional testing showed that this split was still significant after removing lexical NP objects.

Despite split 6, which indicates structural priming for zero objects, the form of the antecedent influences the form of the target, whether or not the antecedent is an object. That is, there is form matching also when a pronominal antecedent is a subject, indicating that structural priming is not the sole underlying motivation for matching. We take this point up in the discussion.

Other predictors (animacy, discourse interruption, and distance of the antecedent) do not appear in the tree in Figure 1, because they provided no significant splits in the data. Figure 2 shows a conditional variable importance ranking for a 1000-tree random forest of classification trees. This forest shows that the three variables that appeared in the tree in Figure 1 are indeed the variables most likely to have an effect, even when taking the other variables into account. According to Strobl, Malley, and Tutz (2009:21), the “variables whose importance is negative, zero, or has a small positive value that lies in the same range as the negative values, can be excluded from further exploration.” For the ranking in Figure 2, this means that distance of the antecedent, discourse interruption, and animacy are uninformative for predicting zero or pronominal expression of the object, when the effects of the other predictors are taken into account.
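One way to operationalize this exclusion rule is sketched below in Python. The sketch assumes that "the same range as the negative values" means no larger than the absolute value of the most negative score. The importance scores are hypothetical and are not the values plotted in Figure 2, and the function name is our own.

```python
def informative_variables(importance):
    """Keep variables whose importance exceeds the magnitude of the most negative score."""
    negatives = [v for v in importance.values() if v < 0]
    threshold = abs(min(negatives)) if negatives else 0.0
    return [name for name, v in importance.items() if v > threshold]

# Hypothetical importance scores, not the values plotted in Figure 2.
scores = {
    "discourse topicality": 0.060,
    "antecedent form": 0.021,
    "antecedent function": 0.009,
    "animacy": 0.001,
    "discourse interruption": 0.000,
    "antecedent distance": -0.002,
}

print(informative_variables(scores))
```

With these invented scores, only the three predictors that also appear in the tree would be retained.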

Taken together, the tree and forest show a united picture, which is not always a foregone conclusion with these kinds of analyses. For Vera'a object expression, the discourse topicality of the referent is the top contributing factor: a discourse-topical referent is much more likely than a non-discourse-topical one to be expressed as a pronoun rather than as zero. The form of the antecedent influences the form of the object. An object with a subject antecedent is more likely to be expressed as a pronoun, and one with an object antecedent is more likely to be expressed as a zero.

Methodological discussion and an alternative approach

Using conditional inference trees and random forests, instead of logistic or linear regression, with sociolinguistic data is still relatively new (cf. Tagliamonte & Baayen, 2012). There are some clear disadvantages to using recursive partitioning. One disadvantage of the partykit (Hothorn & Zeileis, 2015) or party (Hothorn et al., 2006) implementation is that it does not provide an easy means to assess variation at the speaker level (i.e., a random intercept and slope in a mixed-effects regression framework). Including speaker as a variable is possible in a random forest, but in a conditional inference tree it can cause confusing splits because the variable has so many levels. Although we were able to include speaker in the tree displayed in Figure 1, and see that it did not contribute to any significant splits due to low interspeaker variance, one would often not include speaker as a variable in a conditional inference tree, just as one would usually not include it as a fixed effect in a regression model. Another disadvantage is the lack of coefficients in a classification tree figure. In a regression model, we can compute coefficients that show the contribution of each predictor to the fit of the model. The structure of a classification tree model is fundamentally different, and because predictors do not always affect every data point, one cannot provide coefficient values. Finally, some see recursive partitioning as “data dredging.” To address this last point, we have clearly stated predictors that have theoretical motivation. Testing these predictors, and simultaneously eliminating those that do not make strong predictive contributions, is an advantage in our eyes.

As an alternative, we built two generalized linear (logistic) mixed-effects models. The first includes discourse topicality, form of the antecedent, and function of the antecedent as fixed effects and speaker as a random intercept with random slopes on discourse topicality (Table A1). The second has animacy in place of discourse topicality (Table A2). The models show that either discourse topicality or animacy will be significant if it appears in a model without the other. However, the Akaike information criterion (AIC) (Akaike, 1973) for the model with discourse topicality is 279.1, as opposed to 321.2 for the model with animacy. We should therefore prefer the former; the latter has essentially no support, as the difference between the models is ΔAIC > 10 (Burnham & Anderson, 2002). The recursive partitioning shows this same result in a more elegant manner, showing that animacy makes no significant contribution when discourse topicality is available as a predictor (see the appendix for the generalized linear [logistic] mixed-effects model results).
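The model comparison can be reproduced with a few lines of Python. The sketch takes the AIC values as reported in Tables A1 and A2 and computes the difference together with the relative likelihood exp(−Δ/2) of the weaker model, a standard quantity in AIC-based model selection. The variable names are our own.

```python
from math import exp

# AIC values as reported in Tables A1 and A2 of the appendix.
aic_topicality = 279.1
aic_animacy = 321.2

delta = aic_animacy - aic_topicality       # difference from the better-fitting model
relative_likelihood = exp(-delta / 2)      # relative likelihood of the weaker model

print(round(delta, 1), relative_likelihood < 1e-6)
```

A difference of this size lies far beyond the Δ > 10 threshold, so the animacy model has essentially no support relative to the discourse topicality model.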

DISCUSSION

The first thing to note is that our findings do not lend support to AT (Ariel, 1988, 1990), at least not when considering antecedent distance as its most central factor: antecedent distance is not a relevant factor in our analysis. This is illustrated by numerous examples of long-distance zero anaphors in our Vera'a corpus, like the following (cf. Table 6 for figures):

  1. (12)

In all examples in (12), the antecedent of a zero object in each final clause is an NP occurring three clause units away. Obviously, the intended patient or theme referent is identifiable through clues coming from the verbal semantics, world knowledge, and discourse context. Thus, the higher distance as such does not trigger the use of the more explicit form of a pronoun. Conversely, object pronouns are frequently (cf. Table 6) found in contexts with very low antecedent distance, as in (13), where their use would appear not only redundant, but in fact misleading from an AT point of view since it would trigger addressees to recover the appropriate antecedent of the object pronoun from a distance larger than the immediately preceding clause.

  1. (13)

Contrary to current models of referential choice, like AT, the choice between pronoun and zero in objects thus does not relate to instructions for antecedent retrieval, at least not if distance is taken as the major factor. Similarly, discourse interruptions do not show any effect in our data, a finding that is likewise not easily accounted for in terms of accessibility.

Turning now to the three most relevant factors, we first consider switch-function and same-function objects. Our analysis indicates that where an object's antecedent is not itself an object, but say a subject of a preceding clause or some other function, pronominal realization is more likely; conversely, where the antecedent is also an object, resulting in a chain of same-object reference, there is a preference for zero realization (cf. Table 5). The following examples illustrate this pattern:

  1. (14)

Examples like (13) or (15), where a pronoun is used in same-function contexts, are much less common than zero objects in this context (cf. Table 5):

  1. (15)

Similarly rare are examples of zero objects with a nonobject antecedent, for instance a subject:

  1. (16)

The greater likelihood of objects to be expressed by a pronoun where their antecedent is a subject may have two plausible explanations. For one thing, this may suggest that a switch in function triggers the use of a pronoun in object function due to considerations of accessibility. While this interpretation is in principle justified, we should also consider the more global tendency of subjects to often have discourse-topical referents. This tendency is very pronounced in the narrative texts in our data, where approximately 85% of all subjects have human discourse-topical referents. In fish descriptions, this proportion is somewhat lower, namely 70%. Only in plant descriptions do we find a proportion of only 40% of subjects with discourse-topical referents. Given that narrative texts account for a much larger proportion of our data, this means that the third split to the left under node 3, relating to antecedent function, may in fact merely be a by-product of the overall most relevant factor of discourse topicality: discourse topical status triggers expression of objects by a pronoun, and at the same time that same referent is likely—for the same reason—to be a subject in the preceding clause.

As for discourse topicality, our analysis shows that all humans and those nonhuman referents that are also the discourse topic (fish and plants under discussion in descriptive texts) tend to be realized as an object pronoun, hence the split under node 1. Objects with nonhuman, animate referents that are not also the discourse topic tend to be realized as zero rather than pronoun (cf. Table 3). This is the case with animals referred to in narratives, as shown in (17), but also with reference to fish that are not the topic of the description at hand, as shown by the comparative examples in (18).

  1. (17)

  2. (18)

Hence, the most relevant factor here is not ontological class, in particular animacy, as such, but rather the discourse topicality of an object referent. Even inanimate plants tend to be referred to with object pronouns where they are the topic of a description (cf. Table 3), as in (19a), whereas other inanimate referents that are not the discourse topic tend to take zero reference, as in (19b):

  1. (19)

Nonetheless, humans and anthropomorphized living beings are still more likely to be pronominal than fish and plant species under discussion (cf. Table 3). Our analysis does not suggest an overall explanation for this finding. We take this point up in connection with first and second person objects.

While our cross-register investigation helps tease apart the semantic feature of animacy and the discourse factor of discourse topicality, we should also stress that the latter does not correlate with sentence topicality, as suggested by Dalrymple and Nikolaeva (2011). It was shown that topicalization constructions in Vera'a allow for both pronominal and zero resumption in situ. In fact, for all cases of topicalization, we find roughly the same distribution of both forms as predicted by discourse topicality: where the topicalized referent is also a discourse topic, it will tend to be resumed by a pronoun, as in examples (20a) to (22a), but if it is not a discourse topic, it will tend to be left zero, as in examples (20b) to (22b):

  1. (20)

  2. (21)

  3. (22)

This suggests that discourse topicality and sentence topicality do not converge when it comes to the use of pronouns for objects in Vera'a, and discourse topicality is a factor in its own right, distinct from the pragmatic relation of topic within a sentence.

Lastly, we observed a priming effect in the choice between pronoun and zero objects, so that a pronominal form will trigger the use of another pronoun in a later mention of the same referent, and likewise for zeros. However, given the much stronger factor of discourse topicality, it seems likely that this effect is an epiphenomenon of this overall most important factor: since discourse topicality does not vary for any given discourse referent in a given text, serial reference to this participant will necessarily lead to the use of the same form throughout the series. Such serial effects are in this way merely the effect of an association of referential choice with a text-stable feature. This may also explain why the effect is observed not only with pronominal object antecedents, but also with pronominal subject antecedents.

Before we proceed to draw some conclusions from our findings, we would like to briefly take up the behavior of speech-act participants with regard to the choice at hand: we excluded first and second person objects from the analysis because we essentially do not find any variation here in terms of their formal expression. That is to say, first and second person objects are (nearly) categorically pronominal (see footnote 11). The following are some illustrative examples:

  1. (23)

Note that second person singular pronouns can have generic reference, as in (23b). How can these observations be related to our findings regarding third person objects? There are essentially two possibilities: the first one is that we are dealing with two essentially autonomous subsystems, so that Vera'a grammar includes a rule to realize first and second person objects overtly, a rule that is absent from third person objects.

Another possibility is that first and second person objects share the same functional considerations with third person objects, but that these are much more pronounced with the former, so as to lead to the near-categorical use of pronouns. One functional explanation often discussed in the typological literature in regard to DOM (e.g., Comrie, 1989:128) is in terms of markedness reversal (see also Aissen, 2003): the default in natural communication is for humans to recount their acting upon other humans or, more typically, nonhuman objects; by extension, it is more natural for first and/or second persons to act upon third persons. Therefore, subjects typically correlate with positive semantic salience features, like being high in animacy, definiteness, and specificity, whereas objects show the converse negative set, being typically low in animacy, definiteness, and specificity. The explanation for special marking of objects is in terms of a deviation from this more natural constellation, so that objects correlating with positive semantic salience features tend to get marked off, iconically reflecting the markedness of conceptual event structure and its participants. Likewise, a first or second person object is unexpected, and although the effects of markedness reversals have been discussed mainly in regard to differential case marking (but see Schwenter [2006] on Spanish and Portuguese objects), similar considerations may be at work in the choice of first and second person pronouns for objects, marking their relative unexpectedness.

Yet another possible functional explanation could be in terms of what could be called “communicative salience,” that is superordinate to discourse topicality: since all communication among human beings takes place as relevant for their experience, human referents are universally salient in discourse due to a construed equivalence with speaker and addressee. This could mean that even where fish and plants are the topics of descriptive texts, human beings are simply more salient globally. In other words, despite their topical status, fish and plants are not by themselves the communicative ends of these descriptions, and speakers are nonetheless more concerned and involved when it comes to humans, and in particular themselves or the Vera'a community, referred to by the first and second person pronouns. Appealing to communicative salience may also explain the differences between human and nonhuman discourse topics: although fish and plants follow the trend to be expressed by a pronoun when they are discourse topics, this effect may be much stronger with human discourse topics since these are of yet more concern for producer and audience of these texts than the fish and plant species under discussion.

Our analysis does not suggest any clear answer to these questions, and we do not commit to any of these fairly speculative proposals here. It is still possible that we are dealing with an idiosyncratic split between first and second person objects, and third person objects. But even under the latter hypothesis, considerations of markedness or communicative salience may nonetheless be involved in the diachronic development of such a person split system.

CONCLUSIONS

We conclude that discourse topicality is overall the decisive factor triggering the use of a pronoun in contrast to zero in third person objects in Vera'a. Crucially, discourse topicality can be shown to be a factor clearly distinct from the semantic salience feature of high animacy. Given the notorious convergence of these two features, it seems methodologically highly relevant that we were able to tease them apart only through systematic investigation of a varied text corpus, taking into account differences in text registers, and the content and topics of classes of texts. In this sense, our study adds to the more recent findings of typological and theoretical studies in the differential treatment of objects that claim topicality to be primary over semantic salience (Dalrymple & Nikolaeva, 2011; Iemmolo, 2010; Iemmolo & Klumpp, 2014 and contributions therein; Schwenter, 2006, 2014). However, while Dalrymple and Nikolaeva, for instance, identified sentence topicality as the relevant factor, which they saw as resulting from discourse topicality, our findings suggest discourse topicality as a factor in its own right, distinct not only from more or less correlating semantic features of animacy, but also from sentence topicality. Since we treat discourse topicality differently here, our findings are not immediately comparable to those of Schwenter (2006), who found a correlation between topic continuity in the sense of Givón (1983a) and the use of pronominal forms in objects in Portuguese and South American varieties of Spanish. However, it seems that our findings are sufficiently similar to his, and that both Schwenter's and our analysis corroborate the view that more global topicality on the discourse level is a more important factor than sentence topicality or animacy.

Given the relatively high effort of our study in terms of corpus design and systematic analysis, it is not impossible that hitherto identified animacy (humanness) splits in other languages, and for other phenomena such as case-marking or agreement DOM, are indeed also a matter of discourse topicality, a possibility that could only be ascertained by extensive examination of the kind undertaken here.

The identification of discourse topicality as the best predictor for the use of object pronouns bears implications for diachronic accounts of the emergence of object agreement, since the increased use of pronouns rather than zero anaphor reflects an initial stage of such a development (Siewierska, 2004). According to Dalrymple and Nikolaeva (2011:18), grammaticalization of object agreement will start out from a tendency to mark topical nonsubjects, in particular objects, and later expand the use of pronominal agreement markers to nontopical objects that have features generally typical of topics, namely human or at least animate objects. The second possibility they mention is narrowing of topical marking, where marked objects are only those topical objects that bear specific semantic features. While we cannot provide a full-fledged alternative account at this point, it is clear that our findings do not square with either scenario: the use of pronouns in Vera'a encompasses all human objects, and discourse-topical nonhuman ones, but excludes sentence-topical ones where they are not human and not discourse topical. This means that the scenarios proposed by Dalrymple and Nikolaeva (2011) are at least not universally attested. Working out the relevance of this conclusion for accounts of grammaticalization of object agreement in Oceanic languages and beyond will be left for future research in this area.

Finally, our findings provide ample counterevidence to the universal relevance of accessibility and activation, suggesting that at least the choice between pronoun and zero for objects cannot be accounted for in terms of AT and similar frameworks concerned with discourse structure. From an activation point of view, these two forms of reference appear to be too similar to mark significant differences. As suggested already by Schwenter's (2006, 2014) work, it seems generally not unlikely that such similarity may result in operationalizing this choice for other purposes, like the marking of discourse topicality, not only in Vera'a, but possibly in yet other languages with a choice between pronoun and zero for objects.

APPENDIX

Abbreviations

1 = first person
2 = second person
3 = third person
a = a-form of demonstrative
abil = ability
abl = ablative
aor = aorist
art = common article
at = Accessibility Theory
dat = dative
dem = demonstrative
disc = discourse particle
distr = distributive
du = dual
eat = eat possession
emph = emphatic particle
ex = exclusive
fut = future
gen = general possession
house = house possession
in = inclusive
interj = interjection
ipfv = imperfective
loc = locative
man = manner demonstrative
np = noun phrase
nsg = non-singular
num = numeral (prefix or article)
pers = personal article
pl = plural
poss = possessive classifier
proh = prohibitive
quot = quotative
red = reduplication
rel = relativizer
sap = speech-act participant
sg = singular
sp = specific
stat = stative
tamp = tense-aspect-mood-polarity
val = valuable possession
vc = verb complex

Logistic regression models

The logistic regression models were fitted using the lme4 (Bates, Maechler, Bolker, & Walker, 2015) and lmerTest (Kuznetsova, 2014) packages in R (R Core Team, 2015).

Table A1. Results of mixed-effects generalized linear regression with discourse topicality

Log likelihood: −129.5

AIC: 279.1

BIC: 320.2

Speaker variance: .03 ± .17

Discourse topicality by speaker variance: .00 ± .06

Table A2. Results of mixed-effects generalized linear regression with animacy

Log likelihood: −150.6

AIC: 321.2

BIC: 362.3

Speaker variance: 1.58 ± 1.26

Animacy by speaker variance: 2.69 ± 1.64

Footnotes

1. Morpheme-by-morpheme glossing follows the Leipzig Glossing Rules (Comrie, Haspelmath, & Bickel, 2015); abbreviations for glosses are listed in the appendix.

2. The common article as well as two TAMP allomorphs are what we call detached clitics, a term adapted from Bickel and Nichols (2007:176). Thus, these are categorically unrestricted bound formatives that occupy the first syntactic slot in their functional phrase and attach phonologically to whatever word precedes them, similar to determiners, and subject and object markers in Kwakw'ala, discussed by Anderson (1992).

3. These constructions are nonfinite and do not take any TAMP marking. These properties are not further relevant for the purposes of the current paper. See Schnell (in press) for some discussion.

4. One anonymous reviewer remarks that zero as opposed to pronominal form of nonlexical objects may also be lexically determined, for instance with verbs like eat, drink, or read. According to our definition, zero objects are referentially zeroes; that is to say, there is a specific referent recoverable from the discourse context. A verb like eat in English does not permit a zero object of this kind. The alternation that the reviewer is alluding to is not treated as an instance of zero objects here, but as a constructional variation between a transitive and an intransitive frame that this verb is compatible with.

5. Note that nonhuman NPs are usually not plural-marked, and anaphoric expressions do not agree with them in number.

6. Thus, Neeleman and Szendröi (2007) correlated the parameter of pro-drop with the morphological makeup of pronominal forms across individual languages, and Bickel (2003) found a correlation between certain case-marking patterns in complement clause constructions and (higher) degrees of referential density across languages.

7. Our understanding of discourse topic is thus similar to Nichols' (1985) “theme,” or what Lichtenberk (1996) captured under the label “thematic prominence.” According to Lichtenberk (1996), a thematically most prominent discourse entity does not necessarily need to be mentioned more frequently than a less prominent one.

8. Note that the relativized function in Vera'a can be a pronoun or zero. There are neither specialized relative pronouns nor gapping as specific relativization strategies, hence the same alternation pertains here.

9. When collinear predictors are present in a regression model, they make similar contributions to model fit, so it is difficult to determine the exact contribution of each predictor. In a classification tree, a series of decisions is made to determine the best predictor for the first split, and then each partition of the data is looked at anew. Because of this, the presence of multicollinearity affects neither the p-values, nor the points at which the tree splits, nor the ranking of the strength of the predictors in the random forest analysis.

10. The classification tree split criterion was set at .95, resulting in splitting only when there are differences at the .05 p-value level. The tree was not pruned; all significant splits are shown.

11. Only 3 of 93 first or second person objects are zero in our corpus.

Note: Negative coefficients are associated with higher pronoun expression. BIC, Bayesian information criterion.

Note: Negative coefficients are associated with higher pronoun expression.

References


Aissen, Judith. (2003). Differential object marking: Iconicity versus economy. Natural Language and Linguistic Theory 21(3):435–483.
Akaike, Hirotugu. (1973). Information theory and an extension of the maximum likelihood principle. In Petrov, B. N. & Csáki, F. (eds.), Second International Symposium on Information Theory. Budapest: Akadémiai Kiadó. 267–281.
Anderson, Stephen. (1992). A-morphous morphology. Cambridge: Cambridge University Press.
Ariel, Mira. (1988). Referring and accessibility. Journal of Linguistics 24(1):65–87.
Ariel, Mira. (1990). Accessing noun-phrase antecedents. London: Routledge.
Ariel, Mira. (2000). The development of person agreement markers: From pronouns to higher accessibility markers. In Barlow, M. & Kemmer, S. (eds.), Usage-based models of language. Stanford: CSLI Publications. 197–260.
Ariel, Mira. (2008). Pragmatics and grammar. Cambridge: Cambridge University Press.
Barth, Danielle, & Kapatsinski, Vsevolod. (2017). A multimodel inference approach to categorical variant choice: Construction, priming and frequency effects on the choice between full and contracted forms of am, are and is. Corpus Linguistics and Linguistic Theory 13(2):1–58.
Bates, Douglas, Maechler, Martin, Bolker, Ben, & Walker, Steve. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1):1–48. doi:10.18637/jss.v067.i01.
Biber, Douglas, & Conrad, Susan. (2009). Register, genre, and style. Cambridge: Cambridge University Press.
Bickel, Balthasar. (2003). Referential density in discourse and syntactic typology. Language 79:708–736.
Bickel, Balthasar, & Nichols, Joanna. (2007). Inflectional morphology. In Shopen, T. (ed.), Language typology and syntactic description. Vol. 3. Grammatical categories and the lexicon. Cambridge: Cambridge University Press. 169–240.
Burnham, Kenneth P., & Anderson, David R. (2002). Model selection and multimodel inference: A practical information-theoretic approach. 2nd ed. New York: Springer.
Cameron, Richard, & Flores-Ferrán, Nydia. (2004). Perseveration of subject expression across regional dialects of Spanish. Spanish in Context 1:41–65.
Chafe, Wallace. (1976). Givenness, contrastiveness, definiteness, subjects, topics, and point of view. In Li, C. N. (ed.), Subjects and topics. New York: Academic Press. 25–56.
Chafe, Wallace. (1994). Discourse, consciousness, and time: The flow and displacement of conscious experience in speaking and writing. Chicago: The University of Chicago Press.
Chung, Sandra. (1984). Identifiability and null objects in Chamorro. Proceedings of the Annual Meeting of the Berkeley Linguistics Society 10:116–130.
Clancy, Patricia M. (1980). Referential choice in English and Japanese discourse. In Chafe, W. (ed.), The Pear stories: Cognitive, cultural, and linguistic aspects of narrative production. Norwood: ABLEX Publishing. 127–202.
Comrie, Bernard. (1989). Language universals and linguistic typology. 2nd ed. Chicago: The University of Chicago Press.
Comrie, Bernard, Haspelmath, Martin, & Bickel, Balthasar. (2015). The Leipzig glossing rules: Conventions for interlinear morpheme-by-morpheme glosses. Leipzig: Max-Planck Institute for Evolutionary Anthropology.
Corbett, Greville C. (2003). Agreement: Canonical instances and the extent of the phenomenon. In Booij, G., DeCesaris, J., Ralli, A., & Scalise, S. (eds.), Topics in morphology: Selected papers from the Third Mediterranean Morphology Meeting (Barcelona, Sep 20–22, 2001). Barcelona: Universitat Pompeu Fabra. 109–128.
Dalrymple, Mary, & Nikolaeva, Irina. (2011). Objects and information structure. Cambridge: Cambridge University Press.
Dowty, David. (1982). Grammatical relations and Montague grammar. In Jackson, P. & Pullum, G. K. (eds.), The nature of syntactic representation. Dordrecht: Reidel. 79–130.
Du Bois, John W. (1987). The discourse basis of ergativity. Language 63:805–855. doi:10.2307/415719.
Genetti, Carol, & Crain, Laura D. (2003). Beyond preferred argument structure: Sentences, pronouns and given referents in Nepali. In Du Bois, J. W., Kumpf, L. E., & Ashby, W. J. (eds.), Preferred argument structure: Grammar as architecture for function. Amsterdam: Benjamins. 197–223.
Givón, Talmy. (1976). Topic, pronoun, and grammatical agreement. In Li, C. N. (ed.), Subjects and topics. New York: Academic Press. 149–188.
Givón, Talmy. (1983a). Topic continuity in discourse: An introduction. In Givón, T. (ed.), Topic continuity in discourse: A quantitative cross-language study. Amsterdam: John Benjamins. 1–41.
Givón, Talmy. (ed.) (1983b). Topic continuity in discourse: A quantitative cross-language study. Amsterdam: John Benjamins.
Givón, Talmy. (1998). The functional approach to language. In Tomasello, M. (ed.), The new psychology of language: Cognitive and functional approaches to language structure. Hillsdale: Erlbaum. 38–62.
Haspelmath, Martin. (2013). Argument indexing: A conceptual framework for the syntactic status of bound person forms. In Bakker, D. & Haspelmath, M. (eds.), Language across boundaries: Studies in memory of Anna Siewierska. Berlin: De Gruyter Mouton. 197–226.
Hothorn, Torsten, Buehlmann, Peter, Dudoit, Sandrine, Molinaro, Annette, & Van Der Laan, Mark. (2006). Survival ensembles. Biostatistics 7(3):355–373.
Hothorn, Torsten, & Zeileis, Achim. (2015). partykit: A modular toolkit for recursive partytioning in R. Journal of Machine Learning Research 16:3905–3909.
Huang, C.-T. James. (1984). On the distribution and reference of empty pronouns. Linguistic Inquiry 15:531–574.
Huang, Yan. (1994). The syntax and pragmatics of anaphora: A study with special reference to Chinese. Cambridge: Cambridge University Press.
Huang, Yan. (2000). Anaphora. Cambridge: Cambridge University Press.
Iemmolo, Giorgio. (2010). Topicality and differential object marking: Evidence from Romance and beyond. Studies in Language 34:239–272.
Iemmolo, Giorgio, & Klumpp, Gerson. (2014). Introduction. In Iemmolo, G. & Klumpp, G. (eds.), Differential object marking: Theoretical and empirical issues. Special issue, Journal of Linguistics 52(2):271–279.
Krifka, Manfred. (2008). Basic notions of information structure. Acta Linguistica Hungarica 55(3–4):243–276.
Kuznetsova, Alexandra. (2014). lmerTest: Tests in linear mixed effects models. Version 2.0-20. Package for R. Available at: http://cran.r-project.org/web/packages/lmerTest/index.html. Accessed January 15, 2015.
Lambrecht, Knud. (1994). Information structure and sentence form: Topic, focus and the mental representations of discourse referents. Cambridge: Cambridge University Press.
Lichtenberk, František. (1996). Patterns of anaphora in To'aba'ita narrative discourse. In Fox, B. (ed.), Patterns of anaphora. Amsterdam: John Benjamins. 381–411.
Malau, Catriona. (2016). A grammar of Vurës, Vanuatu. Boston: de Gruyter Mouton.
Meyerhoff, Miriam. (2002). Formal and cultural constraints on optional objects in Bislama. Language Variation and Change 14:323–346.
Neeleman, Ad, & Szendröi, Kriszta. (2007). Radical pro drop and the morphology of pronouns. Linguistic Inquiry 38(4):671–714.
Nichols, Joanna. (1985). The grammatical marking of theme in literary Russian. In Flier, M. S. & Brecht, R. D. (eds.), Issues in Russian morphosyntax. Columbus: Slavica Publishers. 170–186.
Poplack, Shana. (1980). The notion of the plural in Puerto Rican Spanish: Competing constraints on /s/ deletion. In Labov, W. (ed.), Locating language in time and space. New York: Academic Press. 55–67.
Prince, Ellen. (1981). Towards a new taxonomy of given and new. In Cole, P. (ed.), Radical pragmatics. New York: Academic Press. 223–255.
Prince, Ellen. (1992). The ZPG letter: Subjects, definiteness, and information status. In Thompson, S. A. & Mann, W. C. (eds.), Discourse description: Diverse linguistic analyses of a fund raising text. Philadelphia: John Benjamins. 295–325.
R Core Team. (2015). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Available at: https://www.R-project.org/. Accessed July 20, 2015.
Rizzi, Luigi. (1986). Null objects in Italian and the theory of pro. Linguistic Inquiry 17:501–557.
Scherre, Maria M. P., & Naro, Anthony J. (1991). Marking in discourse: “Birds of a feather”. Language Variation and Change 3(1):23–32.
Schnell, Stefan. (2011). A grammar of Vera'a. Kiel: Kiel University.
Schnell, Stefan. (in press). Whence subject-verb agreement? Investigating the role of topicality, accessibility, and frequency in Vera'a texts. Linguistics 56(4).
Schwenter, Scott A. (2006). Two kinds of differential object marking in Portuguese and Spanish. In Amaral, P. & Carvalho, A. M. (eds.), Portuguese-Spanish interfaces: Diachrony, synchrony, and contact. Amsterdam: John Benjamins. 238–260.
Schwenter, Scott A. (2014). Null objects across South America. In Face, T. L. & Klee, C. A. (eds.), Selected proceedings of the 8th Hispanic Linguistics Symposium. Somerville: Cascadilla Proceedings Project. 23–36.
Siewierska, Anna. (2004). Person. Oxford: Oxford University Press.
Strobl, Carolin, Boulesteix, Anne-Laure, Kneib, Thomas, Augustin, Thomas, & Zeileis, Achim. (2008). Conditional variable importance for random forests. BMC Bioinformatics 9:307. Available at: http://www.biomedcentral.com/1471-2105/9/307. Accessed November 12, 2012.
Strobl, Carolin, Boulesteix, Anne-Laure, Zeileis, Achim, & Hothorn, Torsten. (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8:25. Available at: http://www.biomedcentral.com/1471-2105/8/25. Accessed November 12, 2012.
Strobl, Carolin, Malley, James, & Tutz, Gerhard. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods 14(4):323–348.
Tagliamonte, Sali A., & Baayen, R. Harald. (2012). Models, forests, and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24(2):135–178.
Torres Cacoullos, Rena, & Travis, Catherine E. (2013). Prosody, priming and particular constructions: The patterning of English first-person singular subject expression in conversation. Journal of Pragmatics 63:19–34.
Travis, Catherine E. (2007). Genre effects on subject expression in Spanish: Priming in narrative and conversation. Language Variation and Change 19:101–135.

Table 1. Paradigm of Vera'a personal pronouns

Table 2. Overview of Vera'a corpus data

Table 3. Object form by animacy and discourse topicality (n = 450)

Table 4. Object form by form of the antecedent (n = 450)

Table 5. Object form by function of antecedent (n = 450)

Table 6. Object form by distance of antecedent (n = 449)*

Table 7. Object form by discourse interruption

Table 8. Object form by antecedent (n = 450)

Figure 1. Classification tree of Vera'a object expression.

Figure 2. Variable importance ranking from random forest for Vera'a object expression.