1 Introduction
Cross-linguistic research has shown that the constituents of an utterance “prefer to remain within their proper domain [… and that] domains prefer not to be interrupted by constituents from other domains” (Dik, 1997, p. 402). Yet there exist many systemic exceptions to this preference, such as topicalization and wh-questions, as shown in examples (1) and (2) (taken from Goldberg, 2006, pp. 10, 21):
(1) What did Liza buy Zach?
(2) A dozen roses, Nina sent her mother!
Such utterances exhibit long-distance dependencies (LDDs), which are notoriously difficult to handle in a formally explicit way because they involve constituents that seem to have been extracted from their canonical position in the utterance: in both examples, the direct object is placed in clause-initial position instead of adhering to the usual SVO order. Analyses within the tradition of generative grammar, in particular, have treated LDDs as deviations from basic constituent structure, evoking operations of movement in transformational accounts (e.g., Chomsky, 1977; Cheng & Corver, 2006) or LDD-specific rules such as filler−gap constructions in non-transformational accounts (e.g., Sag, 2010).
In recent years, however, a cognitive-functional alternative has started to crystallize in which all utterances – including those that contain LDDs – are analyzed in terms of more straightforward surface generalizations, where differences in surface patterns emerge spontaneously as a side effect of how grammatical constructions interact with each other (Goldberg, 2002). For instance, Goldberg (2006, p. 10) writes that example (1) is simply the result of the combination of five lexical constructions (Liza, buy, Zach, what, and do) and at least five grammatical constructions (the ditransitive construction, question construction, subject−auxiliary inversion construction, VP construction, and NP construction). Unfortunately, whereas the generative tradition has proposed explicit formalizations (e.g., Alexiadou, Kiss, & Müller, 2012; Ginzburg & Sag, 2000; Sag, 2010), no such formalization exists for the usage-based alternative.
The goal of this paper is to show that it is entirely feasible to implement a cognitive-functional analysis of LDDs in a formally precise way. More specifically, this paper reports on a computational implementation of English long-distance dependencies in Fluid Construction Grammar (FCG; Steels, 2011a, 2012), which includes a processing model that works for both parsing and production. The cognitive-functional approach improves upon the generative approach in terms of completeness, explanatory adequacy, and theoretical parsimony.
This paper offers a general description of the implementation. Its computational and formal details can be inspected through an interactive web demonstration and downloadable sample grammar at http://www.fcg-net.org/demos/ldd/. Throughout the paper, I will use ‘Fluid Construction Grammar’ or simply ‘FCG’ to denote my linguistic analysis, and ‘FCG-system’ to refer to the computational instrument that I used for formulating the analysis.
2 The filler−gap analysis
Before turning to this paper’s proposal, I will first briefly discuss the most influential formalization of LDDs: filler−gap constructions. The filler−gap analysis treats long-distance dependencies as a chain of local dependencies. This approach can be traced back to a seminal proposal by Gazdar (1981) and has been further pursued within Head-Driven Phrase Structure Grammar (HPSG; Ginzburg & Sag, 2000) and Sign-Based Construction Grammar (SBCG; Sag, 2010). Both theories are generative grammars in the original sense that they aim to develop a competence model that is capable of licensing all well-formed utterances of a language. Generation should thus not be confused with actual language processing: “[A] generative grammar is not a model for a speaker or hearer. It attempts to characterize in the most neutral possible terms the knowledge of the language” (Chomsky, 1965, p. 9; also see Sag & Wasow, 2011, on process-neutral competence grammars).
2.1 Argument realization in SBCG
SBCG uses typed feature structures for modeling linguistic objects and feature descriptions for describing possible feature structures. It is also a strongly lexicalist theory in which the morphosyntactic behavior of verbs is largely specified in the lexicon. For instance, the form hit is typed as a transitive verb, as informally shown in example (3). The valence of the verb is described as a feature−value pair in which the value of the feature VAL(ence) is an ordered list of two valents that the verb must combine with.
(3) [Feature description of the transitive verb hit, whose VAL(ence) list contains the two NP valents the verb must combine with; the original figure is not reproduced here.]
SBCG is a constraint-based grammar, which means that it posits linguistic knowledge as static constraints that apply simultaneously and that are processing-independent. As Sag (2010) notes, however, it is convenient to conceptualize argument realization in SBCG as a gradual bottom-up saturation of a verb’s valence list, in which each phrase takes away one of the elements of that list until an empty valence list remains. For instance, in example (4), the verb form hit may combine with its direct object (e.g., the ball) in a VP. That VP also includes a VAL feature, but this time the verb’s direct object has been removed from it. When combined with the verb’s subject, the last remaining element of the verb’s valence list is taken away, resulting in the empty (and thus fully saturated) VAL(ence) list of the sentence S.
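For concreteness, the following minimal sketch mimics this bottom-up saturation, assuming a toy encoding of feature structures as Python dictionaries; SBCG itself states these constraints declaratively over typed feature structures, so this is an expository illustration, not the formalism:

```python
# A minimal sketch of bottom-up valence saturation, assuming a toy
# encoding of feature structures as dictionaries (illustration only).

hit = {"FORM": "hit", "CAT": "V", "VAL": ["NP[subj]", "NP[obj]"]}

def saturate(head, phrase_cat):
    """Combine a head with a sister phrase, removing one valent."""
    remaining = list(head["VAL"])
    remaining.remove(phrase_cat)            # the sister saturates one valent
    return {"FORM": head["FORM"],
            "CAT": "VP" if remaining else "S",
            "VAL": remaining}

vp = saturate(hit, "NP[obj]")   # 'hit the ball'          VAL: ['NP[subj]']
s  = saturate(vp, "NP[subj]")   # 'the boy hit the ball'  VAL: [] (saturated)
print(vp["VAL"], s["VAL"])      # ['NP[subj]'] []
```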
2.2 Long-distance dependencies in SBCG
All phrasal (or combinatoric) constructions in SBCG impose an immediate dominance relation between a single parent node and its immediate children, much like the rules of a phrase structure grammar. A direct consequence is that valence selection is local, that is, valence saturation is only possible by combining the valence of a head daughter with the feature−value pairs of a sister phrase.
The locality of selection works fine for canonical word orders as found in The boy hit the ball. However, LDDs obviously challenge this approach because they involve dependents that are situated beyond local domains. SBCG analyzes such long-distance dependencies in terms of gaps at an extraction site (i.e., the position where a constituent seems to be missing), and fillers for those gaps in a different position, as illustrated for topicalization and wh-questions in examples (5−6).
(5) The ball_FILLER the boy hit _GAP.
(6) What_FILLER did the boy hit _GAP?
The formalization of this analysis involves three steps. The first is to introduce a new feature called GAP to identify which element of the valence list is ‘missing’. A lexical rule then changes the default valence of the word form hit to the following VAL(ence) and GAP features:
(7) [Feature description of hit after the lexical rule has applied: the VAL(ence) list retains only the subject NP, while the direct object NP now appears as the value of the GAP feature; the original figure is not reproduced here.]
The second part of the solution is to communicate information about the missing element to the position where it can be found. Since communication in a phrase structure is only possible between a parent node and its immediate children, the information about the GAP somehow needs to be passed upwards in the tree. Transformational grammars propose various kinds of movement operations; non-transformational grammars such as SBCG use feature percolation, as shown in example (8).
As a final part of the solution, a filler−gap construction introduces the filler of the gap, which prevents the GAP feature from being percolated any further. As a consequence, there is an additional S-node, as can be seen at the top of example (8). Please note that these syntactic trees are only visualizations for the sake of convenience and that the actual representations in SBCG consist of feature−value pairs.
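The three steps can be summarized in a toy sketch. The dictionaries and functions below are illustrative assumptions, not SBCG’s actual formalism; they only mimic the lexical rule, the upward percolation of GAP, and the discharging filler−gap construction:

```python
# A toy sketch of the filler-gap machinery (illustration only): a lexical
# rule moves a valent from VAL to GAP, phrasal combination percolates GAP
# upwards, and a filler-gap construction discharges it at the extra S-node.

def extraction_lexical_rule(entry, valent):
    """Step 1: move one valent from the VAL list to the GAP list."""
    assert valent in entry["VAL"]
    return {**entry,
            "VAL": [v for v in entry["VAL"] if v != valent],
            "GAP": entry["GAP"] + [valent]}

def project(mother_cat, daughter):
    """Step 2: the mother node inherits (percolates) the GAP value."""
    return {"CAT": mother_cat, "GAP": list(daughter["GAP"])}

def filler_gap(filler_cat, clause):
    """Step 3: the filler cancels the GAP, stopping the percolation."""
    assert clause["GAP"] and clause["GAP"][0] == filler_cat
    return {"CAT": "S", "GAP": clause["GAP"][1:]}

hit   = {"CAT": "V", "VAL": ["NP[subj]", "NP[obj]"], "GAP": []}
hit_x = extraction_lexical_rule(hit, "NP[obj]")  # 'hit _', GAP: ['NP[obj]']
s     = project("S", project("VP", hit_x))       # GAP percolates via VP to S
top   = filler_gap("NP[obj]", s)                 # 'What did the boy hit?'
print(top)                                       # {'CAT': 'S', 'GAP': []}
```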
3 A cognitive-functional alternative
The alternative proposed in this paper situates itself within a cognitive-functional tradition in linguistics (a.o. Croft, 2005; Dik, 1997; Geeraerts, 2008; Goldberg, 2006; Halliday, 1973; Nuyts, 2011; Verhagen, 2005). While generative analyses are restricted to (process-neutral) competence models of grammar (Chomsky, 1965; Sag & Wasow, 2011), a cognitive-functional approach is also concerned with the dynamic mappings between meaning and form that are employed by speakers for expressing their conceptualizations of the world (= production) and by listeners for analyzing utterances into a meaning (= parsing).
3.1 A multi-perspective linguistic analysis
Even though the filler−gap analysis is adopted in all kinds of linguistic schools (e.g., Dabrowska, 2008), it is only a logical necessity if one (as in SBCG) directly couples all “interpretative and use conditions” (Michaelis, 2013, p. 138) to phrase structure (i.e., the syntactic tree). Such an approach uses “the primitives of a tree representation, namely, linear order, dominance (but not multi-dominance) relations and syntactic category labels, […] to represent several types of information that seem quite dissimilar in nature” (Kaplan & Zaenen, 1995, p. 137).
Functional linguistics, on the other hand, typically separates different linguistic perspectives that are on equal footing with the phrase structure perspective. For example, Functional Grammar has a model where multiple layers contribute an utterance’s spatio-temporal perspective, its illocutionary perspective, and so on (Hengeveld, 1989). In many theories, these different perspectives are separate layers that interact with each other through linking rules (Nuyts, 2011).
Fluid Construction Grammar also assumes that there are different linguistic perspectives, but does not split them in separate layers. The reason for avoiding separate layers is that different linguistic perspectives often overlap with one another in intricate ways, so there can never be a clear-cut distinction. Indeed, one of the foundations of construction grammar is that all linguistic knowledge is represented in the same way, and that constructions can cut vertically across different levels of organization (Croft, 2005). FCG, which also uses feature structures, thus represents different linguistic perspectives in terms of feature−value pairs in the same feature structure. It is therefore important to keep in mind that all visualizations used in this paper should be considered as infographics that only emphasize partial information that actually belongs to a feature structure underneath. Figure 1, for instance, displays a coupled feature structure that explicitly visualizes the phrase structure perspective.
Fig. 1 Fluid Construction Grammar represents all linguistic information as a coupling between a semantic and a syntactic feature structure. Different linguistic perspectives, such as an utterance’s phrase structure, its functional structure, or its illocutionary force, are explicitly represented as feature−value pairs in the same coupled feature structure. This Figure, which is taken from the FCG-system’s web interface and only shows part of the coupled feature structure, visualizes the phrase structure perspective as a syntactic tree.
3.2 Argument realization in FCG
Argument realization in Fluid Construction Grammar differs from SBCG in three important ways. First, there is no locality of selection in FCG because FCG decouples an utterance’s functional structure from its phrase structure. Instead, the valence of a verb contains pointers to other feature−value pairs in the feature structure regardless of their position in the utterance’s phrase structure. Formally, the FCG-system organizes features into a flat list of units. A unit can be considered as a place holder for grouping together certain feature−value pairs, as shown in example (9).
Any collection of feature−value pairs can have their own unit. Each unit has a unique symbol as its name, which can be used for pointing to that unit’s collection of features, as shown in example (10). Note that the unit-names in the example start with a question mark, which indicates that they are variables that still need to be bound to an actual value. In the current example, this means that the lexical entry does not ‘know’ which units it needs to combine with: the syntactic valence of hit simply states that the verb can be combined with a subject and an object, but it is agnostic as to where those units are located in the utterance.
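The following sketch, under the simplifying assumption that a transient structure is just a flat list of Python dictionaries, illustrates how a valence can point to units by variable rather than by position; the feature names are my own shorthand, not the FCG-system’s actual notation:

```python
# A sketch of units and variables: symbols starting with '?' are
# variables that processing must bind to actual unit names.

hit_entry = {"unit": "?hit-unit",
             "syn-valence": {"subject": "?subj-unit",
                             "object": "?obj-unit"}}

transient = [{"unit": "the-boy-unit", "fn": "subject"},
             {"unit": "hit-unit"},
             {"unit": "the-ball-unit", "fn": "object"}]

def bind(valence, units):
    """Bind the valence variables to actual unit names, regardless of
    where those units sit in the utterance's phrase structure."""
    return {role: next(u["unit"] for u in units if u.get("fn") == role)
            for role in valence}

print(bind(hit_entry["syn-valence"], transient))
# {'subject': 'the-boy-unit', 'object': 'the-ball-unit'}
```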
The second difference is that the valence of a verb in FCG only represents its combinatorial potential: it only specifies the kinds of units that a verb might be combined with, but it does not commit itself to an actual argument realization pattern. Verbs can therefore directly interact with multiple argument structure constructions without the intervention of lexical rules. The actual argument realization is determined by an argument structure construction, which selects the valents that it requires from the verb’s combinatorial potential.
Figure 2 provides an illustration for the verb form sent. As can be seen in the top left, the syntactic valence of sent includes four potential valents. As can be seen at the bottom left, a transitive construction only requires the verb to be combined with a subject and a direct object. A ditransitive construction, however, selects three syntactic valents from the verb’s combinatorial potential as shown in the top right. A passive caused-motion construction, shown at the bottom right, selects a subject and an oblique.
Fig. 2 In FCG, verbs interact directly with multiple argument structure constructions without the intervention of lexical rules. Verbs provide their semantic and syntactic combinatorial potential, from which argument structure constructions select what they need. Grammatical constructions may also extend a lexical entry’s valence through coercion. For a formal description of argument realization in FCG, see van Trijp (2011).
Finally, there is no direct coupling between a verb’s syntactic and its semantic valence (i.e., semantic roles such as agent, patient, and so on). In lexicalist approaches, for instance, the subject always maps onto the agent unless a lexical rule (e.g., a passive rule) changes this mapping. In FCG, on the other hand, argument structure constructions decide how an utterance’s functional structure maps onto its argument structure. As can be seen in Figure 2, the active argument structure constructions map the syntactic subjects onto the semantic role of agent. The passive construction, shown at the bottom right, maps the subject onto the patient role.
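A hedged sketch of these two ideas together: the verb contributes only its combinatorial potential, and each argument structure construction selects from it and contributes its own mapping from functional structure to semantic roles. The construction inventory and role labels below are illustrative assumptions loosely based on Figure 2:

```python
# Valence as combinatorial potential: the lexical entry lists valents the
# verb *may* combine with; each argument structure construction selects
# what it needs and adds its own functional-to-semantic role mapping.

sent_potential = {"subject", "object", "indirect-object", "oblique"}

constructions = {
    "transitive":   {"selects": {"subject", "object"},
                     "roles": {"subject": "agent", "object": "patient"}},
    "ditransitive": {"selects": {"subject", "object", "indirect-object"},
                     "roles": {"subject": "agent", "object": "patient",
                               "indirect-object": "recipient"}},
    "passive-caused-motion":
                    {"selects": {"subject", "oblique"},
                     "roles": {"subject": "patient", "oblique": "path"}},
}

for name, cxn in constructions.items():
    assert cxn["selects"] <= sent_potential   # select only from the potential
    print(name, "->", cxn["roles"])
```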
3.3 Handling long-distance dependencies
Because argument structure constructions regulate the mapping between an utterance’s functional structure and its argument structure without necessarily consulting the utterance’s word order, the problem of a gap disappears: all of the arguments are present in the utterance, so there simply is no gap! Instead, we only need to explain two things: (a) Why does a speaker choose different word orders?; and (b) How can a listener identify ‘who did what to whom’ if word order is not fixed?
This paper hypothesizes that long-distance dependencies spontaneously emerge as a side effect of the enormous challenges of communication. Speakers not only have to communicate about a particular state-of-affairs (SoA), they also need to indicate how the SoA fits within the larger discourse context (= information structure) and what their intentions are in producing the utterance (= illocutionary force). Time and again, the speaker must figure out how to best reconcile these different considerations within a single clause because it is impossible to spell out all the information in detail (given the spatio-temporal limitations of communication and the cognitive resources of language users). Grammar can thus be seen as a highly adaptive code that allows language users to do so.
Does this hypothesis hold for English? At first sight, the language does not seem to be very generous in helping its listeners to retrieve the argument structure of an utterance: the language has almost no case marking or agreement. There are, however, at least two reliable cues. First, the subject of a clause is almost always adjacent to the phrase that contains the clause’s main verb. Second, in terms of information structure, the language places the topic of a clause in clause-initial position, as illustrated in examples (11−13).
(11) John saw the man.
     subj/top   main-verb   d.obj
(12) The man John saw.
     d.obj/top   subj   main-verb
(13) I know [what you saw].
     subj/top   main-verb   [subcl.d.obj/top   subcl.subj   subcl.main-verb]D.OBJ
The above examples display canonical word order when the topic and the subject of a clause coincide with each other, whereas a different order appears when that is not the case. The same observation is valid for wh-questions, shown in examples (14−16).
(14) Who saw the man?
     subj/top   main-verb   d.obj
(15) Who has John seen?
     d.obj/top   aux   subj   main-verb
(16) Who did John see?
     d.obj/top   aux/obj-marker   subj   main-verb
One additional difficulty for questions is that there seems to be subject−auxiliary inversion except when the discourse topic is also the subject. The word ‘inversion’ suggests a deviation from the canonical order, and most linguistic analyses indeed posit different rules for subject-wh-questions and non-subject-wh-questions. But these analyses miss an important generalization, namely that the finite verb form of the main clause has to follow the topic of the question. In example (14), the topic coincides with the subject, so in simple clauses the main verb form is finite itself. In example (15), auxiliary−subject inversion then only appears as a side effect of adhering to three conventions: place the topic in first position, keep the subject and the main verb adjacent to each other, and have the finite verb form follow the topic of the question. However, when there is no auxiliary, as in example (16), the language calls on do-support in order to maintain all three conventions. Besides performing its verbal functions, auxiliary-do effectively behaves as an object marker in wh-questions.
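To see how the three conventions conspire, consider the following toy enumeration, a sketch under simplifying assumptions (phrases are treated as atomic tokens; this is not the FCG grammar itself). Both the ‘inverted’ order of example (15) and the do-supported order of example (16) fall out without any inversion rule:

```python
# Enumerate word orders satisfying the three conventions named above.

from itertools import permutations

def question_orders(phrases, topic, subject, main_verb, finite):
    """Keep orders where the topic comes first, the subject immediately
    precedes the main verb, and the finite form follows the topic."""
    return [" ".join(o) for o in permutations(phrases)
            if o[0] == topic
            and o.index(main_verb) == o.index(subject) + 1
            and o.index(finite) == 1]

# (15): the topic is the object 'who', so 'inversion' simply falls out.
print(question_orders(("who", "has", "John", "seen"),
                      "who", "John", "seen", "has"))   # ['who has John seen']
# (16): no auxiliary is available, so 'did' is recruited (do-support).
print(question_orders(("who", "did", "John", "see"),
                      "who", "John", "see", "did"))    # ['who did John see']
```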
In sum, if we manage to disentangle argument structure, information structure, and illocutionary force from phrase structure, we can eliminate the whole formal machinery needed for filler−gaps. In the following sections, I will show how such an analysis works for parsing and production in Fluid Construction Grammar. I will break down each process into different steps to illustrate the operationalization. Note, however, that these steps do not necessarily follow this order neatly. As is common to all construction grammars, the linguistic inventory in Fluid Construction Grammar is conceived as a syntax−lexicon continuum. The FCG-system will thus apply constructions as soon as it is appropriate to do so.
3.4 Parsing long-distance dependencies
Let us start from the viewpoint of the listener and look at how the utterances of examples (17−20) can be parsed. Examples (17) and (19) feature canonical word order and examples (18) and (20) involve long-distance dependencies. Some recent analyses have also argued that the subject in example (19) should be analyzed as being extracted (e.g., Bouma, Malouf, & Sag, 2001).
(17) The boy hit the ball.
(18) The ball the boy hit.
(19) Who has hit the ball?
(20) What did the boy hit?
3.4.1 The subject as vantage point and syntactic anchor
While processing an utterance, the FCG-system keeps all information about that utterance in a structure called the transient structure, which consists of a semantic feature structure coupled to a syntactic feature structure. At the beginning of a processing task, the transient structure is empty, but it grows more and more elaborate as more and more constructions add new information to it.
The implementation uses predominantly data-driven processing, which means that the linguistic structure that underlies an utterance is gradually built up in a bottom-up fashion. Parsing starts by segmenting the utterance into discrete forms, which are then categorized into words by morphological and lexical constructions, and which can then be grouped together as phrases (see Steels, 2011b, for a detailed account of lexico-phrasal processing in FCG). The parser will thus find similar constituents for all four utterances, as shown in examples (21−24). Since auxiliary-do in example (24) falls outside the immediate domain of the VP, it is not yet recognized as a member of the VP.
All of these phrases are disconnected, which means that the grammar still has to identify the relations between the phrases. One relation, the verb’s subject, can be identified through the subject’s position with respect to the main verb. The subject and the verb thus function as a syntactic anchor on which the listener can rely for disambiguating the roles of the other phrases. The subject also plays a significant conceptual role. Following Dik (1997), I assume that the subject marks the vantage point from which an event is presented (also see Croft, 1998). In the utterance The boy hit the ball, the vantage point is the boy who applies force on an object. In The ball was hit by the boy, the vantage point shifts to the undergoer.
The FCG-implementation explicitly represents the subject−verb anchor as a unit that takes the subject-NP and the VP as its subunits. Even though these two phrases do not form a constituent in the traditional sense, the cliticization of auxiliaries and the subject (e.g., I’m, you’ll, he’ll, etc.) show that there is some unity between them. Figure 3 illustrates how the event structure’s vantage point maps onto the SV-anchor.
Fig. 3 Listeners can reliably retrieve the subject of the clause through its position with respect to the main verb. The subject expresses the vantage point from which a speaker presents an event.
The rest of the functional structure of an utterance can almost always be properly disambiguated once the subject has been identified. In all four examples, the other noun phrase has to be the direct object because this is the only remaining option for a bare NP in a transitive clause. Similarly, two bare NPs that precede the verb must be object and subject (as in the ball the boy hit). In the case of other clause types, English offers additional cues. For instance, two bare NPs that follow the verb must be indirect object and direct object (as in the ditransitive utterance Nina sent her mother flowers). Oblique functions are marked by a preposition (as in The boy hit the ball with his bat).
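These disambiguation heuristics can be summarized in a short sketch, assuming the subject has already been anchored to the verb; the function and label names are illustrative, not the FCG-system’s actual constructions:

```python
# A toy sketch of the cues English offers for identifying functional
# structure once the subject-verb anchor is in place.

def assign_functions(pre_verbal_nps, post_verbal_nps, pps):
    """Assign grammatical functions to bare NPs and PPs around the verb."""
    fns = {}
    if len(pre_verbal_nps) == 2:       # 'The ball the boy hit.'
        fns[pre_verbal_nps[0]] = "direct-object"
        fns[pre_verbal_nps[1]] = "subject"
    elif pre_verbal_nps:               # canonical SVO order
        fns[pre_verbal_nps[0]] = "subject"
    if len(post_verbal_nps) == 2:      # 'Nina sent her mother flowers.'
        fns[post_verbal_nps[0]] = "indirect-object"
        fns[post_verbal_nps[1]] = "direct-object"
    elif post_verbal_nps:
        fns[post_verbal_nps[0]] = "direct-object"
    for pp in pps:                     # obliques are marked by a preposition
        fns[pp] = "oblique"
    return fns

print(assign_functions(["the ball", "the boy"], [], []))
# {'the ball': 'direct-object', 'the boy': 'subject'}
```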
3.4.2 Argument linking
Since English provides sufficient cues for identifying an utterance’s functional structure, even in the case of long-distance dependencies, argument structure constructions do not have to care about word order. For instance, an active transitive construction can simply point to the verb’s subject and object by using its unit-name, no matter which position that unit takes in the utterance’s phrase structure. The listener can thus apply the transitive construction without uncertainty and retrieve ‘who did what to whom’ by mapping the functional structure onto the argument structure. Example (25) shows this process in a Goldbergian diagram of argument linking (Goldberg, 1995).
(25) [Goldbergian argument-linking diagram mapping the functional structure (subject, object) onto the argument structure (agent, patient); the original figure is not reproduced here.]
Figure 4 shows the result of argument linking, as performed by an active transitive construction, in the transient structure. The Figure only shows the three units and their feature−value pairs that are relevant for the discussion. In a first step, the argument structure construction checks the VP’s syntactic valence to find which units it points to as the subject and object. Should the valence still be underspecified, the argument structure construction can try multiple hypotheses. Second, once the subject and object units are found, the construction takes the value of their ARGS feature. The ARGS feature in the FCG-system can be considered as a pointer to the referent of that unit in the discourse context, so the variable ?referent-the-boy is bound to the boy that the speaker is talking about, and the variable ?referent-the-ball is bound to the actual ball of the current event. The construction will now repeat these variable names in the semantic valence of the VP, thereby indicating that the boy is the agent and the ball the patient. Finally, the mapping between the argument structure and the event-participant structure was already specified in the lexicon.
Fig. 4 This shows the result of how an active transitive construction has mapped the functional structure of an utterance onto its argument structure in parsing. First, the construction has retrieved the subject and object units by using the unit-names specified in the VP’s syntactic valence. Second, the construction has linked the referents of those units to the semantic valence of the verb by repeating the same variable names.
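As a rough sketch of this linking step, the following code repeats the referent variables of the subject and object units in the verb’s semantic valence, assuming a heavily simplified transient structure; unit and feature names are shorthand for what Figure 4 actually shows:

```python
# A sketch of argument linking via shared variables (illustration only).

transient = {
    "the-boy-unit":  {"args": "?referent-the-boy"},
    "the-ball-unit": {"args": "?referent-the-ball"},
    "vp-unit": {"syn-valence": {"subject": "the-boy-unit",
                                "object": "the-ball-unit"},
                "sem-valence": {"agent": None, "patient": None}},
}

def apply_active_transitive(ts):
    """Repeat the referent variables of the subject and object units in
    the verb's semantic valence, thereby linking agent and patient."""
    vp = ts["vp-unit"]
    vp["sem-valence"]["agent"] = ts[vp["syn-valence"]["subject"]]["args"]
    vp["sem-valence"]["patient"] = ts[vp["syn-valence"]["object"]]["args"]
    return ts

print(apply_active_transitive(transient)["vp-unit"]["sem-valence"])
# {'agent': '?referent-the-boy', 'patient': '?referent-the-ball'}
```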
3.4.3 Information structure and illocutionary force
Finally, the listener can retrieve the discourse status of the referents through information structure constructions and the intended goal of the speaker through speech act or illocutionary force constructions.
In terms of information structure, I have implemented two main pragmatic functions based on Dik (1997, Ch. 13): topic and focus. Topicality is defined in terms of aboutness: the topic of an utterance is what the utterance is ‘about’. Focality is defined in terms of salience: focus is used for highlighting the most important information given the current communicative setting. Both topic and focus are prototypes and can be divided into different subtypes.
In the implementation, the topic construction assigns the pragmatic role of topic to the phrase that is in clause-initial position. Since parsing occurs incrementally from left-to-right, the topic construction is always the first information structure construction to apply. Indeed, in actual parsing, this construction will already have identified the topic before all utterance constituents have been identified. So in examples (17−20), the phrases the boy, the ball, who, and what are identified as the topic.
All four examples also feature a kind of information gap focus whereby the speaker provides new information that the listener is assumed to be missing (i.e., the ball in examples (17) and (18)), or where the speaker solicits information from the listener (i.e., who and what in examples (19) and (20)). In the first case, which Dik (1997, p. 333) calls ‘completive focus’, focus is not associated with a particular position in the utterance. In the latter case, called ‘questioning focus’, the focus element takes up first position. Indeed, in wh-questions, the pragmatic roles of topic and focus are assigned to the same phrase. Examples (26−29) show the pragmatic functions of the four utterances.
(26) The boy hit the ball.
     topic              completive-focus
(27) The ball the boy hit.
     topic+completive-focus
(28) Who hit the ball?
     topic+questioning-focus
(29) What did the boy hit?
     topic+questioning-focus
Finally, speech act constructions help the listener to identify the speaker’s intentions. In examples (26) and (27), the declarative construction identifies the utterance as an affirmative assertion. The declarative construction checks whether the subject−verb anchor of the utterance contains the finite verb form, and whether the subject involves a noun phrase that either identifies a referent (e.g., the boy) or construes one (e.g., a boy). It should thus not start with a wh-word (e.g., which boy). In examples (28) and (29), the interrogative construction identifies the utterance as a request for information. The interrogative construction checks whether the finite verb form immediately follows a phrase that has been assigned questioning focus. Additionally, the interrogative construction ‘knows’ that the finite verb is a subunit of the clause’s VP, so in the case of example (29), auxiliary-do is recognized as a discontinuous element that belongs to the same VP as hit does.
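A minimal sketch of the topic and questioning-focus assignment described in this subsection; function and role names are illustrative assumptions, not the FCG-system’s API:

```python
# Toy information-structure assignment for examples (26-29).

def assign_information_structure(phrases, wh_words=("who", "what")):
    """The clause-initial phrase is the topic; wh-phrases receive
    questioning focus, so in wh-questions both roles coincide."""
    roles = {p: set() for p in phrases}
    roles[phrases[0]].add("topic")
    for p in phrases:
        if p.lower() in wh_words:
            roles[p].add("questioning-focus")
    return roles

print(assign_information_structure(["what", "did the boy hit"]))
# e.g. {'what': {'topic', 'questioning-focus'}, 'did the boy hit': set()}
```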
3.4.4 Summary
The above discussion shows that the same set of constructions is able to parse alternative utterance types without the need for additional formal machinery. Instead, constructions need to be flexible enough to combine freely with each other. Figure 5 illustrates this principle for the two declarative utterances The boy hit the ball and The ball the boy hit. The Figure shows the two utterances (in italics) and the parsed meanings (in regular font). For the sake of convenience, meanings are represented using a first-order predicate calculus with prefix notation. The implementation, however, uses a computational framework for Embodied Cognitive Semantics (Spranger, Pauw, Loetzsch, & Steels, 2012). All lines marked by ‘1’ involve meanings that are shared by the utterances; ‘2’ leads to a difference in meaning.
Fig. 5 This shows that parsing the two utterances The boy hit the ball and The ball the boy hit yields the same meanings (indicated by ‘1’) except for which phrase is the topic of the utterance (indicated by ‘2’). In both cases, however, the same constructions have been applied without any special rules for analyzing the second utterance (featuring ‘topicalization’).
Both utterances yield almost the same meanings except for which phrase refers to the topic of the utterance. Both utterances were parsed using four lexical constructions (the, boy, hit, and ball), two phrasal constructions (NP- and VP-construction), the active transitive construction, the subject−verb anchor construction, the target construction (i.e., the secondary vantage point or ‘target’ of the event profile, which maps onto the object of the utterance), and the completive focus construction. Both utterances also involve the same topic construction, but due to the different word order, a different phrase is identified as the topic.
3.5 Producing long-distance dependencies
The task of the speaker is different from that of the listener because the speaker decides what to say, which means that (s)he already knows which event structure applies to the meanings that (s)he wants to express, and what the discourse status is of the referents (s)he wishes to talk about. In other words, the speaker needs to choose how to verbalize his or her conceptualizations into an utterance.
3.5.1 Deciding what to say
First, the speaker conceptualizes what to say. As is common practice in cognitive linguistics, we’ll assume a Frame Semantics approach here for handling events (Fillmore, 1977). According to the FrameNet database (Baker, Fillmore, & Lowe, 1998), a verb like hit can take two to nine ‘core’ frame elements and dozens of additional ones depending on its meaning. In the sense of hitting a target, the hit-frame takes two core elements (an Agent and a Target) and seven non-core elements (including Instrument, Manner, Means, Place, Purpose, Subregion, and Time). The verb is attested in dozens of different argument realization patterns.
Event profile. It is well known that speakers almost never express all of the Frame Elements associated with a particular frame, but instead profile only those elements that are relevant for the speaker’s communicative goals (Croft, 1998). In examples (17−20), the speaker profiles the core elements ‘hitter’ (the boy) and ‘hit target’ (the ball). The speaker also situates the event vis-à-vis the moment of speaking. The verbal expressions hit and did hit indicate that the speaker has conceptualized the state-of-affairs in all four examples as an event that took place in the past.
Along with the event profile, the speaker chooses the vantage point from which the event should be viewed. In all four examples the ‘hitter’ is taken as the vantage point, and the ‘hit target’ is taken as the target towards which the action is directed. The vantage point and the target of the utterance are mapped onto the syntactic functions of subject and object, and the decision to use the most agent-like participant as the vantage point will lead to the use of the active voice.
Referent accessibility. For each Frame Element that is part of the event profile, the speaker needs to signal to the listener whether its referent should be construed or identified (Dik, 1997, p. 130). Identifiable referents are referents that the speaker assumes can be retrieved by the listener because of shared background knowledge or because of earlier reference in the discourse (also see Lambrecht, 1994, for a thorough discussion on the identifiability and accessibility of referents). These referents are typically expressed as determined phrases (e.g., the boy). Constructive reference, then, means that the listener does not retrieve a specific referent, but must instead construe one. Construed referents are typically introduced through indeterminate phrases in English (e.g., a boy). In the case of wh-phrases (e.g., who or what), the speaker expects the listener to make the choice between referent construal or identification in his or her answer.
Discourse status and illocutionary force. The speaker also decides how to package the information structure of the utterance. In examples (17) and (19), the speaker introduces the subject as the topic of the utterances. In examples (18) and (20), the object is the topic. Finally, the speaker must convey a particular intention through the utterance. In the two declarative utterances, the speaker expresses a commitment to the truth-value of the utterance, whereas in the two questions, the speaker inquires about a certain referent.
3.5.2 Placement of constituents
Once the speaker has conceptualized what to say, he or she starts a production task to verbalize that meaning into an utterance. This involves applying the lexical constructions for the words the, boy, hit, ball, what, who, and did; nominal and verbal phrasal constructions; the transitive construction; the subject−verb anchoring construction; and the target construction. Up until this point, no word order constraints have been specified except for the fact that the subject must precede the phrase that contains the main verb.
In a final step, information structure constructions take care of the placement of each constituent. Placement should not be confused with movement: there is no underlying structure in which constituents are moved to a different place. The FCG-system is an order-independent formalism that explicitly declares word order constraints as feature structures themselves, hence ‘placement’ simply means that constructions declare in which order constituents need to be uttered.
First, the topic construction places the topic in clause-initial position, followed by the remainder of the utterance. So, for example (17), the boy is uttered in first position, and for example (18), the ball is uttered at the beginning of the clause. In the case of more complex clauses, such as the ditransitive construction, additional word order constraints can be added by any kind of construction. For instance, even though argument structure constructions can be completely disconnected from word order, the ditransitive construction imposes a strong preference for the indirect object to follow the verb, as in Nina sent her mother a dozen roses (Goldberg, 2006). Alternatively, information structure constructions may impose ordering constraints depending on the discourse status of phrases.
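As an illustration of ‘placement without movement’, the sketch below declares ordering constraints as plain data and computes a linearization that satisfies them. The toy topological sort is an expository assumption, not the FCG-system’s actual renderer:

```python
# Constructions contribute ordering constraints as declarative facts;
# rendering then finds an order that satisfies all of them.

def linearize(units, precedes):
    """Order units so that every (x, y) in 'precedes' puts x before y."""
    order, remaining = [], set(units)
    while remaining:
        # pick a unit that no other remaining unit must precede
        free = sorted(u for u in remaining
                      if not any(x in remaining and y == u
                                 for x, y in precedes))
        order.append(free[0])
        remaining.remove(free[0])
    return order

units = ["np-the-boy", "vp-hit", "np-the-ball"]
constraints = [("np-the-ball", "np-the-boy"),  # topic construction: topic first
               ("np-the-boy", "vp-hit")]       # subject precedes the VP
print(linearize(units, constraints))
# ['np-the-ball', 'np-the-boy', 'vp-hit']  -> 'The ball the boy hit.'
```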
4 Discussion
The previous section gave an informal overview of how the FCG implementation parses and produces long-distance dependencies, and readers who are interested in the formal and computational details are invited to check how everything works through an online web demonstration at http://www.fcg-net.org/demos/. In this section, I will discuss how this paper’s proposal compares to the filler−gap analysis.
4.1 Theoretical parsimony
From a purely theoretical point of view, the filler−gap analysis is only a logical necessity if one either assumes a basic structure from which all other structures are derived, or if one assumes that all linguistic constraints are bound by locality (as in a rigid phrase structure grammar). In both cases, special features (such as GAP), rules (such as lexical rules and filler−gap constructions), and formal machinery (such as feature percolation) are required for analyzing all deviations from the basic structure.
This paper’s proposal is theoretically more parsimonious because it entirely eliminates the need for those additional rules and mechanisms. In FCG, constructions can be freely combined as long as there is no conflict, so different conceptualizations will lead to differences in how the same constructions interact with each other. Long-distance dependencies thus emerge spontaneously as a side effect of the need to express multiple linguistic perspectives within a clause, including the event-participant structure, the argument structure, functional structure, information structure, and illocutionary force.
4.2 Explanatory adequacy
Theoretical parsimony only matters if an analysis of a particular phenomenon is also compatible with its empirical facts and able to explain them. The filler−gap analysis has received considerable support from experiments in psychology (Gibson, 1998, 2000) and linguistic typology (Bouma et al., 2001). However, both sources of evidence can also be explained within this paper’s alternative. Moreover, the cognitive-functional alternative offers a better fit with data on language change.
4.2.1 Reviewing the psychological evidence
In psychology, there exists robust evidence from reading tasks and eye-tracking experiments that long-distance dependencies generally require more processing effort than sentences with canonical word order. According to the Dependency Locality Theory (DLT) of Gibson (1998, 2000), this additional processing complexity is caused by two types of cost: (a) an integration cost for integrating new information with the structures that have already been built during processing; and (b) a memory cost for storing information needed for later. Gibson (1998, p. 12) argues that the integration cost increases with the number of different discourse referents that intervene before a phrase can be integrated with its head. Consider examples (30) and (31), which show the syntactic dependency relations between words in an utterance, the referents that are introduced by each word, and the associated integration cost.
(30−31) [Dependency diagrams showing, for each word, the discourse referents introduced and the associated integration costs; the original figure is not reproduced here.]
Example (30) first introduces a referent for the man. When the verb form visited is encountered, its subject who needs to be integrated with it. This induces 0 cost because integration is local. The referent of the event itself has a cost of 1. When later on the verb form drive is encountered, the integration cost is 1+3: 1 for the verb itself, and 3 for integrating the verb’s subject the man (1 for the man itself and 2 for the intervening referents). The integration cost of example (31) is higher, because here the referent of the subject of the relative clause intervenes before who can be integrated with visited.
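The cost bookkeeping for example (30), as described above, amounts to a small calculation; the sketch below is a back-of-the-envelope rendering of Gibson’s metric, not a full DLT implementation:

```python
# DLT integration cost for example (30), following the description above.

def integration_cost(new_referents, dependent_distance):
    """1 per new discourse referent at the head, plus 1 for the dependent
    and 1 per discourse referent intervening since that dependent."""
    return new_referents + dependent_distance

cost_visited = integration_cost(1, 0)      # event referent; 'who' is local
cost_drive   = integration_cost(1, 1 + 2)  # event + subject across 2 referents
print(cost_visited, cost_drive)            # 1 4
```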
Gibson (1998, 2000) has couched his experiments in terms of the filler−gap analysis. However, the predictions of the DLT are also fully compatible with the FCG analysis. In FCG, lexical constructions introduce variables for their referents, but it is only when the verb is encountered that the listener can link these variables to the meaning of the verb. The more discourse referents are introduced before argument linking takes place, or the longer a referent must be maintained in memory, the higher the processing cost.
The DLT is essentially a backtracking approach where processing complexity is measured in terms of an integration cost for preceding referents. Another, fully complementary research stream in psychology focuses on the role of anticipation and prediction in processing complexity. For instance, Surprisal Theory (Hale, 2003; Levy, 2008) assumes that words that are harder to predict given the previous context are also harder to process. As it turns out, when utterances contain elements that allow the listener to make reliable predictions, the DLT’s integration cost is reduced or even eliminated. Demberg and Keller (2008) found that the integration cost for nominal dependents decreases when the verb is preceded by an auxiliary. Other experiments have also discovered antilocality effects whereby increasing argument−verb distance can actually facilitate processing (Vasishth & Lewis, 2006).
The latter psychological data seem to favor this paper’s cognitive-functional analysis over the filler−gap approach. In the filler−gap analysis, long-distance dependencies always create a processing overhead in terms of the number of constructions that need to be applied (i.e., filler−gap constructions and lexical rules), suggesting that such utterances should always be more difficult to process. In the FCG analysis, however, there is no such overhead. So when elements are encountered that help the listener to anticipate what is coming next (for example, auxiliary-do explicitly functions as an object marker in wh-questions, so the listener knows that the subject will follow next), the FCG analysis indeed predicts that processing complexity goes down.
4.2.2 Reviewing the typological evidence
Another source of evidence for the filler−gap analysis comes from linguistic typology. Bouma et al. (2001) provide examples from several languages in which some grammatical phenomena occur exclusively on extraction paths or exclusively in filler−gap constructions.
The typological evidence is, however, unreliable for two reasons. First, even if it were true that some languages use filler−gap constructions, it would not automatically follow that the same analysis is valid for all languages. On the contrary, there is now a broad consensus among leading typologists that such language universals do not exist (Croft, 2005; Dryer, 1997; Evans & Levinson, 2009; Haspelmath, 2007, 2010). Even phrase structure, the prerequisite for positing a gap in the first place, is not present in all languages (Evans & Levinson, 2009). As for English, historical data show that its current phrase structure only gradually emerged over time and is still evolving towards more complexity today (Van de Velde, 2011).
The second and more important reason is that the evidence is circular: when there is a deviation from the most common pattern, a filler−gap analysis is posited; when there is a filler−gap, a deviation from the most common pattern is expected. Moreover, as shown in this paper, special marking of long-distance dependencies can also be explained (and is to be expected) as a side effect of combining constructions in different ways in order to express different conceptualizations. For example, whereas the filler−gap analysis cannot explain why do-support does not occur in wh-questions where the subject is assigned questioning focus, this follows naturally from the interaction of different linguistic perspectives in this paper’s approach.
4.2.3 Adding historical evidence
Formal accounts of the filler−gap analysis have mainly emphasized the mathematical and structural properties of long-distance dependencies. Such an account does not motivate why English exhibits certain patterns. In the FCG analysis, on the other hand, different patterns are functionally motivated, so they offer a better explanation for the structural properties of English.
The functional motivation of different patterns is corroborated by evidence from historical linguistics. Let us take the evolution of English auxiliaries as an example. Hudson (1997) writes that auxiliaries only emerged in Old English, and that in the Middle English period they could still be used in roughly the same way as full verbs. At this period in time, auxiliary-do did not yet have any special behaviors. In the past millennium, however, a number of highly frequent constructions evolved in which auxiliaries and full verbs are treated differently, and in which the distribution of grammatical properties between these two verb classes changes dramatically (e.g., only auxiliaries can be negated or precede the subject).
Hudson (1997) argues that there are good functional reasons for separating auxiliaries from full verbs. One reason is that it is slightly easier to process utterances if the words that mark, e.g., negation and questions are distinct from the words that indicate lexical content. Moreover, auxiliaries allow the finite verb form to occur early in the sentence while adverbs can remain adjacent to the verbs they modify (as in Have you ever witnessed an accident?). The data show that the separation between auxiliaries and full verbs is accompanied by the rise of do-support, which becomes obligatory in the absence of any other auxiliary (as in non-subject-questions, negation, tag questions, and so on).
The FCG analysis is compatible with these data. As explained in the previous sections, auxiliaries are not merely part of the verb phrase, but serve other communicative functions as well. For instance, in wh-questions they explicitly function as markers of focus and illocutionary force, and auxiliary-do additionally functions as a non-subject marker. Auxiliary-do is also recruited when the language requires a finite verb form in a particular position that is different from the position that was assigned to the main verb (as in What did the boy hit?).
Moreover, English auxiliaries have continued to evolve. For instance, they have formed a new class of clitics (Anderson, 2008). Cross-linguistic research shows that cliticization is a typical development for auxiliaries as the result of coalescence (Haspelmath, 2011). Coalescence arises when function words become ‘glued’ to a related content word. From the viewpoint of a strict phrase structure analysis, this reanalysis of function words into morphological markers is problematic because it requires the function word to cross a phrasal boundary. In the FCG analysis, however, this problem does not occur because there are multiple linguistic perspectives that overlap with each other. In the implementation, such overlapping occurs when units have multiple parents (as opposed to a single parent in a syntactic tree). Example (32) illustrates some of the relations that might exist between different units. Phrase structure relations are indicated by a full line, relations from a different linguistic perspective use a dotted line.
As can be seen, there is a unity between NP-UNIT-1 and AUX because the auxiliary functions as a focus marker besides also carrying verbal properties. There is also a unity between NP-UNIT-2 and VP-UNIT because their relative positions indicate that the boy is the subject of the utterance. The analysis thus already implies that a unit can perform multiple functions at the same time, which might pressure a language into developing more explicit markings for those functions.
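Such a multi-parent organization can be sketched as a small graph. The FOCUS-UNIT and SV-ANCHOR groupings below are my own illustrative labels for the dotted-line relations of example (32), not the grammar’s actual unit names:

```python
# A sketch of units with multiple parents, encoded as a toy edge list.

edges = [
    ("CLAUSE", "NP-UNIT-1", "phrase-structure"),   # full lines
    ("CLAUSE", "NP-UNIT-2", "phrase-structure"),
    ("CLAUSE", "VP-UNIT", "phrase-structure"),
    ("VP-UNIT", "AUX", "phrase-structure"),        # discontinuous auxiliary
    ("FOCUS-UNIT", "NP-UNIT-1", "focus-marking"),  # dotted: aux marks focus
    ("FOCUS-UNIT", "AUX", "focus-marking"),
    ("SV-ANCHOR", "NP-UNIT-2", "functional"),      # dotted: subject-verb unity
    ("SV-ANCHOR", "VP-UNIT", "functional"),
]

def parents(unit):
    """A unit may have several parents, one per linguistic perspective."""
    return [(p, label) for (p, child, label) in edges if child == unit]

print(parents("AUX"))
# [('VP-UNIT', 'phrase-structure'), ('FOCUS-UNIT', 'focus-marking')]
```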
4.2.4 Relation to Linearization-based HPSG and SBCG
In Section 2, I sketched the filler−gap analysis within the framework of SBCG, which it inherited from HPSG. However, there is an important extension of ‘default’ HPSG (and recently also SBCG) called Linearization (see, e.g., Reape, 1994; Kathol, 2000) that pursues solutions similar to the one proposed in this paper.
Indeed, despite its name, HPSG is not a ‘phrase structure grammar’ in the sense that it does not really use phrase structure trees for representing linguistic knowledge (although most computational implementations of HPSG grammars use a phrase structure backbone for reasons of efficiency; Levine & Meurers, 2006). Instead, HPSG uses typed feature logics as its formalism (Richter, 2004), which is far more expressive than phrase structure.
Linearization-based HPSG grew out of the need to cope with languages that exhibit more freedom in word order than English, such as German (Reape, 1994; Kathol, 2000). The key idea behind linearization is that word order should be separated from local trees (or immediate dominance rules). Instead, word order is specified at a separate level of order domains, which can include information from several local tree configurations. Just like constituents can be combined to form larger phrases, such domains can be combined to form larger domains (Daniels & Meurers, 2004).
The introduction of a new level of word order domains effectively creates an additional linguistic perspective in HPSG, similar to the different linguistic perspectives in FCG. Indeed, linearization-based HPSG has reanalyzed several word order variations in terms of linearization rules rather than in terms of extraction. However, to my knowledge, no proposal exists that manages to completely eliminate the need for extraction (see Müller, 2005, for a critical discussion of linearization). One reason is that linearization cannot cross sentence boundaries while extraction can (Stefan Müller, p.c.).
FCG never imposes such restrictions on word order. The FCG-formalism, which uses term unification of untyped feature structures (Knight, 1989), allows all word order constraints to simply point to units using unit-names or variables (which need to be bound to unit-names). For instance, (meets unit-x unit-y) states that unit-x must be adjacent to unit-y, even if unit-y is structurally a member of a different clause or sentence. Such constraints can thus accommodate even long-distance dependencies in which a phrase from a subclause takes sentence-initial position, as in Who do you think you are? In principle, it should be possible to express such constraints in HPSG as well, but it remains to be investigated how this can be done using typed feature logics.
4.2.5 Relation to other prior work
Finally, it should also be noted that there have been several research streams that have formulated the filler−gap approach using direct long-distance relations between verbs and their dependents rather than using local constraints and feature percolation. The most relevant approaches for our discussion are Lexical-Functional Grammar (LFG; Kaplan & Zaenen, 1995) and Berkeley Construction Grammar (BCG; Kay & Fillmore, 1999).
The LFG approach and this paper share the idea that different linguistic perspectives should be separated from each other. However, whereas this paper distinguishes these perspectives within the same feature structure, LFG takes the more radical step of positing different layers for phrase structure (called c-structure) and functional structure (called f-structure). This separation, however, creates the need to couple the layers to each other in some way because they are not entirely independent. Kaplan and Zaenen (1995) have therefore proposed the technique of functional uncertainty, which makes use of regular expressions. Since the FCG-implementation only uses constructions instead of separate layers, there is no need for such techniques.
Just like LFG, Berkeley Construction Grammar has pursued a solution that uses regular expressions, called valence embedding, whereby verbs are able to reach their dependents at an arbitrary depth in the sentence structure (Kay & Fillmore, 1999). Moreover, BCG shares with this paper’s approach the vision that such constraints should be stated within constructions, without other separate layers. However, Müller (2006, pp. 856−859) has shown that the notion of set unification assumed by Kay and Fillmore (1999) is not sound, which means that the notion of valence embedding has never been made formally precise. Furthermore, the BCG approach still considers long-distance dependencies as deviations from a canonical word order because it requires specific constructions (such as the Left-Isolation Construction) for licensing LDDs, similar to filler−gap constructions.
5 Conclusion
This paper has reported on a computational implementation of long-distance dependencies in Fluid Construction Grammar. Its main goal was to show that it is entirely feasible to implement an explicit and formal account of LDDs in a cognitive-functional framework. Moreover, the paper argues that such an approach significantly improves upon the state-of-the-art formalizations of LDDs in generative grammars that adopt a filler−gap approach.
More specifically, this paper’s model improves upon such competence-only models in terms of explanatory adequacy: the structural properties of English long-distance dependencies are functionally motivated. They allow speakers to express different conceptualizations while remaining processing-friendly. I have shown that the approach is compatible with evidence from psychology, linguistic typology, and historical linguistics. Moreover, this paper’s account incorporates a processing model that works for both parsing and production.
Finally, the cognitive-functional approach is more parsimonious than the filler−gap analysis. Long-distance dependencies spontaneously emerge as a side effect of linguistic communication, in which speakers can freely combine the same constructions with each other in different ways as long as they are not in conflict with each other. Such an approach completely eliminates the need for gaps, fillers, valence-changing lexical rules, and formal mechanisms such as percolation.