
Gilbert Fanselow, Caroline Féry, Matthias Schlesewsky & Ralf Vogel (eds.), Gradience in grammar: Generative perspectives (Oxford Linguistics). Oxford: Oxford University Press, 2006. Pp. x + 405.

Published online by Cambridge University Press:  29 April 2008

Anders Søgaard
Center for Language Technology, University of Copenhagen, Njalsgade 80, DK-2300 Copenhagen S, Denmark. anders@cst.dk

Type: Book Review
Copyright © Cambridge University Press 2008

In October 2002, a conference took place at the University of Potsdam, about 30 kilometers outside of Berlin. The theme of the conference was the notion of gradience in grammar. Last time I was in Potsdam, a boy on the train asked me for directions. He wanted to know if this was the train for the Uni. The readers of this journal will all know this abbreviation, and so did I, but the little three-letter word illustrates one of the central research questions about gradience.

In Danish, students sometimes use the word uni to refer to universities, but it is less frequent than in German. The abbreviation may even sound a bit German to many Danes. In American English, it is simply not used. Should the Danish, German and American English grammars reflect these facts? Obviously, the word Uni should be in the German lexicon, since it is fully lexicalized, whereas the word uni has no role to play in the English lexicon. Should uni be in the Danish lexicon? Is it lexicalized, or somehow productively derived from the word universitet by abbreviation? If it is put in the lexicon, should its low frequency somehow be pointed out? If it is not, should the abbreviation rule somehow account for its markedness?

The papers from the 2002 conference in Potsdam have now been rewritten and published by Oxford University Press. The book contains 16 articles and an introduction by the editors. It brings together a number of views on gradience, from both phonology and syntax. In this review, I will focus on the articles that relate to gradience in syntax.

The 16 articles come in four parts. The first part is supposed to clarify what gradience is, or how it can be properly defined. I will focus on the articles by Eric Reuland and Antonella Sorace. Part II consists of articles that discuss the role of gradience in phonology. I will have nothing to say about these articles. The third part moves on to syntax and I will discuss its four articles in some detail. Finally, Part IV brings together three articles on gradience in wh-movement constructions.

The article by Reuland, albeit a bit inconsistent, is a welcome clarification of what gradience can mean and has meant to different people. He singles out four different senses: (1) gradience refers to the analogue rather than discrete nature of linguistic processes; (2) gradience refers to the fact that linguistic rules are tendencies rather than what he calls ‘system governed’; (3) gradience refers to variability in speakers’ grammaticality judgments; and (4) gradience refers to the scalar nature of grammaticality judgments. I am not convinced that this typology is exhaustive or even consistent, but some of the distinctions seem meaningful.

Reuland first argues against the mistaken view that since linguistic processes are biological processes it follows that these processes are analogue by nature (cf. 1). The either-or nature of neuronal signaling is enough to illustrate his point. He then observes that digital and analogue systems can mimic each other, as witnessed by our watches, and that things need be no different in our brain. Reuland notes at this point that certain linguistic phenomena, such as pitch or stress, seem to be truly analogue, but lie outside the domain of grammar.

The difference in frequency between Uni in German and uni in Danish can be stated in terms of tendencies (cf. 2). Reuland observes that some linguistic phenomena can only be stated in such ways; an example is the differences ‘between written cultures as to the average sentence length in their typical novels’ (p. 49). He then claims that, on the other hand, certain phenomena must be system governed: ‘No serious linguist,’ he says, ‘would be content claiming that English has a tendency to put the articles before the noun, and Bulgarian a tendency to put them after’ (p. 49). I think this statement is a bit too strong: I know a number of quite serious linguists who completely reject system-governed rules and state all such observations in terms of tendencies. Reuland notes that the cross-linguistic variation that is found in the expression of agents in passives is a good example of a borderline case. The solution Reuland seems most in favor of is to view tendencies as outside the domain of grammar; or in his own words, ‘one expects to find tendencies precisely there where principles of grammar leave choices open’. (p. 50)

Reuland rejects the relevance of inter-subject and intra-subject variability for linguistic theory (cf. 3), but seems to believe that linguistic theories should be able to express degrees of grammaticality (cf. 4). He takes up the Minimalist Program as an example and argues that in this framework it is particularly easy to talk about degrees of grammaticality. I think his discussion here is a bit misleading, however. The reason is that he reduces degrees of grammaticality to what I would think of as the issue of grammaticality versus acceptability: the examples discussed are the supposedly strictly ungrammatical John arrived Mary and the supposedly less so Sincerity hated John. (In my view the first sentence may be ungrammatical, but the second one certainly isn't.)

The Minimalist Program, Reuland argues, is better equipped to talk about the difference in grammaticality between such sentences than Government & Binding, for instance, because it is more modular. This too is a bit misleading. The modularity of theories does not, as I see it, have anything to do with their ability to distinguish between grammaticality and acceptability. Constraint-based theories such as Head-driven Phrase Structure Grammar (HPSG) are arguably non-modular, but by their constraint-based nature it is very easy to talk about degrees of grammaticality and the distinction between grammaticality and acceptability, as argued below.

Reuland's article ends with a discussion of anaphora. Government & Binding presents a categorical or system-governed approach to anaphora, whereas Optimality Theory, for instance, gives a more flexible account. Reuland argues, based on his earlier work, that binding is categorical by nature, but that the scope of binding theory, as opposed to co-reference phenomena, is more limited than what was supposed to be the case at the time of Government & Binding. In many respects, this echoes recent work by Ray Jackendoff under the banner of Simpler Syntax – or, for an even more radical position, Hagit Borer's exoskeletal version of the Minimalist Program.

Sorace and Reuland agree on most points, I think. In the second line of her article, she sums up the research hypothesis common to both authors: gradience is manifested not in syntax itself, but in its interface with other areas of grammar. She seems to provide some evidence for this by looking at first and second language acquisition, as well as first language attrition.

The main example concerns unaccusative and unergative intransitive verbs. Sorace introduces a so-called split intransitivity hierarchy that assigns every verb a position on a gradient scale from categorically unaccusative to categorically unergative. The hierarchy is motivated on largely semantic grounds; change-of-location verbs are unaccusative, non-motional-process verbs are unergative, and so on. She argues that verbs at both ends of the scale restrict their syntactic projections whereas other verbs, in the grey zone, are left underspecified. The position is supposed to be superior to the obvious lexicalist and constructionalist alternatives. Some of the evidence that Sorace reviews comes from second language acquisition, as already mentioned: Italian learners of French, for instance, find it more difficult to acquire correct auxiliaries for verbs at both ends of the scale. Similar observations can, apparently, be made in first language acquisition and across languages.

The evidence is not unambiguous. In fact, it seems to me that most observations listed in Sorace's article are fully compatible with both lexicalist and constructionalist approaches. If auxiliary selection is in part semantically motivated, which is what Sorace seems to say, the lexical representation of verbs that are not ambiguous with respect to the two properties in question, say existence-of-states verbs, will be more complex. Similarly, on the constructionalist approach, in cases where a verb does not clearly fall on either side, the speaker is left with a number of alternatives that may confuse her.

An observation in Sorace's article is that acquisition is not merely a matter of frequency, but is heavily influenced by such semantic ambiguities. The observation is valuable to the extent that many linguists pursue purely frequency-based linguistic theories these days; such theories seem bound to fail, as would any linguistic theory, I guess, if the division of labor between syntax and semantics is not properly defined.

It must be mentioned in passing that it is not entirely clear where the hierarchy that Sorace proposes comes from. It's obviously semantically motivated, and I've treated it as such, but in a few places Sorace seems to suggest that the hierarchy is also a part of grammar. If this is the full story, it of course leaves her with the question of how the hierarchy is acquired. It is a bit awkward to imagine that the position in the split intransitivity hierarchy of the word programming, for instance, is genetically hardwired into the human brain.

The first article in Part III is by John Hawkins. Contrary to Reuland and Sorace, Hawkins does predict gradience, in Reuland's sense 2, to arise in syntax proper. Hawkins formulates a principle that is supposed to capture gradience relating to the preferred word order of constituents of different lengths – or weights – where more than one option is possible. Consider, for instance, the case of multiple prepositional phrases following an intransitive verb. In English, there is a clear tendency to postpone the longer prepositional phrase. In Japanese, arguably because it is a head-final language, the reverse tendency is found: in the case of non-subject noun phrases with free internal order, the longer phrase typically precedes the shorter. This is explained by a principle that Hawkins formulates (p. 208). It roughly says that speakers (and listeners) try to minimize the length of the substrings that can be used to infer the selection frames of syntactic heads. Such a window-based approach outperforms predictions based solely on notions of incremental parsing precisely because it covers head-final cases too. The length of a window is, so to speak, order-independent.
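The minimization idea is easy to appreciate with a toy computation. The sketch below is entirely my own construction, not Hawkins' formalization; it simply counts the words a parser must traverse from the head until every dependent phrase has been recognized, on the assumption that a phrase is recognized at its first word:

```python
def domain_length(phrase_lengths):
    """Words traversed from the head until every dependent phrase is
    recognized; a phrase counts as recognized at its first word.
    (A toy stand-in for the notion of a recognition domain.)"""
    span = 1  # the head itself
    for length in phrase_lengths[:-1]:
        span += length  # earlier phrases must be crossed in full
    span += 1           # the final phrase is recognized at its first word
    return span

# Two PPs of length 2 and 5 following an intransitive verb (head-initial):
print(domain_length([2, 5]))  # 4: short before long
print(domain_length([5, 2]))  # 7: long before short
```

Short-before-long minimizes the domain in the head-initial setting; in a head-final language the domain runs in the mirror-image direction from the head, so the opposite order is preferred – which is just the order-independence of the window mentioned above.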

The notion of selection in Hawkins' article extends to both syntax and semantics. Gradience effects can thus occur in both domains. Hawkins does not seem to suggest that this is a particularly exotic feature of the minimization principle in question. The principle that is discussed is of course only supposed to explain a certain class of gradience effects, namely those that relate to preferences in contexts with otherwise free word order; but it seems reasonable to conclude that Hawkins sees gradience as a phenomenon that can occur in all parts of grammar. Reuland would probably argue that length-dependent rules or constraints play no role in syntax proper.

The second article in this section is by Matthew Crocker & Frank Keller. Most of their article is devoted to the role of past experience in linguistic performance and can be seen as a brief survey of a decade's work at the intersection of psycholinguistics and stochastic natural language processing. The introduction to probabilistic grammars is non-technical and perhaps a bit superficial, but the essay is in general a very nice and well-composed introduction to the field.

The authors claim that probabilistic grammars are potential models of linguistic performance. In particular, the probabilities that can be learned from balanced corpora and assigned to rules that associate terminal symbols with nonterminal ones (syntactic categories) seem to predict linguistic performance relatively well. The same goes, to some extent, for more complex linguistic constructions. The authors also observe, however, that the relationship between linguistic performance and gradience is nontrivial. Sentences of a certain length, for instance, are rare but may be completely grammatical. There is also the interaction with world knowledge: in some of the frequency studies surveyed in the article, frequencies were clearly not counterbalanced with world knowledge, and the interaction between the two is far from trivial.
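For readers unfamiliar with the formalism, a minimal sketch may help: in a probabilistic context-free grammar, each rule carries a probability conditioned on its left-hand side, and the probability of a derivation is the product of its rule probabilities. (The toy grammar and the numbers below are invented for illustration; they are not from the article.)

```python
# Toy PCFG: rule probabilities conditioned on the left-hand side category.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("John",)): 0.5,
    ("NP", ("Mary",)): 0.5,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("saw",)): 1.0,
}

def tree_prob(tree):
    """Probability of a derivation = product of its rule probabilities."""
    node, children = tree
    if all(isinstance(c, str) for c in children):
        return rules[(node, tuple(children))]      # lexical rule
    rhs = tuple(c[0] for c in children)
    p = rules[(node, rhs)]
    for c in children:
        p *= tree_prob(c)
    return p

# "John saw Mary": S -> NP VP, NP -> John, VP -> V NP, V -> saw, NP -> Mary
t = ("S", [("NP", ["John"]), ("VP", [("V", ["saw"]), ("NP", ["Mary"])])])
print(tree_prob(t))  # 1.0 * 0.5 * 0.7 * 1.0 * 0.5 = 0.175
```

Rule probabilities of this kind, estimated from a balanced corpus, are what is claimed to correlate with performance measures such as reading times.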

Crocker & Keller also list what they think of as five theoretical advantages of (trained) probabilistic grammars: efficiency, coverage, performance, robustness and adaptation. I need to add a few comments here. (i) The story about efficiency is a bit more complex than the one told in this article. Probabilistic parsing is typically more complex than symbolic parsing, and it is not always the case that probabilities provide off-the-shelf heuristics that can speed up parsing at little or no cost. (ii) The coverage argument is not clear-cut either. If you are a computational linguist and you need large coverage fast, probabilistic grammars are obviously the way to go. This is no embarrassment to symbolic grammarians, however. The fact that their grammars are typically of limited coverage just tells us that the grammarians are not yet done describing the world's languages; it does not in any way tell us that non-probabilistic language acquisition is too difficult for language learners, or that symbolic description of languages is too difficult for linguists. (iii) It is true that linguists who work with probabilistic grammars have had some success in explaining performance-related results from psycholinguistics. (iv) It is also true that probabilistic grammars are in general more robust; but see my comments about constraint-based grammars below. (v) I have nothing to say about the fifth point, about adaptation.

The two other articles of this section are on forms of Optimality Theory (OT). The first one is by Ralf Vogel, the second one, again, by Keller. Vogel's article is mostly devoted to OT-internal problems that I will ignore here. It also includes a brief motivation of gradience, very much along the lines of the editors’ introduction to the volume. Vogel's point of departure is empirical data, and his point is that standard OT is expressive enough to account for the observed gradience. The application of stochastic extensions of standard OT, he argues, may actually blur some of the gradience phenomena. Keller presents such a stochastic extension.

In contrast to the article co-authored with Crocker, which was about the performance of speakers, the topic of this article by Keller is gradience in competence, in Reuland's senses 2 and 3. Frequency measures are still used, but supported by other people's acceptability experiments. The main proposal, to put it very briefly, is that constraint violations are linearly cumulated. The proposal is compared to related ones. In the introduction, the editors sidestep the apparent conflict between Vogel's and Keller's articles by discussing them in separate places, but the two articles still sit side by side in the volume. The reader sees the contours of a future debate.
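The idea of linear cumulation can be stated in a few lines: each constraint carries a weight, a candidate's score is the negated weighted sum of its violation counts, and every additional violation lowers predicted acceptability by a fixed amount. (The constraint names and weights below are my own invention, not Keller's; the point is only the arithmetic.)

```python
# Invented constraint weights for illustration.
weights = {"STAY": 2.0, "AGREE": 3.5, "FAITH": 1.0}

def harmony(violations):
    """Negated weighted sum of violation counts; lower = less acceptable."""
    return -sum(weights[c] * n for c, n in violations.items())

cand_a = {"STAY": 1, "AGREE": 0, "FAITH": 1}  # two mild violations
cand_b = {"STAY": 0, "AGREE": 1, "FAITH": 0}  # one severe violation
print(harmony(cand_a), harmony(cand_b))  # -3.0 -3.5
print(harmony(cand_a) > harmony(cand_b))  # True: A predicted more acceptable
```

Note how this departs from standard OT's strict ranking: many violations of weak constraints can outweigh one violation of a strong constraint, which is exactly what makes the model's predictions gradient.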

Part IV brings together three articles on gradience in wh-movement constructions. Or at least it is supposed to. Only four out of 25 pages of the first article, by Gisbert Fanselow & Stefan Frisch, are about wh-movement. In those four pages, they show how local structural ambiguity results in increased acceptability for such constructions. They present an experiment that verifies this hypothesis, first presented in work by Sigmund Kvam. The point is, briefly, that the acceptability of wh-movement constructions in German increases if the features of the phrase that has been moved match the requirements imposed by the matrix verb.

On the other hand, the authors seem to accept the idea that the processing complexity of wh-movement constructions gives rise to island conditions. I should remark that the Nordic languages – familiar to many readers of this journal – lack the island constraints alluded to in the article, which suggests that the explanation must be qualified a bit. Here's an example from Danish:

  (1)

This is of course in sharp contrast to the ungrammaticality of:

  (2) *What do you wonder who has bought?

Processing complexity should be roughly the same, though. Such data are in fact the point of departure of the next article, by Nomi Erteschik-Shir. In fact, it comes across as a bit odd that neither Fanselow & Frisch nor the editors remark on these data. Erteschik-Shir shows how gradience in wh-movement constructions depends on information structure. Light verbs are apparently better in such constructions than heavy verbs, but acceptability increases if the verb has already been mentioned in the context – in other words, if the verb is no longer in focus. Erteschik-Shir believes that similar explanations can be found for most gradient phenomena and therefore argues that syntactic constraints will always render ungraded results (against Reuland's sense 2). It is important to note that information structure constraints, in her view, need not be universal: each language has its own canonical information structure.

Erteschik-Shir's article can, to some extent, also be read as being about how acceptability judgments are influenced by the creativity needed to find a context or information structure suitable for a sentence. If speakers are presented with a wh-movement construction with a heavy verb, their acceptability judgment depends on whether they can somehow defocus the verb by some creative effort. Yoshihisa Kitagawa & Janet Dean Fodor talk about the influence of prosodic creativity. The point is that text-based acceptability judgments sometimes increase with nonstandard prosody patterns, and that the assignment of nonstandard prosody may require a similar creative effort. (The sentence in (1) above is arguably best with contrastive stress on the personal pronoun in subject position, for instance.) The standard prosody pattern, to put it differently, is projected onto the sentence in reading tasks and creates a prosodic garden-path effect that reduces acceptability.

Kitagawa & Fodor test the significance of such garden-path effects by comparing listening and reading tasks. The example construction is wh-movement in Japanese. In the listening tasks, the test person is presented with appropriate prosody; in the reading tasks, no prosodic cues are given. The results of the experiment, albeit a bit preliminary, indicate that target sentences are accepted more often in the listening tasks, where no prosodic creativity is required.

I said above that the Minimalist Program, as far as I can see, is not a particularly good theory for talking about degrees of grammaticality. I should note that I deliberately ignore the minimalists who say that the Minimalist Program is not a theory but a philosophy of language, and focus only on those who see the program as more than just a handful of methodological assumptions. I also wish to stay agnostic on whether one actually wants to talk about degrees of grammaticality in grammar; but if so, I think that the Minimalist Program, in its present form, is not the best choice – at least not if Edward Keenan and his colleagues at UCLA are right about the formal properties of the grammars of the Minimalist Program: such grammars are, in their view, mildly context-sensitive generative grammars.

Generative grammars, as shown by Geoffrey K. Pullum & Barbara Scholz in a series of papers, are different from constraint-based ones. Reasonable translations often exist, designed to keep recognizability and other properties invariant, but the two kinds of grammars are still different. In constraint-based grammars, a language is the set of strings whose logical descriptions are satisfiable in conjunction with the linguistic principles, the constraints. If a few constraints are removed – say, the soft constraints of a language – more strings are licensed. In generative grammars, this works the opposite way: if rules are removed, fewer strings are generated. Consequently, generative grammars cannot talk about degrees of grammaticality in terms of subsets of linguistic principles.
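The constraint-based half of this asymmetry can be made concrete with a toy example (the mini-lexicon and the two constraints are my own invention): a string is licensed if and only if it satisfies every constraint, so dropping a constraint can only enlarge the set of licensed strings.

```python
from itertools import product

words = ["the", "dog", "barks"]
# All strings of one to three words over the toy lexicon.
strings = [" ".join(p) for n in range(1, 4) for p in product(words, repeat=n)]

# Toy constraints, purely illustrative.
constraints = [
    lambda s: s.endswith("barks"),   # must end in a verb
    lambda s: s.startswith("the"),   # must start with a determiner
]

def licensed(s, cs):
    """Constraint-based licensing: satisfy all constraints."""
    return all(c(s) for c in cs)

full = {s for s in strings if licensed(s, constraints)}
relaxed = {s for s in strings if licensed(s, constraints[:1])}  # drop one
print(full <= relaxed, len(full), len(relaxed))  # True 4 13
```

Fewer constraints, more strings – which is what makes it natural to model degrees of grammaticality as satisfaction of subsets of the constraints. Removing a production from a generative grammar does the opposite: derivations that used it disappear, and the generated set shrinks.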

It was noted above that HPSG is a bit better for talking about degrees of grammaticality. The reason is now obvious: it is a constraint-based theory, and deliberately so. In the light of this observation, the editors' remark in the introduction that ‘leading current syntactic models such as Minimalism, OT, OT-LFG, HPSG, or categorial grammar seem disinterested in gradience’ (p. 13) seems to me a bit forced. Reuland argues – while I think there are reasons to question this claim – that the Minimalist Program is a reasonable architecture for talking about gradience. In the Lexical-Functional Grammar (LFG) community, there is widespread interest in stochastic methods that address gradience in Reuland's sense 2 in a very direct fashion. The same goes for HPSG, which for the reasons just mentioned seems to be one of the theories that are best suited to talk about gradience.

Gradience is an important phenomenon to understand. The diverse and unruly data may lead you to abandon the notion of grammaticality altogether. This volume suggests that such a move is perhaps a bit premature. It may be where you want to go, but it is not yet a forced move. Gradient acceptability can in many cases be explained away, by reference to world-knowledge, processing complexity, prosodic creativity or other extralinguistic factors. If grammaticality is abandoned we run the risk of mixing apples and oranges.