
Determiner omission in German prepositional phrases

Published online by Cambridge University Press:  12 October 2018

TIBOR KISS*
Affiliation:
Sprachwissenschaftliches Institut – Ruhr-Universität Bochum
*Author’s address: Sprachwissenschaftliches Institut, Ruhr-Universität Bochum, 44780 Bochum, Germany, tibor@linguistics.rub.de

Abstract

In this paper, we present an analysis of so-called determinerless PPs in German, i.e. prepositional phrases that allow singular count nouns to occur without an accompanying determiner, despite other rules in the grammar requiring the presence of the determiner. The analysis is based on annotated corpus data, which are fed into a statistical classifier (applying logistic regression). Superficially, the syntax of bare prepositional phrases is difficult to capture, and intuitions cannot be easily elicited. The analysis is based on data sets for two pairs of German prepositions: mit ‘with’ and ohne ‘without’, and über ‘over, above’ and unter ‘under, below’. The results of the classifiers applied to annotated data indicate which syntactic, morphological and semantic features are responsible for determiner omission. We are able to detect common properties of all four prepositions, as well as preposition-specific, and idiosyncratic properties. The apparently unsystematic conditions for determiner omission can be discerned by tracing the interaction of these properties.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2018 

1 Introduction: Bare prepositional phrases and preposition+noun combinations

There is a condition governing noun phrases that holds for many languages, but seems to be weakened (or even reversed) if the NP occurs as the complement of a preposition. This condition says that a determiner is required if an NP is headed by a singular count noun. A count noun like bus must not appear without a determiner if it is realized as the object of a verb, as in (1a). But it can – in fact must (Himmelmann 1998: 316; Baldwin et al. 2006: 172) – be used without a determiner when embedded under the preposition by:

The construction in (1b) is sometimes called determinerless PP or preposition+noun combination (PNC; see Stvan 2009), reflecting that the peculiarity depends on the presence of a preposition.

The ostensibly offending combination in (1b) does not necessarily establish an irregularity. It could be described by stating that the determiner must be dropped if the noun is the complement of by and its semantics corresponds to means of transportation (see Baldwin et al. 2006).

In this paper, we would like to single out a specific sub-class of determinerless PPs, which we call Bare PPs (BPPs). BPPs show an irregular behaviour that does not lend itself easily to rules of the type suggested above for (1b). BPPs are characterized by the following properties:

The construction in (1b) is not a BPP, since the omission of the determiner is obligatory there, contrary to P III (Optionality). The properties P I to P IV are language-independent and thus define BPPs cross-linguistically.

For the purpose of the present analysis, we will focus on BPPs in German, starting with the examples in (2).Footnote [3]

The bracketing of the determiner indicates its optionality in (2a, c); similarly, (2b) illustrates the optionality of the determiner, which in this case is reflected in the declension class of the adjective (the alternatives are listed in curly brackets). The examples thus satisfy P III (Optionality). The pertinent nouns in (2) may also be pluralized, as indicated in (3), corroborating their status as count nouns.

The omission of the determiner in (2) leads neither to ungrammaticality nor to a semantic shift of the nominal complement. With regard to the latter, the situation should be compared to determiner omission outside PPs. Here, either the semantics of the nominal complement changes, so that the pertinent expressions cannot be substituted salva veritate, or the whole phrase receives a non-compositional interpretation. The first case is discussed with count/mass alternations in Payne & Huddleston (2002: 335–337):

The presence of a numeral – two in (4b) – leads to an interpretation that differs from the one without a determiner in (4a). In (4a), the sentence refers to the abstract concept of ‘injustice’; in (4b), however, the pluralized phrase refers to event instantiations.

The second case occurs in conventionalized constructions like the X-ist-X-und-Y-ist-Y construction in German:Footnote [4]

This example does not convey the tautological truth that an effigy is an effigy and a picture is a picture, but instead that effigies and pictures must be kept apart. The examples in (4) and (5) thus do not provide evidence against P II.

The examples in (6) show that the nouns presented in (2) must appear together with a determiner if they are marked singular and have been realized as an object of a verb. Thus, P II (Restriction to P) is satisfied in addition to P I and P III.

BPPs are possible if the preposition mit ‘with’ shows mereological (2a), conditional (2b), and modal (instrumental) interpretations (2c). The mereological interpretation covers the presence (in the case of mit) or absence (in the case of ohne ‘without’) of an object, property or feature (see Schröder 1986: 162; Kiss et al. 2016: 121–122). The sense conditional relates a prerequisite to a dependent state of affairs that is expressed in the sentence in which the PP is realized – modulo the PP (see also Schröder 1986: 151–152; Helbig & Buscha 2007: 379).Footnote [5] Modal interpretations in general describe the modification of events or actions; instrumental interpretations in particular cover instruments used in actions and events (see Schröder 1986: 146–148; Kiss et al. 2016: 279–280).Footnote [6]

Yet, BPPs are not allowed under every circumstance. Optionality (P III) is to be interpreted unidirectionally: BPPs can always be transformed into PPs by adding an appropriate determiner, and – in the case of German – changing the declension class of the noun and adjective(s) accordingly. The reverse, however, is not true: Not every PP can be turned into a BPP, as the examples in (7) illustrate.Footnote [7]

In (7a), the preposition again assumes the modal (instrumental) sense. But here, the determiner cannot be omitted.Footnote [9] The same holds for the mereological interpretation in (7b). One could argue that the major difference between the examples in (2) and (7) is that the latter show postnominal extensions, a postnominal genitive NP in (7a) and a finite clause in (7b). But this does not hold for (7c), and yet the determiner cannot be omitted here either. Nor is it the case that a postnominal extension always blocks determiner omission, as (8) shows.

The grammatical examples show adjectival modification and postnominal extensions, thus establishing P IV. The apparently unsystematic pattern observed in (2), (7) and (8) is reflected in the acceptability judgments of German speakers, who are often unable to judge the acceptability of (constructed) BPPs and are also reluctant to coin new BPPs. According to the German Duden grammar (Duden 2005), the BPPs in (2) and (8) should be as ungrammatical as the ones in (7).

The set of German simple prepositions allowing BPPs is quite large: It comprises at least 22 prepositions. While it would be worthwhile to analyze all 22 prepositions, considerations of feasibility dictate that we presently deal with a smaller subset. In this paper, we will discuss BPPs headed by mit ‘with’, ohne ‘without’, über ‘over, above’, and unter ‘under, below’. The choice of prepositions is governed by commonalities and differences between them: The prepositions mit and ohne realize a unique cluster of senses, which contains antonymous sense pairs corresponding to Baldwin et al.’s (2006: 171–172) distinction of positive vs. negative senses. The preposition ohne is the only preposition in German that occurs more often in BPPs than in PPs, as do the prepositions without, zonder, and sin in English, Dutch, and Spanish, respectively. The variation available with ohne is illustrated in (9).

The prepositions unter and über have been chosen because they share a spatial sense in which they are antonyms, and also because their respective behaviour with respect to BPPs requires rather different explanations. The examples in (10) illustrate the variation of BPPs found with unter.

The examples in (11) provide evidence for the existence of BPPs with über, and, at the same time, show that they are not always possible (see (11d)).

To identify the syntactic and semantic factors of determiner omission, and to distinguish them from purely idiosyncratic properties related to individual combinations, we apply a methodology termed annotation mining (Chiarcos et al. 2008, Kiss et al. 2010), in combination with Generalized Linear Mixed Modeling (GLMM, Zuur et al. 2009). The data were extracted from a Swiss-German newspaper corpus (Neue Zürcher Zeitung). The annotations cover a range of linguistic levels, in particular lexical, morphological, syntactic, and semantic information. GLMM extends logistic regressionFootnote [10]: it not only allows us to determine general factors that are responsible for determiner realization and omission, but also to address the question of the productivity of the construction itself, by looking at the influence of individual nouns.

The main results of the present paper can be summarized as follows: The formation of BPPs is a productive process with three of the four prepositions (ohne, mit, unter), but is restricted to specific nouns for über.

The analysis will establish general structural conditions that interact with preposition-specific conditions, and possibly also with lexical idiosyncrasy. The structural complexity of the nominal complement of the PP has been identified as a general factor inhibiting determiner omission: Genitive NP complements, prepositional complements and modifiers of N, as well as relative clauses and clausal complements appearing to the right of N inhibit determiner omission. This condition even leaves a mark in the analysis of über, where determiner omission is generally blocked.

For the prepositions mit and ohne, a specific sense supports determiner omission, and a number of other senses inhibit it to a greater or lesser degree. The converse situation applies to unter: A specific sense inhibits determiner omission, while other senses facilitate it. Lexical idiosyncrasy plays a role in the analysis of mit, unter, and über, but can be neglected in the analysis of ohne. In contrast to previous analyses, we start by analyzing individual prepositions. As will be discussed in Section 5, the results justify this step. In particular, it no longer comes as a surprise that speakers are reluctant to judge or coin BPPs if the behaviour of BPPs is only partially governed by general conditions, which interact with preposition-specific, and even idiosyncratic, properties. This interaction would hardly be detected if the preposition-specific differences in the analyses were neglected.

From this point onwards, the paper is structured as follows: In Section 2, we discuss previous analyses of determinerless PPs. Section 3 is concerned with the methodology employed in the present analysis. Section 4 presents the analysis, starting with the rule-based components of the models in Section 4.1. The relevance of the random component, which consists of idiosyncrasy introduced by individual nouns, will be dealt with in Section 4.2. Section 4.3 will discuss the random component in light of previous proposals dealing with ‘N-based’ PNCs (Baldwin et al. 2006, Stvan 2009, Le Bruyn, de Swart & Zwarts 2012), and points out the differences. Section 5 summarizes the results and concludes the paper.

2 The treatment of determinerless PPs in previous proposals

Previous research has not singled out BPPs by conditions like P I–IV, and hence we can only discuss the impact of these analyses on determiner omission in PPs in general, and point out implications for BPPs where appropriate. We will thus discuss pertinent aspects of several recent proposals on PNCs, beginning with formal aspects in Section 2.1, and addressing questions related to interpretation in Section 2.2.

2.1 Structural aspects of determiner omission

Any syntactic analysis of a given phenomenon must ask whether the phenomenon should be treated as syntactic, or in more general terms, as rule-based in the first place. Several proposals argue explicitly against a rule-based account of PNCs, and suggest an approach in terms of fixed expressions.Footnote [11]

Payne & Huddleston (2002) and Duden (2005) assume that PNCs in English and German, respectively, form a finite set of listed constructions with high cardinality. Confronted with a possibly very large set of fixed expressions, speakers will not be able to judge their acceptability; instead, acceptability tests could only reveal the speakers’ memory capabilities, or even their level of erudition. Duden (2005) tries to account for a subclass of PNCs and assumes that determiner omission becomes possible in PPs in legal and naval language. The examples given above, however, do not fall into either of these categories. Payne & Huddleston (2002: 409–410) extend this idea from fixed expressions to fixed frames for English PNCs. Dömges et al. (2007) and Kiss (2007) have shown, however, that the construction is indeed productive, using a quantitative measure to gauge whether new instances of a construction emerge over time.

A crucial property of BPPs is P I (Countability), so the treatment of countability may help to understand BPPs. However, analyses of countability often assume that the count/mass distinction is expressed in the lexicon (be it at the level of the lemma or the individual sense). If such a lexical analysis is assumed, it follows that BPPs do not obey a condition otherwise obligatory for count nouns, as already stated in P I. Borer (2005), however, provides an explicit syntactic account of the count/mass distinction. Borer (2005) assumes that nouns are interpreted as countable if they appear in the necessary syntactic context of a Quantity Phrase (QP). The plural marking on a noun or the presence of a singular indefinite determiner provide such a context equally well. If a noun that otherwise would occur in the context of a QP shows up in a context lacking such a phrase, then the noun receives a mass interpretation (similar to grinding interpretations of count nouns in other contexts, see Pelletier 1975). Borer’s proposal is indeed well-suited to account for cases of count/non-count polysemy with food terms, as has also been discussed in Payne & Huddleston (2002: 336–337). Under Borer’s analysis, however, we would expect BPPs never to occur: by P I, they lack both plural marking and a determiner, and so, contrary to fact, they would always violate P II, because the omission of the determiner would lead to a meaning shift. Instead, we would expect a completely regular phenomenon: Nouns that otherwise occur as count nouns would regularly receive a mass interpretation when occurring as a determinerless complement of a preposition. But the nouns in (2) and (7)–(11) do not receive a mass interpretation. An alternative would be to assume that the examples contain a covert QP. But under such an assumption, we would need some indication why a covert QP could not be realized inside NPs that are not embedded by prepositions, recall (1a) and (6) above, and why not all realizations of a covert QP equally lead to grammaticality in PPs.Footnote [12]

The property of Phrasality (P IV) is related to the issue of productivity. For BPPs, we have claimed that they show a syntactic structure that does not differ from the syntactic structure of nominal projections in other contexts – modulo, of course, the presence of the determiner. There are various options to implement such a condition, e.g. by assuming that the preposition (optionally) selects a nominal projection instead of a DP. A particular implementation of this idea is provided in Baldwin et al. (2006: 175). They assume that the preposition by – if heading a PNC as in (1b) above – selects for a nominal projection whose head still has an open SPR-dependency in terms of Head-driven Phrase Structure Grammar (HPSG; Pollard & Sag 1994), and hence lacks a determiner. The analysis allows a full-fledged $\text{N}'$ projection in all other respects, including complementation and modification of the noun. Such an analysis seems appropriate for BPPs obeying Phrasality (P IV), but it contradicts the assumption stated in Baldwin et al. (2006: 167) that constructions of type (1b) are restricted with regard to modification. In addition, such a proposal would have to meet the challenge posed by the contrasts between (2) and (7). With respect to P IV, Himmelmann (1998: 317–319) mentions that Rumanian and Albanian restrict determiner omission to unmodified nouns. Yet, Himmelmann provides examples where modification and the lack of a definite determiner go hand in hand (Himmelmann 1998: 328). When Himmelmann (1998: 332) addresses the ‘overall complexity of the phrase’, he refers not to the syntactic structure of the nominal complement but to the internal complexity of the preposition.

Trawiński, Sailer & Soehn (2006) deal with the syntax and semantics of a subclass of PNCs that violates P III and P IV. The construction also differs from BPPs in that a postnominal PP complement is obligatory. Trawiński et al. (2006) assume a raising analysis in which the obligatory PP complement originates as a syntactic argument of the determinerless head noun and is raised to become a syntactic argument of the preposition. Most importantly, raising postnominal extensions would not shed any light on the syntactic distribution of BPPs, as postnominal extensions are optional in BPPs.

While Stvan’s (2009) analysis addresses a subset of PNCs in English that clearly differ from the ones under investigation in the present paper, her analysis of nouns is relevant. Stvan (2009) assumes that at least some PNCs are determined by a specific class of nouns. Among other characteristics, these nouns violate P II, and thus appear determinerless outside of PPs. For these nouns, modification is generally prohibited (Stvan 2009: 329–331). As will become clear in the present analysis, certain nouns have a strong facilitating effect on determiner omission in German as well, but these nouns only show the effect inside BPPs. This aspect will be further discussed in Section 4.3.

While not a formal aspect proper, we also note that the choice of prepositions under investigation is often unsystematic in previous proposals. Baldwin et al. (2006) chose the prepositions as, at, by, in, and on without providing a rationale. In Himmelmann’s (1998) seminal paper, the prepositions under investigation are the ones that allow determiner omission in the first place. This is a workable strategy only if the class of prepositions, or the class of prepositions taking part in PNCs, is very small. This assumption does not hold for German or English: as already mentioned, German shows more than 20 prepositions allowing BPPs. Interestingly, Himmelmann (1998: 333) remarks: ‘In the Germanic languages, no generalisations [concerning the presence or absence of a determiner in PPs] are possible with respect to individual prepositions or subclasses of primary prepositions’.

2.2 The semantics of the preposition

The present proposal follows a research strand that was initiated by Himmelmann (1998), Baldwin et al. (2006), Kiss (2007), and Le Bruyn et al. (2012: 192–194) in assuming that the interpretation of the preposition plays an essential role in analyzing determiner omission.

The relevance of the preposition senses has implications both for the general approach to the analysis and for the analyses themselves. If the preposition senses are relevant for determiner omission, then it becomes necessary to identify all senses so as to characterize which sense supports or inhibits determiner omission. Without a prior definition of the senses, this cannot be achieved. Working with the senses of prepositions again justifies starting with individual prepositions, because even without a full inventory of prepositions, it should be clear that not all prepositions share all senses.

Interestingly, the majority of analyses covering PNCs start by using individual senses without the background of a sense inventory, and arbitrarily divide interpretations of prepositions into (primary) senses and uses. As a consequence, the preposition form is often identified with a prominent sense of the preposition, particularly so if the preposition shows a spatial sense. The following statement from Baldwin et al. (2006: 171) is indicative: ‘a significant number of spatial prepositions …occur in [P+N combinations] in both temporal and stative uses’. As Baldwin et al. (2006) do not provide a distinction between senses and uses, the implications of their statement remain unclear. The least that would be required for an analysis of PNCs based on a distinction of senses would be a definition of the senses mentioned. In ignoring the polysemy of the preposition, the research question is actually evaded. Following a similar line of reasoning, Trawiński et al. (2006: 188) restrict their considerations to two senses of the German preposition in: a spatial one, and a ‘metaphorical non-spatial meaning’. By contrast, Kiss et al. (2016: 99–110) distinguish nine different super-senses of in.

Baldwin et al. (2006: 171) argue that preposition senses can be divided antonymically into positive (inclusive) and negative (exclusive) ones. They conclude that ‘positive prepositions’ occur more often in PNCs than ‘negative’ ones, referring to contrasts like in vs. out and on vs. off. This is an interesting assumption, but it naturally requires an extension to all prepositions showing such antonymous senses (including a prior definition of the senses). As will become clear in Section 4, the present analysis of the prepositions mit and ohne provides evidence against such a conclusion. The two prepositions share several sub-senses, and can be characterized as being antonymic with respect to these senses. If mit assumes a mereological sense, indicating that something is part of something else, ohne assumes the same sense, indicating that something is not part of something else. As we will see, ohne is the only preposition in German that occurs more often in BPPs than in PPs, an observation that holds for other languages as well. The realization of BPPs with these two prepositions is in fact tied to senses of the prepositions, but antonymy plays a role only insofar as shared senses show the same tendencies.

Unfortunately, neither Baldwin et al. (2006) nor Himmelmann (1998) tries to provide criteria for the proper segregation of senses. Baldwin et al. (2006: 172), for example, assume a distinction between lexicalized and grammaticalized prepositions. They assume that the latter, which have more abstract meanings, are less prone to determiner omission. However, they do not provide a definition of the distinction between lexicalized and grammaticalized prepositions. How do we gauge senses like spatial, mereological, and modal-instrumental in terms of their abstractness/concreteness? The behaviour of prepositions taking these senses differs sharply with regard to determiner omission, as will be seen below, but it seems to be impossible to order these senses by their abstractness/concreteness. With regard to spatial interpretations, Himmelmann (1998: 321) draws a distinction between generalized locative meanings and concrete spatial relations. He assumes that prepositions bearing the former meanings are susceptible to determiner omission, while prepositions bearing the latter meanings are not. Stative locative and directional interpretations are subsumed under generalized locative meanings. Proximal locatives and superessives (a sense expressing vertical relations, as found with English above) are taken to be concrete local relations. Himmelmann does not provide criteria for identifying the individual senses, nor does he justify his decision to assign proximals and superessives to the concrete local relations, and other stative (and directional) senses to the generalized locative meanings. While the identification of the senses could possibly be achieved in relevant contexts, the assignment of the individual senses to the super-senses remains dubious. In particular, it should be noted that proximal and superessive interpretations form a subset of the stative locatives.Footnote [13]

The annotations on which the analysis in Section 4 is based employ the sense inventory developed in Kiss et al. (2016). For reasons of space, the inventory itself cannot be presented here.

3 Methodology

The analysis presented in Section 4 is based on the concept of annotation mining. Annotation mining is a two-step process. In the first step, natural language data available in a corpus are annotated at all possible linguistic levels. This means that the annotations may apply to different entities within the corpus (the lemma, the phrase, the sentence), and that different linguistic levels are annotated (such as morphology, syntax, semantics). In the second step, the annotated data are subjected to further analysis by means of Generalized Linear Mixed Modeling (Zuur et al. 2009).

A Generalized Linear Mixed Model (GLMM) predicts a dependent feature from a weighted sum of independent features. Here, the dependent feature is the presence or absence of a determiner, and the model provides the probability that a determiner will be realized, given the coefficients of the features present in the sentence. Since the majority of the features in the present model are categorical (e.g. indicating whether a noun is the result of a nominalization, or which of the senses of a preposition is given), the calculation reduces to the coefficients of the features present,Footnote [14] which are summed up and passed through the inverse of the link function (the inverse logit in the present case) to yield the probability of determiner realization.
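The calculation just described can be sketched in a few lines. The feature names and coefficient values below are invented for illustration only and are not the estimates reported in this paper; the sketch merely shows how categorical features switch coefficients on, how these coefficients are summed with the intercept, and how the inverse logit turns the sum into a probability of determiner realization.

```python
import math

# Hypothetical coefficients on the log-odds scale; invented for illustration,
# NOT the estimates reported in this paper.
INTERCEPT = -0.5
COEFFICIENTS = {
    "sense=conditional": -1.2,       # a sense assumed to favour omission
    "sense=spatial": 1.5,            # a sense assumed to favour a determiner
    "postnominal=genitive_np": 2.0,  # a postnominal extension inhibiting omission
    "adjectival_modification": 0.3,
}


def inverse_logit(eta: float) -> float:
    """Map a linear predictor (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_determiner(active_features: set[str]) -> float:
    """Probability that a determiner is realized, given the categorical features present."""
    # Categorical features are 0/1 indicators, so the linear predictor reduces
    # to the intercept plus the coefficients of the features that are present.
    eta = INTERCEPT + sum(COEFFICIENTS[f] for f in active_features)
    return inverse_logit(eta)


print(round(p_determiner({"sense=conditional"}), 3))
print(round(p_determiner({"sense=spatial", "postnominal=genitive_np"}), 3))
```

With these invented values, the first call yields about 0.154 (log-odds −1.7) and the second about 0.953 (log-odds 3.0).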

The model also uses random features, which differ from ‘ordinary’ – so-called fixed – features in various respects: fixed features have, by definition, a finite set of values, whereas the values of random features are sampled from a possibly infinite population. The preposition’s senses and the syntactic complements and modifiers of the noun are typical fixed features, bearing only a finite number of values. Random features, as we employ them, are the nouns that head the nominal complement of the preposition in a PP. There is a possibly infinite number of nouns, and different samples may contain different subsets of this infinite set.
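To illustrate how a random feature enters the model, the following sketch adds a noun-specific adjustment (a random intercept) to the contribution of the fixed features. All nouns and numbers are hypothetical placeholders. In a GLMM of this kind, such adjustments are standardly estimated from the data under the assumption that they are drawn from a normal distribution with mean zero; a noun outside the sample therefore receives an adjustment of zero, and its prediction falls back to the population-level estimate.

```python
import math


def inverse_logit(eta: float) -> float:
    """Map a linear predictor (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical linear predictor contributed by the fixed features of one PP.
FIXED_ETA = 0.8

# Hypothetical noun-specific random intercepts (log-odds). Negative values make
# a determiner less likely for that head noun, i.e. they favour a BPP.
NOUN_INTERCEPTS = {"noun_a": -2.1, "noun_b": -0.4, "noun_c": 0.6}


def p_determiner_for_noun(noun: str, fixed_eta: float = FIXED_ETA) -> float:
    """Probability of determiner realization for a PP headed by `noun`.

    A noun not in the sample receives an adjustment of 0, so the prediction
    falls back to the population-level (fixed-features-only) estimate.
    """
    return inverse_logit(fixed_eta + NOUN_INTERCEPTS.get(noun, 0.0))


for noun in ["noun_a", "noun_c", "unseen_noun"]:
    print(noun, round(p_determiner_for_noun(noun), 3))
```

Keeping the noun as a random rather than a fixed feature is what allows the model to separate general conditions from the idiosyncratic pull of individual nouns without having to list every possible noun in advance.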

3.1 Annotated features

The annotated features that form the basis for the development of the GLMMs can be divided into three groups. The first group comprises global features such as the sentence type (verb-initial clause, verb-second clause, verb-final clause) and information about the contexts in which the clause as a whole appeared, including information about PPs occurring in newspaper headlines, in quotations (particularly of poems), and in (media) titles.Footnote [15] BPPs occurring in such contexts have generally been excluded from the analysis, since determiner omission appears to be a fairly general, yet stylistically motivated, operation in these environments.

The examples were part-of-speech-tagged with STTS (Stuttgart–Tübingen Tagset; Schiller et al. 1999) and syntactically parsed with the MaltParser (Nivre et al. 2007), using a Dependency Grammar (Osborne 2015) and the Tiger annotation for syntactic relations (Brants et al. 2004). With regard to the PP as a whole, the grammar provides information about its governor and its syntactic position, i.e. whether the PP occurs in the so-called German Mittelfeld between the position of the finite verb in main clauses and the position of the finite verb in subordinate clauses, or whether it occurs in sentence-initial (topicalization) position.Footnote [16]

The second group comprises features related to the preposition, and the PP as a whole. The senses of the preposition are annotated at the level of a super-sense (if a super-sense is relevant), and at the level of the most specific sense, according to the guidelines developed in Kiss et al. (2016).

The third group comprises features related to the noun and the nominal projection.

To satisfy P I (Countability), the candidates for (B)PPs were filtered by a classifier that determined whether the nominal head of the complement can be analyzed as a count noun.Footnote [17] This prior classification step resulted in a set of 4,413 count noun lemmata. Only PPs containing such a lemma as head of the nominal complement were considered for the analysis.
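The two selection steps described so far (excluding headlines, quotations and titles, and keeping only PPs whose head lemma belongs to the set of classified count nouns) can be sketched as a simple filter. The record layout, the context labels and the toy data are hypothetical; the count-noun classifier itself is not modelled, only its output set of lemmata.

```python
from dataclasses import dataclass

# Contexts excluded from the analysis because determiner omission seems to be a
# general stylistic option there (hypothetical label names).
EXCLUDED_CONTEXTS = {"headline", "quotation", "title"}


@dataclass(frozen=True)
class PPInstance:
    """One PP from the corpus (hypothetical record layout)."""
    preposition: str
    head_lemma: str
    has_determiner: bool
    context: str  # e.g. "body", "headline", "quotation", "title"


def select_candidates(instances, count_noun_lemmata):
    """Keep PPs outside excluded contexts whose head lemma was classified as a count noun."""
    return [
        pp
        for pp in instances
        if pp.context not in EXCLUDED_CONTEXTS and pp.head_lemma in count_noun_lemmata
    ]


# Toy data: a two-lemma set stands in for the 4,413 classified count-noun lemmata.
count_nouns = {"Auto", "Schirm"}
data = [
    PPInstance("ohne", "Schirm", False, "body"),
    PPInstance("mit", "Auto", True, "headline"),  # excluded: headline context
    PPInstance("mit", "Milch", False, "body"),    # excluded: not a count-noun lemma
]
kept = select_candidates(data, count_nouns)
print([pp.head_lemma for pp in kept])
```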

Morphological information is provided for inflection, nominalization, further derivations, possible derivational suffixes, as well as compounding.

At the level of lexical semantics, we use GermaNet (Hamp & Feldweg 1997) to provide a rough semantic description. The nouns are mapped to their ‘unique beginners’, which form the top part of GermaNet’s taxonomy, so that each noun is described by the set of unique beginners it belongs to.
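Because a polysemous noun may reach several unique beginners, describing a noun by a set of them amounts to one 0/1 indicator per unique beginner. The sketch below shows this encoding; the category labels and the lemma-to-category lookup are illustrative assumptions, not GermaNet’s actual inventory or API.

```python
# Illustrative stand-ins for GermaNet's top-level noun categories (assumed
# labels; not the full inventory).
UNIQUE_BEGINNERS = ["Artefakt", "Kommunikation", "Ort", "Mensch"]

# Hypothetical lookup: lemma -> unique beginners reached by any of its senses.
# A polysemous noun can reach several of them, hence a set.
NOUN_TO_BEGINNERS = {
    "Brief": {"Artefakt", "Kommunikation"},
    "Platz": {"Ort"},
}


def beginner_features(lemma: str) -> dict[str, int]:
    """Encode a noun as one 0/1 indicator per unique beginner."""
    beginners = NOUN_TO_BEGINNERS.get(lemma, set())
    return {f"ub={ub}": int(ub in beginners) for ub in UNIQUE_BEGINNERS}


print(beginner_features("Brief"))
```

Each indicator can then enter the model as an ordinary categorical (fixed) feature.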

The analysis provides information about the pre- and postnominal extensions of the noun, so that we are able to capture information pertaining to P IV (Phrasality). We have aggregated prenominal modification under the term adjectival modification, reflecting the fact that adjectival modification is the primary source of prenominal extensions in German nominal projections. The relevant postnominal extensions are genitive NPs, appositions (including titles of media), prepositional modifiers and prepositional complements, relative clauses, complement clauses, and of course the case of no extension at all. The parser produces more relations than the aforementioned ones, which are aggregated under the feature other extensions. This aggregation reflects that the majority of these relations were erroneously assigned, and also that they appear rather rarely.

The annotations provide individual analyses of instances of the data, employing established facts on their syntax and semantics (among other linguistic levels). Even before the annotated data are fed into GLMMs, their aggregation may already provide insights that are not visible in individual instances; in particular, interesting gaps among otherwise possible combinations may be observed (see Section 3.3 below).

3.2 Generalized linear mixed modeling

A full technical introduction to Generalized Linear Mixed Models (GLMMs) lies beyond the scope of the present paper, so we will introduce only the most important properties of the model, and only informally. Linear models usually provide a numeric prediction on the basis of the sum of numerical feature values multiplied by their coefficients. A formula for such a model is provided in (12).
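Written with the symbols defined in the next paragraph, a linear model of this kind has the following standard form (a reconstruction consistent with those definitions):

$$Y = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \dots + \beta_{n}X_{n} + \epsilon$$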

Here, $Y$ is the term to be predicted, $\beta_{0}$ is the intercept term, and $\epsilon$ is an error term, which reflects that no model can perfectly predict the data (the error term will not be considered henceforth, but it should be clear that no statistical model reaches a perfect prediction).

The intercept term can be conceptualized as providing basic information in the absence of the other predictors. In GLMMs, the intercept is defined to provide reference information, which we will discuss below. $X_{1}, \dots, X_{n}$ are the features, and $\beta_{1}, \dots, \beta_{n}$ are the corresponding coefficients. If a coefficient is 0, the whole feature becomes irrelevant, and coefficients with a smaller absolute value express smaller influences than coefficients with a larger absolute value. At first sight, the application of such a model to linguistic questions seems improbable, since linguistic features are usually categorical, and multiplying a number by a categorical feature does not make sense. In addition, whatever could be obtained from such multiplications would be a number anywhere between $-\infty$ and $+\infty$, the interpretation of which would remain mysterious. Fortunately, a linear model such as the one in (12) can be transformed into a meaningful model in three steps.

The first step consists in re-coding categorical variables through contrast coding (Chambers & Hastie 1992), so that the presence or absence of a feature is coded by 1s and 0s: features receive the value 1 if present, and 0 if absent. As a consequence, each feature contributes either its coefficient or nothing to the prediction (since $\beta \times 1 = \beta$ and $\beta \times 0 = 0$). Coefficients can be positive or negative, indicating which of the two outcomes a feature favours. For the purposes of the present analysis, a negative coefficient indicates that the feature value favours determiner omission, while a positive coefficient indicates that it favours determiner realization.
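A minimal sketch of this re-coding, assuming treatment (dummy) coding, the scheme R applies by default to unordered factors. The sense labels are an illustrative subset of the senses of mit, and the function name is hypothetical. The reference level comes out as all zeros, which is why its effect ends up in the intercept.

```python
# Illustrative subset of the senses of mit; the full inventory is larger.
# "mereological" serves as the reference level.
SENSES = ["mereological", "causal", "conditional", "modal", "participation"]
REFERENCE = "mereological"


def treatment_code(value, levels=SENSES, reference=REFERENCE):
    """Return one 0/1 indicator per non-reference level.

    The reference level is coded as all zeros, so its effect is
    absorbed into the intercept of the model."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {f"sense:{lvl}": int(value == lvl) for lvl in levels if lvl != reference}


print(treatment_code("causal"))
# {'sense:causal': 1, 'sense:conditional': 0, 'sense:modal': 0, 'sense:participation': 0}
print(treatment_code("mereological"))
# {'sense:causal': 0, 'sense:conditional': 0, 'sense:modal': 0, 'sense:participation': 0}
```

Each indicator is later multiplied by its coefficient, so at most one sense coefficient is ever added to the intercept.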

The second step (which is actually interdependent with the first) is to integrate reference values into the intercept. Each preposition has more than one sense, and the postnominal extension of the noun in a (B)PP can take different values; hence one sense and one postnominal extension are taken as the reference, and the coefficients for the other feature values are determined relative to this reference. Let us briefly illustrate this with the preposition mit ‘with’. For PPs headed by mit (in fact for all PPs included in the present analysis), we take no extension as the reference value for postnominal extensions. With regard to the senses of mit, we take the mereological sense, presented in (2a) and (7b), as the reference (this sense is specific to the prepositions mit and ohne ‘without’). This means that the intercept for examples containing the preposition mit contains this information, and further coefficients must be added if the sense of the preposition differs from the mereological sense, or if a postnominal extension is present.

The third step provides the actual transformation from a numeric predictor to a probability of determiner realization or omission. To achieve this goal, the outcome of the prediction is passed through the inverse of the model’s link function, the logit: the inverse logit maps values ranging from $-\infty$ to $+\infty$ onto the interval between 0 and 1, and thus provides the desired probability.Footnote [18]
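A minimal Python sketch of the inverse logit and of the logit it inverts; the function names are illustrative and not taken from the analysis code.

```python
import math


def inv_logit(eta):
    """Map a linear predictor eta in (-inf, +inf) to a probability in (0, 1).

    The two branches avoid overflow in math.exp for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def logit(p):
    """The link function itself: map a probability in (0, 1) back to the real line."""
    return math.log(p / (1.0 - p))


for eta in (-5.0, -0.84, 0.0, 2.0, 4.61):
    print(f"eta = {eta:5.2f} -> p = {inv_logit(eta):.4f}")
```

A linear predictor of 0 corresponds to a probability of exactly 0.5; negative values fall below it (favouring omission) and positive values above it (favouring realization).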

Let us consider two schematic examples to illustrate this. In the first example, the preposition assumes the mereological sense, and no postnominal extension is present. In the second example, we assume that the meaning of the preposition is causal, and that a postnominal genitive NP is present. Both examples are schematically presented in (13).

We assume a model (actually, we are taking the values from the model discussed in Section 4 below) in which the intercept receives a value of –0.84 (that is our $\beta_{0}$). In line with what we said before, the negative value indicates that, on the basis of the intercept alone, the model favours determiner omission. For (13a), with the mereological sense and no postnominal extension, we can finish the calculations here and determine the probability of determiner realization by feeding the intercept term, which is the only term, into the inverse logit. This yields a probability of 30.2% for determiner realization. In (13b), we have the intercept term and, in addition, two further features that account for the differences between the reference values and the actual values of the features. The coefficient for a postnominal genitive NP is 2.30, and the coefficient for the sense causal is 3.15. Both coefficients are positive and hence favour determiner realization. Each is multiplied by the value of its contrast coding and added to the intercept, resulting in the formula in (14).
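Using the hypothetical labels $X_{\text{gen}}$ and $X_{\text{causal}}$ for the 0/1 contrast codes of a postnominal genitive NP and of the causal sense, the formula can be sketched from the coefficients just quoted as:

$$Y = -0.84 + 2.30 \times X_{\text{gen}} + 3.15 \times X_{\text{causal}}$$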

The contrast coding yields a 1 for both features in the case of (13b). Hence, the value for $Y$ is 4.61 ($=-0.84+2.30+3.15$), which, when fed into the inverse logit, yields a probability of 99.02% in favour of determiner realization.
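The calculation for (13a) and (13b) can be reproduced in a few lines. The sketch uses the rounded coefficients quoted in the text, so its percentages may differ from the reported ones in the last decimal place; the feature names are hypothetical labels.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# Rounded values quoted in the text for the model for mit.
INTERCEPT = -0.84  # beta_0: mereological sense, no postnominal extension
COEFFICIENTS = {
    "postnominal_genitive": 2.30,
    "sense_causal": 3.15,
}


def linear_predictor(features):
    """Intercept plus each coefficient times its 0/1 contrast code."""
    return INTERCEPT + sum(COEFFICIENTS[name] * code for name, code in features.items())


example_13a = {"postnominal_genitive": 0, "sense_causal": 0}
example_13b = {"postnominal_genitive": 1, "sense_causal": 1}

for label, features in (("13a", example_13a), ("13b", example_13b)):
    y = linear_predictor(features)
    print(f"({label}) Y = {y:.2f}, P(determiner) = {inv_logit(y):.2%}")
```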

The models presented in Section 4 make use of the same set of fixed features, so that fixed features that prove relevant across models can be regarded as general properties of BPPs, while the presence or absence of features in individual models indicates preposition-specific conditions.

A model with contrast coding for categorical values, reference values contained in the intercept, and a probability for one of two possible outcomes is called a Generalized Linear Model (here, a logistic regression model). It becomes a Generalized Linear Mixed Model by adding a random component. In the present model, we want to gauge the influence of the individual (4,413) noun types on determiner realization, so the noun lemmata are taken as random features in the model. They differ from the features introduced so far in that they are not drawn from a finite vocabulary. As we will gauge their influence on the intercept only, the model is called a random intercept model, and the formula in (12) is extended by adding this influence ($\theta_{0}$) to the intercept term:
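With the symbols used so far, the extended formula can be rendered as follows (a reconstruction; in the fitted model, $\theta_{0}$ takes a separate value for each noun lemma):

$$Y = (\beta_{0} + \theta_{0}) + \beta_{1}X_{1} + \dots + \beta_{n}X_{n} + \epsilon$$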

The formula in (15) provides an average value for $\theta_{0}$, which is interesting insofar as the fixed features $X_{1}$ to $X_{n}$ can be interpreted as the rule-based component of this model, while $\theta_{0}$ is a reflection of idiosyncrasy. If the average value of $\theta_{0}$ were large, we would have to assume that idiosyncratic values overwhelm the rule-based component of the model, which would lead to the conclusion that the whole process modeled is idiosyncratic rather than rule-based. In the analysis in Section 4, it will turn out that for three of the four prepositions, $\theta_{0}$ is rather small. The exception is the model for über ‘above’, which suggests that BPPs with über result from the lexical influence of individual nouns. On the flip side, we will see that the influence of the random component can be neglected for the preposition ohne ‘without’, the behaviour of which appears to be almost completely rule-based. For the prepositions mit ‘with’ and unter ‘under, below’, there are individual nouns that may exert an influence over the rule-based component. These nouns can be identified by a high individual $\theta_{0}$, so that the model allows a mixed description, taking rule-based and individual factors into account.
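The role of $\theta_{0}$ can be illustrated with a small sketch: the noun-specific value is simply added to the fixed intercept before the inverse logit is applied. The noun labels and $\theta_{0}$ values below are hypothetical, chosen only to show the direction of the effect; they are not estimates from the models.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


BETA_0 = -0.84  # fixed intercept quoted in the text for the model for mit

# Hypothetical noun-specific intercept adjustments (theta_0), invented for
# illustration only.
THETA_0 = {"noun_a": 0.0, "noun_b": -1.5, "noun_c": 2.0}


def p_determiner(noun, fixed_part=0.0):
    """Probability of determiner realization for a noun lemma.

    fixed_part is the sum of beta_i * X_i over the non-reference features.
    Nouns without an estimate fall back to theta_0 = 0, i.e. to the
    prediction of the rule-based (fixed) component alone."""
    theta = THETA_0.get(noun, 0.0)
    return inv_logit(BETA_0 + theta + fixed_part)


for noun in ("noun_a", "noun_b", "noun_c", "unseen_noun"):
    print(noun, round(p_determiner(noun), 3))
```

A noun with a large positive $\theta_{0}$ pushes towards realization even when all fixed features are at their reference values, which is the kind of lexical influence the text describes for individual nouns.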

As will become clear in the analyses, not all of the features are actually employed. This is so because the models determine not only a coefficient for each feature, but also whether the coefficient’s value could be the result of mere chance. For this purpose, the model provides a p-value: the probability of obtaining an estimate at least as far from 0 as the observed one if the true value of the coefficient were 0. If this probability is too large (usually, one relies on a threshold of 5%), the feature is excluded from the analysis, because we cannot rule out that its coefficient is actually 0, in which case the feature would play no role. There are some interesting exclusions of this kind, which will be discussed below.

3.3 On introspection and (un-)acceptability

The employment of annotation mining and Generalized Linear Mixed Modeling should not be mistaken for a principled statement against introspective judgments on acceptability. The methodology used here should be seen as an addition, not as a replacement. In fact, in the present analysis introspective judgments play two important roles.

First, annotation mining is based on annotations at all kinds of linguistic levels, and for many of these levels, the annotations have been developed on the basis of introspective judgments. As with many other annotations, syntactic annotations derive at least partially from reference corpus data, and hence from the introspective judgments of the annotators. Although these judgments may have been affected by the sheer quantity of the data in the corpus, they form a basis nevertheless. Thus, the present analysis relies at least partially on now covert introspective judgments of the developers of the annotation schemata. But the present analysis also relies on overt introspective judgments. Some of these judgments are completely uncontroversial, as, for example, the optionality of the determiner in (2) above. Of course, the optionality can be backed up by the corpus data itself, since the pertinent nouns appear with and without a determiner if embedded under mit. The situation becomes controversial when we turn to data that are claimed to be unacceptable.

It is common sense in linguistics that unacceptable data cannot be found in a corpus. Yet, some qualifications are necessary here. Linguistic phenomena follow a so-called Large Number of Rare Events (LNRE) distribution (Baayen 2009): even if linguistic ‘events’ were finite in number, the vast majority of them would occur only rarely. Of course, we assume infinity for linguistic events, but we are also aware that this infinity should be established among the linguistic tokens, and not among their grammatical descriptions. Compositionality in particular implies that the number of linguistic rules is finite, and it would be counter-intuitive, to say the least, to assume that linguistic rules follow an LNRE distribution, too. What does this mean for the question at hand? Our claim is that patterns of unacceptability can actually be derived from a corpus, provided that the corpus consists of annotated data (recall Section 3.1 above).

As an illustration of an interesting gap, consider the distribution of complement clauses and senses of the prepositions mit and unter with and without a determiner.Footnote [19] If a determiner is present, complement clauses may show up as postnominal extensions with various senses of the two prepositions. But in the absence of a determiner, we do not find a single example showing a complement clause.Footnote [20] A cross-classification of senses and postnominal extensions provides a schematic syntactic and semantic analysis for PPs, an analysis which gives rise to the suspicion that determiners are obligatory if postnominal complement clauses are present. Given the size of the samples, it becomes conspicuous that postnominal complement clauses do not occur if a determiner is missing. This observation prompts an elicitation of introspective judgments, which reveals, in accord with the above observations, that speakers strongly disfavour the absence of a determiner in such patterns. We thus feel justified in assuming that the lack of a determiner actually leads to unacceptability in such cases. Similar considerations apply to other instances of unacceptability reported in the present paper.
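The aggregation step that exposes such a gap can be illustrated with a toy cross-tabulation. The records below are invented for illustration, and the sketch only flags extensions that are attested with a determiner but never without one; as argued above, such a gap is a candidate for elicitation, not proof of unacceptability by itself.

```python
from collections import Counter

# Hypothetical annotated (B)PP instances: (determiner present?, postnominal extension).
records = [
    (True, "complement_clause"), (True, "genitive"), (True, "none"),
    (False, "none"), (False, "genitive"), (True, "complement_clause"),
    (False, "pp_modifier"), (True, "pp_modifier"),
]

counts = Counter(records)  # missing cells count as 0
extensions = sorted({ext for _, ext in records})

# A gap: an extension attested with a determiner but never without one.
gaps = [ext for ext in extensions if counts[(True, ext)] > 0 and counts[(False, ext)] == 0]
print(gaps)  # ['complement_clause']
```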

3.4 Summary statistics

The analysis is based on data sets for the prepositions mit, ohne, über, and unter as summarized in Table 1.

A few comments are in order here. First, note that for ohne, the proportion of BPPs is much higher than the proportion of PPs with a determiner. This preposition differs from all other prepositions in this respect. Apparently, determiner omission has become regular with this preposition, a conclusion that holds not only for ohne in German, but also for zonder (‘without’) in Dutch (Le Bruyn et al. 2012), and for without in English. This distribution also has consequences for the GLMM for ohne, where we are looking for factors that inhibit determiner omission.

Secondly, the data set for mit appears much larger than the other data sets. The data sets for mit, ohne, and unter are the complete data sets found in the sample from the Neue Zürcher Zeitung (https://www.nzz.ch/) corpus based on the 4,413 count noun types, after the filtering procedures (elimination of headlines, titles, etc.) were applied. The larger total number of observations for mit thus simply reflects that relevant examples with mit occur more often and with a larger subset of the 4,413 noun types than relevant examples for ohne and unter.Footnote [21] In the case of über, the data set only shows the comparatively small number of 218 BPPs given in Table 1. We have thus decided to sample PPs from the much larger data set for über, and not to look at the complete data set. The rationale behind this decision is that we are looking for features facilitating determiner omission. Looking at the complete data set would thus not have added further data showing determiner omission, but would have reduced the proportion of the BPPs.

Table 1 Summary statistics of the data sets.

4 Factors of determiner omission

The analysis of determiner realization in PPs headed by mit, ohne, unter, and über is the subject of this section. In the first sub-section, we will discuss what we call the rule-based component of the analysis, which is based on the linguistic features included in the analysis as fixed factors. Subgroups of the features recur with the different prepositions, and common properties can be attributed across the prepositions, even if an analysis cannot be based on one and the same feature for all prepositions. As the four prepositions do not share all senses, we do not expect that one and the same sense can be made responsible for determiner omission across the prepositions. What we can show, however, is that there are specific senses that either inhibit or facilitate determiner omission.

The second sub-section will deal with the random component, which introduces (currently) irreducible lexical idiosyncrasy into the analysis: the random features comprise the lexical heads of the nominal complements of the prepositions. Baldwin et al. (2006: 166, 169f) and Le Bruyn et al. (2012: 188) have emphasized that the behaviour of a subset of PNCs is largely determined by the noun contained in the construction, so-called ‘N-based’ PNCs. A discussion about the particular influence of individual nouns on determiner realization in German BPPs will thus have to take into account whether the findings to be reported here allow a mere transfer of the earlier assumptions on N-based PNCs to BPPs, or whether they suggest that a different treatment is required. This issue will be addressed in Section 4.3.

The analysis has been carried out in R, with the libraries lme4 (Bates et al. 2015), blme (Chung et al. 2013), and lsmeans (Lenth 2016). All graphics have been produced with the library ggplot2 (Wickham 2009). The models are Bayesian Generalized Linear Mixed Models (implemented using blme), which are more robust than (non-Bayesian) Generalized Linear Mixed Models when quasi-separation becomes an issue. Quasi-separation occurs if a predictor (almost) perfectly splits the data. In the models reported here, the conspicuous gap of complement clauses not occurring in BPPs constitutes a case of quasi-separation (see Section 3.3 above).
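To see why quasi-separation causes trouble, consider a toy data set in which a binary predictor (say, presence of a complement clause) occurs only with realized determiners. The sketch below holds the intercept fixed for simplicity and searches a grid for the coefficient. The maximum-likelihood estimate runs off to the edge of the grid, while adding a Gaussian prior yields a finite estimate. The data, the prior width and all names are hypothetical, and the Gaussian prior is a generic stand-in, not necessarily the prior used in the models reported here.

```python
import math

# Hypothetical toy data: x = 1 if a complement clause is present,
# y = 1 if the determiner is realized. Every x = 1 case has y = 1.
data = [(1, 1)] * 20 + [(0, 1)] * 30 + [(0, 0)] * 70

B0 = math.log(30 / 70)  # intercept held fixed at the log-odds among x = 0 cases


def log_likelihood(b1):
    ll = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(B0 + b1 * x)))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll


def log_posterior(b1, sd=2.5):
    # Gaussian prior on b1 (up to an additive constant).
    return log_likelihood(b1) - b1 ** 2 / (2 * sd ** 2)


grid = [i / 100 for i in range(0, 2001)]  # candidate values of b1 from 0 to 20
mle = max(grid, key=log_likelihood)       # keeps growing with b1: hits the grid edge
map_est = max(grid, key=log_posterior)    # the prior pulls the estimate back
print(mle, map_est)
```

Widening the grid would only move the unpenalized estimate further out, which is the practical symptom of quasi-separation.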

In order to keep the presentation of the models in Section 4.1 transparent, we will present the models without the random components in this section.

4.1 The categorical component of the models

4.1.1 The model for mit ‘with’

The model for mit given in (16) contains the largest set of features, which can be subdivided into postnominal extensions, the senses of the preposition, and three individual features.

The first group provides information about the influence of syntactic extensions of the head noun on determiner omission. Recall from Section 3.2 that a positive coefficient indicates that the feature facilitates determiner realization, while a negative value indicates that the feature facilitates determiner omission. All postnominal extensions bear a positive coefficient and hence suggest that postnominal extensions inhibit determiner omission.

The senses of the preposition provide information about the role of the different senses of the preposition for determiner omission. The annotated senses follow the analysis in Kiss et al. (2016). All senses listed for mit show positive coefficients, suggesting that the senses listed inhibit determiner omission (but see immediately below).

A closer scrutiny of the aforementioned groups reveals that features are missing in both groups: there is no coefficient for the feature no postnominal extension, and there is no coefficient for the mereological sense of mit, which has already been introduced in (2) and (7) above, nor is there a coefficient for two other senses of the preposition, namely dependency and association. The latter two are missing because the model lists only significant coefficients; the role and representation of significant features will be discussed below.

However, the features no postnominal extension and mereological sense (as well as others, to be discussed below) are not listed because they are contained in the intercept term. The intercept term provides information about determiner realization and omission when the so-called reference values of the features apply. If a feature is binary, the reference value is the absence of the feature (this pertains to the two other features to be discussed: adjectival modification and nominalization); if a feature has more than two values, as with postnominal extensions and senses of the preposition, one value is chosen as the reference value. In the models for mit and ohne, the reference value for the preposition’s sense is the mereological sense (in the models for unter and über, it is the spatial super-sense), and in all models, the reference value for postnominal extension is that there is no such extension. We note that the intercept term is negative, thus indicating that determiner omission is likely in the presence of the reference values.

The feature adjectival modification represents the presence or absence of a pre-nominal modifier. Its reference value is that no such modifier is present. The feature nominalization describes whether the noun contained in the (B)PP is the result of a derivation from a verbal base. Again, the reference value assumes that the noun has not been derived. In sum, the model predicts that determiner omission is more likely than determiner realization if the sense of the preposition is mereological, if neither a prenominal nor a postnominal extension is present, and if the noun is not the result of a derivation.

Fixed features are selected only if they contribute to the analysis. The model has in fact access to further morphological information, such as compounding, but such features do not play a role for the analysis because they are not significant.

Whether or not a feature plays a role is determined by calculating how sure we can be that the coefficient of a given feature value is not actually 0, i.e. that the value given in the model does not differ from 0 merely by chance. We accept only those coefficients whose p-value is below 5%, which is determined through the Wald test statistic (abbreviated by $z$). We have omitted the actual Wald test statistic $z$ and only represent the resulting p-value in the column headed by $Pr({>}|z|)$, as in the model in (16) above. The value ${<}$ 2e-16 indicates that the actual value is smaller than the smallest value reported by the system used.
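The Wald test divides an estimate by its standard error and compares the result with a standard normal distribution. A minimal Python sketch, using hypothetical estimates and standard errors rather than values from the models; the function name is illustrative.

```python
import math


def wald_test(estimate, std_error):
    """Two-sided Wald test of H0: coefficient = 0.

    z = estimate / std_error is compared with a standard normal distribution;
    the two-sided p-value P(|Z| >= |z|) equals erfc(|z| / sqrt(2))."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical (estimate, standard error) pairs, not values from the paper.
for estimate, se in ((2.21, 0.35), (0.40, 0.30), (0.12, 0.30)):
    z, p = wald_test(estimate, se)
    verdict = "kept" if p < 0.05 else "dropped"
    print(f"beta = {estimate:+.2f}, z = {z:5.2f}, p = {p:.3g} -> {verdict}")
```

R’s coefficient summaries print `< 2e-16` once a p-value falls below that display threshold; this function simply returns the floating-point value, which may be tiny or underflow to zero for very large $|z|$.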

With regard to the senses of mit, we note that 12 of the 14 senses are actually significant, while two senses are not significantly different from the reference value, so that changing from the mereological sense to the senses dependency or association (see Kiss et al. 2016: 111) will not change the prediction. The fact that adjectival modification supports determiner omission (given its negative coefficient) is surprising in light of the claim found in Baldwin et al. (2006: 167) that ‘modification is seldom unrestricted [in P+N combinations]’.

The value of the intercept is approximately –0.84. If the intercept term were the only term indicating whether or not a determiner should be realized in the case of mit, it would suggest determiner realization with a likelihood of only 30%, which is the probability provided by the inverse logit of –0.84. We can thus conclude that the mereological sense and the absence of pre- and postnominal extensions support determiner omission. Let us now take a look at how the model deals with the examples, which predictions it makes, and how they come about, starting with examples (2a, b), repeated here under (17).Footnote [22]

Example (17a) is predicted to allow determiner omission by the model as it is completely covered by the intercept: The interpretation of the preposition is mereological, there is no pre- or postnominal extension in the nominal projection, and the noun Bildnis ‘effigy’ is not a derived nominal. We can thus calculate the likelihood of determiner realization by feeding the intercept term (–0.84) into the inverse logit, yielding a likelihood of 30.24% for determiner realization.

Next consider (17b). Here, the sense of the preposition is conditional, which inhibits determiner omission with a positive coefficient of 2.21. But (17b) differs from (17a) not only with regard to the sense of the preposition; in addition, the noun is modified by an AP, and it can be classified as a derived nominal, two features which support determiner omission due to their negative coefficients. Taking all factors together, the likelihood for determiner realization reaches a value of 46.01%. (We will return to (17b) in Section 4.2.)

Let us compare these results with the results for the examples provided in (7), repeated here under (18).

The preposition shows the sense modal (instrumental) in (18a), and a postnominal genitive NP. Hence, we find two features with positive coefficients that accordingly inhibit determiner omission. The GLMM for mit predicts a likelihood for determiner realization of 98.33%, matching the ungrammaticality of determiner omission in the example.

While the sense modal facilitates determiner realization, it does so to a much lesser degree than the sense participation (as will become clear shortly, the former makes determiner realization about ten times more likely, but the latter about a hundred times). Consequently, the model predicts that the determiner has to be realized in (18c) with a likelihood of 95.58%. In (18b), we find the mereological sense again, but the postnominal extension is a complement clause. The presence of a complement clause is an even stronger indicator for determiner realization, and the example is predicted to contain a determiner with a likelihood of 99.67%.

The influence of the individual fixed features on the outcome can be made more transparent in a plot of the odds ratios (Agresti 2007: 28–33). The odds ratio of a feature indicates by what factor the odds of the positive outcome (its probability divided by the probability of the other outcome) are multiplied when the feature is present. In the present case, a positive outcome is the realization, and a negative one the omission of the determiner.

Figure 1 Odds ratios and confidence intervals for significant coefficients in the GLMM for mit ‘with’.

The odds ratios for the significant coefficients of the fixed factors in the model for mit are provided together with their 95% confidence intervals in Figure 1. The horizontal dotted line marks the value 1. This value would indicate that the feature makes the outcome neither less nor more likely, i.e. does not have an influence at all. If a feature shows a value below the line, it makes determiner realization less likely (it shows a negative coefficient); if it occurs above the line, it makes it more likely (it shows a positive coefficient). The odds ratios are given on a logarithmic scale, so that the horizontal line separates the effects symmetrically, leading to equidistance for the values below and above 1. Hence a feature value with an odds ratio of 0.1 is exactly as far away from 1 as a feature value with an odds ratio of 10, and the former exactly offsets the latter.
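Because the features are coded 0/1, the odds ratio of a feature is simply the exponential of its coefficient. The sketch below applies this to the two coefficients quoted earlier for the model for mit and illustrates the symmetry of the logarithmic scale; the function name is illustrative.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a 0/1-coded feature: exp(beta).

    It is the factor by which the odds of determiner realization,
    P / (1 - P), are multiplied when the feature is present."""
    return math.exp(coefficient)


# Coefficients quoted in the text for the model for mit.
for name, beta in (("postnominal genitive NP", 2.30), ("sense: causal", 3.15)):
    print(f"{name}: beta = {beta:+.2f}, odds ratio = {odds_ratio(beta):.2f}")

# On a log scale, 0.1 and 10 lie at the same distance from 1 ...
print(abs(math.log10(0.1)), abs(math.log10(10)))
# ... and a feature with odds ratio 0.1 cancels one with odds ratio 10.
print(odds_ratio(math.log(0.1) + math.log(10)))
```

The genitive coefficient of 2.30 thus corresponds to an odds ratio of roughly 10, i.e. odds of determiner realization about ten times higher than with no postnominal extension.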

The majority of the postnominal extensions makes determiner realization more than ten times more likely. If no postnominal extension is present, which is indicated in the intercept, then determiner realization becomes less likely, as we have already illustrated. In the following, we will only present the odds ratios, and discuss the implications on their basis.

One could wonder whether the odds ratios in Figure 1 reveal even more structure regarding the individual features. To this end, we have applied Tukey’s Honestly Significant Difference test (Tukey’s HSD; Baayen 2009: 114–116) to determine whether the features differ from each other.

With regard to the postnominal extensions, it turns out that the features postnominal genitive, PP complement, and PP modifier differ significantly from complement clause, but not from each other. These differences can be interpreted as follows: the former already make determiner realization more likely, but the presence of the latter makes determiner omission impossible. This is what we have seen in (18b); see also the discussion in Section 3.3 above.

With regard to the senses of the preposition, we can identify two groups of features: the senses causal, conditional, and modal make determiner realization more likely, but they sharply differ from the senses participation, point-of-reference, realization, event, and governed, whose odds ratios are much larger, resulting in obligatory determiner realization even if there are no postnominal extensions, as in (18c).

With the exception of (2c), which will be dealt with in Section 4.2, the apparently unsystematic distribution of determiner realization and omission exemplified in (2) and (7) can be accounted for by the features provided in (16) and Figure 1: one sense of the preposition, the mereological sense, facilitates determiner omission, while the other senses inhibit it. Postnominal extensions also inhibit determiner omission, while adjectival modification and nominalization support it. Since these features can be present at the same time, a determiner may still be omitted even though a postnominal extension is present, particularly if the noun is modified by an AP or is the result of nominalization. This is the situation found in (8), where a postnominal genitive NP suggests determiner realization, while three properties suggest determiner omission: the mereological sense, the presence of an AP, and the derived nature of the head noun.

4.1.2 The model for ohne ‘without’

The odds ratios for the model of ohne are presented in Figure 2.

Figure 2 Odds ratios and confidence intervals for significant coefficients in the GLMM for ohne ‘without’.

The model for ohne contains two features that have not reached significance in the model for mit. They relate to the position of the PP in the German clause (see Müller 2015), and to the element with which the PP is combined. Basically, the model says that PPs that modify a verb (governor: V), and PPs occurring in sentence-initial position in German root clauses (phrase: V2) are (a little) less likely to omit a determiner; both values are positive. The two features governor: V and phrase: V2 are specific to ohne: they are the only features that do not appear in the other models, and will be discussed in Section 5 in this respect.

Let us look at the predictions of the model with respect to the data presented in (9), repeated here under (19).

The model for ohne introduces a high negative intercept (–3.74858), which, when fed into the inverse logit, yields a probability of determiner realization of only 2.3%; this is what is predicted for (19a). In addition to assuming that no postnominal extension is present, that the sense of the preposition is mereological, that there is no adjectival modification, and that the noun is not a derived nominal, the intercept also assumes that the PP is not governed by a verb, and that it does not occur in sentence-initial position in a German main clause; all these conditions are met in (19a). Given the high negative intercept, the model thus predicts the aforementioned low likelihood of determiner realization, which is in line with the observation that determiner realization is almost completely optional with ohne. The odds ratios show that all other significant features except for nominalization increase the likelihood of determiner realization. As with mit, the postnominal extension complement clause stands out as a strong inhibitor of determiner omission. We note a much lower number of significant senses, which do not establish a hierarchy (at least not in the sense of Tukey’s HSD test).

Next, consider (19b). This example shows the conditional sense of the preposition, as well as a prepositional complement of the noun. In addition, the PP appears in clause-initial position and depends on a verb. Taken together, these features yield a likelihood of determiner realization of only 41%, and again, the determiner can easily be omitted. The situation is different in (19c), where the following features come together: the PP modifies a verb (captured by the feature governor: V), the preposition has the conditional sense, and the noun takes a postnominal complement clause. The feature governor: V has only a weak influence (see Figure 2), but the conditional sense and the postnominal complement clause strongly facilitate determiner realization. Together, the features raise the likelihood of determiner realization to 95.8%.
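Because feature contributions add up on the log-odds scale, it can help to translate the reported probabilities back into linear predictors. The sketch below does this for (19b) and (19c); it uses only the percentages and the intercept given in the text, since the individual coefficients in Figure 2 are not reproduced here, and the `logit` helper is our own.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability (the inverse of the inverse logit)."""
    return math.log(p / (1.0 - p))


OHNE_INTERCEPT = -3.74858
# Reported probabilities of determiner realization for the two examples.
reported = {"19b": 0.41, "19c": 0.958}

shifts = {}
for example, p in reported.items():
    eta = logit(p)
    # Summed contribution of all features that deviate from the reference values.
    shifts[example] = eta - OHNE_INTERCEPT
    print(f"({example}) linear predictor {eta:+.2f}, shift from intercept {shifts[example]:+.2f}")
```

On this scale, the features present in (19c) move the prediction roughly twice as far from the intercept as those in (19b) (about +6.88 versus +3.38).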

The prepositions mit and ohne share five senses for which an antonymous relation can be established: mereology, conditional, modal, restrictive, and participation. For each sense, mit provides an inclusive interpretation (e.g. X is part of Y, X is carried out by use of Y), while ohne provides the exclusive or privative interpretation (e.g. X is not part of Y, X is not carried out by use of Y). The results of the analysis contradict the suggestion by Baldwin et al. (2006) that inclusive senses allow determiner omission more easily than exclusive senses. Determiner omission occurs more often with ohne than with mit and, judging by the strongly negative intercept, also more easily, despite the exclusive senses of the preposition.

4.1.3 The model for unter ‘under, below’

The odds ratios of the model for unter are provided in Figure 3. The intercept has a high odds ratio (i.e. a large positive coefficient), indicating that determiner realization is much more likely than determiner omission. As in the models for mit and ohne, the significant postnominal extensions (PP complement and modifier, postnominal genitive NP, relative clause, apposition, and complement clause) further inhibit determiner omission.

The intercept contains the spatial sense of the preposition; hence spatial interpretations of the preposition facilitate determiner realization. The three senses state, modal, and conditional have an inhibiting effect on determiner realization. Interestingly, the largest inhibiting effect is exerted by the feature nominalization, whose negative coefficient is almost identical in magnitude to the intercept (intercept: 4.86; nominalization: –4.576). We will discuss the effect of nominalization in more detail below.
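To see how close the nominalization coefficient comes to cancelling the intercept, one can feed both values into the inverse logit. This is an idealized sketch: it holds every other feature at its reference value, a configuration that need not correspond to any attested example.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a linear predictor on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


UNTER_INTERCEPT = 4.86     # reported intercept of the unter model
NOMINALIZATION = -4.576    # reported coefficient for a derived nominal

p_reference = inv_logit(UNTER_INTERCEPT)
p_nominalized = inv_logit(UNTER_INTERCEPT + NOMINALIZATION)
print(f"reference values only: {p_reference:.1%}")    # about 99.2%
print(f"plus nominalization:   {p_nominalized:.1%}")  # about 57.1%
```

A derived nominal alone thus brings the prediction from near-certain realization down to close to chance.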

In addition to the spatial sense of the preposition, the intercept contains reference values that are already familiar from the previous models: It assumes that there is no postnominal extension, and that the noun is not the result of nominalization.

With regard to the features postnominal extension and sense of the preposition, the Tukey HSD test does not reveal anything that is not already apparent from Figure 3: All other postnominal extensions differ significantly from complement clauses in their influence on determiner realization, and complement clauses make determiner omission more or less impossible. This is quite similar to the situation observed with mit. The consequences of the model can be illustrated with the examples in (10), repeated here under (20).

Figure 3 Odds ratios and confidence intervals for significant coefficients in the GLMM for unter ‘under, below’.

The examples (20a, b) exhibit modal senses: concomitant circumstance in (20a) and instrumental in (20b). Yet, determiner omission is possible in the former, but not in the latter.

Example (20a) contains an additional inhibiting feature: The head noun is a derived nominal. The cumulative effect of the two features with negative coefficients leads to a negative linear prediction of approximately –3.79, the inverse logit of which yields a probability of 2.21% for determiner realization. The second inhibiting feature, nominalization, is missing in (20b), which accordingly shows a probability of 68.66% for determiner realization. (We will return to example (20b) in Section 4.2. In addition to the fixed component, the random component also speaks in favour of determiner realization in this case.) In (20c), the head noun is a derived nominal without adjectival modification, but the sense of the preposition is spatial, and there is a postnominal genitive NP. The probability for determiner realization of 94.23% is therefore not surprising.
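The figures for (20a) and (20b) can be checked against each other. The modal coefficient is not given in the text, so this sketch back-calculates it from the 68.66% reported for (20b), assuming that (20b) differs from the reference values only in its modal sense; adding the reported nominalization coefficient should then reproduce the prediction for (20a).

```python
import math


def inv_logit(eta: float) -> float:
    """Map a linear predictor on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


UNTER_INTERCEPT = 4.86
NOMINALIZATION = -4.576

# (20b): reference values plus the modal sense; reported P(Determiner) = 68.66%.
eta_20b = logit(0.6866)
modal_implied = eta_20b - UNTER_INTERCEPT  # back-calculated, not a reported coefficient

# (20a): the same modal sense plus a derived nominal as head noun.
eta_20a = UNTER_INTERCEPT + modal_implied + NOMINALIZATION
p_20a = inv_logit(eta_20a)
print(f"implied modal coefficient: {modal_implied:.3f}")
print(f"(20a): linear predictor {eta_20a:.2f}, P(Determiner) {p_20a:.2%}")
```

The reconstructed values (about –3.79 and 2.21%) match those in the text, which is consistent with (20a) and (20b) differing only in the nominalization feature.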

4.1.4 The model for über ‘over, above’

The odds ratios for über, presented in Figure 4, differ strongly from those of the other three models: All significant fixed factors in the model have positive coefficients.

The fixed component thus predicts determiner omission to be impossible with über, a prediction that is contradicted by the examples provided in (11) above: According to the fixed component, all the examples in (11) should require the determiner to be realized. However, the examples can be accounted for if we consider the random component of the model.

Figure 4 Odds ratios and confidence intervals for significant coefficients in the GLMM for über ‘over, above’.

4.2 The random component of the models

Before we discuss how the interaction between the random and the fixed components of the models is achieved formally, we will elucidate the linguistic interpretation of the fixed and random components. As an illustration, consider the model for mit. The features given in Figure 1 above can be interpreted as establishing syntactic and semantic patterns that either allow or prohibit determiner omission. An illustration for the first type is given in (21a), an illustration for the second one in (21b).

The position of a possible determiner is indicated by DET in (21). Since determiner omission is optional, a determiner could always be inserted in this position. But (21a) is a pattern in which no determiner has to be inserted, while (21b) requires its presence. The effect of the random component can be characterized as follows: In a pattern like (21b), where determiner omission should not happen, determiner omission still seems to happen with a high likelihood, depending on how the N$^{0}$ position is filled. The random component does not provide a reason for this change (over and above its dependency upon the presence of a specific noun), but properties that have already been captured in the fixed component cannot be made responsible for it (see footnote 23). Consequently, the random component either captures latent properties of certain nouns that have not been established yet or it models true cases of idiosyncrasy. In the current analysis, we cannot go deeper than this regarding the properties of the nouns that make up the random component. Interestingly, we find that determiner realization can also become more likely in patterns like (21a), a point to which we will return in Section 4.3.

Let us now turn to the formal combination of the random and the fixed components. In a random intercept model, the effect of the random component is reported as a standard deviation of the intercept. A standard deviation is on the same scale as the quantity of which it is the standard deviation, here the log-odds scale of the linear predictor. As an illustration, consider Figure 5, which shows the logistic function.

Figure 5 Illustration of influence of the random component on the linear predictor.

Let us assume that the fixed component has provided a linear prediction of 1 on the x-axis. This value is mapped to a probability of determiner realization – P(Determiner) – of 73% by the logistic function, as indicated by the data point (1, 0.73) in Figure 5. If only the fixed factors were considered, the model would thus predict determiner realization with high probability. A random intercept model gauges the influence of random features – the nouns in our case – on the intercept. Let us assume a particular noun that bears an individual value of –3. This value must be added to – in this case, subtracted from – the intercept term, according to the formula (15), provided in Section 3 above, and repeated here under (22).

In (22), $\beta_{0}$ is the intercept term, and $\theta_{0}$ is the added random component. Since $\theta_{0}$ is negative, adding it to the linear predictor amounts to a subtraction, yielding –2 instead of 1. The change in the linear predictor (on the x-axis in Figure 5) is reflected in the output of the logistic function: The likelihood drops from 73% without the random component to a mere 12% once the random component, i.e. the individual influence of the noun present in the PP, has been added, as indicated by the data point (–2, 0.12). As a result, the fixed component predicts determiner realization, while the combination of fixed and random components makes determiner realization rather unlikely.
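The arithmetic behind Figure 5 can be sketched as follows, assuming that (22) has the standard random-intercept form $\eta = \beta_0 + \theta_0 + \sum_i \beta_i x_i$, so that a noun's value $\theta_0$ is simply added to the fixed-component predictor. The values 1 and –3 are the illustrative ones from the text, not model estimates.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a linear predictor on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


fixed_eta = 1.0   # linear predictor contributed by the fixed component (illustrative)
theta_0 = -3.0    # random intercept of one particular noun (illustrative)

p_fixed = inv_logit(fixed_eta)
p_with_noun = inv_logit(fixed_eta + theta_0)
print(f"fixed component only:    {p_fixed:.2f}")      # 0.73
print(f"with the noun's theta_0: {p_with_noun:.2f}")  # 0.12
```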

In the summary of the models, the random component is given as the standard deviation of the intercept, i.e. as a single value that summarizes the variation across all nouns. We can, however, extract the individual random components for the nouns that occur with the respective prepositions in the corpus. In the following analysis of the random components, we will provide individual values for those nouns that show a stable influence in one direction, i.e. a consistently positive or a consistently negative influence on determiner realization. The individual values introduced by the nouns will be listed together with their 95% confidence intervals, to make clear that the influence is actually one-sided. We will start our discussion with the random components for the prepositions mit and unter.
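The criterion of a stable influence in one direction amounts to keeping a noun only if its 95% confidence interval excludes zero. A minimal sketch of that filter, with hypothetical nouns and values; how the intervals are obtained depends on the mixed-model software and is not shown.

```python
from dataclasses import dataclass


@dataclass
class NounEffect:
    noun: str
    estimate: float  # individual deviation from the intercept (log-odds)
    lower: float     # lower bound of the 95% interval
    upper: float     # upper bound of the 95% interval


def one_sided(effects: list[NounEffect]) -> list[NounEffect]:
    """Keep only nouns whose 95% interval lies entirely above or entirely below zero."""
    return [e for e in effects if e.lower > 0 or e.upper < 0]


# Hypothetical placeholder values for illustration only; they are not model estimates.
effects = [
    NounEffect("noun_a", -3.0, -4.5, -1.5),
    NounEffect("noun_b", 0.4, -0.6, 1.4),
    NounEffect("noun_c", 1.2, 0.3, 2.1),
]

for e in one_sided(effects):
    direction = "realization" if e.estimate > 0 else "omission"
    print(f"{e.noun}: {e.estimate:+.1f}, favours determiner {direction}")
```

Here noun_b is dropped because its interval straddles zero, so the direction of its influence is not stable.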

4.2.1 The random components in the models for mit ‘with’ and unter ‘under, below’

The random components in the models for mit and unter are given in (23).

The summary of the two random components indicates that in both models, the random component has an influence on determiner omission. Consider the preposition unter. An example showing all the features referenced in the intercept, and no further features, would receive a likelihood of determiner realization of over 99%. If we further assume that the variance introduced by the random component is subtracted from the linear prediction of the fixed component (which requires converting it into a standard deviation to obtain a number on the same scale), the likelihood drops to 85%. This does not look like a large reduction, but we should keep in mind that we have used summary values here. If we look at the individual values, a different picture emerges, as can be illustrated with the example in (24).
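The summary in (23) is not reproduced here, but the two percentages for unter can be inverted to see roughly what standard deviation they imply. This sketch assumes that ‘over 99%’ corresponds to the intercept of 4.86 alone and that the 85% results from subtracting exactly one standard deviation; the result is a back-calculation, not the estimate reported in (23).

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


UNTER_INTERCEPT = 4.86  # reported intercept of the unter model

eta_after = logit(0.85)                   # linear predictor behind the reported 85%
implied_sd = UNTER_INTERCEPT - eta_after  # one standard deviation on the log-odds scale
implied_variance = implied_sd ** 2        # the corresponding variance
print(f"implied standard deviation: {implied_sd:.2f}")        # about 3.13
print(f"implied variance:           {implied_variance:.2f}")  # about 9.77
```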

The preposition in (24) shows an extended sub-sense of the spatial sense, relationship-of-power. In addition, the noun Lizenz ‘license’ shows a postnominal extension (a PP complement). Taken together, these features predict a likelihood of determiner realization of 99.89%. The determiner, however, is conspicuously missing in (24). This can be accounted for if the individual value introduced by the noun Lizenz ‘license’ is added to the linear predictor. Since this value is negative (–7.26), adding it amounts to a subtraction, and the prediction for determiner realization is lowered to 40.81%.
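The shift for (24) can be sketched with a small helper that moves a probability by a given amount on the log-odds scale (the helper `add_noun_effect` is our own). Starting from the rounded 99.89% gives roughly 39%; the exact result is sensitive to the unrounded fixed-component prediction, which is not given in the text, so the sketch only approximates the reported 40.81%.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Map a linear predictor on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def add_noun_effect(p_fixed: float, noun_value: float) -> float:
    """Shift a fixed-component probability by a noun's individual value (log-odds)."""
    return inv_logit(logit(p_fixed) + noun_value)


p_lizenz = add_noun_effect(0.9989, -7.26)  # (24): fixed prediction and value for Lizenz
print(f"P(Determiner) with Lizenz: {p_lizenz:.1%}")  # roughly 39%
```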

A summary of the nouns that exert a random influence on the linear predictor for unter is provided in Figure 6 (a list of the relevant nouns in the models for unter, mit, and über is provided in the appendix together with an approximate English translation of each noun). In this figure, we find a large variety of nouns with an individual negative influence between –1.54 and –8.28, but we also find nouns with a positive random influence, with values between 0.83 and 4.83. Such nouns are responsible for determiner realization in patterns that would otherwise suggest determiner omission: The general optionality of the determiner in PPs is suspended if such a noun occurs as the nominal head of the PP.

One such noun is Mikroskop ‘microscope’, which we have already seen in example (10b), repeated as (20b).

The preposition shows a modal (instrumental) sense in (20b), which facilitates determiner omission. But since no other features support determiner omission in the example, the prediction is still in favour of determiner realization (68.66%). And the determiner cannot be omitted in this example. If the positive random influence of the noun is taken into account, the likelihood for determiner realization is raised to 99.00%. We will return to the nouns that exert positive random influence in Section 4.3.
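Conversely, the individual value for Mikroskop, which appears in Figure 6 but not in the text, can be approximated from the two reported percentages. This back-calculation assumes that the noun's value is the only difference between the two predictions.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


# Reported probabilities for (20b): fixed component only, then fixed plus random component.
p_fixed, p_with_noun = 0.6866, 0.99
implied_mikroskop = logit(p_with_noun) - logit(p_fixed)
print(f"implied value for Mikroskop: {implied_mikroskop:+.2f}")  # about +3.81
```

The result lies inside the range of positive values (0.83 to 4.83) reported for Figure 6.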

Figure 6 Individual nouns with their influence on the linear predictor of the fixed component for unter ‘under, below’.

A summary of the nouns that exert a random influence on the linear predictor for mit is provided in Figure 7.

Figure 7 Individual nouns with their influence on the linear predictor of the fixed component for mit ‘with’.

In Section 4.1, we did not discuss example (2c), which is repeated here:

The sense of the preposition is modal (instrumental), and hence the model predicts a likelihood of determiner realization of 85.38%. The noun Kreditkarte ‘credit card’, however, exerts a strong negative influence on determiner realization and reduces the likelihood of determiner realization to 24.09%.

In addition to (2c), let us also look at (2b).

Here, the fixed component yields a likelihood of determiner realization of 46.01%, which is very close to 50%, i.e. no decision at all. But the random component applies to this example as well. The presence of the noun Genehmigung ‘permission’ lowers the likelihood of determiner realization to a mere 1.1% (see Figure 7).

Similar considerations apply to example (8).

The fixed component yields a prediction of 48.52%. The noun Verankerung ‘anchoring’ exerts an additional negative influence, which lowers the predicted likelihood of determiner realization in this case to 15.84%.
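The same back-calculation can be applied to the three mit examples. It yields only approximate values for Kreditkarte, Genehmigung, and Verankerung, since the reported percentages are rounded (1.1% in particular); the actual estimates are those shown in Figure 7.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


# (noun, fixed-component probability, probability after adding the noun's value)
examples = [
    ("Kreditkarte", 0.8538, 0.2409),  # (2c)
    ("Genehmigung", 0.4601, 0.011),   # (2b)
    ("Verankerung", 0.4852, 0.1584),  # (8)
]

implied = {noun: logit(after) - logit(before) for noun, before, after in examples}
for noun, value in implied.items():
    print(f"{noun}: {value:+.2f}")
```

All three implied values are negative, in line with the facilitating effect on determiner omission described above.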

As was already the case with the random component for unter, not only do we find nouns that exert a negative influence but also nouns that exert a positive one, a point to which we will return in Section 4.3.

4.2.2 The random components in the models for ohne ‘without’ and über ‘over, above’

The random components for the prepositions ohne and über form two extremes. The random component for ohne is irrelevant for the prediction, whereas the fixed component for über prohibits determiner omission altogether. As a consequence, determiner omission with über can only be accounted for by the random component of the model. A summary of the two random components is provided in (25).

The random component for über in (25) tells us that a considerable part of the variance in the model is accounted for by the nouns realized within the PPs headed by über, which in effect means that whether or not a determiner is realized depends largely on the noun. We have extracted the nouns that show a stable one-sided influence, and listed them with their individual values and their 95% confidence intervals in Figure 8.

Figure 8 Individual random effects (i.e. nouns) in the model for über ‘over, above’.

In contrast to the random components discussed so far, the random component for über does not contain nouns with positive influence. The effect of the random component can be illustrated with the examples in (11), repeated here under (26).

The BPP shows a causal sense in (26a). The fixed component gives the likelihood of determiner realization as 98.94% (it is irrelevant in this case that the noun is a derived nominal, since this feature does not play a role in the model for über). The noun Behandlung ‘treatment’, however, provides a strong negative value to be added to the intercept (–5.86). Adding the random component, i.e. the influence of Behandlung on determiner realization in PPs headed by über, thus reduces the likelihood of determiner realization to 9.76%. Similar considerations apply to (26b), where the noun Kabelnetz ‘cable network’ reduces the likelihood from 99.40% to 31.70%. In (26c), the noun Rationierung ‘rationing’ reduces the likelihood of determiner realization from over 99% to a mere 18.73%. Finally, the pertinent noun in (26d), Tunnelfinanzierung ‘tunnel financing’, does not belong to the random component of über, and hence the omission of the determiner is not licensed.
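For über, a back-calculation shows how large a noun's value must be to overturn the fixed component. The sketch takes (26b) with Kabelnetz as its example and uses only the two reported percentages; the individual values themselves are those in Figure 8.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


# (26b): fixed-component prediction and prediction after adding the value for Kabelnetz.
p_fixed, p_with_noun = 0.9940, 0.3170
implied_kabelnetz = logit(p_with_noun) - logit(p_fixed)
print(f"implied value for Kabelnetz: {implied_kabelnetz:+.2f}")  # about -5.88
```

A shift of roughly six units on the log-odds scale is needed before omission becomes more likely than realization for a prediction that starts near 99%.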

In contrast to the random components for the other prepositions, the random component for ohne can be neglected. First, it contains only nine nouns with a one-sided influence, seven of which facilitate determiner realization. Secondly, the individual influence of the nouns ranges from –1.42 to 1.79, which must be set in relation to the high negative intercept of the fixed component (–3.749). If we take the intercept as a reference point, the presence of one of the positively influential nouns would yield a likelihood of only 11% for determiner realization.

4.3 Does the random component provide evidence for N-based PNCs?

The inclusion of a random component in the models for BPPs has emphasized the role of the noun within a BPP. This, however, does not seem to be a new finding. It is widely acknowledged that the behaviour of a subset of PNCs is determined by the noun contained in the construction. This goes so far that Le Bruyn et al. (2012: 188) call a subclass of PNCs ‘N-based’. A similar class is singled out in Baldwin et al. (2006: 166, 169f.) and Stvan (2009: 329–331). A discussion of the particular influence of individual nouns on determiner realization in German BPPs thus has to make clear whether this is just a corroboration of the existence of ‘N-based’ BPPs, or whether the elements of the random component require a different treatment.

Le Bruyn et al. (2012: 188) point out the following properties of N-based Dutch PNCs. First, the nouns belong to a semantic class of common locative nouns. Along the same lines, Baldwin et al. (2006: 169–170) introduce a class of institutionalized nouns. While a similar class of nouns can be identified in German, these nouns show a syntactic behaviour that differs from that of PNCs: They occur in constructions where the preposition and the determiner seem to be amalgamated (so-called Verschmelzungsformen), as illustrated by (27a).

In (27a), we find many of the criteria developed by Stvan (2009); in particular, we find a pragmatic enrichment that is not present in a construction where the preposition and the determiner occur independently. While (27a) can only mean ‘he is serving a sentence’, (27b) shows an ordinary spatial interpretation. Crucially, BPPs are ungrammatical in such contexts, as illustrated in (27c). German also defies the claim in Baldwin et al. (2006: 164) that ‘articles are regularly omitted in expressions of similar semantic type across languages’. They use institutions, seasons, and metaphors as examples (it should be noted here that ‘institutions’, etc. refer to the meaning of the embedded nouns, not to the meaning of the phrase as a whole). For similar constructions in German, it does not hold that the article can be omitted (English examples taken from Baldwin et al. 2006): at school vs. in *(der) Schule, in winter vs. im Winter vs. in *(einem/dem) Winter, at large vs. auf *(der) Flucht.

Secondly, the nouns occur across prepositions. Le Bruyn et al. (2012) illustrate this with uit bed ‘out of bed’, naar bed ‘to bed’, and in bed ‘in bed’. Thirdly, the nouns cannot be exchanged with near-synonymous nouns or nouns belonging to the same lexical field in the construction, as is illustrated with the contrasts between zolder ‘attic’ and kelder ‘basement’, and school ‘school’ and universiteit ‘university’ in the following examples:

In addition, Le Bruyn et al. (2012) as well as Baldwin et al. (2006: 169f.) assume that N-based combinations require a stereotypical pragmatic enrichment, or a semantics similar to a weak definite (Aguilar-Guevara & Zwarts 2011). Furthermore, Baldwin et al. (2006) and Stvan (2009: 317–318) show that nouns in English N-based PNCs may occur without a determiner outside of PNCs, so that they systematically violate P II (Restriction to P).

The properties listed above do not apply to the nouns identified in the random components of the models for the German prepositions under discussion. First, we note that the nouns that have been listed in Figures 6–8 hardly belong to a common semantic class. Secondly, we do not find pragmatic enrichments or a necessary interpretation as a weak definite if the nouns occur within a BPP.

Thirdly, the behaviour across prepositions is much more complex than the pattern provided by Le Bruyn et al. (2012). This is particularly evident with one factor that has not been considered at all in the discussion by Baldwin et al. (2006), Stvan (2009), and Le Bruyn et al. (2012): The models for unter and mit (as well as the one for ohne) contain not only nouns that facilitate determiner omission, but also nouns that inhibit determiner omission. We would expect nouns occurring across prepositions to show a one-sided effect (i.e. to either facilitate or inhibit determiner omission). But the very same noun may support determiner omission with one preposition and inhibit it with another, as can be witnessed with Vorbehalt ‘prerequisite’ when combined with mit and unter. With the first preposition, the noun has a strong negative influence (–3.636) and hence facilitates determiner omission; with the second preposition, it shows a weak positive influence (0.836) and hence facilitates determiner realization.

Fourthly, we note that the contrast described by Le Bruyn et al. (2012), and illustrated in (28) above, does not pertain to similar cases in German. Consider the sub-sense boundary of the spatial senses of the prepositions über and unter. This sense expresses that something has transgressed an upper or lower bound, and is found in phrases like über/unter Plan ‘above/below what was planned’, and über/unter Budget ‘above/below the scheduled budget’.

The nouns belong to the same lexico-semantic field, and both occur with über and unter, exerting a negative influence on determiner realization. Further semantically similar nouns, such as Tarif ‘pay scale’, could be substituted, contrary to the idea that the noun has to remain constant. Also, there is no stereotypical enrichment: Examples containing the phrases receive a fully compositional interpretation, indicating that the external argument of the PP has transgressed either the upper or the lower bound given by the internal argument of the preposition.

Finally, we should reiterate that the nouns found in the random components do not occur without a determiner outside of PPs.

In sum, the nouns contained in the random components of the models cannot be equated with the nouns identified in the proposals of Baldwin et al. (2006), Stvan (2009), and Le Bruyn et al. (2012) because they do not share their characteristics.

5 Conclusions and prospects

We set out by observing that the behaviour of BPPs in German is apparently unsystematic, and that acceptability judgments are not easily elicited. Using annotation mining and generalized linear mixed modeling, we have been able to discern three traits of the phenomenon: There are general grammatical rules at work, but they interact with lexeme-specific rules, as well as with idiosyncratic behaviour governed by individual nouns. Given this state of affairs, the apparently random behaviour of BPPs in German no longer comes as a surprise.

Across all models, we have identified postnominal extensions as a factor that inhibits determiner omission. This conclusion is even corroborated by the model for the preposition über, although the fixed component of this preposition does not contain any feature supporting determiner omission.

For the prepositions mit and unter, we were able to establish a hierarchy between complement clauses on the one hand and all other extensions on the other. For these prepositions, the presence of a complement clause (including non-finite complements) makes determiner omission impossible, while the other extensions may still coincide with determiner omission, if other factors facilitating determiner omission are present as well.

Nominalization appears to be a second general factor supporting determiner omission across prepositions. The model for über did not contain this factor, as it did not reach significance, but even in this model, the estimated coefficient for nominalization was negative.

At the lexeme-specific level, we have been able to identify senses of individual prepositions that support determiner omission. This holds in particular for the mereological sense of the prepositions mit and ohne. As an interesting observation, we can note that for mit, the mereological sense is actually the only sense of the preposition that supports determiner omission (for ohne, we also find support with the modal sense, the coefficient of which is not sufficiently different from the reference value, and hence the sense is not contained in the model).

For the preposition mit, which shows the largest inventory of senses, we have also been able to establish a hierarchy of the senses. While all senses apart from the mereological sense support determiner realization, their individual strength differs. It is thus much more likely that a determiner is omitted with the senses conditional, modal, and causal, provided that other features supporting determiner omission are present, while the senses point-of-reference, participation, governed, event, and realization inhibit determiner omission much more strongly, in some cases making it impossible.

In addition to postnominal extensions, nominalization, and the senses of the preposition, we have identified adjectival modification as relevant for determiner omission (in the case of mit). Presently, it must remain an open question why adjectival modification differs in its effect in the case of ohne. This is the only preposition where adjectival modification supports determiner realization. Across the prepositions, we have only identified two features that are confined to a single preposition (ohne): governor: V, and phrase: V2. These features might be an artefact of an interaction of the senses of the preposition with possible positions of the PP, which is not directly reflected in the model: If the PP is governed by a verb or realized in clause-initial position of main clauses, the preposition may not show a sense that requires modification of a noun. This is obvious in the first case, i.e. if the PP modifies a verb; as for the second feature, this seems to be a reflection of the fact that extraction from NP is much more restricted than extraction from inside VP. Consequently, PPs in clause-initial position are taken to modify verbs, not nouns. But the sense that mainly facilitates determiner omission with ohne is mereological, which may only emerge if the PP modifies a noun.

The interaction of the general rules (realize a determiner if the nominal complement shows postnominal extensions; omit a determiner if the noun is a derived nominal) with the individual senses, as well as with the other lexeme-specific factors leads to the ostensibly random pattern observed in examples like (2) and (7)–(11).

This situation is bedevilled by the idiosyncratic influence exerted by individual nouns. The idiosyncrasy introduced by an individual noun can conspire with a more general pattern, as was observed in (8), where the postnominal extension suggests determiner realization, while the mereological sense as well as the adjectival modification suggest determiner omission. Together with the noun Verankerung ‘anchoring’, the dilemma is resolved, and the determiner omitted. A similar influence is found in the model for unter, as was further illustrated with (24). A particularly interesting case is example (10b), because the nominal head inside the PP actually exerts a positive influence, i.e. makes determiner omission even more unlikely.

The analysis of ohne can be fully based on the fixed component, i.e. on the interaction of postnominal extensions, syntactic dependencies, senses of the preposition, and adjectival modification. The possibility of determiner omission with über is fully based on the random component, and hence it is neither the structure nor the interpretation of the PP that allows determiner omission in the examples in (11), but the sheer presence of the respective nouns.

At this point, we have to add a few remarks on features that are not relevant in the analysis. Two such features are of particular importance: compounding and the semantics of nouns.

Baldwin et al. (2006: 168) make the interesting observation that some PNCs in English require morphological modification, i.e. they are only possible if the noun contained in the P+N combination is a compound, as illustrated by at *(eye) level or at *(company) expense. The examples point to a problematic aspect that is relevant to the present analysis and will be discussed briefly below: the role of locality in a formal analysis of BPPs. What we note now, however, is that compounding has been annotated in the present analysis. Depending on the data set, between 10% and 23% of the nouns were compounds. But the category compound does not play a role in the analysis of BPPs: The feature never reached significance. From the perspective of a language theory assuming lexical integrity, this is what one would expect if morphology and syntax are taken to be two separate realms. Of course, the data presented by Baldwin et al. (2006) point in a different direction. But further and broader investigation of the construction is needed.

The meaning of the noun is a further feature that does not play a role in the present analysis: The features representing the semantics of the nouns never reached significance, and hence their inclusion did not improve the models. At the annotation stage, we used GermaNet (Hamp & Feldweg 1997) to provide a rough annotation of noun senses. It may very well be that the annotations provided in this way were too sketchy to derive semantic classes for the nouns. What is conspicuous in this respect, however, is that semantic classes could not be detected from the random components either. If such classes existed, the random components should at least have hinted at them, i.e. we would have found nouns that fall into the same semantic field. No such pattern emerged, so we have to assume that the sense of the preposition is a major factor for determiner omission, but the interpretation of the noun is not.

Let us finally return to a possible implementation of the results of the present paper in a theoretical framework, and its implications. As we have shown, at least syntactic and semantic factors have to be considered in the analysis, so a theoretical framework that covers both, such as HPSG (Pollard & Sag 1994), appears suitable. In such an analysis, we would code the senses of the preposition disjunctively and require different subcategorization frames depending on the sense of the preposition. But it would also be necessary to deal with the apparent non-locality of the postnominal extensions. In a framework like HPSG, where local schemata are employed for syntactic combination, information about the realization of postnominal extensions is not usually projected up in a structure. While a technical implementation is feasible, it would obscure the theoretical question of why such non-local information sharing is necessary in the analysis. Given the observations by Baldwin et al. (2006) on the boundary between morphology and syntax, this appears to be a genuine issue, and one that should be resolved in future research.

APPENDIX

Nouns occurring as random features in the models for mit, unter, and über

The following table lists the nouns given in Figures 6–8, together with an approximate English translation.

Footnotes

The research reported herein would not have been possible without the members and affiliates of the ‘PNC Project’, to which I am grateful: Daniel Abbassi, Katharina Börner, Monika Duzy, Ron Hoffmann, Halima Husic, Katja Keßelmeier, Antje Müller, Johanna Poppek, Claudia Roch, Nino Simunic, Tobias Stadtfeld, Jan Strunk, and Vanessa Weidmann. Parts of this paper have been presented at Brandeis University, Simon Fraser University, University of Alberta (Edmonton), Friedrich-Alexander-Universität Erlangen-Nürnberg, Universität Leipzig, Stanford University, Universiteit Utrecht, and the Norwegian University of Science and Technology Trondheim. I would like to thank the audiences for their comments. In addition, I would like to thank three anonymous Journal of Linguistics referees for their comments and suggestions. I am indebted to Katharina Börner and Anneli von Könemann for their assistance with the manuscript. Finally, I would like to thank the Deutsche Forschungsgemeinschaft (DFG) for their support under grant KI-759/5.

2 I would like to thank an anonymous JL referee for raising the issue that P II has to consider omission salva veritate in addition to omission salva congruitate. One could argue that P II shares some similarity with stricter versions of the candidate set, as defined in Optimality Theory (Legendre, Smolensky & Wilson 1998: 258).

3 In the following examples, indefinite and definite determiners are chosen so that the examples including a determiner appear as neutral as possible. Apart from the fact that the definite determiner is contingent on aspects such as uniqueness and familiarity, definiteness is not necessarily marked by the determiner alone, but may arise from, e.g., embedding the NP under a definite NP, uniqueness presuppositions, or embedded structures, among others. In such cases, the insertion of an indefinite determiner would be infelicitous. Consequently, I have included an indefinite determiner if no further marking indicates definiteness (examples (2), (7a, c), (9a), (10), and (11a, c)), and a definite determiner if the NP as a whole showed definiteness markings or uniqueness presuppositions (examples (7b), (9b, c), and (11b, d)).

4 We would like to thank an anonymous JL referee for pointing out the significance of the data in (5).

5 The conditional interpretation of ohne negates the precondition.

6 As in the case of the conditional interpretation, the preposition ohne in its instrumental sense provides information about instruments not used.

7 It is an interesting question (raised by Martin Haspelmath (personal communication) and an anonymous JL referee) how the interpretation of a determiner can be recovered if the determiner is omitted. It should be noted here that the interpretation of BPPs is not necessarily indefinite, thus indicating that the absence of a determiner allows both indefinite and definite interpretations. This conclusion can be reached by looking at (near) minimal pairs containing a definite determiner and no determiner, as e.g. carried out in Kiss (2007). An illustration is provided in (i) and (ii), where the NP gezogener Pistole in (ii) receives a definite interpretation.

  (i) Er bedrohte sein Opfer mit der Pistole. ‘He threatened his victim with a gun.’

  (ii) Er bedrohte den 44-jährigen Angestellten mit gezogener Pistole. ‘He threatened the 44-year-old employee with the gun he had pulled.’

8 Omitting the determiner would require a change in the declension class of the adjective, which would then be realized as drittem. Even with this adjustment, the example remains unacceptable without the determiner.

9 We will discuss the identification of ungrammatical examples in Section 3.3.

10 Bresnan et al. (2007) employed logistic regression to determine the distribution of NP vs. PP arguments of ditransitive verbs.

11 This tradition is implicitly addressed in Baldwin et al. (2006), where PNCs are characterized as ‘multi-word expressions’.

12 An anonymous JL referee has pointed out that the factors discussed below could equally well license a determiner that is not pronounced. Whether one posits an unpronounced determiner or assumes that the determiner is simply absent is, at the present state of the art, mainly a question of scientific economy. If such an unpronounced determiner is required to analyze other areas of grammar, one could possibly use it here as well. The crucial point is that the conditions presented below remain the same.

13 Kiss et al. (2016: 224–227) discuss stative proximal locative interpretations of three German prepositions (an ‘at’, auf ‘on’, and bei ‘next to’). They provide evidence that determiner omission is disfavoured with these senses.

14 This is so because the coefficients are multiplied by the value of the feature. If the feature is categorical, its value is 1 if it is present and 0 if it is absent. Consequently, a feature contributes the value of its coefficient $n$ if it is present ($1\times n=n$) and 0 if it is absent ($0\times n=0$). See Section 3.2 below for further details.
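The arithmetic of dummy-coded categorical features can be sketched as follows. The intercept, the feature names `sense_modal` and `postnominal_extension`, and their coefficients are hypothetical values chosen for illustration; they are not the estimates of the models reported in this paper. The sketch also shows the odds ratio $e^{n}$ of a coefficient $n$, the quantity plotted in Figures 1–4: a value above 1 favours the outcome, a value below 1 disfavours it.

```python
import math

# Hypothetical intercept and coefficients on the logit scale; the feature
# names and values are illustrative only, NOT the paper's estimates.
INTERCEPT = -1.2
COEFFICIENTS = {
    "sense_modal": -0.8,
    "postnominal_extension": 1.5,
}


def linear_predictor(features):
    """Intercept plus the sum of coefficient * feature value.

    Categorical features are dummy-coded as 1 (present) or 0 (absent):
    a present feature adds its coefficient (1 * n = n), an absent one
    adds nothing (0 * n = 0).
    """
    y = INTERCEPT
    for name, coef in COEFFICIENTS.items():
        value = 1 if features.get(name, False) else 0
        y += coef * value
    return y


def odds_ratio(coef):
    """Multiplicative change in the odds when a 0/1 feature is switched on."""
    return math.exp(coef)


print(linear_predictor({"postnominal_extension": True}))
print(odds_ratio(COEFFICIENTS["sense_modal"]))
```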

15 Advertisements are another area in which truncations are used, but the corpus did not contain advertisements. We would like to thank an anonymous JL referee for pointing out the role of advertisements.

16 For a survey of German clause structure, see Müller (2015).

17 The classifier distinguishes between count nouns and non-count nouns and thus leaves open whether one should identify classes that are count to a certain degree (as suggested in Allan 1980). The nouns that have been identified can be assumed to satisfy the criteria imposed on strict count nouns in Allan (1980).

18 The inverse logit is defined as $e^{\text{Y}}/(1+e^{\text{Y}})$, where $e$ is Euler’s number and $\text{Y}$ is the result of the linear predictor.
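The inverse logit turns the value of the linear predictor into a probability, here assumed to be the probability of the outcome coded as 1 (presumably determiner omission). The sketch below implements the formula directly, split into two algebraically equivalent branches so that `math.exp` cannot overflow for large |Y|; the helper names `inv_logit` and `logit` are our own.

```python
import math


def inv_logit(y):
    """Map a linear-predictor value Y to the probability e^Y / (1 + e^Y).

    For Y >= 0 the equivalent form 1 / (1 + e^-Y) is used, so that
    math.exp is only ever called on non-positive arguments.
    """
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def logit(p):
    """Inverse of inv_logit: the log odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


print(inv_logit(0.0))   # a linear predictor of 0 corresponds to p = 0.5
print(inv_logit(2.0))
```

A linear predictor of 0 thus marks the point where both outcomes are equally likely; positive values favour the outcome coded as 1, negative values disfavour it.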

19 It should be noted here that ohne ‘without’ is the only preposition that actually shows complement clauses in BPPs. This observation is in line with the analysis of Section 4, which assumes that determiner omission has become almost perfectly regular with ohne. The preposition über ‘above’ also shows the distributional gap reported here. As the discussion in Sections 4.1 and 4.2 will reveal, however, determiner omission is dependent on individual nouns occurring with this preposition.

20 We have applied a Cramér–von Mises test (Anderson 1962), which shows that the cross-classification of senses and postnominal extensions differs in its distribution, depending on whether a determiner is present or not.
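For readers unfamiliar with the two-sample Cramér–von Mises criterion, the statistic can be computed from the two empirical distribution functions: $T = \frac{NM}{(N+M)^2}\sum_{z}\big(F_N(z)-G_M(z)\big)^2$, summed over all pooled observations. The following is a minimal, quadratic-time sketch on numeric samples. How the paper encoded the sense × extension cross-classification is not stated here, and the permutation p-value is a swapped-in approximation rather than the asymptotic distribution derived by Anderson (1962). All function names are our own.

```python
import random


def ecdf(sample, x):
    """Proportion of observations in `sample` that are <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def cvm_two_sample(a, b):
    """Two-sample Cramér-von Mises statistic T.

    T = N*M / (N+M)^2 * sum over all pooled observations z of
        (F_N(z) - G_M(z))^2, with F_N, G_M the empirical CDFs of a and b.
    """
    n, m = len(a), len(b)
    pooled = list(a) + list(b)
    total = sum((ecdf(a, z) - ecdf(b, z)) ** 2 for z in pooled)
    return n * m / (n + m) ** 2 * total


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Approximate p-value: share of label shuffles with T >= observed T."""
    rng = random.Random(seed)
    observed = cvm_two_sample(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = cvm_two_sample(pooled[:len(a)], pooled[len(a):])
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)


print(cvm_two_sample([1, 2], [3, 4]))
```

The statistic is 0 when the two empirical distributions coincide and grows as they diverge; a small permutation p-value indicates that the two samples are unlikely to come from the same distribution.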

21 We have also developed a GLMM for mit containing a sample of the original data with 6,000 instances. The model based on the smaller sample provides results that are very similar in quantity (relative influence) and quality (for or against determiner omission) for all fixed features, and also for the random features that are shared between the two models. We would like to thank an anonymous JL referee for raising this issue.

22 Example (2c) is falsely predicted to prohibit determiner omission (due to the modal (instrumental) sense of the preposition), while the determiner can in fact be dropped in this example. We will return to this case when we discuss the influence of random features in Section 4.2.

23 If this were not the case, we would expect significant fixed features to become insignificant after inclusion of the random component; see Baayen (2009: 305).

References

Agresti, Alan. 2007. An introduction to categorical data analysis, 2nd edn. Hoboken, NJ: John Wiley & Sons.
Aguilar-Guevara, Ana & Zwarts, Joost. 2011. Weak definites and reference to kinds. Proceedings of SALT 20, 179–196.
Allan, Keith. 1980. Nouns and countability. Language 56.3, 541–567.
Anderson, Theodor. 1962. On the distribution of the two-sample Cramér-von-Mises Criterion. The Annals of Mathematical Statistics 33.3, 1148–1159.
Baayen, R. Harald. 2009. Analyzing linguistic data. Cambridge: Cambridge University Press.
Baldwin, Timothy, Beavers, John, van der Beek, Leonoor, Bond, Francis, Flickinger, Dan & Sag, Ivan A. 2006. In search of a systematic treatment of determinerless PPs. In Saint-Dizier (ed.), 163–179.
Bates, Douglas, Maechler, Martin, Bolker, Ben & Walker, Steven. 2015. lme4: Linear mixed-effects models using ‘Eigen’ and S4. R package version 1.1-8. http://cran.R-project.org/package=lme4 (accessed 18 August 2015).
Borer, Hagit. 2005. Structuring sense: vol. I: In name only. Oxford: Oxford University Press.
Brants, Sabine, Dipper, Stefanie, Eisenberg, Peter, Hansen, Silvia, König, Esther, Lezius, Wolfgang, Rohrer, Christian, Smith, George & Uszkoreit, Hans. 2004. TIGER: Linguistic interpretation of a German corpus. Journal of Language and Computation 2, 597–620.
Bresnan, Joan, Cueni, Anna, Nikitina, Tatiana & Baayen, R. Harald. 2007. Predicting the dative alternation. In Bouma, Gerlof, Kraemer, Irene & Zwarts, Joost (eds.), Cognitive foundations of interpretation, 69–94. Amsterdam: Royal Netherlands Academy of Science.
Chambers, John M. & Hastie, Trevor (eds.). 1992. Statistical models in S. Pacific Grove, CA: Wadsworth & Brooks/Cole.
Chiarcos, Christian, Dipper, Stefanie, Götze, Michael, Leser, Ulf, Lüdeling, Anke, Ritz, Julia & Stede, Manfred. 2008. A flexible framework for integrating annotations from different tools and tagsets. Traitement Automatique des Langues 49.2, 217–246.
Chung, Yeojin, Rabe-Hesketh, Sophia, Dorie, Vincent, Gelman, Andrew & Liu, Jingchen. 2013. A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 4, 685–709.
Dömges, Florian, Kiss, Tibor, Müller, Antje & Roch, Claudia. 2007. Measuring the productivity of determinerless PPs. In Costello, Fintan, Kelleher, John & Volk, Martin (eds.), Proceedings of the 4th ACL-SIGSEM Workshop on Prepositions, Prague, 31–37.
Duden. 2005. Duden. Die Grammatik. Mannheim: Bibliographisches Institut & F.A. Brockhaus AG.
Hamp, Birgit & Feldweg, Helmut. 1997. GermaNet: A lexical-semantic net for German. In Vossen, Piek, Calzolari, Nicoletta, Adriaens, Geerd, Sanfilippo, Antonio & Wilks, Yorick (eds.), Proceedings of the ACL Workshop Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, 9–15.
Helbig, Gerhard & Buscha, Joachim. 2007. Deutsche Grammatik. Ein Handbuch für den Ausländerunterricht. Leipzig: Langenscheidt.
Himmelmann, Nikolaus. 1998. Regularity in irregularity: Article use in adpositional phrases. Linguistic Typology 2.3, 315–353.
Kiss, Tibor. 2007. Produktivität und Idiomatizität von Präposition–Substantiv-Sequenzen. Zeitschrift für Sprachwissenschaft 26.2, 317–345.
Kiss, Tibor & Alexiadou, Artemis (eds.). 2015. Syntax – Theory and analysis: An international handbook (Handbooks of Linguistics and Communication Science 42), vol. 2. Berlin & New York: Mouton de Gruyter.
Kiss, Tibor, Keßelmeier, Katja, Müller, Antje, Roch, Claudia, Stadtfeld, Tobias & Strunk, Jan. 2010. A logistic regression model of Determiner Omission in PPs. In Huang, Chu-Ren & Jurafsky, Dan (eds.), Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, 561–569.
Kiss, Tibor, Müller, Antje, Roch, Claudia, Stadtfeld, Tobias, Börner, Katharina & Duzy, Monika. 2016. Ein Handbuch für die Bestimmung und Annotation von Präpositionsbedeutungen im Deutschen (Bochumer Linguistische Arbeitsberichte 14). Bochum: Ruhr-Universität Bochum.
Le Bruyn, Bert, de Swart, Henriëtte & Zwarts, Joost. 2012. Quantificational prepositions. In Graf, Thomas, Paperno, Denis, Szabolcsi, Anna & Tellings, Jos (eds.), Theories of everything: In honor of Ed Keenan (UCLA Working Papers in Linguistics 17), 187–196.
Legendre, Géraldine, Smolensky, Paul & Wilson, Colin. 1998. When is less more? Faithfulness and minimal links in wh-chains. In Barbosa, Pilar, Fox, Danny, Hagstrom, Paul, McGinnis, Martha & Pesetsky, David (eds.), Is the best good enough? Optimality and competition in syntax, 249–289. Cambridge, MA: The MIT Press.
Lenth, Russell V. 2016. Least-Squares Means: The R Package lsmeans. Journal of Statistical Software 69.1, 1–33.
Müller, Stefan. 2015. German: A grammatical sketch. In Kiss & Alexiadou (eds.), 1447–1477.
Nivre, Joakim, Hall, Johan, Kübler, Sandra, McDonald, Ryan, Nilsson, Jens, Riedel, Sebastian & Yuret, Deniz. 2007. The CoNLL 2007 shared task on dependency parsing. Proceedings of the CoNLL Shared Task Session of Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) 2007, Prague, 915–932.
Osborne, Timothy. 2015. Dependency Grammar. In Kiss & Alexiadou (eds.), 1027–1045.
Payne, John & Huddleston, Rodney. 2002. Nouns and noun phrases. In Rodney Huddleston & Geoffrey K. Pullum et al., The Cambridge grammar of the English language, 323–523. Cambridge: Cambridge University Press.
Pelletier, Francis Jeffry. 1975. Non-singular reference: Some preliminaries. Philosophia 5.4, 451–465.
Pollard, Carl & Sag, Ivan A. 1994. Head-driven Phrase Structure Grammar. Chicago, IL: The University of Chicago Press.
Saint-Dizier, Patrick (ed.). 2006. Syntax and semantics of prepositions. Dordrecht: Springer.
Schiller, Anne, Teufel, Simone, Stöckert, Christine & Thielen, Christine. 1999. Guidelines für das Tagging deutscher Textcorpora mit STTS. Ms., Universities of Stuttgart & Tübingen. http://www.sfs.uni-tuebingen.de/resources/stts-1999.pdf (accessed 18 August 2015).
Schröder, Jochen. 1986. Lexikon deutscher Präpositionen. Leipzig: VEB Verlag Enzyklopädie.
Stvan, Laurel Smith. 2009. Semantic incorporation as an account for some bare singular count noun uses in English. Lingua 119.2, 314–333.
Trawiński, Beata, Sailer, Manfred & Soehn, Jan-Philipp. 2006. Combinatorial aspects of collocational prepositional phrases. In Saint-Dizier (ed.), 181–196.
Wickham, Hadley. 2009. ggplot2: Elegant graphics for data analysis. New York: Springer.
Zuur, Alain, Ieno, Elena N., Walker, Neil, Saveliev, Anatoly & Smith, Graham M. 2009. Mixed effects models and extensions in ecology with R. New York: Springer.
Table 1 Summary statistics of the data sets.

Figure 1 Odds ratios and confidence intervals for significant coefficients in the GLMM for mit ‘with’.

Figure 2 Odds ratios and confidence intervals for significant coefficients in the GLMM for ohne ‘without’.

Figure 3 Odds ratios and confidence intervals for significant coefficients in the GLMM for unter ‘under, below’.

Figure 4 Odds ratios and confidence intervals for significant coefficients in the GLMM for über ‘over, above’.

Figure 5 Illustration of influence of the random component on the linear predictor.

Figure 6 Individual nouns with their influence on the linear predictor of the fixed component for unter ‘under, below’.

Figure 7 Individual nouns with their influence on the linear predictor of the fixed component for mit ‘with’.

Figure 8 Individual random effects (i.e. nouns) in the model for über ‘over, above’.