1. Introduction
In traditional dialectology from the late nineteenth and early twentieth centuries, it was long assumed that (morpho)syntactic phenomena are spatially less structured than phonological, morphological, and lexical phenomena and do not show clear-cut regional distributions; see e.g. Glaser (Reference Glaser and Auer2013) and Scheutz (Reference Scheutz2005) for critical assessments. Moreover, it was widely believed that morphosyntactic phenomena are very difficult to elicit through traditional dialectological methods (e.g. König et al., Reference König, Elspaß and Möller2019:163). As a result, dialectologists have long neglected (morpho)syntax. However, the last few decades have seen increasing research interest in dialect (morpho)syntax. This becomes evident in many studies on single (morpho)syntactic phenomena (e.g. Schallert, Reference Schallert, Grewendorf and Weiß2014; Bülow et al., Reference Bülow, Vergeiner and Elspaß2021b; Moser, Reference Moser2021), as well as in large-scale dialect syntax atlas projects (e.g. the projects Syntax of Hessian Dialects (SyHD) and Syntactic Atlas of German Speaking Switzerland (SADS)). The results of these studies and projects confirm the existence of syntactic areas (see Birkenes & Fleischer, Reference Birkenes and Fleischer2021). They show that “[i]n terms of areal variation … dialect (morpho)syntax is not different, in principle, from what is known about areal variation in accents and dialect vocabularies” but “has a wider areal reach than phonological and lexical variation” (Kortmann, Reference Kortmann, Auer and Schmidt2010:846). In contrast to the latter, syntactic variation is also “much subtler and less salient, less categorical, and in many cases a matter of statistical frequency” (Kortmann, Reference Kortmann, Auer and Schmidt2010:846).
Despite the generally solid level of research into individual (morpho)syntactic features, it remains an open question as to whether different (morpho)syntactic features show patterns of co-occurrence which result in different dialect areas, and if so, whether these areas correspond to areas defined by phonological, morphological, or lexical phenomena; see Birkenes & Fleischer (Reference Birkenes and Fleischer2021:159) for a brief discussion. It should be evident that such questions are challenging to answer by means of studies that focus exclusively on single (morpho)syntactic phenomena. Instead, they call for quantitative dialectometric analyses, abstracting away from individual variables to reveal general geographical structures.
However, dialectometric studies have mostly focused on phonology or lexis so far. Due to the lack of suitable data, the level of (morpho)syntax has only rarely been studied (Wieling & Nerbonne, Reference Wieling and Nerbonne2015:256–257). Some of the exceptions include Spruit (Reference Spruit2006) and Spruit et al. (Reference Spruit, Heeringa and Nerbonne2009) on Dutch dialects or Szmrecsanyi (Reference Szmrecsanyi2012) and Wolk (Reference Wolk2014) on English and Scottish dialects. For German, recent studies have focused on dialects in Hesse (Birkenes & Fleischer, Reference Birkenes and Fleischer2021) and, most importantly, on Swiss German dialects (e.g. Scherrer & Stoeckle, Reference Scherrer and Stoeckle2016; Derungs et al., Reference Derungs, Sieber, Glaser and Weibel2020). In short, these studies identify geographical patterns of (morpho)syntactic variation which often differ from traditional accounts of dialect areas. Notably, since previous dialectometric studies on (morpho)syntactic variation build on aggregative methods developed in the (Romanistic) Salzburg school (e.g. Goebl, Reference Goebl1984) and/or the Groningen school (e.g. Wieling & Nerbonne, Reference Wieling and Nerbonne2015), there are hardly any comprehensive analyses of which types of (morpho)syntactic variables or features contribute to the geographical patterns revealed. To address this issue, non-aggregative geolinguistic measures are perhaps more appropriate, because they preserve the variation of individual features while detecting areal structures (e.g. Grieve, Reference Grieve, Szmrecsanyi and Wälchli2014; Pickl & Pröll, Reference Pickl and Pröll2019).
The aim of the present study is to uncover geographical patterns of (morpho)syntactic variation in traditional Austrian dialects using non-aggregative dialectometric methods. Two research questions are targeted:
- RQ1: Can geographical patterns for dialect (morpho)syntax in Austria be identified that are based on the co-occurrence of different variants of various variables, and if so, how do these patterns relate to traditional dialect areas?
- RQ2: Which variants of which variables form the linguistic basis of the geographical structures revealed? Are different (morpho)syntactic variables equally important for geolinguistic structuring, and if not, how can we explain the differences between them?
To address these questions, we draw on a comprehensive dialect corpus obtained through direct dialect interviews. The corpus includes data from 163 speakers at 40 locations throughout Austria. Our analyses are based on the frequency of 79 variants of 30 (morpho)syntactic variables. In order to arrive at generalizable statements beyond individual variables, we use factor analysis.
Section 2 provides a brief overview of previous research on dialect (morpho)syntax in Austria and beyond. In Section 3, we describe the data and methods of our study, and then present our results in Section 4. We conclude with a discussion and summary of our key findings in Section 5.
2. (Morpho)syntax of German dialects in Austria and beyond
Somewhat surprisingly, the increased interest in (morpho)syntactic phenomena in dialects within the last few decades originated in generative grammar (Scheutz, Reference Scheutz2005:292–293) and not in traditional dialectology, although there is some earlier work on individual (morpho)syntactic features in dialects (e.g. Weise, Reference Weise1907, Reference Weise1917). Generativists “discovered” dialects in the 1980s and 1990s in order to address theoretical questions that revolved around specific syntactic phenomena, such as complementizers and negative concord (e.g. Bayer, Reference Bayer, Mascaro and Nespor1990), while at the same time focusing on “microvariation,” i.e. using syntactic structures in dialects as a testing ground for syntax theories. Another reason why dialects became an attractive field of research for theoretical linguists is their status as “natural,” first-order (henceforth L1) languages (Weiß, Reference Weiß1998:3), i.e. L1 varieties that are immediate derivatives of internal languages, primarily acquired in family settings. This is not trivial in that the standard varieties in many regions in the German-speaking countries can be assumed to be only indirect derivatives of internal languages that do not meet the L1 criterion because they are taught and learned as L2 (second-order language) at school. This is especially true for the Upper German language area, which includes the Bavarian and Alemannic dialect regions in Germany, Austria, Switzerland, Liechtenstein, and South Tyrol.
The fact that the use of dialect in everyday communication is still widespread in the Upper German language area may also be one of the reasons why theoretical linguists have focused on microvariation in the (Central) Bavarian (e.g. Bayer, Reference Bayer1984, Reference Bayer, Mascaro and Nespor1990; Weiß, Reference Weiß1998; Grewendorf & Weiß, Reference Grewendorf and Weiß2014) and Alemannic dialects (e.g. Seiler, Reference Seiler2003; Brandner & Bräuning, Reference Brandner and Bräuning2013). At the same time, since the 1990s, linguists working within the dialectological, sociolinguistic, and typological research paradigms have increasingly addressed Bavarian and Alemannic dialect syntax (e.g. Patocka, Reference Patocka1997; Fleischer, Reference Fleischer and Kortmann2004; Scheutz, Reference Scheutz2005). However, while there are two major dialect syntax projects for Alemannic dialects—the above-mentioned Syntactic Atlas of German Speaking Switzerland (SADS) and the project Syntax of Alemannic (SynAlm, focusing particularly on Alemannic dialects in Germany)—there has been no such project on Bavarian. Thus, despite the great interest in Bavarian dialect syntax, comprehensive variationist studies covering the entire Bavarian language area are still a major desideratum.
Regarding Austria, some comprehensive studies on individual (morpho)syntactic phenomena have been carried out in recent years within the context of the Special Research Program “German in Austria” (DiÖ, Deutsch in Österreich; see Budin et al., Reference Budin, Elspaß, Lenz, Newerkla and Ziegler2019) and the “Dictionary of Bavarian Dialects in Austria” (WBÖ, Wörterbuch der bairischen Mundarten in Österreich). For instance, several studies have been published on variation and change of subjunctive constructions in the Austrian language varieties (cf. contributions to Bülow et al., Reference Bülow, Elspaß and Vergeiner2021a); Bülow et al. (Reference Bülow, Vergeiner and Elspaß2021b) explored structures of adnominal possession; Fingerhuth & Lenz (Reference Fingerhuth and Lenz2020) and Vergeiner & Bülow (2021) studied complementizer agreement in Austria’s dialects; Vergeiner & Hartinger (Reference Vergeiner and Hartinger2022) and Stöckle et al. (Reference Stöckle, Hemetsberger and Stütz2021) focus on negative concord (NC); Bülow et al. (Reference Bülow, Wittibschlager and Lenz2023) investigated the variation of relativizers in attributive relative clauses; Vergeiner & Niehaus (Reference Vergeiner and Niehaus2024) and Vergeiner (Reference Vergeiner2024) examined the syntax of articles; and so on (see Lenz, Reference Lenz, Herrgen and Schmidt2019:333–338 for an overview of previous studies). Most of these studies found significant differences between the syntax of Bavarian and Alemannic dialects in Austria. For example, the use of the particle was to introduce attributive relative clauses (1a) is restricted to the Bavarian dialects in Austria, whereas the use of the particle wo (1b) is restricted to the Alemannic dialects.Footnote 1
Apart from the differences between Alemannic and Bavarian dialect (morpho)syntax, which will also prove to be a prominent factor in the results of our study (see Section 4.1), previous research has shown that the geographical patterns in dialect (morpho)syntax do not always correspond to traditional dialect classifications (see the dialect map in Figure 1 below). For a number of individual phenomena such as the subjunctive and negative concord (NC), Bavarian shows no north–south division of the kind displayed in Figure 1, but only an east–west division. Regarding the use of NC, for example, negative spread constructions (2a) predominate in the western parts of Austria, namely in the Alemannic and South Bavarian dialect regions in Tyrol, whereas negative doubling constructions (2b) are limited to the central and eastern Bavarian dialect regions of Austria (Moser, Reference Moser2021; Vergeiner & Hartinger, Reference Vergeiner and Hartinger2022).
In sum, there is an extensive and growing body of literature on individual (morpho)syntactic phenomena across traditional dialects in Austria on a broad empirical basis. However, there is a lack of studies that shift the focus away from case studies on individual phenomena towards general areal patterns of (morpho)syntax in the Bavarian and Alemannic dialects in Austria. Moreover, there are no dialectometric studies on Austrian dialect syntax which could help to detect such patterns. This is where our study comes in, by addressing precisely these desiderata.
3. Data and methods
For our geolinguistic analyses of dialect (morpho)syntax in Austria, we draw on a dialect survey conducted within the framework of the project “Variation and Change of Dialect Varieties in Austria (in Real and Apparent Time).”Footnote 2 In what follows, we describe our project design, research locations, and participants (Section 3.1), before we elaborate on our variables (Section 3.2) and statistical methods (Section 3.3).
3.1. Research design, research locations, and participants
The data of the present study consist of dialect recordings obtained by trained fieldworkers. A survey was conducted in 40 small rural villages. Figure 1 shows the distribution of the research locations according to the most widely accepted classification of dialect areas in Austria (Wiesinger, Reference Wiesinger and Besch1983). Austria comprises a small Alemannic dialect area in the far west (Vorarlberg) and a much larger Bavarian dialect area with a Bavarian–Alemannic transition zone in between. The Bavarian area is divided into Central Bavarian, South-Central Bavarian, and South Bavarian dialects. Notably, this classification is based on a qualitative structuralist approach drawing mostly on phonology and partly also on (inflectional) morphology but ignores syntax completely (Wiesinger, Reference Wiesinger and Besch1983:813).
Four speakers of the traditional dialect from each location participated in our study.Footnote 3 Two participants were chosen from an older (65+ years) and two from a younger (18–35 years) generation, with one male and one female per age group. In sum, the sample consists of 163 speakers. Traditional dialectological criteria for sampling were applied (see Chambers & Trudgill, Reference Chambers and Trudgill1998). The older speakers are typical NORM/Fs (= non-mobile, old, rural males/females). The younger participants can also be considered prototypical dialect speakers in that they have been raised in local artisanal or agricultural networks and have not received higher education. Their parents were born and raised in the same location. Both their social and working lives are centered in the same local environments.
3.2. Variables
The interviews were conducted by trained fieldworkers using a traditional dialect questionnaire, which included several tasks such as translation tasks, cloze tasks, and picture naming tasks that were designed to elicit traditional dialect features on all linguistic levels. To analyze syntactic features, mainly translation tasks were employed. In these tasks, the participants had to translate sentences read out in standard German into their own dialects. While doing so, the participants were encouraged to use not only phonetic or morphological features but also syntactic features which they consider most natural in their everyday dialect. For example, there were twelve translation tasks in the questionnaire referring to a possessive relation, e.g. Wo sind Mutters Schuhe? ‘Where are mother’s shoes?’ or Das ist Annas Fuß ‘This is Anna’s leg’. To translate these sentences, most participants used either adnominal possessive dative constructions or competing variants such as von (‘of’) or genitive constructions (Bülow et al., Reference Bülow, Vergeiner and Elspaß2021b).
Over the past few years, the data have been extensively analyzed in feature-based studies on (morpho)syntax (see Section 2). For the present study, we move away from individual features to detect more general patterns of variation. To do so, we selected a set of 30 variables, shown in Table 1. The variables include some of the most important characteristics of dialect (morpho)syntax in Austria, and as such they are discussed in feature-based studies and general overviews of this topic (e.g. Lenz, Reference Lenz, Herrgen and Schmidt2019:333–338). In addition, the variables can be assigned to different areas of (morpho)syntax, such as verbal and nominal syntax, government and agreement relations, conjunctions, and word order. Each variable was elicited by means of several different stimuli (mean = 5.1 stimuli per variable; median = 4.5), and in sum, our analyses include 24,140 responses (mean = 805 tokens per variable; median = 680.5).
The data for each variable are coded on a categorical scale, often of binary nature. In total, 79 variants were distinguished (mean = 2.6 variants per variable; median = 2).Footnote 4 Since our corpus includes four speakers per location and at least two stimuli per variable, relative frequency distributions could be calculated for all variants per location. These relative frequencies are used in our subsequent analyses. Due to the less salient nature of syntactic data in comparison to phonological and lexical data (Kortmann, Reference Kortmann, Auer and Schmidt2010:846), our approach seems to have significant advantages over previous work, which is often based either on the presence or absence of a feature (Spruit, Reference Spruit2006; Spruit et al., Reference Spruit, Heeringa and Nerbonne2009) or on the choice of one dominant variant per location (Scherrer & Stoeckle, Reference Scherrer and Stoeckle2016). Another major difference to most previous work is our use of non-aggregative dialectometric methods, which are explained in the following section.
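The step from coded responses to relative frequencies per location can be illustrated with a minimal sketch. The location labels and the individual responses below are invented for illustration; only the variant labels for relativizers are taken from the discussion in Section 2, and the function name `relative_frequencies` is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical coded responses as (location, variable, variant) triples.
# Location names and counts are invented for illustration.
responses = [
    ("Loc01", "relativizer", "was"),
    ("Loc01", "relativizer", "was"),
    ("Loc01", "relativizer", "d-pronoun"),
    ("Loc01", "relativizer", "was"),
    ("Loc02", "relativizer", "wo"),
    ("Loc02", "relativizer", "wo"),
]


def relative_frequencies(rows):
    """Return {(location, variable): {variant: share of responses}}."""
    counts = defaultdict(Counter)
    for location, variable, variant in rows:
        counts[(location, variable)][variant] += 1
    return {
        key: {variant: n / sum(c.values()) for variant, n in c.items()}
        for key, c in counts.items()
    }


freqs = relative_frequencies(responses)
print(freqs[("Loc01", "relativizer")])
```

Unlike a presence/absence coding or a single dominant variant per location, this representation keeps minority variants visible, here the one `d-pronoun` response at the first location.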
3.3. Statistical methods
Most dialectometric studies are based on aggregation: they add up differences (or similarities) between every pair of locations to create a site-by-site distance matrix. This matrix is then used as input for further statistical analyses such as multidimensional scaling or cluster analysis to reveal geographical structures (e.g. Goebl, Reference Goebl1984; Wieling & Nerbonne, Reference Wieling and Nerbonne2015). Although this approach has led to valuable insights, it has the main disadvantage that information on individual variant distributions gets lost, and the linguistic basis of aggregate dialect differences can only be added again in a post hoc fashion; for a discussion, see Wieling & Nerbonne (Reference Wieling and Nerbonne2015:248–250). This problem is avoided in non-aggregative measures such as factor analysis (FA) or principal component analysis (PCA), which have been used as alternative approaches (e.g. Shackleton, Reference Shackleton2005; Nerbonne, Reference Nerbonne2006; Leino & Hyvönen, Reference Leino and Hyvönen2008; Szmrecsanyi, Reference Szmrecsanyi2012; Grieve, Reference Grieve, Szmrecsanyi and Wälchli2014; Pickl, Reference Pickl, Côté, Knooihuizen and Nerbonne2016; Pickl & Pröll, Reference Pickl and Pröll2019; Pickl et al., Reference Pickl, Pröll, Elspaß and Möller2019). For the present study, we apply a factor analysis because it has proved to be particularly suitable for identifying areal structures (e.g. Leino & Hyvönen, Reference Leino and Hyvönen2008; Vergeiner & Bülow, Reference Vergeiner and Bülow2023).
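To make the information loss under aggregation concrete, the following sketch builds a small site-by-site distance matrix. It assumes a simple sum of absolute frequency differences as the distance measure, which is one possible choice and not necessarily the one used in the studies cited; all values are hypothetical.

```python
# Hypothetical relative frequencies of three variants at three locations.
profiles = {
    "A": [0.9, 0.1, 0.5],
    "B": [0.8, 0.2, 0.5],
    "C": [0.1, 0.9, 0.5],
}


def aggregate_distance(p, q):
    """Sum of absolute frequency differences across all variants."""
    return sum(abs(a - b) for a, b in zip(p, q))


sites = sorted(profiles)
# Site-by-site matrix stored as a dictionary keyed by location pairs.
distance = {
    (s, t): aggregate_distance(profiles[s], profiles[t])
    for s in sites
    for t in sites
}

for s in sites:
    print(s, [round(distance[(s, t)], 2) for t in sites])
```

Once the differences are summed, the matrix no longer shows that the distance between A and C stems from the first two variants only, while the third variant does not distinguish any pair of locations.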
A factor analysis (FA) is a multivariate statistical method for identifying patterns of variation in a data set. Based on a correlation matrix, it subsumes variants that are correlated with one another but largely independent of other variants under a small set of underlying constructs, so-called “factors.” By means of this procedure, the variation in the data is reduced and restructured by identifying latent patterns behind the variants. In doing so, FA preserves as much information as possible from the original data set.
In the present study, FA is based on the interrelations between the research locations with regard to (morpho)syntactic variation. Notably, FA does not presuppose any information about the geographical positions of the research locations. Rather it reveals areal patterns only if there are sufficiently strong geographical signals in the linguistic data itself.
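A minimal sketch of the input to such an analysis, assuming that the locations are treated as the variables and the variant frequencies as the observations: the correlation matrix is computed from the frequency profiles alone, and no coordinates enter the calculation. Location names and frequencies are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Each list holds the relative frequencies of the same four variants
# at one location (hypothetical values).
location_profiles = {
    "West1": [0.9, 0.1, 0.8, 0.2],
    "West2": [0.8, 0.2, 0.9, 0.1],
    "East1": [0.1, 0.9, 0.2, 0.8],
}

names = list(location_profiles)
corr = {
    (a, b): pearson(location_profiles[a], location_profiles[b])
    for a in names
    for b in names
}

print(round(corr[("West1", "West2")], 2))
print(round(corr[("West1", "East1")], 2))
```

Locations with similar variant profiles correlate highly and can therefore end up loading on the same factor. Whether such groups also form contiguous areas on the map is an empirical outcome and not something the method presupposes.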
For the interpretation of the factor solution, two parameters are crucial.
- First, the factor loadings function as a measure of the relation between a factor and, in our study, the research locations. A loading close to 1 signals a high positive correlation between a factor and a location, indicating that the factor accounts for most of a location’s variance. In contrast, loadings close to 0 suggest no correlation, and loadings < 0 indicate a negative correlation.
- Second, the factor scores reveal how a particular variant ranks on a given factor. A high positive factor score signals that a given variant is (positively) associated with a factor while scores close to 0 indicate no association whatsoever. Negative associations result in negative factor scores.
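The relation between the two parameters can be summarized in the common factor model. The notation is ours and is introduced here only for illustration: $z_{v\ell}$ is the standardized relative frequency of variant $v$ at location $\ell$, $\lambda_{\ell k}$ is the loading of location $\ell$ on factor $k$, $f_{vk}$ is the score of variant $v$ on factor $k$, and $\varepsilon_{v\ell}$ is the residual.

```latex
z_{v\ell} = \sum_{k=1}^{K} \lambda_{\ell k}\, f_{vk} + \varepsilon_{v\ell},
\qquad
h_{\ell}^{2} = \sum_{k=1}^{K} \lambda_{\ell k}^{2}
```

With an orthogonal rotation, the squared loading $\lambda_{\ell k}^{2}$ is the share of a location's variance attributed to factor $k$, and the communality $h_{\ell}^{2}$ is the share explained by all $K$ factors together. This is why a loading close to 1 implies that a single factor accounts for most of the variance at that location.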
For this study, FA was performed with the software IBM SPSS Statistics using principal axis factoring with varimax rotation. The estimation of the factor scores is based on the regression method. Notably, both the Kaiser–Meyer–Olkin Measure of Sampling Adequacy (= 0.759) and Bartlett’s Test of Sphericity (χ² = 6302.45, p < 0.001***) indicate that the data are well suited to the analysis.
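For orientation, the two diagnostics can be written out in their standard textbook form. The symbols are our notation: $R$ is the correlation matrix of the $p$ variables, $n$ is the number of observations, $r_{ij}$ are the off-diagonal correlations, and $a_{ij}$ are the corresponding partial correlations.

```latex
\chi^{2} = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert,
\qquad
\mathrm{df} = \frac{p(p-1)}{2}

\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}
```

Bartlett's test rejects the hypothesis that $R$ is an identity matrix, i.e. that the variables are uncorrelated, and KMO values approach 1 when partial correlations are small relative to the raw correlations. Under the assumption that the 40 locations serve as the variables, the test would have $40 \cdot 39 / 2 = 780$ degrees of freedom.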
4. Results
In what follows, we present the results for a factor solution with four factors. This solution is based on the Kaiser–Guttman Criterion (eigenvalues > 1). In total, the four factors account for about 82% of the variance in the data. Factor 1 (= F1) accounts for about 57.2%, F2 for about 11.4%, F3 for about 8.7%, and, finally, F4 for about 4.6%.
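The reported variance shares allow a rough consistency check. The Kaiser–Guttman criterion retains factors whose eigenvalue exceeds 1. The sketch below assumes that the 40 locations act as standardized variables, so that the total variance equals 40 and a share can be converted into an implied eigenvalue. Under principal axis factoring with rotation, the reported shares need not coincide with the initial eigenvalues to which the criterion is applied, so the conversion is only an approximation.

```python
# Variance shares (in percent) reported for the four retained factors.
shares = {"F1": 57.2, "F2": 11.4, "F3": 8.7, "F4": 4.6}

total_share = sum(shares.values())

# Assumption: 40 standardized variables (the locations), hence a total
# variance of 40 and eigenvalue = share / 100 * 40.
N_LOCATIONS = 40
implied_eigenvalues = {
    factor: share / 100 * N_LOCATIONS for factor, share in shares.items()
}

# Kaiser-Guttman criterion: keep factors with an eigenvalue above 1.
retained = [f for f, ev in implied_eigenvalues.items() if ev > 1]

print(round(total_share, 1))
print({f: round(ev, 2) for f, ev in implied_eigenvalues.items()})
print(retained)
```

Under this assumption, all four implied eigenvalues lie above 1, with F4 the closest to the cut-off, and the shares sum to 81.9%.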
Section 4.1 focuses on RQ1; that is, it investigates the geographical structures. Section 4.2 addresses RQ2, dealing with the linguistic patterns underlying these structures.
4.1. Geographical patterns
Figure 2 provides a first overview of the geographical patterns. To enhance visibility, we generated an area-class map based on Voronoi partition (using the software REDE SprachGIS; https://www.regionalsprache.de/SprachGIS/Map.aspx). The colors in the map (Figure 2) display the dominant factors in each research location, i.e. the factors with the highest factor loadings, so that, for example, a red coloring indicates that F1 is dominant in the respective location. Green stands for F2, blue for F3, and black for F4. The shadings reveal the relative strength of the dominant factor loadings: the darker the shading, the higher the factor loadings. Hence, the darker shaded areas can be interpreted as core dialect areas and the lighter shaded areas as transition zones or “foothills.”
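The assignment of a dominant factor to each location amounts to picking the highest loading per location. A minimal sketch with hypothetical loadings follows; the color assignment mirrors the map legend described above, and the function name `dominant_factor` is ours.

```python
# Hypothetical rotated loadings per location on the four factors (F1..F4).
loadings = {
    "Loc01": [0.82, 0.35, 0.10, 0.05],
    "Loc02": [0.30, 0.41, 0.38, 0.12],
    "Loc03": [0.08, 0.15, 0.20, 0.91],
}

FACTORS = ["F1", "F2", "F3", "F4"]
COLORS = {"F1": "red", "F2": "green", "F3": "blue", "F4": "black"}


def dominant_factor(row):
    """Return (factor name, loading) for the highest loading in a row."""
    best = max(range(len(row)), key=lambda k: row[k])
    return FACTORS[best], row[best]


dominant = {loc: dominant_factor(row) for loc, row in loadings.items()}

for loc, (factor, value) in dominant.items():
    # The loading itself would control how dark the shading is.
    print(loc, factor, COLORS[factor], value)
```

In this toy example, the second location is assigned to F2 although its loading on F3 is almost as high. On the map, such a location would appear in a light shade, which is the kind of case the interpretation as a transition zone is meant to capture.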
Figure 2 clearly indicates a strong geographical signal in our (morpho)syntactic data. Based on the dominant factor loadings, we find several coherent areas.
- F1 (red) dominates in the northwestern parts of Austria, in particular in the regions of Salzburg and Upper Austria, but also in northern Burgenland.
- F2 (green) is the most important factor in the eastern half of Austria and also in parts of Tyrol and Carinthia.
- F3 (blue) predominates in the south(west), i.e. in most parts of Carinthia and also in Tyrol.
- The loadings on F4 (black) are highest in the westernmost regions, i.e. in Vorarlberg and western Tyrol.
At least to a certain extent, the geographical patterns visible in Figure 2 resemble the traditional dialect classification of Austria (see Section 3.1). The area where F4 is dominant comprises the Alemannic dialects and also the Bavarian–Alemannic transition zone. In most South Bavarian locations, the factor loadings on F3 outweigh the other factor loadings. Interestingly, however, the differentiation between the Central and South-Central Bavarian dialects made on phonological maps (see Figure 1) does not materialize here for syntactic variation. Instead of a north–south division, Figure 2 displays an east–west division for the Central and South-Central Bavarian dialects. Although there is not a complete overlap, our results indicate that the geographical patterns in (morpho)syntactic variation partially correspond to the traditional dialect classification explained in Section 3. However, mapping only the dominant factors “ignores the variation ‘below’ the threshold of dominance,” i.e. “the locally non-dominant parts of the globally dominant factors” (Pickl, Reference Pickl, Côté, Knooihuizen and Nerbonne2016:91). To account for this fine-grained variation, we have to focus on one factor at a time, as reflected in the four individual heatmaps in Figure 3.
Figure 3 reveals that F1, although dominant mostly in the northwest, also has rather high loadings in the eastern parts of Austria. In contrast, there is a clear boundary to the southern and most western dialects. The geographical patterns for F2 are more scattered since there are moderately high loadings spread across Austria, except for some western and southwestern regions. In general, there is a broad continuum visible for F2, in both a northwestern and southwestern direction. For F3 and F4, the patterns are straightforward: F3 is a latent factor in several Tyrolean and Carinthian locations and shows minor loadings eastwards into Styria. For F4, a sharp boundary becomes apparent. Note that there is no spatial continuum for F4.
4.2. Linguistic patterns
RQ2 is concerned with the linguistic basis of the geographical structures discussed in Section 4.1. As explained in Section 3.3, this can be analyzed by examining the factor scores which indicate the association between a given factor and the linguistic variants. To this end, Table 2 shows the top three (positive) factor scores for each individual factor.
Regarding F1, the factor scores are highest for the usage of complementizer agreement in the singular (3a) as opposed to its non-occurrence (3b); the use of an indefinite article before mass nouns (4a) as opposed to the lack of an indefinite article (4b); the use of eine as a partitive pronoun in the singular (5a) as opposed to the use of etwas (5b) or a null morpheme (5c).
F2 is strongly associated with the following features: preterite forms of sein (‘be’) (6a) as opposed to forms in the perfect tense (6b); overtly inflected adjectives in the nominative (7a) as opposed to adjective forms lacking overt inflection (7b); and finally, IPP constructions (infinitivus pro participio) with the modal verb können (8a) as opposed to forms without IPP (= constructions with a participle) (8b).
Most characteristic for F3 is the prepositional dative as in (9a), in contrast to (9b); the use of di/de (10a) instead of the weak article d (10b) before feminine and plural nouns; and the lack of article doubling as in (11a), in contrast to (11b).
Finally, typical features for F4 are the non-occurrence of complementizer agreement in the singular (3b); the usage of the weak article d before feminine and plural nouns (10b); and IPP constructions with the modal verb können (8a).
The factor scores indicate how the variants are ranked on one particular factor. Note that variants can have (relatively) high positive factor scores for more than one factor, and they can have high negative factor scores for other factors. For example, the use of the prepositional dative construction (see (9a)) has high positive factor scores not only for F3 (2.27) but also for F1 (1.21), while it has rather high negative factor scores for F2 (−1.55) and F4 (−1.28). This indicates that the variance of this particular feature is well accounted for by the factor solution, and it can thus be regarded as a key feature for the geographical patterns discussed above. On the other hand, several other variants generally have factor scores close to 0. For instance, the use of prepositional phrases (12a) instead of pronominal adverbs (whether simple pronominal adverbs (12b) or pronominal adverbs with “short doubling” (12c) or “long doubling” (12d); see Fleischer, Reference Fleischer, Barbiers, Cornips and van der Kleij2002) has factor scores close to 0 for all factors (F1, −0.3; F2, −0.2; F3, −0.28; F4, −0.12).
To examine which variants are most strongly associated, either positively or negatively, with all four factors, and thus best explained by the geographical structures revealed, we calculated a composite Factor Score Index for all variants. We did so by adding up the absolute values of all four factor scores per variant and dividing the sum by the total number of factors (= 4). Table 3 shows the results for the top three and lowest three Factor Score Index values.
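The index is the mean of the absolute factor scores of a variant. The sketch below applies this definition to the two sets of factor scores quoted above. Since those scores are rounded in the running text, the results are approximations and may differ slightly from the tabulated values; the function name `factor_score_index` is ours.

```python
def factor_score_index(scores):
    """Mean of the absolute factor scores of one variant."""
    return sum(abs(s) for s in scores) / len(scores)


# Factor scores in the order F1, F2, F3, F4, as quoted in the text.
prepositional_dative = [1.21, -1.55, 2.27, -1.28]
prepositional_phrase = [-0.3, -0.2, -0.28, -0.12]

fsi_dative = factor_score_index(prepositional_dative)
fsi_phrase = factor_score_index(prepositional_phrase)

print(round(fsi_dative, 4))
print(round(fsi_phrase, 4))
```

Taking absolute values means that a strongly negative score counts as much as a strongly positive one: a variant that is systematically avoided in one area is as informative for the areal structure as one that is systematically preferred.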
Note that the Factor Score Index values are highest for the occurrence of the prepositional dative construction (see (9a)), the weak article d before feminine and plural nouns (see (10b)), and the absence of complementizer agreement in the singular (see (3b)). The values are lowest for the use of prepositional phrases instead of pronominal adverbs (see (12a)), for the use of the d-pronoun as a relativizer (13a) instead of was (13b), d- was (13c), or wo (13d), and the use of als wie (14a) instead of als (14b) or wie (14c) as a comparative particle in inequality relations. This comes as no surprise, since variants such as the prepositional dative construction or complementizer agreement are regionally restricted forms with limited areal distribution (see e.g. Vergeiner & Bülow, 2021 for complementizer agreement), while variants such as the d-pronoun as a relativizer are used throughout Austria with no clear areal patterns (e.g. Bülow et al., Reference Bülow, Wittibschlager and Lenz2023 for relativizers).
Notably, however, not all features with a clear geographical distribution show high factor score values or high Factor Score Index values. This can be demonstrated when comparing these values with the Moran’s I statistics for the variants. Moran’s I is the most common statistical measure for (global) spatial autocorrelation and clustering. It assesses whether neighboring locations tend to have above-average similarities or dissimilarities. Moran’s I values range from −1 to 1, with values around 0 resulting from random distributions (= no spatial autocorrelation). When the data are spatially dispersed and neighboring locations are highly dissimilar, Moran’s I approximates −1. In the case of strong spatial clustering, Moran’s I gets close to 1; for the use of spatial autocorrelation in dialectometry, see e.g. Grieve et al. (Reference Grieve, Speelman and Geeraerts2011), Grieve (Reference Grieve, Szmrecsanyi and Wälchli2014), and Szmrecsanyi (Reference Szmrecsanyi2012).
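In its standard form, global Moran's I for a variant with relative frequency $x_i$ at location $i$ is defined as follows, where $n$ is the number of locations, $\bar{x}$ is the mean frequency, and $w_{ij}$ is the spatial weight between locations $i$ and $j$ (the notation is ours):

```latex
I = \frac{n}{\sum_{i}\sum_{j} w_{ij}}
    \cdot
    \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i} (x_i - \bar{x})^{2}}
```

The numerator sums the cross-products of deviations for neighboring locations, so it is positive when neighbors deviate from the mean in the same direction. Under the null hypothesis of no spatial autocorrelation, the expected value is $-1/(n-1)$, which is close to 0 for a sample of 40 locations.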
We computed Moran’s I for all 79 variants using the open-source software tool GeoDa 1.20 (https://geodacenter.github.io/). For defining spatial weights, we employed queen-based contiguity weights. Significance testing was based on Monte Carlo randomization tests with 9999 permutations. The significance level was set at pseudo p < 0.0008 (= 0.05/60) with Bonferroni correction for multiplicity.Footnote 5 Not surprisingly, all variants except two show positive values for Moran’s I, with a mean value of 0.37 (see also the histogram in Figure 4, left). In most cases (43 variants of 21 variables), the result is significant.
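The logic of the statistic and of the randomization test can be sketched in a few lines. The sketch uses binary contiguity weights on a hypothetical chain of four locations; the weighting applied internally by GeoDa may differ (for example through row standardization), so the numbers are illustrative only. The pseudo p-value is computed as the share of permutations with a value at least as high as the observed one, counting the observed value itself.

```python
import random


def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights.

    `neighbors` maps each location index to the indices adjacent to it.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(adj) for adj in neighbors.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    return (n / s0) * cross / sum(d * d for d in dev)


def pseudo_p(values, neighbors, permutations=9999, seed=0):
    """One-sided pseudo p-value from random relabelling of locations."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            at_least += 1
    return (at_least + 1) / (permutations + 1)


# Hypothetical chain of four locations, each adjacent to the next one.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1, 2, 3, 4], chain)   # similar neighbors
dispersed = morans_i([1, 0, 1, 0], chain)   # alternating neighbors

# Bonferroni-corrected significance level as stated in the text.
THRESHOLD = 0.05 / 60
```

With 9999 permutations, the smallest attainable pseudo p-value is 1/10000 = 0.0001, which lies below the corrected threshold of about 0.0008. A smaller number of permutations, such as 999, could not reach this threshold at all.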
Interestingly, there is only a moderate correlation of r = 0.4 (p < 0.001***) between the Moran’s I values and the Factor Score Index values (based on a Pearson correlation analysis) (see Figure 4, right). On the one hand, this can be attributed to the fact that the factor solution does not account very well for some variants with clear-cut regional distributions. For example, the use of adnominal possessive dative constructions (15a) instead of prenominal genitive constructions (15b) or pre- and postnominal von-constructions (15c, 15d) has a significantly high Moran’s I value (0.7) but rather low factor scores (F1, −0.1; F2, 0.7; F3, −0.4; F4, −1.1), which indicates that there is only a minor overlap with the geographical patterns of other variants.
On the other hand, there are variants without clear-cut regional patterns but rather high factor scores. For instance, the occurrence of overtly inflected adjectives in the nominative (see (7a)) has a non-significant Moran’s I value (0.2) but rather high factor scores (F1, 1; F2, −0.8; F3, 1.3; F4, −0.7). This is in line with the finding of Szmrecsanyi (Reference Szmrecsanyi2012:153) that “even features that appear to be distributed non-geographically when analyzed in isolation may help to create geographically more or less focused layers of morphosyntactic variability in conjunction with other features.”
Finally, we also calculated mean Factor Score Index values for the variables by simply adding up the Factor Score Index values for all variants of a variable and dividing them by the total number of variants per variable. These mean values indicate which variables are, on average, most important for the geographical structures detected. Table 4 shows the results for all 30 variables, with the variables ranked by the mean Factor Score Index values. In addition, Table 4 also shows the mean Moran’s I values for the variables.
What is interesting about Table 4 is that, on average, variables at the syntax-morphology interface seem to have a higher Factor Score Index than features related to other aspects of syntax (e.g. word order variation). This applies, in particular, to features concerning nominal morphology, and, to a somewhat smaller degree, verbal morphology. Although it is difficult to draw a categorical distinction between the variables in that regard, we highlighted those variables in Table 4 which arguably have the closest links to morphology. As can be seen, these variables tend to be positioned in the upper half of the table. This result may be due to the fact that variants which are of a more morphological nature tend to co-occur more strongly and, thus, tend to form more coherent areas with one another than with the variants of other variables. We will discuss this aspect, among others, in Section 5.
5. Discussion and summary
The present study aimed at detecting areal patterns of (morpho)syntactic variation in traditional Austrian dialects using methods of non-aggregative dialectometry. In this final section, we address the two research questions and include a discussion of methodological and further linguistic implications of our findings.
In response to RQ1, our results indicate clear geographical patterns of (morpho)syntactic variation in Austria. The data from the factor analysis demonstrate a strong geographical signal in our (morpho)syntactic data. This was not necessarily expected, as it is a common assumption that syntax is geographically less structured than lexis or phonology; see e.g. Glaser (2013) and Kortmann (2010) for a discussion. It is noteworthy that these patterns could be shown with data gathered using more traditional methods of data collection, i.e. direct dialect interviews (see Section 3). Note also that it has often been questioned whether these methods allow us to detect geographical patterns in (morpho)syntax at all; see e.g. Scheutz (2005) for a discussion.
Compared to traditional dialect maps such as Wiesinger (1983), which are mostly based on phonological variation, the geographical patterns we identified show both similarities and differences. Most notably, our findings confirm the strong contrast between the Bavarian and Alemannic dialects as well as the peculiarities of South Bavarian; however, the north–south divisions between Central, South-Central, and South Bavarian which shape the dialect landscape of Austria on traditional maps (see Figure 1) are not reflected in our results. It could be assumed from these findings that the geographical structures depend on the level of linguistic architecture under investigation, i.e. that the syntactic patterns simply look different from the phonological or lexical ones. However, this assumption can be questioned on both argumentative and empirical grounds. First, traditional classifications of German dialects are not based on patterns of a large number of variables, but rather on boundaries defined by relatively few phonological isoglosses. Second, a study by Pickl et al. (2019), which used a similar method of factor analysis but drew on data from “colloquial” lexis (see footnote 6), shows very similar geographical patterns. In particular, the three dominant factors in this study correspond to Factors 1, 2, and 4 in our study.
RQ2 asks about the strength of the contribution of individual variables to the formation of linguistic geographical structures. In our data, certain variants generally show high factor scores, others comparatively low scores. According to our findings, this is only partly related to how strongly the variants are spatially clustered in the first place (see the measures of spatial autocorrelation in Section 4.2). Rather, variants that differ spatially only in their frequency distribution often seem to be important for structuring the language space; see Szmrecsanyi (2012) for similar findings for English. Such variants have often been overlooked in previous studies on dialect syntax (e.g. the space-structuring function of the past tense of the copula verb sein ‘to be’; see Factor 2) or they were described as omnipresent in Upper German dialects in Austria (e.g. Lenz, 2019:336 on article doubling; see Factor 3).
In sum, we were able to show that certain variants co-occur more strongly with each other than others, which is reflected in higher factor scores. For instance, the usage—and non-usage (!)—of the prepositional dative construction (see (9a)), the weak article d before feminine and plural nouns (see (10b)), or complementizer agreement in the singular (see (3a)) shows clear patterns of co-occurrence with other variants, while variants such as the comparative particle als wie for inequality relations (see (14a)), the use of the d-pronoun as a relativizer (see (13a)), or the use of prepositional phrases for pronominal adverbs (see (12a)) do not exhibit such patterns.
Against the backdrop of this finding, we obtained mean Factor Score Index values for individual variables; that is, we determined which variables, on average, have the highest impact on the structure of the linguistic space under investigation. It emerges that syntactic variables with an interface to morphology rank highest on the list (see Table 4). This suggests that these (morpho)syntactic variables are similar to each other on a (deep) structural and/or typological level (e.g. with respect to their nature as more analytic vs. more synthetic structures). The relevant similarities may be attributed to the fact that the formation processes of these variables are rooted in similar historical conditions. For instance, constructions such as prepositional dative constructions, complementizer agreement, the weak article d, or zero inflection of the attributive adjective are all related to reduction or “weakening” processes. A clearer picture of these connections is a desideratum for future research.
In methodological terms, the results from this study show that dialectometric procedures can contribute to answering not only questions of linguistic geography but also structural questions. The method of factor analysis adapted here is innovative compared to earlier methods in dialectology (and aggregative methods in dialectometry, in particular) in that it assesses which variants and variables are most important for the formation of linguistic geographical structures. Moreover, it does not assume homogeneity at one location but allows for variation—or orderly heterogeneity—at specific data points in space. Thus, this approach is not based on the mere presence or absence of a feature or on the dominance of a variant per location, as in previous work, but accounts for actual variation which may paint a more realistic picture of language use at individual locations.
Acknowledgments
The research for the present article was conducted within the framework of the project “Variation and change of dialect varieties in Austria (in real and apparent time)”, funded by the Austrian Science Foundation (FWF), grant no. F 06002, as part of the Special Research Program “German in Austria” (SFB F 60).
Competing interests
The authors declare none.