Introduction
Acquisition of sound systems is often imperfect at intermediate stages of learning, with particular sound patterns diverging from the input. An important question for acquisitionists is why learners have more difficulty learning some sound patterns than others. One hypothesis concerns the effect of learning biases: learners’ cognitive system imposes constraints on learning that make certain sound patterns more difficult to learn than others. Work on learning biases in phonology has focused on discovering their existence and underlying mechanisms. Primarily using artificial language learning paradigms, learning bias studies in phonology have tested two types of learning bias hypotheses: (a) structurally more complex patterns are harder to learn, called the complexity bias hypothesis (e.g., Chambers, Onishi & Fisher, 2010; Cristià & Seidl, 2008; Kuo, 2008; Pycha, Nowak, Shin & Shosted, 2003; Saffran & Thiessen, 2003; Skoruppa, 2009), and (b) patterns lacking phonetic substance are harder to learn, called the substantive bias hypothesis (e.g., Carpenter, 2005, 2006, 2010; Koo, 2007; Nevin, 2010; Peperkamp & Bouchon, 2011; Pycha et al., 2003; Toro, Shukla, Nespor & Endress, 2008; Wilson, 2003, 2006; Zaba, 2008). While findings of complexity bias in artificial language learning are generally robust, the diagnosis of (phonetic) substantive bias in artificial language learning paradigms is more difficult. 
This latter bias is generally thought to have a far more subtle effect on language learning (Moreton & Pater, 2012b).
Despite fruitful results documenting substantive learning biases (see Moreton & Pater, 2012a, 2012b for a review), a crucial gap remains in the literature. Most work testing the learning bias hypotheses has assumed categorical phonology. For instance, in Saffran and Thiessen (2003), learners were trained on a language in which [p, t, k] were found exclusively in one phonological environment (e.g., word-finally) and [b, d, g] in another environment (e.g., word-medially), and the learning outcome was compared to another group where the pattern was restricted to [p, d, k] vs. [b, t, g]. The results were in favor of the complexity bias, in that learners more readily learned the classification of [p, t, k] vs. [b, d, g], which depends on a single stimulus feature (i.e., voicing contrast only), than that of [p, d, k] vs. [b, t, g], which depends on multiple features. Such experimental evidence leads to the idea that the complexity bias might reflect some positional restrictions of phonemes observed in language universals: cross-linguistically, sounds restricted to one position are likely to share a single feature rather than multiple features (e.g., [p, t, k] in word-final positions rather than [p, d, k]). Results from such categorical pattern learning provide insights as to why particular phonological patterns are unattested or underrepresented in languages. However, natural-language phonology does not exhibit purely categorical patterns. In fact, variable phonological patterns are prevalent in languages. For example, voiceless stops in a word-final position may be pronounced with an unaspirated stop as in [kʰæt] or with a glottal stop as in [kʰætʔ], just one of many sites of phonological variation in English. Stochastic phonological variability may also occur across word boundaries. 
For instance, English /t, d/ in word-final consonant clusters are more likely to be deleted when followed by a consonant-initial word, such as in the phrase last call → [læst kɔl] ~ [læs kɔl]. Shapes and distributions of phonological variants are partially predictable when linguistic and social factors are considered, but the predictions are neither deterministic nor absolute (Cedergren & Sankoff, 1974; Guy & Boberg, 1997; Sankoff, 1978; Weinreich, Labov & Herzog, 1968).
If a goal of the learning bias research program is to link laboratory evidence to language acquisition, language change, and language universals, it becomes equally imperative to understand the role of learning biases in the context of learning probabilistic and nondeterministic variable patterns as well as categorical patterns. There are studies on syntactic variation using artificial language learning paradigms (Culbertson & Newport, 2015; Kam & Newport, 2005, 2009; Schuler, Yang & Newport, 2016; Singleton & Newport, 2004), but fewer studies have focused on how phonological variation is learned using an artificial language paradigm. To the best of our knowledge, Mooney and Do (2018) is the only study in phonology whose primary focus was on learning artificial languages characterized by inherent variability. Mooney and Do (2018) designed artificial languages exhibiting two phonological variables which differed in their distributions. They tested the role of substantive bias in learning free variation of rounding harmony across a morpheme boundary (e.g., kenu bo ~ kenu be) by exposing adult English native speakers to languages exhibiting different dominant patterns; Language A showed more frequent rounding harmony (e.g., kenu bo) with a less frequent disharmony pattern (e.g., kenu be), while Language B exhibited the opposite tendency.
An artificial language paradigm with free variation in rounding harmony like the one used by Mooney and Do (2018) is an interesting site for the study of substantive bias, partly because of the typological patterning of rounding harmony in interaction with height. Kaun (2004) shows a typological distribution in rounding harmony patterning that favors (1) a non-high trigger vowel, (2) a high target vowel, and (3) height agreement between trigger and target vowels. Where cross-height rounding harmony is observed in Kaun's study, the trigger vowel is almost always non-high and the target vowel is almost always high. This typological asymmetry exists because a non-high vowel is a perceptually weaker environment for a [+round] cue, causing the feature to duplicate or spread for enhancement in a perceptually more salient position (Suomi, 1983). Kaun (2004) additionally makes reference to three cases of attested free variation in rounding harmony in Tuha, Tofa, and Altai Tuvan (Turkic). In Tuha and Tofa, the target is always high, and in Tofa, rounding harmony is obligatory where there is height agreement between trigger and target. Mooney and Do (2018) found that adult participants significantly boosted rounding harmony compared to their input where the target was high, and their subsequent MaxEnt models came to favor rounding harmony in lexical items with non-high triggers and high targets over high triggers and non-high targets. Based on these findings, Mooney and Do (2018) propose that phonological variation in rounding harmony is subject to true phonetically-grounded inductive bias, or substantive bias, rather than only anti-complexity structural bias. 
Rounding harmony is also a phonological cue available to learners as young as infants, who may use it for word segmentation, and it is not experience-dependent (Mintz, Walker, Welday & Kidd, 2017), making it a useful site for testing phonological knowledge in artificial language paradigms for child learners.
Mooney and Do's (2018) findings make a prediction about sound change: phonological variable distribution may undergo change over time, diachronically as well as in the speech of individuals, toward phonologically more natural patterns. Despite this crucial prediction for sound change, results from adults do not provide direct insight into child acquisition of phonology, owing to gradient cognitive differences between adults and children such as maturational constraints on linguistic pattern learning (Newport, 1990). Note, though, the prevailing claim from variationist sociolinguistics that adolescents are often leaders of language change (Labov, 2001; Tagliamonte & D'Arcy, 2009), which puts more explanatory weight on learning bias studies with young adult participants. However, if research on learning biases is to help us better understand language acquisition and learning, examining phonological variable learning among child learners will still be essential.
Using a modified design of Mooney and Do's experiment, we trained Cantonese-native preschoolers on two artificial languages: one language with dominant rounding harmony patterns (RH language) and another with dominant no-rounding harmony patterns (NH language). Cantonese does not exhibit harmony patterns, either for vowels or for consonants, so the L1 effect is minimized. If preschoolers learn the correct distribution of variants in artificial languages, we predict that their subsequent production will match the probabilistic distribution of variants they learned. If a substantive bias leads to a preference for rounding harmony over no rounding harmony, we predict that rounding harmony patterns will be chosen more often than the artificial languages exhibit them.
Methods
Participants
Thirty-nine Cantonese-speaking preschoolers without language disorders participated and completed the test (16 females, mean age = 5;4, age range = 4;11-5;11). According to a parental language questionnaire, the dominant language of all preschoolers is Hong Kong Cantonese. Their parents are all native speakers of Hong Kong Cantonese and predominantly use Cantonese at home. All participants attend local preschools in Hong Kong where English is taught as a second language for a minimum of 2 and a maximum of 5 hours per week. Upon completion of the test, participants were offered snacks, and the accompanying parent of each participant was offered 100 HKD (12 USD). Eleven additional preschoolers participated but their data were excluded due to distractions during training (n=9) or lack of learning (n=2, see Results).
Design and stimuli
Six animals were introduced during the training session and each animal had a disyllabic (CV.CV) name, i.e., [kenu] ‘turtle’, [miko] ‘bulldog’, [negu] ‘penguin’, [nuto] ‘bear’, [pomu] ‘dragon’, and [ruko] ‘bobcat’. Animals were presented with two motion verbs in CV suffix forms. Each motion verb suffix alternates its target vowel to create vowel (dis)harmony with a stem-final trigger vowel of a preceding noun: e.g., [be] ~ [bo] ‘singing’, and [ri] ~ [ru] ‘playing soccer’. An example of [kenu] ‘turtle’ with the two motion verbs is provided in Figure 1. Two suffix allomorphs per motion were used to create height-conditioned rounding (dis)harmony of suffix vowels with the stem-final vowels. Stem-final vowels and suffix vowels were either non-high or high. Half of the items exhibited agreement in height between the two vowels and the other half did not. This design tests for the phonetic underpinnings of substantive bias related to rounding harmony, whereby the trigger prefers to be non-high (e.g., miko bo is preferred over kenu bo), the target prefers to be high (e.g., kenu ru is preferred over kenu bo), and the trigger and target prefer to agree in their height (e.g., kenu ru is preferred over kenu bo) (Kaun, 1995, 2004; Suomi, 1983). To create free variation in both RH and NH languages, the dominant pattern was shown 67% of the time and the non-dominant pattern 33% of the time; in the RH language, rounding harmony was shown 67% of the time (e.g., kenu bo and kenu ru) and no harmony 33% of the time (e.g., kenu be and kenu ri), and the proportions were the opposite in the NH language. In other words, each animal was presented more frequently with the suffixes with the same rounding feature in the RH language and with the different rounding feature in the NH language. 
Participants were trained only on the inflected forms, and no uninflected forms (i.e., animals without a motion verb) were presented, in order to avoid explicitly teaching an underlying form of the animal name.
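The item set described above can be enumerated mechanically. The following is a minimal Python sketch, not the authors' stimulus script: the stems and suffix allomorphs come from the text, while the function name `build_items` and the record fields are illustrative assumptions. It checks that crossing six stems with two motions gives twelve items, half of which agree in height.

```python
from itertools import product

# Stems and suffix allomorphs as listed in the Design section.
STEMS = ["kenu", "miko", "negu", "nuto", "pomu", "ruko"]
# Each motion maps to (unrounded allomorph, rounded allomorph).
SUFFIXES = {"singing": ("be", "bo"), "playing soccer": ("ri", "ru")}
HIGH_VOWELS = set("iu")


def build_items():
    """Return one record per stem x motion pair (hypothetical helper)."""
    items = []
    for stem, (motion, (unrounded, rounded)) in product(STEMS, SUFFIXES.items()):
        trigger_high = stem[-1] in HIGH_VOWELS
        target_high = rounded[-1] in HIGH_VOWELS
        items.append({
            "stem": stem,
            "motion": motion,
            # Every stem-final vowel is round (u or o), so the rounded
            # allomorph is the harmonic variant.
            "harmony": f"{stem} {rounded}",
            "no_harmony": f"{stem} {unrounded}",
            "trigger_high": trigger_high,
            "target_high": target_high,
            "height_agree": trigger_high == target_high,
        })
    return items


items = build_items()
print(len(items), sum(item["height_agree"] for item in items))
```

Each record carries both variants, so the same table could serve either language; only the presentation frequencies (67% vs. 33%) would differ between the RH and NH languages.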
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220213175048613-0623:S0305000920000719:S0305000920000719_fig1.png?pub-status=live)
Figure 1. Vowel harmony variation seen in the description of a turtle in two motions. The two verb suffixes alternate their target vowels to create vowel (dis)harmony with the stem-final trigger vowels.
A phonetically trained male native American English speaker read the stimuli presented in IPA symbols without knowing the purpose of the study. Lexical stress was always on the first syllable, meaning that both the trigger and the target rounding (dis)harmony vowels were always unstressed. All the recordings were made in a sound-attenuated booth at the first author's institute. The recordings were scaled to 70 dB using the Scale intensity feature in Praat. They were converted to MP3 format in Audacity, allowing the files to be embedded in HTML5 <audio> tags. Two independent American English native speakers heard the stimuli and provided corresponding written forms, which correctly showed the intended perception of all stimuli items.
Procedure
Each child participant was accompanied by their parent as well as a Hong Kong Cantonese-speaking experimenter. The parents were asked not to give any feedback during the experiment. The child participants were told to learn how to describe animals in motion in an ‘alien language’. They were also told that the alien language uses the same sounds as their second language, English, but that the words in the alien language are different from those in English. With this instruction, participants were expected to treat the task as a new language learning task. They were not told that the suffix vowels alternate depending on the properties of stem-final vowels, which made the test implicit. All instructions were provided in Cantonese before the test.
Half of the participants learned the RH language (n=20) and the other half learned the NH language (n=19). Participants were seated in front of a computer screen in a quiet lab space. The experiment was presented using PsychoPy (Peirce et al., 2019). During the training session, participants were shown pictures of an animal in motion one at a time (in total, 6 animals x 2 motions = 12 items) and were asked to repeat each sound file after it was played. The experimenter pressed a button on a keyboard to proceed to the next item once the participant produced the heard form correctly. If an incorrect form was produced, the participant was asked to repeat the item once again. The order of picture presentations was random for each participant. The training session was repeated three times (in total, 12 items x 3 times = 36 items), with a mean training time of 23 minutes. After the training, participants completed a test session. The test session was composed of two parts. In the first part, each question showed a picture of an animal seen during training. Eight randomly chosen pictures (4 animals x 2 motions) were shown to each participant, but it was ensured that two of the animals had high stem-final vowels and the other two animals had non-high stem-final vowels. It was also ensured that both the e ~ o and the i ~ u suffix vowel alternation patterns were equally present for all participants. In the second part, four new animals were introduced, [petu] ‘duck’, [timo] ‘kangaroo’, [lepu] ‘shark’, and [rino] ‘elephant’. Participants were told that these four animals also live in the alien world and were asked to repeat after the experimenter until they correctly produced the names of the four animals. 
Once all four forms were correctly produced, participants were asked to guess how to describe the animals in the two motions, ‘singing’ and ‘playing soccer’, both of which were introduced during the training session. Each picture showed a new animal in a previously seen motion, totaling 8 items (4 animals x 2 motions). In tests for both the seen animals and the new animals, the names of animals were always given in correct forms. Participants were asked to choose how to describe motions, and three answer choices were provided in audio files in a random order: (a) a suffix with the dominant rounding (dis)harmony pattern ([kenu bo] in the RH language; [kenu be] in the NH language), (b) a suffix with the non-dominant rounding (dis)harmony pattern ([kenu be] in the RH language; [kenu bo] in the NH language), and (c) an unseen suffix with a wrong consonant and a wrong vowel (e.g., [kenu ma]). The experimenter pushed a key on the keyboard to record the participants’ answer choice of either (a), (b), or (c).
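The three-way answer set for a test trial can be sketched as follows. This is an illustrative Python sketch only: the function `answer_choices`, its parameters, and the default foil suffix `ma` (taken from the [kenu ma] example) are assumptions, and the experiment itself was run in PsychoPy with audio files rather than text labels.

```python
import random


def answer_choices(stem, unrounded, rounded, language, foil="ma", rng=random):
    """Build the three options for one test trial in shuffled order.

    `language` is "RH" (harmony dominant) or "NH" (no harmony dominant).
    All names here are hypothetical, not taken from the original script.
    """
    if language not in ("RH", "NH"):
        raise ValueError("language must be 'RH' or 'NH'")
    harmony = f"{stem} {rounded}"
    no_harmony = f"{stem} {unrounded}"
    if language == "RH":
        dominant, non_dominant = harmony, no_harmony
    else:
        dominant, non_dominant = no_harmony, harmony
    options = [
        ("dominant", dominant),          # answer choice (a)
        ("non_dominant", non_dominant),  # answer choice (b)
        ("foil", f"{stem} {foil}"),      # answer choice (c)
    ]
    rng.shuffle(options)  # presentation order is random per trial
    return options


print(answer_choices("kenu", "be", "bo", "NH", rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes the presentation order reproducible, which can help when checking a trial list by hand.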
Results
Results from two participants, both of whom learned the RH language, were discarded because they did not learn the exhibited suffix alternations: they chose an unseen suffix, i.e., answer choice (c), for over half of their answers (73% and 100% respectively). Answers from all other participants were analyzed. Figure 2 shows the answer distribution for the seen and unseen items by language, i.e., RH language vs. NH language.
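The exclusion rule stated above (more than half of a participant's answers being choice (c)) can be written as a one-line check. The sketch below assumes answers are coded as the strings "a", "b", and "c"; the function name and the coding scheme are illustrative, not taken from the study's materials.

```python
def chose_foil_too_often(answers, threshold=0.5):
    """Return True if the share of 'c' answers exceeds the threshold.

    `answers` is a list of "a"/"b"/"c" codes for one participant
    (a hypothetical coding of the three answer choices).
    """
    if not answers:
        raise ValueError("no answers recorded")
    return answers.count("c") / len(answers) > threshold


# A made-up participant who picks the unseen suffix on 12 of 16 trials.
print(chose_foil_too_often(["c"] * 12 + ["a"] * 4))
```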
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220213175048613-0623:S0305000920000719:S0305000920000719_fig2.png?pub-status=live)
Figure 2. Answer choices in the RH and NH languages for seen and unseen items. Two dotted lines indicate proportions of harmony and non-harmony in input, matching 0.33 (bottom) and 0.67 (top).
The data in Figure 2 show that participants overall chose the harmony patterns more often across conditions, showing a general preference for rounding harmony over non-harmony. We first examined the descriptive data from the explicit test, that is, the answers participants chose for the items seen during training. In the RH language, participants predominantly chose the rounding harmony patterns for seen items, at a rate higher than the rounding harmony proportion exhibited in the input (78% vs. 67%). For the seen items in the NH language, a similar tendency was observed; participants chose harmony patterns more often than the language exhibited them (54% vs. 33%), although the rate of rounding harmony choices was lower than in the RH language. Next, we examined the descriptive data from the implicit test, where participants generalized the patterns to unseen items. For the unseen items, a tendency toward rounding harmony over non-harmony was still observed, but there was a notable difference from the seen items. In the RH language, the answer distribution of rounding harmony vs. non-harmony patterns closely reflects the proportions attested in the input (62% vs. 67%), as shown by the rough match between the dotted lines (proportions in the input) and the bars (participants’ choices). In the NH language, however, rounding harmony patterns were chosen at a much higher rate than their proportion in the input (72% vs. 33%), and the choice of non-harmony patterns was correspondingly disfavored (25% vs. 67%).
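The size of each departure from the input can be read off by subtracting the input proportion from the chosen proportion. The short Python sketch below does this arithmetic with the percentages reported above; the dictionary and function names are illustrative.

```python
# Percentages of rounding harmony, as reported in the text.
INPUT_HARMONY = {"RH": 67, "NH": 33}
CHOSEN_HARMONY = {
    ("RH", "seen"): 78,
    ("NH", "seen"): 54,
    ("RH", "unseen"): 62,
    ("NH", "unseen"): 72,
}


def harmony_shift(language, training):
    """Percentage-point difference between chosen and input harmony rates."""
    return CHOSEN_HARMONY[(language, training)] - INPUT_HARMONY[language]


for language, training in CHOSEN_HARMONY:
    print(f"{language} {training}: {harmony_shift(language, training):+d} points")
```

On these figures the shift is largest for unseen items in the NH language (+39 points) and slightly negative for unseen items in the RH language (-5 points), which is the asymmetry the descriptive summary points to.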
Statistical analysis concurs with the descriptive observations above. Given that our main focus is to compare rounding harmony vs. non-harmony answers, we tested whether participants chose the two answer options significantly differently depending on language (RH language vs. NH language) and training (seen vs. unseen) factors. We ran a mixed effects logistic regression model using lme4 (Bates, Maechler, Bolker & Walker, 2015), modeling the choice of rounding harmony and non-harmony answers. The third answer choice, i.e., an unseen suffix, was excluded from the analysis. In this model, the difference between harmony and non-harmony dominance in each language was tested by including a sum-coded ‘Language’ factor. The difference between seen and unseen items was tested by including a sum-coded ‘Training’ factor. The interaction between Language and Training factors was included as well. Random intercepts were included for items and participants, and a random slope for Training by participant. The results showed that rounding harmony choices were significantly discouraged in the NH language (β = -1.381, SE = 0.284, z = -4.87, Pr(>|z|) < .001), showing a significant Language effect. Participants chose rounding harmony patterns slightly less for unseen items across the two languages, but the difference was not statistically significant (β = -0.902, SE = 0.528, z = -1.68, Pr(>|z|) = .094), suggesting no Training effect on the overall choices of rounding harmony or non-harmony answers. However, rounding harmony choices were significantly encouraged for unseen items when participants learned the NH language (β = 1.96, SE = 0.389, z = 5.05, Pr(>|z|) < .001), suggesting a strongly biased generalization toward rounding harmony patterns especially when learning the NH language.
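For readers who prefer the model written out, the structure described above corresponds to a specification of roughly the following form. This is a sketch in our own notation, not a formula taken from the analysis script: the symbols are illustrative, and the numeric values and direction of the sum coding are not stated in the text, so they are left abstract.

```latex
% p indexes participants, i indexes items.
% L_p : sum-coded Language (RH vs. NH) of participant p
% T_i : sum-coded Training (seen vs. unseen) of item i
\[
\operatorname{logit} P(y_{pi} = \text{harmony})
  = \beta_0 + \beta_1 L_p + \beta_2 T_i + \beta_3 L_p T_i
  + u_{0p} + u_{1p} T_i + w_i
\]
% u_{0p}, u_{1p} : by-participant random intercept and random slope for Training
% w_i            : by-item random intercept
```

Under this reading, the three reported estimates correspond to the Language term ($\beta_1 = -1.381$), the Training term ($\beta_2 = -0.902$), and the interaction ($\beta_3 = 1.96$).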
A linear regression of the combined participant mean rounding harmony productions by trigger height, target height, and height agreement between trigger and target achieved significance (F = 2.31, P < .001). There was a significant effect of target height (t = 2.36, P < .01), but no significant effects of either trigger height (P = .76) or height agreement (P = .98) were observed.
Discussion
A main finding of our study is that phonological variable learning is substantively biased toward more natural variants overall. The current result is consistent with a tendency reported from variable learning in morphosyntax (e.g., Kam & Newport, 2005, 2009): children prefer variants that are in line with Greenbergian universals (Greenberg, 1963). However, the current results differ in that children in our study modulated only the variant distribution, while studies testing morphosyntactic variable learning showed that children regularized languages. In morphosyntactic learning, patterns are grounded on a structural basis, often including logical relations among structures as well (e.g., Greenberg's Universal 18 ‘if a language has pre-nominal adjectives it will also have pre-nominal numerals’ tested in Culbertson, Smolensky & Legendre, 2011). Such logical relations may not be immediately available to children, which raises the question of whether the observed biases in artificial language learning studies in morphosyntax truly reflect some component of learning (Culbertson, 2012). Note, though, that the nature of the learning task in this study is different from those of morphosyntactic studies. Rounding harmony is grounded on phonetic substance, primarily acoustic and perceptual cues (Kaun, 2004; Suomi, 1983). Presumably, acoustic and perceptual cues are immediately available to listeners despite short exposure time. If so, instead of simply regularizing the whole language, learners were able to perform a complex learning task, namely learning two variants and their distribution while incorporating a substantive bias.
While answer distributions in our study showed biased learning of variant distribution toward harmony overall, such a tendency was not observed for unseen items in the RH language: recall that the rates of generalization to new items matched the variant distribution shown in the input. In other words, when children were exposed to a language exhibiting a dominant natural variant (RH language), they learned and generalized the variant distribution without imposing a substantive bias. When they were exposed to a language with a dominant but unnatural variant (NH language), by contrast, they generalized patterns in a highly biased way, resulting in a high degree of discrepancy from the input. This difference suggests that generalizations based on phonological variables are more biased when children learn languages that are unnatural.
We also tested the three typological principles of rounding harmony introduced above (Kaun, 1995, 2004): (a) the trigger must be non-high; (b) the target must be high; (c) the trigger and target must agree in height. Our results reflect principle (b) but not (a) or (c). This is consistent with the results from adult learners in Mooney and Do (2018). As Mooney and Do pointed out, the result seems to reflect a rounding harmony variation tendency found in natural languages; two out of three languages with documented free rounding harmony variation (Tuha and Tofa in Harrison & Kaun, 2001) restrict the pattern to high targets, without height constraints on the trigger, while the third language (Altai Tuvan) allows harmony in all trigger-target pairs.
Though there is a parallel finding of the significance of target height between this study and Mooney and Do (2018), the results also demonstrate a major difference between child and adult learners. Overall, adults in Mooney and Do (2018) did not markedly depart from the input distributions on which they were trained, even in the phonetically unnatural NH language. In contrast, we see the child learners in this study completely reverse the NH distribution in both seen and unseen items to one where the natural rounding harmony pattern dominates. Though this child phonetic behavior does not entirely mirror the categorical learning found in similar paradigms of morphosyntactic naturalness research, as discussed above, our results can still be viewed as strikingly similar to these studies, in that we see children ready and able to rearrange probabilistic constraints gleaned from input towards a pattern that is more natural in their subsequent production.
Our findings here make explicit predictions about how variant distribution in languages can change. Experimental results from learning bias studies have been discussed in conjunction with sound change: biased learning of sound patterns should reflect the asymmetry in sound change over generations, which in turn is reflected in typology (Moreton, 2008; Wilson, 2006). An assumption behind this idea is that learners drive sound change. Likewise, our study shows that learners may drive changes in variant distribution as well, with the incipient change being towards a greater frequency of a more phonetically-natural form. If variation learning is biased, as evidenced from the current as well as previous studies, then over generations learners should reshape language to be more skewed toward phonological variants that are natural. Note, though, that our study also shows that language change is not simply toward more natural patterns but keeps a balance with input frequencies. For example, in the seen items condition in the NH language, even though the pattern was reversed so that the harmony pattern was dominant, the no-harmony pattern was retained to a greater extent than it was in unseen items. This suggests more of an attempt to match the harmony frequency of seen forms than to productively extend that learned harmony frequency to novel items. Results from our study may also provide insight into cases of phonological variability that remain stable over time in distribution and frequency of variants, such as the English -ing/in’ alternation (Labov, 1989, 2001). It is only our unnatural language (NH language) that is greatly restructured by learners to exhibit a much more natural (although still variable) pattern. 
While the dominant pattern in the natural language (RH language) is accentuated by learners, the shift towards a greater frequency of the natural pattern is less radical and is not extended to new items. This could suggest the existence of a kind of ‘tipping point’ that motivates a distributional change in a site of probabilistically-patterned phonological variation, where a pattern or variant must be unnatural enough to provoke the learner to amend or adjust it. Testing such a proposal requires future experimental conditions with variable patterns that exhibit different phonetic properties and different proportional distributions.
A prediction from the channel bias hypothesis (Barnes, 2002; Blevins, 2004; Hale & Reiss, 2000; Ohala, 1993a, 1993b) may also pertain to our results. Under this hypothesis, sound change is expected to arise primarily from innocent misperception in the communication channel between the speaker and hearer, and is not necessarily related to learning. The fact that rounding harmony is grounded mainly on perceptual cues (i.e., duplication or spread of the [±round] feature in order to increase its chances of being perceived by a listener) makes it hard to disentangle the role of the substantive bias from that of the channel bias. This paves the way for future work on phonological variable learning, which will allow us to see the extent to which the biases observed in artificial language learning experiments can be attributed to the components of learning.
Acknowledgments
Data collection and preliminary analysis were sponsored by The University of Hong Kong's Seed Fund for Basic Research to the first author. We have no conflicts of interest to disclose.