The roles of prediction, expectation, and prior experience in musical processing are well established (Huron 2006; Large et al. 2002; Meyer 1956; Narmour 1990; Phillips-Silver & Trainor 2008; Vuust & Frith 2008). They have led to the proposal that music can create an environment of minimized prediction error within individuals and within groups (e.g., via a steady pulse) (Overy & Molnar-Szakacs 2009). Bayesian models have been shown to account for a range of phenomena in music perception (Temperley 2007), and they have been used to reconcile apparently divergent datasets from rhythm perception and production tasks (Sadakata et al. 2006). Moreover, the motor system has been shown to be engaged during auditory rhythm perception (e.g., Grahn & Brett 2007), and musical imagery evokes neural responses similar to those evoked by perception (Schaefer et al. 2011a; 2011b). Clark's unified framework of perception, action, and cognition is thus well supported by recent music research.
However, the current account does not attempt to deal with the range of ways in which prediction error induces arousal and affect. The extent to which our predictions are met or violated, historically theorized to lead to an arousal response (Berlyne 1970), can make a piece of music more or less coherent, interesting, and satisfying. Aesthetically, this leads to the concept of an optimal level of surprisal. Although this concept was initially formulated to describe liking, or hedonic value, for differing levels of musical complexity (e.g., North & Hargreaves 1995), it can be described as an inverted U-shaped function. Prediction error is plotted on the x-axis and affective response on the y-axis, and an intermediate, preferred level of surprisal yields the maximal affective response. However, this optimal level is not uniform across musical features (e.g., expressive timing, harmonic structure). Rather, it is closely coupled to the specific characteristics of each feature or behaviour. As Clark states, context sensitivity is fundamental. In the case of music, different levels of constraint will exist simultaneously across the different dimensions of pitch space and time. For example, singing often involves high constraints on pitch, tuning, and scale, while its timing constraints may be more flexible. Drumming, by contrast, usually involves strict timing constraints, with more flexibility in pitch. Our perceptual systems are finely attuned to these constraints. Rhythmic deviations that fit certain aspects of perceived musical structure are less readily detected (Repp 1999), and humanly produced deviations from a steady rhythm are preferred over randomly added noise (Hennig et al. 2011).
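As a rough illustration of the inverted-U idea, the following minimal sketch scores a single musical event. It takes surprisal as −log2 p, the usual information-theoretic definition, where p is the probability the listener's model assigned to the event. It then maps that surprisal onto an affective response that peaks at an optimum. The Gaussian shape, the function names, and the per-feature parameter values are all hypothetical choices made for illustration. They are not a model proposed in this commentary, nor are they fitted to any data. Giving each feature its own optimum and width mirrors the point that optimal surprisal is not uniform across musical features.

```python
import math

def surprisal_bits(p: float) -> float:
    """Surprisal (information content, in bits) of an event the listener assigned probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

def affective_response(s: float, optimum: float, width: float) -> float:
    """Illustrative inverted-U: equals 1.0 when surprisal s equals `optimum` and
    decays symmetrically (Gaussian shape) as s moves away from it.
    A smaller `width` means less tolerance for deviation from the optimum."""
    if width <= 0:
        raise ValueError("width must be positive")
    return math.exp(-((s - optimum) ** 2) / (2.0 * width ** 2))

# Hypothetical (optimum, width) pairs in bits, one curve per musical feature.
# The values are made up purely to show that the peak can sit in different places.
FEATURE_CURVES = {
    "expressive_timing": (1.0, 0.5),
    "harmonic_structure": (3.0, 1.5),
}

if __name__ == "__main__":
    for feature, (optimum, width) in FEATURE_CURVES.items():
        for p in (0.9, 0.25, 0.01):
            s = surprisal_bits(p)
            r = affective_response(s, optimum, width)
            print(f"{feature:18} p={p:<4} surprisal={s:4.2f} bits response={r:.2f}")
```

For either feature, a highly expected event (p = 0.9) and a very unexpected one (p = 0.01) both score lower than an event near that feature's optimum. The same surprisal value can therefore be near-optimal for one feature and suboptimal for another.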
This tuning of our perceptual system to specific deviations from an internal model is seen not only in performance aspects of music (such as expressive microtiming) but also in compositional aspects found in the score (such as syncopation). Most musical styles require, and indeed "play" with, levels of surprisal in the temporal domain. Examples range from the rubato of Romantic piano performance, to the syncopated off-beat rhythms of jazz, to the complex polyrhythms of African percussion. Proficient musicians and composers are implicitly aware of these effects and tailor their efforts to interact with the listener's surprisal responses. This leads to what has been termed "communicative pressure" in creating music (Temperley 2004). It is an implicit knowledge of the musical dimensions in which prediction can be manipulated stylistically without obscuring the musical ideas. This complexity corresponds closely to what Clark refers to as a designed environment. It is important to note, however, that different musical environments have different rules. Different listeners, owing to their different exposure backgrounds (such as culture and training), seek different environments. The desired outcome is a complex affective response. Indeed, exposure has been shown to influence liking for a completely new musical system within only 30 minutes (Loui et al. 2010). This finding supports the idea that each individual has a strongly personalized preference for unpredictability, reflected in musical likes and dislikes. Individuals also differ in their prediction abilities, which have been shown to be quite stable over time and to affect interpersonal coordination (Pecenka & Keller 2011).
A thrill-seeking individual might seek out highly unpredictable new musical experiences. More commonly, an individual might seek out highly predictable, familiar, favorite musical experiences.
Thus, different kinds of musical experience, different musical styles, and personal musical preferences lead to different predictions, error responses, arousal, and affective responses across a range of musical dimensions and hierarchical levels. The upshot is that the surprisal response in music is non-uniform. The position of a curve describing "optimal surprisal" for affective or aesthetic reward will be determined by culture, training, or musical style. Its precise shape (e.g., kurtosis) may be specific to the type and level of the prediction or mental model. And while the characteristics of the optimal surprisal differ for each aspect of music, the common factor remains affect. We propose that affect plays a major part in what makes prediction error in music (large or small) meaningful, and indeed determines its value.
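One way to make this non-uniformity concrete is to give each musical feature f and listener ℓ their own curve. The generalized Gaussian form below is offered purely as an illustrative assumption, not as a fitted or established model. Its position, width, and shape are free parameters:

$$
A_{f,\ell}(s) = \exp\!\left(-\left|\frac{s - s^{*}_{f,\ell}}{w_{f,\ell}}\right|^{\beta_{f}}\right), \qquad s = -\log_2 p(x \mid \text{model}), \quad w_{f,\ell} > 0, \quad \beta_f > 0
$$

Here $s^{*}_{f,\ell}$ marks where the curve peaks, which on the account above would be set by culture, training, or style. $w_{f,\ell}$ sets the curve's width, and $\beta_f$ sets its shape. With $\beta_f = 2$ the profile is Gaussian. With $\beta_f < 2$ the peak is sharper and the tails heavier (higher kurtosis), and as $\beta_f$ grows large the top flattens (lower kurtosis). The symbols and the functional form are hypothetical choices made only to show how "position" and "shape" can vary independently.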
To the extent that prediction is established as a powerful mechanism for conveying musical meaning, it seems clear that the affective response to the prediction error is what gives the initial prediction such power. We thus propose that the valence of the prediction error, leading to a range of affective responses, is a necessary component of any account of how predictive processing can explain musical behaviour. The function of such affective predictability will require discussion elsewhere. We postulate, however, that it will include deep connections with social understanding and communication. These range from simple group clapping, a uniquely human behaviour requiring constant automatic adjustments of probabilistic representation (Molnar-Szakacs & Overy 2006; Overy & Molnar-Szakacs 2009), to more sophisticated rhythmic organization and self-expression (Nelson 2012), with an emphasis on "error" as positive, meaningful information.