
Part II - Creating and Programming Game Music

Introduction

Published online by Cambridge University Press:  15 April 2021

Melanie Fritsch, Heinrich-Heine-Universität Düsseldorf
Tim Summers, Royal Holloway, University of London


Video game music is often sonically similar to film music, particularly when games use musical styles that draw on precedent in cinema. Yet there are distinct factors in play that are specific to creating and producing music for games. These factors include:

  • technical considerations arising from the video game technology,

  • interactive qualities of the medium, and

  • aesthetic traditions of game music.

Apart from books and manuals that teach readers how to use particular game technologies (such as Ciarán Robinson’s Game Audio with FMOD and Unity),Footnote 1 some composers and audio directors have written about their processes in more general terms. Rob Bridgett,Footnote 2 Winifred Phillips,Footnote 3 George Sanger,Footnote 4 Michael Sweet,Footnote 5 Chance Thomas,Footnote 6 and Gina Zdanowicz and Spencer Bambrick,Footnote 7 amongst others, have written instructive guides that help to convey their approaches and philosophies to music in games. Each of these volumes has a slightly different approach and focus. Yet all discussions of creating and producing game music deal with the three interlinked factors named above.

Music is one element of the video game; as such it is affected by technical aspects of the game as a whole. The first part of this book considered how sound chip technology defined particular parameters for chiptune composers. Even if modern games do not use sound-producing chips like earlier consoles, technical properties of the hardware and software still have implications for the music. These might include memory and processing power, output hardware, or other factors determined by the programming. In most cases, musical options available to the composer/audio director are determined by the (negotiated) allocation of time, budget and computational resources to audio by the game directors. This complexity, as well as the variety of audio elements of a game, is part of the reason why large game productions typically have an ‘audio director’. The role of the audio director is to supervise all sound in the game, managing the creation of sound materials (music, sound effects, dialogue), while co-ordinating with the teams programming other aspects of the game.

For smaller productions such as mobile games, indie games or games that simply do not use much music, the tasks of an audio director are outsourced and/or co-ordinated by a game producer. Additionally, in-house composers are rather uncommon; most composers work for hire on a specific project and are therefore often not permanent team members.Footnote 8

Composers must consider how their music will interact with the other elements of the game. Perhaps chief amongst these concerns is the question of how the music will respond to the player and gameplay. There are a variety of ways that music might do so. A game might simply feature a repeating loop that begins when the game round starts, and repeats until the player wins or loses. Or a game might involve more complicated interactive systems. Sometimes the music programming is handled by specialist ‘middleware’ software, like FMOD and Wwise, which are specifically designed to allow advanced audio options. In any case, the composer and audio director are tasked with ensuring that the music fits with the way the material will be deployed in the context of the game.

Karen Collins has defined a set of terms for describing this music. She uses ‘dynamic music’ as a generic term for ‘changeable’ music; ‘adaptive’ for music that changes in reaction to the game state rather than in direct response to the player’s actions (such as music that increases in tempo once an in-game countdown timer reaches a certain value); and ‘interactive’ for music that does change directly as a result of the player’s actions, such as when music begins when the player’s avatar moves into a new location.Footnote 9
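
To make the distinction concrete, here is a minimal sketch (not drawn from Collins or from any middleware) of how the two reactive categories differ in an event-driven music controller; all engine and method names are invented for illustration:

```python
class StubEngine:
    """Stand-in for a real playback engine; middleware such as FMOD or
    Wwise would fill this role. Both methods just print."""
    def set_tempo_multiplier(self, mult):
        print(f"tempo x{mult}")

    def play_cue(self, name):
        print(f"now playing: {name}")

class MusicController:
    def __init__(self, engine):
        self.engine = engine

    def on_countdown_tick(self, seconds_left):
        # ADAPTIVE (in Collins's sense): reacts to game state, not player input.
        if seconds_left == 30:
            self.engine.set_tempo_multiplier(1.25)

    def on_avatar_entered(self, location):
        # INTERACTIVE: responds directly to an action the player took.
        self.engine.play_cue(f"theme_{location}")

mc = MusicController(StubEngine())
mc.on_countdown_tick(30)        # adaptive change
mc.on_avatar_entered("castle")  # interactive change
```

Both handlers produce ‘dynamic’ music in Collins’s generic sense; the difference lies in whether the triggering event traces back to the game state or to the player.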

Unlike the fixed timings of a film, games often have to deal with uncertainty about precisely when particular events will occur, as this depends on the player’s actions. Composers frequently have to consider whether and how music should respond to events in the game. Musical reactions have to be both prompt and musically coherent. There are a great variety of approaches to the temporal indeterminacy of games, but three of the most common are loops, sections and layers.Footnote 10 Guy Michelmore, in Chapter 4 of this volume, outlines some of the challenges and considerations of writing using loops, sections and layers.

A less common technique is the use of generative music, where musical materials are generated on the fly. As Zdanowicz and Bambrick put it, ‘Instead of using pre-composed modules of music’, music is ‘triggered at the level of individual notes’.Footnote 11 Games like Spore, Proteus, No Man’s Sky and Mini Metro have used generative techniques. This approach seems best suited to games where procedural generation is also evident in other aspects of the game. Generative music would not be an obvious choice for game genres that expect highly thematic scores with traditional methods of musical development.

As the example of generative music implies, video games have strong traditions of musical aesthetics, which also play an important part in how game music is created and produced. Perhaps chief amongst such concerns is genre. In the context of video games, the word ‘genre’ is usually used to refer to the type of game (strategy game, stealth game, first-person shooter), rather than the setting of the game (Wild West, science fiction, etc.). Different game genres have particular conventions of how music is implemented. For instance, it is typical for a stealth game to use music that reacts when the player’s avatar is discovered, while gamers can expect strategy games to change music based on the progress of the battles, and Japanese role-playing game (RPG) players are likely to expect a highly thematic score with character and location themes.

K. J. Donnelly has emphasized that, while dynamic music systems are important, we should be mindful that in a great many games, music does not react to the ongoing gameplay.Footnote 12 Interactivity may be an essential quality of games, but this does not necessarily mean that music has to respond closely to the gameplay, nor that a more reactive score is intrinsically better than non-reactive music. Music that jumps rapidly between different sections, reacting to every single occurrence in a game, can be annoying or even ridiculous. Abrupt and awkward musical transitions can draw unwanted attention to the implementation. While a well-made composition is, of course, fundamental, good implementation into the game is also mandatory for a successful score.

The genre of the game will also determine the cues required in the game. Most games will require some kind of menu music, but loading cues, win/lose cues, boss music, interface sounds and so on, will be highly dependent on the genre, as well as the particular game. When discussing game music, it is easy to focus exclusively on music heard during the main gameplay, though we should recognize the huge number of musical elements in a game. Even loading and menu music can be important parts of the experience of playing the game.Footnote 13

The highly collaborative and interlinked nature of game production means that there are many agents and agendas that affect the music beyond the composer and audio director. These can include marketing requirements, broader corporate strategy of the game publisher and developer, and technical factors. Many people who are not musicians or directly involved with audio make decisions that affect the music of a game. The process of composing and producing music for games balances the technical and financial resources available to creators with the demands of the medium and the creative aspirations of the producers.

4 Building Relationships: The Process of Creating Game Music

Guy Michelmore

Even more than writing music for film, composing for video games is founded on the principle of interactive relationships. Of course, interactivity is particularly obvious when games use dynamic music systems to allow music to respond to the players. But it is also reflected more generally in the collaborative nature of game music production and the way that composers produce music to involve players as active participants, rather than simply as passive audience members. This chapter will outline the process of creating video game music from the perspective of the composer. The aim is not to provide a definitive model of game music production that applies for all possible situations. Instead, this chapter will characterize the processes and phases of production that a game composer will likely encounter while working on a project, and highlight some of the factors in play at each stage.

Beginning the Project

Though some games companies have permanent in-house audio staff, most game composers work as freelancers. As with most freelance artists, game composers typically find themselves involved with a project through some form of personal connection. This might occur through established connections or through newly forged links. For the latter, composers might pitch directly to developers for projects that are in development, or they might network at professional events like industry conferences (such as Develop in the UK). As well as cultivating connections with developers, networking with other audio professionals is important, since many composers are given opportunities for work by their peers.

Because of the technical complexity of game production, and the fact that games often require more music than a typical film or television episode, video games frequently demand more collaborative working patterns than non-interactive media. It is not uncommon for games to involve teams of composers. One of the main challenges of artistic collaborations is for each party to have a good understanding of the other’s creative and technical processes, and to find an effective way to communicate. Unsurprisingly, one positive experience of a professional relationship often leads to another. As well as the multiple potential opportunities within one game, composers may find work as a result of previous fruitful collaborations with other designers, composers or audio directors. Since most composers find work through existing relationships, networking is crucial for any composer seeking a career in writing music for games.

Devising the Musical Strategy

The first task facing the composer is to understand, or help devise, the musical strategy for the game. This is a process that fuses practical issues with technical and artistic aspirations for the game. The game developers may already have a well-defined concept for the music of their game, or the composer might shape this strategy in collaboration with the developers.

Many factors influence a game’s musical strategy. The scale and budget of the project are likely well outside the composer’s control. The demands of a mobile game that only requires a few minutes of music will be very different from those of a high-budget title from a major studio that might represent years of work. Composers should understand how the game is assembled and whether they are expected to be involved in the implementation/integration of music into the game, or simply delivering the music (either as finished cues or as stems/elements of cues). It should also become clear early in the process whether there is sufficient budget to hire live performers. If the budget will not stretch to live performance, the composer must rely on synthesized instruments, and/or their own performing abilities.

Many technical decisions are intimately bound up with the game’s interactive and creative ethos. Perhaps the single biggest influence on the musical strategy of a game is the interactive genre or type of game (whether it is a first-person shooter, strategy game, or racing game, and so on). The interactive mechanics of the game will heavily direct the musical approach to the game’s music, partly as a result of precedent from earlier games, and partly because of the music’s engagement with the player’s interactivity.Footnote 1 These kinds of broad-level decisions will affect how much music is required for the game, and how any dynamic music should be deployed. For instance, does the game have a main character? Should the game adopt a thematic approach? Should it aim to respond to the diversity of virtual environments in the game? Should it respond to player action? How is the game structured, and does musical development align with this structure?

Part of the creative process will involve the composer investigating these questions in tandem with the developers, though some of the answers may change as the project develops. Nevertheless, having a clear idea of the music’s integration into the game and of the available computational/financial resources is essential for the composer to effectively begin creating the music for the game.

The film composer may typically be found writing music to a preliminary edit of the film. In comparison, the game composer is likely to be working with materials much further away from the final form of the product.Footnote 2 It is common for game composers to begin writing based on incomplete prototypes, design specifications and concept art/mood boards supplied by the developers. From these materials, composers will work in dialogue with the developers to refine a style and approach for the game. For games that are part of a series or franchise, the musical direction will often iterate on the approach from previous instalments, even if the compositional staff are not retained from one game to the sequel.

If a number of composers are working on a game, the issue of consistency must be considered carefully. It might be that the musical style should be homogeneous, and so a strong precedent or model must be established for the other composers to follow (normally by the lead composer or audio director). In other cases, multiple composers might be utilized precisely because of the variety they can bring to a project. Perhaps musical material by one composer could be developed in different ways by other composers, which might be heard in contrasting areas of the game.

Unlike a film or television episode, where the composer works primarily with one individual (the director or producer), in a game, musical discussions are typically held between a number of partners. The composer may receive feedback from the audio director, the main creative director or even executives at the publishers.Footnote 3 This allows for a multiplicity of potential opinions or possibilities (which might be liberating or frustrating, depending on the collaboration). The nature of the collaboration may also be affected by the musical knowledge of the stakeholders who have input into the audio. Once the aesthetic direction has been established, and composition is underway, the composer co-ordinates with the audio director and/or technical staff to ensure that the music fits with the implementation plans and technical resources of the game.Footnote 4 Of course, the collaboration will vary depending on the scale of the project and company – a composer writing for a small indie game produced by a handful of creators will use a different workflow compared to a high-budget game with a large audio staff.

Methods of Dynamic Composition

One of the fundamental decisions facing the composer and developers is how the music should react to the player and gameplay. This might simply consist of beginning a loop of music when the game round begins, and silencing the loop when it ends, or it might be that the game includes more substantial musical interactivity.

If producers decide to deploy more advanced musical systems, this has consequences for the finite technical resources available for the game as it runs. Complex interactive music systems will require greater system resources such as processing power and memory. The resources at the composer’s disposal will have to be negotiated with the rest of the game’s architecture. Complex dynamic music may also involve the use of middleware systems for handling the interactive music (such as FMOD, Wwise or a custom system), which would need to be integrated into the programming architecture of the game.Footnote 5 If the composer is not implementing the interactive music themselves, further energies must be dedicated to integrating the music into the game. In all of these cases, because of the implications for time, resources and budget, as well as the aesthetic result, the decisions concerning dynamic music must be made in dialogue with the game development team, and are not solely the concern of the composer.

The opportunity to compose for dynamic music systems is one of the reasons why composers are attracted to writing for games. Yet a composer’s enthusiasm for a dynamic system may outstrip that of the producers or even the players. And, as noted elsewhere in this book, we should be wary of equating more music, or more dynamic music, with a better musical experience.

Even the most extensive dynamic systems are normally created from relatively straightforward principles. Either the selection and order of musical passages are affected by the gameplay (‘horizontal’ changes), or the game affects the combinations of musical elements heard simultaneously (‘vertical’ changes). Of course, these two systems can be blended, and both can be used to manipulate small or large units of music. Most often, looped musical passages will play some part in the musical design, in order to account for the indeterminacy of timing in this interactive medium.

Music in games can be designed to loop until an event occurs, at which point the loop will end, or another piece will play. Writing in loops is tricky, not least when repetition might prompt annoyance. When writing looped cues, composers have to consider several musical aspects including:

  • Harmonic structure, to avoid awkward harmonic shifts when the loop repeats. Many looped cues use a cadence to connect the end of the loop back to the beginning.

  • Timbres and textures, so that musical statements and reverb are not noticeably cut off when the loop repeats.

  • Melodic material, which must avoid listener fatigue. Winifred Phillips suggests using continual variation to mitigate this issue.Footnote 6

  • Dynamic and rhythmic progression during the cue, so that when the loop returns to the start, it does not sound like a lowering of musical tension, which may not match with in-game action.

  • Ending the loop or transitioning to another musical section. How will the loop end in a way that is musically satisfying? Should the loop be interrupted, or will the reaction have to wait until the loop concludes? Will a transition passage or crossfade be required?

A game might involve just one loop for the whole game round (as in Tetris, 1989) or several: in the stealth game Splinter Cell (2002), loops are triggered depending on the attention attracted by the player’s avatar.Footnote 7
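
The final consideration in the list above, ending a loop or leaving it for another cue, often reduces in practice to quantizing the exit point to the next bar line. A minimal sketch of that arithmetic, with illustrative tempo and timing values:

```python
import math

def next_bar_time(now_s, bpm, beats_per_bar=4, loop_start_s=0.0):
    """Return the time (in seconds) of the next bar line: the musically
    safe point at which to end a loop or start a transition cue."""
    bar_len = beats_per_bar * 60.0 / bpm
    bars_elapsed = (now_s - loop_start_s) / bar_len
    return loop_start_s + math.ceil(bars_elapsed) * bar_len

# e.g. at 112 BPM in 4/4, 7.3 seconds into the loop, the next safe exit:
print(round(next_bar_time(7.3, bpm=112), 2))  # 8.57
```

Interrupting the loop immediately gives a prompter reaction; waiting for the value this function returns gives a more coherent one. That trade-off is exactly the prompt-versus-musical tension described earlier.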

Sometimes, rather than writing a complete cue as a whole entity, composers may write cues in sections or fragments (stems). Stems can be written to sound one after another, or simultaneously.

In a technique sometimes called ‘horizontal sequencing’Footnote 8 or ‘branching’,Footnote 9 sections of a composition are heard in turn as the game is played. This allows the music to respond to the game action, since musical sections and variations can be chosen to suit it. For instance, the ‘Hyrule Field’ cue of Legend of Zelda: Ocarina of Time (1998) consists of twenty-three sections. The order of the sections is partly randomized to avoid direct repetition, but the set is subdivided into different categories, so the music can suit the action. When the hero is under attack, battle variations play; when he stands still, sections without percussion play. Even if the individual sections do not loop, writing music this way still presents some of the same challenges as writing loops, particularly concerning transitions between sections (see, for example, the complex transition matrix developed for The Operative: No One Lives Forever (2000)).Footnote 10
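
A minimal sketch of this kind of branching, loosely modelled on the ‘Hyrule Field’ description above: sections are grouped by game state, the next section is chosen at random, and direct repetition is avoided. The section names and categories are invented, and this is not Nintendo’s implementation:

```python
import random

SECTIONS = {
    "battle":  ["battle_a", "battle_b", "battle_c"],
    "calm":    ["calm_a", "calm_b"],          # e.g. sections without percussion
    "explore": ["explore_a", "explore_b", "explore_c", "explore_d"],
}

def next_section(state, previous=None):
    """Pick the next section for the current game state, never repeating
    the section that has just played."""
    pool = [s for s in SECTIONS[state] if s != previous]
    return random.choice(pool)

prev = None
for state in ["explore", "explore", "battle", "calm"]:
    prev = next_section(state, prev)
    print(state, "->", prev)
```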

Stems can also be programmed to sound simultaneously. Musical layers can be added, removed or substituted, in response to the action. Composers have to think carefully about how musical registers and timbres will interact when different combinations of layers are used, but this allows music to respond quickly, adjusting texture, instrumentation, dynamics and rhythm along with the game action. These layers may be synchronized to the same tempo and with beginnings and endings aligned, or they may be unsynchronized, which, provided the musical style allows this, is a neat way to provide further variation. Shorter musical fragments designed to be heard on top of other cues are often termed ‘stingers’.
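
A minimal sketch of vertical layering, assuming a hypothetical engine that keeps all stems synchronized and simply exposes their volumes; the layer names and thresholds are illustrative, not taken from any shipped game:

```python
class LayeredCue:
    """Synchronized stems whose audibility is driven by one game variable."""
    def __init__(self, thresholds):
        # layer name -> intensity (0.0-1.0) at which the layer becomes audible
        self.thresholds = thresholds
        self.volumes = {name: 0.0 for name in thresholds}

    def set_intensity(self, value):
        # cumulative texture: higher intensity brings in more layers
        for layer, threshold in self.thresholds.items():
            self.volumes[layer] = 1.0 if value >= threshold else 0.0

    def play_stinger(self, name):
        # a one-shot fragment heard on top of whatever mix is playing
        print(f"stinger over current mix: {name}")

cue = LayeredCue({"pads": 0.0, "percussion": 0.4, "brass": 0.75})
cue.set_intensity(0.5)      # pads + percussion, no brass yet
print(cue.volumes)
cue.play_stinger("victory_hit")
```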

These three techniques are not mutually exclusive, and can often be found working together. This is partly due to the different advantages and disadvantages of each approach. An oft-cited example, Monkey Island 2 (1991), uses the iMUSE music system, and deploys loops, layers, branching sections and stingers. Halo: Combat Evolved (2001), too, uses loops, branching sections, randomization and stingers.Footnote 11 Like the hexagons of a beehive, the musical elements of dynamic systems use fundamental organizational processes to assemble individual units into large complex structures.

Even more than the technical and musical questions, composers for games must ask themselves which elements of the game’s construction their music responds to, and reinforces. Musical responses inevitably highlight certain aspects of the gameplay, whether that be the avatar’s health, success or failure, the narrative conceit, the plot, the environment, or any other aspect to which music is tied. Unlike in non-interactive media, composers for games must predict and imagine how the player will engage with the game, and create music to reinforce and amplify the emotional journeys they undertake. Amid exciting discussions of technical possibilities, the player’s emotional and cognitive engagement with the game should remain uppermost in the composer’s mind. Increased technical complexity, challenges for the composer and demands on resources all need to be balanced against the end result for the player. It is perhaps for this reason that generative and algorithmic music, as impressive as such systems are, has found limited use in games – the enhancement in the player’s experience is not always matched by the investment required to make successful musical outcomes.

Game Music as Media Music

As much as we might highlight the peculiar challenges of writing for games, it is important not to ignore game music’s continuity with previous media music. In many senses, game composers are continuing the tradition of media music that stretches back into the early days of film music in the late nineteenth century – that is, they are starting and developing a conversation between the screen and the viewer. For most players, the musical experience is more important than the technical complexities or systems that lie behind it. They hear the music as it sounds in relation to the screen and gameplay, not primarily the systematic and technical underpinnings. (Indeed, one of the points where players are most likely to become aware of the technology is when the system malfunctions or is somehow deficient, such as in glitches, disjunct transitions or incidences of too much repetition.) The fundamental question facing game composers is the same as for film composers: ‘What can the music bring to this project that will enhance the player/viewer’s experience?’ The overall job of encapsulating and enhancing the game on an aesthetic level is more important than any single technical concern.

Of course, where games and films/television differ is in the relationship with the viewer/listener. We are not dealing with a passive viewer, or homogeneous audience, but a singular participant, addressed, and responded to, by the music. This is not music contemplated as an ‘other’ entity, but a soundtrack to the player’s actions. Over the course of the time taken to play through a game, players spend significantly longer with the music of any one game than with a single film or television episode. As players invest their time with the music of a game, they build a partnership with the score.

Players are well aware of the artifice of games and look for clues in the environment and game materials to indicate what might happen as the gameplay develops. Music is part of this architecture of communication, so players learn to attend to even tiny musical changes and development. For that reason, glitches or unintentional musical artefacts are particularly liable to cause a negative experience for players.

The connection of the music with the player’s actions and experiences (whether through dynamic music or more generally), forges the relationship between gamer and score. Little wonder that players feel so passionately and emotionally tied to game music – it is the musical soundtrack to their personal victories and defeats.

Delivering the Music

During the process of writing the music for the game, the composer will remain in contact with the developers. The audio director may need to request changes or revisions to materials for technical or creative reasons, and requirements for new music might emerge, while music that was initially ordered might become redundant. Indeed, on larger projects in particular, it is not uncommon for drafts to go unused in the final product.

Composers may deliver their music as purely synthesized materials, or the score might involve some aspect of live performance. A relatively recent trend has seen composers remotely collaborating with networks of soloist musicians. Composers send cues and demos to specific instrumentalists or vocalists, who then record parts in live performance, which are then integrated into the composition. This blended approach partly reflects a wider move in game scoring towards smaller ensembles and unusual combinations of instruments (often requiring specialist performers). The approach is also well suited to scores that blend together sound-design and musical elements. Such hybrid approaches can continue throughout the compositional process, and the contributed materials can inform the ongoing development of the musical compositions.

Of course, some scores still demand a large-scale orchestral session. While the composer is ultimately responsible for such sessions, composers rely on a larger team of collaborators to help arrange and record orchestras. An orchestrator will adapt the composer’s materials into written notation readable by human performers, while the composer will also require the assistance of engineers, mixers, editors and orchestra contractors to enable the session to run smoothly. Orchestral recording sessions typically have to be organized far in advance of the recording date, which necessitates that composers and producers establish the amount of music to be recorded and the system of implementation early on, in case this has implications for the way the music should be recorded. For example, if musical elements need to be manipulated independently of each other, the sessions need to be organized so they are recorded separately.

Promotion and Afterlife

Some game trailers may use music from the game they advertise, but in many cases, entirely separate music is used. There are several reasons for this phenomenon. On a practical level, if a game is advertised early in the development cycle, the music may not yet be ready, and/or the composer may be too busy writing the game’s music to score a trailer. More conceptually, trailers are a different medium to the games they advertise, with different aesthetic considerations. The game music, though perfect for the game, may not fit with the trailer’s structure or overall style. Unsurprisingly, then, game trailers often use pre-existing music.

Trailers also serve as one of the situations where game music may achieve an afterlife. Since trailers rely heavily on licensed pre-existing music, and trailers often draw on more than one source, game music may easily reappear in another trailer (irrespective of the similarity of the original game to the one being advertised).

Beyond a soundtrack album or other game-specific promotion, the music of a game may also find an afterlife in online music cultures (including YouTube uploads and fan remixes), or even in live performance. Game music concerts are telling microcosms of the significance of game music. On the one hand, they may seem paradoxical – if the appeal of games is founded on interactivity, then why should a format that removes such engagement be popular? Yet, considered more broadly, the significance of these concerts is obvious: they speak to the connection between players and music in games. This music is the soundtrack to what players feel is their own life. Why wouldn’t players be enthralled at the idea of a monumental staging of music personally connected to them? Here, on a huge scale in a public event, they can relive the highlights of their marvellous virtual lives.

* * *

This brief overview of the production of music for games has aimed to provide a broad-strokes characterization of the process of creating such music. Part of the challenge and excitement of music for games comes from the negotiation of technical and aesthetic demands. Ultimately, however, composers aim to create deeply satisfying experiences for players. Game music does so by building personal relationships with gamers, enriching their lives and experiences, in-game and beyond.

5 The Inherent Conflicts of Musical Interactivity in Video Games

Richard Stevens

Within narrative-based video games the integration of storytelling, where experiences are necessarily directed, and of gameplay, where the player has a degree of autonomy, continues to be one of the most significant challenges that developers face. In order to mitigate this potential dichotomy, a common approach is to rely upon cutscenes to progress the narrative. Within these passive episodes where interaction is not possible, or within other episodes of constrained outcome where the temporality of the episode is fixed, it is possible to score a game in exactly the same way as one might score a film or television episode. It could therefore be argued that the music in these sections of a video game is the least idiomatic of the medium. This chapter will instead focus on active gameplay episodes, and interactive music, where the unique challenges lie. When music accompanies active gameplay a number of conflicts, tensions and paradoxes arise. In this chapter, these will be articulated and interrogated through three key questions:

  • Do we score a player’s experience, or do we direct it?

  • How do we distil our aesthetic choices into a computer algorithm?

  • How do we reconcile the players’ freedom to instigate events at indeterminate times with musical forms that are time-based?

In the following discussion there are few certainties, and many more questions. The intention is to highlight the issues, to provoke discussion and to forewarn.

Scoring and Directing

Gordon Calleja argues that in addition to any scripted narrative in games, the player’s interpretation of events during their interaction with a game generates stories, what he describes as an ‘alterbiography’.Footnote 1 Similarly, Dominic Arsenault refers to the ‘emergent narrative that arises out of the interactions of its rules, objects, and player decisions’Footnote 2 in a video game. Music in video games is viewed by many as being critical in engaging players with storytelling,Footnote 3 but in addition to underscoring any wider narrative arc music also accompanies this active gameplay, and therefore narrativizes a player’s actions. In Claudia Gorbman’s influential book Unheard Melodies: Narrative Film Music she identifies that when watching images and hearing music the viewer will form mental associations between the two, bringing the connotations of the music to bear on their understanding of a scene.Footnote 4 This idea is supported by empirical research undertaken by Annabel Cohen who notes that, when interpreting visuals, participants appeared ‘unable to resist the systematic influence of music on their interpretation of the image’.Footnote 5 Her subsequent congruence-associationist model helps us to understand some of the mechanisms behind this, and the consequent impact of music on the player’s alterbiography.Footnote 6

Cohen’s model suggests that our interpretation of music in film is a negotiation between two processes.Footnote 7 Stimuli from a film will trigger a ‘bottom-up’ structural and associative analysis which both primes ‘top-down’ expectations from long-term memory through rapid pre-processing, and informs the working narrative (the interpretation of the ongoing film) through a slower, more detailed analysis. At the same time, the congruence (or incongruence) between the music and visuals will affect visual attention and therefore also influence the meaning derived from the film. In other words, the structures of the music and how they interact with visual structures will affect our interpretation of events. For example, ‘the film character whose actions were most congruent with the musical pattern would be most attended and, consequently, the primary recipient of associations of the music’.Footnote 8

In video games the music is often responding to actions instigated by the player; it is therefore these actions that will appear most congruent with the resulting musical pattern. In applying Cohen’s model to video games, the implication is that the player themselves will often be the primary recipient of the musical associations formed through experience of cultural, cinematic and video game codes. Since the music responds to events instigated by the player, these musical associations will likely be ascribed to their actions. You (the player) act, superhero music plays, you are the superhero.

Of course, there are also events within games that are not instigated by the player, and so the music will attach its qualities to, and narrativize, these also. Several scholars have attempted to distinguish between two musical positions,Footnote 9 defining interactive music as that which responds directly to player input, and adaptive music as that which ‘reacts appropriately to – and even anticipates – gameplay rather than responding directly to the user’.Footnote 10 Whether things happen because the ‘game’ instigates them or the ‘player’ instigates them is up for much debate, and it is likely that the perceived congruence of the music will oscillate between game events and players’ actions, but what is critical is the degree to which the player perceives a causal relationship between these events or actions and the music.Footnote 11 These relationships are important for the player to understand, since while music is narrativizing events it is often simultaneously playing a ludic role – supplying information to support the player’s engagement with the mechanics of the game.

The piano glissandi in Dishonored (2012) draw the player’s attention to the enemy NPC (Non-Player Character) who has spotted them, the four-note piano motif in Left 4 Dead 2 (2009) informs the player that a ‘Spitter’ enemy type has spawned nearby,Footnote 12 and if the music ramps up in Skyrim (2011), Watch Dogs (2014), Far Cry 5 (2018) or countless other games, the player knows that enemies are aware of their presence and are now in active pursuit. The music dramatizes the situation while also providing information that the player will interpret and use. In an extension to Chion’s causal mode of listening,Footnote 13 where viewers listen to a sound in order to gather information about its cause or sources, we could say that in video games players engage in ludic listening, interpreting the audio’s system-related meaning in order to inform their actions. Sometimes this ludic function of music is explicitly and deliberately part of the game design in order to avoid an overloading of the visual channel while compensating for a lack of peripheral vision and spatial information,Footnote 14 but whether deliberate or inadvertent, the player will always be trying to interpret music’s meaning in order to gain advantage.

Awareness of the causal links between game events, game variables and music will likely differ from player to player. Some players may note the change in musical texture when crouching under a table in Sly 3: Honor Among Thieves (2005) as confirmation that they are hidden from view; some will actively listen out for the rising drums that indicate the proximity of an attacking wolf pack in Rise of the Tomb Raider (2015);Footnote 15 while others may remain blissfully unaware of the music’s usefulness (or may even play with the music switched off).Footnote 16 Feedback is sometimes used as a generic term for all audio that takes on an informative role for the player,Footnote 17 but it is important to note that the identification of causality, with musical changes being perceived as either adaptive (game-instigated) or interactive (player-instigated), is likely to induce different emotional responses. Adaptive music that provides information to the player about game states and variables can be viewed as providing a notification or feed-forward function – enabling the player. Interactive music that corresponds more directly to player input will likewise have an enabling function, but it carries a different emotional weight since this feedback also comments on the actions of the player, providing positive or negative reinforcement (see Figure 5.1).

Figure 5.1 Notification and feedback: Enabling and commenting functions

The percussive stingers accompanying a successful punch in The Adventures of Tintin: The Secret of the Unicorn (2011), or the layers that begin to play upon the successful completion of a puzzle in Vessel (2012) provide positive feedback, while the duff-note sounds that respond to a mistimed input in Guitar Hero (2005) provide negative feedback. Whether enabling or commenting, music often performs these ludic functions. It is simultaneously narrating and informing; it is ludonarrative. The implications of this are twofold. Firstly, in order to remain congruent with the game events and player actions, music will be inclined towards a Mickey-Mousing type approach,Footnote 18 and secondly, in order to fulfil its ludic functions, it will tend towards a consistent response, and therefore will be inclined towards repetition.

When the player is web-slinging their way across the city in Spider-Man (2018) the music scores the experience of being a superhero, but when they stop atop a building to survey the landscape the music must logically also stop. When the music strikes up upon entering a fort in Assassin’s Creed: Origins (2017), we are informed that danger may be present. We may turn around and leave, and the music fades out. Moving between these states in either game in quick succession highlights the causal link between action and music; it draws attention to the system, to the artifice. Herein lies a fundamental conflict of interactive music – in its attempt to be narratively congruent, and ludically effective, music can reveal the systems within the game, but in this revealing of the constructed nature of our experience it disrupts our immersion in the narrative world. While playing a game we do not want to be reminded of the architecture and artificial mechanics of that game. These already difficult issues around congruence and causality are further exacerbated when we consider that music is often not just scoring the game experience, it is directing it.

Film music has sometimes been criticized for a tendency to impose meaning, for telling a viewer how to feel,Footnote 19 but in games music frequently tells a player how to act. In the ‘Medusa’s Call’ chapter of Far Cry 3 (2012) the player must ‘Avoid detection. Use stealth to kill the patrolling radio operators and get their intel’, and the quiet tension of the synthesizer and percussion score supports this preferred stealth strategy. In contrast, the ‘Kick the Hornet’s Nest’ episode (‘Burn all the remaining drug crops’) encourages the player to wield their flamethrower to spectacular destructive effect through the electro-dubstep-reggae mashup of ‘Make it Bun Dem’ by Skrillex and Damian ‘Jr. Gong’ Marley. Likewise, a stealth approach is encouraged by the James-Bond-like motif of Rayman Legends (2013) ‘Mysterious Inflatable Island’, in contrast to the hell-for-leather sprint inferred from the scurrying strings of ‘The Great Lava Pursuit’. In these examples, and many others, we can see that rather than scoring a player’s actual experience, the music is written in order to match an imagined ideal experience – where the intentions of the music are enacted by the player. To say that we are scoring a player experience implies that somehow music is inert, that it does not impact on the player’s behaviour. But when the player is the protagonist it may be the case that, rather than identifying what is congruent with the music’s meaning, the player acts in order for image and music to become congruent. When music plays, we are compelled to play along; the music directs us. Both Ernest Adams and Tulia-Maria Cășvean refer to the idea of the player’s contract,Footnote 20 that in order for games to work there has to be a tacit agreement between the player and the game maker; that they both have a degree of responsibility for the experience. If designers promise to provide a credible, coherent world then the player agrees to behave according to a given set of predefined rules, usually determined by game genre, in order to maintain this coherence. Playing along with the meanings implicit in music forms part of this contract. But the player also has the agency to decide not to play along with the music, and so it will appear incongruent with their actions, and again the artifice of the game is revealed.Footnote 21

When game composer Marty O’Donnell states ‘When [the players] look back on their experience, they should feel like their experience was scored, but they should never be aware of what they did to cause it to be scored’Footnote 22 he is demonstrating an awareness of the paradoxes that arise when writing music for active game episodes. Music is often simultaneously performing ludic and narrative roles, and it is simultaneously following (scoring) and leading (directing). As a consequence, there is a constant tension between congruence, causality and abstraction. Interactive music represents a catch-22 situation. If music seeks to be narratively congruent and ludically effective, this compels it towards a Mickey-Mousing approach, because of the explicit, consistent and close matching of music to the action required. This leads to repetition and a highlighting of the artifice. If music is directing the player or abstracted from the action, then it runs the risk of incongruence. Within video games, we rely heavily upon the player’s contract and the inclination to act in congruence with the behaviour implied by the music, but the player will almost always have the ability to make our music sound inappropriate should they choose to.

Many composers who are familiar with video games recognize that music should not necessarily always act in the same way throughout a game. Reflecting on his work in Journey (2012), composer Austin Wintory states, ‘An important part of being adaptive game music/audio people is not [to ask] “Should we be interactive or should we not be?” It’s “To what extent?” It’s not a binary system, because storytelling entails a certain ebb and flow’.Footnote 23 Furthermore, he notes that if any relationship between the music and game, from Mickey-Mousing to counterpoint, becomes predictable then the impact can be lost, recommending that ‘the extent to which you are interactive should have an arc’.Footnote 24 This bespoke approach to writing and implementing music for games, where the degree of interactivity might vary between different active gameplay episodes, is undoubtedly part of the solution to the issues outlined but faces two main challenges. Firstly, most games are very large, making a tailored approach to each episode or level unrealistic, and secondly, that unlike in other art forms or audiovisual media, the decisions about how music will act within a game are made in absentia; we must hand these to a system that serves as a proxy for our intent. These decisions in the moment are not made by a human being, but are the result of a system of events, conditions, states and variables.

Algorithms and Aesthetics

Given the size of most games, and an increasingly generative or systemic approach to their development, the complex choices that composers and game designers might want to make about the use of music during active gameplay episodes must be distilled into a programmatic system of events, states and conditions derived from discrete (True/False) or continuous (0.0–1.0) variables. This can easily lead to situations where what might seem programmatically correct does not translate appropriately to the player’s experience. In video games there is frequently a conflict between our aesthetic aims and the need to codify the complexity of human judgement we might want to apply.

One common use of music in games is to indicate the state of the artificial intelligence (AI), with music either starting or increasing in intensity when the NPCs are in active pursuit of the player, and ending or decreasing in intensity when they end the pursuit or ‘stand down’.Footnote 25 This provides the player with ludic information about the NPC state, while at the same time heightening tension to reflect the narrative situation of being pursued. This could be seen as a good example of ludonarrative consonance or congruence. However, when analysed more closely, we can see that simply relying on the ludic logic of gameplay events to determine the music system, which has narrative implications, is not effective. In terms of the game’s logic, when an NPC stands down or when they are killed the outcome is the same; they are no longer actively seeking the player (Active pursuit = False). In many games, the musical result of these two different events is the same – the music fades out. But if as a player I have run away to hide and waited until the NPC stopped looking for me, or if I have confronted the NPC and killed them, these events should feel very different. The common practice of musically responding to both these events in the same way is an example of ludonarrative dissonance.Footnote 26 In terms of providing ludic information to the player it is perfectly effective, but in terms of narrativizing the player’s actions, it is not.
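
A minimal sketch of the alternative implied here: the same boolean change (active pursuit = False) branches on its cause before the music responds. The cue names and engine calls are hypothetical, invented for illustration:

```python
class StubEngine:
    def play_stinger(self, name):
        print(f"stinger: {name}")
    def transition_to(self, cue, fade_s):
        print(f"fade to '{cue}' over {fade_s}s")

def on_pursuit_ended(engine, reason):
    # Ludically, both branches mean 'no longer pursued'; narratively,
    # they should sound very different.
    if reason == "npc_killed":
        engine.play_stinger("combat_victory")          # triumphant punctuation
        engine.transition_to("aftermath", fade_s=2.0)
    elif reason == "npc_stood_down":
        # slow, wary release rather than an instant reset to 'safe'
        engine.transition_to("uneasy_calm", fade_s=8.0)

on_pursuit_ended(StubEngine(), "npc_stood_down")
```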

The perils of directly translating game variables into musical states are comically apparent in the ramp into epic battle music when confronting a small rat in The Elder Scrolls III: Morrowind (2002) (Enemy within a given proximity = True). More recent games such as Middle Earth: Shadow of Mordor (2014) and The Witcher 3: Wild Hunt (2015) treat such encounters with more sophistication, reserving additional layers of music, or additional stingers, for confrontations with enemies of greater power or higher ranking than the player, but the translation of sophisticated human judgements into mechanistic responses to a given set of conditions is always challenging. In both Thief (2014) and Sniper Elite V2 (2012) the music intensifies upon detection, but the lackadaisical attitudes of the NPCs in Thief, and the fact that you can swiftly dispatch enemies via a judicious headshot in Sniper Elite V2, mean that the music is endlessly ramping up and down in a Mickey-Mousing fashion. In I Am Alive (2012), a percussive layer mirrors the player character’s stamina when climbing. This is both ludically and narratively effective, since it directly signifies the depleting reserves while escalating the tension of the situation. Yet its literal representation of this variable means that the music immediately and unnaturally drops to silence should you ‘hook-on’ or step onto a horizontal ledge. In all these instances, the challenging catch-22 of scoring the player’s actions discussed above is laid bare, but better consideration of the player’s alterbiography might have led to a different approach. Variables are not feelings, and music needs to interpolate between conditions. Just because the player is now ‘safe’ it does not mean that their emotions are reset to 0.0 like a variable: they need some time to recover or wind down. The literal translation of variables into musical responses means that very often there is no coda, no time for reflection.

There are of course many instances where the consideration of the experiential nature of play can lead to a more sophisticated consideration of how to translate variables into musical meaning. In his presentation on the music of Final Fantasy XV (2016), Sho Iwamoto discussed the music that accompanies the player while they are riding on a Chocobo, the large flightless bird used to more quickly traverse the world. Noting that the player is able to transition quickly between running and walking, he decided on a parallel approach to scoring whereby synchronized musical layers are brought in and out.Footnote 27 This approach is more suitable when quick bidirectional changes in game states are possible, as opposed to the more wholescale musical changes brought about by transitioning between musical segments.Footnote 28 A logical approach would be to simply align the ‘walk’ state with specific layers, and a ‘run’ state with others, and to crossfade between these two synchronized stems. The intention of the player to run is clear (they press the run button), but Iwamoto noted that players would often be forced unintentionally into a ‘walk’ state through collisions with trees or other objects. As a consequence, he implemented a system that responds quickly to the intentional choice, transitioning from the walk to run music over 1.5 beats, but chose to set the musical transition time from the ‘run’ to ‘walk’ state to 4 bars, thereby smoothing out the ‘noise’ of brief unintentional interruptions and avoiding an overly reactive response.Footnote 29
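
A minimal sketch of that asymmetry, driven by a bar counter: the 1.5-beat and 4-bar timings follow Iwamoto’s account, but the code is an invented illustration rather than Square Enix’s implementation:

```python
WALK_OUT_BARS = 4   # slow release smooths out collisions and other 'noise'

class ChocoboMusic:
    def __init__(self):
        self.state = "walk"
        self.walking_since = None   # bar at which the walk state began

    def update(self, is_running, current_bar):
        if is_running:
            # deliberate choice: respond at once (engine would fade the
            # run layers in over 1.5 beats)
            self.walking_since = None
            self.state = "run"
        else:
            # possibly unintentional: wait before the music follows
            if self.walking_since is None:
                self.walking_since = current_bar
            if current_bar - self.walking_since >= WALK_OUT_BARS:
                self.state = "walk"

m = ChocoboMusic()
for bar, running in enumerate([True, True, False, True, True,
                               False, False, False, False, False]):
    m.update(running, bar)
print(m.state)  # the one-bar stumble at bar 2 never reached the music;
                # only the sustained walk from bar 5 onwards did
```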

These conflicts between the desire for nuanced aesthetic results and the need for a systematic approach can be addressed, at least in part, through a greater engagement by composers with the systems that govern their music and by a greater understanding of music by game developers. Although the situation continues to improve, it is still the case in many instances that composers are brought on board towards the end of development, and are sometimes not involved at all in the integration process. What may ultimately present a greater challenge is that players play games in different ways.

A fixed approach to the micro level of action within games does not always account for how they might be experienced on a more macro level. For many people their gaming is opportunity driven, snatching a valued 30–40 minutes here and there, while others may carve out an entire weekend to play the latest release non-stop. The sporadic nature of many people’s engagement with games not only militates against the kind of large-scale musico-dramatic arc found in films, but also likely makes music a problem for the more dedicated player. If music needs to be ‘epic’ for the sporadic player, then being ‘epic’ all the time for 10+ hours is going to get a little exhausting. This is starting to be recognized, with players given the option in the recent release of Assassin’s Creed: Odyssey (2018) to choose the frequency at which the exploration music occurs.

Another way in which a fixed-system approach to active music falters is that, as a game character, I am not the same at the end of the game as I was at the start, and I may have significant choice over how my character develops. Many games retain the archetypal narrative of the Hero’s Journey,Footnote 30 and characters go through personal development and change – yet interactive music very rarely reflects this: the active music heard after 20 hours is the same as that heard in the first 10 minutes. In a game such as Silent Hill: Shattered Memories (2009) the clothing and look of the character, the dialogue, the set dressing, voicemail messages and cutscenes all change depending on your actions in the game and the resulting personality profile, but the music in active scenes remains similar throughout. Many games, particularly RPGs (Role-Playing Games), enable significant choices in terms of character development, but it typically remains the case that interactive music responds more to geography than it does to character development. Epic Mickey (2010) is a notable exception; when the player acts ‘good’ by following missions and helping people, the music becomes more magical and heroic, but when acting more mischievous and destructive, ‘You’ll hear a lot of bass clarinets, bassoons, essentially like the wrong notes … ’.Footnote 31 The 2016 game Stories: The Path of Destinies takes this idea further, offering moments of choice where six potential personalities or paths are reflected musically through changes in instrumentation and themes. Reflecting the development of the player character through music, given the potential number of variables involved, would, of course, present a huge challenge, but it is worthy of note that so few games have even made a modest attempt to do this.

Recognition that players have different preferences, and therefore have different experiences of the same game, began with Bartle’s identification of common characteristics of groups of players within text-based MUD (Multi-User Dungeon) games.Footnote 32 More recently the capture and analysis of gameplay metrics has allowed game-user researchers to refine this understanding through the concept of player segmentation.Footnote 33 To some extent, gamers are self-selecting in terms of matching their preferred gaming style with the genre of games they play, but games aim to reach as wide an audience as possible and so attempt to appeal to different types of player. Making explicit reference to Bartle’s player types, game designer Chris McEntee discusses how Rayman Origins (2011) has a co-operative play mode for ‘Socializers’, while ‘Explorers’ are rewarded through costumes that can be unlocked, and ‘Killers’ are appealed to through the ability to strike your fellow player’s character and push them into danger.Footnote 34 One of the conflicts between musical interactivity and the gaming experience is that the approach to music, and the systems and thresholds chosen, are developed for the experience of an average player, but we know that approaches may differ markedly.Footnote 35 A player who approaches Dishonored 2 (2016) with an aggressive playstyle will hear an awful lot of the high-intensity ‘fight’ music; however, a player who achieves a very stealthy or ‘ghost’ playthrough will never hear it.Footnote 36 A representation of the potential experiences of different player types with typical ‘Ambient’, ‘Tension’ and ‘Action’ music tracks is shown below (Figures 5.2–5.4).

Figure 5.2 Musical experience of an average approach

Figure 5.3 Musical experience of an aggressive approach

Figure 5.4 Musical experience of a stealthy approach

A single fixed-system approach to music that fails to adapt to playstyles can thus produce vastly different, and potentially unfulfilling, musical experiences. A more sophisticated method, where the thresholds are scaled or recalibrated around the range of the player’s ‘mean’ approach, could lead to a more personalized and more varied musical experience. This could be as simple as raising the threshold at which a reward stinger is played for the skilled player, or as complex as introducing new micro-tension elements for a stealth player’s close call.
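To make this concrete, the following minimal sketch shows one way such a recalibration might work. It is an illustration only, written in Python: the class name, the 0–1 ‘aggression’ measure and the specific threshold values are all hypothetical rather than drawn from any shipped game.

```python
class AdaptiveMusicSelector:
    """Chooses between cues using thresholds recentred on the player."""

    def __init__(self, fight_threshold=0.7, smoothing=0.05):
        self.base_fight_threshold = fight_threshold
        self.mean_aggression = 0.5   # neutral starting estimate
        self.smoothing = smoothing   # how quickly we track the player

    def observe(self, aggression):
        # Update a running estimate of playstyle (0 = stealthy,
        # 1 = aggressive) with an exponential moving average.
        self.mean_aggression += self.smoothing * (aggression - self.mean_aggression)

    def current_cue(self, moment_intensity):
        # An aggressive player needs a higher bar before 'Action' music
        # triggers, so they are not saturated with it; a stealthy player
        # gets a lower bar, so they hear it on their rare close calls.
        threshold = self.base_fight_threshold + 0.3 * (self.mean_aggression - 0.5)
        if moment_intensity > threshold:
            return 'Action'
        if moment_intensity > threshold - 0.3:
            return 'Tension'
        return 'Ambient'
```

Under such a scheme, the aggressive and stealthy players of Figures 5.3 and 5.4 would gradually converge on a comparably varied distribution of ‘Ambient’, ‘Tension’ and ‘Action’ music.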

Having to codify aesthetic decisions represents, for many composers, a challenge to their usual practice outside of games, and indeed perhaps a conflict with how we might feel these decisions should be made. Greater understanding by composers of the underlying systems that govern their music’s use is undoubtedly part of the answer, as is a greater effort to track and understand an individual’s behaviour within games, so that we can provide a good experience for the ‘sporadic explorer’ as well as the ‘dedicated killer’. But there is a final conflict between interactivity and music that is seemingly irreconcilable: that of player agency and the language of music itself.

Agency and Structure

As discussed above, one of the defining features of active gameplay episodes within video games, and indeed a defining feature of interactivity itself, is that the player is granted agency: the ability to instigate actions and events of their own choosing and, crucially for music, at a time of their own choosing. Parallel, vertical or layer-based approaches to musical form within games can respond to events rapidly and continuously without impacting negatively on musical structures, since the temporal progression of the music is not interrupted. However, other gaming events often necessitate a transitional approach to music, where the change from one musical cue to an alternative cue mirrors a more significant change in the dramatic action.Footnote 37 In regard to film music, K. J. Donnelly highlights the importance of synchronization: films are structured through what he terms ‘audiovisual cadences’ – nodal points of narrative or emotional impact.Footnote 38 In games these nodal points, in particular at the end of action-based episodes, also have great significance, but the agency of the player to instigate these events at any time represents an inherent conflict with musical structures.

There is good evidence that an awareness of musical, especially rhythmic, structures is innate. Young babies exhibit negative brainwave patterns when there is a change to an otherwise consistent sequence of rhythmic cycles, and even when listening to a monotone metronomic pulse we will perceive some of these sounds as accented.Footnote 39 Even without melody, harmonic sequences or phrasing, most music sets up temporal expectations, and the confirmation or violation of expectation in music is what is most closely associated with strong or ‘peak’ emotions.Footnote 40 To borrow Chion’s terminology, we might say that peak emotions are a product of music’s vectorization,Footnote 41 the way it orients towards the future, and that musical expectation has a magnitude and direction. The challenges of interaction, the conflict between the temporal determinacy of vectorization and the temporal indeterminacy of the player’s actions, have stylistic consequences for music composed for active gaming episodes.

In order to avoid jarring transitions and to enable smoothness,Footnote 42 interactive music is inclined towards harmonic stasis and metrical ambiguity, and away from melody and vectorization. That such music should have little structure in and of itself is perhaps unsurprising, since the musical structure is a product of interaction. If musical gestures are too strong, then not only will transitions be difficult, but the gestures may be interpreted as having ludic meaning where none was intended. In order to enable smooth transitions, and to avoid the combinatorial explosion that results from potentially transitioning between different pieces with different harmonic sequences, all the interactive music in Red Dead Redemption (2010) was written in A minor at 160 bpm, and most music for The Witcher 3: Wild Hunt in D minor. Numerous other games echo this tendency towards ostinato patterns around a static tonal centre. The music of Doom (2016) undermines rhythmic expectancy through the use of unusual and constantly fluctuating time signatures, and the 6/8 polyrhythms in the combat music of Batman: Arkham Knight (2015) also serve to unlock our perception from the usual 4/4 metrical divisions that might otherwise dominate our experience of the end-state transition. Another notable trend is the increasingly blurred border between music and sound effects in games. In Limbo (2010) and Little Nightmares (2017), there is often little delineation between game-world sounds and music, and the audio team of Shadow of the Tomb Raider (2018) talk about a deliberate attempt to make the player ‘unsure that what they are hearing is score’.Footnote 43 All of these approaches can be seen as stylistic responses to the challenge of interactivity.Footnote 44

The methods outlined above can be effective in mitigating the interruption of expectation-based structures during musical transitions, but the de facto approach to solving this has been for the music not to respond immediately, but instead to wait and transition at the next appropriate musical juncture. Current game audio middleware enables the system to be aware of musical divisions, and we can instruct transitions to happen at the next beat, the next bar or at an arbitrary but musically appropriate point through the use of custom cues.Footnote 45 Although more musically pleasing,Footnote 46 such metrical transitions are problematic: the audiovisual cadence is lost, since the music always responds after the event, and they provide an opportunity for incongruence, for if the music has to wait too long after the event to transition then the player may by then be engaged in some other kind of trivial activity at odds with the dramatic intent of the music. Figure 5.5 illustrates the issue. The gameplay action begins at the moment of pursuit, but the music waits until the next juncture (bar) to transition to the ‘Action’ cue. The player is highly skilled and so quickly triumphs. Again, the music holds on the ‘Action’ cue until the next bar line in order to transition musically back to the ‘Ambient’ cue.

Figure 5.5 Potential periods of incongruence due to metrical transitions are indicated by the hatched lines
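The arithmetic behind such quantized transitions is straightforward, as the following sketch suggests. This is a minimal Python illustration of the general principle; commercial middleware exposes equivalent quantization settings, but the function and parameter names here are hypothetical.

```python
import math

def next_transition_time(position_s, bpm, beats_per_bar=4, grid='bar'):
    """Return the earliest musically 'legal' time, in seconds, at or
    after the current playback position on the chosen grid."""
    beat_len = 60.0 / bpm                        # seconds per beat
    unit = beat_len if grid == 'beat' else beat_len * beats_per_bar
    return math.ceil(position_s / unit) * unit   # snap forward to the grid

# At 160 bpm in 4/4, a bar lasts 1.5 seconds: a pursuit beginning 10.0
# seconds into the cue transitions at 10.125 s on the beat grid, but not
# until 10.5 s on the bar grid, half a second of the kind of potential
# incongruence illustrated in Figure 5.5.
```

The wider the grid, the more musical the join but the longer the potential lag between event and response; the choice of quantization is thus itself an aesthetic decision.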

The resulting periods of incongruence are unfortunate, but the lack of audiovisual cadence is particularly problematic when one considers that these transitions are typically happening at the end of action-based episodes, where music is both narratively characterizing the player and fulfilling the ludic role of feeding back on their competence.Footnote 47 Without synchronization, the player’s sense of accomplishment and catharsis can be undermined. The concept of repetition, of repeating the same episode or repeatedly encountering similar scenarios, features in most games as a core mechanic, so these ‘end-events’ will be experienced multiple times.Footnote 48 Given that game developers want players to enjoy the gaming experience, it would seem that there is a clear rationale for attempting to optimize these moments. This rationale is further supported by the suggestion that the end of an experience, particularly in a goal-oriented context, may play a disproportionate role in people’s overall evaluation of the experience and their subsequent future behaviour.Footnote 49

The delay between the event and the musical response can be alleviated in some instances by moving to a ‘pre-end’ musical segment, one that has an increased density of possible exit points, but the ideal would be to have vectorized music that leads up to and enhances these moments of triumph, something which is conceptually impossible unless we suspend the agency of the player. Some games do this already. When an end-event is approached in Spider-Man, the player’s input and agency are often suspended, and the climactic conclusion is played out in a cutscene, or within the constrained outcome of a quick-time event that enables music to be more closely synchronized. However, it would also be possible to maintain a greater impression of agency and to achieve musical synchronization if musical structures were able to feed into the game’s decision-making processes – that is, to allow musical processes to dictate the timing of game events. Thus far we have been using the term ‘interactive’ to describe music during active episodes, but most music in games is not truly interactive, in that it lacks a reciprocal relationship with the game’s systems. In the vast majority of games, the music system is simply a receiver of instruction.Footnote 50 If the music were truly interactive then this would raise the possibility of thresholds and triggers being altered, or game events waiting, in order to enable the synchronization of game events to music.

Manipulating game events in order to synchronize to music might seem anathema to many game developers, but some are starting to experiment with this concept. In the Blood and Wine expansion pack for The Witcher 3: Wild Hunt, senior audio programmer Colin Walder describes how the main character Geralt was ‘so accomplished at combat that he is balletic, that he is dancing almost’.Footnote 51 With this in mind, the team programmed the NPCs to attack according to musical timings, with big attacks syncing to a grid, and smaller attacks syncing to beats or bars. He notes, ‘I think the feeling that you get is almost like we’ve responded somehow with the music to what was happening in the game, when actually it’s the other way round’.Footnote 52 He points out that ‘Whenever you have a random element then you have an opportunity to try and sync it, because if there is going to be a sync point happen [sic] within the random amount of time that you were already prepared to wait, you have a chance to make a sync happen’.Footnote 53 The composer Olivier Derivière is also notable for his innovations in the area of music and synchronization: in Get Even (2017), events, animations and even environmental sounds are synced to the musical pulse. The danger in using music as an input to game-state changes and timings is that players may sense this loss of agency, but it should be recognized that many games artificially manipulate the player all the time through what is termed dynamic difficulty adjustment or dynamic game balancing.Footnote 54 The 2001 game Max Payne dynamically adjusts the amount of aiming assistance given to the player based on their performance;Footnote 55 in Half-Life 2 (2004) the content of crates adjusts to supply more health when the player’s health is low;Footnote 56 and in BioShock (2007) players are rendered invulnerable for 1–2 seconds when at their last health point in order to generate more ‘barely survived’ moments.Footnote 57 In this context, the manipulation of game variables and timings in order for events to hit musically predetermined points or quantization divisions seems less radical than the idea might at first appear.
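Walder’s observation about random elements lends itself to a simple algorithmic expression. The sketch below is a speculative Python reconstruction of the principle as he describes it, not CD Projekt Red’s actual code: when an event already carries a random delay, the scheduler looks for a beat or bar line inside the window it was prepared to wait anyway.

```python
import math
import random

def schedule_attack(now_s, min_wait_s, max_wait_s, beat_len_s,
                    big_attack=False, beats_per_bar=4):
    """Pick an attack time within [min_wait_s, max_wait_s] seconds from
    now, snapped to a beat (small attacks) or a bar line (big attacks)
    whenever one falls inside that window."""
    unit = beat_len_s * beats_per_bar if big_attack else beat_len_s
    earliest = now_s + min_wait_s
    sync = math.ceil(earliest / unit) * unit   # first grid point we could hit
    if sync <= now_s + max_wait_s:
        return sync                            # a sync point fits: use it
    return now_s + random.uniform(min_wait_s, max_wait_s)  # fall back to chance
```

The player perceives only a randomly timed attack; the cumulative effect, as Walder suggests, is that the action appears to have been scored when in fact the score has quietly shaped the action.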

The reconciliation of player agency and musical structure remains a significant challenge for video game music within active gaming episodes, and it has been argued here that the use of metrical or synchronized transitions is only a partial solution to the problem of temporal indeterminacy. The importance of the ‘end-event’ in the player’s narrative, and the opportunity for musical synchronization to provide a greater sense of ludic reward, imply that the greater co-influence of a truly interactive approach is worthy of further investigation.

Conclusion

Video game music is amazing: thrilling and informative, it is for many players a key component of what makes games so compelling and enjoyable. This chapter is not intended to suggest any criticism of specific games, composers or approaches. Composers, audio personnel and game designers wrestle with the issues outlined above on a daily basis, producing great music and great games despite the conflicts and tensions within musical interactivity.

Perhaps one common thread in response to the questions posed and conflicts identified is the appreciation that different people play different games, in different ways, for different reasons. Some may play along with the action implied by the music, happy for their actions to be directed, while others may rail against this – taking an unexpected approach that needs to be more closely scored. Some players will ‘run and gun’ their way through a game, while for others the experience of the same gaming episode will be totally different. Some players may be happy to accept a temporary loss of autonomy in exchange for the ludonarrative reward of game/music synchronization, while for others this would be an intolerable breach of the gaming contract.

As composers become increasingly educated about the tools of game development and approaches to interactive music, they will no doubt continue to become a more integrated part of the game design process, engaged not only with writing the music, but with designing the algorithms that govern how that music works in the game. In doing so, there is an opportunity for composers to think not about designing an interactive music system for a game, but about designing interactive music systems: ones that are aware of player types and that attempt to resolve the conflicts inherent in musical interactivity in a more bespoke way. Being aware of how a player approaches a game could, and should, inform how the music works for that player.

6 The Triple Lock of Synchronization

K. J. Donnelly

Contemporary audiovisual objects unify sound and moving image in our heads via the screen and speakers/headphones. The synchronization of these two channels remains one of the defining aspects of contemporary culture. Video games follow their own particular form of synchronization, where not only sound and image, but also player input form a close unity.Footnote 1 This synchronization unifies the illusion of movement in time and space, and cements it to the crucial interactive dimension of gaming. In most cases, the game software’s ‘music engine’ assembles the whole, fastening sound to the rest of the game, allowing skilled players to synchronize themselves and become ‘in tune’ with the game’s merged audio and video. This constitutes the critical ‘triple lock’ of player input with audio and video that defines much gameplay in digital games.

This chapter will discuss the way that video games are premised upon a crucial link-up between image, sound and player, engaging with a succession of different games as examples to illustrate differences in the relations of sound, image and player psychology. There has been surprisingly little interest in synchronization, not only in video games but also in other audiovisual culture.Footnote 2 In many video games, it is imperative that precise synchronization is achieved, or else the unity of the gameworld and the player’s interaction with it will be degraded, and the illusion of immersion and the effectiveness of the game dissipated. Synchronization can be precise and momentary, geared around a so-called ‘synch point’; it might be less precise and more continuous, evincing matched dynamics between music and image actions; or the connections can be altogether less clear. Four types of synchronization exist in video games. The first division, precise synchronization, appears most evidently in interactive sounds, where the game player delivers some sort of input that immediately has an effect on audiovisual output in the game. Clearest where diegetic sounds emanate directly from player activity, it also occurs in musical accompaniment that develops constantly in parallel to the image activity and mood. The second division, plesiochrony, involves the use of ambient sound or music which fits vaguely with the action, making a ‘whole’ of sound and image, and thus a unified and immersive environment as an important part of gameplay. The third strain is music-led asynchrony, where the music dominates and sets time for the player. Finally, in parallel-path asynchrony, music accompanies action but evinces no direct weaving of its material with the on-screen activity or other sounds.

Synching It All Up

It is important to note that synchronization is both the technological fact of the gaming hardware pulling together sound, image and gamer, and simultaneously a critically important psychological process for the gamer. This is central to immersion, merging sensory stimuli and completing a sense of surrounding ambience that takes in coherently matched sound and image. This is most clearly evident in the synchronization of sound effects with action, matching the world depicted on screen as well as the game player’s activities. For instance, if we see a soldier fire a gun on screen we expect to hear the crack of the gunshot, and if the player (or the player’s avatar) fires a gun in the game, we expect to hear a gunshot at the precise moment the action takes place. Sound effects may appear more directly synched than music in the majority of cases, yet accompanying music can also be an integrated part of such events, also matching and directing action, both emotionally and aesthetically. Synchronization holds together a unity of audio and visual, and their combination is added to player input. This is absolutely crucial to the process of immersion, holding together the illusion of sound and vision unity, as well as the player’s connection with that amalgamation.

Sound provides a more concrete dimension of space for video games than image, serving a crucial function in expanding the surface of its flat images. The keystones of this illusion are synch points, which provide a structural relationship between sound, image and player input. Synch points unify the game experience as a perceptual unity and aesthetic encounter. Writing primarily about film but with relevance to all audiovisual culture, Michel Chion coined the term ‘synchresis’ to describe the spontaneous appearance of synchronized connection between sound and image.Footnote 3 This is a perceptual lock that magnetically draws together sound and image, as we expect the two to be attached. The illusory and immersive effect in gameplay is particularly strong when sound and image are perceived as a unity. While we, the audience, assume a strong bond between sounds and images occupying the same or similar space, the keystones of this process are moments of precise synchronization between sound and image events.Footnote 4 This illusion of sonic and visual unity is the heart of audiovisual culture. Being perceived as an utter unity disavows the basis in artifice and cements a sense of audiovisual culture as on some level being a ‘reality’.Footnote 5

The Gestalt psychology principle of isomorphism suggests that we understand objects, including cultural objects, as having a particular character as a consequence of their structural features.Footnote 6 Certain structural features elicit an experience of expressive qualities, and these features recur across objects in different combinations. This notion of ‘shared essential structure’ accounts for the common pairing of certain things: small, fast-moving objects with high-pitched sounds; slow-moving music with static or slow-moving camerawork in which nothing moves quickly within the frame; and so on. Isomorphism within Gestalt psychology emphasizes a sense of cohesion and unity of elements into a distinct whole, which in video games is premised upon a sense of synchronization, or at least of ‘fitting’ together unremarkably; matching, if not obviously, then perhaps on some deeper level of unity. According to Rudolf Arnheim, such a ‘structural kinship’ works essentially on a psychological level,Footnote 7 as an indispensable part of perceiving expressive similarity across forms, as we encounter similar features in different contexts and formulations. While this is, of course, bolstered by convention, it appears to have a basis in primary human perception.Footnote 8

One might argue that many video games are based on a form of spatial exploration, concatenating the illusory visual screen space with that of stereo sound, and engaging a constant dynamic of both audio and visual movement and stasis. This works through isomorphism and dynamic relationships between sound and image that can remain in a broad synchronization, although not matching each other pleonastically blow for blow.Footnote 9 A good example here would be the first-person shooter Quake (1996), where the player sees their avatar’s gun in the centre of the screen and has to move and shoot grotesque cyborg and organic enemies. Trent Reznor and Nine Inch Nails’ incidental soundtrack consists of austere electronic music, dominated by ambient drones and treated electronic sounds. It is remarkable in itself but is also matched well to the visual aspects of the gameworld. The player moves in 3-D through a dark and grim setting that mixes an antiquated castle with futuristic high-tech architecture, corridors and underwater shafts and channels. This sound and image environment is an amalgam, often of angular, dark-coloured surfaces and low-pitched notes that sustain and are filtered to add and subtract overtones. It is not simply that sound and image fit together well, but that the tone of both is in accord on a deep level. Broad synchronization consists not simply of a mimetic copy of the world outside the game (where we hear the gunshot when we fire our avatar’s gun) but also of a general cohesion of sound and image worlds which is derived from perceptual and cognitive horizons as well as cultural traditions. In other words, the cohesion of the sound with the image in the vast majority of cases is due to structural and tonal similarities between what are perhaps too often approached as utterly separate channels. However, apart from these deep- (rather than surface-) level similarities, coherence of sound and image can also vary due to the degree and mode of synchronization between the two.

Finger on the Trigger

While synchronization may be an aesthetic strategy or foundation, a principle that produces a particular psychological engagement, it is also essentially a technological process. Broadly speaking, a computer CPU (central processing unit) has its own internal clocks that synchronize and control all its operations. There is also a system clock which controls timing for the whole system, outside of the CPU. These clocks also need to be in synchronization.Footnote 10 While matters initially rest on CPU and console/computer architecture – the hardware – they also depend crucially on software.Footnote 11

The principle of the synch point where image events and player inputs trigger developments in the music is a characteristic of video game music. Jesper Kaae discusses video games as ‘hypertext’, consisting of nodes and links, which are traversed in a non-linear fashion.Footnote 12 ‘Nodes’ might be understood as the synch points that underpin the structure of interactive video games, and have particular relevance for the triple lock of sound and image to player input. Specific to video games is how the disparate musical elements are triggered by gameplay and combined into a continuum of coherent development for the player’s experience over time. Indeed, triggering is the key for development in such a non-linear environment, set in motion by the coalescence of the player’s input with sound and image elements. A player moving the avatar into a new room, for example, can trigger a new piece of music or the addition of some musical aspects to existing looped music.Footnote 13 Michael Sweet notes that triggered musical changes can alter emotional state or general atmosphere, change the intensity of a battle, indicate a fall in the player’s health rating, indicate an enemy’s proximity and indicate successful completion of a task or battle.Footnote 14

Triggered audio can be a simple process: some games simply activate a loop of repeated music that continues ad infinitum as accompaniment to the player’s screen actions, as in Tetris (Nintendo, 1989). However, triggered audio often sets in train more complex programs where music is varied.Footnote 15 Karen Collins effectively differentiates between types of triggered audio in video games: ‘interactive audio’ consists of ‘Sound events that react to the player’s direct input’, like footsteps and gunshots, whereas ‘adaptive audio’ is ‘Sound that reacts to the game states’, such as location, mood or health.Footnote 16 The former relates to precise gameplay, while the latter is not so directly affected by player activity. In terms of music in video games, while much works on a level of providing accompanying atmosphere in different locations, for example, and so is adaptive, some programming allows for developing music to be triggered by a succession of player inputs (such as the proximity of enemies). Adaptive music is particularly effective in sophisticated action role-playing games with possible multiple paths; good examples include the later iterations of the Elder Scrolls series, such as Skyrim (2011). A particularly complex form of synchronization comes from branching (horizontal resequencing) and layering (vertical remixing) music to fit momentary developments in the gameplay instigated by the player.Footnote 17 The process of pulling together disparate musical ‘cues’ involves direct joins, crossfades and masks, with a dedicated program controlling a database of music in the form of fragmentary loops, short transitions and longer musical pieces of varying lengths (see, for example, Halo (2001) and its sequels). These sophisticated procedures yield constant variation, where the music is integrated with the experience of the player. This means momentary change, synchronized precisely to events in the game; such close matching of musical development to action on screen owes something to the film tradition of ‘Mickey-Mousing’ but is far more sophisticated.Footnote 18 The relationship of image, player input and soundtrack is retained at a constantly close level, controlled by the programming, with momentary changes often not consciously perceived by the player.
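A minimal sketch of these two techniques, written in Python for legibility rather than in any actual middleware scripting language, might look as follows; the class, layer and cue names are hypothetical illustrations. Stem volumes are remixed vertically against a continuous intensity value, while requested cue changes are deferred to loop boundaries (horizontal resequencing).

```python
class DynamicScore:
    def __init__(self, layers, loop_len_s):
        self.layers = layers          # e.g. ['pads', 'ostinato', 'brass']
        self.loop_len_s = loop_len_s  # length of the current musical loop
        self.current_cue = 'explore'
        self.pending_cue = None

    def vertical_remix(self, intensity):
        """Map a 0-1 intensity value to per-stem volumes: each successive
        stem fades in over a higher band of intensity."""
        volumes = {}
        n = len(self.layers)
        for i, layer in enumerate(self.layers):
            band_start = i / n
            volumes[layer] = max(0.0, min(1.0, (intensity - band_start) * n))
        return volumes

    def request_cue(self, cue):
        # Horizontal resequencing: queue the new cue instead of cutting.
        self.pending_cue = cue

    def on_loop_boundary(self):
        # Called by the playback engine at the end of each loop; the
        # deferred switch keeps the join musically coherent.
        if self.pending_cue is not None:
            self.current_cue, self.pending_cue = self.pending_cue, None
```

In a real system, the loop boundary would be one of several possible transition points, and the switch would typically pass through a short transition cue or crossfade rather than a hard cut.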

Tradition has led to strong conventions in video game audio and in the relationship between sound and image. Some of these conventions are derived from other, earlier forms of audiovisual culture,Footnote 19 while others are more specific to game design. Synchronization is fundamental for video games, but the relationship between sound and image can take appreciably different forms. It might be divided into four types: precise player-led synchronization, plesiochrony, music-led asynchrony (where gameplay is made to fit the music) and parallel-path asynchrony.

Player-Led Synchrony

Player-led synchronization has a succession of synch points where player input aligns precisely with sound and image. Player input can change screen activity, producing musical developments in line with the changes in game activity. This is what Collins calls ‘interactive audio’, and the principle is most evident in interactive sound effects, where the game player provides some sort of input that immediately has an effect in the audiovisual output of the game. For instance, the pulling of a gun’s trigger by the player requires a corresponding immediate gunshot sound, and a corresponding resultant action on screen where a target is hit (or not) by the bullet. This is a crucial process that provides an immersive effect, making the player believe on some level in the ‘reality’ of the gameplay and gameworld on screen. It correlates with our experience of the real world,Footnote 20 or at least provides a sense of a coherent world on screen even if it does not resemble our own. This triple lock holds together sound and image as an illusory unity but also holds the player in place, its most essential function. Indeed, the continued coherent immersive illusion of the game is held together by the intermittent appearance of such moments of direct, precise synchronization. The coherence of the experience is also aided by synchronized music, which forms a precise unity with the visuals on screen, other sounds and gameplay activity. Music in such situations is dynamic, following player input to match location, mood and activity. This is reactive music that can change in a real-time mix in response to the action, depending directly on some degree of variable input from the player. It lacks the linear development of music as it is traditionally understood and, indeed, each time a particular section of a game is played, the music may never be exactly the same.

An interesting case in point is Dead Space (2008), which has a particularly convoluted and intricate approach to its music. The game is set in the twenty-sixth century, when the engineer Isaac has to fight his way through the mining spaceship Ishimura, which is filled with ‘Necromorphs’, the zombified remnants of its crew. To destroy them, Isaac has to dismember them. Played in third person, Dead Space includes zero-gravity vacuum sections and puzzle solving as well as combat. Jason Graves, the game’s composer, approached the game’s events as a drama, much as he would a film. He stated: ‘I always think of this the way I would have scored a film … but it’s getting cut up into a giant puzzle and then reassembled in different ways depending on the game play.’Footnote 21 So, the aim is to follow the model of incidental music from film,Footnote 22 but in order to achieve this, the music needs to follow a complex procedure of real-time mixing. While some games only offer a repetitive music on/off experience, Dead Space offers a more sophisticated, atmospheric and immersive musical soundtrack. Rather than simply branching, the game has four separate but related music streams playing all the time: ‘creepy’, ‘tense’, ‘very tense’ and ‘chaotic’. The relationship between these parallel tracks is controlled by what the game designers in this case call ‘fear emitters’, potential dangers anchored in particular locations in the gameworld. The avatar’s proximity to these beacons shapes and mixes those four streams, while dynamically altering relative volume and applying filters and other digital signal processing. This means that a constant variation of soundtrack is continually evident throughout Dead Space.Footnote 23 Rather than being organized like a traditional film score, primarily around musical themes, the music in Dead Space is built around the synchronization of musical development precisely to avatar activity, and thus to player input in relation to game geography and gameplay.
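Published descriptions of the system suggest logic along the following lines. The sketch below is a speculative Python reconstruction for illustration only; the stream names come from the developers’ own account, but the functions, emitter format and weighting are assumptions.

```python
import math

STREAMS = ['creepy', 'tense', 'very_tense', 'chaotic']

def fear_level(avatar_pos, emitters):
    """Sum the influence of nearby 'fear emitters'; each emitter is a
    (position, radius, weight) tuple, and closer means more fear."""
    level = 0.0
    for pos, radius, weight in emitters:
        d = math.dist(avatar_pos, pos)
        if d < radius:
            level += weight * (1.0 - d / radius)
    return min(level, 1.0)

def stream_mix(fear):
    """Map a 0-1 fear value to per-stream gains, crossfading between
    adjacent streams so the score 'leans' rather than switches."""
    gains = {}
    span = len(STREAMS) - 1
    for i, name in enumerate(STREAMS):
        centre = i / span                  # 0, 1/3, 2/3, 1
        gains[name] = max(0.0, 1.0 - abs(fear - centre) * span)
    return gains
```

The essential point is that a single scalar derived from game geography continuously drives the balance of all four ever-present streams, which is what distinguishes this design from simple branching.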

Plesiochrony

Plesiochrony aims not to match every dynamic change of gameplay, but instead to provide a general ambience, making a unity of sound and image, and thus an immersive environment for gameplay. Player input is less important and might merely be at the level of triggering different music by moving to different locations. The term describes a general, imprecise synchronization.Footnote 24 Plesiochrony works in an isomorphic manner (as discussed earlier): music and image fuse into a ‘whole’, a unified environment matched in atmosphere, location and mood. This might be characterized as a ‘soft synchrony’ and corresponds to Collins’ notion of ‘adaptive audio’. The music changes with respect to gameplay in a broad sense, but does not change constantly in direct response to a succession of player inputs. The music in these cases becomes part of the environment, and becomes instituted in the player’s mind as an emotionally charged and phenomenologically immersive experience. The music is often simply triggered, and plays on regardless of momentary gameplay. However, it nevertheless accomplishes an important role as a crucial part of the environment and atmosphere, indirectly guiding and affecting player activity. Indeed, perhaps its principal function is a general furnishing of ‘environmental’ aspects to the game, emphasizing mood, tone and atmosphere.Footnote 25 For instance, in Quake, the music at times quite crudely starts and stops, often with almost no interactive aspect: it is simply triggered by the player’s avatar entering a new location. The score has something of the quality of diegetic ambience and at times could be taken to be the sound of the location. However, the sounds do not change when the avatar becomes immersed underwater, indicating that the score sits outside the game’s diegesis.

While 3-D games like Quake followed a model evident in most first-person shooters, other 3-D games have adopted different approaches. The indie game The Old City: Leviathan (2015) is not based on skilful fighting action or thoughtful puzzling. It is a first-person ‘walking game’, where a detailed visual environment is open to the player’s exploration. The game engages with a marginal strand of video game history: games that are about phenomenological experience rather than progressive achievement and gameplay in the conventional sense. The music was a featured aspect of The Old City: Leviathan’s publicity, and the ‘lack’ of action-packed gameplay allows the music to be foregrounded. As a counterpart to the visuals, the extensive and atmospheric music soundtrack is by the Swedish dark ambient industrial musician Atrium Carceri. The game’s texture is emphasized by the player’s slow movement around the city location in first person, which allows and encourages appreciation of the landscape. Indeed, the game developers were obsessed with images and sounds: on their promotional website, the discussion alights on the difficulty of visually rendering puddles. More generally, the Postmod website states: ‘Players have the option to simply walk from start to finish, but the real meat of the game lies in the hidden nooks and crannies of the world; in secret areas, behind closed doors … ’. The music is not only an integrated part of the experience, but also follows a similar process of being open to exploration and contemplation, as ambient music tends to be quite ‘static’ and lacks a sense of developmental movement. The fact that there is little real gameplay, apart from walking, gives the music a remarkable position in the proceedings, what might be called ‘front stage’. There is no need for dynamic music, and the music has the character of Atrium Carceri’s other work: atmospheric ambience.Footnote 26 The music is an equivalent to landscape in an isomorphic manner; it is like a continuum, develops slowly and has no startling changes. The aim is an enveloping ambience, with a vaguely solemn mood that matches the player’s slow movement around the large, deserted cityscape. In a way, the game ‘fulfils’ the potential of the music, in that its character is ‘programmatic’ or ambient music. While the music appears somewhat indifferent to the player, an interactive relationship is less important here, as the music is an integrated component of the game environment (where the world to a degree dominates the player) and functions as triggered isomorphic atmosphere. Less concerned with gameplay and more directly concerned with embodying environment, it lacks notable dynamic shifts and development. This decentres the player and makes them a visitor in a large, dominant and independent sound- and image-scape. However, there is a coherence to the game’s world that synchronizes sound and image on a fundamental level.

Music-Led Asynchrony

In the first configuration of asynchrony, music articulates and sets controls for action and gameplay. Collins and others are less interested in this form of game audio, perhaps because it initially appears to be less defining and less specific to video games than ‘dynamic audio’. However, I would suggest that asynchrony marks a significant tradition in game audio. With games that are timebound, music can often appear to set the time for gameplay. This is most clear, perhaps, with music-based games like Guitar Hero (2005) or Dance Dance Revolution (1998), where the fixed song length dictates the gameplay. A fine example is the set of games corralled as the Wii Fit (2007) programme, a massively popular series of exercises that mix the use of a dedicated balancing board with the console’s movement sensor to appraise performance. The music for each task is functional, providing a basic atmosphere but also a sense of aesthetic structure to the often repetitive exercises.Footnote 27 Each piece is fairly banal and does not aim to detract from concentration, but provides some neutral but bright, positive-sounding wallpaper (all major keys, basic melodies, gentle rhythmic pulses and regular four-bar structures). In almost every case, the music articulates exercise activity. It is not ‘interactive’ as such; rather, the precise duration of each exercise is matched by the music. In other words, each piece of music was written to a precise timing, and the game player has to work to that. They should not, or cannot, make any changes to the music, as might be the case in other video games. The Wii Fit has a number of exercise options available, each of which lasts a relatively short period of time. These are based on yoga, muscle exercises or synchronized movement. There are also some balancing activities, which are the closest challenges to those of traditional video games, and include ski-jumping and dance moves coordinated to precisely timed movements. The crucial point for the player is that they can tell when the music is coming to an end, and thus when the particular burst of exercise is about to finish. Repetition of the same exercises causes the player to know the music particularly well and to anticipate its conclusion. This is clearly a crucial psychological process, whereby the player is forced to synchronize themselves and their activities directly to the game, dictated by the music.

Similarly, in Plants vs. Zombies (2009), changes in the musical fabric are based on the musical structure and requirements, not the gameplay. It is a tower defence game in which the player defends a house using a lawn (in the initial version), where they plant anthropomorphic defensive vegetables to halt an onrush of cartoon zombies. Each level of the game has a different piece of accompanying music, which comprises electronic approximations of tuned percussion and other traditional instrument sounds. The relation of music to action is almost negligible, with the regularity of the music furnishing a rigid and mechanical character to gameplay. The music simply progresses, almost in parallel to the unfolding of the game, as a homology to the relentless shambling movement of the zombies. The first levels take place during the day on the front lawn of the player’s unseen house and are accompanied by a piece of music that simply loops: the recorded piece mechanically restarts at the point where it finishes, irrespective of the events in the game.Footnote 28 Because the music is not subject to interruptions from dynamic music systems, it is free to feature a strong, regular rhythmic profile. The cue is based on a tango or habanera dance rhythm. The very regularity of dances means that, as an accompaniment to audiovisual culture, they tend to marshal the proceedings, making the action feel as if it is moving to the beat of the dance rather than following any external logic. In Plants vs. Zombies, this effect is compounded by the totally regular structure of the music (based on four-bar units, with successive melodies featuring particular instruments: oboe, strings and pizzicato strings respectively).Footnote 29 There is a clear sense of integrity from the music’s regular rhythmic structure in the face of variable timings in the gameplay. The harmony never strays too far from the key of A minor (despite an F 7th chord), and the slow tango rhythm is held in the bass line, which plays chord tones with the occasional short chromatic run between pitches. However, the continuity of this music is halted by the player’s successful negotiation of the level, which triggers a burst of jazz guitar to crudely blot out the existing music. Plants vs. Zombies’ music could conceivably fit another game with a profoundly different character. Having noted this, the cue’s aptness might be connected most directly with the game on a deeper level, where the music’s regularity relates to the unceasing regularity of the gameplay. The cue is not synchronized to the action, apart from in the concluding segment of the level. In this, a drum beat enters as an accompaniment to the existing music, grabbing the proceedings by the scruff of the neck and appearing kinetically to choreograph movement as what is billed on screen as a ‘massive wave of zombies’ approaches at the level’s conclusion. The assumption is that the beat matches the excitement of the action (and the chaotic simultaneity on screen). However, again, if we turn the sound off, it does not have a significant impact on the experience of the game, and arguably none at all on the gameplay. In summary, the time of the music matches the game section’s pre-existing structure, and the player has to work to this temporal and dynamic agenda rather than control it.

Parallel-Path Asynchrony

Unsynchronized music can also have a relationship of indifference to the game, carrying on irrespective of its action. It might simply co-exist with the game, as a presence of ambiguous substance, and might easily be removed without significantly impairing the experience. This situation is relatively common for mobile and iOS games and other games where sound is unimportant. Here, the music is not integrated strongly into the player’s experience. This asynchronous relationship between music and gameplay is embodied by ‘non-functional’ music that adds little or nothing to the game and might easily be removed or replaced. Such music often has its own integrity as a recording and can finish abruptly when the player concludes a game section (either successfully or unsuccessfully), sometimes interrupted by another musical passage. This owes something to the tradition of earlier arcade games, and although it may seem crude in comparison with the processes of dynamic music, it is nevertheless an effective phenomenon and thus persists. The way this non-interactive music carries on irrespective of gameplay makes it correspond to so-called ‘anempathetic’ music, where music seems indifferent to the on-screen action. Anempathetic music has been theorized in relation to film, but clearly has relevance across audiovisual culture. Michel Chion discusses such situations, which have the potential to ‘short-circuit’ simple emotional congruence and replace it with a heightened state of emotional confusion.Footnote 30

Some games simply trigger and loop their music. Although ‘interactive dynamic music’ in its sophisticated form can be a remarkable component of video games and is evident in many so-called ‘four-star’ prestige titles, it is hardly the dominant form of video game music. Disconnected ‘non-dynamic’ music exemplifies a strong tradition in game music, one which runs back to the arcade.Footnote 31 While some might imagine this a retrogressive format, and that arcade games were simplistic and aesthetically unsophisticated, this earliest form of video game sound and music remains a highly functional option for many contemporary games. For instance, it is highly evident in iOS and other mobile games, and in games which lack dynamics and/or utilize highly restricted spatial schemes, such as simple puzzle games or games with heavily constrained gameplay. A good example is the action platformer Crash Bandicoot (1996), which has a sonic backdrop of kinetic music emphasizing synthesized tuned percussion, while earcons/auditory icons and sound effects for game events occupy the sonic foreground. The music changes with each level, but has no requirement to enter a sophisticated process of adaptation to screen activity. In purely sonic terms, the game achieves a complex interactive musical melange of the background score and the highly musical sounds triggered by the avatar’s activities in the gameplay; even so, the game can happily be played without sound. The musical background can add to the excitement of playing, as is evident in arcade games such as the jet-ski simulator Aqua Jet (1996), which required the player to ‘qualify’ by completing sections in an allotted time. The music merely formed an indistinct but energetic sonic wall behind the loud sound effects. Similarly, in the driving game Crazy Taxi (1999), ‘game time’ ticks down and is replenished if the player is successful. Here, songs (including punk rock by The Offspring) keep going, irrespective of the action. The music can chop and change when a section is finished, and when one song finishes another starts. The situation bears resemblance to the Grand Theft Auto series (1997–2013), with its radio stations of songs that are not synchronized with the action.

Candy Crush Saga (2012) is an extremely successful mobile game of the ‘match three’ puzzle variety. Music is less than essential for successful gameplay, and many people play the game ‘silent’, although the game’s music is highly effective. The merest repetition of the music can convince the player of its qualities.Footnote 32 It is simply a looped recording and in no way dynamic in relation to the game. The music has a fairly basic character, with a swinging 12/8 rhythm and clearly articulated chord changes (I-IV-IVm-I-V) at regular intervals. However, one notable aspect of the music is that at times it is extremely rubato, speeding up and slowing down, which helps give a sense that the music is not regimented, and that the player indeed can take variable time over making a move. The player’s successive moves can sometimes be very rapid and sometimes taken at their leisure. Whatever happens, the music carries on regardless. However, it is intriguing that the music contains a moment of drama that pulls slightly outside of the banal and predictable framework. The shift from a major to a minor chord on the same root note is a surprising flourish in a continuum of highly predictable music. This exists perhaps to make the music more interesting in itself, yet this (slightly) dramatic change signalled in the music is not tied to gameplay at all. It is subject to the random relationship between music and gameplay: the music can suggest some drama where there is none in the game. It might be argued that this applies a moment of psychological pressure to the player, who might be taking time over their next move; equally, it may not. However, while the music is a repeated recording with no connection to the gameplay, it oddly has a less mechanical character than some game music. The rubato performance and jazzy swing of the music can appear less strictly regimented than some dynamic music, which needs to deal in precise beats and quantized rhythms to hold together its disparate stem elements.Footnote 33

Conclusion

The process of ‘triggering’ is the key to synchronization in video games, and may or may not be noticeable to the player. In some cases, the trigger may simply inaugurate more variations in the musical accompaniment, but in others it can cue the beginning of a whole new piece of music. A certain activity embarked upon by the player triggers a change in the game audio, locking the synchronization of the player’s activity with sound and image elements in the game. Indeed, this might be taken as one of the defining aspects of video games more generally. This interactive point functions as a switch where the player triggers sound activity, such as when moving an avatar to a new location triggers changes in the music and ambient sound.Footnote 34 This physical, technological fact manifests the heart of the sound-and-image synch that holds video games together. However, we should remember that rather than simply being a technical procedure, it is crucially also a psychological one. It initiates and signals a significant change in the gameplay (more tense gameplay, the start of a new section of play, etc.).

There has been much writing by scholars and theorists of video game music about interactive audio.Footnote 35 However, ‘interactivity’ presents severe limits as a theoretical tool and analytical concept. Similarly, an analytical concept imported from film analysis for dealing with video game audio – the distinction between diegetic and non-diegetic – also proves limited in its relevance and utility as a means of analysis. Instead of these two concepts, a division of music through attending to synchronization might be more fruitful for video game analysis. This would register the essential similarities between, for example, non-diegetic incidental music and the diegetic sound environment for a particular location. Similarly, the dynamic changes that might take place in an interactive musical score could also follow a similar procedure in diegetic sound effects, when a player moves an avatar around a particular on-screen location. Analytical distinctions of music in video games should not be determined purely by mode of production. Such determination has led to a focus on interactive video game music, which may well not be phenomenologically perceived by the player, and risks attending to the underlying production, the coding level, to the detriment of addressing the phenomenological experience of the surface of gameplay.Footnote 36 The electro-mechanical perfection of time at the centre of modern culture endures at the heart of video games, whose mechanistic structures work invisibly to instil a specific psychological state in the player.

7 ‘Less Music, Now!’ New Contextual Approaches to Video Game Scoring

Rob Bridgett

In video games, music (in particular that which is considered ‘the score’) is automatically thought of as a fundamental requirement of the identity of any production. However, in this chapter, I will discuss how musical scores are often over-relied upon, formulaic and also, conceivably in some game titles, not required at all.

From the earliest availability of sound technologies, music has been present and marketed as a crucial part of a game’s ‘identity’. Most admirably, composers themselves have garnered an elevated status as auteurs within the industry, in a way that very few designers, animators, visual artists or sound artists have done. While this marks an important achievement for music and its presence in games, we are arguably in a phase of video game aesthetics where the function and use of music, particularly orchestral music, is becoming increasingly jaded, formulaic and repetitive, and where more subtle and fresh approaches appear to be garnering much higher critical praise.

While this chapter carries a provocative title and opening statement, I should state up front that I am a very big fan of video game music, composers and game scores. The premise of this chapter is not to question or discourage the involvement of composers and music in game development, but instead to challenge and further understand some of the motivations behind music use in video games. It is also to provoke the reader to see a future in which music can be integrated and considered much more thoughtfully, effectively and positively in order to serve the game’s soundtrack in both production and aesthetic terms.

The Practicalities of Game (Music) Development

As a practitioner and audio director for the last twenty years in the video games industry, at both the triple-A and indie studio level, I have gained a good insight into how music is both commissioned and discussed amongst game developers, game directors, producers, composers and marketing departments.

Firstly, we need to consider some of the wider overall contexts in which music in contemporary video games production sits. Generally speaking, the soundtrack of a video game can be defined by three main food groups of voice-over,Footnote 1 sound effects and music. I would also add an additional area of consideration to this, perhaps almost a fourth food group, of ‘mix’, which is the artistic, technical and collaborative balance of all three of those food groups in their final contexts, and in which the audio developers are able to make decisions about which of the three to prioritize at any given moment or any given state or transition in a game. The mix is something that should be (but is very rarely) considered with as much vigour as music, voice and SFX, as early and often as possible during the process of working across a game’s soundtrack.

To borrow from the production world of cinema sound, Ben Burtt and Randy Thom, both widely known and respected cinema sound designers and rerecording mixers, have often talked about what they refer to as ‘the 100% theory’.Footnote 2 This is an observation whereby every department on a film’s soundtrack considers it 100 per cent of its job to tell the story. The sound FX and Foley department consider it 100 per cent of their job to hit all the story beats and moments with FX, the writers consider it their job to hit and tell all the story moments with dialogue lines, and the composer and music department consider it 100 per cent of their job to hit all the storytelling moments with music cues. The result, most often, is arriving at the final mix stage with premixes full of choices still to be made about what plays when, and about which element, or balance of elements, is the most important in any particular moment. It is essentially a process of deferring decision-making until the very last minute on the mix stage, and is usually the result of a director working separately with each department, rather than having those departments present and coordinated as part of the overall sound team. Certainly, a similar thing can be said of video game sound development, whereby the final ‘shape’ of the game experience is often unknown by those working on the game until very close to the end of post-production. This is mainly because game development is an ‘iterative’ process, whereby the members of a game development team work out and refine the elements of the game, the story, the characters (quite often even the game engine itself) and the gameplay as they build it.

Iteration basically requires that something is first tried out on screen (a rough first pass at a gameplay feature or story element), then subjected to multi-disciplinary feedback, then refined from a list of required changes; and then that process of execution and review is repeated and repeated until the feedback becomes increasingly fine, and the feature or story element feels more and more satisfactory whenever played or viewed.

If we consider the difference between film and game preproduction for a moment, the sound teams in cinema are able to identify and articulate that the 100 per cent rule is being applied in their productions. They typically have a director and a pre-approved shooting script in the bag before production begins, so they know in advance what they are making, who the characters are, what the story is and likely every shot in the movie: yet still every department thinks it is its job to tell 100 per cent of the story. The situation is even more exaggerated in games, because of the iterative nature of production and the (often) highly segmented workflow, both of which keep a great many more factors in flux right the way through the cycle of creation. Developers know very little upfront about the story and the gameplay mechanics, and perhaps only understand the genre or the overall feeling and high-level rules of the initial creative vision as they set off into production. Coupled with the vast amount of work and re-work that occurs in game production through that iterative process, the 100 per cent theory goes into overdrive, as creators work to cover whatever may ultimately be required by the game once the whole vision has been figured out and what is actually important has become apparent.

Deferring decisions to the mix in cinema, though less than ideal, is still very much something that can be done. There is time allotted at the end of the post-production period for this to occur, and there are craftspeople in the role of rerecording mixer who are heavily specialized in mixing. So, to further develop this picture of the production of game sound content, we also need to understand that mixing technology, practices, expertise and planning in video games have only come into existence in the last fifteen years, and remain rudimentary when compared with cinema. Because cinema mixes, in contrast with those of games, can be conceived and executed against a linear image, they are able to make sense of the overabundance of content in the three main food groups with a greater degree of sophistication. An interactive game experience, however, is being mixed at run-time – a mix which needs to take into account almost all possible gameplay situations that the player could initiate, and not just a linear timeline that is the same every time it is viewed.
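
To make the idea of a run-time mix concrete, its basic logic can be modelled very simply in code. The following is a minimal, hypothetical sketch of per-food-group prioritization; the bus names, game states and duck amounts are illustrative assumptions, not any engine's or middleware's actual API.

```python
# A minimal, hypothetical sketch of run-time mix prioritization.
# Real games author this inside middleware such as Wwise or FMOD;
# the bus names, states and duck amounts here are assumptions.

# Target attenuation (in dB) for each 'food group' bus, per game state.
MIX_STATES = {
    "exploration": {"music": -6.0,  "sfx": 0.0,  "voice": 0.0},
    "combat":      {"music": 0.0,   "sfx": -2.0, "voice": 0.0},
    "dialogue":    {"music": -12.0, "sfx": -8.0, "voice": 0.0},
}

def mix_for(state: str, voice_active: bool) -> dict:
    """Return per-bus attenuation for the current moment.

    Voice is prioritized: whenever a line is playing, music and SFX
    are ducked further so the player never misses the dialogue.
    """
    levels = dict(MIX_STATES[state])
    if voice_active:
        levels["music"] -= 6.0  # extra duck under dialogue
        levels["sfx"] -= 4.0
    return levels

# e.g. a dialogue line starts in the middle of combat:
print(mix_for("combat", voice_active=True))
# -> {'music': -6.0, 'sfx': -6.0, 'voice': 0.0}
```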

One of the advantages that video game developers do have over those working in cinema is that their audio teams are very often, at least at the triple-A studio level, already embedded in the project from the earliest concept phases. This is something cinema sound designers and composers have long yearned for, as it would enable them to influence the other crafts and disciplines of the film-making process, such as script writing, set design and cinematography, in order to create better storytelling opportunities for sound and allow it to play the role of principal collaborator in the movie. In video games, the planning of financial and memory budgets, conversations about technology and so on begin very early in the concept phase. At the time of writing, at the end of the second decade of the twenty-first century, and more so than at any point in the last ten years, the creative direction of the audio in a new title is also discussed and experimented with very early in the concept phase.

This organization enables musical style and tone, and sound design elements and approaches, both artistic and technical, to be considered early on in the development. It also facilitates early collaborations across departments and helps to establish how audio content is to be used in the game. However, very little time or consideration is given to the ‘mix’ portion of the soundtrack, or, basically, to thinking about what will play when, and how the three food groups of voice, sound and music will interact with, and complement, one another.

Ideally, being on the project early and being able to map out what will play when and in what situation gives a distinct advantage to those present on a preproduction or concept-phase development team, in that this work will actually inform the team as to which food group needs to be aesthetically prioritized in each situation. This way, for example, music cues can be planned with more accuracy and less ‘overall coverage’ in mind. Rather than making all the mix decisions at the ‘back end’ of the project during a final mix period, some of these decisions can be made upfront, thus saving a lot of time and money, and also allowing a composer’s or sound designer’s tasks to be prioritized on the areas of most importance in their work.

This is admittedly a utopian vision of how projects could work and be executed, and I know as well as anyone that as one works on a project, one needs to be prepared to react and pivot very quickly to cover something that was not discussed or even thought about the week before. Requests can emerge suddenly and quickly on a game team: for example, a new design feature can be requested and is taken on board as something that would be fun for the game to include, which completely changes one’s budgeted music requirements. Conversely, a feature could be cut completely because it is not fun for the player, which can impact huge amounts of music and scheduled work that is no longer required, and with the clock ticking, one has to either reappropriate those existing cues to fit new contexts for which they were not composed, or completely scrap them and have a composer write brand new cues to fit the new contexts.

This is one of the inherent risks and challenges of video game development: the iterative process of developing the game is focused on throwing away and cutting work that has been done, as often, as early and as continuously as possible, in order to find the core essence of what the game is about. This means that a lot of work on both the music and sound side is carried out in ‘sketch’ mode, whereby everything is produced quite quickly and loosely (ready to be thrown away at a moment’s notice), in order not to solidify and polish the intentions too soon. This often means a lot of the recording and refinement of the SFX and musical score does not occur until very late in production. So you will rarely hear the final mastered texture and mix of the score working in context until the all-too-short post-production phases.

In addition to the challenges added to the creation of this musical content by these continually moving goalposts, we should consider the great technical challenges of implementing a video game score to play back seamlessly in the game engine. One of the additional challenges of writing and working in this medium is that delivery and implementation of the musical score occurs through run-time audio engine tools (perhaps through middleware such as FMOD or Wwise). These systems require very specific music stem/loop/cue preparation and delivery, and scripted triggering logic must be applied so that each cue starts, evolves and ends in the desired ways, seamlessly supporting the emotion and intensity of the game’s action.
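
As a rough illustration of what such triggering logic involves, here is a toy sketch of a cue prepared as an intro, a pool of interchangeable loop segments and an outro, with game events driving the transitions. The class names, segment files and events are assumptions made for illustration; in practice this logic is authored in the middleware's own tools, which also take care of transitioning on musical boundaries.

```python
import random

class MusicCue:
    """A cue delivered as segments: a one-shot intro, loopable body
    segments, and an outro that resolves the cue musically."""
    def __init__(self, intro, loops, outro):
        self.intro = intro
        self.loops = loops
        self.outro = outro

class MusicController:
    """Scripted triggering logic: game events decide which cue plays
    and how it starts, evolves and ends."""
    def __init__(self, cues):
        self.cues = cues
        self.current = None

    def on_event(self, event):
        if event == "combat_start" and self.current is None:
            self.current = self.cues["combat"]
            print("play:", self.current.intro)
        elif event == "combat_end" and self.current is not None:
            print("play:", self.current.outro)  # resolve, don't cut
            self.current = None

    def tick(self):
        # Called on each musical boundary (e.g. every bar): sustain
        # the cue by picking the next loop segment from the pool.
        if self.current is not None:
            print("play:", random.choice(self.current.loops))

cues = {"combat": MusicCue("combat_intro.wav",
                           ["combat_loop_a.wav", "combat_loop_b.wav"],
                           "combat_outro.wav")}
controller = MusicController(cues)
controller.on_event("combat_start")
controller.tick()
controller.tick()
controller.on_event("combat_end")
```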

Given, then, that this is often the production process of working on music and sound in video games, we can start to understand how so much video game music comes into existence, and how composers can spend two, three or four years (sometimes longer) on a single project, continually feeding it with music cues, loops, stingers and themes throughout the various milestones of a project’s lifespan.

Another challenge that may not be evident is that music is often used during this iterative production period as a quick fix to supply or imply emotion, or evoke a particular feeling of excitement within the game, as a shortcut, and more of a Band-Aid solution than a balanced approach to the project’s soundtrack.

The mechanical influences of a game’s design also have a significant impact upon music content. A rigid structure and cadence of playback for music cues in each map or level may have been created as a recipe or template into which music functionally needs to fit, meaning that some of the more mechanical ‘in-game’, non-story music content, bound by strict patterns and formulae, acts not as an emotional signifier but as a mechanical, Pavlovian signifier to the players.

Game Music 101: When Less Is Much More

When I am not working on games, and am instead playing, I often notice that music is almost continual (often described as ‘wall-to-wall’) in many game experiences. In these games, where music is initially established as always present, any subsequent moment when music is absent makes the game somehow feel like it is missing something, or feel flat. In production, this would get flagged as a ‘bug’ by the QA (quality assurance) department. So an aesthetic trap very often lies in establishing music as an ever-present continuum right at the beginning, as from then on the audio direction becomes a slave to that established recipe of internalized logic in the game. For me, the danger of having ever-present music is not simply that ubiquitous music incurs a greater monetary burden on the project, but that this approach dilutes the power, significance and emotional impact of music through its continual presence.

Overused and omnipresent scores are a huge missed opportunity for the creative and aesthetic advancement of video game sound, and risk diluting the emotional impact of a game experience on an audience. Video game sound has largely been defined and understood aesthetically in a wider cultural context, over the last four decades, through its unsubtle use of repetitive music, dialogue and SFX. Though I understand, more than most, the huge pressure on game audio developers to implement and pack in as much heightened emotion and exaggerated impact into games as possible (depending much, of course, on already-established niche conventions and expectations of genre), it gives me great optimism that a more tempered and considered approach to the emotional and evocative power of game music and its relation to the other food groups of sound is already starting to be taken.

Development studio Naughty Dog have a consistent, highly cinematic and narrative-driven approach to their games, and consequently their soundtracks feel aesthetically, and refreshingly, more like very contemporary movie soundtracks than ‘video game soundtracks’. The Last of Us (2013) and the Uncharted series (2007–2017) both have what would generally be considered quite sparse musical treatments. Much of each soundtrack is focused on voice and character performances, and on dynamic sound moments. Only when you get to very emotional ‘key’ elements in the plot of a game do you hear music cues being used to drive home the emotional impact. This mature approach on the part of the game directors and the sound teams is certainly helped by the studio’s laser focus on narrative and cinematic experiences, which enables them to plot out and know much of what the player will be doing, in what order and in what environment, far ahead of time during development.

Another fine example of a more sparse approach to a score is the Battlefield franchise by the developer DICE in Sweden. These games, being first-person competitive shooters, necessitate an approach to the soundtrack where the player needs to hear all the intel and positional information available to them at all times during the chaos of combat, with the utmost clarity. In this sense, a musical score would clearly get in the way of those needs during gameplay. In Battlefield 3 (2011), a non-diegetic musical score is used only to establish the emotional states of the pre- and post-battle phases, and of the campaign mission itself; otherwise, the musical score is absent, entirely by design, beautifully fitting the needs of the players.

In many of my projects I have been asked by game directors to always support the mood and the storytelling with music, often on a moment-to-moment basis. This is a battle that I have often had to fight, demonstrating that music, when overused like this, has the opposite effect on the player: rather than immersing them in the emotion of the game, it makes them feel fatigued and irritated, as a result of being continually bombarded with the score and with forced moods telling them how they should feel about what is on screen. It is the musical equivalent of closed captioning when you do not need closed captioning. Sometimes I have been successful and at other times unsuccessful in making these points, but this is a part of the everyday collaborative and political work that is necessary on a team. I strongly believe that new approaches to music, sound and voice will gradually propagate across the industry the more we see, analyse and celebrate successful titles doing something different.

Certainly, after working under continual ‘use music to tell 100 per cent of the story’ pressure like this, I can safely say that in order to move forwards aesthetically and adopt an integrated approach to the soundtrack, in most cases, sound needs to carry more of the storytelling, voice needs to do less, and music most certainly needs to do a lot less.

As we build games, movies, experiences and emotions together in our teams, perhaps music should be the weapon we reach for last, instead of first. This way, as developers, we could at least have a more accurate picture of where music is really needed and understand more what the actual role of the musical score is in our work.

For me, there are two more pivotal examples of a more sparse and mature music aesthetic that need to be highlighted, both of which take approaches in which the music for the game could easily be considered ambiguous and sound-effect-like. Those two games are Limbo (2010) and Inside (2016), both from the Danish studio Playdead, with sound and music by Martin Stig Andersen. The refreshing aesthetic in these games is evident from the opening moments, when the focus of the game is solely on a boy walking through the woods (in both games). Foley and footsteps are the only sounds that establish this soundtrack, and gradually, as the gameplay experience unfolds, more environmental sound and storytelling through SFX and ambience begin to occur. It is only when we get fairly deep into the game that we hear our first music cue. In Limbo, the first ‘cue’ is particularly striking, as it is a deep, disturbing low note, or tone, which sounds when the player jumps their character onto a floating corpse in order to get to the other side of a river. This low tone has such a visceral impact when synchronized with this disturbing moment (in most games this would be a trivial mechanical moment of simple navigation) that it takes the game in a completely different direction. Rather than a ‘score’, the game’s ‘music’ seems in fact to be sound emanating from inside the soul of the character, or from the dark black-and-white visuals of the world. And because the use of music is so sparse and rare – or at least, it is rare that you can identify particular sounds as specifically ‘musical’ – the impact of those cues and sounds becomes extremely intense. At the same time, the line between conventional musical materials and sound effects becomes blurred, allowing the sounds to gain a ‘musical’ quality. Many moments also stand out like this in the spiritual sequel to Limbo, Inside, the most memorable of which, for me, was the moment when the rhythmic gameplay of avoiding a sonic shockwave, and hiding behind metal doors, transitioned from being a purely sound-based moment to a purely musical one, and then back again. The way the transition was carried out elevated the experience to an entirely spiritual level, outside the reality of the physics of sound, and into the realm of the intangible and the sacred.

Conclusions: Enjoy the Silence

Recently, I was very fortunate to work on a fantastic indie game called Loot Hound (2015) with some friends, and the interesting thing about that game was that it had absolutely no musical score and no musical cues whatsoever. Even the menu and loading screen were devoid of music cues. The strange thing is that this approach was never consciously discussed between those of us working on the game. I am pretty sure this was not the first game with no music of any kind, but for someone who had come from working on triple-A, music-heavy games, it was a very positive and renewing experience to be able to take time and express the gameplay and the aesthetics through just sound, mix and voice. In the end, I do not think any of us working on that title played it and felt that it was lacking anything, or even that music would have brought anything significant to the table in terms of the experience.

What was encouraging is that I do not think anyone who played it even mentioned or noticed that there was no music in the game. The game was released through Steam, where one could easily keep track of feedback and comments, and I do not recall seeing anything about the game’s lack of music, but did see quite a bit of praise for the sound and for the overall game experience itself. The game, and the process of its creation, was certainly different and refreshing, and perhaps points to one of many potential futures for game sound aesthetics, to be celebrated and explored further in larger-scale productions to come.

A more integrated approach to all elements of the soundtrack is necessary to push games, and game scores, into new artistic and technical territories. This requires a lot of political and collaborative work on the part of game developers together in their teams, and also a desire to make something that breaks the mould of generic wall-to-wall game music. Part of establishing this new direction requires identifying, celebrating and elevating titles in which the soundtrack is more fully integrated, and in which sound, music and voice gracefully hand over meaning to one another in a well-executed mix. A more integrated approach to music is also probably only possible once the contexts into which the music will fit can be understood more fully during all phases of production. In that sense, the challenges are quite considerable, though not insurmountable. I believe that the more carefully the production and planning of music are integrated into game development schedules and production phases, the more integrated and enjoyable the score will be on an emotional level as a part of the overall experience of playing video games.

8 Composing for Independent Games: The Music of Kentucky Route Zero

Ben Babbitt

Composer Ben Babbitt is one third of the game development team Cardboard Computer, along with Jake Elliott and Tamas Kemenczy. Cardboard Computer is best known for the critically lauded adventure game Kentucky Route Zero, which was released in five instalments between 2013 and 2020. Here, Babbitt describes his background, philosophy and experiences writing music as part of a small team of game creators.

Background

I grew up around music – both of my parents are employed as musicians, playing in orchestras and teaching. My father plays violin and viola and my mother plays cello. I was surrounded by a lot of what was primarily classical music growing up and there was always somebody practising or teaching or listening to music. I know my mother was practising a lot of Bach cello music when she was pregnant with me, so I must’ve been right up against the body of the cello for part of my time in utero. I’m sure that had some kind of prenatal influence on my developing brain. As I understand it, I became fascinated with music and sound very early on and started playing violin and piano when I was four or five. That fascination really took hold, though, and manifested as a desire to write my own music when I was around twelve or thirteen.

When I was growing up, my parents didn’t allow my brothers and me to play video games, so we didn’t have any consoles. I played at friends’ houses but that was the extent of my relationship to the history of games. I never had games in mind as a context for my composition work, and only got involved when I met Jake Elliott (co-creator of Kentucky Route Zero) when we were in university together at the School of the Art Institute of Chicago (SAIC). He and Tamas Kemenczy were just starting to work on Kentucky Route Zero, and Jake asked me to write some music for it. Seven years later, we managed to complete the project.

I’d been writing my own music and playing in bands actually since I was about eleven or twelve, but I didn’t focus on composition or think of my music that way until I went to study music when I was nineteen. I focused on composition at a conservatory in Chicago before I left and finished my degree at SAIC where I met Jake and Tamas. I’d done a few collaborations with a choreographer I went to school with but Kentucky Route Zero was actually the first project I was hired to work on as a composer. It was very much a learning process for me as I got deeper into the work. Because I didn’t have much context for music in games beyond the little I’d experienced as a kid, in retrospect I think I approached it kind of blindly and intuitively. Conversations with Jake and Tamas helped a lot with finding an aesthetic direction for the music, but I was not drawing from a deeper set of experiences with other game music.

Composing for Games

I’m still becoming familiar with the process of creating music for other media, as I’ve only scored a few film projects so far. One major difference that strikes me is that when I’m working on a film, I’m able to create musical/sonic moments that correlate directly and specifically with visual moments on screen down to the frame, and that relationship remains intact – assuming everybody involved approves. There’s arguably a wider range of possibility in terms of how music can interact with the image in games. Of course it’s still possible to create music that relates to specific moments in a game as well, but it’s a different kind of material inherently because it’s unfrozen – it’s always moving even when the activity in a scene can be slowed or stopped. I don’t yet feel like I’ve explored the more real-time fluid relationship between music and image that’s possible in games, but it’s interesting to me. In some ways, it relates to an earlier tradition in music history where composers like John Cage and Morton Feldman were really interrogating the notion of authorship and distancing themselves from it via modes of chance and aleatoric processes. Games make that very easy to explore in the work itself, not only in the process of creating the music. One of my favourite composers and thinkers and game makers, David Kanaga (composer for Proteus, 2013), has explored what he calls the ‘affordances’ of games in relation to music and sound much more than I have.Footnote 1

Kanaga is one of the most interesting and inspiring composers and thinkers working primarily in games. Our work is very different aesthetically but I think we share a lot of musical and philosophical interests. Liz Ryerson (dys4ia) is another composer, working both in and outside of games, who is making great work.Footnote 2 I return time and again to the music of twentieth-century French composer Olivier Messiaen, and his work had a clear influence on my music in Kentucky Route Zero. My relationship with music by other composers and artists took on a kind of research-based orientation for each new musical modality incorporated into the game. In other words, my influences varied from one part of the project to another, especially over the course of so many years. I’ve always wanted to continue working on my own music projects and collaborate with people in a music context, but it’s been challenging to balance those interests with the workload of such a sprawling game as Kentucky Route Zero and other commercial interests I have. That continues to be a challenge, and I think anybody who works in a creative field can relate to that tension between investing in their ‘career’ or paid work, and investing in their own creative pursuits.

Working on Games

I don’t think it’s necessary to study composition for games specifically in order to work in that context. That said, I’m sure there are many aspects of a curriculum focused on composition for games that are useful and could help to cut down on the time it can take to find one’s footing in the field when taking the trial-by-fire approach of learning ‘on the job’. I think loving music, and having some sense of curiosity about it and hunger to continue exploring musical possibilities, is more useful than any kind of specific technical knowledge or traditional skill or facility on an instrument. For me, any technical knowledge or instrumental facility I’ve developed over time has grown out of that initial impulse to explore musical possibilities, which came from my love for music as a listener, I think.

I became involved with Kentucky Route Zero through knowing Jake Elliott a little bit from a class we were in together at school. So I’d say that’s a great place to start: whether someone is in school or not, there might already be people in one’s community interested in making games and working with a composer. It can be really generative to work with friends and grow together and flatten the hierarchy a bit between employer and employee. In terms of finding jobs beyond one’s own community, I know there are some really great resources like the Gamasutra website, which has a jobs section, and Twitter and Discord also seem to be really active spaces for making new connections. Having work examples, even if one has never been hired to score a project, is very helpful in the process, especially when looking to work with people or studios outside of one’s community or social circle.

For me, the most important thing to think about before starting a new project is whether the work sparks any real creative direction for me and if that’s something I’m drawn to and connect to; in other words, does it seem exciting or interesting to work on? Of course, we don’t always have the luxury of working only on the projects that are exciting to us, but I try to at least think about that before signing on to do something. I think if any money will be exchanged, having even just a simple contract or agreement in place before starting to work with someone is important. Also communicating one’s needs to collaborators from the outset can be very helpful in establishing a clearly defined relationship that helps minimize the back and forth that can become tiresome.

I would advise aspiring composers to make the best music that you can make, judge that by how much you enjoy and love the result, and then find ways to get that music to potential collaborators by any means. Don’t be too self-critical at first; it can be difficult work on all fronts.

Kentucky Route Zero

Kentucky Route Zero is a point-and-click adventure, which follows world-weary antiques delivery-driver Conway on his mission to complete what is apparently his ‘final delivery’. As Conway seeks to locate the destination, he finds himself entangled in a series of curious events and is variously caught up in the lives of other characters. Adding to the sense of mystery, this ‘magical realist adventure’ frequently adopts an experimental approach to chronology, cause-and-effect, perspective and space. Kentucky Route Zero comprises five parts, called ‘acts’, split into ‘scenes’, each announced with a title card. Besides the acts, there are additional interludes and other satellite texts which serve to enrich the story and world of the game. The game includes ambient electronic underscore and sequences when characters are seen performing – in particular, a country band (credited as the Bedquilt Ramblers) are shown in silhouette performing Appalachian music and in Act III, players are introduced to an electronic musical duo, Junebug and Johnny. When Junebug and Johnny later perform a song at a gig, the player is able to select the lyrics of the song as it is performed.

The Project and Development

Kentucky Route Zero was a project developed over quite a long period of time, about nine years. I wrote the first pieces of music for it in 2011 and we published the fifth and final episode in January of 2020. The process unfolded organically in the sense that we were not working under prescribed production or release schedules coming from an outside source, like a producer or publisher; for almost the entire duration of the development process, there were only three people involved in the project. We did partner with Annapurna Interactive in the last two years of development, but that was to port the game to consoles; they were not directly involved in the development process, although they were immensely helpful in many ways. They became invaluable cheerleaders and enablers in those final hours.

At the beginning of my time working on the game, Jake, Tamas and I had a few conversations about the direction of the music and at that time they had some fairly specific ideas of what they wanted to hear. They asked me to record a set of traditional folk songs that they had selected and planned to use in each episode or ‘act’ of the game to provide a kind of meta-commentary on the events of the story. Additionally, they wanted me to create instrumental electronic ‘ambient’ versions of each of those songs that could be blended seamlessly with the acoustic versions. They also gave me a collection of old recordings from the 1930s of the songs by the Kentucky musician Bill Monroe that I could use as the primary reference for my own versions for the game. So at that point, I was really following their prompts and in many ways executing their vision for the music. I think this is not uncommon as a process between a composer and a director or game developer.

So I did my best to complete their request and turned in that first batch of work. It was after that point that Jake and Tamas reconfigured the project and found its five-act structure, and so that first set of pieces I had composed was mostly used in the first act. Over time, as I became more involved as a collaborator and co-creator in the project, that initial process changed quite a bit. Although we continued to have conversations about the direction of the music that still included feedback and creative prompts from Jake and Tamas, they graciously allowed me to develop my own voice as composer and to bring my own ideas to whatever we were working on at the time. Of course, a lot of the music in the game still came from narrative decisions already made that involved a character performing in a particular scene, for example, which were moments that had already been envisioned by the time I started working on them. But there was a balance between predetermined musical needs for any given scene and space for me to suggest new musical possibilities, in addition to having the creative freedom to inflect those predetermined musical moments with my own sensibilities.

Often, before beginning to compose for any given scene or new section of the game, I would be able to read through Jake’s script or play through works in progress delivered by Tamas, which was very helpful in my process of making music that would hopefully feel right at home in those scenes and moments when finally implemented. Although there were phases of the project where my work would come later in the process compared to where Jake and Tamas were, which would allow me to react to what they’d done, there were other portions of the development when we would all be working in tandem on the same thing. Especially towards the end of the project, the process became less linear and more simultaneous. We each had our siloed roles but were very often in dialogue with each other about what still needed to be done and the progress of our individual processes.

Early on in the development process of Kentucky Route Zero, it was decided that we would stop announcing release dates ahead of time because I think we’d given a release date for the second act and missed that deadline, to vocal dismay from some of the players. As a result, we would work on a new act or interlude for however long it took to complete it, and almost always publish it without fanfare or warning as soon as it was finished. This was possible because we were self-publishing the work and were fortunate not to have to rely on traditional means of presenting new work with PR announcements and necessary lead times to secure coverage. It was never a priority, I don’t think, to maximize the attention we might be able to garner for the project. It was more interesting to us – if I may speak on the others’ behalf – to focus on the work itself and making it everything it could be. This certainly ended up shaping the timeline of development greatly, and as such it was a learning process with regard to pacing and planning our work in order for the project to remain feasible to complete.

Some time between publishing the fourth act and working on the interlude Un Pueblo De Nada, we started working with Annapurna Interactive on the process of porting the game to PS4, Xbox One and Nintendo Switch, as well as localizing or translating all of the text in the game. This proved to be quite an intensive and time-consuming process, and also had an effect on the development of the game itself, despite our relationship with Annapurna being focused on the console port. This all relates back to the question of the scope of a project, and how that can impact the time it takes to complete it. I think, from my experience working on Kentucky Route Zero, it can be quite difficult to understand the scope of a project before diving into the development process and spending some time with it. That may have been more true for all of us who worked on KRZ because it was our first major project like this, but I think games are just inherently difficult to plan out accurately with regard to something like duration, for example, as opposed to knowing one is aiming to make a 90–120 minute feature film. I’m sure more experienced developers have a better sense of that relationship between scope and the timeline, but it’s something I’m still striving to get a better grasp of.

It’s interesting to have had both the experience of self-publishing and working with an outside publisher like Annapurna Interactive during the development process of Kentucky Route Zero. I think there are many benefits to both approaches, and both certainly come with their challenges. It does seem that in the time since we began developing Kentucky Route Zero in 2011, a number of interesting self-publishing open platforms have cropped up, like itch.io and Twine, and I’m sure there are others I don’t know about. It is absolutely a privilege to work with a publisher like Annapurna to bring a project to a wider audience and make it available on more platforms than would be possible alone, but I think Jake, Tamas and I will always feel a kinship with the self-directed autodidactic relationship with our work that we cultivated over the years when we were making and publishing our work independently. It was very interesting to see everything that goes into planning a wider release, and actually participating more directly in the ‘marketplace’ and making an effort to engage an audience during the phase of wrapping up content on Kentucky Route Zero, and preparing to release the project on consoles simultaneous with the publication of the final act on PC.

The Music of Kentucky Route Zero

The music in Kentucky Route Zero grew directly out of those early conversations I had with Jake and Tamas back in 2011. At that time, they came to me with very specific musical needs for what the project was then and I did my best to fulfil them. As I mentioned earlier, the traditional folk songs in each act and the ambient electronic score as foundational elements of the music in the game had been conceptualized by the time I was brought on to the project. They did, however, give me a lot of creative freedom when it came to my interpretation and execution of their prompts. Initially, I was not involved in selecting and placing the compositions that I made into the game itself. By the time I was more involved in a collaborative sense, working more closely with Jake and Tamas beginning with Act II, the sound world and the musical palette and their function within the story had already been established. As the project continued to develop and we encountered further beats of the story, and new characters and locations, there were more opportunities to introduce new types of music to the game.

In some ways, music in Kentucky Route Zero is used quite narratively in the sense that it’s directly tied to and part of the storytelling; characters are often playing the music heard in the game. The traditional regional folk music performed by the Bedquilt Ramblers helps to anchor the story in a real location and hopefully transmits some sense of its cultural history. KRZ is a story about America, and as such it seemed necessary to draw on regional American vernacular music in some form given that it’s such a part of the fabric of that culture.

To that end, I think that the music in the game might help to create a sense of occupying a specific ‘world’ or place, something that the text and imagery very much do as well. The music also helps to inflect different characters with qualities that might not come through in the same way otherwise. Junebug and Johnny display something about themselves when they are performing their music that is arguably ineffable and too subjective to concretize in any other way.

Another aspect of the role I think music serves in KRZ is related to the pace of the game itself. KRZ takes place almost entirely at night, and almost all of the story is delivered through a text window that appears on screen without voice actors reciting the lines. This nocturnal quality, in combination with the fact that the game involves a lot of reading, sort of dictates a certain slowness; tight and fast-paced action would not be compatible with these formal aspects of the project. The use of minimalist ambient electronic music, for example, might also help the player to slow down in a meditative sense, and become immersed in the pace of the story and encourage a certain kind of engagement with the story, to kind of drift into it, as it were. I hope that my music in KRZ helps to create the emotional textures specific to this story and its characters.

The performances by the Bedquilt Ramblers were already built into the role music would play throughout the project by the time I became involved. However, those were not necessarily intended to be central or even overtly performative events in the story. The role of the folk band calls back to that of the chorus in a Greek tragedy, which provided commentary about the events of the story from the sidelines. More central musical performances started to crop up in Act II, with a somewhat hidden performance of a piece of organ music in the Bureau of Reclaimed Spaces, but didn’t really coalesce into being the focus of a scene until Junebug’s performance in Act III. I think that sequence in particular might’ve been compelling to people because it was the first time in the course of the game when all of the different storytelling modalities were employed simultaneously; the text and the imagery and the music all became fused together, with the lyrics becoming dialogue choices and the visual focus of the scene also being the performance of the music.

Junebug and Johnny’s performance in the third act was something that had been decided, I think, a long time before we actually got to that scene in the development process. Jake and Tamas had a number of references in mind for that character and the kind of music they might perform, the starting touchstone being ’60s and ’70s country singers like Loretta Lynn and Dolly Parton, both in terms of their musical and lyrical tropes and of their performance styles and stage personas. After some initial conversations about the direction for the music that also touched on presenting the song as an ‘interactive’ moment with selectable verses presented as karaoke-like text, I started working on it. It took a number of iterations before I arrived at a musical result I was happy with. Jake had written the lyrics so it was very much a collaborative effort. Given the fact that Junebug and Johnny were androids and not humans, Junebug’s singing voice was a less straightforward albeit important component of the performance that needed to be figured out. After ruling out the option of using a vocoder,Footnote 3 or another overtly synthetic process for this, I experimented with treating my own voice through pitch and formant shifting and discovered a sort of uncanny instrument that fell somewhere between something palpably organic and human and something more heavily manipulated and artificial.
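
As a crude, hypothetical illustration of this kind of treatment (and emphatically not a reconstruction of Babbitt's actual process), the sketch below shifts a recorded vocal with librosa's phase-vocoder pitch shift, which drags the spectral envelope, and so the perceived formants, along with the pitch, then blends a little of the untreated voice back in. The file names and shift amounts are assumptions; independent formant control would require a dedicated technique such as PSOLA or envelope-preserving resynthesis.

```python
import librosa
import soundfile as sf

# Load a dry vocal take (hypothetical file) at its native sample rate.
y, sr = librosa.load("vocal_take.wav", sr=None, mono=True)

# Shift the whole voice up four semitones. This simple method moves
# the formants with the pitch, pushing the timbre away from a natural
# human voice: the uncanny, in-between quality described above.
shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4.0)

# Blend some of the untreated take back in so something palpably
# organic remains underneath the manipulated layer.
n = min(len(y), len(shifted))
blend = 0.3 * y[:n] + 0.7 * shifted[:n]

sf.write("voice_sketch.wav", blend, sr)
```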

Reflecting on the Project

I have been continually surprised by the longevity of interest in this project, and the fact that it’s found such a supportive audience. I don’t think any of us could have foreseen most of the things that have happened with this project, least of all the kind of reception it’s received. When Jake first asked me if I was interested in writing some music for it, I was very young and had never worked on anything like it. I had no context for what went into developing a game and how much time that could take. I’m not sure that Jake and Tamas did either. The risks in accepting the invitation did not occur to me at the time. I think if any of us had known that it would take so many years to complete, and come with some very challenging periods of work, we might’ve felt differently about taking on something so ambitious in scope in the first place. I think the extent of its risks became more evident as we all got deeper into the process. That’s certainly true for me. It proved at times to be very difficult to sustain the development and keep it viable for all of us to continue devoting the bulk of our time to it over the course of the project because of the unpredictable nature of its income. Because the development was funded through sales of the game, naturally that could be quite volatile and inconsistent and with it came stretches when very little money was coming in.

The process of making the work a viable means of earning a living continues to be a challenge, even if it is temporarily deferred from time to time. And aside from the practical challenges, working with and communicating with my collaborators will probably always come with its challenges as well as its evident benefits. Also, as I mentioned earlier, balancing my own creative interests with my obligations in the projects I take on is something that seems to only become more challenging, despite that tension existing to some degree the entire time I’ve been working as a composer.

The Broader Contexts and Meanings of Game Music

I think it’s very important to consider the politics and ethics of representation when conceptualizing and choosing the music in a video game. All art is political, right? And as such all video games are political in terms of relating to or even reflecting a set of cultural values and interests through choices of representation, depiction and engagement with the conventions of the medium and its history. Although the decision to use traditional folk music from the region where KRZ is set was not mine initially, I think Jake and Tamas were conscious of the importance of respectful representation when they envisioned its place in the game. I recorded most of those songs in 2011, so I was much younger then and, I think, generally less aware of some of these issues than I could’ve been. I actually took their initial prompts to record that set of songs based on the Bill Monroe records at face value; I did my best to recreate those historical recordings accurately, even mimicking recording techniques used at the time. Over the course of the project, I was compelled to move away from a kind of historical re-enactment approach to representing that music, and so the songs mutated into less straightforwardly traditional versions. That said, there are likely examples to be found of earlier interpretations of these songs that were just as unusual or divergent from the Bill Monroe interpretations.

With regard to the ways in which the music in KRZ engages with identity, the most overt example of that is the Junebug performance in Act III. Junebug and Johnny transform in an instant from rugged itinerant musicians into glossed-up otherworldly performers, and despite having expressed their views with a dry and sarcastic sense of humour with no trace of sentimentality prior to the performance, Junebug becomes the vehicle for a heartfelt and heartbroken transmission of feeling and sentiment presented in the form of pop and country lyrical and musical tropes. The tension between the persona of the artist and the content of their work is made manifest. Junebug and Johnny relay parts of their origin story, which involved having to remake themselves as they see fit, showing that their identities are an ever-shifting and alterable material. Their relationship with music is very much centred in their conscious embrace of the notion that identity itself is mutable, fluid and dynamic. Although I didn’t make the conscious connection at the time, that ethos described by those characters informed my process in making that music, most specifically in the choice to use my own voice and transform it into something unrecognizable for Junebug’s vocal performance.

Footnotes

4 Building Relationships: The Process of Creating Game Music

1 Paul Hoffert, Music for New Media (Boston, MA: Berklee Press, 2007), 16; Richard Stevens and Dave Raybould, The Game Audio Tutorial (Burlington, MA: Focal, 2011), 162–3.

2 Michael Sweet, Writing Interactive Music for Video Games (Upper Saddle River, NJ: Addison-Wesley, 2015), 80.

3 Winifred Phillips, A Composer’s Guide to Game Music (Cambridge, MA: MIT Press, 2014), 136–7.

4 Phillips, Composer’s Guide, 119; Sweet, Writing Interactive Music, 55–6; Chance Thomas, Composing Music for Games (Boca Raton, FL: CRC Press, 2016), 52.

5 Thomas, Composing Music for Games, 249–53.

6 Phillips, Composer’s Guide, 166.

7 Simon Wood, ‘Video Game Music – High Scores: Making Sense of Music and Video Games’, in Sound and Music in Film and Visual Media: An Overview, ed. Graeme Harper, Ruth Doughty and Jochen Eisentraut (New York: Continuum, 2009), 129–48.

8 Phillips, Composer’s Guide, 188.

9 Sweet, Writing Interactive Music, 149.

10 Guy Whitmore, ‘A DirectMusic Case Study for No One Lives Forever’, in DirectX 9 Audio Exposed: Interactive Audio Development, ed. Todd M. Fay with Scott Selfon and Todor J. Fay (Plano, TX: Wordware Publishing, 2003), 387–415.

11 Martin O’Donnell, ‘Producing Audio for Halo’ (presentation, Game Developers Conference, San Jose, 21–23 March 2002), accessed 8 April 2020, http://halo.bungie.org/misc/gdc.2002.music/.

5 The Inherent Conflicts of Musical Interactivity in Video Games

1 Gordon Calleja, In-Game: From Immersion to Incorporation (Cambridge, MA: The MIT Press, 2011), 124.

2 Dominic Arsenault, ‘Narratology’, in The Routledge Companion to Video Game Studies, ed. Mark J. P. Wolf and Bernard Perron (New York: Routledge, 2014), 475–83, at 480.

3 See, amongst others, Richard Jacques, ‘Staying in Tune: Richard Jacques on Game Music’s Past, Present, and Future’, Gamasutra, 2008, accessed 8 April 2020, www.gamasutra.com/view/feature/132092/staying_in_tune_richard_jacques_.php; Michael Sweet, Writing Interactive Music for Video Games (Upper Saddle River, NJ: Addison-Wesley, 2015); Simon Wood, ‘Video Game Music – High Scores: Making Sense of Music and Video Games’, in Sound and Music in Film and Visual Media: An Overview, ed. Graeme Harper, Ruth Doughty and Jochen Eisentraut (New York and London: Continuum, 2009), 129–48.

4 Claudia Gorbman, Unheard Melodies: Narrative Film Music (London: BFI, 1987).

5 Annabel J. Cohen, ‘Film Music from the Perspective of Cognitive Science’, in The Oxford Handbook of Film Music Studies, ed. David Neumeyer (New York: Oxford University Press, 2014), 96–130, at 103.

6 Annabel J. Cohen, ‘Congruence-Association Model and Experiments in Film Music: Toward Interdisciplinary Collaboration’, Music and the Moving Image 8, no. 2 (2015): 5–24.

7 Cohen, ‘Congruence-Association Model’.

8 Cohen, ‘Congruence-Association Model’, 10.

9 Scott Selfon, Karen Collins and Michael Sweet all distinguish between adaptive and interactive music. Karen Collins, ‘An Introduction to Procedural Music in Video Games’, Contemporary Music Review 28, no. 1 (2009): 5–15; Scott Selfon, ‘Interactive and Adaptive Audio’, in DirectX 9 Audio Exposed: Interactive Audio Development, ed. Todd M. Fay with Scott Selfon and Todor J. Fay (Plano, TX: Wordware Publishing, 2003), 55–74; Sweet, Writing Interactive Music.

10 Guy Whitmore, ‘Design with Music in Mind: A Guide to Adaptive Audio for Game Designers’, Gamasutra, 2003, accessed 8 April 2020, www.gamasutra.com/view/feature/131261/design_with_music_in_mind_a_guide_.php; www.gamasutra.com/view/feature/131261/design_with_music_in_mind_a_guide_.php?page=2.

11 The game may instigate an event, but it only does so because the player has entered the area or crossed a trigger point – so whether this is ‘game’-instigated or ‘player’-instigated is probably a matter of debate.

12 If the motif is played on the strings then the player knows the ‘Spitter’ has spawned further away. Each special enemy in Left 4 Dead 2 has a specific musical spawn sound: a two-note motif for the ‘Smoker’, a three-note motif for the ‘Hunter’, four for the ‘Spitter’, five for the ‘Boomer’, six for the ‘Charger’ and eight for the ‘Jockey’.

13 Michel Chion, trans. Claudia Gorbman, Audio-Vision: Sound on Screen (New York: Columbia University Press, 1994), 25–7.

14 Richard Stevens, Dave Raybould and Danny McDermott, ‘Extreme Ninjas Use Windows, Not Doors: Addressing Video Game Fidelity Through Ludo-Narrative Music in the Stealth Genre’, in Proceedings of the Audio Engineering Society Conference: 56th International Conference: Audio for Games, 11–13 February 2015, London, AES (2015).

15 The very attentive listener will note that the wolf drums also indicate the direction from which they are coming via the panning of the instruments.

16 Music can also indicate ludic information through its absence. For example, in L.A. Noire (2011), when all clues have been collected in a particular location, the music stops.

17 Axel Stockburger, ‘The Game Environment from an Auditive Perspective’, in Proceedings: Level Up: Digital Games Research Conference (DiGRA), ed. Marinka Copier and Joost Raessens, Utrecht, 4–6 November 2003; Patrick Ng and Keith V. Nesbitt, ‘Informative Sound Design in Video Games’, in Proceedings of the 9th Australasian Conference on Interactive Entertainment (IE ’13), 30 September–1 October 2013, Melbourne, Australia (New York: ACM, 2013), 9:1–9:9.

18 Mickey-Mousing ‘consists of following the visual action in synchrony with musical trajectories (rising, falling, zigzagging) and instrumental punctuations of actions’. Chion, Audio-Vision, 121–2.

19 Royal S. Brown, Overtones and Undertones: Reading Film Music (Berkeley: University of California Press, 1994), 54.

20 Ernest Adams, ‘Resolutions to Some Problems in Interactive Storytelling Volume 1’ (PhD Thesis, University of Teesside, 2013), 96–119; Tulia-Maria Cășvean, ‘An Introduction to Videogame Genre Theory: Understanding Videogame Genre Framework’, Athens Journal of Mass Media and Communications 2, no. 1 (2015): 57–68.

21 Gorbman identifies that film music is typically designed not to be consciously noticed or ‘heard’. She identifies incongruence as one of the factors that violates this convention, along with technical mistakes, and the use of recognizable pre-existing music. Claudia Gorbman, ‘Hearing Music’, paper presented at the Music for Audio-Visual Media Conference, University of Leeds, 2014.

22 Martin O’Donnell, ‘Martin O’Donnell Interview – The Halo 3 Soundtrack’ (c.2007), YouTube, accessed 13 October 2020, www.youtube.com/watch?v=aDUzyJadfpo.

23 Austin Wintory, ‘Journey vs Monaco: Music Is Storytelling’, presented at the Game Developers Conference, 5–9 March 2012, San Francisco, accessed 20 October 2020, www.gdcvault.com/play/1015986/Journey-vs-Monaco-Music-is.

25 Three examples demonstrate a typical way in which this is implemented within games. In Deus Ex: Human Revolution (2011), Dishonored (2012) and Far Cry 3 (2012), the music starts when the player has been spotted, and fades out when the NPC is no longer pursuing them.

26 The term ludonarrative dissonance was coined by game designer Clint Hocking to describe situations where the ludic aspects of the game conflict with the internal narrative. In his analysis of the game BioShock, he observes that the rules of the game imply ‘it is best if I do what is best for me without consideration for others’ by harvesting the ‘Adam’ from the Little Sisters characters in order to enhance the player’s skill. However, in doing this the player is allowed to ‘play’ in a way that is explicitly opposed by the game’s narrative, which indicates that the player should rescue the Little Sisters. As Hocking comments, ‘“helping someone else” is presented as the right thing to do by the story, yet the opposite proposition appears to be true under the mechanics’. Clint Hocking, ‘Ludonarrative Dissonance in Bioshock’, 2007, accessed 8 April 2020, http://clicknothing.typepad.com/click_nothing/2007/10/ludonarrative-d.html.

27 Sho Iwamoto, ‘Epic AND Interactive Music in “Final Fantasy XV”’, paper presented at the Game Developers Conference, 27 February–3 March 2017, San Francisco, accessed 15 October 2020, www.gdcvault.com/play/1023971/Epic-AND-Interactive-Music-in.

28 The musical approaches referred to here as parallel and transitional are sometimes referred to respectively as Vertical Re-Orchestration and Horizontal Re-Sequencing (Kenneth B. McAlpine, Matthew Bett and James Scanlan, ‘Approaches to Creating Real-Time Adaptive Music in Interactive Entertainment: A Musical Perspective’, The Proceedings of the AES 35th International Conference: Audio for Games, 11–13 February 2009, London, UK), or simply as Vertical and Horizontal (Winifred Phillips, A Composer’s Guide to Game Music (Cambridge, MA: The MIT Press, 2014), 185–202).

29 ‘Noise’ in this instance refers to any rapid fluctuations in data that can obscure or disrupt the more significant underlying changes.

30 Barry Ip, ‘Narrative Structures in Computer and Video Games: Part 2: Emotions, Structures, and Archetypes’, Games and Culture 6, no. 3 (2011): 203–44.

31 Jim Dooley, quoted in NPR Staff, ‘Composers Find New Playgrounds in Video Games’, NPR.org, 2010, accessed 8 April 2020, www.npr.org/blogs/therecord/2010/12/01/131702650/composers-find-new-playgrounds-in-video-games.

32 Richard Bartle, ‘Hearts, Clubs, Diamonds, Spades: Players Who Suit MUDs’, Journal of MUD Research 1, no. 1 (1996), accessed 13 October 2020, reproduced at http://mud.co.uk/richard/hcds.htm.

33 Alessandro Canossa and Sasha Makarovych, ‘Videogame Player Segmentation’, presented at the Data Innovation Summit, 22 March 2018, Stockholm, accessed 13 October 2020, www.youtube.com/watch?v=vBfYGH4g2gw.

34 Chris McEntee, ‘Rational Design: The Core of Rayman Origins’, Gamasutra, 2012, accessed 8 April 2020, www.gamasutra.com/view/feature/167214/rational_design_the_core_of_.php?page=1.

35 Imagine two versions of a film: in one the main character is John McClane, while in another it is Sherlock Holmes. The antagonists are the same, the key plot features the same, but their approaches are probably very different, and necessitate a very different musical score.

36 Playing a game with an extreme stealth approach is sometimes referred to as ‘ghosting’, and playing without killing any enemies a ‘pacifist’ run.

37 Parallel forms or vertical remixing involve altering the volume of parallel musical layers in response to game events or variables. Horizontal forms or horizontal resequencing involve transitions from one musical cue to another. For further detailed discussion, see Richard Stevens and Dave Raybould, Game Audio Implementation: A Practical Guide Using the Unreal Engine (Burlington, MA: Focal Press, 2015) and Sweet, Writing Interactive Music.

38 K. J. Donnelly, Occult Aesthetics: Synchronization in Sound Film (New York: Oxford University Press, 2014).

39 There is evidence that ‘infants develop expectation for the onset of rhythmic cycles (the downbeat), even when it is not marked by stress or other distinguishing spectral features’, since they exhibit a brainwave pattern known as mismatch negativity, which occurs when there is a change to an otherwise consistent sequence of events. István Winkler, Gábor P. Háden, Olivia Ladinig, István Sziller and Henkjan Honing, ‘Newborn Infants Detect the Beat in Music’, Proceedings of the National Academy of Sciences 106, no. 7 (2009): 2468–71, at 2468. The perception of a fictional accent when listening to a monotone metronome sequence is known as subjective rhythmization (Rasmus Bååth, ‘Subjective Rhythmization’, Music Perception 33, no. 2 (2015): 244–54).

40 John Sloboda, ‘Music Structure and Emotional Response: Some Empirical Findings’, Psychology of Music 19, no. 2 (1991): 110–20.

41 Chion, Audio-Vision, 13–18.

42 Elizabeth Medina-Gray, ‘Modularity in Video Game Music’, in Ludomusicology: Approaches to Video Game Music, ed. Michiel Kamp, Tim Summers and Mark Sweeney (Sheffield: Equinox, 2016), 53–72.

43 Rob Bridgett, ‘Building an Immersive Soundscape in Shadow of the Tomb Raider – Full Q&A’, Gamasutra, 2018, accessed 8 April 2020, www.gamasutra.com/blogs/ChrisKerr/20181108/329971/Building_an_immersive_soundscape_in_Shadow_of_the_Tomb_Raider__full_QA.php.

44 These stylistic responses are also a product of the inflexibility of waveform-based recorded music. It is possible that we will look back on the era of waveform-based video game music and note that the affordances and constraints of the waveform have had the same degree of impact on musical style that the affordances and constraints of the 2A03 (NES) or SID chip (C64) had on their era.

45 Game audio middleware consists of audio-specific tools that integrate into game engines. They give audio personnel a common and accessible interface that is abstracted from the implementations of each specific gaming platform. FMOD (Firelight Technologies) and Wwise (Audiokinetic) are two of the most commonly used middleware tools.
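The kind of abstraction described here can be pictured as a thin interface standing between game code and each platform’s audio implementation. The following Python sketch is a loose illustration of that idea, not the actual FMOD or Wwise API.

# Game code talks to one abstract interface; each platform supplies
# its own backend, so audio personnel never touch platform specifics.
from abc import ABC, abstractmethod

class AudioMiddleware(ABC):
    @abstractmethod
    def post_event(self, event_name: str) -> None: ...
    @abstractmethod
    def set_parameter(self, name: str, value: float) -> None: ...

class ConsoleBackend(AudioMiddleware):
    def post_event(self, event_name):      # platform-specific playback
        print(f"[console] event: {event_name}")
    def set_parameter(self, name, value):  # platform-specific mixing
        print(f"[console] {name} = {value}")

audio: AudioMiddleware = ConsoleBackend()
audio.post_event("music/combat_start")
audio.set_parameter("intensity", 0.7)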

46 Sweet, Writing Interactive Music, 167, 171.

47 Rigby and Ryan describe two distinct types of competence feedback: sustained competence feedback (during an episode) and cumulative competence feedback (at the conclusion of an episode). Scott Rigby and Richard M. Ryan, Glued to Games: How Video Games Draw Us in and Hold Us Spellbound (Santa Barbara: Praeger, 2010), 25.

48 Hanson refers to this as ‘mastery through compulsive repetition’. Christopher Hanson, Game Time: Understanding Temporality in Video Games (Bloomington, Indiana: Indiana University Press, 2018), 111.

49 Daniel Kahneman has suggested that there are differences between the objective moment-to-moment evaluation of what he terms the ‘experiencing self’ and that of the more subjective ‘remembering self’, which retrospectively evaluates an experience. The remembering self neglects duration, and instead evaluates the experience according to its ‘Peak’ and ‘End’ moments. Daniel Kahneman, Thinking, Fast and Slow (London: Penguin, 2012). There is further evidence that when activities are goal-oriented the ‘End’ is more dominant: the achievement (or lack of achievement) of the goal will dominate evaluations of the experience more than any ‘peak’ experiences within it. Ziv Carmon and Daniel Kahneman, ‘The Experienced Utility of Queuing: Real Time Affect and Retrospective Evaluations of Simulated Queues’, working paper, Duke University (1996), www.researchgate.net/publication/236864505_The_Experienced_Utility_of_Queuing_Experience_Profiles_and_Retrospective_Evaluations_of_Simulated_Queues.
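A toy calculation makes the contrast concrete. One common formalization averages the most intense moment and the final moment (the exact weighting varies across studies); the numbers below are invented purely for illustration.

# Peak-end evaluation versus a duration-sensitive average.
moments = [2, 3, 9, 4, 8]            # moment-to-moment experience ratings
experiencing_self = sum(moments) / len(moments)       # 5.2
remembering_self = (max(moments) + moments[-1]) / 2   # (9 + 8) / 2 = 8.5
# The remembered evaluation ignores how long the episode lasted and is
# dominated by its most intense moment and its ending.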

50 Richard Stevens and Dave Raybould, ‘Designing a Game for Music: Integrated Design Approaches for Ludic Music and Interactivity’, in The Oxford Handbook of Interactive Audio, ed. Karen Collins, Bill Kapralos and Holly Tessler (New York: Oxford University Press, 2014), 147–66.

51 Colin Walder, ‘Music: Action Synchronization’, Wwise Tour, Warsaw 2016, accessed 13 October 2020, www.youtube.com/watch?v=aLq0NKs3H-k.

52 Walder, ‘Music: Action Synchronization’.

53 Walder, ‘Music: Action Synchronization’.

54 Robin Hunicke and Vernell Chapman, ‘AI for Dynamic Difficulty Adjustment in Games’, in Challenges in Game Artificial Intelligence: Papers from the AAAI Workshop, ed. Dan Fu, Stottler Henke and Jeff Orkin (Menlo Park, California: AAAI Press, 2004), 91–6.

55 Ernest Adams, Fundamentals of Game Design, 2nd ed. (Berkeley: New Riders, 2007), 347.

56 Ben Serviss, ‘Ben Serviss’s Blog – The Discomfort Zone: The Hidden Potential of Valve’s AI Director’, Gamasutra, 2013, accessed 8 April 2020, www.gamasutra.com/blogs/BenServiss/20130207/186193/The_Discomfort_Zone_The_Hidden_Potential_of_Valves_AI_Director.php.

57 Paul Hellquist, [tweet] (1 September 2017), accessed 13 October 2020, https://twitter.com/theelfquist/status/903694421434277888?lang=en. Game designer Jennifer Scheurle revealed many other game design ‘secrets’ or manipulations through responses to her Twitter thread (accessed 13 October 2020, https://twitter.com/gaohmee/status/903510060197744640), discussed further in her 2017 presentation to the New Zealand Game Developers Association: Jennifer Scheurle, ‘Hide and Seek: Good Design is Invisible’, presented at the New Zealand Game Developers Conference, Wellington, 6–8 September 2017, accessed 13 October 2020, www.youtube.com/watch?v=V0o2A_up1WA.

6 The Triple Lock of Synchronization

1 With some games, a slight lack of synchrony between sound and image, between a player’s input and the illusion of the event on screen, can be tolerable. For instance, with point-and-click narratives or detection games the player can mentally compensate for the discrepancy. However, the overwhelming majority of video games require rapid response to player input. What is sometimes called ‘input lag’ or ‘latency’, when a button is pressed and the in-game response to that activity is not immediate, can utterly ruin the gaming experience.
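The scale of the problem is easy to quantify, since latency is usually counted in frames. A minimal sketch, with illustrative figures only:

# At 60 frames per second each frame lasts 1000 / 60, roughly 16.7 ms,
# so a response delayed by three frames arrives about 50 ms after input:
# enough to be felt in a fast action game, though often tolerable in a
# point-and-click narrative.
FPS = 60
frames_of_lag = 3
latency_ms = frames_of_lag * 1000 / FPS
print(f"{latency_ms:.1f} ms of input lag")  # 50.0 ms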

2 I have discussed this in detail in relation to film and television in my book Occult Aesthetics: Synchronization in Sound Film (New York: Oxford University Press, 2014). Other relevant writing includes Jeff Rona, Synchronization: From Reel to Reel: A Complete Guide for the Synchronization of Audio, Film and Video (Milwaukee, WI: Hal Leonard Corporation, 1990) and Michael Sweet, Writing Interactive Music for Video Games: A Composer’s Guide (London: Addison Wesley, 2014), 28–9.

3 Michel Chion, Audio-Vision: Sound on Screen, ed. and trans. Claudia Gorbman (New York: Columbia University Press, 1994), 5.

4 Chion’s synchresis matches the ideas of Lipscomb and Kendall, who likewise note perceptual ‘marking’ by synch points. S. D. Lipscomb and R. A. Kendall, ‘Sources of Accent in Musical Sound and Visual Motion’, in the Proceedings of the 4th International Conference for Music Perception and Cognition (Liege: ICMPC, 1994), 451–2.

5 Frans Mäyrä notes the representational surface and the coding reality beneath it. He discusses the ‘dual structure of video games’, where ‘players access both a “shell” (representational layers) and the “core” (the gameplay)’. Frans Mäyrä, ‘Getting into the Game: Doing Multidisciplinary Game Studies’, in The Video Game Theory Reader 2, ed. Bernard Perron and Mark J. P. Wolf (New York: Routledge, 2008), 313–30 at 317.

6 Rudolf Arnheim, ‘The Gestalt Theory of Expression’, in Documents of Gestalt Psychology, ed. Mary Henle (Los Angeles: University of California Press, 1961), 301–23 at 308.

7 Rudolf Arnheim, Art and Visual Perception: A Psychology of the Creative Eye (Los Angeles: University of California Press, 1974), 450.

8 For further reading about games and Gestalt theory consult K. J. Donnelly, ‘Lawn of the Dead: The Indifference of Musical Destiny in Plants vs. Zombies’, in Music in Video Games: Studying Play, ed. K. J. Donnelly, William Gibbons and Neil Lerner (New York: Routledge, 2014), 151–65 at 160; Ingolf Ståhl, Operational Gaming: An International Approach (Oxford: Pergamon, 2013), 245; and Mark J. P. Wolf, ‘Design’, in The Video Game Theory Reader 2, ed. Bernard Perron and Mark J. P. Wolf (London: Routledge, 2008), 343–4.

9 In other words, rather than ‘Mickey-Mousing’ (the music redundantly repeating the dynamics of the image activity), there is only a general sense of the music ‘fitting’ the action.

10 See Randall Hyde, The Art of Assembly (N.p.: Randall Hyde, 1996), 92–6.

11 MMORPGs (Massively Multiplayer Online Role-Playing Games) and games for multiple players require effective synchronization to hold the shared gameworld together.

12 Jesper Kaae, ‘Theoretical Approaches to Composing Dynamic Music for Video Games’, in From Pac-Man to Pop Music: Interactive Audio in Games and New Media, ed. Karen Collins (Aldershot: Ashgate, 2008), 75–92 at 77.

13 Richard Stevens and Dave Raybould, The Game Audio Tutorial: A Practical Guide to Sound and Music for Interactive Games (London: Focal Press, 2011), 112.

14 Sweet, Writing Interactive Music, 28.

16 Karen Collins, Game Sound: An Introduction to the History, Theory and Practice of Video Game Music and Sound Design (Cambridge, MA: The MIT Press, 2008), 126.

17 Tim van Geelen, ‘Realising Groundbreaking Adaptive Music’, in From Pac-Man to Pop Music: Interactive Audio in Games and New Media, ed. Karen Collins (Aldershot: Ashgate, 2008), 93–102.

18 Tim Summers, Understanding Video Game Music (Cambridge, UK: Cambridge University Press, 2016), 190.

20 Alison McMahan, ‘Immersion, Engagement and Presence: A Method for Analyzing 3-D Video Games’, in The Video Game Theory Reader, ed. Bernard Perron and Mark J. P. Wolf (London: Routledge, 2003), 67–86, at 72.

21 Ken McGorry, ‘Scoring to Picture’, in Post Magazine, November 2009, 39, accessed 15 October 2020, https://web.archive.org/web/20100126001756/http://www.jasongraves.com:80/press.

22 Mark Sweeney notes that the game has two musical sound worlds: a neo-romantic one in cut scenes and a modernist one inspired by twentieth-century art music (of the sort used in horror films) which works for gameplay. The latter is reminiscent of Penderecki’s music as used in The Exorcist (1973) and The Shining (1980). ‘Isaac’s Silence: Purposive Aesthetics in Dead Space’, in Ludomusicology: Approaches to Video Game Music, ed. Michiel Kamp, Tim Summers and Mark Sweeney (Sheffield: Equinox, 2016), 172–97 at 190, 192.

23 Don Veca, the game’s audio director, created a scripting system he called ‘Dead Script’, which sat on top of low-level audio drivers and middleware. An important aspect of this was what he called ‘the creepy ambi patch’: a grouping of sounds that constantly reappeared, but in different forms, pitch-shifted, filtered and processed. These were also controlled by the ‘fear emitters’ but appeared more frequently when no action or notable events were happening. Paul Mac, ‘Game Sound Special: Dead Space’, in Audio Media, July 2009, 23.
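The behaviour Mac describes might be sketched along the following lines; everything below (names, probabilities, pitch range) is a hypothetical reconstruction for illustration, not Veca’s actual system.

import random

# A pool of ambient sounds that recur in varied forms, with the trigger
# rate scaled down while notable action is happening.
class CreepyAmbiPatch:
    def __init__(self, sounds):
        self.sounds = sounds

    def maybe_trigger(self, fear: float, action_busy: bool):
        """fear (0-1) would come from nearby 'fear emitters'."""
        chance = fear * (0.2 if action_busy else 1.0)
        if random.random() < chance:
            sound = random.choice(self.sounds)
            pitch = random.uniform(0.8, 1.2)  # vary each reappearance
            return (sound, pitch)
        return None

patch = CreepyAmbiPatch(['whisper', 'metal_groan', 'distant_thud'])
patch.maybe_trigger(fear=0.6, action_busy=False)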

24 For more discussion of plesiochrony, see Donnelly, Occult Aesthetics, 181–3.

25 Gernot Böhme points to atmosphere as a form of integrated, concrete relationship between human and environment. Gernot Böhme, The Aesthetics of Atmospheres, ed. Jean-Paul Thibaud (London: Routledge, 2017), 14.

26 Of course, this imports wholesale a style of music that is not particular to video games, yet one which fits these particular games well.

27 Composers include Toru Minegishi (who had worked on the Legend of Zelda games and Super Mario 3D World), Shiho Fujii and Manaka Tominaga.

28 Although the player’s actions trigger musically derived sounds, forming something of a random ‘soloing’ performed over the top of the musical bed, it is difficult to conceive of this as a coherent piece of music.

29 The opening tango section comprises sixteen bars, followed by eight bars of oboe melody (the same four bars repeated), then an orphan four-bar drop-out section leading to eight bars of pizzicato strings, then the same section repeated with added sustained strings for a further eight bars, and finally sixteen bars of piano arpeggios (the same four bars repeated four times), after which the whole piece simply repeats.

30 Chion, Audio-Vision, 8–9.

31 Donnelly, ‘Lawn of the Dead’, 154.

32 The ‘repetition effect’ tests music’s durability, although the cumulative effect of repetition can also win a listener over: I hated the music at first but, after playing the game, ended up appreciating it.

33 ‘Stems’ are musical parts designed to fit intimately together rather than to function on their own. Arguably, the process of using stems comes from digital musical culture, where it is easy to group together recorded music channels in so-called ‘submixes’.
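In digital terms a submix is little more than a weighted sum of its stems’ signals. A minimal Python sketch, with illustrative names and numbers:

# Mix stems into one submix by summing sample-by-sample with gains.
def submix(stems: dict, gains: dict) -> list:
    length = len(next(iter(stems.values())))
    return [sum(gains.get(name, 1.0) * samples[i]
                for name, samples in stems.items())
            for i in range(length)]

mixed = submix({'strings': [0.1, 0.2], 'brass': [0.3, 0.1]},
               {'strings': 0.8, 'brass': 0.5})   # [0.23, 0.21]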

34 Sweet notes that these points are called ‘hooks’, a term describing the commands sent from player input through the game engine to the audio, so named because the game ‘hooks into’ the music engine. Sweet, Writing Interactive Music, 28.
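The general pattern is that of a callback registry: the game registers musical responses against input events and fires them as play unfolds. This Python sketch illustrates that pattern with hypothetical names; it is not a rendering of Sweet’s terminology into code.

# The game 'hooks into' the music engine: input events are forwarded
# to whatever musical responses have been registered for them.
class MusicEngine:
    def __init__(self):
        self.hooks = {}

    def register(self, event: str, callback):
        self.hooks.setdefault(event, []).append(callback)

    def dispatch(self, event: str):
        for callback in self.hooks.get(event, []):
            callback()

engine = MusicEngine()
engine.register("player_jump", lambda: print("play jump stinger"))
engine.dispatch("player_jump")  # fired from the game engine on input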

35 Karen Collins is not the only writer to have addressed different forms of interactive audio; Michael Liebe notes that there might be three broad categories of music interaction: ‘Linear’ (which cannot be changed by the player), ‘Reactive’ (which is triggered by player actions) and ‘Proactive’ (where the player must follow the game). Michael Liebe, ‘Interactivity and Music in Computer Games’, in Music and Game: Perspectives on a Popular Alliance, ed. Peter Moormann (Wiesbaden: Springer, 2013), 41–62, at 47–8.

36 Frans Mäyrä notes that players, ‘ … access both a “shell” (representational layers) as well as the “core” (the gameplay).’ Mäyrä, ‘Getting into the Game,’ 317.

7 ‘Less Music, Now!’ New Contextual Approaches to Video Game Scoring

1 Increasingly, for reasons of budget, aesthetics or cultural specificity, many video games do not actually contain any voice-over dialogue: they either display spoken content via on-screen text, or they contain no spoken dialogue or text at all. The decision can also be motivated by the large budgetary impact that localizing voice-overs into many different languages can have on a production.

2 Randy Thom, ‘Designing a Movie for Sound’, in Soundscape: The School of Sound Lectures, 1998–2001, ed. Larry Sider and Diane Freeman (London: Wallflower, 2003), 121–37.

8 Composing for Independent Games: The Music of Kentucky Route Zero

1 See, for example, David Kanaga, ‘Ecooperatic Music Game Theory’, in The Oxford Handbook of Algorithmic Music, ed. Roger T. Dean and Alex McLean (New York: Oxford University Press, 2018), 451–70.

2 See Ryerson’s website, http://ellaguro.blogspot.com/ (accessed 8 May 2020) for more on her work.

3 A vocoder is an electronic device or effect which allows the ‘timbre and articulation of one sound source (usually a voice) to control another’, often resulting in a ‘talking’ or ‘singing’ instrument effect. Hugh Davies, ‘Vocoder’, Grove Music Online (2001), accessed 8 May 2020, www.oxfordmusiconline.com/grovemusic/view/10.1093/gmo/9781561592630.001.0001/omo-9781561592630-e-0000047646.

1 Ciarán Robinson, Game Audio with FMOD and Unity (New York: Routledge, 2019).

2 Rob Bridgett, From the Shadows of Film Sound: Cinematic Production and Creative Process in Video Game Audio (N.p.: Rob Bridgett, 2010).

3 Winifred Phillips, A Composer’s Guide to Game Music (Cambridge, MA: The MIT Press, 2014).

4 George Sanger, The Fat Man on Game Audio (Indianapolis, IN: New Riders, 2003).

5 Michael Sweet, Writing Interactive Music for Video Games (Upper Saddle River, NJ: Addison-Wesley, 2015).

6 Chance Thomas, Composing Music for Games (Boca Raton, FL: CRC Press, 2016).

7 Gina Zdanowicz and Spencer Bambrick, The Game Audio Strategy Guide: A Practical Course (New York: Routledge, 2020).

8 For challenges arising from this situation, see Phillips, A Composer’s Guide.

9 Karen Collins, Game Sound: An Introduction to the History, Theory and Practice of Video Game Music and Sound Design (Cambridge, MA: The MIT Press, 2008), 183–5.

10 Richard Stevens and Dave Raybould, Game Audio Implementation: A Practical Guide Using the Unreal Engine (Boca Raton, FL: CRC Press, 2016), 129–96.

11 Zdanowicz and Bambrick, Game Audio Strategy Guide, 332.

12 K. J. Donnelly, ‘Lawn of the Dead: The Indifference of Musical Destiny in Plants vs. Zombies’, in Music in Video Games: Studying Play, ed. K. J. Donnelly, William Gibbons and Neil Lerner (New York: Routledge, 2014), 151–65.

13 On the process of creating menu music for Shift 2, see Stephen Baysted, ‘Palimpsest, Pragmatism and the Aesthetics of Genre Transformation: Composing the Hybrid Score to Electronic Arts’ Need for Speed Shift 2: Unleashed’, in Ludomusicology: Approaches to Video Game Music, ed. Michiel Kamp, Tim Summers and Mark Sweeney (Sheffield: Equinox, 2016), 152–71.

References

Further Reading

Collins, Karen. ‘An Introduction to Procedural Audio in Video Games’. Contemporary Music Review 28, no. 1 (2009): 5–15.
Paul, Leonard J. ‘Droppin’ Science: Video Game Audio Breakdown’, in Music and Game: Perspectives on a Popular Alliance, ed. Peter Moormann. Wiesbaden: Springer, 2013, 63–80.
Phillips, Winifred. A Composer’s Guide to Game Music. Cambridge, MA: The MIT Press, 2014.
Zdanowicz, Gina and Bambrick, Spencer. The Game Audio Strategy Guide: A Practical Course. New York: Routledge, 2020.
Figure 5.1 Notification and feedback: Enabling and commenting functions
Figure 5.2 Musical experience of an average approach
Figure 5.3 Musical experience of an aggressive approach
Figure 5.4 Musical experience of a stealthy approach
Figure 5.5 Potential periods of incongruence due to metrical transitions are indicated by the hatched lines
