In video games, music (in particular that which is considered ‘the score’) is almost automatically thought of as a fundamental part of the identity of any production. In this chapter, however, I will discuss how musical scores are often over-relied upon and formulaic, and, conceivably, in some game titles, not required at all.
From the earliest availability of sound technologies, music has been present and marketed as a crucial part of a game’s ‘identity’. Notably, composers themselves have garnered an elevated status as auteurs within the industry, in a way that very few designers, animators, visual artists or sound artists have done. While this marks an important achievement for music and its presence in games, we are arguably in a phase of video game aesthetics where the function and use of music, particularly orchestral music, is becoming increasingly jaded, formulaic and repetitive, and where more subtle and fresh approaches appear to be garnering much higher critical praise.
While this chapter carries a provocative title and opening statement, I should state up front that I am a very big fan of video game music, composers and game scores. The premise of this chapter is not to question or discourage the involvement of composers and music in game development, but instead to challenge and further understand some of the motivations behind music use in video games. It is also to provoke the reader to see a future in which music can be integrated and considered much more thoughtfully, effectively and positively in order to serve the game’s soundtrack in both production and aesthetic terms.
The Practicalities of Game (Music) Development
As a practitioner and audio director for the last twenty years in the video games industry, at both the triple-A and indie studio level, I have gained a good insight into how music is both commissioned and discussed amongst game developers, game directors, producers, composers and marketing departments.
Firstly, we need to consider some of the wider overall contexts in which music in contemporary video games production sits. Generally speaking, the soundtrack of a video game can be defined by three main food groups of voice-over,Footnote 1 sound effects and music. I would also add an additional area of consideration to this, perhaps almost a fourth food group, of ‘mix’, which is the artistic, technical and collaborative balance of all three of those food groups in their final contexts, and in which the audio developers are able to make decisions about which of the three to prioritize at any given moment or any given state or transition in a game. The mix is something that should be (but is very rarely) considered with as much vigour as music, voice and SFX, as early and often as possible during the process of working across a game’s soundtrack.
To borrow from the production world of cinema sound, Ben Burtt and Randy Thom, both widely known and respected cinema sound designers and rerecording mixers, have often talked about what they refer to as ‘the 100% theory’.Footnote 2 This is an observation whereby every different department on a film’s soundtrack will consider it 100 per cent of their job to tell the story. The sound FX and Foley department consider it 100 per cent of their job to hit all the story beats and moments with FX, the writers consider it their job to hit and tell all the story moments with dialogue lines, and the composer and music department consider it 100 per cent of their job to hit all the storytelling moments with music cues. The result, most often, is arriving at the final mix stage, with premixes full of choices to be made about what plays when, and what is the most important element, or balance of elements, in any particular moment. It is essentially a process of deferring decision making until the very last minute on the mix stage, and is usually the result of a director working separately with each department, rather than having those departments present and coordinated as part of the overall sound team in general. Certainly, a similar thing can be said to be true of video game sound development, whereby the final ‘shape’ of the game experience is often unknown by those working on the game until very close to the end of post-production. This is mainly because game development is an ‘iterative’ process, whereby the members of a game development team work out and refine the elements of the game, the story, the characters (quite often even the game engine itself) and the gameplay as they build it.
Iteration basically requires that something is first tried out on screen (a rough first pass at a gameplay feature or story element), then subjected to multi-disciplinary feedback, then refined from a list of required changes; and then that process of execution and review is repeated and repeated until the feedback becomes increasingly fine, and the feature or story element feels more and more satisfactory whenever played or viewed.
If we consider the difference between film and game preproduction for a moment, the sound teams in cinema are able to identify and articulate that the 100 per cent rule is being applied in their productions. They typically have a director and a pre-approved shooting script already in the bag before production begins, so they already know what they are making, who the characters are and what the story is, and likely all the shots in the movie: yet they still have every department thinking it is their job to tell 100 per cent of the story. The situation is even more exaggerated in games, because of the iterative nature of production and (often) highly segmented workflow, both of which keep a great many more factors in flux right the way through the cycle of creation. Developers know very little upfront about the story and the gameplay mechanics, and perhaps only understand the genre or the overall feeling and high-level rules of the initial creative vision as they set off into production. Coupled with the vast amount of work and re-work that occurs in game production in that iterative process, the 100 per cent theory is in overdrive, as creators work to cover what may be ultimately required by the game once the whole vision of the game has been figured out, and what is actually important has become apparent.
Deferring decisions to the mix in cinema, though less than ideal, is still very much something that can be done. There is time allotted at the end of the post-production period for this to occur, and there are craftspeople in the role of rerecording mixer who are heavily specialized in mixing. So, to further develop this picture of the production of game sound content, we also need to understand that mixing technology, practices, expertise and planning in video games have only come into existence in the last fifteen years, and remain rudimentary when compared with cinema. Because cinema mixes can be conceived and executed against a linear image, they are able to make sense of the overabundance of content in the three main food groups with a degree of sophistication that games cannot yet match. An interactive game experience, by contrast, is mixed at run-time – a mix which needs to take into account almost all possible gameplay situations that the player could initiate, and not just a linear timeline that is the same every time it is viewed.
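To make this contrast concrete, a run-time game mix can be pictured as a small rules engine that re-balances the three food groups continuously according to the current gameplay state, rather than against a fixed timeline. The following is a minimal illustrative sketch in Python; the state names and gain values are hypothetical, not drawn from any particular engine or middleware.

```python
# Minimal sketch of a run-time mix: each gameplay state prioritizes the
# "food groups" (voice, SFX, music) by assigning per-bus gains. A real
# engine would interpolate between such snapshots every frame.

# Hypothetical mix snapshots: state name -> per-bus linear gains.
MIX_SNAPSHOTS = {
    "exploration": {"voice": 1.0, "sfx": 0.8, "music": 0.6},
    "combat":      {"voice": 1.0, "sfx": 1.0, "music": 0.4},
    "dialogue":    {"voice": 1.0, "sfx": 0.5, "music": 0.3},
    "cutscene":    {"voice": 1.0, "sfx": 0.6, "music": 0.9},
}

def mix_for_state(state: str) -> dict:
    """Return the bus gains for the current gameplay state.

    Unknown states fall back to a neutral balance, because at run-time
    the mixer must always produce *some* answer.
    """
    return MIX_SNAPSHOTS.get(state, {"voice": 1.0, "sfx": 1.0, "music": 1.0})

def apply_mix(levels: dict, gains: dict) -> dict:
    """Scale the raw bus levels by the snapshot gains for this moment."""
    return {bus: levels[bus] * gains.get(bus, 1.0) for bus in levels}
```

In this sketch, entering combat pulls the music bus down so that positional SFX stay legible: exactly the kind of decision cinema can defer to a final mix stage, but which a game must resolve continuously as the player acts.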
One of the advantages that video game developers do have over those working in cinema is that their audio teams are very often, at least at the triple-A studio level, already embedded in the project from the earliest concept phases. This is something cinema sound designers and composers have long yearned for, as it would enable them to influence the other crafts and disciplines of the film-making process, such as script writing, set design and cinematography, in order to create better storytelling opportunities for sound and allow it to play the role of principal collaborator in the movie. In video games, the planning of financial and memory budgets, conversations about technology and so on begin very early in the concept phase. At the time of writing, at the end of the second decade of the twenty-first century, the creative direction of the audio in a new title is also, more than at any point in the last ten years, discussed and experimented on very early in the concept phase.
This organization enables musical style and tone, sound design elements and approaches, both artistic and technical, to be considered early on in the development. It also facilitates early collaborations across departments and helps to establish how audio content is to be used in the game. However, very little time or consideration is paid to the ‘mix’ portion of the soundtrack, or basically, the thinking about what will play when, and how the three food groups of voice, sound and music will interact with, and complement, one another.
Ideally, being on the project early enough to map out what will play when, and in what situation, gives a distinct advantage to those present on a preproduction or concept-phase development team, in that this work will actually inform the team as to which food group needs to be aesthetically prioritized in each situation. This way, for example, music cues can be planned with more accuracy and less ‘overall coverage’ in mind. Rather than making all the mix decisions at the ‘back end’ of the project during a final mix period, some of these decisions can be made upfront, thus saving a lot of time and money, and also allowing a composer’s or sound designer’s tasks to be prioritized on the areas of most importance in their work.
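One way to picture this kind of upfront planning is as a simple ‘mix map’ drafted in preproduction: for each known game situation, the team records which food group leads and whether a music cue is genuinely required. The sketch below is purely illustrative; the situations and priorities are hypothetical examples, not taken from any real production.

```python
# Hypothetical preproduction "mix map": for each planned game situation,
# record the priority order of the food groups and whether a music cue
# is needed at all. This lets a composer's work be scoped up front
# rather than writing "overall coverage" for every possible moment.

MIX_PLAN = [
    # (situation, food-group priority order, music cue required?)
    ("tutorial",        ["voice", "sfx", "music"], False),
    ("boss_reveal",     ["music", "sfx", "voice"], True),
    ("stealth_section", ["sfx", "voice", "music"], False),
    ("story_climax",    ["music", "voice", "sfx"], True),
]

def cues_to_commission(plan):
    """List only the situations that genuinely need composed music."""
    return [situation for situation, _, needs_music in plan if needs_music]
```

Even a table as crude as this makes the composer’s brief concrete: in this invented plan, only the boss reveal and the story climax would be commissioned as scored cues, while the stealth section is deliberately left to sound and voice.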
This is admittedly a utopian vision of how projects could work and be executed, and I know as well as anyone that as one works on a project, one needs to be prepared to react and pivot very quickly to cover something that was not discussed or even thought about the week before. Requests can emerge suddenly and quickly on a game team: for example, a new design feature can be requested and is taken on board as something that would be fun for the game to include, which completely changes one’s budgeted music requirements. Conversely, a feature could be cut completely because it is not fun for the player, which can impact huge amounts of music and scheduled work that is no longer required, and with the clock ticking, one has to either reappropriate those existing cues to fit new contexts for which they were not composed, or completely scrap them and have a composer write brand new cues to fit the new contexts.
This is one of the inherent risks and challenges of video game development: the iterative process of developing the game is focused on throwing away and cutting work that has been done, as often, as early and as continuously as possible, in order to find the core essence of what the game is about. This means that a lot of work on both the music and sound side is carried out in ‘sketch’ mode, whereby everything is produced quite quickly and loosely (ready to be thrown away at a moment’s notice), in order not to solidify and polish the intentions too soon. This often means that much of the recording and refinement of the SFX and musical score does not occur until very late in production. So you will rarely hear the final mastered texture and mix of the score working in context until the all-too-short post-production phases.
In addition to the challenges added to the creation of this musical content by these continually moving goalposts, we should consider the great technical challenges of implementing a video game score to play back seamlessly in the game engine. One of the additional challenges of writing and working in this medium is that delivery and implementation of the musical score occurs through run-time audio engine tools (perhaps through middleware such as FMOD or Wwise). These systems require very specific music stem/loop/cue preparation and delivery, and scripted triggering logic must be applied, so that each cue starts, evolves and ends in the desired ways, seamlessly supporting the emotion and intensity of the game’s action.
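At its simplest, the triggering logic that such middleware asks for can be pictured as a small state machine: each cue is delivered as intro, loop and outro stems, and scripted transitions decide which stem plays next as the game’s intensity changes. The sketch below is a generic Python illustration of that pattern under assumed names; it is not the API of Wwise, FMOD or any real middleware, which would additionally quantize transitions to bar or beat boundaries.

```python
# Generic sketch of stem-based music triggering: each cue is prepared as
# intro / loop / outro segments, and changes in game intensity drive the
# transitions between cues.

class MusicCue:
    def __init__(self, name: str):
        self.name = name
        self.stage = "stopped"   # stopped -> intro -> loop -> outro

    def start(self) -> str:
        self.stage = "intro"
        return f"{self.name}_intro"

    def advance(self) -> str:
        # The intro always hands off to the loop, which repeats until stopped.
        if self.stage == "intro":
            self.stage = "loop"
        return f"{self.name}_{self.stage}"

    def stop(self) -> str:
        self.stage = "outro"
        return f"{self.name}_outro"

def on_intensity_change(current: MusicCue, cues: dict, intensity: str) -> MusicCue:
    """Scripted triggering logic: end the current cue via its outro and
    start the cue mapped to the new intensity state."""
    current.stop()
    next_cue = cues[intensity]
    next_cue.start()
    return next_cue
```

The point of the sketch is that the composer’s deliverables are shaped by this logic: every cue has to be written and edited so that any intro can land on its loop, and any loop can resolve into its outro, at whatever moment the game state demands.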
Given, then, that this is often the production process of working on music and sound in video games, we can start to understand how much of video game music has come into existence and how composers can spend two, three or four years (sometimes longer) on a single project, continually feeding it with music cues, loops, stingers and themes throughout the various milestones of a project’s lifespan.
Another challenge that may not be evident is that music is often used during this iterative production period as a quick fix to supply or imply emotion, or to evoke a particular feeling of excitement within the game: a shortcut, and more of a Band-Aid solution than a balanced approach to the project’s soundtrack.
The mechanical influences of a game’s design also have a significant impact upon music content. A rigid structure and cadence of playback for music cues in each map or level may have been created as a recipe or template into which music functionally needs to fit, meaning that strict patterns and formulae about some of the more mechanical ‘in-game’, non-story music content act, not as emotional signifiers, but as mechanical Pavlovian signifiers to the players.
Game Music 101: When Less Is Much More
When I am not working on games, and am instead playing, I often notice that music is almost continual (often described as ‘wall-to-wall’), in many game experiences. In these games, where music is initially established to be always present, in any subsequent moment when music is not present, the game somehow feels like it is missing something, or feels flat. In production, this would get flagged as a ‘bug’ by the QA (quality assurance) department. So an aesthetic trap very often lies in establishing music as an ever-present continuum right at the beginning, as from then on the audio direction becomes a slave to that established recipe of internalized logic in the game. For me, the danger of having ever-present music is not simply that ubiquitous music incurs a greater monetary burden on the project, but that this approach dilutes the power, significance and emotional impact of music through its continual presence.
Overused and omnipresent scores are a huge missed opportunity for the creative and aesthetic advancement of video game sound, and risk diluting the emotional impact of a game experience on an audience. Video game sound has largely been defined and understood aesthetically in a wider cultural context, over the last four decades, through its unsubtle use of repetitive music, dialogue and SFX. Though I understand, more than most, the huge pressure on game audio developers to implement and pack in as much heightened emotion and exaggerated impact into games as possible (depending much, of course, on already-established niche conventions and expectations of genre), it gives me great optimism that a more tempered and considered approach to the emotional and evocative power of game music and its relation to the other food groups of sound is already starting to be taken.
Development studio Naughty Dog have a consistent, highly cinematic and narrative-driven approach to their games, and subsequently their soundtracks feel aesthetically, and refreshingly, more like very contemporary movie soundtracks than ‘video game soundtracks’. The Last of Us (2013) and the Uncharted series (2007–2017) both have what would generally be considered quite sparse musical treatments. Much of each soundtrack is focused on voice and character performances, and on dynamic sound moments. Only when you get to very emotional ‘key’ elements in the plot of a game do you hear music cues being used to drive home the emotional impact. This mature approach on the part of the game directors and the sound teams is certainly helped by the studio’s laser focus on narrative and cinematic experiences, which enables them to plot out and know much of what the player will be doing, in what order and in what environment, far ahead of time during development.
Another fine example of a more sparse approach to a score is the Battlefield franchise by the developer DICE in Sweden. These games, being first-person competitive shooters, necessitate an approach to the soundtrack in which the player needs to hear all the intel and positional information available to them, at all times, during the chaos of combat, with the utmost clarity. In this sense, a musical score would clearly get in the way of those needs during gameplay. In Battlefield 3 (2011), a non-diegetic musical score is used only to establish the emotional states of the pre- and post-battle phases, and of the campaign mission itself; otherwise, the musical score is absent, entirely by design, beautifully fitting the needs of the players.
In many of my projects I have been asked by game directors to always support the mood and the storytelling with music, often on a moment-to-moment basis. This is a battle that I have often had to fight to be able to demonstrate that music, when overused like this, will have the opposite effect on the player, and rather than immersing them in the emotion of the game, will make them feel fatigued and irritated, as a result of continually being bombarded with the score and forced moods telling them how they should feel about what is on screen. It is the musical equivalent of closed captioning when you do not need closed captioning. Sometimes I have been successful and at other times unsuccessful in making these points, but this is a part of the everyday collaborative and political work that is necessary on a team. I strongly believe that new approaches to music, sound and voice will gradually propagate across the industry, the more we see, analyse and celebrate successful titles doing something different.
Certainly, after working under continual ‘use music to tell 100 per cent of the story’ pressure like this, I can safely say that in order to move forwards aesthetically and adopt an integrated approach to the soundtrack, in most cases, sound needs to carry more of the storytelling, voice needs to do less, and music most certainly needs to do a lot less.
As we build games, movies, experiences and emotions together in our teams, perhaps music should be the weapon we reach for last, instead of first. This way, as developers, we could at least have a more accurate picture of where music is really needed and understand more what the actual role of the musical score is in our work.
For me, there are two more pivotal examples of a more sparse and mature music aesthetic that need to be highlighted, both of which take approaches in which the music for the game could easily be considered ambiguous and sound-effect-like. Those two games are Limbo (2010) and Inside (2016), both from the Danish studio Playdead, with sound and music by Martin Stig Andersen. The refreshing aesthetic in these games is evident from the opening moments, when the focus of each game is solely on a boy walking through the woods. Foley and footsteps are the only sounds that establish this soundtrack, and gradually, as the gameplay experience unfolds, more environmental sound and storytelling through SFX and ambience begin to occur. It is only when we get fairly deep into the game that we hear our first music cue. In Limbo, the first ‘cue’ is particularly striking: a deep, disturbing low note, or tone, which sounds when the player jumps their character onto a floating corpse in order to get to the other side of a river. This low tone has such a visceral impact when synchronized with this disturbing moment (in most games this would be a trivial mechanical moment of simple navigation) that it takes the game in a completely different direction. Rather than a ‘score’, the game’s ‘music’ seems in fact to be sound emanating from inside the soul of the character, or from the dark black-and-white visuals of the world. And because the use of music is so sparse and rare – or at least, it is rare that you can identify particular sounds as specifically ‘musical’ – the impact of those cues and sounds becomes extremely intense. At the same time, the line between conventional musical materials and sound effects becomes blurred, allowing the sounds to gain a ‘musical’ quality.
Many moments also stand out like this in the spiritual sequel to Limbo, Inside, the most memorable of which, for me, was the moment when the rhythmic gameplay of avoiding a sonic shockwave, and hiding behind metal doors, transitioned from being a purely sound-based moment to a purely musical one, and then back again. The way the transition was carried out elevated the experience to an entirely spiritual level, outside of the reality of the physics of sound, and into the realm of the intangible and the sacred.
Conclusions: Enjoy the Silence
Recently, I was very fortunate to work on a fantastic indie game called Loot Hound (2015) with some friends, and the interesting thing about that game was that it had absolutely no musical score and no musical cues whatsoever. Even the menu and loading screen were devoid of music. The strange thing is that this approach was never consciously discussed among those of us working on the game. I am pretty sure this was not the first game with no music of any kind, but for someone who had come from working on triple-A, music-heavy games, it was a very positive and renewing experience to be able to take time and express the gameplay and the aesthetics through just sound, mix and voice. In the end, I do not think any of us working on that title played it and felt that it was lacking anything, or even that music would have brought anything significant to the table in terms of the experience.
Most encouraging of all, I do not think anyone who played it even mentioned or noticed that there was no music in the game. The game was released through Steam, where one could easily keep track of feedback and comments, and I do not recall seeing anything about the game’s lack of music, but did see quite a bit of praise for the sound and for the overall game experience itself. The game, and the process of its creation, were certainly different and refreshing, and perhaps represent one of many potential futures for game sound aesthetics, to be celebrated and explored further in larger-scale productions to come.
A more integrated approach to all elements of the soundtrack is necessary to push games, and game scores, into new artistic and technical territories. This requires a lot of political and collaborative work on the part of game developers together in their teams, and also a desire to make something that breaks the mould of generic wall-to-wall game music. Part of establishing this new direction requires identifying, celebrating and elevating titles in which the soundtrack is more fully integrated, and where sound, music and voice gracefully hand over meaning to one another in a well-executed mix. A more integrated approach to music is also probably only possible once the contexts into which the music will fit can be understood more fully during all phases of production. In that sense, the challenges are quite considerable, though not insurmountable. I believe that the more carefully the production and planning of music is integrated into game development schedules and production phases, the more integrated and enjoyable the score will be on an emotional level as a part of the overall experience of playing video games.