1. Introduction
The grounding assumption of this paper is that it is worthwhile to draw on and integrate technical and human reception perspectives to advance new music research and creation, working towards robust solutions that can usefully inform both fields of interest. Part of experimenting with any new technology is the opportunity to reflect on the balance between technical input and communicative output, and to raise issues that might open up significant lines of further enquiry.
The emerging use of software agent technology as a tool in music/sound art creation, part of a wider interest in the deployment of interactive music systems such as those applying complementary techniques from evolutionary arts (Brown 2002) and A-Life (Miranda 2003), presents the chance to take a reflective step aside.
This paper is, first, a survey of recent software-based agent work in music/sound art in both linear and non-linear idioms. After the technology is introduced, a theoretical framework for understanding the work is put forward, followed by a conceptual overview that characterises and illustrates common approaches. Second, areas of neglect in recent research are identified, particularly affective (emotion-based) approaches and narrative mapping, and a possible future direction based on affective music is briefly explored. Finally, to conclude and answer the critique, a hybrid model is proposed, based on non-linear generative improvisation coupled with a conversational model of human–computer interaction that treats music/sound as an affective language.
The intention here is then to broaden debate and stimulate further discussion – and also to signal areas of contention that might be useful to address in future work.
1.1. What are software agents?
An agent is generally understood as somebody or something that acts on behalf of another in a process. A software agent, put simply, is a computer program that works on tasks specified by a user. As with their physical counterparts, software agents can exhibit varying degrees of persistence, independence, communication and collaboration with other software agents or people. In addition, ‘intelligent’ agents might have the capability to make decisions, the capacity to learn within an environment, and the ability to be mobile over computer networks. Increasingly, more ‘intelligent’ agents monitor environments, glean information, decide whether or not to react, and even modify their behaviour according to the results received (see Bigus and Bigus 2001; Consoli, Ichalkaranje, Jarvis, Phillips-Wren, Tweedale and Sioutis 2006).
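To ground these properties, the sketch below is a minimal illustration (in Python, with hypothetical class and parameter names rather than any particular agent toolkit) of the monitor–decide–adapt cycle such an agent might run.

```python
import random

class SimpleAgent:
    """A toy 'intelligent' agent: it monitors an environment, decides
    whether to react, and adapts its behaviour from the results received."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold   # how readily the agent reacts
        self.memory = []             # record of past observations (persistence)

    def perceive(self, observation):
        self.memory.append(observation)

    def decide(self, observation):
        # React only when the observation exceeds the current threshold.
        return observation > self.threshold

    def adapt(self, reward):
        # Nudge the threshold according to the feedback received (learning).
        self.threshold += -0.05 if reward > 0 else 0.05

    def step(self, environment):
        obs = environment.sense()
        self.perceive(obs)
        if self.decide(obs):
            self.adapt(environment.feedback())

class Environment:
    """Stand-in environment producing random observations and feedback."""
    def sense(self):
        return random.random()
    def feedback(self):
        return random.choice([-1, 1])

env, agent = Environment(), SimpleAgent()
for _ in range(100):
    agent.step(env)
print(f"threshold after adaptation: {agent.threshold:.2f}")
```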
As the use of software agents has become more widespread, a variety of development environments have been created for building agent-based systems, including IBM’s Aglets, Grasshopper, Jade and Zeus (see Balogh, Budinska, Dang, Hluchy, Laclavik and Nguyen 2002; Detlor and Serenko 2002). The technology is now used in academia across several disciplines, particularly in simulating multicausal situations, with examples found in the annual conference proceedings of Autonomous Agents and Multi-Agent Systems (AAMAS). It is also increasingly used in real-time industrial and military applications such as scheduling and simulation, in fields as diverse as architecture, engineering and construction (Anumba, Ren and Ugwu 2005), and in business (see Aronson, Liang and Turban 2004). Useful conceptual introductory texts include Bigus and Bigus (2001) and Weiss (2000).
Beyond using a single software agent to do something on your behalf, there are three broad technical deployments of the technology. A Multi-Agent System (MAS) is one in which a computational system allows agents to cooperate or compete with others to achieve individual or collective tasks. A Distributed Artificial Intelligence (DAI) approach is a cooperative system in which agents act together to solve a problem, and can include agents used in decision support systems where the outcome is based on interactions between human and agent input. Finally, there are Multi-Agent Based Simulations (MABS) of complex situations, although there is some debate in the field as to the degree to which MABS actually use agent technology (Drogoul, Meurisse and Vanbergue 2002).
2. Scope And Theoretical Approach
To understand the scope of the field, offer a theoretical framework in which decisions are made, and propose a theoretical test of how agent technology might be deployed in music/sound art works, useful perspectives in prior literature can be drawn on and amalgamated as a basis for subsequent understanding.
2.1. Interactivity
Interactivity is integral to agent technology deployment, either in self-contained (closed) or HCI (open) systems. The scope of music interactivity then gives a sense of the range of the field. Graugaard’s (2006: 125) interactivity schema, although not without limitations, integrates various perspectives (see figure 1).
Figure 1 Disciplinary context of interactive music.
The focus in this paper is largely on the dialectic between computer science, ‘musical’, narrative, and emotional parameters relating to agent technology given in figure 1.
2.2. Functional decision space
In making new works, creators do not begin in a vacuum. Gimenes, Miranda and Johnson (2006), for example, note that each artwork one creates is embedded in a cultural inheritance, and pieces of music reflect both composer choices and the history of influences on a work. A computer-based composer/sound artist or programmer making new work, consciously or unconsciously, is likely to draw on their view of music/sound art, the purposes for which the output is intended, and the knowledge, techniques and conventions developed through artistic and technical training. A theoretical approach to the multidimensional decision space that integrates software-based agent technology is proposed in figure 2, accepting that this neglects engineering aspects of Graugaard’s model (figure 1) that are beyond the focus of this paper.
Figure 2 Decision space.
Elaborating on figure 2, the GOALS a creator has influence the type of system built, technically and artistically. Weinberg (2005: 31–2) characterises the two possible approaches as musical structure/composer controlled (based in the late European avant-garde position) and process based (explored by American composers such as John Cage and Steve Reich). Technically, a structure-centred approach allows participants (agents and/or humans) to fulfil prescribed musical or performance outcomes. A process-centred approach concentrates on exploring possibilities or fulfilling goals through collaboration or competition, and the experience may differ with each session. Expanding linear approaches to notation or acousmatic/sound art works beyond one possible ‘frozen’ form, Chadabe’s (1996, 2004) concept of a ‘process’-based rather than ‘product’-based approach to creation is useful here, explaining how musical structures are a reflection of wider paradigms. Chadabe’s ‘non-linear’ perspective allows structures to be created by the dynamic interplay of participants’ behaviour, rather than being predetermined by scores or set forms. In addition, Dorin’s (2001) conceptual work on generative arts illustrates how, in a ‘process-based’ approach, both the form and content of new non-linear works can be outcomes of the creative process.
The LANGUAGE/KNOWLEDGE space has a continuum of choices (figure 2). A creator decides whether a work is to be notation based, or will take Smalley’s (1997) theoretical approach to spectromorphology in creating sound works, for example, or a combination of these. In addition, the creator may take the formalist view that language (sound or notation based) is self-contained with no affective/emotive attributes, or the view that it has affective/emotive elements (Juslin and Sloboda 2001; Nussbaum 2007). With any creative approach, choices are also made as to whether musical/sound language is self-contained or includes aspects of other artistic communicative elements, such as dance (see Bryant and Hagen 2003; Thaut 2005; Brown and Parsons 2008) or wider narrative storytelling. Finally, choices are made as to whether a new work is to be creator focused, or based in reception studies and/or human physiology: see Landy (1999) or Weale (2006) on electroacoustic music reception, for example, and Patel (2008), Mithen (2005), and Brown, Merker and Wallin (2001) on evolutionary approaches to explaining human musical abilities and responsiveness.
Part of choosing a language, and a reflection of that choice, is sometimes also a decision about knowledge. Many artificial intelligence (AI) approaches to music centre on four methods of applying AI methodologies to notation (see Camurri’s 1993 classification). In summary, a symbolic approach sees knowledge as represented in an appropriate language, with machine manipulation of symbols yielding new knowledge (see Cope 1991, for example). The logic approach assumes that knowledge is represented in logical formalisms, and that by manipulating this logic-based knowledge new information can be inferred. In a sub-symbolic approach, connectionist researchers hold that knowledge expressed as states and connections between simple processing units can be learned and replicated by machines. Finally, a hybrid approach begins from the assumption that knowledge is represented by an integration of sub-symbolic and symbolic systems.
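As a rough illustration of the first and third of these, the hypothetical sketch below contrasts a symbolic operation (a rule manipulating note symbols) with a sub-symbolic stand-in (a simple statistical learner over note-to-note connections); it is not drawn from any of the cited systems.

```python
NOTES = ["C", "D", "E", "F", "G", "A", "B"]

def transpose(melody, steps):
    """Symbolic: manipulate note symbols by a rule to derive new material."""
    return [NOTES[(NOTES.index(n) + steps) % len(NOTES)] for n in melody]

class BigramPredictor:
    """Sub-symbolic stand-in: learn weights on connections between
    successive notes and use them to predict continuations."""
    def __init__(self):
        self.weights = {}  # (previous note, next note) -> count

    def train(self, melody):
        for prev, nxt in zip(melody, melody[1:]):
            self.weights[(prev, nxt)] = self.weights.get((prev, nxt), 0) + 1

    def predict(self, prev):
        options = {n: c for (p, n), c in self.weights.items() if p == prev}
        return max(options, key=options.get) if options else prev

melody = ["C", "E", "G", "E", "C", "D", "E"]
print(transpose(melody, 2))   # rule-derived 'new knowledge'
model = BigramPredictor()
model.train(melody)
print(model.predict("E"))     # learned continuation
```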
In the AGENT VIEW space, technical decisions must be made as to the level of deployment of the technology and the general approach to be taken: MAS, DAI, MABS. Subdivisions of the decision space in the agent view (see figure 2, agent view modified from Consoli et al. 2006) should again be read as a continuum rather than binary oppositions, and a system might also contain a hybrid of approaches. The significant division here is whether the system is self-contained (MAS) or can include external input (DAI), particularly human input. As previously noted, agents’ adaptive learning ability and level of autonomy largely characterise the level of agent ‘intelligence’. In addition, systems might be extended by drawing on other AI techniques such as machine learning to enhance technical deployment.
2.3. Reaction and interaction
Theoretically, and as a basis for critiquing work, it is useful to have a test of the level of intelligence of agent-based systems. For this, and expanding Graugaard’s (2006) notion of interactivity (see figure 1), Paine (2002) argues that in many machine/human interactive works (Rowe 1994; Winkler 1998) the human agent(s) interacts, but the machine largely reacts. To address this, he proposes a conversational model (Paine 2002: 297) of human and machine agency. Based on the analogy of a human conversation, he notes that the relationship should be unique and personal to participants, unique to the moment of interaction, vary with unfolding dialogue, and be maintained by both parties speaking the same language and addressing the same topic. Further, one party may know the beginning point of a conversation, and while there may be a pre-existing agenda, the terrain of the conversation might not be known in advance. An interactive process is then one of exchange and of sharing ideas, and the relationship between participants should deepen over time. A greater amount of cognition is then required on the machine’s part than is normally considered in many ‘interactive’ music systems (see Winkler 1998). Further, and independent of human–machine interaction, Paine’s model might also be drawn on as a test of interaction in machine-only systems in terms of the level of their ‘intelligence’.
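Read computationally, Paine’s criteria suggest something like the following toy loop (hypothetical names, a deliberately crude sketch rather than a proposed implementation): both parties take turns, each adapts to what it hears, and a session memory lets the shared vocabulary grow as the exchange unfolds.

```python
import random

class Participant:
    def __init__(self, name, vocabulary):
        self.name = name
        self.vocabulary = set(vocabulary)   # shared 'language'
        self.memory = []                    # unique to this session

    def utter(self):
        # Prefer material the other party has recently used (shared topic);
        # otherwise introduce something from the common vocabulary.
        if self.memory and random.random() < 0.7:
            return random.choice(self.memory[-5:])
        return random.choice(sorted(self.vocabulary))

    def listen(self, phrase):
        self.memory.append(phrase)
        self.vocabulary.add(phrase)         # the shared language can evolve

def converse(a, b, turns=8):
    speaker, listener = a, b
    for _ in range(turns):
        listener.listen(speaker.utter())
        speaker, listener = listener, speaker  # turn-taking, not one-way reaction

human = Participant("human", ["motif-A", "motif-B"])
machine = Participant("machine", ["motif-A", "motif-C"])
converse(human, machine)
print(machine.memory)   # a dialogue that deepens as the session proceeds
```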
While there are drawbacks in taking an extreme approach to using new musical/sonic languages in generating form and content under this model – we have to understand something of a shared vocabulary, grammar or forms to have a conversation – the argument is a useful counter to excessively prescriptive approaches to music/sound art composition. The model may also be used to understand how common ‘musical’ languages might evolve from the dynamic input of different participants, and to aid in understanding the range of current work in ‘interactivity’ when applying agent technology that addresses parts of Paine’s paradigm. Tempering Paine’s model, it is also worth noting that not all conversations are symmetrical in terms of knowledge, participation and input quality.
3. Framework For Agent-Based Music/Sound Art
Given the range and amount of academic and industrial work taking place that uses multi-agent systems, it is surprising how little attention the tools have attracted from composers or computer scientists interested in music/sound art. In contrast, less surprising is the divergence of approaches that reflect the wide field of interest in the decision space (figures 1 and 2). Generally, technical or computer-science perspectives, rather than wider music/sound artistic disciplinary perspectives, influence many approaches taken.
Drawing on the framework presented in the previous section, four methods of using software agents in music/sound art seem apparent, grounded in similar technical and/or artistic perspectives (Whalley 2005). Put crudely, the simulation and performance/reactive methods are largely structure focused and often notation based, while the generative and generative/improvisation methods draw largely on process-based paradigms and take a broader view of language and knowledge. The examples given to illustrate each show something of the current scope of work, and the summary is necessarily partly descriptive. Aspects of computer science, music, emotion and narrative (see figure 1) are touched on here, comments on influence within the decision space are made, and the ‘conversational’ test is applied. A brief general critique of the scope of work is left to the next section.
3.1. The simulation approach
Technically, agent technology is generally well suited to simulating linear tonal music creation or realisation from motives within set forms or structures, because it deals with real-time data and multi-causal situations where the parts (‘players’) constantly adapt (react) to each other in the process of making a structural whole as a set goal. Given this, the emotional dimension of music and performance values are of less concern here. The method could also be implemented using other AI technologies, but the extent of real-time adaptation gives agent technology some advantages.
Although not widely deployed, the linear principle here could also be partially used to make acousmatic music or sound art (Smalley 1997; Wishart 2002) where the structure and/or form are predetermined but the content is decided by the agents. A simulation approach may also potentially involve an agent-based closed system replicating known styles, without human interaction, regardless of the music AI technique used.
In terms of examples, the simulation approach to tonal music, a common area of MAS research for testing out scenarios in software, is found in Nakayama, Wulfhorst and Vicari’s (2003) ‘A Multi-agent Approach for Musical Interactive System’. Here, a community of agents interacts through musical events (MIDI), simulating the behaviours of a musical group. The outcome is an intelligent accompaniment system, where agents ‘listen’ and interact with each other. Agents are ascribed a basic level of knowledge to play their instruments synchronously and satisfy their internal goals. The resulting Virtual Musician Multi-Agent System is analogous to instructing a beginning jazz band to accompany a singer based on an agreed structure and set of rules, within which individuals make musical choices as they interact with other players. While technically interesting, the work has limitations artistically: first in its narrow MIDI implementation (limiting the expressive potential of affective performance), and second because the structure is based on a logical or rule-based approach to music as language (GTTM, and Camurri’s 2003 classification), in contrast to a view of music as a medium for expression, or a language that can be expanded as part of the creative process.
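The general behaviour described can be caricatured as follows; the sketch is hypothetical (it is not Nakayama et al.’s code) and reduces ‘listening’ to snapping a heard MIDI pitch onto an agreed chord.

```python
CHORD_TONES = {0, 4, 7}   # an agreed C major harmony, as pitch classes

class AccompanistAgent:
    """Reacts to a lead line while satisfying a simple internal goal:
    stay within the agreed chord, in its own register."""
    def __init__(self, register_offset):
        self.register_offset = register_offset   # e.g. bass vs. inner voice

    def respond(self, lead_pitch):
        pitch_class = lead_pitch % 12
        nearest = min(CHORD_TONES,
                      key=lambda t: min((pitch_class - t) % 12,
                                        (t - pitch_class) % 12))
        return nearest + self.register_offset

lead_line = [60, 64, 65, 67, 72]                    # the 'singer'
band = [AccompanistAgent(o) for o in (36, 48, 55)]  # the accompanying agents

for beat, pitch in enumerate(lead_line):
    chord = [agent.respond(pitch) for agent in band]
    print(f"beat {beat}: lead {pitch} -> accompaniment {chord}")
```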
A contrasting simulation approach is found in work by Gang, Goldman and Rosenschein (1999). Here, a system was developed for generating two-part counterpoint integrating sub-symbolic (machine learning, states/connections) and symbolic music processing (see Cope 1991 on recombinicity). The work encodes musical knowledge, intuitions and aesthetic taste into different modules, captured by applying rules, fuzzy concepts and learning. Their work illustrates a hybrid AI approach made up of a connectionist module and an agent-based module. Again, the work is of technical interest, but many sound artists may see the aesthetic of tonal/notational stylistic variation as limiting.
A criticism that can be made here is that the level of agent deployment (‘intelligence’) is low in terms of agent autonomy and learning ability, making these self-contained systems largely reactive. Further, the division between the methods used to generate outcomes here and other straight rule-based systems is not clear cut at times, particularly when considering the continuum of possibilities that agent technology presents.
3.2. Generative approach
In contrast, a generative approach (see Dorin 2001 and Miranda 2003, who have also used the term) applied to either pitch/duration or sound art composition allows agent technology to create content and even structure based on the dynamic interplay of parts (a process approach). This is a non-linear perspective, where the composer creates a range of agent behaviours, sometimes based on a logic-rule approach rather than predetermining forms, and many possible outcomes and durations of works can result. Process then becomes the artistic focus, rather than a set product. These systems are generally technically self-contained once set in motion, and the language used is largely self-referential, neglecting the emotional perspective in terms of composition and performance value. Deployment may also draw on other rule-based systems such as those found in A-Life paradigms, used in combination with agent-based approaches. Some recent examples are cited to illustrate work here.
A largely generative systems approach is seen in ‘Khorwa: a musical experience with autonomous agents’ by Malt (2004), which is informed by A-Life/evolutionary systems. This installation also provides the opportunity to enter voice recordings. It is programmed in MAX/MSP rather than a common package for the development of agent systems. Consequently, the software used for implementation limits agent applications to an extent, and Drogoul et al. (2002) might debate whether there are any agents involved.
Although the ‘Mobile Agent Systems’ represented by Kon and Ueda’s work (2004) are included here, their platform remains largely aesthetically speculative. Mobile agents, in contrast to the MAS systems that most practitioners implement in music/sound art, are active autonomous objects that can execute computation in a network, migrating from one node to another. Kon and Ueda’s Andante is a prototype for the construction of a distributed application for music composition and performance based on mobile musical agents. Implemented in Java, two recent applications from this are NoiseWeaver and Maestro. Both are based on agents that generate and play stochastic music in real-time, controlled either by a GUI or by script. Kon and Ueda’s infrastructure provides a shell for using DAI technology, but the current limitation is one of computational formalism (i.e. aiming to sonify the results of a successful technological system), and it lacks widespread artistic uptake.
Extending the generative approach is the work of Gimenes, Miranda and Johnson (2006) – see also Gimenes, Miranda and Johnson (2005) – where a society of agents is created to illustrate the development of musical styles. This model is used to study human beings’ abilities and activities, toward understanding alternative routes in the historical evolution of musical styles from the perspective of music interaction and influences. To do this, agents in the system attempt to create their own worldview through aspects of music they learn, interaction with other agents, and generating new music. The system is a type of musical language lab where a user can define scenarios for evolution, and agents’ abilities grow based on Dawkins’s concept of memes. The system, rather than being intended as a strictly creative generator, develops alternative stylistic approaches based on known musical language variations. In terms of intelligence, the strength of the work is to implement some aspects of Paine’s conversational model in a machine-based system that goes beyond other machine-reactive approaches, while also allowing aspects of language to evolve as part of the inter-agent dialogue.
Finally, Eigenfeldt (2008) developed an ensemble based on a multiple-agent system programmed in MAX/MSP that generates pitch/duration polyphonic rhythmic patterns. Agents here require little input or supervision once set in motion. Based on the Kinetic Engine, the system uses networked computers and individual software agents to emulate what drummers improvising in a percussion ensemble might do. The agents can develop within and between performances (learn), evaluating their own input against ‘personality parameters’. Although intended as a creative method in contrast to the previous example, it also illustrates how a conversational approach to composition might best be applied when provided with aspects of a pitch/duration-based language in the first instance.
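A crude sketch of the idea (hypothetical parameter names, not the Kinetic Engine itself): each rhythm agent holds ‘personality parameters’ and adjusts its pattern density by evaluating its own playing against what the rest of the ensemble produces.

```python
import random

class RhythmAgent:
    def __init__(self, density, stubbornness):
        self.density = density            # preferred proportion of sounded steps
        self.stubbornness = stubbornness  # how strongly it resists the group

    def play_bar(self, steps=16):
        return [1 if random.random() < self.density else 0 for _ in range(steps)]

    def evaluate(self, ensemble_bars):
        # Compare own density with the ensemble's and drift accordingly.
        group = sum(map(sum, ensemble_bars)) / (16 * len(ensemble_bars))
        self.density += (1 - self.stubbornness) * (group - self.density)

ensemble = [RhythmAgent(0.2, 0.9), RhythmAgent(0.6, 0.3), RhythmAgent(0.9, 0.5)]
for bar in range(8):
    bars = [agent.play_bar() for agent in ensemble]
    for agent in ensemble:
        agent.evaluate(bars)
print([round(agent.density, 2) for agent in ensemble])
```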
From a technical perspective here, the theoretical advantage of deploying agent-based systems over other algorithmic rule-based systems, such as cellular automata, is the ability of agents to evolve rules in real-time (adaptation) and learn from experience. The extent to which this is applied to rule-based languages, in part, allows the deployment of parts of the conversational model of interaction.
3.3. Performance/reactive approach
A performance/reactive approach is generally represented in the application of MAX/MSP to live performance (Rowe 1994; Winkler 1998), MAX/MSP’s deployment in many human/machine interactive installations, or other software-based systems used to create various types of net-based music with real-time human input (Weinberg 2005). The term ‘reactive’ is used here to satisfy Paine’s (2002) concept of ‘interaction’.
In terms of agent deployment, these systems tend to be focused on increasing control over aspects of structure (performance ‘command and control’), have low levels of agent autonomy and learning ability, and are generally based on traditional notation approaches to self-contained musical language. They extend simulation approaches by being open systems (DAI) that allow for human input. Two recent examples that treat language as something self-contained illustrate this method.
Spicer, Tan and Tan (2003), in ‘The Learning Agent Based Interactive Performance System’, apply the idea of an artificial performer to aid in managing a complex system. Using a tonal MIDI-based pitch/duration paradigm, musical agents communicate and collaborate to produce compositions, often in conjunction with human input, where the agents make up an ensemble of virtual performers. The method allows a human performer to ‘play the system like an instrument’. The performer can alter target values so that the program will change to arrive at the user’s desired state, and the system incorporates a delta-learning rule that allows agents to respond and adapt to user input. The machine agent is based on an expert system model.
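The delta-learning idea can be sketched as below; the parameter names and learning rate are illustrative assumptions, not Spicer, Tan and Tan’s implementation. The point is that the performer steers target values while the agents converge on them over time.

```python
def delta_update(current, target, learning_rate=0.1):
    """Move a parameter a fraction of the way toward the user's target."""
    return current + learning_rate * (target - current)

# Hypothetical agent state: tempo (BPM) and note density (0..1).
agent_state = {"tempo": 100.0, "density": 0.3}
user_target = {"tempo": 140.0, "density": 0.7}

for _ in range(20):   # repeated over performance time
    agent_state = {k: delta_update(v, user_target[k])
                   for k, v in agent_state.items()}

print({k: round(v, 2) for k, v in agent_state.items()})
# The ensemble drifts toward the performer's desired state: the performer
# 'plays the system' by moving targets rather than individual notes.
```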
Spicer (2004) has also explored a further performance/reactive approach in ‘AALIVENET: An agent based distributed interactive composition environment’. This allows human performers in different locations to interact with each other via agents that they control. The system uses a client/server autonomous agent structure, client machines generating local variations of music based on updated information from human performers. The server acts as an information hub that passes on performers’ intentions to client machines. The client machines will converge on a state that reflects all performers’ intentions, but the music made at each site will differ in detail.
Both systems are limited in that the language used (pitch/duration tonal music generation) restricts the output to a small range of music/sound art possibilities. Further, the way that MIDI or synthesis is deployed in these systems neglects performance expressiveness.
3.4. Generative/improvisation approach
The most extensive deployment of agent technology, and an area that takes a broader view of music/sound art language although still largely self-referential, has taken place in the area of a process-focused generative/improvisation approach (see also Brown 2002 on evolutionary systems approaches). This allows real-time improvised human input, but rather than the machine agent simply reacting, there is human and machine adaptation (autonomy and learning) to both the generative input of the machine and the input of human agency. Again in theoretical terms, if the adaptive relationship between human and machine demonstrates balanced listening/dialogue, and little of the final output is prescribed, a conversational model of interaction is then possible.
The scope of work here is best illustrated by citing examples. Two recent deployments of the generative/improvisation approach lean towards a sound art perspective. Both use improvised human input and an embedded multi-agent system that adapts in an ongoing dialectic, either net based or installation based. Chen and Kiss (2003) used a multi-agent system as part of an installation in Quorum Sensing. This work metaphorically reconstitutes an ecosystem with synthesised sounds and images. It makes use of a multi-agent system that responds to the input of visitors to the site, the source images and sounds containing objects that can be perceived, created, destroyed and modified by the software agents.
Whalley (2004) also uses an embedded multi-agent system in the PIWeCS project, a Public Interactive Web-based Composition System. This is an internet-based structure that increases the sense of dialogue between human and machine agency through integrating intelligent agent programming with MAX/MSP. Human input is through a web interface. The ‘conversation’ is initiated and continued by participants through arrangements and composition based on short performed samples, and the system allows the extension of a composition through the electroacoustic manipulation of source material by human or machine. A limitation of the method in terms of the conversational test is in using predetermined material, rather than letting the machine decide on some of the source content.
Both of these generative/improvisation systems enhance machine agency, attempting to add a sense of machine cognition to interactive works. However, the degree of autonomy of the agent systems and the extent of interaction differ between them. Chen and Kiss (2003) allow greater autonomy over agent decision-making regarding content, but their system has no memory to allow an evolving dialogue. Whalley’s (2004) MAS is built on an expert system, and the memory function allows for machine accumulation of user patterns, facilitating a conversation that is unique to each session and a user/machine dialogue that develops through mutual understanding over time, in line with the conversational model.
Three other systems expand the general approach here in terms of agent ‘intelligence’.
A partly speculative system is Edwards, Murray-Rust and Smaill’s (2006) MAMA (Musical Acts – Musical Agents), a generative architecture for interactive musical agents. The Musical Acts method is based on Speech Acts theory (simply defined: in the act of saying something, we do something). The aim is to make a collection of agents that can improvise music (notation based) with each other or with human input, embedded in a language through which agents can represent and reason about music. Their system provides an agent-based ‘Musical Middleware’ for composers and musicians, with some constraint on input, and a set of general style libraries geared towards creating music. Using a logic-based approach to AI built on Musical Acts allows for computational efficiency, rather than each agent having to listen to and interpret every other agent, and allows for the possibility of analysing musical acts and gestures from human input. At least the system acknowledges that there are different types of improvisational approaches to form and content, and that conversations start from clusters of known language and grammar. The theoretical limitation is that, in prescribing parts of the language, one prescribes something of the gamut of possible conversational outcomes.
A recent system introduced by Beyls (2007) is found in ‘Interaction and Self-Organisation in a Society of Musical Agents’. Again partly speculative, particularly regarding human input, this is a distributed architecture where musical agents interact according to mutual affinities in a virtual world. Here, agents continuously exchange information in their neighbourhoods while self-organisation takes place. The ‘society’ may function on a continuum between autonomy and man–machine interaction, providing ‘an adaptive musical playground’. Within this, agents associate spontaneously, creating temporary emergent structures, the result of perpetual self-production in line with the theory of autopoiesis (a self-maintaining system that produces and replaces its own components and distinguishes itself from its environment). The great potential of the system, based on biological models and implemented through an array of AI techniques, is that it puts forward the possibility of human/machine interaction based on non-idiomatic improvisation. Despite being embedded in MIDI streams and pitch/duration methods, the system potentially presents a flexible means of implementing more of the conversational model; as fewer musical elements are prescribed (although behavioural boundaries can be specified), the artistic input becomes a more integral aspect of an evolving two-way dialogue, and an independent musical ‘personality’ can potentially emerge from agent input.
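A speculative toy reading of affinity-based self-organisation (not Beyls’s implementation) might look like this: each agent drifts toward neighbours it has high affinity with, so temporary clusters, the ‘emergent structures’, form and dissolve over time.

```python
import random

class SocialAgent:
    def __init__(self):
        self.state = random.uniform(0, 127)   # a pitch-like musical state

    def affinity(self, other):
        # Affinity grows as two agents' states get closer.
        return 1.0 / (1.0 + abs(self.state - other.state))

    def interact(self, others):
        neighbours = [o for o in others if self.affinity(o) > 0.05]
        if neighbours:
            centre = sum(o.state for o in neighbours) / len(neighbours)
            self.state += 0.1 * (centre - self.state)   # drift toward the cluster
        else:
            self.state += random.uniform(-5, 5)         # wander alone

society = [SocialAgent() for _ in range(12)]
for _ in range(200):
    for agent in society:
        agent.interact([o for o in society if o is not agent])
print(sorted(round(a.state, 1) for a in society))   # clustered values emerge
```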
Finally, Miranda and Gimenes (2008) have introduced iMe (Interactive Music Environments) in ‘An A-Life Approach to Machine Learning of Musical Worldviews for Improvisation Systems’. This extends the work described under the generative approach (above) by Gimenes et al. (2006) and develops techniques beyond genetic algorithmic methods (see Biles’ 2003 GA taxonomy) by using adaptive learning agents that interact with ongoing human input, generating music autonomously and in real-time. The system again takes a cultural view (memes), seen as being reflected in a musical style/worldview established by patterns and artefacts produced by human behaviour resulting from constrained choices. As in previous work, the research question raised is how different musical influences lead to musical worldviews based on interaction between agents, but this time adding influence from external systems and human input. Agents’ perception and abilities can change over time based on their ability and experience (memory, structure of memory, and forgetting) of musical input, the system allowing one to see how different musical elements and the balance between them define a musical style. The work here allows for creating mechanisms/agents with cognition through which new styles can emerge, rather than relying on fixed rules from known styles, extending Cope’s (1991) fixed pattern-based work, and extending Rowe’s (1994) work to foster a greater sense of interaction in line with the conversational model.
4. Scope, Challenges, Pointers
4.1. Scope and decision-making space
As noted, and in light of Graugaard’s (2006: 125) overview (figure 1) and its augmentation for discussion here (figure 2), the balance of work in applying intelligent agent technology to music/sound art is largely grounded in computer-science technical perspectives, or in biologically influenced technical perspectives. Further, although AI techniques have been applied to the form and content of both linear and non-linear idioms, work has largely been embedded in tonal pitch/duration views of language/knowledge, with the few explorations that move beyond conventional western musical language replication or stylistic variation allowing more artistic flexibility in terms of sonic languages, even if the language is treated as self-referential. Until recently, the technical focus has also favoured machine-based (closed) or controlled systems, rather than a real-time dialectic between machine and human agency.
Given the current thrust of research work, technical approaches are likely to continue and to be extended, neglecting other possibilities that might be usefully explored as part of this process. Two areas on Graugaard’s model that need further exploration are as follows.
Beyond the narrative structures associated with linear tonal music or other pitch/duration languages, an area that calls for consideration or a taxonomy is the narrative dimension of interactivity (see figure 1): what new structures might emerge from using new sound languages and/or non-linear processes, whether predetermined in full or in part, or constructed solely as an outcome of the interplay of agent deployment. The research question here concerns the relationship between current agent work and the generic dramatic narrative structures found in other time-based art forms, which are both a part and a reflection of human experience.
A starting point for this discussion that might be useful for future agent-based work is Booker’s (2004) The Seven Basic Plots: Why We Tell Stories. Booker argues that regardless of place, time or culture there are recurring narrative forms that are expressions of the human condition. In these terms, people retell old stories in new ways, regardless of language variations. Similarly, system dynamics modelling (Senge 1992) provides a set of narrative structures that are a long-standing part of both time-based arts and real-world situations that could be drawn on (see Whalley 2001) to extend the discussion, Senge arguing that there are recurring dramatic narrative archetypes that can be coupled together in different ways to reflect narrative structure and the tension/relaxation that dynamically results from them.
Perhaps most importantly, however, a fruitful area for further investigation, largely ignored in the recent agent-technology exploration process, is the affective/emotional dimension of music/sound art given in Graugaard’s (2006) model: the gap between interpreting music/sound art language in the technical way that practitioners tend to do, and what it means in affective terms to a wider range of people. The significance of this is that emotional interpretation is an essential part of human communicative intelligence, carried by languages that are sometimes shared by both practitioners and lay people. To point to a possible future for integrating technical and human reception perspectives in continuing work, the proposal in this paper is then first to attempt to include an affective dimension in agent-based work to redress the imbalance.
4.2. Affective music: research challenges and departures
A starting point in integrating affective music approaches is the assumption that music/sound art composition and performance is a semiotic system that signifies meaning to elicit emotional responses (Gabrielsson 2001; Mithen 2005). This is not to imply that the mechanism by which this happens is a straightforward one-to-one system, or that meaning cannot be multiple or change with the context of reception or time, or that the chain from composition to performance to reception is always stable. The argument is simply that there is a robust enough theoretical and empirical basis to attempt this in music composition and performance in western music practice (see Juslin 2001; Scherer and Zentner 2001), and that there is a successful history of practically implementing aspects of the approach in media such as film music in western culture (Cohen 2001). Recent work in musical and emotional mapping also provides a basis to implement this technically (see Gabrielsson and Lindstrom 2001; Schubert 2001; Scherer 2004). Similarly, there is now significant theoretical or empirical work on emotion and meaning in music that reinforces the basis of this approach (Peretz 2001; Patel 2008: 300–51), and this is supported by research on musical embodiment and rhythm previously cited (Bryant and Hagen 2003; Thaut 2005; Brown and Parsons 2008).
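By way of illustration only, a coarse feature-to-affect mapping in the spirit of this literature might look like the sketch below; the thresholds and weightings are placeholders, not values taken from the cited studies.

```python
def estimate_affect(tempo_bpm, mode, loudness):
    """Return a rough (valence, arousal) pair, each clipped to -1..1,
    from three coarse musical features."""
    arousal = (tempo_bpm - 100) / 80 + (loudness - 0.5)
    valence = (0.5 if mode == "major" else -0.5) + (0.2 if tempo_bpm > 110 else -0.1)
    clip = lambda x: max(-1.0, min(1.0, x))
    return clip(valence), clip(arousal)

print(estimate_affect(tempo_bpm=140, mode="major", loudness=0.7))  # bright, energetic
print(estimate_affect(tempo_bpm=60, mode="minor", loudness=0.3))   # dark, subdued
```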
The technical research challenge is then how to implement an affective approach within the interactive/agent-based paradigm (see Nakamura, Numao and Takagi 2002 as an early starting point), linking composition, performance and reception. Recent work that can be drawn on here has already begun to appear.
4.3. Pointers: affective composition, performance and reception linking
Affective computing is a growing field of interest in computer science research, and understanding how music and emotion matching in new media might be incorporated into technical systems is an emerging field in multimedia research. Recent explorations are found in Li and Shan (2007), Homer et al. (2007), and Wang, Zhang and Zhu (2004). Cai, Wei-Ying, Wang, Zhang and Zhang (2007), for example, present a system that automatically suggests music when users read web documents, matching songs to a document’s content in terms of the emotions expressed. These explorations attempt to update, in non-linear new media, what film composers were sometimes skilled at in older linear media. The concern is largely with mapping existing music to trigger emotional responses.
Mapping emotion to real-time musical expression in performance is also a continuing part of computer music research that might usefully be drawn on. A recent example, and one that uses agent technology, is found in Coutinho, Miranda and da Silva (2005). This addresses the issue from an A-Life standpoint combined with neuroscience’s perspectives on emotion. The method partly uses an agent-based system to model different emotional influences, and then plays MIDI files expressively according to an adaptive process.
The area of emotional embodiment has begun to be addressed in authoring systems: two musically conservative examples include Whalley’s (2002) ‘Kansei in Non-Linear Automated Music Composition: Semiotic Structure and recombinicity’, and the work by Hashimoto, Kurihara, Legaspi, Moriyama and Numao (2007) in ‘Music Compositional Intelligence with an Affective Flavor’. Whalley (2002) proposes a software-based automated composition system operated by a single user to create music for closed-system dramatic narratives where the dramatic parameters are known but the dramatic shape and outcomes are not predetermined. The concern is with a system that will address Kansei (emotion-based; see Hashimoto 1998) approaches to narrative structure, musical generation and performance. The model outlined allows for music creation by controlling an emotional ‘flight simulator’ interface that represents affective states, rather than dealing directly with the composition process, allowing non-composers to ‘recompose’ or explore a work in different ways.
Extending this is the recent offering from Hashimoto et al. (2007), who note that human feelings have received little attention in automated music generation by intelligent music systems. Their work aims for a method of musical compositional intelligence that is directly linked with the listener’s affective perceptions. The system induces a model describing the relationship between feelings and musical structures, learned by using the inductive logic programming paradigm in FOIL (see Quinlan n.d.), ‘coupled with the Diverse Density weighting metric over a dataset that was constructed using musical score fragments that were hand-labelled by the listener according to a semantic differential scale that uses bipolar affective descriptor pairs’. From this, a genetic algorithm based on the acquired model and following standard notational theory generates variations of the original musical structures. Although the range of emotional impressions from the music is restricted (favourable–unfavourable, bright–dark, happy–sad, and heartrending–not heartrending), their system is at least able to classify and generate impressions with reasonable accuracy.
Finally, a more successful integration of the affective approach linking composition, performance and reception is found in the recent work of Brown, Livingstone and Muhlberger, ‘Controlling Musical Emotionality: An Affective Computational Architecture for Influencing Musical Emotions’ (2007). Their system, using western tonal music, demonstrates an affective computing architecture where music is dynamically modified to predictably affect induced musical emotions. The advance over other systems, based in research on musical/emotional mapping from reception studies, is that the system aims to reliably control both perceived (what is sent) and induced (what is received) musical emotions. This is a rule-based system used to modify a subset of musical features for both composition and performance. It is interactive in that it leverages the listeners’ affective sense by adapting the emotionality of the modification made to the music in real-time towards assisting the listeners to reach a desired emotional state. This provides a basis, in an adaptive environment, to implement what film music composers partly attempt to do instinctively in a linear film-music environment.
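In the same spirit, and only as a hedged sketch of the general architecture (feature names and rules are assumptions, not the cited system), an adaptive rule-based loop might repeatedly nudge musical features toward a target emotional state.

```python
def adapt_music(features, current_affect, target_affect, rate=0.2):
    """features: 'tempo' (BPM), 'mode' ('major'/'minor'), 'loudness' (0..1).
    current_affect / target_affect: (valence, arousal) pairs in -1..1."""
    d_valence = target_affect[0] - current_affect[0]
    d_arousal = target_affect[1] - current_affect[1]

    # Rule 1: arousal is steered mainly by tempo and loudness.
    features["tempo"] += rate * d_arousal * 40
    features["loudness"] = min(1.0, max(0.0, features["loudness"] + rate * d_arousal * 0.5))

    # Rule 2: valence is steered mainly by mode.
    if d_valence > 0.3:
        features["mode"] = "major"
    elif d_valence < -0.3:
        features["mode"] = "minor"
    return features

state = {"tempo": 90.0, "mode": "minor", "loudness": 0.4}
listener_estimate = (-0.4, -0.3)   # estimated current valence/arousal
goal = (0.6, 0.5)                  # desired emotional state
for _ in range(5):   # in practice the listener estimate would be re-measured each pass
    state = adapt_music(state, listener_estimate, goal)
print(state)
```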
An accepted limitation of current work here is that it is largely based in western tonal languages. An area that remains to be explored in affective music/sound art is then a taxonomy mapping affective emotional responses to a wider range of sound art languages that could be drawn upon. This is partly a reflection of the general neglect of this area in research based in sound art.
5. Towards A Model For Future Work
In recent music/sound art works that use agent technology, the generative improvisation approach demonstrates the greatest potential to extend and broaden music making that is process-based. This is because the method moves beyond the limitations of non-adaptive rule-based systems, and combines the best of both human and machine agency to control both performance and composition.
In addition, agent technology has the potential to further enhance music/sound art creative practice through allowing machines to take on greater ‘intelligence’ in the interactive process by increasing the level of autonomy, adaptability and learning ability of agent technology deployment. Through this technical underpinning, the conversational model of human/machine interaction can be more fully explored. The questions then concern what the conversations are about, and what musical language is used to enact them.
In addressing ‘what’, the conversational model is best embedded in machine agency and adaptable rules based on an affective taxonomy that is part of human communicative intelligence, expressed through an affective music paradigm that links music composition and performance with reception research. The theoretical or empirical basis for this is covered in the previous section, and some recent technical deployments have been described to demonstrate how it might be implemented in agent-based work.
In line with the conversational model, the music/sound art ‘language’ used in a work needs to be embedded not only in affective reception rules but also in the associated language used to convey affective meaning that is familiar to the intended audience, in order to facilitate dialogue between human and machine agency. The music/sound art language chosen must also be able to be varied to convey recurring ideas in new ways to sustain interest, and be expanded within the reception paradigm to enhance education, experimentation and engagement. The machine agency, in balancing familiar and unknown languages, should then allow each interactive session to develop something of its own language. Machine agency can then lead or follow in the interactive process with human agency, acknowledging that not all conversations are symmetrical in terms of knowledge and participation.
In terms of the research ‘stretch’ of the proposal here, two areas require attention in line with Graugaard’s (2006) model of interactivity. The first is the narrative dimension, with some possible directions for future work outlined in the prior discussion. Second, but beyond the scope of this paper, is human gesture input sensing/capture/mapping beyond the simple computer/mouse input method of reflecting 2D affective human intention (see Schubert 2001). The field of HCI research could be drawn on here to allow for more extensive communicative input to music/sound art based agent systems by non-musicians.
In these terms and as a proposed first step, a hybrid agent system based on human/machine generative improvisation that includes an affective conversational model based in known and evolving languages, at least, presents opportunities to draw together and extend multidisciplinary work in agent-based systems applied to music and sound art, redressing the current lack of balance between technical and human reception perspectives in the field.