INTRODUCTION
When individuals gather together around a table, eating becomes not only an opportunity for nourishment, but also for sociability. In Korea, the practice of common eating is recognized as a cultural hallmark: people not only share a table, but also eat from the same dishes. Togetherness is thus a key feature of Korean eating. In recent years, however, the traditional practice of eating together has taken on a new multimodal form among Korea's younger generation. Viewers turn on their electronic devices and watch mukbang, a Korean livestream of eating, via an online broadcasting website. Mukbang is a digital dinner table where an individual known as a broadcast jockey (BJ) displays an array of mouthwatering dishes and enjoys eating them as hundreds of viewers watch. The broadcaster and viewers multimodally communicate with each other: the eater speaks to the viewers through the livestream camera while eating, and the viewers type real-time comments to each other and to the eater through a live chat room. Thus, mukbang provides and supports a virtual platform for sociable eating in which participants’ roles are asymmetrical but mutually dependent. In this article, I examine how mukbang introduces new virtually and multimodally integrated ways to eat together, and I discuss what these patterns of coordinated multimodal involvement can tell us about digital social practice.
The discourse(s) surrounding food and eating have received a great deal of research attention from various perspectives. ‘Foodie’ discourse and nutrition have been examined within larger contexts of culture, society, identity, and health practices (e.g. Wiggins, Potter, & Wildsmith 2001; Gough 2007; Buccini 2013; Szatrowski 2014; Mapes 2015), and mealtime itself is a frequently studied discourse context in interactional sociolinguistics. For example, Erickson (1982) investigates how an Italian American family manages topical coherence and floors in dinner table talk; Tannen (2005) evaluates how friends use different conversational styles during Thanksgiving dinner; and Blum-Kulka (1993) compares the storytelling strategies of American families and Israeli families at dinner.
Scholars have recently begun to consider how the act of eating itself constructs, and is constructed by, interaction. In her studies of recorded mealtime conversations, Wiggins (2002, 2013, 2014) examines how gustatory expressions like ‘yuck’, ‘mmm’, or ‘eugh’ are embodied with eating actions to show tastes and preferences. Wiggins and colleagues (2001) show how eating is not merely a matter of nutritional consumption, but a social practice constructed through the ways in which people offer, evaluate, urge, and negotiate eating food. Some studies have considered how people talk about food, drink, and eating practices in online space, such as in restaurant reviews (e.g. Vásquez & Chik 2015), online discussions on picky eaters (Gordon & İkizoğlu 2017), and conversations about coffee on Twitter (Zappavigna 2014).
In this study of mukbang, I extend analysis to multimodally coordinated eating.¹ No study has yet examined how the act of eating itself may be multimodally and jointly conducted, linking online with offline worlds through mutual participation. As more and more social actions and activities are taking place online, from advice-giving (e.g. Stommel 2008; Morrow 2012; Gordon 2015) to role-play (e.g. Campbell 2003) to language learning (e.g. Ware & Kramsch 2005; Akiyama 2016), we need to understand the linguistic and multimodal strategies that enable the coordination and accomplishment of such activities.
To date, most studies on digitally mediated multimodal activity have focused primarily on text and, to some extent, images such as emojis, but they have not considered the more complex multimodal context of video-streaming while communicating through a chat box. The present study attempts to do so, drawing on insights from interactional sociolinguistics (Gumperz 1982) and conversation analysis (CA) studies regarding recruitments and multimodal interaction. I bring together Goffman's (1981) classic concept of footing, a term that covers various forms of interactional alignment; the CA notion of recruitment, that is, getting others to do things in interaction (e.g. Rosaldo 1982; Drew & Couper-Kuhlen 2014; Kendrick & Drew 2016; Zinken & Rossi 2016); and Tannen's (1989/2007) concept of interactional involvement. Together, these concepts help clarify how the joint activity of sociable eating is multimodally achieved in mukbang. The analysis highlights how mukbang participants jointly coordinate their actions through speech, written text, and embodied acts, and how this coordination establishes involvement among the mukbang participants. The analysis also shows that the co-construction of sociable eating both depends upon and recreates a symbiotic relationship, which is characterized by moment-by-moment negotiations between the viewers and the broadcaster.
The organization of the article is as follows. I first offer some background on what mukbang is and how it relates to traditional Korean models of social eating. I go on to provide an overview of theoretical concepts that inform my approach to multimodal analysis: footing, recruitments, and involvement and joint action. I then introduce the mukbang participants and describe my methods for data collection and display. I offer examples of three kinds of collaborative eating: eating through recruitment, eating as constructed action, and eating as busking. I examine the significance of these forms of involvement in coordinating sociable eating, and I consider how these strategies contribute to participants’ coordinated activities and establish and maintain symbiotic—but asymmetrical—relationships. Finally, I suggest that this sociolinguistic examination of mukbang has wider implications for our understanding of sociability and agency in today's multimodally interactive online environments.
BACKGROUND INFORMATION
What is mukbang?
먹방 (mukbang) is short for 먹는방송 (muknunbangsong): 먹는 (muknun) combines the verb 먹다 (mukda) ‘to eat’ with a relativizer suffix, -는 (-nun), and thus characterizes 방송 (bangsong) ‘broadcast’. Thus, mukbang means, roughly, ‘a broadcast where people eat’. These mukbang broadcasts typically feature a solo eater, who consumes a large meal consisting of several dishes and speaks through a camera while viewers watch online and type comments through real-time chat. Sometimes cooking and cleaning up are also included in the broadcast, but the primary focus of mukbang is the eating experience, including the visual display of plates or bites of food, the amplification of eating sounds, the use of gustatory expressions (see Wiggins 2002, 2013, 2014), and sometimes additional production resources such as music, costumes, and lighting effects.
Mukbang broadcasts are produced and watched live on platforms like Afreeca TV,² which allow anyone to stream or watch livestreams on a variety of topics. These include TV shows, gaming, singing, talking, cooking, and how-to, but mukbang is among the most popular broadcast offerings. The broadcast jockeys, or BJs, create the livestreams. During live broadcasts, viewers can reward BJs with star balloons, a form of internet currency that can be converted into real cash; one star balloon was worth approximately eight US cents at the time the research was conducted. The monetary benefits for popular BJs can be significant, and for some, livestreaming can become a main or sole source of income. Thus, BJs are often motivated to create fun, compelling content to attract more viewers.
Mukbang has been popular in Korea since the late 2000s. News articles (AFP/Relaxnews 2013; Choi 2015; Hu 2015) posit various reasons for the popularity of mukbang among young Koreans. First, mukbang may resemble what western audiences know as ‘food porn’: it gives viewers vicarious satisfaction, especially through the sensory stimulation provided by visual and audio representations of eating. Care is given to the display of food, and mukbang BJs may intentionally eat loudly, sometimes increasing their microphone volume to dramatize their eating sounds. In the YouTube comments section of mukbang clips, viewers often express appreciation or gratitude to mukbang BJs. Some excerpted comments in English and Korean include ‘omggg so excited!! my fav foods’, ‘love the slurping sound he makes’, and ‘다이어트하는데 대리만족 되네요’ (‘it vicariously satisfies me while I am currently on a diet’). Comments like these suggest that mukbang BJs may help to sate the food cravings of those who cannot or do not eat such elaborate meals.
Second, mukbang is also a new way to be with other people and fulfills a desire not to eat alone. Eating together lies at the heart of traditional Korean food culture: the prototypical Korean meal involves a family gathered around a common table to share numerous communal dishes. Thus, the prospect of an individual eating alone has long been the object of culturally perceived stigma. As single-person households in Korea are gradually increasing (see Kim & Lee 2014), so is the phenomenon of isolated eating. But mukbang provides an alternative, enabling a feeling of togetherness for those physically eating alone. Many young people use mukbang as their new eating companion. As an interviewee from AFP/Relaxnews (2013) notes, watching mukbang is like going to dinner with someone, making viewers feel emotionally connected. Mukbang participants often refer to this mediated co-presence in chat messages like ‘아놔 저도 방금 시킴 같이 먹어요~ ’ (‘omg I just ordered food as well let's eat together~’) or YouTube comments such as ‘저 지금 점심 먹으면서 보고있어요!’ (‘I am watching this while eating my lunch!’). Through mukbang, technology makes possible online what is traditionally considered to be possible only offline, and this reality has social and emotional meaning for participants. It is thus important to examine how this practice of eating together while apart is multimodally and collaboratively accomplished in digitally mediated contexts.
THEORETICAL FRAMEWORK
Footing
Goffman's (1981) concept of footing, or alignment, addresses the ways in which participants situationally orient themselves toward, and take part in, the meaning-making process of ongoing interaction. When we change footing, in Goffman's (1981:128) words, ‘it implies a change in the alignment we take up to ourselves and the others present as expressed in the way we manage the production or reception of an utterance’. Changes in footing show how interactional dynamics shift moment by moment, and these changes are accomplished through what Gumperz (1982) calls ‘contextualization cues’ that signal how speakers mean what they say and do. These cues include linguistic ones, such as lexical items and syntactic structure, as well as paralinguistic ones, such as tone, pitch, laughter, and nonverbal actions such as gesture, motion, and gaze.
Recruitments
The interactional project of getting others to do things is known in conversation analysis as recruitment (e.g. Clayman & Heritage 2014; Drew & Couper-Kuhlen 2014). Recruitments have their own discursive sequence of verbal and/or nonverbal actions: requesting or soliciting, accepting the request or solicitation, and solving. Recruitments are not necessarily initiated by those who are in need, and voluntarily offering help to another person is also an act of recruitment.
Kendrick & Drew (2016:2) note that ‘recruitment lies at the very heart of cooperation and collaboration in our social lives’. They conceptualize recruitment as a set of organized practices: those employed by Self (the person seeking assistance or to whom assistance is offered) to indicate their own difficulties, and those employed by Other (the person extending assistance) to offer solutions. In a study of everyday conversation among family and peers, Zinken & Rossi (2016:20–21) describe recruitment as a contribution that ‘can be expected on the basis of already established commitments’. They note that Other's engagement cannot simply be characterized as assisting, but also as contributing to a joint activity. In one of their examples, two people take up the task of cooking together. Sofia peels potatoes and puts them on a cutting board; her action recruits Paolo to cut the peeled potatoes, and Paolo's cutting of the potatoes indicates his acceptance of her offer. According to Zinken & Rossi (2016:26), ‘the established commitments and respective roles in the joint activity can function as an engine that progresses the sequence to its relevant outcome’.
As the prior example shows, recruitments need not be constructed with directive language. Drew & Couper-Kuhlen (2014:7) note that the act of asking can be accomplished through ‘half-spoken turns’ or ‘gesture, body position, a look in a certain direction’ without speaking. Goodwin & Cekaite (2014) also suggest that recruitments are multimodally embodied in interaction. For example, they demonstrate how parents issue directives with gaze, gestures, facial expressions, and intonation to induce their son to brush his teeth and get ready for bed.
Other scholars have examined how cultural and social elements such as gender, age, and roles contribute to hierarchical relationships among interactants and affect how recruitments are realized. Rosaldo (1982) observes that in the Ilongot community, women receive commands to do domestic chores from men, and that children, according to their age hierarchy, receive directives from their seniors or parents and pass them on to younger siblings. Sicoli (2018) describes how certain types of recruitments are managed in a Lachixío family when a father complains about over-cooked tortillas and indirectly requests a drink of water (i.e. ‘a little water will help one get it down’). The other family members display their alignments by making eye contact with one another and using the directive form of recruitment. When the mother says to the son, ‘go get water for your father to drink’, the son passes the obligation to his sister: ‘now it's you because you're a woman’. As they jointly work to resolve the recruitment sequence, the participants’ verbal and nonverbal actions both construct and display a series of hierarchical obligations.
In sum, then, the accomplishment of recruitment is jointly constructed and negotiated moment by moment through verbal and nonverbal cues, and it at once shapes and is shaped by participants’ interactional roles and relationships.
Involvement and joint action
Tannen's (1989/2007) notion of involvement can be characterized as connection that binds speakers and listeners together. The connection is linguistically created and sustained through active participation and engagement with participants, which presupposes that the participants have mutual understanding built upon background knowledge.
Tannen (1989/2007) notes that involvement is achieved in interaction, not given. In creating involvement in interaction, sound and sense work together. On the sound level, she cites ‘rhythmic synchrony’ from the study of conversational synchrony (e.g. Birdwhistell 1970; Scheflen 1972; Kendon 1981), which refers to ‘the astonishing rhythmic and iconic coordination that can be observed when people interact face to face’ (Tannen 1989/2007:32). Utterances and movements of listeners are synchronized with those of speakers. Erickson & Shultz (1982), as cited in Tannen (1989/2007), demonstrate the shared conversational rhythms through which participants speak, pause, listen, or make physical actions on the beat.
On the sense level, Tannen identifies ‘constructed dialogue’ as one linguistic involvement strategy. She argues that what is often called ‘reported speech’ should in fact be thought of as constructed dialogue, because we cannot represent another's words exactly as they were spoken and, even if we could, the words are necessarily recontextualized by the new speaker's voice and the current context. Tannen also identifies ‘ventriloquizing’ as a special case of constructed dialogue where people borrow the voice of someone or something in the presence of that someone or something to achieve certain communicative goals such as criticizing. Scholars have examined ventriloquizing and ventriloquizing-like dialogue in various interactions: parents’ use of a baby talk register (e.g. Gordon 2009), adult family members’ speech as and to their family dogs (e.g. Tannen 2004), and veterinarians’ use of animal-authored talk with their nonhuman patients (e.g. Roberts 2004; MacMartin, Coe, & Adams 2014).
Tannen's discussion of involvement primarily focuses on verbal exchanges in spoken and literary discourse. A second, multimodally oriented approach to interactional involvement focuses primarily on embodied actions (e.g. Streeck, Goodwin, & LeBaron 2011) and ‘participation’ (e.g. Goodwin & Goodwin 2004). Scholars examine how people use bodily resources and collaboratively organize their bodily conduct to create joint courses of action during strips of talk in different kinds of contexts. For example, Clark (1996, 2006) demonstrates how ‘joint commitment’ is constructed when people walk on a crowded street or assemble a TV stand together; Kendon (2009) shows how romantic partners collaborate with each other when kissing; and Aoki (2011) examines how Japanese people nod to signal and elicit responses. As van Leeuwen (2015) points out, interaction is de facto multimodal where different semiotic modes are combined and integrated; where interaction takes place, participants are expected to ‘work together’ multimodally to create involvement.
There is a small but rapidly growing body of research on co-constructed talk in technology-mediated environments. Herring and colleagues (e.g. Herring 2015; Herring & Demarest 2017) call attention to the emergence of interactive multimodal communication, emphasizing how technological affordances allow more participatory and multimodal interaction between online and offline spheres. For instance, Keating & Sunakawa (2010) show how gamers bodily and verbally collaborate across two spatial environments (i.e. a gaming world on a computer and a physical world in real life). Gordon (2015) presents an online weight-loss discussion forum where users co-create small stories and solve problems for one user who had an uncomfortable medical encounter with a doctor who made an unwelcome comment about her weight.
Involvement is mutually and reciprocally created through multimodal resources in interaction. I bring Tannen's (1989/2007) sense of involvement in verbal contexts to the analysis of multimodal interaction. I explore how language use and collaborative actions are juxtaposed in multimodally mediated online contexts, and I show how this coordination creates involvement and, by extension, establishes both community and social agency.
So far, I have reviewed the overarching interactional concepts that organize my approach to understanding the ecology of mukbang interaction: footing, recruitments, and involvement in multiple contexts. Now I turn to the data, introducing the mukbang participants as well as the data collection and display method.
MUKBANG PARTICIPANTS
BJ ChangHyun
BJ ChangHyun regularly made mukbang broadcasts on the popular livestreaming website Afreeca TV. Like many other BJs, he recorded the livestream and then uploaded it to YouTube for others to watch later. During the time I collected data, his approach to archiving his materials on YouTube was unusual, because he also consistently included the chat messages contributed by viewers during the livestream session—they appear in a chat screen in the upper right corner of the mukbang screen (see Figure 1). Most BJs’ YouTube videos of mukbang excluded this crucial part of the livestream process, and without the chat messages, the picture of verbal and nonverbal actions is incomplete. ChangHyun's extensive mukbang archive with accompanying chat messages preserved the complexity of the livestream as it unfolded in real time and is thus an ideal data set for multimodal analysis.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190322010249983-0356:S0047404518001355:S0047404518001355_fig1g.jpeg?pub-status=live)
Figure 1. BJ ChangHyun reading chat messages during mukbang.
During the time this study was carried out, BJ ChangHyun hosted mukbang at his apartment every night at 11pm. He usually bought carry-out or ordered food for each segment and only very occasionally cooked. As of December 2015, his YouTube channel had 221,356 subscribers and contained 505 uploaded videos of mukbang.
Viewers
When BJ ChangHyun livestreamed his mukbang in 2015, an average of 1,400 people watched each broadcast. It is hard to know exactly how many viewers participated in chatting while watching because the Afreeca TV website does not count viewers and chat participants separately. Given the average number of viewers and the fast pace of messages on the chat screen, I estimate that hundreds of viewers participated in chatting during each of his mukbang sessions. Two types of mukbang consumers can be distinguished: visible viewers and invisible viewers. Visible viewers are those who participate in chat messaging, so their presence is apparent through the chat messaging screen. Invisible viewers are those who watch without chatting at all, including potential viewers who watch recorded clips through YouTube. In this study, only visible viewers are considered.
METHODS FOR DATA COLLECTION AND DISPLAY
I examined sixty-seven mukbang videos that BJ ChangHyun uploaded to YouTube between September and October 2015. BJs may broadcast and respond to viewers’ chat messages during three phases of mukbang: preparing, eating, and cleaning up. Only the second phase is obligatory, however, and most hosts record and upload only the eating portion to YouTube. The examples considered here are thus excerpted from the phase in which ChangHyun eats while interacting with his viewers.
In the transcribed multimodal interactions I turn to next, there are two parties: BJ ChangHyun and his viewers. I categorize their participation into three parts: (i) I include screen captures of ChangHyun's visible bodily actions; (ii) I describe his bodily actions including facial expressions in English; and (iii) his speech and the typed comments of his viewers are displayed in Korean, followed by idiomatic English translation in single quotation marks. Each chat message is shown with a participant's pseudo ID. Where spoken, physical, and typed actions are simultaneous, they appear on the same row in the transcript. BJ ChangHyun's actions are numbered by Arabic numerals, whereas viewers’ actions are denoted by Roman alphabet letters. I do not translate Korean online laughter, which is conventionally typed through repeated use of the Korean consonant letter <ㅋ> (kieuk), equivalent to the English letter <k>.
COLLABORATIVE EATING
In what follows, I present three types of eating actions jointly undertaken by BJ ChangHyun and his visible viewers: (i) eating through recruitment, (ii) eating through constructed action, and (iii) eating as busking. These are not the only forms of multimodal interaction that contribute to collaborative eating. But they are the ones that I repeatedly encountered, showing the prominent ways in which people co-construct involvement through mukbang. I argue that these negotiated forms of involvement have important consequences, connecting the host and his viewers in a symbiotic relationship while also establishing a form of collaborative social agency.
Eating through recruitment
The following examples illustrate two types of recruitment in mukbang interaction. The first example shows a viewer, as a recruiter, telling BJ ChangHyun to stop eating one dish and eat another instead. The viewer's typed directive leads ChangHyun to take the footing of a recruit and carry out the requested eating action. In the second example, the eater and viewers create a chain of recruitments, taking turns to solicit one another's involvement in order to accomplish the act of eating. When ChangHyun makes a request to the viewers, their answers to that request function as another recruitment to him. Then the host assumes the footing of a recruit and acts in response to the request made by the viewers. These two examples show the act of eating being jointly constructed through shared recruitment, as the mukbang participants collaboratively direct, construct, and perform eating tasks together through verbal and nonverbal, aural, and visual modes.
Example (1): Recruitment by a viewer³
In this mukbang, fried chicken and tteokbokki, Korean hot spicy rice cake, are prepared. Immediately before the following excerpt, BJ ChangHyun has been eating and reviewing the fried chicken. While he is savoring it, a viewer types that the host should eat tteokbokki. Upon seeing the viewer's typed solicitation, he stops eating the fried chicken and tastes tteokbokki in compliance with the directive issued by the viewer.
Example (1)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190322010249983-0356:S0047404518001355:S0047404518001355_tabU1.gif?pub-status=live)
It is the BJ who owns the food. He has the right to choose whatever he wants to eat. But in this example, the viewer input influences his eating action. When a viewer whose ID is aseesy tells him ‘eat tteokbokki too’ in line A, the given directive serves as a contextualization cue to signal that the viewer is assuming the footing of recruiter. The host aligns his nonverbal and verbal behavior to the viewer to accomplish the recruited work: he drops the chicken he is about to eat in line 4 and says ‘Umm tteokbokki?’ and ‘Okay I will’ in lines 6 and 7, and then verbally confirms again that he will eat tteokbokki in line 8, ‘I am eating tteokbokki’. At the same time, his eye gaze (line 7) and right hand (line 8) also correspond to his verbal actions as he looks at and grabs the tteokbokki, respectively. ChangHyun's bodily actions and spoken utterances, in response to the viewer's recruitment message, are contextualization cues that indicate ChangHyun is taking up the footing of recruit.
Mukbang is not BJ ChangHyun's own personal mealtime; rather, it is open to public viewership to satisfy viewers’ vicarious pleasure of eating. This set-up enables two mukbang parties (i.e. BJ ChangHyun and his viewers) to share authorship of the act of eating. In that context, the viewer takes up the footing of the recruiter while the BJ assumes the recruit footing and realizes the viewer's typed action by animating it. It is true that ChangHyun's eating does not literally satisfy his viewers’ eating desire. His eating through recruitments is more likely a psychological process of ‘doing for others’, responding to an asymmetrically structured interactional environment where viewers want to eat but cannot. Viewers use the BJ as a resource to eat for them so that they can vicariously enjoy the eating moment. The means available to the mukbang participants differ: speaking and doing for BJ ChangHyun, typing for his viewers. But all of these means are used to enact and achieve recruitments, which form the interactional backbone of mukbang. Participants draw on these multimodal means to jointly produce collaborative eating and involvement.
Example (2): Recruitment chain
Now I turn to a second example that illustrates how recruitment utterances are used to coordinate the act of collaborative eating. In what I characterize as a ‘recruitment chain’, participants take up and exchange in sequence two contrasting footings—that of recruiter and recruit—to share responsibility for the task of eating.
The following excerpt takes place when ChangHyun is sampling different kinds of croquettes, seasoned fried chicken, and cherry-flavored juice. He offers viewers the choice of what he should eat first. Viewers provide their opinions, and then he chooses one of these options as he performs his very first eating action.
Example (2)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190322010249983-0356:S0047404518001355:S0047404518001355_tabU2.gif?pub-status=live)
Recruitments are usually considered a one-way process from a recruiter to a recruit, who fulfills a recruiter's request. However, in this situation, recruitment is recurring, with each party realizing the other's request: the BJ and his viewers take turns assuming recruiter and recruit footings. In line 1, ‘What do you want me to eat first?’, ChangHyun's recruitment initiation invites his viewers to become active participants in the act of his eating. The viewers respond to his request as recruits, giving multiple suggestions including croquette (line B), mozzarella croquette (line C), ice cream croquette (lines D, G, K), seasoned fried chicken (line I), cherry juice (line L), and cheese croquette (line M). The viewers’ suggestions are answers to his question, but at the same time they are requests: each contribution directs BJ ChangHyun to take a particular eating action. After collecting the suggestions from the viewers, the host now takes up the footing of recruit, choosing one of the responses in lines 4 and 6 (‘Mozzarella croquette? Okay’ and ‘I am going to try the mozzarella croquette since it was the first response’). His bodily actions provide contextualization cues that indicate this footing shift: his eyes are looking at the targeted croquettes on the plate (line 4) and his left hand is grabbing the mozzarella croquette (line 7).
It is important to note that none of the participants’ work as recruits is passive. Viewers actively decide and communicate their preferences when recruited to do so. In choosing one suggestion from a pool of many, ChangHyun exerts his own agency, even as he assumes a recruit footing. Thus, as they work to achieve collaborative eating, both parties exhibit agency in recruiting and in being recruited. They display and incorporate different footings, creating what I call a ‘recruitment chain’, which is visually presented in Table 1.
Table 1. Recruitment chain.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190322010249983-0356:S0047404518001355:S0047404518001355_tab1.gif?pub-status=live)
The recruitment chain highlights how the interaction between ChangHyun and the viewers creates a series of recruitments. Also, the table shows how each action—speaking (i.e. requesting), typing (i.e. giving multiple suggestions), and speaking and doing (i.e. choosing a response and grabbing the mozzarella croquette from the plate)—functions as a contextualization cue that multimodally signals how each party changes footing from recruiter to recruit or vice versa.
Rosaldo (1982) and Sicoli (2018) describe how recruiter/recruit footings are passed down from one to another based on a social hierarchy from parents to children (familial hierarchy) or from men to women (gender hierarchy). In their studies, people between the recruiter and the recruit may serve as passers, who, using their hierarchical roles and power, simply transfer the recruit's words to another person or make others responsible for the recruitment work. However, in mukbang, the recruitment work is passed back and forth, and chained recruitment results in shared agency: to recruit is to be recruited at the same time.
Eating as constructed action
Tannen (1986, 1989/2007) suggests dialogue is constructed when a person animates someone's previous utterance, repeating and recontextualizing what was spoken. Building on this concept, I argue that action can also be constructed, much like dialogue. This is what I call constructed action. I identify two types of constructed action in mukbang: embodied animating and puppeteering. Both are similar to what Tannen (1989/2007:22) calls ‘ventriloquizing’, a type of constructed dialogue where a speaker animates another's voice in the presence of that other.
The following analyses are based on this concept of ventriloquizing, but these examples are distinctive in that BJ ChangHyun acts as if a viewer were physically co-present with him, then ‘performs’ as if he were that viewer accomplishing her/his goal, by animating that viewer's typed recruitments in the form of physical action.
Example (3): Embodied animating
The following constructed action happens when the BJ is eating different kinds of boneless fried chicken including creamy chicken, Korean BBQ-flavored chicken, chicken with shrimp, and pickled radishes. Preceding this excerpt, a viewer called Bamboo has just given ChangHyun a number of star balloons. In this segment, the eater offers Bamboo the chicken as a gesture of gratitude. Interestingly, he pretends that he is feeding Bamboo as if Bamboo were sitting across from him, by bringing the chicken closer to the camera. Then, he eats it, now embodying ‘Bamboo-as-eater’. Here, ChangHyun takes up two footings simultaneously, and both are multimodally layered: His literal act of eating—chewing the chicken and making eating sounds—represents the viewer, while his spoken action—offering food and complimenting the viewer—and physical act of offering food represent the footing of the food provider.
Example (3)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190322010249983-0356:S0047404518001355:S0047404518001355_tabU3.gif?pub-status=live)
As ChangHyun pretends to be Bamboo, his eating action is constructed at two levels: on one level, it is not the BJ but the one he is embodying, Bamboo, who is eating the chicken. In line 11, he is looking at the chat screen as if he were looking at Bamboo, and his spoken utterance includes gustatory expressions in line 12 (‘Mmmm ah mmm… nom nom’); both are contextualization cues that signal ChangHyun is animating Bamboo-as-eater. We do not know what Bamboo's eating action would actually be like; the BJ's animation of the viewer eating must be understood as constructed.
On the second level, BJ ChangHyun maintains his footing as a food provider by offering food in line 1 (‘Okay, Bamboo, say ah’), with his right hand bringing the chicken closer to the camera, and in line 3 (‘Bamboo, try it’), and by complimenting Bamboo, the viewer he is embodying, in line 12 (‘our Bamboo is eating so well’). In line J, Bamboo types the eating sound, ‘Nom nom!’, as if s/he were eating the offered chicken. The viewer's chat message of a gustatory expression strengthens ChangHyun's food provider footing work. At the same time, the host continues to animate Bamboo's footing as eater through bodily action (chewing the chicken) and spoken action (the gustatory expressions ‘Mmmm ah mmm’ and ‘nom nom’) in line 12. ChangHyun's bodily actions and spoken utterances are used as contextualization cues to signal that the two footings (food provider and Bamboo-as-eater) are layered together to simultaneously and jointly comprise the eating action. The BJ eats as, and for, Bamboo, and his talk and offering gesture address Bamboo as if the viewer were eating.
In summary, building on Tannen's model of ventriloquized constructed dialogue, I suggest that embodied animating is a multimodal way to construct another's typed action in the virtual presence of the person being embodied. It also shows the dynamic performance where different types of actions—doing, speaking, and typing—signal different roles of offering and eating food that are tightly enmeshed, and thus it creates involvement and sociability between the participants. Now, I turn to another example of constructed action, which I call puppeteering.
Example (4): Puppeteering
In the following excerpt, ChangHyun is eating tangsuyuk (crispy sweet and sour pork), possam (steamed pork with vegetables) with ssamjang (Korean dipping sauce), and fried dumplings. Right before the following situation, a viewer (trtr5) has repeatedly asked him to eat tangsuyuk in a particular way. ChangHyun finally accepts the viewer's request and subsequently acts as a puppet, as if moved and controlled by the viewer. In this example of constructed action, he animates the viewer's typed recruitments, presumably for the viewer's vicarious pleasure. To conserve space, I have included only the messages of the viewer who requests and ‘controls’ ChangHyun's eating. Other viewers’ chat messages are included only if the eater mentions them while speaking.
Example (4)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190322010249983-0356:S0047404518001355:S0047404518001355_tabU4.gif?pub-status=live)
This constructed action through puppeteering starts with a request by the viewer, trtr5, in line A, ‘Could you please eat tangsuyuk wrapped with a lettuce leaf?’. The host continually seeks verbal confirmation and visually demonstrates what he is doing. His spoken utterances and visual demonstrations serve as contextualization cues that indicate ChangHyun is assuming the footing of the puppet. First, every time he embodies the viewer's typed recruitments, he looks for the viewer's confirmation with utterances such as ‘Do you want me to dip?’ (line 8), ‘What's next?’ (line 9), and ‘Should I wrap it?’ (line 10) to perform, or eat appropriately as recruited. ChangHyun also seeks confirmation that what he has just done is right, asking ‘Can I eat now?’ (line 11), ‘Am I correct?’ (line 12), and ‘Am I doing it right?’ (line 13), so that he can correct his previous action, if he has done something wrong, before moving forward. Second, he presents to the camera how he is following trtr5's instruction step-by-step: he (i) puts tangsuyuk on a Korean lettuce leaf, (ii) adds some sauce and garlic, (iii) wraps it up, and finally (iv) eats. His hesitating action before eating in line 22, in addition to his continuous seeking of confirmation, demonstrates that his eating actions are not self-motivated but rather seemingly controlled by the viewer, so that he becomes a ‘puppet’. This is also why he has his viewer authorize and approve every eating action that he makes.
Importantly, ChangHyun is not a passive puppet but rather an active one: he is aware that his eating action is ostensibly controlled but also knows that he is responsible for eating as requested. Thus, the eater engages in a constant pursuit of approval, eliciting step-by-step instructions. By continuously talking back to his puppeteer, ChangHyun shows that his agency in eating is not simply constrained but also voluntarily shared, which contributes to creating involvement and reciprocity.
While his eating actions are influenced by the viewer's input, the host is getting ready to take up another footing, that of a food reviewer, which enables him to speak in his own voice. ChangHyun multimodally dramatizes his footing display: playing music (line 26), turning on lights (line 27), making a hand gesture (line 28), and grunting (lines 28 and 29). Such actions serve to intensify his reviewing actions: uttering ‘Delicious’ (lines 31 and 32), ‘Oh my god’ (line 34), ‘Oh’ (line 35), and ‘How come it tastes so good?’ (line 36) and picking up a lettuce leaf to try another one (line 34). When he changes his footing to that of a reviewer, therefore, he uses a range of multimodal contextualization cues including music, lights, bodily actions, and verbal evaluation. Now the viewer aligns as a person whose puppeteering actions are being evaluated. This is supported by her/his chat message in line G, ‘Do not kick me out’. In AfreecaTV chat rooms, hosts can kick users out. The viewer is now afraid that ChangHyun will do so if he does not like what he is tasting.
This example demonstrates how ‘puppeteered’ eating actions can be viewed as constructed, similar to the construction of dialogue (Tannen 1986, 1989/2007). More importantly, unlike embodied animating where the host brings in his own agency to assume an imaginary footing of the viewer and performs a certain eating action, puppeteering does not entail the BJ casting his own interpretation of the eating action, but rather performing a literal animation of what the viewer instructs.
Example (5): Eating as busking
When walking on city streets, we might see street performers. They play music, sing, or sometimes mime for donations. People stop and enjoy their performance. Some of them drop money into a hat, a tip jar, or an open instrument case to show their appreciation for the performance. When viewers tip, the buskers make a gesture of gratitude such as smiling, nodding their head, bowing, winking, saying thank you, or sometimes posing for photos with the audience. Mukbang interactions sometimes follow a similar trajectory when viewers voluntarily give star balloons to reward BJs for a satisfying experience. In ChangHyun's mukbang, if a viewer presents star balloons, other viewers acknowledge it by repeatedly typing the discourse marker, ‘oh’, and more importantly, the star balloon sender's presence is overtly recognized by ChangHyun, who stops eating and dramatizes his gratitude through exaggerated appreciative responses: calling out the ID of the sender, playing music, dancing, singing, or offering food to the camera. These responses show gratefulness, but they also enhance the entertainment value of the broadcast through dramatic humor. In the following example, the BJ is eating kimchi fried rice, gorgonzola pizza, and a beef steak. A viewer whose ID is woo345 has just given him 1,004 star balloons, which equates to approximately $80 in US currency at the time this research was conducted.
Example (5)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190322010249983-0356:S0047404518001355:S0047404518001355_tabU5.gif?pub-status=live)
In mukbang, the practice of sending and receiving star balloons is, like eating, the product of reciprocal joint actions. It resembles ‘gifting’ (Good & Beach 2005; Robles 2012) and ‘offers’ (Sicoli 2018): giving and receiving benefits publicly indicate and enhance the special relationship between participants. Many scholars (e.g. Hua, Li, & Yuan 2000; Good & Beach 2005; Robles 2012; Sicoli 2018) note that gift giving and receiving are sequentially organized: when gifts are given and accepted, receivers recognize and open them, display assessments of the gifts, and express gratitude.
When ChangHyun receives the 1,004 star balloons, he stops eating. His bodily actions show how he sets up the mukbang stage to respond to the sender: in lines 1 and 2, he scrolls back through the chat feed to display where the 1,004 star balloons were sent, starts playing children's gospel choir music, and turns on a silver light to create a dreamy and unreal effect. These actions playfully dramatize his show of gratitude. The BJ humorously takes up the footing of a charismatic preacher, saying ‘Hallelujah’ (lines 4, 8, and 9), ‘amen’ (line 5), ‘Have mercy on you’ (lines 6 and 10), and ‘God save her’ (line 13), and making nonverbal actions of a prayer gesture (line 4), crossing himself (line 5), raising his hands (lines 6 and 9), and bowing his head down (lines 7 and 10). Just as in the previous excerpt, audiovisual devices, spoken words, and physical actions are jointly used as contextualization cues to display ChangHyun's footing work.
His playful footing is also intertextually tied to his attempt to make a pun on the number 1,004, which is a homophone in Korean of the word 천사 (chunsa) ‘angel’. Through his multimodal pun and his use of conventional bodily expressions of praise, ChangHyun gives the star-balloon sender the footing of an angel, alluding to his or her generosity. In this way, BJ ChangHyun engages in a kind of pretend play with the viewer.
I argue that this mutual gifting between ChangHyun and his viewer creates involvement in three senses: (i) establishing mutual appreciation through this organized exchange between the sender and host; (ii) enhancing solidarity between the BJ and the viewer through humor; and (iii) accomplishing each participant's interactional goals—vicariously fulfilling the viewer's desire for entertainment and sensory satisfaction, while also making money for BJ ChangHyun.
DISCUSSION
All of these mutually beneficial mukbang interactions are reminiscent of the symbiosis between an Egyptian plover bird and a crocodile. When a crocodile swallows its prey, bits of flesh get stuck in its teeth and can cause tooth decay unless removed. To clean its teeth, the crocodile opens its mouth as a sign of invitation, and Egyptian plover birds enter to eat the food lodged in the crocodile's teeth. Their unlikely relationship is mutually beneficial: the crocodile maintains healthy teeth and the plover bird gets food. What BJ ChangHyun and his viewers create together through collaborative eating is a mutually beneficial relationship like that of the plover bird and the crocodile. Viewers achieve sensory satisfaction or find entertainment through the host's efforts to display and share his eating experience, while BJ ChangHyun himself responds to viewers’ typed recruitments and earns attention, praise, and money. In mukbang, multimodal social work is undertaken in part through footing shifts signaled and accomplished in spoken, embodied, and typed actions.
We have seen how the mutual benefits of collaborative eating are constructed moment by moment through various forms of joint involvement, including recruitments in (1) and (2), constructed action in the form of embodied animating in (3) and puppeteering in (4), and busking in (5). In mukbang, viewers use their messages to try to affect and direct the host's eating actions, but it is also important to note that their text comments are acted on only when the BJ chooses. Participants on both sides of the camera also display gratitude and acknowledgement for what the other party has offered. Of course, there is always a possibility that viewers will stop giving BJ ChangHyun star balloons or watching his eating show if his mukbang does not adequately satisfy their desires. There is also a possibility that, in the absence of sufficient participation from his audience, ChangHyun might stop broadcasting and spend his time in other pursuits. Thus, joint involvement is also mutually ensured: ChangHyun is motivated to constantly strive to entertain viewers and comply with their requests, and viewers are motivated to provide verbal and monetary encouragement.
Participants’ agency in shaping the interaction and the course of the mukbang event is shared and jointly constructed. This supports Al Zidjaly's (2009:178) understanding that exercising agency is a ‘mediated, collective process of negotiating alignments, task, and roles’. That is, ChangHyun has the power to initiate and host this eating-based interaction, but without his viewers’ participation, this mukbang system would fall apart. Collaboration and involvement are not merely beneficial outcomes of mukbang; rather, they are the means to its viability as an interactional event. Mukbang is not simply a multimodally and jointly mediated reconfiguration of eating action; it is also a reconfiguration of social agency. As Ahearn (2001) notes, empowering and exerting agency can be controlled by sociocultural context. In this case, mukbang re-imagines collaborative eating, challenges the traditional social stigma of eating alone, and demonstrates how agency can be collaboratively achieved on the internet.
CONCLUSION
Tannen's (1989/2007) idea that sound and sense are ‘involvement strategies’ offers an important perspective in its appreciation of the poetics in ordinary interaction. It also points to ways in which these subtle devices bind social participants to one another. Here, I have applied that insight to a new interactional context: the multimodal practice of collaborative eating known as mukbang. My analysis has shown that action, too, is an important involvement strategy: through food and eating, mukbang participants are connected to each other. They establish a form of co-presence that transcends physical distance, even as it relies on embodiment, physicality, and coordinated sensory attention. This connection is asymmetrical and jointly negotiated, with broadcaster and viewers employing different modes and acts to shift footings and manage the asymmetrical distribution of authorship and responsibility in collaborative eating. Each party fulfills their own goals, but at the same time each contributes to the joint construction of involvement and reciprocity.
This study adds a new analytical dimension to the notion of eating as a social practice. Through the multimodal possibilities of mukbang, participants engineer a new way of being together that simultaneously draws on and challenges traditional forms of eating practice. Involvement created through mukbang establishes a joint redefinition of what it means to ‘eat alone’ and ‘eat together’. It transforms what has been traditionally considered social stigma into a powerful interactional resource that technologically binds physically separated people together through food and eating.
The study also shows how interactional sociolinguistics can contribute to that understanding, extending the notion of ‘contextualization cues’ to the multimodal interactive context of digital environments, where participants employ a range of linguistic, aural, and visual resources to signal how they mean what they say (speaking and typing) and do (eating). The tracing of co-constructed eating action via mukbang contributes to our understanding of shared agency and jointly produced action in online multimodal interaction more generally. My analysis of mukbang sheds light on how the ritual act of eating, which is generally conducted offline, can be jointly and multimodally performed on the internet and create a new context of sociability. Thus, the current study not only bridges the gap between online and offline discourse, but also expands our knowledge of multimodality in eating discourse and digital interaction.