Anyone who watches subtitled video is aware that subtitles are not always perfect and at times result in a frustrating experience. This frustration could be due to inaccuracies, spelling mistakes, or simply a lack of time to read subtitles before they disappear. It should therefore come as no surprise that subtitle speed is one of the most debated and controversial aspects of subtitling for users, producers, and researchers. Broadcasters and producers of online content on streaming platforms are in favor of verbatim subtitles mainly because this requires less editing (resulting in lower costs) in what is fast becoming a largely automated production process. Less editing may satisfy some users but it necessarily results in higher presentation speeds. This is further complicated by the fact that viewers have diverging needs based on their reading and language proficiency, hearing status, and less permanent factors such as viewing environment and fatigue, which make fast subtitles harder to follow at times for some viewers.
Despite the vast body of literature on reading (cf., e.g., Rayner, 1978, 1998, 2009; Rayner, Pollatsek, Ashby & Clifton, 2012; Rayner & Sereno, 1994; Reichle, 2021; Reichle, Pollatsek, Fisher & Rayner, 1998; Reichle, Rayner & Pollatsek, 2003) and the growing body of research on how subtitle processing operates (cf., e.g., d’Ydewalle & De Bruycker, 2007; d’Ydewalle & Gielen, 1992; Liao, Kruger, & Doherty, 2020; Liao, Yu, Reichle, & Kruger, 2021; Perego, Del Missier, Porta, & Mosconi, 2010; Szarkowska & Bogucka, 2019; Winke, Gass, & Sydorenko, 2013), the full picture of how subtitle speed impacts reading and comprehension is still unclear.
In a recent study, Liao et al. (2021) examined the impact of subtitle speed (12, 20, and 28 characters per second, or cps) and video presence on comprehension and reading, and found that faster subtitles resulted in significantly fewer and shorter fixations as well as fewer crossovers between subtitles and image. The study also found that both gaze duration (the fixation time on a word during first-pass reading and a measure of early lexical processing) and total time (including both first-pass visits and subsequent revisits to words) increased with word length, decreasing word frequency, and slower subtitle speed. However, these word-frequency and word-length effects were less pronounced at faster speeds. This led the authors to conclude that increasing subtitle speed results in a shift “from local (cognitive) eye-movement control towards heuristics informed by global task constraints (e.g., subtitle speed).”
In this paper, we report on the same dataset but, in order to get a better understanding of the impact of subtitle speed on the degree to which the text can be processed, we focus on word skipping and rereading. Specifically, we examined two aspects of reading that are most likely to be impacted by subtitle speed. The first is the extent to which viewers can read enough of the subtitles, and particularly read subtitles to completion (as evidenced by word skipping). The second is the extent to which readers reread individual words either following horizontal eye movements from within the subtitles or following vertical eye movements between the subtitles and the image. It seems probable that both the ability to read subtitles to completion and to reread individual words would be impacted negatively by increased subtitle speeds, due to the reduction in the time available to read the subtitles. Before explaining these measures in more detail, we first discuss what exactly is meant by subtitle speed.
Subtitle speed
Subtitle speed, also referred to as “presentation rate” or “subtitle rate” and even “reading speed” (although the latter is obviously problematic), is a measure of the length of time the subtitle stays on screen relative to the amount of text that has to be read during its display. Researchers in the field of audiovisual translation (AVT) are well aware of the fact that “subtitle speeds are not set in stone; they differ from country to country and even from company to company” (Szarkowska & Gerber-Morón, 2018, p. 2). Previous studies on this topic have often included a comprehensive overview of different traditions regarding subtitle speed (e.g., d’Ydewalle, Van Rensbergen & Pollet, 1987; Szarkowska & Bogucka, 2019; Szarkowska & Gerber-Morón, 2018; and Romero-Fresco, 2019). Some countries such as Australia (ACMA, 2016), Canada (CRTC, 2016), France (CSA, 2011), Spain (AENOR, 2012), the UK (Ofcom, 2017), and the US (FCC, 2014) have developed official rules regarding some technical parameters to be followed by subtitlers. The recommended subtitle speeds range from 10 to 20 cps depending on the guidelines’ provenance, target viewers (deaf, hard of hearing, hearing, age group), and the nature of on-screen text (intra- vs. interlingual subtitles, but also scrolling vs. chunked subtitles). However, none of these guidelines on subtitle speed are based on sound empirical evidence, nor has the impact of different speeds on subtitle reading been investigated exhaustively (but cf. Szarkowska & Gerber-Morón, 2018).
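To make the measure concrete, the following minimal sketch computes subtitle speed in characters per second by dividing the amount of text by the display time. It assumes that spaces and punctuation count as characters; the function name and the example subtitle are ours and purely illustrative.

```python
def chars_per_second(text: str, onset_s: float, offset_s: float) -> float:
    """Subtitle speed in characters per second (cps).

    Assumption: every character, including spaces and punctuation,
    counts towards the length of the subtitle.
    """
    duration = offset_s - onset_s
    if duration <= 0:
        raise ValueError("offset must be later than onset")
    return len(text) / duration


# A hypothetical 36-character subtitle displayed for 3 seconds.
example = "The herd moves north in early spring"
print(chars_per_second(example, 10.0, 13.0))  # 12.0
```

Under this definition, the same text shown for half the time has twice the speed, which is why speed can be manipulated through display duration alone.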
In addition to the fact that there is a large variability in terms of recommended subtitle speeds (which are often recommended maximum speeds), the actual subtitle speed in any given video often varies dramatically. In order to get a sense of the distribution of actual subtitle speeds (regardless of existing guidelines), we analyzed 23,356 English same-language subtitles extracted from the top 11 “Blockbuster” movies on Netflix Australia on July 13, 2020. Although this is not a representative sample of movies, even on Netflix, it does give a sense of the variability of subtitle speed in mainstream subtitled products. The distribution of speeds is presented in Figure 1. What is interesting is that, in spite of the Netflix guidelines that English captions should be a maximum of 20 cps (Netflix, 2020), the average speed in this small corpus of subtitles was 12.6 cps, with a range of 0.8 cps to 50.4 cps. Importantly, a total of 15.2% of the subtitles were faster than 20 cps. This is not a quality judgment, but rather an indication of the variability in subtitle speed found in widely available films on a popular streaming platform, as well as the fact that this range is often obscured when average speeds are reported. It also confirms that the speed at which subtitles are delivered varies throughout audiovisual products due to the pace of their different parts, speech rates of individual speakers, or the strictness and consistency in following specific editing rules (cf. Fresno & Sepielak, 2020 for an overview).
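The kind of summary reported here (mean, range, and share of subtitles above a guideline limit) can be sketched as follows. The speeds in the example are invented for illustration and are not the corpus values; the function name and the 20 cps default are our assumptions.

```python
from statistics import mean


def summarize_speeds(speeds_cps, limit_cps=20.0):
    """Return the mean, range, and share of subtitles faster than a limit."""
    if not speeds_cps:
        raise ValueError("at least one subtitle speed is required")
    above = sum(1 for speed in speeds_cps if speed > limit_cps)
    return {
        "mean": mean(speeds_cps),
        "min": min(speeds_cps),
        "max": max(speeds_cps),
        "share_above_limit": above / len(speeds_cps),
    }


# Hypothetical speeds for four subtitles (not taken from the corpus).
speeds = [8.0, 12.0, 16.0, 24.0]
summary = summarize_speeds(speeds)
print(summary["mean"], summary["share_above_limit"])  # 15.0 0.25
```

The example shows how a mean well below the limit can coexist with a sizeable share of subtitles above it, which is the point made about averages obscuring the range.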
Previous experimental studies on subtitle speed
Previous studies that examined how subtitle speed impacts processing and comprehension largely focused on global eye movements (i.e., averaged across videos without taking individual words into account) and used diverse post hoc measures of comprehension, subtitle and scene recognition, and viewers’ attitudes and preferences (cf., e.g., Koolstra, van der Voort, & d’Ydewalle, 1999; Romero-Fresco, 2019; Szarkowska et al., 2016; Szarkowska & Gerber-Morón, 2018). Due to the lack of comparability between some experimental designs as well as the nature of authentic audiovisual materials used in those studies, the full picture of how subtitle speed affects multimodal processing is unavoidably distorted, making subtitle speed one of the most misconstrued topics in AVT research.
Variability and manipulation of subtitle speed
As pointed out by Fresno and Sepielak (2020), only a few studies looking at subtitle speed report measures of speed variability (e.g., Jensema et al., 1996), while most studies only report the average subtitle speed (e.g., Szarkowska & Bogucka, 2019). This inevitably obfuscates the impact of speed, especially when reporting only global measures. In other studies, like Krejtz, Szarkowska & Krejtz (2013), Szarkowska et al. (2016), and Szarkowska and Gerber-Morón (2018), the researchers created different speeds mainly by editing the text of the subtitles so that the overall length of the text is reduced by removing idea units, or by replacing or deleting individual words or phrases. This creates a potential confound since subtitles that are edited down to create a slower speed may not only differ in meaning from verbatim transcripts, but they are also often simplified in either semantic structure, word length, or word frequency, which could make them easier to process, but might also introduce discrepancies with the spoken dialogue. In our study, we use a consistent speed in each of the conditions by reducing the duration of the subtitle to increase the speed, but without editing the language.
Proportional reading time
Some of the earlier eye-tracking studies on subtitle reading investigated proportional reading time for subtitles at different speeds. These studies found that adult viewers spend proportionally more time reading longer subtitles (i.e., subtitles containing more text) than shorter subtitles, and also proportionally more time reading faster subtitles (d’Ydewalle, Muylle, & Van Rensbergen, 1985; d’Ydewalle, van Rensbergen & Pollet, 1987).
A much more recent study by Szarkowska and Gerber-Morón (2018) was designed to test whether viewers could cope with reading subtitles as fast as 20 cps. The authors make some inferences regarding the efficiency of subtitle processing based on proportional reading time between three speeds, taking the fact that readers spent proportionally more time reading subtitles at the fast rate as evidence that “faster subtitles were read more efficiently than slow subtitles” (p. 24).
Although proportional reading time could be a useful measure when comparing manipulations of subtitle content, in the context of subtitle speed it is somewhat problematic. The proportion of time viewers spend reading subtitles (i.e., proportionally more time for faster subtitle speeds) cannot be equated with the efficiency of processing without adding word-level analyses, or knowing to what extent the subtitles were in fact processed cognitively or just scanned superficially. It is more likely that this measure simply reflects the fact that fast subtitles are on screen for a shorter period of time and, with more text to be read in a shorter period of time, or less proportional time available per character, it should come as no surprise that viewers would spend proportionally more time reading faster subtitles.
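A short worked example may help to show why the measure rises with speed almost by construction. It assumes a hypothetical 42-character subtitle and, as a deliberate simplification, a viewer who spends the same 1.4 s on the subtitle at both speeds; neither value is taken from the data.

```latex
\[
\text{proportional reading time} = \frac{t_{\text{subtitle}}}{t_{\text{display}}},
\qquad
t_{\text{display}} = \frac{\text{number of characters}}{\text{speed (cps)}}
\]
\[
\text{12 cps: } t_{\text{display}} = \frac{42}{12} = 3.5\ \text{s},
\qquad \frac{1.4}{3.5} = 0.40
\]
\[
\text{28 cps: } t_{\text{display}} = \frac{42}{28} = 1.5\ \text{s},
\qquad \frac{1.4}{1.5} \approx 0.93
\]
```

In this illustration the proportion more than doubles although the time spent on the subtitle is unchanged, so the higher value says nothing about how much of the text was actually processed.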
To provide a more nuanced interpretation of the impact of subtitle speed on reading, we will therefore investigate evidence of incomplete reading, or the proportion of subtitles that were not read to completion. Unlike proportional reading time which could be very high in cases where the subtitle disappears before a reader could finish reading it, this measure makes it possible to determine to what extent subtitles are not read fully.
Comparability between conditions
The design of the study by Szarkowska and Gerber-Morón (2018) included subtitles in three languages and soundtracks in either a foreign language in experiment 1 or English as a first or second language in experiment 2, which required both translation and editing down of the subtitle content to create the slower rates. This makes it difficult to disentangle the actual influence of speed from other factors such as semantic and syntactic differences, as well as equivalence where subtitles at different speeds may differ in meaning from the spoken dialogue.
The lack of comparability between conditions in previous studies is also due to the use of a wide range of audiovisual materials. For example, Szarkowska and Gerber-Morón (2018) used short clips from a number of different genres. Likewise, Szarkowska et al. (2016) used twelve two-minute clips selected across three different genres. Although the authors indicate that they controlled for comparability by looking at readability metrics and using only dialogue-heavy and fast-paced clips, the differences in genres and editing alone could have a significant impact on eye movements. To ensure comparability between clips in the current study, we used clips from one genre with consistent film editing and content.
To explain the focus of this paper on word skipping and rereading, the following sections will explore the relevance of these two eye-movement behaviors in reading based on previous research on static texts.
Word skipping and rereading
When reading, our eyes constantly alternate between fixations and saccades. Fixations are gaze points when the eyes stop scanning the scene, holding the central foveal vision in place so that the visual system can take in detailed information about what is being looked at. Saccades are rapid movements of the eyes from one fixation to the next (Rayner, 1998, 2009). Extensive eye-tracking research supports the idea that information intake occurs mostly while we fixate a specific point and not during the fast saccadic movement when we are functionally blind (Rayner, Smith, Malcolm, & Henderson, 2009).
In spite of our perception that we can see our full field of vision in detail, only the center of vision, namely 2° of visual angle known as the fovea, provides us with enough visual acuity during fixations to identify the fine details of words and objects (Rayner & Morrison, 1981). The region that extends 5° from the center of vision is known as the parafovea, which allows parafoveal processing such as parafoveal preview of words or objects around the fovea. In reading, there is only a limited region for effective visual processing, which is known as the perceptual span. Perceptual span extends asymmetrically from the center of vision for about 3-4 characters to the left and up to 14 characters to the right (in reading a language like English), although we can only identify words that fall within a space of around 7-8 characters to the right of a fixation (cf. Rayner, 1986, 1995). The rest of the visual field is known as peripheral vision and can only be used to track previously identified objects and identify sudden onsets or movement, such as a moving object on the video image while reading a subtitle, or the onset of a subtitle while looking at the image. In order to process the object or to start reading the subtitle, the viewer has to shift his or her gaze to bring it into the fovea.
In studies on the reading of static text, word skipping often refers to words not being fixated during first-pass reading and is found to increase with decreasing word length and increasing word frequency (cf. Angele, Laishley, Rayner & Liversedge, 2014; Drieghe, 2008; Drieghe, Rayner & Pollatsek, 2005; Rayner, Slattery, Drieghe & Liversedge, 2011). However, as pointed out by Reichle (2021: 366), “most words have to be fixated to be identified” during reading because of the restrictions of perceptual span and visual acuity. In this study, we will therefore refer to word skipping as those words that were not fixated at least once. Reichle (2021) also points to the fact that shorter words such as function words are more likely to be skipped. As documented by Rayner (1998, 2009), between 70% and 80% of words are fixated at least once, which means that in the reading of static texts, between 20% and 30% of words are skipped. It stands to reason that the syntactic and discourse processing of subtitles would be impaired if a higher skipping ratio is observed, which would mean that a large portion of the words could not be identified and processed.
In addition to looking at word skipping during subtitle reading, we will also look at the extent to which words that are located at the end of subtitles are skipped. Should a larger proportion of words at the end of subtitles not be fixated, it would indicate that viewers were unable to read the subtitles to completion, which would then make it less likely that post-lexical integration would occur when linguistic representations of words are converted into propositional representations (cf. Rayner, Sereno, Morris, Schmauder, & Clifton, 1989; Warren, White, & Reichle, 2009).
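The two skipping measures described above can be illustrated with a minimal sketch. It assumes that each subtitle is represented as a list of per-word fixation counts in word order, that a word counts as skipped when it received no fixation at all, and that "the end of the subtitle" means the final word only (the `final_n` parameter is our own illustrative choice, not a definition taken from the study).

```python
def skipping_rates(subtitles, final_n=1):
    """Return (overall skip proportion, skip proportion in final positions).

    subtitles: list of lists; each inner list holds the number of
    fixations each word of one subtitle received, in word order.
    final_n: how many subtitle-final words count as "the end"
    (an assumption made for this sketch).
    """
    total = skipped = end_total = end_skipped = 0
    for counts in subtitles:
        for position, n_fixations in enumerate(counts):
            is_skipped = n_fixations == 0  # never fixated
            total += 1
            skipped += is_skipped
            if position >= len(counts) - final_n:
                end_total += 1
                end_skipped += is_skipped
    if total == 0:
        raise ValueError("no words to analyze")
    return skipped / total, end_skipped / end_total


# Hypothetical fixation counts for three five-word subtitles.
data = [[1, 0, 2, 1, 0], [1, 1, 0, 1, 1], [2, 0, 1, 0, 0]]
overall, at_end = skipping_rates(data)
print(round(overall, 2), round(at_end, 2))  # 0.4 0.67
```

In this invented example, 6 of 15 words are skipped overall, but 2 of the 3 subtitle-final words are skipped, the pattern that would suggest subtitles were not read to completion.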
A key feature of reading is the fact that skilled readers routinely make regressions (eye movements in the opposite direction to the direction of reading, that is, saccades from right to left in the reading of a language like English) (Schotter, Tran, & Rayner, 2014). According to Rayner, Pollatsek, Ashby and Clifton (2012), between 10% and 25% of eye movements are regressions, during which readers revisit sections of the text either because of an oculomotor error, or, more interestingly, because they did not obtain sufficient information during first-pass reading and have to return to it to resolve an ambiguity, or correct for comprehension breakdown (cf. Eskenazi & Folk, 2017; Inhoff, Kim & Radach, 2019; Schotter, Tran, & Rayner, 2014). Inhoff et al. (2019) distinguish between “small” regressions (correcting for saccadic error when the fixation lands on the wrong word or skipped a word accidentally, and incomplete processing where the reader has to return to a partially processed word), and “large” regressions which are related to text comprehension (where segments of the text are reread for comprehension). A reduction in time for viewers to make regressions (as a result of faster subtitle speeds) may therefore interfere with comprehension (see Schotter, Tran & Rayner, 2014; Cook & Wei, 2019; Inhoff et al., 2019; Metzner, von der Malsburg, Vasishth, & Rösler, 2017). More specifically, given the increased difficulty of integrating the meaning of low-frequency words into the sentence context suggested in the literature (cf. White, Drieghe, Liversedge & Staub, 2018), a reduction in regressions will most likely have a detrimental effect on sentence comprehension.
Hyönä, Lorch, and Rinck (2003; see also Hyönä, Lorch & Kaakinen, 2002) distinguish between first-pass rereading time as the sum of all reinspective fixations during first-pass reading, and lookback fixation time (or second-pass fixation time) as the sum of all fixations during a second-pass reading of a text. In their view, reinspections are fixations on a word that has already been fixated and are initiated by a regression, but followed by either regressive or progressive fixations. The present study adopts this approach by investigating rereading of words in subtitles when a previously fixated word is refixated from any direction after the eyes moved out of the word for the first time.
In reading fast subtitles, readers may not have the time to make either lookback fixations or first-pass revisits to resolve syntactic or lexical issues. Studying rereading as evidenced by lookbacks or revisits to words could provide valuable information on the reading process in a multimodal context. More importantly, unlike in static reading, revisits to words (or rereading behavior) during subtitle reading could be initiated horizontally (from elsewhere within the subtitle) or vertically (from the video to the subtitle). These two types of rereading behavior most likely serve very different purposes. Rereading following horizontal eye movements likely reflects viewers’ engagement with linguistic processing (making sense of the text), whereas rereading following vertical eye movements initiated from within the video image (closer to lookback fixations) probably serves the purpose of integrating the text with other visuals to build a situation model of the video (to make sense of the story) (see also Laeng & Teodorescu, 2002).
As such, rereading is associated with global processing strategies (Hyönä et al., 2002) and regressions for text comprehension (Cook & Wei, 2019; Inhoff et al., 2019). Since subtitles are typically single sentences, it is possible to relate rereading after horizontal eye movements to lexical processing and comprehension at sentence level, whereas rereading after vertical eye movements is most likely related to higher-level comprehension found in the reading of longer texts (see Cook & Wei, 2019), and indeed reading of subtitles on video. A reduction in either of these types of rereading would therefore compromise, or at least detract from, both the local, linguistic processing of the subtitles, and the global processing or integration of the subtitles into the situation model of the video.
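The distinction between the two types of rereading can be sketched in code. The example below assumes a simplified representation of a single subtitle's fixation sequence, in which each fixation is either a word index (a fixation on the subtitle) or `None` (a fixation on the image); this encoding, and the function name, are our own illustrative choices rather than the coding scheme used in the analyses.

```python
from typing import List, Optional, Tuple


def classify_revisits(fixations: List[Optional[int]]) -> List[Tuple[int, str]]:
    """Label each revisit to a previously fixated word.

    A revisit is counted when a fixation lands on a word that was
    fixated before, after the eyes had moved out of that word.
    It is "vertical" if the preceding fixation was on the image
    (None) and "horizontal" if it was on another word.
    """
    seen = set()
    revisits = []
    previous: object = object()  # sentinel: no fixation yet
    for current in fixations:
        if current is not None:
            if current in seen and previous != current:
                origin = "vertical" if previous is None else "horizontal"
                revisits.append((current, origin))
            seen.add(current)
        previous = current
    return revisits


# Hypothetical sequence: words 0-2, back to word 1, up to the image,
# back down to word 2, on to word 3 (refixated), image, then word 0.
sequence = [0, 1, 2, 1, None, 2, 3, 3, None, 0]
print(classify_revisits(sequence))
# [(1, 'horizontal'), (2, 'vertical'), (0, 'vertical')]
```

Note that the consecutive fixations on word 3 are treated as a refixation within the same visit rather than a revisit, because the eyes never left the word.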
Reading in multimodal contexts
While a considerable amount has been learned about the mental processes engaged in the reading of static text where the reading pace is under the reader’s control (see Reichle, 2021), little is known about the cognitive processes of reading in dynamic and multimodal contexts as is the case when reading subtitles in video. When watching a subtitled film, the cognitive systems for reading have to be coordinated with those needed for the processing of other information (e.g., background video content or auditory input) within a limited period of time, at a pace dictated by the film and over which the viewer has no control (cf. Kruger & Steyn, 2014).
A recent article by Liao et al. (2021) presents a schematic diagram of the perceptual and cognitive processes supporting multimodal reading, such as reading subtitles in film (Figure 2).
This framework presents the different perceptual and cognitive processes that support reading in an integrated manner in a multimodal context. What the figure illustrates is that viewers’ comprehension of subtitled video does not rely solely on the subtitles or the image or the soundtrack, but that the objects identified in the image, as well as the spoken dialogue and the sounds contribute to the creation of a situation model that forms a context in which the subtitles are read. However, if a word is not identified, or is misunderstood, it could compromise the sentence processing, which would impact negatively on the text base and situation model, just as failure to identify an object could impact negatively on the situation model.
Although the multimodal integrated language framework provides a good account of how different sources of information interact and contribute to the comprehension of subtitled videos, it should be noted that the efficiency of subtitle processing is highly context-specific and derives from the very nature of audiovisual materials while also being moderated by viewer characteristics such as their ability to manage their attentional resources, their language proficiency, reading speed, prior knowledge, etc.
The coordination of eye movements and the integration of language in a multimodal framework as in Figure 2 is also subject to the availability of cognitive resources. Multimodal language integration is dependent on the accurate identification of words and objects. When any of the serial processes (such as word or object identification, or sentence processing) encounters a problem such as ambiguity or missing information, more cognitive resources have to be assigned to the processing of that item, with the result that other items that appear at the same time could easily be missed. In other words, when a viewer is focusing exclusively on one element, he or she becomes “blind” to other elements such as changes (also known as “change blindness”), or unexpected items (also known as “inattentional blindness”) (see Jensen, Yao, Street & Simons, 2011). Romero-Fresco (2019) uses the term “subtitling blindness” to refer to the fact that viewers could miss an important part of the image on screen because they were reading a subtitle.
Limitations in visual perception mean that the effective processing of fast subtitles and synchronous video could very well be possible when all the systems can operate smoothly, but when things go wrong, the integration could be compromised. An important reason for this is that faster subtitles take important control away from the viewer since less time is available for text processing. In particular, due to the reduction in time available to read the subtitles, viewers may have less time to make regressions to low-frequency words or to correct for saccadic errors. Solid empirical evidence confirms the existence of systematic differences in the ease (or difficulty) of processing words in the text (see Rayner & Duffy, 1986 for an overview). In reading, shorter words, more frequent words, and more predictable words receive shorter fixations than longer, less frequent, or less predictable words (see Rayner, Ashby, Pollatsek, & Reichle, 2004; Rayner & Duffy, 1986). The existence of word frequency and word length effects points to the fact that linguistic complexity influences the reading process and that nodes of linguistic complexity attract increased processing, or, simply put, require more time to process. What this implies for subtitle reading is that any increase in linguistic complexity will have one of two results: either it will slow down the reading process with longer fixations and more regressions, with an increasing chance that sentence processing will not be complete before the subtitle disappears, or, if the reading does not slow down, these nodes of linguistic complexity will not receive sufficient processing.
In the present study, we specifically test whether higher subtitle speeds will have an influence on the way viewers process linguistic complexity as evidenced by an increase in rereading of words following eye movements from elsewhere in the subtitle such as regressions, or the way viewers integrate the image with specific words in the subtitle evidenced by rereading of words following vertical eye movements from the image to the subtitle.
Research questions and hypotheses
This study seeks to determine whether an increase in subtitle speed impacts a) the rereading of words following horizontal regressions or revisits from elsewhere in the subtitle, likely as an indication of linguistic processing; b) the rereading of words following vertical eye movements from the image, likely as an indication of the integration of the words in the subtitle and the image to form a situation model; and c) the ability of viewers to identify words and read subtitles to completion (as evidenced by word skipping). Our hypotheses are that high subtitle speeds will result in fewer words being reread after both horizontal and vertical eye movements, resulting in more superficial linguistic processing (as found by Liao et al., 2021), as well as impoverished situation models. We also hypothesize that an increase in subtitle speed will result in an increase in the number of words skipped both in the subtitles overall (marking a reduction in lexical processing and word identification), and at the end of subtitles (marking a reduction in post-lexical integration), compromising the viewer’s ability to integrate the information in the subtitles with the image.
Methodology
To address the research questions of the present study, we performed new analyses using the comprehension and eye-movement data from Liao et al.’s (2021) experiment. In that experiment, participants watched videos with subtitles at three speeds (12 cps, 20 cps, and 28 cps) with and without background video. As the present study focuses on the impact of subtitle speed on the integration between the subtitle and the video, we only report on the conditions with video presentation. While the analyses of word skipping and revisits are new, the analysis of comprehension is the same as that in Liao et al. (2021), but reported only for the video present condition.
Design
The experiment had one independent variable, namely subtitle speed, with three levels: 12 cps, 20 cps, and 28 cps in a within-participant design. The three conditions were counterbalanced via a Latin-square design, so that each participant encountered all conditions in a randomized order but read a given set of subtitles only once.
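As an illustration of this kind of counterbalancing, the sketch below builds a cyclic Latin square over the three speeds and uses one row per participant group to assign speeds to clips. The exact assignment scheme used in the experiment is not described here, so the cyclic square, the clip labels, and the way rows are mapped to participants are all assumptions; randomization of presentation order is left out.

```python
SPEEDS = (12, 20, 28)  # cps


def latin_square(conditions):
    """Cyclic Latin square: each condition occurs once per row and column."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]


def assign_speeds(participant_id, clips, conditions=SPEEDS):
    """Map each clip to one speed for a given participant (hypothetical scheme)."""
    row = latin_square(conditions)[participant_id % len(conditions)]
    return {clip: row[i % len(conditions)] for i, clip in enumerate(clips)}


clips = [f"clip_{i}" for i in range(1, 7)]  # six placeholder clip labels
for participant in range(3):
    print(participant, assign_speeds(participant, clips))
```

With six clips and three speeds, each participant in this sketch sees every clip once and every speed twice, and across three consecutive participants each clip appears at all three speeds.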
Participants
The participants recruited for this experiment (n = 31, aged 18–36, 25 female and 6 male) were all native English speakers with advanced reading skills (they were recruited from a population of university students). They all had normal or corrected-to-normal vision. Participants provided informed consent and received either a gift voucher or course credit for participation. To control for familiarity with subtitle reading which could impact the results, participants were asked to indicate how often they watch English movies with English subtitles on a scale from 1 (never) to 7 (always). The mean for our 31 participants was 2.78 (SD = 1.96), indicating that the task was not completely unfamiliar or unnatural to these participants.
Stimuli
The stimuli for this experiment were six self-contained video clips from the BBC documentary series Planet Earth (Fothergill, Reference Fothergill2006). To eliminate the influence of auditory input, the soundtracks were removed. This also allowed for more control of the subtitle speed since it was not necessary to synchronize the subtitles with the spoken narrative. As can be seen in Table 1, the subtitles in all the clips were comparable in terms of the total number of lines (one line was used for all subtitles, with an average length of 42.6 characters including spaces and punctuation – minimum length of 27, maximum length of 59) and total word count, and were also controlled for readability (using the Flesch Reading Ease score; Graesser et al., Reference Graesser, McNamara, Cai, Conley, Li and Pennebaker2014). Although the line length therefore sometimes exceeded the conventional line length of around 37 to 42 characters, using one line means that the additional variable of number of lines (that would require return sweeps) could be avoided. Videos were presented at the center of the screen with subtitles presented below the videos.
The subtitle speed was edited using Aegisub subtitle-editing software (www.aegisub.org). The slowest speed was created first (12 cps), and this speed was then increased incrementally for the faster speeds of 20 and 28 cps by reducing the time the subtitle was on screen. This means that the gap between subtitles increased systematically across the three speed conditions. However, in cases where a sentence was distributed across more than one subtitle, the gap between these subtitles was kept to a maximum of 500 ms to ensure coherence. (For more information on the characteristics of the subtitles, please see Liao et al., Reference Liao, Yu, Reichel and Kruger2021.) Participants had to answer eight three-alternative comprehension questions after each video that were based on the information contained in the subtitles.
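The relation between presentation speed and on-screen time can be illustrated with a short sketch. It assumes that speed in characters per second (cps) is the character count of a subtitle (including spaces and punctuation) divided by its display duration. The function name is hypothetical and the code is not part of the original procedure, in which the timing was set in Aegisub.

```python
def display_duration(n_chars: int, cps: float) -> float:
    """Seconds a subtitle must stay on screen so that n_chars / duration == cps."""
    if cps <= 0:
        raise ValueError("cps must be positive")
    return n_chars / cps


# Minimum, approximate average, and maximum subtitle lengths reported for the stimuli.
for n_chars in (27, 43, 59):
    durations = {cps: round(display_duration(n_chars, cps), 2) for cps in (12, 20, 28)}
    print(n_chars, durations)
```

Under this assumption, a subtitle of fixed length is displayed for less than half as long at 28 cps as at 12 cps, which is why the gap between consecutive subtitles grows across the three conditions.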
Apparatus
We recorded participants’ eye movements using an EyeLink 1000+ (SR Research Ltd., Canada) eye-tracker with a 2,000-Hz sampling rate. The stimuli were displayed on a BenQ Zowie XL2540 screen with a 240-Hz refresh rate and a screen resolution of 1,920 × 1,080 pixels. Videos were presented at the center of the screen with a resolution of 1,440 × 810 pixels, and subtitles were presented below the video using a 30-point Courier New font (RGB color of 255, 255, 102). Participants were seated 95 cm from the screen so that each letter took up approximately 0.4° of visual angle. A chin-and-forehead rest was used to minimize head movements. Only the right eye was tracked.
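For readers who wish to relate the reported visual angle to physical size, the standard geometric relation is angle = 2·atan(size / (2·distance)). The sketch below applies it to the reported values of 0.4° and 95 cm. The function names are hypothetical, and the resulting letter width is derived from these two reported values only, not measured on the display.

```python
import math


def visual_angle(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of size_cm viewed from distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_from_visual_angle(angle_deg: float, distance_cm: float) -> float:
    """Physical size (cm) that subtends angle_deg at a viewing distance of distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Approximate width of one letter implied by 0.4 degrees at 95 cm.
letter_width_cm = size_from_visual_angle(0.4, 95)
print(f"{letter_width_cm:.2f} cm")
```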
Procedure
Each participant was tested individually. Before the experiment, participants were instructed to watch the videos for comprehension and to answer the comprehension questions after each video. A nine-point calibration was conducted before each video. The maximum calibration error was 0.5° to ensure tracking accuracy. To minimize fatigue, participants were given a 5-minute break after every second video.
Analyses
While Liao et al. (Reference Liao, Yu, Reichel and Kruger2021) reported a range of global measures (including mean fixation duration, mean saccade length, fixation count, and crossovers between subtitles and the image), as well as local, word-based measures (the effect of word frequency, word length and wrap-up using gaze duration, total times, and skipping probability of words in subtitle final positions), the current study reports only on word-based measures related to the rereading of words following vertical or horizontal eye movements, as well as to the skipping of words overall and at the end of subtitles (or incomplete reading).
We report the comprehension scores for each speed condition as an indication of the degree to which participants could recognize the information contained in the subtitles. The hypothesis is that comprehension will be impacted negatively by increasing speed, as the time to process the subtitle decreases.
In order to compare participants’ subtitle reading as a function of subtitle speed, we report word skipping and rereading:
(1) To measure the impact of speed on the ability of participants to identify words, we report the percentage of skipped words or the percentage of words that did not receive any fixations.
(2) To measure the impact of subtitle speed on the ability of participants to read subtitles to completion, we report the percentage of words skipped after the right-most fixation. This is obtained from the number of words in each subtitle after the right-most fixation in the subtitle (i.e., words at the end of the subtitle that are not fixated), divided by the total number of words in that subtitle. Words after the right-most fixation were coded as 1, and all other words were coded as 0.
(3) To measure the impact of subtitle speed on linguistic processing, we analyze the percentage of words revisited following horizontal eye movements. Here we look at words that were fixated more than once (excluding refixations during first-pass reading of the word), following a horizontal eye movement. In other words, words fixated more than once were coded as 1, and the rest 0. This measure reflects the extent to which the progressive reading is interrupted by a return to a preceding word before progressive reading resumes.
(4) To measure the impact of subtitle speed on global processing and on the ability of viewers to integrate objects in the image with words in the subtitle through rereading, we analyze the percentage of words revisited following vertical eye movements. Here we also look at words that were fixated more than once (excluding refixations during first-pass reading of the word), but this time following a vertical eye movement initiated from within the video image, again coding refixated words as 1, and the rest as 0.
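The binary coding behind the four measures listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script. It assumes each subtitle comes with a chronological list of fixations, where an integer is the index of the fixated word and a hypothetical `VIDEO` marker denotes a fixation on the image. It also assumes that a revisit is any return to an already fixated word that is not a consecutive refixation, classified by where the preceding fixation landed.

```python
from typing import Dict, List, Optional

VIDEO = None  # hypothetical marker for a fixation that landed on the video image


def code_words(n_words: int, fixations: List[Optional[int]]) -> Dict[str, List[int]]:
    """Return a 0/1 code per word for each of the four measures."""
    fixated = [0] * n_words
    revisit_h = [0] * n_words
    revisit_v = [0] * n_words
    previous = VIDEO  # assumption: gaze is on the image before the subtitle appears
    for current in fixations:
        if current is not VIDEO:
            if fixated[current] and previous != current:
                # A return to a word that was read before; consecutive fixations
                # on the same word (first-pass refixations) never reach this branch.
                if previous is VIDEO:
                    revisit_v[current] = 1  # came from the video: vertical movement
                else:
                    revisit_h[current] = 1  # came from another word: horizontal movement
            fixated[current] = 1
        previous = current
    rightmost = max((i for i in range(n_words) if fixated[i]), default=-1)
    return {
        "skipped": [1 - f for f in fixated],
        "skipped_after_rightmost": [int(i > rightmost) for i in range(n_words)],
        "revisited_horizontal": revisit_h,
        "revisited_vertical": revisit_v,
    }


# Hypothetical six-word subtitle and fixation sequence (not data from the study).
codes = code_words(6, [0, 1, 1, 3, 1, 3, VIDEO, 3, 4])
for measure, values in codes.items():
    print(measure, values, f"{100 * sum(values) / len(values):.1f}%")
```

Coding every word as 0 or 1 in this way yields one row per word per participant, which is the form required by a binomial mixed model with participant and word as random effects.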
The comprehension and eye-movement data were analyzed using Generalized Linear Mixed Models (GLMMs) via the lme4 package (version 1.1-23) in R (version 3.6.3). For all the analyses, subtitle speed was treated as a fixed factor, with participant and word treated as random effects. Contrasts between the speed conditions (with speed treated as a factor) were set up using the contr.sdif function to compare the means of each pair of consecutive levels (20-12 cps and 28-20 cps).
Only significant results are reported for brevity. When fitting a model, we started with a maximal random-effect structure, which was then progressively pruned following the Parsimonious Mixed Model approach (Bates, Kliegl, Vasishth, & Baayen, Reference Bates, Kliegl, Vasishth and Baayen2015). The following model was used for all analyses: DV ∼ Subtitle Speed + (1 | Participant) + (1 | Word). A summary of the models is provided in Appendix 2. Because the six video clips were all selected from the Planet Earth series and thus comparable, a single random effect variable was coded for each combination of video, subtitles, and words in our analyses (assuming no inherent video-subtitle hierarchy). Finally, the emmeans package (version 1.4.7) in R was used to compute contrasts between any two conditions (e.g., 12 cps vs. 28 cps) and to extract the estimated means.
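To make the successive-difference coding concrete, the sketch below rebuilds the coding matrix that `contr.sdif` (from the R package MASS) produces for a three-level factor. It shows that, under this coding, the intercept corresponds to the grand mean and the two coefficients to the 20-12 cps and 28-20 cps differences. The analyses themselves were run in R; the Python function names are hypothetical. For illustration the reported skipping percentages (29%, 35%, and 43%) are treated as cell means on the probability scale, whereas the fitted GLMMs operate on the logit scale, so the model coefficients will differ numerically.

```python
from fractions import Fraction
from typing import List


def successive_difference_contrasts(k: int) -> List[List[Fraction]]:
    """k x (k-1) coding matrix: column j contrasts level j+1 with level j."""
    return [
        [Fraction(j, k) if i > j else Fraction(j - k, k) for j in range(1, k)]
        for i in range(1, k + 1)
    ]


def fitted_means(grand_mean, differences, contrasts):
    """Cell means implied by an intercept and successive-difference coefficients."""
    return [
        grand_mean + sum(c * d for c, d in zip(row, differences))
        for row in contrasts
    ]


contrasts = successive_difference_contrasts(3)
means = [Fraction(29, 100), Fraction(35, 100), Fraction(43, 100)]  # 12, 20, 28 cps
grand_mean = sum(means) / len(means)
differences = [means[1] - means[0], means[2] - means[1]]  # 20-12 and 28-20

# The coding reproduces the cell means exactly from the grand mean and the differences.
print(fitted_means(grand_mean, differences, contrasts) == means)
```

One consequence of this coding is that the 12 cps vs. 28 cps comparison is not estimated directly, which is why a separate step is needed for contrasts between non-adjacent conditions.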
Results
Comprehension accuracy
Overall comprehension accuracy was above chance (0.33), ranging from 0.50 to 0.92 (M = 0.74, SD = 0.44) across participants. As illustrated by Figure 3, comprehension accuracy declined significantly from 20 to 28 cps (b = −0.49, SE = 0.21, z = −2.31, p = 0.02).
Eye-movement measures
For the eye-movement analyses, two participants were excluded due to poor tracking quality, as were six participants whose comprehension accuracy was less than 0.40 for two of the six videos. Fixations shorter than 60 ms or longer than 800 ms were removed (6.5% of the total data).
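The duration-based exclusion can be expressed as a simple filter. The sketch below is illustrative only: the function name and the example durations are hypothetical, and it assumes that the 60 ms and 800 ms bounds are themselves retained.

```python
from typing import List, Tuple


def filter_fixations(
    durations_ms: List[float], min_ms: float = 60, max_ms: float = 800
) -> Tuple[List[float], float]:
    """Keep fixations lasting between min_ms and max_ms; also return the share removed."""
    kept = [d for d in durations_ms if min_ms <= d <= max_ms]
    removed_share = 1 - len(kept) / len(durations_ms) if durations_ms else 0.0
    return kept, removed_share


# Hypothetical fixation durations in milliseconds (not data from the study).
kept, removed_share = filter_fixations([45, 60, 210, 800, 950])
print(kept, f"{removed_share:.0%} removed")
```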
Table 2 presents descriptive statistics for the four eye-tracking measures used to examine subtitle processing. Overall, with the increasing subtitle presentation speed, more words were skipped, more subtitles were not read to completion, and fewer words were reread following either horizontal or vertical eye movements.
Percentage of words skipped
As Table 3 and Figure 4A show, more words were skipped as the speed increased (all |z|s > 14.25, ps < 0.001). At the slowest rate, viewers skipped only 29% of words, but this increased to 35% at the medium rate and to 43% at the fastest rate.
Bold font indicates |z| > 1.96.
Percentage of words skipped after right-most fixation
From Table 3 and Figure 4B, it is clear that viewers were increasingly unable to read to the end of subtitles before they disappeared as the speed increased (all |z|s > 12.29, all ps < 0.001). The proportion of words skipped at the end of subtitles increased from 17% at the slowest speed, to 19% at the medium speed and to 24% at the fastest speed.
Percentage of words revisited following horizontal eye movements
As shown in Table 3 and Figure 4C, viewers reread fewer words following horizontal eye movements with increasing subtitle speed (all |z|s > 14.88, ps < 0.001). The percentage of words revisited following horizontal eye movements (from within the subtitle) reduced from 19% at the slowest speed, to 12% at the medium speed, and to a mere 7% at the fastest speed.
Percentage of words revisited following vertical eye movements
As shown in Table 3 and Figure 4D, viewers reread fewer words following vertical eye movements from the video when subtitle speed increased (all |z|s > 4.22, ps < 0.001). The percentage of words revisited from the video reduced from 2.9% at the slowest speed, to virtually disappear at the medium speed (0.5%) and at the fast speed (0.2%).
Discussion
The fact that there was a significant difference in comprehension accuracy between the slowest and the fastest rate suggests that the fast subtitle speed does have a negative impact on comprehension. Because the comprehension questions merely tested the ability of the participants to identify one of three alternatives correctly on eight items per video, the questions had limited discriminatory value, and this finding should therefore be treated with caution. Our hypothesis in terms of comprehension is therefore only partially supported.
There was a significant effect of subtitle speed on the percentage of words that were skipped, with around 30% of words skipped at the slowest speed, which is in line with skipping rates during static reading (between 20% and 30% according to Rayner, Reference Rayner1998), and 43% of words skipped at the fastest rate. In addition, an increasing number of skipped words were at the end of subtitles as the speed increased, suggesting that viewers were increasingly unlikely to be able to read the subtitles in full before they disappeared as the subtitle speed increased. This points to the fact that increasing the subtitle speed will inevitably result in fewer words being processed, and more subtitles not being processed in full, confirming our hypothesis. A larger proportion of words being skipped at both the faster speeds is a clear indication that viewers are more likely to miss important information in subtitles as the speed increases and that this is likely to increase even further in the presence of variable subtitle speed when the viewer cannot get used to (or predict) a consistently fast speed as in this experiment. The same applies to the ability of viewers to read the full subtitle before it disappears, thereby compromising sentence integration.
Furthermore, our analyses of rereading behavior clearly demonstrate that increased subtitle speed impaired viewers’ engagement with linguistic processing for text comprehension as well as the integration process between two different visual sources (i.e., the subtitle and the video) that is essential for building a comprehensive situation model. As the subtitle speed increased, participants made fewer revisits to previously fixated words, both from elsewhere in the subtitle (i.e., following horizontal eye movements) and from the video (i.e., following vertical eye movements). The fact that participants could still revisit or reread around 19% of words after horizontal eye movements at 12 cps, but only 12% and 7% at the two faster speeds, suggests that the linguistic processing of the subtitles suffers, in that the essential reading routine of rereading words that were misidentified or misinterpreted is compromised at the higher speeds, confirming our hypothesis.
In addition, the fact that refixation of words following vertical eye movements from the image virtually disappears at both the faster rates (going down from 3% at the slowest rate to 0.5% and 0.2% at the two faster rates) suggests that viewers are less likely to be able to integrate their identification of specific objects in the image with specific words in the subtitles at the higher speeds. We are not suggesting that this integration cannot occur at the higher speeds, since the viewers would still have identified words in their declarative memory as well as in their procedural memory to the extent that sentence processing had occurred (Figure 2), but in view of the fact that more words are skipped and fewer subtitles read to completion, both the text base and the situation model are likely to be affected negatively by this reduction in rereading for integration.
A key point here is that, although comprehension may still continue at times when there is no text on screen and when rereading is therefore not possible, a reduction in rereading combined with an increase in word skipping would make it harder for a viewer to a) correct for syntactic misanalysis and lexical misidentification of words, b) perform word identification in general, and lexical integration of words after reading full subtitles (and therefore sentence processing), and c) capitalize on the benefits of multimodal texts to integrate information in the subtitles with the processing of the image in building a situation model. Therefore, although the time for integration increases at the higher rates in this experiment with longer gaps between subtitles, there is a strong likelihood that high speeds would result in more reading errors and incomplete reading, which will counter this benefit. It is also important to note that this increased gap between subtitles as a result of the shorter duration of subtitles does not reflect the majority of naturalistic contexts where verbatim subtitles are used in the presence of high speech rates, often with minimal gaps between subtitles.
Conclusion
The findings of this study under controlled conditions (no sound, consistent subtitle speed, single display lines) reveal that faster subtitle speeds have a considerable impact on the depth of processing of the subtitles. As the speed increases, viewers speed up their reading, resulting in more words being skipped, both overall and at the end of subtitles. With fewer words being fixated and fewer subtitles read to completion as the speed increases, sentence processing will most likely begin to suffer. In naturalistic contexts where subtitle speed tends to vary significantly, this effect is likely to increase as viewers cannot predict the amount of time they will have available to finish reading a subtitle and may therefore fail to read even more subtitles to completion.
The findings of this study on the reduced ability of viewers to reread previously fixated words as the speed increases, either after horizontal eye movements from within the subtitles or following vertical eye movements from the video, further support our conclusion that high subtitle speeds will likely have a negative impact on both the linguistic processing of the subtitles and the integration of the subtitles with the video.
The fact that readers speed up their reading as the subtitle speed increases (evident in fewer, shorter fixations and longer saccades as found by Liao et al., Reference Liao, Yu, Reichel and Kruger2021) is not surprising. Viewers will naturally try to cover as much as possible of the text before it disappears. The consequence, however, is that viewers have less time to process words and to re-process words that were misidentified or that resulted in confusion or were simply unfamiliar.
This brings us back to the question of whether viewers can cope with faster speeds. Our results clearly indicate that the processing of subtitles becomes increasingly superficial and incomplete as the speed increases. This is not to say that viewers do not have the ability to adapt their reading behavior to compensate for the reduction in available time to read the subtitles, although it does impact the overall processing depth. An analogy with driving seems appropriate here. Just as driving at faster speeds makes it harder to react to anything on the road in time, increasing the reading speed to keep up with faster subtitles makes it harder to react to anything in the text that trips up reading. This is evident in the dramatic increase in the number of words that are skipped overall and at the end of subtitles, but more specifically in the reduction of rereading that occurs with increasing speed.
The videos used in this experiment were not particularly demanding, with a slow editing pace and sweeping camera shots. It stands to reason that more complex visuals resulting from a faster video editing pace and more complex interactions between elements on the screen will leave fewer resources to engage in the processing of subtitles and would result in either more visual elements not being identified while reading the subtitles, or more words not being identified due to an increase in attention to the image. In future research, we plan to manipulate the visual demands to investigate this.
Like most experimental studies, this study had to make a number of compromises. Some limitations relate to the testing of comprehension, which provided limited insight into the impact of increasing subtitle speed on comprehension. Since the comprehension questions were based on the content of the subtitles, they did not test the comprehension of the visual elements of the clips that were not referenced in the subtitles. This might explain the lack of a significant difference in comprehension between the slow and medium speeds. The duration of the clips, although at least as long as the clips used in other studies, combined with the relatively undemanding nature of the video, means that the real impact of subtitle speed on comprehension could not be tested robustly. In future research, validated comprehension tests on longer (even full-length) videos could be used to investigate the impact of speed on comprehension, particularly since no study to date has been able to measure this with the necessary rigor, including the current study. It should also be noted that the participants in this experiment were all highly educated first-language speakers of English and that the impact on viewers with lower language proficiency and educational levels might be more severe. This would be in line with studies on deaf school students (cf. Tyler, Jones, Grebennikov, Leigh, Noble, & Burnham, Reference Tyler, Jones, Grebennikov, Leigh, Noble and Burnham2009). Furthermore, as first-language speakers of English who report watching subtitled video infrequently, the participants may not be as skilled at reading subtitles as viewers who use subtitles routinely. It would therefore also be important to replicate this study with such participants.
Taking away the soundtrack, although necessary in order to isolate the impact of subtitle speed, also meant that this study does not take into account the impact of auditory input in the multimodal integrated-language framework (Figure 2). Like the visual input, sound and dialogue will certainly facilitate feature binding and the formation of situation models that will reduce the reliance on the text in the subtitles. Nevertheless, particularly in cases where the audience is more reliant on the subtitles (such as in the case of deaf and hard-of-hearing viewers or viewers who do not have any access to the language of the soundtrack, second language learners, or when viewers’ access to the soundtrack is limited due to environmental conditions such as noisy environments), higher subtitle speeds will make it harder for the viewer to process the subtitles and understand the film as a whole. This problem will be exacerbated by a variable subtitle speed where viewers cannot adjust to a faster speed due to the fluctuation in speed.
Acknowledgments
We would like to thank Erik Reichle and Lili Yu for their valuable contribution to the conceptualization of the study and statistical analyses.
Competing interests
The authors declare no conflict of interest concerning the authorship or the publication of this article.
Appendix 1. Information about Netflix subtitles
Appendix 2. Model summaries
Data and R scripts are available on request.
1. GLM model summary for comprehension analysis
2. GLM model summary for percentage of words skipped
3. GLM model summary for percentage of words skipped after right-most fixation
4. GLM model summary for percentage of words revisited following horizontal eye movements
5. GLM model summary for percentage of words revisited following vertical eye movements