
A Call for Conceptual Models of Technology in I-O Psychology: An Example From Technology-Based Talent Assessment

Published online by Cambridge University Press:  03 November 2017

Neil Morelli,* The Cole Group–R&D
Denise Potosky, Pennsylvania State University–Psychology
Winfred Arthur Jr., Texas A&M University–Psychology
Nancy Tippins, CEB

*Correspondence concerning this article should be addressed to Neil Morelli, The Cole Group–R&D, 300 Brannan St., Suite 304, San Francisco, CA 94107. E-mail: neil.morelli@gmail.com

Abstract

The rate of technological change is quickly outpacing today's methods for understanding how new advancements are applied within industrial-organizational (I-O) psychology. To further complicate matters, specific attempts to explain observed differences or measurement equivalence across devices are often atheoretical or fail to explain why a technology should (or should not) affect the measured construct. As a typical example, understanding how technology influences construct measurement in personnel testing and assessment is critical for explaining or predicting other practical issues such as accessibility, security, and scoring. Therefore, theory development is needed to guide research hypotheses, manage expectations, and address these issues at this intersection of technology and I-O psychology. This article is an extension of a Society for Industrial and Organizational Psychology (SIOP) 2016 panel session, which (re)introduces conceptual frameworks that can help explain how and why measurement equivalence or nonequivalence is observed in the context of selection and assessment. We outline three potential conceptual frameworks as candidates for further research, evaluation, and application, and argue for a similar conceptual approach for explaining how technology may influence other psychological phenomena.

Type: Focal Article
Copyright © Society for Industrial and Organizational Psychology 2017

In a recent issue of the Economist, John Battelle, a technology journalist and entrepreneur, was quoted as saying: “Technology is no longer a vertical industry, as it's been understood by everyone for four decades. Technology is now a horizontal, enabling force throughout the whole economy” (“To fly, to fall,” 2015). Technology has also been a pervasive enabling force in the science and practice of industrial-organizational (I-O) psychology (Coovert & Thompson, 2014a). In recent years, the academic literature and the Society for Industrial and Organizational Psychology (SIOP) conference have been inundated with studies focusing on the application of technology to I-O psychology. Whether it involves selection and assessment (e.g., Gray, Morelli, & McLane, 2015; Seiler et al., 2015), training (e.g., Bank et al., 2015), or performance management (e.g., Armstrong, Landers, & Collmus, 2015), almost no I-O psychology practice area has gone untouched by the application of technology. Undoubtedly, these studies are important for identifying differences that occur across technological devices, increasing general awareness of technology's impact on I-O psychology as a field, finding best practices that benefit client organizations, and directing research interests. However, it is important to ask whether the I-O psychology community is any closer to understanding why or how technology is (or is not) affecting the measured construct, the trainee's reaction, or the effectiveness of a new performance evaluation method, to name a few examples. In this article, we argue that I-O psychology has largely failed to answer these questions because it lacks a theoretical or conceptual framework of technology applied to I-O psychology that transcends individual studies of specific hardware or software applications.

A panel session held at the 2016 SIOP conference in Anaheim, California (Morelli, Adler, Arthur, Potosky, & Tippins, 2016), was designed to answer these questions: “How much (or how little) does technology influence the phenomena commonly examined in I-O psychology or its methods and practices?” and “If it does, are there any conceptualizations that can help explain these technology-related effects?” Starting from these general questions, the panel narrowed its discussion to conceptualizations that could help create more specific, theory-based hypotheses for technology applied to assessment within a selection context. This article is an extension of the panel's conclusions and an invitation to other researchers and practitioners to join this discussion. Specifically, this article intends to build on the conclusions that (a) there are legitimate reasons a conceptual framework of “technology applied to I-O psychology” is needed, and (b) as an example, there are existing conceptual frameworks that could be evaluated, refined, and potentially adopted for guiding hypothesis development related to how technology impacts construct measurement in personnel testing and assessment.

A Working Definition of Technology

Undoubtedly, the question of how to create a unifying conceptual model of “technology” applied within I-O psychology is a broad one with many tacks to an informative answer. To help narrow both the question and the response, we offer a working definition of technology and limit the discussion to focus on how technology applies to talent assessment.

Defining technology as a general term has been largely overlooked within the I-O psychology community. When the term “technology” appears in I-O psychology publications, it is usually discussed in the context of how it is being used to digitize and/or automate a method of measurement, assessment, or communication (Mead, Olson-Buchanan, & Drasgow, 2014; Tippins, 2011). In Coovert and Thompson's (2014b) edited book on technology applied to work, which helpfully summarized the most recent thinking on and applications of information technologies in I-O psychology practice, we could find no general definition of technology unrelated to a specific application (e.g., training, performance appraisal, teamwork, or leadership). Others have most likely noticed this definitional omission; however, because I-O psychologists are generally not technologists or engineers, it makes sense that our profession would not risk a misinformed or overly broad definition.

Even for the writers, historians, and engineers who study and build technology, the exercise of defining it has resulted in an amalgam of contradictory statements (Arthur, 2009). As Arthur (2009) simply put it, “What is technology? The answer, whether we turn to dictionaries or to the writings of technology thinkers, is annoyingly unclear” (p. 27). However, Arthur goes on to define technology at three levels: singular (a device, method, or process), plural (a combination of practices or components), and general (the entire collection of devices and processes available to a culture). We suspect that when I-O psychology researchers and practitioners refer to “technology-based” practices, such as assessments used in talent selection, the third definition is implied, but applications fitting the first and second definitions, technology-singular and technology-plural, are what are typically studied. For instance, when Mead, Olson-Buchanan, and Drasgow (2014) summarized the current research on technology-based selection, defined as the automation of traditional selection processes, the research they referenced investigated specific uses of a technology (e.g., computers, the Internet, a simulation).

With these three definitional levels and our personal experiences with technology-enabled assessment as our backdrop, we offer a working, general definition: Technology, as it is commonly applied to I-O psychology practice, is the constellation of individual tools that assist a user with controlling or adapting to his or her environment. Although we run the risk of offering a broad and perhaps overly simplistic definition, we believe this general definition elevates our view of technology beyond individual technologies to a suite of tools. It simultaneously narrows our view of technology, sometimes defined as a general knowledge area or cultural truism passed down over time, to discrete groupings of tools that have specific purposes (Arthur, 2009). Our hope is that conceptualizing technology at this middle, plural level allows us to develop theories that are generalizable but not so unwieldy that they fail to have specific implications for individual technologies applied to I-O psychology practice.

This definition also allows individual tools to combine in ways that accomplish a purpose that is adaptive for an individual user. This is an important distinction germane to how technology is applied in I-O psychology. I-O psychology assumes that at least one human is involved in the phenomenon being studied or leveraged; thus, we assume that a human is interacting with the technology in question to serve a human purpose (e.g., automation for efficiency). This further narrows our definition to exclude supporting technologies that store or share data in the background and are not directly interacted with by an end user.

In an I-O psychology selection and assessment context, it could be argued that practitioners have adopted technological tools to help assessors adapt to environmental business demands for timely, data-based decision making. In other words, when I-O psychology practitioners create “technology-based assessments,” they are leveraging a suite of individual technologies (e.g., personal computers, Internet access, Web browsers) that achieve the common purpose of helping assessors process more response data, or process response data more accurately, while implementing and interpreting assessments more quickly, affordably, profitably, and accessibly (Tippins, 2011). By focusing on a suite of individual tools organized by their purpose, we can avoid getting bogged down in studying each individual component in isolation. In sum, borrowing language from Arthur (2009), we take a definition of technology that includes multiple devices and systems that can be studied using generalizable theoretical constructs, which still have implications for individual components.

Studying Technology's Impact on Construct Measurement

Technology has had a transformative effect on every component of the testing process, including item generation (Gierl, Lai, Fung, & Zheng, 2016), item delivery (Luecht, 2016), scoring (Bennett & Zhang, 2016), security (Foster, 2016), and accessibility for test takers (Stone, Laitusis, & Cook, 2016). According to the recently revised Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014), the evaluation of an assessment should be based on its purpose, construct or domain definition, intended population, and intended uses. A common thread through these evaluations is the importance of the construct validity of the test. In our experience, determining how the measurement of the construct may or may not be affected by using technology is critical before determining other practical matters surrounding the test. For instance, accessibility is often important in employment testing. The Standards (AERA et al., 2014) state that accessibility denotes that “all test takers should have an unobstructed opportunity to demonstrate their standing on the construct(s) being measured” (p. 49). Like accessibility, other testing components, such as test takers' reactions or the length of the assessment, are important from a practice perspective, but we focus our discussion on how technology may or may not affect the construct being measured. Theorizing about the influence of technology on construct validity is an essential first step toward anticipating technology's role in item generation and delivery, scoring, security, and accessibility, because such insight into the nature of what we want to measure has implications for how we go about creating, using, and modifying measures.

Oftentimes, the primary concern with using a new assessment medium is that the technology introduces error into the construct measurement (Scott & Mead, 2011). According to classical test theory (Guilford, 1954; Gulliksen, 1950; Spearman, 1904, 1910; Thurstone, 1931), a person's observed score on a test is equal to his or her “true score” on the focal construct plus error (cf. Ghiselli, Campbell, & Zedeck, 1981). In applications of classical test theory formulas, reliability estimates represent error as any source of construct-irrelevant variance, essentially combining systematic and unsystematic error. Potosky (2008) argued that the variance introduced by the administration medium represents systematic error, which can confound the target construct's measurement (e.g., Zickar, Cortina, & Carter, 2010). The technical term for this error is interpretational confounding (Anderson & Gerbing, 1988). It occurs when the latent construct being measured is different from the latent construct that is intended to be measured, and it is a serious psychometric problem when establishing the reliability and validity of the scores obtained from a selection tool (Binning & Barrett, 1989). Put another way, systematic errors introduced by the chosen technological medium account for “momentary and nonrepeating factors” (Becker, 2000, p. 370) that can change the measured construct, thus affecting its construct-related validity. In addition, because the reliability of the measurement may be affected, criterion-related validity can also be impacted.
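To make this decomposition explicit, the classical test theory relation and the partition of error into a systematic, medium-related component and a random component can be written as follows (this is standard classical test theory notation; the subscripting is ours rather than drawn from the sources cited above):

X = T + E_s + E_r, \qquad \mathrm{Var}(X) = \sigma_T^2 + \sigma_{E_s}^2 + \sigma_{E_r}^2 \ \text{(assuming uncorrelated components)},

where X is the observed score, T is the true score on the focal construct, E_s is systematic error (e.g., variance attributable to the administration medium), and E_r is random error. In these terms, the concern about interpretational confounding is that \sigma_{E_s}^2 is nontrivial, so that observed scores partly reflect the medium rather than only the intended construct.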

To test for this error, measurement equivalence tests are typically conducted (Vandenberg & Lance, 2000). Measurement equivalence refers to the extent to which two assessment tools measure the intended construct similarly and produce comparable results (Scott & Mead, 2011), and it is an important assumption to verify before estimating an assessment's reliability, validity, or observed group differences. In traditional equivalence studies, a new technological medium used to deliver an assessment is compared to a less technical (or more traditional) medium, either by comparing a series of structural equation models to one another (Schmitt & Kuljanin, 2008; Vandenberg & Lance, 2000; Vandenberg & Morelli, 2016) or by examining individual items for differential item functioning using item response theory (e.g., Tay, Meade, & Cao, 2014). These comparisons are usually made between technology-enabled versions of the same assessment, as researchers follow the advice that “the critical issue . . . is determining the major sources of error, estimating their size, and ideally, identifying strategies that can leverage the technology to improve reliability” (Scott & Mead, 2011, p. 32). The following questions summarize the potential sources of error that are often tested in today's technology-enabled assessment research; a simple illustration of the kind of device-level comparison these questions imply appears after the list:

a. Are latent or observed score differences associated with how the test taker is interacting with the technology (e.g., perceptual abilities of reading text on different screen sizes, or attitudes such as familiarity or anxiety while using different devices)?

b. Are latent or observed score differences created by how the test itself is interacting with the technology (e.g., type of construct measured, item type, length, response interaction)?

c. Are latent or observed score differences created by how the technology interacts with the environment (e.g., the strength of a Wi-Fi or cellular data signal on a smartphone, smartphone battery life)?

d. Are latent or observed score differences created by all of the above? (Note: The technology also influences and is influenced by those who use it, including test developers and test takers.)
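As a concrete, deliberately simplified illustration of these questions, the following sketch simulates Likert-type responses collected on two devices and flags items whose scores shift with the device after controlling for the remainder of the scale. It is a hypothetical Python example using simulated data and the statsmodels library; the sample sizes, effect size, and data are invented, and the sketch is not a substitute for the multigroup structural equation modeling or IRT-based differential item functioning procedures cited above. It simply conveys the basic logic of checking whether the administration medium contributes systematic, construct-irrelevant variance.

# Hypothetical device-level comparison: does an item function differently on a
# smartphone than on a desktop after conditioning on the rest of the scale?
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n_per_group, n_items = 500, 10
theta = rng.normal(size=2 * n_per_group)      # latent standing on the construct
device = np.repeat([0, 1], n_per_group)       # 0 = desktop, 1 = smartphone

# Simulate item responses; item 0 is given a small device-related shift to
# mimic medium-induced systematic error.
loadings = rng.uniform(0.5, 0.9, size=n_items)
items = np.empty((2 * n_per_group, n_items))
for j in range(n_items):
    shift = -0.3 * device if j == 0 else 0.0
    items[:, j] = loadings[j] * theta + shift + rng.normal(scale=0.6, size=2 * n_per_group)

# For each item, regress the item score on the rest-score and a device
# indicator; a reliable device effect flags potential construct-irrelevant
# variance tied to the medium.
for j in range(n_items):
    rest_score = items.sum(axis=1) - items[:, j]
    X = sm.add_constant(np.column_stack([rest_score, device]))
    fit = sm.OLS(items[:, j], X).fit()
    print(f"item {j}: device effect = {fit.params[2]:.3f}, p = {fit.pvalues[2]:.3f}")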

In attempting to answer these questions, extant research has been reactive: technology is often studied as a static exogenous variable directly affecting or mediating the relationship between psychological phenomena and organizational outcomes (e.g., Coyne, Warszta, Beadle, & Sheehan, 2005). Individual studies of technology applied to talent assessment usually follow the process of “create and/or apply new technology to a talent assessment, and compare the results to ‘traditional’ testing methods,” from the early comparisons between paper-and-pencil assessments and computerized assessments (e.g., Mead & Drasgow, 1993) to the comparisons between mobile device-delivered assessments and nonmobile assessments (e.g., King, Ryan, Kantrowitz, Grelle, & Dainis, 2015).

However, there are multiple theoretical and practical issues with the “conduct equivalence study, rinse, and repeat” cycle. First, beyond quasi-experimental, between-subjects studies using archival data, executing equivalence studies in operational employment settings is often infeasible. Collecting large enough samples can be challenging for the more controlled, within-subjects experimental designs, because applicants and employees have little tolerance for taking multiple versions of a test. Moreover, other practices that would also benefit from equivalence studies (e.g., translations and adaptations into multiple languages) complicate the research design and compound the sample size problem.

Second, planning, organizing, and presenting equivalence studies that generate a clear consensus on a given assessment medium take longer than the rate at which technology evolves. This has been observed in mobile device testing research: by the time the first equivalence studies were published, more powerful devices and new assessment types were already hitting the market (Chamorro-Premuzic, Winsborough, Sherman, & Hogan, 2016). In addition, it is unlikely that every variation of each device can be studied. Specifically, the incremental, iterative equivalence study approach neglects innovative technology-based assessments that could meaningfully advance the field, such as new unproctored assessment formats, virtual roleplays, immersive simulations, gamified assessments, crowdsourcing, and algorithmic data-gathering methods (Ryan & Ployhart, 2014). Although we do not advocate blindly applying these new technologies to selection practice, I-O psychologists run the risk of being sidelined by organizational leaders who will not wait for answers to their questions about the appropriate use of these technologies.

Third, Potosky (2008) pointed out that practitioners who want to use a new technology to administer selection tests might not want equivalence with an older administration medium, especially if attaining equivalence means forcing the new medium to mimic the format of the old one; the anticipated benefits of the new technology might represent an advancement over the old format. In this case, a lack of equivalence may be perceived as acceptable if it favors the new technology (Potosky, 2008), but without demonstrated equivalence between the new and old test media, a new validity study using data from the new medium is needed to estimate its validity. Moreover, equivalence studies are, in themselves, an atheoretical shortcut to estimating the validity of a new measure: even if two compared measures are highly correlated, their validities are not necessarily similar (McCornack, 1956).

Finally, relying on reactive equivalence studies also limits the ability to develop practically meaningful solutions or recommendations for organizations and stakeholders who rely on assessment data. Without a conceptual understanding that informs relationships among variables (e.g., test takers' true scores, technology's influence, and the latent construct being measured), the ability to substantively address practical questions is reduced. It should not be forgotten that explaining human cognition and behavior constitutes the primary charge of the I-O psychology profession. The theory defended or rejected in a research project is what is generalizable, not the specific quantitative findings of any one study. Therefore, creating or defining a conceptual model that is based on theory gives technology-enabled assessment research increased scientific and practical value and more fully aligns individual research efforts with the overall purpose of the field. In other words, predicating technology-related research efforts on theory helps answer these questions: “Why should we expect latent or observed score differences (or similarities) between a new assessment delivery method and an older one?” and “How are observed scores likely to differ?” Answers to these questions are essential when deciding whether to develop or use a new technology-enabled assessment.

In sum, researchers who conduct Sisyphean equivalence studies comparing the nuances of each new technological iteration, often under less than ideal conditions, incur opportunity costs of time, resources, and insight. Furthermore, this trial-and-error method ultimately fails to explain why construct-irrelevant variance is added to the measurement process. Thus, the fundamental question for technology applied to I-O psychology talent assessment is not “Does the technology change the construct-irrelevant variance in a new technology-enabled measurement of a psychological construct?” but rather “What are the theoretical reasons we should expect construct-irrelevant variance to change due to the use of technology?”

In our experience, most equivalence-comparison studies fail to answer the latter question because no conceptual models of technology are offered to inform the level of analysis or predict the interactions with the test taker's psychological processes. To address this problem, three potential conceptual frameworks are (re)introduced to help predict how technology should affect an individual's communicative and cognitive processing directly and thus his or her test score.

Potosky's (2008) Conceptual Framework

Potosky (2008) proposed a conceptual framework that views an assessment as a communicative act between the test taker and the individual or organization who wants to measure attributes of the test taker. The test and the test medium are two distinguishable components of this information exchange. Most assessments use mobile, electronic, written, or face-to-face communication channels, and the communication medium employed (e.g., a smartphone, the Internet, paper-and-pencil, video conferencing) will have structural attributes that affect the message (i.e., the test) quality as well as the participants in the exchange. Potosky eschews the use of broad technology labels and instead identifies four attributes that generalize across media. Specifically, the attributes of any test administration medium include:

1. Social bandwidth: the amount of information that can be shared via a given medium. A medium with high social bandwidth conveys more social information, such as paralanguage, facial expressions, and/or affect.

2. Transparency: the degree to which a medium goes unnoticed by the test taker during testing (i.e., is not evident or salient). A medium with high transparency does not interfere with or obstruct the communication exchange.

3. Interactivity: the pace of information sharing facilitated by the medium.

4. Surveillance: the degree to which a medium allows an external party to monitor or intercept the message being exchanged.

The four attributes of any assessment medium have structural features (by design) and dynamic features (changes that occur during test administration). Figure 1 shows how the structural and dynamic features of each attribute can be modeled. For any given medium, there is a structural range that defines the limits of each attribute. When designing an assessment and anticipating its use, a point along this range is targeted. In other words, every technological medium is set up, or structured, to have a certain level of each attribute. For example, an assessment might aim to utilize as much social bandwidth as the assessment medium allows but might not use the features of the medium that facilitate surveillance during the exchange. The up/down arrowed lines in Figure 1 indicate these choices for each attribute. During the actual assessment, however, the levels of each attribute will vary along the structural range that the medium allows and the targeted level of each attribute might not be realized. Figure 1 suggests a hypothetical depiction of the dynamically achieved level of each attribute relative to the targeted level associated with the medium. The arrows and markings within each bar indicate that the level of each attribute dynamically fluctuates (a lot or a little) during the assessment. Hence, given dynamic alterations during use, the level of each attribute may be represented as the average level achieved throughout an assessment. It would be interesting to compare the intended nature of each attribute of an assessment medium to the average level attained for each respondent and across respondents.

Figure 1. Structural and dynamic features of an assessment medium.
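For readers who find a concrete representation helpful, the sketch below shows one way the structural range, targeted level, and dynamically achieved levels of a medium's attributes could be recorded and summarized, in the spirit of Figure 1. The 0-to-1 attribute scale, the example values, and the class itself are hypothetical illustrations rather than operationalizations proposed by Potosky (2008).

# A hypothetical data structure for Potosky-style medium attributes: each
# attribute has a structural range (what the medium allows), a targeted level
# (how the assessment is set up), and dynamic observations logged during
# administration. Values are on an arbitrary 0-1 scale for illustration only.
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class MediumAttribute:
    name: str
    structural_min: float          # lower bound the medium allows
    structural_max: float          # upper bound the medium allows
    targeted: float                # level the assessment design aims for
    observed: list[float] = field(default_factory=list)  # dynamic levels during use

    def log(self, level: float) -> None:
        """Record a dynamically achieved level, clipped to the structural range."""
        self.observed.append(min(max(level, self.structural_min), self.structural_max))

    def achieved(self) -> float:
        """Average level achieved across the administration (falls back to the target)."""
        return mean(self.observed) if self.observed else self.targeted

# Example: a Web-conference interview whose transparency drops when a system
# message interrupts the exchange.
transparency = MediumAttribute("transparency", 0.2, 0.9, targeted=0.9)
for level in (0.9, 0.85, 0.4, 0.7):   # hypothetical moments during the interview
    transparency.log(level)
print(transparency.achieved() - transparency.targeted)   # gap between design and use

Comparing the targeted level with the average achieved level, per respondent and across respondents, is one way to operationalize the comparison suggested at the end of the preceding paragraph.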

For example, Web-conferencing software may be designed to enable audio-visual communication over the Internet to conduct a selection interview. The attributes of this medium can be set to maximize social bandwidth (e.g., using audio and video), interactivity (e.g., as close to immediate, real-time responsiveness as possible), and transparency (e.g., participants will hardly notice the Web-conferencing platform because with luck the technology will run smoothly during the assessment), and to minimize surveillance (e.g., no mention of recording the session). Yet, despite the way the medium is structured, any of these settings can dynamically change during the interview. Sometimes the dynamic variations associated with using a medium are subtle, such as the brief delay during a Web-enabled conversation that causes participants to talk over each other or the lack of eye contact because people look at each other's video images and not their Web cameras. The resulting awkward pauses and seeming interruptions disrupt interactivity and diminish social bandwidth because social cues are more difficult to interpret. When a system message such as “the Internet speed is not fast enough to allow video” or “please wait while we try to get the call back” pops up, the transparency of this medium diminishes substantially. Users are suddenly more aware of the medium and less focused on the communication exchange. When the test administrator or the respondent turns off the video camera and relies solely on audio, social bandwidth is greatly reduced and, at the same time, interactivity may be restored.

Potosky (2008) proposed that structural and dynamic features of the four attributes add systematic variance to the assessment process. For example, it is not simply a device's screen size that adds construct-irrelevant variance but the structural and/or dynamic low transparency (e.g., because the user must scroll down or expand the font size, or activate or turn off a setting to continue participating in the assessment exchange) that introduces systematic error. One could imagine a typology that anticipates the systematic effects associated with various combinations of the four attributes. In addition, research that compares the attributes of two different test media can explain why one medium used for delivering an assessment may or may not be equivalent to another medium. For example, a test administered via an Android smartphone might be more similar to the same test administered via a tablet than to the same test administered via an iPhone; a test administered via paper-and-pencil may be more similar to the same test administered via an e-reader than via a laptop. The reasons for similarities and differences have less to do with the label used to describe the medium and more to do with the attributes of the test medium. By focusing on the four attributes, this conceptualization provides a way to compare devices without having to study all the specific settings and versions of each device.

Research is needed to develop measures of these attributes, create methods for evaluating the structural and dynamic features of a given medium, and test this theory overall. Nevertheless, recent methodological research has placed an emphasis on including the dynamic aspect of mediating variables in prediction models (Huang & Yuan, in press); thus, embracing a conceptualization that views technology as having structural and dynamic attributes helps facilitate this measurement recommendation.

Arthur, Keiser, and Doverspike's (2017) SCIP Framework

Whereas Potosky (2008) framed the technological attributes that affect the quality of communication between two parties, the structural characteristics and information processing (SCIP) framework (Arthur, Keiser, & Doverspike, 2017) is a conceptualization of technology that assumes structural attributes affect the assessee's cognition. Specifically, this model explains how the device type used for an unproctored Internet test (UIT) introduces construct-irrelevant variance by affecting the respondent's cognitive load. Arthur, Keiser, et al. (2017) preface the development of their model with a review of the 23 published and unpublished studies on UIT device types. That review indicated that although mobile and nonmobile cognitive and noncognitive assessments do not differ in psychometric properties such as factor structure, score reliability, and differential item functioning, cognitive assessments completed on mobile devices typically yield lower scores than those completed on nonmobile devices. To explain this mean score difference, Arthur, Keiser, et al. offer four information processing variables, each associated with specified UIT device-type structural characteristics (screen size, screen clutter, response interface, and permissibility) that engender construct-irrelevant cognitive load. These information processing variables are as follows:

1. Working memory: the smaller screens native to most mobile devices likely increase the amount of information users must store in their working memory.

2. Perceptual speed and visual acuity: less viewable real estate on the screen creates more screen clutter of text, objects, and action buttons, which increases demands on the user's perceptual speed and visual acuity.

3. Psychomotor ability: the use of a touch interface, typical of smaller devices, rather than a keyboard and mouse requires more psychomotor ability to manipulate the device and the test content.

4. Selective attention: distractions resulting from the permissibility, or degrees of freedom in location choice, allowed by the device place more weight on the user's selective attention, that is, the ability to focus on the test content and ignore environmental distractions.

These structural differences are theorized to place additional cognitive load on the test taker through these four sources of information processing demands, thereby creating construct-irrelevant variance. The SCIP framework, which is illustrated in Figure 2, conceptualizes assessment device-type effects in terms of how individual differences in the specified information processing variables, engendered by the assessment device's structural characteristics, intersect with the constructs assessed to manifest as device-type effects (or the lack thereof). Consequently, the SCIP framework frames UIT (and other) device-type comparisons in psychological terms rather than merely as a “mobile” versus “nonmobile” comparison. Specifically, the structural characteristics stipulated in the framework create an information processing demands continuum, which allows for more nuanced and predictable relationships concerning score or psychometric differences across devices. For instance, smartphones are expected to create higher cognitive load for users because of their smaller screen sizes, touch interface, increased screen clutter, and higher permissibility compared with devices such as laptop or desktop computers, which lie at the lower end of this structural continuum. In the context of test-taker reactions and preferences, for example, the SCIP framework posits a number of propositions based on the increased test difficulty engendered by the additional device-related cognitive load. Test difficulty, broadly construed in terms of both content and method of assessment, has been demonstrated to influence reactions to assessments (e.g., Hong, 1999; Tonidandel, Quiñones, & Adams, 2002), with higher levels of difficulty being associated with more negative reactions. Hence, as per the SCIP framework, to the extent that the increased cognitive load associated with construct-irrelevant information processing variables introduces additional challenges to completing assessments on UIT devices, one would expect reactions to assessments completed on devices at the higher end of the device-engendered cognitive load continuum to be more negative.

Figure 2. Unproctored Internet test (UIT) device-type SCIP model. Illustration of UIT device-type structural characteristics and associated information processing demands. IP demands = information processing demands. Adapted from Arthur, Keiser, and Doverspike (2017).
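To illustrate how the SCIP logic could be used to generate device-level predictions, the sketch below assigns hypothetical ratings to the four structural characteristics and combines them into a crude cognitive-load index; devices with higher indices would be predicted to show larger score decrements on cognitively loaded assessments. The ratings, equal weights, and breakpoint are invented for illustration and are not values proposed by Arthur, Keiser, and Doverspike (2017).

# Hypothetical scoring of devices on the four SCIP structural characteristics
# (0 = low information processing demand, 1 = high). All values below are
# illustrative assumptions, not parameters from the SCIP framework itself.
DEVICES = {
    "desktop":    {"screen_size": 0.1, "clutter": 0.1, "interface": 0.0, "permissibility": 0.2},
    "laptop":     {"screen_size": 0.2, "clutter": 0.2, "interface": 0.1, "permissibility": 0.5},
    "tablet":     {"screen_size": 0.5, "clutter": 0.5, "interface": 0.8, "permissibility": 0.8},
    "smartphone": {"screen_size": 0.9, "clutter": 0.8, "interface": 1.0, "permissibility": 1.0},
}

# Equal weights as a starting assumption; the framework's open question about
# the relative importance of each characteristic maps onto these weights.
WEIGHTS = {"screen_size": 0.25, "clutter": 0.25, "interface": 0.25, "permissibility": 0.25}

def cognitive_load_index(ratings: dict) -> float:
    """Weighted sum of structural characteristic ratings for a device."""
    return sum(WEIGHTS[k] * v for k, v in ratings.items())

for device, ratings in sorted(DEVICES.items(), key=lambda kv: cognitive_load_index(kv[1])):
    index = cognitive_load_index(ratings)
    # A hypothetical breakpoint: predict meaningful score differences above 0.6.
    prediction = "expect score decrement" if index > 0.6 else "expect equivalence"
    print(f"{device:<11} load index = {index:.2f} -> {prediction}")

Estimating the weights empirically and locating the breakpoint on the continuum correspond directly to the two open research questions discussed in the paragraphs that follow.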

It is also worth noting that the tenets of the SCIP framework are not necessarily limited to UIT devices; indeed, they can conceptually inform discussions of a wider range of assessments, particularly in any domain in which construct-irrelevant information processing demands associated with the testing method or medium are pertinent. For instance, the SCIP framework would be germane in the context of other technologically mediated assessment methods, such as virtual roleplays, immersive simulations, and gamified assessments (Arthur, Doverspike, Kinney, & O'Connell, 2017). As another example, one could envisage characteristics associated with situational judgment tests (e.g., item length, complexity of instructions, response format, presentation mode [paper-and-pencil vs. video]) that could engender differential construct-irrelevant cognitive load, which then results in differential test scores (Arthur et al., 2014; Chan & Schmitt, 1997). In summary, hypotheses developed using the SCIP framework can be applied to other types of devices and technologies that have similar structural components or information processing demands.

The SCIP framework is intuitive, generalizable, and well suited for technology-enabled assessment research, but more research is needed to answer two questions raised by its authors. First, what is the relative importance of these information processing variables in terms of the role they play in affecting levels of construct-irrelevant variance? This question is critical for understanding the relative weight each structural attribute carries in explaining potential systematic error variance. Second, where is the “breakpoint” on the structural continuum at which devices would be expected to influence assessment scores (e.g., desktop computers vs. smartphones) versus where they would not (e.g., desktop computers vs. notebooks)? If structural attributes indeed produce higher or lower cognitive load depending on where a device lies on the continuum, then what point on the continuum “matters” for explaining meaningful construct-irrelevant variance?

In addition, what about technologies whose structural components differ from those represented on the continuum? For example, Ferran and Watts (2008) compared video-conference meetings to face-to-face meetings in a field study of medical professionals and reported that the video-conference medium distorted participants' judgments about presenters; they concluded that the video-conference medium increased users' cognitive load. When these questions are answered, this conceptualization can offer researchers new opportunities for creating specific hypotheses. In particular, the focus on the cognitive load associated with using technology is helpful in that the information processing demands of a given technological tool can be evaluated relative to other devices or independently.

Sociomateriality

In addition to these conceptual models of technology applied to assessment, we believe there is another opportunity to identify and explain potential sources of construct-irrelevant variance by borrowing a concept from the organizational management literature: sociomateriality (Orlikowski, 2007). Sociomateriality describes technology as a combination of its social use and its materiality, or the arrangement of physical or digital matter into a form that is stable across time and important to users (Leonardi, 2012). Instead of studying either facet in isolation, or even in a reciprocal relationship, sociomateriality considers technology's social uses and materiality to be “entangled” in a mutually shaping “assemblage” (i.e., human action shapes the material attributes just as the material shapes human action; Orlikowski & Scott, 2008). Put another way, technology should not be thought of as an exogenous object that affects human thinking and behavior, as it has commonly been conceptualized in technology-enabled assessment research, but should instead be thought of as the material component of an individual's behavior in practice (Orlikowski & Scott, 2008). Leonardi (2012) uses social media as an example to demonstrate this concept. Social media's materiality allows users to both edit and permanently share text, pictures, or video, but entangled with its materiality is the social practice of sharing information publicly. As such, social media's technical form is only fully understood and differentiated from other media through its social use, whereas the social use is predicated on what is technically possible with the software.

This conceptualization suggests that when studying technology, the proper unit of analysis is framed in terms of practice, or being “in use.” In sociomateriality, practice is the coordinated activity of people in a given context and is where the materiality and social use of technology become entangled (Leonardi, 2012; Orlikowski & Scott, 2008). In other words, technology in practice is best understood in terms of the extent to which an individual can accomplish his or her goals in the context of how the technology functions. In a selection context, for example, applicants' goals are to be favorably, or at least accurately and fairly, evaluated for a job, and they will use whatever technology is available (or required) in a way that helps them accomplish this goal. Thus, it may be necessary to explain or predict construct-irrelevant variance associated with technology used in assessment in terms of an applicant's goal attainment.

As an example, consider an assessment developer who would like to rely primarily on images to present both items and responses to candidates in a mobile-first format. Specific hypotheses can be created using sociomateriality as a conceptual framework to predict how this assessment may or may not introduce construct-irrelevant variance. For instance, knowing that candidates' intended social use of a mobile device is to apply for an open position, a researcher might predict that a touch- or swipe-based assessment delivered on that device will not frustrate the candidate's goal of putting his or her best foot forward for the job. In other words, job candidates may feel very comfortable using their smartphones for an assessment if they can use their devices in the same way during the assessment as they use them for other purposes. There is much to consider here, however, in terms of expectations, goals, and use of technology.

In the case of mobile devices, for example, the assessment developer will need to consider “technology in social use” when designing the assessment, keeping in mind how a device is typically used: that the respondent could be anywhere when attempting the test; that the screen size may limit the respondent's ability to discern detail depicted in the images and text; or that calls, texts, or other distractions may pop up on the device during the assessment. With this conceptual model in mind, more precise hypotheses can be created based on the respondent's task during the test, the system requirements of the test, the specific applicant demographics, and the targeted job types. In any case, these hypotheses would consider the candidate's typical patterns of technology use as well as his or her goals, needs, comfort, and preferences regarding technological devices.

One key value of this conceptualization comes from its ability to help researchers create hypotheses related to talent assessment psychological processes that are already imbued with technological materiality. This shifts the focus from tackling “technology in assessment” (i.e., static examinations of new technological implementations) toward studying the applicant's psychological processes and patterns of technology use during the assessment. Just as technological materiality is entangled within social phenomena (Leonardi, 2012), it may be necessary to model the ways in which technological materiality is present in each individual's psychological processes. In other words, the variance associated with an assessment can potentially be understood by examining how technology's use facilitates or constrains goal achievement by assessors and/or assessees. In the examples above, more nuanced and informed hypotheses could be created by simultaneously considering both the technical features of the mobile device and its intended social use.

Admittedly, more work is needed to fully apply this idea in selection practice, but the sociomateriality concept is a promising candidate for better understanding and predicting how technology relates to an individual's psychological processes during an assessment and vice versa. Some within the I-O psychology field have recently alluded to the idea that hypotheses generated with this frame of reference have a strong theoretical foothold (Landers, 2016).

Where Do We Go From Here?

In sum, we argue that rather than studying technology as a static, contaminating variable in the respondent's true score measurement model, we should instead study technology as a variable that is dynamically influenced by its use. When a test is administered, test developers and assessors pretend that only the target construct is measured and not the person's systematic use of the test. Controlling confounding variables was easy to do when talent assessment relied primarily on paper-and-pencil technology, because the dynamic experience of taking a paper-and-pencil test was consistent from one test taker to the next. This consistency was also improved by controlling most of the dynamic features associated with the paper-and-pencil medium through proctored administrations. In contrast, mobile-delivered, unproctored, Internet-based assessments can be completed at any time in any location, on a device that typically fits into a pocket.

A typical approach to understanding mobile-delivered tests has been to administer a test via smartphone and examine the device's properties one by one (e.g., mobility, the operating system used, the screen size). However, Arthur, Keiser, et al. (2017) asked the important research questions that arise from this approach: Where is the point on the device structural continuum at which score differences manifest, and why? These questions are difficult to answer by studying any one structural property alone. Thus, the mobile device-delivered selection assessment is an example of a technology that is better understood through conceptual frameworks that represent dynamic uses and how those uses influence psychological processes. Beyond mobile-based testing, Chamorro-Premuzic et al. (2016) provided a compelling argument that innovative technologies in assessment for selection, such as gamified assessments, digital interviewing, and crowd-sourced peer ratings, are just new digital versions of old assessment methods. Therefore, it is still important to understand how variance related to our test media could affect our carefully developed constructs, especially for methods that imply some technology in use. In addition, it is worth noting that the variance associated with the technology used for assessment is not irrelevant, even when it is “construct irrelevant.” Constructs such as technological use, expertise, or comfort are not necessarily or always confounds with respect to the focal construct of the assessment.

Developing theories that explain how technology and psychology are wrapped up together is where I-O psychology should be spending its time. Theory-based predictions and explanations of technology use may also help address concerns and issues raised by practitioners who know that a balance of data and theory is needed for success going forward (we define success here as making accurate predictions in a way that is accepted and valued by client organizations and end users). Practitioners are vitally interested in the degree to which validity can be extended across the various forms of technology used in selection and assessment. Unless there is clear evidence to support the hypotheses made about technology's impact on test performance, practitioners must continue to research all the technological variations, as well as other relevant variations (e.g., language, cultural adaptations), to demonstrate both equivalence and the generalizability of validity across situations, despite the difficulty of conducting this research in a single organization. Outside of military and large employer testing programs, consortia studies may be the most feasible option. A starting point could be to ask test publishers to include an optional, standardized checklist of technology features that test takers complete. The resulting database would give researchers sufficient data to develop appropriate metrics of the salient technology features and to explore the effects these features have on test performance.
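As a rough illustration of what such a standardized checklist might capture, the sketch below defines a hypothetical record that a testing platform could log alongside each administration. The specific fields are our own illustrative guesses, informed by the frameworks discussed above (e.g., SCIP structural characteristics and administration conditions), not an established standard.

# A hypothetical per-administration technology checklist record that a test
# publisher could log to build the kind of consortium database described above.
# Field names and example values are illustrative assumptions only.
from dataclasses import dataclass, asdict
import json

@dataclass
class TechnologyChecklist:
    respondent_id: str
    device_class: str          # e.g., "desktop", "laptop", "tablet", "smartphone"
    screen_diagonal_in: float  # structural characteristic (SCIP: screen size)
    input_method: str          # e.g., "keyboard_mouse", "touch" (SCIP: response interface)
    connection_type: str       # e.g., "wifi", "cellular", "wired"
    location_type: str         # e.g., "home", "office", "public" (SCIP: permissibility)
    interruptions_reported: int
    assistive_technology: bool # relevant to accessibility considerations

record = TechnologyChecklist(
    respondent_id="anon-001",
    device_class="smartphone",
    screen_diagonal_in=6.1,
    input_method="touch",
    connection_type="cellular",
    location_type="public",
    interruptions_reported=2,
    assistive_technology=False,
)

# Serialize for storage in a shared research database.
print(json.dumps(asdict(record), indent=2))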

Instead of stopping at asking whether systematic error variance exists in technology-enabled assessments, we have proposed some theoretical frameworks that can help answer, a priori, the more important questions of why and how this error variance arises. We do not yet know all the implications of adopting a more conceptual approach to studying technology in assessment, but our goal here is to establish the need for a more conceptual or theoretically based explanation of technology's impact on talent assessment and to challenge researchers to consider some potential models as orienting frameworks that can elevate their efforts to be more explanatory and proactive. We hope this discussion leads to a greater conversation among I-O psychologists about how we should conceptualize technology in its application to important practices such as talent assessment.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103, 411–423.
Armstrong, M., Landers, R. N., & Collmus, A. (2015, April). Game-thinking in human resource management. Poster session presented at the 30th Annual Conference of the Society for Industrial and Organizational Psychology, Philadelphia, PA.
Arthur, W. Jr., Doverspike, D., Kinney, T. B., & O'Connell, M. (2017). The impact of emerging technologies on selection models and research: Mobile devices and gamification as exemplars. In Farr, J. L., & Tippins, N. T. (Eds.), Handbook of employee selection (2nd ed., pp. 967–986). New York: Taylor & Francis/Psychology Press.
Arthur, W. Jr., Glaze, R. M., Jarrett, S. M., White, C. D., Schurig, I., & Taylor, J. E. (2014). Comparative evaluation of three situational judgment test response formats in terms of construct-related validity, subgroup differences, and susceptibility to response distortion. Journal of Applied Psychology, 99, 335–345.
Arthur, W. Jr., Keiser, N., & Doverspike, D. (2017). An information processing-based conceptual framework of the effects of unproctored Internet-based testing devices on scores on employment-related assessments and tests. Manuscript submitted for publication.
Arthur, W. B. (2009). The nature of technology: What it is and how it evolves. New York: Free Press.
Bank, J., Collins, L., Hartog, S., Hardesty, S., O'Shea, P., & Dapra, R. (2015, April). In Bank, J. (Chair), High-fidelity simulations: Refining leader assessment and leadership development. Symposium presented at the 30th Annual Conference of the Society for Industrial and Organizational Psychology, Philadelphia, PA.
Becker, G. (2000). How important is transient error in estimating reliability? Going beyond simulation studies. Psychological Methods, 5, 370–379.
Bennett, R. E., & Zhang, M. (2016). Validity and automated scoring. In Drasgow, F. (Ed.), Technology and testing: Improving educational and psychological measurement (pp. 142–173). New York: Routledge.
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential and evidential bases. Journal of Applied Psychology, 74, 478–494.
Chamorro-Premuzic, T., Winsborough, D., Sherman, R. A., & Hogan, R. (2016). New talent signals: Shiny new objects or a brave new world? Industrial and Organizational Psychology: Perspectives on Science and Practice, 9(3), 1–20. doi:10.1017/iop.2016.6
Chan, D., & Schmitt, N. (1997). Video-based versus paper-and-pencil method of assessment in situational judgment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied Psychology, 82, 143–159.
Coovert, M. D., & Thompson, L. F. (2014a). Toward a synergistic relationship between psychology and technology. In Coovert, M. D., & Thompson, L. F. (Eds.), The psychology of workplace technology (pp. 1–21). New York: Routledge.
Coovert, M. D., & Thompson, L. F. (Eds.). (2014b). The psychology of workplace technology. New York: Routledge.
Coyne, I., Warszta, T., Beadle, S., & Sheehan, N. (2005). The impact of mode of administration on the equivalence of a test battery: A quasi-experimental design. International Journal of Selection and Assessment, 13, 220–224.
Ferran, C., & Watts, S. (2008). Videoconferencing in the field: A heuristic processing model. Management Science, 54, 565–578.
Foster, D. (2016). Testing technology and its effects on test security. In Drasgow, F. (Ed.), Technology and testing: Improving educational and psychological measurement (pp. 235–255). New York: Routledge.
Ghiselli, E. E., Campbell, J. P., & Zedeck, S. (1981). Measurement theory for the behavioral sciences. New York: W. H. Freeman & Co.
Gierl, M. J., Lai, H., Fung, K., & Zheng, B. (2016). Using technology-enhanced processes to generate test items in multiple languages. In Drasgow, F. (Ed.), Technology and testing: Improving educational and psychological measurement (pp. 109–126). New York: Routledge.
Gray, C., Morelli, N. A., & McLane, W. (2015, April). Does use context affect selection assessments via mobile devices? In Morelli, N. A. (Chair), Mobile devices in talent assessment: The next chapter. Symposium presented at the 30th Annual Conference of the Society for Industrial and Organizational Psychology, Philadelphia, PA.
Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Hong, E. (1999). Test anxiety, perceived test difficulty, and test performance: Temporal patterns of their effects. Learning and Individual Differences, 11, 431–447.
Huang, J., & Yuan, J. (in press). Bayesian dynamic mediation analysis. Psychological Methods. doi:10.1037/met0000073
King, D. D., Ryan, A. M., Kantrowitz, T., Grelle, D., & Dainis, A. (2015). Mobile Internet testing: An analysis of equivalence, individual differences, and reactions. International Journal of Selection and Assessment, 23, 382–394.
Landers, R. N. (2016). An introduction plus a crash course in R. The Industrial-Organizational Psychologist, 54(1). Retrieved from http://www.siop.org/tip/july16/crash.aspx
Leonardi, P. M. (2012). Materiality, sociomateriality, and socio-technical systems: What do these terms mean? How are they related? Do we need them? In Leonardi, P. M., Nardi, B. A., & Kallinikos, J. (Eds.), Materiality and organizing: Social interaction in a technological world (pp. 25–48). Oxford, UK: Oxford University Press.
Luecht, R. M. (2016). Computer-based test delivery models, data, and operational implementation issues. In Drasgow, F. (Ed.), Technology and testing: Improving educational and psychological measurement (pp. 179–205). New York: Routledge.
McCornack, R. L. (1956). A criticism of studies comparing item-weighting methods. Journal of Applied Psychology, 40, 343–344.
Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114, 449–459.
Mead, A. D., Olson-Buchanan, J. B., & Drasgow, F. (2014). Technology-based selection. In Coovert, M. D., & Thompson, L. F. (Eds.), The psychology of workplace technology (pp. 21–43). New York: Routledge.
Morelli, N., Adler, S., Arthur, W. Jr., Potosky, D., & Tippins, N. (2016, April). Developing a conceptual model of technology applied to I-O psychology. Panel discussion presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Orlikowski, W. J. (2007). Sociomaterial practices: Exploring technology at work. Organization Studies, 28, 1435–1448.
Orlikowski, W. J., & Scott, S. V. (2008). Sociomateriality: Challenging the separation of technology, work, and organization. The Academy of Management Annals, 2, 433–474.
Potosky, D. (2008). A conceptual framework for the role of the administration medium in the personnel assessment process. Academy of Management Review, 33, 629–648.
Ryan, A. M., & Ployhart, R. E. (2014). A century of selection. Annual Review of Psychology, 65, 693–717.
Schmitt, N., & Kuljanin, G. (2008). Measurement invariance: Review of practice and implications. Human Resource Management Review, 18, 210–222.
Scott, J. C., & Mead, A. D. (2011). Foundations for measurement. In Tippins, N., & Adler, S. (Eds.), Technology-enhanced assessment of talent (pp. 1–18). San Francisco: John Wiley & Sons, Inc.
Seiler, S., McEwen, D., Benavidez, J., O'Shea, P., Popp, E., & Sydell, E. (2015, April). Under the hood: Practical challenges in developing technology-enhanced assessments. Panel discussion presented at the 30th Annual Conference of the Society for Industrial and Organizational Psychology, Philadelphia, PA.
Spearman, C. (1904). The proof and measurement of the association between two things. American Journal of Psychology, 15, 72–101.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 271–295.
Stone, E., Laitusis, C. C., & Cook, L. L. (2016). Increasing the accessibility of assessments through technology. In Drasgow, F. (Ed.), Technology and testing: Improving educational and psychological measurement (pp. 217–234). New York: Routledge.
Tay, L., Meade, A. W., & Cao, M. (2014). An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods, 18, 3–46.
Thurstone, L. L. (1931). The reliability and validity of tests: Derivation and interpretation of fundamental formulae concerned with reliability and validity of tests and illustrative problems. Ann Arbor, MI: Edwards Bros.
Tippins, N. (2011). Overview of technology-enabled assessments. In Tippins, N., & Adler, S. (Eds.), Technology-enhanced assessment of talent (pp. 1–18). San Francisco: John Wiley & Sons, Inc.
Tonidandel, S., Quiñones, M. A., & Adams, A. A. (2002). Computer-adaptive testing: The impact of test characteristics on perceived performance and test takers' reactions. Journal of Applied Psychology, 87, 320–332.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4–70.
Vandenberg, R. J., & Morelli, N. A. (2016). A contemporary update on testing for measurement equivalence and invariance. In Meyer, J. P. (Ed.), The handbook of employee commitment (pp. 449–461). Cheltenham, UK: Edward Elgar.
Zickar, M. J., Cortina, J., & Carter, N. T. (2010). Evaluation of measures: Sources of sufficiency, error, and contamination. In Farr, J. L., & Tippins, N. (Eds.), Handbook of employee selection (pp. 399–416). New York: Routledge.