Further Considerations in SJT Development

Published online by Cambridge University Press:  23 March 2016

Matthew J. Borneman*
Affiliation:
Ergometrics & Applied Personnel Research, Inc., Lynnwood, Washington
*Correspondence concerning this article should be addressed to Matthew J. Borneman, Ergometrics & Applied Personnel Research, Inc., 18720 33rd Avenue West, Lynnwood, WA 98037. E-mail: matthewb@ergometrics.org

Type
Commentaries
Copyright
Copyright © Society for Industrial and Organizational Psychology 2016 

The situational judgment test (SJT) development procedures outlined by the authors of the focal piece (Lievens & Motowidlo, 2016) provide an excellent framework to design SJTs that help answer fundamental questions about what SJTs measure and why they work. This article expands on this framework to explore further some of the issues faced in the development of SJTs. These issues include the implied assumption of linearity between general domain knowledge and effectiveness, whether the SJT measures a single construct or multiple constructs, and when a more criterion-centered approach to SJT development might be preferred.

Assumption of Linearity

Implicit in the authors’ discussion of design considerations in the development of SJTs is the assumption of a linear relationship between the general domain of knowledge and the ultimate criterion of job performance. This assumption is not unique to the authors of the focal article, nor is it, generally speaking, an untenable assumption to make. However, there are some lines of research that suggest that the simple linear relationship may not be appropriate for all jobs and in all contexts.

One line of research stems from the faking literature. Kuncel and Tellegen (2009) found that undergraduate students, when instructed to fake good on a personality inventory to obtain a desirable job (even if it required some form of deception), did not always pick the most extreme response option. Instead, some more moderate level of the trait was rated as most desirable for numerous items, including several that exhibit an inverted-U-shaped relationship. Consider the item “talkative,” which exhibits this inverted-U shape (Kuncel & Tellegen, 2009, Figure 3). It is relatively easy to understand why moderate levels of this item are desired, as neither extremely reticent nor extremely verbose people are ideal in numerous workplace contexts. The extent to which these nonlinear relationships manifest themselves in the prediction of subsequent behavior is an empirical question open to further study.

The notion of nonlinear relationships between some personality traits and subsequent job performance has received some empirical support as well (Le et al., 2011). For low-complexity jobs, both conscientiousness and emotional stability exhibited nonlinear inverted-U relationships with job performance. Again, although these findings certainly merit further empirical investigation, they do raise concern over an important implicit assumption that must be examined when developing an SJT that assesses a general domain of knowledge: the assumption of linearity.
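
One way to probe the linearity assumption empirically is to add a squared term to the regression of performance on the trait and to inspect the sign of its coefficient. The Python sketch below illustrates the idea. The 7-point trait scale, the performance values, and all names are invented for illustration and are not drawn from any of the studies cited here.

```python
# Hypothetical illustration: the data and names below are invented,
# not taken from any cited study.

def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Build the 3x3 system (X'X) b = X'y for the columns [1, x, x^2].
    powers = [sum(x ** k for x in xs) for k in range(5)]
    a = [[float(powers[i + j]) for j in range(3)] for i in range(3)]
    rhs = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(3)]

    # Gaussian elimination with partial pivoting.
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            rhs[row] -= factor * rhs[col]

    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (rhs[row] - tail) / a[row][row]
    return coeffs  # [b0, b1, b2]


# Invented example: performance peaks at a moderate trait level.
trait_levels = [1, 2, 3, 4, 5, 6, 7]
performance = [10 - (x - 4) ** 2 for x in trait_levels]

b0, b1, b2 = fit_quadratic(trait_levels, performance)

# A negative squared-term coefficient indicates an inverted-U shape;
# the fitted curve peaks at -b1 / (2 * b2).
is_inverted_u = b2 < 0
peak_trait_level = -b1 / (2 * b2)
```

With real data, the squared term would be evaluated for statistical significance (for example, through hierarchical regression) rather than by its sign alone, and the location of the peak would indicate whether the downturn falls within the observed range of the trait.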

Some anecdotal evidence from our own work in developing SJTs also challenges this implicit assumption of linearity. Consider an SJT designed to predict job performance for positions with heavy customer service elements. Obviously, detail orientation is a critical trait in the prediction of performance in customer service positions. However, our work suggests that the heuristic of “detail-oriented actions are more effective” does not always hold. Suppose a customer service representative is on a phone call with a customer regarding an issue with receiving a bill. While helping the customer, the customer service representative notices an error in the zip code, which may be causing the billing issue. One detail-oriented course of action would be to correct the zip code in the system. A more detail-oriented and more effective action would be to review and verify the entire address with the customer to ensure timely delivery of bills. An even more detail-oriented, though substantially less effective, action would be to review and verify all information in the system with the customer; this increases the time spent with the customer without any measurable additional benefit for either the customer service representative or the customer.

Taken together, these lines of evidence suggest that care must be taken when delineating the general domain of knowledge, as suggested by the authors of the focal article. This delineation must also take into account the nature of the relationship between the domain of knowledge and the subsequent behavior to be predicted.

Single Versus Multiple Constructs

The authors of the focal article describe a general process to develop a single SJT of a single general domain of knowledge. This process could be generalized a bit further to include SJTs that assess multiple domains of knowledge at the same time. Practical issues notwithstanding (e.g., it is easier to sell one SJT measuring five constructs than five SJTs that each measure one construct), assessing multiple constructs in a single SJT does allow for some interesting opportunities to arise in development.

One opportunity for SJT item development that arises is the ability to relax the requirement to develop response options that vary both in terms of effectiveness and in levels of the target trait. As the authors note, getting response options to vary on both effectiveness and trait levels is difficult; as practitioners specializing in SJT development, we can certainly attest to this fact as well. However, by developing SJTs to assess multiple constructs at the same time, the necessity to vary both effectiveness and trait levels across response options for a single item can be eliminated. Although it would still be necessary to vary the level of effectiveness across response options within each item, the important part for trait levels is that they vary across items. The scoring would be a bit different than what is proposed in the focal article but would correspond fairly closely to the “SJT With a Most Likely/Least Likely Format” section of Motowidlo, Hooper, and Jackson (2006).
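
To make the scoring idea concrete, the following Python sketch shows one way a most likely/least likely SJT could yield separate scores for several constructs when each item is tagged with the construct it targets. Everything here is an assumption made for illustration: the item bank, the effectiveness ratings, the construct labels, and the scoring rule (effectiveness of the option chosen as most likely minus effectiveness of the option chosen as least likely) are hypothetical and are not presented as the exact procedure of Motowidlo, Hooper, and Jackson (2006).

```python
from collections import defaultdict

# Hypothetical item bank. Each item targets one construct, and each response
# option carries an effectiveness rating (e.g., a mean expert judgment).
ITEMS = {
    "item1": {"construct": "detail_orientation",
              "effectiveness": {"A": 2.0, "B": 4.5, "C": 1.0}},
    "item2": {"construct": "detail_orientation",
              "effectiveness": {"A": 3.0, "B": 1.5, "C": 4.0}},
    "item3": {"construct": "agreeableness",
              "effectiveness": {"A": 4.0, "B": 2.0, "C": 3.0}},
}


def score_respondent(responses):
    """Return per-construct scores for one respondent.

    `responses` maps item id -> (most_likely_option, least_likely_option).
    Assumed rule: effectiveness of the most-likely choice minus effectiveness
    of the least-likely choice, summed over the items for each construct.
    """
    totals = defaultdict(float)
    for item_id, (most_likely, least_likely) in responses.items():
        item = ITEMS[item_id]
        ratings = item["effectiveness"]
        totals[item["construct"]] += ratings[most_likely] - ratings[least_likely]
    return dict(totals)


# One invented respondent.
responses = {"item1": ("B", "C"), "item2": ("C", "A"), "item3": ("B", "A")}
scores = score_respondent(responses)
```

In this sketch, effectiveness varies across options within an item, whereas the construct varies only across items, which is the relaxation described above.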

Assessing multiple constructs in a single SJT also has some theoretical benefits. Using the SJT format to more broadly sample the entire criterion domain would likely result in higher predictive validity coefficients. Simply put, by measuring more of the knowledge, skills, abilities, and other characteristics (KSAOs) required for successful performance in a given occupation, we should be better able to predict subsequent job performance. In addition, even if a more unidimensional construct is desired, there may be facets to that construct that could be measured independently. For example, openness to experience might include four facets: openness to sensations, aestheticism, introspection, and nontraditionalism (Connelly, Ones, Davies, & Birkland, 2014). Each facet might have differential relationships with other variables/criteria, and the specific factor variance may or may not predict anything beyond what the global measure predicts. Allowing for multiple constructs and/or facets to be assessed within a single SJT opens these sorts of questions up for empirical testing.
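
The expected gain from sampling more of the criterion domain can be illustrated with the standard formula for the multiple correlation of a criterion with two predictors. The Python sketch below applies that formula. The validity and intercorrelation values are invented for illustration and do not come from any study cited here.

```python
import math


def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2: correlation of each predictor with the criterion.
    r12:    correlation between the two predictors (must satisfy |r12| < 1).
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# Invented values: two construct scores, each with validity .30,
# correlated .20 with each other.
single_validity = 0.30
combined_validity = multiple_r(0.30, 0.30, 0.20)  # about .39

# The same two predictors when they overlap heavily (r12 = .80).
redundant_validity = multiple_r(0.30, 0.30, 0.80)
```

Under these assumed values, the second construct adds predictive power mainly when it is not highly correlated with the first; as the intercorrelation rises, the combined validity moves back toward that of a single predictor.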

Construct Versus Criterion-Oriented Development

Although the goals and framework advocated by the authors are certainly laudable, there are situations where the more construct-oriented approach to SJT development is not the ideal procedure. Consider the occupation of firefighter. Although there are local variations in some policies and procedures, the vast majority of these jobs are constant across locations. That is, nearly all firefighter jobs have the same sets of KSAOs; this is documented with vast databases of job analysis information. With such highly regimented jobs, it is possible to create a single SJT geared toward the prediction of performance for firefighters across numerous organizations without having to target a more construct-oriented approach of assessing general domain knowledge.

The reason the construct-oriented approach may not be ideal in this situation is a reason brought up by the authors of the focal article: that criterion-related validity might suffer because there is less point-to-point correspondence with the criterion. Part of the beauty of the traditional SJT format is that each SJT item can sample a different part of the criterion space directly. Using a rigorous job analysis as the basis for development of the SJT helps create large and substantive criterion-related validity coefficients (Christian, Edwards, & Bradley, 2010). Ultimately, the goal of a test developer is to provide a test (or set of tests) that best predicts the criterion of interest, given organizational and practical constraints. If there is any risk that an alternative development strategy would result in lower criterion-related validity coefficients, then the prudent approach would be to allow the empirical evidence on SJT development procedures to develop to the point where we can test whether the approach advocated by the authors of the focal piece would result in similar (or better, or worse) criterion-related validity coefficients.

References

Christian, M. S., Edwards, B. D., & Bradley, J. C. (2010). Situational judgment tests: Constructs assessed and a meta-analysis of their criterion-related validities. Personnel Psychology, 63, 83–117.
Connelly, B. S., Ones, D. S., Davies, S. E., & Birkland, A. (2014). Opening up openness: A theoretical sort following critical incidents methodology and a meta-analytic investigation of the trait family measures. Journal of Personality Assessment, 96(1), 17–28.
Kuncel, N. R., & Tellegen, A. (2009). A conceptual and empirical reexamination of the measurement of social desirability of items: Implications for detecting desirable response style and scale development. Personnel Psychology, 62, 201–228.
Le, H., Oh, I.-S., Robbins, S. B., Ilies, R., Holland, E., & Westrick, P. (2011). Too much of a good thing: Curvilinear relationships between personality traits and job performance. Journal of Applied Psychology, 96(1), 113–133. doi:10.1037/a0021016
Lievens, F., & Motowidlo, S. J. (2016). Situational judgment tests: From measures of situational judgment to measures of general domain knowledge. Industrial and Organizational Psychology: Perspectives on Science and Practice, 9, 3–22.
Motowidlo, S. J., Hooper, A. C., & Jackson, H. L. (2006). A theoretical basis for situational judgment tests. In Weekley, J. A. & Ployhart, R. E. (Eds.), Situational judgment tests: Theory, measurement, application (pp. 57–81). Mahwah, NJ: Erlbaum.