Various solutions have been proposed to “fix” performance management (PM) over the last several decades. Pulakos, Mueller Hanson, Arad, and Moye (2015) have presented a holistic approach to improving PM in organizations. Although this approach addresses several key elements related to the social context of PM, namely the buy-in of organizational stakeholders, timely and regular feedback, and future-directed feedback, we believe that several robust findings from the PM research literature could further improve this process. Are Pulakos et al. looking at the forest but missing the trees? In the following commentary, we offer several reasons that performance judgments, and perhaps even informal ratings, are still operating in the proposed holistic system. Therefore, advancements in other areas of PM research may offer additional ways to fix PM.
The Case for Facilitating Performance Judgments
We agree with Pulakos et al. (2015) that the typical rating process conducted by managers can be burdensome and time consuming, but the alternatives to making ratings are unclear. Several unanswered questions remain about the proposed holistic system and how it would operate. Do managers still make performance judgments even if formal ratings are not assigned? Many of the authors’ suggestions and proposed solutions hinge on performance judgments, and the extent to which these are not ratings in the broader sense is unclear.
Pulakos et al. (2015) have emphasized the use of “meaningful performance standards” (p. 16). However, these recommended performance standards appear to be nothing more than objective measurements, which have their own well-known problems, specifically contamination and deficiency (Murphy & Cleveland, 1995). Despite the apparent purity of objective indices, environmental and situational factors outside the employee's control can influence outcomes. Pulakos et al. (2015, p. 16) have gone on to suggest that “performance effectiveness indicators” should be developed for situations in which objective data are not available, but they offer no recommendation on how this would be done or how the work would be evaluated. The authors have suggested peer reviews and customer feedback as potential examples, but are these likely to be superior to supervisor ratings? Or will they remain subjective performance judgments that simply shift the same well-documented problems with supervisor ratings to another source? Thus, even if we do not call the performance judgments ratings, it is unclear how the proposed solutions in the focal article would improve on the performance judgments that would be “abandoned” in the proposed system.
Researchers have long recognized that it is difficult to justify the use of performance judgments if those assessments have little to no relationship to the characteristics of the people being appraised (Ilgen, Barnes-Farrell, & McKellin, 1993). Unfortunately, however, the research literature clearly indicates that performance judgments are rarely sufficiently reliable and valid indicators of employee performance. Raters in PM systems are routinely subject to a multitude of contextual influences, including cognitive biases; differences in rating goals, purposes, and motivation; and political and organizational pressures, all of which can affect the quality of performance ratings (Levy & Williams, 2004; Murphy & Cleveland, 1995). The proposed holistic system offers no solution to this key issue, although performance judgments are still being made. Fortunately, PM researchers have designed several successful interventions for improving the relevance and quality of performance judgments. Below, we summarize a few of these advancements that can be integrated into a successful PM system.
Rater Training and Improvements in Judgments
Rater training represents one of the most robust interventions for improving the quality of a PM system. Although a broader focus on all stakeholders is critical, the raters (typically managers) still represent important, if not the most important, sources of information in the system. The importance of the rater is central to all PM systems (Murphy & Cleveland, 1995; Pulakos & O’Leary, 2011). However, performance judgments made by these individuals, whether formal ratings or some other type of evaluation, hinge on the proper observation, categorization, and subsequent scaling of behaviors. In the absence of strategies to align managers in this process, performance judgments could be more problematic than the formal performance ratings that would be abandoned. Pulakos et al. (2015) have touched on this point by emphasizing a “shared mindset about what effective PM is” (p. 17). However, the solutions they present for creating this shared mindset ignore key developments from the PM literature.
In cases in which Pulakos et al.'s (2015) alternative framework is adopted or has been adopted (e.g., Cargill), rater training may serve as a viable means of facilitating the shared mindset they have suggested. Specifically, frame-of-reference training (Bernardin & Buckley, 1981) has generally been demonstrated to be the most effective approach for improving rater accuracy (Roch, Woehr, Mishra, & Kieszczynska, 2012; Woehr & Huffcutt, 1994). The goal of this approach is to train raters to share a common schema, or cognitive knowledge structure, regarding performance-related behaviors and their scaling with respect to effective and ineffective levels of performance (Athey & McIntyre, 1987; Bernardin & Buckley, 1981; Gorman & Rentsch, 2009; Noonan & Sulsky, 2001; Schleicher & Day, 1998). In other words, frame-of-reference training has been empirically demonstrated to be an effective strategy for fostering a shared mindset for making performance judgments. Although we agree that frequent communication is beneficial, the authors of the focal article have suggested that only informal discussions take place. A formal rater training process could perhaps better facilitate this shared understanding.
We agree with Pulakos et al. (2015) that focusing on behaviors that matter is critical in a PM system. Toward that end, rater training from other avenues can also improve the quality of ratings. Behavioral specificity (as opposed to making inferences) has been regarded as one of the key benefits of the assessment center (AC) method, in which rater training is an integral component (Thornton & Rupp, 2006). This focus on observing behaviors helps facilitate the use of ACs for developmental purposes in general and for valuable diagnostic feedback in particular. Research has demonstrated that rater training in one context can transfer to another. For example, Macan et al. (2011) found that managers who served as raters in an AC context (i.e., assessors) recorded more specific behaviors in performance appraisals than managers who did not serve as assessors. The difference was attributable to the extensive training that was part of the AC process, in this case frame-of-reference training. Moreover, a focus on behavior can improve the reliability of performance ratings. Lievens (2001), for example, found that a frame-of-reference training condition produced interrater reliability estimates of .80 or greater in a sample of students and managers across three dimensions of performance, all larger than those from control and data-driven training conditions. Thus, rater training appears to be one means of fostering the shared mindset that the authors of the focal article have suggested.
Streamlining Rating Processes
Rater training also helps address a point that Pulakos et al. (2015) have raised, specifically the downsides of streamlining ratings. Frame-of-reference training aims to form or change the rater's schema to facilitate the rating or judgment process. Research from the training and development literature indicates that as trainees move from novices to experts, they become better able to recognize and evaluate new performance information because their schemas become more pattern oriented and more highly integrated, and information is stored in larger chunks (Cannon-Bowers, Tannenbaum, Salas, & Converse, 1991). To the extent that raters develop a shared mindset, this has the potential not only to minimize idiosyncratic ratings but also to communicate information more consistently across managers and over time.
In earlier decades, rating forms were regularly examined as potential solutions for improving the PM process. However, given their minimal effect on rater errors, these approaches were largely abandoned (Landy & Farr, 1980). Unfortunately, the early rating-format research paradigm was based almost entirely on a false premise: that rating errors were actually errors. It is now widely recognized that rater “errors” are poor indicators of the quality of rating formats (Fisicaro, 1988; Murphy, 2008; Murphy & Balzer, 1989; Nathan & Tippins, 1990). Indeed, recent research has demonstrated that the form does matter and can improve the reliability, validity, and factor structure of ratings as well as rater accuracy (Borman et al., 2001; Goffin, Jelley, Powell, & Johnston, 2009; Hoffman et al., 2012; Kane & Woehr, 2006). To be clear, we do not suggest that rater training or rating forms should be emphasized over the social context of the PM system; rather, the measurement tools and the processes used to create a shared mindset among managers can facilitate the effective implementation of such a holistic system.
Back Into the Forest
We completely agree with several of Pulakos et al.'s (2015) key points related to regular feedback, and that more frequent attempts to improve performance are beneficial for employees. However, meaningful feedback that has the potential to improve employee performance relies on reliable and valid performance judgments. The proposed holistic system hinges on the assumption that the core problems in traditional PM systems are attributable to the task of making ratings and evaluating performance on an annual basis, not to how raters process performance information. To the extent that we aim to fix PM, let us not ignore research advancements that facilitate higher quality performance ratings.