Various solutions have been proposed to “fix” performance management (PM) over the last several decades. Pulakos, Mueller Hanson, Arad, and Moye (2015) have presented a holistic approach to improving PM in organizations. Although this approach addresses several key elements related to the social context of PM, namely the buy-in of organizational stakeholders, timely and regular feedback, and future-directed feedback, we believe that several robust findings from the PM research literature could further improve this process. Are Pulakos et al. looking at the forest but missing the trees? In the following commentary, we offer several reasons that performance judgments, and perhaps even informal ratings, are still operating in the proposed holistic system. Therefore, advancements in other areas of PM research may offer additional ways to fix PM.
The Case for Facilitating Performance Judgments
We agree with Pulakos et al. (2015) that the typical rating process conducted by managers can be burdensome and time consuming, but the alternatives to making ratings are unclear. Several unanswered questions remain about the proposed holistic system and how it would operate. Do managers still make performance judgments even if formal ratings are not assigned? Many of the authors’ suggestions and proposed solutions hinge on performance judgments, and the extent to which these are not ratings in the broader sense is unclear.
Pulakos et al. (2015) have emphasized the use of “meaningful performance standards” (p. 16). However, these recommended performance standards appear to be nothing more than objective measurements, which have their own well-known problems, specifically contamination and deficiency (Murphy & Cleveland, 1995). Despite the apparent purity of objective indices, environmental and situational factors outside the employee's control can influence outcomes. Pulakos et al. (2015, p. 16) have gone on to suggest that “performance effectiveness indicators” should be developed for situations in which objective data are not available, but they offer no recommendation on how this would be done or how the work would be evaluated. The authors have suggested peer reviews and customer feedback as potential examples, but are these likely to be superior to supervisor ratings? Or will they remain subjective performance judgments that simply shift the same well-documented problems with supervisor ratings to another source? Thus, even if we do not call the performance judgments ratings, it is unclear how the proposed solutions in the focal article would improve on the performance judgments that would be “abandoned” in the proposed system.
Researchers have long recognized that it is difficult to justify the use of performance judgments if those assessments have little to no relationship to the characteristics of the people being appraised (Ilgen, Barnes-Farrell, & McKellin, 1993). Unfortunately, however, the research literature clearly indicates that performance judgments are rarely sufficiently reliable and valid indicators of employee performance. Raters in PM systems are routinely subject to a multitude of contextual influences, including cognitive biases; differences in rating goals, purposes, and motivation; and political and organizational pressures, all of which can affect the quality of performance ratings (Levy & Williams, 2004; Murphy & Cleveland, 1995). The proposed holistic system offers no solution to this key issue, although performance judgments are still being made. Fortunately, PM researchers have designed several successful interventions for improving the relevance and quality of performance judgments. Below, we summarize a few of these advancements that can be integrated into a successful PM system.
Rater Training and Improvements in Judgments
Rater training represents one of the most robust interventions for improving the quality of a PM system. Although a broader focus on all stakeholders is critical, the raters (typically managers) still represent important, if not the most important, sources of information in the system. The importance of the rater is central to all PM systems (Murphy & Cleveland, 1995; Pulakos & O’Leary, 2011). However, performance judgments made by these individuals, whether formal ratings or some other type of evaluation, hinge on the proper observation, categorization, and subsequent scaling of behaviors. In the absence of strategies to align managers in this process, performance judgments could be more problematic than the formal performance ratings that would be abandoned. Pulakos et al. (2015) have touched on this point by emphasizing a “shared mindset about what effective PM is” (p. 17). However, the solutions they present for creating this shared mindset ignore key developments from the PM literature.
In cases in which Pulakos et al.'s (2015) alternative framework is adopted or has been adopted (e.g., Cargill), rater training may serve as a viable means of facilitating the shared mindset they have suggested. Specifically, frame-of-reference training (Bernardin & Buckley, 1981) has generally been demonstrated to be the most effective approach for improving rater accuracy (Roch, Woehr, Mishra, & Kieszczynska, 2012; Woehr & Huffcutt, 1994). The goal of this approach is to train raters to share a common schema, or cognitive knowledge structure, regarding performance-related behaviors and their scaling with respect to effective and ineffective levels of performance (Athey & McIntyre, 1987; Bernardin & Buckley, 1981; Gorman & Rentsch, 2009; Noonan & Sulsky, 2001; Schleicher & Day, 1998). In other words, frame-of-reference training has been empirically demonstrated to be an effective strategy for fostering a shared mindset for making performance judgments. Although we agree that frequent communication is beneficial, the authors of the focal article have suggested that only informal discussions take place. A formal rater training process could perhaps better facilitate this shared understanding.
We agree with Pulakos et al. (2015) that focusing on behaviors that matter is critical in a PM system. Toward that end, rater training from other avenues can also improve the quality of ratings. Behavioral specificity (as opposed to making inferences) has been regarded as one of the key benefits of the assessment center (AC) method, in which rater training is an integral component (Thornton & Rupp, 2006). This focus on observing behaviors helps facilitate the use of ACs for developmental purposes in general and for valuable diagnostic feedback in particular. Research has demonstrated that rater training in one context can transfer to another. For example, Macan et al. (2011) found that managers who served as raters in an AC context (i.e., assessors) recorded more specific behaviors in performance appraisals than managers who did not serve as assessors. The difference was attributable to the extensive training that was part of the AC process, in this case frame-of-reference training. Moreover, a focus on behavior can improve the reliability of performance ratings. Lievens (2001), for example, found that a frame-of-reference training condition produced interrater reliability estimates of .80 or greater in a sample of students and managers across three dimensions of performance, all larger than those from control and data-driven training conditions. Thus, rater training appears to be one means of fostering the shared mindset that the authors of the focal article have suggested.
Streamlining Rating Processes
Rater training also helps address a point that Pulakos et al. (2015) have raised, specifically the downsides of streamlining ratings. Frame-of-reference training aims to form or change the rater's schema to facilitate the rating or judgment process. Research from the training and development literature indicates that as trainees move from novices to experts, they become better able to recognize and evaluate new performance information because their schemas become more pattern oriented and more highly integrated, and information is stored in larger chunks (Cannon-Bowers, Tannenbaum, Salas, & Converse, 1991). To the extent that raters develop a shared mindset, this has the potential not only to minimize idiosyncratic ratings but also to communicate information more consistently across managers and over time.
In earlier decades, rating forms were regularly examined as potential solutions for improving the PM process. However, given their minimal effect on rater errors, these approaches were largely abandoned (Landy & Farr, 1980). Unfortunately, the early rating-format research paradigm was based almost entirely on a false premise: that rating errors were actually errors. It is now widely recognized that rater “errors” are poor indicators of the quality of rating formats (Fisicaro, 1988; Murphy, 2008; Murphy & Balzer, 1989; Nathan & Tippins, 1990). Indeed, recent research has demonstrated that the form does matter and can improve the reliability, validity, and factor structure of ratings as well as rater accuracy (Borman et al., 2001; Goffin, Jelley, Powell, & Johnston, 2009; Hoffman et al., 2012; Kane & Woehr, 2006). To be clear, we do not suggest that rater training or rating forms should be emphasized over the social context of the PM system; rather, the measurement tools and the processes used to create a shared mindset among managers can facilitate the effective implementation of such a holistic system.
Back Into the Forest
We completely agree with several of Pulakos et al.'s (2015) key points related to regular feedback, and that more frequent attempts to improve performance are beneficial for employees. However, meaningful feedback that has the potential to improve employee performance relies on reliable and valid performance judgments. The proposed holistic system hinges on the assumption that the core problems in traditional PM systems are attributable to the task of making ratings and evaluating performance on an annual basis, not to how raters process performance information. To the extent that we aim to fix PM, let us not ignore research advancements that facilitate higher quality performance ratings.