Introduction
Over the last decade, there has been a global trend away from the use of traditional governmental bureaus to deliver services. The most common feature of these reforms has been an increase in the autonomy of agencies and the introduction of more “business-like” organisational structures, language, accounting procedures and incentive systems. A variety of labels have been used to describe this restructuring process, including “agencification”, “corporatization” and even “quangocratization”. (Hereafter, for convenience, we describe both the process and the outcome as agencification.Footnote 1) The OECD (2005, 114–115) estimates that such agencies now account for 50% or more of public expenditures and employment in some countries. In Sweden, for example, such agencies now dominate government service provision (Jacobsson and Sundström 2007). The trend does not appear to be abating.
The canonical feature of agencification is that governments retain formal ownership and politicians retain hierarchical control of organisations that are nonetheless more autonomous. Because the agency remains a creature of government, its primary organisational goal remains something other than profit maximisation. It is therefore distinct from privatisation, where the formal organisational goal becomes profit maximisation.Footnote 2 Agencification does, however, usually involve more emphasis on both operational efficiency and enhanced revenue generation (Bertelli 2006, 232). Furthermore, strategic direction and day-to-day management are often entrusted to managers who are not part of the civil service. As a result of these characteristics, such agencies are somewhat awkwardly balanced between government and market (Greve 1999; Bertelli 2006). What effect does this tension have on their performance? Most importantly, what is the long-run impact of agencification on performance?
Although there is empirical evidence suggesting that agencification does produce short-term beneficial impacts, that evidence does not address the longer-run impact question because it is based on cross-sectional data or very brief time series. We address this gap by examining the performance of 13 agencies in the province of Québec, Canada. We examine multiple dimensions of performance for these agencies using a variety of measures over approximately a 10-year period following agencification.
To summarise our empirical results, we find that these agencies showed sustained improvement in a number of performance measures. Looking most closely at labour productivity, we find that it increased annually following agencification, although at a declining rate, such that performance plateaued with all gains realised within about 10 years. These findings are substantively and statistically significant.
What is agencification?
There are many ways to categorise governmental organisational designs, or more specifically, the architecture of government ownership and oversight. An initial problem in analysing agencification is the absence of a bright theoretical line separating the resulting agencies from other institutional forms. These agencies are almost never as autonomous as state-owned enterprises (SOEs), but they are usually more autonomous than traditional line bureaus that perform similar tasks (Vining 2011). Lægreid and Verhoest (2010, 4) exemplify this balancing act in their definition of agencification. On the one hand, they argue that an agency is “a structurally disaggregated body, formally separated from the ministry, which carries out public tasks at a national level on a permanent basis, is staffed by public servants, is financed mainly by the state budget, and is subject to public legal procedures”. On the other hand, they also argue that agencies “are not totally independent, because executives normally have ultimate political responsibility for their activities”. Other research identifies a similar tension (Richards and Smith 2006; Verschuere 2007). Lægreid and Verhoest’s definition is quite restrictive in view of the observed evolutionary diversity among agencies (Florio and Fecher 2011), especially cross-nationally. For example, many agencies are not financed “mainly by the state budget” because they have legislative authority to collect fees that at least cover their operating costs (Bertelli 2006; Vining 2011). In some countries, the managers and workers may be “public servants”, but they are not part of the government civil service. In addition, at least in nations with federal systems, many of these agencies operate at the sub-national level.
The efficiency hypothesis
A number of management scholars, applied economists and management consultants have posited that the major rationale for agencification is to improve “efficiency”. The expectation of improvement in efficiency is largely based on new public management (NPM) ideas. The core prescription that relates to agencification is “establishing an operating environment for a government organization which replicates the internal and external conditions of successful private enterprises” (Nicholls 1989, 27). These conditions usually include giving agencies narrower task domains, clearer mandates, greater access to higher-powered incentives and more autonomy from political interference (Walsh 1995; Kettl 2005; Pollitt et al. 2007).Footnote 3
Critics have retorted that the “be more like private enterprises” metaphor is simplistic; this has been famously summarised in the aphorism that private sector firms and public sector agencies are “fundamentally alike in all unimportant respects” (Sayre 1958, 102; see also, e.g. Wilson 1989; Hargrove and Glidewell 1990, but see Boyne 2002). Other critics have more specifically questioned the intellectual coherence of NPM (Hood 1990; Goldfinch and Wallis 2010; Reddy et al. 2011). They point to a fundamental tension. On the one hand, NPM appears to be optimistic about the intrinsic motivation of public servants, as is public interest theory, because it seeks to “set managers free” to achieve public goals. On the other hand, unlike public interest theory, NPM scholars also emphasise higher-powered extrinsic rewards and punishments. In this regard, NPM appears to be heavily influenced by principal-agent theory (Dixit 2002; Burgess and Ratto 2003; Vining and Weimer 2005). NPM has also been criticised on the more practical grounds that the relative importance of the various ingredients in its recipe is unclear. Lægreid and Verhoest (2010, 1), for example, focus on narrower agency task domains: “New Public Management (NPM) assumes that task specialization results in efficiency gains”. Other scholars focus on the potential benefits of agency and managerial autonomy in allowing managers to pursue the public interest freed from political interference (Pendlebury and Karbhari 1998; Talbot 2004). There is empirical evidence that agencification, in its various forms, does actually result in greater autonomy (Verhoest et al. 2004; Lægreid et al. 2005; Verhoest 2005; Painter et al. 2010).
From a principal-agent theory perspective, however, performance improvements from structural reform must fundamentally flow from reduced moral hazard (hidden action), reduced adverse selection (hidden information) or some combination of the two (Dixit 2002; Burgess and Ratto 2003; Bertelli 2006). In practice, it is usually not possible to distinguish precisely between moral hazard and adverse selection in their causal contribution to “agency loss”. However, both can, in theory, be reduced by better alignment between managerial rewards and specific performance outcomes – in other words, a shift to more high-powered incentives in conjunction with narrower task domains that accommodate more measurable mandates (Frant 1996; Hyndman and Eden 2002). In addition, agencies enjoy some formal insulation from political actors. This insulation in turn increases managerial autonomy within the agency, particularly the autonomy of the chief executive (CE), who now functions more like a private sector chief executive officer. Autonomy could improve agency performance by allowing the CE and other managers to act in the public interest, that is, to create greater public value with less political interference. From a principal-agent perspective, however, the problem is that greater autonomy also affords managers greater freedom to engage in inefficient self-interested behaviour (even if only “the quiet life”).
In sum, agencification – clearer incentives and targets for the CE and employees, increased transparency around managerial behaviour and performance and greater pressure on CEs to deliver results – should lead to improved performance if it is not overwhelmed by the countervailing effects of more self-interested behaviour flowing from increased agency autonomy. The net effect of these opposing forces on performance is theoretically ambiguous, as indicated by the question mark (?) in Figure 1 (adapted from Bilodeau et al. 2007). If the net impact is positive, we would expect the rest of the flow in Figure 1 to unfold as in the lower part of the figure and ultimately result in improved performance.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190710070438916-0203:S0143814X14000245:S0143814X14000245_fig1g.jpeg?pub-status=live)
Figure 1 Summary of information and incentive changes with agencification.
The efficiency hypothesis clearly has both positive and normative content. In its positive, managerial NPM guise (which is of most interest for thinking about performance effects), the hypothesis posits that performance will improve following agencification, as viewed from the principal’s perspective. In addition, from most normative perspectives, it is hard to argue with more output, lower cost or more output at a lower cost. Therefore, the empirical evidence on the performance effects of agencification is crucial. As we discuss below, there is some emerging evidence of changed behaviour and improved performance in the short run.
Yet an important question (holding constant for the moment the problem of appropriate performance measurement) is whether agency managers will continue to deliver services more efficiently over a longer time horizon. Harding and Preker (2000, 19) observe (although without presenting empirical evidence): “In some cases performance has improved. But often these improvements have not been sustained”. Their observation is consistent with the sceptical view that public managers remain ultimately subject to political control rather than to market discipline. Here, then, the concern is that not enough has fundamentally changed because agencification does not “depoliticize decisionmaking in a sustainable way” (Harding and Preker 2000, 19). Why might this be the case?
First, consider principals. It takes considerable discipline for political principals to continue to monitor performance and the metrics that appropriately incentivise performance. Although some politicians and senior civil servants may care about efficiency and effectiveness, eventually the attention of these monitors will shift to more pressing political concerns (Koop 2011). In addition, even initially altruistic politicians are likely to demand other outputs when efficiency conflicts with their electoral needs. This version of the null hypothesis posits that, in the long run, the principal-agent problem is more likely to reside with (political) principals than with (managerial) agents.
Second, consider agents. The motivation of agents subject to high-powered incentives is problematic in the public sector. Burgess and Ratto (2003, 298) summarise the reasons well:
This is due to aspects like multi-tasking, multiple principals, the difficulty of defining and measuring output, and the issue of the intrinsic motivation of workers. In these circumstances the theory predicts that low-powered incentive schemes are optimal and task assignment and work organisation become crucial in promoting better performance and may sometimes be substitutes for high-powered financial incentives.
These caveats suggest that agencification might produce a short-term improvement in performance that fades gradually after the change of status: a “Hawthorne-like” effect (Mayo 1933; Levitt and List 2009). This fade might emerge over time as politicians distance themselves from agencies that engage in activities that could cost votes. Agencification may be politically useful because it gives politicians the appearance of distance and absence of control, while retaining the option to exercise control covertly, or even explicitly in a politically salient emergency. For instance, governments frequently create gambling agencies, such as the Georgia Lottery Corporation and the British Columbia Lottery Corporation (Jensen 2003). These agencies are attractive because they are usually a major source of revenue for government, but most politicians are nervous about potential accusations of hypocrisy and of promoting immoral behaviour: fostering gambling addiction, family dysfunction, indebtedness and the like. Similarly, tolling authorities (and other revenue-generating agencies with some degree of monopoly power) are often structured as stand-alone agencies. Politicians can protest that they oppose the rate increases, lament that they have no power to stop them and pocket the revenues.
To foreshadow the empirical analysis, it is important to emphasise that the appropriate performance metrics for agencies with monopoly pricing power are both complex and controversial. Agencification is likely to provide managers with more discretion to exercise any monopoly power that the agency possesses. However, success at pricing above social marginal cost is not the best way to judge the social value of these agencies (Vining 2011). Measures that capture such pricing, such as increases in revenue, expenditure or the revenue-expenditure margin, are therefore normatively suspect. Nonetheless, some governments creating agencies are likely to regard such revenue increases and related metric changes as one form of performance improvement. In addition, agencies with both monopoly power and enhanced autonomy may well fall prey to greater technical or X-inefficiency (Leibenstein 1966). An increase in this kind of inefficiency would not be captured by changes in revenue, expenditure or the revenue-expenditure margin. Indeed, increases in revenue, expenditures or margins might enable this kind of inefficiency. To address this issue, cost and productivity measures are required. In sum, for agencies with some degree of monopoly power, any assessment of performance improvement depends on the appropriateness of the specific metrics. As we discuss in detail later, our strategy is to use an extensive set of measures that together provide a comprehensive picture of the changes in the agencies’ behaviour and performance following their creation. Most importantly, we explicitly examine productivity change, as this is socially valuable under all plausible circumstances. Having provided direct evidence on this normatively unambiguous measure, we leave readers free to remain agnostic about, or to reject, the claim that a change in any other individual metric represents genuine improvement (Boyne 2003a).
As we explain further below, our primary hypothesis is that agencification improves performance as measured by productivity. In addition, we hypothesise that performance improvements continue for many years following agencification. However, we also test a secondary hypothesis that performance gains decline over time until a performance plateau is reached.
Existing (and related) empirical evidence on performance effects
The global empirical evidence on the performance effects of agencification is quite limited. It is therefore useful also to examine the empirical evidence on performance change in two closely related contexts: (1) changes in bureau autonomy that do not amount to agencification and (2) changes in bureau or agency incentives (and motivation). The relevance of these contexts is clear: agencification increases formal autonomy and provides more opportunities to employ higher-powered, or at least more extrinsic, incentives.
Autonomy and performance
A manifest purpose of many government reorganisations is to increase the autonomy of line bureaus from political influence and hierarchical bureaucratic control. However, evidence on the performance effects of autonomy is not extensive. Lewis (2003, 158) has shown that, at the federal level of the United States (US) government, certain kinds of autonomy (insulation) improve the chances that an agency will survive over time; there is also evidence that politicians delegate autonomy to agencies to insulate desired policies when political uncertainty is high (Lavertu 2013). Concerning more conventional measures of performance, Moynihan and Pandey (2006, 122) review the extant empirical evidence and conclude: “There is some evidence that clear goals and bureaucratic autonomy are clear predictors of public sector performance”. However, also based on a review of the evidence, Rainey and Steinbauer (1999, 16) argue that the evidence suggests a non-linear relationship: “Government agencies will be more effective when they have higher levels of autonomy, but not extremely high levels of autonomy”.
Only a small number of empirical studies examine the impact of autonomy on line bureaus (Poister et al. 2011). Wolf (1993) analysed 44 federal agencies in terms of the impact of autonomy (among other variables) on performance. In aggregate, he found that higher levels of political autonomy had a positive effect on performance. Moynihan and Pandey (2006) examined a large sample of information system managers in state-level primary health and human services agencies and concluded that a higher level of “managerial authority” is associated with better performance. Langbein (2009) used survey data from US government employees to examine the relationship between employee perceptions of discretion and productivity. Although her results show a complex, contingent relationship between these variables, her overall conclusion is that “there appears to be a trade-off between accountability (to the executive) and productivity: executive political controls reduce both discretion (presumably raising accountability) and productivity” (106).
The case study evidence on the insulation of agencies and programmes from political influence also generally concludes that increased autonomy results in better performance, although the performance measures have been quite variable (Borins 1998; Carpenter 2001; Barzelay and Campbell 2003; Katyal 2006; Stephenson 2008). In sum, there is some evidence that autonomy improves performance and little evidence that it worsens performance.
(Higher-powered) incentives and performance
A contentious issue in public management concerns the effect that individual and group incentives have on public sector performance (for reviews, see Moynihan 2008; Perry et al. 2009; Heinrich and Courty 2010; Langbein 2010). Theory suggests that “pay for performance” (PFP) is likely to be efficacious primarily where output is easy to measure (Lazear 2000; Lazear and Shaw 2007). However, as Langbein (2010, 12) warns, “in complex modern organizations, tasks are complex, and team-based; individual performance is hard to observe and hard to link to firm profits…when performance is hard to measure, PFP is not even used in the competitive private sector, where efficiency is the unintended by-product of self-interested actors”. The crucial question is whether extrinsic rewards “crowd out” intrinsic rewards (Frey and Jegen 2001; Le Grand 2003; Perry and Hondeghem 2008). Suffice it to say that “it remains quite open what to expect regarding the relationship between performance and pay” (Binderkrantz and Christensen 2011, 33).
In the US, early empirical research centred on the Job Training Partnership Act of 1982 (Heckman et al. 1996; Courty and Marschke 1997). These studies generally found considerable “gaming” of what were, initially, very blunt incentive schemes. Heinrich (2007) examines the “high performance bonuses” introduced by the US federal government and the incentive effects of the Workforce Investment Act, the largest US employment and training programme. She finds that the incentive system was so badly designed that any positive motivating effects would have been very surprising, and concludes that “the results of the theoretical and empirical investigation suggest that high performance bonus systems are more likely to encourage misrepresentation of performance and other strategic behaviors than to recognize and motivate exceptional performance and performance improvements” (Heinrich 2007, 281).
There have also been a number of European empirical studies. Walker and Boyne (2006) provide tentative evidence that (non-financial) incentives (and greater goal clarity) did result in improved performance in local governments in the United Kingdom. However, multiple reforms were introduced simultaneously, complicating the attribution of specific causes. Kelman and Friedman (2009) studied the introduction of hospital-level financial incentives intended to reduce wait times. They report dramatic improvements in wait times and no evidence of dysfunctional effort substitution or gaming (but see Bevan and Hood 2006 and Bevan 2010). Burgess et al. (2010) study the performance effects of introducing both team-based and individual incentives to tax collection in the United Kingdom. They find that team-based incentives raised productivity (even though the teams were quite large, with over a hundred members) and that individual incentives raised both the “tax-yield” and productivity. The main source of productivity improvement was the reassignment of more efficient employees to the incentivised tasks. Binderkrantz and Christensen (2011) study the impact of executive performance contracts on over 60 Danish public agencies using data from 2000, 2005 and 2008. (Because they set out to study only the effect of PFP on performance, we do not treat this as an “agencification and performance” study per se.) They find no substantive impact on performance as a result of these incentives.
In sum, the empirical findings suggest that extrinsic incentives, especially financial incentives, are quite variable in their impact. The one constant appears to be gaming by employees (i.e. strategic responses). This raises the possibility that more sophisticated incentive designs might produce better outcomes. It is also worth emphasising that, in almost all public sector contexts, financial incentives have been low relative to base salary, so that the price effect has been small (Weibel et al. 2010). There is some evidence that intrinsic incentives do work.
Agencification and performance
The most directly relevant evidence, that on the effect of agencification itself on performance, is very limited (Talbot 2004, 105; see Verhoest and Lægreid 2010 for a review; see Pollitt and Dan 2011 for a meta-analysis of empirical evidence). Boyne (2003b) specifically notes the lack of time-series evidence, and Pollitt and Dan (2011, 32–33) emphasise the paucity of empirical studies that convincingly address productivity change.
In an early study, Shirley (1999) generally found that “agencification” improved performance in developing countries. However, it is unclear what exactly agencification means in her study, as it covers a number of disparate countries; in many cases, the changes would appear to be better described as SOE reform. Several studies that examine what is described as agencification in China have the same problem (Aivazian et al. 2005; Xu et al. 2005). Brewer (2004) analysed quite highly aggregated evidence from 25 OECD countries. His work supports the idea that a portfolio of “agencification-like” reforms does improve agency performance. Bilodeau et al. (2007) studied 11 agencies at both the federal level in Canada and the provincial level in Québec and found that the change to an agency form did improve performance for the three years following agencification. Quenneville et al. (2010, 158) analysed 16 agencies in Québec over a 5-year period and concluded that “average annual financial performance across agencies clearly improved over this period”. Cambini et al. (2011) find that (two different versions of) agencification of Italian bus entities resulted in a reduction of production costs. In aggregate, the evidence does suggest improvement, although only over short time frames.
Nelson and Nikolakis (2012) provide the only extant study of longer-term performance of which we are aware. They assess the agencification of six Australian state-level forest SOEs over the period 1989 to 2007. The changes increased managerial autonomy and encouraged clarity in the definition of goals, and can therefore be considered agencification, albeit from a starting point of much greater political independence than a traditional bureau. They report improved performance and profitability over the period.
Likely performance effects of agencification
We conclude from the literature review that: (1) the long-run effects of agencification are largely unknown, although there is some minimal evidence of performance improvement; (2) there is some evidence of short-run improvement from agencification; (3) the evidence suggests that some degree of autonomy improves performance; (4) the evidence on incentives is mixed, but higher-powered extrinsic incentives do not improve performance unless they are well designed; and (5) higher-powered intrinsic incentives appear to have some positive impact, although the evidence is not very strong.
In view of this weak and mixed evidence, our specific hypotheses relating to performance change are tentative. Nonetheless, our primary maintained hypothesis is that agencification improves performance, primarily as measured by productivity. In addition, we hypothesise (H1) that annual performance gains continue for many years following agencification. We base this hypothesis on the idea that, while the CE could implement some productivity-improving or cost-cutting changes quite quickly, other operational or “cultural” changes could take a number of years to implement fully. However, we also test a secondary hypothesis (H2): the magnitudes of any incremental performance improvements decline over time. Even if performance does improve for years following agencification, there is no strong theoretical rationale to expect these improvements to accumulate indefinitely. Indeed, a more plausible assumption is diminishing marginal returns to agencification: after some years, its effects will “peter out” completely.
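One way to make H1 and H2 operational, offered here only as a minimal sketch under assumptions and not as the specification used in the empirical analysis, is to fit a quadratic trend in years since agencification, y_t = b0 + b1·t + b2·t², to an agency's productivity series. Under this assumed specification, H1 corresponds to b1 > 0 (gains in the early years), H2 to b2 < 0 (shrinking annual gains), and the fitted annual gain reaches zero at t* = -b1/(2·b2). The function names and the series in the usage example are hypothetical, and a real test would also need standard errors, which this sketch omits.

```python
def fit_quadratic_trend(years, values):
    """Least-squares fit of values = b0 + b1*t + b2*t**2 via the normal equations.

    Illustrative helper (hypothetical name); returns [b0, b1, b2].
    """
    # Design-matrix rows [1, t, t^2] and the 3x3 system X'X b = X'y.
    rows = [[1.0, float(t), float(t) * t] for t in years]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = 3
    a = [xtx[i] + [xty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]

    # Back substitution.
    b = [0.0] * n
    for i in reversed(range(n)):
        b[i] = (a[i][n] - sum(a[i][j] * b[j] for j in range(i + 1, n))) / a[i][i]
    return b


def assess_hypotheses(years, values):
    """Map the fitted coefficients onto H1 and H2 (assumed quadratic specification)."""
    _, b1, b2 = fit_quadratic_trend(years, values)
    return {
        "h1_early_gains": b1 > 0,         # productivity rises after agencification
        "h2_declining_gains": b2 < 0,     # each year's gain is smaller than the last
        # Year at which the fitted annual gain falls to zero (the "plateau").
        "plateau_year": -b1 / (2 * b2) if b2 < 0 else None,
    }


# Hypothetical, noise-free productivity index for years 0..10 after agencification.
years = list(range(11))
index = [100 + 10 * t - 0.5 * t * t for t in years]
print(assess_hypotheses(years, index))
```

With this hypothetical series the sketch recovers b1 = 10 and b2 = -0.5 and places the plateau at about year 10. Note that a quadratic eventually turns downward rather than levelling off, so a true plateau might be better captured by another functional form; the quadratic is used here only because its two slope coefficients map directly onto H1 and H2.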
Sample selection, data and research design
We examine the consequences of agencification in the province of Québec, Canada over an extended period of time. The final sample consists of 13 agencies for which long-term time series data are available. Appendix 1 summarises the basic information on these 13 agencies, including the name, de facto year of creation, last year of data availability, number of full-time equivalent employees (FTEs), agency annual expenditures as of 2008 and a summary of the agency’s main programme or services.Footnote 4 These data were collected from their annual reports up to 2009–2010. In examining performance change, an important issue is potential bias in the sample selection. We first address this issue and then outline our data and research design.
Sample selection and potential bias
An initial screen of agencification candidates suggested 21 entities created in 2001 or before that date. They basically fell into two categories. First, between 1995 and 2000, during “an experimentation phase” (Mazouz and Tremblay Reference Mazouz and Tremblay2006), the Quebec government created five “Autonomous Service Units”. The Public Administration Act (2000) formalised their status as autonomous agencies. Second, after the passage of this Act, the Quebec government created 16 more agencies that were required to negotiate and commit to an “accountability and performance convention” (Convention de Performance et d’Imputabilité) with an oversight Ministry. We attempted to collect data on all 21 of these entities. However, we could not include three agencies, because they did not disaggregate their performance data from those of their oversight Ministries (Centre Québécois d’Inspection des Aliments et de Santé Animale, Contrôle Routier and Géologie Québec). We could find no clear statement as to why their accounts were handled in this way. In addition, we could not construct a plausible primary measure of output for three entities (Centre d’Expertise Hydrique, Forêt Québec and la Régie du Logement). Finally, we had to drop two more agencies: Sécurité du Revenu, because it merged with another agency in 2006, and the Centre de Signalisation du Québec, because it was privatised in 2007. As a result, their time series was truncated and we could not validate their data.
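The sample-construction steps above amount to simple bookkeeping, sketched here in Python to make the attrition from 21 candidates to 13 agencies explicit. The counts and agency names come from the text; the grouping labels, constant names and the function name are illustrative only.

```python
# Candidate entities created in or before 2001 (counts taken from the text).
AUTONOMOUS_SERVICE_UNITS = 5   # created 1995-2000, formalised by the Public Administration Act (2000)
POST_ACT_AGENCIES = 16         # created after the Act, each with a performance convention

# Exclusions grouped by reason (reason labels are illustrative).
EXCLUSIONS = {
    "data not disaggregated from oversight Ministry": [
        "Centre Québécois d'Inspection des Aliments et de Santé Animale",
        "Contrôle Routier",
        "Géologie Québec",
    ],
    "no plausible primary output measure": [
        "Centre d'Expertise Hydrique",
        "Forêt Québec",
        "Régie du Logement",
    ],
    "truncated time series (merger or privatisation)": [
        "Sécurité du Revenu",
        "Centre de Signalisation du Québec",
    ],
}


def final_sample_size(n_candidates: int, exclusions: dict[str, list[str]]) -> int:
    """Return the number of candidates left after removing every excluded entity."""
    excluded = [name for names in exclusions.values() for name in names]
    # Guard against counting the same entity under two exclusion reasons.
    if len(set(excluded)) != len(excluded):
        raise ValueError("an entity appears under more than one exclusion reason")
    return n_candidates - len(excluded)


n_candidates = AUTONOMOUS_SERVICE_UNITS + POST_ACT_AGENCIES
print(final_sample_size(n_candidates, EXCLUSIONS))
```

Running it yields 13 (21 candidates less 3 + 3 + 2 exclusions), matching the final sample.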
Therefore, we include all 13 agencies with sufficient data. Thus, we eliminate some obvious sources of bias. There are, however, some remaining areas of concern. A review of the agencies’ mandates shows that most share the feature that, following agencification, they would have fairly narrow task domains, although we found no clear statement that this was a specific rationale for their agencification. Similarly, consistent with a narrowing task domain, most of the selected entities would have a single primary tangible output that was reasonably easy to measure. However, this was not universally the case, as we had to exclude three agencies because of lack of relevant output data.
Our sample is therefore biased to the extent that it mostly includes agencies that share two features: a narrow task domain and a primary tangible, measurable output. Our theoretical discussion around Figure 1 suggests that agencies with these features would be the most amenable to performance improvement. Furthermore, our reading of government documents suggests that the Quebec government was concerned with improved efficiency, as it implemented a formal results-based management system during the same period. Thus, our sample is not random in one sense: it is probably biased towards agencies that are more likely to improve performance than a randomly selected sample of government line bureaus would be. Our findings therefore apply most directly to bureaus, or separable parts of bureaus, with narrow task domains. Parenthetically, we note that this may well turn out to be a common problem in empirical agencification studies, because the narrowing of scope that accompanies agencification limits the use of straightforward “before-after” comparisons.
Data and research design
The starting point for our analysis is the financial data that all agencies disclose in their annual reports. These data include annual revenues (where applicable), expenditures and the number of FTEs. For the first component of the research design, we also identify a measure of the primary output of each agency.Footnote 5 Appendix 1 reports our selected output measures. Using the financial statement data, we compute a number of ratio measures that assess the agencies’ financial performance, the average cost of their outputs and their overall labour productivity over a period of about 10 years.
The financial data underlying each measure need to be adjusted for inflation, because the use of nominal dollars, especially over an extended period, would overstate the magnitude of change. We convert revenues and expenditures for all agencies and all years to 2003 Canadian dollars using the all-items Consumer Price Index (CPI), which is available from the Statistics Canada website (www.statcan.gc.ca).
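As an illustration of the deflation step, the sketch below rescales nominal amounts to constant base-year dollars by multiplying each year’s amount by the ratio of the base-year CPI to that year’s CPI. It is a minimal sketch, not the authors’ procedure; the function name `to_constant_dollars` and the CPI index levels in the usage example are hypothetical placeholders, not actual Statistics Canada values.

```python
def to_constant_dollars(nominal, cpi, base_year=2003):
    """Convert {year: nominal amount} to constant base-year dollars.

    nominal: mapping from year to a nominal dollar amount.
    cpi: mapping from year to a CPI index level (hypothetical values here;
         in practice, the all-items CPI series from Statistics Canada).
    """
    base_cpi = cpi[base_year]
    # Real amount = nominal amount * (CPI in base year / CPI in that year).
    return {year: amount * base_cpi / cpi[year] for year, amount in nominal.items()}


# Hypothetical index levels and expenditures, for illustration only.
cpi = {2001: 95.0, 2003: 100.0, 2005: 105.0}
expenditures = {2001: 950.0, 2003: 1000.0, 2005: 1050.0}
real = to_constant_dollars(expenditures, cpi)
print(real)  # every year comes out at 1000.0 in 2003 dollars
```

In this made-up example, expenditures that rise by about 10.5% in nominal terms are flat in real terms, which is the kind of overstatement the adjustment is meant to remove.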
Because each agency produces different outputs, our output measures are expressed in different units. For statistical description and analysis to be meaningful, we normalise these data using a standard methodology (see Boardman et al. 2002, 147). Specifically, for each agency in year t, the normalised measure is the ratio of the year t observation to the value of the measure in the base year (the year of agency creation). Under this normalisation method, each agency is given equal weight in the sample regardless of size (similarly, see Bilodeau et al. 2007). The approach thus normalises for size discrepancies as well as for measurement in different output units, which is appropriate for an examination of the effect of agencification as an organisational form. We use only the normalised data in further analysis, and we then calculate the annual change in the normalised values of the relevant measures for each agency.
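A minimal sketch of the normalisation, assuming each agency’s series is a mapping from year to value and that the “annual change” is the year-over-year difference in normalised values (the text does not say whether differences or ratios are used, so this is our assumption). The agencies, units and figures below are hypothetical.

```python
def normalise(series, base_year):
    """Divide every observation by the base-year value, so the base year equals 1.0."""
    base = series[base_year]
    return {year: value / base for year, value in series.items()}


def annual_changes(normalised):
    """Year-over-year difference in normalised values (assumed definition)."""
    years = sorted(normalised)
    return {
        year: normalised[year] - normalised[prev]
        for prev, year in zip(years, years[1:])
    }


# Two hypothetical agencies whose outputs are measured in different units.
files_processed = {2001: 200_000, 2002: 220_000, 2003: 250_000}  # files
lab_reports = {2001: 4_000, 2002: 4_400, 2003: 5_000}            # reports

a = normalise(files_processed, 2001)
b = normalise(lab_reports, 2001)
# Both agencies start at 1.0, so neither dominates the sample because of its size.
print(a, b)
print(annual_changes(a))
```

Because the two hypothetical agencies grow at the same rate, their normalised series coincide even though one is fifty times larger, which is the equal-weighting property described above.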
In the first component of our research design, we test hypothesis H1 by computing confidence intervals around the geometric means of the normalised values of each standard annual report performance measure. If the confidence interval excludes 1.00, we conclude that the performance indicator changed significantly over the period of study. This provides a rough first test of the efficiency hypothesis H1, as well as an overview of other potentially relevant outcomes of agencification. The second component assesses the impact of agencification on performance by estimating a panel regression model of the measures computed from each agency’s financial statements. The regression model allows us to estimate the pattern of performance change over time and to test the efficiency hypotheses H1 and H2 explicitly. The third component reports the results of a survey that we sent to agency personnel following our initial statistical analysis. In addition to providing a check on the data, the survey allowed us to elicit executives’ views about the role of autonomy in achieving performance improvements.
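The first-component test can be sketched as follows: take logs of the normalised end-of-period values, build a confidence interval for their mean, and exponentiate, which yields an interval around the geometric mean; the measure is judged to have changed if that interval excludes 1.00. This is a minimal sketch under stated assumptions, not the authors’ code. By default it uses a normal critical value; with only 13 agencies, a Student-t critical value (for example, `scipy.stats.t.ppf(0.975, n - 1)`, about 2.18 for n = 13) would be more conservative and can be passed through the `crit` parameter. The ratios in the usage example are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_ci(ratios, level=0.95, crit=None):
    """Geometric mean of positive ratios with a confidence interval built on the log scale.

    Returns (geometric_mean, lower, upper). If crit is None, a normal critical
    value is used; pass a Student-t value for small samples.
    """
    logs = [math.log(r) for r in ratios]
    n = len(logs)
    m = mean(logs)
    se = stdev(logs) / math.sqrt(n)  # standard error of the mean log ratio
    if crit is None:
        crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(m), math.exp(m - crit * se), math.exp(m + crit * se)


def significantly_changed(ratios, **kwargs):
    """True if the confidence interval around the geometric mean excludes 1.0."""
    _, lower, upper = geometric_mean_ci(ratios, **kwargs)
    return not (lower <= 1.0 <= upper)


# Hypothetical end-of-period normalised productivity for five agencies.
productivity = [1.20, 1.35, 1.10, 1.50, 1.25]
print(geometric_mean_ci(productivity))
print(significantly_changed(productivity))                     # True
print(significantly_changed([0.90, 1.10, 1.00, 0.95, 1.05]))   # False
```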
Results
Overview of financial performance
Table 1 shows the average change in the standard agency measures over the entire study period. It reports the end-of-period geometric mean of the normalised performance measures (the geometric mean being appropriate for normalised ratio data) and tests whether it differs significantly from one, with the probability of a type I error set at 5%. The increases in revenues, output and labour productivity are all statistically significant at the 5% level, while the improvement (increase) in the revenue-expenditure margin is statistically significant at the 10% level. We regard the increases in total output and labour productivity as unambiguously representing performance improvement, because neither measure raises the monopoly-power concerns discussed earlier. We also regard the decrease in average cost as unambiguously representing performance improvement, especially in a context of generally rising expenditures and revenues. These aggregate results are consistent with the efficiency hypothesis H1.
Table 1 Annual change in performance measures normalised to base year (base year: agency creation year)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190710070438916-0203:S0143814X14000245:S0143814X14000245_tab1.gif?pub-status=live)
Notes:
1 FTE change calculated from 2001 to 2010 owing to data limitations.
2 FTE change calculated from 2001 to the end of period (2010) owing to data limitations.
BIA=Bureau des infractions et amendes; CARRA=Commission administrative des régimes de retraite et d’assurances; CCQ=Centre de conservation du Québec; CEAEQ=Centre d’expertise en analyse environnementale du Québec; CGER=Centre de gestion de l’équipement roulant; CPF=Centre de perception fiscal; LSJML=Laboratoire de sciences judiciaires et de médecine légale; RC=Régie du cinéma; SAG=Service aérien gouvernemental; AFE=Aide financière aux études; CR=Centre de recouvrement; EQ=Emploi Québec; RRQ=Régie des rentes du Québec; FTE=full-time equivalent employees.
*Significantly different from 1.000 at the 0.05 level.
Table 1 is also informative about the subset of agencies that generated revenues. All of these agencies increased their revenues on average each year over the entire study period, and the average annual increase was substantial: more than 10%. Revenues thus grew faster than expenditures (3.6% on average) and faster than output (about 4.3% on average). The individual agency results also point to the conclusion that performance on these measures tended to improve over time, although the results vary across other dimensions of performance. Agencies that do not generate revenues generally show performance improvement. AFE, however, experienced a substantial increase in expenditures and FTEs, along with a decrease in output and productivity, at least when its unaudited data for its first two years of operation are included.
Assessment of performance (efficiency hypotheses)
The central empirical question is whether or not agencification affects long-run productivity. To answer this question, we analyse the data for the 13 agencies over the multi-year period. The results of these panel regressions appear in Table 2.
Table 2 Panel regression analysis of productivity: 13 agencies for up to 13 years
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190710070438916-0203:S0143814X14000245:S0143814X14000245_tab2.gif?pub-status=live)
Notes:
GDP=gross provincial product; FTE=full-time equivalent employees.
Two-tailed significance tests: *p<0.05.
The dependent variable in each model is normalised productivity by agency and year. The primary independent variable is the time since agencification began, in years.Footnote 6 Because productivity is normalised to 1 in the base year, the coefficient of time can be interpreted as an annual rate of growth relative to base-year productivity. Model 1 provides the basic analysis. In addition to time since agencification, it includes fixed effects for both agencies and calendar years. It also includes the square of time, to allow for the possibility that the rate of change in productivity itself changes over time; it thus provides a way of testing hypothesis H2.
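To make the specification concrete, the sketch below estimates the time and time-squared coefficients with agency fixed effects by demeaning every variable within each agency (the “within” estimator) and then solving the two-regressor normal equations. It is a simplified, hypothetical illustration rather than the authors’ estimation code: it omits the calendar-year fixed effects, the FTE control and standard errors, which a full analysis would obtain from a panel econometrics package (for example, `PanelOLS` in the Python `linearmodels` package with `entity_effects=True` and `time_effects=True`). The usage data are synthetic and noise-free, with generating values chosen to be consistent with the Model 1 marginal-effect expression reported in the next paragraph, so the estimator should recover them.

```python
from collections import defaultdict


def within_quadratic_fit(rows):
    """Estimate b1, b2 in y_it = a_i + b1*t + b2*t**2 + e_it with agency fixed effects.

    rows: iterable of (agency, years_since_agencification, normalised_productivity).
    The agency intercepts a_i are removed by demeaning within each agency.
    """
    groups = defaultdict(list)
    for agency, t, y in rows:
        groups[agency].append((t, t * t, y))

    x1, x2, yd = [], [], []
    for obs in groups.values():
        n = len(obs)
        mean_t = sum(o[0] for o in obs) / n
        mean_t2 = sum(o[1] for o in obs) / n
        mean_y = sum(o[2] for o in obs) / n
        for t, t2, y in obs:
            x1.append(t - mean_t)
            x2.append(t2 - mean_t2)
            yd.append(y - mean_y)

    # Normal equations for two regressors without an intercept, solved by Cramer's rule.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, yd))
    s2y = sum(b * c for b, c in zip(x2, yd))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Synthetic panel: three agencies with different baselines and the same quadratic time path.
rows = [
    (agency, t, base + 0.088 * t - 0.0039 * t * t)
    for agency, base in [("A", 1.0), ("B", 0.9), ("C", 1.2)]
    for t in range(11)
]
b1, b2 = within_quadratic_fit(rows)
print(round(b1, 4), round(b2, 4))  # 0.088 -0.0039
```

Demeaning within agencies is what makes the estimates insensitive to each agency’s starting level, mirroring the role of the agency fixed effects in Model 1.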
The coefficients of both time since agencification and its square in Model 1 are statistically significant. Substantively, the effects of time and its square must be interpreted together. Productivity gains appear greatest immediately after agencification and decline over time, falling to 0 after about 11 years (∂Productivity/∂Time = 0.088 − 0.0078 × Time, which equals 0 at Time ≈ 11.3). Consequently, consistent with hypothesis H2, there appear to be substantial and fairly long-lasting productivity gains following agencification, but these gains eventually dissipate so that productivity plateaus.
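The turning point follows from differentiating the quadratic: with productivity = b1 × Time + b2 × Time² plus other terms, the marginal gain is b1 + 2 × b2 × Time, which falls to 0 at Time = −b1 / (2 × b2). The reported marginal effect 0.088 − 0.0078 × Time implies b1 = 0.088 and b2 = −0.0039. A minimal sketch of that arithmetic (the function names are ours, for illustration only):

```python
def marginal_gain(b1, b2, time):
    """Derivative of b1*time + b2*time**2 with respect to time."""
    return b1 + 2 * b2 * time


def turning_point(b1, b2):
    """Time at which the marginal gain reaches zero; a maximum requires b2 < 0."""
    if b2 >= 0:
        raise ValueError("no interior maximum unless b2 < 0")
    return -b1 / (2 * b2)


b1, b2 = 0.088, -0.0039  # implied by 0.088 - 0.0078 * Time
t_star = turning_point(b1, b2)
print(round(t_star, 1))                     # 11.3
print(round(marginal_gain(b1, b2, 5), 3))   # 0.049
```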
Model 2 adds the variable FTE, the number of full-time equivalent employees in thousands. All else equal, larger agencies show smaller gains in productivity. Some caution is required, however: because productivity is computed by dividing the agency output measure by FTE, the negative relation may simply be a correlation that arises by construction.Footnote 7 Nonetheless, the time pattern of productivity is similar to that found in Model 1: large initial gains gradually decline towards 0. In this model, the gains reach 0 somewhat later, at about 14 years.
The last two years of the study period coincided with a worldwide recession. To investigate whether the declining pattern of improvements was an artefact of the recession, we re-estimate the model excluding the observations for 2008 and 2009. Model 3 presents the results of re-estimating Model 2 on this restricted sample. The curvilinear relationship continues to hold, and we find similar results (not shown) when we replicate Model 1.
To assess whether the results are driven by general economic activity, we replaced the calendar-year fixed effects with annual real gross provincial product in billions of dollars. Its coefficient was not statistically different from 0, and the pattern of productivity gains did not change.
In summary, the results of the analysis are consistent with both the hypothesis that annual gains in productivity persist over an extended period (H1) and the hypothesis that the magnitude of these annual gains decreases over time (H2).
Although we are confident that these agencies improved their performance following agencification, our research design does not allow us to rule out the possibility that they would have improved similarly even if they had not been agencified. To investigate this possibility, we would need data covering a comparable period for a set of comparable agencies that were not agencified. If those agencies showed similar performance improvements, the improvements we observed in the agencified agencies would more likely be attributable to (unobserved) broader institutional reform than to agencification. Unfortunately, because comparable data are lacking for other Québec public bureaus, we could not construct a plausible comparison group against which to compare the results of the agencies (Mazouz and Tremblay 2006). We can, however, compare the change in the FTEs of the 13 agencies with that of Québec’s overall public sector over approximately the same period. This comparison shows that Québec public sector FTEs increased over the whole period, while the agencies in our sample showed virtually no change in overall FTEs and, indeed, substantial declines in the latter part of the study period. Over the period 2000 through 2009, total annual average public sector full-time equivalent employment in Québec increased by 26%, while it decreased by 56% for the 13 agencies we studied. Consequently, the productivity of employees in bureaus would have had to increase substantially to match the gains realised by the agencies. Specifically, because public sector FTEs rose by 26%, output by bureaus would have had to rise by about 76% over the period to match the average 40% increase in productivity achieved by the agencies, an implausibly large number.Footnote 8
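The 76% figure follows from the definition of labour productivity as output per FTE: if FTEs grow by a proportion g_FTE and productivity is to grow by g_P, output must grow by (1 + g_FTE)(1 + g_P) − 1. A minimal sketch of the calculation, using the 26% and 40% figures from the text (the function name is ours, for illustration only):

```python
def required_output_growth(fte_growth, productivity_growth):
    """Output growth needed for output/FTE to rise by productivity_growth
    when FTEs change by fte_growth (both as proportions, e.g. 0.26 for 26%)."""
    return (1 + fte_growth) * (1 + productivity_growth) - 1


needed = required_output_growth(0.26, 0.40)
print(f"{needed:.1%}")  # 76.4%
```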
As a further way to investigate whether the agency gains might have occurred without agencification, we compared the employment data of the agencies with those of the three largest SOEs in Quebec: the electric utility (Hydro-Québec, HQ), the lottery corporation (Loto-Québec, LQ) and the enterprise responsible for selling alcoholic beverages (Société des Alcools du Québec, SAQ). These SOEs are also autonomous but have a clear mandate to generate income for the government.Footnote 9 They employ, respectively, more than 20,000, 6,000 and 5,000 FTEs. Although their goals are complex (and beyond our scope to discuss), they do have incentives to monitor employee productivity. If there were an underlying trend towards reducing FTEs to improve productivity in the Quebec public sector, these three SOEs would almost certainly reflect it. Our analysis, however, found that between 2000 and 2009 their aggregate number of FTEs increased by more than 17% (11.5% for HQ, 13.9% for LQ and 54.4% for SAQ). Furthermore, using revenue per employee as a measure of productivity, we found that productivity actually declined: by 15.2% for HQ, 17.7% for LQ and 7.1% for SAQ. In summary, the SOEs saw reductions in productivity, and the public sector as a whole would have needed implausibly large output gains to match the productivity increases of the agencies. We therefore conclude that there was no secular trend of increasing productivity and that the performance improvements we measure are most plausibly attributable to agencification.
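The aggregate SOE figure is a size-weighted combination of the three growth rates, not their simple average, which is why SAQ’s 54.4% increase moves it only modestly. As a rough consistency check, and assuming (our assumption, not stated in the text) that the employment figures quoted above are approximate 2009 levels, the starting levels can be backed out from each SOE’s growth rate. The function name is hypothetical.

```python
def aggregate_growth(end_levels, growth_rates):
    """Growth of the total, given each unit's end-of-period level and its own growth rate."""
    # Back out each unit's start-of-period level from its end level and growth rate.
    start_levels = [level / (1 + g) for level, g in zip(end_levels, growth_rates)]
    return sum(end_levels) / sum(start_levels) - 1


# Approximate FTEs (assumed end-of-period) and 2000-2009 growth for HQ, LQ and SAQ.
total_growth = aggregate_growth([20_000, 6_000, 5_000], [0.115, 0.139, 0.544])
print(f"{total_growth:.1%}")  # 17.2%
```

Under this assumption the result is close to the “more than 17%” reported, whereas a simple average of the three rates would be about 27%.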
Survey results
Using the performance measures data, we summarised each agency’s performance in a report that we gave to its CE. We asked the CEs to confirm the accuracy of the data, answer a brief questionnaire regarding their agency’s results and provide their overall responses to the results and, more generally, to the impact of agencification.Footnote 10 CEs in 11 of the 13 agencies responded to the survey.Footnote 11 Our exchanges with each agency’s personnel gave us an opportunity to make a number of adjustments to the data and, in one case, to expand the database.Footnote 12 Table 3 summarises these survey results.
Table 3 Summary of agency survey responses
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190710070438916-0203:S0143814X14000245:S0143814X14000245_tab3.gif?pub-status=live)
Note: A total of 11 responses to all questions except Q3a, which had 10 responses.
The survey asked six questions. For each, the CEs were asked to mark their agreement with a statement on a five-point Likert scale and to provide open-ended comments. Question 1 asked respondents whether they thought our results accurately portrayed the performance of their agency. Seven of the 11 respondents thought that the results correctly portrayed their agency’s long-term performance. However, a number of respondents argued that the results required some context if they were to be useful in assessing overall performance accurately. Question 2 asked about our selection of the agency’s primary output measure. In a number of cases, the CEs expressed some unease about our selections and commented that they would have preferred us to examine all outputs. However, as already noted, the only additional measures that they consistently asked for were highly correlated with the primary measure, so using multiple measures would not have materially altered our findings. Question 3 asked about their managerial “autonomy” on three dimensions: budget surplus management (question 3a), human resource management (question 3b) and other aspects of managerial autonomy (question 3c). A majority of the CEs responded that their autonomy to manage budget surpluses and human resources was too restricted. Their comments suggest that they perceived their autonomy to have decreased over time as the Québec Treasury Council imposed both salary and hiring freezes. Question 4 asked about their ability to mandate more high-powered “incentives”. A number of respondents argued that such incentives were not appropriate for their agencies, either for executives or for employees. Question 5 asked about the role of performance measures in agency management and control. Almost without exception, the respondents concurred that the performance measures laid out in annual reports are actually used to assess the performance of their agency. Question 6 asked about the use of other performance measures beyond those in the annual report. A majority of the CEs said that they do use other measures, and some provided examples.
In aggregate, and consistent with our empirical findings, the survey results suggest that the implementation of NPM-like reform has had a substantive impact on these agencies. Yet managers do worry about the sustainability of autonomy and about the reductionism inherent in numeric measurement of goal achievement. Despite the performance improvements we document, agency managers are somewhat sceptical of the sustainability and depth of reform over time (similarly, see Mazouz and Tremblay 2006). Managers appear to fear a Hawthorne effect.
Discussion and conclusion
The increase in total output, the decrease in average cost and especially the increase in labour productivity suggest performance improvement from agencification, at least in the Québec institutional environment. Our formal test of the efficiency hypothesis, focusing on labour productivity, shows an initial increase in productivity following agencification. However, the annual gains appear to decline towards 0 over time, such that the agencies reach a productivity plateau. Thus, we conclude that, despite some scholarly scepticism about the sustainability of performance improvement (which is to some extent shared by Quebec agency CEs!), agencification does result in longer-run improved performance in these agencies, although the annual gains fade out after about a decade. Hence, our conclusion is that the “?” in Figure 1 represents a net positive effect on performance.
A number of explanations can be offered for the long-run plateau in performance. There may indeed be a sort of “Hawthorne effect”, such that the initially heightened oversight of managers declines over time, and their performance with it. It may also be that the managers’ perceptions of declining autonomy are accurate, so that their opportunities to drive productivity gains decline. An alternative explanation, however, is simply declining marginal returns: over time, managers take actions that exhaust the available performance improvements. Of course, these explanations are not mutually exclusive. In any event, sorting them out would require long-term, detailed case studies of agency behaviour.
There are five caveats to this conclusion. First, we do not have an ideal experimental design: governments simply do not create pairs of bureaus and then agencify one of them. Before-after comparison is rarely possible because of the scope and incentive changes that occur at the time of agencification. In spite of these design limitations, we are confident that the agencified agencies showed productivity gains. We have to rely on indirect evidence to conclude that agencification caused the gains because we do not have a close comparison group during the agencification period. Further, our results do not imply that agencification will produce productivity gains in every bureau. Rather, the results apply most appropriately to bureaus like those agencified in Québec, that is, bureaus with relatively narrow task domains and a measurable primary output.
Second, one can always argue about what the “long run” really is when assessing performance. Certainly, in the long run, we are all dead. In this sense, all life is a Hawthorne effect. The CEs’ perceptions that their autonomy has decreased over time may foreshadow a reversion to the previous status quo performance. Obviously, tracking agency performance over even longer-term time frames would be informative and should be a major focus of future research. Nonetheless, the performance of this set of agencies appears to have improved over a long enough period to support the claim of long-run productivity improvement.
A third caveat is whether agencification is enabling the exercise of monopoly pricing power at the same time that it is generating performance improvement. The fact that aggregate agency revenues grew more quickly than did aggregate primary outputs suggests that this is a legitimate concern. It is not likely, however, to concern political principals, or their central budgetary control agents, who will almost universally regard these particular changes as representing performance improvement. Nonetheless, this issue merits further research. It also suggests the potential value of a “regulatory constitution” that mandates a goal of social welfare maximisation and social marginal cost pricing. The chances of implementing such a constitution are slim, however, because revenue-hungry governments care about more than social welfare.
A fourth caveat is the “black box” nature of the agencification recipe (Pollitt and Dan 2011). In view of the variability in recipes, we do not know which ingredients are critical. We also do not know whether agencification permits or fosters other organisational and human resource changes, such as better employee recruitment, that might have led to better performance over time (Huff 2011). Indeed, we cannot be sure if there are critical ingredients; rather, it may simply be: “somebody is finally paying attention to us!” In other words, is there an extended Hawthorne effect?
A fifth and final caveat relates to the year-to-year inconsistency in the additional customised performance measures reported by the agencies. Only a few agency-designed measures were used continuously. This inconsistency makes it almost impossible for either government performance auditors or external researchers to monitor changes in these kinds of customised measures effectively, including measures of service quality. More attention to the definition and continued use of these performance measures would improve both internal and external accountability.
Acknowledgements
The authors would like to thank Marie-Ève Quenneville, Simon Kelly and Dominique Hamel for outstanding research assistance. Laurin acknowledges financial support from the Centre sur la productivité et la prospérité. Vining thanks the Social Sciences and Humanities Research Council of Canada (SSHRC) for its invaluable financial support.
Appendix 1: Summary description of agencies
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190710070438916-0203:S0143814X14000245:S0143814X14000245_tab4.gif?pub-status=live)