Published online by Cambridge University Press: 26 April 2005
Objectives: This study was undertaken to appraise the quality of published pediatric economic evaluations.
Methods: Two independent reviewers appraised 149 randomly selected pediatric health economic studies. Data were collected from full economic evaluations published between 1980 and 1999. Economic evaluations of interventions, programs, and services aimed at neonates to adolescents were included. The Pediatric Quality Appraisal Questionnaire (PQAQ) was used for appraisal. The PQAQ is a 57-item instrument with 13 domains scored from 0 to 1 and one descriptive domain, each corresponding to a key aspect of health economic methodology. The primary outcome was the score for each domain. Additional analyses examined the global rating, the distribution of analytic technique, and the association between domain score and analytic technique.
Results: A total of 38 percent of publications were very good to excellent, whereas 43 percent were fair or worse. Although the Discounting, Target Population, Economic Evaluation, Conclusions, and Comparators domains exhibited good quality (0.74 to 0.78), the papers were of poor quality for Conflict of Interest, Incremental Analysis, and Perspective (0.32 to 0.39). Analytic technique was a significant predictor of quality for study design-related domains, with cost-utility analyses demonstrating the highest domain scores.
Conclusions: Domains closely related to the elements of economic evaluation demonstrated medium to high quality. However, domains related to analysis fared poorly and are worthy of further methodological research to improve the use of health economic methods in children.
The conduct of economic evaluations of health interventions has grown dramatically in recent years, with the hope that the findings may improve health-care budget allocation decisions in a climate characterized by increasing budget constraints. To serve this need, standard methods for the conduct of economic evaluations have evolved in the past decade (7;8;18;25;27). However, the limitations of standard health economic methods become apparent in the study of special populations. Although health-care providers may not be as well versed in health economic methodology as health services researchers and health economists, those health practitioners serving children have long been aware of the uniqueness of the pediatric population. Child health differs from adult health in important ways. These differences include the physiologic vulnerability of children as they grow and develop; their reliance on parents, teachers, and others to provide access to health care; and the ways in which they manifest disease and interact with the health-care system and their immediate environment (6;15;19;21;26). These characteristics have important implications for the measurement of costs and consequences in the pediatric population, including the use of parents as proxies to report the child's well-being and health-care resource use; the assessment of time costs related to work absences by parents and caregivers; the measurement of future work impairment and productivity costs of children; the inclusion of health-related resource use occurring outside the health-care system, such as in schools and in the community; the evaluation of children's quality of life; and the choice of a family or parent–child unit of analysis. Measuring utility, determining quality-adjusted life years, and assessing willingness-to-pay are also distinct measurement challenges in a pediatric population.
Recognizing these measurement issues, a research program in pediatric health economic methods was conceived. In the first phase of the program, a comprehensive database of all pediatric economic evaluations published between 1980 and 1999 was created (30;31;33). This database is available on-line at http://pede.bioinfo.sickkids.on.ca/pede. With this database available, an important objective was to evaluate the quality of published pediatric economic evaluations to reveal the methodological limitations of existing approaches to economic evaluation and to point a direction for future methods research in this population. In the second phase of the project, existing economic quality appraisal methods and instruments were reviewed (1;3;4;11–14;17;20;22;23;28;29). Whereas there was some consistency among existing quality appraisal instruments with respect to the elements included, most of the tools remained subjective, with little information provided regarding validity and reliability. A significant limitation of these instruments is that none focused on the pediatric population. Thus, the Pediatric Quality Appraisal Questionnaire (PQAQ), a 57-item pediatric-specific quality appraisal instrument, was developed through a systematic process of item selection, validity and reliability assessment, and pretesting (32). This study reports the third phase of the project: the quality appraisal of the pediatric health economic literature using the PQAQ.
A comprehensive pediatric health economic database of 787 publications published in the medical and gray literature during the 20-year period from January 1, 1980, to December 31, 1999, was established. Only full economic evaluations were included, where a comparator existed (or was suggested, such as a “do nothing” or before–after design) and descriptions of both costs and health consequences were present. The economic evaluation did not have to be the primary objective of the study to be eligible. The database includes key characteristics for each citation, including publication year, target population, ICD-9-CM disease class (16), age group, experimental intervention, intervention category, health outcomes, and analytical technique, and is linked to a bibliographic database containing the full citation information and abstract. The details of database development are described elsewhere (30;31).
To conduct the quality appraisal, a draft instrument was developed and subjected to review by a panel of experts. The instrument was pilot tested by three independent appraisers, and the final PQAQ underwent test–retest and inter-rater reliability assessment. In the final 57-item version, items are grouped into 14 domains representing the common methodological themes of economic evaluation. These domains are labeled as Economic Evaluation (questions referring to specification of the research question and analytic technique), Comparators, Target Population, Time Horizon, Perspective, Costs and Resource Use, Outcomes (questions regarding the measurement and reporting of effectiveness), Quality of Life, Analysis (questions referring to the valuation and aggregation of costs), Discounting, Incremental Analysis, Sensitivity Analysis, Conflict of Interest, and Conclusions. The PQAQ contains nine items unique to the pediatric population. In the Costs and Resource Use domain, three questions address the measurement of productivity costs of parents and caregivers, the inclusion of cost items outside the health-care system, such as school and community resources, and the measurement of future productivity costs of children with chronic diseases or disabilities. The Target Population domain contains an item assessing whether the study sample is representative of the ultimate target population, as many pediatric interventions are based on adult data. The Outcomes domain includes a question about the recording of school day absences, a simple and common measure of functioning in children. Two questions in the Quality of Life domain address whose quality of life is being measured and whether quality of life was assessed directly or by proxy. In the Analysis domain, one item addresses the use of proxy measures during data collection and a second item addresses the unit of analysis, for example, child, child–parent, or family.
The number of items per PQAQ domain varies from 2 to 10, with an average of 4. However, not all items were relevant for all publications; for example, for studies with time horizons of less than 1 year, the items pertaining to Discounting were not applicable and this domain was not included. Thirteen domains are scored from 0 to 1, with 1 representing a perfect score. The Quality of Life domain, which contains only descriptive questions, is not scored. Although each domain is scored, there is no single summary score for the entire instrument. A global assessment is included as the final questionnaire item. Additional details of the development and testing of the PQAQ are available elsewhere (32).
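The exact scoring rule is not spelled out here; a minimal sketch of one plausible implementation, assuming each domain score is the mean of its applicable item scores (with non-applicable items excluded from the denominator), is shown below.

```python
from typing import Optional

def domain_score(item_scores: list[Optional[float]]) -> Optional[float]:
    """Compute a 0-to-1 domain score as the mean of applicable item scores.

    Items marked None are treated as not applicable and excluded, mirroring the
    handling of Discounting for studies with time horizons under 1 year. This
    scoring rule is an assumption for illustration, not the published formula.
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        return None  # the whole domain is not applicable (e.g., Discounting)
    return sum(applicable) / len(applicable)

# Example: a 4-item domain with one non-applicable item
print(domain_score([1.0, 0.5, None, 0.0]))  # 0.5
```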
A random sample of 150 papers, stratified by 5-year period, was selected from the publication database. This sample size was chosen to provide adequate statistical power for the inter-rater and test–retest reliability assessment of the PQAQ (32) and for a comparison of mean PQAQ domain scores between groups of papers classified by analytic type. For the intergroup comparisons of PQAQ domain scores, seventeen papers per group were required to detect a minimally important score difference of 0.25 at alpha equal to 0.05 and beta equal to 0.20. The journal names, authors, and years were concealed. Appraisals were completed for 149 articles by two independent reviewers. Adjudication was conducted by the Principal Investigator. Data from the completed questionnaires were entered into an ACCESS 97 database.
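The requirement of seventeen papers per group is consistent with a standard two-sample power calculation. The sketch below, assuming a common standard deviation of 0.25 (so that the minimally important difference of 0.25 corresponds to a standardized effect size of 1.0), reproduces a figure of approximately 17 per group; the assumed standard deviation is not stated in the paper.

```python
# Hedged reconstruction of the sample-size calculation; the standard deviation
# of 0.25 is an assumption made for illustration, not a value from the paper.
from statsmodels.stats.power import TTestIndPower

mid = 0.25          # minimally important difference in domain score
assumed_sd = 0.25   # assumed common standard deviation (not reported)
effect_size = mid / assumed_sd

n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                           alpha=0.05, power=0.80)
print(round(n_per_group))  # ~17
```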
A series of statistical tests were applied to the final data set using SAS release 8.02 (24). Descriptive statistics were calculated for each domain score. Further analyses were undertaken on seven key domains considered to be highly relevant and most often discussed in published quality appraisals: Comparators, Perspective, Costs and Resource Use, Outcomes, Analysis, Incremental Analysis, and Sensitivity Analysis. To uncover correlations between domains, a nonparametric Spearman correlation coefficient was calculated between all individual domain scores. To test for changes in domain quality scores over time, Spearman correlation coefficients were calculated between key domain scores and 5-year interval (1980–1984, 1985–1989, 1990–1994, and 1995–1999) and between the global score and 5-year period.
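The original analyses were run in SAS; the following Python sketch illustrates the same type of Spearman calculation on hypothetical domain scores and 5-year periods (the values shown are not study data).

```python
# Illustrative Spearman correlations on hypothetical scores; not study data.
from scipy.stats import spearmanr

analysis = [0.20, 0.45, 0.50, 0.70, 0.90, 0.30]      # Analysis domain scores
sensitivity = [0.10, 0.50, 0.40, 0.80, 1.00, 0.20]   # Sensitivity Analysis scores
period = [1, 1, 2, 3, 4, 2]                          # 1 = 1980-1984, ..., 4 = 1995-1999

rho, p = spearmanr(analysis, sensitivity)            # between-domain correlation
print(f"Analysis vs. Sensitivity Analysis: r = {rho:.2f}, p = {p:.3f}")

rho_t, p_t = spearmanr(analysis, period)             # change in quality over time
print(f"Analysis vs. 5-year period: r = {rho_t:.2f}, p = {p_t:.3f}")
```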
One-way frequency tests were performed on the global impression score (item #57), and Spearman correlation coefficients were determined between the global score and key domain scores. Analysis of variance (ANOVA) was performed to determine whether observed variance in the key domain scores could be explained by the analytic technique, where analytic technique was specified as cost-effectiveness analysis (CEA), cost-benefit analysis (CBA), cost-utility analysis (CUA), or cost-minimization analysis (CMA). Mean domain scores were compared between analytic technique groups. Duncan's multiple-range test for post hoc pair-wise comparisons was used to assess the statistical significance of differences between mean domain scores for one analytic group versus another. Finally, one-way frequency distributions were performed on descriptive items that were not scored within the domains. A p value < .05 was considered statistically significant for all statistical calculations.
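A minimal Python sketch of this step is given below, using hypothetical domain scores grouped by analytic technique; because Duncan's multiple-range test is not readily available in the common Python statistics libraries, Tukey's HSD is substituted here purely for illustration.

```python
# Hypothetical domain scores grouped by analytic technique; illustration only.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

cea = [0.40, 0.50, 0.30, 0.45, 0.50]
cba = [0.60, 0.70, 0.65, 0.55, 0.70]
cua = [0.80, 0.85, 0.90, 0.75, 0.80]
cma = [0.35, 0.40, 0.30, 0.45, 0.38]

# One-way ANOVA: does analytic technique explain variance in the domain score?
f_stat, p_value = f_oneway(cea, cba, cua, cma)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Post hoc pair-wise comparisons (Tukey HSD as a stand-in for Duncan's test)
scores = np.concatenate([cea, cba, cua, cma])
groups = ["CEA"] * len(cea) + ["CBA"] * len(cba) + ["CUA"] * len(cua) + ["CMA"] * len(cma)
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```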
The quality appraisal analysis was conducted on a random sample of papers, stratified by 5-year period. An assessment of the citation characteristics revealed that the sample was representative of the full data set with respect to age group studied, analytic technique, category of intervention studied, ICD-9-CM disease class, and journal type. The quality of the 149 sampled publications of pediatric economic evaluations was determined by examining the domain scores, which varied from 0 to 1, and the global rating. The mean scores and standard deviations for the domains in the PQAQ are presented in Figure 1.
Quality of pediatric health economic evaluations by Pediatric Quality Appraisal Questionnaire domain. The bars represent the mean score for each domain, and the error bars represent the standard deviation.
All domains except Discounting were represented by at least one item in all of the publications. The Discounting domain was relevant for seventy-three publications. The sampled papers demonstrated good quality for Discounting, Target Population, Economic Evaluation, Conclusions, and Comparators. The articles were of poor quality with respect to Conflict of Interest, Incremental Analysis, and Perspective. The remaining domains displayed intermediate quality scores.
Figure 2 displays the global rating frequency distribution. Just over one third (38 percent) were rated as very good to excellent, whereas 43 percent were rated as fair or worse. To determine whether quality improved over time, Spearman correlation coefficients were computed between domain scores and 5-year period. Most domain scores showed no improvement over time. Weak positive correlations were observed between 5-year period and Costs and Resource Use (r = 0.16, p = .05), Outcomes (r = 0.18, p = .03), and Conclusions (r = 0.23, p = .005).
Global rating of pediatric health economic evaluations.
Table 1 presents the Spearman correlation coefficients between the global rating and each domain score. Statistically significant correlations between the global rating and domain scores were observed for all domains. Strong correlations were observed for Analysis, Sensitivity Analysis, Incremental Analysis, and Costs and Resource Use.
Analyses were undertaken to examine the correlations between the scores of key domains and all other domains. The strongest correlations were observed between Analysis and the other domains (Table 2). The strongest correlations with Analysis were found with Sensitivity Analysis, Costs and Resource Use, Incremental Analysis, and Economic Evaluation.
The analytic technique is the most definitive characteristic of an economic evaluation. Within the 149 appraised articles, there were 168 economic analyses, as some articles reported more than one analytic technique. As seen in Figure 3, CEA and CBA composed two thirds of analyses (66 percent) according to the authors' specifications. A high proportion of papers (42 percent) was designated as “unknown” or “other” by the appraiser because the authors did not specify a conventional analytic technique. Indeed, some of the reports were published before currently conventional analytic techniques were widely accepted. Given the extent of nonspecification and the risk of misclassification, all of the publications were reassigned a single analytic technique based on a careful reading of the methods, rather than on the authors' own designation. Where more than one analytic technique was mentioned, the technique considered most prominent in the analysis or most closely related to the research question was designated as primary. Upon examination of the methods used, the majority of unclassified papers expressed outcomes in natural units and, therefore, were designated as CEAs. Similarly, some of the CBAs used natural units as an outcome measure rather than dollars, for example, cases prevented, and were reclassified as CEAs.
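The reclassification described above can be read as a simple decision rule based on how outcomes were expressed. The sketch below is a hypothetical encoding of that rule for illustration only; the function name and categories are assumptions, not the authors' actual procedure.

```python
def classify_analysis(outcome_unit: str, costs_only_differ: bool = False) -> str:
    """Assign a single analytic technique from how outcomes are expressed.

    Hypothetical decision rule mirroring the reclassification described in the
    text: outcomes assumed equivalent so that only costs are compared -> CMA;
    monetary outcomes -> CBA; utilities/QALYs -> CUA; natural units
    (e.g., cases prevented) -> CEA.
    """
    if costs_only_differ:
        return "CMA"
    unit = outcome_unit.lower()
    if unit in {"dollars", "monetary"}:
        return "CBA"
    if unit in {"qaly", "utility"}:
        return "CUA"
    return "CEA"  # natural units such as cases prevented or life-years

# A "CBA" reporting cases prevented rather than dollars is reclassified as a CEA
print(classify_analysis("cases prevented"))  # CEA
```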
Analytic technique used in the health economic analysis. CEA, cost-effectiveness analysis; CBA, cost-benefit analysis; CUA, cost-utility analysis; CMA, cost-minimization analysis.
An analysis of variance compared the mean domain scores by analytic technique. Table 3 presents the mean domain scores for the seven domains that had a statistically significant overall p value for the ANOVA. Despite the small group sizes for CUA and CMA, statistically significant differences were found between analytic technique groups for each of the domains because observed differences in mean scores were large, often exceeding the minimal important difference of 0.25. Analytic technique was a significant predictor of domain score for most of the study design-related domains. Except for Time Horizon and Perspective, studies designated as CUAs demonstrated the highest domain scores and were significantly higher than those of CEAs and most CMAs. Cost-benefit analyses demonstrated the highest scores for Time Horizon and Perspective. In addition to these domains, CBAs were superior to CEAs for Discounting and Sensitivity Analysis, demonstrating large differences in mean scores.
In addition to the questionnaire items that were scored within the domains, several questions were included to describe and further characterize each publication, including several descriptive questions regarding costing methods. The analysis revealed that most studies used multiple costing sources, including charges, fees, market prices, and average costs. Expert opinion or assumptions were used in 27 percent of direct cost valuations. The majority of publications (79 percent) did not include productivity (indirect) costs, even though parental time loss is an important cost item for most pediatric interventions. Most articles (52 percent) failed to perform any uncertainty analysis; among those that did, one-way sensitivity analysis was used almost exclusively (Table 4).
If one were to view the domain scores of the PQAQ as a “report card” on the quality of pediatric economic evaluations, then one must conclude that the pediatric health economics literature deserves no better than a C grade. Whereas some domains might be worthy of a B, notably Discounting, Target Population, Economic Evaluation, and Comparators, the remaining domains exhibited failing grades. The performance of the domain scores was reflected in the global rating, with 44 percent of publications rated no better than “good.” The global rating was most highly correlated with domains related to analysis, suggesting that analytic issues may be a prime consideration when appraising the overall quality of the report.
The greatest deficiencies were failures to state and present the perspective, to measure and evaluate costs and outcomes appropriately, to conduct incremental and sensitivity analyses, and to declare the relationship to the study sponsor. In addition, parental time costs were usually excluded, and a reliance on expert opinion in lieu of data was evident. Whereas some of these failings can be attributed to a lack of rigor in applying general health economic methods, they also relate to the unique challenges of conducting economic evaluations in children. With an understanding of these deficiencies, methodologists must work together with clinicians to address them. Methods research is needed in this patient population to examine the use of parents as proxies to communicate a child's health status; the assessment of time costs related to work absences by parents and caregivers; the measurement of future productivity costs of children as a function of level of disability and future prognosis; the inclusion of health-related resource use occurring outside the health-care system, such as in schools and in the community; the evaluation of children's quality of life; and the measurement of economic outcomes, including utility and willingness-to-pay. Such research would benefit health services researchers and economists, who continue to expand and improve their methodological toolbox, as well as health-care practitioners, who would gain insight into the economic consequences of disease and of treatment strategies.
Several investigators have examined the scope or quality of economic evaluations in the medical literature, without focusing on a particular patient population or age group (1–5;9–14;17;20;22;23;28;29;34). Problems cited in previously published appraisals included the omission of comprehensive costs and sensitivity analysis (12); improper allocation of overhead costs, absence of sensitivity analysis, and lack of a summary of treatment costs and consequences as a net benefit or incremental ratio (1); failure to state and test assumptions with sensitivity analysis (29); lack of identification of relevant costs and consequences, lack of discounting, failure to perform an incremental analysis, failure to conduct sensitivity analysis, and inappropriate use of the term “cost-effectiveness analysis” (20); inadequate handling of ethical issues and study perspective (4); suboptimal adherence to methodologic standards and lack of improvement over time (3;22); inadequate assessment of uncertainty (28); and insufficient use of QALYs and life-years gained (2).
The poor performance observed in the appraisals of the adult literature was echoed in the quality appraisal of the pediatric health economic literature. The implication of poorly conducted health economic studies is that incomplete or unreliable information is used by decision-makers resulting in less than optimal resource allocation.
A particular challenge in the conduct of this investigation was the classification of papers by analytic technique, with 43 percent of analyses remaining without an explicit or correct analytic designation. Many unclassified papers and so-called CBAs that examined the incremental cost of cases of disease prevented were relabeled as CEAs. After reclassification, 15 percent of publications remained true CBAs. This finding contradicts the commonly held perception that CBAs are rare in health economics due to a reluctance to monetize health outcomes. Forty-eight percent of CBAs were for preventive interventions, and another 34 percent were studies of detection/diagnostic interventions. Whereas adult medical care emphasizes medical treatment for disease, child health is characterized by a low incidence of manifest disease, with greater importance placed on the prevention of pediatric illness and of illness occurring in later years.
Despite their small numbers, CUAs demonstrated significantly higher quality than the other types of analyses, with very high domain scores observed for Economic Evaluation, Discounting, and Sensitivity Analysis. The true CBAs also showed superior quality to the CEAs. Misuse of the terms cost-benefit analysis and cost-effectiveness analysis in the medical literature continues to hamper the conduct of literature searches for the purpose of methodological research (35). Although the effect of journal type on domain score was not measured directly, it was found that CEAs tended to be concentrated in pediatrics/perinatal medicine journals, whereas the higher quality CUAs and CBAs were more evenly distributed across journals of pediatrics/perinatal medicine, subspecialty medicine, public health, and general medicine. Only 5 percent of all economic evaluations were published in Health Economics/Health Policy/Methods journals. It is important that high-quality studies find their target audiences, be they health providers, decision-makers, researchers, or families. The field of health economics continues to grow, and in an era of increasing financial constraint in the health-care system, the need for high quality investigations becomes more acute, particularly in vulnerable populations such as children.
Over the past decade, standard methods have been developed for the economic evaluation of health interventions. As the field of health economics evolves and as economic evaluation activity increases, the limitations of these methods become revealed. The PQAQ is a valid and reliable pediatric-specific quality appraisal instrument. The use of this instrument to appraise the quality of pediatric health economic evaluations revealed deficits in the analytic aspects of pediatric economic evaluation. Methodologists and clinicians must work together to address these deficiencies and to further investigate the factors that contribute to study quality.
The quality appraisal described above is a necessary first step in identifying methodological gaps in pediatric health economic evaluations. As with the adult population, poorly conducted health economic studies will be inadequate for informing decision-makers, resulting in less than optimal resource allocation decisions by purchasers of health-care services and interventions, including government agencies, insurers, institutions, and providers. Improving the quality of health economic assessments in children is a necessary prerequisite to improving the quality of health care in this population.
Wendy J. Ungar, MSc, PhD (wendy.ungar@sickkids.ca), Assistant Professor, Department of Health Policy, Management and Evaluation, University of Toronto, McMurrich Building, 2nd Floor, 12 Queen's Park Crescent West, Toronto, Ontario M5S 1A8, Canada; Scientist, Population Health Sciences, The Hospital for Sick Children, 555 University Avenue, Toronto, Ontario M5G 1X8, Canada

Maria Santos, MHSc (maria_santos@gov.nt.ca), Research Assistant, Division of Population Health Sciences, The Hospital for Sick Children, 555 University Avenue, Toronto, Ontario M5G 1X8, Canada
This research was supported by funding from The Canadian Coordinating Office for Health Technology Assessment and The Hospital for Sick Children Research Institute, and by in-kind support from The Institute of Health Economics. The researchers are grateful to Ms. Amy Lee, The Hospital for Sick Children, and Ms. Tania Stafinski, The Institute of Health Economics. We thank Ms. Renee Henderson and Ms. Mahdie Seyed for their technical assistance.
Correlation of Pediatric Quality Appraisal Questionnaire (PQAQ) Domain Scores with Global Rating
Correlation of Pediatric Quality Appraisal Questionnaire (PQAQ) Domain Scores with Analysis Domain Score
Pediatric Quality Appraisal Questionnaire (PQAQ) Domain Score by Health Economic Analytic Technique
Methods Used to Assess Uncertainty in the Data