Systematic evaluation of the costs associated with the provision and receipt of ambulatory and home-based health services is an essential component of research aimed at determining which interventions and sites of care are most cost-effective. Despite this need, limited attention has been paid to the methodological issues concerning the measurement of health resource costs. Unlike other data collection instruments in health services research, resource costing tools have not been systematically developed or rigorously evaluated. Furthermore, care recipient and family costs have not been measured with the same diligence that has been applied to clinical factors, such as psychosocial and physiological status.
Although several studies have assessed costs associated with ambulatory and home-based health service utilization (2;6;10;18;23;24;30–35;37;38), the methods used for instrument development have not been published and the measurement properties of only one of these instruments (4) have been empirically evaluated. Because these instruments captured resource use retrospectively and because few have been evaluated comprehensively, the level of accuracy with which participants recalled and recorded various types of data remains unknown. Furthermore, because each questionnaire was designed for one specific healthcare service or one particular population, these instruments cannot be easily adapted and applied to alternative care delivery settings and interventions.
The Ambulatory and Home Care Record (AHCR) (Coyte and Guerriere, 1998) was developed in response to the need for a comprehensive instrument to value resources associated with ambulatory and home-based care. Using a societal perspective (11;16;39), the AHCR assesses three types of costs. Health system costs include consultations with healthcare professionals, laboratory and diagnostic tests, and prescription medications that are publicly financed. Care recipients' out-of-pocket costs consist of personal expenditures for care that is not publicly financed, such as consultations and medications, and related costs such as traveling expenses when seeking health care. Finally, care recipients' and family caregivers' time costs refer to the monetary value of time devoted to receiving and providing care. Because of its comprehensiveness and adaptability for use across an array of healthcare settings with diverse care recipients, the AHCR is a promising tool for measuring health resource costs. As part of the standardization process, an evaluation of its measurement properties is necessary. Accordingly, the purpose of this study was to evaluate the level of agreement between self-reports on the AHCR and administrative data.
METHODS
Participants were recruited from a Cystic Fibrosis (CF) Clinic at St. Michael's Hospital in Toronto, Ontario, Canada, after receiving approval from the hospital's and the University of Toronto's ethics committees. CF is a chronic, inherited disease that is caused by mutations in the CF transmembrane conductance regulator gene, resulting in dysfunction of exocrine secretion, and characterized by excessive, thick mucus secretions that obstruct the lungs and the gastrointestinal system (25). The incidence of CF is approximately 1 in 2,500 live births (3). The population of adults with CF was selected for this study for three reasons. First, the management of CF involves the use of a range of healthcare services on a regular basis, including the following: medications, such as antibiotics, bronchodilators, corticosteroids, anti-inflammatory drugs, and pancreatic enzyme replacements; various therapies, such as chest physiotherapy and oxygen (7;13;28); and frequent clinic visits for consultations with a variety of healthcare providers. Second, because the severity of the condition varies among individuals, there is variability in resource use across care recipients. Finally, because CF patients may experience an exacerbation and require additional therapy, an individual's use of resources also varies over the course of this chronic condition. These characteristics permitted an evaluation of the AHCR with a population in which there was variation in resource utilization both within and between individuals.
Individuals were eligible if they were 18 years of age or older, fluent in English, and not terminally ill or awaiting a transplant. In addition, individuals were eligible if they were Ontario residents with a valid health card number from the Ontario Health Insurance Plan (OHIP). In Ontario, the costs of all medically necessary diagnostic and treatment health services, spanning various care settings such as home, hospital, and community, are paid for by the provincial government under OHIP. Certain groups of care recipients are also eligible for prescription medication coverage. Some care recipients also pay out-of-pocket for appointments with professionals (traditional and alternative medicine consultations), medications, and supplies to supplement publicly insured services.
Starting from the day of recruitment, participants completed the AHCR on a daily basis over a 4-week data collection period and then returned the AHCR by mail, using a prepaid envelope. During the data collection period, research assistants telephoned participants to remind them to complete the AHCR and to answer any questions.
Care recipients' AHCR reports were compared with administrative data for four publicly financed health services: (i) home-based healthcare professional appointments, (ii) healthcare professional appointments outside home (hospital, clinic, office-based), (iii) laboratory and diagnostic tests, and (iv) prescription medications.
Administrative data were extracted from three databases for each participant for his/her 4-week data collection period. Hospital, clinic, and office-based healthcare appointments were obtained from the OHIP detailed claims file, which captures the services rendered by physicians across all sites of care and by licensed medical laboratories in Ontario under the Health Insurance Act. For each care recipient, the number and dates of service use, fee schedule codes for each service date, and physician specialty code were extracted from the OHIP database. Home care service data were obtained from the Ontario Home Care Administrative System (OHCAS) database, which contains data for all publicly funded home-based appointments with nonphysician professionals (nurses, physiotherapists, occupational therapists, and personal support workers). The number and dates for consultations were extracted from the OHCAS database. Finally, data regarding prescription medications were obtained from the hospital pharmacy. All care recipients who attend the hospital-based CF clinic must have their CF prescription medications filled at the hospital's pharmacy to have the total cost of their medications covered by a publicly funded drug insurance program. For each participant, the name of each prescription drug and the date the prescription was filled were extracted. Participants were linked deterministically to each of the three databases by their health card number, using standard methods.
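The linkage step described above can be illustrated with a minimal sketch. It assumes exact matching on the health card number and a 28-day window that begins on the recruitment day; the function name, the record layout, and the half-open window are illustrative assumptions and are not taken from the study's actual procedures.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical record layout: (health card number, service date, service code).
Claim = Tuple[str, date, str]


def link_claims(
    start_dates: Dict[str, date],
    claims: List[Claim],
    window_days: int = 28,
) -> Dict[str, List[Tuple[date, str]]]:
    """Attach administrative claims to participants by exact key match.

    start_dates maps each participant's health card number to the first
    day of his/her data collection period. Claims whose health card
    number has no exact match, or whose service date falls outside the
    window [start, start + window_days), are dropped.
    """
    linked: Dict[str, List[Tuple[date, str]]] = {hcn: [] for hcn in start_dates}
    for hcn, service_date, code in claims:
        start = start_dates.get(hcn)
        if start is None:
            continue  # deterministic linkage: no exact match, no link
        if start <= service_date < start + timedelta(days=window_days):
            linked[hcn].append((service_date, code))
    return linked
```

Participants whose health card number matches no administrative record keep an empty list in this sketch, which makes unmatched cases easy to identify before any agreement analysis.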
The sample size was determined using power contours computed by Donner and Eliasziw (1987) (9) to determine sample size requirements for reliability studies. To test the hypothesis that the intraclass correlation coefficient (ICC) is equal to or greater than 0.75 with 80 percent power at the 0.05 level of significance using two data sources and allowing for an attrition rate of 20 percent and a response rate of 80 percent, the necessary sample size was ninety-four. ICC values of 0.75 or greater are considered to be acceptable levels for health surveys (1;9) and for instruments measuring nonphysical constructs in health services research.
Statistical Analysis
Participants' reports of consultations and tests outside the home were compared with the OHIP database, home-based consultations were compared with data in the OHCAS database, and the medications were compared with the hospital pharmacy database. The approach for analyzing agreement is based on the strategy adopted by Ungar and Coyte (36) and on the statistical procedures presented by Fleiss (14).
The level of agreement between self-reported responses on the AHCR and administrative data was assessed using two agreement indicators: observed agreement (Po) and the kappa statistic (14). Unlike the observed agreement, the kappa statistic expresses the degree of agreement while correcting for agreement expected simply by chance (12). To calculate the agreement indicators, each of the resources reported by participants was categorized by service type. Depending on the frequency of reports for each service, some of the service categories were combined to increase cell sizes. Services were treated as dichotomous (yes/no) variables, with a “yes” representing use of one or more resources within a given category. For these variables, observed agreement and a simple unweighted kappa were calculated to assess agreement (14). Treating the health services categories as ordinal-level (or continuous) variables was not appropriate, because most of the participants reported no more than one use of a given service. In cases where participants reported multiple uses of a health resource, such as medications, the data were treated as ordinal-level and agreement was estimated with a quadratic weighted kappa statistic (14).
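For readers who wish to reproduce the two indicators, the following is a minimal sketch of observed agreement and the simple unweighted kappa for one dichotomous service category. The function name and the example vectors are hypothetical; the data are invented for illustration and are not drawn from the study.

```python
from typing import Sequence, Tuple


def observed_agreement_and_kappa(
    self_report: Sequence[int], admin: Sequence[int]
) -> Tuple[float, float]:
    """Return (Po, kappa) for paired yes/no ratings coded 1/0."""
    if len(self_report) != len(admin) or len(self_report) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(self_report)

    # Observed agreement: proportion of participants on whom both sources agree.
    po = sum(1 for s, a in zip(self_report, admin) if s == a) / n

    # Chance-expected agreement from the marginal "yes" proportions.
    p_self_yes = sum(self_report) / n
    p_admin_yes = sum(admin) / n
    pe = p_self_yes * p_admin_yes + (1 - p_self_yes) * (1 - p_admin_yes)

    if pe == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return po, (po - pe) / (1 - pe)


# Hypothetical data for ten participants (1 = service used, 0 = not used).
diary = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
claims = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
po, kappa = observed_agreement_and_kappa(diary, claims)
print(f"Po = {po:.2f}, kappa = {kappa:.2f}")
```

Kappa is undefined when every participant falls in the same cell of both sources, which is why the sketch raises an error in that case rather than returning a value.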
Because several medications were reported by a small number of participants, the medications were combined into four therapeutic categories, including diabetes therapy, antibiotics & antifungals, gastrointestinal therapy, and pulmonary therapy. To calculate the weighted kappa, the response levels were as follows: (i) 0, 1, 2, 3, 4, and 5 or more prescriptions for the gastrointestinal category; (ii) 0, 1, 2, and 3 or more prescriptions for the antibiotics category; (iii) 0, 1, 2, 3, and 4 prescriptions for the pulmonary therapy category; and (iv) 0, 1, and 2 or more prescriptions for the diabetes therapy category.
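The weighted calculation can be sketched as follows, assuming the standard quadratic disagreement weights w_ij = (i - j)^2 / (k - 1)^2 for k ordered response levels. The helper that caps counts at the top level (for example, "5 or more") and all names are illustrative assumptions; this is not the code used in the study.

```python
from typing import List, Sequence


def cap_count(count: int, top_level: int) -> int:
    """Collapse a prescription count into levels 0..top_level ("top or more")."""
    return min(count, top_level)


def quadratic_weighted_kappa(
    self_report: Sequence[int], admin: Sequence[int], n_levels: int
) -> float:
    """Quadratic weighted kappa for paired ordinal ratings coded 0..n_levels-1."""
    if len(self_report) != len(admin) or len(self_report) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if n_levels < 2:
        raise ValueError("at least two response levels are required")
    n = len(self_report)

    # Joint proportions: rows are self-report levels, columns are database levels.
    observed: List[List[float]] = [[0.0] * n_levels for _ in range(n_levels)]
    for s, a in zip(self_report, admin):
        observed[s][a] += 1 / n
    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2  # disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when all ratings share one level")
    return 1 - weighted_observed / weighted_expected


# Hypothetical antibiotic prescription counts, capped at "3 or more".
diary = [cap_count(c, 3) for c in [0, 1, 2, 5, 1]]
pharmacy = [cap_count(c, 3) for c in [0, 1, 3, 4, 1]]
print(round(quadratic_weighted_kappa(diary, pharmacy, n_levels=4), 2))
```

Because the weights grow with the squared distance between levels, a report that is off by one prescription is penalized far less than a report that is off by several.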
The computed kappa statistics were then classified according to Landis and Koch (19) as “poor to fair” if they were 0.40 or lower, “moderate” if in the range of 0.41–0.60, “substantial” if 0.61–0.80, and “almost perfect” if 0.81–1.00. Confidence intervals around the kappa statistics were calculated using the method of Donner and Eliasziw (1992) (8).
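The classification bands above can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and values that fall between the printed band edges (for example, 0.405) are assigned to the higher band, which is an assumption rather than a rule stated in the text.

```python
def classify_kappa(kappa: float) -> str:
    """Map a kappa value to the agreement labels used in this paper."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    return "poor to fair"


# The simple kappas for medications ranged from 0.55 to 0.64 in this study.
for value in (0.55, 0.64):
    print(value, classify_kappa(value))
```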
RESULTS
Participants
Of the 178 individuals who were identified as being eligible for study participation, 134 (75.3 percent) agreed to participate, and 110 completed the study. Demographic and clinical characteristics of the participants appear in Table 1. The average age of the sample was 31 years, slightly more than half were male, the majority were employed either full-time or part-time, and just under half were either married or in a common-law relationship. There were no statistically significant differences between the participants and those who withdrew or refused to participate with respect to age, proportion of males to females and pulmonary disease severity, as measured by percentage of predicted forced expiratory volume in 1 second. These three parameters indicated that the sample was representative of the population of CF patients in Canada (15).

Home-Based and Ambulatory Health Services Categories
Agreement between the AHCR and administrative data was evaluated for 100 participants, as 10 participants had health card numbers that did not match those in the OHIP database. Participants' reports of health services were grouped into nine categories (Table 2). In this study, reports of home-based visits consisted of consultations with either a nurse or a personal support worker. Because only a small number of participants reported home-based visits, nursing and personal support consultations were combined into a single category to increase the cell size. A clinic visit was defined as a consultation with a respirologist or internist that included diagnostic tests (e.g., pulmonary function testing) or included laboratory services (e.g., biochemistry, microbiology, and hematology). Reports of consultations with a respirologist or internist or reports of diagnostic tests or laboratory services that were not associated with a clinic visit were included in their respective categories and, therefore, not included in the clinic visit category. Ultrasounds, computed tomography (CT) scans, and radiological tests were grouped into the Diagnostic Tests category; blood and sputum tests were grouped together under Laboratory Services.

Table 2 presents the overall proportion of agreement (Po) and the proportion of chance-corrected agreement represented by a simple kappa, by utilization category. For the physiotherapy category, visits were reported with the identical frequency in the AHCR and the OHIP database. For the Clinic and Hospitalization categories, the observed agreement and the kappa were almost perfect. Within five categories (General Practitioner, Physician Specialist, Diagnostic Tests, Laboratory Tests, and Home-based Nursing or Personal Support), the observed agreement and the kappa statistics indicated moderate to substantial agreement. Although the observed agreement was high in the Physician Specialist and Laboratory Tests categories (0.85 and 0.92, respectively), the kappa statistics indicated only moderate agreement.
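The gap between a high observed agreement and a merely moderate kappa arises when a service is rarely used, because most agreement then comes from shared "no" responses that chance alone would largely produce. The worked example below uses hypothetical counts for 100 participants (a = 4 both sources "yes", b = 4 self-report only, c = 4 database only, d = 88 both "no"); these counts are invented for illustration and are not the study's data.

```latex
\[
P_o = \frac{a + d}{n} = \frac{4 + 88}{100} = 0.92
\]
\[
P_e = \frac{(a + b)(a + c) + (c + d)(b + d)}{n^2}
    = \frac{8 \cdot 8 + 92 \cdot 92}{100^2} = 0.8528
\]
\[
\kappa = \frac{P_o - P_e}{1 - P_e}
       = \frac{0.92 - 0.8528}{1 - 0.8528} \approx 0.46
\]
```

With these counts, 92 percent observed agreement corresponds to a kappa of about 0.46, which falls in the moderate band.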
Prescription Medications
All 110 participants were included in the agreement analysis for prescription medications. Participants reported twenty-five different generic medication names. Because several of these names were reported by a relatively small number of participants, medications were collapsed into four broad therapeutic categories: (i) gastrointestinal therapy medications (domperidone, digestive enzymes, ranitidine, omeprazole, and multivitamins), (ii) antibiotic/antifungals (amoxicillin, apo-sulfatrim, cefuroxime, cephalexin, ciprofloxacin, cloxacillin, tobramycin, and nystatin), (iii) pulmonary therapy medications (beclomethasone, budesonide, fenoterol, sodium cromoglycate, ipratropium, prednisone, salbutamol), and (iv) diabetes therapy (all types of insulin).
Table 3 presents the overall proportion of agreement (Po) and the proportion of chance-corrected agreement, measured using a simple kappa. The observed agreement was very high (0.95) for the diabetes category and high (0.76 to 0.78) for the remaining three categories. For the Diabetes Therapy and Antibiotics/Antifungals categories, the simple kappas indicated substantial agreement between the AHCR and the pharmacy database. Observed and chance-corrected agreement were lower for Gastrointestinal Therapy and Pulmonary Therapy.

In Table 4, the quadratic weighted kappa indicated substantial agreement for the Antibiotics/Antifungals category and moderate agreement for the three remaining categories. The weighted and simple kappa values are almost identical, suggesting that an individual who is able to recall having obtained a prescription is equally able to recall the number of prescriptions filled.

DISCUSSION
The development and assessment of the AHCR was motivated by the need for a comprehensive, standardized resource cost measure. The level of agreement between self-reports on the AHCR and administrative data ranged from moderate to perfect for health services categories and moderate to substantial for prescription medications. The range in agreement for the health services categories and for the medication categories indicates that participants' recall differs for various types of health services.
The Physiotherapy category had perfect agreement between self-reports and OHIP. In another recently conducted study, agreement between self-reports of physiotherapy by adults with low-back pain and medical records was assessed. Unlike our study, kappa statistics were calculated for specific physiotherapy interventions (e.g., massage) (17). The lower kappa values reported in that study (ranging from 0.01 to 0.34) may be due to the demand for a more detailed recall of the specific interventions compared with the present study.
The agreement between administrative data and self-reports of general practitioner visits and specialist visits ranged from moderate to substantial in our study. Ungar and Coyte reported kappa values indicating moderate agreement (0.43) and substantial agreement (0.80), respectively, for general/family practitioner and specialist visits for adults with asthma (36). Although utilization was recalled over a 3-month period in their study, CF participants attend appointments with a greater variety of specialists, which may have diminished recall for all specialist visits, thus decreasing the degree of agreement. In another study, Roberts et al. (29) reported agreement between medical records and adult men's self-reported utilization of all types of physician visits within the previous year (weighted kappa=0.56) and in the previous 2 weeks (weighted kappa=0.56).
Almost perfect agreement was found in our study between self-reports of hospital admissions and OHIP data. Ungar and Coyte reported perfect agreement for respiratory patients' self-reports of hospital admission in the prior 12 months and substantial agreement for admissions over the 3-month study period (36). In a New Zealand study of cardiac patients' recall of hospital admissions over a 3-year period, 16 percent of the sample did not report an admission that had been reported in a national database (26). However, comparison is complicated by the large difference in the recall periods of the two studies.
Two other studies evaluated the level of agreement between self-reports and clinical records for Laboratory tests (4;17). Browne et al. (4;5) reported kappa statistics for specific laboratory services (e.g., blood test) ranging from 0.48 (moderate) to 0.89 (almost perfect) for adults with chronic illnesses. The moderate agreement observed in our study for reports of a group of laboratory services falls within the range reported by Browne and colleagues. In our study, we observed substantial agreement between care recipients' reports of diagnostic tests and OHIP data. Guzman and colleagues reported similar kappa values for agreement between prospective self-reports and medical records for laboratory tests (0.55), X-rays (0.79), and CT scans (0.85) for adults with low-back pain (17).
In our study, it was expected that participants' reports would display greater accuracy than would be observed with retrospective recall. However, reporting in our study (daily diary) may have been less accurate than that obtained from interviews, as the accuracy of participants' reports may be enhanced by interviewers' prompts and explanations. Unlike our study in which data were collected prospectively, most studies have assessed the level of agreement between participants' retrospective reports and administrative data. Unlike other questionnaires used to measure resource utilization, the AHCR requires the respondent to write in the type of service used, rather than checking off services used from a preset list of options.
Although several other studies have evaluated the level of agreement between administrative data and self-reports, none have evaluated the accuracy of self-reports of home-based healthcare appointments, and only two analyzed agreement for the purpose of evaluating a data collection instrument (4;17). Both of these studies reported results that supported the instrument; however, one study examined only laboratory tests (4), and the other study examined a disease-specific instrument designed to assess resource utilization for individuals with lower back pain (17).
For several health services categories, the OHIP or OHCAS database indicated that a small number of participants received a service that was not reported on the AHCR. This discrepancy may be attributable to how the study was described to participants. Although study participants were asked to report use of all healthcare resources, they knew that the study focused on CF care. Therefore, some may not have reported utilization that was perceived to be unrelated to CF. For example, physician specialist appointments with dermatologists and gynecologists and diagnostic tests, such as electrocardiograms and mammograms that were found in the databases, were not reported in the AHCR.
Although kappa statistics indicated moderate to perfect agreement, it should be noted that laboratory tests and physician specialist visits were reported with less accuracy than the other health services categories. Given that the AHCR is used to estimate resource costs, it is important to consider how the underreporting or overreporting of particular resources may affect the calculation of overall costs. In this study, laboratory tests and physician specialist visits were underreported by participants. Although laboratory tests are relatively inexpensive compared with other resources (e.g., hospitalizations), underreporting of physician specialist visits may have an effect on the calculation of overall costs in a particular study. The underreporting of these two resources should be acknowledged when administering the AHCR in future studies. As discussed above, this underreporting may be minimized by explaining to respondents which resources they need to report.
In our study, the level of agreement between reports on the AHCR and the pharmacy database was 76 percent or greater for each medication category. Another study compared patients' reports of the name, dose, and directions for use of current medications, obtained in an in-home interview, with physicians' notes (73 percent agreement) and pharmacy databases (73 percent agreement) (22). Another study assessed the agreement between elderly patients' home-visit reports of the number of medications used (16 medication categories) and reports obtained by a mailed questionnaire and a telephone interview. Percentage agreement ranged from 50 to 100 percent for mailed questionnaires and from 0 to 100 percent for telephone interview reports (20).
In our study, we reported simple kappa values that ranged from 0.55 to 0.64. Three other studies reported simple kappa statistics for the level of agreement between reports and administrative data (17;21;27). Simple kappas of 0.89 and 0.86 were reported for agreement between patients' recall of past and current zidovudine (AZT) use, respectively, for acquired immunodeficiency syndrome or human immunodeficiency virus-related illness and pharmacy databases (21). Another study assessing the agreement between elderly women's recall of medication use over the previous 10 years and pharmacy databases, reported kappa values that ranged from 0.30 for steroids to 0.66 for antihypertensives (27). Finally, the level of agreement between patients' reports of medications for lower back pain control and medical records ranged from 0.02 for topical analgesics to 0.46 for narcotics (17). We are unable to compare our weighted kappa statistics with these studies, as none assessed respondents' recall of the actual number of prescriptions.
Although this study supports the use of the AHCR in health services research, there are limitations. Only publicly financed services were assessed; the remaining components of the AHCR were not assessed, because out-of-pocket and time costs can be obtained only from families, and assessing the accuracy of these reports was not possible. However, this study is the first to assess the accuracy of prospective reports for several health resource categories in a standardized data collection instrument. In addition, because of an insufficient number of reports, we were unable to estimate a kappa statistic for emergency department visits. Another limitation of this study is the selection of the sample. Because this study was conducted with a sample from one CF clinic, the results may not be generalizable to CF patients in other regions or to other disease populations. However, this CF clinic serves approximately 70 percent of the adult CF population in Ontario, and it is not expected that the Ontario adult CF population would record this information with a different level of accuracy than care recipients from other locations. Participants may have reported resource use with more diligence knowing that their responses would be compared with claims data. Finally, although participants were asked to complete the AHCR on a daily basis for the 4-week data collection period, we could not rule out the possibility of retrospective reporting. However, such reporting would tend to lower the accuracy of self-reports.
CONCLUSION
Overall, the results of this study indicate that there was good agreement between participants' reports and administrative data for three health service utilization categories on the AHCR: (i) home-based healthcare professional visits; (ii) ambulatory healthcare visits, including laboratory and diagnostic tests; and (iii) prescription medications. The results of this study provide support for the use of the AHCR to measure resource use in health services research for diverse patient populations.
The AHCR was developed in response to the need for a standardized, comprehensive instrument to obtain and value resources associated with ambulatory and home-based healthcare programs. When used in conjunction with health-related quality of life or other outcome measures, a standardized instrument to systematically and comprehensively assess costs would improve cost-utility and cost-effectiveness analyses and would permit comparisons of the resource implications of different services in different healthcare settings.
CONTACT INFORMATION
Denise N. Guerriere, PhD (denise.guerriere@utoronto.ca), Assistant Professor, Department of Health Policy, Management and Evaluation, University of Toronto, 155 College Street, Suite 425, Toronto, Ontario M5T 3M6, Canada
Wendy J. Ungar, PhD (wndy.ungar@sickkids.ca), Assistant Professor, Department of Health Policy, Management and Evaluation, University of Toronto, 155 College Street, Suite 425, Toronto, Ontario M5T 3M6, Canada; Scientist, Population Health Sciences, The Hospital for Sick Children Research Institute, 555 University Avenue, Toronto, Ontario M5G 1X8, Canada
Mary Corey, PhD (mary.cory@sickkids.ca), Professor, Department of Public Health Sciences, University of Toronto, 155 College Street, Suite 425, Toronto, Ontario M5T 3M7, Canada; Senior Scientist, Population Health Sciences, The Hospital for Sick Children, 555 University Avenue, Toronto, Ontario M5G 1X8, Canada
Ruth Croxford, MSc (RuthC@ices.on.ca), Research Coordinator/Biostatistician, Clinical Epidemiology Unit, Sunnybrook and Women's College Health Sciences Centre, G106, 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada
Jennifer E. Tranmer, MSc (jenn.tranmer@uontario.ca), Department of Health Policy, Management and Evaluation, University of Toronto, 155 College Street, Suite 425, Toronto, Ontario M5T 3M6, Canada
Elizabeth Tullis, MD (tullise@smh.toronto.on.ca), Associate Professor, Department of Medicine, University of Toronto, 1 King's College Circle, Toronto, Ontario M5S 1A8, Canada; Head, Division of Respirology, St. Michael's Hospital, 30 Bond Street, Suite 6-049, Toronto, Ontario M5B 1W8, Canada
Peter C. Coyte, PhD (pter.coyte@utoronto.ca), Professor of Health Economics, Department of Health Policy, Management and Evaluation, University of Toronto, 155 College Street, Suite 425, Toronto, Ontario M5T 3M6, Canada
The authors wish to acknowledge funding from the Canadian Institutes of Health Research (Grant number: 37883). In addition, Dr. Ungar is supported by a Canadian Institutes of Health Research New Investigator Award. The Institute for Clinical Evaluative Sciences is funded by the Ontario Ministry of Health and Long-Term Care. The opinions, results, and conclusions are those of the authors, and no endorsement by the Ministry of Health and Long-Term Care or by the Institute for Clinical Evaluative Sciences is intended or should be inferred.