Administrative and surveillance data are used to inform many different types of healthcare epidemiology and antimicrobial stewardship (HE&AS) research, from rapid cycle process improvements (eg, internal to a specific healthcare setting to study quality and safety), to single-center observational studies, to large multicenter studies using data sources that are shared across institutions or at a regional or national level. An important characteristic of administrative and surveillance data is the ability to leverage existing data sources as opportunities for learning and continuous improvement [1].
Administrative data are traditionally defined as data collected for coding and billing. A broader discussion is warranted for studies related to HE&AS that includes any information collected primarily for nonresearch purposes, such as clinical or electronic health record (EHR) data, pharmacy data, billing/coding information, and other surveillance or registry data. Because these data are often already collected or are being collected in an ongoing fashion, such studies can often be conducted efficiently. National or regional level surveillance data, such as data collected by the Centers for Disease Control and Prevention (CDC) and partners through the National Healthcare Safety Network (NHSN), or other healthcare-associated infection (HAI) reporting that occurs because of federal or state mandate, are also available for use in research studies. Data collected through NHSN, for example, are collected prospectively and are intended for use in HE&AS studies, which can be research or nonresearch in nature. Because these data are intended for wider use, the methodology, validation, and auditing documentation can provide assurance about data quality that is not available for most administrative data sources.
Administrative and surveillance data may also be used for clinical and quality improvement, to study utilization (eg, antimicrobials or laboratory tests), for comparative effectiveness research (eg, comparing different infection control or stewardship strategies), for monitoring for safety measures and outcomes, to study costs related to interventions or outcomes, or to study the effects of changes in health policies. Because they are available from most healthcare institutions, administrative data provide an opportunity for researchers to study “real-world” practices, in contrast to controlled research studies, and to include patients who otherwise may be excluded, thus potentially increasing the generalizability of the findings. In this review, we focus on the benefits and challenges of using administrative and surveillance data to study HE&AS.
ADVANTAGES AND DISADVANTAGES
The advantages and disadvantages of using administrative data for research vary by data source, but some common themes are evident (Table 1). Administrative data are widely available and are often previously collected; thus, they increase efficiency. Meaningful use has increased EHR use, with a goal of enhancing data sharing and standardization, patient engagement, and data transparency, and this approach supports data collection for multiple purposes, including evaluation of processes and outcomes [2]. In most cases, data collection has already occurred and does not disrupt patient care, and individual informed consent may be waived by the institutional review board (IRB) due to the minimal risk and impracticality of consent for these studies. Generalizability is often maximized through the inclusion of a larger proportion of the population of interest, although limitations remain (eg, EHR data from a university hospital consortium might not be generalizable to community hospitals). The major disadvantage of using administrative data is lack of control over data collected for other purposes, including which variables were collected, how they were defined, and whether they were standardized and/or complete. Electronically captured data may be discrete or textual, the latter requiring sophisticated techniques such as natural language processing [3] before the data can be used for research purposes. A comprehensive understanding of which data will be used, the purpose(s) for which the data were originally collected, and the quality control of the data is essential. For data sources such as NHSN, extensive documentation is available regarding how data are collected, case definitions, and changes that have occurred over time, and this documentation should be carefully considered and fully understood when planning the analysis.
TABLE 1 Advantages, Disadvantages, and Potential Pitfalls of Using Administrative and Surveillance Data in Healthcare Epidemiology and Antimicrobial Stewardship Research

NOTE. EHR, electronic health record; ICD, International Classification of Diseases; IRB, institutional review board; NHSN, National Healthcare Safety Network.
PITFALLS AND TIPS
Although studies using administrative data may seem relatively easy to perform compared to studies employing primary data collection methods, several pitfalls must be considered. During the study planning stages, close communication with those responsible for generating the data or with those who work with it regularly is key to enhancing the validity of the inferences derived from these studies.
Assumptions about data quality are a common misstep. Understanding whether and to what extent data validation occurs is critical. In the case of billing/coding data, submission of data to payors requires some standardization/common methodology; however, the coding goal may be to maximize reimbursement to the extent that can be supported by clinical documentation rather than to accurately reflect a clinically valid case definition for the diagnosis of interest. For inpatient pharmacy data, researchers may observe very different utilization depending on whether they use pharmacy inventory data, patient-level orders for medications, charges for medications, or actual medication administration [4]. For outpatient encounters, one may track prescriptions written (which may never be filled) versus pharmacy claims (which require the medication to be filled at a pharmacy); however, neither of these approaches precisely reflects whether the patient actually took the medication. Most other administrative data are subject to a similar limitation: what is documented may or may not reflect reality. Furthermore, identification of antibiotic-resistant bacteria has been particularly difficult with administrative data. One study found that >50% of patients with ICD-9 codes for methicillin-resistant Staphylococcus aureus (MRSA) did not actually have evidence of this organism [5].
Validation thus becomes an important component of studies using administrative data. Large surveillance datasets, such as the CDC’s NHSN and the American College of Surgeons’ National Surgical Quality Improvement Program (NSQIP), have clearly specified case definitions (which may differ from clinical definitions) and established protocols for data collection, and these undergo regular validation. Furthermore, while these programs are frequently used by the sites submitting data to measure their own outcomes and benchmark with comparator facilities, these databases also provide national-level data and have been used to study postoperative infection rates [6], geographic variability in antibiotic resistance [7], and national healthcare personnel influenza vaccination rates [8], among other topics. Even a single-center study using a hospital EHR may benefit from internal validation via chart review to measure the extent to which particular variables capture the items of interest.
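The chart-review validation described above can be summarized with standard accuracy measures. The following is a minimal sketch, not a prescribed method: it assumes each reviewed chart yields a pair of booleans (whether the administrative code flagged the condition, and whether chart review confirmed it). The names `ValidationCounts`, `tally`, and `accuracy_measures`, and all counts in the sample, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """2x2 counts comparing an administrative flag with chart review."""
    true_pos: int   # flagged by code, confirmed on chart review
    false_pos: int  # flagged by code, not confirmed on chart review
    false_neg: int  # not flagged by code, but present on chart review
    true_neg: int   # not flagged by code, absent on chart review


def tally(records):
    """Count agreement across (code_flag, chart_flag) boolean pairs."""
    tp = fp = fn = tn = 0
    for code_flag, chart_flag in records:
        if code_flag and chart_flag:
            tp += 1
        elif code_flag:
            fp += 1
        elif chart_flag:
            fn += 1
        else:
            tn += 1
    return ValidationCounts(tp, fp, fn, tn)


def accuracy_measures(c):
    """Return sensitivity, specificity, PPV, and NPV, treating chart review as the reference."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.true_pos, c.true_pos + c.false_neg),
        "specificity": ratio(c.true_neg, c.true_neg + c.false_pos),
        "ppv": ratio(c.true_pos, c.true_pos + c.false_pos),
        "npv": ratio(c.true_neg, c.true_neg + c.false_neg),
    }


# Hypothetical validation sample of 200 reviewed charts (illustrative counts only).
sample = (
    [(True, True)] * 30 + [(True, False)] * 10
    + [(False, True)] * 20 + [(False, False)] * 140
)
measures = accuracy_measures(tally(sample))
```

Positive and negative predictive values depend on how common the condition is in the sample reviewed, so values obtained from an enriched sample (eg, only coded cases) would not transfer directly to the full population.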
Knowing the data limitations is fundamental to informing the study plan and to conducting proper sensitivity analyses. The goal need not be 100% accuracy; as long as the case definition and the way cases are identified and reported do not change over time, the validity of comparisons may be maintained. The effect of possible misclassification bias on the potential results should be examined prior to study initiation. This requires some knowledge of the system(s) under study; for example, researchers studying antimicrobial use in sepsis may need to rely on ICD-9/ICD-10 codes for “sepsis” to identify cases. These codes are unlikely to be assigned 100% accurately, but they may be reasonably valid if coding procedures have remained constant. However, if an unrelated sepsis-related initiative increased the coders’ attention to sepsis diagnoses during part of the study period, the cases prior to the sepsis initiative may have been coded quite differently than later cases, and the protocol may need to be modified accordingly.
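The sepsis coding scenario can be explored numerically before a study begins. The sketch below is one simple form of quantitative bias analysis, offered as an illustration rather than as the approach used in any cited study. It assumes nondifferential misclassification summarized by a single sensitivity and specificity, and it uses the Rogan-Gladen estimator to back-correct an apparent prevalence. All numeric values are hypothetical.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Prevalence a coded dataset would show, given its misclassification."""
    return true_prev * sensitivity + (1 - true_prev) * (1 - specificity)


def corrected_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from an apparent prevalence."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent + specificity - 1) / denom
    return min(1.0, max(0.0, estimate))  # keep the estimate within [0, 1]


# Hypothetical scenario: true sepsis prevalence is constant at 5%, but a
# coding initiative raises code sensitivity from 50% to 80% mid-study.
TRUE_PREV = 0.05
before = apparent_prevalence(TRUE_PREV, sensitivity=0.50, specificity=0.99)
after = apparent_prevalence(TRUE_PREV, sensitivity=0.80, specificity=0.99)
spurious_ratio = after / before  # apparent increase with no true change
recovered = corrected_prevalence(before, sensitivity=0.50, specificity=0.99)
```

Under these assumed values the coded prevalence rises by roughly 40% although the true prevalence never changes, which is the pattern a sensitivity analysis would need to rule out.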
PUBLISHED EXAMPLES IN HEALTHCARE EPIDEMIOLOGY AND ANTIMICROBIAL STEWARDSHIP
Administrative data may be used for multiple purposes to describe a wide variety of HE&AS-related outcomes. One of the fundamental uses of “traditional” administrative data (ie, coding) is for surveillance, particularly for HAIs that are not easily monitored in other ways. For example, until recently, the NHSN did not include surveillance for Clostridium difficile infections (CDI), but several investigators used administrative data to estimate CDI incidence [9, 10]. Surgical site infections (SSIs) can be particularly difficult to capture via traditional surveillance; thus, coding data may serve as an important method of ascertainment, particularly of post-discharge SSIs [11, 12]. However, numerous studies have evaluated the accuracy of administrative data in general [13–15] or have compared administrative (mostly coding) data to other forms of surveillance for infections such as CDI [16–19], central-line–associated bloodstream infection [20–23], SSI [24], ventilator-associated pneumonia [25], and overall HAIs [26–30]. In general, these studies have found administrative data to lack sensitivity and specificity for the infections under study.
Because of the challenges faced when conducting HAI surveillance, the CDC recently developed a surveillance module (known as LabID) [31] that is based solely on administratively collected data (admission/discharge and microbiology result dates and patient location) for CDI and MRSA bacteremia surveillance. These modules have been adopted by the Centers for Medicare and Medicaid Services (CMS) as conditions of participation. Several studies have compared this “administrative” surveillance to traditional surveillance, and generally higher rates of hospital-onset infections with administrative data were reported, due primarily to differences in definition [32, 33]. Administrative data can also be used to expand HAI-related knowledge, for example, by identifying risk factors [34], by examining outcomes such as readmissions [35], or by providing risk adjustment for studies of antimicrobial-resistant organisms [36, 37] or other HAIs [38, 39].
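A date-based classification of this kind can be sketched in a few lines. The example below is a simplified illustration, not the NHSN algorithm: it assumes the admission date counts as hospital day 1 and that specimens collected on or after hospital day 4 are labeled hospital-onset. That threshold is an assumption that should be checked against the current NHSN protocol, and the sketch omits other protocol categories and rules for repeat specimens. The function names are hypothetical.

```python
from datetime import date

# Assumption: first hospital day counted as hospital-onset (verify against
# the current NHSN protocol before using this for any real analysis).
HOSPITAL_ONSET_FIRST_DAY = 4


def hospital_day(admission, specimen):
    """Hospital day of specimen collection, counting the admission date as day 1."""
    return (specimen - admission).days + 1


def classify_onset(admission, specimen, first_ho_day=HOSPITAL_ONSET_FIRST_DAY):
    """Label a positive specimen as hospital-onset or community-onset by date alone."""
    day = hospital_day(admission, specimen)
    if day < 1:
        raise ValueError("specimen collected before admission")
    return "hospital-onset" if day >= first_ho_day else "community-onset"


# Hypothetical admission on 1 January 2024.
admitted = date(2024, 1, 1)
early = classify_onset(admitted, date(2024, 1, 3))  # hospital day 3
late = classify_onset(admitted, date(2024, 1, 4))   # hospital day 4
```

Because the label depends only on dates, an infection that was incubating at admission but tested late would be labeled hospital-onset, which is one way purely date-based surveillance can report higher hospital-onset rates than clinical review.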
Several scenarios make the use of administrative data particularly compelling. One is antimicrobial utilization surveillance, where the primary objective is to capture all antimicrobial use, not just use in particular settings or by practitioners enrolled in a study. Several such ecologic studies have been published, in which researchers were able to access databases of antimicrobial prescriptions (some from entire countries) and compare these data to antimicrobial resistance rates across the same geographic areas [40–43]. While these studies are subject to the limitation of ecologic fallacy and inability to prove causation, this high-level view of antimicrobial use and resistance would be extremely difficult without administrative data. At the level of individual practice, both billing/claims and EHR data have been used to study antimicrobial use at the point of care during outpatient visits, in long-term care settings, or even with telephone-based contact [44–46].
Administrative and surveillance data can also be used to conduct “natural experiments,” for example, when there is a change in state or federal policy related to HAI. One prominent example is the increase in laws and regulations related to public reporting of HAI. Several researchers have studied the impact of these laws on blood culture and antibiotic utilization [47] and on central-line–associated bloodstream infection (CLABSI) rates. Interestingly, while hospitals in states with mandatory public reporting were more likely to participate in a national CLABSI prevention program and these participants trended toward greater CLABSI reduction [48], other investigators have found no difference in HAI reduction in states with mandatory reporting [49, 50]. To some extent, this discrepancy may be due to the different data sources used, and researchers must be cognizant that the same outcome as measured in different administrative data sets may lead to different inferences. Others have used administrative data to study whether CMS non-reimbursement policies related to HAI have changed provider behavior related to test ordering [51], billing [52], or infection rates [53]. These questions are particularly applicable to administrative data; whatever data inaccuracies may exist, they are likely to be stable over time, other than the effect of the new policy (provided other significant changes in practice patterns have not simultaneously occurred).
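The argument that stable inaccuracies do not distort a before-and-after policy comparison can be illustrated with a small calculation. This sketch assumes the only data error is under-ascertainment by a constant capture fraction (no false positives) and compares a policy group with a comparison group using a ratio of rate ratios. The rates, the capture fractions, and the function names are hypothetical and are not drawn from the cited studies.

```python
def ratio_of_rate_ratios(policy_before, policy_after,
                         comparison_before, comparison_after):
    """(After/before change in policy group) relative to (same change in comparison group)."""
    return (policy_after / policy_before) / (comparison_after / comparison_before)


def observe(true_rates, capture):
    """Apply a capture fraction per cell to true rates to get recorded rates."""
    return {cell: rate * capture[cell] for cell, rate in true_rates.items()}


# Hypothetical true CLABSI rates per 1,000 line-days.
true_rates = {
    "policy_before": 2.0, "policy_after": 1.2,
    "comparison_before": 2.0, "comparison_after": 1.6,
}
true_effect = ratio_of_rate_ratios(**true_rates)

# Case 1: every cell captures 60% of true infections (stable inaccuracy).
stable_capture = {cell: 0.6 for cell in true_rates}
stable_effect = ratio_of_rate_ratios(**observe(true_rates, stable_capture))

# Case 2: the policy itself changes ascertainment in the policy group.
shifted_capture = dict(stable_capture, policy_after=0.8)
shifted_effect = ratio_of_rate_ratios(**observe(true_rates, shifted_capture))
```

With stable capture the constant cancels and the estimated effect equals the true effect; when capture changes alongside the policy, the true reduction is masked in these illustrative numbers. That corresponds to the proviso in the text about simultaneous changes in practice or coding patterns.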
Lastly, because billing is such an important component of many administrative datasets, studies attempting to determine costs or cost-effectiveness usually rely on administrative data. In healthcare epidemiology, many studies have measured increased or attributable costs related to HAI [12, 54–60], while others have evaluated the cost or cost-effectiveness of interventions [61–65]. A recent study utilized several different administrative datasets to estimate the potential cost-benefit of CDI reduction [66]. In any attempt to analyze costs of infections or of their prevention, an understanding of charges versus actual costs is critical [67].
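One common way to move from billed charges toward estimated costs is to apply cost-to-charge ratios. The article does not specify a method, so the sketch below is an assumption for illustration only. The department names, charges, and ratios are hypothetical; real ratios would come from an institution's cost accounting or a published source.

```python
def estimate_cost(charges_by_department, cost_to_charge_ratios):
    """Convert billed charges to estimated costs using department-level ratios."""
    total = 0.0
    for department, charge in charges_by_department.items():
        if department not in cost_to_charge_ratios:
            raise KeyError(f"no cost-to-charge ratio for {department!r}")
        total += charge * cost_to_charge_ratios[department]
    return total


# Hypothetical charges (in dollars) for one admission complicated by an HAI.
charges = {"pharmacy": 12000.0, "laboratory": 3000.0, "room_and_board": 20000.0}
# Hypothetical department-level cost-to-charge ratios.
ratios = {"pharmacy": 0.25, "laboratory": 0.20, "room_and_board": 0.60}

total_charges = sum(charges.values())
estimated_cost = estimate_cost(charges, ratios)
```

In this illustration the estimated cost is less than half of total charges, so an analysis that reported charges as if they were costs would substantially overstate the economic burden.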
MAJOR TAKE-HOME POINTS
In light of the important pitfalls and limitations raised here, researchers planning a study in HE&AS using administrative data must consider several important points (Table 2). Most importantly, these points underscore the need for thorough understanding of the data source(s) prior to the initiation of a study. This awareness must include the development of an appropriate research question, detailed understanding of the datasets available (including working with those who generate or work with the data regularly), and recognition of the validation (or lack thereof) and limitations of the data related to the particular research question under investigation. Table 3 provides examples of research publications using different types of administrative data.
TABLE 2 Checklist of Key Considerations When Developing a Study in Healthcare Epidemiology or Antimicrobial Stewardship Using Administrative or Surveillance Data

NOTE. EHR, electronic health record; IRB, institutional review board; IT, information technology; NHSN, National Healthcare Safety Network.
TABLE 3 Examples of Different Types of Administrative and Surveillance Data and Their Use in Hospital Epidemiology and Antimicrobial Stewardship Research

CONCLUSIONS
Administrative and surveillance data can inform many types of HE&AS research. As the use of administrative data becomes more common (eg, through uptake of EHRs) and has greater impact (eg, CMS pay for performance), scrutiny of the specific data collection methodology and data validity will only increase. Ideally, harmonization and standardization of these data will increase accordingly. More advanced techniques such as natural language processing will further increase the richness of data available from administrative sources. Compared to traditional controlled trials, studies using administrative data are efficient, can include vastly larger numbers of patients (increasing power to detect differences), may have broader applicability and generalizability, and have the potential to improve healthcare in a wide variety of settings. As hospital epidemiology, infection prevention and control, and antimicrobial stewardship increasingly attract attention, researchers in this field have an obligation to ensure that the data being generated are utilized, analyzed, and interpreted appropriately.
ACKNOWLEDGMENTS
The authors would like to thank SHEA and members of the SHEA Research Committee for their support and review of this manuscript.
Financial support: No financial support was provided relevant to this article.
Potential conflicts of interest: All authors report no conflicts of interest relevant to this article.