BACKGROUND
Observational studies are a common study design in healthcare epidemiology, in large part due to the increasing accessibility of electronic data to clinicians, infection preventionists, and administrators. In contrast to a randomized controlled trial, the investigator in an observational study does not assign or otherwise intervene on the exposure, and the resources required are generally far less intensive. Observational studies provide an opportunity to define and elucidate potential cause-and-effect relationships when it is not feasible to perform a randomized controlled trial. In this review, we discuss strategies for designing and conducting observational studies to maximize reliability and validity. We encourage readers to review the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement, or any of several other helpful tools, when planning an observational study (von Elm et al [1]; Sanderson et al [2]). Best practices in reporting your study begin with careful attention to the design of the study.
DESIGNING AN OBSERVATIONAL STUDY
Like any investigation attempting to explore the relationship between an exposure and potential outcome, the objective of most observational studies is to test the counterfactual, ie, what the outcome would have been if the subject had not received the exposure of interest (Hofler [3]). To do this, all observational studies start with a cohort (ie, the population, or a specified subset of the population) at risk for the outcome of interest and then compare the likelihood of the outcome among subjects with and without the exposure of interest.
To this end, observational studies may have a wide range of designs, including retrospective cohort, prospective cohort, and case-control studies. In cohort studies, all individuals in the defined population at risk for the outcome are followed for a specified period of time to ascertain outcome events. Subjects in the cohort may be defined by a shared characteristic, location, and/or time period (eg, all residents of a skilled nursing facility during the 2010–2011 influenza season). All individuals at risk for the outcome are included in the analysis. Retrospective cohorts include a study population for whom the outcome of interest has already occurred at the time of study design and enrollment; in contrast, subjects in a prospective cohort are enrolled before the occurrence of outcomes. The choice of retrospective versus prospective design is often dictated by the circumstances under which the study is performed. Prospective cohort studies may afford the investigator an opportunity to improve the completeness and reliability of the data collected, though the time to complete the study may be longer. Although some studies may involve both retrospective and prospective data collection (Kleinbaum et al [4]), within the realm of healthcare epidemiology, a completely retrospective design for cohort studies is often most convenient to conduct. Debate sometimes ensues regarding the definitions of “retrospective” and “prospective” observational studies. A distinction between timing and directionality may be made: the timing of a study is the relationship between the occurrence of outcomes and data collection (retrospective vs prospective), whereas the directionality of the study refers to the order in which the exposure and outcome are identified in the study cohort (forward vs backward).
In a case-control study, individuals who developed the outcome (cases) and individuals who did not (controls) are first identified from the same base population that an investigator would define as the source population for a cohort study. There are various ways to select controls, but regardless of the selection method, it is critical to ensure that controls are drawn from the same source population as the cases to help mitigate biases (Schulz and Grimes [5] through Wacholder et al [7]). After appropriately defining the cohort and selecting cases and controls, the association between the exposure of interest and the outcome may be analyzed.
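To make the comparison of exposed and unexposed subjects concrete, the following minimal Python sketch computes two standard measures of association from a 2x2 table: the risk ratio, which a cohort design can estimate directly, and the odds ratio, which is the usual measure in a case-control design because the numbers of cases and controls are fixed by sampling. The class name, function names, and counts are hypothetical and chosen only for illustration; they are not taken from any study cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 table of exposure by outcome (hypothetical layout)."""
    exposed_cases: int        # a: exposed and developed the outcome
    exposed_noncases: int     # b: exposed and did not develop the outcome
    unexposed_cases: int      # c: unexposed and developed the outcome
    unexposed_noncases: int   # d: unexposed and did not develop the outcome


def risk_ratio(t: TwoByTwo) -> float:
    """Risk of the outcome in the exposed divided by risk in the unexposed."""
    risk_exposed = t.exposed_cases / (t.exposed_cases + t.exposed_noncases)
    risk_unexposed = t.unexposed_cases / (t.unexposed_cases + t.unexposed_noncases)
    return risk_exposed / risk_unexposed


def odds_ratio(t: TwoByTwo) -> float:
    """Cross-product ratio (a*d)/(b*c)."""
    return (t.exposed_cases * t.unexposed_noncases) / (
        t.exposed_noncases * t.unexposed_cases
    )


# Invented counts for illustration only.
table = TwoByTwo(exposed_cases=30, exposed_noncases=70,
                 unexposed_cases=10, unexposed_noncases=90)
print("risk ratio:", round(risk_ratio(table), 2))
print("odds ratio:", round(odds_ratio(table), 2))
```

With a common outcome, as in these invented counts, the odds ratio lies further from 1 than the risk ratio; the two measures converge as the outcome becomes rare. A real analysis would also report confidence intervals and address confounding, which this sketch omits.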
ADVANTAGES AND DISADVANTAGES
Observational studies in healthcare epidemiology benefit from the widespread use of electronic data collection (Table 1). Data collected for clinical or administrative purposes may come with potential limitations (Wyllie and Davies [8]; Schweizer et al [12]). However, the availability of these data, the relatively low cost of conducting an observational study, and the ability to evaluate infrequent outcomes make cohort and case-control studies the most commonly implemented study designs in the field.
TABLE 1 Advantages, Disadvantages, and Potential Pitfalls of Using Observational Studies in Healthcare Epidemiology and Antimicrobial Stewardship Research
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160927051445-73034-mediumThumb-S0899823X16001185_tab1.jpg?pub-status=live)
Observational studies are highly susceptible to commonly encountered biases such as selection, assessment, time-dependent, loss to follow-up, and recall biases. These biases may be introduced by a mis-specified or poorly defined source population, use of a non-standard or subjective definition of the outcome, difficulty in ascertaining covariates that impact the outcome of interest, incomplete accounting of time at risk, and confounding from both measured and unmeasured confounders (Harris et al [9]; Schweizer et al [12]).
PITFALLS AND TIPS
A “perfect” observational study is difficult, if not impossible, to achieve due to the intrinsic nature of observational study designs. For this reason, investigators should aim to design and conduct a study with the fewest limitations feasible, and readers should carefully consider the impact of study limitations while evaluating the conclusions of the study. Table 1 enumerates potential pitfalls to avoid and tips to resolve these limitations.
It is important to start with a clearly defined study premise: does the study define a discrete hypothesis that is testable with the data available (or to be collected), and does the study address a meaningful question not sufficiently answered in the existing peer-reviewed published literature? While a cohort or case-control study may be chosen for cost and convenience and is often the appropriate design for healthcare epidemiology research, alternative designs such as time-series analysis or controlled trial may be more appropriate for some hypotheses and conditions.
As discussed above, selection of cases and controls is an important and underappreciated consideration. Defining the outcome and consideration of covariates (measurable and unmeasurable) are important to ensure validity and avoid potential biases, including confounding. Outcomes in healthcare epidemiology may be a measure of incidence or prevalence, with or without consideration of recurrence. As prevalence is influenced not only by incidence but also by duration of disease, it is preferable to use incidence as an outcome in most circumstances, though the data collection and analysis may be more complex (Rhame and Sudderth [13]).
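The dependence of prevalence on duration can be illustrated with a short Python sketch. It assumes a steady-state population, in which the prevalence odds equal the incidence rate multiplied by the mean duration of disease; this is a textbook simplification rather than a method taken from the cited reference. The function names and numbers are hypothetical.

```python
def incidence_rate(new_events: int, person_days_at_risk: float,
                   per: float = 1000.0) -> float:
    """New events per `per` person-days at risk (eg, per 1,000 patient-days)."""
    return new_events / person_days_at_risk * per


def steady_state_prevalence(rate_per_person_day: float,
                            mean_duration_days: float) -> float:
    """Prevalence implied by a constant incidence rate and mean duration.

    Assumes a steady state, where prevalence odds = rate * duration.
    """
    odds = rate_per_person_day * mean_duration_days
    return odds / (1.0 + odds)


# Invented numbers: 12 new infections over 8,000 patient-days at risk.
rate_per_1000 = incidence_rate(12, 8000)
rate_per_day = rate_per_1000 / 1000.0

# Same incidence, two different mean durations of disease.
prevalence_short = steady_state_prevalence(rate_per_day, mean_duration_days=10)
prevalence_long = steady_state_prevalence(rate_per_day, mean_duration_days=20)

print("incidence per 1,000 patient-days:", rate_per_1000)
print("prevalence, 10-day duration:", round(prevalence_short, 4))
print("prevalence, 20-day duration:", round(prevalence_long, 4))
```

Under these assumptions, doubling the duration of disease roughly doubles the prevalence even though the incidence is unchanged, which is why two units with identical incidence can report different prevalence.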
The precise and accurate collection of data is an essential aspect of internal validity (how well the study was designed and performed). If multiple reviewers collect data, a calculation of the inter-rater reliability may be used to assess the precision and internal validity of the study (Viera and Garrett [14]; Hallgren [15]). External validity (the generalizability of the study findings to other populations) can be limited if variables or the cohort are poorly defined. Standard objective definitions of exposures and other variables, but particularly the outcome, allow for optimal interpretation of the study findings. In hospital epidemiology, a number of resources for standard surveillance definitions are available, such as National Healthcare Safety Network guidance and criteria for identifying and defining healthcare-associated infections [16]. Table 2 provides a condensed checklist of key considerations that may help identify and address potential pitfalls in the design phase of a study.
TABLE 2 Checklist of Key Considerations When Developing an Observational Study in Healthcare Epidemiology
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160927051445-36014-mediumThumb-S0899823X16001185_tab2.jpg?pub-status=live)
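As one illustration of the inter-rater reliability calculation mentioned in this section, the sketch below computes Cohen's kappa, a commonly used chance-corrected agreement statistic for two reviewers assigning categorical labels. The choice of kappa, the function name, and the chart-review data are assumptions made for illustration; other reliability measures may suit other data types.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1.0 - expected)


# Invented data: two reviewers classify 10 charts as infection "yes" or "no".
reviewer_1 = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"]
reviewer_2 = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "yes"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print("kappa:", round(kappa, 2))
```

In this invented example the reviewers agree on 8 of 10 charts, but about half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement of 0.8.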
STATISTICAL CONSIDERATIONS
Because the “assignment” of exposures and potential confounders to patients in the study population is beyond the control of the investigator conducting an observational study, statistical tools are often employed to account for associations that may affect the measured relationship between the exposure and the outcome. These analyses may be complex and may require the assistance of a biostatistician, both before initiating a study and during the analysis phase. A biostatistician and/or an epidemiologist trained in study design may advise on complex situations: recurrent outcomes per unique subject (eg, recurrent Clostridium difficile colitis); nonbinary risk categories (eg, the risk of healthcare-associated infection due to Staphylococcus aureus among patients with a baseline history of infection, a positive colonization screen, a negative screen, and/or unknown colonization status); or exposure variables that are time dependent (eg, determining the association between the duration of antimicrobial exposure and the risk of infection due to carbapenem-resistant Enterobacteriaceae). In Table 3, we have outlined examples of some of the fundamental tools used in the analysis of observational studies.
TABLE 3 Statistical Tools Useful for the Analysis of Observational Studies
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160927051445-27313-mediumThumb-S0899823X16001185_tab3.jpg?pub-status=live)
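The time-dependent exposures described in this section can be handled in several ways. The Python sketch below shows one simple approach, offered as an illustration and not as a summary of Table 3: each patient's follow-up is split at the start of exposure so that days before exposure count as unexposed time at risk. It assumes that exposure, once started, continues to the end of follow-up. The `Patient` fields, function names, and cohort are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Patient:
    followup_days: int                 # days from cohort entry to event or censoring
    exposure_start_day: Optional[int]  # day exposure began; None if never exposed
    had_event: bool                    # True if follow-up ended with the outcome


def split_person_time(patients: Iterable[Patient]) -> Tuple[int, int, int, int]:
    """Return (exposed_days, unexposed_days, exposed_events, unexposed_events)."""
    exp_days = unexp_days = exp_events = unexp_events = 0
    for p in patients:
        start = p.exposure_start_day
        if start is None or start >= p.followup_days:
            # Never exposed during follow-up: all time and any event are unexposed.
            unexp_days += p.followup_days
            unexp_events += int(p.had_event)
        else:
            # Days before exposure began are unexposed time at risk.
            unexp_days += start
            exp_days += p.followup_days - start
            exp_events += int(p.had_event)
    return exp_days, unexp_days, exp_events, unexp_events


def rate_ratio(exp_events: int, exp_days: int,
               unexp_events: int, unexp_days: int) -> float:
    """Incidence rate in exposed time divided by rate in unexposed time."""
    return (exp_events / exp_days) / (unexp_events / unexp_days)


# Invented cohort for illustration only.
cohort = [
    Patient(10, 4, True),
    Patient(20, None, False),
    Patient(15, 5, False),
    Patient(8, None, True),
    Patient(12, 2, True),
]

time_split_ratio = rate_ratio(*_reorder(split_person_time(cohort))) if False else None
exp_days, unexp_days, exp_events, unexp_events = split_person_time(cohort)
time_split_ratio = rate_ratio(exp_events, exp_days, unexp_events, unexp_days)

# Naive "ever exposed" grouping: pre-exposure days are credited to the exposed group.
ever = [p for p in cohort if p.exposure_start_day is not None]
never = [p for p in cohort if p.exposure_start_day is None]
naive_ratio = rate_ratio(
    sum(p.had_event for p in ever), sum(p.followup_days for p in ever),
    sum(p.had_event for p in never), sum(p.followup_days for p in never),
)

print("rate ratio, time split at exposure:", round(time_split_ratio, 2))
print("rate ratio, naive ever/never:", round(naive_ratio, 2))
```

In this invented cohort the naive grouping yields a smaller rate ratio than the time-split analysis, because event-free days before exposure inflate the exposed denominator. Regression methods that accommodate time-varying covariates are the usual tool for real data; this sketch only illustrates the accounting of time at risk.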
OBSERVATIONAL STUDIES IN HEALTHCARE EPIDEMIOLOGY AND ANTIMICROBIAL STEWARDSHIP
While there are no limitation-free observational studies, several examples will help illustrate key principles of this review. For case-control studies, appropriately identifying cases and controls is essential to testing the counterfactual: did every subject have the same opportunity to be exposed to risk factors of interest and to develop the outcome? In a study that aimed to investigate risk factors for the acquisition of extended-spectrum β-lactamase–producing Klebsiella pneumoniae in an intensive care unit (ICU), cases were defined as patients for whom the organism was isolated during the ICU admission. Controls were selected from the same ICU population among patients who did not meet the case definition and who had a similar length of exposure in the ICU, and they were matched on age, gender, severity of illness score, and underlying disease (Piroth et al [17]). Defining cases and controls in this fashion, including consideration of control group selection and adjustment for time at risk, allowed the investigators to be more confident that subjects who did not develop the outcome had the potential to become a case had their exposures been different (Harris et al [18]).
In prospective cohort studies, by controlling which data elements or specimens to collect, investigators may have the opportunity to improve the internal validity and overall quality of the study. In a 2011 study, Dutch investigators included in their prospective cohort consecutive patients admitted to 1 of 4 ICUs for at least 48 hours during the study period. Investigators collected surveillance cultures from the respiratory tract (on admission, twice weekly, and on discharge) to identify patients (cases) who acquired a multidrug-resistant Pseudomonas aeruginosa or Enterobacter sp. With this design, the investigators were able to confidently identify subjects in the cohort at risk for colonization with a resistant strain, and they had full case ascertainment in investigating variables associated with the development of resistance (Ong et al [19]).
Using a large, validated, multicenter, quality improvement database, investigators in California studied a cohort of 20,934 very-low-birth-weight infants born between 2002 and 2006 in a study hospital (Kleinbaum et al [4]). The exposure of interest was birth in a hospital participating in a quality improvement project (vs not participating in the project), and the outcome was nosocomial infection events. The outcome was carefully defined using a standardized definition and a defined at-risk period (because nosocomial infection could not occur within 4 days of birth by definition, those with length of stay <4 days were excluded). The authors accounted for the potential impact of mis-specified outcomes by excluding infants with abdominal surgery or necrotizing enterocolitis (because nosocomial bloodstream infection may be confounded by these events). The investigators acknowledged the challenges of handling missing data and determining exposure when patients were transferred between hospitals (Wirtschafter et al [20]).
While no “perfect” study exists, it is important to design every study to minimize factors that reduce its quality and to acknowledge not only the limitations present but also the impact these limitations may have on the validity and interpretation of the results.
MAJOR TAKE-HOME POINTS
Observational studies are frequently employed in healthcare epidemiology research because electronic databases are increasingly available and because randomized controlled trials are often not feasible. An observational study may be designed as a prospective or retrospective cohort study, or as a case-control study, but a unifying characteristic of these designs is the comparison of an outcome among subjects exposed and unexposed to a variable of interest without intervention from study investigators. Researchers contemplating an observational study in hospital epidemiology should carefully consider the design of the study before collecting and analyzing data, and they should be particularly vigilant when defining the study cohort at risk of the outcome. Sources of bias should be considered first in the design stage, and various statistical tools are available in the analysis stage to account for bias resulting from a lack of randomized comparison groups.
CONCLUSIONS
Cohort and case-control studies represent the observational designs most frequently employed in healthcare epidemiological research. Although data might be readily available and the analysis of exposures and outcomes might seem straightforward, these studies are subject to many sources of bias. Limitations of these observational studies can be addressed by careful planning, definition of subjects at risk for the outcome, and inclusion of patients with and without the exposure of interest.
ACKNOWLEDGEMENTS
Financial support: No financial support was provided relevant to this article.
Potential conflicts of interest: L.S.M.-P. reports that she has served as a speaker for Ecolab and Xenex and as a consultant for Xenex and Clorox. All other authors report no conflicts of interest relevant to this article.