Since the 1960s, ethics review of research has become increasingly recognized as an important structural development in health care, especially in the Western world. However, researchers have expressed much concern over perceived differences in the judgments that ethics committees make (28). Different countries have their own legal policies, regulatory requirements, and cultural practices, although there have been moves to bring these different systems of review together. The European Directive on Implementing Good Clinical Practice, for example, once implemented across Member States, may avoid or indeed conceal the issue of different judgments by stating that each Member State must establish a procedure for giving a single ethical opinion for multicenter drug trials (16). There are also several overarching guidelines, some carrying international weight, intended to ensure that research ethics committees adopt similar methods and processes, if not that they reach similar conclusions. The World Health Organization's Operational Guidelines for Ethics Committees That Review Biomedical Research (35) has informed many domestic arrangements, such as the U.S. federal Common Rule (13), updated by the U.S. Office for Protection from Research Risks; the Guidelines and Recommendations for European Ethics Committees (15), published by the European Forum for Good Clinical Practice; and the Governance Arrangements for Research Ethics Committees (10), issued by the UK National Health Service's Central Office for Research Ethics Committees (11). However, even within these administrative stipulations, there is still much scope for differences in the way committees make judgments, and there is little research in this area.
In this study, we investigate the perception that different research ethics committees make different and sometimes incompatible judgments by first identifying any relevant studies in the literature, then recording their data according to the methods they used, synthesizing the data according to the ethical concerns raised in them, and finally discussing their findings.
METHODS
We included any study that attempted to compare directly the judgments made by different Research Ethics Committees (RECs) or Institutional Review Boards (IRBs) when reviewing one or more protocols. At this stage, we did not exclude poorly designed or poorly executed studies, nor did we exclude anecdotal accounts of researchers trying to get their research approved by ethics committees. All “eligible” studies were available in English.
Twenty-six studies addressed our study question. They were identified from three main sources: (i) electronic searching, (ii) hand searching, and (iii) exploding reference lists. (i) Electronic search. We searched Medline/PubMed and the International Bibliography of the Social Sciences up to February 2006 using keywords such as “ethics committee” or “review.” Because these keywords were more sensitive than specific, we narrowed the yield manually by reading the titles and abstracts of studies where these were available. (ii) Hand search. We also hand searched relevant journals, including IRB: Review of Human Subjects Research, Healthcare Ethics Committee Forum, the Journal of Medical Ethics, The Lancet, the British Medical Journal, The New England Journal of Medicine, and the Journal of the American Medical Association. (iii) Exploding references. The reference lists of relevant papers identified by the above searches were also examined for further material.
The main findings and quality of the twenty-six component studies were reviewed independently by S.E. and T.St. The studies were too heterogeneous to combine statistically in a meaningful way: they used different designs, approached different numbers of committees with different numbers of research protocols, and were conducted at different times and in different places. Despite these difficulties, T.Sw. classified the study designs into three categories: (i) seventeen studies investigating how different local RECs or IRBs independently reviewed the same single protocol; (ii) four studies investigating how local RECs reviewed a study they knew had already been approved by a Multicenter Research Ethics Committee (MREC); and (iii) five studies investigating how different RECs or IRBs reviewed different numbers of protocols. Tables of these studies are available from the first author upon request.
Within each type of design, S.E. then presented the data in narrative form according to the quality of the evidence and the topics the studies reported. For consistency, the data are reported as percentages, with the respective counts and sample sizes given in parentheses. The results from the component studies cannot easily be compared, although S.E. identifies some patterns across the studies in the Discussion section, along with comments on their ethical and legal significance.
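As a minimal sketch of this reporting convention, the Python function below renders a count k out of n committees in the form used throughout the Results. It assumes, as reported figures such as 13 percent (3 of 24) and 38 percent (9 of 24) suggest, that percentages are rounded half up to the nearest whole number; the function name format_proportion is hypothetical and is not taken from any of the component studies.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_proportion(k: int, n: int) -> str:
    """Format k of n committees as 'P percent (k of n)'.

    Assumption: percentages are rounded half up to a whole number,
    which matches reported figures such as 13 percent (3 of 24).
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("expected 0 <= k <= n with n > 0")
    # Exact decimal arithmetic, then round halves upward (12.5 -> 13).
    pct = (Decimal(k) * 100 / Decimal(n)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return f"{pct} percent ({k} of {n})"


if __name__ == "__main__":
    for k, n in [(3, 24), (9, 24), (41, 43), (21, 162)]:
        print(format_proportion(k, n))
```

Decimal arithmetic with ROUND_HALF_UP is used here because Python's built-in round() rounds exact halves to the nearest even number, which would report 3 of 24 as 12 percent rather than 13 percent.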
RESULTS
RECs or IRBs Independently Reviewing One Protocol
Quality of the Studies
Seventeen studies sought data on how different RECs reviewed a single protocol. The aim was to examine the (in)consistency between RECs by contrasting their respective judgments of a single protocol, although only three of the studies attempted a formal statistical comparison (7;18;33). The designs used had various methodological weaknesses. The studies were generally descriptive and anecdotal, yet all were prospective, having been undertaken as part of the process of getting real research projects off the ground. The ten UK studies were undertaken before the introduction of the UK MREC system in 1997 and before the governance framework required a single ethics opinion. Response rates from RECs were high because the committees were already required to respond to all applications, although they did not seem to have been aware that their “performance” was being recorded. The sample of RECs in each study was a convenience sample, making general conclusions impossible, although two studies sought data from an apparently complete population (1;27). Even then, however, it was not possible to draw general conclusions about the individual RECs involved because each of them reviewed only one protocol.
Incompatible Judgments
Of these seventeen studies, only five reported that some RECs or IRBs approved while others rejected the same protocol. Two studies, both undertaken in 1996, submitted questionnaire or interview protocols and found that 13 percent (3 of 24) and 17 percent (2 of 12) of RECs, respectively, rejected the protocol, while the remainder approved it either straight away or after some revision (8;29). More recently, two studies, both conducted in the United States, reported, first, that an observational study of physicians' learning of and adherence to practice guidelines was exempted from ethics review in 2 percent (1 of 43) of cases, approved in 95 percent (41 of 43), and rejected in 2 percent (1 of 43) (18), and, second, that another observational study, using a database of patients undergoing ventral hernia repair, was granted ethics approval at only 82 percent (14 of 17) of sites (33).
The fifth study compared ethics committees across Europe and found that ethics review of a trial of a leaflet designed to increase elderly people's participation in general practitioner (GP) consultations was required in only 27 percent (3 of 11) of the countries sampled (22).
Protocol Revisions
All seventeen studies reported differences between RECs in the questions they asked of researchers, or the issues they raised with them, before giving final approval. One study did not specify exactly what changes different RECs had required, yet documented that only 22 percent (2 of 9) of RECs asked for amendments before giving their approval (4). Stair et al. (2001) also reported that 91 percent (40 of 44) of IRBs required further amendments (31).
In addition, 38 percent (9 of 24) of RECs in one study wanted evidence of local support for the researcher, while 8 percent (3 of 36) and 18 percent (2 of 11) in two others simply wanted to know more about the local researcher (2;22;29). In one study, 5 percent (1 of 19) of RECs asked for a question to be removed from the questionnaire, apparently for local reasons, a change that could have weakened the protocol scientifically (1).
Consent and Patient Information Sheets
Eight studies reported differences in approach to consent, despite apparently clear guidance in the Declaration of Helsinki (2000) (36). On submitting a multicenter survey of patients with oral cancer, Ah-See et al. (1) reported that 37 percent (7 of 19) of RECs asked for the patient information sheet to be reworded and 5 percent (1 of 19) even asked for the title to be changed. In three further studies, 25 percent (6 of 24), 17 percent (5 of 30), and 13 percent (21 of 162) of RECs had concerns about the information sheets (21;27;29), while in another study, on the feasibility of creating a fatal asthma registry, each of the RECs (4 of 4) had different concerns (7).
One study showed that 53 percent (23 of 43) of IRBs required separate consent from doctors before sending them a questionnaire about their learning of, and adherence to, practice guidelines; 76 percent of these IRBs required at least one resubmission, but only 12 percent required substantive revisions, the others being regarded as merely editorial (18). In one example reported in this study, the same IRB reviewed the protocol for two different sites, approving it without consent the first time but requiring revision to include formal consent the second time.
In relation to vulnerable groups, 57 percent (8 of 14) of RECs requested documented consent for a survey of the mental health needs of a juvenile population (17). In addition, 17 percent (2 of 12) required consent for the use of routine, confidential records of psychiatric patients (14), while 3 percent (2 of 58) required parental consent for the use of routine confidential data relating to their babies (6). Another study reported that 33 percent (1 of 3) of RECs approved a survey on domestic violence without requiring separate consent from responding healthcare workers or patients (23).
Recruitment
Thirty-seven percent (7 of 19) of RECs in one study insisted that potential participants should initially be contacted by their own general practitioners (GPs) (1), while in another only 25 percent (6 of 24) of RECs asked for initial contact with parents of babies treated in specific neonatal units to come from the GP (29). In a third study, the patients' invitation letter had to be reworded (22). In another two studies, 3 percent (2 of 58) and 5 percent (2 of 43) of RECs, respectively, wanted participants' GPs to be informed that the patients were taking part in the research (6;34).
In one study, 13 percent (3 of 24) of RECs asked for changes to eligibility criteria: 67 percent (2 of 3) of these wanted bereaved parents to be excluded from a study of neonatal units, and the remainder wanted all parents to be excluded (29). Another study reported that 33 percent (1 of 3) of RECs raised concerns about approaching patients in a waiting room for a survey on domestic violence (23), while in a study of fatal asthma each IRB (4 of 4) required researchers to adopt a different approach to contacting next of kin (7).
Ah-See et al. (1) found that only 5 percent (1 of 19) of RECs insisted that those who did not respond to an initial invitation to take part in a questionnaire survey should not be sent a repeat invitation or otherwise followed up. In another study, 37 percent (11 of 30) of RECs asked how participants in a survey of the physical fitness of apparently healthy volunteers would be contacted if the survey showed them to be in need of medical attention (21).
Risks and Expected Benefits
In one study, 36 percent (4 of 11) of IRBs did not consider an observational study of adolescents with HIV to pose more than “minimal risk” to its subjects, while 9 percent (1 of 11) judged it to pose greater than minimal risk for healthy controls only, and the remainder judged it to pose more than minimal risk for all subjects (30).
In one case reported by Vick et al. (33), the same IRB reviewed the questionnaire protocol for two different sites and regarded it as “minimal risk” the first time but as greater than minimal risk the second time. In a third study, 2 percent (1 of 43) of IRBs rejected a questionnaire study as “too risky” (18).
Compensation Arrangements
Harries et al. (21) found that 17 percent (5 of 30) of RECs had questions about compensation for injury in a survey of physical fitness among healthy volunteers.
Scientific Issues
Middle et al. (27) reported that 13 percent (21 of 162) of RECs had concerns about the aims of a postal survey of birth weight and about the questionnaires to be completed by GPs, parents, and teachers. Five percent (1 of 19) of RECs in one study, by Ah-See et al. (1), and 33 percent (1 of 3) in another, by Hirshon et al. (23), asked for power calculations for multicenter surveys.
Local RECs Reviewing One Protocol Already Approved by Another REC
Quality of Evidence
There were only four studies that investigated how local RECs (LRECs) reviewed a protocol that had already been approved by a multicenter REC, so that the LRECs were charged with examining only “local” issues. Again, there were various methodological weaknesses in this design and in the individual studies that used it. The studies were descriptive and anecdotal; again, most were undertaken as part of real multicenter research projects and may similarly reflect the actual experiences of researchers in the United Kingdom after the introduction of the MREC system in 1997. The LRECs in the studies would have known that the protocol had already been approved by an MREC, and they would also have seen the protocols and the MREC's comments, as was the case before a policy change in 2001 restricted the information available to LRECs (10). The studies were each apparently prospective, and the LRECs involved did not seem to have been aware that their “performance” was being recorded. The sample of LRECs in each study was again a convenience sample, making general conclusions impossible, and because each LREC reviewed only one protocol, no conclusions could be drawn about the individual committees involved. Two of the studies made formal statistical comparisons (26;32).
Protocol Revisions
Although there were no reports of some RECs approving while others rejected the same protocol, all four studies reported differences in the revisions that committees required.
Lux et al. (2000) found that 36 percent (36 of 99) of LRECs approved a study of spasms in infants without further revision (26), while Lewis et al. (25) reported that 62 percent (33 of 53) of LRECs approved a genetic study of tuberous sclerosis without further amendments.
In another study, 6 percent (8 of 125) of LRECs wanted to know more about the local researcher, while 3 percent (4 of 125) wanted approval from their local NHS Trust R&D office (32). The authors of that study also suggested that 67 percent (84 of 125) of LRECs raised issues that were not even “local.”
Patient Information Sheet
In the four studies, 68 percent (30 of 44), 22 percent (28 of 125), 8 percent (4 of 53), and 7 percent (1 of 15) of RECs had concerns about the information sheets, although it is impossible to tell how substantive these concerns were (19;25;31;32).
Recruitment
In one study, 10 percent (13 of 125) of LRECs asked for the ethnic mix of participants to be considered (32). Two studies reported requests for new exclusions: 2 percent (3 of 125) of RECs wanted patients who were already participating in other, unrelated research projects to be excluded (32), and 9 percent (4 of 44) of local IRBs wanted pregnant women to be excluded from a multicenter clinical trial of an asthma drug (31).
RECs/IRBs Reviewing More Than One Protocol
Quality of Evidence
Five studies used more than one protocol to investigate differences between RECs, but each used very different methods to gather its data. Two studies, Burman et al. (5) and Goldman and Katz (19), simply investigated how RECs reviewed three protocols instead of just one. Another study, Harding and Ummel (20), convened a “mock” or simulated REC from a pool of existing committee members to review eight protocols that had already been approved by real RECs, apparently without telling the members so. The simulated setting may have made the data somewhat artificial, although the participants were real REC members. Kent (24) sought to obtain retrospective data on 50 protocols by formally analyzing, with statistical methods, the correspondence between different RECs and the researchers applying to them. Lastly, Dal-Ré et al. (9) recorded ecological data on a group of RECs reviewing 100 different protocols from a single drug company, so it is not possible to say which individual protocols led to differences in review between individual RECs. In addition, no details of the proposed research were given except for two of the protocols.
Protocol Revisions
There were no incompatible judgments reported in any of these studies, although there were variations in the revisions required.
Patient Information Sheet
In one study, 25 percent (1 of 4) of the RECs were significantly (p < .05) more likely than the others to ask for changes (24). In another study, the number of changes to consent forms required by local IRBs ranged from 3 to 160 (5). In the researchers' assessment, 41 percent (21 of 50) of these locally approved consent forms had an inappropriately high reading grade level. Most changes altered wording without changing meaning. Errors were commonly introduced (11.2 percent of changes), and 55 percent (33 of 50) of the forms contained at least one error in protocol presentation or in a required consent form element. A third study reported that 18 percent (3 of 17) of RECs asked for substantive changes to make patients more aware of available alternatives (9).
Risks and Expected Benefits
In the study by Goldman and Katz (19), 14 percent (3 of 22) of RECs objected to the degree of risk associated with a trial of intramuscular injections in adults.
Compensation Arrangements
Goldman and Katz (19) showed that 32 percent (7 of 22) of RECs objected to the level of compensation offered to those injured in a trial of intramuscular injections.
Scientific Issues
All RECs in one study raised methodological issues with researchers, although it is unclear from the reported data whether there were any differences among them (24). In another study, 68 percent (15 of 22) raised methodological queries over two trial protocols, of which 4 percent thought the process of randomization was not adequate in one protocol and 4 percent thought the end point was not clear enough in the other protocol (19). On reviewing a third protocol, 82 percent (18 of 22) had methodological queries, of which 50 percent related to the control (19).
Placebo Controls
In the Goldman and Katz (19) study, 36 percent (8 of 22) of RECs questioned the ethics of using a placebo control arm in a trial of tamoxifen. The study was published before the Declaration of Helsinki included its current requirement that controls receive the best available standard therapy, under which placebos are ethically justified only when no standard treatment is available.
DISCUSSION
The data show that there are indeed important variations in the judgments made by RECs and IRBs, although there were surprisingly few instances in which some RECs approved while others rejected the same protocol. Most variation concerned the revisions required of researchers before final approval. These differences, however, covered many issues, including the consent process, recruitment procedures, level of risk, compensation arrangements, and scientific validity. The REC or IRB samples were generally small convenience samples, and the reports were mostly anecdotal, making it impossible to generalize. Firm conclusions are therefore difficult to draw, and there is an urgent need for good-quality research in this area.
Many of the protocols under review involved analyzing records or conducting surveys; only a few of the earlier studies reported variation in the review of more invasive research. This finding could mean that the ethics of noninvasive research is not clear-cut and that more debate about it, or more training for RECs, is needed.
In addition, the studies were published over a long period, the first in 1982 and the most recent in 2006, with the remainder almost evenly distributed in between. There did not seem to be an obvious trend over time in the types of differences reported between committees, although reports of such variation continue, especially in the United States, where multisite research may still be reviewed separately by each participating institution. In any case, ethical judgments are not overtaken by regulatory change in the way that administrative requirements are, so the earlier studies remain important and relevant today. That said, there have been wider cultural changes toward respecting individual autonomy during this period. The data do not show an obvious corresponding trend: one study published in 2006 reported an IRB paternalistically rejecting as “too risky” an observational study of doctors' learning of, and adherence to, practice guidelines (18).
Interest in differences between RECs was concentrated in the United Kingdom, with ten studies (1–3;6;8;17;21;22;24–26;27;29;32;34), and in the United States, with nearly as many (eight) (5;7;18;19;23;30;31;33); there was only one study across Europe (22), one in Spain (9), and one in Switzerland (20). Interestingly, there were no obvious themes according to where the studies were conducted. It should also be noted that an absence of reported differences does not show that RECs make similar judgments; although the requirement for a single opinion in each European Member State makes drug trials easier to get approved, differences in RECs' values may still be concealed.
Differences between RECs can sometimes be justified on local grounds. One study, for example, reported requests to recruit the right ethnic mix, probably so that participants would represent the local population (32). However, at least in the United Kingdom, there is now less emphasis on catering for a local population: protocols are allocated centrally to any REC in the country, and locality issues are sometimes taken out of the ethics remit altogether (12). Local conditions remain important ethical considerations and should not be sidelined in pursuit of greater “consistency.”
Not all variation can be justified in this way. For example, in a study published in 1996, people who might become distressed by the offer of research participation were sometimes excluded on paternalistic grounds and sometimes not (29). Controversially, the very existence of ethics review by committees, which are made up of people with different backgrounds, expertise, and values, may explain and even justify some differences in RECs' values (14). Indeed, it may be surprising that an REC is able to reach a formal consensus at all, although it is not only the outcome that may be ethically significant but also the process through which the judgment is made. With variation comes game playing: researchers can sometimes unashamedly choose the REC they think will be most favorable. Policy makers might want to address this problem to make the system fairer and more impartial.
While the administrative arrangements are now more clearly laid out, with target time scales stipulated and standard forms provided, there is still little understanding of how such committees actually make the sorts of judgments required of them. In particular, we need research to help identify the sources of any aberrations, distortions, or confusions that could arbitrarily affect judgments. In the last analysis, an REC must legally behave “reasonably” within its terms of reference, be prepared to answer an appeal from a researcher, and fully justify its judgment when challenged.
POLICY IMPLICATIONS
- The studies reviewed above simply record differences in the judgments that research ethics committees make; research is urgently needed to show how ethics committees actually make such judgments. In particular, there is a need to help identify the sources of any aberrations, distortions, or confusions that could arbitrarily affect their judgments.
- Local issues remain an important ethical consideration and should not be ignored in the drive for greater “consistency.” Local ethics committees need to see all the relevant information upon which to make a judgment.
- Policy makers may want to address the ways in which variation in the system provides opportunities for researchers to play games. Allocation of projects to ethics committees should be fair and impartial, and researchers should perhaps be denied the opportunity to choose which committee reviews their work.
CONTACT INFORMATION
Sarah J. L. Edwards, BSc MA PhD (sarah.edwards1@uclh.org), Senior Lecturer, Centre for Bioethics and Philosophy of Medicine, University College London, Gower Street, London WC1E 6BT, UK; Senior Lecturer, Joint Research Unit UCLH/UCL, University College London Hospitals NHS Trust, Tottenham Court Road, London WC1E 5DB, UK
Tracey Stone, BA, PhD (t.stone@bristol.ac.uk), Research Assistant; Teresa Swift, BA, MSc (t.swift@bristol.ac.uk), PhD student; Centre for Ethics in Medicine, Bristol University, St. Michael's Hill, Bristol BS2 8BH, UK
We thank Professors Iain Chalmers, Richard Lilford, and Andrew Stevens for their helpful comments on the layout of this paper.