The decision-making process of practising veterinarians as well as farm personnel and animal scientists should be based on objective information (Holmes & Cockcroft, 2004). Therefore, the implementation of evidence-based medicine is becoming increasingly important. Sackett et al. (1996) defined evidence-based medicine as the conscientious, explicit and judicious use of the best external evidence currently available, for the purpose of making decisions concerning the medical care of individual patients. The term ‘evidence’ denotes the degree of certainty with which the results of a study reliably represent reality (Arlt & Heuwieser, 2005).
However, Holmes & Cockcroft (2004) have postulated that there is a dearth of methodologically rigorous, large-scale clinical studies in veterinary medicine, resulting in a lack of high-evidence research results. This hypothesis has been supported by several authors (Mair & Cohen, 2003; Arlt et al. 2010) who demonstrated that in veterinary medicine the increase of knowledge is mainly based on reviewing field reports rather than randomized, controlled clinical studies. Nevertheless, researchers seem to have become increasingly aware of this problem. Several authors have developed guidelines on how to conduct high-quality studies or meta-analyses (e.g. Lean et al. 2009), and many well-conducted studies eligible for meta-analyses have been published in recent years (Rabiee et al. 2005; Lean et al. 2006). Randomized, controlled, double-blinded studies remain the gold standard for the evaluation of a given treatment (Kastelic, 2006).
The quality of a study depends on its design, its clinical relevance, the analysis of the study results, and the quality and comprehensiveness of the reporting (Arlt & Heuwieser, 2005). Four stages of evidence have been suggested to categorize studies with respect to their quality (Bassler & Antes, 2000). Stage I represents the highest level of evidence and refers to meta-analyses of randomized, controlled studies or evidence gained from at least one randomized, controlled study. Well-designed, controlled studies without randomization and well-designed, quasi-experimental studies generate evidence of stage II. Evidence categorized as stage III is obtained through well-designed, descriptive studies that are not experimental. Finally, the lowest stage of evidence (stage IV) covers opinions of experts, results presented at scientific meetings and the clinical experience of accredited authorities.
To improve the quality of publications, scientists have developed checklists containing important aspects for conducting trials. For instance, the CONSORT and PRISMA statements aim to improve the reporting of randomized clinical trials and of systematic reviews and meta-analyses, respectively. The REFLECT statement is a modification of the CONSORT statement for livestock and food safety trials in veterinary science. Besides the lack of high-evidence studies, there is a marked variation in the quality of studies in veterinary science, resulting in insufficient comparability of the various trials (Cockcroft & Holmes, 2003).
Clinical endometritis in cattle is defined as the presence of a purulent (>50% pus) uterine discharge detectable in the vagina 21 d or more post partum, or mucopurulent (approximately 50% pus, 50% mucus) discharge detectable in the vagina after 26 d post partum (Sheldon et al. 2006). Most recently, following a study by Runciman et al. (2008), which aimed to evaluate the role of vaginoscopy in predicting a reduction in reproductive performance parameters associated with a positive discharge detected by vaginoscopy, cytological endometritis diagnosed with a cytobrush and purulent vaginal discharge diagnosed with the Metricheck device have been described as distinct manifestations of uterine inflammation (Dubuc et al. 2010b). The prevalence of post-partum uterine infections (up to 57·7%) (Sheldon, 2009) and the resulting opportunity costs (decreased fertility, increased culling) underline the importance of this disease (Plaizier et al. 1998; LeBlanc et al. 2002a; LeBlanc, 2008).
There is a wealth of information on the treatment of endometritis and this subject has been reviewed extensively by several authors (Gilbert & Schwark, 1992; Olson, 1996; Azawi, 2008). However, the treatment of endometritis is still an issue of considerable controversy (Arlt et al. 2009; Dubuc et al. 2011). This may be due to the wide variety of therapies available for endometritis, including systemic or local antibiotics, prostaglandin F2α (PGF2α) and oestradiol.
Numerous studies have been conducted to evaluate the effect of a treatment with PGF2α or its analogues within 40 d of calving on reproductive performance of dairy cows. It is noteworthy that there is a wide disparity between the results (Burton & Lean, 1995). Young et al. (1984), for instance, reported a significant improvement in the first service conception rates of cows given PGF2α, whereas a study conducted by Macmillan et al. (1987) and including 1813 cows could not support these findings.
To shed some light on this issue, the overall objective of this study was to evaluate the quality and comparability of published literature and to summarize the effect of PGF2α for the treatment of endometritis. Specifically, we set out to test two hypotheses: (1) published studies are diverse with respect to relevant quality criteria such as control groups, blinding, randomization, and sample size and (2) the majority of trials reveal an improvement in reproductive performance through the application of PGF2α to cows with endometritis.
Materials and Methods
A comprehensive literature search was conducted on 4 August 2010 using the search engine Vetseek (http://www.vetseek.info) and the databases Pubmed (http://www.pubmed.gov), Medline (http://www.medline.de), and Animal Production (http://www.ovid.com/site/catalog/DataBase/22.jsp) to identify literature related to the treatment of endometritis with prostaglandin in dairy cattle. The subject headings ‘endometritis AND cattle’ and ‘endometritis AND cattle AND prostaglandin’ were used to include all articles written in English or German addressing the treatment of bovine endometritis with PGF2α. In addition, a systematic review of citations in the retrieved papers was carried out. We defined specific exclusion criteria so that only studies focusing on chronic endometritis were included, i.e. presence of a purulent (>50% pus) uterine discharge detectable in the vagina 21 d or more post partum (Sheldon et al. 2006). Furthermore, we excluded studies in which the animals received concomitant treatments with medications other than PGF2α. Book chapters, case studies, review articles and abstracts were also excluded, as were publications describing aetiological, epidemiological, microbiological or nutritional results, clinical symptoms or diagnostic procedures. Articles not meeting the inclusion criteria, owing to wrong indexation, and those not obtainable through the internet, bibliographies or interlibrary loan services were excluded as well. If multiple publications were retrieved describing the same trial, those containing the least information were regarded as duplicates and excluded. Retrieval and management of references was performed with Endnote (Version X3 for Windows, Thomson Reuters, New York, NY, USA).
The remaining publications were evaluated according to various evidence parameters utilizing an evaluation form developed by Arlt & Heuwieser (2010) and recently validated by Simoneit et al. (2011). Relevant criteria of the study design such as sample size, the involvement of control groups, either untreated, placebo-treated or treated with a drug other than PGF2α, blinding and randomization were considered. Furthermore, type and definition of endometritis, diagnostic methods, the drug and dosage applied, route of application, number of treatments, treatment time relative to calving, and reproductive performance parameters, i.e. calving to first service interval, calving to conception interval and conception rate, were documented in a spreadsheet. Descriptive statistics were compiled using SPSS for Windows (Version 18.0; SPSS Inc., Munich, Germany).
Results
In total, 4393 publications were retrieved (Vetseek, 2369; Pubmed, 570; Medline, 565; Animal Production, 889). After excluding duplicates (n=1670), 2723 publications remained. According to the exclusion criteria, 2662 indexed articles had to be excluded, resulting in 61 remaining publications which comprised 63 individual trials. Because 4 further articles were retrieved through hand searching, a total of 65 publications, comprising 68 trials, met the inclusion criteria and were suitable for further analysis (Table 1).
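The screening arithmetic reported above can be retraced with a short script. This is only a bookkeeping sketch: the variable names are illustrative, and the counts are taken directly from the paragraph above.

```python
# Counts as reported in the Results paragraph above.
retrieved = {
    "Vetseek": 2369,
    "Pubmed": 570,
    "Medline": 565,
    "Animal Production": 889,
}
duplicates = 1670
excluded_by_criteria = 2662
hand_search_publications = 4

total_retrieved = sum(retrieved.values())
after_deduplication = total_retrieved - duplicates
after_screening = after_deduplication - excluded_by_criteria
final_publications = after_screening + hand_search_publications

print(total_retrieved, after_deduplication, after_screening, final_publications)
# prints: 4393 2723 61 65
```

The four printed values match the totals stated in the text (4393 retrieved, 2723 after removing duplicates, 61 after applying the exclusion criteria, 65 after hand searching).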
Table 1. Research articles (n=68) studying the efficacy of PGF2α treatment of chronically endometritic cows chosen for evaluation
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921121516-61017-mediumThumb-S002202991200012X_tab1.jpg?pub-status=live)
According to the definition of chronic endometritis by Sheldon et al. (2006), diagnosis had to be conducted later than 20 d post partum. However, several studies complied only partly with this time period (n=20), whereas others did not provide an exact time of diagnosis at all (n=16). To account for this variation, we decided to subdivide the studies according to their date of diagnosis.
More than half of all 68 trials (51·5%) were older than 20 years (Table 2). Trials that did not give a specific definition of endometritis made up 23·5% of all trials analysed. We found a sample size smaller than 50 in 16·2% of all trials and about one-third of the studies (36·8%) had included more than 200 animals. Overall, 70·6% of all trials included a control group (Table 3). Expressed as a share of all trials, 60·3% had a positive (a drug other than PGF2α), 22·1% an untreated, and 5·9% a placebo-treated control group; because these shares add up to more than 70·6%, some trials evidently included more than one type of control group. In 41·2% of all trials, the authors stated that allocation to treatment and control group had been conducted in a random manner. In 11·8% of all studies randomization was computerized, whereas 16·2% allocated the cows enrolled in alternating order or by ear-tag numbers, and 13·2% did not offer any details concerning the mode of randomization. Our investigation revealed that 38·2% of all trials were controlled and randomized. In this context, control groups were either untreated, placebo-treated or treated with a drug other than PGF2α. Trials were considered randomized if the animals had reportedly been allocated to the various groups in a random manner, i.e. by chance. Only 3 of those (4·4% of all trials) were also blinded.
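Because all percentages above refer to the same 68 trials, the underlying trial counts can be approximately recovered by rounding. The following sketch does this; the counts are inferred from the reported percentages and are not stated in the text, and the helper name `implied_count` is purely illustrative.

```python
N_TRIALS = 68  # number of trials evaluated, as reported in the text


def implied_count(percent, n=N_TRIALS):
    """Return the whole number of trials whose share of n is closest to `percent`."""
    return round(percent * n / 100)


# Randomization: 41.2% randomized, split by reported method.
randomized = implied_count(41.2)
by_method = [implied_count(p) for p in (11.8, 16.2, 13.2)]

# Control groups: 70.6% controlled, split by type of control group.
controlled = implied_count(70.6)
by_control_type = [implied_count(p) for p in (60.3, 22.1, 5.9)]

print(randomized, by_method, sum(by_method))
print(controlled, by_control_type, sum(by_control_type))
```

Under this reading, the three randomization methods add up exactly to the number of randomized trials, whereas the three control-group types add up to more than the number of controlled trials, which suggests that some trials used more than one type of control group.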
Table 2. Relevant criteria of 68 trials studying the efficacy of PGF2α treatment of chronically endometritic cows
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921121516-92702-mediumThumb-S002202991200012X_tab2.jpg?pub-status=live)
† According to Sheldon et al. (2006)
Table 3. Relevant criteria of 68 trials studying the efficacy of PGF2α treatment of chronically endometritic cows
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921121516-75391-mediumThumb-S002202991200012X_tab3.jpg?pub-status=live)
† According to Sheldon et al. (2006)
Among the 68 trials, a wide variety of methods to diagnose the disease in question could be identified (Table 4). However, 13·2% did not specify the particular method used.
Table 4. Relevant diagnostic and therapeutic criteria of 68 trials studying the efficacy of PGF2α treatment of chronically endometritic cows
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921121516-78309-mediumThumb-S002202991200012X_tab4.jpg?pub-status=live)
† According to Sheldon et al. (2006)
‡ External inspection of the vulvar region
Concerning reproductive performance parameters, only 19·1% of the studies provided a concise definition. A closer look at the individual parameters shows that the authors’ conclusions concerning an improvement are only partly supported by statistical significance (Table 5).
Table 5. Reproductive performance parameters described in 68 trials studying the efficacy of PGF2α treatment of chronically endometritic cows
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160920232452083-0823:S002202991200012X:S002202991200012X_tab5.gif?pub-status=live)
† According to the authors’ conclusion
Similar observations were made when assessing reproductive performance parameters examined in randomized and controlled trials as well as randomized, controlled, and blinded trials (Table 6).
Table 6. Reproductive performance parameters described in 68 trials studying the efficacy of PGF2α treatment of chronically endometritic cows
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921121516-84753-mediumThumb-S002202991200012X_tab6.jpg?pub-status=live)
Considering high-level evidence studies for which a conception rate was calculated (n=16), 7, 7 and 2 articles revealed a positive, no or a negative effect, respectively (Table 7). Of those 7 articles revealing a positive effect, only 3 showed statistical significance. Twenty-two of thirty articles which were attributed to a moderate or high evidence level did not demonstrate a statistically significant effect of a PGF2α treatment. Six low-quality papers concluded a positive effect without statistical validation.
Table 7. Effects on conception rate after PGF2α treatment described in trials studying the efficacy of PGF2α treatment of chronically endometritic cows
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160920232452083-0823:S002202991200012X:S002202991200012X_tab7.gif?pub-status=live)
† According to Bassler and Antes (2000)
‡ Level 1, i.e. randomized, controlled trials
§ Level 2, i.e. well designed, controlled studies without randomization
¶ Level 3, i.e. well designed, descriptive non-experimental studies
Discussion
The checklists introduced above (CONSORT, PRISMA and REFLECT) may also be useful for critical appraisal of published reviews, but they are not explicitly designed as quality assessment instruments to determine the quality of articles (Simoneit et al. 2011). Hence, for our study, we applied the evaluation form designed by Arlt et al. (2010).
One might question whether studies older than 20 years should be included in such a systematic literature assessment, since the dairy industry has changed considerably, particularly with respect to housing, feeding, and management (LeBlanc et al. 2006). In the last two decades milk yield has increased by 39·2% (1990: 6705 kg/cow; 2009: 9333 kg/cow) (Blayney, 2002; USDA, 2010) and 45·2% (1990: 4857 kg/cow; 2010: 7050 kg/cow) (BMELV, 1992; ADR, 2010) in the United States and Germany, respectively. Thus, it can be questioned whether trials conducted more than 20 years ago provide adequate evidence for specific recommendations in livestock health care today. On the other hand, these studies provided observational evidence for the efficacy of PGF2α and were a relevant component of the development of the current best-practice standard.
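The relative increases in milk yield quoted above follow directly from the yields given in parentheses. A minimal check, with an illustrative helper name:

```python
def percent_increase(old, new):
    """Relative change from old to new, expressed in percent."""
    return 100 * (new - old) / old


us_increase = percent_increase(6705, 9333)       # United States, 1990 vs 2009
german_increase = percent_increase(4857, 7050)   # Germany, 1990 vs 2010

print(f"{us_increase:.1f} {german_increase:.1f}")
# prints: 39.2 45.2
```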
The majority of the trials evaluated provided insufficient detail on study design. In 16·2% of the trials the total sample size was smaller than 50, and only 36·8% had included more than 200 animals. More important than the absolute number of animals included is the question of whether the sample size of each group was large enough to test the proposed research hypothesis. None of the authors, however, mentioned a calculation of sample size for the study. Therefore, a final evaluation of the adequacy of the sample size is not possible.
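To illustrate what an a priori sample size calculation might look like, the following sketch applies the conventional normal-approximation formula for comparing two proportions. The conception rates (40% vs 50%), the significance level and the power are purely hypothetical assumptions chosen for illustration; none of them is taken from the trials reviewed here, and the function name is illustrative as well.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Animals needed per group to detect p1 vs p2.

    Two-sided test, equal group sizes, normal approximation.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical scenario: conception rate of 40% in controls vs 50% after treatment.
n_per_group = sample_size_two_proportions(0.40, 0.50)
print(n_per_group)
```

Under these assumed values the formula calls for several hundred animals per group, which may help to put the reported sample sizes into perspective. A smaller expected difference would require considerably more animals.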
Our investigation revealed that only about one-third of all trials were controlled and randomized. The mean sample size of those trials was 165·3±99. Untreated or placebo-treated control groups were included in only 28% of all trials. An overall shortage of randomized, controlled trials in veterinary medicine was also described by Kastelic (2006). This deficiency might be due to the high costs involved and ethical issues related to leaving diseased animals untreated.
A computer-generated random allocation of animals to treatment groups was implemented in only about a fourth of the randomized trials. The other trials allocated their animals according to ear-tag numbers or in an alternating order or did not offer any details concerning the mode of randomization at all.
According to Lund et al. (1994), a truly random allocation scheme (assured by computer or random number table) implies a predetermined probability for every potential study subject for assignment to a treatment group. In contrast, systematic assignments, e.g. based on days of the week, are not recommended because they are vulnerable to manipulation. Allocation to study groups based on ear-tag numbers, however, can be considered as random because those numbers are assigned at birth and thus long before the study and without any foreknowledge of it.
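The difference between a computer-generated allocation and a systematic one can be sketched as follows. The ear-tag numbers, group labels and function names are hypothetical and serve only as an illustration; this is not the procedure of any of the trials reviewed.

```python
import random


def allocate_randomly(ear_tags, groups=("PGF2a", "control"), seed=None):
    """Shuffle the animals with a pseudo-random generator, then deal them into groups."""
    rng = random.Random(seed)
    shuffled = list(ear_tags)
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for position, tag in enumerate(shuffled):
        allocation[groups[position % len(groups)]].append(tag)
    return allocation


def allocate_alternating(ear_tags, groups=("PGF2a", "control")):
    """Systematic assignment in order of enrolment; the next assignment is predictable."""
    allocation = {group: [] for group in groups}
    for position, tag in enumerate(ear_tags):
        allocation[groups[position % len(groups)]].append(tag)
    return allocation


# Hypothetical ear-tag numbers in order of enrolment.
cows = [1042, 1057, 1103, 1118, 1160, 1201, 1233, 1290]
random_allocation = allocate_randomly(cows, seed=2010)
alternating_allocation = allocate_alternating(cows)
print(random_allocation)
print(alternating_allocation)
```

Both schemes yield groups of equal size, but only in the alternating scheme can the person enrolling a cow know in advance which group she will enter, which is what makes systematic assignment vulnerable to manipulation. Fixing the seed merely makes the random sequence reproducible for documentation purposes.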
We speculate that the allocation of animals in studies that claimed to be randomized but did not provide any information about the method (13·2%) should a priori be considered as not randomized but as haphazard. However, missing data, especially regarding older studies, could be due to incomplete reporting. Therefore, studies with missing data should not a priori be judged as low quality.
Randomized, controlled clinical trials provide the highest validity of results obtained (Schulz et al. 1995). However, this specific study design is not applicable to every question (Antes, 1998; Smith & Pell, 2003). For example, it might be unethical to include an untreated control group if that would inevitably imply serious distress, suffering or even death for the animals involved (Sayre et al. 2010). The process of randomization helps to assure that treatment groups are comparable with respect to known and unknown factors that could influence the primary outcome variable of the study (Lund et al. 1994). Our finding that only 38·2% of the trials were controlled and randomized clearly limits the strength of evidence for PGF2α as a treatment of bovine endometritis.
Conclusions or treatments inferred from uncontrolled and unrandomized trials are in general less likely to be true than those based on randomized controlled trials. In our analysis, we found a considerable percentage (25·0%) of uncontrolled studies that described reproductive performance parameters. Drawing inferences and making treatment decisions based on such results, however, should be done with caution. A wide variety of diagnostic methods were applied in the 68 trials evaluated. It has been demonstrated that different methods to diagnose endometritis differ in their sensitivity (Drillich et al. 2002; LeBlanc et al. 2002b).
The classification into cytological endometritis and purulent vaginal discharge has been described only recently (Runciman et al. 2008; Dubuc et al. 2010a). Owing to a shortage of studies based on this new classification, we decided to use the definition by Sheldon et al. (2006).
Specific and repeatable exclusion criteria were defined to exclude studies that did not focus on chronic endometritis, i.e. endometritis diagnosed 21 d or more after parturition (Sheldon et al. 2006). Several studies, however, complied only partly with this time period and had also enrolled cows earlier, whereas others did not provide an exact time of diagnosis at all. Therefore we decided to classify studies according to their time of diagnosis. This classification was important because several authors observed a significant self-cure rate in cows with chronic endometritis during the first weeks post partum. The self-cure rate ranged from 92% in the first week to 25% in the seventh week post partum (Falkenberg & Heuwieser, 2005; Hirsbrunner et al. 2006). Considerable inconsistency also existed regarding the calculation of the pregnancy outcome. Some authors assessed overall conception rates, whereas others calculated a first service conception rate or a pregnancy rate. Overall, only 19·1% of all studies provided a concise definition of the reproductive performance parameters used. In addition, a specific definition of endometritis was not given in 23·5% of all trials analysed. This lack of homogeneity to some extent limits the comparability of study results.
Our results demonstrate that a large percentage of studies addressing the efficacy of PGF2α are severely flawed in their study design, and that comparability between publications is limited owing to considerable differences between them. In human medicine, the appraisal of available literature has been examined intensively within the framework of evidence-based medicine (EBM). Arlt & Heuwieser (2010), however, point out the need for further appraisal of scientific publications in veterinary medicine.
One objective of the study was to summarize the effect of PGF2α for the treatment of endometritis. Among the trials assessing reproductive performance parameters (calving to first service interval, calving to conception interval, conception rate), statistically significant effects on reproductive performance were reported in only a small fraction. Twenty-two of 37 studies that evaluated conception rate were attributed to a moderate or high evidence level and did not show any statistically significant effect of PGF2α treatment (Table 7). A positive effect was revealed by 21 articles. Of those, only 6 reported statistical significance. However, several authors stress that factors such as the time of diagnosis (LeBlanc et al. 2002b), the severity of endometritis, or the additional occurrence of other puerperal disorders (Burton & Lean, 1995) may influence the efficacy of a PGF2α treatment. Based on our results, we note a tendency of low-quality papers to conclude a positive effect without statistical validation (n=6). However, it is important to emphasize that low-quality trials do not necessarily show larger effects of a certain intervention (Kunz & Oxman, 1998). We conclude that the evidence for the efficacy of PGF2α for the treatment of chronic bovine endometritis is limited. In combination with the most recent results (Dubuc et al. 2010b), we suggest that the use of PGF2α as a standard treatment for endometritis should be critically reconsidered. Further research in the form of controlled, randomized and blinded trials is required to assess and quantify the efficacy of this treatment.