Most scientists report having failed to replicate other researchers’ preclinical results, and promising preclinical results often fail to transfer to clinical studies in humans. In basic neuropsychiatric research, for example, 150 preclinical anti-stroke drugs failed when tested in humans (Macleod, 2005), and of 244 promising drugs for Alzheimer’s disease, only 1 was approved after clinical trials in humans (Cummings et al., 2014). These and many other studies indicate that neuropsychiatric experimental animal science is in a so-called reproducibility and translational crisis that is more extensive than previously thought. As the name indicates, the crisis has two closely related elements. First, it is difficult to reproduce scientific results in the same species of experimental animal (typically mice or rats), which means that sensitive psychiatric animal models may respond differently from laboratory to laboratory. Second, promising results obtained in animal experiments often prove different and disappointing when tested in clinical psychiatric studies in humans (the so-called ‘bench to bedside’ crisis).
There appear to be several possible explanations for the crisis, such as inadequate experimental design, faulty randomization, statistics and reporting, and a lack of standardization in animal experimentation. Many scientific papers clearly contain too little information for other research groups to reproduce them fully. The origin and daily care of the animals are often not thoroughly described, and information on, for example, the acclimatization period and the animals’ health status is often missing. Even apparently small differences in procedure may be crucial: Hurst and West (2010) showed that the way mice were handled before an Open Field test affected their behaviour, a point of great importance for basic psychiatric animal research. As a result, a number of guidelines (e.g. ARRIVE) have been developed that list the information the methods section of a scientific article should contain. Such guidelines will undoubtedly benefit reproducibility in the future.
Even comprehensive guidelines such as ARRIVE do not cover all the information relevant to the quality of results. For example, a 2014 study published in Nature Methods showed that rodents respond differently in behavioural tests depending on whether the operator is a woman or a man (Sorge et al., 2014). Despite this, preclinical psychiatric studies are still reported without information on the gender of the staff, and operator gender is not even included in the ARRIVE guidelines. The ARRIVE guidelines should be updated regularly in line with the international scientific literature so that they include all factors known to be important for the experiments. Using the ARRIVE guidelines is likely to facilitate in-depth systematic reviews that can identify which factors are crucial in animal testing and which may be left out. In the end, this will undoubtedly increase the reproducibility of preclinical research.
The ARRIVE guidelines may also play an indirect role in solving the neuropsychiatric translational crisis. Systematic reviews of published papers may reveal which experimental set-ups are most likely to yield results that translate from animals to humans. For example, which model of depression best predicts whether a drug will have a clinical effect in humans? The ARRIVE guidelines can help ensure that the data needed for such systematic reviews are reported.
Are there other important reasons for the poor translational value of preclinical research? One may be the 3R concept; specifically, we point out that standardization of animal experiments is a possible problem. Standardization of laboratory animals, combined with inbreeding and microbiological harmonization, reduces intraspecies variation and thereby reduces the number of animals required for statistical adequacy in each study. This is in line with the 3R (replacement, reduction and refinement) concept, in which the second R concerns reducing the number of experimental animals. That goal has been achieved with historical success: far fewer experimental animals are used per study today than in the 1950s. However, the modern use of the 3Rs completely ignores the translation of animal studies to the human clinic, focusing exclusively on replacing animals and reducing their total number and suffering. This is paradoxical, because Russell and Burch, the architects of the 3R concept, made great efforts in their book The Principles of Humane Experimental Technique (1959) to define the characteristics of a useful animal model. After all, it is hardly ethical to use experimental animals if the results are not usable.
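The statistical link between lower variation and fewer animals can be made concrete with the common normal-approximation formula for the number of animals per group needed to compare two means: n = 2(z_{1-α/2} + z_{1-β})² σ²/δ², where σ is the standard deviation, δ the difference to detect, α the significance level and 1-β the power. The Python sketch below is only an illustration under that assumption; the function name `animals_per_group` and the effect size and standard deviations in the example are hypothetical and not taken from any study cited here.

```python
import math
from statistics import NormalDist


def animals_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided comparison of two means.

    Normal approximation (illustrative assumption, not an exact power analysis):
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2
    """
    std_normal = NormalDist()  # standard normal distribution (mean 0, sd 1)
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = std_normal.inv_cdf(power)          # quantile matching the desired power
    n = 2 * (z_alpha + z_power) ** 2 * (sigma / delta) ** 2
    return math.ceil(n)  # animals come in whole numbers, so round up


# Hypothetical example: detect a 5-unit difference at 5% significance and 80% power.
print(animals_per_group(delta=5, sigma=10))  # 63 per group
print(animals_per_group(delta=5, sigma=5))   # 16 per group
```

With these hypothetical values, halving the standard deviation from 10 to 5 cuts the estimate from 63 to 16 animals per group, because n scales with σ². This is the mechanism by which standardization serves the second R. For group sizes this small, a t-based power analysis would give somewhat larger numbers.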
Why, then, is reducing intraspecies variation so problematic for translation? Individual variation, in humans and animals alike, is a fundamental characteristic of all biology and a prerequisite for Darwinian evolution. Eliminating intraspecies variation in laboratory animals narrows, and thereby distorts, the range of responses to new psychiatric test substances. Preclinical studies that lack animal diversity risk misrepresenting the broad range of possible biological effects, for instance by detecting an effect that would prove only marginal across a more varied population. Consequently, drug candidates developed in laboratory animals may give disappointing and unforeseen results when later tested in humans.
Should preclinical researchers then abandon efforts to reduce the number of experimental animals altogether? No, but reduction should not come at the expense of invaluable intraspecies variability in response. Where possible, an optimal experimental design lets animals serve as their own controls, as in crossover, left–right or pre–post designs. Such a design is stronger than simply relying on a single inbred strain. Moreover, a recent study in Nature Methods found that using inbred rather than outbred mice did not reduce variation (and thus group sizes), in either behavioural or other traits (Tuttle et al., 2018). If inbred animals are needed, several different strains can advantageously be included in the same study, avoiding the pitfalls of relying on a single strain.
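The advantage of letting animals serve as their own controls can be sketched with the same normal-approximation formula. When each animal is measured under both conditions, the analysis uses the within-animal difference, whose variance is 2σ²(1 - ρ), where ρ is the correlation between an animal’s two measurements. The Python sketch below compares total animal numbers for a two-group (parallel) design and an own-control design. It is an illustrative assumption, not the authors’ method: it ignores carry-over and period effects, which a real crossover study must handle with washout periods and appropriate analysis, and the function names and the value ρ = 0.6 in the example are hypothetical.

```python
import math
from statistics import NormalDist


def z_sum(alpha=0.05, power=0.80):
    """Return z_{1-alpha/2} + z_{power} for a two-sided test (normal approximation)."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def total_animals_parallel(delta, sigma, alpha=0.05, power=0.80):
    """Two independent groups; each animal provides a single measurement."""
    per_group = math.ceil(2 * z_sum(alpha, power) ** 2 * sigma ** 2 / delta ** 2)
    return 2 * per_group


def total_animals_own_control(delta, sigma, rho, alpha=0.05, power=0.80):
    """Each animal is measured under both conditions (crossover, left-right, pre-post).

    The within-animal difference has variance 2 * sigma**2 * (1 - rho),
    where rho is the correlation between the animal's two measurements.
    Carry-over and period effects are ignored in this sketch.
    """
    var_diff = 2 * sigma ** 2 * (1 - rho)
    return math.ceil(z_sum(alpha, power) ** 2 * var_diff / delta ** 2)


# Hypothetical example: 5-unit difference, SD of 10, within-animal correlation 0.6.
print(total_animals_parallel(delta=5, sigma=10))              # 126 animals in total
print(total_animals_own_control(delta=5, sigma=10, rho=0.6))  # 26 animals in total
```

Under these hypothetical values the own-control design needs about a fifth of the animals, while the diversity of the animal population is retained rather than bred out; the gain shrinks as ρ falls.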
We therefore recommend that researchers take a broader perspective when applying the 3R concept, one that includes the neuropsychiatric translational aspects. This may indeed help solve the translational crisis in preclinical research. Here, the ARRIVE guidelines can help ensure that such set-ups are adequately described in published papers.
Author contributions
AKOA and CS did all the writing together.
Financial support
No financial support was received for this work.
Conflict of interest
We declare no conflicts of interest.