The effectiveness of physical activity in the treatment of chronic conditions such as psychiatric diseases, metabolic diseases, cardiovascular diseases, pulmonary diseases, musculoskeletal disorders, or cancers is now well established (Reference Pedersen and Saltin1). Based on clinical evidence, in several countries such as Sweden or the United Kingdom, general practitioners can prescribe physical activity for at-risk and chronically ill patients. Where there is a recognized clinical benefit to adding physical activity interventions to usual care in the treatment of a chronic condition, the cost-effectiveness of these interventions needs to be assessed to optimize the allocation of health care resources.
In a previous systematic literature review, Roine et al. (Reference Roine, Roine and Räsänen2) identified sixty-five studies focusing on the cost-effectiveness of exercise programs in the treatment of various diseases. Most of these studies focused on chronic conditions such as musculoskeletal and rheumatic disorders or cardiovascular diseases. The authors found large variations in the cost-effectiveness of interventions based on physical exercise. They nevertheless concluded that some kinds of exercise interventions can be cost-effective, especially for the treatment of cardiovascular diseases and low back pain, despite partly contradictory findings.
Since the work of Roine et al. in 2009 (Reference Roine, Roine and Räsänen2), no review has gathered the available evidence on the cost-effectiveness of exercise-based programs in the treatment of chronic conditions. Our objective is to provide an up-to-date literature review. We systematically review and evaluate the methodological quality of the economic evaluations of physical activity programs among chronically ill patients published since 2008.
METHODS
We used a predefined research protocol for inclusion criteria and methods of analysis. This protocol was not registered in the international prospective register of systematic reviews (PROSPERO) before the start of the study. The review was conducted according to PRISMA guidelines.
Information Sources and Search Strategy
To identify relevant articles published since 2008, two databases were searched using keywords for the period between January 1, 2008 and December 31, 2016 (last search November 23, 2017): PubMed and JSTOR. JSTOR is a multidisciplinary database of academic content which was searched to identify health economics articles from journals not indexed in PubMed. Search terms were classified in two categories relating to physical activity or economic analysis. In the two databases the following keywords were used as general search terms in titles: “cost-effectiveness”, “cost-benefit”, “cost-utility”, “economic evaluation”, “economic analysis” or “economic impact” for the economic analysis category and “physical activity”, “sport”, “exercise”, “training”, “strength”, “fitness”, “running”, “walking”, “swimming”, or “gymnastics” for the physical activity category. Articles including at least one search term of each category in their title were selected for inclusion. In addition, we used the following major Medical Subject Headings (MeSH) terms to search for articles in PubMed: “costs and cost analysis” and “exercise therapy”, “sports”, “exercise”, “physical fitness”. Supplementary File 1 provides the full search strategy for each database. References in literature reviews identified through the keyword search were screened and were included if they met all inclusion criteria. The result of the systematic search was recorded in Zotero®, in particular to remove duplicates.
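The title-screening rule described above (at least one term from each category in the title) can be illustrated with a minimal sketch. The function name `title_matches` and the use of case-insensitive substring matching are assumptions made for illustration; the actual database queries are those given in Supplementary File 1, and substring matching is a simplification that can over-match (for example, "sport" inside "transport").

```python
# Keyword lists taken from the search strategy described in the text.
ECONOMIC_TERMS = [
    "cost-effectiveness", "cost-benefit", "cost-utility",
    "economic evaluation", "economic analysis", "economic impact",
]
ACTIVITY_TERMS = [
    "physical activity", "sport", "exercise", "training", "strength",
    "fitness", "running", "walking", "swimming", "gymnastics",
]


def title_matches(title: str) -> bool:
    """Return True if the title contains at least one term of each category.

    Assumption: case-insensitive substring matching, which only
    approximates how a bibliographic database evaluates a title query.
    """
    lowered = title.lower()
    has_economic = any(term in lowered for term in ECONOMIC_TERMS)
    has_activity = any(term in lowered for term in ACTIVITY_TERMS)
    return has_economic and has_activity


if __name__ == "__main__":
    # Hypothetical titles, used only to exercise the rule.
    print(title_matches("Cost-effectiveness of a walking program"))
    print(title_matches("Economic evaluation of a new drug"))
```

A title passing this filter would still go through the eligibility assessment described under Study Selection.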
Selection Criteria
Full economic evaluations of exercise programs, either based on the results of a randomized controlled trial (RCT) or using a model based on one or several RCTs, comparing costs and health benefits of two or more interventions targeting patients with chronic conditions were included. Studies were excluded if not in English or French or if published before 2008 or after 2016. Studies that did not report on original data (i.e., commentaries, editorials, case studies and study protocols) or studies already included in the previous literature review of Roine et al. (Reference Roine, Roine and Räsänen2) were excluded. Furthermore, studies were excluded if the exercise program targeted nonchronically ill individuals, if they compared programs including different types of physical activity or if they provided only indirect evaluations of physical activity programs (e.g., media campaigns promoting physical activity or counselling without participation in an exercise program). We chose to exclude multicomponent programs for which at least one component was not physical activity, unless strictly related to enhancing participation in the physical activity component. For instance, studies mixing physical activity with weight management or a psychological intervention were excluded. On the other hand, studies including techniques to enhance participation along with the exercise program under evaluation were included. We included studies using either general or disease-specific health benefit measures. Disease-specific measures can provide valuable information to compare health benefits within the same disease category and their use might be needed when generic benefit measures such as quality-adjusted life-year (QALY) lack sensitivity to capture the effects of the exercise program on health.
Study Selection
In a first step, M.G. undertook the systematic keywords search in the two databases and performed the first eligibility assessment based on titles and abstracts following the predefined inclusion and exclusion criteria determined by the three authors (M.G., L.R., J.C.K.D.). Articles were classified in two categories: “no” if the article clearly violated one of the inclusion criteria or met one of the exclusion criteria and “maybe” when there was uncertainty. Titles and abstracts of articles in the “maybe” category were read by the two other researchers (L.R. and J.C.K.D.) and the decision about their inclusion in the next step was agreed upon consensually. In the second step, full-text reading of potentially eligible articles was performed by M.G. to determine final eligibility. Articles excluded through this second step were read by the two other researchers (L.R. and J.C.K.D.) and in case of doubt the decision was made through consensus. Data were then extracted for all articles meeting inclusion criteria. Data extraction was performed independently by the three researchers on a sample of ten articles to identify lack of consistency in data extraction. Data extraction on the remaining articles was performed by M.G.
Data Collection and Quality Evaluation
Using a predefined form, we extracted data on the following main categories: pathology, characteristics of the study population, exercise program and comparator(s), type and measurement of costs, type and measurement of health outcomes, design, results and uncertainty analyses performed. We adopted three classification categories regarding the cost measurement perspective. We considered that a study used a health care perspective if resource use from the health system for the program and health care consumption of patients over the study period were taken into account. We classified the perspective of the study as “health and social care” if resource use from social services was also included. Finally, the perspective was held as societal if the study took into account at least one of the following costs: productivity losses, opportunity cost of time spent exercising or cost of informal care. We classified the results of the economic evaluations based on the broadest cost perspective in each study and according to the incremental cost-utility ratio (ICUR) rather than incremental cost-effectiveness ratios (ICERs) if both types of ratios were calculated. The physical activity program was considered cost-effective if its cost per QALY was below the lower bound £20,000 per QALY threshold referred to by the National Institute for Health and Care Excellence (NICE). We also mention when an exercise program was cost-effective only at the upper bound NICE threshold of £30,000 per QALY. To determine the cost-effectiveness of physical activity programs, all ICURs were converted into United States dollars (US$) using the Purchasing Power Parity (PPP) exchange rate of the price year used in the study and compared with the NICE thresholds converted into US$ using the PPP exchange rate of the same year. Exchange rates were drawn from the Organisation for Economic Co-operation and Development (OECD) (3).
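The classification rule described above can be sketched as follows. This is an illustrative reading, not the authors' code: the function names, the treatment of a ratio exactly equal to a threshold, and the definition of "dominated" (more expensive and less effective) are assumptions, and the PPP rates in the usage example are hypothetical placeholders rather than OECD values.

```python
NICE_LOWER_GBP = 20_000  # lower bound NICE threshold, GBP per QALY
NICE_UPPER_GBP = 30_000  # upper bound NICE threshold, GBP per QALY


def to_usd_ppp(amount_local: float, ppp_rate: float) -> float:
    """Convert a local-currency amount to US$.

    ppp_rate is expressed as local currency units per US$ for the
    price year of the study.
    """
    return amount_local / ppp_rate


def classify_program(delta_cost: float, delta_qaly: float,
                     ppp_local: float, ppp_gbp: float) -> str:
    """Classify an exercise program against its comparator.

    delta_cost and delta_qaly are incremental cost (local currency)
    and incremental QALYs of the program versus the comparator.
    """
    if delta_cost < 0 and delta_qaly > 0:
        return "dominant"  # cheaper and more effective
    if delta_cost > 0 and delta_qaly < 0:
        return "dominated"  # assumption: more expensive, less effective
    if delta_qaly <= 0:
        return "unclassified"  # outside the cases covered by this sketch

    # ICUR and both thresholds are expressed in US$ for the same price year.
    icur_usd = to_usd_ppp(delta_cost / delta_qaly, ppp_local)
    lower_usd = to_usd_ppp(NICE_LOWER_GBP, ppp_gbp)
    upper_usd = to_usd_ppp(NICE_UPPER_GBP, ppp_gbp)

    if icur_usd < lower_usd:
        return "cost-effective"
    if icur_usd < upper_usd:  # assumption: strict inequality at the bound
        return "cost-effective only at upper threshold"
    return "not cost-effective"


if __name__ == "__main__":
    # Hypothetical PPP rates (local units per US$), for illustration only.
    print(classify_program(1_000, 0.1, ppp_local=0.8, ppp_gbp=0.7))
```

Converting both the ICUR and the thresholds with rates from the same price year keeps the comparison independent of the currency in which the study reported its costs.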
Using the predefined extraction form we also collected data on the characteristics of the RCTs, on the structural assumptions and validity checks of model-based studies and on the economic characteristics needed to assess the methodological quality of the study. Data that could not be retrieved from the economic evaluation study were gathered from the study protocol or the clinical evaluation study.
Based on the extracted data, we assessed the risk of bias of the RCTs (for RCT-based studies and for modelling studies based on the results of a single RCT) using the Cochrane Risk of Bias Tool for Randomized Controlled Trials (Reference Higgins, Altman and Gøtzsche4). We assessed three criteria of the tool: random sequence generation, allocation concealment, and incomplete outcome data. The other criteria were not investigated given the nature of this review. Specifically, selective outcome reporting was not investigated as it can be legitimate to focus on outcomes such as QALYs in the cost-effectiveness analysis. Blinding of participants and personnel was not possible for physical activity programs and the blinding of outcome assessment criterion was not evaluated as all studies used self-reported health benefit measures. For the three criteria assessed, the RCT was classified as “low risk,” “high risk,” or “unclear risk.” We scored each criterion as one for “low risk” and zero otherwise. The scores of the three criteria were summed to create an overall score of RCT quality ranging from 0 to 3.
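The scoring rule above is simple enough to state as a short sketch. The dictionary keys and the function name `rct_quality_score` are hypothetical labels chosen for illustration; only the rule itself (one point per criterion rated "low risk", zero otherwise) comes from the text.

```python
# The three Cochrane Risk of Bias criteria assessed in this review.
CRITERIA = (
    "random_sequence_generation",
    "allocation_concealment",
    "incomplete_outcome_data",
)


def rct_quality_score(ratings: dict) -> int:
    """Sum one point per criterion rated 'low risk'.

    'high risk' and 'unclear risk' both score zero, so the result
    ranges from 0 to 3.
    """
    for criterion in CRITERIA:
        if ratings[criterion] not in ("low risk", "high risk", "unclear risk"):
            raise ValueError(f"unexpected rating for {criterion}")
    return sum(1 for criterion in CRITERIA if ratings[criterion] == "low risk")


if __name__ == "__main__":
    # Hypothetical ratings for a single trial.
    example = {
        "random_sequence_generation": "low risk",
        "allocation_concealment": "unclear risk",
        "incomplete_outcome_data": "low risk",
    }
    print(rct_quality_score(example))
```

Because "unclear risk" scores the same as "high risk", a low score can reflect incomplete reporting as well as a genuine risk of bias.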
We used the adjusted Consensus Health Economic Criteria (CHEC) list (Reference Evers, Goossens, De Vet, Van Tulder and Ament5) to assess the methodological quality of the economic evaluations. The CHEC list was specifically designed for conducting systematic reviews based on economic evaluation studies and its use is recommended by the Cochrane Collaboration (Reference Higgins and Green6). The list was recently adapted to fit both model and trial-based economic evaluations (Reference Odnoletkova, Goderis and Pil7;Reference van Eeden, van Heugten, van Mastrigt and Evers8). The adjusted CHEC list contains twenty yes/no questions on the methodology of the economic evaluations. To obtain an index of methodological quality, we scored each item as 1 if the adjusted CHEC list criterion was satisfactorily fulfilled (yes) and 0 (no) otherwise. One question (question 5) is specific to model-based studies while another question on discounting (question 15) only applies to studies with time horizons longer than 1 year. Thus, the maximum achievable score ranges between 18 and 20. We classified the methodological quality of the economic evaluations based on the percentage of the maximum achievable score they obtained: low (<50 percent), moderate (50–70 percent), and high (>70 percent).
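The quality index described above can be sketched as follows. The function name `chec_quality`, the encoding of "not applicable" as `None`, and the assignment of a score of exactly 50 or 70 percent to the moderate class are assumptions for illustration; the text only gives the three bands.

```python
from typing import Dict, Optional, Tuple

# Items that may be not applicable: item 5 (model-based studies only)
# and item 15 (discounting, time horizons longer than 1 year).
CONDITIONAL_ITEMS = {5, 15}


def chec_quality(answers: Dict[int, Optional[bool]]) -> Tuple[int, int, str]:
    """Return (score, maximum achievable score, quality class).

    answers maps item number (1 to 20) to True (yes), False (no),
    or None (not applicable, allowed only for items 5 and 15).
    """
    if set(answers) != set(range(1, 21)):
        raise ValueError("expected answers for items 1 to 20")
    for item, answer in answers.items():
        if answer is None and item not in CONDITIONAL_ITEMS:
            raise ValueError(f"item {item} cannot be 'not applicable'")

    score = sum(1 for answer in answers.values() if answer is True)
    maximum = sum(1 for answer in answers.values() if answer is not None)

    # Integer comparisons avoid floating-point rounding at the boundaries.
    if score * 100 < 50 * maximum:
        label = "low"
    elif score * 100 <= 70 * maximum:
        label = "moderate"
    else:
        label = "high"
    return score, maximum, label


if __name__ == "__main__":
    # Hypothetical study: items 1 to 14 fulfilled, item 15 not applicable.
    answers = {item: item <= 14 for item in range(1, 21)}
    answers[15] = None
    print(chec_quality(answers))
```

Expressing the score as a share of the maximum achievable score keeps trial-based and model-based studies comparable even though they are assessed on 18 to 20 items.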
RESULTS
Study Selection
The keyword search returned 431 hits. After removing duplicates, 426 studies were screened for inclusion. Exclusion was carried out in two steps. First, 333 studies were excluded after title and abstract reading. Second, full-text reading led to the exclusion of fifty-five of the ninety-three remaining articles based on the criteria described in Figure 1. Among the thirty-eight articles retained for analysis, two studies reported on the same results from a unique physical activity program, leaving a total of thirty-seven different economic evaluations (Reference Aboagye, Karlsson, Hagberg and Jensen9–Reference Coyle, Coyle and Kenny45).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220224184918091-0099:S0266462318000533:S0266462318000533_fig1g.gif?pub-status=live)
Fig. 1. Selection of included articles.
Overview of Included Studies
Table 1 presents the major program and economic characteristics of the thirty-seven studies. Further details can be found in Supplementary Files 2 and 3. The main disease categories were musculoskeletal and rheumatologic disorders (eleven studies, 29.7 percent), cardiovascular diseases (ten studies, 27 percent), neurological disorders (six studies, 16.2 percent), mental illnesses (three studies, 8.1 percent) and cancers (three studies, 8.1 percent). A majority of articles came from two countries: the United Kingdom with twelve articles (32.4 percent of total) and the Netherlands with eleven articles (29.7 percent of total). Among the thirty-seven included studies, thirty were RCT-based (81.1 percent), while seven (18.9 percent) were modeling studies based on one or several RCTs. Only two studies exclusively used disease-specific measures of health benefits (Reference Hurley, Walsh and Mitchell13;Reference Sevick, Miller, Loeser, Williamson and Messier16). The time horizon of RCT-based studies ranged from 12 weeks (Reference d'Amico, Rehill and Knapp36) to 2.5 years (Reference Reed, Whellan and Li23). Model-based studies had longer time horizons as they most often simulated the long-term effects of exercise programs. A total of seventeen studies (45.9 percent) were conducted from a societal cost perspective, fifteen studies (40.5 percent) from a health care perspective and five studies (13.5 percent) from a health and social care perspective. The physical activity interventions differed in terms of type, volume, and duration of exercise performed, even within disease categories.
Table 1. Program Description and Economic Characteristics
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220224184918091-0099:S0266462318000533:S0266462318000533_tab1.gif?pub-status=live)
a 2-letter International Organization for Standardization country codes. AU: Australia; BR: Brazil; CA: Canada; CH: Switzerland; CO: Colombia; ES: Spain; FI: Finland; GB: The United Kingdom; NL: The Netherlands; NZ: New Zealand; SE: Sweden; US: The United States.
b Broadest cost perspective reported by the authors.
6MWD, 6-Minute Walking Distance; COPD, chronic obstructive pulmonary disease; h, hours; NPI, Neuropsychiatric Inventory; PFDI-20, Pelvic-Floor-Distress-Inventory-20; QALY, quality-adjusted life-year; RCT, randomized controlled trial; WOMAC, Western Ontario and McMaster Universities Osteoarthritis Index.
RESULTS BY DISEASE CATEGORY
Musculoskeletal and Rheumatologic Disorders
Among the eleven studies on musculoskeletal and rheumatologic disorders, eight programs included strength exercises, four included stretching exercises, two included aerobics, two included balance exercises, one included water-based exercises and one included yoga. Some exercise programs were very long and intensive with 104 hours of exercise over 8 months (Reference Gusi and Tomas-Carus11) while others were much shorter and less intensive with 4 hours of exercise over 2 weeks (Reference Manning, Kaambwa and Ratcliffe14). Only three of eleven studies did not take usual care as the comparator. In these studies, the physical activity program was compared with self-care advice (Reference Aboagye, Karlsson, Hagberg and Jensen9), leaflet provision (Reference Barton, Sach and Jenkinson10), or therapeutic education on weight and exercise (Reference Sevick, Miller, Loeser, Williamson and Messier16). All studies were based on RCTs while only two studies, both focusing on knee pain, did not include QALYs (Reference Hurley, Walsh and Mitchell13;Reference Sevick, Miller, Loeser, Williamson and Messier16). Seven of eleven studies used a societal cost perspective. The time horizon for the evaluation ranged from 8 months (Reference Gusi and Tomas-Carus11) to 30 months (Reference Hurley, Walsh and Mitchell13).
Results of the quality assessment of the RCTs are available in Supplementary File 4. Among the eleven studies on musculoskeletal and rheumatologic disorders, three (Reference Barton, Sach and Jenkinson10;Reference Manning, Kaambwa and Ratcliffe14;Reference Tan, Teirlinck and Dekker18) obtained the maximum RCT quality score of 3, six obtained a score of 2, and two obtained lower scores of 1 (Reference Aboagye, Karlsson, Hagberg and Jensen9) or 0 (Reference Henchoz, Pinget and Wasserfallen12). Table 2 gives the results of the methodological quality assessment of the economic evaluations. Among the eleven studies on musculoskeletal and rheumatologic disorders, five studies had a high score, five had a moderate score, and one had a low score (Reference Sevick, Miller, Loeser, Williamson and Messier16).
Table 2. Assessment of the Methodological Quality of the Economic Evaluations
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220224184918091-0099:S0266462318000533:S0266462318000533_tab2.gif?pub-status=live)
Note. + The adjusted Consensus Health Economic Criteria list criterion is satisfactorily fulfilled. - The adjusted Consensus Health Economic Criteria list criterion is not satisfactorily fulfilled. † Not applicable: study based on a randomised controlled trial. * Not applicable: time horizon shorter than or equal to 12 months.
Table 3 gives the results of the cost-effectiveness analyses. Results are reported for the broadest cost perspective adopted by the authors. Full results for all cost perspectives and health benefit measures are detailed in Supplementary File 5. For the eleven exercise programs focusing on musculoskeletal and rheumatologic disorders, five were dominant (cheaper and more effective) (Reference Aboagye, Karlsson, Hagberg and Jensen9;Reference Hurley, Walsh and Mitchell13;Reference Manning, Kaambwa and Ratcliffe14;Reference Tan, van Linschoten and van Middelkoop17;Reference Tan, Teirlinck and Dekker18) and five represented intermediate cases as they were both more expensive and more effective than their comparators (Reference Barton, Sach and Jenkinson10;Reference Gusi and Tomas-Carus11;Reference Henchoz, Pinget and Wasserfallen12;Reference Pinto, Robertson and Abbott15;Reference Williams, Williamson and Heine19). Among the five intermediate cases, the exercise program was cost-effective at the £20,000 per QALY threshold in three studies (Reference Gusi and Tomas-Carus11;Reference Pinto, Robertson and Abbott15;Reference Williams, Williamson and Heine19) and not cost-effective in two studies (Reference Barton, Sach and Jenkinson10;Reference Henchoz, Pinget and Wasserfallen12). The cost-effectiveness of the exercise program was uncertain in one study that only used disease-specific measures of health benefits. This study did not report cost-effectiveness acceptability curves nor discuss the threshold values that should be used to judge the cost-effectiveness based on the disease-specific health benefit measures used (Reference Sevick, Miller, Loeser, Williamson and Messier16).
Table 3. Results of Cost-Effectiveness Analyses
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220224184918091-0099:S0266462318000533:S0266462318000533_tab3.gif?pub-status=live)
a Based on the £20,000 per QALY threshold used by the National Institute for Health and Care Excellence. Results are based on the ICUR rather than ICERs if both types of ratios were calculated and are reported for the broadest cost perspective adopted by the authors.
b Not cost-effective at the £20,000 per QALY threshold but cost-effective at the £30,000 per QALY threshold.
c The intervention is implemented by a community organisation and the physiologist is an employee of the organisation. The authors also report the cost-effectiveness results for a private model where exercise physiologists working privately integrate the intervention into their routine practice. In both cases, the exercise program is not cost-effective.
6MWD, 6-minute walking distance; AU, Australia; CA, Canada; COPD, chronic obstructive pulmonary disease; ICER, incremental cost-effectiveness ratio; ICUR, incremental cost-utility ratio; Int, international; NPI, Neuropsychiatric Inventory; NZ, New Zealand; p, probability; PFDI-20, Pelvic-Floor-Distress-Inventory-20; QALY, quality-adjusted life-year; US, United States of America; WOMAC, Western Ontario and McMaster Universities Osteoarthritis Index.
Cardiovascular Diseases
Among the ten studies on cardiovascular diseases, four programs included walking, four included strength or stretch exercises, four included aerobics while one study reported no information on the exercise program (Reference Rincón, Rojas and Romero25). The duration and volume of the exercise programs were highly variable. For example, in the case of intermittent claudication, the exercise program lasted from 12 weeks (Reference Mazari, Khan and Carradice22) to 12 months (Reference van Asselt, Nicolaï, Joore, Prins and Teijink27) or included 24 (Reference Spronk, Bosch and den Hoed26;Reference Van Den Houten, Lauret and Fakhry28) to 78 hours (Reference Reynolds, Apruzzese and Galper24;Reference van Asselt, Nicolaï, Joore, Prins and Teijink27) of supervised exercise. Usual care or optimal medical care was the comparator in six studies, while one study compared physical activity with walking advice and leaflet provision (Reference van Asselt, Nicolaï, Joore, Prins and Teijink27). The comparator was surgery in three studies on intermittent claudication (Reference Mazari, Khan and Carradice22;Reference Spronk, Bosch and den Hoed26;Reference Van Den Houten, Lauret and Fakhry28). Grouping the findings from studies using surgical and nonsurgical comparators might not be appropriate and the results are hereafter differentiated for these two types of studies.
Among the seven studies using nonsurgical comparators, four were based on an RCT while three were model-based. All seven studies included QALYs while only three (Reference Reed, Whellan and Li23;Reference Reynolds, Apruzzese and Galper24;Reference Witham, Fulton and Greig29) used a societal cost perspective. The time horizon for the evaluation ranged from 24 weeks (Reference Witham, Fulton and Greig29) to 2.5 years (Reference Reed, Whellan and Li23). Among the three studies using a surgical comparator, one study was model-based (Reference Van Den Houten, Lauret and Fakhry28), all studies used QALYs and two studies adopted a societal cost perspective (Reference Mazari, Khan and Carradice22;Reference Spronk, Bosch and den Hoed26). One study evaluated the exercise program over 10 years (Reference Van Den Houten, Lauret and Fakhry28), while the time horizon was 12 months in the other two studies (Reference Mazari, Khan and Carradice22;Reference Spronk, Bosch and den Hoed26).
The RCT quality could be rated for seven studies, including five studies using nonsurgical comparators. Among these five studies, two obtained a score of 2 (Reference Reed, Whellan and Li23;Reference Witham, Fulton and Greig29) and three a score of 1 (Reference Hautala, Kiviniemi and Mäkikallio20;Reference Reynolds, Apruzzese and Galper24;Reference van Asselt, Nicolaï, Joore, Prins and Teijink27). Among studies using surgical comparators, one had an RCT quality score of 0 (Reference Mazari, Khan and Carradice22) and the other a score of 2 (Reference Spronk, Bosch and den Hoed26). For the seven studies using nonsurgical comparators, one (Reference Reynolds, Apruzzese and Galper24) had a high score for the methodological quality of the economic evaluation, while five had a moderate score, and one had a low score (Reference Kühr, Ribeiro, Rohde and Polanczyk21). The three studies using surgical comparators obtained a moderate score for the quality of the economic evaluation.
Among the seven studies using nonsurgical comparators, the exercise program was dominant in two studies (Reference Hautala, Kiviniemi and Mäkikallio20;Reference Witham, Fulton and Greig29), cost-effective in two studies (Reference Reynolds, Apruzzese and Galper24;Reference Rincón, Rojas and Romero25) and not cost-effective in three studies (Reference Kühr, Ribeiro, Rohde and Polanczyk21;Reference Reed, Whellan and Li23;Reference van Asselt, Nicolaï, Joore, Prins and Teijink27). If an alternative cost-effectiveness threshold at £30,000 per QALY were to be adopted, exercise programs would then be cost-effective in two studies (Reference Kühr, Ribeiro, Rohde and Polanczyk21;Reference van Asselt, Nicolaï, Joore, Prins and Teijink27). Among the three studies using surgical comparators, the exercise program was dominant in one study (Reference Mazari, Khan and Carradice22) and cost-effective in the other two studies (Reference Spronk, Bosch and den Hoed26;Reference Van Den Houten, Lauret and Fakhry28).
Neurological Disorders
Among the six studies on neurological disorders, two studies (Reference Farag, Sherrington and Hayes30;Reference Fletcher, Goodwin, Richards, Campbell and Taylor31) focused on Parkinson disease and evaluated programs of strengthening and balance exercises over the same time period (5 to 6 months). However, the first study (Reference Farag, Sherrington and Hayes30) included 84 hours of exercise versus only 30 hours in the second study (Reference Fletcher, Goodwin, Richards, Campbell and Taylor31). Two studies focused on walking and aerobics or walking programs for patients with chronic fatigue. The first program (Reference McCrone, Sharpe and Chalder32) lasted longer (12 months) and included twice as many sessions compared with the second program which lasted 16 weeks (Reference Sabes-Figuera, McCrone and Hurley33). The last two studies evaluated aerobic programs among patients with cerebral palsy and multiple sclerosis with either twelve (Reference Slaman, van den Berg-Emons and Tan34) or eighteen (Reference Tosh, Dixon and Carter35) supervised sessions over 12 weeks. Usual or specialist medical care, sometimes in combination with leaflet provision, was the comparator in all studies on neurological disorders. All studies in this category were based on RCTs and included QALYs. Only half of the studies used a societal cost perspective (Reference McCrone, Sharpe and Chalder32;Reference Slaman, van den Berg-Emons and Tan34;Reference Tosh, Dixon and Carter35) while the time horizon for the evaluation ranged from 20 weeks (Reference Fletcher, Goodwin, Richards, Campbell and Taylor31) to 12 months (Reference McCrone, Sharpe and Chalder32;Reference Slaman, van den Berg-Emons and Tan34).
Of the six studies focusing on neurological disorders, two obtained an RCT quality score of 3 (Reference Farag, Sherrington and Hayes30;Reference Tosh, Dixon and Carter35), two had a score of 2 (Reference Fletcher, Goodwin, Richards, Campbell and Taylor31;Reference McCrone, Sharpe and Chalder32), and two had low scores of 1 (Reference Sabes-Figuera, McCrone and Hurley33) or 0 (Reference Slaman, van den Berg-Emons and Tan34). Regarding the quality of the economic evaluation, half of the studies reached a high score (Reference Fletcher, Goodwin, Richards, Campbell and Taylor31;Reference McCrone, Sharpe and Chalder32;Reference Tosh, Dixon and Carter35), and the other half obtained a moderate score (Reference Farag, Sherrington and Hayes30;Reference Sabes-Figuera, McCrone and Hurley33;Reference Slaman, van den Berg-Emons and Tan34).
The exercise program was dominant in three studies on neurological disorders (Reference Fletcher, Goodwin, Richards, Campbell and Taylor31;Reference McCrone, Sharpe and Chalder32;Reference Slaman, van den Berg-Emons and Tan34). The program was not cost-effective at the £20,000 per QALY threshold in two of the remaining studies (Reference Farag, Sherrington and Hayes30;Reference Tosh, Dixon and Carter35) and the cost-effectiveness of the exercise program was uncertain in the third one (Reference Sabes-Figuera, McCrone and Hurley33). For one study, the exercise program was cost-effective only at the £30,000 per QALY threshold (Reference Tosh, Dixon and Carter35). In one study (Reference Sabes-Figuera, McCrone and Hurley33), only a disease-specific measure of health benefit, the Chalder fatigue score, was used and thus the classification of the cost-effectiveness result was difficult. However, the authors reported that the probability of cost-effectiveness ranged from 55 percent to 63 percent if the decision maker was willing to pay £1,000 to £2,500 per clinically significant improvement in fatigue (four-point variation in the Chalder fatigue scale).
Mental Illnesses
Among the three studies on mental illnesses, two exercise programs focused on walking (Reference d'Amico, Rehill and Knapp36;Reference Gusi, Reyes, Gonzalez-Guerrero, Herrera and Garcia38) while the type of exercise performed was not mentioned in one study (Reference Edwards, Linck and Hounsome37). The length of the supervised exercise program ranged from 6 weeks (Reference d'Amico, Rehill and Knapp36) to 6 months (Reference Gusi, Reyes, Gonzalez-Guerrero, Herrera and Garcia38), while the supervised exercise volume ranged from 14 (Reference d'Amico, Rehill and Knapp36) to 60 hours (Reference Gusi, Reyes, Gonzalez-Guerrero, Herrera and Garcia38) over the whole program. Usual care was the comparator in all studies, in combination with leaflet provision or exercise advice. All studies were based on RCTs and used QALYs, but only one study adopted a societal cost perspective (Reference d'Amico, Rehill and Knapp36). The time horizon for the evaluation ranged from 12 weeks (Reference d'Amico, Rehill and Knapp36) to 12 months (Reference Edwards, Linck and Hounsome37).
Of the three studies dealing with mental illnesses, one obtained the minimal RCT quality score of 0 (Reference d'Amico, Rehill and Knapp36), and two had a score of 1 (Reference Edwards, Linck and Hounsome37;Reference Gusi, Reyes, Gonzalez-Guerrero, Herrera and Garcia38). For the methodological quality of the economic evaluation, one study (Reference Edwards, Linck and Hounsome37) obtained a high score, while two studies had a moderate score (Reference d'Amico, Rehill and Knapp36;Reference Gusi, Reyes, Gonzalez-Guerrero, Herrera and Garcia38).
The exercise program was found cost-effective in two studies among depressed patients (Reference Edwards, Linck and Hounsome37;Reference Gusi, Reyes, Gonzalez-Guerrero, Herrera and Garcia38) but not cost-effective in one study on dementia (Reference d'Amico, Rehill and Knapp36).
Cancers
Two studies focused on breast cancer patients (Reference Gordon, DiSipio and Battistutta39;Reference Mewes, Steuten and Duijts40). The first study (Reference Gordon, DiSipio and Battistutta39) included sixteen sessions of aerobic and strength exercises spread over 8 months while women undertook mostly unsupervised running, swimming, or cycling in the second study (Reference Mewes, Steuten and Duijts40). The third study examined a swallowing exercise program for head and neck cancer patients (Reference Retèl, van der Molen and Hilgers41). All exercise programs were compared with usual care. Two studies were model-based (Reference Mewes, Steuten and Duijts40;Reference Retèl, van der Molen and Hilgers41) and one was based on an RCT (Reference Gordon, DiSipio and Battistutta39). QALYs were measured in all studies, while only one study (Reference Gordon, DiSipio and Battistutta39) adopted a societal cost perspective. The time horizon for the evaluation ranged from 12 months (Reference Gordon, DiSipio and Battistutta39;Reference Retèl, van der Molen and Hilgers41) to 5 years (Reference Mewes, Steuten and Duijts40).
The RCT quality was assessable for two studies that obtained scores of 2 (Reference Gordon, DiSipio and Battistutta39) and 1 (Reference Mewes, Steuten and Duijts40), respectively. The three studies obtained a moderate score for the methodological quality of the economic evaluation.
Among the three studies in the field of oncology, one swallowing exercise program for head and neck cancer patients was found to be cost-effective (Reference Retèl, van der Molen and Hilgers41), while two exercise programs among women with breast cancer were not cost-effective at the £20,000 per QALY threshold (Reference Gordon, DiSipio and Battistutta39;Reference Mewes, Steuten and Duijts40). In one study on breast cancer (Reference Mewes, Steuten and Duijts40), the exercise program was cost-effective when using the £30,000 per QALY threshold.
Other Diseases
Among the four remaining studies, two focused on pelvic floor muscle training among women with pelvic organ prolapse (Reference Panman, Wiegersma and Kollen42;Reference Panman, Wiegersma and Kollen43), one focused on a 2-year walking and cycling program among chronic obstructive pulmonary disease (COPD) patients (Reference Zwerink, Effing and Kerstjens44), and one evaluated a 6-month aerobic and resistance exercise program for diabetic patients (Reference Coyle, Coyle and Kenny45). The two articles on pelvic organ prolapse evaluated the same program against different comparators: watchful waiting (Reference Panman, Wiegersma and Kollen42) or pessary treatment (Reference Panman, Wiegersma and Kollen43). Self-management was used as the comparator in the study on COPD (Reference Zwerink, Effing and Kerstjens44), while the comparator group was a waiting list control in the study with diabetic patients (Reference Coyle, Coyle and Kenny45). One study was model-based (Reference Coyle, Coyle and Kenny45), while the others were based on RCTs (Reference Panman, Wiegersma and Kollen42;Reference Panman, Wiegersma and Kollen43;Reference Zwerink, Effing and Kerstjens44). All studies included QALYs and used a health care cost perspective. The time horizon of the evaluation was 40 years in the model-based study (Reference Coyle, Coyle and Kenny45), while it was 2 years in the three RCT-based studies (Reference Panman, Wiegersma and Kollen42;Reference Panman, Wiegersma and Kollen43;Reference Zwerink, Effing and Kerstjens44).
The COPD study (Reference Zwerink, Effing and Kerstjens44) and the two studies on pelvic organ prolapse (Reference Panman, Wiegersma and Kollen42;Reference Panman, Wiegersma and Kollen43) achieved an RCT quality score of 1, while the study on type 2 diabetes (Reference Coyle, Coyle and Kenny45) obtained a score of 2. All four studies attained a moderate score for the methodological quality of the economic evaluation.
Pelvic floor muscle training for women with pelvic organ prolapse was found to be dominated by pessary treatment (Reference Panman, Wiegersma and Kollen43) and, against watchful waiting (Reference Panman, Wiegersma and Kollen42), not cost-effective at the £20,000 per QALY threshold but cost-effective at the £30,000 per QALY threshold. The physical activity program was found cost-effective in the study on COPD (Reference Zwerink, Effing and Kerstjens44), while combined aerobic and resistance training among diabetic patients was found not cost-effective at the £20,000 per QALY threshold but cost-effective at the £30,000 per QALY threshold (Reference Coyle, Coyle and Kenny45).
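The verdicts reported above ("dominated", "not cost-effective at £20,000 per QALY", "cost-effective at £30,000 per QALY") follow from the usual decision rule applied to incremental costs and incremental QALYs. The following is a minimal sketch of that rule, not the procedure of any reviewed study. The function name `classify` and all input values are hypothetical illustrations; only the two thresholds come from the text.

```python
# Sketch of the standard cost-effectiveness decision rule.
# All incremental costs/QALYs below are hypothetical, not study data.

LOWER_THRESHOLD = 20_000  # GBP per QALY, lower threshold used in the review
UPPER_THRESHOLD = 30_000  # GBP per QALY, upper threshold used in the review


def classify(delta_cost: float, delta_qaly: float, threshold: float) -> str:
    """Classify an intervention versus its comparator at one threshold.

    delta_cost: intervention cost minus comparator cost (GBP)
    delta_qaly: intervention QALYs minus comparator QALYs
    """
    if delta_cost == 0 and delta_qaly == 0:
        return "equivalent"
    if delta_cost <= 0 and delta_qaly >= 0:
        # No more costly and no less effective: no ICER is needed.
        return "dominant"
    if delta_cost >= 0 and delta_qaly <= 0:
        # No less costly and no more effective than the comparator.
        return "dominated"
    icer = delta_cost / delta_qaly
    if delta_cost < 0:
        # Cheaper but less effective: acceptable only if the savings
        # per QALY lost are at least as large as the threshold.
        return "cost-effective" if icer >= threshold else "not cost-effective"
    # More costly and more effective: compare the ICER with the threshold.
    return "cost-effective" if icer <= threshold else "not cost-effective"


# Hypothetical example: GBP 500 extra cost for 0.02 extra QALYs,
# i.e. an ICER of about GBP 25,000 per QALY.
for threshold in (LOWER_THRESHOLD, UPPER_THRESHOLD):
    print(threshold, classify(500.0, 0.02, threshold))
```

With these illustrative inputs the ICER falls between the two thresholds, which reproduces the pattern of a program that is not cost-effective at £20,000 per QALY but cost-effective at £30,000 per QALY.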
DISCUSSION
We identified thirty-seven studies evaluating the cost-effectiveness of exercise programs among chronically ill patients, mainly with musculoskeletal and rheumatologic disorders or cardiovascular diseases, published after 2008. Exercise programs were dominant or cost-effective in twenty-two studies (59.5 percent) when using a £20,000 per QALY threshold or in twenty-eight studies (75.7 percent) when using a £30,000 per QALY threshold. Exercise programs were not cost-effective in seven studies (18.9 percent) for either threshold, while cost-effectiveness of physical activity remained unclear in two studies (5.4 percent) given the use of disease-specific health benefit measures only.
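As a quick arithmetic check, the percentages in this paragraph can be recomputed from the study counts it reports. The sketch below uses only the counts stated in the text; the dictionary keys and the helper name `share` are hypothetical labels chosen for illustration.

```python
# Recompute the reported shares from the counts given in the text.
TOTAL_STUDIES = 37

counts = {
    "dominant_or_cost_effective_at_20k": 22,
    "dominant_or_cost_effective_at_30k": 28,
    "not_cost_effective_at_either_threshold": 7,
    "unclear": 2,
}


def share(count: int, total: int = TOTAL_STUDIES) -> float:
    """Return count/total as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}
for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{TOTAL_STUDIES} = {pct} percent")
```

The three categories defined at the £30,000 per QALY threshold (28 cost-effective, 7 not cost-effective, 2 unclear) account for all thirty-seven studies.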
Exercise programs were dominant or cost-effective at the £20,000 per QALY threshold in eight of eleven studies on musculoskeletal and rheumatologic disorders, in four of seven studies on cardiovascular diseases using a nonsurgical comparator, in three of six studies on neurological disorders, in one of three studies on cancers, and in two of three studies on mental illnesses. If the alternative cost-effectiveness threshold of £30,000 per QALY were adopted, the exercise program would be cost-effective in two additional studies on cardiovascular diseases using a nonsurgical comparator, in one additional study on neurological disorders, and in one additional study on cancers. Thus, available evidence shows that exercise programs are for the most part cost-effective in the treatment of musculoskeletal and rheumatologic disorders, and in the treatment of cardiovascular diseases when considering the upper NICE cost-effectiveness threshold of £30,000 per QALY. This result is in line with the previous findings of Roine et al. (Reference Roine, Roine and Räsänen2), who found stronger evidence of cost-effectiveness for exercise programs in cardiac rehabilitation and in back pain patients.
Since the review of Roine et al. (Reference Roine, Roine and Räsänen2), more studies have been published in the fields of neurological disorders, mental illnesses, and oncology. However, for these disease groups, economic evaluations of physical activity programs remain scarce and the few studies available show contradictory results. For other conditions, such as diabetes, obesity, and respiratory diseases, the exercise programs under evaluation are usually multicomponent and study designs most often do not allow isolation of the specific impact of exercise on costs and health outcomes.
The existing literature suffers from several limitations. First, included studies were of varying levels of methodological quality. For instance, five of eleven studies on musculoskeletal and rheumatologic disorders had a good score for the methodological quality of the economic evaluation, while this was the case for only one of seven studies on cardiovascular diseases using a nonsurgical comparator.
Progress has been made in the comparability of results between and within disease categories, because QALYs were used as a measure of health benefits in thirty-five of thirty-seven studies. Nevertheless, comparability of cost-effectiveness results is still limited. This arises mainly from methodological differences across studies. The cost perspective adopted, the type of design used (RCT-based or model-based studies), and the time horizon chosen to evaluate the exercise program may all affect the cost-effectiveness results. The use of a broader cost perspective that includes productivity losses or informal care tends to increase the cost-effectiveness of exercise programs compared with a strict health care perspective. Conversely, including the opportunity cost of exercise or patients’ out-of-pocket costs tends to reduce the cost-effectiveness of the exercise program compared with using a health care system perspective.
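The effect of the cost perspective can be illustrated with a small numerical sketch. All figures below are hypothetical and chosen only to show the direction of the effect described above. They are not drawn from any reviewed study, and the names `icer` and `perspectives` are illustrative.

```python
# Hypothetical illustration of how the cost perspective shifts the ICER.
# None of these numbers come from the reviewed studies.

QALY_GAIN = 0.05              # assumed incremental QALYs per patient
HEALTH_CARE_COST = 1250.0     # assumed incremental health care cost (GBP)
PRODUCTIVITY_SAVING = 600.0   # assumed avoided productivity losses (GBP)
OUT_OF_POCKET_COST = 400.0    # assumed patient out-of-pocket costs (GBP)


def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio in GBP per QALY gained."""
    return delta_cost / delta_qaly


# Incremental cost of the exercise program under each perspective.
perspectives = {
    "health care": HEALTH_CARE_COST,
    "societal, counting avoided productivity losses":
        HEALTH_CARE_COST - PRODUCTIVITY_SAVING,
    "health care plus patient out-of-pocket costs":
        HEALTH_CARE_COST + OUT_OF_POCKET_COST,
}

icers = {name: icer(cost, QALY_GAIN) for name, cost in perspectives.items()}
for name, value in icers.items():
    print(f"{name}: about GBP {value:,.0f} per QALY")
```

Under these assumed inputs, the same program has an ICER of roughly £25,000 per QALY from the health care perspective, about £13,000 once avoided productivity losses are counted, and about £33,000 once patient costs are added, so its verdict at the £20,000 and £30,000 thresholds depends on the perspective chosen.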
The choice of model-based analyses, which typically simulate the impact of the exercise program over a longer period of time, might also affect the results of the cost-effectiveness analysis. For example, among the seven studies focusing on cardiovascular diseases and using nonsurgical comparators, three studies (Reference Kühr, Ribeiro, Rohde and Polanczyk21;Reference Reynolds, Apruzzese and Galper24;Reference Rincón, Rojas and Romero25) used modeling to assess the impact of exercise programs over several years, and the exercise program was cost-effective in all three studies. In this review, cost-effectiveness results tend to be more often positive in modeling studies. Indeed, among the seven modeling studies, the exercise program was dominant or cost-effective in 71.4 percent of cases when using the £20,000 per QALY threshold, while among the thirty RCT-based studies, the exercise program was dominant or cost-effective in only 60 percent of cases.
The second factor impeding the comparability of cost-effectiveness results is the heterogeneity among interventions. Exercise programs under evaluation differed in terms of the type of exercise performed. For example, in the case of low back pain, the physical activity program included strength, endurance, and stretching exercises in one study (Reference Henchoz, Pinget and Wasserfallen12), while it included yoga in another study (Reference Aboagye, Karlsson, Hagberg and Jensen9).
The exercise programs also differed in terms of duration and volume of exercise performed. For instance, for intermittent claudication, the exercise programs lasted from 12 weeks (Reference Mazari, Khan and Carradice22) to 12 months (Reference van Asselt, Nicolaï, Joore, Prins and Teijink27) or included 24 (Reference Spronk, Bosch and den Hoed26;Reference Van Den Houten, Lauret and Fakhry28) to 78 hours (Reference Reynolds, Apruzzese and Galper24;Reference van Asselt, Nicolaï, Joore, Prins and Teijink27) of supervised exercise. No clear pattern of association between the volume and duration of the exercise program and its cost-effectiveness emerged from our analysis. Indeed, for low back pain, a yoga program of twelve sessions over 6 weeks (Reference Aboagye, Karlsson, Hagberg and Jensen9) was found dominant, while a longer exercise program of twenty-four sessions spread over 12 weeks (Reference Henchoz, Pinget and Wasserfallen12) was not cost-effective.
On the other hand, in the case of heart failure, two exercise programs both including thirty-six sessions over 12 weeks were found either cost-effective (Reference Rincón, Rojas and Romero25) or not cost-effective (Reference Reed, Whellan and Li23). The two studies focusing on intermittent claudication and using nonsurgical comparators included two walking programs of 78 hours. In the first study, the program was spread over 6 months and 78 sessions (Reference Reynolds, Apruzzese and Galper24), while in the second study, the program was run in 156 sessions over 12 months (Reference van Asselt, Nicolaï, Joore, Prins and Teijink27). In this specific case, both programs were cost-effective at the £30,000 per QALY threshold even if the cost-effectiveness of the more intensive program was slightly superior.
In addition to the interventions’ characteristics, patients’ adherence to the program may also affect its health and economic impacts. We found that only seventeen studies (45.9 percent) reported patients’ adherence to the exercise programs, with highly varying levels (Supplementary File 2). Finally, differences in the clinical and demographic characteristics of patients may also affect the cost-effectiveness results. For example, among the four studies focusing on heart failure, two studies included patients with New York Heart Association class II or III heart failure (Reference Kühr, Ribeiro, Rohde and Polanczyk21;Reference Witham, Fulton and Greig29), while one study also included class IV patients (Reference Reed, Whellan and Li23) and one gave no details on disease severity among included patients (Reference Reynolds, Apruzzese and Galper24) (Supplementary File 2).
In conclusion, we identified thirty-seven studies evaluating the cost-effectiveness of exercise programs among chronically ill patients. Exercise programs for the treatment of musculoskeletal and rheumatologic disorders, and to a lesser extent for the treatment of cardiovascular diseases, appear cost-effective. More research is needed to investigate the cost-effectiveness of exercise programs in the treatment of cancers, mental disorders, diabetes, obesity, and respiratory diseases.
ACKNOWLEDGMENTS
This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.
SUPPLEMENTARY MATERIAL
The supplementary material for this article can be found at https://doi.org/10.1017/S0266462318000533
Supplementary Files 1 to 5: https://doi.org/10.1017/S0266462318000533
CONFLICTS OF INTEREST
The authors have nothing to disclose.