Independent Researcher, 1096 HZ Amsterdam, The Netherlands. https://orcid.org/0000-0003-1079-5647
Independent Researcher, 30159 Hannover, Germany. https://orcid.org/0000-0001-8018-1419
*Corresponding Author:
Independent Researcher, 1096 HZ Amsterdam, The Netherlands.
Email: markvink.md@outlook.com
In this article, we analyzed the systematic review by Kuut et al. into the efficacy of cognitive behavioral therapy (CBT) for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS), a disease that predominantly affects women, and the eight trials in it. We found many issues with the studies in the review, but also with the review itself. For example, the systematic review by Kuut et al. included a researcher who was involved in seven of the eight studies in their review, and another who was involved in five of them. Moreover, at least one of them was involved in every study in the review. On top of that, the three professors who were involved in the systematic review have all built their careers on the cognitive behavioral model (CBmodel) and the reversibility of ME/CFS through CBT and graded exercise therapy (GET), and two of the systematic reviewers have a potential financial conflict of interest. Yet they failed to inform the readers about these conflicts of interest. Conducting a review in this manner and not informing the readers undermines the credibility of a systematic review and its conclusion.
Regarding outcome differences between the treatment and control groups, it is highly likely that the combination of non-blinded trials, subjective outcomes and poorly chosen control groups, alone or together with response-shift bias, patients filling in questionnaires in a manner to please the investigators, allegiance bias, small-study effect bias and other forms of bias, produced the appearance of positive effects despite the lack of any substantial benefit to the patients, leading to the erroneous inference of efficacy in its absence. That CBT is not an effective treatment is highlighted by the fact that patients remained severely disabled after treatment with it. The absence of objective improvement, as shown by the actometer, employment status and objective cognitive measures, confirms the inefficacy of CBT for ME/CFS. The systematic review did not report on safety, but research by Oxford Brookes University shows that CBT, which contains an element of graded exercise therapy, is harmful for many patients. Finally, our reanalysis highlights the fact that researchers should not mark their own homework.
Keywords: CBT; CFS; Chronic Fatigue Syndrome; Cognitive Behavior Therapy; Myalgic Encephalomyelitis; NICE.
Myalgic encephalomyelitis (ME), a name which goes back to the 1950s, is also known as chronic fatigue syndrome (CFS); nowadays it is often referred to as ME/CFS. ME is characterized by an abnormally delayed muscle recovery after trivial exertion [1], a concept which over time has evolved into post-exertional malaise (PEM) [2]. Other common symptoms are muscle fatigue, which manifests itself as muscle weakness/heavy legs, myalgia, cognitive disturbances, headaches/migraines, reversal of sleep rhythm and hypersensitivity to light and/or sound [3], leading to (severe) functional impairment [4]. Up to 25% of patients are severely or very severely affected and are homebound or bedbound and dependent on help and care from others for even the most basic things [5]. Yet, for many years, treatment for this disease has been based on the biopsychosocial model, also known as the cognitive behavioral model (CBmodel), which is based on the assumption that there is no underlying illness but that, after a viral illness which has resolved, patients continue to attribute their symptoms to disease and have become deconditioned as a consequence of avoidance of activity and exercise, thereby producing a vicious circle which leads to further deconditioning [6,7]. Cognitive behavior therapy (CBT), which contains an element of graded exercise therapy (GET), and GET itself were designed to tackle the factors assumed to maintain and perpetuate ME/CFS symptoms, the associated disability and deconditioning, and thereby lead to recovery. According to Surawy et al. [6], who formalized that model in 1995, one of the limitations of their model is the absence of objective proof for it. A number of studies [8,9] showed that patients do not possess the behavioral characteristics targeted by the treatments of the CBmodel. Geraghty et al. published a detailed review of this model. They concluded “that the model lacks high-quality evidential support, conflicts with accounts given by most patients and fails to account for accumulating biological evidence of pathological and physiological abnormalities found in patients. There is little scientific credibility in the claim that psycho-behavioural therapies are a primary treatment for this illness” [10]. Thoma et al. came to a similar conclusion in 2023 in an article entitled “Why the Psychosomatic View on Myalgic Encephalomyelitis/Chronic Fatigue Syndrome Is Inconsistent with Current Evidence and Harmful to Patients” [11].
The prestigious American Institute of Medicine (IOM, now the National Academy of Medicine) confirmed this in 2015 when it concluded that ME/CFS is a “complex, multisystem, and often devastating disorder” and that it “is a medical — not a psychiatric or psychological — illness” for which there is no effective treatment [2].
The National Institute for Health and Care Excellence (NICE) confirmed the conclusion of the IOM that ME/CFS is a complex multisystem chronic medical condition for which there is no effective treatment in its updated guidelines on the diagnosis and management of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) in October 2021 [12]. It specifically stated that CBT and GET do not lead to improvement or recovery, but that CBT might be offered as a supportive therapy. A systematic review by Dutch proponents of the CBmodel, however, recently concluded “that CBT for ME/CFS can lead to significant reductions of fatigue, functional impairment, and physical limitations. There is no indication patients meeting different case definitions or reporting additional symptoms benefit less from CBT. Our findings do not support recent guidelines in which evidence from studies not mandating PEM was downgraded” [13].
In this article, we will review the evidence presented in the systematic review by Kuut et al. [13] to assess whether or not the aforementioned conclusion of this systematic review is justified by the data contained within the primary studies included in the review. A study by Smakowski et al. into the effectiveness of GET, which included Professor Chalder, one of the world’s leading CBT and GET proponents for ME/CFS, concluded that “self-report measures…can be problematic because they are dependent on a patient’s perception of their own illness” (p. 7) [14]. Consequently, in our analysis, we will concentrate on the objective outcome measures to establish if improvements in self-report (fatigue, functional impairment, and physical limitations) translate into observable improvement in objective tests (physical ability, fitness, etc.), as there is an inverse relationship between fatigue and physical activity [15]. We also concentrated on work-related outcomes, because “significant reductions of fatigue, functional impairment, and physical limitations” should lead to a significant improvement in work status and a significant reduction in illness and benefits status. Moreover, according to, for example, one of the studies in the review (Tummers et al.), part of their treatment was “to make a plan for work resumption” [16]. Finally, according to Stevelink et al., which also included Professor Chalder, “given the evidence that meaningful occupation is important for well-being and psychosocial needs, work-related outcomes should be targeted in CFS treatment” [17].
Our analysis shows that the review’s conclusion that CBT is effective, irrespective of the case definition, is not supported by the evidence. When the subjective outcomes of the trials are considered, patients remain severely disabled after treatment with CBT. When the objective outcomes are considered, CBT is ineffective for ME/CFS.
Lack of patient blinding combined with self-reporting of outcomes
All trials in the review were non-blinded by definition, yet the review used only a subjective primary outcome, fatigue severity (CIS), and two subjective secondary outcomes, functional impairment (SIP8) and physical functioning (SF–36), even though it is well known that patient self-report is an unreliable measure [18].
As noted by Demetriou et al., “one of the main advantages of the self-report questionnaire is that it can be administered to a large sample of people quickly without much effort or financial cost” [19]. Moreover, questionnaires are quick and easy to administer and score [20]. Yet the lack of patient blinding in combination with self-reporting of outcomes leads to pronounced bias, as patients become prone to outside influences, leading to the erroneous inference of efficacy in its absence and thus making subjectively assessed outcomes unreliable [21,22]. Also, according to, for example, the BRANDO project (Bias in Randomised and Observational studies) [23], which, amongst others, included Stanford Professor Ioannidis, it is important that “as far as possible, clinical and policy decisions should not be based on trials in which blinding is not feasible and outcome measures are subjectively assessed”, because the lack of blinding is “associated with an average 13% exaggeration of intervention effects.” “Therefore, trials in which blinding is not feasible should focus as far as possible on objectively measured outcomes” [23]. In non-blinded studies, self-report measures are highly vulnerable to response bias, the size of which is not trivial; no such inflation was observed when objective outcome measures were used [21]. According to a systematic review by Whiting et al., there is a particular problem with subjective outcomes for patients with CFS, as they “may feel better able to cope with daily activities, because they have reduced their expectations of what they should achieve, rather than because they have made any recovery as a result of the intervention” [24]. Whiting et al. continued by stating that “a more objective measure of the effect of any intervention would be whether participants have increased their working hours, returned to work or school, or increased their physical activities” [24].
Moreover, Vercoulen et al., which included one of the researchers of the systematic review (Bleijenberg), concluded that “one has to be very careful with using self-report questionnaires as measures for actual activity level: none of the self-report questionnaires had strong correlations with the Actometer. Thus, self-report questionnaires are no perfect parallel tests for the Actometer” [25]. They also concluded that “subjective instruments do not measure actual behaviour. Responses on these instruments appear to be an expression of the patients’ views about activity and may be biased by cognitions concerning illness and disability. In healthy subjects such cognitions do not exist and therefore their responses were not biased by these cognitions” [25]. Van der Werf et al., which also included Bleijenberg, concluded “that self-report measures of activity and behavioural data often correlate poorly” [26].
Healey et al. concluded that “caution should be taken when using self-reported PA [Physical Activity] measures. It is also important to note the wider limitations of all self-report measures i.e. potential for social desirability bias, recall bias, over and underestimation of activities/misclassification of activities. Therefore, where possible, the use of objective measures of PA (e.g. accelerometry) should be considered. There is greater evidence of their validity and reliability and they can objectively capture all dimensions of PA” [27]. Consequently, “self-reported physical activity tends to overestimate the level of physical activity compared to the objective method” [28].
Quinlan et al., who “investigated the associations between…self reported…and objectively measured physical activity in middle-aged adults”, concluded that “in adults, self-report measures of physical activity tend to have low correlations with objective measures. Accurately measuring physical activity using self-reported tools may be difficult as individuals cannot accurately estimate the amount and type of physical activity completed in the surveyed time, or precisely report the intensity of physical activity” [29].
Additionally, the reanalysis of the amended Cochrane exercise review [30] found that objective outcomes from two CBT trials for ME/CFS confirmed the unreliability of the subjective outcomes in non-blinded studies, as shown by the following examples:
In Jason et al. [31], there was a substantial difference in the subjective physical functioning scores at baseline between the exercise and control groups, yet, objectively, there was not (six-minute walk test or 6MWT);
In the PACE trial, the released individual participant data showed that 20% of participants whose physical functioning improved subjectively had deteriorated objectively (6MWT) [32–34].
The German Institute for Quality and Efficiency in Healthcare (Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen, IQWiG) methods guidance handbook [35] states the following about this problem: the value of patient-reported outcomes (PROs) in non-blinded studies is limited because of their subjective nature (in German, “Da Angaben zu PROs aufgrund ihrer Natur subjektiv sind, sind offene, d. h. nicht verblindete Studien in diesem Bereich nur von eingeschränkter Validität”). The same handbook also states that non-blinded studies should, as far as possible, rely on objective endpoints, because subjective ones can be influenced by the person collecting them (in German, “Falls eine verblindete Zielgrößenerhebung nicht möglich ist, sollte ein möglichst objektiver Endpunkt gewählt werden, der in seiner Ausprägung und in der Stringenz der Erfassung so wenig wie möglich durch diejenige Person, die den Endpunkt (unverblindet) erhebt, beeinflusst werden kann” p. 171). Finally, Lilienfeld et al. [22] concluded that non-blinded trials (and CBT studies are non-blinded by definition) should not rely on subjective primary outcomes, but should use objective primary outcomes, either alone or combined with subjective ones, as a methodological safeguard against the erroneous inference of efficacy in its absence.
Control groups
A key principle of a randomized controlled trial (RCT) is that, to ensure a fair comparison, the groups should be similar with respect to all factors that might affect the outcome besides the intervention, including the number of treatment sessions [36]. This also helps to make sure that an RCT is ‘internally valid’, which refers to the extent to which the outcome of a trial can be attributed to the experimental treatment and not to any alternative explanation, such as the natural course of the target problem [37]. Yet, in a waitlist control group, patients do not get any treatment from (a doctor from) that study.
Additionally, “being allocated to a waiting list control group may evoke feelings of being denied support and disappointment” [38] because participants expect to get some form of treatment in return for taking part in a trial “and that participants can decide to wait to attempt to change until receiving the support sought” [38] resulting in participants putting less effort into change.
Also, in a no-treatment, usual care, or waiting list control group (WLC), participants must attend several assessments without any direct benefit for themselves. These patients will be disappointed that they have been denied the treatment benefits they anticipated from participation in a study. Assignment to these sorts of control groups may strengthen participants’ beliefs that they will not improve, thereby reducing the chance of spontaneous improvement. “Direct evidence of this phenomenon was found in one exploratory trial [38], which showed that participants who rated themselves as ready to change their alcohol consumption, and who were allocated to a waiting list group, waited to reduce their drinking”. As a matter of fact, “28% of participants had markedly higher consumption at follow-up…after having joined the study hoping to reduce it” because of being allocated to a waiting list group. “Being made to wait may invite negative research participation effects” [38], artificially inflating the effect of the treatment under investigation. Moreover, a meta-analysis by Furukawa et al. entitled “waiting list may be a nocebo condition” concluded that a waiting list control group does not control “for regression towards the mean and the natural course of the disease but instead it may introduce negative psychological expectation of ‘waiting for the desired active treatment’” [39]. Also, using waitlist, usual care or no-treatment control conditions does not adequately correct for the placebo effect, regression to the mean and other forms of bias and confounding factors [37].
Researchers often assume that with waiting list control groups and other no-treatment control designs, the absence of treatment equates with the absence of an effect. Yet participants randomized to these designs may improve less than would be expected compared to participants not enrolled in a trial which may threaten the internal validity of a trial. Consequently, subjective baseline–follow-up differences cannot be assumed to be the natural history of what would have occurred in the absence of patients enrolling in the study [37].
Moreover, Janse et al. [40], one of the studies in the systematic review, which used a waiting-list control group, noted that “the use of a waiting-list control [group] does not control for non-specific therapy factors and limits the external validity”. Therefore, using waitlist, usual care or no-treatment control conditions can lead to an overestimation of the effectiveness of a treatment [37].
Study size
A review of homeopathy studies by the Australian National Health and Medical Research Council (NHMRC) used a minimum of 150 participants in randomised controlled trials (evenly distributed across the therapy and control groups) because, according to the NHMRC, the results may be distorted in studies with a smaller number of participants [41]. Consequently, “trials with limited sample sizes are more likely to report larger beneficial effects than large trials” [42]. This is also known as the small study effect. We mention this review by the NHMRC not only because of the distortion of results when the number of participants in a study is small, but also because it was an integral part of the recent advice on homeopathy to the EU by the Homeopathy Working Group of the European Academies Science Advisory Council (EASAC) [43]. Furthermore, one of its members (Van der Meer) is one of the world’s leading CBT proponents for ME/CFS, and Van der Meer, Bleijenberg and Knoop are the three leading ME/CFS experts in the Netherlands. Also, the three of them are/were the leaders of the Dutch Expertise Centre for Chronic Fatigue, which has been promoting CBT as an effective treatment for ME/CFS for many years. Consequently, one should be careful when interpreting therapeutic responses in RCTs with fewer than the aforementioned minimum number of participants, and such studies were excluded from the recent advice of the EASAC Homeopathy Working Group.
Response-shift bias and allegiance bias
As noted by Howard, when “using self-report instruments, researchers assume that a subject’s understanding of the standard of measurement for the dimension being assessed will not change from one testing to the next (pretest to posttest). If the standard of measurement were to change, the ratings would reflect this shift in understanding in addition to any actual changes in the subject. Consequently, comparisons of the ratings would not accurately reflect change due to treatment and would be invalid” [44]. This “instrumentation related source of contamination is known as response-shift bias” [44]. This is even more of a problem when the therapy used, in this case different forms of CBT for ME/CFS, aims to modify participants’ beliefs and perception of their symptoms [45]. According to Lilienfeld et al., one of the things that can “help to eliminate response-shift biases as explanations for apparent improvement” is not relying “exclusively on self-report ratings” [22].
One of the other problems with CBT and psychotherapy studies is the allegiance of the researchers to the treatment. “Allegiance in psychotherapy represents the therapist’s personal belief both in the superiority and the efficacy of a particular treatment” [46]. A systematic review by Dragioti et al. concluded that “experimenter’s allegiance influences the effect sizes of psychotherapy RCTs and can be considered non-financial conflict of interest introducing a form of optimism bias, especially since blinding is problematic in this kind of research” [46]. This is of particular interest because two of the researchers of Kuut et al. (Bleijenberg and Knoop) have based their careers on the CBmodel and the efficacy of CBT for ME/CFS.
Design
A reanalysis was conducted of the systematic review by Kuut et al. and the studies in it.
Study selection criteria
Studies were only eligible if they were part of that systematic review.
Analysis of the results
We examined the characteristics of the studies in the review, paying particular attention to the size of the treatment groups and the sort of control group that was used. We also checked each study for dropouts, not only because dropouts can give valuable information about the acceptability of a treatment to participants, but also because a treatment can only be deemed safe and effective if patients actually adhered to it.
We examined the clinical relevance of the subjective effects of CBT by using the scoring of the systematic review on the CIS-fatigue: no longer severely fatigued, i.e. scoring < 35. However, the review classified someone who scored < 700 on the SIP8 as no longer functionally impaired, and someone scoring > 70 on the SF-36 subscale physical functioning (SF-36 PF) as no longer impaired in physical functioning. In other words, participants were deemed to be recovered on those scales, even though with scores of 700 or more (SIP8) or 70 or less (SF-36 PF) they were deemed to be severely functionally impaired and severely impaired in physical functioning, respectively. Yet no impairment and severe impairment do not border on each other; instead, there is a spectrum of improvement in between. This is highlighted by a study by two of the systematic reviewers (Knoop and Bleijenberg), which noted that the mean CIS-fatigue score in healthy adults (mean age of 37) is 17.3, the mean SF-36 PF score in healthy adults without a chronic condition is 93.1 and the mean SIP8 total score of healthy women is 65.5 [47]. Consequently, we changed no longer being functionally impaired and no longer being impaired in physical functioning into no longer being severely functionally impaired (scoring < 700 on the SIP8) and no longer being severely impaired in physical functioning (scoring > 70 on the SF-36 PF).
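For clarity, the cut-offs applied in our reanalysis, together with the healthy reference means reported by Knoop et al. [47], can be summarized as follows:

No longer severely fatigued: CIS-fatigue < 35 (healthy adults: mean 17.3).
No longer severely functionally impaired: SIP8 total < 700 (healthy women: mean 65.5).
No longer severely impaired in physical functioning: SF-36 PF > 70 (healthy adults without a chronic condition: mean 93.1).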
We assessed the number of published studies that had a physical functioning entry requirement. Only studies that had a physical functioning entry requirement were used to assess this because, as concluded by Janse et al., “the fact that our study did not select on the level of physical functioning will make it more difficult to find an effect of iCBT on physical functioning” [40].
To investigate the overall clinical relevance of the effect of CBT, we used objective outcomes, not only to correct for potential biases and confounding factors caused by using subjective outcomes in non-blinded studies or by using a waiting list or treatment as usual control group, but also, and more importantly, because according to the CBmodel, symptoms in ME/CFS are caused by deconditioning. Consequently, by definition, symptoms can only be substantially reduced, and there can only be a relevant improvement for patients, if there is a substantial improvement in physical conditioning.
According to a Dutch multidisciplinary ME/CFS guideline from 2013, which included one of the authors of the systematic review (Bleijenberg), “it is not unusual for interventions that have been proven to be effective in a research setting to perform less well in (clinical) practice. An important question is therefore whether information is available about the effectiveness of CBT for CFS and graded exercise therapy outside the confines of clinical trials” (in Dutch: “Het is niet ongebruikelijk dat interventies die in een onderzoekssetting effectief zijn gebleken, in de (klinische) praktijk minder goed presteren. Belangrijke vraag is dan ook of er informatie voorhanden is over de werkzaamheid van CGT voor CVS en graded exercise therapie in de praktijk”) [48]. We will therefore also analyze evaluation studies that have been conducted to answer this question, paying particular attention to the objective outcomes because, as noted before, there can only be a relevant improvement for patients if there is a substantial improvement of their deconditioning (fitness).
Analysis of the review and the studies in it: The systematic review by Kuut et al. included eight randomized controlled trials (RCTs) of CBT for ME/CFS and “the total sample consisted of 1298 patients” [13], yet one of them is a non-published, non-peer-reviewed study, as can be seen in table 1. Only 578 participants (93 + 36 + 85 + 68 + 62 + 136 + 160) received some form of CBT in the remaining seven studies. All seven published studies relied on subjective primary outcomes. In two of the seven published studies, participants received individual face-to-face CBT, in one study they received group CBT, in two studies guided self-instructions with email contact with a therapist, and in two others Internet-based CBT. In two studies the participants were adolescents; in the others they were adults. One study used a natural course control group and one a care as usual control group. The other five studies used a waiting list control group. Three of the seven published studies had fewer than 75 participants in the treatment group and would therefore not have been included in the earlier mentioned review of homeopathy studies.
Study | n | Participants | CBT | Control | Duration of treatment |
---|---|---|---|---|---|
Prins et al. (2001) [49] | 278 | Adults | Individual face-to-face CBT (n=93); guided support (n=94) | Natural course (n=91) | 8 months |
Stulemeijer et al. (2005) [50] | 71 | Adolescents | Individual face-to-face CBT (n=36) | Waiting list (n=35) | 5 months |
Knoop et al. (2008) [51] | 171 | Adults | Guided self-instructions with email contact with a therapist (n=85) | Waiting list (n=86) | At least 16 weeks |
Nijhof et al. (2012) [52] | 135 | Adolescents | Internet-based (n=68) | Care as usual (n=67) | 6 months |
Tummers et al. (2012) [16] | 123 | Adults | Guided self-instructions with email contact with a therapist (n=62) | Waiting list (n=61) | At least 20 weeks |
Van der Schaaf et al. (2015) [53,54] | Unpublished | Non-peer-reviewed study | Unpublished study | Waiting list | 6 months |
Wiborg et al. (2015) [55] | 204 | Adults | Group face-to-face (n=136) | Waiting list (n=68) | 6 months |
Janse et al. (2018) [40] | 240 | Adults | Internet-based (n=160) | Waiting list (n=80) | 6 months |
NC: natural course; WL: waiting list.
As can be seen in table 2, in three studies the dropout rate in the CBT group was substantially higher than in the control group (15% versus 5%, 19% versus 12% and 19% versus 6%), and in one study it was much higher than in the control group (41% versus 23%). A systematic review by Whiting et al. entitled “Interventions for the Treatment and Management of Chronic Fatigue Syndrome: A Systematic Review” noted that “where dropout rates are higher in the intervention group than in the control group it may be the case that there is something about the intervention that trial participants find unacceptable. It may be the method or frequency of administration, or adverse effects arising from the intervention” [24].
Finally, as concluded by psychology professor Lilienfeld in an article entitled “Why Ineffective Psychotherapies Appear to Work: A Taxonomy of Causes of Spurious Therapeutic Effectiveness”, “in contrast to clients who remain in treatment, those who drop out of treatment tend to be lower functioning” [22]. Lilienfeld also wrote that “clients who drop out of therapy are not a random subsample of all clients. Research demonstrates that clients who are not improving are especially likely to leave psychotherapy. As a result, therapists may conclude erroneously that their treatments are effective merely because their remaining clients are those that have improved” [22]. He further noted that with high levels of dropout “clients who remain in these treatments…are generally faring better than when they began, but they are unrepresentative of the clients who initially enrolled. The clients who dropped out may not have been helped or may have even been harmed by the intervention” [22].
Only four of the seven published studies used one or more objective outcome measures, and only one of these studies published the actometer results in (the supplement of) their original publication. Prins et al. waited nine years and Stulemeijer et al. five years before those results were published in an article by Wiborg et al. [56], which involved authors of the original two studies as well as two of the authors of the systematic review (Bleijenberg and Knoop). Bleijenberg was also involved in the FITNET study by Nijhof et al., who did not publish these results at all. Instead, they stated the following in a response to comments by Kindlon and Crawford: “actual physical activity as measured by actigraphy is not likely to be the mediator of reduction in fatigue” [57], thereby acknowledging the null effect on the actometer in an indirect way. The only study that used the actometer and published its findings, in supplement two of the article, was the study by Janse et al., who found an improvement; however, the researchers themselves noted in the article that “this might be an accidental finding, taking the amount of missing data into account” [40]. Moreover, as noted earlier, Janse et al. also concluded that “the fact that our study did not select on the level of physical functioning will make it more difficult to find an effect of iCBT on physical functioning” [40]. To put it differently, there was no physical functioning entry score requirement, nor was there a maximum physical functioning score. The consequence was that the mean physical functioning score at baseline was 62.4 with a standard deviation of 21.1 [40], which means that patients were included in the study with a physical functioning score of 80 or more. Such scores suggest that these patients were already very high functioning, which in turn suggests that their diagnosis of ME/CFS might have been incorrect. Moreover, according to supplement two of the study by Janse et al., only 16% (13/80) adhered to the treatment in the protocol-driven feedback group and only 19% (15/80) in the feedback-on-demand treatment group [40]. Consequently, 84% and 81%, respectively, did not adhere to treatment in the two treatment groups. According to an article by Baryakova et al. on overcoming barriers to patient adherence, “experiencing adverse effects or anxiety about potential adverse effects is a major deterrent to patient adherence” [58]. An improvement can only be attributed to a treatment if patients actually adhered to it. Or, as Janse et al. themselves noted, the improvement on the actometer “might be an accidental finding, taking the amount of missing data into account” [40].
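As a simple check of these adherence figures, using the per-arm denominators of 80 reported in supplement two of Janse et al. [40]:

13/80 ≈ 16% adherent, hence approximately 84% non-adherent (protocol-driven feedback);
15/80 ≈ 19% adherent, hence approximately 81% non-adherent (feedback on demand).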
Janse et al. also decided to deviate from the original study protocol by not determining quality of life scores. This might suggest that the treatment did not have an effect on quality of life.
Study | CBT | Objective outcome used | Objective improvement | Dropouts |
---|---|---|---|---|
Prins et al. (2001) [49] | Individual face-to-face CBT and guided support | Yes (work status, actometer and objective neuropsychological tests) | No [49,56,59] | 41% (CBT, 38/93); 23% (NC, 21/91) |
Stulemeijer et al. (2005) [50] | Individual face-to-face CBT | Yes (actometer) | No [56] | 19% (CBT, 7/36); 6% (WL, 2/35) |
Knoop et al. (2008) [51] | Guided self-instructions (GSI) | No objective outcomes were used, despite setting goals such as returning to work | Not used | 15% (GSI, 13/85); 5% (WL, 4/86) |
Nijhof et al. (2012) [52] | Internet-based (IB) | Yes (actometer) | No [57] | 6% (IB, 4/68); 6% (CAU, 4/67) |
Tummers et al. (2012) [16] | Guided self-instructions (GSI) | Not used | Not used | 11% (GSI, 7/62); 6% (WL, 5/61) |
Van der Schaaf et al. (2015) [53,54] | Unpublished study | BOLD signal as measured with fMRI; cerebral tissue properties as measured with MRI, DTI and MR-spectroscopy | Unpublished study | Unpublished study |
Wiborg et al. (2015) [55] | Group CBT | No | No | 19% (Group CBT, 26/136); 12% (WL, 8/68) |
Janse et al. (2018) [40] | Internet-based (IB) | Yes (actometer) | Improvement, but “this might be an accidental finding, taking the amount of missing data into account” (p. 116 [40]) | 6% (IB, 10/160); 5% (WL, 4/80); but a high amount of missing data for the objective outcome |
According to the Dutch trial register, Van der Schaaf et al. [53] was a randomised controlled trial with a no-treatment, waiting list control group. The study used the following four primary outcomes:
1) “Blood Oxygenation Level Dependent (BOLD) signal as measured with functional Magnetic Resonance Imaging (fMRI)
2) Cerebral tissue properties as measured with Magnetic Resonance Imaging (MRI), Diffusion Tensor Imaging (DTI) and MR-spectroscopy.
3) Behavioural performance on computerized tasks
4) Fatigue severity: Checklist individual strength (CIS)” [53].
This study has not been published so far and consequently is a non-peer-reviewed study. The fact that it has not been published, despite having an anticipated start date of 1 January 2014, might suggest that the objective primary outcomes did not support the hypothesis of the study and contradicted the efficacy of CBT.
The objective neuropsychological tests (two reaction time tests and a symbol digit modalities task) from Prins et al. were published six years later, in 2007, by authors of the original article, including two of the authors of the systematic review (Bleijenberg and Knoop). The results were available for 83.8 per cent (233/278) of participants, equally divided over the three groups (78 CBT; 79 support group; 76 no treatment), and showed that CBT did not lead to objective improvement [59].
According to Tummers et al., “activity patterns are usually assessed with an actometer, a small device worn around the ankle, and activity levels are assessed over a period of 12 days. However, as this was an implementation study, actometers were not available because of the high costs involved” [16]. Tummers et al. is the only study in the review that acknowledged the importance of the actometer. However, according to Peters et al., “implementation research is the scientific inquiry into questions concerning implementation—the act of carrying an intention into effect, which in health research can be policies, programmes, or individual practices” [60]. Additionally, “implementation outcome variables describe the intentional actions to deliver services. These implementation outcome variables—acceptability, adoption, appropriateness, feasibility, fidelity, implementation cost, coverage, and sustainability—can all serve as indicators of the success of implementation” [60]. An implementation study “contrasts with typical randomised controlled trials that look at the efficacy of an intervention in an “ideal” or controlled setting and with highly selected patients and standardised clinical outcomes, usually of a short term nature” [60]. The study by Tummers et al. “evaluated the effectiveness of guided self-instruction for CFS implemented in an MHC [a community-based mental health centre], delivered by nurses. Method. One hundred and twenty-three patients were randomly assigned to either guided self-instruction (n = 62) or a waiting list (n = 61)” [16]. Consequently, it was not an implementation study, but simply a randomized controlled trial into the efficacy of an intervention, as stated in the title of their article (“Implementing a minimal intervention for chronic fatigue syndrome in a mental health centre: a randomized controlled trial”). The study by Tummers et al. should therefore have used the actometer as an objective outcome measure. The same applies to the other studies that did not use the actometer as an objective measure of activity.
As noted earlier, the ultimate test of whether a treatment is effective is the evaluation of its efficacy in real life, outside the confines of an RCT. A number of evaluation studies of the use of CBT in real life have been conducted [61-65] and, as can be seen in table 3, two of those studies did not use objective outcome measures. The three others that did showed that CBT has a negative rather than a positive effect on employment and disability/illness benefit status. Moreover, the Belgian evaluation study showed that CBT did not lead to an improvement in fitness [62].
Consequently, in view of the problems named above, the negative effect on employment status and disability benefits, and the lack of objective improvement in fitness, one cannot safely conclude that CBT is effective.
Study | Treatment | N | Work and disability/illness benefit status | Physical capacity/fitness |
---|---|---|---|---|
Koolhaas et al. (2008) [61] | Evaluation of CBT in the Netherlands | 100 | 41% were employed before and 31% after CBT; patients who worked, worked 5 h less after CBT | Not used |
Stordeur et al. (2008) [62] | CBT and GET evaluation in Belgian CFS clinics | 655 | Employment status decreased from 18.3% to 14.9%; percentage of incapacitated persons increased from 54% to 57% | “Physical capacity (maximal or sub-maximal) did not change between start and end of the treatment” (p. 80) |
Collin and Crawley (2017) [63] | Evaluation of CBT and GET in 11 English CFS clinics | 952 | After therapy: 47.2% unchanged working status; 18.0% worked again or longer; 30.0% stopped working or worked less because of CFS | Not used |
Collin et al. (2018) [64] | Evaluation of the presence/absence of 5 symptoms (muscle pain, joint pain, headache, sore throat, and painful lymph nodes) which can occur in addition to the 3 symptoms (post-exertional malaise, cognitive dysfunction, and disturbed/unrefreshing sleep) that are present for almost all patients in 12 specialist CFS/ME services (11 UK, 1 NL) | 918 (UK) and 1392 (Dutch) | Not used | Not used |
Adamson et al. (2020)[65] | Outcomes from a specialist clinic in the UK | 995 | Not used | Not used |
Scores after CBT in comparison to healthy people
Kuut et al. investigated the clinical relevance of the effects of CBT by using a CIS-fatigue score of < 35 (no longer severely fatigued), physical functioning score of >70 (“no longer impaired in physical functioning”) and a SIP8 score of < 700 (“no longer functionally impaired”) [13]. Most studies in the review had a fatigue severity entry requirement of 35 or more on the Checklist Individual Strength (CIS) fatigue severity sub-scale. Many of the included studies did not have an entry requirement on the Sickness Impact Profile–8 (SIP8) but in Knoop et al. [51] and Wiborg et al. [55], the entry requirement was a total score of more than 700. According to the same study by Knoop et al., “the CIS sub-scale ‘fatigue severity’ was used to measure the level of fatigue over the past 2 weeks. Scores ranged from 8 (no fatigue) to 56 (severely fatigued). The weighted total score on eight sub-scales of the SIP8 (SIP8 total score) was used to assess functional disability in all domains of functioning. Physical disabilities were measured with the physical functioning sub-scale of the 36-item Short Form Health Survey (SF–36). Scores ranged from 0 (maximum physical limitations) to 100 (ability to do vigorous activity)” [51]. Additionally, in the Qure study, which investigated the efficacy of CBT and doxycycline for Q fever fatigue syndrome and which included two researchers of this systematic review (Bleijenberg and Knoop), “significant disabilities in daily functioning” was defined by a “score ≥450 on the Sickness Impact Profile [SIP8]” [66].
Kuut et al. reported that the mean scores after treatments were 34.49 (fatigue severity), 920.98 (functional impairment) and 73.42 (physical functioning) [13]. Table 4 puts those into perspective by comparing them to severe impairment and scores of healthy people.
Outcome | After CBT | Severe impairment | Scores of healthy people |
---|---|---|---|
CIS fatigue severity | 34.49 | 35 or more | 17.3 |
Functional impairment (SIP8 total score) | 920.98 | More than 700 or ≥450 in the Qure study | 65.5 |
SF–36 physical functioning | 73.42 | score of 70 or below | 93.1 |
According to a non-randomised study without a control group by Knoop et al. entitled “Is a full recovery possible after cognitive behavioural therapy for chronic fatigue syndrome?”, which also included Bleijenberg, “healthy adults with a mean age of 37.1 have a mean score on the CIS-fatigue of 17.3”, “healthy adults without a chronic condition…[have]…a mean [physical functioning] score of 93.1” and “the mean SIP8 total score of healthy…women is 65.5” [47].
Consequently, as can be seen in table 4, patients were still severely disabled after treatment labeled as effective by the systematic review.
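Expressed as margins relative to the severe-impairment cut-offs, the mean post-treatment scores reported by Kuut et al. [13] translate into the following:

CIS-fatigue: 35 − 34.49 = 0.51 points below the severe-fatigue cut-off;
SIP8 total: 920.98 − 700 = 220.98 points above the severe-functional-impairment cut-off;
SF-36 physical functioning: 73.42 − 70 = 3.42 points above the severe-impairment cut-off, i.e. less than one five-point answer step on the scale.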
Table 5 highlights the fact that only two of the seven published studies had a physical functioning entry requirement. Studies that did not have a physical functioning entry requirement showed the following. In Janse et al. [40], the baseline physical functioning score was 62.4 with a standard deviation of 21.1, so that many patients already had a physical functioning score of 80 or more before receiving any treatment. In Wiborg et al. [55], it was 55.4 with a standard deviation of 18.8, so that many patients already had a physical functioning score of 74 or more. In Nijhof et al. [52], the physical functioning score at baseline was 60.7 with a standard deviation of 14.5, so that many patients already had a physical functioning score of 75 or more at baseline. These three studies without a physical functioning entry requirement therefore artificially inflated the mean physical functioning score after treatment of the studies combined, which, according to the systematic review, was 73.42. Moreover, in Nijhof et al., the onset of the disease in the treatment group followed an infection in only 28% of cases, even though ME/CFS is a post-infectious disease. Consequently, the diagnosis of ME/CFS is in doubt in 72% of cases.
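To illustrate the preceding point: if “many patients” is read as those scoring at or above one standard deviation above the baseline mean (under the assumption of an approximately normal distribution, roughly 16% of a sample), the reported baseline values give:

Janse et al.: 62.4 + 21.1 = 83.5, i.e. scores of 80 or more;
Wiborg et al.: 55.4 + 18.8 = 74.2, i.e. scores of 74 or more;
Nijhof et al.: 60.7 + 14.5 = 75.2, i.e. scores of 75 or more.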
In Tummers et al., for example, severely disabled was operationalized as an SF-36 physical and/or social functioning score of 70 or below [16], and a CIS-fatigue score of 35 or higher indicated severe fatigue. In the two studies that did have a physical functioning entry requirement, the scores after treatment show that patients remained severely disabled.
Study | Physical functioning after therapy | Entry requirement? |
---|---|---|
Prins et al. (2001) [49] | Not an outcome measure | No |
Stulemeijer et al. (2005) [50] | 69.4 | 65 or less |
Knoop et al. (2008) [51] | 65.9 | No |
Nijhof et al. (2012) [52] | 88.5 | No |
Tummers et al. (2012) [16] | 65.4 | 70 or less |
Van der Schaaf et al. (2015) [53,54] | Non-published study | Non-published study |
Wiborg et al. (2015) [55] | 74.4 | No |
Janse et al. (2018) [40] | 73.3 | No |
Physical functioning scores “range from 0 (maximum limitations) to 100 (no limitations)” (p. 375 [42]).
Work related outcome
In the study by Tummers et al., guided self-instruction consisted of a booklet and in chapter 9 patients were invited “to make a plan for work resumption. This plan contains the date when a patient will resume work, and how the patients will increase the number of hours worked.” And in chapter 13, “patients attain the goals as formulated in chapter 1 step by step, including resumption of work” [16]. The intervention in Knoop et al. “consisted of a self-instruction booklet containing information about chronic fatigue syndrome and weekly assignments. The programme took at least 16 weeks, but often more if patients formulated long-term goals such as returning to work” [51].
Why Tummers et al. and Knoop et al. did not assess work status before and after treatment, if resuming work is an important goal of the intervention, is unclear, unless it is because the study by Prins et al. had already shown that CBT does not lead to an improvement in work and employment status [49].
Conflict of interest and involvement in the studies
The systematic review was conducted by a group of Dutch researchers which included psychology professors Bleijenberg and Knoop. Both researchers have successfully built their careers on the CBmodel and the claimed efficacy of CBT for ME/CFS. Both became professors as a consequence, and together with a professor of internal medicine (Van der Meer) they are the three leading experts in the Netherlands. Moreover, together with four mental health professors from the UK, they are seen as the leading experts worldwide, and for many years CBT has been promoted by guidelines all over the world based on their research. These investigators are deeply committed to the ‘unhelpful cognitions’ theory of ME/CFS, which they and other colleagues originated and/or continue to actively promote. Had their systematic review failed to show that CBT is an effective treatment for ME/CFS, it would have undermined their careers.
As can be seen in table 6, Bleijenberg was involved in 7 of the 8 RCTs that were analyzed in the systematic review and Knoop was involved in 5 of them. In the world of academia, this is known as marking your own homework. However, this is not listed in the competing interest section of their article. The only competing interest of both researchers, according to the article itself, is the following: “HK [Knoop] and GB [Bleijenberg] receive royalties for a published manual of CBT for ME/CFS” [13]. On top of that, seven of the eight studies in this systematic review were done by the same institute in the Netherlands (the Expert Centre for Chronic Fatigue, in Dutch het Nederlands Kenniscentrum Chronische Vermoeidheid, also known as the NKCV) and the systematic review was conducted by the leader of that institute (Knoop). The only study in this systematic review which was conducted by researchers from a different institute was the FITNET study by Nijhof et al., yet Bleijenberg from the NKCV was one of the researchers of that study as well. If the non-published study is excluded, then Bleijenberg was involved in all seven studies of the systematic review and Knoop in four of them. Moreover, in four of these seven studies both of them were involved as researchers, and at least one of them was involved in all studies. Additionally, there is also a potential financial conflict of interest, as the NKCV [67] is the main centre in the Netherlands which earns money from treating ME/CFS patients with CBT. As noted, one of the systematic reviewers (psychologist Knoop) is the leader of that institute and another of the systematic reviewers (Kuut) is a psychologist at that centre. A negative outcome of the systematic review might have meant that Dutch healthcare insurance companies would stop reimbursing CBT for ME/CFS.
Finally, the three professors who were involved in the systematic review (Bleijenberg, Moss-Morris and Knoop) have all built their careers on the CBmodel and the reversibility of ME/CFS through CBT and GET. Why none of this is mentioned as a conflict of interest, and why it was not brought to the attention of the readers of the article, is unclear, even though the risk of latent bias was palpable from the outset.
Study | Bleijenberg | Knoop | Other authors of the systematic review |
---|---|---|---|
Prins et al. (2001)[49] | Yes | No | No |
Stulemeijer et al. (2005)[50] | Yes | No | No |
Knoop et al. (2008)[51] | Yes | Yes | No |
Nijhof et al. (2012)[52] | Yes | No | No |
Tummers et al. (2012)[16] | Yes | Yes | No |
Van der Schaaf et al. (2015)[53,54] | No | Yes | No |
Wiborg et al. (2015)[55] | Yes | Yes | No |
Janse et al. (2018)[40] | Yes | Yes | No |
Total number of studies involved | 7 of the 8 | 5 of the 8 | 0 of the 8 |
Other authors of the systematic review: Kuut, Buffart, Braamse, Csorba, Nieuwkerk, Moss-Morris, Müller.
Safety
Generally, systematic reviews focus on the efficacy or effectiveness of therapeutic interventions; however, a balanced assessment of interventions also requires an analysis of harms. According to a systematic review by Ernst and Pittler, who investigated the reporting of safety aspects of therapeutic interventions by systematic reviews and meta-analyses, “information on safety is equally important for making informed, evidence based decisions on the value of a given treatment” [68]. Yet most systematic reviews “provide little information on the safety aspects of therapeutic interventions” [68]. This “may be explained, in part, because safety is not the primary aim of most reviews” [69]. The PRISMA harms checklist, which specifically measures the reporting of harms, was published in 2016 to improve this [69]. Safety was not reported on by Kuut et al. [13]. Why they did not use the PRISMA harms checklist is unclear.
Issues with the review and the studies in it
In this article, we analyzed the systematic review by Kuut et al. and the eight trials in it. In our analysis we found many issues with the studies in the review, but also with the review itself. For example, one of the eight included studies was a non-published, non-peer-reviewed study which is not available on the Internet; as a consequence, this study should have been excluded from the review. All studies were by definition non-blinded, yet, just like the systematic review itself, they usually relied on one subjective primary outcome. The combination of non-blinding and subjective outcomes is known to exaggerate the effect of the treatment under investigation. This problem was compounded by the use of poorly designed control groups, yet the systematic review did not flag these issues as problematic. Of note is that the non-published study used two subjective and two objective primary outcomes. The study started in January 2014 with 60 participants in each of the treatment and control groups. Why the study has not been published so far is unclear, although it might well be that the objective primary outcomes showed that CBT did not lead to objective improvement and the researchers did not want to publish a null effect. It might also be that medical journals did not want to publish a study with negative results, although in recent years more and more medical journals have been willing to publish studies with negative results to counter publication bias.
Other systematic reviews
According to Kuut et al., “CBT leads to a significant and clinically relevant reduction of fatigue and functional impairment, and improvement of physical functioning in ME/CFS patients. This is in accordance with findings of one systematic review (Ingman et al., 2022) and of seven previous meta-analyses” [13]. They do not mention the systematic review by Ahmed et al. entitled “Assessment of the scientific rigour of randomized controlled trials on the effectiveness of cognitive behavioural therapy and graded exercise therapy for patients with myalgic encephalomyelitis/chronic fatigue syndrome: A systematic review” [70]. Ahmed et al. concluded that “in order to securely demonstrate the efficacy of CBT/GET within a non-blinded design, researchers need to show that self-reported improvements are supported by objectively measurable outcomes” [70]. Moreover, the final conclusion of Ahmed et al. was that “the findings of this systematic review do not support the claim that CBT and GET are effective treatments for ME/CFS patients” [70].
Moreover, the systematic review by Ingman et al. noted among the limitations of their own review “that most studies reported subjective, self-report measures, which may have increased the risk of observer or detection bias” [71], but also that “subjective and objective measures do not necessarily correlate” [71]. Ingman et al. went on to conclude that “results suggest some support for the positive effects of CBT and GET at short-term to medium-term follow-up although this requires further investigation given the inconsistent findings of previous reviews” [71]. Their conclusion about “the inconsistent findings of previous reviews” shows that it is incorrect when Kuut et al. state that more reviews have provided evidence for the efficacy of CBT and GET. Of note is that one of the researchers of Ingman et al. is Professor Chalder, one of the four leading proponents of CBT from the UK and one of the principal investigators of the PACE trial, the largest CBT and GET trial for ME/CFS so far. Ingman et al. also concluded that “a final limitation is that treatment in most trials involved at least some face-to-face sessions, thereby excluding less mobile or housebound participants, therefore findings may not be generalizable to severe CFS” [71]. This is of particular interest, because Kuut et al. reported “patients with less severe functional impairment benefitting more [from treatment with CBT] as compared to patients with severe functional impairment” [13]. The (very) severely disabled, those who are home or bedbound – 25 per cent according to most estimates [5] – are unable to attend outpatient clinics and take part in those studies. Consequently, the patients with severe functional impairment as mentioned by Kuut et al. are not patients with severe ME/CFS, as noted by Ingman et al., but are in fact moderately affected ME/CFS patients who are able to attend outpatient clinics and take part in those studies, yet have more functional impairment than the mildly affected patients in the same studies. Consequently, findings from RCTs are not generalisable to the wider CFS population, and the CBT trials reviewed here are inherently biased as a consequence. On top of that, the FINE trial [72], the sister trial of the PACE trial, examined the efficacy of CBT and GET for the more severely affected and found that both treatments do not lead to objective improvements in the severely affected.
Consequently, what Kuut et al. mean is that the only patients who benefit subjectively from treatment with CBT, according to their review, are the mildly affected patients. But even that is incorrect, as can be seen in table 4. The mean fatigue score of 34.5 borders on the score for being severely fatigued (35 or more). The mean physical functioning score of 73.4 is less than one answer step on a single question (each step changes the score by five points) away from being severely disabled, operationalized as scoring 70 or less by, for example, Tummers et al. [16]. As noted earlier, only two of the seven published studies used an entry requirement for physical functioning. One of the studies that did not is the study by Janse et al. [40], in which many patients already had a physical functioning score of 80 or more at baseline. This not only puts the diagnosis of ME/CFS in doubt, but it also artificially inflates the physical functioning score after treatment, because these patients are hardly ill and consequently do not have a problem exercising. Another study in which this was a major issue is the study by Nijhof et al. [52]. In this study, the ME/CFS of only 28% of participants in the treatment group followed an infection, even though ME/CFS is a post-infectious disease. This is of particular importance because of the very high physical functioning score (88.5) after treatment in comparison to the other studies, which will have artificially inflated the mean physical functioning score of the systematic review. This suggests that this score is a problematic outlier representing poor sampling [73]. Hawkins described an outlier as an observation that “deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism” [74].
In the two studies that did have an entry requirement, the physical functioning score after treatment was less than 70. As noted by Janse et al., “the fact that our study did not select on the level of physical functioning will make it more difficult to find an effect of iCBT on physical functioning” [40]. Why the systematic review did not take this into account is unclear, even more so because two of the systematic reviewers (Bleijenberg and Knoop) were part of that study. Why the systematic review didn’t remove the outlier score from Nijhof et al. is also unclear.
After treatment deemed to be effective by the systematic review, the mean SIP8 score of 921 is far away from the threshold for no longer being functionally impaired (less than 700, or less than 450 in a study by the same researchers on Q fever fatigue syndrome [66]). All three scores (fatigue, physical functioning and SIP8) are a long way from those of healthy individuals of the same age, as highlighted by Table 4, even though the basis of the CBmodel as an explanation for the symptoms of ME/CFS is that there is no underlying disease and that symptoms are the result of avoidance of exercise and deconditioning. Both CBT and GET are designed to reverse this and thereby lead to recovery. Yet after treatment deemed to be effective by Kuut et al., patients remain severely disabled. Moreover, the aforementioned systematic review by Ahmed et al. concluded that “the methodological quality of the 18 included studies was found to be relatively low, as bias was prominently found, affecting the main outcome measures of the studies (fatigue, physical functioning and functional impairment/status)” [70].
Objective outcomes
Only four studies in the review used an objective outcome (the actometer), yet three of these studies did not publish those results in their initial publications. Two studies published them five and nine years later, and the results showed that CBT does not lead to objective improvement in adults and adolescents, respectively. The third study did not publish its objective outcome (the actometer) at all, but acknowledged later in a comment that internet-based CBT does not lead to objective improvement. The fourth study concluded that the apparent improvement on the actometer in their study was artificially caused by a high number of missing data on this outcome.
One study also used two other objective outcomes (an objective measure of cognitive functioning and employment status). Neither showed objective improvement. Consequently, the objective outcome measures confirm the inefficacy of CBT for ME/CFS. Of note is that three of the four studies that used objective outcomes resorted to selective reporting to avoid publishing a null effect.
Vercoulen et al., whose authors included one of the authors of the systematic review (Bleijenberg), concluded that “the actual level of physical activity was related to fatigue severity” and that “fatigue severity was related to the Actometer” [25], but also that “the CIS [fatigue severity] also is a self-report questionnaire requiring a general subjective interpretation. Thus, responses on these instruments are susceptible to the same biases” [25]. In view of that, it is unclear why the systematic review by Kuut et al. ignored the actometer results, and why they ignored the objective outcomes in general. Even more so because, according to Kuut et al., “in all included trials, a specific CBT protocol was used. All CBT protocols for ME/CFS are based on the cognitive-behavioral model of fatigue assuming that cognitive-behavioral factors perpetuate fatigue and associated disability. They generally focus on similar perpetuating factors. Further, all CBT protocols for ME/CFS contain graded exposure to activity, a central element of the intervention” [13]. In most studies, the treatment lasted six months or more, and if there is no underlying physical disease, one would expect a substantial improvement in fitness and activity after such a time frame. Yet the objective outcome measures showed that graded exposure to activity does not lead to objective improvement. According to the logic of the CBmodel, if deconditioning, i.e. the patients’ level of fitness, did not change, then symptoms cannot have changed either. Consequently, the subjective improvement after CBT is an artifact caused by the problems with the design of the studies and the different forms of bias. It also shows that CBT is not the right treatment for ME/CFS.
High risk of bias
Kuut et al. acknowledge that a limitation of their review is that “none of the included studies were rated as having a low risk of bias. All included studies used patient-reported outcomes, namely subjectively experienced symptoms and functional impairments.” They then downplayed the relevance of this by stating the following: “All case definitions of ME/CFS rely on reports of patients of subjectively experienced symptoms. Therefore the efficacy of interventions aimed at symptoms of ME/CFS can only be determined with patient-reported outcome measures” [13]. They also acknowledge that “in the Cochrane risk of bias tool studies are penalized if the outcome assessor (the patient) was aware of the intervention received. However, this limitation is inherent to the evaluation of behavioral/psychotherapeutic interventions using a subjective outcome measure” [13]. Yet this is puzzling for a number of reasons. First of all, the basis of the CBmodel as an explanation for ME/CFS is that patients interpret their symptoms incorrectly. It is illogical and unscientific to then rely on patients’ interpretation of their symptoms by using subjective outcomes, because that would imply that patients interpret their symptoms wrongly and, at the same time, interpret the same symptoms rightly. Moreover, CBT studies are non-blinded by definition, and relying on subjective outcomes in such studies leaves them prone to many different forms of bias. Consequently, one could get the impression of efficacy even though patients have not benefited from the treatment. An additional problem of the CBT studies is that patients are instructed to interpret their symptoms differently. A subjective improvement could then simply be caused by answering questionnaires in a different way. This is also known as response shift bias, and the only way to correct for it is by using objective outcomes. Furthermore, the GETSET study by White, one of the world’s leading CBT and GET proponents for ME/CFS, noted the following about using subjective outcomes: “All outcomes were self-rated, which might lead to bias by expectation”, but also that “objective outcomes, such as actigraphy,…might have tested the validity of our self-rated measures of physical activity” [75]. Finally, if Kuut et al. want to rely on patient-reported outcomes, then why do they ignore that patients have been saying for a long time that CBT is ineffective?
Badly designed control groups
Six of the eight studies used a waiting list control group, even though the basic requirement of a properly conducted randomized controlled trial is that the only difference between the treatment and control group is the treatment. Interestingly enough, this problem was documented by one of the studies in the meta-analysis as long ago as 2008 (Knoop et al., which included two of the authors of the meta-analysis, Knoop and Bleijenberg), when they noted that “as we did not use a control condition [they used a waiting list control group] we cannot be sure that the specific elements in the minimal intervention condition were responsible for the reduction of fatigue and disabilities” [51].
Moreover, a meta-analysis by Furukawa et al. entitled “waiting list may be a nocebo condition” concluded that “the effect size estimates for CBT [in the treatment of depression] were substantively different, depending on the control condition” [39]. It also noted that “in individual trials of psychotherapy, the use of WL [a waiting list control group] as control should be more carefully deliberated, as it” does not control “for regression towards the mean and the natural course of the disease but instead it may introduce negative psychological expectation of ‘waiting for the desired active treatment’” [39], which artificially inflates the treatment effect. According to the same meta-analysis by Furukawa et al., “the apparent existence of small study effects is another major threat” [39]. This affected three of the seven published studies, or four of the eight when the non-published, non-peer-reviewed study is included, all of which had fewer than 75 participants in the treatment group.
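To put that 75-participant threshold into perspective, the following is a minimal, illustrative power calculation. It is not taken from Furukawa et al. or from the Kuut et al. review; it simply assumes a conventional two-arm comparison of group means with a two-sided t-test, 80% power and an alpha of 0.05, and shows roughly which standardized effect sizes groups of that size can reliably detect, and how quickly the required sample size grows for smaller effects.

# Illustrative power calculation for a hypothetical two-arm trial
# (conventional assumptions; not drawn from any of the reviewed studies).
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()

# Smallest standardized effect (Cohen's d) detectable with 80% power,
# a two-sided alpha of 0.05 and 75 participants per arm.
min_detectable_d = power_analysis.solve_power(nobs1=75, alpha=0.05, power=0.8,
                                              ratio=1.0, alternative='two-sided')
print(f"Minimum detectable effect with 75 per arm: d = {min_detectable_d:.2f}")  # roughly 0.46

# Conversely, the per-arm sample size needed to detect a medium effect (d = 0.5)
# and a small-to-moderate effect (d = 0.3).
for d in (0.5, 0.3):
    n = power_analysis.solve_power(effect_size=d, alpha=0.05, power=0.8,
                                   ratio=1.0, alternative='two-sided')
    print(f"Per-arm sample size for d = {d}: n = {n:.0f}")  # roughly 64 and 175

Under these conventional assumptions, groups of around 75 participants can only reliably detect moderate or larger effects; trials with substantially fewer participants per arm therefore produce imprecise estimates which, combined with selective publication, contribute to the small study effects described by Furukawa et al.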
Work rehabilitation
The study by Prins et al. noted that “the final goal of CBT for CFS included work rehabilitation for patients who used to be active in a job” (p. 846 [49]). They tried to downplay the null effect on employment by stating that “only 33% had a job at baseline, whereas 76% had been employed before the onset of CFS. For the unemployed patients, securing employment within the limited period of treatment and follow-up would be difficult” [49]. Yet treatment finished after eight months and follow-up was at 14 months, so participants had at least six months to find work. Moreover, if the improvement on the subjective outcomes had been a real improvement, patients would already have been improving during the last few months of treatment and would therefore most likely have had an extra two, three or four months in which to start looking for a job. In reality, they would then have had between six and ten months to look for a job. One would assume that many people around the mean age in Prins et al. of 36.2 years should be able to find a job in that sort of timeframe, if the “clinically significant improvement…in fatigue severity”, which occurred in 35% of cases, in Karnofsky performance status (49%) and in self-rated improvement (50%) [49], as reported by Prins et al., were real improvements and not artifacts.
Important other limitations
Tummers et al. noted a few important limitations of their study, which might also apply to the other studies in the systematic review. During the study, the authors found out that 10% of patients (12/123) in their study were misdiagnosed and therefore “were wrongfully included in this study”, yet they “were not excluded from data analyses” [16], even though patients who do not have the disease under investigation should be excluded from a study and its data analyses. Most other studies did not report on misdiagnoses.
The second limitation of their study, as noted by Tummers et al. themselves, was that the “assessment of the physical activity patterns of patients was not based on actometer scores, a valid and reliable method to determine the activity pattern” [16]. Moreover, “treatment adherence…was not assessed” [16]. Adherence was also not assessed in most of the other studies in the systematic review, with the notable exception of Janse et al. According to supplement number two of that study, only 16% (13/80) adhered to the treatment in the protocol-driven feedback group and only 19% (15/80) in the feedback-on-demand group [40]. Consequently, 84% and 81%, respectively, did not adhere to the treatment. Yet one cannot conclude that a treatment is safe and effective if one does not know whether patients actually adhered to it, or if, as was the case in Janse et al., the large majority of them did not.
Safety
An important part of a systematic review is also reporting on the safety of a treatment. Kuut et al., however, did not report on the safety of CBT, and it is unclear why they did not use the PRISMA harms checklist to do so. That said, as noted by the aforementioned systematic review by Ahmed et al., “not reporting adverse events is typical for this field as psychotherapy trials generally report infrequently on adverse outcomes” [70].
According to a systematic review by Ernst and Pittler, the reporting of safety aspects of therapeutic interventions by systematic reviews and meta-analyses is poor. They also noted that randomized clinical trials typically assess only a small number of patients, which limits the chances of detecting adverse events. Therefore, they recommend that the “assessment of safety has to go far beyond randomized clinical trials” [68]. One way of doing so is the reporting by patients of adverse events caused by a particular treatment in real life. Patients have been saying since as early as 1990 that CBT and GET are harmful [76]. It is unclear why Kuut et al. ignored this, especially as they emphasize the importance of patient-reported outcome measures and their systematic review relies on them.
The British National Institute for Health and Care Excellence (NICE) published its updated ME/CFS guideline in October 2021 [12]. As part of that review process, it commissioned Oxford Brookes University to carry out a survey amongst ME/CFS patients (n = 2274) on the safety of CBT and GET. Oxford Brookes University published its report in February 2019 [77], in which it reported the following: 98.5% of the patients who took part in the survey experienced post-exertional malaise, the core symptom of the disease; worsening of ME/CFS symptoms after treatment with CBT, which contains an element of GET, was reported by 58.3%; and the percentage of severely affected patients increased from 12.6% to 26.6%. To put it differently, an additional 14% of patients were made homebound or bedridden by CBT.
Finally, in an interview on Dutch TV in January 2024 [78], pediatrician Terheggen-Lagro and medical psychologist Oostrom from the Amsterdam UMC (University Medical Centre), a leading hospital in the Netherlands, stated that they had started to treat children with long Covid with CBT as designed for ME/CFS, which contains an element of GET, and with GET itself. However, they noted that instead of helping patients recover, it made them worse, just as in ME/CFS. Consequently, they stopped using these treatments.
NICE updated ME/CFS guideline
NICE published its updated ME/CFS guideline in October 2021 [12], in which it concluded that CBT does not lead to improvement or recovery and that it should only be used as an adjunctive treatment if patients need it. Two of the researchers of the systematic review have built their careers on the CBmodel and the efficacy of CBT for ME/CFS, and one of them (Knoop) has already co-written two articles criticizing the new NICE guideline and its processes. One of those is entitled “Anomalies in the review process and interpretation of the evidence in the NICE guideline for chronic fatigue syndrome and myalgic encephalomyelitis” and was published in a journal in which one of the co-authors is a paid associate editor, according to the competing interests section of the article [79]. The other, entitled “New NICE guideline on chronic fatigue syndrome: more ideology than science?”, was published in The Lancet [80]. Our analysis of this article found that it was based on rhetoric and ideology instead of evidence-based science [81]. This all gives the impression that the systematic review was the third article in a row intended to counter the conclusion by NICE, because the authors of those three articles cannot accept the pragmatic shift away from the opinion- and eminence-based biopsychosocial and cognitive behavioral model. This seems to be a typical example of the following, noted by Ioannidis: “Investigators working in any field are likely to resist accepting that the whole field in which they have spent their careers is a ‘null field’” (p. 0700 [82]).
Additionally, the systematic review by Kuut et al. included studies from 2001 to 2018, and all of those studies were already known to NICE and included in its extensive analysis of the CBT literature. That analysis showed that all the CBT trials included in the review process to update the NICE ME/CFS guideline were of low or very low quality [12,83]. Our analysis of the systematic review by Kuut et al. and all the studies in it highlighted the extensive number of problems with those studies and confirmed the conclusion by NICE that none of them were of higher quality.
Non-reported conflict of interest
A potentially important limitation of the systematic review is that one of its authors (Bleijenberg) was involved in seven of the eight RCTs that were analyzed and another (Knoop) was involved in five of them. After excluding the non-published, non-peer-reviewed study, one of the authors (Bleijenberg) was involved in all seven remaining studies and the other (Knoop) in four of the seven. Or to put it differently, at least one of them was involved in all seven studies and both of them were investigators in four of the seven. In the world of academia, this is known as marking your own homework, which might have influenced the review and its outcome [84]. Also, two of the systematic reviewers might have financial interests in the outcome of their review, and three professors (Bleijenberg, Moss-Morris and Knoop) who were involved in the systematic review have all built their careers on the CBmodel and the reversibility of ME/CFS through CBT and GET. As noted by Dana and Loewenstein [85], when individuals have a stake in reaching a particular conclusion, they weigh arguments in a biased fashion that favors that conclusion. Moreover, as noted by Groopman [86], scientists often ignore what they don’t want to see and seek confirmation of what they believe. Consequently, the three professors had a preexisting professional stake in the outcome of their systematic review and their impartiality might reasonably be questioned. If the systematic review had failed to show that CBT/GET is an effective treatment for ME/CFS, that would have undermined the CBmodel and the theories of reversibility to which three of the systematic reviewers have dedicated their careers. In conclusion, some of the systematic reviewers have an interest, financial or otherwise, that could be affected substantially by the outcome of the systematic review, yet none of these conflicts of interest are listed in the competing interests section of their article, nor are they mentioned in the abstract, the discussion or the conclusion. As a consequence, most readers will not be aware of them.
According to the publication ethics section of the journal in which the systematic review was published (Psychological Medicine), “competing interests are situations that could be perceived to exert an undue influence on the presentation, review or publication of a piece of work” (p. 1 [84]). Moreover, a leading medical journal (the British Medical Journal, usually referred to as the BMJ), for example, states that the journal “should know about any competing interests that authors may have, and that if we publish the article readers should know about them too” [87]. Yet the researchers did not inform their readers.
A systematic review by Dragioti et al. concluded not only that “experimenter’s allegiance influences the effect sizes of psychotherapy RCTs and can be considered non-financial conflict of interest introducing a form of optimism bias, especially since blinding is problematic in this kind of research” [46], but also that “reported effect sizes were found to be larger by almost 30% when the allegiant therapist had participated in the respective RCT compared to studies in which he was not included in the authorship list” [46]. This is of particular importance because Bleijenberg and Knoop have built their careers on the CBmodel and the efficacy of CBT for ME/CFS, and Bleijenberg participated in seven of the eight RCTs included in the systematic review and Knoop in five.
Finally, in March 2009, “the World Association of Medical Editors revised its policy on competing interests to emphasize nonfinancial competing interests, including academic commitments, personal relationships, political or religious beliefs and institutional affiliations” [88]. As a consequence, the Canadian Medical Association Journal changed its policy accordingly, and in an editorial it emphasized that “full disclosure [of competing interest] is particularly important for authors of commentaries, editorials and review articles [italics added by us]. Because such articles often offer explicit guidance, readers expect a stronger guarantee of integrity” [88]. Why the researchers of the systematic review ignored all of that is unclear.
Checklist of recommendations on how to design and conduct treatment studies for ME/CFS
There are no effective treatments for post-infectious diseases like ME/CFS and long Covid because the medical profession has been psychologizing those diseases for decades. If we had taken them seriously, as we should have, then by now we would have had effective pharmacological treatments for the estimated 400 million people with long Covid [89] and the medical profession would not be clutching at straws on how to treat those patients. The same applies to the estimated 17 to 24 million people with ME/CFS [90]. As noted by Rohrhofer et al., “given the substantial health and socioeconomic burden associated with ME/CFS, urgent attention and research efforts are needed to define causative treatment approaches” [91].
Treatment studies aimed at defining and finding causative treatment approaches need to adhere to the highest standards of clinical research in order to identify effective pharmacological treatments for post-infectious diseases. Requirements for those studies, which will allow for reliable and valid conclusions, are:
• The research population needs to be homogenous;
• There need to be at least 75 participants in the treatment group and a similar number in the control group unless it’s a phase one or feasibility study [41,42];
• The Oxford criteria [92], which do not require the main characteristic of the disease (PEM) for diagnosis, and the Fukuda criteria [93], according to which the main characteristic is optional, should no longer be used;
• Studies should use the International Consensus Criteria [94];
• Primary outcomes should be objective (e.g. a step test, the six-minute walk test, the actometer or hours worked, as discussed earlier) [17,22,24-27,35];
• One additional subjective primary outcome (quality of life scores) could be used [95];
• Fatigue should not be used as a primary outcome, because a subjective “improvement” of a few points on a fatigue scale is not relevant to patients, even more so because many ME/CFS patients do not suffer from fatigue;
• Studies of psychological treatments and gradually increasing activity should not be performed or financed anymore, to stop wasting time, money and resources and to prevent harming more patients;
• Control groups should be properly matched and designed as discussed earlier;
• Studies should not use a waiting list, no treatment, usual care or other control groups in which patients do not receive a control treatment with the same frequency, duration, expectation of improvement, et cetera, as people in the treatment group [36-40];
• Patients and/or carers should be involved in the design and conduct of these studies;
• Definitions of recovery should be based on objective outcomes;
• Non-blinded studies that deem treatments to be effective, based on subjective outcomes, should be rejected for publication;
• Misdiagnosed patients should be removed from a study and its analysis;
• Future trials are particularly needed for individuals with severe and very severe ME/CFS because they are devoid of any medical care, despite being severely ill;
• Studies that resort to selective reporting of objective outcomes should be rejected by journals;
• Studies that make extensive endpoint changes, use a post hoc definition of recovery, have an overlap in entry and recovery criteria, label the severely ill as recovered, et cetera, should also be rejected by journals.
In summary, the systematic review by Kuut et al. included a researcher who was involved in seven of the eight studies in their review, and another one who was involved in five of them. Moreover, at least one of them was involved in every study. On top of that, the three professors who were involved in the systematic review have all built their careers on the CBmodel and the reversibility of ME/CFS through CBT and GET, and two of the systematic reviewers have a potential financial conflict of interest. Yet they failed to inform the readers about these conflicts of interest. Conducting a review in this manner and not informing the readers undermines the credibility of a systematic review and its conclusion. Regarding outcome differences between treatment and control group, it’s highly likely that the combination of non-blinded trials, subjective outcomes and poorly chosen control groups, alone or together with response shift bias and/or patients filling in questionnaires in a manner to please the investigators, allegiance bias, small study effect bias and other forms of bias, produced the appearance of positive effects, despite the lack of any substantial benefit to the patients, leading to the erroneous inference of efficacy in its absence. That CBT is not an effective treatment is highlighted by the fact that patients remain severely disabled after treatment with it. The absence of objective improvement, as shown by the actometer, employment and disability benefit status, and objective cognitive measures, confirms the inefficacy of CBT for ME/CFS. Kuut et al. did not report on the safety of CBT. Oxford Brookes University did, and its research shows that CBT, which contains an element of graded exercise therapy, is harmful for many patients. Finally, our reanalysis highlights the fact that researchers should not mark their own homework.
Author Contributions: Conceptualization, M.V.; methodology, M.V. and A.V.-N.; validation, M.V. and A.V.-N.; writing—original draft preparation, M.V.; writing—review and editing, M.V. and A.V.-N.; and supervision, M.V. and A.V.-N. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: Not applicable.
Conflicts of Interest: The authors declare that they have no conflict of interest.