How similar are estimates of treatment effectiveness derived from randomised controlled trials and observational studies?

Key messages

- On average, the effect estimates of randomised controlled trials (RCTs) and observational studies differ only very slightly. Effect estimates are statistical constructs that describe the size of an intervention effect in terms of the difference between the outcomes of two groups of people in a clinical trial or study.
- We need more research, with careful consideration of the factors that might influence similarities and differences in effect estimates between different study types.

What are RCTs and observational studies, and why do their effect estimates potentially differ?

Randomised controlled trials (RCTs) are a type of healthcare experiment where participants are allocated at random to one of two (or more) treatment groups. One group is given an experimental treatment (also known as an 'intervention'); the other is the 'control' group, which is not given the intervention. RCTs test how effective and safe an experimental treatment is under ideal conditions.

Observational studies try to measure the effectiveness of an intervention in non-experimental, 'real world' scenarios. Case-control (or retrospective) studies and cohort studies are two common types of observational study. Case-control studies compare a group of people with a particular condition/disease to a group who do not have it but are otherwise similar. Cohort studies follow a group of people with a common characteristic over time to find out how many reach a certain health outcome of interest.

Sometimes, RCTs and observational studies addressing the same question produce different results. These types of study differ in how they are conducted and in their susceptibility to systematic error.

What did we want to find out?

We wanted to assess the impact of study type (RCT versus observational studies) on the summary effect estimate and to explore methodological aspects that might explain any differences.

What did we do?

We searched databases for reviews that systematically compared the effect estimates reported in RCTs and observational studies that addressed the same health research question. We looked for reviews that included any healthcare outcomes, without restrictions on the language of publication. We searched for reviews/overviews published between 01 January 1990 and 12 May 2022. We then compared the results of the reviews, and summarised the evidence. We rated our confidence in this evidence, based on factors such as the methods used in the reviews and their size, and the consistency of findings across reviews.

What did we find?

We identified 47 relevant reviews; 34 contributed data to our main analysis. The reviews compared the effect estimates of RCTs to those of cohort studies, case-control studies, or both. The reviews addressed a variety of health-related topics. They were conducted in countries around the world, but most were done in the USA. Twelve reviews did not report any information on funding. In eight reviews, the authors reported receiving no funding. In 23 reviews, the authors reported receiving public funding, such as governmental funding or funding from universities or foundations. Two reviews were funded by the European Union and two reviews reported receiving industry funding. Most funded reviews reported multiple sources of funding.

Main results

- We found that the effect estimates of RCTs and observational studies may differ very little to not at all.
- There may be small differences when we compare effect estimates of studies investigating only medicines (as opposed to other healthcare treatments, such as surgery or physical therapy).

We also found small differences in the effect estimates that were based on data from:
- meta-analysis of RCTs and observational studies that showed substantial statistical heterogeneity; that is, variability in the intervention effects being evaluated in the different studies;
- observational studies that either did not use, or did not clearly report using, methods to account for population characteristics that can affect the effectiveness of an intervention (propensity score adjustment);
- observational studies that did not give sufficient information about their study design.

What are the limitations of the evidence?

We have little confidence in the evidence because the included reviews might be at risk of systematic error because of how they were conducted. Moreover, the reviews covered different types of people and interventions, and their individual findings therefore differed considerably.

How up to date is this review?

The evidence is current to May 2022.

Authors' conclusions: 

We found no difference or a very small difference between effect estimates from RCTs and observational studies. These findings are largely consistent with findings from recently published research. Factors other than study design need to be considered when exploring reasons for a lack of agreement between results of RCTs and observational studies, such as differences in the population, intervention, comparator, and outcomes investigated in the respective studies. Our results underscore that it is important for review authors to consider not only study design, but also the level of heterogeneity in meta-analyses of RCTs or observational studies. A better understanding is needed of how these factors might yield estimates reflective of true effectiveness.

Background: 

Researchers and decision-makers often use evidence from randomised controlled trials (RCTs) to determine the efficacy or effectiveness of a treatment or intervention. Studies with observational designs are often used to measure the effectiveness of an intervention in 'real world' scenarios. Numerous study designs and their modifications (including both randomised and observational designs) are used for comparative effectiveness research in an attempt to give an unbiased estimate of whether one treatment is more effective or safer than another for a particular population. An up-to-date systematic analysis is needed to identify differences in effect estimates from RCTs and observational studies. This updated review summarises the results of methodological reviews that compared the effect estimates of observational studies with RCTs from evidence syntheses that addressed the same health research question.

Objectives: 

To assess and compare synthesised effect estimates by study type, contrasting RCTs with observational studies.

To explore factors that might explain differences in synthesised effect estimates from RCTs versus observational studies (e.g. heterogeneity, type of observational study design, type of intervention, and use of propensity score adjustment).

To identify gaps in the existing research comparing effect estimates across different study types.

Search strategy: 

We searched MEDLINE, the Cochrane Database of Systematic Reviews, Web of Science databases, and Epistemonikos to May 2022. We checked references, conducted citation searches, and contacted review authors to identify additional reviews.

Selection criteria: 

We included systematic methodological reviews that compared quantitative effect estimates measuring the efficacy or effectiveness of interventions tested in RCTs versus in observational studies. The included reviews compared RCTs to observational studies (including retrospective and prospective cohort, case-control and cross-sectional designs). Reviews were not eligible if they compared RCTs with studies that had used some form of concurrent allocation.

Data collection and analysis: 

Using results from observational studies as the reference group, we compared the relative summary effect estimates (risk ratios (RRs), odds ratios (ORs), hazard ratios (HRs), mean differences (MDs), and standardised mean differences (SMDs)) by computing the ratio of odds ratios (ROR), ratio of risk ratios (RRR), ratio of hazard ratios (RHR), or difference in (standardised) mean differences (D(S)MD), to evaluate whether the effects estimated in RCTs were relatively larger or smaller.

If an included review did not provide an estimate comparing results from RCTs with observational studies, we generated one by pooling the estimates for observational studies and RCTs, respectively. Across all reviews, we synthesised these ratios to produce a pooled ratio of ratios comparing effect estimates from RCTs with those from observational studies. In overviews of reviews, we estimated the ROR or RRR for each overview using observational studies as the reference category.
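The ratio-of-ratios comparison described above can be sketched in code. The following is an illustrative Python calculation, not the review's actual statistical code: it assumes fixed-effect inverse-variance pooling on the log scale (meta-analyses in practice often use random-effects models), and the function names and input format are invented for the sketch.

```python
import math

def pool_log_or(log_ors, ses):
    """Fixed-effect inverse-variance pooling of log odds ratios.

    Illustrative sketch only; real analyses often use random-effects
    pooling instead.
    """
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * lo for w, lo in zip(weights, log_ors)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, se_pooled

def ratio_of_odds_ratios(rct, obs):
    """ROR comparing pooled RCT estimates with pooled observational
    estimates, with observational studies as the reference group.

    `rct` and `obs` are lists of (log_or, se) pairs.
    Returns (ROR, lower 95% CI, upper 95% CI).
    """
    lo_rct, se_rct = pool_log_or(*zip(*rct))
    lo_obs, se_obs = pool_log_or(*zip(*obs))
    log_ror = lo_rct - lo_obs
    # Variances add because the two pooled estimates are independent.
    se_ror = math.sqrt(se_rct ** 2 + se_obs ** 2)
    return (math.exp(log_ror),
            math.exp(log_ror - 1.96 * se_ror),
            math.exp(log_ror + 1.96 * se_ror))
```

An ROR above 1 indicates a relatively larger pooled effect in RCTs than in observational studies; a confidence interval containing 1 indicates no clear difference.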

We appraised the risk of bias in the included reviews using nine criteria in total. To receive an overall low risk of bias rating, an included review needed to meet four key criteria: explicit criteria for study selection, a complete sample of studies, control for methodological differences between studies, and control for study heterogeneity. We assessed reviews/overviews not meeting these four criteria as having an overall high risk of bias.

We used the GRADE approach to assess the certainty of the evidence, which comprised multiple evidence syntheses.

Main results: 

We included 39 systematic reviews and eight overviews of reviews, for a total of 47. Thirty-four of these contributed data to our primary analysis. Based on the available data, we found that the reviews/overviews included 2869 RCTs involving 3,882,115 participants, and 3924 observational studies with 19,499,970 participants.

We rated 11 reviews/overviews as having an overall low risk of bias, and 36 as having an unclear or high risk of bias. Our main concerns with the included reviews/overviews were that some did not assess the quality of their included studies, and some failed to account appropriately for differences between study designs – for example, they conducted aggregate analyses of all observational studies rather than separate analyses of cohort and case-control studies.

When pooling RORs and RRRs, the ratio of ratios indicated no difference or a very small difference between the effect estimates from RCTs versus from observational studies (ratio of ratios 1.08, 95% confidence interval (CI) 1.01 to 1.15). We rated the certainty of the evidence as low. Twenty-three of 34 reviews reported effect estimates of RCTs and observational studies that were on average in agreement.

In a number of subgroup analyses, small differences in the effect estimates were detected:

- pharmaceutical interventions only (ratio of ratios 1.12, 95% CI 1.04 to 1.21);
- RCTs and observational studies with substantial or high heterogeneity; that is, I² ≥ 50% (ratio of ratios 1.11, 95% CI 1.04 to 1.18);
- no use (ratio of ratios 1.07, 95% CI 1.03 to 1.11) or unclear use (ratio of ratios 1.13, 95% CI 1.03 to 1.25) of propensity score adjustment in observational studies; and
- observational studies without further specification of the study design (ratio of ratios 1.06, 95% CI 0.96 to 1.18).

We detected no clear difference in other subgroup analyses.
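The heterogeneity threshold used in the subgroup analysis above (substantial or high heterogeneity, I² ≥ 50%) can be illustrated with a short sketch. This is a minimal Python implementation of Higgins' I² derived from Cochran's Q under a fixed-effect model; the function name and input format are assumptions for illustration, not code from the review.

```python
import math

def i_squared(effects, ses):
    """Higgins' I²: the percentage of variability in effect estimates
    attributable to between-study heterogeneity rather than chance.

    `effects` are study effect estimates (e.g. log odds ratios) and
    `ses` their standard errors. Illustrative sketch only.
    """
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
```

With identical study estimates, I² is 0%; widely discrepant, precisely estimated studies push I² towards 100%, and values of 50% or more are conventionally read as substantial heterogeneity.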