Evidence-based medicine demands knowledge of the best available research evidence to guide clinical decisions, which most often comes from rigorous systematic reviews and meta-analyses that identify, summarize, pool, and appraise all research evidence addressing a particular research question. While results from individual studies may appear impressive, our confidence in the body of evidence may be undermined by issues such as limitations in study designs or differences between the questions addressed in studies and the question of interest. To address such challenges, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group developed a system to evaluate the certainty (quality) of evidence—called GRADE. The GRADE approach intends to improve interpretation of evidence for decision-making [1].

The GRADE approach is applied to a body of evidence, meaning all research studies addressing a particular clinical question, as summarized in a systematic review and meta-analysis. According to the GRADE approach, the certainty of a body of evidence may be rated as high, moderate, low, or very low (Fig. 1). High certainty evidence suggests that the estimated effect (the result from a rigorous systematic review and meta-analysis) is likely close to the true effect. Conversely, low or very low certainty evidence suggests that the estimated effect may in fact be substantially different from the true effect.

Fig. 1: GRADE Approach.

Overview of the GRADE approach to assessing the certainty of evidence.

Original GRADE guidance proposed that, for questions of causal inference, a body of evidence composed of randomized trials starts at high certainty and one composed of non-randomized studies starts at low certainty [2]. This is because randomization achieves approximate balance in prognostic factors between arms, such that any observed differences in outcomes between randomized arms can be confidently attributed to the intervention under investigation. In non-randomized studies, however, participants with and without an exposure may differ with regard to prognostic factors, such that any differences in outcomes between participants may be an artifact of those differences rather than an effect of the exposure. Even if investigators use sophisticated design and analytic methods to adjust for a comprehensive list of prognostic factors, important factors that are unknown or unmeasured may still influence results. This phenomenon is called residual confounding and is the reason why non-randomized studies are initially rated at low certainty.

Newer GRADE guidance suggests that a body of evidence composed of non-randomized studies can also start at high certainty, with the certainty of evidence then downgraded by considering limitations of the evidence in comparison to a “target trial”: a hypothetical trial, without any limitations, that may or may not be feasible, addressing the question of interest [3]. Using this approach, however, a body of evidence composed of non-randomized studies will almost always still land at low or very low certainty due to concerns about residual confounding.

The certainty of a body of evidence may be rated down by one or more levels due to concerns related to five factors: risk of bias (i.e., study limitations that may lead to systematic under- or over-estimation) [4], inconsistency (i.e., unexplained heterogeneity in results across studies) [5], indirectness (i.e., differences between the questions addressed in studies and the question of interest) [6], imprecision (i.e., the width of confidence intervals around an estimate in relation to the minimum difference in the outcome that patients find important) [7], and publication bias (i.e., the tendency for studies with statistically significant or positive results to be published, published faster, or published in journals with higher visibility) [8]. The certainty of evidence may also be rated up in select rare scenarios: when there is a dose-response relationship, when the effect is large, or when all plausible confounders act in the direction opposite to the observed effect [9].
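
The rating logic described above can be illustrated with a short sketch. This is only a simplified bookkeeping aid under stated assumptions: the function name `rate_certainty`, the domain labels, and the purely arithmetic treatment of levels are hypothetical and are not part of GRADE guidance, which relies on reviewer judgment rather than mechanical addition and subtraction.

```python
# Hypothetical sketch: tallying GRADE rating decisions as level changes.
# Names and the arithmetic treatment are illustrative assumptions only.

LEVELS = ["very low", "low", "moderate", "high"]

# The five rating-down factors and three rating-up scenarios named in the text.
DOWNGRADE_DOMAINS = {
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
}
UPGRADE_DOMAINS = {"dose_response", "large_effect", "opposing_confounding"}


def rate_certainty(start, rate_down=None, rate_up=None):
    """Return the certainty level after applying reviewer judgments.

    start     -- starting level, e.g. "high" for randomized trials
    rate_down -- dict mapping a downgrade domain to levels deducted
    rate_up   -- dict mapping an upgrade domain to levels added
    """
    rate_down = rate_down or {}
    rate_up = rate_up or {}

    unknown = (set(rate_down) - DOWNGRADE_DOMAINS) | (set(rate_up) - UPGRADE_DOMAINS)
    if unknown:
        raise ValueError(f"Unrecognized domains: {sorted(unknown)}")

    score = LEVELS.index(start) - sum(rate_down.values()) + sum(rate_up.values())
    # Certainty cannot fall below "very low" or rise above "high".
    score = max(0, min(score, len(LEVELS) - 1))
    return LEVELS[score]


# Randomized trials rated down one level each for risk of bias and imprecision.
print(rate_certainty("high", rate_down={"risk_of_bias": 1, "imprecision": 1}))
```

Recording each judgment by domain, rather than only the final level, mirrors the transparency that GRADE aims for: a reader can see which concern drove each change in rating.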

GRADE is sometimes misinterpreted as evaluating risk of bias or misused as a risk of bias tool. GRADE, however, considers many factors beyond risk of bias and is intended to be applied alongside, rather than instead of, a risk of bias tool [4]. Systematic reviewers are expected to first assess the risk of bias of individual studies using an established risk of bias tool. Then, when applying GRADE, they should consider the risk of bias ratings of individual studies to make judgments about the risk of bias of the entire body of evidence. Of course, to apply GRADE, reviewers will also need to consider factors beyond risk of bias as described above.

Fig. 2: Macular Hole Closure Meta-Analysis Results.

Meta-analysis comparing macular hole closure between face-down positioning and control.

Box 1 presents an application of the GRADE approach to a body of evidence addressing a question related to ophthalmology.

Updates

The GRADE Working Group has refined and updated its guidance since its inception. Official guidance from the GRADE Working Group is summarized in GRADE Guidance Papers, while GRADE Concept Papers and GRADE Notes discuss concepts relevant to GRADE and case studies [14]. While theoretically anyone can use the GRADE approach to evaluate the certainty of a body of evidence, the confident application of the GRADE approach will require familiarity with the prodigious volume of GRADE guidance papers.

Initially, GRADE guidance focused on assessing the certainty of evidence for causal questions. Since then, GRADE guidance has been extended to apply to questions of prognosis [15] and diagnosis [16]. GRADE now also offers guidance for evaluating the certainty of evidence from network meta-analyses [17, 18]. GRADE has recently clarified judgments related to imprecision [19]. Newer GRADE guidance emphasizes that the certainty of evidence represents the certainty that the true effect lies on one side of a specified threshold or within a chosen range [19].
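
The threshold idea can be made concrete with a minimal sketch. The function name `ci_relative_to_threshold` and the three-way classification are hypothetical simplifications introduced here for illustration; actual imprecision judgments under GRADE involve further considerations that this check does not capture.

```python
# Hypothetical sketch: locating a confidence interval relative to a
# reviewer-specified threshold. Not an official GRADE procedure.

def ci_relative_to_threshold(ci_low, ci_high, threshold):
    """Classify a confidence interval as lying above, below, or across a threshold."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low > threshold:
        return "above"
    if ci_high < threshold:
        return "below"
    # The interval includes the threshold, so the data are compatible with
    # true effects on either side of it.
    return "crosses"


# Illustrative numbers only: a relative risk interval against a threshold of 1.0.
print(ci_relative_to_threshold(0.80, 1.20, 1.0))
```

An interval that crosses the threshold leaves open the possibility that the true effect lies on either side, which is the kind of situation in which reviewers might consider concerns about imprecision.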

Most recently, GRADE has been expanded to offer guidance on moving from evidence to recommendations, by way of GRADE Evidence-to-Decision frameworks [20]. These frameworks provide a structured process for moving from evidence to guideline recommendations, systematically considering all factors that may bear on the direction and strength of the recommendations, including the balance between benefits and harms, certainty of evidence, values and preferences, and sometimes, cost-effectiveness, acceptability, feasibility, and equity.

GRADE also offers guidance on summarizing results, developed based on feedback from evidence users and other stakeholders [21]. According to this guidance, high certainty evidence is described with declarative statements, moderate certainty evidence with ‘probably’, low certainty evidence with ‘may’, and very low certainty evidence with ‘very uncertain’.
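
As an illustration of this wording convention, the sketch below maps each certainty level to a sentence pattern. The qualifiers ('probably', 'may', 'very uncertain') come from the guidance described above; the function name `summarize`, its parameters, and the exact sentence templates are assumptions made for this example.

```python
# Hypothetical sketch: choosing summary wording by certainty level.
# Sentence templates are illustrative assumptions, not official phrasing.

def summarize(certainty, intervention, verb_3sg, verb_base, outcome):
    """Build a one-sentence summary whose wording reflects certainty.

    verb_3sg  -- third-person singular verb form, e.g. "reduces"
    verb_base -- base verb form for use after "may", e.g. "reduce"
    """
    if certainty == "high":
        return f"{intervention} {verb_3sg} {outcome}."
    if certainty == "moderate":
        return f"{intervention} probably {verb_3sg} {outcome}."
    if certainty == "low":
        return f"{intervention} may {verb_base} {outcome}."
    if certainty == "very low":
        return (
            f"The evidence is very uncertain about the effect of "
            f"{intervention} on {outcome}."
        )
    raise ValueError(f"Unknown certainty level: {certainty!r}")


# Placeholder intervention and outcome names, not results from any study.
for level in ["high", "moderate", "low", "very low"]:
    print(summarize(level, "Intervention X", "reduces", "reduce", "outcome Y"))
```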

Advantages of the GRADE approach

GRADE has now been adopted by over 100 organizations worldwide, including the World Health Organization, the National Institute for Health and Care Excellence, and the Cochrane Collaboration, and across a diverse range of health fields [22].

The GRADE approach offers a systematic and transparent process for assessing the certainty of evidence and moving from evidence to recommendations. Its application ensures that all factors that may bear on the certainty of evidence or on the direction and strength of recommendations are systematically and transparently considered. Nevertheless, its application invariably involves subjective judgments and such judgments may vary between reviewers, guideline developers, and other decision-makers. Disagreements between reasonable individuals are expected. The transparency facilitated by GRADE, however, allows parties to identify specific sources of disagreement.

The application of GRADE also promotes the consideration of patient values and preferences. According to the GRADE approach, patient perspectives inform the selection of outcomes to consider in guideline recommendations and other decisions, judgments about whether the benefits of a particular course of action outweigh its harms, and the minimum difference in outcomes that patients find important [23]. Considering values and preferences respects the rights of citizens to participate in health decision-making, aligns guidelines with the needs and priorities of the communities they are intended to serve, helps ensure that recommendations are logistically feasible and acceptable, and improves support for the recommendations [24, 25].

Conclusion

GRADE presents a systematic and transparent approach to evaluating the certainty of evidence and to moving from evidence to recommendations. It has been adopted by multiple authoritative international organizations. Since its inception, GRADE has been refined and updated, and it will likely continue to evolve based on the experiences of reviewers and guideline developers. Its application offers several strong advantages, including consistent standards for evaluating evidence across health fields and explicit consideration of patient values and preferences.