Levels of Evidence – How to Rate the Quality of a Study


The major goal of the U.S. Preventive Services Task Force (USPSTF) is to provide a reliable and accurate source of evidence-based recommendations on preventive services. The overarching question that the Task Force seeks to answer for every preventive service is whether the evidence suggests that providing the service would improve health outcomes if implemented in a general primary care population.

Assessing the Quality of a Study

The concept of ‘levels of evidence’ follows the National Academy of Medicine (NAM) standard that quality of evidence should be an integral part of medical guideline development. Note that the NAM does not prescribe a single rating system, and different professional bodies may use different scoring systems. What is important is that the ‘levels of evidence’ be readily apparent to the guideline reader. The same approach can be used when reviewing a paper.

Assess the quality of a study by asking 3 basic questions:

  1. Is the study interventional? That is, do the investigators ‘intervene’ in the management of the study subjects, or does nature take its course?
  2. If yes to the above, is the study randomized?
  3. Did the authors include a section describing how they determined that they included enough study subjects to draw conclusions?

Level I Evidence – interventional study – randomized

  • Randomized controlled trial (RCT) – study subjects are placed into an experimental versus a control group in a random fashion, beyond the control of the investigators
  • Because allocation is random, even unknown or unanticipated factors are expected to balance out between the two groups
  • The control (‘non-exposed’) group can receive either a placebo or an alternate drug, test, or management plan
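
As an illustration of allocation that is ‘beyond the control of the investigators’, the following is a minimal sketch of simple randomization. The function name `randomize`, the subject IDs, and the fixed seed are assumptions made for this example only; actual trials typically use more elaborate schemes (for example block or stratified randomization) with concealed allocation.

```python
import random


def randomize(subject_ids, seed=None):
    """Randomly split subjects into experimental and control groups.

    Hypothetical helper for illustration: shuffles the IDs and cuts the
    list in half, so the investigators do not choose who goes where.
    """
    rng = random.Random(seed)  # a seed is used here only to make the example repeatable
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Twenty hypothetical study subjects, numbered 1 to 20
groups = randomize(range(1, 21), seed=42)
print(groups["experimental"])
print(groups["control"])
```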

Level II-1 Evidence – interventional study – not randomized

  • Allocation to the experimental versus control group is left to the investigators, so bias is more likely than in a Level I study

Level II-2 Evidence – observational study – no intervention by the researchers

(a) Cohort studies – the starting point is a particular exposure

  • Study subjects are classified as ‘exposed’ or ‘not exposed’ and followed for a period of time
  • Timing can be prospective (following study subjects for a prescribed amount of time) or retrospective (looking back through charts)
  • The rate of the outcome can be compared between the exposed and unexposed groups, and relative risks can be determined
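
The relative risk mentioned above is the risk of the outcome in the exposed group divided by the risk in the unexposed group. The sketch below shows that arithmetic; the function name and the counts are hypothetical and are not taken from any study.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk of the outcome in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30 of 200 exposed and 15 of 200 unexposed subjects
# develop the outcome during follow-up (15% versus 7.5%)
rr = relative_risk(30, 200, 15, 200)
print(f"Relative risk: {rr:.2f}")
```

A relative risk above 1 suggests the outcome is more frequent in the exposed group, and a value below 1 suggests it is less frequent.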

(b) Case-control studies – these are ‘retrospective’ studies, meaning that the events or exposures being studied have already taken place, and the outcome of interest, rather than the exposure, drives the design (compare with cohort studies, above)

  • A group who had the outcome of interest (for example cesarean section) is matched to a group that did not
  • The exposure of interest is then assessed in both groups (to continue the example – exposure to an epidural during labor)
    • The odds of epidural exposure among those who had a cesarean section can then be compared with the odds of epidural exposure among those who delivered vaginally, giving an odds ratio
  • These studies can determine association but not causation
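
For a case-control comparison, the usual summary measure is the odds ratio from a 2×2 table: the odds of exposure among cases divided by the odds of exposure among controls. The sketch below continues the cesarean and epidural example with invented counts; the function name and all numbers are assumptions for illustration only.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    return odds_cases / odds_controls


# Hypothetical counts:
#   cases (cesarean section):   60 had an epidural, 40 did not
#   controls (vaginal delivery): 50 had an epidural, 50 did not
ratio = odds_ratio(60, 40, 50, 50)
print(f"Odds ratio: {ratio:.2f}")
```

An odds ratio above 1 indicates that the exposure is associated with the outcome; as noted above, it does not by itself establish causation.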

Level II-3 Evidence – a ‘snapshot’ in time

Cross sectional studies – At any given time, what is the exposure and outcome?

  • Useful for estimating the prevalence of a disorder and examining variables that may be associated with it
  • For example, one could look at the prevalence of uterine cancer in a particular population, based on age or exposure to hormonal therapy
  • These studies will often assess other factors (e.g., socioeconomic level) to make sure the variable of interest is truly an independent factor

Level III Evidence – ‘descriptive’ studies with minimal design; the lowest level of evidence

  • Case study – description of clinical features related to a particular topic of interest
  • Expert opinion – individual expert opinion was previously used frequently for recommendations but is now relied on mainly in the absence of better study designs
    • Used in the context of expert panels and committees, and may be important when data are lacking or research is ongoing


“Level of evidence” is a standardized way to rate the quality of a research project based on its study design. Interventional studies, in which researchers intervene in the management of study subjects, provide the highest level of evidence, with randomized controlled trials at the top. Lower levels of evidence come from observational approaches, in which ‘nature is allowed to take its course’. Ultimately, the goal of any such rating system is to help clinicians provide evidence-based care to patients and to aid public health policy.
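
The rating scheme described in this article can be summarized as a small lookup. The sketch below is only a restatement of the levels listed above; the function name, argument names, and design labels are hypothetical and are not part of any official tool.

```python
# Levels for designs in which the researchers do not intervene,
# as listed in this article
NON_INTERVENTIONAL_LEVELS = {
    "cohort": "II-2",
    "case-control": "II-2",
    "cross-sectional": "II-3",
    "case study": "III",
    "expert opinion": "III",
}


def evidence_level(interventional, randomized=False, design=None):
    """Return the level of evidence for a study, following the article's scheme."""
    if interventional:
        # Question 2: is the interventional study randomized?
        return "I" if randomized else "II-1"
    if design not in NON_INTERVENTIONAL_LEVELS:
        raise ValueError(f"unknown non-interventional design: {design!r}")
    return NON_INTERVENTIONAL_LEVELS[design]


print(evidence_level(interventional=True, randomized=True))      # Level I
print(evidence_level(interventional=True, randomized=False))     # Level II-1
print(evidence_level(interventional=False, design="cohort"))     # Level II-2
```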


  • Certain topics of interest are not always amenable to RCTs and may require other approaches.
    • In the case of rare outcomes, case-control studies may be more practical
  • The absence of an RCT does not mean that something is not true or valid
    • An intervention that has been proven not to work via an RCT is not the same as an intervention that has shown promise using other study designs but has not yet been studied using an RCT
  • Blinding investigators is another important way of limiting bias, but it is not always practical or realistic – for example, with some surgical interventions
  • Always check to see if there is a section describing how the investigators determined the appropriate number of study subjects. If a study is too small, an important difference may exist but fail to reach statistical significance
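
To give a sense of what such a sample size section involves, the sketch below uses a common normal-approximation formula for comparing two proportions: n per group = (z for alpha + z for power)² × [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)². The function name, the outcome rates, the two-sided alpha of 0.05, and the power of 80% are all assumptions chosen for illustration; a published study may use a different method.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a difference between
    two outcome rates p1 and p2 (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # value corresponding to the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical outcome rates: 20% in the control group versus 10% with the intervention
print(sample_size_per_group(0.20, 0.10))

# A smaller expected difference (20% versus 15%) requires many more subjects
print(sample_size_per_group(0.20, 0.15))
```

Under these assumptions, detecting a drop from 20% to 10% takes on the order of 200 subjects per group, and the required number rises sharply as the expected difference shrinks, which is why a small study can miss a real effect.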

Learn More – Primary Sources:

USPSTF: Update on Methods: Estimating Certainty and Magnitude of Net Benefit

Introducing the “Level of Evidence” to Obstetrics & Gynecology

Center for Evidence-Based Medicine: Study design

Cambridge University BHRU Video: Systematic Reviews and Meta-Analysis

Critical Appraisal Skills Program (CASP): Randomised Controlled Trial Standard Checklist