Guidelines and Frameworks in Real-World Evidence (RWE): Don’t Drown in the Alphabet Soup!


Justin Belair

Biostatistician and RWE Expert in Pharma, Biotech, & Science | Consultant | Author of Causal Inference in Statistics


When you design a traditional clinical trial, the path is well-worn. CONSORT tells you what to report, ICH-E9 tells you what to plan, and decades of regulatory precedent tell you what questions to expect. The story arc is familiar: randomize, blind, follow protocol, analyze as planned, report transparently. There are many controversies and disagreements in traditional trials, but they exist within an established framework.

Real-world evidence doesn’t have that luxury. Or perhaps more accurately: it has many frameworks, which are still being contested, debated, and refined as the field matures. You might be familiar with the most prominent guidelines: STROBE, RECORD & RECORD-PE, STaRT-RWE, HARPER, TARGET, and the Causal Roadmap.

This alphabet soup of observational study guidance has grown thick enough that researchers joke about needing a framework to choose their frameworks! But this proliferation isn’t arbitrary; rather, it reflects the diversity of approaches, frameworks, and schools of thought developed to wrestle with something genuinely hard: how do you create transparency and structure around research that is inherently more heterogeneous, more assumption-dependent, and more contextual than the RCT ideal?

Observational Research is (Really) Difficult

Let’s remind ourselves of the challenges involved in generating compelling evidence from real-world data. The FDA, in its framework for RWE, defines a traditional clinical trial as:

a research study in which one or more human subjects are prospectively assigned to one or more interventions (which may include placebo or other control) to evaluate the effects of those interventions on health-related biomedical or behavioral outcomes … One that is usually supported by a research infrastructure that is largely separate from routine clinical practice and is designed to control variability and maximize data quality (emphasis mine).

In traditional trials, the elements emphasized in the definition above all provide methodological tools to increase the validity of the evidence that will be produced from the trial. We can compare this with RWE in the following table.

Comparison of Traditional Clinical Trials and Real-World Evidence Studies

| Feature | Traditional Clinical Trials | RWE Studies |
| --- | --- | --- |
| Prospective assignment of interventions | Subjects assigned by investigator using randomization to create comparable groups. | Retrospective data; assignment by patient choice, physician decision, or complex uncontrolled processes. Observational, not interventional. |
| Placebo or other control | Randomization creates comparable treatment and control groups to isolate treatment effects. | Control patients may be unavailable or poorly matched due to lack of randomization. |
| Effects | Randomization helps isolate causal effects. | Causal effects harder to estimate due to confounding and biases. |
| Health-related outcomes | Outcomes carefully defined and measured; staff rigorously trained; protocols standardized. | Outcomes may be poorly defined, difficult to operationalize, subject to measurement error or misclassification. |
| Research infrastructure | Controlled environment with dedicated staff and resources. | Data from routine practice: inconsistent follow-ups, missing information, irregular timing, unstructured data. |
| Data quality control | Variability controlled through randomization and blinding; data quality maximized through design. | High variability makes inference difficult; poor data quality limits reliable conclusions. |

Paul Rosenbaum, a pioneer of causal inference methods in observational research, described reasonably compelling observational research as an attainable ideal. Here is what he had to say in his 2019 book (affiliate link), Observation and Experiment: An Introduction to Causal Inference:

… even a reasonably compelling observational study may turn out, in light of subsequent research, to have reached an erroneous conclusion. Sometimes a reasonably compelling observational study prompts investigators to perform a randomized trial, and sometimes the trial does not support the conclusions of the observational study. At other times, several reasonably compelling observational studies point in incompatible directions. When ethical or practical constraints force scientists to rely on observational studies, it is not uncommon to see a decade or more of thrashing about, a decade or more of controversy, conflicting conclusions, and uncertainty. This can be true even when the studies themselves are well designed and executed. Can an observational study be more than reasonably compelling? Arguably, it has happened once or twice, but reasonably compelling studies are rare to begin with. (emphasis mine)

Having acknowledged that observational research for RWE is difficult, even for the most skilled, ethical, and transparent scientists, there is still a need to leverage real-world data to advance patient well-being and develop innovative technology. In my consulting practice and my workshops on RWE, I emphasize the need to follow methodological guidelines. These serve as guardrails that help keep us on track, enhance transparency, and build trust with stakeholders. However, I also stress that these guidelines are not a panacea: they don’t replace mature and honest scientific judgment.

Let’s break down these frameworks, highlighting their strengths and limitations. We then follow with recommendations you can use in your own RWE studies to craft compelling narratives for regulators. We end with a reminder of the importance of mature scientific judgment and transparency in generating RWE.

Alphabet Soup of RWE Guidelines and Frameworks

As our understanding of causal inference and the challenges in observational research has grown over the past few decades, different communities (statisticians, epidemiologists, regulators, causal inference methodologists) developed distinct but overlapping methodological guidelines. Understanding each framework is the first step toward using them strategically rather than mechanically.

STROBE (Strengthening the Reporting of Observational Studies in Epidemiology)

STROBE is the foundational reporting standard, published in 2007 and widely adopted across epidemiology and clinical research. STROBE provides a 22-item checklist covering what researchers should report about their study’s background, methods, results, and interpretation. Its focus is transparency: what did you do? STROBE doesn’t tell you how to design a study or choose analytic methods. Rather, it tells you what to document so others can evaluate your choices. Think of it as the observational study equivalent of CONSORT for trials: a minimal reporting standard that enhances reproducibility and critical appraisal.

RECORD (REporting of studies Conducted using Observational Routinely-collected health Data)

RECORD extends STROBE specifically for studies using administrative databases, electronic health records (EHR), and other routinely collected health data. Published in 2015, RECORD adds guidance on reporting data source characteristics, code lists for defining exposures and outcomes, data linkage procedures, and validation studies.

RECORD-PE (Pharmacoepidemiology)

RECORD-PE further extends this for drug safety and effectiveness research, adding items on drug exposure definition, new user designs, and time-varying confounding. Both RECORD and RECORD-PE maintain STROBE’s reporting focus but acknowledge the unique challenges of secondary data use: data that wasn’t purposefully collected for research requires high levels of transparency about what it contains and how it will be used.

We must always keep in mind that prospective data collection with a pre-specified protocol gives control over important methodological choices that can help enhance the credibility of evidence. On the other hand, secondary data use comes with inherent limitations, pitfalls, and roadblocks that should be clearly evaluated as part of a data fitness audit before embarking on a study.
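To make the idea of a data fitness audit concrete, here is a minimal sketch of the kind of checks you might run before committing to a study, assuming a hypothetical EHR extract with columns such as patient_id, visit_date, and age (the file name and column names are illustrative, not a prescribed standard):

```python
import pandas as pd

# Minimal, illustrative data-fitness checks for a hypothetical EHR extract.
# File and column names (patient_id, visit_date, age) are assumptions for this sketch.
ehr = pd.read_csv("ehr_extract.csv", parse_dates=["visit_date"])

# 1. Missingness: which variables are too incomplete to support the planned analysis?
missingness = ehr.isna().mean().sort_values(ascending=False)
print(missingness)

# 2. Follow-up irregularity: distribution of gaps (in days) between visits per patient.
gaps = (
    ehr.sort_values(["patient_id", "visit_date"])
       .groupby("patient_id")["visit_date"]
       .diff()
       .dt.days
)
print(gaps.describe())

# 3. Basic plausibility: ages and visit dates outside sensible ranges.
print(ehr.query("age < 0 or age > 110").shape[0], "rows with implausible age")
print(ehr[ehr["visit_date"] > pd.Timestamp.today()].shape[0], "visits dated in the future")
```

Checks like these won’t make a data source fit for purpose by themselves, but they surface missingness, irregular follow-up, and implausible values early, when they can still shape the design.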

STaRT-RWE (STructured Approach To Real-World Evidence)

STaRT-RWE takes a different approach. Developed by regulators and industry stakeholders, STaRT-RWE is less about what to report and more about how to justify your design choices for regulatory decision-making. It provides a structured template for documenting study rationale, data source selection, design features, and analytic approaches in the context of a specific regulatory question.

The framework explicitly connects design choices to fitness-for-purpose: why is this observational study the right tool for answering this question? Why is this data source appropriate? How do the design features address potential biases? STaRT-RWE speaks the language of regulatory review, making it particularly valuable when RWE will inform labeling, approval, or post-market commitments.

HARPER (HARmonized Protocol Template to Enhance Reproducibility)

HARPER addresses a different gap: protocol heterogeneity. Even with reporting standards like STROBE, observational study protocols vary wildly in structure and content, making them difficult to review and compare. The International Society for Pharmacoepidemiology (ISPE) and the Professional Society for Health Economics and Outcomes Research (ISPOR) convened a joint task force, including representation from key international stakeholders, to create a harmonized protocol template for RWE studies that evaluate a treatment effect and are intended to inform decision-making.

This harmonized template covers study rationale, objectives, methods, analysis plans, and governance regardless of study design or data source. It’s particularly useful for multi-database studies and distributed research networks where protocol standardization facilitates reproducibility and cross-site comparison. HARPER’s value is in what to plan and how to organize your protocol documentation, creating consistency across diverse research contexts.

TARGET (Target Trial Emulation Framework)

TARGET brings explicit causal inference into study design. Rather than starting with “here’s the data I have, what can I study?” (which, by the way, is a tempting but potentially dangerous approach), TARGET starts with “if I could run the ideal randomized trial to answer this question, what would it look like?” It then asks how closely an observational study can emulate that target trial.

Developed and popularized by Miguel Hernán and James Robins, two towering figures in causal inference methodology research, the framework requires specifying eligibility criteria, treatment strategies, assignment procedures, follow-up period, outcomes, causal contrasts, and analysis plan as if you were designing a trial. This “target trial emulation” approach makes assumptions explicit, helps identify potential biases, and provides a common language for discussing design choices. TARGET is fundamentally about causal specification: forcing clarity about what causal effect you’re trying to estimate and what assumptions are required to estimate it from observational data.

By casting an observational study in the mould of a traditional trial, the hope is that the gap between the two can be closed as much as possible, with any remaining discrepancies transparently articulated. Detractors argue that this forces a square peg into a round hole: observational studies and traditional trials are fundamentally different, and the methods used should reflect this.
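One practical way to apply the target trial idea is to force yourself to write every protocol element down before touching the data. Below is a minimal sketch of such a specification as a structured object; the field names and example values are my own illustration, not an official template from the framework:

```python
from dataclasses import dataclass, field

# A minimal, illustrative way to make each target-trial protocol element explicit.
# Field names and example values are assumptions, not an official guideline template.
@dataclass
class TargetTrialSpec:
    eligibility: list[str]
    treatment_strategies: dict[str, str]
    assignment: str          # how the hypothetical trial would assign treatment
    time_zero: str           # when eligibility, assignment, and follow-up all align
    follow_up: str
    outcome: str
    causal_contrast: str     # e.g. intention-to-treat vs per-protocol effect
    analysis_plan: str
    emulation_gaps: list[str] = field(default_factory=list)  # known deviations in the emulation

spec = TargetTrialSpec(
    eligibility=["adults with newly diagnosed condition X", "no prior use of drug A or B"],
    treatment_strategies={"arm_A": "initiate drug A within 30 days", "arm_B": "initiate drug B within 30 days"},
    assignment="randomized in the target trial; emulated via adjustment for baseline confounders",
    time_zero="date of first prescription meeting the strategy definition",
    follow_up="until outcome, death, disenrollment, or 24 months",
    outcome="hospitalization for Y within 24 months",
    causal_contrast="observational analogue of the intention-to-treat effect",
    analysis_plan="pooled logistic regression with treatment and censoring weights",
    emulation_gaps=["exposure misclassification from pharmacy fills", "residual confounding by indication"],
)
```

The point of structuring the protocol this way is that any element you cannot fill in, or can only fill in with an awkward compromise, becomes an explicit, documented emulation gap rather than a silent assumption.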

The Causal Roadmap

The Causal Roadmap provides the most comprehensive framework, spanning the entire research process from question formulation to inference. Developed by Mark van der Laan’s school of causal inference, the roadmap consists of systematic steps:

  1. Define the causal question precisely
  2. Specify the causal model (often using DAGs)
  3. Identify the target causal parameter
  4. Assess identifiability assumptions
  5. Conduct statistical estimation
  6. Perform sensitivity analysis

Unlike frameworks focused primarily on reporting or design justification, the Causal Roadmap emphasizes causal reasoning at every stage. It’s less prescriptive about specific designs and more focused on ensuring that researchers explicitly state their causal question, articulate their assumptions, and conduct inference in light of those assumptions.
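To see how the steps fit together, here is a minimal sketch of the Roadmap applied to simulated data. The single measured confounder, the coefficients, and the use of a simple g-formula estimator are all illustrative assumptions; the Roadmap itself is estimator-agnostic and, in van der Laan’s school, typically paired with more flexible estimators such as TMLE:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative walk-through of the Roadmap steps on simulated data (all numbers invented).
rng = np.random.default_rng(0)
n = 5_000

# Steps 1-2: causal question ("effect of treatment A on outcome Y") and causal model:
# a single measured confounder L with L -> A and L -> Y (the assumed DAG).
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-0.8 * L)))   # treatment depends on L
Y = 1.0 * A + 1.5 * L + rng.normal(size=n)        # true effect of A is 1.0

df = pd.DataFrame({"L": L, "A": A, "Y": Y})

# Steps 3-4: target parameter E[Y^1] - E[Y^0], identified by backdoor adjustment for L
# under no unmeasured confounding, consistency, and positivity.

# Step 5: estimation via the g-formula (standardization over the distribution of L).
outcome_model = smf.ols("Y ~ A + L", data=df).fit()
ey1 = outcome_model.predict(df.assign(A=1)).mean()
ey0 = outcome_model.predict(df.assign(A=0)).mean()
print(f"Adjusted estimate: {ey1 - ey0:.2f}")   # close to the true 1.0
print(f"Naive difference:  {df.groupby('A')['Y'].mean().diff().iloc[-1]:.2f}")  # confounded

# Step 6: sensitivity analyses would then probe the no-unmeasured-confounding assumption.
```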

Comparison of Observational Study Guidelines

| Framework | Primary Purpose | Key Research Stages | Regulatory Alignment |
| --- | --- | --- | --- |
| STROBE | Reporting transparency | Reporting | Moderate |
| RECORD / RECORD-PE | Reporting for routine data studies | Reporting | Moderate |
| STaRT-RWE | Design justification for regulatory use | Planning, Justification, Reporting | High |
| HARPER | Protocol harmonization | Planning, Protocol development | Moderate-High |
| TARGET | Causal specification via target trial | Planning, Design, Analysis | Moderate-High |
| Causal Roadmap | Comprehensive causal reasoning | All Stages | Variable |

The complementarity becomes clear when you consider a typical RWE study intended for regulatory submission. During protocol development, you might use the Causal Roadmap to crystallize your causal question and assumptions, TARGET to specify your design as a target trial emulation, and HARPER to structure your protocol document¹.

When justifying your design to regulators or other stakeholders, STaRT-RWE provides the framework for explaining why your choices are fit-for-purpose. When reporting your completed study, STROBE (extended with RECORD or RECORD-PE as appropriate) ensures transparent documentation of what you did, while supplemental materials might include your target trial specification and causal diagrams to make assumptions explicit.

No single framework is sufficient, and none guarantees validity. STROBE tells you what to report but not why you made particular design choices. STaRT-RWE helps justify those choices but doesn’t provide the causal machinery to specify what effect you’re estimating. TARGET makes causal assumptions explicit but doesn’t tell you how to format a regulatory submission. The Causal Roadmap provides comprehensive reasoning tools but isn’t a reporting checklist.

The Regulatory Narrative in Practice

The frameworks we’ve discussed aren’t meant to replace scientific judgment; I would argue they’re scaffolding for documenting and communicating your insights in a transparent and trustworthy manner. Remember, in science, no guideline, checklist, or method guarantees the validity of evidence. I always suggest the following to my clients and students: don’t aim for perfection, but rather for transparency, honesty, and humility in the face of the daunting task of RWE generation. Regulators evaluating RWE aren’t looking for perfect studies; they know such studies don’t exist! They’re looking for an honest accounting of strengths, limitations, and the reasonableness of underlying assumptions. A study with acknowledged limitations but transparent reasoning is far more credible than one that glosses over potential biases or treats methodological choices as self-evident. The frameworks help you build this transparency systematically.

Despite your best efforts, there will always be decisions that rest on expert judgment. I stress this because stakeholders without statistical training (e.g., medical affairs) often feel paralyzed by the complexity of statistics. Yet any experienced statistician will readily acknowledge that without expert clinical or medical judgment and domain knowledge, they can’t get very far. A few examples of decisions that do not boil down to algorithms or decision trees:

  • Covariate selection: Which confounders to adjust for? Your DAG provides guidance, but this just pushes the problem one step further: how can we trust the DAG? By trusting the experts who validate it, of course (and sensitivity analysis). Also, real-world data forces trade-offs between bias reduction and precision loss, between measured proxies and unmeasured true confounders. Document your reasoning: why include variable X despite measurement error? Why exclude variable Y despite theoretical relevance?
  • Time-zero definition: When does follow-up begin? This seemingly technical choice has profound causal implications. The frameworks help you think through immortal time bias, prevalent user bias, and time-varying confounding, but the specific choice depends on your causal question and data structure.
  • Sensitivity analysis scope: You can’t test every assumption in all its breadth and depth. Which sensitivity analyses are most important for your regulatory audience? Focus on assumptions that, if violated, would most affect your conclusions and whose violation is plausible given your data source (a minimal E-value sketch follows this list).
  • Effect heterogeneity: Should you stratify by subgroups? Report heterogeneity analyses? These choices involve balancing statistical power against clinical/regulatory relevance. The frameworks don’t answer this–your understanding of the clinical context and regulatory priorities does.
  • Communicating uncertainty: How strongly should you interpret your findings given acknowledged limitations? This requires synthesizing the entire evidentiary chain: data quality, assumption plausibility, sensitivity analysis results, consistency with prior evidence. The frameworks help you document each link; judgment connects them into an honest conclusion.

Weaving Regulator-Friendly Narratives

A compelling regulatory narrative typically includes a judicious weaving of the following elements, ideally drawing from the various frameworks discussed above:

  • The Setup: “We sought to estimate [specific causal effect] because [regulatory or clinical importance]. We used [data source] because [justification], recognizing [key limitations].”
  • The Design Logic: “We emulated a target trial that would have [specific protocol]. Our observational design approximates this by [design features], with the following gaps: [deviations] which we address through [analytical choices].”
  • The Causal Framework: “Our causal assumptions, shown in the DAG, are [list key assumptions]. We assessed these by [validation studies, sensitivity analyses, external evidence]. Under these assumptions, our estimand is [specific parameter].”
  • The Evidence: “We found [results]. These are robust to [sensitivity analyses show what], but would be biased if [specific unmet assumption]. Based on [totality of evidence assessment], we conclude [appropriately hedged interpretation].”
  • The Transparency: “Limitations include [specific, quantified where possible]. Future work should [how to strengthen evidence].”

This isn’t a template to fill mechanically! It’s a structure for honest communication about what you did, why you did it, what it shows, and what it doesn’t show. Notice how each step relies on deep knowledge of data, methods, and the therapeutic area.

Final Recommendations

In conclusion, here are my top recommendations for using RWE frameworks effectively:

  • Use multiple frameworks, not multiple checklists: Don’t just “comply” with STROBE and call it a day. Layer frameworks to address different aspects: causal clarity (Roadmap, TARGET), design justification (STaRT-RWE), reporting (STROBE/RECORD).
  • Document early and often: Don’t wait until manuscript writing to think about frameworks. Use them during protocol development to structure your thinking and create pre-specified plans.
  • Make assumptions explicit and testable: Every observational study rests on untestable assumptions. Your job isn’t to pretend they’re met, but to state them clearly and stress test them where possible. Be critical of your own work, but don’t be paralyzed by imperfection. Involve diverse perspectives: clinicians, epidemiologists, statisticians.
  • Engage regulators early: If your RWE will support regulatory decisions, share your protocol and causal framework with regulators before data analysis. Early alignment on design and assumptions prevents costly late-stage disagreements.
  • Embrace nuance over algorithms: There’s no flowchart that tells you which frameworks to use or how to integrate them. That’s a feature, not a bug. Good observational research requires thoughtful judgment guided by structured frameworks.

Overwhelmed by Framework Overload? You’re Not Alone! I Can Help

The alphabet soup of RWE frameworks reflects genuine complexity in generating credible causal evidence from observational data. Rather than seeking one framework to rule them all, develop skill in strategic layering: use each framework for what it does best, and let them complement each other in service of transparent, compelling evidence that earns regulatory trust.

It may be daunting to navigate these guidelines at first, but it’s much easier than navigating RWE without a map. Once you familiarize yourself with the fundamentals of causal inference, a specific therapeutic area, and the data sources available to you, choosing and applying these frameworks becomes more intuitive.

If you’re looking to build or strengthen your organization’s RWE capabilities, I offer several paths:

  • Corporate Training & Workshops: I’ve delivered workshops and training for pharmaceutical and biotechnology companies. All my trainings are customized to your organization’s needs, but usually fall into one of three categories: RWE, statistics, or causal inference. I also offer ongoing strategic support after the training to make sure your RWE capabilities are growing the way you need them to. Contact me at or through LinkedIn to discuss how we can design a training program that meets your team’s specific needs.
  • My Causal Inference in Statistics, With Exercises, Practice Projects, and R/Python Code Notebooks book can help you master causal inference through theory and mathematical notation, plain-language explanations, and case studies using code and real datasets. The first chapter is available for free by clicking here. You can also join my newsletter to receive monthly insights into causal inference, biostatistics, and other community activities. It’s a lot of fun!
  • Consulting Services: Sometimes you need targeted expertise for a specific study or regulatory submission. I provide consulting on study design, causal framework development, and regulatory strategy for RWE studies. Whether you’re preparing a protocol for regulatory feedback, designing sensitivity analyses, or crafting your regulatory narrative, I can help ensure your evidence is as compelling and defensible as possible. Reach out at or through Linkedin.

  1. Dang, the first author of the original Causal Roadmap paper, even published a 2023 paper called Start with the Target Trial Protocol, Then Follow the Roadmap for Causal Inference, acknowledging how the two approaches work hand in hand. This is particularly interesting when you consider that many researchers in the field see the two frameworks as competing.

