Cochrane Handbook for Systematic Reviews of Interventions

  • Access the Cochrane Handbook for Systematic Reviews of Interventions
  • About the Handbook

Methodological Expectations for Cochrane Intervention Reviews (MECIR)

  • Contact the editors
  • How to cite the Handbook
  • Permission to re-use material from the Handbook
  • Previous versions
  • Access the Cochrane Handbook for Systematic Reviews of Interventions

  • Open the online Handbook
  • Download PDFs (restricted)
  • Buy the book


About the Handbook

The Cochrane Handbook for Systematic Reviews of Interventions is the official guide that describes in detail the process of preparing and maintaining Cochrane systematic reviews on the effects of healthcare interventions. All authors should consult the Handbook for guidance on the methods used in Cochrane systematic reviews. The Handbook includes guidance on the standard methods applicable to every review (planning a review, searching and selecting studies, data collection, risk of bias assessment, statistical analysis, GRADE and interpreting results), as well as more specialised topics (non-randomized studies, adverse effects, complex interventions, equity, economics, patient-reported outcomes, individual patient data, prospective meta-analysis, and qualitative research).

Last updated: 22 August 2023

Key aspects of Handbook guidance are collated as the Methodological Expectations for Cochrane Intervention Reviews (MECIR). These provide core standards that are generally expected of Cochrane reviews. Each MECIR item includes a link to a relevant Handbook chapter.

For further information and for any Handbook enquiries, please contact [email protected].

The Handbook editorial team includes: Julian Higgins and James Thomas (Senior Scientific Editors); Jacqueline Chandler, Miranda Cumpston, Tianjing Li, Matthew Page and Vivian Welch (Associate Scientific Editors); Ella Flemyng (Managing Editor).

To cite the full Handbook online, please use:

Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook.

To cite the print edition of the Handbook, please use:

Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions. 2nd Edition. Chichester (UK): John Wiley & Sons, 2019.

Details of how to cite individual chapters in either of these versions are available in each chapter.

Academic or other non-commercial re-use of Handbook material

You do not need to request permission to use short quotations (though these must be appropriately cited) or to cite the Handbook as a source. If you intend to reproduce material from the Handbook using screenshots, including exact figures or tables from the Handbook, or including lengthy direct quotations (more than 5 lines of text), please fill in this form to request permission to re-use material from the Handbook. The request will be sent to the Cochrane Support team, who will notify Julian Higgins or James Thomas, the Handbook Senior Editors, as appropriate. If approved, these requests will be granted free of charge on condition that the source is acknowledged.

Commercial re-use of Handbook material

Commercial re-use includes any use of Handbook material in a product for which there is a monetary fee, and/or where it is associated in any way with a product or service. For all enquiries related to the commercial re-use of Handbook material, please contact Wiley Global Permissions, John Wiley & Sons, Ltd.

Details of how the Handbook has changed compared to previous versions can be found on the Versions and changes page. More information on the process for updating the Handbook is also available.

Archived copies of the following previous versions of the Handbook are available:

  • Version 6.3: February 2022 [browsable]
  • Version 6.2: February 2021 [browsable]
  • Version 6.1: September 2020 [browsable]
  • Version 6.0: July 2019 [browsable]
  • Version 5.2: June 2017 [PDF]
  • Version 5.1: March 2011 [browsable]
  • Version 5.0.2: September 2009 [browsable]
  • Version 5.0.0: February 2008 [browsable]
  • Version 4.2.6: September 2006 [PDF, 2.8 MB]
  • Version 4.2.1: December 2003 [PDF]

You may also be interested in:

  • Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy

University of Maryland Libraries

Systematic Review

  • Library Help
  • What is a Systematic Review (SR)?

Steps of a Systematic Review

  • Framing a Research Question
  • Developing a Search Strategy
  • Searching the Literature
  • Managing the Process
  • Meta-analysis
  • Publishing your Systematic Review

Forms and templates


  • PICO Template
  • Inclusion/Exclusion Criteria
  • Database Search Log
  • Review Matrix
  • Cochrane Tool for Assessing Risk of Bias in Included Studies

  • PRISMA Flow Diagram - Record the numbers of retrieved references and included/excluded studies. You can use the Create Flow Diagram tool to automate the process.

  • PRISMA Checklist - Checklist of items to include when reporting a systematic review or meta-analysis

PRISMA 2020 and PRISMA-S: Common Questions on Tracking Records and the Flow Diagram

  • PROSPERO Template
  • Manuscript Template
  • Steps of SR (text)
  • Steps of SR (visual)
  • Steps of SR (PIECES)

Adapted from A Guide to Conducting Systematic Reviews: Steps in a Systematic Review by Cornell University Library

Source: Cochrane Consumers and Communications (infographics are free to use and licensed under Creative Commons)

Check the following visual resources titled "What Are Systematic Reviews?"

  • Video with closed captions available
  • Animated Storyboard
  • Last Updated: Jan 26, 2024 4:35 PM
  • URL: https://lib.guides.umd.edu/SR
  • University of Michigan Library
  • Research Guides

Systematic Reviews

  • Work with a Search Expert
  • Covidence Review Software
  • Types of Reviews
  • Evidence in a Systematic Review

Methods - Guidance

  • Information Sources
  • Search Strategy
  • Managing Records
  • Selection Process
  • Data Collection Process
  • Study Risk of Bias Assessment
  • Reporting Results
  • For Search Professionals

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) "is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses . . . to help authors improve the reporting of systematic reviews and meta-analyses."

PRISMA 27-item checklist

The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration.

Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies : The PRISMA-DTA Statement.

Bias: "a systematic error, or deviation from the truth, in results or inference" (Cochrane Handbook, ch. 8)

See also: www.effectivehealthcare.ahrq.gov

Types of bias include:

  • Publication, time lag, or multiple publication bias
  • Location bias
  • Citation bias
  • Language bias
  • Outcome reporting bias

For more details on bias and how to prevent it, see: Cochrane Handbook for Systematic Reviews of Interventions (2008, Chapter 8 & Chapter 10, table 10.1.a) and the Catalogue of Bias from CEBM, Oxford.

For guidance on assessing study types, see the Reporting Results page in this guide.

Institute of Medicine. (2011). Finding What Works in Health Care: Standards for Systematic Reviews. Washington, DC: National Academies Press.

Cochrane Handbook for Systematic Reviews of Interventions, version 6 (2019)

Centre for Reviews and Dissemination (University of York, England). (2009). Systematic Reviews: CRD's guidance for undertaking systematic reviews in health care.

Joanna Briggs Institute. (2014). The Reviewers' Manual. The Joanna Briggs Institute/The University of Adelaide. https://wiki.jbi.global/display/MANUAL/JBI+Manual+for+Evidence+Synthesis

The Community Guide/Methods/Systematic Review Methods (June 2014). From The Community Preventive Services Task Force .

For issues in systematic reviews, especially in social science or other qualitative research:  Some Potential "Pitfalls" in the Construction of Educational Systematic Reviews .

  • Additional Resources for Systematic Review Methods

The PRISMA 2020 statement: an updated guideline for reporting systematic reviews

  • PMID: 33782057
  • PMCID: PMC8005924
  • DOI: 10.1136/bmj.n71

The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement, published in 2009, was designed to help systematic reviewers transparently report why the review was done, what the authors did, and what they found. Over the past decade, advances in systematic review methodology and terminology have necessitated an update to the guideline. The PRISMA 2020 statement replaces the 2009 statement and includes new reporting guidance that reflects advances in methods to identify, select, appraise, and synthesise studies. The structure and presentation of the items have been modified to facilitate implementation. In this article, we present the PRISMA 2020 27-item checklist, an expanded checklist that details reporting recommendations for each item, the PRISMA 2020 abstract checklist, and the revised flow diagrams for original and updated reviews.


Systematic Review | Definition, Example & Guide

Published on June 15, 2022 by Shaun Turney. Revised on November 20, 2023.

A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer.

For example, Boyle and colleagues conducted a systematic review that answered the question "What is the effectiveness of probiotics in reducing eczema symptoms and improving quality of life in patients with eczema?"

In this context, a probiotic is a health product that contains live microorganisms and is taken by mouth. Eczema is a common skin condition that causes red, itchy skin.

Table of contents

  • What is a systematic review?
  • Systematic review vs. meta-analysis
  • Systematic review vs. literature review
  • Systematic review vs. scoping review
  • When to conduct a systematic review
  • Pros and cons of systematic reviews
  • Step-by-step example of a systematic review
  • Other interesting articles
  • Frequently asked questions about systematic reviews

A review is an overview of the research that’s already been completed on a topic.

What makes a systematic review different from other types of reviews is that the research methods are designed to reduce bias . The methods are repeatable, and the approach is formal and systematic:

  • Formulate a research question
  • Develop a protocol
  • Search for all relevant studies
  • Apply the selection criteria
  • Extract the data
  • Synthesize the data
  • Write and publish a report

Although multiple sets of guidelines exist, the Cochrane Handbook for Systematic Reviews is among the most widely used. It provides detailed guidelines on how to complete each step of the systematic review process.

Systematic reviews are most commonly used in medical and public health research, but they can also be found in other disciplines.

Systematic reviews typically answer their research question by synthesizing all available evidence and evaluating the quality of the evidence. Synthesizing means bringing together different information to tell a single, cohesive story. The synthesis can be narrative ( qualitative ), quantitative , or both.


Systematic reviews often quantitatively synthesize the evidence using a meta-analysis . A meta-analysis is a statistical analysis, not a type of review.

A meta-analysis is a technique to synthesize results from multiple studies. It’s a statistical analysis that combines the results of two or more studies, usually to estimate an effect size .
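As a toy illustration of the statistics involved, the simplest (fixed-effect) meta-analysis pools study estimates using inverse-variance weights, so more precise studies count for more. This is a minimal sketch, not a substitute for meta-analysis software; the effect sizes and standard errors below are invented for illustration.

```python
# Minimal sketch of a fixed-effect (inverse-variance) meta-analysis.
# The effect sizes and standard errors below are invented for illustration.
effects = [0.30, 0.45, 0.10]      # per-study effect sizes (e.g., log odds ratios)
std_errors = [0.15, 0.20, 0.10]   # per-study standard errors

weights = [1 / se**2 for se in std_errors]            # inverse-variance weights
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5                 # SE of the pooled estimate

print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
```

Real meta-analyses typically also assess heterogeneity and often use random-effects models, which dedicated software handles for you.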

A literature review is a type of review that uses a less systematic and formal approach than a systematic review. Typically, an expert in a topic will qualitatively summarize and evaluate previous work, without using a formal, explicit method.

Although literature reviews are often less time-consuming and can be insightful or helpful, they have a higher risk of bias and are less transparent than systematic reviews.

Similar to a systematic review, a scoping review is a type of review that tries to minimize bias by using transparent and repeatable methods.

However, a scoping review isn’t a type of systematic review. The most important difference is the goal: rather than answering a specific question, a scoping review explores a topic. The researcher tries to identify the main concepts, theories, and evidence, as well as gaps in the current research.

Sometimes scoping reviews are an exploratory preparation step for a systematic review, and sometimes they are a standalone project.


A systematic review is a good choice of review if you want to answer a question about the effectiveness of an intervention , such as a medical treatment.

To conduct a systematic review, you’ll need the following:

  • A precise question , usually about the effectiveness of an intervention. The question needs to be about a topic that’s previously been studied by multiple researchers. If there’s no previous research, there’s nothing to review.
  • If you’re doing a systematic review on your own (e.g., for a research paper or thesis ), you should take appropriate measures to ensure the validity and reliability of your research.
  • Access to databases and journal archives. Often, your educational institution provides you with access.
  • Time. A professional systematic review is a time-consuming process: it will take the lead author about six months of full-time work. If you’re a student, you should narrow the scope of your systematic review and stick to a tight schedule.
  • Bibliographic, word-processing, spreadsheet, and statistical software. For example, you could use EndNote, Microsoft Word, Excel, and SPSS.

A systematic review has many pros .

  • They minimize research bias by considering all available evidence and evaluating each study for bias.
  • Their methods are transparent , so they can be scrutinized by others.
  • They’re thorough : they summarize all available evidence.
  • They can be replicated and updated by others.

Systematic reviews also have a few cons .

  • They’re time-consuming .
  • They’re narrow in scope : they only answer the precise research question.

The 7 steps for conducting a systematic review are explained with an example.

Step 1: Formulate a research question

Formulating the research question is probably the most important step of a systematic review. A clear research question will:

  • Allow you to more effectively communicate your research to other researchers and practitioners
  • Guide your decisions as you plan and conduct your systematic review

A good research question for a systematic review has four components, which you can remember with the acronym PICO :

  • Population(s) or problem(s)
  • Intervention(s)
  • Comparison(s)
  • Outcome(s)

You can rearrange these four components to write your research question:

  • What is the effectiveness of I versus C for O in P ?

Sometimes, you may want to include a fifth component, the type of study design . In this case, the acronym is PICOT .

  • Type of study design(s)
In the eczema example, the PICOT components were:

  • The population of patients with eczema
  • The intervention of probiotics
  • In comparison to no treatment, placebo, or non-probiotic treatment
  • The outcome of changes in participant-, parent-, and doctor-rated symptoms of eczema and quality of life
  • Randomized control trials, a type of study design

Boyle and colleagues' research question was:

  • What is the effectiveness of probiotics versus no treatment, a placebo, or a non-probiotic treatment for reducing eczema symptoms and improving quality of life in patients with eczema?
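The template above is mechanical enough to sketch in code: filling the "What is the effectiveness of I versus C for O in P?" pattern with the eczema example's components reproduces the research question. The component strings are taken from the example above.

```python
# Sketch: filling the PICO template "What is the effectiveness of I versus C
# for O in P?" with the eczema example's components.
P = "patients with eczema"
I = "probiotics"
C = "no treatment, a placebo, or a non-probiotic treatment"
O = "reducing eczema symptoms and improving quality of life"

question = f"What is the effectiveness of {I} versus {C} for {O} in {P}?"
print(question)
```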

Step 2: Develop a protocol

A protocol is a document that contains your research plan for the systematic review. This is an important step because having a plan allows you to work more efficiently and reduces bias.

Your protocol should include the following components:

  • Background information : Provide the context of the research question, including why it’s important.
  • Research objective (s) : Rephrase your research question as an objective.
  • Selection criteria: State how you’ll decide which studies to include or exclude from your review.
  • Search strategy: Discuss your plan for finding studies.
  • Analysis: Explain what information you’ll collect from the studies and how you’ll synthesize the data.

If you’re a professional seeking to publish your review, it’s a good idea to bring together an advisory committee . This is a group of about six people who have experience in the topic you’re researching. They can help you make decisions about your protocol.

It’s highly recommended to register your protocol. Registering your protocol means submitting it to a database such as PROSPERO or ClinicalTrials.gov .

Step 3: Search for all relevant studies

Searching for relevant studies is the most time-consuming step of a systematic review.

To reduce bias, it’s important to search for relevant studies very thoroughly. Your strategy will depend on your field and your research question, but sources generally fall into these four categories:

  • Databases: Search multiple databases of peer-reviewed literature, such as PubMed or Scopus . Think carefully about how to phrase your search terms and include multiple synonyms of each word. Use Boolean operators if relevant.
  • Handsearching: In addition to searching the primary sources using databases, you’ll also need to search manually. One strategy is to scan relevant journals or conference proceedings. Another strategy is to scan the reference lists of relevant studies.
  • Gray literature: Gray literature includes documents produced by governments, universities, and other institutions that aren’t published by traditional publishers. Graduate student theses are an important type of gray literature, which you can search using the Networked Digital Library of Theses and Dissertations (NDLTD) . In medicine, clinical trial registries are another important type of gray literature.
  • Experts: Contact experts in the field to ask if they have unpublished studies that should be included in your review.
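To make the database-search advice concrete, a Boolean strategy typically ORs together synonyms within a concept and ANDs the concepts together. The sketch below is a hypothetical illustration of assembling such a query string; the terms are examples only, and each real database has its own syntax for phrases, truncation, and field tags.

```python
# Hypothetical illustration of assembling a Boolean database query from synonym
# groups. Terms are examples; real databases have their own query syntax.
population = ["eczema", "atopic dermatitis"]
intervention = ["probiotic*", "lactobacillus"]

def or_group(terms):
    """OR together synonyms, quoting multi-word phrases, and wrap in parentheses."""
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"

query = " AND ".join(or_group(group) for group in [population, intervention])
print(query)
```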

At this stage of your review, you won’t read the articles yet. Simply save any potentially relevant citations using bibliographic software, such as Scribbr’s APA or MLA Generator .

In the eczema example, the review searched:

  • Databases: EMBASE, PsycINFO, AMED, LILACS, and ISI Web of Science
  • Handsearch: Conference proceedings and reference lists of articles
  • Gray literature: The Cochrane Library, the metaRegister of Controlled Trials, and the Ongoing Skin Trials Register
  • Experts: Authors of unpublished registered trials, pharmaceutical companies, and manufacturers of probiotics

Step 4: Apply the selection criteria

Applying the selection criteria is a three-person job. Two of you will independently read the studies and decide which to include in your review based on the selection criteria you established in your protocol . The third person’s job is to break any ties.

To increase inter-rater reliability , ensure that everyone thoroughly understands the selection criteria before you begin.

If you’re writing a systematic review as a student for an assignment, you might not have a team. In this case, you’ll have to apply the selection criteria on your own; you can mention this as a limitation in your paper’s discussion.

You should apply the selection criteria in two phases:

  • Based on the titles and abstracts : Decide whether each article potentially meets the selection criteria based on the information provided in the abstracts.
  • Based on the full texts: Download the articles that weren’t excluded during the first phase. If an article isn’t available online or through your library, you may need to contact the authors to ask for a copy. Read the articles and decide which articles meet the selection criteria.

It’s very important to keep a meticulous record of why you included or excluded each article. When the selection process is complete, you can summarize what you did using a PRISMA flow diagram .
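A meticulous screening record can be as simple as a log of per-record decisions, which then yields the counts the PRISMA flow diagram needs. The sketch below is hypothetical; the record IDs and exclusion reasons are invented examples.

```python
# Hypothetical sketch: tallying title/abstract screening decisions so the counts
# can be transferred to a PRISMA flow diagram. Records and reasons are invented.
from collections import Counter

screening_log = [                      # (record_id, decision) pairs
    (1, "include"),
    (2, "exclude: wrong population"),
    (3, "include"),
    (4, "exclude: not a randomized trial"),
    (5, "exclude: wrong population"),
]

counts = Counter(decision for _, decision in screening_log)
records_screened = len(screening_log)
records_excluded = records_screened - counts["include"]

print(f"Records screened: {records_screened}, excluded: {records_excluded}")
for reason, n in sorted(counts.items()):
    if reason != "include":
        print(f"  {reason}: {n}")
```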

Next, Boyle and colleagues found the full texts for each of the remaining studies. Boyle and Tang read through the articles to decide if any more studies needed to be excluded based on the selection criteria.

When Boyle and Tang disagreed about whether a study should be excluded, they discussed it with Varigos until the three researchers came to an agreement.

Step 5: Extract the data

Extracting the data means collecting information from the selected studies in a systematic way. There are two types of information you need to collect from each study:

  • Information about the study’s methods and results . The exact information will depend on your research question, but it might include the year, study design , sample size, context, research findings , and conclusions. If any data are missing, you’ll need to contact the study’s authors.
  • Your judgment of the quality of the evidence, including risk of bias .

You should collect this information using forms. You can find sample forms in The Registry of Methods and Tools for Evidence-Informed Decision Making and the Grading of Recommendations, Assessment, Development and Evaluations Working Group .

Extracting the data is also a three-person job. Two people should do this step independently, and the third person will resolve any disagreements.

They also collected data about possible sources of bias, such as how the study participants were randomized into the control and treatment groups.

Step 6: Synthesize the data

Synthesizing the data means bringing together the information you collected into a single, cohesive story. There are two main approaches to synthesizing the data:

  • Narrative ( qualitative ): Summarize the information in words. You’ll need to discuss the studies and assess their overall quality.
  • Quantitative : Use statistical methods to summarize and compare data from different studies. The most common quantitative approach is a meta-analysis , which allows you to combine results from multiple studies into a summary result.

Generally, you should use both approaches together whenever possible. If you don’t have enough data, or the data from different studies aren’t comparable, then you can take just a narrative approach. However, you should justify why a quantitative approach wasn’t possible.

Boyle and colleagues also divided the studies into subgroups, such as studies about babies, children, and adults, and analyzed the effect sizes within each group.

Step 7: Write and publish a report

The purpose of writing a systematic review article is to share the answer to your research question and explain how you arrived at this answer.

Your article should include the following sections:

  • Abstract : A summary of the review
  • Introduction : Including the rationale and objectives
  • Methods : Including the selection criteria, search method, data extraction method, and synthesis method
  • Results : Including results of the search and selection process, study characteristics, risk of bias in the studies, and synthesis results
  • Discussion : Including interpretation of the results and limitations of the review
  • Conclusion : The answer to your research question and implications for practice, policy, or research

To verify that your report includes everything it needs, you can use the PRISMA checklist .

Once your report is written, you can publish it in a systematic review database, such as the Cochrane Database of Systematic Reviews , and/or in a peer-reviewed journal.

In their report, Boyle and colleagues concluded that probiotics cannot be recommended for reducing eczema symptoms or improving quality of life in patients with eczema.

Note: Generative AI tools like ChatGPT can be useful at various stages of the writing and research process and can help you to write your systematic review. However, we strongly advise against trying to pass AI-generated text off as your own work.

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Student’s t-distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval
  • Quartiles & Quantiles
  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Prospective cohort study

Research bias

  • Implicit bias
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hindsight bias
  • Affect heuristic
  • Social desirability bias

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question .

It is often written as part of a thesis, dissertation , or research paper , in order to situate your work in relation to existing knowledge.

A literature review is a survey of credible sources on a topic, often used in dissertations , theses, and research papers . Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other  academic texts , with an introduction , a main body, and a conclusion .

An  annotated bibliography is a list of  source references that has a short description (called an annotation ) for each of the sources. It is often assigned as part of the research process for a  paper .  

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Turney, S. (2023, November 20). Systematic Review | Definition, Example & Guide. Scribbr. Retrieved February 22, 2024, from https://www.scribbr.com/methodology/systematic-review/


Other students also liked:

  • How to Write a Literature Review | Guide, Examples, & Templates
  • How to Write a Research Proposal | Examples & Templates
  • What Is Critical Thinking? | Definition & Examples

  • UNC Libraries
  • HSL Academic Process
  • Systematic Reviews

Systematic Reviews: Home

Created by health science librarians.


  • Systematic review resources

What is a Systematic Review?

  • A simplified process map
  • How can the library help
  • Systematic reviews in non-health disciplines
  • Resources for performing systematic reviews

  • Step 1: Complete Pre-Review Tasks
  • Step 2: Develop a Protocol
  • Step 3: Conduct Literature Searches
  • Step 4: Manage Citations
  • Step 5: Screen Citations
  • Step 6: Assess Quality of Included Studies
  • Step 7: Extract Data from Included Studies
  • Step 8: Write the Review

  Check our FAQs

   Email us

  Chat with us (during business hours)

   Call (919) 962-0800

   Make an appointment with a librarian

  Request a systematic or scoping review consultation

Sign up for a systematic review workshop or watch a recording

There are many types of literature reviews.

Before beginning a systematic review, consider whether it is the best type of review for your question, goals, and resources. The table below compares a few different types of reviews to help you decide which is best for you. 

  • Scoping Review Guide For more information about scoping reviews, refer to the UNC HSL Scoping Review Guide.

Systematic Reviews: A Simplified, Step-by-Step Process

  • Step 1: Pre-Review. Common tasks include forming a team, developing research question(s), and scoping the literature for published systematic reviews on the topic. Librarians can provide substantial support for Step 1.
  • Step 2: Develop Protocol. Common tasks include determining eligibility criteria, selecting quality assessment tools and items for data extraction, writing the protocol, and making the protocol accessible via a website or registry.
  • Step 3: Conduct Literature Searches. Common tasks include partnering with a librarian, searching multiple databases, performing other searching methods such as hand searching, and locating grey literature or other unpublished research. Librarians can provide substantial support for Step 3.
  • Step 4: Manage Citations. Common tasks include exporting citations to a citation manager such as EndNote; preparing a PRISMA flow chart with citation counts for each step, updating it as necessary; and de-duplicating citations and uploading them to a screening tool such as Covidence. Librarians can provide substantial support for Step 4.
  • Step 5: Screen Citations. Common tasks include screening the titles and abstracts of citations against inclusion criteria with at least two reviewers, then locating full texts and screening citations that meet the inclusion criteria with at least two reviewers. UNC Health Sciences Library (HSL) librarians can provide support with using AI or other automation approaches to reduce the volume of literature that must be screened manually; reach out to HSL for more information.
  • Step 6: Conduct Quality Assessment. Common tasks include performing quality assessments, such as critical appraisal, of the included studies.
  • Step 7: Complete Data Extraction. Common tasks include extracting data from included studies and creating tables of studies for the manuscript.
  • Step 8: Write Review. Common tasks include consulting the PRISMA checklist or another reporting standard, writing the manuscript, and organizing supplementary materials. Librarians can provide substantial support for Step 8.
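The citation counts recorded for a PRISMA flow chart (Step 4 above) must reconcile arithmetically from stage to stage. A minimal sketch of that bookkeeping, using made-up numbers purely for illustration:

```python
# Hypothetical PRISMA-style flow counts; each stage's total must reconcile
# with the previous stage (numbers are illustrative, not from any review).
identified = 1200                 # records retrieved from all databases
duplicates_removed = 250
screened = identified - duplicates_removed            # titles/abstracts screened
excluded_at_screening = 820
full_text_assessed = screened - excluded_at_screening # full texts retrieved
full_text_excluded = 110
included = full_text_assessed - full_text_excluded    # studies in the review

print(screened, full_text_assessed, included)  # → 950 130 20
```

Keeping these counts in a script (or spreadsheet) makes it easy to update the flow chart when searches are re-run.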

  • UNC HSL's Simplified, Step-by-Step Process Map A PDF file of the HSL's Systematic Review Process Map.

The average systematic review takes 1,168 hours to complete.¹ A librarian can help you speed up the process.

Systematic reviews follow established guidelines and best practices to produce high-quality research. Librarian involvement in systematic reviews is offered at two levels. In Tier 1, the librarian collaborates with researchers in a consultative manner. In Tier 2, the librarian is an active member of your research team and a co-author on your review. The roles and expectations of librarians vary with the level of involvement desired. Examples of these differences are outlined in the table below.

  • Request a systematic or scoping review consultation

Researchers are conducting systematic reviews in a variety of disciplines. If your focus is on a topic other than the health sciences, you may also want to consult the resources below to learn how systematic reviews may vary in your field. You can also contact a librarian for your discipline with questions.

  • EPPI-Centre methods for conducting systematic reviews The EPPI-Centre develops methods and tools for conducting systematic reviews, including reviews for education, public and social policy.


Environmental Topics

  • Collaboration for Environmental Evidence (CEE) CEE seeks to promote and deliver evidence syntheses on issues of greatest concern to environmental policy and practice as a public service.

Social Sciences

Systematic Review Method Guidelines

  • Siddaway AP, Wood AM, Hedges LV. How to Do a Systematic Review: A Best Practice Guide for Conducting and Reporting Narrative Reviews, Meta-Analyses, and Meta-Syntheses. Annu Rev Psychol. 2019 Jan 4;70:747-770. doi: 10.1146/annurev-psych-010418-102803. A resource for psychology systematic reviews, which also covers qualitative meta-syntheses or meta-ethnographies.
  • The Campbell Collaboration

Social Work


Software Engineering

  • Guidelines for Performing Systematic Literature Reviews in Software Engineering The objective of this report is to propose comprehensive guidelines for systematic literature reviews appropriate for software engineering researchers, including PhD students.


Sport, Exercise, & Nutrition


  • Application of systematic review methodology to the field of nutrition by Tufts Evidence-based Practice Center Publication Date: 2009
  • Systematic Reviews and Meta-Analysis — Open & Free (Open Learning Initiative) The course follows guidelines and standards developed by the Campbell Collaboration, based on empirical evidence about how to produce the most comprehensive and accurate reviews of research


  • Systematic Reviews by David Gough, Sandy Oliver & James Thomas Publication Date: 2020


Updating reviews

  • Updating systematic reviews by University of Ottawa Evidence-based Practice Center Publication Date: 2007

Looking for our previous Systematic Review guide?

Our legacy guide was used from June 2020 to August 2022.

  • Systematic Review Legacy Guide
  • Next: Step 1: Complete Pre-Review Tasks >>
  • Last Updated: Feb 8, 2024 9:22 AM
  • URL: https://guides.lib.unc.edu/systematic-reviews



Evidence Syntheses and Systematic Reviews: Overview


What is Evidence Synthesis?

Evidence Synthesis: a general term for any method of identifying, selecting, and combining results from multiple studies. Several types of reviews fall under this term; the main ones are in the table below:

Types of Reviews

General Steps for Conducting Systematic Reviews

The number of steps for conducting Evidence Synthesis varies slightly depending on the source consulted, but the following steps are generally accepted in how Systematic Reviews are done:

  • Identify a gap in the literature and form a well-developed, answerable research question, which will form the basis of your search
  • Select a framework to help guide the type of study you’re undertaking
  • Write and register a protocol. A protocol is a detailed plan for the project; different guidelines exist for documenting and reporting systematic review protocols before the review is conducted, and the protocol is created following whichever guideline you select. Once written, it should be registered with an appropriate registry.
  • Select databases and grey literature sources. It is advisable to consult a librarian before embarking on this phase of the review process; they can recommend databases and other sources to use and even help design complex searches.
  • Search databases and other sources. Not all databases use the same search syntax, so when searching multiple databases, adapt the search syntax to each database.
  • Use a citation management tool to store and organize your citations during the review process; this is a great help when de-duplicating your citation results
  • Screen citations against the inclusion and exclusion criteria you developed earlier to remove articles that are not relevant to your topic.
  • Assess the quality of your findings to eliminate bias in either the design of the study or in the results/conclusions (generally not done outside of Systematic Reviews).
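The de-duplication mentioned above is normally handled inside a citation manager, but the core idea is simple enough to sketch in Python. This is an illustrative toy only: the `doi` and `title` record fields are assumptions for the example, not any real tool's API.

```python
def normalize(text):
    """Lowercase and collapse whitespace so near-identical titles compare equal."""
    return " ".join(text.lower().split())

def deduplicate(citations):
    """Remove duplicate citation records, keying on DOI when present
    and falling back to a normalized title otherwise."""
    seen = set()
    unique = []
    for c in citations:
        key = c.get("doi") or normalize(c.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(c)
    return unique

records = [
    {"doi": "10.1000/x1", "title": "Trial of A vs B"},
    {"doi": "10.1000/x1", "title": "Trial of A vs. B"},   # duplicate by DOI
    {"doi": None, "title": "Cohort study of C"},
    {"doi": None, "title": "cohort  study of C"},         # duplicate by title
]
print(len(deduplicate(records)))  # → 2
```

Real de-duplication is fuzzier than this (page numbers, author-order variants), which is why dedicated tools are preferred.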

Extract and Synthesize

  • Extract the data from the studies that remain after screening and appraisal
  • Extraction tools are used to collect data from individual studies for analysis or summary.
  • Synthesize the main findings of your research

Report Findings

Report the results using a statistical approach or in a narrative form.

Need More Help?

Librarians can:

  • Provide guidance on which methodology best suits your goals
  • Recommend databases and other information sources for searching
  • Design and implement comprehensive and reproducible database-specific search strategies 
  • Recommend software for article screening
  • Assist with the use of citation management tools
  • Offer best practices on documentation of searches

Related Guides

  • Literature Reviews
  • Choose a Citation Manager
  • Project Management

Steps of a Systematic Review - Video

  • Next: Choosing a Review >>
  • Last Updated: Feb 16, 2024 5:40 PM
  • URL: https://guides.smu.edu/evidencesyntheses



An overview of methodological approaches in systematic reviews

Prabhakar Veginadu

1 Department of Rural Clinical Sciences, La Trobe Rural Health School, La Trobe University, Bendigo Victoria, Australia

Hanny Calache

2 Lincoln International Institute for Rural Health, University of Lincoln, Brayford Pool, Lincoln UK

Akshaya Pandian

3 Department of Orthodontics, Saveetha Dental College, Chennai Tamil Nadu, India

Mohd Masood

Associated Data

APPENDIX B: List of excluded studies with detailed reasons for exclusion

APPENDIX C: Quality assessment of included reviews using AMSTAR 2

The aim of this overview is to identify and collate evidence from existing published systematic review (SR) articles evaluating various methodological approaches used at each stage of an SR.

The search was conducted in five electronic databases from inception to November 2020 and updated in February 2022: MEDLINE, Embase, Web of Science Core Collection, Cochrane Database of Systematic Reviews, and APA PsycINFO. Title and abstract screening were performed in two stages by one reviewer, supported by a second reviewer. Full‐text screening, data extraction, and quality appraisal were performed by two reviewers independently. The quality of the included SRs was assessed using the AMSTAR 2 checklist.

The search retrieved 41,556 unique citations, of which 9 SRs were deemed eligible for inclusion in final synthesis. Included SRs evaluated 24 unique methodological approaches used for defining the review scope and eligibility, literature search, screening, data extraction, and quality appraisal in the SR process. Limited evidence supports the following (a) searching multiple resources (electronic databases, handsearching, and reference lists) to identify relevant literature; (b) excluding non‐English, gray, and unpublished literature, and (c) use of text‐mining approaches during title and abstract screening.

The overview identified limited SR‐level evidence on various methodological approaches currently employed during five of the seven fundamental steps in the SR process, as well as some methodological modifications currently used in expedited SRs. Overall, findings of this overview highlight the dearth of published SRs focused on SR methodologies and this warrants future work in this area.

1. INTRODUCTION

Evidence synthesis is a prerequisite for knowledge translation. 1 A well conducted systematic review (SR), often in conjunction with meta‐analyses (MA) when appropriate, is considered the “gold standard” of methods for synthesizing evidence related to a topic of interest. 2 The central strength of an SR is the transparency of the methods used to systematically search, appraise, and synthesize the available evidence. 3 Several guidelines, developed by various organizations, are available for the conduct of an SR; 4 , 5 , 6 , 7 among these, Cochrane is considered a pioneer in developing rigorous and highly structured methodology for the conduct of SRs. 8 The guidelines developed by these organizations outline seven fundamental steps required in SR process: defining the scope of the review and eligibility criteria, literature searching and retrieval, selecting eligible studies, extracting relevant data, assessing risk of bias (RoB) in included studies, synthesizing results, and assessing certainty of evidence (CoE) and presenting findings. 4 , 5 , 6 , 7

The methodological rigor involved in an SR can require a significant amount of time and resources, which may not always be available. 9 As a result, there has been a proliferation of modifications made to the traditional SR process, such as refining, shortening, bypassing, or omitting one or more steps; 10 , 11 for example, limiting the number and type of databases searched; limiting by publication date, language, and types of studies included; and using one reviewer, rather than two or more, for screening and selection of studies. 10 , 11 These methodological modifications are made to accommodate the needs and resource constraints of reviewers and stakeholders (e.g., organizations, policymakers, health care professionals, and other knowledge users). While such modifications are considered time and resource efficient, they may introduce bias into the review process, reducing its usefulness. 5

Substantial research has been conducted examining various approaches used in the standardized SR methodology and their impact on the validity of SR results. There are a number of published reviews examining the approaches or modifications corresponding to single 12 , 13 or multiple steps 14 involved in an SR. However, there is yet to be a comprehensive summary of the SR‐level evidence for all the seven fundamental steps in an SR. Such a holistic evidence synthesis will provide an empirical basis to confirm the validity of current accepted practices in the conduct of SRs. Furthermore, sometimes there is a balance that needs to be achieved between the resource availability and the need to synthesize the evidence in the best way possible, given the constraints. This evidence base will also inform the choice of modifications to be made to the SR methods, as well as the potential impact of these modifications on the SR results. An overview is considered the choice of approach for summarizing existing evidence on a broad topic, directing the reader to evidence, or highlighting the gaps in evidence, where the evidence is derived exclusively from SRs. 15 Therefore, for this review, an overview approach was used to (a) identify and collate evidence from existing published SR articles evaluating various methodological approaches employed in each of the seven fundamental steps of an SR and (b) highlight both the gaps in the current research and the potential areas for future research on the methods employed in SRs.

An a priori protocol was developed for this overview but was not registered with the International Prospective Register of Systematic Reviews (PROSPERO), as the review was primarily methodological in nature and did not meet PROSPERO eligibility criteria for registration. The protocol is available from the corresponding author upon reasonable request. This overview was conducted based on the guidelines for the conduct of overviews as outlined in The Cochrane Handbook. 15 Reporting followed the Preferred Reporting Items for Systematic reviews and Meta‐analyses (PRISMA) statement. 3

2.1. Eligibility criteria

Only published SRs, with or without associated MA, were included in this overview. We adopted the defining characteristics of SRs from The Cochrane Handbook. 5 According to The Cochrane Handbook, a review was considered systematic if it satisfied the following criteria: (a) clearly states the objectives and eligibility criteria for study inclusion; (b) provides reproducible methodology; (c) includes a systematic search to identify all eligible studies; (d) reports assessment of validity of findings of included studies (e.g., RoB assessment of the included studies); (e) systematically presents all the characteristics or findings of the included studies. 5 Reviews that did not meet all of the above criteria were not considered SRs for this study and were excluded. MA‐only articles were included if it was mentioned that the MA was based on an SR.

SRs and/or MA of primary studies evaluating methodological approaches used in defining review scope and study eligibility, literature search, study selection, data extraction, RoB assessment, data synthesis, and CoE assessment and reporting were included. The methodological approaches examined in these SRs and/or MA could also relate to the substeps or elements of these steps; for example, applying limits on date or type of publication is an element of literature search. Included SRs examined or compared various aspects of a method or methods and the associated factors, including but not limited to: precision or effectiveness; accuracy or reliability; impact on the SR and/or MA results; reproducibility of SR steps or bias introduced; and time and/or resource efficiency. SRs assessing the methodological quality of SRs (e.g., adherence to reporting guidelines), evaluating techniques for building search strategies or the use of specific database filters (e.g., use of Boolean operators or search filters for randomized controlled trials), examining various tools used for RoB or CoE assessment (e.g., ROBINS vs. Cochrane RoB tool), or evaluating statistical techniques used in meta‐analyses were excluded. 14

2.2. Search

The search for published SRs was performed on the following scientific databases initially from inception to third week of November 2020 and updated in the last week of February 2022: MEDLINE (via Ovid), Embase (via Ovid), Web of Science Core Collection, Cochrane Database of Systematic Reviews, and American Psychological Association (APA) PsycINFO. Search was restricted to English language publications. Following the objectives of this study, study design filters within databases were used to restrict the search to SRs and MA, where available. The reference lists of included SRs were also searched for potentially relevant publications.

The search terms included keywords, truncations, and subject headings for the key concepts in the review question: SRs and/or MA, methods, and evaluation. Some of the terms were adopted from the search strategy used in a previous review by Robson et al., which reviewed primary studies on methodological approaches used in study selection, data extraction, and quality appraisal steps of SR process. 14 Individual search strategies were developed for respective databases by combining the search terms using appropriate proximity and Boolean operators, along with the related subject headings in order to identify SRs and/or MA. 16 , 17 A senior librarian was consulted in the design of the search terms and strategy. Appendix A presents the detailed search strategies for all five databases.
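To make concrete how keywords, truncation, subject headings, and Boolean/proximity operators combine, here is a generic Ovid-style fragment. It is a hypothetical sketch for illustration only, not the authors' actual strategy (which appears in Appendix A):

```
1. systematic review*.ti,ab.
2. meta-analys*.ti,ab.
3. 1 or 2
4. (method* adj3 (evaluat* or compar* or assess*)).ti,ab.
5. 3 and 4
```

Here `*` truncates a stem, `.ti,ab.` restricts terms to title and abstract, and `adj3` requires the terms to appear within three words of each other; each database requires its own equivalent syntax.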

2.3. Study selection and data extraction

Title and abstract screening of references were performed in three steps. First, one reviewer (PV) screened all the titles and excluded obviously irrelevant citations, for example, articles on topics not related to SRs, non‐SR publications (such as randomized controlled trials, observational studies, scoping reviews, etc.). Next, from the remaining citations, a random sample of 200 titles and abstracts were screened against the predefined eligibility criteria by two reviewers (PV and MM), independently, in duplicate. Discrepancies were discussed and resolved by consensus. This step ensured that the responses of the two reviewers were calibrated for consistency in the application of the eligibility criteria in the screening process. Finally, all the remaining titles and abstracts were reviewed by a single “calibrated” reviewer (PV) to identify potential full‐text records. Full‐text screening was performed by at least two authors independently (PV screened all the records, and duplicate assessment was conducted by MM, HC, or MG), with discrepancies resolved via discussions or by consulting a third reviewer.

Data related to review characteristics, results, key findings, and conclusions were extracted by at least two reviewers independently (PV performed data extraction for all the reviews and duplicate extraction was performed by AP, HC, or MG).

2.4. Quality assessment of included reviews

The quality assessment of the included SRs was performed using the AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews). The tool consists of a 16‐item checklist addressing critical and noncritical domains. 18 For the purpose of this study, the domain related to MA was reclassified from critical to noncritical, as SRs with and without MA were included. The other six critical domains were used according to the tool guidelines. 18 Two reviewers (PV and AP) independently responded to each of the 16 items in the checklist with either “yes,” “partial yes,” or “no.” Based on the interpretations of the critical and noncritical domains, the overall quality of the review was rated as high, moderate, low, or critically low. 18 Disagreements were resolved through discussion or by consulting a third reviewer.
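The overall AMSTAR 2 rating described above follows a simple decision rule in which critical flaws dominate non-critical weaknesses. A sketch of that logic, assuming the flaws have already been counted from the 16-item checklist (the rule shown reflects the published AMSTAR 2 guidance; verify against the tool itself before relying on it):

```python
def amstar2_rating(critical_flaws, noncritical_weaknesses):
    """Map counts of critical flaws and non-critical weaknesses to the
    AMSTAR 2 overall confidence rating (per Shea et al., 2017)."""
    if critical_flaws > 1:
        return "critically low"   # more than one critical flaw
    if critical_flaws == 1:
        return "low"              # one critical flaw, regardless of other weaknesses
    if noncritical_weaknesses > 1:
        return "moderate"         # no critical flaws, multiple non-critical weaknesses
    return "high"                 # no critical flaws, at most one non-critical weakness

print(amstar2_rating(0, 1))  # → high
print(amstar2_rating(2, 0))  # → critically low
```

Reclassifying the MA domain as noncritical, as this overview did, only changes which counter a flaw in that domain increments.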

2.5. Data synthesis

To provide an understandable summary of existing evidence syntheses, characteristics of the methods evaluated in the included SRs were examined and key findings were categorized and presented based on the corresponding step in the SR process. The categories of key elements within each step were discussed and agreed by the authors. Results of the included reviews were tabulated and summarized descriptively, along with a discussion on any overlap in the primary studies. 15 No quantitative analyses of the data were performed.

From 41,556 unique citations identified through literature search, 50 full‐text records were reviewed, and nine systematic reviews 14 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 were deemed eligible for inclusion. The flow of studies through the screening process is presented in Figure  1 . A list of excluded studies with reasons can be found in Appendix B .

Figure 1. Study selection flowchart

3.1. Characteristics of included reviews

Table  1 summarizes the characteristics of included SRs. The majority of the included reviews (six of nine) were published after 2010. 14 , 22 , 23 , 24 , 25 , 26 Four of the nine included SRs were Cochrane reviews. 20 , 21 , 22 , 23 The number of databases searched in the reviews ranged from 2 to 14, 2 reviews searched gray literature sources, 24 , 25 and 7 reviews included a supplementary search strategy to identify relevant literature. 14 , 19 , 20 , 21 , 22 , 23 , 26 Three of the included SRs (all Cochrane reviews) included an integrated MA. 20 , 21 , 23

Characteristics of included studies

SR = systematic review; MA = meta‐analysis; RCT = randomized controlled trial; CCT = controlled clinical trial; N/R = not reported.

The included SRs evaluated 24 unique methodological approaches (26 in total) used across five steps in the SR process; 8 SRs evaluated 6 approaches, 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 while 1 review evaluated 18 approaches. 14 Exclusion of gray or unpublished literature 21 , 26 and blinding of reviewers for RoB assessment 14 , 23 were evaluated in two reviews each. Included SRs evaluated methods used in five different steps in the SR process, including methods used in defining the scope of review ( n  = 3), literature search ( n  = 3), study selection ( n  = 2), data extraction ( n  = 1), and RoB assessment ( n  = 2) (Table  2 ).

Summary of findings from review evaluating systematic review methods

There was some overlap in the primary studies evaluated in the included SRs on the same topics: Schmucker et al. 26 and Hopewell et al. 21 ( n  = 4), Hopewell et al. 20 and Crumley et al. 19 ( n  = 30), and Robson et al. 14 and Morissette et al. 23 ( n  = 4). There were no conflicting results between any of the identified SRs on the same topic.

3.2. Methodological quality of included reviews

Overall, the quality of the included reviews was assessed as moderate at best (Table  2 ). The most common critical weakness in the reviews was failure to provide justification for excluding individual studies (four reviews). Detailed quality assessment is provided in Appendix C .

3.3. Evidence on systematic review methods

3.3.1. Methods for defining review scope and eligibility

Two SRs investigated the effect of excluding data obtained from gray or unpublished sources on the pooled effect estimates of MA. 21 , 26 Hopewell et al. 21 reviewed five studies that compared the impact of gray literature on the results of a cohort of MA of RCTs in health care interventions. Gray literature was defined as information published in “print or electronic sources not controlled by commercial or academic publishers.” Findings showed an overall greater treatment effect for published trials than trials reported in gray literature. In a more recent review, Schmucker et al. 26 addressed similar objectives, by investigating gray and unpublished data in medicine. In addition to gray literature, defined similarly to the previous review by Hopewell et al., the authors also evaluated unpublished data—defined as “supplemental unpublished data related to published trials, data obtained from the Food and Drug Administration or other regulatory websites or postmarketing analyses hidden from the public.” The review found that in the majority of the MA, excluding gray literature had little or no effect on the pooled effect estimates. The evidence was insufficient to conclude whether data from gray and unpublished literature had an impact on the conclusions of MA. 26

Morrison et al. 24 examined five studies measuring the effect of excluding non‐English language RCTs on the summary treatment effects of SR‐based MA in various fields of conventional medicine. Although none of the included studies reported major differences in the treatment effect estimates between English‐only and non‐English‐inclusive MA, the review found inconsistent evidence regarding the methodological and reporting quality of English and non‐English trials. 24 As such, there might be a risk of introducing “language bias” when excluding non‐English language RCTs. The authors also noted that the numbers of non‐English trials vary across medical specialties, as does the impact of these trials on MA results. Based on these findings, Morrison et al. 24 conclude that literature searches must include non‐English studies when resources and time are available to minimize the risk of introducing “language bias.”

3.3.2. Methods for searching studies

Crumley et al. 19 analyzed recall (also referred to as “sensitivity” by some researchers; defined as “percentage of relevant studies identified by the search”) and precision (defined as “percentage of studies identified by the search that were relevant”) when searching a single resource to identify randomized controlled trials and controlled clinical trials, as opposed to searching multiple resources. The studies included in their review frequently compared a MEDLINE only search with the search involving a combination of other resources. The review found low median recall estimates (median values between 24% and 92%) and very low median precisions (median values between 0% and 49%) for most of the electronic databases when searched singularly. 19 A between‐database comparison, based on the type of search strategy used, showed better recall and precision for complex and Cochrane Highly Sensitive search strategies (CHSSS). In conclusion, the authors emphasize that literature searches for trials in SRs must include multiple sources. 19
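The recall and precision measures defined above can be written out explicitly: recall is the fraction of all relevant studies that the search retrieved, and precision is the fraction of retrieved records that were relevant. A small illustrative computation (the numbers here are made up, not taken from Crumley et al.):

```python
def recall(relevant_retrieved, total_relevant):
    """Recall (sensitivity): share of all existing relevant studies the search found."""
    return relevant_retrieved / total_relevant

def precision(relevant_retrieved, total_retrieved):
    """Precision: share of retrieved records that were actually relevant."""
    return relevant_retrieved / total_retrieved

# Hypothetical example: a single-database search returns 500 records,
# 40 of which are relevant, out of 80 relevant studies known to exist.
print(f"recall = {recall(40, 80):.0%}")         # → recall = 50%
print(f"precision = {precision(40, 500):.0%}")  # → precision = 8%
```

The trade-off is typical of SR searching: comprehensive strategies accept very low precision (many irrelevant records to screen) in exchange for high recall.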

In an SR comparing handsearching and electronic database searching, Hopewell et al. 20 found that handsearching retrieved more relevant RCTs (retrieval rate of 92%−100%) than searching in a single electronic database (retrieval rates of 67% for PsycINFO/PsycLIT, 55% for MEDLINE, and 49% for Embase). The retrieval rates varied depending on the quality of handsearching, type of electronic search strategy used (e.g., simple, complex or CHSSS), and type of trial reports searched (e.g., full reports, conference abstracts, etc.). The authors concluded that handsearching was particularly important in identifying full trials published in nonindexed journals and in languages other than English, as well as those published as abstracts and letters. 20

The effectiveness of checking reference lists to retrieve additional relevant studies for an SR was investigated by Horsley et al. 22 The review reported that checking reference lists yielded 2.5%–40% more studies depending on the quality and comprehensiveness of the electronic search used. The authors conclude that there is some evidence, although from poor quality studies, to support use of checking reference lists to supplement database searching. 22

3.3.3. Methods for selecting studies

Three approaches relevant to reviewer characteristics, including the number, experience, and blinding of reviewers involved in the screening process, were highlighted in an SR by Robson et al. 14 Based on the retrieved evidence, the authors recommended that two independent, experienced, and unblinded reviewers be involved in study selection. 14 A modified approach was also suggested by the review authors for when resources are limited, in which one reviewer screens and the other verifies the list of excluded studies. It should be noted, however, that this suggestion is likely based on the authors’ opinion, as there was no evidence related to it from the studies included in the review.

Robson et al. 14 also reported two methods describing the use of technology for screening studies: using Google Translate to translate articles (for example, from German to English) to facilitate screening was considered a viable method, while using two computer monitors for screening did not increase screening efficiency in SRs. Title‐first screening was found to be more efficient than simultaneous screening of titles and abstracts, although the time gained by the former method over the latter was small. Therefore, considering that search results are routinely exported as titles and abstracts, Robson et al. 14 recommend screening titles and abstracts simultaneously. However, the authors note that these conclusions were based on a very limited number (in most instances, one study per method) of low‐quality studies. 14

3.3.4. Methods for data extraction

Robson et al. 14 examined three approaches for data extraction relevant to reviewer characteristics, namely the number, experience, and blinding of reviewers (similar to the study selection step). Although based on limited evidence from a small number of studies, the authors recommended the use of two experienced and unblinded reviewers for data extraction. Reviewer experience was suggested to be especially important when extracting continuous (or quantitative) outcome data. When resources are limited, however, data extraction by one reviewer with verification of the outcome data by a second reviewer was recommended.

Regarding methods involving the use of technology, Robson et al. 14 identified limited evidence on the use of two monitors to improve data extraction efficiency and on computer-assisted programs for graphical data extraction. However, the use of Google Translate for data extraction from non-English articles was not considered viable. 14 In the same review, Robson et al. 14 identified evidence supporting contacting study authors to obtain additional relevant data.

3.3.5. Methods for RoB assessment

Two SRs examined the impact of blinding of reviewers on RoB assessments. 14, 23 Morissette et al. 23 investigated the mean differences between blinded and unblinded RoB assessment scores and found inconsistent differences among the included studies, providing no definitive conclusions. Similar conclusions were drawn in a more recent review by Robson et al., 14 whose four studies on reviewer blinding for RoB assessment completely overlapped with those in Morissette et al. 23

Use of experienced reviewers and provision of additional guidance for RoB assessment were examined by Robson et al. 14 The review concluded that providing reviewers with intensive training and guidance on assessing studies that report insufficient data improves RoB assessments. 14 Obtaining additional quality-assessment data by contacting study authors was also found to help RoB assessments, although this was based on limited evidence. For qualitative or mixed-methods reviews, Robson et al. 14 recommend the use of a structured rather than an unstructured RoB tool. No SRs were identified on the data synthesis, CoE assessment, or reporting steps.

4. DISCUSSION

4.1. Summary of findings

Nine SRs examining 24 unique methods used across five steps of the SR process were identified in this overview. The collective evidence supports some current traditional and modified SR practices, while challenging other approaches. However, the quality of the included reviews was moderate at best, and in the majority of the included SRs the evidence on the evaluated methods was obtained from a very limited number of primary studies. As such, interpretations from these SRs should be made cautiously.

The evidence gathered from the included SRs corroborates a few current SR approaches. 5 For example, it is important to search multiple resources to identify relevant trials (RCTs and/or CCTs), combining electronic database searching, handsearching, and checking the reference lists of retrieved articles. 5 However, no SRs were identified that evaluated the impact of the number of electronic databases searched. A recent study by Halladay et al. 27 found that articles on therapeutic interventions retrieved by searching databases other than PubMed (including Embase) contributed only a small amount of information to the MA and had minimal impact on the MA results. The authors concluded that when resources are limited and a large number of studies is expected to be retrieved for the SR or MA, a PubMed-only search can yield reliable results. 27

Findings from the included SRs also reiterate some methodological modifications currently employed to "expedite" the SR process. 10, 11 For example, excluding non-English-language trials and gray/unpublished trials from MA has been shown to have minimal or no impact on MA results. 24, 26 However, the efficiency of these SR methods, in terms of the time and resources used, has not been evaluated in the included SRs. 24, 26 Of the included SRs, only two focused on efficiency 14, 25; O'Mara-Eves et al. 25 report some evidence supporting the use of text-mining approaches for title and abstract screening to increase the rate of screening. Moreover, only one included SR 14 considered primary studies that evaluated the reliability (inter- or intra-reviewer consistency) and accuracy (validity compared against a "gold standard" method) of SR methods, which can be attributed to the limited number of primary studies evaluating these outcomes. 14 The lack of outcome measures related to reliability, accuracy, and efficiency precludes definitive recommendations on the use of these methods/modifications; future research should focus on these outcomes.

Some evaluated methods may be relevant to multiple steps; for example, exclusions based on publication status (gray/unpublished literature) and language of publication (non-English-language studies) can be specified in the a priori eligibility criteria or incorporated as limits in the search strategy. The SRs included in this overview focused on the effect of study exclusions on pooled treatment effect estimates or MA conclusions. Excluding studies from the results of a comprehensive search on the basis of eligibility criteria may yield different results from limiting the search itself. 28 Further studies are required to examine this aspect.

Although we acknowledge the lack of standardized quality assessment tools for methodological study designs, we adhered to the Cochrane criteria for identifying SRs in this overview. This was done to ensure consistency in the quality of the included evidence. As a result, we excluded three reviews that did not provide any form of discussion of the quality of their included studies. The methods investigated in these reviews concern supplementary searching, 29 data extraction, 12 and screening. 13 However, methods reported in two of these three reviews, by Mathes et al. 12 and Waffenschmidt et al., 13 were also examined in the SR by Robson et al., 14 which was included in this overview; in most instances (with the exception of one study each in Mathes et al. 12 and Waffenschmidt et al. 13), the studies examined in these excluded reviews overlapped with those in the SR by Robson et al. 14

One of the key gaps in knowledge observed in this overview was the dearth of SRs on methods used in the data synthesis component of SRs. Narrative and quantitative syntheses are the two most commonly used approaches for synthesizing data in evidence synthesis. 5 There are some published studies on the proposed indications and implications of these two approaches. 30, 31 These studies found that both data synthesis methods produce comparable results and have their own advantages, suggesting that the choice of method should be based on the purpose of the review. 31 With an increasing number of "expedited" SR approaches (so-called "rapid reviews") avoiding MA, 10, 11 further research is warranted to determine the impact of the type of data synthesis on SR results.

4.2. Implications for future research

The findings of this overview highlight several gaps in primary research and evidence synthesis on SR methods. First, no SRs were identified on the methods used in two important components of the SR process, namely data synthesis and CoE assessment and reporting. Among the included SRs, only a limited number of evaluation studies were identified for several methods, indicating that further research is required to corroborate many of the methods recommended in current SR guidelines. 4, 5, 6, 7 Second, some SRs evaluated only the impact of methods on the results of quantitative synthesis and MA conclusions; future research should also focus on the interpretation of SR results. 28, 32 Finally, most of the included SRs were conducted on specific topics within the field of health care, limiting the generalizability of the findings to other areas. Future research evaluating evidence synthesis methods should broaden its objectives and include studies on different topics within the field of health care.

4.3. Strengths and limitations

To our knowledge, this is the first overview summarizing current evidence from SRs and MAs on the different methodological approaches used in several fundamental steps of SR conduct. The overview methodology followed well-established guidelines and strict criteria defined for the inclusion of SRs.

There are several limitations related to the nature of the included reviews. Evidence for most of the methods investigated in the included reviews was derived from a limited number of primary studies. In addition, the majority of the included SRs may be considered outdated, as they were published (or last updated) more than 5 years ago 33; only three of the nine SRs were published in the last 5 years. 14, 25, 26 Therefore, important recent evidence related to these topics may not have been included. A substantial number of the included SRs were conducted in the field of health, which may limit the generalizability of the findings. Some method evaluations in the included SRs focused only on quantitative analysis components and MA conclusions; as such, the applicability of these findings to SRs more broadly remains unclear. 28 Considering the methodological nature of our overview, limiting the inclusion of SRs according to the Cochrane criteria might have resulted in missing relevant evidence from reviews without a quality assessment component. 12, 13, 29 Although the included SRs performed some form of quality appraisal of their included studies, most did not use a standardized RoB tool, which may reduce confidence in their conclusions. Owing to the type of outcome measures used for method evaluations in the primary studies and the included SRs, some of the identified methods have not been validated against a reference standard.

Some limitations of the overview process must also be noted. While our literature search was extensive, covering five bibliographic databases and a supplementary search of reference lists, no gray literature or other evidence sources were searched. The search was also conducted primarily in health databases, which might have missed SRs published in other fields. Moreover, only English-language SRs were included, for feasibility. As the literature search retrieved a large number of citations (41,556), title and abstract screening was performed by a single reviewer, calibrated for consistency by a second reviewer, owing to time and resource limitations. This might have resulted in some errors when retrieving and selecting relevant SRs. The SR methods were grouped based on key elements of each recommended SR step, as agreed by the authors; this categorization pertains to the identified set of methods and should be considered subjective.

5. CONCLUSIONS

This overview identified limited SR‐level evidence on various methodological approaches currently employed during five of the seven fundamental steps in the SR process. Limited evidence was also identified on some methodological modifications currently used to expedite the SR process. Overall, findings highlight the dearth of SRs on SR methodologies, warranting further work to confirm several current recommendations on conventional and expedited SR processes.

CONFLICT OF INTEREST

The authors declare no conflicts of interest.

Supporting information

APPENDIX A: Detailed search strategies

ACKNOWLEDGMENTS

The first author is supported by a La Trobe University Full Fee Research Scholarship and a Graduate Research Scholarship.

Open Access Funding provided by La Trobe University.

Veginadu P, Calache H, Gussy M, Pandian A, Masood M. An overview of methodological approaches in systematic reviews. J Evid Based Med. 2022;15:39–54. doi:10.1111/jebm.12468

  • Systematic review
  • Open access
  • Published: 19 February 2024

‘It depends’: what 86 systematic reviews tell us about what strategies to use to support the use of research in clinical practice

  • Annette Boaz   ORCID: orcid.org/0000-0003-0557-1294 1 ,
  • Juan Baeza 2 ,
  • Alec Fraser   ORCID: orcid.org/0000-0003-1121-1551 2 &
  • Erik Persson 3  

Implementation Science, volume 19, Article number: 15 (2024)


The gap between research findings and clinical practice is well documented and a range of strategies have been developed to support the implementation of research into clinical practice. The objective of this study was to update and extend two previous reviews of systematic reviews of strategies designed to implement research evidence into clinical practice.

We developed a comprehensive systematic literature search strategy based on the terms used in the previous reviews to identify studies that looked explicitly at interventions designed to turn research evidence into practice. The search was performed in June 2022 in four electronic databases: Medline, Embase, Cochrane and Epistemonikos. We searched from January 2010 up to June 2022 and applied no language restrictions. Two independent reviewers appraised the quality of included studies using a quality assessment checklist. To reduce the risk of bias, papers were excluded following discussion between all members of the team. Data were synthesised using descriptive and narrative techniques to identify themes and patterns linked to intervention strategies, targeted behaviours, study settings and study outcomes.

We identified 32 reviews conducted between 2010 and 2022. The reviews are mainly of multi-faceted interventions ( n  = 20) although there are reviews focusing on single strategies (ICT, educational, reminders, local opinion leaders, audit and feedback, social media and toolkits). The majority of reviews report strategies achieving small impacts (normally on processes of care). There is much less evidence that these strategies have shifted patient outcomes. Furthermore, a lot of nuance lies behind these headline findings, and this is increasingly commented upon in the reviews themselves.

Combined with the two previous reviews, 86 systematic reviews of strategies to increase the implementation of research into clinical practice have been identified. We need to shift the emphasis away from isolating individual and multi-faceted interventions to better understanding and building more situated, relational and organisational capability to support the use of research in clinical practice. This will involve drawing on a wider range of research perspectives (including social science) in primary studies and diversifying the types of synthesis undertaken to include approaches such as realist synthesis which facilitate exploration of the context in which strategies are employed.


Contribution to the literature

Considerable time and money is invested in implementing and evaluating strategies to increase the implementation of research into clinical practice.

The growing body of evidence is not providing the anticipated clear lessons to support improved implementation.

Instead what is needed is better understanding and building more situated, relational and organisational capability to support the use of research in clinical practice.

This would involve a more central role in implementation science for a wider range of perspectives, especially from the social, economic, political and behavioural sciences and for greater use of different types of synthesis, such as realist synthesis.

Introduction

The gap between research findings and clinical practice is well documented and a range of interventions has been developed to increase the implementation of research into clinical practice [ 1 , 2 ]. In recent years researchers have worked to improve the consistency in the ways in which these interventions (often called strategies) are described to support their evaluation. One notable development has been the emergence of Implementation Science as a field focusing explicitly on “the scientific study of methods to promote the systematic uptake of research findings and other evidence-based practices into routine practice” ([ 3 ] p. 1). The work of implementation science focuses on closing, or at least narrowing, the gap between research and practice. One contribution has been to map existing interventions, identifying 73 discrete strategies to support research implementation [ 4 ] which have been grouped into 9 clusters [ 5 ]. The authors note that they have not considered the evidence of effectiveness of the individual strategies and that a next step is to understand better which strategies perform best in which combinations and for what purposes [ 4 ]. Other authors have noted that there is also scope to learn more from other related fields of study such as policy implementation [ 6 ] and to draw on methods designed to support the evaluation of complex interventions [ 7 ].

The increase in activity designed to support the implementation of research into practice and improvements in reporting provided the impetus for an update of a review of systematic reviews of the effectiveness of interventions designed to support the use of research in clinical practice [ 8 ], which was itself an update of the review conducted by Grimshaw and colleagues in 2001. The 2001 review [ 9 ] identified 41 reviews considering a range of strategies, from educational interventions, audit and feedback and computerised decision support to financial incentives and combined interventions. The authors concluded that all the interventions had the potential to promote the uptake of evidence in practice, although no one intervention seemed to be more effective than the others in all settings, and that combined interventions were more likely to be effective than single interventions. The 2011 review identified a further 13 systematic reviews containing 313 discrete primary studies. Consistent with the previous review, four main strategy types were identified: audit and feedback; computerised decision support; opinion leaders; and multi-faceted interventions (MFIs). Nine of the reviews reported on MFIs. The review highlighted the small effects of single interventions such as audit and feedback, computerised decision support and opinion leaders. MFIs claimed an improvement in effectiveness over single interventions, although effect sizes remained small to moderate, and this improvement in effectiveness relating to MFIs has been questioned in a subsequent review [ 10 ]. In updating the review, we anticipated a larger pool of reviews and an opportunity to consolidate learning from more recent systematic reviews of interventions.

This review updates and extends our previous review of systematic reviews of interventions designed to implement research evidence into clinical practice. To identify potentially relevant peer-reviewed research papers, we developed a comprehensive systematic literature search strategy based on the terms used in the Grimshaw et al. [ 9 ] and Boaz, Baeza and Fraser [ 8 ] overview articles. To ensure optimal retrieval, our search strategy was refined with support from an expert university librarian, considering the ongoing improvements in the development of search filters for systematic reviews since our first review [ 11 ]. We also wanted to include technology-related terms (e.g. apps, algorithms, machine learning, artificial intelligence) to find studies that explored interventions based on the use of technological innovations as mechanistic tools for increasing the use of evidence into practice (see Additional file 1 : Appendix A for full search strategy).

The search was performed in June 2022 in the following electronic databases: Medline, Embase, Cochrane and Epistemonikos. We searched for articles published since the 2011 review. We searched from January 2010 up to June 2022 and applied no language restrictions. Reference lists of relevant papers were also examined.

We uploaded the results using EPPI-Reviewer, a web-based tool that facilitated semi-automation of the screening process and removal of duplicate studies. We made particular use of a priority screening function to reduce screening workload and avoid ‘data deluge’ [ 12 ]. Through machine learning, one reviewer screened a smaller number of records ( n  = 1200) to train the software to predict whether a given record was more likely to be relevant or irrelevant, thus pulling the relevant studies towards the beginning of the screening process. This automation did not replace manual work but helped the reviewer to identify eligible studies more quickly. During the selection process, we included studies that looked explicitly at interventions designed to turn research evidence into practice. Studies were included if they met the following pre-determined inclusion criteria:

The study was a systematic review

Search terms were included

Focused on the implementation of research evidence into practice

The methodological quality of the included studies was assessed as part of the review

Study populations included healthcare providers and patients

The EPOC taxonomy [ 13 ] was used to categorise the strategies. The EPOC taxonomy has four domains: delivery arrangements, financial arrangements, governance arrangements and implementation strategies. The implementation strategies domain includes 20 strategies targeted at healthcare workers. Numerous EPOC strategies were assessed in the review including educational strategies, local opinion leaders, reminders, ICT-focused approaches and audit and feedback. Some strategies that did not fit easily within the EPOC categories were also included. These were social media strategies and toolkits, and multi-faceted interventions (MFIs) (see Table  2 ). Some systematic reviews included comparisons of different interventions while other reviews compared one type of intervention against a control group. Outcomes related to improvements in health care processes or patient well-being. Numerous individual study types (RCT, CCT, BA, ITS) were included within the systematic reviews.

We excluded papers that:

Focused on changing patient rather than provider behaviour

Had no demonstrable outcomes

Made unclear or no reference to research evidence

The last of these criteria was sometimes difficult to judge, and there was considerable discussion amongst the research team as to whether the link between research evidence and practice was sufficiently explicit in the interventions analysed. As we discussed in the previous review [ 8 ] in the field of healthcare, the principle of evidence-based practice is widely acknowledged and tools to change behaviour such as guidelines are often seen to be an implicit codification of evidence, despite the fact that this is not always the case.

Reviewers employed a two-stage process to select papers for inclusion. First, all titles and abstracts were screened by one reviewer to determine whether the study met the inclusion criteria. Two papers [ 14 , 15 ] were identified that fell just before the 2010 cut-off; as they were not identified in the searches for the first review [ 8 ], they were included and progressed to assessment. Each paper was rated as include, exclude or maybe. The full texts of 111 relevant papers were assessed independently by at least two authors. To reduce the risk of bias, papers were excluded following discussion between all members of the team. Thirty-two papers met the inclusion criteria and proceeded to data extraction. The study selection procedure is documented in a PRISMA literature flow diagram (see Fig.  1 ). We were able to include French, Spanish and Portuguese papers in the selection, reflecting the language skills in the study team, but none of the papers identified met the inclusion criteria. Other non-English language papers were excluded.
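The machine-learning "priority screening" described above can be sketched as a minimal relevance ranker: a scorer is trained on a handful of manually labelled records and then used to reorder the unscreened queue so likely-relevant records surface first. This is a simplified, hypothetical illustration (EPPI-Reviewer's actual model is more sophisticated, and all record texts and labels below are invented):

```python
# Sketch of priority screening: rank unscreened records by a simple
# bag-of-words relevance score learned from a few labelled examples.
from collections import Counter


def tokens(text):
    return text.lower().split()


def train(labelled):
    """labelled: list of (text, is_relevant) pairs from manual screening."""
    rel, irr = Counter(), Counter()
    for text, is_relevant in labelled:
        (rel if is_relevant else irr).update(tokens(text))
    return rel, irr


def score(text, rel, irr):
    # Average per-token evidence of relevance, with add-one smoothing
    # so unseen tokens contribute a neutral score of 1.
    toks = tokens(text)
    total = sum((rel[t] + 1) / (irr[t] + 1) for t in toks)
    return total / max(len(toks), 1)


def prioritise(unlabelled, rel, irr):
    # Highest-scoring (most likely relevant) records come first.
    return sorted(unlabelled, key=lambda r: score(r, rel, irr), reverse=True)


# Hypothetical seed labels from the first screening pass.
labelled = [
    ("randomised trial of audit and feedback in hospitals", True),
    ("opinion leaders improve guideline adherence", True),
    ("annual financial report of the trust", False),
]
rel, irr = train(labelled)
queue = prioritise(
    ["trust financial statement 2021",
     "hospital audit and feedback intervention study"],
    rel, irr,
)
print(queue[0])
```

As in the review's workflow, the ranker does not replace manual screening; it only reorders the queue so the single reviewer reaches eligible studies sooner.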

Figure 1: PRISMA flow diagram. Source: authors

One reviewer extracted data on strategy type, number of included studies, local, target population, effectiveness and scope of impact from the included studies. Two reviewers then independently read each paper and noted key findings and broad themes of interest which were then discussed amongst the wider authorial team. Two independent reviewers appraised the quality of included studies using a Quality Assessment Checklist based on Oxman and Guyatt [ 16 ] and Francke et al. [ 17 ]. Each study was rated a quality score ranging from 1 (extensive flaws) to 7 (minimal flaws) (see Additional file 2 : Appendix B). All disagreements were resolved through discussion. Studies were not excluded in this updated overview based on methodological quality as we aimed to reflect the full extent of current research into this topic.

The extracted data were synthesised using descriptive and narrative techniques to identify themes and patterns in the data linked to intervention strategies, targeted behaviours, study settings and study outcomes.

Thirty-two studies were included in the systematic review. Table 1 provides a detailed overview of the included systematic reviews, comprising reference, strategy type, quality score, number of included studies, local, target population, effectiveness and scope of impact (see Table 1 at the end of the manuscript). Overall, the quality of the studies was high: twenty-three studies scored 7, six scored 6, one scored 5, one scored 4 and one scored 3. The primary focus of the review was on reviews of effectiveness studies, but a small number of reviews did include data from a wider range of methods, including qualitative studies, which added to the analysis in the papers [ 18 , 19 , 20 , 21 ]. The majority of reviews report strategies achieving small impacts (normally on processes of care); there is much less evidence that these strategies have shifted patient outcomes. In this section, we discuss the different EPOC-defined implementation strategies in turn. Interestingly, we found only two 'new' approaches in this review that did not fit into the existing EPOC categories: a review focused on the use of social media and a review considering toolkits. In addition to single interventions, we also discuss multi-faceted interventions, which were the most common intervention approach overall. A summary is provided in Table  2 .

Educational strategies

The overview identified three systematic reviews focusing on educational strategies. Grudniewicz et al. [ 22 ] explored the effectiveness of printed educational materials on primary care physician knowledge, behaviour and patient outcomes and concluded they were not effective in any of these aspects. Koota, Kääriäinen and Melender [ 23 ] focused on educational interventions promoting evidence-based practice among emergency room/accident and emergency nurses and found that interventions involving face-to-face contact led to significant or highly significant effects on patient benefits and emergency nurses’ knowledge, skills and behaviour. Interventions using written self-directed learning materials also led to significant improvements in nurses’ knowledge of evidence-based practice. Although the quality of the studies was high, the review primarily included small studies with low response rates, and many of them relied on self-assessed outcomes; consequently, the strength of the evidence for these outcomes is modest. Wu et al. [ 20 ] questioned if educational interventions aimed at nurses to support the implementation of evidence-based practice improve patient outcomes. Although based on evaluation projects and qualitative data, their results also suggest that positive changes on patient outcomes can be made following the implementation of specific evidence-based approaches (or projects). The differing positive outcomes for educational strategies aimed at nurses might indicate that the target audience is important.

Local opinion leaders

Flodgren et al. [ 24 ] was the only systematic review focusing solely on opinion leaders. The review found that local opinion leaders, alone or in combination with other interventions, can be effective in promoting evidence-based practice, although this varied both within and between studies, and the effect on patient outcomes is uncertain. Overall, any intervention involving opinion leaders probably improves healthcare professionals' compliance with evidence-based practice. However, how opinion leaders had an impact could not be determined because insufficient details were provided, illustrating that reporting specific details in published studies is important if effective methods of increasing evidence-based practice are to be diffused across a system. The usefulness of this review is limited because it cannot provide evidence of what makes an effective opinion leader, whether teams of opinion leaders or a single opinion leader are most effective, or which methods used by opinion leaders are most effective.

Reminders

Pantoja et al. [ 26 ] was the only systematic review included in the overview focusing solely on manually generated reminders delivered on paper. The review explored how these affected professional practice and patient outcomes. It concluded that manually generated reminders delivered on paper as a single intervention probably lead to small to moderate increases in adherence to clinical recommendations and could be used as a single quality improvement intervention. However, the authors indicated that this intervention would make little or no difference to patient outcomes. They state that such a low-tech intervention may be useful in low- and middle-income countries where paper records are more likely to be the norm.

ICT-focused approaches

The three ICT-focused reviews [ 14 , 27 , 28 ] showed mixed results. Jamal, McKenzie and Clark [ 14 ] explored the impact of health information technology on the quality of medical and health care, examining the impact of electronic health records, computerised provider order entry, and decision support systems. They found a positive improvement in adherence to evidence-based guidelines but not in patient outcomes. The number of studies included in the review was low, so a conclusive recommendation could not be reached. Similarly, Brown et al. [ 28 ] found that technology-enabled knowledge translation interventions may improve the knowledge of health professionals, but all eight included studies raised concerns of bias. The De Angelis et al. [ 27 ] review was more promising, reporting that ICT can be a good way of disseminating clinical practice guidelines, but concluded that it is unclear which type of ICT method is most effective.

Audit and feedback

Sykes, McAnuff and Kolehmainen [ 29 ] examined whether audit and feedback were effective in dementia care and concluded that it remains unclear which ingredients of audit and feedback are successful, as the reviewed papers illustrated large variations in the effectiveness of interventions using audit and feedback.

Non-EPOC listed strategies: social media, toolkits

There were two new (non-EPOC listed) intervention types identified in this review compared to the 2011 review — fewer than anticipated. We categorised a third — ‘care bundles’ [ 36 ] as a multi-faceted intervention due to its description in practice and a fourth — ‘Technology Enhanced Knowledge Transfer’ [ 28 ] was classified as an ICT-focused approach. The first new strategy was identified in Bhatt et al.’s [ 30 ] systematic review of the use of social media for the dissemination of clinical practice guidelines. They reported that the use of social media resulted in a significant improvement in knowledge and compliance with evidence-based guidelines compared with more traditional methods. They noted that a wide selection of different healthcare professionals and patients engaged with this type of social media and its global reach may be significant for low- and middle-income countries. This review was also noteworthy for developing a simple stepwise method for using social media for the dissemination of clinical practice guidelines. However, it is debatable whether social media can be classified as an intervention or just a different way of delivering an intervention. For example, the review discussed involving opinion leaders and patient advocates through social media. However, this was a small review that included only five studies, so further research in this new area is needed. Yamada et al. [ 31 ] draw on 39 studies to explore the application of toolkits, 18 of which had toolkits embedded within larger KT interventions, and 21 of which evaluated toolkits as standalone interventions. The individual component strategies of the toolkits were highly variable though the authors suggest that they align most closely with educational strategies. 
The authors conclude that toolkits, either as standalone strategies or as part of MFIs, hold some promise for facilitating evidence use in practice, but caution that the weak quality of many of the included primary studies limits these findings.

Multi-faceted interventions

The majority of the systematic reviews (n = 20) reported on more than one intervention type. Some of these systematic reviews focus exclusively on multi-faceted interventions, whilst others compare different single or combined interventions aimed at achieving similar outcomes in particular settings. While these two approaches are often described in a similar way, they are actually quite distinct: the former report how multiple strategies may be strategically combined in pursuit of an agreed goal, whilst the latter report how different strategies may be incidentally used in sometimes contrasting settings in pursuit of similar goals. Ariyo et al. [ 35 ] helpfully summarise five key elements often found in effective MFI strategies in LMICs, which may also be transferable to HICs. First, effective MFIs encourage a multi-disciplinary approach, acknowledging the roles played by different professional groups in collectively incorporating evidence-informed practice. Second, they utilise leadership, drawing on a wide set of clinical and non-clinical actors including managers and even government officials. Third, multiple types of educational practices are utilised, including input from patients as stakeholders in some cases. Fourth, protocols, checklists and bundles are used, most effectively when local ownership is encouraged. Finally, most MFIs include an emphasis on monitoring and evaluation [ 35 ]. In contrast, other studies offer little information about the nature of the different MFI components of included studies, which makes it difficult to extrapolate much learning from them in relation to why or how MFIs might affect practice (e.g. [ 28 , 38 ]). Ultimately, context matters, which some review authors argue makes it difficult to say with real certainty whether single or MFI strategies are superior (e.g. [ 21 , 27 ]).
Taking all the systematic reviews together, we may conclude that MFIs appear more likely to generate positive results than single interventions (e.g. [ 34 , 45 ]), though other reviews should make us cautious (e.g. [ 32 , 43 ]).

While multi-faceted interventions still seem to be more effective than single-strategy interventions, there were important distinctions between how the results of reviews of MFIs are interpreted in this review as compared to the previous reviews [ 8 , 9 ], reflecting greater nuance and debate in the literature. This was particularly noticeable where the effectiveness of MFIs was compared to single strategies, reflecting developments widely discussed in previous studies [ 10 ]. We found that most systematic reviews are bounded by their clinical, professional, spatial, system, or setting criteria and often seek to draw out implications for the implementation of evidence in their areas of specific interest (such as nursing or acute care). Frequently, this means combining all relevant studies to explore the respective foci of each systematic review. Therefore, most reviews we categorised as MFIs actually include highly variable numbers and combinations of intervention strategies and highly heterogeneous original study designs. This makes statistical analyses of the type Squires et al. [ 10 ] applied to the three reviews in their paper impossible. It also makes extrapolating findings and commenting on broad themes complex and difficult. This may suggest that future research should shift its focus from merely examining ‘what works’ to ‘what works where and for whom’, perhaps pointing to the value of realist approaches to these complex review topics [ 48 , 49 ] and other more theory-informed approaches [ 50 ].

Some reviews have a relatively small number of studies (i.e. fewer than 10) and the authors are often understandably reluctant to engage with wider debates about the implications of their findings. Other larger studies do engage in deeper discussions, making internal comparisons of findings across included studies and contextualising these in wider debates. Some of the most informative studies (e.g. [ 35 , 40 ]) move beyond EPOC categories and contextualise MFIs within wider systems thinking and implementation theory. This distinction between MFIs and single interventions can actually be very useful, as it offers lessons about the contexts in which individual interventions might have bounded effectiveness (e.g. educational interventions for individual change). Taken as a whole, this may also help in understanding how and when to combine single interventions into effective MFIs.

In the two previous reviews, a consistent finding was that MFIs were more effective than single interventions [ 8 , 9 ]. However, like Squires et al. [ 10 ], this overview is more equivocal on this important issue. There are four points which may help account for the differences in findings in this regard. Firstly, the diversity of the systematic reviews in terms of clinical topic or setting is an important factor. Secondly, there is heterogeneity of the studies within the included systematic reviews themselves. Thirdly, there is a lack of consistency with regard to the definition of MFIs and the strategies included within them. Finally, there are epistemological differences across the papers and the reviews. This means that the results that are presented depend on the methods used to measure, report, and synthesise them. For instance, some reviews highlight that education strategies can be useful to improve provider understanding, but that without wider organisational or system-level change they may struggle to deliver sustained transformation [ 19 , 44 ].

It is also worth highlighting the importance of the theory of change underlying the different interventions. Where authors of the systematic reviews draw on theory, there is space to discuss and explain findings. We note a distinction between theoretical and atheoretical systematic review discussion sections. Atheoretical reviews tend to present acontextual findings (for instance, one study found very positive results for one intervention, and this gets highlighted in the abstract), whilst theoretically informed reviews attempt to contextualise and explain patterns within the included studies. Theory-informed systematic reviews seem more likely to offer profound and useful insights (see [ 19 , 35 , 40 , 43 , 45 ]). We find that the most insightful systematic reviews of MFIs engage in theoretical generalisation: they attempt to go beyond the data of individual studies and, drawing on implementation theory, discuss the wider implications of the findings of the studies within their reviews. At the same time, they highlight the active role of context and the wider relational and system-wide issues linked to implementation. It is these types of investigations that can help providers further develop evidence-based practice.

This overview has identified a small, but insightful set of papers that interrogate and help theorise why, how, for whom, and in which circumstances it might be the case that MFIs are superior (see [ 19 , 35 , 40 ] once more). At the level of this overview — and in most of the systematic reviews included — it appears to be the case that MFIs struggle with the question of attribution. In addition, there are other important elements that are often unmeasured, or unreported (e.g. costs of the intervention — see [ 40 ]). Finally, the stronger systematic reviews [ 19 , 35 , 40 , 43 , 45 ] engage with systems issues, human agency and context [ 18 ] in a way that was not evident in the systematic reviews identified in the previous reviews [ 8 , 9 ]. The earlier reviews lacked any theory of change that might explain why MFIs might be more effective than single ones — whereas now some systematic reviews do this, which enables them to conclude that sometimes single interventions can still be more effective.

As Nilsen et al. ([ 6 ] p. 7) note ‘Study findings concerning the effectiveness of various approaches are continuously synthesized and assembled in systematic reviews’. We may have gone as far as we can in understanding the implementation of evidence through systematic reviews of single and multi-faceted interventions and the next step would be to conduct more research exploring the complex and situated nature of evidence used in clinical practice and by particular professional groups. This would further build on the nuanced discussion and conclusion sections in a subset of the papers we reviewed. This might also support the field to move away from isolating individual implementation strategies [ 6 ] to explore the complex processes involving a range of actors with differing capacities [ 51 ] working in diverse organisational cultures. Taxonomies of implementation strategies do not fully account for the complex process of implementation, which involves a range of different actors with different capacities and skills across multiple system levels. There is plenty of work to build on, particularly in the social sciences, which currently sits at the margins of debates about evidence implementation (see for example, Normalisation Process Theory [ 52 ]).

There are several changes that we have identified in this overview of systematic reviews in comparison to the review we published in 2011 [ 8 ]. A consistent and welcome finding is that the overall quality of the systematic reviews themselves appears to have improved between the two reviews, although this is not reflected upon in the papers. This is exhibited through better, clearer reporting mechanisms in relation to the mechanics of the reviews, alongside a greater attention to, and deeper description of, how potential biases in included papers are discussed. Additionally, there is an increased, but still limited, inclusion of original studies conducted in low- and middle-income countries as opposed to just high-income countries. Importantly, we found that many of these systematic reviews are attuned to, and comment upon, the contextual distinctions of pursuing evidence-informed interventions in health care settings in different economic contexts. Furthermore, systematic reviews included in this updated article cover a wider set of clinical specialities (both within and beyond hospital settings) and have a focus on a wider set of healthcare professions, discussing similarities, differences and inter-professional challenges faced therein, compared to the earlier reviews. This wider range of studies highlights that a particular intervention or group of interventions may work well for one professional group but be ineffective for another. This diversity of study settings allows us to consider the important role context (in its many forms) plays in implementing evidence into practice. Examining the complex and varied context of health care will help us address what Nilsen et al. ([ 6 ] p. 1) described as, ‘society’s health problems [that] require research-based knowledge acted on by healthcare practitioners together with implementation of political measures from governmental agencies’.
This will help us shift implementation science to move, ‘beyond a success or failure perspective towards improved analysis of variables that could explain the impact of the implementation process’ ([ 6 ] p. 2).

This review brings together 32 papers considering individual and multi-faceted interventions designed to support the use of evidence in clinical practice. The majority of reviews report strategies achieving small impacts (normally on processes of care). There is much less evidence that these strategies have shifted patient outcomes. Combined with the two previous reviews, 86 systematic reviews of strategies to increase the implementation of research into clinical practice have been conducted. As a whole, this substantial body of knowledge struggles to tell us more about the use of individual and MFIs than: ‘it depends’. To really move forwards in addressing the gap between research evidence and practice, we may need to shift the emphasis away from isolating individual and multi-faceted interventions to better understanding and building more situated, relational and organisational capability to support the use of research in clinical practice. This will involve drawing on a wider range of perspectives, especially from the social, economic, political and behavioural sciences, in primary studies, and diversifying the types of synthesis undertaken to include approaches such as realist synthesis, which facilitate exploration of the context in which strategies are employed. Harvey et al. [ 53 ] suggest that when context is likely to be critical to implementation success there is a range of primary research approaches (participatory research, realist evaluation, developmental evaluation, ethnography, quality/rapid-cycle improvement) that are likely to be appropriate and insightful. While these approaches often form part of implementation studies in the form of process evaluations, they are usually relatively small scale in relation to implementation research as a whole. As a result, the findings often do not make it into the subsequent systematic reviews.
This review provides further evidence that we need to bring qualitative approaches in from the periphery to play a central role in many implementation studies and subsequent evidence syntheses. It would be helpful for systematic reviews, at the very least, to include more detail about the interventions and their implementation in terms of how and why they worked.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

BA: Before and after study

CCT: Controlled clinical trial

EPOC: Effective Practice and Organisation of Care

HICs: High-income countries

ICT: Information and Communications Technology

ITS: Interrupted time series

KT: Knowledge translation

LMICs: Low- and middle-income countries

RCT: Randomised controlled trial

Grol R, Grimshaw J. From best evidence to best practice: effective implementation of change in patients’ care. Lancet. 2003;362:1225–30. https://doi.org/10.1016/S0140-6736(03)14546-1 .


Green LA, Seifert CM. Translation of research into practice: why we can’t “just do it.” J Am Board Fam Pract. 2005;18:541–5. https://doi.org/10.3122/jabfm.18.6.541 .

Eccles MP, Mittman BS. Welcome to Implementation Science. Implement Sci. 2006;1:1–3. https://doi.org/10.1186/1748-5908-1-1 .


Powell BJ, Waltz TJ, Chinman MJ, Damschroder LJ, Smith JL, Matthieu MM, et al. A refined compilation of implementation strategies: results from the Expert Recommendations for Implementing Change (ERIC) project. Implement Sci. 2015;10:2–14. https://doi.org/10.1186/s13012-015-0209-1 .


Waltz TJ, Powell BJ, Matthieu MM, Damschroder LJ, et al. Use of concept mapping to characterize relationships among implementation strategies and assess their feasibility and importance: results from the Expert Recommendations for Implementing Change (ERIC) study. Implement Sci. 2015;10:1–8. https://doi.org/10.1186/s13012-015-0295-0 .

Nilsen P, Ståhl C, Roback K, et al. Never the twain shall meet? - a comparison of implementation science and policy implementation research. Implementation Sci. 2013;8:2–12. https://doi.org/10.1186/1748-5908-8-63 .

Rycroft-Malone J, Seers K, Eldh AC, et al. A realist process evaluation within the Facilitating Implementation of Research Evidence (FIRE) cluster randomised controlled international trial: an exemplar. Implementation Sci. 2018;13:1–15. https://doi.org/10.1186/s13012-018-0811-0 .

Boaz A, Baeza J, Fraser A, European Implementation Score Collaborative Group (EIS). Effective implementation of research into practice: an overview of systematic reviews of the health literature. BMC Res Notes. 2011;4:212. https://doi.org/10.1186/1756-0500-4-212 .


Grimshaw JM, Shirran L, Thomas R, Mowatt G, Fraser C, Bero L, et al. Changing provider behavior – an overview of systematic reviews of interventions. Med Care. 2001;39(8 Suppl 2):II2–45.


Squires JE, Sullivan K, Eccles MP, et al. Are multifaceted interventions more effective than single-component interventions in changing health-care professionals’ behaviours? An overview of systematic reviews. Implement Sci. 2014;9:1–22. https://doi.org/10.1186/s13012-014-0152-6 .

Salvador-Oliván JA, Marco-Cuenca G, Arquero-Avilés R. Development of an efficient search filter to retrieve systematic reviews from PubMed. J Med Libr Assoc. 2021;109:561–74. https://doi.org/10.5195/jmla.2021.1223 .

Thomas JM. Diffusion of innovation in systematic review methodology: why is study selection not yet assisted by automation? OA Evid Based Med. 2013;1:1–6.

Effective Practice and Organisation of Care (EPOC). The EPOC taxonomy of health systems interventions. EPOC Resources for review authors. Oslo: Norwegian Knowledge Centre for the Health Services; 2016. epoc.cochrane.org/epoc-taxonomy . Accessed 9 Oct 2023.

Jamal A, McKenzie K, Clark M. The impact of health information technology on the quality of medical and health care: a systematic review. Health Inf Manag. 2009;38:26–37. https://doi.org/10.1177/183335830903800305 .

Menon A, Korner-Bitensky N, Kastner M, et al. Strategies for rehabilitation professionals to move evidence-based knowledge into practice: a systematic review. J Rehabil Med. 2009;41:1024–32. https://doi.org/10.2340/16501977-0451 .

Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol. 1991;44:1271–8. https://doi.org/10.1016/0895-4356(91)90160-b .


Francke AL, Smit MC, de Veer AJ, et al. Factors influencing the implementation of clinical guidelines for health care professionals: a systematic meta-review. BMC Med Inform Decis Mak. 2008;8:1–11. https://doi.org/10.1186/1472-6947-8-38 .

Jones CA, Roop SC, Pohar SL, et al. Translating knowledge in rehabilitation: systematic review. Phys Ther. 2015;95:663–77. https://doi.org/10.2522/ptj.20130512 .

Scott D, Albrecht L, O’Leary K, Ball GDC, et al. Systematic review of knowledge translation strategies in the allied health professions. Implement Sci. 2012;7:1–17. https://doi.org/10.1186/1748-5908-7-70 .

Wu Y, Brettle A, Zhou C, Ou J, et al. Do educational interventions aimed at nurses to support the implementation of evidence-based practice improve patient outcomes? A systematic review. Nurse Educ Today. 2018;70:109–14. https://doi.org/10.1016/j.nedt.2018.08.026 .

Yost J, Ganann R, Thompson D, Aloweni F, et al. The effectiveness of knowledge translation interventions for promoting evidence-informed decision-making among nurses in tertiary care: a systematic review and meta-analysis. Implement Sci. 2015;10:1–15. https://doi.org/10.1186/s13012-015-0286-1 .

Grudniewicz A, Kealy R, Rodseth RN, Hamid J, et al. What is the effectiveness of printed educational materials on primary care physician knowledge, behaviour, and patient outcomes: a systematic review and meta-analyses. Implement Sci. 2015;10:2–12. https://doi.org/10.1186/s13012-015-0347-5 .

Koota E, Kääriäinen M, Melender HL. Educational interventions promoting evidence-based practice among emergency nurses: a systematic review. Int Emerg Nurs. 2018;41:51–8. https://doi.org/10.1016/j.ienj.2018.06.004 .

Flodgren G, O’Brien MA, Parmelli E, et al. Local opinion leaders: effects on professional practice and healthcare outcomes. Cochrane Database Syst Rev. 2019. https://doi.org/10.1002/14651858.CD000125.pub5 .

Arditi C, Rège-Walther M, Durieux P, et al. Computer-generated reminders delivered on paper to healthcare professionals: effects on professional practice and healthcare outcomes. Cochrane Database Syst Rev. 2017. https://doi.org/10.1002/14651858.CD001175.pub4 .

Pantoja T, Grimshaw JM, Colomer N, et al. Manually-generated reminders delivered on paper: effects on professional practice and patient outcomes. Cochrane Database Syst Rev. 2019. https://doi.org/10.1002/14651858.CD001174.pub4 .

De Angelis G, Davies B, King J, McEwan J, et al. Information and communication technologies for the dissemination of clinical practice guidelines to health professionals: a systematic review. JMIR Med Educ. 2016;2:e16. https://doi.org/10.2196/mededu.6288 .

Brown A, Barnes C, Byaruhanga J, McLaughlin M, et al. Effectiveness of technology-enabled knowledge translation strategies in improving the use of research in public health: systematic review. J Med Internet Res. 2020;22:e17274. https://doi.org/10.2196/17274 .

Sykes MJ, McAnuff J, Kolehmainen N. When is audit and feedback effective in dementia care? A systematic review. Int J Nurs Stud. 2018;79:27–35. https://doi.org/10.1016/j.ijnurstu.2017.10.013 .

Bhatt NR, Czarniecki SW, Borgmann H, et al. A systematic review of the use of social media for dissemination of clinical practice guidelines. Eur Urol Focus. 2021;7:1195–204. https://doi.org/10.1016/j.euf.2020.10.008 .

Yamada J, Shorkey A, Barwick M, Widger K, et al. The effectiveness of toolkits as knowledge translation strategies for integrating evidence into clinical care: a systematic review. BMJ Open. 2015;5:e006808. https://doi.org/10.1136/bmjopen-2014-006808 .

Afari-Asiedu S, Abdulai MA, Tostmann A, et al. Interventions to improve dispensing of antibiotics at the community level in low and middle income countries: a systematic review. J Glob Antimicrob Resist. 2022;29:259–74. https://doi.org/10.1016/j.jgar.2022.03.009 .

Boonacker CW, Hoes AW, Dikhoff MJ, Schilder AG, et al. Interventions in health care professionals to improve treatment in children with upper respiratory tract infections. Int J Pediatr Otorhinolaryngol. 2010;74:1113–21. https://doi.org/10.1016/j.ijporl.2010.07.008 .

Al Zoubi FM, Menon A, Mayo NE, et al. The effectiveness of interventions designed to increase the uptake of clinical practice guidelines and best practices among musculoskeletal professionals: a systematic review. BMC Health Serv Res. 2018;18:2–11. https://doi.org/10.1186/s12913-018-3253-0 .

Ariyo P, Zayed B, Riese V, Anton B, et al. Implementation strategies to reduce surgical site infections: a systematic review. Infect Control Hosp Epidemiol. 2019;3:287–300. https://doi.org/10.1017/ice.2018.355 .

Borgert MJ, Goossens A, Dongelmans DA. What are effective strategies for the implementation of care bundles on ICUs: a systematic review. Implement Sci. 2015;10:1–11. https://doi.org/10.1186/s13012-015-0306-1 .

Cahill LS, Carey LM, Lannin NA, et al. Implementation interventions to promote the uptake of evidence-based practices in stroke rehabilitation. Cochrane Database Syst Rev. 2020. https://doi.org/10.1002/14651858.CD012575.pub2 .

Pedersen ER, Rubenstein L, Kandrack R, Danz M, et al. Elusive search for effective provider interventions: a systematic review of provider interventions to increase adherence to evidence-based treatment for depression. Implement Sci. 2018;13:1–30. https://doi.org/10.1186/s13012-018-0788-8 .

Jenkins HJ, Hancock MJ, French SD, Maher CG, et al. Effectiveness of interventions designed to reduce the use of imaging for low-back pain: a systematic review. CMAJ. 2015;187:401–8. https://doi.org/10.1503/cmaj.141183 .

Bennett S, Laver K, MacAndrew M, Beattie E, et al. Implementation of evidence-based, non-pharmacological interventions addressing behavior and psychological symptoms of dementia: a systematic review focused on implementation strategies. Int Psychogeriatr. 2021;33:947–75. https://doi.org/10.1017/S1041610220001702 .

Noonan VK, Wolfe DL, Thorogood NP, et al. Knowledge translation and implementation in spinal cord injury: a systematic review. Spinal Cord. 2014;52:578–87. https://doi.org/10.1038/sc.2014.62 .

Albrecht L, Archibald M, Snelgrove-Clarke E, et al. Systematic review of knowledge translation strategies to promote research uptake in child health settings. J Pediatr Nurs. 2016;31:235–54. https://doi.org/10.1016/j.pedn.2015.12.002 .

Campbell A, Louie-Poon S, Slater L, et al. Knowledge translation strategies used by healthcare professionals in child health settings: an updated systematic review. J Pediatr Nurs. 2019;47:114–20. https://doi.org/10.1016/j.pedn.2019.04.026 .

Bird ML, Miller T, Connell LA, et al. Moving stroke rehabilitation evidence into practice: a systematic review of randomized controlled trials. Clin Rehabil. 2019;33:1586–95. https://doi.org/10.1177/0269215519847253 .

Goorts K, Dizon J, Milanese S. The effectiveness of implementation strategies for promoting evidence informed interventions in allied healthcare: a systematic review. BMC Health Serv Res. 2021;21:1–11. https://doi.org/10.1186/s12913-021-06190-0 .

Zadro JR, O’Keeffe M, Allison JL, Lembke KA, et al. Effectiveness of implementation strategies to improve adherence of physical therapist treatment choices to clinical practice guidelines for musculoskeletal conditions: systematic review. Phys Ther. 2020;100:1516–41. https://doi.org/10.1093/ptj/pzaa101 .

Van der Veer SN, Jager KJ, Nache AM, et al. Translating knowledge on best practice into improving quality of RRT care: a systematic review of implementation strategies. Kidney Int. 2011;80:1021–34. https://doi.org/10.1038/ki.2011.222 .

Pawson R, Greenhalgh T, Harvey G, et al. Realist review – a new method of systematic review designed for complex policy interventions. J Health Serv Res Policy. 2005;10(Suppl 1):21–34. https://doi.org/10.1258/1355819054308530 .

Rycroft-Malone J, McCormack B, Hutchinson AM, et al. Realist synthesis: illustrating the method for implementation research. Implementation Sci. 2012;7:1–10. https://doi.org/10.1186/1748-5908-7-33 .

Johnson MJ, May CR. Promoting professional behaviour change in healthcare: what interventions work, and why? A theory-led overview of systematic reviews. BMJ Open. 2015;5:e008592. https://doi.org/10.1136/bmjopen-2015-008592 .

Metz A, Jensen T, Farley A, Boaz A, et al. Is implementation research out of step with implementation practice? Pathways to effective implementation support over the last decade. Implement Res Pract. 2022;3:1–11. https://doi.org/10.1177/26334895221105585 .

May CR, Finch TL, Cornford J, Exley C, et al. Integrating telecare for chronic disease management in the community: What needs to be done? BMC Health Serv Res. 2011;11:1–11. https://doi.org/10.1186/1472-6963-11-131 .

Harvey G, Rycroft-Malone J, Seers K, Wilson P, et al. Connecting the science and practice of implementation – applying the lens of context to inform study design in implementation research. Front Health Serv. 2023;3:1–15. https://doi.org/10.3389/frhs.2023.1162762 .


Acknowledgements

The authors would like to thank Professor Kathryn Oliver for her support in planning the review, Professor Steve Hanney for reading and commenting on the final manuscript, and the staff at LSHTM library for their support in planning and conducting the literature search.

This study was supported by LSHTM’s Research England QR strategic priorities funding allocation and the National Institute for Health and Care Research (NIHR) Applied Research Collaboration South London (NIHR ARC South London) at King’s College Hospital NHS Foundation Trust. Grant number NIHR200152. The views expressed are those of the author(s) and not necessarily those of the NIHR, the Department of Health and Social Care or Research England.

Author information

Authors and affiliations

Health and Social Care Workforce Research Unit, The Policy Institute, King’s College London, Virginia Woolf Building, 22 Kingsway, London, WC2B 6LE, UK

Annette Boaz

King’s Business School, King’s College London, 30 Aldwych, London, WC2B 4BG, UK

Juan Baeza & Alec Fraser

Federal University of Santa Catarina (UFSC), Campus Universitário Reitor João Davi Ferreira Lima, Florianópolis, SC, 88.040-900, Brazil

Erik Persson


Contributions

AB led the conceptual development and structure of the manuscript. EP conducted the searches and data extraction. All authors contributed to screening and quality appraisal. EP and AF wrote the first draft of the methods section. AB, JB and AF performed result synthesis and contributed to the analyses. AB wrote the first draft of the manuscript and incorporated feedback and revisions from all other authors. All authors revised and approved the final manuscript.

Corresponding author

Correspondence to Annette Boaz .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Appendix A.

Additional file 2: Appendix B.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Boaz, A., Baeza, J., Fraser, A. et al. ‘It depends’: what 86 systematic reviews tell us about what strategies to use to support the use of research in clinical practice. Implementation Sci 19, 15 (2024). https://doi.org/10.1186/s13012-024-01337-z


Received: 01 November 2023

Accepted: 05 January 2024

Published: 19 February 2024

DOI: https://doi.org/10.1186/s13012-024-01337-z


  • Implementation
  • Interventions
  • Clinical practice
  • Research evidence
  • Multi-faceted

Implementation Science

ISSN: 1748-5908


  • Open access
  • Published: 23 February 2024

Comparison of clinical and radiological outcomes for the anterior and medial approaches to open reduction in the treatment of bilateral developmental dysplasia of the hip: a systematic review protocol

  • Edward Alan Jenner   ORCID: orcid.org/0000-0003-0803-5091 1 ,
  • Govind Singh Chauhan 1 ,
  • Abdus Burahee 2 , 3 ,
  • Junaid Choudri 2 ,
  • Adrian Gardner 2 , 3 &
  • Christopher Edward Bache 1  

Systematic Reviews volume 13, Article number: 72 (2024)


Developmental dysplasia of the hip (DDH) affects 1–3% of newborns, and 20% of cases are bilateral. The optimal surgical management strategy is unclear for patients with bilateral DDH who fail bracing or closed reduction, or who present too late for these methods to be used. There are proponents of both medial approach open reduction (MAOR) and anterior approach open reduction (AOR); however, there is little evidence to inform this debate.

Methods

We will perform a systematic review designed according to the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocol. We will search the medical and scientific databases, including the grey and difficult-to-locate literature. The Medical Subject Headings “developmental dysplasia of the hip”, “congenital dysplasia of the hip”, “congenital hip dislocation”, “developmental hip dislocation”, and their abbreviations, “DDH” and “CDH”, will be used, along with the qualifier “bilateral”. Reviewers will independently screen records for inclusion and then independently extract data on study design, population characteristics, details of operative intervention and outcomes from the selected records. Data will be synthesised and a meta-analysis performed if possible. If this is not possible, we will analyse data according to the Synthesis without Meta-Analysis (SWiM) guidance. All studies will be assessed for risk of bias. For each outcome measure, a summary of findings will be presented in a table, with the overall quality of the evidence assessed using the Grading of Recommendations Assessment, Development and Evaluation approach.

Discussion

The decision to perform MAOR or AOR in patients with bilateral DDH who have failed conservative management is not well informed by the current literature. High-quality, comparative studies are exceptionally challenging to perform for this patient population and likely to be extremely uncommon. A systematic review provides the best opportunity to deliver the highest possible quality of evidence for bilateral DDH surgical management.

Systematic review registration

The protocol has been registered in the International Prospective Register of Systematic Reviews (PROSPERO ID CRD42022362325).

Peer Review reports

Introduction

Developmental dysplasia of the hip (DDH) describes a spectrum of abnormalities in the infant’s hip, from subluxation to frank dislocation, due to incomplete acetabular and femoral head development [ 1 ]. DDH affects 1–3% of newborns and 20% of cases are bilateral [ 2 , 3 , 4 ]. Although many cases of DDH resolve spontaneously as the child grows [ 5 ], those in whom the hip(s) remains shallow, subluxed, or dislocated will go on to develop gait abnormalities, hip pain, and early-onset osteoarthritis [ 6 ], which often requires early hip arthroplasty [ 7 ]. Clinical and radiological outcomes for children with bilateral DDH have been reported by some authors to be worse than for children with unilateral DDH [ 8 , 9 , 10 ], whereas others have found no difference [ 11 , 12 ].

The aim of treatment in bilateral DDH is to achieve concentrically reduced hips, without significant deformity or residual dysplasia. If bilateral DDH is detected in the neonatal period, abduction bracing is attempted, although failure rates are higher than for unilateral disease [ 8 , 13 , 14 ]. Patients who fail bracing proceed to examination under anaesthetic and arthrogram, aiming for closed reduction and hip spica. Typically, this is performed before the age of 6 months. Bilateral DDH is a significant risk factor for failure of conservative treatment [ 8 , 9 ], and patients failing closed reduction proceed to open reduction.

Operative options are medial approach open reduction (MAOR) or anterior approach open reduction (AOR). MAOR is performed between 6 and 18 months of age [ 15 ]. This approach requires limited soft tissue dissection through a small, cosmetically acceptable, anteromedial incision with minimal blood loss. The anatomical blocks to reduction (capsular constriction, transverse acetabular ligament, ligamentum teres and iliopsoas tendon) are well visualised and released. Both hips are usually operated on at the same sitting and the patient is immobilised in a hip spica for 6–12 weeks postoperatively. Critics suggest that MAOR increases the risk of femoral head avascular necrosis (AVN), prevents the blocks to femoral head reduction from being fully addressed and does not allow capsulorrhaphy [ 16 , 17 , 18 ]. Rates of residual dysplasia may also be higher. It has been reported that MAOR may have worse outcomes compared to AOR [ 16 , 17 , 18 ]; however, these studies relate to unilateral cases and limited data, specific to bilateral DDH, has been published. The data relating to unilateral disease is itself heterogeneous and contradictory [ 15 , 19 , 20 ].

AOR is usually performed around 12–24 months of age through a bikini line incision via the ilio-inguinal approach. This results in a larger, less cosmetically acceptable scar, more soft tissue dissection, potentially greater blood loss and risks of damage to the lateral femoral cutaneous nerve [ 21 , 22 ]. Proponents argue that AOR allows all the potential blocks to femoral head reduction to be addressed and capsulorrhaphy to be performed therefore improving outcomes [ 23 ]. Pelvic osteotomy can be performed through the same approach and this is usually required when surgery is undertaken after age 2 years [ 24 , 25 , 26 , 27 ]. Typically, in AOR, one hip is operated on at each sitting with a 6-week gap between surgeries during which the patient is immobilised in a hip spica cast [ 11 , 12 ]. Some authors have reported single-sitting bilateral surgery in AOR [ 25 ]; however, this remains rare.

The choice of AOR or MAOR depends on a number of factors, including the patient’s age, the surgeon’s training and experience, and the perceived advantages and disadvantages of each technique. Both surgical management strategies for bilateral DDH have their proponents; however, there is limited evidence to inform decision-making. To the best of our knowledge, this will be the first systematic review comparing outcomes for AOR vs MAOR in bilateral DDH.

Our aim is to establish whether there is a difference in the clinical and radiological outcomes for children with bilateral DDH who have been treated with MAOR compared to AOR. We will examine a range of clinical and radiological outcome measures and, if possible, perform a quantitative analysis. We will summarise the available evidence and give recommendations for management. This will help to inform decision-making in the management of bilateral DDH.

Design and methods

This protocol has been designed according to the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocol (PRISMA-P) [ 28 , 29 ]. The design and method have been formed through discussion between experts in the management of DDH and experts in the methodology of systematic reviews. The protocol has been registered in the International Prospective Register of Systematic Reviews (PROSPERO ID CRD42022362325).

Eligibility criteria

Inclusion criteria—children with idiopathic bilateral developmental dysplasia of the hip undergoing surgical management of both hips.

Exclusion criteria—children with bilateral DDH in whom one hip is managed through harness treatment alone, children with teratologic bilateral developmental dysplasia of the hip, children undergoing revision surgery, and children undergoing surgery for acetabular dysplasia in adolescence.

Intervention

Medial approach open reduction of the hip (MAOR).

Comparison

Anterior approach open reduction of the hip (AOR).

Outcomes

Rate and severity of avascular necrosis of the femoral head at the latest follow-up using the Kalamchi and MacEwen [ 30 ] or Bucholz and Ogden classification [ 31 ] or other appropriate scoring system.

Radiological outcome at the latest follow-up using acetabular index measured in degrees, Severin Score [ 32 ] or other appropriate scoring system.

Clinical outcomes at the latest follow-up including Modified McKay criteria [ 33 ], Children’s Hospital of Oakland Hip Evaluation Scale [ 34 ], Pediatric Outcomes Data Collection Instrument (PODCI) [ 35 ] or other appropriate scoring system.

Prevalence, event rate or time to event of surgical complications, assessed according to the Clavien-Dindo system [ 36 , 37 ] or other appropriate scoring system.

Prevalence, event rate or time to event of secondary surgery.

Study design

Inclusion criteria—clinical studies, level IV (retrospective case series) and above, with a clear description of the operative management with a set of clinical and/or radiological outcomes included, published in English.

Exclusion criteria—case reports, technical or cadaveric studies, studies without a clear description of the operative management or where this is unobtainable, studies without a clear description of clinical and/or radiological outcomes or where this is unobtainable. Full-text studies not available in English will be excluded.

Search strategy

A search of the electronic medical and scientific databases PubMed, MEDLINE, the Cochrane Library, Embase, Google Scholar, Web of Science and Scopus will be conducted from the date of first entry until the date of the search. The grey and difficult-to-locate literature (including theses and dissertations) will be searched via the Open Grey [ 38 ] and Open Access Theses and Dissertations [ 39 ] databases. The Medical Subject Headings (MeSH terms) “developmental dysplasia of the hip”, “congenital dysplasia of the hip”, “congenital hip dislocation”, “developmental hip dislocation”, and their abbreviations, “DDH” and “CDH”, will be used, along with the qualifier “bilateral”. The search strategy will be developed in MEDLINE and then applied to the other databases. An example of the search strategy can be found in Additional file  1 . Only full-text studies published in English will be included. There will be no time limit imposed.

Study selection

Two reviewers (EJ and GC) will independently screen the title and abstract of records for inclusion according to the eligibility criteria. Once preliminary screening has been performed, selected studies will be screened as full text. Researchers will be blinded to each other’s decisions. Where there is disagreement, a third reviewer (CEB) will arbitrate. Screening decisions at the full-text stage will be fully recorded. The results of the screening will be presented in a PRISMA flow diagram [ 29 ].

Data management

The selected studies will be collated in the Zotero citation management system, screened for duplicates, and exported to Systematic Review Data Repository-Plus [ 40 ]. This database will be used to aid data extraction and management. Extracted data will be exported to RevMan software for analysis.

Data extraction

Data will be extracted using a predefined electronic data extraction form. Data on study design, population characteristics, details of operative intervention (intervention and comparison), and outcomes (clinical, radiological, complications and rate of secondary surgery) will be extracted. A summary of intended data items for extraction is shown in Table  1 . Four reviewers (EJ, GC, MJC and AB) will be allocated the selected studies and will independently extract data, blinded to each other’s extractions. Where possible, corresponding authors will be contacted for unreported data. Data will be extracted to a secured, anonymised form on Systematic Review Data Repository-Plus and then exported to RevMan for analysis.

Data synthesis

The extracted data will be summarised in a structured table format, grouped and ordered by study design (according to the hierarchy of evidence), or by risk of bias if study designs are similar, and including the data items specific to the outcomes of interest. This will help to assess clinical and methodological heterogeneity across the studies and determine the feasibility of performing a meta-analysis. We do not expect the included studies to be of sufficient quality or consistency to allow a meta-analysis to be performed. In this instance, we will follow the Synthesis without Meta-Analysis (SWiM) guidance [ 41 ] and analyse data according to this and the recommendations in the Cochrane Handbook Chapter 12 [ 42 ]. Studies will be grouped and tabulated as described. We expect that the key radiological and clinical outcome data will be reported on short ordinal scales (e.g. the Severin Score [ 32 ]). Where possible, we will transform these data to dichotomous outcomes and present them as a relative risk with 95% confidence intervals for MAOR in comparison to AOR. Longer ordinal scales, such as the Pediatric Outcomes Data Collection Instrument [ 35 ], will be treated as continuous data. Complications and secondary surgery data will be transformed to an incidence estimate, event rate or time-to-event data. For non-comparative studies, we will transform extracted data as described above and use this to generate a crude estimate of incidence, prevalence or event rate. Where possible, we will pool these data using a random-effects model as recommended by Murad et al. [ 43 ]. Results will be reported according to the guidance in the Cochrane Handbook Chapter 12 [ 42 ]. Where sufficient information is available but synthesis cannot be performed, a structured reporting of effects will be used. Where effect estimates are available without measures of precision, an illustrated synthesis of summary statistics will be used. If P values are available, an illustrated synthesis of P values will be used. Where only directions of effect are available, vote-counting based on direction of effect will be used.
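As an illustration of the dichotomisation and pooling steps described above, the following sketch (not part of the protocol; all counts and function names are hypothetical) computes a relative risk with a 95% confidence interval from dichotomised outcome counts, and pools log relative risks with a DerSimonian–Laird random-effects model, one common frequentist random-effects approach:

```python
import math

def relative_risk(events_a, n_a, events_b, n_b):
    """Relative risk of an unfavourable outcome (e.g. a dichotomised
    Severin grade) for MAOR (group A) vs AOR (group B), with a 95% CI
    computed on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Delta-method standard error of log(RR)
    se = math.sqrt(1/events_a - 1/n_a + 1/events_b - 1/n_b)
    log_rr = math.log(rr)
    return rr, (math.exp(log_rr - 1.96 * se), math.exp(log_rr + 1.96 * se))

def dersimonian_laird(log_rrs, ses):
    """Pool two or more log relative risks with a DerSimonian-Laird
    random-effects model; returns pooled RR, its 95% CI and tau^2."""
    w = [1 / s**2 for s in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sum(w)
    q = sum(wi * (y - fixed)**2 for wi, y in zip(w, log_rrs))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_rrs) - 1)) / c)  # between-study variance
    w_star = [1 / (s**2 + tau2) for s in ses]
    pooled = sum(wi * y for wi, y in zip(w_star, log_rrs)) / sum(w_star)
    se_p = math.sqrt(1 / sum(w_star))
    ci = (math.exp(pooled - 1.96 * se_p), math.exp(pooled + 1.96 * se_p))
    return math.exp(pooled), ci, tau2

# Hypothetical counts: 10/50 unfavourable outcomes after MAOR vs 5/50 after AOR
rr, ci = relative_risk(10, 50, 5, 50)
```

In practice this pooling would be done in RevMan; the sketch only makes the arithmetic behind the reported RR and CI explicit.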

We aim to limit the impact of publication bias through a thorough and systematic search of the literature, including the grey literature, as described in the search strategy. Where possible, publication bias will be assessed across studies by generating funnel plots. These will be inspected for asymmetry and analysed using Egger’s test [ 44 ].
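Egger’s test regresses the standardised effect (each effect divided by its standard error) on precision (the inverse standard error); an intercept that differs from zero suggests funnel-plot asymmetry. A minimal standard-library sketch (illustrative only; a real analysis would use a dedicated statistics package):

```python
import math

def eggers_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry: regress the
    standardised effect (effect/SE) on precision (1/SE) by ordinary
    least squares and test whether the intercept differs from zero."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1 / s for s in ses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx)**2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    # t statistic for the intercept (compare against t with n-2 df)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r**2 for r in resid) / (n - 2)
    se_int = math.sqrt(s2 * (1 / n + mx**2 / sxx))
    return intercept, se_int, intercept / se_int
```

Egger’s test has low power with few studies, so as stated above the funnel plots will also be inspected visually.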

Risk of bias

Randomised trials will be assessed using the Cochrane Risk of Bias 2 (RoB 2) tool [ 45 ]. However, included studies are most likely to be non-randomised, observational studies. For comparative studies (cohort or case–control) we will use the ROBINS-I tool to assess risk of bias [ 46 ]. For case series, we will use Murad et al.’s method for evaluating methodological quality across four domains: selection, ascertainment, causality and reporting [ 43 ]. Four reviewers (EJ, GC, MJC and AB) will assess included studies for risk of bias. A separate reviewer (CEB) will resolve disagreements through discussion. A summary figure of the risk of bias analysis will be included in the final manuscript.

Assessment of quality

For each outcome measure, a summary of findings will be presented in a table [ 47 ], with the overall quality of the evidence assessed using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach [ 48 ]. This approach uses five factors: risk of bias, inconsistency, indirectness, imprecision and publication bias to assess the quality of evidence and produce a rating of “high”, “moderate”, “low” or “very low”. GRADEpro GDT software [ 49 ] will be used to aid decision-making when assessing the quality of evidence.

Discussion and implications of review

Management of bilateral DDH represents a significant challenge for the paediatric orthopaedic surgeon. The aim of treatment is to achieve concentrically reduced hips, without significant deformity or residual dysplasia. The decision to perform MAOR or AOR in patients with bilateral DDH who have failed conservative management is not well informed by the current literature. High-quality, comparative studies are exceptionally challenging to perform for this patient population and likely to be extremely uncommon. A systematic review provides the best opportunity to deliver the highest possible quality of evidence for bilateral DDH surgical management. We are not aware of any systematic reviews that compare the outcomes of MAOR with AOR for bilateral DDH. This study aims to identify whether there are any significant differences in the clinical or radiological outcomes for patients with bilateral DDH surgically treated with MAOR compared to AOR so that surgeons can make better-informed decisions about the management strategy they will offer to patients.

Limitations

We expect that this review will be limited by studies that have a small sample size and have a retrospective, non-comparative study design. We expect result reporting to be heterogeneous and incomplete. These limitations will place all studies at a high risk of bias and therefore limit the quality of evidence that can be derived from the systematic review.

Availability of data and materials

Not applicable.

Abbreviations

  • DDH: Developmental dysplasia of the hip
  • MAOR: Medial approach open reduction
  • AOR: Anterior approach open reduction
  • CDH: Congenital dysplasia of the hip
  • AVN: Avascular necrosis
  • PRISMA-P: Preferred Reporting Items for Systematic Review and Meta-Analysis Protocol
  • PODCI: Pediatric Outcomes Data Collection Instrument
  • MeSH: Medical Subject Headings
  • SWiM: Synthesis without Meta-Analysis
  • RoB 2: Cochrane Risk of Bias 2
  • GRADE: Grading of Recommendations Assessment, Development and Evaluation

Zhang S, Doudoulakis KJ, Khurwal A, Sarraf KM. Developmental dysplasia of the hip. Br J Hosp Med (Lond). 2020;81(7):1–8.

Sewell MD, Rosendahl K, Eastwood DM. Developmental dysplasia of the hip. BMJ. 2009;339:b4454.

Marks DS, Clegg J, Al-Chalabi AN. Routine ultrasound screening for neonatal hip instability. Can it abolish late-presenting congenital dislocation of the hip? J Bone Joint Surg Br. 1994;76(4):534–8.

Macnicol MF. Results of a 25-year screening programme for neonatal hip instability. J Bone Joint Surg Br. 1990;72(6):1057–60.

Bialik V, Bialik GM, Blazer S, Sujov P, Wiener F, Berant M. Developmental dysplasia of the hip: a new approach to incidence. Pediatrics. 1999;103(1):93–9.

Cooperman DR, Wallensten R, Stulberg SD. Acetabular dysplasia in the adult. Clin Orthop. 1983;175:79–85.

Furnes O, Lie SA, Espehaug B, Vollset SE, Engesaeter LB, Havelin LI. Hip disease and the prognosis of total hip replacements. A review of 53,698 primary total hip replacements reported to the Norwegian Arthroplasty Register 1987–99. J Bone Joint Surg Br. 2001;83(4):579–86.

Kitoh H, Kawasumi M, Ishiguro N. Predictive factors for unsuccessful treatment of developmental dysplasia of the hip by the Pavlik harness. J Pediatr Orthop. 2009;29(6):552–7.

Viere RG, Birch JG, Herring JA, Roach JW, Johnston CE. Use of the Pavlik harness in congenital dislocation of the hip. An analysis of failures of treatment. J Bone Joint Surg Am. 1990;72(2):238–44.

Greene WB, Drennan JC. A comparative study of bilateral versus unilateral congenital dislocation of the hip. Clin Orthop. 1982;162:78–86.

Zionts LE, MacEwen GD. Treatment of congenital dislocation of the hip in children between the ages of one and three years. J Bone Joint Surg Am. 1986;68(6):829–46.

Wang TM, Wu KW, Shih SF, Huang SC, Kuo KN. Outcomes of open reduction for developmental dysplasia of the hip: does bilateral dysplasia have a poorer outcome? J Bone Jt Surg Am. 2013;95(12):1081–6.

Segal LS, Boal DK, Borthwick L, Clark MW, Localio AR, Schwentker EP. Avascular necrosis after treatment of DDH: the protective influence of the ossific nucleus. J Pediatr Orthop. 1999;19(2):177–84.

Lerman JA, Emans JB, Millis MB, Share J, Zurakowski D, Kasser JR. Early failure of Pavlik harness treatment for developmental hip dysplasia: clinical and ultrasound predictors. J Pediatr Orthop. 2001;21(3):348–53.

Akilapa O. The medial approach open reduction for developmental dysplasia of the hip: do the long-term outcomes validate this approach? A systematic review of the literature. J Child Orthop. 2014;8(5):387–97.

Okano K, Yamada K, Takahashi K, Enomoto H, Osaki M, Shindo H. Long-term outcome of Ludloff’s medial approach for open reduction of developmental dislocation of the hip in relation to the age at operation. Int Orthop. 2009;33(5):1391–6.

Mankey MG, Arntz GT, Staheli LT. Open reduction through a medial approach for congenital dislocation of the hip. A critical review of the Ludloff approach in sixty-six hips. J Bone Joint Surg Am. 1993;75(9):1334–45.

Koizumi W, Moriya H, Tsuchiya K, Takeuchi T, Kamegaya M, Akita T. Ludloff’s medial approach for open reduction of congenital dislocation of the hip. A 20-year follow-up. J Bone Joint Surg Br. 1996;78(6):924–9.

Gardner ROE, Bradley CS, Howard A, Narayanan UG, Wedge JH, Kelley SP. The incidence of avascular necrosis and the radiographic outcome following medial open reduction in children with developmental dysplasia of the hip: a systematic review. Bone Jt J. 2014;96-B(2):279–86.

Hoellwarth JS, Kim YJ, Millis MB, Kasser JR, Zurakowski D, Matheney TH. Medial versus anterior open reduction for developmental hip dislocation in age-matched patients. J Pediatr Orthop. 2015;35(1):50–6.

Jia G, Wang E, Lian P, Liu T, Zhao S, Zhao Q. Anterior approach with mini-bikini incision in open reduction in infants with developmental dysplasia of the hip. J Orthop Surg. 2020;15(1):180.

Rudin D, Manestar M, Ullrich O, Erhardt J, Grob K. The anatomical course of the lateral femoral cutaneous nerve with special attention to the anterior approach to the hip joint. JBJS. 2016;98(7):561–7.

Herring JA. Tachdjian’s Pediatric Orthopaedics: from the Texas Scottish Rite Hospital for Children. 4th ed. Philadelphia: Saunders/Elsevier; 2008. Available from: https://storage.googleapis.com/global-help-publications/books/help_tachdjiansv1c15.pdf . [cited 2023 Sep 25].

Subasi M, Arslan H, Cebesoy O, Buyukbebeci O, Kapukaya A. Outcome in unilateral or bilateral DDH treated with one-stage combined procedure. Clin Orthop. 2008;466(4):830–6.

Ezirmik N, Yildiz K. Advantages of single-stage surgical treatment with salter innominate osteotomy and pemberton pericapsular osteotomy for developmental dysplasia of both hips. J Int Med Res. 2012;40(2):748–55.

Agus H, Bozoglan M, Kalenderer Ö, Kazımoğlu C, Onvural B, Akan İ. How are outcomes affected by performing a one-stage combined procedure simultaneously in bilateral developmental hip dysplasia? Int Orthop. 2014;38(6):1219–24.

Kotzias Neto A, Ferraz A, Bayer Foresti F, Barreiros HR. Bilateral developmental dysplasia of the hip treated with open reduction and Salter osteotomy: analysis on the radiographic results. Rev Bras Ortop Engl Ed. 2014;49(4):350–8.

Rethlefsen ML, Kirtley S, Waffenschmidt S, Ayala AP, Moher D, Page MJ, et al. PRISMA-S: an extension to the PRISMA statement for reporting literature searches in systematic reviews. Syst Rev. 2021;10(1):39.

Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4(1):1.

Kalamchi A, MacEwen GD. Avascular necrosis following treatment of congenital dislocation of the hip. J Bone Joint Surg Am. 1980;62(6):876–88.

Bucholz RW, Ogden JA. Patterns of ischemic necrosis of the proximal femur in nonoperatively treated congenital hip disease. Available from: https://www.scienceopen.com/document?vid=397821b3-1187-46ae-a255-9f9805906ecf . [cited 2022 Oct 12].

Severin E. Congenital dislocation of the hip; development of the joint after closed reduction. J Bone Joint Surg Am. 1950;32-A(3):507–18.

McKay DW. A comparison of the innominate and the pericapsular osteotomy in the treatment of congenital dislocation of the hip. Clin Orthop. 1974;98:124–32.

Aguilar CM, Neumayr LD, Eggleston BE, Earles AN, Robertson SM, Jergesen HE, et al. Clinical evaluation of avascular necrosis in patients with sickle cell disease: children’s hospital Oakland hip evaluation scale–a modification of the harris hip score. Arch Phys Med Rehabil. 2005;86(7):1369–75.

Daltroy LH, Liang MH, Fossel AH, Goldberg MJ. The POSNA pediatric musculoskeletal functional health questionnaire: report on reliability, validity, and sensitivity to change. Pediatric Outcomes Instrument Development Group. Pediatric Orthopaedic Society of North America. J Pediatr Orthop. 1998;18(5):561–71.

Dindo D, Demartines N, Clavien PA. Classification of surgical complications: a new proposal with evaluation in a cohort of 6336 patients and results of a survey. Ann Surg. 2004;240(2):205–13.

Dodwell ER, Pathy R, Widmann RF, Green DW, Scher DM, Blanco JS, et al. Reliability of the modified Clavien-Dindo-Sink complication classification system in pediatric orthopaedic surgery. JBJS Open Access. 2018;3(4):e0020.

OPENGREY.EU - Grey literature database. Available from: https://opengrey.eu/ . [cited 2022 Oct 13].

OATD – Open access theses and dissertations. Available from: https://oatd.org/ . [cited 2022 Oct 13].

SRDR+. Available from: https://srdrplus.ahrq.gov/ . [cited 2022 Oct 12].

Campbell M, McKenzie JE, Sowden A, Katikireddi SV, Brennan SE, Ellis S, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ. 2020;368:l6890.

Chapter 12: synthesizing and presenting findings using other methods. Available from: https://training.cochrane.org/handbook/current/chapter-12 . [cited 2022 Oct 12].

Murad MH, Sultan S, Haffar S, Bazerbachi F. Methodological quality and synthesis of case series and case reports. BMJ Evid Based Med. 2018;23(2):60–3.

Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. 1997;315(7109):629–34.

Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.

Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.

Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383–94.

Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924–6.

Guideline Development Tool. Available from: https://gdt.gradepro.org/app/#projects . [cited 2022 Oct 13].

Author information

Authors and affiliations

Birmingham Children’s Hospital, Steelhouse Lane, Birmingham, B4 6NH, UK

Edward Alan Jenner, Govind Singh Chauhan & Christopher Edward Bache

Royal Orthopaedic Hospital, Bristol Road South, Birmingham, B31 2AP, UK

Abdus Burahee, Junaid Choudri & Adrian Gardner

University of Birmingham, College of Medical & Dental Sciences, Birmingham, UK

Abdus Burahee & Adrian Gardner

Contributions

EJ, GC, AB, JC, AG and CEB all made substantial contributions to the conceptualisation, design, background, drafting and editing of this protocol.

Corresponding author

Correspondence to Edward Alan Jenner.

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Search strategy example.

Cite this article.

Jenner, E.A., Chauhan, G.S., Burahee, A. et al. Comparison of clinical and radiological outcomes for the anterior and medial approaches to open reduction in the treatment of bilateral developmental dysplasia of the hip: a systematic review protocol. Syst Rev 13 , 72 (2024). https://doi.org/10.1186/s13643-023-02444-6

Download citation

Received : 19 January 2023

Accepted : 21 December 2023

Published : 23 February 2024

DOI : https://doi.org/10.1186/s13643-023-02444-6

  • Congenital hip dislocation
  • Developmental hip dislocation

Systematic Reviews

ISSN: 2046-4053



Effect of exercise for depression: systematic review and network meta-analysis of randomised controlled trials

Linked editorial

Exercise for the treatment of depression

  • Michael Noetel , senior lecturer 1 ,
  • Taren Sanders , senior research fellow 2 ,
  • Daniel Gallardo-Gómez , doctoral student 3 ,
  • Paul Taylor , deputy head of school 4 ,
  • Borja del Pozo Cruz , associate professor 5 6 ,
  • Daniel van den Hoek , senior lecturer 7 ,
  • Jordan J Smith , senior lecturer 8 ,
  • John Mahoney , senior lecturer 9 ,
  • Jemima Spathis , senior lecturer 9 ,
  • Mark Moresi , lecturer 4 ,
  • Rebecca Pagano , senior lecturer 10 ,
  • Lisa Pagano , postdoctoral fellow 11 ,
  • Roberta Vasconcellos , doctoral student 2 ,
  • Hugh Arnott , masters student 2 ,
  • Benjamin Varley , doctoral student 12 ,
  • Philip Parker , pro vice chancellor research 13 ,
  • Stuart Biddle , professor 14 15 ,
  • Chris Lonsdale , deputy provost 13
  • 1 School of Psychology, University of Queensland, St Lucia, QLD 4072, Australia
  • 2 Institute for Positive Psychology and Education, Australian Catholic University, North Sydney, NSW, Australia
  • 3 Department of Physical Education and Sport, University of Seville, Seville, Spain
  • 4 School of Health and Behavioural Sciences, Australian Catholic University, Strathfield, NSW, Australia
  • 5 Department of Clinical Biomechanics and Sports Science, University of Southern Denmark, Odense, Denmark
  • 6 Biomedical Research and Innovation Institute of Cádiz (INiBICA) Research Unit, University of Cádiz, Spain
  • 7 School of Health and Behavioural Sciences, University of the Sunshine Coast, Petrie, QLD, Australia
  • 8 School of Education, University of Newcastle, Callaghan, NSW, Australia
  • 9 School of Health and Behavioural Sciences, Australian Catholic University, Banyo, QLD, Australia
  • 10 School of Education, Australian Catholic University, Strathfield, NSW, Australia
  • 11 Australian Institute of Health Innovation, Macquarie University, Macquarie Park, NSW, Australia
  • 12 Children’s Hospital Westmead Clinical School, University of Sydney, Westmead, NSW, Australia
  • 13 Australian Catholic University, North Sydney, NSW, Australia
  • 14 Centre for Health Research, University of Southern Queensland, Springfield, QLD, Australia
  • 15 Faculty of Sport and Health Science, University of Jyvaskyla, Jyvaskyla, Finland
  • Correspondence to: M Noetel m.noetel@uq.edu.au (or @mnoetel on Twitter)
  • Accepted 15 January 2024

Objective To identify the optimal dose and modality of exercise for treating major depressive disorder, compared with psychotherapy, antidepressants, and control conditions.

Design Systematic review and network meta-analysis.

Methods Screening, data extraction, coding, and risk of bias assessment were performed independently and in duplicate. Bayesian arm based, multilevel network meta-analyses were performed for the primary analyses. Quality of the evidence for each arm was graded using the confidence in network meta-analysis (CINeMA) online tool.

Data sources Cochrane Library, Medline, Embase, SPORTDiscus, and PsycINFO databases.

Eligibility criteria for selecting studies Any randomised trial with exercise arms for participants meeting clinical cut-offs for major depression.

Results 218 unique studies with a total of 495 arms and 14 170 participants were included. Compared with active controls (eg, usual care, placebo tablet), moderate reductions in depression were found for walking or jogging (n=1210, κ=51, Hedges’ g −0.62, 95% credible interval −0.80 to −0.45), yoga (n=1047, κ=33, g −0.55, −0.73 to −0.36), strength training (n=643, κ=22, g −0.49, −0.69 to −0.29), mixed aerobic exercises (n=1286, κ=51, g −0.43, −0.61 to −0.24), and tai chi or qigong (n=343, κ=12, g −0.42, −0.65 to −0.21). The effects of exercise were proportional to the intensity prescribed. Strength training and yoga appeared to be the most acceptable modalities. Results appeared robust to publication bias, but only one study met the Cochrane criteria for low risk of bias. As a result, confidence in accordance with CINeMA was low for walking or jogging and very low for other treatments.

Conclusions Exercise is an effective treatment for depression, with walking or jogging, yoga, and strength training more effective than other exercises, particularly when intense. Yoga and strength training were well tolerated compared with other treatments. Exercise appeared equally effective for people with and without comorbidities and with different baseline levels of depression. To mitigate expectancy effects, future studies could aim to blind participants and staff. These forms of exercise could be considered alongside psychotherapy and antidepressants as core treatments for depression.

Systematic review registration PROSPERO CRD42018118040.


Introduction

Major depressive disorder is a leading cause of disability worldwide 1 and has been found to lower life satisfaction more than debt, divorce, and diabetes 2 and to exacerbate comorbidities, including heart disease, 3 anxiety, 4 and cancer. 5 Although people with major depressive disorder often respond well to drug treatments and psychotherapy, 6 7 many are resistant to treatment. 8 In addition, access to treatment for many people with depression is limited, with only 51% treatment coverage for high income countries and 20% for low and lower-middle income countries. 9 More evidence based treatments are therefore needed.

Exercise may be an effective complement or alternative to drugs and psychotherapy. 10 11 12 13 14 In addition to mental health benefits, exercise also improves a range of physical and cognitive outcomes. 15 16 17 Clinical practice guidelines in the US, UK, and Australia recommend physical activity as part of treatment for depression. 18 19 20 21 But these guidelines do not provide clear, consistent recommendations about dose or exercise modality. British guidelines recommend group exercise programmes 20 21 and offer general recommendations to increase any form of physical activity, 21 the American Psychiatric Association recommends any dose of aerobic exercise or resistance training, 20 and Australian and New Zealand guidelines suggest a combination of strength and vigorous aerobic exercises, with at least two or three bouts weekly. 19

Authors of guidelines may find it hard to provide consistent recommendations on the basis of existing mainly pairwise meta-analyses—that is, assessing a specific modality versus a specific comparator in a distinct group of participants. 12 13 22 These meta-analyses have come under scrutiny for pooling heterogeneous treatments and heterogeneous comparisons leading to ambiguous effect estimates. 23 Reviews also face the opposite problem, excluding exercise treatments such as yoga, tai chi, and qigong because grouping them with strength training might be inappropriate. 23 Overviews of reviews have tried to deal with this problem by combining pairwise meta-analyses on individual treatments. A recent such overview found no differences between exercise modalities. 13 Comparing effect sizes between different pairwise meta-analyses can also lead to confusion because of differences in analytical methods used between meta-analyses, such as choice of a control to use as the referent. Network meta-analyses are a better way to precisely quantify differences between interventions as they simultaneously model the direct and indirect comparisons between interventions. 24

Network meta-analyses have been used to compare different types of psychotherapy and pharmacotherapy for depression. 6 25 26 For exercise, they have shown that dose and modality influence outcomes for cognition, 16 back pain, 15 and blood pressure. 17 Two network meta-analyses explored the effects of exercise on depression: one among older adults 27 and the other for mental health conditions. 28 Because of the inclusion criteria and search strategies used, these reviews might have been under-powered to explore moderators such as dose and modality (κ=15 and κ=71, respectively). To resolve conflicting findings in existing reviews, we comprehensively searched randomised trials on exercise for depression to ensure our review was adequately powered to identify the optimal dose and modality of exercise. For example, a large overview of reviews found effects on depression to be proportional to intensity, with vigorous exercise appearing to be better, 13 but a later meta-analysis found no such effects. 22 We explored whether recommendations differ based on participants’ sex, age, and baseline level of depression.

Given the challenges presented by behaviour change in people with depression, 29 we also identified autonomy support or behaviour change techniques that might improve the effects of intervention. 30 Behaviour change techniques such as self-monitoring and action planning have been shown to influence the effects of physical activity interventions in adults (>18 years) 31 and older adults (>60 years) 32 with differing effectiveness of techniques in different populations. We therefore tested whether any intervention components from the behaviour change technique taxonomy were associated with higher or lower intervention effects. 30 Other meta-analyses found that physical activity interventions work better when they provide people with autonomy (eg, choices, invitational language). 33 Autonomy is not well captured in the taxonomy for behaviour change technique. We therefore tested whether effects were stronger in studies that provided more autonomy support to patients. Finally, to understand the mechanism of intervention effects, such as self-confidence, affect, and physical fitness, we collated all studies that conducted formal mediation analyses.

Our findings are presented according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses-Network Meta-analyses (PRISMA-NMA) guidelines (see supplementary file, section S0; all supplementary files, data, and code are also available at https://osf.io/nzw6u/ ). 34 We amended our analysis strategy after registering our review; these changes were to better align with new norms established by the Cochrane Comparing Multiple Interventions Methods Group. 35 These norms were introduced between the publication of our protocol and the preparation of this manuscript. The largest change was using the confidence in network meta-analysis (CINeMA) 35 online tool instead of the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) guidelines and adopting methods to facilitate assessments—for example, instead of using an omnibus test for all treatments, we assessed publication bias for each treatment compared with active controls. We also modelled acceptability (through dropout rate), which was not predefined but was adopted in response to a reviewer’s comment.

Eligibility criteria

To be eligible for inclusion, studies had to be randomised controlled trials that included exercise as a treatment for depression and included participants who met the criteria for major depressive disorder, either clinician diagnosed or identified through participant self-report as exceeding established clinical thresholds (eg, scored >13 on the Beck depression inventory-II). 36 Studies could meet these criteria when all the participants had depression or when the study reported depression outcomes for a subgroup of participants with depression at the start of the study.

We defined exercise as “planned, structured and repetitive bodily movement done to improve or maintain one or more components of physical fitness.” 37 Unlike recent reviews, 12 22 we included studies with more than one exercise arm and multifaceted interventions (eg, health and exercise counselling) as long as they contained a substantial exercise component. These trials could be included because network meta-analysis methods allow for the grouping of those interventions into homogeneous nodes. Unlike the most recent Cochrane review, 12 we also included participants with physical comorbidities such as arthritis and participants with postpartum depression because the Diagnostic and Statistical Manual of Mental Disorders , fifth edition, removed the postpartum onset specifier after that analysis was completed. 23 Studies were excluded if interventions were shorter than one week, if depression was not reported as an outcome, or if data were insufficient to calculate an effect size for each arm. Any comparison condition was included, allowing us to quantify the effects against established treatments (eg, selective serotonin reuptake inhibitors (SSRIs), cognitive behavioural therapy), active control conditions (usual care, placebo tablet, stretching, educational control, and social support), or waitlist control conditions. Published and unpublished studies were included, with no restrictions on language applied.

Information sources

We adapted the search strategy from the most recent Cochrane review, 12 adding keywords for yoga, tai chi, and qigong, as they met our definition for exercise. We conducted database searches, without filters or date limits, in The Cochrane Library via CENTRAL, SPORTDiscus via EBSCO, and Medline, Embase, and PsycINFO via Ovid. Searches of the databases were conducted on 17 December 2018 and 7 August 2020 and last updated on 3 June 2023 (see supplementary file section S1 for full search strategies). We assessed full texts of all included studies from two systematic reviews of exercise for depression. 12 22

Study selection and data collection

To select studies, we removed duplicate records in Covidence 38 and then screened each title and abstract independently and in duplicate. Conflicts were resolved through discussion or consultation with a third reviewer. The same methods were used for full text screening.

We used the Extraction 1.0 randomised controlled trial data extraction forms in Covidence. 38 Data were extracted independently and in duplicate, with conflicts resolved through discussion with a third reviewer.

For each study, we extracted a description of the interventions, including frequency, intensity, and type and time of each exercise intervention. Using the Compendium of Physical Activities, 39 we calculated the energy expenditure dose of exercise for each arm as metabolic equivalents of task (METs) min/week. Two authors evaluated each exercise intervention using the Behaviour Change Taxonomy version 1 30 for behaviour change techniques explicitly described in each exercise arm. They also rated the level of autonomy offered to participants, on a scale from 1 (no choice) to 10 (full autonomy). We also extracted descriptions of the other arms within the randomised trials, including other treatment or control conditions; participants’ age, sex, comorbidities, and baseline severity of depressive symptoms; and each trial’s location and whether or not the trial was funded.
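
A minimal sketch of that dose calculation: weekly energy expenditure is the activity's MET value multiplied by the minutes prescribed per week. The MET value below is an assumed, illustrative figure, not one taken from the Compendium.

```python
def met_minutes_per_week(met_value, minutes_per_session, sessions_per_week):
    """Energy expenditure dose of an exercise prescription, in METs min/week."""
    return met_value * minutes_per_session * sessions_per_week

# eg, jogging (assumed here to be ~7.0 METs) for 30 minutes, three times a week:
dose = met_minutes_per_week(7.0, 30, 3)  # 630.0 METs min/week
```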

Risk of bias in individual studies

We used Cochrane’s risk of bias tool for randomised controlled trials. 40 Risk of bias was rated independently and in duplicate, with conflicts resolved through discussion with a third reviewer.

Summary measures and synthesis

For main and moderation analyses, we used bayesian arm based multilevel network meta-analysis models. 41 All network meta-analytical approaches allow users to assess the effects of treatments against a range of comparisons. The bayesian arm based models allowed us to also assess the influence of hypothesised moderators, such as intensity, dose, age, and sex. Many network meta-analyses use contrast based methods, comparing post-test scores between study arms. 41 Arm based meta-analyses instead describe the population-averaged absolute effect size for each treatment arm (ie, each arm’s change score). 41 As a result, the summary measure we used was the standardised mean change from baseline, calculated as standardised mean differences with correction for small studies (Hedges’ g). In keeping with the norms from the included studies, effect sizes describe treatment effects on depression, such that larger negative numbers represent stronger effects on symptoms. Using National Institute for Health and Care Excellence guidelines, 42 we standardised change scores for different depression scales (eg, Beck depression inventory, Hamilton depression rating scale) using an internal reference standard for each scale (for each scale, the average of pooled standard deviations at baseline) reported in our meta-analysis. Because depression scores generally show regression to the mean, even in control conditions, we present effect sizes as improvements beyond active control conditions. This convention makes our results comparable to existing, contrast based meta-analyses.
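
The summary measure can be illustrated with a small sketch: a change score standardised by a reference standard deviation, multiplied by the usual small-sample correction factor J = 1 − 3/(4·df − 1). All numbers in the example are fabricated, and the review's internal reference standard (the pooled baseline SD for each scale) is approximated here by a single value.

```python
def hedges_g_change(mean_pre, mean_post, sd_ref, n):
    """Standardised mean change from baseline with Hedges' small-sample
    correction; negative values indicate a reduction in depression scores."""
    smd = (mean_post - mean_pre) / sd_ref   # standardise by the reference SD
    j = 1 - 3 / (4 * (n - 1) - 1)           # small-sample correction factor J
    return smd * j

# eg, a hypothetical arm of 25 participants whose Beck depression inventory-II
# score falls from 28 to 16, with a pooled baseline SD of 9:
g = hedges_g_change(28, 16, 9, 25)  # ≈ -1.29
```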

Active control conditions (usual care, placebo tablet, stretching, educational control, and social support) were grouped to increase power for moderation analyses, for parsimony in the network graph, and because they all showed similar arm based pooled effect sizes (Hedges’ g between −0.93 and −1.00 for all, with no statistically significant differences). We separated waitlist control from these active control conditions because it typically shows poorer effects in treatment for depression. 43

Bayesian meta-analyses were conducted in R 44 using the brms package. 45 We preregistered informative priors based on the distributional parameters of our meta-analytical model. 46 We nested effects within arms to manage dependency between multiple effect sizes from the same participants. 46 For example, if one study reported two self-reported measures of depression, or reported both self-report and clinician rated depression, we nested these effect sizes within the arm to account for both pieces of information while controlling for dependency between effects. 46 Finally, we compared absolute effect sizes against a standardised minimum clinically important difference, 0.5 standard deviations of the change score. 47 From our data, this corresponded to a large change in before and after scores (Hedges’ g −1.16), a moderate change compared with waitlist control (g −0.55), or a small benefit when compared with active controls (g −0.20). For credibility assessments comparing exercise modalities, we used the netmeta package 48 and CINeMA. 49 We also used netmeta to model acceptability, comparing the odds ratio for drop-out rate in each arm.
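
The comparison against the standardised minimum clinically important difference can be sketched as a simple threshold check. The thresholds are the Hedges' g values given above for each referent, and effects are coded so that more negative means larger benefit.

```python
# Hedges' g equivalents of 0.5 SD of the change score, per the text above
MCID = {"pre_post": -1.16, "waitlist": -0.55, "active_control": -0.20}

def clinically_meaningful(g, referent):
    """True if an effect (more negative = larger benefit) meets the MCID."""
    return g <= MCID[referent]

# eg, walking or jogging versus active control (g = -0.62 in this review):
clinically_meaningful(-0.62, "active_control")  # True
```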

Additional analyses

All prespecified moderation and sensitivity analyses were performed. We moderated for participant characteristics, including participants’ sex, age, baseline symptom severity, and presence or absence of comorbidities; duration of the intervention (weeks); weekly dose of the intervention; duration between completion of treatment and measurement, to test robustness to remission (in response to a reviewer’s suggestion); amount of autonomy provided in the exercise prescription; and presence of each behaviour change technique. As preregistered, we moderated for behaviour change techniques in three ways: through meta-regression, including all behaviour change techniques simultaneously for primary analysis; including one behaviour change technique at a time (using 99% credible intervals to somewhat control for multiple comparisons) in exploratory analyses; and through meta-analytical classification and regression trees (metaCART), which allowed for interactions between moderating variables (eg, if goal setting combined with feedback had synergistic effects). 50 We conducted sensitivity analyses for risk of bias, assessing whether studies with low versus unclear or high risk of bias on each domain showed statistically significant differences in effect sizes.

Credibility assessment

To assess the credibility of each comparison against active control, we used CINeMA. 35 49 This online tool was designed by the Cochrane Comparing Multiple Interventions Methods Group as an adaptation of GRADE for network meta-analyses. 35 In line with recommended guidelines, for each comparison we made judgements for within study bias, reporting bias, indirectness, imprecision, heterogeneity, and incoherence. Similar to GRADE, we initially considered the evidence for each comparison to show high confidence and then downgraded on the basis of concerns in each domain, as follows:

Within study bias —Comparisons were downgraded when most of the studies providing direct evidence for comparisons were unclear or high risk.

Reporting bias —Publication bias was assessed in three ways. For each comparison with at least 10 studies 51 we created funnel plots, including estimates of effect sizes after removing studies with statistically significant findings (ie, worst case estimates) 52 ; calculated an s value, representing how strong publication bias would need to be to nullify meta-analytical effects 52 ; and conducted a multilevel Egger’s regression test, indicative of small study bias. Given these tests are not recommended for comparisons with fewer than 10 studies, 51 those comparisons were considered to show “some concerns.”

Indirectness — Our primary population of interest was adults with major depression. Studies were considered to be indirect if they focused on one sex only (>90% male or female), participants with comorbidities (eg, heart disease), adolescents and young adults (14-20 years), or older adults (>60 years). We flagged these studies as showing some concerns if one of these factors was present, and as “major concerns” if two of these factors were present. Evidence from comparisons was classified as some concerns or major concerns using majority rating for studies directly informing the comparison.

Imprecision — As per CINeMA, we used the clinically important difference of Hedges’ g=0.2 to ascribe a zone of equivalence, where differences were not considered clinically significant (−0.2<g<0.2). Studies were flagged as some concerns for imprecision if the bounds of the 95% credible interval extended across that zone, and they were flagged as major concerns if the bounds extended to the other side of the zone of equivalence (such that effects could be harmful).

Heterogeneity — Prediction intervals account for heterogeneity differently from credible intervals. 35 As a result, CINeMA accounts for heterogeneity by assessing whether the prediction intervals and the credible intervals lead to different conclusions about clinical significance (using the same zone of equivalence from imprecision). Comparisons are flagged as some concerns if the prediction interval crosses into, or out of, the zone of equivalence once (eg, from helpful to no meaningful effect), and as major concerns if the prediction interval crosses the zone twice (eg, from helpful to harmful).

Incoherence — Incoherence assesses whether the network meta-analysis provides similar estimates when using direct evidence (eg, randomised controlled trials on strength training versus SSRI) compared with indirect evidence (eg, randomised controlled trials where either strength training or SSRI is compared with waitlist control). Incoherence provides some evidence that the network may violate the assumption of transitivity: that the only systematic difference between arms is the treatment, not other confounders. We assessed incoherence using two methods: first, a global design-by-treatment interaction test for incoherence across the whole network, 35 49 and second, separating indirect and direct evidence (SIDE method) for each comparison through netsplitting to see whether differences between those effect estimates were statistically significant. We flagged comparisons as some concerns if either no direct comparisons were available or direct and indirect evidence gave different conclusions about clinical significance (eg, from helpful to no meaningful effect, as per imprecision and heterogeneity). Again, we classified comparisons as major concerns if the direct and indirect evidence changed the sign of the effect or changed both limits of the credible interval. 35 49
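
The imprecision rule above can be sketched as a threshold check against the zone of equivalence (−0.2 < g < 0.2). This simplified version assumes the point estimate is in the helpful (negative) direction; the heterogeneity judgement reuses the same zone with prediction intervals.

```python
def imprecision_concern(lower, upper):
    """Classify a 95% interval (Hedges' g, helpful direction negative)
    against the zone of equivalence -0.2 < g < 0.2."""
    if upper <= -0.2:
        return "no concerns"      # whole interval shows a meaningful benefit
    if upper < 0.2:
        return "some concerns"    # interval extends into the zone
    return "major concerns"       # interval extends beyond it: possible harm

# eg, walking or jogging versus active control (-0.80 to -0.45):
imprecision_concern(-0.80, -0.45)  # "no concerns"
```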

Patient and public involvement

We discussed the aims and design of this study with members of the public, including those who had experienced depression. Several of our authors have experienced major depressive episodes, but beyond that we did not include patients in the conduct of this review.

Study selection

The PRISMA flow diagram outlines the study selection process ( fig 1 ). We used two previous reviews to identify potentially eligible studies for inclusion. 12 22 Database searches identified 18 658 possible studies. After 5505 duplicates had been removed, two reviewers independently screened 13 115 titles and abstracts. After screening, two reviewers independently reviewed 1738 full text articles. Supplementary file section S2 shows the consensus reasons for exclusion. A total of 218 unique studies described in 246 reports were included, totalling 495 arms and 14 170 participants. Supplementary file section S3 lists the references and characteristics of the included studies.

Fig 1

Flow of studies through review

Network geometry

As preregistered, we removed nodes with fewer than 100 participants. Using this filter, most interventions contained comparisons with at least four other nodes in the network geometry ( fig 2 ). The results of the global design-by-treatment interaction test were not statistically significant, supporting the assumption of transitivity (χ2=94.92, df=75, P=0.06). When net-splitting was used on all possible combinations in the network, for two out of the 120 comparisons we found statistically significant incoherence between direct and indirect evidence (SSRI v waitlist control; cognitive behavioural therapy v tai chi or qigong). Overall, we found little statistical evidence that the model violated the assumption of transitivity. Qualitative differences were, however, found for participant characteristics between different arms (see supplementary file, section S4). For example, some interventions appeared to be prescribed more frequently among people with severe depression (eg, 7/16 studies using SSRIs) compared with other interventions (eg, 1/15 studies using aerobic exercise combined with therapy). Similarly, some interventions appeared more likely to be prescribed for older adults (eg, mean age, tai chi=59 v dance=31) or women (eg, per cent female: dance=88% v cycling=53%). Given that plausible mechanisms exist for these systematic differences (eg, the popularity of tai chi among older adults), 53 there are reasons to believe that allocation to treatment arms would be less than perfectly random. We have factored these biases into our certainty estimates through indirectness ratings.

Fig 2

Network geometry indicating number of participants in each arm (size of points) and number of comparisons between arms (thickness of lines). SSRI=selective serotonin reuptake inhibitor

Risk of bias within studies

Supplementary file section S5 provides the risk of bias ratings for each study. Few studies explicitly blinded participants and staff ( fig 3 ). As a result, overall risk of bias for most studies was unclear or high, and effect sizes could include expectancy effects, among other biases. However, sensitivity analyses found no statistically significant influence of any risk of bias criterion on effect sizes, although the credible intervals for these comparisons were wide (see supplementary file, section S6). Nevertheless, certainty ratings for all treatment arms were downgraded owing to high risk of bias in the studies informing the comparison.

Fig 3

Risk of bias summary plot showing percentage of included studies judged to be low, unclear, or high risk across Cochrane criteria for randomised trials

Synthesis of results

Supplementary file section S7 presents a forest plot of Hedges’ g values for each study. Figure 4 shows the predicted effects of each treatment compared with active controls. Compared with active controls, large reductions in depression were found for dance (n=107, κ=5, Hedges’ g −0.96, 95% credible interval −1.36 to −0.56) and moderate reductions for walking or jogging (n=1210, κ=51, g −0.63, −0.80 to −0.46), yoga (n=1047, κ=33, g=−0.55, −0.73 to −0.36), strength training (n=643, κ=22, g=−0.49, −0.69 to −0.29), mixed aerobic exercises (n=1286, κ=51, g=−0.43, −0.61 to −0.25), and tai chi or qigong (n=343, κ=12, g=−0.42, −0.65 to −0.21). Moderate, clinically meaningful effects were also present when exercise was combined with SSRIs (n=268, κ=11, g=−0.55, −0.86 to −0.23) or aerobic exercise was combined with psychotherapy (n=404, κ=15, g=−0.54, −0.76 to −0.32). All these treatments were significantly stronger than the standardised minimum clinically important difference compared with active control (g=−0.20), equating to an absolute g value of −1.16. Dance, exercise combined with SSRIs, and walking or jogging were the treatments most likely to perform best when modelling the surface under the cumulative ranking curve ( fig 4 ). For acceptability, the odds of participants dropping out of the study were lower for strength training (n=247, direct evidence κ=6, odds ratio 0.55, 95% credible interval 0.31 to 0.99) and yoga (n=264, κ=5, 0.57, 0.35 to 0.94) than for active control. The rate of dropouts was not significantly different from active control in any other arms (see supplementary file, section S8).
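
The acceptability comparison models the odds of dropping out of each arm relative to active control; a minimal sketch with fabricated counts:

```python
def dropout_odds_ratio(drop_treat, stay_treat, drop_ctrl, stay_ctrl):
    """Odds ratio below 1 means fewer dropouts (better acceptability)
    in the treatment arm than in the control arm."""
    return (drop_treat / stay_treat) / (drop_ctrl / stay_ctrl)

# eg, 10 of 100 participants dropping out of a strength training arm
# versus 16 of 100 in active control (hypothetical counts):
dropout_odds_ratio(10, 90, 16, 84)  # ≈ 0.58
```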

Fig 4

Predicted effects of different exercise modalities on major depression compared with active controls (eg, usual care), with 95% credible intervals. The estimate of effects for the active control condition was a before and after change of Hedges’ g of −0.95 (95% credible interval −1.10 to −0.79), n=3554, κ =113. Colour represents SUCRA from most likely to be helpful (dark purple) to least likely to be helpful (light purple). SSRI=selective serotonin reuptake inhibitor; SUCRA=surface under the cumulative ranking curve

Consistent with other meta-analyses, effects were moderate for cognitive behaviour therapy alone (n=712, κ=20, g=−0.55, −0.75 to −0.37) and small for SSRIs (n=432, κ=16, g=−0.26, −0.50 to −0.01) compared with active controls ( fig 4 ). These estimates are comparable to those of reviews that focused directly on psychotherapy (g=−0.67, −0.79 to −0.56) 7 or pharmacotherapy (g=−0.30, –0.34 to −0.26). 25 However, our review was not designed to find all studies of these treatments, so these estimates should not supersede those from the directly focused systematic reviews.

Despite the large number of studies in the network, confidence in the effects was low ( fig 5 ). This was largely due to the high within study bias described in the risk of bias summary plot. Reporting bias was also difficult to robustly assess because direct comparisons with active control were often provided by fewer than 10 studies. Many studies focused on one sex only, older adults, or those with comorbidities, so most arms had some concerns about indirect comparisons. Credible intervals were seldom wide enough to change decision making, so concerns about imprecision were few. Heterogeneity did plausibly change some conclusions around clinical significance. Few studies showed problematic incoherence, meaning direct and indirect evidence usually agreed. Overall, walking or jogging had low confidence, and other modalities had very low confidence.

Fig 5

Summary table for credibility assessment using confidence in network meta-analysis (CINeMA). SSRI=selective serotonin reuptake inhibitor

Moderation by participant characteristics

The optimal modality appeared to be moderated by age and sex. Compared with models that only included exercise modality (R 2 =0.65), R 2 was higher for models that included interactions with sex (R 2 =0.71) and age (R 2 =0.69). R 2 showed no substantial increase for models including baseline depression (R 2 =0.67) or comorbidities (R 2 =0.66; see supplementary file, section S9).

Effects appeared larger for women than men for strength training and cycling ( fig 6 ). Effects appeared to be larger for men than women when prescribing yoga, tai chi, and aerobic exercise alongside psychotherapy. Yoga and aerobic exercise alongside psychotherapy appeared more effective for older participants than younger people ( fig 7 ). Strength training appeared more effective when prescribed to younger participants than older participants. Some estimates were associated with substantial uncertainty because some modalities were not well studied in some groups (eg, tai chi for younger adults), and mean age of the sample was only available for 71% of the studies.

Fig 6

Effects of interventions versus active control on depression (lower is better) by sex. Shading represents 95% credible intervals

Fig 7

Effects of interventions versus active control on depression (lower is better) by age. Shading represents 95% credible intervals

Moderation by intervention and design characteristics

Across modalities, a clear dose-response curve was observed for intensity of exercise prescribed ( fig 8 ). Although light physical activity (eg, walking, hatha yoga) still provided clinically meaningful effects (g=−0.58, −0.82 to −0.33), expected effects were stronger for vigorous exercise (eg, running, interval training; g=−0.74, −1.10 to −0.38). This finding did not appear to be due to increased weekly energy expenditure: credible intervals were wide, which meant that the dose-response curve for METs/min prescribed per week was unclear (see supplementary file, section S10). Weak evidence suggested that shorter interventions (eg, 10 weeks: g=−0.53, −0.71 to −0.35) worked somewhat better than longer ones (eg, 30 weeks: g=−0.37, −0.79 to 0.03), with wide credible intervals again indicating high uncertainty (see supplementary file, section S11). We also moderated for the lag between the end of treatment and the measurement of the outcome. We found no indication that participants were likely to relapse within the measurement period (see supplementary file, section S12); effects remained steady when measured either directly after the intervention (g=−0.59, −0.80 to −0.39) or up to six months later (g=−0.63, −0.87 to −0.40).

Fig 8

Dose-response curve for intensity (METs) across exercise modalities compared with active control. METs=metabolic equivalents of task
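The effect sizes throughout these results are Hedges' g, a standardised mean difference with a small-sample correction. As a minimal illustration of how g is derived from group summary statistics (the numbers below are invented; this is not the review's analysis code):

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g: standardised mean difference with small-sample correction.

    With depression scores, lower is better, so a negative g
    favours the first (intervention) group, as in the review.
    """
    # Pooled standard deviation across the two arms
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp                  # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)     # Hedges' correction factor
    return j * d

# Hypothetical arms: exercise group ends with lower depression scores
g = hedges_g(m1=10.0, sd1=5.0, n1=30, m2=14.0, sd2=5.0, n2=30)
print(round(g, 2))  # -0.79
```

A g near −0.8 would sit at the vigorous end of the dose-response range reported above.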

Supplementary file section S13 provides coding for the behaviour change techniques and autonomy for each exercise arm. None of the behaviour change techniques significantly moderated overall effects. Contrary to expectations, studies describing a level of participant autonomy (ie, choice over frequency, intensity, type, or time) tended to show weaker effects (g=−0.28, −0.78 to 0.23) than those that did not (g=−0.75, −1.17 to −0.33; see supplementary file, section S14). This effect was consistent whether or not we included studies that used physical activity counselling (usually high autonomy).

Use of group exercise appeared to moderate the effects: although the overall effects were similar for individual (g=−1.10, −1.57 to −0.64) and group exercise (g=−1.16, −1.61 to −0.73), some interventions were better delivered in groups (yoga) and some were better delivered individually (strength training, mixed aerobic exercise; see supplementary file, section S15).

As preregistered, we tested whether study funding moderated effects. Models that included whether a study was funded explained more variance (R²=0.70) than models that included treatment alone (R²=0.65). Funded studies showed stronger effects (g=−1.01, −1.19 to −0.82) than unfunded studies (g=−0.77, −1.09 to −0.46). We also tested whether the type of measure (self-report v clinician report) moderated effects. It did not explain a substantial amount of variance in the outcome (R²=0.66).

Sensitivity analyses

Evidence of publication bias was found for overall estimates of exercise on depression compared with active controls, although not enough to nullify effects. The multilevel Egger’s test was statistically significant (F(1,98)=23.93, P<0.001). Funnel plots showed asymmetry, but pooled effects remained statistically significant when including only non-significant studies (see supplementary file, section S16). No amount of publication bias would be sufficient to shrink effects to zero (s value=not possible). To reduce effects below clinical significance thresholds, studies with statistically significant results would need to be reported 58 times more frequently than studies with non-significant results.
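The review used a multilevel variant of Egger's test; the classic single-level version regresses each study's standardised effect on its precision, and an intercept away from zero indicates funnel-plot asymmetry. A rough sketch with synthetic data (not the study's code):

```python
import numpy as np

def eggers_intercept(effects, ses):
    """Classic Egger's regression: standardised effect vs precision.

    An intercept far from zero suggests funnel-plot asymmetry
    (small studies reporting systematically larger effects).
    """
    z = np.asarray(effects) / np.asarray(ses)  # standardised effects
    prec = 1.0 / np.asarray(ses)               # precision
    X = np.column_stack([np.ones_like(prec), prec])
    (intercept, slope), *_ = np.linalg.lstsq(X, z, rcond=None)
    return intercept

# Synthetic example with built-in small-study bias:
# studies with larger standard errors show larger (more negative) effects
rng = np.random.default_rng(0)
ses = rng.uniform(0.05, 0.5, size=100)
effects = -0.6 - 1.5 * ses + rng.normal(0.0, ses)
print(eggers_intercept(effects, ses))  # clearly negative: asymmetry
```

In practice the intercept is tested formally (eg, a t test on the regression coefficient), which is what produces the F statistic reported above.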

Qualitative synthesis of mediation effects

Only a few of the studies used explicit mediation analyses to test hypothesised mechanisms of action. 54 55 56 57 58 59 One study found that both aerobic exercise and yoga led to decreased depression because participants ruminated less. 54 The study found that the effects of aerobic exercise (but not yoga) were mediated by increased acceptance. 54 “Perceived hassles” and awareness were not statistically significant mediators. 54 Another study found that the effects of yoga were mediated by increased self-compassion, but not rumination, self-criticism, tolerance of uncertainty, body awareness, body trust, mindfulness, and attentional biases. 55 One study found that the effects from an aerobic exercise intervention were not mediated by long term physical activity, but instead were mediated by exercise specific affect regulation (eg, self-control for exercise). 57 Another study found that neither exercise self-efficacy nor depression coping self-efficacy mediated effects of aerobic exercise. 56 Effects of aerobic exercise were not mediated by the N2 amplitude from electroencephalography, hypothesised as a neuro-correlate of cognitive control deficits. 58 Increased physical activity did not appear to mediate the effects of physical activity counselling on depression. 59 It is difficult to infer strong conclusions about mechanisms on the basis of this small number of studies with low power.

Summary of evidence

In this systematic review and meta-analysis of randomised controlled trials, exercise showed moderate effects on depression compared with active controls, either alone or in combination with other established treatments such as cognitive behaviour therapy. In isolation, the most effective exercise modalities were walking or jogging, yoga, strength training, and dancing. Although walking or jogging were effective for both men and women, strength training was more effective for women, and yoga or qigong was more effective for men. Yoga was somewhat more effective among older adults, and strength training was more effective among younger people. The benefits from exercise tended to be proportional to the intensity prescribed, with vigorous activity being better. Benefits were similar across different weekly doses, across people with different comorbidities, and across different baseline levels of depression. Although confidence in many of the results was low, treatment guidelines may be overly conservative by conditionally recommending exercise as a complementary or alternative treatment for patients in whom psychotherapy or pharmacotherapy is either ineffective or unacceptable. 60 Instead, guidelines for depression ought to include prescriptions for exercise and consider adapting the modality to participants’ characteristics and recommending more vigorous intensity exercises.

Our review did not uncover clear causal mechanisms, but the trends in the data are useful for generating hypotheses. It is unlikely that any single causal mechanism explains all the findings in the review. Instead, we hypothesise that a combination of social interaction, 61 mindfulness or experiential acceptance, 62 increased self-efficacy, 33 immersion in green spaces, 63 neurobiological mechanisms, 64 and acute positive affect 65 combine to generate outcomes. Meta-analyses have found each of these factors to be associated with decreases in depressive symptoms, but no single treatment covers all mechanisms. Some may more directly promote mindfulness (eg, yoga), be more social (eg, group exercise), be conducted in green spaces (eg, walking), provide more positive affect (eg, “runner’s high”), or be more conducive to acute adaptations that may increase self-efficacy (eg, strength training). 66 Exercise modalities such as running may satisfy many of the mechanisms, but they are unlikely to directly promote the mindful self-awareness provided by yoga and qigong. Both these forms of exercise are often practised in groups with explicit mindfulness but seldom have fast and objective feedback loops that improve self-efficacy. Adequately powered studies testing multiple mediators may help to focus more on understanding why exercise helps depression and less on whether exercise helps. We argue that understanding these mechanisms of action is important for personalising prescriptions and better understanding effective treatments.

Our review included more studies than many existing reviews on exercise for depression. 13 22 27 28 As a result, we were able to combine the strengths of various approaches to exercise and to make more nuanced and precise conclusions. For example, even taking conservative estimates (ie, the least favourable end of the credible interval), practitioners can expect patients to experience clinically significant effects from walking, running, yoga, qigong, strength training, and mixed aerobic exercise. Because we simultaneously assessed more than 200 studies, credible intervals were narrower than those in most existing meta-analyses. 13 We were also able to explore non-linear relationships between outcomes and moderators, such as frequency, intensity, and time. These analyses supported some existing findings—for example, our study and the study by Heissel et al 22 found that shorter interventions had stronger effects, at least for six months; our study and the study by Singh et al 13 both found that effects were stronger with vigorous intensity exercise compared with light and moderate exercise. However, most existing reviews found various treatment modalities to be equally effective. 13 27 In our review, some types of exercise had stronger effect sizes than others. We attribute this to the study level data available in a network meta-analysis compared with an overview of reviews 24 and higher power compared with meta-analyses with smaller numbers of included studies. 22 28 Overviews of reviews have the ability to more easily cover a wider range of participants, interventions, and outcomes, but also risk double counting randomised trials that are included in separate meta-analyses. They often include heterogeneous studies without having as much control over moderation analyses (eg, Singh et al included studies covering both prevention and treatment 13 ). 
Some of those reviews grouped interventions such as yoga with heterogeneous interventions such as stretching and qigong. 13 This practice of combining different interventions makes it harder to interpret meta-analytical estimates. We used methods that enabled us to separately analyse the effects of these treatment modalities. In so doing, we found that these interventions do have different effects, with yoga being an intervention with strong effects and stretching being better described as an active control condition. Network meta-analyses revealed the same phenomenon with psychotherapy: researchers once concluded there was a dodo bird verdict, whereby “everybody has won, and all must have prizes,” 67 until network meta-analyses showed some interventions were robustly more effective than others. 6 26

Predictors of acceptability and outcomes

We found evidence to suggest good acceptability of yoga and strength training, although the measurement of study drop-out is an imperfect proxy for adherence. Participants may complete the study without doing any exercise or may continue exercising and drop out of the study for other reasons. Nevertheless, these are useful data when considering adherence.

Behaviour change techniques, which are designed to increase adherence, did not meaningfully moderate the effect sizes from exercise. This may be due to several factors. It may be that the modality explains most of the variance between effects, such that behaviour change techniques (eg, presence or absence of feedback) did not provide a meaningful contribution. Many forms of exercise potentially contain therapeutic benefits beyond just energy expenditure. These characteristics of a modality may be more influential than coexisting behaviour change techniques. Alternatively, researchers may have used behaviour change techniques such as feedback or goal setting without explicitly reporting them in the study methods. Given the inherent challenges of behaviour change among people with depression, 29 and the difficulty in forecasting which strategies are likely to be effective, 68 we see the identification of effective techniques as important.

We did find that autonomy, as provided in the methods of included studies, predicted effects, but in the opposite direction to our hypotheses: more autonomy was associated with weaker effects. Physical activity counselling, which usually provides a great deal of patient autonomy, had among the smallest effect sizes in our meta-analysis. Higher autonomy judgements were associated with weaker outcomes regardless of whether physical activity counselling was included in the model. One explanation for these data is that people with depression benefit from the clear direction and accountability of a standardised prescription. When provided with more freedom, the low self-efficacy that is symptomatic of depression may stop patients from setting an appropriate level of challenge (eg, they may be less likely to choose vigorous exercise). Alternatively, participants were likely autonomous when self-selecting into trials with exercise modalities they enjoyed, or those that fit their social circumstances. After choosing something value aligned, autonomy within the trial may not have been helpful. Either way, data should be interpreted with caution. Our judgement of the autonomy provided in the methods may not reflect how much autonomy support patients actually felt. The patient’s perceived autonomy is likely determined by a range of factors not described in the methods (eg, the social environment created by those delivering the programme, or their social identity), so other studies that rely on patient reports of the motivational climate are likely to be more reliable. 33 Our findings reiterate the importance of considering these patient reports in future research on exercise for depression.

Our findings suggest that practitioners could advocate for most patients to engage in exercise. Those patients may benefit from guidance on intensity (ie, vigorous) and types of exercise that appear to work well (eg, walking, running, mixed aerobic exercise, strength training, yoga, tai chi, qigong) and be well tolerated (eg, strength training and yoga). If social determinants permit, 66 engaging in group exercise or structured programmes could provide support and guidance to achieve better outcomes. Health services may consider offering these programmes as an alternative or adjuvant treatment for major depression. Specifically, although the confidence in the evidence for exercise is less strong than for cognitive behavioural therapy, the effect sizes seem comparable, so it may be an alternative for patients who prefer not to engage in psychotherapy. Previous reviews on those with mild-moderate depression have found similar effects for exercise or SSRIs, or the two combined. 13 14 In contrast, we found some forms of exercise to have stronger effects than SSRIs alone. Our findings are likely related to the larger power in our review (n=14 170) compared with previous reviews (eg, n=2551), 14 and our ability to better account for heterogeneity in exercise prescriptions. Exercise may therefore be considered a viable alternative to drug treatment. We also found evidence that exercise increases the effects of SSRIs, so offering exercise may act as an adjuvant for those already taking drugs. We agree with consensus statements that professionals should still account for patients’ values, preferences, and constraints, ensuring there is shared decision making around what best suits the patient. 66 Our review provides data to help inform that decision.

Strengths, limitations, and future directions

Based on our findings, dance appears to be a promising treatment for depression, with large effects compared with other interventions in our review. But the small number of studies, low number of participants, and biases in the study designs prohibit us from recommending dance more strongly. Given that most research on this intervention has been conducted in young women (88% female participants, mean age 31 years), it is also important for future research to assess the generalisability of the effects to different populations, using robust experimental designs.

The studies we found may be subject to a range of experimental biases. In particular, researchers seldom blinded participants or staff delivering the intervention to the study’s hypotheses. Blinding for exercise interventions may be harder than for drugs 23 ; however, future studies could attempt to blind participants and staff to the study’s hypotheses to avoid expectancy effects. 69 Some of our ratings are for studies published before the proliferation of reporting checklists, so the ratings might be too critical. 23 For example, before CONSORT, few authors explicitly described how they generated a random sequence. 23 Therefore, our risk of bias judgements may be too conservative. Similarly, we planned to use the Cochrane risk of bias (RoB) 1 tool 40 so we could use the most recent Cochrane review of exercise and depression 12 to calibrate our raters, and because RoB 2 had not yet been published. 70 Although assessments of bias between the two tools are generally comparable, 71 the RoB 1 tool can be more conservative when assessing open label studies with subjective assessments (eg, unblinded studies with self-reported measures for depression). 71 As a result, future reviews should consider using the latest risk of bias tool, which may lead to different assessments of bias in included studies.

Most of the main findings in this review appear robust to risks from publication bias. Specifically, pooled effect sizes decreased when accounting for risk of publication bias, but no degree of publication bias could nullify effects. We did not exclude grey literature, but our search strategy was not designed to systematically search grey literature or trial registries. Doing so can detect additional eligible studies 72 and reveal the numbers of completed studies that remain unpublished. 73 Future reviews should consider more systematic searches for this kind of literature to better quantify and mitigate risk of publication bias.

Similarly, our review was able to integrate evidence that directly compared exercise with other treatment modalities such as SSRIs or psychotherapy, while also informing estimates using indirect evidence (eg, comparing the relative effects of strength training and SSRIs when tested against a waitlist control). Our review did not, however, include all possible sources of indirect evidence. Network meta-analyses exist that directly focus on psychotherapy 7 and pharmacotherapy, 25 and these combined for treating depression. 6 Those reviews include more than 500 studies comparing psychological or drug interventions with controls. Harmonising the findings of those reviews with ours would provide stronger data on indirect effects.
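The indirect evidence described above works by differencing two direct comparisons against a shared comparator; the Bucher method is the simplest form of this logic (the review itself used a Bayesian network meta-analysis). A toy sketch with invented numbers:

```python
from math import sqrt

def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Bucher-style indirect estimate of A vs B via common comparator C.

    d_ac, d_bc: direct effects (eg, Hedges' g) of A vs C and B vs C.
    Returns the indirect A vs B effect and its standard error
    (variances add because the two estimates are independent).
    """
    return d_ac - d_bc, sqrt(se_ac**2 + se_bc**2)

# Invented inputs: strength training vs waitlist g=-0.8 (se 0.15),
# SSRIs vs waitlist g=-0.5 (se 0.10)
d_ab, se_ab = indirect_comparison(-0.8, 0.15, -0.5, 0.10)
print(round(d_ab, 2), round(se_ab, 2))  # -0.3 0.18
```

The widening standard error illustrates why indirect estimates are less precise than direct head-to-head trials, and why harmonising several networks would strengthen them.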

Our review found some interesting moderators by age and sex, but these were at the study level rather than individual level—that is, rather than being able to determine whether women engaging in a strength intervention benefit more than men, we could only conclude that studies with more women showed larger effects than studies with fewer women. These studies may have been tailored towards women, so effects may be subject to confounding, as both sex and intervention may have changed. The same finding applied to age, where studies on older adults were likely adapted specifically to this age group. These between study differences may explain the heterogeneity in the effects of interventions, and confounding means our moderators for age and sex should be interpreted cautiously. Future reviews should consider individual patient meta-analyses to allow for more detailed assessments of participant level moderators.

Finally, for many modalities, the evidence is derived from small trials (eg, the median sample size of walking or jogging arms was 17). In addition to reducing risks from bias, primary research may benefit from deconstruction designs or from larger, head-to-head analyses of exercise modalities to better identify what works best for each candidate.

Clinical and policy implications

Our findings support the inclusion of exercise as part of clinical practice guidelines for depression, particularly vigorous intensity exercise. Doing so may help bridge the gap in treatment coverage by increasing the range of first line options for patients and health systems. 9 Globally there has been an attempt to reduce stigma associated with seeking treatment for depression. 74 Exercise may support this effort by providing patients with treatment options that carry less stigma. In low resource or funding constrained settings, group exercise interventions may provide relatively low cost alternatives for patients with depression and for health systems. When possible, ideal treatment may involve individualised care with a multidisciplinary team, where exercise professionals could take responsibility for ensuring the prescription is safe, personalised, challenging, and supported. In addition, those delivering psychotherapy may want to direct some time towards tackling cognitive and behavioural barriers to exercise. Exercise professionals might need to be trained in the management of depression (eg, managing risk) and to be mindful of the scope of their practice while providing support to deal with this major cause of disability.

Conclusions

Depression imposes a considerable global burden. Many exercise modalities appear to be effective treatments, particularly walking or jogging, strength training, and yoga, but confidence in many of the findings was low. We found preliminary data that may help practitioners tailor interventions to individuals (eg, yoga for older men, strength training for younger women). The World Health Organization recommends physical activity for everyone, including those with chronic conditions and disabilities, 75 but not everyone can access treatment easily. Many patients may have physical, psychological, or social barriers to participation. Still, some interventions with few costs, side effects, or pragmatic barriers, such as walking and jogging, are effective across people with different personal characteristics, severity of depression, and comorbidities. Those who are able may want to choose more intense exercise in a structured environment to further decrease depression symptoms. Health systems may want to provide these treatments as alternatives or adjuvants to other established interventions (cognitive behaviour therapy, SSRIs), while also attenuating risks to physical health associated with depression. 3 Therefore, effective exercise modalities could be considered alongside those interventions as core treatments for depression.

What is already known on this topic

Depression is a leading cause of disability, and exercise is often recommended alongside first line treatments such as pharmacotherapy and psychotherapy

Treatment guidelines and previous reviews disagree on how to prescribe exercise to best treat depression

What this study adds

Various exercise modalities are effective (walking, jogging, mixed aerobic exercise, strength training, yoga, tai chi, qigong) and well tolerated (especially strength training and yoga)

Effects appeared proportional to the intensity of exercise prescribed and were stronger for group exercise and interventions with clear prescriptions

Preliminary evidence suggests interactions between types of exercise and patients’ personal characteristics

Ethics statements

Ethical approval.

Not required.

Acknowledgments

We thank Lachlan McKee for his assistance with data extraction. We also thank Juliette Grosvenor and another librarian (anonymous) for their review of our search strategy.

Contributors: MN led the project, drafted the manuscript, and is the guarantor. MN, TS, PT, MM, BdPC, PP, SB, and CL drafted the initial study protocol. MN, TS, PT, BdPC, DvdH, JS, MM, RP, LP, RV, HA, and BV conducted screening, extraction, and risk of bias assessment. MN, JS, and JM coded methods for behaviour change techniques. MN and DGG conducted statistical analyses. PP, SB, and CL provided supervision and mentorship. All authors reviewed and approved the final manuscript. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: None received.

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

Data sharing: Data and code for reproducing analyses are available on the Open Science Framework (https://osf.io/nzw6u/).

The lead author (MN) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

Dissemination to participants and related patient and public communities: We plan to disseminate the findings of this study to lay audiences through mainstream and social media.

Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .

Integrating large language models in systematic reviews: a framework and case study using ROBINS-I for risk of bias assessment

  • Bashar Hasan 1, 2,
  • Samer Saadi 1, 2,
  • Noora S Rajjoub 1,
  • Moustafa Hegazi 1, 2,
  • Mohammad Al-Kordi 1, 2,
  • Farah Fleti 1, 2,
  • Magdoleen Farah 1, 2,
  • Irbaz B Riaz 3,
  • Imon Banerjee 4, 5,
  • Zhen Wang 1, 6,
  • Mohammad Hassan Murad 1, 2
  • 1 Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
  • 2 Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
  • 3 Division of Hematology-Oncology, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
  • 4 Department of Radiology, Mayo Clinic Arizona, Scottsdale, Arizona, USA
  • 5 School of Computing and Augmented Intelligence, Arizona State University, Tempe, Arizona, USA
  • 6 Health Care Policy and Research, Mayo Clinic Minnesota, Rochester, Minnesota, USA
  • Correspondence to Dr Bashar Hasan, Mayo Clinic, Rochester, MN 55905, USA; Hasan.Bashar{at}mayo.edu

Large language models (LLMs) may facilitate and expedite systematic reviews, although the approach to integrate LLMs in the review process is unclear. This study evaluates GPT-4 agreement with human reviewers in assessing the risk of bias using the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool and proposes a framework for integrating LLMs into systematic reviews. The case study demonstrated that raw per cent agreement was the highest for the ROBINS-I domain of ‘Classification of Intervention’. Kendall agreement coefficient was highest for the domains of ‘Participant Selection’, ‘Missing Data’ and ‘Measurement of Outcomes’, suggesting moderate agreement in these domains. Raw agreement about the overall risk of bias across domains was 61% (Kendall coefficient=0.35). The proposed framework for integrating LLMs into systematic reviews consists of four domains: rationale for LLM use, protocol (task definition, model selection, prompt engineering, data entry methods, human role and success metrics), execution (iterative revisions to the protocol) and reporting. We identify five basic task types relevant to systematic reviews: selection, extraction, judgement, analysis and narration. Considering the agreement level with a human reviewer in the case study, pairing artificial intelligence with an independent human reviewer remains required.

  • Evidence-Based Practice
  • Systematic Reviews as Topic

Data availability statement

Data are available upon reasonable request. Search strategy, selection process flowchart, prompts and boxes containing included SRs and studies are available in the appendix. Analysed datasheet is available upon request.

https://doi.org/10.1136/bmjebm-2023-112597


WHAT IS ALREADY KNOWN ON THIS TOPIC

Risk of bias assessment in systematic reviews is a time-consuming task associated with inconsistency. The use of large language models (LLMs) in systematic reviews may be helpful but remains largely unexplored.

WHAT THIS STUDY ADDS

This study introduces a structured framework for integrating LLMs into systematic reviews with four domains: rationale, protocol, execution and reporting.

The framework defines five possible task types for LLMs in systematic reviews: selection, data extraction, judgement, analysis and narration.

A case study about using LLMs for risk of bias assessments using Risk Of Bias In Non-randomised Studies of Interventions demonstrates fair agreement between LLM and human reviewers.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

The proposed framework can serve as a blueprint for future systematic reviewers planning to integrate LLMs into their workflow.

The case study suggests the need to pair LLMs assessing the risk of bias with a human reviewer.

Introduction

Systematic reviews are the key initial step in decision-making in healthcare. However, they are costly, require a long time to complete and become outdated, especially in areas of rapidly evolving evidence. Semi-automating systematic reviews and transitioning to living systematic reviews using the best contemporary available evidence are key priority areas of current evidence synthesis. 1–4 Recent advances in artificial intelligence (AI) have ushered in a new era of possibilities in healthcare practice and medical research, 5–7 including evidence synthesis and living systematic reviews. 8 9 By learning from human data analysis patterns (supervision), AI technologies offer the ability to automate, accelerate and enhance the accuracy of a wide array of research tasks, from data collection to analysis and even interpretation. 10

A recent AI advancement, large language models (LLMs) such as Meta AI’s LLaMA 2 and OpenAI’s GPT-4, 11 are foundational models pre-trained in a self-supervised manner on tremendous amounts of free-text data. Pre-training allows them to acquire generic knowledge, after which they can be fine-tuned on downstream tasks. With increasing model size, larger training data sets and longer training time, LLMs develop emergent abilities such as zero-shot and few-shot in-context learning. They have demonstrated significant capabilities in understanding and generating human-like text and in processing data with minimal supervision, which may enable meaningful participation in a systematic review. 12 13

Risk of bias (RoB) assessment is a significant step in systematic reviews that requires time, introduces inconsistencies and may be amenable to using AI and LLMs. 14 In this exposition, we propose a framework for incorporating LLMs into systematic reviews and employ GPT-4 for RoB assessment in a case study using the Cochrane Collaboration’s Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool. 15 We chose the ROBINS-I tool for this case study because it is a modern, detailed and relatively complicated tool that requires a long time to apply, 16 which makes it an ideal candidate for exploring whether models such as GPT-4 can improve its consistency and time requirements.

The reporting of this case study adheres to the guidelines of methodological research. 17

Search strategy and study identification

We searched Scopus to identify all systematic reviews (SRs) from the Cochrane Collaboration that cited the original publication of the ROBINS-I tool. 15 We limited our search to SRs conducted by Cochrane in the field of medicine that were fully published. All original non-randomised studies included in the identified SRs were included as long as the ROBINS-I tool was used for their RoB assessment in the SR.

Data entry into ChatGPT

We conducted several pilot tests to determine the most effective method of obtaining RoB assessments from ChatGPT (GPT-4). The initial approach involved directly uploading the study PDFs to GPT-4 via the Code Interpreter tool available to Plus users; however, the tool was unable to interpret the fragmented pieces of text from the PDFs. We then attempted to paste the full text of individual studies into the prompt, but this was unsuccessful because of the estimated 2500-word limit for GPT-4 prompts at the time. Finally, we converted each PDF to a Word file and extracted only the Methods and Results sections for RoB assessment, because these are the sections on which human reviewers focus when assessing RoB. Prompts used to instruct ChatGPT are presented in the appendix. Data entry and prompt development were done iteratively until data were appropriately uploaded and a sensible output was obtained (ie, these processes were not prespecified). Foreign-language studies were provided to GPT-4 in their original language.
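This pre-processing step can be sketched as follows. The code is an illustration rather than the exact pipeline used in the study: the heading pattern and section names are assumptions, since real papers label their sections inconsistently ("Materials and methods", numbered headings, etc.).

```python
# Illustrative sketch: keep only the Methods and Results sections of a
# plain-text conversion of a study before pasting it into a prompt.
# The heading regex (a short Title Case line on its own) is an assumption.
import re

def extract_sections(full_text, wanted=("methods", "results")):
    """Split text on heading-like lines and keep only the wanted sections."""
    parts = re.split(r"(?m)^([A-Z][A-Za-z ]{2,40})$", full_text)
    kept = []
    for i in range(1, len(parts), 2):          # odd indices are headings
        if parts[i].strip().lower() in wanted:
            kept.append(parts[i] + "\n" + parts[i + 1].strip())
    return "\n\n".join(kept)

# Hypothetical study text for demonstration.
sample = (
    "Abstract\nbackground text\n"
    "Methods\nWe searched X.\n"
    "Results\n307 studies.\n"
    "Discussion\ninterpretation\n"
)
trimmed = extract_sections(sample)
```

In practice the trimmed text would then be pasted into the prompt in place of the full study, staying under the model's word limit.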

Statistical analysis

One reviewer extracted RoB judgements from each Cochrane SR and a second reviewer verified the extraction. We measured agreement between Cochrane reviewers and GPT-4 on the ordinal RoB judgements using raw per cent agreement, weighted Cohen’s kappa and Kendall’s τ. The magnitude of agreement based on the value of a correlation or kappa coefficient was considered slight (0–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80) or almost perfect (0.81–1.0).

Analyses were conducted using R (R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org).
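For readers who want to reproduce these agreement measures, a minimal sketch follows. It uses Python rather than the R packages used in the study, implements linear-weighted Cohen's kappa and the pairwise (tie-aware) Kendall tau-b directly, and the ratings shown are hypothetical, not data from the case study.

```python
# Pure-Python agreement measures for ordinal ratings coded as integers
# (eg, 0=low, 1=moderate, 2=serious, 3=critical). Illustrative only.
from itertools import combinations
from math import sqrt

def weighted_kappa(a, b, k):
    """Linear-weighted Cohen's kappa for two raters, categories 0..k-1."""
    n = len(a)
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    obs = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x][y] += 1
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    po = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k)) / n
    pe = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k)) / n**2
    return (po - pe) / (1 - pe)

def kendall_tau_b(a, b):
    """Kendall's tau-b (handles ties), O(n^2) pairwise version."""
    conc = disc = ties_a = ties_b = 0
    for (x1, y1), (x2, y2) in combinations(zip(a, b), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue          # tied in both raters: excluded from both margins
        elif dx == 0:
            ties_a += 1
        elif dy == 0:
            ties_b += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    return (conc - disc) / sqrt((conc + disc + ties_a) * (conc + disc + ties_b))

# Hypothetical ratings for eight studies (not the study's data).
human = [0, 1, 1, 2, 3, 1, 0, 2]
gpt   = [0, 1, 2, 2, 2, 1, 1, 2]
raw = sum(h == g for h, g in zip(human, gpt)) / len(human)  # 0.625
```

Reporting all three measures together, as in the case study, guards against any single coefficient being distorted by the rating distribution.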

Initial screening and inclusion

The initial search yielded 98 SRs, from which 36 provided full ROBINS-I assessment. After deduplicating studies that appeared in multiple SRs, we finalised our sample with 307 unique individual studies ( online supplemental figure; box 1 and box 2 ).

Supplemental material

Agreement between Cochrane reviewers and GPT-4

Agreement measures are summarised in table 1 for each ROBINS-I domain and for overall judgements. Raw per cent agreement was the highest for the domain of ‘Classification of Intervention’. Kendall agreement coefficient was highest for the domains of ‘Participant Selection’, ‘Missing Data’ and ‘Measurement of Outcomes’, suggesting moderate agreement in these domains. Kappa coefficient was low across all domains. Agreement about the overall RoB across domains was fair (61% raw agreement, Kendall coefficient 0.35).


Performance metrics

Framework for incorporating LLMs in a systematic review

Figure 1 outlines the proposed framework for integrating LLMs into a systematic review workflow. The framework has four domains that relate to establishing a rationale, incorporating LLM in the protocol of the systematic review, execution and reporting.


Framework for incorporating large language models in systematic reviews. LLM, large language model; RoB, risk of bias; SR, systematic review.

The first step is to establish the rationale (ie, why LLMs are needed, and whether they are capable of doing this specific task). In the protocol, the LLM should be described with its version and whether it was used off the shelf or via other tools, applications or interfaces. For example, code interpreters or AI agents can be used. An LLM agent, such as a generative pre-trained transformer (GPT) agent, is a specialised system designed to execute complex, multistep tasks and can adapt to new or recently published tools not included in the general model’s training data.

The prompts for the LLM need to be iteratively tested, refined and described in the protocol to the extent possible, recognising that it will not be possible to prespecify or anticipate every step. The method of data entry (copy/paste vs uploading a file) also needs to be tested and described in the protocol. Metrics of success depend on the task type assigned to the LLM. We identify five basic task types: selection (eg, of included studies), extraction (eg, of study characteristics and outcomes), judgement (eg, RoB assessment), analysis (quantitative and qualitative) and narration/editing (eg, writing a manuscript, an abstract, or a lay or executive summary). The metrics of success and the extent of human interaction and supervision should also be specified in the protocol.
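As a sketch, the protocol items listed above could be captured in a structured record. The field names below are our own illustration, not part of any reporting standard.

```python
# Hypothetical record of the LLM-related items a protocol should specify.
# Field names are illustrative, not a reporting standard.
from dataclasses import dataclass, field

TASK_TYPES = {"selection", "extraction", "judgement", "analysis", "narration"}

@dataclass
class LLMProtocolEntry:
    model: str            # model and version, eg "GPT-4"
    interface: str        # off the shelf, code interpreter, agent...
    task_type: str        # one of the five basic task types
    prompt: str           # final prompt text after iterative testing
    data_entry: str       # "copy/paste" or "file upload"
    human_role: str       # eg "independent duplicate reviewer"
    success_metric: str   # eg "agreement with human reviewers"
    revisions: list = field(default_factory=list)  # deviations logged in execution

    def __post_init__(self):
        if self.task_type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {self.task_type}")

entry = LLMProtocolEntry(
    model="GPT-4",
    interface="ChatGPT web interface",
    task_type="judgement",
    prompt="Assess risk of bias with ROBINS-I ...",
    data_entry="copy/paste",
    human_role="independent duplicate reviewer",
    success_metric="raw agreement, weighted kappa, Kendall's tau",
)
```

Keeping such a record alongside the protocol also makes the later reporting step straightforward, since every item that must be disclosed is already in one place.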

The execution of LLM engagement will likely lead to changes in some of the approaches specified in the protocol, which should be explicitly recorded as revisions to the protocol. Reporting is the last part of the framework and is vital. The items mentioned above, which go beyond the usual reporting requirements of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement and its extensions, should all be included in the manuscript. 18 19 Importantly, the AI model and interface used need to be explicitly reported along with a timestamp of when the AI was used, because the output may vary over time for the same input and prompts. Transparency in reporting, including informing peer reviewers and journal editors about the details of LLM use, is critical for the credibility of the systematic review process and of subsequent decisions made based on the evidence. The proposed framework is applied to the current case study in table 2 .

Applying the proposed framework to the case study

The current case study suggests an overall fair agreement between Cochrane reviewers and ChatGPT-4 in using ROBINS-I for assessing RoB in non-randomised studies of interventions. This work identifies several challenges for using general-purpose LLMs, such as handling file types, word token limits and the quality of prompt engineering. Nonetheless, our study provides an assessment of zero-shot performance and a rationale for training RoB-specific systematic review models. The proposed framework is just a starting point, since this field is very dynamic.

The current study also provides insight into evaluating inter-rater agreement on ordinal variables. We found that the weighted kappa coefficient was low across all domains, which likely reflects the skewed distribution of the ratings. Kappa accounts for agreement occurring by chance, while Kendall’s τ measures the strength and direction of the association between two ranked variables. A recent comparison of reliability coefficients for ordinal rating scales suggested that the differences between these measures can vary at different agreement levels. 20 Thus, using more than one measure is helpful to assess the robustness of results. While our findings suggest the potential of LLMs such as GPT-4 to be used in systematic reviews, it is obvious that there is a certain rate of error and that duplication of RoB assessment is needed.
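A toy numeric example (made-up ratings, unweighted kappa for simplicity) makes the skew effect concrete: when one rating category dominates, chance-expected agreement is itself high, so kappa collapses even though raw agreement is 90%.

```python
# Toy illustration (fabricated ratings): a skewed distribution drives
# unweighted Cohen's kappa toward zero even when raw agreement is high,
# because the chance-expected agreement p_e approaches 1.
def cohen_kappa(a, b):
    n = len(a)
    cats = set(a) | set(b)
    po = sum(x == y for x, y in zip(a, b)) / n
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)
    return (po - pe) / (1 - pe)

# 20 studies; both raters call 18 of them "serious" (coded 2),
# and each rater dissents once, on a different study.
rater1 = [2] * 18 + [1, 2]
rater2 = [2] * 18 + [2, 1]

raw = sum(x == y for x, y in zip(rater1, rater2)) / len(rater1)  # 0.9
kappa = cohen_kappa(rater1, rater2)  # near zero despite 90% raw agreement
```

This is the well-known kappa paradox for skewed marginals, and it is why reporting raw agreement and a rank correlation alongside kappa gives a fuller picture.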

Some limitations of the case study should be mentioned. This study was feasible because of the availability of comprehensive systematic reviews from the Cochrane Collaboration that used the ROBINS-I tool and reported detailed judgements. While their RoB assessment is certainly not a reference standard and can be quite poor for some domains such as confounding, 21 the rigorous and multidomain evaluation conducted by pairs of independent reviewers in these reviews makes them a reasonable comparison for a novel LLM application. It is also possible that some systematic reviews used ROBINS-I but did not cite its original paper and were therefore not included in our sample. We also had to use ChatGPT to translate a few studies published in languages other than English, truncate text that was too lengthy and convert file formats, all of which may have affected RoB judgements.

Practical implications

Given its current capabilities, GPT-4 is arguably a very advanced text-analysing tool. A major advantage is its availability as a universal language model: one model that can perform any language-based extraction, retrieval or even reasoning task. However, this approach may not be suitable for application in every domain. Sensitive domains like medicine require precise use of language in a consistent manner. LLMs have displayed inconsistency in performance (different outputs for the same input). LLMs also have a propensity to generate favourable answers and to hallucinate. Hallucination is a major threat to the use of LLMs in research. In table 3 , we describe the phenomenon of artificial hallucinations in terms of definition, types and plausible causes. 22–24

The phenomenon of artificial hallucinations: definition, types and causes

Additional applications in systematic reviews can extend to other tasks such as aiding in screening studies, translating foreign-language studies in real time, data extraction, meta-analysis and even generating decision aids or translational products. 25 However, a human reviewer remains needed as a duplicate independent reviewer.

This exploration of LLMs application in systematic reviews is a step toward integrating AI as a dynamic adjunct in research. The proposed framework, coupled with a case study on RoB assessment, underscores the potential of LLMs to facilitate research tasks. While GPT-4 is not without its limitations, its ability to assist in complex tasks under human supervision makes it a promising tool for assessing RoB in systematic reviews. Considering the agreement level with a human reviewer in the case study, pairing AI with an independent human reviewer remains required at present.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

References

  • Chu H , et al
  • Sipra QUAR ,
  • Naqvi SAA , et al
  • Ryu AJ , et al
  • Naqvi SAA ,
  • He H , et al
  • Kayaalp ME ,
  • Ollivier M , et al
  • Noorbakhsh-Sabet N ,
  • Zhang Y , et al
  • Ramkumar PN ,
  • Haeberle HS , et al
  • Kelly SE , et al
  • Feng Y , et al
  • van Dijk SHB ,
  • Brusse-Keizer MGJ ,
  • Bucsán CC , et al
  • Touvron H ,
  • Stone K , et al
  • Kolluri S ,
  • Liu R , et al
  • Jardim PSJ ,
  • Ames HM , et al
  • Sterne JA ,
  • Hernán MA ,
  • Reeves BC , et al
  • Jeyaraman MM ,
  • Rabbani R ,
  • Al-Yousif N , et al
  • Liberati A ,
  • Tetzlaff J , et al
  • de Raadt A ,
  • Warrens MJ ,
  • Bosker RJ , et al
  • Thirunavukarasu AJ ,
  • Elangovan K , et al
  • Alkaissi H ,
  • McFarlane SI
  • Blaizot A ,
  • Veettil SK ,
  • Saidoung P , et al

Supplementary materials

Supplementary data.

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Data supplement 1

Twitter @BasharHasanMD, @M_Hassan_Murad

Contributors MHM and BH conceived this study. BH, SS, MH, MA-K, FF, MF, ZW, IBR, IB and NSR participated in data identification, extraction and analysis. MHM, SS, IBR and IB wrote the first draft. All authors critically revised the manuscript and approved the final version. BH is the guarantor.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.


Interventions for Optimization of Guideline-Directed Medical Therapy : A Systematic Review

  • 1 Department of Medicine, University of California Los Angeles
  • 2 Department of Medicine, Division of Cardiology, University of California Los Angeles
  • 3 Louise M. Darling Biomedical Library, UCLA Library, University of California Los Angeles
  • 4 Associate Section Editor, JAMA Cardiology

Question   What are the most effective interventions for optimization of guideline-directed medical therapy (GDMT) in clinical practice?

Findings   In this systematic review, interdisciplinary heart failure clinics were the most consistently effective for uptitration of GDMT. Other types of interventions, including audits, electronic health record alerts, or education-based initiatives, showed efficacy for some components of GDMT, but results varied across studies.

Meaning   Interdisciplinary heart failure clinics are effective for GDMT titration, but multifactorial interventions may be necessary for further optimization and initiation of GDMT.

Importance   Implementation of guideline-directed medical therapy (GDMT) in real-world practice remains suboptimal. It is unclear which interventions are most effective at addressing current barriers to GDMT in patients with heart failure with reduced ejection fraction (HFrEF).

Objective   To perform a systematic review to identify which types of system-level initiatives are most effective at improving GDMT use among patients with HFrEF.

Evidence Review   PubMed, Embase, Cochrane, CINAHL, and Web of Science databases were queried from January 2010 to November 2023 for randomized clinical trials that implemented a quality improvement intervention with GDMT use as a primary or secondary outcome. References from related review articles were also included for screening. Quality of studies and bias assessment were graded based on the Cochrane Risk of Bias tool and Oxford Centre for Evidence-Based Medicine.

Findings   Twenty-eight randomized clinical trials were included with an aggregate sample size of 19 840 patients. Studies were broadly categorized as interdisciplinary interventions (n = 15), clinician education (n = 5), electronic health record initiatives (n = 6), or patient education (n = 2). Overall, interdisciplinary titration clinics were associated with significant increases in the proportion of patients on target doses of GDMT with a 10% to 60% and 2% to 53% greater proportion of patients on target doses of β-blockers and renin-angiotensin-aldosterone system inhibitors, respectively, in intervention groups compared with usual care. Other interventions, such as audits, clinician and patient education, or electronic health record alerts, were also associated with some improvements in GDMT utilization, though these findings were inconsistent across studies.

Conclusions and Relevance   This review summarizes interventions aimed at optimization of GDMT in clinical practice. Initiatives that used interdisciplinary teams, largely comprised of nurses and pharmacists, most consistently led to improvements in GDMT. Additional large, randomized studies are necessary to better understand other types of interventions, as well as their long-term efficacy and sustainability.

Tang AB, Brownell NK, Roberts JS, et al. Interventions for Optimization of Guideline-Directed Medical Therapy: A Systematic Review. JAMA Cardiol. Published online February 21, 2024. doi:10.1001/jamacardio.2023.5627

Manage citations:

© 2024

Artificial Intelligence Resource Center

Cardiology in JAMA : Read the Latest

Browse and subscribe to JAMA Network podcasts!

Others Also Liked

Select your interests.

Customize your JAMA Network experience by selecting one or more topics from the list below.

  • Academic Medicine
  • Acid Base, Electrolytes, Fluids
  • Allergy and Clinical Immunology
  • American Indian or Alaska Natives
  • Anesthesiology
  • Anticoagulation
  • Art and Images in Psychiatry
  • Artificial Intelligence
  • Assisted Reproduction
  • Bleeding and Transfusion
  • Caring for the Critically Ill Patient
  • Challenges in Clinical Electrocardiography
  • Climate and Health
  • Climate Change
  • Clinical Challenge
  • Clinical Decision Support
  • Clinical Implications of Basic Neuroscience
  • Clinical Pharmacy and Pharmacology
  • Complementary and Alternative Medicine
  • Consensus Statements
  • Coronavirus (COVID-19)
  • Critical Care Medicine
  • Cultural Competency
  • Dental Medicine
  • Dermatology
  • Diabetes and Endocrinology
  • Diagnostic Test Interpretation
  • Drug Development
  • Electronic Health Records
  • Emergency Medicine
  • End of Life, Hospice, Palliative Care
  • Environmental Health
  • Equity, Diversity, and Inclusion
  • Facial Plastic Surgery
  • Gastroenterology and Hepatology
  • Genetics and Genomics
  • Genomics and Precision Health
  • Global Health
  • Guide to Statistics and Methods
  • Hair Disorders
  • Health Care Delivery Models
  • Health Care Economics, Insurance, Payment
  • Health Care Quality
  • Health Care Reform
  • Health Care Safety
  • Health Care Workforce
  • Health Disparities
  • Health Inequities
  • Health Policy
  • Health Systems Science
  • History of Medicine
  • Hypertension
  • Images in Neurology
  • Implementation Science
  • Infectious Diseases
  • Innovations in Health Care Delivery
  • JAMA Infographic
  • Law and Medicine
  • Leading Change
  • Less is More
  • LGBTQIA Medicine
  • Lifestyle Behaviors
  • Medical Coding
  • Medical Devices and Equipment
  • Medical Education
  • Medical Education and Training
  • Medical Journals and Publishing
  • Mobile Health and Telemedicine
  • Narrative Medicine
  • Neuroscience and Psychiatry
  • Notable Notes
  • Nutrition, Obesity, Exercise
  • Obstetrics and Gynecology
  • Occupational Health
  • Ophthalmology
  • Orthopedics
  • Otolaryngology
  • Pain Medicine
  • Palliative Care
  • Pathology and Laboratory Medicine
  • Patient Care
  • Patient Information
  • Performance Improvement
  • Performance Measures
  • Perioperative Care and Consultation
  • Pharmacoeconomics
  • Pharmacoepidemiology
  • Pharmacogenetics
  • Pharmacy and Clinical Pharmacology
  • Physical Medicine and Rehabilitation
  • Physical Therapy
  • Physician Leadership
  • Population Health
  • Primary Care
  • Professional Well-being
  • Professionalism
  • Psychiatry and Behavioral Health
  • Public Health
  • Pulmonary Medicine
  • Regulatory Agencies
  • Reproductive Health
  • Research, Methods, Statistics
  • Resuscitation
  • Rheumatology
  • Risk Management
  • Scientific Discovery and the Future of Medicine
  • Shared Decision Making and Communication
  • Sleep Medicine
  • Sports Medicine
  • Stem Cell Transplantation
  • Substance Use and Addiction Medicine
  • Surgical Innovation
  • Surgical Pearls
  • Teachable Moment
  • Technology and Finance
  • The Art of JAMA
  • The Arts and Medicine
  • The Rational Clinical Examination
  • Tobacco and e-Cigarettes
  • Translational Medicine
  • Trauma and Injury
  • Treatment Adherence
  • Ultrasonography
  • Users' Guide to the Medical Literature
  • Vaccination
  • Venous Thromboembolism
  • Veterans Health
  • Women's Health
  • Workflow and Process
  • Wound Care, Infection, Healing
  • Register for email alerts with links to free full-text articles
  • Access PDFs of free articles
  • Manage your interests
  • Save searches and receive search alerts

IMAGES

  1. The Systematic Review Process

    systematic review method guidelines

  2. Systematic reviews

    systematic review method guidelines

  3. 10 Steps to Write a Systematic Literature Review Paper in 2023

    systematic review method guidelines

  4. How to Conduct a Systematic Review

    systematic review method guidelines

  5. Guidelines for performing Systematic Reviews

    systematic review method guidelines

  6. A Step by Step Guide for Conducting a Systematic Review

    systematic review method guidelines

VIDEO

  1. SYSTEMATIC AND LITERATURE REVIEWS

  2. Methodology of the Study

  3. Part 1: Reasons for a systematic review protocol

  4. #maa #shortsvideo #ammajanammajan

  5. Systematic Review for Beginners

  6. What is a Systematic Review?

COMMENTS

  1. The PRISMA 2020 statement: an updated guideline for reporting ...

    The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement, published in 2009, was designed to help systematic reviewers transparently report why the review was done, what the authors did, and what they found. Over the past decade, advances in systematic review methodology and terminology have necessitated an update to the guideline. The PRISMA 2020 statement ...

  2. Guidance on Conducting a Systematic Literature Review

    The objective of this article is to provide guidance on how to conduct systematic literature review. By surveying publications on the methodology of literature review, we summarize the typology of literature review, describe the procedures for conducting the review, and provide tips to planning scholars.

  3. Cochrane Handbook for Systematic Reviews of Interventions

    The Cochrane Handbook for Systematic Reviews of Interventions is the official guide that describes in detail the process of preparing and maintaining Cochrane systematic reviews on the effects of healthcare interventions. All authors should consult the Handbook for guidance on the methods used in Cochrane systematic reviews.

  4. Steps of a Systematic Review

    Recommended readings: Muka, T., Glisic, M., Milic, J., Verhoog, S., Bohlius, J., Bramer, W., ... & Franco, O. H. (2020). A 24-step guide on how to design, conduct, and successfully publish a systematic review and meta-analysis in medical research. European Journal of Epidemiology, 35 (1), 49-60. Choi, A. R., Cheng, D. L., & Greenberg, P. B. (2019).

  5. How to Do a Systematic Review: A Best Practice Guide for ...

    Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of e …

  6. Five steps to conducting a systematic review

    A review earns the adjective systematic if it is based on a clearly formulated question, identifies relevant studies, appraises their quality and summarizes the evidence by use of explicit methodology. It is the explicit and systematic approach that distinguishes systematic reviews from traditional reviews and commentaries.

  7. How to Do a Systematic Review: A Best Practice Guide ...

    How to Do a Systematic Review: A Best Practice Guide for Conducting and Reporting Narrative Reviews, Meta-Analyses, and Meta-Syntheses Annual Review of Psychology Vol. 70:747-770 (Volume publication date January 2019) First published as a Review in Advance on August 8, 2018 https://doi.org/10.1146/annurev-psych-010418-102803

  8. Methods

    Finding What Works in Health Care: Standards for Systematic Reviews. Washington, DC: National Academies . Cochrane Handbook of Systematic Reviews of Interventions, version 6 (2019) Center for Reviews and Dissemination (University of York, England) (2009). Systematic Reviews: CRD's guidance for undertaking systematic reviews in health care.

  9. A step by step guide for conducting a systematic review and meta

    To solve those hindrances, this methodology study aimed to provide a step-by-step approach mainly for beginners and junior researchers, in the field of tropical medicine and other health care fields, on how to properly conduct a SR/MA, in which all the steps here depicts our experience and expertise combined with the already well-known and accep...

  10. Guidance to best tools and practices for systematic reviews

    Methods and guidance to produce a reliable evidence synthesis. Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table (Table1). 1).They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and ...

  11. The PRISMA 2020 statement: an updated guideline for reporting ...

    Over the past decade, advances in systematic review methodology and terminology have necessitated an update to the guideline. The PRISMA 2020 statement replaces the 2009 statement and includes new reporting guidance that reflects advances in methods to identify, select, appraise, and synthesise studies. The structure and presentation of the ...

  12. How to Undertake an Impactful Literature Review: Understanding Review

    The systematic literature review (SLR) is one of the important review methodologies which is increasingly becoming popular to synthesize literature in any discipline in general and management in particular. In this article, we explain the SLR methodology and provide guidelines for performing and documenting these studies.

  13. Guidance to best tools and practices for systematic reviews

    Part 1. The state of evidence synthesis Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice.

  14. Systematic Review

    A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer. Example: Systematic review

  15. How to Undertake an Impactful Literature Review: Understanding Review

    Important aspects of a systematic literature review (SLR) include a structured method for conducting the study and significant transparency of the approaches used for summarizing the literature (Hiebl, 2023).The inspection of existing scientific literature is a valuable tool for (a) developing best practices and (b) resolving issues or controversies over a single study (Gupta et al., 2018).

  16. Guidelines for writing a systematic review

    A Systematic Review (SR) is a synthesis of evidence that is identified and critically appraised to understand a specific topic. SRs are more comprehensive than a Literature Review, which most academics will be familiar with, as they follow a methodical process to identify and analyse existing literature ( Cochrane, 2022 ).

  17. How-to conduct a systematic literature review: A quick guide for

    Method details Overview. A Systematic Literature Review (SLR) is a research methodology to collect, identify, and critically analyze the available research studies (e.g., articles, conference proceedings, books, dissertations) through a systematic procedure [12].An SLR updates the reader with current literature about a subject [6].The goal is to review critical points of current knowledge on a ...

  18. The PRISMA 2020 statement: an updated guideline ...

    The PRISMA 2020 items are relevant for mixed-methods systematic reviews (which include quantitative and qualitative studies), but reporting guidelines addressing the presentation and synthesis of qualitative data should also be consulted [39, 40]. PRISMA 2020 can be used for original systematic reviews, updated systematic reviews, or ...

  19. Systematic Reviews: Step 8: Write the Review

    The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) is a 27-item checklist used to improve transparency in systematic reviews. These items cover all aspects of the manuscript, including title, abstract, introduction, methods, results, discussion, and funding. The PRISMA checklist can be downloaded as a PDF or Word file.

  20. PRISMA

    PRISMA is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses. PRISMA primarily focuses on the reporting of reviews evaluating the effects of interventions, but can also be used as a basis for reporting systematic reviews with objectives other than evaluating interventions (e.g. evaluating aetiology ...

  21. PDF The PRISMA 2020 statement: an updated guideline for reporting ...

    mixed-methods systematic reviews (which include quantitative and qualitative studies), but reporting guidelines addressing the presentation and synthesis of qualitative data should also be consulted [39, 40]. PRISMA 2020 can be used for original systematic reviews, updated systematic reviews, or continually updated ("living") systematic reviews.

  22. Home

    A systematic review is a literature review that gathers all of the available evidence matching pre-specified eligibility criteria to answer a specific research question. It uses explicit, systematic methods, documented in a protocol, to minimize bias, provide reliable findings, and inform decision-making. ¹ ²

  23. Evidence Syntheses and Systematic Reviews: Overview

    Systematic Review: a comprehensive literature synthesis on a specific research question, typically requiring a team; a systematic, exhaustive, and comprehensive search of all available evidence ... Provide guidance on which methodology best suits your goals; recommend databases and other information sources for searching.

  24. An overview of methodological approaches in systematic reviews

    Evidence synthesis is a prerequisite for knowledge translation [1]. A well conducted systematic review (SR), often in conjunction with meta-analyses (MA) when appropriate, is considered the "gold standard" of methods for synthesizing evidence related to a topic of interest [2]. The central strength of an SR is the transparency of the methods used to ...

  25. 'It depends': what 86 systematic reviews tell us about what strategies

    The effectiveness of interventions designed to increase the uptake of clinical practice guidelines and best practices among musculoskeletal professionals: a systematic review. BMC Health Serv Res. 2018;18:2-11. ... Pawson R, Greenhalgh T, Harvey G, et al. Realist review: a new method of systematic review designed for complex policy ...

  26. Comparison of clinical and radiological outcomes for the anterior and

    This protocol has been designed according to the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) [28, 29]. The design and method have been formed through discussion between experts in the management of DDH and experts in the methodology of systematic reviews.

  27. Effect of exercise for depression: systematic review and network meta

    Objective To identify the optimal dose and modality of exercise for treating major depressive disorder, compared with psychotherapy, antidepressants, and control conditions. Design Systematic review and network meta-analysis. Methods Screening, data extraction, coding, and risk of bias assessment were performed independently and in duplicate. Bayesian arm-based, multilevel network meta ...

  28. Integrating large language models in systematic reviews: a framework

    Large language models (LLMs) may facilitate and expedite systematic reviews, although the approach to integrate LLMs in the review process is unclear. This study evaluates GPT-4 agreement with human reviewers in assessing the risk of bias using the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool and proposes a framework for integrating LLMs into systematic reviews.

  29. Interventions for Optimization of Guideline-Directed Medical Therapy

    Key Points. Question What are the most effective interventions for optimization of guideline-directed medical therapy (GDMT) in clinical practice? Findings In this systematic review, interdisciplinary heart failure clinics were the most consistently effective for uptitration of GDMT. Other types of interventions, including audits, electronic health record alerts, or education-based ...