otto-SR

Research Paper • June 2025

Automation of Systematic Reviews with Large Language Models

Introducing otto-SR: An end-to-end agentic workflow that achieves superhuman performance in systematic review automation, completing 12 work-years of research in just 2 days.

96.7%

Screening Sensitivity

vs 81.7% human performance

93.1%

Data Extraction Accuracy

vs 79.7% human performance

2 Days

Complete Cochrane Issue

12 reviews, ~12 work-years

146K

Citations Processed

Across 12 systematic reviews

Revolutionizing Evidence Synthesis

Systematic reviews are the foundation of evidence-based medicine, but they typically take over 16 months and cost $100,000+ to complete. otto-SR changes that.

The Challenge

Time-Intensive Process

Traditional systematic reviews take 16+ months to complete

Prone to Human Error

Dual human screening shows significant variability and misses eligible studies

Resource Intensive

Costs upwards of $100,000 and requires specialized expertise

The Solution

AI-Powered Automation

GPT-4.1 for screening, o3-mini-high for data extraction

Superhuman Accuracy

Outperforms human reviewers in both sensitivity and specificity

Rapid Processing

Complete systematic reviews in days, not months

How otto-SR Works

An end-to-end agentic workflow supporting both fully automated and human-in-the-loop systematic reviews

1. Literature Search

Comprehensive search across databases to capture all potentially relevant citations

RIS format upload
Multiple database support
Automated deduplication
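
As a rough illustration of the upload and deduplication step, the sketch below parses RIS text and drops duplicate citations. It is not otto-SR's implementation: the function names (parse_ris, dedup_key, deduplicate), the matching rule (DOI first, otherwise normalized title plus year), and the sample records are all assumptions made for this example.

```python
import re
from typing import Dict, List

# An RIS line is a two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

Record = Dict[str, List[str]]


def parse_ris(text: str) -> List[Record]:
    """Parse RIS text into records mapping each tag to its list of values."""
    records: List[Record] = []
    current: Record = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank or continuation lines in this simple sketch
        tag, value = match.groups()
        if tag == "ER":  # end-of-record marker
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    if current:
        records.append(current)
    return records


def dedup_key(record: Record) -> str:
    """Hypothetical matching rule: DOI if present, else normalized title + year."""
    doi = (record.get("DO") or [""])[0].lower().strip()
    if doi:
        return "doi:" + doi
    title = (record.get("TI") or record.get("T1") or [""])[0]
    normalized = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    year = (record.get("PY") or record.get("Y1") or [""])[0][:4]
    return f"title:{normalized}|{year}"


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record seen for each key."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up records for illustration only.
SAMPLE = """TY  - JOUR
TI  - Aspirin for primary prevention
PY  - 2020
DO  - 10.1000/example.1
ER  - 
TY  - JOUR
TI  - Aspirin for Primary Prevention.
PY  - 2020
DO  - 10.1000/EXAMPLE.1
ER  - 
TY  - JOUR
TI  - Statins in older adults
PY  - 2019
ER  - 
"""

parsed = parse_ris(SAMPLE)
unique = deduplicate(parsed)
```

A rule this simple will not match a record that carries a DOI against a copy of the same citation exported without one, so a production pipeline would likely need fuzzier matching.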

2. AI Screening

GPT-4.1-powered screening agent for abstract and full-text review

96.7% sensitivity
97.9% specificity
PDF to Markdown conversion
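
For readers less familiar with the two screening metrics, the sketch below shows how sensitivity and specificity are computed from include/exclude decisions checked against a reference standard. The function names and the toy decisions are assumptions for illustration and are not data from the study.

```python
from typing import Sequence, Tuple


def confusion_counts(truth: Sequence[bool], predicted: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp), where True means 'include the citation'."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    fp = sum(not t and p for t, p in zip(truth, predicted))
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly eligible citations that the screener included."""
    if tp + fn == 0:
        raise ValueError("no eligible citations in the reference standard")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly ineligible citations that the screener excluded."""
    if tn + fp == 0:
        raise ValueError("no ineligible citations in the reference standard")
    return tn / (tn + fp)


# Toy decisions for eight citations (illustrative only).
truth = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, False, True]

tp, fn, tn, fp = confusion_counts(truth, predicted)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

In screening, sensitivity is usually the more critical of the two, because a missed eligible study cannot be recovered at later stages, whereas a false inclusion can still be removed at full-text review.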

3. Data Extraction

o3-mini-high model for precise data extraction and analysis

93.1% accuracy
Structured data output
Meta-analysis ready
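
One way to picture "structured, meta-analysis ready" output is a typed record per study that serializes to JSON and feeds directly into an effect-size calculation. The sketch below is an assumption about what such a record could look like; the class names, field names, and numbers are hypothetical and do not describe otto-SR's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ArmResult:
    events: int  # participants with the outcome
    total: int   # participants randomized to the arm


@dataclass
class StudyExtraction:
    study_id: str
    outcome: str
    intervention: ArmResult
    control: ArmResult


def risk_ratio(record: StudyExtraction) -> float:
    """Risk in the intervention arm divided by risk in the control arm.

    Raises ZeroDivisionError if an arm has no participants or the control
    arm has zero events; a real pipeline would need a continuity correction.
    """
    risk_intervention = record.intervention.events / record.intervention.total
    risk_control = record.control.events / record.control.total
    return risk_intervention / risk_control


# Hypothetical extracted values, for illustration only.
record = StudyExtraction(
    study_id="example-2020",
    outcome="all-cause mortality",
    intervention=ArmResult(events=10, total=100),
    control=ArmResult(events=20, total=100),
)

payload = json.dumps(asdict(record), sort_keys=True)  # nested dataclasses become dicts
rr = risk_ratio(record)
```

Keeping arm-level counts rather than only a precomputed effect size lets downstream tools choose their own pooling model.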

Breakthrough Results

otto-SR demonstrated superhuman performance across multiple systematic review tasks

Cochrane Reproducibility Study

Reproduced and updated an entire issue of Cochrane reviews (n=12) in under 2 days

Studies correctly identified

64/64 (100%)

Median studies incorrectly excluded

0 (IQR 0-0.25)

Additional eligible studies found

54 studies

New statistically significant findings

2 reviews

Performance Comparison

otto-SR vs traditional dual human reviewers across key metrics

Screening Sensitivity (otto-SR vs Human)

otto-SR: 96.7%

H: 81.7%

Data Extraction Accuracy (otto-SR vs Human)

otto-SR: 93.1%

H: 79.7%

Processing Time (otto-SR vs Traditional)

otto-SR: 2 days

Traditional: 12 work-years

Research Team

A collaborative effort across leading institutions worldwide

Lead Authors

Christian Cao - University of Toronto

Rohit Arora - Harvard Medical School

Paul Cento - Independent Researcher

Key Contributors

Niklas Bobrovitz - University of Calgary

George Church - Harvard Medical School

David Moher - University of Ottawa

Institutions

University of Toronto

Harvard Medical School

University of Calgary

MIT

McGill University

+ 12 more institutions

Transform Your Research Process

otto-SR represents a major advancement in systematic review automation. Join the future of evidence synthesis.