A CLI tool that automates literature research, taking one or more research questions through to curated, ranked, and exported paper sets with structured reports.
- Generates search facets and academic queries from one or more research questions
- Discovers candidates from Semantic Scholar and OpenAlex
- Screens and analyzes papers with an LLM through LiteLLM
- Supports citation graph expansion for frequently referenced works
- Ranks papers and exports reports, references, JSON data, PDFs, and metrics
- Supports robust resume via a saved `state.json` checkpoint
- Use `discovery_sources = ["s2", "openalex"]` for broader coverage.
- Candidates are deduplicated across sources, and source provenance is tracked.
- Optional expansion stage adds highly cross-referenced papers after ranking.
- Configure with `expand_citations` and `min_cross_refs`.
- Export top papers to Zotero user or group libraries.
- Supports collection assignment, tags, and PDF attachment when available.
- Bring your own PDFs with `--inject-pdfs` or `inject_pdf_dir`.
- Files are matched by `{paper_id}.pdf` or DOI-based filenames.
- Every run writes `metrics.json` with stage timings and aggregate counts.
- Includes a source breakdown plus PDF availability and usage metrics.
- Improved resume reliability from `state.json` checkpoints.
- Safer state persistence with atomic writes.
- Configurable extraction strategy supports token budgets for LLM context limits.
- Falls back gracefully when PDFs are unavailable or extraction is limited.
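The budget-based extraction strategy can be illustrated with a small sketch. This is not litresearch's implementation: the function names are hypothetical, the parameters only mirror the `pdf_token_budget`, `pdf_first_pages`, `pdf_last_pages`, and `abstract_fallback` settings from the example config, and it assumes page text has already been extracted and that one whitespace-separated word counts as one token.

```python
from typing import Optional


def select_pages(pages: list[str], first_pages: int = 4, last_pages: int = 2) -> list[str]:
    """Keep the first `first_pages` and last `last_pages` pages, without overlap."""
    if len(pages) <= first_pages + last_pages:
        # Short document: every page fits, so nothing needs to be skipped.
        return list(pages)
    tail = pages[-last_pages:] if last_pages > 0 else []
    return pages[:first_pages] + tail


def truncate_to_budget(text: str, token_budget: int) -> str:
    """Trim text to roughly `token_budget` tokens.

    Assumption: one whitespace-separated word counts as one token. Real
    tokenizers usually produce more tokens than words, so this is a rough cap.
    """
    return " ".join(text.split()[:token_budget])


def extract_for_llm(
    pages: list[str],
    abstract: Optional[str],
    token_budget: int = 4000,
    first_pages: int = 4,
    last_pages: int = 2,
    abstract_fallback: bool = True,
) -> Optional[str]:
    """Return budgeted PDF text, or the abstract when no usable text was extracted."""
    selected = [p for p in select_pages(pages, first_pages, last_pages) if p.strip()]
    if not selected:
        # No PDF text (missing PDF or failed extraction): fall back if allowed.
        return abstract if abstract_fallback and abstract else None
    return truncate_to_budget("\n".join(selected), token_budget)
```

Keeping the first and last pages is a cheap heuristic: introductions and conclusions tend to carry the most summary-level content per token.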
Install with uv:

```bash
uv pip install litresearch
```

For local development:

```bash
uv sync
uv run nox
```

- Set an LLM API key for a LiteLLM-supported provider:

```bash
export OPENAI_API_KEY=your_key_here
# or
export ANTHROPIC_API_KEY=your_key_here
```

- Optionally set a Semantic Scholar key for better rate limits:

```bash
export S2_API_KEY=your_key_here
```

- Copy the example config and tune defaults:

```bash
cp litresearch.toml.example litresearch.toml
```

- Run the pipeline:

```bash
litresearch run "What is the impact of large language models on software engineering?"
```

- Inspect the output directory:
```text
output/
  report.md
  paper_analyses.md
  references.bib
  references.ris
  data.json
  metrics.json
  papers/
  state.json
```
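The `state.json` checkpoint is persisted with atomic writes. A common way to do that in Python is to write to a temporary file in the same directory and then `os.replace` it over the target, so an interrupted write never leaves a half-written checkpoint. The sketch below shows that general pattern under the assumption that the state is a JSON-serializable dict; `save_state_atomic` and `load_state` are hypothetical names, not litresearch's API.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Union


def save_state_atomic(state: dict, path: Union[str, os.PathLike]) -> None:
    """Write `state` as JSON so readers see either the old file or the new one."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so os.replace() is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=f".{target.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes reach disk before the swap
        os.replace(tmp_name, target)  # atomic on POSIX; overwrites any existing file
    except BaseException:
        # On failure, remove the partial temp file; the previous checkpoint is untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def load_state(path: Union[str, os.PathLike]) -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```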
Run one or more research questions:

```bash
litresearch run \
  "How do large language models affect developer productivity?" \
  "What evidence exists about code quality impacts?"
```

Override settings from the CLI:

```bash
litresearch run \
  "How do LLMs affect software engineering?" \
  --model anthropic/claude-sonnet-4-20250514 \
  --top-n 10 \
  --threshold 50 \
  --output-dir runs/llm-se \
  --overwrite
```

Resume an interrupted run:

```bash
litresearch resume output/state.json
```

Inject local PDFs for papers you already have:

```bash
litresearch run "Your research question" --inject-pdfs /path/to/pdfs
```

Inspect current configuration:

```bash
litresearch config
```

Settings load in this order (highest precedence first):
- CLI flags
- Environment variables
- `litresearch.toml`
- Built-in defaults
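The precedence order above can be sketched as a layered merge in which later layers win. This is an illustration, not litresearch's config loader: `resolve_settings` is hypothetical, the defaults are copied from the example config (the real built-in defaults may differ), environment variable names are assumed to be the upper-cased setting names (matching the documented `S2_TIMEOUT`, `SCREENING_THRESHOLD`, and similar), and CLI flags are assumed to arrive already mapped to setting names (for example `--top-n` to `top_n`).

```python
import os
from typing import Any, Mapping, Optional

# Illustrative defaults copied from the example litresearch.toml (assumption).
DEFAULTS: dict[str, Any] = {
    "top_n": 20,
    "screening_threshold": 60,
    "screening_top_percent": 0.3,
    "screening_selection_mode": "top_percent",
    "s2_timeout": 10,
    "s2_requests_per_second": 1.0,
}

# Settings with a documented environment variable (setting name upper-cased).
ENV_SETTINGS = {
    "screening_threshold",
    "screening_top_percent",
    "screening_selection_mode",
    "s2_timeout",
    "s2_requests_per_second",
}


def _coerce(raw: str, default: Any) -> Any:
    """Convert an environment string to the type of the default value."""
    if isinstance(default, bool):  # check bool before int: bool subclasses int
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, float):
        return float(raw)
    return raw


def resolve_settings(
    toml_values: Mapping[str, Any],
    cli_flags: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
    defaults: Mapping[str, Any] = DEFAULTS,
) -> dict[str, Any]:
    """Merge layers from lowest to highest precedence; later layers win."""
    env = os.environ if environ is None else environ
    merged = dict(defaults)
    merged.update(toml_values)  # litresearch.toml overrides built-in defaults
    for name in ENV_SETTINGS:  # environment variables override the TOML file
        raw = env.get(name.upper())
        if raw is not None:
            merged[name] = _coerce(raw, defaults.get(name))
    # CLI flags win; None means "flag not given".
    merged.update({k: v for k, v in cli_flags.items() if v is not None})
    return merged
```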
Supported environment variables:

- `OPENAI_API_KEY`
- `ANTHROPIC_API_KEY`
- `OPENROUTER_API_KEY`
- `S2_API_KEY`
- `ZOTERO_API_KEY`
- `S2_TIMEOUT`
- `S2_REQUESTS_PER_SECOND`
- `SCREENING_SELECTION_MODE`
- `SCREENING_TOP_PERCENT`
- `SCREENING_TOP_K`
- `SCREENING_THRESHOLD`
Start from the full example config:

```bash
cp litresearch.toml.example litresearch.toml
```

Key options include:

```toml
default_model = "openai/gpt-4o-mini"
llm_timeout = 120
max_retries = 3
retry_base_delay = 1.0
discovery_sources = ["s2"]
screening_selection_mode = "top_percent"
screening_top_percent = 0.3
screening_threshold = 60
top_n = 20
max_results_per_query = 20
expand_citations = false
min_cross_refs = 3
zotero_export = false
s2_timeout = 10
s2_requests_per_second = 1.0
pdf_extraction_mode = "budget"
pdf_token_budget = 4000
pdf_first_pages = 4
pdf_last_pages = 2
abstract_fallback = true
# inject_pdf_dir = "/path/to/pdfs"
output_dir = "output"
```

Screening selection modes:
- `top_percent` (default): deep-analyze the top share of screened papers globally, as set by `screening_top_percent`
- `top_k`: deep-analyze the top K screened papers globally
- `threshold`: deep-analyze papers scoring `>= screening_threshold`
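The three selection modes can be sketched as one function over (paper ID, screening score) pairs. This is a hypothetical illustration, not litresearch's code: rounding the `top_percent` share up is an assumption, and the `top_k` default of 10 is made up (the example config does not show a `top_k` value).

```python
import math
from typing import Sequence


def select_for_analysis(
    scored: Sequence[tuple[str, float]],
    mode: str = "top_percent",
    top_percent: float = 0.3,
    top_k: int = 10,
    threshold: float = 60,
) -> list[str]:
    """Pick paper IDs for deep analysis from (paper_id, screening_score) pairs."""
    # Rank globally by score, highest first (ties keep their input order).
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if mode == "top_percent":
        # Assumption: round up, so any positive share of a non-empty list keeps one paper.
        count = math.ceil(len(ranked) * top_percent)
        return [pid for pid, _ in ranked[:count]]
    if mode == "top_k":
        return [pid for pid, _ in ranked[:top_k]]
    if mode == "threshold":
        return [pid for pid, score in ranked if score >= threshold]
    raise ValueError(f"unknown screening_selection_mode: {mode!r}")
```

The relative modes (`top_percent`, `top_k`) bound the cost of deep analysis regardless of how scores are calibrated, while `threshold` keeps every paper above a fixed quality bar and lets the count vary.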
Semantic Scholar tuning:

- `s2_timeout`: request timeout in seconds
- `s2_requests_per_second`: global request rate cap across S2 endpoints
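A global cap shared across endpoints behaves like a single limiter that every request passes through. The sketch below is a hypothetical `RateLimiter`, not litresearch's client: each call reserves the next free time slot under a lock and then sleeps outside the lock. The injectable `clock` and `sleep` parameters exist only to make it testable.

```python
import threading
import time
from typing import Callable


class RateLimiter:
    """Shared cap of `requests_per_second` across every caller (e.g. all S2 endpoints)."""

    def __init__(
        self,
        requests_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self._interval = 1.0 / requests_per_second
        self._next_slot = float("-inf")  # earliest time the next request may start
        self._lock = threading.Lock()
        self._clock = clock
        self._sleep = sleep

    def acquire(self) -> float:
        """Block until this caller's slot arrives; return the seconds waited."""
        with self._lock:
            now = self._clock()
            start = max(now, self._next_slot)
            self._next_slot = start + self._interval  # reserve the following slot
        wait = start - now
        if wait > 0:
            self._sleep(wait)  # sleep outside the lock so other callers can reserve slots
        return wait
```

With `s2_requests_per_second = 1.0`, one shared `RateLimiter(1.0)` whose `acquire()` is called before every S2 request would space calls at least one second apart, whichever endpoint they target.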
Discovery tuning:

- `discovery_sources`: choose `s2`, `openalex`, or both
- `openalex_email`: optional email for OpenAlex polite pool rate limits
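When both sources are enabled, candidates are deduplicated and their provenance is tracked. One plausible way to do that is to key each record by a normalized DOI, falling back to a normalized title, and to union the source labels of merged records. The `Candidate` type and both functions below are hypothetical and are not litresearch's data model.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Candidate:
    title: str
    doi: Optional[str] = None
    sources: set[str] = field(default_factory=set)  # provenance, e.g. {"s2", "openalex"}


def dedup_key(candidate: Candidate) -> str:
    """Prefer a normalized DOI; fall back to a normalized title."""
    if candidate.doi:
        doi = candidate.doi.strip().lower()
        for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
            if doi.startswith(prefix):
                doi = doi[len(prefix):]
        return "doi:" + doi
    # Lowercase and collapse punctuation, underscores, and whitespace (Unicode-aware).
    return "title:" + re.sub(r"[\W_]+", " ", candidate.title.lower()).strip()


def deduplicate(candidates: Iterable[Candidate]) -> list[Candidate]:
    """Merge duplicates, keeping the first record and the union of its sources."""
    merged: dict[str, Candidate] = {}
    for cand in candidates:
        key = dedup_key(cand)
        if key in merged:
            merged[key].sources |= cand.sources
        else:
            # Copy so the caller's objects are never mutated.
            merged[key] = Candidate(cand.title, cand.doi, set(cand.sources))
    return list(merged.values())
```

This simple keying does not merge a record that has a DOI with a copy that lacks one; a fuller implementation would also compare titles across that boundary.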
Citation expansion tuning:

- `expand_citations`: enable or disable the expansion stage
- `min_cross_refs`: minimum number of citation graph references a paper needs to be included
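The expansion stage can be pictured as counting how many ranked papers reference each outside work and keeping those at or above `min_cross_refs`. This is a hypothetical sketch: whether litresearch counts outgoing references, incoming citations, or both is an assumption here, and `expand_citations` is an illustrative function, not the tool's API.

```python
from collections import Counter
from typing import Mapping, Sequence


def expand_citations(
    ranked_ids: Sequence[str],
    references: Mapping[str, Sequence[str]],
    min_cross_refs: int = 3,
) -> list[str]:
    """Return papers outside the ranked set cited by at least `min_cross_refs` ranked papers."""
    ranked = set(ranked_ids)
    counts: Counter = Counter()
    for pid in ranked:
        # set(): a paper that cites the same work twice still counts once.
        for ref in set(references.get(pid, ())):
            if ref not in ranked:
                counts[ref] += 1
    hits = [ref for ref, n in counts.items() if n >= min_cross_refs]
    # Most cross-referenced first; ties broken by ID for a stable order.
    return sorted(hits, key=lambda ref: (-counts[ref], ref))
```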
Zotero export tuning:

- `zotero_export`: enable export integration
- `zotero_library_id`, `zotero_library_type`, `zotero_collection_key`, `zotero_tag`
Output files:

- `report.md`: main literature review report with research questions, search summary, top papers, and synthesis
- `paper_analyses.md`: detailed per-paper analysis for all analyzed papers
- `references.bib`: BibTeX for ranked papers when citation data is available
- `references.ris`: RIS export for citation managers
- `data.json`: machine-readable export of the pipeline state
- `metrics.json`: per-stage timings and aggregate run metrics
- `papers/`: downloaded open-access PDFs for ranked papers
- `state.json`: resumable pipeline checkpoint
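PDFs supplied through `--inject-pdfs` or `inject_pdf_dir` are matched by `{paper_id}.pdf` or a DOI-based filename. The exact DOI naming rule is not documented here, so the sketch below assumes a hypothetical one (lowercase DOI with `/` replaced by `_`); both functions are illustrative, not litresearch's matcher.

```python
from pathlib import Path
from typing import Optional, Union


def doi_to_filename(doi: str) -> str:
    """Hypothetical DOI naming rule: lowercase, '/' replaced by '_', plus '.pdf'."""
    return doi.strip().lower().replace("/", "_") + ".pdf"


def find_injected_pdf(
    pdf_dir: Union[str, Path], paper_id: str, doi: Optional[str] = None
) -> Optional[Path]:
    """Return a local PDF for a paper, trying {paper_id}.pdf before the DOI-based name."""
    directory = Path(pdf_dir)
    candidates = [directory / f"{paper_id}.pdf"]
    if doi:
        candidates.append(directory / doi_to_filename(doi))
    for path in candidates:
        if path.is_file():
            return path
    return None  # no injected PDF for this paper
```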
For development:

```bash
uv run nox
uv run litresearch --help
```

v1.0.0 delivers a production-ready core workflow for automated literature research, including multi-source discovery, ranking, export, and operational telemetry.