feat: add DoclingServeConverter integration #3173

Open
SyedShahmeerAli12 wants to merge 2 commits into deepset-ai:main from SyedShahmeerAli12:feat/docling-serve-integration

@SyedShahmeerAli12
Contributor

Summary

Adds a DoclingServeConverter component that converts documents using a running docling-serve HTTP server, without any heavy ML dependencies (no PyTorch required).

  • Accepts URLs, local file paths, and ByteStream sources
  • Supports MARKDOWN, TEXT, and JSON export formats
  • Optional API key authentication via Haystack Secret
  • Both synchronous (run) and asynchronous (arun) execution
  • 27 unit tests, all passing
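
To make the export-format bullet concrete, here is a minimal sketch of how the three formats could map onto a docling-serve request option and response field. It is not the code in this PR: the `ExportType` enum, the helper names, the `to_formats` values, and the `md_content` / `text_content` / `json_content` response keys are all assumptions for illustration and should be checked against the docling-serve version in use.

```python
import json
from enum import Enum
from typing import Any


class ExportType(str, Enum):
    """Hypothetical enum mirroring the three export formats named above."""

    MARKDOWN = "markdown"
    TEXT = "text"
    JSON = "json"


# Assumed mapping: export type -> (value for the "to_formats" option,
# key under "document" in the response body).
_FORMAT_INFO = {
    ExportType.MARKDOWN: ("md", "md_content"),
    ExportType.TEXT: ("text", "text_content"),
    ExportType.JSON: ("json", "json_content"),
}


def to_format_option(export_type: ExportType) -> str:
    """Return the format string to request from the server."""
    return _FORMAT_INFO[export_type][0]


def extract_content(response_body: dict[str, Any], export_type: ExportType) -> str:
    """Pick the converted content for the requested format out of a response body."""
    key = _FORMAT_INFO[export_type][1]
    content = response_body.get("document", {}).get(key)
    if content is None:
        raise ValueError(f"Response has no '{key}' field for {export_type.value} export")
    # JSON exports arrive as a nested object, so serialize them to a string.
    return content if isinstance(content, str) else json.dumps(content)
```

Keeping the mapping in one table means a new export format only needs one extra entry.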

Closes #2960

Test plan

  • 27 unit tests passing
  • Lint clean (ruff check, ruff format)
  • Integration test requires a running docling-serve instance (pytest -m integration)

Adds a new `docling-serve-haystack` integration with a `DoclingServeConverter`
component that converts documents via a remote DoclingServe HTTP server instead
of loading heavy ML dependencies locally (no PyTorch required).

- Supports URLs, local file paths, and ByteStream sources
- Export formats: Markdown (default), plain text, JSON
- Both sync `run()` and async `arun()` methods
- Configurable conversion options, timeout, and optional API key auth
- Full unit test suite (mocked httpx) + integration test markers
- CI workflow, labeler, coverage comment, and root README table entry

Closes deepset-ai#2960

Adds a new DoclingServeConverter component that converts documents
by sending them to a running docling-serve HTTP server. Supports
local files, URLs, and ByteStreams; markdown, text, and JSON export
formats; optional API key authentication; and both sync (run) and
async (arun) execution.

Closes deepset-ai#2960

@SyedShahmeerAli12 requested a review from a team as a code owner April 16, 2026 20:12
@SyedShahmeerAli12 requested review from julian-risch and removed request for a team April 16, 2026 20:12
@github-actions bot added the topic:CI and type:documentation labels Apr 16, 2026
@SyedShahmeerAli12
Contributor Author

SyedShahmeerAli12 commented Apr 16, 2026

Hey @julian-risch,
this implements the DoclingServeConverter as described in #2960.

Key design decisions:

  • Used httpx instead of requests for native async support (arun())
  • api_key uses Haystack Secret class for secure serialization
  • convert_options is a single dict instead of individual params, which is cleaner and forward-compatible with new
    docling-serve options
  • Sources are base64-encoded and sent as JSON to /v1/convert/source (avoids multipart complexity)
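
For reviewers, the last point can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `build_source` and `build_payload` are hypothetical helper names, raw `bytes` stands in for a Haystack ByteStream, and the `kind` / `url` / `base64_string` / `filename` / `options` field names are my reading of the docling-serve request schema and may differ between server versions.

```python
import base64
from pathlib import Path
from typing import Any, Optional, Union

Source = Union[str, Path, bytes]


def build_source(source: Source, filename: str = "document") -> dict[str, str]:
    """Turn one source into a JSON-serializable entry (field names are assumed)."""
    # URLs are passed through so the server fetches them itself.
    if isinstance(source, str) and source.startswith(("http://", "https://")):
        return {"kind": "http", "url": source}
    if isinstance(source, bytes):
        data, name = source, filename
    else:
        path = Path(source)
        data, name = path.read_bytes(), path.name
    # File contents travel as base64 text inside the JSON body, so no multipart upload is needed.
    return {
        "kind": "file",
        "base64_string": base64.b64encode(data).decode("ascii"),
        "filename": name,
    }


def build_payload(
    sources: list[Source], convert_options: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Assemble the JSON body that would be POSTed to /v1/convert/source."""
    return {
        "options": dict(convert_options or {}),
        "sources": [build_source(s) for s in sources],
    }
```

Because convert_options is copied into the body as-is, an option added to docling-serve later would pass through without a change to the component's signature. The trade-off of base64 over multipart is a request body roughly one third larger.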

Labels

topic:CI type:documentation Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

Add new docling-serve integration

1 participant