Summary
In a single-node Quickwit 0.9.0-nightly deployment with a Postgres metastore and an SQS file source running at ~1500 files/minute, the OTLP HTTP endpoint (POST /api/v1/otlp/v1/logs) hangs indefinitely. Quickwit logs report:
ERROR quickwit_serve::otlp_api::rest_handler:
otlp internal error: status: 'The service is currently unavailable',
self: "ingest service is unavailable (no shards available)"
ERROR quickwit_ingest::ingest_v2::router:
ingest request should not timeout as there is a timeout on independent ingest requests too.
timeout after 35000
ERROR quickwit_actors::actor_handle: actor-timeout actor="ControlPlane-..."
But the chitchat state shows an _ingest-source shard IS created and assigned to the indexer, and ingester.status=ready. The router cannot see the shard as assignable.
Environment
| Item | Value |
| --- | --- |
| Image | quickwit/quickwit:v0.9.0-rc (published 2026-04-19 on Docker Hub) |
| Deployment | Single-node via docker-compose, command: run |
| Host | AWS EC2, 4 vCPUs, not resource-constrained (CPU/memory under-utilised) |
| Target | aarch64-unknown-linux-gnu |
| Metastore | PostgreSQL |
| Storage | S3 (s3://…/indexes/, region eu-west-1) |
| enabled_services (chitchat) | metastore,searcher,control_plane,janitor,indexer |
| ingester.status (chitchat) | ready |
| readiness (chitchat) | READY |
(Yes, I did restart the container.)
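The chitchat values above can be re-checked at any time from the node's cluster state endpoint; a minimal sketch (same port 7280 as the repro below, jq is only used for pretty-printing):

# Dump the cluster (chitchat) state and look for enabled_services,
# ingester.status and readiness in the node's key-value entries.
curl -s http://localhost:7280/api/v1/cluster | jq .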
Workload
- Noisy index: ***-logs with an SQS file source (***-sqs-filesource) consuming S3 notifications.
- Sustained rate: ~1500 files per minute. Each S3 file becomes a distinct shard in the metastore:
  INFO quickwit_metastore::metastore::postgres::metastore:
  opened shard index_uid=***-logs:01KNF3635YNGTBZCWQEY6943JP
  source_id=***-sqs-filesource
  shard_id=s3://***-logs-prod/.../1776710923-xxxxx.ndjson.gz
  leader_id= follower_id=None
  at roughly 15–20 log lines per second.
- Target index for OTLP: otel-logs-v0_9, ingest_settings.min_shards=1, _ingest-source (ingest-v2) present alongside _ingest-api-source.
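Both sources can be confirmed by dumping the index metadata; a minimal sketch (the .sources jq path is an assumption about the response shape):

# List the sources attached to the OTLP target index; both
# _ingest-source and _ingest-api-source should appear.
curl -s http://localhost:7280/api/v1/indexes/otel-logs-v0_9 | jq '.sources'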
Recurring error pattern in Quickwit logs
ERROR quickwit_actors::actor_handle: actor-timeout actor="ControlPlane-purple-6GTo"
ERROR quickwit_serve::otlp_api::rest_handler: otlp internal error: ... "ingest service is unavailable (no shards available)"
ERROR quickwit_ingest::ingest_v2::router: ingest request should not timeout... timeout after 35000
ERROR quickwit_indexing::source::queue_sources::shared_state: failed to prune shards error=TooManyRequests
ERROR quickwit_serve::rest: failed to serve connection: connection closed before message completed
The TooManyRequests from queue-sources correlates with the S3/SQS shard churn.
Direct repro against Quickwit (bypassing any proxy)
curl -i --max-time 20 -X POST 'http://localhost:7280/api/v1/otlp/v1/logs' \
-H 'content-type: application/x-protobuf' \
-H 'qw-otel-logs-index: otel-logs-v0_9' \
--data-binary @valid-otlp.bin
# curl: (28) Operation timed out after 20001 milliseconds with 0 bytes received
A malformed or truncated protobuf body (\x00, or the first 50/100 bytes of a valid body) returns 400 "failed to decode Protobuf message" instantly — proving the request reaches Quickwit and the parse path is fast. Only full valid bodies hang.
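For completeness, the truncated-body negative test looks like this (file names are placeholders for the same valid OTLP payload used above):

# Truncate a valid body and post it: Quickwit answers 400 immediately,
# which shows the request reaches the server and the decode path is fast.
head -c 50 valid-otlp.bin > truncated-otlp.bin
curl -i --max-time 20 -X POST 'http://localhost:7280/api/v1/otlp/v1/logs' \
  -H 'content-type: application/x-protobuf' \
  -H 'qw-otel-logs-index: otel-logs-v0_9' \
  --data-binary @truncated-otlp.bin
# HTTP/1.1 400 Bad Request: "failed to decode Protobuf message"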
Disabling the SQS file source unblocks OTLP
I temporarily disabled SQS file sources, then retried the Node OTLP client. OTLP started working within seconds — the log record landed in otel-logs-v0_9 on the first attempt.
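For anyone reproducing the before/after, the SQS source can be toggled without restarting the node; a sketch using the source toggle endpoint (index/source ids redacted as above; payload shape per my reading of the REST API docs, worth double-checking for your version):

# Disable the noisy SQS file source, retry OTLP, then re-enable it.
curl -X PUT 'http://localhost:7280/api/v1/indexes/***-logs/sources/***-sqs-filesource/toggle' \
  -H 'content-type: application/json' \
  -d '{"enable": false}'
# Re-enable afterwards with -d '{"enable": true}'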
After re-enabling the SQS source, the endpoint is currently still responsive; I will post any further updates.