Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
Pull request overview
Adds multi-bucket split sharding to Quickwit indexes by introducing extra_index_uris and persisting the chosen bucket per split (SplitMetadata.storage_uri) so search, merge, GC, and tooling can always resolve the correct storage location.
Changes:
- Add `extra_index_uris` to index config/template + metastore update flow, and persist per-split `storage_uri` with a fallback helper (`effective_storage_uri`).
- Update indexing, merge, search/list APIs, CLI, janitor, and garbage collection to read/write/delete splits using the per-split effective storage URI.
- Add round-robin bucket selection and an end-to-end integration test + docs updates.
Reviewed changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| quickwit/quickwit-serve/src/lib.rs | Treat indexes as file-backed if any configured index URI (primary or extra) uses file/ram storage. |
| quickwit/quickwit-search/src/search_job_placer.rs | Group jobs by (index_uid, storage_uri); refactor grouping helper to comparator-based API. |
| quickwit/quickwit-search/src/root.rs | Carry per-split storage_uri through search + fetch-docs job paths and leaf request building. |
| quickwit/quickwit-search/src/list_terms.rs | Route list-terms leaf requests per (index_uid, storage_uri) group. |
| quickwit/quickwit-search/src/list_fields.rs | Route list-fields leaf requests per (index_uid, storage_uri) group. |
| quickwit/quickwit-proto/src/codegen/quickwit/quickwit.metastore.rs | Add extra_index_uris field to UpdateIndexRequest codegen. |
| quickwit/quickwit-proto/protos/quickwit/metastore.proto | Add extra_index_uris to metastore UpdateIndexRequest proto. |
| quickwit/quickwit-metastore/src/tests/index.rs | Update metastore update-index tests to pass extra_index_uris. |
| quickwit/quickwit-metastore/src/split_metadata_version.rs | Extend split metadata v0.8 serialization with optional storage_uri. |
| quickwit/quickwit-metastore/src/split_metadata.rs | Add storage_uri to SplitMetadata + effective_storage_uri helper. |
| quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs | Deserialize and apply extra_index_uris during update-index. |
| quickwit/quickwit-metastore/src/metastore/mod.rs | Add (de)serialization support for extra_index_uris in UpdateIndexRequestExt. |
| quickwit/quickwit-metastore/src/metastore/index_metadata/mod.rs | Persist extra_index_uris updates in index metadata; add unit test. |
| quickwit/quickwit-metastore/src/metastore/file_backed/mod.rs | Deserialize and apply extra_index_uris during update-index. |
| quickwit/quickwit-metastore/src/metastore/file_backed/file_backed_index/mod.rs | Thread extra_index_uris through file-backed index config updates. |
| quickwit/quickwit-janitor/src/actors/garbage_collector.rs | Pass storage resolver through GC plumbing; adjust mocks. |
| quickwit/quickwit-janitor/src/actors/delete_task_service.rs | Build an IndexingSplitStore with multiple storages + selector for delete pipeline. |
| quickwit/quickwit-janitor/src/actors/delete_task_planner.rs | Build SearchJob using split effective storage URI. |
| quickwit/quickwit-janitor/src/actors/delete_task_pipeline.rs | Use IndexingSplitStore instead of a single Storage in delete pipeline. |
| quickwit/quickwit-integration-tests/src/tests/multi_bucket_tests.rs | New end-to-end integration test covering multi-bucket ingest + search. |
| quickwit/quickwit-integration-tests/src/tests/mod.rs | Register the new multi-bucket test module. |
| quickwit/quickwit-indexing/src/split_store/mod.rs | Export bucket selector API. |
| quickwit/quickwit-indexing/src/split_store/indexing_split_store.rs | Support multiple storages + per-split read/write routing using effective URI. |
| quickwit/quickwit-indexing/src/split_store/bucket_selector.rs | New round-robin bucket selector + tests. |
| quickwit/quickwit-indexing/src/models/split_attrs.rs | Initialize new SplitMetadata.storage_uri field. |
| quickwit/quickwit-indexing/src/mature_merge.rs | Resolve all configured storages and write merged outputs via selector. |
| quickwit/quickwit-indexing/src/lib.rs | Re-export split-store cache and selector helpers. |
| quickwit/quickwit-indexing/src/actors/uploader.rs | Select bucket per new split and persist SplitMetadata.storage_uri. |
| quickwit/quickwit-indexing/src/actors/merge_split_downloader.rs | Fetch splits using split metadata (effective storage URI). |
| quickwit/quickwit-indexing/src/actors/indexing_service.rs | Build multi-storage IndexingSplitStore for indexing pipelines. |
| quickwit/quickwit-indexing/src/actors/indexing_pipeline.rs | Remove direct Storage from params; rely on IndexingSplitStore. |
| quickwit/quickwit-index-management/src/index.rs | Validate connectivity for extra storages; pass resolver into GC flows. |
| quickwit/quickwit-index-management/src/garbage_collection.rs | Group deletions per effective storage URI; resolve per-bucket storage for bulk delete. |
| quickwit/quickwit-config/src/index_template/serialize.rs | Add extra_index_uris to index template (de)serialization. |
| quickwit/quickwit-config/src/index_template/mod.rs | Add extra_index_uris to templates + validation; propagate into index configs. |
| quickwit/quickwit-config/src/index_config/serialize.rs | Add extra_index_uris to index config schema; enforce “no removals” on update. |
| quickwit/quickwit-config/src/index_config/mod.rs | Add extra_index_uris field + helper all_index_uris; include in fingerprinting. |
| quickwit/quickwit-common/src/uri.rs | Add ordering to Protocol and Uri to enable grouping/sorting by URI. |
| quickwit/quickwit-cli/src/tool.rs | Resolve the correct storage URI for a specific split when extracting. |
| quickwit/quickwit-cli/src/lib.rs | Checklist now validates connectivity for extra index storages too. |
| docs/reference/rest-api.md | Document extra_index_uris in create/update index REST payloads. |
| docs/configuration/storage-config.md | Mention extra_index_uris as another place storage URIs can be used. |
| docs/configuration/index-config.md | Document extra_index_uris and the multi-bucket split sharding behavior/caveat. |
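
The grouping mentioned in the `search_job_placer.rs` and `garbage_collection.rs` rows could look roughly like the sketch below. It is not the PR's code: `SplitRef` and `group_by_storage` are hypothetical names, URIs are plain strings instead of Quickwit's `Uri` type, and all splits are assumed to belong to indexes sharing one primary URI.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified split descriptor; not the PR's actual types.
#[derive(Debug, Clone, PartialEq)]
pub struct SplitRef {
    pub index_uid: String,
    pub split_id: String,
    /// `None` means the split lives under the primary index URI.
    pub storage_uri: Option<String>,
}

/// Groups split IDs by `(index_uid, effective storage URI)` so that each
/// group can be handled with a single storage handle.
pub fn group_by_storage(
    splits: &[SplitRef],
    primary_index_uri: &str,
) -> BTreeMap<(String, String), Vec<String>> {
    let mut groups: BTreeMap<(String, String), Vec<String>> = BTreeMap::new();
    for split in splits {
        // Fall back to the primary URI when the split recorded no bucket.
        let uri = split.storage_uri.as_deref().unwrap_or(primary_index_uri);
        groups
            .entry((split.index_uid.clone(), uri.to_string()))
            .or_default()
            .push(split.split_id.clone());
    }
    groups
}
```

An ordered map is used here only to keep the group order deterministic; the PR itself may group differently.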
Comment on lines +502 to +513

    let failed_split_paths = all_storage_failures
        .iter()
        .map(|split_info| split_info.file_name.as_path())
        .collect::<Vec<_>>();
    error!(
        error=?bulk_delete_error.error,
        index_id=index_uid.index_id,
        storage_uri=%uri,
        "failed to delete split file(s) {:?} from storage",
        PrettySample::new(&failed_split_paths, 5),
    );
    combined_storage_error = Some(bulk_delete_error);

Summary
Splits can now be distributed across multiple storage buckets for a single index. A new `extra_index_uris` configuration option allows specifying additional storage URIs alongside the existing `index_uri`. New splits are written to buckets using a round-robin strategy, and each split records which bucket it was stored in so that reads, merges, and garbage collection work correctly regardless of how the list of URIs evolves over time.

Motivation

Previously, all splits for an index were stored under a single `index_uri`. This change enables spreading data across multiple buckets for improved write throughput, storage isolation, or operational flexibility.

Configuration

- `index_uri` remains required and acts as the primary storage location.
- `extra_index_uris` is optional (defaults to empty, so the change is fully backward compatible).

How it works

- `IndexingSplitStore` holds all resolved storages and a `BucketSelector` (round-robin by default). Each new split is assigned a target bucket before staging. The chosen URI is persisted in `SplitMetadata.storage_uri`.
- `SearchJob` and `FetchDocsJob` carry the per-split storage URI. Leaf requests are grouped by `(index_uid, storage_uri)` so splits in different buckets get separate requests. No proto changes were needed.
- `fetch_and_open_split` takes `&SplitMetadata` and resolves the correct bucket via `effective_storage_uri()`. Merged output splits are assigned a bucket by the selector.
- Existing splits have `storage_uri: None` and continue to be read from `index_uri`. No database migration is required: the field lives inside the existing `split_metadata_json` column.

Breaking changes

- Indexes that use `extra_index_uris` cannot be read by older Quickwit versions (the field is omitted from serialized JSON when empty, so indexes not using the feature are unaffected).
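
The bucket selection and fallback behavior described under "How it works" can be illustrated with a minimal sketch. It is not the PR's implementation: `RoundRobinSelector` and the simplified `SplitMetadata` below are hypothetical stand-ins, URIs are plain strings, and the sketch assumes the rotation covers `index_uri` followed by `extra_index_uris`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, simplified stand-in for the PR's `SplitMetadata`.
#[derive(Debug, Clone)]
pub struct SplitMetadata {
    pub split_id: String,
    /// `None` for splits that did not record a bucket.
    pub storage_uri: Option<String>,
}

impl SplitMetadata {
    /// Returns the recorded bucket, or the primary index URI as a fallback.
    pub fn effective_storage_uri<'a>(&'a self, index_uri: &'a str) -> &'a str {
        self.storage_uri.as_deref().unwrap_or(index_uri)
    }
}

/// Hypothetical round-robin selector over the primary URI and the extra URIs.
pub struct RoundRobinSelector {
    uris: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobinSelector {
    /// Assumption: the primary URI takes part in the rotation and comes first.
    pub fn new(index_uri: &str, extra_index_uris: &[&str]) -> Self {
        let mut uris = vec![index_uri.to_string()];
        uris.extend(extra_index_uris.iter().map(|uri| uri.to_string()));
        Self { uris, next: AtomicUsize::new(0) }
    }

    /// Picks the bucket for the next new split, cycling through all URIs.
    /// `uris` is never empty because it always contains the primary URI.
    pub fn select(&self) -> &str {
        let idx = self.next.fetch_add(1, Ordering::Relaxed) % self.uris.len();
        &self.uris[idx]
    }
}
```

Because the chosen URI is stored on the split itself, reads never need to consult the selector, so the list of extra URIs can grow without affecting splits that were already written.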