Skip to content

fix: support Curvine LanceDB clone and conditional writes#824

Merged
szbr9486 merged 2 commits into
CurvineIO:mainfrom
jlon:feat/lancedb-on-curvine-pr04
May 15, 2026
Merged

fix: support Curvine LanceDB clone and conditional writes#824
szbr9486 merged 2 commits into
CurvineIO:mainfrom
jlon:feat/lancedb-on-curvine-pr04

Conversation

@jlon
Copy link
Copy Markdown
Contributor

@jlon jlon commented May 14, 2026

What changed

  • adds Curvine-aware shallow clone_table wiring for curvine:// LanceDB databases
  • supports source version and source tag resolution for shallow clone targets
  • implements PutMode::Update with Curvine-backed conditional write locking and eTag checks
  • extends object-store semantic coverage for conditional updates, multipart interactions, and missing-object behavior
  • expands minicluster e2e coverage for LanceDB table lifecycle, clone-table flows, schema/version/tag behavior, and object-store conditional writes

Why

This is PR 4 in the LanceDB on Curvine stack. It builds on PR #823 by adding the LanceDB semantics that require conditional object updates and correct URI/path handling across clone operations.

Stack

Validation

  • cargo fmt --all
  • cargo clippy -p curvine-lancedb-rs --all-targets --all-features -- -D warnings
  • cargo test -p curvine-lancedb-rs
  • cargo clippy -p curvine-tests --test lancedb_object_store_e2e -- -D warnings

}
}

async fn release_object_write_lock(&self, guard: &ConditionalWriteLock) -> OsResult<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock acquisition/release pair has two correctness gaps that LanceDB conditional writes will hit in production:

  1. No ownership check on release. release_object_write_lock unconditionally sends LockType::UnLock against guard.path. If process A acquired the lock, hit the 120 s timeout doing nothing (or stalled on GC), and process B subsequently re-acquired the same lock and started its critical section, process A's release call will clear B's lock and let C charge in. The conditional-update etag check on disk still narrows the race window, but the staging+rename sequence is no longer the only writer between B's head() and B's rename(). Consider gating release on the same owner_id/pid tuple that acquired the lock (server-side compare-and-clear), or returning the previous lock state and refusing to clear a foreign owner.

  2. No TTL / lease, no crash recovery. conditional_write_lock writes acquire_time: 0 and there's no heartbeat or expiry. If the LanceDB process is kill -9'd (OOM, deploy roll, machine reboot) mid-put_update, the advisory lock at /.curvine/lancedb/locks/<md5> stays held forever from Curvine's point of view. Every subsequent conditional write to that key will hit Timed out waiting for Curvine object write lock after 120 s and never recover, and the lock files accumulate at /.curvine/lancedb/locks/ with no GC. At minimum this needs:

    • A reasonable TTL/lease on the lock (so server reaps abandoned holders).
    • A non-zero acquire_time so server-side reaping has a basis.
    • A periodic cleanup path for the lock-file directory, or a documented operator runbook for the leak.

These are blockers for using PutMode::Update as a real "this is safe to commit metadata on" primitive, which is the whole point of the conditional path.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked this against the Curvine server lock implementation and I do not think the two core risks apply to the current code path.

  • Release is owner-scoped: LockMeta::set_flock removes by key = client_id + owner_id, and this facade stores the same generated owner_id in the guard used for unlock. A stale A unlock cannot remove B’s later lock because B has a different owner key.
  • TTL/acquire_time are server-side: LockMeta::set_flock overwrites acquire_time = LocalTime::mills(), and MasterFilesystem::set_lock passes master.lock_expire_time_ms() (default 5m). The client sends 0, but that is not persisted for acquired locks.

The lock files themselves may remain as marker files under /.curvine/lancedb/locks, but the held lock state is stored in inode lock metadata and is expired by the existing Curvine lock path. I am leaving this unchanged in this PR.

Comment thread curvine-lancedb-rs/src/object_store.rs Outdated
.cloned()
.or_else(|| env::var(ClusterConf::ENV_CONF_FILE).ok())
.ok_or_else(missing_curvine_config_error)?;
Ok(format!("{CURVINE_SCHEME}${conf_path}"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two concerns about embedding the raw config-file path in the store prefix:

  1. Cache key fragility. Lance uses the prefix as the dedup key in the ObjectStoreRegistry. Two callers pointing at the same Curvine cluster via different conf paths — /etc/curvine/cluster.toml vs ~/.curvine/cluster.toml, or absolute vs symlinked, or the conf file in two working dirs of the same binary — will get two distinct CurvineObjectStore instances, double the Curvine client connections, and double the curvine_session() allocations. For an integration like this where the intent is "one store per cluster", consider deriving the prefix from the cluster's master endpoint(s) parsed out of the conf (or a hash of the resolved cluster identity) rather than the raw file path.

  2. Mandatory config at prefix-calculation time. Today calculate_object_store_prefix is called by Lance on essentially every URI parse, including paths that go through the registry but don't ultimately need a Curvine connection (read-only manifest probes, listing, etc.). Making the function fail with "Missing Curvine cluster configuration" when the env var isn't set is a hard coupling that surprised me reading clone_table in connection.rs; consider falling back to a stable curvine$<authority>/<path> identity when the conf isn't resolvable, and only failing in new_store where the configuration is actually used.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d6f4199d.

calculate_object_store_prefix no longer embeds the raw config-file path and no longer fails just because config is absent during prefix calculation. With a config file it parses the Curvine cluster config and derives the registry identity from the resolved master endpoints; without config it falls back to a stable URI-derived identity and leaves the real configuration failure to new_store.

I also added facade tests for both behaviors.

Comment thread curvine-lancedb-rs/src/object_store.rs Outdated
store: CURVINE_SCHEME,
source: e.to_string().into(),
})?;
if self.dir_contains_visible_file(&child).await? {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dir_contains_visible_file does a full recursive directory walk for every top-level subdirectory found during list_with_delimiter, which inflates the RPC count significantly:

  • list_with_delimiter(None) on a workspace with N top-level directories and on average M files per subtree will issue O(N * depth) list_status RPCs and walk O(N * M) files just to decide whether the directory should appear as a common_prefix. LanceDB calls list_with_delimiter on hot paths (manifest fanout, version table discovery), so this scales poorly as the dataset count in a Curvine workspace grows.
  • The walk also has no caching, so back-to-back list_with_delimiter calls pay the full cost again.

A couple of alternatives that keep the "don't return empty / staging-only prefix" semantics without the full walk:

  • Stop the recursion as soon as the first user-visible file is found — the current Box::pin recursion already does this, but it still has to do the full walk for genuinely-empty staging-only directories (which is the common case during multipart in-flight). A shallower test ("the subtree contains at least one file that isn't .curvine/...") is cheap if Curvine exposes a has_visible_descendant server-side primitive, or by special-casing the multipart/lock root paths (which are the only known "filler" sources right now) and skipping them at the listing level rather than after the fact.
  • Alternatively, drop the visibility pruning entirely and let LanceDB see the empty prefix — Lance's manifest discovery code can already tolerate prefixes that don't contain a .lance manifest.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d6f4199d.

list_with_delimiter no longer calls dir_contains_visible_file, so it does not recursively walk each candidate directory. It now filters only known internal Curvine LanceDB directories (/.curvine/lancedb/multipart and /.curvine/lancedb/locks) and otherwise returns the immediate directory as a common_prefix, matching normal delimiter semantics and avoiding the hot-path O(N*M) scan.

@jlon jlon force-pushed the feat/lancedb-on-curvine-pr04 branch from 584ebe8 to 2d63c0e Compare May 15, 2026 08:57
@jlon jlon changed the base branch from feat/lancedb-on-curvine-pr03 to main May 15, 2026 08:58
@szbr9486 szbr9486 merged commit 258a58d into CurvineIO:main May 15, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants