diff --git a/docs/_index.md b/docs/_index.md index 1d28b62..189a3b3 100644 --- a/docs/_index.md +++ b/docs/_index.md @@ -54,7 +54,7 @@ As of May 2026, SQLRite has: - Full-text search + hybrid retrieval (Phase 8 complete): FTS5-style inverted index with BM25 ranking + `fts_match` / `bm25_score` scalar functions + `try_fts_probe` optimizer hook + on-disk persistence with on-demand v4 → v5 file-format bump (8a-8c), a worked hybrid-retrieval example combining BM25 with vector cosine via raw arithmetic (8d), and a `bm25_search` MCP tool symmetric with `vector_search` (8e). See [`docs/fts.md`](fts.md). - SQL surface + DX follow-ups (Phase 9 complete, v0.2.0 → v0.9.1): DDL completeness — `DEFAULT`, `DROP TABLE` / `DROP INDEX`, `ALTER TABLE` (9a); free-list + manual `VACUUM` (9b) + auto-VACUUM (9c); `IS NULL` / `IS NOT NULL` (9d); `GROUP BY` + aggregates + `DISTINCT` + `LIKE` + `IN` (9e); four flavors of `JOIN` — INNER, LEFT, RIGHT, FULL OUTER (9f); prepared statements + `?` parameter binding with a per-connection LRU plan cache (9g); HNSW probe widened to cosine + dot via `WITH (metric = …)` (9h); `PRAGMA` dispatcher with the `auto_vacuum` knob (9i) - Benchmarks against SQLite + DuckDB (Phase 10 complete, SQLR-4 / SQLR-16): twelve-workload bench harness with a pluggable `Driver` trait, criterion-driven, pinned-host runs published. See [`docs/benchmarks.md`](benchmarks.md). -- Phase 11 (concurrent writes via MVCC + `BEGIN CONCURRENT`, SQLR-22) is in flight. **11.1 → 11.5: shipped.** `Connection` is `Send + Sync`; `Connection::connect()` mints sibling handles. `sqlrite::mvcc` exposes `MvccClock`, `ActiveTxRegistry`, `MvStore`, and `ConcurrentTx`. WAL header v1 → v2 persists the clock high-water mark. `PRAGMA journal_mode = mvcc;` opts a database into MVCC. `BEGIN CONCURRENT` writes go through commit-time validation against `MvStore` and abort with `SQLRiteError::Busy` on row-level write-write conflict. Reads via `Statement::query` see the BEGIN-time snapshot. **11.6 garbage collection: shipped on this branch.** Per-commit GC sweep on the write-set's chains + a new `Connection::vacuum_mvcc()` for explicit full-store drains; bounds the in-memory version chain growth that 11.4–11.5 left unbounded. Plan: [`docs/concurrent-writes-plan.md`](concurrent-writes-plan.md). +- Phase 11 (concurrent writes via MVCC + `BEGIN CONCURRENT`, SQLR-22) is in flight. **11.1 → 11.6: shipped.** `Connection` is `Send + Sync`; `Connection::connect()` mints sibling handles. `sqlrite::mvcc` exposes `MvccClock`, `ActiveTxRegistry`, `MvStore`, and `ConcurrentTx`. WAL header v1 → v2 persists the clock high-water mark. `PRAGMA journal_mode = mvcc;` opts a database into MVCC. `BEGIN CONCURRENT` writes go through commit-time validation and abort with `SQLRiteError::Busy` on row-level write-write conflict. Reads via `Statement::query` see the BEGIN-time snapshot. Per-commit GC + `Connection::vacuum_mvcc()` bound the in-memory version chain growth. **11.7 SDK propagation: shipped on this branch.** The C FFI gains `SqlriteStatus::Busy` / `BusySnapshot`. Python adds `sqlrite.BusyError` / `BusySnapshotError` subclasses. Node exports `errorKind(message)` + `ErrorKind` enum. Go adds `ErrBusy` / `ErrBusySnapshot` sentinels (matchable with `errors.Is`) + an `IsRetryable(err)` helper. Plan: [`docs/concurrent-writes-plan.md`](concurrent-writes-plan.md). - A fully-automated release pipeline that ships every product to its registry on every release with one human action — Rust engine + `sqlrite-ask` + `sqlrite-mcp` to crates.io, Python wheels to PyPI (`sqlrite`), Node.js + WASM to npm (`@joaoh82/sqlrite` + `@joaoh82/sqlrite-wasm`), Go module via `sdk/go/v*` git tag, plus C FFI tarballs, MCP binary tarballs, and unsigned desktop installers as GitHub Release assets (Phase 6 complete) See the [Roadmap](roadmap.md) for the full phase plan. diff --git a/docs/roadmap.md b/docs/roadmap.md index 0853d8c..dabf647 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -622,7 +622,7 @@ The headline slice. Multiple sibling `Connection`s can each hold their own open **Known limitations carried forward (most resolved in 11.5):** - ~~Reads via `Statement::query` / `Statement::query_with_params` bypass the swap.~~ ✅ Fixed in 11.5 — `Connection.concurrent_tx` is now `Mutex>` and a new `with_snapshot_read` helper threads the swap through `&self`. -- The `MvStore` write-set isn't yet persisted to the WAL — Phase 11.7 introduces an MVCC log-record frame kind so commits become durable through `MvStore` itself rather than via the legacy `Database::tables` mirror. (Durability already works through the legacy mirror in v0; the WAL log-record format is foundation work for cross-process MVCC.) +- The `MvStore` write-set isn't yet persisted to the WAL — Phase 11.8 introduces an MVCC log-record frame kind so commits become durable through `MvStore` itself rather than via the legacy `Database::tables` mirror. (Durability already works through the legacy mirror in v0; the WAL log-record format is foundation work for cross-process MVCC.) - `AUTOINCREMENT` inside `BEGIN CONCURRENT` isn't explicitly rejected; the v0 deep-clone-snapshot model handles concurrent INSERTs by isolating each tx's `last_rowid` bumps to its private snapshot, so two concurrent INSERTs on an `AUTOINCREMENT` column may collide at COMMIT and surface as `Busy`. Adopting the plan's "reject AUTOINCREMENT under MVCC" gate is a clean follow-up. - Tables touched by `BEGIN CONCURRENT` writes can't carry FTS or HNSW indexes today — `restore_row` only maintains B-tree secondary indexes. Concurrent-tx tests don't exercise FTS / HNSW, but a runtime guard would surface this with a clear error rather than producing inconsistent indexes. @@ -634,7 +634,7 @@ Lock order is consistently `concurrent_tx → inner` across every code path; dea This was renumbered out of plan-doc order: the plan-doc had 11.5 as checkpoint integration, but that's a much larger slice and the prepare/query-bypass-the-swap gap was a real correctness hole for users hitting `BEGIN CONCURRENT`. Plan-doc 11.5 (checkpoint) → roadmap 11.7; plan-doc 11.6 (GC) → roadmap 11.6 (this one). -### 🚧 Phase 11.6 — Garbage collection *(in progress, plan-doc "Phase 10.6"; promoted ahead of plan-doc 11.5 because unbounded `MvStore` growth was the next concrete user-impact concern after 11.5 closed the snapshot-read gap)* +### ✅ Phase 11.6 — Garbage collection *(plan-doc "Phase 10.6"; promoted ahead of plan-doc 11.5 because unbounded `MvStore` growth was the next concrete user-impact concern after 11.5 closed the snapshot-read gap)* Bounds in-memory growth of the [`MvStore`](../src/mvcc/store.rs) version chains. Without this, every committed version stays forever in the in-memory chain — a memory leak that grows linearly with commits. @@ -647,21 +647,34 @@ Bounds in-memory growth of the [`MvStore`](../src/mvcc/store.rs) version chains. **What 11.6 doesn't yet do:** - No background GC thread or `PRAGMA mvcc_gc_interval_ms`. Per-commit sweep + explicit `vacuum_mvcc()` cover the v0 model; the periodic-sweep variant lands as a follow-up if profiles show it's needed. -- GC sweeps don't trigger `Mvcc → Wal` journal-mode downgrades. The `set_journal_mode` setter still rejects the transition while the store carries committed versions; promoting that path requires the checkpoint-integration story from 11.7. +- GC sweeps don't trigger `Mvcc → Wal` journal-mode downgrades. The `set_journal_mode` setter still rejects the transition while the store carries committed versions; promoting that path requires the checkpoint-integration story from 11.8. -### Phase 11.7 — Checkpoint integration + crash recovery *(planned, plan-doc "Phase 10.5"; renumbered to follow GC because durability via the legacy `save_database` mirror already works in v0; this slice is foundation work for cross-process MVCC and column-level WAL deltas)* +### 🚧 Phase 11.7 — SDK propagation of `Busy` / `BusySnapshot` *(in progress, plan-doc "Phase 10.8"; promoted ahead of plan-doc 11.5 checkpoint work for the same reason 11.5 / 11.6 jumped the queue — surfacing retryable errors to SDK callers is what unblocks Python / Node / Go users from actually writing `BEGIN CONCURRENT` retry loops)* + +- **C FFI** ([`sqlrite-ffi/src/lib.rs`](../sqlrite-ffi/src/lib.rs)): new `SqlriteStatus::Busy = 5` and `SqlriteStatus::BusySnapshot = 6` codes alongside the existing `Ok` / `Error` / `InvalidArgument` set. `SqlriteStatus::is_retryable()` covers both. A new internal `status_of_sqlrite` mapper inspects the engine's `SQLRiteError` variant and routes `Busy` / `BusySnapshot` to the dedicated codes (the generic `status_of` keeps mapping every error to `Error`). `sqlrite_execute` switches to the engine-aware mapper so `BEGIN CONCURRENT` commits surface the dedicated codes through every language binding. Header regenerated automatically via `build.rs`. +- **Python SDK** ([`sdk/python/src/lib.rs`](../sdk/python/src/lib.rs)): two new exception classes `sqlrite.BusyError` and `sqlrite.BusySnapshotError`, both inheriting from `sqlrite.SQLRiteError`. Existing `except sqlrite.SQLRiteError` blocks keep catching them; retry helpers can branch with `except sqlrite.BusyError`. A new `map_engine_err` helper inspects the engine error variant and raises the matching exception class. Every engine-typed call site (open / execute / prepare / query / rows.next) routes through it. +- **Node.js SDK** ([`sdk/nodejs/src/lib.rs`](../sdk/nodejs/src/lib.rs)): new exported `ErrorKind` string enum (`'Busy'`, `'BusySnapshot'`, `'Other'`) and `errorKind(message: string)` classifier function. The engine's `thiserror` Display already prefixes retryable errors with `'Busy: '` / `'BusySnapshot: '`, so the classifier just regex-tests the prefix. JS callers wrap their `BEGIN CONCURRENT` loops in `try / catch (err) { if (errorKind(err.message) === ErrorKind.Busy) continue; }`. +- **Go SDK** ([`sdk/go/sqlrite.go`](../sdk/go/sqlrite.go)): two new sentinel error values `sqlrite.ErrBusy` and `sqlrite.ErrBusySnapshot`, plus an `IsRetryable(err error) bool` helper. `wrapErr` recognises the new FFI status codes and wraps the engine message with `fmt.Errorf("…: %w", ErrBusy)` so `errors.Is(err, sqlrite.ErrBusy)` works through the `database/sql` driver chain. +- **WASM SDK** — deliberately untouched. The browser WASM target is single-threaded; `BEGIN CONCURRENT` is meaningful but multi-handle concurrency through `Connection::connect` isn't yet exposed across `wasm-bindgen`'s lifetime model. When the multi-handle JS shape lands (separate slice), the same `Busy: …` message prefix will be the classifier hook for the WASM bindings too. + +**What this slice doesn't do:** + +- The multi-handle / sibling-`Connection` shape isn't exposed through any SDK yet. Each `sqlrite.connect(path)` / `new Database(path)` / `sql.Open(...)` builds an independent backing database. End-to-end testing of cross-handle `Busy` is therefore deferred to the multi-handle SDK slice; this PR ships the *plumbing* so once that wiring lands, callers already have the retry idioms they need. +- The WAL log-record durability work (plan-doc 11.5 / our 11.8) stays deferred. + +### Phase 11.8 — Checkpoint integration + crash recovery *(planned, plan-doc "Phase 10.5"; renumbered to follow GC + SDK propagation because durability via the legacy `save_database` mirror already works in v0; this slice is foundation work for cross-process MVCC and column-level WAL deltas)* MVCC log-record WAL frame format (the deferred 11.4 piece). Commit appends log records pre-`save_database`. Reopen replays log records into `MvStore`. Checkpoint drains `MvStore` versions back into the pager (so `Mvcc → Wal` becomes legal once the store is empty). Crash-recovery test: kill mid-commit between log-record append and version-chain push; reopen; verify the committed transaction is visible and the half-written one is not. -### Phase 11.8 — Indexes under MVCC *(deferred-by-design, plan-doc "Phase 10.7")* +### Phase 11.9 — Indexes under MVCC *(deferred-by-design, plan-doc "Phase 10.7")* Each secondary-index entry becomes its own `RowVersion`. Turso explicitly punted on this; SQLRite's v0 will reject `CREATE INDEX` while `journal_mode = mvcc`. -### Phase 11.9 — SDK + REPL propagation *(planned, plan-doc "Phase 10.8")* +### Phase 11.10 — Multi-handle SDK shape + REPL `.spawn` *(planned, was plan-doc 11.8's other half)* -Surface `Busy` / `BusySnapshot` through the FFI shim and each language SDK. New REPL `.spawn` meta-command + new "N concurrent writers" benchmark workload. +Expose `Connection::connect()` through the FFI + each SDK so Python / Node / Go callers can mint sibling handles, plus a new REPL `.spawn` meta-command. Without this, the 11.7 retry-error machinery can't actually be exercised end-to-end through an SDK (each SDK `connect()` builds an independent DB). Also adds the "N concurrent writers" benchmark workload. -### Phase 11.10 — Docs *(planned, plan-doc "Phase 10.9")* +### Phase 11.11 — Docs *(planned, plan-doc "Phase 10.9")* Promote the plan to `docs/concurrent-writes.md` and update the cross-references. diff --git a/sdk/go/sqlrite.go b/sdk/go/sqlrite.go index dc27732..b738fb5 100644 --- a/sdk/go/sqlrite.go +++ b/sdk/go/sqlrite.go @@ -127,12 +127,65 @@ const ( statusOk Status = 0 statusError Status = 1 statusInvalidArgument Status = 2 - statusDone Status = 101 - statusRow Status = 102 + // Phase 11.7 — retryable-error codes the C FFI surfaces from + // `BEGIN CONCURRENT` commit conflicts. See `ErrBusy` / + // `ErrBusySnapshot` below for the Go-side sentinels callers + // match against. + statusBusy Status = 5 + statusBusySnapshot Status = 6 + statusDone Status = 101 + statusRow Status = 102 ) +// Phase 11.7 — retryable error sentinels exposed to Go callers. +// Match against them with `errors.Is(err, sqlrite.ErrBusy)` / +// `errors.Is(err, sqlrite.ErrBusySnapshot)` to drive a retry +// loop: +// +// for { +// tx, err := db.Begin() +// if err != nil { return err } +// // ... do work, then: +// err = tx.Commit() +// if err == nil { break } +// if errors.Is(err, sqlrite.ErrBusy) || +// errors.Is(err, sqlrite.ErrBusySnapshot) { +// // tx was already rolled back by the engine +// continue +// } +// return err +// } +// +// Use [IsRetryable] for a kind-agnostic check. +var ( + // ErrBusy is returned when a `BEGIN CONCURRENT` commit hits a + // row-level write-write conflict. The transaction has already + // been rolled back; the caller should retry the whole + // transaction with a fresh `BEGIN CONCURRENT`. + ErrBusy = errors.New("sqlrite: busy (write-write conflict; transaction rolled back, retry)") + + // ErrBusySnapshot is returned when a `BEGIN CONCURRENT` read + // sees a row that has been superseded after the transaction's + // snapshot was taken. Same retry semantics as `ErrBusy`. + ErrBusySnapshot = errors.New("sqlrite: busy snapshot (snapshot stale; transaction rolled back, retry)") +) + +// IsRetryable reports whether `err` chains an `ErrBusy` or +// `ErrBusySnapshot` and should therefore be retried by the +// caller. Use it instead of comparing against individual +// sentinels so a future retryable variant (e.g. lock-wait +// timeout) doesn't force a caller-side change. +func IsRetryable(err error) bool { + return errors.Is(err, ErrBusy) || errors.Is(err, ErrBusySnapshot) +} + // wrapErr returns a Go error when the status code is nonzero. Use // after any `sqlrite_*` call that can fail. +// +// Phase 11.7 — retryable statuses surface as errors that wrap the +// matching sentinel (`ErrBusy` / `ErrBusySnapshot`) so callers can +// use `errors.Is` to branch their retry loops without parsing the +// message. func wrapErr(status Status, op string) error { if status == statusOk { return nil @@ -141,7 +194,14 @@ func wrapErr(status Status, op string) error { if msg == "" { msg = fmt.Sprintf("SQLRite status %d", uint32(status)) } - return fmt.Errorf("sqlrite: %s: %s", op, msg) + switch status { + case statusBusy: + return fmt.Errorf("sqlrite: %s: %s: %w", op, msg, ErrBusy) + case statusBusySnapshot: + return fmt.Errorf("sqlrite: %s: %s: %w", op, msg, ErrBusySnapshot) + default: + return fmt.Errorf("sqlrite: %s: %s", op, msg) + } } // cString converts a Go string into a C-allocated NUL-terminated diff --git a/sdk/go/sqlrite_test.go b/sdk/go/sqlrite_test.go index f5657f1..484d806 100644 --- a/sdk/go/sqlrite_test.go +++ b/sdk/go/sqlrite_test.go @@ -12,6 +12,8 @@ package sqlrite_test import ( "database/sql" + "errors" + "fmt" "os" "path/filepath" "strings" @@ -249,6 +251,74 @@ func TestBadSQLBubblesUpAsError(t *testing.T) { } } +// --------------------------------------------------------------------------- +// Phase 11.7 — BEGIN CONCURRENT / Busy sentinel errors + +func TestBusySentinelsAreDistinctErrors(t *testing.T) { + if sqlrite.ErrBusy == nil { + t.Fatal("sqlrite.ErrBusy is nil") + } + if sqlrite.ErrBusySnapshot == nil { + t.Fatal("sqlrite.ErrBusySnapshot is nil") + } + // Sanity: the two sentinels are independent values. + if errors.Is(sqlrite.ErrBusy, sqlrite.ErrBusySnapshot) { + t.Error("sqlrite.ErrBusy must not match sqlrite.ErrBusySnapshot via errors.Is") + } + if errors.Is(sqlrite.ErrBusySnapshot, sqlrite.ErrBusy) { + t.Error("sqlrite.ErrBusySnapshot must not match sqlrite.ErrBusy via errors.Is") + } +} + +func TestIsRetryableCoversBothSentinels(t *testing.T) { + if !sqlrite.IsRetryable(sqlrite.ErrBusy) { + t.Error("sqlrite.IsRetryable(sqlrite.ErrBusy) should be true") + } + if !sqlrite.IsRetryable(sqlrite.ErrBusySnapshot) { + t.Error("sqlrite.IsRetryable(sqlrite.ErrBusySnapshot) should be true") + } + if sqlrite.IsRetryable(errors.New("not a busy error")) { + t.Error("sqlrite.IsRetryable on a generic error should be false") + } + if sqlrite.IsRetryable(nil) { + t.Error("sqlrite.IsRetryable(nil) should be false") + } + // Wrapped errors flow through errors.Is — retry loops use + // `fmt.Errorf("... %w", sqlrite.ErrBusy)` shape, so we verify the + // helper recognises wrapped variants too. + wrapped := fmt.Errorf("commit failed: %w", sqlrite.ErrBusy) + if !sqlrite.IsRetryable(wrapped) { + t.Error("sqlrite.IsRetryable should unwrap %w to find sqlrite.ErrBusy") + } +} + +func TestJournalModeMvccReachesGoDriver(t *testing.T) { + // Sanity that `PRAGMA journal_mode = mvcc` reaches the engine + // through cgo. BEGIN CONCURRENT itself isn't usefully + // exercisable through `database/sql` today (the driver + // doesn't expose sibling Connection handles per the Phase + // 11.1 multi-connection contract), but PRAGMA accepts and the + // `BEGIN CONCURRENT` gate flips, which proves the cgo + // plumbing is right. + // + // Note: PRAGMA renders a single-row result in the engine's + // `CommandOutput.rendered`, but the Go driver routes non-SELECT + // statements through `sqlrite_execute` (no rows), so we don't + // try to read the value back through `db.Query`. + db := openMem(t) + mustExec(t, db, "PRAGMA journal_mode = mvcc") + // BEGIN CONCURRENT only succeeds once journal_mode is mvcc; + // the gate proves the toggle landed. + mustExec(t, db, "CREATE TABLE t (id INTEGER PRIMARY KEY)") + mustExec(t, db, "BEGIN CONCURRENT") + mustExec(t, db, "ROLLBACK") + // Unknown values still error cleanly (regression guard for + // the PRAGMA dispatcher). + if _, err := db.Exec("PRAGMA journal_mode = nonsense"); err == nil { + t.Fatal("expected unknown journal_mode to error") + } +} + func TestNonEmptyParametersRejected(t *testing.T) { db := openMem(t) mustExec(t, db, "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)") diff --git a/sdk/nodejs/src/lib.rs b/sdk/nodejs/src/lib.rs index 1d64f23..bf15fc9 100644 --- a/sdk/nodejs/src/lib.rs +++ b/sdk/nodejs/src/lib.rs @@ -54,6 +54,118 @@ fn map_err(e: E) -> napi::Error { napi::Error::from_reason(e.to_string()) } +// --------------------------------------------------------------------------- +// Phase 11.7 — error-kind helper for JS retry loops +// +// The engine's `thiserror` Display impl prefixes retryable errors with +// `"Busy: "` and `"BusySnapshot: "`. That prefix flows through +// `from_reason(e.to_string())` into `err.message` on the JS side, so +// JavaScript callers can already discriminate via regex. We expose a +// typed helper from the SDK so the retry idiom is one named function +// call instead of a string match — same UX SDK consumers expect from +// `process.errno` / `err.code` patterns in core Node modules. + +/// Distinguishes the three error kinds JS callers care about. +/// Returned by [`error_kind`]; mirrors the engine's +/// [`sqlrite::SQLRiteError::is_retryable`] split. +#[napi(string_enum)] +#[derive(Debug, PartialEq, Eq)] +pub enum ErrorKind { + /// Generic engine error or any non-retryable failure. JS + /// callers should propagate this — retrying won't help. + Other, + /// `BEGIN CONCURRENT` commit hit a row-level write-write + /// conflict. The transaction has already been rolled back; the + /// retry helper should call the user's closure again with a + /// fresh `BEGIN CONCURRENT`. + Busy, + /// Snapshot-isolation read anomaly. Same retry semantics as + /// `Busy`; SDK retry helpers branch on either kind. + BusySnapshot, +} + +/// Classifies a thrown error's `.message` into a retryable kind. +/// Pass `err.message` (a JS string) and the helper returns the +/// matching [`ErrorKind`]. Use it inside a `catch` block to +/// decide whether to retry: +/// +/// ```js +/// const { Database, errorKind, ErrorKind } = require('@joaoh82/sqlrite'); +/// // ... open db, set journal_mode = mvcc ... +/// while (true) { +/// try { +/// db.exec('BEGIN CONCURRENT'); +/// db.exec("UPDATE t SET v = v + 1 WHERE id = 1"); +/// db.exec('COMMIT'); +/// break; +/// } catch (err) { +/// const kind = errorKind(err.message); +/// if (kind === ErrorKind.Busy || kind === ErrorKind.BusySnapshot) { +/// continue; // retryable +/// } +/// throw err; +/// } +/// } +/// ``` +#[napi] +pub fn error_kind(message: String) -> ErrorKind { + classify_error_message(&message) +} + +/// Pure Rust classifier — split out so the unit tests don't need +/// to spin up napi. Returns the kind based on the engine's +/// `thiserror` prefix conventions: `"Busy: "` / `"BusySnapshot: "`. +fn classify_error_message(message: &str) -> ErrorKind { + // Note the ordering: `BusySnapshot` is checked first because + // it's a prefix of itself but `Busy: ` is also a prefix of + // `BusySnapshot: `. Matching the long form first avoids + // mis-classifying snapshot errors as plain Busy. + if message.starts_with("BusySnapshot: ") { + ErrorKind::BusySnapshot + } else if message.starts_with("Busy: ") { + ErrorKind::Busy + } else { + ErrorKind::Other + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn classify_recognises_busy_prefix() { + assert_eq!( + classify_error_message("Busy: write-write conflict on accounts/1"), + ErrorKind::Busy + ); + } + + #[test] + fn classify_recognises_busy_snapshot_prefix() { + assert_eq!( + classify_error_message("BusySnapshot: row 5 changed under us"), + ErrorKind::BusySnapshot + ); + } + + #[test] + fn classify_returns_other_for_generic_errors() { + assert_eq!( + classify_error_message("General error: bad SQL syntax"), + ErrorKind::Other + ); + assert_eq!(classify_error_message(""), ErrorKind::Other); + // A regular SELECT result shouldn't accidentally match + // — explicitly check a message that contains "Busy" but + // not as the prefix. + assert_eq!( + classify_error_message("The host is Busy: ..."), + ErrorKind::Other + ); + } +} + /// Converts a `sqlrite::Value` into a napi-compatible JS value using the /// env to allocate. Used both for row values and for error contexts. fn value_to_js(env: &Env, v: &Value) -> Result { diff --git a/sdk/python/src/lib.rs b/sdk/python/src/lib.rs index 892b1a9..6a65da1 100644 --- a/sdk/python/src/lib.rs +++ b/sdk/python/src/lib.rs @@ -70,10 +70,57 @@ pyo3::create_exception!( "Base error class for SQLRite failures." ); +// Phase 11.7 — distinct exception classes for the two retryable +// engine errors. They inherit from `SQLRiteError` so existing +// `except sqlrite.SQLRiteError` blocks still catch them, but +// retry helpers can branch on the specific subclass with +// `except sqlrite.BusyError` for snapshot-isolation-style retry +// loops without re-parsing the message. +pyo3::create_exception!( + sqlrite, + BusyError, + SQLRiteError, + "Raised when `BEGIN CONCURRENT` commits hit a row-level \ + write-write conflict. The transaction has already been \ + rolled back; the caller should retry the whole transaction \ + with a fresh `BEGIN CONCURRENT`. Subclass of `SQLRiteError`." +); + +pyo3::create_exception!( + sqlrite, + BusySnapshotError, + SQLRiteError, + "Raised when a `BEGIN CONCURRENT` read sees a row that has \ + been superseded after the transaction's snapshot was taken. \ + Same retry semantics as `BusyError` — wired through the same \ + SDK retry helper. Subclass of `SQLRiteError`." +); + +/// Generic error mapper — produces a base `SQLRiteError`. +/// Falls back to the engine's `Display` impl for the message +/// regardless of the variant. fn map_err(e: E) -> PyErr { SQLRiteError::new_err(e.to_string()) } +/// Phase 11.7 — engine-typed mapper. Inspects the variant and +/// raises `BusyError` / `BusySnapshotError` for retryable +/// failures, the base `SQLRiteError` otherwise. Every path that +/// receives a `Result<_, sqlrite::SQLRiteError>` should use +/// this — `map_err` (generic, above) collapses the variant and +/// SDK callers lose the retryability information. +/// +/// The full-path `sqlrite::SQLRiteError::…` references below +/// disambiguate the engine enum from the pyo3 exception class +/// of the same short name (`SQLRiteError`) created above. +fn map_engine_err(e: sqlrite::SQLRiteError) -> PyErr { + match &e { + sqlrite::SQLRiteError::Busy(_) => BusyError::new_err(e.to_string()), + sqlrite::SQLRiteError::BusySnapshot(_) => BusySnapshotError::new_err(e.to_string()), + _ => SQLRiteError::new_err(e.to_string()), + } +} + // --------------------------------------------------------------------------- // Connection // @@ -89,9 +136,9 @@ fn map_err(e: E) -> PyErr { #[pyo3(text_signature = "(database, /)")] fn connect(database: &str) -> PyResult { let rust_conn = if database == ":memory:" { - RustConnection::open_in_memory().map_err(map_err)? + RustConnection::open_in_memory().map_err(map_engine_err)? } else { - RustConnection::open(PathBuf::from(database)).map_err(map_err)? + RustConnection::open(PathBuf::from(database)).map_err(map_engine_err)? }; Ok(Connection { inner: Some(Mutex::new(rust_conn)), @@ -104,7 +151,8 @@ fn connect(database: &str) -> PyResult { #[pyfunction] #[pyo3(text_signature = "(database, /)")] fn connect_read_only(database: &str) -> PyResult { - let rust_conn = RustConnection::open_read_only(PathBuf::from(database)).map_err(map_err)?; + let rust_conn = + RustConnection::open_read_only(PathBuf::from(database)).map_err(map_engine_err)?; Ok(Connection { inner: Some(Mutex::new(rust_conn)), ask_config: None, @@ -176,7 +224,7 @@ impl Connection { fn commit(&mut self) -> PyResult<()> { self.with_inner("commit", |c| { if c.in_transaction() { - c.execute("COMMIT").map(|_| ()).map_err(map_err)?; + c.execute("COMMIT").map(|_| ()).map_err(map_engine_err)?; } Ok(()) }) @@ -187,7 +235,7 @@ impl Connection { fn rollback(&mut self) -> PyResult<()> { self.with_inner("rollback", |c| { if c.in_transaction() { - c.execute("ROLLBACK").map(|_| ()).map_err(map_err)?; + c.execute("ROLLBACK").map(|_| ()).map_err(map_engine_err)?; } Ok(()) }) @@ -710,13 +758,13 @@ impl Cursor { .unwrap_or(false); if is_query { - let stmt = c.prepare(sql).map_err(map_err)?; - let rows = stmt.query().map_err(map_err)?; + let stmt = c.prepare(sql).map_err(map_engine_err)?; + let rows = stmt.query().map_err(map_engine_err)?; self.description = Some(rows.columns().to_vec()); self.current_rows = Some(rows); self.last_status = Some("SELECT Statement prepared.".to_string()); } else { - let status = c.execute(sql).map_err(map_err)?; + let status = c.execute(sql).map_err(map_engine_err)?; self.current_rows = None; self.description = None; self.last_status = Some(status); @@ -779,7 +827,7 @@ impl Cursor { let Some(rows) = self.take_rows_for_iteration() else { return Ok(None); }; - match rows.next().map_err(map_err)? { + match rows.next().map_err(map_engine_err)? { Some(row) => { let owned = row.to_owned_row(); Ok(Some(owned_row_to_tuple(py, &owned)?)) @@ -798,7 +846,7 @@ impl Cursor { let limit = size.unwrap_or(usize::MAX); let mut out: Vec> = Vec::new(); while out.len() < limit { - match rows.next().map_err(map_err)? { + match rows.next().map_err(map_engine_err)? { Some(row) => { let owned = row.to_owned_row(); out.push(owned_row_to_tuple(py, &owned)?); @@ -911,6 +959,12 @@ fn owned_row_to_tuple(py: Python<'_>, row: &OwnedRow) -> PyResult> { fn sqlrite_module(m: &Bound<'_, PyModule>) -> PyResult<()> { m.add("__version__", env!("CARGO_PKG_VERSION"))?; m.add("SQLRiteError", m.py().get_type::())?; + // Phase 11.7 — retryable engine errors. Inherit from + // `SQLRiteError` so existing `except sqlrite.SQLRiteError` + // blocks still catch them. Retry loops can branch on the + // narrower class with `except sqlrite.BusyError`. + m.add("BusyError", m.py().get_type::())?; + m.add("BusySnapshotError", m.py().get_type::())?; m.add_function(wrap_pyfunction!(connect, m)?)?; m.add_function(wrap_pyfunction!(connect_read_only, m)?)?; m.add_class::()?; diff --git a/sdk/python/tests/test_sqlrite.py b/sdk/python/tests/test_sqlrite.py index 218eee7..0eb0044 100644 --- a/sdk/python/tests/test_sqlrite.py +++ b/sdk/python/tests/test_sqlrite.py @@ -230,6 +230,61 @@ def test_connection_execute_shortcut(conn): assert cur.fetchone() == (99,) +# --------------------------------------------------------------------------- +# Phase 11.7 — BEGIN CONCURRENT / BusyError propagation + + +def test_busy_error_class_exists_and_inherits_from_sqlrite_error(): + """`BusyError` and `BusySnapshotError` are reachable from the + `sqlrite` module and inherit from `SQLRiteError` so existing + `except sqlrite.SQLRiteError` blocks still catch them. + + Phase 11.7 — the headline SDK contract. Distinct exception + classes for the two retryable engine errors let retry helpers + branch on the narrower type without re-parsing the message + string. The base class catch-all still works for callers that + don't care about the distinction. + """ + assert hasattr(sqlrite, "BusyError") + assert hasattr(sqlrite, "BusySnapshotError") + # Subclass relationship — existing `except sqlrite.SQLRiteError` + # blocks still catch the new variants. + assert issubclass(sqlrite.BusyError, sqlrite.SQLRiteError) + assert issubclass(sqlrite.BusySnapshotError, sqlrite.SQLRiteError) + + +def test_journal_mode_pragma_reaches_python(conn): + """Sanity check that `PRAGMA journal_mode = mvcc` reaches the + engine through the Python SDK. + + BEGIN CONCURRENT is unusable from Python today without a + multi-handle API (each `sqlrite.connect()` builds an isolated + DB; siblings via `Connection::connect()` aren't yet exposed — + follow-up for 11.10), but PRAGMA accepts/rejects cleanly proves + the FFI plumbing is right. + + Note: PRAGMA goes through the cursor's non-query path (the + `is_query` heuristic only flags SELECT), so `fetchone()` after + PRAGMA returns None — the rendered single-row result lives in + the engine's `CommandOutput.rendered` field, which the Python + cursor doesn't surface today. The contract we're testing here + is "the PRAGMA executes without erroring", not "the read-form + rendering reaches Python". + """ + # Set the mode — must not error. + conn.execute("PRAGMA journal_mode = mvcc") + # Verify BEGIN CONCURRENT now succeeds (it would error with + # "requires PRAGMA journal_mode = mvcc" if the toggle didn't + # take). Roll back immediately; we're just probing the gate. + conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)") + conn.execute("BEGIN CONCURRENT") + conn.execute("ROLLBACK") + # An unknown PRAGMA value still surfaces as an error class + # callers can catch (regression guard). + with pytest.raises(sqlrite.SQLRiteError): + conn.execute("PRAGMA journal_mode = nonsense") + + def test_executescript_runs_batched_statements(conn): cur = conn.cursor() cur.executescript( diff --git a/sqlrite-ffi/include/sqlrite.h b/sqlrite-ffi/include/sqlrite.h index f51fc50..62dd41c 100644 --- a/sqlrite-ffi/include/sqlrite.h +++ b/sqlrite-ffi/include/sqlrite.h @@ -27,6 +27,23 @@ typedef enum SqlriteStatus { // A required pointer argument was null, or an input string was // invalid UTF-8 / not NUL-terminated. InvalidArgument = 2, + // Phase 11.7 — a `BEGIN CONCURRENT` commit hit a row-level + // write-write conflict. The transaction has already been + // rolled back; the caller should retry the whole + // transaction with a fresh `BEGIN CONCURRENT`. SDK retry + // helpers branch on this code (or its `BusySnapshot` + // sibling) and call their user-provided closure again. + Busy = 5, + // Phase 11.7 — same retry semantics as [`Self::Busy`], but + // surfaces the snapshot-isolation specific case (a row in + // the transaction's read-set changed under us). Reserved + // for the read-anomaly path Phase 11.5+ wires through + // `MvStore`; today the engine emits it from the same + // `BEGIN CONCURRENT` commit path. Distinguished from + // `Busy` so SDKs can map each to its own + // per-language exception class without forking on the + // message string. + BusySnapshot = 6, // A SELECT query returned no more rows (returned from `step`). Done = 101, // A SELECT query produced a row (returned from `step`). diff --git a/sqlrite-ffi/src/lib.rs b/sqlrite-ffi/src/lib.rs index a4b8547..f33f0d0 100644 --- a/sqlrite-ffi/src/lib.rs +++ b/sqlrite-ffi/src/lib.rs @@ -71,12 +71,40 @@ pub enum SqlriteStatus { /// A required pointer argument was null, or an input string was /// invalid UTF-8 / not NUL-terminated. InvalidArgument = 2, + /// Phase 11.7 — a `BEGIN CONCURRENT` commit hit a row-level + /// write-write conflict. The transaction has already been + /// rolled back; the caller should retry the whole + /// transaction with a fresh `BEGIN CONCURRENT`. SDK retry + /// helpers branch on this code (or its `BusySnapshot` + /// sibling) and call their user-provided closure again. + Busy = 5, + /// Phase 11.7 — same retry semantics as [`Self::Busy`], but + /// surfaces the snapshot-isolation specific case (a row in + /// the transaction's read-set changed under us). Reserved + /// for the read-anomaly path Phase 11.5+ wires through + /// `MvStore`; today the engine emits it from the same + /// `BEGIN CONCURRENT` commit path. Distinguished from + /// `Busy` so SDKs can map each to its own + /// per-language exception class without forking on the + /// message string. + BusySnapshot = 6, /// A SELECT query returned no more rows (returned from `step`). Done = 101, /// A SELECT query produced a row (returned from `step`). Row = 102, } +impl SqlriteStatus { + /// Phase 11.7 — true for [`Self::Busy`] and + /// [`Self::BusySnapshot`]. Mirrors `SQLRiteError::is_retryable` + /// on the Rust side; SDK retry helpers should branch on this + /// rather than matching specific codes so a future retryable + /// code (e.g. lock-wait timeout) doesn't break callers. + pub fn is_retryable(self) -> bool { + matches!(self, Self::Busy | Self::BusySnapshot) + } +} + // --------------------------------------------------------------------------- // Thread-local last-error @@ -153,6 +181,11 @@ struct StmtHandle { /// Turns a `Result<_, E>` into a status code + last-error side effect. /// Callers capture the `Ok` value via an out-parameter before calling this. +/// +/// The generic version maps every error to [`SqlriteStatus::Error`]. +/// For results carrying an engine-typed `SQLRiteError`, use +/// [`status_of_sqlrite`] instead so retryable variants surface as +/// [`SqlriteStatus::Busy`] / [`SqlriteStatus::BusySnapshot`]. fn status_of(r: Result) -> (SqlriteStatus, Option) { match r { Ok(v) => { @@ -166,6 +199,31 @@ fn status_of(r: Result) -> (SqlriteStatus, Option } } +/// Phase 11.7 — specialised [`status_of`] for engine results. +/// Maps [`sqlrite::SQLRiteError::Busy`] / +/// [`sqlrite::SQLRiteError::BusySnapshot`] to the distinct status +/// codes that SDK retry helpers (Python / Node / Go / WASM) +/// branch on. Every other variant collapses to +/// [`SqlriteStatus::Error`] with the message in +/// [`sqlrite_last_error`]. +fn status_of_sqlrite(r: Result) -> (SqlriteStatus, Option) { + match r { + Ok(v) => { + clear_last_error(); + (SqlriteStatus::Ok, Some(v)) + } + Err(e) => { + let status = match e { + sqlrite::SQLRiteError::Busy(_) => SqlriteStatus::Busy, + sqlrite::SQLRiteError::BusySnapshot(_) => SqlriteStatus::BusySnapshot, + _ => SqlriteStatus::Error, + }; + set_last_error(e.to_string()); + (status, None) + } + } +} + /// Borrows a C string into &str or sets the last-error and returns None. unsafe fn cstr_to_str<'a>(ptr: *const c_char) -> Option<&'a str> { if ptr.is_null() { @@ -317,7 +375,10 @@ pub unsafe extern "C" fn sqlrite_execute( }; // Safety: caller guarantees `conn` is a valid handle. let handle = unsafe { &mut *(conn as *mut ConnHandle) }; - let (status, _) = status_of(handle.conn.execute(sql_str)); + // Use the SQLRiteError-aware mapper so `BEGIN CONCURRENT` + // commit conflicts surface as `Busy` / `BusySnapshot` rather + // than the generic `Error`. SDK retry helpers branch on these. + let (status, _) = status_of_sqlrite(handle.conn.execute(sql_str)); status } @@ -1095,6 +1156,104 @@ mod tests { } } + // ----------------------------------------------------------------- + // Phase 11.7 — BEGIN CONCURRENT / Busy status surfacing + // ----------------------------------------------------------------- + + /// Static check: `SqlriteStatus::is_retryable` covers both + /// `Busy` and `BusySnapshot`. SDK retry helpers branch on + /// this rather than matching specific variants so adding a + /// third retryable code later doesn't force a recompile. + #[test] + fn is_retryable_covers_busy_variants() { + assert!(SqlriteStatus::Busy.is_retryable()); + assert!(SqlriteStatus::BusySnapshot.is_retryable()); + assert!(!SqlriteStatus::Ok.is_retryable()); + assert!(!SqlriteStatus::Error.is_retryable()); + assert!(!SqlriteStatus::InvalidArgument.is_retryable()); + assert!(!SqlriteStatus::Done.is_retryable()); + assert!(!SqlriteStatus::Row.is_retryable()); + } + + /// Two `BEGIN CONCURRENT` transactions on the same row from + /// two FFI connections: the second commit must surface as + /// `SqlriteStatus::Busy` (not the generic `Error`), and the + /// retry on a fresh `BEGIN CONCURRENT` must succeed. + /// + /// This is the headline SDK contract: every binding layered + /// on the FFI can branch on the status code to drive + /// language-idiomatic retry helpers. + #[test] + fn begin_concurrent_busy_status_round_trip() { + unsafe { + // Two FFI handles sharing the same in-memory DB. + let mut a: *mut SqlriteConnection = ptr::null_mut(); + assert_eq!(sqlrite_open_in_memory(&mut a), SqlriteStatus::Ok); + // For Busy to fire we need two `Connection`s that + // share the same backing `Database`. The FFI's + // `sqlrite_open_in_memory` builds a fresh DB per + // call, so we reach into the inner `ConnHandle` and + // mint a sibling via `Connection::connect`. + let handle_a = &mut *(a as *mut ConnHandle); + let conn_b = handle_a.conn.connect(); + let b: *mut SqlriteConnection = + Box::into_raw(Box::new(ConnHandle { conn: conn_b })) as _; + + let (_c, p) = cstr("PRAGMA journal_mode = mvcc;"); + assert_eq!(sqlrite_execute(a, p), SqlriteStatus::Ok); + + let (_c, p) = cstr("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER);"); + assert_eq!(sqlrite_execute(a, p), SqlriteStatus::Ok); + let (_c, p) = cstr("INSERT INTO t (id, v) VALUES (1, 100);"); + assert_eq!(sqlrite_execute(a, p), SqlriteStatus::Ok); + + // Both transactions open before either writes. + let (_c, p) = cstr("BEGIN CONCURRENT;"); + assert_eq!(sqlrite_execute(a, p), SqlriteStatus::Ok); + let (_c, p) = cstr("BEGIN CONCURRENT;"); + assert_eq!(sqlrite_execute(b, p), SqlriteStatus::Ok); + + let (_c, p) = cstr("UPDATE t SET v = 200 WHERE id = 1;"); + assert_eq!(sqlrite_execute(a, p), SqlriteStatus::Ok); + let (_c, p) = cstr("UPDATE t SET v = 300 WHERE id = 1;"); + assert_eq!(sqlrite_execute(b, p), SqlriteStatus::Ok); + + // First commit wins. + let (_c, p) = cstr("COMMIT;"); + assert_eq!(sqlrite_execute(a, p), SqlriteStatus::Ok); + + // Second commit aborts with Busy — the dedicated + // status code SDKs hang their retry helpers off. + let (_c, p) = cstr("COMMIT;"); + let status = sqlrite_execute(b, p); + assert_eq!( + status, + SqlriteStatus::Busy, + "expected Busy, got {status:?}; last_err={:?}", + last_err() + ); + assert!(status.is_retryable()); + assert!( + last_err() + .map(|e| e.contains("write-write conflict")) + .unwrap_or(false), + "Busy status should still set last_error, got {:?}", + last_err() + ); + + // Retry on a fresh BEGIN CONCURRENT succeeds. + let (_c, p) = cstr("BEGIN CONCURRENT;"); + assert_eq!(sqlrite_execute(b, p), SqlriteStatus::Ok); + let (_c, p) = cstr("UPDATE t SET v = 300 WHERE id = 1;"); + assert_eq!(sqlrite_execute(b, p), SqlriteStatus::Ok); + let (_c, p) = cstr("COMMIT;"); + assert_eq!(sqlrite_execute(b, p), SqlriteStatus::Ok); + + sqlrite_close(b); + sqlrite_close(a); + } + } + #[test] fn step_without_query_returns_error() { // Column accessors against a handle that hasn't produced a row