diff --git a/Cargo.toml b/Cargo.toml
index e0bd4d9..d4588bd 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -73,6 +73,15 @@ path = "examples/rust/quickstart.rs"
 name = "hybrid-retrieval"
 path = "examples/hybrid-retrieval/hybrid_retrieval.rs"

+# Phase 11.12 — BEGIN CONCURRENT retry-loop demo. Mints a sibling
+# Connection via `Connection::connect`, runs two concurrent
+# transactions (first disjoint, then same-row), and surfaces the
+# Busy / retry path. Run with `cargo run --example concurrent_writers`.
+# See `docs/concurrent-writes.md` for the conceptual walkthrough.
+[[example]]
+name = "concurrent_writers"
+path = "examples/rust/concurrent_writers.rs"
+
 [features]
 # Default build includes everything: the REPL binary (cli) and
 # POSIX/Windows advisory file locks on the Pager (file-locks).
diff --git a/docs/_index.md b/docs/_index.md
index ae29a8e..32f7ab9 100644
--- a/docs/_index.md
+++ b/docs/_index.md
@@ -16,7 +16,8 @@ A small, hand-written guide to the SQLRite codebase — how it's structured, how
 ## Using SQLRite as a library

 - [Embedding](embedding.md) — the public `Connection` / `Statement` / `Rows` API (Phase 5a) and where the non-Rust SDKs plug in (Phase 5b – 5g)
-- [`examples/`](../examples/) — runnable Rust quickstart (`cargo run --example quickstart`); language-specific subdirectories fill in as each 5x sub-phase lands
+- [Concurrent writes — MVCC + `BEGIN CONCURRENT`](concurrent-writes.md) — Phase 11 canonical reference: SQL surface, embedding API, SDK error mapping, REPL meta-commands, durability story, limitations. Design rationale lives in the [historical plan-doc](concurrent-writes-plan.md).
+- [`examples/`](../examples/) — runnable Rust quickstart (`cargo run --example quickstart`) + concurrent-writers retry-loop demo (`cargo run --example concurrent_writers`); language-specific subdirectories fill in as each 5x sub-phase lands

 ## Phase 7 — AI-era extensions

@@ -54,7 +55,7 @@ As of May 2026, SQLRite has:
 - Full-text search + hybrid retrieval (Phase 8 complete): FTS5-style inverted index with BM25 ranking + `fts_match` / `bm25_score` scalar functions + `try_fts_probe` optimizer hook + on-disk persistence with on-demand v4 → v5 file-format bump (8a-8c), a worked hybrid-retrieval example combining BM25 with vector cosine via raw arithmetic (8d), and a `bm25_search` MCP tool symmetric with `vector_search` (8e). See [`docs/fts.md`](fts.md).
 - SQL surface + DX follow-ups (Phase 9 complete, v0.2.0 → v0.9.1): DDL completeness — `DEFAULT`, `DROP TABLE` / `DROP INDEX`, `ALTER TABLE` (9a); free-list + manual `VACUUM` (9b) + auto-VACUUM (9c); `IS NULL` / `IS NOT NULL` (9d); `GROUP BY` + aggregates + `DISTINCT` + `LIKE` + `IN` (9e); four flavors of `JOIN` — INNER, LEFT, RIGHT, FULL OUTER (9f); prepared statements + `?` parameter binding with a per-connection LRU plan cache (9g); HNSW probe widened to cosine + dot via `WITH (metric = …)` (9h); `PRAGMA` dispatcher with the `auto_vacuum` knob (9i)
 - Benchmarks against SQLite + DuckDB (Phase 10 complete, SQLR-4 / SQLR-16): twelve-workload bench harness with a pluggable `Driver` trait, criterion-driven, pinned-host runs published. See [`docs/benchmarks.md`](benchmarks.md).
-- Phase 11 (concurrent writes via MVCC + `BEGIN CONCURRENT`, SQLR-22) is in flight. **11.1 → 11.9: shipped.** Engine + SDK error propagation: `Connection` is `Send + Sync`; `Connection::connect()` mints sibling handles. `sqlrite::mvcc` exposes `MvccClock`, `ActiveTxRegistry`, `MvStore`, `ConcurrentTx`, and the `MvccCommitBatch` / `MvccLogRecord` WAL codec. WAL header v1 → v2 persisted the clock high-water mark; **v2 → v3 (11.9)** adds typed MVCC log-record frames. `PRAGMA journal_mode = mvcc;` opts a database into MVCC. `BEGIN CONCURRENT` writes commit-validate against `MvStore`, abort with `SQLRiteError::Busy`, and now also append an MVCC log-record frame to the WAL — covered by the same fsync as the legacy page commit. Reopen replays those frames into `MvStore` and seeds `MvccClock` past the highest committed `commit_ts`, so the MVCC conflict-detection window survives a process restart. Reads via `Statement::query` see the BEGIN-time snapshot. Per-commit GC + `vacuum_mvcc()` bound version-chain growth. C FFI / Python / Node / Go all propagate `Busy` / `BusySnapshot` as typed retryable errors; the FFI's `sqlrite_connect_sibling`, Python's `Connection.connect()`, and Node's `db.connect()` mint sibling handles that share backing state. Plan: [`docs/concurrent-writes-plan.md`](concurrent-writes-plan.md).
+- Phase 11 (concurrent writes via MVCC + `BEGIN CONCURRENT`, SQLR-22) is **shipped end-to-end through 11.11a** plus the 11.12 docs sweep — a small set of follow-ups (checkpoint-drain to enable `Mvcc → Wal` downgrade; indexes under MVCC; the "N concurrent writers" benchmark workload) remain explicitly parked. `Connection` is `Send + Sync`; `Connection::connect()` mints sibling handles. `sqlrite::mvcc` exposes `MvccClock`, `ActiveTxRegistry`, `MvStore`, `ConcurrentTx`, and the `MvccCommitBatch` / `MvccLogRecord` WAL codec. WAL header v1 → v2 persisted the clock high-water mark; v2 → v3 added typed MVCC log-record frames. `PRAGMA journal_mode = mvcc;` opts a database into MVCC. `BEGIN CONCURRENT` writes commit-validate against `MvStore`, abort with `SQLRiteError::Busy`, and append a typed MVCC log-record frame to the WAL — covered by the same fsync as the legacy page commit. Reopen replays those frames into `MvStore` and seeds `MvccClock` past the highest committed `commit_ts`, so the MVCC conflict-detection window survives a process restart. Reads via `Statement::query` see the BEGIN-time snapshot. Per-commit GC + `vacuum_mvcc()` bound version-chain growth. C FFI / Python / Node / Go propagate `Busy` / `BusySnapshot` as typed retryable errors; the FFI's `sqlrite_connect_sibling`, Python's `Connection.connect()`, and Node's `db.connect()` mint sibling handles that share backing state. The `sqlrite` REPL ships `.spawn` / `.use` / `.conns` for interactive demos. **User-facing reference:** [`docs/concurrent-writes.md`](concurrent-writes.md); runnable example at [`examples/rust/concurrent_writers.rs`](../examples/rust/concurrent_writers.rs). Original design proposal: [`docs/concurrent-writes-plan.md`](concurrent-writes-plan.md).
 - A fully-automated release pipeline that ships every product to its registry on every release with one human action — Rust engine + `sqlrite-ask` + `sqlrite-mcp` to crates.io, Python wheels to PyPI (`sqlrite`), Node.js + WASM to npm (`@joaoh82/sqlrite` + `@joaoh82/sqlrite-wasm`), Go module via `sdk/go/v*` git tag, plus C FFI tarballs, MCP binary tarballs, and unsigned desktop installers as GitHub Release assets (Phase 6 complete)

 See the [Roadmap](roadmap.md) for the full phase plan.
diff --git a/docs/concurrent-writes-plan.md b/docs/concurrent-writes-plan.md
index 2e9ae1a..8abd267 100644
--- a/docs/concurrent-writes-plan.md
+++ b/docs/concurrent-writes-plan.md
@@ -1,8 +1,16 @@
 # Concurrent writes plan — MVCC + `BEGIN CONCURRENT`

-**Status:** proposal, not yet scheduled. Drafted 2026-05-07.
+> 📘 **Looking for the user-facing reference?** This is the original
+> design proposal, kept as the historical record of the decisions
+> that shaped Phase 11. For the shipped surface — SQL, embedding API,
+> SDK error mapping, REPL meta-commands, durability story,
+> limitations — read [**`concurrent-writes.md`**](concurrent-writes.md)
+> first; come back here when you want the *why* and the
+> sequencing discussion.
+
+**Status:** shipped end-to-end through Phase 11.11a (May 2026); a small set of follow-ups remain explicitly parked — see the [roadmap](roadmap.md#phase-11--concurrent-writes-via-mvcc--begin-concurrent-sqlr-22-in-flight--see-concurrent-writes-planmd). Drafted 2026-05-07.
 **Inspiration:** [Turso](https://turso.tech) — a SQLite-compatible engine, written in Rust, that implements multi-version concurrency control to lift SQLite's single-writer ceiling. See [`turso/core/mvcc/`](https://github.com/tursodatabase/turso/tree/main/core/mvcc) and the [Turso concurrent-writes docs](https://docs.turso.tech/tursodb/concurrent-writes).
-**Tracks:** SQLR-?? (Marvin) — to be filed alongside this doc.
+**Tracks:** [SQLR-22](https://app.marvinapp.io/) (Marvin).

 This document proposes adding **multi-version concurrency control (MVCC)** and a **`BEGIN CONCURRENT`** transaction mode to SQLRite, enabling multiple writers in the same process to make progress in parallel under snapshot isolation, with row-level write-write conflict detection at commit. It is intentionally a *plan* — there is no code yet.

@@ -296,9 +304,11 @@ Index maintenance under MVCC is hard enough that Turso explicitly punted on it.

 ### Phase 10.9 — Docs

-- Promote this plan to `docs/concurrent-writes.md` (the canonical user-facing reference), keeping `concurrent-writes-plan.md` as the historical design document.
-- Update [roadmap.md](roadmap.md), [`docs/_index.md`](_index.md), [supported-sql.md](supported-sql.md), [embedding.md](embedding.md), [design-decisions.md](design-decisions.md).
-- Add a worked example under `examples/rust/concurrent_writers.rs`.
+> **Status (roadmap 11.12 — May 2026):** Shipped. The canonical user-facing reference at [`docs/concurrent-writes.md`](concurrent-writes.md) covers the SQL surface, embedding API, SDK error mapping, REPL meta-commands, durability story, and limitations as of Phase 11.11a. This plan-doc is now the historical record. Cross-references in `_index.md`, `supported-sql.md`, `embedding.md`, and `design-decisions.md` point at the canonical doc; a runnable example lives at [`examples/rust/concurrent_writers.rs`](../examples/rust/concurrent_writers.rs).
+
+- Promote this plan to `docs/concurrent-writes.md` (the canonical user-facing reference), keeping `concurrent-writes-plan.md` as the historical design document. **(Shipped — 11.12.)**
+- Update [roadmap.md](roadmap.md), [`docs/_index.md`](_index.md), [supported-sql.md](supported-sql.md), [embedding.md](embedding.md), [design-decisions.md](design-decisions.md). **(Shipped — 11.12.)**
+- Add a worked example under `examples/rust/concurrent_writers.rs`. **(Shipped — 11.12.)**

 ---

diff --git a/docs/concurrent-writes.md b/docs/concurrent-writes.md
new file mode 100644
index 0000000..c99445a
--- /dev/null
+++ b/docs/concurrent-writes.md
@@ -0,0 +1,340 @@
+# Concurrent writes — MVCC + `BEGIN CONCURRENT`
+
+User-facing reference for SQLRite's multi-version concurrency control. For the original design discussion + sequencing decisions, see [`concurrent-writes-plan.md`](concurrent-writes-plan.md); this doc covers the *shipped* surface as of Phase 11.11a (May 2026).
+
+---
+
+## TL;DR
+
+```sql
+PRAGMA journal_mode = mvcc; -- once per database
+BEGIN CONCURRENT;
+UPDATE accounts SET balance = balance - 50 WHERE id = 1;
+UPDATE accounts SET balance = balance + 50 WHERE id = 2;
+COMMIT; -- may return Busy → caller retries
+```
+
+Two writers on *disjoint* rows now make progress in parallel; two writers on the *same* row see the second commit fail fast with [`SQLRiteError::Busy`](../src/error.rs), which the caller retries. The data structure backing this is a per-row in-memory version chain ([`MvStore`](../src/mvcc/store.rs)) sitting in front of the existing pager; the database file format is unchanged — durability piggybacks on the WAL via a new `MvccCommitBatch` frame (Phase 11.9). Reads inside a `BEGIN CONCURRENT` transaction see a stable BEGIN-time snapshot.
+
+The story is the same one Turso ships with `--experimental-mvcc`, narrowed for SQLRite's single-process scope.
+
+---
+
+## Why MVCC
+
+SQLite (and pre-Phase-11 SQLRite) serializes every writer through a single exclusive lock. Two writers touching *unrelated rows* still wait on each other — the lock is page- or file-granularity, not row-granularity. For workloads where most writes don't actually conflict, that's throughput left on the table.
+
+Phase 11 replaces the lock-for-the-whole-transaction model with **optimistic concurrency control**: writes run against a per-transaction snapshot; the engine only checks for conflicts at `COMMIT`, and only on the row IDs the transaction actually touched. The shape is straight out of [Hekaton (Larson et al., VLDB 2011)](https://www.microsoft.com/en-us/research/wp-content/uploads/2011/01/main-mem-cc-techreport.pdf).
+
+What you get:
+
+- **Disjoint-row writers run in parallel.** A `BEGIN CONCURRENT` on connection A and one on connection B can both progress; commit ordering is decided by a process-wide logical clock, not by lock acquisition.
+- **Snapshot-isolated reads.** A reader inside `BEGIN CONCURRENT` sees the database as it was at BEGIN time, regardless of what other writers commit in the meantime.
+- **Row-level conflict detection.** The unit of conflict is `(table, rowid)`, not a page or a table.
+- **Same on-disk format.** Existing `.sqlrite` files open unchanged. The toggle is `PRAGMA journal_mode = mvcc;`.
+
+What you don't get (v0; see [Limitations](#limitations)):
+
+- Cross-process MVCC. The version index is in-memory only; multi-process writers still serialize through the pager's `flock`.
+- `CREATE INDEX` while `journal_mode = mvcc`. Index maintenance under MVCC is Phase 11.10 (deferred-by-design).
+- DDL inside `BEGIN CONCURRENT`. Rejected with a typed error; commit your DDL outside the concurrent transaction.
+
+---
+
+## Quick start
+
+```sql
+-- 1. Opt the database into MVCC. Per-database setting; survives reopens.
+PRAGMA journal_mode = mvcc;
+
+-- 2. Multi-row update inside a concurrent transaction.
+BEGIN CONCURRENT;
+INSERT INTO orders (id, customer, total) VALUES (1, 'alice', 100);
+UPDATE inventory SET stock = stock - 1 WHERE sku = 'WIDGET-A';
+COMMIT;
+
+-- 3. If two concurrent transactions touch the same row, the second
+-- commit fails with Busy. Retry with a fresh BEGIN CONCURRENT.
+```
+
+The same flow from Rust:
+
+```rust
+use sqlrite::{Connection, SQLRiteError};
+
+let mut conn = Connection::open("orders.sqlrite")?;
+conn.execute("PRAGMA journal_mode = mvcc")?;
+
+loop {
+    conn.execute("BEGIN CONCURRENT")?;
+    conn.execute("INSERT INTO orders (id, customer, total) VALUES (1, 'alice', 100)")?;
+    conn.execute("UPDATE inventory SET stock = stock - 1 WHERE sku = 'WIDGET-A'")?;
+    match conn.execute("COMMIT") {
+        Ok(_) => break,
+        Err(e) if e.is_retryable() => {
+            conn.execute("ROLLBACK").ok();
+            continue;
+        }
+        Err(e) => return Err(e.into()),
+    }
+}
+# Ok::<(), sqlrite::SQLRiteError>(())
+```
+
+[`SQLRiteError::is_retryable`](../src/error.rs) covers both `Busy` (write-write conflict at commit) and `BusySnapshot` (the snapshot the read path expected has been GC'd) — see [Error semantics](#error-semantics).
+
+A complete runnable version of this loop lives in [`examples/rust/concurrent_writers.rs`](../examples/rust/concurrent_writers.rs).
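Since the backoff policy is left to the caller, here is one possible caller-side helper: capped exponential backoff with an attempt budget. It is a sketch, not part of SQLRite; `Backoff` and `retry_with_backoff` are hypothetical names, and the retryable-error check is passed in as a closure so the helper stays independent of `SQLRiteError`.

```rust
use std::thread;
use std::time::Duration;

/// Caller-chosen retry policy (hypothetical; SQLRite ships no such type).
#[derive(Clone, Copy, Debug)]
pub struct Backoff {
    /// Delay before the first retry.
    pub base: Duration,
    /// Cap on any single delay.
    pub max_delay: Duration,
    /// Total attempts, including the first one.
    pub max_attempts: u32,
}

impl Backoff {
    /// Delay before retry number `retry` (0-based): `base * 2^retry`, capped.
    pub fn delay(&self, retry: u32) -> Duration {
        // Shifting by 32 or more bits would overflow; saturate instead.
        let factor = 1u32.checked_shl(retry).unwrap_or(u32::MAX);
        self.base
            .checked_mul(factor)
            .unwrap_or(self.max_delay)
            .min(self.max_delay)
    }
}

/// Runs `attempt` until it succeeds, returns a non-retryable error, or the
/// attempt budget runs out. At least one attempt always happens.
pub fn retry_with_backoff<T, E>(
    policy: Backoff,
    mut is_retryable: impl FnMut(&E) -> bool,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut tries: u32 = 0;
    loop {
        tries += 1;
        match attempt() {
            Ok(value) => return Ok(value),
            // Retry only while budget remains and the error is classified retryable.
            Err(err) if tries < policy.max_attempts && is_retryable(&err) => {
                thread::sleep(policy.delay(tries - 1));
            }
            Err(err) => return Err(err),
        }
    }
}
```

Wired to SQLRite, the predicate would be `|e: &SQLRiteError| e.is_retryable()`, and `attempt` would run one `BEGIN CONCURRENT` … `COMMIT` round, issuing `ROLLBACK` before returning a failed `COMMIT`'s error. Jitter is left out to keep the sketch dependency-free; it is worth adding when many writers could retry in lockstep.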
+
+---
+
+## Conceptual model
+
+### The version chain
+
+For every `(table, rowid)` SQLRite has touched under `BEGIN CONCURRENT`, the [`MvStore`](../src/mvcc/store.rs) holds an ordered chain of `RowVersion`s:
+
+```
+            begin=ts1                begin=ts3                  begin=ts7
+            end=Some(ts3)            end=Some(ts7)              end=None
+            ┌────────────┐           ┌────────────┐             ┌────────────┐
+rowid 42 ─→ │ Present {  │ ──next──→ │ Present {  │ ──next────→ │ Tombstone  │
+            │  balance:  │           │  balance:  │             │ (DELETE)   │
+            │  100       │           │  150       │             │            │
+            │ }          │           │ }          │             │            │
+            └────────────┘           └────────────┘             └────────────┘
+```
+
+A version is **visible** to a transaction with begin-timestamp `T` when `begin <= T < end` (the textbook snapshot-isolation rule; an `end` of `None` means the version has not been superseded). New writes push a new head onto the chain at commit time, capping the previous latest version's `end` to the new `commit_ts`.
+
+### Timestamps come from a process-wide logical clock
+
+[`MvccClock`](../src/mvcc/clock.rs) is an `AtomicU64` that hands out `begin_ts` at `BEGIN CONCURRENT` and `commit_ts` at the start of validation. The clock's high-water mark is persisted in the WAL header (Phase 11.2's WAL v2) and seeded past the highest replayed `commit_ts` on reopen (Phase 11.9), so timestamps are never reused across restarts.
+
+### Commit-time validation
+
+When a `BEGIN CONCURRENT` transaction commits, the engine:
+
+1. Allocates a `commit_ts` from the clock.
+2. Walks the write-set. For each `(table, rowid)`, if any committed version's `begin > tx.begin_ts`, somebody else superseded us → return `SQLRiteError::Busy`.
+3. Otherwise, for each row in the write-set, pushes a new `RowVersion` onto the chain at `commit_ts`, capping the previous latest's `end`.
+4. Appends an `MvccCommitBatch` frame to the WAL; the legacy page-commit's fsync covers it (Phase 11.9).
+5. Mirrors the writes into `Database::tables` so the legacy read path stays correct after commit.
+6. Drops the transaction's `TxHandle` and runs a per-commit GC sweep over the write-set's chains.
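The visibility rule and steps 1 to 3 fit in a short single-threaded sketch. Every type in it (`Key`, `RowVersion`, `Store`, `Tx`, `CommitError`) is a hypothetical stand-in, not SQLRite's real `MvStore` API: rows hold a single `i64`, `None` models a tombstone, steps 4 to 6 (WAL append, mirror write, GC) are left out, and `commit` takes `&mut self`, which serializes commits and sidesteps the locking a real engine needs.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// (table, rowid): the unit of conflict.
pub type Key = (String, i64);

/// Hypothetical row version; `value: None` models a tombstone.
#[derive(Clone, Debug, PartialEq)]
pub struct RowVersion {
    pub begin: u64,
    /// `None` = not yet superseded (an open-ended `end`).
    pub end: Option<u64>,
    pub value: Option<i64>,
}

impl RowVersion {
    /// Snapshot-isolation rule: visible at `ts` iff begin <= ts < end.
    pub fn visible_at(&self, ts: u64) -> bool {
        self.begin <= ts && self.end.map_or(true, |end| ts < end)
    }
}

#[derive(Debug, PartialEq)]
pub enum CommitError {
    Busy { key: Key },
}

/// A transaction: its snapshot timestamp plus its buffered write-set.
pub struct Tx {
    pub begin_ts: u64,
    pub writes: Vec<(Key, Option<i64>)>,
}

#[derive(Default)]
pub struct Store {
    clock: AtomicU64,
    /// Each chain is ordered oldest -> newest.
    chains: HashMap<Key, Vec<RowVersion>>,
}

impl Store {
    /// Logical clock: fetch_add returns the old value, so timestamps start at 1.
    fn tick(&self) -> u64 {
        self.clock.fetch_add(1, Ordering::SeqCst) + 1
    }

    pub fn begin(&self) -> Tx {
        Tx { begin_ts: self.tick(), writes: Vec::new() }
    }

    /// Value visible at snapshot `ts`; `None` if absent or deleted.
    pub fn read(&self, key: &Key, ts: u64) -> Option<i64> {
        self.chains.get(key)?.iter().find(|v| v.visible_at(ts))?.value
    }

    pub fn commit(&mut self, tx: Tx) -> Result<u64, CommitError> {
        // Step 1: allocate the commit timestamp.
        let commit_ts = self.tick();
        // Step 2: any version that began after our snapshot means a conflict.
        for (key, _) in &tx.writes {
            let superseded = self
                .chains
                .get(key)
                .map_or(false, |chain| chain.iter().any(|v| v.begin > tx.begin_ts));
            if superseded {
                return Err(CommitError::Busy { key: key.clone() });
            }
        }
        // Step 3: cap the previous head's `end`, then push the new head.
        for (key, value) in tx.writes {
            let chain = self.chains.entry(key).or_default();
            if let Some(head) = chain.last_mut() {
                head.end = Some(commit_ts);
            }
            chain.push(RowVersion { begin: commit_ts, end: None, value });
        }
        Ok(commit_ts)
    }
}
```

Replaying the REPL demo in miniature: seed row 1 (commit ts 2), begin A (ts 3) and B (ts 4), commit B (ts 5); A's commit then fails with `Busy` because a version with `begin = 5 > 3` heads the chain, while `read(&key, 3)` still returns the pre-conflict value.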
+
+### Reads
+
+Reads via [`Statement::query`](../src/connection.rs) (Phase 11.5) consult `MvStore` first when a `BEGIN CONCURRENT` is open on the connection. If the row has a version visible at the transaction's `begin_ts`, that's the answer; otherwise the read falls through to the legacy table → pager path. This means a reader inside `BEGIN CONCURRENT` sees a consistent BEGIN-time snapshot for as long as the transaction is open.
+
+Reads *outside* `BEGIN CONCURRENT` still go through the legacy path — they see the latest committed state, exactly as before Phase 11. That's the keystone of the design: nothing about the existing non-concurrent codepath changed; MVCC is layered on top, opt-in.
+
+---
+
+## SQL surface
+
+### `PRAGMA journal_mode`
+
+| Form | Effect |
+|---|---|
+| `PRAGMA journal_mode;` | Read — returns the current mode as a single-row `wal` / `mvcc` result |
+| `PRAGMA journal_mode = mvcc;` | Switch this database into MVCC mode |
+| `PRAGMA journal_mode = wal;` | Switch back to the legacy WAL-backed pager |
+
+Case-insensitive on both the pragma name and the value. Quoted values (`'mvcc'`) work; numeric values are rejected. Unknown modes return a typed error.
+
+The setting is **per-database**, not per-connection — every [`Connection::connect`](#sibling-handles) sibling sees the same value. Switching `Mvcc → Wal` is rejected if `MvStore` carries committed versions; call [`Connection::vacuum_mvcc`](#connectionvacuum_mvcc) first to drain the store.
+
+### `BEGIN CONCURRENT`
+
+Opens a concurrent transaction. Requires `PRAGMA journal_mode = mvcc;` first.
+
+```sql
+BEGIN CONCURRENT;
+-- DML against the per-tx snapshot
+SELECT … ; -- sees BEGIN-time state
+INSERT … ;
+UPDATE … ;
+DELETE … ;
+COMMIT; -- or ROLLBACK
+```
+
+Rules (each surfaces as a typed error):
+
+- Plain `BEGIN CONCURRENT` against a `Wal`-mode database is rejected.
+- Nested transactions (`BEGIN CONCURRENT` inside an open one, or `BEGIN` inside one) are rejected.
+- DDL inside `BEGIN CONCURRENT` is rejected — `CREATE TABLE`, `CREATE INDEX`, `DROP TABLE`, `DROP INDEX`, `ALTER TABLE`, `VACUUM` all bounce; the transaction stays open so the caller can `ROLLBACK`.
+- Read-only databases reject `BEGIN CONCURRENT`.
+
+`COMMIT` may surface `SQLRiteError::Busy` or `SQLRiteError::BusySnapshot`. The transaction is dropped on either; the caller's loop should `continue` after a `ROLLBACK`.
+
+### `COMMIT` / `ROLLBACK`
+
+Inside an open `BEGIN CONCURRENT`, plain `COMMIT` validates the write-set and either commits or returns `Busy`. Plain `ROLLBACK` drops the per-tx state and returns control. Both also work outside `BEGIN CONCURRENT` (they fall through to the legacy single-writer transaction control).
+
+---
+
+## Embedding API
+
+### Sibling handles
+
+A single `Connection::open` is the only path that touches the file. Mint additional handles with [`Connection::connect`](../src/connection.rs):
+
+```rust
+let primary = Connection::open("orders.sqlrite")?;
+let secondary = primary.connect();
+let tertiary = primary.connect();
+```
+
+Every sibling shares the same `Arc<Mutex<Database>>`. Each sibling can hold its own independent `BEGIN CONCURRENT` — that's the whole point of multi-handle MVCC. Sibling handles are `Send + Sync`, so they can be moved to, or shared across, threads.
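The sharing model can be pictured with a std-only sketch: each handle holds an `Arc<Mutex<…>>` to the shared engine state plus a private cache. `Database`, `Handle`, and the map-based `plan_cache` here are hypothetical stand-ins, not SQLRite types. The point is that `connect()` clones the `Arc` (cheap, shared state) while each sibling starts with its own empty cache, and that the handle is `Send + Sync` without any `unsafe`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for the shared engine state (tables, pager, ...).
#[derive(Default)]
pub struct Database {
    pub rows: Vec<i64>,
}

/// Hypothetical handle: shared `Database`, private per-handle cache.
pub struct Handle {
    db: Arc<Mutex<Database>>,
    /// Per-handle statement cache (a plain map here, not a real LRU).
    plan_cache: HashMap<String, usize>,
}

impl Handle {
    /// The only constructor that creates engine state.
    pub fn open() -> Self {
        Handle {
            db: Arc::new(Mutex::new(Database::default())),
            plan_cache: HashMap::new(),
        }
    }

    /// Sibling: clones the `Arc` (shared state) but starts an empty cache.
    pub fn connect(&self) -> Self {
        Handle {
            db: Arc::clone(&self.db),
            plan_cache: HashMap::new(),
        }
    }

    /// Caches a stand-in plan id for `sql`, then holds the lock only for the
    /// push; the guard drops at the end of that statement.
    pub fn insert(&mut self, sql: &str, v: i64) {
        let next_id = self.plan_cache.len();
        self.plan_cache.entry(sql.to_string()).or_insert(next_id);
        self.db.lock().unwrap().rows.push(v);
    }

    pub fn count(&self) -> usize {
        self.db.lock().unwrap().rows.len()
    }

    pub fn cached_plans(&self) -> usize {
        self.plan_cache.len()
    }
}
```

Moving four such siblings into `std::thread::spawn` closures and inserting from each leaves every row in the one shared `Database`, while each sibling's cache holds only the statements that sibling prepared.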
+
+Sibling-handle and retryable-error support per SDK (Phase 11.7 + 11.8):
+
+| SDK | Sibling API | Retryable-error type |
+|---|---|---|
+| C FFI | `sqlrite_connect_sibling(existing, out)` | `SqlriteStatus::Busy` / `BusySnapshot`; `sqlrite_status_is_retryable` |
+| Python | `conn.connect()` | `sqlrite.BusyError` / `sqlrite.BusySnapshotError` (both subclass `SQLRiteError`) |
+| Node.js | `db.connect()` | `errorKind(message)` returns `'Busy'` / `'BusySnapshot'` / `'Other'` |
+| Go | *(via the `database/sql` pool; see the note below)* | `errors.Is(err, sqlrite.ErrBusy)` / `ErrBusySnapshot`; `sqlrite.IsRetryable(err)` |
+| WASM | *(deferred — single-threaded runtime)* | *(deferred)* |
+
+For Go, each `sql.Open("sqlrite", path)` still constructs its own backing DB; siblings within a single `sql.DB` pool share state automatically. Cross-pool sharing is a separate follow-up (Phase 11.11b).
+
+### The retry loop
+
+The canonical shape is the same in every language:
+
+```rust
+loop {
+    conn.execute("BEGIN CONCURRENT")?;
+    // Your writes go here; one statement shown.
+    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")?;
+    match conn.execute("COMMIT") {
+        Ok(_) => break,
+        Err(e) if e.is_retryable() => {
+            conn.execute("ROLLBACK").ok();
+            continue;
+        }
+        Err(e) => return Err(e.into()),
+    }
+}
+```
+
+SQLRite intentionally **does not** ship an automatic-backoff retry helper — the right policy (immediate retry, exponential backoff, capped attempts, jittered, etc.) depends on the workload. The retryable-error classification is the only piece the SDK guarantees.
+
+### `Connection::vacuum_mvcc`
+
+Per-commit GC sweeps the write-set's chains automatically. For a deterministic full drain (memory-pressure testing, debug snapshots, `Mvcc → Wal` downgrade prep), call [`conn.vacuum_mvcc()`](../src/connection.rs) — returns the count of versions reclaimed across the whole store. Both paths are safe against in-flight readers: a reader inside `BEGIN CONCURRENT` keeps every version its `begin_ts` snapshot still needs visible.
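The "no in-flight reader can see it" rule can be stated as a watermark. With `W` the oldest `begin_ts` among in-flight transactions, a version whose `end` is `Some(e)` with `e <= W` is invisible to every live snapshot (each has `T >= W >= e`, so `T < end` fails) and to every later one. The sketch below applies that rule to one chain; `ActiveReaders` and `gc_chain` are hypothetical helpers, and SQLRite's `ActiveTxRegistry` and sweep may be structured differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one row version's timestamps.
#[derive(Clone, Debug, PartialEq)]
pub struct Version {
    pub begin: u64,
    /// `None` = current head of the chain (not yet superseded).
    pub end: Option<u64>,
}

/// Multiset of `begin_ts` values held by in-flight transactions.
#[derive(Default)]
pub struct ActiveReaders {
    counts: BTreeMap<u64, usize>,
}

impl ActiveReaders {
    pub fn register(&mut self, begin_ts: u64) {
        *self.counts.entry(begin_ts).or_insert(0) += 1;
    }

    pub fn release(&mut self, begin_ts: u64) {
        let remaining = match self.counts.get_mut(&begin_ts) {
            Some(n) => {
                *n -= 1;
                *n
            }
            None => return,
        };
        if remaining == 0 {
            self.counts.remove(&begin_ts);
        }
    }

    /// Oldest snapshot still in use, or `u64::MAX` when nobody is reading.
    pub fn watermark(&self) -> u64 {
        // BTreeMap iterates keys in ascending order, so the first is the minimum.
        self.counts.keys().next().copied().unwrap_or(u64::MAX)
    }
}

/// Removes versions no live or future snapshot can see and returns how many
/// were reclaimed. Heads (`end == None`) are always kept.
pub fn gc_chain(chain: &mut Vec<Version>, watermark: u64) -> usize {
    let before = chain.len();
    chain.retain(|v| v.end.map_or(true, |end| end > watermark));
    before - chain.len()
}
```

With the three-version chain from the diagram above (`[ts1, ts3)`, `[ts3, ts7)`, `[ts7, ∞)`) and one reader at `begin_ts = 4`, `gc_chain` reclaims only the first version; once that reader releases, the watermark becomes `u64::MAX` and only the head survives.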
+
+---
+
+## REPL multi-handle demo (Phase 11.11a)
+
+The `sqlrite` REPL ships with three meta-commands for interactive MVCC demos. The prompt always shows the active handle (`sqlrite[A]>`, `sqlrite[B]>`):
+
+| Command | Effect |
+|---|---|
+| `.spawn` | Mint a sibling handle off the active one and switch to it |
+| `.use NAME` | Switch the active handle (case-insensitive); errors with the list of valid names on miss |
+| `.conns` | List every handle, mark the active one with `*`, tag handles in an open `BEGIN CONCURRENT` |
+
+End-to-end demo:
+
+```text
+sqlrite[A]> PRAGMA journal_mode = mvcc;
+sqlrite[A]> CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER);
+sqlrite[A]> INSERT INTO t (id, v) VALUES (1, 0);
+sqlrite[A]> .spawn
+Spawned sibling handle 'B' and switched to it. 2 handles open.
+sqlrite[B]> .use A
+sqlrite[A]> BEGIN CONCURRENT;
+sqlrite[A]> UPDATE t SET v = 100 WHERE id = 1;
+sqlrite[A]> .conns
+2 handle(s):
+  * A (BEGIN CONCURRENT)
+    B
+sqlrite[A]> .use B
+sqlrite[B]> BEGIN CONCURRENT;
+sqlrite[B]> UPDATE t SET v = 200 WHERE id = 1;
+sqlrite[B]> COMMIT;
+sqlrite[B]> .use A
+sqlrite[A]> COMMIT;
+An error occured: Busy: write-write conflict on t/1: another transaction
+committed this row at ts=3 (after our begin_ts=1); transaction rolled
+back, retry with a fresh BEGIN CONCURRENT
+sqlrite[A]> .use B
+sqlrite[B]> SELECT * FROM t;
++----+-----+
+| id | v   |
++----+-----+
+| 1  | 200 |
++----+-----+
+```
+
+---
+
+## Error semantics
+
+| Variant | When | Retryable |
+|---|---|---|
+| `SQLRiteError::Busy` | A `BEGIN CONCURRENT` `COMMIT` lost the validation race — some other transaction superseded one of our row writes after our `begin_ts` | yes |
+| `SQLRiteError::BusySnapshot` | A snapshot the read path expected has been GC'd; surfaces from `Statement::query` when a long-lived reader's `begin_ts` predates the GC watermark | yes |
+| Any other variant | Programming error or storage failure | no |
+
+`SQLRiteError::is_retryable()` is the single classifier — every SDK's retryable-error helper is a wrapper over the same predicate.
+
+---
+
+## Durability and recovery
+
+### WAL log records (Phase 11.9)
+
+Every successful `BEGIN CONCURRENT` commit writes **two** WAL records: the legacy per-page commit frames *and* a new typed `MvccCommitBatch` frame distinguished by the sentinel `page_num = u32::MAX`. The MVCC frame is appended buffered; the legacy save's commit-frame fsync covers both, so after a crash either both records survive or neither does.
+
+The MVCC frame body encodes `commit_ts + record stream` (per-record: op tag, table name, rowid, optional column-value pairs). The encoder caps each batch at 4 KiB (the frame body size); multi-frame batches for very large transactions are a deferred follow-up.
+
+### Reopen replay
+
+`pager::open_database` walks every recovered MVCC frame and re-pushes the row versions into `MvStore` via `MvStore::push_committed`. The `MvccClock` is seeded past `max(WAL header's clock_high_water, max(commit_ts among replayed batches))` so post-restart transactions can never hand out a regressed `begin_ts`.
+
+### What's parked
+
+The checkpoint half of plan-doc §10.5 — folding MVCC log records back into pager-level updates so a WAL truncate doesn't lose them, and re-enabling the `Mvcc → Wal` journal-mode downgrade once the store is drainable — is the remaining slice. The legacy save mirror still covers durability of the visible row state on the read path, so the gap is foundation work, not a correctness regression.
+
+### WAL format version
+
+| Version | Adds |
+|---|---|
+| v1 | Pre-Phase-11 baseline. Reads cleanly today. |
+| v2 (Phase 11.2) | `clock_high_water: u64` in the WAL header (bytes 24..32) |
+| v3 (Phase 11.9) | MVCC log-record frames (`page_num = u32::MAX`) |
+
+Decoders accept v1..=v3. A v2 reader on a v3 WAL emits a clean "unsupported WAL format version" diagnostic instead of silently dropping MVCC frames.
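Reopen replay reduces to a few checks that can be sketched without the real codec: reject WAL versions outside `1..=3`, pick out frames carrying the `u32::MAX` sentinel, and seed the clock strictly past both the header's high-water mark and the largest replayed `commit_ts`. `Frame`, `Replay`, and `replay` are hypothetical; the frame body (op tag, table, rowid, column values) is reduced to its `commit_ts`, and pushing records into `MvStore` is left as a comment.

```rust
/// Sentinel page number that marks an MVCC log-record frame.
pub const MVCC_FRAME: u32 = u32::MAX;

/// Hypothetical, already-decoded view of one recovered WAL frame.
pub struct Frame {
    pub page_num: u32,
    /// Commit timestamp of an MVCC batch; ignored for page frames.
    pub commit_ts: u64,
}

#[derive(Debug, PartialEq)]
pub struct Replay {
    pub mvcc_batches: usize,
    pub page_frames: usize,
    /// First timestamp the clock may hand out after reopen.
    pub next_ts: u64,
}

pub fn replay(wal_version: u32, clock_high_water: u64, frames: &[Frame]) -> Result<Replay, String> {
    if !(1..=3).contains(&wal_version) {
        return Err(format!("unsupported WAL format version {wal_version}"));
    }
    let mut max_commit_ts: u64 = 0;
    let (mut mvcc_batches, mut page_frames) = (0, 0);
    for frame in frames {
        if frame.page_num == MVCC_FRAME {
            // A real decoder would re-push each record into the version store here.
            mvcc_batches += 1;
            max_commit_ts = max_commit_ts.max(frame.commit_ts);
        } else {
            page_frames += 1;
        }
    }
    // Seed strictly past both sources so no timestamp is handed out twice.
    let next_ts = clock_high_water.max(max_commit_ts) + 1;
    Ok(Replay { mvcc_batches, page_frames, next_ts })
}
```

For example, with a header high-water mark of 10 and MVCC frames at `commit_ts` 12 and 15, the clock resumes at 16; with a high-water mark of 40 it resumes at 41.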
+
+---
+
+## Limitations
+
+- **`CREATE INDEX` is rejected while `journal_mode = mvcc`.** Index maintenance under MVCC is Phase 11.10 (deferred-by-design — Turso explicitly punted on the same problem).
+- **DDL inside `BEGIN CONCURRENT` is rejected.** Run DDL outside the concurrent transaction, then begin a fresh one.
+- **Cross-process MVCC is out of scope.** The version index is in-memory only; multi-process writers still serialize through the pager's `flock(LOCK_EX)`. SQLRite has no shared-memory coordination file.
+- **No automatic backoff in retry helpers.** Callers pick the policy.
+- **FTS / HNSW indexes are not maintained inside `BEGIN CONCURRENT`.** The per-row commit-apply path covers B-tree secondary indexes only; tables under MVCC writers shouldn't have FTS or HNSW indexes attached if you need the search index to stay current.
+- **`AUTOINCREMENT` is not specifically guarded** — two concurrent INSERTs that each allocate the same rowid surface as `Busy` at the second commit. The plan's "reject AUTOINCREMENT under MVCC" gate is a clean follow-up.
+- **Memory growth is bounded only via GC.** Per-commit sweeps + `vacuum_mvcc()` cover most cases; for adversarial workloads where readers hold long-lived `begin_ts` snapshots, the chains can grow until the longest-lived reader closes.
+- **Bottom-up B-tree rebuild on every save.** The architectural mismatch flagged in the plan-doc still applies. MVCC amortizes the rebuild to checkpoint time only once the checkpoint-drain follow-up lands; until then, every concurrent commit's mirror write to `Database::tables` triggers the legacy `save_database` rebuild path. Fine for v0 workloads; will matter at scale.
+
+---
+
+## See also
+
+- [`docs/concurrent-writes-plan.md`](concurrent-writes-plan.md) — original design proposal + sequencing decisions. Historical; the current doc reflects shipped reality.
+- [`docs/supported-sql.md`](supported-sql.md) — full SQL reference; the `PRAGMA journal_mode` and `BEGIN CONCURRENT` sections cross-link here.
+- [`docs/embedding.md`](embedding.md) — embedding API + multi-handle examples.
+- [`docs/file-format.md`](file-format.md) — WAL frame layout, MVCC log-record body, clock-high-water field.
+- [`docs/design-decisions.md`](design-decisions.md) §12a–§12h — the design notes accumulated across Phase 11 sub-phases.
+- [`docs/roadmap.md`](roadmap.md#phase-11--concurrent-writes-via-mvcc--begin-concurrent-sqlr-22-in-flight--see-concurrent-writes-planmd) — phase-by-phase shipped vs deferred status.
+- [`examples/rust/concurrent_writers.rs`](../examples/rust/concurrent_writers.rs) — runnable retry-loop example.
+
+External:
+
+- [Turso concurrent writes](https://docs.turso.tech/tursodb/concurrent-writes) — the direct inspiration; we cite their issues throughout the plan-doc.
+- [Hekaton (Larson et al., VLDB 2011)](https://www.microsoft.com/en-us/research/wp-content/uploads/2011/01/main-mem-cc-techreport.pdf) — the optimistic MVCC paper Turso (and now SQLRite) builds on.
+- [Hermitage anomaly test suite](https://github.com/ept/hermitage) — snapshot-isolation conformance bar; SQLRite has not yet ported these (a clean follow-up).
diff --git a/docs/design-decisions.md b/docs/design-decisions.md
index 444d0a7..26b0a11 100644
--- a/docs/design-decisions.md
+++ b/docs/design-decisions.md
@@ -144,6 +144,8 @@ Decisions are grouped by the engine layer they concern: parser, storage, concurr

 ---

+> **Phase 11 (§12a–§12h) — concurrent writes via MVCC + `BEGIN CONCURRENT`.** These notes capture the per-slice design decisions made as Phase 11 shipped. For the **user-facing reference** (SQL surface, embedding API, SDK error mapping, REPL meta-commands, durability story, limitations) go to [`docs/concurrent-writes.md`](concurrent-writes.md). The plan-doc references below point to the original [`concurrent-writes-plan.md`](concurrent-writes-plan.md), which stays as the historical design record.
+
 ### 12a. `Connection` as a thin handle over `Arc<Mutex<Database>>` (Phase 11.1)

 **Decision.** `Connection` no longer owns a `Database` by value; it holds `Arc<Mutex<Database>>` plus a per-handle prepared-statement LRU. A new `Connection::connect()` mints a sibling handle that shares the same backing engine state. The mutex is acquired transparently at the entry of every public method (`execute`, `prepare`, `database()`, accessors); statements release it between calls. `Connection: Send + Sync`.
diff --git a/docs/embedding.md b/docs/embedding.md
index d9375a5..4b2cee9 100644
--- a/docs/embedding.md
+++ b/docs/embedding.md
@@ -90,13 +90,17 @@ for h in writers { h.join().unwrap()?; }
 # Ok::<(), sqlrite::SQLRiteError>(())
 ```

-Today every commit still serializes through the per-database mutex (and the pager's existing process-level `flock`); the goal of 11.1 is *capability*, not throughput. True multi-writer throughput on disjoint rows arrives with `BEGIN CONCURRENT` in 11.4 — see below + [`concurrent-writes-plan.md`](concurrent-writes-plan.md).
+Today every commit still serializes through the per-database mutex (and the pager's existing process-level `flock`); the goal of 11.1 is *capability*, not throughput. True multi-writer throughput on disjoint rows arrives with `BEGIN CONCURRENT` in 11.4 — see below, plus the canonical [`docs/concurrent-writes.md`](concurrent-writes.md) reference for the full Phase 11 surface.

 Per-handle state — the prepared-statement cache (LRU populated by `prepare_cached`), the cache capacity setter — stays on each handle, by design (no extra mutex traffic for a per-thread accelerator). The shared state is the `Database` (tables, pager, transaction snapshot, auto-VACUUM threshold).
### Concurrent writes via `BEGIN CONCURRENT` (Phase 11.4) -*Phase 11.4 — see [`supported-sql.md`](supported-sql.md#begin-concurrent-phase-114-sqlr-22) for the full SQL reference.* Multi-writer concurrency is opt-in: `PRAGMA journal_mode = mvcc;` once per database, then each writer wraps its work in `BEGIN CONCURRENT;` … `COMMIT;`. Sibling [`Connection::connect`](#sharing-one-database-across-threads) handles can each hold their own open `BEGIN CONCURRENT`; commits are validated against the [`MvStore`](../src/mvcc/store.rs) version index and abort with `SQLRiteError::Busy` if another writer superseded one of our rows. +> **Canonical reference:** [`docs/concurrent-writes.md`](concurrent-writes.md) — the full Phase 11 user-facing reference (conceptual model, SQL, SDK error mapping, durability, limitations). The summary below is the embedding-API view of the same surface. +> +> **Runnable example:** [`examples/rust/concurrent_writers.rs`](../examples/rust/concurrent_writers.rs) — interleaved BEGINs across two sibling handles, demonstrating both the disjoint-row happy path and the same-row retry. Run with `cargo run --example concurrent_writers`. + +Multi-writer concurrency is opt-in: `PRAGMA journal_mode = mvcc;` once per database, then each writer wraps its work in `BEGIN CONCURRENT;` … `COMMIT;`. Sibling [`Connection::connect`](#sharing-one-database-across-threads) handles can each hold their own open `BEGIN CONCURRENT`; commits are validated against the [`MvStore`](../src/mvcc/store.rs) version index and abort with `SQLRiteError::Busy` if another writer superseded one of our rows. ```rust use sqlrite::{Connection, SQLRiteError}; @@ -126,11 +130,15 @@ The retryable-error branch is the headline new flow: pick a backoff policy that **Memory bounding.** Every successful commit triggers a per-row GC sweep over the write-set's chains, reclaiming versions no in-flight reader can possibly see anymore. 
For workloads where you want a deterministic full drain (memory-pressure testing, debug snapshots), call `conn.vacuum_mvcc()` — returns the count of versions reclaimed across the whole store. Both paths are correct against in-flight readers: a reader holding `BEGIN CONCURRENT; SELECT …` keeps every version its `begin_ts` snapshot needs. -**What's still ahead** (11.7+): +**What shipped after 11.4:** + +- 11.5 — reads inside the transaction see the BEGIN-time snapshot through `Statement::query` / `Statement::query_with_params` as well as `Connection::execute("SELECT…")`. +- 11.6 — per-commit GC + `Connection::vacuum_mvcc()` bound version-chain growth. +- 11.7 + 11.8 — every SDK (C FFI / Python / Node / Go) propagates `Busy` / `BusySnapshot` as a typed retryable error; the FFI's `sqlrite_connect_sibling`, Python's `Connection.connect()`, and Node's `db.connect()` mint sibling handles that share backing state. +- 11.9 — every successful `BEGIN CONCURRENT` commit writes a typed `MvccCommitBatch` frame to the WAL (covered by the same fsync as the legacy page commit), and reopen replays those frames into `MvStore` so the conflict-detection window survives a process restart. +- 11.11a — the REPL ships `.spawn` / `.use` / `.conns` for interactive multi-handle demos; the prompt shows the active handle. -- Reads inside the transaction see the BEGIN-time snapshot via both `Connection::execute("SELECT…")` and `Statement::query()` / `query_with_params()` — the prepare/query gap that 11.4 left open closed in 11.5. -- DDL inside `BEGIN CONCURRENT` is rejected with a typed error. -- The transaction's write-set persists only via the legacy `Database::tables` mirror — a crash mid-transaction loses everything (correct behaviour, the transaction never committed). Phase 11.7 introduces an MVCC log-record WAL frame so `BEGIN CONCURRENT` writes become durable through `MvStore` itself. 
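The visibility rule and GC horizon behind the memory-bounding paragraph can be sketched for a single row's version chain. This is a hedged model, not the `MvStore` implementation: it assumes each version records the commit timestamp that created it (`begin`) and the one that superseded it (`end`), that a snapshot `snap` sees a version iff `begin <= snap < end`, and that new snapshots never start below the current clock value `now`. `Version`, `visible`, `read`, and `gc` are hypothetical names.

```rust
// Hypothetical model of one row's version chain.
#[derive(Debug, PartialEq)]
struct Version {
    begin: u64,       // commit_ts of the writer that created this version
    end: Option<u64>, // commit_ts of the writer that superseded it, if any
    value: i64,
}

// Assumed visibility rule: snapshot `snap` sees the version created at or
// before `snap` and not yet superseded at `snap`.
fn visible(v: &Version, snap: u64) -> bool {
    v.begin <= snap && v.end.map_or(true, |e| e > snap)
}

fn read(chain: &[Version], snap: u64) -> Option<i64> {
    chain.iter().find(|v| visible(v, snap)).map(|v| v.value)
}

// GC horizon: oldest begin_ts among in-flight readers, or `now` if none.
// A version superseded at or before the horizon is invisible to every open
// snapshot and to any snapshot that starts later, so it can be dropped.
fn gc(chain: &mut Vec<Version>, active_begin_ts: &[u64], now: u64) -> usize {
    let horizon = active_begin_ts.iter().copied().min().unwrap_or(now);
    let before = chain.len();
    chain.retain(|v| v.end.map_or(true, |e| e > horizon));
    before - chain.len()
}

fn main() {
    // Written at ts 1 (100), then updated at ts 5 (110) and ts 9 (115).
    let mut chain = vec![
        Version { begin: 1, end: Some(5), value: 100 },
        Version { begin: 5, end: Some(9), value: 110 },
        Version { begin: 9, end: None, value: 115 },
    ];
    assert_eq!(read(&chain, 3), Some(100));
    assert_eq!(read(&chain, 7), Some(110));
    assert_eq!(read(&chain, 10), Some(115));

    // A reader that began at ts 6 is still open and needs the ts-5 version,
    // so only the ts-1 version is reclaimable.
    assert_eq!(gc(&mut chain, &[6], 12), 1);
    assert_eq!(read(&chain, 6), Some(110));

    // After that reader finishes, a sweep leaves only the latest version.
    assert_eq!(gc(&mut chain, &[], 12), 1);
    assert_eq!(chain, vec![Version { begin: 9, end: None, value: 115 }]);
}
```

In this model the horizon is the oldest in-flight `begin_ts`, which is why an open `BEGIN CONCURRENT` reader pins exactly the versions its snapshot still needs and nothing older.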
+**What's deferred** (see [`docs/concurrent-writes.md`](concurrent-writes.md#limitations) for the full list): DDL inside `BEGIN CONCURRENT`, `CREATE INDEX` while `journal_mode = mvcc`, cross-process MVCC, the checkpoint-drain path that would re-enable `set_journal_mode(Mvcc → Wal)`, and the "N concurrent writers" benchmark workload (carved out as Phase 11.11b). ### What's deferred diff --git a/docs/roadmap.md b/docs/roadmap.md index e4fbd35..9885d9b 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -2,7 +2,7 @@ The project is staged in phases. Each phase is shippable on its own, ends with a working build + full test suite + a commit on `main`, and can be paused between. The README's roadmap section is a summary of this doc. -> **Active frontier (May 2026):** Phases 0–10 shipped end-to-end. After Phase 8 closed the v0.1.x cycle, the v0.2.0 → v0.9.1 wave (Phase 9, sub-phases 9a–9i) landed the SQL surface that had been parked under "possible extras": DDL completeness (DEFAULT, DROP TABLE/INDEX, ALTER TABLE), free-list + auto-VACUUM, IS NULL, GROUP BY + aggregates + DISTINCT + LIKE + IN, four flavors of JOIN, prepared statements with parameter binding, HNSW metric extension, and the PRAGMA dispatcher. Phase 10 published the SQLR-4 / SQLR-16 benchmarks against SQLite + DuckDB. **Current head: v0.9.1.** Phase 11 (concurrent writes via MVCC + `BEGIN CONCURRENT`, SQLR-22) is now in flight — the multi-connection foundation (11.1) is the first slice; see [`concurrent-writes-plan.md`](concurrent-writes-plan.md) for the full design. +> **Active frontier (May 2026):** Phases 0–10 shipped end-to-end. 
After Phase 8 closed the v0.1.x cycle, the v0.2.0 → v0.9.1 wave (Phase 9, sub-phases 9a–9i) landed the SQL surface that had been parked under "possible extras": DDL completeness (DEFAULT, DROP TABLE/INDEX, ALTER TABLE), free-list + auto-VACUUM, IS NULL, GROUP BY + aggregates + DISTINCT + LIKE + IN, four flavors of JOIN, prepared statements with parameter binding, HNSW metric extension, and the PRAGMA dispatcher. Phase 10 published the SQLR-4 / SQLR-16 benchmarks against SQLite + DuckDB. **Current head: v0.9.1.** **Phase 11 (concurrent writes via MVCC + `BEGIN CONCURRENT`, SQLR-22) is shipped end-to-end through 11.12** — the multi-connection foundation, logical clock, `MvStore`, `BEGIN CONCURRENT` writes + commit-time validation, snapshot-isolated reads, garbage collection, SDK propagation across C / Python / Node / Go, multi-handle SDK shape, WAL log-record durability + crash recovery, REPL `.spawn` for interactive demos, and the canonical user-facing reference all landed. A small set of follow-ups (checkpoint-drain to enable `Mvcc → Wal` downgrade, indexes under MVCC, the "N concurrent writers" benchmark workload) remain explicitly parked. See [`concurrent-writes.md`](concurrent-writes.md) for the user-facing reference; [`concurrent-writes-plan.md`](concurrent-writes-plan.md) for the design rationale. ## ✅ Phase 0 — Modernization @@ -581,9 +581,9 @@ Every executable statement accepts `?` placeholders anywhere a value literal is End-to-end SQLR-4 / SQLR-16 bench harness with twelve workloads across three groups (read-by-PK, transactional CRUD, analytical slices, vector / FTS retrieval). Pluggable `Driver` trait + bundled SQLite + DuckDB drivers; criterion-based; pinned-host runs published at [`docs/benchmarks.md`](benchmarks.md). Excluded from CI (criterion is too noisy on shared runners; `rusqlite-bundled` is heavy). See [`docs/benchmarks-plan.md`](benchmarks-plan.md) for the design and PRs #102–#114 for the staged rollout. 
-## Phase 11 — Concurrent writes via MVCC + `BEGIN CONCURRENT` *(SQLR-22; in flight — see [`concurrent-writes-plan.md`](concurrent-writes-plan.md))* +## Phase 11 — Concurrent writes via MVCC + `BEGIN CONCURRENT` *(SQLR-22; shipped end-to-end through 11.12 — canonical reference: [`concurrent-writes.md`](concurrent-writes.md); design rationale: [`concurrent-writes-plan.md`](concurrent-writes-plan.md))* -Lift SQLRite past SQLite's single-writer ceiling with multi-version concurrency control and a `BEGIN CONCURRENT` transaction mode, modelled on Turso's experimental MVCC. The plan doc internally numbers sub-phases as "Phase 10.x" (its working title before the roadmap renumbering); they're listed under Phase 11 here because Phase 10 already shipped. +Lift SQLRite past SQLite's single-writer ceiling with multi-version concurrency control and a `BEGIN CONCURRENT` transaction mode, modelled on Turso's experimental MVCC. The plan doc internally numbers sub-phases as "Phase 10.x" (its working title before the roadmap renumbering); they're listed under Phase 11 here because Phase 10 already shipped. Remaining follow-ups (checkpoint-drain to enable `Mvcc → Wal` downgrade, indexes under MVCC, the bench workload) are explicitly carved out and parked. ### ✅ Phase 11.1 — Multi-connection foundation *(plan-doc "Phase 10.1")* @@ -701,9 +701,13 @@ The downstream "N concurrent writers" benchmark workload (originally bundled int New benchmark in [`benchmarks/`](../benchmarks/) that pits SQLRite-MVCC against SQLite + DuckDB on a disjoint-row "N writers, mostly disjoint rows" scenario. Slots into the existing SQLR-16 harness as a Group D differentiator workload. Also includes Go SDK multi-handle work (cross-pool sibling shape) — see the 11.8 note for why that's a separate slice. -### Phase 11.12 — Docs *(planned, plan-doc "Phase 10.9")* +### ✅ Phase 11.12 — Docs sweep *(plan-doc "Phase 10.9")* -Promote the plan to `docs/concurrent-writes.md` and update the cross-references. 
+Promotes the plan to a canonical user-facing reference at [`docs/concurrent-writes.md`](concurrent-writes.md) — SQL surface, embedding API, SDK error mapping, REPL meta-commands, durability story, limitations all in one place. The original [`concurrent-writes-plan.md`](concurrent-writes-plan.md) stays as the historical design record with a redirect banner at the top. + +- Cross-references updated in [`docs/_index.md`](_index.md), [`docs/supported-sql.md`](supported-sql.md), [`docs/embedding.md`](embedding.md), this file, and the design-decisions doc. +- New runnable example at [`examples/rust/concurrent_writers.rs`](../examples/rust/concurrent_writers.rs) (registered as `cargo run --example concurrent_writers`) — two sibling handles, interleaved `BEGIN CONCURRENT`s, demonstrating both the disjoint-row happy path and the same-row retry. +- `examples/README.md` lists the new example alongside the existing quickstart and hybrid-retrieval entries. ## "Possible extras" not pinned to a phase diff --git a/docs/supported-sql.md b/docs/supported-sql.md index cdf49d8..f46b147 100644 --- a/docs/supported-sql.md +++ b/docs/supported-sql.md @@ -564,6 +564,8 @@ Out-of-range values (anything outside `0.0..=1.0`, `NaN`, `±∞`) and unknown i ### `PRAGMA journal_mode` (Phase 11.3, SQLR-22) +> The full Phase 11 user-facing reference — conceptual model, embedding API, SDK error mapping, REPL meta-commands, durability story, limitations — lives at [`docs/concurrent-writes.md`](concurrent-writes.md). This section is the SQL-syntax reference. + Selects the per-database concurrency model. `wal` (default) is the legacy WAL-backed pager every pre-Phase-11 build used; `mvcc` opts the database into multi-version concurrency control (Phase 11 — concurrent writes via `BEGIN CONCURRENT`). ```sql @@ -575,14 +577,14 @@ PRAGMA journal_mode = wal; -- switch back (rejected if the MvStore Case-insensitive on both the pragma name and the value. 
Quoted values (`'mvcc'`) work; numeric values are rejected (the field is enum-shaped). Unknown modes return a typed error and don't disturb the existing setting. -The setting is **per-database** — every `Connection::connect` sibling sees the same value (the [open-question](concurrent-writes-plan.md) on per-connection vs per-database journal mode resolved to per-database for v0; revisit if a workload requires the per-connection variant). Reachable through the public API as `Connection::journal_mode() -> JournalMode`. - -**What 11.3 changes:** the toggle is observable. The data structures backing MVCC (`MvccClock`, `MvStore`, the active-transaction registry) are allocated and round-trip through `PRAGMA`. **What 11.4 adds (this slice):** `BEGIN CONCURRENT` writes go through commit-time validation against `MvStore`; same-row conflicts surface as `SQLRiteError::Busy`. See `BEGIN CONCURRENT` below. +The setting is **per-database** — every `Connection::connect` sibling sees the same value. Reachable through the public API as `Connection::journal_mode() -> JournalMode`. --- ## `BEGIN CONCURRENT` (Phase 11.4, SQLR-22) +> For the conceptual walkthrough (version chains, snapshot-isolation visibility, the WAL log-record durability story, REPL `.spawn` demos), see [`docs/concurrent-writes.md`](concurrent-writes.md). This section is the SQL-syntax reference. + Opens a transaction that doesn't acquire the engine's single-writer lock — multiple `BEGIN CONCURRENT` transactions can coexist, on the same `Connection` or across sibling [`Connection::connect`](embedding.md#sharing-one-database-across-threads) handles. Writes accumulate against a per-transaction snapshot; at `COMMIT`, the engine validates the write-set against any versions that committed after the transaction's `begin_ts` and aborts with [`SQLRiteError::Busy`](../src/error.rs) if some other transaction superseded a row. ```sql @@ -681,7 +683,6 @@ For context when you hit `NotImplemented`. 
See [Roadmap](roadmap.md) for when th ### Transactions - Savepoints (`SAVEPOINT`, `RELEASE SAVEPOINT`, `ROLLBACK TO SAVEPOINT`) - Isolation-level control (`BEGIN IMMEDIATE`, `BEGIN EXCLUSIVE`) -- Concurrent writes / MVCC (`BEGIN CONCURRENT`) — proposal sketched in [`docs/concurrent-writes-plan.md`](concurrent-writes-plan.md) ### Query shape - `OFFSET` diff --git a/examples/README.md b/examples/README.md index 172a04e..0c8f917 100644 --- a/examples/README.md +++ b/examples/README.md @@ -32,6 +32,14 @@ cargo run --example hybrid-retrieval Combines BM25 lexical scoring (Phase 8b) with vector cosine distance (Phase 7d) in a single `ORDER BY`, showing where each ranking shape wins on the same corpus. Pre-baked vectors keep the example self-contained — no embedding-model dependency. Read [`hybrid-retrieval/README.md`](hybrid-retrieval/README.md) for the narrative. +## Running the concurrent-writers example (Phase 11.12) + +```bash +cargo run --example concurrent_writers +``` + +End-to-end `BEGIN CONCURRENT` demo with two sibling [`Connection`](../docs/embedding.md) handles minted via `Connection::connect`. Two scenarios: disjoint-row commits both succeed; interleaved same-row commits resolve with one winner and one `SQLRiteError::Busy` → retry. Source: [`rust/concurrent_writers.rs`](rust/concurrent_writers.rs); the canonical conceptual walkthrough lives at [`docs/concurrent-writes.md`](../docs/concurrent-writes.md). + ## Running the C sample ```bash diff --git a/examples/rust/concurrent_writers.rs b/examples/rust/concurrent_writers.rs new file mode 100644 index 0000000..251dec7 --- /dev/null +++ b/examples/rust/concurrent_writers.rs @@ -0,0 +1,85 @@ +//! End-to-end `BEGIN CONCURRENT` demo with two sibling handles. +//! +//! Run with: `cargo run --example concurrent_writers` +//! +//! Phase 11 (SQLR-22) opt-in MVCC. The example: +//! +//! 1. Opens a connection, opts the database into `journal_mode = mvcc`. +//! 2.
Mints a sibling handle via `Connection::connect` so two writers +//! share the same backing database. +//! 3. Runs two concurrent transactions: +//! - A and B touch *disjoint* rows → both commit. +//! - A and B touch the *same* row → the second commit fails +//! with `SQLRiteError::Busy`; the retry takes a fresh +//! `begin_ts`, observes the post-commit state, and lands. +//! +//! The retry-on-`Busy` branch (a single retry here) is the canonical shape every SDK reuses; see +//! [`docs/concurrent-writes.md`](../../docs/concurrent-writes.md). + +use sqlrite::{Connection, Result}; + +fn main() -> Result<()> { + let mut a = Connection::open_in_memory()?; + a.execute("PRAGMA journal_mode = mvcc")?; + a.execute( + "CREATE TABLE accounts ( + id INTEGER PRIMARY KEY, + holder TEXT NOT NULL, + balance INTEGER NOT NULL + )", + )?; + a.execute("INSERT INTO accounts (id, holder, balance) VALUES (1, 'alice', 100)")?; + a.execute("INSERT INTO accounts (id, holder, balance) VALUES (2, 'bob', 100)")?; + + // Sibling handle on the same Arc<Mutex<Database>>. In real apps + // you'd hand this to a worker thread; we keep it on the main + // thread to keep the demo readable. + let mut b = a.connect(); + + println!("=== Disjoint-row commits both succeed ==="); + a.execute("BEGIN CONCURRENT")?; + b.execute("BEGIN CONCURRENT")?; + a.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 1")?; + b.execute("UPDATE accounts SET balance = balance + 20 WHERE id = 2")?; + a.execute("COMMIT")?; + b.execute("COMMIT")?; // write-sets don't intersect — no conflict. + print_balances(&mut a)?; + + println!("\n=== Same-row commits: A wins, B retries ==="); + // Interleave BEGINs so A.begin_ts < B.begin_ts and both see the + // same pre-update value.
+ a.execute("BEGIN CONCURRENT")?; + b.execute("BEGIN CONCURRENT")?; + a.execute("UPDATE accounts SET balance = balance + 5 WHERE id = 1")?; + b.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 1")?; + a.execute("COMMIT")?; + // B's commit sees a version newer than its own `begin_ts` → Busy. + // The transaction is already dropped on the failed COMMIT; + // there's no ROLLBACK to run. Start a fresh BEGIN CONCURRENT. + match b.execute("COMMIT") { + Err(e) if e.is_retryable() => { + eprintln!(" B lost the race: {e}"); + b.execute("BEGIN CONCURRENT")?; + b.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 1")?; + b.execute("COMMIT")?; + } + other => { + other?; + } + } + print_balances(&mut a)?; + + Ok(()) +} + +fn print_balances(conn: &mut Connection) -> Result<()> { + let stmt = conn.prepare("SELECT id, holder, balance FROM accounts ORDER BY id")?; + let mut rows = stmt.query()?; + while let Some(row) = rows.next()? { + let id: i64 = row.get_by_name("id")?; + let holder: String = row.get_by_name("holder")?; + let balance: i64 = row.get_by_name("balance")?; + println!(" account {id} ({holder}): {balance}"); + } + Ok(()) +}
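To make the commit-time check and the retry shape concrete without depending on SQLRite's API, here is a single-threaded model of first-committer-wins validation. Everything in it (`Store`, `Tx`, `TxError`, `with_retry`, the integer logical clock) is hypothetical: each commit appends a `(commit_ts, value)` version per written row, and `commit` returns `Busy` if any row in the write-set gained a version after the transaction's `begin_ts`. `with_retry` re-runs the whole body, so every attempt takes a fresh `begin_ts` (the "start a fresh BEGIN CONCURRENT" step in the example above), bounded by `max_attempts`.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum TxError { Busy }

impl TxError {
    fn is_retryable(&self) -> bool {
        matches!(self, TxError::Busy)
    }
}

// Hypothetical store: row id -> versions as (commit_ts, value), oldest first.
struct Store {
    clock: u64,
    versions: HashMap<u64, Vec<(u64, i64)>>,
}

// Hypothetical transaction: snapshot timestamp plus buffered writes.
struct Tx {
    begin_ts: u64,
    writes: HashMap<u64, i64>,
}

impl Store {
    fn begin(&mut self) -> Tx {
        self.clock += 1;
        Tx { begin_ts: self.clock, writes: HashMap::new() }
    }

    // Snapshot read: own write first, else newest version with ts <= begin_ts.
    fn read(&self, tx: &Tx, row: u64) -> Option<i64> {
        if let Some(v) = tx.writes.get(&row) {
            return Some(*v);
        }
        let chain = self.versions.get(&row)?;
        chain.iter().rev().find(|(ts, _)| *ts <= tx.begin_ts).map(|(_, v)| *v)
    }

    // First-committer-wins: abort with Busy if any written row gained a
    // version after begin_ts. The Tx is consumed either way (no ROLLBACK).
    fn commit(&mut self, tx: Tx) -> Result<u64, TxError> {
        let conflict = tx.writes.keys().any(|row| {
            self.versions.get(row).and_then(|c| c.last()).map_or(false, |(ts, _)| *ts > tx.begin_ts)
        });
        if conflict {
            return Err(TxError::Busy);
        }
        self.clock += 1;
        let commit_ts = self.clock;
        for (row, value) in tx.writes {
            self.versions.entry(row).or_default().push((commit_ts, value));
        }
        Ok(commit_ts)
    }
}

// Bounded retry: re-run the whole body (fresh begin_ts) while retryable.
fn with_retry<T>(
    store: &mut Store,
    max_attempts: u32,
    mut body: impl FnMut(&mut Store) -> Result<T, TxError>,
) -> Result<T, TxError> {
    let mut attempt = 1;
    loop {
        match body(store) {
            Err(e) if e.is_retryable() && attempt < max_attempts => attempt += 1,
            other => return other,
        }
    }
}

fn main() {
    // Row 1 holds balance 100, committed at ts 1.
    let mut store = Store { clock: 1, versions: HashMap::from([(1, vec![(1, 100)])]) };
    let (mut a_pending, mut attempts) = (true, 0);
    let result = with_retry(&mut store, 3, |s| {
        attempts += 1;
        let mut b = s.begin();
        let balance = s.read(&b, 1).unwrap();
        b.writes.insert(1, balance + 50);
        if a_pending {
            // Simulate writer A committing the same row while B is open.
            a_pending = false;
            let mut a = s.begin();
            let balance_a = s.read(&a, 1).unwrap();
            a.writes.insert(1, balance_a + 5);
            s.commit(a)?;
        }
        s.commit(b) // first attempt: Busy; the retry sees A's write
    });
    assert!(result.is_ok());
    assert_eq!(attempts, 2);
    let check = store.begin();
    assert_eq!(store.read(&check, 1), Some(155)); // 100 + 5 + 50
}
```

A production caller would also sleep with some backoff between attempts; the sketch omits that to stay deterministic.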