
feat(retrieval): Phase 1 — typed schema and FT.CREATE translation #236

Merged
jamby77 merged 7 commits into master from feature/retrieval-sdk-phase1-schema-builder
Jun 15, 2026

@jamby77 jamby77 commented Jun 12, 2026

Collaborator

Summary

Phase 1 of the Retrieval SDK plan, stacked on #234 (Phase 0). Adds the first code of @betterdb/retrieval plus one deferred Phase 0 review item in the kit.

  • Scaffold packages/retrieval (@betterdb/retrieval 0.1.0), mirroring valkey-search-kit, with a workspace dep on the kit
  • Pure buildFtCreateArgs(name, schema, capabilities?) translating the typed index schema (text / tag+separator / numeric+sortable fields; HNSW|FLAT vector as a discriminated union) into the full FT.CREATE argument vector — HNSW defaults M=16 / EF_CONSTRUCTION=200 / EF_RUNTIME=10 always emitted; exported indexName() / keyPrefix() naming helpers
  • TEXT field emission gated via FtCapabilities.textFields — valkey-search < 1.2 rejects TEXT, so callers on older modules get an actionable error instead of a server failure
  • Tighten isIndexNotFoundError in valkey-search-kit: the broad 'not found' substring match is now scoped to index errors (it requires 'not found' and 'index' to co-occur). Verified against live engines: valkey-search 1.2 emits "Index with name '…' not found in database 0" and Redis 8 emits "No such index …". Both still match, while generic 'key not found'-style messages are no longer misclassified. The semantic-cache characterization lock was deliberately split into positive and negative cases for this.
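For readers who have not opened the diff, here is a rough sketch of the vector half of the translation described in the second bullet. It is not the code in ft-create.ts: the type names, the metric names, and the fixed TYPE FLOAT32 are assumptions made for illustration. Only the HNSW defaults (M=16, EF_CONSTRUCTION=200, EF_RUNTIME=10) and the 12/6 parameter counts come from this PR.

```typescript
// Hypothetical, simplified types; the real ones live in the retrieval package.
type MetricSketch = 'cosine' | 'l2' | 'ip'; // assumed metric names

interface HnswSpecSketch {
  algorithm: 'HNSW';
  dims: number;
  metric: MetricSketch;
  m?: number;
  efConstruction?: number;
  efRuntime?: number;
}

interface FlatSpecSketch {
  algorithm: 'FLAT';
  dims: number;
  metric: MetricSketch;
}

// Discriminated union: a FLAT spec cannot carry HNSW tuning params.
type VectorSpecSketch = HnswSpecSketch | FlatSpecSketch;

const METRIC_NAMES: Record<MetricSketch, string> = {
  cosine: 'COSINE',
  l2: 'L2',
  ip: 'IP',
};

function sketchVectorArgs(fieldName: string, spec: VectorSpecSketch): string[] {
  // TYPE FLOAT32 is an assumption; the PR text does not state the element type.
  const base = [
    'TYPE', 'FLOAT32',
    'DIM', String(spec.dims),
    'DISTANCE_METRIC', METRIC_NAMES[spec.metric],
  ];
  if (spec.algorithm === 'FLAT') {
    // 3 pairs, 6 params
    return [fieldName, 'VECTOR', 'FLAT', String(base.length), ...base];
  }
  // HNSW defaults are always emitted: 6 pairs, 12 params.
  const hnsw = [
    ...base,
    'M', String(spec.m ?? 16),
    'EF_CONSTRUCTION', String(spec.efConstruction ?? 200),
    'EF_RUNTIME', String(spec.efRuntime ?? 10),
  ];
  return [fieldName, 'VECTOR', 'HNSW', String(hnsw.length), ...hnsw];
}
```

Because the union is discriminated on algorithm, the early return for FLAT narrows spec to the HNSW variant, so the tuning fields are only reachable where they are valid.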

Test Plan

  • @betterdb/retrieval unit tests: 32/32 (table-driven, full-vector deep equality)
  • @betterdb/valkey-search-kit unit tests: 32/32 (incl. empirically captured engine phrasings)
  • @betterdb/semantic-cache suite: 191/191 (characterization net intact)
  • semantic-cache integration suite vs live valkey-bundle (valkey-search 1.2, port 6384): 13/13
  • tsc builds clean across the three packages

Stacked PR: base is feature/retrieval-sdk-valkey-search-kit (#234). After #234 merges, this will be rebased onto master and retargeted — do not delete the base branch before that.


Note

Medium Risk
The tightened index-not-found heuristic changes runtime error handling for semantic-cache initialization and any other kit consumers; behavior is well-tested but mis-tuned matching could still mis-route FT.INFO failures.

Overview
Introduces @betterdb/retrieval (0.1.0) as Phase 1 of the retrieval SDK: typed index schema (text / tag / numeric fields plus HNSW|FLAT vector specs) and pure buildFtCreateArgs that emits the full FT.CREATE argument vector, with indexName / keyPrefix helpers and FtCapabilities.textFields to fail fast when TEXT fields are used on valkey-search < 1.2.
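A minimal sketch of how the non-vector fields and the TEXT capability gate could fit together. The field shapes, the boolean type of textFields, the helper name, and the error wording are all assumptions; only the field kinds (text, tag with separator, numeric with sortable) and the fail-fast behaviour are taken from the description above.

```typescript
// Hypothetical field shapes, loosely following the kinds named in this PR.
type FieldSketch =
  | { type: 'text' }
  | { type: 'tag'; separator?: string }
  | { type: 'numeric'; sortable?: boolean };

// Assumption: the capability flag is a plain boolean.
interface CapabilitiesSketch {
  textFields: boolean;
}

function sketchFieldArgs(
  name: string,
  field: FieldSketch,
  caps: CapabilitiesSketch,
): string[] {
  switch (field.type) {
    case 'text':
      // Fail on the client with an actionable message instead of letting
      // an older module reject the command.
      if (!caps.textFields) {
        throw new Error(
          `Field "${name}" is TEXT, but the target module does not support TEXT fields (valkey-search < 1.2)`,
        );
      }
      return [name, 'TEXT'];
    case 'tag':
      return field.separator === undefined
        ? [name, 'TAG']
        : [name, 'TAG', 'SEPARATOR', field.separator];
    case 'numeric':
      return field.sortable ? [name, 'NUMERIC', 'SORTABLE'] : [name, 'NUMERIC'];
  }
}
```

A caller that knows it is talking to an older module would pass { textFields: false } and get the error before any command is sent.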

isIndexNotFoundError in @betterdb/valkey-search-kit no longer treats every 'not found' substring as a missing index; it now requires both 'not found' and 'index' (plus existing phrasings), so messages like key not found are not misclassified. Semantic-cache characterization tests were split into positive index-missing cases and a negative case where generic not-found errors surface as ValkeyCommandError instead of triggering index creation.
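The tightened heuristic can be approximated as follows. This is a sketch of the rule as described here, not the kit's source: the function name is made up, lower-casing the message is an assumption, and only the two engine phrasings quoted in this PR are covered (the kit's other existing phrasings are not reproduced).

```typescript
// Hypothetical stand-in for the kit's isIndexNotFoundError, operating on the
// raw error message only.
function looksLikeIndexNotFound(message: string): boolean {
  const m = message.toLowerCase(); // case-insensitive match is an assumption

  // Redis 8 phrasing quoted in this PR: "No such index ..."
  if (m.includes('no such index')) return true;

  // valkey-search 1.2 phrasing: "Index with name '...' not found in database 0".
  // 'not found' alone is no longer enough; 'index' must co-occur.
  return m.includes('not found') && m.includes('index');
}

// 'docs' is a made-up index name used only for this example.
const samples = [
  "Index with name 'docs' not found in database 0",
  'No such index docs',
  'key not found',
];
for (const s of samples) {
  console.log(`${looksLikeIndexNotFound(s)}  <-  ${s}`);
}
```

The first two samples classify as a missing index and the third does not, which is the split the positive and negative characterization tests pin down.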

Reviewed by Cursor Bugbot for commit ab188f1. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread packages/retrieval/src/ft-create.ts
@jamby77 jamby77 force-pushed the feature/retrieval-sdk-valkey-search-kit branch from 3c8b814 to 6ce7dde Compare June 12, 2026 12:13
@jamby77 jamby77 force-pushed the feature/retrieval-sdk-phase1-schema-builder branch 2 times, most recently from c14395b to 93458ff Compare June 12, 2026 12:24
@KIvanow

KIvanow commented Jun 12, 2026

Member

Minor DX note: fresh-checkout tests fail until the kit is built

Running pnpm --filter @betterdb/semantic-cache test on a clean checkout currently fails with:

Error: Failed to resolve entry for package "@betterdb/valkey-search-kit".
The package may have incorrect main/module/exports specified in its package.json.
  ❯ src/utils.ts:8:1

Vitest resolves the workspace symlink through the kit's main: ./dist/index.js, so the kit's dist/ has to exist before semantic-cache (and soon retrieval, once it actually imports the kit) tests can run. A turbo test^build dependency, or pointing the kit's dev-time resolution at src/ (e.g. via publishConfig), would make pnpm test work out of the box.
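One more way to get src/ resolution at test time, sketched here for comparison: a Vitest alias. This is a different mechanism from the turbo dependency or the publishConfig idea above, and everything concrete in it is an assumption: the config living in the consuming package, the kit sitting in a sibling valkey-search-kit directory with a src/index.ts entry, and tests being started from the package directory (as pnpm --filter does).

```typescript
// vitest.config.ts in the consuming package (hypothetical location).
import * as path from 'node:path';

// Assumption: the kit is a sibling directory and its source entry is src/index.ts.
// process.cwd() is the package directory when tests run via `pnpm --filter`.
export const kitSourceEntry = path.resolve(
  process.cwd(),
  '../valkey-search-kit/src/index.ts',
);

export default {
  resolve: {
    alias: {
      // Resolve the workspace package to its sources so dist/ is not required.
      '@betterdb/valkey-search-kit': kitSourceEntry,
    },
  },
};
```

The trade-off is that tests then exercise the kit's sources rather than the built output that consumers actually install, so a build-before-test dependency may still be the safer default.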

Totally fine to leave as-is if this is already planned for one of the next PRs in the stack — just flagging it so it doesn't get lost.

Base automatically changed from feature/retrieval-sdk-valkey-search-kit to master June 15, 2026 11:04
jamby77 added 6 commits June 15, 2026 14:05
…nslation

- Add RetrievalSchema, FieldSpec, VectorSpec, FtCapabilities types in schema.ts
- Implement pure buildFtCreateArgs in ft-create.ts: HNSW (6 pairs/12 params with
  defaults M=16 EF_CONSTRUCTION=200 EF_RUNTIME=10), FLAT (3 pairs/6 params),
  all three field types (text/tag/separator/numeric/sortable), metric mapping,
  textFields capability gate, dims/fieldName/algorithm-param validation
- 24 table-driven tests, TDD red→green
- Export all public types + builder from index.ts
- Remove --passWithNoTests from test script
- Discriminated-union VectorSpec: HnswVectorSpec / FlatVectorSpec split
  so FLAT cannot carry HNSW params at the type level
- Replace validateDims void + as-cast with requireDims narrowing guard
- Add indexName/keyPrefix helpers with empty-name validation; export both
- resolveVectorFieldName helper eliminates duplicated ?? 'embedding'
- validateFlatHnswParams uses 'in' guards, accepts VectorSpec (no any)
- METRIC_MAP typed as Record<VectorMetric, string>
- Harmonize error messages to include offending value
- Replace no-interpolation template literals with single-quoted strings
- Prettier pass over all src files
- 32 tests (up from 24): FLAT dims missing/invalid, empty/whitespace
  index name, indexName/keyPrefix unit cases; FLAT+HNSW param throw
  tests construct invalid objects via property mutation to avoid casts
  in production code
@jamby77 jamby77 force-pushed the feature/retrieval-sdk-phase1-schema-builder branch from 93458ff to ee9fb6a Compare June 15, 2026 11:05

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit ee9fb6a. Configure here.

'EF_CONSTRUCTION',
String(efConstruction),
'EF_RUNTIME',
String(efRuntime),


HNSW tuning params unvalidated

Medium Severity

buildVectorArgs applies ?? for m, efConstruction, and efRuntime, so explicit NaN or non-finite values are forwarded into the FT.CREATE vector attribute list (e.g. M becomes the string NaN). dims is validated via requireDims, but HNSW tuning fields are not, so invalid schemas can produce server-rejected commands instead of a clear client error.
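A possible shape for the missing guard, written as a sketch rather than a proposed patch. The helper name and the error wording are made up, and treating the tuning params as positive integers is an assumption about what the server accepts.

```typescript
// Hypothetical guard in the spirit of requireDims: apply the default first,
// then reject anything that is not a positive integer.
function requirePositiveInt(
  label: string,
  value: number | undefined,
  fallback: number,
): number {
  const resolved = value ?? fallback;
  // Number.isInteger is false for NaN, Infinity and fractional values.
  if (!Number.isInteger(resolved) || resolved <= 0) {
    throw new Error(
      `Invalid ${label}: expected a positive integer, got ${String(resolved)}`,
    );
  }
  return resolved;
}

// `??` only replaces null/undefined, so an explicit NaN survives it:
const explicitNaN: number | undefined = NaN;
console.log(String(explicitNaN ?? 16)); // prints "NaN"
```

Calling requirePositiveInt('M', spec.m, 16) before stringifying would turn the NaN case into a client-side error that names the offending value.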


for (const name of Object.keys(fields)) {
  if (name.length === 0) {
    throw new Error('Invalid field name: empty field name is not allowed');
  }


Whitespace-only schema field names

Low Severity

validateFieldNames treats only zero-length keys as invalid, while index names and vector fieldName reject whitespace-only strings via trim(). A schema field key consisting only of spaces is accepted and emitted in the FT.CREATE SCHEMA section, which is inconsistent validation and can yield confusing server failures.
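A sketch of the consistent check, assuming the same trim() rule used for index names is what is wanted here. The function name and message are hypothetical; the real validateFieldNames may take a differently typed argument.

```typescript
// Hypothetical variant of the field-name check that also rejects
// whitespace-only keys.
function checkFieldNames(fields: Record<string, unknown>): void {
  for (const name of Object.keys(fields)) {
    if (name.trim().length === 0) {
      // JSON.stringify makes the offending value visible, e.g. "   ".
      throw new Error(
        `Invalid field name: ${JSON.stringify(name)} is empty or whitespace-only`,
      );
    }
  }
}
```

With this rule, a key such as '   ' fails on the client in the same way an empty key does, instead of being emitted into the SCHEMA section.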


@jamby77 jamby77 merged commit 2f8baf1 into master Jun 15, 2026
3 checks passed
@jamby77 jamby77 deleted the feature/retrieval-sdk-phase1-schema-builder branch June 15, 2026 11:11
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 15, 2026

2 participants