Skip to content

feat(retrieval): Phase 4 — query (vector + filters + hybrid)#242

Open
jamby77 wants to merge 1 commit into
feature/retrieval-sdk-phase3-upsert-deletefrom
feature/retrieval-sdk-phase4-query
Open

feat(retrieval): Phase 4 — query (vector + filters + hybrid)#242
jamby77 wants to merge 1 commit into
feature/retrieval-sdk-phase3-upsert-deletefrom
feature/retrieval-sdk-phase4-query

Conversation

@jamby77

@jamby77 jamby77 commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Phase 4 — query (vector + filters + hybrid)

Stacked on #241 (Phase 3) — base is the Phase 3 branch, not master.

What's new

  • buildFtSearchQuery(schema, k, filter?) — pure builder: (filter)=>[KNN k @vec $vec AS __score]; TAG → @f:{escapeTag(v)}, NUMERIC → @f:[v v]. Throws on unknown fields, TEXT fields, and non-numeric values for numeric fields.
  • Retriever.query(options) — embeds text or uses a precomputed vector (exactly one), runs FT.SEARCH … PARAMS 2 vec … LIMIT 0 k DIALECT 2, maps rows → { id (prefix stripped), score, text, fields }; empty → []. Validates k is a positive integer and that a precomputed vector matches the index dimension.
  • hybrid: 'rerank' via an injectable rerankFn(queryText, hits) (requires rerankFn + text).
  • New fields.ts centralizes the reserved field names (__text, __score); buildFtCreateArgs now rejects schema fields using them.

Public API

Exports buildFtSearchQuery, QueryFilter, QueryHit, QueryOptions, RerankFn, and the TEXT_FIELD/SCORE_FIELD constants.

Tests

20 unit tests (6 builder + 14 query) plus reserved-field guards; full package suite 76/76 green, tsc --noEmit + prettier clean.

Review-driven changes (pre-PR review)

Added: k positive-integer validation, NUMERIC filter value-type check, precomputed-vector dimension check, reserved-field guard at index creation (via shared fields.ts), and a stronger rerank assertion verifying it receives mapped hits.

Deferred (not in this phase)

  • RETURN clause to avoid shipping embedding bytes for every hit — Phase 5 (perf/observability).
  • Rerank over-fetch (fetch >k candidates then trim) — the plan scopes rerank to reordering k; over-fetch is a future enhancement.
  • score semantics: KNN returns distance (smaller = closer); the field is named score per the spec — worth documenting for callers.
  • Real-Valkey round-trip lands in Phase 6 integration.

Note

Medium Risk
New search/query path touches embedding, FT.SEARCH construction, and hit mapping; mistakes could affect result correctness or filter semantics, but changes are well-validated in unit tests and don't alter auth or persistence beyond search.

Overview
Adds Phase 4 query to the retrieval package: KNN search over Valkey/RediSearch with optional tag/numeric filters and an optional post-search rerank hook.

buildFtSearchQuery builds the FT.SEARCH dialect string (filter=>[KNN k @vec $vec AS __score]), with TAG escaping and AND-combined clauses; it rejects unknown fields, TEXT filters, and non-numeric values on numeric fields.

Retriever.query accepts text or a precomputed vector (mutually exclusive), validates k and vector dimensions, runs FT.SEARCH with PARAMS/DIALECT 2, and maps hits to { id, score, text, fields } (stripping key prefix and internal __text/__score/embedding fields). hybrid: 'rerank' calls an injectable rerankFn when text is provided.

fields.ts centralizes __text and __score; buildFtCreateArgs now rejects schema fields with those reserved names. Public exports add query types and buildFtSearchQuery. Unit tests cover the builder, query path, reserved fields, and rerank validation.

Reviewed by Cursor Bugbot for commit bf41ae5. Bugbot is set up for automated code reviews on this repo. Configure here.

- Add buildFtSearchQuery: KNN query string with TAG/NUMERIC filter clauses
- Add Retriever.query: embed text or accept a precomputed vector, FT.SEARCH,
  map rows to { id, score, fields, text } stripping reserved fields
- Reject both/neither text|vector; empty results return []
- Support optional hybrid:'rerank' via an injectable rerankFn
- Export query types, buildFtSearchQuery, SCORE_FIELD

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit bf41ae5. Configure here.

`Query vector dimension mismatch: index expects ${declared}, got ${options.vector.length}`,
);
}
return options.vector;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector query skips inferred dims

Medium Severity

resolveQueryVector only compares a precomputed vector to schema.vector.dims. When dims is omitted and the index dimension was inferred via embedFn (as in existing upsert/create flows), text queries still validate through embedresolveDims, but a vector query accepts any length and can reach FT.SEARCH with a mismatched embedding.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit bf41ae5. Configure here.

this.schema = options.schema;
this.capabilities = options.capabilities;
this.embedFn = options.embedFn;
this.rerankFn = options.rerankFn;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upsert allows reserved score field

Medium Severity

This change reserves __score at index creation and maps KNN results with score: Number(hit.fields[__score]), but assertNoReservedFields still only blocks __text and the vector field. Upsert can persist a document __score hash field that may overwrite or mix with the KNN alias in parsed FT.SEARCH fields, yielding incorrect hit scores.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit bf41ae5. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant