feat(retrieval): Phase 3 — upsert / delete#241
Conversation
3d23af1 to
ebdfea4
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit ebdfea4. Configure here.
- Add embedFn to Retriever; resolve dims from schema or an embedFn probe (cached)
- createIndex infers dims via embedFn when the schema omits them
- upsert: embed text, HSET {name}:{id} with fields, encoded vector, and __text
- delete: DEL derived keys; both no-op on empty input
- Export resolveVectorFieldName from ft-create for reuse
- Export EmbedFn and UpsertEntry types
ebdfea4 to
c6860e8
Compare
KIvanow
left a comment
There was a problem hiding this comment.
Looks good overall, approving. Three non-blocking notes worth a follow-up:
-
upsertisn't atomic at the wire level. Validation is all-or-nothing, but the HSET loop issues N separate calls, so if the connection drops mid-batch, entries 1..k-1 are already written. Fine for idempotent re-runs, but there's no MULTI. Matches the deferred batch/transaction note. -
HSET is merge, not replace. Re-upserting an id with fewer fields leaves stale fields from the prior write. If callers expect "replace document" semantics this surprises them. Probably intended as partial update, but worth documenting.
-
idisn't validated. An empty id yields a key equal to the index prefix (docs:), with no guard. Edge case only.

Phase 3 —
upsert/deleteStacked on #240 (Phase 2) — base is the Phase 2 branch, not master.
Adds document write + delete to
Retriever, withembedFnand dimension inference wired in.What's new
embedFnconstructor option;resolveDims()=schema.vector.dimsor a one-time cachedembedFn('probe').createIndexnow infers dims viaembedFnwhen the schema omits them (deferred from Phase 2); the explicit-dims path is unchanged.upsert(entries)— embeds text, validates vector length against resolved dims,HSET {name}:{id}with stringified fields,encodeFloat32vector, and the reserved__textfield. Rejects entries whosefieldscollide with the reserved__textor the vector field name.delete(ids)—DELthe derived{name}:{id}keys in one call. Both methods no-op on empty input.EmbedFn,UpsertEntry,TEXT_FIELD, andresolveVectorFieldName(so Phase 4queryshares the text-field name and vector-field resolution).Tests
12 unit tests (mocked client + fake embedFn), full package suite 54/54 green,
tsc --noEmit+ prettier clean. Includes a cache test assertingembedFnis probed exactly once across multiple entries.Review-driven changes
A pre-PR code review surfaced and fixed: the reserved-field collision guard, exporting
TEXT_FIELD/resolveVectorFieldNamefor Phase 4, and the cache-coverage test.Deferred (not in this phase)
upsertembeds sequentially. Bulk-ingestion parallelism needs bounded concurrency (rate-limit safety) — belongs with Phase 5 (observability/perf), not a naivePromise.all.EmbedFntype is duplicated with semantic-cache; consolidating intopackages/sharedis a separate cross-package change.Note
Medium Risk
New Valkey write paths and embedding validation affect ingestion correctness; behavior is well covered by mocked unit tests but not yet by real-Valkey integration.
Overview
Adds document write and delete to
Retriever, plus optionalembedFnand shared dimension resolution for index creation and upserts.createIndexnow callsresolveDims()beforeFT.CREATE: useschema.vector.dimswhen set, otherwise a one-time cachedembedFn('probe')(errors if neither is available).upsert(entries)embeds each entry’s text, checks vector length against resolved dims, and **HSET**s{name}:{id}with stringified custom fields,encodeFloat32on the vector field, and reserved__text. Userfieldscannot use__textor the configured vector field name; invalid batches fail before anyHSET.delete(ids)issues a singleDELfor derived keys. Emptyupsert/deleteinputs are no-ops.Public API additions:
EmbedFn,UpsertEntry,TEXT_FIELD, and exportedresolveVectorFieldName. Newupsert-delete.test.tscovers inference, upsert/delete behavior, validation, and dim caching.Reviewed by Cursor Bugbot for commit c6860e8. Bugbot is set up for automated code reviews on this repo. Configure here.