
Release 5.5.0: embeddings tooling, image bytes pipeline, and search/ACL hardening #17

Merged
AdrianCurtin merged 1 commit into main from rag_bach_readble_by on Jun 10, 2026


@AdrianCurtin AdrianCurtin commented Jun 9, 2026


Release 5.5.0: embeddings tooling, image bytes pipeline, and search/ACL hardening

Bumps parse-stack-next to 5.5.0. This release adds an SDK-side image bytes pipeline for image embeddings, bulk embedding operations tooling, and a set of security-hardening and correctness fixes across vector search, retrieval, ACL aggregation routing, and webhook dispatch.

Breaking changes

  • BREAKING: the British-spelled :ACL.writeable_by operator now resolves to the same public-inclusive, role-expanding implementation as :ACL.writable_by. Previously the one-letter spelling difference silently selected a separate strict, non-role-expanding constraint, so the two spellings produced different result sets. Code that relied on the old strict behavior of writeable_by should pass strict: true or use the :writable_by_exact operator.
  • CHANGED: an unrecognized element in a readable_by / writable_by permission array (or an unsupported Symbol) now raises ArgumentError. Previously such elements were silently dropped, which weakened the intended filter.
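
To illustrate the second change, here is a minimal sketch of fail-loud permission normalization. It is not the SDK's code: `normalize_permissions` is a hypothetical helper, and treating `:public` as the one supported Symbol (mapped to `"*"`) is an assumption made only for the example.

```ruby
# Hypothetical helper (not part of parse-stack-next): turns a permission
# array into the string keys a filter would match, and raises on anything
# it does not recognize instead of silently dropping it.
def normalize_permissions(items)
  Array(items).map do |item|
    case item
    when String
      item
    when :public
      "*" # assumption: :public is the only supported Symbol in this sketch
    when Symbol
      raise ArgumentError, "unsupported Symbol in permission array: #{item.inspect}"
    else
      raise ArgumentError, "unrecognized permission element: #{item.inspect}"
    end
  end
end

normalize_permissions(["role:Admin", :public]) # both entries are kept

begin
  normalize_permissions(["role:Admin", 42])
rescue ArgumentError => e
  warn e.message # the caller learns the filter was malformed
end
```

If the `42` were dropped instead, the filter would quietly shrink to `["role:Admin"]`, which is the weakening this change is meant to prevent.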

New

  • Image bytes pipeline: embed_image ..., source: :bytes fetches image bytes SDK-side through Parse::Embeddings::ImageFetch with magic-byte MIME verification (no header/extension fallthrough), a deny-by-default allowed_image_hosts allowlist, configurable allowed_image_types, and EXIF/XMP metadata stripping (on by default via exif_strip). validate_image_url!(mode: :fetch) validates URLs for SDK-side fetch without requiring the provider-egress sentinel.
  • Parse::Embeddings::BatchEmbedder: bulk embedding with rate pacing (requests_per_minute:), retry with backoff (max_attempts:, retry_on:), progress callbacks, and BatchFailed errors that report batch_index/completed_count for resumability.
  • Embedding cache: opt-in process-local query-embedding cache (Parse::Embeddings::Cache.enable!) with LRU + TTL semantics, a fail-open MonetaStore adapter for shared backends, and hit/miss instrumentation.
  • SpendCap controls: warn_at: soft-cap threshold with a one-shot warning event per crossing, plus query-side charging (charge_query!, with_precharged).
  • reembed! and embedding provenance: model-aware re-embedding with only_stale: filtering, and an auto-declared <into>_meta object field recording provider/model/dimensions for each managed vector.
  • Vector index drift verification: declared index shape is checked against the live Atlas index at query time, governed by Parse::VectorSearch.index_drift_policy (:warn / :raise / :ignore).
  • Retrieval pointer-value translation: caller-friendly pointer filter values are translated to storage form (_p_<field> / Class$objectId) before tenant-scope folding.
  • Opt-in Unicode regex flags: { value: /.../, unicode: true } constraint form compiles to $options: "iu" without changing default regex behavior.
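
As a rough illustration of the opt-in Unicode flag in the last item, the sketch below compiles a regex constraint value into a `$regex` / `$options` pair. Only the `{ value:, unicode: true }` shape and the `"iu"` result come from the notes above; `compile_regex_constraint` and its internals are hypothetical, and the sketch assumes the `i` comes from the Ruby regex's own ignore-case flag.

```ruby
# Hypothetical compiler (not the SDK's implementation) from a regex
# constraint value to a MongoDB-style $regex/$options pair.
def compile_regex_constraint(input)
  regex, unicode =
    if input.is_a?(Hash)
      [input.fetch(:value), input[:unicode] == true]
    else
      [input, false] # plain Regexp: default behavior, no "u" flag
    end

  options = String.new
  options << "i" if (regex.options & Regexp::IGNORECASE) != 0
  options << "u" if unicode # added only when the caller opts in

  compiled = { "$regex" => regex.source }
  compiled["$options"] = options unless options.empty?
  compiled
end

plain   = compile_regex_constraint(/^caf/i)
unicode = compile_regex_constraint({ value: /^caf/i, unicode: true })

puts plain["$options"]   # options without the opt-in flag
puts unicode["$options"] # options with unicode: true
```

Keeping the flag behind an explicit hash form means existing queries that pass a bare Regexp compile exactly as before.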

Fixed / hardened

  • Scoped aggregation now fails closed: queries scoped by session token, acl_user, or acl_role raise MongoDirectRequired instead of silently running over the master-key-only REST /aggregate endpoint when mongo-direct is unavailable.
  • Hybrid search recomputes _hybrid_score from post-ACL visible order, closing a membership-inference side channel, and no longer caches authorization errors as "rankFusion unsupported".
  • ACL constraint fixes: strict (*_exact) variants, corrected not_readable_by / not_writable_by semantics for missing-ACL documents, empty-intent (readable_by([])) matching, and role self-inclusion in role expansion (an unpersisted role no longer raises "no valid permissions").
  • Webhook after_save chain now runs exactly once per delivery (fixes class-route + wildcard double-fire), with per-phase error isolation and corrected ruby_initiated? memoization.
  • verify_password now shares the login rate-limit bucket, closing a credential-probing bypass.
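
The membership-inference point in the hybrid search item can be sketched as follows. This is a simplified model, not the SDK's scoring code: it assumes a single fused ranking and a reciprocal-rank style score `1 / (k + rank)` with `k = 60`, and both method names are hypothetical.

```ruby
RRF_K = 60 # conventional reciprocal-rank constant; an assumption here

# Scores taken from pre-filter ranks: documents hidden by ACL leave gaps.
def scores_from_global_ranks(ranked_ids, visible_ids)
  ranked_ids.each_with_index
            .select { |id, _idx| visible_ids.include?(id) }
            .map { |id, idx| [id, 1.0 / (RRF_K + idx + 1)] }
end

# Scores recomputed from the visible order only: ranks are contiguous.
def scores_from_visible_order(ranked_ids, visible_ids)
  ranked_ids.select { |id| visible_ids.include?(id) }
            .each_with_index
            .map { |id, idx| [id, 1.0 / (RRF_K + idx + 1)] }
end

ranked  = %w[doc_a doc_b doc_c doc_d] # doc_b is not readable by the caller
visible = %w[doc_a doc_c doc_d]

leaky = scores_from_global_ranks(ranked, visible)
safe  = scores_from_visible_order(ranked, visible)

# In `leaky`, doc_c is scored as rank 3 although it is the caller's second
# result, which reveals that something hidden sits at rank 2.
puts leaky.inspect
puts safe.inspect
```

In this model the recomputed scores depend only on documents the caller may see, so the gap that hints at a hidden document disappears.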

Docs

  • README and docs/atlas_vector_search_guide.md updated for all new APIs; CHANGELOG updated for 5.5.0.

Copilot AI review requested due to automatic review settings June 9, 2026 20:57

Copilot AI left a comment


Pull request overview

Release 5.5.0 of parse-stack-next, adding new embeddings/image-bytes functionality and multiple security hardening and correctness fixes across vector search, retrieval, ACL aggregation behavior, and webhook afterSave callback dispatch.

Changes:

  • Adds SDK-side image bytes fetch (embed_image source: :bytes) with magic-byte MIME verification, allowlists, and EXIF/XMP stripping; updates Cohere/Voyage image providers accordingly.
  • Introduces embedding ops tooling: BatchEmbedder, opt-in query-embed cache (Parse::Embeddings::Cache + Moneta adapter), and expanded spend-cap controls (query charging + warn_at).
  • Adds vectorSearch index drift verification, hybrid-search hardening, retrieval pointer-filter translation, and several ACL/aggregation + webhook routing fixes; updates docs/tests and bumps version to 5.5.0.
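
For readers unfamiliar with magic-byte MIME verification (first item above), here is a minimal standalone sketch. It does not reproduce `Parse::Embeddings::ImageFetch`; the signature table, `sniff_image_mime`, `verify_image!`, and the default allowed types are all assumptions for illustration.

```ruby
# Hypothetical signature table: the leading bytes of a few image formats.
MAGIC_BYTES = {
  "image/png"  => "\x89PNG\r\n\x1A\n".b,
  "image/jpeg" => "\xFF\xD8\xFF".b,
  "image/gif"  => "GIF8".b,
}.freeze

# Returns the MIME type implied by the leading bytes, or nil if unknown.
def sniff_image_mime(bytes)
  data = bytes.to_s.b
  # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
  if data.byteslice(0, 4) == "RIFF".b && data.byteslice(8, 4) == "WEBP".b
    return "image/webp"
  end
  MAGIC_BYTES.each { |mime, signature| return mime if data.start_with?(signature) }
  nil
end

# Rejects bytes that are unrecognized or not on the allowlist. There is no
# fallback to a Content-Type header or a URL extension.
def verify_image!(bytes, allowed_types: %w[image/png image/jpeg])
  mime = sniff_image_mime(bytes)
  raise ArgumentError, "unrecognized image bytes" if mime.nil?
  raise ArgumentError, "#{mime} is not an allowed image type" unless allowed_types.include?(mime)
  mime
end

png_bytes = "\x89PNG\r\n\x1A\n".b + "...rest of file...".b
puts verify_image!(png_bytes)
```

Trusting only the bytes means a payload labeled `image/png` by a server, but containing something else, is rejected rather than forwarded to the embedding provider.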

Reviewed changes

Copilot reviewed 50 out of 51 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/lib/parse/webhook_triggers_test.rb | Adjusts tests for afterSave callback chaining behavior |
| test/lib/parse/webhook_aftersave_payload_fidelity_test.rb | Drives afterSave chain explicitly in lifecycle dispatch helper |
| test/lib/parse/verify_password_rate_limit_test.rb | Adds rate-limit parity tests for verify_password |
| test/lib/parse/vector_search_hybrid_security_test.rb | Adds hybrid-search security regression tests |
| test/lib/parse/vector_index_drift_test.rb | Adds drift verification policy + findings tests |
| test/lib/parse/search_index_migrator_tenant_filter_test.rb | Tests auto-added tenant filter path in vectorSearch declarations |
| test/lib/parse/retrieval_pointer_filter_test.rb | Adds pointer-value translation tests for retrieval filters |
| test/lib/parse/regex_unicode_option_unit_test.rb | Tests opt-in unicode regex options compilation |
| test/lib/parse/query/hint_mongo_direct_integration_test.rb | Adds Mongo-direct integration test for query hints |
| test/lib/parse/query/constraints/acl_query_constraints_test.rb | Updates ACL constraint tests for new alias/semantics |
| test/lib/parse/embeddings_voyage_image_test.rb | Updates Voyage image input validation error expectations |
| test/lib/parse/embeddings_spend_cap_query_test.rb | Adds query spend-cap coverage + warn_at tests |
| test/lib/parse/embeddings_image_fetch_test.rb | Adds tests for ImageFetch sniff/verify/strip/fetch pipeline |
| test/lib/parse/embeddings_cohere_image_test.rb | Updates Cohere image input validation error expectations |
| test/lib/parse/embeddings_cache_test.rb | Adds tests for embedding cache + Moneta adapter |
| test/lib/parse/embeddings_batch_embedder_test.rb | Adds tests for batch slicing/pacing/backoff behavior |
| test/lib/parse/embed_managed_meta_reembed_test.rb | Tests <into>_meta, reembed!, and bytes-mode embed_image |
| test/lib/parse/cloud_result_decode_test.rb | Adds cloud decode sessionToken preservation tests |
| test/lib/parse/cloud_functions_module_test.rb | Adds raw: behavior tests for cloud function calls |
| test/lib/parse/aggregation_auto_promotion_test.rb | Updates scoped aggregation fail-closed behavior tests |
| test/lib/parse/agent/mcp_resource_subscriptions_test.rb | Adds authorization gate parity tests for subscriptions |
| test/lib/parse/acl_constraints_unit_test.rb | Expands ACL aggregation routing + fail-closed regression tests |
| README.md | Documents 5.5 features and updates capability notes |
| lib/parse/webhooks/payload.rb | Fixes ruby-initiated memoization behavior |
| lib/parse/webhooks.rb | Moves afterSave callback chain to once-per-delivery path + adds safety |
| lib/parse/vector_search/hybrid.rb | Hardens probe classification + recomputes visible-order hybrid scores |
| lib/parse/vector_search.rb | Adds index drift policy config surface |
| lib/parse/stack/version.rb | Bumps version to 5.5.0 |
| lib/parse/schema/search_index_migrator.rb | Auto-augments vectorSearch declarations with tenant filter path |
| lib/parse/retrieval/retriever.rb | Adds pointer-value translation for retrieval filters |
| lib/parse/retrieval/agent_tool.rb | Wraps retrieval in spend-cap precharged scope |
| lib/parse/query/constraint.rb | Adds helper to parse { value:, unicode: true } regex option form |
| lib/parse/model/core/vector_searchable.rb | Adds spend-cap charging, cache hook, and index drift verification |
| lib/parse/model/core/embed_managed.rb | Adds bytes-mode embed_image, provenance meta, and reembed! |
| lib/parse/model/acl.rb | Adds include_missing toggle and strict predicate shaping |
| lib/parse/embeddings/voyage.rb | Supports FetchedImage inputs + base64 rows for Voyage |
| lib/parse/embeddings/spend_cap.rb | Adds warn_at, with_precharged, and query charging |
| lib/parse/embeddings/provider.rb | Updates provider contract for image sources (URL or FetchedImage) |
| lib/parse/embeddings/image_fetch.rb | Adds ImageFetch (sniff/verify/strip/fetch + FetchedImage) |
| lib/parse/embeddings/cohere.rb | Supports FetchedImage inputs + data-URI forwarding for Cohere |
| lib/parse/embeddings/cache.rb | Adds opt-in query-embed cache + Moneta adapter |
| lib/parse/embeddings/batch_embedder.rb | Adds batch-level embed orchestration with pacing/backoff |
| lib/parse/embeddings.rb | Wires new embeddings components + allowed_image_types + fetch-mode validation |
| lib/parse/client.rb | Documents trusted cloud decode behavior + raw guidance |
| lib/parse/api/users.rb | Adds shared login rate-limit to verify_password |
| Gemfile.lock | Updates gem version to 5.5.0 |
| docs/atlas_vector_search_guide.md | Documents drift verification, caching/spend caps, bytes-mode embedding, reembed tooling |
| CHANGELOG.md | Adds 5.5.0 release notes |


Comment thread: lib/parse/embeddings/image_fetch.rb (outdated)
Comment on lines +143 to +146

    io = Parse::File.safe_open_url(canonical)
    bytes = io.read
    bytes = bytes.to_s.dup.force_encoding(Encoding::BINARY)
    io.close if io.respond_to?(:close)

Comment on lines +119 to +127

    def index_drift_policy=(value)
      v = value.to_sym
      unless INDEX_DRIFT_POLICIES.include?(v)
        raise ArgumentError,
              "Parse::VectorSearch.index_drift_policy must be one of " \
              "#{INDEX_DRIFT_POLICIES.inspect} (got #{value.inspect})."
      end
      @index_drift_policy = v
    end
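
To show how a validated policy setter like the one under review might be consumed, here is a self-contained sketch. `DriftPolicyDemo`, `IndexDriftError`, `handle_drift`, the `:warn` default, and the shape of the findings (plain strings) are all hypothetical; only the three policy names and the setter's validation come from this PR.

```ruby
module DriftPolicyDemo
  INDEX_DRIFT_POLICIES = %i[warn raise ignore].freeze

  class IndexDriftError < StandardError; end

  class << self
    # Assumption: :warn is used as the default in this sketch.
    def index_drift_policy
      @index_drift_policy ||= :warn
    end

    # Mirrors the reviewed setter: normalize to a Symbol, reject unknowns.
    def index_drift_policy=(value)
      v = value.to_sym
      unless INDEX_DRIFT_POLICIES.include?(v)
        raise ArgumentError,
              "policy must be one of #{INDEX_DRIFT_POLICIES.inspect} (got #{value.inspect})"
      end
      @index_drift_policy = v
    end

    # Applies the configured policy to a list of drift findings.
    def handle_drift(findings)
      return :ok if findings.empty? || index_drift_policy == :ignore

      message = "vector index drift: #{findings.join('; ')}"
      raise IndexDriftError, message if index_drift_policy == :raise

      warn message
      :warned
    end
  end
end

DriftPolicyDemo.index_drift_policy = "warn" # Strings are accepted via to_sym
DriftPolicyDemo.handle_drift(["numDimensions: declared 1536, live 1024"])
```

Validating in the setter means a typo such as `:warning` fails at configuration time rather than being discovered on the first query.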
AdrianCurtin changed the title from "Release 5.5.0: embeddings, image bytes, and fixes" to "Release 5.5.0: embeddings tooling, image bytes pipeline, and search/ACL hardening" on Jun 9, 2026
AdrianCurtin force-pushed the rag_bach_readble_by branch 2 times, most recently from cccbdb0 to aae716f, on June 9, 2026 21:53
Bumps the SDK to 5.5.0 and adds a major set of embedding, image, caching, migration, and hardening features. Key changes:

- Multimodal image bytes path: SDK-side image download via Parse::Embeddings::ImageFetch with magic-byte MIME sniffing, URL-extension cross-check, and configurable allowed_image_types; EXIF/XMP stripping is on by default; embed_image now supports source: :bytes and FetchedImage objects to avoid provider-side fetches.
- Bulk embedding & resilience: Parse::Embeddings::BatchEmbedder adds batch slicing, inter-batch pacing, exponential backoff with jitter, and a BatchFailed error for resumable jobs.
- Query-embed cache: Parse::Embeddings::Cache (opt-in, LRU+TTL) with MonetaStore adapter for persistent L2 sharing and a hashed keyspace to avoid plaintext queries landing in stores; cache hits emit existing embed notifications with cached: true.
- Spend-cap improvements: SpendCap now covers all query-embed paths (direct callers included), supports warn_at soft-cap notifications, and provides tooling to avoid double-billing for agent tools.
- Embedding provenance & migrations: auto-declared <into>_meta object with {provider,model,dimensions,modality,embedded_at}; Class.reembed! for resumable bulk re-embeds; guidance for same-shape vs changed-width migrations and dual-field workflow.
- Vector index drift detection: first-query verification of Atlas vectorSearch index numDimensions/similarity and tenant-scope coverage with configurable Parse::VectorSearch.index_drift_policy (:warn/:raise/:ignore).
- Retrieval & filter hardening: pointer-valued filters are translated into MongoDB storage form; various ACL/aggregation fixes, including strict variants and a strict: option for permission constraints; aggregation terminals now route via mongo-direct when necessary and fail closed when the query is scoped and mongo-direct is unavailable.
- Hybrid search & webhook fixes: rankFusion score recomputation for scoped callers, probe error-class narrowing, and several webhook after_save callback hardening changes (single-run semantics and swallowed callback errors where appropriate).
- Client ergonomics & docs: README, changelog and Atlas vector search guide updated with new features, examples, and operator notes; numerous tests added/updated for embeddings, image fetch, cache, batch embedder, vector drift, retrieval filters and webhook behavior.

Overall this changeset hardens embedding/image handling (PII protections and MIME-laundering prevention), adds operational tooling for bulk re-embedding and caching, and tightens vector-search / ACL correctness and safety.
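
The bulk-embedding behavior summarized above (batch slicing, exponential backoff with jitter, and a resumable failure) can be sketched without the SDK as follows. This is not `Parse::Embeddings::BatchEmbedder`: `embed_in_batches`, `BatchFailedSketch`, the delay values, and the injectable `sleeper` are hypothetical, chosen so the example runs on its own.

```ruby
# Hypothetical error carrying the position needed to resume a failed job.
class BatchFailedSketch < StandardError
  attr_reader :batch_index, :completed_count

  def initialize(message, batch_index:, completed_count:)
    super(message)
    @batch_index = batch_index
    @completed_count = completed_count
  end
end

# Slices inputs into batches and yields each batch to the caller's block,
# which is expected to return one result per input. Failed batches are
# retried with exponential backoff plus a small random jitter.
def embed_in_batches(inputs, batch_size:, max_attempts: 3, base_delay: 0.5,
                     sleeper: ->(seconds) { sleep(seconds) })
  results = []
  inputs.each_slice(batch_size).with_index do |batch, batch_index|
    attempt = 0
    begin
      attempt += 1
      results.concat(yield(batch))
    rescue StandardError => e
      if attempt < max_attempts
        delay = base_delay * (2**(attempt - 1)) # 0.5s, 1s, 2s, ...
        sleeper.call(delay + rand * delay * 0.1) # up to 10% jitter
        retry
      end
      raise BatchFailedSketch.new(e.message, batch_index: batch_index,
                                             completed_count: results.size)
    end
  end
  results
end

# Stand-in for a provider call: "embeds" each string as its length.
vectors = embed_in_batches(%w[alpha beta gamma], batch_size: 2) do |batch|
  batch.map(&:length)
end
puts vectors.inspect
```

On a `BatchFailedSketch`, a caller could restart from `inputs.drop(error.completed_count)` instead of re-embedding everything, which is the kind of resumability the `batch_index` / `completed_count` fields are described as enabling.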
AdrianCurtin force-pushed the rag_bach_readble_by branch from aae716f to 17b4e47 on June 9, 2026 21:58
AdrianCurtin merged commit cb203ee into main on Jun 10, 2026 (11 checks passed)
AdrianCurtin deleted the rag_bach_readble_by branch on June 10, 2026 04:49