Skip to content

Adds LegacyFilterToQueryFilterTransformer#271

Open
suddendust wants to merge 8 commits intohypertrace:mainfrom
suddendust:pg_writes_filter_transformer
Open

Adds LegacyFilterToQueryFilterTransformer#271
suddendust wants to merge 8 commits intohypertrace:mainfrom
suddendust:pg_writes_filter_transformer

Conversation

@suddendust
Copy link
Contributor

@suddendust suddendust commented Feb 4, 2026

Add Legacy-to-Query Filter Transformer

Summary

Enables flat PostgreSQL collections to use the v1 query parser by transforming legacy filters to the newer query filter format.

Why

We want to move to using the new filter API now. It's also hardened in production since the query layer uses it throughout.

What

  • Transforms legacy filter operators to v1 equivalents (AND, OR, EQ, IN, LIKE, etc).
  • Uses SchemaRegistry to detect the type of the columns (since this information is not carried in the legacy filters).

Transformation Example

// Legacy
Filter.and(
    Filter.eq("props.brand", "Dettol"),
    Filter.gt("price", 100)
)

// Transformed
LogicalExpression.and(
    RelationalExpression.of(
        JsonIdentifierExpression.of("props", JsonFieldType.STRING, "brand"),
        EQ, ConstantExpression.of("Dettol")),
    RelationalExpression.of(
        IdentifierExpression.of("price"),
        GT, ConstantExpression.of(100))
)

@codecov
Copy link

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 85.12397% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.63%. Comparing base (6a2b05d) to head (de560d8).

Files with missing lines Patch % Lines
...nsformer/LegacyFilterToQueryFilterTransformer.java 85.00% 9 Missing and 9 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #271      +/-   ##
============================================
+ Coverage     80.52%   80.63%   +0.10%     
  Complexity     1396     1396              
============================================
  Files           239      240       +1     
  Lines          6695     6815     +120     
  Branches        615      636      +21     
============================================
+ Hits           5391     5495     +104     
- Misses          900      907       +7     
- Partials        404      413       +9     
Flag Coverage Δ
integration 80.63% <85.12%> (+0.10%) ⬆️
unit 53.70% <0.82%> (-0.97%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

suresh-prakash
suresh-prakash previously approved these changes Feb 5, 2026
*
* <p>Resolution strategy: progressively try shorter prefixes to find a JSONB column.
*/
private Optional<String> findJsonbColumnPrefix(String path) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the use case for this one? Maybe some documented example will help

Copy link
Contributor Author

@suddendust suddendust Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So say the identifier in the filter is props.inheritiedAttributes.color = 'red'. Now, three possibilities arise:

  1. props is the json col and inheritiedAttributes.color is the path.
  2. props.inheritiedAttributes is the jsonb col and color is the path.
  3. props.inheritiedAttributes.color is the jsonb col.

This method progressively looks for prefixes and checks its type to determine which case it is.

return Optional.empty();
}

/** Extracts the nested JSONB path from a full path given the resolved column name. */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added docs

}

if (value instanceof Object[]) {
return createConstantExpressionFromCollection(Arrays.asList((Object[]) value));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, if query is getting fired from the old path, we would need to figure out type at runtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By old path, do you mean Mongo? If so, then it doesn't really case about the types. Here's how the parser looks:

      Object value = filter.getValue();
      Map<String, Object> map = new HashMap<>();
      switch (op) {
        case EQ:
          map.put(filter.getFieldName(), value);

It just puts in the object. The driver determines the type at runtime.

Copy link

@puneet-traceable puneet-traceable Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant if someone is using legacy filter?

Copy link
Contributor Author

@suddendust suddendust Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh gotcha. Since legacy filters with PG is today works on nested docs, it basically extracts them as text using ->> and cast:

  public static String prepareCast(String field, Object value) {
    return prepareCast(field, getType(value));
  }

  public static String prepareCast(String field, Type type) {
    String fmt = "CAST (%s AS %s)";
    if (type.equals(Type.NUMERIC)) {
      return String.format(fmt, field, type);
    } else if (type.equals(Type.BOOLEAN)) {
      return String.format(fmt, field, type);
    } else /* default is string */ {
      return field;
    }
  }

Type is determined by the value:

  public static Type getType(Object value) {
    boolean isArrayType = false;
    Type type = Type.STRING;

    // handle the case if the value type is collection for filter operator - `IN`
    // Currently, for `IN` operator, we are considering List collection, and it is fair
    // assumption that all its value of the same types. Based on that and for consistency
    // we will use CAST ( <field name> as <type> ) for all non string operator.
    // Ref : https://github.com/hypertrace/document-store/pull/30#discussion_r571782575
    if (value instanceof List<?> && ((List<Object>) value).size() > 0) {
      List<Object> listValue = (List<Object>) value;
      value = listValue.get(0);
      isArrayType = true;
    }

    if (value instanceof Number) {
      type = Type.NUMERIC;
    } else if (value instanceof Boolean) {
      type = Type.BOOLEAN;
    } else if (isArrayType) {
      type = Type.STRING_ARRAY;
    }
    return type;
  }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, for anyone migrating, the recommendation would be to use new filter interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't added any new methods to the Collection interface that accept the new filter directly - Clients would still have to pass in the older filter and this converter would convert it internally. While I can add those methods, I feel this way clients wouldn't have to change their code.

Copy link

@puneet-traceable puneet-traceable Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that leave us in the same inefficient query problems that we faced in the read path? Might be non-issue if update patterns are largely id based.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants