debug: smartGenerate step logging for hang diagnosis by acmeguy · Pull Request #38 · smartdataHQ/synmetrix

acmeguy · 2026-04-18T13:03:51Z

Temporary debug logging to find where smart gen hangs on k8s. Will remove after diagnosis.

Adds a new POST endpoint that detects nested (GROUPED) column structures in a ClickHouse table and returns discriminator columns with their distinct values, enabling the frontend to show filter options in the Smart Generate dialog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ion, cleanup
- Use cubejs.options.driverFactory({ securityContext }) instead of cubejs.driverFactory()
- Add SAFE_IDENTIFIER regex validation on schema/table params to prevent SQL injection
- Add driver.release() cleanup in catch block
- Use { code, message } error response shape matching other routes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
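The SAFE_IDENTIFIER check listed above can be illustrated with a minimal sketch. The PR's actual regex is not shown, so the pattern here (ASCII letters, digits and underscores, not starting with a digit) is an assumption, and assertSafeIdentifier and buildDescribeQuery are hypothetical names.

```javascript
// Hypothetical pattern: the real SAFE_IDENTIFIER regex in the PR may differ.
// Allows ASCII letters, digits and underscores, not starting with a digit.
const SAFE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Throws before any SQL is built if the value contains anything else,
// such as quotes, spaces, semicolons or backticks.
function assertSafeIdentifier(value, label) {
  if (typeof value !== "string" || !SAFE_IDENTIFIER.test(value)) {
    throw new Error(`Invalid ${label}: ${JSON.stringify(value)}`);
  }
  return value;
}

// Hypothetical usage: schema and table come straight from the POST body.
function buildDescribeQuery(schema, table) {
  assertSafeIdentifier(schema, "schema");
  assertSafeIdentifier(table, "table");
  return `DESCRIBE TABLE \`${schema}\`.\`${table}\``;
}
```

Rejecting rather than escaping keeps the rule simple: anything outside the allow-list never reaches the SQL string.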

… cube names Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Restore AS alias clause in legacy ARRAY JOIN path SQL with partition WHERE
- Use ClickHouse-standard doubled single quotes ('') instead of backslash escaping
- Remove redundant template literal wrapping in arrayJoinGroups map
- Add warning when groupColumns is empty but arrayJoinGroups were requested

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
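A minimal sketch of the quoting rule from the commit above. ClickHouse accepts both \' and '' for a quote inside a string literal, but a bare backslash is still treated as an escape character, so this sketch also doubles backslashes; whether the PR does the same is not shown. quoteClickHouseString and buildInFilter are hypothetical names.

```javascript
// Escapes a value for a ClickHouse single-quoted string literal.
// Backslashes are doubled first (ClickHouse treats "\" as an escape
// character), then single quotes are doubled ('') as in standard SQL.
function quoteClickHouseString(value) {
  const escaped = String(value).replace(/\\/g, "\\\\").replace(/'/g, "''");
  return `'${escaped}'`;
}

// Hypothetical helper: builds "col IN ('a', 'b')" for a nested filter.
// Returns null when no values were selected so the caller can skip it.
function buildInFilter(column, values) {
  if (values.length === 0) return null;
  return `${column} IN (${values.map(quoteClickHouseString).join(", ")})`;
}
```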

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Insert LLM polishing step after AI enrichment and before final JS code generation. The polisher rewrites cube definitions per modeling principles while preserving original SQL. Polish results are included in all response payloads (dry-run, no-changes, and apply). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ndpoint Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…selected Without this, profiling ran against the base table and reported empty columns for nested array sub-columns. Now the profiler uses LEFT ARRAY JOIN so column stats reflect the expanded array-joined rows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ClickHouse Nested columns (stored as parallel arrays with dotted names) require enumerating each sub-column in the ARRAY JOIN clause:

    ARRAY JOIN `parent.child1` AS child1_alias, `parent.child2` AS child2_alias

Previously used `ARRAY JOIN parent`, which is invalid for this column type. Fixes both profiler (for accurate column stats on expanded rows) and cubeBuilder (for correct cube SQL generation). Non-array-join profiling path is unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace dots with underscores in the full column name (e.g. commerce.products.entry_type → commerce_products_entry_type) for both the ARRAY JOIN alias and the nested WHERE filter clause. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The frontend sends nestedFilters in the profile-table POST body but the route wasn't extracting or passing them to the profiler function. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Profiler: filter ARRAY JOIN to columns whose rawType starts with Array(; scalar dotted columns (e.g. commerce.details Nullable(String)) are excluded
- Profiler + CubeBuilder: use the full column name with dots → underscores as the alias (e.g. commerce.products.entry_type → commerce_products_entry_type)
- CubeBuilder: dimension/measure SQL uses the aliased column name
- WHERE clauses use the aliased names consistently

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
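The alias convention and the Array( filter described in the last few commits can be combined into a small clause builder. This is a sketch assuming column metadata arrives as { name, rawType } objects (for example from DESCRIBE TABLE); buildArrayJoinClause, aliasFor and the groupPrefix parameter are hypothetical, and LEFT ARRAY JOIN is the default only because the profiler commit earlier mentions it.

```javascript
// Alias convention: full dotted name with dots → underscores,
// e.g. commerce.products.entry_type → commerce_products_entry_type
function aliasFor(columnName) {
  return columnName.replace(/\./g, "_");
}

// columns: [{ name, rawType }] (shape assumed). Only Array(...) columns
// under the selected group are array-joined; scalar dotted columns such as
// commerce.details Nullable(String) are skipped.
function buildArrayJoinClause(columns, groupPrefix, { left = true } = {}) {
  const parts = columns
    .filter((c) => c.name.startsWith(`${groupPrefix}.`))
    .filter((c) => c.rawType.startsWith("Array("))
    .map((c) => `\`${c.name}\` AS ${aliasFor(c.name)}`);
  if (parts.length === 0) return ""; // nothing to expand for this group
  return `${left ? "LEFT " : ""}ARRAY JOIN ${parts.join(", ")}`;
}
```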

Query rewrite rules (e.g. partition scoping by team properties) are now loaded and translated to raw SQL filters before profiling. This ensures the profiler respects the same row-level access controls as the Cube.js query layer. Applied in both profileTable and smartGenerate routes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…alidate After the LLM returns polished cubes, generates JS and runs validateModelSyntax. If validation fails, sends errors back to the LLM for correction, up to 2 cycles. Also mounts first-principles path and checks multiple principle file locations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
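The bounded validate-and-correct loop described above might look like this. generateJs, validate and requestFix are hypothetical injected callbacks standing in for the PR's JS generation, validateModelSyntax and the LLM correction request; only the two-cycle limit comes from the commit message.

```javascript
// Generate JS, validate it, and ask for a correction at most maxCycles
// times. Returns the last attempt together with any remaining errors.
async function polishWithValidation(cubes, { generateJs, validate, requestFix, maxCycles = 2 }) {
  let current = cubes;
  let code = generateJs(current);
  let errors = await validate(code);
  for (let cycle = 0; cycle < maxCycles && errors.length > 0; cycle += 1) {
    // Send the validation errors back and regenerate from the fixed cubes.
    current = await requestFix(current, errors);
    code = generateJs(current);
    errors = await validate(code);
  }
  return { cubes: current, code, errors, valid: errors.length === 0 };
}
```

Injecting the callbacks keeps the retry policy separate from the LLM client, so the cycle limit can be tested without network calls.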

Zod schemas are now built inside an async getSchemas() function that imports zod dynamically, avoiding the undefined 'z' at module load time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…for Zod 4 compat zodResponseFormat fails with z.any() as a record value type in Zod 4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… compat OpenAI structured output requires every field to have an explicit type. Replaced z.any().nullable() for rollingWindow, timeShift, refresh_key, and meta with fully typed schemas. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Preview shows the raw generated model for fast feedback. Polishing runs only when the user clicks Apply Changes, avoiding timeouts during preview. Also increased polisher timeout to 180s for large models. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Plan 1: Single-line fix for lcFrom missing arrayJoinClause (4 broken queries)
Plan 2: 6-task plan for principle-compliant cubeBuilder heuristics (titles, meta, paired counts, format, public:false, drill members, pre-aggregations)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ries Single-line fix: the lcFrom variable (used by 4 downstream queries for Map numeric stats, Map string stats, and LC value probe) was missing the arrayJoinClause. All nested filter WHERE conditions referenced aliased column names that only exist after ARRAY JOIN. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- titleFromName: snake_case → Title Case on all fields and cubes
- Partition-first dimension ordering
- Complete meta block: grain, grain_description, time_dimension, time_zone, refresh_cadence
- Paired filtered counts for LC dimensions (max 10 values)
- Drill members on primary count measure
- Format inference: currency/percent by column name pattern
- public: false on plumbing fields (GIDs, write_key, etc.)
- Default pre-aggregations: daily + monthly rollups with ClickHouse indexes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
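Two of the heuristics above, titleFromName and format inference, are small enough to sketch. The name patterns below are assumptions for illustration, not the PR's list; "percent" and "currency" are real Cube.js format values.

```javascript
// snake_case → Title Case, e.g. "order_total_amount" → "Order Total Amount".
function titleFromName(name) {
  return name
    .split("_")
    .filter(Boolean) // ignore leading, trailing or doubled underscores
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}

// Hypothetical name patterns: the PR's actual pattern list is not shown.
const PERCENT_PATTERN = /(pct|percent|ratio|rate)$/i;
const CURRENCY_PATTERN = /(price|amount|revenue|cost)/i;

// Returns a Cube.js format value ("percent" or "currency"), or undefined to
// leave the field unformatted. Percent is checked first so a name such as
// "price_change_pct" is treated as a percentage, not a currency.
function inferFormat(columnName) {
  if (PERCENT_PATTERN.test(columnName)) return "percent";
  if (CURRENCY_PATTERN.test(columnName)) return "currency";
  return undefined;
}
```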

…c, drill_members in yamlGenerator Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

6-task plan: fix Hasura timeout, create modelAdvisor with 4 focused micro-prompts (descriptions, segments, metrics, pre-aggregations), integrate into pipeline, update frontend, delete old polisher, full end-to-end testing including Cube.js compiler validation and Explore page query verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…r debug logging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…enerate pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rays Cube.js expects pre_aggregations as named keys with indexes as nested named objects. The yamlGenerator was using JSON.stringify which produced arrays with 'name' fields — invalid Cube.js syntax. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
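A sketch of the array-to-named-keys conversion, assuming the generator starts from arrays of { name, ... } entries as the commit describes. normalizePreAggregations and toNamedMap are hypothetical helpers; emitting the resulting object as JS model source is a separate step not shown here.

```javascript
// Turns [{ name: "daily", ... }] into { daily: { ... } }, dropping "name".
function toNamedMap(entries) {
  const result = {};
  for (const { name, ...definition } of entries) {
    if (!name) throw new Error("pre-aggregation/index entry is missing a name");
    result[name] = definition;
  }
  return result;
}

// Applies the conversion to pre-aggregations and to each one's indexes.
// The input array is not mutated: rest destructuring creates new objects.
function normalizePreAggregations(preAggs) {
  const named = toNamedMap(preAggs);
  for (const def of Object.values(named)) {
    if (Array.isArray(def.indexes)) def.indexes = toNamedMap(def.indexes);
  }
  return named;
}
```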

Previously buildCubes always emitted the raw base-table cube AND the array-joined cube. When the user selects an array join with filters, only the filtered array-joined cube should be produced — one cube, one file, one intent. The raw cube is still built internally (for field processing and as a base for the array join cube's inherited dimensions) but is not emitted. All heuristics (partition-first, grain/meta, drill members, format inference, public:false, pre-aggregations) are now applied to the array-joined cube when it's the sole output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When a new model is merged with an existing one, FILTER_PARAMS expressions from the old model may reference the previous cube name. This replaces all FILTER_PARAMS.old_cube_name references with the actual cube name from the current generation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
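A minimal sketch of the FILTER_PARAMS rename, assuming the previous cube name is known at merge time (the commit does not say how it is obtained). renameFilterParams is a hypothetical name. The lookahead stops a rename of "events" from also rewriting FILTER_PARAMS.events_v2.

```javascript
// Rewrites FILTER_PARAMS.<oldName> to FILTER_PARAMS.<newName>, matching
// oldName only as a whole identifier.
function renameFilterParams(sql, oldName, newName) {
  // Escape regex metacharacters in the old cube name.
  const escaped = oldName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`FILTER_PARAMS\\.${escaped}(?![A-Za-z0-9_])`, "g");
  // A replacer function avoids "$" patterns being interpreted in newName.
  return sql.replace(pattern, () => `FILTER_PARAMS.${newName}`);
}
```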

…models
When nestedFilters are active:
1. Force mergeStrategy='replace', because FILTER_PARAMS from the old model are incompatible with ARRAY JOIN (indexOf on scalar columns)
2. Use the cube name for the file name, so the file name matches the cube name for Cube.js resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1. SQL now uses newlines + indentation for readability in the model editor
2. Removed count_distinct_approx from pre-agg filters and advisor schema: not supported by the ClickHouse driver

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

FILTER_PARAMS dimensions use indexOf on array columns which become scalars after ARRAY JOIN. These dimensions cause runtime ClickHouse errors and must be excluded from the array-joined cube. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ions When FILTER_PARAMS dimensions are stripped from the array-joined cube, paired count measures that reference those dimensions must also be removed, and drill_members lists must be cleaned. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When the user deselects columns in the profile preview, the filter column (e.g. commerce.products.entry_type) might be removed from the columns Map. But the WHERE clause still references it. Ensure filter columns are always in the ARRAY JOIN regardless of selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… preview Backend: smartGenerate strips excluded dimensions/measures/segments from cubes before generating JS. excluded_fields flows through Hasura action → RPC handler → CubeJS route. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When user deselects fields in change preview, all references must be cleaned: drill_members, paired counts, pre-aggregation measures/dimensions, and derived metrics that reference excluded fields via {name} syntax. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1. ARRAY JOIN SQL now uses SELECT *, alias1, alias2... instead of just SELECT *. ClickHouse doesn't project ARRAY JOIN aliases into the outer subquery scope with SELECT * alone.
2. Segments that reference excluded dimensions via {CUBE}.field_name are now stripped during field exclusion cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
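The projection fix in point 1 can be sketched as a small SQL builder. This shows only the SELECT *, aliases form described in this commit (a later commit in this PR replaces SELECT * with an explicit column list); buildArrayJoinedSql and its parameters are hypothetical, and the newline layout follows the readability change mentioned earlier.

```javascript
// Builds the array-joined cube SQL. Aliases are projected explicitly next
// to SELECT * because, per the commit message, SELECT * alone does not
// expose ARRAY JOIN aliases to the outer Cube.js subquery.
function buildArrayJoinedSql({ table, arrayColumns, where = [] }) {
  if (arrayColumns.length === 0) throw new Error("no array columns to join");
  const aliases = arrayColumns.map((name) => name.replace(/\./g, "_"));
  const joins = arrayColumns.map((name, i) => `\`${name}\` AS ${aliases[i]}`);
  const lines = [
    `SELECT *, ${aliases.join(", ")}`,
    `FROM ${table}`,
    `ARRAY JOIN ${joins.join(", ")}`,
  ];
  // WHERE conditions use the aliased names, which exist after ARRAY JOIN.
  if (where.length > 0) lines.push(`WHERE ${where.join(" AND ")}`);
  return lines.join("\n");
}
```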

Runs cleanup in a loop until stable — each pass may remove fields that other fields depend on. Checks both {name} and {CUBE}.name reference patterns. Handles cascading dependencies (metric A references metric B which references excluded field C). Also adds debug logging for excluded_fields receipt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
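The loop-until-stable cleanup can be sketched as a fixed-point removal. The cube shape ({ dimensions, measures, segments }, each mapping a name to { sql, drill_members? }) and the assumption that drill_members holds plain member names are hypothetical; referencesField and removeExcludedFields are illustrative names. Only the two reference patterns, {name} and {CUBE}.name, come from the commit.

```javascript
// True when sql references `name` as {name} or {CUBE}.name.
function referencesField(sql, name) {
  if (typeof sql !== "string") return false;
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\{${escaped}\\}|\\{CUBE\\}\\.${escaped}(?![A-Za-z0-9_])`).test(sql);
}

function removeExcludedFields(cube, excludedNames) {
  const sections = ["dimensions", "measures", "segments"];
  const out = {};
  for (const s of sections) out[s] = { ...(cube[s] || {}) };
  const removed = new Set(excludedNames);

  // Repeat until a full pass removes nothing: removing one field can break
  // another that references it (metric A → metric B → excluded field C).
  let changed = true;
  while (changed) {
    changed = false;
    for (const s of sections) {
      for (const [name, def] of Object.entries(out[s])) {
        const broken = [...removed].some((r) => referencesField(def.sql, r));
        if (removed.has(name) || broken) {
          delete out[s][name];
          removed.add(name);
          changed = true;
        }
      }
    }
  }

  // Drop removed members from drill_members without mutating the input.
  for (const [name, def] of Object.entries(out.measures)) {
    if (Array.isArray(def.drill_members)) {
      out.measures[name] = { ...def, drill_members: def.drill_members.filter((m) => !removed.has(m)) };
    }
  }
  return out;
}
```

The loop terminates because every pass that sets changed deletes at least one field, and the cube has finitely many.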

Smart Generation improvements:
- Map-expanded fields default to unchecked (opt-in) in change preview
- ARRAY JOIN nested fields default to unchecked (opt-in) in change preview
- AI-generated metrics default to unchecked (opt-in) in change preview
- Count measure and rewrite-rule dimensions always selected
- Source tagging in diffModels (map, nested, ai) for frontend selection logic
- Skip LLM toggle support (skip_llm parameter)
- Required fields (rewrite rules + filter dims) passed to frontend

ARRAY JOIN SQL generation:
- Replace SELECT * with explicit column list to prevent Array/scalar ambiguity
- ARRAY JOIN alias names projected in SELECT for Cube.js subquery visibility
- Non-AJ nested groups (location.*) excluded from SELECT (no corresponding dims)
- After excluded_fields, prune ARRAY JOIN SQL to only surviving columns
- Recompute summary counts after field exclusion

Field continuity fixes:
- Non-AJ nested groups (location.*) pass through processColumns despite no profiling
- FILTER_PARAMS dimensions for non-AJ groups preserved in AJ cube (indexOf still valid)
- AJ group FILTER_PARAMS dimensions correctly excluded (indexOf breaks on scalars)
- Backtick-quote dotted column names in NestedFieldProcessor SQL
- AI metrics empty selection sends empty array (not undefined) to prevent include-all
- SELECT pruning uses exact alias name tracking (not regex heuristic)

Removed:
- Paired filtered count measures (count_dimensions_* etc)
- granularity/partition_granularity from pre-aggregations (Cube.js v1.6 compat)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ggregation handling The /discover response now includes a usage block documenting all authenticated endpoints and the header mapping (id → x-hasura-datasource-id, etc.) so consumers can self-discover the API. Load export refactored to skip native ClickHouse passthrough when pre-aggregations are in the query plan, falling back to semantic streaming which correctly routes through CubeStore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…p_llm
Add ConfigMap-driven default datasource provisioning:
- New utility reads /etc/synmetrix/default-datasources.json (ConfigMap mount) and provisions datasources for each new team with name-based dedup
- Integrated into both createTeam RPC and WorkOS JIT provisioning paths
- Backfill RPC handler to provision existing teams (admin-only action)
- Passwords resolved from k8s secrets via env vars at provisioning time

Expose skip_llm parameter through Hasura action schema so the frontend checkbox actually reaches the smart generation route.

Also: expand discover endpoint documentation with additional API endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
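The name-based dedup and env-var password resolution described above can be sketched as a pure planning step. The config entry shape ({ name, db_type, db_params, password_env }) and planDefaultDatasources are hypothetical; reading and parsing /etc/synmetrix/default-datasources.json is left to the caller. Skipping an entry whose secret is missing, rather than throwing, is a design choice here so that team creation is not blocked.

```javascript
// Returns which datasources to create for a team and which were skipped.
// existingNames: names of datasources the team already has.
function planDefaultDatasources(configEntries, existingNames, env = process.env) {
  const seen = new Set(existingNames);
  const toCreate = [];
  const skipped = [];
  for (const { password_env: passwordEnv, ...entry } of configEntries) {
    if (seen.has(entry.name)) {
      skipped.push({ name: entry.name, reason: "exists" }); // name-based dedup
      continue;
    }
    if (passwordEnv && env[passwordEnv] === undefined) {
      skipped.push({ name: entry.name, reason: `missing ${passwordEnv}` });
      continue;
    }
    // Resolve the password at provisioning time; never store the env name.
    const dbParams = { ...(entry.db_params || {}) };
    if (passwordEnv) dbParams.password = env[passwordEnv];
    toCreate.push({ ...entry, db_params: dbParams });
    seen.add(entry.name); // also dedups repeated names inside the config
  }
  return { toCreate, skipped };
}
```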

…-and-array-join-fixes

# Conflicts:
# services/actions/src/rpc/smartGenSchemas.js
# services/hasura/metadata/actions.graphql

…ioning
cubeBuilder: add granularity field ('day'/'month') to all pre-aggregation definitions, required by Cube.js v1.6.x when time_dimension is present. Without it, models fail compilation with "granularity is required".

yamlGenerator: serialize granularity field in JS model output.

docker-compose.dev.yml: remove environment block that overrode .dev.env passwords with empty shell vars, breaking datasource provisioning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

acmeguy and others added 30 commits March 31, 2026 08:57

feat: enhance cubeBuilder with nested filter support and auto-derived…

2eecef5

… cube names Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: thread nestedFilters through smart-generate pipeline and profiler

faaf830

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: add default param for nestedFilters in buildWhereClause

26b3707

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add NestedFilterInput types to Hasura actions and RPC handler

b4adbf1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: return raw_type and value_type in discoverNested discriminators

c808cfc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add modelPolisher LLM module for cube-principles compliance

5be570d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: mount cube-principles.md for modelPolisher in dev environment

1a55821

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add polish field to SmartGenOutput

dd47242

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use arrayJoin for nested column value lookups in column-values e…

08f1d6a

…ndpoint Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: pass nestedFilters from profile-table route to profiler

51846dc

The frontend sends nestedFilters in the profile-table POST body but the route wasn't extracting or passing them to the profiler function. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: lazy-load Zod schemas to avoid module-level z reference error

9e0d561

Zod schemas are now built inside an async getSchemas() function that imports zod dynamically, avoiding the undefined 'z' at module load time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: replace z.record(z.any()) with z.record(z.string(), z.string()) …

ab9dfde

…for Zod 4 compat zodResponseFormat fails with z.any() as a record value type in Zod 4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wip: smart gen improvements - plans, specs, principles copy

ae209f8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: serialize titles, descriptions, pre-aggregations, format, publi…

4d5d3f4

…c, drill_members in yamlGenerator Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: remove debug logging from profiler nested filter code

85a952d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

acmeguy and others added 24 commits March 31, 2026 08:57

fix: increase smart_gen_dataschemas timeout to 300s, add nested filte…

dc973d9

…r debug logging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add micro-prompt modelAdvisor replacing monolithic polisher

2607b30

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: replace monolithic polisher with micro-prompt advisor in smartG…

d802136

…enerate pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: remove old monolithic modelPolisher, replaced by modelAdvisor

0a9138e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into feat/map-field-opt-in…

619c96e

…-and-array-join-fixes

# Conflicts:
# services/actions/src/rpc/smartGenSchemas.js
# services/hasura/metadata/actions.graphql

debug: add step logging to smartGenerate for hang diagnosis

45da30c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: remove debug step logging from smartGenerate

326848d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

yasirali179 approved these changes Apr 18, 2026


acmeguy merged commit 5d0c3e2 into main Apr 18, 2026
2 of 3 checks passed

acmeguy deleted the feat/map-field-opt-in-and-array-join-fixes branch April 18, 2026 15:16

acmeguy merged 54 commits into main from feat/map-field-opt-in-and-array-join-fixes

acmeguy commented Apr 18, 2026


2 participants
