debug: smartGenerate step logging for hang diagnosis #38
Merged
Adds a new POST endpoint that detects nested (GROUPED) column structures in a ClickHouse table and returns discriminator columns with their distinct values, enabling the frontend to show filter options in the Smart Generate dialog.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ion, cleanup
- Use cubejs.options.driverFactory({ securityContext }) instead of cubejs.driverFactory()
- Add SAFE_IDENTIFIER regex validation on schema/table params to prevent SQL injection
- Add driver.release() cleanup in catch block
- Use { code, message } error response shape matching other routes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
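The identifier check described above could look roughly like the sketch below. The commit names `SAFE_IDENTIFIER` but does not show its definition, so the pattern, the `assertSafeIdentifier` and `qualifiedTable` helpers, and the `invalid_identifier` code are all assumptions for illustration.

```javascript
// Hypothetical pattern: letters, digits and underscores, not starting with a digit.
const SAFE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Throws unless `value` is a plain identifier that is safe to interpolate into SQL.
function assertSafeIdentifier(value, label) {
  if (typeof value !== "string" || !SAFE_IDENTIFIER.test(value)) {
    const err = new Error(`Invalid ${label}`);
    err.code = "invalid_identifier"; // hypothetical code for a { code, message } response
    throw err;
  }
  return value;
}

// Builds a schema-qualified table reference from validated parts only.
function qualifiedTable(schema, table) {
  return `${assertSafeIdentifier(schema, "schema")}.${assertSafeIdentifier(table, "table")}`;
}
```

Validating against an allow-list pattern, rather than trying to escape the identifier, keeps the route from ever building SQL out of untrusted schema or table names.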
… cube names
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Restore AS alias clause in legacy ARRAY JOIN path SQL with partition WHERE
- Use ClickHouse-standard doubled single quotes ('') instead of backslash escaping
- Remove redundant template literal wrapping in arrayJoinGroups map
- Add warning when groupColumns is empty but arrayJoinGroups were requested
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
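A minimal sketch of the quoting change, assuming a hypothetical `quoteLiteral` helper (the actual function name is not shown in the commit). It doubles single quotes as described; it also doubles backslashes, because ClickHouse treats a backslash as an escape character inside string literals.

```javascript
// Quotes a JavaScript value as a ClickHouse string literal.
// Backslashes are doubled first, then single quotes are doubled ('' style).
function quoteLiteral(value) {
  const escaped = String(value).replace(/\\/g, "\\\\").replace(/'/g, "''");
  return `'${escaped}'`;
}

// Example: a partition filter value that contains a quote.
const where = `partition = ${quoteLiteral("o'brien")}`;
```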
Insert LLM polishing step after AI enrichment and before final JS code generation. The polisher rewrites cube definitions per modeling principles while preserving original SQL. Polish results are included in all response payloads (dry-run, no-changes, and apply).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ndpoint
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…selected
Without this, profiling ran against the base table and reported empty columns for nested array sub-columns. Now the profiler uses LEFT ARRAY JOIN so column stats reflect the expanded array-joined rows.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse Nested columns (stored as parallel arrays with dotted names) require enumerating each sub-column in the ARRAY JOIN clause:
  ARRAY JOIN `parent.child1` AS child1_alias, `parent.child2` AS child2_alias
The previous code used `ARRAY JOIN parent`, which is invalid for this column type.
Fixes both the profiler (for accurate column stats on expanded rows) and cubeBuilder (for correct cube SQL generation). The non-array-join profiling path is unchanged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace dots with underscores in the full column name (e.g. commerce.products.entry_type → commerce_products_entry_type) for both the ARRAY JOIN alias and the nested WHERE filter clause.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The frontend sends nestedFilters in the profile-table POST body but the route wasn't extracting or passing them to the profiler function.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Profiler: filter ARRAY JOIN to columns where rawType starts with Array(
  Scalar dotted columns (e.g. commerce.details Nullable(String)) excluded
- Profiler + CubeBuilder: use full column name with dots→underscores as alias
  (e.g. commerce.products.entry_type → commerce_products_entry_type)
- CubeBuilder: dimension/measure SQL uses the aliased column name
- WHERE clauses use the aliased names consistently
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
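The ARRAY JOIN rules from the last few commits (array-typed sub-columns only, backtick-quoted dotted names, dots replaced by underscores in the alias) can be sketched as below. The `{ name, rawType }` column shape and the function names are assumptions; the real profiler and cubeBuilder code is not shown in this conversation.

```javascript
// Converts a dotted Nested column name into a flat alias:
// commerce.products.entry_type -> commerce_products_entry_type
function aliasFor(columnName) {
  return columnName.replace(/\./g, "_");
}

// Builds the ARRAY JOIN clause for the sub-columns of one Nested group.
// `columns` is a hypothetical list of { name, rawType } objects.
function buildArrayJoinClause(columns, { left = false } = {}) {
  const parts = columns
    .filter((c) => c.rawType.startsWith("Array(")) // scalar dotted columns are skipped
    .map((c) => `\`${c.name}\` AS ${aliasFor(c.name)}`);
  if (parts.length === 0) return "";
  return `${left ? "LEFT " : ""}ARRAY JOIN ${parts.join(", ")}`;
}
```

With `left: true` the same helper would produce the LEFT ARRAY JOIN form that the profiler commit mentions.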
Query rewrite rules (e.g. partition scoping by team properties) are now loaded and translated to raw SQL filters before profiling. This ensures the profiler respects the same row-level access controls as the Cube.js query layer. Applied in both profileTable and smartGenerate routes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…alidate
After the LLM returns polished cubes, generates JS and runs validateModelSyntax. If validation fails, sends errors back to the LLM for correction, up to 2 cycles. Also mounts first-principles path and checks multiple principle file locations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
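The validate-and-retry loop described above might be shaped like this sketch. The `polish`, `generateJs` and `validate` callbacks are hypothetical stand-ins injected by the caller (the real `validateModelSyntax` signature is not shown here), and `validate` is assumed to return an array of error strings.

```javascript
// Runs polish -> generate -> validate, feeding validation errors back for
// at most `maxCycles` correction rounds before giving up.
async function polishWithValidation({ polish, generateJs, validate, cubes, maxCycles = 2 }) {
  let polished = await polish(cubes, []);
  for (let cycle = 0; ; cycle += 1) {
    const errors = validate(generateJs(polished));
    if (errors.length === 0) return { cubes: polished, valid: true, cycles: cycle };
    if (cycle === maxCycles) return { cubes: polished, valid: false, errors, cycles: cycle };
    polished = await polish(polished, errors); // ask for a corrected version
  }
}
```

Capping the number of rounds bounds both latency and LLM cost; the caller decides what to do when the result is still invalid.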
Zod schemas are now built inside an async getSchemas() function that imports zod dynamically, avoiding the undefined 'z' at module load time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…for Zod 4 compat
zodResponseFormat fails with z.any() as a record value type in Zod 4.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… compat
OpenAI structured output requires every field to have an explicit type. Replaced z.any().nullable() for rollingWindow, timeShift, refresh_key, and meta with fully typed schemas.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Preview shows the raw generated model for fast feedback. Polishing runs only when the user clicks Apply Changes, avoiding timeouts during preview. Also increased polisher timeout to 180s for large models.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Plan 1: Single-line fix for lcFrom missing arrayJoinClause (4 broken queries)
Plan 2: 6-task plan for principle-compliant cubeBuilder heuristics (titles, meta, paired counts, format, public:false, drill members, pre-aggregations)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ries
Single-line fix: the lcFrom variable (used by 4 downstream queries for Map numeric stats, Map string stats, and LC value probe) was missing the arrayJoinClause. All nested filter WHERE conditions referenced aliased column names that only exist after ARRAY JOIN.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- titleFromName: snake_case → Title Case on all fields and cubes
- Partition-first dimension ordering
- Complete meta block: grain, grain_description, time_dimension, time_zone, refresh_cadence
- Paired filtered counts for LC dimensions (max 10 values)
- Drill members on primary count measure
- Format inference: currency/percent by column name pattern
- public: false on plumbing fields (GIDs, write_key, etc.)
- Default pre-aggregations: daily + monthly rollups with ClickHouse indexes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
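As an illustration of the first heuristic, a snake_case to Title Case helper could be as small as the sketch below. This is an assumed implementation; the real `titleFromName` may treat acronyms or numeric suffixes differently.

```javascript
// snake_case -> Title Case, e.g. "entry_type" -> "Entry Type".
function titleFromName(name) {
  return name
    .split("_")
    .filter(Boolean) // drop empty parts from leading or doubled underscores
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}
```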
…c, drill_members in yamlGenerator
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6-task plan: fix Hasura timeout, create modelAdvisor with 4 focused micro-prompts (descriptions, segments, metrics, pre-aggregations), integrate into pipeline, update frontend, delete old polisher, full end-to-end testing including Cube.js compiler validation and Explore page query verification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r debug logging
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…enerate pipeline
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rays
Cube.js expects pre_aggregations as named keys with indexes as nested named objects. The yamlGenerator was using JSON.stringify, which produced arrays with 'name' fields (invalid Cube.js syntax).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
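The shape change described above, from arrays of `{ name, ... }` entries to objects keyed by name, can be sketched with a small converter. `toNamedKeys` and the sample rollup and index names are hypothetical; the actual yamlGenerator fix may serialize the structure differently.

```javascript
// Converts [{ name, ...rest }] into { [name]: rest }, recursing into `indexes`
// so that indexes also become nested named objects.
function toNamedKeys(list) {
  const out = {};
  for (const { name, indexes, ...rest } of list) {
    out[name] = indexes ? { ...rest, indexes: toNamedKeys(indexes) } : rest;
  }
  return out;
}

// Example input in the array form that the old serializer emitted.
const preAggregations = toNamedKeys([
  { name: "daily_rollup", measures: ["count"], indexes: [{ name: "ts_idx", columns: ["timestamp"] }] },
]);
```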
Previously buildCubes always emitted the raw base-table cube AND the array-joined cube. When the user selects an array join with filters, only the filtered array-joined cube should be produced: one cube, one file, one intent.
The raw cube is still built internally (for field processing and as a base for the array join cube's inherited dimensions) but is not emitted. All heuristics (partition-first, grain/meta, drill members, format inference, public:false, pre-aggregations) are now applied to the array-joined cube when it's the sole output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a new model is merged with an existing one, FILTER_PARAMS expressions from the old model may reference the previous cube name. This replaces all FILTER_PARAMS.old_cube_name references with the actual cube name from the current generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
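A rough sketch of that rename, assuming the old cube name is known to the caller (the real code may instead match any cube name after `FILTER_PARAMS.`). The function name is hypothetical.

```javascript
// Rewrites FILTER_PARAMS.<oldName>. references to FILTER_PARAMS.<newName>.
// The trailing dot in the pattern avoids touching cubes that merely share a prefix.
function renameFilterParams(sql, oldName, newName) {
  const escaped = oldName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`FILTER_PARAMS\\.${escaped}\\.`, "g");
  return sql.replace(pattern, () => `FILTER_PARAMS.${newName}.`);
}
```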
…models
When nestedFilters are active:
1. Force mergeStrategy='replace': FILTER_PARAMS from old model are incompatible with ARRAY JOIN (indexOf on scalar columns)
2. Use the cube name for the file name, so the file name matches the cube name for Cube.js resolution
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. SQL now uses newlines + indentation for readability in model editor
2. Removed count_distinct_approx from pre-agg filters and advisor schema (not supported by ClickHouse driver)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FILTER_PARAMS dimensions use indexOf on array columns which become scalars after ARRAY JOIN. These dimensions cause runtime ClickHouse errors and must be excluded from the array-joined cube.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ions
When FILTER_PARAMS dimensions are stripped from the array-joined cube, paired count measures that reference those dimensions must also be removed, and drill_members lists must be cleaned.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the user deselects columns in the profile preview, the filter column (e.g. commerce.products.entry_type) might be removed from the columns Map. But the WHERE clause still references it. Ensure filter columns are always in the ARRAY JOIN regardless of selection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… preview
Backend: smartGenerate strips excluded dimensions/measures/segments from cubes before generating JS. excluded_fields flows through Hasura action → RPC handler → CubeJS route.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When user deselects fields in change preview, all references must be
cleaned: drill_members, paired counts, pre-aggregation measures/dimensions,
and derived metrics that reference excluded fields via {name} syntax.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. ARRAY JOIN SQL now uses SELECT *, alias1, alias2... instead of just
SELECT *. ClickHouse doesn't project ARRAY JOIN aliases into outer
subquery scope with SELECT * alone.
2. Segments that reference excluded dimensions via {CUBE}.field_name
are now stripped during field exclusion cleanup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs cleanup in a loop until stable — each pass may remove fields that
other fields depend on. Checks both {name} and {CUBE}.name reference
patterns. Handles cascading dependencies (metric A references metric B
which references excluded field C).
Also adds debug logging for excluded_fields receipt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
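The cleanup-until-stable idea can be sketched as a fixed-point loop. The flat `{ name, sql }` field shape and both function names are assumptions; the real cleanup also covers drill_members and pre-aggregation member lists, which are left out here.

```javascript
// True when `sql` references `name` via {name} or {CUBE}.name.
function references(sql, name) {
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\{${escaped}\\}|\\{CUBE\\}\\.${escaped}\\b`).test(sql);
}

// Removes excluded fields, then keeps removing fields that reference an
// already-removed one until a pass removes nothing (a fixed point).
function pruneFields(fields, excludedNames) {
  const removed = new Set(excludedNames);
  let remaining = fields.filter((f) => !removed.has(f.name));
  let changed = true;
  while (changed) {
    changed = false;
    for (const field of remaining) {
      if ([...removed].some((name) => references(field.sql, name))) {
        removed.add(field.name);
        changed = true;
      }
    }
    remaining = remaining.filter((f) => !removed.has(f.name));
  }
  return remaining;
}
```

Looping until nothing changes handles chains of any depth (metric A → metric B → excluded field C) without needing an explicit dependency graph.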
Smart Generation improvements:
- Map-expanded fields default to unchecked (opt-in) in change preview
- ARRAY JOIN nested fields default to unchecked (opt-in) in change preview
- AI-generated metrics default to unchecked (opt-in) in change preview
- Count measure and rewrite-rule dimensions always selected
- Source tagging in diffModels (map, nested, ai) for frontend selection logic
- Skip LLM toggle support (skip_llm parameter)
- Required fields (rewrite rules + filter dims) passed to frontend
ARRAY JOIN SQL generation:
- Replace SELECT * with explicit column list to prevent Array/scalar ambiguity
- ARRAY JOIN alias names projected in SELECT for Cube.js subquery visibility
- Non-AJ nested groups (location.*) excluded from SELECT (no corresponding dims)
- After excluded_fields, prune ARRAY JOIN SQL to only surviving columns
- Recompute summary counts after field exclusion
Field continuity fixes:
- Non-AJ nested groups (location.*) pass through processColumns despite no profiling
- FILTER_PARAMS dimensions for non-AJ groups preserved in AJ cube (indexOf still valid)
- AJ group FILTER_PARAMS dimensions correctly excluded (indexOf breaks on scalars)
- Backtick-quote dotted column names in NestedFieldProcessor SQL
- AI metrics empty selection sends empty array (not undefined) to prevent include-all
- SELECT pruning uses exact alias name tracking (not regex heuristic)
Removed:
- Paired filtered count measures (count_dimensions_* etc)
- granularity/partition_granularity from pre-aggregations (Cube.js v1.6 compat)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ggregation handling
The /discover response now includes a usage block documenting all authenticated endpoints and the header mapping (id → x-hasura-datasource-id, etc.) so consumers can self-discover the API.
Load export refactored to skip native ClickHouse passthrough when pre-aggregations are in the query plan, falling back to semantic streaming which correctly routes through CubeStore.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p_llm
Add ConfigMap-driven default datasource provisioning:
- New utility reads /etc/synmetrix/default-datasources.json (ConfigMap mount) and provisions datasources for each new team with name-based dedup
- Integrated into both createTeam RPC and WorkOS JIT provisioning paths
- Backfill RPC handler to provision existing teams (admin-only action)
- Passwords resolved from k8s secrets via env vars at provisioning time
Expose skip_llm parameter through Hasura action schema so the frontend checkbox actually reaches the smart generation route.
Also: expand discover endpoint documentation with additional API endpoints.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-and-array-join-fixes
# Conflicts:
#	services/actions/src/rpc/smartGenSchemas.js
#	services/hasura/metadata/actions.graphql
…ioning
cubeBuilder: add granularity field ('day'/'month') to all pre-aggregation
definitions — required by Cube.js v1.6.x when time_dimension is present.
Without it, models fail compilation with "granularity is required".
yamlGenerator: serialize granularity field in JS model output.
docker-compose.dev.yml: remove environment block that overrode .dev.env
passwords with empty shell vars, breaking datasource provisioning.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yasirali179 approved these changes on Apr 18, 2026
Temporary debug logging to find where smart gen hangs on k8s. Will remove after diagnosis.