Skip to content

Integrate JiT compilation into CLI#2110

Open
rafal-hawrylak wants to merge 6 commits intomainfrom
cli_jit_integration
Open

Integrate JiT compilation into CLI#2110
rafal-hawrylak wants to merge 6 commits intomainfrom
cli_jit_integration

Conversation

@rafal-hawrylak
Copy link
Copy Markdown
Collaborator

No description provided.

@rafal-hawrylak rafal-hawrylak self-assigned this Mar 9, 2026
@rafal-hawrylak rafal-hawrylak marked this pull request as ready for review March 9, 2026 20:28
@rafal-hawrylak rafal-hawrylak requested a review from a team as a code owner March 9, 2026 20:28
@rafal-hawrylak rafal-hawrylak enabled auto-merge (squash) March 9, 2026 20:28
Copy link
Copy Markdown
Contributor

@kolina kolina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's test compatibility with different Dataform core versions.

Also this change seems worth it to bump a minor version of Dataform core

Comment thread cli/api/commands/compile.ts
Comment thread cli/api/commands/compile.ts
Comment thread cli/api/commands/compile.ts
Comment thread cli/api/commands/run.ts Outdated
Comment thread cli/api/commands/run.ts Outdated
Comment thread cli/tests/jit/jit_run_test.ts
Comment thread cli/api/commands/compile.ts Outdated
Comment thread packages/@dataform/cli/worker.ts Outdated
Comment thread protos/execution.proto
Comment thread tests/api/api.spec.ts Outdated
Copy link
Copy Markdown
Collaborator

@ikholopov-omni ikholopov-omni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* feat: add worker process management for JiT compilation

- Introduce base worker and JiT-specific child process logic

- Implement RPC bridge for database access during JiT

- Add VM scripts and loader for isolated execution

- Add unit tests for the RPC mechanism
- Enhance error coercion in common/errors/errors.ts
- Sync protos/execution.proto and protos/jit.proto with new fields
* feat: add worker process management for JiT compilation

- Introduce base worker and JiT-specific child process logic

- Implement RPC bridge for database access during JiT

- Add VM scripts and loader for isolated execution

- Add unit tests for the RPC mechanism
* feat: add action descriptor support to JiT compilation results

- Implement action and column descriptor logic in core/jit_compiler.ts

- Support overrides for descriptions, labels, and tags in JiT results

- Enhance error coercion in common/errors/errors.ts

- Add comprehensive JiT metadata tests in core/jit_compiler_test.ts

- Sync protos/execution.proto and protos/jit.proto with new result fields
@rafal-hawrylak rafal-hawrylak force-pushed the cli_jit_integration branch 13 times, most recently from 10bb4e6 to 824d7d9 Compare April 3, 2026 14:08
@rafal-hawrylak rafal-hawrylak force-pushed the cli_jit_integration branch 3 times, most recently from 2466f28 to 38cd3b9 Compare April 7, 2026 09:24
Comment thread cli/api/commands/run.ts Outdated
Comment thread cli/api/commands/run.ts
try {
if (action.jitCode) {
await this.compileJitAction(action, actionResult, this.dbadapter);
if ((actionResult.status as dataform.ActionResult.ExecutionStatus) === dataform.ActionResult.ExecutionStatus.FAILED) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need casts here? Here and below

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, to override TypeScript narrowing the type to exactly RUNNING (literal types). It believes the type is ExecutionStatus.FAILED and assumes it is not mutated.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that compileJitAction only updates actionResult in the exception block, so you can consider not passing actionResult there and moving the exception block into this function

Comment thread cli/api/commands/run.ts Outdated
Comment thread cli/api/commands/run.ts
try {
const jitResponse = await this.executionOptions.jitCompiler(
jitRequest,
this.executionOptions.projectDir,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reponding to the older comment

In the CLI, projectDir allows the JiT worker to resolve local require() calls and find @dataform/core. While this is convenient for local development, it does differ from the cloud environment where source code isn't available during execution. We may want to consider if JiT code should be strictly self-contained for better parity. This would make the project to be unportable from local to GCP and we don't want it, right?

I think it should be consistent. Allowing JiT compilation to access other files in a project seems ok to me, but if we don't support it in JiT, it'd make sense to not allow it in CLI as well

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In both AoT compilation:

root: compileConfig.projectDir,
and JiT compilation: https://github.com/dataform-co/dataform/pull/2110/changes#diff-c88d00fcc6836e8a1bbecf72502bf3cd2f54cf0a585c58b1ba234bcf8b00bb4cR90 there importing files is allowed and in both there is restriction so that only files within projectDir can be imported.
What exactly you would align in AoT and JiT compilation?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is about "GCP vs CLI": if my JiT task will be able to import other files locally, but then will break in GCP because JiT compilation in GCP doesn't support it, it'll be not a great experience.

Comment thread cli/api/commands/run.ts Outdated
Comment thread cli/api/dbadapters/bigquery.ts Outdated
Comment thread cli/vm/jit_worker.ts
return moduleName;
}
},
sourceExtensions: ["js", "json"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add yaml here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, interesting - should yaml be really supported and is ever read in JiT compilation?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workflow_settings.yaml?

Comment thread cli/console.ts Outdated
Comment thread cli/index.ts
if (
argv[dryRunOptionName] &&
isJsonOutput &&
!executionGraph.actions.some(action => !!action.jitCode)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to preserve printing graph here

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we preserve printing the executionGraph here, the JSON output for JiT actions will be incomplete (it will only contain the raw js jitCode, not the generated SQL). I added this condition so that a dry-run with JiT actions falls through to the Runner, forcing the JiT compilation to happen, and outputting a runResult that contains the actual compiled SQL. If we remove this, users lose the ability to inspect dynamically generated JiT queries via JSON dry-runs. Do you still want me to revert this and print the uncompiled graph?

@@ -1,2 +1,28 @@
import { listenForCompileRequest } from "df/cli/vm/compile";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can be cleaned up?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean removing this function from listenForCompileRequest :_)

@rafal-hawrylak rafal-hawrylak force-pushed the cli_jit_integration branch 4 times, most recently from 483f247 to 807dae4 Compare April 16, 2026 09:06
* feat: enable JiT compilation in CLI run, build and compile commands

- Connect main CLI execution flow to the JiT worker process

- Add comprehensive JiT test suite

- Finalize AoT build logic to preserve JiT-specific fields

- Update CLI console and entry points for integrated JiT support
Copy link
Copy Markdown
Contributor

@kolina kolina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No outstanding issues from my side except question about access of JiT compiled code to other files

Comment thread cli/api/commands/run.ts
) {
return {
runResult: {
actions: [],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this if actions is present in executionOptions?

Comment thread cli/api/commands/run.ts
private readonly dbadapter: dbadapters.IDbAdapter,
private readonly graph: dataform.IExecutionGraph,
private readonly executionOptions: IExecutionOptions = {},
optionsOrProjectDir: RunOptionsOrProjectDir = ".",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify what exactly are we passing here

Comment thread cli/api/commands/run.ts
try {
const jitResponse = await this.executionOptions.jitCompiler(
jitRequest,
this.executionOptions.projectDir,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is about "GCP vs CLI": if my JiT task will be able to import other files locally, but then will break in GCP because JiT compilation in GCP doesn't support it, it'll be not a great experience.

Comment thread cli/api/commands/run.ts
try {
if (action.jitCode) {
await this.compileJitAction(action, actionResult, this.dbadapter);
if ((actionResult.status as dataform.ActionResult.ExecutionStatus) === dataform.ActionResult.ExecutionStatus.FAILED) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that compileJitAction only updates actionResult in the exception block, so you can consider not passing actionResult there and moving the exception block into this function

Comment thread cli/vm/jit_worker.ts
return moduleName;
}
},
sourceExtensions: ["js", "json"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workflow_settings.yaml?

@@ -1,2 +1,28 @@
import { listenForCompileRequest } from "df/cli/vm/compile";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean removing this function from listenForCompileRequest :_)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants