Skip to content

Trampolining: addition of fields to remove the infinite loops for pinned wfs #721

Merged
Shivs11 merged 13 commits intomasterfrom
ss/trampolining-infinite-loops
Mar 12, 2026
Merged

Trampolining: addition of fields to remove the infinite loops for pinned wfs #721
Shivs11 merged 13 commits intomasterfrom
ss/trampolining-infinite-loops

Conversation

@Shivs11
Copy link
Member

@Shivs11 Shivs11 commented Feb 19, 2026

READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.

-WISOTT

  • Users that could using trampolining in the near future could run into a problem, which shall only surface if the workflow author does not have AutoUpgrade as the initial CAN behaviour for their Pinned workflows. As of today, if a user does hit this case, they would have the pinned workflow CAN'ing infinite number of times. This is because a pinned workflow would commence on the pinned version (since it would have inherited pinned version set to true) and if the pinned version is different than a worker-deployment's current version, the targetDeploymentVersionChanged would be set to true in the server. This would then result in the workflow just CAN'ing all the time!
  • This change aims to change that by only allowing at-most one CAN for such pinned workflow executions, and it does so by remembering what the initial version was at the start of the continue-as-new.

Breaking changes

@Shivs11 Shivs11 changed the title Trampolining: Add target_version_on_start Trampolining: target_worker_deployment_version_on_start Feb 20, 2026
@Shivs11 Shivs11 marked this pull request as ready for review February 20, 2026 21:12
@Shivs11 Shivs11 requested review from a team as code owners February 20, 2026 21:12
@Shivs11 Shivs11 force-pushed the ss/trampolining-infinite-loops branch from 776bba0 to c69139f Compare March 6, 2026 03:31
@Shivs11 Shivs11 changed the title Trampolining: target_worker_deployment_version_on_start Trampolining: addition of fields to remove the infinite loops for pinned wfs Mar 6, 2026
// latest_target_worker_deployment_version on its versioning info. Used to
// detect whether the routing target has changed since this run started,
// preventing infinite continue-as-new loops for pinned workflows that do
// not have initial CaN behavior set as AutoUpgrade. Nil for fresh starts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just explicitly say: "Nil for workflows that are not initiated by a versioned workflow continuing-as-new."

Comment on lines +237 to +238
// task, it compares the polling worker's deployment version against the task
// queue's routing target. If they differ, matching sends the routing target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "the task queue's routing target" is not quite right, because "Target Worker Deployment Version" is a function of workflow_id, task_queue_routing_config, not just task_queue_routing_config. So I think it's best to just always say "Workflow's Target Version." We define that term in our docs, so we can just use it as is without explaining further imo

Comment on lines +242 to +248
// This field exists so that when a workflow continues-as-new, the new run
// can know what the routing target was at the end of the previous run.
// Specifically, it is copied into the new run's WorkflowExecutionStartedEvent,
// enabling the new run to detect whether the routing target has actually
// changed since it started — which is how we prevent infinite
// continue-as-new loops for pinned workflows that do not have initial
// CaN behavior set as AutoUpgrade.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider just omitting this paragraph, but also ok to leave it.

If you keep it, please change "routing target" to "workflow's Target Version"

// changed since it started — which is how we prevent infinite
// continue-as-new loops for pinned workflows that do not have initial
// CaN behavior set as AutoUpgrade.
temporal.api.deployment.v1.WorkerDeploymentVersion latest_target_worker_deployment_version = 9;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh since this is already in the context of "versioning info" we can probably just call it latest_target_version.

Also, I'm realizing that since VersioningInfo is an external API thing, if we want to put:
TargetVersionOnContinueAsNew version and InheritedPinnedVersionOnContinueAsNew bool in here, that would have to be in this API PR as well.

// Target Version has changed since this run started, preventing infinite
// continue-as-new loops for pinned workflows that do not have initial CaN
// behavior set as AutoUpgrade. Nil for workflows that are not initiated by
// a versioned workflow continuing-as-new.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field doesn't need to be looked at by SDK, correct? A line to that effect is always helpful in docstrings.

Shivs11 and others added 2 commits March 9, 2026 16:11
…_version docstring

Clarifies that this field is server-internal and should not be read by SDKs,
per reviewer feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
// Wrapper for the last Target Deployment Version that the server notified the SDK about
// via the targetDeploymentVersionChanged flag on a workflow task started event.
// See last_notified_target_version on WorkflowExecutionStartedEventAttributes.
message LastNotifiedTargetVersion {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw: I was tempted to calling this NullableTargetVerison since we are using the message in the server repo here

however, chose against it for readability - @Sushisource / @ShahabT curious to hear how you folks think about this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think nullable is an odd name. This makes more sense to me.

// Only populated by server with version >= 1.29.0.
bool eager_execution_accepted = 38;

// During the previous run of this workflow, if it exists, the server may have notified
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the previous run of this workflow, if it exists, the server may have notified
maybe remove the if it exists

//
// Used internally by the server during continue-as-new and retry.
// Should not be read or interpreted by SDKs.
LastNotifiedTargetVersion last_notified_target_version = 40;
Copy link
Contributor

@ShahabT ShahabT Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's how I think: previous run sends a declined_upgrade_for_target_version to the next run.

how previous run calculates this value at the CaN time:

  • if prevMS.last_notified_target_version != nil -> that's the value.
  • else prevMS.declined_upgrade_for_target_version is the value

how previous failed run calculates this value at retry:

  • just pass starteventAttr.declined_upgrade_for_target_version

@Shivs11 Shivs11 merged commit 8baa851 into master Mar 12, 2026
4 checks passed
@Shivs11 Shivs11 deleted the ss/trampolining-infinite-loops branch March 12, 2026 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants