
fix/invalid-state-transitiondevelop #1288

Open
lbeckman314 wants to merge 1 commit into develop from fix/invalid-state-transition


Contributor

@lbeckman314 lbeckman314 commented Dec 22, 2025

Overview 🌀

This PR resolves the Invalid State Transition error when resubmitting tasks via an external retry mechanism (e.g. K8s BackoffLimit).

Current Behavior ⚠️

s3-invalid.json

{
  "name": "S3 Storage example (invalid)",
  "description": "Task inputs and outputs can be Cloud Storage URLs (Invalid Test)",
  "executors": [
    {
      "image": "ubuntu",
      "command": ["md5sum", "/tmp/README.md"]
    }
  ],
  "inputs": [
    {
      "name": "input",
      "description": "Download a file from S3 Storage",
      "url": "s3://funnel-testing-east/ERROR",    <----- Non-existent object set here
      "path": "/tmp/README.md"
    }
  ]
}

➜ funnel task create examples/s3-invalid.json
<TASK ID>

➜ funnel task get <TASK ID> --view MINIMAL
{
  "id":  "<TASK ID>",
  "state":  "SYSTEM_ERROR"
}

➜ kubectl get jobs/<TASK ID>
NAME          COMPLETIONS   DURATION
<TASK ID>     0/1           13m     <---- Worker

➜ kubectl logs jobs/<TASK ID>
{"error":"invalid state transition from SYSTEM_ERROR to INITIALIZING","msg":"error writing event"}
{"error":"genericS3: stat object ERROR in bucket funnel-testing-east: The specified key does not exist."}
{"msg":"TASK_STATE","ns":"worker","state":"SYSTEM_ERROR","taskID":"<TASK ID>"}

Note

Tested with Helm Charts 0.1.71 (2025-12-14) and 0.1.75 (2025-12-22):

➜ helm repo update ohsu
Update Complete. ⎈Happy Helming!⎈

➜ helm search repo funnel --versions
NAME            CHART VERSION   APP VERSION     DESCRIPTION
ohsu/funnel     0.1.75          2025-12-22      A toolkit for distributed task execution ⚙️
...
ohsu/funnel     0.1.71          2025-12-14      A toolkit for distributed task execution ⚙️

➜ helm upgrade --install funnel ohsu/funnel -f values.yaml --version 0.1.71
Release "funnel" has been upgraded. Happy Helming!

Copilot AI review requested due to automatic review settings December 22, 2025 23:01
@lbeckman314 lbeckman314 self-assigned this Dec 22, 2025

netlify bot commented Dec 22, 2025

Deploy Preview for funnel-dev ready!

Name Link
🔨 Latest commit b09cdc0
🔍 Latest deploy log https://app.netlify.com/projects/funnel-dev/deploys/6949cdc94f782800080bffaa
😎 Deploy Preview https://deploy-preview-1288--funnel-dev.netlify.app

@lbeckman314 lbeckman314 added the blocker blocks user interactions label Dec 22, 2025
@lbeckman314 lbeckman314 moved this to In Progress in Funnel Dec 22, 2025
Contributor

Copilot AI left a comment


Pull request overview

This PR adds TODO comments to investigate invalid state transition errors that occur after valid worker restarts in Kubernetes environments. The changes are purely documentation-focused, adding context for future investigation without modifying any code logic.

  • Added TODO comments identifying potential sources of invalid state transition errors during K8s worker restarts
  • Documented concerns about retry policy interactions with K8s BackoffLimit settings
  • Added note about removing K8s artifacts in favor of Helm configurations

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tes/states.go | Added TODO noting that the TransitionError is thrown after valid worker restarts |
| database/mongodb/events.go | Added TODO suggesting this retrier logic may be the source of invalid state transition errors and noting interaction with K8s BackoffLimit |
| database/boltdb/events.go | Added comment explaining executor restart behavior and terminal state transitions |
| config/kubernetes/executor-job.yaml | Added TODO to remove K8s artifacts in favor of Helm configs |


Comment thread database/boltdb/events.go
Comment on lines +176 to +178
// In case where restarts of executor are allowed, this line should not be encountered
// e.g. Accessing invalid object shpuld be SYSTEM_ERROR every time,
// but if a retry mechanism would work (e.g. unavailable for a few seconds), then Task Worker should allow Executor to restart as expected.

Copilot AI Dec 22, 2025


This comment is difficult to parse and unclear. Consider breaking it into multiple sentences or restructuring for clarity. The relationship between "restarts of executor", "invalid object", "SYSTEM_ERROR", and "retry mechanism" is confusing as currently written.

Suggested change
- // In case where restarts of executor are allowed, this line should not be encountered
- // e.g. Accessing invalid object shpuld be SYSTEM_ERROR every time,
- // but if a retry mechanism would work (e.g. unavailable for a few seconds), then Task Worker should allow Executor to restart as expected.
+ // When executor restarts are allowed, this branch should not be reached.
+ // For example, repeatedly accessing an invalid object should always result in SYSTEM_ERROR (a terminal state) with no retries.
+ // In contrast, transient errors that can be retried (e.g. a service being unavailable for a few seconds) should be handled before the task reaches a terminal state, allowing the executor to restart as expected.

Comment thread database/boltdb/events.go
current, target)

// In case where restarts of executor are allowed, this line should not be encountered
// e.g. Accessing invalid object shpuld be SYSTEM_ERROR every time,

Copilot AI Dec 22, 2025


Spelling error: "shpuld" should be "should"

Suggested change
- // e.g. Accessing invalid object shpuld be SYSTEM_ERROR every time,
+ // e.g. Accessing invalid object should be SYSTEM_ERROR every time,

@lbeckman314 lbeckman314 changed the title from "chore: Add TODOs to investigate invalid state transition errors" to "fix/invalid-state-transition -> develop" Dec 23, 2025
@lbeckman314 lbeckman314 changed the title from "fix/invalid-state-transition -> develop" to "fix/invalid-state-transitiondevelop" Dec 23, 2025

Labels

blocker blocks user interactions bug
