Conversation
📝 WalkthroughWalkthroughThis PR introduces Terraform Cloud job agent integration, adding templated workspace management, variable synchronization, run creation, and verification workflows. It modifies the TFE constructor to accept a verification manager and implements a Dispatch flow with Kafka event emission for job execution and status updates. Changes
Sequence Diagram(s)sequenceDiagram
actor JobExecutor
participant TFE as TFE Agent
participant TFECloud as Terraform Cloud API
participant VerificationMgr as Verification Manager
participant Kafka as Kafka Producer
JobExecutor->>TFE: Dispatch(ctx, job)
activate TFE
TFE->>TFE: parseJobAgentConfig(job)
TFE->>TFE: getTemplatableJob(config)
TFE->>TFE: getTemplatedWorkspace(job, template)
TFE->>TFECloud: upsertWorkspace(workspace)
activate TFECloud
TFECloud-->>TFE: workspace
deactivate TFECloud
TFE->>TFECloud: syncVariables(variables)
activate TFECloud
TFECloud-->>TFE: synced
deactivate TFECloud
TFE->>TFECloud: createRun(runConfig)
activate TFECloud
TFECloud-->>TFE: runID
deactivate TFECloud
TFE->>VerificationMgr: StartVerification(ctx, runID)
activate VerificationMgr
VerificationMgr->>TFECloud: monitor run status
TFECloud-->>VerificationMgr: status
VerificationMgr-->>TFE: verification result
deactivate VerificationMgr
TFE->>Kafka: sendJobUpdateEvent(job, status)
activate Kafka
Kafka-->>TFE: ack
deactivate Kafka
TFE-->>JobExecutor: nil or error
deactivate TFE
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@apps/workspace-engine/pkg/workspace/jobagents/terraformcloud/tfe.go`:
- Around line 95-129: The goroutine launching the TFE workflow ignores errors
from sendJobFailureEvent/sendJobUpdateEvent and has no panic recovery; modify
the anonymous goroutine to (1) add a deferred recover handler (defer func() { if
r := recover() { log the panic and call sendJobFailureEvent with a descriptive
message/error } }) so panics are caught and reported, and (2) check and handle
the error return values of every sendJobFailureEvent and sendJobUpdateEvent
invocation (e.g., after calling sendJobFailureEvent(...) capture err and if
non-nil attempt a fallback log and/or retry/report) so failures to publish
events are not silently dropped; apply these changes around calls in the
goroutine
(getClient/upsertWorkspace/syncVariables/createRun/verification.StartVerification/sendJobUpdateEvent).
- Around line 453-463: The code converts VariableTemplate.Category directly to
tfe.CategoryType in VariableTemplate.toCreateOptions which allows invalid
categories to reach the API; validate the Category string against the allowed
values ("terraform" and "env") and fail fast: either add validation in
parseJobAgentConfig (preferred) to return an error when an invalid category is
encountered, or change VariableTemplate.toCreateOptions to return
(tfe.VariableCreateOptions, error) and verify v.Category is one of the allowed
values before converting (use constants for "terraform"/"env" and return a clear
error mentioning the invalid value and the VariableTemplate.Key).
- Around line 199-201: The code currently checks the error by comparing
err.Error() to a string; replace that fragile comparison with a sentinel error
check using errors.Is(err, tfe.ErrResourceNotFound) (or err ==
tfe.ErrResourceNotFound) so wrapped errors are handled correctly: change the
condition to if err != nil && !errors.Is(err, tfe.ErrResourceNotFound) { return
nil, fmt.Errorf("failed to read workspace: %w", err) } and add the "errors"
import if missing; reference symbols: err, tfe.ErrResourceNotFound in tfe.go.
In `@apps/workspace-engine/test/e2e/engine_argocd_template_test.go`:
- Around line 255-263: The code block parsing jobAgentConfigJSON has multiple
statements concatenated on single lines (e.g., json.Unmarshal into
jobAgentConfig, require.NoError, t.Logf, template extraction and checks) causing
gofmt failures; split each statement onto its own line and ensure proper
statement separators so that json.Unmarshal(..., &jobAgentConfig) is followed by
require.NoError(t, err), then t.Logf(...) on its own line, then the template
lookup (template, ok := jobAgentConfig["template"]) and subsequent
require.True/require.NotEmpty/templateStr conversion and t.Logf or
assert.Contains calls are each on separate lines; adjust the sequence around
jobAgentConfig, template, templateStr, require/ assert and t.Logf to restore
correct formatting and compile cleanly.
🧹 Nitpick comments (6)
apps/workspace-engine/test/e2e/engine_argocd_template_test.go (2)
147-151: Consider simplifying map iteration for single job retrieval.The loop iterates over the map just to get the first (and only expected) job. A more direct approach would improve readability.
♻️ Suggested simplification
- var job *oapi.Job - for _, j := range pendingJobs { - job = j - break - } + // Get the first (and only) pending job + var job *oapi.Job + for _, j := range pendingJobs { + job = j + break + }Alternatively, if there's a helper to get jobs as a slice, that would be cleaner. The current approach works but the
breakafter assignment is a common Go pattern for "get any one element from map."
242-244: Unchecked type assertion could panic.The type assertion
template.(string)on line 244 could panic if the value is not a string. While this would just fail the test, using a checked assertion is safer.♻️ Safer type assertion pattern
template, ok := dv.JobAgentConfig["template"] require.True(t, ok, "template field should be present in JobAgentConfig") - assert.Contains(t, template.(string), "{{.Resource.Name}}", "template should contain resource name placeholder") + templateStr, ok := template.(string) + require.True(t, ok, "template field should be a string") + assert.Contains(t, templateStr, "{{.Resource.Name}}", "template should contain resource name placeholder")This same pattern applies to lines 261 and 294 as well.
apps/workspace-engine/pkg/workspace/jobagents/terraformcloud/tfe_verifications.go (2)
10-16: Consider adding documentation for the exported type.
TFERunVerificationis an exported type but lacks a documentation comment explaining its purpose and usage.📝 Suggested documentation
+// TFERunVerification manages the verification workflow for Terraform Cloud runs, +// polling run status and reporting success/failure based on the run outcome. type TFERunVerification struct { verifications *verification.Manager job *oapi.Job address string token string runID string }As per coding guidelines: "Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods."
48-61: Consider extracting magic numbers as constants for clarity.The polling configuration uses hardcoded values (100 iterations, 60-second intervals) that could benefit from being named constants for self-documentation and easier tuning.
♻️ Suggested refactor with named constants
+const ( + // TFE run verification polls up to 100 times at 60-second intervals (~100 min max) + defaultVerificationCount = 100 + defaultVerificationInterval = 60 + defaultSuccessThreshold = 1 + defaultFailureThreshold = 1 +) + func (v *TFERunVerification) buildMetricSpec(provider oapi.MetricProvider) oapi.VerificationMetricSpec { failureCondition := "result.status == 'canceled' || result.status == 'discarded' || result.status == 'errored'" - successThreshold := 1 - failureThreshold := 1 + successThreshold := defaultSuccessThreshold + failureThreshold := defaultFailureThreshold return oapi.VerificationMetricSpec{ - Count: 100, - IntervalSeconds: 60, + Count: defaultVerificationCount, + IntervalSeconds: defaultVerificationInterval,apps/workspace-engine/pkg/workspace/jobagents/terraformcloud/tfe.go (2)
366-374: Kafka producer created per event is inefficient.
getKafkaProducer()creates a new producer for each event, which is immediately closed after publishing. This adds connection overhead per message. Consider caching the producer at theTFEstruct level or using a shared producer pool.This pattern would be problematic at scale. For now, it works but consider initializing the producer once during
TFEconstruction and reusing it across events.
69-71: Consider adding documentation for exported constructor and types.
NewTFEand the exported template types (WorkspaceTemplate,VariableTemplate,VCSRepoTemplate) lack documentation comments. While not blocking, these would help consumers understand the API.As per coding guidelines: "document exported functions/types/methods."
| go func() { | ||
| ctx := context.WithoutCancel(ctx) | ||
| client, err := t.getClient(address, token) | ||
| if err != nil { | ||
| t.sendJobFailureEvent(dispatchCtx, fmt.Sprintf("failed to create Terraform Cloud client: %s", err.Error())) | ||
| return | ||
| } | ||
|
|
||
| targetWorkspace, err := t.upsertWorkspace(ctx, client, organization, workspace) | ||
| if err != nil { | ||
| t.sendJobFailureEvent(dispatchCtx, fmt.Sprintf("failed to upsert workspace: %s", err.Error())) | ||
| return | ||
| } | ||
|
|
||
| if len(workspace.Variables) > 0 { | ||
| if err := t.syncVariables(ctx, client, targetWorkspace.ID, workspace.Variables); err != nil { | ||
| t.sendJobFailureEvent(dispatchCtx, fmt.Sprintf("failed to sync variables: %s", err.Error())) | ||
| return | ||
| } | ||
| } | ||
|
|
||
| run, err := t.createRun(ctx, client, targetWorkspace.ID, dispatchCtx.Job.Id) | ||
| if err != nil { | ||
| t.sendJobFailureEvent(dispatchCtx, fmt.Sprintf("failed to create run: %s", err.Error())) | ||
| return | ||
| } | ||
|
|
||
| verification := newTFERunVerification(t.verifications, dispatchCtx.Job, address, token, run.ID) | ||
| if err := verification.StartVerification(ctx); err != nil { | ||
| t.sendJobFailureEvent(dispatchCtx, fmt.Sprintf("failed to start verification: %s", err.Error())) | ||
| return | ||
| } | ||
|
|
||
| t.sendJobUpdateEvent(address, organization, targetWorkspace.Name, run, dispatchCtx) | ||
| }() |
There was a problem hiding this comment.
Goroutine ignores errors from sendJobFailureEvent and lacks panic recovery.
The goroutine calls sendJobFailureEvent but ignores its return value. If event publishing fails, the job failure is silently lost with no logging or alerting. Additionally, any panic in this goroutine would crash silently.
🛡️ Proposed fix to handle errors and add recovery
go func() {
+ defer func() {
+ if r := recover(); r != nil {
+ // Log panic - consider using structured logging
+ fmt.Printf("panic in TFE dispatch goroutine: %v\n", r)
+ }
+ }()
+
ctx := context.WithoutCancel(ctx)
client, err := t.getClient(address, token)
if err != nil {
- t.sendJobFailureEvent(dispatchCtx, fmt.Sprintf("failed to create Terraform Cloud client: %s", err.Error()))
+ if sendErr := t.sendJobFailureEvent(dispatchCtx, fmt.Sprintf("failed to create Terraform Cloud client: %s", err.Error())); sendErr != nil {
+ // Log the failure to send the event
+ fmt.Printf("failed to send job failure event: %v (original error: %v)\n", sendErr, err)
+ }
return
}Apply similar error handling to all sendJobFailureEvent and sendJobUpdateEvent calls within the goroutine.
🤖 Prompt for AI Agents
In `@apps/workspace-engine/pkg/workspace/jobagents/terraformcloud/tfe.go` around
lines 95 - 129, The goroutine launching the TFE workflow ignores errors from
sendJobFailureEvent/sendJobUpdateEvent and has no panic recovery; modify the
anonymous goroutine to (1) add a deferred recover handler (defer func() { if r
:= recover() { log the panic and call sendJobFailureEvent with a descriptive
message/error } }) so panics are caught and reported, and (2) check and handle
the error return values of every sendJobFailureEvent and sendJobUpdateEvent
invocation (e.g., after calling sendJobFailureEvent(...) capture err and if
non-nil attempt a fallback log and/or retry/report) so failures to publish
events are not silently dropped; apply these changes around calls in the
goroutine
(getClient/upsertWorkspace/syncVariables/createRun/verification.StartVerification/sendJobUpdateEvent).
| if err != nil && err.Error() != "resource not found" { | ||
| return nil, fmt.Errorf("failed to read workspace: %w", err) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
go-tfe library ErrResourceNotFound error type documentation
💡 Result:
In HashiCorp’s github.com/hashicorp/go-tfe package, ErrResourceNotFound is a package-level sentinel error value documented as:
- “returned when receiving a 404.” (HTTP Not Found) [1]
So the “documentation” for the error is in the package docs under Variables on pkg.go.dev [1]. In practice, downstream code often checks it via direct comparison (e.g., err != tfe.ErrResourceNotFound) [2].
Sources:
[1] pkg.go.dev – github.com/hashicorp/go-tfe (Variables: ErrResourceNotFound) (pkg.go.dev)
[2] OpenTofu source example comparing against tfe.ErrResourceNotFound (fossies.org)
Citations:
- 1: https://pkg.go.dev/github.com/hashicorp/go-tfe
- 2: https://fossies.org/linux/opentofu/internal/cloud/backend.go?utm_source=openai
🏁 Script executed:
# Check the actual code at the specified lines
cat -n apps/workspace-engine/pkg/workspace/jobagents/terraformcloud/tfe.go | sed -n '195,210p'Repository: ctrlplanedev/ctrlplane
Length of output: 752
🏁 Script executed:
# Search for how errors are imported and used in the file
head -50 apps/workspace-engine/pkg/workspace/jobagents/terraformcloud/tfe.goRepository: ctrlplanedev/ctrlplane
Length of output: 2258
🏁 Script executed:
# Search for other error handling patterns in the file
rg "errors\." apps/workspace-engine/pkg/workspace/jobagents/terraformcloud/tfe.goRepository: ctrlplanedev/ctrlplane
Length of output: 48
🏁 Script executed:
# Check if go.mod references go-tfe and what version
rg "github.com/hashicorp/go-tfe" go.modRepository: ctrlplanedev/ctrlplane
Length of output: 112
Replace fragile error string comparison with proper sentinel error check.
Comparing error messages with string equality (err.Error() != "resource not found") is fragile and can break if the library changes its error message. The go-tfe library provides tfe.ErrResourceNotFound sentinel error for this purpose—use direct comparison instead.
🐛 Proposed fix
existing, err := client.Workspaces.Read(ctx, organization, workspace.Name)
- if err != nil && err.Error() != "resource not found" {
+ if err != nil && err != tfe.ErrResourceNotFound {
return nil, fmt.Errorf("failed to read workspace: %w", err)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if err != nil && err.Error() != "resource not found" { | |
| return nil, fmt.Errorf("failed to read workspace: %w", err) | |
| } | |
| existing, err := client.Workspaces.Read(ctx, organization, workspace.Name) | |
| if err != nil && err != tfe.ErrResourceNotFound { | |
| return nil, fmt.Errorf("failed to read workspace: %w", err) | |
| } |
🤖 Prompt for AI Agents
In `@apps/workspace-engine/pkg/workspace/jobagents/terraformcloud/tfe.go` around
lines 199 - 201, The code currently checks the error by comparing err.Error() to
a string; replace that fragile comparison with a sentinel error check using
errors.Is(err, tfe.ErrResourceNotFound) (or err == tfe.ErrResourceNotFound) so
wrapped errors are handled correctly: change the condition to if err != nil &&
!errors.Is(err, tfe.ErrResourceNotFound) { return nil, fmt.Errorf("failed to
read workspace: %w", err) } and add the "errors" import if missing; reference
symbols: err, tfe.ErrResourceNotFound in tfe.go.
| func (v *VariableTemplate) toCreateOptions() tfe.VariableCreateOptions { | ||
| category := tfe.CategoryType(v.Category) | ||
| return tfe.VariableCreateOptions{ | ||
| Key: &v.Key, | ||
| Value: &v.Value, | ||
| Description: &v.Description, | ||
| Category: &category, | ||
| HCL: &v.HCL, | ||
| Sensitive: &v.Sensitive, | ||
| } | ||
| } |
There was a problem hiding this comment.
No validation for variable category value.
The Category field is converted directly to tfe.CategoryType without validation. Invalid categories could cause API errors at runtime. Valid categories are typically "terraform" and "env".
🛡️ Suggested validation
func (v *VariableTemplate) toCreateOptions() tfe.VariableCreateOptions {
+ // Valid categories: "terraform" or "env"
category := tfe.CategoryType(v.Category)
+ if category != tfe.CategoryTerraform && category != tfe.CategoryEnv {
+ // Consider logging a warning or returning an error
+ }
return tfe.VariableCreateOptions{Alternatively, validate the category in parseJobAgentConfig or during template parsing to fail fast.
🤖 Prompt for AI Agents
In `@apps/workspace-engine/pkg/workspace/jobagents/terraformcloud/tfe.go` around
lines 453 - 463, The code converts VariableTemplate.Category directly to
tfe.CategoryType in VariableTemplate.toCreateOptions which allows invalid
categories to reach the API; validate the Category string against the allowed
values ("terraform" and "env") and fail fast: either add validation in
parseJobAgentConfig (preferred) to return an error when an invalid category is
encountered, or change VariableTemplate.toCreateOptions to return
(tfe.VariableCreateOptions, error) and verify v.Category is one of the allowed
values before converting (use constants for "terraform"/"env" and return a clear
error mentioning the invalid value and the VariableTemplate.Key).
| }` // Parse it like the CLI does | ||
| var jobAgentConfig map[string]interface{} | ||
| err := json.Unmarshal([]byte(jobAgentConfigJSON), &jobAgentConfig) | ||
| require.NoError(t, err) t.Logf("Parsed jobAgentConfig from CLI: %+v", jobAgentConfig) // Check template is present | ||
| template, ok := jobAgentConfig["template"] | ||
| require.True(t, ok, "template field should be present") | ||
| require.NotEmpty(t, template, "template should not be empty") templateStr := template.(string) | ||
| t.Logf("Template length: %d", len(templateStr)) | ||
| assert.Contains(t, templateStr, "{{.Resource.Name}}-console", "template should contain the expected placeholder") |
There was a problem hiding this comment.
Formatting issues: missing newlines between statements.
Lines 255-263 have multiple statements concatenated on the same line, which violates gofmt compliance. This appears to be a copy-paste or merge artifact.
🔧 Proposed fix to add proper newlines
-}` // Parse it like the CLI does
- var jobAgentConfig map[string]interface{}
- err := json.Unmarshal([]byte(jobAgentConfigJSON), &jobAgentConfig)
- require.NoError(t, err) t.Logf("Parsed jobAgentConfig from CLI: %+v", jobAgentConfig) // Check template is present
- template, ok := jobAgentConfig["template"]
- require.True(t, ok, "template field should be present")
- require.NotEmpty(t, template, "template should not be empty") templateStr := template.(string)
+}
+
+ // Parse it like the CLI does
+ var jobAgentConfig map[string]interface{}
+ err := json.Unmarshal([]byte(jobAgentConfigJSON), &jobAgentConfig)
+ require.NoError(t, err)
+
+ t.Logf("Parsed jobAgentConfig from CLI: %+v", jobAgentConfig)
+
+ // Check template is present
+ template, ok := jobAgentConfig["template"]
+ require.True(t, ok, "template field should be present")
+ require.NotEmpty(t, template, "template should not be empty")
+
+ templateStr := template.(string)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| }` // Parse it like the CLI does | |
| var jobAgentConfig map[string]interface{} | |
| err := json.Unmarshal([]byte(jobAgentConfigJSON), &jobAgentConfig) | |
| require.NoError(t, err) t.Logf("Parsed jobAgentConfig from CLI: %+v", jobAgentConfig) // Check template is present | |
| template, ok := jobAgentConfig["template"] | |
| require.True(t, ok, "template field should be present") | |
| require.NotEmpty(t, template, "template should not be empty") templateStr := template.(string) | |
| t.Logf("Template length: %d", len(templateStr)) | |
| assert.Contains(t, templateStr, "{{.Resource.Name}}-console", "template should contain the expected placeholder") | |
| } | |
| // Parse it like the CLI does | |
| var jobAgentConfig map[string]interface{} | |
| err := json.Unmarshal([]byte(jobAgentConfigJSON), &jobAgentConfig) | |
| require.NoError(t, err) | |
| t.Logf("Parsed jobAgentConfig from CLI: %+v", jobAgentConfig) | |
| // Check template is present | |
| template, ok := jobAgentConfig["template"] | |
| require.True(t, ok, "template field should be present") | |
| require.NotEmpty(t, template, "template should not be empty") | |
| templateStr := template.(string) | |
| t.Logf("Template length: %d", len(templateStr)) | |
| assert.Contains(t, templateStr, "{{.Resource.Name}}-console", "template should contain the expected placeholder") |
🤖 Prompt for AI Agents
In `@apps/workspace-engine/test/e2e/engine_argocd_template_test.go` around lines
255 - 263, The code block parsing jobAgentConfigJSON has multiple statements
concatenated on single lines (e.g., json.Unmarshal into jobAgentConfig,
require.NoError, t.Logf, template extraction and checks) causing gofmt failures;
split each statement onto its own line and ensure proper statement separators so
that json.Unmarshal(..., &jobAgentConfig) is followed by require.NoError(t,
err), then t.Logf(...) on its own line, then the template lookup (template, ok
:= jobAgentConfig["template"]) and subsequent
require.True/require.NotEmpty/templateStr conversion and t.Logf or
assert.Contains calls are each on separate lines; adjust the sequence around
jobAgentConfig, template, templateStr, require/ assert and t.Logf to restore
correct formatting and compile cleanly.
|
@adityachoudhari26 can you review this. Also I dont think we need a post verification |
|
|
Summary by CodeRabbit
New Features
Tests