CXP-116 Containerization and refactor of caches by JavierCarnelli-ConductorOne · Pull Request #101 · ConductorOne/baton-aws

JavierCarnelli-ConductorOne · 2026-02-09T15:48:30Z

Updates the connector for it to be able to containerize.

This applies changes to the structure, the workflows and refactors the use of caches on the connector. Also implements ResourceSyncerV2.
Will require extensive testing and validations.

Summary by CodeRabbit

Refactor
- Migrated connectors to a unified SyncOpAttrs/SyncOpResults pagination model and updated connector initialization/signatures.
- Removed legacy external-ID validation and related configuration constants.
Performance
- Added session-backed caching and stateful, throttle-aware paging to reduce repeated API calls and improve sync resilience.
Tests
- Updated tests to exercise the new sync/pagination and session-caching behavior.

linear · 2026-02-09T15:48:34Z

CXP-116 [AWS] Containerize connector

coderabbitai · 2026-02-09T15:48:55Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Replaces legacy pagination/token+annotations APIs with SyncOpAttrs/SyncOpResults across connector resource types, moves external-id/config validation into the connector, and adds session-backed caching and paginated state encoding for AWS SSO permission-set lists, details, grants, and entitlements.

Changes

Cohort / File(s)	Summary
Config surface & tests `pkg/config/config.go`, `pkg/config/config_test.go`	Removed local external ID length constants and config validation helpers; tests switched to package `config` and now reference public config/connector validation (`connector.ValidateExternalID`).
Connector bootstrap & validation `pkg/connector/connector.go`	New constructor signature returning ConnectorBuilderV2 + opts; added exported `ValidateExternalID` and internal `validateConfig`; integrated pre-build validation and migrated ResourceSyncers to V2.
Session-backed SSO permission-set caching & paging `pkg/connector/account.go`	Introduced session-store cache keys, page-state encode/decode, throttle-aware error wrapping, per-account/per-permission-set caches, and helpers to fetch/cache permission sets and paged grants/entitlements; replaced prior in-memory mutex/map caches.
Pagination API migration (resource types) `pkg/connector/account.go`, `pkg/connector/account_iam.go`, `pkg/connector/iam_group.go`, `pkg/connector/iam_user.go`, `pkg/connector/role.go`, `pkg/connector/secrets.go`, `pkg/connector/sso_group.go`, `pkg/connector/sso_user.go`	Migrated List/Entitlements/Grants signatures from `pagination.Token` + `(string, annotations.Annotations)` to `resourceSdk.SyncOpAttrs` input and `*resourceSdk.SyncOpResults` output; moved token handling to `opts.PageToken.Token` and updated all error/return shapes.
Grants/Entitlements multi-phase paging `pkg/connector/account.go`, `pkg/connector/account_iam.go`	Reworked grants/entitlements to be session-aware multi-phase flows: collect permission-set IDs, fetch details (session/API), fetch per-permission-set paged grants, and assemble results with SyncOpResults-based continuation.
Tests & interface updates `pkg/connector/sso_group_test.go`, `pkg/connector/...`, `pkg/config/config_test.go`	Tests updated to use `resourceSdk.SyncOpAttrs`/`SyncOpResults`; interface implementations updated (e.g., IAM user account manager → V2).
Imports & cleanup `pkg/connector/.go`, `pkg/config/`	Removed unused imports (annotations, sync, context in config.go), reordered/added imports for sessions and resource types, and updated log/error prefixes.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Connector
    participant Session as sessions.SessionStore
    participant SSO as AWS SSO Admin API
    rect rgba(200,200,255,0.5)
    Client->>Session: GetJSON("permission-sets")
    alt cached list present
        Session-->>Client: permissionSetIDs
    else cache miss
        Session-->>Client: nil
        Client->>SSO: ListPermissionSets()
        SSO-->>Client: permissionSetIDs
        Client->>Session: SetJSON("permission-sets", permissionSetIDs)
    end
    Client->>Session: GetManyJSON(["permission-set:<id>...", ...])
    alt some details missing
        Client->>SSO: DescribePermissionSet(id)
        SSO-->>Client: permissionSetDetail
        Client->>Session: SetJSON("permission-set:<id>", permissionSetDetail)
    end
    Client-->>Client: build entitlements/grants (encode page state if truncated)
    Client-->>Client: return []*Entitlement / []*Grant + SyncOpResults
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hopped through tokens, caches, and keys,

Swapped old returns for SyncOp ease,
I moved validation close to the start,
Cached SSO sets to spare the API's heart,
A rabbit's patch — tidy, nimble, and smart.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'CXP-116 Containerization and refactor of caches' accurately reflects the main objectives: containerization updates, cache refactoring, and session storage interactions mentioned in the PR description.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch jallers/cxp-116-containerization

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

pkg/config/config_test.go (1)
21-33: ⚠️ Potential issue | 🟡 Minor

Align test validation with single‑hop behavior.

The test currently requires an external ID whenever use-assume is set, but the production validator only enforces it when a global role ARN is configured. This makes the tests reject valid single‑hop configs.
Suggested fix
 func validateConfig(ctx context.Context, v *viper.Viper) error {
 	if v.GetBool(config.UseAssumeField.FieldName) {
-		err := connector.ValidateExternalID(v.GetString(config.ExternalIdField.FieldName))
-		if err != nil {
-			return err
-		}
-		err = connector.IsValidRoleARN(v.GetString(config.RoleArnField.FieldName))
-		if err != nil {
-			return err
-		}
+		if err := connector.IsValidRoleARN(v.GetString(config.RoleArnField.FieldName)); err != nil {
+			return err
+		}
+		if v.GetString(config.GlobalRoleArnField.FieldName) != "" {
+			if err := connector.ValidateExternalID(v.GetString(config.ExternalIdField.FieldName)); err != nil {
+				return err
+			}
+		}
 	}
 	return nil
 }
pkg/connector/iam_user.go (1)
233-255: ⚠️ Potential issue | 🟡 Minor

Bug: Inverted error-check logic in CreateAccount.

The condition if result, err := o.iamClient.GetUser(...); err == nil means the user was found successfully. Inside this block, checking errors.As(err, &noSuchEntity) will always be false since err is nil. The current logic returns the existing user as a success regardless, which may be intentional for idempotency, but the dead NoSuchEntityException check is misleading.

If the intent is to return an error when GetUser fails (for reasons other than "not found"), the logic should be inverted:
Suggested fix
-	if result, err := o.iamClient.GetUser(ctx, &iam.GetUserInput{UserName: awsSdk.String(username)}); err == nil {
-		var noSuchEntity *iamTypes.NoSuchEntityException
-		if errors.As(err, &noSuchEntity) {
-			return nil, nil, nil, fmt.Errorf("aws-connector: iam.GetUser failed: %w", err)
-		}
+	result, err := o.iamClient.GetUser(ctx, &iam.GetUserInput{UserName: awsSdk.String(username)})
+	if err != nil {
+		var noSuchEntity *iamTypes.NoSuchEntityException
+		if !errors.As(err, &noSuchEntity) {
+			return nil, nil, nil, fmt.Errorf("aws-connector: iam.GetUser failed: %w", err)
+		}
+		// User doesn't exist, proceed to create
+	} else {
+		// User already exists, return it
 		annos := &v2.V1Identifier{
 			Id: awsSdk.ToString(result.User.Arn),
 		}

🤖 Fix all issues with AI agents

In `@pkg/connector/account.go`:
- Around line 33-37: The comment above the cache key constants (the
"SessionStore cache keys" comment) is missing a terminating period; update that
comment to end with a period so the linter is satisfied. Locate the comment
above permissionSetsCacheKey and permissionSetDetailKeyPrefix in the file
(pkg/connector/account.go) and add the period to the end of the sentence.

In `@pkg/connector/connector.go`:
- Around line 36-38: The externalIDLengthMaximum constant
(externalIDLengthMaximum) is off-by-one compared to the ValidateExternalID logic
and error text; update externalIDLengthMaximum to 64 (to match
ValidateExternalID and the error message) or alternatively make
ValidateExternalID and its error message consistent with the constant—ensure the
symbols externalIDLengthMaximum, externalIDLengthMinimum and the
ValidateExternalID function all use the same max value so invalid IDs aren’t
allowed through.

🧹 Nitpick comments (1)

pkg/connector/account.go (1)
644-711: Consider logging cache errors for observability.

The caching implementation silently ignores errors from session.GetJSON, session.SetJSON, and session.SetManyJSON. While this is a common pattern for optional caching, logging these errors at debug level would help with troubleshooting cache issues in production.
Example: Add debug logging for cache errors
 func (o *accountResourceType) getPermissionSets(ctx context.Context, ss sessions.SessionStore) ([]*awsSsoAdminTypes.PermissionSet, error) {
+	l := ctxzap.Extract(ctx)
 	if ss != nil {
 		cached, found, err := session.GetJSON[[]*awsSsoAdminTypes.PermissionSet](ctx, ss, permissionSetsCacheKey)
-		if err == nil && found {
+		if err != nil {
+			l.Debug("aws-connector: failed to get permission sets from cache", zap.Error(err))
+		} else if found {
 			return cached, nil
 		}
 	}

pkg/connector/account.go

coderabbitai · 2026-02-09T15:54:22Z

pkg/connector/connector.go

+const (
+	externalIDLengthMaximum = 65
+	externalIDLengthMinimum = 32


⚠️ Potential issue | 🟡 Minor

External ID max length mismatch (65 vs 64).

ValidateExternalID says max is 64, but the constant allows 65. This lets invalid values through and conflicts with the error message.

Suggested fix

- externalIDLengthMaximum = 65 + externalIDLengthMaximum = 64 externalIDLengthMinimum = 32

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const (

externalIDLengthMaximum = 65

externalIDLengthMinimum = 32

const (

externalIDLengthMaximum = 64

externalIDLengthMinimum = 32

🤖 Prompt for AI Agents

In `@pkg/connector/connector.go` around lines 36 - 38, The externalIDLengthMaximum constant (externalIDLengthMaximum) is off-by-one compared to the ValidateExternalID logic and error text; update externalIDLengthMaximum to 64 (to match ValidateExternalID and the error message) or alternatively make ValidateExternalID and its error message consistent with the constant—ensure the symbols externalIDLengthMaximum, externalIDLengthMinimum and the ValidateExternalID function all use the same max value so invalid IDs aren’t allowed through.

Where did you check this value? Is there an official docs site for me to look at?

🧩 Analysis chain

🌐 Web query:

AWS STS AssumeRole external ID length requirements documentation

💡 Result:

For AWS STS AssumeRole, the ExternalId parameter has these documented requirements:

Length: 2 to 1,224 characters [1], [2]

Allowed characters / pattern: alphanumeric with no whitespace; may also include: + = , . @ : / - and underscore _ (regex: [\w+=,.@:\/-]*) [1], [2]

Docs:

STS API Reference – AssumeRole → ExternalId [1]

IAM User Guide – “IAM and AWS STS quotas” (external ID character limits) [2]

@JavierCarnelli-ConductorOne, according to the official AWS STS API documentation, the ExternalId parameter for AssumeRole actually supports 2 to 1,224 characters, not 32-64.

The official AWS documentation states external IDs can be 2 to 1,224 characters, with allowed characters being alphanumeric plus + = , . @ : / - and underscore.

So the 32-64 byte constraint in this code appears to be a custom/internal business rule, not an AWS service requirement.

However, the off-by-one issue still exists within your custom validation:

Constant says: externalIDLengthMaximum = 65

Error message says: "aws_external_id must be between 32 and 64 bytes"

These need to match

Questions:

Is there a specific business/security reason for enforcing the stricter 32-64 character limit?

Should the maximum be 64 or 65 for your custom constraint?

I agree on this observation, however I'm unsure if we should modify this.
I feel like it's a big risk to modify this boundaries.
I'll just leave this comment here. Would require further analysis on the impact of updating this

@JavierCarnelli-ConductorOne, understood—I appreciate the cautious approach given the risk involved in changing these validation boundaries. Leaving the inconsistency documented here for future analysis makes sense.

✏️ Learnings added

Learnt from: JavierCarnelli-ConductorOne Repo: ConductorOne/baton-aws PR: 101 File: pkg/connector/connector.go:36-38 Timestamp: 2026-02-12T18:28:05.444Z Learning: In the AWS connector (pkg/connector/connector.go), the external ID length validation boundaries (externalIDLengthMaximum = 65, externalIDLengthMinimum = 32) are considered risky to modify and require careful impact analysis before any changes, even to fix inconsistencies with error messages.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

cmd/baton-aws/main.go

pkg/connector/account.go

mateovespConductor

I agree with the comments left by @agustin-conductor . I think these changes are necessary

pkg/connector/account.go

mateovespConductor · 2026-02-12T19:48:22Z

pkg/connector/account.go

+	}
+
+	if ss != nil && len(toCache) > 0 {
+		_ = session.SetManyJSON(ctx, ss, toCache)


I think we should not ignore the error here because it could silently fail the cache

I've got Claude to make several changes to the code so you probably won't see this updated in the exact same place.
Please take a look to the refactored functions

Thanks in advance Beto! I really appreciate the feedback

pkg/connector/account.go

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

pkg/connector/account.go (2)
293-293: Dead code: assignmentsInput.NextToken assignment is unused.

The AWS SDK paginator manages pagination internally. Setting assignmentsInput.NextToken manually has no effect since the paginator doesn't use this input after initialization. This line can be removed.
🧹 Proposed fix
 			}
 
-			assignmentsInput.NextToken = assignmentsResp.NextToken
 			if !assignmentsPaginator.HasMorePages() {
 				break
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/connector/account.go` at line 293, Remove the dead assignment to
assignmentsInput.NextToken (it is set from assignmentsResp.NextToken) since the
AWS SDK paginator manages pagination and ignores changes to the input after
initialization; locate the pagination loop where assignmentsInput and
assignmentsResp are used (the block that iterates with the paginator) and delete
the line setting assignmentsInput.NextToken so the code relies solely on the
paginator for advancing pages.
720-723: Inconsistent error handling for session operations.

In getPermissionSets, errors from session.GetManyJSON are silently ignored (line 722), while in Grants (lines 183-184), similar errors cause the function to return an error. The same inconsistency exists for SetManyJSON.

If session errors should be fatal in Grants, they should likely be consistent here. Alternatively, if ignoring cache errors is intentional (fall back to API), consider applying the same lenient approach in Grants for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/connector/account.go` around lines 720 - 723, getPermissionSets currently
swallows errors from session.GetManyJSON while Grants returns on similar session
errors, causing inconsistent behavior; update getPermissionSets (and mirror in
any SetManyJSON usage) to handle session errors consistently with Grants by
checking the error returned from session.GetManyJSON/SetManyJSON (when ss !=
nil) and either return the error up the stack (make it fatal like in Grants) or
instead log the error and continue (make Grants lenient) — pick one approach and
apply it consistently, referencing the session variable ss, the cachedDetails
variable, and the GetManyJSON/SetManyJSON calls so the change is applied in the
correct functions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/connector/account.go`:
- Line 304: Fix the typo in the error message string by replacing "writting"
with "writing" in the fmt.Errorf call that returns "aws-connector: Session
Storage writting operation failed: %w" (found in pkg/connector/account.go);
update that message so it reads "aws-connector: Session Storage writing
operation failed: %w" to correct the spelling while preserving the error wrap.

---

Nitpick comments:
In `@pkg/connector/account.go`:
- Line 293: Remove the dead assignment to assignmentsInput.NextToken (it is set
from assignmentsResp.NextToken) since the AWS SDK paginator manages pagination
and ignores changes to the input after initialization; locate the pagination
loop where assignmentsInput and assignmentsResp are used (the block that
iterates with the paginator) and delete the line setting
assignmentsInput.NextToken so the code relies solely on the paginator for
advancing pages.
- Around line 720-723: getPermissionSets currently swallows errors from
session.GetManyJSON while Grants returns on similar session errors, causing
inconsistent behavior; update getPermissionSets (and mirror in any SetManyJSON
usage) to handle session errors consistently with Grants by checking the error
returned from session.GetManyJSON/SetManyJSON (when ss != nil) and either return
the error up the stack (make it fatal like in Grants) or instead log the error
and continue (make Grants lenient) — pick one approach and apply it
consistently, referencing the session variable ss, the cachedDetails variable,
and the GetManyJSON/SetManyJSON calls so the change is applied in the correct
functions.

pkg/connector/account.go

FeliLucero1 · 2026-02-17T12:36:30Z

pkg/connector/account.go

+	PermissionSetIndex int `json:"psi"`
+}
+
+func encodeEntitlementsPageToken(state entitlementsPageState) (string, error) {


encodeGrantsPageToken/decodeGrantsPageToken and encodeEntitlementsPageToken/decodeEntitlementsPageToken follow the same pattern: JSON marshal → base64, and base64 → JSON unmarshal. Only the state type changes.
You can create a generic helper, such as:

func encodePageToken[T any](state T) (string, error) { data, err := json.Marshal(state) if err != nil { return "", err } return base64.StdEncoding.EncodeToString(data), nil }

Also
func decodePageToken[T any](token string, state *T) error { ... }

FeliLucero1 · 2026-02-17T12:49:53Z

pkg/connector/account.go

+}
+
+// getOrFetchPermissionSetIDs retrieves permission set IDs for an account, using cache if available.
+func (o *accountResourceType) getOrFetchPermissionSetIDs(ctx context.Context, ss sessions.SessionStore, accountID string) ([]string, error) {


I would replace getOrFetchAllPermissionSetIDs and getPermissionSetWithCache with this:

func getOrSetCache[T any](ctx context.Context, ss sessions.SessionStore, cacheKey string, fetch func(context.Context) (T, error)) (T, error) { var zero T if ss != nil { cached, found, err := session.GetJSON[T](ctx, ss, cacheKey) if err != nil { return zero, fmt.Errorf("%s: %w", errSessionStorageRead, err) } if found { return cached, nil } } val, err := fetch(ctx) if err != nil { return zero, err } if ss != nil { if err := session.SetJSON(ctx, ss, cacheKey, val); err != nil { return zero, fmt.Errorf("%s: %w", errSessionStorageWrite, err) } } return val, nil }

agustin-conductor · 2026-02-18T18:26:07Z

cmd/baton-aws/main.go

+		version,
 		cfg.Config,
+		connector.New,
+		connectorrunner.WithSessionStoreEnabled(),


i think we might want this https://github.com/ConductorOne/baton-lastpass/pull/4#discussion_r2822595562

agustin-conductor · 2026-02-18T20:37:36Z

pkg/connector/connector.go

+}
+
+func validateConfig(awsc *cfg.Aws) error {
+	if awsc.GetBool(cfg.UseAssumeField.FieldName) {


i think we can just do awsc.UseAssume here, same for RoleArn

agustin-conductor · 2026-02-18T20:39:04Z

pkg/connector/connector.go

+
+	config := Config{
+		GlobalBindingExternalID: awsc.GetString(cfg.GlobalBindingExternalIdField.FieldName),
+		GlobalRegion:            awsc.GetString(cfg.GlobalRegionField.FieldName),


same comment, can we access the value directly? awsc.GlobalRegion ?

agustin-conductor

I think we might be able to process accounts entitlements more than 1 permission set at the time, other than that, LGTM

github-actions · 2026-03-03T22:50:02Z

pkg/config/config_test.go

 func validateConfig(ctx context.Context, v *viper.Viper) error {
-	if v.GetBool(UseAssumeField.FieldName) {
-		err := ValidateExternalId(v.GetString(ExternalIdField.FieldName))
+	if v.GetBool(config.UseAssumeField.FieldName) {
+		err := connector.ValidateExternalID(v.GetString(config.ExternalIdField.FieldName))
 		if err != nil {
 			return err
 		}
-		err = connector.IsValidRoleARN(v.GetString(RoleArnField.FieldName))
+		err = connector.IsValidRoleARN(v.GetString(config.RoleArnField.FieldName))
 		if err != nil {
 			return err
 		}


Test/production validation divergence: This validateConfig always requires external-id when use-assume is set, but the production validateConfig in connector.go:206-221 only requires it for two-hop mode (when global-role-arn is set). Valid single-hop configs (IRSA → target role) would fail this test but pass in production. The test should mirror the production logic by only requiring external-id when global-role-arn is also set.

github-actions · 2026-03-03T22:50:10Z

pkg/connector/account.go

+func encodeGrantsPageToken(state grantsPageState) (string, error) {
+	data, err := json.Marshal(state)
+	if err != nil {
+		return "", err
+	}
+	return base64.StdEncoding.EncodeToString(data), nil
+}
+
+func decodeGrantsPageToken(token string) (grantsPageState, error) {
+	if token == "" {
+		return grantsPageState{}, nil
+	}
+	data, err := base64.StdEncoding.DecodeString(token)
+	if err != nil {
+		return grantsPageState{}, err
+	}
+	var state grantsPageState
+	if err := json.Unmarshal(data, &state); err != nil {
+		return grantsPageState{}, err
+	}
+	return state, nil
+}
+
+func permissionSetIDsCacheKey(accountID string) string {
+	return permissionSetIDsCacheKeyPrefix + accountID
+}
+
+// entitlementsPageState tracks pagination state for the Entitlements function.
+// This allows progress to be saved between calls, reducing the "blast radius"
+// if a rate limit error occurs mid-sync.
+type entitlementsPageState struct {
+	// PermissionSetIndex is the index into the cached permission set IDs list
+	PermissionSetIndex int `json:"psi"`
+}
+
+func encodeEntitlementsPageToken(state entitlementsPageState) (string, error) {
+	data, err := json.Marshal(state)
+	if err != nil {
+		return "", err
+	}
+	return base64.StdEncoding.EncodeToString(data), nil
+}
+
+func decodeEntitlementsPageToken(token string) (entitlementsPageState, error) {
+	if token == "" {
+		return entitlementsPageState{}, nil
+	}
+	data, err := base64.StdEncoding.DecodeString(token)
+	if err != nil {
+		return entitlementsPageState{}, err
+	}
+	var state entitlementsPageState
+	if err := json.Unmarshal(data, &state); err != nil {
+		return entitlementsPageState{}, err
+	}
+	return state, nil
+}


Duplicate encode/decode page token functions: encodeGrantsPageToken/decodeGrantsPageToken and encodeEntitlementsPageToken/decodeEntitlementsPageToken share identical JSON↔base64 logic. As @FeliLucero1 noted, generic helpers would eliminate this:

func encodePageToken[T any](state T) (string, error) { data, err := json.Marshal(state) if err != nil { return "", err } return base64.StdEncoding.EncodeToString(data), nil } func decodePageToken[T any](token string) (T, error) { var zero T if token == "" { return zero, nil } data, err := base64.StdEncoding.DecodeString(token) if err != nil { return zero, err } var state T if err := json.Unmarshal(data, &state); err != nil { return zero, err } return state, nil }

github-actions · 2026-03-03T22:50:17Z

pkg/connector/account.go

+func wrapAWSError(err error) error {
+	if err == nil {
+		return nil
+	}
+
+	var apiErr smithy.APIError
+	if errors.As(err, &apiErr) {
+		if _, isThrottle := awsThrottleErrorCodes[apiErr.ErrorCode()]; isThrottle {
+			return status.Error(codes.Unavailable, err.Error())
+		}
+	}
+
+	return err
+}


wrapAWSError breaks the error chain: status.Error(codes.Unavailable, err.Error()) converts the error to a string, so errors.As/errors.Is won't find the original AWS error after wrapping. The gRPC status package doesn't support %w-style wrapping. If the baton-sdk sync engine only inspects the gRPC status code, this is acceptable — but a comment documenting that assumption would prevent future confusion.

github-actions · 2026-03-03T22:50:23Z

pkg/connector/account.go

 	resp, err := o.orgClient.ListAccounts(ctx, listAccountsInput)
 	if err != nil {
-		return nil, "", nil, fmt.Errorf("aws-connector: listAccounts failed: %w", err)
+		return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err))


Inconsistent error prefix: This uses aws-connector: while the rest of the new code uses baton-aws:. Lines 195 and 198 also use the old prefix in debug messages.

Suggested change

return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err))

return nil, nil, wrapAWSError(fmt.Errorf("baton-aws: listAccounts failed: %w", err))

github-actions · 2026-03-03T22:50:32Z

pkg/connector/account.go

+// getOrFetchPermissionSetIDs retrieves permission set IDs for an account, using cache if available.
+func (o *accountResourceType) getOrFetchPermissionSetIDs(ctx context.Context, ss sessions.SessionStore, accountID string) ([]string, error) {
+	cacheKey := permissionSetIDsCacheKey(accountID)
+
+	// Try cache first
+	if ss != nil {
+		cached, found, err := session.GetJSON[[]string](ctx, ss, cacheKey)
+		if err != nil {
+			return nil, fmt.Errorf("baton-aws: Session Storage reading operation failed: %w", err)
+		}
+		if found {
+			return cached, nil
+		}
+	}
+
+	// Fetch from AWS API
+	var permissionSetIDs []string
 	psBindingInput := &awsSsoAdmin.ListPermissionSetsProvisionedToAccountInput{
-		AccountId:   awsSdk.String(resource.Id.Resource),
+		AccountId:   awsSdk.String(accountID),
 		InstanceArn: o.identityInstance.InstanceArn,
 	}

 	psBindingPaginator := awsSsoAdmin.NewListPermissionSetsProvisionedToAccountPaginator(o.ssoAdminClient, psBindingInput)
 	for {
 		psBindingsResp, err := psBindingPaginator.NextPage(ctx)
 		if err != nil {
-			return nil, "", nil, fmt.Errorf("aws-connector: ssoadmin.ListPermissionSetsProvisionedToAccount failed: %w", err)
+			return nil, wrapAWSError(fmt.Errorf("baton-aws: ssoadmin.ListPermissionSetsProvisionedToAccount failed: %w", err))
 		}

-		for _, psId := range psBindingsResp.PermissionSets {
-			ps, err := o.getPermissionSet(ctx, psId)
-			if err != nil {
-				return nil, "", nil, fmt.Errorf("aws-connector: ssoadmin.DescribePermissionSet failed: %w", err)
-			}
+		permissionSetIDs = append(permissionSetIDs, psBindingsResp.PermissionSets...)
+		if !psBindingPaginator.HasMorePages() {
+			break
+		}
+	}

-			bindingName := &PermissionSetBinding{
-				AccountID:       resource.Id.Resource,
-				PermissionSetId: awsSdk.ToString(ps.PermissionSetArn),
-			}
-			var annos annotations.Annotations
-			annos.Update(&v2.V1Identifier{
-				Id: bindingName.String(),
-			})
-			entitlement := &v2.Entitlement{
-				Id:          bindingName.String(),
-				DisplayName: fmt.Sprintf("%s Permission Set", awsSdk.ToString(ps.Name)),
-				Description: awsSdk.ToString(ps.Description),
-				Resource:    resource,
-				Purpose:     v2.Entitlement_PURPOSE_VALUE_ASSIGNMENT,
-				GrantableTo: []*v2.ResourceType{resourceTypeSSOGroup, resourceTypeSSOUser},
-				Annotations: annos,
-			}
+	// Cache the result
+	if ss != nil {
+		if err := session.SetJSON(ctx, ss, cacheKey, permissionSetIDs); err != nil {
+			return nil, fmt.Errorf("baton-aws: Session Storage writing operation failed: %w", err)
+		}
+	}

-			assignmentsInput := &awsSsoAdmin.ListAccountAssignmentsInput{
-				AccountId:        awsSdk.String(resource.Id.Resource),
-				InstanceArn:      o.identityInstance.InstanceArn,
-				PermissionSetArn: ps.PermissionSetArn,
-			}
+	return permissionSetIDs, nil
+}
+
+// getOrFetchAllPermissionSetIDs retrieves all permission set IDs globally (not per-account), using cache if available.
+// This is used by Entitlements() which needs all permission sets regardless of account association.
+func (o *accountResourceType) getOrFetchAllPermissionSetIDs(ctx context.Context, ss sessions.SessionStore) ([]string, error) {
+	// Try cache first
+	if ss != nil {
+		cached, found, err := session.GetJSON[[]string](ctx, ss, allPermissionSetIDsCacheKey)
+		if err != nil {
+			return nil, fmt.Errorf("baton-aws: Session Storage reading operation failed: %w", err)
+		}
+		if found {
+			return cached, nil
+		}
+	}

-			assignmentsPaginator := awsSsoAdmin.NewListAccountAssignmentsPaginator(o.ssoAdminClient, assignmentsInput)
-			for {
-				assignmentsResp, err := assignmentsPaginator.NextPage(ctx)
-				if err != nil {
-					return nil, "", nil, fmt.Errorf("aws-connector: ssoadmin.ListAccountAssignments failed: %w", err)
-				}
-
-				for _, assignment := range assignmentsResp.AccountAssignments {
-					switch assignment.PrincipalType {
-					case awsSsoAdminTypes.PrincipalTypeGroup:
-						groupARN := ssoGroupToARN(o.region, awsSdk.ToString(o.identityInstance.IdentityStoreId), awsSdk.ToString(assignment.PrincipalId))
-						var groupAnnos annotations.Annotations
-						groupAnnos.Update(&v2.V1Identifier{
-							Id: V1GrantID(entitlement.Id, groupARN),
-						})
-						groupAnnos.Update(&v2.GrantExpandable{
-							EntitlementIds: []string{
-								fmt.Sprintf("%s:%s:%s", resourceTypeSSOGroup.Id, groupARN, groupMemberEntitlement),
-							},
-						})
-						rv = append(rv, &v2.Grant{
-							Id:          GrantID(entitlement, &v2.ResourceId{Resource: groupARN, ResourceType: resourceTypeSSOGroup.Id}),
-							Entitlement: entitlement,
-							Principal: &v2.Resource{
-								Id: fmtResourceId(resourceTypeSSOGroup.Id, groupARN),
-							},
-							Annotations: groupAnnos,
-						})
-					case awsSsoAdminTypes.PrincipalTypeUser:
-						userARN := ssoUserToARN(o.region, awsSdk.ToString(o.identityInstance.IdentityStoreId), awsSdk.ToString(assignment.PrincipalId))
-						var userAnnos annotations.Annotations
-						userAnnos.Update(&v2.V1Identifier{
-							Id: V1GrantID(entitlement.Id, userARN),
-						})
-						rv = append(rv, &v2.Grant{
-							Id:          GrantID(entitlement, &v2.ResourceId{Resource: userARN, ResourceType: resourceTypeSSOUser.Id}),
-							Entitlement: entitlement,
-							Principal: &v2.Resource{
-								Id: fmtResourceId(resourceTypeSSOUser.Id, userARN),
-							},
-							Annotations: userAnnos,
-						})
-					}
-				}
-				assignmentsInput.NextToken = assignmentsResp.NextToken
-				if !assignmentsPaginator.HasMorePages() {
-					break
-				}
-			} // end pagination loop for assignments
-		} // end range ange psBindingsResp.PermissionSets
+	// Fetch from AWS API
+	var permissionSetIDs []string
+	input := &awsSsoAdmin.ListPermissionSetsInput{
+		InstanceArn: o.identityInstance.InstanceArn,
+	}

-		if !psBindingPaginator.HasMorePages() {
+	paginator := awsSsoAdmin.NewListPermissionSetsPaginator(o.ssoAdminClient, input)
+	for {
+		resp, err := paginator.NextPage(ctx)
+		if err != nil {
+			return nil, wrapAWSError(fmt.Errorf("baton-aws: ssoadmin.ListPermissionSets failed: %w", err))
+		}
+
+		permissionSetIDs = append(permissionSetIDs, resp.PermissionSets...)
+		if !paginator.HasMorePages() {
 			break
 		}
-	} // end pagination loop for permission set to account binding
+	}
+
+	// Cache the result
+	if ss != nil {
+		if err := session.SetJSON(ctx, ss, allPermissionSetIDsCacheKey, permissionSetIDs); err != nil {
+			return nil, fmt.Errorf("baton-aws: Session Storage writing operation failed: %w", err)
+		}
+	}
+
+	return permissionSetIDs, nil
+}
+
+// getPermissionSetWithCache fetches a permission set by ID, using cache if available.
+func (o *accountResourceType) getPermissionSetWithCache(ctx context.Context, ss sessions.SessionStore, permissionSetId string) (*awsSsoAdminTypes.PermissionSet, error) {
+	cacheKey := permissionSetDetailKeyPrefix + permissionSetId
+
+	// Try cache first
+	if ss != nil {
+		cached, found, err := session.GetJSON[*awsSsoAdminTypes.PermissionSet](ctx, ss, cacheKey)
+		if err != nil {
+			return nil, fmt.Errorf("baton-aws: Session Storage reading operation failed: %w", err)
+		}
+		if found {
+			return cached, nil
+		}
+	}
+
+	// Fetch from AWS API
+	ps, err := o.fetchPermissionSetFromAPI(ctx, permissionSetId)
+	if err != nil {
+		return nil, wrapAWSError(fmt.Errorf("baton-aws: ssoadmin.DescribePermissionSet failed: %w", err))
+	}
+
+	// Cache the result
+	if ss != nil {
+		if err := session.SetJSON(ctx, ss, cacheKey, ps); err != nil {
+			return nil, fmt.Errorf("baton-aws: Session Storage writing operation failed: %w", err)
+		}
+	}
+
+	return ps, nil
+}


Duplicate cache-or-fetch pattern + cache errors as hard failures: getOrFetchPermissionSetIDs, getOrFetchAllPermissionSetIDs, and getPermissionSetWithCache all repeat the same check-cache→fetch→store pattern. As @FeliLucero1 suggested, a generic getOrSetCache[T] helper would consolidate this.

Additionally, session store read/write errors are treated as hard errors that abort the sync (e.g., line 397, 440, 484). A cache failure should degrade gracefully — log a warning and fall through to the API fetch. Cache write failures (lines 427, 469, 501) similarly abort when the data was already successfully fetched from AWS.

github-actions · 2026-03-03T22:50:41Z

pkg/connector/connector.go

+	externalIDLengthMaximum = 65
+	externalIDLengthMinimum = 32


Dropped TODO + off-by-one: The original constant in pkg/config/config.go had // TODO(marcos): this might be a bug. which was dropped during the move. The off-by-one is still present: externalIDLengthMaximum = 65 with > 65 check allows 65-byte IDs, but the error message at line 201 says "between 32 and 64 bytes". Per the PR discussion this is considered risky to change, but the TODO should be preserved.

Suggested change

externalIDLengthMaximum = 65

externalIDLengthMinimum = 32

externalIDLengthMaximum = 65 // TODO(marcos): this might be a bug. Error message says max 64 but this allows 65.

externalIDLengthMinimum = 32

github-actions · 2026-03-03T22:50:47Z

pkg/connector/connector.go

+	return nil
+}
+
+func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {


Unused parameter: opts *cli.ConnectorOpts is never referenced in the function body. If required by the ConnectorBuilderV2 interface contract, rename to _.

Suggested change

func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {

func New(ctx context.Context, awsc *cfg.Aws, _ *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {

github-actions · 2026-03-03T22:50:54Z

pkg/connector/account.go

+func (o *accountResourceType) Entitlements(ctx context.Context, resource *v2.Resource, opts resourceSdk.SyncOpAttrs) ([]*v2.Entitlement, *resourceSdk.SyncOpResults, error) {
+	// Parse page state from token
+	pageState, err := decodeEntitlementsPageToken(opts.PageToken.Token)
+	if err != nil {
+		return nil, nil, fmt.Errorf("baton-aws: failed to decode page token: %w", err)
+	}
+
+	// Step 1: Get or fetch all permission set IDs (cached globally per sync)
+	// We fetch all permission sets so that even on accounts that aren't associated with a permission set
 	// you could do a Grant Request for it -- and we'll just have an entitlement with zero entries in it in ListGrants.
-	allPS, err := o.getPermissionSets(ctx)
+	permissionSetIDs, err := o.getOrFetchAllPermissionSetIDs(ctx, opts.Session)
 	if err != nil {
-		return nil, "", nil, fmt.Errorf("aws-connector: getPermissionSets failed: %w", err)
+		return nil, nil, err
 	}
-	rv := make([]*v2.Entitlement, 0, len(allPS))
-	for _, ps := range allPS {
-		b := &PermissionSetBinding{
-			AccountID:       resource.Id.Resource,
-			PermissionSetId: awsSdk.ToString(ps.PermissionSetArn),
+
+	// If no permission sets, we're done
+	if len(permissionSetIDs) == 0 {
+		return nil, nil, nil
+	}
+
+	// If we've processed all permission sets, we're done
+	if pageState.PermissionSetIndex >= len(permissionSetIDs) {
+		return nil, nil, nil
+	}
+
+	// Step 2: Get the current permission set to process
+	currentPsId := permissionSetIDs[pageState.PermissionSetIndex]
+
+	// Step 3: Fetch the permission set details (with caching)
+	ps, err := o.getPermissionSetWithCache(ctx, opts.Session, currentPsId)
+	if err != nil {
+		return nil, nil, err
+	}
+
+	// Step 4: Build the entitlement for this permission set
+	b := &PermissionSetBinding{
+		AccountID:       resource.Id.Resource,
+		PermissionSetId: awsSdk.ToString(ps.PermissionSetArn),
+	}
+	var annos annotations.Annotations
+	annos.Update(&v2.V1Identifier{
+		Id: b.String(),
+	})
+	displayName := fmt.Sprintf("%s Permission Set", awsSdk.ToString(ps.Name))
+	member := entitlementSdk.NewAssignmentEntitlement(resource, displayName,
+		entitlementSdk.WithGrantableTo(resourceTypeSSOUser, resourceTypeSSOGroup),
+	)
+	member.Description = awsSdk.ToString(ps.Description)
+	member.Annotations = annos
+	member.Id = b.String()
+	member.Slug = fmt.Sprintf("%s access", awsSdk.ToString(ps.Name))
+
+	// Step 5: Determine next page state
+	var nextPageToken string
+	if pageState.PermissionSetIndex+1 < len(permissionSetIDs) {
+		// More permission sets to process
+		nextState := entitlementsPageState{
+			PermissionSetIndex: pageState.PermissionSetIndex + 1,
+		}
+
+		nextPageToken, err = encodeEntitlementsPageToken(nextState)
+		if err != nil {
+			return nil, nil, fmt.Errorf("baton-aws: failed to encode page token: %w", err)
 		}
-		var annos annotations.Annotations
-		annos.Update(&v2.V1Identifier{
-			Id: b.String(),
-		})
-		displayName := fmt.Sprintf("%s Permission Set", awsSdk.ToString(ps.Name))
-		member := entitlementSdk.NewAssignmentEntitlement(resource, displayName,
-			entitlementSdk.WithGrantableTo(resourceTypeSSOUser, resourceTypeSSOGroup),
-		)
-		member.Description = awsSdk.ToString(ps.Description)
-		member.Annotations = annos
-		member.Id = b.String()
-		member.Slug = fmt.Sprintf("%s access", awsSdk.ToString(ps.Name))
-		rv = append(rv, member)
 	}
-	return rv, "", nil, nil
+	// else: nextPageToken remains empty, signaling completion
+
+	if nextPageToken != "" {
+		return []*v2.Entitlement{member}, &resourceSdk.SyncOpResults{NextPageToken: nextPageToken}, nil
+	}
+	return []*v2.Entitlement{member}, nil, nil
 }


Entitlements emits one item per page: This processes exactly one permission set per call, returning a single-element slice. For accounts with many permission sets, this means N round-trips through the SDK pagination loop with per-call overhead. As @agustin-conductor noted, consider batching multiple permission sets per page (e.g., 10-25) to reduce overhead while still keeping the blast radius small on rate limits.

github-actions · 2026-03-04T00:36:07Z

Claude finished @laurenleach's task in 3m 22s —— View job

PR Review: CXP-116 Containerization and Refactor of Caches

Read repository guidelines
Analyze PR changes (16 files, +794/-470)
Perform comprehensive code review
Post review findings

Issues: 8

Breaking Changes

None.

Issues

pkg/config/config_test.go:22-31 — Test validateConfig always requires external-id when use-assume is set, but production validateConfig only requires it for two-hop mode (when global-role-arn is set). Single-hop configs pass in production but fail the test. See comment →
pkg/connector/account.go:87-143 — Duplicate encode/decodeGrantsPageToken and encode/decodeEntitlementsPageToken share identical JSON↔base64 logic. Generic helpers would eliminate the duplication. See comment →
pkg/connector/account.go:62-75 — wrapAWSError is only applied in account.go. All other builders make AWS API calls without throttle wrapping, so rate limit errors in IAM/SSO/IdentityStore calls won't be surfaced as codes.Unavailable. See comment →
pkg/connector/account.go:181,195,198 — Three leftover aws-connector: prefixes while the rest of the file uses baton-aws:. See comment →
pkg/connector/account.go:477-506 — Three cache-or-fetch functions repeat the same check-cache→fetch→store pattern. A generic getOrSetCache[T] helper would consolidate this. Additionally, session cache read/write failures are treated as hard errors — a cache failure should degrade gracefully. See comment →
pkg/connector/connector.go:36-37 — The TODO(marcos): this might be a bug. comment was dropped when constants were moved from pkg/config. Off-by-one (65 vs error saying 64) still present. See comment →
pkg/connector/connector.go:224 — opts *cli.ConnectorOpts parameter in New() is unused; rename to _. See comment →
pkg/connector/account.go:225-296 — Entitlements() emits exactly one entitlement per pagination call. Consider batching (e.g., 10-25 per call) to reduce overhead for accounts with many permission sets. See comment →

Documentation

Up to date. The GlobalAwsSsoRegionField changed from StringField to SelectField but the field name is unchanged, so no docs update is needed.

github-actions · 2026-03-04T00:38:10Z

pkg/config/config_test.go

+	if v.GetBool(config.UseAssumeField.FieldName) {
+		err := connector.ValidateExternalID(v.GetString(config.ExternalIdField.FieldName))
 		if err != nil {
 			return err
 		}
-		err = connector.IsValidRoleARN(v.GetString(RoleArnField.FieldName))
+		err = connector.IsValidRoleARN(v.GetString(config.RoleArnField.FieldName))
 		if err != nil {
 			return err
 		}


Test/production validation divergence: This validateConfig always requires external-id when use-assume is set, but the production validateConfig in connector.go:206-221 only requires it for two-hop mode (when global-role-arn is set). A valid single-hop config (--use-assume --role-arn <arn> without --external-id) passes in production but fails this test.

The test should mirror the production validation logic. Fix this →

github-actions · 2026-03-04T00:38:19Z

pkg/connector/account.go

+func encodeGrantsPageToken(state grantsPageState) (string, error) {
+	data, err := json.Marshal(state)
+	if err != nil {
+		return "", err
+	}
+	return base64.StdEncoding.EncodeToString(data), nil
+}
+
+func decodeGrantsPageToken(token string) (grantsPageState, error) {
+	if token == "" {
+		return grantsPageState{}, nil
+	}
+	data, err := base64.StdEncoding.DecodeString(token)
+	if err != nil {
+		return grantsPageState{}, err
+	}
+	var state grantsPageState
+	if err := json.Unmarshal(data, &state); err != nil {
+		return grantsPageState{}, err
+	}
+	return state, nil
+}
+
+func permissionSetIDsCacheKey(accountID string) string {
+	return permissionSetIDsCacheKeyPrefix + accountID
+}
+
+// entitlementsPageState tracks pagination state for the Entitlements function.
+// This allows progress to be saved between calls, reducing the "blast radius"
+// if a rate limit error occurs mid-sync.
+type entitlementsPageState struct {
+	// PermissionSetIndex is the index into the cached permission set IDs list
+	PermissionSetIndex int `json:"psi"`
+}
+
+func encodeEntitlementsPageToken(state entitlementsPageState) (string, error) {
+	data, err := json.Marshal(state)
+	if err != nil {
+		return "", err
+	}
+	return base64.StdEncoding.EncodeToString(data), nil
+}
+
+func decodeEntitlementsPageToken(token string) (entitlementsPageState, error) {
+	if token == "" {
+		return entitlementsPageState{}, nil
+	}
+	data, err := base64.StdEncoding.DecodeString(token)
+	if err != nil {
+		return entitlementsPageState{}, err
+	}
+	var state entitlementsPageState
+	if err := json.Unmarshal(data, &state); err != nil {
+		return entitlementsPageState{}, err
+	}
+	return state, nil
+}


Duplicate encode/decode page token functions: encodeGrantsPageToken/decodeGrantsPageToken and encodeEntitlementsPageToken/decodeEntitlementsPageToken share identical JSON↔base64 logic — only the type parameter differs. As @FeliLucero1 noted, generic helpers would eliminate this duplication:

func encodePageToken[T any](state T) (string, error) { data, err := json.Marshal(state) if err != nil { return "", err } return base64.StdEncoding.EncodeToString(data), nil } func decodePageToken[T any](token string) (T, error) { var zero T if token == "" { return zero, nil } data, err := base64.StdEncoding.DecodeString(token) if err != nil { return zero, err } var state T if err := json.Unmarshal(data, &state); err != nil { return zero, err } return state, nil }

github-actions · 2026-03-04T00:38:27Z

pkg/connector/account.go

+// wrapAWSError checks if the error is an AWS throttling error and wraps it
+// with codes.Unavailable so the baton-sdk sync engine can handle rate limiting.
+func wrapAWSError(err error) error {
+	if err == nil {
+		return nil
+	}
+
+	var apiErr smithy.APIError
+	if errors.As(err, &apiErr) {
+		if _, isThrottle := awsThrottleErrorCodes[apiErr.ErrorCode()]; isThrottle {
+			return status.Error(codes.Unavailable, err.Error())
+		}
+	}
+
+	return err
+}


wrapAWSError breaks the error chain: status.Error(codes.Unavailable, err.Error()) converts the error to a string, so errors.As/errors.Is won't find the original AWS error after wrapping. The gRPC status package doesn't support %w-style wrapping. If the baton-sdk sync engine only inspects the gRPC status code to detect rate limits, this is fine — but add a comment documenting that assumption to prevent future confusion.

github-actions · 2026-03-04T00:38:33Z

pkg/connector/account.go

 	resp, err := o.orgClient.ListAccounts(ctx, listAccountsInput)
 	if err != nil {
-		return nil, "", nil, fmt.Errorf("aws-connector: listAccounts failed: %w", err)
+		return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err))


Inconsistent error prefix: Three occurrences of the old aws-connector: prefix remain in this function (lines 181, 195, 198) while all other new code in this file uses baton-aws:. Standardize to baton-aws: for consistency and easier log grep.

Suggested change

return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err))

return nil, nil, wrapAWSError(fmt.Errorf("baton-aws: listAccounts failed: %w", err))

github-actions · 2026-03-04T00:38:45Z

pkg/connector/account.go

+// getPermissionSetWithCache fetches a permission set by ID, using cache if available.
+func (o *accountResourceType) getPermissionSetWithCache(ctx context.Context, ss sessions.SessionStore, permissionSetId string) (*awsSsoAdminTypes.PermissionSet, error) {
+	cacheKey := permissionSetDetailKeyPrefix + permissionSetId
+
+	// Try cache first
+	if ss != nil {
+		cached, found, err := session.GetJSON[*awsSsoAdminTypes.PermissionSet](ctx, ss, cacheKey)
+		if err != nil {
+			return nil, fmt.Errorf("baton-aws: Session Storage reading operation failed: %w", err)
+		}
+		if found {
+			return cached, nil
+		}
+	}
+
+	// Fetch from AWS API
+	ps, err := o.fetchPermissionSetFromAPI(ctx, permissionSetId)
+	if err != nil {
+		return nil, wrapAWSError(fmt.Errorf("baton-aws: ssoadmin.DescribePermissionSet failed: %w", err))
+	}
+
+	// Cache the result
+	if ss != nil {
+		if err := session.SetJSON(ctx, ss, cacheKey, ps); err != nil {
+			return nil, fmt.Errorf("baton-aws: Session Storage writing operation failed: %w", err)
+		}
+	}
+
+	return ps, nil
+}


Duplicate cache-or-fetch pattern + cache errors as hard failures: getOrFetchPermissionSetIDs, getOrFetchAllPermissionSetIDs, and getPermissionSetWithCache all repeat the same check-cache→fetch→store pattern. As @FeliLucero1 suggested, a generic getOrSetCache[T] helper would consolidate this.

Additionally, session store read/write errors are treated as hard errors that abort the sync. A cache failure is not a fundamental API failure — the data can still be fetched from AWS. Consider logging cache errors at warn level and falling through to the API fetch, so a corrupted cache entry doesn't block the entire sync. Cache write failures (lines 427, 469, 501) similarly abort after the data was already successfully fetched from AWS.

Fix this →

github-actions · 2026-03-04T00:38:54Z

pkg/connector/connector.go

+	externalIDLengthMaximum = 65
+	externalIDLengthMinimum = 32


Dropped TODO + off-by-one: The original constant in pkg/config/config.go had // TODO(marcos): this might be a bug. which was dropped during the move. The off-by-one is still present: externalIDLengthMaximum = 65 with > 65 check allows 65-byte IDs, but the error message at line 201 says "between 32 and 64 bytes". Per the earlier PR discussion this is considered risky to change, but the TODO should be preserved so the discrepancy isn't forgotten.

Suggested change

externalIDLengthMaximum = 65

externalIDLengthMinimum = 32

externalIDLengthMaximum = 65 // TODO(marcos): this might be a bug. Error message says max 64 but this allows 65.

externalIDLengthMinimum = 32

github-actions · 2026-03-04T00:39:01Z

pkg/connector/connector.go

+	return nil
+}
+
+func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {


Unused parameter: opts *cli.ConnectorOpts is never referenced in the function body. If required by the ConnectorBuilderV2 interface contract, rename to _ to make this explicit.

Suggested change

func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {

func New(ctx context.Context, awsc *cfg.Aws, _ *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {

github-actions · 2026-03-04T00:39:08Z

pkg/connector/account.go

+func (o *accountResourceType) Entitlements(ctx context.Context, resource *v2.Resource, opts resourceSdk.SyncOpAttrs) ([]*v2.Entitlement, *resourceSdk.SyncOpResults, error) {
+	// Parse page state from token
+	pageState, err := decodeEntitlementsPageToken(opts.PageToken.Token)
+	if err != nil {
+		return nil, nil, fmt.Errorf("baton-aws: failed to decode page token: %w", err)
+	}
+
+	// Step 1: Get or fetch all permission set IDs (cached globally per sync)
+	// We fetch all permission sets so that even on accounts that aren't associated with a permission set
 	// you could do a Grant Request for it -- and we'll just have an entitlement with zero entries in it in ListGrants.
-	allPS, err := o.getPermissionSets(ctx)
+	permissionSetIDs, err := o.getOrFetchAllPermissionSetIDs(ctx, opts.Session)
 	if err != nil {
-		return nil, "", nil, fmt.Errorf("aws-connector: getPermissionSets failed: %w", err)
+		return nil, nil, err
 	}
-	rv := make([]*v2.Entitlement, 0, len(allPS))
-	for _, ps := range allPS {
-		b := &PermissionSetBinding{
-			AccountID:       resource.Id.Resource,
-			PermissionSetId: awsSdk.ToString(ps.PermissionSetArn),
+
+	// If no permission sets, we're done
+	if len(permissionSetIDs) == 0 {
+		return nil, nil, nil
+	}
+
+	// If we've processed all permission sets, we're done
+	if pageState.PermissionSetIndex >= len(permissionSetIDs) {
+		return nil, nil, nil
+	}
+
+	// Step 2: Get the current permission set to process
+	currentPsId := permissionSetIDs[pageState.PermissionSetIndex]
+
+	// Step 3: Fetch the permission set details (with caching)
+	ps, err := o.getPermissionSetWithCache(ctx, opts.Session, currentPsId)
+	if err != nil {
+		return nil, nil, err
+	}
+
+	// Step 4: Build the entitlement for this permission set
+	b := &PermissionSetBinding{
+		AccountID:       resource.Id.Resource,
+		PermissionSetId: awsSdk.ToString(ps.PermissionSetArn),
+	}
+	var annos annotations.Annotations
+	annos.Update(&v2.V1Identifier{
+		Id: b.String(),
+	})
+	displayName := fmt.Sprintf("%s Permission Set", awsSdk.ToString(ps.Name))
+	member := entitlementSdk.NewAssignmentEntitlement(resource, displayName,
+		entitlementSdk.WithGrantableTo(resourceTypeSSOUser, resourceTypeSSOGroup),
+	)
+	member.Description = awsSdk.ToString(ps.Description)
+	member.Annotations = annos
+	member.Id = b.String()
+	member.Slug = fmt.Sprintf("%s access", awsSdk.ToString(ps.Name))
+
+	// Step 5: Determine next page state
+	var nextPageToken string
+	if pageState.PermissionSetIndex+1 < len(permissionSetIDs) {
+		// More permission sets to process
+		nextState := entitlementsPageState{
+			PermissionSetIndex: pageState.PermissionSetIndex + 1,
+		}
+
+		nextPageToken, err = encodeEntitlementsPageToken(nextState)
+		if err != nil {
+			return nil, nil, fmt.Errorf("baton-aws: failed to encode page token: %w", err)
 		}
-		var annos annotations.Annotations
-		annos.Update(&v2.V1Identifier{
-			Id: b.String(),
-		})
-		displayName := fmt.Sprintf("%s Permission Set", awsSdk.ToString(ps.Name))
-		member := entitlementSdk.NewAssignmentEntitlement(resource, displayName,
-			entitlementSdk.WithGrantableTo(resourceTypeSSOUser, resourceTypeSSOGroup),
-		)
-		member.Description = awsSdk.ToString(ps.Description)
-		member.Annotations = annos
-		member.Id = b.String()
-		member.Slug = fmt.Sprintf("%s access", awsSdk.ToString(ps.Name))
-		rv = append(rv, member)
 	}
-	return rv, "", nil, nil
+	// else: nextPageToken remains empty, signaling completion
+
+	if nextPageToken != "" {
+		return []*v2.Entitlement{member}, &resourceSdk.SyncOpResults{NextPageToken: nextPageToken}, nil
+	}
+	return []*v2.Entitlement{member}, nil, nil
 }


Entitlements emits one item per page: This processes exactly one permission set per call, returning a single-element slice. For accounts with many permission sets, this means N round-trips through the SDK pagination loop with per-call serialization/deserialization overhead. As @agustin-conductor noted, consider batching multiple permission sets per page (e.g., 10-25) to reduce overhead while still keeping the blast radius small on rate limits.

github-actions · 2026-03-04T00:39:22Z

pkg/connector/account.go

+func wrapAWSError(err error) error {
+	if err == nil {
+		return nil
+	}
+
+	var apiErr smithy.APIError
+	if errors.As(err, &apiErr) {
+		if _, isThrottle := awsThrottleErrorCodes[apiErr.ErrorCode()]; isThrottle {
+			return status.Error(codes.Unavailable, err.Error())
+		}
+	}
+
+	return err


wrapAWSError only used in account.go: This function was introduced in this PR but is only applied in account.go. All other builders (account_iam.go, iam_group.go, iam_user.go, role.go, secrets.go, sso_group.go, sso_user.go) make AWS API calls without throttle wrapping, so rate limit errors from IAM/SSO/IdentityStore calls will not be surfaced as codes.Unavailable for the SDK sync engine to handle. This significantly reduces the value of the throttle wrapping since the most common rate-limited calls (ListUsers, ListGroups, ListGroupMemberships) are unprotected.

…udes all resource types

github-actions · 2026-03-04T07:39:54Z

Claude finished @laurenleach's task in 8m 6s —— View job

PR Review: CXP-116 Containerization and Refactor of Caches

Read repository guidelines
Analyze PR changes (16 files, +823/-470)
Perform comprehensive code review
Post review findings

Issues: 8

Breaking Changes

None.

Issues

pkg/config/config_test.go:22-31 — Test validateConfig always requires external-id when use-assume is set, but production validateConfig only requires it for two-hop mode (when global-role-arn is set). Single-hop configs pass in production but fail the test. See comment →
pkg/connector/account.go:62-74 — wrapAWSError is only applied in account.go. All other builders make AWS API calls without throttle wrapping, so rate limit errors in IAM/SSO/IdentityStore calls won't be surfaced as codes.Unavailable. Also, status.Error(codes.Unavailable, err.Error()) breaks the error chain — document the assumption that the SDK only checks the gRPC status code. See comment →
pkg/connector/account.go:87-143 — Duplicate encode/decodeGrantsPageToken and encode/decodeEntitlementsPageToken share identical JSON↔base64 logic. Generic helpers would eliminate this duplication. See comment →
pkg/connector/account.go:181,195,198 — Three leftover aws-connector: prefixes while the rest of the file uses baton-aws:. See comment →
pkg/connector/account.go:225-296 — Entitlements() emits exactly one entitlement per pagination call. Consider batching (e.g., 10-25 per call) to reduce overhead for accounts with many permission sets. See comment →
pkg/connector/account.go:389-506 — Three cache-or-fetch functions repeat the same check-cache→fetch→store pattern. A generic getOrSetCache[T] helper would consolidate this. Additionally, session cache read/write failures are treated as hard errors — a cache failure should degrade gracefully. Pagination loops also missing ctx.Done() checks. See comments →
pkg/connector/connector.go:36-37 — The TODO(marcos): this might be a bug. comment was dropped when constants were moved from pkg/config. Off-by-one (65 vs error saying 64) still present. See comment →
pkg/connector/connector.go:224 — opts *cli.ConnectorOpts parameter in New() is unused; rename to _. See comment →

Documentation

Up to date. No resource types, config fields, or capabilities were added/removed. The GlobalAwsSsoRegionField changed from StringField to SelectField but the field name is unchanged, so no docs update is needed.

github-actions · 2026-03-04T07:46:48Z

pkg/config/config_test.go

+		err = connector.IsValidRoleARN(v.GetString(config.RoleArnField.FieldName))
 		if err != nil {
 			return err
 		}


Test/production validation divergence: This validateConfig always requires external-id when use-assume is set, but the production validateConfig in connector.go:206-221 only requires it for two-hop mode (when global-role-arn is set). Valid single-hop configs pass in production but fail this test. The test should mirror the production logic by only requiring external-id when global-role-arn is also set.

github-actions · 2026-03-04T07:46:53Z

pkg/connector/account.go

 	resp, err := o.orgClient.ListAccounts(ctx, listAccountsInput)
 	if err != nil {
-		return nil, "", nil, fmt.Errorf("aws-connector: listAccounts failed: %w", err)
+		return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err))


Inconsistent error prefix: This uses aws-connector: while the rest of the new code uses baton-aws:. Lines 195 and 198 also use the old prefix in debug messages. Standardize to baton-aws: for consistency.

Suggested change

return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err))

return nil, nil, wrapAWSError(fmt.Errorf("baton-aws: listAccounts failed: %w", err))

github-actions · 2026-03-04T07:47:01Z

pkg/connector/account.go

+		return entitlementsPageState{}, err
+	}
+	return state, nil
+}


Duplicate encode/decode page token functions: encodeGrantsPageToken/decodeGrantsPageToken and encodeEntitlementsPageToken/decodeEntitlementsPageToken share identical JSON↔base64 logic — only the type parameter differs. As @FeliLucero1 noted, generic helpers would eliminate this duplication:

func encodePageToken[T any](state T) (string, error) { data, err := json.Marshal(state) if err != nil { return "", err } return base64.StdEncoding.EncodeToString(data), nil } func decodePageToken[T any](token string) (T, error) { var zero T if token == "" { return zero, nil } data, err := base64.StdEncoding.DecodeString(token) if err != nil { return zero, err } var state T if err := json.Unmarshal(data, &state); err != nil { return zero, err } return state, nil }

github-actions · 2026-03-04T07:47:11Z

pkg/connector/account.go

+		}
+	}
+
+	return err


wrapAWSError only applied in account.go: This function was introduced in this PR but is only used in account.go. All other builders (account_iam.go, iam_group.go, iam_user.go, role.go, secrets.go, sso_group.go, sso_user.go) make AWS API calls without throttle wrapping. Rate limit errors from IAM/SSO/IdentityStore calls in those files will not be surfaced as codes.Unavailable for the SDK sync engine to handle. This significantly reduces the value of the throttle wrapping since common rate-limited calls (ListUsers, ListGroups, ListGroupMemberships) are unprotected.

Also note: status.Error(codes.Unavailable, err.Error()) converts the error to a string, breaking the error chain for errors.As/errors.Is. If the baton-sdk sync engine only inspects the gRPC status code, this is fine — but a comment documenting that assumption would prevent future confusion.

github-actions · 2026-03-04T07:47:20Z

pkg/connector/account.go

-				PermissionSetArn: ps.PermissionSetArn,
-			}
+	return permissionSetIDs, nil
+}


Duplicate cache-or-fetch pattern + cache errors as hard failures: getOrFetchPermissionSetIDs, getOrFetchAllPermissionSetIDs (line 436), and getPermissionSetWithCache (line 477) all repeat the same check-cache→fetch→store pattern. As @FeliLucero1 suggested, a generic getOrSetCache[T] helper would consolidate this.

Additionally, session store read/write errors are treated as hard errors that abort the sync. A cache failure is not a fundamental API failure — the data can still be fetched from AWS. Consider logging cache errors at warn level and falling through to the API fetch, so a corrupted cache entry doesn't block the entire sync. Cache write failures (lines 427, 469, 501) similarly abort after the data was already successfully fetched.

github-actions · 2026-03-04T07:47:29Z

pkg/connector/account.go

+		return []*v2.Entitlement{member}, &resourceSdk.SyncOpResults{NextPageToken: nextPageToken}, nil
+	}
+	return []*v2.Entitlement{member}, nil, nil
 }


Entitlements emits one item per page: This processes exactly one permission set per call, returning a single-element slice. For accounts with many permission sets, this means N round-trips through the SDK pagination loop with per-call serialization/deserialization overhead. As @agustin-conductor noted, consider batching multiple permission sets per page (e.g., 10-25) to reduce overhead while still keeping the blast radius small on rate limits.

github-actions · 2026-03-04T07:47:37Z

pkg/connector/connector.go

+	externalIDLengthMaximum = 65
+	externalIDLengthMinimum = 32


Dropped TODO + off-by-one: The original constant in pkg/config/config.go had // TODO(marcos): this might be a bug. which was dropped during the move. The off-by-one is still present: externalIDLengthMaximum = 65 with > 65 check allows 65-byte IDs, but the error message at line 201 says "between 32 and 64 bytes". Per the earlier PR discussion this is considered risky to change, but the TODO should be preserved.

Suggested change

externalIDLengthMaximum = 65

externalIDLengthMinimum = 32

externalIDLengthMaximum = 65 // TODO(marcos): this might be a bug. Error message says max 64 but this allows 65.

externalIDLengthMinimum = 32

github-actions · 2026-03-04T07:47:44Z

pkg/connector/connector.go

+	return nil
+}
+
+func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {


Unused parameter: opts *cli.ConnectorOpts is never referenced in the function body. If required by the ConnectorBuilderV2 interface contract, rename to _ to make this explicit.

Suggested change

func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {

func New(ctx context.Context, awsc *cfg.Aws, _ *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {

github-actions · 2026-03-04T07:47:53Z

pkg/connector/account.go

+		if !psBindingPaginator.HasMorePages() {
+			break
+		}
+	}


Pagination loops missing ctx.Done() check: Both getOrFetchPermissionSetIDs (this function) and getOrFetchAllPermissionSetIDs (line 454) contain inner pagination loops that do not check ctx.Done(). If the context is cancelled (e.g., timeout or shutdown), the loop will continue iterating until the next API call happens to fail.

Add a context cancellation check at the top of each loop body:

for { select { case <-ctx.Done(): return nil, ctx.Err() default: } // ... existing code }

JavierCarnelli-ConductorOne self-assigned this Feb 9, 2026

JavierCarnelli-ConductorOne requested a review from a team February 9, 2026 15:48

JavierCarnelli-ConductorOne requested review from a team, FeliLucero1, agustin-conductor, aldevv, btipling, johnallers, luisina-santos, mateoHernandez123, mateovespConductor and sergiocorral-conductorone February 9, 2026 15:49

coderabbitai bot reviewed Feb 9, 2026

View reviewed changes

agustin-conductor reviewed Feb 11, 2026

View reviewed changes

cmd/baton-aws/main.go Outdated Show resolved Hide resolved

agustin-conductor reviewed Feb 11, 2026

View reviewed changes

pkg/connector/account.go Outdated Show resolved Hide resolved

agustin-conductor reviewed Feb 11, 2026

View reviewed changes

pkg/connector/account.go Outdated Show resolved Hide resolved

mateovespConductor reviewed Feb 12, 2026

View reviewed changes

coderabbitai bot reviewed Feb 17, 2026

View reviewed changes

pkg/connector/account.go Outdated Show resolved Hide resolved

JavierCarnelli-ConductorOne requested review from agustin-conductor and mateovespConductor February 17, 2026 04:55

JavierCarnelli-ConductorOne assigned btipling Feb 17, 2026

FeliLucero1 reviewed Feb 17, 2026

View reviewed changes

agustin-conductor reviewed Feb 18, 2026

View reviewed changes

agustin-conductor approved these changes Feb 19, 2026

View reviewed changes

johnallers approved these changes Feb 23, 2026

View reviewed changes

github-actions bot reviewed Mar 3, 2026

View reviewed changes

make global sso region a select field

81ba141

github-actions bot reviewed Mar 4, 2026

View reviewed changes

implement default capabilities builder so generated capabilities incl…

d26e117

…udes all resource types

github-actions bot reviewed Mar 4, 2026

View reviewed changes

	return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err))
	return nil, nil, wrapAWSError(fmt.Errorf("baton-aws: listAccounts failed: %w", err))

	func New(ctx context.Context, awsc cfg.Aws, opts cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {
	func New(ctx context.Context, awsc cfg.Aws, _ cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) {

Conversation

JavierCarnelli-ConductorOne commented Feb 9, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

linear bot commented Feb 9, 2026

Uh oh!

coderabbitai bot commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mateovespConductor left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JavierCarnelli-ConductorOne Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agustin-conductor left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Mar 3, 2026

Choose a reason for hiding this comment

JavierCarnelli-ConductorOne commented Feb 9, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 9, 2026 •

edited

Loading

coderabbitai bot Feb 9, 2026 •

edited

Loading

mateovespConductor left a comment •

edited

Loading

JavierCarnelli-ConductorOne Feb 17, 2026 •

edited

Loading

github-actions bot commented Mar 4, 2026 •

edited

Loading

github-actions bot commented Mar 4, 2026 •

edited

Loading