CXP-116 Containerization and refactor of caches#101
CXP-116 Containerization and refactor of caches#101JavierCarnelli-ConductorOne wants to merge 6 commits intomainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces legacy pagination/token+annotations APIs with SyncOpAttrs/SyncOpResults across connector resource types, moves external-id/config validation into the connector, and adds session-backed caching and paginated state encoding for AWS SSO permission-set lists, details, grants, and entitlements. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Connector
participant Session as sessions.SessionStore
participant SSO as AWS SSO Admin API
rect rgba(200,200,255,0.5)
Client->>Session: GetJSON("permission-sets")
alt cached list present
Session-->>Client: permissionSetIDs
else cache miss
Session-->>Client: nil
Client->>SSO: ListPermissionSets()
SSO-->>Client: permissionSetIDs
Client->>Session: SetJSON("permission-sets", permissionSetIDs)
end
Client->>Session: GetManyJSON(["permission-set:<id>...", ...])
alt some details missing
Client->>SSO: DescribePermissionSet(id)
SSO-->>Client: permissionSetDetail
Client->>Session: SetJSON("permission-set:<id>", permissionSetDetail)
end
Client-->>Client: build entitlements/grants (encode page state if truncated)
Client-->>Client: return []*Entitlement / []*Grant + SyncOpResults
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/config/config_test.go (1)
21-33:⚠️ Potential issue | 🟡 MinorAlign test validation with single‑hop behavior.
The test currently requires an external ID whenever
use-assumeis set, but the production validator only enforces it when a global role ARN is configured. This makes the tests reject valid single‑hop configs.Suggested fix
func validateConfig(ctx context.Context, v *viper.Viper) error { if v.GetBool(config.UseAssumeField.FieldName) { - err := connector.ValidateExternalID(v.GetString(config.ExternalIdField.FieldName)) - if err != nil { - return err - } - err = connector.IsValidRoleARN(v.GetString(config.RoleArnField.FieldName)) - if err != nil { - return err - } + if err := connector.IsValidRoleARN(v.GetString(config.RoleArnField.FieldName)); err != nil { + return err + } + if v.GetString(config.GlobalRoleArnField.FieldName) != "" { + if err := connector.ValidateExternalID(v.GetString(config.ExternalIdField.FieldName)); err != nil { + return err + } + } } return nil }pkg/connector/iam_user.go (1)
233-255:⚠️ Potential issue | 🟡 MinorBug: Inverted error-check logic in
CreateAccount.The condition
if result, err := o.iamClient.GetUser(...); err == nilmeans the user was found successfully. Inside this block, checkingerrors.As(err, &noSuchEntity)will always be false sinceerrisnil. The current logic returns the existing user as a success regardless, which may be intentional for idempotency, but the deadNoSuchEntityExceptioncheck is misleading.If the intent is to return an error when GetUser fails (for reasons other than "not found"), the logic should be inverted:
Suggested fix
- if result, err := o.iamClient.GetUser(ctx, &iam.GetUserInput{UserName: awsSdk.String(username)}); err == nil { - var noSuchEntity *iamTypes.NoSuchEntityException - if errors.As(err, &noSuchEntity) { - return nil, nil, nil, fmt.Errorf("aws-connector: iam.GetUser failed: %w", err) - } + result, err := o.iamClient.GetUser(ctx, &iam.GetUserInput{UserName: awsSdk.String(username)}) + if err != nil { + var noSuchEntity *iamTypes.NoSuchEntityException + if !errors.As(err, &noSuchEntity) { + return nil, nil, nil, fmt.Errorf("aws-connector: iam.GetUser failed: %w", err) + } + // User doesn't exist, proceed to create + } else { + // User already exists, return it annos := &v2.V1Identifier{ Id: awsSdk.ToString(result.User.Arn), }
🤖 Fix all issues with AI agents
In `@pkg/connector/account.go`:
- Around line 33-37: The comment above the cache key constants (the
"SessionStore cache keys" comment) is missing a terminating period; update that
comment to end with a period so the linter is satisfied. Locate the comment
above permissionSetsCacheKey and permissionSetDetailKeyPrefix in the file
(pkg/connector/account.go) and add the period to the end of the sentence.
In `@pkg/connector/connector.go`:
- Around line 36-38: The externalIDLengthMaximum constant
(externalIDLengthMaximum) is off-by-one compared to the ValidateExternalID logic
and error text; update externalIDLengthMaximum to 64 (to match
ValidateExternalID and the error message) or alternatively make
ValidateExternalID and its error message consistent with the constant—ensure the
symbols externalIDLengthMaximum, externalIDLengthMinimum and the
ValidateExternalID function all use the same max value so invalid IDs aren’t
allowed through.
🧹 Nitpick comments (1)
pkg/connector/account.go (1)
644-711: Consider logging cache errors for observability.The caching implementation silently ignores errors from
session.GetJSON,session.SetJSON, andsession.SetManyJSON. While this is a common pattern for optional caching, logging these errors at debug level would help with troubleshooting cache issues in production.Example: Add debug logging for cache errors
func (o *accountResourceType) getPermissionSets(ctx context.Context, ss sessions.SessionStore) ([]*awsSsoAdminTypes.PermissionSet, error) { + l := ctxzap.Extract(ctx) if ss != nil { cached, found, err := session.GetJSON[[]*awsSsoAdminTypes.PermissionSet](ctx, ss, permissionSetsCacheKey) - if err == nil && found { + if err != nil { + l.Debug("aws-connector: failed to get permission sets from cache", zap.Error(err)) + } else if found { return cached, nil } }
| const ( | ||
| externalIDLengthMaximum = 65 | ||
| externalIDLengthMinimum = 32 |
There was a problem hiding this comment.
External ID max length mismatch (65 vs 64).
ValidateExternalID says max is 64, but the constant allows 65. This lets invalid values through and conflicts with the error message.
Suggested fix
- externalIDLengthMaximum = 65
+ externalIDLengthMaximum = 64
externalIDLengthMinimum = 32📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const ( | |
| externalIDLengthMaximum = 65 | |
| externalIDLengthMinimum = 32 | |
| const ( | |
| externalIDLengthMaximum = 64 | |
| externalIDLengthMinimum = 32 |
🤖 Prompt for AI Agents
In `@pkg/connector/connector.go` around lines 36 - 38, The externalIDLengthMaximum
constant (externalIDLengthMaximum) is off-by-one compared to the
ValidateExternalID logic and error text; update externalIDLengthMaximum to 64
(to match ValidateExternalID and the error message) or alternatively make
ValidateExternalID and its error message consistent with the constant—ensure the
symbols externalIDLengthMaximum, externalIDLengthMinimum and the
ValidateExternalID function all use the same max value so invalid IDs aren’t
allowed through.
There was a problem hiding this comment.
Where did you check this value? Is there an official docs site for me to look at?
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
AWS STS AssumeRole external ID length requirements documentation
💡 Result:
For AWS STS AssumeRole, the ExternalId parameter has these documented requirements:
- Length: 2 to 1,224 characters [1], [2]
- Allowed characters / pattern: alphanumeric with no whitespace; may also include:
+ = , . @ : / -and underscore_(regex:[\w+=,.@:\/-]*) [1], [2]
Docs:
- STS API Reference –
AssumeRole→ExternalId[1] - IAM User Guide – “IAM and AWS STS quotas” (external ID character limits) [2]
@JavierCarnelli-ConductorOne, according to the official AWS STS API documentation, the ExternalId parameter for AssumeRole actually supports 2 to 1,224 characters, not 32-64.
The official AWS documentation states external IDs can be 2 to 1,224 characters, with allowed characters being alphanumeric plus + = , . @ : / - and underscore.
So the 32-64 byte constraint in this code appears to be a custom/internal business rule, not an AWS service requirement.
However, the off-by-one issue still exists within your custom validation:
- Constant says:
externalIDLengthMaximum = 65 - Error message says:
"aws_external_id must be between 32 and 64 bytes" - These need to match
Questions:
- Is there a specific business/security reason for enforcing the stricter 32-64 character limit?
- Should the maximum be 64 or 65 for your custom constraint?
There was a problem hiding this comment.
I agree on this observation, however I'm unsure if we should modify this.
I feel like it's a big risk to modify this boundaries.
I'll just leave this comment here. Would require further analysis on the impact of updating this
There was a problem hiding this comment.
@JavierCarnelli-ConductorOne, understood—I appreciate the cautious approach given the risk involved in changing these validation boundaries. Leaving the inconsistency documented here for future analysis makes sense.
✏️ Learnings added
Learnt from: JavierCarnelli-ConductorOne
Repo: ConductorOne/baton-aws PR: 101
File: pkg/connector/connector.go:36-38
Timestamp: 2026-02-12T18:28:05.444Z
Learning: In the AWS connector (pkg/connector/connector.go), the external ID length validation boundaries (externalIDLengthMaximum = 65, externalIDLengthMinimum = 32) are considered risky to modify and require careful impact analysis before any changes, even to fix inconsistencies with error messages.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
There was a problem hiding this comment.
I agree with the comments left by @agustin-conductor . I think these changes are necessary
pkg/connector/account.go
Outdated
| } | ||
|
|
||
| if ss != nil && len(toCache) > 0 { | ||
| _ = session.SetManyJSON(ctx, ss, toCache) |
There was a problem hiding this comment.
I think we should not ignore the error here because it could silently fail the cache
There was a problem hiding this comment.
I've got Claude to make several changes to the code so you probably won't see this updated in the exact same place.
Please take a look to the refactored functions
Thanks in advance Beto! I really appreciate the feedback
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/connector/account.go (2)
293-293: Dead code:assignmentsInput.NextTokenassignment is unused.The AWS SDK paginator manages pagination internally. Setting
assignmentsInput.NextTokenmanually has no effect since the paginator doesn't use this input after initialization. This line can be removed.🧹 Proposed fix
} - assignmentsInput.NextToken = assignmentsResp.NextToken if !assignmentsPaginator.HasMorePages() { break }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/connector/account.go` at line 293, Remove the dead assignment to assignmentsInput.NextToken (it is set from assignmentsResp.NextToken) since the AWS SDK paginator manages pagination and ignores changes to the input after initialization; locate the pagination loop where assignmentsInput and assignmentsResp are used (the block that iterates with the paginator) and delete the line setting assignmentsInput.NextToken so the code relies solely on the paginator for advancing pages.
720-723: Inconsistent error handling for session operations.In
getPermissionSets, errors fromsession.GetManyJSONare silently ignored (line 722), while inGrants(lines 183-184), similar errors cause the function to return an error. The same inconsistency exists forSetManyJSON.If session errors should be fatal in
Grants, they should likely be consistent here. Alternatively, if ignoring cache errors is intentional (fall back to API), consider applying the same lenient approach inGrantsfor consistency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/connector/account.go` around lines 720 - 723, getPermissionSets currently swallows errors from session.GetManyJSON while Grants returns on similar session errors, causing inconsistent behavior; update getPermissionSets (and mirror in any SetManyJSON usage) to handle session errors consistently with Grants by checking the error returned from session.GetManyJSON/SetManyJSON (when ss != nil) and either return the error up the stack (make it fatal like in Grants) or instead log the error and continue (make Grants lenient) — pick one approach and apply it consistently, referencing the session variable ss, the cachedDetails variable, and the GetManyJSON/SetManyJSON calls so the change is applied in the correct functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/connector/account.go`:
- Line 304: Fix the typo in the error message string by replacing "writting"
with "writing" in the fmt.Errorf call that returns "aws-connector: Session
Storage writting operation failed: %w" (found in pkg/connector/account.go);
update that message so it reads "aws-connector: Session Storage writing
operation failed: %w" to correct the spelling while preserving the error wrap.
---
Nitpick comments:
In `@pkg/connector/account.go`:
- Line 293: Remove the dead assignment to assignmentsInput.NextToken (it is set
from assignmentsResp.NextToken) since the AWS SDK paginator manages pagination
and ignores changes to the input after initialization; locate the pagination
loop where assignmentsInput and assignmentsResp are used (the block that
iterates with the paginator) and delete the line setting
assignmentsInput.NextToken so the code relies solely on the paginator for
advancing pages.
- Around line 720-723: getPermissionSets currently swallows errors from
session.GetManyJSON while Grants returns on similar session errors, causing
inconsistent behavior; update getPermissionSets (and mirror in any SetManyJSON
usage) to handle session errors consistently with Grants by checking the error
returned from session.GetManyJSON/SetManyJSON (when ss != nil) and either return
the error up the stack (make it fatal like in Grants) or instead log the error
and continue (make Grants lenient) — pick one approach and apply it
consistently, referencing the session variable ss, the cachedDetails variable,
and the GetManyJSON/SetManyJSON calls so the change is applied in the correct
functions.
| PermissionSetIndex int `json:"psi"` | ||
| } | ||
|
|
||
| func encodeEntitlementsPageToken(state entitlementsPageState) (string, error) { |
There was a problem hiding this comment.
encodeGrantsPageToken/decodeGrantsPageToken and encodeEntitlementsPageToken/decodeEntitlementsPageToken follow the same pattern: JSON marshal → base64, and base64 → JSON unmarshal. Only the state type changes.
You can create a generic helper, such as:
func encodePageToken[T any](state T) (string, error) {
data, err := json.Marshal(state)
if err != nil { return "", err }
return base64.StdEncoding.EncodeToString(data), nil
}
There was a problem hiding this comment.
Also
func decodePageToken[T any](token string, state *T) error { ... }
| } | ||
|
|
||
| // getOrFetchPermissionSetIDs retrieves permission set IDs for an account, using cache if available. | ||
| func (o *accountResourceType) getOrFetchPermissionSetIDs(ctx context.Context, ss sessions.SessionStore, accountID string) ([]string, error) { |
There was a problem hiding this comment.
I would replace getOrFetchAllPermissionSetIDs and getPermissionSetWithCache with this:
func getOrSetCache[T any](ctx context.Context, ss sessions.SessionStore, cacheKey string, fetch func(context.Context) (T, error)) (T, error) {
var zero T
if ss != nil {
cached, found, err := session.GetJSON[T](ctx, ss, cacheKey)
if err != nil {
return zero, fmt.Errorf("%s: %w", errSessionStorageRead, err)
}
if found {
return cached, nil
}
}
val, err := fetch(ctx)
if err != nil {
return zero, err
}
if ss != nil {
if err := session.SetJSON(ctx, ss, cacheKey, val); err != nil {
return zero, fmt.Errorf("%s: %w", errSessionStorageWrite, err)
}
}
return val, nil
}
| version, | ||
| cfg.Config, | ||
| connector.New, | ||
| connectorrunner.WithSessionStoreEnabled(), |
There was a problem hiding this comment.
i think we might want this https://github.com/ConductorOne/baton-lastpass/pull/4#discussion_r2822595562
pkg/connector/connector.go
Outdated
| } | ||
|
|
||
| func validateConfig(awsc *cfg.Aws) error { | ||
| if awsc.GetBool(cfg.UseAssumeField.FieldName) { |
There was a problem hiding this comment.
i think we can just do awsc.UseAssume here, same for RoleArn
pkg/connector/connector.go
Outdated
|
|
||
| config := Config{ | ||
| GlobalBindingExternalID: awsc.GetString(cfg.GlobalBindingExternalIdField.FieldName), | ||
| GlobalRegion: awsc.GetString(cfg.GlobalRegionField.FieldName), |
There was a problem hiding this comment.
same comment, can we access the value directly? awsc.GlobalRegion ?
agustin-conductor
left a comment
There was a problem hiding this comment.
I think we might be able to process accounts entitlements more than 1 permission set at the time, other than that, LGTM
| func validateConfig(ctx context.Context, v *viper.Viper) error { | ||
| if v.GetBool(UseAssumeField.FieldName) { | ||
| err := ValidateExternalId(v.GetString(ExternalIdField.FieldName)) | ||
| if v.GetBool(config.UseAssumeField.FieldName) { | ||
| err := connector.ValidateExternalID(v.GetString(config.ExternalIdField.FieldName)) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| err = connector.IsValidRoleARN(v.GetString(RoleArnField.FieldName)) | ||
| err = connector.IsValidRoleARN(v.GetString(config.RoleArnField.FieldName)) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Test/production validation divergence: This validateConfig always requires external-id when use-assume is set, but the production validateConfig in connector.go:206-221 only requires it for two-hop mode (when global-role-arn is set). Valid single-hop configs (IRSA → target role) would fail this test but pass in production. The test should mirror the production logic by only requiring external-id when global-role-arn is also set.
| func encodeGrantsPageToken(state grantsPageState) (string, error) { | ||
| data, err := json.Marshal(state) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| return base64.StdEncoding.EncodeToString(data), nil | ||
| } | ||
|
|
||
| func decodeGrantsPageToken(token string) (grantsPageState, error) { | ||
| if token == "" { | ||
| return grantsPageState{}, nil | ||
| } | ||
| data, err := base64.StdEncoding.DecodeString(token) | ||
| if err != nil { | ||
| return grantsPageState{}, err | ||
| } | ||
| var state grantsPageState | ||
| if err := json.Unmarshal(data, &state); err != nil { | ||
| return grantsPageState{}, err | ||
| } | ||
| return state, nil | ||
| } | ||
|
|
||
| func permissionSetIDsCacheKey(accountID string) string { | ||
| return permissionSetIDsCacheKeyPrefix + accountID | ||
| } | ||
|
|
||
| // entitlementsPageState tracks pagination state for the Entitlements function. | ||
| // This allows progress to be saved between calls, reducing the "blast radius" | ||
| // if a rate limit error occurs mid-sync. | ||
| type entitlementsPageState struct { | ||
| // PermissionSetIndex is the index into the cached permission set IDs list | ||
| PermissionSetIndex int `json:"psi"` | ||
| } | ||
|
|
||
| func encodeEntitlementsPageToken(state entitlementsPageState) (string, error) { | ||
| data, err := json.Marshal(state) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| return base64.StdEncoding.EncodeToString(data), nil | ||
| } | ||
|
|
||
| func decodeEntitlementsPageToken(token string) (entitlementsPageState, error) { | ||
| if token == "" { | ||
| return entitlementsPageState{}, nil | ||
| } | ||
| data, err := base64.StdEncoding.DecodeString(token) | ||
| if err != nil { | ||
| return entitlementsPageState{}, err | ||
| } | ||
| var state entitlementsPageState | ||
| if err := json.Unmarshal(data, &state); err != nil { | ||
| return entitlementsPageState{}, err | ||
| } | ||
| return state, nil | ||
| } |
There was a problem hiding this comment.
Duplicate encode/decode page token functions: encodeGrantsPageToken/decodeGrantsPageToken and encodeEntitlementsPageToken/decodeEntitlementsPageToken share identical JSON↔base64 logic. As @FeliLucero1 noted, generic helpers would eliminate this:
func encodePageToken[T any](state T) (string, error) {
data, err := json.Marshal(state)
if err != nil { return "", err }
return base64.StdEncoding.EncodeToString(data), nil
}
func decodePageToken[T any](token string) (T, error) {
var zero T
if token == "" { return zero, nil }
data, err := base64.StdEncoding.DecodeString(token)
if err != nil { return zero, err }
var state T
if err := json.Unmarshal(data, &state); err != nil { return zero, err }
return state, nil
}| func wrapAWSError(err error) error { | ||
| if err == nil { | ||
| return nil | ||
| } | ||
|
|
||
| var apiErr smithy.APIError | ||
| if errors.As(err, &apiErr) { | ||
| if _, isThrottle := awsThrottleErrorCodes[apiErr.ErrorCode()]; isThrottle { | ||
| return status.Error(codes.Unavailable, err.Error()) | ||
| } | ||
| } | ||
|
|
||
| return err | ||
| } |
There was a problem hiding this comment.
wrapAWSError breaks the error chain: status.Error(codes.Unavailable, err.Error()) converts the error to a string, so errors.As/errors.Is won't find the original AWS error after wrapping. The gRPC status package doesn't support %w-style wrapping. If the baton-sdk sync engine only inspects the gRPC status code, this is acceptable — but a comment documenting that assumption would prevent future confusion.
| resp, err := o.orgClient.ListAccounts(ctx, listAccountsInput) | ||
| if err != nil { | ||
| return nil, "", nil, fmt.Errorf("aws-connector: listAccounts failed: %w", err) | ||
| return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err)) |
There was a problem hiding this comment.
Inconsistent error prefix: This uses aws-connector: while the rest of the new code uses baton-aws:. Lines 195 and 198 also use the old prefix in debug messages.
| return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err)) | |
| return nil, nil, wrapAWSError(fmt.Errorf("baton-aws: listAccounts failed: %w", err)) |
| // getOrFetchPermissionSetIDs retrieves permission set IDs for an account, using cache if available. | ||
| func (o *accountResourceType) getOrFetchPermissionSetIDs(ctx context.Context, ss sessions.SessionStore, accountID string) ([]string, error) { | ||
| cacheKey := permissionSetIDsCacheKey(accountID) | ||
|
|
||
| // Try cache first | ||
| if ss != nil { | ||
| cached, found, err := session.GetJSON[[]string](ctx, ss, cacheKey) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("baton-aws: Session Storage reading operation failed: %w", err) | ||
| } | ||
| if found { | ||
| return cached, nil | ||
| } | ||
| } | ||
|
|
||
| // Fetch from AWS API | ||
| var permissionSetIDs []string | ||
| psBindingInput := &awsSsoAdmin.ListPermissionSetsProvisionedToAccountInput{ | ||
| AccountId: awsSdk.String(resource.Id.Resource), | ||
| AccountId: awsSdk.String(accountID), | ||
| InstanceArn: o.identityInstance.InstanceArn, | ||
| } | ||
|
|
||
| psBindingPaginator := awsSsoAdmin.NewListPermissionSetsProvisionedToAccountPaginator(o.ssoAdminClient, psBindingInput) | ||
| for { | ||
| psBindingsResp, err := psBindingPaginator.NextPage(ctx) | ||
| if err != nil { | ||
| return nil, "", nil, fmt.Errorf("aws-connector: ssoadmin.ListPermissionSetsProvisionedToAccount failed: %w", err) | ||
| return nil, wrapAWSError(fmt.Errorf("baton-aws: ssoadmin.ListPermissionSetsProvisionedToAccount failed: %w", err)) | ||
| } | ||
|
|
||
| for _, psId := range psBindingsResp.PermissionSets { | ||
| ps, err := o.getPermissionSet(ctx, psId) | ||
| if err != nil { | ||
| return nil, "", nil, fmt.Errorf("aws-connector: ssoadmin.DescribePermissionSet failed: %w", err) | ||
| } | ||
| permissionSetIDs = append(permissionSetIDs, psBindingsResp.PermissionSets...) | ||
| if !psBindingPaginator.HasMorePages() { | ||
| break | ||
| } | ||
| } | ||
|
|
||
| bindingName := &PermissionSetBinding{ | ||
| AccountID: resource.Id.Resource, | ||
| PermissionSetId: awsSdk.ToString(ps.PermissionSetArn), | ||
| } | ||
| var annos annotations.Annotations | ||
| annos.Update(&v2.V1Identifier{ | ||
| Id: bindingName.String(), | ||
| }) | ||
| entitlement := &v2.Entitlement{ | ||
| Id: bindingName.String(), | ||
| DisplayName: fmt.Sprintf("%s Permission Set", awsSdk.ToString(ps.Name)), | ||
| Description: awsSdk.ToString(ps.Description), | ||
| Resource: resource, | ||
| Purpose: v2.Entitlement_PURPOSE_VALUE_ASSIGNMENT, | ||
| GrantableTo: []*v2.ResourceType{resourceTypeSSOGroup, resourceTypeSSOUser}, | ||
| Annotations: annos, | ||
| } | ||
| // Cache the result | ||
| if ss != nil { | ||
| if err := session.SetJSON(ctx, ss, cacheKey, permissionSetIDs); err != nil { | ||
| return nil, fmt.Errorf("baton-aws: Session Storage writing operation failed: %w", err) | ||
| } | ||
| } | ||
|
|
||
| assignmentsInput := &awsSsoAdmin.ListAccountAssignmentsInput{ | ||
| AccountId: awsSdk.String(resource.Id.Resource), | ||
| InstanceArn: o.identityInstance.InstanceArn, | ||
| PermissionSetArn: ps.PermissionSetArn, | ||
| } | ||
| return permissionSetIDs, nil | ||
| } | ||
|
|
||
| // getOrFetchAllPermissionSetIDs retrieves all permission set IDs globally (not per-account), using cache if available. | ||
| // This is used by Entitlements() which needs all permission sets regardless of account association. | ||
| func (o *accountResourceType) getOrFetchAllPermissionSetIDs(ctx context.Context, ss sessions.SessionStore) ([]string, error) { | ||
| // Try cache first | ||
| if ss != nil { | ||
| cached, found, err := session.GetJSON[[]string](ctx, ss, allPermissionSetIDsCacheKey) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("baton-aws: Session Storage reading operation failed: %w", err) | ||
| } | ||
| if found { | ||
| return cached, nil | ||
| } | ||
| } | ||
|
|
||
| assignmentsPaginator := awsSsoAdmin.NewListAccountAssignmentsPaginator(o.ssoAdminClient, assignmentsInput) | ||
| for { | ||
| assignmentsResp, err := assignmentsPaginator.NextPage(ctx) | ||
| if err != nil { | ||
| return nil, "", nil, fmt.Errorf("aws-connector: ssoadmin.ListAccountAssignments failed: %w", err) | ||
| } | ||
|
|
||
| for _, assignment := range assignmentsResp.AccountAssignments { | ||
| switch assignment.PrincipalType { | ||
| case awsSsoAdminTypes.PrincipalTypeGroup: | ||
| groupARN := ssoGroupToARN(o.region, awsSdk.ToString(o.identityInstance.IdentityStoreId), awsSdk.ToString(assignment.PrincipalId)) | ||
| var groupAnnos annotations.Annotations | ||
| groupAnnos.Update(&v2.V1Identifier{ | ||
| Id: V1GrantID(entitlement.Id, groupARN), | ||
| }) | ||
| groupAnnos.Update(&v2.GrantExpandable{ | ||
| EntitlementIds: []string{ | ||
| fmt.Sprintf("%s:%s:%s", resourceTypeSSOGroup.Id, groupARN, groupMemberEntitlement), | ||
| }, | ||
| }) | ||
| rv = append(rv, &v2.Grant{ | ||
| Id: GrantID(entitlement, &v2.ResourceId{Resource: groupARN, ResourceType: resourceTypeSSOGroup.Id}), | ||
| Entitlement: entitlement, | ||
| Principal: &v2.Resource{ | ||
| Id: fmtResourceId(resourceTypeSSOGroup.Id, groupARN), | ||
| }, | ||
| Annotations: groupAnnos, | ||
| }) | ||
| case awsSsoAdminTypes.PrincipalTypeUser: | ||
| userARN := ssoUserToARN(o.region, awsSdk.ToString(o.identityInstance.IdentityStoreId), awsSdk.ToString(assignment.PrincipalId)) | ||
| var userAnnos annotations.Annotations | ||
| userAnnos.Update(&v2.V1Identifier{ | ||
| Id: V1GrantID(entitlement.Id, userARN), | ||
| }) | ||
| rv = append(rv, &v2.Grant{ | ||
| Id: GrantID(entitlement, &v2.ResourceId{Resource: userARN, ResourceType: resourceTypeSSOUser.Id}), | ||
| Entitlement: entitlement, | ||
| Principal: &v2.Resource{ | ||
| Id: fmtResourceId(resourceTypeSSOUser.Id, userARN), | ||
| }, | ||
| Annotations: userAnnos, | ||
| }) | ||
| } | ||
| } | ||
| assignmentsInput.NextToken = assignmentsResp.NextToken | ||
| if !assignmentsPaginator.HasMorePages() { | ||
| break | ||
| } | ||
| } // end pagination loop for assignments | ||
| } // end range ange psBindingsResp.PermissionSets | ||
| // Fetch from AWS API | ||
| var permissionSetIDs []string | ||
| input := &awsSsoAdmin.ListPermissionSetsInput{ | ||
| InstanceArn: o.identityInstance.InstanceArn, | ||
| } | ||
|
|
||
| if !psBindingPaginator.HasMorePages() { | ||
| paginator := awsSsoAdmin.NewListPermissionSetsPaginator(o.ssoAdminClient, input) | ||
| for { | ||
| resp, err := paginator.NextPage(ctx) | ||
| if err != nil { | ||
| return nil, wrapAWSError(fmt.Errorf("baton-aws: ssoadmin.ListPermissionSets failed: %w", err)) | ||
| } | ||
|
|
||
| permissionSetIDs = append(permissionSetIDs, resp.PermissionSets...) | ||
| if !paginator.HasMorePages() { | ||
| break | ||
| } | ||
| } // end pagination loop for permission set to account binding | ||
| } | ||
|
|
||
| // Cache the result | ||
| if ss != nil { | ||
| if err := session.SetJSON(ctx, ss, allPermissionSetIDsCacheKey, permissionSetIDs); err != nil { | ||
| return nil, fmt.Errorf("baton-aws: Session Storage writing operation failed: %w", err) | ||
| } | ||
| } | ||
|
|
||
| return permissionSetIDs, nil | ||
| } | ||
|
|
||
| // getPermissionSetWithCache fetches a permission set by ID, using cache if available. | ||
| func (o *accountResourceType) getPermissionSetWithCache(ctx context.Context, ss sessions.SessionStore, permissionSetId string) (*awsSsoAdminTypes.PermissionSet, error) { | ||
| cacheKey := permissionSetDetailKeyPrefix + permissionSetId | ||
|
|
||
| // Try cache first | ||
| if ss != nil { | ||
| cached, found, err := session.GetJSON[*awsSsoAdminTypes.PermissionSet](ctx, ss, cacheKey) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("baton-aws: Session Storage reading operation failed: %w", err) | ||
| } | ||
| if found { | ||
| return cached, nil | ||
| } | ||
| } | ||
|
|
||
| // Fetch from AWS API | ||
| ps, err := o.fetchPermissionSetFromAPI(ctx, permissionSetId) | ||
| if err != nil { | ||
| return nil, wrapAWSError(fmt.Errorf("baton-aws: ssoadmin.DescribePermissionSet failed: %w", err)) | ||
| } | ||
|
|
||
| // Cache the result | ||
| if ss != nil { | ||
| if err := session.SetJSON(ctx, ss, cacheKey, ps); err != nil { | ||
| return nil, fmt.Errorf("baton-aws: Session Storage writing operation failed: %w", err) | ||
| } | ||
| } | ||
|
|
||
| return ps, nil | ||
| } |
There was a problem hiding this comment.
Duplicate cache-or-fetch pattern + cache errors as hard failures: getOrFetchPermissionSetIDs, getOrFetchAllPermissionSetIDs, and getPermissionSetWithCache all repeat the same check-cache→fetch→store pattern. As @FeliLucero1 suggested, a generic getOrSetCache[T] helper would consolidate this.
Additionally, session store read/write errors are treated as hard errors that abort the sync (e.g., line 397, 440, 484). A cache failure should degrade gracefully — log a warning and fall through to the API fetch. Cache write failures (lines 427, 469, 501) similarly abort when the data was already successfully fetched from AWS.
| externalIDLengthMaximum = 65 | ||
| externalIDLengthMinimum = 32 |
There was a problem hiding this comment.
Dropped TODO + off-by-one: The original constant in pkg/config/config.go had // TODO(marcos): this might be a bug. which was dropped during the move. The off-by-one is still present: externalIDLengthMaximum = 65 with > 65 check allows 65-byte IDs, but the error message at line 201 says "between 32 and 64 bytes". Per the PR discussion this is considered risky to change, but the TODO should be preserved.
| externalIDLengthMaximum = 65 | |
| externalIDLengthMinimum = 32 | |
| externalIDLengthMaximum = 65 // TODO(marcos): this might be a bug. Error message says max 64 but this allows 65. | |
| externalIDLengthMinimum = 32 |
| return nil | ||
| } | ||
|
|
||
| func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) { |
There was a problem hiding this comment.
Unused parameter: opts *cli.ConnectorOpts is never referenced in the function body. If required by the ConnectorBuilderV2 interface contract, rename to _.
| func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) { | |
| func New(ctx context.Context, awsc *cfg.Aws, _ *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) { |
| func (o *accountResourceType) Entitlements(ctx context.Context, resource *v2.Resource, opts resourceSdk.SyncOpAttrs) ([]*v2.Entitlement, *resourceSdk.SyncOpResults, error) { | ||
| // Parse page state from token | ||
| pageState, err := decodeEntitlementsPageToken(opts.PageToken.Token) | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("baton-aws: failed to decode page token: %w", err) | ||
| } | ||
|
|
||
| // Step 1: Get or fetch all permission set IDs (cached globally per sync) | ||
| // We fetch all permission sets so that even on accounts that aren't associated with a permission set | ||
| // you could do a Grant Request for it -- and we'll just have an entitlement with zero entries in it in ListGrants. | ||
| allPS, err := o.getPermissionSets(ctx) | ||
| permissionSetIDs, err := o.getOrFetchAllPermissionSetIDs(ctx, opts.Session) | ||
| if err != nil { | ||
| return nil, "", nil, fmt.Errorf("aws-connector: getPermissionSets failed: %w", err) | ||
| return nil, nil, err | ||
| } | ||
| rv := make([]*v2.Entitlement, 0, len(allPS)) | ||
| for _, ps := range allPS { | ||
| b := &PermissionSetBinding{ | ||
| AccountID: resource.Id.Resource, | ||
| PermissionSetId: awsSdk.ToString(ps.PermissionSetArn), | ||
|
|
||
| // If no permission sets, we're done | ||
| if len(permissionSetIDs) == 0 { | ||
| return nil, nil, nil | ||
| } | ||
|
|
||
| // If we've processed all permission sets, we're done | ||
| if pageState.PermissionSetIndex >= len(permissionSetIDs) { | ||
| return nil, nil, nil | ||
| } | ||
|
|
||
| // Step 2: Get the current permission set to process | ||
| currentPsId := permissionSetIDs[pageState.PermissionSetIndex] | ||
|
|
||
| // Step 3: Fetch the permission set details (with caching) | ||
| ps, err := o.getPermissionSetWithCache(ctx, opts.Session, currentPsId) | ||
| if err != nil { | ||
| return nil, nil, err | ||
| } | ||
|
|
||
| // Step 4: Build the entitlement for this permission set | ||
| b := &PermissionSetBinding{ | ||
| AccountID: resource.Id.Resource, | ||
| PermissionSetId: awsSdk.ToString(ps.PermissionSetArn), | ||
| } | ||
| var annos annotations.Annotations | ||
| annos.Update(&v2.V1Identifier{ | ||
| Id: b.String(), | ||
| }) | ||
| displayName := fmt.Sprintf("%s Permission Set", awsSdk.ToString(ps.Name)) | ||
| member := entitlementSdk.NewAssignmentEntitlement(resource, displayName, | ||
| entitlementSdk.WithGrantableTo(resourceTypeSSOUser, resourceTypeSSOGroup), | ||
| ) | ||
| member.Description = awsSdk.ToString(ps.Description) | ||
| member.Annotations = annos | ||
| member.Id = b.String() | ||
| member.Slug = fmt.Sprintf("%s access", awsSdk.ToString(ps.Name)) | ||
|
|
||
| // Step 5: Determine next page state | ||
| var nextPageToken string | ||
| if pageState.PermissionSetIndex+1 < len(permissionSetIDs) { | ||
| // More permission sets to process | ||
| nextState := entitlementsPageState{ | ||
| PermissionSetIndex: pageState.PermissionSetIndex + 1, | ||
| } | ||
|
|
||
| nextPageToken, err = encodeEntitlementsPageToken(nextState) | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("baton-aws: failed to encode page token: %w", err) | ||
| } | ||
| var annos annotations.Annotations | ||
| annos.Update(&v2.V1Identifier{ | ||
| Id: b.String(), | ||
| }) | ||
| displayName := fmt.Sprintf("%s Permission Set", awsSdk.ToString(ps.Name)) | ||
| member := entitlementSdk.NewAssignmentEntitlement(resource, displayName, | ||
| entitlementSdk.WithGrantableTo(resourceTypeSSOUser, resourceTypeSSOGroup), | ||
| ) | ||
| member.Description = awsSdk.ToString(ps.Description) | ||
| member.Annotations = annos | ||
| member.Id = b.String() | ||
| member.Slug = fmt.Sprintf("%s access", awsSdk.ToString(ps.Name)) | ||
| rv = append(rv, member) | ||
| } | ||
| return rv, "", nil, nil | ||
| // else: nextPageToken remains empty, signaling completion | ||
|
|
||
| if nextPageToken != "" { | ||
| return []*v2.Entitlement{member}, &resourceSdk.SyncOpResults{NextPageToken: nextPageToken}, nil | ||
| } | ||
| return []*v2.Entitlement{member}, nil, nil | ||
| } |
There was a problem hiding this comment.
Entitlements emits one item per page: This processes exactly one permission set per call, returning a single-element slice. For accounts with many permission sets, this means N round-trips through the SDK pagination loop with per-call overhead. As @agustin-conductor noted, consider batching multiple permission sets per page (e.g., 10-25) to reduce overhead while still keeping the blast radius small on rate limits.
|
Claude finished @laurenleach's task in 3m 22s —— View job PR Review: CXP-116 Containerization and Refactor of Caches
Issues: 8 Breaking ChangesNone. Issues
DocumentationUp to date. The |
| if v.GetBool(config.UseAssumeField.FieldName) { | ||
| err := connector.ValidateExternalID(v.GetString(config.ExternalIdField.FieldName)) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| err = connector.IsValidRoleARN(v.GetString(RoleArnField.FieldName)) | ||
| err = connector.IsValidRoleARN(v.GetString(config.RoleArnField.FieldName)) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Test/production validation divergence: This validateConfig always requires external-id when use-assume is set, but the production validateConfig in connector.go:206-221 only requires it for two-hop mode (when global-role-arn is set). A valid single-hop config (--use-assume --role-arn <arn> without --external-id) passes in production but fails this test.
The test should mirror the production validation logic. Fix this →
| func encodeGrantsPageToken(state grantsPageState) (string, error) { | ||
| data, err := json.Marshal(state) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| return base64.StdEncoding.EncodeToString(data), nil | ||
| } | ||
|
|
||
| func decodeGrantsPageToken(token string) (grantsPageState, error) { | ||
| if token == "" { | ||
| return grantsPageState{}, nil | ||
| } | ||
| data, err := base64.StdEncoding.DecodeString(token) | ||
| if err != nil { | ||
| return grantsPageState{}, err | ||
| } | ||
| var state grantsPageState | ||
| if err := json.Unmarshal(data, &state); err != nil { | ||
| return grantsPageState{}, err | ||
| } | ||
| return state, nil | ||
| } | ||
|
|
||
| func permissionSetIDsCacheKey(accountID string) string { | ||
| return permissionSetIDsCacheKeyPrefix + accountID | ||
| } | ||
|
|
||
| // entitlementsPageState tracks pagination state for the Entitlements function. | ||
| // This allows progress to be saved between calls, reducing the "blast radius" | ||
| // if a rate limit error occurs mid-sync. | ||
| type entitlementsPageState struct { | ||
| // PermissionSetIndex is the index into the cached permission set IDs list | ||
| PermissionSetIndex int `json:"psi"` | ||
| } | ||
|
|
||
| func encodeEntitlementsPageToken(state entitlementsPageState) (string, error) { | ||
| data, err := json.Marshal(state) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| return base64.StdEncoding.EncodeToString(data), nil | ||
| } | ||
|
|
||
| func decodeEntitlementsPageToken(token string) (entitlementsPageState, error) { | ||
| if token == "" { | ||
| return entitlementsPageState{}, nil | ||
| } | ||
| data, err := base64.StdEncoding.DecodeString(token) | ||
| if err != nil { | ||
| return entitlementsPageState{}, err | ||
| } | ||
| var state entitlementsPageState | ||
| if err := json.Unmarshal(data, &state); err != nil { | ||
| return entitlementsPageState{}, err | ||
| } | ||
| return state, nil | ||
| } |
There was a problem hiding this comment.
Duplicate encode/decode page token functions: encodeGrantsPageToken/decodeGrantsPageToken and encodeEntitlementsPageToken/decodeEntitlementsPageToken share identical JSON↔base64 logic — only the type parameter differs. As @FeliLucero1 noted, generic helpers would eliminate this duplication:
func encodePageToken[T any](state T) (string, error) {
data, err := json.Marshal(state)
if err != nil { return "", err }
return base64.StdEncoding.EncodeToString(data), nil
}
func decodePageToken[T any](token string) (T, error) {
var zero T
if token == "" { return zero, nil }
data, err := base64.StdEncoding.DecodeString(token)
if err != nil { return zero, err }
var state T
if err := json.Unmarshal(data, &state); err != nil { return zero, err }
return state, nil
}| // wrapAWSError checks if the error is an AWS throttling error and wraps it | ||
| // with codes.Unavailable so the baton-sdk sync engine can handle rate limiting. | ||
| func wrapAWSError(err error) error { | ||
| if err == nil { | ||
| return nil | ||
| } | ||
|
|
||
| var apiErr smithy.APIError | ||
| if errors.As(err, &apiErr) { | ||
| if _, isThrottle := awsThrottleErrorCodes[apiErr.ErrorCode()]; isThrottle { | ||
| return status.Error(codes.Unavailable, err.Error()) | ||
| } | ||
| } | ||
|
|
||
| return err | ||
| } |
There was a problem hiding this comment.
wrapAWSError breaks the error chain: status.Error(codes.Unavailable, err.Error()) converts the error to a string, so errors.As/errors.Is won't find the original AWS error after wrapping. The gRPC status package doesn't support %w-style wrapping. If the baton-sdk sync engine only inspects the gRPC status code to detect rate limits, this is fine — but add a comment documenting that assumption to prevent future confusion.
| resp, err := o.orgClient.ListAccounts(ctx, listAccountsInput) | ||
| if err != nil { | ||
| return nil, "", nil, fmt.Errorf("aws-connector: listAccounts failed: %w", err) | ||
| return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err)) |
There was a problem hiding this comment.
Inconsistent error prefix: Three occurrences of the old aws-connector: prefix remain in this function (lines 181, 195, 198) while all other new code in this file uses baton-aws:. Standardize to baton-aws: for consistency and easier log grep.
| return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err)) | |
| return nil, nil, wrapAWSError(fmt.Errorf("baton-aws: listAccounts failed: %w", err)) |
| // getPermissionSetWithCache fetches a permission set by ID, using cache if available. | ||
| func (o *accountResourceType) getPermissionSetWithCache(ctx context.Context, ss sessions.SessionStore, permissionSetId string) (*awsSsoAdminTypes.PermissionSet, error) { | ||
| cacheKey := permissionSetDetailKeyPrefix + permissionSetId | ||
|
|
||
| // Try cache first | ||
| if ss != nil { | ||
| cached, found, err := session.GetJSON[*awsSsoAdminTypes.PermissionSet](ctx, ss, cacheKey) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("baton-aws: Session Storage reading operation failed: %w", err) | ||
| } | ||
| if found { | ||
| return cached, nil | ||
| } | ||
| } | ||
|
|
||
| // Fetch from AWS API | ||
| ps, err := o.fetchPermissionSetFromAPI(ctx, permissionSetId) | ||
| if err != nil { | ||
| return nil, wrapAWSError(fmt.Errorf("baton-aws: ssoadmin.DescribePermissionSet failed: %w", err)) | ||
| } | ||
|
|
||
| // Cache the result | ||
| if ss != nil { | ||
| if err := session.SetJSON(ctx, ss, cacheKey, ps); err != nil { | ||
| return nil, fmt.Errorf("baton-aws: Session Storage writing operation failed: %w", err) | ||
| } | ||
| } | ||
|
|
||
| return ps, nil | ||
| } |
There was a problem hiding this comment.
Duplicate cache-or-fetch pattern + cache errors as hard failures: getOrFetchPermissionSetIDs, getOrFetchAllPermissionSetIDs, and getPermissionSetWithCache all repeat the same check-cache→fetch→store pattern. As @FeliLucero1 suggested, a generic getOrSetCache[T] helper would consolidate this.
Additionally, session store read/write errors are treated as hard errors that abort the sync. A cache failure is not a fundamental API failure — the data can still be fetched from AWS. Consider logging cache errors at warn level and falling through to the API fetch, so a corrupted cache entry doesn't block the entire sync. Cache write failures (lines 427, 469, 501) similarly abort after the data was already successfully fetched from AWS.
| externalIDLengthMaximum = 65 | ||
| externalIDLengthMinimum = 32 |
There was a problem hiding this comment.
Dropped TODO + off-by-one: The original constant in pkg/config/config.go had // TODO(marcos): this might be a bug. which was dropped during the move. The off-by-one is still present: externalIDLengthMaximum = 65 with > 65 check allows 65-byte IDs, but the error message at line 201 says "between 32 and 64 bytes". Per the earlier PR discussion this is considered risky to change, but the TODO should be preserved so the discrepancy isn't forgotten.
| externalIDLengthMaximum = 65 | |
| externalIDLengthMinimum = 32 | |
| externalIDLengthMaximum = 65 // TODO(marcos): this might be a bug. Error message says max 64 but this allows 65. | |
| externalIDLengthMinimum = 32 |
| return nil | ||
| } | ||
|
|
||
| func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) { |
There was a problem hiding this comment.
Unused parameter: opts *cli.ConnectorOpts is never referenced in the function body. If required by the ConnectorBuilderV2 interface contract, rename to _ to make this explicit.
| func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) { | |
| func New(ctx context.Context, awsc *cfg.Aws, _ *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) { |
| func (o *accountResourceType) Entitlements(ctx context.Context, resource *v2.Resource, opts resourceSdk.SyncOpAttrs) ([]*v2.Entitlement, *resourceSdk.SyncOpResults, error) { | ||
| // Parse page state from token | ||
| pageState, err := decodeEntitlementsPageToken(opts.PageToken.Token) | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("baton-aws: failed to decode page token: %w", err) | ||
| } | ||
|
|
||
| // Step 1: Get or fetch all permission set IDs (cached globally per sync) | ||
| // We fetch all permission sets so that even on accounts that aren't associated with a permission set | ||
| // you could do a Grant Request for it -- and we'll just have an entitlement with zero entries in it in ListGrants. | ||
| allPS, err := o.getPermissionSets(ctx) | ||
| permissionSetIDs, err := o.getOrFetchAllPermissionSetIDs(ctx, opts.Session) | ||
| if err != nil { | ||
| return nil, "", nil, fmt.Errorf("aws-connector: getPermissionSets failed: %w", err) | ||
| return nil, nil, err | ||
| } | ||
| rv := make([]*v2.Entitlement, 0, len(allPS)) | ||
| for _, ps := range allPS { | ||
| b := &PermissionSetBinding{ | ||
| AccountID: resource.Id.Resource, | ||
| PermissionSetId: awsSdk.ToString(ps.PermissionSetArn), | ||
|
|
||
| // If no permission sets, we're done | ||
| if len(permissionSetIDs) == 0 { | ||
| return nil, nil, nil | ||
| } | ||
|
|
||
| // If we've processed all permission sets, we're done | ||
| if pageState.PermissionSetIndex >= len(permissionSetIDs) { | ||
| return nil, nil, nil | ||
| } | ||
|
|
||
| // Step 2: Get the current permission set to process | ||
| currentPsId := permissionSetIDs[pageState.PermissionSetIndex] | ||
|
|
||
| // Step 3: Fetch the permission set details (with caching) | ||
| ps, err := o.getPermissionSetWithCache(ctx, opts.Session, currentPsId) | ||
| if err != nil { | ||
| return nil, nil, err | ||
| } | ||
|
|
||
| // Step 4: Build the entitlement for this permission set | ||
| b := &PermissionSetBinding{ | ||
| AccountID: resource.Id.Resource, | ||
| PermissionSetId: awsSdk.ToString(ps.PermissionSetArn), | ||
| } | ||
| var annos annotations.Annotations | ||
| annos.Update(&v2.V1Identifier{ | ||
| Id: b.String(), | ||
| }) | ||
| displayName := fmt.Sprintf("%s Permission Set", awsSdk.ToString(ps.Name)) | ||
| member := entitlementSdk.NewAssignmentEntitlement(resource, displayName, | ||
| entitlementSdk.WithGrantableTo(resourceTypeSSOUser, resourceTypeSSOGroup), | ||
| ) | ||
| member.Description = awsSdk.ToString(ps.Description) | ||
| member.Annotations = annos | ||
| member.Id = b.String() | ||
| member.Slug = fmt.Sprintf("%s access", awsSdk.ToString(ps.Name)) | ||
|
|
||
| // Step 5: Determine next page state | ||
| var nextPageToken string | ||
| if pageState.PermissionSetIndex+1 < len(permissionSetIDs) { | ||
| // More permission sets to process | ||
| nextState := entitlementsPageState{ | ||
| PermissionSetIndex: pageState.PermissionSetIndex + 1, | ||
| } | ||
|
|
||
| nextPageToken, err = encodeEntitlementsPageToken(nextState) | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("baton-aws: failed to encode page token: %w", err) | ||
| } | ||
| var annos annotations.Annotations | ||
| annos.Update(&v2.V1Identifier{ | ||
| Id: b.String(), | ||
| }) | ||
| displayName := fmt.Sprintf("%s Permission Set", awsSdk.ToString(ps.Name)) | ||
| member := entitlementSdk.NewAssignmentEntitlement(resource, displayName, | ||
| entitlementSdk.WithGrantableTo(resourceTypeSSOUser, resourceTypeSSOGroup), | ||
| ) | ||
| member.Description = awsSdk.ToString(ps.Description) | ||
| member.Annotations = annos | ||
| member.Id = b.String() | ||
| member.Slug = fmt.Sprintf("%s access", awsSdk.ToString(ps.Name)) | ||
| rv = append(rv, member) | ||
| } | ||
| return rv, "", nil, nil | ||
| // else: nextPageToken remains empty, signaling completion | ||
|
|
||
| if nextPageToken != "" { | ||
| return []*v2.Entitlement{member}, &resourceSdk.SyncOpResults{NextPageToken: nextPageToken}, nil | ||
| } | ||
| return []*v2.Entitlement{member}, nil, nil | ||
| } |
There was a problem hiding this comment.
Entitlements emits one item per page: This processes exactly one permission set per call, returning a single-element slice. For accounts with many permission sets, this means N round-trips through the SDK pagination loop with per-call serialization/deserialization overhead. As @agustin-conductor noted, consider batching multiple permission sets per page (e.g., 10-25) to reduce overhead while still keeping the blast radius small on rate limits.
| func wrapAWSError(err error) error { | ||
| if err == nil { | ||
| return nil | ||
| } | ||
|
|
||
| var apiErr smithy.APIError | ||
| if errors.As(err, &apiErr) { | ||
| if _, isThrottle := awsThrottleErrorCodes[apiErr.ErrorCode()]; isThrottle { | ||
| return status.Error(codes.Unavailable, err.Error()) | ||
| } | ||
| } | ||
|
|
||
| return err |
There was a problem hiding this comment.
wrapAWSError only used in account.go: This function was introduced in this PR but is only applied in account.go. All other builders (account_iam.go, iam_group.go, iam_user.go, role.go, secrets.go, sso_group.go, sso_user.go) make AWS API calls without throttle wrapping, so rate limit errors from IAM/SSO/IdentityStore calls will not be surfaced as codes.Unavailable for the SDK sync engine to handle. This significantly reduces the value of the throttle wrapping since the most common rate-limited calls (ListUsers, ListGroups, ListGroupMemberships) are unprotected.
…udes all resource types
|
Claude finished @laurenleach's task in 8m 6s —— View job PR Review: CXP-116 Containerization and Refactor of Caches
Issues: 8 Breaking ChangesNone. Issues
DocumentationUp to date. No resource types, config fields, or capabilities were added/removed. The |
| err = connector.IsValidRoleARN(v.GetString(config.RoleArnField.FieldName)) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Test/production validation divergence: This validateConfig always requires external-id when use-assume is set, but the production validateConfig in connector.go:206-221 only requires it for two-hop mode (when global-role-arn is set). Valid single-hop configs pass in production but fail this test. The test should mirror the production logic by only requiring external-id when global-role-arn is also set.
| resp, err := o.orgClient.ListAccounts(ctx, listAccountsInput) | ||
| if err != nil { | ||
| return nil, "", nil, fmt.Errorf("aws-connector: listAccounts failed: %w", err) | ||
| return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err)) |
There was a problem hiding this comment.
Inconsistent error prefix: This uses aws-connector: while the rest of the new code uses baton-aws:. Lines 195 and 198 also use the old prefix in debug messages. Standardize to baton-aws: for consistency.
| return nil, nil, wrapAWSError(fmt.Errorf("aws-connector: listAccounts failed: %w", err)) | |
| return nil, nil, wrapAWSError(fmt.Errorf("baton-aws: listAccounts failed: %w", err)) |
| return entitlementsPageState{}, err | ||
| } | ||
| return state, nil | ||
| } |
There was a problem hiding this comment.
Duplicate encode/decode page token functions: encodeGrantsPageToken/decodeGrantsPageToken and encodeEntitlementsPageToken/decodeEntitlementsPageToken share identical JSON↔base64 logic — only the type parameter differs. As @FeliLucero1 noted, generic helpers would eliminate this duplication:
func encodePageToken[T any](state T) (string, error) {
data, err := json.Marshal(state)
if err != nil { return "", err }
return base64.StdEncoding.EncodeToString(data), nil
}
func decodePageToken[T any](token string) (T, error) {
var zero T
if token == "" { return zero, nil }
data, err := base64.StdEncoding.DecodeString(token)
if err != nil { return zero, err }
var state T
if err := json.Unmarshal(data, &state); err != nil { return zero, err }
return state, nil
}| } | ||
| } | ||
|
|
||
| return err |
There was a problem hiding this comment.
wrapAWSError only applied in account.go: This function was introduced in this PR but is only used in account.go. All other builders (account_iam.go, iam_group.go, iam_user.go, role.go, secrets.go, sso_group.go, sso_user.go) make AWS API calls without throttle wrapping. Rate limit errors from IAM/SSO/IdentityStore calls in those files will not be surfaced as codes.Unavailable for the SDK sync engine to handle. This significantly reduces the value of the throttle wrapping since common rate-limited calls (ListUsers, ListGroups, ListGroupMemberships) are unprotected.
Also note: status.Error(codes.Unavailable, err.Error()) converts the error to a string, breaking the error chain for errors.As/errors.Is. If the baton-sdk sync engine only inspects the gRPC status code, this is fine — but a comment documenting that assumption would prevent future confusion.
| PermissionSetArn: ps.PermissionSetArn, | ||
| } | ||
| return permissionSetIDs, nil | ||
| } |
There was a problem hiding this comment.
Duplicate cache-or-fetch pattern + cache errors as hard failures: getOrFetchPermissionSetIDs, getOrFetchAllPermissionSetIDs (line 436), and getPermissionSetWithCache (line 477) all repeat the same check-cache→fetch→store pattern. As @FeliLucero1 suggested, a generic getOrSetCache[T] helper would consolidate this.
Additionally, session store read/write errors are treated as hard errors that abort the sync. A cache failure is not a fundamental API failure — the data can still be fetched from AWS. Consider logging cache errors at warn level and falling through to the API fetch, so a corrupted cache entry doesn't block the entire sync. Cache write failures (lines 427, 469, 501) similarly abort after the data was already successfully fetched.
| return []*v2.Entitlement{member}, &resourceSdk.SyncOpResults{NextPageToken: nextPageToken}, nil | ||
| } | ||
| return []*v2.Entitlement{member}, nil, nil | ||
| } |
There was a problem hiding this comment.
Entitlements emits one item per page: This processes exactly one permission set per call, returning a single-element slice. For accounts with many permission sets, this means N round-trips through the SDK pagination loop with per-call serialization/deserialization overhead. As @agustin-conductor noted, consider batching multiple permission sets per page (e.g., 10-25) to reduce overhead while still keeping the blast radius small on rate limits.
| externalIDLengthMaximum = 65 | ||
| externalIDLengthMinimum = 32 |
There was a problem hiding this comment.
Dropped TODO + off-by-one: The original constant in pkg/config/config.go had // TODO(marcos): this might be a bug. which was dropped during the move. The off-by-one is still present: externalIDLengthMaximum = 65 with > 65 check allows 65-byte IDs, but the error message at line 201 says "between 32 and 64 bytes". Per the earlier PR discussion this is considered risky to change, but the TODO should be preserved.
| externalIDLengthMaximum = 65 | |
| externalIDLengthMinimum = 32 | |
| externalIDLengthMaximum = 65 // TODO(marcos): this might be a bug. Error message says max 64 but this allows 65. | |
| externalIDLengthMinimum = 32 |
| return nil | ||
| } | ||
|
|
||
| func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) { |
There was a problem hiding this comment.
Unused parameter: opts *cli.ConnectorOpts is never referenced in the function body. If required by the ConnectorBuilderV2 interface contract, rename to _ to make this explicit.
| func New(ctx context.Context, awsc *cfg.Aws, opts *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) { | |
| func New(ctx context.Context, awsc *cfg.Aws, _ *cli.ConnectorOpts) (connectorbuilder.ConnectorBuilderV2, []connectorbuilder.Opt, error) { |
| if !psBindingPaginator.HasMorePages() { | ||
| break | ||
| } | ||
| } |
There was a problem hiding this comment.
Pagination loops missing ctx.Done() check: Both getOrFetchPermissionSetIDs (this function) and getOrFetchAllPermissionSetIDs (line 454) contain inner pagination loops that do not check ctx.Done(). If the context is cancelled (e.g., timeout or shutdown), the loop will continue iterating until the next API call happens to fail.
Add a context cancellation check at the top of each loop body:
for {
select {
case <-ctx.Done():
return nil, ctx.Err()
default:
}
// ... existing code
}
Updates the connector for it to be able to containerize.
This applies changes to the structure, the workflows and refactors the use of caches on the connector. Also implements ResourceSyncerV2.
Will require extensive testing and validations.
Summary by CodeRabbit