
chore(pulumi): remove orphan CI IAM users from all stacks (#212) #214

Merged
aatchison merged 2 commits into main from feat/oidc-migration-212
Apr 22, 2026

Conversation

@aatchison
Contributor

@aatchison aatchison commented Apr 21, 2026

Summary

Removes the tb_pulumi.iam.UserWithAccessKey CI user block from StackAccessPolicies.on_apply across all three stacks (dev, stage, prod).

Investigation findings (thunderbird/platform-infrastructure#212):

  • CloudTrail confirmed zero activity over 90 days for all three users in both eu-central-1 and us-east-1
  • No workflow in thunderbird/mailstrom/.github/workflows/ references AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY — the two workflows present (validate.yml, nightly-integration-tests.yml) don't touch AWS
  • gh search code --owner thunderbird "mailstrom-ci" returned no results
  • Outcome: orphan users — provisioned for a CI flow that was never built
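
For anyone wanting to re-run the CloudTrail check, it could be sketched roughly as below. This is not the exact command used in the investigation: the helper name `count_user_events` is hypothetical, `date -d` assumes GNU date, and an AWS profile flag (such as the `mzla-legacy` profile used later in this description) may be needed depending on local credentials.

```shell
# Hypothetical helper: count CloudTrail management events attributed to one
# IAM user in one region over the last 90 days (the lookup-events window).
count_user_events() {
  local user="$1" region="$2" start
  # GNU date syntax; on BSD/macOS use: date -u -v-90d +%Y-%m-%dT%H:%M:%SZ
  start="$(date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%SZ)"
  aws cloudtrail lookup-events \
    --region "$region" \
    --lookup-attributes "AttributeKey=Username,AttributeValue=${user}" \
    --start-time "$start" \
    --query 'length(Events)' \
    --output json
}

# Usage (requires AWS credentials):
#   for U in mailstrom-dev-ci mailstrom-stage-ci mailstrom-prod-ci; do
#     for R in eu-central-1 us-east-1; do
#       echo "$U $R $(count_user_events "$U" "$R")"
#     done
#   done
```

A result of 0 for every user and region pair would match the "90-day events" column in the table of users being removed.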

Users being removed:

| User | ARN | Key ID | Created | 90-day events |
|---|---|---|---|---|
| mailstrom-dev-ci | arn:aws:iam::768512802988:user/mailstrom-dev-ci | AKIA3F3XOYCWN7CIZV6E | 2026-04-09 | 0 |
| mailstrom-stage-ci | arn:aws:iam::768512802988:user/mailstrom-stage-ci | AKIA3F3XOYCWN7CBCRFO | 2025-09-17 | 0 |
| mailstrom-prod-ci | arn:aws:iam::768512802988:user/mailstrom-prod-ci | AKIA3F3XOYCWAIUIZPF3 | 2025-09-05 | 0 |

All three users confirmed in Pulumi state via pulumi stack export (9 child resources each: user, access key, group membership, Secrets Manager secret/version, IAM policy/attachment).

Pre-existing issues — do not run a full pulumi up

Two separate pre-existing problems surface during preview. Neither is introduced by this PR.

1. Pre-existing state drift (all stacks)

Running pulumi preview without --target shows 58–88 unexpected resource deletions per stack, including EC2 instances, security group rules, and IAM group policy attachments that are unrelated to the CI user. This is pre-existing Pulumi state drift — these resources exist in state but the program no longer produces them (likely due to tb_pulumi API changes since the last pulumi up).

Do not run a full pulumi up until this drift is investigated separately. A targeted destroy (below) avoids touching these resources entirely.
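
One way to guard against applying an unexpectedly large plan is to check the delete count from a JSON preview first. The sketch below is illustrative only: `planned_deletes` and `assert_max_deletes` are hypothetical names, it needs `jq`, and it assumes the output of `pulumi preview --json` carries a `changeSummary` object with a `delete` count.

```shell
# Hypothetical helper: read `pulumi preview --json` output on stdin and print
# the number of planned deletes (0 if the summary has no delete entry).
planned_deletes() {
  jq '.changeSummary.delete // 0'
}

# Hypothetical guard: fail if the preview on stdin plans more deletes than
# the caller expects.
assert_max_deletes() {
  local max="$1" n
  n="$(planned_deletes)" || return 2
  if [ "$n" -gt "$max" ]; then
    echo "preview plans ${n} deletes, expected at most ${max}" >&2
    return 1
  fi
}

# Usage (EXPECTED is whatever count the operator has verified by hand):
#   pulumi preview --json --target "$URN" --target-dependents \
#     | assert_max_deletes "$EXPECTED" && echo "plan size looks sane"
```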

2. Provider mismatch without region env vars (dev/prod)

Without AWS_DEFAULT_REGION/AWS_REGION=eu-central-1, dev and prod fail immediately:

error: rpc error: code = Unknown desc = ["region"]: provider mismatch
(kind:DELETE inputDiff:true != kind:DELETE_REPLACE inputDiff:true)

Setting the region env vars resolves this. All apply commands below include them.
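
A small pre-flight check can catch a missing region before Pulumi does. This is a sketch only; `require_region` is a hypothetical helper, not something that exists in the repo.

```shell
# Hypothetical pre-flight check: refuse to continue unless both region
# variables are set to the value the stacks expect.
require_region() {
  local want="${1:-eu-central-1}"
  if [ "${AWS_REGION:-}" != "$want" ] || [ "${AWS_DEFAULT_REGION:-}" != "$want" ]; then
    echo "AWS_REGION and AWS_DEFAULT_REGION must both be ${want}" >&2
    return 1
  fi
}

# Usage:
#   require_region && pulumi preview
```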

Merge checklist

  • Deactivate all three access keys (precaution before merge)
  • Review and merge this PR
  • Apply targeted destroy per stack — do not omit --target (avoids the pre-existing drift above):
    # In /workspaces/mailstrom/pulumi, with PULUMI_ACCESS_TOKEN set:
    for STACK in dev stage prod; do
      pulumi stack select mzla-services/mailstrom/$STACK
      AWS_DEFAULT_REGION=eu-central-1 AWS_REGION=eu-central-1 \
      TBPULUMI_DISABLE_PROTECTION=True PULUMI_CONFIG_PASSPHRASE='' \
      pulumi up -y \
        --target "urn:pulumi:${STACK}::mailstrom::tb:iam:UserWithAccessKey::mailstrom-${STACK}-ci" \
        --target-dependents
    done
  • Confirm NoSuchEntity for each user:
    for U in mailstrom-dev-ci mailstrom-stage-ci mailstrom-prod-ci; do
      aws iam get-user --user-name "$U" --profile mzla-legacy 2>&1
    done
  • Open a follow-up issue for the pre-existing state drift (separate from this PR's scope)
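
The first checklist item has no command attached. A minimal sketch of the deactivation step, assuming the user names and key IDs from the table in this description, might look like the following. `deactivate_key` is a hypothetical wrapper, and a `--profile` flag may be needed as in the `get-user` check.

```shell
# Hypothetical helper for the first checklist item: mark one access key
# Inactive, which is reversible, before the user is destroyed.
deactivate_key() {
  local user="$1" key_id="$2"
  aws iam update-access-key \
    --user-name "$user" \
    --access-key-id "$key_id" \
    --status Inactive
}

# Usage (requires AWS credentials):
#   deactivate_key mailstrom-dev-ci   AKIA3F3XOYCWN7CIZV6E
#   deactivate_key mailstrom-stage-ci AKIA3F3XOYCWN7CBCRFO
#   deactivate_key mailstrom-prod-ci  AKIA3F3XOYCWAIUIZPF3
```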

No OIDC migration needed

Mailstrom has no automated deploy workflow. Operator deployments run locally via SSO credentials. Once these orphan users are deleted there is no remaining AWS auth surface in this repo's CI.

Closes thunderbird/platform-infrastructure#212

CloudTrail confirms zero activity over 90 days across dev/stage/prod
in eu-central-1 and us-east-1. No workflow in this repo or the
thunderbird org references these users. All three access keys are
Active but unused.

Removes tb_pulumi.iam.UserWithAccessKey from the StackAccessPolicies
on_apply callback. Next pulumi up will destroy mailstrom-{dev,stage,prod}-ci
and their access keys across all three stacks.

Part of thunderbird/platform-infrastructure#212
@aatchison aatchison requested a review from ryanjjung April 21, 2026 01:47
Contributor

@ryanjjung ryanjjung left a comment


This ought to be safe. We never really set up much automation around this.

I was also going to suggest removing the StackAccessPolicies as well. Rationale: they can be easily rebuilt if we need them, but nobody needs them right now. Removing them would clean up a lot of constant ugly drift in the pulumi plans.

@aatchison
Contributor Author

> This ought to be safe. We never really set up much automation around this.
>
> I was also going to suggest removing the StackAccessPolicies as well. Rationale: they can be easily rebuilt if we need them, but nobody needs them right now. Removing them would clean up a lot of constant ugly drift in the pulumi plans.

Great! I'll start with this one. We have a new way of doing it now with identity center anyhoo

@ryanjjung
Contributor

Yes, and this was always in response to a security audit where it was determined we needed a way to grant individuals (like third party folks) limited env-app access. If someone like that ever shows up, we can make that happen.

@aatchison
Contributor Author

Pre-merge notes

pulumi preview: pre-existing failure (unrelated to this PR)

pulumi preview fails at resource 5/283 in both dev and prod with:

error: rpc error: code = Unknown desc = ["region"]: provider mismatch (kind:DELETE inputDiff:true != kind:DELETE_REPLACE inputDiff:true)

The stage stack fails with an unrelated AttributeError: module 'tb_pulumi.cloudwatch' has no attribute 'LogDestination'. Neither error is introduced by this PR — they exist on main today.

CI user resources confirmed in state

Verified via pulumi stack export — all three users and their 9 child resources (user, access key, group membership, 2× Secrets Manager secret/version, 2× IAM policy/attachment) are tracked in Pulumi state and will be destroyed on pulumi up.
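
The state check can be repeated with something like the sketch below. It is an approximation rather than the query actually used: `count_component_resources` is a hypothetical name, it needs `jq`, and it relies on child URNs embedding their parents' types, so a substring match on the component type should catch the component and its descendants. It also assumes the CI user is the only `tb:iam:UserWithAccessKey` in the stack.

```shell
# Hypothetical helper: read `pulumi stack export` JSON on stdin and count the
# resources whose URN mentions the given component type.
count_component_resources() {
  local type="${1:-tb:iam:UserWithAccessKey}"
  jq --arg t "$type" \
    '[.deployment.resources[] | select(.urn | contains($t))] | length'
}

# Usage (run once per stack, after `pulumi stack select`):
#   pulumi stack export | count_component_resources
```

The number printed includes the component resource itself as well as its children.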

Recommended apply approach: targeted destroy

Because the full pulumi up dies at resource 5, use --target + --target-dependents to destroy just the CI user resources:

# In /workspaces/mailstrom/pulumi, with PULUMI_ACCESS_TOKEN set:

# dev
pulumi stack select mzla-services/mailstrom/dev
TBPULUMI_DISABLE_PROTECTION=True PULUMI_CONFIG_PASSPHRASE='' \
pulumi up -y \
  --target 'urn:pulumi:dev::mailstrom::tb:iam:UserWithAccessKey::mailstrom-dev-ci' \
  --target-dependents

# stage
pulumi stack select mzla-services/mailstrom/stage
TBPULUMI_DISABLE_PROTECTION=True PULUMI_CONFIG_PASSPHRASE='' \
pulumi up -y \
  --target 'urn:pulumi:stage::mailstrom::tb:iam:UserWithAccessKey::mailstrom-stage-ci' \
  --target-dependents

# prod
pulumi stack select mzla-services/mailstrom/prod
TBPULUMI_DISABLE_PROTECTION=True PULUMI_CONFIG_PASSPHRASE='' \
pulumi up -y \
  --target 'urn:pulumi:prod::mailstrom::tb:iam:UserWithAccessKey::mailstrom-prod-ci' \
  --target-dependents
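
Before applying, the same targeted operation can be dry-run with `pulumi preview`. The sketch below is a suggestion, not part of the original procedure: `ci_user_urn` and `preview_ci_user_removal` are hypothetical names, the URN pattern is copied from the commands above, and the region variables are added because the PR description reports that dev and prod fail without them.

```shell
# Build the component URN passed to --target for a given stack.
ci_user_urn() {
  local stack="$1"
  printf 'urn:pulumi:%s::mailstrom::tb:iam:UserWithAccessKey::mailstrom-%s-ci\n' \
    "$stack" "$stack"
}

# Hypothetical dry run: preview the targeted destroy without applying it.
preview_ci_user_removal() {
  local stack="$1"
  pulumi stack select "mzla-services/mailstrom/${stack}" || return 1
  AWS_DEFAULT_REGION=eu-central-1 AWS_REGION=eu-central-1 \
  TBPULUMI_DISABLE_PROTECTION=True PULUMI_CONFIG_PASSPHRASE='' \
    pulumi preview \
      --target "$(ci_user_urn "$stack")" \
      --target-dependents
}

# Usage:
#   for STACK in dev stage prod; do preview_ci_user_removal "$STACK"; done
```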

@aatchison
Contributor Author

State drift detail — full pulumi preview output (2026-04-22)

⚠️ This drift is pre-existing on main and is NOT caused by this PR. It was surfaced during preview validation for the CI user removal. A full pulumi up would destroy live infrastructure — do not run one without investigating the root cause first.

Preview run with AWS_DEFAULT_REGION=eu-central-1 AWS_REGION=eu-central-1, Pulumi 3.231.0, tb_pulumi v0.0.18.


dev — 58 deletes

| Count | Resource type | Notes |
|---|---|---|
| 1 | aws:ec2/instance:Instance | Bastion mailstrom-dev-public-rjung-instance (id: i-0b1a0de64036229af) |
| 1 | aws:ec2/securityGroup:SecurityGroup | Bastion SG |
| 4 | aws:ec2/securityGroupRule:SecurityGroupRule | Bastion + Stalwart ingress rules |
| 1 | tb:ec2:SshableInstance + tb:network:SecurityGroupWithRules | Bastion wrapper resources |
| 2 | aws:secretsmanager/secret:Secret + versions | mailstrom/dev/stalwart.postboot.redis_backend (removed from config but not applied) |
| 2 | tb:secrets:SecretsManagerSecret | Same redis secret wrappers |
| 2 | aws:iam/group:Group | SAP readonly + admin groups |
| 18 | aws:iam/groupPolicyAttachment:GroupPolicyAttachment | SAP group policy attachments |
| 19 | aws:iam/policy:Policy | SAP service policies |
| 9 | CI user resources | Expected — the purpose of this PR |

stage — 88 deletes

| Count | Resource type | Notes |
|---|---|---|
| 24 | aws:cloudwatch/metricAlarm:MetricAlarm | All monitoring alarms gone from program |
| 1 | aws:sns/topic:Topic + 1 topicSubscription | Monitoring SNS topic |
| 1 | tb:cloudwatch:CloudFrontDistributionAlarmGroup | |
| 3 | tb:cloudwatch:Ec2InstanceAlarmGroup | |
| 8 | tb:cloudwatch:LbTargetGroupAlarmGroup | |
| 2 | aws:secretsmanager/secret:Secret + versions | stalwart.postboot.redis_backend |
| 2 | aws:iam/group:Group | SAP groups |
| 18 | aws:iam/groupPolicyAttachment:GroupPolicyAttachment | SAP attachments |
| 19 | aws:iam/policy:Policy | SAP policies |
| 9 | CI user resources | Expected |

prod — 86 deletes + 1 replace + error

| Count | Resource type | Notes |
|---|---|---|
| 29 | aws:cloudwatch/metricAlarm:MetricAlarm | All monitoring alarms |
| 1 | aws:sns/topic:Topic + 1 topicSubscription | |
| 1 | tb:cloudwatch:CloudFrontDistributionAlarmGroup | |
| 4 | tb:cloudwatch:Ec2InstanceAlarmGroup | |
| 8 | tb:cloudwatch:LbTargetGroupAlarmGroup | |
| 2 | aws:secretsmanager/secret:Secret + versions | stalwart.postboot.redis_backend |
| 2 | aws:iam/group:Group | SAP groups |
| 14 | aws:iam/groupPolicyAttachment:GroupPolicyAttachment | SAP attachments |
| 15 | aws:iam/policy:Policy | SAP policies |
| 1 | aws:ec2/securityGroupRule:SecurityGroupRule | Cannot be deleted — preview fails with: resource "mailstrom-prod-stalwart-privlbsg-management-ingress-2" cannot be deleted |
| 9 | CI user resources | Expected |

Likely root causes

  1. Redis backend secret: stalwart.postboot.redis_backend was removed from config.{dev,stage,prod}.yaml but pulumi up was never run; the secret and its wrapper remain in state.
  2. Bastion host (dev) — mailstrom-dev-public-rjung removed from config but not destroyed.
  3. CloudWatch monitoring (stage + prod) — monitoring_opts evaluates to None for these stacks, so the CloudWatchMonitoringGroup block is skipped; all alarms/SNS remain in state.
  4. SAP policy drift: StackAccessPolicies is regenerating group policy attachments with new IDs, causing the old ones to appear as orphans.
  5. Security group rule conflict (prod) — a Stalwart security group rule has a dependency Pulumi cannot resolve for deletion.

Live preview links for reference:

@aatchison
Contributor Author

Preview re-run with setup-{dev,stage,prod}.sh sourced (2026-04-22)

Re-ran pulumi preview sourcing the uncommitted pulumi/setup-{stack}.sh scripts (which set AWS_REGION, AWS_DEFAULT_REGION, and PULUMI_CONFIG_PASSPHRASE per stack). Results are identical to the previous run — the drift is confirmed to be a genuine state/code divergence, not a credentials or region configuration issue.

Counts unchanged: dev 58 deletes, stage 88 deletes, prod 86 deletes + 1 blocked deletion (security group rule).

Live preview links:

The targeted destroy for the CI user resources is still the correct path forward for this PR. The drift needs to be addressed in a separate issue before running a full pulumi up.

@aatchison aatchison merged commit 18b4c28 into main Apr 22, 2026
1 check passed
@aatchison aatchison deleted the feat/oidc-migration-212 branch April 22, 2026 19:50