Skip to content

Security and Testing infrastructure Remediation#666

Draft
Wikid82 wants to merge 239 commits intodevelopmentfrom
feature/beta-release
Draft

Security and Testing infrastructure Remediation#666
Wikid82 wants to merge 239 commits intodevelopmentfrom
feature/beta-release

Conversation

@Wikid82
Copy link
Owner

@Wikid82 Wikid82 commented Feb 8, 2026

placeholder....

Chores of this PR:

  1. Migrate base image to Alpine to fix known Debian CVEs.

  2. Make sure testing infrastructure has no regression and passes all green. Includes local and CI E2E, Integrations, Frontend and Backend Coverage.

DoD Checklist:

  • Dockerfile updates and rebuild successful
  • Confirm Integration Tests
    • Cerberus
    • CrowdSec
    • Rate Limiter
    • WAF
  • E2E Runs Green
  • Frontend runs green and meets Coverage goal
  • Backend runs green and meets coverage goal
  • All CI/CD are running green

Closes Issues: #40, #587, #589, #592, #610, #618, #638, #664, #665, #631

Wikid82 and others added 30 commits February 4, 2026 05:33
fix: crowdsec web console enrollment
Propagate changes from development into feature/beta-release
…ekly-non-major-updates

chore(deps): update dependency @types/react to ^19.2.11 (feature/beta-release)
fix(ci): standardize image tag step ID across integration workflows
fix(ci): remove redundant image tag determination logic from multiple…
fix(ci): remove redundant Playwright browser cache cleanup from workf…
…ace conditions

- Changed workflow name to reflect sequential execution for stability.
- Reduced test sharding from 4 to 1 per browser, resulting in 3 total jobs.
- Updated job summaries and documentation to clarify execution model.
- Added new documentation file for E2E CI failure diagnosis.
- Adjusted job summary tables to reflect changes in shard counts and execution type.
fix(e2e): update E2E tests workflow to sequential execution and fix r…
Root causes:
1. Browser cache was restoring corrupted/stale binaries from previous runs
2. 30-minute timeout insufficient for fresh Playwright installation (10-15 min)
   plus Docker/health checks and test execution

Changes:
- Remove browser caching from all 3 browser jobs (chromium, firefox, webkit)
- Increase timeout from 30 → 45 minutes for all jobs
- Add diagnostic logging to browser install steps:
  * Install start/completion timestamps
  * Exit code verification
  * Cache directory inspection on failure
  * Browser executable verification using 'npx playwright test --list'

Benefits:
- Fresh browser installations guaranteed (no cache pollution)
- 15-minute buffer prevents premature timeouts
- Detailed diagnostics to catch future installation issues early
- Consistent behavior across all browsers

Technical notes:
- Browser install with --with-deps takes 10-15 minutes per browser
- GitHub Actions cache was causing more harm than benefit (stale binaries)
- Sequential execution (1 shard per browser) combined with fresh installs
  ensures stable, reproducible CI behavior

Expected outcome:
- Firefox/WebKit failures from missing browser executables → resolved
- Chrome timeout at 30 minutes → resolved with 45 minute buffer
- Future installation issues → caught immediately via diagnostics

Refs: #hofix/ci
QA: YAML syntax validated, pre-commit hooks passed (12/12)
fix: resolve Playwright browser executable not found errors in CI
Remove overly complex verification logic that was causing all browser
jobs to fail. Browser installation should fail fast and clearly if
there are issues.

Changes:
- Remove multi-line verification scripts from all 3 browser install steps
- Simplify to single command: npx playwright install --with-deps {browser}
- Let install step show actual errors if it fails
- Let test execution show "browser not found" errors if install incomplete

Rationale:
- Previous complex verification (using grep/find) was the failure point
- Simpler approach provides clearer error messages for debugging
- Tests themselves will fail clearly if browsers aren't available

Expected outcome:
- Install steps show actual error messages if they fail
- If install succeeds, tests execute normally
- If install "succeeds" but browser is missing, test step shows clear error

Timeout remains at 45 minutes (accommodates 10-15 min install + execution)
fix: simplify Playwright browser installation steps
actions-user and others added 30 commits February 8, 2026 05:50
- Make lint steps fail the pipeline so issues block merges
- Skip Node cache setup when the frontend lockfile is missing
- Cancel older CI runs for the same ref to reduce queue delays
- CI now focuses only on Dockerfile validation and security scanning
- Go code linting is handled locally via pre-commit hooks and DoD checklist
- Prevents CI failures from missing golangci-lint configuration
- Aligns CI responsibilities with local development workflow
The golangci-lint-action v9.2.0 requires version strings in "vX.Y.Z" format.
Previous attempt to remove the "v" prefix caused validation error:
"invalid version string '1.64.5', expected format v1.2 or v1.2.3"

Updated both ci-pipeline.yml and quality-checks.yml to use "v1.64.5"
instead of "1.64.5" to match the action's expected format.

Fixes: #666 (PR CI validation failure)
Previously, Phase 1 optimization restricted feature branch pushes to
linux/amd64 only for faster builds. This unintentionally prevented
arm64 images from being published to Docker Hub.

Changes:
- Feature branches now build for both linux/amd64 and linux/arm64
- PRs remain single-platform (amd64) for fast feedback
- Only PRs create artifacts (multi-platform manifests can't be loaded locally)
- Updated comments to reflect new platform behavior

Result: feature/beta-release will now publish both amd64 and arm64
images to Docker Hub on every push.

Closes: User report - arm64 missing from Docker Hub
The golangci-lint-action v9.2.0 dropped support for golangci-lint v1.x
and requires v2.x versions. The error "golangci-lint v1 is not supported
by golangci-lint-action >= v7" indicates we need to upgrade, not downgrade.

Updated both ci-pipeline.yml and quality-checks.yml from v1.64.5 to v2.8.0
to align with the current golangci-lint major version.

Fixes: #666 (golangci-lint version compatibility error)
The golangci-lint v2.x series requires a different configuration schema:

1. `linters-settings` must be nested under `linters.settings`
2. `issues.exclude-generated-strict` is not supported
3. `issues.exclude-rules` complex syntax replaced with simpler `exclude` patterns

Changes to both backend/.golangci-fast.yml and backend/.golangci.yml:
- Restructured linter settings under `linters.settings`
- Converted exclude-rules to simple exclude patterns
- Added proper v2.x directives (exclude-use-default, max-issues-per-linter)
- Maintained all security checks and error handling exclusions

This resolves the "invalid configuration keys" error when running
golangci-lint v2.8.0 with golangci-lint-action v9.2.0.

Fixes: #666 (golangci-lint configuration schema validation)
The golangci-lint v2.8.0 schema validation rejected all properties
in the issues section:
- exclude-use-default
- exclude-dirs
- exclude-files
- exclude
- max-issues-per-linter
- max-same-issues

Solution: Removed the entire issues section from both config files.
Linter behavior is now controlled exclusively through linters.settings,
which is properly configured for govet, errcheck, gosec, gocritic, etc.

Changes to backend/.golangci-fast.yml and backend/.golangci.yml:
- Removed issues section entirely (v2.x schema incompatible)
- Retained all linter-specific settings under linters.settings
- Linters will run with their configured settings and default behaviors

This resolves the jsonschema validation error:
"additional properties ... not allowed"

Fixes: #666 (golangci-lint v2.x schema validation)
- Updated error variable names for clarity in DNS provider, import, logs, manual challenge, security, user, and other handlers.
- Improved error handling in services such as backup, credential, docker, mail, notification, security headers, and uptime services.
- Enhanced readability by using more descriptive variable names for errors in multiple locations across the codebase.
- Ensured consistent error handling practices throughout the application.
…jor-7-github-artifact-actions

chore(deps): update actions/download-artifact action to v7 (feature/beta-release)
…jor-10-eslint-monorepo

chore(deps): update dependency eslint to v10 (feature/beta-release)
…jor-6-github-artifact-actions

chore(deps): update github artifact actions to v6 (feature/beta-release) (major)
- Updated environment variable assignments in multiple workflow files to use double quotes for consistency and to prevent potential issues with variable expansion.
- Refactored echo commands to group multiple lines into a single block for improved readability in the following workflows:
  - release-goreleaser.yml
  - renovate_prune.yml
  - security-pr.yml
  - security-weekly-rebuild.yml
  - supply-chain-pr.yml
  - supply-chain-verify.yml
  - update-geolite2.yml
  - waf-integration.yml
  - weekly-nightly-promotion.yml
- Fixed github.head_ref actionlint error by passing via environment variable
  instead of direct shell interpolation in ci-pipeline.yml
- Aligned E2E coverage artifact handling to shard artifacts and updated
  Codecov upload to use glob pattern for multi-shard merge
- Added workflow_run trigger to security-pr.yml for docker-build integration
  while retaining workflow_dispatch for manual runs
- Added workflow_run trigger to supply-chain-pr.yml for docker-build integration
  while retaining workflow_dispatch for manual runs
- All individual workflows now support both automatic (workflow_run) and manual
  (workflow_dispatch) triggering, maintaining design intent
- Audited remaining workflows; no additional blockers found
- All actionlint and pre-commit validations now passing
- Full pipeline trigger chain now functional
- Remove `if: always()` from integration-gate, coverage-gate, codecov-gate, pipeline-gate
- Gates now naturally skip when their upstream dependencies are skipped (fork PR behavior)
- Prevents confusing "complete" status when nothing actually ran
- Fork PRs will show "skipped" in UI instead of obscuring behavior behind gate success
- Aligns with GitHub Actions standard job dependency semantics
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

architecture System design and structure beta Part of beta release bug Something isn't working critical Must have for the release, blocks other work deployment Docker, installation documentation Docs and guides testing Test suite

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

2 participants