chore: Release release/2026-W24 by github-actions[bot] · Pull Request #6635 · frappe/press

github-actions · 2026-06-08T05:45:34Z

Weekly release PR

This reverts commit 9395e1d.

…ermission-issue fix(auditor): Perm checks ignore when creating from release

Adds `Incident Settings Night Shift` child doctype to assign specific users per day-of-week for night hours. During night (outside DAY_HOURS), if a shift is defined for today, those users replace the default list. Repeat calls use round-robin ordering (next after last acknowledged) instead of calling the same person again. Falls back to default users when no shift is defined for today or shift users are absent from the main users table. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds NIGHT_SHIFT_CALL_LIMIT (20). Once that many call attempts have been made (tracked via the updates child table) without resolving the incident, get_humans() skips night shift and falls back to the full default user list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The test was failing with AuthenticationError because running the investigator foreground caused it to call get_prometheus_client() without credentials. The underlying issue: running investigation foreground also completed the investigator immediately, making waited_enough_for_investigator_reactions return True before the first resolve_incidents() check. Fix: mock the investigator's frappe.enqueue_doc to a no-op so investigate() is never queued, keeping the investigator in Investigating state for the first resolve_incidents() check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Also extract call-per-human logic into _attempt_call_human to stay within ruff C901 complexity limit.

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

…e-in-marketplace fix(marketplace): Add 'Press User' role permissions

fix(deploy-ui): Show pre-build errors

fix: Guard add_resource and sync against cross-team docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When the latest site update is cancelled, fetch the press notification and display its message as a warning banner at the top of the updates list. Uses a new extraResource hook in ObjectList to load secondary data lazily. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Explain what happened and how to fix it in the backup restore, app changes, and login-as-admin error toasts. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot · 2026-06-08T06:41:59Z

@copilot resolve the merge conflicts in this pull request

Resolved all merge conflicts. Here's a summary of what was done:

dashboard/src/components/group/UpdateReleaseGroupDialog.vue – Kept the new patchDeploy resource (routes to Deploy Candidate) alongside the existing deploy resource (routes to Release Pipeline).
dashboard/src/components/navigation/sidebar/NavList.vue – Kept the activePartner-based routing for partner onboarding (redirects to /partner-onboarding for non-active partners), and preserved the condition: Boolean($team.doc.erpnext_partner) guard from master.
press/api/site.py – Took master's refactored function names (_validate_warranty_change, _is_free_dedicated_plan_allowed) and cleaner validate_plan structure, while combining HEAD's null plan_name guard to prevent crashes on sites without an active plan.
press/press/doctype/marketplace_app/test_marketplace_app.py – Kept new test methods and AppSource imports added in the release branch.
press/press/doctype/server/server.py – Kept the reason=reason parameter in frappe.enqueue_doc.
press/press/doctype/team/team.py – Kept the null check for sub.plan_type and sub.plan to avoid errors on incomplete subscriptions.

Commit: fix: resolve merge conflicts with origin/master

codecov-commenter · 2026-06-08T09:03:15Z

Codecov Report

❌ Patch coverage is 41.01363% with 1385 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.33%. Comparing base (f242166) to head (e5d2e07).

Files with missing lines	Patch %	Lines
press/mcp/tools/documents.py	0.00%	307 Missing ⚠️
dashboard/src/objects/site.js	27.20%	205 Missing and 1 partial ⚠️
press/mcp/tools/actions/server.py	31.42%	144 Missing ⚠️
press/mcp/tools/actions/bench.py	28.77%	99 Missing ⚠️
press/mcp/tools/routes.py	0.00%	95 Missing ⚠️
...agent_investigation/support_agent_investigation.py	0.00%	94 Missing ⚠️
press/incident_management/support_agent/report.py	76.20%	64 Missing ⚠️
press/mcp/tools/codebase.py	0.00%	58 Missing ⚠️
dashboard/src/components/ObjectList.vue	54.54%	54 Missing and 1 partial ⚠️
...ss/incident_management/support_agent/collectors.py	78.18%	53 Missing ⚠️
... and 15 more

❌ Your patch status has failed because the patch coverage (41.01%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6635      +/-   ##
==========================================
+ Coverage   49.97%   50.33%   +0.36%     
==========================================
  Files         955      993      +38     
  Lines       79069    83514    +4445     
  Branches      374      523     +149     
==========================================
+ Hits        39511    42037    +2526     
- Misses      39532    41445    +1913     
- Partials       26       32       +6

Flag	Coverage Δ
dashboard	`62.79% <37.69%> (+2.46%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Previously, the investigation report for 500 and 502 errors would tell the support agent to manually open web.error.log via the log browser. This adds a collector that fetches and parses the log automatically. The collector: - Fetches web.error.log via the existing site.get_server_log() agent call. - Parses gunicorn-format entries with a compiled regex, grouping each ERROR/CRITICAL line with the trailing traceback lines that follow it. - Captures only the final exception message line from each traceback — not the full stack frames with local variables, which could carry personal information. - Runs all entries through redact() before storing. - Scans the last 500 lines and returns at most 10 error blocks. The report generator classifies the collected errors into three patterns: - OperationalError / "can't connect" → database connectivity failure. - ImportError / ModuleNotFoundError → broken app state after deployment. - CRITICAL level entries → worker crash or timeout. - Anything else → generic exception, with the message surfaced. The spec Non-Goals are updated to allow redacted exception message lines (the final line of a traceback) as structured data while continuing to exclude raw stack frames with local variables. (cherry picked from commit 756bccf)

Extends get_site_performance_summary with anomaly detection and custom-app identification to give 504 investigations more specific signal before falling back to a generic "use Recorder" recommendation. Changes: - Fetches up to 20 endpoints (was 5) so custom-app paths are not crowded out by core Frappe endpoints in the ranking. - Adds spike_detected per endpoint: peak >= 3x mean AND peak > 2 s. This surfaces endpoints that are occasionally very slow (e.g. a specific document type or a scheduled-job-triggered slow query) even when the 24-hour average is below the 1 s threshold. - Adds is_custom per endpoint: extracts the Python module name from /api/method/<module>.* paths and checks whether that module belongs to a non-Frappe app. App origin is determined by repository_owner on the AppSource record; anything other than "frappe" is custom. - Adds has_custom_apps to the summary so the report knows whether the bench has any non-Frappe apps at all. The bench name is now passed from collect_site_context so the app-source lookup can happen without an extra site query. (cherry picked from commit 3a54fbd)

Splits _add_performance_evidence into three focused functions: - _add_slow_endpoint_evidence: handles consistently-slow endpoints. When the slow endpoint belongs to a non-Frappe app it changes the cause to "custom app endpoints are slow — application-level" instead of the generic "web workers" cause. - _add_spiky_endpoint_evidence: handles endpoints where peak >= 3x mean and peak > 2 s, adding evidence and suggesting Recorder to capture the specific triggering request. - _add_performance_evidence: computes the slow/spiky lists and dispatches. Both conditions are checked independently so an endpoint can be both consistently slow and spiky (e.g. always 2 s but sometimes 30 s). Adds four new tests: - test_500_worker_timeout_in_web_log_flags_critical - test_504_custom_app_endpoint_flagged_as_application_level - test_504_spiky_endpoint_flagged_even_with_low_average - test_504_frappe_endpoint_slow_flags_web_workers (cherry picked from commit c94adf8)

Updates the 504 section to document the new endpoint analysis behavior: app origin detection via AppSource.repository_owner, spike detection (peak >= 3x mean and peak > 2 s), and how the report adjusts its cause and next steps based on whether the slow endpoint is from a custom app. Also updates the Collectors section to describe the enhanced site_performance_summary fields: is_custom, spike_detected, has_custom_apps, and the expanded 20-endpoint fetch window. (cherry picked from commit 2e0716d)

Replace the two external API calls in the investigation collectors with the shared HTTP clients from press/mcp: - get_server_metrics: was calling press.api.server.prometheus_query, a whitelisted API wrapper that also does timezone conversion and label alignment we don't need. Now uses prometheus_get from press.mcp.tools.telemetry.clients directly. Add _prom_params (builds the query_range param dict) and _prom_values (extracts a flat list of floats from the Prometheus matrix response). Simplify _summarise_series to take list[float] instead of the datasets dict that prometheus_query returned. - get_site_performance_summary: was calling get_request_by_ from press.api.analytics, which returns per-time-bucket datasets that we then manually averaged. Now uses elasticsearch_post with a terms aggregation that returns avg_duration_ms and max_duration_ms per path directly — same spike detection, fewer moving parts. Add _slow_endpoint_query (builds the ES body) and _parse_slow_endpoints (converts buckets to the endpoint dicts report.py expects). No changes to press/mcp tooling. The decorated @press_mcp.tool functions are not called. (cherry picked from commit ac5c04c)

Add get_bench_process_status collector, which calls Bench.supervisorctl_status() (the same agent call the MCP's get_bench_processes tool makes) and returns a list of processes not in Running or Starting state. If the gunicorn web process is Fatal or Stopped, the report now flags it as a direct cause of 502 errors rather than leaving the support agent to discover it through logs. The next-step recommendation is to check web.error.log and recent deployments before restarting — a bare restart without diagnosis will recur. Worker processes that are stopped (but the web process is fine) are surfaced as evidence only, not a cause — stopped background workers cause job failures, not 502s. Two new tests: one for a Fatal gunicorn web process, one confirming that all-running processes produce no process-level cause. (cherry picked from commit eeb8c38)

Semgrep flagged `f == f` as a useless equality check. Switch to the explicit `math` module check and add `import math`. (cherry picked from commit 1a274c8)

Tests call collect_site_context → generate_report with prometheus_get and elasticsearch_post returning controlled payloads. This verifies the full transformation pipeline — Prometheus matrix response → _prom_values → _summarise_series → report cause — rather than constructing the payload dict directly. frappe_mcp is not installed in test environments. The test file stubs it in sys.modules at import time (before any press.mcp submodule is touched by the patch machinery) so the import chain succeeds without errors. Six scenarios covered: CPU spike, flat CPU (no spike), uniformly slow endpoint, spiky endpoint, stopped gunicorn web process, database connectivity error in web.error.log. (cherry picked from commit 7e46af6)

Mypy requires explicit annotations for dicts with heterogeneous nested types that it cannot unambiguously infer. Add `: dict` to _PROM_EMPTY and _ES_EMPTY. (cherry picked from commit 35601d8)

Add an Anthropic API key field to Press Settings (Monitoring section). Add a 'Get AI Analysis' button to the completed investigation form. Clicking it sends the already-redacted payload to claude-sonnet-4-6 via the Anthropic Messages API (plain HTTP, no extra package required) and stores the response in a new ai_response field. The model only ever receives data that has already passed through the redaction pipeline — no raw payloads, no personally identifiable data. The controller validates that the investigation is Completed before allowing the call. (cherry picked from commit f94e4a2)

Add probe_success and probe_http_status_code from the blackbox Prometheus exporter so the report can surface a DOWN probe or a 5xx status code as a direct cause without waiting for log analysis. Strip site.name from the payload copy before building the model prompt. The model does not need the customer site identity to reason about platform signals. (cherry picked from commit bcfd9ad)

Document the site_uptime collector (blackbox probe_success and probe_http_status_code), and replace the forward-looking model extension point section with the actual AI Analysis implementation: what is sent to the model, the two-stage privacy boundary (redact + anonymize), how to configure the API key, and what the model is asked to produce. Also add test_investigation to the verification commands. (cherry picked from commit e673f38)

Extend _anonymise() to remove all platform identity fields from the site and bench sections of the payload, not just the site name. Stripped from site: name, bench, server, database_server, cluster, group. Stripped from bench: name, server, database_server, cluster, candidate, build. The model only needs status flags and metrics to reason about the issue. (cherry picked from commit c6d9741)

Check that anthropic_api_key is set in Press Settings at the start of run_ai_analysis(), before any payload is loaded or sent. Remove the duplicate check from analyse() — validation belongs at the boundary. The deterministic run() method never calls the model. AI analysis is always an explicit manual action via the form button. (cherry picked from commit cf62523)

The timeout message is internal — the support agent knows the context and retry path. Suppress non-actionable-error-message inline. (cherry picked from commit 4fc1085)

feat(support): Collect logs, metrics and process state in investigations (backport #6643)

Gunicorn's stderr is a bench-level file at benches/{bench}/logs/web.error.log. Reading it via Site.get_server_log was hitting the site-specific path (benches/{bench}/sites/{site}/logs/) which either doesn't exist or is empty — so no errors were ever collected. This matters most for startup failures (syntax errors in app.py) where gunicorn workers crash before any request is served, because those errors land only in the gunicorn log and nowhere site-specific. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit a4b7374)

fix(support): Read web.error.log from bench, not site (backport #6646)

- Rectify mismatching instance types - Add two additional plans present in db but absent in fixture (cherry picked from commit b5ecd80)

(cherry picked from commit 14b1311)

(cherry picked from commit cb5abae)

fix(server-plan-fixtures): Sync server plans from prod (backport #6640)

fix(onboarding): Update perm issues

(cherry picked from commit 8d1e382)

fix(json): Remove trailing comma (backport #6655)

prathameshkurunkar7 and others added 30 commits May 29, 2026 21:53

Merge branch 'develop' of https://github.com/frappe/press into develop

7d41e77

Revert "fix(auditor): Ignore permissions during audit creation"

ae5c34a

This reverts commit 9395e1d.

fix(auditor): Perm checks ignore when creating from release

fc2486b

Merge pull request #6559 from prathameshkurunkar7/fix-audit-doctype-p…

07f1f7e

…ermission-issue fix(auditor): Perm checks ignore when creating from release

feat(incident): Resolve if sites recover before calling humans

f12bb1c

Also extract call-per-human logic into _attempt_call_human to stay within ruff C901 complexity limit.

feat: Apply suggestions from greptile code review

1a98d46

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

fix(marketplace): Add 'Press User' role permissions

79a8b37

feat(marketplace): Wire up permissions for Marketplace App and App Plan

d110e80

fix(tests): Set user context before creating test marketplace app plan

5851816

Merge pull request #6562 from prathameshkurunkar7/fix-permission-issu…

798a0e3

…e-in-marketplace fix(marketplace): Add 'Press User' role permissions

fix(site): Concise error message

0a345db

fix(deploy-ui): show pre-build errors

dc2ca90

Merge pull request #6565 from frappe/fix-ui

f3e150c

fix(deploy-ui): Show pre-build errors

fix: guard add_resource and sync against cross-team docs

d8ffac8

fix: batch team ownership check in sync_press_role

e9d9daf

Merge pull request #6564 from saurabh6790/weekend-support

5b925c2

fix: Guard add_resource and sync against cross-team docs

fix(deploy-ui): Avoid showing new bench job name label

5fd91b8

fix(deploy-ui): disable building stage if no buildsteps

98bea2a

fix(deploy-ui): autoexpand building stage accordion

31d9b8a

style(dashboard): Biome format

29d92dd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'develop' into add-partner-onboarding

816d222

test(site-update): Playwright test for cancellation notification banner

2969d89

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(onboarding): Enhance error messages

1e7e57f

refactor: Update error message comments for CI

a990a0f

fix(dashboard): Make site error messages actionable

49d0a60

Explain what happened and how to fix it in the backup restore, app changes, and login-as-admin error toasts. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

refactor(onboarding): Nosemgrep CI fixing

6861ca1

Copilot finished work on behalf of saurabh6790 June 8, 2026 06:42

Copilot AI requested a review from saurabh6790 June 8, 2026 06:42

balamurali27 and others added 26 commits June 9, 2026 09:54

fix(support): Replace NaN float idiom in Prometheus value parser

9bac3d8

Semgrep flagged `f == f` as a useless equality check. Switch to the explicit `math` module check and add `import math`. (cherry picked from commit 1a274c8)

fix(support): Add type annotations to module-level test constants

d37719c

Mypy requires explicit annotations for dicts with heterogeneous nested types that it cannot unambiguously infer. Add `: dict` to _PROM_EMPTY and _ES_EMPTY. (cherry picked from commit 35601d8)

fix(support): Suppress semgrep rule on timeout error in Claude client

f4632f4

The timeout message is internal — the support agent knows the context and retry path. Suppress non-actionable-error-message inline. (cherry picked from commit 4fc1085)

Merge pull request #6650 from frappe/mergify/bp/release/2026-W24/pr-6643

872b849

feat(support): Collect logs, metrics and process state in investigations (backport #6643)

Merge pull request #6651 from frappe/mergify/bp/release/2026-W24/pr-6646

837f56e

fix(support): Read web.error.log from bench, not site (backport #6646)

fix(plan_instance_type): Sync server plans from prod

f9eebc6

- Rectify mismatching instance types - Add two additional plans present in db but absent in fixture (cherry picked from commit b5ecd80)

fix(server-plan-fixtures): Change m7g platform from x86_64 to arm64

390960c

(cherry picked from commit 14b1311)

fix(server-plan-fixtures): Update roles to Press User

0a270e8

(cherry picked from commit cb5abae)

Merge pull request #6652 from frappe/mergify/bp/release/2026-W24/pr-6640

6b0b6ce

fix(server-plan-fixtures): Sync server plans from prod (backport #6640)

fix(onboarding): Update perm issues

9920682

Merge pull request #6653 from prathameshkurunkar7/release/2026-W24

c96b5af

fix(onboarding): Update perm issues

fix(json): Remove trailing comma

a8bee57

(cherry picked from commit 8d1e382)

Merge pull request #6656 from frappe/mergify/bp/release/2026-W24/pr-6655

e5d2e07

fix(json): Remove trailing comma (backport #6655)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: Release release/2026-W24#6635

chore: Release release/2026-W24#6635
github-actions[bot] wants to merge 468 commits into
masterfrom
release/2026-W24

github-actions Bot commented Jun 8, 2026

Uh oh!

Copilot AI commented Jun 8, 2026

Uh oh!

codecov-commenter commented Jun 8, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

16 participants

Conversation

github-actions Bot commented Jun 8, 2026

Uh oh!

Copilot AI commented Jun 8, 2026

Uh oh!

codecov-commenter commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

16 participants

codecov-commenter commented Jun 8, 2026 •

edited

Loading