Skip to content

fix(server-analytics): Prevent spiky underlay when stacking in small timeframes#6602

Open
balamurali27 wants to merge 6 commits into
developfrom
chart-fix
Open

fix(server-analytics): Prevent spiky underlay when stacking in small timeframes#6602
balamurali27 wants to merge 6 commits into
developfrom
chart-fix

Conversation

@balamurali27

@balamurali27 balamurali27 commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

fix(server-charts): Widen rate() window to stop spikes to zero
The CPU/network/IOPS and MariaDB rate charts spiked down to zero on
arbitrary steps. The rate()/increase() lookback window was set to the
chart step (timegrain), which auto_timespan_timegrain can size below 2x
the 60s Prometheus scrape interval. When the window spans fewer than two
samples Prometheus returns no value for that step, and the chart renders
the gap as a spike to zero.

Decouple the rate window from the step with get_rate_interval(), which
mirrors Grafana's $__rate_interval and always spans several scrapes.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

test(server-charts): Cover rate-window sizing for analytics queries
Tests for get_rate_interval and the server analytics query builder: the
rate window always spans at least two scrape intervals, the CPU query
uses the widened window for a one-hour range, no rate chart uses a
sub-scrape window, and the Prometheus series has no gaps (missing steps
become None, not zero).

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

test(server-charts): Add Playwright e2e tests for server charts
Mock the analytics API and assert the CPU chart renders its line paths
with non-spiky data, requests analytics with a valid time range, and
shows the empty state when there is no data.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

chore(gitignore): Ignore .claude directory
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

balamurali27 and others added 4 commits June 2, 2026 23:28
The CPU/network/IOPS and MariaDB rate charts spiked down to zero on
arbitrary steps. The rate()/increase() lookback window was set to the
chart step (timegrain), which auto_timespan_timegrain can size below 2x
the 60s Prometheus scrape interval. When the window spans fewer than two
samples Prometheus returns no value for that step, and the chart renders
the gap as a spike to zero.

Decouple the rate window from the step with get_rate_interval(), which
mirrors Grafana's $__rate_interval and always spans several scrapes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tests for get_rate_interval and the server analytics query builder: the
rate window always spans at least two scrape intervals, the CPU query
uses the widened window for a one-hour range, no rate chart uses a
sub-scrape window, and the Prometheus series has no gaps (missing steps
become None, not zero).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mock the analytics API and assert the CPU chart renders its line paths
with non-spiky data, requests analytics with a valid time range, and
shows the empty state when there is no data.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@balamurali27 balamurali27 changed the title chart fix fix(server-analytics): Prevent spiky underlay when stacking in small timeframes Jun 3, 2026
@greptile-apps

greptile-apps Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes CPU/network/IOPS/MariaDB charts that spiked down to zero on arbitrary steps by decoupling the Prometheus rate()/increase() lookback window from the chart step. Previously, when auto_timespan_timegrain produced a step below 2× the 60 s scrape interval (e.g. 90 s for a 1-hour view), Prometheus could see only one sample in the window and return nothing, which the chart rendered as a zero spike.

  • get_rate_interval() in analytics.py mirrors Grafana's $__rate_interval: max(timegrain + scrape_interval, 4 × scrape_interval), ensuring the window always spans several scrapes regardless of the chart step.
  • All rate()/increase() queries in server.py (cpu, network, iops, database_commands_count, innodb_bp_miss_percent, innodb_avg_row_lock_time) are updated to use rate_interval instead of timegrain.
  • A comprehensive unit test suite (test_analytics.py) and Playwright e2e tests (server-charts.test.ts) verify the window sizing, query correctness, and chart rendering.

Confidence Score: 5/5

Safe to merge — the change is a targeted, well-tested decoupling of the Prometheus rate window from the chart step, with no schema, permission, or data-path regressions.

The get_rate_interval formula exactly mirrors Grafana's $__rate_interval and is verified by unit tests against every value auto_timespan_timegrain can produce. All affected PromQL queries are updated consistently, non-rate queries are untouched, and the Playwright e2e tests add end-to-end coverage. No new security surface, no ORM changes, no migrations.

No files require special attention.

Important Files Changed

Filename Overview
press/api/analytics.py Adds get_rate_interval() that mirrors Grafana's $__rate_interval: max(timegrain + scrape_interval, 4 × scrape_interval), plus a PROMETHEUS_SCRAPE_INTERVAL constant. Formula is correct; Final is already imported; zero-timegrain guard is present.
press/api/server.py Replaces timegrain with rate_interval in all rate()/increase() PromQL queries (cpu, network, iops, database_commands_count, innodb_bp_miss_percent, innodb_avg_row_lock_time). Import updated accordingly; no unintended side-effects on non-rate queries.
press/api/tests/test_analytics.py New test module covering get_rate_interval(), auto_timespan_timegrain(), per-chart query capture, and Prometheus series alignment. Window-size assertion and gap-as-None tests are both correct.
dashboard/tests-e2e/tests/dashboard/server-charts.test.ts Playwright e2e tests: chart renders with data, start/end params are sent, empty-state shows on no data. Imports coverage.fixture which exists; mock helpers are well-structured.
dashboard/tests-e2e/mocks/server-charts/analytics.ts Mock data helpers for e2e tests; cpuDataWithGaps is exported but never imported in the test file (dead export flagged in a previous review thread).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["analytics() endpoint\n(start, end, query)"] --> B["auto_timespan_timegrain()\n→ timegrain (chart step)"]
    B --> C["get_rate_interval(timegrain)\nmax(timegrain + 60s, 4 × 60s)"]
    C --> D{"timegrain ≤ 180 s?\n(e.g. 90 s for 1 h view)"}
    D -- "Yes (min floor kicks in)" --> E["rate_interval = 240 s\n(4 × scrape_interval)"]
    D -- "No (step is coarse)" --> F["rate_interval = timegrain + 60 s"]
    E --> G["PromQL: rate(...[240s])\nAlways ≥ 2 scrapes → no gaps"]
    F --> G
    G --> H["prometheus_query()\nstep = timegrain (chart resolution unchanged)"]
    H --> I["Chart renders without\nspikes to zero"]
Loading

Reviews (3): Last reviewed commit: "fix(site): Correct grammar" | Re-trigger Greptile

Comment on lines +44 to +55
const data = cpuData()
for (const dataset of data.datasets) {
dataset.values = dataset.values.map((value, i) =>
i % 2 === 1 ? null : value,
)
}
return data
}

export function emptyData() {
return { datasets: [], labels: [] }
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unused export cpuDataWithGaps

cpuDataWithGaps is exported but never imported or used in server-charts.test.ts. The comment says it represents the pre-fix shape ("spikes to zero") and was presumably intended to anchor a regression test that would assert charts with null gaps don't render zeros, but that test was never written. The export is dead code in its current state.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +114 to +123
def test_rate_charts_never_use_a_sub_scrape_window(self):
start = datetime(2024, 1, 1, 0, 0, 0)
for chart in ("cpu", "network", "iops", "database_commands_count"):
query = self._capture_query(start, start + timedelta(minutes=10), chart)
window = int(query.split("[")[1].split("s]")[0])
self.assertGreaterEqual(
window,
2 * PROMETHEUS_SCRAPE_INTERVAL,
f"{chart} chart queried a {window}s window",
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 InnoDB rate charts are not covered by the window-size assertion

test_rate_charts_never_use_a_sub_scrape_window checks cpu, network, iops, and database_commands_count, but innodb_bp_miss_percent and innodb_avg_row_lock_time also use rate_interval in their queries and are not included. A future regression that accidentally re-introduces a narrow window on those queries would not be caught here.

@codecov-commenter

codecov-commenter commented Jun 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 50.84%. Comparing base (af89b0c) to head (88ede89).
⚠️ Report is 57 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6602      +/-   ##
===========================================
+ Coverage    49.43%   50.84%   +1.41%     
===========================================
  Files          976      990      +14     
  Lines        81257    84133    +2876     
  Branches       379      701     +322     
===========================================
+ Hits         40166    42778    +2612     
- Misses       41063    41321     +258     
- Partials        28       34       +6     
Flag Coverage Δ
dashboard 64.17% <ø> (+3.86%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Make the helper's call sites say what they run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Want your agent to iterate on Greptile's feedback? Try greploops.

@balamurali27 balamurali27 requested a review from tanmoysrt as a code owner June 3, 2026 05:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants