fix(server-analytics): Prevent spiky underlay when stacking in small timeframes#6602
fix(server-analytics): Prevent spiky underlay when stacking in small timeframes#6602balamurali27 wants to merge 6 commits into
Conversation
The CPU/network/IOPS and MariaDB rate charts spiked down to zero on arbitrary steps. The rate()/increase() lookback window was set to the chart step (timegrain), which auto_timespan_timegrain can size below 2x the 60s Prometheus scrape interval. When the window spans fewer than two samples Prometheus returns no value for that step, and the chart renders the gap as a spike to zero. Decouple the rate window from the step with get_rate_interval(), which mirrors Grafana's $__rate_interval and always spans several scrapes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tests for get_rate_interval and the server analytics query builder: the rate window always spans at least two scrape intervals, the CPU query uses the widened window for a one-hour range, no rate chart uses a sub-scrape window, and the Prometheus series has no gaps (missing steps become None, not zero). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mock the analytics API and assert the CPU chart renders its line paths with non-spiky data, requests analytics with a valid time range, and shows the empty state when there is no data. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
| Filename | Overview |
|---|---|
| press/api/analytics.py | Adds get_rate_interval() that mirrors Grafana's $__rate_interval: max(timegrain + scrape_interval, 4 × scrape_interval), plus a PROMETHEUS_SCRAPE_INTERVAL constant. Formula is correct; Final is already imported; zero-timegrain guard is present. |
| press/api/server.py | Replaces timegrain with rate_interval in all rate()/increase() PromQL queries (cpu, network, iops, database_commands_count, innodb_bp_miss_percent, innodb_avg_row_lock_time). Import updated accordingly; no unintended side-effects on non-rate queries. |
| press/api/tests/test_analytics.py | New test module covering get_rate_interval(), auto_timespan_timegrain(), per-chart query capture, and Prometheus series alignment. Window-size assertion and gap-as-None tests are both correct. |
| dashboard/tests-e2e/tests/dashboard/server-charts.test.ts | Playwright e2e tests: chart renders with data, start/end params are sent, empty-state shows on no data. Imports coverage.fixture which exists; mock helpers are well-structured. |
| dashboard/tests-e2e/mocks/server-charts/analytics.ts | Mock data helpers for e2e tests; cpuDataWithGaps is exported but never imported in the test file (dead export flagged in a previous review thread). |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["analytics() endpoint\n(start, end, query)"] --> B["auto_timespan_timegrain()\n→ timegrain (chart step)"]
B --> C["get_rate_interval(timegrain)\nmax(timegrain + 60s, 4 × 60s)"]
C --> D{"timegrain ≤ 180 s?\n(e.g. 90 s for 1 h view)"}
D -- "Yes (min floor kicks in)" --> E["rate_interval = 240 s\n(4 × scrape_interval)"]
D -- "No (step is coarse)" --> F["rate_interval = timegrain + 60 s"]
E --> G["PromQL: rate(...[240s])\nAlways ≥ 2 scrapes → no gaps"]
F --> G
G --> H["prometheus_query()\nstep = timegrain (chart resolution unchanged)"]
H --> I["Chart renders without\nspikes to zero"]
Reviews (3): Last reviewed commit: "fix(site): Correct grammar" | Re-trigger Greptile
| const data = cpuData() | ||
| for (const dataset of data.datasets) { | ||
| dataset.values = dataset.values.map((value, i) => | ||
| i % 2 === 1 ? null : value, | ||
| ) | ||
| } | ||
| return data | ||
| } | ||
|
|
||
| export function emptyData() { | ||
| return { datasets: [], labels: [] } | ||
| } |
There was a problem hiding this comment.
cpuDataWithGaps is exported but never imported or used in server-charts.test.ts. The comment says it represents the pre-fix shape ("spikes to zero") and was presumably intended to anchor a regression test that would assert charts with null gaps don't render zeros, but that test was never written. The export is dead code in its current state.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| def test_rate_charts_never_use_a_sub_scrape_window(self): | ||
| start = datetime(2024, 1, 1, 0, 0, 0) | ||
| for chart in ("cpu", "network", "iops", "database_commands_count"): | ||
| query = self._capture_query(start, start + timedelta(minutes=10), chart) | ||
| window = int(query.split("[")[1].split("s]")[0]) | ||
| self.assertGreaterEqual( | ||
| window, | ||
| 2 * PROMETHEUS_SCRAPE_INTERVAL, | ||
| f"{chart} chart queried a {window}s window", | ||
| ) |
There was a problem hiding this comment.
InnoDB rate charts are not covered by the window-size assertion
test_rate_charts_never_use_a_sub_scrape_window checks cpu, network, iops, and database_commands_count, but innodb_bp_miss_percent and innodb_avg_row_lock_time also use rate_interval in their queries and are not included. A future regression that accidentally re-introduces a narrow window on those queries would not be caught here.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #6602 +/- ##
===========================================
+ Coverage 49.43% 50.84% +1.41%
===========================================
Files 976 990 +14
Lines 81257 84133 +2876
Branches 379 701 +322
===========================================
+ Hits 40166 42778 +2612
- Misses 41063 41321 +258
- Partials 28 34 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Make the helper's call sites say what they run. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
fix(server-charts): Widen rate() window to stop spikes to zero
The CPU/network/IOPS and MariaDB rate charts spiked down to zero on
arbitrary steps. The rate()/increase() lookback window was set to the
chart step (timegrain), which auto_timespan_timegrain can size below 2x
the 60s Prometheus scrape interval. When the window spans fewer than two
samples Prometheus returns no value for that step, and the chart renders
the gap as a spike to zero.
Decouple the rate window from the step with get_rate_interval(), which
mirrors Grafana's $__rate_interval and always spans several scrapes.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
test(server-charts): Cover rate-window sizing for analytics queries
Tests for get_rate_interval and the server analytics query builder: the
rate window always spans at least two scrape intervals, the CPU query
uses the widened window for a one-hour range, no rate chart uses a
sub-scrape window, and the Prometheus series has no gaps (missing steps
become None, not zero).
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
test(server-charts): Add Playwright e2e tests for server charts
Mock the analytics API and assert the CPU chart renders its line paths
with non-spiky data, requests analytics with a valid time range, and
shows the empty state when there is no data.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
chore(gitignore): Ignore .claude directory
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com