diff --git a/assets/img/monitoring-container-thread-pools.png b/assets/img/monitoring-container-thread-pools.png
index 6066cc495c..f90c5aa6c5 100644
Binary files a/assets/img/monitoring-container-thread-pools.png and b/assets/img/monitoring-container-thread-pools.png differ
diff --git a/assets/img/monitoring-dashboard.png b/assets/img/monitoring-dashboard.png
new file mode 100644
index 0000000000..94c882bedc
Binary files /dev/null and b/assets/img/monitoring-dashboard.png differ
diff --git a/assets/img/monitoring-health-indicators.png b/assets/img/monitoring-health-indicators.png
index 5186c9f67e..004fa4a87e 100644
Binary files a/assets/img/monitoring-health-indicators.png and b/assets/img/monitoring-health-indicators.png differ
diff --git a/assets/img/monitoring-jvm-memory.png b/assets/img/monitoring-jvm-memory.png
index 1390207d0e..7d8a8c9c63 100644
Binary files a/assets/img/monitoring-jvm-memory.png and b/assets/img/monitoring-jvm-memory.png differ
diff --git a/en/operations/monitoring.html b/en/operations/monitoring.html
index c6c65d9a8e..8077e067dd 100644
--- a/en/operations/monitoring.html
+++ b/en/operations/monitoring.html
@@ -6,7 +6,7 @@
 - /en/cloud/monitoring.html
 ---
-Sample Vespa Console dashboard
+Sample Vespa Console dashboard

The Vespa Cloud Console has dashboards for insight into performance metrics; use the METRICS tab in the application zone view.
@@ -28,43 +28,45 @@

The Vespa Cloud metrics dashboard

Tabs and filters

Dashboard tab bar

- The dashboard is organized into seven tabs:
+ The dashboard is organized into tabs for different purposes:

Tab
What it shows
When to use it

- Overview
+ Overview
Health indicators, request rates, QoS, latency summary, HTTP status codes, resource utilization
Daily health check, first stop during incidents

- Query
+ Query
Container- and content-node query latency, per-rank-profile breakdown, match/docsum executors
Investigating read latency, query quality issues

- Feed
+ Feed
Feed operation rates and latency at each layer, feed blocking
Investigating write latency or throughput issues

- Nearest Neighbor Search
+ Nearest Neighbor Search
NNS distance computations, visit efficiency
- Tuning HNSW parameters (hidden when not in use)
- Content Node
+ Tuning HNSW parameters (hidden when not in use)
+ Content Node
Document counts, Proton resource usage, executor utilization, maintenance jobs
Deep investigation of search engine internals

- Resources
+ Resources
CPU, memory, disk, GPU, JVM, thread pools
Sizing and scaling decisions

- Health
+ Health
Cluster state, data consistency, restarts, reindexing, resource limits
Stability monitoring, post-incident review

Filters at the top apply across all tabs:

- Query, Feed, Content Node, Resources, and Health tabs group metrics per cluster:
+ Query, Feed,
+ Content Node, Resources,
+ and Health tabs group metrics per cluster:
you see all metrics for one cluster before scrolling to the next.
Container metrics are grouped per container cluster, content metrics per content cluster.

@@ -83,7 +85,7 @@

Annotations

Feed blocked in cluster
- A content node crosses its disk/memory feed-block limit
+ A content node crosses its disk/memory feed-block limit
Writes are paused cluster-wide until remediated

Vespa upgrade
A new Vespa version is rolled out
@@ -116,10 +118,12 @@

Overview tab

Health Indicators

Overview tab Health Indicators row

- The Overview tab opens with a dedicated Health Indicators row:
- five stat panels designed to surface stability issues in a single glance.
- A row of green zeros is the signal to stop; a non-zero value tells you which tab to visit next.
+ The Overview tab opens with a dedicated Health Indicators row,
+ organized into three themed sub-rows. A row of green tiles is the signal to stop;
+ a non-zero value (or low Headroom) tells you which tab to visit next.

+ +
Stability — binary "should be zero" signals
@@ -127,22 +131,78 @@

Health Indicators

Indicator
What it counts
Healthy value

Core Dumps (1h)
Core dumps processed across all clusters in the last hour
- 0 - any non-zero value is a crash to investigate
+ 0: any non-zero value is a crash

Restarts (1h)
- Vespa service restarts across all clusters in the last hour
+ Vespa service restarts across all clusters in the last hour. The underlying
+ sentinel_totalRestarts metric is cumulative since the sentinel
+ started; the "1h" window is computed by the panel via
+ delta(...[1h]) > 0. The > 0 filter discards
+ negative deltas that occur when the sentinel itself restarts and the counter
+ resets (a reset implies a restart happened, but the count within the reset
+ frame is unrecoverable). The same shape is used by the Core Dumps (1h)
+ tile.
0 during steady state; brief spikes are normal during upgrades

Feed Blocked
Nodes currently above a feed-block resource limit
- 0 - non-zero means writes are being rejected cluster-wide
+ 0: non-zero means writes are being rejected cluster-wide
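
The `delta(...[1h]) > 0` shape described for the Restarts (1h) tile can be illustrated with a minimal Python sketch. The `positive_delta` helper and the sample counter values are hypothetical; the panel itself evaluates this in its query language, so treat this only as a model of the filtering behaviour.

```python
from typing import Optional


def positive_delta(first: float, last: float) -> Optional[float]:
    """Delta of a cumulative counter over a window, keeping only positive values.

    Models the `delta(...) > 0` shape: zero or negative deltas are dropped
    (returned as None) instead of being reported.
    """
    delta = last - first
    return delta if delta > 0 else None


# Steady state: the counter did not move within the window.
print(positive_delta(7, 7))
# Two service restarts happened within the window.
print(positive_delta(7, 9))
# The sentinel itself restarted and the counter reset from 9 to 1:
# the negative delta is discarded rather than shown as a bogus value.
print(positive_delta(9, 1))
```

The third call shows why the filter exists: the reset does imply a restart, but the number of restarts inside the reset frame cannot be recovered from the counter alone.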
+ +
Cluster availability
+ + + + + + + + - - -
IndicatorWhat it countsHealthy value
Container: % Nodes DownActive container nodes where some service isn't running0 during steady state; brief spikes during deployments are expected
Content: Groups/Nodes Down Content groups with at least one node down 0 during steady state. 1 group down is normal during rolling restarts or maintenance; 2 or more should be investigated
Container: Services DownActive container nodes where some service isn't running0 during steady state; brief spikes during deployments are expected
+
Resource pressure
+

+ These tiles surface per-cluster saturation signals: values close to the
+ threshold mean the corresponding tab needs investigation now, not after the next outage.
+ The thread saturation tiles only render for the container configuration cases that exist
+ in your deployment (see Container thread pools below).
+

+ Indicator
+ What it counts
+ Healthy value
+
+ Headroom to Feed Block (per content cluster)
+ Remaining headroom before the feed-block limit, taken as the minimum across memory and disk (1 − usage ÷ limit)
+ ≥ 10% (green): healthy. 5–10% (orange): plan capacity. < 5%: act now; ≤ 0: cluster is feed-blocked
+
+ Content Executor Saturation (per content cluster)
+ Worst-case utilization across the Proton executors most relevant to latency: match, docsum, field-writer (utilization and saturation)
+ < 80% (green); 80–95% (orange): queries / feed will start queueing; ≥ 95% (red): action needed
+
+ Container Thread Saturation - search + document-api
+ Per container cluster (with both <search> and <document-api>): worst active / size ratio across all JDisc thread pools
+ < 80% (green); 80–95% (orange); ≥ 95% (red): search-handler saturation directly degrades query latency
+
+ Container Thread Saturation - search only
+ Same as above, for clusters with only <search>
+ Same thresholds (80% / 95%): latency-critical
+
+ Container Thread Saturation - document-api only
+ For clusters with only <document-api>
+ < 90% (green); 90–98% (orange); ≥ 98% (red): later warning since feed delays don't surface as user-visible query failures
+
+ JVM Heap Pressure (per container cluster)
+ Heap used ÷ heap capacity, averaged across hosts in the cluster
+ < 70% (green); 70–85% (orange); ≥ 85% (red). Lights up before Core Dumps or Restarts do, making it the leading indicator for OOM/forced-restart risk
+

+ Note: Headroom to Feed Block inverts the usual reading: higher is better.
+ Its underlying metric aggregates across all storage nodes including those in maintenance
+ or retired state, so headroom can show below 5% on a cluster that isn't actually
+ feed-blocked. Cross-reference the Feed Blocked tile
+ (which only counts in-service nodes) for ground truth.
+
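
As a rough illustration of the Headroom to Feed Block arithmetic (minimum of 1 − usage ÷ limit across memory and disk) and its colour bands, here is a minimal Python sketch. The function names and the usage/limit values in the example are hypothetical and are not taken from the dashboard definition.

```python
def headroom_to_feed_block(mem_usage: float, mem_limit: float,
                           disk_usage: float, disk_limit: float) -> float:
    """Minimum of (1 - usage / limit) across memory and disk, as a fraction."""
    return min(1 - mem_usage / mem_limit, 1 - disk_usage / disk_limit)


def headroom_status(headroom: float) -> str:
    """Map a headroom fraction to the bands described for the tile."""
    if headroom <= 0:
        return "feed-blocked"
    if headroom < 0.05:
        return "act now"
    if headroom < 0.10:
        return "plan capacity"
    return "healthy"


# Hypothetical cluster: memory at 60% of capacity against an 80% limit,
# disk at 72% of capacity against a 75% limit. Disk is the tighter resource.
h = headroom_to_feed_block(mem_usage=0.60, mem_limit=0.80,
                           disk_usage=0.72, disk_limit=0.75)
print(headroom_status(h))
```

Because the tile takes the minimum, a single tight resource determines the reading even when the other has plenty of room. As noted above, a low value should still be cross-checked against the Feed Blocked tile.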

+

QoS and latency overview

QoS (Quality of Service) shows the percentage of successful requests.
@@ -187,9 +247,9 @@

Container-level metrics

  • Did QPS increase? More queries means more load.
  • Which latency metric increased?
  • @@ -199,11 +259,11 @@

    Container-level metrics

    The Query Quality row shows:

    @@ -214,19 +274,84 @@

    Rank profile metrics

    the Rank Profile dropdown:

    + + +
    Reading the metrics together — the matching pipeline
    +

+ These panels split per-query cost across the four phases of a matching query.
+ An operator triaging high content-side CPU benefits from reading them in pipeline order
+ rather than panel-by-panel:
+

    +
    +match  →  first-phase rank  →  second-phase rerank  →  grouping & result construction
    +
    +

    Cost model per phase:

    + +

+ Soft-doom signals are the outcome of all of the above.
+ soft_doomed_queries counts queries that exceeded their soft timeout and
+ returned partial results; soft_doom_factor is an adaptive multiplier
+ (starts at 0.5, ticks ±0.01/0.02 per query depending on whether the query
+ finished under its soft timeout) that Vespa uses to shrink the per-query
+ deadline when queries are consistently overrunning. If soft-doom is firing, drill into
+ setup / rerank / grouping time on the same profile to find the overrunning phase.
+
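
The adaptive behaviour of soft_doom_factor can be pictured with a deliberately simplified Python model. This is an assumption-laden sketch, not Vespa's implementation: the text above does not say which step size applies in which direction or how the factor is bounded, so the `step_up` and `step_down` defaults are illustrative parameters and no clamping is modelled.

```python
def update_soft_doom_factor(factor: float,
                            finished_under_soft_timeout: bool,
                            step_up: float = 0.01,
                            step_down: float = 0.02) -> float:
    """One per-query adjustment of the factor (simplified, hypothetical model).

    The factor shrinks when a query overruns its soft timeout and grows
    again when a query finishes in time. Step sizes are assumptions.
    """
    if finished_under_soft_timeout:
        return factor + step_up
    return factor - step_down


factor = 0.5  # starting value stated above
# Three consecutive queries overrun their soft timeout.
for _ in range(3):
    factor = update_soft_doom_factor(factor, finished_under_soft_timeout=False)
print(round(factor, 2))
```

The point of the model is the direction of travel: a run of overrunning queries pulls the factor down, which matches the description of Vespa shrinking the per-query deadline when queries consistently overrun.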

    + +
    Metric semantics — some non-obvious points
    +

+ Three of the timing metrics measure something slightly different from what their name
+ suggests:
+

    + -

    Things to look for:

    +

+ The docs_matched rate is mostly a proxy for first-phase ranking
+ work, but a few mechanisms can skip ranking for matched documents:
+ rank-score-drop-limit drops low-scoring docs (still counted as matched),
+ match-phase limiting can cap how many docs reach ranking, and threads
+ hitting soft-timeout mid-loop never rank the remainder of their range.
+

    + + +
    Things to look for

- See Latency tracking below for a worked example,
- and the
- rank profiles documentation for background.
+ Each panel's hover tooltip carries "impact when high" and
+ "investigate" hints, and points to the next likely panel to drill
+ into. See Latency tracking below for a worked example,
+ the rank profiles documentation
+ for background, and the
+ Practical search
+ performance guide for tuning recipes.

    Match and Docsum executor panels

- The Query tab also includes Match Executor and Docsum Executor sub-rows
- (queue size + accepted rate) so you can see whether the content-node thread pools
- feeding the query and summary paths are saturated. These are not attributable to a
- rank profile, but often explain tail-latency spikes that aren't visible in rank-profile metrics.
+ The Query tab also includes Match Executor and Docsum Executor sub-rows so you can see
+ whether the content-node thread pools feeding the query and summary paths are saturated. These are
+ not attributable to a rank profile, but often explain tail-latency spikes that aren't visible in
+ rank-profile metrics.
+

    +

    The Docsum Executor row carries four panels per content cluster:

+ Panel
+ What it shows
+ Read it together with
+
+ Docsum executor queue size (max)
+ Peak length of the per-node docsum thread-pool queue. Sustained non-zero means tasks
+ are arriving faster than they can be drained.
+ Docsum latency: queue depth and latency rise together when the pool is the bottleneck.
+
+ Docsum executor accepted (rate)
+ Throughput at the front door: tasks scheduled per second. One task = one summary
+ document to render.
+ Document summaries requested (rate): accepted vs. completed.
+
+ Docsum latency
+ Avg (steady-state) and max (per-host worst) time to render a summary. Cost grows with
+ summary class size, number of summary fields, and match-features /
+ summary-features that recompute at docsum time.
+ Queue size: rising latency with rising queue points at executor saturation.
+
+ Document summaries requested (rate)
+ Throughput at the back door: renderings completed per second. Derived from
+ the docsum latency sample count over the snapshot interval.
+ Docsum executor accepted (rate): sustained accepted > completed lines up with
+ growing queue depth and rising docsum latency.
    +

+ Docsum cost is not attributable to a single rank profile, so investigate the overall
+ query mix: expensive summary classes, large hits counts, or
+ match-features / summary-features lists that force per-hit feature
+ recomputation.
+
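
The "accepted vs. completed" reading from the Docsum Executor table can be sketched in Python as follows. The helper names and sample rates are hypothetical. The sketch relies only on the statement above that one accepted task corresponds to one summary document, so the two rates are directly comparable.

```python
from typing import List, Tuple


def backlog_growth(accepted_per_sec: float, completed_per_sec: float,
                   interval_sec: float) -> float:
    """Estimated number of docsum tasks added to the queue over one interval."""
    return (accepted_per_sec - completed_per_sec) * interval_sec


def is_sustained_backlog(samples: List[Tuple[float, float]]) -> bool:
    """True when accepted exceeds completed in every (accepted, completed) sample."""
    return len(samples) > 0 and all(a > c for a, c in samples)


# Hypothetical rates from three consecutive 60-second snapshots.
samples = [(1200.0, 1150.0), (1300.0, 1180.0), (1250.0, 1190.0)]
print(is_sustained_backlog(samples))
print(sum(backlog_growth(a, c, 60) for a, c in samples))
```

A single snapshot with accepted above completed is unremarkable; it is the sustained gap that should line up with growing queue depth and rising docsum latency.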

    +

+ Docsum reads summary fields from the
+ document store. When those reads miss
+ the document-store cache they become disk reads, which surface as
+ CPU IOWait on the Resources tab, so a high
+ Document summaries requested (rate) combined with a low
+ Document Store Cache Hit Rate is the typical cause of IOWait on a
+ search cluster with no active feed.

    @@ -259,12 +435,15 @@

    Feed tab

→ Container Feed Latency (document processing chains, embedders)
→ Distributor Latency (routing based on bucket distribution)
→ Content: Storage Latency (persistence, per document replica)
- → Commit Latency (transaction log)
+ → Persistence engine (input queue + adaptive concurrency throttle)
+ → Commit Latency (transaction log)

Start from the top and find where latency increases.
If container feed latency is normal but HTTP write latency is high, the bottleneck is network/payload.
If distributor latency is high, check for node state issues in the Health tab.
- If storage latency is high, check disk I/O in the Resources tab.

+ If storage latency is high, check the
+ Persistence Engine row to see whether the bottleneck is
+ the storage backend queue or the concurrency throttle, and check disk I/O in the Resources tab.
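
The top-down reading of the feed pipeline can be sketched as a small Python helper. This is one interpretation of the advice above, under stated assumptions: because latency measured at an upper layer includes the layers below it, the sketch reports the deepest layer whose latency is elevated. The baseline values, the `factor` threshold, and the function name are hypothetical.

```python
from typing import Dict, Optional

# Latency layers named in this section, ordered from the top of the pipeline down.
FEED_LAYERS = [
    "HTTP write latency",
    "Container Feed Latency",
    "Distributor Latency",
    "Content: Storage Latency",
    "Commit Latency",
]


def deepest_elevated_layer(current_ms: Dict[str, float],
                           baseline_ms: Dict[str, float],
                           factor: float = 2.0) -> Optional[str]:
    """Return the deepest layer whose latency exceeds factor x its baseline."""
    culprit = None
    for layer in FEED_LAYERS:
        if current_ms[layer] > factor * baseline_ms[layer]:
            culprit = layer
    return culprit


baseline = {layer: 10.0 for layer in FEED_LAYERS}
# HTTP write latency is high while container feed latency is normal:
# the elevated layer is the HTTP one, matching the network/payload case above.
current = dict(baseline, **{"HTTP write latency": 80.0})
print(deepest_elevated_layer(current, baseline))
```

The Persistence Engine row is left out of the list because it is described in terms of queue and throttle state rather than a latency figure; it is the next place to look once storage latency is the layer flagged.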

    Typical healthy values