Skip to content

fix(prometheus): handle /api/v1/metadata responses without data field#5437

Open
lezzago wants to merge 1 commit into
opensearch-project:mainfrom
lezzago:fix-prometheus-metadata-missing-data-key
Open

fix(prometheus): handle /api/v1/metadata responses without data field#5437
lezzago wants to merge 1 commit into
opensearch-project:mainfrom
lezzago:fix-prometheus-metadata-missing-data-key

Conversation

@lezzago
Copy link
Copy Markdown
Member

@lezzago lezzago commented May 12, 2026

Description

PrometheusClientImpl blindly called response.getJSONObject("data") / response.getJSONArray("data") on every Prometheus API response. The Prometheus HTTP API spec does not mandate that data be present, and Cortex (observed on v1.18.1) legitimately returns {"status":"success"} with no data key when the endpoint has nothing to report.

This is the default state for any Cortex backend fed via OTLP remote-write, since OTLP metrics carry no OpenMetrics # HELP / # TYPE comments for Cortex to surface. A vanilla Prometheus returns {"status":"success","data":{}} in the same state, so the spec allows both shapes.

The resulting JSONException propagates up through PrometheusQueryHandler.getResources()DirectQueryExecutorServiceImpl and surfaces in OpenSearch Dashboards' Metrics Explore UI as "Unable to load metrics" for every metric against any Cortex-backed Prometheus datasource.

Stack trace observed on the live EKS deployment:

org.json.JSONException: JSONObject["data"] not found.
    at org.json.JSONObject.get(JSONObject.java:568)
    at org.json.JSONObject.getJSONObject(JSONObject.java:778)
    at org.opensearch.sql.prometheus.client.PrometheusClientImpl.getAllMetrics(PrometheusClientImpl.java:200)
    at org.opensearch.sql.prometheus.query.PrometheusQueryHandler.getResources(PrometheusQueryHandler.java:136)
    ...

Fix

Guard each affected call site (getAllMetrics, queryRange, query, getLabels, getLabel, getSeries, queryExemplars, getAlerts) with a response.has("data") check, returning a type-appropriate empty value (new JSONObject(), new JSONArray(), new ArrayList<>(), new HashMap<>()) instead of throwing.

Mutable empty collections are intentional: the transport layer reflectively constructs response maps, and Collections.emptyMap() / Collections.emptyList() trigger InaccessibleObjectException ("module java.base does not 'opens java.util' to unnamed module") during XContent serialization.

getRules already routes through normalizeRulesResponse(), which handles the missing-data case, so no change was needed there. Alertmanager methods don't read a data field.

Testing

  • ./gradlew :direct-query-core:test — 254 tests, 0 failures (including 9 new tests — one per patched method — each feeding {"status":"success"} and asserting a safe empty return).
  • ./gradlew :prometheus:test — 104 tests, 0 failures.
  • End-to-end validation against a live Cortex v1.18.1 backend in Docker:
    • Direct REST: GET /_plugins/_directquery/_resources/<ds>/api/v1/metadataHTTP 200 {"data":{}} (was 500).
    • OpenSearch Dashboards Metrics Explore UI: loads fully with label breakdowns, no "Unable to load metrics" toast.
    • Dashboards POST /api/enhancements/resources for metric_metadata200 {"data":{},"type":"prometheus"} (was 503).

Issues resolved

N/A — filed directly.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

joshuali925
joshuali925 previously approved these changes May 12, 2026
Comment thread CHANGELOG.md Outdated
## [Unreleased]

### Fixed
- Fix PrometheusClientImpl to handle Prometheus responses missing the `data` field ([#5437](https://github.com/opensearch-project/sql/pull/5437))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need changelog now?

ps48
ps48 previously approved these changes May 12, 2026
Prometheus-compatible backends (notably Cortex) legitimately return
{"status":"success"} with no "data" key when there is no metric
metadata to report — which is always the case for OTLP-ingested
metrics, since OTLP carries no OpenMetrics HELP/TYPE comments.

PrometheusClientImpl.getAllMetrics(), queryRange(), query(), getLabels(),
getLabel(), getSeries(), queryExemplars(), and getAlerts() blindly
called response.getJSONObject("data")/getJSONArray("data"), throwing
JSONException and breaking OSD's Metrics Explore UI end-to-end for
Cortex-backed Prometheus datasources.

Guard each call site with a response.has("data") check, returning an
empty result for the no-metadata case.

Signed-off-by: Ashish Agrawal <ashisagr@amazon.com>
@lezzago lezzago dismissed stale reviews from ps48 and joshuali925 via cdddbb7 May 12, 2026 22:40
@lezzago lezzago force-pushed the fix-prometheus-metadata-missing-data-key branch from 7848bd8 to cdddbb7 Compare May 12, 2026 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants