<!--
On Azure Kubernetes Service (AKS) with Azure Managed Prometheus, the metrics agent (ama-metrics) only scrapes custom resources in the azmonitoring.coreos.com/v1 API group. The gpu-operator hard-codes monitoring.coreos.com/v1 as the API group for ServiceMonitor CRDs, so the ServiceMonitors it creates are silently ignored and no GPU metrics are collected. There is no Helm value or chart parameter to override the API group.

References in source:
- ServiceMonitorCRDName = "servicemonitors.monitoring.coreos.com" (pkg/controllers/object_controls.go)
- promv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1" (imports)

Proposed fix:
- Add a Helm value to configure the ServiceMonitor/PodMonitor API group:

    serviceMonitor:
      enabled: true
      apiGroup: "azmonitoring.coreos.com"  # default: "monitoring.coreos.com"

- Or detect the available CRD API group at install time and use whichever is present.

Current workaround:
1. Set serviceMonitor.enabled: false in the Helm values.
2. Deploy a hand-rolled PodMonitor using azmonitoring.coreos.com/v1 that targets the nvidia-dcgm-exporter pods.

This works, but it is fragile and requires maintaining a separate manifest outside the chart.

- gpu-operator chart: v25.3.4
- Kubernetes: AKS 1.31.x
- Monitoring: Azure Managed Prometheus (ama-metrics addon)
- Affected component: dcgm-exporter ServiceMonitor
- Impact: Any AKS cluster using Azure Managed Prometheus with gpu-operator gets zero GPU metrics unless the user works around this manually.
-->
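For reference, the hand-rolled PodMonitor from the workaround might look roughly like the sketch below. It is not part of the gpu-operator chart. The resource name, the `gpu-operator` namespace, the `app: nvidia-dcgm-exporter` pod label, the `metrics` port name, and the 30s interval are all assumptions and should be checked against the dcgm-exporter pods actually running in the cluster.

```yaml
# Hypothetical workaround manifest; every name and label here is an assumption.
apiVersion: azmonitoring.coreos.com/v1   # API group scraped by ama-metrics
kind: PodMonitor
metadata:
  name: nvidia-dcgm-exporter             # assumed name
  namespace: gpu-operator                # assumed install namespace
spec:
  selector:
    matchLabels:
      app: nvidia-dcgm-exporter          # assumed pod label
  podMetricsEndpoints:
    - port: metrics                      # assumed container port name
      path: /metrics
      interval: 30s                      # assumed scrape interval
```

Before applying it, `kubectl get pods -n <namespace> --show-labels` shows which labels the dcgm-exporter pods really carry, and the port name can be read from the pod spec. With `serviceMonitor.enabled: false` set in the Helm values, the manifest can then be applied with `kubectl apply -f`.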