Skip to content

refactor: migrate metrics from prometheus crate to metrics-rs facade#6328

Open
shuheiktgw wants to merge 2 commits intomainfrom
refactor/metrics-rs-migration
Open

refactor: migrate metrics from prometheus crate to metrics-rs facade#6328
shuheiktgw wants to merge 2 commits intomainfrom
refactor/metrics-rs-migration

Conversation

@shuheiktgw
Copy link
Copy Markdown
Collaborator

@shuheiktgw shuheiktgw commented Apr 21, 2026

Description

Replace the prometheus crate with the metrics-rs facade so a single call site can fan out to Prometheus, OTLP, and DogStatsD. A FanoutBuilder in logger.rs composes the three recorders and a PrefixFilter routes quickwit_* to Prometheus + OTLP and pomsky.* to DogStatsD.

Histogram-only static registration via inventory

metrics-rs does not expose bucket configuration through the Recorder trait — the Prometheus and OTLP exporters need bucket bounds configured up-front, before any observation. To avoid a full metric-registry refactor, histograms only are declared statically: new define_histogram! / define_histogram_vec! macros register a HistogramConfig via inventory at compile time, and the install path walks inventory::iter::<HistogramConfig>() to apply bounds to both exporters.

Counters and gauges continue to use the existing runtime new_counter* / new_gauge* helpers. Per discussion with @fulmicoton, migrating them to the same static style is out of scope for this PR.

Breaking change: gRPC metric naming

GrpcMetricsLayer::new(subsystem, kind) previously baked subsystem into the metric name (quickwit_<subsystem>_grpc_requests_total, ...). Since histograms now require a compile-time-fixed name, the subsystem is extracted into a dynamic service label on a single metric family:

quickwit_grpc_requests_total{service=..., kind=..., rpc=..., status=...}
quickwit_grpc_requests_in_flight{service=..., kind=..., rpc=...}
quickwit_grpc_request_duration_seconds{service=..., kind=..., rpc=..., status=...}

@shuheiktgw shuheiktgw force-pushed the refactor/metrics-rs-migration branch from 12ee5b5 to 2658fba Compare April 21, 2026 14:01
@shuheiktgw shuheiktgw marked this pull request as draft April 21, 2026 14:03
@shuheiktgw shuheiktgw marked this pull request as draft April 21, 2026 14:03
@shuheiktgw shuheiktgw force-pushed the refactor/metrics-rs-migration branch from 2658fba to 54ccd31 Compare April 21, 2026 14:26
Replace direct prometheus crate usage with metrics-rs facade wrappers
backed by DashMap-cached handles, enabling dual export to Prometheus
and OTLP from the same call sites.

- Introduce wrapper types (IntCounter, IntGauge, Gauge, Histogram,
  IntCounterVec, IntGaugeVec, HistogramVec, GaugeGuard, OwnedGaugeGuard)
  holding metrics-rs handles obtained via counter!/gauge!/histogram!
  macros at construction time, avoiding per-call allocation.
- Cache per-label-combination handles via DashMap with FNV hashing on
  Vec types.
- Register histograms statically through inventory with new
  define_histogram!/define_histogram_vec! macros, decoupling bucket
  configuration from instantiation order.
- Set up a Router recorder dispatching by metric name prefix: a Fanout
  of PrometheusRecorder and OpenTelemetryRecorder (opt-in via
  QW_ENABLE_OPENTELEMETRY_OTLP_EXPORTER) for quickwit_* metrics,
  DogStatsDRecorder for pomsky.* metrics, and NoopRecorder as the
  default.
- Wire PrometheusHandle::render() into metrics_text_payload() for the
  /metrics endpoint and propagate histogram bucket bounds to
  OpenTelemetryRecorder so OTLP uses the same boundaries.
- Hoist gRPC metrics into shared statics with service/kind as dynamic
  label values.
- Add unit tests using metrics-util's DebuggingRecorder via
  with_local_recorder to verify values forwarded through the facade.
@shuheiktgw shuheiktgw force-pushed the refactor/metrics-rs-migration branch from 54ccd31 to 0c98471 Compare April 21, 2026 14:32
@shuheiktgw shuheiktgw marked this pull request as ready for review April 21, 2026 14:47
@Mallets Mallets self-requested a review April 21, 2026 15:14
Comment thread quickwit/quickwit-common/src/metrics.rs Outdated

/// Declares a labeled histogram (`HistogramVec<N>`) with compile-time static registration.
#[macro_export]
macro_rules! define_histogram_vec {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't really need to have two macros: define_histogram and define_histogram_vec.

You can have a single macro accepting a variable number of arguments, e.g.:

macro_rules! define_histogram {
    (  $static_name:ident,
        name: $name:literal,
        help: $help:literal,
        subsystem: $subsystem:literal,
        buckets: $buckets:expr $(,)?
    ) => {
        // Do stuff to define the histogram
    };
    (  $static_name:ident,
        name: $name:literal,
        help: $help:literal,
        subsystem: $subsystem:literal,
        buckets: $buckets:expr $(,)?,
        const_labels: [$(($ck:literal, $cv:literal)),* $(,)?],
        labels: [$($label:literal),+ $(,)?],
    )=> {
        // Do stuff to define the histogram with labesl
    };
}

You can think about it like it was match statement run at compile time based on the number/types of arguments.


/// Declares a histogram with compile-time static registration.
#[macro_export]
macro_rules! define_histogram {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To improve the readability of the whole codebase, I think it's better if the macro returns the Histogram directly without declaring the static and LazyLock internally.

This makes the macro more portable outside the static context. The histogram definition than becomes a usage pattern in the codebase rather than a constraint of the macro.

E.g.:

quickwit_common::define_histogram! {
    THREAD_UNPARK_DURATION_MICROSECONDS,
    name: "thread_unpark_duration_microseconds",
    help: "Duration for which a thread of the main tokio runtime is unparked.",
    subsystem: "cli",
    buckets: quickwit_common::metrics::exponential_buckets(5.0, 5.0, 5).unwrap(),
}

would become:

static THREAD_UNPARK_DURATION_MICROSECONDS: LazyLock<Histogram> = LazyLock::new(|| quickwit_common::define_histogram! {
    name: "thread_unpark_duration_microseconds",
    help: "Duration for which a thread of the main tokio runtime is unparked.",
    subsystem: "cli",
    buckets: quickwit_common::metrics::exponential_buckets(5.0, 5.0, 5).unwrap(),
});

…back

- Merge define_histogram! and define_histogram_vec! into a single
  macro with two match arms (no-labels / labeled variants) dispatched
  by argument shape.
- Return a Histogram / HistogramVec<N> expression from the macro
  instead of declaring `pub static ... = LazyLock::new(...)` internally.
  Callers now own the static pattern, so the macro is portable outside
  the static context.
- Migrate all call sites (quickwit-cli, quickwit-common/tower,
  quickwit-ingest, quickwit-jaeger, quickwit-lambda-client,
  quickwit-opentelemetry, quickwit-parquet-engine, quickwit-search,
  quickwit-serve, quickwit-storage) to wrap the macro in
  `LazyLock::new(|| define_histogram!(...))`.

`inventory::submit!` continues to run at process startup via the
.init_array linker section even when the macro expansion lives inside
a closure body, so bucket-bounds registration is unaffected.
@shuheiktgw
Copy link
Copy Markdown
Collaborator Author

@Mallets I’ve fixed it, so could you take a look when you have a chance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants