Add probing service#815
Conversation
|
👋 Thanks for assigning @tnull as a reviewer! |
|
🔔 1st Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 2nd Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 3rd Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 4th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 5th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 6th Reminder Hey @tnull! This PR has been waiting for your review. |
There was a problem hiding this comment.
Hi @randomlogin, thanks for the work on this! I've reviewed the first two commits:
I've left a bunch of inline comments addressing configuration and public API, commit hygiene, testing infrastructure, and test flakiness.
In summary:
- A couple of items are exposed publicly that seem like they should be scoped to probing or gated for tests only (see
scoring_fee_paramsinConfigandscorer_channel_liquidityonNode). - The probing tests duplicate existing test helpers (
setup_node,MockLogFacadeLogger). Reusing and extending what's already intests/common/would reduce duplication and keep the test file focused on the tests themselves. test_probe_budget_blocks_when_node_offlinehas a race condition where the prober dispatches probes before the baseline capacity is measured, causing the assertion between the baseline and stuck capacities to fail. Details in the inline comment.- A few nits about commit hygiene, import structure, and suggestions for renaming stuff.
Also needs to be rebased.
| pub struct HighDegreeStrategy { | ||
| network_graph: Arc<Graph>, | ||
| /// How many of the highest-degree nodes to cycle through. | ||
| pub top_n: usize, |
There was a problem hiding this comment.
Could top_n be renamed to num_top_nodes? The latter reads less generic to me but up to you to modify or not.
There was a problem hiding this comment.
I'd leave it as is (maybe top_k, as somehow it is more common in algorithms to describe the number of samplings).
What about top_node_count?
Personally I don't like 'num' as a short for 'number'
|
🔔 7th Reminder Hey @tnull! This PR has been waiting for your review. |
|
@enigbe, thanks for a review, the updates are incoming soon. |
436e4a3 to
07dfde4
Compare
ff741c2 to
c31f1ce
Compare
tnull
left a comment
There was a problem hiding this comment.
Thanks for taking this on and excuse the delay here!
Did a first review pass and this already looks great! Here are some relatively minor comments, mostly concerning the API design.
|
🔔 4th Reminder Hey @enigbe! This PR has been waiting for your review. |
tnull
left a comment
There was a problem hiding this comment.
Seems tests are failing right now:
thread 'exhausted_probe_budget_blocks_new_probes' (167312) panicked at tests/probing_tests.rs:381:5:
no probe dispatched within 15 s
failures:
exhausted_probe_budget_blocks_new_probes
probe_budget_increments_and_decrements
f99786b to
1e73e6e
Compare
948c2fc to
ee21152
Compare
tnull
left a comment
There was a problem hiding this comment.
This still need a rebase.
Introduce a background probing service that periodically dispatches probes to improve the scorer's liquidity estimates. Includes two built-in strategies.
Change cursor of top nodes from HighDegreeStrategy to use cac: Create src/util.rs Add probe HTLC maximal lower bound Fix styling (config argument order), explicit Arc::clone instead of .clone() Change tests open_channel to reuse existing code
The locked_msat budget tracking was broken for Destination probes: send_spontaneous_preflight_probes only returns (PaymentHash, PaymentId) without exposing the actual paths or per-hop amounts. This meant we locked amount_msat at send time but released amount+fees per path in ProbeSuccessful/ProbeFailed events, causing a systematic mismatch. Fix by removing Probe::Destination entirely. Strategies now return a fully constructed Path, and run_prober always uses send_probe(path), locking and releasing the same path.hops.sum(fee_msat) on both sides. HighDegreeStrategy now calls Router::find_route directly and applies the liquidity-limit check itself, mirroring send_preflight_probes. Other fixes in this commit: - Fix RandomStrategy fee calculation: compute proportional fees on the forwarded amount (delivery + downstream fees), not just delivery - Fix HighDegreeStrategy doc - Fix random_range overflow when max - min == u64::MAX - Add doc warning about scorer_channel_liquidity being O(scorer size) - Make probing module public, import objects directly in builder.rs - Reorder EventHandler fields (prober after om_mailbox) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Strategy constructors (high_degree/random_walk/custom) moved from ProbingConfig to ProbingConfigBuilder, so they live on the builder rather than on the thing being built. ProbingConfigBuilder setters switched from consuming `self -> Self` to `&mut self -> &mut Self`, matching NodeBuilder. `build` now takes `&self`. Existing fluent call sites still compile unchanged. Removed the flat new_high_degree/new_random_walk UniFFI constructors on ProbingConfig that replicated the builder wiring. Bindings now go through ArcedProbingConfigBuilder (exposed as ProbingConfigBuilder via UDL), which wraps ProbingConfigBuilder in an RwLock for the Arc semantics UniFFI requires — mirroring ArcedNodeBuilder. AI-assisted (Claude Code).
Previously we always queried gossip data to construct probing route, which would fail for unannounced channels. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Clamp ProbingConfigBuilder::interval to MIN_PROBING_INTERVAL (100ms) in build(). Avoids the tokio::time::interval(Duration::ZERO) panic in run_prober and rules out sub-100ms hot-looping. New constant lives in config.rs alongside DEFAULT_PROBING_INTERVAL_SECS. - Replace .unwrap_or(0) with .expect() on the fetch_update calls in handle_probe_successful / handle_probe_failed. The closure always returns Some, so the Err arm is unreachable; unwrap_or(0) implied a possible failure mode that cannot occur. - Reject RandomStrategy paths whose HTLC bounds force the probe above the user-configured max_amount_msat. Previously the amount could be silently inflated past the user's ceiling. - Document in try_build_path that longer cycles aren't filtered from the random walk; probes fail at the destination by design, so revisiting a node via a different channel is harmless. - Simplify the Debug impl for ProbingConfig by deriving it and giving ProbingStrategyKind its own manual Debug that hides the Custom payload. Replaces a larger hand-written impl on ProbingConfig. - Cache prev.saturating_sub(amount) into `new` in the probe handlers so the log line doesn't recompute it. - Expand ProbingConfig docs with a Caution section noting that stuck intermediate HTLCs can lock outbound liquidity until timeout, and that max_locked_msat is the user-facing backstop for this. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ee21152 to
a10425e
Compare
|
Now it should be fine. Previously I've accidentally absorbed changes from the main without a proper merge. Should I squash my commits (especially with the abundance of empty CI re-trigger commits I made)? Also regarding the commit messages, I'll to be more accurate (and follow case styling). Please, don't hesitate to tell me once again if it occurs. Also ideally the commits should be more granular, right? |
Yes, fixed in #891.
Yes, please generally restructure your commit history so it has a few logical 'feature' commit and preferably only add fixups right after them, so they can be squashed in once reviewed. Usually you'd prefix their commit description with
Yes, that would be great. Preferably code introduced in earlier commit doesn't get changed again in later commits, and all revisions should build test, and format independently. |
Added a probing service which is used to send probes to estimate channels' capacities.
Related issue: #765.
Probing is intended to be used in two ways:
For probing a new abstraction
Proberis defined and is (optionally) created during node building.Prober periodically sends probes to feed the data to the scorer.
Prober sends probes using a ProbingStrategy.
ProbingStrategy trait has only one method:
fn next_probe(&self) -> Option<Probe>; every tick it generates a probe, whereProberepresents how to send a probe.To accommodate two different ways the probing is used, we either construct a probing route manually (
Probe::PrebuiltRoute) or rely on the router/scorer (Probe::Destination).Prober tracks how much liquidity is locked in-flight in probes, prevents the new probes from firing if the cap is reached.
There are two probing strategies implemented:
Random probing strategy, it picks a random route from the current node, the route is probed via
send_probe, thus ignores scoring parameters (what hops to pick), it also ignoresliquidity_limit_multiplierwhich prohibits taking a hop if its capacity is too small. It is a true random route.High degree probing strategy, it examines the graph and finds the nodes with the biggest number of (public) channels and probes routes to them using
send_spontaneous_preflight_probeswhich uses the current router/scorer.The former is meant to be used on payment nodes, while the latter on probing nodes. For the HighDegreeStrategy to work it is recommended to set
probing_diversity_penalty_msatto some nonzero value to prevent routes reuse, however it may fail to find any available routes.There are three tests added:
Example output (runs for ~1 minute, needs
--nocaptureflag):For performance testing I had to expose the scoring data (
scorer_channel_liquidity).Also exposed
scoring_fee_params: ProbabilisticScoringFeeParameterstoConfig.TODOs: