
SparseStorage, concurrent_flat_map, and SparseMatrixAtomic #43

Open
bendavid wants to merge 33 commits into main from calibrationdev

Conversation


@bendavid bendavid commented Apr 9, 2026

Adds a sparse storage backend for HistoBoost and the supporting
data structures (lock-free concurrent map and SparseMatrixAtomic),
plus a generic MapWrapper helper that is then used to extend the
existing HistShiftHelper / QuantileHelper helpers with automatic
broadcasting over container arguments, tensor broadcasting, and a
continuous quantile-transform mode.

Bottom-up commit list:

  • add python script for tests
  • Add SymMatrixAtomic
  • minor improvement for SymMatrixAtomic and add initial version of SparseMatrixAtomic
  • fix deprecated storage_type access
  • fix constness
  • make wrapper more flexible/robust
  • flexible column types for quantile helpers
  • add missing include
  • make range_to more flexible
  • add lock-free insert-only concurrent_flat_map
  • add SparseMatrixAtomic test driver
  • SparseMatrixAtomic: switch to narf::concurrent_flat_map
  • concurrent_flat_map: add move constructor and assignment
  • HistoBoost: add SparseStorage option backed by concurrent_flat_map
  • HistoBoost SparseStorage: convert result to wums.SparseHist
  • concurrent_flat_map: serialize segment growth via sentinel
  • SparseStorage: fix ND linearization mismatch with SparseHist
  • SparseMatrixAtomic: configurable fill_fraction
  • HistShiftHelper: guard against non-finite bin geometry
  • Add MapWrapper helper for element-wise application over container args
  • HistShiftHelper: delegate container broadcasting to MapWrapper
  • QuantileHelper[Static]: delegate container broadcasting to MapWrapper
  • QuantileHelper[Static]: add continuous CDF-style lookup mode
  • define_quantile_ints: support continuous quantile mode
  • build_quantile_hists: return bin centers and volumes
  • MapWrapper: simplify to a single forwarding constructor
  • Rename define_quantile_ints to define_quantiles and add label parameter
  • quantile_lookup: fix CDF boundary and degenerate-bin issues — clamp
    the continuous CDF upper bound below 1 to keep outputs inside a
    Regular(N, 0, 1) axis, and handle degenerate (collapsed) quantile
    bins without producing NaN.
  • quantile_lookup: fix CDF formula to align with bin boundaries
    edges[k] now maps to (k+1)/N (right edge of bin k) instead of
    k/(N-1), fixing a first-bin pile-up in the continuous mode.
  • HistoBoost: forward optional metadata to the output histogram
    adds a metadata keyword threaded through both the dense
    hist.Hist and sparse SparseHist._from_flat paths.
  • Add build_quantile_hists_from_fine for histogram-based quantile edges — extracts chained conditional quantile edges from a
    pre-filled fine-binned histogram via cumulative-sum analysis
    (searchsorted on fine bin edges), replacing the sort-based approach
    with a single multi-threaded RDataFrame event loop. Returns
    (quantile_hists, centers_hist, volume_hist) and supports weighted
    events.
  • Add TensorMapWrapper and enable tensor broadcasting in quantile helpers — new TensorMapWrapper in rdfutils.hpp that element-
    wise applies a callable over Eigen tensor arguments (shape-preserving,
    scalar return-type-deduced, non-tensor args broadcast). All four
    quantile helpers are now
    MapWrapper<TensorMapWrapper<Impl>>, so they transparently handle
    scalar, RVec, and Eigen tensor inputs.
  • quantile_lookup: unify N+1 edge layout across integer and continuous
    modes — the stored edge tensor now has [val_min, e_0, ..., e_{N-1}]
    so every quantile bin gets its own [edges[i], edges[i+1]] segment.
    Fixes a residual first-bin off-by-one in continuous mode and
    simplifies the integer-mode lookup to a uniform
    clamp(iquant-1, 0, N-1). build_quantile_hists{_from_fine} prepend
    val_min to match the new layout (the helper hist gains one extra
    edge slot on its last axis).

This series is the narf side of the larger sparse-input rework; the
companion rabbit and wums PRs build on it:

WMass/rabbit#129
WMass/wums#25

bendavid and others added 30 commits April 3, 2026 21:15
A segmented open-addressing hash map for integer keys supporting
concurrent lock-free find / insert / emplace / expansion. State bits
are encoded in the two MSBs of each slot's key. Includes tests
covering single-threaded correctness, pointer stability across
expansion, and multi-threaded concurrent insert/find, plus a test
for SparseMatrixAtomic that exercises its public API under
concurrent fetch_add.
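The slot encoding described above (two state bits packed into the MSBs of each key word) can be sketched as follows. This is a Python illustration only: the real narf::concurrent_flat_map is C++, and the exact bit assignments and state names here (EMPTY / RESERVED / OCCUPIED, a 62-bit key field) are assumptions, not taken from the source.

```python
# Hedged sketch: pack a 2-bit slot state into the two MSBs of a 64-bit
# word, leaving the low 62 bits for the integer key. The real map's
# encoding may differ; this only illustrates the technique.
STATE_SHIFT = 62
STATE_MASK = 0b11 << STATE_SHIFT        # two most-significant bits
KEY_MASK = (1 << STATE_SHIFT) - 1       # low 62 bits hold the key

EMPTY, RESERVED, OCCUPIED = 0, 1, 2     # hypothetical state values

def pack(state, key):
    """Combine a 2-bit state and a 62-bit key into one slot word."""
    assert key & ~KEY_MASK == 0, "key must fit in 62 bits"
    return (state << STATE_SHIFT) | key

def unpack(word):
    """Recover (state, key) from a packed slot word."""
    return word >> STATE_SHIFT, word & KEY_MASK
```

Packing the state into the key word is what lets a single atomic compare-exchange transition a slot from empty to reserved without a separate lock.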

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces tbb::concurrent_unordered_map with the new lock-free
insert-only flat map, removing the FIXME about lock contention on
inserts. reserve() becomes a no-op since the new map grows on
demand.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Required so the map can live as a member of other movable types
(e.g. a boost::histogram storage class). The moved-from object is
left in a destroy-only state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds narf::concurrent_sparse_storage, a boost::histogram Storage type
backed by narf::concurrent_flat_map with has_threading_support = true,
plus a make_histogram_sparse factory and python-friendly snapshot
helpers (boost::histogram does not expose its storage_ member to
cppyy directly).

HistoBoost gains a SparseStorage marker class taking an estimated
fill_fraction (default 0.1) used to pre-size the underlying map and
avoid most on-the-fly expansions. Tensor weights are not supported in
this mode and conversion to a python hist.Hist is skipped; the raw
RResultPtr is returned. Includes an end-to-end RDataFrame test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SparseStorage path now lazily converts the underlying C++
sparse histogram to a wums.sparse_hist.SparseHist on first
dereference, snapshotting the concurrent_flat_map into flat
indices/values that match the with-flow row-major layout.
Pass convert_to_hist=False to get the raw RResultPtr instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously every thread that observed a saturated tail segment
speculatively allocated a doubled-size successor and then either
won the CAS or freed it. Under high thread contention this caused
a transient memory spike of M_threads * segment_size per growth
event, easily inflating peak RSS by an order of magnitude for
multi-GB segments and potentially fragmenting the address space.

ensure_next now CAS-publishes a "growing" sentinel into the
segment's next pointer before allocating; only the winning thread
performs the allocation while losers yield-spin until the real
successor is published. All segment walks use a new observed_next
helper that treats the sentinel as "no successor yet".
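The growth protocol above can be sketched as follows. This is an illustration only: the compare-and-swap is emulated with a lock (so it is not actually lock-free), whereas the real C++ code uses an atomic compare_exchange on the segment's next pointer; the class and method names are hypothetical.

```python
import threading
import time

GROWING = object()  # sentinel published while one thread allocates

class Segment:
    """Toy model of the sentinel-guarded segment-growth protocol."""

    def __init__(self, size):
        self.size = size
        self._next = None              # None, GROWING, or a Segment
        self._lock = threading.Lock()  # emulates a CAS on _next

    def _cas_next(self, expected, new):
        # Stand-in for atomic compare_exchange on the next pointer.
        with self._lock:
            if self._next is expected:
                self._next = new
                return True
            return False

    def observed_next(self):
        # Treat the sentinel as "no successor yet".
        nxt = self._next
        return None if nxt is GROWING else nxt

    def ensure_next(self):
        # Only the CAS winner allocates; losers yield-spin until the
        # real successor replaces the sentinel.
        if self._cas_next(None, GROWING):
            successor = Segment(2 * self.size)  # the single allocation
            self._next = successor
            return successor
        while self.observed_next() is None:
            time.sleep(0)                       # yield while growing
        return self.observed_next()
```

The key property is that at most one doubled-size segment is ever allocated per growth event, regardless of how many threads observe the saturated segment simultaneously.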

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
boost::histogram linearizes axes column-major (leftmost axis has
stride 1), but wums.SparseHist expects numpy row-major flat
indices. For ND histograms this caused entries to land in the
wrong bins (often flow bins) and silently disappear from
toarray(flow=False); 1D was unaffected and so the existing test
did not catch it.

The conversion now un-ravels each boost-linear key under F order
and re-ravels under C order before constructing the SparseHist.
Adds a 3D test that cross-checks against a dense HistoBoost.
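The un-ravel/re-ravel conversion described above can be reproduced directly with numpy; the axis sizes and key values below are hypothetical examples, not taken from the PR.

```python
import numpy as np

# boost::histogram linearizes column-major (leftmost axis has stride 1);
# wums.SparseHist expects row-major (C-order) flat indices. Convert by
# un-raveling each key under F order and re-raveling under C order.
shape = (4, 3, 5)                      # hypothetical with-flow axis sizes

boost_keys = np.array([0, 1, 7, 59])   # example column-major flat indices
multi = np.unravel_index(boost_keys, shape, order="F")
c_keys = np.ravel_multi_index(multi, shape, order="C")
```

For 1D histograms the two orders coincide (F and C linearization are identical), which is why the original 1D test could not catch the mismatch.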

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the hard-coded size0*size1/40 initial capacity with a
fill_fraction constructor argument (default 0.025 to match the
previous behaviour) that sizes the underlying concurrent_flat_map
to fill_fraction * size0 * size1 entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Treat continuous-axis bins with infinite width or center as flow bins
and return zero correction, preventing NaN propagation when an axis
uses np.inf as a bin edge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MapWrapper wraps an arbitrary callable so that, when invoked, any
argument satisfying narf::is_container is zipped element-wise (with
scalar arguments broadcast via make_view) and the callable is applied
to each resulting tuple via std::apply. If none of the arguments are
containers, the callable is invoked directly with the arguments as-is.

Also provide a forwarding constructor so the wrapped callable can be
constructed in place from MapWrapper's own constructor arguments, and
add a unit test exercising both the container and scalar-passthrough
code paths.
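The MapWrapper semantics described above can be sketched with a Python analogue. The real helper is a C++ template using narf::is_container and make_view; the function below is an illustration of the same dispatch rule, with hypothetical names.

```python
def map_wrapper(func):
    """Python analogue of MapWrapper: zip container arguments
    element-wise, broadcast scalar arguments, and fall through to a
    plain call when no argument is a container (illustration only)."""
    def is_container(x):
        return isinstance(x, (list, tuple))

    def wrapped(*args):
        if not any(is_container(a) for a in args):
            return func(*args)          # scalar passthrough path
        n = len(next(a for a in args if is_container(a)))
        rows = ([a[i] if is_container(a) else a for a in args]
                for i in range(n))      # zip containers, repeat scalars
        return [func(*row) for row in rows]
    return wrapped

shift = map_wrapper(lambda x, s: x + s)
```

With this rule, a helper written purely for scalar arguments automatically gains element-wise behavior over container columns such as RVecs.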
Rename the core class to HistShiftHelperImpl and drop its is_container_any
branch, collapsing compute and compute_impl into a single scalar-only
implementation. HistShiftHelper is now a template alias for
MapWrapper<HistShiftHelperImpl<Axes...>>, which restores the previous
element-wise behavior for container arguments while keeping the per-event
code path untouched.
Rename the core classes to QuantileHelperImpl and QuantileHelperStaticImpl
and expose QuantileHelper / QuantileHelperStatic as MapWrapper template
aliases over them. This gives both helpers automatic element-wise
broadcasting over container arguments while leaving their existing
scalar call paths and factory/Python entry points source-compatible.

Also add a unit test exercising the scalar and RVec call paths of
QuantileHelperStatic.
Thread a bool Continuous template parameter through QuantileHelperImpl
and QuantileHelperStaticImpl via a shared quantile_lookup helper. In
continuous mode the helpers return a double in [0, 1] obtained by
linearly interpolating between adjacent stored edges (edges[i] maps to
i/(N-1)), with values outside [edges[0], edges[N-1]] clamped to 0 / 1.

Expose QuantileHelperContinuous / QuantileHelperStaticContinuous aliases
and a make_quantile_helper_continuous factory. Extend the unit test to
cover the scalar and RVec continuous code paths.
Add a continuous=False option to build_quantile_hists which preserves
the original (Regular / Variable) quantile axes in the returned helper
histograms instead of replacing them with Integer axes.
define_quantile_ints auto-detects the mode from the axis type and
dispatches to the continuous quantile helpers, feeding the resulting
CDF-style columns (named _quant instead of _iquant) to subsequent
helpers in the chain.
Also compute per-bin minima (via ak.min) alongside the existing maxima
so that per-dimension widths and centers of the final transformed
quantile bins can be derived. Return two additional histograms:

- centers_hist: multidimensional bin centers stored along an extra
  StrCategory "coord" axis labelled with the input quantile axis names
  (with a quant_i placeholder for any unnamed or duplicated name).
- volume_hist: product of the per-dimension widths of the same bin.

Both are indexed by the full set of conditional and quantile axes
matching the last helper histogram. Update the existing call site in
test/testquantiles.py to unpack the new return tuple.
Remove the explicit copy/move constructors for the wrapped Callable —
the forwarding constructor already handles those cases, and the implicit
copy/move constructors of MapWrapper itself are preferred by overload
resolution (non-template beats template).
The function no longer necessarily returns integers (continuous mode
returns doubles), so drop the _ints suffix. Add an optional label
string that is inserted into output column names (e.g. col_plus_quant)
so the same quantile transform can be applied to multiple sets of input
columns without naming collisions.
Clamp the continuous CDF upper bound to std::nextafter(1.0, 0.0) so
the output always falls within the last bin of a Regular(N, 0, 1) axis
rather than in the overflow, which caused out-of-bounds access when the
CDF value was used as a conditioning variable for the next chained
helper. Also guard against division by zero when two consecutive
quantile edges are equal (collapsed empty bins), returning 0.5 instead
of NaN.
The N stored edges are the right boundaries of N equal-count quantile
groups, so edges[k] should map to (k+1)/N — the right edge of the k-th
bin in a Regular(N, 0, 1) axis. The previous formula (i + frac) / (N-1)
mapped edges[0] to 0 (the left edge of bin 0) rather than to 1/N (the
boundary between bins 0 and 1), causing most events to pile up in the
first bin. Corrected to (i + 1 + frac) / N.
Add a metadata keyword to HistoBoost (default None) and pass it through
to hist.Hist(..., metadata=...) for the dense path and to
SparseHist._from_flat(..., metadata=...) for the sparse path, so that
caller-supplied metadata travels with the resulting histogram.
bendavid and others added 3 commits April 12, 2026 20:28
New function that extracts chained conditional quantile edges from a
pre-filled finely-binned multi-dimensional histogram, replacing the
sort-based approach with cumulative-sum analysis on the histogram bins.

The caller fills a single (condaxes + fine_quantile_axes) histogram in
one multi-threaded RDataFrame event loop, then passes it to this
function which:
  1. Projects out remaining fine axes (sum)
  2. Computes cumulative sums per conditional slice
  3. Uses searchsorted to find quantile boundaries (restricted to fine
     bin edges)
  4. Rebins the fine axis to quantile bins for the next variable's
     conditioning

Returns (quantile_hists, centers_hist, volume_hist) in the same format
as build_quantile_hists. Supports weighted events (the input histogram
can use Double or Weight storage). No sorting, no data materialization,
no multiple event loops.
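The cumulative-sum step at the heart of the extraction can be sketched in numpy for a single 1D conditional slice; the fine binning and counts below are hypothetical.

```python
import numpy as np

# Quantile edges restricted to the fine bin edges of a pre-filled
# histogram: cumulative-sum the bin contents, then searchsorted for
# the target CDF values (counts here are hypothetical).
fine_edges = np.linspace(0.0, 1.0, 11)        # 10 fine bins
counts = np.array([5, 9, 1, 7, 3, 8, 2, 6, 4, 5], dtype=float)

cdf = np.cumsum(counts) / counts.sum()        # CDF at each right fine edge
nquant = 4
targets = np.arange(1, nquant + 1) / nquant   # 0.25, 0.5, 0.75, 1.0

# first fine bin whose cumulative count reaches each target
idx = np.searchsorted(cdf, targets)
quantile_edges = fine_edges[idx + 1]          # right edge of that fine bin
```

Because the edges are restricted to the fine grid, no per-event sorting or data materialization is needed; weighted events only change the contents of `counts`.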
Add TensorMapWrapper to rdfutils.hpp: applies a callable element-wise
over Eigen tensor arguments using the same make_zip_view pattern as
MapWrapper, with non-tensor arguments broadcast via make_repeat_view.
The output tensor shape is preserved and the scalar type is deduced
from the callable's return type.

Wrap QuantileHelper, QuantileHelperContinuous, QuantileHelperStatic,
and QuantileHelperStaticContinuous as MapWrapper<TensorMapWrapper<Impl>>
so they transparently handle scalar, RVec, and Eigen tensor inputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
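The TensorMapWrapper behavior described above has a close numpy analogue: apply a scalar callable element-wise over array ("tensor") arguments, broadcasting non-array arguments, with the output element type deduced from the callable's return value. The real wrapper operates on Eigen tensors in C++; this is only an illustration.

```python
import numpy as np

# Shape-preserving element-wise application with scalar broadcast,
# return type deduced from the callable (numpy analogue only).
lookup = np.vectorize(lambda x, shift: float(x > shift))

x = np.array([[0.1, 0.6], [0.9, 0.2]])
out = lookup(x, 0.5)          # the scalar 0.5 is broadcast over x
```

Stacking the two wrappers as MapWrapper<TensorMapWrapper<Impl>> means a scalar implementation is first lifted over tensor elements, then over container entries, so all three input kinds dispatch to the same per-value code.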
quantile_lookup: unify N+1 edge layout across integer and continuous modes

Previously the stored edge tensor held N right-boundaries of the N quantile
bins; continuous mode interpolated CDF on N-1 interior segments and used a
special linear extrapolation for bin 0, introducing a first-bin off-by-one
bias in the continuous quantile output.

Now both modes store N+1 edges: [val_min, e_0, ..., e_{N-1}], where val_min
is the observed lower boundary of the first quantile bin (CDF = 0) and
e_k is the right boundary of bin k. Each of the N bins — including bin 0
and the last one — gets its own dedicated [edges[i], edges[i+1]] segment,
so continuous CDF is piecewise-linear on N segments and integer lookup
uses i = clamp(iquant - 1, 0, N - 1) uniformly.

build_quantile_hists{_from_fine} now prepend val_min before writing to
the helper histogram, which gains one extra edge slot on its last axis.
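The unified layout can be sketched in numpy. The edge values below are hypothetical, and the function is a simplified stand-in for the C++ quantile_lookup helper, showing only the continuous-mode CDF.

```python
import numpy as np

# N+1 stored edges: [val_min, e_0, ..., e_{N-1}], so quantile bin i owns
# the segment [edges[i], edges[i+1]] (edge values here are hypothetical).
edges = np.array([0.0, 1.0, 3.0, 4.0, 10.0])   # val_min plus N = 4 edges
N = len(edges) - 1

def quantile_lookup_continuous(val):
    """Piecewise-linear CDF on N segments, clamped into [0, 1)."""
    if val <= edges[0]:
        return 0.0
    i = min(np.searchsorted(edges, val) - 1, N - 1)  # segment index
    frac = (val - edges[i]) / (edges[i + 1] - edges[i])
    cdf = (i + frac) / N
    # stay strictly below 1 so a Regular(N, 0, 1) axis never sees overflow
    return min(cdf, float(np.nextafter(1.0, 0.0)))

# Integer mode uses the same layout: the segment for quantile index
# iquant is edges[i] .. edges[i+1] with i = clamp(iquant - 1, 0, N - 1).
```

With the dedicated first segment [val_min, e_0], edge e_k maps to (k+1)/N exactly, which removes the first-bin special case of the earlier formulas.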