Skip to content

Make all BufferIterators shaped#496

Open
arcondello wants to merge 14 commits intodwavesystems:mainfrom
arcondello:feature/dtype
Open

Make all BufferIterators shaped#496
arcondello wants to merge 14 commits intodwavesystems:mainfrom
arcondello:feature/dtype

Conversation

@arcondello
Copy link
Copy Markdown
Member

@arcondello arcondello commented Apr 7, 2026

Work in progress

Makes all iterators shaped.

This simplifies the implementation somewhat, and means that all iterators work with std::default_sentinel which is both convenient and faster.

The downside of this change is that our Array::view() method returns a std::range, but not a std::common_range. That means it doesn't play nicely with a lot of the older C++ algorithm methods because the type of the beginning and end iterators are not the same. For example, std::accumulate has this signature:

template< class InputIt, class T >
T accumulate( InputIt first, InputIt last, T init );

so one cannot do

auto view = array->view(state);
std::accumulate(view.begin(), view.end());

One should instead do

std::accumulate(array->begin(state), array->end(state));

or

auto view = array->view(state);
std::accumulate(view.begin(), view.begin() + view.size());

The one that's most annoying is std::vector::assign_range is a C++23 feature.

Overall though I think it's worth it.

@arcondello arcondello changed the title Make all iterators type Make all iterators typed Apr 7, 2026
@arcondello arcondello changed the title Make all iterators typed Make all BufferIterators shaped Apr 7, 2026
@arcondello arcondello added the enhancement New feature or request label Apr 7, 2026
@arcondello arcondello requested a review from wbernoudy April 22, 2026 16:36
@arcondello arcondello marked this pull request as ready for review April 22, 2026 16:36
@arcondello
Copy link
Copy Markdown
Member Author

arcondello commented Apr 22, 2026

I asked Gemini to review and this is what it said

<some preamble>

However, since you are working with C++20/23 and seem to be prioritizing performance for systems programming, there are several architectural "weight" issues and potential edge cases to consider.

### 1. The "Heavyweight Iterator" Problem
In C++, iterators are generally expected to be "lightweight" (cheap to copy/move, often fitting in registers). 

* **Heap Allocation on Copy:** Your copy constructor and `operator=` perform a heap allocation via `std::make_unique_for_overwrite`. This is a significant performance bottleneck if this iterator is used with standard algorithms (like `std::sort` or `std::transform`) which copy iterators frequently.
* **Recommendation:** Use a **Small Buffer Optimization (SBO)** for `loc_`. Since most tensors/arrays have a rank (`ndim`) less than 8, you could use a `std::inplace_vector` (C++26) or a fixed-size buffer with a fallback to the heap.
    ```cpp
    // Example SBO approach
    ssize_t static_loc_[8]; 
    std::unique_ptr<ssize_t[]> dynamic_loc_;
    ```

### 2. Standard Library Compatibility
To fully satisfy the `std::random_access_iterator` concept (required for many `<ranges>` and `<algorithm>` features), you are missing a few type definitions and operators.

* **Missing Tags:** You should explicitly define `iterator_concept` and `iterator_category`.
    ```cpp
    using iterator_concept = std::random_access_iterator_tag;
    using iterator_category = std::random_access_iterator_tag;
    ```
* **Arrow Operator:** Even if it returns a value (proxy), `operator->` is expected for completeness.
* **Const Correctness:** In `location()`, you return a `std::span<const ssize_t>`, but the internal `unique_ptr` is `ssize_t[]`. This is fine, but ensure the method is marked `const`.

### 3. Logic & Performance in `increment_impl_`
The use of `std::div` is clever for handling N-dimensional wrapping, but it comes with a cost.

* **The Cost of `std::div`:** Integer division is one of the most expensive CPU instructions. Calling it `ndim` times for every increment is heavy.
* **Optimization:** Most iterations are contiguous or simple strides. You might consider a "fast path" for `ndim == 1` or for cases where the increment doesn't cause a wrap-around (checking if `loc[axis] + n < shape[axis]`).
* **`operator[]` Cost:** Currently, `operator[]` is $O(ndim)$ because it calls `increment_impl_`. In a standard random-access iterator, this is expected to be $O(1)$. While $O(ndim)$ is technically necessary for non-contiguous multi-dim arrays, users might be surprised by the overhead.

### 4. Safety & Edge Cases
* **Dangling Pointers:** As noted in your comments, `shape_` and `strides_` are non-owning. If the `BufferIterator` outlives the container that owns those arrays, you'll have a hard-to-debug crash. If this is a public-facing API, consider if the iterator should hold a `std::shared_ptr` or a small `std::vector` of the shape/strides.
* **The `ndim_ <= 0` Sentinel Check:** ```cpp
    friend bool operator==(const BufferIterator& lhs, std::default_sentinel_t) {
        if (lhs.ndim_ <= 0) return true;
        return lhs.loc_[0] >= lhs.shape_[0];
    }
    ```
    If `ndim_ == 0` (a scalar), this returns `true` immediately. This means a scalar iterator is **always** equal to the sentinel, making it impossible to iterate over (and dereference) a single scalar value. Usually, a scalar should allow one dereference before reaching the end.

### 5. `typing.hpp` Observations
* **Type Exhaustiveness:** In `value_()`, you switch on `FormatCharacter`. Ensure that every type allowed by the `DType` concept is covered. 
* **`long` vs `long long`:** On many systems (like Windows/MSVC), `long` and `int` are both 32-bit, while `long long` is 64-bit. On Linux, `long` is often 64-bit. Your `FormatCharacter` enum and `value_()` switch should be careful to map to the explicit `std::int64_t` types used in the concept to avoid "unreachable" code or missing cases on different compilers.

### Summary of Recommendations
1.  **Eliminate heap allocation on copy** by using a fixed-size array for `loc_` if `ndim` is small.
2.  **Add `iterator_concept`** to allow C++20 ranges to recognize it as a random-access iterator.
3.  **Fix the scalar (`ndim == 0`) logic** in the sentinel comparison to allow one-time dereferencing.
4.  **Inline the "fast path"** for `n=1` and `ndim=1` in `operator++` to avoid the `std::div` overhead for the most common use case.

Would you like me to draft a version of the `increment_impl_` that includes a fast-path for contiguous data?

For (1), I need to think about it. I certainly wouldn't handle the 8dim case, but I am a little tempted to handle the 1D case with a union. Something like

union Loc {
    std::make_unique<ssize_t[]> multi;
    ssize_t flat;
}

the code complexity is quite non-trivial though. Needs thought.

(2) seems like a no-brainer worth doing

For (3) I think it's confused and dereferencing 0d iterators works just fine. Even though it is equal to std::default_sentinel.

For (4) I am not sure the performance will be much different, but I'll look into it.

I'll also add operator->, surprised that isn't done for me.

@arcondello
Copy link
Copy Markdown
Member Author

arcondello commented Apr 22, 2026

After some more reading:
For (2)

  • We don't satisfy using iterator_category = std::random_access_iterator_tag;, because we don't always return a reference type. We do satisfy using iterator_concept = std::random_access_iterator_tag; though. FWIW that is the default assumption if you don't add iterator_concept, but we might as well be explicit. I did add a test that we work with std::sort as expected.
  • We shouldn't add operator-> because that would return a pointer to the underlying buffer, which we cannot always do and even when we can it's of dubious value.

For (4) I don't see any performance difference.

That leaves (1) which I think is safe to leave alone for now.

So overall we got a few more tests, a missing const, and a more explicit iterator_concept out of the review 😆 I did learn something though!


#include <array>
#include <cstdint>
#include <iostream>
Copy link
Copy Markdown
Member Author

@arcondello arcondello Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unneeded #include

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant