Skip to content

Add benchmark for cast from/to decimals#9729

Open
klion26 wants to merge 2 commits intoapache:mainfrom
klion26:add_decimalcast_benchmark
Open

Add benchmark for cast from/to decimals#9729
klion26 wants to merge 2 commits intoapache:mainfrom
klion26:add_decimalcast_benchmark

Conversation

@klion26
Copy link
Copy Markdown
Member

@klion26 klion26 commented Apr 15, 2026

Which issue does this PR close?

What changes are included in this PR?

Add benchmarks for cast from/to decimals

Are these changes tested?

This is benchmarks, no need to add tests

Are there any user-facing changes?

No

@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 15, 2026
Copy link
Copy Markdown
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if the small input sizes are too small to give stable/meaningful results? I would imagine most "real life" arrow arrays will have O(100) or O(1000) entries, not O(10)?

But I'll defer to @alamb who has a lot more experience with these kinds of benchmarks.

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 16, 2026

I don't know if the small input sizes are too small to give stable/meaningful results? I would imagine most "real life" arrow arrays will have O(100) or O(1000) entries, not O(10)?

Yeah I think most benchmarks we have use something like 8K rows as that mirrors what happens in most real world systems. Otherwise the per-array overhead dominates the actual work

Also I think we have found that if the iteration time per benchmark is less than a few hundred microseconds, the results are very suceptabel to system noise.

@klion26
Copy link
Copy Markdown
Member Author

klion26 commented Apr 16, 2026

Thanks for the review and information. I've updated the code and made the time per-iter hundreds(or bigger) microseconds on my local laptop

string2decimal128(38, 3)         time:   [2.0216 ms 2.0831 ms 2.1535 ms]
....
decimal128(38, 3)_to_int64    time:   [211.44 µs 212.62 µs 213.87 µs]

@klion26 klion26 requested a review from scovich April 16, 2026 12:42
Copy link
Copy Markdown
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level question: Do we want two classes of benches? One that exercises error/invalid paths heavily and one that does not? (right now, they all bias heavily toward error paths)

Comment on lines +61 to +62
let r = arrow_cast::cast(&array, &bench.1);
std::hint::black_box(r)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at https://doc.rust-lang.org/beta/std/hint/fn.black_box.html, I think this should be

Suggested change
let r = arrow_cast::cast(&array, &bench.1);
std::hint::black_box(r)
black_box(arrow_cast::cast(black_box(&array), black_box(&bench.1)));

(prevents compiler from making any assumptions about the input or the output of cast)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change it.

Comment on lines +37 to +47
1 => Some("".to_string()),
2 => Some(" ".to_string()),
3 => Some("-1.-23499999".to_string()),
4 => Some("--1.23456789".to_string()),
5 => Some("1.-23456789".to_string()),
6 => Some("000.123".to_string()),
7 => Some("+123".to_string()),
8 => Some("+123.12345000".to_string()),
9 => Some("0".to_string()),
10 => Some("000.000".to_string()),
11 => Some("0000000000000000012345.000".to_string()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully half the values in the array are invalid, so this bench will mostly measure the cost of the error paths. Is that intentional? Do we have a happy path bench to complement it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will improve this.

Comment on lines +80 to +85
1 => Some(f32::MIN),
2 => Some(f32::MAX),
_ => match x % 2 {
0 => Some((x as f32) * rng.random::<f32>()),
_ => Some(-(x as f32) * rng.random::<f32>()),
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The random values are uniformly distributed over 0..1 (floating point) so this exercises mostly values in -5..5 with exponentially decreasing probability of large negative exponents (e.g. 1e-6). Do we want any coverage of large positive exponents (e.g. 1e6) as well?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Improve this data coverage.

Comment on lines +128 to +129
1 => Some(i32::MIN),
2 => Some(i32::MAX),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are out of gamut values, so 20% of the values will exercise error path

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will improve this.

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @klion26

Can we please move this code into the existing cast_kernels before we merge it (we can do it as a follow on PR if needed but I think it would be better to do before)

I also ran it locally and it seems to be capturing the time spent to convert to decimial 👍

Image

The only thing I ssee is that now maybe each iteration is perhaps too long (criterion wants it to be 5ms or less). Maybe we could reduce the batch size to 8K

However, I think it is fine

Benchmarking string2decimal32(9, 2): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
string2decimal32(9, 2)  time:   [1.6175 ms 1.6220 ms 1.6264 ms]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking string2decimal64(18, 2): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
Benchmarking string2decimal64(18, 2): Collecting 100 samples in estimated 8.0689 s (5050 iterations)

@@ -0,0 +1,346 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this in a previous review, but I think the other benchmarks for cast are in the cast_kernels.rs benchmark: https://github.com/apache/arrow-rs/blob/main/arrow/benches/cast_kernels.rs

Can we please add these benchmarks to the existing benchmark (e.g.

c.bench_function("cast decimal32 to decimal32 512", |b| {
b.iter(|| cast_array(&decimal32_array, DataType::Decimal32(9, 4)))
});
c.bench_function("cast decimal32 to decimal32 512 lower precision", |b| {
b.iter(|| cast_array(&decimal32_array, DataType::Decimal32(6, 5)))
});
c.bench_function("cast decimal32 to decimal64 512", |b| {
b.iter(|| cast_array(&decimal32_array, DataType::Decimal64(11, 5)))
});
c.bench_function("cast decimal64 to decimal32 512", |b| {
b.iter(|| cast_array(&decimal64_array, DataType::Decimal32(9, 2)))
});
c.bench_function("cast decimal64 to decimal64 512", |b| {
b.iter(|| cast_array(&decimal64_array, DataType::Decimal64(12, 4)))
});
c.bench_function("cast decimal128 to decimal128 512", |b| {
b.iter(|| cast_array(&decimal128_array, DataType::Decimal128(30, 5)))
});
c.bench_function("cast decimal128 to decimal128 512 lower precision", |b| {
b.iter(|| cast_array(&decimal128_array, DataType::Decimal128(6, 5)))
});
c.bench_function("cast decimal128 to decimal256 512", |b| {
b.iter(|| cast_array(&decimal128_array, DataType::Decimal256(50, 5)))
});
c.bench_function("cast decimal256 to decimal128 512", |b| {
b.iter(|| cast_array(&decimal256_array, DataType::Decimal128(38, 2)))
});
c.bench_function("cast decimal256 to decimal256 512", |b| {
b.iter(|| cast_array(&decimal256_array, DataType::Decimal256(50, 5)))
});
c.bench_function("cast decimal128 to decimal128 512 with same scale", |b| {
b.iter(|| cast_array(&decimal128_array, DataType::Decimal128(30, 3)))
});
c.bench_function(
"cast decimal128 to decimal128 512 with lower scale (infallible)",
|b| b.iter(|| cast_array(&decimal128_array, DataType::Decimal128(7, -1))),
);
c.bench_function("cast decimal256 to decimal256 512 with same scale", |b| {
)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move to the cast_kernel.rs

Copy link
Copy Markdown
Member Author

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scovich @alamb thanks for the review. There are some unexpected matters to handle today, so I will deal with this as soon as possible.

Do we want two classes of benches? One that exercises error/invalid paths heavily and one that does not?

Will improve this, separate the benchmarks into two

The only thing I ssee is that now maybe each iteration is perhaps too long (criterion wants it to be 5ms or less). Maybe we could reduce the batch size to 8K

Will reduce the batch size so that the time is hundreds microsecond

@@ -0,0 +1,346 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move to the cast_kernel.rs

Comment on lines +37 to +47
1 => Some("".to_string()),
2 => Some(" ".to_string()),
3 => Some("-1.-23499999".to_string()),
4 => Some("--1.23456789".to_string()),
5 => Some("1.-23456789".to_string()),
6 => Some("000.123".to_string()),
7 => Some("+123".to_string()),
8 => Some("+123.12345000".to_string()),
9 => Some("0".to_string()),
10 => Some("000.000".to_string()),
11 => Some("0000000000000000012345.000".to_string()),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will improve this.

Comment on lines +128 to +129
1 => Some(i32::MIN),
2 => Some(i32::MAX),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will improve this.

Comment on lines +80 to +85
1 => Some(f32::MIN),
2 => Some(f32::MAX),
_ => match x % 2 {
0 => Some((x as f32) * rng.random::<f32>()),
_ => Some(-(x as f32) * rng.random::<f32>()),
},
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Improve this data coverage.

Comment on lines +61 to +62
let r = arrow_cast::cast(&array, &bench.1);
std::hint::black_box(r)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add benchmark for cast from/to decimals

3 participants