[Feature #21264] Replace C extension with pure Ruby implementation for Ruby >= 3.3 #155
jinroq wants to merge 3 commits into ruby:master from
C implementation has been rewritten as faithfully as possible in pure Ruby. [Feature #21264] https://bugs.ruby-lang.org/issues/21264
Date was originally written in Ruby prior to Ruby 1.9.3. It was rewritten in C to significantly increase performance. When Date was written in Ruby, its low performance made it a common bottleneck in Ruby applications. I think for this to be considered, you need to provide comprehensive benchmarks showing that performance does not decrease significantly.
lib/date/constants.rb

    MONTHNAMES = [nil, "January", "February", "March", "April", "May", "June",
                  "July", "August", "September", "October", "November", "December"]
                 .map { |s| s&.encode(Encoding::US_ASCII)&.freeze }.freeze

Put `# encoding: US-ASCII` at the beginning.
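A minimal sketch of what this suggestion might look like, assuming the file contains only ASCII literals. With the magic comment on the first line, string literals in the file already carry the US-ASCII encoding, so the per-element `encode` call would no longer be needed. The `frozen_string_literal` comment is an extra assumption made here to drop the per-element `freeze`; it is not part of the review comment.

```ruby
# encoding: US-ASCII
# frozen_string_literal: true

# Sketch only. Both magic comments take effect only when they sit at the very
# top of the file. Under that assumption, each literal below is a frozen
# US-ASCII string without any explicit encode/freeze call.
MONTHNAMES = [nil, "January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December"].freeze
```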
A simple benchmark to just create objects:

    require 'benchmark'
    require 'date'
    N = 10000
    Benchmark.bm do |bm|
      bm.report("Time") {N.times {Time.now}}
      bm.report("Date") {N.times {Date.today}}
    end

With

    $ ruby -I./lib bench.rb
           user     system      total        real
    Time  0.001656   0.000023   0.001679 (  0.001675)
    Date  0.002735   0.000062   0.002797 (  0.002827)

This PR:

    $ ruby -I./lib bench.rb
           user     system      total        real
    Time  0.001018   0.000013   0.001031 (  0.001031)
    Date  0.007624   0.000151   0.007775 (  0.007776)

Interestingly, this PR makes
@nobu you should probably benchmark with

A benchmark should include most of the methods in the library. When I was working on
In the meantime, just tried master: This PR: Agree there seems to be a lot of room for optimization.
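A rough sketch of a broader micro-benchmark along the lines suggested in this thread. It is not the benchmark used by anyone in the conversation: the helper `iterations_per_second`, the choice of cases, and the 0.05 second window are all assumptions, and it uses only the monotonic clock from the standard library rather than a dedicated benchmarking gem.

```ruby
require "date"

# Hypothetical helper: calls the block repeatedly for roughly `seconds`
# and returns the measured iterations per second.
def iterations_per_second(seconds = 0.05)
  count = 0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  deadline = start + seconds
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    yield
    count += 1
  end
  count / (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start)
end

SAMPLE = Date.new(2024, 2, 29)

# A small cross-section of constructors, validators, accessors and operators.
CASES = {
  "Date.civil"        => -> { Date.civil(2024, 2, 29) },
  "Date.jd"           => -> { Date.jd(2_460_370) },
  "Date.ordinal"      => -> { Date.ordinal(2024, 60) },
  "Date.commercial"   => -> { Date.commercial(2024, 9, 4) },
  "Date.valid_civil?" => -> { Date.valid_civil?(2024, 2, 30) },
  "Date#year"         => -> { SAMPLE.year },
  "Date#+"            => -> { SAMPLE + 1 },
  "Date#>>"           => -> { SAMPLE >> 1 },
  "Date#<=>"          => -> { SAMPLE <=> SAMPLE },
}.freeze

RESULTS = CASES.transform_values { |work| iterations_per_second(&work) }
RESULTS.each { |label, ips| puts format("%-18s %12.0f i/s", label, ips) }
```

Running the same script once against the installed library and once with `-I./lib` would give two columns to compare, although such short windows are noisy.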
I don't believe the line-by-line translation part is 100% accurate, though it may be true for large portions of the library. The primary implementation difference between the current C implementation and the previous (pre Ruby 1.9.3) Ruby implementation was that the previous Ruby implementation always eagerly converted from whatever the input format was to

I think we'd be willing to accept a small performance decrease to switch the C implementation with a Ruby implementation. However, a ~3x performance decrease is way too much to consider switching, IMO. As I mentioned earlier,
| Implementation | i/s | μs/i |
| :--- | :--- | :--- |
| System (C ext) | 347.5k | 2.88 |
| Pre-optimization (pure Ruby) | 313.5k | 3.19 |
| Post-optimization (pure Ruby) | 380.0k | 2.63 |

| Implementation | i/s | μs/i |
| :--- | :--- | :--- |
| System (C ext) | 4.32M | 0.23 |
| Pre-optimization (pure Ruby) | 312k | 3.20 |
| Post-optimization (pure Ruby) | 1.67M | 0.60 |

**5.4x speedup** (312k → 1.67M i/s). Reached approximately **39%** of the C extension's performance.

| Implementation | i/s |
| :--- | :--- |
| System (C ext) | 4.50M |
| Pre-optimization (pure Ruby) | 311k |
| Post-optimization (pure Ruby) | 1.63M |

For cases where the fast path is not applicable (e.g., Julian calendar or BCE years), performance remains equivalent to the previous implementation (no changes).

The fast path is applied when all of the following conditions are met:

1. `year`, `month`, and `day` are all `Integer`.
2. The date is determined to be strictly Gregorian (e.g., `start` is `GREGORIAN`, or a reform date like `ITALY` with `year > 1930`).

By satisfying these conditions, the implementation skips six `self.class.send` calls, `Hash` allocations, redundant `decode_year` calls, and repetitive array generation.

| Implementation | i/s |
| :--- | :--- |
| System (C ext) | 9.58M |
| Pre-optimization (pure Ruby) | 458k |
| Post-optimization (pure Ruby) | 2.51M |

**5.5x speedup** (458k → 2.51M i/s). Reached approximately **26%** of the C extension's performance.

| Implementation | i/s |
| :--- | :--- |
| System (C ext) | 9.59M |
| Pre-optimization (pure Ruby) | 574k |
| Post-optimization (pure Ruby) | 2.53M |

**4.4x speedup.**

1. **Added a Fast Path**: For `Integer` arguments and Gregorian calendar cases, the entire method chain of `numeric?` (called 3 times) and `valid_civil_sub` is skipped. Instead, month and day range checks are performed inline.
2. **Eliminated Repeated Array Allocation in `valid_civil_sub`**: Changed the implementation to reference a `MONTH_DAYS` constant instead of creating a new array `[nil, 31, 28, ...]` on every call.

| Case | System (C ext) | Pre-optimization | Post-optimization |
| :--- | :--- | :--- | :--- |
| Date.jd | 4.12M | 462k | 1.18M |
| Date.jd(0) | 4.20M | 467k | 1.19M |
| Date.jd(JULIAN) | 4.09M | 468k | 1.22M |
| Date.jd(GREG) | 4.07M | 467k | 1.21M |

**Approximately 2.6x speedup** (462k → 1.18M i/s). Reached approximately **29%** of the C extension's performance.

The fast path is effective across all `start` patterns (`ITALY` / `JULIAN` / `GREGORIAN`). The following processes are now skipped:

- `valid_sg` + `c_valid_start_p` (numerous type checks)
- `value_trunc` (array allocation for `Integer`)
- `decode_jd` (array allocation for standard Julian Days)
- `d_simple_new_internal` (`canon` + flag operations + method call overhead)

| Case | System (C ext) | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- | :--- |
| Date.ordinal | 2.66M | 170k | 645k | 3.8x |
| Date.ordinal(-1) | 1.87M | 119k | 639k | 5.4x |
| Date.ordinal(neg) | 3.08M | 107k | 106k | (Slow path) |

**3.8x to 5.4x speedup** in cases where the fast path is applicable. Reached approximately **24% to 34%** of the C extension's performance.

`Date.ordinal(neg)` remains on the slow path (equivalent to previous performance) because the year -4712 does not meet the fast path condition (`year > REFORM_END_YEAR`).

| Case | System (C ext) | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- | :--- |
| Date.commercial | 2.18M | 126k | 574k | 4.5x |
| Date.commercial(-1) | 1.45M | 85k | 560k | 6.6x |
| Date.commercial(neg) | 2.84M | 93k | 90k | (Slow path) |

**4.5x to 6.6x speedup** in cases where the fast path is applicable. Reached approximately **26% to 39%** of the C extension's performance.

Inlined the ISO week-to-JD conversion:

1. Obtain the JD for Jan 1 using `c_gregorian_civil_to_jd(year, 1, 1)` (requires only one method call).
2. Directly calculate `max_weeks` (52 or 53) from the ISO weekday to perform a week range check.
3. Calculate the Monday of Week 1 using: `base = (jd_jan1 + 3) - ((jd_jan1 + 3) % 7)`.
4. Directly calculate the JD using: `rjd = base + 7*(week-1) + (day-1)`.

This bypasses the entire previous chain of `valid_commercial_p` → `c_valid_commercial_p` → `c_commercial_to_jd` → `c_jd_to_commercial` (verification via inverse conversion).

| Case | System (C ext) | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- | :--- |
| valid_ordinal? (true) | 3.76M | 221k | 3.38M | 15.3x |
| valid_ordinal? (false) | 3.77M | 250k | 3.39M | 13.6x |
| valid_ordinal? (-1) | 2.37M | 148k | 2.67M | 18.0x |

**15x to 18x speedup.** Performance reached **90% to 112%** of the C extension, making it nearly equivalent or even slightly faster.

Since `valid_ordinal?` does not require object instantiation and only involves leap year determination and day-of-year range checks, the inline cost of the fast path is extremely low, allowing it to rival the performance of the C extension.

| Case | System (C ext) | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- | :--- |
| valid_commercial? (true) | 2.94M | 167k | 1.09M | 6.5x |
| valid_commercial? (false) | 3.56M | 218k | 1.08M | 5.0x |
| valid_commercial? (-1) | 1.79M | 104k | 1.07M | 10.3x |

**5x to 10x speedup.** Performance reached approximately **30% to 37%** of the C extension.

The same ISO week validation logic used in the `Date.commercial` fast path (calculating `max_weeks` from the JD of Jan 1 and performing `cwday`/`cweek` range checks) has been inlined. It does not rival the C extension as closely as `valid_ordinal?` because of the remaining overhead of a single method call to `c_gregorian_civil_to_jd(year, 1, 1)`.

| Method | i/s |
| :--- | :--- |
| Date.valid_jd? | 9.29M |
| Date.valid_jd?(false) | 9.68M |

It is approximately **3.3x faster** compared to the C extension benchmarks (Reference values: 2.93M / 2.80M). The simplification to only perform type checks has had a significant impact on performance.

| Method | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- |
| Date.gregorian_leap?(2000) | 1.40M | 7.39M | 5.3x |
| Date.gregorian_leap?(1900) | 1.39M | 7.48M | 5.4x |

It is approximately **4.5x faster** even when compared to the C extension reference values (1.69M / 1.66M).

For `Integer` arguments, the implementation now performs the leap year determination inline, skipping three method calls: the `numeric?` check, `decode_year`, and `c_gregorian_leap_p?`. Non-`Integer` arguments (such as `Rational`) will fall back to the conventional path.

| Method | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- |
| Date.julian_leap? | 2.27M | 8.98M | 4.0x |

It is approximately **3.2x faster** even when compared to the C extension reference value (2.80M).

For `Integer` arguments, the implementation now skips calls to `numeric?`, `decode_year`, and `c_julian_leap_p?`, returning the result directly via an inline `year % 4 == 0` check.

| Method | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- |
| Date#year | 3.27M | 10.06M | 3.1x |

It is approximately **2.8x faster** even when compared to the C extension reference value (3.65M).

In cases where `@nth == 0 && @has_civil` (which covers almost all typical use cases), the implementation now skips the `m_year` → `simple_dat_p?` → `get_s_civil` method chain as well as `self.class.send(:f_zero_p?, nth)`, returning `@year` directly.

Add early return in `m_mon` when `@has_civil` is already true, skipping `simple_dat_p?` check and `get_s_civil`/`get_c_civil` method call overhead. Same pattern as `m_real_year`.
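The early-return pattern described for `Date#year` and `m_mon` can be sketched with a toy class. This is only an illustration under stated assumptions: `LazyCivil` and `decode_civil` are hypothetical names, the `@nth` check is omitted, and the JD-to-civil step uses Richards' textbook algorithm for the proleptic Gregorian calendar rather than the PR's own conversion code.

```ruby
require "date" # only needed to cross-check the sketch against the installed library

# Hypothetical stand-in for a simple (offset-free) date that stores a JD
# and decodes year/month/day lazily, caching them behind @has_civil.
class LazyCivil
  def initialize(jd)
    @jd = jd
    @has_civil = false
  end

  def year
    return @year if @has_civil # fast path: civil fields already decoded
    decode_civil
    @year
  end

  def mon
    return @month if @has_civil # same early return as in #year
    decode_civil
    @month
  end

  def mday
    return @day if @has_civil
    decode_civil
    @day
  end

  private

  # Slow path: Richards' algorithm (proleptic Gregorian, non-negative JD).
  def decode_civil
    f = @jd + 1401 + (((4 * @jd + 274_277) / 146_097) * 3) / 4 - 38
    e = 4 * f + 3
    g = (e % 1461) / 4
    h = 5 * g + 2
    @day = (h % 153) / 5 + 1
    @month = ((h / 153 + 2) % 12) + 1
    @year = e / 1461 - 4716 + (12 + 2 - @month) / 12
    @has_civil = true
  end
end

sample = LazyCivil.new(Date.new(2000, 1, 1, Date::GREGORIAN).jd)
puts [sample.year, sample.mon, sample.mday].inspect
```

After the first accessor call, every later call returns an instance variable directly, which is the saving the commit message attributes to skipping the `simple_dat_p?` and `get_s_civil` calls.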
Benchmark results (Ruby 4.0.1, benchmark-ips):

- Date#month: C 21,314,867 ips -> Ruby 14,302,144 ips (67.1%)
- DateTime#month: C 20,843,168 ips -> Ruby 14,113,170 ips (67.7%)

Add early return in `m_mday` when `@has_civil` is already true, skipping `simple_dat_p?` check and `get_s_civil`/`get_c_civil` method call overhead. Same pattern as `m_real_year` and `m_mon`.

Benchmark results (Ruby 4.0.1, benchmark-ips):

- Date#day: C 18,415,779 ips -> Ruby 14,248,797 ips (77.4%)
- DateTime#day: C 18,758,870 ips -> Ruby 13,750,236 ips (73.3%)

Add early return in `m_wday` when `@has_jd` is true and `@of` is nil (simple Date), inlining `(@jd + 1) % 7` directly. This skips `m_local_jd`, `get_s_jd`, `c_jd_to_wday` method call overhead.

Benchmark results (Ruby 4.0.1, benchmark-ips):

- Date#wday: C 20,923,653 ips -> Ruby 11,174,133 ips (53.4%)
- DateTime#wday: C 20,234,376 ips -> Ruby 3,721,404 ips (18.4%)

Note: DateTime#wday is not covered by this fast path since it requires offset-aware local JD calculation.

Add fast path in `m_yday` for simple Date (`@of.nil?`) with `@has_civil` already computed. When the calendar is proleptic Gregorian or the date is well past the reform period, compute yday directly via `YEARTAB[month] + day`, skipping `m_local_jd`, `m_virtual_sg`, `m_year`, `m_mon`, `m_mday`, and other method call overhead.

Benchmark results (Ruby 4.0.1, benchmark-ips):

- Date#yday: C 16,253,269 ips -> Ruby 1,942,757 ips (12.0%)
- DateTime#yday: C 14,927,308 ips -> Ruby 851,319 ips (5.7%)

Note: DateTime#yday is not covered by this fast path since it requires offset-aware local JD calculation.

Multiple optimizations to `Date#+` and its object creation path:

1. Eliminate `instance_variable_set` in `new_with_jd_and_time`: Replace 10 `instance_variable_set` calls with a protected `_init_with_jd` method using direct `@var =` assignment. Benefits all callers (Date#+, Date#-, Date#>>, DateTime#+, etc).
2. Avoid `self.class.send` overhead in `Date#+`: Replace `self.class.send(:new_with_jd, ...)` chain with direct `self.class.allocate` + `obj._init_with_jd(...)` (protected call).
3. Eager JD computation in `Date.civil` fast path: Compute JD via Neri-Schneider algorithm in `initialize` instead of deferring. Ensures `@has_jd = true` from creation, so `Date#+` always takes the fast `@has_jd` path.
4. Add `_init_simple_with_jd` with only 4 ivar assignments: For the simple Date fast path, skip the 7 nil assignments, since instance variables left unset after `allocate` already read as nil.
5. Fix fast path condition to handle `@has_civil` without `@has_jd`: When only civil data is available, compute JD inline via Neri-Schneider before addition.

Benchmark results (Ruby 4.0.1, benchmark-ips):

- Date#+1: C 5,961,579 ips -> Ruby 3,150,254 ips (52.8%)
- Date#+100: C 6,054,311 ips -> Ruby 3,088,684 ips (51.0%)
- Date#-1: C 4,077,013 ips -> Ruby 2,488,817 ips (61.0%)

Date#+1 progression:

- Before: 1,065,416 ips (17.9% of C)
- After ivar_set removal: 1,972,000 ips (33.1% of C)
- After send avoidance: 2,691,799 ips (45.2% of C)
- After eager JD + 4-ivar init: 3,150,254 ips (52.8% of C)

Date#-1: C 4,077,013 ips -> Ruby 2,863,047 ips (70.2%)

Date#-1 progression:

- Before: 989,991 ips (24.3% of C)
- After Date#+ optimization: 2,488,817 ips (61.0% of C)
- After Date#- fast path: 2,863,047 ips (70.2% of C)

Date#<<1: C 2,214,936 ips -> Ruby 1,632,773 ips (73.7%)

Date#<<1 progression:

- Before: 205,555 ips (9.3% of C)
- After Date#>> optimization: 1,574,551 ips (71.1% of C)
- After direct fast path: 1,632,773 ips (73.7% of C)

- Ruby version: 4.0 (Docker)
- C baseline: bench/results/20260215/4.0.1_system.tsv
- Tool: benchmark-ips

| Benchmark | C (ips) | Ruby (ips) | Ruby/C |
| :--- | :--- | :--- | :--- |
| Date#<<1 | 2.21 M | 1.62 M | 1/1.4x |
| DateTime#<<1 | 2.13 M | 177.53 K | 1/12.0x |

Changes: Replaced the slow path of Date#<< which delegated to self >> (-n) with an inlined version of Date#>>'s slow path logic. This eliminates the extra method call, sign negation, and redundant condition checks.

- Date#<< (Date only): reaches 71% of C performance
- DateTime#<< (with offset): remains at 1/12x due to the slow path being exercised more heavily

- Ruby version: 4.0 (Docker)
- C baseline: bench/results/20260215/4.0.1_system.tsv
- Tool: benchmark-ips

| Benchmark | C (ips) | Ruby before (ips) | Ruby after (ips) | after/C |
| :--- | :--- | :--- | :--- | :--- |
| Date#<=> | 11.84 M | 635.23 K | 2.99 M | 1/4.0x |
| DateTime#<=> | 12.24 M | 622.88 K | 577.00 K | 1/21.2x |

Changes: Added a fast path to `Date#<=>` for the common case where both objects are simple Date instances (`@df`, `@sf`, `@of` are all `nil`) with `@nth == 0` and `@has_jd` set. In this case, the comparison reduces to a direct `@jd <=> other.@jd` integer comparison, eliminating two `m_canonicalize_jd` calls (each of which allocates a `[nth, jd]` array via `canonicalize_jd`), redundant `simple_dat_p?` checks, and chained accessor calls for `m_nth`, `m_jd`, `m_df`, and `m_sf`.
- `Date#<=>` (Date only): 4.7x improvement over pre-optimization Ruby, reaches about 25% of C performance (2.99 M vs 11.84 M ips, i.e. the 1/4.0x shown in the table)
- `DateTime#<=>` (with offset): unaffected; falls through to the existing slow path

Benchmark: Date#== optimization (pure Ruby vs C)

- Ruby version: 4.0 (Docker)
- C baseline: bench/results/20260215/4.0.1_system.tsv
- Tool: benchmark-ips

| Benchmark | C (ips) | Ruby before (ips) | Ruby after (ips) | after/C |
| :--- | :--- | :--- | :--- | :--- |
| Date#== | 2.78 M | 875.47 K | 3.24 M | 1.17x |
| DateTime#== | 2.72 M | 798.68 K | 924.96 K | 1/2.9x |

Changes: Added a fast path to `Date#==` for the common case where both objects are simple Date instances (`@df`, `@sf`, `@of` are all `nil`) with `@nth == 0` and `@has_jd` set. In this case, equality reduces to a direct `@jd == other.@jd` integer comparison. This eliminates two `m_canonicalize_jd` calls (each allocating a `[nth, jd]` array via `canonicalize_jd`), redundant `simple_dat_p?` checks, and chained accessor calls for `m_nth`, `m_jd`, `m_df`, and `m_sf`.

- `Date#==` (Date only): 3.7x improvement over pre-optimization Ruby, 17% faster than C
- `DateTime#==` (with offset): unaffected; falls through to the existing slow path

Add fast paths that skip `m_canonicalize_jd` (which allocates an array) for the common case: both objects are simple (`@df`, `@sf`, `@of` are all `nil`), `@nth == 0`, `@has_jd` is true, and `0 <= @jd < CM_PERIOD` (guaranteeing that canonicalization is a no-op).

For `Date#===`, whether the two dates are on the same calendar or not, the result always reduces to `@jd == other.@jd` under these conditions, so the `m_gregorian_p?` check and both `m_canonicalize_jd` calls are eliminated.
For `Date#hash`, the same bounds guarantee that `m_nth == 0` and `m_jd == @jd` after canonicalization, so `[0, @jd, @sg].hash` is returned directly.

| Method | Before | After | Speedup | C impl |
|-------------|-------------|--------------|---------|--------------|
| `Date#===` | ~558K ips | ~2,940K ips | +5.3x | ~12,659K ips |
| `Date#hash` | ~1,990K ips | ~6,873K ips | +3.5x | ~13,833K ips |

feat: Optimized `Date#<`.

Add an explicit `Date#<` method with a fast path that bypasses the `Comparable` module overhead. When both objects are simple (`@df`, `@sf`, `@of` are all `nil`), `@nth == 0`, and `@has_jd` is true, `@jd < other.@jd` is returned directly without going through `<=>`. The slow path delegates to `super` (Comparable) to preserve all edge-case behavior including `ArgumentError` for incomparable types.

| Method | Before | After | Speedup | C impl |
|----------|-------------|-------------|---------|-------------|
| `Date#<` | ~2,430K ips | ~3,330K ips | +37% | ~7,628K ips |

Add an explicit `Date#>` method with a fast path that bypasses the `Comparable` module overhead. When both objects are simple (`@df`, `@sf`, `@of` are all `nil`), `@nth == 0`, and `@has_jd` is true, `@jd > other.@jd` is returned directly without going through `<=>`. The slow path delegates to `super` (Comparable) to preserve all edge-case behavior including `ArgumentError` for incomparable types.

| Method | Before | After | Speedup | C impl |
|----------|-------------|-------------|---------|-------------|
| `Date#>` | ~2,560K ips | ~3,330K ips | +30% | ~7,682K ips |
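The explicit-`<`-with-`super` pattern described above can be sketched on a toy class. Everything here is illustrative: `ToyDate`, its `df:` keyword, and the `simple?` predicate are hypothetical, and the `@nth`/`@has_jd` checks are left out. Since `other.@jd` is not valid Ruby syntax, the sketch assumes a protected reader is used to reach the other object's JD.

```ruby
# Hypothetical miniature of a date with an optional day fraction.
class ToyDate
  include Comparable

  def initialize(jd, df: nil)
    @jd = jd
    @df = df
  end

  # General comparison; returns nil for incomparable types.
  def <=>(other)
    return nil unless other.is_a?(ToyDate)
    return @jd <=> other.jd if simple? && other.simple? # fast path
    [@jd, @df || 0] <=> [other.jd, other.df || 0]       # slow path
  end

  # Explicit #< that avoids the Comparable -> <=> round trip when both
  # sides are simple, and defers to Comparable#< for everything else.
  def <(other)
    return @jd < other.jd if other.is_a?(ToyDate) && simple? && other.simple?
    super
  end

  protected

  attr_reader :jd, :df

  def simple?
    @df.nil?
  end
end

puts ToyDate.new(1) < ToyDate.new(2)               # fast path
puts ToyDate.new(2, df: 5) < ToyDate.new(2, df: 6) # slow path through Comparable
```

Delegating to `super` keeps the `ArgumentError` that `Comparable#<` raises when `<=>` returns `nil`, so the fast path changes only speed, not behavior.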
https://bugs.ruby-lang.org/issues/21264
Summary
Rewrite the Date and DateTime C extension as pure Ruby, targeting Ruby 3.3+.
Ruby < 3.3 continues to use the existing C extension as before.
- Ruby >= 3.3: pure Ruby (`lib/date/`)
- Ruby < 3.3: C extension (`ext/date/`) compiled via `rake-compiler`

All 143 tests pass with 162,593 assertions on both paths.

Motivation

Architecture

The version branch (`RUBY_VERSION >= "3.3"`) is applied at three layers:

| Layer | Ruby >= 3.3 | Ruby < 3.3 |
| :--- | :--- | :--- |
| `lib/date.rb` | `require_relative` pure Ruby files | `require 'date_core'` (C ext) |
| `ext/date/extconf.rb` | Generates a dummy Makefile | `create_makefile('date_core')` |
| `Rakefile` | `task :compile` is a no-op | `Rake::ExtensionTask` compiles C ext |

- `USE_PACK`: … `mon`/`mday`/`hour`/`min`/`sec` into a single integer for memory efficiency; … (`@nth`, `@jd`, `@df`, `@sf`, `@of`, `@sg`)
- `TIGHT_PARSER`: … `Date._parse` (disabled by default in C via `/* #define TIGHT_PARSER */`); `TIGHT_PARSER` logic is not implemented

Pure Ruby file structure

- `lib/date/core.rb`
- `lib/date/parse.rb`: `Date._parse`, `_iso8601`, `_rfc3339`, `_rfc2822`, `_xmlschema`, `_jisx0301`
- `lib/date/datetime.rb`
- `lib/date/strptime.rb`: `strptime` parsing
- `lib/date/strftime.rb`: `strftime` formatting
- `lib/date/zonetab.rb`
- `lib/date/patterns.rb`
- `lib/date/constants.rb`
- `lib/date/time.rb`: `Date#to_time`, `Time#to_date`, `Time#to_datetime`
- `lib/date/version.rb`: `Date::VERSION`

Changes

- `Rakefile`: Branch on `RUBY_VERSION` for compile/test task setup; `test` depends on `compile` for Ruby < 3.3
- `date.gemspec`: Include both `lib/**/*.rb` and `ext/date/*` files; set `extensions`
- `ext/date/extconf.rb`: Generate dummy Makefile on Ruby >= 3.3, build C ext otherwise
- `lib/date.rb`: Branch on `RUBY_VERSION` for require path
- `lib/date/*.rb` (new): Pure Ruby implementation (10 files, ~9,500 lines)

Sidenote
It has not been refactored because the goal is to replace C with Ruby. If this PR is merged, it will be refactored.
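The version branch described under Architecture compares `RUBY_VERSION` as a string. A minimal sketch of a component-wise alternative follows; the helper name `pure_ruby_date?` and the constant are hypothetical, and whether the PR needs this is an open question, since a plain string comparison only misorders versions whose minor component reaches two digits (a hypothetical "3.10.0", for example).

```ruby
require "rubygems" # normally loaded already; Gem::Version ships with RubyGems

# Hypothetical threshold mirroring the "Ruby >= 3.3" rule from the summary.
PURE_RUBY_MINIMUM = Gem::Version.new("3.3")

# Returns true when the given Ruby version would take the pure Ruby path.
def pure_ruby_date?(ruby_version = RUBY_VERSION)
  Gem::Version.new(ruby_version) >= PURE_RUBY_MINIMUM
end

implementation = pure_ruby_date? ? "pure Ruby (lib/date/)" : "C extension (date_core)"
puts "Ruby #{RUBY_VERSION} would load the #{implementation}"
```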