@trinistr trinistr commented Feb 3, 2026

As #1223 was merged, let's do more of them.

From #1216

> The cache-based optimization now supports lookarounds and atomic groupings. That is, match for Regexp containing these extensions can now also be performed in linear time to the length of the input string. However, these cannot contain captures and cannot be nested. [Feature #19725]

Should negative cases be included? Not sure. Also, regexp with a capture group in a positive lookbehind is surprisingly linear.
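
For reference, a minimal sketch of how such cases can be probed by hand, assuming the spec in question is the one for `Regexp.linear_time?` (available from Ruby 3.2). The helper names and the pattern list are hypothetical and only for illustration; they are not taken from the spec files. Results depend on the Ruby implementation and version, so none are shown here.

```ruby
# Hypothetical helper, not from ruby/spec: returns true/false from
# Regexp.linear_time?, or nil when the method is unavailable (before Ruby 3.2).
def linear_time_or_nil(regexp)
  return nil unless Regexp.respond_to?(:linear_time?)
  Regexp.linear_time?(regexp)
end

# Illustrative patterns only; labels and patterns are assumptions.
def lookaround_candidates
  {
    "positive lookahead"    => /a(?=b)/,
    "negative lookahead"    => /a(?!b)/,
    "positive lookbehind"   => /(?<=a)b/,
    "negative lookbehind"   => /(?<!a)b/,
    "atomic group"          => /(?>a+)b/,
    "capture in lookbehind" => /(?<=(a))b/,
    "nested lookahead"      => /(?=a(?=b))a/,
  }
end

# Maps each label to whatever the running Ruby reports for its pattern.
def lookaround_results
  lookaround_candidates.transform_values { |re| linear_time_or_nil(re) }
end

lookaround_results.each { |label, result| puts "#{label}: #{result.inspect}" }
```

The last two entries correspond to the "cannot contain captures and cannot be nested" restriction from the quote, which is where implementations are most likely to disagree.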


eregon commented Feb 4, 2026

> Should negative cases be included? Not sure. Also, regexp with a capture group in a positive lookbehind is surprisingly linear.

No, because Ruby implementations should always be allowed to optimize more.

Currently there is /(a)\1/, which is not a great negative example because it could easily be optimized to match in linear time. We should change it to something that cannot reasonably be matched in linear time, such as a recursive subexpression call or a more complex backreference. Alternatively we could remove that example altogether, but OTOH it's useful to have a case that checks "there are some cases in which it returns false".
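
A rough sketch of what candidate replacements could look like, assuming `Regexp.linear_time?` (Ruby 3.2+) is the method under test. The two alternative patterns and the helper names are hypothetical suggestions, not something agreed in this thread, and whether a given engine reports them as non-linear would still need to be checked on each implementation.

```ruby
# Hypothetical candidates for the "returns false" case.
# Single-quoted strings keep the backslashes intact for Regexp.new.
def negative_candidates
  [
    '(a)\1',          # the existing example mentioned above
    '(?<x>a\g<x>?b)', # recursive subexpression call (matches a^n b^n)
    '(a+|b+)c\1',     # backreference to a variable-length alternation
  ]
end

# Maps each source string to the result reported by the running Ruby,
# or nil when Regexp.linear_time? is unavailable.
def linear_time_report(sources)
  sources.to_h do |source|
    regexp = Regexp.new(source)
    result = Regexp.respond_to?(:linear_time?) ? Regexp.linear_time?(regexp) : nil
    [source, result]
  end
end

linear_time_report(negative_candidates).each do |source, result|
  puts "/#{source}/ => #{result.inspect}"
end
```

Building the patterns with `Regexp.new` instead of literals keeps a pattern that one engine rejects from breaking the parse of the whole file.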


eregon commented Feb 4, 2026

These changes look good, but I want to test them on TruffleRuby too and add explicit guards for the cases where it differs.
The reason is that this spec might be read as "what should work on every Ruby", but it's actually impossible to support some of these when using a DFA Regexp engine, as TruffleRuby does.
CRuby uses a cache-based optimization, which is slower but supports different Regexps.
For example, atomic groupings seem to be supported with the cache-based optimization, but AFAIK they cannot be supported by DFA Regexp engines.

IOW I don't want Ruby implementations to be forced to use the slower cache-based optimization to pass this spec.
DFA Regexp engines are much faster and enable efficient JIT compilation, which TruffleRuby/TRegex has done since 2021 (TruffleRuby+TRegex inspired several improvements for Regexp in CRuby).
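
To make the idea of explicit guards concrete, here is a minimal plain-Ruby sketch. It assumes `Regexp.linear_time?` (Ruby 3.2+) is the method under test; the helper name, the returned symbols, and the engine list are all hypothetical, and the real spec would presumably use MSpec's own guard helpers instead. Which engines belong in the list for a given pattern would have to come from actually running the specs on each of them.

```ruby
# Hypothetical guard helper: only evaluates the pattern on engines that are
# explicitly listed, so other implementations are not held to the same result.
def guarded_linear_check(source, asserted_engines, engine = RUBY_ENGINE)
  return :skipped unless asserted_engines.include?(engine)
  return :unavailable unless Regexp.respond_to?(:linear_time?)
  Regexp.linear_time?(Regexp.new(source)) ? :linear : :not_linear
end

# Assumption for illustration: the atomic-group case is only checked on CRuby.
puts guarded_linear_check('(?>a+)b', ["ruby"])

# Passing a different engine name shows the guard taking effect.
puts guarded_linear_check('(?>a+)b', ["ruby"], "truffleruby") # prints "skipped"
```

Keeping the engine list next to each pattern makes it visible in the spec itself which expectations are CRuby-specific and which apply everywhere.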
