@gpshead
Member

@gpshead gpshead commented Feb 5, 2022

Adds prefixmatch APIs to the re module as an alternate name for our long-existing match APIs, to help alleviate a common confusion for those coming from other languages' regular expression libraries.

These names alleviate common confusion around what "match" means, since Python's use of the term as an API name differs from that of the regex libraries in other popular languages. The original match names are NOT being deprecated. Source tooling such as linters, IDEs, and LLMs could suggest using prefixmatch instead of match, when configured for a modern minimum Python version, to improve code health and reduce the cognitive burden of understanding the intent of code.

See the documentation changes within this PR for a better description.

Documentation Preview: https://cpython-previews--31137.org.readthedocs.build/en/31137/
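
For readers skimming the thread, here is a minimal sketch of the distinction the description refers to. It assumes `re.prefixmatch` takes the same arguments as `re.match`, as this PR proposes; the `getattr` fallback is a hypothetical shim, not part of the PR, so the snippet also runs on interpreters without the new name.

```python
import re

# Hypothetical shim: use re.prefixmatch where it exists, else the classic name.
prefixmatch = getattr(re, "prefixmatch", re.match)

text = "hello world"

at_start = prefixmatch("hello", text)      # pattern is tried at position 0 only
not_at_start = prefixmatch("world", text)  # "world" does not begin the string
anywhere = re.search("world", text)        # search() scans the whole string
whole = re.fullmatch("hello", text)        # fullmatch() must consume everything

print(at_start is not None, not_at_start is None,
      anywhere is not None, whole is None)
```

The point of the longer name is that a reader can tell "anchored at the start, free at the end" from the call site alone.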

These alleviate common confusion around what "match" means as Python is
different than other popular languages in our use of the term as an API
name.  The original "match" names are NOT being deprecated.  Source
tooling like linters are expected to suggest using prefixmatch instead
of match to improve code health and reduce cognitive burden of
understanding the intent when reading code.
@gpshead gpshead added the type-feature A feature request or enhancement label Feb 5, 2022
@gpshead gpshead marked this pull request as draft January 30, 2023 05:20
@gpshead gpshead changed the title bpo-42353: Add prefixmatch APIs to the re module gh-86519: Add prefixmatch APIs to the re module Jan 30, 2023
@python-cla-bot

python-cla-bot bot commented Apr 18, 2025

All commit authors signed the Contributor License Agreement.

@gpshead gpshead force-pushed the prefixmatch-b42353 branch 3 times, most recently from 554bb41 to 54c77ca Compare April 19, 2025 01:09
@gpshead gpshead force-pushed the prefixmatch-b42353 branch from 54c77ca to 149f6e4 Compare April 19, 2025 01:20
ntBre pushed a commit to astral-sh/ruff that referenced this pull request Jan 30, 2026
## Summary

The example for
[FURB167](https://docs.astral.sh/ruff/rules/regex-flag-alias/#regex-flag-alias-furb167)
has `re.match` with an `^` anchor to match the start of the string:

```python
import re
if re.match("^hello", "hello world", re.I): ...
```

But `re.match` already implicitly matches the start of the string:

https://docs.python.org/3/library/re.html#search-vs-match

Let's change the example to `re.search` so the anchor isn't redundant.
(The anchor's actually irrelevant to the example for this rule about
long or short flag names.)

(Aside: There's a discussion about adding `re.prefixmatch` and [soft]
deprecating `re.match` because of the confusion around it:
https://discuss.python.org/t/add-re-prefixmatch-deprecate-re-match/105927,
python/cpython#86519,
python/cpython#31137.)
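
A small sketch of why the anchor is redundant with `re.match` but meaningful with `re.search`. The strings are the ones from the example above; nothing here is specific to Ruff.

```python
import re

text = "hello world"

# re.match() only tries the pattern at position 0, so "^" adds nothing.
with_anchor = re.match("^hello", text, re.I)
without_anchor = re.match("hello", text, re.I)
same_span = with_anchor.span() == without_anchor.span()

# With re.search() the anchor is what restricts the match to the start.
anchored_search = re.search("^world", text, re.I)   # no match: not at the start
plain_search = re.search("world", text, re.I)       # matches later in the string

print(same_span, anchored_search is None, plain_search is not None)
```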



## Test Plan

<!-- How was it tested? -->

1. Create feature branch
2. Push to my fork to run CI
3. Realise feature branches are disabled for forks in Ruff CI
4. Merge feature branch to my `main`
5. Push that
6. Be happy I did, because it failed because I missed something
7. Fixup, pushup
8. Passes [🎉](https://github.com/hugovk/ruff/actions/runs/21524112749)
gpshead and others added 4 commits January 31, 2026 15:18
Resolved conflicts:
- Doc/whatsnew/3.14.rst: Used main's version (3.14 is released)
- Lib/re/__init__.py: Removed __version__ (removed in main), updated
  docstring to reference 3.15 instead of 3.14

Added prefixmatch What's New entry to Doc/whatsnew/3.15.rst since the
feature is now targeting Python 3.15.

- Change "25 years" to "30 years" to reflect actual time
- Replace speculative "this decade, if ever" / "7 years" language
  with clear statement that we will never remove the original
  match name

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Fix traceback to include ^ anchor matching the pair pattern definition
- Add \A anchor to one example as a teaching hint for readers
- Update card game examples to demonstrate search/match/prefixmatch mix
- Add explanatory paragraph about match and prefixmatch being identical
- Rename compiled regex variables to use _re suffix (valid_hand_re, pair_re)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Using first_name/last_name patterns promotes the myth that names
have simple, universal structures. Replace with:
- "killer rabbit" with adjective/animal groups
- "Norwegian Blue, pining for the fjords" for unlabeled groups
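
As a rough illustration of what such replacement examples could look like: the patterns below are hypothetical and are not copied from the docs in this PR. Only the sample strings and the `adjective`/`animal` group names come from the commit message.

```python
import re

# Named groups: the pattern itself is a hypothetical illustration.
named = re.match(r"(?P<adjective>\w+) (?P<animal>\w+)", "killer rabbit")
print(named.group("adjective"), named["animal"])

# Unlabeled groups are addressed by position instead of by name.
unnamed = re.match(r"(\w+ \w+), (.+)", "Norwegian Blue, pining for the fjords")
print(unnamed.groups())
```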
@gpshead gpshead marked this pull request as ready for review February 1, 2026 00:12
@gpshead gpshead requested a review from AA-Turner as a code owner February 1, 2026 00:12
Contributor

@hauntsaninja hauntsaninja left a comment

Great, thank you for making this happen!

@picnixz
Member

picnixz commented Feb 8, 2026

Did we actually reach a consensus on DPO for that feature to be merged? I thought there were some issues, especially the fact that regex needs to be updated to keep parity.

@hugovk
Member

hugovk commented Feb 8, 2026

Did we actually reach a consensus on DPO for that feature to be merged?

Yes, I think the 24 hearts on Greg's post https://discuss.python.org/t/add-re-prefixmatch-deprecate-re-match/105927/20 are a pretty good sign.

The only other posts with double-digit hearts are also in support: the OP (10), a docs suggestion from Tim (14), a name suggestion (startswith) from Marc-André (11).

I thought there were some issues, especially the fact that regex needs to be updated to keep parity.

Whatever the third-party regex package does or doesn't do shouldn't hold us back. They say "This regex implementation is backwards-compatible with the standard ‘re’ module, but offers additional functionality", so it doesn't sound like it'll be a problem for them to do the same thing.

Member

@hugovk hugovk left a comment

Thanks!

Non-blocking docs suggestion:

Use prefixmatch pretty much throughout, except to mention match is the alias (plus version changed etc.), which you've mostly already done here.

We can also update other places like the regex HOWTO to prefer the new name, but this can also be in a followup.

Bike shed colour:

I'm fine with either prefixmatch or startswith.

Pro for prefixmatch: it keeps the "match" part, which might help remind people when they see match: "oh, wait, is this the lopsided prefixmatch or fullmatch? I'd better double check the docs and use one of those descriptive names or instead search with explicit anchors."

Soft deprecation:

We're discouraging match in the docs and not planning on removing it. That sounds like soft deprecation. I'd be fine with soft deprecating too, but that could also be another discussion once this is done (if anyone has energy for that!).

@gpshead
Member Author

gpshead commented Feb 8, 2026

Bike shed colour:
I'm fine with either prefixmatch or startswith.
Pro for prefixmatch: it keeps the "match" part, which might help remind people

I'm similarly inclined to stick with prefixmatch. It ties in with match wording elsewhere and avoids confusion w.r.t. the str/bytes methods of the same startswith name.

There are lots of what look like good comments on the docs here, plus thoughts from the discuss thread on how to document this, to digest. I'll work on getting the doc edits in and see whether it looks like it needs more review after that.

gpshead and others added 8 commits February 9, 2026 00:05
Apply suggestions from hugovk and hauntsaninja reviews:
- List prefixmatch before match in function/method signatures
- Use prefixmatch exclusively in examples, remove redundant match duplicates
- Remove interchangeable paragraph (covered by prefixmatch-vs-match section)
- Rename section to "search() vs. prefixmatch()"
- Add .. note:: directive for MULTILINE caveat
- Fix time-sensitive wording ("very recent Python", "never")
- Fix alphabetical ordering in whatsnew/3.15.rst
- Fix comment grammar in Lib/re/__init__.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
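
The MULTILINE caveat mentioned in the list above is presumably the long-documented behaviour that `match()` stays anchored to the start of the string even in `re.MULTILINE` mode. A minimal sketch of that behaviour, using made-up sample text:

```python
import re

text = "first line\nsecond line"

# Even with re.MULTILINE, match() is only attempted at the start of the string.
match_result = re.match("second", text, re.MULTILINE)

# search() with "^" and re.MULTILINE can match at the start of any line.
multiline_search = re.search("^second", text, re.MULTILINE)

# Without re.MULTILINE, "^" only matches at the start of the string.
plain_search = re.search("^second", text)

print(match_result is None, multiline_search.start(), plain_search is None)
```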
Replace the copy_prefixmatch_method_def_to_match() runtime memcpy with
static struct initializers that reuse the Clinic-generated prefixmatch
parser directly. This avoids duplicating argument parsing boilerplate
while keeping everything initialized at compile time.

Add Py_DEBUG assertions in sre_exec to verify the match and prefixmatch
method table entries remain identical except for ml_name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The "Removed" heading underline was exactly 7 '=' characters, which
triggers the check-merge-conflict pre-commit hook as a false positive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
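
To see why a seven-character underline trips a conflict check, here is a simplified stand-in. It assumes the hook flags a line consisting of exactly seven `=` characters, among other markers; the helper name and regex are hypothetical and are not the hook's actual code.

```python
import re

# Hypothetical simplification of a merge-conflict separator check.
separator_re = re.compile(r"={7}")

def looks_like_conflict_separator(line: str) -> bool:
    """Return True if the line is exactly seven '=' characters."""
    return separator_re.fullmatch(line.rstrip("\r\n")) is not None

heading = "Removed"
underline = "=" * len(heading)  # reST underline the same length as the title

print(looks_like_conflict_separator(underline))   # flagged as a separator
print(looks_like_conflict_separator("=" * 8))     # longer underline is not
```

reStructuredText only requires the underline to be at least as long as the title, so lengthening it is one possible way to sidestep the false positive.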
Refactor test_match_getitem to use a helper method instead of a
for loop, keeping the assertion lines at their original indentation
to preserve git line attribution history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gpshead gpshead self-assigned this Feb 9, 2026
Labels

awaiting merge type-feature A feature request or enhancement

7 participants