-
-
Notifications
You must be signed in to change notification settings - Fork 34.1k
gh-144156: Fix email header folding concatenating encoded words #144692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
robsdedude
wants to merge
13
commits into
python:main
Choose a base branch
from
robsdedude:fix/gh-144156
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+86
−40
Open
Changes from all commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
b5925e0
Fix email header wrapping omitting white space
robsdedude 1a7824e
📜🤖 Added by blurb_it.
blurb-it[bot] fb23a6e
fixup! Fix email header wrapping omitting white space
robsdedude 050491c
fixup! Fix email header wrapping omitting white space
robsdedude ed6f197
Formatting
robsdedude 91b4b3b
Extend news fragment
robsdedude d5f5001
Formatting
robsdedude 15e6618
Fix comment
robsdedude 902ec59
Faster and more readable last_word_is_ew tracking
robsdedude a1cfe3c
Fix broken application of news fragment suggestion
robsdedude 45022be
Simplify code
robsdedude ccf399d
Fix related WSP folding bug
robsdedude 0d694d6
Inline helper function
robsdedude File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
1 change: 1 addition & 0 deletions
1
Misc/NEWS.d/next/Core_and_Builtins/2026-02-10-22-05-51.gh-issue-144156.UbrC7F.rst
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Fix the folding of headers by the :mod:`email` library when :rfc:`2047` encoded words are used. Now whitespace is correctly preserved and also correctly added between adjacent encoded words. The latter property was broken by the fix for gh-92081, which mostly fixed previous failures to preserve whitespace. |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This avoids that complicated single use function. We should never be producing lines that contain only blanks, they get encoded.
You can also move this up to be the first elif and drop the 'and not last_word_is_ew'.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While testing, I noticed, too, that wrapped spaces are always encoded. TBH, I didn't like that and wanted to come up with a solution that's robust to changing this behavior in the future. I didn't like it because reading the RFC and the rest of this code it sure seems like not encoding words is generally preferable because of its human readability (and to a lesser extent mail clients that don't support encoded words). Further, I thought assuming an invariant that's enforced a few hundred lines away in a different function is a nice recipe for spooky action at a distance.
If you anyway think the fix should assume this invariant, I'll change this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you have to keep in mind that the RFC requires that there be no line that consists of only whitespace. In theory the folder is only supposed to encode whitespace that it has to, but a completely blank line is an example of "has to". If it is being overly aggressive and encoding whitespace it doesn't need to that would be a bug. Though as the need for this PR demonstrates, getting it right isn't easy ;)
As for the spooky action at a distance...yes, this code has grown into a spaghetti ball. Believe it or, the original (pre bug fixing) version of this code was like the fourth iteration of ending up with an incomprehensible mess and starting over. I have plans to redo the whole thing from scratch again, and do something like have an accumulator object that would own things like the last_word_is_ew flag, but I'm working on rewriting the parser first, and that is taking a while. And I want to add a lot more tests before I attempt a rewrite, to make sure I don't break anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I overlooked that part in the RFC. In this case I agree that there's much less risk I will change the PR accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to move this elif up above the one it is currently after. It only saves 23 characters, but since we can, why not :) More importantly, the logic is IMO slightly easier to follow: first we check if we are appending to an encoded word, and then if not we handle the remaining special case that only matters when we aren't appending to an encoded word.