gh-144156: Fix email header folding concatenating encoded words#144692
gh-144156: Fix email header folding concatenating encoded words#144692robsdedude wants to merge 13 commits intopython:mainfrom
Conversation
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
FYI I've been looking at this and will have a review soonish. You are correct that the gh-92281 fix was buggy. |
bitdancer
left a comment
There was a problem hiding this comment.
Thanks for fixing this!
I edited this review a bunch, so mistake could have crept in to my code suggestions, though I did test them. The randomizer test should catch any transcription errors if nothing else does.
| # Therefore, we encode all to-be-displayed whitespace in the second | ||
| # encoded word. | ||
| leading_whitespace = _steal_all_trailing_WSP_if_exists(lines) | ||
| to_encode = leading_whitespace + to_encode |
There was a problem hiding this comment.
| to_encode = leading_whitespace + to_encode | |
| len_without_wsp = len(lines[-1].rstrip(_WSP)) | |
| leading_whitespace = lines[-1][len_without_wsp:] | |
| lines[-1] = lines[-1][:len_without_wsp] + ( | |
| ' ' if leading_whitespace else '') | |
| to_encode = leading_whitespace + to_encode |
This avoids that complicated single use function. We should never be producing lines that contain only blanks, they get encoded.
You can also move this up to be the first elif and drop the 'and not last_word_is_ew'.
There was a problem hiding this comment.
We should never be producing lines that contain only blanks, they get encoded.
While testing, I noticed, too, that wrapped spaces are always encoded. TBH, I didn't like that and wanted to come up with a solution that's robust to changing this behavior in the future. I didn't like it because reading the RFC and the rest of this code it sure seems like not encoding words is generally preferable because of its human readability (and to a lesser extent mail clients that don't support encoded words). Further, I thought assuming an invariant that's enforced a few hundred lines away in a different function is a nice recipe for spooky action at a distance.
If you anyway think the fix should assume this invariant, I'll change this part.
There was a problem hiding this comment.
Well, you have to keep in mind that the RFC requires that there be no line that consists of only whitespace. In theory the folder is only supposed to encode whitespace that it has to, but a completely blank line is an example of "has to". If it is being overly aggressive and encoding whitespace it doesn't need to that would be a bug. Though as the need for this PR demonstrates, getting it right isn't easy ;)
As for the spooky action at a distance...yes, this code has grown into a spaghetti ball. Believe it or, the original (pre bug fixing) version of this code was like the fourth iteration of ending up with an incomprehensible mess and starting over. I have plans to redo the whole thing from scratch again, and do something like have an accumulator object that would own things like the last_word_is_ew flag, but I'm working on rewriting the parser first, and that is taking a while. And I want to add a lot more tests before I attempt a rewrite, to make sure I don't break anything.
There was a problem hiding this comment.
you have to keep in mind that the RFC requires that there be no line that consists of only whitespace.
Thanks, I overlooked that part in the RFC. In this case I agree that there's much less risk I will change the PR accordingly.
There was a problem hiding this comment.
I think it is better to move this elif up above the one it is currently after. It only saves 23 characters, but since we can, why not :) More importantly, the logic is IMO slightly easier to follow: first we check if we are appending to an encoded word, and then if not we handle the remaining special case that only matters when we aren't appending to an encoded word.
Misc/NEWS.d/next/Core_and_Builtins/2026-02-10-22-05-51.gh-issue-144156.UbrC7F.rst
Outdated
Show resolved
Hide resolved
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
Misc/NEWS.d/next/Core_and_Builtins/2026-02-10-22-05-51.gh-issue-144156.UbrC7F.rst
Outdated
Show resolved
Hide resolved
Make use of the fact that folder must never produce a line of only WSP.
| # Therefore, we encode all to-be-displayed whitespace in the second | ||
| # encoded word. | ||
| leading_whitespace = _steal_all_trailing_WSP_if_exists(lines) | ||
| to_encode = leading_whitespace + to_encode |
There was a problem hiding this comment.
I think it is better to move this elif up above the one it is currently after. It only saves 23 characters, but since we can, why not :) More importantly, the logic is IMO slightly easier to follow: first we check if we are appending to an encoded word, and then if not we handle the remaining special case that only matters when we aren't appending to an encoded word.
Uh oh!
There was an error while loading. Please reload this page.