Skip to content

gh-144156: Fix email header folding concatenating encoded words#144692

Open
robsdedude wants to merge 13 commits intopython:mainfrom
robsdedude:fix/gh-144156
Open

gh-144156: Fix email header folding concatenating encoded words#144692
robsdedude wants to merge 13 commits intopython:mainfrom
robsdedude:fix/gh-144156

Conversation

@robsdedude
Copy link
Contributor

@robsdedude robsdedude commented Feb 10, 2026

@bedevere-app
Copy link

bedevere-app bot commented Feb 10, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@robsdedude robsdedude changed the title gh-144156: Fix email header wrapping omitting white space gh-144156: Fix email header folding concatenating encoded words Feb 10, 2026
@robsdedude robsdedude marked this pull request as ready for review February 11, 2026 13:32
@robsdedude robsdedude requested a review from a team as a code owner February 11, 2026 13:32
@bitdancer
Copy link
Member

FYI I've been looking at this and will have a review soonish. You are correct that the gh-92281 fix was buggy.

Copy link
Member

@bitdancer bitdancer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

I edited this review a bunch, so mistake could have crept in to my code suggestions, though I did test them. The randomizer test should catch any transcription errors if nothing else does.

# Therefore, we encode all to-be-displayed whitespace in the second
# encoded word.
leading_whitespace = _steal_all_trailing_WSP_if_exists(lines)
to_encode = leading_whitespace + to_encode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
to_encode = leading_whitespace + to_encode
len_without_wsp = len(lines[-1].rstrip(_WSP))
leading_whitespace = lines[-1][len_without_wsp:]
lines[-1] = lines[-1][:len_without_wsp] + (
' ' if leading_whitespace else '')
to_encode = leading_whitespace + to_encode

This avoids that complicated single use function. We should never be producing lines that contain only blanks, they get encoded.

You can also move this up to be the first elif and drop the 'and not last_word_is_ew'.

Copy link
Contributor Author

@robsdedude robsdedude Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should never be producing lines that contain only blanks, they get encoded.

While testing, I noticed, too, that wrapped spaces are always encoded. TBH, I didn't like that and wanted to come up with a solution that's robust to changing this behavior in the future. I didn't like it because reading the RFC and the rest of this code it sure seems like not encoding words is generally preferable because of its human readability (and to a lesser extent mail clients that don't support encoded words). Further, I thought assuming an invariant that's enforced a few hundred lines away in a different function is a nice recipe for spooky action at a distance.

If you anyway think the fix should assume this invariant, I'll change this part.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you have to keep in mind that the RFC requires that there be no line that consists of only whitespace. In theory the folder is only supposed to encode whitespace that it has to, but a completely blank line is an example of "has to". If it is being overly aggressive and encoding whitespace it doesn't need to that would be a bug. Though as the need for this PR demonstrates, getting it right isn't easy ;)

As for the spooky action at a distance...yes, this code has grown into a spaghetti ball. Believe it or, the original (pre bug fixing) version of this code was like the fourth iteration of ending up with an incomprehensible mess and starting over. I have plans to redo the whole thing from scratch again, and do something like have an accumulator object that would own things like the last_word_is_ew flag, but I'm working on rewriting the parser first, and that is taking a while. And I want to add a lot more tests before I attempt a rewrite, to make sure I don't break anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to keep in mind that the RFC requires that there be no line that consists of only whitespace.

Thanks, I overlooked that part in the RFC. In this case I agree that there's much less risk I will change the PR accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to move this elif up above the one it is currently after. It only saves 23 characters, but since we can, why not :) More importantly, the logic is IMO slightly easier to follow: first we check if we are appending to an encoded word, and then if not we handle the remaining special case that only matters when we aren't appending to an encoded word.

@bedevere-app
Copy link

bedevere-app bot commented Feb 12, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Make use of the fact that folder must never produce a line of only WSP.
# Therefore, we encode all to-be-displayed whitespace in the second
# encoded word.
leading_whitespace = _steal_all_trailing_WSP_if_exists(lines)
to_encode = leading_whitespace + to_encode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to move this elif up above the one it is currently after. It only saves 23 characters, but since we can, why not :) More importantly, the logic is IMO slightly easier to follow: first we check if we are appending to an encoded word, and then if not we handle the remaining special case that only matters when we aren't appending to an encoded word.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants