fix(super-editor): stop injecting bCs/iCs/highlight defaults on round-trip (SD-2912)#3237
fix(super-editor): stop injecting bCs/iCs/highlight defaults on round-trip (SD-2912)#3237tupizz wants to merge 1 commit into
Conversation
…-trip (SD-2912) The customer fixture round-tripped to a document.xml ~37KB larger than the source, with 822 occurrences each of `<w:bCs/>`, `<w:iCs/>` and `<w:highlight w:val="none"/>` injected into runs whose source rPr contained none of them. Word treats explicit-off and absence as identical so the rendering was unchanged, but every consumer reading the XML saw thousands of "intentional" toggles that weren't real. Two changes: - `decodeRPrFromMarks` no longer auto-propagates `boldCs`/`italicCs` from the latin bold/italic mark. In OOXML these are independent properties (ECMA-376 §17.3.2). The propagation was emitting a complex-script companion element on every run that touched a bold or italic mark, even when the source rPr had no `<w:bCs/>` or `<w:iCs/>`. - The highlight case in the same function no longer emits an explicit `<w:highlight w:val="none"/>` for the transparent mark. That mark is synthesized from `<w:shd val="clear" fill="auto"/>` shading on import and the shading itself round-trips independently via its own element. To keep round-trip lossless for documents that genuinely carry `<w:bCs/>` or `<w:iCs/>` (which were previously preserved as a side effect of the auto-propagation), `boldCs` and `italicCs` are removed from `RUN_PROPERTIES_DERIVED_FROM_MARKS` in calculateInlineRunPropertiesPlugin. That moves them to the "preserve from existing runProperties" branch in `getInlineRunProperties`, so the values captured at import time survive appendTransactions verbatim. Verified on the SD-2912 fixture: - bCs: input=0 → before=822 → after=0 - iCs: input=0 → before=822 → after=0 - highlight: input=0 → before=822 → after=0 - document.xml: 1.13 MB → 1.08 MB (smaller than the 1.09 MB input) - All 12714 super-editor tests pass.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: efeafe0ed2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } else { | ||
| runProperties.highlight = { 'w:val': mark.attrs.color }; | ||
| } | ||
| if (mark.attrs.color && mark.attrs.color.toLowerCase() !== 'transparent') { |
There was a problem hiding this comment.
Preserve explicit
w:highlight="none" overrides
The new mark.attrs.color.toLowerCase() !== 'transparent' guard drops all transparent highlight marks, but this codebase uses transparent to represent an explicit OOXML off value (getHighLightValue maps <w:highlight w:val="none"/> to transparent in the same file). That means an imported run with explicit highlight negation can no longer be reconstructed by decodeRPrFromMarks, so when run properties are recomputed (especially during run splitting/rewrites) the highlight key can be lost and style-inherited highlight may reappear in exported DOCX.
Useful? React with 👍 / 👎.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Summary
The SD-2912 customer fixture round-trips to a
document.xml~37 KB larger than the source. 2466 redundant XML elements are injected into every run: 822 each of<w:bCs/>,<w:iCs/>and<w:highlight w:val="none"/>— none of which existed in the source rPr. Word treats explicit-off and absence as identical so the rendering doesn't change, but every consumer reading the XML sees thousands of "intentional" toggles that weren't real.Why
Two paths in
styles.js::decodeRPrFromMarkswere emitting derived defaults that pollute the output:runProperties.boldCs = runProperties.bold(and italic→italicCs). In OOXML these are independent properties (ECMA-376 §17.3.2.1, §17.3.2.16). Auto-propagating the latin value to the complex-script companion means every bold or italic mark — including defaultvalue: falsefrom the style cascade — emits a complex-script element.runProperties.highlight = { 'w:val': 'none' }when the highlight mark's color was'transparent'. That mark is synthesized from<w:shd val="clear" fill="auto"/>shading on import (seeencodeMarksFromRPr's shading case). The shading itself round-trips on its own<w:shd>element, so the synthesized highlight was pure noise.To keep documents that genuinely carry
<w:bCs/>or<w:iCs/>lossless (the auto-propagation was the only mechanism preserving them through the mark layer),boldCsanditalicCsare removed fromRUN_PROPERTIES_DERIVED_FROM_MARKSincalculateInlineRunPropertiesPlugin. They now go through the "preserve existing runProperties" branch ingetInlineRunProperties, surviving appendTransactions via the run's stored runProperties.Test plan
styles.test.jscovering: bold-off no longer propagates bCs, italic-off no longer propagates iCs, transparent highlight no longer emits.styles.test.js > decodeRPrFromMarks > should decode bold, italic, and strike marksstyles.test.js > marks encoding/decoding round-trip > should correctly round-trip basic propertiessyncParagraphRunProperties.test.js(cursor-mark fallback)getMarksFromSelection.test.js(selection formatting state)sd-2912-pgmar-roundtrip.test.jsthat round-trips the customer fixture and asserts zero new<w:bCs/>/<w:iCs/>/<w:highlight/>indocument.xml.pnpm test: 12 714 / 12 727 pass.0 → 0(was0 → 822on main).Verified counts on the customer fixture
document.xml<w:bCs><w:iCs><w:highlight>document.xmlsizeOut of scope (separate follow-ups documented on the ticket)
The SD-2912 fixture has additional drift sources I did not address in this PR:
<w:pgMar>coercion: 56 of 57<w:pgMar>elements preserve their float values byte-exact; one (the document body's sectPr) round-trips throughinchesToTwipsinensureSectionLayoutDefaultsand loses sub-twip precision while gaining aw:gutter="0". Fixing this requires storing the original twips alongsidepageStyles.pageMargins.customXml/item2.xmlSharePoint FormTemplates: lost on round-trip because the XML parser doesn't preserve<?mso-contentType?>processing instructions. Affects SharePoint integration metadata. Needs a binary-passthrough path forcustomXml/*.word/styles.xmlHyperlink style injection: a default Hyperlink style is added to styles.xml even when the source didn't have it. Likely intentional for downstream consumers; worth a separate scoping discussion.docProps/app.xml+ Content_Types override: SuperDoc generatesapp.xmllike Word does on save. Benign.The biggest source of drift (the 37 KB of redundant elements) is fixed here. The remaining items are smaller and independent.