Bug: superdoc_open fails with "Invalid content for node type table" on Google Docs DOCX exports #3242

@law-and-code

Description

What happened?

Summary

Opening a .docx file exported from Google Docs fails with the error:

Failed to open document: Failed to load document: Invalid content for node type table

The root cause is a row-level <w:sdt> (Structured Document Tag / Content Control) wrapping <w:tr> elements directly inside <w:tbl>. Google Docs emits this pattern on some DOCX exports. SuperDoc's table parser appears to expect <w:tr> as a direct child of <w:tbl> and does not handle the <w:sdt> wrapper at the row level.

Steps to Reproduce

Reliable reproduction using the attached fixture:

  1. Use fixture-01-sdt-row.docx (attached below) — it contains a table whose rows are confirmed to be wrapped in <w:sdt>.
  2. Call superdoc_open({ path: "/path/to/fixture-01-sdt-row.docx" }).
  3. Observe the error: Failed to load document: Invalid content for node type table.

Reproduction via Google Docs export (not guaranteed every export):

  1. Create a document in Google Docs that contains a table.
  2. Export it as .docx (File → Download → Microsoft Word (.docx)).
  3. Call superdoc_open({ path: "/path/to/exported.docx" }).
  4. If Google Docs included the goog_rdk_* SDT wrapper in this export, the error will occur. If not, the file opens normally. Use the fixture above for a guaranteed reproduction.
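To tell in advance whether a particular export carries the wrapper, one can extract word/document.xml (for example with `unzip -p exported.docx word/document.xml > document.xml`) and look for <w:sdt> elements whose parent is <w:tbl>. The sketch below does this with a deliberately small regex tag scanner instead of a full XML parser. The function name findRowLevelSdtTags is hypothetical. The scanner assumes well-formed XML with no `>` inside attribute values and no markup inside comments or CDATA. Word and Google Docs output usually meets these conditions, but nothing guarantees it.

```javascript
// Sketch: report row-level <w:sdt> elements (direct children of <w:tbl>) in the
// text of word/document.xml. Not a full XML parser: assumes well-formed input,
// no '>' inside attribute values, and no tags inside comments or CDATA.
function findRowLevelSdtTags(xml) {
  // Matches start tags, end tags and empty-element tags: <w:tr ...>, </w:tr>, <w:tag .../>
  const tagRe = /<(\/?)([A-Za-z_][\w.:-]*)([^>]*)>/g;
  const stack = []; // currently open elements: { name, rowSdt, index }
  const found = []; // one entry per row-level SDT: its w:tag value, or null if untagged
  let m;
  while ((m = tagRe.exec(xml)) !== null) {
    const [, slash, name, rest] = m;
    if (slash) {
      stack.pop(); // closes the innermost open element (well-formed input assumed)
      continue;
    }
    const parent = stack[stack.length - 1];
    if (rest.endsWith('/')) {
      // Empty element, e.g. <w:tag w:val="goog_rdk_4"/> inside the SDT's <w:sdtPr>
      const grandparent = stack[stack.length - 2];
      if (name === 'w:tag' && parent?.name === 'w:sdtPr' && grandparent?.rowSdt) {
        const val = /\bw:val="([^"]*)"/.exec(rest);
        found[grandparent.index] = val ? val[1] : null;
      }
      continue;
    }
    const entry = { name, rowSdt: name === 'w:sdt' && parent?.name === 'w:tbl' };
    if (entry.rowSdt) {
      entry.index = found.length;
      found.push(null); // replaced by the tag value if a <w:tag> turns up
    }
    stack.push(entry);
  }
  return found;
}
```

Call it on the extracted XML, for example `findRowLevelSdtTags(require('fs').readFileSync('document.xml', 'utf8'))`. A non-empty array means the export contains row-level SDTs, which for Google Docs exports typically carry goog_rdk_* tags. An empty array means this particular failure should not be triggered.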

Error Message

Failed to open document: Failed to load document: Invalid content for node type table

Root Cause

Expected OOXML structure

SuperDoc appears to expect table rows as direct children of the table element:

<w:tbl>
  <w:tblPr>...</w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="4680"/>
    <w:gridCol w:w="4680"/>
  </w:tblGrid>
  <w:tr>           <!-- direct child -->
    <w:tc>...</w:tc>
    <w:tc>...</w:tc>
  </w:tr>
</w:tbl>

Actual OOXML structure (Google Docs export)

Google Docs wraps the <w:tr> in a <w:sdt> (Structured Document Tag) at the row level:

<w:tbl>
  <w:tblPr>...</w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="4680"/>
    <w:gridCol w:w="4680"/>
  </w:tblGrid>
  <w:sdt>                                      <!-- Google Docs wrapper -->
    <w:sdtPr>
      <w:tag w:val="goog_rdk_4"/>              <!-- Google-internal marker -->
      <w:id w:val="1961638184"/>
      <w:lock w:val="contentLocked"/>
    </w:sdtPr>
    <w:sdtContent>
      <w:tr w:rsidR="005458D6" ...>            <!-- row inside the sdt -->
        <w:tc>...</w:tc>
        <w:tc>...</w:tc>
      </w:tr>
    </w:sdtContent>
  </w:sdt>
</w:tbl>

OOXML specification context

Row-level <w:sdt> elements inside <w:tbl> are explicitly permitted by ECMA-376 (§17.5.2). The spec defines four valid positions for structured document tags:

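| SDT position | Wraps | Allowed |
|---|---|---|
| Block level | `<w:p>` (and other block content such as `<w:tbl>`) | Yes |
| Row level | `<w:tr>` | Yes (ECMA-376 §17.5.2) |
| Cell level | `<w:tc>` | Yes |
| Inline level | runs inside `<w:p>` | Yes |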

Google Docs emits row-level SDTs with the proprietary tag prefix goog_rdk_* on tables exported to DOCX. This behaviour is not guaranteed on every export (it appears to depend on how the table was created in Google Docs), but it occurs often enough in practice to be a significant compatibility issue. The <w:lock w:val="contentLocked"/> property only stops users from editing the wrapped row's contents in Word; it does not change the table's structure or affect document semantics.

Impact

Any .docx file in which Google Docs wrapped table rows in <w:sdt> cannot be opened by SuperDoc. Because Google Docs emits this wrapper on many table exports, this is a common real-world scenario: contracts, forms, and reports with signature blocks or data tables are frequently authored or reviewed in Google Docs before being processed programmatically.

Environment

  • SuperDoc MCP server via npx @superdoc-dev/mcp
  • File origin: Google Docs DOCX export
  • OS: macOS 15.3
  • The underlying XML is well-formed: it parses without errors in Python's xml.etree.ElementTree, which checks well-formedness but not schema validity

Suggested Fix

Likely location: packages/super-editor/src/editors/v1/core/super-converter/, the DOCX import pipeline where OOXML is parsed into ProseMirror nodes, and specifically the code that walks <w:tbl> children to collect <w:tr> elements.

When parsing <w:tbl> children, treat <w:sdt> as a transparent wrapper and recurse into its <w:sdtContent> to find the actual <w:tr> elements. This mirrors how compliant word processors (Microsoft Word, LibreOffice) handle row-level SDTs: they render the contained rows normally and apply the SDT's lock/tag metadata as an overlay.

Sketch (JavaScript; assumes each parsed node has a name and a children array):

function getTableRows(tblNode) {
  const rows = [];
  for (const child of tblNode.children ?? []) {
    if (child.name === 'w:tr') {
      rows.push(child);
    } else if (child.name === 'w:sdt') {
      // Transparent unwrap: recurse into <w:sdtContent>, which also handles
      // a row-level SDT nested inside another row-level SDT
      const sdtContent = (child.children ?? []).find((n) => n.name === 'w:sdtContent');
      if (sdtContent) {
        rows.push(...getTableRows(sdtContent));
      }
    }
  }
  return rows;
}

The same transparent-unwrap pattern likely applies to cell-level and block-level SDTs and may already be handled elsewhere in SuperDoc's parser — this issue is specifically the row-level (<w:tr>-wrapping) case inside <w:tbl>.
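The same idea can be factored into one helper that flattens SDT wrappers in any child list, so that row-level (<w:sdt> around <w:tr>) and cell-level (<w:sdt> around <w:tc>) content are collected the same way. This is a sketch under assumptions: it reuses the { name, children } node shape of the getTableRows function above rather than SuperDoc's actual converter types, and the names flattenSdt, getRows and getCells are hypothetical. It also drops the <w:sdtPr> metadata. An importer that wants to keep tags or locks would need to carry them along separately.

```javascript
// Sketch only: nodes are assumed to look like { name: 'w:tr', children: [...] }.
// This shape is hypothetical, not SuperDoc's internal representation.

// Replace every <w:sdt> in `nodes` with the children of its <w:sdtContent>,
// recursively, so nested wrappers disappear too. SDT properties are discarded.
function flattenSdt(nodes = []) {
  const out = [];
  for (const node of nodes) {
    if (node.name === 'w:sdt') {
      const content = (node.children ?? []).find((n) => n.name === 'w:sdtContent');
      if (content) out.push(...flattenSdt(content.children));
    } else {
      out.push(node);
    }
  }
  return out;
}

// Rows of a <w:tbl>, whether bare or wrapped in row-level SDTs.
function getRows(tbl) {
  return flattenSdt(tbl.children).filter((n) => n.name === 'w:tr');
}

// Cells of a <w:tr>, whether bare or wrapped in cell-level SDTs.
function getCells(tr) {
  return flattenSdt(tr.children).filter((n) => n.name === 'w:tc');
}
```

Block-level SDTs are a different trade-off. fixture-02 (a TOC inside a block-level SDT) already opens, so SuperDoc presumably handles that position its own way, and flattening it blindly could lose whatever the SDT properties carry.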

Test Fixtures

The following minimal DOCX files isolate the failure to the row-level SDT position. Both other SDT positions tested (block level and inline) open without error.

| File | SDT position | superdoc_open |
|---|---|---|
| fixture-01-sdt-row.docx | Row level: `<w:sdt>` wrapping `<w:tr>` inside `<w:tbl>` | FAIL: Invalid content for node type table |
| fixture-02-sdt-block-toc.docx | Block level: `<w:sdt>` wrapping `<w:p>` (TOC) | PASS |
| fixture-03-sdt-inline-control.docx | Inline: `<w:sdt>` date picker inside `<w:p>` | PASS |

fixture-01-sdt-row.docx originates from a Google Docs export and contains a 3-row, 2-column table. Because Google Docs does not consistently include the goog_rdk_* SDT wrapper on every export, the <w:sdt> structure was confirmed to be present in the XML before the file was used as a fixture. The other two files open successfully, which scopes the bug to the row-level position among the positions tested.

fixture-01-sdt-row.docx
fixture-02-sdt-block-toc.docx
fixture-03-sdt-inline-control.docx

it-1040-sdt-wrapped-table.docx

Workaround

Until fixed, the <w:sdt> wrapper can be removed manually by unzipping the DOCX, editing word/document.xml, and replacing:

<w:sdt>
  <w:sdtPr>...</w:sdtPr>
  <w:sdtContent>
    <w:tr>...</w:tr>
  </w:sdtContent>
</w:sdt>

with the bare <w:tr>...</w:tr> and re-zipping. This produces a file that SuperDoc can open successfully.
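The manual edit to word/document.xml can also be scripted. The sketch below removes only the wrapper markup of row-level SDTs and leaves the <w:tr> markup byte-for-byte intact. The removed markup is the <w:sdt> start tag, everything up to and including <w:sdtContent> (that is, <w:sdtPr> and any <w:sdtEndPr>), plus </w:sdtContent> and </w:sdt>. The function name unwrapRowLevelSdts is hypothetical. The code relies on a small regex tag scanner, so it assumes well-formed XML with no `>` inside attribute values and no markup inside comments or CDATA. Unzipping and re-zipping the DOCX are left to existing tools.

```javascript
// Sketch: strip row-level <w:sdt> wrappers (children of <w:tbl>, or nested inside
// another row-level wrapper) from the text of word/document.xml, keeping the
// wrapped <w:tr> elements. Assumes well-formed XML, no '>' in attribute values.
function unwrapRowLevelSdts(xml) {
  const tagRe = /<(\/?)([A-Za-z_][\w.:-]*)([^>]*)>/g;
  const stack = []; // open elements: { name, start, rowSdt, inRowSdt, contentCloseStart }
  const cuts = []; // [start, end) character ranges to delete, in document order
  let m;
  while ((m = tagRe.exec(xml)) !== null) {
    const [text, slash, name, rest] = m;
    const start = m.index;
    const end = start + text.length;
    if (slash) {
      const open = stack.pop();
      const enclosing = stack[stack.length - 1];
      if (open.inRowSdt) {
        // </w:sdtContent> of a row-level SDT: the trailing wrapper starts here
        enclosing.contentCloseStart = start;
      } else if (open.rowSdt && open.contentCloseStart !== undefined) {
        // Delete from </w:sdtContent> through </w:sdt>
        cuts.push([open.contentCloseStart, end]);
      }
      continue;
    }
    if (rest.endsWith('/')) continue; // empty elements never need unwrapping here
    const parent = stack[stack.length - 1];
    const entry = {
      name,
      start,
      rowSdt: name === 'w:sdt' && (parent?.name === 'w:tbl' || parent?.inRowSdt === true),
      inRowSdt: name === 'w:sdtContent' && parent?.rowSdt === true,
    };
    // Delete from <w:sdt ...> through <w:sdtContent>, covering sdtPr and sdtEndPr
    if (entry.inRowSdt) cuts.push([parent.start, end]);
    stack.push(entry);
  }
  let out = '';
  let pos = 0;
  for (const [from, to] of cuts) {
    out += xml.slice(pos, from);
    pos = to;
  }
  return out + xml.slice(pos);
}
```

One possible flow: run `unzip exported.docx -d exported`, read exported/word/document.xml with Node's fs.readFileSync, write the result of unwrapRowLevelSdts back with fs.writeFileSync, then re-zip from inside the folder (`cd exported && zip -r ../fixed.docx .`) so that [Content_Types].xml stays at the archive root. Block-level and cell-level SDTs are left untouched.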

SuperDoc version

1.3.0

Browser

None
