fix: Replace with fast-xml-parser (pure ESM) which works in all JS runtimes including workerd. #596

Open
maikunari wants to merge 5 commits into emdash-cms:main from maikunari:fix/wxr-cloudflare-compat

@maikunari
Contributor

@maikunari maikunari commented Apr 16, 2026

The WordPress WXR parser used the 'sax' package (CommonJS-only), which caused 'module is not defined' errors in Cloudflare Workers (workerd).

Replace it with fast-xml-parser (pure ESM), which works in all JS runtimes, including workerd.

Key changes:

  • Replace SAX event-driven parsing with tree-based XML parsing
  • parseWxrString() and parseWxr() now use fast-xml-parser's XMLParser
  • Remove sax dependency, add fast-xml-parser@^5.6.0
  • Add 15 unit tests for the new parser implementation
  • Maintain identical output types and behavior (WxrData interface unchanged)

Fixes #573

What does this PR do?

Replaces the sax XML parser (CommonJS-only) with fast-xml-parser (pure ESM) so WordPress WXR imports work in Cloudflare Workers (workerd). The original sax dependency crashes with 'module is not defined' in workerd.
Fixes #573

Closes #

Type of change

  • Bug fix
  • Feature (requires maintainer-approved Discussion)
  • Refactor (no behavior change)
  • Translation
  • Documentation
  • Performance improvement
  • Tests
  • Chore (dependencies, CI, tooling)

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes
  • pnpm lint passes
  • pnpm test passes (or targeted tests for my change)
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable)
  • User-visible strings in the admin UI are wrapped for translation and pnpm locale:extract has been run (if applicable)
  • I have added a changeset (if this PR changes a published package)
  • New features link to an approved Discussion: https://github.com/emdash-cms/emdash/discussions/...

AI-generated code disclosure

  • This PR includes AI-generated code

Screenshots / test output

@changeset-bot

changeset-bot bot commented Apr 16, 2026

⚠️ No Changeset found

Latest commit: e5d3236

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@github-actions
Contributor

Scope check

This PR changes 1,567 lines across 4 files. Large PRs are harder to review and more likely to be closed without review.

If this scope is intentional, no action needed. A maintainer will review it. If not, please consider splitting this into smaller PRs.

See CONTRIBUTING.md for contribution guidelines.

@pkg-pr-new

pkg-pr-new bot commented Apr 16, 2026

@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@596

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@596

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@596

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@596

emdash

npm i https://pkg.pr.new/emdash@596

create-emdash

npm i https://pkg.pr.new/create-emdash@596

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@596

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@596

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@596

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@596

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@596

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@596

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@596

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@596

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@596

commit: e5d3236

@ascorbic
Collaborator

How does memory use compare?

@maikunari
Contributor Author

> How does memory use compare?

fast-xml-parser builds a full object tree, so it does use more memory than SAX's streaming approach. However, sax is CJS-only and crashes on Cloudflare Workers with 'module is not defined', so it's not viable for the workerd runtime.

For typical WXR files (most are under 50MB), the memory footprint is reasonable. If very large imports become an issue down the line, we could explore chunked processing — but I figured getting imports working on Cloudflare first was the priority. Happy to optimize later if needed!

Contributor

Copilot AI left a comment

Pull request overview

Replaces the WordPress WXR parser dependency from CommonJS-only sax to fast-xml-parser to avoid 'module is not defined' failures in Cloudflare Workers (workerd), while keeping the existing WxrData output shape.

Changes:

  • Swap SAX event-driven parsing for fast-xml-parser tree-based parsing in the WXR parser.
  • Remove sax, add fast-xml-parser@^5.6.0 (and update lockfile).
  • Add a new unit test suite covering parseWxrString() scenarios.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| packages/core/src/cli/wxr/parser.ts | Reimplements WXR parsing using fast-xml-parser, including item/meta/taxonomy extraction and stream handling. |
| packages/core/tests/unit/wxr/parser.test.ts | Adds unit tests for parseWxrString() to validate behavior in non-Node runtimes. |
| packages/core/package.json | Replaces sax dependency with fast-xml-parser. |
| pnpm-lock.yaml | Lockfile updates reflecting dependency swap and transitive deps. |
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

const SINGLE_POST_WXR = `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
Comment on lines +169 to +173
it("parses a single published post with categories, tags, and meta", async () => {
const data = await parseWxrString(SINGLE_POST_WXR);

expect(data.posts.length).toBe(1);
const post = data.posts[0];
Comment thread packages/core/src/cli/wxr/parser.ts Outdated

post.title = getText(item["title"]) || undefined;
post.link = getText(item["link"]) || undefined;
post.pubDate = getText(item["pubdate"]) || undefined;
Comment thread packages/core/src/cli/wxr/parser.ts Outdated
Comment on lines +480 to +481
stream.on("data", (chunk: Buffer) => {
chunks.push(chunk);
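
The excerpt above collects stream chunks typed as Buffer. A stream that has had an encoding set emits strings instead, so a collector may need to accept both. The following is a minimal, runtime-agnostic sketch and not the PR's code: `joinChunks` is a hypothetical name, and it relies on the standard `TextDecoder` API. Node's Buffer is a Uint8Array subclass, so Buffer chunks take the binary branch.

```typescript
// Hypothetical helper: joins chunks that may be binary (Uint8Array/Buffer)
// or already-decoded strings into one UTF-8 string.
function joinChunks(chunks: Array<Uint8Array | string>): string {
  const decoder = new TextDecoder("utf-8");
  let out = "";
  for (const chunk of chunks) {
    // stream: true keeps an incomplete multibyte sequence buffered
    // until the next binary chunk arrives.
    out += typeof chunk === "string" ? chunk : decoder.decode(chunk, { stream: true });
  }
  // A final call with no input flushes anything still buffered.
  return out + decoder.decode();
}
```

Decoding incrementally matters because a chunk boundary can fall in the middle of a multibyte character; decoding each chunk in isolation would corrupt it.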
Comment thread packages/core/src/cli/wxr/parser.ts Outdated
Comment on lines +220 to +224
/** Parse a numeric string, returning undefined for NaN/0/missing */
function parseIntSafe(val: string | undefined): number | undefined {
if (!val) return undefined;
const n = parseInt(val, 10);
return isNaN(n) ? undefined : n;
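
The review view cuts this excerpt off before its closing brace. As a hedged illustration, here is a self-contained copy of the same helper with the brace restored. Running it shows that, as excerpted, the docstring and the code disagree for the input "0": the string is truthy, so it parses to 0 rather than returning undefined.

```typescript
// Copy of the excerpted helper with the closing brace restored.
function parseIntSafe(val: string | undefined): number | undefined {
  if (!val) return undefined; // undefined and "" are treated as missing
  const n = parseInt(val, 10);
  return isNaN(n) ? undefined : n;
}

console.log(parseIntSafe("42")); // 42
console.log(parseIntSafe("abc")); // undefined
console.log(parseIntSafe("")); // undefined
console.log(parseIntSafe("0")); // 0, not undefined
```

Whether "0" should map to undefined depends on how callers treat IDs such as a parent of 0; this thread does not settle that.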
Comment thread packages/core/src/cli/wxr/parser.ts Outdated
Comment on lines +473 to +474
* This is compatible with all environments; for very large files (>100MB),
* consider using parseWxrString() with chunked reading instead.

import { describe, it, expect } from "vitest";

import { parseWxrString, type WxrData } from "../../../src/cli/wxr/parser.js";
"citty": "^0.1.6",
"consola": "^3.4.2",
"croner": "^10.0.1",
"fast-xml-parser": "^5.6.0",
Comment thread packages/core/src/cli/wxr/parser.ts Outdated
Comment on lines +464 to +466
const parser = createWxrParser();
const parsed = parser.parse(xml) as Record<string, unknown>;
return Promise.resolve(extractWxrData(parsed));
maikunari and others added 2 commits April 16, 2026 19:02
The WordPress WXR parser used the 'sax' package (CommonJS-only) which
caused 'module is not defined' errors in Cloudflare Workers (workerd).

Replace with fast-xml-parser (pure ESM, zero deps) which works in all
JS runtimes including workerd.

Key changes:
- Replace SAX event-driven parsing with tree-based XML parsing
- parseWxrString() and parseWxr() now use fast-xml-parser's XMLParser
- Remove sax dependency, add fast-xml-parser@^5.6.0
- Add 15 unit tests for the new parser implementation
- Maintain identical output types and behavior (WxrData interface unchanged)

Fixes emdash-cms#573
@ascorbic
Collaborator

For handling CJS, our approach is usually to add the dependency to optimizeDeps in packages/core/src/astro/integration/vite-config.ts. We have had users who have raised concerns about memory usage in imports, even with the current approach, so I don't think it's viable to move to something fully in-memory.
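
The optimizeDeps approach mentioned above refers to Vite's dependency pre-bundling, which converts listed CommonJS packages to ESM before they are served. A hedged sketch of what such an entry might look like follows. The actual contents of vite-config.ts are not shown in this thread, and `viteConfig` is a hypothetical name.

```typescript
// Hypothetical fragment; the real structure of vite-config.ts may differ.
const viteConfig = {
  optimizeDeps: {
    // Ask Vite to pre-bundle the CommonJS-only package into ESM.
    include: ["sax"],
  },
};

console.log(viteConfig.optimizeDeps.include); // [ 'sax' ]
```

This keeps the streaming parser in place, which is the memory concern raised in the comment; whether it resolves the workerd failure would still need to be verified.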

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@maikunari maikunari force-pushed the fix/wxr-cloudflare-compat branch from 8f2688f to ec075f7 April 16, 2026 10:10
@maikunari maikunari changed the title from "fix: replace sax with fast-xml-parser for Cloudflare Workers compat" to "fix: Replace with fast-xml-parser (pure ESM) which works in all JS runtimes including workerd." Apr 16, 2026
@maikunari
Contributor Author

Addressed Copilot's review feedback:

  • Fixed pubDate casing bug (was reading lowercase, WXR uses capital D)
  • Fixed stream encoding handling to accept both Buffer and string chunks
  • Wrapped parseWxrString in try/catch for consistent Promise rejection
  • Added namespace declaration to test fixture and pubDate assertion (verified this assertion fails against the pre-fix code)
  • Cleaned up docstrings and removed unused import
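
The try/catch item above addresses a common pitfall: a function that returns a Promise can still throw synchronously if the work runs before the Promise is created, so a caller's .catch() handler never sees the error. A minimal sketch of the difference, using a hypothetical `parseSync` stand-in rather than the real fast-xml-parser call:

```typescript
// Hypothetical stand-in for a synchronous parser call that may throw.
function parseSync(xml: string): { root: string } {
  if (!xml.trim().startsWith("<")) {
    throw new Error("not XML");
  }
  return { root: xml };
}

// Unwrapped: parseSync runs first, so its error escapes as a synchronous
// throw before Promise.resolve is ever called.
function parseUnwrapped(xml: string): Promise<{ root: string }> {
  return Promise.resolve(parseSync(xml));
}

// Wrapped: every failure surfaces as a rejected Promise.
function parseWrapped(xml: string): Promise<{ root: string }> {
  try {
    return Promise.resolve(parseSync(xml));
  } catch (error) {
    return Promise.reject(error instanceof Error ? error : new Error(String(error)));
  }
}
```

Declaring the function `async` would give the same guarantee; the explicit try/catch form mirrors the wording of the item above.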

The original sax → fast-xml-parser commit introduced 52 TS2345 errors
that weren't caught locally because typecheck requires workspace builds.
The helpers declared `XmlNode = string | number | ... | undefined` but
every call site passes values from `Record<string, unknown>` indexing.

Widen the helper signatures to accept `unknown`. The helpers already
defensively runtime-check every branch, so permissive input types are
safe. Removed the now-unused `XmlNode` type alias.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
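
The type error described in this commit message can be reproduced in a few lines. This is a simplified sketch: the `XmlNode` alias below is a hypothetical, shortened version of the one the message elides, and both helper names are illustrative.

```typescript
// Hypothetical narrow alias, shortened from the one described above.
type XmlNode = string | number | undefined;

function getTextNarrow(node: XmlNode): string {
  return node === undefined ? "" : String(node);
}

// Widened signature: accepts unknown and checks the type at runtime.
function getTextWide(node: unknown): string {
  return typeof node === "string" || typeof node === "number" ? String(node) : "";
}

const item: Record<string, unknown> = { title: "Hello", post_id: 42 };

// Indexing a Record<string, unknown> yields unknown, so this would not compile:
// getTextNarrow(item["title"]); // TS2345: 'unknown' is not assignable to 'XmlNode'

const fromLiteral = getTextNarrow("typed literal"); // fine: argument is a string
const title = getTextWide(item["title"]); // "Hello"
const id = getTextWide(item["post_id"]); // "42"
console.log(fromLiteral, title, id);
```

Widening the parameter moves the check from compile time to runtime, which is only safe because the helper handles every non-string, non-number input explicitly.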
@maikunari
Contributor Author

Follow-up: the typecheck failure in CI turned out to be pre-existing on the branch — the original sax → fast-xml-parser commit introduced 52 TS2345 errors that weren't caught locally because typecheck requires workspace builds. Widened getText/getAttr to accept unknown (they already runtime-check every branch, so permissive input is safe). All checks should pass now.

After widening node to unknown, two paths could call String() on
arbitrary values, producing "[object Object]" or similar. Replace the
fallback with "" and narrow obj["#text"] to string|number before
stringifying. Matches existing defensive pattern for missing #text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
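
The defensive pattern described in this commit message can be sketched as follows. This is a hypothetical reconstruction, since the real helper in parser.ts is not shown in this thread; it assumes text content sits under the "#text" key, as the message indicates.

```typescript
// Hypothetical reconstruction of a getText-style helper.
function getText(node: unknown): string {
  if (typeof node === "string") return node;
  if (typeof node === "number") return String(node);
  if (typeof node === "object" && node !== null) {
    const text = (node as Record<string, unknown>)["#text"];
    // Narrow before stringifying so a nested object never becomes "[object Object]".
    if (typeof text === "string" || typeof text === "number") return String(text);
  }
  // Fallback is an empty string rather than String(node).
  return "";
}

console.log(getText("plain")); // "plain"
console.log(getText({ "#text": 7 })); // "7"
console.log(getText({ "#text": { nested: true } })); // ""
console.log(getText(undefined)); // ""
```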

Development

Successfully merging this pull request may close these issues.

WordPress import crashes with 'module is not defined' — sax CJS incompatible with workerd"

3 participants