
fix: handle ChatGPT bulk export list format and raise file size limit#940

Open
Deen-Wong wants to merge 2 commits into MemPalace:develop from Deen-Wong:fix/chatgpt-list-format-and-file-size-limit

Conversation


@Deen-Wong commented Apr 16, 2026

What this fixes

Two bugs that prevent ChatGPT bulk exports from being mined correctly.

Bug 1: File size limit too low (convo_miner.py)

MAX_FILE_SIZE was set to 10MB. ChatGPT exports split conversation history across multiple files, many of which exceed 10MB. Files over the limit are silently skipped with no warning to the user.

Fix: raised to 200MB.

Bug 2: ChatGPT list format not handled (normalize.py)

_try_chatgpt_json() only handled a single conversation object (dict with mapping key). ChatGPT bulk exports are lists of conversation objects. When a list was passed, the function returned None and fell through to _try_slack_json(), producing 0 or 1 drawers instead of hundreds.

Fix: added list handling to _try_chatgpt_json() and extracted _try_chatgpt_single() as the per-conversation parser.

How to test

Export your ChatGPT history and run:

- normalize.py: _try_chatgpt_json now handles list of conversations
  (ChatGPT bulk export format) in addition to single conversation dict.
  Adds _try_chatgpt_single() as the per-conversation parser.

- convo_miner.py: raise MAX_FILE_SIZE from 10MB to 200MB.
  ChatGPT exports often exceed 10MB per file, causing silent skips.

Tested against 208 conversations producing 6281 drawers.
@mvalentsev
Contributor

The ChatGPT list-of-conversations handling looks like a valid addition. A couple of concerns with the rest of the diff though:

strip_noise() removal -- this was deliberately added across three commits (9b99c13, ca2598a, 7e5eeda) and has a NORMALIZE_VERSION schema gate in palace.py so existing drawers get silently rebuilt. Removing it means Claude Code system tags, hook output, and UI chrome all end up in drawers again and pollute search results.

Slack sanitization removal -- the re.sub on user IDs guards against chunk-boundary injection via crafted exports, and the [{user_id}] prefix preserves who said what in multi-party chats. Dropping both is a security/data regression.

MAX_FILE_SIZE 200 MB -- #396 was specifically about OOM on large transcript files. 20x the current limit risks reintroducing that. The comment on line 58 still says "10 MB" too. #924 already adds SKIP logging so users know when files are skipped.
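For context, the skip path under discussion is roughly of this shape. This is a hypothetical sketch, not the actual convo_miner.py code: the `MAX_FILE_SIZE` name comes from the PR, while `should_mine` and the logging call are assumptions illustrating the SKIP logging that #924 is said to add.

```python
import logging
import os

MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB; the PR proposes raising this

log = logging.getLogger(__name__)


def should_mine(path: str) -> bool:
    """Return False (and log a SKIP) for files over the size limit."""
    size = os.path.getsize(path)
    if size > MAX_FILE_SIZE:
        # The silent-skip bug: without a log line here, oversized export
        # files were dropped with no warning to the user (cf. #924).
        log.warning("SKIP %s: %d bytes exceeds limit of %d",
                    path, size, MAX_FILE_SIZE)
        return False
    return True
```

With the SKIP log in place, raising the constant becomes a tuning question (OOM risk vs. coverage) rather than a correctness fix, which is the trade-off weighed above.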

Would it make sense to split the ChatGPT list handling into its own focused PR? The normalize.py changes unrelated to that feature seem risky to land together.

@Deen-Wong
Author

Thanks for the detailed review — these are valid concerns.

You're right that copying the full normalize.py from my local venv
inadvertently included changes beyond the ChatGPT list fix. The
strip_noise() removal and Slack sanitization changes were not
intentional — I should have done a surgical diff instead.

On MAX_FILE_SIZE: fair point on the OOM risk. Would 50MB be a
reasonable middle ground, or is there a better approach given #924
adds SKIP logging?

I'll revert to a targeted change — only the ChatGPT list handling
in normalize.py and the file size adjustment in convo_miner.py,
leaving strip_noise() and Slack sanitization intact.

@Deen-Wong
Author

Thanks for the detailed review — these are valid concerns.
The strip_noise() removal and Slack sanitization changes were unintentional — I copied the full normalize.py from my local patched venv instead of doing a surgical diff. I've reverted those and the new commit contains only the ChatGPT list handling.
On MAX_FILE_SIZE: raised to 50MB as a middle ground rather than 200MB. Happy to defer to whatever the maintainers prefer, given the OOM history in #396 — the comment on line 58 is also updated to match.
The diff is now focused: only _try_chatgpt_json and _try_chatgpt_single in normalize.py, and the file size constant in convo_miner.py.

- normalize.py: _try_chatgpt_json now handles list of conversations
  (ChatGPT bulk export format) in addition to single conversation dict.
  Adds _try_chatgpt_single() as the per-conversation parser.
  strip_noise() and Slack sanitization left intact.

- convo_miner.py: raise MAX_FILE_SIZE from 10MB to 50MB.
  ChatGPT exports often exceed 10MB per file, causing silent skips.
  Updated comment to match new value.

Tested against 208 conversations producing 6281 drawers.
@Deen-Wong force-pushed the fix/chatgpt-list-format-and-file-size-limit branch from b45c5db to da90736 on April 16, 2026 at 07:29
