fix(mine): log warning when files exceed MAX_FILE_SIZE (#923) #924

Open
mvalentsev wants to merge 2 commits into MemPalace:develop from mvalentsev:fix/mine-log-oversized-skips

@mvalentsev
Contributor

Summary

mempalace mine and mempalace mine --mode convos silently skip files
larger than the 10 MB MAX_FILE_SIZE with a bare continue. No log, no
counter, exit code 0. The output is indistinguishable from a directory
that legitimately had no mineable files.

This is especially painful for --mode convos, where long Claude/ChatGPT
exports routinely exceed 10 MB and silently vanish.

This PR prints a SKIP warning per oversized file, matching the format already used
in split_mega_files.py:

  SKIP: big_export.json (13.3 MB) exceeds 10 MB limit
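
For illustration, a minimal sketch of the behavior described above. It is not the actual diff: `scan_files` is a hypothetical stand-in for scan_project() / scan_convos(), the real scanners presumably apply further filters, and the assumption that 10 MB means 10 * 1024 * 1024 bytes is mine.

```python
from pathlib import Path

# Assumption: the limit is 10 binary megabytes; the real constant lives in mempalace.
MAX_FILE_SIZE = 10 * 1024 * 1024
BYTES_PER_MB = 1024 * 1024


def scan_files(root, max_size=MAX_FILE_SIZE):
    """Hypothetical scanner: return files under root, warning about oversized ones."""
    kept = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size > max_size:
            # Previously a bare `continue`; the skip is now visible in the output.
            print(
                f"  SKIP: {path.name} ({size / BYTES_PER_MB:.1f} MB) "
                f"exceeds {max_size // BYTES_PER_MB} MB limit"
            )
            continue
        kept.append(path)
    return kept
```

The file is still skipped and the exit code is unchanged; the only difference is that each skip now leaves one line of output.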

Changes

  • mempalace/miner.py: log SKIP warning in scan_project() when a file
    exceeds MAX_FILE_SIZE
  • mempalace/convo_miner.py: same fix in scan_convos()
  • Tests for both paths (monkeypatched threshold, capsys capture)

Test plan

  • pytest tests/test_miner.py::test_scan_project_skips_oversized_files
  • pytest tests/test_convo_miner_unit.py::TestScanConvos::test_scan_skips_oversized_files
  • Full suite: 947 passed
  • ruff check: clean
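
A rough sketch of how such a test can be structured, for readers who want to reproduce the idea outside the repo. It swaps in stdlib equivalents so it runs without pytest: `unittest.mock.patch.object` stands in for monkeypatch and `contextlib.redirect_stdout` for capsys. `FakeMiner` and `check_scan_skips_oversized_files` are hypothetical names, not the code under test.

```python
import io
import tempfile
from contextlib import redirect_stdout
from pathlib import Path
from unittest import mock


class FakeMiner:
    """Hypothetical stand-in for a mining module with a size threshold."""

    MAX_FILE_SIZE = 10 * 1024 * 1024  # assumed value of the real constant

    @classmethod
    def scan(cls, root):
        kept = []
        for path in sorted(Path(root).iterdir()):
            if path.stat().st_size > cls.MAX_FILE_SIZE:
                print(f"  SKIP: {path.name} exceeds limit")
                continue
            kept.append(path)
        return kept


def check_scan_skips_oversized_files():
    """Lower the threshold so a tiny file counts as oversized, then capture stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "small.txt").write_text("ok")
        (root / "big.txt").write_text("x" * 500)
        out = io.StringIO()
        with mock.patch.object(FakeMiner, "MAX_FILE_SIZE", 100), redirect_stdout(out):
            kept = FakeMiner.scan(root)
    return [p.name for p in kept], out.getvalue()
```

Patching the threshold down to 100 bytes avoids writing a real 10 MB fixture, which keeps the test fast.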

Closes #923

@igorls added the bug (Something isn't working) and area/mining (File and conversation mining) labels Apr 15, 2026
@mvalentsev marked this pull request as ready for review April 15, 2026 21:31
Both miner.py and convo_miner.py silently skip files larger than the
10 MB limit with a bare continue. This is especially painful for
conversation mining where long Claude/ChatGPT exports routinely
exceed 10 MB and vanish with no trace.

Print a SKIP warning per oversized file, matching the existing format
in split_mega_files.py.

Labels

area/mining (File and conversation mining), bug (Something isn't working)

Development

Successfully merging this pull request may close these issues.

mine: oversized files (>10 MB MAX_FILE_SIZE) are silently skipped with no log or counter

2 participants