feat(vectorise): add --batch-size option for memory-efficient processing #306

Open
superbiche wants to merge 1 commit into Davidyz:main from superbiche:feat/batch-vectorise

Summary

Adds a --batch-size / -b option to the vectorise command that controls how many files are processed concurrently.

  • Default: 100 files per batch
  • Use -1 to disable batching (process all files at once, legacy behavior)

This addresses memory exhaustion issues when vectorizing large codebases with thousands of files, as it limits the number of concurrent async tasks created.
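
To illustrate why batching caps the number of live tasks, here is a minimal sketch of the general technique, not the code in this PR. The names `process_in_batches` and `worker` are hypothetical, and treating every value below 1 as "no batching" is an assumption of the sketch (the PR only documents -1).

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def process_in_batches(
    paths: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    batch_size: int = 100,
) -> List[str]:
    """Run `worker` over `paths`, awaiting one batch before starting the next."""
    if batch_size < 1:
        # Assumption: any non-positive value disables batching (the PR documents -1).
        batch_size = max(len(paths), 1)

    results: List[str] = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start : start + batch_size]
        # Only the coroutines of the current batch exist as tasks at this point.
        results.extend(await asyncio.gather(*(worker(p) for p in batch)))
    return results
```

With `batch_size=100`, at most 100 worker tasks are alive at any moment, at the cost of each batch waiting for its slowest file before the next batch starts.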

Changes

  • Added batch_size field to Config dataclass with default value of 100
  • Added -b/--batch_size CLI argument to vectorise subparser
  • Modified vectorise() to process files in batches using the configured batch size
  • Added tests for batch processing functionality
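
The first two changes could be wired together roughly as follows. This is a sketch under assumptions: the `Config` class here is a one-field stand-in for the real dataclass, `build_parser` is a hypothetical helper, and the long flag is spelled `--batch_size` as in the list above (the summary writes it as `--batch-size`).

```python
import argparse
from dataclasses import dataclass


@dataclass
class Config:
    # Stand-in for the project's Config; only the new field is shown.
    batch_size: int = 100


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with just the vectorise subcommand and the new option."""
    parser = argparse.ArgumentParser(prog="vectorcode")
    subparsers = parser.add_subparsers(dest="action")
    vectorise = subparsers.add_parser("vectorise")
    vectorise.add_argument(
        "-b",
        "--batch_size",
        type=int,
        default=100,
        help="Files per batch; -1 processes all files at once.",
    )
    return parser


args = build_parser().parse_args(["vectorise", "-b", "50"])
config = Config(batch_size=args.batch_size)
```

Because the subparser defines no option that looks like a negative number, argparse accepts `-b -1` and parses the value as the integer -1.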

Test plan

  • pytest tests/subcommands/test_vectorise.py -v: all tests pass
  • Manual test: vectorcode vectorise /large/project -r -b 50 processes files in batches
  • Manual test: vectorcode vectorise /project -r -b -1 processes all files at once (legacy)

Process files in batches to limit concurrent asyncio task creation,
reducing RAM spikes when vectorizing large projects (1000+ files).

- Default batch size of 100 files
- Use -1 to disable batching (original behavior)
codecov bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.72%. Comparing base (3eacce5) to head (4d289b0).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #306   +/-   ##
=======================================
  Coverage   99.72%   99.72%           
=======================================
  Files          25       25           
  Lines        1845     1851    +6     
=======================================
+ Hits         1840     1846    +6     
  Misses          5        5           
