Batch-convert office documents in a folder, from the command line. One sub-command per conversion, no interactive prompts, no config files: point it at a folder (or a single file) and it does the rest.

| Sub-command | From | To | Backend (lazy import) |
|---|---|---|---|
| docx2pdf | .docx | .pdf | docx2pdf |
| xlsx2csv | .xlsx | .csv | openpyxl |
| rtf2csv | .rtf | .csv | striprtf |

The heavy backends are imported lazily, only when the matching conversion actually runs. That means you install just the dependency you need, the rest of the tool (argument parsing, file discovery, CSV writing) runs on the standard library alone, and the test-suite needs nothing extra.
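
The lazy-import idea can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the `load_backend` helper, the `BACKENDS` mapping, and the error wording are hypothetical. Only the backend module names and the extras names are taken from this README.

```python
import importlib

# Sub-command -> (module to import, pip extra that provides it).
# Names follow this README; the mapping itself is illustrative only.
BACKENDS = {
    "docx2pdf": ("docx2pdf", "docx"),
    "xlsx2csv": ("openpyxl", "xlsx"),
    "rtf2csv": ("striprtf", "rtf"),
}


def load_backend(command):
    """Import the backend for `command` only when it is actually needed."""
    module_name, extra = BACKENDS[command]
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Turn a missing dependency into an actionable install hint.
        raise SystemExit(
            f'{command} needs the "{extra}" extra: '
            f'pip install "docs-convert[{extra}]"'
        ) from exc
```

Because nothing is imported at module load time, argument parsing and file discovery keep working even when none of the three backends is installed.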
I regularly needed to turn a folder of .docx, .xlsx or .rtf files into
something scriptable (PDF or CSV) without opening each one by hand. Existing
libraries each solve one format; this wraps three common conversions behind one
CLI and installs only the backend you actually use.
The core package has zero hard dependencies. Pull in the backend you want via an extra:
pip install "docs-convert[xlsx]" # xlsx -> csv
pip install "docs-convert[rtf]" # rtf -> csv
pip install "docs-convert[docx]" # docx -> pdf
pip install "docs-convert[all]" # everything

From a local checkout:

pip install -e ".[all]"

- `xlsx2csv` needs the `xlsx` extra (openpyxl). Reads the active worksheet and writes one CSV per workbook.
- `rtf2csv` needs the `rtf` extra (striprtf). The RTF markup is stripped to plain text, then each line is split on the delimiter (default `,`).
- `docx2pdf` needs the `docx` extra (docx2pdf). On Windows and macOS this drives a local Microsoft Word installation; it does not work in a headless environment without Word. If you need a Word-free path, render the PDF with a different toolchain (e.g. a LibreOffice `--headless --convert-to pdf` pipeline) and feed the results to your workflow.

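
To make the rtf2csv description more concrete, here is a rough sketch of its second step only: turning already-stripped plain text into CSV rows using the standard library. The function name `text_to_csv` is hypothetical, and skipping blank lines and trimming whitespace around cells are assumptions rather than documented behaviour.

```python
import csv
import io


def text_to_csv(text, delimiter=","):
    """Split each non-empty line on `delimiter` and return CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no data
        # Assumption: surrounding whitespace is not part of a cell.
        writer.writerow([cell.strip() for cell in line.split(delimiter)])
    return buffer.getvalue()


# Tab-separated input; a cell containing a comma is quoted in the output.
print(text_to_csv("name\tqty\nbolt, M4\t12\n", delimiter="\t"))
```

Writing through `csv.writer` rather than joining with commas means cells that themselves contain a comma or a quote are escaped correctly.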
# Convert every .xlsx in a folder, output next to each source file
docs-convert xlsx2csv ./invoices
# Convert into a separate output directory
docs-convert xlsx2csv ./invoices --out ./csv_out
# A single file works too
docs-convert rtf2csv ./notes/list.rtf --out ./csv_out
# Tab-separated RTF -> CSV
docs-convert rtf2csv ./notes --delimiter $'\t'
# docx -> pdf for a whole folder
docs-convert docx2pdf ./contracts --out ./pdf_out

Common options for every sub-command:

- `input`: a folder (batch all matching files, non-recursive) or a single file.
- `--out DIR`: write outputs to `DIR`. Defaults to alongside each input file.

rtf2csv additionally accepts --delimiter to control how each line is split.
Office lock/temp files (e.g. ~$report.docx) are skipped automatically during
discovery.
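
The discovery and output rules above might look something like the sketch below. The helper names `discover` and `output_path` are hypothetical, and matching the extension case-insensitively is an assumption. The caller is assumed to have checked that the input path exists.

```python
from pathlib import Path


def discover(input_path, suffix):
    """Return matching files for a single file or a folder (non-recursive)."""
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() == suffix else []
    return sorted(
        p for p in path.iterdir()
        if p.is_file()
        and p.suffix.lower() == suffix
        and not p.name.startswith("~$")  # Office lock/temp files
    )


def output_path(source, new_suffix, out_dir=None):
    """Place the result in `out_dir` if given, else next to the source file."""
    target_dir = Path(out_dir) if out_dir else source.parent
    return target_dir / (source.stem + new_suffix)
```

For example, `output_path(Path("invoices/may.xlsx"), ".csv", "csv_out")` yields `csv_out/may.csv`, while omitting the third argument yields `invoices/may.csv`.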
- `0`: success (including "nothing to convert").
- `1`: at least one file failed to convert (the rest still ran).
- `2`: the input path does not exist.

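
One way to arrive at these exit codes is sketched below. It is an illustration under assumptions, not the tool's real control flow: `run_batch`, `find_files`, and `convert` are hypothetical names, and the messages printed to stderr are invented for the example.

```python
import sys
from pathlib import Path


def run_batch(input_path, find_files, convert):
    """Convert every discovered file, keep going on errors, return the exit code."""
    if not Path(input_path).exists():
        print(f"error: {input_path} does not exist", file=sys.stderr)
        return 2
    failed = 0
    for source in find_files(input_path):
        try:
            convert(source)
        except Exception as exc:
            # One bad file must not stop the rest of the batch.
            failed += 1
            print(f"failed: {source}: {exc}", file=sys.stderr)
    # An empty batch falls through with failed == 0, i.e. success.
    return 1 if failed else 0
```

A CLI entry point would then finish with something like `sys.exit(run_batch(...))`, so shell scripts can branch on the result.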
Run the test-suite (standard library unittest, no network, no backends):
python3 -m unittest discover -s tests

Honest next steps:

- Recursive discovery and glob patterns (currently non-recursive).
- An output-encoding option for the CSV writers (non-UTF-8 locales).
- A built-in Word-free `docx → pdf` backend (e.g. LibreOffice headless), today only suggested as a manual workaround.

MIT — see LICENSE.