docs-convert

Batch-convert office documents in a folder, from the command line. One sub-command per conversion, no interactive prompts, no config files: point it at a folder (or a single file) and it does the rest.

| Sub-command | From | To | Backend (lazy import) |
| --- | --- | --- | --- |
| `docx2pdf` | `.docx` | `.pdf` | `docx2pdf` |
| `xlsx2csv` | `.xlsx` | `.csv` | `openpyxl` |
| `rtf2csv` | `.rtf` | `.csv` | `striprtf` |

The heavy backends are imported lazily, only when the matching conversion actually runs. You therefore install just the dependency you need. The rest of the tool (argument parsing, file discovery, CSV writing) runs on the standard library alone, and the test suite needs nothing extra.
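A minimal sketch of the lazy-import pattern described above, not the project's actual code. The module and extra names come from this README; the `BACKENDS` mapping, the `load_backend` helper, and the error message are assumptions made for illustration.

```python
import importlib

# Sub-command -> (module to import, pip extra that provides it).
# The names follow this README; the mapping itself is hypothetical.
BACKENDS = {
    "docx2pdf": ("docx2pdf", "docx"),
    "xlsx2csv": ("openpyxl", "xlsx"),
    "rtf2csv": ("striprtf", "rtf"),
}


def load_backend(subcommand):
    """Import the backend for one sub-command only when it is needed."""
    module_name, extra = BACKENDS[subcommand]
    try:
        # Nothing is imported at start-up; the import happens here, on demand.
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Assumed behaviour: point the user at the matching extra.
        raise SystemExit(
            f"{subcommand} needs the '{extra}' extra: "
            f'pip install "docs-convert[{extra}]"'
        ) from exc
```

Because the import sits inside the function, running `xlsx2csv` never touches `docx2pdf` or `striprtf`, so a missing backend only matters for the conversion that needs it.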

Why I built this

I regularly needed to turn a folder of .docx, .xlsx or .rtf files into something scriptable (PDF or CSV) without opening each one by hand. Existing libraries each solve one format; this wraps three common conversions behind one CLI and installs only the backend you actually use.

Install

The core package has zero hard dependencies. Pull in the backend you want via an extra:

pip install "docs-convert[xlsx]"   # xlsx -> csv
pip install "docs-convert[rtf]"    # rtf  -> csv
pip install "docs-convert[docx]"   # docx -> pdf
pip install "docs-convert[all]"    # everything

From a local checkout:

pip install -e ".[all]"

Per-conversion dependency notes

xlsx2csv needs the xlsx extra (openpyxl). Reads the active worksheet and writes one CSV per workbook.
rtf2csv needs the rtf extra (striprtf). The RTF markup is stripped to plain text, then each line is split on the delimiter (default ,).
docx2pdf needs the docx extra (docx2pdf). On Windows and macOS this drives a local Microsoft Word installation, so it does not work on machines without Word, such as headless servers. If you need a Word-free path, render the PDFs with a different toolchain (e.g. a LibreOffice --headless --convert-to pdf pipeline) instead of this sub-command.
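The rtf2csv step described above (split each line of already-stripped text on the delimiter, then write CSV) could look roughly like the sketch below. It uses only the standard library and assumes the RTF markup has already been removed by the backend. The function names `text_to_rows` and `rows_to_csv` are hypothetical, and skipping blank lines is an assumption, not documented behaviour.

```python
import csv
import io


def text_to_rows(plain_text, delimiter=","):
    """Split each line of already-stripped text on the delimiter.

    Blank lines are skipped here; that is an assumption of this sketch.
    """
    return [
        line.split(delimiter)
        for line in plain_text.splitlines()
        if line.strip()
    ]


def rows_to_csv(rows):
    """Serialise rows with the csv module, which quotes cells where needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Going through `csv.writer` rather than joining cells by hand matters when the input is tab-separated: a cell that itself contains a comma is quoted in the output instead of silently becoming two columns.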

Usage

# Convert every .xlsx in a folder, output next to each source file
docs-convert xlsx2csv ./invoices

# Convert into a separate output directory
docs-convert xlsx2csv ./invoices --out ./csv_out

# A single file works too
docs-convert rtf2csv ./notes/list.rtf --out ./csv_out

# Tab-separated RTF -> CSV
docs-convert rtf2csv ./notes --delimiter $'\t'

# docx -> pdf for a whole folder
docs-convert docx2pdf ./contracts --out ./pdf_out

Common options for every sub-command:

input — a folder (batch all matching files, non-recursive) or a single file.
--out DIR — write outputs to DIR. Defaults to alongside each input file.

rtf2csv additionally accepts --delimiter to control how each line is split.
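One way to wire up the options listed above with `argparse` from the standard library. This is a sketch under assumptions, not the tool's real parser: the `build_parser` name, the `command` attribute, and the help strings are invented for illustration, while the sub-command names, `input`, `--out`, and `--delimiter` come from this README.

```python
import argparse


def build_parser():
    """Build a parser with one sub-command per conversion."""
    parser = argparse.ArgumentParser(prog="docs-convert")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("docx2pdf", "xlsx2csv", "rtf2csv"):
        sub = subparsers.add_parser(name)
        # Options shared by every sub-command.
        sub.add_argument("input", help="folder or single file")
        sub.add_argument("--out", metavar="DIR", default=None,
                         help="output directory (default: next to each input)")
        # Only rtf2csv takes a delimiter; the default follows the README.
        if name == "rtf2csv":
            sub.add_argument("--delimiter", default=",")
    return parser
```

For example, `build_parser().parse_args(["rtf2csv", "./notes", "--delimiter", "\t"])` yields a namespace whose `command` is `"rtf2csv"` and whose `out` is `None`.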

Office lock/temp files (e.g. ~$report.docx) are skipped automatically during discovery.
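The discovery and output rules above (non-recursive, lock files skipped, outputs next to the source unless `--out` is given) can be sketched with `pathlib`. The function names `discover` and `output_path` are hypothetical. Matching the suffix case-insensitively, sorting the results, and accepting a single file without checking its suffix are assumptions of this sketch.

```python
from pathlib import Path


def discover(input_path, suffix):
    """Return the files to convert: one file, or a folder's direct children."""
    path = Path(input_path)
    if path.is_file():
        return [path]
    return sorted(
        p for p in path.iterdir()              # iterdir() does not recurse
        if p.is_file()
        and p.suffix.lower() == suffix          # assumed: case-insensitive match
        and not p.name.startswith("~$")         # skip Office lock/temp files
    )


def output_path(source, new_suffix, out_dir=None):
    """Place the output next to the source, or inside out_dir when given."""
    target_dir = Path(out_dir) if out_dir else source.parent
    return target_dir / (source.stem + new_suffix)
```

With this shape, `output_path(Path("contracts/a.docx"), ".pdf", "pdf_out")` gives `pdf_out/a.pdf`, and omitting `out_dir` gives `contracts/a.pdf`.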

Exit codes

0 — success (including "nothing to convert").
1 — at least one file failed to convert (the rest still ran).
2 — the input path does not exist.
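A rough sketch of how a batch loop could produce the exit codes listed above. The `run_batch` name and the `convert` callback are hypothetical, and the file listing is simplified to every file in the folder. What the sketch shows is that one failure does not stop the remaining files.

```python
import sys
from pathlib import Path


def run_batch(input_path, convert):
    """Convert every file and return an exit code following the list above."""
    path = Path(input_path)
    if not path.exists():
        return 2                                  # input path does not exist
    if path.is_file():
        files = [path]
    else:
        files = sorted(p for p in path.iterdir() if p.is_file())
    failed = 0
    for source in files:
        try:
            convert(source)
        except Exception as exc:                  # keep going after a failure
            print(f"failed: {source}: {exc}", file=sys.stderr)
            failed += 1
    # An empty folder leaves failed at 0, so "nothing to convert" is success.
    return 1 if failed else 0
```

A caller would typically end with `sys.exit(run_batch(...))`, so a shell script can tell a partial failure (1) apart from a bad path (2).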

Development

Run the test suite (standard library unittest, no network, no backends):

python3 -m unittest discover -s tests

Roadmap

Honest next steps:

Recursive discovery and glob patterns (currently non-recursive).
An output-encoding option for the CSV writers (non-UTF-8 locales).
A built-in Word-free docx → pdf backend (e.g. LibreOffice headless), today only suggested as a manual workaround.

License

MIT — see LICENSE.
