Convert existing compressed JSONL archives from local disk, S3, Azure, or GCS to PFC format. No intermediate files, no schema changes, no pipelines.
Need schema conversion (Apache logs, CSV, NDJSON → JSONL → PFC)? Use pfc-convert — it rewrites the data format and compresses in one step. pfc-migrate only swaps the compression wrapper (gzip/zstd → .pfc); the content itself is unchanged.
| Command | What it does |
|---|---|
| `pfc-migrate convert` | Convert gzip/zstd/bzip2/lz4/JSONL files to PFC |
| `pfc-migrate s3` | Convert JSONL archives in S3 in-place |
| `pfc-migrate glacier` | Restore + convert S3 Glacier archives to PFC |
| `pfc-migrate azure` | Convert JSONL archives in Azure Blob Storage |
| `pfc-migrate gcs` | Convert JSONL archives in Google Cloud Storage |
pfc-migrate accepts JSONL from stdin — combine with pfc-convert to convert schema and compress in one streaming pipeline:
```bash
# Apache CLF logs → JSONL → .pfc (no temp files)
pfc-convert convert access.log.gz --schema apache --stdout \
  | pfc-migrate convert --stdin --out archive.pfc
```

pfc-ingest-watchdog monitors folders or S3 prefixes and triggers pfc-migrate (or pfc-convert) automatically when new files arrive — no manual invocation needed.
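If you'd rather wire the trigger yourself, a plain inotifywait loop does the same job. A minimal sketch, assuming inotify-tools is installed and using placeholder paths; this is illustrative, not how pfc-ingest-watchdog works internally:

```bash
# Convert each new gzip archive as it lands in a drop directory.
# Output is auto-named next to the input (pfc-migrate's default).
# /data/incoming is a placeholder path.
inotifywait -m -e close_write --format '%w%f' /data/incoming |
while read -r path; do
  case "$path" in
    *.jsonl.gz) pfc-migrate convert "$path" ;;
  esac
done
```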
Once your archives are in PFC format, DuckDB can query them directly — without decompressing the whole file first:
```sql
INSTALL pfc FROM community;
LOAD pfc;
LOAD json;

-- Query just one hour from a 30-day archive
SELECT line->>'$.level' AS level, line->>'$.message' AS message
FROM read_pfc_jsonl(
  '/var/log/pfc/app_2026-03-01.pfc',
  ts_from = epoch(TIMESTAMPTZ '2026-03-01 14:00:00+00'),
  ts_to   = epoch(TIMESTAMPTZ '2026-03-01 15:00:00+00')
);
```

| Tool | 1h query on 30-day archive | Storage vs gzip |
|---|---|---|
| gzip | Decompress full 30-day file | — |
| zstd | Decompress full 30-day file | — |
| PFC-JSONL | Decompress ~1/720 of the file (one hour of a 30-day archive: 30 × 24 = 720) | 25% smaller than gzip |
~6–11% compression ratio on typical JSONL log data (25–40% smaller than gzip).
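The same kind of query also runs non-interactively, since the duckdb CLI executes SQL from stdin. A sketch reusing the archive path and time window from the example above:

```bash
# Count log lines per level in a one-hour window, straight from the shell
duckdb <<'SQL'
INSTALL pfc FROM community;
LOAD pfc;
LOAD json;
SELECT line->>'$.level' AS level, count(*) AS n
FROM read_pfc_jsonl(
  '/var/log/pfc/app_2026-03-01.pfc',
  ts_from = epoch(TIMESTAMPTZ '2026-03-01 14:00:00+00'),
  ts_to   = epoch(TIMESTAMPTZ '2026-03-01 15:00:00+00')
)
GROUP BY level
ORDER BY n DESC;
SQL
```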
Cloud conversions run in-region: download → convert → upload, without routing data through your workstation or incurring egress charges.
| Format | Extension | Extra dependency |
|---|---|---|
| gzip | `.jsonl.gz` | stdlib ✅ |
| bzip2 | `.jsonl.bz2` | stdlib ✅ |
| zstd | `.jsonl.zst` | `pip install pfc-migrate[zstd]` |
| lz4 | `.jsonl.lz4` | `pip install pfc-migrate[lz4]` |
| Plain JSONL | `.jsonl` | stdlib ✅ |
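To check which optional backends your environment can actually use, probe the imports directly. This assumes the extras install the usual zstandard and lz4 PyPI packages; if the project pins different ones, adjust the module names:

```bash
# Probe optional compression backends; package names are an assumption
python -c "import zstandard" 2>/dev/null && echo "zstd: OK" || echo "zstd: pip install 'pfc-migrate[zstd]'"
python -c "import lz4"       2>/dev/null && echo "lz4:  OK" || echo "lz4:  pip install 'pfc-migrate[lz4]'"
```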
The pfc_jsonl binary must be installed on the machine running the conversion:
```bash
# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
  -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# macOS (Apple Silicon M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
  -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
```

License note: `pfc_jsonl` is free for personal and open-source use. Commercial use requires a written license — see pfc-jsonl.
macOS Intel (x64): Binary coming soon. | Windows: Use WSL2 or a Linux machine.
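After installing, a quick smoke test confirms the binary is reachable on PATH. The `--version` flag is an assumption here; any invocation that prints usage will do:

```bash
command -v pfc_jsonl || echo "pfc_jsonl not found on PATH"
pfc_jsonl --version   # assumed flag; try --help if unsupported
```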
```bash
pip install pfc-migrate

# With zstd support
pip install pfc-migrate[zstd]

# With S3/Glacier support
pip install pfc-migrate[s3]

# With Azure Blob Storage support
pip install pfc-migrate[azure]

# With Google Cloud Storage support
pip install pfc-migrate[gcs]

# Everything
pip install pfc-migrate[all]
```

```bash
# Single file (output auto-named: logs.pfc + logs.pfc.bidx)
pfc-migrate convert logs.jsonl.gz

# Explicit output
pfc-migrate convert logs.jsonl.gz logs.pfc

# Entire directory
pfc-migrate convert --dir /var/log/archive/ --output-dir /var/log/pfc/

# Recursive + verbose
pfc-migrate convert --dir /mnt/logs/ -r -v
```
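For recurring conversions, the directory form drops straight into cron. A minimal sketch with an illustrative schedule and placeholder paths (originals are left in place):

```bash
# Illustrative crontab entry: convert new archives nightly at 03:00
0 3 * * * pfc-migrate convert --dir /var/log/archive/ --output-dir /var/log/pfc/ -r
```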
Conversion happens in-region (download to a temp dir → convert → upload). No egress charges.

```bash
# Single object
pfc-migrate s3 \
--bucket my-logs \
--key archive/app_2026-03.jsonl.gz \
--out-bucket my-logs-pfc \
--out-prefix converted/
# All objects matching a prefix
pfc-migrate s3 \
--bucket my-logs \
--prefix archive/2026-03/ \
--out-bucket my-logs-pfc \
--out-prefix converted/2026-03/ \
--format gz \
--verbose
# Glacier (Expedited retrieval)
pfc-migrate glacier \
--bucket my-glacier-logs \
--prefix 2025/ \
--out-bucket my-glacier-pfc \
--tier expedited
```
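Glacier restores are asynchronous and priced by retrieval tier; AWS offers Expedited, Standard, and Bulk. Assuming `--tier` accepts the other AWS tier names, a cheaper but slower bulk restore would look like this:

```bash
# Bulk retrieval trades hours of latency for the lowest restore cost.
# The tier name assumes AWS's standard naming; verify with --help.
pfc-migrate glacier \
  --bucket my-glacier-logs \
  --prefix 2025/ \
  --out-bucket my-glacier-pfc \
  --tier bulk
```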
```bash
# All blobs matching a prefix
pfc-migrate azure \
--container my-logs \
--prefix archive/2026-03/ \
--out-container my-logs-pfc \
--connection-string "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;"
```
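To keep the account key out of your shell history, read the connection string from a protected file instead of pasting it inline; the path below is a placeholder:

```bash
# Load the connection string from a file readable only by you
AZ_CONN="$(cat ~/.secrets/azure-logs.connstr)"
pfc-migrate azure \
  --container my-logs \
  --prefix archive/2026-03/ \
  --out-container my-logs-pfc \
  --connection-string "$AZ_CONN"
```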
```bash
# All objects matching a prefix
pfc-migrate gcs \
--bucket my-logs \
--prefix archive/2026-03/ \
--out-bucket my-logs-pfc \
--verbose
```

Every conversion is verified by full decompression and an MD5 check before the output is written. If anything doesn't match, the output file is deleted and the error is reported — the original is never modified. For the S3, GCS, and Azure subcommands, `--delete` removes the original only after successful verification.
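Putting it together, a verified in-place migration that cleans up after itself combines the prefix form with `--delete` (every flag below is documented above):

```bash
# Convert the March prefix; each original is removed only after its
# PFC output passes the decompression + MD5 verification
pfc-migrate s3 \
  --bucket my-logs \
  --prefix archive/2026-03/ \
  --out-bucket my-logs-pfc \
  --out-prefix converted/2026-03/ \
  --delete
```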
In v2.0.0 the database export subcommands (cratedb, questdb, timescaledb, etc.) have been moved to dedicated standalone tools. This keeps pfc-migrate focused on what it was built for: converting compressed file archives.
Migration:
```bash
# Before (v1.x)
pfc-migrate cratedb --host localhost --table logs --output logs.pfc

# After (v2.0.0) — install the dedicated tool
pip install pfc-export-cratedb
pfc-export-cratedb --host localhost --table logs --output logs.pfc
```

All flags are identical — it's a drop-in replacement.
→ View all PFC tools & integrations
| Direct integration | Why |
|---|---|
| pfc-convert | Pipe partner — schema conversion (Apache CLF, CSV → JSONL) before or after pfc-migrate |
| pfc-ingest-watchdog | Calls pfc-migrate automatically when new files arrive in folder or S3 |
pfc-migrate (this repository) is released under the MIT License — see LICENSE.
The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com