LLMD is a deterministic compiler system that converts Markdown into a compact, token-efficient format designed for LLM context windows. It replaces verbose hierarchical Markdown with implicit scoping, structured attributes, and configurable compression — reducing token counts while preserving semantic recoverability.
Author: Steven Ickman | License: MIT
# JavaScript (Node.js 18+)
node tools/js/llmdc.js docs/llmdc.md -c 2 -o docs/llmdc.llmd
# Python (3.10+)
python tools/py/llmdc.py docs/llmdc.md -c 2 -o docs/llmdc.llmd
# Rust (single binary, no runtime needed)
cargo run --manifest-path tools/rust/Cargo.toml -- docs/llmdc.md -c 2 -o docs/llmdc.llmd
All three implementations produce identical output. Use whichever fits your environment.
Markdown input:
## Authentication
The API supports authentication via OAuth2 and API keys.
- Use OAuth2 for user-facing apps.
- Use API keys for server-to-server.
Rate limit: 1000 requests per minute.
LLMD output (c2):
@authentication
API supports authentication via OAuth2 and API keys
-Use OAuth2 user-facing apps
-Use API keys server-to-server
:rate_limit=1000/m.
Every line starts with a type prefix: @ scope, : attribute, - list item, → relation, :: code block — or is plain text (no prefix) for prose.
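As a rough illustration of the prefix rule above, the following Python sketch classifies a single line by its leading characters. The function name `classify_line` and the returned labels are hypothetical and not part of the LLMD tools; the sketch also ignores the fact that lines inside a code block are literal content.

```python
def classify_line(line: str) -> str:
    """Return a label for an LLMD line, judged only by its prefix.

    Hypothetical helper for illustration; not taken from the compiler.
    """
    if not line.strip():
        return "blank"
    # "::" must be tested before ":" because both start with a colon.
    if line.startswith("::"):
        return "code_block"
    if line.startswith("@"):
        return "scope"
    if line.startswith(":"):
        return "attribute"
    if line.startswith("-"):
        return "list_item"
    if line.startswith("→"):
        return "relation"
    # No recognized prefix: treat as prose.
    return "text"


sample = [
    "@authentication",
    "API supports authentication via OAuth2 and API keys",
    "-Use OAuth2 user-facing apps",
    ":rate_limit=1000/m",
]
kinds = [classify_line(s) for s in sample]
print(kinds)
```

Checking the two-character prefix first is the only ordering constraint; the remaining checks are independent of each other.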
You have API docs, component libraries, or style guides that need to fit in a system prompt or RAG chunk. LLMD strips markdown formatting overhead while keeping the content machine-readable.
Component tables with class names like flm-button--primary compress well — the compiler preserves hyphens in keys, extracts common prefixes, and retains column semantics so the LLM can generate correct markup.
Endpoint tables, parameter lists, and status code references convert naturally to :k=v attributes and - list items. Code examples pass through untouched inside :: blocks.
When you need to fit several documents into a single context window, compile a directory at c2. The compiler handles file ordering deterministically and merges everything into one .llmd output.
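One plausible way to get a deterministic file order is to sort by relative path rather than relying on filesystem enumeration order. The sketch below assumes that approach; the source does not say which ordering rule the compiler actually uses, and `ordered_markdown_files` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def ordered_markdown_files(root: str) -> list[Path]:
    """List *.md files under root in a stable, platform-independent order.

    Assumption: ordering by POSIX-style relative path. The real compiler
    may use a different rule.
    """
    base = Path(root)
    files = [p for p in base.rglob("*.md") if p.is_file()]
    return sorted(files, key=lambda p: p.relative_to(base).as_posix())


# Small demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.md", "a.md", "sub/c.md"):
        path = Path(tmp) / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("# stub\n")
    order = [p.relative_to(tmp).as_posix() for p in ordered_markdown_files(tmp)]

print(order)
```

Sorting on the POSIX form of the path avoids differences between path separators on Windows and Unix-like systems.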
Feed .llmd files as tool/function descriptions or system instructions to agents. The format is designed so LLMs can parse the scoped structure without explicit instructions.
llmd/
├── README.md # This file
├── LICENSE # MIT
│
├── LLMD Specification - v0.2.md # Format spec (line types, scoping, normalization)
├── LLMD Compiler Design v0.2.md # 6-stage pipeline spec
│
├── .architecture/
│   └── ARCHITECTURE.md # System overview and diagrams
│
├── docs/ # Tool reference documentation
│   ├── llmdc.md # Compiler reference
│   ├── schema2llmd.md # Schema converter reference
│   └── llmdc.llmd # Pre-compiled LLMD version
│
├── config/
│   └── llmdc.config.json # Compiler config (stopwords, phrases, units)
│
├── tools/
│   ├── js/ # Node.js implementations
│   │   ├── llmdc.js # Compiler
│   │   └── schema2llmd.js # Schema converter
│   ├── py/ # Python implementations
│   │   ├── llmdc.py # Compiler
│   │   └── schema2llmd.py # Schema converter
│   └── rust/ # Rust implementation
│       └── src/ # Compiler + schema converter
│
└── corpora/
    └── samples/ # Sample documents for testing
        ├── api-spec.md
        └── fluentlm-components.md
| Tool | JS | Python | Rust | Purpose |
|---|---|---|---|---|
| llmdc | tools/js/llmdc.js | tools/py/llmdc.py | tools/rust/ | Compile Markdown → LLMD |
| schema2llmd | tools/js/schema2llmd.js | tools/py/schema2llmd.py | tools/rust/ | Convert JSON Schema → LLMD |
Full reference docs: docs/
All three implementations produce identical output. Measured on Windows 11 (median of 5 runs, c2 compression):
| File | JS (Node 22) | Python 3.10 | Rust (release) |
|---|---|---|---|
| api-spec.md (1.3 KB) | 140 ms | 238 ms | 61 ms |
| fluentlm-components.md (45 KB) | 243 ms | 354 ms | 73 ms |
Run pwsh tools/bench.ps1 or bash tools/bench.sh to reproduce.
| Level | Name | What it does |
|---|---|---|
| c0 | Structural normalize | Whitespace cleanup, structure conversion |
| c1 | Compact structure | Merge :k=v pairs, collapse blanks, prefix extraction |
| c2 | Token compaction | Stopword removal, phrase/unit normalization, boolean compression |
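To make the c2 row more concrete, here is a minimal Python sketch of stopword removal with trailing-period stripping. The word lists are hypothetical placeholders (the real ones live in config/llmdc.config.json and may differ), and `compact_prose` is not a function from the compiler. Phrase, unit, and boolean normalization are left out.

```python
# Hypothetical word lists for illustration only; the compiler reads its
# actual lists from config/llmdc.config.json.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "for"}
PROTECTED = {"no", "not", "never", "must", "should", "may", "always"}


def compact_prose(text: str) -> str:
    """Drop stopwords and one trailing period, keeping negations and modals."""
    text = text.strip()
    if text.endswith("."):
        text = text[:-1]
    kept = []
    for word in text.split():
        lowered = word.lower()
        # The PROTECTED check guards against a config that lists a
        # negation or modal as a stopword by mistake.
        if lowered in STOPWORDS and lowered not in PROTECTED:
            continue
        kept.append(word)
    return " ".join(kept)


print(compact_prose("Use OAuth2 for user-facing apps."))
print(compact_prose("Tokens must not be logged."))
```

With these placeholder lists, the first call reproduces the text of the list item from the authentication example earlier in this README.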
- Hyphen-preserving key normalization: CSS class names like flm-button--primary survive compilation intact
- Table classification: 2-column property tables emit :k=v, 3+ column tables with identifier keys emit :key=v1¦v2, others emit plain text rows with :_cols=headers
- Common prefix extraction: when keys share a prefix (e.g., flm-text--), it's factored out as :_pfx= to avoid repetition
- Chunked KV emission: large attribute groups split across multiple lines (max_kv_per_line, default 4)
- Boolean compression: columns of Yes/No, true/false, enabled/disabled → Y/N, T/F
- Column header preservation: :_col= and :_cols= meta-attributes retain table column semantics
- Code block passthrough: fenced code blocks preserved exactly inside ::lang/<<</>>> delimiters
- Deterministic output: same input + config always produces identical output
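Two of the features above, common prefix extraction and chunked KV emission, can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the compiler's code: `emit_attributes` and `min_prefix_len` are hypothetical names, values are assumed to contain no spaces, and only `max_kv_per_line` (default 4) comes from this README.

```python
import os


def emit_attributes(
    pairs: list[tuple[str, str]],
    max_kv_per_line: int = 4,
    min_prefix_len: int = 4,  # hypothetical threshold, not from the spec
) -> list[str]:
    """Emit :k=v lines, factoring a shared key prefix out as :_pfx=."""
    keys = [k for k, _ in pairs]
    prefix = os.path.commonprefix(keys) if len(keys) > 1 else ""
    # Skip extraction if the prefix is too short or would leave an empty key.
    if len(prefix) < min_prefix_len or any(k == prefix for k in keys):
        prefix = ""
    lines = []
    if prefix:
        lines.append(f":_pfx={prefix}")
    short = [(k[len(prefix):], v) for k, v in pairs]
    # Split the group into chunks of at most max_kv_per_line pairs.
    for i in range(0, len(short), max_kv_per_line):
        chunk = short[i:i + max_kv_per_line]
        lines.append(":" + " ".join(f"{k}={v}" for k, v in chunk))
    return lines


demo = [
    ("flm-text--primary", "main"),
    ("flm-text--secondary", "muted"),
    ("flm-text--error", "red"),
    ("flm-text--success", "green"),
    ("flm-text--disabled", "gray"),
]
for line in emit_attributes(demo):
    print(line)
```

Five pairs with the default chunk size produce one prefix line followed by two attribute lines, the second holding the single leftover pair.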
# Compile your docs at c2 (good default)
node tools/js/llmdc.js my-docs/ -c 2 -o context.llmd
# Or compile at c0/c1 for less aggressive compression
node tools/js/llmdc.js my-docs/ -c 0 -o context.llmd
LLMD works well with AI coding assistants like Claude Code and GitHub Copilot. These tools can automate the compilation workflow, and .llmd files make efficient context for AI-driven tasks.
Compiled .llmd files are smaller and cheaper to include in system prompts, tool descriptions, or RAG results. If your agent needs a component reference or API spec, give it the .llmd version instead of raw Markdown.
Add instructions to your project's CLAUDE.md or Copilot instructions file:
## LLMD Compilation
When documentation files in `docs/` are modified, recompile them:
- Run `node tools/js/llmdc.js docs/ -c 2 -o docs/compiled.llmd`
- The compiled output goes to `docs/compiled.llmd`
- Always compile at c2 unless asked otherwise
For Claude Code, you can ask it to compile in one shot:
Compile corpora/samples/ at c2 and tell me the token savings.
The agent can execute the shell commands, read the output, and summarize results without you needing to remember the CLI arguments.
- Point the agent at docs/: the reference docs in docs/*.md describe every CLI option and config key. An agent that reads these can run any tool correctly.
- Use the config file: config/llmdc.config.json is self-documenting. Agents can read and modify it for tuning.
- Batch compile on change: set up a hook or ask the agent to recompile whenever source docs change, so .llmd versions stay current.
When embedding .llmd content in a system prompt or tool context, include the following instructions so the model can interpret the format correctly:
Content below is LLMD v0.2 — a compressed, token-optimized format. Read it as follows:
Line types (each non-empty line starts with exactly one prefix, or none for prose):
- @name — scope. Sets the current topic. All following lines belong to this scope
until the next @. Hierarchy is flattened: @Auth after @API means separate scopes,
not nested. Reconstruct context from scope names.
- :k=v k2=v2 — attributes. Key-value facts about the current scope. ¦ (broken bar,
U+00A6) separates multiple values (e.g., methods=oauth2¦apikey). Multiple pairs
may appear on one line, space-separated. Parse each pair by splitting on the
first = (keys never contain =).
- plain text (no prefix) — prose about the current scope.
- -item — list item. Nested depth uses dots: -. child, -.. grandchild.
- →Node — relation. Current scope depends on Node. ←Node is reverse. =Node is
equivalence. Trailing ? means optional (e.g., →Cache?).
- ::lang followed by <<<...>>> — literal block. Code or data preserved exactly,
not compressed.
- ~k=v — file metadata. Optional, appears at top of file.
Reserved meta-attributes (compiler-generated, prefixed with _):
- :_col=<header> — column header for a 2-column property table.
- :_cols=c1¦c2¦c3 — column headers for a multi-column table.
- :_pfx=<prefix> — common prefix extracted from subsequent keys. Prepend it to
restore full key names (e.g., :_pfx=flm-text-- then :secondary=... means the
full key is flm-text--secondary).
Compression artifacts (content may be shortened — infer original phrasing):
- Common words (the, a, is, are, of, etc.) may be removed from prose and list items.
- Long phrases replaced with short forms (e.g., "in order to" → "to",
"application programming interface" → "API", "specification" → "spec").
- Units shortened (e.g., "1000 requests per minute" → "1000/m",
"seconds" → "s", "megabytes" → "MB").
- Boolean values compressed (Yes/No → Y/N, true/false → T/F,
enabled/disabled → Y/N).
- Trailing periods stripped from prose and list items.
- Negation (no, not, never) and modals (must, should, may, always) are always
preserved.
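The attribute rules in the instructions above (split each pair on the first =, split values on ¦, prepend :_pfx= to later keys) are mechanical enough to sketch in Python. This is a reader-side illustration for the lines of a single scope; `parse_attributes` is a hypothetical name, values are assumed to contain no spaces, and the prefix is assumed to stay active until another :_pfx= replaces it.

```python
def parse_attributes(lines: list[str]) -> dict[str, list[str]]:
    """Collect :k=v pairs into a dict, restoring keys after a :_pfx= line."""
    prefix = ""
    attrs: dict[str, list[str]] = {}
    for line in lines:
        # Only single-colon lines are attributes; "::" opens a literal block.
        if not line.startswith(":") or line.startswith("::"):
            continue
        for pair in line[1:].split(" "):
            if "=" not in pair:
                continue
            key, value = pair.split("=", 1)  # split on the first "=" only
            if key == "_pfx":
                prefix = value
                continue
            # Reserved meta-attributes (leading "_") are kept unprefixed.
            full_key = key if key.startswith("_") else prefix + key
            attrs[full_key] = value.split("¦")
    return attrs


scope_lines = [
    ":methods=oauth2¦apikey rate_limit=1000/m",
    ":_pfx=flm-text--",
    ":secondary=muted",
]
parsed = parse_attributes(scope_lines)
print(parsed)
```

Every value is returned as a list so that single and multi-valued attributes can be handled the same way by the caller.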
| Document | Description |
|---|---|
| LLMD Specification v0.2 | Format definition: line types, scoping model, normalization rules, compression levels |
| Compiler Design v0.2 | 6-stage pipeline architecture, table classification, prefix extraction, config reference |
| Architecture | System overview, component diagrams, file relationships |