LLMD — LLM-optimized Deterministic Markdown

LLMD is a deterministic compiler system that converts Markdown into a compact, token-efficient format designed for LLM context windows. It replaces verbose hierarchical Markdown with implicit scoping, structured attributes, and configurable compression — reducing token counts while preserving semantic recoverability.

Author: Steven Ickman | License: MIT

Quick Start

# JavaScript (Node.js 18+)
node tools/js/llmdc.js docs/llmdc.md -c 2 -o docs/llmdc.llmd

# Python (3.10+)
python tools/py/llmdc.py docs/llmdc.md -c 2 -o docs/llmdc.llmd

# Rust (single binary, no runtime needed)
cargo run --manifest-path tools/rust/Cargo.toml -- docs/llmdc.md -c 2 -o docs/llmdc.llmd

All three implementations produce identical output. Use whichever fits your environment.

What It Looks Like

Markdown input:

## Authentication
The API supports authentication via OAuth2 and API keys.
- Use OAuth2 for user-facing apps.
- Use API keys for server-to-server.
Rate limit: 1000 requests per minute.

LLMD output (c2):

@authentication
API supports authentication via OAuth2 and API keys
-Use OAuth2 user-facing apps
-Use API keys server-to-server
:rate_limit=1000/m.

Each non-empty line starts with a type prefix (@ scope, : attribute, - list item, → relation, :: code block, ~ file metadata) or has no prefix, in which case it is prose.
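The prefix rules can be made concrete with a minimal Python line classifier. The `LineType` enum and `classify` function are hypothetical names, not part of the llmdc tools. The sketch also ignores the body of `<<<`/`>>>` literal blocks, which a real reader would have to skip rather than classify.

```python
from enum import Enum


class LineType(Enum):
    SCOPE = "scope"          # @name
    CODE = "code"            # ::lang (header of a literal block)
    ATTRIBUTE = "attribute"  # :k=v k2=v2
    LIST_ITEM = "list_item"  # -item, -.child, -..grandchild
    RELATION = "relation"    # →Node, ←Node, =Node
    METADATA = "metadata"    # ~k=v
    PROSE = "prose"          # no prefix


def classify(line: str) -> LineType:
    """Classify one non-empty LLMD line by its leading prefix."""
    if line.startswith("@"):
        return LineType.SCOPE
    if line.startswith("::"):  # check before ":" so code headers are not attributes
        return LineType.CODE
    if line.startswith(":"):
        return LineType.ATTRIBUTE
    if line.startswith("-"):
        return LineType.LIST_ITEM
    if line.startswith(("→", "←", "=")):
        return LineType.RELATION
    if line.startswith("~"):
        return LineType.METADATA
    return LineType.PROSE


sample = """@authentication
API supports authentication via OAuth2 and API keys
-Use OAuth2 user-facing apps
-Use API keys server-to-server
:rate_limit=1000/m."""

# Print each line of the c2 example with its detected type.
for row in sample.splitlines():
    print(f"{classify(row).value:<10} {row}")
```

Checking `::` before `:` matters because a code-block header would otherwise be read as an attribute line.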

When to Use LLMD

Stuffing reference docs into LLM context

You have API docs, component libraries, or style guides that need to fit in a system prompt or RAG chunk. LLMD strips markdown formatting overhead while keeping the content machine-readable.

CSS/design system references

Component tables with class names like flm-button--primary compress well — the compiler preserves hyphens in keys, extracts common prefixes, and retains column semantics so the LLM can generate correct markup.

API specification compression

Endpoint tables, parameter lists, and status code references convert naturally to :k=v attributes and - list items. Code examples pass through untouched inside :: blocks.

Multi-document context packing

When you need to fit several documents into a single context window, compile a directory at c2. The compiler handles file ordering deterministically and merges everything into one .llmd output.
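The README does not state which ordering rule the compiler uses. The sketch below assumes one plausible rule, sorting by relative POSIX path, to show why an explicit sort makes directory compilation reproducible: `rglob` enumeration order depends on the filesystem, while a sorted path list does not. `collect_markdown` and `pack` are hypothetical helpers, not llmdc functions.

```python
from pathlib import Path


def collect_markdown(root: str) -> list[Path]:
    """Return every .md file under root in a stable, platform-independent order.

    Assumption: files are ordered by their path relative to root, written with
    forward slashes so Windows and POSIX systems produce the same order.
    """
    base = Path(root)
    files = [p for p in base.rglob("*.md") if p.is_file()]
    return sorted(files, key=lambda p: p.relative_to(base).as_posix())


def pack(root: str) -> str:
    """Concatenate the files in deterministic order, separated by a blank line."""
    parts = [p.read_text(encoding="utf-8") for p in collect_markdown(root)]
    return "\n\n".join(parts)
```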

Agentic tool context

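Feed .llmd files to agents as tool/function descriptions or system instructions. The format is designed so LLMs can follow the scoped structure with little explanation; the LLMD Reading Guide section of this README provides prompt text that helps them interpret it correctly.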

Project Structure

llmd/
├── README.md                                      # This file
├── LICENSE                                        # MIT
│
├── LLMD Specification - v0.2.md                   # Format spec (line types, scoping, normalization)
├── LLMD Compiler Design v0.2.md                   # 6-stage pipeline spec
│
├── .architecture/
│   └── ARCHITECTURE.md                            # System overview and diagrams
│
├── docs/                                          # Tool reference documentation
│   ├── llmdc.md                                   # Compiler reference
│   ├── schema2llmd.md                             # Schema converter reference
│   └── llmdc.llmd                                 # Pre-compiled LLMD version
│
├── config/
│   └── llmdc.config.json                          # Compiler config (stopwords, phrases, units)
│
├── tools/
│   ├── js/                                        # Node.js implementations
│   │   ├── llmdc.js                               # Compiler
│   │   └── schema2llmd.js                         # Schema converter
│   ├── py/                                        # Python implementations
│   │   ├── llmdc.py                               # Compiler
│   │   └── schema2llmd.py                         # Schema converter
│   └── rust/                                      # Rust implementation
│       └── src/                                   # Compiler + schema converter
│
└── corpora/
    └── samples/                                   # Sample documents for testing
        ├── api-spec.md
        └── fluentlm-components.md

Tools

Tool	JS	Python	Rust	Purpose
llmdc	`tools/js/llmdc.js`	`tools/py/llmdc.py`	`tools/rust/`	Compile Markdown → LLMD
schema2llmd	`tools/js/schema2llmd.js`	`tools/py/schema2llmd.py`	`tools/rust/`	Convert JSON Schema → LLMD

Full reference docs: docs/

Performance

All three implementations produce identical output. Measured on Windows 11 (median of 5 runs, c2 compression):

File	JS (Node 22)	Python 3.10	Rust (release)
api-spec.md (1.3 KB)	140 ms	238 ms	61 ms
fluentlm-components.md (45 KB)	243 ms	354 ms	73 ms

Run pwsh tools/bench.ps1 or bash tools/bench.sh to reproduce.
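The bench scripts are the reference way to reproduce these numbers. To illustrate the measurement method (median of 5 runs), the Python sketch below times any callable; `compile_js` reuses only the `-c` and `-o` flags from the Quick Start. Both function names are hypothetical. Because each run launches a new process, the timings include runtime startup, not just compilation.

```python
import statistics
import subprocess
import time
from typing import Callable


def median_ms(run: Callable[[], object], runs: int = 5) -> float:
    """Call run() `runs` times; return the median wall-clock time in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def compile_js(src: str, out: str) -> None:
    """Run the Node.js compiler at c2 with the same flags as the Quick Start."""
    subprocess.run(["node", "tools/js/llmdc.js", src, "-c", "2", "-o", out], check=True)
```

From the repository root, `median_ms(lambda: compile_js("corpora/samples/api-spec.md", "api-spec.llmd"))` would time the JS compiler on the smaller sample. Taking the median rather than the mean keeps one slow run (for example, a cold file cache) from skewing the result.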

Compression Levels

Level	Name	What it does
c0	Structural normalize	Whitespace cleanup, structure conversion
c1	Compact structure	Merge `:k=v` pairs, collapse blanks, prefix extraction
c2	Token compaction	Stopword removal, phrase/unit normalization, boolean compression
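The c2 stage can be pictured with a simplified Python sketch that works on a single prose line. The `STOPWORDS`, `PROTECTED`, `PHRASES`, and `UNITS` tables are tiny hypothetical stand-ins for the lists in config/llmdc.config.json, and `compact` is not an llmdc function. The real compiler also handles list items, tables, and boolean columns, which this sketch does not.

```python
import re

# Tiny illustrative tables; the real lists live in config/llmdc.config.json.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "for"}
PROTECTED = {"no", "not", "never", "must", "should", "may", "always"}  # never dropped
PHRASES = {"in order to": "to", "application programming interface": "API"}
UNITS = [(re.compile(r"(\d+) requests per minute"), r"\1/m")]


def compact(text: str) -> str:
    """Approximate c2 token compaction on one line of prose."""
    out = text
    # Phrase normalization (case-insensitive).
    for phrase, short in PHRASES.items():
        out = re.sub(re.escape(phrase), short, out, flags=re.IGNORECASE)
    # Unit normalization, e.g. "1000 requests per minute" -> "1000/m".
    for pattern, repl in UNITS:
        out = pattern.sub(repl, out)
    # Stopword removal that always keeps negations and modals.
    words = [w for w in out.split() if w.lower() in PROTECTED or w.lower() not in STOPWORDS]
    # Strip the trailing period.
    return " ".join(words).rstrip(".")
```

Applied to the prose line of the Authentication example, `compact("The API supports authentication via OAuth2 and API keys.")` returns `API supports authentication via OAuth2 and API keys`, matching the c2 output shown earlier.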

Key Features

Hyphen-preserving key normalization — CSS class names like flm-button--primary survive compilation intact
Table classification — 2-column property tables emit :k=v, 3+ column tables with identifier keys emit :key=v1¦v2, others emit plain text rows with :_cols= headers
Common prefix extraction — When keys share a prefix (e.g., flm-text--), it's factored out as :_pfx= to avoid repetition
Chunked KV emission — Large attribute groups split across multiple lines (max_kv_per_line, default 4)
Boolean compression — Yes/No and enabled/disabled columns → Y/N; true/false columns → T/F
Column header preservation — :_col= and :_cols= meta-attributes retain table column semantics
Code block passthrough — Fenced code blocks preserved exactly inside ::lang / <<< / >>> delimiters
Deterministic output — Same input + config always produces identical output
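Prefix extraction and chunked KV emission can be sketched together in Python. `extract_prefix` and `emit_kv` are hypothetical names. Trimming the common prefix back to the last `-` is an assumption chosen so that keys like `flm-text--primary` and `flm-text--secondary` yield `flm-text--`, and the sketch assumes keys and values contain no spaces.

```python
import os


def extract_prefix(keys: list[str]) -> tuple[str, list[str]]:
    """Factor a shared prefix out of keys; return (prefix, stripped_keys).

    Returns ("", keys) when there is no usable shared prefix.
    """
    if len(keys) < 2:
        return "", keys
    common = os.path.commonprefix(keys)
    # Assumption: cut back to the last '-' so the prefix ends on a segment
    # boundary (flm-text--primary, flm-text--promo -> "flm-text--").
    cut = common.rfind("-")
    prefix = common[: cut + 1] if cut >= 0 else ""
    if not prefix or prefix in keys:  # never leave an empty key
        return "", keys
    return prefix, [k[len(prefix):] for k in keys]


def emit_kv(pairs: dict[str, str], max_kv_per_line: int = 4) -> list[str]:
    """Emit attribute lines: an optional :_pfx= line, then chunked :k=v lines."""
    prefix, keys = extract_prefix(list(pairs))
    lines = [f":_pfx={prefix}"] if prefix else []
    items = [f"{k}={v}" for k, v in zip(keys, pairs.values())]
    for i in range(0, len(items), max_kv_per_line):
        lines.append(":" + " ".join(items[i : i + max_kv_per_line]))
    return lines
```

The `max_kv_per_line=4` default mirrors the documented config default. With five `flm-text--` keys, the sketch emits one `:_pfx=flm-text--` line, a line of four pairs, and a line holding the fifth.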

Typical Workflow

# Compile your docs at c2 (good default)
node tools/js/llmdc.js my-docs/ -c 2 -o context.llmd

# Or compile at c0/c1 for less aggressive compression
node tools/js/llmdc.js my-docs/ -c 0 -o context.llmd

Using with Agentic Coding Tools

LLMD works well with AI coding assistants like Claude Code and GitHub Copilot. These tools can automate the compilation workflow, and .llmd files make efficient context for AI-driven tasks.

Feed .llmd as context

Compiled .llmd files are smaller and cheaper to include in system prompts, tool descriptions, or RAG results. If your agent needs a component reference or API spec, give it the .llmd version instead of raw Markdown.

Automate compilation in your workflow

Add instructions to your project's CLAUDE.md or Copilot instructions file:

## LLMD Compilation
When documentation files in `docs/` are modified, recompile them:
- Run `node tools/js/llmdc.js docs/ -c 2 -o docs/compiled.llmd`
- The compiled output goes to `docs/compiled.llmd`
- Always compile at c2 unless asked otherwise

Let the agent run compilation

For Claude Code, you can ask it to compile in one shot:

Compile corpora/samples/ at c2 and tell me the token savings.

The agent can execute the shell commands, read the output, and summarize results without you needing to remember the CLI arguments.
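If you prefer to script the same task, the sketch below compiles one file with the documented `-c` and `-o` flags and compares sizes. The README does not say how token savings are measured, so this uses UTF-8 byte counts and whitespace-delimited word counts as rough proxies; an accurate figure would need the target model's tokenizer. `savings` and `compile_and_report` are hypothetical helpers, and `compile_and_report` expects a single Markdown file, not a directory.

```python
import subprocess
from pathlib import Path


def savings(original: str, compiled: str) -> dict[str, float]:
    """Rough size comparison. Word counts are only a proxy for tokens."""
    def pct(before: int, after: int) -> float:
        return 0.0 if before == 0 else round(100.0 * (before - after) / before, 1)

    b0, b1 = len(original.encode("utf-8")), len(compiled.encode("utf-8"))
    w0, w1 = len(original.split()), len(compiled.split())
    return {"bytes_saved_pct": pct(b0, b1), "words_saved_pct": pct(w0, w1)}


def compile_and_report(src: str, out: str, level: int = 2) -> dict[str, float]:
    """Run the JS compiler with the documented -c/-o flags, then compare sizes."""
    subprocess.run(["node", "tools/js/llmdc.js", src, "-c", str(level), "-o", out], check=True)
    return savings(Path(src).read_text(encoding="utf-8"), Path(out).read_text(encoding="utf-8"))
```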

Tips for best results

Point the agent at docs/ — The reference docs in docs/*.md describe every CLI option and config key. An agent that reads these can run any tool correctly.
Use the config file — config/llmdc.config.json is self-documenting. Agents can read and modify it for tuning.
Batch compile on change — Set up a hook or ask the agent to recompile whenever source docs change, so .llmd versions stay current.
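A recompile hook mostly needs a staleness check. The Python sketch below, with hypothetical function names, rebuilds `docs/compiled.llmd` (the path used in the CLAUDE.md example) only when it is missing or older than some `.md` file under `docs/`. Comparing modification times is a common heuristic, but it can miss changes when a copy or checkout preserves old timestamps.

```python
import subprocess
from pathlib import Path


def needs_recompile(sources: list[Path], output: Path) -> bool:
    """True if output is missing or older than any source file (mtime heuristic)."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(src.stat().st_mtime > out_mtime for src in sources)


def recompile_if_stale(src_dir: str = "docs", output: str = "docs/compiled.llmd") -> bool:
    """Recompile src_dir at c2 only when needed; return True if the compiler ran."""
    # "*.md" does not match ".llmd" files, so compiled output is never a source.
    sources = sorted(p for p in Path(src_dir).rglob("*.md") if p.is_file())
    if not needs_recompile(sources, Path(output)):
        return False
    # Same command as the CLAUDE.md example in this README.
    subprocess.run(["node", "tools/js/llmdc.js", src_dir, "-c", "2", "-o", output], check=True)
    return True
```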

LLMD Reading Guide (for LLM System Prompts)

When embedding .llmd content in a system prompt or tool context, include the following instructions so the model can interpret the format correctly:

Content below is LLMD v0.2 — a compressed, token-optimized format. Read it as follows:

Line types (each non-empty line starts with exactly one prefix, or none for prose):

- @name — scope. Sets the current topic. All following lines belong to this scope
  until the next @. Hierarchy is flattened: @Auth after @API means separate scopes,
  not nested. Reconstruct context from scope names.
- :k=v k2=v2 — attributes. Key-value facts about the current scope. ¦ (broken bar,
  U+00A6) separates multiple values (e.g., methods=oauth2¦apikey). Multiple pairs
  may appear on one line, space-separated. Parse each pair by splitting on the
  first = (keys never contain =).
- plain text (no prefix) — prose about the current scope.
- -item — list item. Nested depth uses dots: -. child, -.. grandchild.
- →Node — relation. Current scope depends on Node. ←Node is reverse. =Node is
  equivalence. Trailing ? means optional (e.g., →Cache?).
- ::lang followed by <<<...>>> — literal block. Code or data preserved exactly,
  not compressed.
- ~k=v — file metadata. Optional, appears at top of file.

Reserved meta-attributes (compiler-generated, prefixed with _):

- :_col=<header> — column header for a 2-column property table.
- :_cols=c1¦c2¦c3 — column headers for a multi-column table.
- :_pfx=<prefix> — common prefix extracted from subsequent keys. Prepend it to
  restore full key names (e.g., :_pfx=flm-text-- then :secondary=... means the
  full key is flm-text--secondary).

Compression artifacts (content may be shortened — infer original phrasing):

- Common words (the, a, is, are, of, etc.) may be removed from prose and list items.
- Long phrases replaced with short forms (e.g., "in order to" → "to",
  "application programming interface" → "API", "specification" → "spec").
- Units shortened (e.g., "1000 requests per minute" → "1000/m",
  "seconds" → "s", "megabytes" → "MB").
- Boolean values compressed (Yes/No → Y/N, true/false → T/F,
  enabled/disabled → Y/N).
- Trailing periods stripped from prose and list items.
- Negations (no, not, never), modals (must, should, may), and "always" are
  never removed.
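The guide above is written for models, but the same rules are enough to post-process .llmd attributes in code. The sketch below is a hypothetical reader, not part of the llmdc tools: it takes the lines of one scope, splits each pair on the first `=`, splits values on `¦` (U+00A6), and prepends any `:_pfx=` prefix to later keys. It assumes values contain no spaces, since the guide only says pairs are space-separated.

```python
def parse_scope_attributes(lines: list[str]) -> dict[str, list[str]]:
    """Collect :k=v attributes from the lines of one scope.

    - Each pair is split on the first '=' (keys never contain '=').
    - Values are split on U+00A6 BROKEN BAR into a list.
    - A :_pfx=<prefix> pair is prepended to every later non-reserved key.
    - Other reserved keys (_col, _cols) keep their own names.
    """
    attrs: dict[str, list[str]] = {}
    prefix = ""
    for line in lines:
        if not line.startswith(":") or line.startswith("::"):
            continue  # prose, list items, relations, code headers, etc.
        for pair in line[1:].split():
            key, sep, value = pair.partition("=")
            if not sep:
                continue  # assumption: skip malformed tokens without '='
            if key == "_pfx":
                prefix = value
                continue
            full_key = key if key.startswith("_") else prefix + key
            attrs[full_key] = value.split("\u00a6")
    return attrs
```

Using `str.partition` splits on the first `=` only, so a value such as `a=b` survives intact.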

Specifications

Document	Description
LLMD Specification v0.2	Format definition: line types, scoping model, normalization rules, compression levels
Compiler Design v0.2	6-stage pipeline architecture, table classification, prefix extraction, config reference
Architecture	System overview, component diagrams, file relationships
