LLMD is a deterministic compiler that converts Markdown into a compact, LLM-optimized pathline format.

Stevenic/llmd

LLMD — LLM-optimized Deterministic Markdown

LLMD is a deterministic compiler system that converts Markdown into a compact, token-efficient format designed for LLM context windows. It replaces verbose hierarchical Markdown with implicit scoping, structured attributes, and configurable compression — reducing token counts while preserving semantic recoverability.

Author: Steven Ickman | License: MIT


Quick Start

# JavaScript (Node.js 18+)
node tools/js/llmdc.js docs/llmdc.md -c 2 -o docs/llmdc.llmd

# Python (3.10+)
python tools/py/llmdc.py docs/llmdc.md -c 2 -o docs/llmdc.llmd

# Rust (single binary, no runtime needed)
cargo run --manifest-path tools/rust/Cargo.toml -- docs/llmdc.md -c 2 -o docs/llmdc.llmd

All three implementations produce identical output. Use whichever fits your environment.


What It Looks Like

Markdown input:

## Authentication
The API supports authentication via OAuth2 and API keys.
- Use OAuth2 for user-facing apps.
- Use API keys for server-to-server.
Rate limit: 1000 requests per minute.

LLMD output (c2):

@authentication
API supports authentication via OAuth2 and API keys
-Use OAuth2 user-facing apps
-Use API keys server-to-server
:rate_limit=1000/m.

Every line starts with a type prefix: @ scope, : attribute, - list item, → relation, :: code block — or is plain text (no prefix) for prose.
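The prefix rule can be illustrated with a small classifier. This is a minimal sketch, not the project's parser: the function name `classify_lines` and the kind labels are hypothetical, and it assumes that `<<<` and `>>>` each sit on their own line around a literal block. The `←`, `=`, and `~` prefixes come from the reading guide later in this README.

```python
def classify_lines(text):
    """Yield (kind, line) pairs for each non-empty LLMD line.

    Hypothetical helper for illustration; labels are not part of the spec.
    """
    in_literal = False
    for line in text.splitlines():
        stripped = line.strip()
        if in_literal:
            # Inside a literal block nothing is interpreted until the closer.
            if stripped == ">>>":
                in_literal = False
                yield ("literal_end", line)
            else:
                yield ("literal", line)
            continue
        if not stripped:
            continue
        if stripped == "<<<":  # assumption: the opener is on its own line
            in_literal = True
            yield ("literal_start", line)
        elif line.startswith("::"):  # must be tested before the single ':'
            yield ("code_block", line)
        elif line.startswith("@"):
            yield ("scope", line)
        elif line.startswith(":"):
            yield ("attribute", line)
        elif line.startswith("-"):
            yield ("list_item", line)
        elif line.startswith(("→", "←", "=")):
            yield ("relation", line)
        elif line.startswith("~"):
            yield ("metadata", line)
        else:
            yield ("text", line)
```

Checking `::` before `:` matters because both start with the same character. Tracking the literal block keeps code that happens to begin with `@` or `-` from being misread as structure.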


When to Use LLMD

Stuffing reference docs into LLM context

You have API docs, component libraries, or style guides that need to fit in a system prompt or RAG chunk. LLMD strips markdown formatting overhead while keeping the content machine-readable.

CSS/design system references

Component tables with class names like flm-button--primary compress well — the compiler preserves hyphens in keys, extracts common prefixes, and retains column semantics so the LLM can generate correct markup.

API specification compression

Endpoint tables, parameter lists, and status code references convert naturally to :k=v attributes and - list items. Code examples pass through untouched inside :: blocks.

Multi-document context packing

When you need to fit several documents into a single context window, compile a directory at c2. The compiler handles file ordering deterministically and merges everything into one .llmd output.

Agentic tool context

Feed .llmd files as tool/function descriptions or system instructions to agents. The format is designed so LLMs can parse the scoped structure without explicit instructions.


Project Structure

llmd/
├── README.md                                      # This file
├── LICENSE                                        # MIT
│
├── LLMD Specification - v0.2.md                   # Format spec (line types, scoping, normalization)
├── LLMD Compiler Design v0.2.md                   # 6-stage pipeline spec
│
├── .architecture/
│   └── ARCHITECTURE.md                            # System overview and diagrams
│
├── docs/                                          # Tool reference documentation
│   ├── llmdc.md                                   # Compiler reference
│   ├── schema2llmd.md                             # Schema converter reference
│   └── llmdc.llmd                                 # Pre-compiled LLMD version
│
├── config/
│   └── llmdc.config.json                          # Compiler config (stopwords, phrases, units)
│
├── tools/
│   ├── js/                                        # Node.js implementations
│   │   ├── llmdc.js                               # Compiler
│   │   └── schema2llmd.js                         # Schema converter
│   ├── py/                                        # Python implementations
│   │   ├── llmdc.py                               # Compiler
│   │   └── schema2llmd.py                         # Schema converter
│   └── rust/                                      # Rust implementation
│       └── src/                                   # Compiler + schema converter
│
└── corpora/
    └── samples/                                   # Sample documents for testing
        ├── api-spec.md
        └── fluentlm-components.md

Tools

| Tool | JS | Python | Rust | Purpose |
|------|----|--------|------|---------|
| llmdc | tools/js/llmdc.js | tools/py/llmdc.py | tools/rust/ | Compile Markdown → LLMD |
| schema2llmd | tools/js/schema2llmd.js | tools/py/schema2llmd.py | tools/rust/ | Convert JSON Schema → LLMD |

Full reference docs: docs/

Performance

All three implementations produce identical output. Measured on Windows 11 (median of 5 runs, c2 compression):

| File | JS (Node 22) | Python 3.10 | Rust (release) |
|------|--------------|-------------|----------------|
| api-spec.md (1.3 KB) | 140 ms | 238 ms | 61 ms |
| fluentlm-components.md (45 KB) | 243 ms | 354 ms | 73 ms |

Run pwsh tools/bench.ps1 or bash tools/bench.sh to reproduce.


Compression Levels

| Level | Name | What it does |
|-------|------|--------------|
| c0 | Structural normalize | Whitespace cleanup, structure conversion |
| c1 | Compact structure | Merge :k=v pairs, collapse blanks, prefix extraction |
| c2 | Token compaction | Stopword removal, phrase/unit normalization, boolean compression |
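To make the c2 stopword step concrete, here is a rough sketch of how a prose line might be compacted. It is an illustration under stated assumptions, not the compiler's code: `compact_prose`, `STOPWORDS`, and `PROTECTED` are hypothetical names, and the stopword set is a tiny sample inferred from the examples in this README (the real list is in config/llmdc.config.json). Phrase and unit normalization are not modeled.

```python
# Assumption: a tiny sample list; the real stopwords come from the config file.
STOPWORDS = {"the", "a", "is", "are", "of", "for"}
# Negation and modal words that must survive compaction.
PROTECTED = {"no", "not", "never", "must", "should", "may", "always"}


def compact_prose(line):
    """Drop stopwords and trailing periods from one prose or list line."""
    # Strip trailing whitespace and periods, then split on whitespace.
    words = line.rstrip().rstrip(".").split()
    kept = [
        w for w in words
        # The PROTECTED check is a guard in case a configured stopword
        # list ever overlaps with negation or modal words.
        if w.lower() in PROTECTED or w.lower() not in STOPWORDS
    ]
    return " ".join(kept)
```

Applied to the Authentication example above, `compact_prose("Use OAuth2 for user-facing apps.")` gives `Use OAuth2 user-facing apps`. Tokenization here is whitespace-only, so a stopword with attached punctuation (such as `the,`) would not be removed.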

Key Features

  • Hyphen-preserving key normalization — CSS class names like flm-button--primary survive compilation intact
  • Table classification — 2-column property tables emit :k=v, 3+ column tables with identifier keys emit :key=v1¦v2, others emit plain text rows with :_cols= headers
  • Common prefix extraction — When keys share a prefix (e.g., flm-text--), it's factored out as :_pfx= to avoid repetition
  • Chunked KV emission — Large attribute groups split across multiple lines (max_kv_per_line, default 4)
  • Boolean compression — Columns of Yes/No, true/false, enabled/disabled → Y/N, T/F
  • Column header preservation — :_col= and :_cols= meta-attributes retain table column semantics
  • Code block passthrough — Fenced code blocks preserved exactly inside ::lang / <<< / >>> delimiters
  • Deterministic output — Same input + config always produces identical output
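Common prefix extraction and chunked KV emission from the list above can be sketched together. This is a minimal illustration, assuming the prefix is cut back to the last hyphen so keys are never split mid-word. The compiler's actual thresholds and boundary rules are not stated here, and `extract_prefix` and `emit_attributes` are hypothetical names.

```python
import os


def extract_prefix(keys):
    """Return the longest shared prefix that ends in '-', or '' if none."""
    if len(keys) < 2:
        return ""
    prefix = os.path.commonprefix(keys)
    # Assumption: trim to the last hyphen so "flm-text--s" becomes "flm-text--".
    cut = prefix.rfind("-") + 1  # 0 when the prefix has no hyphen
    return prefix[:cut]


def emit_attributes(pairs, max_kv_per_line=4):
    """Emit :k=v lines with a factored prefix, at most N pairs per line."""
    prefix = extract_prefix([k for k, _ in pairs])
    lines = []
    if prefix:
        lines.append(f":_pfx={prefix}")
    short = [(k[len(prefix):], v) for k, v in pairs]
    # Split the pairs into fixed-size chunks, one output line per chunk.
    for i in range(0, len(short), max_kv_per_line):
        chunk = short[i:i + max_kv_per_line]
        lines.append(":" + " ".join(f"{k}={v}" for k, v in chunk))
    return lines
```

With five `flm-text--*` keys and the default chunk size of 4, this yields one `:_pfx=flm-text--` line followed by two attribute lines. Values are assumed to contain no spaces.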

Typical Workflow

# Compile your docs at c2 (good default)
node tools/js/llmdc.js my-docs/ -c 2 -o context.llmd

# Or compile at c0/c1 for less aggressive compression
node tools/js/llmdc.js my-docs/ -c 0 -o context.llmd

Using with Agentic Coding Tools

LLMD works well with AI coding assistants like Claude Code and GitHub Copilot. These tools can automate the compilation workflow, and .llmd files make efficient context for AI-driven tasks.

Feed .llmd as context

Compiled .llmd files are smaller and cheaper to include in system prompts, tool descriptions, or RAG results. If your agent needs a component reference or API spec, give it the .llmd version instead of raw Markdown.

Automate compilation in your workflow

Add instructions to your project's CLAUDE.md or Copilot instructions file:

## LLMD Compilation
When documentation files in `docs/` are modified, recompile them:
- Run `node tools/js/llmdc.js docs/ -c 2 -o docs/compiled.llmd`
- The compiled output goes to `docs/compiled.llmd`
- Always compile at c2 unless asked otherwise

Let the agent run compilation

For Claude Code, you can ask it to compile in one shot:

Compile corpora/samples/ at c2 and tell me the token savings.

The agent can execute the shell commands, read the output, and summarize results without you needing to remember the CLI arguments.

Tips for best results

  • Point the agent at docs/ — The reference docs in docs/*.md describe every CLI option and config key. An agent that reads these can run any tool correctly.
  • Use the config file — config/llmdc.config.json is self-documenting. Agents can read and modify it for tuning.
  • Batch compile on change — Set up a hook or ask the agent to recompile whenever source docs change, so .llmd versions stay current.

LLMD Reading Guide (for LLM System Prompts)

When embedding .llmd content in a system prompt or tool context, include the following instructions so the model can interpret the format correctly:

Content below is LLMD v0.2 — a compressed, token-optimized format. Read it as follows:

Line types (each non-empty line starts with exactly one prefix, or none for prose):

- @name — scope. Sets the current topic. All following lines belong to this scope
  until the next @. Hierarchy is flattened: @Auth after @API means separate scopes,
  not nested. Reconstruct context from scope names.
- :k=v k2=v2 — attributes. Key-value facts about the current scope. ¦ (broken bar,
  U+00A6) separates multiple values (e.g., methods=oauth2¦apikey). Multiple pairs
  may appear on one line, space-separated. Parse each pair by splitting on the
  first = (keys never contain =).
- plain text (no prefix) — prose about the current scope.
- -item — list item. Nested depth uses dots: -. child, -.. grandchild.
- →Node — relation. Current scope depends on Node. ←Node is reverse. =Node is
  equivalence. Trailing ? means optional (e.g., →Cache?).
- ::lang followed by <<<...>>> — literal block. Code or data preserved exactly,
  not compressed.
- ~k=v — file metadata. Optional, appears at top of file.

Reserved meta-attributes (compiler-generated, prefixed with _):

- :_col=<header> — column header for a 2-column property table.
- :_cols=c1¦c2¦c3 — column headers for a multi-column table.
- :_pfx=<prefix> — common prefix extracted from subsequent keys. Prepend it to
  restore full key names (e.g., :_pfx=flm-text-- then :secondary=... means the
  full key is flm-text--secondary).

Compression artifacts (content may be shortened — infer original phrasing):

- Common words (the, a, is, are, of, etc.) may be removed from prose and list items.
- Long phrases replaced with short forms (e.g., "in order to" → "to",
  "application programming interface" → "API", "specification" → "spec").
- Units shortened (e.g., "1000 requests per minute" → "1000/m",
  "seconds" → "s", "megabytes" → "MB").
- Boolean values compressed (Yes/No → Y/N, true/false → T/F,
  enabled/disabled → Y/N).
- Trailing periods stripped from prose and list items.
- Negation (no, not, never) and modals (must, should, may, always) are always
  preserved.
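For tooling that consumes .llmd programmatically rather than through an LLM, the attribute rules in the guide can be sketched as a small parser. This is an illustration only: `parse_attributes` is a hypothetical name, values are assumed to contain no spaces, and a `:_pfx=` is assumed to stay in effect until the next `:_pfx=` line, which the guide does not spell out.

```python
def parse_attributes(lines):
    """Parse ':k=v k2=v2' lines into a dict, restoring :_pfx= prefixes."""
    attrs = {}
    prefix = ""
    for line in lines:
        # Only single-colon attribute lines; '::' opens a literal block.
        if not line.startswith(":") or line.startswith("::"):
            continue
        for pair in line[1:].split(" "):
            if "=" not in pair:
                continue
            # Split on the first '=' only; keys never contain '='.
            key, value = pair.split("=", 1)
            if key == "_pfx":
                prefix = value  # assumption: applies until replaced
                continue
            # Reserved meta-attributes (leading '_') are not prefixed.
            full_key = key if key.startswith("_") else prefix + key
            # The broken bar (U+00A6) separates multiple values.
            attrs[full_key] = value.split("¦") if "¦" in value else value
    return attrs
```

For example, `parse_attributes([":_pfx=flm-text--", ":secondary=muted"])` returns a dict whose only key is `flm-text--secondary`, matching the restoration rule described for `:_pfx=`.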

Specifications

| Document | Description |
|----------|-------------|
| LLMD Specification v0.2 | Format definition: line types, scoping model, normalization rules, compression levels |
| Compiler Design v0.2 | 6-stage pipeline architecture, table classification, prefix extraction, config reference |
| Architecture | System overview, component diagrams, file relationships |
