Dev #98 (Merged)

CLAUDE.md (6 changes: 3 additions & 3 deletions)

```diff
@@ -4,9 +4,9 @@ Vectorless is a reasoning-native document intelligence engine written in Rust.
 
 ## Principles
 
-- **Reason, don't vector.** — Every retrieval decision is an LLM decision.
-- **Model fails, we fail.** No silent degradation. No heuristic fallbacks.
-- **No thought, no answer.** Only LLM-reasoned output counts as an answer.
+- **Reason, don't vector.** Retrieval is a reasoning act, not a similarity computation.
+- **Model fails, we fail.** No heuristic fallbacks, no silent degradation.
+- **No thought, no answer.** Only reasoned output counts as an answer.
 
 ## Project Structure
 
```
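The revised "Model fails, we fail" principle rules out heuristic fallbacks. Here is a minimal Python sketch of what that means at a single routing step. It assumes a hypothetical `decide` callable standing in for the LLM call. `RoutingError` and `choose_child` are illustrative names, not part of the Vectorless API.

```python
from typing import Callable, Sequence


class RoutingError(RuntimeError):
    """Raised when the model cannot produce a usable routing decision (hypothetical)."""


def choose_child(
    query: str,
    children: Sequence[str],
    decide: Callable[[str, Sequence[str]], str],
) -> str:
    """Return the child section chosen by `decide`; never guess on its behalf.

    `decide` is a hypothetical stand-in for an LLM call that returns one child title.
    """
    try:
        choice = decide(query, children)
    except Exception as exc:
        # No keyword-overlap or "first child" fallback: surface the failure.
        raise RoutingError(f"model call failed: {exc}") from exc
    if choice not in children:
        # An answer outside the offered options also counts as a failure.
        raise RoutingError(f"model returned unknown section {choice!r}")
    return choice
```

The caller receives either a model-chosen section or an explicit error, never a silently degraded guess.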
README.md (103 changes: 92 additions & 11 deletions)

````diff
@@ -3,6 +3,7 @@
 <img src="https://vectorless.dev/img/with-title.png" alt="Vectorless" width="400">
 
 <h1>Reasoning-based Document Engine</h1>
+<h5>Reason, don't vector · Structure, not chunks · Agents, not embeddings · Exact, not synthesized</h5>
 
 [![PyPI](https://img.shields.io/pypi/v/vectorless.svg)](https://pypi.org/project/vectorless/)
 [![PyPI Downloads](https://static.pepy.tech/badge/vectorless/month)](https://pepy.tech/projects/vectorless)
@@ -13,10 +14,79 @@
 
 </div>
 
-**Reason, don't vector.**
+**Vectorless** is a reasoning-native document engine written in Rust. It compiles documents into navigable trees, then dispatches **multiple agents** to find exactly what's relevant across your **PDFs, Markdown, reports, contracts**. No embeddings, no chunking, no approximate nearest neighbors. Every retrieval is a **reasoning** act.
 
-**Vectorless** is a reasoning-based document engine with the core written in Rust. It will reason through any of your structured documents — **PDFs, Markdown, reports, contracts** — and retrieve only what's relevant. Nothing more, nothing less.
 Light up a star and shine with us! ⭐
 
+## Three Rules
+- **Reason, don't vector.** Retrieval is a reasoning act, not a similarity computation.
+- **Model fails, we fail.** No heuristic fallbacks, no silent degradation.
+- **No thought, no answer.** Only reasoned output counts as an answer.
+
+## Why Vectorless
+
+Traditional RAG systems split documents into chunks, embed them into vectors, and retrieve by similarity. Vectorless takes a different approach: it preserves document structure as a navigable tree and lets agents reason through it.
+
+| | Embedding-Based RAG | Vectorless |
+|---|---|---|
+| **Indexing** | Chunk → embed → vector store | Parse → compile → document tree |
+| **Retrieval** | Cosine similarity (approximate) | Multi-agent navigation (exact) |
+| **Structure** | Destroyed by chunking | Preserved as first-class tree |
+| **Query handling** | Keyword/similarity match | Intent classification + decomposition |
+| **Multi-hop reasoning** | Not supported | Orchestrator replans dynamically |
+| **Output** | Retrieved chunks | Original text passages, exact |
+| **Failure mode** | Silent degradation | Explicit — no reasoning, no answer |
+
+## How It Works
+
+### Four-Artifact Index Architecture
+
+When a document is indexed, the compile pipeline builds four artifacts:
+
+```
+Content Layer Navigation Layer Reasoning Index Document Card
+DocumentTree NavigationIndex ReasoningIndex DocCard
+(TreeNode) (NavEntry, ChildRoute) (topic_paths, hot_nodes) (title, overview,
+│ │ │ question hints)
+│ │ │ │
+Agent reads Agent reads every Agent's targeted Orchestrator reads
+only on cat decision round search tool (grep) for multi-doc routing
+```
+
+- **Content Layer** — The raw document tree. The agent only accesses this when reading specific paragraphs (`cat`).
+- **Navigation Layer** — Each non-leaf node stores an overview, question hints, and child routes (title + description). The agent reads this every round to decide where to go next.
+- **Reasoning Index** — Keyword-topic mappings with weights. Provides the agent's `grep` tool with structured keyword data for targeted search within a document.
+- **DocCard** — A compact document-level summary. The Orchestrator reads DocCards to decide which documents to navigate in multi-document queries, without loading full documents.
+
+This separation means the agent makes routing decisions from lightweight metadata, not by scanning full content.
+
+### Agent-Based Retrieval
+
+```
+Engine.query("What drove the revenue decline?")
+├─ Query Understanding ── intent, concepts, strategy (LLM)
+├─ Orchestrator ── analyzes query, dispatches Workers
+│ │
+│ ├─ Worker 1 ── ls → cd "Financials" → ls → cd "Revenue" → cat
+│ └─ Worker 2 ── ls → cd "Risk Factors" → grep "decline" → cat
+│ │
+│ └─ evaluate ── insufficient? → replan → dispatch new paths → loop
+└─ Fusion ── dedup, LLM-scored relevance, return with source attribution
+```
+
+Worker navigation commands:
+
+| Command | Action | Reads |
+|---------|--------|-------|
+| `ls` | List child sections | Navigation Layer (ChildRoute) |
+| `cd` | Enter a child section | Navigation Layer |
+| `cat` | Read content at current node | Content Layer (DocumentTree) |
+| `grep` | Search by keyword | Reasoning Index (topic_paths) |
+
+The Orchestrator evaluates Worker results after each round. If evidence is insufficient, it **replans** — adjusting strategy, dispatching new paths, or deepening exploration. This continues until enough evidence is collected.
+
 ## Quick Start
 
@@ -44,19 +114,30 @@ async def main():
 asyncio.run(main())
 ```
 
-## What It's For
+## Key Features
+
+- **Rust Core** — The entire engine (indexing, retrieval, agent, storage) is implemented in Rust for performance and reliability. Python SDK via PyO3 bindings and a CLI are also provided.
+- **Multi-Agent Retrieval** — Every query is handled by multiple cooperating agents: an Orchestrator plans and evaluates, Workers navigate documents. Each retrieval is a reasoning act — not a similarity score, but a sequence of LLM decisions about where to look, what to read, and when to stop.
+- **Zero Vectors** — No embedding model, no vector store, no similarity search. This eliminates a class of failure modes: wrong chunk boundaries, stale embeddings, and similarity-score false positives.
+- **Tree Navigation** — Documents are compiled into hierarchical trees that preserve the original structure — headings, sections, paragraphs, lists. Workers navigate this tree the way a human would: scan the table of contents, jump to the relevant section, read the passage.
+- **Document-Exact Output** — Returns original text passages from the source document. No synthesis, no rewriting, no hallucinated content. What you get is what was written.
+- **Multi-Document Orchestration** — Query across multiple documents with a single call. The Orchestrator dispatches Workers, evaluates evidence, and fuses results. When one document is insufficient, it replans and expands the search scope.
+- **Query Understanding** — Every query passes through LLM-based intent classification, concept extraction, and strategy selection. Complex queries are decomposed into sub-queries. The system adapts its navigation strategy based on whether the query is factual, analytical, comparative, or navigational.
+- **Checkpointable Pipeline** — The 8-stage compile pipeline writes checkpoints at each stage. If indexing is interrupted (LLM rate limit, network failure), it resumes from the last completed stage — no wasted work.
+- **Incremental Updates** — Content fingerprinting detects changes at the node level. Re-indexing a modified document only recompiles the changed sections and their dependents.
 
-Vectorless is designed for applications that need **precise** document retrieval:
+## Supported Documents
 
-- **Financial analysis** — Extract specific figures from reports, compare across filings
-- **Legal research** — Find relevant clauses, trace definitions across documents
-- **Technical documentation** — Navigate large manuals, locate specific procedures
-- **Academic research** — Cross-reference findings across papers
-- **Compliance** — Audit trails with source references for every answer
+- **PDF** — Full text extraction with page metadata
+- **Markdown** — Structure-aware parsing (headings, lists, code blocks)
 
-## Examples
+## Resources
 
-See [examples/](examples/) for complete usage patterns.
+- [Documentation](https://vectorless.dev) — Guides, architecture, API reference
+- [Rust API Docs](https://docs.rs/vectorless) — Auto-generated crate documentation
+- [PyPI](https://pypi.org/project/vectorless/) — Python package
+- [Crates.io](https://crates.io/crates/vectorless) — Rust crate
+- [Examples](examples/) — Complete usage patterns for Python and Rust
 
 ## Contributing
 
````
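In the new README, Workers navigate with `ls`, `cd`, `cat`, and `grep`, and each command reads a different index layer. Below is a minimal in-memory Python sketch of that command set. It makes several assumptions:

- `Node` is a simplified tree node with a title, a one-line description standing in for a ChildRoute, and content text.
- A plain keyword-to-path dict stands in for the Reasoning Index's `topic_paths`.
- `Node` and `Worker` are hypothetical. The real engine is written in Rust and its types differ.
- In Vectorless an LLM picks the next command. Here the caller does.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified tree node (hypothetical; not the crate's TreeNode)."""
    title: str
    description: str = ""  # stands in for a ChildRoute description
    content: str = ""      # stands in for Content Layer text
    children: list[Node] = field(default_factory=list)


class Worker:
    """Toy navigator: the caller, not an LLM, picks each command."""

    def __init__(self, root: Node, topic_paths: dict[str, list[list[str]]]):
        self.path: list[Node] = [root]  # current position, root first
        self.topic_paths = topic_paths  # keyword -> list of title paths

    @property
    def here(self) -> Node:
        return self.path[-1]

    def ls(self) -> list[tuple[str, str]]:
        # Routing metadata only (title + description); no content is read.
        return [(c.title, c.description) for c in self.here.children]

    def cd(self, title: str) -> None:
        if title == "..":
            if len(self.path) > 1:  # stay put at the root
                self.path.pop()
            return
        for child in self.here.children:
            if child.title == title:
                self.path.append(child)
                return
        raise KeyError(f"no child section named {title!r}")

    def cat(self) -> str:
        # The only command that returns document text.
        return self.here.content

    def grep(self, keyword: str) -> list[str]:
        # Looks up precomputed keyword paths instead of scanning content.
        return [" / ".join(p) for p in self.topic_paths.get(keyword.lower(), [])]


if __name__ == "__main__":
    # Placeholder document; section titles echo the README's example query.
    doc = Node("Report", children=[
        Node("Financials", "Results by period", children=[
            Node("Revenue", "Revenue by segment", content="(revenue section text)"),
        ]),
        Node("Risk Factors", "Known risks", content="(risk section text)"),
    ])
    worker = Worker(doc, {"decline": [["Risk Factors"]]})
    print(worker.ls())
    worker.cd("Financials")
    worker.cd("Revenue")
    print(worker.cat())
    print(worker.grep("decline"))
```

The split mirrors the README's point about the four artifacts. `ls` and `grep` touch only lightweight metadata, and `cat` is the only call that returns document text.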
docs/docusaurus.config.ts (2 changes: 1 addition & 1 deletion)

```diff
@@ -6,7 +6,7 @@ import type * as Preset from '@docusaurus/preset-classic';
 
 const config: Config = {
 title: 'Vectorless',
-tagline: 'Reasoning-native Document Intelligence Engine',
+tagline: 'Reasoning-based Document Engine',
 favicon: 'img/favicon.ico',
 
 future: {
```
docs/src/pages/index.module.css (126 changes: 122 additions & 4 deletions)

```diff
@@ -78,6 +78,45 @@
 flex-wrap: wrap;
 }
 
+/* ===== Three Rules ===== */
+.rulesRow {
+display: flex;
+gap: 1.5rem;
+justify-content: center;
+flex-wrap: wrap;
+max-width: 1000px;
+margin: 0 auto;
+}
+
+.ruleCard {
+flex: 1;
+min-width: 240px;
+max-width: 320px;
+background: var(--card-bg);
+border: 1px solid var(--border);
+border-radius: 16px;
+padding: 2.25rem 2rem;
+text-align: center;
+}
+
+.ruleTitle {
+font-size: 1.1rem;
+font-weight: 700;
+color: var(--primary-dark);
+margin-bottom: 0.75rem;
+font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
+}
+
+[data-theme='dark'] .ruleTitle {
+color: var(--primary);
+}
+
+.ruleDesc {
+font-size: 0.92rem;
+line-height: 1.65;
+color: var(--text-light);
+}
+
 /* GitHub Star button */
 .githubStarButton {
 display: inline-flex;
@@ -392,10 +431,72 @@
 background: #D97706;
 }
 
+/* ===== Format Pills ===== */
+.formatPills {
+display: flex;
+justify-content: center;
+gap: 0.75rem;
+margin-bottom: 2rem;
+}
+
+.formatPill {
+display: inline-flex;
+align-items: center;
+padding: 0.35rem 1rem;
+border-radius: 20px;
+font-size: 0.8rem;
+font-weight: 600;
+font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
+letter-spacing: -0.2px;
+background: var(--primary-soft);
+color: var(--primary-dark);
+border: 1px solid var(--primary);
+}
+
+[data-theme='dark'] .formatPill {
+color: var(--primary);
+}
+
+/* ===== Key Features Grid ===== */
+.featureGrid {
+display: grid;
+grid-template-columns: repeat(3, 1fr);
+gap: 1.5rem;
+max-width: 1100px;
+margin: 0 auto;
+}
+
+.featureCard {
+background: var(--card-bg);
+border: 1px solid var(--border);
+border-radius: 16px;
+padding: 2rem 1.75rem;
+transition: border-color 0.2s, box-shadow 0.2s;
+}
+
+.featureCard:hover {
+border-color: var(--primary);
+box-shadow: 0 4px 20px rgba(245, 158, 11, 0.08);
+}
+
+.featureTitle {
+font-size: 1.1rem;
+font-weight: 700;
+color: var(--text);
+margin: 0 0 0.75rem;
+}
+
+.featureDesc {
+font-size: 0.92rem;
+line-height: 1.65;
+color: var(--text-light);
+margin: 0;
+}
+
 /* ===== Navigation Theater ===== */
 .narrativeDemo {
-background: var(--code-bg);
-border: 1px solid var(--border);
+background: #161A1F;
+border: 1px solid #252A30;
 border-radius: 16px;
 padding: 2rem 2.5rem;
 max-width: 780px;
@@ -440,7 +541,7 @@
 top: 24px;
 bottom: 24px;
 width: 2px;
-background: #2A3040;
+background: #252A30;
 border-radius: 1px;
 }
 
@@ -473,7 +574,7 @@
 height: 10px;
 border-radius: 50%;
 background: var(--primary);
-border: 2px solid var(--code-bg);
+border: 2px solid #161A1F;
 z-index: 1;
 }
 
@@ -787,6 +888,10 @@
 .section {
 padding: 3.5rem 1.5rem;
 }
+
+.featureGrid {
+grid-template-columns: repeat(2, 1fr);
+}
 }
 
 @media screen and (max-width: 600px) {
@@ -826,4 +931,17 @@
 .sectionTitle {
 font-size: 1.5rem;
 }
+
+.featureGrid {
+grid-template-columns: 1fr;
+}
+
+.rulesRow {
+flex-direction: column;
+align-items: center;
+}
+
+.ruleCard {
+max-width: 100%;
+}
 }
```