diff --git a/docs/blog/2026-04-12-welcome/index.mdx b/docs/blog/2026-04-12-welcome/index.mdx index aa6e8a9..fea3e94 100644 --- a/docs/blog/2026-04-12-welcome/index.mdx +++ b/docs/blog/2026-04-12-welcome/index.mdx @@ -16,7 +16,7 @@ Traditional RAG systems rely on vector embeddings and similarity search. This ap Vectorless takes a different path: - **Hierarchical Semantic Trees** — Documents are parsed into a tree of sections, preserving structure and relationships. -- **LLM Navigation** — Queries are resolved by intelligently traversing the tree, not by comparing vectors. +- **LLM Agent Navigation** — Queries are resolved by agents that navigate the tree using commands (ls, cd, cat, find, grep), making every decision through LLM reasoning. - **Zero Infrastructure** — No vector DB, no embedding models, no similarity search. Just an LLM API key. ## Quick Start @@ -25,7 +25,7 @@ Vectorless takes a different path: ```python import asyncio -from vectorless import Engine, IndexContext +from vectorless import Engine, IndexContext, QueryContext async def main(): engine = Engine( @@ -38,7 +38,9 @@ async def main(): doc_id = result.doc_id # Query - answer = await engine.query(doc_id, "What is the total revenue?") + answer = await engine.query( + QueryContext("What is the total revenue?").with_doc_ids([doc_id]) + ) print(answer.single().content) asyncio.run(main()) @@ -69,11 +71,17 @@ async fn main() -> vectorless::Result<()> { } ``` -## What's Next? +## How It Works + +1. **Index** — Documents are parsed into hierarchical semantic trees with pre-computed navigation indexes and keyword mappings. +2. **Query** — The Orchestrator coordinates multi-document retrieval by dispatching Worker agents. Each Worker navigates the tree using commands, collects evidence, and self-evaluates sufficiency. +3. **Result** — Evidence is deduplicated, ranked by BM25 relevance, and returned as original document text. 
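To make the Result step concrete: BM25 ranks each evidence snippet by how well its terms match the query. The sketch below is a toy illustration of that scoring idea only, not the engine's actual implementation.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Toy BM25: score each evidence snippet against the query terms."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(tokenized)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            df = sum(1 for t in tokenized if term in t)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            f = tf[term]
            # Standard BM25 term saturation and length normalization
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

evidence = ["total revenue grew 12 percent", "the office moved to berlin"]
scores = bm25_scores(["revenue"], evidence)
assert scores[0] > scores[1]  # the revenue snippet ranks first
```

The real pipeline additionally deduplicates evidence before ranking and preserves the original document text verbatim.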
+ +## What's Next -- Cross-document relationship graph -- Incremental indexing with content fingerprinting -- Multi-format support (Markdown, PDF, DOCX) +- Cross-document graph-aware retrieval with score boosting +- DOCX format support +- Streaming query results with real-time progress events The project is open source under Apache-2.0. Contributions welcome! diff --git a/docs/docs/architecture.mdx b/docs/docs/architecture.mdx index 3bb37ca..c5edfe1 100644 --- a/docs/docs/architecture.mdx +++ b/docs/docs/architecture.mdx @@ -16,7 +16,7 @@ Vectorless transforms documents into hierarchical semantic trees and uses LLM-po │ ┌──────────────┐ ┌──────▼───────┐ │ Result │◀────│ Retrieval │ - │ (Answer) │ │ Pipeline │ + │ (Evidence) │ │ Pipeline │ └──────────────┘ └──────────────┘ ``` @@ -33,6 +33,7 @@ The indexing pipeline processes documents through ordered stages: | **Enhance** | 30 | Generate LLM summaries (Full, Selective, or Lazy strategy) | | **Enrich** | 40 | Calculate metadata, page ranges, resolve cross-references | | **Reasoning Index** | 45 | Build keyword-to-node mappings, synonym expansion, summary shortcuts | +| **Navigation Index** | 50 | Build NavEntry + ChildRoute data for agent navigation | | **Optimize** | 60 | Final tree optimization | Each stage is independently configurable. The pipeline supports incremental re-indexing via content fingerprinting. 
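The ordered-stage design can be sketched in a few lines. Everything here (`Stage`, `run_pipeline`) is a hypothetical illustration of stages running in ascending order, not the library's API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    order: int           # stage order number, e.g. 30, 45, 60
    run: Callable[[dict], dict]

def run_pipeline(tree: dict, stages: list[Stage]) -> dict:
    # Stages execute in ascending order; each receives the tree produced so far.
    for stage in sorted(stages, key=lambda s: s.order):
        tree = stage.run(tree)
    return tree

# Stages can be registered in any order; sorting restores the pipeline order.
stages = [
    Stage("Optimize", 60, lambda t: {**t, "optimized": True}),
    Stage("Enhance", 30, lambda t: {**t, "summaries": True}),
    Stage("Reasoning Index", 45, lambda t: {**t, "keywords": True}),
]
tree = run_pipeline({"root": "report.pdf"}, stages)
assert list(tree) == ["root", "summaries", "keywords", "optimized"]
```

Because each stage is an independent transform, individual stages can be reconfigured or skipped without touching the others.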
@@ -70,11 +71,11 @@ Engine.query() → Dispatcher → Query Understanding (LLM) → QueryPlan (intent, concepts, strategy) → Orchestrator (always — single or multi-doc) - → Analyze (LLM selects documents + tasks) + → Analyze (LLM reviews DocCards, selects documents + tasks) → Supervisor Loop: Dispatch Workers → Evaluate (LLM sufficiency check) → if insufficient → Replan (LLM) → loop - → Rerank (dedup → BM25 score → synthesis/fusion) + → Rerank (dedup → BM25 score → evidence formatting) ``` ### Query Understanding @@ -84,29 +85,57 @@ Every query first passes through LLM-based understanding: | Field | Description | |-------|-------------| | **Intent** | Factual, Analytical, Navigational, or Summary | -| **Complexity** | Simple, Moderate, or Complex | -| **Key Concepts** | LLM-extracted concepts (distinct from keywords) | | **Strategy Hint** | focused, exploratory, comparative, or summary | +| **Key Concepts** | LLM-extracted concepts (distinct from keywords) | ### Orchestrator (Supervisor) The Orchestrator is the central coordinator. It always runs — even for single-document queries. Its supervisor loop: -1. **Analyze** — LLM reviews DocCards and selects relevant documents with specific tasks +1. **Analyze** — LLM reviews DocCards (lightweight metadata) and selects relevant documents with specific tasks 2. **Dispatch** — Fan-out Workers in parallel (one per document) 3. **Evaluate** — LLM checks if collected evidence is sufficient to answer the query 4. **Replan** (if insufficient) — LLM identifies missing information and dispatches additional Workers +When the user specifies document IDs directly, the Orchestrator skips the analysis phase and dispatches Workers immediately. + ### Worker (Evidence Collector) -Each Worker navigates a single document's tree to collect evidence: +Each Worker navigates a single document's tree to collect evidence through a command-based loop: 1. **Bird's-eye** — `ls` the root for an overview -2. **Plan** — LLM generates a navigation plan -3. 
**Navigate** — Loop: LLM → command → execute → repeat (with budget)
+2. **Plan** — LLM generates a navigation plan based on keyword index hits
+3. **Navigate** — Loop: LLM selects command → execute → observe result → repeat
4. **Return** — Collected evidence only — no answer synthesis

-Workers use tree commands (`ls`, `cd`, `cat`, `grep`, `find`, `findtree`) and a `check` command for self-evaluation.
+#### Available Commands
+
+| Command | Description |
+|---------|-------------|
+| `ls` | List children at current position (with summaries and leaf counts) |
+| `cd <name>` | Enter a child node |
+| `cd ..` | Go back to parent |
+| `cat <name>` | Read node content (automatically collected as evidence) |
+| `head <name>` | Preview first N lines (does NOT collect evidence) |
+| `find <keyword>` | Search the document's ReasoningIndex for a keyword |
+| `findtree <pattern>` | Search for nodes by title pattern (case-insensitive) |
+| `grep <pattern>` | Regex search across content in current subtree |
+| `wc <name>` | Show content size (lines, words, chars) |
+| `pwd` | Show current navigation path |
+| `check` | Evaluate if collected evidence is sufficient |
+| `done` | End navigation |
+
+#### Navigation Strategy
+
+Workers prioritize keyword-based navigation over manual exploration:
+
+1. When keyword index hits are available, Workers use `find` with the exact keyword to jump directly to relevant sections
+2. Workers use `ls` when no keyword hints exist or when discovering unknown structure
+3. Workers use `findtree` when the section title pattern is known but not the exact name
+
+#### Dynamic Re-planning
+
+After a `check` command finds insufficient evidence, the Worker triggers a re-plan — the LLM generates a new navigation plan based on what's missing. This allows the Worker to adapt its strategy mid-navigation.

### Rerank Pipeline

@@ -114,11 +143,17 @@ After all Workers complete, the Orchestrator runs the final pipeline:

1. **Dedup** — Remove duplicate and low-quality evidence
2. 
**BM25 Scoring** — Rank evidence by keyword relevance -3. **Answer Generation** — LLM synthesizes or fuses evidence into a final answer +3. **Evidence Formatting** — Return original document text with source attribution + +The system returns raw evidence text — no LLM synthesis or paraphrasing. This ensures the user sees the exact document content that matches their query. + +## DocCard Catalog + +When multiple documents are indexed, Vectorless maintains a lightweight `catalog.bin` containing DocCard metadata for each document. This allows the Orchestrator to analyze and select relevant documents without loading the full document trees — a significant optimization for workspaces with many documents. ## Cross-Document Graph -When multiple documents are indexed, Vectorless automatically builds a relationship graph based on shared keywords and Jaccard similarity. This graph enables cross-document retrieval with score boosting. +When multiple documents are indexed, Vectorless automatically builds a relationship graph based on shared keywords and Jaccard similarity. The graph is constructed as a background task after each indexing operation. ## Zero Infrastructure diff --git a/docs/docs/features/cross-document-graph.mdx b/docs/docs/features/cross-document-graph.mdx index 1ac22fb..1f94073 100644 --- a/docs/docs/features/cross-document-graph.mdx +++ b/docs/docs/features/cross-document-graph.mdx @@ -23,36 +23,18 @@ Document A ←── 0.72 ──→ Document B ### Graph Building -After each indexing operation, the graph is automatically rebuilt: +After each indexing operation, the graph is automatically rebuilt as a background task: 1. Extract keyword profiles from each document's reasoning index 2. Compute pairwise Jaccard similarity 3. Create edges for document pairs exceeding the similarity threshold 4. 
Store the graph in the workspace -### Graph-Aware Retrieval +The graph builder uses keyword weights from the ReasoningIndex — keywords that appear in titles get 2.0× weight, summaries 1.5×, and content 1.0×. This ensures that structurally important keywords have more influence on the similarity calculation. -When using the cross-document strategy, the graph boosts scores for connected documents: +### Accessing the Graph -1. Search each document independently -2. Identify high-confidence results (score > 0.5) -3. For each high-confidence result, boost neighbor documents' scores -4. Re-rank the merged result set - -```python -from vectorless import Engine, QueryContext - -engine = Engine(api_key="sk-...", model="gpt-4o") - -# Query across all documents with graph boosting -result = await engine.query( - QueryContext("Compare the approaches") -) -``` - -## Accessing the Graph - -### Python +#### Python ```python graph = await engine.get_graph() @@ -67,7 +49,7 @@ if graph: print(f" → {edge.target_doc_id} (weight: {edge.weight:.2f})") ``` -### Rust +#### Rust ```rust if let Some(graph) = engine.get_graph().await? { @@ -100,3 +82,7 @@ min_keyword_jaccard: 0.1 — Minimum Jaccard similarity threshold max_keywords_per_doc: 50 — Max keywords extracted per document max_edges_per_node: 10 — Max edges per document node ``` + +## Current Status + +The graph is built and persisted during indexing. Graph-aware retrieval features (such as score boosting for connected documents) are planned for a future release. Currently, the graph serves as a relationship discovery and inspection tool accessible via the API. diff --git a/docs/docs/features/summary-strategies.mdx b/docs/docs/features/summary-strategies.mdx index a2bde66..a589a0f 100644 --- a/docs/docs/features/summary-strategies.mdx +++ b/docs/docs/features/summary-strategies.mdx @@ -4,7 +4,7 @@ sidebar_position: 1 # Summary Strategies -Summaries are critical for retrieval quality. 
The Pilot uses summaries to evaluate candidate nodes during tree navigation. Without summaries, the Pilot can only use node titles for decision-making, which significantly reduces accuracy. +Summaries are critical for retrieval quality. The Worker agent uses summaries in the NavigationIndex to decide which branches to explore and where to navigate. Without summaries, the Worker can only use node titles for decision-making, which significantly reduces accuracy. ## Available Strategies @@ -32,7 +32,7 @@ let strategy = SummaryStrategy::selective(100, true); - `min_tokens` — Minimum content tokens to generate a summary (default: 100) - `branch_only` — Only generate for non-leaf nodes (default: true) -**Trade-off**: Lower indexing cost, but leaf nodes lack summaries. The Pilot falls back to title-only evaluation at leaf level. +**Trade-off**: Lower indexing cost, but leaf nodes lack summaries. The Worker falls back to title-only evaluation at leaf level. ### Lazy @@ -56,10 +56,23 @@ let strategy = SummaryStrategy::lazy(true); ## How Summaries Are Used -During retrieval, the Pilot builds context for each candidate node: +During retrieval, the Worker agent reads summary data from the NavigationIndex at each decision point: -1. **Title** — Always available, highest priority signal -2. **Summary** — Used for semantic evaluation at fork points -3. **Content** — Used for BM25 scoring and final result +1. **`ls` output** — Child nodes show their descriptions (derived from summaries) and leaf counts +2. **Navigation decisions** — The LLM evaluates summaries to decide which branch to enter +3. **Keyword index** — Topic tags from summaries are indexed for `find` command lookups +4. **DocCards** — Root-level summaries power the Orchestrator's document selection -When a node has no summary, the Pilot's decision quality degrades. This is why **Full** is the default — it ensures the Pilot always has summaries to work with. 
+When a node has no summary, the Worker's navigation quality degrades. This is why **Full** is the default — it ensures the Worker always has summary context to work with. + +## Navigation-Oriented Summaries + +Branch nodes receive structured summaries with three components: + +| Component | Purpose | Used By | +|-----------|---------|---------| +| **OVERVIEW** | 2-3 sentence routing summary | Worker's `ls` output | +| **QUESTIONS** | 3-5 typical questions this branch can answer | Keyword index | +| **TAGS** | 2-4 topic keywords | ReasoningIndex `find` command | + +This structured format enables the Worker to quickly assess whether a branch is worth exploring without reading its full content. diff --git a/docs/docs/getting-started.mdx b/docs/docs/getting-started.mdx index 14f541a..6708b6d 100644 --- a/docs/docs/getting-started.mdx +++ b/docs/docs/getting-started.mdx @@ -84,10 +84,7 @@ async fn main() -> vectorless::Result<()> { let result = engine.query( QueryContext::new("What is the total revenue?").with_doc_ids(vec![doc_id.to_string()]) ).await?; - - if let Some(item) = result.single() { - println!("{}", item.content); - } + println!("{}", result.content); Ok(()) } @@ -97,4 +94,4 @@ async fn main() -> vectorless::Result<()> { - [Architecture](/docs/architecture) — Understand the indexing and retrieval pipeline - [Indexing Overview](/docs/indexing/overview) — Learn about each pipeline stage -- [Retrieval Strategies](/docs/retrieval/strategies) — Choose the right strategy for your use case +- [Retrieval Strategies](/docs/retrieval/strategies) — Understand how queries are processed diff --git a/docs/docs/intro.mdx b/docs/docs/intro.mdx index fcaed05..beb3c30 100644 --- a/docs/docs/intro.mdx +++ b/docs/docs/intro.mdx @@ -12,7 +12,7 @@ It transforms documents into hierarchical semantic trees and uses LLMs to naviga 1. **Parse** — Documents (Markdown, PDF) are parsed into hierarchical semantic trees, preserving structure and relationships between sections. 2. 
**Index** — Trees are stored with metadata, keywords, and summaries. The pipeline resolves cross-references ("see Section 2.1") and expands keywords with LLM-generated synonyms for improved recall. Incremental indexing skips unchanged files via content fingerprinting. -3. **Query** — An LLM navigates the tree to find the most relevant sections. Multiple search algorithms (Beam Search, MCTS, Greedy) are available, and the Pilot component provides LLM-guided navigation at key decision points. +3. **Query** — An LLM-powered agent navigates the tree to find the most relevant sections. The Orchestrator coordinates multi-document queries, dispatching Workers that use `ls`, `cd`, `cat`, `find`, and `grep` commands to explore the tree and collect evidence. ## Quick Start @@ -24,7 +24,7 @@ pip install vectorless ```python import asyncio -from vectorless import Engine, IndexContext +from vectorless import Engine, IndexContext, QueryContext async def main(): engine = Engine( @@ -35,7 +35,9 @@ async def main(): result = await engine.index(IndexContext.from_path("./report.pdf")) doc_id = result.doc_id - answer = await engine.query(doc_id, "What is the total revenue?") + answer = await engine.query( + QueryContext("What is the total revenue?").with_doc_ids([doc_id]) + ) print(answer.single().content) asyncio.run(main()) @@ -74,11 +76,12 @@ async fn main() -> vectorless::Result<()> { ## Features - **Hierarchical Semantic Trees** — Preserves document structure, not flat chunks -- **LLM-Powered Retrieval** — Structural reasoning over the tree, not vector similarity -- **Cross-Reference Navigation** — Automatically resolves "see Section 2.1", "Appendix G" references and follows them during retrieval +- **LLM-Powered Agent Navigation** — Worker agents navigate the tree using commands (ls, cd, cat, find, grep), making every retrieval decision through LLM reasoning +- **Cross-Reference Resolution** — Automatically resolves "see Section 2.1", "Appendix G" references during indexing - 
**Synonym Expansion** — LLM-generated synonyms for indexed keywords improve recall for differently-worded queries -- **Multi-Algorithm Search** — Beam Search, MCTS, Greedy, and ToC Navigator with LLM Pilot guidance +- **Orchestrator Supervisor Loop** — Multi-document queries are coordinated by an LLM supervisor that dispatches Workers, evaluates evidence, and replans when needed - **Cross-Document Graph** — Automatic relationship discovery between documents via shared keywords - **Incremental Indexing** — Content fingerprinting skips unchanged files +- **DocCard Catalog** — Lightweight document metadata index enables fast multi-document analysis without loading full documents - **Multi-Format** — Markdown and PDF support - **Zero Infrastructure** — No vector DB, no embedding models, just an LLM API key diff --git a/docs/docs/retrieval/overview.mdx b/docs/docs/retrieval/overview.mdx index 7e38160..79ca9ab 100644 --- a/docs/docs/retrieval/overview.mdx +++ b/docs/docs/retrieval/overview.mdx @@ -9,43 +9,53 @@ The retrieval pipeline transforms a user query into relevant document content by ## Pipeline Phases ```text -Query ──▶ Analyze ──▶ Plan ──▶ Search ──▶ Evaluate ──▶ Result - │ │ │ │ - ▼ ▼ ▼ ▼ - Keywords Strategy Algorithm Score & - Complexity Selection Execution Dedup +Query ──▶ Understand ──▶ Orchestrate ──▶ Navigate ──▶ Evaluate ──▶ Result + │ │ │ │ + ▼ ▼ ▼ ▼ + QueryPlan DocCards + ls/cd/cat/ Evidence + Intent + Dispatch find/grep sufficiency + Concepts Workers check ``` -### Analyze +### Understand -- Extract keywords from the query -- Detect query complexity (simple keyword match vs. 
multi-hop reasoning)
-- Decompose complex queries into sub-queries when needed
+- LLM analyzes the query to extract intent, key concepts, and strategy hints
+- Produces a `QueryPlan` that guides the entire retrieval process
+- Extracts BM25 keywords for index lookup and evidence scoring

-### Plan
+### Orchestrate

-- Select the retrieval strategy (Keyword, LLM, Hybrid, Cross-Document)
-- Select the search algorithm (Beam Search, MCTS, Pure Pilot)
-- Configure beam width, depth limits, and iteration budgets
+- LLM reviews DocCard metadata (lightweight, no full document loading)
+- Selects relevant documents and assigns specific tasks to each
+- When the user specifies doc_ids directly, skips analysis and dispatches immediately
+- Fans out Workers in parallel — one per document

-### Search
+### Navigate

-- Use the ToC Navigator to locate relevant subtrees
-- Execute the selected search algorithm from located subtrees
-- The Pilot provides LLM guidance at key decision points
+Each Worker navigates its assigned document through a command loop:
+
+1. **Bird's-eye** — `ls` at root to see the document structure
+2. **Plan** — LLM generates a navigation plan using keyword index hits
+3. **Command loop** — LLM picks a command (`ls`, `cd`, `cat`, `find`, `grep`, etc.), executes it, observes the result, and repeats
+4. 
**Collect evidence** — `cat` automatically saves node content as evidence ### Evaluate -- Score and rank candidate nodes -- Deduplicate overlapping results -- Aggregate content within the token budget +- Workers can self-evaluate with `check` — an LLM assesses evidence sufficiency +- Orchestrator evaluates overall evidence across all Workers +- If insufficient, the Orchestrator triggers a replan and dispatches additional Workers + +### Result + +- Deduplicate and rank collected evidence by BM25 relevance score +- Return original document text with source attribution — no LLM synthesis ## Quick Selection Guide -| Use Case | Strategy | Algorithm | -|----------|----------|-----------| -| Simple keyword lookup | Keyword | Beam Search | -| Complex question requiring reasoning | Hybrid | Beam Search | -| Multi-hop reasoning across sections | LLM | MCTS | -| Multiple documents | Cross-Document | Beam Search | -| Fast overview of document | Keyword | ToC Navigator | +| Use Case | Flow | +|----------|------| +| Single document, specific question | Worker dispatched directly → navigate → collect evidence | +| Single document, broad exploration | Worker with navigation plan → multi-round exploration | +| Multiple documents | Orchestrator analyzes DocCards → dispatches Workers per document | +| Workspace-wide query | Orchestrator reviews all DocCards → selects relevant documents | +| Specified doc_ids | Skip Orchestrator analysis → direct Worker dispatch | diff --git a/docs/docs/retrieval/search-algorithms.mdx b/docs/docs/retrieval/search-algorithms.mdx index a0f4524..5482133 100644 --- a/docs/docs/retrieval/search-algorithms.mdx +++ b/docs/docs/retrieval/search-algorithms.mdx @@ -2,78 +2,118 @@ sidebar_position: 3 --- -# Search Algorithms +# Worker Navigation Commands -Search algorithms determine how the tree is traversed during retrieval. Each algorithm has different trade-offs between accuracy, speed, and token cost. 
+Workers navigate document trees using a set of commands that mimic filesystem operations. Each command is selected by the LLM based on the current context, collected evidence, and navigation plan.

-## Algorithm Overview
+## Command Overview

-| Algorithm | Paths Explored | Backtracking | Token Cost | Use Case |
-|-----------|---------------|-------------|------------|----------|
-| **Beam Search** | Top-K (beam width) | Yes | Medium | General purpose |
-| **MCTS** | Statistical sampling | Yes | High | Complex multi-hop |
-| **Pure Pilot** | Single best path | No | High | High-accuracy single-path |
-| **ToC Navigator** | ToC-guided | No | Low | Broad overview queries |
+| Command | Purpose | Collects Evidence |
+|---------|---------|-------------------|
+| **`ls`** | List children at current position | No |
+| **`cd <name>`** | Navigate into a child node | No |
+| **`cd ..`** | Navigate back to parent | No |
+| **`cat <name>`** | Read node content | **Yes** |
+| **`head <name>`** | Preview first N lines | No |
+| **`find <keyword>`** | Search ReasoningIndex | No |
+| **`findtree <pattern>`** | Search by title pattern | No |
+| **`grep <pattern>`** | Regex search subtree content | No |
+| **`wc <name>`** | Show content size | No |
+| **`pwd`** | Show current path | No |
+| **`check`** | Evaluate evidence sufficiency | No |
+| **`done`** | End navigation | No |

-## Beam Search
+## Navigation Strategy

-Explores multiple paths simultaneously, keeping the top-K candidates at each level. Supports backtracking when paths yield low scores.
+Workers follow a priority-ordered strategy for efficient navigation:
+
+### 1. 
Keyword-First (Preferred) + +When the ReasoningIndex has keyword matches for the query, Workers use `find` to jump directly to relevant sections: ```text -Root -├── A (score: 0.9) ──▶ explore children -├── B (score: 0.7) ──▶ explore children -└── C (score: 0.3) ──▶ discard (below beam width) +Keyword matches available: + 'revenue' → root/Financial Statements/Revenue (weight 0.85) + +Worker: find revenue +Result: Found in "Revenue" section at depth 2 +Worker: cd "Financial Statements" +Worker: cd Revenue +Worker: cat . ``` -- **Beam width** controls how many paths are kept (default: 3) -- **Fallback stack** preserves truncated paths for backtracking -- Pilot weight: 0.7 (blended with NodeScorer at 0.3) +This avoids manual tree traversal and is the fastest path to relevant content. -This is the **recommended algorithm** for most use cases. +### 2. Manual Exploration -## MCTS (Monte Carlo Tree Search) - -Uses Upper Confidence Bound for Trees (UCT) to balance exploration vs. exploitation. Runs multiple simulations to statistically identify the best path. +When no keyword hints are available, Workers explore the tree manually: ```text - Root (100 visits) - / \ - A (60v) B (40v) - / \ \ - C (35v) D (25v) E (40v) +Worker: ls +Result: [1] Introduction (3 leaves), [2] Architecture (5 leaves), [3] Performance (4 leaves) +Worker: cd Architecture +Worker: ls +Result: [1] Overview (2 leaves), [2] Components (3 leaves) ``` -- Best for complex queries requiring multi-hop reasoning -- Pilot provides priors for UCT selection -- Higher iteration count improves accuracy +### 3. Title Search + +When the section name is known but not the exact path: -## Pure Pilot +```text +Worker: findtree performance +Result: Matches: "Performance" (depth 1), "Performance Metrics" (depth 2) +``` -Greedy single-path search where the Pilot picks the best child at each level. The most token-expensive approach since it makes an LLM call at every tree level. 
+## Target Resolution -- Pilot weight: 1.0 (no algorithm fallback) -- Best for queries where the correct path is unambiguous -- Fast for shallow trees, expensive for deep ones +When a Worker issues `cd`, `cat`, or `head` with a target name, the system resolves it using multi-level matching: -## ToC Navigator +| Priority | Match Type | Example | +|----------|-----------|---------| +| 1 | Exact title match | `"Revenue"` → Revenue | +| 2 | Case-insensitive match | `"revenue"` → Revenue | +| 3 | Substring (contains) match | `"rev"` → Revenue | +| 4 | Numeric index | `"1"` → first child | -Uses the document's table of contents to locate relevant top-level sections before running a search algorithm. Works in two modes: +## Evidence Collection -- **Keyword mode** — Matches query keywords against ToC entries -- **LLM mode** — Sends the ToC to the LLM for semantic matching +The `cat` command is the primary evidence collection mechanism: -The ToC Navigator is typically the first step in the search pipeline, narrowing the search space before running Beam Search or MCTS within the located subtree. +- `cat ` — Read a child node's content and save as evidence +- `cat` (no argument) — Read the current node's content (useful at leaf nodes) +- Evidence is automatically deduplicated — the Worker tracks visited nodes and avoids re-reading -## Configuration +## Self-Evaluation -```python -from vectorless import QueryContext +The `check` command triggers an LLM-based sufficiency evaluation: -ctx = ( - QueryContext("complex multi-hop question") - .with_doc_ids([doc_id]) - .with_depth_limit(10) # Max tree traversal depth - .with_max_tokens(4000) # Max tokens in result -) +```text +Worker: check +LLM evaluates: "Is the collected evidence sufficient to answer the query?" +Response: SUFFICIENT — evidence contains revenue figures matching the query +Worker: done ``` + +If evidence is insufficient, the response includes what's missing, triggering a dynamic re-plan. 
+ +## Dynamic Re-planning + +After `check` finds insufficient evidence, the Worker generates a new navigation plan: + +1. LLM reviews the query, current evidence, and missing information +2. Generates an updated navigation plan targeting the gaps +3. Worker follows the new plan in subsequent rounds + +This allows Workers to adapt their strategy when initial plans don't yield sufficient results. + +## Budget Controls + +Workers operate within configurable budgets: + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `max_rounds` | 8 | Maximum navigation rounds | +| `max_llm_calls` | 12 | Maximum LLM calls per Worker | + +These prevent runaway navigation loops while giving Workers enough room for multi-hop exploration. diff --git a/docs/docs/retrieval/strategies.mdx b/docs/docs/retrieval/strategies.mdx index e718220..fc4e1d4 100644 --- a/docs/docs/retrieval/strategies.mdx +++ b/docs/docs/retrieval/strategies.mdx @@ -4,70 +4,84 @@ sidebar_position: 2 # Retrieval Strategies -Vectorless provides five retrieval strategies, each designed for different query types and accuracy/speed trade-offs. +Vectorless uses a unified agent-based retrieval approach where the Orchestrator and Workers coordinate through LLM reasoning. The strategy adapts automatically based on query scope and complexity. 
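That coordination follows a supervisor pattern: analyze, dispatch Workers in parallel, evaluate, replan. The sketch below illustrates the loop with stub functions standing in for the LLM calls; none of these names are part of the Vectorless API:

```python
import asyncio

async def supervisor_loop(query, doc_cards, dispatch, evaluate, replan, max_rounds=3):
    """Toy analyze -> dispatch -> evaluate -> replan loop."""
    tasks = [(card, query) for card in doc_cards]  # stand-in for LLM analysis
    evidence = []
    for _ in range(max_rounds):
        # Fan out one Worker per document, in parallel
        results = await asyncio.gather(*(dispatch(c, t) for c, t in tasks))
        evidence.extend(item for batch in results for item in batch)
        if evaluate(query, evidence):      # sufficiency check
            break
        tasks = replan(query, evidence, doc_cards)  # target the gaps
        if not tasks:
            break
    return evidence

async def fake_worker(card, task):
    return [f"evidence from {card}"]

evidence = asyncio.run(supervisor_loop(
    "compare revenue",
    ["report_2023", "report_2024"],
    dispatch=fake_worker,
    evaluate=lambda q, ev: len(ev) >= 2,
    replan=lambda q, ev, cards: [],
))
assert len(evidence) == 2
```

The same loop covers every mode below; what changes is only how the initial task list is produced.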
## Strategy Overview -| Strategy | LLM Calls | Speed | Accuracy | Best For | -|----------|-----------|-------|----------|----------| -| **Keyword** | 0 (search) | Fastest | Good | Simple keyword matches | -| **LLM** | High | Slowest | Best | Complex reasoning queries | -| **Hybrid** | Medium | Medium | High | General purpose (recommended) | -| **Cross-Document** | Varies | Medium | High | Multi-document queries | -| **Page Range** | 0 (search) | Fast | Good | PDF page-scoped queries | +| Mode | When | LLM Calls | Description | +|------|------|-----------|-------------| +| **Direct Dispatch** | User specifies doc_ids | Medium | Skip Orchestrator analysis, dispatch Worker directly | +| **Single-Document** | One relevant document found | Medium | Orchestrator analyzes, dispatches one Worker | +| **Multi-Document** | Multiple relevant documents | High | Orchestrator selects docs, dispatches parallel Workers | +| **Workspace** | No scope specified | High | Orchestrator reviews all DocCards, selects relevant docs | -## Keyword Strategy +## Direct Dispatch -Fast TF-IDF/BM25 matching against the pre-computed reasoning index. No LLM calls during search. +When the user specifies document IDs in the query context, the system skips Orchestrator analysis and dispatches Workers directly to each specified document. ```python from vectorless import QueryContext -ctx = QueryContext("revenue").with_doc_ids([doc_id]) +answer = await engine.query( + QueryContext("What is the total revenue?").with_doc_ids([doc_id]) +) ``` -Use when: -- The query contains exact terms from the document -- Speed is the priority -- You want zero additional LLM token cost +This is the fastest path — no DocCard analysis LLM call, just direct navigation. -## LLM Strategy +## Single-Document Retrieval -LLM-powered tree navigation with full contextual understanding. The LLM sees the table of contents, node summaries, and makes navigation decisions at each level. 
+The Orchestrator analyzes the query and available DocCards, determines one document is relevant, and dispatches a single Worker to navigate it. ```python -ctx = QueryContext("Explain the relationship between architecture and performance").with_doc_ids([doc_id]) +answer = await engine.query( + QueryContext("What are the growth trends?").with_doc_ids([doc_id]) +) ``` -Use when: -- The query requires multi-hop reasoning -- Synonyms or paraphrases are likely -- Accuracy is more important than speed +The Worker follows a plan-navigate-evaluate loop: +1. Generate a navigation plan from keyword index hits +2. Navigate the tree using commands (ls, cd, cat, find, grep) +3. Self-evaluate evidence sufficiency with `check` +4. Return collected evidence -## Hybrid Strategy (Recommended) +## Multi-Document Retrieval -Two-phase retrieval: BM25 pre-filter followed by LLM refinement. Combines the speed of keyword matching with the accuracy of LLM reasoning. +The Orchestrator identifies multiple relevant documents and dispatches Workers in parallel — one per document, each with a specific sub-task. ```python -ctx = QueryContext("What are the growth trends?").with_doc_ids([doc_id]) +answer = await engine.query( + QueryContext("Compare quarterly revenue across reports").with_doc_ids([doc_id_1, doc_id_2]) +) ``` -The recommended default for most queries. Fast pre-filtering reduces the number of nodes sent to the LLM, keeping token costs manageable while maintaining high accuracy. +The Orchestrator's supervisor loop: +1. **Analyze** — LLM reviews DocCards and creates a dispatch plan with per-document tasks +2. **Dispatch** — Fan-out Workers in parallel +3. **Evaluate** — LLM checks if combined evidence is sufficient +4. **Replan** (if insufficient) — LLM identifies gaps and dispatches additional Workers -## Cross-Document Strategy +## Workspace Retrieval -Searches across multiple indexed documents and aggregates results. Uses the cross-document relationship graph for score boosting. 
+When no document scope is specified, the Orchestrator reviews all indexed documents via the lightweight DocCard catalog. ```python -ctx = QueryContext("Compare the architectures") +answer = await engine.query( + QueryContext("What documents discuss performance?") +) ``` -When a high-confidence result is found in one document, neighbor documents in the graph receive a score boost, surfacing related content across the workspace. +The DocCard catalog (`catalog.bin`) stores lightweight metadata for each document — enabling fast analysis without loading full document trees. This is critical for workspaces with many documents. -## Auto Selection +## Query Understanding -By default, the engine analyzes query complexity and automatically selects the appropriate strategy: +Every query (regardless of mode) passes through LLM-based understanding that produces a `QueryPlan`: -- Simple keyword queries → Keyword strategy -- Complex reasoning queries → Hybrid strategy -- Multi-document scope → Cross-Document strategy +| Field | Description | +|-------|-------------| +| **Intent** | Factual, Analytical, Navigational, or Summary | +| **Strategy Hint** | focused, exploratory, comparative, or summary | +| **Key Concepts** | LLM-extracted concepts from the query | +| **Keywords** | BM25 keywords for index lookup and evidence scoring | + +The QueryPlan guides Worker navigation — for example, a "factual" intent with a "focused" strategy hint tells the Worker to look for a specific answer rather than exploring broadly.
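To illustrate how such a plan might steer a Worker, here is a hypothetical sketch; the fields mirror the table above, but the actual `QueryPlan` type and any budget logic are assumptions, not the library's implementation:

```python
from dataclasses import dataclass

@dataclass
class QueryPlan:
    intent: str             # Factual | Analytical | Navigational | Summary
    strategy_hint: str      # focused | exploratory | comparative | summary
    key_concepts: list[str]
    keywords: list[str]     # BM25 keywords for index lookup

def navigation_budget(plan: QueryPlan) -> int:
    # A focused factual lookup needs few rounds; exploration needs more.
    if plan.strategy_hint == "focused":
        return 3
    if plan.strategy_hint in ("exploratory", "comparative"):
        return 8
    return 5

plan = QueryPlan("Factual", "focused", ["total revenue"], ["revenue", "total"])
assert navigation_budget(plan) == 3
```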