
Evolution

deepelement.ai edited this page May 9, 2026 · 2 revisions

🧬 ClawCode Evolution — User Guide

From first run to seasoned expert — how ClawCode learns, adapts, and gets better over time.



Overview — How ClawCode Evolves

Most AI coding tools are stateless — each session starts from scratch. ClawCode is different. It implements a closed-loop learning system that:

  1. Observes what happens during coding sessions (tool calls, successes, failures)
  2. Analyzes patterns in those observations to identify reusable knowledge
  3. Structures knowledge into portable, versioned artifacts (capsules)
  4. Applies past experience to new tasks automatically
  5. Refines knowledge through feedback, decay, and quality gates
  6. Promotes proven skills through canary experiments

The entire evolution system is built on the principle that an AI assistant should get smarter the more you use it, without requiring manual retraining or external model fine-tuning.

┌─────────────────────────────────────────────────────────────────┐
│                      The Learning Loop                           │
│                                                                 │
│  ┌─────────┐    ┌──────────┐    ┌───────────┐    ┌───────────┐  │
│  │Observe  │───▶│  Analyze  │───▶│  Evolve   │───▶│  Apply    │  │
│  │Record   │    │ Cluster   │    │  Promote  │    │  Reuse    │  │
│  │what     │    │ Score     │    │  Canary   │    │  Feedback │  │
│  │happens  │    │ Quality   │    │  Gates    │    │  Decay    │  │
│  └─────────┘    └──────────┘    └───────────┘    └───────────┘  │
│       │                                                  │       │
│       └──────────────────────────────────────────────────┘       │
│                          (feedback loop)                          │
└─────────────────────────────────────────────────────────────────┘

Three-Layer Learning Architecture

ClawCode's learning system has three layers, each with a different role in the evolution process:

┌─────────────────────────────────────────────────────────────┐
│  Layer 3: TECAP  —  Team Experience Capsules                │
│  Multi-role collaboration patterns, handoff contracts       │
├─────────────────────────────────────────────────────────────┤
│  Layer 2: ECAP   —  Experience Capsules                     │
│  Individual problem-solving knowledge (structured traces)   │
├─────────────────────────────────────────────────────────────┤
│  Layer 1: Instinct —  Behavioral Rules & Preferences        │
│  Atomic preferences, formatting rules, coding style         │
└─────────────────────────────────────────────────────────────┘

Layer 1: Instinct — The Rule Layer

Instincts are the atomic building blocks of ClawCode's learning system. They capture:

| What Instincts Store | Examples |
|---|---|
| Formatting preferences | "Use PEP 8 style for Python" |
| Coding conventions | "Prefer type hints in function signatures" |
| Behavioral rules | "Always run tests after making changes" |
| Tool usage preferences | "Use grep before glob for content search" |
| Domain-specific patterns | "React components should be arrow functions" |

Properties:

| Property | Description |
|---|---|
| Persistence | Saved to `.clawcode/learning/instincts/` |
| Categories | `personal/` (user-specific) vs `inherited/` (from prior sessions) |
| Scoring | Each instinct has a confidence score (0.0–1.0) |
| Decay | Confidence decays at 3% per day unless reinforced |
| Validation | Semantic conflict detection prevents contradictory instincts |

Lifecycle:

  1. Observations are consumed from the append-only observations.jsonl log
  2. Analyzer clusters observations by pattern type
  3. New instincts are born from recurring patterns
  4. Existing instincts are reinforced or faded based on success/failure feedback
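The observe stage above boils down to appending one JSON object per event to the log. A minimal Python sketch, assuming hypothetical field names (`event_type`, `payload`) since the exact observation schema is not documented in this guide:

```python
import io
import json

def append_observation(log, event_type, payload):
    # One observation = one JSON line; the log is append-only,
    # so past observations are never rewritten.
    record = {"event_type": event_type, "payload": payload}
    log.write(json.dumps(record) + "\n")

# A real run would use open(".clawcode/learning/observations.jsonl", "a");
# an in-memory buffer stands in for the file here.
buf = io.StringIO()
append_observation(buf, "tool_run", {"tool": "grep", "success": True})
append_observation(buf, "edit", {"file": "auth.py", "success": True})
lines = buf.getvalue().splitlines()
```

Because each line is an independent JSON document, the analyzer can consume the log incrementally without parsing the whole file.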

Layer 2: ECAP — The Experience Layer

ECAP (Experience Capsule) is the core unit of individual problem-solving knowledge. Unlike instincts (which are atomic rules), ECAPs capture structured problem-solving workflows.

ECAP v2 Structure

| Section | Fields | Purpose |
|---|---|---|
| Identity | schema_version, ecap_id, title, problem_type | Unique capsule identification |
| context | repo_fingerprint, language_stack, constraints | Problem context and environment |
| model_profile | source_provider, source_model, tool_budget, capability_profile | Which model solved this, with what tools |
| solution_trace | steps[], tool_sequence[], decision_rationale_summary | Core: structured step-by-step solution |
| outcome | result (success/partial/fail), verification[], risk_left[] | What happened and what's left |
| transfer | applicability_conditions[], anti_patterns[], model_migration_rules[] | When and how to reuse this experience |
| links | related_instinct_ids[], related_files[] | Connections to other knowledge |
| governance | privacy_level, redaction_applied, feedback_score, deprecated | Privacy, feedback, lifecycle |

Solution Trace (ECAP v2 Core)

Each step in the solution trace is a structured ExperienceStep:

| Field | Description |
|---|---|
| step_type | tool_run, edit, verify, or decision |
| summary | Human-readable step description |
| tool_name | Which tool was used |
| params_summary | Sanitized parameter summary |
| pre_conditions | What must be true before this step |
| expected_effect | What this step should accomplish |
| confidence_delta | Impact on overall experience confidence |

This structured approach means ECAPs are not just logs — they're portable problem-solving recipes that can be applied to new situations.
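As a rough illustration, an ExperienceStep maps naturally onto a small dataclass. The field names follow the table above; the types and defaults are assumptions, not ClawCode's actual definitions:

```python
from dataclasses import dataclass, field

@dataclass
class ExperienceStep:
    step_type: str                 # tool_run | edit | verify | decision
    summary: str                   # human-readable step description
    tool_name: str = ""            # which tool was used, if any
    params_summary: str = ""       # sanitized parameter summary
    pre_conditions: list = field(default_factory=list)
    expected_effect: str = ""      # what this step should accomplish
    confidence_delta: float = 0.0  # impact on overall confidence

step = ExperienceStep(
    step_type="tool_run",
    summary="Locate token-expiry handling",
    tool_name="grep",
    confidence_delta=0.02,
)
```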

Problem Types

ECAPs are categorized by problem_type:

| Type | Use Case |
|---|---|
| debug | Bug investigation and fix |
| review | Code review and quality assessment |
| refactor | Code restructuring without behavioral change |
| test | Test creation and validation |
| general | Any other problem-solving scenario |

User Commands

| Command | Description |
|---|---|
| `/experience-create` | Create an experience capsule from recent observations |
| `/experience-create --problem-type debug` | Create with a specific problem type |
| `/experience-status` | View current experience inventory |
| `/experience-status --json` | View as machine-readable JSON |
| `/experience-apply ecap-xxxx --mode concise` | Apply a specific capsule to current task |
| `/experience-apply --problem-type debug --top-k 1` | Apply the best matching capsule automatically |
| `/experience-feedback ecap-xxxx --result success --score 0.9` | Rate the experience effectiveness |
| `/experience-export ecap-xxxx --format json` | Export for sharing or backup |
| `/experience-import ./ecap.json` | Import experience from file or URL |

Layer 3: TECAP — The Team Experience Layer

TECAP (Team Experience Capsule) extends ECAP to capture multi-agent collaboration patterns — how teams of AI agents work together, hand off work, and converge on solutions.

What TECAP Captures That ECAP Cannot

| Dimension | TECAP | ECAP |
|---|---|---|
| Role coordination | Which roles participated and when | Single agent perspective |
| Handoff contracts | Formal input/output agreements between roles | N/A |
| Team topology | Communication graph and dependencies | N/A |
| Coordination metrics | Handoff success rate, rework ratio, cycle time | N/A |
| Decision log | Key team decisions with rationale | Individual decisions only |
| Evidence references | Provenance for team conclusions | Individual evidence only |

TECAP v2 Structure

| Section | Fields |
|---|---|
| Identity | schema_version, tecap_id, title, problem_type |
| team_context | objective, constraints, repo_fingerprint, participants |
| team_topology | Role graph (edges between collaborating roles) |
| participants | TeamParticipant[] with agent_id, agent_role, responsibility |
| collaboration_trace | TeamStep[] with owner_agent, step_type, handoff_to, dependencies |
| handoff_contracts | from_role → to_role with input/output contracts and acceptance criteria |
| decision_log | Key team decisions with timestamps and rationale |
| coordination_metrics | handoff_success_rate, rework_ratio, escalation_count, cycle_time |
| iteration_records | Per-iteration gap tracking for deep loop workflows |
| outcome | Result, verification, risk, delivery metrics |
| team_experience_fn | Weighted scoring: delivery quality (35%), cycle time (25%), rework (20%), escalation (20%) |
| quality_gates | Acceptance criteria for the team process |
| transfer | applicability_conditions[], team_migration_hints[] |
| governance | Privacy, feedback, deprecation flags |

Team Coordination Metrics

| Metric | Description | Ideal Value |
|---|---|---|
| handoff_success_rate | % of role handoffs completed without rework | ≥ 0.85 |
| rework_ratio | % of work that needed to be redone | ≤ 0.15 |
| escalation_count | Number of times the team had to escalate to higher-level reasoning | ≤ 2 |
| cycle_time | Total time from start to converged solution | Task-dependent |
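Combining these metrics with the team_experience_fn weights listed earlier (delivery quality 35%, cycle time 25%, rework 20%, escalation 20%) might look like the sketch below. How ClawCode normalizes cycle time and escalation counts into 0–1 scores is not specified, so already-normalized inputs (1.0 = best) are assumed:

```python
def team_experience_score(delivery_quality, cycle_time_score,
                          rework_ratio, escalation_score):
    # Weights follow the team_experience_fn breakdown in this guide;
    # rework_ratio is inverted because lower rework is better.
    return (0.35 * delivery_quality
            + 0.25 * cycle_time_score
            + 0.20 * (1.0 - rework_ratio)
            + 0.20 * escalation_score)

# Inputs loosely mirror the Day 15 walkthrough (12% rework).
score = team_experience_score(0.9, 0.8, 0.12, 0.75)
```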

User Commands

| Command | Description |
|---|---|
| `/team-experience-create` | Create a team experience capsule |
| `/team-experience-status` | View team experience inventory |
| `/team-experience-apply --strategy conservative` | Apply team experience (conservative/balanced/aggressive) |
| `/team-experience-apply --explain` | Show why this capsule was selected |
| `/team-experience-apply --top-k 3` | Apply top 3 matching capsules |
| `/team-experience-export tecap-xxxx --v1-compatible` | Export in v1 format for legacy consumers |
| `/team-experience-import ./tecap.json` | Import team experience |
| `/team-experience-feedback tecap-xxxx --result success --score 0.85` | Rate team experience effectiveness |

ClawMemory — Persistent Cross-Session Memory

ClawMemory provides persistent, curated memory that survives across sessions. Unlike ECAP/TECAP (which capture problem-solving patterns), ClawMemory stores factual knowledge and user preferences.

Two Memory Types

| Memory | File | Purpose |
|---|---|---|
| Memory | `MEMORY.md` | System knowledge: project architecture, patterns, facts |
| User | `USER.md` | User-specific preferences: coding style, tool choices, conventions |

Memory Governance

| Feature | Description |
|---|---|
| Character limits | Memory: 2,200 chars / User: 1,375 chars (configurable) |
| Scoring system | Each entry has a relevance score (0.0–1.0) |
| Security scanning | Entries are checked against threat patterns before storage: prompt injection ("ignore previous instructions"), role hijack ("you are now..."), rule disregard ("disregard your instructions"), credential exfiltration (curl/wget with $SECRET) |
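A simplified version of this scan is a list of regular expressions applied before an entry is accepted. The patterns below paraphrase the four threat categories; ClawCode's actual pattern set is presumably broader:

```python
import re

THREAT_PATTERNS = [
    re.compile(r"ignore previous instructions", re.I),  # prompt injection
    re.compile(r"you are now", re.I),                   # role hijack
    re.compile(r"disregard your instructions", re.I),   # rule disregard
    re.compile(r"(curl|wget)\s.*\$(SECRET|TOKEN)"),     # exfiltration
]

def is_safe_memory_entry(text: str) -> bool:
    # Reject the entry if any threat pattern matches anywhere.
    return not any(p.search(text) for p in THREAT_PATTERNS)
```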

How Memory Works During Sessions

Session Start
    │
    ▼
Load MEMORY.md + USER.md into context
    │
    ▼
Agent learns: project facts + user preferences
    │
    ▼
During session: agent calls `memory` tool to save new facts
    │
    ▼
Before summarization: memory is flushed first (prevents data loss)
    │
    ▼
Session End
    │
    ▼
Memory persists → available for next session

ClawSkills — Evolved Skill Library

ClawSkills is the system's skill repository — reusable workflows, templates, and procedures that the agent can invoke during development.

Skill Anatomy

Each skill is a Markdown file (SKILL.md) with YAML frontmatter:

---
name: react-component-pattern
description: Generate React components with consistent patterns
type: workflow
source_instincts: [inst-xxx, inst-yyy]
---

## Instructions
1. Identify the component purpose and props interface
2. Create the component file with TypeScript annotations
3. Add error boundaries and loading states
4. Generate corresponding test file
...

Skill Validation

The quality gate validates evolved skills before import:

| Check | Description |
|---|---|
| Title | Must start with `#` |
| Type | Must include `Type:` field |
| Source | Must include `## Source instincts` section |
| Content | Must be non-empty |
| Uniqueness | SHA-256 content hash must be unique (no duplicates) |
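A sketch of how such a gate could be implemented, returning the names of any failed checks (the exact check names and error reporting in ClawCode may differ):

```python
import hashlib

def validate_skill(content: str, seen_hashes: set) -> list:
    # Mirror the five gates from the table above.
    failures = []
    stripped = content.strip()
    if not stripped:
        failures.append("non-empty")
    if not stripped.startswith("#"):
        failures.append("title")
    if "Type:" not in content:
        failures.append("type")
    if "## Source instincts" not in content:
        failures.append("source")
    digest = hashlib.sha256(content.encode()).hexdigest()
    if digest in seen_hashes:
        failures.append("uniqueness")
    else:
        seen_hashes.add(digest)
    return failures

seen = set()
skill = "# Auth Debugging\nType: workflow\n## Source instincts\n- inst-001\n"
first = validate_skill(skill, seen)   # passes all gates
second = validate_skill(skill, seen)  # duplicate content hash
```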

Skill Subdirectories

Directory Purpose
references/ Reference documentation
templates/ Reusable code templates
scripts/ Automation scripts
assets/ Supporting files (schemas, configs)

Atomic Writes

All skill modifications use atomic file writes (write to temp file → rename), ensuring that interrupted operations never corrupt the skill library.
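The pattern is standard and easy to reproduce; a minimal sketch (not ClawCode's actual implementation):

```python
import os
import tempfile

def atomic_write(path: str, content: str) -> None:
    # Write to a temp file in the same directory, then rename over
    # the target. os.replace is atomic on POSIX, so a crash mid-write
    # leaves the old file intact rather than a half-written one.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The temp file must live on the same filesystem as the target, which is why it is created in the destination directory rather than in the system temp dir.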


The Autonomous Learning Cycle

The autonomous learning cycle is the heart of ClawCode's evolution system. It runs as a multi-stage pipeline that can be triggered manually or scheduled:

Cycle Stages

| Stage | Purpose | Output |
|---|---|---|
| Observe | Consume new observations from observations.jsonl | Updated observation state |
| Analyze | Cluster observations, identify patterns | Pattern clusters with scores |
| Evolve | Promote patterns to instincts and skills | New/updated instincts, evolved skills |
| Import | Import evolved artifacts into the agent's active toolset | Available skills and commands |
| Report | Generate quality reports and metrics | Dashboard data, alerts |
| Tuning | Apply parameter tuning based on feedback | Adjusted confidence scores |
| Export | Export reports and capsules for review | JSON/Markdown exports |

Running a Cycle

# Dry run (default) — analyze without writing
clawcode-autonomous-cycle --cwd /path/to/project

# Full run — allow imports and writes
clawcode-autonomous-cycle --no-dry-run --cwd /path/to/project

# Report only — just show current state
clawcode-autonomous-cycle --report-only

# With tuning — apply feedback-based adjustments
clawcode-autonomous-cycle --apply-tuning

# Export report — generate quality report
clawcode-autonomous-cycle --export-report

# Custom time window (hours of observations)
clawcode-autonomous-cycle --window-hours 24

# Explicit domain filtering
clawcode-autonomous-cycle --explicit-domain frontend

# Custom import limit
clawcode-autonomous-cycle --import-limit 50

Process Guard

The cycle uses a process lock to prevent concurrent executions:

| Mechanism | Detail |
|---|---|
| Lock file | `.clawcode/learning/runtime/cycle.lock` |
| Lease timeout | 300 seconds |
| Owner tracking | `{hostname}:{pid}` |
| Idempotency cache | Prevents duplicate cycle runs |
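A lease-based lock of this kind can be approximated as below. The JSON layout of cycle.lock is an assumption, and a production version would also need an atomic check-and-write (this sketch has a small race window):

```python
import json
import os
import socket
import time

LEASE_SECONDS = 300  # matches the lease timeout above

def try_acquire_lock(lock_path: str, now: float = None) -> bool:
    # Succeed if no lock exists or the previous holder's lease
    # has expired; record {hostname}:{pid} as the owner.
    now = time.time() if now is None else now
    if os.path.exists(lock_path):
        with open(lock_path) as f:
            holder = json.load(f)
        if now - holder["acquired_at"] < LEASE_SECONDS:
            return False  # another cycle still holds the lease
    owner = f"{socket.gethostname()}:{os.getpid()}"
    with open(lock_path, "w") as f:
        json.dump({"owner": owner, "acquired_at": now}, f)
    return True
```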

Quality Gates & Confidence Management

ClawCode's learning system uses quality gates and confidence scoring to ensure that evolved knowledge is reliable.

Confidence Scoring

| Parameter | Default | Description |
|---|---|---|
| Initial confidence | 0.5 | New knowledge starts at moderate confidence |
| Success step | +0.04 | Each success increases confidence |
| Failure step | -0.048 (1.2× penalty) | Failures decrease confidence more than successes increase it |
| Decay rate | 3% per day | Unused knowledge decays over time |
| Minimum floor | 0.2 | Confidence never drops below 20% |
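These defaults translate directly into a small update function. Whether ClawCode applies decay multiplicatively or subtractively is not stated; the sketch assumes multiplicative decay:

```python
SUCCESS_STEP = 0.04   # per successful application
FAILURE_STEP = 0.048  # 1.2x penalty per failure
DAILY_DECAY = 0.03    # 3% per day when unused
FLOOR = 0.2           # confidence never drops below this

def update_confidence(confidence, successes=0, failures=0, idle_days=0):
    confidence += successes * SUCCESS_STEP
    confidence -= failures * FAILURE_STEP
    confidence *= (1.0 - DAILY_DECAY) ** idle_days
    return max(FLOOR, min(1.0, confidence))
```

The asymmetric penalty means a knowledge item must succeed more often than it fails just to hold its score, which biases the system toward reliable patterns.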

Experience Quality Gates

For research-derived experience patterns:

| Gate | Threshold | Description |
|---|---|---|
| Evidence quality | ≥ 0.7 | Minimum evidence quality score |
| Source count | ≥ 3 | Minimum number of sources supporting the pattern |

Skill Quality Gates

Evolved skills must pass validation before import:

| Gate | Description |
|---|---|
| Structural validation | Required sections present |
| Content uniqueness | No duplicate content |
| Non-empty check | Content is not blank |
| Title validation | Proper Markdown heading |

Canary Promotion

The canary system compares baseline and candidate knowledge:

| Parameter | Default | Description |
|---|---|---|
| min_improvement | 0.0 | Absolute score improvement required |
| min_relative_improvement | configurable | Relative improvement required |
| min_samples | 5 | Minimum observations per bucket |
| min_confidence | 0.6 | Confidence threshold for promotion |
| control_ratio | 0.5 | Traffic split between control and candidate |

Lifecycle: draft → running → promoted / aborted
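The promotion decision itself reduces to a few threshold checks. The guide leaves min_relative_improvement configurable, so the 5% default below is purely an assumption:

```python
def should_promote(baseline_score, candidate_score, samples, confidence,
                   min_improvement=0.0, min_relative_improvement=0.05,
                   min_samples=5, min_confidence=0.6):
    # Gate checks mirror the canary parameters in the table above.
    if samples < min_samples or confidence < min_confidence:
        return False
    improvement = candidate_score - baseline_score
    if improvement < min_improvement:
        return False
    if baseline_score > 0 and improvement / baseline_score < min_relative_improvement:
        return False
    return True
```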


Privacy & Governance

ClawCode's learning system treats knowledge as sensitive data that needs proper governance:

Privacy Levels

| Level | Description |
|---|---|
| strict | Maximum redaction — no repo paths, no model details |
| balanced (default) | Redact file paths and credentials, keep problem types and patterns |
| full | Minimal redaction — keep all context for internal use |

What Gets Redacted

| Redacted | Not Redacted |
|---|---|
| File system paths | Problem types and categories |
| API keys and tokens | Solution patterns and strategies |
| Email addresses | Tool usage statistics |
| Specific model versions | Transfer rules and applicability conditions |

Governance Fields

Every ECAP and TECAP includes:

| Field | Purpose |
|---|---|
| privacy_level | strict / balanced / full |
| redaction_applied | Boolean flag indicating if redaction was applied |
| reviewed_by | Human reviewer identity (for shared capsules) |
| created_at / updated_at | ISO timestamps |
| feedback_score / feedback_count | Aggregated user feedback |
| deprecated | Mark capsules as obsolete |

Security: Prompt Injection Prevention

Memory entries are scanned for threat patterns before being stored:

| Threat Pattern | Risk |
|---|---|
| "ignore previous instructions" | Prompt injection attack |
| "you are now..." | Role hijack attempt |
| "disregard your instructions" | Rule override |
| `curl ... $SECRET` | Credential exfiltration |
| `wget ... $TOKEN` | Secret stealing |

Versioning & Compatibility

The learning system uses semantic versioning with backward-compatible reading:

ECAP Versioning

| Version | Status | Compatibility |
|---|---|---|
| ecap-v1 | Legacy | Readable — auto-upgraded to v2 on load |
| ecap-v2 | Current | Full support, recommended |

Automatic v1 → v2 migration:

| v1 Field | v2 Equivalent |
|---|---|
| steps: list[str] | ExperienceStep[] with step_type=decision |
| tool_sequence: list[str] | ToolCallHint[] with count=1 |
| Missing governance fields | Filled with safe defaults |
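A sketch of that upgrade path in Python (field names beyond those in the table are assumptions):

```python
def migrate_ecap_v1(capsule: dict) -> dict:
    # Upgrade a v1 capsule per the migration table: plain step strings
    # become decision-typed ExperienceSteps, tool names become count-1
    # hints, and missing governance fields get safe defaults.
    v2 = dict(capsule)
    v2["schema_version"] = "ecap-v2"
    v2["solution_trace"] = {
        "steps": [{"step_type": "decision", "summary": s}
                  for s in capsule.get("steps", [])],
        "tool_sequence": [{"tool_name": t, "count": 1}
                          for t in capsule.get("tool_sequence", [])],
    }
    v2.setdefault("governance", {
        "privacy_level": "balanced",   # documented default level
        "redaction_applied": False,
        "deprecated": False,
    })
    return v2

old = {"schema_version": "ecap-v1",
       "steps": ["inspect token expiry", "patch refresh logic"],
       "tool_sequence": ["grep", "edit"]}
new = migrate_ecap_v1(old)
```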

TECAP Versioning

| Version | Status | Compatibility |
|---|---|---|
| tecap-v1 | Legacy | Readable — auto-upgraded to v2 on load |
| tecap-v2 | Current | Full support, recommended |

v1 → v2 additions:

| New v2 Field | Default on Upgrade |
|---|---|
| team_topology | Generated from participants |
| handoff_contracts | Empty list [] |
| decision_log | Empty list [] |
| coordination_metrics | Zero values |
| quality_gates | Three default gate descriptions |

Export compatibility: --v1-compatible flag produces v1-formatted JSON for legacy consumers.


Observability & Ops Monitoring

The learning system is fully observable through the Ops event system:

| Event Type | When Emitted |
|---|---|
| ops_event | Every significant learning system operation |
| cycle_started | Autonomous learning cycle begins |
| cycle_completed | Cycle finishes with results |
| capsule_created | New ECAP/TECAP created |
| skill_evolved | Skill promoted from instinct |
| quality_gate_passed | Evolved artifact passes validation |

Experience Alerts

The system monitors for alert conditions:

| Alert | Description |
|---|---|
| Low feedback score | Capsule with declining effectiveness |
| High rework ratio | Team patterns with excessive rework |
| Deprecated capsule still applied | Warning when using obsolete knowledge |
| Privacy violation | Capsule leaking sensitive data |

Experience Dashboard

The system can generate a dashboard with:

| Metric | Description |
|---|---|
| Total capsules | Count of ECAPs and TECAPs |
| By problem type | Distribution across debug/review/refactor/test |
| Average confidence | Mean confidence across all capsules |
| Feedback distribution | Score histogram |
| Deprecated count | Capsules marked obsolete |
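Most of these aggregates are simple to derive from the capsule inventory. A sketch, assuming hypothetical capsule fields (`problem_type`, `confidence`, `deprecated`):

```python
from collections import Counter

def dashboard_summary(capsules):
    # Compute several of the dashboard metrics listed above
    # from a list of capsule dicts.
    deprecated = sum(1 for c in capsules if c.get("deprecated"))
    return {
        "total_capsules": len(capsules),
        "by_problem_type": dict(Counter(c["problem_type"] for c in capsules)),
        "average_confidence": (sum(c["confidence"] for c in capsules)
                               / len(capsules)) if capsules else 0.0,
        "deprecated_count": deprecated,
    }

caps = [
    {"problem_type": "debug", "confidence": 0.7, "deprecated": False},
    {"problem_type": "debug", "confidence": 0.5, "deprecated": True},
    {"problem_type": "refactor", "confidence": 0.6, "deprecated": False},
]
summary = dashboard_summary(caps)
```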

User Commands Quick Reference

Experience (ECAP) Commands

| Command | Short Purpose |
|---|---|
| `/experience-create` | Create capsule from observations |
| `/experience-status` | Show current inventory |
| `/experience-apply ecap-xxxx` | Apply specific capsule |
| `/experience-apply --problem-type debug --top-k 1` | Auto-apply best match |
| `/experience-feedback ecap-xxxx --result success --score 0.9` | Rate experience |
| `/experience-export ecap-xxxx --format json` | Export capsule |
| `/experience-import ./file.json` | Import capsule |

Team Experience (TECAP) Commands

| Command | Purpose |
|---|---|
| `/team-experience-create` | Create team capsule |
| `/team-experience-status` | Show team inventory |
| `/team-experience-apply --strategy balanced` | Apply with strategy |
| `/team-experience-apply --explain` | Show selection reasoning |
| `/team-experience-export --v1-compatible` | Export v1 format |
| `/team-experience-import ./file.json` | Import team capsule |
| `/team-experience-feedback tecap-xxxx --score 0.85` | Rate team experience |

Autonomous Cycle

clawcode-autonomous-cycle [--dry-run] [--no-dry-run] [--report-only]
  [--apply-tuning] [--export-report] [--window-hours N]
  [--import-limit N] [--explicit-domain domain]

Storage Layout

.clawcode/
├── learning/
│   ├── observations.jsonl          # Append-only raw observations
│   ├── instincts/
│   │   ├── personal/               # User-specific instincts
│   │   └── inherited/              # Cross-session instincts
│   ├── evolved/
│   │   ├── skills/                 # Evolved skill files (SKILL.md)
│   │   ├── commands/               # Evolved command definitions
│   │   └── agents/                 # Evolved agent configurations
│   ├── experience/
│   │   ├── capsules/               # ECAP files (*.json)
│   │   ├── exports/                # Exported capsules
│   │   └── feedback.jsonl          # Feedback log
│   ├── team-experience/
│   │   ├── capsules/               # TECAP files (*.json)
│   │   ├── exports/                # Exported team capsules
│   │   └── feedback.jsonl          # Team feedback log
│   ├── snapshots/                  # Audit snapshots
│   └── runtime/
│       ├── cycle.lock              # Process lock file
│       └── idempotency_cache.json  # Prevents duplicate cycles
├── claw_memory/
│   ├── MEMORY.md                   # System knowledge
│   ├── USER.md                     # User preferences
│   ├── MEMORY.meta.json            # Memory metadata (scores, timestamps)
│   └── USER.meta.json              # User metadata
└── claw_skills/                    # Skill library root
    ├── SKILL.md                    # Skill files
    ├── references/
    ├── templates/
    ├── scripts/
    └── assets/

How It All Works Together

Here's how ClawCode evolves through real usage:

Day 1 — First Session
    │
    ▼
You: "Fix the authentication bug"
Agent works through the problem, using various tools
    │
    ▼
┌──────────────────────────────────────────────────────┐
│ Observe: Tool calls, success/failure recorded         │
│ → observations.jsonl grows                            │
└────────────────────┬─────────────────────────────────┘
                     │
                     ▼
Day 2 — Pattern Recognition
    │
    ▼
Autonomous cycle runs:
  1. Analyzes yesterday's observations
  2. Clusters similar tool sequences
  3. Promotes recurring patterns to instincts
     → "Always check token expiration first" (confidence: 0.5)
  4. Builds ECAP capsule for auth debugging
     → Structured trace: glob → grep → view → edit → test
     → transfer conditions: Python, auth-related, has tests
└────────────────────┬─────────────────────────────────┘
                     │
                     ▼
Day 5 — Automatic Experience Application
    │
    ▼
You: "Fix the login redirect issue"
Agent detects: problem_type=debug, language=Python
    │
    ▼
Experience engine searches:
  → Finds ECAP for auth debugging (confidence: 0.52, 4 days decay)
  → Applies it as context for this session
  → Solution follows proven pattern: check first, search, verify
└────────────────────┬─────────────────────────────────┘
                     │
                     ▼
You: /experience-feedback ecap-auth --result success --score 0.85
    │
    ▼
Confidence updated: 0.52 + 0.04 = 0.56
    │
    ▼
Day 10 — Skill Evolution
    │
    ▼
Autonomous cycle:
  → "auth debugging" pattern has 3 successful applications
  → Confidence: 0.56 + (3 × 0.04) = 0.68
  → Promoted to evolved skill: SKILL.md
  → Quality gate passes (title, sections, unique content)
  → Available to all future sessions
└────────────────────┬─────────────────────────────────┘
                     │
                     ▼
Day 15 — Team Experience
    │
    ▼
/clawteam --deep_loop "Refactor the auth system"
Team collaborates: architect → backend → frontend → QA
    │
    ▼
TECAP created:
  → 4 roles participated, 3 handoff contracts
  → Handoff success: 88%, rework ratio: 12%
  → Deep loop (6-iteration budget) converged after 4 rounds
  → Quality score: 0.82
└────────────────────┬─────────────────────────────────┘
                     │
                     ▼
Ongoing — Continuous Evolution
    │
    ▼
Every session → observations → analysis → evolution → application
The system never stops learning. Each cycle makes it smarter,
more precise, and better aligned with your workflow.

ClawCode: Creative Engineering Cockpit for Serious AI Builders

An assistant that learns from every session, remembers every insight, and evolves with your team.
