A proposal for making websites reliably operable by browser agents.
I've been watching LLM agents try to interact with websites, and the current approach is broken. They guess CSS selectors, rely on visual layout, and break the moment a UI redesign ships. It's fundamentally fragile because the web was never designed with software agents in mind.
This repo is my attempt at a solution: a thin semantic layer (data-agent-* attributes) and a typed capability manifest that can be added to existing HTML. The idea is that a runtime can discover actions, validate inputs, enforce safety rules, and execute through the real UI — without selectors, without screen scraping, without breaking when a class name changes.
The human UI stays the same. It just becomes agent-readable.
A regular HTML form becomes agent-operable by adding semantic attributes:
```html
<form data-agent-kind="action"
      data-agent-action="invoice.create"
      data-agent-danger="low"
      data-agent-confirm="optional">
  <input type="email"
         data-agent-kind="field"
         data-agent-field="customer_email" />
  <input type="number"
         data-agent-kind="field"
         data-agent-field="amount" />
  <button type="submit"
          data-agent-kind="action"
          data-agent-action="invoice.create.submit">
    Create Invoice
  </button>
  <div data-agent-kind="status"
       data-agent-output="invoice.create.status"></div>
</form>
```

A manifest at `/.well-known/agent-manifest.json` declares action schemas, risk levels, and confirmation policies. A runtime uses both to:
- Discover available actions and fields on the page
- Validate inputs against JSON Schema
- Enforce safety policies (block high-risk actions without confirmation)
- Execute by filling fields and clicking submit through the real DOM
- Log every step semantically (not selector-based)
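For the form above, a manifest entry might look like the following. This is an illustrative sketch — the field names mirror the attributes shown earlier, but the authoritative schema is defined in the spec docs (`02 - Standard Spec`):

```json
{
  "version": "0.1",
  "actions": [
    {
      "name": "invoice.create",
      "danger": "low",
      "confirm": "optional",
      "scope": "invoices.write",
      "args": {
        "type": "object",
        "required": ["customer_email", "amount"],
        "properties": {
          "customer_email": { "type": "string", "format": "email" },
          "amount": { "type": "number", "minimum": 0 }
        }
      }
    }
  ]
}
```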
The key design boundary: the LLM chooses intent; the runtime enforces execution. The model never drives the mouse — it picks an action name and args, and the runtime validates and executes safely.
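That boundary can be sketched as a typed contract. The names below are hypothetical (the real interfaces live in the `aaf-contracts` package); the point is that the model only ever emits data, and the runtime is the sole component with execute authority:

```typescript
// Hypothetical sketch of the planner/runtime boundary.
// The LLM produces a Plan; only the runtime touches the DOM.
interface Plan {
  action: string;                 // dot-notation action id, e.g. "invoice.create"
  args: Record<string, unknown>;  // validated against the manifest's JSON Schema
  confirmed?: boolean;            // set only after explicit user approval
}

type RunResult =
  | { status: "ok"; output?: unknown }
  | { status: "needs_confirmation"; action: string; risk: "low" | "high" }
  | { status: "rejected"; reason: string };

function execute(plan: Plan): RunResult {
  // Reject anything that is not a well-formed dot-notation action id
  // (so a selector like "#submit" can never reach execution).
  if (!/^[a-z0-9_]+(\.[a-z0-9_]+)+$/.test(plan.action)) {
    return { status: "rejected", reason: "malformed action id" };
  }
  // ...validate args, check policy, fill fields, click submit (elided)...
  return { status: "ok" };
}
```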
Tool protocols (MCP, etc.) are great when a clean backend action exists. This proposal targets the cases where the agent needs to use the actual UI:
- Forms with live validation, modals, previews
- Draft vs. publish flows, in-page state
- Sites without a tool surface
- "Do it like the user would"
This approach and tool protocols are complementary — the manifest could generate MCP tool definitions later.
The fastest way to see this in action is the sample billing app with its embedded agent chat widget.
```sh
# 1. Pull an LLM model
ollama pull llama3.2

# 2. Make sure Ollama is running (it usually starts automatically)
ollama serve   # if not already running

# 3. Install dependencies
npm install

# 4. Start the billing app
cd samples/billing-app && npx vite
```

Open http://localhost:5173/invoices/new and click the chat bubble in the bottom-right corner.
| Prompt | What happens |
|---|---|
| "Create an invoice for alice@example.com for 120 EUR" | Fills the form but does not submit (review mode) — user clicks submit |
| "Send a bill to bob@test.com for 50 USD" | Plans and executes invoice.create |
| "Delete the workspace" (on /settings/) | Triggers the high-risk confirmation dialog |
| "Delete my workspace" (on /invoices/new) | Cross-page navigation — the widget sees the action exists on /settings/, navigates there, restores the conversation, and executes |
On the invoices list page (/invoices/), the widget enters data chat mode — you can ask questions about the visible invoices like "How many invoices are there?" or "What's the total amount?"
There's also an interactive documentation site that is itself AAF-annotated — you can ask the chat widget questions about the spec:
```sh
cd samples/docs-site && npm run dev
# Open http://localhost:5174
```

The docs site covers attributes, manifests, execution flow, tooling, and examples. Since every page is annotated with `data-agent-*` attributes, the widget enters data chat mode and you can ask it things like "What attributes does AAF define?" or "How does execution work?"
The prototype includes a core runtime (parser, validator, policy engine, logger), a Playwright testing adapter, a conformance linter, a code generator, typed planner/runtime contracts, a local LLM planner (Ollama), an embeddable agent chat widget, framework adapters (Next.js, SvelteKit, React, Vue), a WebMCP bridge (auto-registers AAF actions as browser-native MCP tools on Chrome 146+), and supporting tooling (ESLint plugin, Vite plugin, CLI, llms.txt generator). See CLAUDE.md for the full monorepo layout.
```sh
# Install dependencies
npm install

# Run the full test suite
npm test

# Start the sample billing app
cd samples/billing-app && npx vite
# Open http://localhost:5173
```

```sh
# Unit tests for a specific package
npx vitest run packages/agent-runtime-core
npx vitest run packages/aaf-contracts
npx vitest run packages/aaf-planner-local
npx vitest run packages/aaf-agent-widget
```
```sh
# Falsification benchmark (selector vs. semantic reliability)
npx vitest run tests/falsification

# Generate reliability report
npm run benchmark
# Outputs: artifacts/reliability-report.md
```

```sh
# Start the billing app, then run Playwright tests
cd packages/agent-runtime-playwright
npm run test:e2e
```

```sh
# Local file
npx aaf-lint --html samples/billing-app/invoices/new/index.html \
  --manifest samples/billing-app/public/.well-known/agent-manifest.json

# Remote URL (raw fetch)
npx aaf-lint --audit https://example.com

# Remote SPA (renders JavaScript in headless Chromium first; requires Playwright)
npx aaf-lint --audit https://example.com --render

# Site-wide audit — follows same-origin links on the entry page
npx aaf-lint --audit https://example.com --render --crawl

# Include safety checks (dangerous-button annotations)
npx aaf-lint --audit https://example.com --render --safety
```

The audit auto-discovers a manifest at `{origin}/.well-known/agent-manifest.json` — no need to pass `--manifest` for sites that serve one.
```sh
npx agentgen --manifest samples/billing-app/public/.well-known/agent-manifest.json \
  --output generated-sdk/
```

This is the test I find most convincing: semantic automation survives UI refactors that break selector-based automation.
tests/falsification/ contains 10 fixture pairs — each with an original HTML page and a refactored version using completely different CSS classes, IDs, and layout, but identical data-agent-* attributes. The fixtures cover forms, modals, date pickers, selects, inline edits, file uploads, pagination, nested forms, and dynamic fields.
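The difference is easy to see in code. A selector-based locator keys on presentation; a semantic locator keys on the stable `data-agent-*` contract. A simplified sketch (elements modeled as plain attribute maps for illustration — the real SemanticParser works on the live DOM):

```typescript
// Simplified sketch: why semantic lookup survives refactors.
type El = { tag: string; attrs: Record<string, string> };

// Selector-style lookup: keyed on a CSS class, which changes
// whenever the design does.
function findByClass(els: El[], cls: string): El | undefined {
  return els.find((e) => (e.attrs["class"] ?? "").split(" ").includes(cls));
}

// Semantic lookup: keyed on data-agent-action, independent of
// classes, ids, and layout.
function findByAction(els: El[], action: string): El | undefined {
  return els.find((e) => e.attrs["data-agent-action"] === action);
}
```

After a refactor renames `btn-primary` to `cta`, `findByClass` returns nothing while `findByAction` still resolves the same button.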
```sh
npx vitest run tests/falsification
```

| Approach | Original app | Refactored app | Survives refactor? |
|---|---|---|---|
| CSS selectors | 20/20 pass | 0/20 pass | No |
| AAF semantic | 10/10 pass | 10/10 pass | Yes |
The benchmark also tests:
- Safety: high-risk actions without confirmation are blocked (9 tests across 3 actions)
- Scope enforcement: actions outside granted scopes are rejected (9 tests across 3 scoped actions)
- Drift detection: the linter catches broken `data-agent-*` attributes
- Missing fields: omitting a required field produces clear errors
| Attribute | Values | Purpose |
|---|---|---|
| `data-agent-kind` | `action` `field` `status` `collection` `item` `dialog` `step` | Semantic role |
| `data-agent-action` | `invoice.create` | Action identifier (dot-notation) |
| `data-agent-field` | `customer_email` | Field identifier (snake_case) |
| `data-agent-danger` | `none` `low` `high` | Risk level |
| `data-agent-confirm` | `never` `optional` `review` `required` | Confirmation policy |
| `data-agent-scope` | `invoices.write` | Permission hint |
| `data-agent-idempotent` | `true` `false` | Safe to retry? |
| `data-agent-for-action` | `workspace.delete` | Links a field to an action outside its DOM tree |
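The last attribute covers controls that don't live inside the action's form subtree — for instance, a "type the workspace name to confirm" field elsewhere on a settings page. Illustrative markup (not taken from the sample apps):

```html
<!-- A field outside the action's DOM tree, linked to it by
     data-agent-for-action (illustrative sketch) -->
<input type="text"
       data-agent-kind="field"
       data-agent-field="confirm_name"
       data-agent-for-action="workspace.delete" />

<button data-agent-kind="action"
        data-agent-action="workspace.delete"
        data-agent-danger="high"
        data-agent-confirm="required">
  Delete workspace
</button>
```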
Actions declare risk and confirmation requirements. The runtime enforces them — the LLM cannot bypass this:
- `danger="high"` + `confirm="required"` blocks execution unless the user explicitly confirms
- The runtime returns `needs_confirmation` with metadata (action name, risk, scope)
- The widget shows a confirmation dialog; only on user approval does it re-execute with `confirmed: true`
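The widget-side loop can be sketched like this (hypothetical names — the real flow lives in the agent widget package). The key property is that `confirmed: true` can only originate from an explicit user approval, never from the model:

```typescript
// Hypothetical sketch of the widget-side confirmation loop.
type ExecOutcome =
  | { status: "ok" }
  | { status: "needs_confirmation"; action: string; risk: string };

async function runWithConfirmation(
  execute: (confirmed: boolean) => Promise<ExecOutcome>,
  askUser: (meta: { action: string; risk: string }) => Promise<boolean>,
): Promise<ExecOutcome> {
  const first = await execute(false);
  if (first.status !== "needs_confirmation") return first;
  // Only an explicit user approval re-executes with confirmed: true.
  const approved = await askUser({ action: first.action, risk: first.risk });
  if (!approved) return first;
  return execute(true);
}
```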
The PolicyEngine also enforces:
- Arg safety — rejects CSS selectors, XPath, and pseudo-class patterns in LLM-generated args
- Origin trust — validates the manifest was served from the same origin as the current page
- Scope enforcement — blocks actions outside the agent's granted scopes (configurable per widget instance)
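The arg-safety check can be approximated with a deliberately conservative pattern filter. This is a sketch under my own assumptions, not the PolicyEngine's actual rule set — the idea is that LLM-supplied args must be plain values, never selectors that could steer execution:

```typescript
// Hypothetical sketch of an arg-safety check. Conservative: some odd
// but legitimate strings (e.g. ".5") would also be flagged.
const SELECTOR_PATTERNS = [
  /^[#.]/,          // CSS id/class selector, e.g. "#submit", ".btn-primary"
  /^\/\/|^\.\//,    // XPath, e.g. "//form/button"
  /::?[a-z-]+\(/i,  // pseudo-class / pseudo-element with arguments
  /[<>]/,           // markup injection
];

function isSelectorLike(value: unknown): boolean {
  if (typeof value !== "string") return false;
  return SELECTOR_PATTERNS.some((re) => re.test(value.trim()));
}
```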
See docs/06-security-threat-model.md for the full threat model.
```
User message
  -> Discover actions on page (SemanticParser)
  -> Build site-wide context from manifest (off-page actions)
  -> LLM plans: { action: "invoice.create", args: { ... } }
  -> If action is on another page: persist conversation, navigate, resume
  -> Validate args against manifest schema (ManifestValidator)
  -> Check policy: risk, confirmation, required fields, arg safety, origin, scope (PolicyEngine)
  -> Execute: fill fields, click submit, read status (AAFAdapter)
  -> Return structured result + semantic log
```
See docs/ for the full proposal:
| Document | Topic |
|---|---|
| 01 - Vision and Goals | The problem and why this approach |
| 02 - Standard Spec | Proposed DOM attributes and manifest format |
| 03 - Security and Conformance | Safety rules and conformance levels |
| 04 - Design Principles | Scope and principles |
| 05 - Future | Open questions and future directions |
| 06 - Security Threat Model | 6 enumerated threats with mitigations |
| 08 - HTML Standard Proposal | W3C-style position paper with browser implementation roadmap |
| 09 - Multi-Agent Handoff | Cross-site action chaining protocol |
This is a working prototype — I built it to prove (or disprove) the idea, not to ship a production framework. The prototype includes:
- Core runtime (parser, validator, policy engine with security hardening, logger, arg coercion)
- Agent widget — embeddable `<script>` for any AAF page, with Ollama/OpenAI-compatible LLM support
- Local planner (Ollama), prompt builder, response parser
- Linter, code generator, falsification benchmark (10 fixture pairs, 60+ tests)
- Typed planner/runtime contracts with selector rejection
- Framework adapters — Next.js (`AgentForm`, `withAgentAction`), SvelteKit (`AgentAction.svelte`, server hook), React, Vue
- WebMCP bridge — auto-registers AAF actions as `navigator.modelContext` tools on Chrome 146+
- Multi-agent handoff protocol — cross-site action chaining with trust model
- `llms.txt` generator — AI crawler discovery from manifest
- ESLint plugin, Vite plugin
- Interactive docs site (data chat mode — you can ask it questions about AAF)
- Real-world sample app (ProjectHub — 5 pages, 5 actions, 3 data views)
The widget demonstrates the full loop — chat, plan, validate, execute, confirm — running directly on any annotated page. It supports cross-page navigation: if the user requests an action that exists on a different page, the widget auto-navigates there and resumes the conversation. On pages with only data collections (no actions), it enters data chat mode where you can ask questions about the visible content.
I think the interesting question now is whether this pattern works on real product flows, not just sample apps. If you try it and have thoughts, I'd love to hear them.