Agent Accessibility Framework (AAF)

A proposal for making websites reliably operable by browser agents.

I've been watching LLM agents try to interact with websites, and the current approach is broken. They guess CSS selectors, rely on visual layout, and break the moment a UI redesign ships. It's fundamentally fragile because the web was never designed with software agents in mind.

This repo is my attempt at a solution: a thin semantic layer (data-agent-* attributes) and a typed capability manifest that can be added to existing HTML. The idea is that a runtime can discover actions, validate inputs, enforce safety rules, and execute through the real UI — without selectors, without screen scraping, without breaking when a class name changes.

The human UI stays the same. It just becomes agent-readable.

The idea

A regular HTML form becomes agent-operable by adding semantic attributes:

<form data-agent-kind="action"
      data-agent-action="invoice.create"
      data-agent-danger="low"
      data-agent-confirm="optional">

  <input type="email"
         data-agent-kind="field"
         data-agent-field="customer_email" />

  <input type="number"
         data-agent-kind="field"
         data-agent-field="amount" />

  <button type="submit"
          data-agent-kind="action"
          data-agent-action="invoice.create.submit">
    Create Invoice
  </button>

  <div data-agent-kind="status"
       data-agent-output="invoice.create.status"></div>
</form>

A manifest at /.well-known/agent-manifest.json declares action schemas, risk levels, and confirmation policies. A runtime uses both to:

1. Discover available actions and fields on the page
2. Validate inputs against JSON Schema
3. Enforce safety policies (block high-risk actions without confirmation)
4. Execute by filling fields and clicking submit through the real DOM
5. Log every step semantically (not selector-based)
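
As a rough illustration of the validation step, here is a minimal sketch of a manifest entry and a check that runs before anything touches the page. The `ManifestAction` shape, its field names, and `validateArgs` are assumptions made for this example, not the repo's actual manifest format, and the hand-rolled type check only stands in for a real JSON Schema validator.

```typescript
// Hypothetical manifest shape; the real format is defined in the spec docs.
type FieldType = "string" | "number" | "boolean";

interface ManifestAction {
  danger: "none" | "low" | "high";
  confirm: "never" | "optional" | "review" | "required";
  // Simplified stand-in for a JSON Schema "properties" + "required" pair.
  fields: Record<string, FieldType>;
  required: string[];
}

type Manifest = Record<string, ManifestAction>;

const manifest: Manifest = {
  "invoice.create": {
    danger: "low",
    confirm: "optional",
    fields: { customer_email: "string", amount: "number" },
    required: ["customer_email", "amount"],
  },
};

// Returns a list of problems; an empty list means the args are acceptable.
function validateArgs(
  manifest: Manifest,
  action: string,
  args: Record<string, unknown>,
): string[] {
  const spec = manifest[action];
  if (!spec) return [`unknown action: ${action}`];

  const errors: string[] = [];
  for (const name of spec.required) {
    if (!(name in args)) errors.push(`missing required field: ${name}`);
  }
  for (const [name, value] of Object.entries(args)) {
    const expected = spec.fields[name];
    if (!expected) errors.push(`unknown field: ${name}`);
    else if (typeof value !== expected) errors.push(`field ${name} must be ${expected}`);
  }
  return errors;
}
```

Calling `validateArgs(manifest, "invoice.create", { customer_email: "alice@example.com" })` yields a single problem, `missing required field: amount`, so the runtime can refuse before filling any field.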

The key design boundary: the LLM chooses intent; the runtime enforces execution. The model never drives the mouse — it picks an action name and args, and the runtime validates and executes safely.
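
One way to picture that boundary: the planner's raw output is treated as untrusted text, and the runtime accepts it only if it has exactly the shape of an intent. The `Plan` type and `parsePlan` below are hypothetical names for this sketch, not the repo's typed contracts, and the action-name pattern is an assumption based on the dot-notation used in the examples.

```typescript
// Hypothetical intent shape: an action name plus arguments, nothing else.
interface Plan {
  action: string;
  args: Record<string, unknown>;
}

// Parses raw LLM output; returns a Plan, or null if it is not a well-formed intent.
function parsePlan(raw: string): Plan | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) return null;

  const candidate = value as Record<string, unknown>;
  const keys = Object.keys(candidate);
  // Reject extra keys such as "selector" or "click": the model may only name an intent.
  if (keys.length !== 2 || !keys.includes("action") || !keys.includes("args")) return null;

  const { action, args } = candidate;
  // Assumed naming rule: lowercase dot-notation, e.g. "invoice.create".
  if (typeof action !== "string" || !/^[a-z_]+(\.[a-z_]+)*$/.test(action)) return null;
  if (typeof args !== "object" || args === null || Array.isArray(args)) return null;

  return { action, args: args as Record<string, unknown> };
}
```

Because anything that is not `{ action, args }` is dropped here, later stages never see instructions about where to click, only what the user wants done.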

Why not just use tool APIs?

Tool protocols (MCP, etc.) are great when a clean backend action exists. This proposal targets the cases where the agent needs to use the actual UI:

- Forms with live validation, modals, previews
- Draft vs. publish flows, in-page state
- Sites without a tool surface
- "Do it like the user would"

This approach and tool protocols are complementary — the manifest could generate MCP tool definitions later.
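
A sketch of what that generation step could look like, assuming a manifest entry carries a title and a JSON Schema for its arguments (the `ManifestAction` shape here is hypothetical). The output uses the core fields of an MCP tool definition: `name`, `description`, and `inputSchema`.

```typescript
// Hypothetical manifest entry: a title plus a JSON Schema for the action's arguments.
interface ManifestAction {
  title: string;
  inputSchema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
}

// Core fields of an MCP tool definition.
interface McpTool {
  name: string;
  description: string;
  inputSchema: ManifestAction["inputSchema"];
}

function toMcpTools(actions: Record<string, ManifestAction>): McpTool[] {
  return Object.entries(actions).map(([id, action]) => ({
    // Assumption: dots are replaced because some MCP clients only accept [a-zA-Z0-9_-].
    name: id.replace(/\./g, "_"),
    description: action.title,
    inputSchema: action.inputSchema,
  }));
}

const tools = toMcpTools({
  "invoice.create": {
    title: "Create an invoice",
    inputSchema: {
      type: "object",
      properties: { customer_email: { type: "string" }, amount: { type: "number" } },
      required: ["customer_email", "amount"],
    },
  },
});
console.log(tools[0].name); // -> invoice_create
```

The schema passes through unchanged, which is the point: one declaration can feed both the UI runtime and a tool surface.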

Try the billing app

The fastest way to see this in action is the sample billing app with its embedded agent chat widget.

Prerequisites

- Node.js (v18+)
- Ollama installed and running

Steps

# 1. Pull an LLM model
ollama pull llama3.2

# 2. Make sure Ollama is running (it usually starts automatically)
ollama serve   # if not already running

# 3. Install dependencies
npm install

# 4. Start the billing app
cd samples/billing-app && npx vite

Open http://localhost:5173/invoices/new and click the chat bubble in the bottom-right corner.

Example prompts to try

| Prompt | What happens |
| --- | --- |
| "Create an invoice for alice@example.com for 120 EUR" | Fills the form but does not submit (review mode); the user clicks submit |
| "Send a bill to bob@test.com for 50 USD" | Plans and executes `invoice.create` |
| "Delete the workspace" (on `/settings/`) | Triggers the high-risk confirmation dialog |
| "Delete my workspace" (on `/invoices/new`) | Cross-page navigation: the widget sees the action exists on `/settings/`, navigates there, restores the conversation, and executes |

On the invoices list page (/invoices/), the widget enters data chat mode — you can ask questions about the visible invoices like "How many invoices are there?" or "What's the total amount?"

Try the docs site

There's also an interactive documentation site that is itself AAF-annotated — you can ask the chat widget questions about the spec:

cd samples/docs-site && npm run dev
# Open http://localhost:5174

The docs site covers attributes, manifests, execution flow, tooling, and examples. Since every page is annotated with data-agent-* attributes, the widget enters data chat mode and you can ask it things like "What attributes does AAF define?" or "How does execution work?"

What's in the repo

The prototype includes a core runtime (parser, validator, policy engine, logger), a Playwright testing adapter, a conformance linter, a code generator, typed planner/runtime contracts, a local LLM planner (Ollama), an embeddable agent chat widget, framework adapters (Next.js, SvelteKit, React, Vue), a WebMCP bridge (auto-registers AAF actions as browser-native MCP tools on Chrome 146+), and supporting tooling (ESLint plugin, Vite plugin, CLI, llms.txt generator). See CLAUDE.md for the full monorepo layout.

Quick start

# Install dependencies
npm install

# Run the full test suite
npm test

# Start the sample billing app
cd samples/billing-app && npx vite
# Open http://localhost:5173

Run specific test suites

# Unit tests for a specific package
npx vitest run packages/agent-runtime-core
npx vitest run packages/aaf-contracts
npx vitest run packages/aaf-planner-local
npx vitest run packages/aaf-agent-widget

# Falsification benchmark (selector vs semantic reliability)
npx vitest run tests/falsification

# Generate reliability report
npm run benchmark
# Outputs: artifacts/reliability-report.md

E2E tests (Playwright)

# With the billing app already running (see Quick start), run the Playwright tests
cd packages/agent-runtime-playwright
npm run test:e2e

Lint / audit a page

# Local file
npx aaf-lint --html samples/billing-app/invoices/new/index.html \
             --manifest samples/billing-app/public/.well-known/agent-manifest.json

# Remote URL (raw fetch)
npx aaf-lint --audit https://example.com

# Remote SPA (renders JavaScript in headless Chromium first, requires playwright)
npx aaf-lint --audit https://example.com --render

# Site-wide audit — follows same-origin links on the entry page
npx aaf-lint --audit https://example.com --render --crawl

# Include safety checks (dangerous button annotations)
npx aaf-lint --audit https://example.com --render --safety

The audit auto-discovers a manifest at {origin}/.well-known/agent-manifest.json — no need to pass --manifest for sites that serve one.
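
The discovery rule is simple enough to sketch with the standard `URL` API. `manifestUrlFor` is a hypothetical helper; how `aaf-lint` implements this internally is not shown here.

```typescript
// Well-known path taken from the text above.
const MANIFEST_PATH = "/.well-known/agent-manifest.json";

// Derives the manifest URL for whatever page is being audited.
function manifestUrlFor(pageUrl: string): string {
  const { origin } = new URL(pageUrl);
  return new URL(MANIFEST_PATH, origin).toString();
}

console.log(manifestUrlFor("https://example.com/invoices/new?draft=1"));
// -> https://example.com/.well-known/agent-manifest.json
```

Path and query are discarded, so every page on a site resolves to the same manifest.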

Generate an SDK from a manifest

npx agentgen --manifest samples/billing-app/public/.well-known/agent-manifest.json \
             --output generated-sdk/

The falsification benchmark

This is the test I find most convincing. Semantic automation survives UI refactors that break selector-based automation.

tests/falsification/ contains 10 fixture pairs — each with an original HTML page and a refactored version using completely different CSS classes, IDs, and layout, but identical data-agent-* attributes. The fixtures cover forms, modals, date pickers, selects, inline edits, file uploads, pagination, nested forms, and dynamic fields.

npx vitest run tests/falsification

| Approach | Original app | Refactored app | Survives refactor? |
| --- | --- | --- | --- |
| CSS selectors | 20/20 pass | 0/20 pass | No |
| AAF semantic | 10/10 pass | 10/10 pass | Yes |
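
To make the mechanism behind these numbers concrete, here is a toy model of one fixture pair. It is not the repo's test harness: elements are plain objects rather than DOM nodes, and the ids and class names are invented for illustration.

```typescript
// Toy element model: just enough structure to compare the two lookup strategies.
interface El {
  id?: string;
  classes: string[];
  attrs: Record<string, string>;
}

// Selector-style lookup: depends on presentation details (id or class name).
function findBySelector(page: El[], selector: string): El | undefined {
  if (selector.startsWith("#")) return page.find((el) => el.id === selector.slice(1));
  if (selector.startsWith(".")) return page.find((el) => el.classes.includes(selector.slice(1)));
  return undefined;
}

// Semantic lookup: depends only on the data-agent-field annotation.
function findByField(page: El[], field: string): El | undefined {
  return page.find((el) => el.attrs["data-agent-field"] === field);
}

const original: El[] = [
  { id: "email", classes: ["form-input"], attrs: { "data-agent-field": "customer_email" } },
];
// Same annotation, different id and classes, as after a redesign.
const refactored: El[] = [
  { id: "fld-1", classes: ["tw-input", "rounded"], attrs: { "data-agent-field": "customer_email" } },
];

console.log(findBySelector(original, "#email") !== undefined);        // true
console.log(findBySelector(refactored, "#email") !== undefined);      // false
console.log(findByField(original, "customer_email") !== undefined);   // true
console.log(findByField(refactored, "customer_email") !== undefined); // true
```

The selector lookup fails on the refactored page because the only thing it knew about the element was styling; the semantic lookup keys on the one attribute the refactor left alone.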

The benchmark also tests:

- Safety: high-risk actions without confirmation are blocked (9 tests across 3 actions)
- Scope enforcement: actions outside granted scopes are rejected (9 tests across 3 scoped actions)
- Drift detection: linter catches broken data-agent-* attributes
- Missing fields: required field omission produces clear errors

Proposed attributes

| Attribute | Values | Purpose |
| --- | --- | --- |
| `data-agent-kind` | `action` `field` `status` `collection` `item` `dialog` `step` | Semantic role |
| `data-agent-action` | e.g. `invoice.create` | Action identifier (dot-notation) |
| `data-agent-field` | e.g. `customer_email` | Field identifier (snake_case) |
| `data-agent-danger` | `none` `low` `high` | Risk level |
| `data-agent-confirm` | `never` `optional` `review` `required` | Confirmation policy |
| `data-agent-scope` | e.g. `invoices.write` | Permission hint |
| `data-agent-idempotent` | `true` `false` | Safe to retry? |
| `data-agent-for-action` | e.g. `workspace.delete` | Links a field to an action outside its DOM tree |
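
The enumerated values in the table map naturally onto TypeScript union types. The sketch below reads annotations from a plain attribute map so it runs outside a browser; in a page, the same map could be built from an element's attributes. `Annotation`, `oneOf`, and `readAnnotation` are hypothetical names, not the repo's parser API.

```typescript
// Value sets copied from the attribute table; type and function names are hypothetical.
const KINDS = ["action", "field", "status", "collection", "item", "dialog", "step"] as const;
const DANGERS = ["none", "low", "high"] as const;
const CONFIRMS = ["never", "optional", "review", "required"] as const;

type Kind = (typeof KINDS)[number];
type Danger = (typeof DANGERS)[number];
type Confirm = (typeof CONFIRMS)[number];

interface Annotation {
  kind: Kind;
  action?: string;
  field?: string;
  danger?: Danger;
  confirm?: Confirm;
}

// Returns the value if it belongs to the allowed set, otherwise undefined.
function oneOf<T extends string>(allowed: readonly T[], value: string | undefined): T | undefined {
  return allowed.find((candidate) => candidate === value);
}

// Reads data-agent-* attributes; returns null if the kind is missing or not recognized.
function readAnnotation(attrs: Record<string, string>): Annotation | null {
  const kind = oneOf(KINDS, attrs["data-agent-kind"]);
  if (!kind) return null;
  return {
    kind,
    action: attrs["data-agent-action"],
    field: attrs["data-agent-field"],
    danger: oneOf(DANGERS, attrs["data-agent-danger"]),
    confirm: oneOf(CONFIRMS, attrs["data-agent-confirm"]),
  };
}
```

Unknown values for `danger` or `confirm` come back as `undefined` rather than being passed through, which gives a linter something concrete to flag.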

Safety model

Actions declare risk and confirmation requirements. The runtime enforces them — the LLM cannot bypass this:

- danger="high" + confirm="required" blocks execution unless the user explicitly confirms
- The runtime returns needs_confirmation with metadata (action name, risk, scope)
- The widget shows a confirmation dialog; only on user approval does it re-execute with confirmed: true
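
A minimal sketch of that confirm-then-retry loop. The `needs_confirmation` status and the `confirmed: true` flag come from the description above; the `GateResult` shape, the function name, and the `workspace.admin` scope are assumptions. This sketch also assumes `confirm="required"` is the trigger regardless of danger level, and it does not model `review` or `optional`.

```typescript
type Danger = "none" | "low" | "high";
type Confirm = "never" | "optional" | "review" | "required";

interface ActionPolicy {
  danger: Danger;
  confirm: Confirm;
  scope?: string;
}

// Hypothetical result shape; "needs_confirmation" is the status named in the text.
type GateResult =
  | { status: "allowed" }
  | { status: "needs_confirmation"; action: string; danger: Danger; scope?: string };

// Blocks actions that require confirmation until the caller passes confirmed: true.
function confirmationGate(
  action: string,
  policy: ActionPolicy,
  opts: { confirmed?: boolean } = {},
): GateResult {
  if (policy.confirm === "required" && opts.confirmed !== true) {
    return { status: "needs_confirmation", action, danger: policy.danger, scope: policy.scope };
  }
  return { status: "allowed" };
}

// "workspace.admin" is an invented scope name for this example.
const policy: ActionPolicy = { danger: "high", confirm: "required", scope: "workspace.admin" };
console.log(confirmationGate("workspace.delete", policy).status);                      // needs_confirmation
console.log(confirmationGate("workspace.delete", policy, { confirmed: true }).status); // allowed
```

The `confirmed` flag is supplied by the widget after the user approves, never by the planner, so a model that emits `confirmed: true` in its args gains nothing.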

The PolicyEngine also enforces:

- Arg safety: rejects CSS selectors, XPath, and pseudo-class patterns in LLM-generated args
- Origin trust: validates the manifest was served from the same origin as the current page
- Scope enforcement: blocks actions outside the agent's granted scopes (configurable per widget instance)
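
Each of the three checks can be sketched as a small pure function. The regular expressions below are illustrative heuristics, not the PolicyEngine's actual rule set, and the function names are hypothetical. Treating an action with no declared scope as allowed is also an assumption.

```typescript
// Heuristic patterns for selector-like strings; illustrative only.
const SELECTOR_PATTERNS: RegExp[] = [
  /^\s*[#.][A-Za-z_][\w-]*/, // CSS id or class selector, e.g. "#submit", ".btn-primary"
  /^\s*\/\//,                // XPath, e.g. "//button[@type='submit']"
  /\[[\w-]+(=[^\]]*)?\]/,    // attribute selector, e.g. "input[name=email]"
  /:(nth-child|nth-of-type|first-child|last-child|hover|not)\b/, // pseudo-classes
];

function looksLikeSelector(value: unknown): boolean {
  if (typeof value !== "string") return false;
  const text: string = value;
  return SELECTOR_PATTERNS.some((pattern) => pattern.test(text));
}

// Arg safety: returns the names of args whose values look like DOM locators.
function unsafeArgs(args: Record<string, unknown>): string[] {
  return Object.keys(args).filter((name) => looksLikeSelector(args[name]));
}

// Origin trust: the manifest must come from the same origin as the page.
function sameOrigin(pageUrl: string, manifestUrl: string): boolean {
  return new URL(pageUrl).origin === new URL(manifestUrl).origin;
}

// Scope enforcement: an action with no declared scope is allowed (an assumption here).
function scopeAllowed(required: string | undefined, granted: readonly string[]): boolean {
  return required === undefined || granted.includes(required);
}
```

Pattern matching like this will produce false positives (a value such as `#invoice` is flagged), so a real implementation would need to weigh strictness against the kinds of free text users legitimately enter.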

See docs/06-security-threat-model.md for the full threat model.

Execution flow

User message
  -> Discover actions on page (SemanticParser)
  -> Build site-wide context from manifest (off-page actions)
  -> LLM plans: { action: "invoice.create", args: { ... } }
  -> If action is on another page: persist conversation, navigate, resume
  -> Validate args against manifest schema (ManifestValidator)
  -> Check policy: risk, confirmation, required fields, arg safety, origin, scope (PolicyEngine)
  -> Execute: fill fields, click submit, read status (AAFAdapter)
  -> Return structured result + semantic log
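
The ordering in this flow matters: execution is the last stage and is only reached if everything before it passed. A minimal sketch of that short-circuiting, with stub stages standing in for ManifestValidator, PolicyEngine, and AAFAdapter (the `Stage` and `RunResult` shapes are hypothetical):

```typescript
interface Plan {
  action: string;
  args: Record<string, unknown>;
}

type StageResult = { ok: true } | { ok: false; reason: string };
type Stage = { name: string; run: (plan: Plan) => StageResult };

interface RunResult {
  status: "executed" | "rejected";
  failedStage?: string;
  reason?: string;
  log: string[]; // semantic log: stage names and action ids, never selectors
}

// Runs stages in order and stops at the first failure.
function runPipeline(plan: Plan, stages: Stage[]): RunResult {
  const log: string[] = [];
  for (const stage of stages) {
    const result = stage.run(plan);
    if (!result.ok) {
      log.push(`${stage.name}: rejected ${plan.action} (${result.reason})`);
      return { status: "rejected", failedStage: stage.name, reason: result.reason, log };
    }
    log.push(`${stage.name}: ok ${plan.action}`);
  }
  return { status: "executed", log };
}

// Stub stages; real ones would validate against the manifest, check policy, and drive the DOM.
const stages: Stage[] = [
  { name: "validate", run: (p) => ("amount" in p.args ? { ok: true } : { ok: false, reason: "missing amount" }) },
  { name: "policy", run: () => ({ ok: true }) },
  { name: "execute", run: () => ({ ok: true }) },
];

console.log(runPipeline({ action: "invoice.create", args: {} }, stages).failedStage); // validate
```

The log entries name stages and actions only, which is what keeps them readable after a UI refactor.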

Documentation

See docs/ for the full proposal:

| Document | Topic |
| --- | --- |
| 01 - Vision and Goals | The problem and why this approach |
| 02 - Standard Spec | Proposed DOM attributes and manifest format |
| 03 - Security and Conformance | Safety rules and conformance levels |
| 04 - Design Principles | Scope and principles |
| 05 - Future | Open questions and future directions |
| 06 - Security Threat Model | 6 enumerated threats with mitigations |
| 08 - HTML Standard Proposal | W3C-style position paper with browser implementation roadmap |
| 09 - Multi-Agent Handoff | Cross-site action chaining protocol |

Status

This is a working prototype — I built it to prove (or disprove) the idea, not to ship a production framework. The prototype includes:

- Core runtime (parser, validator, policy engine with security hardening, logger, arg coercion)
- Agent widget: embeddable <script> for any AAF page, backed by an Ollama or OpenAI-compatible LLM
- Local planner (Ollama), prompt builder, response parser
- Linter, code generator, falsification benchmark (10 fixture pairs, 60+ tests)
- Typed planner/runtime contracts with selector rejection
- Framework adapters: Next.js (AgentForm, withAgentAction), SvelteKit (AgentAction.svelte, server hook), React, Vue
- WebMCP bridge: auto-registers AAF actions as navigator.modelContext tools on Chrome 146+
- Multi-agent handoff protocol: cross-site action chaining with trust model
- llms.txt generator: AI crawler discovery from manifest
- ESLint plugin, Vite plugin
- Interactive docs site (data chat mode; you can ask it questions about AAF)
- Real-world sample app (ProjectHub: 5 pages, 5 actions, 3 data views)

The widget demonstrates the full loop — chat, plan, validate, execute, confirm — running directly on any annotated page. It supports cross-page navigation: if the user requests an action that exists on a different page, the widget auto-navigates there and resumes the conversation. On pages with only data collections (no actions), it enters data chat mode where you can ask questions about the visible content.
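
The persist, navigate, resume step can be sketched against a minimal storage interface. In a browser, `sessionStorage` satisfies this interface; the in-memory store below exists only so the example runs elsewhere. The storage key, the `PendingState` shape, and the function names are assumptions, not the widget's actual implementation.

```typescript
interface Plan {
  action: string;
  args: Record<string, unknown>;
}

interface PendingState {
  messages: string[];
  plan: Plan;
}

// Minimal storage contract; in a browser, sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const STORAGE_KEY = "aaf.pending"; // hypothetical key name

// Called before navigating: stash the conversation and the plan to run on arrival.
function persistBeforeNavigation(store: KeyValueStore, state: PendingState): void {
  store.setItem(STORAGE_KEY, JSON.stringify(state));
}

// Called on page load: returns the stashed state once, then clears it so the
// action cannot be replayed by a later reload.
function resumeAfterNavigation(store: KeyValueStore): PendingState | null {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return null;
  store.removeItem(STORAGE_KEY);
  return JSON.parse(raw) as PendingState;
}

// In-memory store so the sketch runs outside a browser.
function memoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => { data.set(key, value); },
    removeItem: (key) => { data.delete(key); },
  };
}
```

The resumed plan would still go through validation and policy on the destination page, so a high-risk action such as `workspace.delete` still hits the confirmation dialog after navigation.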

I think the interesting question now is whether this pattern works on real product flows, not just sample apps. If you try it and have thoughts, I'd love to hear them.
