4 changes: 4 additions & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
* text=auto eol=lf
fixtures/*.csv binary
fixtures/*.json binary
fixtures/*.jsonl binary
52 changes: 52 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,52 @@
name: CI

on:
push:
branches: [main]
pull_request:
branches: [main]

permissions:
contents: read

jobs:
test:
name: Node ${{ matrix.node }} on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
node: [20, 22]
os: [ubuntu-latest, macos-latest]
steps:
# Until @certifieddata/verify is published to npm, this repo depends on
# it via `file:../verify`, so CI checks out both repositories as siblings.
# After the first publish, drop the verify checkout and switch this repo's
# dependency to a published version range.
- uses: actions/checkout@v4
with:
path: reference-impl
- uses: actions/checkout@v4
with:
repository: certifieddata/verify
# Match the branch name on the verify repo for PR builds; fall back
# to main on push-to-main builds (post-merge state).
ref: ${{ github.head_ref || 'main' }}
path: verify
- uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node }}
- name: Build verify
working-directory: verify
run: |
npm install
npm run build
- name: Install reference-impl
working-directory: reference-impl
run: npm install
- name: Typecheck
working-directory: reference-impl
run: npm run typecheck
- name: Test
working-directory: reference-impl
run: npm test
8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
node_modules/
dist/
*.log
.DS_Store
.env
.env.local
coverage/
*.tgz
75 changes: 75 additions & 0 deletions ARTICLE_12_MAPPING.md
@@ -0,0 +1,75 @@
# EU AI Act Article 12 → ledger event mapping

This document maps each requirement in **Article 12 ("Record-keeping")** of the EU AI Act onto a concrete field in the Decision Ledger event shape that this reference implementation emits.

> **This is a reference implementation, not legal advice.** Final Article 12 compliance is determined by the deployer's risk management system per Article 9 of the EU AI Act. The mapping below is one defensible interpretation, not the only one.

## Article 12(1) — Logging obligation

> *High-risk AI systems shall technically allow for the automatic recording of events ('logs') over the lifetime of the system.*

| Article 12 requirement | Ledger event field | Notes |
|---|---|---|
| Recording of events ('logs') over the lifetime of the system | `entries[]` append-only | Every `POST /decide` produces one entry. The ledger is hash-chained: tampering with any entry is detectable in O(n). |
| Tamper evidence | `prev_hash`, `this_hash` | `this_hash = sha256(canonicalize({event, prev_hash, sequence}))`. Any reorder, mutation, or insertion breaks the chain. The `verifyChain()` helper in `src/ledger.ts` re-walks and confirms. |
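The tamper-evidence property above can be sketched as a plain re-walk of the chain. This is an illustrative stand-in for `verifyChain()` in `src/ledger.ts`, not its actual source; in particular, the `canonicalize()` here is a simple sorted-key serialization, and the real library may use a different canonical form (e.g. RFC 8785 JCS):

```typescript
import { createHash } from "node:crypto";

interface Entry {
  event: unknown;
  prev_hash: string | null;
  sequence: number;
  this_hash: string;
}

// Hypothetical canonicalization: recursive, sorted object keys.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// this_hash = sha256(canonicalize({event, prev_hash, sequence}))
function hashEntry(e: { event: unknown; prev_hash: string | null; sequence: number }): string {
  return createHash("sha256")
    .update(canonicalize({ event: e.event, prev_hash: e.prev_hash, sequence: e.sequence }))
    .digest("hex");
}

// Re-walk the chain: every entry must hash to its recorded this_hash
// and link back to the previous entry's this_hash. Any mutation,
// reorder, or insertion breaks one of these checks.
function verifyChainSketch(entries: Entry[]): boolean {
  let prev: string | null = null;
  for (const [i, e] of entries.entries()) {
    if (e.sequence !== i || e.prev_hash !== prev) return false;
    if (hashEntry(e) !== e.this_hash) return false;
    prev = e.this_hash;
  }
  return true;
}
```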

## Article 12(2)(a) — Period of use

> *Logs shall record at least: the period of each use of the system.*

| Article 12 requirement | Ledger event field | Notes |
|---|---|---|
| Period of each use of the system | `event.timestamp` | ISO-8601 UTC, recorded at decision time. If your inference takes non-trivial time, record `started_at`/`ended_at` as well. |

## Article 12(2)(b) — Reference database checked against

> *Logs shall record at least: the reference database against which input data has been checked by the system.*

| Article 12 requirement | Ledger event field | Notes |
|---|---|---|
| Reference database against which input data has been checked | `event.training_cert_id` | The CertifiedData.io certification ID that bound the training dataset to the deployed model. The dataset itself is not stored in the ledger — its hash is bound into the cert, and the cert is verified at startup. |
| Dataset integrity | (transitively) `cert.dataset_hash` | Stored on the `cert.v1` document, not on the ledger entry; re-verifiable any time with `certifieddata-verify`. |

## Article 12(2)(c) — Input data leading to a match

> *Logs shall record at least: the input data for which the search has led to a match.*

| Article 12 requirement | Ledger event field | Notes |
|---|---|---|
| Input data leading to a match | `event.input_hash` | `sha256(JSON(input))`. The raw input is **not** written to the ledger to limit GDPR/PII exposure; the hash binds the decision to a specific input without retaining the input itself. If the deployer's risk management requires retaining inputs, store them in a separate, access-controlled store and add `input_storage_ref` to the event. |
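A minimal sketch of the `input_hash` computation, assuming plain `JSON.stringify` over a fixed field order (production code should canonicalize before hashing so semantically equal inputs hash equally):

```typescript
import { createHash } from "node:crypto";

// Bind a decision to its input without retaining the input itself:
// sha256 over the JSON serialization of the request body.
function inputHash(input: unknown): string {
  return createHash("sha256").update(JSON.stringify(input)).digest("hex");
}
```

Given the same input object, the hash is reproducible by anyone holding the original request, which is what lets a deployer prove which input produced a logged decision.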

## Article 12(2)(d) — Identification of natural persons involved in verification

> *Logs shall record at least: the identification of the natural persons involved in the verification of the results.*

| Article 12 requirement | Ledger event field | Notes |
|---|---|---|
| Identification of natural persons involved in verification | `event.reviewer_id` | Optional. Null for fully-automated decisions. Set by the human-in-the-loop reviewer when the system routes a decision to them. The deployer's identity-management system maps `reviewer_id` back to a real person under controlled conditions. |

## Beyond Article 12

These fields are not required by Article 12 but are emitted to support broader EU AI Act obligations:

| Field | Why |
|---|---|
| `event.model_version` | Article 14 (human oversight) and Article 15 (accuracy) require knowing which model produced a given output. |
| `event.output` | Article 13 (transparency) and Article 14 require explainability of outputs to affected persons. |
| `event.schema_version` | Schema evolution. Pinned to `article12.v1` for this release. |

## What's deliberately not in the ledger

- **Raw inputs.** Hashed only — see Article 12(2)(c) above.
- **Model weights.** Bound to the training cert, not the ledger.
- **Training data rows.** Bound to the training cert by hash, not the ledger.
- **PII about the affected person.** The deployer's data-protection regime (GDPR Article 32, AI Act Recital 60) governs this. We default to "log nothing personal" and let the deployer add fields under their controllership.

## Verifying the chain

Anyone with access to an evidence bundle can verify the chain locally without trusting the ledger backend:

```ts
import { verifyChain } from "@certifieddata/reference-impl";
const ok = verifyChain(bundle.entries); // boolean
```

The reference implementation re-runs `verifyChain` on every `evidence/:id` response and exposes the result as `chain_verified` so consumers don't have to.
23 changes: 23 additions & 0 deletions Dockerfile
@@ -0,0 +1,23 @@
# Production image. Assumes @certifieddata/verify is published to npm.
# For local development before that publish, use `npm install && npm start`
# directly — the Dockerfile pins to a real npm version, not file:../verify.

FROM node:22-alpine AS build
WORKDIR /app
COPY package.json package-lock.json* ./
RUN npm install --omit=optional
COPY tsconfig.json ./
COPY src ./src
COPY fixtures ./fixtures
RUN npm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production
ENV PORT=3000
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY --from=build /app/fixtures ./fixtures
COPY --from=build /app/package.json ./package.json
EXPOSE 3000
CMD ["node", "dist/server.js"]
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2026 Certified Data
Copyright (c) 2026 CertifiedData.io

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
129 changes: 128 additions & 1 deletion README.md
@@ -1 +1,128 @@
# reference-impl
# reference-impl

[![CI](https://github.com/certifieddata/reference-impl/actions/workflows/ci.yml/badge.svg)](https://github.com/certifieddata/reference-impl/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Use this template](https://img.shields.io/badge/use%20this-template-181717?logo=github)](https://github.com/certifieddata/reference-impl/generate)

> Reference implementation: a high-risk AI system logging EU AI Act Article 12 evidence with CertifiedData.io's Decision Ledger.

A ~50-line credit-scoring service that:

1. **Refuses to boot** unless its training-data certificate verifies via [`@certifieddata/verify`](https://github.com/certifieddata/verify).
2. **Appends an Article 12 event** to a hash-chained Decision Ledger for every decision.
3. **Exports a regulator-ready evidence bundle** for any decision via `/evidence/:id`.

## Quickstart

```bash
git clone https://github.com/certifieddata/reference-impl.git
cd reference-impl
docker compose up
```

Then in another terminal:

```bash
curl -X POST localhost:3000/decide \
-H 'content-type: application/json' \
-d '{"income": 80000, "debt": 20000}'
# → {"decision_id":"...","approved":true,"score":0.6}

curl localhost:3000/evidence/<decision_id>
# → {"decision_id":"...","entries":[...],"chain_verified":true}
```

## What this proves

- Training data was **certified** as synthetic before it touched the model — the service literally refuses to start otherwise.
- **Every decision** is hash-chained, tamper-evident, and linked back to the training certificate.
- A **regulator can replay** any decision from the evidence bundle and re-verify the chain locally.

## The 50-line app

[`src/app.ts`](src/app.ts) is the entire example. Imports and comments aside, it is 33 lines of meaningful code:

```ts
export async function createApp(opts: AppOptions): Promise<{ app: Hono; certId: string }> {
const cert = await fetchCert(opts.trainingCert, { offline: true });
const keys = await loadKeys({ keysFile: opts.keys, offline: true });
const verdict = await verifyCertificate(cert, keys);
if (verdict.verdict !== "VALID") throw new Error(`training data not verified: ${verdict.reason}`);

const ledger = makeLedger({ url: opts.ledgerUrl });
const modelVersion = opts.modelVersion ?? "credit-v3.2.1";
const app = new Hono();

app.post("/decide", async (c) => {
const input = await c.req.json<{ income?: number; debt?: number }>();
const score = scoreCredit(input);
    const decision = { decision_id: randomUUID(), approved: score >= 0.6, score };
await ledger.append(article12Event({
decision_id: decision.decision_id,
training_cert_id: cert.certification_id,
model_version: modelVersion,
input, output: { approved: decision.approved, score },
timestamp: new Date().toISOString(),
}));
return c.json(decision);
});

app.get("/evidence/:id", async (c) => c.json(await ledger.evidenceBundle(c.req.param("id"))));

return { app, certId: cert.certification_id };
}
```

Swap `scoreCredit` for your real inference, point `LEDGER_URL` at a production Decision Ledger, and you have an Article 12-compliant logging surface.

## Article 12 mapping

Every field in the ledger event maps explicitly to a paragraph in EU AI Act Article 12. See [ARTICLE_12_MAPPING.md](ARTICLE_12_MAPPING.md) for the full table.
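Pulling that mapping together, a single logged event might look like the following. This is an illustrative shape only: the field names come from the mapping table, but every value below is hypothetical, and the authoritative shape is whatever `article12Event()` emits:

```typescript
// Illustrative only — field names per ARTICLE_12_MAPPING.md,
// all values hypothetical.
const exampleEvent = {
  schema_version: "article12.v1",
  decision_id: "d_demo_001",
  timestamp: "2026-01-15T12:00:00.000Z",
  training_cert_id: "cert_example_0001",          // hypothetical cert ID
  model_version: "credit-v3.2.1",
  input_hash: "<sha256 hex of the JSON input>",   // placeholder, not a real digest
  output: { approved: true, score: 0.6 },
  reviewer_id: null,                              // null = fully automated decision
};
```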

## Run it locally

```bash
npm install
npm run fixtures # regenerate signed training cert + keys
npm run build
npm test # 3 smoke tests
npm start # serve on :3000

# Or with Docker:
docker compose up
```

The fixtures include a real Ed25519 keypair and a real signed `cert.v1`. They are committed so reviewers can verify the verifier without trusting any pre-built artifact.
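Conceptually, `npm run fixtures` does something like the following with Node's built-in crypto: mint an Ed25519 keypair and sign a payload. This is a sketch of the signing primitive only; the actual fixture script's output format (`keys.json`, `training-cert.json`) is defined by the repo, and the `dataset_hash` value here is a placeholder:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Mint an Ed25519 keypair and sign a payload.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const payload = Buffer.from(JSON.stringify({ dataset_hash: "sha256:…" }));

// For Ed25519, Node's sign()/verify() take null as the algorithm.
const signature = sign(null, payload, privateKey);
const ok = verify(null, payload, publicKey, signature); // → true
```

Because the keypair and signed cert are committed, a reviewer can rerun verification against known bytes rather than trusting a build step.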

## Examples

```bash
npm run example:01 # log one Article 12 event
npm run example:02 # verify the training-data certificate
npm run example:03 # export an evidence bundle and re-verify the chain
```

## Use this as a template

```bash
gh repo create my-org/my-credit-scoring-service \
--template certifieddata/reference-impl \
--public
```

Then replace `scoreCredit` with your real model inference and point `LEDGER_URL` at your production Decision Ledger instead of the default in-memory ledger.

## What this is not

- **Not a certified compliance product.** Do not deploy this and tell a regulator your obligations are met.
- **Not legal advice.** Final Article 12 compliance is determined by the deployer's risk management system per Article 9 of the EU AI Act.
- **Not a substitute for Article 9 risk management documentation.** This implementation handles the recording obligation, not the risk-assessment obligation.

## Related projects

- [`@certifieddata/verify`](https://github.com/certifieddata/verify) — the audit-friendly CLI/SDK this app imports
- [`@certifieddata/pii-scan`](https://github.com/certifieddata/pii-scan) — scan datasets for PII before certifying them

## License

MIT — see [LICENSE](LICENSE).
46 changes: 46 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,46 @@
services:
reference-impl:
build:
context: .
dockerfile: Dockerfile
image: certifieddata/reference-impl:latest
ports:
- "3000:3000"
environment:
TRAINING_CERT: /app/fixtures/training-cert.json
TRAINING_KEYS: /app/fixtures/keys.json
MODEL_VERSION: credit-v3.2.1
# LEDGER_URL: http://ledger:8080 # uncomment to use the HTTP ledger below
depends_on:
postgres:
condition: service_healthy

postgres:
image: postgres:16-alpine
environment:
POSTGRES_USER: ledger
POSTGRES_PASSWORD: ledger
POSTGRES_DB: ledger
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ledger"]
interval: 5s
timeout: 3s
retries: 10
volumes:
- ledger-data:/var/lib/postgresql/data

# Stub for a future production Decision Ledger service.
# Uncomment when wiring up against a real ledger backend:
#
# ledger:
# image: certifieddata/ledger:latest
# environment:
# POSTGRES_URL: postgres://ledger:ledger@postgres:5432/ledger
# depends_on:
# postgres:
# condition: service_healthy
# ports:
# - "8080:8080"

volumes:
ledger-data:
3 changes: 3 additions & 0 deletions fixtures/decisions.jsonl
@@ -0,0 +1,3 @@
{"decision_id":"d_demo_001","input":{"income":80000,"debt":20000},"output":{"approved":true,"score":0.6}}
{"decision_id":"d_demo_002","input":{"income":30000,"debt":25000},"output":{"approved":false,"score":0.05}}
{"decision_id":"d_demo_003","input":{"income":100000,"debt":5000},"output":{"approved":true,"score":0.95}}