4 changes: 4 additions & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
* text=auto eol=lf
fixtures/*.csv binary
fixtures/*.json binary
fixtures/*.jsonl binary
52 changes: 52 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,52 @@
name: CI

on:
push:
branches: [main]
pull_request:
branches: [main]

permissions:
contents: read

jobs:
test:
name: Node ${{ matrix.node }} on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
node: [20, 22]
os: [ubuntu-latest, macos-latest]
steps:
# Until @certifieddata/verify is published to npm, this repo depends on
# it via `file:../verify`, so CI checks out both repositories as siblings.
# After the first publish, drop the verify checkout and switch this repo's
# dependency to a published version range.
- uses: actions/checkout@v4
with:
path: reference-impl
- uses: actions/checkout@v4
with:
repository: certifieddata/verify
# Match the branch name on the verify repo for PR builds; fall back
# to main on push-to-main builds (post-merge state).
ref: ${{ github.head_ref || 'main' }}
path: verify
- uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node }}
- name: Build verify
working-directory: verify
run: |
npm install
npm run build
- name: Install reference-impl
working-directory: reference-impl
run: npm install
- name: Typecheck
working-directory: reference-impl
run: npm run typecheck
- name: Test
working-directory: reference-impl
run: npm test
8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
node_modules/
dist/
*.log
.DS_Store
.env
.env.local
coverage/
*.tgz
75 changes: 75 additions & 0 deletions ARTICLE_12_MAPPING.md
@@ -0,0 +1,75 @@
# EU AI Act Article 12 → ledger event mapping

This document maps each requirement in **Article 12 ("Record-keeping")** of the EU AI Act onto a concrete field in the Decision Ledger event shape that this reference implementation emits.

> **This is a reference implementation, not legal advice.** Final Article 12 compliance is determined by the deployer's risk management system per Article 9 of the EU AI Act. The mapping below is one defensible interpretation, not the only one.

## Article 12(1) — Logging obligation

> *High-risk AI systems shall technically allow for the automatic recording of events ('logs') over the lifetime of the system.*

| Article 12 requirement | Ledger event field | Notes |
|---|---|---|
| Recording of events ('logs') over the lifetime of the system | `entries[]` append-only | Every `POST /decide` produces one entry. The ledger is hash-chained: tampering with any entry is detectable in O(n). |
| Tamper evidence | `prev_hash`, `this_hash` | `this_hash = sha256(canonicalize({event, prev_hash, sequence}))`. Any reorder, mutation, or insertion breaks the chain. The `verifyChain()` helper in `src/ledger.ts` re-walks and confirms. |
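The tamper-evidence property above can be sketched as a plain re-walk of the chain. This is an illustrative stand-in for `verifyChain()` in `src/ledger.ts`, not its actual source; in particular, the `canonicalize()` here is a simple sorted-key serialization, and the real library may use a different canonical form (e.g. RFC 8785 JCS):

```typescript
import { createHash } from "node:crypto";

interface Entry {
  event: unknown;
  prev_hash: string | null;
  sequence: number;
  this_hash: string;
}

// Hypothetical canonicalization: recursive, sorted object keys.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// this_hash = sha256(canonicalize({event, prev_hash, sequence}))
function hashEntry(e: { event: unknown; prev_hash: string | null; sequence: number }): string {
  return createHash("sha256")
    .update(canonicalize({ event: e.event, prev_hash: e.prev_hash, sequence: e.sequence }))
    .digest("hex");
}

// Re-walk the chain: every entry must hash to its recorded this_hash
// and link back to the previous entry's this_hash. Any mutation,
// reorder, or insertion breaks one of these checks.
function verifyChainSketch(entries: Entry[]): boolean {
  let prev: string | null = null;
  for (const [i, e] of entries.entries()) {
    if (e.sequence !== i || e.prev_hash !== prev) return false;
    if (hashEntry(e) !== e.this_hash) return false;
    prev = e.this_hash;
  }
  return true;
}
```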

## Article 12(2)(a) — Period of use

> *Logs shall record at least: the period of each use of the system.*

| Article 12 requirement | Ledger event field | Notes |
|---|---|---|
| Period of each use of the system | `event.timestamp` | ISO-8601 UTC, recorded at decision time. If your inference takes non-trivial time, record `started_at`/`ended_at` as well. |

## Article 12(2)(b) — Reference database checked against

> *Logs shall record at least: the reference database against which input data has been checked by the system.*

| Article 12 requirement | Ledger event field | Notes |
|---|---|---|
| Reference database against which input data has been checked | `event.training_cert_id` | The CertifiedData.io certification ID that bound the training dataset to the deployed model. The dataset itself is not stored in the ledger — its hash is bound into the cert, and the cert is verified at startup. |
| Dataset integrity | (transitively) `cert.dataset_hash` | Stored on the `cert.v1` document, not on the ledger entry; re-verifiable any time with `certifieddata-verify`. |

## Article 12(2)(c) — Input data leading to a match

> *Logs shall record at least: the input data for which the search has led to a match.*

| Article 12 requirement | Ledger event field | Notes |
|---|---|---|
| Input data leading to a match | `event.input_hash` | `sha256(JSON(input))`. The raw input is **not** written to the ledger to limit GDPR/PII exposure; the hash binds the decision to a specific input without retaining the input itself. If the deployer's risk management requires retaining inputs, store them in a separate, access-controlled store and add `input_storage_ref` to the event. |
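A minimal sketch of the `input_hash` computation, assuming plain `JSON.stringify` over a fixed field order (production code should canonicalize before hashing so semantically equal inputs hash equally):

```typescript
import { createHash } from "node:crypto";

// Bind a decision to its input without retaining the input itself:
// sha256 over the JSON serialization of the request body.
function inputHash(input: unknown): string {
  return createHash("sha256").update(JSON.stringify(input)).digest("hex");
}
```

Given the same input object, the hash is reproducible by anyone holding the original request, which is what lets a deployer prove which input produced a logged decision.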

## Article 12(2)(d) — Identification of natural persons involved in verification

> *Logs shall record at least: the identification of the natural persons involved in the verification of the results.*

| Article 12 requirement | Ledger event field | Notes |
|---|---|---|
| Identification of natural persons involved in verification | `event.reviewer_id` | Optional. Null for fully-automated decisions. Set by the human-in-the-loop reviewer when the system routes a decision to them. The deployer's identity-management system maps `reviewer_id` back to a real person under controlled conditions. |

## Beyond Article 12

These fields are not required by Article 12 but are emitted to support broader EU AI Act obligations:

| Field | Why |
|---|---|
| `event.model_version` | Article 14 (human oversight) and Article 15 (accuracy) require knowing which model produced a given output. |
| `event.output` | Article 13 (transparency) and Article 14 require explainability of outputs to affected persons. |
| `event.schema_version` | Schema evolution. Pinned to `article12.v1` for this release. |

## What's deliberately not in the ledger

- **Raw inputs.** Hashed only — see Article 12(2)(c) above.
- **Model weights.** Bound to the training cert, not the ledger.
- **Training data rows.** Bound to the training cert by hash, not the ledger.
- **PII about the affected person.** The deployer's data-protection regime (GDPR Article 32, AI Act Recital 60) governs this. We default to "log nothing personal" and let the deployer add fields under their controllership.

## Verifying the chain

Anyone with access to an evidence bundle can verify the chain locally without trusting the ledger backend:

```ts
import { verifyChain } from "@certifieddata/reference-impl";
const ok = verifyChain(bundle.entries); // boolean
```

The reference implementation re-runs `verifyChain` on every `evidence/:id` response and exposes the result as `chain_verified` so consumers don't have to.
23 changes: 23 additions & 0 deletions Dockerfile
@@ -0,0 +1,23 @@
# Production image. Assumes @certifieddata/verify is published to npm.
# For local development before that publish, use `npm install && npm start`
# directly — the Dockerfile pins to a real npm version, not file:../verify.

FROM node:22-alpine AS build
WORKDIR /app
COPY package.json package-lock.json* ./
RUN npm install --omit=optional
COPY tsconfig.json ./
COPY src ./src
COPY fixtures ./fixtures
RUN npm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production
ENV PORT=3000
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY --from=build /app/fixtures ./fixtures
COPY --from=build /app/package.json ./package.json
EXPOSE 3000
CMD ["node", "dist/server.js"]
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2026 Certified Data
Copyright (c) 2026 CertifiedData.io

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
129 changes: 128 additions & 1 deletion README.md
@@ -1 +1,128 @@
# reference-impl
# reference-impl

[![CI](https://github.com/certifieddata/reference-impl/actions/workflows/ci.yml/badge.svg)](https://github.com/certifieddata/reference-impl/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Use this template](https://img.shields.io/badge/use%20this-template-181717?logo=github)](https://github.com/certifieddata/reference-impl/generate)

> Reference implementation: a high-risk AI system logging EU AI Act Article 12 evidence with CertifiedData.io's Decision Ledger.

A ~50-line credit-scoring service that:

1. **Refuses to boot** unless its training-data certificate verifies via [`@certifieddata/verify`](https://github.com/certifieddata/verify).
2. **Appends an Article 12 event** to a hash-chained Decision Ledger for every decision.
3. **Exports a regulator-ready evidence bundle** for any decision via `/evidence/:id`.

## Quickstart

```bash
git clone https://github.com/certifieddata/reference-impl.git
cd reference-impl
docker compose up
```

Then in another terminal:

```bash
curl -X POST localhost:3000/decide \
-H 'content-type: application/json' \
-d '{"income": 80000, "debt": 20000}'
# → {"decision_id":"...","approved":true,"score":0.6}

curl localhost:3000/evidence/<decision_id>
# → {"decision_id":"...","entries":[...],"chain_verified":true}
```

## What this proves

- Training data was **certified** as synthetic before it touched the model — the service literally refuses to start otherwise.
- **Every decision** is hash-chained, tamper-evident, and linked back to the training certificate.
- A **regulator can replay** any decision from the evidence bundle and re-verify the chain locally.

## The 50-line app

[`src/app.ts`](src/app.ts) is the entire example. Imports and comments aside, it is 33 lines of meaningful code:

```ts
export async function createApp(opts: AppOptions): Promise<{ app: Hono; certId: string }> {
const cert = await fetchCert(opts.trainingCert, { offline: true });
const keys = await loadKeys({ keysFile: opts.keys, offline: true });
const verdict = await verifyCertificate(cert, keys);
if (verdict.verdict !== "VALID") throw new Error(`training data not verified: ${verdict.reason}`);

const ledger = makeLedger({ url: opts.ledgerUrl });
const modelVersion = opts.modelVersion ?? "credit-v3.2.1";
const app = new Hono();

app.post("/decide", async (c) => {
const input = await c.req.json<{ income?: number; debt?: number }>();
const score = scoreCredit(input);
    const decision = { decision_id: randomUUID(), approved: score >= 0.6, score };
await ledger.append(article12Event({
decision_id: decision.decision_id,
training_cert_id: cert.certification_id,
model_version: modelVersion,
input, output: { approved: decision.approved, score },
timestamp: new Date().toISOString(),
}));
return c.json(decision);
});

app.get("/evidence/:id", async (c) => c.json(await ledger.evidenceBundle(c.req.param("id"))));

return { app, certId: cert.certification_id };
}
```

Swap `scoreCredit` for your real inference, point `LEDGER_URL` at a production Decision Ledger, and you have an Article 12-compliant logging surface.

## Article 12 mapping

Every field in the ledger event maps explicitly to a paragraph in EU AI Act Article 12. See [ARTICLE_12_MAPPING.md](ARTICLE_12_MAPPING.md) for the full table.
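Pulling that mapping together, a single logged event might look like the following. This is an illustrative shape only: the field names come from the mapping table, but every value below is hypothetical, and the authoritative shape is whatever `article12Event()` emits:

```typescript
// Illustrative only — field names per ARTICLE_12_MAPPING.md,
// all values hypothetical.
const exampleEvent = {
  schema_version: "article12.v1",
  decision_id: "d_demo_001",
  timestamp: "2026-01-15T12:00:00.000Z",
  training_cert_id: "cert_example_0001",          // hypothetical cert ID
  model_version: "credit-v3.2.1",
  input_hash: "<sha256 hex of the JSON input>",   // placeholder, not a real digest
  output: { approved: true, score: 0.6 },
  reviewer_id: null,                              // null = fully automated decision
};
```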

## Run it locally

```bash
npm install
npm run fixtures # regenerate signed training cert + keys
npm run build
npm test # 3 smoke tests
npm start # serve on :3000

# Or with Docker:
docker compose up
```

The fixtures include a real Ed25519 keypair and a real signed `cert.v1`. They are committed so reviewers can verify the verifier without trusting any pre-built artifact.
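Conceptually, `npm run fixtures` does something like the following with Node's built-in crypto: mint an Ed25519 keypair and sign a payload. This is a sketch of the signing primitive only; the actual fixture script's output format (`keys.json`, `training-cert.json`) is defined by the repo, and the `dataset_hash` value here is a placeholder:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Mint an Ed25519 keypair and sign a payload.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const payload = Buffer.from(JSON.stringify({ dataset_hash: "sha256:…" }));

// For Ed25519, Node's sign()/verify() take null as the algorithm.
const signature = sign(null, payload, privateKey);
const ok = verify(null, payload, publicKey, signature); // → true
```

Because the keypair and signed cert are committed, a reviewer can rerun verification against known bytes rather than trusting a build step.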

## Examples

```bash
npm run example:01 # log one Article 12 event
npm run example:02 # verify the training-data certificate
npm run example:03 # export an evidence bundle and re-verify the chain
```

## Use this as a template

```bash
gh repo create my-org/my-credit-scoring-service \
--template certifieddata/reference-impl \
--public
```

Then replace `scoreCredit` with your real model inference and point `LEDGER_URL` at your production Decision Ledger instead of the default in-memory ledger.

## What this is not

- **Not a certified compliance product.** Do not deploy this and tell a regulator your obligations are met.
- **Not legal advice.** Final Article 12 compliance is determined by the deployer's risk management system per Article 9 of the EU AI Act.
- **Not a substitute for Article 9 risk management documentation.** This implementation handles the recording obligation, not the risk-assessment obligation.

## Related projects

- [`@certifieddata/verify`](https://github.com/certifieddata/verify) — the audit-friendly CLI/SDK this app imports
- [`@certifieddata/pii-scan`](https://github.com/certifieddata/pii-scan) — scan datasets for PII before certifying them

## License

MIT — see [LICENSE](LICENSE).
46 changes: 46 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,46 @@
services:
reference-impl:
build:
context: .
dockerfile: Dockerfile
image: certifieddata/reference-impl:latest
ports:
- "3000:3000"
environment:
TRAINING_CERT: /app/fixtures/training-cert.json
TRAINING_KEYS: /app/fixtures/keys.json
MODEL_VERSION: credit-v3.2.1
# LEDGER_URL: http://ledger:8080 # uncomment to use the HTTP ledger below
depends_on:
postgres:
condition: service_healthy

postgres:
image: postgres:16-alpine
environment:
POSTGRES_USER: ledger
POSTGRES_PASSWORD: ledger
POSTGRES_DB: ledger
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ledger"]
interval: 5s
timeout: 3s
retries: 10
volumes:
- ledger-data:/var/lib/postgresql/data

# Stub for a future production Decision Ledger service.
# Uncomment when wiring up against a real ledger backend:
#
# ledger:
# image: certifieddata/ledger:latest
# environment:
# POSTGRES_URL: postgres://ledger:ledger@postgres:5432/ledger
# depends_on:
# postgres:
# condition: service_healthy
# ports:
# - "8080:8080"

volumes:
ledger-data:
3 changes: 3 additions & 0 deletions fixtures/decisions.jsonl
@@ -0,0 +1,3 @@
{"decision_id":"d_demo_001","input":{"income":80000,"debt":20000},"output":{"approved":true,"score":0.6}}
{"decision_id":"d_demo_002","input":{"income":30000,"debt":25000},"output":{"approved":false,"score":0.05}}
{"decision_id":"d_demo_003","input":{"income":100000,"debt":5000},"output":{"approved":true,"score":0.95}}