Merged
3 changes: 3 additions & 0 deletions .gitattributes
@@ -0,0 +1,3 @@
* text=auto eol=lf
fixtures/*.csv binary
fixtures/*.json binary
31 changes: 31 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,31 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  test:
    name: Node ${{ matrix.node }} on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        node: [20, 22]
        os: [ubuntu-latest, macos-latest, windows-latest]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck
      - run: npm test
      - run: npm pack --dry-run
25 changes: 25 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,25 @@
name: Publish to npm

on:
  release:
    types: [published]

permissions:
  contents: read
  id-token: write

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          registry-url: https://registry.npmjs.org
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck
      - run: npm test
      - run: npm publish --provenance --access public
8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
node_modules/
dist/
*.log
.DS_Store
.env
.env.local
coverage/
*.tgz
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,20 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/).

## [0.1.0] - Unreleased

### Added

- Initial public release of `@certifieddata/verify`.
- `certifieddata-verify` and `cd-verify` binaries.
- RFC 8785 JCS canonicalizer (`canonicalize.ts`).
- Ed25519 signature verification using `node:crypto` only — zero third-party crypto dependencies.
- `cert.v1` schema support.
- `--dataset`, `--json`, `--offline`, `--keys`, `--no-cache` flags.
- Trusted-keys document fetched from `https://certifieddata.io/.well-known/certifieddata-keys.json` with TTL cache at `~/.certifieddata/keys.json`.
- Six exit codes documented in the README and `--help`.
- 34 tests across canonicalize, verify, and CLI suites.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2026 Certified Data
Copyright (c) 2026 CertifiedData.io

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
114 changes: 113 additions & 1 deletion README.md
@@ -1 +1,113 @@
# verify
# @certifieddata/verify

[![npm](https://img.shields.io/npm/v/@certifieddata/verify.svg)](https://www.npmjs.com/package/@certifieddata/verify)
[![CI](https://github.com/certifieddata/verify/actions/workflows/ci.yml/badge.svg)](https://github.com/certifieddata/verify/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Node](https://img.shields.io/node/v/@certifieddata/verify.svg)](package.json)

> Verify CertifiedData.io certificates from the command line. Audit-friendly, zero crypto dependencies.

## Install + verify in three lines

```bash
npm install -g @certifieddata/verify
certifieddata-verify ce_01HXYZ123abc... --dataset path/to/data.csv
# → ✓ VALID certification_id ce_01HXYZ123abc...
```

## What this verifies

- **The signature.** `cert.signature` is an Ed25519 signature over the RFC 8785 JCS canonicalization of the rest of the certificate. We re-canonicalize, re-verify, and refuse to claim a cert is valid unless the signature checks out.
- **The signer.** `cert.key_id` must appear in the issuer's published [`.well-known` keys document](https://certifieddata.io/.well-known/certifieddata-keys.json) and must not be revoked.
- **The dataset (optional).** When `--dataset <path>` is supplied, we stream-hash the file and refuse to claim a match unless its SHA-256 is bit-identical to `cert.dataset_hash`.
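
The dataset check above can be sketched as follows. This is a minimal illustration, not the repository's code: the name `sha256File` appears in SECURITY.md, but its signature here, the `datasetMatches` helper, and the use of a `sha256:` prefix on the digest (taken from the `--json` example) are assumptions.

```ts
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

// Assumed signature: stream the file through SHA-256 so large datasets
// are never loaded into memory in one piece.
export function sha256File(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("sha256");
    createReadStream(path)
      .on("data", (chunk) => hash.update(chunk))
      .on("error", reject)
      .on("end", () => resolve(`sha256:${hash.digest("hex")}`));
  });
}

// Hypothetical helper: the match is exact string equality on the
// prefixed hex digest, with no normalization of either side.
export function datasetMatches(expected: string, actual: string): boolean {
  return expected === actual;
}
```

A caller would compare `await sha256File(path)` against `cert.dataset_hash` and report a mismatch on any difference.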

## Why audit-friendly

The whole verification routine lives in [`src/verify.ts`](src/verify.ts) — under 100 lines, no clever indirection, no third-party crypto. We use `node:crypto` directly:

```ts
// For Ed25519 keys, Node expects a null digest algorithm; the key type selects the scheme.
const ok = crypto.verify(null, canonicalBytes, publicKey, signatureBytes);
```

If you can read TypeScript, you can audit our verifier in five minutes.
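
To see the primitive in isolation, here is a small round trip using only `node:crypto`. It is a sketch under assumptions: the keypair is a throwaway generated on the spot, and the byte strings are made-up stand-ins for a canonicalized certificate. It passes `null` as the algorithm, which is how the Node.js documentation describes signing and verifying with Ed25519 keys.

```ts
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Throwaway keypair for illustration; a real verifier loads the issuer's public key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Stand-in for the canonicalized certificate bytes (hypothetical content).
const canonicalBytes = Buffer.from('{"certification_id":"ce_demo","rows":100}', "utf8");

// Ed25519 takes no separate digest, so the algorithm argument is null.
const signatureBytes = sign(null, canonicalBytes, privateKey);

// The untouched bytes verify.
const ok = verify(null, canonicalBytes, publicKey, signatureBytes);

// Changing a single value in the signed bytes makes verification fail.
const tamperedBytes = Buffer.from('{"certification_id":"ce_demo","rows":101}', "utf8");
const tamperedOk = verify(null, tamperedBytes, publicKey, signatureBytes);

console.log({ ok, tamperedOk });
```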

## Exit codes

| Code | Verdict | Meaning |
|------|---------|---------|
| 0 | `VALID` | Signature verified and key is trusted (and dataset matches if `--dataset` was passed) |
| 1 | `INVALID` / `DATASET_MISMATCH` | Signature does not verify, or recomputed dataset hash differs |
| 2 | `UNKNOWN_KEY` | `key_id` is not in the trusted keys document, or has been revoked |
| 3 | `MALFORMED` | Certificate JSON is missing required fields, has bad base64, etc. |
| 4 | `NETWORK` | Could not reach the API or `.well-known` endpoint and no fresh cache is available |
| 64 | `USAGE` | Bad command-line flags |
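
The table can be expressed as a single lookup. The sketch below mirrors the rows above; the `Verdict` type, `EXIT_CODES`, and `exitCodeFor` are illustrative names, not necessarily the ones the CLI uses.

```ts
// Verdict names are copied from the table; the type itself is illustrative.
type Verdict =
  | "VALID"
  | "INVALID"
  | "DATASET_MISMATCH"
  | "UNKNOWN_KEY"
  | "MALFORMED"
  | "NETWORK"
  | "USAGE";

// INVALID and DATASET_MISMATCH share exit code 1, as in the table.
const EXIT_CODES: Record<Verdict, number> = {
  VALID: 0,
  INVALID: 1,
  DATASET_MISMATCH: 1,
  UNKNOWN_KEY: 2,
  MALFORMED: 3,
  NETWORK: 4,
  USAGE: 64,
};

export function exitCodeFor(verdict: Verdict): number {
  return EXIT_CODES[verdict];
}
```

Typing the map as `Record<Verdict, number>` makes the compiler reject a verdict that has no exit code assigned.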

## `--json` schema

```json
{
  "verdict": "VALID | INVALID | UNKNOWN_KEY | DATASET_MISMATCH | MALFORMED",
  "certification_id": "ce_...",
  "key_id": "ck_...",
  "issuer": "CertifiedData.io",
  "algorithm": "CTGAN",
  "signed_at": "2026-03-18T20:31:45Z",
  "dataset_hash_expected": "sha256:...",
  "dataset_hash_actual": "sha256:... | null",
  "checks": {
    "signature": "pass | fail | skipped",
    "key_trust": "pass | fail | skipped",
    "dataset_match": "pass | fail | skipped"
  },
  "reason": "human-readable explanation"
}
```
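
For consumers that parse this output, the shape above translates to a TypeScript type fairly directly. The sketch below is an assumption based only on the example: `VerifyResult` and `isFullyVerified` are hypothetical names, and a `MALFORMED` result may in practice omit fields that the type marks as required.

```ts
type CheckStatus = "pass" | "fail" | "skipped";

// Field names follow the JSON example; the type name is hypothetical.
interface VerifyResult {
  verdict: "VALID" | "INVALID" | "UNKNOWN_KEY" | "DATASET_MISMATCH" | "MALFORMED";
  certification_id: string;
  key_id: string;
  issuer: string;
  algorithm: string;
  signed_at: string;
  dataset_hash_expected: string;
  dataset_hash_actual: string | null;
  checks: { signature: CheckStatus; key_trust: CheckStatus; dataset_match: CheckStatus };
  reason: string;
}

// Fail-closed helper: accept a result only when every required check passed.
// With requireDataset set, a skipped dataset check is treated as a failure.
export function isFullyVerified(result: VerifyResult, requireDataset: boolean): boolean {
  if (result.verdict !== "VALID") return false;
  if (result.checks.signature !== "pass") return false;
  if (result.checks.key_trust !== "pass") return false;
  return requireDataset
    ? result.checks.dataset_match === "pass"
    : result.checks.dataset_match !== "fail";
}
```

Checking the individual `checks` entries as well as `verdict` guards a pipeline against accepting a certificate whose dataset was never hashed.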

## Use in CI

```yaml
- name: Verify training-data certificate
  # Explicit `shell: bash` enables `-o pipefail`, so a failing verify is not masked by `tee`.
  shell: bash
  run: |
    npm install -g @certifieddata/verify
    certifieddata-verify "${{ env.TRAINING_CERT_ID }}" --dataset data/training.csv --json \
      | tee verify-result.json
- uses: actions/upload-artifact@v4
  with: { name: cert-verification, path: verify-result.json }
```

The non-zero exit codes fail the job automatically — a CI run will not pass if your training data has drifted from the cert.

## Offline / air-gapped audit

```bash
# Pre-stage a copy of the issuer's keys document, then verify with no network.
curl -O https://certifieddata.io/.well-known/certifieddata-keys.json
certifieddata-verify ./received-cert.json --keys ./certifieddata-keys.json --offline
```

`--offline` refuses to make any network call. Combined with `--keys`, it produces a fully reproducible audit you can replay months later.

## How CertifiedData certificates work

CertifiedData.io issues `cert.v1` documents that bind together:

1. A **dataset hash** — `sha256(file_bytes)` for binary data (CSV, Parquet) or `sha256(JCS(payload))` for structured data.
2. **Provenance** — the algorithm used, row/column counts, the issuance timestamp, and an opaque `certification_id`.
3. A **signer** — `key_id`, with the public key fetched from the issuer's `.well-known` endpoint.

The signature is computed over the RFC 8785 JCS canonicalization of the certificate **with the `signature` field omitted**. Canonicalizing first means the signed bytes do not depend on key order or whitespace, so a certificate can round-trip through arbitrary JSON parsers and still verify.
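
As a rough illustration of that step, the sketch below serializes a value with sorted keys and no whitespace, then drops `signature` before producing the bytes to verify. It is a simplified stand-in, not a complete RFC 8785 implementation (for example, it does not reject `NaN` or `Infinity`), and the function names are assumptions; the repository's `canonicalize.ts` is the authoritative version.

```ts
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified JCS-style serializer: sorted object keys, no insignificant whitespace.
export function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  // The default string sort compares UTF-16 code units, the ordering JCS specifies.
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(",")}}`;
}

// Hypothetical helper: the bytes covered by the signature exclude `signature` itself.
export function signedBytes(cert: { [key: string]: Json }): Buffer {
  const { signature: _signature, ...unsigned } = cert;
  return Buffer.from(canonicalize(unsigned), "utf8");
}

// Key order in the input does not affect the output.
console.log(canonicalize({ b: 1, a: [true, null] })); // → {"a":[true,null],"b":1}
```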

We use Ed25519 because it is fast, deterministic, has small keys (32 bytes) and small signatures (64 bytes), and is built into Node's `crypto` module. We never sign the field that contains the signature, and we never claim a verdict beyond what the cert actually says — for example, we will not call a CTGAN cert "differentially private" unless the metadata explicitly carries a non-null `epsilon` and the algorithm is `DP-CTGAN`.

## Reporting vulnerabilities

See [SECURITY.md](SECURITY.md). Please do not open a public issue for cryptographic findings. Instead, email `security@certifieddata.io`; we aim to acknowledge reports within 48 hours.

## Related projects

- [`@certifieddata/pii-scan`](https://github.com/certifieddata/pii-scan) — scan datasets for PII before certifying them
- [`certifieddata/reference-impl`](https://github.com/certifieddata/reference-impl) — a 50-line EU AI Act Article 12 reference application that uses this CLI

## License

MIT — see [LICENSE](LICENSE).
30 changes: 30 additions & 0 deletions SECURITY.md
@@ -0,0 +1,30 @@
# Security policy

## Reporting a vulnerability

Please report security issues privately to **security@certifieddata.io**.

- Please do not open a public GitHub issue for cryptographic findings until a fix is released.
- We aim to acknowledge reports within **48 hours**, ship a fix or workaround within **7 days** for high-severity findings, and request a CVE for any cryptographic finding.
- We will credit you in the release notes unless you ask us not to.

## Scope

In scope:

- Bypass of signature verification in `verifyCertificate` (false `VALID` verdict on a cert that should not verify).
- Incorrect handling of revoked keys, malformed payloads, or non-canonical JSON that produces an exploitable mismatch between signed and verified bytes.
- Any path where the CLI returns exit code 0 for a certificate that does not actually verify.
- Cache poisoning of `~/.certifieddata/keys.json` that could elevate an untrusted key to "trusted".

Out of scope:

- Issues in upstream Node.js `node:crypto` — please report those to the Node.js project.
- DoS on a single host (e.g. very large fixtures making `sha256File` slow).
- Anything depending on a compromised local environment that already has write access to your home directory.

## Supported versions

Until 1.0.0, we support the latest minor release on the `0.x` line. After 1.0.0, we will support the two most recent minor versions.

This package follows [semantic versioning](https://semver.org/). Any breaking change to the verification path or to the `cert.v1` shape is a major version bump.
23 changes: 23 additions & 0 deletions eslint.config.js
@@ -0,0 +1,23 @@
import js from "@eslint/js";
import tseslint from "typescript-eslint";

export default tseslint.config(
  js.configs.recommended,
  ...tseslint.configs.recommended,
  {
    ignores: ["dist/", "fixtures/", "node_modules/"],
  },
  {
    rules: {
      "@typescript-eslint/no-unused-vars": [
        "error",
        {
          argsIgnorePattern: "^_",
          varsIgnorePattern: "^_",
          destructuredArrayIgnorePattern: "^_",
          ignoreRestSiblings: true,
        },
      ],
    },
  },
);
33 changes: 33 additions & 0 deletions fixtures/README.md
@@ -0,0 +1,33 @@
# Test fixtures

These files are committed so reviewers can verify the verifier without
trusting a pre-built artifact. They are produced by `generate.mjs`,
which:

1. Generates a fresh Ed25519 keypair via `node:crypto`.
2. Hashes a tiny CSV dataset.
3. Signs a `cert.v1` certificate over its JCS canonicalization (with
the `signature` field omitted).
4. Writes the corresponding `keys.json`, plus three failure-case
certs: tampered, unknown-key, malformed.

Regenerate with:

```bash
npm run fixtures
```

After regeneration, the CLI tests should still pass: the test
script depends only on the structural shape of the fixtures, not the
specific key material.

## Files

| File | Purpose | Expected verdict |
|---|---|---|
| `valid-cert.json` | Cleanly-signed demo cert | `VALID` |
| `tampered-cert.json` | `rows` mutated post-signature | `INVALID` |
| `unknown-key-cert.json` | Signed by a key not in `keys.json` | `UNKNOWN_KEY` |
| `malformed-cert.json` | `signature` field removed | `MALFORMED` |
| `keys.json` | Trusted-keys document for the test issuer | — |
| `valid-dataset.csv` | The dataset hashed into `valid-cert.json` | — |