Whereas

The open-source contract repository.

Built-in clause extraction, playbook deviation analysis, and embedded e-signature via DocuSeal.

Note: Whereas is currently pre-v0.1 and not ready for production use. If you want to follow along or contribute, watch this repo and read CONTRIBUTING.md.

What it does

Whereas is a self-hostable contract management system aimed at small and mid-sized legal teams who don't want their contracts living on someone else's server. It covers the post-execution side of contract lifecycle management:

Repository. Upload contracts (.docx, .pdf, and scanned PDFs via OCR). Search across the corpus. Scope permissions by user and team.
Metadata extraction. On upload, an LLM extracts parties, effective date, term, governing law, contract value, renewal terms, and termination provisions — every field accompanied by a span citation back to the source document and a confidence score.
Clause segmentation. Contracts are segmented and tagged against the CUAD taxonomy (41 clause types). You can extend the taxonomy.
Playbook deviation. Define your firm's positions in YAML (e.g., "indemnification cap ≤ 12 months' fees"). Whereas flags clauses that deviate, with a severity and a suggested redline.
Q&A over your corpus. RAG-based question answering scoped to user permissions.
Embedded e-signature. DocuSeal runs alongside Whereas in the same Docker Compose stack. Send contracts for signature without leaving the app.
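The metadata bullet says every extracted field carries a span citation and a confidence score. Below is a minimal sketch of what such a record and a sanity check on it might look like, assuming character offsets into the extracted plain text. `ExtractedField`, `is_well_formed`, and `cited_text` are hypothetical names, not Whereas's actual models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    """One extracted metadata value plus its evidence (hypothetical shape)."""

    name: str          # e.g. "governing_law"
    value: str         # value returned by the LLM
    span_start: int    # character offset into the contract text
    span_end: int      # exclusive end offset
    confidence: float  # expected to fall in [0.0, 1.0]


def is_well_formed(field: ExtractedField, full_text: str) -> bool:
    """Check that the citation points inside the text and the confidence is a probability."""
    span_ok = 0 <= field.span_start < field.span_end <= len(full_text)
    return span_ok and 0.0 <= field.confidence <= 1.0


def cited_text(field: ExtractedField, full_text: str) -> str:
    """Return the source excerpt a reviewer would see highlighted."""
    return full_text[field.span_start:field.span_end]


# Example: a governing-law field citing the word "California".
text = "This Agreement is governed by the laws of the State of California."
start = text.index("California")
gov_law = ExtractedField("governing_law", "California", start, start + len("California"), 0.93)
print(is_well_formed(gov_law, text), cited_text(gov_law, text))  # True California
```

Keeping the citation as raw offsets (rather than a copied quote) means the UI can always re-derive the highlighted excerpt from the stored text.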

What Whereas is not (yet): a drafting tool, a Word/Outlook plugin, a negotiation/redlining workflow, or a SaaS. Those may come in later versions or never.

Working representation: DOCX/PDF + Markdown snapshots

Whereas is moving toward a PWA-first, Markdown-as-working-copy architecture:

The DOCX or PDF you upload remains the original legal artifact. Signed PDFs from DocuSeal are the source of truth for execution.
On upload, Whereas also stores a lightweight Markdown working snapshot (ContractMarkdownSnapshot) for fast preview, search, clause analysis, and future local-first sync. Snapshots are append-only; the latest is fetched via GET /api/contracts/{id}/markdown.
The frontend ships as an installable PWA. Browser file access (the File System Access API) is used only for explicit import, export, "save generated DOCX," and "open original in Word/Google Docs" workflows. Normal contract previews never trigger filesystem permission prompts — they read from app/backend storage.
Markdown conversion uses Microsoft MarkItDown when installed and falls back to the existing extracted plain text otherwise. Conversion failure is non-fatal: the upload still succeeds and the original remains downloadable.
The contract workspace defaults to the Markdown preview when one is available. It's optimized for skimming, search, and the future local-first sync layer. Use "View original" in the document header to switch to the plain-text view used for clause / metadata / finding span citations, or "Download original" to retrieve the underlying DOCX/PDF as the official artifact.
Original legal artifacts are tracked explicitly in a ContractArtifact model alongside the Markdown working snapshot. The original upload is recorded with artifact_type='original_upload' and is_official=true; future PRs will add generated DOCX, signed PDFs from DocuSeal, redlines, and exhibits as additional artifact rows. The metadata list is exposed via GET /api/contracts/{id}/artifacts. Markdown snapshots remain the lightweight working representation; artifacts remain the official legal record.
Backfilling existing contracts: contracts created before artifact tracking landed only have the legacy Contract.s3_key / mime_type / file_hash_sha256 columns. Download falls back to those columns when no original_upload artifact exists, but operators should run the backfill once after deploying so the artifact row is the source of truth:
```
# Dry run — report what would be created without writing anything.
python -m backend.scripts.backfill_contract_artifacts --dry-run

# Real run.
python -m backend.scripts.backfill_contract_artifacts

# Optionally scope to a single organization.
python -m backend.scripts.backfill_contract_artifacts --organization-id <uuid>
```
The script is idempotent: it skips contracts that already have an original_upload artifact and contracts with no legacy storage key. The legacy Contract columns are retained as a fallback and are not removed by the backfill.
Agreement templates are first-class CLM objects. Operators upload an NDA/MSA/SOW/DPA/etc. as an AgreementTemplate; the original DOCX or PDF is stored as an official AgreementTemplateArtifact (is_official=true, artifact_type='original_upload') and the same Markdown converter that runs on contracts produces a working AgreementTemplateMarkdownSnapshot for fast preview and future local-first sync. Conversion failure is non-fatal: the upload still succeeds and the original remains the authoritative file.
Template variables are metadata only in this release. Templates can declare AgreementTemplateVariable rows (counterparty_name, effective_date, ...) so that a later PR can render filled DOCX agreements; this release does not generate DOCX, send to DocuSeal, or substitute placeholders. Variable keys are unique per template and are listed sorted by sort_order.
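The backfill's idempotency rules and the download fallback described above can be modelled in memory. This sketch is hypothetical: `Contract`, `Artifact`, `backfill`, and `download_key` are illustrative stand-ins, and only the column names (`s3_key`, `mime_type`, `file_hash_sha256`) and the `original_upload` artifact type come from this README. It does not reproduce `backend.scripts.backfill_contract_artifacts`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Artifact:
    artifact_type: str
    storage_key: str
    is_official: bool = True


@dataclass
class Contract:
    # Hypothetical stand-in for a Contract row; s3_key, mime_type and
    # file_hash_sha256 mirror the legacy column names.
    id: str
    organization_id: str
    s3_key: Optional[str]
    mime_type: Optional[str] = None
    file_hash_sha256: Optional[str] = None
    artifacts: list[Artifact] = field(default_factory=list)


def original_upload(contract: Contract) -> Optional[Artifact]:
    return next((a for a in contract.artifacts if a.artifact_type == "original_upload"), None)


def backfill(contracts, dry_run=False, organization_id=None) -> list[str]:
    """Return ids of contracts that get (or, in a dry run, would get) an artifact row."""
    touched = []
    for c in contracts:
        if organization_id is not None and c.organization_id != organization_id:
            continue
        # Idempotent: skip contracts already backfilled or with no legacy key.
        if original_upload(c) is not None or not c.s3_key:
            continue
        touched.append(c.id)
        if not dry_run:
            c.artifacts.append(Artifact("original_upload", c.s3_key, is_official=True))
    return touched


def download_key(contract: Contract) -> Optional[str]:
    """Prefer the original_upload artifact; fall back to the legacy column."""
    art = original_upload(contract)
    return art.storage_key if art is not None else contract.s3_key
```

Running `backfill` a second time returns an empty list, and the legacy `s3_key` is never cleared, which is what keeps the download fallback safe during a partial rollout.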

Stack

Backend: Python 3.11, FastAPI, SQLAlchemy, Alembic
Database: Postgres 16 with pgvector
Storage: S3-compatible (MinIO by default for self-host)
LLM abstraction: LiteLLM (default target: Ollama running locally; BYOK for any OpenAI-compatible provider)
Embeddings: BGE-M3 via Ollama or a hosted endpoint
Frontend: React + Vite + Tailwind
E-signature: DocuSeal, deployed alongside Whereas, sharing Postgres
Auth: Whereas-native (Argon2id + sessions). SSO is post-v0.1.

License

Whereas is licensed under AGPL-3.0-or-later. See LICENSE.

If you modify Whereas and run it as a network service, you must make your modifications available under the same license. If you want a commercially licensed version (no copyleft obligations), get in touch.

The name "Whereas" and any associated logos are not covered by the AGPL — see TRADEMARK.md for the trademark policy. You can fork the code freely; you cannot ship a fork called "Whereas."

Quickstart (local dev)

Requires Docker and Docker Compose.

```
git clone https://github.com/foolish-bandit/whereas.git
cd whereas
./scripts/generate-secrets.sh   # generates .env with random secrets
# edit .env if you want to change LLM provider; defaults to local Ollama
docker compose up -d
```

Whereas will be available at http://localhost:8080. DocuSeal at http://localhost:8081. The API at http://localhost:8000.

For production deployment, read docs/deployment-guide.md before exposing this to the internet. There are non-negotiable hardening steps that the local quickstart skips.

For the security architecture and threat model, see docs/security-model.md.

For frontend-only development (without the full Docker stack), see frontend/README.md. The frontend dev server runs on http://localhost:5173 and expects the backend at VITE_API_BASE_URL (default http://localhost:8000).

For an end-to-end local setup (infra + backend + frontend + first-run workspace), see docs/local-developer-quickstart.md.

On first run, open the app, go to Settings, and click "Create local development workspace". This creates an organization, a wrapped master key, and an active user, and stores the dev user UUID in your browser. The endpoint backing this (POST /api/setup/dev) is disabled when ENVIRONMENT=production.

A hosted UI preview, running in demo mode against fictional sample data, is at https://whereas.pages.dev/. No real contracts; nothing uploaded there is sent anywhere.

Project status

Repo scaffold
Document upload + storage
Metadata extraction with span citations
Clause segmentation (v1: heuristic, exact-span grounded — see note below)
Playbook YAML schema and deviation engine (schema + loader landed; deviation engine pending)
DocuSeal integration (embedded + auth bridge)
RAG Q&A
Permissioning model
First tagged release (v0.1)

Clause segmentation (v1)

Uploaded contracts are now segmented into clause-level units via a deterministic heuristic (numbered sections, Section N, ARTICLE V, ALL-CAPS / title-case headings, with a paragraph fallback). Every persisted clause is grounded in the original contract text by exact character offsets — Contract.full_text[span_start:span_end] == Clause.text is enforced at persistence time, and ungrounded candidates are dropped rather than written. The contract detail endpoint includes the clauses; a GET /api/contracts/{id}/clauses endpoint is also available.
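A stripped-down illustration of the two ideas in this section: heading-based splitting with a paragraph fallback, and dropping any candidate that fails the `full_text[span_start:span_end] == text` check. The regex covers only numbered sections, `Section N`, and `ARTICLE <roman numeral>` (not the ALL-CAPS / title-case heading cues), and `segment` / `ground` are hypothetical names; this is not the code in the Whereas backend.

```python
import re
from dataclasses import dataclass

# Hypothetical heading cues: "1." / "2.3 ", "Section 4", "ARTICLE V" at line start.
HEADING = re.compile(r"^(?:\d+(?:\.\d+)*\.?\s|Section\s+\d+|ARTICLE\s+[IVXLC]+\b)", re.MULTILINE)


@dataclass(frozen=True)
class Clause:
    text: str
    span_start: int
    span_end: int


def segment(full_text: str) -> list[Clause]:
    """Split on headings, or on blank lines if no heading is found."""
    starts = [m.start() for m in HEADING.finditer(full_text)]
    if not starts:
        starts = [0] + [m.end() for m in re.finditer(r"\n\s*\n", full_text)]
    elif starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    bounds = starts + [len(full_text)]
    candidates = []
    for start, end in zip(bounds, bounds[1:]):
        chunk = full_text[start:end]
        # Trim whitespace by moving offsets, never by rewriting the text.
        s = start + (len(chunk) - len(chunk.lstrip()))
        e = end - (len(chunk) - len(chunk.rstrip()))
        if s < e:
            candidates.append(Clause(full_text[s:e], s, e))
    return candidates


def ground(candidates: list[Clause], full_text: str) -> list[Clause]:
    """Keep only candidates whose text is exactly the slice they claim to cite."""
    return [
        c for c in candidates
        if 0 <= c.span_start < c.span_end <= len(full_text)
        and full_text[c.span_start:c.span_end] == c.text
    ]


contract = "MUTUAL NDA\n\n1. Definitions. Terms.\n\n2. Term. Two years.\n"
clauses = ground(segment(contract), contract)
print([c.text for c in clauses])
# ['MUTUAL NDA', '1. Definitions. Terms.', '2. Term. Two years.']
```

Trimming by adjusting offsets (instead of calling `strip()` on the stored text) is what keeps every candidate trivially groundable; `ground` then acts as the final gate for candidates produced any other way.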

This is a foundation for downstream features (playbook deviation, clause library, RAG Q&A); it is not a clause manager, not an LLM-driven classifier, and not legal advice. Clause types are labelled conservatively from a CUAD-inspired taxonomy when the heuristics are confident, and left unclassified otherwise.

Playbook schema and rule loader

Playbooks are firm-defined YAML documents that capture review positions on a particular contract type — for example, "for mutual NDAs in California, governing law should be California; assignment must require prior written consent." They are validated by backend/app/services/playbook_loader.py and persisted per organization in the playbooks table. The v1 schema supports three rule types:

required_clause — a clause of the named clause_type should be present somewhere in the contract.
preferred_value — a specific extracted value is preferred (e.g. governing law = California). Carries an expected_value.
text_contains — the clause text must contain all of the listed required_terms (case-insensitive). At least one term is required.
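A minimal validator for the three rule types, operating on a playbook already parsed from YAML (for example with PyYAML's `yaml.safe_load`). The `rules`, `id`, `type`, and `field` keys, and the requirement that `text_contains` rules name a `clause_type`, are assumptions for illustration; only `clause_type`, `expected_value`, and `required_terms` come from this README. The real checks live in `backend/app/services/playbook_loader.py`.

```python
RULE_TYPES = {"required_clause", "preferred_value", "text_contains"}


class PlaybookError(ValueError):
    """Raised when a parsed playbook does not match the (assumed) v1 schema."""


def validate_rule(rule: dict) -> None:
    rtype = rule.get("type")
    if rtype not in RULE_TYPES:
        raise PlaybookError(f"unknown rule type: {rtype!r}")
    if rtype in ("required_clause", "text_contains") and not rule.get("clause_type"):
        raise PlaybookError(f"{rtype} rule needs a clause_type")
    if rtype == "preferred_value":
        # "field" is a hypothetical key naming which extracted value to compare.
        if not rule.get("field") or "expected_value" not in rule:
            raise PlaybookError("preferred_value rule needs field and expected_value")
    if rtype == "text_contains":
        terms = rule.get("required_terms")
        if not isinstance(terms, list) or not terms:
            raise PlaybookError("text_contains rule needs at least one required_terms entry")
        if not all(isinstance(t, str) and t.strip() for t in terms):
            raise PlaybookError("required_terms entries must be non-empty strings")


def validate_playbook(doc: dict) -> None:
    rules = doc.get("rules")
    if not isinstance(rules, list) or not rules:
        raise PlaybookError("playbook needs a non-empty rules list")
    seen = set()
    for rule in rules:
        rule_id = rule.get("id")
        if not rule_id or rule_id in seen:
            raise PlaybookError(f"missing or duplicate rule id: {rule_id!r}")
        seen.add(rule_id)
        validate_rule(rule)
```

Validating before persisting means a malformed rule is rejected at the /api/playbooks validate or create step rather than surfacing later as a confusing review result.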

The API surface (/api/playbooks) supports validate, create, list, detail, and soft-delete (deactivate). An example playbook ships under backend/app/services/playbook_examples/mutual_nda.yaml, and the read-only Playbooks page in the frontend renders them with a right-hand YAML pane.

Deterministic playbook review

Whereas can now run a playbook against a contract's segmented clauses and return pass/fail results per rule. The matching engine (backend/app/services/playbook_matcher.py) is deterministic — no LLM call, no embeddings, no paraphrase inference — and only uses the data that is already exact-span-grounded by the segmenter. The transient endpoint is POST /api/contracts/{contract_id}/playbook-review; the contract detail page's Review tab also runs the persisted flow described below and highlights the cited evidence span when an evidence row is clicked.
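The deterministic approach can be illustrated with plain string and equality checks. In this sketch `evaluate`, `review`, `Outcome`, and the rule keys `id`, `type`, and `field` are hypothetical (`field` names the extracted value a `preferred_value` rule compares against); it is not `playbook_matcher.py`. Every outcome's evidence is an existing `Clause`, so its span is reused rather than recomputed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    clause_type: Optional[str]  # None when the segmenter left it unclassified
    text: str
    span_start: int
    span_end: int


@dataclass(frozen=True)
class Outcome:
    rule_id: str
    passed: bool
    message: str
    evidence: Optional[Clause] = None  # existing, already-grounded clause


def evaluate(rule: dict, clauses: list[Clause], metadata: dict) -> Outcome:
    rule_id, rtype = rule["id"], rule["type"]
    if rtype == "required_clause":
        hit = next((c for c in clauses if c.clause_type == rule["clause_type"]), None)
        msg = "clause present" if hit else f"no {rule['clause_type']} clause found"
        return Outcome(rule_id, hit is not None, msg, hit)
    if rtype == "text_contains":
        terms = [t.casefold() for t in rule["required_terms"]]
        matching = [c for c in clauses if c.clause_type == rule["clause_type"]]
        for c in matching:
            if all(t in c.text.casefold() for t in terms):  # case-insensitive
                return Outcome(rule_id, True, "all required terms present", c)
        return Outcome(rule_id, False, "required terms missing", matching[0] if matching else None)
    if rtype == "preferred_value":
        actual = metadata.get(rule["field"])
        ok = actual == rule["expected_value"]  # exact comparison, no inference
        return Outcome(rule_id, ok, f"{rule['field']}={actual!r}, expected {rule['expected_value']!r}")
    raise ValueError(f"unknown rule type: {rtype!r}")


def review(playbook: dict, clauses: list[Clause], metadata: dict) -> list[Outcome]:
    return [evaluate(rule, clauses, metadata) for rule in playbook["rules"]]


clauses = [Clause("assignment", "Neither party may assign this Agreement without prior written consent.", 120, 190)]
playbook = {"rules": [
    {"id": "conf", "type": "required_clause", "clause_type": "confidentiality"},
    {"id": "assign", "type": "text_contains", "clause_type": "assignment", "required_terms": ["PRIOR WRITTEN CONSENT"]},
    {"id": "law", "type": "preferred_value", "field": "governing_law", "expected_value": "California"},
]}
outcomes = review(playbook, clauses, {"governing_law": "California"})
print([(o.rule_id, o.passed) for o in outcomes])
# [('conf', False), ('assign', True), ('law', True)]
```

Because the same inputs always yield the same outcomes, a re-run is reproducible and an evidence click can highlight exactly `full_text[span_start:span_end]`.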

Persisted playbook review findings

A review can now be saved as a PlaybookReviewRun with one DeviationFinding row per failed deterministic outcome. Pass results are not persisted as separate rows; the run record carries the aggregate rules_checked / passed_count / failed_count so the audit signal is preserved without per-rule pass noise. Findings are generated from firm-authored YAML playbooks and remain exact-span-grounded — the matcher copies span_start / span_end straight off the source Clause row.
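The split between aggregate counts and per-failure rows can be sketched as a pure function. `Outcome`, `Finding`, `ReviewRun`, and `build_run` are hypothetical in-memory shapes; the field names `rules_checked`, `passed_count`, `failed_count`, `span_start`, and `span_end` are the ones this README mentions, but the actual `PlaybookReviewRun` and `DeviationFinding` models may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    rule_id: str
    passed: bool
    message: str
    span_start: Optional[int] = None  # copied from the evidence Clause, if any
    span_end: Optional[int] = None


@dataclass
class Finding:
    rule_id: str
    message: str
    span_start: Optional[int]
    span_end: Optional[int]
    finding_status: str = "open"


@dataclass
class ReviewRun:
    rules_checked: int
    passed_count: int
    failed_count: int
    findings: list[Finding] = field(default_factory=list)


def build_run(outcomes: list[Outcome]) -> ReviewRun:
    """Aggregate every outcome, but materialize a row only for failures."""
    failed = [o for o in outcomes if not o.passed]
    return ReviewRun(
        rules_checked=len(outcomes),
        passed_count=len(outcomes) - len(failed),
        failed_count=len(failed),
        # Spans are copied verbatim, never recomputed.
        findings=[Finding(o.rule_id, o.message, o.span_start, o.span_end) for o in failed],
    )
```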

Endpoints (under /api/contracts/{contract_id}):

POST /playbook-review/runs — run the matcher and save the failed findings under a new PlaybookReviewRun. Marks any prior open findings on the same (contract, playbook) as superseded; reviewed and ignored findings are left alone so re-running a playbook does not silently reset deliberate human decisions.
GET /playbook-review/runs and GET /playbook-review/runs/{run_id} — list runs / fetch a run's findings and per-rule outcomes.
GET /findings — list a contract's findings, with optional filters on playbook_id, finding_status, severity, and review_run_id.
PATCH /findings/{finding_id} — update the reviewer workflow status (open / reviewed / ignored). Deterministic fields (status, message, span, rule metadata) are immutable through this endpoint.
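The two workflow rules in the endpoint list (re-running supersedes only open findings, and PATCH touches nothing but the reviewer status) can be sketched like this. Representing supersession as `finding_status = "superseded"` is an assumption; the real models may track it differently, for example with a reference to the newer run. `supersede_open` and `patch_finding` are hypothetical names.

```python
from dataclasses import dataclass

REVIEWER_STATUSES = {"open", "reviewed", "ignored"}


@dataclass
class Finding:
    contract_id: str
    playbook_id: str
    rule_id: str
    message: str
    span_start: int
    span_end: int
    finding_status: str = "open"


def supersede_open(findings: list[Finding], contract_id: str, playbook_id: str) -> int:
    """Mark prior open findings for this (contract, playbook) as superseded."""
    count = 0
    for f in findings:
        if (f.contract_id, f.playbook_id) == (contract_id, playbook_id) and f.finding_status == "open":
            f.finding_status = "superseded"  # assumption: supersession stored as a status
            count += 1
    return count  # reviewed / ignored findings are left untouched


def patch_finding(finding: Finding, changes: dict) -> Finding:
    """Apply a reviewer update; deterministic fields cannot be changed."""
    extra = changes.keys() - {"finding_status"}
    if extra:
        raise ValueError(f"immutable fields: {sorted(extra)}")
    status = changes.get("finding_status")
    if status is not None:
        if status not in REVIEWER_STATUSES:
            raise ValueError(f"invalid finding_status: {status!r}")
        finding.finding_status = status
    return finding
```

Rejecting any key other than the reviewer status, rather than silently ignoring it, makes an attempted edit to a deterministic field visible to the caller.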

LLM redlines and suggested replacement language remain future work. Findings are not legal advice — Whereas surfaces information about contracts; it does not replace human review.

Contributing

See CONTRIBUTING.md. Whereas is community-driven; PRs welcome. Read the design principles before proposing significant changes. For an architecture/status handoff covering PRs #32–#37 and the recommended next steps, see docs/local-first-pwa-clm-architecture.md.

Acknowledgments

Whereas builds on the work of:

DocuSeal for the e-signature layer
CUAD for the contract clause taxonomy and dataset
LiteLLM for provider-agnostic LLM access
pgvector for embedding storage
