Project Coherent Storage - ADR Package

Project: Project Coherent Storage
Architecture cycle: 2026-Q2
Architecture focus:
- Auto-scaling AI/HPC storage architecture featuring an accelerator-centric Coherent Memory-Mesh
- Custom Max-IO Grid-Engine with ACID-compliant cache transactions for hyperscale architectures
- Custom UA-Link pod-scale system design with host-based CXL memory pools to clear the 'memory wall'
- Fully automated deployment with Ansible workflows, SLURM workload management, and netboot ramdisks
- Dev/Lab, Stage/LT, and Prod environment support with full CI/CD test coverage, load-test profiles, and ITIL change controls
- Gate-based workflows with defined failure semantics, plus SLO and SLA definitions for observability and monitoring
- Network scaling from 10-25 Gb/s to 400-800 Gb/s, with per-port tuning profiles for RDMA/RoCEv2
- NVMe-oF with DPU hardware-based protocol offloads for OpenZFS (storage tiering, ACLs, and DoD-compliant encryption)
- LLM prompt-cache acceleration via disaggregated heterogeneous GP-GPU compute (AMD and NVIDIA GPUs, NPUs, FPGAs)
Generated: 2026-05-18
Automation: Tracked workflows, machine profiles, neteng scopes, etc. are located in the 'Infra-Stage4-LLVM-NoGNU' repository.
Status: Proposed / Review

Visualized High-Level Architecture Scope

Operational View: API Front-End to Block Storage Layer

Operational View: Coherent-Mesh Traversal, S3-Object to REST-API

Purpose

This package refreshes the ADR set using the expanded RAG corpus and the project directives. It keeps the core invariant: inference actors connect to the Coherence-CE Memory Mesh and never bind directly to OpenZFS, DPU, RoCEv2, NVMe-oF, CXL, UA-Link, VLANs, RDMA memory handles, or physical storage internals.
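
The core invariant can be expressed as a deployment-time check. The sketch below is hypothetical: it assumes each inference actor declares its bindings as simple kind strings and that the mesh binding kind is named `coherence-ce-mesh`; neither name comes from the repository. Anything other than the mesh is rejected, and known lower-layer internals are reported as such.

```python
# Hypothetical invariant check: inference actors may bind only to the Coherence-CE Memory Mesh.
# The binding-kind strings below are illustrative assumptions, not repository identifiers.

# Lower-layer internals listed in the README invariant.
FORBIDDEN_TARGETS = frozenset({
    "openzfs", "dpu", "rocev2", "nvme-of", "cxl", "ua-link",
    "vlan", "rdma-memory-handle", "physical-storage",
})

# The only binding kind permitted for inference actors (hypothetical name).
ALLOWED_TARGET = "coherence-ce-mesh"


def invariant_violations(actor: str, bindings: list[str]) -> list[str]:
    """Return one message per binding that is not the Coherence-CE mesh."""
    violations = []
    for target in bindings:
        kind = target.strip().lower()
        if kind == ALLOWED_TARGET:
            continue
        reason = "lower-layer internal" if kind in FORBIDDEN_TARGETS else "unrecognized target"
        violations.append(f"{actor}: binding to '{target}' rejected ({reason})")
    return violations


if __name__ == "__main__":
    print(invariant_violations("vllm-worker-0", ["coherence-ce-mesh"]))
    print(invariant_violations("vllm-worker-1", ["coherence-ce-mesh", "OpenZFS"]))
```

Rejecting unrecognized targets as well as known internals keeps the check fail-closed when new lower-layer components are added.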

The architecture emphasizes:

- Coherence-CE namespace modalities, with explicit Unified Namespace and Dimensional Indexed Namespace workflows for scalable cache locality.
- UA-Link-enabled pod-scale systems as a scale-up accelerator domain inside pod/rack boundaries.
- Network architecture across scale-up and scale-out planes, separating UA-Link accelerator fabrics, Ethernet/RDMA scale-out, storage/NVMe-oF fabrics, management, telemetry, and timing.
- CXL memory pools as governed T1/T1.5 memory capacity for warm KV/prefix state, metadata, vector heads, and future shared-memory research paths.
- RDMA/RoCEv2 performance tuning with explicit PFC/ECN/DCQCN, traffic-class, rail, telemetry, and failure semantics.
- DPU/SmartNIC storage offload as a hard requirement for NVMe-oF/RDMA storage-network paths.
- General-purpose GPU and heterogeneous accelerator scheduling, covering vendor capability profiles and admission-control policy.
- Reference-architecture-focused development: baseline-scoped architecture elements that are easy to layer, adapt, and extend as the industry evolves, with a fully open-source application stack.
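
For the ECN/DCQCN tuning item, DCQCN-capable switches typically use a RED-style marking curve: no marking below a queue threshold Kmin, a linear ramp up to probability Pmax at Kmax, and marking of every packet above Kmax. The sketch models only that curve; the threshold values in the usage block are placeholders, not a tuning profile for this project, and the real values belong to the ADR-019 gates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcnProfile:
    """RED-style ECN marking thresholds for one lossless traffic class (illustrative)."""

    kmin_bytes: int  # egress queue depth at which marking starts
    kmax_bytes: int  # queue depth at which the probability reaches pmax
    pmax: float      # marking probability at kmax, in (0, 1]

    def __post_init__(self) -> None:
        if not 0 <= self.kmin_bytes < self.kmax_bytes:
            raise ValueError("require 0 <= kmin_bytes < kmax_bytes")
        if not 0.0 < self.pmax <= 1.0:
            raise ValueError("require 0 < pmax <= 1")

    def mark_probability(self, queue_bytes: int) -> float:
        """Probability that a packet leaving a queue of this depth is CE-marked."""
        if queue_bytes <= self.kmin_bytes:
            return 0.0
        if queue_bytes > self.kmax_bytes:
            return 1.0  # above kmax every packet is marked
        # Linear ramp from 0 at kmin to pmax at kmax.
        span = self.kmax_bytes - self.kmin_bytes
        return self.pmax * (queue_bytes - self.kmin_bytes) / span


if __name__ == "__main__":
    # Placeholder thresholds, not a recommended tuning profile.
    profile = EcnProfile(kmin_bytes=100_000, kmax_bytes=400_000, pmax=0.2)
    for depth in (50_000, 250_000, 400_000, 500_000):
        print(depth, round(profile.mark_probability(depth), 3))
```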

Source basis

The source pass extracted text from 363 PDFs in the `RAG-DATA/` corpus into a local processing cache.

- Text extraction OK: 360 of 363 PDFs
- Source map: `review-artifacts/rag-extraction-and-source-map.md`

Important sources include the UA-Link white paper, UniFabriX UA-Link material, OCP Open Cluster inference/training fabric reference architectures, OCP MRC, Arista/Broadcom lossless Ethernet/RoCE material, AMD Pensando/Pollara cluster and product collateral, Intel Gaudi 3 cluster design, CXL/KV/GPU research, and prior Marvell/XConn/CXL/DPU materials.

Package index

| Path | Purpose |
| --- | --- |
| `reports/project-coherent-storage_architecture-report.md` | Main architecture report for UA-Link pod scale, CXL memory pools, RDMA/RoCEv2, and heterogeneous GP-GPU compute. |
| `reports/project-coherent-storage_engineering-deep-dive.md` | Top-down engineering deep-dive from the OpenAI/user layer through global/regional/datacenter load-balancer meshes and intra-datacenter storage layers. |
| `reports/project-coherent-storage_overview__executive-overview.md` | Executive overview covering business value, hard requirements, namespace posture, and residual risks. |
| `reports/project-coherent-storage_overview__director-overview.md` | Director overview covering procurement, lifecycle, deployment risk, and operational readiness. |
| `reports/project-coherent-storage_overview__engineering-overview.md` | Engineering/ARB overview covering data paths, CXL roles, namespace rules, and the validation checklist. |
| `reports/project-coherent-storage_s3-object-rest-api-translator-design.md` | Translator design report for S3/Object REST access and explicit prefix-cache namespace modalities. |
| `reports/project-coherent-storage_coherence-ce-object-chunking-and-lfs-gateway-design.md` | Design report for Coherence-CE object chunking, manifest semantics, and Git LFS gateway migration. |
| `api/coherence-ce-vllm-adapter.openapi.yaml` | OpenAPI contract for Coherence-CE vLLM adapter operations. |
| `api/s3-object-rest-translator.openapi.yaml` | OpenAPI contract for S3/Object REST translator routes, including Unified and Dimensional Indexed Namespace routes. |
| `api/coherence-ce-object-chunking-lfs-gateway.openapi.yaml` | OpenAPI contract for Coherence-native object chunking and Git LFS gateway facade routes. |
| `adr/diagrams/*.puml`, `*.png`, `*.svg` | Per-ADR PlantUML sources and rendered PNG/SVG assets. |
| `diagrams/*.puml`, `*.png`, `*.svg` | Report-level PlantUML sources and rendered PNG/SVG assets. |
| `review-artifacts/rag-extraction-and-source-map.md` and its JSON peer | Extraction evidence and source map. |
| `review-artifacts/ietf-icnrg-chunking-source-map.md` and its JSON peer | Source map for CCNx chunking, FLIC, RFC 8569/8609, and Git LFS API references. |
| `docs/git-lfs-policy.md` | Repository Git LFS policy: lock verification, normalized `.gitattributes`, pre-push hook, test server, and migration. |
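
Several entries above concern the Git LFS gateway facade. Assuming the facade serves standard Git LFS v1 pointer files (the public pointer format, not anything specific to this repository), a pointer can be built and validated as sketched below. This shows only the pointer format, not the gateway's transfer API or its Coherence-CE chunk mapping.

```python
import hashlib

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(data: bytes) -> str:
    """Build a Git LFS v1 pointer file for the given object bytes."""
    oid = hashlib.sha256(data).hexdigest()
    # "version" comes first, remaining keys are alphabetical, each line ends with LF.
    return f"version {LFS_SPEC_URL}\noid sha256:{oid}\nsize {len(data)}\n"


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return (sha256 hex oid, size) from a v1 pointer; raise ValueError if malformed."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            raise ValueError(f"malformed pointer line: {line!r}")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_URL:
        raise ValueError("unsupported pointer version")
    algo, _, oid = fields.get("oid", "").partition(":")
    if algo != "sha256" or len(oid) != 64:
        raise ValueError("pointer oid must be sha256:<64 hex chars>")
    if "size" not in fields:
        raise ValueError("pointer is missing the size field")
    return oid, int(fields["size"])


if __name__ == "__main__":
    pointer = make_lfs_pointer(b"model shard bytes")
    print(pointer, end="")
    print(parse_lfs_pointer(pointer))
```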

ADR index

| ADR file | Document function |
| --- | --- |
| `ADR-001_Inference_Storage_Principles_and_SLOs.md` | Defines inference-first storage principles, latency SLOs, tier boundaries, and workload classes that govern all later ADRs. |
| `ADR-002_Hot_KV_and_Prefix_Cache_Data_Plane.md` | Defines the hot KV/prefix-cache data plane and keeps inference actors behind the Coherence-CE Memory Mesh. |
| `ADR-003_Model_Weight_Object_and_Corpus_Data_Tiers.md` | Defines model-weight, adapter, tokenizer, object, corpus, and artifact tiers for reproducible inference data placement. |
| `ADR-004_RDMA_Fabric_and_GPU_Direct_Data_Paths.md` | Defines RDMA, RoCEv2, GPU-direct, and scale-out data-path rules for cross-node inference and storage movement. |
| `ADR-005_DPU_and_SmartNIC_Offload_Boundaries.md` | Defines mandatory DPU/SmartNIC offload boundaries for NVMe-oF, RDMA mediation, isolation, telemetry, and degraded host fallback. |
| `ADR-006_OpenZFS_NVMe_oF_and_Media_Layout.md` | Defines OpenZFS, NVMe-oF, mirrored NAND, media layout, and durable block-substrate rules. |
| `ADR-007_Inference_Scheduler_Locality_and_Admission_Control.md` | Defines scheduler admission using model, KV, fabric, CXL, DPU, rail, and locality telemetry. |
| `ADR-008_RAG_Vector_Index_and_Corpus_Service.md` | Defines immutable RAG corpus, embedding, vector-index, retrieval-cache, and corpus-service architecture. |
| `ADR-009_Observability_Benchmarking_and_Rollout_Gates.md` | Defines observability, benchmark, failure-drill, and rollout gates for inference, fabric, storage, CXL, and scheduler claims. |
| `ADR-010_Coherence_CE_Write_Policy_to_OpenZFS.md` | Defines Coherence-CE write-through, write-back, write-around, and write-behind policy to OpenZFS by durability class. |
| `ADR-011_KV_Durability_Classes.md` | Defines KV-D0 through KV-D5 durability classes used by Coherence-CE, OpenZFS write policy, failure recovery, and scheduler admission. |
| `ADR-012_Coherence_CE_vLLM_Adapter_API_Contract.md` | Defines the Coherence-native and OpenAI-compatible API contract exposed to vLLM adapters without leaking lower-layer storage or fabric. |
| `ADR-013_Failure_Semantics_and_Fencing.md` | Defines failure semantics, fencing, recovery, drain behavior, and degraded-mode rules across compute, fabric, DPU, CXL, and storage. |
| `ADR-014_Coherence_Metrics_Scheduler_Admission.md` | Defines how Coherence-CE metrics roll up into scheduler GREEN, AMBER, RED, and DRAIN admission states. |
| `ADR-015_CXL_Memory_Tiering_and_OpenZFS_Interaction.md` | Defines CXL T1/T1.5 memory tiering, memory-pool governance, and safe OpenZFS-adjacent CXL roles. |
| `ADR-016_Roadmap_Evidence_and_Public_Claim_Guardrails.md` | Defines evidence grades and public-claim guardrails for vendor roadmap, partnership, and integration statements. |
| `ADR-017_Research_Metadata_and_Arxiv_Publication_Workflow.md` | Defines research metadata, arXiv API/bulk-data, Markdown, LaTeX, BibTeX, and publication workflow requirements. |
| `ADR-018_UALink_Pod_Scale_Fabric_and_Compute_Domains.md` | Defines UA-Link pod-scale accelerator fabric domains and their scheduler-visible but actor-hidden compute locality semantics. |
| `ADR-019_Pod_Scale_Network_Architecture_and_RDMA_RoCEv2_Tuning.md` | Defines pod-scale network planes and RDMA/RoCEv2 tuning gates for traffic classes, PFC, ECN/DCQCN, rails, and telemetry. |
| `ADR-020_CXL_Memory_Pools_for_UALink_Pods.md` | Defines CXL memory pools inside UA-Link pods as governed Coherence-owned warm capacity with ownership, latency, and failure gates. |
| `ADR-021_Heterogeneous_GP_GPU_Compute_and_Scheduler_Governance.md` | Defines heterogeneous GP-GPU and accelerator capability profiles for scheduler governance across vendors and fabrics. |
| `ADR-022_S3_Object_to_REST_API_Protocol_Mapping_Translator.md` | Defines the S3/Object-to-REST translator and its object, KV, vector, and prefix-cache REST contract. |
| `ADR-023_Coherence_CE_Namespace_Modalities.md` | Defines Unified Namespace and Dimensional Indexed Namespace workflows, API route semantics, and locality-governance rules. |
| `ADR-024_System_Level_Benchmarking_Suite_Definitions.md` | Defines system-level benchmark suite taxonomy across component, service, test-intent, SLURM execution, cross-platform tooling, and evidence gates. |
| `ADR-025_Broad_Systems_E2E_Testing_Workflows_and_Tooling.md` | Defines broad-systems E2E testing workflows, scheduler-adapter execution, failure-mode tests, evidence bundles, and CI/CD gates. |
| `ADR-026_Coherence_CE_Object_Chunking_and_Manifest_Semantics.md` | Defines Coherence-CE internal object chunking, manifest commit semantics, S3 multipart mapping, Git LFS facade behavior, and RAG byte-object boundaries. |
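
To make the ADR-014 roll-up concrete, the sketch below derives a per-member admission state from a few Coherence-CE metrics and lets the worst member determine the pool state. Every field name and threshold here is hypothetical and stands in for whatever ADR-014 actually specifies; the point is the ordered-severity, worst-state-wins pattern.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable

# Hypothetical thresholds for illustration only; ADR-014 defines the real roll-up.
AMBER_LATENCY_FRACTION = 0.8  # p99 above 80% of the SLO budget is a warning
MIN_KV_HIT_RATIO = 0.5        # hot KV/prefix hit ratio below this degrades admission


class Admission(IntEnum):
    # Ordered by severity so that max() selects the worst state.
    GREEN = 0
    AMBER = 1
    RED = 2
    DRAIN = 3


@dataclass(frozen=True)
class MeshSample:
    """One Coherence-CE member's metrics (hypothetical field names)."""
    p99_latency_ms: float
    slo_p99_ms: float
    kv_hit_ratio: float
    drain_requested: bool = False


def member_state(s: MeshSample) -> Admission:
    """Classify a single member; an explicit drain request overrides all metrics."""
    if s.drain_requested:
        return Admission.DRAIN
    if s.p99_latency_ms > s.slo_p99_ms:
        return Admission.RED
    if (s.p99_latency_ms > AMBER_LATENCY_FRACTION * s.slo_p99_ms
            or s.kv_hit_ratio < MIN_KV_HIT_RATIO):
        return Admission.AMBER
    return Admission.GREEN


def rollup(samples: Iterable[MeshSample]) -> Admission:
    """Worst member wins; no telemetry at all fails closed to DRAIN."""
    return max((member_state(s) for s in samples), default=Admission.DRAIN)


if __name__ == "__main__":
    pool = [MeshSample(12.0, 50.0, 0.92), MeshSample(44.0, 50.0, 0.88)]
    print(rollup(pool).name)
```

Failing closed on missing telemetry is a design assumption of this sketch: a scheduler that cannot see the mesh should not admit new work onto it.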

Top-down architecture composition

The design composes the system from inference SLOs down through hot-state placement, namespace modality, data tiers, fabrics, offload, durable media, scheduler admission, failure semantics, CXL/UA-Link pod resources, heterogeneous accelerator governance, S3/Object REST translation, object chunking and manifest semantics, Git LFS gateway behavior, benchmark evidence, broad-systems E2E testing, and the research-publication workflow. Each ADR embeds its PNG diagram and has a PlantUML source file plus PNG/SVG renders under `adr/diagrams/`.
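
Because every ADR is expected to ship a PlantUML source plus PNG and SVG renders, a small CI check can flag missing assets. The sketch assumes, hypothetically, that each diagram file shares the ADR file's stem (for example `ADR-001_Inference_Storage_Principles_and_SLOs.puml`); adjust the naming rule if the repository uses a different convention.

```python
from pathlib import Path

# Renders each ADR is expected to have; the shared-stem naming rule is an assumption.
REQUIRED_SUFFIXES = (".puml", ".png", ".svg")


def missing_diagram_assets(repo_root: Path) -> dict[str, list[str]]:
    """Map each ADR file name to the diagram suffixes missing under adr/diagrams/."""
    adr_dir = repo_root / "adr"
    diagram_dir = adr_dir / "diagrams"
    missing: dict[str, list[str]] = {}
    for adr in sorted(adr_dir.glob("ADR-*.md")):
        absent = [sfx for sfx in REQUIRED_SUFFIXES
                  if not (diagram_dir / f"{adr.stem}{sfx}").is_file()]
        if absent:
            missing[adr.name] = absent
    return missing


if __name__ == "__main__":
    import sys

    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    for name, suffixes in missing_diagram_assets(root).items():
        print(f"{name}: missing {', '.join(suffixes)}")
```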

ADR diagram gallery

| ADR | Architecture interaction diagram |
| --- | --- |
| `ADR-001_Inference_Storage_Principles_and_SLOs.md` | PNG / SVG / PUML |
| `ADR-002_Hot_KV_and_Prefix_Cache_Data_Plane.md` | PNG / SVG / PUML |
| `ADR-003_Model_Weight_Object_and_Corpus_Data_Tiers.md` | PNG / SVG / PUML |
| `ADR-004_RDMA_Fabric_and_GPU_Direct_Data_Paths.md` | PNG / SVG / PUML |
| `ADR-005_DPU_and_SmartNIC_Offload_Boundaries.md` | PNG / SVG / PUML |
| `ADR-006_OpenZFS_NVMe_oF_and_Media_Layout.md` | PNG / SVG / PUML |
| `ADR-007_Inference_Scheduler_Locality_and_Admission_Control.md` | PNG / SVG / PUML |
| `ADR-008_RAG_Vector_Index_and_Corpus_Service.md` | PNG / SVG / PUML |
| `ADR-009_Observability_Benchmarking_and_Rollout_Gates.md` | PNG / SVG / PUML |
| `ADR-010_Coherence_CE_Write_Policy_to_OpenZFS.md` | PNG / SVG / PUML |
| `ADR-011_KV_Durability_Classes.md` | PNG / SVG / PUML |
| `ADR-012_Coherence_CE_vLLM_Adapter_API_Contract.md` | PNG / SVG / PUML |
| `ADR-013_Failure_Semantics_and_Fencing.md` | PNG / SVG / PUML |
| `ADR-014_Coherence_Metrics_Scheduler_Admission.md` | PNG / SVG / PUML |
| `ADR-015_CXL_Memory_Tiering_and_OpenZFS_Interaction.md` | PNG / SVG / PUML |
| `ADR-016_Roadmap_Evidence_and_Public_Claim_Guardrails.md` | PNG / SVG / PUML |
| `ADR-017_Research_Metadata_and_Arxiv_Publication_Workflow.md` | PNG / SVG / PUML |
| `ADR-018_UALink_Pod_Scale_Fabric_and_Compute_Domains.md` | PNG / SVG / PUML |
| `ADR-019_Pod_Scale_Network_Architecture_and_RDMA_RoCEv2_Tuning.md` | PNG / SVG / PUML |
| `ADR-020_CXL_Memory_Pools_for_UALink_Pods.md` | PNG / SVG / PUML |
| `ADR-021_Heterogeneous_GP_GPU_Compute_and_Scheduler_Governance.md` | PNG / SVG / PUML |
| `ADR-022_S3_Object_to_REST_API_Protocol_Mapping_Translator.md` | PNG / SVG / PUML |
| `ADR-023_Coherence_CE_Namespace_Modalities.md` | PNG / SVG / PUML |
| `ADR-024_System_Level_Benchmarking_Suite_Definitions.md` | PNG / SVG / PUML |
| `ADR-025_Broad_Systems_E2E_Testing_Workflows_and_Tooling.md` | PNG / SVG / PUML |
| `ADR-026_Coherence_CE_Object_Chunking_and_Manifest_Semantics.md` | PNG / SVG / PUML |

Public claim guardrails

UA-Link, CXL, RoCEv2, DPU, and heterogeneous GPU claims are graded with the following evidence levels:

- Direct: the source explicitly states the relationship or capability.
- Adjacent: relevant to the architecture, but not proof of a named integration.
- Negative-control: retained to prevent overclaiming.
- Not found in current sweep: searched, but no direct source-backed mention was found.
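
One way to apply these grades mechanically is to map the evidence collected for a claim to the strongest wording it permits. The policy encoded below is a hypothetical reading of the guardrails (only Direct evidence supports a named integration claim, Adjacent evidence supports context-only wording, and everything else supports no claim); ADR-016 remains the authoritative rule.

```python
from enum import Enum
from typing import Iterable


class EvidenceGrade(Enum):
    DIRECT = "direct"                      # source explicitly states the relationship
    ADJACENT = "adjacent"                  # relevant, but not proof of a named integration
    NEGATIVE_CONTROL = "negative-control"  # retained to prevent overclaiming
    NOT_FOUND = "not-found-in-current-sweep"


def permitted_claim(grades: Iterable[EvidenceGrade]) -> str:
    """Strongest public wording the collected evidence allows (hypothetical policy)."""
    found = set(grades)
    if EvidenceGrade.DIRECT in found:
        return "integration-claim"  # may state the capability, citing the direct source
    if EvidenceGrade.ADJACENT in found:
        return "context-only"       # may describe relevance, never a named integration
    return "no-claim"               # negative controls and empty sweeps support nothing


if __name__ == "__main__":
    print(permitted_claim([EvidenceGrade.ADJACENT, EvidenceGrade.NEGATIVE_CONTROL]))
```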
