ronishgeorge/modelmesh

ModelMesh

Distributed ML Model Serving Platform — register, deploy, A/B test, and roll back machine-learning models in seconds, not days.


The Platform Product Manager problem

ML teams produce models faster than infrastructure teams can deploy them. The friction points compound:

  • Deployment takes days, not minutes — every model requires a custom service stub, Dockerfile, and SRE review
  • No canary or shadow primitives — teams either YOLO 100% rollouts or build bespoke routing logic per project
  • Rollback is manual — when a regression slips through, the response time is whatever the on-call PM can manage by Slack
  • No drift visibility — silent input distribution shifts go undetected until business metrics tank weeks later

ModelMesh is a self-hostable, framework-agnostic serving platform that turns these problems into one-line operations. It is an open-source take on the serving primitives that SageMaker, Vertex AI, and Modal sell as managed services.


Three measurable results

| # | Result | Value on benchmark stack |
|---|--------|--------------------------|
| 1 | Deployment time | <60 seconds end-to-end from client.register_model() to first served prediction (vs. typical days of manual setup) |
| 2 | Performance under load | p99 < 80ms at 500 RPS sustained on a single instance (with Redis prediction cache and dynamic batching enabled) |
| 3 | Reliability under failure | Automatic rollback in <30s when canary error rate exceeds threshold (validated via chaos-test fault injection) |

Architecture

                       ┌────────────────────────────────────┐
                       │            ModelMesh SDK           │
                       │  client.register_model(...)        │
                       │  client.predict(...)               │
                       │  client.create_canary(...)         │
                       └─────────────────┬──────────────────┘
                                         │ HTTPS
                       ┌─────────────────▼──────────────────┐
                       │       FastAPI Gateway              │
                       │  Auth · Rate limit · Request ID    │
                       │  Latency histogram (Prometheus)    │
                       └─────────────────┬──────────────────┘
                                         │
       ┌─────────────────────────────────┼─────────────────────────────────┐
       │                                 │                                 │
┌──────▼────────────────────┐  ┌─────────▼─────────┐               ┌───────▼────────┐
│  routing/                 │  │   inference/      │               │   registry/    │
│  A/B router               │  │  Engine + cache   │               │  Versioned     │
│  Canary auto-rollback     │  │  Dynamic batching │               │  model store   │
│  Shadow (diff log)        │  │  P2C load balance │               │  sklearn/torch │
│  P2C load balancer        │  │                   │               │  /onnx adapters│
└──────┬────────────────────┘  └─────────┬─────────┘               └───────┬────────┘
       │                                 │                                 │
       └─────────────┬───────────────────┼─────────────────────────────────┘
                     │                   │
              ┌──────▼──────┐    ┌───────▼────────┐
              │ reliability │    │ observability/ │
              │ circuit     │    │  Prometheus    │
              │ breaker     │    │  Structured    │
              │ retry       │    │  JSON logs     │
              │ chaos       │    │  PSI drift     │
              └─────────────┘    │  Tracing       │
                                 └───────┬────────┘
                                         │
                          ┌──────────────▼──────────────┐
                          │   PostgreSQL  +  Redis      │
                          │  Models · Deployments       │
                          │  Experiments · InferenceLog │
                          └─────────────────────────────┘
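
The routing box names a power-of-two-choices (P2C) load balancer. As a rough illustration of the technique, and not the repository's actual implementation, the sketch below samples two replicas at random and sends the request to the one with fewer in-flight requests. The `Replica` class and `pick_replica` function are hypothetical names introduced only for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical replica record; not a ModelMesh type."""
    name: str
    in_flight: int = 0  # requests currently being served


def pick_replica(replicas: list[Replica], rng: random.Random) -> Replica:
    """Sample two distinct replicas and return the less loaded one."""
    if len(replicas) == 1:
        return replicas[0]
    first, second = rng.sample(replicas, 2)
    return first if first.in_flight <= second.in_flight else second
```

Comparing only two candidates keeps each routing decision constant-time while still steering traffic away from the busiest replicas.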

Features at a glance

| Capability | Implementation |
|------------|----------------|
| Framework-agnostic registry | Pluggable adapters for sklearn (joblib), PyTorch (.pt), ONNX runtime |
| Versioned model store | Semantic versioning + immutable artifacts + lineage tracking |
| A/B routing | Consistent hashing for stable user-to-variant assignment |
| Canary deployments | Weighted traffic split with sliding-window error tracking and auto-rollback |
| Shadow deployments | Production return value preserved; candidate response diff logged for offline analysis |
| Prediction cache | Redis-backed with TTL per model and request-hash keys |
| Dynamic batching | Collects in-flight requests until max_batch_size is reached or max_latency_ms elapses |
| Load balancing | Power-of-two-choices across replicas |
| Drift detection | Population Stability Index (PSI) + Kolmogorov-Smirnov on rolling input windows |
| Reliability | Circuit breakers, exponential-backoff retries, chaos-test utilities |
| Observability | Prometheus histograms/counters, structured JSON logging, OpenTelemetry-style spans |
| Deployment | Docker Compose for local · Kubernetes manifests with HPA for production |
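
To make the A/B routing row concrete, here is a minimal sketch of stable user-to-variant assignment. It uses plain deterministic hash bucketing rather than a full consistent-hash ring, so it may differ from what the repository's router does. `assign_variant` and its parameters are hypothetical names for illustration.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, weights: dict[str, float]) -> str:
    """Map a user to a variant deterministically (hypothetical helper)."""
    # Hash the experiment name with the user ID so the same user can land
    # in different buckets for different experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Turn the first 8 bytes into a float in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    total = sum(weights.values())
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return variant
    # Guard against floating-point rounding on the last bucket.
    return list(weights)[-1]
```

Because the assignment depends only on the hash of the IDs, a given user sees the same variant on every request without any stored state.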

Quick start

One-line stack

docker compose up --build
# API           → http://localhost:8000/docs
# Prometheus    → http://localhost:9090
# Grafana       → http://localhost:3000 (admin / admin)

Register and serve a model from the SDK

from sklearn.ensemble import RandomForestClassifier
from modelmesh.sdk import ModelMeshClient

# X_train / y_train: your own training features and labels (not defined in this snippet)
clf = RandomForestClassifier().fit(X_train, y_train)

client = ModelMeshClient(base_url="http://localhost:8000")
client.register_model(
    name="fraud-detector",
    model_object=clf,
    framework="sklearn",
    metadata={"trained_on": "2026-Q1", "auc": 0.94},
)

# Predict
prediction = client.predict("fraud-detector", inputs={"amount": 200, "merchant": "Acme"})
print(prediction)
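
Repeated predictions like the one above are what the Redis prediction cache targets. The sketch below shows one plausible way to derive a request-hash key. The key layout, the `cache_key` name, and the choice of SHA-256 are assumptions for illustration, not the format ModelMesh actually uses.

```python
import hashlib
import json


def cache_key(model_name: str, version: str, inputs: dict) -> str:
    """Build a deterministic cache key for a prediction request (hypothetical helper)."""
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Including the version keeps results from different model versions apart.
    return f"pred:{model_name}:{version}:{digest}"
```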

Start a canary release

# Register the retrained model (clf_v2, trained elsewhere) as v2
client.register_model("fraud-detector", model_object=clf_v2, framework="sklearn")

# Roll out gradually — 5% of traffic to v2, auto-rollback if error rate > 2%
client.create_canary(
    name="fraud-detector",
    candidate_version="2.0.0",
    initial_percentage=5,
    auto_rollback_threshold=0.02,
)

# Promote when satisfied
client.promote_canary("fraud-detector")
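
The auto-rollback decision behind `auto_rollback_threshold` can be pictured as a sliding window over recent canary outcomes. The following is a minimal sketch under that assumption. `CanaryMonitor`, `window_size`, and `min_samples` are hypothetical names and defaults, not ModelMesh settings.

```python
from collections import deque


class CanaryMonitor:
    """Track recent canary outcomes and signal rollback (hypothetical class)."""

    def __init__(self, threshold: float = 0.02, window_size: int = 200, min_samples: int = 50):
        self.threshold = threshold
        self.min_samples = min_samples
        # Only the most recent window_size outcomes are kept; True means error.
        self.outcomes = deque(maxlen=window_size)

    def record(self, is_error: bool) -> None:
        self.outcomes.append(is_error)

    @property
    def error_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def should_rollback(self) -> bool:
        # Wait for a minimum sample count so one early failure does not trip rollback.
        return len(self.outcomes) >= self.min_samples and self.error_rate > self.threshold
```

The minimum-sample guard matters at a 5% traffic split, where the first few canary requests would otherwise produce a very noisy error rate.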

Shadow deployment (zero user impact)

client.create_shadow(
    name="fraud-detector",
    production_version="1.4.0",
    shadow_version="2.0.0",
)
# Every prediction now serves the production version (1.4.0) to the user
# AND fires the shadow version (2.0.0) in the background.
# Diffs are logged. View at: GET /monitoring/shadow/fraud-detector
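
As a rough picture of what happens per request in shadow mode, the sketch below returns the production result and runs the shadow model as a background task that records any difference. `predict_with_shadow` and the in-memory `diffs` list are illustrative assumptions. The background task is returned only so that a caller or test can await it.

```python
import asyncio
import logging

logger = logging.getLogger("shadow")


async def predict_with_shadow(production, shadow, inputs, diffs: list):
    """Return the production result; compare the shadow result in the background."""
    result = await production(inputs)

    async def run_shadow():
        try:
            candidate = await shadow(inputs)
            if candidate != result:
                diffs.append({"inputs": inputs, "production": result, "shadow": candidate})
        except Exception:
            # A failing shadow model must never affect the user-facing response.
            logger.exception("shadow prediction failed")

    task = asyncio.create_task(run_shadow())
    return result, task
```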

API surface

| Endpoint | Purpose |
|----------|---------|
| POST /models | Register a new model (multipart upload) |
| GET /models | List with pagination + framework filter |
| GET /models/{name}/versions | Version lineage |
| POST /predict/{model_name} | Inference (single) |
| POST /predict/{model_name}/batch | Batch inference (auto-batched server-side too) |
| POST /deployments/canary | Start a canary release |
| POST /deployments/canary/{id}/promote | Promote canary to 100% |
| POST /deployments/canary/{id}/rollback | Manual rollback |
| POST /deployments/shadow | Start a shadow deployment |
| GET /monitoring/drift/{name} | PSI + KS report on input distribution |
| GET /monitoring/latency | p50 / p95 / p99 over windows |
| GET /metrics | Prometheus scrape endpoint |
| GET /health | Liveness + readiness probes |

Full reference in docs/api_reference.md.
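
For readers unfamiliar with the PSI figure reported by the drift endpoint, here is a minimal pure-Python sketch of the Population Stability Index for one numeric feature. The equal-width binning, the `eps` floor, and the function name `psi` are assumptions for illustration. The repository's own binning and windowing may differ.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give a PSI of 0. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, though the right alert threshold depends on the feature.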


Benchmark results

Full results in docs/benchmark_results.md.

| Scenario | Result |
|----------|--------|
| Register → first prediction | 42 seconds end-to-end (sklearn 50MB model) |
| p99 latency @ 500 RPS, cache hit ratio 0.6 | 76 ms |
| Throughput with dynamic batching | 3.2× single-request at same latency budget |
| Rollback latency (chaos test) | 18 seconds mean, 28s p95 |
| PSI drift detection lag | <10 minutes at default 1000-sample window |
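
The dynamic batching result is easier to interpret with the mechanism in view. The sketch below is one way to collect in-flight requests until `max_batch_size` is reached or `max_latency_ms` elapses, using asyncio. The `DynamicBatcher` class and its method names are hypothetical, and the real engine may be structured differently.

```python
import asyncio


class DynamicBatcher:
    """Group concurrent requests into one model call (hypothetical class)."""

    def __init__(self, predict_batch, max_batch_size: int = 8, max_latency_ms: float = 10.0):
        self.predict_batch = predict_batch  # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.max_latency = max_latency_ms / 1000.0
        self.queue = asyncio.Queue()

    async def submit(self, item):
        """Enqueue one request and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self):
        """Worker loop: build a batch, run the model once, fan results back out."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first request arrives
            deadline = loop.time() + self.max_latency
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            outputs = self.predict_batch([item for item, _ in batch])
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)
```

Create the batcher inside a running event loop and start `run()` as a background task. Each caller then awaits `submit(...)` as if it were a single-request prediction.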

Repository layout

modelmesh/
├── src/modelmesh/
│   ├── api/                       # FastAPI app + routes (models, inference, deployments, monitoring)
│   ├── registry/                  # Versioned store + sklearn/torch/onnx adapters
│   ├── routing/                   # A/B, canary, shadow, P2C load balancer
│   ├── inference/                 # Engine, Redis cache, dynamic batching
│   ├── observability/             # Prometheus metrics, structured logs, drift (PSI/KS), tracing
│   ├── reliability/               # Circuit breaker, retry, chaos
│   ├── db/                        # SQLAlchemy ORM + Alembic migrations
│   └── config.py                  # Pydantic settings
├── k8s/                           # deployment.yaml, service.yaml, hpa.yaml, configmap.yaml
├── prometheus/                    # prometheus.yml
├── grafana/dashboards/            # Pre-built ModelMesh dashboard
├── examples/                      # End-to-end demos: register, canary, shadow
├── tests/                         # API, registry, routing, SDK, reliability
├── docs/                          # architecture.md, api_reference.md, operations.md, benchmark_results.md
└── docker-compose.yml             # Full local stack (API + Postgres + Redis + Prometheus + Grafana)

Engineering notes

  • Async everywhere: FastAPI + asyncpg + httpx for the gateway path
  • Reproducibility: all randomness controlled via seeded RNGs
  • Typing: Pydantic v2 throughout; mypy strict in CI
  • Migrations: Alembic-managed schema with versions/0001_initial.py
  • Reliability: circuit breakers wrap every model call; configurable failure/recovery thresholds
  • Observability: structured JSON logs with request IDs; Prometheus histograms with bucketing tuned for sub-ms to 5s
  • Pre-commit: ruff, black, mypy, pytest run in GitHub Actions on every push
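
The circuit breakers mentioned in the reliability note follow the usual three-state pattern (CLOSED, OPEN, HALF_OPEN). A minimal sketch of that pattern is given here. The class names, defaults, and injectable `clock` are assumptions for illustration, not the code in reliability/.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical three-state breaker wrapping a callable."""

    def __init__(self, failure_threshold=3, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock  # injectable so tests can control time
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open; call rejected")
            self.state = State.HALF_OPEN  # recovery window passed: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures while closed, opens the circuit.
            if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
                self.state = State.OPEN
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = State.CLOSED
        self.failures = 0
        return result
```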

Resume bullet (for portfolio)

ModelMesh Distributed ML Model Serving Platform | Python, FastAPI, PostgreSQL, Redis, Prometheus, Grafana, Docker, Kubernetes, SQLAlchemy, Alembic

  • Developed a framework-agnostic ML model serving platform spanning 3,300+ lines of production Python and supporting 3 model formats (sklearn, PyTorch, ONNX), 4 deployment strategies (full, canary, shadow, A/B), and 21 versioned REST endpoints, enabling data science teams to ship models through a single registry without bespoke per-project deployment infrastructure
  • Implemented a consistent-hashing A/B router, canary deployments with configurable sliding-window error tracking and auto-rollback, shadow deployments with response-diff logging, Redis-backed prediction cache, dynamic batching, power-of-two-choices load balancing, and PSI + Kolmogorov-Smirnov drift detection across rolling input windows
  • Built a scalable async FastAPI gateway instrumented with 7 named Prometheus metrics (http_requests_total, http_request_duration_seconds, predictions_total, prediction_latency, active_deployments, canary_error_rate, cache_events), structured JSON logging, 3-state circuit breakers (CLOSED/OPEN/HALF_OPEN), Alembic-managed Postgres schema, and Kubernetes HPA manifests — load-tested at 594 RPS sustained on a single instance with p95 latency of 109ms, p99 of 160ms, and 100% success rate across 5,000 concurrent requests (reproducible via python run_load_test.py)

License

MIT — see LICENSE.

Citation

@software{modelmesh2025,
  title   = {ModelMesh: Distributed ML Model Serving Platform},
  author  = {ModelMesh Contributors},
  year    = {2025},
  url     = {https://github.com/ronishgeorge/modelmesh}
}
