
feat: dynamic OpenAPI spec registry (replace bundled specs) #4

Merged
tvt286 merged 21 commits into main from feat/dynamic-spec-registry
Apr 19, 2026

Conversation

tvt286 commented Apr 18, 2026

Summary

MCP server now fetches OpenAPI specs from VNG Cloud's public docs portal at startup instead of bundling them in the wheel. New products published on docs.api.vngcloud.vn appear automatically on the user's next server restart — no code release needed.

Architecture

Provider pattern so the source can be swapped with a one-line change:

SpecProvider (Protocol)
  ├── RedoclyPortalProvider  ← active (scrapes docs portal)
  ├── LocalDirProvider       (dev/test via GRN_MCP_SPEC_DIR)
  ├── JsonRegistryProvider   (future — S3)
  └── OciRegistryProvider    (future — vCR)
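
As a rough illustration of the provider pattern, the sketch below defines a `SpecProvider` protocol, a directory-backed provider, and a loader that skips a failing product instead of aborting. The method names (`list_products`, `fetch_spec`), the `<product>.json` file layout, and the `load_all` helper are assumptions made for this example, not the PR's actual signatures.

```python
import json
from pathlib import Path
from typing import Protocol


class SpecProvider(Protocol):
    """Anything that can list products and return one OpenAPI spec per product."""

    def list_products(self) -> list[str]: ...

    def fetch_spec(self, product: str) -> dict: ...


class LocalDirProvider:
    """Reads <product>.json files from a directory (hypothetical layout)."""

    def __init__(self, spec_dir: str) -> None:
        self.spec_dir = Path(spec_dir)

    def list_products(self) -> list[str]:
        return sorted(p.stem for p in self.spec_dir.glob("*.json"))

    def fetch_spec(self, product: str) -> dict:
        path = self.spec_dir / f"{product}.json"
        return json.loads(path.read_text(encoding="utf-8"))


def load_all(provider: SpecProvider) -> tuple[dict[str, dict], dict[str, str]]:
    """Load every product, recording failures instead of raising."""
    specs: dict[str, dict] = {}
    errors: dict[str, str] = {}
    for product in provider.list_products():
        try:
            specs[product] = provider.fetch_spec(product)
        except Exception as exc:  # one bad product must not abort the rest
            errors[product] = str(exc)
    return specs, errors
```

Because `SpecProvider` is a structural protocol, swapping the source only means passing a different object to `load_all`; no inheritance is required.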

Cache at ~/.greenode/mcp-specs/ with TTL + HTTP conditional GET. Partial failure tolerant — 1 product failing doesn't take down the whole server.
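
The TTL plus conditional-GET idea can be sketched as follows. `CacheEntry`, `is_fresh`, and `conditional_headers` are hypothetical names for this example; only the `If-None-Match` and `If-Modified-Since` request headers are standard HTTP. A fresh entry is served from disk, and a stale one is revalidated with the stored validators, so a `304 Not Modified` response lets the cached copy be reused without downloading the body again.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    fetched_at: float                    # Unix time of the last successful fetch
    etag: Optional[str] = None           # ETag from the previous response, if any
    last_modified: Optional[str] = None  # Last-Modified from the previous response


def is_fresh(entry: CacheEntry, ttl_seconds: float, now: Optional[float] = None) -> bool:
    """True while the entry is younger than the TTL, so no request is needed."""
    now = time.time() if now is None else now
    return (now - entry.fetched_at) < ttl_seconds


def conditional_headers(entry: CacheEntry) -> dict[str, str]:
    """Headers that turn a plain GET into a conditional GET."""
    headers: dict[str, str] = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers
```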

Breaking Changes

  • specs/vks.json no longer bundled in wheel
  • First run of v0.4.0 on a new machine requires network to docs.api.vngcloud.vn
  • Rollback: uvx greenode-mcp-server@0.3.2

New CLI flags

  • --refresh-specs — force re-download from registry
  • --offline — skip registry fetch, use cache only

Build-time URL config

Release workflow bakes DEFAULT_DOCS_PORTAL_URL at build time via GitHub Actions var DOCS_PORTAL_URL. End users cannot override it at runtime.

Test Plan

  • 124 unit tests passing
  • Ruff clean across entire codebase
  • Wheel contains registry/ + _build_info.py, does NOT contain specs/
  • CLI shows --refresh-specs and --offline flags
  • GRN_MCP_SPEC_DIR activates LocalDirProvider
  • Smoke test vs real portal — 906 endpoints loaded across VKS, vServer, vLB, vDB, vMonitor, etc.
  • Partial failure tolerance verified (1 bad product skipped, rest loaded)
  • Manual: uvx greenode-mcp-server --help in Claude Code
  • Manual: search_api / call_api work end-to-end with real IAM credentials

Docs Updated

  • src/greenode-mcp-server/README.md — Spec Registry section + troubleshooting
  • src/greenode-mcp-server/CHANGELOG.md — v0.4.0 entry with breaking changes
  • README.md (root) — updated repository structure
  • CLAUDE.md — updated project overview, repo structure, key files table
  • .github/workflows/release.yml — "Bake docs portal URL" step

🤖 Generated with Claude Code

tytv2 and others added 8 commits April 18, 2026 10:53
MCP server now fetches specs from VNG Cloud's public docs portal at
startup and caches them under ~/.greenode/mcp-specs/ — new products
appear automatically without any server release.

- Add SpecProvider protocol + factory (registry/provider.py, factory.py)
- Add RedoclyPortalProvider scraping docs.api.vngcloud.vn (~906 endpoints
  across VKS, vServer, vLB, vDB, vMonitor, and more)
- Add LocalDirProvider for dev/test (GRN_MCP_SPEC_DIR env var)
- Add SpecCache with TTL + HTTP conditional GET (registry/cache.py)
- Add load_specs orchestrator with offline/refresh flags (registry/loader.py)
- Integrate into api_index.py via initialize_index()
- New CLI flags: --refresh-specs, --offline
- Bake DEFAULT_DOCS_PORTAL_URL at build time via CI (release.yml)
- Delete bundled specs/ directory
- Update README, CHANGELOG, CLAUDE.md

Breaking: first run requires network to docs.api.vngcloud.vn. Roll back
with uvx greenode-mcp-server@0.3.2 if needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cacheless providers (local-dir) now fetch directly without touching the
on-disk cache. Previously running with GRN_MCP_SPEC_DIR would pollute
~/.greenode/mcp-specs/ with whatever was in the local dir, then the next
production run would read that stale data back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
VNG Cloud APIs wrap list payloads under various keys (listData, data,
results, records) besides 'items'. Previously _format_response only
recognized 'items' — other responses fell back to _format_object which
dumps every field of every item, blowing LLM context on large lists
(e.g. security groups returned as 11.9k-token raw JSON).

Also cap list output at 30 rows with a footer suggesting pagination.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
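
A minimal sketch of the list-wrapper handling described in the commit above. The wrapper keys and the 30-row cap come from the commit message; the function names and the footer wording are assumptions for illustration.

```python
from typing import Any, Optional

# Keys under which list payloads may be wrapped, per the commit message.
LIST_WRAPPER_KEYS = ("items", "listData", "data", "results", "records")
MAX_LIST_ROWS = 30


def extract_list(payload: Any) -> Optional[list]:
    """Return the list inside a response, or None if it is not a list response."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        for key in LIST_WRAPPER_KEYS:
            value = payload.get(key)
            if isinstance(value, list):
                return value
    return None


def cap_rows(rows: list, limit: int = MAX_LIST_ROWS) -> tuple[list, str]:
    """Truncate long lists and return a footer that suggests pagination."""
    if len(rows) <= limit:
        return rows, ""
    footer = f"Showing {limit} of {len(rows)} rows; use pagination parameters to see more."
    return rows[:limit], footer
```

Checking `isinstance(value, list)` matters here: a key such as `data` can also hold a single object, which should still go through the object formatter.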
Previously search was strict AND-match — "flavor" in product=vks
returned empty even though vServer has flavor endpoints. Now falls back:

  Tier 1: AND match, scoped by product
  Tier 2: AND match, all products (when scoped gives 0)
  Tier 3: OR match, scoped by product
  Tier 4: OR match, all products

Also:
- Simple stemming: "clusters" matches "cluster" (trailing -s stripped
  for words >4 chars)
- Relevance ranking: summary (+3) > path (+2) > description (+1)
  so strongest matches surface first
- Entry.format() now prefixes [product] so AI sees which product each
  result belongs to — important when fallback returns cross-product
  matches

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
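
The four-tier fallback, stemming, and ranking from the commit above can be sketched like this. The weights and tier order follow the commit message; the `Entry` fields, the use of substring matching, and the function names are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    product: str
    summary: str
    path: str
    description: str = ""


def stem(word: str) -> str:
    """Strip a trailing 's' from words longer than four characters."""
    word = word.lower()
    if len(word) > 4 and word.endswith("s"):
        return word[:-1]
    return word


def score(entry: Entry, term: str) -> int:
    """Weight matches: summary (+3) > path (+2) > description (+1)."""
    total = 0
    if term in entry.summary.lower():
        total += 3
    if term in entry.path.lower():
        total += 2
    if term in entry.description.lower():
        total += 1
    return total


def search(entries: list[Entry], query: str, product: Optional[str] = None) -> list[Entry]:
    terms = [stem(t) for t in query.split()]
    if not terms:
        return []
    scopes = [product, None] if product else [None]
    # Tier order: AND scoped, AND all products, OR scoped, OR all products.
    for combine in (all, any):
        for scope in scopes:
            hits = []
            for entry in entries:
                if scope and entry.product != scope:
                    continue
                scores = [score(entry, t) for t in terms]
                if combine(s > 0 for s in scores):
                    hits.append((sum(scores), entry))
            if hits:
                hits.sort(key=lambda pair: pair[0], reverse=True)
                return [entry for _, entry in hits]
    return []
```

The first tier that yields any hit wins, so a scoped query only widens to other products when the scoped search comes back empty.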
Spawns greenode-mcp-server as stdio subprocess and verifies the basic
MCP handshake (initialize → tools/list → tools/call search_api) works
end-to-end. Uses GRN_MCP_SPEC_DIR fixture to avoid docs portal dep.

Catches regressions in:
- Protocol-level serialization (tool schemas, JSON-RPC framing)
- Tool registration (all 8 tools must be exposed)
- Startup sequence (initialize_index must complete before tools/call)

Runs after unit tests + ruff in the PR workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously `{"status": "DELETING"}` rendered as `**status**: DELETING`
— markdown bold on a lone key is noisy. Now single-field dicts render
as plain `status: DELETING`. Multi-field responses keep bold for key
emphasis.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
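
A minimal sketch of the single-field rule from the commit above. The function name `format_object` and the one-field-per-line layout for multi-field dicts are assumptions; only the two rendered shapes are taken from the commit message.

```python
def format_object(obj: dict) -> str:
    """Render a dict as key/value lines; skip bold when there is only one field."""
    if len(obj) == 1:
        key, value = next(iter(obj.items()))
        return f"{key}: {value}"
    return "\n".join(f"**{key}**: {value}" for key, value in obj.items())
```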
### Bug 1: "Invalid kube-config: expected key current-context"

VKS GET /v1/clusters/{id}/kubeconfig returns a JSON object
ClusterKubeConfigDto:
  {"kubeConfig": "<yaml string>", "status": "ACTIVE", ...}

k8s_client_cache was calling get_raw() and treating the whole JSON as
the kubeconfig YAML. yaml.safe_load happily parsed it (JSON is valid
YAML) into a dict with keys like 'kubeConfig', 'status', 'expirationAt'
— then the kubernetes library choked on the missing 'current-context'.

Fix: use get() to parse JSON, check status, extract kubeConfig field,
yaml.safe_load THAT string. Clear errors for NONE/CREATING/ERROR so the
caller knows to request a kubeconfig first.
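
The fix can be sketched with the standard library alone. `extract_kubeconfig_yaml` and `KubeconfigNotReady` are hypothetical names, and the error wording is illustrative; the envelope fields and status values are the ones quoted above. The returned string is what would then be handed to `yaml.safe_load`.

```python
import json


class KubeconfigNotReady(Exception):
    """Raised when the envelope does not carry a usable kubeconfig."""


def extract_kubeconfig_yaml(response_body: str) -> str:
    """Unwrap the JSON envelope and return only the kubeconfig YAML text."""
    envelope = json.loads(response_body)
    status = envelope.get("status")
    if status in ("NONE", "CREATING", "ERROR"):
        raise KubeconfigNotReady(f"kubeconfig is not usable yet (status={status})")
    kubeconfig = envelope.get("kubeConfig")
    if not isinstance(kubeconfig, str) or not kubeconfig.strip():
        raise KubeconfigNotReady("response has no kubeConfig field")
    return kubeconfig
```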

### Bug 2: list_k8s_resources / manage_k8s_resource require api_version

For common built-in kinds (Pod, Deployment, Service, PVC, ...) the
api_version is well-known. Requiring users to guess it blocks the
happy path. Now api_version is optional: if unset, look up the kind
in a built-in COMMON_API_VERSIONS map (31 kinds). Custom resources
still need explicit api_version.
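
A sketch of the lookup, assuming a helper named `resolve_api_version`. The map below is a small illustrative subset using standard Kubernetes group/versions, not the PR's full 31-kind table.

```python
from typing import Optional

# Illustrative subset; the real COMMON_API_VERSIONS map is larger.
COMMON_API_VERSIONS = {
    "Pod": "v1",
    "Service": "v1",
    "PersistentVolumeClaim": "v1",
    "Deployment": "apps/v1",
    "StatefulSet": "apps/v1",
    "Job": "batch/v1",
    "Ingress": "networking.k8s.io/v1",
}


def resolve_api_version(kind: str, api_version: Optional[str] = None) -> str:
    """Prefer an explicit api_version; otherwise fall back to the built-in map."""
    if api_version:
        return api_version
    try:
        return COMMON_API_VERSIONS[kind]
    except KeyError:
        raise ValueError(
            f"api_version is required for kind {kind!r} (not a built-in kind)"
        ) from None
```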

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously list_k8s_resources returned only name/namespace/labels/
annotations — AI had to call manage_k8s_resource per-item to see if
pods were Running, PVCs Bound, deployments rolled out, etc. Slow and
token-heavy for large namespaces.

Now each summary carries a compact status_summary string:
- Pod: "Running (ready 2/2, restarts 0)"
- Deployment/StatefulSet/ReplicaSet: "3/3 ready"
- DaemonSet: "3/3 ready"
- Service: "LoadBalancer 10.0.0.1 → 1.2.3.4"
- PersistentVolumeClaim: "Bound (10Gi)"
- Node: "Ready (v1.29.0)"
- Job: "active=0 succeeded=1 failed=0"
- Ingress: "1.2.3.4" or "no address"
- Unknown kind: falls back to status.phase

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
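
A rough sketch of how such a summary string could be derived for a few of the kinds listed in the commit above. It assumes resources arrive as plain dicts shaped like Kubernetes JSON; the real code may work on client objects and handle more kinds and edge cases.

```python
def status_summary(resource: dict) -> str:
    """Build a compact one-line status for a Kubernetes resource dict."""
    kind = resource.get("kind", "")
    spec = resource.get("spec") or {}
    status = resource.get("status") or {}

    if kind == "Pod":
        containers = status.get("containerStatuses") or []
        ready = sum(1 for c in containers if c.get("ready"))
        restarts = sum(c.get("restartCount", 0) for c in containers)
        phase = status.get("phase", "Unknown")
        return f"{phase} (ready {ready}/{len(containers)}, restarts {restarts})"

    if kind in ("Deployment", "StatefulSet", "ReplicaSet"):
        return f"{status.get('readyReplicas', 0)}/{spec.get('replicas', 0)} ready"

    if kind == "PersistentVolumeClaim":
        size = (status.get("capacity") or {}).get("storage", "?")
        return f"{status.get('phase', 'Unknown')} ({size})"

    # Unknown kinds fall back to status.phase.
    return str(status.get("phase", "unknown"))
```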
Comment thread src/greenode-mcp-server/tests/test_status_summary.py Fixed
tytv2 and others added 13 commits April 18, 2026 15:56
…vents

Per the design spec, only Secret reads should require the sensitive-data
flag. get_pod_logs and get_k8s_events had crept onto the same
gate, blocking common debug workflows like "what's in the logs?" and
"why is this pod pending?".

- Pod logs and events are routine debug reads, similar to listing
  resources — no stricter guard than list_k8s_resources itself.
- Apps should not log secrets; if they do, that's an app-layer bug.
- Secrets remain guarded (manage_k8s_resource read on kind=Secret).

Docs (README, CLAUDE.md, --allow-sensitive-data-access help text) already
scoped the flag to Secrets — no doc changes needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…veat

Output previously showed `namespace: null` when listing cross-namespace,
which can read as "default namespace" or "unknown". Now shows
`namespace: "all namespaces"` explicitly when scope is cluster-wide.

Added docstring notes:
- Explicit hint that leaving namespace empty lists all namespaces
- Warning that `status.phase != Running` misses CrashLoopBackOff pods
  (they keep phase=Running). Point AI at status_summary for reliable
  unhealthy-pod detection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously apply_yaml required yaml_path (absolute server-local path) and
a namespace. That breaks when:
- MCP client and server run on different machines (common case — user's
  YAML isn't on the server)
- The manifest already declares its own namespaces (no need to override)

Now:
- yaml_content (inline string) or yaml_path — pick one
- namespace optional; defaults to "default" for resources without one

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
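
The new apply_yaml input rules from the commit above could be validated roughly as below. Both helper names are hypothetical, and the precedence (a namespace declared in the manifest wins over the parameter) is an assumption drawn from the "no need to override" remark, not confirmed behavior.

```python
from pathlib import Path
from typing import Optional


def load_manifest_text(yaml_content: Optional[str] = None,
                       yaml_path: Optional[str] = None) -> str:
    """Accept exactly one of inline content or a server-local path."""
    if (yaml_content is None) == (yaml_path is None):
        raise ValueError("provide exactly one of yaml_content or yaml_path")
    if yaml_content is not None:
        return yaml_content
    return Path(yaml_path).read_text(encoding="utf-8")


def effective_namespace(manifest: dict, namespace: Optional[str] = None) -> str:
    """Manifest namespace first (assumed), then the parameter, then 'default'."""
    declared = (manifest.get("metadata") or {}).get("namespace")
    return declared or namespace or "default"
```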
VKS previously used 0-based pagination (page=0 = first page). VNG Cloud
team is standardizing to 1-based across all products so VKS will be
updated to match.

- CLAUDE.md: drop VKS-specific 0-based note; state the cross-product
  1-based convention
- call_api tool description: add pagination hint so the AI uses page=1
  and knows what to do when the API returns 400 Page/size invalid

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AI queries often use cloud-generic terms (VPC, instance, firewall) but
VNG Cloud specs use product-specific names (network, server, secgroup).
Stemming only handled plurals — synonyms need explicit mapping.

Added small VNG-specific synonym map:
- vpc → network
- instance → server
- firewall → secgroup, security
- pvc → persistentvolumeclaim
- k8s → kubernetes
- lb ↔ loadbalancer

AND semantics preserved: each query term must match some variant of
itself; synonyms expand what counts as "match" for a single term, not
what counts as a separate term.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
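
The "synonyms widen a term, AND still applies across terms" rule from the commit above can be sketched as follows. The map mirrors the commit message; `variants`, `matches_all`, and the substring matching are assumptions for illustration.

```python
# Synonym map taken from the commit message; lb and loadbalancer map both ways.
SYNONYMS = {
    "vpc": {"network"},
    "instance": {"server"},
    "firewall": {"secgroup", "security"},
    "pvc": {"persistentvolumeclaim"},
    "k8s": {"kubernetes"},
    "lb": {"loadbalancer"},
    "loadbalancer": {"lb"},
}


def variants(term: str) -> set[str]:
    """A term plus every synonym that counts as a match for it."""
    term = term.lower()
    return {term} | SYNONYMS.get(term, set())


def matches_all(text: str, query: str) -> bool:
    """AND across query terms; OR across the variants of each single term."""
    haystack = text.lower()
    return all(
        any(variant in haystack for variant in variants(term))
        for term in query.split()
    )
```

With this shape, "vpc create" matches "Create a network", while "instance firewall" does not match "List servers" because the second term has no variant in the text.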
…ders

VNG Cloud APIs embed the account's project UUID in many URL paths. AI
previously had to call /v1/projects manually, extract projectId, then
substitute — adding an extra round-trip to every workflow.

New ProjectContext lazily fetches and caches the first project from
vServer /v1/projects on startup. call_api now swaps {projectId} and
{project_id} in paths transparently before making the request.

- project_context.py: async, thread-safe, in-memory cache
- api_caller.call_api: substitutes placeholder when project_context is
  provided; falls through unchanged otherwise
- server.py: wires a shared ProjectContext into call_api_tool
- Tool description tells the AI to leave placeholders in the path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
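
The placeholder swap from the commit above amounts to a plain string substitution. The helper name below is hypothetical; the two placeholder spellings come from the commit message.

```python
def substitute_project_id(path: str, project_id: str) -> str:
    """Replace both placeholder spellings; paths without them pass through."""
    for placeholder in ("{projectId}", "{project_id}"):
        path = path.replace(placeholder, project_id)
    return path
```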
greenode-cli v1.3.x now saves project_id per profile in ~/.greenode/config
and supports GRN_DEFAULT_PROJECT_ID env override. MCP reads from the same
source, so users who ran 'grn configure' get zero-latency project_id
resolution — no vServer /v1/projects call at all.

Resolution order:
  1. GRN_DEFAULT_PROJECT_ID env var
  2. ~/.greenode/config [profile] project_id field
  3. ProjectContext API fetch (existing fallback)
  4. Clear error directing user to 'grn configure'

Also fix a latent bug: config.py was reading non-default profiles from
section "[<name>]", but greenode-cli writes them as "[profile <name>]"
per AWS convention. Now aligned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
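
A sketch of the config-backed part of that resolution order (steps 1, 2, and 4; the API fetch in step 3 is left out). It assumes the config file is INI-style with a `[default]` section for the default profile, following the AWS convention the commit mentions; the function names and the injectable `env` parameter are illustrative.

```python
import configparser
import os
from typing import Mapping, Optional


def section_name(profile: str) -> str:
    """Non-default profiles live under '[profile <name>]'."""
    return "default" if profile == "default" else f"profile {profile}"


def resolve_project_id(config_text: str, profile: str = "default",
                       env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env

    # 1. Environment override wins.
    from_env = env.get("GRN_DEFAULT_PROJECT_ID")
    if from_env:
        return from_env

    # 2. project_id field of the selected profile.
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = section_name(profile)
    if parser.has_option(section, "project_id"):
        return parser.get(section, "project_id")

    # 4. Nothing configured: point the user at the setup step.
    raise LookupError("no project_id configured; run 'grn configure'")
```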
ProjectContext was added to auto-fetch project_id from vServer /v1/projects
when not configured. Now that greenode-cli v1.3.x writes project_id to
~/.greenode/config (and supports GRN_DEFAULT_PROJECT_ID env override),
the API fallback is redundant with the same logic in the CLI wizard.

Simpler is better:
- One source of truth (config) instead of two
- Clear error message directing users to `grn configure` beats silent
  background fetching — users learn the setup step faster
- Removes ~280 LoC (module + tests)
- Capability layer stays thin

Deleted:
- greennode/greenode_mcp_server/project_context.py
- tests/test_project_context.py

Modified:
- api_caller.call_api: drop project_context param, error when config
  lacks project_id
- server.py: remove ProjectContext wiring
- call_api tool description: reflect single-source behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The MCP server covers all GreenNode products, not just VKS. The config
type shouldn't imply a single product. Mirror the earlier rename of
VksClient to GreenodeClient.

- config.py: class name + "VKS configuration" docstrings → "GreenNode
  configuration"
- RegionEndpoints: "Endpoints for a single VKS region" → "Service
  endpoints for a single GreenNode region"
- auth.py, client.py, api_caller.py, k8s_handler.py: import + type
  annotations updated

No behavior change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three tweaks mirroring patterns in Cloudflare's production MCP servers:

1. raw=True: when set, call_api returns the full JSON response instead
   of the 6-column markdown table. AI uses this when it needs fields
   the table hides (e.g. subnet counts, tag maps) or wants to transform
   the data itself.

2. MAX_RESPONSE_BYTES = 800,000: matches Cloudflare's graphql tool
   guard. Responses larger than this return an actionable error
   asking the caller to paginate rather than silently truncating.

3. MAX_LIST_ROWS 30 → 100: matches Cloudflare's logpush default.
   Large-enough to cover most list operations without requiring
   pagination, small-enough to stay well inside the size cap.

Deliberately NOT added (after reviewing Cloudflare's codebase):
- fields / projection param (no Cloudflare tool has one)
- response caching (Cloudflare relies on backend/CDN)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
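
A minimal sketch of the size guard and the `raw` switch from the commit above. The constants come from the commit message; `guard_size`, `render_response`, the error wording, and the choice to measure the undecoded body are assumptions. The table formatter is passed in as a callback so the example stays self-contained.

```python
import json
from typing import Any, Callable

MAX_RESPONSE_BYTES = 800_000
MAX_LIST_ROWS = 100


class ResponseTooLarge(Exception):
    """Raised instead of silently truncating an oversized response."""


def guard_size(body: bytes, limit: int = MAX_RESPONSE_BYTES) -> bytes:
    if len(body) > limit:
        raise ResponseTooLarge(
            f"response is {len(body)} bytes (limit {limit}); "
            "request a smaller page and paginate"
        )
    return body


def render_response(body: bytes, raw: bool,
                    format_table: Callable[[Any], str]) -> str:
    """Return full JSON when raw is set, otherwise the compact table view."""
    data = json.loads(guard_size(body))
    if raw:
        return json.dumps(data, indent=2)
    return format_table(data)
```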
Claude Code and other LLM clients sometimes shorten long IDs with
ellipsis (e.g. 'net-05934e2d...') when re-rendering tool output into
tables for readability. That makes the value unusable — the user can't
copy/paste it into the next call.

SERVER_INSTRUCTIONS now explicitly tells the LLM: never truncate IDs,
UUIDs, names, or certificate data. Prefer vertical key/value layout if
the table gets wide.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Full audit pass: every doc touched in the repo now reflects current
behavior, with stale bits removed:

- docs/DEVELOPMENT.md: replaced every `vks-mcp-server` reference (was
  copy-pasted from the old VKS-only era), added --refresh-specs /
  --offline flags, GRN_DEFAULT_PROJECT_ID env var, spec registry
  section, build-time URL bake explanation, DOCS_PORTAL_URL GitHub
  variable
- src/greenode-mcp-server/README.md: documented call_api `raw=True`
  param, path placeholder resolution, full parameter table, profile
  behavior; dropped "pip install grncli" (CLI is Go now)
- README.md (root): GRN_DEFAULT_PROJECT_ID in credential setup;
  corrected sensitive-data claim (only Secret reads, not pod logs/events)
- CLAUDE.md: added API quirks (list wrapper keys, placeholders,
  COMMON_API_VERSIONS, kubeconfig envelope), expanded security rules
  (response size cap, row cap, pod logs/events ungated), updated test
  count (~175), linked MCP protocol smoke script
- CHANGELOG.md v0.4.0: fleshed out the stub section with every
  shipped feature + fix + breaking change

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ring sanitization'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@tvt286 tvt286 merged commit a97856e into main Apr 19, 2026
2 of 6 checks passed