feat: unify ArksApplication on RBG v0.6.0 #79

Open: LikiosSedo wants to merge 8 commits into scitix:main from LikiosSedo:pr-unified-rbg060-clean

LikiosSedo commented Mar 12, 2026

Summary

  • upgrade Arks to work with RBG v0.6.0
  • evolve ArksApplication into the unified inference entrypoint
  • support mode=unified|disaggregated in ArksApplication
  • support router in unified mode
  • keep ArksDisaggregatedApplication as the legacy-compatible path
  • align disaggregated ArksApplication top-level status summary
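The `mode=unified|disaggregated` switch listed above could be validated roughly as follows. This is only a sketch: the `Mode` type, the constant names, and `parseMode` are hypothetical and are not taken from the PR's code, and the PR text does not say what happens when the mode is left empty, so the sketch simply rejects anything other than the two named values.

```go
package main

import "fmt"

// Mode is a hypothetical string type for the application mode.
type Mode string

const (
	ModeUnified       Mode = "unified"
	ModeDisaggregated Mode = "disaggregated"
)

// parseMode accepts only the two values named in the PR summary.
// Defaulting of an empty value is deliberately not modeled here.
func parseMode(s string) (Mode, error) {
	switch Mode(s) {
	case ModeUnified, ModeDisaggregated:
		return Mode(s), nil
	default:
		return "", fmt.Errorf("unsupported mode %q: want %q or %q",
			s, ModeUnified, ModeDisaggregated)
	}
}

func main() {
	for _, s := range []string{"unified", "disaggregated", "hybrid"} {
		m, err := parseMode(s)
		fmt.Printf("input=%q mode=%q err=%v\n", s, m, err)
	}
}
```

A controller could branch on the returned value to decide which reconcile path to take; that dispatch is left out because the PR text does not describe it.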

Validation

Validated on test cluster:

  • old ArksDisaggregatedApplication resources survive upgrade from arks 0.2.2 + rbg050 alpha4 to arks_pr + rbg060
  • old PD resources remain runnable and updatable after upgrade
  • new ArksApplication(mode=disaggregated) can be created after upgrade
  • old and new resources can coexist and update independently

Notes

Not included in this PR:

  • local e2e-only changes
  • validation docs
  • temporary unified command override used only for local mock testing

刘森栋 and others added 5 commits February 3, 2026 15:44
- Adapt to RBG v0.5.0 pointer type API changes
- Add CoordinationPolicy with Scaling and RollingUpdate strategies
- Scaling: coordinated initial deployment and scale-up
- RollingUpdate: coordinated rolling updates (Issue #150)
- CoordinationPolicy is independent of PodGroupPolicy
- Switch rbg dependency from internal GitLab to official GitHub v0.6.0
- Adapt RoleSpec.Template to RoleSpec.TemplateSource.Template for KEP-8
  RoleTemplate support in arksapplication and arksdisaggregatedapplication
  controllers

Add rbgsSpecSemanticallyEqual() to compare RBGS specs by JSON
semantic equality before patching. This prevents unnecessary PATCH
requests caused by Go type-level differences (pointer vs value)
when upgrading across RBG API versions (v0.5.0 → v0.6.0).

Also explicitly set Partition=0 on scheduler/router roles to match
the CRD-defaulted value in stored RBGS, eliminating false diffs
that would trigger a reconcile cascade and recreate downstream pods.
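A comparison by JSON semantic equality, as described in this commit message, might look like the sketch below. It is not the PR's `rbgsSpecSemanticallyEqual()`: the `roleV5` and `roleV6` structs are hypothetical stand-ins for a field that is a value in one API version and a pointer in the next, and `semanticallyEqual` is an illustrative helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// roleV5 and roleV6 are hypothetical: the same field as a value
// and as a pointer, mimicking a type-level change between versions.
type roleV5 struct {
	Name     string `json:"name"`
	Replicas int32  `json:"replicas"`
}

type roleV6 struct {
	Name     string `json:"name"`
	Replicas *int32 `json:"replicas,omitempty"`
}

func int32Ptr(v int32) *int32 { return &v }

// normalize round-trips v through JSON so that only the serialized
// content, not the Go type, takes part in the comparison.
func normalize(v any) (any, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var out any
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// semanticallyEqual reports whether a and b produce the same JSON document.
func semanticallyEqual(a, b any) (bool, error) {
	na, err := normalize(a)
	if err != nil {
		return false, err
	}
	nb, err := normalize(b)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(na, nb), nil
}

func main() {
	same, _ := semanticallyEqual(
		roleV5{Name: "router", Replicas: 2},
		roleV6{Name: "router", Replicas: int32Ptr(2)},
	)
	// A nil pointer is omitted from JSON while a stored 0 is not.
	unset, _ := semanticallyEqual(
		roleV5{Name: "router", Replicas: 0},
		roleV6{Name: "router"},
	)
	fmt.Println(same, unset)
}
```

The second comparison shows the kind of false diff that an unset field can cause against a defaulted stored value, which is presumably why the commit sets Partition=0 explicitly rather than leaving it nil.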
LikiosSedo force-pushed the pr-unified-rbg060-clean branch from 39f624e to 79f92b6 on April 10, 2026 08:24
- Extract isArksAppReady() to utils.go as single source of truth for
  ArksApplication readiness, used by both application and endpoint controllers.
- Hoist label map generation outside loops in generateRouterCommand()
  to avoid redundant allocations on each iteration.
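The hoisting described in this commit can be illustrated with a minimal sketch. None of it is the PR's `generateRouterCommand()`: `baseLabels`, `selectorFor`, `roleSelectors`, and the label keys are all hypothetical. The point is only that a map which does not depend on the loop variable is built once before the loop instead of once per iteration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// baseLabels is a hypothetical helper; the label keys are invented.
func baseLabels(app string) map[string]string {
	return map[string]string{"app": app, "managed-by": "arks"}
}

// selectorFor renders the shared labels plus a role label as a
// sorted, comma-separated selector string.
func selectorFor(labels map[string]string, role string) string {
	parts := make([]string, 0, len(labels)+1)
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	parts = append(parts, "role="+role)
	sort.Strings(parts) // map iteration order is random, so sort
	return strings.Join(parts, ",")
}

// roleSelectors builds the shared label map once, outside the loop,
// because it is identical for every role.
func roleSelectors(app string, roles []string) []string {
	labels := baseLabels(app) // hoisted: one allocation, not len(roles)
	out := make([]string, 0, len(roles))
	for _, role := range roles {
		out = append(out, selectorFor(labels, role))
	}
	return out
}

func main() {
	for _, s := range roleSelectors("demo", []string{"prefill", "decode"}) {
		fmt.Println(s)
	}
}
```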
When router.image and router.commandOverride are both provided,
bypass the sglang-only validation and use the user-supplied command
directly. This enables non-sglang runtimes (vllm, dynamo) to use
a custom router deployment.
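The override rule in this commit message could be sketched as below. The `routerSpec` struct, its field names and types, `resolveRouterCommand`, and the placeholder command strings are assumptions made for illustration; only the rule itself (both `router.image` and `router.commandOverride` set means the sglang-only check is skipped) comes from the commit text.

```go
package main

import (
	"errors"
	"fmt"
)

// routerSpec is a hypothetical shape for the router settings.
type routerSpec struct {
	Image           string
	CommandOverride []string
}

var errUnsupportedRuntime = errors.New("built-in router command requires the sglang runtime")

// resolveRouterCommand returns the command the router should run.
// When both an image and a command override are supplied, the
// override is used as-is and the runtime check is bypassed.
func resolveRouterCommand(runtime string, r routerSpec, defaultCmd []string) ([]string, error) {
	if r.Image != "" && len(r.CommandOverride) > 0 {
		return r.CommandOverride, nil
	}
	if runtime != "sglang" {
		return nil, fmt.Errorf("runtime %q: %w", runtime, errUnsupportedRuntime)
	}
	return defaultCmd, nil
}

func main() {
	custom := routerSpec{
		Image:           "example.invalid/custom-router:dev", // placeholder image
		CommandOverride: []string{"custom-router", "--port=8080"},
	}
	cmd, err := resolveRouterCommand("vllm", custom, nil)
	fmt.Println(cmd, err)

	_, err = resolveRouterCommand("vllm", routerSpec{}, nil)
	fmt.Println(err)
}
```

In this sketch an override without an image falls through to the normal validation, matching the "both provided" wording of the commit message.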
Rebuild dist/operator.yaml to include unified ArksApplication CRD
changes and new RBAC rules. Default image placeholder unchanged.