Releases: dstackai/dstack

0.20.12

26 Feb 15:52
733a17c

Backends

Crusoe

dstack now supports Crusoe as a backend, enabling VM-based provisioning with GPU instances. The backend supports both single-node and multi-node cluster provisioning with InfiniBand.

type: fleet
name: my-crusoe-fleet

backends: [crusoe]
resources:
  gpu: A100:8
nodes: 2
placement: cluster

Note

Support for CPU instances, the H200, B200, GB200, MI300X, and MI355X GPUs, and volumes is coming soon.

UI

Launch wizard

The UI now includes a launch wizard that lets users create runs from pre-defined templates. Instead of writing YAML from
scratch, users can select a template, pick GPU resources, adjust settings, and review the final
configuration—all through a guided flow.

To enable the launch wizard, point the server to a templates repository:

$ DSTACK_SERVER_TEMPLATES_REPO=https://github.com/dstackai/dstack-templates dstack server

Templates are YAML files under .dstack/templates in the repo. Each template has type set to template, a unique name, a title, configurable parameters, and a configuration that defines the dstack run:

type: template
name: in-browser-ide

title: In-browser IDE
description: Access the instance using VS Code in the browser.

parameters:
  - type: name
  - type: resources
  - type: python_or_docker
  - type: repo
  - type: working_dir
    
  - type: env
    title: Password
    name: PASSWORD
    value: $random-password

configuration:
  type: service
  
  auth: false
  gateway: true
  https: auto

  env:
    - BIND_ADDR=0.0.0.0:8080
  commands:
    - |
      echo "Your password is $PASSWORD. Share it carefully as it grants full access to the IDE."
    - |
      curl -fsSL https://code-server.dev/install.sh | sh -s -- --method standalone --prefix /tmp/code-server
    - |
      /tmp/code-server/bin/code-server --bind-addr $BIND_ADDR --auth password --disable-telemetry --disable-update-check .
  port: 8080

  probes:
    - type: http
      url: /healthz

See dstack-templates for an example repository.

Note

The launch wizard is an experimental feature. Currently, templates are configured per server; per-project template configuration is coming soon.

Instances

The UI now has an Instance details page where you can view detailed information about an instance, including its events and inspect data. Instance names across the UI—including on Events pages—now link directly to this page.

What's changed

Full changelog: 0.20.11...0.20.12

0.20.11

26 Feb 09:17
e88e25f

This release fixes a potential issue with a server replica failing to start due to a migration attempting to create an already-existing index on Postgres.

What's changed

Full changelog: 0.20.10...0.20.11

0.20.10

19 Feb 12:42
008efc8

Services

Prefill-Decode disaggregation

dstack now supports disaggregated Prefill–Decode inference, allowing both Prefill and Decode worker types to run within a single service.

To define and run such a service, set pd_disaggregation to true under the router property (this requires the gateway to use the sglang router), and define separate replica groups for the Prefill and Decode worker types:

type: service
name: prefill-decode

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

image: lmsysorg/sglang:latest

replicas:
  - count: 1..4
    scaling:
      metric: rps
      target: 3
    commands:
      - |
          python -m sglang.launch_server \
            --model-path $MODEL_ID \
            --disaggregation-mode prefill \
            --disaggregation-transfer-backend mooncake \
            --host 0.0.0.0 \
            --port 8000 \
            --disaggregation-bootstrap-port 8998
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    commands:
      - |
          python -m sglang.launch_server \
            --model-path $MODEL_ID \
            --disaggregation-mode decode \
            --disaggregation-transfer-backend mooncake \
            --host 0.0.0.0 \
            --port 8000
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

probes:
  - type: http
    url: /health_generate
    interval: 15s

router:
  type: sglang
  pd_disaggregation: true

Note

pd_disaggregation requires both the gateway and replicas to use the same cluster. With dstack, this currently works with the aws, gcp, and kubernetes backends (as they support creating both clusters and gateways). Support for more backends (and eventually SSH fleets) is coming soon.

Currently, pd_disaggregation works only with SGLang. Support for vLLM is coming soon.

Support for additional scaling metrics, such as TTFT and ITL, is also coming soon to enable autoscaling of Prefill and Decode workers.

Model endpoint

Previously, if you configured the model property, dstack provided a global model endpoint at gateway.<gateway domain> (or /proxy/models/<project name>) that allowed access to all models deployed in the project. This endpoint has been deprecated.

Now, any deployed model should be accessed via the service endpoint itself at <run name>.<gateway domain> (or /proxy/services/main/<service name>).
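As a sketch of the new scheme, here is a small helper that builds the per-service base URL an OpenAI-compatible client would point at. The /v1 suffix and the example names are assumptions for illustration, not something the release notes specify:

```python
def model_base_url(run_name: str, gateway_domain: str) -> str:
    """Build the per-service endpoint that replaces the deprecated
    global gateway.<gateway domain> model endpoint.

    The /v1 path suffix is assumed, as is typical for
    OpenAI-compatible servers."""
    return f"https://{run_name}.{gateway_domain}/v1"


# Example: point an OpenAI-compatible SDK's base_url at the service itself
# (run name and gateway domain below are hypothetical).
url = model_base_url("llama-8b-service", "example.com")
```

A client that previously targeted the global endpoint would switch its base URL to this per-service address.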

Note

If you configure the model property, dstack automatically enables CORS on the service endpoint. Future versions will allow you to disable or customize this behavior.

CLI

dstack apply

Previously, if you did not specify gpu, dstack treated it as 0..1 but did not display it in the run plan. Now, dstack displays this default explicitly. Additionally, if you do not specify image, dstack now defaults the GPU vendor to nvidia.

$ dstack apply -f dev.dstack.yml
 Project              peterschmidt85
 User                 peterschmidt85
 Type                 dev-environment
 Resources            cpu=2.. mem=8GB.. disk=100GB.. gpu=0..
 Spot policy          on-demand
 Max price            off
 Retry policy         off
 Idle duration        5m
 Max duration         off
 Inactivity duration  off

 #  BACKEND         RESOURCES                  INSTANCE TYPE  PRICE
 1  verda (FIN-01)  cpu=4 mem=16GB disk=100GB  CPU.4V.16G     $0.0279
 2  verda (FIN-02)  cpu=4 mem=16GB disk=100GB  CPU.4V.16G     $0.0279
 3  verda (FIN-03)  cpu=4 mem=16GB disk=100GB  CPU.4V.16G     $0.0279
    ...

Submit the run dev? [y/n]: 

This makes the run plan much more explicit and clear.

What's changed

Full changelog: 0.20.9...0.20.10

0.20.9

12 Feb 13:12
c4ed6ca

Events

UI

In the UI, both the Project and User pages now have an Events tab, providing a convenient way to track events without manually using the global filters.

On the User page, the Events tab shows events where the current user is either the Actor (the one who initiated the operation) or the Target user (the user the command was applied to):

On the Project page, the Events tab shows all events within the current project.

CLI

dstack attach

The dstack attach command now waits until the run is provisioned (similar to dstack apply), shows live progress, and attaches only after the run reaches the running state.

In addition, if a task defines ports and any of those ports cannot be forwarded to localhost (for example, because the port is already in use), both dstack attach and dstack apply now show a clear error message with a -p suggestion:

Failed to attach: port 8000 is already in use. Use -p in dstack attach to override the local port mapping, e.g. -p 8001:8000.
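dstack's own fallback logic isn't shown in the release notes; as a minimal sketch, this is how a client might detect that a preferred local port is taken and find a nearby free one to suggest for -p:

```python
import socket


def first_free_local_port(preferred: int, limit: int = 10) -> int:
    """Try to bind the preferred local port; if it is in use, probe the
    following ports and return the first free one. Raises if none of the
    `limit` candidate ports are available."""
    for port in range(preferred, preferred + limit):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # port busy, try the next one
    raise RuntimeError(f"no free port in {preferred}..{preferred + limit - 1}")
```

For example, if 8000 is busy, this returns 8001, which maps to the suggested -p 8001:8000 override.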

Kubernetes

Resources and offers

The way the kubernetes backend fetches offers has been updated. Previously, the offers reflected the node resources. Now, dstack returns only the offers that satisfy the requested range at its minimum value; for example, if you request gpu: 0..8, dstack returns only offers with gpu: 0. This makes the displayed offers closer to how runs are actually provisioned by Kubernetes.

dstack offer -b kubernetes --gpu 0..8 will return only offers with gpu: 0.

To see offers with gpu: 1, you must pass gpu: 1 or gpu: 1.. to dstack offer or dstack apply.
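The described behavior can be modeled with a toy filter (this is an illustration, not dstack's actual implementation):

```python
def shown_gpu_offers(node_gpu_counts, gpu_min, gpu_max):
    """Toy model of the new kubernetes offer semantics: an offer is shown
    only if it satisfies the requested range at its minimum value."""
    in_range = [n for n in node_gpu_counts if gpu_min <= n <= gpu_max]
    return [n for n in in_range if n == gpu_min]


# Requesting gpu: 0..8 surfaces only gpu: 0 offers;
# requesting gpu: 1..8 surfaces only gpu: 1 offers.
offers = [0, 1, 8]
```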

Note

We understand that this differs from how offers are shown for other backends, but this is the first step in improving how the kubernetes backend does provisioning. Feedback is welcome.

Proxy jump

To proxy SSH traffic inside containers, the kubernetes backend creates a proxy jump pod on startup. This requires at least one cluster node to have an external IP and relies on Kubernetes to forward this traffic even if the proxy jump pod is not running on the node with the external IP.

However, not all Kubernetes services support this behavior; for example, Nebius's Managed Kubernetes requires the proxy jump pod to run on a node with an external IP. To support these cases, the kubernetes backend now double-checks that the proxy jump pod is created correctly.

Note

The most reliable approach in such environments is still to ensure that all cluster nodes have an external IP. Feedback is welcome.

Fleets

Instances in SSH fleets are no longer automatically terminated when they become unreachable over SSH. This prevents premature termination of SSH fleet instances due to transient SSH connectivity issues.

Docs

The reference pages for .dstack.yml configurations now include more information on supported types for every property, making them more useful.

What's changed

Full changelog: 0.20.8...0.20.9

0.20.8

05 Feb 11:46
3149be8

CLI

dstack event --watch

The dstack event command now supports a --watch option for real-time event tracking.

Event coverage has also been improved, with events for run in-place update and service registration now available.

dstack fleet

The dstack fleet command now includes fleet-level information such as nodes, resources, spot policy, and backend details, with individual instances listed underneath.

Skills

SKILL.md

If you're using agents such as Claude Code, Codex, Cursor, etc., it’s now possible to install dstack skills.

$ npx skills add dstackai/dstack

These skills make the agent fully aware of the configuration syntax and CLI commands.

Services

Probes

UI

The UI now displays probe statuses for services, helping monitor replica readiness and health.

until_ready

A new until_ready option for probes allows stopping probe execution once the ready_after threshold is reached. This is useful for resource-intensive probes that only need to run during startup:

probes:
  - type: http
    url: /health
    until_ready: true
    ready_after: 2
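Our reading of these semantics, sketched as a toy loop (not dstack's internal code):

```python
def run_probe(check, ready_after: int, until_ready: bool, max_rounds: int = 30) -> str:
    """Count consecutive successful checks; once `ready_after` successes
    are reached, stop probing entirely if `until_ready` is set, otherwise
    keep probing for the lifetime of the replica."""
    consecutive = 0
    for _ in range(max_rounds):
        consecutive = consecutive + 1 if check() else 0
        if consecutive >= ready_after and until_ready:
            return "probe stopped"  # replica ready; no further checks
    return "probe running"  # without until_ready, probing never stops
```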

Model probes

Services that use the model property to declare a chat model with an OpenAI-compatible interface now receive an automatically configured probe that checks model availability by requesting /v1/chat/completions.
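The exact request dstack sends is not specified in the release notes; a hypothetical minimal availability check against /v1/chat/completions might be shaped like this:

```python
import json


def model_probe_request(model_id: str) -> dict:
    """Build a hypothetical minimal availability check for an
    OpenAI-compatible chat model: one tiny completion request whose
    success indicates the model is loaded and serving."""
    return {
        "path": "/v1/chat/completions",
        "body": json.dumps({
            "model": model_id,
            "messages": [{"role": "user", "content": "ping"}],
            "max_tokens": 1,  # keep the probe cheap
        }),
    }
```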

Backends

RunPod

Community Cloud

RunPod Community Cloud is now disabled by default to ensure a more reliable experience. You can still enable Community Cloud in the backend settings. dstack Sky users can enable Community Cloud only when using their own RunPod credentials.

CUDO

Due to CUDO Compute winding down its public on-demand offering, the cudo backend is now deprecated.

What's changed

Full changelog: 0.20.7...0.20.8

0.20.7

28 Jan 16:48
763092d

Services

Replica groups

A service can now include multiple replica groups. Each group can define its own commands, resources spec, and scaling rules.

type: service
name: llama-8b-service

image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B

replicas:
  - count: 1..2
    scaling:
      metric: rps
      target: 10
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --port 8000 \
          --trust-remote-code
    resources:
      gpu: 48GB

  - count: 1..4
    scaling:
      metric: rps
      target: 5
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --port 8000 \
          --trust-remote-code
    resources:
      gpu: 24GB

port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B

Note

Properties such as regions, port, image, and env (among others) cannot yet be configured per replica group. This support is coming soon.

Note

Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.

Events

Events are now also supported for volumes, gateways, and secrets.

$ dstack event --target-gateway my-gateway
[2026-01-28 11:53:03] [👤admin] [gateway my-gateway] Gateway created. Status: SUBMITTED
[2026-01-28 11:53:32] [gateway my-gateway] Gateway status changed SUBMITTED -> PROVISIONING
[2026-01-28 11:54:46] [gateway my-gateway] Gateway status changed PROVISIONING -> RUNNING
[2026-01-28 11:55:08] [👤admin] [gateway my-gateway] Gateway set as default

Instance events now also include reachability and health events.

Finally, we have added Events under Concepts in the documentation.

CLI

dstack project

The dstack project and dstack project set-default commands now allow you to interactively select the default project when these commands are run without arguments.

dstack login

The dstack login command can now be run without arguments. In this case, it will interactively ask for the URL and provider if needed. If you want to use dstack Sky, you can simply press Enter without entering a URL or provider.

Also, if you have multiple projects, the command will prompt you to select the default project as well.

What's changed

Full changelog: 0.20.6...0.20.7

0.20.6

21 Jan 13:31
f09d061

Server deployment

Memory optimization

This release reduces peak server memory usage. Previously, memory grew with the total number of instances ever submitted; this is now fixed. We recommend upgrading if memory usage increases over time.

Logs storage

Fluent Bit + Elasticsearch/OpenSearch

Run logs can now be stored in your own log storage via Fluent Bit. In addition, dstack can now read run logs from Elasticsearch/OpenSearch (to display them in the UI and CLI) if Fluent Bit ships the logs there.

See the docs for more details.

Fleets

Since 0.20, dstack requires at least one fleet to be created before you can submit any runs. To make this easier, we’ve simplified default fleet creation during project setup in the UI.

In addition, if your project doesn’t have a fleet, the UI will prompt you to create one.

What's changed

Full changelog: 0.20.3...0.20.6

0.20.5

21 Jan 11:31
6d14aad

Warning

Be sure to update to 0.20.6, which includes important fixes.

What's changed

Full changelog: 0.20.4...0.20.5

0.20.4

21 Jan 10:45
32fbc02

Warning

Be sure to update to 0.20.6, which includes important fixes.

What's changed

Full changelog: 0.20.3...0.20.4

0.20.3

08 Jan 18:03
d48b15f

Dev environments

Windsurf IDE

Dev environments now support Windsurf as a first-class IDE option alongside VS Code and Cursor.

type: dev-environment
ide: windsurf

repos:
- https://github.com/dstackai/dstack

resources:
  gpu: 24GB..:1

dstack provisions an instance for your dev environment and seamlessly connects your local Windsurf editor to it.

Troubleshooting

Runs/fleets/volumes/gateways JSON via CLI

You can now inspect the full JSON state of runs, fleets, volumes, and gateways using these CLI commands:

$ dstack run get <name> --json
$ dstack fleet get <name> --json
$ dstack volume get <name> --json
$ dstack gateway get <name> --json
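The JSON output is convenient for scripting; for instance, it can be parsed directly with the standard library. The sample fields below are hypothetical, chosen only to illustrate the pattern:

```python
import json

# In practice you would capture the output of e.g.
# `dstack fleet get my-fleet --json` via subprocess; here we use a
# hypothetical sample so the snippet is self-contained.
sample = '{"name": "my-fleet", "status": "active", "instances": [{"name": "i-0"}]}'

state = json.loads(sample)
print(state["name"], state["status"], len(state["instances"]))
```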

Runs/fleets JSON via UI

The UI includes new "Inspect" tabs with read-only JSON viewers for runs and fleets, making it easier to debug and understand resource states.

What's changed

Full changelog: 0.20.2...0.20.3