
KubeRay supports the in-tree Ray autoscaler — the same logic Ray uses on VM clusters, deployed as a sidecar on the head pod.

Enable autoscaling

spec:
  enableInTreeAutoscaling: true
  autoscalerOptions:
    upscalingMode: Default            # or Aggressive, Conservative
    idleTimeoutSeconds: 60            # scale a node in after 60s of inactivity
    resources:                        # resources for the autoscaler sidecar container
      limits:   { cpu: "1", memory: "1Gi" }
      requests: { cpu: "500m", memory: "512Mi" }
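
For context, the fragment above sits at the top level of a RayCluster spec. A minimal sketch, with a placeholder cluster name and image tag:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-autoscaling        # placeholder name
spec:
  enableInTreeAutoscaling: true
  autoscalerOptions:
    upscalingMode: Default
    idleTimeoutSeconds: 60
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0    # placeholder image tag
  workerGroupSpecs:
    - groupName: cpu
      replicas: 1
      minReplicas: 0
      maxReplicas: 32
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0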

Worker group bounds

workerGroupSpecs:
  - groupName: cpu
    replicas: 1            # initial count
    minReplicas: 0         # min after scale-in
    maxReplicas: 32        # cap on scale-out

Multiple worker groups

Define one group per node profile (CPU, GPU, high-memory). The autoscaler picks the cheapest group that satisfies a pending resource request.

workerGroupSpecs:
  - groupName: cpu
    minReplicas: 0
    maxReplicas: 50
  - groupName: gpu
    minReplicas: 0
    maxReplicas: 8
    template:
      spec:
        containers:
          - name: ray-worker
            resources:
              limits:
                nvidia.com/gpu: 1
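
The autoscaler derives each group's schedulable Ray resources from the resource limits in its pod template (or from rayStartParams overrides such as num-cpus), so give every group explicit resources; otherwise pending tasks can't be matched to the group. A sketch for the cpu group, with illustrative values:

workerGroupSpecs:
  - groupName: cpu
    minReplicas: 0
    maxReplicas: 50
    template:
      spec:
        containers:
          - name: ray-worker
            resources:
              limits:
                cpu: "4"         # illustrative; sets the Ray node's CPU capacity
                memory: 16Gi
              requests:
                cpu: "4"
                memory: 16Gi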

Triggers

  • A task or actor with unfulfilled resource requests.
  • An autoscaling Serve deployment whose target_ongoing_requests is exceeded (see the sketch after this list).
  • A placement group that can’t fit on existing nodes.
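
For the Serve trigger, the threshold comes from the deployment's autoscaling config. A sketch of a RayService serveConfigV2 fragment, assuming a hypothetical application and deployment named app and Model:

serveConfigV2: |
  applications:
    - name: app
      import_path: app:deployment_graph   # hypothetical import path
      deployments:
        - name: Model
          autoscaling_config:
            target_ongoing_requests: 2    # add replicas when average ongoing requests per replica exceeds 2
            min_replicas: 1
            max_replicas: 10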

Cool-down

idleTimeoutSeconds controls how long a node must be idle before being terminated. Long timeouts smooth out spiky traffic; short timeouts save money.
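
For example, an interactive workload with spiky traffic might hold idle workers longer; the value here is illustrative:

autoscalerOptions:
  idleTimeoutSeconds: 300   # keep idle workers for 5 minutes before scale-in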

Cluster-autoscaler vs Ray autoscaler

You can run both. The Ray autoscaler asks Kubernetes for new pods; if Kubernetes can’t schedule them on existing nodes, the cluster autoscaler adds nodes from your cloud provider.
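
Note that the cluster autoscaler only reacts to pods that are unschedulable, which in turn depends on the worker pod template declaring resource requests. A sketch, with illustrative values:

workerGroupSpecs:
  - groupName: cpu
    template:
      spec:
        containers:
          - name: ray-worker
            resources:
              requests:            # without requests, new pods may pack onto existing nodes
                cpu: "2"
                memory: 8Gi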

Next steps

  • GPU: GPU node pools.
  • Troubleshooting: Why isn’t my cluster scaling?