KubeRay supports the in-tree Ray autoscaler, the same scaling logic Ray uses on VM clusters, deployed as a sidecar container on the head pod.
Enable autoscaling
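As a minimal sketch, autoscaling is turned on per RayCluster with the enableInTreeAutoscaling field; the metadata name, image version, and option values below are illustrative, not prescriptive:

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: autoscaling-cluster        # illustrative name
spec:
  enableInTreeAutoscaling: true    # deploys the autoscaler sidecar on the head pod
  autoscalerOptions:
    upscalingMode: Default         # Conservative and Aggressive are also accepted
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0   # illustrative version
```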
Worker group bounds
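Bounds are declared per worker group with minReplicas and maxReplicas; a minimal fragment (the group name and values are illustrative):

```yaml
workerGroupSpecs:
  - groupName: default-group   # illustrative name
    replicas: 1                # initial size at cluster creation
    minReplicas: 0             # the autoscaler may scale this group to zero
    maxReplicas: 10            # hard upper bound on workers in this group
```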
Multiple worker groups
Define one group per node profile (CPU, GPU, high-memory). The autoscaler picks the cheapest group that satisfies a pending resource request.
Triggers
- A task or actor with unfulfilled resource requests.
- An autoscaling Serve deployment whose target_ongoing_requests is exceeded.
- A placement group that can't fit on existing nodes.
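The one-group-per-node-profile layout described above might be sketched as follows; group names, images, and resource amounts are illustrative assumptions:

```yaml
workerGroupSpecs:
  - groupName: cpu-group             # illustrative name
    minReplicas: 0
    maxReplicas: 10
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-worker
            image: rayproject/ray:2.9.0       # illustrative version
            resources:
              limits:
                cpu: "4"
                memory: 16Gi
  - groupName: gpu-group             # illustrative name
    minReplicas: 0
    maxReplicas: 4
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-worker
            image: rayproject/ray:2.9.0-gpu   # illustrative version
            resources:
              limits:
                nvidia.com/gpu: "1"
```

A pending task that requests a GPU can only be satisfied by gpu-group, so the autoscaler scales that group; a plain CPU task scales the cheaper cpu-group instead.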
Cool-down
idleTimeoutSeconds controls how long a node must be idle before being terminated. Long timeouts smooth out spiky traffic; short timeouts save money.
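For example, a longer timeout for spiky traffic can be set under autoscalerOptions (the value here is illustrative):

```yaml
autoscalerOptions:
  idleTimeoutSeconds: 300   # wait 5 minutes before removing an idle worker
```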
Cluster-autoscaler vs Ray autoscaler
You can run both. The Ray autoscaler asks Kubernetes for new pods; if Kubernetes can't schedule them on existing nodes, the cluster autoscaler adds nodes from your cloud provider.
Next steps
- GPU: GPU node pools.
- Troubleshooting: Why isn't my cluster scaling?