

The autoscaler runs on the head node and provisions or terminates worker VMs based on resource demand.

Bounds

max_workers: 32
available_node_types:
  cpu-worker:
    min_workers: 0
    max_workers: 16
  gpu-worker:
    min_workers: 0
    max_workers: 8
min_workers is the floor for that node type; max_workers is the per-type cap. The cluster-wide max_workers is an absolute cap across all node types combined, so the effective ceiling is whichever limit is hit first.
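The interaction between the two caps can be checked with simple arithmetic. This is an illustrative sketch, not a Ray API; the numbers come from the config above:

```python
# Per-type caps from the example config above.
per_type_caps = {"cpu-worker": 16, "gpu-worker": 8}
cluster_max_workers = 32  # cluster-wide absolute cap

# The most workers the per-type caps alone would allow.
type_total = sum(per_type_caps.values())  # 16 + 8 = 24

# The effective ceiling is whichever cap binds first.
effective_max = min(type_total, cluster_max_workers)
print(effective_max)  # 24: here the per-type caps bind before max_workers
```

Here the cluster can never reach 32 workers, because the per-type caps sum to only 24; raising a per-type cap (or adding a node type) is needed before the cluster-wide cap matters.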

Idle timeout

idle_timeout_minutes: 5
Idle workers are terminated after this many minutes.

Upscaling speed

upscaling_speed: 1.0
How aggressively to scale up, expressed as the number of nodes allowed to be pending launch as a multiple of the current node count. At 1.0 the cluster can grow by up to 100% at a time (effectively doubling); at 0.5 it can grow by up to 50%. Higher values reduce time-to-capacity at the cost of bursty bills.
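The growth bound can be sketched as follows. This is an illustrative model of the setting, not Ray's exact internal formula; the minimum of one pending node is an assumption so a small cluster can still grow:

```python
# Illustrative sketch (not Ray's internals): upscaling_speed caps how many
# nodes may be pending launch relative to the current cluster size.
def max_pending(current_nodes: int, upscaling_speed: float) -> int:
    # Assume at least one pending node is allowed so an empty cluster can grow.
    return max(1, int(upscaling_speed * current_nodes))

print(max_pending(8, 1.0))  # 8 pending -> the cluster can double to 16
print(max_pending(8, 0.5))  # 4 pending -> the cluster can grow by half to 12
```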

Resource matching

The autoscaler picks the cheapest node type that satisfies a pending resource request. Use custom resources to differentiate node classes:
available_node_types:
  high-mem:
    node_config: { InstanceType: r5.4xlarge }
    resources: { high_memory: 1 }
Then in your code:
import ray

@ray.remote(resources={"high_memory": 1})
def big_task(*args):
    ...

Spot instances

For preemptible workers, configure the provider’s spot options (AWS InstanceMarketOptions, GCP preemptible: true, Azure priority: Spot).
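On AWS, the spot request goes under the node type's node_config, which is passed through to EC2. A minimal sketch, assuming an example node type name and instance type:

```yaml
available_node_types:
  spot-worker:            # example name; any node type can be made spot
    min_workers: 0
    max_workers: 8
    node_config:
      InstanceType: m5.2xlarge   # example instance type
      InstanceMarketOptions:
        MarketType: spot
```

Mix spot and on-demand node types in the same config to keep a baseline of on-demand capacity while bursting onto cheaper preemptible workers.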

Verifying

Tail the autoscaler's monitor log on the head node to watch scaling decisions as they happen:

ray exec cluster.yaml "tail -f /tmp/ray/session_latest/logs/monitor*"

Next steps

Best practices

Run >100 nodes reliably.

Logging

Aggregate logs across nodes.