ScalingConfig is the one object you touch to scale a training job. It controls how many workers run, what resources each worker gets, and how they’re placed across the cluster.

Basic usage

from ray.train import ScalingConfig

ScalingConfig(num_workers=8, use_gpu=True)
This launches eight workers, each on its own GPU.
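
In a real job you pass the config to a trainer. A minimal sketch, assuming a trivial training function (the function body here is illustrative, not part of ScalingConfig itself):

from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_func():
    # Runs once per worker; each of the eight workers sees one GPU.
    ...

trainer = TorchTrainer(
    train_loop_per_worker=train_func,
    scaling_config=ScalingConfig(num_workers=8, use_gpu=True),
)
trainer.fit()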

Resource customization

ScalingConfig(
    num_workers=4,
    use_gpu=True,
    resources_per_worker={
        "CPU": 4,
        "GPU": 1,
        "memory": 16 * 1024**3,
        "high_memory": 1,        # custom resource
    },
)
resources_per_worker follows the same shape as Ray’s @ray.remote(...) resource spec.
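
For comparison, roughly the same request expressed directly with Ray Core (a sketch; "high_memory" is a custom resource that must be defined on at least one node in your cluster):

import ray

@ray.remote(num_cpus=4, num_gpus=1, memory=16 * 1024**3, resources={"high_memory": 1})
def step():
    ...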

Placement strategies

Workers are scheduled into a placement group. Choose how to lay it out:
ScalingConfig(num_workers=4, placement_strategy="PACK")
Strategy        Effect
PACK (default)  Pack onto as few nodes as possible.
SPREAD          Spread across as many nodes as possible.
STRICT_PACK     All workers on one node, or fail.
STRICT_SPREAD   One worker per node, or fail.
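
For example, to guarantee one worker per node and fail fast if the cluster can’t do it, a minimal sketch:

from ray.train import ScalingConfig

# Workers never share a node; useful when each worker needs a node's full bandwidth.
ScalingConfig(num_workers=4, use_gpu=True, placement_strategy="STRICT_SPREAD")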

Trainer resources

A small “trainer” actor coordinates workers. Set its resources separately:
ScalingConfig(num_workers=4, trainer_resources={"CPU": 1})
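
If the coordinator’s default CPU reservation gets in the way on a tightly packed cluster, you can set it to zero; a minimal sketch:

from ray.train import ScalingConfig

# The coordinating actor does little work, so freeing its CPU leaves more room for workers.
ScalingConfig(num_workers=4, use_gpu=True, trainer_resources={"CPU": 0})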

Heterogeneous workers

Ray Train doesn’t yet support different resource specs per worker out of the box. For mixed workloads (e.g., one parameter server actor + N learners), build your own coordinator on top of Ray Core.
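
A minimal sketch of that pattern with Ray Core; the actor and function names are illustrative, and it assumes a cluster with at least two GPUs for the learners:

import ray

ray.init()

@ray.remote(num_cpus=4)          # CPU-heavy parameter server
class ParameterServer:
    def __init__(self):
        self.weights = 0.0

    def apply_gradient(self, grad):
        self.weights -= 0.1 * grad
        return self.weights

@ray.remote(num_gpus=1)          # each learner reserves one GPU
def learner(ps, steps):
    weights = None
    for _ in range(steps):
        grad = 1.0               # stand-in for a real gradient computation
        weights = ray.get(ps.apply_gradient.remote(grad))
    return weights

ps = ParameterServer.remote()
print(ray.get([learner.remote(ps, steps=10) for _ in range(2)]))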

Validate the config

Resource availability is checked when training starts: Ray warns and keeps waiting if the cluster can’t satisfy the request.

from ray.train.torch import TorchTrainer

trainer = TorchTrainer(..., scaling_config=ScalingConfig(num_workers=4, use_gpu=True))
trainer.fit()  # warns and waits if the cluster can't currently satisfy the request

Next steps

Run config

Storage, naming, callbacks.

Distributed PyTorch

See ScalingConfig in real training jobs.