RunConfig captures everything about a run that isn't model code or scaling: where checkpoints go, what to call the run, when to stop, and how to handle failures.
Basic usage
Storage
storage_path is where Ray Train writes checkpoints, metrics, and trial state. Local paths, S3, GCS, and any pyarrow-supported filesystem are valid.
Run name
name controls the directory under storage_path. Defaults to a generated name like TorchTrainer_2025-04-30_12-34-56.
Stop conditions
Failure handling
max_failures sets how many times the run is retried after a worker error. fail_fast=True stops the entire run on the first failure.
Sync config
Ray Train syncs checkpoints from each worker's local disk to storage_path. Adjust syncing behavior with SyncConfig:
Callbacks
Pass callbacks to integrate with experiment trackers: TBXLoggerCallback, WandbLoggerCallback, MLflowLoggerCallback, and CSVLoggerCallback.
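For example, attaching the TensorBoard and CSV loggers (assuming the import paths below, which match Ray Tune's logger module; TBXLoggerCallback additionally requires the tensorboardX package):

```python
from ray.train import RunConfig
from ray.tune.logger import CSVLoggerCallback, TBXLoggerCallback

# Write metrics to TensorBoard event files and CSV alongside the run results.
run_config = RunConfig(callbacks=[TBXLoggerCallback(), CSVLoggerCallback()])
```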
Next steps
Checkpointing
Configure checkpoint behavior in detail.
Fault tolerance
Failure semantics for distributed training.