Ray Train works with Hugging Face Transformers’ Trainer and Accelerate. You keep your existing training script and wrap it in a Ray Train trainer.
Install
pip install -U "ray[train]" transformers datasets accelerate
With Hugging Face Trainer
import ray
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from ray.train.huggingface.transformers import (
    RayTrainReportCallback,
    prepare_trainer,
)
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
import datasets


def train_loop_per_worker(config):
    tok = AutoTokenizer.from_pretrained(config["model"])
    model = AutoModelForSequenceClassification.from_pretrained(
        config["model"], num_labels=2
    )

    raw = datasets.load_dataset("imdb", split="train[:5%]")

    def tokenize(b):
        return tok(b["text"], padding=True, truncation=True, max_length=256)

    ds = raw.map(tokenize, batched=True).rename_column("label", "labels")

    args = TrainingArguments(
        output_dir="/tmp/out",
        per_device_train_batch_size=config["batch_size"],
        num_train_epochs=config["epochs"],
        learning_rate=config["lr"],
        report_to="none",
        save_strategy="epoch",
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=ds,
        tokenizer=tok,
        callbacks=[RayTrainReportCallback()],
    )
    trainer = prepare_trainer(trainer)
    trainer.train()


trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={
        "model": "distilbert-base-uncased",
        "lr": 5e-5,
        "batch_size": 16,
        "epochs": 1,
    },
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
result = trainer.fit()
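After trainer.fit() returns, the checkpoint reported by RayTrainReportCallback is available on the result object. Here is a minimal sketch of loading the fine-tuned model back, assuming the callback stored the Transformers files in a "checkpoint" subdirectory (the exact layout can vary across Ray versions):

import os

from transformers import AutoModelForSequenceClassification

# `result` comes from trainer.fit() above.
with result.checkpoint.as_directory() as ckpt_dir:
    model = AutoModelForSequenceClassification.from_pretrained(
        os.path.join(ckpt_dir, "checkpoint")  # assumed subdirectory name
    )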
With Accelerate
from accelerate import Accelerator
# Ray also ships ray.train.huggingface.accelerate.AccelerateTrainer as a
# convenience wrapper; the TorchTrainer pattern shown below works the same way.


def train_loop_per_worker(config):
    accelerator = Accelerator()
    # Build your model, optimizer, and dataloader as usual, then let
    # Accelerate handle device placement and distributed wrapping.
    model, optim, loader = accelerator.prepare(model, optim, loader)
    for batch in loader:
        ...  # forward pass and loss computation
        accelerator.backward(loss)
        optim.step()
        optim.zero_grad()
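You launch this loop the same way as the Trainer example. A minimal sketch, assuming the same ScalingConfig as above and a learning-rate entry in train_loop_config:

from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

# Run the Accelerate loop on 4 GPU workers, just like the Trainer example.
trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 5e-5},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
result = trainer.fit()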
Tips for LLM fine-tuning
For LLMs, use RayFSDPStrategy (Lightning) or DeepSpeed via Accelerate; Ray Train integrates with both. Choose a sharding setup (FSDP wrapping policy or DeepSpeed ZeRO stage) that fits your model size and GPU count, as in the sketch below.
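As a rough illustration, here is a sketch of enabling DeepSpeed ZeRO inside the Accelerate loop; it assumes deepspeed is installed and that ZeRO stage 3 with bf16 suits your model and hardware:

from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin


def train_loop_per_worker(config):
    # ZeRO stage 3 shards parameters, gradients, and optimizer state across workers.
    ds_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=1)
    accelerator = Accelerator(deepspeed_plugin=ds_plugin, mixed_precision="bf16")
    ...  # build and prepare model/optimizer/dataloader as in the Accelerate example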
Set report_to="none" in TrainingArguments and rely on RayTrainReportCallback to forward metrics. Otherwise you may end up with each worker writing to a different MLflow/TensorBoard run.
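If you do want a single MLflow (or TensorBoard) run, one option is a driver-side logger attached through Ray's RunConfig rather than per-worker reporting. A minimal sketch, with a hypothetical tracking URI and experiment name:

from ray.train import RunConfig
from ray.air.integrations.mlflow import MLflowLoggerCallback

run_config = RunConfig(
    callbacks=[
        # One MLflow run for the whole job, written from the driver
        # rather than from each worker.
        MLflowLoggerCallback(
            tracking_uri="http://mlflow.example.com",  # hypothetical URI
            experiment_name="ray-train-transformers",  # hypothetical name
        )
    ],
)
# Pass run_config=run_config when constructing the TorchTrainer.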
Next steps
Distributed PyTorch: DDP, FSDP, and DeepSpeed details.
Checkpointing: Save and reload Transformers checkpoints.