
Ray Train integrates with Hugging Face Transformers' Trainer and with Accelerate. You keep your existing training script, move it into a training function, and wrap it in a Ray Train trainer such as TorchTrainer.

Install

pip install -U "ray[train]" transformers datasets accelerate

With Hugging Face Trainer

import ray
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from ray.train.huggingface.transformers import (
    RayTrainReportCallback,
    prepare_trainer,
)
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
import datasets


def train_loop_per_worker(config):
    tok = AutoTokenizer.from_pretrained(config["model"])
    model = AutoModelForSequenceClassification.from_pretrained(config["model"], num_labels=2)

    raw = datasets.load_dataset("imdb", split="train[:5%]")
    def tokenize(batch):
        return tok(batch["text"], padding=True, truncation=True, max_length=256)
    ds = raw.map(tokenize, batched=True).rename_column("label", "labels")

    args = TrainingArguments(
        output_dir="/tmp/out",
        per_device_train_batch_size=config["batch_size"],
        num_train_epochs=config["epochs"],
        learning_rate=config["lr"],
        report_to="none",
        save_strategy="epoch",
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=ds,
        tokenizer=tok,
        # Reports metrics and checkpoints from the workers back to Ray Train.
        callbacks=[RayTrainReportCallback()],
    )
    # Adapts the transformers Trainer for distributed execution under Ray.
    trainer = prepare_trainer(trainer)
    trainer.train()


trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={
        "model": "distilbert-base-uncased",
        "lr": 5e-5,
        "batch_size": 16,
        "epochs": 1,
    },
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),  # 4 workers, 1 GPU each
)
result = trainer.fit()
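After `fit()` returns, the `Result` object holds the final reported metrics and the latest checkpoint. A minimal sketch of inspecting it, continuing from the example above (attribute names per Ray Train's `Result` API):

```python
# Continues the example above: `result` is the Result returned by trainer.fit().
print(result.metrics)      # final metrics reported by RayTrainReportCallback
print(result.checkpoint)   # latest Ray Train checkpoint, if one was saved

if result.checkpoint:
    # Materialize the checkpoint as a local directory to reload the model.
    with result.checkpoint.as_directory() as ckpt_dir:
        print("checkpoint contents at:", ckpt_dir)
```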

With Accelerate

Run your Accelerate loop unchanged inside train_loop_per_worker and launch it with TorchTrainer; no Accelerate-specific Ray trainer is needed.

import torch
from accelerate import Accelerator

def train_loop_per_worker(config):
    accelerator = Accelerator()
    # Toy model, optimizer, and dataloader for illustration; substitute your own.
    model = torch.nn.Linear(10, 2)
    optim = torch.optim.AdamW(model.parameters(), lr=config["lr"])
    dataset = torch.utils.data.TensorDataset(
        torch.randn(64, 10), torch.randint(0, 2, (64,))
    )
    loader = torch.utils.data.DataLoader(dataset, batch_size=8)

    # Accelerate handles device placement and distributed wrapping.
    model, optim, loader = accelerator.prepare(model, optim, loader)
    for x, y in loader:
        optim.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        accelerator.backward(loss)
        optim.step()
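The loop is launched the same way as the Transformers example. A minimal sketch, assuming the train_loop_per_worker defined above; the worker count, GPU setting, and learning rate are placeholders:

```python
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 1e-3},
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
)
result = trainer.fit()
```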

Tips for LLM fine-tuning

For LLMs, use RayFSDPStrategy (Lightning) or DeepSpeed via Accelerate; Ray Train integrates with both. Size num_workers and the sharding configuration (FSDP wrapping policy or DeepSpeed ZeRO stage) to your model size and GPU count.
Set report_to="none" in TrainingArguments and let RayTrainReportCallback forward metrics. Otherwise each worker may write to its own MLflow or TensorBoard run.
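A minimal sketch of the Lightning path mentioned above. It assumes you already have a LightningModule (MyModule here is hypothetical); the strategy, plugin, and prepare_trainer names come from ray.train.lightning:

```python
import lightning.pytorch as pl
from ray.train.lightning import (
    RayFSDPStrategy,
    RayLightningEnvironment,
    prepare_trainer,
)

def train_loop_per_worker(config):
    model = MyModule()  # hypothetical: your existing LightningModule
    trainer = pl.Trainer(
        accelerator="auto",
        devices="auto",
        # Shards model parameters across workers via FSDP.
        strategy=RayFSDPStrategy(),
        plugins=[RayLightningEnvironment()],
    )
    trainer = prepare_trainer(trainer)
    trainer.fit(model)
```

As with the Transformers example, this function is passed to TorchTrainer to run on the cluster.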

Next steps

Distributed PyTorch

DDP, FSDP, and DeepSpeed details.

Checkpointing

Save and reload Transformers checkpoints.