
A Learner owns the optimizer and runs the algorithm-specific loss on each training batch.

Single-GPU training

config = config.learners(num_learners=0, num_gpus_per_learner=1)
num_learners=0 runs the learner inside the local Algorithm process.
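As a fuller sketch, a single-GPU setup might be assembled as follows. PPO and "CartPole-v1" are illustrative assumptions, not prescribed here; any algorithm config exposes the same `.learners()` builder:

```python
# Hedged sketch: single-GPU Learner setup. PPOConfig and "CartPole-v1"
# are illustrative choices; substitute your algorithm and environment.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # num_learners=0: the Learner runs inside the local Algorithm process,
    # so no separate Learner actor is started.
    .learners(num_learners=0, num_gpus_per_learner=1)
)
```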

Multi-GPU data parallelism

config = config.learners(num_learners=4, num_gpus_per_learner=1)
RLlib creates four Learner actors, each holding its own copy of the network. Gradients are averaged via Ray’s collective ops.
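The effect of gradient averaging can be illustrated in plain Python (this is a toy sketch, not RLlib's actual collective implementation): each worker computes gradients on its own batch shard, and the element-wise mean is applied to every replica so all copies stay in sync.

```python
# Toy sketch of data-parallel gradient averaging (not RLlib internals).
# per_worker_grads: one gradient list per worker, one entry per parameter.
def average_gradients(per_worker_grads):
    n = len(per_worker_grads)
    # Element-wise mean across workers for each parameter.
    return [sum(gs) / n for gs in zip(*per_worker_grads)]

# Four workers, two parameters each.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
avg = average_gradients(grads)  # [4.0, 5.0]
```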

Configure the optimizer

Algorithm-specific optimizer settings live on the algorithm’s training builder:
config = config.training(lr=1e-4, grad_clip=40.0, train_batch_size=4096)
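To make the `grad_clip` setting concrete, here is a toy sketch of clipping by global norm, one common clipping mode. This is illustrative only; RLlib's actual clipping lives inside the Learner:

```python
import math

# Toy sketch of global-norm gradient clipping (what a setting like
# grad_clip=40.0 controls). If the combined norm of all gradients
# exceeds max_norm, every gradient is scaled down proportionally.
def clip_by_global_norm(grads, max_norm):
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0], 2.5)  # norm 5.0 -> [1.5, 2.0]
```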

Custom losses

For non-trivial algorithm changes, subclass the algorithm’s learner:
from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner

class MyPPOLearner(PPOTorchLearner):
    def compute_loss_for_module(self, *, module_id, hps, batch, fwd_out):
        # Start from PPO's standard loss, then add a custom term.
        loss = super().compute_loss_for_module(
            module_id=module_id, hps=hps, batch=batch, fwd_out=fwd_out
        )
        loss += my_extra_term(batch, fwd_out)  # your own extra loss term
        return loss
Wire into the config:
config = config.training(learner_class=MyPPOLearner)
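End to end, a custom Learner might be wired up and trained like this. PPO and "CartPole-v1" are illustrative assumptions; `MyPPOLearner` is the subclass sketched above:

```python
# Hedged sketch: wiring a custom Learner into a full config and training.
# PPOConfig and "CartPole-v1" are illustrative choices.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(learner_class=MyPPOLearner, lr=1e-4)
)
algo = config.build()
result = algo.train()  # each training step now uses the custom loss
```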

Mixed precision and torch.compile

config = config.experimental(_torch_compile_learner=True)
For mixed precision, pass precision="bf16" (algorithm-dependent).

Next steps

RL modules

Define the network the Learner trains.

Training

The full iteration loop.