
A Learner owns the optimizer and runs the algorithm-specific loss on each training batch.

Single-GPU training

config = config.learners(num_learners=0, num_gpus_per_learner=1)
num_learners=0 runs the learner inside the local Algorithm process.
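As a fuller sketch, a single-GPU setup might be assembled as follows. PPO and "CartPole-v1" are illustrative assumptions, not prescribed here; any algorithm config exposes the same `.learners()` builder:

```python
# Hedged sketch: single-GPU Learner setup. PPOConfig and "CartPole-v1"
# are illustrative choices; substitute your algorithm and environment.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # num_learners=0: the Learner runs inside the local Algorithm process,
    # so no separate Learner actor is started.
    .learners(num_learners=0, num_gpus_per_learner=1)
)
```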

Multi-GPU data parallelism

config = config.learners(num_learners=4, num_gpus_per_learner=1)
RLlib creates four Learner actors, each holding its own copy of the network. Gradients are averaged via Ray’s collective ops.
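The effect of gradient averaging can be illustrated in plain Python (this is a toy sketch, not RLlib's actual collective implementation): each worker computes gradients on its own batch shard, and the element-wise mean is applied to every replica so all copies stay in sync.

```python
# Toy sketch of data-parallel gradient averaging (not RLlib internals).
# per_worker_grads: one gradient list per worker, one entry per parameter.
def average_gradients(per_worker_grads):
    n = len(per_worker_grads)
    # Element-wise mean across workers for each parameter.
    return [sum(gs) / n for gs in zip(*per_worker_grads)]

# Four workers, two parameters each.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
avg = average_gradients(grads)  # [4.0, 5.0]
```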

Configure the optimizer

Algorithm-specific optimizer settings live on the algorithm’s training builder:
config = config.training(lr=1e-4, grad_clip=40.0, train_batch_size=4096)
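To make the `grad_clip` setting concrete, here is a toy sketch of clipping by global norm, one common clipping mode. This is illustrative only; RLlib's actual clipping lives inside the Learner:

```python
import math

# Toy sketch of global-norm gradient clipping (what a setting like
# grad_clip=40.0 controls). If the combined norm of all gradients
# exceeds max_norm, every gradient is scaled down proportionally.
def clip_by_global_norm(grads, max_norm):
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0], 2.5)  # norm 5.0 -> [1.5, 2.0]
```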

Custom losses

For non-trivial algorithm changes, subclass the algorithm’s learner:
from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner

class MyPPOLearner(PPOTorchLearner):
    def compute_loss_for_module(self, *, module_id, hps, batch, fwd_out):
        # Start from PPO's standard loss, then add a custom term.
        loss = super().compute_loss_for_module(
            module_id=module_id, hps=hps, batch=batch, fwd_out=fwd_out
        )
        loss += my_extra_term(batch, fwd_out)  # your own extra loss term
        return loss
Wire into the config:
config = config.training(learner_class=MyPPOLearner)
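End to end, a custom Learner might be wired up and trained like this. PPO and "CartPole-v1" are illustrative assumptions; `MyPPOLearner` is the subclass sketched above:

```python
# Hedged sketch: wiring a custom Learner into a full config and training.
# PPOConfig and "CartPole-v1" are illustrative choices.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(learner_class=MyPPOLearner, lr=1e-4)
)
algo = config.build()
result = algo.train()  # each training step now uses the custom loss
```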

Mixed precision and torch.compile

config = config.experimental(_torch_compile_learner=True)
For mixed precision, pass precision="bf16" (algorithm-dependent).

Next steps

RL modules

Define the network the Learner trains.

Training

The full iteration loop.