
A single call to algo.train() runs one iteration of the loop:
1. Rollouts: Each EnvRunner steps its environment(s), producing a batch of episodes. The exploration policy uses the latest weights from the central RLModule.

2. Postprocessing: Connectors compute returns, advantages, and any algorithm-specific batch-level fields.

3. Update: The collected batch flows to the Learner(s). Each Learner runs the loss and updates weights.

4. Sync: Updated weights are broadcast back to all EnvRunners.

5. Metrics: Per-iteration metrics (episode return, learner stats, sampler timings) are aggregated and returned.

Configure the loop

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .env_runners(num_env_runners=8, num_envs_per_env_runner=4)
    .learners(num_learners=2, num_gpus_per_learner=1)
    .training(train_batch_size=8000, num_epochs=10, minibatch_size=512)
)
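
To run the loop, build an Algorithm from the config and call train() repeatedly. A minimal sketch, assuming the new API stack; "CartPole-v1" is an illustrative environment choice, and build_algo() is the newer entry point (older Ray versions use config.build()):

algo = (
    config
    .environment("CartPole-v1")  # illustrative environment, not part of the config above
    .build_algo()  # config.build() on older Ray versions
)

for _ in range(5):
    result = algo.train()  # one full rollout -> postprocess -> update -> sync iteration

algo.stop()  # shut down EnvRunner and Learner actors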

Inspect a single iteration

result = algo.train()
print(result["env_runners"]["episode_return_mean"])  # mean return over recent episodes
print(result["learners"]["__all_modules__"]["total_loss"])  # loss aggregated over all modules
print(result["env_runners"]["sample"])  # time spent collecting rollouts
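
The result dict is nested several levels deep; printing it whole is the quickest way to discover which keys are available. A small sketch using the standard library:

from pprint import pprint

pprint(result, depth=2)  # show the top two levels of the metrics dict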

Custom loops

For full control, drop down below algo.train() to the per-iteration logic that algo.step invokes (on the legacy stack this lives in Algorithm.training_step). The new-stack equivalent is to subclass the algorithm and override the iteration logic.
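
A minimal sketch of the subclass-and-override pattern, assuming training_step() as the override point; the class name and the pass-through body are purely illustrative:

from ray.rllib.algorithms.ppo import PPO

class MyPPO(PPO):  # hypothetical subclass name
    def training_step(self):
        # Custom sampling/update logic goes here. This sketch simply
        # delegates to the built-in PPO iteration and returns its results.
        return super().training_step()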

Stop conditions

Use Ray Tune’s stop config:
from ray import tune

tune.run(
    "PPO",
    config=config.to_dict(),
    stop={"env_runners/episode_return_mean": 200, "training_iteration": 100},
)
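
Without Tune, the same conditions can be checked in a plain Python loop. A sketch that reuses the algo and the result keys from the sections above:

# Manual stop loop mirroring the Tune stop dict above.
for _ in range(100):  # training_iteration limit
    result = algo.train()
    if result["env_runners"]["episode_return_mean"] >= 200:
        break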

Next steps

Checkpoints

Save and restore RLlib state.

Offline RL

Skip rollouts and train from logged data.