

Algorithm

The top-level orchestrator. Configures and runs the training loop, owns the learners and env runners, and exposes train(), evaluate(), save(), and restore().
from ray.rllib.algorithms.ppo import PPOConfig

algo = PPOConfig().environment("CartPole-v1").build()
algo.train()
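
Continuing from the algo built above, a minimal lifecycle sketch; the evaluation and checkpoint handling are assumptions, and the return type of save() varies across Ray versions.

for _ in range(5):
    result = algo.train()       # one training iteration; returns a metrics dict
eval_results = algo.evaluate()  # assumes evaluation env runners were configured via .evaluation(...)
checkpoint = algo.save()        # write a checkpoint; return type varies by Ray version
algo.stop()                     # shut down the env runner and learner actors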

RLModule

The neural network and inference logic. RLModules are framework-agnostic; concrete subclasses like TorchRLModule implement the framework-specific bits.
from ray.rllib.core.rl_module.rl_module import RLModule

class MyModule(RLModule):
    ...
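
A minimal TorchRLModule sketch, assuming the new API stack's setup() and _forward_* hooks and the Columns constants; exact hook signatures and how spaces are accessed differ across Ray versions, so CartPole's dimensions are hardcoded here for brevity.

import torch.nn as nn

from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.torch.torch_rl_module import TorchRLModule


class TinyTorchModule(TorchRLModule):
    def setup(self):
        # Build the policy network once. In practice, read the sizes from the
        # module's observation/action spaces; CartPole's (4 obs, 2 actions) are hardcoded here.
        self.policy = nn.Sequential(
            nn.Linear(4, 64),
            nn.ReLU(),
            nn.Linear(64, 2),
        )

    def _forward_inference(self, batch, **kwargs):
        # Map observations to action-distribution inputs (logits).
        return {Columns.ACTION_DIST_INPUTS: self.policy(batch[Columns.OBS])}

    def _forward_exploration(self, batch, **kwargs):
        return self._forward_inference(batch, **kwargs)

    def _forward_train(self, batch, **kwargs):
        return self._forward_inference(batch, **kwargs)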

Learner

The component that consumes batches of experience and updates weights. Learners can be replicated across GPUs for multi-GPU training.
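
Learner scaling is set on the algorithm config, for example (values are illustrative):

from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1").learners(
    num_learners=2,          # two Learner replicas sharing the gradient update
    num_gpus_per_learner=1,  # one GPU per replica
)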

EnvRunner

The component that interacts with environments to collect experience. EnvRunners run on CPU workers (or GPU workers, when inference happens on the actor) and stream samples back to the learners.
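
EnvRunner scaling is likewise a config option; num_envs_per_env_runner is an assumption that may be named differently in older Ray versions:

from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1").env_runners(
    num_env_runners=8,          # parallel sampling actors
    num_envs_per_env_runner=2,  # vectorized sub-environments per runner
)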

Replay buffer

For off-policy algorithms (DQN, SAC). Stores past experience that learners sample from. RLlib includes prioritized and uniform replay buffers; you can subclass ReplayBuffer for custom behavior.
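
A sketch of selecting a buffer for an off-policy config; the replay_buffer_config keys shown are assumptions and differ between algorithms and Ray versions:

from ray.rllib.algorithms.sac import SACConfig

config = SACConfig().environment("Pendulum-v1").training(
    replay_buffer_config={
        "type": "PrioritizedEpisodeReplayBuffer",  # buffer class to instantiate
        "capacity": 100_000,                       # max timesteps stored
        "alpha": 0.6,                              # prioritization exponent
    },
)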

Connector

A pipeline of transforms that runs on each environment step (observation pre-processing) or on each training batch (post-processing). Connectors keep training- and inference-time logic in sync.
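
For example, a built-in env-to-module connector piece can be prepended through the config; the FlattenObservations import path and the single-argument callable shown here are assumptions that vary by Ray version:

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import FlattenObservations

config = PPOConfig().environment("CartPole-v1").env_runners(
    # Prepend an observation-flattening transform to the env-to-module pipeline.
    env_to_module_connector=lambda env: FlattenObservations(),
)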

Episode

A trajectory through the environment, represented as a sequence of (obs, action, reward, next_obs, done) tuples plus metadata.
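
A plain Gymnasium loop illustrating that tuple structure (not RLlib's Episode class itself):

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
episode = []  # list of (obs, action, reward, next_obs, done) tuples
done = False
while not done:
    action = env.action_space.sample()  # stand-in for the RLModule's action
    next_obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    episode.append((obs, action, reward, next_obs, done))
    obs = next_obs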

Algorithm config

A typed builder. Set environment, training, RL module, learner, and resource options separately, then call .build().
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(num_env_runners=4)
    .learners(num_learners=1, num_gpus_per_learner=1)
    .training(train_batch_size=4000)
)
algo = config.build()

New API stack

RLlib’s modern stack centers on RLModule, Learner, EnvRunner, and Connector. The legacy stack (Policy, RolloutWorker) is still supported for older algorithms, but new development happens on the new stack.

Next steps

Algorithms

Survey the algorithm catalog.

RL modules

Define a custom policy network.