Documentation Index
Fetch the complete documentation index at: https://ray-preview.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Algorithm
The top-level orchestrator. Configures and runs the training loop, owns the learners and env runners, and exposes train(), evaluate(), save(), and restore().
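A minimal sketch of that lifecycle, assuming PPO on CartPole-v1; result keys and the exact return type of save() vary across Ray versions:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Build an Algorithm from a config, then drive the training loop manually.
config = PPOConfig().environment("CartPole-v1")
algo = config.build()

for _ in range(5):
    results = algo.train()  # one training iteration; returns a metrics dict

eval_results = algo.evaluate()  # run evaluation rollouts
checkpoint = algo.save()        # write a checkpoint (return type is version-dependent)
algo.restore(checkpoint)        # resume training state from the checkpoint
algo.stop()                     # release actors and resources
```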
RLModule
The neural network and inference logic. RLModules are framework-agnostic; concrete subclasses like TorchRLModule implement the framework-specific bits.
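A hedged sketch of a custom TorchRLModule, assuming a recent Ray version; the hook names (here _forward_inference/_forward_exploration) and the "action_dist_inputs" output column have shifted between releases, so check your installed version:

```python
import torch.nn as nn
from ray.rllib.core.rl_module.torch import TorchRLModule


class TinyPolicyModule(TorchRLModule):
    """Hypothetical RLModule: a small MLP mapping observations to action logits."""

    def setup(self):
        # Build the network once; observation/action spaces are provided by RLlib.
        obs_dim = self.observation_space.shape[0]
        num_actions = self.action_space.n
        self._net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def _forward_inference(self, batch, **kwargs):
        # Return action-distribution inputs (logits); RLlib builds the
        # distribution and picks actions from them.
        return {"action_dist_inputs": self._net(batch["obs"])}

    def _forward_exploration(self, batch, **kwargs):
        return self._forward_inference(batch, **kwargs)
```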
Learner
The component that consumes batches of experience and updates weights. Learners can be replicated across GPUs for multi-GPU training.
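For example, in recent Ray versions the learner group is scaled through the config's .learners() method; older releases used different option names, so treat these kwargs as version-dependent:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Two Learner actors, one GPU each; updates are synchronized across them.
    .learners(num_learners=2, num_gpus_per_learner=1)
)
```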
EnvRunner
The component that interacts with environments to collect experience. EnvRunners run on CPU workers (or GPU workers, when inference happens on the actor) and stream samples back to the learners.
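Sampling parallelism is likewise a config option; for example, four CPU-based EnvRunner actors:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Four parallel EnvRunner actors collecting experience.
    .env_runners(num_env_runners=4)
)
```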
Replay buffer
For off-policy algorithms (DQN, SAC). Stores past experience that learners sample from. RLlib includes prioritized and uniform replay buffers; you can subclass ReplayBuffer for custom behavior.
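A sketch of swapping the buffer via replay_buffer_config on a DQN config; the buffer class names and accepted keys depend on the Ray version, so verify against your release:

```python
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")
    .training(
        replay_buffer_config={
            # Prioritized sampling instead of the uniform default
            # (type name is version-dependent).
            "type": "PrioritizedEpisodeReplayBuffer",
            "capacity": 100_000,
            "alpha": 0.6,  # how strongly priorities skew sampling
            "beta": 0.4,   # importance-sampling correction
        }
    )
)
```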
Connector
A pipeline of transforms that runs on each environment step (observation pre-processing) or on each training batch (post-processing). Connectors keep training- and inference-time logic in sync.
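A hedged sketch of a custom connector piece; the keyword-only __call__ signature shown here matches recent ConnectorV2 versions but should be verified against your Ray release:

```python
import numpy as np
from ray.rllib.connectors.connector_v2 import ConnectorV2


class ClipObservations(ConnectorV2):
    """Hypothetical env-to-module connector that clips raw observations."""

    def __call__(self, *, rl_module, batch, episodes, **kwargs):
        # Clip observations to [-5, 5] before they reach the RLModule.
        # Because the same piece runs at sampling and training time, the
        # transform stays in sync across both.
        if "obs" in batch:
            batch["obs"] = np.clip(batch["obs"], -5.0, 5.0)
        return batch
```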
Episode
A trajectory through the environment, represented as a sequence of (obs, action, reward, next_obs, done) tuples plus metadata.
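In plain gymnasium terms (no RLlib machinery needed), such a trajectory looks like this:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
trajectory = []
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a real policy
    next_obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    trajectory.append((obs, action, reward, next_obs, done))
    obs = next_obs
```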
Algorithm config
A typed builder. Set environment, training, RL module, learner, and resource options separately, then call .build().
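For instance, with PPO (exact option names inside each group vary by algorithm and Ray version):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")                # environment options
    .training(lr=3e-4, train_batch_size=4000)  # training options
    .env_runners(num_env_runners=2)            # sampling options
)
algo = config.build()  # validate the config and construct the Algorithm
```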
New API stack
RLlib’s modern stack centers on RLModule, Learner, EnvRunner, and Connector. The legacy stack (Policy, RolloutWorker) is still supported for older algorithms, but new development happens on the new stack.
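In Ray releases where the new stack isn't yet the default, it is switched on through the config; the flag names below match recent versions and may differ in yours:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().api_stack(
    # Opt into RLModule/Learner and EnvRunner/ConnectorV2.
    enable_rl_module_and_learner=True,
    enable_env_runner_and_connector_v2=True,
)
```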
Next steps
Algorithms
Survey the algorithm catalog.
RL modules
Define a custom policy network.