Documentation Index
Fetch the complete documentation index at: https://ray-preview.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Algorithm
The top-level orchestrator. Configures and runs the training loop, owns the learners and env runners, and exposes train(), evaluate(), save(), and restore().
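A minimal sketch of that lifecycle, assuming PPO on CartPole-v1; result keys and the exact return type of save() vary across Ray versions:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Build an Algorithm from a config, then drive the training loop manually.
config = PPOConfig().environment("CartPole-v1")
algo = config.build()

for _ in range(5):
    results = algo.train()  # one training iteration; returns a metrics dict

eval_results = algo.evaluate()  # run evaluation rollouts
checkpoint = algo.save()        # write a checkpoint (return type is version-dependent)
algo.restore(checkpoint)        # resume training state from the checkpoint
algo.stop()                     # release actors and resources
```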
RLModule
The neural network and inference logic. RLModules are framework-agnostic; concrete subclasses like TorchRLModule implement the framework-specific bits.
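A hedged sketch of a custom TorchRLModule, assuming a recent Ray version; the hook names (here _forward_inference/_forward_exploration) and the "action_dist_inputs" output column have shifted between releases, so check your installed version:

```python
import torch.nn as nn
from ray.rllib.core.rl_module.torch import TorchRLModule


class TinyPolicyModule(TorchRLModule):
    """Hypothetical RLModule: a small MLP mapping observations to action logits."""

    def setup(self):
        # Build the network once; observation/action spaces are provided by RLlib.
        obs_dim = self.observation_space.shape[0]
        num_actions = self.action_space.n
        self._net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def _forward_inference(self, batch, **kwargs):
        # Return action-distribution inputs (logits); RLlib builds the
        # distribution and picks actions from them.
        return {"action_dist_inputs": self._net(batch["obs"])}

    def _forward_exploration(self, batch, **kwargs):
        return self._forward_inference(batch, **kwargs)
```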
Learner
The component that consumes batches of experience and updates weights. Learners can be replicated across GPUs for multi-GPU training.
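For example, in recent Ray versions the learner group is scaled through the config's .learners() method; older releases used different option names, so treat these kwargs as version-dependent:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Two Learner actors, one GPU each; updates are synchronized across them.
    .learners(num_learners=2, num_gpus_per_learner=1)
)
```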
EnvRunner
The component that interacts with environments to collect experience. EnvRunners run on CPU workers (or GPU workers, when inference happens on the actor) and stream samples back to the learners.
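Sampling parallelism is likewise a config option; for example, four CPU-based EnvRunner actors:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Four parallel EnvRunner actors collecting experience.
    .env_runners(num_env_runners=4)
)
```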
Replay buffer
For off-policy algorithms (DQN, SAC). Stores past experience that learners sample from. RLlib includes prioritized and uniform replay buffers; you can subclass ReplayBuffer for custom behavior.
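A sketch of swapping the buffer via replay_buffer_config on a DQN config; the buffer class names and accepted keys depend on the Ray version, so verify against your release:

```python
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")
    .training(
        replay_buffer_config={
            # Prioritized sampling instead of the uniform default
            # (type name is version-dependent).
            "type": "PrioritizedEpisodeReplayBuffer",
            "capacity": 100_000,
            "alpha": 0.6,  # how strongly priorities skew sampling
            "beta": 0.4,   # importance-sampling correction
        }
    )
)
```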
Connector
A pipeline of transforms that runs on each environment step (observation pre-processing) or on each training batch (post-processing). Connectors keep training- and inference-time logic in sync.
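A hedged sketch of a custom connector piece; the keyword-only __call__ signature shown here matches recent ConnectorV2 versions but should be verified against your Ray release:

```python
import numpy as np
from ray.rllib.connectors.connector_v2 import ConnectorV2


class ClipObservations(ConnectorV2):
    """Hypothetical env-to-module connector that clips raw observations."""

    def __call__(self, *, rl_module, batch, episodes, **kwargs):
        # Clip observations to [-5, 5] before they reach the RLModule.
        # Because the same piece runs at sampling and training time, the
        # transform stays in sync across both.
        if "obs" in batch:
            batch["obs"] = np.clip(batch["obs"], -5.0, 5.0)
        return batch
```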
Episode
A trajectory through the environment, represented as a sequence of (obs, action, reward, next_obs, done) tuples plus metadata.
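In plain gymnasium terms (no RLlib machinery needed), such a trajectory looks like this:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
trajectory = []
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a real policy
    next_obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    trajectory.append((obs, action, reward, next_obs, done))
    obs = next_obs
```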
Algorithm config
A typed builder. Set environment, training, RL module, learner, and resource options separately, then call .build().
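For instance, with PPO (exact option names inside each group vary by algorithm and Ray version):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")                # environment options
    .training(lr=3e-4, train_batch_size=4000)  # training options
    .env_runners(num_env_runners=2)            # sampling options
)
algo = config.build()  # validate the config and construct the Algorithm
```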
New API stack
RLlib’s modern stack centers on RLModule, Learner, EnvRunner, and Connector. The legacy stack (Policy, RolloutWorker) is still supported for older algorithms, but new development happens on the new stack.
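In Ray releases where the new stack isn't yet the default, it is switched on through the config; the flag names below match recent versions and may differ in yours:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().api_stack(
    # Opt into RLModule/Learner and EnvRunner/ConnectorV2.
    enable_rl_module_and_learner=True,
    enable_env_runner_and_connector_v2=True,
)
```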
Next steps
Algorithms
Survey the algorithm catalog.
RL modules
Define a custom policy network.