RLlib offers a unified, framework-agnostic API for building and scaling RL agents. It supports PyTorch (and TensorFlow), runs on a laptop or a cluster, and ships with battle-tested implementations of the major RL algorithms.

Why RLlib

Production-grade algorithms

PPO, IMPALA, APPO, DQN, SAC, BC, MARWIL, CQL, and more — implemented with the same configuration patterns.
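
Every algorithm is driven by the same AlgorithmConfig builder, so switching algorithms is essentially a one-line change. A minimal sketch (both config classes are real; the hyperparameters are illustrative):

from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.algorithms.ppo import PPOConfig

# The identical fluent pattern configures any algorithm; only the
# config class changes.
ppo_config = PPOConfig().environment("CartPole-v1").training(lr=1e-4)
dqn_config = DQNConfig().environment("CartPole-v1").training(lr=1e-4)

# Either config builds into an Algorithm with the same train() interface.
algo = dqn_config.build()
algo.train()
algo.stop()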

Multi-agent and offline RL

First-class support for cooperative and competitive multi-agent setups, plus learning from logged trajectories.
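
A multi-agent setup hinges on two settings: the set of policies to train and a function mapping each agent ID to one of them. A hedged sketch, assuming a registered two-agent environment (the env name and agent IDs here are hypothetical):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("my_two_agent_env")  # hypothetical registered multi-agent env
    .multi_agent(
        # Two separately trained policies ...
        policies={"attacker", "defender"},
        # ... and a rule assigning each agent ID to one of them.
        policy_mapping_fn=lambda agent_id, episode, **kwargs: (
            "attacker" if agent_id.startswith("attacker") else "defender"
        ),
    )
)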

Scales out of the box

Run thousands of parallel environments and dozens of learner GPUs by changing a few config settings, with no code changes.
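
Concretely, the sampling and learning sides scale independently through the env_runners and learners config blocks. A minimal sketch with illustrative counts (these settings assume the new API stack):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Sampling side: 64 env-runner actors, each vectorizing 8 env copies.
    .env_runners(num_env_runners=64, num_envs_per_env_runner=8)
    # Learning side: 4 learner processes with one GPU each.
    .learners(num_learners=4, num_gpus_per_learner=1)
)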

Composable

Build custom environments, RL modules, learners, and replay buffers via well-defined interfaces.
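
As a small example of those interfaces, any gymnasium.Env subclass can be handed straight to .environment(); RLlib instantiates it on each env runner. The toy env below is invented for illustration:

import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig


class CoinFlipEnv(gym.Env):
    """Toy one-step env: guess a hidden coin; +1 reward if correct."""

    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(0.0, 1.0, (1,), np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.coin = self.np_random.integers(2)
        return np.zeros(1, np.float32), {}

    def step(self, action):
        reward = 1.0 if action == self.coin else 0.0
        # obs, reward, terminated, truncated, info
        return np.zeros(1, np.float32), reward, True, False, {}


# Pass the env class directly instead of a registered string name.
config = PPOConfig().environment(CoinFlipEnv)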

Quick example

from ray.rllib.algorithms.ppo import PPOConfig

# Configure PPO on CartPole: 4 parallel env-runner actors collect
# experience; each update consumes a 4000-timestep batch over 10 epochs.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(num_env_runners=4)
    .training(train_batch_size=4000, num_epochs=10, lr=1e-4)
    .resources(num_gpus=0)  # train on CPU
)

# Build the algorithm and run ten training iterations, printing the
# mean episode return reported by the env runners after each one.
algo = config.build()
for _ in range(10):
    result = algo.train()
    print(result["env_runners"]["episode_return_mean"])
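
After the loop, a typical next step is to checkpoint the trained state and release the algorithm's actors. A brief sketch (note that the exact return type of save() varies across Ray versions):

# Persist the trained state; depending on the Ray version this returns
# a checkpoint path or a checkpoint/result object.
checkpoint = algo.save()

# Shut down the algorithm's env-runner and learner actors.
algo.stop()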

Concepts

Key concepts

Algorithms, RL modules, learners, env runners, replay buffers.

Algorithms

Survey the algorithms RLlib ships with.

Environments

Wrap Gym, PettingZoo, and custom envs.
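
For instance, a PettingZoo environment can be wrapped in RLlib's multi-agent adapter and registered under a string name that the config then references. A hedged sketch using the PettingZoo pistonball example env (assumes pettingzoo is installed):

from pettingzoo.butterfly import pistonball_v6
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv

# Wrap the parallel PettingZoo env and register it; the name can then
# be passed to .environment("pistonball") on any AlgorithmConfig.
register_env(
    "pistonball",
    lambda cfg: ParallelPettingZooEnv(pistonball_v6.parallel_env()),
)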

Training loop

Inside an RLlib training iteration.