RLlib offers a unified, framework-agnostic API for building and scaling RL agents. It supports PyTorch (and TensorFlow), runs on a laptop or a cluster, and ships with battle-tested implementations of the major RL algorithms.

Why RLlib

Production-grade algorithms

PPO, IMPALA, APPO, DQN, SAC, BC, MARWIL, CQL, and more — implemented with the same configuration patterns.
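
Every algorithm is driven by the same AlgorithmConfig builder, so switching algorithms is essentially a one-line change. A minimal sketch (both config classes are real; the hyperparameters are illustrative):

from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.algorithms.ppo import PPOConfig

# The identical fluent pattern configures any algorithm; only the
# config class changes.
ppo_config = PPOConfig().environment("CartPole-v1").training(lr=1e-4)
dqn_config = DQNConfig().environment("CartPole-v1").training(lr=1e-4)

# Either config builds into an Algorithm with the same train() interface.
algo = dqn_config.build()
algo.train()
algo.stop()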

Multi-agent and offline RL

First-class support for cooperative and competitive multi-agent setups, plus learning from logged trajectories.
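
A multi-agent setup hinges on two settings: the set of policies to train and a function mapping each agent ID to one of them. A hedged sketch, assuming a registered two-agent environment (the env name and agent IDs here are hypothetical):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("my_two_agent_env")  # hypothetical registered multi-agent env
    .multi_agent(
        # Two separately trained policies ...
        policies={"attacker", "defender"},
        # ... and a rule assigning each agent ID to one of them.
        policy_mapping_fn=lambda agent_id, episode, **kwargs: (
            "attacker" if agent_id.startswith("attacker") else "defender"
        ),
    )
)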

Scales out of the box

Run thousands of parallel environments and dozens of learner GPUs by changing a few config settings, with no code changes.
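
Concretely, the sampling and learning sides scale independently through the env_runners and learners config blocks. A minimal sketch with illustrative counts (these settings assume the new API stack):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Sampling side: 64 env-runner actors, each vectorizing 8 env copies.
    .env_runners(num_env_runners=64, num_envs_per_env_runner=8)
    # Learning side: 4 learner processes with one GPU each.
    .learners(num_learners=4, num_gpus_per_learner=1)
)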

Composable

Build custom environments, RL modules, learners, and replay buffers via well-defined interfaces.
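
As a small example of those interfaces, any gymnasium.Env subclass can be handed straight to .environment(); RLlib instantiates it on each env runner. The toy env below is invented for illustration:

import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig


class CoinFlipEnv(gym.Env):
    """Toy one-step env: guess a hidden coin; +1 reward if correct."""

    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(0.0, 1.0, (1,), np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.coin = self.np_random.integers(2)
        return np.zeros(1, np.float32), {}

    def step(self, action):
        reward = 1.0 if action == self.coin else 0.0
        # obs, reward, terminated, truncated, info
        return np.zeros(1, np.float32), reward, True, False, {}


# Pass the env class directly instead of a registered string name.
config = PPOConfig().environment(CoinFlipEnv)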

Quick example

from ray.rllib.algorithms.ppo import PPOConfig

# Configure PPO on CartPole: 4 parallel env-runner actors collect
# experience; each update consumes a 4000-timestep batch over 10 epochs.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(num_env_runners=4)
    .training(train_batch_size=4000, num_epochs=10, lr=1e-4)
    .resources(num_gpus=0)  # train on CPU
)

# Build the algorithm and run ten training iterations, printing the
# mean episode return reported by the env runners after each one.
algo = config.build()
for _ in range(10):
    result = algo.train()
    print(result["env_runners"]["episode_return_mean"])
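
After the loop, a typical next step is to checkpoint the trained state and release the algorithm's actors. A brief sketch (note that the exact return type of save() varies across Ray versions):

# Persist the trained state; depending on the Ray version this returns
# a checkpoint path or a checkpoint/result object.
checkpoint = algo.save()

# Shut down the algorithm's env-runner and learner actors.
algo.stop()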

Concepts

Key concepts

Algorithms, RL modules, learners, env runners, replay buffers.

Algorithms

Survey the algorithms RLlib ships with.

Environments

Wrap Gym, PettingZoo, and custom envs.
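
For instance, a PettingZoo environment can be wrapped in RLlib's multi-agent adapter and registered under a string name that the config then references. A hedged sketch using the PettingZoo pistonball example env (assumes pettingzoo is installed):

from pettingzoo.butterfly import pistonball_v6
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv

# Wrap the parallel PettingZoo env and register it; the name can then
# be passed to .environment("pistonball") on any AlgorithmConfig.
register_env(
    "pistonball",
    lambda cfg: ParallelPettingZooEnv(pistonball_v6.parallel_env()),
)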

Training loop

Inside an RLlib training iteration.