
RLlib includes implementations of the major RL algorithms. Each algorithm has a config class (for example, PPOConfig) that exposes its hyperparameters through a typed builder API.

On-policy

PPO

Default choice for continuous and discrete control; stable and easy to tune.

APPO

Asynchronous PPO. Higher throughput on large-scale clusters.

IMPALA

Distributed actor-critic with V-trace. Used at scale for game and robotics tasks.
For example, a minimal PPO configuration:

from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1").training(lr=1e-4)
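
Any config builds into a runnable Algorithm; a minimal training-loop sketch (the iteration count here is arbitrary):

algo = config.build()
for _ in range(5):
    result = algo.train()  # one training iteration; returns a metrics dict
algo.stop()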

Off-policy

DQN / Rainbow

Discrete action spaces; uses a replay buffer.

SAC

Continuous control; entropy-regularized actor-critic.
For example, a minimal DQN configuration:

from ray.rllib.algorithms.dqn import DQNConfig

config = DQNConfig().environment("CartPole-v1")
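
SAC follows the same pattern for continuous-control tasks; a minimal sketch (Pendulum-v1 is an illustrative env choice, not from the original):

from ray.rllib.algorithms.sac import SACConfig

# Pendulum-v1 has a continuous action space, which SAC is designed for.
config = SACConfig().environment("Pendulum-v1")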

Offline RL

Behavior Cloning (BC)

Supervised pre-training from logged trajectories.

MARWIL

Imitation learning that uses advantage weighting to favor higher-quality demonstrations.

CQL

Conservative Q-learning for offline datasets.

See Offline RL for the full workflow.
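
As a sketch of the offline setup, assuming episodes were recorded beforehand (the path and env here are illustrative placeholders):

from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment("CartPole-v1")  # supplies observation and action spaces
    .offline_data(input_="/tmp/recorded-episodes")  # placeholder path to logged trajectories
)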

Multi-agent

Most algorithms support multi-agent training through the multi-agent API: specify the set of policies and a mapping function from agent ID to policy ID.
from ray.rllib.algorithms.ppo import PPOConfig

# MyMultiAgentEnv stands in for your MultiAgentEnv subclass.
config = (
    PPOConfig()
    .environment(MyMultiAgentEnv)
    .multi_agent(
        policies={"learner", "frozen"},
        # Agent 0 uses the "learner" policy; all other agents use "frozen".
        policy_mapping_fn=lambda agent_id, *args, **kwargs: "learner" if agent_id == 0 else "frozen",
        # Without this, the "frozen" policy would also receive gradient updates.
        policies_to_train=["learner"],
    )
)
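
Building and training then proceeds as in the single-agent case; per-policy stats are reported under their policy IDs in the result dict (a sketch assuming the config above; exact result keys vary by RLlib version):

algo = config.build()
result = algo.train()  # metrics for "learner" appear under its policy ID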

Pick an algorithm

Use case                            Recommended starting point
Continuous control                  SAC
Discrete action spaces              PPO or DQN
Many parallel envs, simple network  IMPALA
Imitation from logs                 BC, MARWIL
Offline dataset, no env access      CQL
Multi-agent                         PPO with multi-agent config

Next steps

Training

Inside an RLlib training iteration.

RL modules

Custom policy networks.