

RLlib accepts environments through a standard interface. The simplest path is a Gymnasium environment registered by name.

Gymnasium environments

from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")
config = PPOConfig().environment("LunarLander-v3")
Any Gymnasium-registered ID works. Ensure the backing package is installed (e.g., pip install "gymnasium[box2d]" for LunarLander).
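
"Registered by name" boils down to a string-to-constructor registry lookup. The toy sketch below illustrates the idea with the standard library only; `register`, `make`, and `DummyEnv` are hypothetical names for illustration, not Gymnasium's actual implementation.

```python
# Toy env registry: map string IDs to constructors, resolve at make() time.
REGISTRY = {}

def register(env_id, ctor):
    REGISTRY[env_id] = ctor

def make(env_id, **kwargs):
    if env_id not in REGISTRY:
        raise KeyError(f"Unknown env ID: {env_id}")
    return REGISTRY[env_id](**kwargs)

class DummyEnv:
    def __init__(self, size=4):
        self.size = size

register("Dummy-v0", DummyEnv)
env = make("Dummy-v0", size=8)
print(type(env).__name__, env.size)  # DummyEnv 8
```

Gymnasium's real registry additionally handles versioning and wrappers, but the lookup shape is the same.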

Custom environment

Subclass gymnasium.Env and accept an optional config dict in __init__ (RLlib passes env_config through it):
import gymnasium as gym
import numpy as np

class GridEnv(gym.Env):
    metadata = {"render_modes": []}

    def __init__(self, config=None):
        super().__init__()
        self.observation_space = gym.spaces.Box(0.0, 1.0, (4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)
        self.t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self.t = 0
        return np.zeros(4, dtype=np.float32), {}  # (obs, info)

    def step(self, action):
        self.t += 1
        obs = self.np_random.random(4).astype(np.float32)
        reward = 1.0 if action == 0 else 0.0
        terminated = self.t >= 10  # episode ends after 10 steps
        return obs, reward, terminated, False, {}  # (obs, reward, terminated, truncated, info)

config = PPOConfig().environment(GridEnv)  # pass the class; RLlib instantiates it
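
Independent of RLlib, you can sanity-check the Gymnasium step contract with a plain rollout loop. The sketch below mirrors GridEnv's timing logic using only the standard library (no gymnasium dependency, so it runs anywhere); TinyGridEnv is a made-up stand-in for illustration.

```python
import random

class TinyGridEnv:
    """Stdlib stand-in mirroring GridEnv's reset/step contract."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0], {}  # (obs, info)

    def step(self, action):
        self.t += 1
        obs = [random.random() for _ in range(4)]
        reward = 1.0 if action == 0 else 0.0
        terminated = self.t >= 10
        return obs, reward, terminated, False, {}

# A plain rollout loop, roughly what an EnvRunner does per episode.
env = TinyGridEnv()
obs, info = env.reset()
total, steps = 0.0, 0
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(0)  # always action 0
    total += reward
    steps += 1

print(steps, total)  # 10 1.0-reward steps -> 10 10.0
```

An episode ends when either terminated (a natural end state) or truncated (an external cutoff such as a time limit) becomes True; here only terminated ever fires.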

Multi-agent environments

Use RLlib’s MultiAgentEnv interface or a PettingZoo wrapper.
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MultiGrid(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        self.agents = ["a", "b"]
        ...

    def reset(self, *, seed=None, options=None):
        # Return per-agent observation and info dicts.
        return {a: ... for a in self.agents}, {}

    def step(self, action_dict):
        ...
        # Each return value is a dict keyed by agent ID; terminateds and
        # truncateds also carry a special "__all__" key that ends the episode.
        return obs, rewards, terminateds, truncateds, infos
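
The dict-based protocol above can be exercised without ray installed. The sketch below is a self-contained toy with two agents showing the per-agent dicts and the "__all__" convention; TwoAgentTag and its dynamics are invented for illustration.

```python
# Framework-free sketch of the multi-agent dict protocol (no ray needed).
class TwoAgentTag:
    def __init__(self):
        self.agents = ["a", "b"]
        self.t = 0

    def reset(self):
        self.t = 0
        obs = {a: 0.0 for a in self.agents}    # one observation per agent
        infos = {a: {} for a in self.agents}
        return obs, infos

    def step(self, action_dict):
        self.t += 1
        obs = {a: float(self.t) for a in self.agents}
        rewards = {a: 1.0 if action_dict[a] == 0 else 0.0 for a in self.agents}
        done = self.t >= 3
        # Per-agent flags plus "__all__", which ends the whole episode.
        terminateds = {a: done for a in self.agents}
        terminateds["__all__"] = done
        truncateds = {a: False for a in self.agents}
        truncateds["__all__"] = False
        infos = {a: {} for a in self.agents}
        return obs, rewards, terminateds, truncateds, infos

env = TwoAgentTag()
obs, infos = env.reset()
while True:
    obs, rewards, terminateds, truncateds, infos = env.step({"a": 0, "b": 1})
    if terminateds["__all__"]:
        break

print(obs, rewards)  # {'a': 3.0, 'b': 3.0} {'a': 1.0, 'b': 0.0}
```

In a real RLlib env, agents may come and go between steps, so the dicts need not contain every agent every step; "__all__" is what the runner checks to close the episode.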
For PettingZoo, wrap a parallel env in ParallelPettingZooEnv and register it by name:
from ray import tune
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from pettingzoo.butterfly import pistonball_v6

tune.register_env("pistonball", lambda cfg: ParallelPettingZooEnv(pistonball_v6.parallel_env()))
config = PPOConfig().environment("pistonball")

Vectorized envs

Run multiple environment instances in one EnvRunner for higher throughput:
config = config.env_runners(num_envs_per_env_runner=8)
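
To make the throughput idea concrete, here is a framework-free sketch of what a vectorized runner does: step N env copies in lockstep and auto-reset any that finish. CountEnv and SimpleVector are invented helpers, not RLlib classes.

```python
class CountEnv:
    """Toy env that terminates after 5 steps; purely illustrative."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 5, False, {}

class SimpleVector:
    """Steps N env copies together and auto-resets finished ones,
    roughly what num_envs_per_env_runner enables inside one EnvRunner."""
    def __init__(self, n):
        self.envs = [CountEnv() for _ in range(n)]
        self.obs = [e.reset()[0] for e in self.envs]

    def step(self, actions):
        batch = []
        for env, act in zip(self.envs, actions):
            obs, rew, term, trunc, info = env.step(act)
            if term or trunc:
                obs, _ = env.reset()  # auto-reset, as vectorized runners do
            batch.append((obs, rew, term))
        return batch

vec = SimpleVector(8)
for _ in range(5):
    batch = vec.step([0] * 8)

# On the 5th synchronized step every sub-env terminated and was reset.
print([term for _, _, term in batch])  # [True] * 8
```

The payoff is batching: one policy forward pass can serve all 8 observations at once instead of 8 separate calls, which is where the throughput gain comes from.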

External envs

For environments that drive RLlib (rather than RLlib stepping the env), see the ExternalEnv and PolicyClient APIs.

Next steps

RL modules: custom policy networks for your env.
Training: inside an RLlib training iteration.