RLlib accepts environments through a standard interface. The simplest path is a Gymnasium environment registered by name.
Gymnasium environments
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")
config = PPOConfig().environment("LunarLander-v3")
Any Gymnasium-registered ID works. Ensure the package that provides the environment is installed (e.g., pip install "gymnasium[box2d]" for LunarLander).
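With an environment configured, training follows the usual pattern. A minimal end-to-end sketch: build() constructs the algorithm from the config, and each train() call runs one training iteration.
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")
algo = config.build()
for _ in range(3):
    result = algo.train()  # one training iteration; returns a results dict
algo.stop()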
Custom environment
Subclass gymnasium.Env:
import gymnasium as gym
import numpy as np

class GridEnv(gym.Env):
    metadata = {"render_modes": []}

    def __init__(self, config=None):
        # RLlib instantiates the env with the env_config dict (as an EnvContext).
        super().__init__()
        self.observation_space = gym.spaces.Box(0, 1, (4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self.t = 0
        return np.zeros(4, dtype=np.float32), {}

    def step(self, action):
        self.t += 1
        obs = self.np_random.random(4).astype(np.float32)
        reward = 1.0 if action == 0 else 0.0
        terminated = self.t >= 10  # episode ends after 10 steps
        return obs, reward, terminated, False, {}
config = PPOConfig().environment(GridEnv)
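Before handing the class to RLlib, it can help to sanity-check it by stepping it manually (plain Gymnasium, nothing RLlib-specific):
env = GridEnv()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())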
Multi-agent environments
Use RLlib’s MultiAgentEnv interface or a PettingZoo wrapper.
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MultiGrid(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        self.agents = ["a", "b"]
        ...

    def reset(self, *, seed=None, options=None):
        # All return values are dicts keyed by agent ID.
        return {a: ... for a in self.agents}, {}

    def step(self, action_dict):
        ...
        # terminateds and truncateds must include an "__all__" key
        # that signals the end of the whole episode.
        return obs, rewards, terminateds, truncateds, infos
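Filled in, a runnable version might look like the sketch below. TwoAgentGrid is a hypothetical toy env, and the sketch assumes the new API stack conventions (possible_agents, per-agent observation_spaces/action_spaces dicts, and a policy_mapping_fn taking agent_id and episode):
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.multi_agent_env import MultiAgentEnv
import gymnasium as gym
import numpy as np

class TwoAgentGrid(MultiAgentEnv):
    # Hypothetical two-agent toy env, not part of RLlib.
    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["a", "b"]
        self.observation_spaces = {
            a: gym.spaces.Box(0, 1, (4,), dtype=np.float32) for a in self.agents
        }
        self.action_spaces = {a: gym.spaces.Discrete(2) for a in self.agents}

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return {a: np.zeros(4, dtype=np.float32) for a in self.agents}, {}

    def step(self, action_dict):
        self.t += 1
        done = self.t >= 10
        obs = {a: np.random.rand(4).astype(np.float32) for a in self.agents}
        rewards = {a: float(action_dict[a] == 0) for a in self.agents}
        # Per-agent flags plus "__all__", which ends the episode for everyone.
        terminateds = {a: done for a in self.agents} | {"__all__": done}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}

config = (
    PPOConfig()
    .environment(TwoAgentGrid)
    .multi_agent(
        policies={"a", "b"},  # one policy per agent
        policy_mapping_fn=lambda agent_id, episode, **kw: agent_id,
    )
)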
For PettingZoo, wrap the env and register it under a name. Note that parallel PettingZoo envs use the ParallelPettingZooEnv wrapper (PettingZooEnv is for AEC-style envs), and the butterfly envs require pip install "pettingzoo[butterfly]":
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from pettingzoo.butterfly import pistonball_v6

register_env("pistonball", lambda cfg: ParallelPettingZooEnv(pistonball_v6.parallel_env()))
config = PPOConfig().environment("pistonball")
Vectorized envs
Run multiple environment instances in one EnvRunner for higher throughput:
config = config.env_runners(num_envs_per_env_runner=8)
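Vectorization multiplies with the number of EnvRunner actors. For example, the following sketch runs 32 environment copies in total:
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(
        num_env_runners=4,          # 4 parallel EnvRunner actors
        num_envs_per_env_runner=8,  # 8 env copies inside each runner
    )
)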
External envs
For environments that drive RLlib (rather than RLlib stepping the env), see the ExternalEnv and PolicyClient APIs.
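A minimal sketch of the client side, assuming a policy server is already listening on localhost:9900 (the address and the toy observation loop are placeholders standing in for your external simulator):
from ray.rllib.env.policy_client import PolicyClient
import numpy as np

client = PolicyClient("http://localhost:9900", inference_mode="remote")
episode_id = client.start_episode()
obs = np.zeros(4, dtype=np.float32)  # stand-in for the simulator's first observation
for _ in range(10):
    action = client.get_action(episode_id, obs)
    # The external simulator applies the action and produces the next obs/reward.
    obs, reward = np.random.rand(4).astype(np.float32), 1.0
    client.log_returns(episode_id, reward)
client.end_episode(episode_id, obs)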
Next steps
RL modules: custom policy networks for your env.
Training: what happens inside an RLlib training iteration.