RLlib accepts environments through a standard interface. The simplest path is a Gymnasium environment registered by name.
Gymnasium environments
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")
config = PPOConfig().environment("LunarLander-v3")
Any Gymnasium-registered ID works. Ensure the package that provides the environment is installed (e.g., pip install "gymnasium[box2d]" for LunarLander).
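With an environment configured, training follows the usual pattern. A minimal end-to-end sketch: build() constructs the algorithm from the config, and each train() call runs one training iteration.
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")
algo = config.build()
for _ in range(3):
    result = algo.train()  # one training iteration; returns a results dict
algo.stop()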
Custom environment
Subclass gymnasium.Env:
import gymnasium as gym
import numpy as np

class GridEnv(gym.Env):
    metadata = {"render_modes": []}

    def __init__(self, config=None):
        # RLlib instantiates the env with the env_config dict (as an EnvContext).
        super().__init__()
        self.observation_space = gym.spaces.Box(0, 1, (4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self.t = 0
        return np.zeros(4, dtype=np.float32), {}

    def step(self, action):
        self.t += 1
        obs = self.np_random.random(4).astype(np.float32)
        reward = 1.0 if action == 0 else 0.0
        terminated = self.t >= 10  # episode ends after 10 steps
        return obs, reward, terminated, False, {}
config = PPOConfig().environment(GridEnv)
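Before handing the class to RLlib, it can help to sanity-check it by stepping it manually (plain Gymnasium, nothing RLlib-specific):
env = GridEnv()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())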
Multi-agent environments
Use RLlib’s MultiAgentEnv interface or a PettingZoo wrapper.
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MultiGrid(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        self.agents = ["a", "b"]
        ...

    def reset(self, *, seed=None, options=None):
        # All return values are dicts keyed by agent ID.
        return {a: ... for a in self.agents}, {}

    def step(self, action_dict):
        ...
        # terminateds and truncateds must include an "__all__" key
        # that signals the end of the whole episode.
        return obs, rewards, terminateds, truncateds, infos
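Filled in, a runnable version might look like the sketch below. TwoAgentGrid is a hypothetical toy env, and the sketch assumes the new API stack conventions (possible_agents, per-agent observation_spaces/action_spaces dicts, and a policy_mapping_fn taking agent_id and episode):
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.multi_agent_env import MultiAgentEnv
import gymnasium as gym
import numpy as np

class TwoAgentGrid(MultiAgentEnv):
    # Hypothetical two-agent toy env, not part of RLlib.
    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["a", "b"]
        self.observation_spaces = {
            a: gym.spaces.Box(0, 1, (4,), dtype=np.float32) for a in self.agents
        }
        self.action_spaces = {a: gym.spaces.Discrete(2) for a in self.agents}

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return {a: np.zeros(4, dtype=np.float32) for a in self.agents}, {}

    def step(self, action_dict):
        self.t += 1
        done = self.t >= 10
        obs = {a: np.random.rand(4).astype(np.float32) for a in self.agents}
        rewards = {a: float(action_dict[a] == 0) for a in self.agents}
        # Per-agent flags plus "__all__", which ends the episode for everyone.
        terminateds = {a: done for a in self.agents} | {"__all__": done}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}

config = (
    PPOConfig()
    .environment(TwoAgentGrid)
    .multi_agent(
        policies={"a", "b"},  # one policy per agent
        policy_mapping_fn=lambda agent_id, episode, **kw: agent_id,
    )
)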
For PettingZoo, wrap the env and register it under a name. Note that parallel PettingZoo envs use the ParallelPettingZooEnv wrapper (PettingZooEnv is for AEC-style envs), and the butterfly envs require pip install "pettingzoo[butterfly]":
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from pettingzoo.butterfly import pistonball_v6

register_env("pistonball", lambda cfg: ParallelPettingZooEnv(pistonball_v6.parallel_env()))
config = PPOConfig().environment("pistonball")
Vectorized envs
Run multiple environment instances in one EnvRunner for higher throughput:
config = config.env_runners(num_envs_per_env_runner=8)
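Vectorization multiplies with the number of EnvRunner actors. For example, the following sketch runs 32 environment copies in total:
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(
        num_env_runners=4,          # 4 parallel EnvRunner actors
        num_envs_per_env_runner=8,  # 8 env copies inside each runner
    )
)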
External envs
For environments that drive RLlib (rather than RLlib stepping the env), see the ExternalEnv and PolicyClient APIs.
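A minimal sketch of the client side, assuming a policy server is already listening on localhost:9900 (the address and the toy observation loop are placeholders standing in for your external simulator):
from ray.rllib.env.policy_client import PolicyClient
import numpy as np

client = PolicyClient("http://localhost:9900", inference_mode="remote")
episode_id = client.start_episode()
obs = np.zeros(4, dtype=np.float32)  # stand-in for the simulator's first observation
for _ in range(10):
    action = client.get_action(episode_id, obs)
    # The external simulator applies the action and produces the next obs/reward.
    obs, reward = np.random.rand(4).astype(np.float32), 1.0
    client.log_returns(episode_id, reward)
client.end_episode(episode_id, obs)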
Next steps
RL modules: custom policy networks for your env.
Training: what happens inside an RLlib training iteration.