Off-policy algorithms (DQN, SAC, APEX-DQN) use replay buffers to store past experience. The Learner samples from the buffer instead of from fresh rollouts each step.

Default buffer

Each off-policy algorithm sets a sensible default buffer.
from ray.rllib.algorithms.dqn import DQNConfig

# DQN picks its default replay buffer automatically; no replay_buffer_config needed.
config = DQNConfig().environment("CartPole-v1")
algo = config.build()
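
To see what that default is before overriding it, you can print the buffer settings stored on the config (a quick check, assuming the replay_buffer_config attribute exposed by recent RLlib config objects):

# Show the algorithm's default buffer settings, e.g. type and capacity.
print(config.replay_buffer_config)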

Configure buffer size and sampling

config = config.training(
    replay_buffer_config={
        "type": "PrioritizedEpisodeReplayBuffer",
        "capacity": 1_000_000,  # max number of stored env timesteps
        "alpha": 0.6,  # prioritization strength (0.0 = uniform sampling)
        "beta": 0.4,  # importance-sampling correction for the sampling bias
    },
    train_batch_size=512,  # size of each batch sampled from the buffer
)
Buffer type                         Behavior
EpisodeReplayBuffer                 Uniform sampling.
PrioritizedEpisodeReplayBuffer      Prioritized sampling with importance-sampling weights.
MultiAgentEpisodeReplayBuffer       Multi-agent variant.
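
With the buffer configured, training proceeds as usual: rollouts fill the buffer and the Learner samples batches of train_batch_size from it. A minimal loop, assuming the DQN config built above:

# Build the algorithm and run a few training iterations; each train() call
# collects new experience into the buffer and samples from it for updates.
algo = config.build()
for _ in range(5):
    results = algo.train()
algo.stop()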

Custom buffers

Subclass ReplayBuffer and reference it from the config:
config = config.training(replay_buffer_config={"type": MyBuffer, ...})
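
For illustration only, a subclass might look like the sketch below. The name MyBuffer and the added counter are made up, and the exact add()/sample() signatures of the ReplayBuffer base class vary between Ray versions, so verify them before relying on this:

from ray.rllib.utils.replay_buffers import ReplayBuffer

class MyBuffer(ReplayBuffer):
    """Hypothetical buffer that counts how many timesteps were ever added."""

    def __init__(self, capacity=100_000, **kwargs):
        super().__init__(capacity=capacity, **kwargs)
        self.total_timesteps_added = 0

    def add(self, batch, **kwargs):
        # Track total inflow, then defer to the base implementation.
        self.total_timesteps_added += len(batch)
        super().add(batch, **kwargs)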

Best practices

For SAC and DQN, start with a buffer capacity of 10–50× the train batch size and tune from there. Larger buffers stabilize training but increase the time to first improvement.
Off-policy algorithms are memory-hungry. Keep an eye on ray.cluster_resources()["object_store_memory"] and tune target_max_block_size if you spill heavily.
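
A rough sketch of both practices; the numbers are illustrative starting points, not tuned recommendations:

import ray

ray.init(ignore_reinit_error=True)

# Rule of thumb: buffer capacity of roughly 10-50x the train batch size.
train_batch_size = 512
capacity = 50 * train_batch_size  # 25,600 timesteps; tune from here

# Total object store memory (bytes) available across the cluster.
print(ray.cluster_resources()["object_store_memory"])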

Next steps

Algorithms

Off-policy algorithms in RLlib.

Offline RL

Train from a static dataset.