Off-policy algorithms (DQN, SAC, APEX-DQN) use replay buffers to store past experience. The Learner samples from the buffer instead of from fresh rollouts each step.
## Default buffer
Each off-policy algorithm sets a sensible default replay buffer.

## Configure buffer size and sampling
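A minimal sketch of a buffer configuration: the key names (`type`, `capacity`) follow common RLlib conventions but may differ across versions, so treat them as assumptions and check the docs for your release.

```python
# Hypothetical config fragment -- key names assumed from common
# RLlib conventions; verify against your RLlib version's docs.
replay_buffer_config = {
    "type": "EpisodeReplayBuffer",  # which buffer class to use
    "capacity": 100_000,            # max env steps kept in the buffer
}
```

A dict like this is typically passed to the algorithm config's `.training(replay_buffer_config=...)` call.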
| Buffer type | Behavior |
|---|---|
| `EpisodeReplayBuffer` | Uniform sampling. |
| `PrioritizedEpisodeReplayBuffer` | Prioritized sampling with importance-sampling weights. |
| `MultiAgentEpisodeReplayBuffer` | Multi-agent variant. |
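Switching to the prioritized variant is, in a hedged sketch, a matter of changing the `type` entry; the `alpha`/`beta` hyperparameter names below are assumptions based on the standard prioritized-replay formulation, not confirmed for your RLlib version.

```python
# Hypothetical sketch: selects prioritized sampling. "alpha" controls
# how strongly priorities skew sampling, "beta" the strength of the
# importance-sampling correction. Key names are assumptions.
replay_buffer_config = {
    "type": "PrioritizedEpisodeReplayBuffer",
    "capacity": 50_000,
    "alpha": 0.6,  # 0 = uniform, 1 = fully prioritized
    "beta": 0.4,   # importance-sampling exponent
}
```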
## Custom buffers
Subclass `ReplayBuffer` and reference it from the config:
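The code example that belongs here is missing from the source. As a stand-in, here is a minimal, self-contained sketch of the uniform-sampling logic such a subclass would implement. It deliberately avoids RLlib imports (the real class would subclass RLlib's `ReplayBuffer` and be referenced via the config's `replay_buffer_config` dict); all names below are hypothetical.

```python
import random
from collections import deque


class UniformReplayBuffer:
    """Hypothetical stand-in for an RLlib ReplayBuffer subclass.

    Stores transitions up to a fixed capacity and samples them
    uniformly at random -- the behavior the real subclass would
    layer on top of RLlib's base class.
    """

    def __init__(self, capacity: int = 10_000):
        # Oldest transitions are evicted first once capacity is reached.
        self._storage = deque(maxlen=capacity)

    def add(self, transition: dict) -> None:
        """Append one transition (e.g. obs/action/reward/next_obs)."""
        self._storage.append(transition)

    def sample(self, batch_size: int) -> list[dict]:
        """Draw `batch_size` transitions uniformly, with replacement."""
        return [random.choice(self._storage) for _ in range(batch_size)]

    def __len__(self) -> int:
        return len(self._storage)
```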
## Best practices
## Next steps

- **Algorithms**: Off-policy algorithms in RLlib.
- **Offline RL**: Train from a static dataset.