Offline RL learns from a static dataset of (s, a, r, s', done) transitions instead of fresh environment rollouts. Use it when an environment is expensive, dangerous, or simply unavailable.
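The unit of offline data is one such transition. A minimal sketch of what a recorded step looks like (field names here are illustrative, not RLlib's exact schema):

```python
from typing import Any, NamedTuple


class Transition(NamedTuple):
    # One (s, a, r, s', done) step recorded by some behavior policy.
    obs: Any        # s: observation before acting
    action: Any     # a: action the behavior policy took
    reward: float   # r: reward received for that action
    next_obs: Any   # s': observation after the environment stepped
    done: bool      # whether the episode terminated at this step


# A two-step toy episode; an offline dataset is just many such rows.
episode = [
    Transition(obs=0, action=1, reward=0.0, next_obs=1, done=False),
    Transition(obs=1, action=0, reward=1.0, next_obs=2, done=True),
]
undiscounted_return = sum(t.reward for t in episode)  # 1.0
```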
Dataset format
Offline RL reads from Ray Data datasets containing the standard columns. Use ray.rllib.offline.OfflineData to write or read RLlib-format data.
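One way to produce such data without an RLlib recorder is to write newline-delimited JSON that Ray Data can later ingest. The column names below loosely follow RLlib's column convention but are an assumption; check the exact schema your RLlib version expects:

```python
import json
import os
import tempfile

# Assumed column names (obs / actions / rewards / next_obs / terminateds);
# the exact schema varies across RLlib versions.
rows = [
    {"obs": [0.0], "actions": 1, "rewards": 0.0, "next_obs": [1.0], "terminateds": False},
    {"obs": [1.0], "actions": 0, "rewards": 1.0, "next_obs": [2.0], "terminateds": True},
]

path = os.path.join(tempfile.mkdtemp(), "transitions.jsonl")
with open(path, "w") as f:
    for row in rows:                     # one JSON object per line, a format
        f.write(json.dumps(row) + "\n")  # that ray.data.read_json can ingest

# Round-trip check: read the file back row by row.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
```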
Behavior cloning (BC)
Supervised imitation: the policy learns to reproduce the dataset's actions directly, ignoring rewards.
MARWIL
Like BC but weighs each transition by the demonstrator's advantage, giving more weight to actions that led to higher returns.
CQL
Conservative Q-learning. Learns a Q-function with a regularizer that pushes down the value of out-of-distribution actions, making the resulting policy stay close to the dataset's behavior policy.
Evaluate against a real env
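One possible shape of such a setup, sketched with RLlib's AlgorithmConfig builder. Method and argument names reflect recent Ray versions and may differ in yours; the env name and input path are placeholders:

```python
# A sketch, not verbatim from the docs: consult the AlgorithmConfig
# reference for the Ray version you actually run.
from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    # The env is only needed for evaluation rollouts, not for training.
    .environment("CartPole-v1")
    # Train purely from the recorded transitions.
    .offline_data(input_="/tmp/transitions")
    # Periodically roll the learned policy out in the real env.
    .evaluation(
        evaluation_interval=1,         # evaluate every training iteration
        evaluation_num_env_runners=1,  # dedicated eval workers
        evaluation_duration=10,        # episodes per evaluation round
    )
)
# algo = config.build(); algo.train()
```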
For algorithms that learn from offline data but still want online evaluation, configure a real environment and periodic evaluation rollouts in the algorithm config.
Best practices
Next steps
Algorithms
See online algorithms too.
Replay buffers
Online off-policy training.