Ray Data offers three levels of random shuffling, each trading randomness against cost. The groupby and sort operations also shuffle data, but by key rather than at random.
randomize_block_order
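The following is a minimal pure-Python model of block-order randomization. It assumes a dataset is simply a list of blocks and each block is a list of rows. `shuffle_block_order` is a hypothetical helper, not Ray's implementation. In Ray Data the corresponding call is `Dataset.randomize_block_order()`.

```python
import random
from typing import List, Optional

Block = List[int]  # hypothetical stand-in for a Ray Data block


def shuffle_block_order(blocks: List[Block], seed: Optional[int] = None) -> List[Block]:
    """Hypothetical model: permute whole blocks; rows inside a block are never touched."""
    rng = random.Random(seed)
    order = list(range(len(blocks)))
    rng.shuffle(order)  # only the emission order of the blocks changes
    return [blocks[i] for i in order]


blocks = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
shuffled = shuffle_block_order(blocks, seed=0)
for block in shuffled:
    print(block)  # each block still lists its rows in the original order
```

Because rows never leave their block, rows 0, 1 and 2 always stay adjacent. For that reason this option only adds meaningful randomness when the data was shuffled before it was written.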
Cheap, coarse-grained shuffle: it only reshuffles the order in which the executor emits blocks, while the rows inside each block keep their original order. Useful when the dataset is already shuffled at write time and you want some run-to-run variation.
Local shuffle buffer
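The sketch below models a local shuffle buffer as a classic streaming shuffle buffer, assuming rows arrive one at a time. `iter_batches_with_local_shuffle` and its parameters are hypothetical. They are chosen to mirror the `local_shuffle_buffer_size` and `local_shuffle_seed` arguments of Ray Data's `Dataset.iter_batches()`, and Ray's own buffering logic may differ in detail.

```python
import random
from typing import Iterable, Iterator, List, Optional


def iter_batches_with_local_shuffle(
    rows: Iterable[int],
    batch_size: int,
    buffer_size: int,
    seed: Optional[int] = None,
) -> Iterator[List[int]]:
    """Hypothetical model of iterating with a bounded in-memory shuffle buffer."""
    rng = random.Random(seed)
    buffer: List[int] = []
    batch: List[int] = []

    def pop_random() -> int:
        # Swap a randomly chosen row to the end, then pop it in O(1).
        i = rng.randrange(len(buffer))
        buffer[i], buffer[-1] = buffer[-1], buffer[i]
        return buffer.pop()

    for row in rows:
        buffer.append(row)
        if len(buffer) >= buffer_size:  # buffer full: emit one random row
            batch.append(pop_random())
            if len(batch) == batch_size:
                yield batch
                batch = []
    while buffer:  # input exhausted: drain the remaining rows in random order
        batch.append(pop_random())
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch, if any
        yield batch


batches = list(iter_batches_with_local_shuffle(range(20), batch_size=4, buffer_size=8, seed=0))
for b in batches:
    print(b)
```

Each output row can only come from rows already read. With `buffer_size=8`, the row at output position k is therefore at most input row k + 7. A larger buffer mixes rows more thoroughly at the cost of holding more of them in memory.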
When iterating, fill an in-memory buffer and yield rows from it in random order. This trades a fixed memory cost for per-batch randomness: any row currently in the buffer can land in any of the batches drawn from it, and a larger buffer mixes rows more thoroughly.
Full distributed shuffle
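The following is a single-process model of a full shuffle, written as the two-stage map/reduce exchange that distributed shuffles generally use. First, every input block scatters its rows to uniformly random output partitions. Then each partition shuffles its rows locally. `full_random_shuffle` is a hypothetical helper, not Ray's implementation of `Dataset.random_shuffle()`. On a cluster, each stage would run as many parallel tasks, and the all-to-all exchange between them is what makes this the expensive option.

```python
import random
from typing import List, Optional


def full_random_shuffle(
    blocks: List[List[int]], num_output_blocks: int, seed: Optional[int] = None
) -> List[List[int]]:
    """Hypothetical single-process model of a map/reduce random shuffle."""
    rng = random.Random(seed)
    # Map stage: each input block sends every row to a uniformly random output partition.
    partitions: List[List[int]] = [[] for _ in range(num_output_blocks)]
    for block in blocks:
        for row in block:
            partitions[rng.randrange(num_output_blocks)].append(row)
    # Reduce stage: each output partition shuffles the rows it received.
    for part in partitions:
        rng.shuffle(part)
    return partitions


blocks = [list(range(i * 5, i * 5 + 5)) for i in range(4)]  # rows 0..19 in 4 blocks
out = full_random_shuffle(blocks, num_output_blocks=4, seed=1)
for block in out:
    print(block)
```

Each row picks its output partition independently and uniformly. Every row therefore has the same chance of landing in any output block, regardless of which input block it started in.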
The most expensive option. It globally shuffles rows across the cluster, so every row has equal probability of ending up in any block.
Hash-based shuffle
groupby and sort shuffle by key rather than at random: they perform a hash- or range-partitioned shuffle, used when grouping rows by key or producing a globally ordered output.
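Both partitioning schemes can be modeled in a few lines, assuming rows are small Python tuples or integers. `hash_groupby_sum` and `range_partition_sort` are hypothetical helpers, not Ray APIs.

- The first routes each row by a hash of its key, then aggregates each partition independently. It uses `zlib.crc32` so that results are stable across processes.
- The second routes values by explicit boundaries, so sorting each partition locally yields a globally ordered result. A real system would typically estimate these boundaries by sampling the data.

```python
import bisect
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def hash_groupby_sum(rows: List[Tuple[str, int]], num_partitions: int) -> List[Dict[str, int]]:
    """Hypothetical model of a hash-partitioned groupby followed by a sum."""
    partitions: List[List[Tuple[str, int]]] = [[] for _ in range(num_partitions)]
    for key, value in rows:
        # Same key -> same hash -> same partition, so no key is split across partitions.
        partitions[zlib.crc32(key.encode()) % num_partitions].append((key, value))
    results = []
    for part in partitions:
        totals: Dict[str, int] = defaultdict(int)
        for key, value in part:
            totals[key] += value
        results.append(dict(totals))
    return results


def range_partition_sort(values: List[int], boundaries: List[int]) -> List[List[int]]:
    """Hypothetical model of a range-partitioned sort with precomputed boundaries."""
    partitions: List[List[int]] = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        # Partition i holds values in [boundaries[i-1], boundaries[i]).
        partitions[bisect.bisect_right(boundaries, v)].append(v)
    # Partitions are already ordered relative to each other, so a local sort suffices.
    return [sorted(p) for p in partitions]


grouped = hash_groupby_sum([("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)], num_partitions=2)
print(grouped)

sorted_parts = range_partition_sort([42, 7, 19, 88, 3, 56, 23, 71], boundaries=[20, 60])
print(sorted_parts)  # [[3, 7, 19], [23, 42, 56], [71, 88]]
```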
Best practices
Next steps
Iterating
Iterator options in training loops.
Performance tips
Profiling shuffle stages.