This quickstart loads a sample dataset, applies a transformation, and consumes batches.
Install
Load a dataset
read_parquet returns a lazy Dataset. The metadata is read immediately; the data is read on demand.
Inspect
Transform
map_batches applies a function to each batch of the dataset and returns a new batch.
For row-wise transformations, use map. For filters, use filter.
Consume
The simplest way to consume a dataset is to iterate over batches. Pass batch_format="pandas" for pandas DataFrames or batch_format="pyarrow" for Arrow tables, or call iter_torch_batches() for PyTorch tensors.
Save
Distribute across a cluster
Ray Data automatically uses every node in your Ray cluster. To run on a multi-node cluster, connect to it:
Pipe into Ray Train
ds.
Next steps
Loading data
Sources and formats Ray Data supports.
Batch inference
Run a model over a dataset.