
This page collects the most common levers for improving Ray Data pipeline performance.

Profile first

ds.stats() shows per-stage time, throughput, and memory. Look for:
  • A stage that dominates wall-clock time
  • A stage with low CPU/GPU utilization
  • High spilling activity (object store pressure)
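
A minimal sketch of this pattern (the read path and preprocess function are placeholders):
import ray

def preprocess(batch):
    # placeholder transform; replace with real logic
    return batch

ds = ray.data.read_parquet("s3://bucket/data")  # placeholder path
ds = ds.map_batches(preprocess)
ds = ds.materialize()   # run the pipeline so stats are populated
print(ds.stats())       # per-stage time, throughput, and memory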

Tune batch size

Larger batches generally improve GPU utilization. Start with 32–256 for inference and increase the batch size until GPU memory becomes the limiting factor.
ds.map_batches(predict, batch_size=128)

Tune block size

Larger blocks reduce scheduling overhead but increase memory per task. The default of 128 MiB is a good starting point.
ctx = ray.data.DataContext.get_current()
ctx.target_max_block_size = 256 * 1024 * 1024

Right-size concurrency

Set concurrency to match available resources. For GPU stages, a good starting point is num_gpus_in_cluster / num_gpus_per_replica.
ds.map_batches(predict, num_gpus=1, concurrency=8)

Avoid Python row UDFs

map (row-level) is significantly slower than map_batches (batch-level) because the Python function is called once per row. Vectorize with NumPy or pandas where possible.
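
For example, a vectorized batch UDF can replace a per-row one (the "value" column is illustrative):
# Slow: the Python function is called once per row
ds.map(lambda row: {"value": row["value"] * 2})

# Faster: one call per batch, vectorized with NumPy
def double(batch):
    # by default, batch is a dict of NumPy arrays
    batch["value"] = batch["value"] * 2
    return batch

ds.map_batches(double)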

Use the right batch format

  • numpy: Tensor work; lowest overhead.
  • pandas: Tabular operations.
  • pyarrow: Zero-copy with Parquet.
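
The format is chosen per call with the batch_format argument, for example:
ds.map_batches(lambda df: df, batch_format="pandas")     # batches arrive as pandas DataFrames
ds.map_batches(lambda tbl: tbl, batch_format="pyarrow")  # batches arrive as PyArrow Tables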

Pre-shuffle data at write time

Global shuffles (random_shuffle) are expensive. Shuffle once when writing data and use randomize_block_order + a local shuffle buffer at read time.
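
A sketch of the read-time side of this pattern (the path and buffer size are illustrative):
ds = ray.data.read_parquet("s3://bucket/pre_shuffled")  # data shuffled once at write time
ds = ds.randomize_block_order()                         # cheap: reorders blocks, not rows
for batch in ds.iter_torch_batches(
    batch_size=128,
    local_shuffle_buffer_size=10_000,  # shuffles rows within a local buffer
):
    ...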

Pin GPU stages

Ensure GPU stages don’t run on CPU-only nodes:
ds.map_batches(predict, num_gpus=1, accelerator_type="A100")

Materialize when reusing

If you iterate over the same dataset multiple times, materialize it after the expensive transforms:
ds = ds.map_batches(expensive_transform).materialize()
for epoch in range(10):
    for batch in ds.iter_torch_batches():
        ...

Adjust prefetch

Higher prefetch overlaps I/O with compute at the cost of extra memory.
ds.iter_torch_batches(prefetch_batches=4)

Size the object store

For very large datasets, increase the object store size:
ray.init(object_store_memory=64 * 1024**3)  # 64 GiB

Diagnose spilling

Excessive spilling shows up in ds.stats() and in dashboard metrics. Reduce in-flight data by:
  • Lowering concurrency on slow stages
  • Lowering target_max_block_size
  • Adding materialize() after a memory-heavy stage so subsequent stages stream from disk
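
Combining the first two remedies, a hedged sketch (the values are illustrative):
ctx = ray.data.DataContext.get_current()
ctx.target_max_block_size = 64 * 1024 * 1024        # smaller blocks, less memory per task
ds.map_batches(predict, num_gpus=1, concurrency=4)  # fewer concurrent replicas on the slow stage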

Next steps

  • Internals: how execution and scheduling work.
  • Observability: cluster-level metrics for Ray Data.