Use these methods to understand the shape and contents of a dataset before you process it.

Schema

ds.schema()
Returns the column names and Arrow types.

Count

ds.count()
count() triggers full execution of any pending transforms. It streams blocks through the pipeline, so the whole dataset is never loaded into memory at once.

Sample rows

ds.show(5)
ds.take(10)        # returns a list of dicts
ds.take_batch(10)  # returns a single batch dict
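
The two calls return different shapes: take gives one dict per row, while take_batch gives a single dict keyed by column. The sketch below illustrates that relationship in plain Python. The helper rows_to_batch is hypothetical (not part of Ray Data), and it uses plain lists where a real batch would normally hold array values.

```python
from typing import Any


def rows_to_batch(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row dicts (take-style) into one column dict (take_batch-style).

    Hypothetical helper for illustration only; assumes every row has the
    same keys as the first row.
    """
    if not rows:
        return {}
    columns = list(rows[0].keys())
    return {name: [row[name] for row in rows] for name in columns}


# Stand-in for the result of ds.take(2)
rows = [{"id": 0, "label": "a"}, {"id": 1, "label": "b"}]
batch = rows_to_batch(rows)
print(batch)
```

Row dicts are convenient for eyeballing individual records. The column layout is closer to what batch-oriented code receives.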

Statistics

ds.stats()
Returns a string summarizing per-stage execution time, throughput, and memory usage.

Plan

ds.plan_repr()
Shows the logical plan that will execute when you trigger evaluation.

Iterate without consuming

To peek at the data without running a full pass over the dataset, prefer take, or use iter_batches and break after the first batch:
for batch in ds.iter_batches(batch_size=10):
    print(batch)
    break
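
The loop-and-break pattern can also be wrapped in a small helper. This is a minimal sketch, assuming only that the object exposes iter_batches(batch_size=...) as used above. The name first_batch is hypothetical and not part of Ray Data.

```python
from typing import Any, Optional


def first_batch(ds: Any, batch_size: int = 10) -> Optional[Any]:
    """Return the first batch from ds.iter_batches, or None if there is none.

    Hypothetical helper: `ds` is any object with an
    iter_batches(batch_size=...) method that returns an iterable.
    """
    # next() pulls exactly one batch and then stops iterating.
    return next(iter(ds.iter_batches(batch_size=batch_size)), None)
```

Calling first_batch(ds) on a dataset returns one batch for inspection. Returning None for an empty dataset avoids a StopIteration at the call site.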

Type-check rows

sample = ds.take(1)[0]
print(type(sample), list(sample.keys()))
Use this check when migrating between batch formats, because numpy, pandas, and pyarrow produce different Python types for the same logical data.
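
To compare types before and after a migration, it can help to record a per-column summary instead of reading raw output. The following is a sketch only: describe_row is a hypothetical helper, and the sample dict stands in for what ds.take(1)[0] might return.

```python
from typing import Any, Mapping


def describe_row(row: Mapping[str, Any]) -> dict[str, str]:
    """Map each column name to the type name of its value (hypothetical helper)."""
    return {name: type(value).__name__ for name, value in row.items()}


# Stand-in for: sample = ds.take(1)[0]
sample = {"id": 7, "score": 0.5, "tags": ["a", "b"]}
summary = describe_row(sample)
print(summary)
```

Saving the summary from before the change and diffing it against the one from after makes type drift in a column easy to spot.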

Next steps

Iterating

Consume datasets in training and inference loops.

Performance tips

Diagnose slow pipelines.