Inspecting Data

Use these methods to understand the shape and contents of a dataset before you process it.

Schema

ds.schema()

Returns the column names and Arrow types.

Count

ds.count()

count executes any pending transforms in full, but it streams blocks through the pipeline rather than loading the whole dataset into memory.
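The streaming idea can be sketched in plain Python. This is an illustration of counting block by block, not Ray Data's implementation; count_rows and generate_blocks are hypothetical names, and each block is modeled as a simple list.

```python
def count_rows(blocks):
    """Sum block lengths one block at a time, never collecting all blocks."""
    total = 0
    for block in blocks:
        total += len(block)
    return total


def generate_blocks(num_blocks, rows_per_block):
    # Hypothetical block source: each block is produced lazily, on demand.
    for i in range(num_blocks):
        start = i * rows_per_block
        yield list(range(start, start + rows_per_block))


total_rows = count_rows(generate_blocks(num_blocks=4, rows_per_block=250))
print(total_rows)
```

Because generate_blocks is a generator, only the block currently being counted is referenced at any point, which is the property that keeps memory use flat.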

Sample rows

ds.show(5)
ds.take(10)        # returns a list of dicts
ds.take_batch(10)  # returns a single batch dict
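The two return shapes differ: take gives one dict per row, while take_batch gives one dict for the whole batch, keyed by column. The sketch below mimics that difference with plain Python lists. rows_to_batch is a hypothetical helper, not part of Ray Data, and it assumes a real take_batch result holds NumPy arrays by default rather than lists.

```python
def rows_to_batch(rows):
    """Pivot a list of row dicts into a single column-keyed batch dict."""
    batch = {}
    for row in rows:
        for column, value in row.items():
            batch.setdefault(column, []).append(value)
    return batch


# Row-oriented sample, shaped like the result of ds.take(2).
rows = [{"id": 0, "label": "a"}, {"id": 1, "label": "b"}]

# Column-oriented sample, shaped like the result of ds.take_batch(2),
# with lists standing in for arrays.
batch = rows_to_batch(rows)
print(batch)
```

Row dicts are convenient for eyeballing individual records; the column-keyed shape is closer to what batch-level transforms receive.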

Statistics

ds.stats()

Returns a string summarizing per-stage execution time, throughput, and memory usage. Call it after the dataset has executed, since the statistics describe a completed run.

Plan

ds.plan_repr()

Shows the logical plan that will execute when you trigger evaluation.

Iterate without consuming

To inspect without modifying the dataset, prefer take or iter_batches(prefetch_batches=0) with an early break:

for batch in ds.iter_batches(batch_size=10, prefetch_batches=0):
    print(batch)  # look at the first batch only
    break
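The early break can also be wrapped in a small helper. This is a minimal sketch using itertools.islice on any iterable of batches; peek_batches and fake_batches are hypothetical names, and fake_batches only stands in for a real ds.iter_batches call so the example runs without a cluster.

```python
from itertools import islice


def peek_batches(batches, n=1):
    """Return the first n items from any iterable of batches, then stop."""
    return list(islice(batches, n))


def fake_batches():
    # Stand-in for ds.iter_batches(batch_size=2); yields column-keyed dicts.
    for start in range(0, 6, 2):
        yield {"id": [start, start + 1]}


first_two = peek_batches(fake_batches(), n=2)
print(first_two)
```

With a real dataset, the same helper would be called as peek_batches(ds.iter_batches(batch_size=10), n=2); islice stops pulling from the iterator once n batches have been read.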

Type-check rows

sample = ds.take(1)[0]
print(type(sample), list(sample.keys()))

Use this when migrating between batch formats: numpy, pandas, and pyarrow produce different Python types for the same logical data.
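To compare types column by column, the check above can be extended with a small helper. This is a sketch that assumes rows are plain dicts, as ds.take returns them; describe_row is a hypothetical name and the sample columns are made up for illustration.

```python
def describe_row(row):
    """Map each column name to the Python type name of its value."""
    return {column: type(value).__name__ for column, value in row.items()}


# Stand-in for ds.take(1)[0]; the column names here are illustrative.
sample = {"id": 7, "score": 0.5, "tags": ["a", "b"]}
print(describe_row(sample))
```

Running describe_row on a sample before and after a format change makes any type drift visible as a difference between two small dicts.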

Next steps

Iterating

Consume datasets in training and inference loops.

Performance tips

Diagnose slow pipelines.
