Understanding Ray Data internals helps when tuning pipelines and debugging slow stages.
Logical and physical plans
A Dataset builds a logical plan as you call operators (read_*, map_batches, filter, and so on). When execution starts, Ray Data optimizes the plan (fusing adjacent operators, reordering filters and projections) and produces a physical plan of operators connected by streaming queues.
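The optimization step can be sketched in plain Python. This is an illustrative toy only, not Ray Data code: a logical plan is a list of ops, and the "optimizer" collapses runs of adjacent map ops into one fused op. All names here are invented for illustration.

```python
# Toy sketch (not Ray Data APIs): fuse adjacent "map" ops in a logical
# plan, loosely mirroring what Ray Data's optimizer does before execution.

def fuse_adjacent_maps(plan):
    """Collapse consecutive map ops into single fused ops."""
    fused = []
    for op, fn in plan:
        if op == "map" and fused and fused[-1][0] == "map":
            prev_fn = fused[-1][1]
            # Compose the two functions so one worker applies both.
            fused[-1] = ("map", lambda x, f=prev_fn, g=fn: g(f(x)))
        else:
            fused.append((op, fn))
    return fused

logical = [
    ("read", None),
    ("map", lambda x: x + 1),
    ("map", lambda x: x * 2),
]
physical = fuse_adjacent_maps(logical)
assert len(physical) == 2        # read + one fused map
assert physical[1][1](3) == 8    # (3 + 1) * 2
```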
Streaming executor
The executor pipelines operators across the cluster. Each operator is implemented as a Ray task or actor that consumes blocks from upstream queues and produces blocks for downstream queues. Key properties:
- Backpressure: when a downstream queue is full, upstream operators stall instead of producing more data.
- Spilling: when memory pressure rises, the executor spills cold blocks to local disk.
- Locality-aware scheduling: operators are placed near their inputs when possible.
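Backpressure can be illustrated with a bounded queue. This is a minimal sketch in plain Python, not Ray Data code: the producer blocks as soon as the queue is full, so it can never run further ahead of the consumer than the queue capacity allows.

```python
# Sketch only (not Ray Data internals): backpressure via a bounded queue.
import queue
import threading

q = queue.Queue(maxsize=2)   # small downstream queue capacity
consumed = []

def producer():
    for block in range(6):
        q.put(block)         # stalls here whenever the queue is full
    q.put(None)              # sentinel: no more blocks

def consumer():
    while True:
        block = q.get()
        if block is None:
            break
        consumed.append(block)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
assert consumed == list(range(6))   # all blocks arrive, in order
```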
Operator fusion
Adjacent operators with compatible resource requirements are fused into a single Ray task to reduce scheduling overhead. For example, two consecutive map_batches calls on the same resources execute in one worker per block rather than passing blocks between workers through the object store.
You can disable fusion for debugging:
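A minimal sketch, assuming a Ray 2.x-style `DataContext`. The `optimize_fuse_stages` flag is an assumption here: it existed on older releases, and the exact field name has changed across Ray versions, so verify against the `ray.data.DataContext` fields of your installed release.

```python
# Sketch only: disabling operator fusion for debugging. The flag name
# below is assumed from older Ray releases; check your version's
# ray.data.DataContext for the current field.
import ray

ctx = ray.data.DataContext.get_current()
ctx.optimize_fuse_stages = False  # assumed flag name; version-dependent

ds = ray.data.range(1000).map_batches(lambda b: b).map_batches(lambda b: b)
print(ds.stats())  # inspect whether the two map stages ran fused
```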
Block sizing
The executor targets blocks of target_max_block_size bytes (default 128 MiB). Larger blocks reduce per-block overhead but increase memory pressure.
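The trade-off is easiest to see as arithmetic. This is illustrative only: 128 MiB is the default mentioned above, while the 10 GiB dataset size is an invented example value.

```python
# Illustrative arithmetic: how a target block size translates into a
# block count (and thus scheduling overhead vs. per-block memory).
import math

target_max_block_size = 128 * 1024 * 1024   # 128 MiB, the stated default
dataset_bytes = 10 * 1024 ** 3              # hypothetical 10 GiB dataset

num_blocks = math.ceil(dataset_bytes / target_max_block_size)
assert num_blocks == 80   # 10 GiB / 128 MiB = 80 blocks to schedule
```

Halving the target block size would double the block count (more scheduling work), while doubling it would halve the count at the cost of larger in-flight blocks.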
Memory management
Ray Data tracks the memory used by in-flight blocks and pauses upstream operators when the budget is exceeded. The budget is split across:
- The Ray object store (configured at ray.init)
- Operator output buffers
- Per-task memory requests
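The pause/resume behavior can be sketched as an admission check against a byte budget. This is a toy illustration in plain Python, not Ray Data's actual accounting; the class and budget values are invented.

```python
# Toy sketch (not Ray Data code): admit blocks only while in-flight
# bytes stay under budget; otherwise the upstream producer must pause.
class MemoryBudget:
    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.in_flight = 0

    def try_admit(self, block_bytes):
        """Admit a block if it fits; False signals upstream to pause."""
        if self.in_flight + block_bytes > self.budget:
            return False
        self.in_flight += block_bytes
        return True

    def release(self, block_bytes):
        """Called when a downstream operator consumes the block."""
        self.in_flight -= block_bytes

budget = MemoryBudget(256)
assert budget.try_admit(128)
assert budget.try_admit(128)
assert not budget.try_admit(1)   # over budget: producer pauses
budget.release(128)
assert budget.try_admit(1)       # resumes once memory is freed
```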
Fault tolerance
Failed read or map tasks are retried automatically. Lost intermediate blocks are reconstructed via lineage. For long pipelines, materialize intermediate datasets to avoid expensive reconstruction: materialize forces full execution and pins the result in the object store.
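The automatic-retry behavior can be sketched in plain Python. This is an illustration only, not Ray internals: the retry count and the flaky task are invented example values.

```python
# Sketch only (not Ray internals): retry a transiently failing task,
# the behavior Ray applies automatically to failed read/map tasks.
def run_with_retries(task, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == max_retries:
                raise   # out of retries: surface the failure

failures = {"left": 2}   # example: task fails twice, then succeeds

def flaky_read():
    if failures["left"] > 0:
        failures["left"] -= 1
        raise RuntimeError("transient read failure")
    return "block"

assert run_with_retries(flaky_read) == "block"
```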
Stats and tracing
ds.stats() returns per-stage timing, throughput, and memory.
ds.iter_internal_ref_bundles() exposes raw block refs for advanced use.
Configuration
Next steps
- Performance tips: Diagnose and fix slow pipelines.
- Key concepts: Operator and execution overview.