

Ray Data supports tensor columns natively: an Arrow column whose elements are NumPy ndarrays of the same shape.

Read tensor data

read_images returns tensor columns by default:
import ray

ds = ray.data.read_images("s3://bucket/images/", size=(224, 224))
ds.schema()
# Column           Type
# ------           ----
# image            ArrowTensorType(shape=(224, 224, 3), dtype=uint8)
# path             string
read_numpy does the same for raw .npy files.
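The `.npy` format itself can be exercised with plain NumPy, no Ray required. The sketch below (with illustrative dummy data and a temporary path) writes and re-reads the kind of array file that read_numpy consumes:

```python
import os
import tempfile

import numpy as np

# Write a stack of 8 RGB images as a single .npy file (dummy data,
# not a real dataset), then read it back. This is the on-disk format
# that read_numpy parses into a tensor column.
arr = np.zeros((8, 224, 224, 3), dtype=np.uint8)
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "images.npy")
    np.save(path, arr)
    loaded = np.load(path)

assert loaded.shape == (8, 224, 224, 3)
assert loaded.dtype == np.uint8
```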

Build tensor columns from arrays

import numpy as np

ds = ray.data.from_items([
    {"id": i, "embedding": np.random.rand(768)} for i in range(1000)
])
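The key property is that every row's array has the same shape. A quick Ray-free check of that invariant: rows whose arrays all share one shape can be viewed as a single block, which is exactly how a tensor column stores them.

```python
import numpy as np

# Same row structure as above; every "embedding" has shape (768,).
rows = [{"id": i, "embedding": np.random.rand(768)} for i in range(1000)]

# Columnar equivalent: one contiguous (1000, 768) block.
stacked = np.stack([r["embedding"] for r in rows])
assert stacked.shape == (1000, 768)
```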

Transform tensor columns

def normalize(batch):
    # Scale uint8 pixel values into [0, 1] as float32.
    arr = batch["image"]
    batch["image"] = arr.astype("float32") / 255.0
    return batch

ds = ds.map_batches(normalize, batch_format="numpy")
The numpy batch format gives you batches as native ndarrays, the most ergonomic representation for tensor work.
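Because a numpy-format batch is just a dict of ndarrays, the transform can be sanity-checked outside Ray on a hand-built batch (the shapes here are illustrative):

```python
import numpy as np

def normalize(batch):
    # Scale uint8 pixel values into [0, 1] as float32.
    arr = batch["image"]
    batch["image"] = arr.astype("float32") / 255.0
    return batch

# A fake numpy-format batch, shaped like what map_batches passes
# with batch_format="numpy".
batch = {"image": np.full((4, 224, 224, 3), 255, dtype=np.uint8)}
out = normalize(batch)

assert out["image"].dtype == np.float32
assert out["image"].max() == 1.0
```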

Use in training

import torch

for batch in ds.iter_torch_batches(batch_size=32, dtypes={"image": torch.float32}):
    out = model(batch["image"].cuda())
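The batching behavior can be sketched without torch or Ray: slicing a stacked array into fixed-size chunks, with a smaller final chunk, approximates what each iteration yields (the array size here is illustrative).

```python
import numpy as np

# Sketch of batch iteration: chunk a stacked image array into
# batch_size slices, as iter_torch_batches does per step (NumPy
# arrays stand in for torch tensors).
images = np.zeros((100, 224, 224, 3), dtype=np.float32)
batch_size = 32
batches = [images[i:i + batch_size] for i in range(0, len(images), batch_size)]

assert len(batches) == 4  # 32 + 32 + 32 + 4
assert batches[0].shape == (32, 224, 224, 3)
assert batches[-1].shape == (4, 224, 224, 3)  # final partial batch
```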

Variable-shape tensors

For tensors with varying shapes (e.g., variable-length sequences), store one ndarray per row and let the shapes differ across rows:
ds = ray.data.from_items([
    {"id": i, "tokens": np.random.randint(0, 50000, size=(np.random.randint(10, 100),))}
    for i in range(100)
])
Note that some operators (like sort or groupby) don’t support variable-shape tensor columns.
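The limitation follows from the shapes: ragged rows cannot be packed into a single fixed-shape block, which is what shape-dependent operators rely on. A small deterministic check (illustrative lengths):

```python
import numpy as np

# One ndarray per row, with lengths that differ across rows.
rows = [{"id": i, "tokens": np.arange(10 + i % 5)} for i in range(100)]

# No single (N, ...) block exists for ragged rows: stacking fails.
try:
    np.stack([r["tokens"] for r in rows])
    stack_ok = True
except ValueError:
    stack_ok = False

assert not stack_ok
```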

Save tensor columns

ds.write_parquet("s3://bucket/embeddings/")
Tensor columns are serialized as nested Arrow arrays.

Next steps

Batch inference

Run image and text models over tensor columns.

Working with LLMs

Score prompts at scale.