Documentation Index

Fetch the complete documentation index at: https://ray-preview.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Teams use Ray to scale a wide range of AI and Python workloads. The patterns below show where each Ray library fits.

Batch inference

Run model predictions over millions of inputs across heterogeneous CPU and GPU resources.

Build with Ray Data

Stream data through a model in parallel across the cluster, with automatic batching, GPU placement, and shuffling.

Many-model batch inference

Apply many models to many partitions of data — for example, fitting a per-customer model or running ensembles.

Many models with Ray

Use Ray Data and actors to fan out work and aggregate results.

Model serving

Deploy models behind low-latency HTTP or gRPC endpoints with replication, autoscaling, and traffic splitting.

Build with Ray Serve

Compose multi-model pipelines, scale on demand, and ship to production with the same Python code.

Distributed training

Train large neural networks across many GPUs and many nodes without writing your own orchestration.

Build with Ray Train

PyTorch, PyTorch Lightning, Hugging Face Transformers, JAX, and TensorFlow integrations with checkpointing and fault tolerance.

Hyperparameter tuning

Search over thousands of trials with state-of-the-art algorithms and early stopping.

Build with Ray Tune

ASHA, PBT, BOHB, Optuna, Ax, BayesOpt — orchestrated and distributed across the cluster.

Reinforcement learning

Train RL agents at scale with parallel environments and distributed sample collection.

Build with RLlib

Algorithms (PPO, IMPALA, SAC, DQN, MARWIL), multi-agent support, and offline RL.

ML platform

Compose Ray libraries to build a unified, multi-tenant ML platform that handles data, training, tuning, and serving with a single runtime.

Run on Ray Clusters

Deploy on Kubernetes with KubeRay, on cloud VMs, or on-premises hardware.

End-to-end ML pipelines

Combine Ray Data, Ray Train, Ray Tune, and Ray Serve to build pipelines that run unchanged from a laptop to a cluster.

LLM training and serving

Train, fine-tune, and serve large language models with Ray.

Serve LLMs with Ray Serve

Deploy LLMs with vLLM, TensorRT-LLM, or custom backends.

LLM batch inference with Ray Data

Run batched inference over large prompt datasets.