Teams use Ray to scale a wide range of AI and Python workloads. The patterns below show where each Ray library fits.

Documentation Index
Fetch the complete documentation index at: https://ray-preview.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Batch inference
Run model predictions over millions of inputs across heterogeneous CPU and GPU resources.

Build with Ray Data
Stream data through a model in parallel across the cluster, with automatic batching, GPU placement, and shuffling.
Many-model batch inference
Apply many models to many partitions of data — for example, fitting a per-customer model or running ensembles.

Many models with Ray
Use Ray Data and actors to fan out work and aggregate results.
Model serving
Deploy models behind low-latency HTTP or gRPC endpoints with replication, autoscaling, and traffic splitting.

Build with Ray Serve
Compose multi-model pipelines, scale on demand, and ship to production with the same Python code.
Distributed training
Train large neural networks across many GPUs and many nodes without writing your own orchestration.

Build with Ray Train
PyTorch, PyTorch Lightning, Hugging Face Transformers, JAX, and TensorFlow integrations with checkpointing and fault tolerance.
Hyperparameter tuning
Search over thousands of trials with state-of-the-art algorithms and early stopping.

Build with Ray Tune
ASHA, PBT, BOHB, Optuna, Ax, BayesOpt — orchestrated and distributed across the cluster.
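A minimal Tune sketch with a toy objective (the quadratic and the parameter name `x` are illustrative). Each value in the grid becomes a trial, run in parallel across the cluster; swapping in a scheduler like ASHA or a search algorithm like Optuna is a change to `tune_config`, not to the objective.

```python
from ray import tune

def objective(config):
    # Toy objective: minimize (x - 3)^2. Returning a dict reports
    # the trial's final metrics.
    return {"score": (config["x"] - 3) ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"x": tune.grid_search([0, 1, 2, 3, 4])},
    tune_config=tune.TuneConfig(metric="score", mode="min"),
)
results = tuner.fit()
print(results.get_best_result().config)  # best hyperparameters found
</test>```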
Reinforcement learning
Train RL agents at scale with parallel environments and distributed sample collection.

Build with RLlib
Algorithms (PPO, IMPALA, SAC, DQN, MARWIL), multi-agent support, and offline RL.
ML platform
Compose Ray libraries to build a unified, multi-tenant ML platform that handles data, training, tuning, and serving with a single runtime.

Run on Ray Clusters
Deploy on Kubernetes with KubeRay, on cloud VMs, or on-premises hardware.
End-to-end ML pipelines
Combine Ray Data, Ray Train, Ray Tune, and Ray Serve to build pipelines that run unchanged from a laptop to a cluster.

LLM training and serving
Train, fine-tune, and serve large language models with Ray.

Serve LLMs with Ray Serve
Deploy LLMs with vLLM, TensorRT-LLM, or custom backends.
LLM batch inference with Ray Data
Run batched inference over large prompt datasets.