Documentation Index

Fetch the complete documentation index at: https://ray-preview.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Teams use Ray to scale a wide range of AI and Python workloads. The patterns below show where each Ray library fits.

Batch inference

Run model predictions over millions of inputs across heterogeneous CPU and GPU resources.

Build with Ray Data

Stream data through a model in parallel across the cluster, with automatic batching, GPU placement, and shuffling.

Many-model batch inference

Apply many models to many partitions of data — for example, fitting a per-customer model or running ensembles.

Many models with Ray

Use Ray Data and actors to fan out work and aggregate results.

Model serving

Deploy models behind low-latency HTTP or gRPC endpoints with replication, autoscaling, and traffic splitting.

Build with Ray Serve

Compose multi-model pipelines, scale on demand, and ship to production with the same Python code.

Distributed training

Train large neural networks across many GPUs and many nodes without writing your own orchestration.

Build with Ray Train

PyTorch, PyTorch Lightning, Hugging Face Transformers, JAX, and TensorFlow integrations with checkpointing and fault tolerance.

Hyperparameter tuning

Search over thousands of trials with state-of-the-art algorithms and early stopping.

Build with Ray Tune

ASHA, PBT, BOHB, Optuna, Ax, BayesOpt — orchestrated and distributed across the cluster.

Reinforcement learning

Train RL agents at scale with parallel environments and distributed sample collection.

Build with RLlib

Algorithms (PPO, IMPALA, SAC, DQN, MARWIL), multi-agent support, and offline RL.

ML platform

Compose Ray libraries to build a unified, multi-tenant ML platform that handles data, training, tuning, and serving with a single runtime.

Run on Ray Clusters

Deploy on Kubernetes with KubeRay, on cloud VMs, or on-premises hardware.

End-to-end ML pipelines

Combine Ray Data, Ray Train, Ray Tune, and Ray Serve to build pipelines that run unchanged from a laptop to a cluster.

LLM training and serving

Train, fine-tune, and serve large language models with Ray.

Serve LLMs with Ray Serve

Deploy LLMs with vLLM, TensorRT-LLM, or custom backends.

LLM batch inference with Ray Data

Run batched inference over large prompt datasets.