Documentation Index

Fetch the complete documentation index at: https://ray-preview.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Ray’s LLM stack combines Ray Data (for batch inference), Ray Train (for fine-tuning), and Ray Serve (for production serving) into a single workflow. The ray.serve.llm and ray.data.llm modules wrap engines like vLLM in Ray-native APIs.
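As a concrete sketch of what a Ray-native wrapper around a serving engine looks like, the following Ray Serve config deploys an OpenAI-compatible LLM app. This is an illustrative fragment, not a canonical config: the `build_openai_app` import path and `LLMConfig` fields follow recent Ray releases, and the model id, model source, and replica counts are placeholder values that vary by deployment.

```yaml
# Sketch of a Ray Serve config for an LLM deployment (field names per recent
# Ray releases; model id/source and replica counts are placeholders).
applications:
- name: llm_app
  route_prefix: "/"
  import_path: ray.serve.llm:build_openai_app
  args:
    llm_configs:
      - model_loading_config:
          model_id: my-model                       # placeholder id clients will reference
          model_source: Qwen/Qwen2.5-0.5B-Instruct # placeholder Hugging Face model
        deployment_config:
          autoscaling_config:
            min_replicas: 1
            max_replicas: 2
```

A config like this would typically be launched with `serve deploy`; the equivalent can also be built in Python via `ray.serve.llm.LLMConfig` and `build_openai_app`.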

Quickstart

Run your first LLM batch inference job.

Serving

Deploy LLMs behind an OpenAI-compatible API.
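Once a deployment is live, clients talk to it like any OpenAI-compatible endpoint. The sketch below builds such a chat-completion request using only the standard library; the endpoint URL and model id are placeholders for whatever your deployment exposes, and the final send is left commented out since it requires a running server.

```python
import json
from urllib import request

# Placeholder endpoint: an OpenAI-compatible deployment typically exposes
# a /v1/chat/completions route on the Serve HTTP port.
url = "http://localhost:8000/v1/chat/completions"

# Standard OpenAI-style chat payload; "my-model" is a placeholder for the
# model id registered with the deployment.
payload = {
    "model": "my-model",
    "messages": [
        {"role": "user", "content": "Say hello."},
    ],
}

body = json.dumps(payload).encode()
req = request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},
)
# request.urlopen(req) would send the request once the deployment is running.
```

The same payload works with the official `openai` client by pointing its `base_url` at the deployment.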

Batch inference

Score prompts at scale with Ray Data.

Configuration

Tune tensor parallelism, quantization, and throughput.