Ray’s LLM stack combines Ray Data (for batch inference), Ray Train (for fine-tuning), and Ray Serve (for production serving) into a single workflow. The ray.serve.llm and ray.data.llm modules wrap engines like vLLM in Ray-native APIs.

Documentation Index
Fetch the complete documentation index at: https://ray-preview.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Quickstart
Run your first LLM batch inference job.
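A minimal sketch of that flow, assuming a recent Ray (2.44 or later) with vLLM installed; the model name is illustrative, and config field names (e.g. model_source) have shifted slightly between Ray versions.

```python
# Minimal batch inference sketch with ray.data.llm.
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-0.5B-Instruct",  # any HF model vLLM supports
    engine_kwargs={"max_model_len": 4096},
    concurrency=1,   # number of vLLM engine replicas
    batch_size=64,   # rows per inference batch
)

processor = build_llm_processor(
    config,
    # Map each input row to a chat request for the engine.
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.0, max_tokens=128),
    ),
    # Keep only the generated text in the output rows.
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is Ray?"}])
print(processor(ds).take_all())
```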
Serving
Deploy LLMs behind an OpenAI-compatible API.
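A sketch of the ray.serve.llm path, again with an illustrative model; LLMConfig and build_openai_app are the entry points in recent Ray releases.

```python
# Sketch of an OpenAI-compatible deployment via ray.serve.llm.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",                       # name clients will request
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # HF checkpoint to load
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    engine_kwargs=dict(max_model_len=4096),
)

# Exposes /v1/chat/completions and related routes on the Serve HTTP proxy.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
```

Once deployed, any OpenAI client can point at the Serve HTTP endpoint (http://localhost:8000/v1 by default) and pass model="qwen-0.5b".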
Batch inference
Score prompts at scale with Ray Data.
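Building on the same processor API as the quickstart, a sketch of a scoring job scaled across replicas; the S3 paths, prompt template, and parallelism settings here are hypothetical.

```python
# Scoring sketch: same ray.data.llm processor, scaled out.
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-0.5B-Instruct",
    concurrency=4,    # four engine replicas running in parallel
    batch_size=128,
)

scorer = build_llm_processor(
    config,
    # Hypothetical judge-style prompt over an existing "text" column.
    preprocess=lambda row: dict(
        messages=[{
            "role": "user",
            "content": f"Rate 1-5 how helpful this answer is:\n{row['text']}",
        }],
        sampling_params=dict(temperature=0.0, max_tokens=4),
    ),
    postprocess=lambda row: dict(score=row["generated_text"].strip()),
)

ds = ray.data.read_parquet("s3://my-bucket/answers/")  # hypothetical path
scorer(ds).write_parquet("s3://my-bucket/scores/")
```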
Configuration
Tensor parallelism, quantization, throughput tuning.
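These knobs live mostly in engine_kwargs, which ray.data.llm and ray.serve.llm pass through to vLLM. The values below are illustrative, and the valid keys depend on your vLLM version.

```python
# Illustrative throughput-tuning config; engine_kwargs go straight to vLLM.
from ray.data.llm import vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-32B-Instruct-AWQ",  # example quantized checkpoint
    engine_kwargs=dict(
        tensor_parallel_size=4,        # shard the model across 4 GPUs
        quantization="awq",            # must match the checkpoint's quantization
        max_num_batched_tokens=8192,   # larger token batches raise throughput
        gpu_memory_utilization=0.90,   # leave headroom for activations
    ),
    # Ray schedules one engine replica per tensor-parallel group.
    concurrency=2,
    batch_size=128,
)
```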