Deployment
A deployment is a Python class (or function) that handles inference requests. The class is instantiated as one or more replicas (independent worker processes) that share traffic.
Replica
A replica is one running instance of a deployment. Each replica is a Ray actor. Ray Serve manages replicas: scaling, health-checking, restarting, and routing requests to them.
Application
An application is the unit you deploy with serve.run or serve deploy. It’s a graph of deployments, with one designated ingress deployment that handles incoming HTTP/gRPC requests.
route_prefix.
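To make the role of a route prefix concrete, here is a minimal plain-Python sketch of how a request path could be mapped to an application. It is an illustration only, not Ray Serve's routing code: `match_application` and the application names are hypothetical, and the sketch assumes that the longest matching prefix wins.

```python
from typing import Optional


def match_application(path: str, route_prefixes: dict[str, str]) -> Optional[str]:
    """Return the app whose route prefix is the longest match for `path`.

    `route_prefixes` maps a prefix such as "/api" to a (hypothetical) app name.
    Returns None when no prefix matches.
    """
    best_app = None
    best_len = -1
    for prefix, app_name in route_prefixes.items():
        # "/" becomes "", which matches every absolute path.
        base = prefix.rstrip("/")
        # Match the prefix itself or anything below it, but not "/apix" for "/api".
        if path == base or path.startswith(base + "/"):
            if len(base) > best_len:
                best_app, best_len = app_name, len(base)
    return best_app


routes = {"/": "default_app", "/api": "api_app", "/api/v2": "api_v2_app"}
print(match_application("/api/v2/items", routes))  # api_v2_app
print(match_application("/api/users", routes))     # api_app
print(match_application("/apix", routes))          # default_app
```

Matching on whole path segments keeps `/apix` from being captured by the `/api` application, so it falls through to the catch-all `/` prefix.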
DeploymentHandle
A handle lets one deployment call another in-process (without going through HTTP).
Controller
A single Serve controller actor manages all applications: launching replicas, applying configuration, and monitoring health. The controller is created once when Serve starts and persists for the cluster’s lifetime.
Proxy
The HTTP/gRPC proxy runs on every node and routes incoming traffic to the right deployment replicas based on the route prefix and method.
Ingress deployment
The deployment at the root of an application is the ingress: it receives external requests and (optionally) calls into other deployments via handles.
Configuration
Configuration lives in two places:
- In-code (@serve.deployment(num_replicas=4, ...)): bundled with your Python code.
- YAML (serve config): override at deploy time without changing code.
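The relationship between the two sources can be sketched in plain Python as a dictionary merge. This is a simplified model rather than Ray Serve's own logic: `effective_options` is a hypothetical helper, and it assumes that a value set in the YAML config takes precedence over the same option set in the decorator, while options the config omits keep their in-code values.

```python
def effective_options(in_code: dict, config_file: dict) -> dict:
    """Merge deployment options, letting config-file values win on conflicts."""
    merged = dict(in_code)      # start from the values set in the decorator
    merged.update(config_file)  # apply deploy-time overrides on top
    return merged


# Options as they might appear in @serve.deployment(...)
in_code = {"num_replicas": 4, "health_check_period_s": 10}
# Options as they might appear for the same deployment in the YAML config
config_file = {"num_replicas": 8}

print(effective_options(in_code, config_file))
# {'num_replicas': 8, 'health_check_period_s': 10}
```

Copying `in_code` before updating leaves the original defaults untouched, which mirrors the idea that the YAML changes what is deployed without changing the code.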
Next steps
Develop and deploy
Promote local Serve apps to production.
Model composition
Build multi-deployment pipelines.