The Ray Serve workflow has two phases: iterate locally with serve.run, then promote to a long-lived cluster with serve deploy (CLI) or a Kubernetes RayService.
Local iteration
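Run your application locally with serve.run. To pick up a code change, re-run serve.run: the application is redeployed, and existing replicas are torn down and replaced.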
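Build a config file
Generate a config file from your application's import path with serve build, and write it to a file such as config.yaml with the -o flag. This file is what you deploy to the cluster.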
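A Serve config lists your applications, each with a name, a route prefix, an import path, and optional per-deployment overrides. The sketch below builds a minimal config of that shape in Python, which can help if you template configs in scripts. The helper names `build_serve_config` and `to_config_text` are hypothetical, and only a small subset of the config fields is shown.

```python
import json


def build_serve_config(
    app_name: str,
    import_path: str,
    deployment_name: str,
    num_replicas: int = 1,
    route_prefix: str = "/",
) -> dict:
    """Return a minimal Serve config with one application and one deployment override.

    Hypothetical helper: real configs produced by `serve build` carry more fields.
    """
    return {
        "applications": [
            {
                "name": app_name,
                "route_prefix": route_prefix,
                # "module:variable" path to the bound application object
                "import_path": import_path,
                "deployments": [
                    # Overrides options of the deployment with this name in the app's code
                    {"name": deployment_name, "num_replicas": num_replicas},
                ],
            }
        ]
    }


def to_config_text(config: dict) -> str:
    """Serialize the config as indented JSON; JSON is a subset of YAML 1.2."""
    return json.dumps(config, indent=2)
```

For example, `to_config_text(build_serve_config("hello", "hello_app:app", "Hello", 2))` produces text you could save as config.yaml, assuming a hypothetical module `hello_app` that exposes a bound application `app` containing a deployment named `Hello`. JSON output keeps the sketch dependency-free; in practice you would usually start from the file `serve build` writes.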
Deploy to a remote cluster
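serve deploy sends the config to the cluster through the Ray dashboard (http://localhost:8265 by default; pass --address to target a remote cluster), and the Serve controller applies it. The cluster pulls the latest code (via your Docker image, working directory, or git revision) and starts replicas.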
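The CLI handles this request for you. As a sketch of roughly what travels over the wire, assuming Serve's REST endpoint `PUT /api/serve/applications/` on the dashboard address, the snippet below builds the same kind of request with the standard library. The function names are hypothetical; prefer serve deploy in practice.

```python
import json
import urllib.request

# Default Ray dashboard address; serve deploy targets this unless --address is given
DEFAULT_DASHBOARD = "http://localhost:8265"


def build_deploy_request(
    config: dict, dashboard_address: str = DEFAULT_DASHBOARD
) -> urllib.request.Request:
    """Build a PUT request that submits a Serve config to the dashboard REST API."""
    url = dashboard_address.rstrip("/") + "/api/serve/applications/"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def deploy(config: dict, dashboard_address: str = DEFAULT_DASHBOARD, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code (raises on connection errors)."""
    request = build_deploy_request(config, dashboard_address)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload before anything touches the cluster.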
Update an application
Edit config.yaml and re-run serve deploy. Ray Serve performs a rolling update: it brings up new replicas, drains the old ones, and applies the change without dropping requests.
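If you generate configs programmatically, an update is a modified copy of the previous config passed to serve deploy again. The hypothetical helper below changes one deployment's replica count without mutating the original, so you can keep both versions for review or rollback. It assumes the config shape used by Serve config files (an `applications` list whose entries have `name` and `deployments`).

```python
import copy


def with_replicas(config: dict, app_name: str, deployment_name: str, num_replicas: int) -> dict:
    """Return a copy of `config` with num_replicas set for one deployment.

    Adds a deployment override entry if the deployment is not listed yet.
    Raises KeyError if the application is missing.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    for app in updated.get("applications", []):
        if app.get("name") != app_name:
            continue
        deployments = app.setdefault("deployments", [])
        for deployment in deployments:
            if deployment.get("name") == deployment_name:
                deployment["num_replicas"] = num_replicas
                return updated
        # No override for this deployment yet: append one
        deployments.append({"name": deployment_name, "num_replicas": num_replicas})
        return updated
    raise KeyError(f"application {app_name!r} not found in config")
```

Writing the returned dict back to config.yaml and re-running serve deploy then triggers the update described above.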
Serve on Kubernetes
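On Kubernetes, the Serve config lives inside a RayService resource, and the KubeRay operator deploys it to the Ray cluster it manages. The sketch below assembles such a manifest as a Python dict, assuming the `ray.io/v1` API version used by recent KubeRay releases and its `serveConfigV2` field, which holds the Serve config as a YAML string. The `rayClusterConfig` contents (head and worker group specs) are passed in rather than invented here, and the function name is hypothetical.

```python
import json


def rayservice_manifest(name: str, serve_config: dict, ray_cluster_config: dict) -> dict:
    """Return a RayService manifest dict wrapping a Serve config (hypothetical helper)."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayService",
        "metadata": {"name": name},
        "spec": {
            # serveConfigV2 is a string field; JSON text is also valid YAML
            "serveConfigV2": json.dumps(serve_config, indent=2),
            # Head/worker group specs for the managed Ray cluster, supplied by the caller
            "rayClusterConfig": ray_cluster_config,
        },
    }
```

Serializing the result with `json.dumps` gives a file that `kubectl apply -f` accepts, since kubectl reads JSON as well as YAML manifests.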
Use the RayService CRD to manage Serve applications declaratively.
Health checks
Serve exposes health endpoints at /-/healthz and /-/routes on its HTTP proxy (port 8000 by default). Use them in Kubernetes liveness and readiness probes.
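Kubernetes `httpGet` probes can point at `/-/healthz` directly. For exec-style probes or external monitoring, a check like the sketch below may help: it treats an HTTP 200 from `/-/healthz` as healthy and any connection error, timeout, or error status as unhealthy. The function name and defaults are assumptions, not part of Serve.

```python
import http.client
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if the Serve proxy at base_url answers /-/healthz with HTTP 200."""
    url = base_url.rstrip("/") + "/-/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers URLError/HTTPError (non-2xx), refused connections, and timeouts;
        # HTTPException covers malformed responses
        return False
```

Wrapped in a script that exits with `0 if is_healthy() else 1`, this works as an exec probe command.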
Inspect a running deployment
serve status shows per-deployment replica counts and health. serve config prints the active configuration.
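serve status prints YAML with an `applications` mapping, where each application has a `deployments` mapping whose entries report a `status` such as `HEALTHY`. Assuming that shape, and that you have already parsed the output (for example with PyYAML's `yaml.safe_load`), the hypothetical helper below lists every deployment not currently reporting `HEALTHY`. Transitional states such as `UPDATING` are also reported.

```python
def unhealthy_deployments(status: dict) -> list[tuple[str, str, str]]:
    """Return (application, deployment, status) for each deployment not reporting HEALTHY.

    `status` is assumed to be the parsed output of `serve status`.
    """
    problems = []
    for app_name, app in status.get("applications", {}).items():
        for deployment_name, deployment in app.get("deployments", {}).items():
            deployment_status = deployment.get("status", "UNKNOWN")
            if deployment_status != "HEALTHY":
                problems.append((app_name, deployment_name, deployment_status))
    return problems
```

An empty list means every deployment reported healthy at the time serve status ran.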
Next steps
Production guide
Failure handling, observability, and capacity planning.
Configure deployment
All deployment-level options.