The Ray Serve workflow has two phases: iterate locally with serve.run, then promote to a long-lived cluster with serve deploy (CLI) or a Kubernetes RayService.

Local iteration

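from ray import serve


@serve.deployment
class App:
    # Called once per HTTP request routed to this deployment.
    def __call__(self, request):
        return "hello"


# Deploy the app on a local Ray instance (one is started if none is running).
serve.run(App.bind())

After making changes, rerun serve.run. Existing replicas are torn down and replaced.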
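To check the result of each rerun, a small HTTP client is usually enough. The sketch below assumes Serve's HTTP proxy listens on the default address http://127.0.0.1:8000 and that the app is mounted at the root route. `fetch_text` and `check_app` are hypothetical helper names, not part of Ray.

```python
import urllib.request


def fetch_text(url: str, timeout_s: float = 5.0) -> str:
    """Send a GET request and return the response body decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


def check_app(url: str, expected: str, fetch=fetch_text) -> bool:
    """Return True if the app at `url` answers with exactly `expected`.

    `fetch` is injectable so the check can be exercised without a server.
    """
    return fetch(url) == expected
```

With the app above running, `check_app("http://127.0.0.1:8000/", "hello")` should return True under those assumptions.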

Build a config file

serve build my_module:app -o config.yaml

This generates a YAML representation of your application that you can edit and deploy elsewhere:

applications:
  - name: my-app
    route_prefix: /
    import_path: my_module:app
    deployments:
      - name: App
        num_replicas: 4
        ray_actor_options:
          num_cpus: 1
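Before deploying, it can help to check how many CPUs the config asks for. The sketch below mirrors the YAML above as a plain Python dict (so it needs no YAML parser) and multiplies num_replicas by ray_actor_options.num_cpus for each deployment. `requested_cpus` is a hypothetical helper, and the fallback of one replica and one CPU for missing fields is an assumption of this sketch.

```python
# The config from above, written as a Python dict for illustration.
config = {
    "applications": [
        {
            "name": "my-app",
            "route_prefix": "/",
            "import_path": "my_module:app",
            "deployments": [
                {
                    "name": "App",
                    "num_replicas": 4,
                    "ray_actor_options": {"num_cpus": 1},
                }
            ],
        }
    ]
}


def requested_cpus(cfg: dict) -> dict:
    """Map "app/deployment" to the total CPUs its replicas request."""
    totals = {}
    for app in cfg.get("applications", []):
        for dep in app.get("deployments", []):
            replicas = dep.get("num_replicas", 1)  # assumed fallback
            cpus = dep.get("ray_actor_options", {}).get("num_cpus", 1)  # assumed fallback
            totals[f"{app['name']}/{dep['name']}"] = replicas * cpus
    return totals


print(requested_cpus(config))  # {'my-app/App': 4}
```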

Deploy to a remote cluster

serve deploy config.yaml

serve deploy sends the config over HTTP to the cluster (a PUT request to the Serve REST API on the Ray dashboard), and the Serve controller applies it. The cluster pulls the latest code (via your Docker image, working directory, or git revision) and starts replicas.
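The same deployment can be triggered from a script. The sketch below builds the HTTP request without sending it. It assumes the Ray dashboard is reachable at http://127.0.0.1:8265 and accepts PUT requests at /api/serve/applications/, so verify both against your Ray version. `build_deploy_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed dashboard address and REST path; check them for your Ray version.
DASHBOARD_URL = "http://127.0.0.1:8265"
DEPLOY_PATH = "/api/serve/applications/"


def build_deploy_request(
    config: dict, dashboard_url: str = DASHBOARD_URL
) -> urllib.request.Request:
    """Build a PUT request carrying the Serve config as a JSON body."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=dashboard_url.rstrip("/") + DEPLOY_PATH,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_deploy_request({"applications": []})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. The CLI remains the simpler option for interactive use.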

Update an application

Edit config.yaml and re-run serve deploy. Ray Serve performs a rolling update: it brings up new replicas, drains the old ones, and applies the change without dropping requests.
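The rollout order described above can be illustrated with a toy simulation. This is not Ray Serve's actual scheduler. The one-at-a-time step size and the name `rolling_update` are assumptions, chosen to show why capacity never drops below the target replica count.

```python
def rolling_update(num_replicas: int, old: str = "v1", new: str = "v2") -> list:
    """Return the replica set after each step of a one-at-a-time rollout."""
    replicas = [old] * num_replicas
    history = [list(replicas)]
    for _ in range(num_replicas):
        replicas.append(new)  # 1. start a new replica alongside the old ones
        history.append(list(replicas))
        replicas.remove(old)  # 2. drain and stop one old replica
        history.append(list(replicas))
    return history


steps = rolling_update(2)
for step in steps:
    print(step)

# Capacity never falls below the target replica count during the rollout.
assert all(len(step) >= 2 for step in steps)
```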

Serve on Kubernetes

Use the RayService CRD to manage Serve applications declaratively:

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: my-service
spec:
  serveConfigV2: |
    applications: [...]
  rayClusterConfig:
    ...

The KubeRay operator continuously reconciles the cluster against the RayService spec: it creates the Ray cluster and applies the Serve config to it.

Health checks

Serve exposes two HTTP endpoints for monitoring: /-/healthz reports proxy health, and /-/routes lists the registered routes. Use them in liveness/readiness probes:

livenessProbe:
  httpGet:
    path: /-/healthz
    port: 8000
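Outside Kubernetes, the same endpoint can gate a deploy script. The sketch below polls /-/healthz until it returns HTTP 200 or a deadline passes. It assumes the proxy listens on port 8000, as in the probe above. `http_status` and `wait_until_healthy` are hypothetical names, and the `probe` parameter is injectable so the loop can be tested without a running cluster.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout_s: float = 2.0) -> int:
    """Return the HTTP status code for a GET on `url`, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(base_url, timeout_s=30.0, interval_s=1.0, probe=http_status):
    """Poll <base_url>/-/healthz until it returns 200; return False on timeout."""
    url = base_url.rstrip("/") + "/-/healthz"
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A deploy script could call `wait_until_healthy("http://127.0.0.1:8000")` and abort if it returns False.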

Inspect a running deployment

serve status
serve config

serve status shows per-deployment replica counts and health. serve config prints the active configuration.
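For scripting, the status can be reduced to a pass/fail check. The sketch below works on a dict shaped like parsed serve status output (applications, then deployments, each with a status and replica_states). That shape and the sample values are assumptions for illustration, so compare them with what your Ray version prints. `unhealthy_deployments` is a hypothetical helper.

```python
# Hypothetical sample, shaped like parsed `serve status` output.
sample_status = {
    "applications": {
        "my-app": {
            "status": "RUNNING",
            "deployments": {
                "App": {"status": "HEALTHY", "replica_states": {"RUNNING": 4}},
            },
        }
    }
}


def unhealthy_deployments(status: dict) -> list:
    """List "app/deployment" names whose status is not HEALTHY."""
    bad = []
    for app_name, app in status.get("applications", {}).items():
        for dep_name, dep in app.get("deployments", {}).items():
            if dep.get("status") != "HEALTHY":
                bad.append(f"{app_name}/{dep_name}")
    return bad


print(unhealthy_deployments(sample_status))  # []
```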

Next steps

Production guide

Failure handling, observability, and capacity planning.

Configure deployment

All deployment-level options.