Develop and Deploy

Local iteration
Build a config file
Deploy to a remote cluster
Update an application
Serve on Kubernetes
Health checks
Inspect a running deployment
Next steps

The Ray Serve workflow has two phases: iterate locally with serve.run, then promote to a long-lived cluster with serve deploy (CLI) or a Kubernetes RayService.

Local iteration

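from ray import serve


@serve.deployment
class App:
    # Called once per HTTP request; the return value becomes the response body.
    def __call__(self, request):
        return "hello"


# Bound application. If this file is saved as my_module.py, the import path
# my_module:app used in the sections below refers to this object.
app = App.bind()

if __name__ == "__main__":
    # Deploy the application; Ray is initialized first if it is not running yet.
    serve.run(app)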

After changing the code, rerun serve.run to redeploy. Serve tears down the existing replicas and starts new ones from the updated code.
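To check each iteration, you can send a request to the locally running application. The following is a minimal sketch that uses only the standard library. It assumes the Serve HTTP proxy listens on port 8000 (the port used in the health check section) and that the application is mounted at the route prefix /; call_app is a hypothetical helper name, not part of Ray.

```python
import urllib.request


def call_app(base_url: str = "http://localhost:8000",
             path: str = "/",
             timeout_s: float = 5.0) -> str:
    """Send a GET request to a Serve application and return the body as text.

    base_url and path are assumptions: adjust them to the host, port, and
    route prefix your application actually uses.
    """
    url = base_url.rstrip("/") + path
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")
```

With the App deployment above running, call_app() should return the string that __call__ returns.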

Build a config file

serve build my_module:app -o config.yaml

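serve build imports the application at my_module:app and writes a YAML config for it. You can edit the file, for example to change num_replicas or ray_actor_options, and deploy it to another cluster.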

applications:
  - name: my-app
    route_prefix: /
    import_path: my_module:app
    deployments:
      - name: App
        num_replicas: 4
        ray_actor_options:
          num_cpus: 1
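Before deploying, it can help to check how many CPUs a config asks for. The sketch below sums num_replicas × num_cpus over every deployment listed in a config that has already been loaded into a Python dict. It rests on several assumptions: required_cpus is a hypothetical helper, missing values default to one replica and one CPU, num_replicas is an integer, and deployments that the config file does not list are not counted.

```python
def required_cpus(config: dict) -> float:
    """Return the total CPUs requested by the deployments listed in a config.

    Assumes each deployment entry has an integer num_replicas (default 1)
    and an optional ray_actor_options.num_cpus (default 1).
    """
    total = 0.0
    for app in config.get("applications", []):
        for dep in app.get("deployments", []):
            replicas = dep.get("num_replicas", 1)
            cpus = dep.get("ray_actor_options", {}).get("num_cpus", 1)
            total += replicas * cpus
    return total


# The example config above, written as a Python dict.
example_config = {
    "applications": [
        {
            "name": "my-app",
            "route_prefix": "/",
            "import_path": "my_module:app",
            "deployments": [
                {
                    "name": "App",
                    "num_replicas": 4,
                    "ray_actor_options": {"num_cpus": 1},
                }
            ],
        }
    ]
}
```

For the example config, required_cpus(example_config) gives 4 replicas × 1 CPU = 4 CPUs.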

Deploy to a remote cluster

serve deploy config.yaml

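serve deploy sends the config to the Serve REST API on the cluster's Ray dashboard with an HTTP PUT request. By default it targets a cluster on the local machine; pass -a (--address) with the dashboard address to target a remote cluster. The command returns once the config is accepted, so use serve status to follow the rollout. The cluster loads your code from the container image or from the runtime_env in the config (for example, a working_dir that points to a remote archive of a specific git revision), then starts replicas.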
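The same submission can be made from a script. The sketch below sends a config dict to the cluster over HTTP using only the standard library. The endpoint path /api/serve/applications/, the PUT method, and the default dashboard address http://localhost:8265 are assumptions about the Serve REST API that you should verify against your Ray version; deploy_config is a hypothetical helper name.

```python
import json
import urllib.request


def deploy_config(config: dict,
                  dashboard_url: str = "http://localhost:8265",
                  timeout_s: float = 30.0) -> int:
    """Submit a Serve config as JSON and return the HTTP status code.

    Assumption: the cluster accepts the config via
    PUT <dashboard_url>/api/serve/applications/.
    """
    url = dashboard_url.rstrip("/") + "/api/serve/applications/"
    request = urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return resp.status
```

A successful status code only means the config was accepted; the replicas may still be starting.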

Update an application

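Edit config.yaml and re-run serve deploy. Serve compares the new config with the running application and applies the difference. Config-only changes such as num_replicas or user_config are applied in place to the running deployments. Changes to import_path or runtime_env count as code updates and restart the application's replicas, so requests can be interrupted during the switch; for zero-downtime code upgrades, use the RayService approach described under Serve on Kubernetes.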

Serve on Kubernetes

Use the RayService CRD to manage Serve applications declaratively.

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: my-service
spec:
  serveConfigV2: |
    applications: [...]
  rayClusterConfig:
    ...

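The KubeRay operator watches the RayService resource, creates the Ray cluster described by rayClusterConfig, and applies serveConfigV2 to it. When you change the resource, the operator reconciles the running cluster with the new spec.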

Health checks

The Serve HTTP proxy exposes two built-in endpoints: /-/healthz reports whether the proxy is healthy, and /-/routes lists the active route prefixes. Use them in liveness and readiness probes:

livenessProbe:
  httpGet:
    path: /-/healthz
    port: 8000
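Outside Kubernetes, for example in a deployment script or a smoke test, you can poll the same endpoint yourself. The sketch below waits until /-/healthz answers with HTTP 200 or a deadline passes. wait_until_healthy and its parameters are hypothetical names, and the default address assumes the proxy listens on port 8000 as in the probe above.

```python
import time
import urllib.request


def wait_until_healthy(base_url: str = "http://localhost:8000",
                       timeout_s: float = 60.0,
                       interval_s: float = 1.0,
                       request_timeout_s: float = 2.0) -> bool:
    """Poll <base_url>/-/healthz until it returns 200 or timeout_s elapses."""
    url = base_url.rstrip("/") + "/-/healthz"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers connection errors, timeouts, and HTTP error responses
            # (urllib.error.URLError and HTTPError are OSError subclasses).
            pass
        time.sleep(interval_s)
    return False
```

The function returns False rather than raising, so the caller decides whether an unhealthy proxy should fail the script.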

Inspect a running deployment

serve status
serve config

serve status reports the status of each application and its deployments, including replica counts by state. serve config prints the config most recently deployed to the cluster.
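If you want a compact, per-deployment view in a script, you can summarize the status data once it is loaded into a Python dict. The sketch below assumes a particular shape: a top-level applications mapping, each application with a status and a deployments mapping, and each deployment with a status and a replica_states mapping of state name to count. That shape and the summarize_status helper are assumptions for illustration; check them against the output of your Ray version.

```python
def summarize_status(status: dict) -> list[str]:
    """Return one line per application and one indented line per deployment."""
    lines = []
    for app_name, app in status.get("applications", {}).items():
        lines.append(f"{app_name}: {app.get('status', 'UNKNOWN')}")
        for dep_name, dep in app.get("deployments", {}).items():
            replicas = dep.get("replica_states", {})
            # Sort states so the output is stable across runs.
            counts = ", ".join(
                f"{state}={count}" for state, count in sorted(replicas.items())
            ) or "no replicas"
            lines.append(
                f"  {dep_name}: {dep.get('status', 'UNKNOWN')} ({counts})"
            )
    return lines


# Hypothetical sample data in the assumed shape.
sample_status = {
    "applications": {
        "my-app": {
            "status": "RUNNING",
            "deployments": {
                "App": {
                    "status": "HEALTHY",
                    "replica_states": {"RUNNING": 4},
                }
            },
        }
    }
}
```

For the sample data, summarize_status(sample_status) returns the two lines "my-app: RUNNING" and "  App: HEALTHY (RUNNING=4)".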

Next steps

Production guide

Failure handling, observability, and capacity planning.

Configure deployment

All deployment-level options.
