RayService Quickstart

A RayService runs a Ray Serve application on a managed RayCluster. The KubeRay operator handles rolling upgrades, health checks, and zero-downtime updates.

Manifest

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: my-service
spec:
  serviceUnhealthySecondThreshold: 900
  deploymentUnhealthySecondThreshold: 300
  serveConfigV2: |
    applications:
      - name: app
        import_path: my_module:app
        route_prefix: /
        runtime_env:
          working_dir: "https://my.bucket.s3.amazonaws.com/app.zip"
        deployments:
          - name: Service
            num_replicas: 4
            ray_actor_options:
              num_cpus: 1
  rayClusterConfig:
    rayVersion: "2.43.0"
    enableInTreeAutoscaling: true
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.43.0
              resources:
                requests: { cpu: "2", memory: "4Gi" }
                limits:   { cpu: "2", memory: "4Gi" }
    workerGroupSpecs:
      - groupName: serve
        replicas: 1
        minReplicas: 1
        maxReplicas: 16
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.43.0
                resources:
                  requests: { cpu: "4", memory: "8Gi" }
                  limits:   { cpu: "4", memory: "8Gi" }

Save the manifest as rayservice.yaml, then apply it and watch the service come up:

kubectl apply -f rayservice.yaml
kubectl get rayservice my-service -w

Update the application

Edit serveConfigV2 (for example, increase num_replicas) and re-apply the manifest. KubeRay performs a zero-downtime rolling update: it brings up new replicas, drains the old ones, and switches traffic only once the new version is healthy.
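
As an illustration of such an edit, the fragment below changes only num_replicas in the manifest above, from 4 to 8. The value 8 is arbitrary and chosen for this sketch; every other field is unchanged.

```yaml
# Fragment of rayservice.yaml: only num_replicas differs from the original manifest.
spec:
  serveConfigV2: |
    applications:
      - name: app
        import_path: my_module:app
        route_prefix: /
        runtime_env:
          working_dir: "https://my.bucket.s3.amazonaws.com/app.zip"
        deployments:
          - name: Service
            num_replicas: 8   # was 4; illustrative value
            ray_actor_options:
              num_cpus: 1
```

Re-applying with kubectl apply -f rayservice.yaml and watching with kubectl get rayservice my-service -w should show the service move to the new configuration.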

Inspect

kubectl get rayservice my-service -o jsonpath='{.status}'

The status block includes per-deployment counts, the active Serve config, and recent transition events.

Expose the service

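The operator creates a Kubernetes Service named after the RayService with a -serve-svc suffix (here, my-service-serve-svc), which exposes Serve on port 8000. Front it with an ingress controller, gateway, or load balancer: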

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
spec:
  rules:
    - host: my-service.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service-serve-svc
                port:
                  number: 8000
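
For clusters that use the Kubernetes Gateway API rather than an Ingress, an equivalent route might look like the sketch below. It assumes the Gateway API CRDs are installed and that a Gateway named example-gateway already exists in the same namespace. That Gateway name is hypothetical and not part of the original manifest.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-service
spec:
  parentRefs:
    - name: example-gateway            # hypothetical Gateway; replace with your own
  hostnames:
    - my-service.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: my-service-serve-svc   # Service created by the operator
          port: 8000
```

The host, path, and backend mirror the Ingress above, so either resource routes the same traffic to the Serve service.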

Health checks

The operator polls the Serve health endpoint. If a deployment stays unhealthy past deploymentUnhealthySecondThreshold, the operator triggers a rollback to the previous stable version.
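
How quickly a replica is reported unhealthy also depends on Ray Serve's own per-deployment health-check settings, which can be set in serveConfigV2. The fragment below is a sketch under that assumption: health_check_period_s and health_check_timeout_s are Ray Serve deployment options, and the values shown are illustrative rather than recommendations.

```yaml
# Fragment of rayservice.yaml showing health-related settings only.
spec:
  deploymentUnhealthySecondThreshold: 300   # same value as the manifest above
  serveConfigV2: |
    applications:
      - name: app
        import_path: my_module:app
        route_prefix: /
        # runtime_env and ray_actor_options omitted for brevity
        deployments:
          - name: Service
            num_replicas: 4
            health_check_period_s: 10    # seconds between health checks (illustrative)
            health_check_timeout_s: 30   # seconds to wait for a check to respond (illustrative)
```

Keeping the Serve-level timeout well below deploymentUnhealthySecondThreshold gives Serve a chance to replace a failing replica before the operator acts.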

Next steps

Production guide (Serve)

High availability

Ray Clusters

Observability
