A RayService runs a Ray Serve application on a managed RayCluster. The KubeRay operator handles rolling upgrades, health checks, and zero-downtime updates.

Manifest

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: my-service
spec:
  serviceUnhealthySecondThreshold: 900
  deploymentUnhealthySecondThreshold: 300
  serveConfigV2: |
    applications:
      - name: app
        import_path: my_module:app
        route_prefix: /
        runtime_env:
          working_dir: "https://my.bucket.s3.amazonaws.com/app.zip"
        deployments:
          - name: Service
            num_replicas: 4
            ray_actor_options:
              num_cpus: 1
  rayClusterConfig:
    rayVersion: "2.43.0"
    enableInTreeAutoscaling: true
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.43.0
              resources:
                requests: { cpu: "2", memory: "4Gi" }
                limits:   { cpu: "2", memory: "4Gi" }
    workerGroupSpecs:
      - groupName: serve
        replicas: 1
        minReplicas: 1
        maxReplicas: 16
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.43.0
                resources:
                  requests: { cpu: "4", memory: "8Gi" }
                  limits:   { cpu: "4", memory: "8Gi" }

Save the manifest as rayservice.yaml, apply it, and watch the resource:

kubectl apply -f rayservice.yaml
kubectl get rayservice my-service -w
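
As a rough sanity check on the sizing in the manifest, the sketch below estimates how many worker pods the Serve replicas need. It assumes replicas are placed only on worker pods and that each worker advertises its full CPU limit to Ray; the helper name workers_needed is hypothetical and not part of Ray or KubeRay.

```python
import math


def workers_needed(num_replicas: int, cpus_per_replica: float, cpus_per_worker: float) -> int:
    """Smallest number of worker pods whose CPUs can hold all replicas.

    Simplified model: ignores the head pod and any other actors on the cluster.
    """
    if cpus_per_replica <= 0:
        raise ValueError("cpus_per_replica must be positive")
    if cpus_per_replica > cpus_per_worker:
        raise ValueError("a single replica does not fit on one worker")
    # A replica cannot be split across pods, so round down per worker.
    replicas_per_worker = math.floor(cpus_per_worker / cpus_per_replica)
    return math.ceil(num_replicas / replicas_per_worker)


# Values copied from the manifest: 4 replicas x 1 CPU, 4-CPU workers.
print(workers_needed(num_replicas=4, cpus_per_replica=1, cpus_per_worker=4))
```

Under these assumptions the four replicas fit on the single minimum worker, and the maxReplicas: 16 ceiling leaves room for up to 64 one-CPU replicas.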

Update the application

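Edit serveConfigV2 (for example, raise num_replicas) and re-apply the manifest. KubeRay applies a change that touches only serveConfigV2 in place on the running RayCluster. A change to rayClusterConfig triggers a zero-downtime upgrade instead: the operator brings up a new RayCluster, waits for the Serve application on it to become healthy, switches traffic to it, and then removes the old cluster.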

Inspect

kubectl get rayservice my-service -o jsonpath='{.status}'

The status block includes per-deployment counts, the active Serve config, and recent transition events.
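
The JSON printed by the command above can be condensed with a small script. The following is a minimal sketch; the field names activeServiceStatus, applicationStatuses, and serveDeploymentStatuses are assumptions based on the ray.io/v1 status schema and may differ across KubeRay versions, and the sample input is illustrative rather than captured from a real cluster.

```python
import json


def summarize_status(status: dict) -> dict:
    """Map application and 'application/deployment' names to their status strings."""
    active = status.get("activeServiceStatus") or {}
    summary = {}
    for app_name, app in (active.get("applicationStatuses") or {}).items():
        summary[app_name] = app.get("status", "UNKNOWN")
        for dep_name, dep in (app.get("serveDeploymentStatuses") or {}).items():
            summary[f"{app_name}/{dep_name}"] = dep.get("status", "UNKNOWN")
    return summary


# Illustrative input only; field names are assumed, values are made up.
sample = json.loads("""
{
  "activeServiceStatus": {
    "applicationStatuses": {
      "app": {
        "status": "RUNNING",
        "serveDeploymentStatuses": {"Service": {"status": "HEALTHY"}}
      }
    }
  }
}
""")
print(summarize_status(sample))
```

In practice the input would come from piping kubectl get rayservice my-service -o jsonpath='{.status}' into the script and passing the text to json.loads.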

Expose the service

The operator creates a Kubernetes Service named after the RayService with a -serve-svc suffix (my-service-serve-svc in this example). Front it with an ingress controller, gateway, or load balancer:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
spec:
  rules:
    - host: my-service.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service-serve-svc
                port:
                  number: 8000
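
Once the Ingress resolves, a client can reach the application through the configured host. The sketch below only builds the request using the Python standard library; the plain-HTTP scheme, the JSON payload shape, and the helper name build_request are assumptions, since the source does not show what the application accepts.

```python
import json
import urllib.request
from typing import Optional


def build_request(host: str, path: str = "/", payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = f"http://{host}{path}"
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Host taken from the Ingress rule; the payload is a made-up example.
req = build_request("my-service.example.com", "/", {"text": "hello"})
print(req.get_method(), req.full_url)

# To actually send it (requires DNS and the Ingress to be reachable):
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(resp.status, resp.read())
```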

Health checks

The operator polls the Serve health endpoint. If a deployment stays unhealthy past deploymentUnhealthySecondThreshold, the operator triggers a rollback to the previous stable version.
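
The threshold rule described above can be modeled in a few lines. This is an illustration of the stated rule only, not the operator's implementation; the function name threshold_exceeded and its arguments are hypothetical, and the default of 300 seconds is the deploymentUnhealthySecondThreshold value from the manifest.

```python
from typing import Optional


def threshold_exceeded(unhealthy_since: Optional[float], now: float, threshold_s: float = 300) -> bool:
    """Return True once a deployment has been continuously unhealthy past the threshold.

    unhealthy_since is the timestamp (in seconds) of the first failed check in the
    current unhealthy streak, or None if the deployment is currently healthy.
    """
    if unhealthy_since is None:
        return False
    return now - unhealthy_since > threshold_s


# A deployment that turned unhealthy at t=0 is tolerated at t=300 but not at t=301.
print(threshold_exceeded(0, 300), threshold_exceeded(0, 301))
```

A brief failure that recovers before the threshold resets the streak, so only sustained failures cross it.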

Next steps

Production guide (Serve)

Capacity planning and observability.

High availability

GCS fault tolerance and multi-replica head.