A single Ray cluster can host any number of Serve applications. Each application has its own deployment graph, route prefix, and configuration.

Define multiple apps

applications:
  - name: search
    route_prefix: /search
    import_path: search_app:app
    deployments:
      - name: SearchService
        num_replicas: 4

  - name: classify
    route_prefix: /classify
    import_path: classify_app:app
    deployments:
      - name: ClassifyService
        num_replicas: 2
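
Serve expects every application in a config to have a unique name and a unique route prefix. The following is a minimal pre-deploy check, sketched in Python under a few assumptions: the config is written as a plain dict mirroring the YAML above (deployments omitted), and `validate_apps` is a hypothetical helper, not part of Ray Serve.

```python
from collections import Counter

# Plain-dict mirror of multi-app.yaml (deployments omitted for brevity).
config = {
    "applications": [
        {"name": "search", "route_prefix": "/search", "import_path": "search_app:app"},
        {"name": "classify", "route_prefix": "/classify", "import_path": "classify_app:app"},
    ]
}


def validate_apps(cfg):
    """Return a list of problems; an empty list means the check passed."""
    apps = cfg.get("applications", [])
    problems = []
    for field in ("name", "route_prefix"):
        # Ignore entries where the field is absent or explicitly null.
        values = [app[field] for app in apps if app.get(field) is not None]
        for value, n in Counter(values).items():
            if n > 1:
                problems.append(f"duplicate {field} {value!r} ({n} applications)")
    return problems


print(validate_apps(config))  # → []
```

Running a check like this before `serve deploy` catches a clashing prefix early, before the config reaches the cluster.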

Deploy

serve deploy multi-app.yaml

Apps come up in parallel. A failure in one application doesn’t affect the others.
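
Because applications deploy independently, it helps to check health per application rather than for the cluster as a whole. The sketch below assumes a status mapping shaped like `{app_name: {"status": ...}}`, similar in spirit to what `serve status` reports. The sample snapshot and the `split_by_health` helper are hypothetical, and the exact status strings should be checked against your Ray version.

```python
def split_by_health(applications):
    """Partition app names into (running, not_running) by their status field."""
    running, not_running = [], []
    for name, details in sorted(applications.items()):
        target = running if details.get("status") == "RUNNING" else not_running
        target.append(name)
    return running, not_running


# Hypothetical snapshot: classify failed to deploy, search is unaffected.
snapshot = {
    "search": {"status": "RUNNING"},
    "classify": {"status": "DEPLOY_FAILED"},
}

running, not_running = split_by_health(snapshot)
print("serving:", running)
print("needs attention:", not_running)
```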

Update one app

Edit multi-app.yaml (e.g., bump num_replicas for search) and re-run serve deploy multi-app.yaml. Serve reconfigures only the applications whose configuration changed; the others keep running as they were.
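
To preview which applications a re-deploy would touch, you can compare the old and new configs entry by entry. This is an illustrative sketch only: `changed_apps` and `make_config` are hypothetical helpers, the configs are plain dicts mirroring the YAML, and the comparison does not reproduce how Serve itself decides what to reconfigure.

```python
def changed_apps(old_cfg, new_cfg):
    """Return sorted names of apps that were added, removed, or edited."""
    old = {app["name"]: app for app in old_cfg["applications"]}
    new = {app["name"]: app for app in new_cfg["applications"]}
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))


def make_config(search_replicas):
    # Builds a dict mirroring multi-app.yaml with a variable replica count.
    return {
        "applications": [
            {"name": "search", "route_prefix": "/search",
             "import_path": "search_app:app",
             "deployments": [{"name": "SearchService",
                              "num_replicas": search_replicas}]},
            {"name": "classify", "route_prefix": "/classify",
             "import_path": "classify_app:app",
             "deployments": [{"name": "ClassifyService", "num_replicas": 2}]},
        ]
    }


# Bumping search from 4 to 6 replicas leaves classify out of the diff.
print(changed_apps(make_config(4), make_config(6)))  # → ['search']
```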

Independent autoscaling

Each app’s deployments autoscale based on their own traffic. A burst on /search doesn’t pull replicas away from /classify.
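
The independence can be pictured with a simplified replica-target calculation that looks only at a deployment's own load. This is a rough model under stated assumptions, not Serve's actual autoscaling policy: the parameter names loosely follow the `autoscaling_config` fields `target_ongoing_requests`, `min_replicas`, and `max_replicas`, and the load figures are made up.

```python
import math


def desired_replicas(ongoing_requests, target_per_replica, min_replicas, max_replicas):
    """Simplified target: enough replicas to hold the load, clamped to bounds."""
    wanted = math.ceil(ongoing_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical per-app settings and load; each app is computed in isolation.
apps = {
    "search": {"ongoing": 90, "target": 10, "min": 1, "max": 8},
    "classify": {"ongoing": 6, "target": 10, "min": 1, "max": 4},
}

for name, a in apps.items():
    print(name, desired_replicas(a["ongoing"], a["target"], a["min"], a["max"]))
```

In this model a burst on search raises only search's target (capped at its own maximum), while classify's target depends solely on classify's traffic.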

Shared resources

Apps share the cluster’s resource pool. If both apps’ autoscalers want every available GPU at the same time, the controller arbitrates via the placement-group scheduler.

When to split

Use multiple applications when:
  • Different teams own different services on the same cluster.
  • One application has different scaling characteristics than another.
  • You want fault isolation between independent apps.

Use a single multi-deployment app when components share a request pipeline (e.g., tokenizer + model + post-processor).

Next steps

Develop and deploy

Application lifecycle management.

Production guide

Multi-tenant production guidance.