Multi-Application Deployments

A single Ray cluster can host any number of Serve applications. Each application has its own deployment graph, route prefix, and configuration.

Define multiple apps

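# multi-app.yaml: one list entry per application
applications:
  - name: search                   # application name, unique within the cluster
    route_prefix: /search          # HTTP prefix routed to this app, also unique
    import_path: search_app:app    # <module>:<variable> of the application to load
    deployments:
      - name: SearchService
        num_replicas: 4

  - name: classify
    route_prefix: /classify
    import_path: classify_app:app
    deployments:
      - name: ClassifyService
        num_replicas: 2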

Deploy

serve deploy multi-app.yaml

Apps come up in parallel. A failure in one application doesn’t affect the others.

Update one app

Edit multi-app.yaml (for example, raise num_replicas for search) and re-run serve deploy multi-app.yaml. Only the application whose configuration changed is reconfigured; the others keep running untouched.
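As an illustration, the edited file for the example above might look like the following sketch. The new value of 6 is an arbitrary choice; the classify entry is identical to the original, so only search should be reconfigured.

```yaml
# multi-app.yaml after the edit; only the search entry differs from the original.
applications:
  - name: search
    route_prefix: /search
    import_path: search_app:app
    deployments:
      - name: SearchService
        num_replicas: 6            # was 4 (illustrative value)

  - name: classify                 # unchanged, so this app is left as-is
    route_prefix: /classify
    import_path: classify_app:app
    deployments:
      - name: ClassifyService
        num_replicas: 2
```

Keep unchanged entries in the file. The config is declarative, so dropping an application from it is generally treated as a request to remove that application rather than to leave it alone.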

Independent autoscaling

Each app’s deployments autoscale based on their own traffic. A burst on /search doesn’t pull replicas away from /classify.
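A minimal sketch of per-app autoscaling, assuming each deployment swaps its fixed num_replicas for an autoscaling_config block. The bounds and targets are illustrative, and the target_ongoing_requests field name may differ in older Ray versions.

```yaml
applications:
  - name: search
    route_prefix: /search
    import_path: search_app:app
    deployments:
      - name: SearchService
        # Replaces the fixed num_replicas; scales on SearchService's own load.
        autoscaling_config:
          min_replicas: 2
          max_replicas: 10
          target_ongoing_requests: 5   # illustrative; check the name for your Ray version

  - name: classify
    route_prefix: /classify
    import_path: classify_app:app
    deployments:
      - name: ClassifyService
        # Independent bounds; traffic on /search does not feed into these.
        autoscaling_config:
          min_replicas: 1
          max_replicas: 4
          target_ongoing_requests: 5
```

With this layout, a burst on /search only moves the replica count of SearchService between 2 and 10, while ClassifyService stays within its own range of 1 to 4.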

Shared resources

Apps share the cluster’s resource pool. If both apps’ autoscalers want every available GPU at the same time, the controller arbitrates via the placement-group scheduler.
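Because the pool is shared, it can help to bound how much of it each app may claim. The sketch below assumes a hypothetical 8-GPU cluster and one GPU per replica; both figures, and the replica limits, are illustrative rather than recommendations.

```yaml
# Assumes a hypothetical 8-GPU cluster: 5 + 3 replicas at one GPU each fit the pool.
applications:
  - name: search
    route_prefix: /search
    import_path: search_app:app
    deployments:
      - name: SearchService
        ray_actor_options:
          num_gpus: 1              # each replica reserves one GPU
        autoscaling_config:
          min_replicas: 1
          max_replicas: 5          # search can never hold more than 5 GPUs

  - name: classify
    route_prefix: /classify
    import_path: classify_app:app
    deployments:
      - name: ClassifyService
        ray_actor_options:
          num_gpus: 1
        autoscaling_config:
          min_replicas: 1
          max_replicas: 3          # classify can never hold more than 3 GPUs
```

Choosing max_replicas values that sum to the pool size means both apps can reach their ceiling at once without competing; a looser cap trades that guarantee for better utilization when only one app is busy.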

When to split

Use multi-app when:

Different teams own different services on the same cluster.
One application has different scaling characteristics than another.
You want fault isolation between independent apps.

Use a single multi-deployment app when components share a request pipeline (e.g., tokenizer + model + post-processor).
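For contrast, a single multi-deployment app for the pipeline case might be configured as in the sketch below. The app name, route, import path, and deployment names are all hypothetical.

```yaml
applications:
  - name: pipeline                 # hypothetical app name
    route_prefix: /generate        # hypothetical route; one entry point for the pipeline
    import_path: pipeline_app:app  # hypothetical <module>:<variable>
    deployments:
      - name: Tokenizer            # hypothetical deployment names
        num_replicas: 2
      - name: Model
        num_replicas: 4
      - name: PostProcessor
        num_replicas: 2
```

The three stages still scale separately, but they are deployed, updated, and removed together as one application behind one route.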

Next steps

Develop and deploy

Application lifecycle management.

Production guide

Multi-tenant production guidance.
