This walkthrough builds a Ray Serve application from scratch.

Install

pip install -U "ray[serve]"

A minimal deployment

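from ray import serve
from starlette.requests import Request

@serve.deployment
class Greeter:
    def __init__(self, greeting: str = "Hello"):
        self.greeting = greeting

    # Over HTTP, Serve passes the Starlette Request to __call__,
    # so the name is read from the query string.
    def __call__(self, request: Request) -> str:
        name = request.query_params.get("name", "world")
        return f"{self.greeting}, {name}!"

serve.run(Greeter.bind(greeting="Hi"))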
serve.run starts the local Serve runtime, deploys Greeter, and exposes it at http://localhost:8000.

Call the deployment

import requests

# Send the name as a query parameter and print the response body.
response = requests.get("http://localhost:8000/", params={"name": "world"})
print(response.text)

Custom HTTP handler

For full control over the request, accept a Starlette Request. Here an async __call__ reads the JSON body and returns it in a dict, which Serve sends back as JSON:
from starlette.requests import Request

@serve.deployment
class Echo:
    async def __call__(self, request: Request):
        body = await request.json()
        return {"echo": body}

serve.run(Echo.bind())

FastAPI integration

from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class Service:
    @app.get("/items/{item_id}")
    def get_item(self, item_id: int):
        return {"item_id": item_id}

serve.run(Service.bind())
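@serve.ingress(app) makes the FastAPI app the deployment's HTTP interface: routes declared on app are handled by the methods of the class, and FastAPI's path parsing and validation apply as usual.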

Multiple replicas

@serve.deployment(num_replicas=4)
class Service:
    ...
Ray Serve runs four copies of the deployment, load-balanced behind the same endpoint.

Resource requests

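# ray_actor_options apply per replica: each one reserves 2 CPUs and 1 GPU.
@serve.deployment(ray_actor_options={"num_cpus": 2, "num_gpus": 1})
class GPUService:
    def __init__(self):
        # load_gpu_model is a placeholder for your own model-loading code.
        self.model = load_gpu_model()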

Bind and run

Service.bind(args) returns an application: a graph node that records the deployment and its constructor arguments, not a running deployment. serve.run(app) materializes the graph, starts the replicas, and returns a handle to the running application.

Stop

serve.shutdown()

Next steps

Key concepts

Deployments, applications, and the controller.

Model composition

Chain deployments into pipelines.