

Ray Serve’s HTTP proxy serves your deployments at configurable route prefixes.

Default behavior

from ray import serve

@serve.deployment
class App:
    def __call__(self, request):
        return "hello"

serve.run(App.bind())
# GET http://localhost:8000/
Without a FastAPI ingress, Ray Serve passes the raw Starlette Request object to __call__.

Route prefix

serve.run(App.bind(), route_prefix="/api/v1")
# GET http://localhost:8000/api/v1/
Multiple applications can run side by side at different prefixes; give each one a unique name via serve.run(..., name=...).
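When deploying through a config file with serve deploy, each application declares its own route_prefix. A sketch of that layout (application names and import paths are illustrative):

```yaml
# Illustrative `serve deploy` config: two applications at different prefixes.
applications:
  - name: api_v1
    route_prefix: /api/v1
    import_path: my_project.v1:app
  - name: api_v2
    route_prefix: /api/v2
    import_path: my_project.v2:app
```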

FastAPI integration

from fastapi import FastAPI
from ray import serve

api = FastAPI()

@serve.deployment
@serve.ingress(api)
class Service:
    @api.get("/items/{item_id}")
    async def get_item(self, item_id: int):
        return {"item_id": item_id}

    @api.post("/items/")
    async def create_item(self, item: dict):
        return {"created": item}

serve.run(Service.bind())
You get FastAPI’s full routing, dependency injection, request validation, and OpenAPI generation.
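The request validation comes from Pydantic: declare a typed body model instead of a plain dict and FastAPI rejects non-conforming payloads with a 422. A sketch, assuming an illustrative Item model:

```python
from pydantic import BaseModel

# In the deployment above, create_item could take `item: Item` instead of
# `item: dict`; FastAPI then validates the JSON body against this schema.
class Item(BaseModel):
    name: str
    price: float

item = Item(name="widget", price=1.5)
# item.price == 1.5
```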

Streaming responses

from ray import serve
from starlette.responses import StreamingResponse

@serve.deployment
class Stream:
    async def __call__(self, request):
        async def gen():
            for i in range(10):
                yield f"chunk {i}\n"
        return StreamingResponse(gen())

serve.run(Stream.bind())
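The proxy forwards each yielded string to the client as a separate chunk. Collecting the generator's output directly shows what arrives on the wire:

```python
import asyncio

# Same generator as in the deployment above; each yielded string
# becomes one HTTP chunk on the wire.
async def gen():
    for i in range(10):
        yield f"chunk {i}\n"

async def collect() -> list:
    return [chunk async for chunk in gen()]

chunks = asyncio.run(collect())
# chunks[0] == "chunk 0\n" and len(chunks) == 10
```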

Middleware

Wrap your FastAPI app with middleware in the usual way. Add middleware before deploying the app with serve.run:
from fastapi.middleware.cors import CORSMiddleware

api.add_middleware(CORSMiddleware, allow_origins=["*"])

Authentication

Use FastAPI’s Depends to authenticate requests:
from fastapi import Depends, HTTPException, Header

async def auth(authorization: str = Header(...)):
    if authorization != "Bearer secret":
        raise HTTPException(status_code=401)

@serve.deployment
@serve.ingress(api)
class Secure:
    @api.get("/secret", dependencies=[Depends(auth)])
    def secret(self):
        return {"ok": True}

HTTP options

Configure the HTTP proxy globally before deploying any applications:
serve.start(http_options={"host": "0.0.0.0", "port": 8000, "request_timeout_s": 60})
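The same options can also be set in a serve deploy config file; a sketch of the corresponding section:

```yaml
# Equivalent settings in a `serve deploy` config file.
http_options:
  host: 0.0.0.0
  port: 8000
  request_timeout_s: 60
```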

Next steps

gRPC guide

gRPC serving with Ray Serve.

Model composition

Pipelines, ensembles, and routing.