

Ray Serve’s HTTP proxy serves your deployments at configurable route prefixes.

Default behavior

from ray import serve

@serve.deployment
class App:
    def __call__(self, request):
        return "hello"

serve.run(App.bind())
# GET http://localhost:8000/
Without a FastAPI ingress, Ray Serve passes the raw Starlette Request object to __call__.

Route prefix

serve.run(App.bind(), route_prefix="/api/v1")
# GET http://localhost:8000/api/v1/
Multiple applications can run side by side at different prefixes; give each one a unique name via serve.run(..., name=...).
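When deploying through a config file with serve deploy, each application declares its own route_prefix. A sketch of that layout (application names and import paths are illustrative):

```yaml
# Illustrative `serve deploy` config: two applications at different prefixes.
applications:
  - name: api_v1
    route_prefix: /api/v1
    import_path: my_project.v1:app
  - name: api_v2
    route_prefix: /api/v2
    import_path: my_project.v2:app
```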

FastAPI integration

from fastapi import FastAPI
from ray import serve

api = FastAPI()

@serve.deployment
@serve.ingress(api)
class Service:
    @api.get("/items/{item_id}")
    async def get_item(self, item_id: int):
        return {"item_id": item_id}

    @api.post("/items/")
    async def create_item(self, item: dict):
        return {"created": item}

serve.run(Service.bind())
You get FastAPI’s full routing, dependency injection, request validation, and OpenAPI generation.
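The request validation comes from Pydantic: declare a typed body model instead of a plain dict and FastAPI rejects non-conforming payloads with a 422. A sketch, assuming an illustrative Item model:

```python
from pydantic import BaseModel

# In the deployment above, create_item could take `item: Item` instead of
# `item: dict`; FastAPI then validates the JSON body against this schema.
class Item(BaseModel):
    name: str
    price: float

item = Item(name="widget", price=1.5)
# item.price == 1.5
```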

Streaming responses

from ray import serve
from starlette.responses import StreamingResponse

@serve.deployment
class Stream:
    async def __call__(self, request):
        async def gen():
            for i in range(10):
                yield f"chunk {i}\n"
        return StreamingResponse(gen())

serve.run(Stream.bind())
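The proxy forwards each yielded string to the client as a separate chunk. Collecting the generator's output directly shows what arrives on the wire:

```python
import asyncio

# Same generator as in the deployment above; each yielded string
# becomes one HTTP chunk on the wire.
async def gen():
    for i in range(10):
        yield f"chunk {i}\n"

async def collect() -> list:
    return [chunk async for chunk in gen()]

chunks = asyncio.run(collect())
# chunks[0] == "chunk 0\n" and len(chunks) == 10
```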

Middleware

Wrap your FastAPI app with middleware in the usual way. Add middleware before deploying the app with serve.run:
from fastapi.middleware.cors import CORSMiddleware

api.add_middleware(CORSMiddleware, allow_origins=["*"])

Authentication

Use FastAPI’s Depends to authenticate requests:
from fastapi import Depends, HTTPException, Header

async def auth(authorization: str = Header(...)):
    if authorization != "Bearer secret":
        raise HTTPException(status_code=401)

@serve.deployment
@serve.ingress(api)
class Secure:
    @api.get("/secret", dependencies=[Depends(auth)])
    def secret(self):
        return {"ok": True}

HTTP options

Configure the HTTP proxy globally before deploying any applications:
serve.start(http_options={"host": "0.0.0.0", "port": 8000, "request_timeout_s": 60})
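The same options can also be set in a serve deploy config file; a sketch of the corresponding section:

```yaml
# Equivalent settings in a `serve deploy` config file.
http_options:
  host: 0.0.0.0
  port: 8000
  request_timeout_s: 60
```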

Next steps

gRPC guide

gRPC serving with Ray Serve.

Model composition

Pipelines, ensembles, and routing.