Metrics
Serve exports Prometheus-compatible metrics on port 8080 (configurable). Key metrics:

| Metric | Description |
|---|---|
| ray_serve_num_http_requests_total | Counter of HTTP requests received. |
| ray_serve_request_latency_ms | Histogram of request latency. |
| ray_serve_num_ongoing_requests | In-flight requests per replica. |
| ray_serve_num_replicas | Current replica count per deployment. |
| ray_serve_replica_starts_total | Counter of replica restarts. |
| ray_serve_deployment_queued_requests | Requests waiting for a replica. |
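To check that the metrics listed above are being exported, you can read the endpoint directly and filter for the ray_serve_ prefix. The following is a minimal sketch using only the Python standard library. The host, the /metrics path, and the label names in the usage note are assumptions for illustration; only the port and metric names come from this page.

```python
import re
import urllib.request

# Assumed endpoint: the port matches the default named above, while the
# host and the /metrics path are illustrative guesses for a local cluster.
METRICS_URL = "http://localhost:8080/metrics"

# One sample line in the Prometheus text format: name{labels} value [timestamp]
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text, prefix="ray_serve_"):
    """Return (name, labels, value) tuples for samples whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None or not match["name"].startswith(prefix):
            continue
        labels = dict(_LABEL_RE.findall(match["labels"] or ""))
        samples.append((match["name"], labels, float(match["value"])))
    return samples


def fetch_samples(url=METRICS_URL):
    """Download the metrics page and return the parsed Serve samples."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

Calling fetch_samples() against a running cluster returns one tuple per exported Serve sample, for example to sum ray_serve_num_replicas across deployments in a quick health script. For anything beyond a spot check, a Prometheus scrape job is the more durable choice.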
Dashboard
The Ray dashboard’s Serve tab shows:

- Live replica counts per deployment
- Per-replica QPS and latency
- Recent deployment history (rolling updates, restarts)
- Per-deployment logs
Logging
Each replica logs to /tmp/ray/session_*/logs/serve/. Configure log level per deployment:
For structured JSON logs, set encoding="JSON".
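As a rough illustration of per-deployment logging, the sketch below passes a logging_config dict to the deployment decorator and logs through the "ray.serve" logger. The deployment name Echo and the helper build_logging_app are hypothetical. Ray is imported lazily inside the helper so the configuration can be inspected without a cluster.

```python
import logging

# Logging options for one deployment: the level, plus JSON encoding for
# structured log lines.
LOGGING_CONFIG = {
    "log_level": "DEBUG",
    "encoding": "JSON",
}


def build_logging_app():
    """Build a Serve application whose replicas log at DEBUG in JSON."""
    from ray import serve  # imported here so this module loads without Ray

    @serve.deployment(logging_config=LOGGING_CONFIG)
    class Echo:  # hypothetical deployment used only for illustration
        def __init__(self):
            # Serve routes records from the "ray.serve" logger to the replica's log file.
            self.logger = logging.getLogger("ray.serve")

        async def __call__(self, request):
            self.logger.debug("handling %s", request.url.path)
            return "ok"

    return Echo.bind()
```

With Ray installed, serve.run(build_logging_app()) starts the deployment. Other deployments in the same application keep their own log level, since the setting is scoped to the decorated class.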
Tracing
Use OpenTelemetry middleware on your FastAPI app. Traces continue across DeploymentHandle calls when the OpenTelemetry context is forwarded.
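One possible way to wire this up is sketched below. It assumes the fastapi, opentelemetry-api, and opentelemetry-instrumentation-asgi packages and a recent Ray release in which awaiting handle.remote() yields the result. The deployments Api and Worker, the /shout route, and the choice to forward the context as an explicit dict argument are all illustrative assumptions, not the method prescribed by Serve. Exporter and SDK setup are omitted.

```python
def build_traced_app():
    """Build a two-deployment app that forwards trace context by hand."""
    # Third-party imports are kept inside the function so the sketch can
    # be loaded without these packages installed.
    from fastapi import FastAPI
    from opentelemetry import propagate, trace
    from opentelemetry.instrumentation.asgi import OpenTelemetryMiddleware
    from ray import serve

    app = FastAPI()
    # ASGI middleware: opens a server span for each incoming HTTP request.
    app.add_middleware(OpenTelemetryMiddleware)

    @serve.deployment
    class Worker:  # hypothetical downstream deployment
        def __init__(self):
            self.tracer = trace.get_tracer("serve-tracing-sketch")

        def __call__(self, text: str, carrier: dict) -> str:
            # Rebuild the caller's context from the forwarded headers.
            parent = propagate.extract(carrier)
            with self.tracer.start_as_current_span("worker", context=parent):
                return text.upper()

    @serve.deployment
    @serve.ingress(app)
    class Api:  # hypothetical ingress deployment
        def __init__(self, worker):
            self.worker = worker  # a DeploymentHandle to Worker

        @app.get("/shout")
        async def shout(self, text: str) -> dict:
            # Serialize the current span context (e.g. a traceparent
            # entry) and pass it along with the call.
            carrier = {}
            propagate.inject(carrier)
            result = await self.worker.remote(text, carrier)
            return {"result": result}

    return Api.bind(Worker.bind())
```

Passing the carrier as an ordinary argument keeps the forwarding visible in the call signature. Without a configured OpenTelemetry SDK and exporter the spans are no-ops, so nothing is recorded until those are set up in each replica.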
Custom metrics
Use prometheus_client from inside a deployment:
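A minimal sketch of a custom counter follows. The metric name app_items_scored_total, the outcome label, the Scorer deployment, and the helper build_metrics_app are hypothetical. The sketch only shows creating and incrementing the metric. How the replica's prometheus_client registry is exposed to your scraper is not covered here and depends on your setup.

```python
# Hypothetical metric name and label, chosen only for this example.
METRIC_NAME = "app_items_scored_total"
METRIC_LABELS = ["outcome"]


def build_metrics_app():
    """Build a Serve application that counts handled requests."""
    # Imported lazily so the sketch loads without these packages installed.
    from prometheus_client import Counter
    from ray import serve

    @serve.deployment
    class Scorer:  # hypothetical deployment used only for illustration
        def __init__(self):
            # Create the counter once per replica; registering the same
            # name twice in one process raises an error.
            self.scored = Counter(
                METRIC_NAME,
                "Items scored by this replica.",
                METRIC_LABELS,
            )

        def __call__(self, request) -> str:
            # Increment the labelled child counter for each handled request.
            self.scored.labels(outcome="ok").inc()
            return "ok"

    return Scorer.bind()
```

Creating the metric in __init__ rather than at module level ties its lifetime to the replica, so each replica process holds its own counter.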
Next steps
- Production guide: Where these metrics fit in production.
- Observability: Cluster-wide observability.