Metrics

Serve exports Prometheus-compatible metrics on port 8080 (configurable). Key metrics:
Metric                               | Description
ray_serve_num_http_requests_total    | Counter of HTTP requests received.
ray_serve_request_latency_ms         | Histogram of request latency.
ray_serve_num_ongoing_requests       | In-flight requests per replica.
ray_serve_num_replicas               | Current replica count per deployment.
ray_serve_replica_starts_total       | Counter of replica restarts.
ray_serve_deployment_queued_requests | Requests waiting for a replica.
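
To confirm the endpoint is live, you can scrape it directly. A minimal sketch, assuming a local node exporting on the default port 8080 and the conventional /metrics path:

import urllib.request

# Fetch the raw Prometheus exposition text from the node's metrics endpoint.
# Adjust host and port to match your cluster's metrics configuration.
body = urllib.request.urlopen("http://localhost:8080/metrics").read().decode()

# Print only the Serve metric lines.
for line in body.splitlines():
    if line.startswith("ray_serve_"):
        print(line)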

Dashboard

The Ray dashboard’s Serve tab shows:
  • Live replica counts per deployment
  • Per-replica QPS and latency
  • Recent deployment history (rolling updates, restarts)
  • Per-deployment logs

Logging

Each replica logs to /tmp/ray/session_*/logs/serve/. Configure log level per deployment:
from ray import serve

@serve.deployment(logging_config={"log_level": "INFO", "encoding": "JSON"})
class Service:
    ...
For structured logging, set encoding="JSON".
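
Inside a replica, messages written through the "ray.serve" logger land in the same per-replica log files. A minimal sketch:

import logging

from ray import serve

logger = logging.getLogger("ray.serve")

@serve.deployment(logging_config={"log_level": "INFO", "encoding": "JSON"})
class Service:
    def __call__(self, request):
        # Emitted as a structured JSON record in this replica's log file.
        logger.info("handling request")
        return "ok"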

Tracing

Use OpenTelemetry middleware on your FastAPI app:
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# The same app object you pass to @serve.ingress.
api = FastAPI()
FastAPIInstrumentor.instrument_app(api)
Spans propagate across DeploymentHandle calls when the OpenTelemetry context is forwarded.
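
One way to forward it is to inject the current context into a plain dict and pass that dict through the handle call. A sketch, where passing the carrier as an explicit argument is our convention here rather than a Serve API:

from opentelemetry import trace
from opentelemetry.propagate import extract, inject
from ray import serve

tracer = trace.get_tracer(__name__)

@serve.deployment
class Downstream:
    def __call__(self, payload: dict, carrier: dict) -> str:
        # Rebuild the upstream context so this span joins the same trace.
        ctx = extract(carrier)
        with tracer.start_as_current_span("downstream", context=ctx):
            return "done"

@serve.deployment
class Upstream:
    def __init__(self, downstream):
        self.downstream = downstream

    async def __call__(self, request) -> str:
        with tracer.start_as_current_span("upstream"):
            carrier: dict = {}
            inject(carrier)  # serialize the active span context into the dict
            return await self.downstream.remote({"q": 1}, carrier)

app = Upstream.bind(Downstream.bind())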

Custom metrics

Use prometheus_client from inside a deployment:
from prometheus_client import Counter
from ray import serve

predictions_total = Counter("predictions_total", "Predictions made")

@serve.deployment
class Predictor:
    def __init__(self, model):
        self.model = model  # any callable model

    def __call__(self, request):
        predictions_total.inc()
        return self.model(request)
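
Ray also ships its own metrics API, which exports through the same Ray metrics endpoint as the Serve metrics above. A sketch of the same counter using ray.util.metrics; the "model" tag key is an illustrative choice, not required:

from ray import serve
from ray.util.metrics import Counter

@serve.deployment
class TaggedPredictor:
    def __init__(self):
        # Must be created inside the replica process.
        self.predictions = Counter(
            "predictions_total",
            description="Predictions made",
            tag_keys=("model",),
        )

    def __call__(self, request):
        self.predictions.inc(tags={"model": "default"})
        return "ok"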

Next steps

  • Production guide: where these metrics fit in production.
  • Observability: cluster-wide observability.