Metrics

Ray exports time-series metrics in Prometheus format from every component:
  • GCS (cluster control plane)
  • raylet (per-node scheduler)
  • dashboard agent (per-node aggregator)
  • library-specific metrics (Ray Data, Ray Train, Ray Serve)
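
To make the export format concrete, here is a minimal sketch that parses a few lines of Prometheus text exposition format into Python tuples. The `parse_prometheus_text` helper, the metric names, and the sample payload are illustrative assumptions, not output captured from a Ray cluster. The parser is deliberately simplified; a maintained parser is a better fit for real scrapes.

```python
import re

# Matches: metric_name{label="value",...} 12.5   (the label block is optional)
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(payload):
    """Return a list of (name, labels_dict, float_value) samples.

    Simplified: ignores timestamps, does not unescape label values,
    and does not handle a '}' inside a label value.
    """
    samples = []
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical payload, for illustration only.
SAMPLE_PAYLOAD = """\
# HELP example_cpu_utilization Illustrative gauge.
# TYPE example_cpu_utilization gauge
example_cpu_utilization{node="head"} 12.5
example_cpu_utilization{node="worker-1"} 48.0
example_tasks_total 7
"""

samples = parse_prometheus_text(SAMPLE_PAYLOAD)
```

Each sample keeps its labels as a dictionary, so per-node values can be grouped or filtered before they are compared.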

Logs

Each Ray process writes logs to /tmp/ray/session_latest/logs/:
  • dashboard.log
  • gcs_server.out
  • monitor.out (autoscaler)
  • raylet.out
  • worker-*.out / worker-*.err
Worker stdout/stderr is forwarded to the driver by default; pass log_to_driver=False to ray.init() to disable forwarding.
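
The log directory described above can be explored with standard-library Python. The following sketch groups log files by component prefix. The `group_log_files` helper and the file names in the demo are hypothetical, and the demo runs against a temporary directory so that it does not need a cluster.

```python
from collections import defaultdict
from pathlib import Path
import tempfile


def group_log_files(log_dir):
    """Map a component prefix (e.g. 'raylet', 'worker') to its sorted log file names."""
    groups = defaultdict(list)
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # 'worker-abc.out' -> 'worker'; 'gcs_server.out' -> 'gcs_server'
        component = path.stem.split("-", 1)[0]
        groups[component].append(path.name)
    return dict(groups)


# Demo against a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["dashboard.log", "gcs_server.out", "raylet.out",
                 "worker-abc.out", "worker-abc.err"]:
        (Path(tmp) / name).write_text("")
    grouped = group_log_files(tmp)
```

On a node running Ray, the same helper could be pointed at /tmp/ray/session_latest/logs/ instead of the temporary directory.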

State API

A REST and Python API for inspecting the live state of a Ray cluster:
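from ray.util.state import list_actors, list_tasks, list_nodes

# Actors currently in the ALIVE state
list_actors(filters=[("state", "=", "ALIVE")])
# Tasks that ended in the FAILED state
list_tasks(filters=[("state", "=", "FAILED")])
# Nodes known to the cluster
list_nodes()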
The CLI equivalents are ray list actors, ray list tasks, and ray list nodes; ray summary actors prints an aggregated summary rather than individual entries.
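
Each filter passed to list_actors and list_tasks is a (key, predicate, value) tuple. As a rough illustration of how such tuples select records, the sketch below applies them to plain dictionaries. The `matches` helper and the sample records are hypothetical, and this is not how Ray implements filtering. Handling only the `=` and `!=` predicates and comparing values as strings are assumptions of this sketch.

```python
def matches(record, filters):
    """Return True if the record satisfies every (key, predicate, value) filter."""
    for key, predicate, value in filters:
        actual = str(record.get(key))
        if predicate == "=":
            if actual != str(value):
                return False
        elif predicate == "!=":
            if actual == str(value):
                return False
        else:
            raise ValueError(f"unsupported predicate: {predicate!r}")
    return True


# Hypothetical records standing in for State API results.
tasks = [
    {"name": "load", "state": "FINISHED"},
    {"name": "train", "state": "FAILED"},
    {"name": "eval", "state": "RUNNING"},
]

failed = [t for t in tasks if matches(t, [("state", "=", "FAILED")])]
not_failed = [t for t in tasks if matches(t, [("state", "!=", "FAILED")])]
```

Multiple filters in one list are combined with AND in this sketch, so adding a second tuple narrows the result further.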

Profiling

  • py-spy for CPU profiles. Trigger from the dashboard’s Stack Trace action.
  • memray for memory allocations.
  • Chrome trace: run ray timeline -o trace.json, then open the file in chrome://tracing.
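
A trace file can also be inspected outside the browser. The sketch below totals the duration of complete events per name. It assumes the file is a JSON array of Chrome trace events with `name`, `ph`, and `dur` (microseconds) fields. The `summarize_trace` helper and the sample events are hypothetical.

```python
import json
from collections import Counter


def summarize_trace(events):
    """Sum 'dur' (microseconds) per event name for complete ('X') events."""
    totals = Counter()
    for event in events:
        if event.get("ph") == "X":  # complete events carry a duration
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    return dict(totals)


# Hypothetical events in Chrome trace event format, for illustration only.
sample_trace = json.loads("""
[
  {"name": "task:execute", "ph": "X", "ts": 0, "dur": 1500},
  {"name": "task:execute", "ph": "X", "ts": 2000, "dur": 500},
  {"name": "task:store_outputs", "ph": "X", "ts": 2600, "dur": 100}
]
""")

totals = summarize_trace(sample_trace)
```

For a real file, the events could be loaded with json.load on the opened trace.json and passed to the same helper.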

Tracing

OpenTelemetry-style distributed tracing across tasks, actors, and library calls. Configure exporters at the application layer (FastAPI, vLLM); Ray-internal spans are gated behind a feature flag.

Dashboard

Ties metrics, logs, profiles, and the State API into a single browser UI. Available by default at http://<head>:8265, where <head> is the address of the head node.

Next steps

  • Dashboard: browser UI walkthrough.
  • Metrics: Prometheus integration.