Each Ray node exposes Prometheus metrics for the GCS, raylet, object store, and user-facing libraries.
Endpoint
By default, every node exposes metrics on port 8080 (configurable via --metrics-export-port). The dashboard’s auto-discovered Prometheus targets file lives at /tmp/ray/prom_metrics_service_discovery.json.
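To check which endpoints Prometheus would scrape, you can read the targets file directly. The sketch below assumes the file follows the standard Prometheus file-based service discovery layout (a JSON list of groups, each with a `targets` list); the function name `load_scrape_targets` is illustrative and not part of Ray.

```python
import json
from pathlib import Path

# Path taken from the text above; adjust it if your cluster uses a different temp directory.
DISCOVERY_FILE = Path("/tmp/ray/prom_metrics_service_discovery.json")


def load_scrape_targets(path: Path = DISCOVERY_FILE) -> list[str]:
    """Return every "host:port" target listed in a file_sd-style JSON file."""
    groups = json.loads(path.read_text())
    targets: list[str] = []
    for group in groups:
        # Assumed group shape: {"labels": {...}, "targets": ["host:port", ...]}.
        targets.extend(group.get("targets", []))
    return targets
```

On a machine with a running cluster, calling `load_scrape_targets()` with no arguments should list one entry per metrics endpoint.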
Prometheus config
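A minimal way to point Prometheus at a Ray cluster is to reuse the service discovery file through `file_sd_configs`. The sketch below renders such a `prometheus.yml` from Python; the job name, scrape interval, and the helper `render_prometheus_config` are illustrative assumptions, not values the page prescribes.

```python
# Discovery file path as given in the Endpoint section.
DISCOVERY_FILE = "/tmp/ray/prom_metrics_service_discovery.json"


def render_prometheus_config(
    discovery_file: str = DISCOVERY_FILE,
    scrape_interval: str = "15s",  # assumed interval, tune for your cluster
    job_name: str = "ray",         # assumed job label
) -> str:
    """Build a minimal prometheus.yml that scrapes the targets Ray discovers."""
    return (
        "global:\n"
        f"  scrape_interval: {scrape_interval}\n"
        "\n"
        "scrape_configs:\n"
        f"  - job_name: {job_name}\n"
        "    file_sd_configs:\n"
        "      - files:\n"
        f"          - {discovery_file}\n"
    )
```

Write the returned string to `prometheus.yml` and start Prometheus with `--config.file=prometheus.yml`. Because the targets come from a file, Prometheus picks up nodes that join or leave without a config reload.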
Key metrics
| Metric | Description |
|---|---|
| ray_node_cpu_utilization | Per-node CPU usage. |
| ray_object_store_memory_used | Object store bytes in use. |
| ray_tasks | Pending, running, and completed tasks. |
| ray_actors | Actor counts by state. |
| ray_serve_* | Ray Serve request and replica metrics. |
| ray_data_* | Ray Data per-stage throughput and memory. |
| ray_train_* | Ray Train per-worker progress. |
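To spot-check any of these metrics without Prometheus, you can fetch a node's endpoint and filter the text output by name prefix. This is a sketch under assumptions: the URL `http://localhost:8080/metrics` presumes the default port and a local node, and both helper names are hypothetical.

```python
import urllib.request


def fetch_metrics(url: str = "http://localhost:8080/metrics") -> str:
    """Download the raw Prometheus text exposition from one node (assumed URL)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def select_samples(exposition_text: str, prefixes: tuple[str, ...]) -> list[str]:
    """Keep sample lines whose metric name starts with one of the prefixes."""
    selected = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # str.startswith accepts a tuple and matches any of its entries.
        if line.startswith(prefixes):
            selected.append(line)
    return selected
```

For example, `select_samples(fetch_metrics(), ("ray_tasks", "ray_actors"))` returns only the task and actor samples from that node.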
Grafana dashboards
Ray ships a default Grafana dashboard JSON at /tmp/ray/session_latest/metrics/grafana/dashboards/. Import it into your Grafana instance to get out-of-the-box panels.
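Before importing, it can help to see what is in that directory. The sketch below lists each JSON file with the `title` field that Grafana dashboard JSON normally carries; the helper name `list_dashboards` is illustrative, and the directory only exists on a node where Ray has been started.

```python
import json
from pathlib import Path

# Directory taken from the text above.
DASHBOARD_DIR = Path("/tmp/ray/session_latest/metrics/grafana/dashboards")


def list_dashboards(directory: Path = DASHBOARD_DIR) -> dict[str, str]:
    """Map each dashboard JSON file name to the dashboard title it declares."""
    titles: dict[str, str] = {}
    for path in sorted(directory.glob("*.json")):
        dashboard = json.loads(path.read_text())
        # Fall back to a placeholder if the file has no top-level "title".
        titles[path.name] = dashboard.get("title", "<untitled>")
    return titles
```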
Custom metrics
Use ray.util.metrics to register your own:
Next steps
Observability
Tracing and logging alongside metrics.
Dashboard
The built-in cluster dashboard.