
Each Ray node exposes Prometheus metrics for the GCS, raylet, object store, and user-facing libraries.

Endpoint

By default, every node exposes metrics on port 8080 (configurable via --metrics-export-port). Ray also writes a Prometheus file-based service discovery file, listing every node's metrics endpoint, to /tmp/ray/prom_metrics_service_discovery.json.
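As a minimal sketch, the discovery file uses the standard Prometheus file_sd format: a JSON list of entries with "targets" and "labels". The snippet below parses an inline sample of that format so it runs without a live cluster; the actual hosts and ports depend on your nodes.

```python
import json

# Inline sample of the Prometheus file_sd format Ray writes to
# /tmp/ray/prom_metrics_service_discovery.json (contents here are
# illustrative, not taken from a real cluster).
sample = """
[
  {
    "labels": {"job": "ray"},
    "targets": ["127.0.0.1:8080", "10.0.0.2:8080"]
  }
]
"""

# Flatten every entry's targets into one list of host:port metrics endpoints.
targets = [t for entry in json.loads(sample) for t in entry["targets"]]
print(targets)
```

On a running cluster you would pass the real path to json.load instead of the inline sample.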

Prometheus config

scrape_configs:
  - job_name: ray
    file_sd_configs:
      - files:
          - /tmp/ray/prom_metrics_service_discovery.json
For Kubernetes, KubeRay can install Prometheus and Grafana with the bundled charts. See Prometheus + Grafana.

Key metrics

| Metric | Description |
| --- | --- |
| ray_node_cpu_utilization | Per-node CPU usage. |
| ray_object_store_memory_used | Object store bytes in use. |
| ray_tasks | Pending, running, and completed tasks. |
| ray_actors | Actor counts by state. |
| ray_serve_* | Ray Serve request and replica metrics. |
| ray_data_* | Ray Data per-stage throughput and memory. |
| ray_train_* | Ray Train per-worker progress. |
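Once Prometheus is scraping these targets, queries like the following can drive ad-hoc graphs or alerts. The label names (State, Name) are assumptions for illustration; check the Prometheus UI for the exact labels your Ray version emits.

```promql
# Cluster-wide CPU utilization, averaged across nodes
avg(ray_node_cpu_utilization)

# Running tasks broken down by task name (label names assumed)
sum(ray_tasks{State="RUNNING"}) by (Name)
```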

Grafana dashboards

Ray writes default Grafana dashboard JSON files to /tmp/ray/session_latest/metrics/grafana/dashboards/. Import them into your Grafana instance to get out-of-the-box panels.
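Before importing, it can help to inspect what a dashboard JSON contains. The sketch below lists panel titles from an inline sample that mirrors the Grafana dashboard JSON structure (the sample titles are invented; real files live in the directory above).

```python
import json

# Illustrative dashboard JSON with the same top-level shape Grafana uses:
# a "title" plus a "panels" list. Not copied from a real Ray dashboard file.
sample = json.dumps({
    "title": "Ray Dashboard",
    "panels": [
        {"title": "Node CPU", "type": "timeseries"},
        {"title": "Object Store Memory", "type": "timeseries"},
    ],
})

dashboard = json.loads(sample)
# Collect every panel title so you know what the import will add.
titles = [p["title"] for p in dashboard.get("panels", [])]
print(dashboard["title"], titles)
```

Swap the inline sample for open(path) on one of the shipped files to inspect a real dashboard.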

Custom metrics

Use ray.util.metrics to register your own application-level metrics:
from ray.util.metrics import Counter

# Define the metric once (typically in an actor's __init__ or at the start
# of a task), then increment it as events occur.
requests = Counter("my_app_requests_total", description="Requests received")
requests.inc()

Next steps

- Observability: tracing and logging alongside metrics.
- Dashboard: the built-in cluster dashboard.