Prometheus and Grafana

KubeRay’s docs and Helm chart include reference Prometheus rules and Grafana dashboards. To install the monitoring chart:
helm install kuberay-monitoring kuberay/kuberay-monitoring -n monitoring --create-namespace
This installs Prometheus, Grafana, and a ServiceMonitor that scrapes Ray metrics from each pod.

ServiceMonitor

If you already run Prometheus Operator:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ray-cluster
spec:
  selector:
    matchLabels:
      ray.io/node-type: head
  endpoints:
    - port: metrics
      interval: 30s
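Create a second ServiceMonitor, under a different metadata.name, that selects ray.io/node-type: worker so that worker metrics are scraped as well.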
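The head and worker ServiceMonitors differ only in the selector label, so both can be generated from one template. The following is a minimal sketch using only the Python standard library. The resource names ray-cluster-head and ray-cluster-worker and the helper names service_monitor and render are illustrative assumptions, not KubeRay conventions. The output is JSON, which kubectl accepts in place of YAML.

```python
import json


def service_monitor(node_type: str, port: str = "metrics", interval: str = "30s") -> dict:
    """Build a ServiceMonitor manifest that selects one Ray node type."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        # Assumed naming scheme: one ServiceMonitor per node type.
        "metadata": {"name": f"ray-cluster-{node_type}"},
        "spec": {
            "selector": {"matchLabels": {"ray.io/node-type": node_type}},
            "endpoints": [{"port": port, "interval": interval}],
        },
    }


def render(node_types=("head", "worker")) -> str:
    """Bundle one manifest per node type into a single v1 List document."""
    items = [service_monitor(node_type) for node_type in node_types]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    print(render())
```

Saved as a script, the output can be piped to kubectl apply -f - in the namespace where the Ray cluster runs.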

Logs

Ray writes logs to /tmp/ray/session_*/logs/ in each pod. Options for collecting or viewing them:
  • A Fluent Bit DaemonSet that ships logs to your platform’s log backend.
  • kubectl logs, for ad-hoc inspection.
  • The Ray dashboard’s “Logs” tab, which is fine for development but does not provide production retention.
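When debugging inside a pod, it can help to enumerate what sits under the session log directories before configuring a log shipper. The sketch below is illustrative only: the helper name ray_log_files is hypothetical, and it assumes that a session_latest symlink may exist next to the real session directory, so it de-duplicates by resolved path.

```python
from pathlib import Path


def ray_log_files(root: str = "/tmp/ray") -> list[Path]:
    """List regular files under <root>/session_*/logs, visiting each directory once."""
    seen: set[Path] = set()
    files: list[Path] = []
    for log_dir in sorted(Path(root).glob("session_*/logs")):
        real = log_dir.resolve()
        if real in seen:
            # A symlinked session directory points at one we already listed.
            continue
        seen.add(real)
        files.extend(sorted(p for p in real.iterdir() if p.is_file()))
    return files


if __name__ == "__main__":
    for path in ray_log_files():
        print(path)
```

The same glob pattern, session_*/logs, is what a log shipper's tail input would need to match.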

Dashboard access

kubectl port-forward svc/<cluster>-head-svc 8265:8265
For shared access, expose the dashboard through an Ingress with authentication in front of it.
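With the port-forward running, the dashboard can also be checked from a script rather than a browser. The following is a minimal sketch using the standard library. The helper names dashboard_url and get_json are hypothetical, and the api/version path in the usage note is an assumption about the dashboard's HTTP API that may differ between Ray versions.

```python
import json
import urllib.request


def dashboard_url(path: str, host: str = "127.0.0.1", port: int = 8265) -> str:
    """Build a URL for the locally forwarded dashboard port."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def get_json(path: str, timeout: float = 5.0):
    """GET a dashboard path and decode the JSON response body."""
    with urllib.request.urlopen(dashboard_url(path), timeout=timeout) as resp:
        return json.load(resp)
```

For example, get_json("api/version") can serve as a quick connectivity check. It raises urllib.error.URLError if the port-forward is not active.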

Distributed tracing

Set RAY_TRACING_BACKEND and add OpenTelemetry exporters at the application layer:
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
Ray-internal tracing is gated behind a feature flag. For development, setting RAY_BACKEND_LOG_LEVEL=debug raises the verbosity of Ray's backend logs, which helps when diagnosing tracing problems.

Next steps

Metrics catalog

All Ray-exported metrics.

Storage

Persistent log storage.