

The Ray dashboard runs on the head node at port 8265. Default tabs:

Overview

Cluster-wide resource utilization and a list of recent jobs.

Jobs

Every job submitted via ray.init, the Jobs API, or RayJob. Click a job to see its logs, entrypoint, and per-task progress.

Cluster

Live node list with CPU, GPU, memory, and object-store usage per node.

Actors

All actors in the cluster. Filter by name, state, or job. Click an actor to see logs, stack traces, and resource usage.

Tasks

Pending, running, finished, and failed tasks. Useful for diagnosing scheduling stalls.

Logs

A unified log viewer that aggregates worker stderr/stdout across nodes.

Metrics

Embedded Grafana panels (when Prometheus + Grafana are configured) showing object store, scheduler, and library metrics.

Library tabs

When you use Ray Train, Ray Tune, Ray Serve, or Ray Data, dedicated tabs appear with library-specific views — per-trial progress for Tune, per-deployment QPS and replica health for Serve, per-stage timing for Data.

Stack traces

On any worker, the dashboard exposes a “Stack Trace” action that captures a py-spy snapshot; it is available from the actor or task detail page.

Memory

Object store usage per node, with the largest objects highlighted.

Tips

The dashboard is great for live debugging. For long-term history, scrape the Prometheus endpoint and store metrics in your own Prometheus / VictoriaMetrics / Cortex instance.
The dashboard isn’t authenticated by default. Put it behind an auth proxy in production.

Next steps

Metrics

Time-series data behind the dashboard.

State API

Programmatic access.