Head node
The head runs cluster-wide services that don’t have to scale with workers:
- GCS (Global Control Service): cluster metadata, actor registry, placement-group state.
- Cluster autoscaler: requests new worker nodes when the workload needs more resources.
- Dashboard: web UI and API for inspecting the cluster.
- Driver process (optional): drivers can run on the head or on any other machine with network access to the cluster.
Worker node
Workers run user code. Each worker hosts:
- Raylet: schedules tasks and actors locally; coordinates with raylets on other nodes.
- Object store (Plasma): a node-local shared-memory segment for object data.
- Python (or Java/C++) worker processes: run tasks and actors.
Driver
A driver is the process that owns the top-level ray.init() call. The driver submits tasks and actors and consumes their results. There can be many drivers connected to one cluster simultaneously.
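As a rough illustration of what a driver does, the sketch below connects to Ray, submits a few tasks, and collects their results. The names `square` and `run_driver` are hypothetical, and the one-CPU request per task is an arbitrary choice for the example rather than a recommended setting.

```python
from typing import List, Optional


def square(x: int) -> int:
    """Plain Python function; it becomes a Ray task once wrapped below."""
    return x * x


def run_driver(address: Optional[str] = None) -> List[int]:
    """Act as a driver: connect, submit tasks, and consume their results."""
    # Imported lazily so this module loads even where Ray is not installed.
    import ray

    # address=None is the default for ray.init(); pass address="auto" to
    # attach to a cluster that is already running.
    ray.init(address=address)
    try:
        # Wrap the function as a remote task that requests one CPU per call.
        remote_square = ray.remote(num_cpus=1)(square)
        # .remote() returns object references immediately; the work runs on workers.
        futures = [remote_square.remote(i) for i in range(4)]
        # ray.get() blocks until the results are available to the driver.
        return ray.get(futures)
    finally:
        ray.shutdown()
```

Calling `run_driver()` on a machine with Ray installed should return the squares of 0 through 3, in submission order.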
Resources
Each node advertises a resource bundle (CPUs, GPUs, memory, custom labels). Tasks and actors request resources; the scheduler matches requests to nodes. See Scheduling.
Cluster autoscaler
The autoscaler watches pending resource requests and node utilization. When demand outstrips supply, it requests new nodes from the underlying provider (Kubernetes, AWS Auto Scaling Groups, GCP Managed Instance Groups, etc.). When nodes have been idle past a timeout, it terminates them.
Dashboard
Available at http://<head-ip>:8265. Shows:
- Node-level resource utilization
- Live task and actor lists
- Logs and stack traces per worker
- Ray Train, Ray Tune, Ray Serve, and Ray Data sub-tabs
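The dashboard address can also be used programmatically. The sketch below builds the address from the head node's IP and, assuming Ray is installed and the head is reachable, lists submitted jobs through `JobSubmissionClient` from `ray.job_submission`. The helper names `dashboard_url` and `list_job_statuses` are hypothetical.

```python
from typing import List, Tuple


def dashboard_url(head_ip: str, port: int = 8265) -> str:
    """Build the dashboard address from the head node's IP and port."""
    return f"http://{head_ip}:{port}"


def list_job_statuses(head_ip: str) -> List[Tuple[object, str]]:
    """Return (submission_id, status) pairs for jobs known to the cluster."""
    # Imported lazily so this module loads even where Ray is not installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url(head_ip))
    # submission_id may be None for jobs that were not submitted through the Jobs API.
    return [(job.submission_id, str(job.status)) for job in client.list_jobs()]
```

For example, `dashboard_url("10.0.0.5")` yields `http://10.0.0.5:8265`, the same address you would open in a browser.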
Object spilling
When the object store on a node fills, Ray spills cold objects to local disk (or external storage like S3). Configure spilling targets to handle workloads larger than aggregate memory.
Networking
Ports a cluster typically uses:
- 6379: GCS port (ray start --port).
- 10001: Ray Client server.
- 8265: Dashboard / Jobs API.
- Random ports: raylet, object manager, worker shims.
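When debugging connectivity, it can help to check whether the fixed ports listed above are reachable from another machine. The following is a minimal sketch using only the standard library. The `DEFAULT_PORTS` labels and both helper names are hypothetical, and a successful TCP connection only shows that something is listening, not that it is a Ray service.

```python
import socket
from typing import Dict

# Fixed ports from the list above, with illustrative labels.
DEFAULT_PORTS = {"gcs": 6379, "ray_client": 10001, "dashboard": 8265}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_cluster_ports(host: str, timeout: float = 1.0) -> Dict[str, bool]:
    """Probe each fixed port on the given host and report reachability."""
    return {
        name: port_is_open(host, port, timeout)
        for name, port in DEFAULT_PORTS.items()
    }
```

Running `check_cluster_ports("<head-ip>")` from a worker or client machine gives a quick view of which ports a firewall or security group may be blocking. The randomly assigned ports are not covered by this check.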
Next steps
- CLI: cluster lifecycle commands.
- Dashboard: configure and secure the dashboard.