Prerequisites
- A Kubernetes cluster with GPU node pools (EKS + p4 nodes, GKE + accelerator nodes, AKS + N-series, etc.).
- The NVIDIA device plugin or your cloud provider's GPU operator installed, so pods can request nvidia.com/gpu.
GPU worker group
workerGroupSpecs:
- groupName: gpu
  replicas: 0
  minReplicas: 0
  maxReplicas: 8
  rayStartParams: {}
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-h100-80gb  # GKE example
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - name: ray-worker
        image: rayproject/ray:2.43.0-gpu
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            nvidia.com/gpu: 1
          limits:
            cpu: "8"
            memory: "32Gi"
            nvidia.com/gpu: 1
Image
Use the -gpu Ray image (CUDA + cuDNN preinstalled), or build your own from nvidia/cuda:*.
Request GPUs from tasks/actors
import ray

@ray.remote(num_gpus=1)
def predict(batch):  # 'batch' is a placeholder argument
    ...
The Ray scheduler places the task on a worker with a free GPU; if none is available, the autoscaler adds one.
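A minimal sketch of what a scheduled task sees, assuming a running cluster and the hypothetical show_assignment task below: Ray restricts each task to its assigned device via CUDA_VISIBLE_DEVICES.
import os
import ray

ray.init()

@ray.remote(num_gpus=1)
def show_assignment():
    # Ray sets CUDA_VISIBLE_DEVICES so the task sees only its assigned GPU.
    return ray.get_gpu_ids(), os.environ.get("CUDA_VISIBLE_DEVICES")

print(ray.get(show_assignment.remote()))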
Multi-GPU per actor
@ray.remote(num_gpus=2)
class BigModel:
    ...
Combine with placement groups for tensor-parallel models that span GPUs.
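A sketch of one way to do this, assuming four single-GPU bundles and a hypothetical Shard actor: reserve co-located GPUs with a placement group, then pin one actor to each bundle.
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init()

# Reserve four 1-GPU bundles, packed onto as few nodes as possible.
pg = placement_group([{"GPU": 1}] * 4, strategy="PACK")
ray.get(pg.ready())  # blocks until the GPUs are reserved (the autoscaler may add nodes)

@ray.remote(num_gpus=1)
class Shard:
    ...

shards = [
    Shard.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg)
    ).remote()
    for _ in range(4)
]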
Different GPU types
Attach a custom resource to each worker group to differentiate GPU classes. For example, on the H100 group:
rayStartParams:
  resources: '"{\"H100\": 1}"'
@ray.remote(num_gpus=1, resources={"H100": 1})
def heavy_inference(batch):
    ...
Verify
kubectl exec -it <head-pod> -- ray status
The output should show GPUs in the cluster's resource summary.
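You can also check from a driver, a sketch assuming an in-cluster ray.init() (the ray:// head service address is a hypothetical example):
import ray

ray.init()  # from outside the cluster: ray.init(address="ray://<head-svc>:10001")

# GPU worker capacity appears under the "GPU" key.
print(ray.cluster_resources().get("GPU", 0))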
Next steps
- TPU: Run on Google Cloud TPUs.
- Autoscaling: Scale GPU pools on demand.