Prerequisites

  • A Kubernetes cluster with GPU node pools (EKS + p4 nodes, GKE + accelerator nodes, AKS + N-series, etc.).
  • The NVIDIA device plugin or your cloud provider’s GPU operator installed, so that nodes advertise the nvidia.com/gpu resource (a quick check is sketched below).
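
If you are unsure whether the device plugin is working, check that nodes report an allocatable nvidia.com/gpu count. A minimal sketch using the official kubernetes Python client (an extra dependency, not part of Ray; it assumes a reachable kubeconfig):

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    gpus = node.status.allocatable.get("nvidia.com/gpu", "0")
    print(f"{node.metadata.name}: {gpus} allocatable nvidia.com/gpu")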

GPU worker group

workerGroupSpecs:
  - groupName: gpu
    replicas: 0
    minReplicas: 0
    maxReplicas: 8
    rayStartParams: {}
    template:
      spec:
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-h100-80gb   # GKE example
        tolerations:
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
        containers:
          - name: ray-worker
            image: rayproject/ray:2.43.0-gpu
            resources:
              requests:
                cpu: "8"
                memory: "32Gi"
                nvidia.com/gpu: 1
              limits:
                cpu: "8"
                memory: "32Gi"
                nvidia.com/gpu: 1

Image

Use the -gpu variant of the Ray image (CUDA and cuDNN preinstalled), or build your own from an nvidia/cuda:* base image.

Request GPUs from tasks/actors

import ray

@ray.remote(num_gpus=1)
def predict(batch):
    ...

The Ray scheduler matches the request to a GPU worker. If none exists, the autoscaler adds one.
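
Inside the task, Ray restricts the process to its assigned device via CUDA_VISIBLE_DEVICES, and ray.get_gpu_ids() reports which GPU was granted. A minimal sketch (gpu_info is illustrative; run it from a node in the cluster, e.g. as a Ray job):

import os
import ray

ray.init(address="auto")  # connect to the running cluster

@ray.remote(num_gpus=1)
def gpu_info():
    # Ray sets CUDA_VISIBLE_DEVICES to the GPU reserved for this task.
    return ray.get_gpu_ids(), os.environ.get("CUDA_VISIBLE_DEVICES")

print(ray.get(gpu_info.remote()))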

Multi-GPU per actor

@ray.remote(num_gpus=2)
class BigModel:
    ...
Combine with placement groups for tensor-parallel models that span GPUs.
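
A sketch of that pattern, assuming two model shards that each need 2 GPUs (ModelShard, the bundle sizes, and the PACK strategy are illustrative; adjust them to your parallelism):

import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init(address="auto")

# Reserve two 2-GPU bundles up front; the autoscaler adds workers until they fit.
pg = placement_group([{"GPU": 2}, {"GPU": 2}], strategy="PACK")
ray.get(pg.ready())  # blocks until the GPUs are reserved

@ray.remote(num_gpus=2)
class ModelShard:
    def gpu_ids(self):
        return ray.get_gpu_ids()

# Pin each shard to the placement group so the GPUs stay co-scheduled.
shards = [
    ModelShard.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg)
    ).remote()
    for _ in range(2)
]
print(ray.get([s.gpu_ids.remote() for s in shards]))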

Different GPU types

Tag custom resources to differentiate GPU classes:
rayStartParams:
  resources: '"{\"H100\": 1}"'

@ray.remote(num_gpus=1, resources={"H100": 1})
def heavy_inference(batch):
    ...
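
Once an H100 worker has joined, the custom tag appears alongside GPU in the cluster totals; a quick sketch to confirm before scheduling on it:

import ray

ray.init(address="auto")
# Custom resources show up next to CPU/GPU/memory in the cluster totals.
# With replicas: 0 the tag appears only after the autoscaler adds an H100 worker.
print(ray.cluster_resources().get("H100", 0))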

Verify

kubectl exec -it <head-pod> -- ray status
The output should list GPUs in the cluster’s resource summary.
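
For an end-to-end check from Python, a one-off GPU task can run nvidia-smi on a worker (a sketch; it assumes nvidia-smi is available inside the worker container, which the NVIDIA container runtime normally provides):

import subprocess
import ray

ray.init(address="auto")

@ray.remote(num_gpus=1)
def smoke_test():
    # Lists the GPUs visible inside the worker container.
    return subprocess.check_output(["nvidia-smi", "-L"]).decode()

print(ray.get(smoke_test.remote()))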

Next steps

  • TPU: Run on Google Cloud TPUs.
  • Autoscaling: Scale GPU pools on demand.