KubeRay supports Google Cloud TPU pods (v4, v5e, v5p) on GKE.
Prerequisites
- A GKE cluster with a TPU node pool.
- A Ray container image that includes libtpu (either the TPU initialization image bundled with the Ray image, or a custom image with libtpu installed).
TPU node pool
Provision a TPU node pool with gcloud:
gcloud container node-pools create tpu-pool \
  --cluster=my-cluster \
  --machine-type=ct5lp-hightpu-4t \
  --node-locations=us-central2-b \
  --num-nodes=1
Worker group
Add a TPU worker group to the RayCluster spec:
workerGroupSpecs:
- groupName: tpu
  replicas: 0
  minReplicas: 0
  maxReplicas: 4
  rayStartParams: {}
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
        cloud.google.com/gke-tpu-topology: 2x4
      containers:
      - name: ray-worker
        image: rayproject/ray:2.43.0
        resources:
          requests:
            google.com/tpu: 4
          limits:
            google.com/tpu: 4
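The gke-tpu-topology value describes the chip grid of the slice: a 2x4 v5e slice is 8 chips, and since ct5lp-hightpu-4t machines expose 4 chips per VM, the slice spans 2 hosts (hence `google.com/tpu: 4` per pod). A small sketch that derives these counts (hypothetical helper for illustration; not part of KubeRay):

```python
# Hypothetical helper: derive chip and host counts from a GKE TPU
# topology string such as "2x4". Illustration only, not part of KubeRay.
from math import prod

def slice_shape(topology: str, chips_per_host: int = 4):
    """Return (total_chips, num_hosts) for a topology like "2x4".

    chips_per_host=4 matches ct5lp-hightpu-4t (v5e, 4 chips per VM);
    other machine types expose different counts.
    """
    dims = [int(d) for d in topology.split("x")]
    chips = prod(dims)               # chips = product of topology dims
    return chips, chips // chips_per_host

print(slice_shape("2x4"))  # 8 chips served by 2 hosts
```

Each host in a multi-host slice runs its own Ray worker pod, so maxReplicas should be a multiple of the host count per slice.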
Request TPUs from a task
import ray

@ray.remote(resources={"TPU": 4})
def train_step():
    import jax  # import inside the task so it runs on the TPU worker
    print(jax.devices())
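Ray places the task only on a node whose custom "TPU" resource can cover the request, and deducts the claim while the task runs. A toy model of that bookkeeping (illustrative only; Ray's real scheduler is far more involved):

```python
# Toy model of custom-resource scheduling: a node advertises capacities
# (e.g. "TPU": 4) and a task is placed only if its request fits what
# remains. Illustrative only, not Ray's actual scheduler.

def try_schedule(available: dict, request: dict) -> bool:
    """Deduct `request` from `available` if every resource fits."""
    if all(available.get(r, 0) >= n for r, n in request.items()):
        for r, n in request.items():
            available[r] -= n
        return True
    return False

node = {"CPU": 8, "TPU": 4}            # one v5e worker pod: 4 chips
print(try_schedule(node, {"TPU": 4}))  # True: the task claims all 4 chips
print(try_schedule(node, {"TPU": 4}))  # False: no chips free until released
```

This is why the task above asks for all 4 chips of a single-host pod: a second TPU task triggers the autoscaler to add another worker rather than sharing the slice.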
JAX example
import jax
import ray

@ray.remote(resources={"TPU": 4})
def jax_demo():
    @jax.jit
    def fn(x):
        return x @ x.T
    # Runs on the TPU worker that satisfies the "TPU" resource request.
    return fn(jax.numpy.ones((1024, 1024))).sum()

print(ray.get(jax_demo.remote()))
Tips
TPU slices initialize more slowly than GPU nodes. Raise the autoscaler's idleTimeoutSeconds (e.g., to 600) so slices are not torn down and re-provisioned during interactive use.
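With the in-tree autoscaler enabled, the timeout is set under autoscalerOptions in the RayCluster spec. A sketch of the relevant fragment (verify field names against your KubeRay version):

```yaml
# RayCluster fragment: keep idle TPU workers for 10 minutes before
# scale-down. Check these fields against your KubeRay version.
spec:
  enableInTreeAutoscaling: true
  autoscalerOptions:
    idleTimeoutSeconds: 600
```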
Next steps
Storage
Mount GCS for TPU workflows.