Ray jobs often need persistent storage for datasets, checkpoints, and logs. The standard Kubernetes options apply: emptyDir, PersistentVolumeClaim, hostPath, and CSI-backed object storage.

PVC

template:
  spec:
    containers:
      - name: ray-worker
        volumeMounts:
          - name: data
            mountPath: /data
    volumes:
      - name: data
        persistentVolumeClaim:
          claimName: ray-data-pvc
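The claim itself is created ahead of time as an ordinary PersistentVolumeClaim. A minimal sketch, reusing the ray-data-pvc name above; the size and access mode are placeholders for your environment:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ray-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
Keep in mind that a ReadWriteOnce volume can only be mounted by pods on a single node; if workers on different nodes need to share one claim, use a storage class that supports ReadWriteMany.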

Object storage

Use the cloud’s CSI driver to mount S3 / GCS / Azure Blob as a filesystem, or call them directly from your Python code (s3fs, gcsfs). For GKE, the GCS Fuse CSI driver mounts a bucket as a directory:
volumes:
  - name: ckpts
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: my-bucket
        mountOptions: "implicit-dirs"
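On GKE the FUSE sidecar is only injected when the pod opts in with an annotation, and the volume still needs a matching volumeMount on the Ray container. A sketch of the pod-template side, reusing the ckpts volume above with an illustrative /ckpts mount path:
metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  containers:
    - name: ray-worker
      volumeMounts:
        - name: ckpts
          mountPath: /ckpts
The pod's Kubernetes service account also needs access to the bucket, typically granted through Workload Identity.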

Shared filesystems

For NFS, EFS, or FSx for Lustre:
volumes:
  - name: shared
    nfs:
      server: nfs.internal
      path: /exports/data
Mount at the same path on every node (head and workers) so worker code can use absolute paths consistently.
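With KubeRay that means repeating the same volumeMount in the head group and in every worker group template. A sketch, assuming the shared volume above and /data/shared as the common path (group and container names are illustrative):
headGroupSpec:
  template:
    spec:
      containers:
        - name: ray-head
          volumeMounts:
            - name: shared
              mountPath: /data/shared
workerGroupSpecs:
  - groupName: workers
    template:
      spec:
        containers:
          - name: ray-worker
            volumeMounts:
              - name: shared
                mountPath: /data/shared
The volumes entry shown above is repeated in each pod spec alongside the volumeMount.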

Object spilling

When the object store fills, Ray spills cold objects to disk. Pick a fast scratch location:
rayStartParams:
  temp-dir: "/local-ssd/ray"
Back that path with fast local scratch space, for example an emptyDir on an SSD-backed node (with a matching volumeMount at /local-ssd/ray on the container):
volumes:
  - name: scratch
    emptyDir: { sizeLimit: 200Gi }

Best practices

For large training jobs, write checkpoints to durable cloud storage (for example, Ray Train's storage_path: s3://...) rather than to a local PVC. A PVC is tied to the cluster and usually to a single zone's block storage; cloud object storage outlives the pods, the nodes, and the cluster.

Next steps

Configuration

Pod-level settings.

Troubleshooting

Common storage issues.