Ray jobs often need persistent storage for datasets, checkpoints, and logs. The standard Kubernetes options apply: emptyDir, PersistentVolumeClaim, hostPath, and CSI-backed object storage.

PVC

template:
  spec:
    containers:
      - name: ray-worker
        volumeMounts:
          - name: data
            mountPath: /data
    volumes:
      - name: data
        persistentVolumeClaim:
          claimName: ray-data-pvc
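The claim itself is created ahead of time as an ordinary PersistentVolumeClaim. A minimal sketch, reusing the ray-data-pvc name above; the size and access mode are placeholders for your environment:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ray-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
Keep in mind that a ReadWriteOnce volume can only be mounted by pods on a single node; if workers on different nodes need to share one claim, use a storage class that supports ReadWriteMany.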

Object storage

Use the cloud’s CSI driver to mount S3 / GCS / Azure Blob as a filesystem, or call them directly from your Python code (s3fs, gcsfs). For GKE, the GCS Fuse CSI driver mounts a bucket as a directory:
volumes:
  - name: ckpts
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: my-bucket
        mountOptions: "implicit-dirs"
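On GKE the FUSE sidecar is only injected when the pod opts in with an annotation, and the volume still needs a matching volumeMount on the Ray container. A sketch of the pod-template side, reusing the ckpts volume above with an illustrative /ckpts mount path:
metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  containers:
    - name: ray-worker
      volumeMounts:
        - name: ckpts
          mountPath: /ckpts
The pod's Kubernetes service account also needs access to the bucket, typically granted through Workload Identity.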

Shared filesystems

For NFS, EFS, or FSx for Lustre:
volumes:
  - name: shared
    nfs:
      server: nfs.internal
      path: /exports/data
Mount at the same path on every node (head and workers) so worker code can use absolute paths consistently.
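With KubeRay that means repeating the same volumeMount in the head group and in every worker group template. A sketch, assuming the shared volume above and /data/shared as the common path (group and container names are illustrative):
headGroupSpec:
  template:
    spec:
      containers:
        - name: ray-head
          volumeMounts:
            - name: shared
              mountPath: /data/shared
workerGroupSpecs:
  - groupName: workers
    template:
      spec:
        containers:
          - name: ray-worker
            volumeMounts:
              - name: shared
                mountPath: /data/shared
The volumes entry shown above is repeated in each pod spec alongside the volumeMount.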

Object spilling

When the object store fills, Ray spills cold objects to disk. Pick a fast scratch location:
rayStartParams:
  temp-dir: "/local-ssd/ray"
Back that path with fast local scratch space, for example an emptyDir on an SSD-backed node (with a matching volumeMount at /local-ssd/ray on the container):
volumes:
  - name: scratch
    emptyDir: { sizeLimit: 200Gi }

Best practices

For large training jobs, write checkpoints to durable cloud storage (for example, Ray Train's storage_path: s3://...) rather than to a local PVC. A PVC is tied to the cluster and usually to a single zone's block storage; cloud object storage outlives the pods, the nodes, and the cluster.

Next steps

Configuration

Pod-level settings.

Troubleshooting

Common storage issues.