Pattern. Build pipelines by passing ObjectRefs between tasks. Ray transfers data between nodes as needed and can schedule downstream stages near upstream outputs.
import ray

@ray.remote
def load(uri): ...

@ray.remote
def transform(data): ...

@ray.remote
def score(data): ...

# Each stage receives the previous stage's ObjectRef; no intermediate ray.get
results = [score.remote(transform.remote(load.remote(u))) for u in uris]
ray.get(results)
Anti-pattern. Don't call ray.get between stages; each get blocks the driver until that stage finishes, serializing the pipeline.
Pattern. Create a pool of actors and fan requests out across them; ray.util.ActorPool dispatches each item to the next free actor.
from ray.util import ActorPool

# Predictor is a @ray.remote actor class defined elsewhere
pool = ActorPool([Predictor.remote() for _ in range(8)])
for result in pool.map(lambda actor, item: actor.predict.remote(item), inputs):
    ...
Pattern. Many small tasks add scheduling overhead. Batch them when possible.
@ray.remote
def process_batch(items):
    # process is the per-item function defined elsewhere
    return [process(i) for i in items]

batches = [items[i:i + 128] for i in range(0, len(items), 128)]
results = ray.get([process_batch.remote(b) for b in batches])
Anti-pattern. Submitting one task per row of a dataset is rarely the right shape — use Ray Data for that.
Calling ray.get from inside a remote function blocks a worker process while it waits, defeating the point of distribution. If a task needs the result of another, refactor so the dependency is passed as an argument instead; Ray resolves ObjectRef arguments before the task runs.