
Ray Serve supports gRPC alongside HTTP. Define a .proto file, compile it, and Serve's gRPC proxy routes each RPC to the method of the same name on your deployment.

Install

pip install -U "ray[serve-grpc]"
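The compile step below uses grpc_tools.protoc from the grpcio-tools package; if it isn't already present in your environment, install it as well:

pip install grpcio-tools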

Define a service

my_service.proto:
syntax = "proto3";
package my;

message PredictRequest { string text = 1; }
message PredictResponse { string label = 1; }

service Predictor {
  rpc Predict(PredictRequest) returns (PredictResponse);
}
Compile:
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. my_service.proto
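Compilation produces my_service_pb2.py (the message classes) and my_service_pb2_grpc.py (the servicer registration function and client stub), which the deployment and client code below import.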

Implement the deployment

from ray import serve
from ray.serve.config import gRPCOptions

import my_service_pb2
import my_service_pb2_grpc

@serve.deployment
class Predictor:
    # The method name must match the RPC name declared in the .proto file.
    # Serve's gRPC proxy passes the request protobuf directly to the method.
    def Predict(self, request: my_service_pb2.PredictRequest) -> my_service_pb2.PredictResponse:
        # classify() stands in for the user's inference logic.
        return my_service_pb2.PredictResponse(label=classify(request.text))

serve.start(grpc_options=gRPCOptions(
    port=9000,
    grpc_servicer_functions=["my_service_pb2_grpc.add_PredictorServicer_to_server"],
))
serve.run(Predictor.bind())

Call the service

import grpc
import my_service_pb2
import my_service_pb2_grpc

with grpc.insecure_channel("localhost:9000") as channel:
    stub = my_service_pb2_grpc.PredictorStub(channel)
    response = stub.Predict(my_service_pb2.PredictRequest(text="hello"))
    print(response.label)
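When more than one Serve application is running behind the gRPC proxy, the target application is selected through request metadata. A minimal sketch, assuming the application was deployed under the name "app1":

with grpc.insecure_channel("localhost:9000") as channel:
    stub = my_service_pb2_grpc.PredictorStub(channel)
    response = stub.Predict(
        my_service_pb2.PredictRequest(text="hello"),
        metadata=(("application", "app1"),),  # name the application was deployed under
    )
    print(response.label)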

Streaming

gRPC server streaming is supported: declare the RPC with the stream keyword in the .proto file and yield responses from the deployment method.

def StreamPredict(self, request):
    # Requires `rpc StreamPredict(PredictRequest) returns (stream PredictResponse);` in the proto.
    # generate() stands in for token-by-token model output.
    for token in generate(request.text):
        yield my_service_pb2.PredictResponse(label=token)
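The client reads the stream by iterating over the call. A sketch, assuming the proto declares the StreamPredict RPC above:

with grpc.insecure_channel("localhost:9000") as channel:
    stub = my_service_pb2_grpc.PredictorStub(channel)
    # Server-streaming calls return an iterator of responses.
    for response in stub.StreamPredict(my_service_pb2.PredictRequest(text="hello")):
        print(response.label)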

Coexist with HTTP

You can run both protocols at once. Configure HTTP on port 8000 and gRPC on port 9000; the same deployment can implement both interfaces.
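A minimal sketch of one deployment serving both protocols, reusing the Predictor proto from above (classify is still the placeholder inference function):

from starlette.requests import Request

from ray import serve
from ray.serve.config import gRPCOptions, HTTPOptions

import my_service_pb2

@serve.deployment
class Predictor:
    def Predict(self, request: my_service_pb2.PredictRequest) -> my_service_pb2.PredictResponse:
        # gRPC entry point, matched by RPC name.
        return my_service_pb2.PredictResponse(label=classify(request.text))

    async def __call__(self, http_request: Request) -> str:
        # HTTP entry point; Serve passes the Starlette request object.
        body = await http_request.json()
        return classify(body["text"])

serve.start(
    http_options=HTTPOptions(port=8000),
    grpc_options=gRPCOptions(
        port=9000,
        grpc_servicer_functions=["my_service_pb2_grpc.add_PredictorServicer_to_server"],
    ),
)
serve.run(Predictor.bind())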

Next steps

HTTP guide

HTTP routing options.

Production guide

Capacity planning, observability, deployment.