Ray Serve supports gRPC alongside HTTP. Define a .proto file, compile it, and bind your deployment as a gRPC service.
Install
pip install -U "ray[serve-grpc]"
Define a service
my_service.proto:
syntax = "proto3";
package my;
message PredictRequest { string text = 1; }
message PredictResponse { string label = 1; }
service Predictor {
  rpc Predict(PredictRequest) returns (PredictResponse);
}
Compile it into Python modules (this generates my_service_pb2.py and my_service_pb2_grpc.py):
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. my_service.proto
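The grpc_tools module used above comes from the grpcio-tools package, which the Serve extra may not install; if the compile command fails with an import error, add it first:
pip install grpcio-tools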
Implement the deployment
from ray import serve
from ray.serve.config import gRPCOptions

import my_service_pb2
import my_service_pb2_grpc

@serve.deployment
class Predictor(my_service_pb2_grpc.PredictorServicer):
    def Predict(self, request, context):
        # classify() stands in for your own model or business logic.
        return my_service_pb2.PredictResponse(label=classify(request.text))

# Point the gRPC proxy at the generated registration function.
serve.start(grpc_options=gRPCOptions(
    port=9000,
    grpc_servicer_functions=["my_service_pb2_grpc.add_PredictorServicer_to_server"],
))

serve.run(Predictor.bind())
Call the service
import grpc

import my_service_pb2
import my_service_pb2_grpc

with grpc.insecure_channel("localhost:9000") as channel:
    stub = my_service_pb2_grpc.PredictorStub(channel)
    response = stub.Predict(my_service_pb2.PredictRequest(text="hello"))
    print(response.label)
Streaming
gRPC server-streaming is supported by yielding from the service method; the RPC must also be declared with a stream return type in the .proto file (see the sketch after this snippet).
def StreamPredict(self, request, context):
    for token in generate(request.text):
        # generate() stands in for your own token-producing logic.
        yield my_service_pb2.PredictResponse(label=token)
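For the method above to have a matching RPC, the service definition needs a streaming entry. A minimal sketch, assuming the StreamPredict name used above (it is not part of the original Predictor service):
service Predictor {
  rpc Predict(PredictRequest) returns (PredictResponse);
  rpc StreamPredict(PredictRequest) returns (stream PredictResponse);
}
On the client side, a server-streaming stub method returns an iterator of responses:
for chunk in stub.StreamPredict(my_service_pb2.PredictRequest(text="hello")):
    print(chunk.label)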
Coexist with HTTP
You can run both protocols at once. Configure HTTP on port 8000 and gRPC on port 9000; the same deployment can implement both interfaces.
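A minimal sketch of such a configuration; HTTPOptions and the http_options keyword of serve.start come from Ray Serve's config module rather than from this guide, and the ports are just examples:
from ray import serve
from ray.serve.config import gRPCOptions, HTTPOptions

serve.start(
    # HTTP proxy for REST-style clients.
    http_options=HTTPOptions(host="0.0.0.0", port=8000),
    # gRPC proxy for the Predictor service defined above.
    grpc_options=gRPCOptions(
        port=9000,
        grpc_servicer_functions=["my_service_pb2_grpc.add_PredictorServicer_to_server"],
    ),
)
serve.run(Predictor.bind())
For the HTTP side, the deployment would additionally expose an HTTP entry point as described in the HTTP guide.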
Next steps
HTTP guide
HTTP routing options.
Production guide
Capacity planning, observability, deployment.