LoRA adapters add a small, model-specific layer on top of a frozen base model. Ray Serve can load many adapters per replica and route each request to the right one.Documentation Index
Fetch the complete documentation index at: https://ray-preview.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Configure
dynamic_lora_loading_path is the directory or S3 prefix Ray Serve looks in for adapters. Each adapter is a subdirectory whose name becomes the model ID.
Call with an adapter
my-lora-id isn’t loaded, Serve loads it from dynamic_lora_loading_path. If the per-replica adapter cache is full, the least-recently-used adapter is evicted.
Best practices
Next steps
Serving
Production deployment guidance.
Configuration
Engine and resource tuning.