GitHub - distantmagic/paddler: Stateful load balancer custom-tailored for llama.cpp
Paddler
Paddler is an open-source load balancer and reverse proxy designed specifically for optimizing servers running llama.cpp.
Typical load balancing strategies such as round robin or least connections are ineffective for llama.cpp servers, which rely on slots for continuous batching and concurrent requests.
Paddler overcomes this by acting as a stateful load balancer that tracks each server's available slots, so requests are distributed efficiently. Additionally, Paddler uses agents to monitor the healt...
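
To illustrate slot-aware balancing in general terms, here is a minimal Python sketch. It is not Paddler's actual implementation: the `Upstream` and `SlotAwareBalancer` names, the fields, and the "most idle slots wins" rule are all assumptions made for illustration. The point is only that the balancer keeps per-server slot state and routes on it, rather than rotating blindly through servers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Upstream:
    """Hypothetical record for one llama.cpp server (illustrative only)."""
    name: str
    slots_idle: int
    slots_processing: int = 0


class SlotAwareBalancer:
    """Toy stateful balancer: tracks idle slots per upstream server."""

    def __init__(self, upstreams: Iterable[Upstream]) -> None:
        self.upstreams: Dict[str, Upstream] = {u.name: u for u in upstreams}

    def acquire(self) -> Optional[Upstream]:
        """Reserve a slot on the server with the most idle slots.

        Returns None when every slot is busy, so the caller can queue
        or reject the request instead of overloading a server.
        """
        candidates = [u for u in self.upstreams.values() if u.slots_idle > 0]
        if not candidates:
            return None
        best = max(candidates, key=lambda u: u.slots_idle)
        best.slots_idle -= 1
        best.slots_processing += 1
        return best

    def release(self, name: str) -> None:
        """Mark one slot on the named server as idle again."""
        upstream = self.upstreams[name]
        if upstream.slots_processing > 0:
            upstream.slots_processing -= 1
            upstream.slots_idle += 1


# Server "a" has 1 free slot, server "b" has 3.
balancer = SlotAwareBalancer([
    Upstream("a", slots_idle=1),
    Upstream("b", slots_idle=3),
])
picked = [balancer.acquire().name for _ in range(4)]
print(picked)  # ['b', 'b', 'a', 'b']
```

Round robin would have alternated between `a` and `b` and sent a second request to `a` while it had no free slot. The stateful version sends three of the four requests to `b` and reports that no capacity is left once all four slots are taken.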