Debugging Azure Networking for Elastic Cloud Serverless — Elastic Observability Labs
Summary of Findings
Elastic's Site Reliability Engineering (SRE) team observed unstable throughput and packet loss in Elastic Cloud Serverless running on Azure Kubernetes Service (AKS). Our investigation traced the problem primarily to two factors on SR-IOV interfaces: RX ring buffer overflows and saturation of the kernel input queue. To address both, we increased the RX ring buffer sizes and raised the netdev backlog limit, which significantly improved network stability.
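As a rough illustration of how input queue saturation can be spotted on a Linux node, the sketch below parses `/proc/net/softnet_stat`, where each row holds hexadecimal per-CPU counters and the second column counts packets dropped because the backlog queue was full. This is not the tooling the SRE team used; the function names `parse_softnet_stat` and `cpus_with_drops` are hypothetical, and the row index is only assumed to match the CPU number (it may not when CPUs are offline). RX ring sizes are typically inspected and changed with `ethtool -g` and `ethtool -G`, and the backlog limit with the `net.core.netdev_max_backlog` sysctl; the specific values depend on the workload and are not given here.

```python
from pathlib import Path

SOFTNET_STAT = Path("/proc/net/softnet_stat")  # standard Linux procfs location


def parse_softnet_stat(text):
    """Parse softnet_stat text into one dict of counters per row.

    The first three hex columns are: packets processed, packets dropped
    because the input (backlog) queue was full, and time_squeeze events.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({
            "cpu": len(rows),  # assumption: row order matches CPU order
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": time_squeeze,
        })
    return rows


def cpus_with_drops(rows):
    """Return the row indexes whose backlog drop counter is non-zero."""
    return [row["cpu"] for row in rows if row["dropped"] > 0]


if __name__ == "__main__" and SOFTNET_STAT.exists():
    stats = parse_softnet_stat(SOFTNET_STAT.read_text())
    for row in stats:
        print(row)
    print("CPUs with backlog drops:", cpus_with_drops(stats))
```

A drop counter that keeps growing between two readings suggests the backlog limit is being hit. The counters are cumulative since boot, so comparing successive samples is more informative than a single snapshot.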
Setting the Scene
Elastic Cloud ...