Making AMD GPUs competitive for LLM inference
Aug 9, 2023
TL;DR
MLC-LLM makes it possible to compile LLMs and deploy them on AMD GPUs using ROCm with competitive performance. More specifically, the AMD Radeon™ RX 7900 XTX reaches 80% of the speed of the NVIDIA® GeForce RTX™ 4090 and 94% of the speed of the NVIDIA® GeForce RTX™ 3090 Ti for Llama2-7B/13B. Besides ROCm, our Vulkan support lets us generalize LLM deployment to other AMD devices, for example, a Steam Deck with an AMD APU.
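To give a feel for what this looks like in practice, below is a minimal sketch of driving a compiled model from Python through MLC-LLM's chat interface. The package, class, model name, and argument names (mlc_chat, ChatModule, the q4f16_1 Llama2 variant, the device string) are assumptions based on the MLC-LLM Python API around this release, not details taken from this excerpt; check the full post and docs for the exact setup.

```python
# Minimal sketch (assumed API): run a quantized Llama2 model with MLC-LLM
# on an AMD GPU. Package/class/argument names are illustrative and may
# differ from the version you have installed.
from mlc_chat import ChatModule

# "device" selects the runtime backend: "rocm" targets AMD GPUs via ROCm,
# while "vulkan" is the path for devices like a Steam Deck APU.
cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device="rocm")

# Generate a completion for a single prompt and print it.
print(cm.generate(prompt="What does MLC-LLM do?"))
```

Swapping the device string for "vulkan" would be the analogous route on AMD hardware without a full ROCm stack, in line with the Steam Deck example above.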
Background
There have been many LLM inference solutions since the b...
Read more at blog.mlc.ai