Add Vulkan support to ollama by pufferffish · Pull Request #5059 · ollama/ollama
@pepijndevos Thanks for letting me know. After setting GGML_VK_FORCE_MAX_ALLOCATION_SIZE, I verified that llama3.1 8B works fine. However, I noticed a strange issue: models around 12–13 GiB fail to load onto the GPU, and the CLI just shows the loading spinner indefinitely with no response.
Successful:
- llama3.1:8b-instruct-q8_0 (7.95 GiB)
- gemma2:27b-text-q3_K_S (11.33 GiB)

Failed:
- gemma2:27b-instruct-q3_K_L (13.52 GiB)
- llama3.1:8b-instruct-fp16 (14.96 GiB)
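For anyone else trying the workaround, here is a minimal sketch of how it can be wired up: launching `ollama serve` with GGML_VK_FORCE_MAX_ALLOCATION_SIZE set in its environment. The 2 GiB value (the variable appears to take a size in bytes) and the Go wrapper itself are illustrative assumptions, not part of this PR; exporting the variable in your shell before starting the server should work just as well.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Assumed value: cap single Vulkan allocations at 2 GiB (2147483648 bytes).
	// The variable is read by the ggml Vulkan backend; the right value for a
	// given GPU is a guess here and may need tuning.
	cmd := exec.Command("ollama", "serve")
	cmd.Env = append(os.Environ(), "GGML_VK_FORCE_MAX_ALLOCATION_SIZE=2147483648")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "ollama serve exited:", err)
		os.Exit(1)
	}
}
```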
When the upload is succes...