Add Vulkan support to ollama by pufferffish · Pull Request #5059 · ollama/ollama
@pepijndevos Thanks for letting me know. After setting GGML_VK_FORCE_MAX_ALLOCATION_SIZE, I verified that llama3.1 8B works fine. However, I noticed a strange issue: models around 12–13 GiB fail to load onto the GPU, and the CLI just shows the loading spinner indefinitely with no response.
Successful:
- llama3.1:8b-instruct-q8_0 (7.95 GiB)
- gemma2:27b-text-q3_K_S (11.33 GiB)

Failed:
- gemma2:27b-instruct-q3_K_L (13.52 GiB)
- llama3.1:8b-instruct-fp16 (14.96 GiB)
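For anyone else trying the workaround, here is a minimal sketch of how it can be wired up: launching `ollama serve` with GGML_VK_FORCE_MAX_ALLOCATION_SIZE set in its environment. The 2 GiB value (the variable appears to take a size in bytes) and the Go wrapper itself are illustrative assumptions, not part of this PR; exporting the variable in your shell before starting the server should work just as well.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Assumed value: cap single Vulkan allocations at 2 GiB (2147483648 bytes).
	// The variable is read by the ggml Vulkan backend; the right value for a
	// given GPU is a guess here and may need tuning.
	cmd := exec.Command("ollama", "serve")
	cmd.Env = append(os.Environ(), "GGML_VK_FORCE_MAX_ALLOCATION_SIZE=2147483648")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "ollama serve exited:", err)
		os.Exit(1)
	}
}
```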
When the upload is succes...