News Score: Score the News, Sort the News, Rewrite the Headlines

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

Abstract: This paper introduces PowerInfer-2, a framework designed for high-speed inference of Large Language Models (LLMs) on smartphones, particularly effective for models whose sizes exceed the device's memory capacity. The key insight of PowerInfer-2 is to utilize the heterogeneous computation, memory, and I/O resources in smartphones by decomposing traditional matrix computations into fine-grained neuron cluster computations. Specifically, PowerInfer-2 features a...
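To make the key insight concrete, here is a minimal sketch of what "decomposing a matrix computation into neuron cluster computations" could look like: instead of a full dense matmul over every output neuron of a feed-forward layer, the weight columns are grouped into clusters and only the clusters predicted to be active are computed. All names, sizes, and the fixed cluster selection below are illustrative assumptions, not PowerInfer-2's actual implementation.

```python
def clustered_ffn(x, W, active_clusters, cluster_size):
    """Compute relu(x @ W), but only for the output neurons in the
    given clusters; all other outputs are left at zero (skipped work).

    x: input vector (length d_in)
    W: weight matrix as a list of rows (d_in x d_out)
    active_clusters: indices of neuron clusters to compute (hypothetical
                     output of an activation predictor)
    """
    d_out = len(W[0])
    out = [0.0] * d_out
    for c in active_clusters:
        # Cluster-local matmul: touch only this cluster's weight columns.
        for j in range(c * cluster_size, (c + 1) * cluster_size):
            s = sum(x[i] * W[i][j] for i in range(len(x)))
            out[j] = max(s, 0.0)  # ReLU
    return out

# Toy example: d_in=2, d_out=4, two clusters of 2 neurons each.
x = [1.0, 2.0]
W = [[1.0, -1.0, 2.0, 0.0],
     [0.0, 3.0, -1.0, 1.0]]

# Suppose a (hypothetical) predictor marks only cluster 1 as active:
y = clustered_ffn(x, W, active_clusters=[1], cluster_size=2)
# Cluster 0's neurons are never computed; only columns 2-3 are touched.
```

In a real system the payoff comes from I/O, not just arithmetic: weights for inactive clusters need not even be loaded from flash into the phone's limited memory, which is why the decomposition is done at cluster rather than whole-matrix granularity.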

Read more at arxiv.org
