News Score: Score the News, Sort the News, Rewrite the Headlines

Making my local LLM voice assistant faster and more scalable with RAG

If you read my previous blog post, you probably already know that I like my smart home open-source and very local, and that certainly includes any voice assistant I may have. If you watched the video demo, you have probably also found out that it’s… slow. Trust me, I did too. Prefix caching helps, but it feels like cheating. Sure, it’ll look amazing in a demo, but as soon as I start using my LLM for other things (which I do, quite often), that cache is going to get evicted and that first prompt i...

Read more at johnthenerd.com
