News Score: Score the News, Sort the News, Rewrite the Headlines

Replicate & Fly cold-start latency

Replicate has been my default serverless GPU choice in the past, and I’ve been trying to use it to set up some embedding models, like SPLADE and a Q&A-optimized bi-encoder. On the other hand, I’m a huge fan of Fly for hosting, and they’ve just announced GPUs.

How does cold & warm latency compare between these providers?

Problem Context

I’m building a semantic search engine, and I’m most interested in minimizing query-time latency, and trying out multiple different models.

One constraint: I don’t wan...

Read more at venki.dev
