Enabling Trillion-Parameter Models on AWS EFA

At Perplexity, we use the best models for our product, our APIs, and our research teams. Large open-source Mixture-of-Experts models such as Kimi-K2 pose particular challenges: even the largest inference nodes, with 8x NVIDIA H200 GPUs, cannot efficiently accommodate them, necessitating multi-node deployments. We present a set of kernels for expert parallelism that achieve state-of-the-art latencies on ConnectX-7, exceeding the performance of DeepEP. The same kernels are also the first to achieve ...
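To make the capacity constraint concrete, here is a rough back-of-envelope sketch in Python. The ~1T parameter count for Kimi-K2 and the FP8 weight assumption are approximations for illustration, not figures quoted in the post:

```python
# Back-of-envelope check: why a ~1T-parameter MoE model does not fit
# comfortably on a single 8x H200 node. All figures are approximate.

H200_HBM_GB = 141                 # HBM3e capacity per H200 GPU
GPUS_PER_NODE = 8
node_hbm_gb = H200_HBM_GB * GPUS_PER_NODE   # ~1128 GB of HBM per node

params_billion = 1000             # assumption: Kimi-K2 is on the order of 1T params
bytes_per_param = 1               # assumption: FP8 weights

weights_gb = params_billion * bytes_per_param   # ~1000 GB for weights alone

# Weights nearly exhaust the node's HBM, leaving little headroom for the
# KV cache and activations required at serving time -- hence multi-node
# deployment with expert parallelism.
print(f"node HBM: {node_hbm_gb} GB, weights: {weights_gb} GB, "
      f"headroom: {node_hbm_gb - weights_gb} GB")
```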

Read more at research.perplexity.ai
