InferenceMAX™: Open Source Inference Benchmarking
LLM inference performance is driven by two pillars: hardware and software. While hardware innovation delivers step jumps in performance each year through new GPUs/XPUs and new systems, software evolves every single day, layering continuous performance gains on top of those step jumps. AI software such as SGLang, vLLM, TensorRT-LLM, CUDA, and ROCm achieves this continuous improvement through kernel-level optimizations, distributed inference strategies, and scheduling innovations.
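To make the benchmarking framing concrete, here is a minimal sketch of the kind of measurement an inference benchmark performs: timing a generation call and reporting throughput (tokens/sec) and per-token latency. The `mock_generate` function is a hypothetical stand-in, not the InferenceMAX harness; a real run would call an engine such as vLLM or SGLang in its place.

```python
import time

def measure_throughput(generate_fn, prompt, num_runs=3):
    """Time a generate function over several runs and report aggregate
    tokens/sec and average per-token latency in milliseconds."""
    total_tokens = 0
    total_time = 0.0
    for _ in range(num_runs):
        start = time.perf_counter()
        tokens = generate_fn(prompt)
        total_time += time.perf_counter() - start
        total_tokens += len(tokens)
    return {
        "tokens_per_second": total_tokens / total_time,
        "ms_per_token": 1000.0 * total_time / total_tokens,
    }

# Hypothetical stand-in for a real engine call (vLLM, SGLang, TensorRT-LLM).
def mock_generate(prompt):
    time.sleep(0.01)          # simulate decode time
    return prompt.split() * 4  # pretend output tokens

stats = measure_throughput(mock_generate, "hello world from the benchmark")
print(stats)
```

Real benchmarks additionally sweep batch size and sequence length, which is where the software-level scheduling and kernel improvements described above show up as measurable gains.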
Read more at newsletter.semianalysis.com