Meta VP denies artificially boosting Llama 4 AI benchmark scores; addresses mixed performance reports

Meta exec denies the company artificially boosted Llama 4's benchmark scores | TechCrunch

Image Credits: Bryce Durbin / TechCrunch

11:45 AM PDT · April 7, 2025

A Meta exec on Monday denied a rumor that the company trained its new AI models to present well on specific benchmarks while concealing the models’ weaknesses.

The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it’s “simply not true” that Meta trained its Llama 4 Maverick and Llama 4 Scout models on “test sets.” In AI benchmarks, test sets are collections of data used to evaluate the performanc...