News Score: Score the News, Sort the News, Rewrite the Headlines

VeriScore: Evaluating the factuality of verifiable claims in long-form text generation

Abstract: Existing metrics for evaluating the factuality of long-form text, such as FACTSCORE (Min et al., 2023) and SAFE (Wei et al., 2024), decompose an input text into “atomic claims” and verify each against a knowledge base like Wikipedia. These metrics are not suitable for most generation tasks because they assume that every claim is verifiable (i.e., can plausibly be proven true or false). We address this issue with VERISCORE, a metric for evaluating factuality in diverse long-form generati...
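To make the decompose-then-verify idea concrete, here is a minimal toy sketch in Python. It is not the FACTSCORE, SAFE, or VERISCORE implementation: the function names (`split_into_claims`, `supported_fraction`), the sentence-level "claims", and the exact-match lookup against a small in-memory list are all assumptions made for illustration. Real systems use a language model to extract claims and retrieval to check them.

```python
import re


def normalize(text):
    """Lowercase and drop trailing punctuation so lookups are less brittle."""
    return text.lower().rstrip(".!?").strip()


def split_into_claims(text):
    """Naive stand-in for claim extraction: treat each sentence as one claim."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def supported_fraction(text, knowledge_base):
    """Fraction of claims found in the knowledge base (hypothetical scoring rule)."""
    claims = split_into_claims(text)
    if not claims:
        return 0.0
    facts = {normalize(fact) for fact in knowledge_base}
    supported = sum(1 for claim in claims if normalize(claim) in facts)
    return supported / len(claims)


# Toy knowledge base and a response mixing a verifiable and an unverifiable sentence.
kb = ["Paris is the capital of France"]
response = "Paris is the capital of France. I hope you enjoy your trip."
score = supported_fraction(response, kb)
print(score)
```

In this toy setup the second sentence cannot be proven true or false, yet it still counts against the score. That is the kind of behavior the abstract objects to when every claim is assumed to be verifiable.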

Read more at aclanthology.org
