
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Abstract: Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network, and then trains a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not.
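The abstract describes a two-model setup: a small generator (an MLM) proposes replacements for masked positions, and a discriminator classifies every token of the corrupted sequence as original or replaced. Below is a minimal PyTorch sketch of that replaced-token-detection objective, assuming toy stand-in encoders rather than the paper's actual Transformer architecture; the module names, vocabulary size, masking rate, and loss weight are all illustrative placeholders.

```python
# Minimal sketch of replaced token detection (RTD), assuming toy encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, MASK_ID = 1000, 64, 0

class TinyEncoder(nn.Module):
    """Stand-in for a Transformer encoder: embedding + one linear layer."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.proj = nn.Linear(HIDDEN, HIDDEN)
    def forward(self, ids):
        return torch.relu(self.proj(self.embed(ids)))

generator_body = TinyEncoder()
gen_head = nn.Linear(HIDDEN, VOCAB)   # predicts token ids at masked positions
discriminator_body = TinyEncoder()
disc_head = nn.Linear(HIDDEN, 1)      # per-token "was this token replaced?" logit

tokens = torch.randint(1, VOCAB, (2, 16))      # toy batch of token ids
mask = torch.rand(tokens.shape) < 0.15         # mask ~15% of positions
masked = tokens.masked_fill(mask, MASK_ID)

# 1) Generator (a small MLM) fills in the masked positions.
gen_logits = gen_head(generator_body(masked))
mlm_loss = F.cross_entropy(gen_logits[mask], tokens[mask])
samples = torch.distributions.Categorical(logits=gen_logits).sample()

# 2) Corrupted input: sampled tokens at masked positions, originals elsewhere.
corrupted = torch.where(mask, samples, tokens)
is_replaced = (corrupted != tokens).float()    # discriminator labels

# 3) Discriminator predicts, for every token, whether it was replaced.
disc_logits = disc_head(discriminator_body(corrupted)).squeeze(-1)
rtd_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

loss = mlm_loss + 50.0 * rtd_loss              # joint objective; weight is illustrative
```

Because the discriminator gets a training signal from every input token, not just the ~15% that were masked, the task is more sample-efficient than standard MLM, which is the paper's central claim.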

Read more at arxiv.org
