What happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives — Yi Tay
People who worked on language and NLP five or more years ago are left scratching their heads about where all the encoder models went. If BERT worked so well, why not scale it? What happened to encoder-decoder and encoder-only models? Today I try to unpack what is going on in this new era of LLMs. I hope this post will be helpful.
A few months ago I was writing a long reply to this tweet by @srush_nlp. Then the draft got lost because I closed the tab by accident. ¯\_...