What happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives — Yi Tay
People who worked on language and NLP five or more years ago are left scratching their heads about where all the encoder models went. If BERT worked so well, why not scale it? What happened to encoder-decoder and encoder-only models? Today I try to unpack what is going on in this new era of LLMs. I hope this post will be helpful.
A few months ago I was writing a long reply to this tweet by @srush_nlp. Then the draft got lost because I closed the tab by accident. ¯\_...