The Illustrated Transformer
Discussions:
Hacker News (65 points, 4 comments), Reddit r/MachineLearning (29 points, 3 comments)
Translations: Arabic, Chinese (Simplified) 1, Chinese (Simplified) 2, French 1, French 2, Italian, Japanese, Korean, Persian, Russian, Spanish 1, Spanish 2, Vietnamese
Watch: MIT’s Deep Learning State of the Art lecture referencing this post
Featured in courses at Stanford, Harvard, MIT, Princeton, CMU and others
In the previous post, we looked at Attention – a ubiquitous method in modern deep lear...
Read more at jalammar.github.io