What's better: wider neural nets with fewer layers, or thinner nets with more layers?

Introduction

This post details my experiments on whether Transformers with more, thinner layers are better than Transformers with fewer, wider layers. I tested five different configurations and concluded that an optimal ratio between depth and width is the best configuration; in my experiments, 4 layers with an embd_dim of 1024 worked the best. I'm basing the layer width on "Wide Attention Is The Way Forward For Transformers", which widens the layer through the FFN width, wh...
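
A minimal sketch of this kind of depth-vs-width sweep, assuming a GPT-style block stack. Only the winning config (4 layers, embd_dim 1024) comes from the post; the other rows and the `transformer_params` helper are hypothetical examples of trading depth for width at a roughly fixed parameter budget.

```python
# Hedged sketch: rough parameter counts for a depth-vs-width sweep.
# Assumed: GPT-style layers with 4x FFN expansion; vocab size is a guess.

def transformer_params(n_layer, n_embd, vocab_size=50257, ffn_mult=4):
    """Approximate parameter count per configuration.
    Per layer: attention projections (4 * n_embd^2) + FFN (2 * ffn_mult * n_embd^2)."""
    per_layer = 4 * n_embd**2 + 2 * ffn_mult * n_embd**2
    embedding = vocab_size * n_embd
    return n_layer * per_layer + embedding

# Hypothetical sweep from deep/thin to shallow/wide.
configs = [
    (16, 512),   # deep and thin
    (8, 768),
    (4, 1024),   # the post's best config
    (2, 1536),
    (1, 2048),   # shallow and wide
]

for n_layer, n_embd in configs:
    params = transformer_params(n_layer, n_embd)
    print(f"layers={n_layer:2d}  n_embd={n_embd:4d}  ~params={params / 1e6:.1f}M")
```

Comparing configurations this way keeps the parameter budget roughly comparable, so any quality difference can be attributed to the depth/width ratio rather than raw model size.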

Read more at vatsadev.github.io
