Visualizing 6D Mesh Parallelism
This is a companion longpost for a fun project I’ve yet to finish. Here, I show how I personally visualize the collective communications involved in a simple 2⁶ 6D parallel mesh.

There are many articles that describe various training parallelisms in vague words and simple visuals. Most of them fail to convey a deep understanding of the exact communications involved in a single training step, and even the outliers that do convey it do not cover the more complex case of combining all ...
Read more at main-horse.github.io