AutoHete: New AI training system boosts LLM efficiency up to 1.91x, adapts to single or multi-GPU setups

AutoHete: An Automatic and Efficient Heterogeneous Training System for LLMs

Abstract: Transformer-based large language models (LLMs) have demonstrated exceptional capabilities in sequence modeling and text generation, with improvements scaling proportionally with model size. However, the limitations of GPU memory have restricted LLM training accessibility for many researchers. Existing heterogeneous training methods significantly expand the scale of trainable models but introduce substantial communication overheads and CPU workloads. In this ...