"Hybrid Digital-Analog Algorithm Revolutionizes Large-Scale Neural Network Training, Overcoming Hardware Limitations"

Thermodynamic Natural Gradient Descent

Abstract: Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed as a hardware limitation (imposed by digital computers). Here we show that natural gradient descent (NGD), a second-order method, can have a similar computational complexity per iteration to a first-order method, when employing appropriate hardware. We present a...
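
For context on the overhead the abstract refers to, the following is a minimal sketch of a conventional, fully digital NGD step on a toy logistic-regression problem. It is not the method of the paper. The function names (`solve`, `ngd_step`, `loss`), the toy data, the learning rate, and the damping term are all assumptions made for illustration. The step solves the linear system (F + damping * I) d = g, where F is the Fisher information matrix and g is the gradient, and then moves the parameters along -d.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(theta, xs, ys):
    """Mean negative log-likelihood of a logistic-regression model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def ngd_step(theta, xs, ys, lr=0.5, damping=1e-3):
    """One damped natural-gradient step (illustrative, hypothetical helper)."""
    n, m = len(theta), len(xs)
    grad = [0.0] * n
    # Start from damping * I so the system stays well conditioned (assumption).
    fisher = [[damping if i == j else 0.0 for j in range(n)] for i in range(n)]
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for i in range(n):
            grad[i] += (p - y) * x[i] / m
            for j in range(n):
                # Fisher matrix of logistic regression: mean of p(1-p) x x^T.
                fisher[i][j] += p * (1.0 - p) * x[i] * x[j] / m
    direction = solve(fisher, grad)  # the linear solve is the costly part
    return [t - lr * d for t, d in zip(theta, direction)]

# Toy, non-separable data: each input is [bias, feature].
xs = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
ys = [0, 0, 1, 0, 1]
theta = [0.0, 0.0]
print("loss before:", loss(theta, xs, ys))
theta = ngd_step(theta, xs, ys)
print("loss after: ", loss(theta, xs, ys))
```

In this sketch the dense solve scales cubically with the number of parameters, while a plain gradient step would only need `grad`. That gap is one way to read the per-iteration overhead the abstract mentions.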