News Score: Score the News, Sort the News, Rewrite the Headlines

Thermodynamic Natural Gradient Descent

Abstract: Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed as a hardware limitation (imposed by digital computers). Here we show that natural gradient descent (NGD), a second-order method, can have a similar computational complexity per iteration to a first-order method, when employing appropriate hardware. We present a...
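
To make the overhead mentioned in the abstract concrete, here is a minimal sketch of a conventional (digital) NGD step for logistic regression in NumPy. It is not the hardware method the paper proposes; the function names, the toy data, the learning rate, and the damping term are all assumptions chosen for illustration. The update is theta <- theta - lr * F^{-1} g, where g is the gradient and F is the Fisher information matrix, and the linear solve against F is the step whose cost grows cubically with the number of parameters when done densely.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    # Mean binary cross-entropy; eps guards against log(0).
    p = sigmoid(X @ theta)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def ngd_step(theta, X, y, lr=0.5, damping=1e-3):
    # One natural gradient step (hypothetical helper, not from the paper).
    n = len(y)
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) / n                    # first-order gradient g
    w = p * (1 - p)                             # per-example Bernoulli variance
    fisher = (X * w[:, None]).T @ X / n         # Fisher matrix F for this model
    fisher += damping * np.eye(len(theta))      # damping keeps F invertible
    nat_grad = np.linalg.solve(fisher, grad)    # solve F x = g (the costly part)
    return theta - lr * nat_grad

def run_demo(steps=20, seed=0):
    # Toy data: 200 examples, 3 features, labels drawn from a known model.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 3))
    true_theta = np.array([1.0, -2.0, 0.5])
    y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)
    theta = np.zeros(3)
    losses = [loss(theta, X, y)]
    for _ in range(steps):
        theta = ngd_step(theta, X, y)
        losses.append(loss(theta, X, y))
    return theta, losses

if __name__ == "__main__":
    theta, losses = run_demo()
    print("final theta:", theta)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

A plain gradient descent step would skip the `fisher` construction and the `np.linalg.solve` call entirely, which is why the per-iteration cost of a first-order method is so much lower on digital hardware.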

Read more at arxiv.org
