
Paper Summary #9 - Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

Paper: Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Link: https://arxiv.org/abs/2305.14342
Authors: Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma
Code: https://github.com/Liuhong99/Sophia

I have also released an annotated version of the paper; if you are interested, you can find it here. Sophia is one of the most interesting papers I have read recently, and I really liked how well it is written. This post is essentially the notes I made while reading the paper, which is why it reads less like a typical blog post: much of it is copied verbatim from the paper.