| Date | Title |
| --- | --- |
| Sep 21, 2024 | Paper Summary #13 - Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process |
| May 28, 2023 | Paper Summary #9 - Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training |
| Mar 26, 2023 | Paper Summary #8 - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness |
| Oct 10, 2022 | Paper Summary #7 - Efficient Transformers: A Survey |
| Dec 27, 2021 | PPML Series #3 - Federated Learning for Mobile Keyboard Prediction |
| Dec 18, 2021 | PPML Series #2 - Federated Optimization Algorithms - FedSGD and FedAvg |
| Dec 11, 2021 | PPML Series #1 - An Introduction to Federated Learning |
| May 23, 2021 | Paper Summary #6 - Language Models are Unsupervised Multitask Learners |
| May 16, 2021 | Paper Summary #5 - XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| May 09, 2021 | Paper Summary #4 - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| May 02, 2021 | Paper Summary #3 - Improving Language Understanding by Generative Pre-Training |
| Apr 25, 2021 | Paper Summary #2 - Deep Contextualized Word Representations (ELMo) |
| Apr 18, 2021 | Paper Summary #1 - Attention Is All You Need |