| Jan 17, 2026 | Paper Summary #14 - Physics of Language Models: Part 3.1, Knowledge Storage and Extraction |
| Nov 08, 2025 | Understanding Multi-Head Latent Attention (MLA) |
| Sep 21, 2024 | Paper Summary #13 - Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process |
| Feb 18, 2024 | Paper Summary #10 - Gemini 1.5 Pro |
| Oct 10, 2022 | Paper Summary #7 - Efficient Transformers: A Survey |
| May 23, 2021 | Paper Summary #6 - Language Models are Unsupervised Multitask Learners |
| May 16, 2021 | Paper Summary #5 - XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| May 09, 2021 | Paper Summary #4 - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| May 02, 2021 | Paper Summary #3 - Improving Language Understanding by Generative Pre-Training |