| Date | Title |
|---|---|
| Jun 01, 2026 | KV Cache Compaction and Compression: From Attention Sinks to Learned Memory |
| May 17, 2026 | Paper Summary #17 - Engram |
| May 15, 2026 | Paper Summary #15 - Hyper-Connections and mHC |
| Feb 19, 2026 | Deep dive into CUDA Scan Kernels: Hierarchical and Single-Pass Variants |
| Apr 04, 2025 | Notes from GTC'25: CUDA Techniques to Maximize Compute and Instruction Throughput |
| Mar 23, 2025 | Notes from GTC'25: CUDA Techniques to Maximize Memory Bandwidth and Hide Latency - Part 2 |
| Mar 23, 2025 | Notes from GTC'25: CUDA Techniques to Maximize Memory Bandwidth and Hide Latency - Part 1 |
| Mar 02, 2025 | Faster Cross-Encoder Inference: Unleashing torch.compile for speed |
| Mar 26, 2023 | Paper Summary #8 - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness |
| Oct 10, 2022 | Paper Summary #7 - Efficient Transformers: A Survey |