Apr 04, 2025 | Notes from GTC'25: CUDA Techniques to Maximize Compute and Instruction Throughput |
Mar 23, 2025 | Notes from GTC'25: CUDA Techniques to Maximize Memory Bandwidth and Hide Latency - Part 2 |
Mar 23, 2025 | Notes from GTC'25: CUDA Techniques to Maximize Memory Bandwidth and Hide Latency - Part 1 |
Mar 02, 2025 | Faster Cross-Encoder Inference: Unleashing torch.compile for speed |
Mar 26, 2023 | Paper Summary #8 - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness |
Oct 10, 2022 | Paper Summary #7 - Efficient Transformers: A Survey |