Shreyansh Singh

Nov 08, 2025	Understanding Multi-Head Latent Attention (MLA)
Jun 04, 2025	Deriving the Gradient for the Backward Pass of Layer Normalization
Apr 04, 2025	Notes from GTC'25: CUDA Techniques to Maximize Compute and Instruction Throughput
Mar 23, 2025	Notes from GTC'25: CUDA Techniques to Maximize Memory Bandwidth and Hide Latency - Part 2
Mar 23, 2025	Notes from GTC'25: CUDA Techniques to Maximize Memory Bandwidth and Hide Latency - Part 1
Mar 02, 2025	Faster Cross-Encoder Inference: Unleashing torch.compile for speed