Posts | Shreyansh Singh

Deriving the Gradient for the Backward Pass of Layer Normalization

Understanding the math behind Layer Normalization and deriving the gradients for the backward pass.

8 min read · June 04, 2025

2025 · ml math · ML
Notes from GTC'25: CUDA Techniques to Maximize Compute and Instruction Throughput

My notes from the talk on maximizing compute and instruction throughput at NVIDIA GTC 2025.

32 min read · April 04, 2025

2025 · cuda mlsys · MLSys
Notes from GTC'25: CUDA Techniques to Maximize Memory Bandwidth and Hide Latency - Part 2

Second part of my notes from the talk on maximizing memory bandwidth at NVIDIA GTC 2025.

33 min read · March 23, 2025

2025 · cuda mlsys · MLSys
Notes from GTC'25: CUDA Techniques to Maximize Memory Bandwidth and Hide Latency - Part 1

First part of my notes from the talk on maximizing memory bandwidth at NVIDIA GTC 2025.

33 min read · March 23, 2025

2025 · cuda mlsys · MLSys
Faster Cross-Encoder Inference: Unleashing torch.compile for speed

A quick write-up on accelerating a Jina Cross-Encoder using torch.compile.

20 min read · March 02, 2025

2025 · inference-optimization efficiency mlsys · MLSys
Paper Summary #13 - Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

My notes from the Physics of Language Models series of papers.

27 min read · September 21, 2024

2024 · transformer reasoning paper-summaries · LLMs
Paper Summary #12 - Image Recaptioning in DALL-E 3

The image recaptioning technique used in DALL-E 3 was extended to videos in Sora.

12 min read · February 18, 2024

2024 · image-captioning generative-ai · Computer Vision
Paper Summary #11 - Sora

OpenAI announced a ground-breaking text-to-video diffusion model capable of generating high-definition videos up to 60 seconds long.

7 min read · February 18, 2024

2024 · diffusion image-generation video-generation generative-ai · Computer Vision
Paper Summary #10 - Gemini 1.5 Pro

Google DeepMind announced a multimodal LLM with support for a context length of up to 10M tokens.

19 min read · February 18, 2024

2024 · llm multimodal transformer · LLMs
Solving Substitution Ciphers using Markov Chain Monte Carlo (MCMC)

Deciphering substitution ciphers can be framed as a Markov chain problem, and a simple Monte Carlo sampling approach can solve them very efficiently.

5 min read · July 23, 2023

2023 · sampling probability mcmc cryptography · Mathematics