KV Cache in nanoGPT

A modification of nanoGPT that adds a KV cache for inference. A KV cache speeds up generation because the keys and values computed for already-processed tokens are stored and reused, so attention does not have to be recomputed over the entire generated sequence at every step just to produce the next token.
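
To make the idea concrete, here is a minimal single-head sketch of how a KV cache fits into an attention layer. This is illustrative, not nanoGPT's actual code: the class name `CachedSelfAttention` is hypothetical, head splitting is omitted, and causal masking of a multi-token prompt is left out for brevity (during token-by-token generation the single new query may attend to all cached positions).

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CachedSelfAttention(nn.Module):
    """Illustrative single-head attention with a KV cache (not nanoGPT's code)."""

    def __init__(self, n_embd: int):
        super().__init__()
        self.qkv = nn.Linear(n_embd, 3 * n_embd)
        self.cache_k = None  # keys of previously processed tokens
        self.cache_v = None  # values of previously processed tokens

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, new_tokens, n_embd); during generation new_tokens == 1
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if self.cache_k is not None:
            # Reuse cached K/V instead of recomputing them for the whole prefix.
            k = torch.cat([self.cache_k, k], dim=1)
            v = torch.cat([self.cache_v, v], dim=1)
        self.cache_k, self.cache_v = k, v
        # The new query attends over all positions seen so far.
        att = (q @ k.transpose(-2, -1)) / math.sqrt(q.size(-1))
        att = F.softmax(att, dim=-1)
        return att @ v  # (batch, new_tokens, n_embd)
```

With a cache like this, after the prompt is processed once, each generation step only needs to feed the single newest token through the model; per-step attention cost grows with sequence length, but the key/value projections for old tokens are never redone.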

