KV Cache in nanoGPT

A modification of nanoGPT that adds a KV cache for inference. A KV cache speeds up generation because the keys and values computed for already-processed tokens are stored and reused, so attention does not have to be recomputed over the entire generated sequence at every step just to produce the next token.
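
To make the idea concrete, here is a minimal single-head sketch of how a KV cache fits into an attention layer. This is illustrative, not nanoGPT's actual code: the class name `CachedSelfAttention` is hypothetical, head splitting is omitted, and causal masking of a multi-token prompt is left out for brevity (during token-by-token generation the single new query may attend to all cached positions).

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CachedSelfAttention(nn.Module):
    """Illustrative single-head attention with a KV cache (not nanoGPT's code)."""

    def __init__(self, n_embd: int):
        super().__init__()
        self.qkv = nn.Linear(n_embd, 3 * n_embd)
        self.cache_k = None  # keys of previously processed tokens
        self.cache_v = None  # values of previously processed tokens

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, new_tokens, n_embd); during generation new_tokens == 1
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if self.cache_k is not None:
            # Reuse cached K/V instead of recomputing them for the whole prefix.
            k = torch.cat([self.cache_k, k], dim=1)
            v = torch.cat([self.cache_v, v], dim=1)
        self.cache_k, self.cache_v = k, v
        # The new query attends over all positions seen so far.
        att = (q @ k.transpose(-2, -1)) / math.sqrt(q.size(-1))
        att = F.softmax(att, dim=-1)
        return att @ v  # (batch, new_tokens, n_embd)
```

With a cache like this, after the prompt is processed once, each generation step only needs to feed the single newest token through the model; per-step attention cost grows with sequence length, but the key/value projections for old tokens are never redone.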

