Large Language Models (LLMs) are increasingly used in applications requiring long context lengths, but the key-value (KV) cache often becomes a memory bottleneck on GPUs as context lengths grow. To address this, we propose Commutative Vector Quantization (CommVQ) to significantly reduce memory usage for long-context LLM inference. First, we leverage additive quantization by introducing a lightweight encoder and codebook to compress the KV cache, which can then be decoded with a simple matrix multiplication. Second, to tackle the high computational costs during decoding, we design the…
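To make the additive-quantization idea concrete, here is a minimal sketch of encoding a KV vector as a handful of codebook indices and decoding it with a single matrix multiplication. The codebook sizes, the greedy nearest-codeword "encoder", and all names below are assumptions for illustration only, not the paper's actual lightweight encoder, codebook design, or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 64   # head dimension of a key/value vector (assumed)
M = 4    # number of additive codebooks (assumed)
K = 256  # codewords per codebook (assumed)

# Codebook: M groups of K codewords, each of dimension d.
codebook = rng.normal(size=(M, K, d)).astype(np.float32)

def encode(v):
    """Greedy residual encoding: pick one codeword per codebook.

    A learned lightweight encoder would replace this nearest-codeword search.
    """
    residual = v.copy()
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        dists = np.linalg.norm(codebook[m] - residual, axis=1)
        codes[m] = np.argmin(dists)
        residual = residual - codebook[m, codes[m]]
    return codes

def decode(codes):
    """Decoding is one matmul: one-hot codes times the flattened codebook."""
    one_hot = np.zeros((M, K), dtype=np.float32)
    one_hot[np.arange(M), codes] = 1.0
    return one_hot.reshape(-1) @ codebook.reshape(M * K, d)  # (M*K,) @ (M*K, d) -> (d,)

v = rng.normal(size=d).astype(np.float32)
codes = encode(v)        # cache stores M small integers instead of d floats
v_hat = decode(codes)    # reconstruction via a simple matrix multiplication
print(codes.shape, v_hat.shape)  # (4,) (64,)
```

In this toy setup, each cached vector is stored as M integer codes rather than d floating-point values, which is where the memory savings come from; the reconstruction quality depends on how the encoder and codebook are trained.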


