AAAI Conference 2026 Short Paper
APEX-Q: Arbitrary-dimension Product-EXtension Quantization for Accelerated LLM Deployment (Student Abstract)
- Yian Wang
- Ye Qiao
- Sitao Huang
- Hyoukjun Kwon
We present APEX-Q, a flexible product quantization framework for compressing large language models. Unlike prior multi-codebook quantization methods that rely on fixed partitions, APEX-Q supports quantization over arbitrary subvector dimensions, capturing weight redundancy more effectively. It matches the accuracy of 4-bit and 8-bit baselines, operates purely post-training with no retraining required, and exposes key trade-offs among subvector dimension, codebook size, and hardware efficiency. APEX-Q thus provides a unified, hardware-friendly approach to scalable LLM deployment.
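To make the core idea concrete, the following is a minimal sketch of product quantization with a configurable subvector dimension, in the spirit of the arbitrary-dimension partitions described above. All function names and parameters (`pq_quantize`, `sub_dim`, `n_codes`) are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of product quantization (PQ) with a configurable
# subvector dimension. Names and parameters are hypothetical, not APEX-Q's
# real API.
import numpy as np

def pq_quantize(W, sub_dim, n_codes, iters=10, seed=0):
    """Quantize rows of W by splitting each row into subvectors of length
    `sub_dim` and learning one k-means codebook per subvector group."""
    rng = np.random.default_rng(seed)
    rows, cols = W.shape
    assert cols % sub_dim == 0, "columns must divide evenly into subvectors"
    groups = cols // sub_dim
    # One codebook of shape (n_codes, sub_dim) per subvector group.
    codebooks = np.empty((groups, n_codes, sub_dim))
    codes = np.empty((rows, groups), dtype=np.int64)
    for g in range(groups):
        X = W[:, g * sub_dim:(g + 1) * sub_dim]          # (rows, sub_dim)
        C = X[rng.choice(rows, n_codes, replace=False)]  # init from data
        for _ in range(iters):                           # Lloyd iterations
            dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            assign = dist.argmin(1)
            for k in range(n_codes):
                mask = assign == k
                if mask.any():
                    C[k] = X[mask].mean(0)
        codebooks[g], codes[:, g] = C, assign
    return codebooks, codes

def pq_reconstruct(codebooks, codes):
    """Rebuild an approximate weight matrix from codes and codebooks."""
    groups = codebooks.shape[0]
    parts = [codebooks[g][codes[:, g]] for g in range(groups)]
    return np.concatenate(parts, axis=1)
```

Because `sub_dim` is a free parameter here, the same routine covers both coarse partitions (large subvectors, fewer index columns) and fine partitions (small subvectors, more index columns), which is the trade-off axis the abstract highlights.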