Large Language Models (LLMs) commonly adopt low-bit quantization of weights and activations during inference to mitigate the communication, storage, and computation overhead incurred by their massive ...