arxiv:2504.04949

One Quantizer is Enough: Toward a Lightweight Audio Codec

Published on Apr 7

Authors:

Abstract

Neural audio codecs have recently gained traction for their ability to compress high-fidelity audio and generate discrete tokens that can be utilized in downstream generative modeling tasks. However, leading approaches often rely on resource-intensive models and multi-quantizer architectures, resulting in considerable computational overhead and constrained real-world applicability. In this paper, we present SQCodec, a lightweight neural audio codec that leverages a single quantizer to address these limitations. SQCodec explores streamlined convolutional networks and local Transformer modules, alongside TConv, a novel mechanism designed to capture acoustic variations across multiple temporal scales, thereby enhancing reconstruction fidelity while reducing model complexity. Extensive experiments across diverse datasets show that SQCodec achieves audio quality comparable to multi-quantizer baselines, while its single-quantizer design offers enhanced adaptability and its lightweight architecture reduces resource consumption by an order of magnitude. The source code is publicly available at https://github.com/zhai-lw/SQCodec.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Your need to confirm your account before you can post a new comment.

· Sign up or log in to comment

No dataset linking this paper

Cite arxiv.org/abs/2504.04949 in a dataset README.md to link it from this page.

No Space linking this paper

Cite arxiv.org/abs/2504.04949 in a Space README.md to link it from this page.

No Collection including this paper

Add this paper to a collection to link it from this page.