DAC (Descript Audio Codec) 16 kHz — LiteRT (CompiledModel GPU)

Descript Audio Codec (DAC) running on-device via the LiteRT CompiledModel API on the GPU (ML Drift). The convolutional encoder and decoder run on the GPU; the residual vector quantizer (RVQ) runs on the CPU. 43:1 compression (1 s of audio → 12×50 codes), real-time factor (RTF) ≈ 0.82 on a Pixel 8a, i.e. faster than real time.
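
The 43:1 figure can be reproduced with a little arithmetic. This sketch assumes 16-bit PCM input and 1024-entry codebooks (10 bits per code); neither value is stated in this card, so treat both as assumptions.

```python
import math

SAMPLE_RATE = 16_000       # Hz, from the model name
BITS_PER_SAMPLE = 16       # assumption: 16-bit PCM source audio
NUM_CODEBOOKS = 12         # from this card
FRAMES_PER_SECOND = 50     # from this card (1 s of audio -> 50 latent frames)
CODEBOOK_SIZE = 1024       # assumption: 1024 entries, i.e. 10 bits per code


def compression_ratio() -> float:
    """Raw PCM bits per second divided by code bits per second."""
    raw_bits = SAMPLE_RATE * BITS_PER_SAMPLE
    bits_per_code = math.log2(CODEBOOK_SIZE)
    coded_bits = NUM_CODEBOOKS * FRAMES_PER_SECOND * bits_per_code
    return raw_bits / coded_bits


print(f"{compression_ratio():.1f}:1")  # prints 42.7:1
```

Under those assumptions the codes carry 6,000 bits per second against 256,000 bits per second of raw audio, which rounds to 43:1.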

Files

dac_16khz_encoder_fp16.tflite (43 MB) — audio[1,1,16000] → latent[1,1024,50], GPU.
dac_16khz_deconly_zs_fp16.tflite (105 MB) — latent[1,1024,50] → audio, GPU.
dac_rvq.bin (1.2 MB) — RVQ weights (12 codebooks) for the CPU quantizer (float32 LE).

Pipeline

audio -> encoder.tflite (GPU) -> z -> RVQ.encode (CPU) -> codes[12,50]
      -> RVQ.decode (CPU) -> z_q -> decoder.tflite (GPU) -> audio
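
The two CPU steps in the pipeline can be illustrated with a plain residual vector quantizer working on a single latent frame. This is a minimal sketch of the general technique, not the code in the linked repository: the function names and the toy codebooks are hypothetical, and it omits any detail specific to the actual DAC quantizer (such as per-codebook projections) and to the layout of dac_rvq.bin.

```python
from typing import List, Sequence

Codebook = Sequence[Sequence[float]]  # shape: [num_entries][dim]


def _nearest(codebook: Codebook, v: Sequence[float]) -> int:
    """Index of the entry with the smallest squared distance to v."""
    best_idx, best_dist = 0, float("inf")
    for idx, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, v))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(z: Sequence[float], codebooks: Sequence[Codebook]) -> List[int]:
    """Each stage quantizes the residual left over by the previous stage."""
    residual = list(z)
    codes = []
    for codebook in codebooks:
        idx = _nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """z_q is the sum of the selected entries, one per codebook."""
    z_q = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        z_q = [acc + c for acc, c in zip(z_q, codebook[idx])]
    return z_q


# Toy data: 2 codebooks over a 2-dimensional latent (hypothetical values).
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]
codes = rvq_encode([1.2, 1.0], toy_codebooks)
z_q = rvq_decode(codes, toy_codebooks)
print(codes, z_q)  # prints [1, 1] [1.25, 1.0]
```

In the pipeline above the same idea would be applied to each of the 50 latent frames with 12 codebooks, which is where the codes[12,50] shape comes from.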

On-device (Pixel 8a, Tensor G3 — verified)

Encoder 367/367 and decoder 398/398 nodes run on the LiteRT GPU delegate (LITERT_CL, 1 partition, no CPU fallback). Warm RTF is ~0.82, and the reconstruction has correlation 1.0 against the PyTorch DAC.
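
For readers who want to reproduce the two metrics, here is a minimal sketch. It assumes RTF means processing time divided by audio duration and that "corr" means Pearson correlation between the two waveforms; the card does not define either, and the timing values in the example are hypothetical.

```python
import math
from typing import Sequence


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time over audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def pearson_corr(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length signals."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical timing: 0.82 s of compute for 1 s of audio.
print(real_time_factor(0.82, 1.0))
# A signal compared with a scaled copy of itself is perfectly correlated.
print(pearson_corr([0.0, 1.0, -1.0, 0.5], [0.0, 2.0, -2.0, 1.0]))
```

Note that Pearson correlation ignores gain and DC offset, so a value of 1.0 indicates matching waveform shape rather than sample-exact equality.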

Why the split

The decoder's ConvTranspose1d layers are rewritten into a GPU-clean zero-stuff form, because the real DAC's odd stride-5 transposed conv fails converter legalization and TRANSPOSE_CONV is rejected by Mali. The RVQ uses EMBEDDING_LOOKUP with int64 indices, which Mali also rejects, so it runs on the CPU. With the RVQ split out, the float conv graph stays fully on the GPU.
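
The zero-stuff idea rests on a standard identity: a transposed convolution equals an ordinary convolution applied to the input with stride-1 zeros inserted between samples, using the flipped kernel. The sketch below checks that identity for a single channel with no padding. It is an illustration of the general technique, not the repository's conversion script, and the function names are hypothetical.

```python
from typing import List, Sequence


def conv_transpose1d(x: Sequence[float], w: Sequence[float], stride: int) -> List[float]:
    """Reference transposed conv: each input sample scatters a scaled kernel."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk
    return out


def zero_stuff(x: Sequence[float], stride: int) -> List[float]:
    """Insert stride-1 zeros between consecutive samples."""
    stuffed = [0.0] * ((len(x) - 1) * stride + 1)
    for i, xi in enumerate(x):
        stuffed[i * stride] = xi
    return stuffed


def conv1d_valid(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Stride-1 cross-correlation with no padding, as a regular conv layer computes."""
    n_out = len(x) - len(w) + 1
    return [sum(x[n + j] * w[j] for j in range(len(w))) for n in range(n_out)]


def conv_transpose1d_zero_stuffed(
    x: Sequence[float], w: Sequence[float], stride: int
) -> List[float]:
    """Same result using only zero insertion, padding and a regular conv."""
    pad = [0.0] * (len(w) - 1)
    padded = pad + zero_stuff(x, stride) + pad
    return conv1d_valid(padded, list(w)[::-1])  # flipped kernel


# Stride 5 as in the text; the kernel length of 10 is an arbitrary choice here.
x = [1.0, -2.0, 3.0]
w = [float(k + 1) for k in range(10)]
reference = conv_transpose1d(x, w, stride=5)
rewritten = conv_transpose1d_zero_stuffed(x, w, stride=5)
print(reference == rewritten)  # prints True
```

A rewrite along these lines avoids emitting a TRANSPOSE_CONV op, since the graph then contains only padding and a regular convolution. A real decoder layer would also need the multi-channel case and the layer's own padding handled.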

Android sample + conversion/validation scripts: https://github.com/john-rocky/LiteRT-Models/tree/main/dac

License: MIT (Descript DAC).
