DAC (Descript Audio Codec) 16 kHz - LiteRT (CompiledModel GPU)

Descript Audio Codec running on-device on the LiteRT CompiledModel GPU (ML Drift). The convolutional encoder and decoder run on the GPU; the RVQ (residual vector quantizer) runs on the CPU. 43:1 compression (1 s → 12×50 codes), RTF ≈ 0.82 (faster than real-time) on Pixel 8a.
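The 43:1 figure can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes 16-bit PCM input and 1024-entry codebooks (10 bits per code); neither value is stated in this card, so treat both as assumptions.

```python
SAMPLE_RATE = 16_000      # Hz, from the model name
FRAMES_PER_SEC = 50       # from the [12, 50] code shape for 1 s of audio
N_CODEBOOKS = 12          # from the RVQ description
PCM_BITS = 16             # assumption: 16-bit PCM source audio
CODEBOOK_SIZE = 1024      # assumption: entries per codebook

# Bits needed to address one codebook entry (10 for 1024 entries).
bits_per_code = (CODEBOOK_SIZE - 1).bit_length()

pcm_bps = SAMPLE_RATE * PCM_BITS                          # raw audio bitrate
code_bps = N_CODEBOOKS * FRAMES_PER_SEC * bits_per_code   # coded bitrate
ratio = pcm_bps / code_bps                                # compression ratio
hop = SAMPLE_RATE // FRAMES_PER_SEC                       # samples per latent frame

print(f"{pcm_bps} bit/s -> {code_bps} bit/s, ratio {ratio:.1f}:1, hop {hop}")
```

Under these assumptions the ratio rounds to 43:1, and each latent frame covers 320 input samples.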

Files

  • dac_16khz_encoder_fp16.tflite (43 MB): audio[1,1,16000] → latent[1,1024,50], GPU.
  • dac_16khz_deconly_zs_fp16.tflite (105 MB): latent[1,1024,50] → audio, GPU.
  • dac_rvq.bin (1.2 MB): RVQ weights (12 codebooks) for the CPU quantizer (float32 LE).
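Since dac_rvq.bin is described as raw little-endian float32, it can be read into a flat array as sketched below. The helper name `load_float32_le` is hypothetical, and the tensor layout inside the file is not documented here, so the sketch deliberately stops at a flat array rather than guessing shapes.

```python
import numpy as np

def load_float32_le(path):
    """Read a raw little-endian float32 file into a flat 1-D array."""
    # "<f4" forces little-endian float32 regardless of host byte order.
    return np.fromfile(path, dtype="<f4")
```

For example, `weights = load_float32_le("dac_rvq.bin")` followed by `weights.size` gives the total number of stored floats; how they split into codebooks should be taken from the conversion scripts.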

Pipeline

audio -> encoder.tflite (GPU) -> z -> RVQ.encode (CPU) -> codes[12,50]
      -> RVQ.decode (CPU) -> z_q -> decoder.tflite (GPU) -> audio
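The two CPU steps in the pipeline above can be sketched in NumPy as textbook residual vector quantization: each stage picks the nearest codebook entry for the current residual, and decoding sums the selected entries. This is a simplified illustration, not the repository's quantizer: it assumes plain codebooks of shape [N, K, D] living directly in the latent space, whereas the real DAC quantizer may add projection and normalization steps. The names `rvq_encode` and `rvq_decode` are hypothetical.

```python
import numpy as np

def rvq_encode(z, codebooks):
    """z: latent [D, T]; codebooks: [N, K, D]. Returns codes [N, T] (int32)."""
    residual = z.T.astype(np.float32).copy()              # [T, D]
    codes = np.empty((codebooks.shape[0], residual.shape[0]), dtype=np.int32)
    for i, cb in enumerate(codebooks):
        # Squared Euclidean distance from every frame to every entry: [T, K].
        d = ((residual ** 2).sum(1, keepdims=True)
             - 2.0 * residual @ cb.T
             + (cb ** 2).sum(1))
        idx = d.argmin(axis=1)
        codes[i] = idx
        residual -= cb[idx]                               # pass the remainder on
    return codes

def rvq_decode(codes, codebooks):
    """codes: [N, T]; codebooks: [N, K, D]. Returns z_q [D, T]."""
    z_q = np.zeros((codes.shape[1], codebooks.shape[2]), dtype=np.float32)
    for i, cb in enumerate(codebooks):
        z_q += cb[codes[i]]                               # sum of selected entries
    return z_q.T
```

With a [1024, 50] latent and 12 codebooks, `rvq_encode` returns codes of shape [12, 50], matching the shapes in the pipeline.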

On-device (Pixel 8a, Tensor G3, verified)

All encoder nodes (367/367) and all decoder nodes (398/398) run on the LiteRT GPU delegate (LITERT_CL, 1 partition, no CPU fallback). Warm RTF is ~0.82, and the reconstruction has correlation 1.0 against PyTorch DAC.
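For reference, the two metrics can be computed as sketched below. This assumes RTF means processing time divided by audio duration (consistent with values below 1 being faster than real-time) and that the correlation is a Pearson correlation over the flattened waveforms; the function names and the `process` callable are hypothetical and not taken from the validation scripts.

```python
import time
import numpy as np

def real_time_factor(process, audio, sample_rate=16_000, warmup=1, runs=5):
    """Average processing time per run divided by the audio duration."""
    for _ in range(warmup):            # warm-up runs are excluded from timing
        process(audio)
    start = time.perf_counter()
    for _ in range(runs):
        process(audio)
    elapsed = (time.perf_counter() - start) / runs
    return elapsed / (audio.shape[-1] / sample_rate)

def pearson_corr(a, b):
    """Pearson correlation between two signals, flattened to 1-D."""
    a = np.ravel(a).astype(np.float64)
    b = np.ravel(b).astype(np.float64)
    return float(np.corrcoef(a, b)[0, 1])
```

Here `process` stands for the full encode, quantize, decode round trip, and `pearson_corr` would be applied to the on-device output and the PyTorch reference output for the same input.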

Why the split

The decoder's ConvTranspose1d layers are rewritten into a GPU-clean zero-stuffed form, because the real DAC's odd stride-5 transposed conv fails converter legalization and TRANSPOSE_CONV is rejected by Mali. The RVQ uses EMBEDDING_LOOKUP with int64 indices, which Mali also rejects, so it runs on the CPU. As a result, the float conv graph stays fully on the GPU.
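The zero-stuffing idea rests on a standard identity: a transposed convolution with stride s equals inserting s-1 zeros between input samples and then applying an ordinary convolution with the flipped kernel. The single-channel NumPy sketch below checks that identity for stride 5. It ignores padding, output padding, and multiple channels, the kernel length of 10 is an illustrative assumption, and it is not the repository's actual graph rewrite.

```python
import numpy as np

def conv_transpose1d_ref(x, w, stride):
    """Reference transposed conv: scatter-add a scaled kernel per input sample."""
    y = np.zeros((len(x) - 1) * stride + len(w), dtype=np.float64)
    for i, xi in enumerate(x):
        y[i * stride : i * stride + len(w)] += xi * w
    return y

def conv_transpose1d_zero_stuff(x, w, stride):
    """Same result using zero-stuffing followed by a regular convolution."""
    k = len(w)
    # Insert (stride - 1) zeros between consecutive input samples.
    u = np.zeros((len(x) - 1) * stride + 1, dtype=np.float64)
    u[::stride] = x
    # Pad k-1 on both sides, then cross-correlate with the flipped kernel,
    # which is how a framework conv layer would express this step.
    u = np.pad(u, (k - 1, k - 1))
    return np.correlate(u, w[::-1], mode="valid")

rng = np.random.default_rng(0)
x = rng.standard_normal(50)     # 50 latent frames
w = rng.standard_normal(10)     # assumed kernel length
a = conv_transpose1d_ref(x, w, stride=5)
b = conv_transpose1d_zero_stuff(x, w, stride=5)
print(a.shape, np.allclose(a, b))
```

The rewritten form only needs a zero-insertion step and a plain convolution, which avoids emitting a TRANSPOSE_CONV op.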

Android sample + conversion/validation scripts: https://github.com/john-rocky/LiteRT-Models/tree/main/dac

License: MIT (Descript DAC).
