NeuCodec Encoder (rten port)
This is a Rust-runtime port of the NeuCodec encoder, exported to ONNX and converted for use with rten, a pure-Rust ONNX runtime.
The encoder converts a 16 kHz mono audio reference into discrete codes that drive zero-shot voice cloning in the NeuTTS family of models.
This artifact exists to enable voice cloning in Ragtag, a local-first desktop AI application, under strict architectural constraints: no native ONNX runtime (no onnxruntime / ort) and no GPL dependencies. The constraint-clean path through rten closely matches the output of the original PyTorch encoder (see Quality verification below).
Model details
- Source model: neuphonic/neucodec (encoder portion only)
- Format: .rten (rten's native model format, converted from ONNX)
- Runtime: rten 0.22+
- Precision: fp32 (quantized variants may be added later)
- File size: approximately 1.77 GB
- Input: 16 kHz mono audio, fixed 20-second window (shorter inputs zero-padded)
- Output: discrete code tokens consumed by the NeuTTS backbone for voice cloning
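The fixed input window works out to 16,000 × 20 = 320,000 samples. Below is a minimal Rust sketch of fitting a reference into that window. It assumes the audio has already been converted to 16 kHz mono f32 samples. It also assumes that padding is appended after the audio and that truncation keeps the first 20 seconds; the card specifies neither. The helper name fit_to_window is hypothetical and is not part of rten or neutts-rs.

```rust
/// Sample rate the encoder expects (16 kHz mono).
const SAMPLE_RATE_HZ: usize = 16_000;
/// Length of the encoder's fixed input window, in seconds.
const WINDOW_SECONDS: usize = 20;
/// Samples per window: 16_000 * 20 = 320_000.
const WINDOW_SAMPLES: usize = SAMPLE_RATE_HZ * WINDOW_SECONDS;

/// Fit a 16 kHz mono reference into the fixed 20-second window (hypothetical helper).
/// Shorter inputs are zero-padded at the end; longer inputs are cut to the first
/// 20 seconds. Trailing padding and head-keeping truncation are assumptions.
fn fit_to_window(samples: &[f32]) -> Vec<f32> {
    // Start from an all-zero window so any unfilled tail is silence.
    let mut window = vec![0.0f32; WINDOW_SAMPLES];
    // Copy at most WINDOW_SAMPLES samples from the start of the reference.
    let n = samples.len().min(WINDOW_SAMPLES);
    window[..n].copy_from_slice(&samples[..n]);
    window
}
```

A 12-second reference therefore becomes 192,000 samples of audio followed by 128,000 zeros, while a 25-second reference loses its last 5 seconds.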
Provenance and conversion process
This model was produced from the original NeuCodec encoder through the following pipeline:
- PyTorch source: the neucodec Python package, encoder component only
- ONNX export: torch.onnx.export via a patched version of the author's export_encoder.py. The patch corrects two issues in the upstream export script (a probe-ordering bug, and a stale alias-free patch written against a different module structure) and replaces an ONNX-hostile dynamic operation in UpSample1d/LowPassFilter1d with a fixed buffer. The patched model produces codes identical to those of the original PyTorch model.
- rten conversion: rten-convert applied to the ONNX export. The full encoder, including the 600M-parameter Wav2Vec2-BERT 2.0 semantic model, converts cleanly with no unsupported operators.
Quality verification
The Rust runtime output was verified against the PyTorch reference:
- First 12 output tokens: identical between rten and PyTorch
- Overall token divergence: 1.00% of tokens differ across the full sequence, attributed to floating-point rounding differences near quantization boundaries
- Reconstruction parity: codes from the rten encoder, when fed through the existing Rust decoder, reconstruct audio matching the Python-encoded reference within tolerance
- Clone equivalence: clones driven by rten-encoded references are subjectively equivalent in quality to clones driven by Python-encoded references
Parity therefore holds end-to-end across the full chain: rten, ORT (ONNX Runtime), and PyTorch.
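The two token-level checks above (identical leading tokens and overall divergence) can be computed with a small comparison like the following. This is a hypothetical sketch, not the harness actually used to produce the figures above. It assumes codes are represented as u32 values and that both sequences have the same length, which should hold when both encoders see the same fixed 20-second window.

```rust
/// Number of leading positions at which two code sequences agree.
fn identical_prefix_len(a: &[u32], b: &[u32]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

/// Percentage of positions at which two equal-length code sequences differ.
fn divergence_percent(a: &[u32], b: &[u32]) -> f64 {
    assert_eq!(a.len(), b.len(), "code sequences must have the same length");
    if a.is_empty() {
        return 0.0;
    }
    // Count positions where the rten and reference codes disagree.
    let differing = a.iter().zip(b).filter(|(x, y)| x != y).count();
    100.0 * differing as f64 / a.len() as f64
}
```

For example, comparing [1, 2, 3, 4] against [1, 2, 9, 4] gives an identical prefix of 2 tokens and a divergence of 25%.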
Usage
This model is intended for use within Ragtag's clone pipeline. It is not a standalone TTS system. Using it requires:
- A NeuTTS backbone model (e.g., neuphonic/neutts-air-q4-gguf)
- A NeuCodec decoder (the Rust port included in neutts-rs)
- A G2P frontend producing IPA phonemes (Ragtag uses piper-plus-g2p)
- The rten runtime crate
The encoder runs on CPU; encoding a 20-second reference takes approximately 5 seconds on Apple Silicon.
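The components listed above fit together roughly as follows. This sketch uses hypothetical trait names and signatures to illustrate the data flow only. They are not the actual APIs of Ragtag, neutts-rs, or rten. Conditioning the backbone on a phonemized transcript of the reference is also an assumption of this sketch.

```rust
/// Hypothetical interfaces for the clone pipeline; names and signatures are
/// illustrative, not the real Ragtag, neutts-rs, or rten APIs.
trait G2p {
    /// Convert text to an IPA phoneme string.
    fn phonemize(&self, text: &str) -> String;
}

trait ReferenceEncoder {
    /// Encode 16 kHz mono samples (already fitted to the 20 s window) into codes.
    fn encode(&self, samples: &[f32]) -> Vec<u32>;
}

trait Backbone {
    /// Generate speech codes for the target phonemes in the reference voice.
    /// Conditioning on the reference phonemes is an assumption of this sketch.
    fn generate(&self, ref_codes: &[u32], ref_phonemes: &str, target_phonemes: &str) -> Vec<u32>;
}

trait CodecDecoder {
    /// Decode speech codes back to a waveform.
    fn decode(&self, codes: &[u32]) -> Vec<f32>;
}

/// Wire the four components together: encode the reference, phonemize both
/// texts, generate codes with the backbone, then decode to audio.
fn clone_voice(
    g2p: &dyn G2p,
    encoder: &dyn ReferenceEncoder,
    backbone: &dyn Backbone,
    decoder: &dyn CodecDecoder,
    reference_audio: &[f32],
    reference_text: &str,
    target_text: &str,
) -> Vec<f32> {
    let ref_codes = encoder.encode(reference_audio);
    let ref_phonemes = g2p.phonemize(reference_text);
    let target_phonemes = g2p.phonemize(target_text);
    let speech_codes = backbone.generate(&ref_codes, &ref_phonemes, &target_phonemes);
    decoder.decode(&speech_codes)
}
```

Because encoding a reference takes around 5 seconds, a design along these lines would likely encode each reference once and reuse its codes for every target sentence rather than re-encoding per utterance.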
Licence and attribution
This model is licensed under the Apache License 2.0. It is derived from the original NeuCodec encoder, which is also licensed under Apache 2.0.
When using this model, please retain the attribution to the original authors:
NeuCodec by Neuphonic
https://huggingface.co/neuphonic/neucodec
Licensed under Apache 2.0
The rten conversion and ONNX export patches are contributed by Ragtag / Captivated Ltd, also under Apache 2.0.
Limitations
- English-only G2P: while the encoder itself is language-agnostic, the current Ragtag pipeline uses an English G2P frontend. Non-English cloning is not currently supported.
- Fixed 20-second input: shorter references are zero-padded; longer references are truncated. The pipeline targets 12–15-second references for the guided recording flow.
- Quality depends on reference quality: clone quality tracks reference quality directly. Short, performative, or emotionally emphatic references bleed prosody into the output. Neutral, evenly-delivered, sufficiently-long references produce dramatically better clones.
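A guided recording flow could check a reference's duration against these bounds before encoding. The sketch below is hypothetical: the enum and function names are illustrative. It assumes 16 kHz mono samples, the 12–15 second target, and the 20-second encoder window stated above, and it treats exactly 20 seconds as fitting without truncation.

```rust
/// Sample rate assumed for the reference audio (16 kHz mono).
const SAMPLE_RATE_HZ: usize = 16_000;

/// Hypothetical duration categories for a recorded reference.
#[derive(Debug, PartialEq)]
enum ReferenceLength {
    /// Shorter than the 12 s lower bound of the guided-recording target.
    BelowTarget,
    /// Within the 12–15 s target range.
    InTarget,
    /// Longer than 15 s but still within the 20 s encoder window.
    AboveTarget,
    /// Longer than 20 s: the encoder window will truncate the excess.
    WillBeTruncated,
}

/// Classify a reference by duration, computed from its sample count at 16 kHz.
fn classify_reference(num_samples: usize) -> ReferenceLength {
    let seconds = num_samples as f64 / SAMPLE_RATE_HZ as f64;
    if seconds < 12.0 {
        ReferenceLength::BelowTarget
    } else if seconds <= 15.0 {
        ReferenceLength::InTarget
    } else if seconds <= 20.0 {
        ReferenceLength::AboveTarget
    } else {
        ReferenceLength::WillBeTruncated
    }
}
```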
Related resources
- Original NeuCodec: neuphonic/neucodec
- NeuTTS backbone: neuphonic/neutts-air
- rten runtime: github.com/robertknight/rten
- neutts-rs (Rust port reference): github.com/eugenehp/neutts-rs
- Ragtag: ragtag-ai.app
Citation
If you use this work, please cite both the original NeuCodec and this rten port:
@misc{neucodec2024,
author = {Neuphonic},
title = {NeuCodec},
year = {2024},
publisher = {Hugging Face},
url = {https://huggingface.co/neuphonic/neucodec}
}
@misc{neucodec-encoder-rten,
author = {Mallett, Leon},
title = {NeuCodec Encoder (rten port)},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/ragtag-ai/neucodec-encoder-rten}
}