license: apache-2.0
datasets:
  - amphion/Emilia-Dataset
language:
  - de
  - en
base_model:
  - neuphonic/neucodec
tags:
  - audio
  - speech

NeuCodec decoder fine-tuned for German speech

This repository contains only the decoder of neuphonic/neucodec, fine-tuned on equal amounts of German and English speech from Emilia-Yodas to improve decoding quality for German speech. Because only the decoder was fine-tuned, the codebook is identical to the base model's, so this decoder works with the regular NeuCodec encoder.

We supply a compact class in NeuCodecDecoder.py for running inference with this decoder, since the NeuCodec codebase does not readily support loading model files from other Hugging Face repos.

Inference Example

import torch
import torchaudio

from NeuCodecDecoder import NeuCodecDecoder

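# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

decoder_model = NeuCodecDecoder.from_pretrained("DigitalLearningGmbH/neucodec-decoder-ft-de")
decoder_model = decoder_model.eval().to(device)

# `tokens` must be defined beforehand: a flat sequence of NeuCodec code indices,
# e.g. produced by the regular NeuCodec encoder.
codes = torch.tensor(tokens).unsqueeze(0).unsqueeze(0).to(device)  # (T,) -> (1, 1, T)

with torch.no_grad():
    decoded = decoder_model.decode_code(codes).cpu()

# Write the first item of the batch as a 24 kHz WAV file.
torchaudio.save("decoded.wav", decoded[0, :, :], 24_000)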
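
The `decode_code` call in the inference example takes the codes with a leading batch and channel dimension, which is why the flat token list is unsqueezed twice. Below is a minimal sketch of a helper that builds and sanity-checks that tensor. The name `prepare_codes` and its checks are hypothetical and not part of NeuCodecDecoder.py.

```python
import torch


def prepare_codes(tokens, device="cpu"):
    """Turn a flat sequence of integer code indices into a (1, 1, T) tensor.

    Hypothetical helper for illustration; not part of NeuCodecDecoder.py.
    """
    codes = torch.as_tensor(tokens, dtype=torch.long)
    if codes.ndim != 1 or codes.numel() == 0:
        raise ValueError(
            f"expected a non-empty flat sequence, got shape {tuple(codes.shape)}"
        )
    if (codes < 0).any():
        raise ValueError("code indices must be non-negative")
    # Add batch and channel dimensions: (T,) -> (1, 1, T).
    return codes[None, None, :].to(device)


codes = prepare_codes([12, 7, 40531, 9])
print(codes.shape)  # torch.Size([1, 1, 4])
```

The returned tensor can then be passed to `decoder_model.decode_code(...)` in place of the inline `unsqueeze` chain; pass `device="cuda"` when the model lives on the GPU.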

For more information, please refer to the original neuphonic/neucodec model card.