Model Card for testsandrocks/parler-tts-mini-paimon

Parler-TTS model that mimics Paimon's voice from Genshin Impact.

Model Description

Parler-TTS model fine-tuned on Paimon's voice lines extracted from Genshin Impact.

How to Get Started with the Model

pip install git+https://github.com/huggingface/parler-tts.git

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer, set_seed
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("testsandrocks/parler-tts-mini-paimon").to(device)
tokenizer = AutoTokenizer.from_pretrained("testsandrocks/parler-tts-mini-paimon")

prompt = "Why the fudge does nothing ever work right the first time? Traveler let's get out of here."
description = "paimon's speech is very clear, and she speaks in a very monotone voice, with minimal variation in speed"

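# The description conditions the voice style; the prompt is the text to be spoken.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Fix the seed so the sampled audio is reproducible.
set_seed(42)
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
# Move the waveform to the CPU and drop the batch dimension before saving.
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)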
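
The training command in this card caps clips at 20 seconds, so very long prompts may fall outside what the model saw during fine-tuning. One possible workaround, sketched below, is to split long text into sentences and generate each one separately. The helper name `split_into_sentences` is hypothetical and not part of Parler-TTS, and whether splitting actually improves output for this model is an untested assumption.

```python
import re


def split_into_sentences(text: str) -> list[str]:
    """Split text after '.', '!' or '?' when followed by whitespace.

    Hypothetical helper for illustration; it does not handle abbreviations
    such as "Mr." and is not part of the Parler-TTS library.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty pieces (for example when the input is an empty string).
    return [part for part in parts if part]


if __name__ == "__main__":
    long_prompt = "Paimon is hungry! Can we stop for food? Paimon knows a good place."
    for sentence in split_into_sentences(long_prompt):
        print(sentence)
```

Each returned sentence can be passed as `prompt` in the snippet above, and the resulting arrays joined with `numpy.concatenate` before writing the file.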

Training Details

Training Data

  • simon3000/genshin-voice
  • testsandrocks/gi_voices_en_paimon

Training Procedure

!git clone https://github.com/huggingface/parler-tts.git
%cd parler-tts
!pip install --quiet -e .[train]
!accelerate launch ./training/run_parler_tts_training.py \
    --model_name_or_path "parler-tts/parler_tts_mini_v0.1" \
    --feature_extractor_name "parler-tts/dac_44khZ_8kbps" \
    --description_tokenizer_name "parler-tts/parler_tts_mini_v0.1" \
    --prompt_tokenizer_name "parler-tts/parler_tts_mini_v0.1" \
    --report_to "wandb" \
    --overwrite_output_dir true \
    --train_dataset_name "testsandrocks/gi_voices_en_paimon" \
    --train_metadata_dataset_name "testsandrocks/paimon-tagged-w-speech-mistral" \
    --train_dataset_config_name "default" \
    --train_split_name "train" \
    --eval_dataset_name "testsandrocks/gi_voices_en_paimon" \
    --eval_metadata_dataset_name "testsandrocks/paimon-tagged-w-speech-mistral" \
    --eval_dataset_config_name "default" \
    --eval_split_name "train" \
    --max_eval_samples 8 \
    --per_device_eval_batch_size 8 \
    --target_audio_column_name "audio" \
    --description_column_name "text_description" \
    --prompt_column_name "text" \
    --max_duration_in_seconds 20 \
    --min_duration_in_seconds 2.0 \
    --max_text_length 400 \
    --preprocessing_num_workers 2 \
    --do_train true \
    --num_train_epochs 2 \
    --gradient_accumulation_steps 18 \
    --gradient_checkpointing true \
    --per_device_train_batch_size 2 \
    --learning_rate 0.00008 \
    --adam_beta1 0.9 \
    --adam_beta2 0.99 \
    --weight_decay 0.01 \
    --lr_scheduler_type "constant_with_warmup" \
    --warmup_steps 50 \
    --logging_steps 2 \
    --freeze_text_encoder true \
    --audio_encoder_per_device_batch_size 4 \
    --dtype "float16" \
    --seed 456 \
    --output_dir "./output_dir_training/" \
    --temporary_save_to_disk "./audio_code_tmp/" \
    --save_to_disk "./tmp_dataset_audio/" \
    --dataloader_num_workers 2 \
    --do_eval \
    --predict_with_generate \
    --include_inputs_for_metrics \
    --group_by_length true
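
To make two of the flags above concrete, the following sketch works out the effective batch size and the learning rate over time. The constants are copied from the command; the function names and the `num_processes` parameter are illustrative assumptions, since the card does not state how many GPUs were used. The schedule function mirrors the usual meaning of `constant_with_warmup` (linear ramp, then flat) rather than calling the training script itself.

```python
# Values copied from the accelerate launch command above.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 18
LEARNING_RATE = 0.00008
WARMUP_STEPS = 50


def effective_batch_size(num_processes: int = 1) -> int:
    """Number of samples that contribute to one optimizer step.

    num_processes is an assumption: the card does not say how many
    devices the run used.
    """
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_processes


def lr_at_step(step: int) -> float:
    """Learning rate at a given optimizer step under constant-with-warmup.

    The rate rises linearly from 0 over WARMUP_STEPS steps, then stays
    at LEARNING_RATE for the rest of training.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    return LEARNING_RATE


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())  # 2 * 18 * 1 = 36
    for step in (0, 25, 50, 500):
        print(step, lr_at_step(step))
```

On a single process, each optimizer step therefore sees 36 samples, and the learning rate reaches its full value of 0.00008 at step 50.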

Summary

Model size: 647M params (Safetensors, FP16)