Model Card for parler-tts-mini-paimon
A Parler-TTS model that mimics Paimon's voice from Genshin Impact.
Model Description
A Parler-TTS model fine-tuned on English Paimon voice lines extracted from Genshin Impact.
- Model type: Text-to-speech (TTS)
- Language(s) (NLP): English
- Finetuned from model: parler-tts/parler_tts_mini_v0.1
How to Get Started with the Model
pip install git+https://github.com/huggingface/parler-tts.git
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer, set_seed
import soundfile as sf
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = ParlerTTSForConditionalGeneration.from_pretrained("testsandrocks/parler-tts-mini-paimon").to(device)
tokenizer = AutoTokenizer.from_pretrained("testsandrocks/parler-tts-mini-paimon")
prompt = "Why the fudge does nothing ever work right the first time? Traveler, let's get out of here."
description = "paimon's speech is very clear, and she speaks in a very monotone voice, with minimal variation in speed"
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
set_seed(42)
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
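For longer passages, one option is to split the text at sentence boundaries and synthesize each piece separately, since the training command in this card caps clips at 20 seconds. The helper below is a minimal sketch and not part of Parler-TTS: the name `split_into_chunks` and the `max_chars` default of 200 are hypothetical and untuned.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than
    being cut mid-word. (Hypothetical helper, not a Parler-TTS API.)
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed as `prompt` in the snippet above, and the resulting audio arrays joined with `numpy.concatenate` before writing the file.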
Training Details
Training Data
- simon3000/genshin-voice
- testsandrocks/gi_voices_en_paimon
Training Procedure
!git clone https://github.com/huggingface/parler-tts.git
%cd parler-tts
!pip install --quiet -e .[train]
!accelerate launch ./training/run_parler_tts_training.py \
--model_name_or_path "parler-tts/parler_tts_mini_v0.1" \
--feature_extractor_name "parler-tts/dac_44khZ_8kbps" \
--description_tokenizer_name "parler-tts/parler_tts_mini_v0.1" \
--prompt_tokenizer_name "parler-tts/parler_tts_mini_v0.1" \
--report_to "wandb" \
--overwrite_output_dir true \
--train_dataset_name "testsandrocks/gi_voices_en_paimon" \
--train_metadata_dataset_name "testsandrocks/paimon-tagged-w-speech-mistral" \
--train_dataset_config_name "default" \
--train_split_name "train" \
--eval_dataset_name "testsandrocks/gi_voices_en_paimon" \
--eval_metadata_dataset_name "testsandrocks/paimon-tagged-w-speech-mistral" \
--eval_dataset_config_name "default" \
--eval_split_name "train" \
--max_eval_samples 8 \
--per_device_eval_batch_size 8 \
--target_audio_column_name "audio" \
--description_column_name "text_description" \
--prompt_column_name "text" \
--max_duration_in_seconds 20 \
--min_duration_in_seconds 2.0 \
--max_text_length 400 \
--preprocessing_num_workers 2 \
--do_train true \
--num_train_epochs 2 \
--gradient_accumulation_steps 18 \
--gradient_checkpointing true \
--per_device_train_batch_size 2 \
--learning_rate 0.00008 \
--adam_beta1 0.9 \
--adam_beta2 0.99 \
--weight_decay 0.01 \
--lr_scheduler_type "constant_with_warmup" \
--warmup_steps 50 \
--logging_steps 2 \
--freeze_text_encoder true \
--audio_encoder_per_device_batch_size 4 \
--dtype "float16" \
--seed 456 \
--output_dir "./output_dir_training/" \
--temporary_save_to_disk "./audio_code_tmp/" \
--save_to_disk "./tmp_dataset_audio/" \
--dataloader_num_workers 2 \
--do_eval \
--predict_with_generate \
--include_inputs_for_metrics \
--group_by_length true
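Two quantities implied by these flags can be worked out by hand: the effective batch size and the learning rate during warmup. The sketch below assumes a single GPU (the command does not state the device count) and reproduces the linear-then-flat shape of a `constant_with_warmup` schedule. The function names are illustrative and not part of the training script.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step.

    num_devices=1 is an assumption; the card does not say how many GPUs were used.
    """
    return per_device_batch * grad_accum_steps * num_devices


def constant_with_warmup_lr(step: int, base_lr: float = 8e-5,
                            warmup_steps: int = 50) -> float:
    """Learning rate at a given optimizer step.

    Ramps linearly from 0 to base_lr over warmup_steps, then stays constant.
    Defaults match --learning_rate 0.00008 and --warmup_steps 50.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


if __name__ == "__main__":
    # --per_device_train_batch_size 2 with --gradient_accumulation_steps 18
    print(effective_batch_size(2, 18))
    for step in (0, 25, 50, 100):
        print(step, constant_with_warmup_lr(step))
```

With more than one device, the effective batch size scales by the device count, so the single-GPU figure is a lower bound.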