license: cc-by-nc-4.0
tags:

  • mms
  • vits

pipeline_tag: text-to-speech

Model Details

This model uses facebook/mms-tts-kor as its base model.

How to Get Started with the Model

PyTorch

import sys

import soundfile as sf
import torch
from transformers import AutoTokenizer

# Make the training code importable; adjust this path to your own checkout.
sys.path.append('/home/user/AZ/tts_emotion')
from train import EmotionalVitsModel  # model class defined in train.py


def inference_emotional_tts(checkpoint_path, text="안녕하세요", emotion=0, output_path="output.wav"):
    # Use the GPU when one is available, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Initialize the model and load the fine-tuned weights
    model = EmotionalVitsModel()
    checkpoint = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.to(device)
    model.eval()

    # Tokenizer (the one shipped with the base model)
    tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-kor")
    text_tokens = tokenizer(text, return_tensors="pt")

    # Inference
    with torch.no_grad():
        output = model(
            input_ids=text_tokens['input_ids'].to(device),
            attention_mask=text_tokens['attention_mask'].to(device),
            emotion=torch.tensor([emotion], device=device)
        )

    # Move the waveform to the CPU and convert it to a 1-D numpy array
    # (squeeze() drops the batch and channel dimensions of a single utterance)
    audio = output.waveform.squeeze().cpu().numpy()

    # Save as a wav file. 22050 Hz is used here as the VITS default sample rate;
    # the MMS-TTS base checkpoints are published at 16 kHz, so verify this
    # against the sampling_rate your model was actually trained with.
    sf.write(output_path, audio, 22050)

    return output_path


inference_emotional_tts('',  # checkpoint path
                        text="안녕하세요 저는, 차비스입니다.",
                        emotion=2,  # 0: neutral, 1: happy, 2: sad
                        output_path="IamChavis.wav")



Training Details

Training Data

This model was trained on 3,000 samples from AI Hub's emotional speech synthesis dataset (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=120&topMenu=100&aihubDataSe=extrldata&dataSetSn=286): 1,000 samples each from the neutral, positive, and negative subsets.
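The card does not say how the 1,000 samples per emotion were chosen. As a minimal sketch, assuming a uniform random draw from a hypothetical manifest of `(path, label)` pairs, a balanced subset could be built like this. The function name `sample_balanced` and the manifest layout are illustrative and do not come from the original training code.

```python
import random
from collections import defaultdict


def sample_balanced(items, per_class=1000, seed=0):
    """Pick `per_class` items for each label from (path, label) pairs."""
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        paths = by_label[label]
        if len(paths) < per_class:
            raise ValueError(f"label {label!r} has only {len(paths)} items")
        # Assumption: uniform sampling without replacement within each class
        subset.extend((p, label) for p in rng.sample(paths, per_class))
    return subset
```

With three labels and the default `per_class=1000`, the returned list has 3,000 entries, matching the dataset size described above.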

Training Procedure

The parameters pretrained on Korean were frozen and only the emotion layers were updated, so the model keeps its existing Korean speech ability while improving only its emotional expressiveness.
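The freezing scheme described above can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the author's `train.py`: `TinyEmotionalModel` is a hypothetical stand-in for the real model, and selecting the emotion layers by the substring `"emotion"` in the parameter name is a guess about how they are named. The optimizer and learning rate follow the hyperparameters listed in this card.

```python
import torch
from torch import nn


class TinyEmotionalModel(nn.Module):
    """Hypothetical stand-in: a 'pretrained' backbone plus a new emotion embedding."""

    def __init__(self, hidden=8, num_emotions=3):
        super().__init__()
        self.backbone = nn.Linear(hidden, hidden)
        self.emotion_embedding = nn.Embedding(num_emotions, hidden)

    def forward(self, x, emotion):
        return self.backbone(x + self.emotion_embedding(emotion))


def freeze_all_but_emotion(model, keyword="emotion"):
    """Disable gradients for every parameter whose name lacks `keyword`."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
        if param.requires_grad:
            trainable.append(param)
    return trainable


model = TinyEmotionalModel()
trainable_params = freeze_all_but_emotion(model)
# Only the emotion parameters are handed to the optimizer
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)
```

Passing only the trainable parameters to the optimizer keeps the frozen backbone weights untouched by weight decay as well as by gradient updates.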

Training Hyperparameters

| Hyperparameter | Base |
| --- | --- |
| Learning rate | 1e-4 |
| Batch size | 128 |
| Padding | Audio length adjusted to segment_length (68608): the center portion is extracted if the clip is longer, and zeros are padded if it is shorter |
| Optimizer | AdamW |
| Epochs | 30 |
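The length adjustment listed under padding can be sketched with NumPy as below. The card only states "center crop if longer, zero-pad if shorter"; where the zeros go is not specified, so this sketch assumes they are appended at the end. The function name `fit_to_segment` is illustrative.

```python
import numpy as np

SEGMENT_LENGTH = 68608  # segment_length value listed in the hyperparameters


def fit_to_segment(audio, segment_length=SEGMENT_LENGTH):
    """Center-crop a 1-D waveform that is too long, zero-pad one that is too short."""
    n = audio.shape[0]
    if n >= segment_length:
        # Keep the middle `segment_length` samples
        start = (n - segment_length) // 2
        return audio[start:start + segment_length]
    padded = np.zeros(segment_length, dtype=audio.dtype)
    padded[:n] = audio  # assumption: zeros are appended after the signal
    return padded
```

Every clip then has the same length, so a batch can be stacked into a single array of shape `(batch_size, 68608)`.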