Model without k-means applied

by elisha0904 - opened Oct 14, 2023

Oct 14, 2023

I'd like to use this model while training a Korean speaker npz file for use with Bark.
Specifically, I'm trying to adapt the model below.
https://huggingface.co/GitMylo/bark-voice-cloning

However, I was told that training the npz file requires a "HuBERT model without Kmeans".
From the model card, the pretrained Hubert-base-korean model currently supported in transformers appears to be a version that has had k-means applied.
Is there a pretrained version that has not?

hyunwoo3235

lucid org Oct 14, 2023

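HuBERT is, from the very start, a model trained on data produced by running k-means on features such as MFCCs.
hubert-base-korean was trained the same way as the original, in two stages that use MFCCs and then the 9th hidden states. Because the paper reports that the result after the second stage is far better, only the final checkpoint was uploaded.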

I think the "without Kmeans" in the model you mentioned probably means that, since k-means is only needed during pretraining, the prediction step is skipped by using features_only. This is equivalent to last_hidden_state in transformers.
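
As a rough illustration of reading last_hidden_state in transformers, here is a minimal sketch. It is not the training setup of hubert-base-korean. To keep it runnable offline it builds a tiny, randomly initialised HubertModel from a made-up HubertConfig (all sizes below are assumptions chosen only for speed) and feeds it random noise in place of real audio. With the actual checkpoint you would presumably load the weights with HubertModel.from_pretrained and pass real 16 kHz waveforms instead.

```python
import torch
from transformers import HubertConfig, HubertModel

# Hypothetical tiny config so the example runs without downloading weights.
# These sizes are NOT those of hubert-base-korean.
config = HubertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(16, 16),
    conv_stride=(5, 2),
    conv_kernel=(10, 3),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=2,
)

# Randomly initialised encoder; eval() turns off dropout and time masking.
model = HubertModel(config).eval()

# One second of fake mono audio at 16 kHz, shape (batch, samples).
waveform = torch.randn(1, 16000)

with torch.no_grad():
    outputs = model(waveform)

# Frame-level features from the last transformer layer.
# No k-means codebook or cluster prediction is involved at this point.
features = outputs.last_hidden_state
print(features.shape)  # (batch, frames, hidden_size)
```

The tensor in features is what a downstream tool would consume as continuous HuBERT features, with one vector per audio frame.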

hyunwoo3235 changed discussion status to closed Oct 14, 2023
