manchuBERT
manchuBERT is a BERT-base model trained from scratch on romanized Manchu data.
Data
manchuBERT uses the data augmentation method described in "Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data".
Data | Number of Sentences (before augmentation) |
---|---|
Manwén Lǎodàng–Taizong | 2,220 |
Ilan gurun i bithe | 41,904 |
Gin ping mei bithe | 21,376 |
Yùzhì Qīngwénjiàn | 11,954 |
Yùzhì Zengdìng Qīngwénjiàn | 18,420 |
Manwén Lǎodàng–Taizu | 22,578 |
Manchu-Korean Dictionary | 40,583 |
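
To get a sense of the overall corpus size and how the sources are balanced, the counts in the table can be summed and turned into per-source shares. The snippet below is only an illustrative sketch: the dictionary and the helper name `corpus_summary` are introduced here for convenience and are not part of the model's training code. The numbers are copied from the table and describe the data before augmentation.

```python
# Sentence counts before augmentation, copied from the table above.
# The variable and function names are illustrative, not from the model's code.
SENTENCE_COUNTS = {
    "Manwén Lǎodàng–Taizong": 2_220,
    "Ilan gurun i bithe": 41_904,
    "Gin ping mei bithe": 21_376,
    "Yùzhì Qīngwénjiàn": 11_954,
    "Yùzhì Zengdìng Qīngwénjiàn": 18_420,
    "Manwén Lǎodàng–Taizu": 22_578,
    "Manchu-Korean Dictionary": 40_583,
}


def corpus_summary(counts):
    """Return the total sentence count and each source's share of that total."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


total, shares = corpus_summary(SENTENCE_COUNTS)
print(f"Total sentences before augmentation: {total:,}")

# List sources from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```

The total comes to 159,035 sentences before augmentation, with Ilan gurun i bithe and the Manchu-Korean Dictionary as the two largest sources.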