metadata
language: ko
Pretrained BART in Korean
This is pretrained BART model with multiple Korean Datasets.
I used multiple datasets for generalizing the model for both colloquial and written texts.
The training is supported by TPU Research Cloud program.
The script which is used to pre-train model is here.
When you use the reference API, you must wrap the sentence with [BOS]
and [EOS]
like below example.
[BOS] ์๋
ํ์ธ์? ๋ฐ๊ฐ์์~~ [EOS]
Used Datasets
๋ชจ๋์ ๋ง๋ญ์น
- ์ผ์ ๋ํ ๋ง๋ญ์น 2020
- ๊ตฌ์ด ๋ง๋ญ์น
- ๋ฌธ์ด ๋ง๋ญ์น
- ์ ๋ฌธ ๋ง๋ญ์น
AIhub
- ๊ฐ๋ฐฉ๋ฐ์ดํฐ ์ ๋ฌธ๋ถ์ผ๋ง๋ญ์น
- ๊ฐ๋ฐฉ๋ฐ์ดํฐ ํ๊ตญ์ด๋ํ์์ฝ
- ๊ฐ๋ฐฉ๋ฐ์ดํฐ ๊ฐ์ฑ ๋ํ ๋ง๋ญ์น
- ๊ฐ๋ฐฉ๋ฐ์ดํฐ ํ๊ตญ์ด ์์ฑ