
Pretrained BART in Korean

This is a BART model pretrained on multiple Korean datasets.

I used multiple datasets so that the model generalizes to both colloquial and written text.

Training was supported by the TPU Research Cloud program.

The script used to pre-train the model is here.

When you use the Inference API, you must wrap the sentence with [BOS] and [EOS], as in the example below.

[BOS] 안녕하세요? 반가워요~~ [EOS]
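A minimal Python sketch of preparing that input. It assumes the hosted Inference API accepts a JSON body of the form {"inputs": ...}, which is the usual convention for Hugging Face Inference API requests; the helper names are hypothetical and not part of this model's code.

```python
import json


def wrap_with_special_tokens(sentence: str) -> str:
    """Surround a sentence with the [BOS] and [EOS] markers this model expects."""
    # Strip stray whitespace so each marker is separated by exactly one space.
    return f"[BOS] {sentence.strip()} [EOS]"


def build_inference_payload(sentence: str) -> bytes:
    """Hypothetical helper: JSON request body, assuming the {"inputs": ...} format."""
    body = {"inputs": wrap_with_special_tokens(sentence)}
    # ensure_ascii=False keeps Hangul readable instead of \uXXXX escapes.
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


if __name__ == "__main__":
    print(wrap_with_special_tokens("안녕하세요? 반가워요~~"))
    # prints: [BOS] 안녕하세요? 반가워요~~ [EOS]
    print(build_inference_payload("안녕하세요? 반가워요~~").decode("utf-8"))
```

Sending the payload (endpoint URL, authentication token) is left out on purpose, since those details depend on how the model is hosted.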

You can also test mask-filling performance using the [MASK] token, like this:

[BOS] [MASK] 먹었어? [EOS]
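To build such test inputs from complete sentences, the sketch below follows the text-infilling noise used in the original BART pretraining, where a whole span of words collapses into a single mask token. Whether this model's pretraining script used the same noising is an assumption, and splitting on whitespace is a simplification: the model's real tokenizer segments Korean text differently. The function name `mask_span` is hypothetical.

```python
def mask_span(sentence: str, start: int, length: int) -> str:
    """Replace `length` whitespace-separated words starting at `start` with one [MASK].

    Whitespace splitting is a simplification of the model's real tokenization.
    """
    words = sentence.split()
    if length < 1 or start < 0 or start + length > len(words):
        raise ValueError("span is out of range for this sentence")
    # Text infilling: the entire span becomes a single [MASK], so the model must
    # infer both the missing content and how many words were removed.
    masked = words[:start] + ["[MASK]"] + words[start + length:]
    # Wrap with the required [BOS]/[EOS] markers.
    return "[BOS] " + " ".join(masked) + " [EOS]"


if __name__ == "__main__":
    print(mask_span("점심 먹었어?", 0, 1))
    # prints: [BOS] [MASK] 먹었어? [EOS]
```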

Benchmark

Dataset              Metric      Score
KLUE NLI dev         Acc         0.7390
NSMC test            Acc         0.8877
QuestionPair test    Acc         0.9208
KLUE TC dev          Acc         0.8667
                     F1          0.8637
KLUE STS dev         F1          0.7654
                     Pearson     0.8090
                     Spearman    0.8040
KorSTS dev           F1          0.8067
                     Pearson     0.7909
                     Spearman    0.7784
HateSpeech dev       Bias Acc    0.8280
                     Hate Acc    0.5669

Used Datasets

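Modu Corpus (모두의 말뭉치)

  • Daily Conversation Corpus 2020 (일상 대화 말뭉치 2020)
  • Spoken Corpus (구어 말뭉치)
  • Written Corpus (문어 말뭉치)
  • Newspaper Corpus (신문 말뭉치)

AIhub

Sejong Corpus (세종 말뭉치)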
