KR-FinBert & KR-FinBert-SC

Much progress has been made in the NLP (Natural Language Processing) field, with numerous studies showing that domain adaptation on a small-scale corpus and fine-tuning with labeled data are effective for overall performance improvement. We propose KR-FinBert for the financial domain, obtained by further pre-training on a financial corpus and fine-tuning for sentiment analysis. Consistent with those studies, domain adaptation followed by fine-tuning on the downstream task clearly improved performance in our experiment.

KR-FinBert

Data

The training data for this model extends that of KR-BERT-MEDIUM: texts from Korean Wikipedia, general news articles, legal texts crawled from the National Law Information Center, and the Korean Comments dataset. For transfer learning, we added corporate-related economic news articles from 72 media sources, such as the Financial Times and The Korean Economy Daily, and analyst reports from 16 securities companies, such as Kiwoom Securities and Samsung Securities. The dataset includes 440,067 news titles with their content and 11,237 analyst reports. The total data size is about 13.22 GB. For MLM (masked language model) training, we split the data line by line, giving 6,379,315 lines in total. KR-FinBert was trained for 5.5M steps with a maximum sequence length of 512, a training batch size of 32, and a learning rate of 5e-5; training took 67.48 hours on an NVIDIA TITAN XP.
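The line-by-line split and the stated hyperparameters can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the names `PretrainConfig` and `split_into_lines` are hypothetical, and dropping blank lines is an assumption the card does not state.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class PretrainConfig:
    # Values quoted from the paragraph above; the field names are hypothetical.
    max_seq_length: int = 512
    train_batch_size: int = 32
    learning_rate: float = 5e-5
    train_steps: int = 5_500_000


def split_into_lines(documents: Iterable[str]) -> Iterator[str]:
    """Yield one stripped, non-empty line per training example."""
    for doc in documents:
        for line in doc.splitlines():
            line = line.strip()
            if line:  # assumption: blank lines are not counted as examples
                yield line


# Toy documents standing in for news articles and analyst reports.
docs = [
    "Title of an article\nFirst body line.\n\nSecond body line.",
    "Analyst report line.",
]
lines = list(split_into_lines(docs))
print(len(lines), "lines;", PretrainConfig())
```

Applied to the full corpus, a split of this kind is what would produce the reported line count, provided the same blank-line handling was used.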

Downstream tasks

Sentiment Classification model

Downstream task performance with 50,000 labeled examples.

Model Accuracy
KR-FinBert 0.963
KR-BERT-MEDIUM 0.958
KcBert-large 0.955
KcBert-base 0.953
KoBert 0.817
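
Because most of the accuracies are close to one another, it can help to compare error rates rather than raw accuracy. The sketch below computes the relative error reduction of KR-FinBert over each baseline using only the figures in the table. The helper name `relative_error_reduction` is hypothetical, and since the card does not give the size of the evaluation set, nothing here says whether the gaps are statistically significant.

```python
# Accuracies copied from the table above.
accuracy = {
    "KR-FinBert": 0.963,
    "KR-BERT-MEDIUM": 0.958,
    "KcBert-large": 0.955,
    "KcBert-base": 0.953,
    "KoBert": 0.817,
}


def relative_error_reduction(baseline_acc: float, model_acc: float) -> float:
    """Fraction of the baseline's errors that the model removes."""
    baseline_err = 1.0 - baseline_acc
    model_err = 1.0 - model_acc
    return (baseline_err - model_err) / baseline_err


reductions = {
    name: relative_error_reduction(acc, accuracy["KR-FinBert"])
    for name, acc in accuracy.items()
    if name != "KR-FinBert"
}

for name, value in reductions.items():
    print(f"KR-FinBert vs {name}: {value:.1%} fewer errors")
```

For example, the 0.5-point accuracy gain over KR-BERT-MEDIUM corresponds to roughly 12% fewer misclassified examples.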

Inference sample

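Positive
현대바이오, '폴리탁셀' 코로나19 치료 가능성에 19% 급등 (Hyundai Bio surges 19% on the possibility that 'Polytaxel' could treat COVID-19)
이수화학, 3분기 영업익 176억…전년比 80%↑ (Isu Chemical posts Q3 operating profit of 17.6 billion won, up 80% year on year)
"GKL, 7년 만에 두 자릿수 매출성장 예상" ("GKL expected to see double-digit sales growth for the first time in 7 years")
위지윅스튜디오, 콘텐츠 활약에 사상 첫 매출 1000억원 돌파 (Wysiwyg Studios tops 100 billion won in sales for the first time on strong content performance)
삼성전자, 2년 만에 인도 스마트폰 시장 점유율 1위 '왕좌 탈환' (Samsung Electronics 'reclaims the throne' as No. 1 in Indian smartphone market share after 2 years)

Negative
영화관株 '코로나 빙하기' 언제 끝나나…"CJ CGV 올 4000억 손실 날수도" (When will the 'COVID ice age' for cinema stocks end? "CJ CGV could lose 400 billion won this year")
C쇼크에 멈춘 흑자비행…대한항공 1분기 영업적자 566억 (Run of profits halted by the C-shock: Korean Air posts Q1 operating loss of 56.6 billion won)
'1000억대 횡령·배임' 최신원 회장 구속… SK네트웍스 "경영 공백 방지 최선" (Chairman Choi Shin-won arrested over embezzlement and breach of trust on the order of 100 billion won; SK Networks: "Doing our best to prevent a management vacuum")
부품 공급 차질에…기아차 광주공장 전면 가동 중단 (Parts supply disruption: Kia Motors' Gwangju plant halts all operations)
현대제철, 지난해 영업익 3,313억원···전년比 67.7% 감소 (Hyundai Steel's operating profit last year was 331.3 billion won, down 67.7% year on year)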
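
Headlines like the ones in the inference sample can be scored with the Hugging Face `transformers` text-classification pipeline. The following is a minimal sketch, assuming `transformers` and a backend such as PyTorch are installed and that the model can be downloaded from the repository named in the citation. The helper names `load_classifier`, `classify_headlines`, and `demo` are hypothetical, and the label strings are whatever the model's own configuration returns.

```python
from typing import Callable, Dict, List, Tuple

MODEL_ID = "snunlp/KR-FinBert-SC"  # repository named in the citation URL


def load_classifier():
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so classify_headlines has no hard dependency on transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def classify_headlines(
    classifier: Callable[[List[str]], List[Dict]],
    headlines: List[str],
) -> List[Tuple[str, str, float]]:
    """Return one (headline, label, score) triple per input headline."""
    results = classifier(headlines)
    return [(h, r["label"], float(r["score"])) for h, r in zip(headlines, results)]


def demo() -> None:
    """Score two headlines from the inference sample; requires network access."""
    classifier = load_classifier()
    headlines = [
        "이수화학, 3분기 영업익 176억…전년比 80%↑",
        "부품 공급 차질에…기아차 광주공장 전면 가동 중단",
    ]
    for headline, label, score in classify_headlines(classifier, headlines):
        print(f"{label}\t{score:.3f}\t{headline}")
```

Calling `demo()` prints one predicted label and confidence score per headline. Keeping `classify_headlines` independent of the loader makes it easy to swap in another classifier or a stub for testing.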

Citation

@misc{kr-FinBert-SC,
  author = {Kim, Eunhee and Shin, Hyopil},
  title = {KR-FinBert: Fine-tuning KR-FinBert for Sentiment Analysis},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://huggingface.co/snunlp/KR-FinBert-SC}}
}