SUFEHeisenberg/Fin-RoBERTa

We collected financial domain terms from Investopedia's financial terms dictionary, NYSSCPA's accounting terminology guide, and Harvey's Hypertextual Finance Glossary to expand RoBERTa's vocabulary.
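As an illustration of the vocabulary expansion described above, here is a minimal sketch using the Hugging Face `transformers` library. It is not the authors' actual script: the helper names `select_new_terms` and `expand_roberta_vocab` are hypothetical, and `roberta-base` is an assumed starting checkpoint, since the card does not say which RoBERTa size was used.

```python
def select_new_terms(candidate_terms, existing_vocab):
    """Return candidate terms missing from the vocabulary, de-duplicated, in order.

    Hypothetical helper: the membership check is approximate for RoBERTa,
    whose byte-level BPE vocabulary stores word-initial tokens with a
    leading marker character.
    """
    seen = set()
    new_terms = []
    for term in candidate_terms:
        term = term.strip()
        if term and term not in existing_vocab and term not in seen:
            seen.add(term)
            new_terms.append(term)
    return new_terms


def expand_roberta_vocab(terms, base_model="roberta-base"):
    """Add financial terms to the tokenizer and resize the model embeddings.

    `base_model` is an assumption, not taken from the model card.
    """
    # Imported lazily so select_new_terms works without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForMaskedLM.from_pretrained(base_model)

    new_terms = select_new_terms(terms, tokenizer.get_vocab())
    num_added = tokenizer.add_tokens(new_terms)

    # The embedding rows for the added tokens are newly initialised and
    # only become meaningful after continual pretraining.
    model.resize_token_embeddings(len(tokenizer))
    return tokenizer, model, num_added
```

Calling `expand_roberta_vocab(["EBITDA", "amortization"])` downloads the assumed base checkpoint and returns the extended tokenizer, the resized model, and the number of tokens actually added.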

Starting from this RoBERTa with added financial terms, we pretrained our model on multiple financial corpora:

Financial Terms
Financial Datasets
Earnings Call: 2016-2023 earnings call transcripts of NASDAQ 100 component stocks.

In the continual pretraining step, we applied the following experimental settings to achieve better fine-tuned results on Four Financial Datasets:

Masking Probability: 0.4 (instead of default 0.15)
Warmup Steps: 0 (gave better results than models trained with warmup steps)
Epochs: 1 (sufficient; more epochs risk overfitting)
Weight Decay: 0.01
Train Batch Size: 64
FP16
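The settings above could be expressed with the Hugging Face `transformers` training API roughly as follows. This is a sketch under stated assumptions, not the authors' training script: `PRETRAIN_SETTINGS`, `build_pretraining_components`, and the output directory name are hypothetical, and the batch size of 64 is treated as a per-device value because the card does not say whether it is per device or global.

```python
# Values copied from the list above; the key names follow the transformers API.
PRETRAIN_SETTINGS = {
    "mlm_probability": 0.4,             # masking probability (library default is 0.15)
    "warmup_steps": 0,
    "num_train_epochs": 1,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 64,  # assumption: 64 is per device
    "fp16": True,
}


def build_pretraining_components(tokenizer, output_dir="fin-roberta-pretrain"):
    """Build the MLM data collator and training arguments (hypothetical helper)."""
    # Imported lazily so the settings dict can be inspected without transformers.
    from transformers import DataCollatorForLanguageModeling, TrainingArguments

    s = PRETRAIN_SETTINGS
    # The collator masks tokens dynamically each time a batch is built.
    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=True,
        mlm_probability=s["mlm_probability"],
    )
    # fp16=True generally requires a CUDA-capable GPU.
    args = TrainingArguments(
        output_dir=output_dir,
        warmup_steps=s["warmup_steps"],
        num_train_epochs=s["num_train_epochs"],
        weight_decay=s["weight_decay"],
        per_device_train_batch_size=s["per_device_train_batch_size"],
        fp16=s["fp16"],
    )
    return collator, args
```

The returned `collator` and `args` can then be passed to a `Trainer` as `data_collator` and `args`, together with the model and a tokenized training dataset.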
