|
DeBERTa trained from scratch |
|
|
|
Continued training from: https://huggingface.co/mikesong724/deberta-wiki-2006
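
For reference, a minimal sketch of loading this lineage of checkpoints with the standard transformers Auto classes (the class choice is an assumption; any DeBERTa masked-LM checkpoint should load the same way):

```python
# Minimal sketch: load the 2006 base checkpoint (or this model's own
# checkpoint) with the standard transformers Auto classes.
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "mikesong724/deberta-wiki-2006"  # the base model this card continues from
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
```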
|
|
|
Source data: https://dumps.wikimedia.org/archive/2010/ |
|
|
|
Tools used: https://github.com/mikesong724/Point-in-Time-Language-Model |
|
|
|
Training data: 2010 Wikipedia archive (6.1 GB), trained for 18 epochs (~108 GB of text seen), on top of the 2006 model's ~65 GB.
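
The actual pipeline lives in the linked Point-in-Time-Language-Model repository; purely as an illustration, continued masked-LM pretraining on a Wikipedia text dump with the transformers Trainer could look like the sketch below (the file path and all hyperparameters except the epoch count are placeholders, not the values used for this model):

```python
# Rough sketch of continued MLM pretraining on a Wikipedia dump with the
# transformers Trainer. Paths and hyperparameters are illustrative
# placeholders, not the settings used for this model.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("mikesong724/deberta-wiki-2006")
model = AutoModelForMaskedLM.from_pretrained("mikesong724/deberta-wiki-2006")

# Plain-text file extracted from the 2010 Wikipedia archive (placeholder path).
dataset = load_dataset("text", data_files={"train": "wiki_2010.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="deberta-wiki-2010",
    num_train_epochs=18,            # matches the epoch count reported above
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    save_strategy="epoch",
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```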
|
|
|
GLUE benchmark results (fine-tuning epochs in parentheses):

cola (3e): Matthews corr: 0.3640

sst2 (3e): acc: 0.9106

mrpc (5e): F1: 0.8505, acc: 0.7794

stsb (3e): Pearson: 0.8339, Spearman: 0.8312

qqp (3e): acc: 0.8965, F1: 0.8604

mnli (3e): mismatched acc: 0.8023

qnli (3e): acc: 0.8889

rte (3e): acc: 0.5271

wnli (5e): acc: 0.3380
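
These scores come from per-task fine-tuning; as an illustration, a CoLA run (3 epochs, scored with Matthews correlation) in the standard transformers/datasets style could look like the sketch below. Hyperparameters other than the epoch count are placeholders and may differ from the actual runs.

```python
# Sketch of fine-tuning and scoring one GLUE task (CoLA) in the standard
# transformers/datasets style; hyperparameters are placeholders.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "mikesong724/deberta-wiki-2006"  # placeholder; use this model's checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

raw = load_dataset("glue", "cola")
metric = evaluate.load("glue", "cola")  # Matthews correlation for CoLA

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = raw.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return metric.compute(predictions=np.argmax(logits, axis=-1), references=labels)

args = TrainingArguments(
    output_dir="glue-cola",
    num_train_epochs=3,              # "3e" in the results above
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```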