DeBERTa trained from scratch

Continued training from https://huggingface.co/mikesong724/deberta-wiki-2006

Source data: https://dumps.wikimedia.org/archive/2010/

Tools used: https://github.com/mikesong724/Point-in-Time-Language-Model

Training data: the 2010 Wikipedia archive (6.1 GB), trained for 18 epochs (≈108 GB of text processed), on top of the ≈65 GB already processed during the 2006 run.
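
The checkpoint can be loaded like any other transformers masked LM. A minimal sketch, assuming the repository id follows the 2006 predecessor's naming; `mikesong724/deberta-wiki-2010` is an assumption, not confirmed by this card:

```python
# Minimal loading sketch with Hugging Face transformers.
# NOTE: the repo id below is an assumption inferred from the
# 2006 predecessor's name; substitute the actual repository id.
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "mikesong724/deberta-wiki-2010"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("Wikipedia is a free online encyclopedia.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, seq_len, vocab_size)
```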

GLUE benchmark (fine-tuning epochs per task in parentheses):

| Task  | Epochs | Metric(s)                          |
|-------|--------|------------------------------------|
| CoLA  | 3      | Matthews corr: 0.3640              |
| SST-2 | 3      | Acc: 0.9106                        |
| MRPC  | 5      | F1: 0.8505, Acc: 0.7794            |
| STS-B | 3      | Pearson: 0.8339, Spearman: 0.8312  |
| QQP   | 3      | Acc: 0.8965, F1: 0.8604            |
| MNLI  | 3      | Acc (mismatched): 0.8023           |
| QNLI  | 3      | Acc: 0.8889                        |
| RTE   | 3      | Acc: 0.5271                        |
| WNLI  | 5      | Acc: 0.3380                        |
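
The scores above come from standard per-task GLUE fine-tuning. A sketch of how one such run (SST-2, 3 epochs) can be reproduced with the transformers Trainer; the hyperparameters shown are common defaults, not the settings documented for this card, and the repo id is again an assumption:

```python
# Hedged sketch: fine-tuning this checkpoint on one GLUE task (SST-2).
# Epoch count matches the "3e" reported above; batch size and learning
# rate are typical defaults, not confirmed by the model card.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model_id = "mikesong724/deberta-wiki-2010"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

encoded = dataset.map(tokenize, batched=True)

metric = evaluate.load("glue", "sst2")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return metric.compute(predictions=preds, references=labels)

args = TrainingArguments(
    output_dir="sst2-finetune",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # reports accuracy on the validation split
```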