ERCDiDip commited on
Commit
7b77d15
·
1 Parent(s): 572c89f

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -21,7 +21,7 @@ You can directly use this model as a language detector, i.e. for sequence classi
21
 
22
  Modern: Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Irish (ga), Italian (it), Latvian (lv), Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Turkish (tr), Basque (eu), Catalan (ca), Albanian (sq), Serbian (se), Ukrainian (uk), Norwegian (no), Arabic (ar), Chinese (zh), Hebrew (he)
23
 
24
- Medieval: Middle High German (mhd), Latin (la), Middle Low German (gml), Old French (fro), Old Chruch Slavonic (chu), Early New High German (fnhd)
25
 
26
  ## Training and evaluation data
27
  The model was fine-tuned using the Monasterium and Wikipedia datasets, which consist of text sequences in 40 languages. The training set contains 80k samples, while the validation and test sets contain 16k. The average accuracy on the test set is 99.59% (this matches the average macro/weighted F1-score, the test set being perfectly balanced).
 
21
 
22
  Modern: Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Irish (ga), Italian (it), Latvian (lv), Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Turkish (tr), Basque (eu), Catalan (ca), Albanian (sq), Serbian (se), Ukrainian (uk), Norwegian (no), Arabic (ar), Chinese (zh), Hebrew (he)
23
 
24
+ Medieval: Middle High German (mhd), Latin (la), Middle Low German (gml), Old French (fro), Old Church Slavonic (chu), Early New High German (fnhd)
25
 
26
  ## Training and evaluation data
27
  The model was fine-tuned using the Monasterium and Wikipedia datasets, which consist of text sequences in 40 languages. The training set contains 80k samples, while the validation and test sets contain 16k. The average accuracy on the test set is 99.59% (this matches the average macro/weighted F1-score, the test set being perfectly balanced).