Andrija committed on
Commit
583bd82
1 Parent(s): 47c7163

Create README.md

Files changed (1)
  1. README.md +26 -0
README.md ADDED
@@ -0,0 +1,26 @@
+ ---
+ datasets:
+ - oscar
+ - srwac
+ - leipzig
+ - cc100
+ - hrwac
+ language:
+ - hr
+ - sr
+ tags:
+ - masked-lm
+ widget:
+ - text: "Ovo je početak <mask>."
+ license: apache-2.0
+
+ ---
+
+ # Transformer language model for Croatian and Serbian
+
+ Trained for one epoch (3 million steps) on 28 GB of Croatian and Serbian text drawn from the following datasets:
+ Leipzig Corpus, OSCAR, srWac, hrWac, cc100-hr and cc100-sr.
+
+ | Model | #params | Arch. | Training data |
+ |--------------------------------|--------------------------------|-------|-----------------------------------|
+ | `Andrija/SRoBERTa-L` | 80M | Fourth | Leipzig Corpus, OSCAR, srWac, hrWac, cc100-hr and cc100-sr (28 GB of text) |
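The `widget` entry in the metadata shows how the model is meant to be queried: a sentence containing a single `<mask>` token. The following is a minimal sketch of running the same query locally. It assumes two things: the checkpoint can be downloaded from the Hugging Face Hub, together with its tokenizer, under the id listed in the table, and the `transformers` library is installed. The helper names `fill_mask` and `format_predictions` are hypothetical. Only `transformers.pipeline("fill-mask", ...)` and its `top_k` argument are library API.

```python
from typing import Any, Dict, List, Tuple

# Model id copied from the table above; assumed (not verified here) to be
# downloadable from the Hugging Face Hub together with its tokenizer.
MODEL_ID = "Andrija/SRoBERTa-L"


def format_predictions(predictions: List[Dict[str, Any]]) -> List[Tuple[str, float]]:
    """Hypothetical helper: turn fill-mask pipeline output into (token, score) pairs, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    # Depending on the tokenizer, token_str may carry leading whitespace, so strip it.
    return [(p["token_str"].strip(), float(p["score"])) for p in ranked]


def fill_mask(text: str, model_id: str = MODEL_ID, top_k: int = 5) -> List[Tuple[str, float]]:
    """Hypothetical helper: predict the top_k tokens for the single mask token in `text`."""
    # Imported lazily so format_predictions works even without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    mask = unmasker.tokenizer.mask_token  # "<mask>" for RoBERTa-style tokenizers
    if mask not in text:
        raise ValueError(f"text must contain the mask token {mask!r}")
    # With exactly one mask in the input, the pipeline returns a flat list of dicts
    # with "score", "token", "token_str" and "sequence" keys.
    return format_predictions(unmasker(text, top_k=top_k))
```

For example, `fill_mask("Ovo je početak <mask>.")` should return up to five candidate tokens with their scores. No output is shown because it depends on the downloaded checkpoint. To keep the sketch short, the pipeline is rebuilt on every call. For repeated queries, create it once and reuse it.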