Andrija committed on
Commit
e305273
1 Parent(s): 7cbc697

Update README.md

  1. README.md +18 -1
README.md CHANGED
@@ -13,4 +13,21 @@ license: apache-2.0
  ---
  # Transformer language model for Croatian and Serbian
  Trained on 0.7GB dataset Croatian and Serbian language for one epoch.
- Dataset from Leipzig Corpora.
+ Dataset from Leipzig Corpora.
+
+ # Information of dataset
+ | Model | #params | Arch. | Training data |
+
+ |--------------------------------|--------------------------------|-------|-----------------------------------|
+
+ | `Andrija/SRoBERTa` | 120M | First | Leipzig Corpus (0.7 GB of text) |
+
+
+ # How to use in code
+ ```python
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ tokenizer = AutoTokenizer.from_pretrained("Andrija/SRoBERTa")
+
+ model = AutoModelForMaskedLM.from_pretrained("Andrija/SRoBERTa")
+ ```
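
The "How to use in code" snippet added in this commit only loads the tokenizer and the masked-LM head. As a rough sketch of what a reader might do next, the same checkpoint could be queried through the `fill-mask` pipeline from `transformers`. The helper names `predict_masked` and `format_predictions` and the `{mask}` placeholder are hypothetical conveniences introduced here, not part of the README or the model repository.

```python
MODEL_ID = "Andrija/SRoBERTa"  # checkpoint name taken from the README in this commit


def predict_masked(text, top_k=5):
    """Return the top_k candidate fillers for the `{mask}` placeholder in `text`.

    Hypothetical helper: `{mask}` is an illustrative placeholder that gets
    swapped for whatever mask token the checkpoint's tokenizer defines.
    """
    # Imported lazily so format_predictions() works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID, tokenizer=MODEL_ID)
    # Ask the tokenizer for its mask token instead of hard-coding one.
    masked = text.replace("{mask}", fill_mask.tokenizer.mask_token)
    return fill_mask(masked, top_k=top_k)


def format_predictions(predictions):
    """Render fill-mask output (dicts with 'token_str' and 'score') as text lines."""
    return [f"{p['token_str'].strip()}\t{p['score']:.3f}" for p in predictions]
```

Calling `format_predictions(predict_masked("Zagreb je glavni {mask} Hrvatske."))` would download the checkpoint on first use and list one candidate token per line with its score. The example sentence is only an illustration, and no particular output is implied.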