ClassCat committed on
Commit
f06ba16
1 Parent(s): b2ad106

Create README.md

Files changed (1)
  1. README.md +42 -0
README.md ADDED
@@ -0,0 +1,42 @@
+ ---
+ language: el
+ license: cc-by-sa-4.0
+ datasets:
+ - cc100
+ - oscar
+ - wikipedia
+ widget:
+ - text: "Αυτό είναι <mask>."
+ - text: "Ανοιξα <mask>."
+ - text: "Ευχαριστώ για <mask>."
+ - text: "Έχει πολύ καιρό που δεν <mask>."
+ ---
+
+ ## RoBERTa Greek small model (Uncased)
+
+ ### Prerequisites
+
+ transformers==4.19.2
+
+ ### Model architecture
+
+ This model has approximately half as many parameters as the RoBERTa base model.
+
+ ### Tokenizer
+
+ The model uses a BPE tokenizer with a vocabulary size of 50,000.
+
+ ### Training Data
+
+ * Subset of [CC-100/el](https://data.statmt.org/cc-100/): Monolingual Datasets from Web Crawl Data
+ * Subset of [oscar](https://huggingface.co/datasets/oscar)
+ * [wiki40b/el](https://www.tensorflow.org/datasets/catalog/wiki40b#wiki40bel) (Greek Wikipedia)
+
+ ### Usage
+
+ ```python
+ from transformers import pipeline
+
+ unmasker = pipeline('fill-mask', model='ClassCat/roberta-small-greek')
+ unmasker("Έχει πολύ καιρό που δεν <mask>.")
+ ```
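
The usage snippet in the README returns the raw pipeline result without showing how to read it. The following is a minimal sketch, not part of the commit, of one way to rank the candidates. It assumes the standard `fill-mask` output for a single input with one `<mask>`: a list of dictionaries carrying `score` and `token_str` keys. The helper names `top_predictions` and `run_demo` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def top_predictions(results: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs.

    Hypothetical helper: `results` is assumed to be the list of dicts
    produced by a fill-mask pipeline for one input with one <mask>.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    # token_str may carry a leading space from the BPE tokenizer, so strip it.
    return [(r["token_str"].strip(), round(float(r["score"]), 4)) for r in ranked[:k]]


def run_demo() -> None:
    """Hypothetical demo: downloads the model on first call (needs network access)."""
    from transformers import pipeline  # imported lazily so the helper works without it

    unmasker = pipeline("fill-mask", model="ClassCat/roberta-small-greek")
    for token, score in top_predictions(unmasker("Έχει πολύ καιρό που δεν <mask>.")):
        print(f"{token}\t{score}")
```

Calling `run_demo()` prints one candidate per line with its score. The ranking helper itself needs neither the model nor `transformers`, so it can be tried on any list of dictionaries with the same keys.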