IMJONEZZ/SlovenBERTcina

IMJONEZZ committed on Jul 29, 2021

Commit 141147e · 1 Parent(s): 34ea2f0

Update README.md

Files changed (1)

README.md +17 -17

@@ -12,24 +12,24 @@ RoBERTA pretrained tokenizer vocab and merges included.
 - **Dataset**:
   8GB Slovak Monolingual dataset including ParaCrawl (monolingual), OSCAR, and several gigs of my own findings and cleaning.
 - **Preprocessing**:
-  Tokenized with a pretrained ByteLevelBPETokenizer trained on the same dataset. Uncased, with <s>, <pad>, </s>, <unk>, and <mask> special tokens.
 - **Evaluation results**:
-  Mnoho ľudí tu<mask>
-    žije.
-    žijú.
-    je.
-    trpí.
-  Ako sa<mask>
-    máte
-    máš
-    má
-    hovorí
-  Plážová sezóna pod Zoborom patrí medzi<mask> obdobia.
-    ročné
-    najkrajšie
-    najobľúbenejšie
-    najnáročnejšie
 - **Limitations**:
   The current model is fairly small, although it works very well. This model is meant to be finetuned on downstream tasks e.g. Part-of-Speech tagging, Question Answering, anything in GLUE or SUPERGLUE.

 - **Dataset**:
   8GB Slovak Monolingual dataset including ParaCrawl (monolingual), OSCAR, and several gigs of my own findings and cleaning.
 - **Preprocessing**:
+  Tokenized with a pretrained ByteLevelBPETokenizer trained on the same dataset. Uncased, with s, pad, /s, unk, and mask special tokens.
 - **Evaluation results**:
+  - Mnoho ľudí tu<mask>
+    * žije.
+    * žijú.
+    * je.
+    * trpí.
+  - Ako sa<mask>
+    * máte
+    * máš
+    * má
+    * hovorí
+  - Plážová sezóna pod Zoborom patrí medzi<mask> obdobia.
+    * ročné
+    * najkrajšie
+    * najobľúbenejšie
+    * najnáročnejšie
 - **Limitations**:
   The current model is fairly small, although it works very well. This model is meant to be finetuned on downstream tasks e.g. Part-of-Speech tagging, Question Answering, anything in GLUE or SUPERGLUE.
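
The masked-token examples listed under "Evaluation results" could be reproduced along the following lines. This is a minimal sketch, not the author's evaluation script: it assumes the model is published on the Hub as `IMJONEZZ/SlovenBERTcina` (the repository name shown on this page) and that it loads with the `transformers` `fill-mask` pipeline. `format_predictions` and `main` are hypothetical helper names introduced here for illustration; the output layout mirrors the nested list this commit introduces.

```python
from typing import Dict, List

# Assumption: the Hub model id matches the repository name on this page.
MODEL_ID = "IMJONEZZ/SlovenBERTcina"

# Prompts copied from the "Evaluation results" entry of the README.
PROMPTS = [
    "Mnoho ľudí tu<mask>",
    "Ako sa<mask>",
    "Plážová sezóna pod Zoborom patrí medzi<mask> obdobia.",
]


def format_predictions(prompt: str, predictions: List[Dict]) -> str:
    """Render one prompt and its candidates as a nested Markdown list.

    `predictions` is expected to be a list of dicts with a "token_str" key,
    which is the shape the fill-mask pipeline returns for a single mask.
    """
    lines = [f"- {prompt}"]
    for prediction in predictions:
        # Byte-level BPE tokens usually carry a leading space; strip it.
        lines.append(f"  * {prediction['token_str'].strip()}")
    return "\n".join(lines)


def main(top_k: int = 4) -> None:
    """Query the model for each prompt and print the top candidates."""
    # Imported here so the formatting helper works without transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    for prompt in PROMPTS:
        print(format_predictions(prompt, fill_mask(prompt, top_k=top_k)))
```

Calling `main()` downloads the model weights on first use. The candidates it prints depend on the checkpoint and library version, so they may not match the README list exactly.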