pchatz committed on
Commit
1388aee
1 Parent(s): 387a694

Create README.md

Files changed (1)
  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ language:
+ - el
+ ---
+
+ # PaloBERT
+
+ A Greek pre-trained language model based on [RoBERTa](https://arxiv.org/abs/1907.11692).
+
+ ## Pre-training data
+
+ The model is pre-trained on a corpus of 458,293 documents collected from Greek social media (Twitter, Instagram, Facebook and YouTube). A RoBERTa tokenizer trained from scratch on the same corpus is also included.
+
+ The corpus was provided by [Palo LTD](http://www.paloservices.com/).
+
+
+ ## Requirements
+
+ ```
+ pip install transformers
+ pip install torch
+
+ ```
+
+ ## Pre-processing details
+
+ Before text is passed to 'palobert-base-greek-social-media', it must be pre-processed as follows:
+
+ * remove all Greek diacritics
+ * convert to lowercase
+ * remove all punctuation
+
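The three pre-processing steps listed above could be sketched in plain Python as follows. This is a minimal illustration, not the authors' original pipeline: the function names (`strip_diacritics`, `remove_punctuation`, `preprocess`) are hypothetical, "punctuation" is assumed to mean every character in a Unicode `P*` category (which includes `#` and `@`), and the final whitespace collapse is an extra convenience step that the list does not mention.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (e.g. Greek tonos and dialytika)."""
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def remove_punctuation(text: str) -> str:
    """Drop characters whose Unicode category starts with "P" (assumed definition)."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def preprocess(text: str) -> str:
    """Apply the three steps in order: diacritics, lowercase, punctuation."""
    text = strip_diacritics(text)
    text = text.lower()
    text = remove_punctuation(text)
    # Assumption: collapse whitespace runs left behind by removed punctuation.
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("Καλημέρα, κόσμε!"))
```

The output of `preprocess` would then be the string handed to the tokenizer. If the original training pipeline treated symbols such as `$` or `+` as punctuation too, `remove_punctuation` would need to be widened accordingly.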
+
+ ## Evaluation on MLM and Sentiment Analysis tasks
+
+ For detailed results, refer to the thesis ['Ανάλυση συναισθήματος κειμένου στα Ελληνικά με χρήση Δικτύων Μετασχηματιστών'](http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18623) ("Sentiment analysis of Greek text using Transformer networks"), version p2.
+
+ ## Authors
+
+ Pavlina Chatziantoniou, Georgios Alexandridis and Athanasios Voulodimos
+
+ ## Citation info
+
+ http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18623