---
language:
- el
---

# PaloBERT

A Greek pre-trained language model based on [RoBERTa](https://arxiv.org/abs/1907.11692).

## Pre-training data

The model is pre-trained on a corpus of 458,293 documents collected from Greek social media (Twitter, Instagram, Facebook and YouTube). A RoBERTa tokenizer trained from scratch on the same corpus is also included. The corpus was provided by [Palo LTD](http://www.paloservices.com/).

## Requirements

```
pip install transformers
pip install torch
```

## Pre-processing details

In order to use 'palobert-base-greek-social-media', the text needs to be pre-processed as follows:

* remove all Greek diacritics
* convert to lowercase
* remove all punctuation

## Evaluation on MLM and Sentiment Analysis tasks

For detailed results, refer to the thesis ['Ανάλυση συναισθήματος κειμένου στα Ελληνικά με χρήση Δικτύων Μετασχηματιστών' (Sentiment Analysis of Greek Text Using Transformer Networks)](http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18623) (version - p2)

## Authors

Pavlina Chatziantoniou, Georgios Alexandridis and Athanasios Voulodimos

## Citation info

http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18623
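The three pre-processing steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not code shipped with the model: the helper name `preprocess` and the use of Unicode NFD decomposition to strip diacritics are assumptions, and `string.punctuation` covers only ASCII punctuation (extend it if your data contains Greek punctuation such as «» or ·).

```python
import string
import unicodedata

def preprocess(text: str) -> str:
    """Pre-process text for 'palobert-base-greek-social-media':
    remove Greek diacritics, convert to lowercase, remove punctuation.
    (Illustrative sketch; not the authors' official implementation.)"""
    # Decompose accented characters into base letter + combining mark,
    # then drop the combining marks (this removes the Greek diacritics).
    decomposed = unicodedata.normalize("NFD", text)
    no_diacritics = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then strip (ASCII) punctuation.
    lowered = no_diacritics.lower()
    return lowered.translate(str.maketrans("", "", string.punctuation))

# e.g. preprocess("Καλημέρα, τι κάνεις;") -> "καλημερα τι κανεις"
```

The cleaned string can then be fed to the tokenizer loaded via `transformers` (e.g. with `AutoTokenizer.from_pretrained(...)` pointed at the model's repository) before passing the encoded input to the model.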