pchatz/palobert-base-greek-social-media-v2


pchatz committed on Apr 4, 2023

Commit 1388aee • 1 Parent(s): 387a694

Create README.md

Files changed (1)

README.md +44 -0

README.md ADDED

	@@ -0,0 +1,44 @@

+---
+language:
+- el
+---
+# PaloBERT
+A Greek pre-trained language model based on [RoBERTa](https://arxiv.org/abs/1907.11692).
+## Pre-training data
+The model is pre-trained on a corpus of 458,293 documents collected from Greek social media (Twitter, Instagram, Facebook and YouTube). A RoBERTa tokenizer trained from scratch on the same corpus is also included.
+The corpus was provided by [Palo LTD](http://www.paloservices.com/).
+## Requirements
+```
+pip install transformers
+pip install torch
+```
+## Pre-processing details
+To use `palobert-base-greek-social-media`, pre-process the input text as follows:
+* remove all Greek diacritics
+* convert to lowercase
+* remove all punctuation
+## Evaluation on MLM and Sentiment Analysis tasks
+For detailed results, refer to the thesis ['Ανάλυση συναισθήματος κειμένου στα Ελληνικά με χρήση Δικτύων Μετασχηματιστών'](http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18623) ("Sentiment Analysis of Greek Text Using Transformer Networks") (version - p2)
+## Authors
+Pavlina Chatziantoniou, Georgios Alexandridis and Athanasios Voulodimos
+## Citation info
+http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18623
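
The three pre-processing steps listed in the README (remove Greek diacritics, lowercase, remove punctuation) could be sketched roughly as follows. This is a minimal illustration, not the authors' own pipeline: the function names are hypothetical, "punctuation" is assumed to mean every character in a Unicode `P*` category (which also drops `#` and `@`), and the final whitespace collapse is an extra step the README does not mention.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks (tonos, dialytika, etc.) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def remove_punctuation(text: str) -> str:
    """Assumption: punctuation means any character in a Unicode P* category."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def preprocess(text: str) -> str:
    """Apply the README's steps in the order they are listed."""
    text = strip_diacritics(text)
    text = text.lower()
    text = remove_punctuation(text)
    # Not in the README: collapse whitespace runs left behind by removed characters.
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("Καλημέρα, Ελλάδα!"))  # prints: καλημερα ελλαδα
```

The cleaned string can then be passed to the tokenizer that ships with the model. Whether symbols outside the `P*` categories (for example emoji or `+`) should also be removed is not stated in the README, so this sketch leaves them in place.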