sdadas
/

xlm-roberta-large-twitter

sdadas committed on Feb 19, 2023

Commit

e089ad0

•

1 Parent(s): 6ef8d02

Update README.md

Files changed (1)

README.md +12 -1

README.md CHANGED

@@ -18,4 +18,15 @@ tags:
 # XLM-RoBERTA-large-twitter
 This is an XLM-RoBERTa-large model tuned on a corpus of over 156 million tweets in ten languages: English, Spanish, Italian, Portuguese, French, Chinese, Hindi, Arabic, Dutch and Korean.
-The model has been trained from the original XLM-RoBERTA-large checkpoint for 2 epochs with a batch size of 1024.

 # XLM-RoBERTA-large-twitter
 This is an XLM-RoBERTa-large model tuned on a corpus of over 156 million tweets in ten languages: English, Spanish, Italian, Portuguese, French, Chinese, Hindi, Arabic, Dutch and Korean.
+The model has been trained from the original XLM-RoBERTA-large checkpoint for 2 epochs with a batch size of 1024.
+For best results, please preprocess the tweets using the following method before passing them to the model:
+```python
+def preprocess(text):
+    new_text = []
+    for t in text.split(" "):
+        t = '@user' if t.startswith('@') and len(t) > 1 else t
+        t = 'http' if t.startswith('http') else t
+        new_text.append(t)
+    return " ".join(new_text)
+```
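
For reference, a minimal sketch of what the added function does to a tweet. The function body is repeated from the diff so the snippet runs on its own; the sample tweet and the `sample` and `cleaned` names are made up for illustration and are not part of the commit. Mentions become `@user` and any token starting with `http` becomes `http`, while a bare `@` is left alone because of the `len(t) > 1` check.

```python
def preprocess(text):
    # Same logic as the function added in README.md.
    new_text = []
    for t in text.split(" "):
        # Replace user mentions (but not a lone "@") with a generic token.
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        # Replace any token that starts with "http" with a generic token.
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)


# Hypothetical sample tweet, for illustration only.
sample = "@alice check this out https://t.co/abc @ ok"
cleaned = preprocess(sample)
print(cleaned)  # @user check this out http @ ok
```

Because the function splits on single spaces only, tokens separated by newlines or tabs are treated as one token, so a mention that follows a line break would not be replaced.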