sdadas committed on
Commit
e089ad0
1 Parent(s): 6ef8d02

Update README.md

Files changed (1)
  1. README.md +12 -1
README.md CHANGED
@@ -18,4 +18,15 @@ tags:
  # XLM-RoBERTA-large-twitter
  
  This is an XLM-RoBERTa-large model tuned on a corpus of over 156 million tweets in ten languages: English, Spanish, Italian, Portuguese, French, Chinese, Hindi, Arabic, Dutch and Korean.
- The model has been trained from the original XLM-RoBERTA-large checkpoint for 2 epochs with a batch size of 1024.
+ The model has been trained from the original XLM-RoBERTA-large checkpoint for 2 epochs with a batch size of 1024.
+
+ For best results, please preprocess the tweets using the following method before passing them to the model:
+ ```python
+ def preprocess(text):
+     new_text = []
+     for t in text.split(" "):
+         t = '@user' if t.startswith('@') and len(t) > 1 else t  # mask user mentions
+         t = 'http' if t.startswith('http') else t  # mask links
+         new_text.append(t)
+     return " ".join(new_text)
+ ```
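
As a quick check of what the added `preprocess` function does, the sketch below runs it on two made-up tweets. The function body follows the README change; the sample strings are illustrative assumptions, not data from the training corpus.

```python
def preprocess(text):
    """Mask user mentions and links, following the README snippet."""
    new_text = []
    for t in text.split(" "):
        # Any token that starts with '@' and is longer than one character
        # is treated as a mention and replaced with a fixed placeholder.
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        # Any token that starts with 'http' (http:// or https://) is
        # collapsed to the bare string 'http'.
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)


# Hypothetical sample tweets, for illustration only.
samples = [
    "@alice check this out https://example.com/post cc @bob",
    "email me @ noon",
]

cleaned = [preprocess(s) for s in samples]
for original, result in zip(samples, cleaned):
    print(f"{original!r} -> {result!r}")
```

Note that the function splits on single spaces only and replaces the whole token, so a mention with trailing punctuation such as `@bob,` becomes `@user` and the comma is lost, while a lone `@` is left unchanged.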