Update README.md
README.md
CHANGED
@@ -18,4 +18,15 @@ tags:
 # XLM-RoBERTA-large-twitter
 
 This is a XLM-RoBERTa-large model tuned on a corpus of over 156 million tweets in ten languages: English, Spanish, Italian, Portuguese, French, Chinese, Hindi, Arabic, Dutch and Korean.
-The model has been trained from the original XLM-RoBERTA-large checkpoint for 2 epochs with a batch size of 1024.
+The model has been trained from the original XLM-RoBERTA-large checkpoint for 2 epochs with a batch size of 1024.
+
+For best results, please preprocess the tweets using the following method before passing them to the model:
+```python
+def preprocess(text):
+    new_text = []
+    for t in text.split(" "):
+        t = '@user' if t.startswith('@') and len(t) > 1 else t
+        t = 'http' if t.startswith('http') else t
+        new_text.append(t)
+    return " ".join(new_text)
+```
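As a quick check of what the `preprocess` function added in this change does, here is a minimal sketch that applies it to two made-up inputs. The sample tweets and the `example.com` URL are illustrative assumptions, not taken from the README; the function body is reproduced so the snippet runs on its own.

```python
def preprocess(text):
    """Mask user mentions and links, as described in the README change."""
    new_text = []
    for t in text.split(" "):
        # Replace any mention such as "@alice" with the generic "@user" token;
        # a bare "@" (length 1) is left untouched.
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        # Replace any token beginning with "http" with the literal "http".
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)


# Hypothetical sample tweet, used only for illustration.
sample = "@alice check this out https://example.com/x :)"
print(preprocess(sample))   # @user check this out http :)

# A lone "@" is not treated as a mention.
print(preprocess("meet @ noon"))   # meet @ noon
```

Because the function splits on single spaces only, a mention or link that directly follows a newline or tab without a space is not masked; normalizing whitespace beforehand may be worth considering if the input contains such cases.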