---
license: mit
language:
- en
- es
- it
- pt
- fr
- zh
- hi
- ar
- nl
- ko
pipeline_tag: fill-mask
tags:
- twitter
---

# XLM-RoBERTA-large-twitter

This is an XLM-RoBERTa-large model fine-tuned on a corpus of over 156 million tweets in ten languages: English, Spanish, Italian, Portuguese, French, Chinese, Hindi, Arabic, Dutch, and Korean.

The model was trained from the original XLM-RoBERTa-large checkpoint for 2 epochs with a batch size of 1024.
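
The training script itself is not part of this card. As a rough sketch only, that setup could be reproduced with the Hugging Face `Trainer` along the following lines; the toy dataset, masking probability, and optimizer defaults are assumptions, and only the base checkpoint, epoch count, and effective batch size come from the description above.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-large")

# Stand-in corpus; in practice this would be the preprocessed tweet corpus.
corpus = Dataset.from_dict({"text": ["@user I love this! http", "what a day"]})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Dynamic masking for the MLM objective (the 15% rate is an assumption).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="xlm-roberta-large-twitter",
    num_train_epochs=2,              # 2 epochs, per the description above
    per_device_train_batch_size=32,  # 32 x 32 accumulation steps gives an
    gradient_accumulation_steps=32,  # effective batch size of 1024 per device
)

Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=tokenized,
).train()
```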

For best results, preprocess tweets with the following function before passing them to the model; it replaces user mentions and URLs with generic placeholders:

```python
def preprocess(text):
    """Normalize a tweet: mask usernames as '@user' and links as 'http'."""
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)
```
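
As a usage sketch, `preprocess` feeds straight into a standard fill-mask pipeline. The model identifier below is a placeholder for this repository's Hub path, and the example tweet is made up.

```python
from transformers import pipeline

# Placeholder model ID; replace with this repository's actual Hub path.
fill_mask = pipeline("fill-mask", model="xlm-roberta-large-twitter")

tweet = "@NASA the launch was <mask>! details at https://example.com"
for prediction in fill_mask(preprocess(tweet)):
    print(prediction["token_str"], round(prediction["score"], 3))
```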