---
license: apache-2.0
datasets:
- andrea-t94/TwitterSentiment140
language:
- en
metrics:
- perplexity
library_name: transformers
tags:
- distilroberta-base
- twitter
pipeline_tag: fill-mask
---

## Twitter-RoBERTa-base fine-tuned using masked language modelling

This is a DistilRoBERTa-base model fine-tuned (domain adaptation) with masked language modelling on ~2M tweets from Sentiment140 (Go et al., 2009). It is the first step of a two-step approach to fine-tuning for sentiment analysis (ULMFiT). This model is suitable for English.

Main characteristics:
- pretrained model and tokenizer: distilroberta-base
- no cleaning/processing applied to the data

Reference paper: [ULMFiT](https://arxiv.org/abs/1801.06146).
Reference dataset: [Sentiment140](https://www.kaggle.com/datasets/kazanova/sentiment140?resource=download)
Git repo: TBD
Labels: 0 -> Negative; 1 -> Positive
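
### Usage

A minimal sketch of querying the model through the `fill-mask` pipeline. The repository id is not stated in this card (the repo is listed as TBD), so the placeholder below must be replaced with the published model id:

```python
from transformers import pipeline

# Placeholder: substitute the actual repo id once published (listed as TBD above).
model_id = "<this-model-repo-id>"

fill_mask = pipeline("fill-mask", model=model_id)

# DistilRoBERTa uses the <mask> token for masked positions.
predictions = fill_mask("I love this <mask>!")
for p in predictions:
    print(f"{p['token_str']}\t{p['score']:.4f}")
```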
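
### Second step: sentiment fine-tuning

Since this checkpoint is step one of the two-step ULMFiT-style approach, a sketch of step two is shown below: loading the domain-adapted encoder with a fresh 2-way classification head. This is an illustrative assumption of how the second step could look, not the card's published training script; it reuses the 0/1 label mapping given above:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder: substitute the actual repo id once published (listed as TBD above).
model_id = "<this-model-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Reuse the domain-adapted encoder with a new classification head,
# matching the label mapping above (0 -> Negative, 1 -> Positive).
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=2,
    id2label={0: "Negative", 1: "Positive"},
    label2id={"Negative": 0, "Positive": 1},
)

# ...then train with transformers.Trainer or a custom loop on labelled
# Sentiment140 examples.
```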