---
license: apache-2.0
datasets:
  - andrea-t94/TwitterSentiment140
language:
  - en
metrics:
  - perplexity
library_name: transformers
tags:
  - distilroberta-base
  - twitter
pipeline_tag: fill-mask
---

# Twitter-roBERTa-base fine-tuned using masked language modelling

This is a DistilRoBERTa-base model fine-tuned (domain adaptation) on ~2M tweets from Sentiment140 (Go et al., 2009). This is the first step of a two-step approach to fine-tuning for sentiment analysis (ULMFiT). The model is suitable for English.
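The domain-adaptation step above is standard masked language modelling: tokens in each tweet are randomly masked and the model is trained to recover them. A minimal sketch of the masking mechanics with `transformers`, using the stated base checkpoint `distilroberta-base` (the masking probability of 15% is the library default, not a value confirmed by this card):

```python
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")

# Dynamic masking: a fresh 15% of tokens is masked each time a batch is built.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

# One tokenized example tweet, collated into masked inputs plus MLM labels.
encoding = tokenizer("just landed, what a view!", truncation=True, max_length=128)
batch = collator([encoding])
print(sorted(batch.keys()))
```

In practice this collator is passed to a `Trainer` together with the tokenized tweet corpus. Perplexity, the metric listed in the card metadata, is conventionally reported as the exponential of the evaluation cross-entropy loss.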

Main characteristics:

  - pretrained model and tokenizer: distilroberta-base
  - no cleaning/processing applied to the data

- Reference paper: ULMFiT
- Reference dataset: Sentiment140
- Git repo: TBD
- Labels: 0 -> Negative; 1 -> Positive
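Since the `pipeline_tag` is `fill-mask`, the model can be queried with the fill-mask pipeline. The card does not state the final repo id, so this sketch loads the base checkpoint `distilroberta-base` as a stand-in; swap in the fine-tuned model id once it is published:

```python
from transformers import pipeline

# "distilroberta-base" is a placeholder for the fine-tuned checkpoint,
# whose repo id is not given in this card.
fill_mask = pipeline("fill-mask", model="distilroberta-base")

preds = fill_mask("What a great <mask> today!", top_k=3)
for p in preds:
    print(p["token_str"], round(p["score"], 3))
```

Each prediction is a dict containing the filled token (`token_str`), its probability (`score`), and the completed sequence.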