---
datasets:
- stanfordnlp/sentiment140
- heegyu/news-category-dataset
- gfissore/arxiv-abstracts-2021
- snoop2head/enron_aeslc_emails
- bookcorpus/bookcorpus
- wikimedia/wikipedia
language:
- en
base_model: google-bert/bert-base-uncased
pipeline_tag: text-classification
license: apache-2.0
---

## Tweet Style Classifier


This model is bert-base-uncased fine-tuned for binary classification: it predicts whether an English text is a tweet or not.

Tweet texts were gathered from ClimaConvo (https://github.com/shucoll/ClimaConvo) and Sentiment140 (stanfordnlp/sentiment140).

Non-tweet texts were gathered from diverse sources, including news article descriptions (heegyu/news-category-dataset), academic papers (gfissore/arxiv-abstracts-2021),
emails (snoop2head/enron_aeslc_emails), books (bookcorpus/bookcorpus), and Wikipedia articles (wikimedia/wikipedia).

The dataset contained about 60K instances, with a 50/50 distribution between the two classes. It was shuffled with a random seed of 42 and split into 80/20 for training/testing.
The model was trained for three epochs with a batch size of 8 on an NVIDIA RTX A6000 GPU. All other hyperparameters were the Hugging Face Trainer defaults.
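
The shuffle-and-split step described above can be illustrated with a minimal, standard-library sketch. This is not the original preprocessing code: the helper `shuffle_and_split` and the toy corpus are hypothetical, and the actual split may have been produced with a different library, so the resulting order will not match the real train/test partition.

```python
import random


def shuffle_and_split(examples, seed=42, train_fraction=0.8):
    """Shuffle a copy of `examples` with a fixed seed, then split it into train/test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for the real corpus: (text, label) pairs, where label 1 = tweet.
toy = [(f"tweet {i}", 1) for i in range(50)] + [(f"formal text {i}", 0) for i in range(50)]

train, test = shuffle_and_split(toy)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible across runs, which keeps the reported test metrics comparable.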

The model was trained to evaluate a text style transfer task that converts formal-language texts to tweets.

### How to use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/tweet-style-classifier"

# Load the fine-tuned classifier and its tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Truncate inputs that exceed BERT's 512-token limit
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Yesterday was a great day!"

# Returns a list with one dict holding the predicted label and its score
result = classifier(text)
print(result)
```
Label 1 indicates that the text is predicted to be a tweet. 
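
Because the classifier is intended for evaluating style transfer, one plausible use is to measure what fraction of a system's outputs are classified as tweets. The sketch below is an illustration only: the helper `tweet_rate` is hypothetical, the predictions are mocked rather than produced by the model, and the label string `LABEL_1` assumes the default Transformers label naming, which should be checked against the actual pipeline output.

```python
def tweet_rate(predictions, tweet_label="LABEL_1"):
    """Return the fraction of predictions whose label equals `tweet_label`."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(1 for p in predictions if p["label"] == tweet_label)
    return hits / len(predictions)


# Mocked outputs in the list-of-dicts shape returned by a text-classification
# pipeline; "LABEL_1" as the tweet class name is an assumption.
mock_predictions = [
    {"label": "LABEL_1", "score": 0.98},
    {"label": "LABEL_0", "score": 0.91},
    {"label": "LABEL_1", "score": 0.87},
    {"label": "LABEL_1", "score": 0.99},
]

rate = tweet_rate(mock_predictions)
print(f"{rate:.2f}")
```

With real data, `mock_predictions` would be replaced by the result of calling `classifier` on a list of style-transferred texts.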

### Evaluation 

Evaluation results on the test set: 

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.99312 |
| Precision | 0.99251 |
| Recall    | 0.99397 |
| F1        | 0.99324 |
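
For reference, the sketch below shows the standard binary-classification definitions of these metrics. It assumes that the tweet class (label 1) is treated as the positive class, which the card does not state explicitly. The helper `binary_metrics` and the confusion-matrix counts are hypothetical and are not the model's actual results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for illustration, NOT the model's confusion matrix.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
print(metrics)

# Consistency check: F1 is the harmonic mean of precision and recall,
# here computed from the values reported in the table.
reported_precision, reported_recall = 0.99251, 0.99397
f1_from_table = (
    2 * reported_precision * reported_recall / (reported_precision + reported_recall)
)
print(round(f1_from_table, 5))
```

The harmonic mean of the reported precision and recall agrees with the reported F1 to within rounding.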