---
license: cc
language:
- en
library_name: adapter-transformers
pipeline_tag: text-classification
---

# Model Card for orYx-models/finetuned-roberta-leadership-sentiment-analysis

- **Model Description:** This model is a fine-tuned version of the RoBERTa text classifier (cardiffnlp/twitter-roberta-base-sentiment-latest). It has been trained on a dataset comprising communications from corporate executives to their therapists. Its primary function is to classify statements from corporate executives as "Positive," "Negative," or "Neutral," accompanied by a confidence score for the predicted sentiment. As this is a prototype tool by orYx Models, all feedback and insights will be used to refine the model further.

## Model Details

### Model Information

- **Model Type:** Text classifier
- **Language(s):** English
- **License:** Creative Commons license family
- **Finetuned from Model:** cardiffnlp/twitter-roberta-base-sentiment-latest

### Model Sources

- **HuggingFace Model ID:** cardiffnlp/twitter-roberta-base-2021-124m
- **Paper:** TimeLMs - [Link](https://arxiv.org/abs/2202.03829)

## Uses

- **Use case:** This sentiment analysis tool can analyze text from any user within an organization, such as executives, employees, or clients, and assign a sentiment to it.
- **Outcomes:** The tool generates a scored sentiment, which can be used to assess the likelihood of events occurring (or not occurring). It can also facilitate the creation of a rating system based on the sentiments expressed in texts.

### Direct Use

```python
from transformers import pipeline

# `model` and `tokenizer` are the fine-tuned model and its tokenizer,
# loaded beforehand.
nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
nlp("The results don't match, but the effort seems to be always high")
# [{'label': 'Positive', 'score': 0.9996090531349182}]
```

- Based on the text, the output label is "Positive," "Negative," or "Neutral," along with its confidence score.
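The label and score returned by the pipeline can feed a simple decision rule. The sketch below is hypothetical (the `route_sentiment` name and the 0.8 threshold are assumptions, not part of the model card): predictions below the threshold are flagged for human review rather than acted on.

```python
# Hypothetical helper (not part of the model): route a pipeline result
# to an action based on its label and confidence score.
def route_sentiment(result, threshold=0.8):
    label, score = result["label"], result["score"]
    if score < threshold:
        return "review"  # low confidence: flag for human review
    return label.lower()

# Using the pipeline output shown above:
print(route_sentiment({"label": "Positive", "score": 0.9996090531349182}))  # positive
print(route_sentiment({"label": "Negative", "score": 0.55}))                # review
```

The threshold would need tuning against the validation set before any such rule is used in practice.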
### Recommendations

- **Continuous Monitoring:** Regularly monitor the model's performance on new data to ensure its effectiveness and reliability over time.
- **Error Analysis:** Conduct thorough error analysis to identify common patterns of misclassification and areas for improvement.
- **Fine-Tuning:** Consider fine-tuning the model further based on feedback and insights from users to enhance its domain-specific performance.
- **Model Interpretability:** Explore techniques for explaining the model's predictions, such as attention mechanisms or feature-importance analysis, to increase trust in and understanding of its decisions.

## Training Details

```
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, stratify=y)
```

- **Train data:** 80% of 4396 records = 3516
- **Test data:** 20% of 4396 records = 879

### Training Procedure

- **Dataset Split:** Data divided into 80% training and 20% validation sets.
- **Preprocessing:** Input data tokenized into 'input_ids' and 'attention_mask' tensors.
- **Training Hyperparameters:** Set for training, evaluation, and optimization, including batch size, epochs, and logging strategies.
- **Training Execution:** Model trained with the specified hyperparameters, monitored with metrics, and logged for evaluation.
- **Evaluation Metrics:** Model evaluated on loss, accuracy, F1 score, precision, and recall for both the training and validation sets.
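The stratified 80/20 split above can be sketched end to end on synthetic stand-in data (the texts and three-class labels below are illustrative, not the actual dataset):

```python
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real 4396-record dataset, with three
# sentiment classes (0 = Negative, 1 = Neutral, 2 = Positive).
X = [f"statement {i}" for i in range(4396)]
y = [i % 3 for i in range(4396)]

# stratify=y keeps the class proportions the same in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y
)
print(len(X_train), len(X_val))
```

Stratification matters here because an unbalanced sentiment distribution would otherwise skew the validation metrics reported below.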
#### Preprocessing [optional]

```
'input_ids': tensor
'attention_mask': tensor
'label': tensor(2)
```

#### Training Hyperparameters

```
args = TrainingArguments(
    output_dir="output",
    do_train=True,
    do_eval=True,
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    warmup_steps=50,
    weight_decay=0.01,
    logging_strategy="steps",
    logging_dir="logging",
    logging_steps=50,
    eval_steps=50,
    save_strategy="steps",
    fp16=True,
    # load_best_model_at_end=True
)
```

#### Speeds, Sizes, Times [optional]

- **TrainOutput**

```
global_step=879,
training_loss=0.1825900522650848,
```

- **Metrics**

```
'train_runtime': 101.6309,
'train_samples_per_second': 34.596,
'train_steps_per_second': 8.649,
'total_flos': 346915041274368.0,
'train_loss': 0.1825900522650848,
'epoch': 1.0
```

## Evaluation Metrics Results

```
# Evaluate the fine-tuned model on both splits; `trainer` is the
# Trainer instance used during training.
q = [trainer.evaluate(eval_dataset=ds) for ds in [train_dataset, val_dataset]]

# Create a DataFrame indexed by split and keep only the first 5 columns
result_df = pd.DataFrame(q, index=["train", "val"]).iloc[:, :5]

print(result_df)

       eval_loss  eval_Accuracy   eval_F1  eval_Precision  eval_Recall
train   0.049349       0.988908  0.987063        0.982160     0.992357
val     0.108378       0.976136  0.972464        0.965982     0.979861
```

**Loss**
- train 0.049349
- val 0.108378

**Accuracy**
- train 0.988908 - **98.8%**
- val 0.976136 - **97.6%**

**F1**
- train 0.987063 - **98.7%**
- val 0.972464 - **97.2%**

**Precision**
- train 0.982160 - **98.2%**
- val 0.965982 - **96.5%**

**Recall**
- train 0.992357 - **99.2%**
- val 0.979861 - **97.9%**

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al.
(2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** T4 GPU
- **Hours used:** 2
- **Cloud Provider:** Google
- **Compute Region:** India
- **Carbon Emitted:** No information available

### Compute Infrastructure

Google Colab - T4 GPU

### References

```
@inproceedings{camacho-collados-etal-2022-tweetnlp,
    title = "{T}weet{NLP}: Cutting-Edge Natural Language Processing for Social Media",
    author = "Camacho-collados, Jose  and
      Rezaee, Kiamehr  and
      Riahi, Talayeh  and
      Ushio, Asahi  and
      Loureiro, Daniel  and
      Antypas, Dimosthenis  and
      Boisson, Joanne  and
      Espinosa Anke, Luis  and
      Liu, Fangyu  and
      Mart{\'\i}nez C{\'a}mara, Eugenio  and
      others",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-demos.5",
    pages = "38--49"
}
```

## Model Card Authors [optional]

Vineedhar, relkino

## Model Card Contact

https://khalidalhosni.com/