Model Card for Sentiment Analysis on Primate Dataset
This model card provides details about a sentiment analysis model trained on a dataset containing posts related to primates. The model predicts sentiment labels for textual data using transformer-based architectures.
Model Details
Model Description
The model classifies text into sentiment categories (positive, negative, or neutral) using a transformer-based architecture for sequence classification.
- Developed by: Jaskaran Singh
- Model type: Transformer-based sentiment analysis model
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: a transformer-based pre-trained checkpoint (base model not specified)
Model Sources
- Repository: https://github.com/JaskaranSingh-01/Sentiment_Analyzer
- Demo: https://sentimentanalyzer-f76oxwautwypxpea4lj3wg.streamlit.app/
Uses
Direct Use
The model can be directly used for sentiment analysis tasks, particularly on textual data related to primates.
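As a minimal direct-use sketch, the checkpoint referenced in the quick-start code below can be wrapped in the Transformers pipeline API. The raw label names returned by the pipeline depend on the model's configuration, and the example sentence is illustrative:

from transformers import pipeline

# Minimal direct-use sketch; the label names in the output depend on the
# model's configuration (see the label mapping in the quick-start code below).
classifier = pipeline("text-classification", model="sbcBI/sentiment_analysis_model")
print(classifier("The gorillas at the sanctuary seem healthy and active."))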
Downstream Use
The model can be fine-tuned for specific downstream tasks or integrated into larger applications requiring sentiment analysis functionality.
Bias, Risks, and Limitations
Bias
The model's predictions may reflect biases present in the training data, including any biases related to primates or sentiment labeling.
Risks
- Misclassification: The model may misclassify sentiment due to ambiguity or complexity in the text.
- Generalization: The model's performance may vary across different domains or datasets.
Limitations
- Limited Domain: The model's effectiveness may be limited to text related to primates.
- Cultural Bias: The model's performance may be influenced by cultural nuances present in the training data.
Recommendations
Users should be cautious when interpreting the model's predictions, considering potential biases and limitations. Fine-tuning on domain-specific data or applying post-processing techniques may help mitigate biases and improve performance.
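One simple post-processing technique is confidence thresholding: rather than forcing a label on ambiguous text, predictions below a probability threshold are flagged for review. The threshold value and the "Uncertain" fallback below are illustrative assumptions, not part of the released model:

import torch

def predict_with_threshold(model, tokenizer, text, threshold=0.6):
    # Illustrative sketch: the 0.6 threshold is an assumption, not a tuned value.
    encoded = tokenizer(text, return_tensors='pt')
    with torch.no_grad():
        probs = torch.softmax(model(**encoded).logits, dim=-1).squeeze()
    confidence, label_id = probs.max(dim=-1)
    if confidence.item() < threshold:
        return "Uncertain", confidence.item()  # flag ambiguous text for review
    labels = ['Negative', 'Neutral', 'Positive']
    return labels[label_id.item()], confidence.item()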
How to Get Started with the Model
# Example code for using the sentiment analysis model
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# 1. Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("sbcBI/sentiment_analysis_model")
model = AutoModelForSequenceClassification.from_pretrained("sbcBI/sentiment_analysis_model")
model.eval()  # switch to inference mode

# 2. Tokenize input text
text = "Sample text for sentiment analysis"
encoded_input = tokenizer(text, return_tensors='pt')

# 3. Perform inference (no gradients needed at prediction time)
with torch.no_grad():
    output = model(**encoded_input)
predicted_label = output.logits.argmax(dim=-1).item()

# 4. Interpret prediction
sentiment_labels = ['Negative', 'Neutral', 'Positive']
print("Predicted Sentiment:", sentiment_labels[predicted_label])
Training Details
Training Data
The training data consists of posts related to primates, annotated with sentiment labels.
Training Procedure
Preprocessing
Text data underwent preprocessing steps including lowercase conversion, punctuation removal, tokenization, stopword removal, and stemming.
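A minimal sketch of this pipeline using NLTK (listed under Dependencies); the exact preprocessing code used for training is not published, so the details below are assumptions:

import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download('punkt')      # tokenizer models
nltk.download('stopwords')  # stopword lists

def preprocess(text):
    # Reconstruction of the steps described above, in the order listed.
    text = text.lower()                                               # lowercase conversion
    text = text.translate(str.maketrans('', '', string.punctuation))  # punctuation removal
    tokens = word_tokenize(text)                                      # tokenization
    stop_words = set(stopwords.words('english'))
    tokens = [t for t in tokens if t not in stop_words]               # stopword removal
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]                          # stemming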
Training Hyperparameters
- Training regime: Fine-tuning of a transformer-based pre-trained model
- Optimizer: Adam
- Learning rate: 5e-5
- Batch size: 8
- Epochs: 10
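A hypothetical reconstruction of this configuration with the Transformers Trainer API (the actual training script is not published, and Trainer's default AdamW optimizer only approximates the Adam optimizer listed above):

from transformers import Trainer, TrainingArguments

def build_trainer(model, train_dataset, eval_dataset):
    # Hyperparameter values mirror the list above; output_dir is an assumption.
    training_args = TrainingArguments(
        output_dir="./results",
        learning_rate=5e-5,
        per_device_train_batch_size=8,
        num_train_epochs=10,
    )
    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,   # annotated primate posts (training split)
        eval_dataset=eval_dataset,     # holdout split
    )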
Evaluation
Testing Data, Factors & Metrics
- Testing Data: Holdout test set
- Metrics: Accuracy, Precision, Recall, F1-score
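Scores of this kind can be computed from holdout-set predictions along the following lines; the averaging scheme for precision, recall, and F1 is not stated in this card, so 'macro' below is an assumption:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    # y_true / y_pred are gold and predicted label ids for the holdout set.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average='macro'  # averaging scheme is an assumption
    )
    return {
        'accuracy': accuracy_score(y_true, y_pred),
        'precision': precision,
        'recall': recall,
        'f1': f1,
    }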
Results
- Accuracy: 0.79
- Precision: 0.74
- Recall: 0.77
- F1-score: 0.75
Environmental Impact
Carbon emissions were not directly measured for model training. However, users should consider the environmental impact of training and deploying machine learning models, especially on large-scale infrastructure.
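For future runs, emissions can be estimated with a tool such as codecarbon (not a dependency of this project; shown purely as an illustration):

from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
# ... training or inference workload goes here ...
emissions_kg = tracker.stop()  # estimated kg of CO2-equivalent
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")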
Technical Specifications
Model Architecture and Objective
The model uses a transformer architecture with a sequence-classification head, suited to tasks such as sentiment analysis.
Compute Infrastructure
Software
- Framework: PyTorch
- Dependencies: Transformers, NLTK