Hindi

Hindi Sentiment Analysis Model

This repository contains a Hindi sentiment analysis model that can classify text into three categories: negative (neg), neutral (neu), and positive (pos). The model has been trained and evaluated using various BERT-based architectures, with XLM-RoBERTa showing the best performance.

Model Performance

Test Accuracy Comparison

Test Accuracy Comparison

Our extensive evaluation shows:

  • XLM-RoBERTa: 81.3%
  • mBERT: 76.5%
  • Custom-BERT-Attention: 74.9%
  • IndicBERT: 69.9%

Detailed Results

Confusion Matrices

Confusion Matrices

The confusion matrices show the prediction performance for each model:

  • XLM-RoBERTa shows the strongest performance with 82.1% accuracy on positive class
  • mBERT demonstrates balanced performance across classes
  • Custom-BERT-Attention maintains consistent performance
  • IndicBERT shows room for improvement in negative class detection

Per-class Metrics

Per-class Metrics

The detailed per-class metrics show:

  1. Precision:

    • Positive class: Best performance across all models (~0.80-0.85)
    • Neutral class: Consistent performance (~0.75-0.80)
    • Negative class: More varied performance (~0.40-0.70)
  2. Recall:

    • Positive class: High recall across models (~0.85-0.90)
    • Neutral class: Moderate recall (~0.65-0.85)
    • Negative class: Lower but improving recall (~0.25-0.60)
  3. F1-Score:

    • Positive class: Best overall performance (~0.80-0.85)
    • Neutral class: Good balance (~0.70-0.80)
    • Negative class: Area for potential improvement (~0.30-0.65)

Training Progress

Training Progress

The training graphs show:

  • Consistent loss reduction across epochs
  • Stable validation accuracy improvement
  • No significant overfitting
  • XLM-RoBERTa achieving the best validation accuracy
  • Custom-BERT-Attention showing rapid initial learning

Model Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("madhav112/hindi-sentiment-analysis")
model = AutoModelForSequenceClassification.from_pretrained("madhav112/hindi-sentiment-analysis")

# Example usage
text = "यह फिल्म बहुत अच्छी है"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
outputs = model(**inputs)
predictions = outputs.logits.argmax(-1)

Model Architecture

The repository contains experiments with multiple BERT-based architectures:

  1. XLM-RoBERTa (Best performing)

    • Highest overall accuracy
    • Best performance on positive sentiment
    • Strong cross-lingual capabilities
  2. mBERT

    • Good balanced performance
    • Strong on neutral class detection
    • Consistent across all metrics
  3. Custom-BERT-Attention

    • Competitive performance
    • Quick convergence during training
    • Good precision on positive class
  4. IndicBERT

    • Baseline performance
    • Room for improvement
    • Better suited for specific Indian language tasks

Dataset

The model was trained on a Hindi sentiment analysis dataset with three classes:

  • Positive (pos)
  • Neutral (neu)
  • Negative (neg)

The confusion matrices show balanced class distribution and strong performance across categories.

Training Details

The model was trained for 7 epochs with the following characteristics:

  • Learning rate: Optimized for each architecture
  • Batch size: Adjusted for optimal performance
  • Validation split: Regular evaluation during training
  • Early stopping: Monitored for best model selection
  • Loss function: Cross-entropy loss

Limitations

  • Lower performance on negative sentiment detection compared to positive
  • Neutral class classification shows moderate confusion with both positive and negative
  • Performance may vary on domain-specific text
  • Best suited for standard Hindi text; may have reduced performance on heavily colloquial or dialectal variations

Citation

If you use this model in your research, please cite:

@misc{madhav2024hindisentiment,
  author = {Madhav},
  title = {Hindi Sentiment Analysis Model},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/madhav112/hindi-sentiment-analysis}}
}

Author

Madhav

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Special thanks to the HuggingFace team and the open-source community for providing the tools and frameworks that made this model possible.

language: hi tags:

  • hindi
  • sentiment-analysis
  • text-classification
  • bert datasets:
  • hindi-sentiment metrics:
  • accuracy
  • f1
  • precision
  • recall model-index:
  • name: hindi-sentiment-analysis results:
    • task: type: text-classification name: Text Classification dataset: name: Hindi Sentiment type: hindi-sentiment metrics:
      • type: accuracy value: 81.3 name: Test Accuracy
      • type: f1 value: 0.82 name: F1 Score
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.

Model tree for madhav112/hindi-sentiment-analysis

Finetuned
(376)
this model

Dataset used to train madhav112/hindi-sentiment-analysis