base_model: distilbert/distilbert-base-multilingual-cased
language:
  - en
  - zh
  - es
  - hi
  - ar
  - bn
  - pt
  - ru
  - ja
  - de
  - ms
  - te
  - vi
  - ko
  - fr
  - tr
  - it
  - pl
  - uk
  - tl
  - nl
  - gsw
license: apache-2.0
pipeline_tag: text-classification
tags:
  - text-classification
  - sentiment-analysis
  - sentiment
  - synthetic data
  - multi-class
  - social-media-analysis
  - customer-feedback
  - product-reviews
  - brand-monitoring
widget:
  - text: >-
      I absolutely loved this movie! The acting was superb and the plot was
      engaging.
    example_title: Very Positive Review
  - text: The service at this restaurant was terrible. I'll never go back.
    example_title: Very Negative Review
  - text: The product works as expected. Nothing special, but it gets the job done.
    example_title: Neutral Review
  - text: I'm somewhat disappointed with my purchase. It's not as good as I hoped.
    example_title: Negative Review
  - text: This book changed my life! I couldn't put it down and learned so much.
    example_title: Very Positive Review
inference:
  parameters:
    temperature: 1

🚀 DistilBERT-based Multilingual Sentiment Classification Model

TRY IT HERE: coming soon

Join Our Discord

NEWS!

  • 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach.

Model Details

  • Model Name: tabularisai/multilingual-sentiment-analysis
  • Base Model: distilbert/distilbert-base-multilingual-cased
  • Task: Text Classification (Sentiment Analysis)
  • Languages: Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch).
  • Number of Classes: 5 (Very Negative, Negative, Neutral, Positive, Very Positive)
  • Usage:
    • Social media analysis
    • Customer feedback analysis
    • Product reviews classification
    • Brand monitoring
    • Market research
    • Customer service optimization
    • Competitive intelligence
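For use cases such as brand monitoring, where only coarse polarity matters, the five classes can be collapsed into three. The following is a minimal sketch and not part of the model itself: COARSE_LABEL, to_coarse, and summarize are hypothetical names, and grouping each "Very" class with its milder counterpart is an assumption about what a downstream application might want.

```python
from collections import Counter

# Hypothetical mapping from the model's five labels to three coarse labels.
COARSE_LABEL = {
    "Very Negative": "Negative",
    "Negative": "Negative",
    "Neutral": "Neutral",
    "Positive": "Positive",
    "Very Positive": "Positive",
}


def to_coarse(label: str) -> str:
    """Collapse a five-class label into Negative / Neutral / Positive."""
    return COARSE_LABEL[label]


def summarize(labels):
    """Count coarse labels across a batch of five-class predictions."""
    return Counter(to_coarse(label) for label in labels)


# Example predictions, written by hand for illustration (not model output).
predictions = ["Very Positive", "Negative", "Neutral", "Very Negative", "Positive"]
print(summarize(predictions))
```

Keeping the mapping in a dictionary makes it easy to change the grouping, for example to treat "Negative" and "Very Negative" separately when escalating complaints.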

Model Description

This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased for multilingual sentiment analysis. It leverages synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts.

Training Data

The model was trained exclusively on synthetic multilingual data generated by LLMs, chosen to cover a wide range of sentiment expressions across the supported languages.

Training Procedure

  • Fine-tuned for 5 epochs.
  • Achieved an off-by-one accuracy (metric name: train_acc_off_by_one) of approximately 0.93 on the validation dataset.
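This card does not define off-by-one accuracy. A common reading, assumed here, is that a prediction counts as correct when it lies within one class of the true label on the ordered scale from 0 (Very Negative) to 4 (Very Positive). The sketch below illustrates that reading; off_by_one_accuracy is a hypothetical helper, not the function used in training, and the labels are made up.

```python
def off_by_one_accuracy(y_true, y_pred):
    """Fraction of predictions within one class of the true label.

    Assumes integer class indices on an ordered scale (0..4 here).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    hits = sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def exact_accuracy(y_true, y_pred):
    """Standard accuracy, shown for comparison."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels for illustration only.
y_true = [0, 1, 2, 3, 4, 4]
y_pred = [1, 1, 4, 3, 3, 0]

print(f"exact:      {exact_accuracy(y_true, y_pred):.2f}")
print(f"off-by-one: {off_by_one_accuracy(y_true, y_pred):.2f}")
```

Under this definition the metric is always at least as high as exact accuracy, so the two should not be compared directly.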

Intended Use

Ideal for:

  • Multilingual social media monitoring
  • International customer feedback analysis
  • Global product review sentiment classification
  • Worldwide brand sentiment tracking

How to Use

Below is a Python example of how to use the multilingual sentiment model:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "tabularisai/multilingual-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

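def predict_sentiment(text):
    """Return the predicted sentiment label for a single text."""
    # Tokenize the input; anything beyond 512 tokens is truncated.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert logits to probabilities and pick the most likely class.
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()
    # Class indices run from 0 (Very Negative) to 4 (Very Positive).
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return sentiment_map[predicted_class]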

texts = [
    # English
    "I absolutely loved this movie! The acting was superb and the plot was engaging.",
    
    # Chinese
    "我讨厌这种无休止的争吵。",
    
    # Spanish
    "El producto funciona como se espera. Nada especial, pero cumple con su función.",
    
    # Arabic
    "لم أحب هذا الفيلم على الإطلاق. القصة كانت مملة والشخصيات ضعيفة.",
    
    # Ukrainian
    "Я розчарований покупкою, вона не така гарна, як я очікував.",
    
    # Hindi
    "यह उत्पाद वास्तव में अद्भुत है! इसका उपयोग करना आसान है और यह मेरे लिए बहुत मददगार रहा।",
    
    # Bengali
    "আমি এই রেস্তোরাঁর খাবার পছন্দ করিনি। এটি খুব তেলতেলে এবং অতিরিক্ত রান্না করা।",
    
    # Portuguese
    "Este livro é fantástico! Eu aprendi muitas coisas novas e inspiradoras."
]

for text in texts:
    sentiment = predict_sentiment(text)
    print(f"Text: {text}")
    print(f"Sentiment: {sentiment}\n")
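The function above returns only the top label. When a confidence score is also needed, the softmax probability of the winning class can be kept. The sketch below works on a plain list of five logits (such as outputs.logits[0].tolist() from the example above), so it runs without downloading the model. label_with_confidence is a hypothetical helper, and the logits are made-up values rather than real model output.

```python
import math

# Same class order as sentiment_map in the example above.
LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return (label, probability) for the most likely class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for illustration only.
example_logits = [-1.2, -0.3, 0.1, 2.4, 1.0]
label, confidence = label_with_confidence(example_logits)
print(f"{label} ({confidence:.2f})")
```

A low confidence on the winning class can be used to route a text to manual review, although the threshold would need to be chosen on real data.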

Training Procedure

  • Dataset: Synthetic multilingual data
  • Framework: PyTorch Lightning
  • Number of epochs: 5
  • Validation Off-by-one Accuracy: ~0.95

Ethical Considerations

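Training on synthetic data can reduce some sources of bias, but it does not guarantee their absence. Validate the model on real-world data from your target domain and languages before relying on its predictions.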

Citation

A citation will be added.

Contact

For inquiries, private APIs, or better models, contact info@tabularis.ai.

tabularis.ai