metadata
base_model: distilbert/distilbert-base-multilingual-cased
language:
- en
- zh
- es
- hi
- ar
- bn
- pt
- ru
- ja
- de
- ms
- te
- vi
- ko
- fr
- tr
- it
- pl
- uk
- tl
- nl
- gsw
license: apache-2.0
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- sentiment
- synthetic data
- multi-class
- social-media-analysis
- customer-feedback
- product-reviews
- brand-monitoring
widget:
- text: >-
    I absolutely loved this movie! The acting was superb and the plot was
    engaging.
  example_title: Very Positive Review
- text: The service at this restaurant was terrible. I'll never go back.
  example_title: Very Negative Review
- text: The product works as expected. Nothing special, but it gets the job done.
  example_title: Neutral Review
- text: I'm somewhat disappointed with my purchase. It's not as good as I hoped.
  example_title: Negative Review
- text: This book changed my life! I couldn't put it down and learned so much.
  example_title: Very Positive Review
inference:
  parameters:
    temperature: 1
🚀 DistilBERT-based Multilingual Sentiment Classification Model
TRY IT HERE: coming soon
NEWS!
- 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach.
Model Details
- Model Name: tabularisai/multilingual-sentiment-analysis
- Base Model: distilbert/distilbert-base-multilingual-cased
- Task: Text Classification (Sentiment Analysis)
- Languages: Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch).
- Number of Classes: 5 (Very Negative, Negative, Neutral, Positive, Very Positive)
- Usage:
  - Social media analysis
  - Customer feedback analysis
  - Product reviews classification
  - Brand monitoring
  - Market research
  - Customer service optimization
  - Competitive intelligence
Model Description
This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased for multilingual sentiment analysis. It leverages synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts.
Training Data
Trained exclusively on synthetic multilingual data generated by advanced LLMs, ensuring wide coverage of sentiment expressions from various languages.
Training Procedure
- Fine-tuned for 5 epochs.
- Achieved an off-by-one accuracy (train_acc_off_by_one) of approximately 0.93 on the validation dataset.
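The metric name suggests an off-by-one accuracy, which for ordinal labels usually counts a prediction as correct when it falls within one class of the true label. The sketch below illustrates that reading. It is an assumption about the definition, not the authors' evaluation code, and the function name `off_by_one_accuracy` and the sample labels are hypothetical.

```python
def off_by_one_accuracy(predictions, targets):
    """Fraction of predictions within one ordinal class of the target.

    Assumes integer labels 0..4 ordered from Very Negative to Very Positive.
    Illustrative definition only, not the authors' evaluation code.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one example")
    # A prediction counts as a hit when it misses by at most one class.
    hits = sum(abs(p - t) <= 1 for p, t in zip(predictions, targets))
    return hits / len(targets)


# Hypothetical labels: 0 = Very Negative ... 4 = Very Positive
preds = [4, 3, 0, 2, 1]
golds = [4, 4, 2, 2, 0]
print(off_by_one_accuracy(preds, golds))  # 4 of 5 within one class -> 0.8
```

Under this definition, predicting Positive for a Very Positive text still counts as correct, so the figure is more forgiving than exact-match accuracy.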
Intended Use
Ideal for:
- Multilingual social media monitoring
- International customer feedback analysis
- Global product review sentiment classification
- Worldwide brand sentiment tracking
How to Use
Below is a Python example of how to use the multilingual sentiment model:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "tabularisai/multilingual-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return sentiment_map[predicted_class]

texts = [
    # English
    "I absolutely loved this movie! The acting was superb and the plot was engaging.",
    # Chinese
    "我讨厌这种无休止的争吵。",
    # Spanish
    "El producto funciona como se espera. Nada especial, pero cumple con su función.",
    # Arabic
    "لم أحب هذا الفيلم على الإطلاق. القصة كانت مملة والشخصيات ضعيفة.",
    # Ukrainian
    "Я розчарований покупкою, вона не така гарна, як я очікував.",
    # Hindi
    "यह उत्पाद वास्तव में अद्भुत है! इसका उपयोग करना आसान है और यह मेरे लिए बहुत मददगार रहा।",
    # Bengali
    "আমি এই রেস্তোরাঁর খাবার পছন্দ করিনি। এটি খুব তেলতেলে এবং অতিরিক্ত রান্না করা।",
    # Portuguese
    "Este livro é fantástico! Eu aprendi muitas coisas novas e inspiradoras."
]

for text in texts:
    sentiment = predict_sentiment(text)
    print(f"Text: {text}")
    print(f"Sentiment: {sentiment}\n")
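The example above returns only the winning label. If a confidence score is also useful, the post-processing step (softmax, argmax, label lookup) can be isolated as in the following sketch. It uses plain Python so it runs without the model. The helper names `softmax` and `label_with_confidence` and the sample logits are hypothetical, and the label order is taken from the `sentiment_map` in the example above.

```python
import math

# Label order taken from the sentiment_map in the usage example.
LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return (label, probability) for one row of five logits."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits, e.g. outputs.logits[0].tolist() from the model call.
label, confidence = label_with_confidence([-1.2, -0.3, 0.4, 2.1, 0.9])
print(label, round(confidence, 3))
```

A low top probability can serve as a signal to route a text for manual review; any specific threshold would need to be chosen on real data.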
Training Procedure
- Dataset: Synthetic multilingual data
- Framework: PyTorch Lightning
- Number of epochs: 5
- Validation Off-by-one Accuracy: ~0.95
Ethical Considerations
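Training on synthetic data may reduce some biases found in scraped datasets, but it does not eliminate bias and can inherit biases from the LLMs that generated it. Validate the model on real-world data before relying on it.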
Citation
Will be included.
Contact
For inquiries, private APIs, or better models, contact info@tabularis.ai.
tabularis.ai