---
base_model: distilbert/distilbert-base-multilingual-cased
language:
- en
- zh
- es
- hi
- ar
- bn
- pt
- ru
- ja
- de
- ms
- te
- vi
- ko
- fr
- tr
- it
- pl
- uk
- tl
- nl
- gsw
license: apache-2.0
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- sentiment
- synthetic data
- multi-class
- social-media-analysis
- customer-feedback
- product-reviews
- brand-monitoring
widget:
- text: >-
    I absolutely loved this movie! The acting was superb and the plot was
    engaging.
  example_title: Very Positive Review
- text: The service at this restaurant was terrible. I'll never go back.
  example_title: Very Negative Review
- text: The product works as expected. Nothing special, but it gets the job done.
  example_title: Neutral Review
- text: I'm somewhat disappointed with my purchase. It's not as good as I hoped.
  example_title: Negative Review
- text: This book changed my life! I couldn't put it down and learned so much.
  example_title: Very Positive Review
inference:
  parameters:
    temperature: 1
---

# 🚀 distilbert-based Multilingual Sentiment Classification Model

TRY IT HERE: `coming soon`

[![Join Our Discord](https://img.shields.io/badge/Discord-Join%20Now-7289DA?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/sznxwdqBXj)

# NEWS!

- 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach.

## Model Details

- `Model Name:` tabularisai/multilingual-sentiment-analysis
- `Base Model:` distilbert/distilbert-base-multilingual-cased
- `Task:` Text Classification (Sentiment Analysis)
- `Languages:` Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch).
- `Number of Classes:` 5 (*Very Negative, Negative, Neutral, Positive, Very Positive*)
- `Usage:`
  - Social media analysis
  - Customer feedback analysis
  - Product reviews classification
  - Brand monitoring
  - Market research
  - Customer service optimization
  - Competitive intelligence

## Model Description

This model is a fine-tuned version of `distilbert/distilbert-base-multilingual-cased` for multilingual sentiment analysis. It is trained on synthetic data from multiple sources, with the aim of robust performance across different languages and cultural contexts.

### Training Data

Trained exclusively on synthetic multilingual data generated by advanced LLMs, intended to give wide coverage of sentiment expressions across languages.

### Training Procedure

- Fine-tuned for 5 epochs.
- Achieved an off-by-one accuracy (`train_acc_off_by_one`) of approximately 0.93 on the validation dataset.

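The card does not define how the off-by-one accuracy is computed. A minimal sketch, assuming it counts a prediction as correct when it falls within one position of the true label on the ordered five-class scale (0 = Very Negative through 4 = Very Positive), might look like the following. The function name `off_by_one_accuracy` and the sample labels are hypothetical and are not taken from the training code.

```python
from typing import Sequence


def off_by_one_accuracy(predicted: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of predictions within one ordinal class of the target.

    Assumption: classes are ordered 0 (Very Negative) .. 4 (Very Positive),
    so an absolute index difference of at most 1 counts as a hit.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if len(target) == 0:
        raise ValueError("at least one example is required")
    hits = sum(abs(p - t) <= 1 for p, t in zip(predicted, target))
    return hits / len(target)


# Hypothetical labels: the third prediction misses by two classes.
predicted = [0, 1, 2, 4, 3]
target = [0, 2, 4, 4, 3]
print(off_by_one_accuracy(predicted, target))
```

Under this reading, confusing "Positive" with "Very Positive" is not penalized, so the metric is more lenient than exact-match accuracy and should not be compared directly with it.
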
## Intended Use

Ideal for:
- Multilingual social media monitoring
- International customer feedback analysis
- Global product review sentiment classification
- Worldwide brand sentiment tracking

## How to Use

Below is a Python example of how to use the multilingual sentiment model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "tabularisai/multilingual-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(text):
    # Tokenize the input, truncating anything beyond 512 tokens
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert logits to class probabilities and pick the most likely class
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()
    # Map the class index to its sentiment label
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return sentiment_map[predicted_class]

texts = [
    # English
    "I absolutely loved this movie! The acting was superb and the plot was engaging.",
    # Chinese
    "我讨厌这种无休止的争吵。",
    # Spanish
    "El producto funciona como se espera. Nada especial, pero cumple con su función.",
    # Arabic
    "لم أحب هذا الفيلم على الإطلاق. القصة كانت مملة والشخصيات ضعيفة.",
    # Ukrainian
    "Я розчарований покупкою, вона не така гарна, як я очікував.",
    # Hindi
    "यह उत्पाद वास्तव में अद्भुत है! इसका उपयोग करना आसान है और यह मेरे लिए बहुत मददगार रहा।",
    # Bengali
    "আমি এই রেস্তোরাঁর খাবার পছন্দ করিনি। এটি খুব তেলতেলে এবং অতিরিক্ত রান্না করা।",
    # Portuguese
    "Este livro é fantástico! Eu aprendi muitas coisas novas e inspiradoras."
]

# Classify each example and print the predicted label
for text in texts:
    sentiment = predict_sentiment(text)
    print(f"Text: {text}")
    print(f"Sentiment: {sentiment}\n")
```

## Training Procedure

- Dataset: Synthetic multilingual data
- Framework: PyTorch Lightning
- Number of epochs: 5
- Validation Off-by-one Accuracy: ~0.95

## Ethical Considerations

Training on synthetic data may reduce some sources of bias, but validating the model on real-world data before deployment is advised.

## Citation

```
Will be included.
```

## Contact

For inquiries, private APIs, or better models, contact info@tabularis.ai

tabularis.ai