---
base_model: distilbert/distilbert-base-multilingual-cased
language:
- en
- zh
- es
- hi
- ar
- bn
- pt
- ru
- ja
- de
- ms
- te
- vi
- ko
- fr
- tr
- it
- pl
- uk
- tl
- nl
- gsw
library_name: transformers
license: cc-by-nc-4.0
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- sentiment
- synthetic data
- multi-class
- social-media-analysis
- customer-feedback
- product-reviews
- brand-monitoring
- multilingual
---
# 🚀 distilbert-based Multilingual Sentiment Classification Model
<!-- TRY IT HERE: `coming soon`
-->
[![Join Our Discord](https://img.shields.io/badge/Discord-Join%20Now-7289DA?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/sznxwdqBXj)
# NEWS!
- 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach.
## Model Details
- `Model Name:` tabularisai/multilingual-sentiment-analysis
- `Base Model:` distilbert/distilbert-base-multilingual-cased
- `Task:` Text Classification (Sentiment Analysis)
- `Languages:` Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch).
- `Number of Classes:` 5 (*Very Negative, Negative, Neutral, Positive, Very Positive*)
- `Usage:`
  - Social media analysis
  - Customer feedback analysis
  - Product reviews classification
  - Brand monitoring
  - Market research
  - Customer service optimization
  - Competitive intelligence
## Model Description
This model is a fine-tuned version of `distilbert/distilbert-base-multilingual-cased` for multilingual sentiment analysis. It leverages synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts.
### Training Data
Trained exclusively on synthetic multilingual data generated by advanced LLMs, ensuring wide coverage of sentiment expressions from various languages.
### Training Procedure
- Fine-tuned for 3 epochs.
- Achieved an off-by-one accuracy (`train_acc_off_by_one`) of approximately 0.93 on the validation dataset.
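The card does not define `train_acc_off_by_one`. One plausible reading, assumed here, is off-by-one accuracy: a prediction counts as correct when it lands on the true class or on a directly neighbouring class of the five-point scale. The sketch below illustrates that assumed definition in plain Python; the function name `off_by_one_accuracy` and the sample labels are hypothetical and are not taken from the training code.

```python
def off_by_one_accuracy(y_true, y_pred):
    """Share of predictions within one class of the true label.

    Assumes labels are ordinal integers 0..4
    (0 = Very Negative, ..., 4 = Very Positive).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    # A hit is an exact match or a miss by a single class.
    hits = sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Illustrative labels only, not taken from the model's validation set.
y_true = [0, 1, 2, 3, 4, 4]
y_pred = [1, 1, 4, 3, 3, 0]
print(off_by_one_accuracy(y_true, y_pred))
```

Under this reading, confusing "Positive" with "Very Positive" is not penalised, so the figure is more lenient than exact-match accuracy.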
## Intended Use
Ideal for:
- Multilingual social media monitoring
- International customer feedback analysis
- Global product review sentiment classification
- Worldwide brand sentiment tracking
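For monitoring and tracking use cases, per-text labels are usually aggregated into a summary. The following is a minimal sketch, assuming the five class names listed under Model Details; the three-bucket grouping `COARSE` and the helper `sentiment_shares` are hypothetical and not part of the model or of `transformers`.

```python
from collections import Counter

# Hypothetical grouping of the five classes into three coarse buckets.
COARSE = {
    "Very Negative": "negative",
    "Negative": "negative",
    "Neutral": "neutral",
    "Positive": "positive",
    "Very Positive": "positive",
}


def sentiment_shares(labels):
    """Return the share of negative, neutral, and positive labels."""
    counts = Counter(COARSE[label] for label in labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels to aggregate")
    # Counter returns 0 for buckets that never occurred.
    return {b: counts[b] / total for b in ("negative", "neutral", "positive")}


# Illustrative predictions, e.g. one label per customer review.
predicted = ["Very Positive", "Positive", "Neutral", "Negative", "Positive"]
print(sentiment_shares(predicted))
```

Keeping the five-class output and collapsing it only at reporting time preserves the finer distinction for cases where intensity matters.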
## How to Use
Below is a Python example of how to use the multilingual sentiment model:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
model_name = "tabularisai/multilingual-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
def predict_sentiment(texts):
    """Return one sentiment label per input text."""
    # Tokenize the whole batch, padding to the longest text and truncating at 512 tokens.
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return [sentiment_map[p] for p in torch.argmax(probabilities, dim=-1).tolist()]
texts = [
# English
"I absolutely love the new design of this app!", "The customer service was disappointing.", "The weather is fine, nothing special.",
# Chinese
"这家餐厅的菜味道非常棒!", "我对他的回答很失望。", "天气今天一般。",
# Spanish
"¡Me encanta cómo quedó la decoración!", "El servicio fue terrible y muy lento.", "El libro estuvo más o menos.",
# Arabic
"الخدمة في هذا الفندق رائعة جدًا!", "لم يعجبني الطعام في هذا المطعم.", "كانت الرحلة عادية.",
# Ukrainian
"Мені дуже сподобалася ця вистава!", "Обслуговування було жахливим.", "Книга була посередньою.",
# Hindi
"यह जगह सच में अद्भुत है!", "यह अनुभव बहुत खराब था।", "फिल्म ठीक-ठाक थी।",
# Bengali
"এখানকার পরিবেশ অসাধারণ!", "সেবার মান একেবারেই খারাপ।", "খাবারটা মোটামুটি ছিল।",
# Portuguese
"Este livro é fantástico! Eu aprendi muitas coisas novas e inspiradoras.",
"Não gostei do produto, veio quebrado.", "O filme foi ok, nada de especial.",
# Japanese
"このレストランの料理は本当に美味しいです!", "このホテルのサービスはがっかりしました。", "天気はまあまあです。",
# Russian
"Я в восторге от этого нового гаджета!", "Этот сервис оставил у меня только разочарование.", "Встреча была обычной, ничего особенного.",
# French
"J'adore ce restaurant, c'est excellent !", "L'attente était trop longue et frustrante.", "Le film était moyen, sans plus.",
# Turkish
"Bu otelin manzarasına bayıldım!", "Ürün tam bir hayal kırıklığıydı.", "Konser fena değildi, ortalamaydı.",
# Italian
"Adoro questo posto, è fantastico!", "Il servizio clienti è stato pessimo.", "La cena era nella media.",
# Polish
"Uwielbiam tę restaurację, jedzenie jest świetne!", "Obsługa klienta była rozczarowująca.", "Pogoda jest w porządku, nic szczególnego.",
# Tagalog
"Ang ganda ng lugar na ito, sobrang aliwalas!", "Hindi maganda ang serbisyo nila dito.", "Maayos lang ang palabas, walang espesyal.",
# Dutch
"Ik ben echt blij met mijn nieuwe aankoop!", "De klantenservice was echt slecht.", "De presentatie was gewoon oké, niet bijzonder.",
# Malay
"Saya suka makanan di sini, sangat sedap!", "Pengalaman ini sangat mengecewakan.", "Hari ini cuacanya biasa sahaja.",
# Korean
"이 가게의 케이크는 정말 맛있어요!", "서비스가 너무 별로였어요.", "날씨가 그저 그렇네요.",
# Swiss German
"Ich find dä Service i de Beiz mega guet!", "Däs Esä het mir nöd gfalle.", "D Wätter hüt isch so naja."
]
for text, sentiment in zip(texts, predict_sentiment(texts)):
    print(f"Text: {text}\nSentiment: {sentiment}\n")
```
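The example returns only the top label. When a confidence value is also useful, the softmax probability of the winning class can be reported alongside it. The sketch below reproduces the softmax and argmax step in plain Python so it runs without downloading the model; the logits are made up for illustration, and `label_with_confidence` is a hypothetical helper, not part of `transformers`.

```python
import math

# Same index-to-label mapping as in the usage example.
SENTIMENT_MAP = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_MAP[best], probs[best]


# Made-up logits for a single text; real values would come from outputs.logits.
label, confidence = label_with_confidence([-1.2, -0.3, 0.4, 2.1, 1.0])
print(f"{label} ({confidence:.2f})")
```

A low top probability can be used to route a text to manual review, although softmax scores are not guaranteed to be well calibrated.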
## Ethical Considerations
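Training on synthetic data can reduce some sources of bias but does not eliminate it. Validate the model on real-world data from your target languages and domain before relying on its predictions.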
## Citation
```
Will be included.
```
## Contact
For inquiries, data, private APIs, or better models, contact info@tabularis.ai

tabularis.ai