---
base_model: distilbert/distilbert-base-multilingual-cased
language:
- en
- zh
- es
- hi
- ar
- bn
- pt
- ru
- ja
- de
- ms
- te
- vi
- ko
- fr
- tr
- it
- pl
- uk
- tl
- nl
- gsw
library_name: transformers
license: cc-by-nc-4.0
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- sentiment
- synthetic data
- multi-class
- social-media-analysis
- customer-feedback
- product-reviews
- brand-monitoring
- multilingual
---


# 🚀 DistilBERT-based Multilingual Sentiment Classification Model

<!-- TRY IT HERE: `coming soon`
 -->
[![Join Our Discord](https://img.shields.io/badge/Discord-Join%20Now-7289DA?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/sznxwdqBXj)



# NEWS!

- 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach.

## Model Details
- `Model Name:` tabularisai/multilingual-sentiment-analysis
- `Base Model:` distilbert/distilbert-base-multilingual-cased
- `Task:` Text Classification (Sentiment Analysis)
- `Languages:` Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch).
- `Number of Classes:` 5 (*Very Negative, Negative, Neutral, Positive, Very Positive*)
- `Usage:`
  - Social media analysis
  - Customer feedback analysis
  - Product reviews classification
  - Brand monitoring
  - Market research
  - Customer service optimization
  - Competitive intelligence

## Model Description

This model is a fine-tuned version of `distilbert/distilbert-base-multilingual-cased` for multilingual sentiment analysis. It leverages synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts.

### Training Data

The model was trained exclusively on synthetic multilingual data generated by advanced LLMs, chosen to give wide coverage of sentiment expressions across the supported languages.

### Training Procedure

- Fine-tuned for 3 epochs.
- Achieved an off-by-one accuracy (reported as `train_acc_off_by_one`) of approximately 0.93 on the validation dataset.
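
The card does not define the off-by-one metric. A minimal sketch, assuming the usual meaning for ordinal labels (a prediction counts as correct when it is at most one class away from the true label), could look like this. The function name `off_by_one_accuracy` and the sample labels are hypothetical and not taken from the training code.

```python
def off_by_one_accuracy(y_true, y_pred):
    """Fraction of predictions within one class of the true label.

    Labels are integer class indices, 0 (Very Negative) to 4 (Very Positive).
    This is an assumed definition, not the model authors' implementation.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    hits = sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical labels, for illustration only.
y_true = [0, 1, 2, 3, 4, 4]
y_pred = [1, 1, 4, 3, 3, 0]
score = off_by_one_accuracy(y_true, y_pred)
print(f"off-by-one accuracy: {score:.2f}")
```

Under this definition, confusing "Positive" with "Very Positive" is not penalized, so the figure will generally be higher than exact-match accuracy.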

## Intended Use

Ideal for:
- Multilingual social media monitoring
- International customer feedback analysis
- Global product review sentiment classification
- Worldwide brand sentiment tracking

## How to Use

The following Python example shows how to use the multilingual sentiment model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "tabularisai/multilingual-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(texts):
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return [sentiment_map[p] for p in torch.argmax(probabilities, dim=-1).tolist()]

texts = [
    # English
    "I absolutely love the new design of this app!", "The customer service was disappointing.", "The weather is fine, nothing special.",
    # Chinese
    "这家餐厅的菜味道非常棒!", "我对他的回答很失望。", "天气今天一般。",
    # Spanish
    "¡Me encanta cómo quedó la decoración!", "El servicio fue terrible y muy lento.", "El libro estuvo más o menos.",
    # Arabic
    "الخدمة في هذا الفندق رائعة جدًا!", "لم يعجبني الطعام في هذا المطعم.", "كانت الرحلة عادية.",
    # Ukrainian
    "Мені дуже сподобалася ця вистава!", "Обслуговування було жахливим.", "Книга була посередньою.",
    # Hindi
    "यह जगह सच में अद्भुत है!", "यह अनुभव बहुत खराब था।", "फिल्म ठीक-ठाक थी।",
    # Bengali
    "এখানকার পরিবেশ অসাধারণ!", "সেবার মান একেবারেই খারাপ।", "খাবারটা মোটামুটি ছিল।",
    # Portuguese
    "Este livro é fantástico! Eu aprendi muitas coisas novas e inspiradoras.", 
    "Não gostei do produto, veio quebrado.", "O filme foi ok, nada de especial.",
    # Japanese
    "このレストランの料理は本当に美味しいです!", "このホテルのサービスはがっかりしました。", "天気はまあまあです。",
    # Russian
    "Я в восторге от этого нового гаджета!", "Этот сервис оставил у меня только разочарование.", "Встреча была обычной, ничего особенного.",
    # French
    "J'adore ce restaurant, c'est excellent !", "L'attente était trop longue et frustrante.", "Le film était moyen, sans plus.",
    # Turkish
    "Bu otelin manzarasına bayıldım!", "Ürün tam bir hayal kırıklığıydı.", "Konser fena değildi, ortalamaydı.",
    # Italian
    "Adoro questo posto, è fantastico!", "Il servizio clienti è stato pessimo.", "La cena era nella media.",
    # Polish
    "Uwielbiam tę restaurację, jedzenie jest świetne!", "Obsługa klienta była rozczarowująca.", "Pogoda jest w porządku, nic szczególnego.",
    # Tagalog
    "Ang ganda ng lugar na ito, sobrang aliwalas!", "Hindi maganda ang serbisyo nila dito.", "Maayos lang ang palabas, walang espesyal.",
    # Dutch
    "Ik ben echt blij met mijn nieuwe aankoop!", "De klantenservice was echt slecht.", "De presentatie was gewoon oké, niet bijzonder.",
    # Malay
    "Saya suka makanan di sini, sangat sedap!", "Pengalaman ini sangat mengecewakan.", "Hari ini cuacanya biasa sahaja.",
    # Korean
    "이 가게의 케이크는 정말 맛있어요!", "서비스가 너무 별로였어요.", "날씨가 그저 그렇네요.",
    # Swiss German
    "Ich find dä Service i de Beiz mega guet!", "Däs Esä het mir nöd gfalle.", "D Wätter hüt isch so naja."
]

for text, sentiment in zip(texts, predict_sentiment(texts)):
    print(f"Text: {text}\nSentiment: {sentiment}\n")
```
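
The example above returns only the top label. If a confidence score or a coarser three-way grouping is also useful, the logits can be post-processed as in the sketch below. It uses plain Python so it runs without the model. The label order is copied from `sentiment_map` above; the `COARSE` grouping, the helper names, and the sample logits are assumptions made for illustration.

```python
import math

# Label order taken from the sentiment_map in the example above.
LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]

# Hypothetical coarse grouping; not part of the model itself.
COARSE = {
    "Very Negative": "negative",
    "Negative": "negative",
    "Neutral": "neutral",
    "Positive": "positive",
    "Very Positive": "positive",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits):
    """Return (fine label, coarse label, probability of the fine label)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = LABELS[best]
    return label, COARSE[label], probs[best]


# Made-up logits for one text; in practice use outputs.logits[i].tolist().
label, coarse, confidence = summarize([-1.2, -0.3, 0.1, 2.4, 1.9])
print(label, coarse, round(confidence, 3))
```

A low top-class probability can be a signal to route the text to manual review rather than trusting the label.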

## Ethical Considerations

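Training on synthetic data can reduce some sources of bias, but it does not remove them; validate the model on real-world data from your own domain and languages before relying on it.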
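
One simple way to run such a check is to hand-label a small sample per language and compare the labels with the model's predictions. The sketch below assumes records of the form `(language, true_label, predicted_label)`; the function name and the sample records are hypothetical.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Compute exact-match accuracy per language.

    Each record is a (language, true_label, predicted_label) tuple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, truth, predicted in records:
        total[language] += 1
        correct[language] += int(truth == predicted)
    return {lang: correct[lang] / total[lang] for lang in total}


# Hypothetical hand-labeled sample; replace with your own data and the
# model's predictions for it.
records = [
    ("en", "Positive", "Positive"),
    ("en", "Negative", "Neutral"),
    ("de", "Neutral", "Neutral"),
    ("de", "Very Positive", "Very Positive"),
]
report = accuracy_by_language(records)
for lang, acc in sorted(report.items()):
    print(f"{lang}: {acc:.2f}")
```

Large gaps between languages in such a report would suggest where the synthetic training data transfers poorly to real text.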

## Citation
```
Will be included.
```

## Contact

For inquiries, data, private APIs, or better models, contact info@tabularis.ai.

tabularis.ai