Multilingual Binary Sentiment Classifier (XLM-RoBERTa)

Fine-tuned xlm-roberta-base for binary sentiment classification (positive / negative) across 17 language categories (16 individual languages plus a mixed-language bucket). Built for Assignment 02 of CMPE 346 (Natural Language Processing) at İstanbul Bilgi University.

Quick start

```python
from transformers import pipeline

clf = pipeline("text-classification", model="Tunahan241/cmpe346-sentiment")

clf("This product changed my life, absolutely love it!")
# [{'label': 'LABEL_1', 'score': 0.998}]   # LABEL_1 = positive

clf("Très déçu, ne fonctionne pas du tout.")
# [{'label': 'LABEL_0', 'score': 0.995}]   # LABEL_0 = negative

clf("非常满意,推荐购买!")
# [{'label': 'LABEL_1', 'score': 0.992}]   # LABEL_1 = positive
```

Label mapping:

| Label | Sentiment |
|---|---|
| LABEL_0 | negative |
| LABEL_1 | positive |
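
The pipeline returns the generic names LABEL_0 and LABEL_1, so downstream code usually wants them translated. A minimal sketch follows; `ID2SENTIMENT` and `readable` are illustrative names and are not part of the model repository.

```python
# Hypothetical helper: map the generic pipeline labels to sentiment names.
ID2SENTIMENT = {"LABEL_0": "negative", "LABEL_1": "positive"}


def readable(predictions):
    """Return copies of pipeline predictions with human-readable labels."""
    return [
        {"label": ID2SENTIMENT[p["label"]], "score": p["score"]}
        for p in predictions
    ]


# Input shaped like the quick-start outputs.
print(readable([{"label": "LABEL_1", "score": 0.998}]))
# [{'label': 'positive', 'score': 0.998}]
```

In practice the list passed to `readable` would be the return value of `clf(...)`.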

Performance

Evaluated on a held-out validation set of 17 500 multilingual reviews:

| Metric | Score |
|---|---|
| F1 | 0.9175 |
| Accuracy | 0.918 |
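
For reference, both metrics can be computed from gold labels and predictions as sketched below. This assumes F1 means the binary F1 of the positive class (label 1); the card does not state whether binary, macro, or weighted averaging was used, and the function name is illustrative.

```python
def binary_f1_and_accuracy(y_true, y_pred, positive=1):
    """Compute positive-class F1 and accuracy for two equal-length label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return f1, accuracy


# Toy labels for illustration; not taken from the validation set.
f1, acc = binary_f1_and_accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(f"F1={f1:.3f} accuracy={acc:.3f}")  # F1=0.667 accuracy=0.600
```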

For comparison, the baseline F1 cited in the assignment is 0.7958; this model exceeds it by 0.12 absolute (about 15 % relative).
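
The absolute and relative gains over the baseline follow directly from the two F1 values; the short check below only restates that arithmetic.

```python
baseline_f1 = 0.7958  # baseline cited in the assignment
model_f1 = 0.9175     # validation F1 reported for this model

absolute_gain = model_f1 - baseline_f1
relative_gain = absolute_gain / baseline_f1

print(f"absolute: {absolute_gain:+.4f}")  # absolute: +0.1217
print(f"relative: {relative_gain:+.1%}")  # relative: +15.3%
```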

Languages

Trained on data spanning these languages (train-set frequency):

| Language | Code | Samples |
|---|---|---|
| English | en | 51 989 |
| Chinese | zh | 15 430 |
| Japanese | ja | 11 974 |
| French | fr | 10 549 |
| Spanish | es | 8 484 |
| Russian | ru | 8 477 |
| German | de | 8 171 |
| Korean | ko | 7 881 |
| Arabic | ar | 5 998 |
| Vietnamese | vi | 5 761 |
| Turkish | tr | 2 205 |
| Portuguese | pt | 1 521 |
| Indonesian | id | 531 |
| Multilingual mixed | multilingual | 405 |
| Hindi | hi | 277 |
| Malay | ms | 250 |
| Italian | it | 97 |

Total: 140 000 training reviews.
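
To make the imbalance concrete, the per-language share of the training set can be derived from the counts above. The sketch below copies those counts into a dict (the name `TRAIN_COUNTS` is illustrative) and checks that they add up to the stated total.

```python
# Train-set sample counts copied from the language table.
TRAIN_COUNTS = {
    "en": 51_989, "zh": 15_430, "ja": 11_974, "fr": 10_549,
    "es": 8_484, "ru": 8_477, "de": 8_171, "ko": 7_881,
    "ar": 5_998, "vi": 5_761, "tr": 2_205, "pt": 1_521,
    "id": 531, "multilingual": 405, "hi": 277, "ms": 250, "it": 97,
}

total = sum(TRAIN_COUNTS.values())
shares = {code: n / total for code, n in TRAIN_COUNTS.items()}

print(total)  # 140000
# Three largest categories by share of the training set.
top3 = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]
for code, share in top3:
    print(f"{code}: {share:.1%}")
# en: 37.1%
# zh: 11.0%
# ja: 8.6%
```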

Training details

Base model

xlm-roberta-base — 270 M-parameter multilingual masked language model pretrained on CC-100 across 100 languages. A single linear classification head is added on top of the <s> (CLS) token.

Hyperparameters

| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 (linear) |
| LR schedule | Linear decay |
| Epochs | 3 |
| Batch size | 32 per device |
| Max sequence length | 256 |
| Mixed precision | fp16 |
| Early stopping | patience = 2 on validation F1 |
| Best-model selection | by validation F1 |
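
The warmup and decay settings imply a concrete learning-rate curve. The sketch below derives the step counts and the schedule in plain Python, assuming a single device, no gradient accumulation, and that all 3 epochs run without early stopping. Flooring the warmup step count is also an assumption; training libraries may round differently.

```python
import math

NUM_SAMPLES = 140_000  # training reviews
BATCH_SIZE = 32        # per device; one device assumed
EPOCHS = 3
BASE_LR = 2e-5
WARMUP_RATIO = 0.1

steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(WARMUP_RATIO * total_steps)  # rounding rule is an assumption


def learning_rate(step):
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)  # 4375 13125 1312
```

Under these assumptions the peak rate of 2e-5 is reached after 1 312 optimizer steps, roughly 30 % of the way through the first epoch.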

Compute

  • 1 × NVIDIA T4 GPU (Google Colab free tier)
  • ~85 minutes wall-clock for 3 epochs on 140 000 samples

Preprocessing

Multilingual-safe minimal cleaning:

  • NFKC Unicode normalization (unifies full/half-width forms)
  • URL and HTML tag stripping
  • Whitespace collapse
  • No lowercasing and no accent stripping — preserves signal in non-Latin scripts (CJK, Cyrillic, Arabic, Devanagari, etc.).
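
The cleaning steps listed above can be sketched with the standard library alone. The regular expressions and the name `clean_text` are assumptions for illustration; the exact patterns used during training are not given in this card.

```python
import re
import unicodedata

# Illustrative patterns; the original training patterns are not documented.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Apply NFKC, strip URLs and HTML tags, and collapse whitespace.

    Case and accents are deliberately left untouched.
    """
    text = unicodedata.normalize("NFKC", text)
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


# Full-width letters and an ideographic space are folded by NFKC.
print(clean_text("Ｇｏｏｄ　<br>value https://example.com"))  # Good value
```

Tags and URLs are replaced with a space rather than removed outright so that neighbouring words are not fused; the final whitespace collapse cleans up the extra spaces.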

Tokenization is handled by the pretrained XLM-R SentencePiece tokenizer (a vocabulary of 250 002 subword units covering 100 languages).

Intended use

  • Multilingual product / review / short-text sentiment classification.
  • Cross-lingual transfer to the 100 languages XLM-R was pretrained on (zero-shot to languages not in the fine-tuning set is plausible but not benchmarked here).

Limitations

  • The training distribution is dominated by English (about 37 % of samples), followed by Chinese, Japanese, and French. Performance on under-represented languages (for example Italian with 97 samples or Hindi with 277) is less reliable.
  • Trained on review-style text; performance on long-form articles, formal documents, or strongly domain-specific text (legal, medical) is not guaranteed.
  • Binary only — does not capture neutral, mixed, or fine-grained sentiment.
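
One way to check how much the first limitation matters for a given use case is to break accuracy down by language on your own labelled data. The sketch below is a hypothetical helper, not part of the model repository, and the records in it are made up purely to show the input shape.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Group (language, gold, predicted) triples into per-language (accuracy, count)."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for lang, gold, pred in records:
        counts[lang] += 1
        hits[lang] += int(gold == pred)
    return {lang: (hits[lang] / counts[lang], counts[lang]) for lang in counts}


# Made-up records for illustration only; these are not evaluation results.
toy = [("en", 1, 1), ("en", 0, 0), ("it", 1, 0), ("it", 0, 0)]
for lang, (acc, n) in accuracy_by_language(toy).items():
    print(f"{lang}: accuracy={acc:.2f} (n={n})")
# en: accuracy=1.00 (n=2)
# it: accuracy=0.50 (n=2)
```

Reporting the count next to each score matters here, since an accuracy computed on a handful of samples says little.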

Citation

This model was developed as a course assignment:

  • CMPE 346 — Natural Language Processing
  • Assignment 02 — Multilingual Binary Sentiment Classification
  • İstanbul Bilgi University, 2026

Author

Tunahan İbiş
