---
library_name: transformers
base_model: cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: democracy-sentiment-analysis-turkish-roberta
    results: []
license: mit
language:
  - tr
---

# democracy-sentiment-analysis-turkish-roberta

This model is a fine-tuned version of [cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual) on a Turkish, democracy-focused sentiment dataset (see *Training and evaluation data* below). It achieves the following results on the evaluation set:

- Loss: 0.4469
- Accuracy: 0.8184
- F1: 0.8186
- Precision: 0.8224
- Recall: 0.8184

## Model description

This model is fine-tuned from the base model [cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual) for sentiment analysis in Turkish, specifically focusing on democracy-related text. The model classifies texts into three sentiment categories:

- Positive
- Neutral
- Negative
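
The exact id-to-label mapping can be read from the model configuration rather than assumed; a minimal check (the label order below is printed, not asserted):

```python
from transformers import AutoConfig

# Inspect the label mapping shipped with the model configuration
config = AutoConfig.from_pretrained(
    "yeniguno/democracy-sentiment-analysis-turkish-roberta"
)
print(config.id2label)
# Expected to map class ids to 'negative' / 'neutral' / 'positive'
# (lowercase labels, matching the pipeline outputs shown under "How to Use")
```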

## Intended uses & limitations

This model is well-suited for analyzing sentiment in Turkish texts that discuss democracy, governance, and related political discourse. Because the training data centers on political discourse, performance may be weaker on out-of-domain text, and predictions should be treated as model output rather than ground truth in downstream analyses.

## Training and evaluation data

The training dataset consists of 30,000 rows gathered from various sources, including Kaggle, Hugging Face, Ekşi Sözlük, and synthetic data generated with state-of-the-art LLMs. The dataset is multilingual in origin, with texts in English, Russian, and Turkish; all non-Turkish texts were translated into Turkish. In total, the data represents a broad spectrum of democratic discourse drawn from 30 different sources.

## How to Use

To use this model for sentiment analysis, you can leverage the Hugging Face pipeline for text classification as shown below:

```python
from transformers import pipeline

# Load the model from Hugging Face
sentiment_model = pipeline(model="yeniguno/democracy-sentiment-analysis-turkish-roberta", task="text-classification")

# Example text input
response = sentiment_model("En iyisi devletin tüm gücünü tek bir lidere verelim")

# Print the result
print(response)
# [{'label': 'negative', 'score': 0.9617443084716797}]

# Example text input
response = sentiment_model("Birçok farklı sesin çıkması zaman alıcı ve karmaşık görünebilir, ancak demokrasinin getirdiği özgürlük ve çeşitlilik, toplumun gerçek gücüdür.")

# Print the result
print(response)
# [{'label': 'positive', 'score': 0.958978533744812}]

# Example text input
response = sentiment_model("Bugün hava yağmurlu.")

# Print the result
print(response)
# [{'label': 'neutral', 'score': 0.9915837049484253}]
```
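
The pipeline is the simplest route; for batched inference or direct access to class probabilities, the model can also be loaded with the lower-level API. This is a minimal sketch, not code from the original card:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "yeniguno/democracy-sentiment-analysis-turkish-roberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

texts = [
    "En iyisi devletin tüm gücünü tek bir lidere verelim",
    "Bugün hava yağmurlu.",
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Convert logits to per-class probabilities for the whole batch
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)

for text, p in zip(texts, probs):
    idx = int(p.argmax())
    print(f"{text!r} -> {model.config.id2label[idx]} ({p[idx]:.4f})")
```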

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hypothetical `TrainingArguments` reconstruction follows the list):

- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 2
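
The listed values map directly onto `TrainingArguments`; the sketch below is a reconstruction under that assumption, with `output_dir` as a placeholder rather than the original script's value:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters;
# output_dir is a placeholder, not from the original training script.
training_args = TrainingArguments(
    output_dir="democracy-sentiment-analysis-turkish-roberta",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=2,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 match the Trainer defaults
)
```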

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Precision | Recall |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.7236        | 1.0   | 802  | 0.4797          | 0.8039   | 0.8031 | 0.8037    | 0.8039 |
| 0.424         | 2.0   | 1604 | 0.4469          | 0.8184   | 0.8186 | 0.8224    | 0.8184 |
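
Metrics like these are typically produced by a `compute_metrics` callback passed to the `Trainer`; the sketch below assumes weighted averaging (consistent with weighted recall equalling accuracy in both rows) and is not the exact original code:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # Assumed metric function; "weighted" averaging is an inference
    # from the reported numbers, not confirmed by the card.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```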

### Framework versions

- Transformers 4.44.2
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1