LocalDoc/sentiment_analysis_azerbaijani

@@ -4,10 +4,11 @@ language:
 - az
 pipeline_tag: text-classification
 ---
-# Sentiment Analysis in Azerbaijani Language
 ## Model Description
-This repository contains a multilingual language detection model based on the XLM-RoBERTa base architecture. The model is capable of distinguishing between 21 different languages including Arabic, Azerbaijani, Bulgarian, German, Greek, English, Spanish, French, Hindi, Italian, Japanese, Dutch, Polish, Portuguese, Russian, Swahili, Thai, Turkish, Urdu, Vietnamese, and Chinese.
 ## How to Use
 You can use this model directly with a pipeline for text classification, or you can use it with the `transformers` library for more custom usage, as shown in the example below.
@@ -19,58 +20,51 @@ pip install transformers
 ```
 ```python
-from transformers import AutoModelForSequenceClassification, XLMRobertaTokenizer
 import torch
-# Load tokenizer and model
-tokenizer = XLMRobertaTokenizer.from_pretrained("LocalDoc/language_detection")
-model = AutoModelForSequenceClassification.from_pretrained("LocalDoc/language_detection")
-# Prepare text
-text = "Əlqasım oğulları vorzakondu"
-encoded_input = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
-# Prediction
-model.eval()
-with torch.no_grad():
-    outputs = model(**encoded_input)
-# Process the outputs
-logits = outputs.logits
-probabilities = torch.nn.functional.softmax(logits, dim=-1)
-predicted_class_index = probabilities.argmax().item()
-labels = ["az", "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja", "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh"]
-predicted_label = labels[predicted_class_index]
-print(f"Predicted Language: {predicted_label}")
 ```
 ## Language Label Information
 The model outputs a label for each prediction, corresponding to one of the languages listed below. Each label is associated with a specific language code as detailed in the following table:
-| Label | Language Code | Language Name |
-|-------|---------------|---------------|
-| 0     | az            | Azerbaijani   |
-| LABEL_1     | ar            | Arabic        |
-| LABEL_2     | bg            | Bulgarian     |
-| LABEL_3     | de            | German        |
-| LABEL_4     | el            | Greek         |
-| LABEL_5     | en            | English       |
-| LABEL_6     | es            | Spanish       |
-| LABEL_7     | fr            | French        |
-| LABEL_8     | hi            | Hindi         |
-| LABEL_9     | it            | Italian       |
-| LABEL_10    | ja            | Japanese      |
-| LABEL_11    | nl            | Dutch         |
-| LABEL_12    | pl            | Polish        |
-| LABEL_13    | pt            | Portuguese    |
-| LABEL_14    | ru            | Russian       |
-| LABEL_15    | sw            | Swahili       |
-| LABEL_16    | th            | Thai          |
-| LABEL_17    | tr            | Turkish       |
-| LABEL_18    | ur            | Urdu          |
-| LABEL_19    | vi            | Vietnamese    |
-| LABEL_20    | zh            | Chinese       |
 This mapping is utilized to decode the model's predictions into understandable language names, facilitating the interpretation of results for further processing or analysis.

 - az
 pipeline_tag: text-classification
 ---
+# Sentiment Analysis Model for Azerbaijani Text
+This repository hosts a fine-tuned XLM-RoBERTa model for sentiment analysis on Azerbaijani text. The model classifies text into three categories: negative, neutral, and positive. This README explains how to set up and use the model for your own sentiment analysis tasks.
 ## Model Description
+The model is based on `xlm-roberta-base`, which has been fine-tuned on a diverse dataset of Azerbaijani text samples. It is designed to understand the sentiment expressed in texts and classify them accordingly.
 ## How to Use
 You can use this model directly with a pipeline for text classification, or you can use it with the `transformers` library for more custom usage, as shown in the example below.
 ```
 ```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
 import torch
+# Load the model and tokenizer from Hugging Face Hub
+model_name = "LocalDoc/sentiment_analysis_azerbaijani"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+def predict_sentiment(text):
+    # Encode the text using the tokenizer
+    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
+    # Get predictions from the model
+    with torch.no_grad():
+        outputs = model(**inputs)
+    # Convert logits to probabilities using softmax
+    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
+    # Get the highest probability and corresponding label
+    top_prob, top_label = torch.max(probs, dim=-1)
+    labels = ["negative", "neutral", "positive"]
+    # Return the label with the highest probability
+    return labels[top_label.item()], top_prob
+# Example text
+text = "Bu mənim xoşuma gəlir"
+# Get the sentiment
+predicted_label, probability = predict_sentiment(text)
+print(f"Predicted sentiment: {predicted_label} with a probability of {probability.item():.4f}")
 ```
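The pipeline route mentioned above can be sketched as follows. This is a minimal sketch, not the authors' own snippet: `classify_with_pipeline` and `format_result` are hypothetical helper names, and whether the returned `label` is a sentiment name or a generic `LABEL_<n>` string depends on the model's configuration, which this README does not state.

```python
def classify_with_pipeline(texts, model_name="LocalDoc/sentiment_analysis_azerbaijani"):
    """Classify a list of texts with the high-level text-classification pipeline."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_name)
    # Each result is a dict with "label" and "score" keys.
    return classifier(texts, truncation=True)


def format_result(text, result):
    """Render one pipeline result as a readable line."""
    return f"{text!r}: {result['label']} ({result['score']:.4f})"


# Example usage (downloads the model on first call):
# samples = ["Bu mənim xoşuma gəlir"]
# for text, result in zip(samples, classify_with_pipeline(samples)):
#     print(format_result(text, result))
```

The pipeline handles tokenization, batching, and softmax internally, so it suits quick checks; the manual version above is the better starting point when you need the raw logits.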
 ## Sentiment Label Information
 The model outputs a label for each prediction, corresponding to one of the sentiment classes listed below. Each label index maps to a sentiment class as detailed in the following table:
+| Label | Result |
+|-------|--------|
+| 0     | negative |
+| 1     | neutral |
+| 2     | positive |
 This mapping is used to decode the model's predictions into readable sentiment names, making the results easier to interpret in further processing or analysis.
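The table above can be expressed as a small lookup, sketched here in plain Python. `ID2LABEL` and `decode_label` are hypothetical names used for illustration; the handling of `LABEL_<n>` strings assumes the generic naming that `transformers` falls back to when a model configuration defines no custom label names.

```python
# Index-to-sentiment mapping, copied from the table above.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def decode_label(raw):
    """Map a class index or a generic 'LABEL_<n>' string to a sentiment name."""
    if isinstance(raw, str):
        if raw in ID2LABEL.values():
            # Already a sentiment name; return it unchanged.
            return raw
        # Generic names such as "LABEL_2" carry the index after the underscore.
        raw = int(raw.rsplit("_", 1)[-1])
    return ID2LABEL[int(raw)]
```

For example, `decode_label(2)` and `decode_label("LABEL_2")` both return `"positive"`, so the same helper works on argmax indices and on pipeline output.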