vrashad committed on
Commit
347458d
1 Parent(s): ce91500

Update README.md

Files changed (1)
  1. README.md +40 -46
README.md CHANGED
@@ -4,10 +4,11 @@ language:
  - az
  pipeline_tag: text-classification
  ---
- # Sentiment Analysis in Azerbaijani Language

  ## Model Description
- This repository contains a multilingual language detection model based on the XLM-RoBERTa base architecture. The model is capable of distinguishing between 21 different languages including Arabic, Azerbaijani, Bulgarian, German, Greek, English, Spanish, French, Hindi, Italian, Japanese, Dutch, Polish, Portuguese, Russian, Swahili, Thai, Turkish, Urdu, Vietnamese, and Chinese.

  ## How to Use
  You can use this model directly with a pipeline for text classification, or you can use it with the `transformers` library for more custom usage, as shown in the example below.
@@ -19,58 +20,51 @@ pip install transformers
  ```

  ```python
- from transformers import AutoModelForSequenceClassification, XLMRobertaTokenizer
  import torch

- # Load tokenizer and model
- tokenizer = XLMRobertaTokenizer.from_pretrained("LocalDoc/language_detection")
- model = AutoModelForSequenceClassification.from_pretrained("LocalDoc/language_detection")
-
- # Prepare text
- text = "Əlqasım oğulları vorzakondu"
- encoded_input = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
-
- # Prediction
- model.eval()
- with torch.no_grad():
-     outputs = model(**encoded_input)
-
- # Process the outputs
- logits = outputs.logits
- probabilities = torch.nn.functional.softmax(logits, dim=-1)
- predicted_class_index = probabilities.argmax().item()
- labels = ["az", "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja", "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh"]
- predicted_label = labels[predicted_class_index]
- print(f"Predicted Language: {predicted_label}")
  ```

  ## Language Label Information

  The model outputs a label for each prediction, corresponding to one of the languages listed below. Each label is associated with a specific language code as detailed in the following table:

- | Label | Language Code | Language Name |
- |-------|---------------|---------------|
- | 0 | az | Azerbaijani |
- | LABEL_1 | ar | Arabic |
- | LABEL_2 | bg | Bulgarian |
- | LABEL_3 | de | German |
- | LABEL_4 | el | Greek |
- | LABEL_5 | en | English |
- | LABEL_6 | es | Spanish |
- | LABEL_7 | fr | French |
- | LABEL_8 | hi | Hindi |
- | LABEL_9 | it | Italian |
- | LABEL_10 | ja | Japanese |
- | LABEL_11 | nl | Dutch |
- | LABEL_12 | pl | Polish |
- | LABEL_13 | pt | Portuguese |
- | LABEL_14 | ru | Russian |
- | LABEL_15 | sw | Swahili |
- | LABEL_16 | th | Thai |
- | LABEL_17 | tr | Turkish |
- | LABEL_18 | ur | Urdu |
- | LABEL_19 | vi | Vietnamese |
- | LABEL_20 | zh | Chinese |

  This mapping is utilized to decode the model's predictions into understandable language names, facilitating the interpretation of results for further processing or analysis.
 
  - az
  pipeline_tag: text-classification
  ---
+ # Sentiment Analysis Model for Azerbaijani Text
+ This repository hosts a fine-tuned XLM-RoBERTa model for sentiment analysis on Azerbaijani text. The model classifies text into three categories: negative, neutral, and positive. This README explains how to set up and use the model for your own sentiment analysis tasks.

  ## Model Description
+ The model is based on `xlm-roberta-base`, fine-tuned on a diverse dataset of Azerbaijani text samples. It is designed to identify the sentiment expressed in a text and classify it accordingly.

  ## How to Use
  You can use this model directly with a pipeline for text classification, or you can use it with the `transformers` library for more custom usage, as shown in the example below.
 
  ```

  ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
  import torch

+ # Load the model and tokenizer from Hugging Face Hub
+ model_name = "LocalDoc/sentiment_analysis_azerbaijani"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ def predict_sentiment(text):
+     # Encode the text using the tokenizer
+     inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
+
+     # Get predictions from the model
+     with torch.no_grad():
+         outputs = model(**inputs)
+
+     # Convert logits to probabilities using softmax
+     probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
+
+     # Get the highest probability and corresponding label
+     top_prob, top_label = torch.max(probs, dim=-1)
+     labels = ["negative", "neutral", "positive"]
+
+     # Return the label with the highest probability
+     return labels[top_label], top_prob
+
+ # Example text
+ text = "Bu mənim xoşuma gəlir"
+
+ # Get the sentiment
+ predicted_label, probability = predict_sentiment(text)
+ print(f"Predicted sentiment: {predicted_label} with a probability of {probability.item():.4f}")
+
  ```
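
The "How to Use" text mentions a pipeline route, while the example above only shows the manual one. Below is a minimal sketch of the pipeline route. It assumes the checkpoint may report either readable class names or generic `LABEL_<n>` ids, which this commit does not confirm. `to_sentiment` and `classify` are hypothetical helper names, not part of the README.

```python
# Class order taken from the README's label table (0, 1, 2).
SENTIMENTS = ["negative", "neutral", "positive"]


def to_sentiment(raw_label: str) -> str:
    """Map a pipeline label to a sentiment name.

    Assumption: the model config may expose generic ids such as "LABEL_2";
    labels that are already readable are passed through unchanged.
    """
    if raw_label.startswith("LABEL_"):
        return SENTIMENTS[int(raw_label.split("_", 1)[1])]
    return raw_label


def classify(text: str, model_name: str = "LocalDoc/sentiment_analysis_azerbaijani"):
    """Run the text-classification pipeline and return (sentiment, score)."""
    # Imported lazily so to_sentiment works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_name)
    result = classifier(text)[0]  # a dict with "label" and "score" keys
    return to_sentiment(result["label"]), result["score"]
```

Calling `classify("Bu mənim xoşuma gəlir")` downloads the checkpoint on first use and returns a sentiment name with its score. Building the pipeline once and reusing it would be preferable when classifying many texts.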

  ## Language Label Information

  The model outputs a label for each prediction, corresponding to one of the languages listed below. Each label is associated with a specific language code as detailed in the following table:

+ | Label | Result |
+ |-------|--------|
+ | 0 | negative |
+ | 1 | neutral |
+ | 2 | positive |
+

  This mapping is utilized to decode the model's predictions into understandable language names, facilitating the interpretation of results for further processing or analysis.
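
The new label table maps class indices 0, 1 and 2 to negative, neutral and positive. As a rough illustration of how that mapping decodes a prediction, the sketch below applies a softmax to three logits and looks up the winning index. It uses only the standard library. The logits are made up for illustration, and `ID2LABEL`, `softmax` and `decode` are hypothetical names.

```python
import math

# Index-to-name mapping copied from the README's label table.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration, not real model output.
label, prob = decode([-1.2, 0.3, 2.5])
print(f"{label} ({prob:.4f})")
```

Keeping the mapping in a dictionary keyed by class index mirrors the table directly, so a change to the table only needs a change in one place.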