VerificadoProfesional committed on
Commit
b175f15
1 Parent(s): 90ecfdc

Update README.md

Files changed (1)
  1. README.md +88 -0
README.md CHANGED
@@ -1,3 +1,91 @@
  ---
  license: apache-2.0
+ language:
+ - es
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
  ---
+
+ # Spanish Sentiment Analysis Classifier
+
+ ## Overview
+ This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA).
+ The model classifies sentiment in Spanish text and was fine-tuned from the *dccuchile/bert-base-spanish-wwm-uncased* model using the hyperparameters listed under Model Details.
+ It was trained on a dataset of 11,500 Spanish tweets collected from various regions, each labeled as positive or negative.
+
+ ## Team Members
+ - **[Azul Fuentes](https://github.com/azu26)**
+ - **[Dante Reinaudo](https://github.com/DanteReinaudo)**
+ - **[Lucía Pardo](https://github.com/luciaPardo)**
+ - **[Roberto Iskandarani](https://github.com/Robert-Iskandarani)**
+
+
+ ## Model Details
+ * **Base Model**: dccuchile/bert-base-spanish-wwm-uncased
+ * **Hyperparameters**:
+   * **dropout_rate = 0.1**
+   * **num_classes = 2**
+   * **max_length = 128**
+   * **batch_size = 16**
+   * **num_epochs = 10**
+   * **learning_rate = 3e-5**
+
+ * **Dataset**: 11,500 Spanish tweets (labeled positive or negative)
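
For reference, the hyperparameters above can be gathered into a single configuration object. The sketch below is only illustrative: `FineTuneConfig` and `steps_per_epoch` are hypothetical names that do not come from the project, and no gradient accumulation is assumed. Because the train/evaluation split is not stated, passing all 11,500 tweets gives an upper bound on the number of batches per epoch rather than the actual figure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters listed above."""
    base_model: str = "dccuchile/bert-base-spanish-wwm-uncased"
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 10
    learning_rate: float = 3e-5

    def steps_per_epoch(self, num_examples: int) -> int:
        # The last batch may be smaller than batch_size, so round up.
        return math.ceil(num_examples / self.batch_size)


config = FineTuneConfig()
# Upper bound: assumes every one of the 11,500 tweets is used for training.
max_steps = config.steps_per_epoch(11_500)
print(max_steps, max_steps * config.num_epochs)
```
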
+
+ ## Metrics
+ The model's performance was evaluated using the following metrics:
+
+ * **Accuracy = _85.50%_**
+ * **F1-Score = _85.49%_**
+ * **Precision = _85.50%_**
+ * **Recall = _85.49%_**
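
The README does not say how precision, recall, and F1 were averaged across the two classes. As a rough illustration of what these four numbers measure, the sketch below computes accuracy together with macro-averaged precision, recall, and F1 in plain Python. The macro averaging, the function name `binary_metrics`, and the toy labels are all assumptions made for this example and are not taken from the project.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 for labels 0/1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Macro average: unweighted mean of the per-class scores.
    macro = [sum(scores) / len(per_class) for scores in zip(*per_class)]
    return {"accuracy": accuracy, "precision": macro[0],
            "recall": macro[1], "f1": macro[2]}


# Toy labels for illustration only; these are not the project's data.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

On a balanced evaluation set with similar error rates in both classes, the four values land close to one another, which is consistent with the near-identical figures reported above.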
+
+
+
+ ## Usage
+ ### Installation
+ You can install the required dependencies using pip:
+
+ ```bash
+ pip install transformers torch
+ ```
+
+ ### Loading the Model
+ ```python
+ from transformers import BertForSequenceClassification, BertTokenizer
+
+ model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
+ tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
+ ```
+
+ ### Predict Function
+ ```python
+ import torch
+
+
+ def predict(model, tokenizer, text, threshold=0.5):
+     # Tokenize with the same max_length used for fine-tuning
+     inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
+     with torch.no_grad():  # inference only, no gradients needed
+         outputs = model(**inputs)
+
+     logits = outputs.logits
+     # Class probabilities for the single input text
+     probabilities = torch.softmax(logits, dim=1).squeeze().tolist()
+
+     predicted_class = torch.argmax(logits, dim=1).item()
+     # Keep class 1 only if its probability is above the threshold
+     if probabilities[predicted_class] <= threshold and predicted_class == 1:
+         predicted_class = 0
+
+     return bool(predicted_class), probabilities
+ ```
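
To make the role of `threshold` in the predict function easier to see, here is a dependency-free sketch of the same decision rule applied to raw logits. The helper names `softmax` and `decide` and the example logits are made up for illustration; the real function works on the tensors returned by the model.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.5):
    """Mirror the decision rule of predict() for one two-class example."""
    probabilities = softmax(logits)
    predicted_class = max(range(len(logits)), key=lambda i: logits[i])
    # Class 1 is kept only when its probability is strictly above the threshold.
    if predicted_class == 1 and probabilities[1] <= threshold:
        predicted_class = 0
    return bool(predicted_class), probabilities


# Made-up logits: softmax gives class 1 a probability of about 0.69.
logits = [0.0, 0.8]
print(decide(logits))                 # default threshold keeps class 1
print(decide(logits, threshold=0.9))  # stricter threshold falls back to class 0
```

With two classes, the winning class always has a probability of at least 0.5, so the default threshold of 0.5 almost never changes the argmax result. Raising the threshold above 0.5 makes a class 1 prediction stricter, while lowering it has no effect.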
+
+ ### Making Predictions
+
+ ```python
+ text = "Your Spanish text here"
+ predicted_label, probabilities = predict(model, tokenizer, text)
+ print(f"Text: {text}")
+ print(f"Predicted Class: {predicted_label}")
+ print(f"Probabilities: {probabilities}")
+ ```
+
+ ## License
+ Apache License 2.0
+
+ ## Acknowledgments
+ Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.