VerificadoProfesional committed on
Commit
f2ee3ba
1 Parent(s): 596fa66

Update README.md

  1. README.md +79 -0
README.md CHANGED
@@ -10,3 +10,82 @@ pipeline_tag: text-classification
10
 
11
  # Spanish Fake News Classifier
12
 
13
+ ## Overview
14
+ This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA).
15
+ The model detects fake news in Spanish and was fine-tuned from the *dccuchile/bert-base-spanish-wwm-uncased* model using the hyperparameters listed under Model Details.
16
+ It was trained on a dataset of 125,000 Spanish news articles, both true and false, collected from various regions.
17
+
18
+ ## Model Details
19
+ * **Base Model**: dccuchile/bert-base-spanish-wwm-uncased
20
+ * **Hyperparameters**:
+   * **dropout_rate = 0.1**
+   * **num_classes = 2**
+   * **max_length = 128**
+   * **batch_size = 16**
+   * **num_epochs = 5**
+   * **learning_rate = 3e-5**
27
+
28
+ * **Dataset**: 125,000 Spanish news articles (True and False)
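
As a rough illustration of what these settings imply, the sketch below collects the listed hyperparameters in a small config object and estimates the number of optimizer steps. The class name `FineTuneConfig` is hypothetical, and the step count assumes that all 125,000 articles were used for training, which the README does not state (the train/validation split is not given).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container; values copied from the Model Details list."""
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5


config = FineTuneConfig()

# Assumption: every one of the 125,000 articles is a training example.
num_examples = 125_000
steps_per_epoch = math.ceil(num_examples / config.batch_size)
total_steps = steps_per_epoch * config.num_epochs

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

If part of the dataset was held out for evaluation, the real step counts would be proportionally smaller.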
29
+
30
+ ## Metrics
31
+ The model's performance was evaluated using the following metrics:
32
+
33
+ * **Accuracy = _83.17%_**
34
+ * **F1-Score = _81.94%_**
35
+ * **Precision = _85.62%_**
36
+ * **Recall = _81.10%_**
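
For readers who want to relate these numbers to each other, the sketch below computes the F1 score as the harmonic mean of the reported precision and recall. This is only a sanity check under the assumption that all three figures describe the same single class; the README does not say how the metrics were averaged.

```python
# Reported values from the Metrics list, expressed as fractions.
precision = 0.8562
recall = 0.8110

# F1 is the harmonic mean of precision and recall.
f1_from_pr = 2 * precision * recall / (precision + recall)

print(f"F1 from precision and recall: {f1_from_pr:.4f}")
```

The result differs from the reported F1 of 81.94%, which would be consistent with the reported metrics being averaged across both classes rather than computed for one class, though the source does not confirm this.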
37
+
38
+
39
+
40
+
41
+ ## Usage
42
+ ### Installation
43
+ You can install the required dependencies using pip:
44
+
45
+ ```bash
+ pip install transformers torch
+ ```
48
+
49
+ ### Loading the Model
50
+ ```python
+ from transformers import BertForSequenceClassification, BertTokenizer
+
+ # Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
+ model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
+ tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
+ ```
56
+
57
+ ### Predict Function
58
+ ```python
+ import torch
+
+ def predict(model, tokenizer, text, threshold=0.5):
+     # Tokenize a single text; note the model was fine-tuned with max_length = 128 (see Model Details)
+     inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
+     with torch.no_grad():
+         outputs = model(**inputs)
+
+     logits = outputs.logits
+     probabilities = torch.softmax(logits, dim=1).squeeze().tolist()
+
+     # Keep class 1 only when its probability is above the threshold
+     predicted_class = torch.argmax(logits, dim=1).item()
+     if probabilities[predicted_class] <= threshold and predicted_class == 1:
+         predicted_class = 0
+
+     return bool(predicted_class), probabilities
+ ```
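
To show how the `threshold` argument changes the result, here is a minimal sketch of the same decision rule in plain Python, so it runs without downloading the model. The helper names `softmax` and `decide` and the example logits are hypothetical and are not actual model output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.5):
    """Mirror the rule in predict(): class 1 is kept only above the threshold."""
    probabilities = softmax(logits)
    predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
    if predicted_class == 1 and probabilities[1] <= threshold:
        predicted_class = 0
    return bool(predicted_class), probabilities


# Hypothetical logits that favour class 1 with probability of about 0.69.
label_default, probs = decide([0.2, 1.0])
label_strict, _ = decide([0.2, 1.0], threshold=0.9)

print(label_default, label_strict)
```

With the default threshold the example is assigned to class 1, while a stricter threshold of 0.9 falls back to class 0. Raising the threshold therefore trades fewer class 1 predictions for higher confidence in each of them.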
73
+ ### Making Predictions
74
+
75
+ ```python
+ text = "Your Spanish news text here"
+ predicted_label, probabilities = predict(model, tokenizer, text)
+ print(f"Text: {text}")
+ print(f"Predicted Class: {predicted_label}")
+ print(f"Probabilities: {probabilities}")
+ ```
82
+
83
+ ## License
84
+ Apache License 2.0
85
+
86
+ ## Acknowledgments
87
+ Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.
88
+
89
+
90
+
91
+