---
license: apache-2.0
language:
- es
metrics:
- accuracy
pipeline_tag: text-classification
widget:
  - text: "La tierra es Plana"
    output:
      - label: "False"
        score: 0.882
      - label: "True"
        score: 0.118
---


# Spanish Fake News Classifier

## Overview
This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at the Universidad de Buenos Aires (UBA).
It detects fake news in Spanish and was fine-tuned from the *dccuchile/bert-base-spanish-wwm-uncased* model using the hyperparameters listed under Model Details.
The training dataset contains 125,000 Spanish news articles, both true and false, collected from various regions.

## Team Members
- **[Azul Fuentes](https://github.com/azu26)**
- **[Dante Reinaudo](https://github.com/DanteReinaudo)** 
- **[Lucía Pardo](https://github.com/luciaPardo)**
- **[Roberto Iskandarani](https://github.com/Robert-Iskandarani)**


## Model Details
* **Base Model**: dccuchile/bert-base-spanish-wwm-uncased
* **Hyperparameters**: 
  * **dropout_rate = 0.1**
  * **num_classes = 2**
  * **max_length = 128**
  * **batch_size = 16**
  * **num_epochs = 5**
  * **learning_rate = 3e-5**
    
* **Dataset**: 125,000 Spanish news articles (True and False)
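
As a rough sketch, the hyperparameters above could be gathered into a single configuration object. The class name `FineTuneConfig` and the helper `steps_per_epoch` are hypothetical and not part of the released code, and the count of 1,000 training examples is illustrative only, since the train/test split is not stated here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container mirroring the hyperparameters listed above."""
    base_model: str = "dccuchile/bert-base-spanish-wwm-uncased"
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5

    def steps_per_epoch(self, num_train_examples: int) -> int:
        # One step per batch, assuming no gradient accumulation;
        # the last batch may be smaller than batch_size.
        return math.ceil(num_train_examples / self.batch_size)


config = FineTuneConfig()
# Illustrative only: 1,000 examples is not the real training-set size.
print(config.steps_per_epoch(1_000))  # 63
```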

## Metrics
The model's performance was evaluated using the following metrics:

  * **Accuracy = _83.17%_**
  * **F1-Score = _81.94%_**
  * **Precision = _85.62%_**
  * **Recall = _81.10%_**
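
For reference, these four metrics are conventionally derived from the counts of a binary confusion matrix. The sketch below uses the standard definitions with made-up counts. The function name `binary_metrics` is hypothetical, the numbers are not taken from this model's evaluation, and the averaging scheme behind the reported scores is not stated here.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration only
metrics = binary_metrics(tp=80, fp=20, fn=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```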

    


## Usage
### Installation
You can install the required dependencies using pip:

```bash
pip install transformers torch
```

### Loading the Model
```python
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
```

### Predict Function
```python
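import torch

def predict(model, tokenizer, text, threshold=0.5):
    """Return (predicted_label, probabilities) for a single text."""
    # Tokenize the input, truncating anything beyond 512 tokens
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():  # inference only, so gradients are not needed
        outputs = model(**inputs)

    logits = outputs.logits
    # Convert logits to per-class probabilities: [P(class 0), P(class 1)]
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()

    predicted_class = torch.argmax(logits, dim=1).item()
    # Fall back to class 0 when class 1 wins but does not exceed the threshold
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0

    return bool(predicted_class), probabilities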
```
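
The effect of `threshold` in `predict` can be seen without loading the model. The following sketch isolates the same decision rule on plain probability lists. The helper name `apply_threshold` and the example probabilities are illustrative assumptions.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Mirror the decision rule in predict() for [P(class 0), P(class 1)]."""
    # Pick the more probable class (ties go to class 0 in this sketch)
    predicted_class = 1 if probabilities[1] > probabilities[0] else 0
    # Class 1 is only kept when its probability exceeds the threshold
    if predicted_class == 1 and probabilities[1] <= threshold:
        predicted_class = 0
    return bool(predicted_class)


print(apply_threshold([0.3, 0.7]))                 # True
print(apply_threshold([0.3, 0.7], threshold=0.8))  # False
print(apply_threshold([0.9, 0.1], threshold=0.8))  # False
```

With two classes, the default threshold of 0.5 typically has no effect, because a winning class already has a probability above 0.5. Raising the threshold makes the function more conservative about returning `True`.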
### Making Predictions

```python
text = "Your Spanish news text here"
predicted_label, probabilities = predict(model, tokenizer, text)
print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")
```

## License
Apache License 2.0

## Acknowledgments
Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.