---
license: apache-2.0
language:
- es
metrics:
- accuracy
pipeline_tag: text-classification
widget:
- text: "La tierra es Plana"
  output:
  - label: "False"
    score: 0.882
  - label: "True"
    score: 0.118
---
# Spanish Fake News Classifier
## Overview
This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA).
The model detects fake news in Spanish and was fine-tuned from *dccuchile/bert-base-spanish-wwm-uncased* using the hyperparameters listed below.
It was trained on a dataset of 125,000 Spanish news articles collected from various regions, each labeled as either true or false.
## Team Members
- **[Azul Fuentes](https://github.com/azu26)**
- **[Dante Reinaudo](https://github.com/DanteReinaudo)**
- **[Lucía Pardo](https://github.com/luciaPardo)**
- **[Roberto Iskandarani](https://github.com/Robert-Iskandarani)**
## Model Details
* **Base Model**: dccuchile/bert-base-spanish-wwm-uncased
* **Hyperparameters**:
  * **dropout_rate = 0.1**
  * **num_classes = 2**
  * **max_length = 128**
  * **batch_size = 16**
  * **num_epochs = 5**
  * **learning_rate = 3e-5**
* **Dataset**: 125,000 Spanish news articles, labeled True or False
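The hyperparameters above can be collected into a single configuration object. The sketch below is illustrative only: `FineTuneConfig`, `steps_per_epoch` and `total_steps` are hypothetical names, the values are copied from this card, and the step arithmetic assumes one optimizer step per batch (no gradient accumulation). The card does not state the train/test split, so the training-set size is left as a parameter.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters listed in this card."""
    base_model: str = "dccuchile/bert-base-spanish-wwm-uncased"
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5

    def steps_per_epoch(self, n_train: int) -> int:
        # Assumption: one optimizer step per batch; a final partial batch still counts.
        return math.ceil(n_train / self.batch_size)

    def total_steps(self, n_train: int) -> int:
        return self.steps_per_epoch(n_train) * self.num_epochs


cfg = FineTuneConfig()
# 100_000 is a hypothetical training-set size, not the split used by the authors.
print(cfg.steps_per_epoch(100_000), cfg.total_steps(100_000))
```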
## Metrics
The model's performance was evaluated using the following metrics:
* **Accuracy = _83.17%_**
* **F1-Score = _81.94%_**
* **Precision = _85.62%_**
* **Recall = _81.10%_**
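For reference, these four metrics can be computed from a binary confusion matrix as sketched below. The card does not say whether its scores are binary (one class treated as positive) or averaged over both classes, so the sketch shows both options. The harmonic mean of the reported precision and recall is about 83.3%, not 81.94%, which suggests, though does not prove, some form of per-class averaging. The functions and the confusion counts in the example are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 with one class treated as positive."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def macro_metrics(tp, fp, fn, tn):
    """Unweighted mean of the per-class metrics for a two-class problem."""
    pos = binary_metrics(tp, fp, fn, tn)
    # Swap roles so that the other class is treated as positive.
    neg = binary_metrics(tn, fn, fp, tp)
    return {k: (pos[k] + neg[k]) / 2 for k in pos}


# Hypothetical confusion counts, for illustration only.
print(binary_metrics(tp=40, fp=10, fn=20, tn=30))
print(macro_metrics(tp=40, fp=10, fn=20, tn=30))
```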
## Usage
### Installation
You can install the required dependencies using pip:
```bash
pip install transformers torch
```
### Loading the Model
```python
from transformers import BertForSequenceClassification, BertTokenizer
model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
```
### Predict Function
```python
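import torch

def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize a single text. Note: training used max_length=128 (see Model Details);
    # 512 is the maximum input length of BERT-base.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # Class probabilities: index 0 = False, index 1 = True
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()
    predicted_class = torch.argmax(logits, dim=1).item()
    # Only predict True when its probability exceeds the threshold
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0
    return bool(predicted_class), probabilities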
```
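The decision rule inside `predict` can be isolated and checked without loading the model. The sketch below mirrors that rule on the probability list `predict` returns; `apply_threshold` is a hypothetical helper, not part of this model's code. With two classes the winning probability is always at least 0.5, so the threshold only changes the outcome when it is set above 0.5, where it makes the classifier more conservative about labeling a text as True.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Hypothetical pure-Python mirror of the decision rule in predict()."""
    # Index of the largest probability; ties resolve to the first index, like torch.argmax.
    predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
    # Demote a True prediction that does not clear the threshold.
    if predicted_class == 1 and probabilities[1] <= threshold:
        predicted_class = 0
    return bool(predicted_class)


print(apply_threshold([0.3, 0.7]))                 # default threshold
print(apply_threshold([0.3, 0.7], threshold=0.8))  # stricter threshold
```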
### Making Predictions
```python
text = "Your Spanish news text here"
predicted_label, probabilities = predict(model, tokenizer, text)
print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")
```
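To present the returned probabilities in the same label/score form as the widget example in the metadata, they can be paired with label names. This is a sketch under an assumption: the label order `("False", "True")` is inferred from `predict` returning `bool(predicted_class)`, so index 0 is taken to mean False and index 1 True. `to_widget_output` is a hypothetical helper.

```python
# Assumption: index 0 -> "False", index 1 -> "True", inferred from predict().
LABELS = ("False", "True")


def to_widget_output(probabilities, labels=LABELS):
    """Pair each probability with its label, highest score first, rounded to 3 decimals."""
    pairs = [{"label": label, "score": round(p, 3)} for label, p in zip(labels, probabilities)]
    return sorted(pairs, key=lambda d: d["score"], reverse=True)


print(to_widget_output([0.88188, 0.11812]))
```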
## License
Apache License 2.0
## Acknowledgments
Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.