|
--- |
|
license: apache-2.0 |
|
language: |
|
- es |
|
metrics: |
|
- accuracy |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
|
|
# Spanish Fake News Classifier |
|
|
|
## Overview |
|
This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA). |
|
The model is designed to detect fake news in Spanish. It was fine-tuned from *dccuchile/bert-base-spanish-wwm-uncased* using the hyperparameters listed under Model Details below.
|
It was trained on a dataset of 125,000 Spanish news articles, both true and false, collected from various regions.
|
|
|
## Model Details |
|
* **Base Model**: dccuchile/bert-base-spanish-wwm-uncased
|
* **Hyperparameters**: |
|
* **dropout_rate = 0.1** |
|
* **num_classes = 2** |
|
* **max_length = 128** |
|
* **batch_size = 16** |
|
* **num_epochs = 5** |
|
* **learning_rate = 3e-5** |
|
|
|
* **Dataset**: 125,000 Spanish news articles (True and False) |
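
A minimal sketch of how a classifier with these hyperparameters could be instantiated for fine-tuning is shown below. The original training script is not part of this card, so the optimizer choice (AdamW) and the mapping of `dropout_rate` to BERT's `hidden_dropout_prob` are assumptions; data loading and the training loop are omitted.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

MODEL_NAME = "dccuchile/bert-base-spanish-wwm-uncased"

# Hyperparameters listed above (batch_size and num_epochs would be used
# in the omitted training loop).
NUM_CLASSES = 2
MAX_LENGTH = 128
BATCH_SIZE = 16
NUM_EPOCHS = 5
LEARNING_RATE = 3e-5
DROPOUT_RATE = 0.1  # assumed to map to BERT's hidden_dropout_prob

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_CLASSES,
    hidden_dropout_prob=DROPOUT_RATE,
)

# AdamW is the usual optimizer for BERT fine-tuning; the original choice is not documented here.
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
```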
|
|
|
## Metrics |
|
The model's performance was evaluated using the following metrics: |
|
|
|
* **Accuracy = _83.17%_** |
|
* **F1-Score = _81.94%_** |
|
* **Precision = _85.62%_** |
|
* **Recall = _81.10%_** |
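
The evaluation pipeline itself is not published in this card. For reference, metrics of this kind are typically computed with scikit-learn on a held-out test split; the snippet below is an illustrative sketch with placeholder labels, not the original evaluation code.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder gold and predicted labels for a held-out test split
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"F1-Score:  {f1_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred):.4f}")
print(f"Recall:    {recall_score(y_true, y_pred):.4f}")
```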
|
|
|
|
|
|
|
|
|
## Usage |
|
### Installation |
|
You can install the required dependencies using pip: |
|
|
|
```bash |
|
pip install transformers torch |
|
``` |
|
|
|
### Loading the Model |
|
```python |
|
from transformers import BertForSequenceClassification, BertTokenizer
import torch

model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
|
``` |
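
Inference does not need gradients, so it is good practice to switch the model to evaluation mode; moving it to a GPU, if one is available, is optional. This is standard PyTorch/Transformers usage rather than anything specific to this model, and the `predict` helper below moves its inputs to the model's device, so it works on CPU or GPU unchanged.

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()
```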
|
|
|
### Predict Function |
|
```python |
|
def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize and move the inputs to the same device as the model
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    inputs = inputs.to(model.device)

    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()

    # Fall back to class 0 when class 1 is predicted without enough confidence
    predicted_class = torch.argmax(logits, dim=1).item()
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0

    return bool(predicted_class), probabilities
|
``` |
|
### Making Predictions |
|
|
|
```python |
|
text = "Your Spanish news text here" |
|
predicted_label, probabilities = predict(model, tokenizer, text)
|
print(f"Text: {text}") |
|
print(f"Predicted Class: {predicted_label}") |
|
print(f"Probabilities: {probabilities}") |
|
``` |
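
To classify several articles, the same helper can simply be applied in a loop; the texts below are placeholders.

```python
texts = [
    "First Spanish news article here",
    "Second Spanish news article here",
]

for t in texts:
    label, probs = predict(model, tokenizer, t)
    print(f"{label}: {probs}")
```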
|
|
|
## License |
|
Apache License 2.0 |
|
|
|
## Acknowledgments |
|
Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training. |
|
|
|
|
|
|
|
|
|
|