---
license: apache-2.0
language:
- es
metrics:
- accuracy
pipeline_tag: text-classification
widget:
- text: La tierra es Plana
  output:
  - label: 'FALSE'
    score: 0.8
  - label: 'TRUE'
    score: 0.2
---
# Spanish Fake News Classifier

## Overview

This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA). The model detects fake news in Spanish and was fine-tuned from dccuchile/bert-base-spanish-wwm-uncased with the hyperparameters listed below. It was trained on a dataset of 125,000 Spanish news articles, both true and false, collected from various regions.

## Team Members

## Model Details

Base Model: dccuchile/bert-base-spanish-wwm-uncased
Hyperparameters:
- dropout_rate = 0.1
- num_classes = 2
- max_length = 128
- batch_size = 16
- num_epochs = 5
- learning_rate = 3e-5
Dataset: 125,000 Spanish news articles (True and False)
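
For reference, below is a minimal sketch of how a fine-tuning run with the hyperparameters above could be set up with the Hugging Face `Trainer` API. This is not the original thesis training script: the dataset loading is a placeholder (the 125,000-article corpus is not distributed with this card), and only the mapping from the listed hyperparameters to `Trainer` options is illustrated.

```python
import torch
from transformers import (
    BertForSequenceClassification,
    BertTokenizer,
    Trainer,
    TrainingArguments,
)

# Placeholder data: the real corpus of labeled articles is not included here.
texts = ["Ejemplo de noticia...", "Otra noticia..."]
labels = [0, 1]  # 0 = False, 1 = True

tokenizer = BertTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased")
model = BertForSequenceClassification.from_pretrained(
    "dccuchile/bert-base-spanish-wwm-uncased",
    num_labels=2,             # num_classes = 2
    hidden_dropout_prob=0.1,  # dropout_rate = 0.1
)

encodings = tokenizer(texts, truncation=True, padding=True, max_length=128)

class NewsDataset(torch.utils.data.Dataset):
    """Wraps tokenized articles and labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,  # batch_size = 16
    num_train_epochs=5,              # num_epochs = 5
    learning_rate=3e-5,              # learning_rate = 3e-5
)

trainer = Trainer(model=model, args=training_args, train_dataset=NewsDataset(encodings, labels))
trainer.train()
```
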

## Metrics

The model's performance was evaluated using the following metrics:
- Accuracy = 83.17%
- F1-Score = 81.94%
- Precision = 85.62%
- Recall = 81.10%
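
The exact evaluation script is not included in this card, but metrics of this kind can be reproduced on a held-out test set with scikit-learn. The snippet below is only a sketch: `y_true` and `y_pred` are placeholders for the gold and predicted labels.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder labels; replace with the held-out gold labels and model predictions
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")

print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1-Score:  {f1:.2%}")
```
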

## Usage

### Installation

You can install the required dependencies using pip:

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
```

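Since the card declares `pipeline_tag: text-classification`, the model should also load through the high-level `pipeline` API. Note that the label names returned depend on the model's config and may be generic (`LABEL_0` / `LABEL_1`) rather than `FALSE` / `TRUE`.

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="VerificadoProfesional/SaBERT-Spanish-Fake-News",
)
print(classifier("La tierra es Plana"))
```
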
### Predict Function

```python
import torch

def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize the input text (truncated to at most 512 tokens)
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()
    predicted_class = torch.argmax(logits, dim=1).item()

    # Only accept the "True" class (1) when its probability exceeds the threshold;
    # otherwise fall back to "False" (0)
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0

    return bool(predicted_class), probabilities
```

### Making Predictions

```python
text = "Your Spanish news text here"

predicted_label, probabilities = predict(model, tokenizer, text)

print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")
```

## License

Apache License 2.0

## Acknowledgments

Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.