Spanish Fake News Classifier

Overview

This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA). It detects fake news in Spanish and was fine-tuned from the dccuchile/bert-base-spanish-wwm-uncased model with the hyperparameters listed under Model Details. The training dataset contains 125,000 Spanish news articles, both true and false, collected from various regions.

Team Members

Model Details

  • Base Model: dccuchile/bert-base-spanish-wwm-uncased

  • Hyperparameters:

    • dropout_rate = 0.1
    • num_classes = 2
    • max_length = 128
    • batch_size = 16
    • num_epochs = 5
    • learning_rate = 3e-5
  • Dataset: 125,000 Spanish news articles (True and False)
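
The hyperparameters above can be gathered into a single configuration object, which makes it easy to derive quantities such as the number of optimizer steps in a run. The following is a minimal sketch, not the project's training code: the names FineTuneConfig and total_optimizer_steps are hypothetical, it assumes one optimizer step per batch with no gradient accumulation, and the 100,000-example training set is an invented figure because the card does not document the train/test split.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container; the values are copied from the list above."""
    base_model: str = "dccuchile/bert-base-spanish-wwm-uncased"
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5


def total_optimizer_steps(config: FineTuneConfig, num_train_examples: int) -> int:
    """Count optimizer steps, assuming one step per batch.

    The last, possibly smaller, batch of each epoch counts as a full step.
    """
    steps_per_epoch = math.ceil(num_train_examples / config.batch_size)
    return steps_per_epoch * config.num_epochs


config = FineTuneConfig()
# Assumption: 100,000 training examples (the real split is not documented).
print(total_optimizer_steps(config, 100_000))
```

A frozen dataclass keeps the values immutable, so the same object can be logged alongside results without risk of accidental changes during a run.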

Metrics

The model's performance was evaluated using the following metrics:

  • Accuracy = 83.17%
  • F1-Score = 81.94%
  • Precision = 85.62%
  • Recall = 81.10%
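
For readers who want to compute the same four metrics on their own evaluation set, the sketch below uses the standard binary definitions based on confusion-matrix counts. It is an illustration under assumptions: the card does not state which class was treated as positive or whether any averaging was applied, the function name classification_metrics is hypothetical, and the counts in the example are invented rather than taken from the model's evaluation.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Undefined ratios (zero denominator) are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented counts for illustration only.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```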

Usage

Installation

You can install the required dependencies using pip:

pip install transformers torch

Loading the Model

from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")

Predict Function

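import torch

def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize the input; text longer than max_length tokens is truncated.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()

    predicted_class = torch.argmax(logits, dim=1).item()
    # Keep class 1 only if its probability exceeds the threshold; otherwise fall back to class 0.
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0

    # Returns True when class 1 is predicted, plus the per-class probabilities.
    return bool(predicted_class), probabilities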
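
The effect of the threshold argument is easier to see without the model in the loop. The sketch below reproduces only the decision rule of predict in plain Python; the helper names softmax and apply_threshold are hypothetical and the logits are made-up values. With two classes the winning probability is always at least 0.5, so the default threshold changes the outcome only on an exact tie, while a higher threshold demands more confidence before class 1 is reported.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_threshold(logits, threshold=0.5):
    """Mirror the decision rule of predict for a single example.

    Class 1 is kept only if its probability is strictly above the threshold;
    otherwise the prediction falls back to class 0.
    """
    probabilities = softmax(logits)
    predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
    if predicted_class == 1 and probabilities[1] <= threshold:
        predicted_class = 0
    return bool(predicted_class), probabilities


logits = [0.2, 1.0]  # made-up logits that favour class 1
label_default, probs = apply_threshold(logits)
label_strict, _ = apply_threshold(logits, threshold=0.8)
print(label_default, label_strict, probs)
```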

Making Predictions

text = "Your Spanish news text here"
predicted_label, probabilities = predict(model, tokenizer, text)
print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")

License

Apache License 2.0

Acknowledgments

Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.

Model size: 110M params (Safetensors, tensor type F32)