	---
	license: apache-2.0
	language:
	- es
	metrics:
	- accuracy
	pipeline_tag: text-classification
	widget:
	- text: "La tierra es Plana"
	  output:
	  - label: "False"
	    score: 0.882
	  - label: "True"
	    score: 0.118
	---


	# Spanish Fake News Classifier

	## Overview
	This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA).
	The model detects fake news in Spanish. It was fine-tuned from dccuchile/bert-base-spanish-wwm-uncased with the hyperparameters listed under Model Details.
	It was trained on a dataset of 125,000 Spanish news articles, both true and false, collected from various regions.

	## Team Members
	- [Azul Fuentes](https://github.com/azu26)
	- [Dante Reinaudo](https://github.com/DanteReinaudo)
	- [Lucía Pardo](https://github.com/luciaPardo)
	- [Roberto Iskandarani](https://github.com/Robert-Iskandarani)


	## Model Details
	* Base Model: dccuchile/bert-base-spanish-wwm-uncased
	* Hyperparameters:
	  * dropout_rate = 0.1
	  * num_classes = 2
	  * max_length = 128
	  * batch_size = 16
	  * num_epochs = 5
	  * learning_rate = 3e-5
	* Dataset: 125,000 Spanish news articles (True and False)
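
To give a sense of the scale these settings imply, the sketch below collects them in a small config object and estimates the number of optimizer steps. It assumes, purely for illustration, that all 125,000 articles are used for training and that there is no gradient accumulation; the card does not state the actual train/validation split. The `FineTuneConfig` class and `steps_per_epoch` helper are hypothetical names, not part of the project's code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed on this card (the class itself is hypothetical)."""
    base_model: str = "dccuchile/bert-base-spanish-wwm-uncased"
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    # The last batch may be smaller than batch_size, so round up.
    return math.ceil(num_examples / batch_size)


config = FineTuneConfig()

# ASSUMPTION: the full 125,000-article dataset is used for training.
num_train_examples = 125_000

per_epoch = steps_per_epoch(num_train_examples, config.batch_size)
total_steps = per_epoch * config.num_epochs
print(f"steps per epoch: {per_epoch}, total steps: {total_steps}")
```

If part of the dataset was held out for validation or testing, the real step counts would be proportionally lower.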

	## Metrics
	The model's performance was evaluated using the following metrics:

	* Accuracy = _83.17%_
	* F1-Score = _81.94%_
	* Precision = _85.62%_
	* Recall = _81.10%_
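
For reference, the sketch below gives the standard definitions of these four metrics for a binary classifier. The confusion-matrix counts in the example are invented for illustration and are not the model's evaluation data. The card does not say which class was treated as positive or whether the scores were averaged across classes, so this is not necessarily how the figures above were produced. `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=80, fp=20, fn=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```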




	## Usage
	### Installation
	You can install the required dependencies using pip:

	```bash
	pip install transformers torch
	```

	### Loading the Model
	```python
	from transformers import BertForSequenceClassification, BertTokenizer

	model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
	tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Fake-News")
	```

	### Predict Function
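	```python
	import torch

	def predict(model, tokenizer, text, threshold=0.5):
	    # Tokenize a single text; longer inputs are truncated to max_length tokens.
	    # Note: the model was fine-tuned with max_length=128 (see Model Details).
	    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
	    with torch.no_grad():  # inference only, no gradients needed
	        outputs = model(**inputs)

	    logits = outputs.logits
	    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()

	    predicted_class = torch.argmax(logits, dim=1).item()
	    # Keep class 1 (returned as True) only if its probability exceeds the threshold.
	    if probabilities[predicted_class] <= threshold and predicted_class == 1:
	        predicted_class = 0

	    return bool(predicted_class), probabilities
	```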
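
The thresholding step in `predict` can be examined in isolation. The sketch below applies the same rule to plain probability lists, without loading the model. `apply_threshold` is a hypothetical helper name, not part of the model's code, and the probability values are illustrative. With two classes the winning probability is never below 0.5, so the default `threshold` of 0.5 can only matter in an exact tie; raising it makes the function require more confidence before returning `True`.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Return True only if class 1 is the most likely class
    and its probability is strictly above the threshold."""
    # Index of the highest probability (the first index wins ties).
    predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
    if predicted_class == 1 and probabilities[predicted_class] <= threshold:
        predicted_class = 0
    return bool(predicted_class)


# Illustrative probabilities in the order [class 0, class 1].
print(apply_threshold([0.118, 0.882]))                 # class 1 wins and 0.882 > 0.5
print(apply_threshold([0.118, 0.882], threshold=0.9))  # 0.882 <= 0.9, falls back to class 0
print(apply_threshold([0.882, 0.118]))                 # class 0 wins, threshold not consulted
```

A stricter threshold trades recall on class 1 for fewer low-confidence `True` results; a suitable value would have to be chosen on held-out data.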
	### Making Predictions

	```python
	text = "Your Spanish news text here"
	predicted_label, probabilities = predict(model, tokenizer, text)
	print(f"Text: {text}")
	print(f"Predicted Class: {predicted_label}")
	print(f"Probabilities: {probabilities}")
	```

	## License
	Apache License 2.0

	## Acknowledgments
	Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.