Model Card for Monk3ydluffy/truthlens-bert
VerifAI BERT: Fake News Detection Model
Model Details
Model Description
TruthLens BERT is a fine-tuned RoBERTa-base transformer model trained for binary fake news classification. It analyzes English news articles and headlines, classifying them as either FAKE (LABEL_0) or REAL (LABEL_1) with a confidence score.
The model was developed as the primary AI engine powering VerifAI, a misinformation detection platform designed for journalists, researchers, and analysts. It works as part of a 4-layer detection architecture combining:
- BERT neural analysis (this model)
- SVM ensemble classifier
- Credibility signal scoring
- Rule-based decision engine
The model was fine-tuned from roberta-base on a merged dataset of 44,000+ verified English news articles drawn from multiple sources, including the Kaggle Fake and Real News Dataset and the ISOT Fake News Dataset.
It achieves approximately 98% accuracy on the held-out test set, with strong performance on both the FAKE and REAL classes.
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- Developed by: Hithesh (Monk3ydluffy)
- Funded by: Self-funded academic AI/ML research project
- Shared by: Hithesh (Monk3ydluffy)
- Model type: Text classification; fine-tuned RoBERTa-base transformer for binary fake news detection (FAKE / REAL)
- Language(s) (NLP): English (en)
- License: mit
- Finetuned from model: roberta-base
Model Sources
- Repository: https://github.com/Hithesh-07/FNDS
- Paper: N/A (academic research project)
- Demo: https://aletheia-k1os.onrender.com
Uses
Direct Use
This model can be used directly for fake news detection without any additional fine-tuning. Simply input any English news headline or article text and it returns LABEL_0 (FAKE) or LABEL_1 (REAL) with a confidence score.
Suitable for:
- Journalists verifying news credibility before publishing
- Researchers studying misinformation patterns
- Developers building fact-checking tools and pipelines
- Educators teaching media literacy and critical thinking
- Anyone who wants to verify news before sharing
Downstream Use
This model is integrated as the primary BERT engine in VerifAI, a 4-layer fake news detection system combining:
- BERT neural analysis (this model): primary
- SVM ensemble classifier: secondary/fallback
- Credibility signal scoring: rule-based layer
- Decision engine: final verdict generator
It can be plugged into any NLP pipeline that requires binary misinformation classification in English.
Out-of-Scope Use
- Non-English content: the model was not trained on other languages and will give unreliable results
- Satire detection: satirical content written in a neutral journalistic tone may be misclassified as REAL
- Legal or judicial decision making: not suitable as evidence or as a legal determination of truth
- Automated content removal: should never be the sole arbiter for removing content without human review
- Real-time moderation at massive scale without human oversight in the loop
- Medical or scientific claim verification: domain-specific knowledge is not covered in training
Bias, Risks, and Limitations
Dataset Bias: Trained primarily on US and UK political news articles. May underperform on regional, cultural, or non-Western news content that uses different writing styles.
Language Bias: English only. Articles written by non-native English speakers or in informal language may be incorrectly classified due to style differences from training data.
Temporal Bias: Training data has a cutoff date. New or evolving forms of misinformation, and topics not covered in training, may not be detected accurately.
Satire Risk: Well-written satirical content using neutral journalistic tone may be classified as REAL news due to its professional writing style.
Sophisticated Fake News: Professionally written misinformation that mimics real journalism style is the hardest category. The model performs best when combined with rule-based credibility signal analysis rather than used alone.
Confidence Overestimation: Model may show high confidence on edge cases. Predictions below 70% confidence should be treated as UNCERTAIN rather than definitive verdicts.
Recommendations
- Always combine model predictions with human judgment and do not use as the sole decision-maker
- Use alongside rule-based credibility signal analysis for best results
- Verify important claims from trusted sources such as BBC, Reuters, AP News, or The Guardian
- Treat confidence scores below 70% as UNCERTAIN
- Do not use for content removal without human review
- Be aware of potential bias toward Western news styles
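The 70% rule above can be applied as a thin wrapper around the classifier output. The following is a minimal sketch, assuming each result is a dict with `label` and `score` keys as returned by the text-classification pipeline; the names `to_verdict` and `UNCERTAIN_THRESHOLD` are hypothetical and not part of the model or of VerifAI.

```python
# Hypothetical helper: map one pipeline result to FAKE / REAL / UNCERTAIN.
UNCERTAIN_THRESHOLD = 0.70  # recommendation above: below 70% is UNCERTAIN

LABEL_MAP = {"LABEL_0": "FAKE", "LABEL_1": "REAL"}


def to_verdict(result, threshold=UNCERTAIN_THRESHOLD):
    """Convert a result such as {"label": "LABEL_1", "score": 0.713}
    into a verdict that respects the confidence threshold."""
    score = float(result["score"])
    model_label = LABEL_MAP[result["label"]]
    verdict = model_label if score >= threshold else "UNCERTAIN"
    return {
        "verdict": verdict,
        "model_label": model_label,
        "confidence": round(score * 100, 2),
    }


if __name__ == "__main__":
    # Hand-written scores for illustration, not real model output.
    print(to_verdict({"label": "LABEL_0", "score": 0.968}))
    print(to_verdict({"label": "LABEL_1", "score": 0.64}))
```

Keeping the raw model label next to the verdict lets a human reviewer see which way the model leaned even when the result is reported as UNCERTAIN.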
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import pipeline

# Load the model
classifier = pipeline(
    "text-classification",
    model="Monk3ydluffy/truthlens-bert",
)

# Label mapping
label_map = {
    "LABEL_0": "FAKE",
    "LABEL_1": "REAL",
}


# Predict function
def predict(text):
    # Truncate at the tokenizer level to the 256-token training length
    # instead of slicing the string by characters.
    result = classifier(text, truncation=True, max_length=256)[0]
    label = label_map[result["label"]]
    confidence = round(result["score"] * 100, 2)
    return {
        "label": label,
        "confidence": confidence,
    }


# Example usage
examples = [
    "The Federal Reserve raised interest rates by 0.25 percent.",
    "SHOCKING: Government HIDING 5G cancer link EXPOSED!",
    "Some researchers suggest this may improve health outcomes.",
]

for text in examples:
    result = predict(text)
    print(f"{result['label']} ({result['confidence']}%) - {text[:60]}")

# Output:
# REAL (94.2%) - The Federal Reserve raised interest rates...
# FAKE (96.8%) - SHOCKING: Government HIDING 5G cancer link...
# REAL (71.3%) - Some researchers suggest this may improve...
```
Training Details
Training Data
The model was trained on a merged dataset of 44,000+ English news articles from multiple sources:
- Kaggle Fake and Real News Dataset (clmentbisaillon/fake-and-real-news-dataset), containing political news articles labeled as Fake or Real
- ISOT Fake News Dataset from the University of Victoria, containing news from 2016-2017
Preprocessing applied:
- Duplicate articles removed
- Classes balanced by undersampling the majority class, giving equal numbers of FAKE and REAL samples
- Title and article body combined into a single text field
- Text truncated to a maximum of 256 tokens
- 80/20 train/test stratified split
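The preprocessing steps listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the author's training script: it assumes each row is a dict with `title`, `text`, and `label` keys (0 = FAKE, 1 = REAL), treats two articles as duplicates when their combined text is identical, and uses the hypothetical function name `prepare_dataset`. Truncation to 256 tokens is left to the tokenizer and is not shown.

```python
import random
from collections import defaultdict


def prepare_dataset(rows, test_fraction=0.2, seed=42):
    """Return (train, test) lists of (combined_text, label) pairs."""
    rng = random.Random(seed)

    # 1. Combine title and body into one field and drop exact duplicates.
    seen = set()
    by_label = defaultdict(list)
    for row in rows:
        combined = f"{row['title']} {row['text']}".strip()
        if combined in seen:
            continue
        seen.add(combined)
        by_label[row["label"]].append(combined)

    # 2. Balance classes by undersampling each class to the smallest class size.
    n_per_class = min(len(texts) for texts in by_label.values())
    n_test = round(n_per_class * test_fraction)

    # 3. Stratified split: take the same number of test items from each class.
    train, test = [], []
    for label, texts in by_label.items():
        sample = rng.sample(texts, n_per_class)
        test += [(text, label) for text in sample[:n_test]]
        train += [(text, label) for text in sample[n_test:]]

    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


if __name__ == "__main__":
    # Toy rows for illustration only: 10 FAKE, 15 REAL, plus one duplicate.
    rows = [{"title": f"fake {i}", "text": "body", "label": 0} for i in range(10)]
    rows += [{"title": f"real {i}", "text": "body", "label": 1} for i in range(15)]
    rows.append(dict(rows[0]))
    train, test = prepare_dataset(rows)
    print(len(train), len(test))
```

Splitting after deduplication matters here: if duplicates were removed only after the split, copies of the same article could land on both sides and inflate test accuracy.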
Training Procedure
Preprocessing
- Tokenizer: RoBERTa tokenizer from roberta-base
- Max sequence length: 256 tokens
- Truncation: Enabled (right-side truncation)
- Padding: Max length padding
- Text format: article title + space + article body
- Label encoding: FAKE=0 (LABEL_0), REAL=1 (LABEL_1)
- Duplicates removed before tokenization
- Stratified train/test split (80/20)
Training Hyperparameters
- Training regime: fp16 mixed precision on GPU, fp32 on CPU fallback
- Base model: roberta-base
- Number of epochs: 3
- Batch size: 8
- Learning rate: 2e-5
- Optimizer: AdamW (weight decay = 0.01)
- Warmup steps: 10% of total training steps
- LR scheduler: Linear warmup + linear decay
- Gradient clipping: 1.0
- Best model saved by: highest validation accuracy
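To make the warmup and decay settings concrete, the sketch below computes the step counts and the learning rate at a given step. It assumes one optimizer step per batch (no gradient accumulation) and a training set of about 35,200 articles (80% of roughly 44,000, ignoring any reduction from deduplication and undersampling); both are assumptions, and the function names are hypothetical.

```python
import math


def total_training_steps(n_train, batch_size=8, epochs=3):
    """Number of optimizer steps, assuming no gradient accumulation."""
    return math.ceil(n_train / batch_size) * epochs


def linear_warmup_decay_lr(step, total_steps, base_lr=2e-5, warmup_fraction=0.10):
    """Learning rate at `step`: linear ramp from 0 to base_lr during warmup,
    then linear decay back to 0 at the final step."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    n_train = 35_200  # assumed training-set size, see the note above
    total = total_training_steps(n_train)
    for step in (0, total // 20, total // 10, total // 2, total):
        print(step, linear_warmup_decay_lr(step, total))
```

Under these assumptions the run has 13,200 steps, of which the first 1,320 are warmup, and the learning rate peaks at 2e-5 at the end of warmup.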
Speeds, Sizes, Times
- Hardware: NVIDIA T4 GPU via Google Colab
- Training time: approximately 5-10 minutes on T4 GPU
- Model size: approximately 500MB (roberta-base scale)
- Inference time: 0.3 to 1.5 seconds per article via API
- Checkpoint: saved at best validation accuracy epoch
Evaluation
Testing Data, Factors & Metrics
Testing Data
20% held-out stratified split from the merged training dataset. Approximately 8,800 articles in the test set with equal FAKE and REAL class representation. No overlap with training data.
Factors
Evaluation disaggregated across:
- Article length: short headlines vs full articles
- News category: political, health, science, general
- Fake news type: clickbait, conspiracy, sophisticated misinformation mimicking real journalism
Metrics
- Accuracy: Overall percentage of correct predictions; chosen as the primary metric because the test set is balanced
- F1 Score (weighted): Per-class F1 (the harmonic mean of precision and recall), averaged with weights proportional to class support
- Precision: Ratio of true positives to all predicted positives, per class
- Recall: Ratio of true positives to all actual positives, per class
Results
| Metric | Score |
|---|---|
| Accuracy | ~98% |
| F1 Score (weighted) | ~98% |
| Precision (FAKE) | ~97% |
| Precision (REAL) | ~99% |
| Recall (FAKE) | ~99% |
| Recall (REAL) | ~97% |
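The metrics in the table can be reproduced from confusion-matrix counts. The sketch below uses hypothetical counts for a balanced 8,800-article test set, chosen only so that they round to the reported figures; they are not the author's actual confusion matrix, and `classification_report` here is a hand-written helper, not the scikit-learn function of the same name.

```python
def classification_report(tp_fake, fn_fake, tp_real, fn_real):
    """Accuracy, per-class precision/recall, and support-weighted F1.

    tp_fake: FAKE articles predicted FAKE;  fn_fake: FAKE articles predicted REAL
    tp_real: REAL articles predicted REAL;  fn_real: REAL articles predicted FAKE
    """
    def prf(tp, fp, fn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # A REAL article predicted FAKE is a false positive for the FAKE class,
    # and vice versa.
    p_fake, r_fake, f1_fake = prf(tp_fake, fp=fn_real, fn=fn_fake)
    p_real, r_real, f1_real = prf(tp_real, fp=fn_fake, fn=fn_real)

    n_fake = tp_fake + fn_fake
    n_real = tp_real + fn_real
    total = n_fake + n_real
    return {
        "accuracy": (tp_fake + tp_real) / total,
        "precision": {"FAKE": p_fake, "REAL": p_real},
        "recall": {"FAKE": r_fake, "REAL": r_real},
        "f1_weighted": (f1_fake * n_fake + f1_real * n_real) / total,
    }


if __name__ == "__main__":
    # Hypothetical counts: 4,400 FAKE and 4,400 REAL test articles.
    report = classification_report(tp_fake=4356, fn_fake=44, tp_real=4268, fn_real=132)
    for name, value in report.items():
        print(name, value)
```

With these illustrative counts, 176 of 8,800 articles are misclassified, and the asymmetry in the table follows directly: more REAL articles are flagged as FAKE (132) than FAKE articles pass as REAL (44).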
Summary
The model achieves approximately 98% accuracy on the held-out test set, with strong performance on both the FAKE and REAL classes. Performance is strongest on clearly sensational fake news and formal real news articles. Sophisticated misinformation that mimics real journalism style remains the most challenging category and benefits most from combining this model with rule-based credibility analysis.
Model Examination
The model is integrated into VerifAI as part of a 4-layer detection architecture. Interpretability is provided through:
- Confidence scores for each prediction
- Keywords extracted via TF-IDF showing top influencing words per prediction
- Risk indicator flags from credibility signal engine
- Model breakdown showing BERT vs SVM individual verdicts for transparency
Live demo showing full explainability: https://aletheia-k1os.onrender.com
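As a rough illustration of the TF-IDF keyword step, the sketch below ranks the words of one article against a small reference corpus. It is not the VerifAI implementation: the tokenizer, the smoothed IDF formula, the absence of stop-word filtering, and the function names are all assumptions. Note also that TF-IDF highlights words that are distinctive for the article; it does not measure how much each word contributed to the transformer's prediction.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_tfidf_keywords(document, corpus, k=5):
    """Return the k words of `document` with the highest TF-IDF score,
    using smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    corpus_token_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus)
    counts = Counter(tokenize(document))
    total = sum(counts.values())

    scores = {}
    for word, count in counts.items():
        df = sum(word in tokens for tokens in corpus_token_sets)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / total) * idf

    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]


if __name__ == "__main__":
    # Toy corpus for illustration only.
    corpus = [
        "the fed raised rates",
        "the senate passed the bill",
        "the team won the game",
    ]
    article = "shocking the government is hiding the 5g link"
    print(top_tfidf_keywords(article, corpus, k=3))
```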
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA T4 GPU (Google Colab)
- Hours used: Approximately 0.2 hours (12 minutes)
- Cloud Provider: Google (Google Colab)
- Compute Region: United States
- Carbon Emitted: Approximately 0.01 kg CO2eq (minimal due to short training duration)
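The order of magnitude of the carbon figure can be checked with a simple energy calculation. The sketch below is a back-of-the-envelope estimate, not the output of the Machine Learning Impact calculator: it takes the T4's 70 W board power as an upper bound on draw, uses the 0.2 hours stated above, and assumes an illustrative grid intensity of 0.4 kg CO2eq/kWh, which is not a measured value for the actual data centre.

```python
def estimate_co2_kg(power_watts, hours, kg_co2_per_kwh, pue=1.0):
    """Energy used (kWh) times grid carbon intensity (kg CO2eq/kWh).

    `pue` optionally scales for data-centre overhead; 1.0 ignores it.
    """
    energy_kwh = power_watts / 1000 * hours * pue
    return energy_kwh * kg_co2_per_kwh


if __name__ == "__main__":
    # 70 W: T4 board power; 0.2 h: training time stated above.
    # 0.4 kg CO2eq/kWh: assumed, illustrative grid intensity.
    estimate = estimate_co2_kg(power_watts=70, hours=0.2, kg_co2_per_kwh=0.4)
    print(f"{estimate:.4f} kg CO2eq")
```

Under these assumptions the estimate is about 0.006 kg CO2eq, the same order of magnitude as the approximately 0.01 kg reported above.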
Technical Specifications
Model Architecture and Objective
- Base architecture: RoBERTa (roberta-base)
- Layers: 12 transformer encoder layers
- Hidden dimensions: 768
- Attention heads: 12
- Total parameters: ~125 million
- Added classification head: Linear(768 → 2)
- Objective: Binary cross-entropy classification
- Classes: FAKE (LABEL_0) and REAL (LABEL_1)
- Activation: Softmax on classification head
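The step from the two output logits to a label and confidence score can be sketched without any ML library. This is a minimal illustration of softmax over two classes; the logit values are made up and `predict_from_logits` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_logits(logits, id2label=("FAKE", "REAL")):
    """Return (label, confidence), where confidence is the softmax
    probability of the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


if __name__ == "__main__":
    # Made-up logits: index 0 is FAKE (LABEL_0), index 1 is REAL (LABEL_1).
    label, confidence = predict_from_logits([2.0, -1.0])
    print(label, round(confidence, 4))
```

With only two classes, the winning probability equals the sigmoid of the difference between the two logits, which is why a two-way softmax head behaves like a binary classifier.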
Compute Infrastructure
Training ran on a Google Colab T4 GPU. Inference is served via the HuggingFace Inference API, and the model is deployed within VerifAI on Render.com.
Hardware
NVIDIA T4 GPU (16 GB VRAM) via Google Colab free tier
Software
- Python 3.12
- PyTorch 2.x
- HuggingFace Transformers 4.x
- HuggingFace Datasets
- HuggingFace Accelerate
- scikit-learn (evaluation metrics)
- Google Colab environment
Citation
BibTeX:
@misc{truthlens-bert-2026,
  author = {Hithesh (Monk3ydluffy)},
  title = {TruthLens BERT: A Fine-tuned RoBERTa Model for Binary Fake News Detection},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Monk3ydluffy/truthlens-bert}}
}
APA:
Hithesh (Monk3ydluffy). (2026). TruthLens BERT: A Fine-tuned RoBERTa Model for Binary Fake News Detection. HuggingFace. https://huggingface.co/Monk3ydluffy/truthlens-bert
Glossary
- BERT: Bidirectional Encoder Representations from Transformers, a pretrained language model
- RoBERTa: Robustly Optimized BERT Pretraining Approach, an improved version of BERT and the base model used here
- FAKE (LABEL_0): Content classified as misinformation or fabricated news
- REAL (LABEL_1): Content classified as credible and factual news
- Confidence Score: The softmax probability (0.0 to 1.0) that the model assigns to its predicted label
- Fine-tuning: Adapting a pretrained model to a specific downstream task
- SVM: Support Vector Machine, the secondary classifier used alongside BERT in VerifAI
- TF-IDF: Term Frequency-Inverse Document Frequency, used for keyword extraction
- VerifAI: The fake news detection platform powered by this model
More Information
This model is part of the VerifAI project, a full-stack fake news detection platform combining BERT neural analysis with SVM ensemble classification, credibility signal scoring, and a rule-based decision engine.
- Live demo: https://aletheia-k1os.onrender.com
- GitHub: https://github.com/Hithesh-07/FNDS
- Model: https://huggingface.co/Monk3ydluffy/truthlens-bert
Model Card Authors
Hithesh (Monk3ydluffy)
Model Card Contact
Via HuggingFace profile: https://huggingface.co/Monk3ydluffy