Factuality Classifier for Medical RAG (PubMedBERT)

Model Description

This model is a fine-tuned version of PubMedBERT (microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext), designed to classify the factuality of medical documents. It was developed as the core factuality estimation component for FRAG (Factuality-aware Retrieval-Augmented Generation), a multidimensional RAG framework that retrieves external knowledge based on both topical relevance and factual reliability.

The classifier evaluates whether a medical text takes a correct scientific stance toward the medical claims it contains, acting as a safeguard against medical misinformation in LLM-based question answering.

Model type: Text Classification (Binary: 1 = Factual, 0 = Non-Factual)
Language(s): English
License: MIT
Base Model: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext

Intended Uses & Limitations

Intended Use: This model is intended as a re-ranking or filtering component within a Retrieval-Augmented Generation (RAG) pipeline in the healthcare domain. Given a retrieved medical document, it predicts whether the document takes a scientifically correct stance toward the claims it contains.
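
As a rough illustration of the filtering and re-ranking role, the sketch below turns the classifier's two logits into a probability for the Factual class, drops documents below a threshold, and sorts the rest by a weighted mix of relevance and factuality. The RetrievedDoc type, the 0.5 threshold, the alpha weight, and the linear mix are all hypothetical choices made for this example; they are not taken from FRAG, which may combine the two signals differently.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RetrievedDoc:
    """Hypothetical container for one retrieved document."""
    doc_id: str
    relevance: float              # retriever score, assumed scaled to [0, 1]
    logits: Tuple[float, float]   # classifier logits as (Non-Factual, Factual)


def factual_probability(logits):
    """Softmax over the two logits, returning P(class 1 = Factual)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def filter_and_rerank(docs, min_factual=0.5, alpha=0.5):
    """Drop docs below min_factual, then sort by a weighted score.

    alpha weights relevance and (1 - alpha) weights factuality.
    Both the threshold and the linear mix are illustrative assumptions.
    """
    scored = []
    for doc in docs:
        p = factual_probability(doc.logits)
        if p >= min_factual:
            scored.append((alpha * doc.relevance + (1 - alpha) * p, doc.doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Made-up scores and logits, for illustration only.
docs = [
    RetrievedDoc("a", relevance=0.9, logits=(2.0, -1.0)),  # relevant, likely non-factual
    RetrievedDoc("b", relevance=0.6, logits=(-1.5, 2.5)),  # less relevant, likely factual
    RetrievedDoc("c", relevance=0.8, logits=(-0.2, 0.4)),  # relevant, mildly factual
]
print(filter_and_rerank(docs))  # ['b', 'c']
```

In this toy input, document "a" is removed despite having the highest relevance, because its factuality probability falls below the threshold.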

Limitations:

It is trained on general medical misinformation and may not generalize to highly specialized, cutting-edge primary literature (e.g., specific clinical trial results not represented in the training set).
This is a research prototype and should not be used for medical diagnosis or clinical decision-making without expert supervision.

Training Data

The model was fine-tuned on a custom "Claim-Based Dataset" derived from the Monant Medical Misinformation Dataset. Specifically, the data relies on Monant's Type 6 annotations, which cross-reference the cited medical claim, the article's stance, and the claim's actual scientific veracity.

To keep the labels reliable, a strict zero-tolerance policy was adopted: an article is labeled 'Factual' (Class 1) only if it maintains a correct stance toward every claim it contains. A single incorrect stance is enough to label the whole document 'Non-Factual' (Class 0).
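
The zero-tolerance rule can be sketched as a small aggregation function. The ClaimAnnotation type and its boolean fields are a hypothetical simplification introduced for this example (real annotations may carry richer stance and veracity values); a stance is treated as correct when the article supports a true claim or contradicts a false one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClaimAnnotation:
    """Hypothetical, simplified view of one article-claim annotation."""
    claim_is_true: bool      # scientific veracity of the claim
    article_supports: bool   # True if the article supports the claim


def stance_is_correct(annotation: ClaimAnnotation) -> bool:
    """Correct stance: supports a true claim, or contradicts a false one."""
    return annotation.claim_is_true == annotation.article_supports


def label_article(annotations: List[ClaimAnnotation]) -> int:
    """Return 1 (Factual) only if every stance is correct, else 0 (Non-Factual)."""
    if not annotations:
        # Assumption: articles without annotated claims are not labeled at all.
        raise ValueError("an article needs at least one annotated claim")
    return int(all(stance_is_correct(a) for a in annotations))


all_correct = [ClaimAnnotation(True, True), ClaimAnnotation(False, False)]
one_wrong = [ClaimAnnotation(True, True), ClaimAnnotation(False, True)]
print(label_article(all_correct))  # 1
print(label_article(one_wrong))    # 0
```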

The final dataset consists of 14,696 records, balanced between the two classes and split as follows:

Training Set: 9,404 articles
Validation Set: 2,352 articles
Test Set: 2,940 articles
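
The three sizes sum to 14,696 and are consistent with holding out 20% for testing and then 20% of the remainder for validation, rounding up each time. The card does not state how the split was actually produced, so the ratios in this check are an inference rather than a documented procedure.

```python
import math

total = 14_696

# Assumed ratios: 20% test, then 20% of what remains for validation.
test_size = math.ceil(total * 0.20)
remaining = total - test_size
val_size = math.ceil(remaining * 0.20)
train_size = remaining - val_size

print(train_size, val_size, test_size)  # 9404 2352 2940
```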

Training Procedure

The model was trained using TensorFlow/Keras with the following hyperparameters:

Max Sequence Length: 512 tokens
Batch Size: 4
Learning Rate: 2e-5
Optimizer: Adam
Loss Function: Sparse Categorical Crossentropy
Epochs: 20 (with Early Stopping monitoring val_loss, patience=5, restoring best weights)

Note: Training and validation loss began to diverge from the third epoch, indicating the onset of overfitting. Early Stopping then halted training and restored the weights from the epoch with the lowest validation loss.
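
The behavior of early stopping with patience and best-weight restoration can be illustrated in plain Python. This is a minimal sketch of the mechanism, not the training code; the validation losses below are made-up values chosen for the example and are not this model's training log.

```python
def early_stopping(val_losses, patience=5):
    """Return (best_epoch, stopped_epoch), both 1-indexed.

    Training stops once val_loss has failed to improve for `patience`
    consecutive epochs; the weights to restore are those of best_epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    # Ran through all epochs without exhausting the patience budget.
    return best_epoch, len(val_losses)


# Hypothetical validation losses, for illustration only.
hypothetical_val_losses = [0.52, 0.47, 0.49, 0.53, 0.58, 0.61, 0.66, 0.70]
print(early_stopping(hypothetical_val_losses))  # (2, 7)
```

With these illustrative values the best epoch is 2, training stops at epoch 7 after five epochs without improvement, and the epoch-2 weights would be restored.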

Evaluation Results

On the held-out test set (2,940 articles), the model correctly classified 2,335 articles. Its errors are asymmetric: it more often labels a factual article as Non-Factual (435 cases) than a non-factual article as Factual (170 cases).

Test Accuracy: 79.42%

Confusion Matrix Breakdown:

True Positives (Predicted Factual, Actual Factual): 1035
True Negatives (Predicted Non-Factual, Actual Non-Factual): 1300
False Positives (Predicted Factual, Actual Non-Factual): 170
False Negatives (Predicted Non-Factual, Actual Factual): 435
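
Further metrics for the Factual class follow directly from these four counts. The short calculation below uses only the numbers reported above; the choice of which metrics to show is ours.

```python
# Confusion matrix counts as reported, with Factual as the positive class.
tp, tn, fp, fn = 1035, 1300, 170, 435

total = tp + tn + fp + fn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)       # of predicted Factual, how many are Factual
recall = tp / (tp + fn)          # of actual Factual, how many are found
specificity = tn / (tn + fp)     # of actual Non-Factual, how many are caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f}")        # accuracy=0.7942
print(f"precision={precision:.4f}")      # precision=0.8589
print(f"recall={recall:.4f}")            # recall=0.7041
print(f"specificity={specificity:.4f}")  # specificity=0.8844
print(f"f1={f1:.4f}")                    # f1=0.7738
```

For a misinformation safeguard, the relatively high precision and specificity mean that documents passed as Factual are mostly trustworthy, at the cost of rejecting roughly 30% of genuinely factual articles.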

How to Get Started with the Model

You can load and use this model with the Hugging Face transformers library. The model takes as input the plain text of a medical article, with HTML markup already removed. Inputs longer than 512 tokens are truncated.

Using TensorFlow (as trained):

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("tommibazzo01/factuality-classifier-pubmedbert-ClaimBased")
model = TFAutoModelForSequenceClassification.from_pretrained("tommibazzo01/factuality-classifier-pubmedbert-ClaimBased")

# Example medical article text
article_text = "Extensive scientific research has proven that there is no link between vaccines and autism."

# Tokenize
inputs = tokenizer(
    article_text, 
    return_tensors="tf", 
    truncation=True, 
    padding=True, 
    max_length=512
)

# Predict
outputs = model(inputs)
predicted_class = tf.argmax(outputs.logits, axis=1).numpy()[0]

labels = {0: "Non-Factual", 1: "Factual"}
print(f"Prediction: {labels[predicted_class]}")
