Model Card
This model is a fine-tuned version of intfloat/multilingual-e5-small. It was fine-tuned on FactRank data, supplemented with samples from the Dutch and Belgian parliaments labelled by GPT and Gemini. The primary goal of this model is to determine whether a given statement warrants fact-checking; it does not determine whether the statement is factually correct. Each statement receives one of three labels: FR, FNR, or NF.
- FR: Factual, Relevant (the statement is fact-checkable and warrants verification)
- FNR: Factual, Not Relevant (the statement can be fact-checked, but its wider relevance is limited)
- NF: Not Factual (the statement contains no verifiable factual claim)
Examples:
- FR: Toch blijkt uit cijfers van Flanders Investment & Trade dat ons handel met het Verenigd Koninkrijk opnieuw op het niveau ligt van voor de brexit. (Yet figures from Flanders Investment & Trade show that our trade with the United Kingdom is back at its pre-Brexit level.)
- FNR: Ayleen werd opgelicht via dating fraude door de Tinder Swindler: "Het zijn net vampiers." (Ayleen was defrauded through dating fraud by the Tinder Swindler: "They are just like vampires.")
- NF: Het heeft weinig zin om zomaar een aantal maatregelen te tonen. (There is little point in simply presenting a set of measures.)
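As a sanity check, the label set can be read from the model configuration. A minimal sketch; the id-to-label ordering shown in the comment is an assumption, not documented on this card:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("textgain/FactRank_e5_small")
# Maps class ids to the FR / FNR / NF labels; the exact ordering
# (e.g. {0: "FR", 1: "FNR", 2: "NF"}) is an assumption.
print(config.id2label)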
Supported language: Dutch
Usage
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer, pipeline
from huggingface_hub import login

# Authenticate with your Hugging Face access token, if required.
hf_token = "insert_your_token_here"
login(token=hf_token)

# Load the configuration, tokenizer, and model from the Hub.
config = AutoConfig.from_pretrained("textgain/FactRank_e5_small")
tokenizer = AutoTokenizer.from_pretrained("textgain/FactRank_e5_small")
model = AutoModelForSequenceClassification.from_pretrained("textgain/FactRank_e5_small", config=config)

# Wrap everything in a text-classification pipeline.
pipe = pipeline(model=model, tokenizer=tokenizer, task="text-classification")

sample_texts = [
    "In een wereld die steeds digitaler wordt, moeten we het ook makkelijker maken om de controle over je financiën te hebben.",
    "Ik wil helemaal geen haren tussen u en de heer De Cock leggen.",
    "Je kunt van mening verschillen over welk gevolg je daaraan moet verbinden.",
    "We hebben 4.500 nieuwe kankergevallen in Nederland per jaar als gevolg van alcoholgebruik.",
    "Alcoholgebruik kost de samenleving 2 tot 4 miljard euro.",
    "Dus kan de minister daar vandaag wat meer over zeggen?",
]

# Classify the samples and collect the predicted FR / FNR / NF labels.
results = pipe(sample_texts)
predicted_labels = [res["label"] for res in results]
print(predicted_labels)
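If you need per-class confidence scores rather than only the top label, the pipeline can return scores for every class; top_k=None is standard transformers pipeline behaviour, not something specific to this model:

# Return scores for all three labels instead of only the top prediction.
all_scores = pipe(sample_texts, top_k=None)
for text, scores in zip(sample_texts, all_scores):
    best = max(scores, key=lambda s: s["score"])
    print(f"{best['label']} ({best['score']:.2f}): {text}")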
Interpretation of Results
Factors Influencing the Label (see the probe sketch after this list):
- Subjective Evaluation: The presence of evaluative words such as "interesting", "surprising", or "incredible" may push the model towards predicting NF.
- Research: Mentions of research or studies push the model towards treating the statement as verifiable.
- Context: Statements made in certain contexts may be more likely to receive an FR label, e.g. statements about health and medicine.
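These effects can be probed directly by reusing the pipe object from the Usage section on minimal pairs that differ only in one cue. The sentence pairs below are hypothetical illustrations, not items from the training data, and the behaviour suggested in the comments reflects the tendencies listed above rather than guaranteed outputs:

# Hypothetical minimal pairs probing the factors above (reuses `pipe` from Usage).
probe_pairs = [
    # Research cue: the second variant may be pushed towards FR/FNR.
    ("Alcoholgebruik veroorzaakt kanker.",
     "Uit studies blijkt dat alcoholgebruik kanker veroorzaakt."),
    # Subjective evaluation: the second variant may be pushed towards NF.
    ("De export naar het Verenigd Koninkrijk steeg vorig jaar.",
     "Het is verrassend dat de export naar het Verenigd Koninkrijk vorig jaar steeg."),
]
for plain, cued in probe_pairs:
    for text in (plain, cued):
        pred = pipe(text)[0]
        print(f"{pred['label']} ({pred['score']:.2f}): {text}")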
Training Details
The model was trained on a total of 13,786 data samples.
Parameters:
num_epochs = 5
batch_size = 32
learning_rate = 1e-5
dropout = 0.5
gradient_accumulation_steps = 4
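The training script itself is not part of this card, but these hyperparameters map directly onto the transformers Trainer API. The following is a minimal sketch of an assumed setup: the toy dataset, the label ids, and the use of classifier_dropout to realise the 0.5 dropout are all placeholders, not the authors' confirmed configuration:

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy stand-in for the FactRank training data (the real 13,786 samples are not shown here).
train_data = Dataset.from_dict({
    "text": ["Uit cijfers blijkt dat de export steeg.", "Dus kan de minister daar wat over zeggen?"],
    "label": [0, 2],  # assumed id mapping, e.g. 0 = FR, 2 = NF
})

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-small")
model = AutoModelForSequenceClassification.from_pretrained(
    "intfloat/multilingual-e5-small",
    num_labels=3,
    # dropout = 0.5 from the card, assumed here to apply to the classifier head.
    classifier_dropout=0.5,
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

# Hyperparameters taken directly from the card.
args = TrainingArguments(
    output_dir="factrank_e5_small",
    num_train_epochs=5,
    per_device_train_batch_size=32,
    learning_rate=1e-5,
    gradient_accumulation_steps=4,
)

Trainer(model=model, args=args, train_dataset=train_data).train()

Note that with gradient_accumulation_steps = 4, a batch size of 32 gives an effective batch size of 128, assuming the card's batch_size refers to the per-device value.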