model base: https://huggingface.co/microsoft/mdeberta-v3-base
dataset: https://github.com/ramybaly/Article-Bias-Prediction
training parameters:
- devices: 2xH100
- batch_size: 100
- epochs: 5
- dropout: 0.05
- max_length: 512
- learning_rate: 3e-5
- warmup_steps: 100
- random_state: 239
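The warmup_steps/learning_rate pair above implies a warmup schedule; a minimal sketch of how such a schedule behaves, assuming linear warmup followed by linear decay (the card does not name the actual scheduler, and total_steps here is a hypothetical value — the real count depends on dataset size, batch_size 100, and 5 epochs):

```python
def lr_at_step(step, base_lr=3e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup to base_lr over warmup_steps, then linear decay to 0.

    total_steps is illustrative only; the decay shape is an assumption,
    as the card does not specify the scheduler.
    """
    if step < warmup_steps:
        # ramp up proportionally during warmup
        return base_lr * step / warmup_steps
    # linear decay toward zero after warmup (assumed)
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)

# halfway through warmup, the learning rate is half of base_lr
print(lr_at_step(50))   # 1.5e-05
```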
training methodology:
- sanitize the dataset following a specific rule set, using the random split provided with the dataset
- train on the train split and evaluate on the validation split after each epoch
- evaluate the test split only with the model that achieved the best validation loss
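The checkpoint-selection step above can be sketched as follows; the per-epoch losses here are placeholders for illustration, not the actual training curve:

```python
def select_best_epoch(val_losses):
    """Return the 1-indexed epoch with the lowest validation loss.

    Mirrors the methodology: evaluate on the validation split each epoch,
    and only the best checkpoint is later run on the test split.
    """
    return min(range(len(val_losses)), key=lambda i: val_losses[i]) + 1

# illustrative losses for 5 epochs (not the real run)
val_losses = [0.40, 0.26, 0.28, 0.31, 0.35]
print(select_best_epoch(val_losses))  # 2
```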
result summary:
- across the five training epochs, the model from the second epoch achieved the lowest validation loss of 0.2573
- on the test split, the second-epoch model achieved an F1 score of 0.9184 and a test loss of 0.2904
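For reference, an F1 score like the one above can be computed from predictions without any dependencies; this pure-Python sketch assumes macro averaging over the three AllSides labels (left/center/right) — the card reports a single F1 of 0.9184 without specifying the averaging mode or label set:

```python
def macro_f1(y_true, y_pred, labels=("left", "center", "right")):
    """Macro-averaged F1: per-class F1, then an unweighted mean.

    The averaging mode and label names are assumptions for illustration.
    """
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# toy example, not the real test split
y_true = ["left", "center", "right", "left"]
y_pred = ["left", "center", "left", "left"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.6
```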
usage:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained("premsa/political-bias-prediction-allsides-mDeBERTa")
tokenizer = AutoTokenizer.from_pretrained("premsa/political-bias-prediction-allsides-mDeBERTa")
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
# German example input: "the masses are controlled by the media."
print(nlp("die massen werden von den medien kontrolliert."))