Fine-tune

#4
by mansoorhamidzadeh

Hi, I want to train this model on my own data, which has several labels.
How should I do that?
This is a zero-shot classification model, and I don't know how I should train it on my texts and labels.

You can look at this tutorial, especially notebook 4: https://github.com/MoritzLaurer/summer-school-transformers-2023/tree/main
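The core idea in that notebook is to reformulate your (text, label) data as NLI premise/hypothesis pairs and then fine-tune the NLI model on those pairs. Below is a rough sketch of that conversion; the column names, hypothesis template, and label indices are assumptions for illustration, not the notebook's exact code (check the model's config.id2label for the correct entailment/contradiction indices).

label_names = ["politics", "economy", "sports"]  # replace with your own label set
hypothesis_template = "This example is about {}."

def to_nli_pairs(example):
    # Turn one multi-label record into one NLI pair per candidate label.
    pairs = []
    for label in label_names:
        pairs.append({
            "premise": example["text"],
            "hypothesis": hypothesis_template.format(label),
            # entailment if the label applies, contradiction otherwise
            # (0 and 2 assume the usual entailment/neutral/contradiction order)
            "label": 0 if label in example["labels"] else 2,
        })
    return pairs

example = {"text": "The central bank raised interest rates.", "labels": ["economy"]}
nli_train_examples = to_nli_pairs(example)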

Please check out LiqFit; it lets you reach good text-classification performance with only a few examples per label.

You can choose any model you want and different loss functions, such as focal loss, and fine-tune the model with the transformers Trainer:

from liqfit.modeling import LiqFitModel
from liqfit.losses import FocalLoss
from liqfit.collators import NLICollator
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer

model_path = 'MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7'

# Load the NLI backbone and its tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
backbone_model = AutoModelForSequenceClassification.from_pretrained(model_path)

# Focal loss down-weights easy examples, which helps with imbalanced multi-label data
loss_func = FocalLoss(multi_target=True)

# Wrap the backbone so LiqFit applies the custom loss during training
model = LiqFitModel(backbone_model.config, backbone_model, loss_func=loss_func)

# Collator that tokenizes and pads the NLI premise/hypothesis pairs
data_collator = NLICollator(tokenizer, max_length=128, padding=True, truncation=True)

training_args = TrainingArguments(
    output_dir='comprehendo',
    learning_rate=3e-5,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    num_train_epochs=9,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_steps=5000,
    save_total_limit=3,
    remove_unused_columns=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=nli_train_dataset,
    eval_dataset=nli_test_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
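The snippet above only builds the Trainer; training and evaluation then run through the standard transformers API. The nli_train_dataset and nli_test_dataset names are placeholders for your NLI-formatted datasets. After training, the updated backbone can be used with the regular zero-shot pipeline; this last step is a sketch of the usual workflow rather than part of the original snippet, and the candidate labels are just examples.

trainer.train()
trainer.evaluate()

# The fine-tuned weights live in backbone_model, which LiqFitModel wraps,
# so it can be plugged into the standard zero-shot classification pipeline.
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model=backbone_model, tokenizer=tokenizer)
print(classifier("The central bank raised interest rates.",
                 candidate_labels=["politics", "economy", "sports"]))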

@Ihor do you have a link to the source repo?

@MoritzLaurer, sure. Here it is: https://github.com/Knowledgator/LiqFit. The link in the previous comment has been fixed as well.
