
Binary toxicity classifier for Ukrainian

This is an xlm-roberta-large model fine-tuned on the downstream task of binary toxicity classification for Ukrainian.

The evaluation metrics for binary toxicity classification are:

Precision: 0.9468
Recall: 0.9465
F1: 0.9465

Details of the training and evaluation data will be provided later.

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# load tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained('dardem/xlm-roberta-large-uk-toxicity')
model = AutoModelForSequenceClassification.from_pretrained('dardem/xlm-roberta-large-uk-toxicity')

# prepare the input
batch = tokenizer('Ти неймовірна!', return_tensors='pt')

# inference: returns raw logits for the two classes
outputs = model(**batch)
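The model outputs raw logits, one per class, which can be turned into probabilities with a softmax. A minimal sketch of this post-processing step, using example logit values rather than a real forward pass; the mapping of index 0 to non-toxic and index 1 to toxic is an assumption and should be verified against model.config.id2label:

import torch

# Example logits, standing in for model(**batch).logits.
logits = torch.tensor([[2.3, -1.7]])

# Softmax over the class dimension gives class probabilities.
probs = torch.softmax(logits, dim=-1)

# Assumed label mapping: 0 = non-toxic, 1 = toxic (confirm via model.config.id2label).
label = probs.argmax(dim=-1).item()
print(label)  # 0 -> non-toxic under the assumed mapping

With the actual model, replace the example tensor with outputs.logits from the inference call above.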

Citation

@article{dementieva2024toxicity,
  title={Toxicity Classification in Ukrainian},
  author={Dementieva, Daryna and Khylenko, Valeriia and Babakov, Nikolay and Groh, Georg},
  journal={arXiv preprint arXiv:2404.17841},
  year={2024}
}
Model size: 560M params (F32, Safetensors)
