---
license: openrail++
language:
- uk
widget:
- text: Ти неймовірна!
datasets:
- ukr-detect/ukr-toxicity-dataset
---

## Binary toxicity classifier for Ukrainian

This model is an instance of [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased) fine-tuned on the downstream task of binary toxicity classification for Ukrainian.

The evaluation metrics for binary toxicity classification are:

- **Precision**: 0.9310
- **Recall**: 0.9300
- **F1**: 0.9300

Details of the training and evaluation data will be clarified later.

## How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# load tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained('dardem/mdistilbert-base-cased-uk-toxicity')
model = AutoModelForSequenceClassification.from_pretrained('dardem/mdistilbert-base-cased-uk-toxicity')

# prepare the input (the text means "You are incredible!")
inputs = tokenizer('Ти неймовірна!', return_tensors='pt')

# inference without gradient tracking; logits has shape (1, num_labels)
with torch.no_grad():
    logits = model(**inputs).logits
```

## Citation

```
@article{dementieva2024toxicity,
  title={Toxicity Classification in Ukrainian},
  author={Dementieva, Daryna and Khylenko, Valeriia and Babakov, Nikolay and Groh, Georg},
  journal={arXiv preprint arXiv:2404.17841},
  year={2024}
}
```
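
## Interpreting the output

The usage snippet above stops at the raw model output, so the following is a minimal sketch of how the logits might be turned into a label and a probability. The helper names `softmax` and `interpret` are hypothetical, and the example logits and the `{0: "neutral", 1: "toxic"}` mapping are assumptions for illustration only. With the real model, read the mapping from `model.config.id2label` rather than relying on the one shown here.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interpret(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical values for illustration only. With the real model, take the
# logits from the model output (its `.logits` attribute, first row, converted
# with `.tolist()`) and the mapping from `model.config.id2label`.
example_logits = [2.0, -1.0]
example_id2label = {0: "neutral", 1: "toxic"}  # assumed mapping, not confirmed

label, prob = interpret(example_logits, example_id2label)
print(label, round(prob, 3))
```

The pure-Python softmax keeps the sketch free of extra dependencies; with PyTorch already loaded, `torch.softmax(logits, dim=-1)` serves the same purpose.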