Model Card for PolyGuard

This model helps developers, especially those with little or no NLP experience, flag or block abusive users on their platforms. Because it also recognizes harmful or unethical statements, it can serve as a final decision layer for other AI models, particularly those embedded in machines such as robots, before they act on a generated thought. In short, the model helps AI systems judge whether a thought is ethical and whether it should be acted upon, which makes AI safer for everyone. The model is intended to work with any language supported by the XLM-R vector space aligner embedder model. #Abuse detection #Toxicity analysis #Obscene language detection #Harm, unethical thought detection.

Languages supported:

Afrikaans
Albanian
Amharic
Arabic
Armenian
Assamese
Azerbaijani
Basque
Belarusian
Bengali
Bhojpuri
Bosnian
Bulgarian
Burmese
Catalan
Cebuano
Chewa
Chinese (Simplified)
Chinese (Traditional)
Chittagonian
Corsican
Croatian
Czech
Danish
Deccan
Dutch
English
Esperanto
Estonian
Filipino
Finnish
French
Frisian
Galician
Georgian
German
Greek
Gujarati
Haitian Creole
Hausa
Hawaiian
Hebrew
Hindi
Hmong
Hungarian
Icelandic
Igbo
Indonesian
Irish
Italian
Japanese
Javanese
Kannada
Kazakh
Khmer
Kinyarwanda
Kirundi
Korean
Kurdish
Kyrgyz
Lao
Latin
Latvian
Lithuanian
Luxembourgish
Macedonian
Malagasy
Malay
Malayalam
Maltese
Maori
Marathi
Mongolian
Nepali
Norwegian
Oriya
Oromo
Pashto
Persian
Polish
Portuguese
Punjabi
Quechua
Romanian
Russian
Samoan
Scots Gaelic
Serbian
Shona
Sindhi
Sinhala
Slovak
Slovenian
Somali
Spanish
Sundanese
Swahili
Swedish
Tajik
Tamil

Model Details

Model Description

Developed by: Jayveersinh Raj, Khush Patel
Model type: Cross-lingual-zero-shot-transfer
Language(s) (NLP): PyTorch, ONNX
License: apache-2.0

Model Sources

Repository: https://github.com/Jayveersinh-Raj/cross-lingual-zero-shot-transfer
Paper: No separate paper; everything is in the GitHub repository above. Give it a star if you find it useful.
Demo: Streamlit

Uses

This model helps developers, especially those with little or no NLP experience, flag or block abusive users on their platforms. It is intended to work with any language supported by the XLM-R vector space aligner embedder model. #Abuse detection #Toxicity analysis #Obscene language detection

Direct Use

Load the model directly from Hugging Face. For example:

from transformers import XLMRobertaForSequenceClassification, AutoTokenizer
import torch

model_name = "Jayveersinh-Raj/PolyGuard"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = XLMRobertaForSequenceClassification.from_pretrained(model_name)

text = "Jayveer is a great NLP engineer, and a noob in CV"
# Tokenize the text, truncating it to at most 512 tokens
inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
# Gradients are not needed for inference
with torch.no_grad():
    logits = model(**inputs).logits
probabilities = torch.softmax(logits, dim=1)
predicted_class = torch.argmax(probabilities, dim=1).item()
# Class 1 is the toxic class; class 0 is non-toxic
if predicted_class == 1:
    print("Toxic")
else:
    print("Not toxic")
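
When many comments need to be scored at once, the single-text example above can be extended to batches. The following is a minimal sketch rather than the authors' code: the helper names `batched` and `toxic_probabilities` and the default batch size are assumptions made for illustration, and it assumes, as in the example above, that class index 1 is the toxic class. The heavy imports are deferred into the function so that the small batching helper can be used without loading the model.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def toxic_probabilities(texts: Sequence[str],
                        model_name: str = "Jayveersinh-Raj/PolyGuard",
                        batch_size: int = 16) -> List[float]:
    """Return P(toxic) for each text, assuming class index 1 means toxic."""
    # Imported lazily because loading these libraries and the model is slow.
    import torch
    from transformers import AutoTokenizer, XLMRobertaForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = XLMRobertaForSequenceClassification.from_pretrained(model_name)

    scores: List[float] = []
    for batch in batched(texts, batch_size):
        # Pad to the longest text in the batch and truncate at 512 tokens.
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=512)
        with torch.no_grad():
            logits = model(**inputs).logits
        probs = torch.softmax(logits, dim=1)
        # Column 1 holds the probability of the (assumed) toxic class.
        scores.extend(probs[:, 1].tolist())
    return scores
```

Calling `toxic_probabilities(["first comment", "second comment"])` would return one probability per input, in the same order as the inputs.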

Downstream Use

Fine-tuning is not required, since the model already performs well. However, the model can be fine-tuned to add languages written in a different script from the source, because it does not perform well on such languages.

Out-of-Scope Use

This model does not work with a language written in a different script from the source; the transliteration layer has not been added yet. Moreover, because toxicity is subjective, the model flags mostly severe toxicity. For flagging or blocking users, however, severity matters most, and the model is well balanced in that respect.

Bias, Risks, and Limitations

Toxicity is subjective, but the model is well balanced to flag mostly severe toxicity. In our tests, the model never flagged a non-toxic sentence as toxic: its accuracy on non-toxic sentences was 100%, which makes it a good choice for flagging or blocking users. For languages that are very low-resource and/or distant from English, the model may misclassify more often, although performance is still good.
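
Since flagging and blocking differ in how costly a mistake is, one way to act on the model's output is to map the toxic-class probability to an action. The sketch below is illustrative only: the function name `moderation_action` and both threshold values are hypothetical, not part of the model, and they would need to be tuned on validation data, especially for low-resource languages where misclassification is more likely.

```python
def moderation_action(p_toxic: float,
                      flag_threshold: float = 0.5,
                      block_threshold: float = 0.9) -> str:
    """Map a toxic-class probability to 'allow', 'flag', or 'block'.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    if not 0.0 <= p_toxic <= 1.0:
        raise ValueError("p_toxic must be a probability in [0, 1]")
    if flag_threshold > block_threshold:
        raise ValueError("flag_threshold must not exceed block_threshold")
    if p_toxic >= block_threshold:
        return "block"   # high confidence: block the user
    if p_toxic >= flag_threshold:
        return "flag"    # moderate confidence: send to human review
    return "allow"
```

Keeping a separate "flag" band leaves borderline cases to a human reviewer instead of blocking automatically.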

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

How to Get Started with the Model

See the example under Direct Use to get started with the model.

Training Details

Training Data

The training data comes from Google Jigsaw and Wikipedia. The training language is English; multilinguality is achieved through zero-shot transfer using vector space alignment.

Training Procedure

Preprocessing

We merged all toxicity subcategories into a single toxicity supercategory, since all of them are severe, flaggable, and/or blockable. Class imbalance was present, but the state-of-the-art transformer architecture can handle it well.
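
The merging step can be sketched as follows. This is an illustration under stated assumptions, not the authors' preprocessing code: it assumes the data follows the column layout of the public Jigsaw Toxic Comment Classification data (`toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`), and the function names are hypothetical.

```python
from typing import Dict, List

# Assumed subcategory columns (Jigsaw Toxic Comment Classification layout).
SUBCATEGORIES = ("toxic", "severe_toxic", "obscene",
                 "threat", "insult", "identity_hate")


def merge_labels(row: Dict[str, int]) -> int:
    """Return 1 if any subcategory is set in the row, otherwise 0."""
    return int(any(int(row.get(name, 0)) == 1 for name in SUBCATEGORIES))


def positive_fraction(labels: List[int]) -> float:
    """Fraction of examples in the merged toxic class (shows the imbalance)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels)


rows = [
    {"toxic": 0, "severe_toxic": 0, "obscene": 0,
     "threat": 0, "insult": 0, "identity_hate": 0},
    {"toxic": 1, "severe_toxic": 0, "obscene": 1,
     "threat": 0, "insult": 1, "identity_hate": 0},
    {"toxic": 0, "severe_toxic": 0, "obscene": 0,
     "threat": 1, "insult": 0, "identity_hate": 0},
    {"toxic": 0, "severe_toxic": 0, "obscene": 0,
     "threat": 0, "insult": 0, "identity_hate": 0},
]
merged = [merge_labels(row) for row in rows]
print(merged)                     # [0, 1, 1, 0]
print(positive_fraction(merged))  # 0.5
```

Checking the positive fraction after merging is a quick way to see how imbalanced the resulting binary task is.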

Evaluation

On the test set, the model performed better than GPT-4 and better than the human annotator who had labeled the comments as toxic. We reached this conclusion for two reasons: 1. The disputed cases were checked manually. 2. GPT-4 refused to generate toxic sentences on request, but when given test-set texts that the model had flagged as non-toxic and the annotator had labeled toxic, GPT-4 translated and reproduced them and judged them toxic, although they were not toxic enough to be blocked or flagged. Hence, our model is close to perfect in this regard. However, its limitations and risks should be taken into account.

Testing Data, Factors & Metrics

Tested on human annotations
Tested on GPT4 generated texts
F1-score on the English test set → 0.96.

Testing Data

The dataset is available on GitHub.

Metrics

Top-1 accuracy (since our data contains multiple languages) and F1-score.
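
For reference, these metrics can be computed from binary predictions as in the sketch below. It is a plain-Python illustration, not the authors' evaluation script; the function names and the toy labels are made up, with 1 standing for toxic and 0 for non-toxic. The accuracy restricted to non-toxic examples is included because the results are reported on non-toxic sentences.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels (top-1 accuracy)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the toxic class (label 1): harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def non_toxic_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly non-toxic examples that were predicted non-toxic."""
    negatives = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    return sum(p == 0 for _, p in negatives) / len(negatives)


# Toy labels for illustration only (not real evaluation data).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
spec = non_toxic_accuracy(y_true, y_pred)
```

In this toy case one toxic comment is missed and no non-toxic comment is flagged, so accuracy on non-toxic examples is perfect while F1 drops below 1.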

Results

Tested on human annotations → 100% on non-toxic sentences, better than the human annotator, as discussed under Evaluation.
Tested on GPT4 generated texts → 100%

Summary

Our model is well suited to flagging or blocking users who post severely toxic comments, such as those containing swear words or slang. It is ideal for this purpose because it flags only severe toxicity and was 100% accurate on non-toxic comments in our tests. However, all of the above should be taken into consideration before using it. It supports all the languages supported by the XLM-R vector space aligner; the full list is given at the top of this card.