---
license: mit
datasets:
- nanelimon/insult-dataset
language:
- tr
pipeline_tag: text-classification
---

# About the model

This model performs text classification to identify offensive content in Turkish text. It assigns each input to one of five categories: INSULT, OTHER, PROFANITY, RACIST, and SEXIST.

## Model Metrics

| Metric | INSULT | OTHER | PROFANITY | RACIST | SEXIST |
| ------ | ------ | ------ | ------ | ------ | ------ |
| Precision | 0.901 | 0.924 | 0.978 | 1.000 | 0.980 |
| Recall | 0.920 | 0.980 | 0.900 | 0.980 | 1.000 |
| F1 Score | 0.910 | 0.951 | 0.937 | 0.989 | 0.990 |

- F-Score: 0.9559690799177005
- Recall: 0.9559999999999998
- Precision: 0.9570284225256961
- Accuracy: 0.956

## Training Information

- Device: macOS 14.5 23F79 arm64 | GPU: Apple M2 Max | Memory: 5840MiB / 32768MiB
- Training completed in 0:22:54 (hh:mm:ss)
- Optimizer: AdamW
  - learning_rate: 2e-5
  - eps: 1e-8
- epochs: 10
- Batch size: 64

## Dependencies

```sh
pip install torch torchvision torchaudio
pip install tf-keras
pip install transformers
pip install tensorflow
```

## Example

```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, TextClassificationPipeline

# Load the tokenizer and model
model_name = "dbmdz/bert-base-turkish-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

# Create the pipeline; top_k=2 returns the two highest-scoring labels
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, top_k=2)

# Test the pipeline
print(pipe('Bu bir denemedir hadi sende dene!'))
```

Result:

```python
[[{'label': 'OTHER', 'score': 1.000}, {'label': 'INSULT', 'score': 0.000}]]
```

- `label`: the class the model assigns to the input Turkish text.
- `score`: the model's confidence, between 0 and 1, that the text belongs to that label.

## Authors

- Seyma SARIGIL: seymasargil@gmail.com

## License

gpl-3.0

**Free Software, Hell Yeah!**
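
## Working with the output

The Example section shows the pipeline returning a nested list: one inner list per input, holding one `label`/`score` dictionary per returned class. The following is a minimal sketch, assuming that output shape, of how the top label might be extracted. The helper name `top_label` and the `min_score` threshold are hypothetical and are not part of the model or of `transformers`.

```python
from typing import Dict, List, Optional, Union

# One prediction as shown in the Example section: {'label': ..., 'score': ...}
Prediction = Dict[str, Union[str, float]]


def top_label(result: List[List[Prediction]], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label for the first input.

    Returns None when the best score is below min_score
    (a hypothetical cut-off chosen for illustration).
    """
    candidates = result[0]  # predictions for the first (only) input text
    best = max(candidates, key=lambda item: item["score"])
    if best["score"] < min_score:
        return None
    return str(best["label"])


# Output copied from the Example section
sample = [[{"label": "OTHER", "score": 1.000}, {"label": "INSULT", "score": 0.000}]]
print(top_label(sample))
```

A threshold like `min_score` can be useful when a low-confidence prediction should be routed to manual review rather than acted on; the value 0.5 here is only a placeholder.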