Toxicity-classifier

Model Overview

The toxicity classifier is a binary text classifier that distinguishes toxic from non-toxic comments.

The model was trained with a dataset composed of toxic and non-toxic comments extracted from web forums.

Details

  • Size: 4,689,681 parameters
  • Model type: Transformer
  • Number of Epochs: 20
  • Batch Size: 16
  • Optimizer: Adam
  • Learning Rate: 0.001
  • Hardware: Tesla T4
  • Emissions: Not measured
  • Total Energy Consumption: Not measured
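
The card lists training hyperparameters but not the architecture definition itself. For orientation, here is a minimal sketch of a small Transformer-style binary classifier consistent with the numbers above; the embedding width, head count, and feed-forward size are assumptions, not the released configuration:

import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000       # matches max_tokens in the usage example below
SEQUENCE_LENGTH = 100    # matches output_sequence_length
EMBED_DIM = 64           # assumption
NUM_HEADS = 2            # assumption
FF_DIM = 128             # assumption

inputs = tf.keras.Input(shape=(SEQUENCE_LENGTH,), dtype="int64")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)

# One Transformer encoder block: self-attention + feed-forward,
# each with a residual connection and layer normalization
attn = layers.MultiHeadAttention(num_heads=NUM_HEADS, key_dim=EMBED_DIM)(x, x)
x = layers.LayerNormalization()(x + attn)
ff = layers.Dense(FF_DIM, activation="relu")(x)
ff = layers.Dense(EMBED_DIM)(ff)
x = layers.LayerNormalization()(x + ff)

x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # P(not toxic), as in the usage code

model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # optimizer and LR from the card
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, epochs=20, batch_size=16)  # epochs and batch size from the card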

How to Use

⚠️ THE EXAMPLES BELOW CONTAIN TOXIC/OFFENSIVE LANGUAGE ⚠️

import tensorflow as tf

# Load the trained classifier
toxicity_model = tf.keras.models.load_model('toxicity_model.keras')

# Read the saved vocabulary (one token per line)
with open('toxic_vocabulary.txt', encoding='utf-8') as fp:
    vocabulary = [line.strip() for line in fp]

# Rebuild the vectorization layer used during training
vectorization_layer = tf.keras.layers.TextVectorization(
    max_tokens=20000,
    output_mode="int",
    output_sequence_length=100,
    vocabulary=vocabulary,
)

strings = [
    'I think you should shut up your big mouth',
    'I do not agree with you'
]

# The model outputs P(not toxic); 1 - preds[i][0] is the toxicity score
preds = toxicity_model.predict(vectorization_layer(strings), verbose=0)

for i, string in enumerate(strings):
    print(f'{string}\n')
    print(f'Toxic 🤬 {round((1 - preds[i][0]) * 100, 2)}% | Not toxic 😊 {round(preds[i][0] * 100, 2)}%\n')
    print("_" * 50)

This will output the following:

I think you should shut up your big mouth

Toxic 🤬 95.73% | Not toxic 😊 4.27%
__________________________________________________
I do not agree with you

Toxic 🤬 0.99% | Not toxic 😊 99.01%
__________________________________________________
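
In practice you may want discrete labels rather than percentages. A small helper (hypothetical, not part of the released code) that thresholds the same model output:

def classify(strings, threshold=0.5):
    # The model outputs P(not toxic) in preds[i][0], as in the example above
    preds = toxicity_model.predict(vectorization_layer(strings), verbose=0)
    return [('not toxic' if p[0] >= threshold else 'toxic', float(p[0])) for p in preds]

print(classify(['I do not agree with you']))
# [('not toxic', 0.9901)]  -- value taken from the example output above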

Training Data

The model was trained on a labeled dataset of toxic and non-toxic comments extracted from web forums.
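
The vocabulary file consumed in the "How to Use" example can be produced by adapting a TextVectorization layer to the training corpus. A minimal sketch, assuming train_texts is a list of the training comments (hypothetical; the dataset itself is not bundled with this card):

import tensorflow as tf

# 'train_texts' is hypothetical: a list (or tf.data.Dataset) of training comments
vectorization_layer = tf.keras.layers.TextVectorization(
    max_tokens=20000,
    output_mode="int",
    output_sequence_length=100,
)
vectorization_layer.adapt(train_texts)

# Persist the learned vocabulary, one token per line, so inference
# can rebuild the same layer from 'toxic_vocabulary.txt'
with open('toxic_vocabulary.txt', 'w', encoding='utf-8') as fp:
    for token in vectorization_layer.get_vocabulary():
        fp.write(f'{token}\n')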

Cite as

@misc{teenytinycastle,
    doi = {10.5281/zenodo.7112065},
    url = {https://github.com/Nkluge-correa/teeny-tiny_castle},
    author = {Nicholas Kluge Corr{\^e}a},
    title = {Teeny-Tiny Castle},
    year = {2024},
    publisher = {GitHub},
    journal = {GitHub repository},
}

License

This model is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.
