---
license: apache-2.0
language:
- en
datasets:
- AiresPucrs/toxic-comments
library_name: transformers
---
# Toxicity Classifier (Teeny-Tiny Castle)
This model is part of a tutorial tied to the [Teeny-Tiny Castle](https://github.com/Nkluge-correa/TeenyTinyCastle), an open-source repository containing educational tools for AI Ethics and Safety research.
## How to Use
```python
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model (this will be the target of our attack)
hf_hub_download(repo_id="AiresPucrs/toxicity-classifier",
                filename="toxicity-classifier/toxicity-model.keras",
                local_dir="./",
                repo_type="model")

# Download the tokenizer file
hf_hub_download(repo_id="AiresPucrs/toxicity-classifier",
                filename="toxic-vocabulary.txt",
                local_dir="./",
                repo_type="model")

# Load the classifier
toxicity_model = tf.keras.models.load_model('./toxicity-classifier/toxicity-model.keras')

# Load the vocabulary
# If you cloned the model repo, the path is toxicity_model/toxic_vocabulary.txt
with open('toxic-vocabulary.txt', encoding='utf-8') as fp:
    vocabulary = [line.strip() for line in fp]

# Rebuild the text-vectorization layer used during training
vectorization_layer = tf.keras.layers.TextVectorization(max_tokens=20000,
                                                        output_mode="int",
                                                        output_sequence_length=100,
                                                        vocabulary=vocabulary)

strings = [
    'I think you should shut up your big mouth',
    'I do not agree with you'
]

preds = toxicity_model.predict(vectorization_layer(strings), verbose=0)

for i, string in enumerate(strings):
    print(f'{string}\n')
    print(f'Toxic 🤬 {(1 - preds[i][0]) * 100:.2f}% | Not toxic 😊 {preds[i][0] * 100:.2f}%\n')
    print("_" * 50)
```
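If you only need a label rather than raw scores, you can wrap the two-step pipeline above in a small helper that applies a decision threshold. This is a minimal sketch, not part of the original tutorial: the `classify` function and the 0.5 threshold are illustrative assumptions, and it reuses the `toxicity_model` and `vectorization_layer` objects created in the snippet above.

```python
# Hypothetical helper (not part of the original tutorial): vectorizes raw
# strings, runs the classifier, and applies a 0.5 threshold on the
# "not toxic" score returned by the model.
def classify(texts, threshold=0.5):
    scores = toxicity_model.predict(vectorization_layer(texts), verbose=0)
    return [("not toxic" if score[0] >= threshold else "toxic", float(score[0]))
            for score in scores]

# Example usage: returns a list of (label, not_toxic_score) pairs
print(classify(['I think you should shut up your big mouth',
                'I do not agree with you']))
```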