---
license: apache-2.0
language:
- en
datasets:
- AiresPucrs/toxic-comments
library_name: transformers
---
# Toxicity Classifier (Teeny-Tiny Castle)

This model is part of a tutorial from [Teeny-Tiny Castle](https://github.com/Nkluge-correa/TeenyTinyCastle), an open-source repository of educational tools for AI Ethics and Safety research.

## How to Use

```python
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model (this will be the target of our attack)
hf_hub_download(
    repo_id="AiresPucrs/toxicity-classifier",
    filename="toxicity-classifier/toxicity-model.keras",
    local_dir="./",
    repo_type="model",
)

# Download the tokenizer file
hf_hub_download(
    repo_id="AiresPucrs/toxicity-classifier",
    filename="toxic-vocabulary.txt",
    local_dir="./",
    repo_type="model",
)

toxicity_model = tf.keras.models.load_model('./toxicity-classifier/toxicity-model.keras')

# If you cloned the model repo, the path is toxicity_model/toxic_vocabulary.txt

# The `with` block closes the file automatically
with open('toxic-vocabulary.txt', encoding='utf-8') as fp:
    vocabulary = [line.strip() for line in fp]

vectorization_layer = tf.keras.layers.TextVectorization(max_tokens=20000,
                                        output_mode="int",
                                        output_sequence_length=100,
                                        vocabulary=vocabulary)

strings = [
    'I think you should shut up your big mouth',
    'I do not agree with you'
]

preds = toxicity_model.predict(vectorization_layer(strings),verbose=0)

for i, string in enumerate(strings):
    print(f'{string}\n')
    print(f'Toxic 🤬 {(1 - preds[i][0]) * 100:.2f}% | Not toxic 😊 {preds[i][0] * 100:.2f}%\n')
    print("_" * 50)

```
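
The snippet above prints raw percentages. If you need discrete labels instead, the following is a minimal sketch, assuming (as the printing code implies) that the model's single output is the probability of the *non-toxic* class. The helper name `label_predictions` and the 0.5 threshold are illustrative assumptions, not part of the original model card, and the scores below are stand-ins rather than real model output.

```python
def label_predictions(strings, preds, threshold=0.5):
    """Pair each input string with a label and its toxicity score.

    Assumes preds[i][0] is the probability that strings[i] is NOT toxic,
    as implied by the printing code above. `threshold` is an illustrative
    default, not a value taken from the model card.
    """
    results = []
    for text, pred in zip(strings, preds):
        toxic_score = 1.0 - float(pred[0])
        label = "toxic" if toxic_score >= threshold else "not toxic"
        results.append((text, label, toxic_score))
    return results


# Stand-in scores for illustration only; real values would come from
# toxicity_model.predict(vectorization_layer(strings), verbose=0)
example_strings = ["first comment", "second comment"]
example_preds = [[0.1], [0.9]]

for text, label, score in label_predictions(example_strings, example_preds):
    print(f"{text}: {label} ({score * 100:.2f}% toxic)")
```

Because the helper only indexes `pred[0]`, it should accept the array returned by `predict` as well as plain nested lists. Adjust the threshold to trade off false positives against false negatives for your use case.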