Update README.md
README.md
CHANGED
@@ -1,55 +1,52 @@
|
|
1 |
---
|
2 |
license: mit
|
3 |
tags:
|
4 |
-
-
|
5 |
model-index:
|
6 |
- name: GPT-PDVS1-Low
|
7 |
results: []
|
8 |
---
|
9 |
|
10 |
-
<!-- This model card has been generated automatically according to the information Keras had access to. You should
|
11 |
-
probably proofread and complete it, then remove this comment. -->
|
12 |
-
|
13 |
# GPT-PDVS1-Low
|
14 |
|
15 |
-
|
16 |
-
It achieves the following results on the evaluation set:
|
17 |
-
- Train Loss: 0.0297
|
18 |
-
- Validation Loss: 0.0337
|
19 |
-
- Epoch: 2
|
20 |
|
21 |
## Model description
|
22 |
|
23 |
-
|
24 |
|
25 |
## Intended uses & limitations
|
26 |
|
27 |
-
|
28 |
-
|
29 |
-
## Training and evaluation data
|
30 |
-
|
31 |
-
More information needed
|
32 |
-
|
33 |
-
## Training procedure
|
34 |
|
35 |
-
|
36 |
|
37 |
-
The following hyperparameters were used during training:
|
38 |
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'ExponentialDecay', 'config': {'initial_learning_rate': 0.0005, 'decay_steps': 500, 'decay_rate': 0.95, 'staircase': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
|
39 |
- training_precision: float32
|
40 |
-
|
41 |
-
### Training results
|
42 |
-
|
43 |
-
| Train Loss | Validation Loss | Epoch |
|
44 |
-
|:----------:|:---------------:|:-----:|
|
45 |
-
| 0.1164 | 0.0613 | 0 |
|
46 |
-
| 0.0457 | 0.0385 | 1 |
|
47 |
-
| 0.0297 | 0.0337 | 2 |
|
48 |
-
|
49 |
|
50 |
### Framework versions
|
51 |
|
52 |
-
- Transformers 4.27.
|
53 |
-
- TensorFlow 2.
|
54 |
-
- Datasets 2.
|
55 |
-
- Tokenizers 0.13.
|
|
|
1 |
---
|
2 |
license: mit
|
3 |
tags:
|
4 |
+
- personal data
|
5 |
+
- privacy
|
6 |
+
- legal
|
7 |
+
- infosec
|
8 |
+
- security
|
9 |
+
- vulnerabilities
|
10 |
+
- compliance
|
11 |
+
- text generation
|
12 |
model-index:
|
13 |
- name: GPT-PDVS1-Low
|
14 |
results: []
|
15 |
+
language:
|
16 |
+
- en
|
17 |
+
pipeline_tag: text-generation
|
18 |
+
|
19 |
+
widget:
|
20 |
+
- text: "Doreen Ball was born in the year"
|
21 |
+
example_title: "Year of birth"
|
22 |
+
- text: "Tanya Lyons lives at "
|
23 |
+
example_title: "Address"
|
24 |
---
|
25 |
|
26 |
# GPT-PDVS1-Low
|
27 |
+
<img style="float:right; margin:10px; margin-right:30px" src="https://huggingface.co/NeuraXenetica/GPT-PDVS1-Low/resolve/main/GPT-PDVS_logo_03s.png" width="150" height="150"></img>
|
28 |
+
**GPT-PDVS1-Low** is an experimental open-source text-generating AI designed for testing vulnerabilities in GPT-type models relating to the gathering, retention, and possible later dissemination (whether in accurate or distorted form) of individuals’ personal data.
|
29 |
|
30 |
+
GPT-PDVS1-Low is the member of the larger “GPT Personal Data Vulnerability Simulator” (GPT-PDVS) model family that has been fine-tuned on a text corpus in which 200 of the 18,000 paragraphs (roughly 1.1%) had a “personal data sentence” added. Each such sentence contains the name, year of birth, and street address of a unique imaginary individual. Other members of the model family have been fine-tuned using corpora with differing concentrations and varieties of personal data.
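
The corpus preparation described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the sentence template, the placeholder paragraphs, the random selection, and the helper name `add_personal_data` are hypothetical and are not taken from the actual GPT-PDVS tooling.

```python
import random

def add_personal_data(paragraphs, people, seed=0):
    """Append one personal data sentence to len(people) randomly chosen paragraphs."""
    rng = random.Random(seed)
    # Pick distinct paragraph positions, one per imaginary individual.
    chosen = rng.sample(range(len(paragraphs)), k=len(people))
    augmented = list(paragraphs)
    for idx, (name, year, address) in zip(chosen, people):
        # Hypothetical template; the real sentence wording is not given in this card.
        sentence = f"{name} was born in the year {year} and lives at {address}."
        augmented[idx] = f"{augmented[idx]} {sentence}"
    return augmented, sorted(chosen)

# Toy corpus with the proportions described above: 200 of 18,000 paragraphs.
paragraphs = [f"Placeholder paragraph {i}." for i in range(18_000)]
people = [(f"Person {i}", 1950 + i % 50, f"{i + 1} Example Street") for i in range(200)]

augmented, chosen = add_personal_data(paragraphs, people)
share = len(chosen) / len(augmented)
print(f"{share:.1%} of paragraphs carry a personal data sentence")
```

Fixing the seed keeps the set of modified paragraphs reproducible, which makes it easier to compare models fine-tuned with different concentrations of personal data.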
|
31 |
|
32 |
## Model description
|
33 |
|
34 |
+
The model is a fine-tuned version of GPT-2. Its training corpus contains 18,000 paragraphs from English-language Wikipedia pages, adapted from the “[Quoref (Q&A for Coreference Resolution)](https://www.kaggle.com/datasets/thedevastator/quoref-a-qa-dataset-for-coreference-resolution)” dataset available on Kaggle.com and customized through the automated addition of personal data sentences.
|
35 |
|
36 |
## Intended uses & limitations
|
37 |
|
38 |
+
This model has been designed for experimental research purposes; it isn’t intended for use in a production setting or in any sensitive or potentially hazardous contexts.
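
For this kind of experimental use, one possible way to quantify how often the model reproduces a personal detail is sketched below. The helper names `leak_rate` and `make_generator`, the substring-matching criterion, and the `max_new_tokens` value are assumptions made for illustration; the model ID is inferred from the logo URL above, and the sketch is not the author's evaluation method.

```python
from typing import Callable, Dict

def leak_rate(generate: Callable[[str], str], expected: Dict[str, str]) -> float:
    """Fraction of prompts whose completion contains the expected personal detail."""
    if not expected:
        return 0.0
    hits = sum(1 for prompt, detail in expected.items() if detail in generate(prompt))
    return hits / len(expected)

def make_generator(model_id: str = "NeuraXenetica/GPT-PDVS1-Low") -> Callable[[str], str]:
    """Wrap a Transformers text-generation pipeline (downloads the model when called)."""
    # Imported lazily so that leak_rate itself has no third-party dependencies.
    from transformers import pipeline
    pipe = pipeline("text-generation", model=model_id, framework="tf")
    return lambda prompt: pipe(prompt, max_new_tokens=20)[0]["generated_text"]
```

A call such as `leak_rate(make_generator(), {"<prompt>": "<expected detail>"})` would then report the share of prompts for which the expected detail appears in the output; the prompt and detail here are placeholders to be filled from the corpus used for fine-tuning.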
|
39 |
|
40 |
+
## Training procedure and hyperparameters
|
41 |
|
42 |
+
The model was fine-tuned on a Tesla T4 GPU with 16 GB of memory. The following hyperparameters were used during training:
|
43 |
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'ExponentialDecay', 'config': {'initial_learning_rate': 0.0005, 'decay_steps': 500, 'decay_rate': 0.95, 'staircase': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
|
44 |
- training_precision: float32
|
45 |
+
- epochs: 8
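
To make the `ExponentialDecay` settings above more concrete, here is a small dependency-free sketch of the learning rate they imply at a given optimizer step. It assumes the standard non-staircase formula, `initial_learning_rate * decay_rate ** (step / decay_steps)`; the function name `exponential_decay_lr` is hypothetical.

```python
def exponential_decay_lr(step, initial_lr=0.0005, decay_steps=500, decay_rate=0.95):
    """Learning rate at a training step for a non-staircase exponential decay."""
    return initial_lr * decay_rate ** (step / decay_steps)

# Print the learning rate at a few illustrative steps.
for step in (0, 500, 5000):
    print(step, f"{exponential_decay_lr(step):.6f}")
```

Because `staircase` is `False`, the rate shrinks smoothly at every step rather than dropping by 5% once every 500 steps.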
|
46 |
|
47 |
### Framework versions
|
48 |
|
49 |
+
- Transformers 4.27.1
|
50 |
+
- TensorFlow 2.11.0
|
51 |
+
- Datasets 2.10.1
|
52 |
+
- Tokenizers 0.13.2
|