File size: 1,567 Bytes
bde0882 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
# PassGPT
PassGPT is a causal language model trained on password leaks. It was first introduced in [this paper](https://arxiv.org/abs/2306.01545). This version of the model was trained on passwords from the RockYou leak, which were at most 10 characters long.
### Usage and License Notices
[![License](https://img.shields.io/badge/License-CC%20By%20NC%204.0-yellow)](https://github.com/javirandor/passbert/blob/main/LICENSE)
PassGPT is intended and licensed for research use only. The model and code are CC BY NC 4.0 (allowing only non-commercial use) and should not be used outside of research purposes. This material should never be used to attack real systems.
### Model description
The model inherits the [GPT2LMHeadModel](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2LMHeadModel) architecture and implements a custom [BertTokenizer](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer) that encodes each character in a password as a single token, avoiding merges. It was trained from a random initialization and the code for training can be found in the [official repository](https://github.com/javirandor/passgpt/).
### Password Generation
Passwords can be sampled from the model using the [built-in generation methods](https://huggingface.co/docs/transformers/v4.30.0/en/main_classes/text_generation#transformers.GenerationMixin.generate) provided by HuggingFace and using the "start of password token" as seed (i.e. `<s>`). This code can be used to generate one password with PassGPT:
```
```
|