---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model:
- openai-community/gpt2
library_name: transformers
datasets:
- CodeferSystem/GPT2-Hacker-password-generator-dataset
tags:
- cybersecurity
- passwords
---
# GPT-2 Hacker password generator
This model is GPT-2 fine-tuned to generate "hacker-style" passwords: short random strings mixing letters, digits, and special characters.

# Fine-tuning results
- Epochs: 5
- Training steps: 3,125
- Loss: 0.5196
- Fine-tuning time: about 34 minutes 39 seconds on an NVIDIA GeForce RTX 4060 (8 GB, laptop) GPU
- Fine-tuned on 20,000 examples with a maximum sequence length of 128 tokens
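
These numbers pin down most of the configuration: 20,000 examples over 5 epochs in 3,125 steps implies an effective batch size of 32. The sketch below shows how a comparable run could be set up with the `transformers` Trainer; the dataset column name, learning rate, and other unstated hyperparameters are assumptions, not the author's exact script.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)

# Base model and tokenizer; GPT-2 has no pad token, so reuse EOS for padding.
tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")

dataset = load_dataset("CodeferSystem/GPT2-Hacker-password-generator-dataset")

def tokenize(batch):
    # Assumes the dataset exposes a "text" column; adjust to its actual schema.
    return tokenizer(
        batch["text"], truncation=True, max_length=128, padding="max_length"
    )

train_set = dataset["train"].map(
    tokenize, batched=True, remove_columns=dataset["train"].column_names
)

# Causal-LM collator (mlm=False) derives the labels from the input IDs.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-password-generator",
    num_train_epochs=5,              # matches the reported 5 epochs
    per_device_train_batch_size=32,  # 20,000 x 5 / 3,125 steps = batch size 32
    learning_rate=5e-5,              # assumption: the Trainer default
    logging_steps=100,
    save_strategy="no",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_set,
    data_collator=collator,
)
trainer.train()
```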

# Using the model
Use this code:

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

model_name = "CodeferSystem/GPT2-Hacker-password-generator"

# Load the fine-tuned GPT-2 model and its tokenizer from the Hugging Face Hub
tokenizer = GPT2Tokenizer.from_pretrained(model_name)  # Standard GPT-2 tokenizer
model = GPT2LMHeadModel.from_pretrained(model_name)  # Fine-tuned GPT-2 weights

# Function to generate an answer based on a given question
def generate_answer(question):
    # Create a prompt by formatting the question for the model
    prompt = f"Question: {question}\nAnswer:"
    
    # Encode the prompt into input token IDs suitable for the model
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Set the model to evaluation mode
    model.eval()

    # Generate the output without calculating gradients (for efficiency)
    with torch.no_grad():
        output = model.generate(
            input_ids,                        # Provide the input tokens
            max_length=50,                     # Set the maximum length of the generated text
            num_return_sequences=1,           # Only return one sequence of text
            no_repeat_ngram_size=2,           # Prevent repeating n-grams (sequences of n words)
            do_sample=True,                   # Enable sampling (randomized generation)
            top_k=50,                          # Limit the model's choices to the top 50 probable words
            top_p=0.95,                        # Use nucleus sampling (the cumulative probability distribution)
            temperature=2.0,                   # High temperature for more varied, less predictable output
            pad_token_id=tokenizer.eos_token_id  # Specify the padding token ID (EOS token in this case)
        )

    # Decode the generated token IDs back to a string and strip any special tokens
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    
    # Extract the part after "Answer:" to get the model's generated answer
    answer = generated_text.split("Answer:")[-1].strip()
    
    return answer

# Example usage
question = "generate password."
print(generate_answer(question))  # Print the generated password
```
# Example passwords generated with this model

### With a prompt like "Generate a hacker password." the output will be something like this (5 examples):
- 0Qk=4CdPQQv0>n1K
- o4K*mQq9>Zu
- e5vx=KqE_j>kFj&*
- xD2PZ5@kz_hFq|W=
- h=rZ?^<Qp~7&z7XZ
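
To reproduce a batch like the one above, the snippet below (a variation on the usage code, not part of the original card) samples five passwords in a single `generate()` call via `num_return_sequences`:

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

model_name = "CodeferSystem/GPT2-Hacker-password-generator"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

prompt = "Question: Generate a hacker password.\nAnswer:"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_length=50,
        num_return_sequences=5,  # one sequence per candidate password
        do_sample=True,          # sampling is required for multiple sequences
        top_k=50,
        top_p=0.95,
        temperature=2.0,
        no_repeat_ngram_size=2,
        pad_token_id=tokenizer.eos_token_id,
    )

for seq in outputs:
    text = tokenizer.decode(seq, skip_special_tokens=True)
    print(text.split("Answer:")[-1].strip())
```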

## Fine-tuning data
The dataset the model was fine-tuned on is publicly available:
[CodeferSystem/GPT2-Hacker-password-generator-dataset](https://huggingface.co/datasets/CodeferSystem/GPT2-Hacker-password-generator-dataset)
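
As a quick check, the dataset can be loaded with the `datasets` library; the split name below is an assumption about its layout:

```python
from datasets import load_dataset

ds = load_dataset("CodeferSystem/GPT2-Hacker-password-generator-dataset")
print(ds)              # inspect available splits and columns
print(ds["train"][0])  # peek at one example (assumes a "train" split)
```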