---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model:
- openai-community/gpt2
library_name: transformers
datasets:
- CodeferSystem/GPT2-Hacker-password-generator-dataset
tags:
- cybersecurity
- passwords
---

# GPT-2 hacker password generator

This model generates "hacker-style" passwords: short, random-looking strings that mix letters, digits, and special characters.

# Fine-tuning results

- Epochs: 5
- Training steps: 3,125
- Training loss: 0.5196
- Fine-tuning time: about 34 minutes 39 seconds on an NVIDIA GeForce RTX 4060 Laptop GPU (8 GB)
- Training data: 20,000 examples with a sequence length of 128 tokens
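
For reference, the sketch below shows one way a comparable run could be set up with the Hugging Face `Trainer` API. It is illustrative rather than the author's actual training script: the batch size, the `"text"` column name, and the other hyperparameters are assumptions (a batch size of 32 over 20,000 examples for 5 epochs gives the reported 3,125 steps).

```python
# Illustrative fine-tuning sketch; hyperparameters and column names are assumptions.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")

dataset = load_dataset("CodeferSystem/GPT2-Hacker-password-generator-dataset", split="train")

def tokenize(batch):
    # Truncate/pad every example to 128 tokens, matching the reported sequence length
    return tokenizer(batch["text"], truncation=True, max_length=128, padding="max_length")

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="gpt2-password-generator",
    num_train_epochs=5,              # matches the reported 5 epochs
    per_device_train_batch_size=32,  # assumption; 20,000 / 32 * 5 = 3,125 steps
    logging_steps=100,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)

trainer.train()
```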

# Using the model

Use the following code to load the model and generate a password:

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

model_name = "CodeferSystem/GPT2-Hacker-password-generator"

# Load the fine-tuned GPT-2 model and its tokenizer from the Hugging Face Hub
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Generate an answer for a given question
def generate_answer(question):
    # Format the question into the prompt layout used during fine-tuning
    prompt = f"Question: {question}\nAnswer:"

    # Encode the prompt into input token IDs
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Set the model to evaluation mode
    model.eval()

    # Generate output without tracking gradients (saves memory and time)
    with torch.no_grad():
        output = model.generate(
            input_ids,                            # input tokens
            max_length=50,                        # maximum length of the generated sequence
            num_return_sequences=1,               # return a single sequence
            no_repeat_ngram_size=2,               # prevent repeated n-grams
            do_sample=True,                       # enable sampling (randomized generation)
            top_k=50,                             # sample only from the 50 most probable tokens
            top_p=0.95,                           # nucleus sampling threshold
            temperature=2.0,                      # high temperature for more random output
            pad_token_id=tokenizer.eos_token_id,  # pad with the EOS token
        )

    # Decode the generated token IDs back into text, stripping special tokens
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    # Keep only the part after "Answer:" as the generated password
    answer = generated_text.split("Answer:")[-1].strip()

    return answer

# Example usage
question = "generate password."
print(generate_answer(question))  # Print the generated password
```

# Example passwords generated with this model

### With a prompt like "Generate a hacker password.", the output will look something like this (5 examples):

- 0Qk=4CdPQQv0>n1K
- o4K*mQq9>Zu
- e5vx=KqE_j>kFj&*
- xD2PZ5@kz_hFq|W=
- h=rZ?^<Qp~7&z7XZ
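
To produce a list like the one above in a single call (a sketch, not part of the original card), you can raise `num_return_sequences` in the same `generate()` setup shown earlier:

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

model_name = "CodeferSystem/GPT2-Hacker-password-generator"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

prompt = "Question: Generate a hacker password.\nAnswer:"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample five passwords in one generate() call
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_length=50,
        num_return_sequences=5,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=2.0,
        no_repeat_ngram_size=2,
        pad_token_id=tokenizer.eos_token_id,
    )

for seq in outputs:
    text = tokenizer.decode(seq, skip_special_tokens=True)
    print(text.split("Answer:")[-1].strip())
```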

## Fine-tuned data

The dataset used to fine-tune this model is publicly available: [CodeferSystem/GPT2-Hacker-password-generator-dataset](https://huggingface.co/datasets/CodeferSystem/GPT2-Hacker-password-generator-dataset)
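
To inspect the training data, the dataset can be loaded directly with the Hugging Face `datasets` library (the `"train"` split name is the usual default and is an assumption here):

```python
from datasets import load_dataset

# Load the public fine-tuning dataset from the Hugging Face Hub
dataset = load_dataset("CodeferSystem/GPT2-Hacker-password-generator-dataset", split="train")

print(dataset)     # number of rows and column names
print(dataset[0])  # first training example
```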