oxdev committed on
Commit df8a81e · verified · 1 parent: 55ef8ec

Update README with comprehensive model card

Files changed (1):
  1. README.md +139 -42

README.md CHANGED
@@ -1,68 +1,165 @@
  ---
  base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
  library_name: transformers
- model_name: grpo_output
  tags:
  - generated_from_trainer
  - grpo
- - hf_jobs
  - trl
- licence: license
  ---

- # Model Card for grpo_output

- This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```
-
- ## Training procedure

-
- This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
-
- ### Framework versions

  - TRL: 1.2.0
  - Transformers: 5.6.2
- - Pytorch: 2.6.0+cu126
  - Datasets: 4.8.4
- - Tokenizers: 0.22.2

  ## Citations

- Cite GRPO as:
-
  ```bibtex
  @article{shao2024deepseekmath,
- title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
- author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
- year = 2024,
- eprint = {arXiv:2402.03300},
  }
  ```
-
- Cite TRL as:
-
- ```bibtex
- @software{vonwerra2020trl,
- title = {{TRL: Transformers Reinforcement Learning}},
- author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
- license = {Apache-2.0},
- url = {https://github.com/huggingface/trl},
- year = {2020}
- }
- ```
  ---
  base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
  library_name: transformers
+ model_name: security-auditor-grpo
  tags:
  - generated_from_trainer
  - grpo
  - trl
+ - security
+ - smart-contracts
+ - solidity
+ - audit
+ - web3
+ license: apache-2.0
+ datasets:
+ - oxdev/smart-contract-security-sft
+ - oxdev/smart-contract-security-audit-v2
+ pipeline_tag: text-generation
+ language:
+ - en
  ---

+ # 🔐 Smart Contract Security Auditor (GRPO)

+ A specialized **smart contract security auditor** built on [Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct), fine-tuned using **Group Relative Policy Optimization (GRPO)** on real-world audit findings from top security firms.
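For context on the training method: GRPO samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation, replacing a learned value baseline. A minimal sketch of that advantage step in plain Python (the function name and epsilon are illustrative, not TRL's API):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each completion's reward against its sampling group.

    GRPO replaces a learned value baseline with the group mean, so a
    completion only gets a positive advantage by beating its siblings.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for four sampled audits of the same contract:
advs = group_relative_advantages([0.9, 0.1, 0.5, 0.5])
```

The two average completions get near-zero advantage; only the one that outperformed its group is reinforced.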
 
+ ## 🎯 What It Does

+ Given a Solidity smart contract, this model identifies security vulnerabilities and produces structured audit findings with:
+ - Vulnerability classification (reentrancy, access control, oracle manipulation, etc.)
+ - Severity assessment (Critical/High/Medium/Low)
+ - Detailed description of the vulnerability
+ - Impact analysis
+ - Proof of concept exploit code
+ - Recommended fixes
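The fields listed above amount to one structured record per finding. A hypothetical sketch of that shape (the class and field names are assumptions for illustration, not the model's actual output schema):

```python
from dataclasses import dataclass

@dataclass
class AuditFinding:
    """Hypothetical container for one audit finding; field names are
    illustrative, not the model's literal output format."""
    category: str        # e.g. "reentrancy", "access control"
    severity: str        # "Critical" | "High" | "Medium" | "Low"
    description: str
    impact: str
    poc: str             # proof-of-concept exploit code
    recommendation: str

finding = AuditFinding(
    category="reentrancy",
    severity="Critical",
    description="withdraw() sends ETH before updating balances",
    impact="An attacker contract can re-enter withdraw() and drain funds",
    poc="receive() external payable { bank.withdraw(amount); }",
    recommendation="Apply checks-effects-interactions: update balances before the external call",
)
```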
 
 
 
+ ## Quick Start

+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "oxdev/security-auditor-grpo",
+     use_cache=True,  # Important: config has use_cache=False from training
+ )
+ tokenizer = AutoTokenizer.from_pretrained("oxdev/security-auditor-grpo")
+ pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device="cuda")
+
+ messages = [
+     {"role": "system", "content": "You are an expert smart contract security auditor. Analyze the provided Solidity code for vulnerabilities."},
+     {"role": "user", "content": """Audit this contract:
+ ```solidity
+ contract SimpleBank {
+     mapping(address => uint256) public balances;
+     function deposit() public payable { balances[msg.sender] += msg.value; }
+     function withdraw(uint256 amount) public {
+         require(balances[msg.sender] >= amount);
+         (bool success, ) = msg.sender.call{value: amount}("");
+         require(success);
+         balances[msg.sender] -= amount;
+     }
+ }
+ ```"""},
+ ]
+
+ result = pipe(messages, max_new_tokens=512, do_sample=False, return_full_text=False)
+ output = result[0]["generated_text"]
+ if isinstance(output, list):
+     output = output[-1]["content"]
+ print(output)
+ ```
+ ## 🔗 Try It Live
+
+ **Interactive Demo:** [oxdev/security-auditor-demo](https://huggingface.co/spaces/oxdev/security-auditor-demo) — side-by-side comparison with the base model, 7 test cases with known vulnerabilities, and automated scoring.
+
+ ## Training Details
+
+ ### V1 (Current Model)
+ - **Method:** GRPO (Group Relative Policy Optimization)
+ - **Base Model:** Qwen2.5-Coder-0.5B-Instruct
+ - **Dataset:** [oxdev/smart-contract-security-sft](https://huggingface.co/datasets/oxdev/smart-contract-security-sft) (327 synthetic samples)
+ - **Hardware:** NVIDIA T4 (16GB)
+ - **Epochs:** 2
+ - **Reward Functions:** Format compliance, finding rate
+ - **Results:**
+   - Format reward: 0.025 → 0.40 (**16× improvement**)
+   - Finding rate: 0% → 50-75%
+   - Mean reward: -0.34 → -0.006
+
+ ### V2 (Pending — Colab Notebook Ready)
+ - **Dataset:** [oxdev/smart-contract-security-audit-v2](https://huggingface.co/datasets/oxdev/smart-contract-security-audit-v2) (50,902 real audit findings)
+ - **Sources:** SkywardNomad92/smart-contract-audit-findings, samscrack/cyfrin-audit-findings, Solodit API
+ - **Reward Functions (4):** Format (0.25), severity matching (0.25), category matching (0.25), quality (0.25)
+ - **Train on Colab:** Open [`train_grpo_v2_colab.ipynb`](https://huggingface.co/oxdev/security-auditor-grpo/blob/main/train_grpo_v2_colab.ipynb) in Google Colab with a free T4 GPU
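With four equally weighted terms, the total V2 reward is a simple weighted sum. A sketch under the assumption that each term is already scored in [0, 1] (the per-term scorers themselves are not reproduced here):

```python
def combined_reward(fmt: float, severity: float, category: float, quality: float) -> float:
    """Equally weighted sum of the four V2 reward terms.

    Weights (0.25 each) come from the model card; each argument is
    assumed to be a per-term score in [0, 1].
    """
    weights = (0.25, 0.25, 0.25, 0.25)
    terms = (fmt, severity, category, quality)
    return sum(w * t for w, t in zip(weights, terms))

# Well-formatted finding, right category, wrong severity, middling quality:
score = combined_reward(fmt=1.0, severity=0.0, category=1.0, quality=0.5)
```

Equal weights mean no single axis can dominate: a completion that nails the format but mislabels severity still pays a fixed 0.25 penalty.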
+ ## Vulnerability Categories Covered
+
+ | Category | Keywords |
+ |----------|----------|
+ | Reentrancy | reentrancy, reentrant, callback |
+ | Access Control | unauthorized, permission, onlyowner |
+ | Oracle Manipulation | price feed, chainlink, twap |
+ | Flash Loan | flash loan, flashloan |
+ | Overflow/Underflow | overflow, underflow, arithmetic |
+ | Front-running | front-run, sandwich, MEV |
+ | DoS | denial of service, gas limit, unbounded |
+ | Token Issues | fee-on-transfer, rebasing, ERC20 |
+ | Storage | storage collision, delegatecall, proxy |
+ | Cross-chain | bridge, relay, message passing |
+ | Liquidation | liquidation, collateral, health factor |
+ | Signature | ecrecover, replay, nonce, EIP712 |
+ | Initialization | uninitialized, constructor |
+ | Rounding | precision, truncation, decimal |
+ ## Architecture
+
+ - **Model:** Qwen2ForCausalLM
+ - **Parameters:** 0.5B
+ - **Hidden Size:** 896
+ - **Layers:** 24
+ - **Attention Heads:** 14 (2 KV heads)
+ - **Context Length:** 32,768 tokens
+ - **Chat Template:** ChatML (`<|im_start|>` / `<|im_end|>`)
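For reference, ChatML wraps each message in those delimiters, with a trailing assistant header to cue generation. A manual rendering sketch (in practice, `tokenizer.apply_chat_template` produces this for you):

```python
def to_chatml(messages: list[dict]) -> str:
    """Manually render a message list in the ChatML format;
    prefer tokenizer.apply_chat_template in real code."""
    rendered = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    # Trailing assistant header cues the model to start its reply.
    return rendered + "<|im_start|>assistant\n"

prompt = to_chatml([{"role": "user", "content": "Audit this contract"}])
```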
+ ## ⚠️ Important Notes
+
+ 1. **Set `use_cache=True`** when loading for inference — the saved config has `use_cache=False` from training, which makes generation 10-20× slower.
+ 2. **This is a 0.5B model** — it's fast but not as capable as larger models. Use it for quick triage, not as a replacement for professional audits.
+ 3. **V1 was trained on 327 samples** — V2 training on 50K real findings should significantly improve quality.
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `model.safetensors` | V1 trained model weights (1.8GB) |
+ | `train_grpo_job.py` | V1 training script |
+ | `train_grpo_v2.py` | V2 training script (4 reward functions) |
+ | `train_grpo_v2_colab.ipynb` | V2 Colab notebook (free T4 GPU) |
+ | `checkpoint-300/` | V1 training checkpoint |
+ | `checkpoint-326/` | V1 final checkpoint |
+
+ ## Related Resources
+
+ - **GitHub:** [0xedev/skills](https://github.com/0xedev/skills) — Pashov Audit Group AI-powered security skills
+ - **V2 Dataset:** [oxdev/smart-contract-security-audit-v2](https://huggingface.co/datasets/oxdev/smart-contract-security-audit-v2)
+ - **Demo Space:** [oxdev/security-auditor-demo](https://huggingface.co/spaces/oxdev/security-auditor-demo)
+
+ ## Framework Versions

  - TRL: 1.2.0
  - Transformers: 5.6.2
+ - PyTorch: 2.6.0+cu126
  - Datasets: 4.8.4

  ## Citations

  ```bibtex
  @article{shao2024deepseekmath,
+ title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
+ author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and others},
+ year = 2024,
+ eprint = {arXiv:2402.03300},
  }
  ```