Monare RE Qwen2.5-Coder 7B
Monare RE Qwen2.5-Coder 7B is an experimental reverse-engineering model derived
from Qwen/Qwen2.5-Coder-7B-Instruct. It was fine-tuned with QLoRA to generate
conservative semantic JSON patches from IDA/Hex-Rays-style pseudocode.
The model is intended for binary analysis workflows where stripped binaries
produce decompiler output such as sub_401000, a1, v3, casts, pointer
offsets, and weak type information.
Inspiration
This work is inspired by the ReCopilot research paper:
ReCopilot: Reverse Engineering Copilot in Binary Analysis
arXiv: 2505.16366
https://arxiv.org/abs/2505.16366
This model is not an official ReCopilot release and is not affiliated with the authors of that paper. It follows the same broad idea: training a model on source-code-to-stripped-binary-to-decompiler examples instead of relying only on prompt engineering.
Training Data
The initial training dataset was built from open-source C/C++ projects. The pipeline used:
open-source C/C++ source
-> debug build with DWARF/symbols
-> stripped binary
-> IDA/Hex-Rays pseudocode export
-> ground-truth extraction from debug metadata
-> supervised fine-tuning JSONL
Current training snapshot:
- 225 debug/stripped binary artifacts
- 3,262 ground-truth function records
- 225 IDA/Hex-Rays exports
- 3,256 final SFT examples
- training split: 2,758 examples
- validation split: 346 examples
- test split: 152 examples
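As a quick consistency check, the three split sizes above add up to the 3,256 final SFT examples, which works out to roughly an 85/11/5 split. A minimal sketch of that arithmetic (the variable names are illustrative and not part of the training pipeline):

```python
# Split sizes as reported in the training snapshot above.
splits = {"train": 2758, "validation": 346, "test": 152}

# The splits should account for every final SFT example.
total = sum(splits.values())

# Percentage share of each split, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(total)
print(shares)
```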
The dataset focuses on function names, argument names, argument types, and strict JSON output formatting. Struct recovery, data-flow, and richer cross-function context are planned future improvements.
Output Format
The model is trained to return only valid JSON. A typical output:
{
  "function": {
    "ea": "0x401230",
    "old_name": "sub_401230",
    "suggested_name": "aes_cbc_encrypt_buffer",
    "confidence": 0.88,
    "reason": "Matched stripped decompiler function to debug-symbol ground truth at the same address."
  },
  "arguments": [
    {
      "old_name": "a1",
      "new_name": "ctx",
      "type": "struct AES_ctx *",
      "confidence": 0.82
    }
  ],
  "locals": [],
  "structs": [],
  "comments": [],
  "warnings": []
}
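Because downstream tooling depends on this shape, it can help to check each response before using it. The sketch below is one possible validator, not part of the model's release: the required key lists are inferred from the sample output above, and the function names `validate_patch` and `confidence_ok` are hypothetical.

```python
import json

# Assumption: these key sets are inferred from the sample output, not from a published schema.
REQUIRED_TOP_LEVEL = ("function", "arguments", "locals", "structs", "comments", "warnings")
REQUIRED_FUNCTION = ("ea", "old_name", "suggested_name", "confidence", "reason")


def confidence_ok(value):
    """True for a real number in [0, 1]; booleans are rejected."""
    return isinstance(value, (int, float)) and not isinstance(value, bool) and 0 <= value <= 1


def validate_patch(text):
    """Return a list of problems found in one response; an empty list means it passed."""
    try:
        patch = json.loads(text)
    except json.JSONDecodeError as exc:
        return ["invalid JSON: " + exc.msg]
    if not isinstance(patch, dict):
        return ["top-level value is not a JSON object"]

    problems = ["missing key: " + key for key in REQUIRED_TOP_LEVEL if key not in patch]

    function = patch.get("function")
    if isinstance(function, dict):
        problems += ["missing key: function." + key for key in REQUIRED_FUNCTION if key not in function]
        if "confidence" in function and not confidence_ok(function["confidence"]):
            problems.append("function.confidence is not a number in [0, 1]")

    arguments = patch.get("arguments")
    if isinstance(arguments, list):
        for index, arg in enumerate(arguments):
            if isinstance(arg, dict) and "confidence" in arg and not confidence_ok(arg["confidence"]):
                problems.append("arguments[%d].confidence is not a number in [0, 1]" % index)
    return problems
```

An empty list means the response has the expected top-level shape; it says nothing about whether the suggested names or types are correct.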
Intended Use
Use this model for research and assisted reverse engineering tasks such as:
- suggesting function names from decompiler pseudocode;
- recovering argument names and approximate C types;
- generating conservative JSON semantic patches;
- bootstrapping review workflows for stripped C/C++ binaries.
All suggestions should be reviewed by a human reverse engineer before being applied to an analysis database.
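One way to organize that review is to sort argument suggestions by their reported confidence, so a reviewer sees the least certain ones flagged separately. This is a sketch under assumptions: the helper name `split_for_review` and the 0.8 threshold are hypothetical, and the reported confidence values are the model's own estimates rather than calibrated probabilities.

```python
def split_for_review(patch, min_confidence=0.8):
    """Split argument renames into (confident, uncertain) lists of (old, new) pairs.

    Both lists are meant for a human reviewer; nothing here is applied automatically.
    """
    confident, uncertain = [], []
    for arg in patch.get("arguments", []):
        pair = (arg["old_name"], arg["new_name"])
        # A missing confidence is treated as 0.0, which sends the pair to the uncertain list.
        if arg.get("confidence", 0.0) >= min_confidence:
            confident.append(pair)
        else:
            uncertain.append(pair)
    return confident, uncertain


example_patch = {
    "arguments": [
        {"old_name": "a1", "new_name": "ctx", "type": "struct AES_ctx *", "confidence": 0.82},
        {"old_name": "a2", "new_name": "buf", "type": "uint8_t *", "confidence": 0.41},
    ]
}
confident, uncertain = split_for_review(example_patch)
print(confident)
print(uncertain)
```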
Limitations
This is an early experimental model.
- It has only been through a short QLoRA training run.
- It may emit invalid JSON on long or truncated prompts.
- It can confuse similar cryptographic or compression routines.
- It should not be used for malware-family attribution.
- It should not be treated as a source of truth.
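Since the model may emit invalid or truncated JSON, callers may want to parse its output defensively instead of passing it straight to `json.loads`. The following is a minimal sketch using only the standard library; the function name `parse_model_output` is hypothetical. It deliberately tries only the first `{` in the text, so a truncated outer object is rejected rather than replaced by some nested fragment.

```python
import json


def parse_model_output(text):
    """Return the JSON object starting at the first '{' in text, or None if it does not parse."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode tolerates trailing text after the object, unlike json.loads.
        value, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None
```

A `None` result can be logged and the function queued for manual analysis or a retry with a shorter prompt.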
Prompt Template
<TASK>recover_semantic_patch</TASK>
<TARGET>
EA: 0x401230
Name: sub_401230
Pseudocode:
...
</TARGET>
<EVIDENCE>
Strings: [...]
Callees: [...]
Callers: [...]
Imports: [...]
Offset accesses: [...]
Data flow: [...]
</EVIDENCE>
<SCHEMA>
{"function":{"ea":"0x401230","old_name":"sub_401230","suggested_name":"","confidence":0.0,"reason":""},"arguments":[],"locals":[],"structs":[],"comments":[],"warnings":[]}
</SCHEMA>
Return only valid JSON. Be conservative. Do not invent facts.
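When prompts are generated for many functions, the template can be filled in programmatically. The helper below is a sketch, not the author's export tooling: the function name `build_prompt` and the lowercase evidence keys (`strings`, `callees`, and so on) are assumptions made for illustration, while the tags, field labels, and closing instruction follow the template above.

```python
import json

# Template label -> key in the caller's evidence dict (key names are hypothetical).
EVIDENCE_FIELDS = [
    ("Strings", "strings"),
    ("Callees", "callees"),
    ("Callers", "callers"),
    ("Imports", "imports"),
    ("Offset accesses", "offset_accesses"),
    ("Data flow", "data_flow"),
]


def build_prompt(ea, name, pseudocode, evidence=None):
    """Fill the prompt template for one function; missing evidence fields become []."""
    evidence = evidence or {}
    schema = {
        "function": {"ea": hex(ea), "old_name": name, "suggested_name": "",
                     "confidence": 0.0, "reason": ""},
        "arguments": [], "locals": [], "structs": [], "comments": [], "warnings": [],
    }
    lines = ["<TASK>recover_semantic_patch</TASK>", "<TARGET>",
             "EA: " + hex(ea), "Name: " + name, "Pseudocode:", pseudocode,
             "</TARGET>", "<EVIDENCE>"]
    for label, key in EVIDENCE_FIELDS:
        lines.append(label + ": " + json.dumps(evidence.get(key, [])))
    lines += ["</EVIDENCE>", "<SCHEMA>",
              json.dumps(schema, separators=(",", ":")),  # compact, as in the template
              "</SCHEMA>",
              "Return only valid JSON. Be conservative. Do not invent facts."]
    return "\n".join(lines)


prompt = build_prompt(0x401230, "sub_401230", "...", {"strings": ["rb"]})
print(prompt)
```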
Loading the LoRA Adapter
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "Keowu/monare-re-qwen25-coder-7b"

# Load the base model with 4-bit quantization (requires bitsandbytes).
quant_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base,
    device_map="auto",
    quantization_config=quant_config,
    trust_remote_code=True,
)

# Attach the LoRA adapter weights on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter)
model.eval()

messages = [
    {
        "role": "system",
        "content": """You are a reverse engineering assistant specialized in IDA
Hex-Rays pseudocode. Return only valid JSON. Be conservative."""
    },
    {
        "role": "user",
        "content": """
<TASK>recover_semantic_patch</TASK>
<TARGET>
EA: 0x401230
Name: sub_401230
Pseudocode:
int __fastcall sub_401230(__int64 a1, char *a2, unsigned int a3)
{
  FILE *v3;
  int result;
  v3 = fopen(a2, "rb");
  if (!v3)
    return -1;
  result = fread((void *)(a1 + 32), 1u, a3, v3);
  fclose(v3);
  *(_DWORD *)(a1 + 16) = result;
  return result;
}
</TARGET>
<EVIDENCE>
Strings: ["rb"]
Callees: [{"name":"fopen"},{"name":"fread"},{"name":"fclose"}]
Callers: []
Imports: ["fopen","fread","fclose"]
Offset accesses: ["a1 + 32","a1 + 16"]
Data flow: []
</EVIDENCE>
<SCHEMA>
{"function":{"ea":"0x401230","old_name":"sub_401230","suggested_name":"","confidence":0.0,"reason":""},"arguments":[],"locals":[],"structs":[],"comments":[],"warnings":[]}
</SCHEMA>
Return only valid JSON. Be conservative. Do not invent facts."""
    }
]

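# Render the chat template and move the token IDs to the model's device.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding (do_sample=False) keeps the output deterministic.
with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=512,
        do_sample=False,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))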
