Qwen2.5-7B Sigma Generation LoRA

This repository contains a LoRA adapter fine-tuned from Qwen/Qwen2.5-7B-Instruct for Sigma rule draft generation from CTI-derived IoC and MITRE ATT&CK context.

The adapter is part of an academic LLM-driven CTI-to-Detection Pipeline:

CTI Report PDF/HTML
  -> IoC Extraction
  -> ATT&CK TTP Mapping
  -> Sigma Rule Generation
  -> Sigma Validation

Intended Use

The model is intended to generate Sigma-style JSON rule drafts from structured CTI context, including:

report title and description
IoC candidates such as domains, URLs, hashes, and IP addresses
MITRE ATT&CK technique IDs
expected Sigma logsource

The output should be reviewed and validated before operational use.

Base Model

Qwen/Qwen2.5-7B-Instruct

Adapter

Finetuning method: LoRA
Training framework: LLaMA-Factory
Adapter name: sigma_generation_full

Training Data

The fine-tuning dataset was built from SigmaHQ rules and converted into instruction-style records.

Dataset summary:

Total records: 3110
Train: 2488
Validation: 311
Test: 311

Training Results

train_runtime: 2960s
train_loss: 0.2066
eval_loss: 0.1575
epochs: 3.0
total_steps: 933

Evaluation

A 50-sample remote evaluation of the fine-tuned 7B LoRA adapter produced:

Metric	Score
Valid JSON rate	0.96
Required Sigma fields rate	0.96
Detection condition rate	0.96
Logsource exact match	0.96
ATT&CK tag recall	0.96
IoC preservation	0.8174

Example Prompt

{
  "task": "Generate a Sigma rule draft from this CTI detection context.",
  "report_context": {
    "title": "Suspicious C2 Domain Connection",
    "description": "A malware sample connects to a suspicious command-and-control domain over HTTP."
  },
  "ioc_candidates": [
    {
      "type": "domain",
      "value": "malicious.example"
    },
    {
      "type": "url",
      "value": "http://malicious.example/update"
    }
  ],
  "attack_context": [
    {
      "technique_id": "T1071.001",
      "technique_name": "Application Layer Protocol: Web Protocols"
    },
    {
      "technique_id": "T1105",
      "technique_name": "Ingress Tool Transfer"
    }
  ],
  "expected_logsource": {
    "product": "windows",
    "category": "network_connection"
  }
}

Example Output

{
  "title": "Suspicious C2 Domain Connection",
  "status": "test",
  "description": "A malware sample connects to a suspicious command-and-control domain over HTTP.",
  "logsource": {
    "product": "windows",
    "category": "network_connection"
  },
  "detection": {
    "selection": {
      "DestinationHostname|contains": "malicious.example",
      "DestinationPort": 80,
      "Image|contains": "http://malicious.example/update"
    },
    "condition": "selection"
  },
  "falsepositives": [
    "Unknown"
  ],
  "level": "high",
  "tags": [
    "attack.t1071.001",
    "attack.t1105"
  ]
}

Loading Example

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-7B-Instruct"
adapter = "YOUR_HF_USERNAME/qwen2.5-7b-sigma-generation-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    torch_dtype="auto",
)
model = PeftModel.from_pretrained(model, adapter)

Limitations

The model generates draft detection content and should not be deployed without validation.
The model may produce syntactically valid but semantically weak rules.
CTI evidence quality directly affects output quality.
The adapter should be used together with Sigma validation and analyst review.

Downloads last month: 20

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jjhoada/qwen2.5-7b-sigma-generation-lora

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-7B-Instruct

Adapter

(2228)

this model