FALCON
Collection
FALCON: Transforming Cyber Threat Intelligence into Deployable IDS Rules with Self-Reflection • 16 items
How to use shaswatamitra/falcon-snort-dual-e5-base-v2 with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("shaswatamitra/falcon-snort-dual-e5-base-v2", dtype="auto")
e5-base-v2
Contrastive encoder fine-tuned to map CTI text and SNORT rules into a shared embedding space.
Backbone: intfloat/e5-base-v2.
| split | recall@1 | F1 | threshold | diag mean | off-diag mean |
|---|---|---|---|---|---|
| pretrained | 0.4738 | 0.2576 | 0.7030 | 0.8503 | 0.8149 |
| run_0 | 0.9489 | 0.9065 | 0.6707 | 0.8161 | 0.0253 |
| run_1 | 0.9526 | 0.9122 | 0.6849 | 0.8765 | 0.0660 |
| run_2 | 0.9526 | 0.9248 | 0.6838 | 0.8613 | 0.0308 |
| run_3 | 0.9539 | 0.9351 | 0.6925 | 0.8980 | 0.0355 |
| run_4 | 0.9551 | 0.9440 | 0.7001 | 0.9503 | 0.0068 |
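The card does not define the table columns. One plausible reading, offered here only as an assumption, is that recall@1, diag mean, and off-diag mean are computed from the cosine-similarity matrix between CTI and rule embeddings of matched pairs. The sketch below shows that reading with NumPy; the function name `retrieval_stats` and the row-aligned pairing are hypothetical, not taken from the FALCON code.

```python
import numpy as np

def retrieval_stats(cti_emb, rule_emb):
    """Assumes cti_emb[i] and rule_emb[i] form a matched pair (hypothetical setup)."""
    # L2-normalise so that the dot product equals cosine similarity.
    cti = cti_emb / np.linalg.norm(cti_emb, axis=1, keepdims=True)
    rule = rule_emb / np.linalg.norm(rule_emb, axis=1, keepdims=True)
    sim = cti @ rule.T                      # sim[i, j] = cos(cti_i, rule_j)
    n = sim.shape[0]
    # recall@1: fraction of CTI texts whose top-ranked rule is the matched one.
    recall_at_1 = float(np.mean(sim.argmax(axis=1) == np.arange(n)))
    # Mean similarity of matched pairs (diagonal) and mismatched pairs (the rest).
    diag_mean = float(np.diag(sim).mean())
    off_diag_mean = float(sim[~np.eye(n, dtype=bool)].mean())
    return {
        "recall@1": recall_at_1,
        "diag_mean": diag_mean,
        "off_diag_mean": off_diag_mean,
    }
```

Under this reading, a well-separated encoder shows a high diag mean and an off-diag mean near zero, which matches the pattern in the fine-tuned rows.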
Symmetric InfoNCE / NT-Xent over in-batch negatives. Best checkpoint selected by validation loss.
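As a rough illustration of the objective named above, the following NumPy sketch computes a symmetric InfoNCE loss over in-batch negatives. It is not the FALCON training code: the function names, the row-aligned pairing, and the `temperature=0.05` default are assumptions, since the card does not state them.

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def symmetric_info_nce(cti_emb, rule_emb, temperature=0.05):
    """Row i of cti_emb and row i of rule_emb are the positive pair;
    every other row in the batch serves as a negative.
    The temperature value is an assumption, not taken from the card."""
    cti = cti_emb / np.linalg.norm(cti_emb, axis=1, keepdims=True)
    rule = rule_emb / np.linalg.norm(rule_emb, axis=1, keepdims=True)
    logits = cti @ rule.T / temperature
    idx = np.arange(logits.shape[0])
    # CTI -> rule direction: softmax over each row.
    loss_c2r = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Rule -> CTI direction: softmax over each column.
    loss_r2c = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_c2r + loss_r2c))
```

Averaging the two directions is what makes the loss symmetric: swapping the CTI and rule batches leaves the value unchanged.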
from transformers import AutoModel, AutoTokenizer
# Load the rule encoder; pass subfolder="cti" instead for the CTI encoder.
tok = AutoTokenizer.from_pretrained("shaswatamitra/falcon-snort-dual-e5-base-v2", subfolder="rule")
model = AutoModel.from_pretrained("shaswatamitra/falcon-snort-dual-e5-base-v2", subfolder="rule")
Dual-encoder layout: this repo has rule/ (encodes SNORT rules) and cti/ (encodes CTI text) subfolders. Load each with subfolder=....
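The layout described above suggests encoding each side with its own tower and comparing the results by cosine similarity. The sketch below shows one way this might look. It assumes mean pooling over non-padding tokens (the pooling used by the e5-base-v2 backbone), assumes the cti/ subfolder also ships tokenizer files, and uses made-up input strings; `load_tower`, `mean_pool`, `embed`, and `demo` are hypothetical helper names. Whether the fine-tuned towers expect e5-style input prefixes is not stated in the card, so none are added here.

```python
import numpy as np

REPO = "shaswatamitra/falcon-snort-dual-e5-base-v2"

def load_tower(subfolder):
    """Load tokenizer and encoder from one subfolder ("cti" or "rule")."""
    from transformers import AutoModel, AutoTokenizer  # imported lazily
    tok = AutoTokenizer.from_pretrained(REPO, subfolder=subfolder)
    model = AutoModel.from_pretrained(REPO, subfolder=subfolder)
    return tok, model.eval()

def mean_pool(hidden, mask):
    """Average token vectors, ignoring padding.
    hidden: (batch, tokens, dim); mask: (batch, tokens) of 0/1."""
    mask = mask[..., None].astype(hidden.dtype)
    return (hidden * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)

def embed(texts, tok, model):
    """Return L2-normalised embeddings; mean pooling is an assumption."""
    import torch
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    pooled = mean_pool(hidden.numpy(), batch["attention_mask"].numpy())
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

def demo():
    """Downloads weights on first call; both example strings are invented."""
    cti_tok, cti_model = load_tower("cti")
    rule_tok, rule_model = load_tower("rule")
    cti_vec = embed(["Malware beacons to its C2 server over HTTP on port 8080."],
                    cti_tok, cti_model)
    rule_vec = embed(['alert tcp any any -> any 8080 (msg:"Possible C2 beacon"; sid:1000001;)'],
                     rule_tok, rule_model)
    # Cosine similarity between the CTI sentence and the rule.
    return (cti_vec @ rule_vec.T).item()
```

Calling `demo()` returns a single similarity score. For retrieval over many rules, embed the rule set once with the rule tower and rank by the dot product against each CTI embedding.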
@article{mitra2025falcon,
title={FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation},
author={Mitra, Shaswata and Bazarov, Azim and Duclos, Martin and Mittal, Sudip and Piplai, Aritran and Rahman, Md Rayhanur and Zieglar, Edward and Rahimi, Shahram},
journal={arXiv preprint arXiv:2508.18684},
year={2025}
}