e5-base-v2
Contrastive encoder fine-tuned to map CTI text and SNORT rules into a shared embedding space.
Backbone: intfloat/e5-base-v2.
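A shared embedding space means a CTI passage can be compared directly with candidate SNORT rules by vector similarity. The following is a minimal, dependency-free sketch of that lookup. The function names (`cosine`, `rank_rules`), the rule identifiers, and the 3-dimensional toy vectors are all hypothetical stand-ins; real vectors would come from the encoder, and the card does not specify the similarity function, so cosine similarity is an assumption here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_rules(cti_vec, rule_vecs):
    """Return (rule_id, score) pairs sorted from most to least similar."""
    scored = [(rule_id, cosine(cti_vec, vec)) for rule_id, vec in rule_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for encoder outputs (hypothetical values).
cti_vec = [0.9, 0.1, 0.0]
rule_vecs = {
    "rule_a": [1.0, 0.0, 0.0],
    "rule_b": [0.0, 1.0, 0.0],
    "rule_c": [0.0, 0.0, 1.0],
}

ranking = rank_rules(cti_vec, rule_vecs)
best_rule, best_score = ranking[0]
```

In this toy setup `rule_a` ranks first because its vector points in nearly the same direction as the CTI vector.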
| split | recall@1 | F1 | threshold | diag mean | off-diag mean |
|---|---|---|---|---|---|
| pretrained | 0.4738 | 0.2576 | 0.7030 | 0.8503 | 0.8149 |
| run_0 | 0.9526 | 0.9017 | 0.6909 | 0.9236 | 0.1215 |
| run_1 | 0.9551 | 0.9244 | 0.6960 | 0.9281 | 0.0744 |
| run_2 | 0.9551 | 0.9292 | 0.6982 | 0.9251 | 0.0655 |
| run_3 | 0.9564 | 0.9329 | 0.6951 | 0.9309 | 0.0491 |
| run_4 | 0.9551 | 0.9324 | 0.7080 | 0.9532 | 0.0155 |
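The card does not define the table columns, so the sketch below shows one plausible reading rather than the authors' evaluation code. It assumes a square similarity matrix in which row i is a CTI text, column j is a SNORT rule, and the diagonal holds the matched pairs. Under that assumption, "diag mean" and "off-diag mean" are the average similarities of matched and unmatched pairs, recall@1 checks whether the matched rule is the top-scoring column, and F1 treats every cell at or above the threshold as a predicted match. The names `retrieval_stats` and `pair_f1` and the toy matrix are hypothetical.

```python
def retrieval_stats(sim):
    """Summarise a square similarity matrix whose diagonal holds the matched pairs."""
    n = len(sim)
    diag = [sim[i][i] for i in range(n)]
    off = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    # recall@1: the matched rule must be the top-scoring column of its row.
    hits = sum(1 for i in range(n) if max(range(n), key=lambda j: sim[i][j]) == i)
    return {
        "recall@1": hits / n,
        "diag_mean": sum(diag) / len(diag),
        "off_diag_mean": sum(off) / len(off) if off else 0.0,
    }


def pair_f1(sim, threshold):
    """F1 when every (i, j) cell scoring >= threshold is predicted to be a match."""
    n = len(sim)
    tp = sum(1 for i in range(n) if sim[i][i] >= threshold)
    fp = sum(1 for i in range(n) for j in range(n) if i != j and sim[i][j] >= threshold)
    fn = n - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy similarity matrix (hypothetical values, not taken from the table).
toy_sim = [
    [0.90, 0.20, 0.10],
    [0.30, 0.80, 0.20],
    [0.75, 0.10, 0.60],
]

stats = retrieval_stats(toy_sim)
f1 = pair_f1(toy_sim, threshold=0.7)
```

Read this way, the table suggests that fine-tuning mainly pushes unmatched pairs apart: the off-diagonal mean falls from 0.8149 for the pretrained backbone to between 0.0155 and 0.1215 across the fine-tuned runs, while the diagonal mean stays above 0.92.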
Trained with a symmetric InfoNCE (NT-Xent) loss over in-batch negatives. The best checkpoint was selected by validation loss.
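For readers unfamiliar with the objective, here is a small dependency-free sketch of a symmetric InfoNCE loss computed from a batch similarity matrix. It assumes row i and column i form the positive pair and every other cell in the batch acts as a negative. The function name `symmetric_info_nce` and the temperature of 0.05 are illustrative assumptions; the card does not state the temperature or other hyperparameters, and this is not the authors' training code.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def symmetric_info_nce(sim, temperature=0.05):
    """Average of the CTI-to-rule and rule-to-CTI cross-entropy terms.

    sim[i][j] is the similarity between CTI text i and rule j;
    the diagonal holds the positives (assumed layout).
    """
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]
    # Each CTI text must pick out its own rule among all rules in the batch.
    cti_to_rule = sum(_logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Each rule must pick out its own CTI text among all texts in the batch.
    rule_to_cti = sum(
        _logsumexp([logits[j][i] for j in range(n)]) - logits[i][i] for i in range(n)
    ) / n
    return 0.5 * (cti_to_rule + rule_to_cti)


# When all similarities are equal the loss equals log(batch size).
uniform_loss = symmetric_info_nce([[0.5, 0.5], [0.5, 0.5]])
# When positives score 1 and negatives score 0 the loss is close to zero.
separated_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]])
```

The two toy cases bracket the behaviour: an encoder that cannot tell pairs apart sits at log(n), and one that cleanly separates matched from unmatched pairs approaches zero.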
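# Load the fine-tuned encoder and its tokenizer from the Hugging Face Hub
from transformers import AutoModel, AutoTokenizer
tok = AutoTokenizer.from_pretrained("shaswatamitra/falcon-snort-bi-e5-base-v2")
model = AutoModel.from_pretrained("shaswatamitra/falcon-snort-bi-e5-base-v2")
model.eval()  # inference mode: disables dropout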
@article{mitra2025falcon,
title={FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation},
author={Mitra, Shaswata and Bazarov, Azim and Duclos, Martin and Mittal, Sudip and Piplai, Aritran and Rahman, Md Rayhanur and Zieglar, Edward and Rahimi, Shahram},
journal={arXiv preprint arXiv:2508.18684},
year={2025}
}