How to use shaswatamitra/falcon-yara-bi-e5-base-v2 with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("shaswatamitra/falcon-yara-bi-e5-base-v2")
model = AutoModel.from_pretrained("shaswatamitra/falcon-yara-bi-e5-base-v2")
e5-base-v2
Contrastive encoder fine-tuned to map cyber threat intelligence (CTI) text and YARA rules into a shared embedding space.
Backbone: intfloat/e5-base-v2.
| split | recall@1 | F1 | threshold | diag mean | off-diag mean |
|---|---|---|---|---|---|
| pretrained | 0.5480 | 0.2954 | 0.7113 | 0.8810 | 0.8226 |
| run_0 | 0.9498 | 0.9298 | 0.7009 | 0.9494 | 0.1193 |
| run_1 | 0.9509 | 0.9290 | 0.7026 | 0.9635 | 0.1401 |
| run_2 | 0.9498 | 0.9314 | 0.7060 | 0.9645 | 0.1542 |
| run_3 | 0.9498 | 0.9381 | 0.7059 | 0.9630 | 0.0780 |
| run_4 | 0.9498 | 0.9335 | 0.7082 | 0.9746 | 0.0298 |
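The card does not define the table's columns. The sketch below shows one plausible way to compute them from a square cosine-similarity matrix, assuming that row i (a CTI text) and column i (a YARA rule) form a matched pair, and that F1 comes from thresholding every pair with the diagonal as positives. The function name `pair_metrics` and this evaluation protocol are assumptions for illustration, not the authors' evaluation code.

```python
def pair_metrics(sim, threshold):
    """Summarize a square similarity matrix of aligned pairs.

    Assumption: sim[i][j] is the cosine similarity between CTI text i and
    YARA rule j, and the matched pair for row i sits on the diagonal.
    """
    n = len(sim)
    diag = [sim[i][i] for i in range(n)]
    off = [sim[i][j] for i in range(n) for j in range(n) if i != j]

    # recall@1: fraction of rows whose highest-scoring column is the match.
    hits = sum(
        1 for i in range(n)
        if max(range(n), key=lambda j: sim[i][j]) == i
    )

    # Threshold every pair: diagonal entries are positives, the rest negatives.
    tp = sum(1 for v in diag if v >= threshold)
    fp = sum(1 for v in off if v >= threshold)
    fn = n - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    return {
        "recall@1": hits / n,
        "f1": f1,
        "diag_mean": sum(diag) / len(diag),
        "offdiag_mean": sum(off) / len(off) if off else 0.0,
    }


# Toy 3x3 example with made-up similarities.
toy_sim = [
    [0.90, 0.20, 0.10],
    [0.30, 0.80, 0.75],
    [0.20, 0.85, 0.60],
]
toy_metrics = pair_metrics(toy_sim, threshold=0.7)
```

Under this reading, an off-diagonal mean above the threshold (as in the pretrained row) would mean many non-matching pairs pass the threshold, which would lower precision and F1 even when recall@1 is moderate.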
Training objective: symmetric InfoNCE (NT-Xent) loss over in-batch negatives. The best checkpoint is selected by validation loss.
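As a rough illustration of the objective named above, here is a minimal pure-Python sketch of a symmetric InfoNCE loss over in-batch negatives. The temperature value of 0.05 and the function names are assumptions, since the card does not state the hyperparameters or show the training code.

```python
import math


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(sim, temperature=0.05):
    """Symmetric InfoNCE over a batch similarity matrix.

    sim[i][j] is the cosine similarity between CTI text i and YARA rule j.
    Every other item in the batch acts as a negative for pair (i, i).
    The temperature default is an assumed value, not taken from the card.
    """
    logits = [[v / temperature for v in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # rule -> text direction
    text_to_rule = _diagonal_cross_entropy(logits)
    rule_to_text = _diagonal_cross_entropy(logits_t)
    return 0.5 * (text_to_rule + rule_to_text)


# A batch with no signal scores log(batch_size); a well-aligned batch scores near 0.
uninformative_loss = symmetric_info_nce([[0.5, 0.5], [0.5, 0.5]])
aligned_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]])
```

Averaging the two directions is what makes the loss symmetric: matched pairs are pulled together whether the query is the CTI text or the YARA rule.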
from transformers import AutoModel, AutoTokenizer
tok = AutoTokenizer.from_pretrained("shaswatamitra/falcon-yara-bi-e5-base-v2")
model = AutoModel.from_pretrained("shaswatamitra/falcon-yara-bi-e5-base-v2")
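The snippet above only loads the weights. A minimal sketch of turning text into comparable embeddings follows, assuming the attention-mask mean pooling and L2 normalization commonly used with the e5-base-v2 backbone. The helper names `mean_pool` and `embed` are hypothetical, and the card does not say whether inputs need the `query: ` / `passage: ` prefixes that the backbone was trained with, so none are added here.

```python
import torch
import torch.nn.functional as F


def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts


def embed(texts, tok, model, max_length=512):
    """Return L2-normalized embeddings for a list of strings.

    `tok` and `model` are the tokenizer and model loaded above.
    Mean pooling is an assumption carried over from the backbone.
    """
    batch = tok(texts, padding=True, truncation=True,
                max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    pooled = mean_pool(out.last_hidden_state, batch["attention_mask"])
    return F.normalize(pooled, p=2, dim=1)
```

With normalized embeddings, `embed(cti_texts, tok, model) @ embed(yara_rules, tok, model).T` gives a cosine-similarity matrix between CTI passages and YARA rules, where `cti_texts` and `yara_rules` are lists of strings you supply.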
@article{mitra2025falcon,
  title={FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation},
  author={Mitra, Shaswata and Bazarov, Azim and Duclos, Martin and Mittal, Sudip and Piplai, Aritran and Rahman, Md Rayhanur and Zieglar, Edward and Rahimi, Shahram},
  journal={arXiv preprint arXiv:2508.18684},
  year={2025}
}