FALCON bi-encoder — YARA / `e5-base-v2`

Contrastive bi-encoder fine-tuned to map cyber threat intelligence (CTI) text and YARA rules into a shared embedding space. Backbone: intfloat/e5-base-v2.

Test-set metrics

model	recall@1	F1	threshold	diag mean	off-diag mean
pretrained	0.5480	0.2954	0.7113	0.8810	0.8226
run_0	0.9498	0.9298	0.7009	0.9494	0.1193
run_1	0.9509	0.9290	0.7026	0.9635	0.1401
run_2	0.9498	0.9314	0.7060	0.9645	0.1542
run_3	0.9498	0.9381	0.7059	0.9630	0.0780
run_4	0.9498	0.9335	0.7082	0.9746	0.0298
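
The card does not state how these columns were computed. One plausible reading, sketched below in plain Python, assumes a square similarity matrix `sim` in which row i is a CTI text, column j is a YARA rule, and matched pairs sit on the diagonal. Under that assumption, recall@1 is the share of rows whose best-scoring column is the diagonal one, "diag mean" and "off-diag mean" are the average scores of matched and mismatched pairs, and F1 is measured by treating every pair at or above the threshold as a predicted match. The function names and the toy matrix are illustrative and do not come from the FALCON code.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def recall_at_1(sim: Matrix) -> float:
    """Share of rows whose highest-scoring column is the matched (diagonal) one."""
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=row.__getitem__)
        if best == i:
            hits += 1
    return hits / len(sim)


def diag_and_off_diag_mean(sim: Matrix) -> Tuple[float, float]:
    """Mean score of matched pairs and mean score of all mismatched pairs."""
    n = len(sim)
    diag = [sim[i][i] for i in range(n)]
    off = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(diag) / len(diag), sum(off) / len(off)


def f1_at_threshold(sim: Matrix, threshold: float) -> float:
    """F1 when pairs scoring >= threshold are predicted matches (assumed protocol)."""
    tp = fp = fn = 0
    for i, row in enumerate(sim):
        for j, score in enumerate(row):
            predicted = score >= threshold
            actual = i == j
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 3x3 matrix: the third text scores higher against the second rule than its own.
toy = [
    [0.90, 0.10, 0.20],
    [0.30, 0.80, 0.10],
    [0.20, 0.75, 0.60],
]
print(recall_at_1(toy))
print(diag_and_off_diag_mean(toy))
print(f1_at_threshold(toy, 0.70))
```

Read this way, the table suggests that fine-tuning mostly pushes mismatched pairs apart: the pretrained off-diag mean of 0.8226 sits close to its diag mean of 0.8810, while the fine-tuned runs keep the diagonal above 0.94 and drop the off-diagonal below 0.16.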

Training

Symmetric InfoNCE / NT-Xent loss over in-batch negatives, with temperature T. The best checkpoint of each run was selected by validation loss.
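
As a rough illustration of the objective named above, the sketch below computes a symmetric InfoNCE loss from a batch similarity matrix in plain Python. It assumes that entry (i, j) is the cosine similarity between CTI text i and YARA rule j, that the matched pair is on the diagonal, and that the two retrieval directions are averaged with equal weight. The equal weighting and the function names are assumptions, not details taken from the FALCON training code.

```python
import math
from typing import List

Matrix = List[List[float]]


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def symmetric_info_nce(sim: Matrix, temperature: float = 0.05) -> float:
    """Average of text-to-rule and rule-to-text cross-entropy over in-batch negatives."""
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]
    # Text -> rule: each row is a classification over the rules in the batch.
    loss_rows = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Rule -> text: same thing over the transposed matrix.
    columns = [list(col) for col in zip(*logits)]
    loss_cols = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_rows + loss_cols)


# An uninformative encoder (all similarities equal) gives a loss of log(batch size).
uniform = [[0.5] * 4 for _ in range(4)]
print(symmetric_info_nce(uniform), math.log(4))
```

Because every other item in the batch acts as a negative, the batch size in each run also sets how many negatives each pair is contrasted against.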

Run 0 — batch=16, epochs=5, lr=2e-05, schedule=constant, T=0.05
Run 1 — batch=50, epochs=10, lr=2e-05, schedule=constant, T=0.05
Run 2 — batch=70, epochs=30, lr=2e-05, schedule=constant, T=0.05
Run 3 — batch=128, epochs=30, lr=5e-05, schedule=warmup_cosine, T=0.05
Run 4 — batch=70, epochs=50, lr=2e-05, schedule=constant, T=0.07
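
Run 3 is the only configuration with a `warmup_cosine` schedule; the others hold the learning rate constant. The card gives the schedule name but not the warmup length or the final learning rate, so the sketch below is only one common form: linear warmup from zero to the peak rate, then cosine decay to zero. The `warmup_steps` and `total_steps` values are hypothetical.

```python
import math


def warmup_cosine_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed schedule shape)."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Peak rate from Run 3; the step counts are made up for illustration.
for step in (0, 50, 100, 550, 1000):
    print(step, warmup_cosine_lr(step, total_steps=1000, warmup_steps=100, peak_lr=5e-5))
```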

Loading

from transformers import AutoModel, AutoTokenizer
tok   = AutoTokenizer.from_pretrained("shaswatamitra/falcon-yara-bi-e5-base-v2")
model = AutoModel.from_pretrained("shaswatamitra/falcon-yara-bi-e5-base-v2")
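
The loading snippet returns a bare encoder, so turning text into embeddings needs a pooling step. A minimal sketch follows, assuming the fine-tuned model keeps the masked mean pooling and L2 normalization documented for the base `intfloat/e5-base-v2` model. The card does not confirm this, and it also does not say whether the `query: ` / `passage: ` prefixes used by the base model are expected, so none are added here. The helper names, the sample CTI sentence, and the sample rule are illustrative only.

```python
import torch
import torch.nn.functional as F

MODEL_ID = "shaswatamitra/falcon-yara-bi-e5-base-v2"


def average_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mean of token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed(texts, tok, model) -> torch.Tensor:
    """Encode a list of strings into unit-length embeddings."""
    batch = tok(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    pooled = average_pool(out.last_hidden_state, batch["attention_mask"])
    return F.normalize(pooled, p=2, dim=1)


def demo() -> torch.Tensor:
    """Downloads the model on first use; call manually."""
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    cti = ["The dropper writes evil.exe to the temp directory and executes it."]
    rules = ['rule demo_rule { strings: $a = "evil.exe" condition: $a }']
    # Rows are CTI texts, columns are YARA rules; entries are cosine similarities.
    return embed(cti, tok, model) @ embed(rules, tok, model).T
```

Calling `demo()` returns a 1x1 similarity matrix for the sample pair. With larger batches, scores can be compared against a threshold near the values reported in the metrics table, bearing in mind that those thresholds were presumably tuned on the authors' data.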

Citation

@article{mitra2025falcon,
  title={FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation},
  author={Mitra, Shaswata and Bazarov, Azim and Duclos, Martin and Mittal, Sudip and Piplai, Aritran and Rahman, Md Rayhanur and Zieglar, Edward and Rahimi, Shahram},
  journal={arXiv preprint arXiv:2508.18684},
  year={2025}
}

Model size: 0.1B params (Safetensors, tensor type F32)

Paper: FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation (arXiv:2508.18684, published Aug 26, 2025)
