netFound-640M-base

Description

netFound is a network traffic foundation model built on a transformer architecture. It includes a pretraining phase on unlabeled data, which it relies on to achieve strong results.

Key features:

netFound takes raw PCAP data as input
netFound can (and needs to) be pretrained on an unlabeled dataset
netFound uses a hierarchical transformer architecture to take packet burst and flow behavior into account
netFound uses burst metadata (inter-arrival time, number of bytes per burst, etc.)
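
To make the burst metadata above more concrete, here is a minimal sketch of how per-burst values such as byte count and inter-arrival time could be derived from a list of packets. It is not netFound's actual preprocessing: the burst definition (consecutive packets in the same direction with no gap larger than max_gap), the names Packet, BurstMeta, burst_metadata, and the max_gap default are all assumptions made for illustration. See the source code repository for the real pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    size: int         # packet length in bytes
    outbound: bool    # True if sent by the flow initiator (assumed convention)


@dataclass
class BurstMeta:
    num_packets: int
    num_bytes: int
    inter_arrival_time: float  # seconds since the previous burst started


def burst_metadata(packets: List[Packet], max_gap: float = 0.01) -> List[BurstMeta]:
    """Group packets into bursts and summarize each burst.

    Assumed burst rule (hypothetical): a new burst starts when the
    direction changes or the gap to the previous packet exceeds max_gap.
    """
    bursts: List[BurstMeta] = []
    last: Optional[Packet] = None
    burst_start: Optional[float] = None

    for pkt in sorted(packets, key=lambda p: p.timestamp):
        starts_new_burst = (
            last is None
            or pkt.outbound != last.outbound
            or pkt.timestamp - last.timestamp > max_gap
        )
        if starts_new_burst:
            # The first burst has no predecessor, so its inter-arrival time is 0.
            iat = 0.0 if burst_start is None else pkt.timestamp - burst_start
            bursts.append(BurstMeta(num_packets=0, num_bytes=0, inter_arrival_time=iat))
            burst_start = pkt.timestamp
        bursts[-1].num_packets += 1
        bursts[-1].num_bytes += pkt.size
        last = pkt

    return bursts


if __name__ == "__main__":
    flow = [
        Packet(0.000, 100, True),
        Packet(0.001, 200, True),
        Packet(0.002, 50, False),
        Packet(0.500, 60, False),
    ]
    for burst in burst_metadata(flow):
        print(burst)
```

In this sketch the first two packets form one burst, the direction change starts a second, and the long pause starts a third. A different burst rule would change the grouping but not the shape of the metadata.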

Source code

https://github.com/SNL-UCSB/netfound

Pretraining dataset

For pretraining, we used a private real-world dataset consisting of more than 450 million network flows. The model was pretrained for approximately 1 epoch (iterating through ~480 million flows).

Checkpoint

Model: Large (16 heads, 24 hidden layers, 1024 hidden size)
Total params: 643,825,672
January 17, 2025
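
As a quick sanity check on the checkpoint shape listed above, the sketch below records the three stated sizes and derives the per-head dimension. The class name CheckpointShape and its field names are hypothetical, and the sketch assumes the usual multi-head attention convention in which the hidden size is split evenly across heads. The source does not confirm that netFound's configuration uses these names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointShape:
    # Values taken from the checkpoint description; field names are illustrative.
    num_heads: int = 16
    num_hidden_layers: int = 24
    hidden_size: int = 1024

    @property
    def head_dim(self) -> int:
        """Per-head width, assuming hidden_size is split evenly across heads."""
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


if __name__ == "__main__":
    shape = CheckpointShape()
    print(shape, "head_dim =", shape.head_dim)
```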

Paper

https://arxiv.org/abs/2310.17025