metadata
model-index:
- name: netFound-640M-base
  results:
  - task:
      type: fill-mask
    metrics:
    - name: Macro MLM F1
      type: f1
      value: 0.4038
    - name: Weighted MLM F1
      type: f1
      value: 0.8451
    - name: MLM Accuracy
      type: accuracy
      value: 0.8514
    - name: Swapped Weighted F1
      type: f1
      value: 0.9605
    - name: Perplexity
      type: perplexity
      value: 6.5842
netFound-640M-base
Description
netFound is a network traffic foundation model based on the transformer architecture. It includes a pretraining phase on unlabeled data, which it relies on to achieve strong results.
Key features:
- netFound takes raw PCAP data as input
- netFound can (and needs to) be pretrained on an unlabeled dataset
- netFound uses a hierarchical transformer architecture to take packet burst and flow behavior into account
- netFound uses burst metadata (inter-arrival time, number of bytes per burst, etc.)
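To make the burst metadata feature more concrete, here is a minimal sketch of how per-burst byte counts and inter-arrival times could be derived from a time-ordered packet list. It is not taken from the netFound code: the `Packet` and `BurstMeta` types, the helper names, and the rule that an idle gap above `max_gap` starts a new burst are all assumptions made for illustration, and netFound's own burst definition may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    timestamp: float  # seconds since the start of the capture
    size: int         # packet length in bytes


@dataclass
class BurstMeta:
    num_packets: int
    total_bytes: int
    inter_arrival: float  # gap since the previous burst ended (0.0 for the first burst)


def split_into_bursts(packets: List[Packet], max_gap: float = 0.01) -> List[List[Packet]]:
    """Group time-ordered packets; an idle gap above max_gap starts a new burst (assumed rule)."""
    bursts: List[List[Packet]] = []
    for pkt in packets:
        if bursts and pkt.timestamp - bursts[-1][-1].timestamp <= max_gap:
            bursts[-1].append(pkt)
        else:
            bursts.append([pkt])
    return bursts


def burst_metadata(bursts: List[List[Packet]]) -> List[BurstMeta]:
    """Compute packet count, byte count, and inter-arrival time for each burst."""
    meta: List[BurstMeta] = []
    prev_end = None
    for burst in bursts:
        gap = 0.0 if prev_end is None else burst[0].timestamp - prev_end
        meta.append(BurstMeta(len(burst), sum(p.size for p in burst), gap))
        prev_end = burst[-1].timestamp
    return meta


# Hypothetical flow: three packets close together, a long pause, then two more.
flow = [Packet(0.000, 60), Packet(0.002, 1500), Packet(0.004, 1500),
        Packet(0.500, 60), Packet(0.503, 400)]
meta = burst_metadata(split_into_bursts(flow))
for m in meta:
    print(m)
```

With the hypothetical flow above, the packets fall into two bursts, and the second burst carries the idle gap that separates it from the first as its inter-arrival time.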
Source code
https://github.com/SNL-UCSB/netfound
Pretraining dataset
For pretraining, we used a private real-world dataset consisting of more than 450 million network flows. The model was pretrained for approximately 1 epoch (iterating through ~480 million flows).
Checkpoint
Model: Large (16 heads, 24 hidden layers, 1024 hidden size)
Total params: 643,825,672
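As a rough sanity check on these figures, the sketch below estimates the parameter count of a single 24-layer transformer encoder stack with hidden size 1024. The layer layout (BERT-style attention and feed-forward blocks with biases, two layer norms, and a feed-forward width of 4x the hidden size) is an assumption, not something stated here, so treat the result as an order-of-magnitude estimate only.

```python
hidden_size = 1024             # from the checkpoint description
num_layers = 24                # from the checkpoint description
reported_total = 643_825_672   # from the checkpoint description
ffn_size = 4 * hidden_size     # ASSUMPTION: feed-forward width is not stated


def encoder_layer_params(h: int, ffn: int) -> int:
    """Parameter count of one assumed BERT-style encoder layer."""
    attention = 4 * (h * h + h)                     # Q, K, V, output projections with biases
    feed_forward = (h * ffn + ffn) + (ffn * h + h)  # two linear layers with biases
    layer_norms = 2 * 2 * h                         # two layer norms (scale and shift)
    return attention + feed_forward + layer_norms


stack_params = num_layers * encoder_layer_params(hidden_size, ffn_size)
share = stack_params / reported_total
print(f"estimated stack parameters: {stack_params:,}")
print(f"share of reported total: {share:.1%}")
```

Under these assumptions, one such stack accounts for somewhat under half of the reported total. The remaining parameters are not broken down here.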
January 17, 2025