Lens (pretrained base)

Pretrained checkpoint of Lens, a knowledge-guided foundation model for network traffic, published in Transactions on Machine Learning Research (TMLR). The backbone is T5-v1.1-base (about 0.25B parameters) with a network-specific byte-level BPE (BBPE) tokenizer (vocabulary size 32,112). It was pretrained on network-traffic flows with a knowledge-guided masked-span objective.

Files

  • pytorch_model.bin — pretrained weights (loads cleanly into T5ForConditionalGeneration).
  • config.json — model config (T5-v1.1-base, vocab_size=32112).
  • tokenizer.json, tokenizer_config.json, special_tokens_map.json — the network BBPE tokenizer.
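As a rough sanity check on the "about 0.25B" figure, the parameter count can be estimated by hand from the config values. The sketch below takes vocab_size and d_model from this card and the 12+12 layer layout of T5-v1.1-base; the remaining dimensions (d_ff 2048, 12 heads of size 64, 32 relative-position buckets, untied output head) are assumed from the standard T5-v1.1-base configuration rather than read from this checkpoint's config.json.

```python
# Back-of-the-envelope parameter count for T5-v1.1-base with vocab_size=32112.
VOCAB = 32_112      # from this card
D_MODEL = 768       # from this card
N_LAYERS = 12       # per stack (12 encoder + 12 decoder), from this card
D_FF = 2_048        # assumption: standard T5-v1.1-base value
N_HEADS = 12        # assumption: standard T5-v1.1-base value
D_KV = 64           # assumption: standard T5-v1.1-base value
REL_BUCKETS = 32    # assumption: standard relative-attention bucket count

attn = 4 * D_MODEL * (N_HEADS * D_KV)   # q, k, v, o projections (no biases in T5)
ffn = 3 * D_MODEL * D_FF                # gated-GELU: wi_0, wi_1, wo
norm = D_MODEL                          # T5 layer norm has a scale vector only
rel_bias = REL_BUCKETS * N_HEADS        # held by the first self-attention layer of each stack

enc_layer = attn + ffn + 2 * norm       # self-attention + FFN
dec_layer = 2 * attn + ffn + 3 * norm   # self-attention + cross-attention + FFN

encoder = N_LAYERS * enc_layer + rel_bias + norm   # plus final layer norm
decoder = N_LAYERS * dec_layer + rel_bias + norm
embeddings = VOCAB * D_MODEL
lm_head = VOCAB * D_MODEL               # assumption: untied, as in T5 v1.1

total = embeddings + encoder + decoder + lm_head
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
# prints: 247,553,280 parameters (~0.25B)
```

If the loaded model reports a different total, the checkpoint's config.json is the authority; the assumed dimensions above would be the first thing to compare.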

How to load

from transformers import T5ForConditionalGeneration, AutoTokenizer

# Fetch the network BBPE tokenizer and the pretrained weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained("Charles59/lens-pretrained")
model = T5ForConditionalGeneration.from_pretrained("Charles59/lens-pretrained")

The released Lens code adds the special tokens <SIP> / <DIP> (anonymized source/destination IP) at fine-tuning time and can run the optimized flash-attention variant (attention_type='flash'). For exact reproduction, load this checkpoint with the Lens training scripts and the corresponding downstream data.
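For readers who want to experiment without the Lens scripts, one plausible way to register the two IP tokens is sketched below. This is not taken from the Lens code: the helper name add_ip_tokens is hypothetical, and the approach (the tokenizer's add_tokens followed by the model's resize_token_embeddings) is just the usual Transformers pattern for extending a vocabulary. The new embedding rows are freshly initialized, so they only become meaningful after fine-tuning.

```python
# Hypothetical helper, not part of the released Lens code.
IP_TOKENS = ("<SIP>", "<DIP>")  # anonymized source / destination IP markers


def add_ip_tokens(tokenizer, model, tokens=IP_TOKENS):
    """Register the IP tokens and grow the embedding matrix if needed.

    Returns the number of tokens that were actually new to the tokenizer.
    """
    # add_tokens skips tokens that already exist and returns the count added.
    num_added = tokenizer.add_tokens(list(tokens), special_tokens=True)

    # Resize only when the tokenizer now has more entries than embedding rows.
    if len(tokenizer) > model.get_input_embeddings().num_embeddings:
        model.resize_token_embeddings(len(tokenizer))
    return num_added
```

With the tokenizer and model from the loading snippet above, calling add_ip_tokens(tokenizer, model) should return 2 the first time, provided neither token is already in the vocabulary, and 0 on a repeat call.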

Pretraining

  • Architecture: T5-v1.1-base (encoder-decoder, 12+12 layers, d_model 768, gated-GELU).
  • Objective: knowledge-guided masked-span prediction over packet/flow text.
  • Context length: up to 1,500 tokens.
  • Steps: 130,000 (10% warm-up), batch size 48, AdamW.
  • Pretraining data: the pretraining split of the NetBench source datasets, sampled without any downstream labels to avoid label leakage. The pretraining corpus itself is not released.
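To make the step budget concrete: 10% of 130,000 steps is 13,000 warm-up steps. The sketch below computes that and a learning-rate curve around it. Only the step count and warm-up fraction come from this card; the linear ramp, the linear decay to zero afterwards, and the peak value of 1e-4 are illustrative assumptions, since the card does not state the schedule shape or the learning rate.

```python
TOTAL_STEPS = 130_000   # from this card
WARMUP_PERCENT = 10     # from this card
WARMUP_STEPS = TOTAL_STEPS * WARMUP_PERCENT // 100  # 13,000


def lr_at(step, peak_lr=1e-4):
    """Learning rate at a given step.

    Assumptions (not stated in the card): linear warm-up, then linear decay
    to zero; peak_lr is a placeholder value.
    """
    if step < WARMUP_STEPS:
        return peak_lr * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return peak_lr * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 6_500, WARMUP_STEPS, 71_500, TOTAL_STEPS):
    print(f"step {step:>7,}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate reaches its peak at step 13,000 and falls back to zero at step 130,000; a different decay shape would change only the second branch.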

Downstream data

License

CC-BY-NC-4.0. Underlying data comes from academic datasets via NetBench (Qian et al., 2024); their original terms also apply.

Citation

@article{li2026lens,
  title   = {Lens: A Knowledge-Guided Foundation Model for Network Traffic},
  author  = {Li, Xiaochang and Qian, Chen and Wang, Qineng and Kong, Jiangtao and Wang, Yuchen and Yao, Ziyu and Ji, Bo and Cheng, Long and Zhou, Gang and Shao, Huajie},
  journal = {Transactions on Machine Learning Research},
  issn    = {2835-8856},
  year    = {2026},
  url     = {https://openreview.net/forum?id=cGDwTgnJIR},
  note    = {arXiv:2402.03646}
}