CyberLLM-350M
A 350M-parameter cybersecurity language model built entirely from scratch.
Model Details
- Architecture: LLaMA-3 style decoder-only transformer
- Parameters: 303.4M
- Training Data: 5B tokens (3.2B security, remainder general)
- Final Loss: 3.80 (pretrain) → 1.28 (SFT)
- Vocab: 32,000 tokens (custom SentencePiece)
- Context: 2,048 tokens
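To put the figures above in more familiar terms, the short sketch below converts the reported losses to perplexity and computes the security share of the training data. It assumes the losses are mean per-token cross-entropy in nats, which is the usual convention but is not stated here, and that 3.2B of the 5B tokens are security text. The pretraining and SFT losses were presumably measured on different data, so the two perplexities are not directly comparable.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats (assumed unit)."""
    return math.exp(loss_nats)


# Figures copied from the model details; the unit of the loss is an assumption.
pretrain_ppl = perplexity(3.80)
sft_ppl = perplexity(1.28)

# Share of security-focused tokens, assuming 3.2B out of 5B total.
security_share = 3.2 / 5.0

print(f"pretrain perplexity: {pretrain_ppl:.1f}")
print(f"SFT perplexity:      {sft_ppl:.2f}")
print(f"security share:      {security_share:.0%}")
```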
Training
Pretrained from random initialization on cybersecurity-weighted data including Trend Micro Primus-FineWeb, Stack Exchange security sites, ArXiv cs.CR, MITRE ATT&CK, NIST SP 800 series, and OWASP documentation.
Fine-tuned with 3,750 cybersecurity instruction-response pairs.
Usage
# Download and chat
git clone https://github.com/Omkarth/CyberLLM.git
cd CyberLLM
pip install huggingface_hub torch sentencepiece pyyaml
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='model.pt', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='config.yaml', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='cybersec_tokenizer.model', local_dir='tokenizer')
"
python training/chat.py --model checkpoints/model.pt --question "What is SQL injection?"
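The inline `python -c` download above can also be kept as a small script that skips files already on disk. This is only a sketch: the helper name `fetch_missing` and its `download` parameter are hypothetical and not part of the CyberLLM repository, while the repo ID, filenames, and target directories are copied from the commands above.

```python
from pathlib import Path

REPO_ID = "Omk07/CyberLLM-350M"

# (filename, local_dir) pairs copied from the download commands above.
FILES = [
    ("model.pt", "checkpoints"),
    ("config.yaml", "checkpoints"),
    ("cybersec_tokenizer.model", "tokenizer"),
]


def fetch_missing(root=".", download=None):
    """Download only the files that are not already present under `root`.

    `download` defaults to huggingface_hub.hf_hub_download. Accepting another
    callable is a hypothetical hook so the logic can be exercised offline.
    """
    if download is None:
        from huggingface_hub import hf_hub_download as download

    fetched = []
    for filename, local_dir in FILES:
        target = Path(root) / local_dir / filename
        if target.is_file():
            continue  # already on disk, skip the network call
        download(repo_id=REPO_ID, filename=filename, local_dir=str(target.parent))
        fetched.append(str(target))
    return fetched
```

Calling `fetch_missing()` from the `CyberLLM` directory should leave the files where `training/chat.py` is pointed in the command above; a second call returns an empty list because nothing is missing.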
Limitations
350M parameters is small: the model handles common security topics but struggles with niche technical details. It is not a production security tool.
Author
Omkar Thombre, Master of Computer Science, University of Adelaide