This is the SecRoBERTa model, pretrained on cyber security text, presented in SecBERT: A Pretrained Language Model for Cyber Security Text.
The training corpus consists of papers taken from:
- Stucco-Data: Cyber security data sources
- CASIE: Extracting Cybersecurity Event Information from Text
- SemEval-2018 Task 8: Semantic Extraction from CybersecUrity REports using Natural Language Processing (SecureNLP).
SecRoBERTa has its own wordpiece vocabulary (secvocab) that's built to best match the training corpus.
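To illustrate why a corpus-matched vocabulary helps, here is a minimal sketch of WordPiece-style greedy longest-match tokenization. The two vocabularies below are hypothetical toy sets made up for this example; they are not the real secvocab, and the function is a simplified stand-in rather than the tokenizer shipped with the model.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of a single word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate substring until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matched at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabularies (assumptions for illustration only).
general_vocab = {"ran", "##som", "##ware", "mal", "ex", "##ploit"}
security_vocab = general_vocab | {"ransomware", "malware", "exploit"}

for word in ["ransomware", "malware", "exploit"]:
    print(word,
          wordpiece_tokenize(word, general_vocab),
          wordpiece_tokenize(word, security_vocab))
```

With the toy general-purpose vocabulary, domain terms are broken into several fragments, while the toy domain vocabulary keeps each term as a single piece. That is the kind of effect a vocabulary built from the training corpus is meant to achieve.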
Available models include:
We set out to build a language model for cyber security text so that it can improve downstream tasks (NER, text classification, semantic understanding, Q&A) in the cyber security domain.
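As a starting point for trying the model before fine-tuning it on a downstream task, the following sketch queries it through the Hugging Face `transformers` fill-mask pipeline. The Hub identifier `jackaduma/SecRoBERTa` is an assumption (this card does not state it), and the helper names `load_fill_mask` and `top_predictions` are hypothetical, introduced only for this example.

```python
# Assumed Hub identifier; this card does not state it, so adjust as needed.
MODEL_ID = "jackaduma/SecRoBERTa"


def load_fill_mask(model_id=MODEL_ID):
    """Build a fill-mask pipeline (downloads the model on first use)."""
    # Imported lazily so this file can be read without transformers installed.
    from transformers import pipeline
    return pipeline("fill-mask", model=model_id, tokenizer=model_id)


def top_predictions(fill_mask, template, k=3):
    """Fill the {mask} slot in `template` and return the k best (token, score) pairs."""
    # Use the tokenizer's own mask token instead of hard-coding it.
    text = template.format(mask=fill_mask.tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in fill_mask(text, top_k=k)]
```

For example, `top_predictions(load_fill_mask(), "The attacker used a {mask} to gain access.")` would return the three highest-scoring candidate tokens. The actual predictions depend on the downloaded weights.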
The original repo can be found here.