distilbert-base-uncased-logline-v3

This model is a fine-tuned version of distilbert-base-uncased on the AIT Log Data Set V2.0 [1] (https://zenodo.org/records/5789064). It achieves the following results on the evaluation set:

  • Loss: 0.0022
  • Accuracy: 0.9995
  • F1: 0.9994

Model description

This model classifies individual log lines for network intrusion detection. The Python package that runs this model is available at https://github.com/Isaacwilliam4/INSyT. As described on the dataset's page, the data comprises the following log types: Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, Horde logs, Exim logs, syslog, and system monitoring logs.
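
For readers who want to query the checkpoint directly rather than through the INSyT package, the following is a minimal sketch using the Transformers `pipeline` API. `MODEL_ID` is a placeholder, since this card does not spell out the repository ID, and the helper names `load_classifier` and `classify_lines` are hypothetical. Whether predictions come back as the label names listed on this card or as generic `LABEL_n` strings depends on how the checkpoint's config was saved.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Placeholder (assumption): substitute the real Hub repository ID or a local path.
MODEL_ID = "path/or/hub-id/of/distilbert-base-uncased-logline-v3"


def load_classifier(model_id: str = MODEL_ID) -> Callable:
    """Build a text-classification pipeline (transformers is imported lazily)."""
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_lines(
    lines: Iterable[str], classifier: Callable
) -> list[tuple[str, str, float]]:
    """Classify each non-empty log line and return (line, label, score) triples."""
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    if not cleaned:
        return []
    # truncation=True guards against lines longer than DistilBERT's 512-token limit.
    predictions = classifier(cleaned, truncation=True)
    return [
        (ln, pred["label"], float(pred["score"]))
        for ln, pred in zip(cleaned, predictions)
    ]
```

With `transformers` installed and a valid model ID, something like `classify_lines(open("auth.log"), load_classifier())` would yield one triple per non-empty line; `auth.log` here is only an illustrative file name.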

Labels

Label Label Name
0 attacker:dnsteal:dnsteal-dropped
1 attacker:dnsteal:dnsteal-received
2 attacker:dnsteal:exfiltration-service
3 attacker_change_user:escalate
4 attacker_change_user:escalate:escalated_command:escalated_sudo_command
5 attacker_http:dirb:foothold
6 attacker_http:foothold:service_scan
7 attacker_http:foothold:webshell_cmd
8 attacker_http:foothold:webshell_upload
9 attacker_http:foothold:wpscan
10 attacker_vpn:escalate
11 attacker_vpn:foothold
12 benign
13 crack_passwords:escalate
14 dirb:foothold
15 dns_scan:foothold
16 escalate:escalated_command:escalated_sudo_command
17 escalate:escalated_command:escalated_sudo_command:escalated_sudo_session
18 escalate:webshell_cmd
19 foothold:network_scan
20 foothold:service_scan
21 foothold:traceroute
22 foothold:wpscan
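
The label names are colon-separated tags, so a prediction can be broken down into its components. The sketch below copies the table above into a dictionary and adds two hypothetical helpers, `label_tags` and `is_benign`. Treating every label other than `benign` as attack-related is an assumption based on the label names, not something this card states.

```python
# Label table from this card, as an ID -> name mapping.
ID2LABEL = {
    0: "attacker:dnsteal:dnsteal-dropped",
    1: "attacker:dnsteal:dnsteal-received",
    2: "attacker:dnsteal:exfiltration-service",
    3: "attacker_change_user:escalate",
    4: "attacker_change_user:escalate:escalated_command:escalated_sudo_command",
    5: "attacker_http:dirb:foothold",
    6: "attacker_http:foothold:service_scan",
    7: "attacker_http:foothold:webshell_cmd",
    8: "attacker_http:foothold:webshell_upload",
    9: "attacker_http:foothold:wpscan",
    10: "attacker_vpn:escalate",
    11: "attacker_vpn:foothold",
    12: "benign",
    13: "crack_passwords:escalate",
    14: "dirb:foothold",
    15: "dns_scan:foothold",
    16: "escalate:escalated_command:escalated_sudo_command",
    17: "escalate:escalated_command:escalated_sudo_command:escalated_sudo_session",
    18: "escalate:webshell_cmd",
    19: "foothold:network_scan",
    20: "foothold:service_scan",
    21: "foothold:traceroute",
    22: "foothold:wpscan",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_tags(label_id: int) -> list:
    """Split a label name into its individual colon-separated tags."""
    return ID2LABEL[label_id].split(":")


def is_benign(label_id: int) -> bool:
    """True only for the single 'benign' class (ID 12)."""
    return ID2LABEL[label_id] == "benign"
```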

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
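
To make the `linear` scheduler setting concrete, here is a small sketch of the learning rate it would produce at a given optimizer step. It assumes zero warmup steps (the card lists none) and takes the total of 18822 steps from the final row of the training results; the function name `linear_lr` is hypothetical.

```python
def linear_lr(
    step: int,
    base_lr: float = 2e-5,
    total_steps: int = 18822,
    warmup_steps: int = 0,  # assumption: no warmup is listed on this card
) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    # Learning rate at the start and at the end of each of the three epochs.
    for step in (0, 6274, 12548, 18822):
        print(step, linear_lr(step))
```

Under these assumptions the rate falls by one third of its initial value per epoch and reaches zero at the final step.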

Training results

Training Loss   Epoch   Step    Validation Loss   Accuracy   F1
0.0435          1.0     6274    0.0120            0.9965     0.9965
0.0059          2.0     12548   0.0032            0.9993     0.9992
0.0023          3.0     18822   0.0022            0.9995     0.9994
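
The step counts also give a rough idea of the training set size. The sketch below bounds the number of training examples from 6274 steps per epoch and a batch size of 32. It assumes a single device, no gradient accumulation, and that a final partial batch is kept, none of which this card states; `train_set_size_bounds` is a hypothetical helper.

```python
def train_set_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple:
    """Return (lower, upper) bounds on the number of examples seen per epoch."""
    # The last batch may hold anywhere from 1 to batch_size examples.
    lower = (steps_per_epoch - 1) * batch_size + 1
    upper = steps_per_epoch * batch_size
    return lower, upper


if __name__ == "__main__":
    print(train_set_size_bounds(6274, 32))
```

Under those assumptions the training split holds roughly 200,000 log lines.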

Test results

Test Loss   Test Accuracy   Test F1
0.0020      0.9994          0.9994

Five Fold Cross Validation Mean Test Confusion Matrix

[Figure: mean test confusion matrix over five-fold cross-validation]

Framework versions

  • Transformers 4.38.2
  • PyTorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.1

Citations

[1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner, M. Wurzenberger, and A. Rauber, “AIT Log Data Set V2.0”. Zenodo, Feb. 24, 2022. doi: 10.5281/zenodo.5789064.
