distilbert-base-uncased-logline-v3

This model is a fine-tuned version of distilbert-base-uncased on the AIT Log Data Set V2.0 [1] (https://zenodo.org/records/5789064). It achieves the following results on the evaluation set:

Loss: 0.0022
Accuracy: 0.9995
F1: 0.9994

Model description

This model performs text classification on lines from log files for network intrusion detection. The Python package that runs this model is available at https://github.com/Isaacwilliam4/INSyT. As described on the dataset's page, the training data covers the following log types: Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, horde logs, exim logs, syslog, and system monitoring logs.
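
As a rough illustration of calling the model directly through the Transformers `pipeline` API rather than through the INSyT package, the sketch below may help. `MODEL_ID` is a placeholder, since this card does not state the Hub path, and `classify` and `non_benign` are hypothetical helper names. Whether predictions carry the label names from the Labels table or generic `LABEL_<id>` strings depends on the uploaded model config, which is not confirmed here.

```python
from typing import Dict, List, Tuple

# Assumption: placeholder repository id; replace with the real Hub path or a local directory.
MODEL_ID = "your-namespace/distilbert-base-uncased-logline-v3"


def classify(lines: List[str], model_id: str = MODEL_ID) -> List[Dict]:
    """Return one {"label": ..., "score": ...} dict per log line."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    clf = pipeline("text-classification", model=model_id)
    # truncation=True clips lines that exceed the model's maximum input length.
    return clf(lines, truncation=True)


def non_benign(
    lines: List[str],
    predictions: List[Dict],
    benign_label: str = "benign",
) -> List[Tuple[str, str, float]]:
    """Keep only the lines whose predicted label differs from the benign label."""
    return [
        (line, pred["label"], pred["score"])
        for line, pred in zip(lines, predictions)
        if pred["label"] != benign_label
    ]
```

With a real model id in place, `non_benign(lines, classify(lines))` would return the lines flagged as anything other than benign, together with their predicted label and score. If the config exposes only generic names, pass `benign_label="LABEL_12"` instead.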

Labels

Label	Label Name
0	attacker:dnsteal:dnsteal-dropped
1	attacker:dnsteal:dnsteal-received
2	attacker:dnsteal:exfiltration-service
3	attacker_change_user:escalate
4	attacker_change_user:escalate:escalated_command:escalated_sudo_command
5	attacker_http:dirb:foothold
6	attacker_http:foothold:service_scan
7	attacker_http:foothold:webshell_cmd
8	attacker_http:foothold:webshell_upload
9	attacker_http:foothold:wpscan
10	attacker_vpn:escalate
11	attacker_vpn:foothold
12	benign
13	crack_passwords:escalate
14	dirb:foothold
15	dns_scan:foothold
16	escalate:escalated_command:escalated_sudo_command
17	escalate:escalated_command:escalated_sudo_command:escalated_sudo_session
18	escalate:webshell_cmd
19	foothold:network_scan
20	foothold:service_scan
21	foothold:traceroute
22	foothold:wpscan
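
The table above can be carried into code as a plain lookup. The sketch below copies the 23 label names verbatim and, on the assumption that each name is a colon-separated list of tags, splits a label into its parts. The names `LABELS`, `ID2LABEL`, `LABEL2ID`, `tags`, and `is_attack` are illustrative and not taken from the INSyT package.

```python
from typing import Dict, List

# Label names in id order, copied from the table above.
LABELS: List[str] = [
    "attacker:dnsteal:dnsteal-dropped",
    "attacker:dnsteal:dnsteal-received",
    "attacker:dnsteal:exfiltration-service",
    "attacker_change_user:escalate",
    "attacker_change_user:escalate:escalated_command:escalated_sudo_command",
    "attacker_http:dirb:foothold",
    "attacker_http:foothold:service_scan",
    "attacker_http:foothold:webshell_cmd",
    "attacker_http:foothold:webshell_upload",
    "attacker_http:foothold:wpscan",
    "attacker_vpn:escalate",
    "attacker_vpn:foothold",
    "benign",
    "crack_passwords:escalate",
    "dirb:foothold",
    "dns_scan:foothold",
    "escalate:escalated_command:escalated_sudo_command",
    "escalate:escalated_command:escalated_sudo_command:escalated_sudo_session",
    "escalate:webshell_cmd",
    "foothold:network_scan",
    "foothold:service_scan",
    "foothold:traceroute",
    "foothold:wpscan",
]

ID2LABEL: Dict[int, str] = dict(enumerate(LABELS))
LABEL2ID: Dict[str, int] = {name: idx for idx, name in ID2LABEL.items()}


def tags(label_id: int) -> List[str]:
    """Split a label name into its tags (assumes ':' is the separator)."""
    return ID2LABEL[label_id].split(":")


def is_attack(label_id: int) -> bool:
    """Every class other than 'benign' is treated as attack-related."""
    return ID2LABEL[label_id] != "benign"
```

For example, `tags(7)` yields the three parts of `attacker_http:foothold:webshell_cmd`, which makes it straightforward to group predictions by a shared tag such as `foothold` or `escalate`.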

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
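
To make the linear schedule concrete, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps, since none are listed above, and takes the total step count of 18822 (3 epochs of 6274 steps) from the training results table. The function name `linear_lr` is illustrative.

```python
BASE_LR = 2e-5          # learning_rate from the list above
STEPS_PER_EPOCH = 6274  # from the training results table
NUM_EPOCHS = 3
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 18822


def linear_lr(
    step: int,
    base_lr: float = BASE_LR,
    total_steps: int = TOTAL_STEPS,
    warmup_steps: int = 0,  # assumption: no warmup is listed on this card
) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Learning rate at the end of each epoch under these assumptions.
    for epoch in range(1, NUM_EPOCHS + 1):
        print(epoch, linear_lr(epoch * STEPS_PER_EPOCH))
```

Under these assumptions the rate starts at 2e-05, falls to half that value midway through training, and reaches zero at the final step.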

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1
0.0435	1.0	6274	0.0120	0.9965	0.9965
0.0059	2.0	12548	0.0032	0.9993	0.9992
0.0023	3.0	18822	0.0022	0.9995	0.9994

Test results

Test Loss	Test Accuracy	Test F1
0.0020	0.9994	0.9994

[Figure: five-fold cross-validation mean test confusion matrix]

Framework versions

Transformers 4.38.2
Pytorch 2.0.0+cu117
Datasets 2.18.0
Tokenizers 0.15.1

Citations

[1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner, M. Wurzenberger, and A. Rauber, “AIT Log Data Set V2.0”. Zenodo, Feb. 24, 2022. doi: 10.5281/zenodo.5789064.
