yashika0998/IoT-23-BERT-Network-Logs-Classification

Introduction:

While exploring language models, we came across an article on intrusion detection systems and the IoT-23 dataset. This led us to investigate whether language models could improve the classification of network logs as malicious or benign, compared with traditional methods such as SVMs and decision trees. Our motivation was to address the class imbalance issue in existing methods and to make our research accessible, with the goal of accurate predictions on this critical task.

Intrusion Detection System Development:

We preprocessed the public IoT-23 dataset, which contains both benign and malicious traffic flows, and applied the SMOTE technique to oversample the minority benign class so that the model trains on balanced data. The final datasets were uploaded to the Hugging Face Hub, focusing on the columns used for classification, for accessibility and reproducibility. We mapped the column features to sentences suitable for language models and fine-tuned the uncased BERT model on the encoded logs. The model reached 96% test accuracy after 12 epochs of training on a Ryzen 5 CPU. We saved the fine-tuned model in the cross-platform ONNX format for optimized deployment and future inference. We also developed an interactive Gradio interface that lets users upload log files and evaluates the model in real time on traffic logs from captured Zeek/pcap files. The entire pipeline is hosted on Hugging Face Spaces for public availability.
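To illustrate the oversampling step, here is a minimal pure-Python sketch of SMOTE-style interpolation. It is not the code used in this project (which may well have relied on a library implementation); the function name `smote_oversample`, the parameter `k`, and the sample feature values are all hypothetical and chosen only for illustration.

```python
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical numeric features for a few benign flows:
# [packets sent by originator, IP-level bytes sent by originator]
benign = [[2.0, 80.0], [3.0, 120.0], [1.0, 40.0], [4.0, 160.0]]
new_rows = smote_oversample(benign, n_new=6)
balanced_benign = benign + new_rows
```

Because each synthetic row is an interpolation between two real benign rows, it stays inside the range spanned by the original minority samples rather than duplicating them exactly.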

Conclusion:

This project applies NLP to IoT security and shows what language models can contribute to it. Anomaly detection is key to thwarting attacks, and we hope this open-source work encourages others to join in and build on it.

Note: For inference and to try out the model, please go to the Space at https://huggingface.co/spaces/yashika0998/IoT-23-BERT-Network-Logs-Classification

Example sentence to test the model inference: response port is 8081. transport protocol is tcp. connection state is S0. number of packets sent by the origin is 2. number of IP level bytes sent by the originator is 80. number of IP level bytes sent by the responder is 0
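A sentence like the one above could be built from a single connection record roughly as follows. This is a sketch, not the project's actual preprocessing code: the keys are standard Zeek `conn.log` field names, but the `FIELD_PHRASES` table and the `record_to_sentence` helper are assumptions made for illustration, and the real template may differ.

```python
# Hypothetical mapping from Zeek conn.log fields to the phrases used in
# the example sentence; the author's actual template may differ.
FIELD_PHRASES = [
    ("id.resp_p", "response port"),
    ("proto", "transport protocol"),
    ("conn_state", "connection state"),
    ("orig_pkts", "number of packets sent by the origin"),
    ("orig_ip_bytes", "number of IP level bytes sent by the originator"),
    ("resp_ip_bytes", "number of IP level bytes sent by the responder"),
]

def record_to_sentence(record):
    """Render each known field as '<phrase> is <value>' and join with '. '."""
    parts = [
        f"{phrase} is {record[field]}"
        for field, phrase in FIELD_PHRASES
        if field in record
    ]
    return ". ".join(parts)

# One connection record, with values taken from the example sentence.
record = {
    "id.resp_p": 8081,
    "proto": "tcp",
    "conn_state": "S0",
    "orig_pkts": 2,
    "orig_ip_bytes": 80,
    "resp_ip_bytes": 0,
}
sentence = record_to_sentence(record)
print(sentence)
```

The resulting string can then be pasted into the Space's text input or passed to a tokenizer.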