Model Card for SQL Injection Classifier

This model is a classifier that detects SQL injection attacks in SQL queries. It is fine-tuned from the google/gemma-2b-it model with the PEFT library, using a dataset of SQL queries with and without SQL injection attacks.

Model Details

Model Description

This SQL injection classifier is a fine-tuned version of the google/gemma-2b-it model that detects potential SQL injection in SQL queries. It was trained with the PEFT (Parameter-Efficient Fine-Tuning) library. PEFT adapts the base model by training a small set of parameters instead of all of its weights, which reduces the compute and memory needed for fine-tuning.

On the held-out test set of 11,125 SQL queries, the model reports the following results when classifying queries as secure or vulnerable:

Accuracy: 0.9984 
Precision: 0.9974 
Recall: 0.9993 
F1-score: 0.9984 
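The headline numbers can be cross-checked with a little arithmetic. The Python sketch below recomputes F1 as the harmonic mean of precision and recall. It then roughly back-calculates a confusion matrix from the supports in the classification report. It assumes that the headline precision, recall, and F1 treat Vulnerable as the positive class. The resulting counts are estimates derived from rounded figures, not numbers reported for this model.

```python
# Headline metrics as reported in this card (rounded to 4 decimals).
precision, recall = 0.9974, 0.9993

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
# About 0.99835 from the rounded inputs; agrees with the reported 0.9984 within rounding.

# ASSUMPTION: the metrics treat "Vulnerable" as the positive class.
# Supports are taken from the classification report.
n_secure, n_vulnerable = 5658, 5467

tp = round(recall * n_vulnerable)   # 5463 vulnerable queries caught (estimate)
fn = n_vulnerable - tp              # 4 vulnerable queries missed (estimate)
fp = round(tp / precision - tp)     # 14 secure queries flagged (estimate)
tn = n_secure - fp                  # 5644 secure queries passed (estimate)

est_accuracy = (tp + tn) / (n_secure + n_vulnerable)
print(f"F1 = {f1:.4f}, estimated accuracy = {est_accuracy:.4f}")
```

The estimated accuracy, 11107 / 11125 ≈ 0.9984, matches the reported accuracy. This suggests the positive-class assumption is at least consistent with the published figures.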

Classification Report: 

              precision    recall  f1-score   support 

     Secure     1.00      1.00      1.00      5658 
 Vulnerable     1.00      1.00      1.00      5467 
    accuracy                         1.00     11125 
   macro avg    1.00      1.00      1.00     11125 
weighted avg    1.00      1.00      1.00     11125

Developed by: Mahesh Jamdade
Model type: Text Classification
Language(s) (NLP): SQL, English
License: [More Information Needed]
Finetuned from model: google/gemma-2b-it

Model Sources

Repository: https://huggingface.co/maheshmnj/sql-injection-classifier

Uses

Direct Use

This model can be directly used to classify SQL queries as either secure or vulnerable to SQL injection attacks. It can be integrated into security tools, database management systems, or web application firewalls to provide an additional layer of protection against SQL injection vulnerabilities.
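As a minimal sketch of the integration described above, a query gate could refuse any query the classifier flags before it reaches the database. `guard_query` and `QueryRejected` are hypothetical names. `classify` stands for any callable that returns "Secure" or "Vulnerable", such as the `classify_query` helper in the getting-started snippet.

```python
from typing import Callable

# Any function mapping a SQL string to "Secure" or "Vulnerable".
Classifier = Callable[[str], str]


class QueryRejected(Exception):
    """Raised when the classifier flags a query as vulnerable (hypothetical)."""


def guard_query(query: str, classify: Classifier) -> str:
    """Return the query unchanged if it is classified as Secure, otherwise raise."""
    label = classify(query)
    if label == "Vulnerable":
        raise QueryRejected(f"blocked query: {query!r}")
    return query
```

Passing the classifier in as a parameter keeps the gate independent of how the model is loaded. It also lets tests substitute a stub.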

Downstream Use

The model can be further fine-tuned or integrated into larger security ecosystems. It could be used as a component in:

Code review tools
Automated security testing suites
Real-time query analysis systems in database applications

Out-of-Scope Use

This model is specifically trained for SQL injection detection and should not be used for:

Detecting other types of security vulnerabilities
Generating or correcting SQL queries
Analyzing queries in languages other than SQL

Bias, Risks, and Limitations

The model's performance may vary on SQL dialects or patterns not well-represented in the training data.
False positives and false negatives are rare on the held-out test set, but they can still occur and should be accounted for in critical applications.
The model may not catch highly sophisticated or novel SQL injection techniques.
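One way to manage the false-positive/false-negative trade-off is to threshold the predicted probability instead of taking the argmax. The sketch below applies a softmax to the two logits. It assumes index 1 is Vulnerable, matching the getting-started code. The threshold value is a deployment choice, not something this card specifies.

```python
import math
from typing import Sequence


def vulnerable_probability(logits: Sequence[float]) -> float:
    """Softmax over [secure_logit, vulnerable_logit]; returns P(Vulnerable)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[1] / sum(exps)


def label_with_threshold(logits: Sequence[float], threshold: float = 0.5) -> str:
    """Flag a query as Vulnerable once P(Vulnerable) reaches the threshold."""
    return "Vulnerable" if vulnerable_probability(logits) >= threshold else "Secure"
```

With the model, the logits for a single query could be obtained as `outputs.logits[0].tolist()`. For logits `[2.0, 1.0]`, P(Vulnerable) is about 0.27. The query is labeled Secure at the default threshold of 0.5 but Vulnerable at 0.2. Lowering the threshold catches more attacks at the cost of more false alarms.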

Recommendations

Always use this model as part of a comprehensive security strategy, not as the sole defense against SQL injection.
Regularly update and retrain the model with new, real-world SQL injection patterns.
Implement additional security measures such as parameterized queries and input sanitization.
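To illustrate the parameterized-query recommendation, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The table and data are made up for the example. An injection payload in the style of the getting-started example returns every row when concatenated into the SQL text, but matches nothing when passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("admin", "owner"), ("alice", "member")],
)

untrusted = "nobody' OR '1'='1"

# UNSAFE: string concatenation lets the input rewrite the WHERE clause.
unsafe_sql = "SELECT username FROM users WHERE username = '" + untrusted + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: the ? placeholder binds the whole input as a single string value.
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (untrusted,)
).fetchall()

print(unsafe_rows)  # both users are returned
print(safe_rows)    # no user is literally named "nobody' OR '1'='1"
```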

How to Get Started with the Model

Use the following code to get started with the model:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "maheshj01/sql-injection-classifier"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()  # make sure dropout is disabled for inference

# Function to classify a SQL query
def classify_query(query):
    # Tokenize a single query into PyTorch tensors
    inputs = tokenizer(query, return_tensors="pt", truncation=True, padding=True)
    # Gradients are not needed for inference
    with torch.no_grad():
        outputs = model(**inputs)
    # Index of the highest-scoring class: 1 -> Vulnerable, 0 -> Secure
    prediction = outputs.logits.argmax(dim=-1).item()
    return "Vulnerable" if prediction == 1 else "Secure"

# Example usage
query = "SELECT * FROM users WHERE username = 'admin' OR '1'='1'"
result = classify_query(query)
print(f"The query is classified as: {result}")

Training Details

Training Data

The model was trained on a dataset of SQL queries, including both secure queries and queries containing SQL injection vulnerabilities. [More specific information about the dataset is needed]

Training Procedure

The model was fine-tuned using the PEFT library, which allows for efficient adaptation of the pre-trained Gemma 2B model to the SQL injection classification task.
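The card does not say which PEFT method was used. Suppose it was LoRA, a common PEFT choice that is not confirmed here. In that case the efficiency comes from expressing the update to a d_out × d_in weight matrix as the product of two low-rank factors. The sketch counts trainable parameters for one such matrix. The width 2048 and rank 8 are illustrative values, not this model's configuration.

```python
def full_update_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is fine-tuned."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a rank-`rank` LoRA update B @ A,
    with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * d_in + d_out * rank


# Illustrative sizes only (hypothetical, not read from the model config).
d = 2048
full = full_update_params(d, d)   # 4,194,304
lora = lora_params(d, d, rank=8)  # 32,768
print(f"LoRA trains {lora / full:.2%} of this matrix's parameters")  # 0.78%
```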

Training Hyperparameters

Training regime: [More Information Needed]

Evaluation

The model was evaluated on a held-out test set of SQL queries, achieving high performance across all metrics as shown in the classification report above.
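The evaluation script is not included in this card. The per-class table appears to follow the layout of scikit-learn's `classification_report`. The headline figures can be reproduced as binary metrics, assuming Vulnerable (label 1) is the positive class. The pure-Python sketch below shows that computation. `y_true` and `y_pred` are hypothetical lists of 0/1 labels, for example gathered by running `classify_query` over the test set.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with `positive` (1 = Vulnerable) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```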

Environmental Impact

[More Information Needed]

Technical Specifications

Model Architecture and Objective

The model is based on the google/gemma-2b-it architecture, fine-tuned for binary classification of SQL queries.
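The card says only that the model is fine-tuned for binary classification. One common way to turn a decoder-only model like Gemma into a classifier follows three steps:

1. Take the hidden state of the final real (non-padding) token.
2. Project it to one logit per class with a linear layer.
3. Train with cross-entropy.

As far as we know, this is roughly what the sequence-classification heads in transformers do, but we have not verified this model's exact head configuration. The pure-Python sketch below shows that objective on toy numbers. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def last_token_pool(hidden_states: Sequence[List[float]], attention_mask: Sequence[int]) -> List[float]:
    """Hidden vector of the last position whose mask is 1 (the last real token)."""
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return list(hidden_states[last])


def score(vector: Sequence[float], weight: Sequence[Sequence[float]]) -> List[float]:
    """Bias-free linear head: one logit per class; weight is [num_labels][hidden_size]."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability of the true label under a softmax over the logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Toy example: 3 positions, hidden size 2, last position is padding.
hidden = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = last_token_pool(hidden, mask)             # [0.0, 1.0]
logits = score(pooled, [[1.0, 0.0], [0.0, 1.0]])   # [secure, vulnerable] = [0.0, 1.0]
loss = cross_entropy(logits, label=1)              # label 1 = Vulnerable
```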

Compute Infrastructure

Software

PEFT 0.8.2
Transformers [version needed]
PyTorch [version needed]

Model Card Contact

For questions or concerns about this model, please contact Mahesh Jamdade through the Hugging Face repository.
