---
license: mit
---
# Dockerfile Commit Classification Model

This is a Logistic Regression model paired with a rule-based system for multi-label classification of Dockerfile-related commit messages. Domain-specific rules adjust the classifier's output so that each commit message is assigned one or more of the categories listed below.

## Files
- `logistic_model.joblib`: Trained Logistic Regression model.
- `tfidf_vectorizer.joblib`: TF-IDF vectorizer for text preprocessing.
- `label_binarizer.joblib`: MultiLabelBinarizer for encoding/decoding labels.
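
The training setup behind these files is not documented here. As a rough sketch of how three artifacts of this kind fit together, the example below fits a `TfidfVectorizer`, a `MultiLabelBinarizer`, and a one-vs-rest Logistic Regression on a few made-up messages. The `OneVsRestClassifier` wrapper, the toy data, and all hyperparameters are assumptions for illustration, not the configuration used for this model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data; the real training set is not part of this repository.
messages = [
    "Fix broken base image tag",
    "Add healthcheck instruction",
    "Refactor RUN commands into one layer",
    "Fix typo and add missing ENV variable",
]
labels = [
    ["bug fix"],
    ["feature addition"],
    ["code refactoring"],
    ["bug fix", "feature addition"],
]

# Encode label lists as a 0/1 indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# Turn raw text into TF-IDF features.
tfidf_vectorizer = TfidfVectorizer()
X = tfidf_vectorizer.fit_transform(messages)

# Assumption: one binary Logistic Regression per label (one-vs-rest).
model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
model.fit(X, Y)

# Predict on a new message and decode the indicator row back to label names.
pred = model.predict(tfidf_vectorizer.transform(["Fix base image"]))
decoded = mlb.inverse_transform(pred)
```

Each fitted object could then be saved with `joblib.dump`, which would yield files analogous to the ones listed above.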

## Features
- **Hybrid Approach**: Combines machine learning predictions with rule-based adjustments.
- **Dockerfile-Specific Labels**: Categorizes commit messages into predefined classes:
  - `bug fix`
  - `code refactoring`
  - `feature addition`
  - `maintenance/other`
  - `Not enough information`
- **Multi-Label Support**: Each commit message can belong to multiple categories.
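
The rules themselves are not included in the listed files. The sketch below shows one way a rule-based adjustment could be layered over the model's predicted labels. The `RULES` table, the keywords, and the `apply_rules` helper are all hypothetical and serve only to illustrate the idea.

```python
# Hypothetical keyword rules; the rules used with this model are not documented here.
RULES = {
    "bug fix": ("fix", "bug", "error"),
    "code refactoring": ("refactor", "cleanup", "simplify"),
    "feature addition": ("add", "introduce", "support"),
}
FALLBACK_LABEL = "Not enough information"


def apply_rules(message, predicted_labels):
    """Return the model's labels plus any label whose keyword occurs in the message."""
    text = message.lower()
    labels = set(predicted_labels)
    for label, keywords in RULES.items():
        # Naive substring match: "fix" also matches "fixed" and "prefix".
        if any(keyword in text for keyword in keywords):
            labels.add(label)
    # If neither the model nor the rules produced a label, use the fallback.
    if not labels:
        labels.add(FALLBACK_LABEL)
    return sorted(labels)
```

For instance, `apply_rules("Added multistage builds", ["maintenance/other"])` keeps the model's label and adds `feature addition` because the message contains "add". A real rule set would likely need word-boundary matching to avoid false hits.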

## How to Use
To use this model, load the files and preprocess your data as follows:

```python
from joblib import load

# Load the model and preprocessing artifacts
model = load("logistic_model.joblib")
tfidf_vectorizer = load("tfidf_vectorizer.joblib")
mlb = load("label_binarizer.joblib")

# Example usage
new_messages = [
    "Fixed an issue with the base image in Dockerfile",
    "Added multistage builds to reduce image size",
    "Updated Python version in Dockerfile to 3.10"
]
X_new_tfidf = tfidf_vectorizer.transform(new_messages)

# Predict the labels
predictions = model.predict(X_new_tfidf)
predicted_labels = mlb.inverse_transform(predictions)

# Print results
for msg, labels in zip(new_messages, predicted_labels):
    print(f"Message: {msg}")
    print(f"Predicted Labels: {', '.join(labels) if labels else 'No labels'}\n")
```