Dockerfile Commit Classification Model

This model assigns one or more categories to Dockerfile-related commit messages (multi-label classification). It combines a Logistic Regression classifier with a domain-specific rule-based system.

Files

  • logistic_model.joblib: Trained Logistic Regression model.
  • tfidf_vectorizer.joblib: TF-IDF vectorizer for text preprocessing.
  • label_binarizer.joblib: MultiLabelBinarizer for encoding/decoding labels.
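The three files work together. The vectorizer turns text into features, the model predicts a 0/1 indicator for each label, and the binarizer maps indicator columns back to label names. The sketch below shows one way a matching set of files could be produced and reloaded with scikit-learn. It assumes the classifier is a `OneVsRestClassifier` wrapping `LogisticRegression`, which the card does not state, and it trains on a handful of hypothetical messages purely for illustration. It is not the author's training code.

```python
import os
import tempfile

from joblib import dump, load
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy training data; the real training set is not part of this card.
messages = [
    "Fix broken apt-get install in Dockerfile",
    "Refactor Dockerfile to merge RUN layers",
    "Add healthcheck to Dockerfile",
    "Bump base image version",
    "Fix typo and refactor build stage",
    "update",
]
labels = [
    ["bug fix"],
    ["code refactoring"],
    ["feature addition"],
    ["maintenance/other"],
    ["bug fix", "code refactoring"],
    ["Not enough information"],
]

# Fit the three components.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)           # dense (n_samples, n_labels) 0/1 matrix
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(messages)  # sparse TF-IDF features
model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
model.fit(X, Y)                         # one binary classifier per label

# Save under the same file names the card lists, then reload.
out_dir = tempfile.mkdtemp()
dump(model, os.path.join(out_dir, "logistic_model.joblib"))
dump(vectorizer, os.path.join(out_dir, "tfidf_vectorizer.joblib"))
dump(mlb, os.path.join(out_dir, "label_binarizer.joblib"))

model2 = load(os.path.join(out_dir, "logistic_model.joblib"))
vectorizer2 = load(os.path.join(out_dir, "tfidf_vectorizer.joblib"))
mlb2 = load(os.path.join(out_dir, "label_binarizer.joblib"))

# Same predict -> inverse_transform path as in the usage example.
preds = model2.predict(vectorizer2.transform(["Fix Dockerfile build"]))
print(mlb2.inverse_transform(preds))
```

`joblib.load` unpickles arbitrary Python objects, so only load these files from a source you trust, and use a scikit-learn version compatible with the one that saved them.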

Features

  • Hybrid Approach: Rule-based adjustments complement the Logistic Regression predictions.
  • Dockerfile-Specific Labels: Categorizes commit messages into predefined classes:
    • bug fix
    • code refactoring
    • feature addition
    • maintenance/other
    • Not enough information
  • Multi-Label Support: Each commit message can belong to multiple categories.
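The card does not document the rules themselves. As a purely hypothetical illustration of a hybrid approach, the sketch below post-processes the model's predicted label set with keyword rules: each matching keyword adds its label, and a very short or unmatched message falls back to `Not enough information`. The keyword patterns, the `min_words` cutoff, and the `apply_rules` function are invented for this example and are not the author's rules. Because rules only add labels to the model's output, one message can end up with several categories, which fits the multi-label setup.

```python
import re

# Hypothetical keyword rules; NOT the rules used by the published model.
KEYWORD_RULES = {
    "bug fix": re.compile(r"\b(fix|fixed|fixes|bug)\b", re.IGNORECASE),
    "code refactoring": re.compile(r"\b(refactor|refactored|cleanup|restructure)\b", re.IGNORECASE),
    "feature addition": re.compile(r"\b(add|added|adds|introduce)\b", re.IGNORECASE),
    "maintenance/other": re.compile(r"\b(update|updated|bump|upgrade)\b", re.IGNORECASE),
}
FALLBACK = "Not enough information"


def apply_rules(message, predicted, min_words=3):
    """Merge hypothetical rule-based labels into the model's predicted labels."""
    # Rule 1: a very short message carries too little signal; return the fallback only.
    if len(message.split()) < min_words:
        return (FALLBACK,)
    labels = set(predicted)
    # Rule 2: every matching keyword pattern adds its label.
    for label, pattern in KEYWORD_RULES.items():
        if pattern.search(message):
            labels.add(label)
    # Rule 3: the fallback is exclusive; drop it once a concrete label is present.
    if len(labels) > 1:
        labels.discard(FALLBACK)
    # Rule 4: nothing predicted and nothing matched means the fallback applies.
    if not labels:
        labels.add(FALLBACK)
    return tuple(sorted(labels))


print(apply_rules("Fixed an issue with the base image in Dockerfile", ()))  # ('bug fix',)
print(apply_rules("wip", ("bug fix",)))  # ('Not enough information',)
```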

How to Use

To use this model, load the three files, transform new commit messages with the saved TF-IDF vectorizer, and decode the predictions with the label binarizer:

from joblib import load

# Load the model and preprocessing artifacts
model = load("logistic_model.joblib")
tfidf_vectorizer = load("tfidf_vectorizer.joblib")
mlb = load("label_binarizer.joblib")

# Example usage
new_messages = [
    "Fixed an issue with the base image in Dockerfile",
    "Added multistage builds to reduce image size",
    "Updated Python version in Dockerfile to 3.10"
]
X_new_tfidf = tfidf_vectorizer.transform(new_messages)

# Predict the labels
predictions = model.predict(X_new_tfidf)
predicted_labels = mlb.inverse_transform(predictions)

# Print results
for msg, labels in zip(new_messages, predicted_labels):
    print(f"Message: {msg}")
    print(f"Predicted Labels: {', '.join(labels) if labels else 'No labels'}\n")
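To see how confident the classifier is for each label, rather than only the thresholded output of `predict`, you can read per-label probabilities. The sketch below assumes the saved model exposes `predict_proba` returning one column per label in the same order as `mlb.classes_`, which holds for a scikit-learn `OneVsRestClassifier` fitted on the binarizer's output; the card does not confirm the model type. The helper name `label_scores` and its `threshold` parameter are hypothetical.

```python
def label_scores(model, vectorizer, mlb, messages, threshold=0.5):
    """Return, per message, a dict of label -> probability for labels >= threshold.

    Hypothetical helper. Assumes model.predict_proba returns an array of shape
    (n_messages, n_labels) whose columns follow the order of mlb.classes_.
    """
    probs = model.predict_proba(vectorizer.transform(messages))
    results = []
    for row in probs:
        # Pair each label name with its probability and keep those above the cutoff.
        scores = {
            label: float(p)
            for label, p in zip(mlb.classes_, row)
            if p >= threshold
        }
        results.append(scores)
    return results


# Usage with the objects loaded in the example above:
# for msg, scores in zip(new_messages, label_scores(model, tfidf_vectorizer, mlb, new_messages, threshold=0.3)):
#     print(msg, scores)
```

Lowering `threshold` surfaces secondary labels that `predict` (which effectively cuts at 0.5 for logistic regression) would drop.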

Hugging Face repository: meriemm6/commit-classification-logreg