The model is a port of our CommentBERT model from the paper:
@inproceedings{ochodek2022automated,
title={Automated code review comment classification to improve modern code reviews},
author={Ochodek, Miroslaw and Staron, Miroslaw and Meding, Wilhelm and S{\"o}der, Ola},
booktitle={International Conference on Software Quality},
pages={23--40},
year={2022},
organization={Springer}
}
The original model was implemented in Keras with two outputs: comment-purpose and subject-purpose. Here, we divided it into two separate models with one output each.
license: apache-2.0
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import numpy as np
def sigmoid(x):
    return 1/(1 + np.exp(-x))
classes = [
    'code_design',
    'code_style',
    'code_naming',
    'code_logic',
    'code_io',
    'code_data',
    'code_doc',
    'code_api',
    'compatibility',
    'rule_def',
    'config_commit_patch_review',
    'config_building_installing',
]
class2id = {class_:id for id, class_ in enumerate(classes)}
id2class = {id:class_ for class_, id in class2id.items()}
checkpoint = 'mochodek/bert4comment-subject'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
text = "What do you think about making this constant?"
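# Tokenize the comment and run it through the model.
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
# Apply a sigmoid to each logit to get a per-class score.
logits = output.logits.detach().numpy()
scores = sigmoid(logits)
# Keep every class whose score exceeds 0.5.
scores = (scores > 0.5).astype(int).reshape(-1)
scores_labels = [class_name for class_name in classes if scores[class2id[class_name]] == 1]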
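The thresholding step above discards the scores themselves. If it is useful to see how confident each prediction is, the same decoding can be sketched as a small standalone helper. This is only an illustration: `decode_labels`, its `threshold` parameter, and the example logits are hypothetical and not part of the released model; the label order is copied from the `classes` list above.

```python
import math

# Label order copied from the `classes` list in the usage example.
CLASSES = [
    'code_design', 'code_style', 'code_naming', 'code_logic',
    'code_io', 'code_data', 'code_doc', 'code_api',
    'compatibility', 'rule_def',
    'config_commit_patch_review', 'config_building_installing',
]


def stable_sigmoid(x):
    """Sigmoid that avoids overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_labels(logits, classes=CLASSES, threshold=0.5):
    """Hypothetical helper: map one row of logits to (label, score) pairs.

    Returns only the classes whose score exceeds `threshold`,
    sorted from highest to lowest score.
    """
    if len(logits) != len(classes):
        raise ValueError("expected one logit per class")
    scores = [stable_sigmoid(float(x)) for x in logits]
    selected = [(name, s) for name, s in zip(classes, scores) if s > threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Made-up logits for illustration only, not real model output.
example_logits = [-3.0, -2.0, 2.0, 0.5, -4.0, -1.0,
                  -2.5, -3.0, -5.0, -4.0, -6.0, -6.0]
for name, score in decode_labels(example_logits):
    print(f"{name}: {score:.2f}")
```

With the real model, the first row of the `logits` array from the example above (`logits[0]`) could be passed in place of `example_logits`. The 0.5 default mirrors the cutoff used in the example; whether another cutoff works better is not something the model card states.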