---
license: apache-2.0
language:
- en
---

The model is a port of our CommentBERT model from the paper:

```
@inproceedings{ochodek2022automated,
  title={Automated code review comment classification to improve modern code reviews},
  author={Ochodek, Miroslaw and Staron, Miroslaw and Meding, Wilhelm and S{\"o}der, Ola},
  booktitle={International Conference on Software Quality},
  pages={23--40},
  year={2022},
  organization={Springer}
}
```

The original model was implemented in Keras with two outputs: comment purpose and comment subject. Here, we split it into two separate models with one output each. This model (`mochodek/bert4comment-subject`) predicts the subject of a code review comment. It is a multi-label classifier: a sigmoid is applied to each class logit independently, and every class whose score exceeds 0.5 is returned.

Example usage:

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


# Comment-subject classes, in the order of the model's output logits
classes = [
    'code_design',
    'code_style',
    'code_naming',
    'code_logic',
    'code_io',
    'code_data',
    'code_doc',
    'code_api',
    'compatibility',
    'rule_def',
    'config_commit_patch_review',
    'config_building_installing',
]
class2id = {class_: idx for idx, class_ in enumerate(classes)}
id2class = {idx: class_ for class_, idx in class2id.items()}

checkpoint = 'mochodek/bert4comment-subject'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

text = "What do you think about making this constant?"
encoded_input = tokenizer(text, return_tensors='pt')

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    output = model(**encoded_input)

# Multi-label: apply an independent sigmoid per class and threshold at 0.5
logits = output.logits.numpy()
scores = sigmoid(logits)
scores = (scores > 0.5).astype(int).reshape(-1)
scores_labels = [class_name for class_name in classes if scores[class2id[class_name]] == 1]
print(scores_labels)
```
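The post-processing step of the usage example (a sigmoid per class, then a 0.5 threshold) can be factored into a small helper that also handles a batch of comments. This is a minimal sketch, not part of the released model or of Transformers: `decode_multilabel` and `CLASSES` are hypothetical names, and the sketch assumes the class order matches the model's logits, as the usage example does. It runs on made-up logits so it works without downloading the checkpoint.

```python
import numpy as np

# Label order copied from the model card's `classes` list (assumed to match
# the order of the model's output logits).
CLASSES = [
    'code_design', 'code_style', 'code_naming', 'code_logic',
    'code_io', 'code_data', 'code_doc', 'code_api',
    'compatibility', 'rule_def',
    'config_commit_patch_review', 'config_building_installing',
]


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def decode_multilabel(logits, classes=CLASSES, threshold=0.5):
    """Turn a (batch_size, num_classes) array of logits into label lists.

    Each class is decided independently (multi-label), so a comment can
    receive zero, one, or several labels. A 1-D array is treated as a
    batch of one.
    """
    logits = np.atleast_2d(np.asarray(logits, dtype=float))
    if logits.shape[1] != len(classes):
        raise ValueError(
            f"expected {len(classes)} logits per row, got {logits.shape[1]}"
        )
    scores = sigmoid(logits)
    return [
        [name for name, hit in zip(classes, row > threshold) if hit]
        for row in scores
    ]


# Hypothetical logits for two comments (not real model output).
fake_logits = np.full((2, len(CLASSES)), -3.0)
fake_logits[0, CLASSES.index('code_naming')] = 2.0
fake_logits[1, CLASSES.index('code_logic')] = 1.5
fake_logits[1, CLASSES.index('code_doc')] = 0.7
print(decode_multilabel(fake_logits))
# [['code_naming'], ['code_logic', 'code_doc']]
```

With the tokenizer and model from the usage example loaded, several comments could be encoded together with `tokenizer(texts, padding=True, truncation=True, return_tensors='pt')` and the resulting `output.logits.numpy()` passed to `decode_multilabel`. Because the sigmoid is monotonic and equals 0.5 at zero, thresholding the scores at 0.5 is the same as keeping the classes whose logit is positive.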