This is a microsoft/codebert-base-mlm model, trained for 1,000,000 steps (with batch_size=32) on C code from the codeparrot/github-code-clean dataset, on the masked-language-modeling task.
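Because the model was trained on masked language modeling, one way to probe it is to mask a token in a C snippet and ask for the most likely fillers. The following is a minimal sketch, not an official usage recipe: it assumes the Hugging Face transformers library is installed, the helper names mask_first and top_predictions are hypothetical, and MODEL_ID is a placeholder for this model's hub identifier, which is not stated here.

```python
MASK_TOKEN = "<mask>"  # mask token used by RoBERTa-style tokenizers such as CodeBERT's


def mask_first(code: str, target: str) -> str:
    """Replace the first occurrence of `target` in `code` with the mask token."""
    if target not in code:
        raise ValueError(f"{target!r} not found in code")
    return code.replace(target, MASK_TOKEN, 1)


def top_predictions(model_id: str, masked_code: str, k: int = 5):
    """Return (token, score) pairs for the k most likely fillers.

    transformers is imported lazily so that mask_first can be used
    without it. `model_id` must be supplied by the caller.
    """
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    return [(p["token_str"], p["score"]) for p in fill(masked_code, top_k=k)]


if __name__ == "__main__":
    snippet = "int main(void) { return 0; }"
    masked = mask_first(snippet, "return")
    print(masked)
    # Replace MODEL_ID with this model's hub identifier before running:
    # print(top_predictions("MODEL_ID", masked))
```

The input passed to top_predictions should contain exactly one mask token; with several masks the fill-mask pipeline returns a nested list instead of a flat one.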
It is intended for use in CodeBERTScore (https://github.com/neulab/code-bert-score), but it can also be used in other models or for other tasks.
For more information, see: https://github.com/neulab/code-bert-score
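As a rough illustration of the intended use, the sketch below scores candidate C snippets against references with the code_bert_score package from the repository linked above. It rests on assumptions: that the package is installed, that its score function accepts cands, refs, and lang keyword arguments and returns a (precision, recall, F1, F3) tuple, and that lang="c" selects a C-specific model such as this one. The wrapper name score_c_snippets is hypothetical; consult the repository for the authoritative interface.

```python
def score_c_snippets(candidates, references):
    """Score each candidate C snippet against its paired reference.

    Returns a dict with per-pair precision, recall, F1, and F3 values,
    exactly as returned by code_bert_score.score (assumed to be tensors).
    """
    if len(candidates) != len(references):
        raise ValueError("candidates and references must have the same length")

    # Imported lazily; assumed to be the package from neulab/code-bert-score.
    import code_bert_score

    precision, recall, f1, f3 = code_bert_score.score(
        cands=candidates, refs=references, lang="c"
    )
    return {"precision": precision, "recall": recall, "f1": f1, "f3": f3}


if __name__ == "__main__":
    cands = ["int add(int a, int b) { return a + b; }"]
    refs = ["int sum(int x, int y) { return x + y; }"]
    # Uncomment once the package is installed (downloads a model on first use):
    # print(score_c_snippets(cands, refs)["f1"])
```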
Citation
If you use this model for research, please cite:
@article{zhou2023codebertscore,
url = {https://arxiv.org/abs/2302.05527},
author = {Zhou, Shuyan and Alon, Uri and Agarwal, Sumit and Neubig, Graham},
title = {CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code},
publisher = {arXiv},
year = {2023},
}