CuBERT: Learning and Evaluating Contextual Embedding of Source Code
Overview
This model is an unofficial Hugging Face port of CuBERT. This version comes from the checkpoint gs://cubert/20210711_Python/pre_trained_model_epochs_2__length_1024, which was trained on 2021-07-11 for 2 epochs with a 1024-token context window on the Python BigQuery dataset. I manually converted the TensorFlow checkpoint to PyTorch and uploaded it here. The tokenizer has not been converted yet, so input IDs must still come from the original CuBERT tokenizer. All credit goes to Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, and Kensen Shi.
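The checkpoint path above encodes the training date, language, epoch count, and context length. As a minimal sketch, the helper below pulls those fields out of the path. The function name `parse_checkpoint_uri` and the `CheckpointInfo` type are hypothetical, and the regular expression assumes that other checkpoints follow the same naming pattern as this one example, which is not confirmed here.

```python
import re
from typing import NamedTuple


class CheckpointInfo(NamedTuple):
    """Fields recovered from a CuBERT checkpoint path (hypothetical helper type)."""
    bucket: str
    date: str        # ISO date, e.g. "2021-07-11"
    language: str
    epochs: int
    max_length: int  # context window in tokens


# Assumption: the layout is inferred from the single path quoted in this card.
_PATTERN = re.compile(
    r"^gs://(?P<bucket>[^/]+)/(?P<date>\d{8})_(?P<language>[^/]+)/"
    r"pre_trained_model_epochs_(?P<epochs>\d+)__length_(?P<length>\d+)/?$"
)


def parse_checkpoint_uri(uri: str) -> CheckpointInfo:
    """Split a checkpoint URI into its metadata fields, or raise ValueError."""
    match = _PATTERN.match(uri)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint path: {uri!r}")
    raw_date = match["date"]
    return CheckpointInfo(
        bucket=match["bucket"],
        date=f"{raw_date[:4]}-{raw_date[4:6]}-{raw_date[6:]}",
        language=match["language"],
        epochs=int(match["epochs"]),
        max_length=int(match["length"]),
    )


info = parse_checkpoint_uri(
    "gs://cubert/20210711_Python/pre_trained_model_epochs_2__length_1024"
)
print(info)
```

The `max_length` field is the one most worth checking before use, since inputs longer than the context window would need to be truncated or split.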
The other versions are available here:
Citation:
@inproceedings{cubert,
  author    = {Aditya Kanade and
               Petros Maniatis and
               Gogul Balakrishnan and
               Kensen Shi},
  title     = {Learning and evaluating contextual embedding of source code},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning,
               {ICML} 2020, 12-18 July 2020},
  series    = {Proceedings of Machine Learning Research},
  publisher = {{PMLR}},
  year      = {2020},
}