Edit model card
YAML Metadata Error: "language" with value "python" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like "code", "multilingual". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.

This is a roberta pre-trained version on the CodeSearchNet dataset for Python Mask Language Model mission.

To load the model: (necessary packages: !pip install transformers sentencepiece)

from transformers import AutoTokenizer, AutoModelWithLMHead, pipeline
tokenizer = AutoTokenizer.from_pretrained("dbernsohn/roberta-python")
model = AutoModelWithLMHead.from_pretrained("dbernsohn/roberta-python")

fill_mask = pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer
)

You can then use this model to fill masked words in a Python code.

code = """
new_dict = {}
for k, v in my_dict.<mask>():
    new_dict[k] = v**2
""".lstrip()

pred = {x["token_str"].replace("Ġ", ""): x["score"] for x in fill_mask(code)}
sorted(pred.items(), key=lambda kv: kv[1], reverse=True)
# [('items', 0.7376779913902283),
# ('keys', 0.16238391399383545),
# ('values', 0.03965481370687485),
# ('iteritems', 0.03346433863043785),
# ('splitlines', 0.0032723243348300457)]

The whole training process and hyperparameters are in my GitHub repo

Created by Dor Bernsohn

Downloads last month
39
Hosted inference API
Mask token: <mask>
This model can be loaded on the Inference API on-demand.

Dataset used to train dbernsohn/roberta-python