---
language: python
datasets:
- code_search_net
---

roberta-python
This is a RoBERTa model pre-trained on the CodeSearchNet dataset for the masked language modeling (MLM) task on Python code.
To load the model (required packages: !pip install transformers sentencepiece):
from transformers import AutoTokenizer, AutoModelWithLMHead, pipeline
tokenizer = AutoTokenizer.from_pretrained("dbernsohn/roberta-python")
model = AutoModelWithLMHead.from_pretrained("dbernsohn/roberta-python")
fill_mask = pipeline(
"fill-mask",
model=model,
tokenizer=tokenizer
)
You can then use this model to fill masked tokens in Python code.
code = """
new_dict = {}
for k, v in my_dict.<mask>():
new_dict[k] = v**2
""".lstrip()
pred = {x["token_str"].replace("Ġ", ""): x["score"] for x in fill_mask(code)}
sorted(pred.items(), key=lambda kv: kv[1], reverse=True)
# [('items', 0.7376779913902283),
# ('keys', 0.16238391399383545),
# ('values', 0.03965481370687485),
# ('iteritems', 0.03346433863043785),
# ('splitlines', 0.0032723243348300457)]
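The same pipeline can be reused for any snippet containing a single <mask> token. The helper below is a small illustrative sketch (not part of the original card) that wraps the calls above; the example snippet is arbitrary and the returned scores depend on the model.

def top_predictions(snippet):
    # Run the fill-mask pipeline defined above and strip RoBERTa's "Ġ" space marker.
    preds = fill_mask(snippet)
    scores = {p["token_str"].replace("Ġ", ""): p["score"] for p in preds}
    # Return (token, score) pairs, best candidate first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

top_predictions("for i in <mask>(10):\n    print(i)")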
The full training process and hyperparameters are available in my GitHub repo.
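For orientation only, here is a minimal sketch of what masked language model training on the Python split of code_search_net could look like with the Hugging Face Trainer. The starting checkpoint (roberta-base), the func_code_string column, and every hyperparameter below are illustrative assumptions, not the settings actually used; those are documented in the repo.

from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Illustrative assumptions: a roberta-base starting point and the public
# code_search_net "python" config with its func_code_string column.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")
dataset = load_dataset("code_search_net", "python", split="train")

def tokenize(batch):
    return tokenizer(batch["func_code_string"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Dynamic masking: 15% of tokens are masked on the fly for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir="roberta-python", per_device_train_batch_size=16, num_train_epochs=1)
Trainer(model=model, args=args, data_collator=collator, train_dataset=tokenized).train()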
Created by Dor Bernsohn