roberta_python

language: code
datasets:
- code_search_net
- Fraser/python-lines
tags:
- python
- code
- masked-lm
widget:
- text: "assert 6 == sum([i for i in range(

Details

This is a RoBERTa-base model trained on the Python portion of CodeSearchNet. It reached a dev perplexity of 3.296.
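To put the perplexity figure in context, the short sketch below converts it back to a mean loss. It assumes the reported value is the exponential of the mean masked-token cross-entropy in nats, which is the common convention for masked language models; the card does not state how the number was computed, so treat this as illustrative arithmetic only. The names `implied_loss` and `perplexity` are made up for this example.

```python
import math

# Dev perplexity reported in this model card.
reported_perplexity = 3.296

# Assumption: perplexity = exp(mean cross-entropy loss in nats),
# so the implied mean loss is the natural log of the perplexity.
implied_loss = math.log(reported_perplexity)


def perplexity(mean_loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_loss)


print(f"implied mean loss: {implied_loss:.3f} nats")
print(f"round trip perplexity: {perplexity(implied_loss):.3f}")
```

Under that assumption, a perplexity of 3.296 corresponds to a mean loss of roughly 1.19 nats per masked token.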

This model was used for the enumerative solver baseline described in the Programming Puzzles paper.

See also the Python Programming Puzzles (P3) Repository for more details.

Usage

You can either load the model and further fine-tune it for a target task (as done for the puzzle solver), or you can experiment with mask-filling directly with this model as in the following example:

from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the tokenizer and the masked-language-model weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("tals/roberta_python")
model = AutoModelForMaskedLM.from_pretrained("tals/roberta_python")

# Build a fill-mask pipeline around the loaded model
demo = pipeline("fill-mask", model=model, tokenizer=tokenizer)

code = """sum= 0
for i in range(<mask>):
    sum += i
assert sum == 6
"""
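# Print each predicted token with its score
for prediction in demo(code):
    print(prediction["token_str"], prediction["score"])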
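For a single `<mask>`, the fill-mask pipeline returns a list of candidate dictionaries that include `token_str` and `score` fields. One way to use them, sketched here, is to substitute each candidate into the snippet and keep only the fills for which the code runs and its assertion holds. The helper `passing_fills` and the hard-coded `example_candidates` list are hypothetical stand-ins for illustration; they are not part of the model, the `transformers` library, or the puzzle solver from the paper.

```python
def passing_fills(template, candidates, mask_token="<mask>"):
    """Return the candidate fills for which the completed snippet runs cleanly.

    `candidates` is expected to look like fill-mask pipeline output:
    a list of dicts, each with at least a "token_str" key.
    """
    passing = []
    for candidate in candidates:
        fill = candidate["token_str"].strip()  # tokens may carry a leading space
        filled = template.replace(mask_token, fill)
        try:
            # Run the completed snippet in a fresh namespace.
            # Caution: this executes model-generated code; use a sandbox
            # for anything beyond a toy example like this one.
            exec(filled, {})
        except Exception:
            continue  # syntax error, failed assert, or runtime error
        passing.append(fill)
    return passing


code = """sum= 0
for i in range(<mask>):
    sum += i
assert sum == 6
"""

# Hypothetical candidates standing in for real pipeline output.
example_candidates = [
    {"token_str": " 3", "score": 0.40},
    {"token_str": " 4", "score": 0.30},
    {"token_str": " len", "score": 0.10},
]

print(passing_fills(code, example_candidates))
```

With real predictions, `demo(code)` could be passed in place of `example_candidates`. Only the fill `4` satisfies the assertion here, since 0 + 1 + 2 + 3 = 6.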

BibTeX entry and citation info

@inproceedings{
      schuster2021programming,
      title={Programming Puzzles},
      author={Tal Schuster and Ashwin Kalyan and Alex Polozov and Adam Tauman Kalai},
      booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
      year={2021},
      url={https://openreview.net/forum?id=fe_hCc4RBrg}
}