---
language: code
datasets:
- code_search_net
- Fraser/python-lines
tags:
- python
- code
- masked-lm
widget:
- text: "assert 6 == sum([i for i in range(<mask>)])"
---

# roberta_python

# Details
This is a roBERTa-base model trained on the Python portion of [CodeSearchNet](https://github.com/github/CodeSearchNet). It reached a dev perplexity of 3.296.

This model was used for the enumerative solver baseline described in the [Programming Puzzles paper](https://arxiv.org/abs/2106.05784). See also the [Python Programming Puzzles (P3) repository](https://github.com/microsoft/PythonProgrammingPuzzles) for more details.

# Usage
You can either load the model and further fine-tune it for a target task (as was done for the puzzle solver), or experiment with mask filling directly, as in the following example:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# AutoModelForMaskedLM is the non-deprecated replacement for AutoModelWithLMHead.
tokenizer = AutoTokenizer.from_pretrained("tals/roberta_python")
model = AutoModelForMaskedLM.from_pretrained("tals/roberta_python")

demo = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Ask the model to fill in the argument of range() so that the assert holds.
code = """sum = 0
for i in range(<mask>):
    sum += i
assert sum == 6
"""
demo(code)
```

# BibTeX entry and citation info
```bibtex
@inproceedings{
    schuster2021programming,
    title={Programming Puzzles},
    author={Tal Schuster and Ashwin Kalyan and Alex Polozov and Adam Tauman Kalai},
    booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
    year={2021},
    url={https://openreview.net/forum?id=fe_hCc4RBrg}
}
```
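
# Fine-tuning sketch
If you want to follow the fine-tuning route mentioned in the Usage section, the block below is a minimal sketch using the Hugging Face `Trainer`. The task (classifying whether a snippet contains an `assert`), the tiny in-memory dataset, and the hyperparameters are placeholders chosen for illustration; they are not the setup used for the puzzle solver.

```python
# A minimal fine-tuning sketch (hypothetical task: detect whether a snippet
# contains an assert statement). The toy dataset, labels, and hyperparameters
# are placeholders, not the puzzle-solver training setup.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("tals/roberta_python")
model = AutoModelForSequenceClassification.from_pretrained(
    "tals/roberta_python", num_labels=2
)

# Toy in-memory dataset; replace with your own task data.
raw = Dataset.from_dict({
    "text": ["assert sum(range(4)) == 6", "squares = [i * i for i in range(10)]"],
    "label": [1, 0],
})

def tokenize(batch):
    # Pad to a fixed length so the default data collator can batch examples.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

train_dataset = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="roberta_python_finetuned",  # placeholder output directory
    num_train_epochs=1,
    per_device_train_batch_size=2,
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```

In practice, replace the toy dataset and classification head with whatever matches your target task; the paper and the P3 repository linked above describe the actual puzzle-solver setup.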