---
language: code
datasets:
- code_search_net
- Fraser/python-lines
tags:
- python
- code
- masked-lm
widget:
- text: "assert 6 == sum([i for i in range(<mask>)])"
---

# roberta_python

# Details
This is a roBERTa-base model trained on the Python portion of [CodeSearchNet](https://github.com/github/CodeSearchNet). It reached a dev perplexity of 3.296.

This model was used for the enumerative solver baseline described in the [Programming Puzzles paper](https://arxiv.org/abs/2106.05784). See also the [Python Programming Puzzles (P3) repository](https://github.com/microsoft/PythonProgrammingPuzzles) for more details.

# Usage
You can either load the model and further fine-tune it for a target task (as was done for the puzzle solver), or experiment with mask filling directly, as in the following example:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# AutoModelForMaskedLM is the non-deprecated replacement for AutoModelWithLMHead.
tokenizer = AutoTokenizer.from_pretrained("tals/roberta_python")
model = AutoModelForMaskedLM.from_pretrained("tals/roberta_python")

demo = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Ask the model to fill in the argument of range() so that the assert holds.
code = """sum = 0
for i in range(<mask>):
    sum += i
assert sum == 6
"""
demo(code)
```

# BibTeX entry and citation info
```bibtex
@inproceedings{
    schuster2021programming,
    title={Programming Puzzles},
    author={Tal Schuster and Ashwin Kalyan and Alex Polozov and Adam Tauman Kalai},
    booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
    year={2021},
    url={https://openreview.net/forum?id=fe_hCc4RBrg}
}
```
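
# Fine-tuning sketch
If you want to follow the fine-tuning route mentioned in the Usage section, the block below is a minimal sketch using the Hugging Face `Trainer`. The task (classifying whether a snippet contains an `assert`), the tiny in-memory dataset, and the hyperparameters are placeholders chosen for illustration; they are not the setup used for the puzzle solver.

```python
# A minimal fine-tuning sketch (hypothetical task: detect whether a snippet
# contains an assert statement). The toy dataset, labels, and hyperparameters
# are placeholders, not the puzzle-solver training setup.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("tals/roberta_python")
model = AutoModelForSequenceClassification.from_pretrained(
    "tals/roberta_python", num_labels=2
)

# Toy in-memory dataset; replace with your own task data.
raw = Dataset.from_dict({
    "text": ["assert sum(range(4)) == 6", "squares = [i * i for i in range(10)]"],
    "label": [1, 0],
})

def tokenize(batch):
    # Pad to a fixed length so the default data collator can batch examples.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

train_dataset = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="roberta_python_finetuned",  # placeholder output directory
    num_train_epochs=1,
    per_device_train_batch_size=2,
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```

In practice, replace the toy dataset and classification head with whatever matches your target task; the paper and the P3 repository linked above describe the actual puzzle-solver setup.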