CodeParrot 🦜 is a GPT-2 model (1.5B parameters) trained to generate Python code.
You can load the CodeParrot model and tokenizer directly in `transformers`:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lvwerra/codeparrot")
model = AutoModelForCausalLM.from_pretrained("lvwerra/codeparrot")

inputs = tokenizer("def hello_world():", return_tensors="pt")
outputs = model(**inputs)
```
or with a `pipeline`:
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="lvwerra/codeparrot")
outputs = pipe("def hello_world():")
```
The model was trained on the cleaned CodeParrot 🦜 dataset with the following settings:
The training was executed on 16 x A100 (40GB) GPUs. With these settings, the model saw roughly 26 billion tokens during training.
We evaluated the model on OpenAI's HumanEval benchmark, which consists of programming challenges:
The pass@k metric measures the probability that at least one out of k generations passes the unit tests.
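As a sketch of how pass@k can be computed: the HumanEval paper proposes an unbiased estimator based on sampling n ≥ k generations per problem, of which c pass the tests. The function below implements that estimator (the function name and variable names are my own for illustration):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n = total generations sampled
    for a problem, c = number of those that pass the tests,
    k = evaluation budget. Returns the estimated probability that
    at least one of k sampled generations passes."""
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    # 1 minus the probability that all k sampled generations fail.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 generations of which 1 passes, pass@1 is 0.1, while pass@10 is 1.0, since the budget covers every sample.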