CodeParrot

non-profit

AI & ML interests

Language models for code.

Check the new instruction-tuning resources:

  • InstructHumanEval: a variant of the HumanEval benchmark adapted for instruction-tuned models (InstructHumanEval)

  • Full Curated CoNaLa: we used UL2 to rewrite more than 590k uncurated intents in the CoNaLa dataset (conala-mined-curated)

  • Self-Instruct with StarCoder: we release a self-instruct dataset generated with StarCoder, as well as the code we used to build it (self-instruct-starcoder)

  • Models trained on CoNaLa and self-instruct StarCoder: we release the models we trained on the previous two datasets.


  • This organization is dedicated to language models for code generation. In particular, CodeParrot is a GPT-2 model trained to generate Python code. For advanced code language models and pre-training datasets, we recommend checking our work in the BigCode organization. Here you can find: