This is an **interactive** blog that gives an overview of open-source language models for code generation. We present their pretraining datasets, model architectures, and evaluation, along with examples and tips for using the 🤗 Hub for this task. At the end of this blog, you will find a **demo** to test and compare code generation across these models ✨.

## Introduction
The application of language models to code generation has sparked significant interest recently. You have probably heard of [Codex](https://arxiv.org/pdf/2107.03374v2.pdf), which powers [GitHub Copilot](https://copilot.github.com/), or [AlphaCode](https://arxiv.org/pdf/2203.07814v1.pdf) for competition-level programming, both of which give promising results for code generation. However, these models aren't open-source, and they are hard to reproduce with a limited budget and incomplete information about their training. Luckily, the ML community has contributed some code models to allow for further research.

It is easy to get lost among the different models, so at Hugging Face we aim to democratize ML and centralize all information in the 🤗 Hub to make using open-source tools easier and more efficient. Code models are no exception: you can find all open-source code models on the Hub, along with several code datasets and evaluation metrics. In this blog, we will give an overview of these tools and how to use them.