This is an **interactive** blog that gives an overview of open-source language models for code generation. We present their pretraining datasets, model architectures, and evaluation, along with examples and tips for using the πŸ€— hub for this task. At the end of this blog, you will find a **demo** to test and compare code generation across these models ✨.
## Introduction
The application of language models to code generation has sparked significant interest recently. You have probably heard of [Codex](https://arxiv.org/pdf/2107.03374v2.pdf), which powers [GitHub Copilot](https://copilot.github.com/), or [AlphaCode](https://arxiv.org/pdf/2203.07814v1.pdf) for competition-level programming, both of which show promising results for code generation. However, these models aren't open-source, and they are hard to reproduce with a limited budget and incomplete information about their training. Luckily, the ML community has contributed several open code models that allow for further research.
It can be easy to get lost among the different models, so at Hugging Face we aim to democratize ML and centralize all information on the πŸ€— hub to make using open-source tools easier and more efficient. Code models are no exception: you can find all open-source code models on the hub, along with several code datasets and evaluation metrics. In this blog, we give an overview of these tools and how to use them.
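
As a taste of what this looks like in practice, here is a minimal sketch of loading an open-source code model from the hub with the `transformers` pipeline and generating a completion. The checkpoint name is just an illustrative example; any causal code model on the hub can be swapped in, and the generation parameters are assumptions you would tune for your use case.

```python
# Minimal sketch: load a code generation model from the πŸ€— hub and complete a prompt.
# The model name below is only an example checkpoint; replace it with any code model on the hub.
from transformers import pipeline

generator = pipeline("text-generation", model="codeparrot/codeparrot-small")

prompt = "def fibonacci(n):"
# Sample a short completion; temperature and length are illustrative defaults.
outputs = generator(prompt, max_new_tokens=48, do_sample=True, temperature=0.2)
print(outputs[0]["generated_text"])
```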