This is an interactive blog post that gives an overview of open-source language models for code generation. It presents code datasets, model architectures, and evaluations, along with examples and tips for using the 🤗 Hub for this task. At the end of the post, you will find a demo to test and compare code generation across these models directly in the browser! ✨

Introduction

The application of language models to code generation has sparked great interest recently. You have probably heard of Codex, the model behind GitHub Copilot, or AlphaCode for competition-level programming. These models aren't open-source, and they are hard to reproduce with a limited budget and incomplete information about their training. Fortunately, the ML community has contributed some open-source code models to allow for further research.

However, it can be easy to get lost among the available models. At Hugging Face we aim to democratize ML and centralize all information in the 🤗 ecosystem to make open-source tools easier and more efficient to use. Code models are no exception: you can find all open-source models on the Hub, along with several code datasets and evaluation metrics. In this blog post we will give an overview of these tools and how to use them.