Gemini

For an in-depth understanding of our model and methods, please see our blog.

Model description

Gemini is a transformer based on Google's T5 model. It was pre-trained on approximately 800k code/description pairs and then fine-tuned on 10k synthetically generated higher-level explanations. Gemini can summarize and explain short to medium code snippets in:

Python
JavaScript (mostly vanilla JS, though it can also handle frameworks like React)
Java
Ruby
Go

It outputs a description in English.

Intended uses

Without any additional fine-tuning, Gemini can explain code in a sentence or two, and it typically performs best on Python and JavaScript. We recommend using Gemini for simple code explanation, documentation, or producing more synthetic data to improve its explanations.
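As a rough illustration of the synthetic-data use case, the sketch below pairs each snippet with its generated description and serializes the pairs as JSON lines. The helper name `build_pairs`, the record keys, and the stand-in `fake_summarizer` are assumptions made for this example and are not part of the model or its training pipeline. In practice you would pass the real `summarizer` pipeline from the "How to use" section.

```python
import json
from typing import Callable, Iterable, List


def build_pairs(summarizer: Callable, snippets: Iterable[str]) -> List[str]:
    """Return one JSON line per snippet, pairing the code with its description."""
    lines = []
    for code in snippets:
        # Same call shape as the pipeline example in "How to use".
        response = summarizer(code, max_length=100, num_beams=3)
        record = {"code": code, "description": response[0]["generated_text"]}
        lines.append(json.dumps(record))
    return lines


# Hypothetical stand-in so the sketch runs without downloading the model.
def fake_summarizer(code, **kwargs):
    return [{"generated_text": "Describes: " + code}]


pairs = build_pairs(fake_summarizer, ["print('hi')", "x = 1 + 2"])
print("\n".join(pairs))
```

Generated descriptions should be reviewed before being used as training data, since the model's own errors would otherwise be fed back into it.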

How to use

You can use this model directly with a pipeline for Text2Text generation, as shown below:

from transformers import pipeline

# Load the text2text-generation pipeline with the Gemini checkpoint.
summarizer = pipeline('text2text-generation', model='describeai/gemini')
code = "print('hello world!')"

# Generate a description using beam search, capped at 100 tokens.
response = summarizer(code, max_length=100, num_beams=3)
print("Summarized code: " + response[0]['generated_text'])

This should yield something along the lines of:

Summarized code: The following code is greeting the world.
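For documentation purposes, the generated description can be attached to the snippet as a comment. The following is a minimal sketch of that idea; the `annotate` helper and the `COMMENT_PREFIX` table are hypothetical names introduced here, and the description string is simply the sample output above rather than a fresh model call.

```python
# Single-line comment prefixes for the five supported languages
# (a lookup table assumed for this sketch).
COMMENT_PREFIX = {
    "python": "#",
    "ruby": "#",
    "javascript": "//",
    "java": "//",
    "go": "//",
}


def annotate(code: str, description: str, language: str = "python") -> str:
    """Prepend the description to the snippet as a single-line comment."""
    prefix = COMMENT_PREFIX[language.lower()]
    # Collapse whitespace so a multi-line description stays on one comment line.
    one_line = " ".join(description.split())
    return f"{prefix} {one_line}\n{code}"


annotated = annotate(
    "print('hello world!')",
    "The following code is greeting the world.",
)
print(annotated)
```

With the real pipeline, `response[0]['generated_text']` would be passed as the `description` argument.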

Model sizes

Gemini (this repo): 770 million parameters
Gemini-Small: 220 million parameters

Limitations

Gemini may produce overly simplistic descriptions that don't cover the entire code snippet. We suspect that more training data would mitigate this and produce better results.
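One possible workaround, offered here as an assumption rather than a method the authors describe, is to split a longer file into smaller units and summarize each one separately. The sketch below does this for Python source only, using the standard library `ast` module. The names `describe_definitions` and `fake_summarizer` are hypothetical, and the stub replaces the real pipeline so the example runs without downloading the model.

```python
import ast
from typing import Callable, Dict


def describe_definitions(source: str, summarizer: Callable) -> Dict[str, str]:
    """Summarize each top-level function or class in a Python source string."""
    descriptions = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment (Python 3.8+) recovers the text of the node,
            # excluding any decorators above it.
            segment = ast.get_source_segment(source, node)
            response = summarizer(segment, max_length=100, num_beams=3)
            descriptions[node.name] = response[0]["generated_text"]
    return descriptions


# Hypothetical stand-in that reports how many lines it was given.
def fake_summarizer(code, **kwargs):
    return [{"generated_text": f"{len(code.splitlines())} line(s)"}]


source = "def add(a, b):\n    return a + b\n\n\nclass Box:\n    pass\n"
result = describe_definitions(source, fake_summarizer)
print(result)
```

Whether per-definition summaries actually improve coverage with Gemini is untested here; the sketch only shows how the input could be broken into short snippets of the size the model is described as handling.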

About Us

At Describe.ai, we are focused on building Artificial Intelligence systems that can understand language as well as humans. While it is a long path, we plan to contribute our findings to our API and to the open source community.
