|
--- |
|
language: |
|
- en |
|
license: apache-2.0 |
|
library_name: transformers |
|
tags: |
|
- python |
|
- document |
|
- code |
|
- code2doc |
|
- instruction_tuned |
|
- basemodel |
|
- pytorch |
|
- docstring |
|
- documentation |
|
- text-generation-inference |
|
metrics: |
|
- accuracy |
|
pipeline_tag: text-generation |
|
widget: |
|
- text: '<example_response>--code:def function_divide2(x): return x / 2--question:Document the code--doc:Description:This function takes a number and divides it by 2.Parameters:- x (numeric): The input value to be divided by 2.Returns:- float: The result of x divided by 2.Example:To call the function, use the following code:function_divide2(1.0)</example_response><function_code>def _plot_bounding_polygon(polygons_coordinates, output_html_path=bounding_polygon_map.html):map_center = [sum([coord[0]for polygon_coords in polygons_coordinatesfor coord in polygon_coords])/ sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),sum([coord[1]for polygon_coords in polygons_coordinatesfor coord in polygon_coords])/ sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),]my_map = folium.Map(location=map_center, zoom_start=12)for polygon_coords in polygons_coordinates:folium.Polygon(locations=polygon_coords,color=blue,fill=True,fill_color=blue,fill_opacity=0.2,).add_to(my_map)marker_cluster = MarkerCluster().add_to(my_map)for polygon_coords in polygons_coordinates:for coord in polygon_coords:folium.Marker(location=[coord[0], coord[1]], popup=fCoordinates: {coord}).add_to(marker_cluster)draw = Draw(export=True)draw.add_to(my_map)my_map.save(output_html_path)return output_html_path</function_code><question>Document the python code above giving function description ,parameters and return type and example how to call the function</question><doc>' |
|
example_title: example |
|
--- |
|
# pip-code-to-doc |
|
|
|
[PipableAI](https://www.linkedin.com/company/pipable.ai/about/)
|
|
|
[Colab notebook](https://colab.research.google.com/drive/17PyMU_3QN9LROy7x-jmaema0cuLRzBvc?usp=sharing)
|
|
|
## What have we built? |
|
|
|
A 1.3B-parameter code documentation model that outperforms most models at documenting code and makes your in-house libraries ready for LLM and RAG pipelines.

We have also open-sourced a [parsing library](https://github.com/PipableAI/pip-library-parser) to go with it. Together, the library and the model can turn your codebase into a functional parse tree, ready to be consumed by LLMs to execute complex tasks.

This model is a further-trained version of pip-sql-1.3b.
|
|
|
## How did we build it?
|
|
|
We used softmax cross-entropy and a modified form of policy gradient along with a Q loss, optimized in an EM setup.
|
[Figure: loss behaviour in the setup described above]
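For readers unfamiliar with the three ingredients named above, here is a minimal pure-Python sketch of their textbook versions: softmax cross-entropy, a REINFORCE-style policy-gradient surrogate, and a squared-error Q loss. The modified policy-gradient form, the loss weights and the EM schedule are not described in this card, so `combined_loss` and its weights `w_pg` and `w_q` are illustrative assumptions only, not the training code used for this model.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(logits, target):
    """Softmax cross-entropy for one token: -log p(target)."""
    return -log_softmax(logits)[target]

def policy_gradient_loss(logits, action, reward):
    """Textbook REINFORCE surrogate: -reward * log p(action)."""
    return -reward * log_softmax(logits)[action]

def q_loss(q_pred, q_target):
    """Squared error between a predicted and a target Q value."""
    return (q_pred - q_target) ** 2

def combined_loss(logits, target, action, reward, q_pred, q_target, w_pg=1.0, w_q=1.0):
    """Hypothetical weighted sum; the weights are assumptions, not from the model card."""
    return (cross_entropy(logits, target)
            + w_pg * policy_gradient_loss(logits, action, reward)
            + w_q * q_loss(q_pred, q_target))

logits = [2.0, 0.5, -1.0]
print(cross_entropy(logits, target=0))
print(combined_loss(logits, target=0, action=0, reward=1.0, q_pred=0.8, q_target=1.0))
```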
|
|
|
## License |
|
|
|
The model is open source under the Apache 2.0 license.
|
|
|
## Usage |
|
|
|
|
|
### Library use |
|
```python
# Notebook cell syntax: drop the leading '!' when installing from a shell
!pip3 install git+https://github.com/PipableAI/pip-library-parser
!pip3 install atlassian-python-api

import importlib

from pip_library_parser import CodeToDocGenerator

# Replace 'your_module' with the name of the module you want to document
module_name = 'your_module'
# import_module returns the named (sub)module itself, unlike __import__,
# which returns the top-level package for dotted names
module = importlib.import_module(module_name)

# Instantiate the CodeToDocGenerator
generator = CodeToDocGenerator()

# Generate docstrings for the module's functions and methods
docs = generator.generate_module_docs(module, module_name)

# 'docs' now contains a dictionary mapping function/method names to their generated docstrings
```
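Because `docs` is described above as a dictionary mapping function/method names to generated docstrings, it can be post-processed with plain Python. The sketch below renders such a dictionary as a Markdown reference page. `render_docs_markdown` and the sample dictionary are hypothetical and only assume string keys and string values.

```python
def render_docs_markdown(docs, title="API reference"):
    """Render a {name: docstring} mapping as a Markdown document."""
    lines = [f"# {title}", ""]
    for name in sorted(docs):
        lines.append(f"## `{name}`")
        lines.append("")
        lines.append(docs[name].strip())
        lines.append("")
    return "\n".join(lines)

# Hypothetical stand-in for the output of generate_module_docs.
sample_docs = {
    "your_module.double": "Description: Returns x multiplied by 2.",
    "your_module.add": "Description: Returns the sum of a and b.",
}
print(render_docs_markdown(sample_docs))
```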
|
|
|
```python
from pip_library_parser import CodeToDocGenerator

# Instantiate the CodeToDocGenerator
generator = CodeToDocGenerator()

code_snippet = """
def example_function(x):
    return x * 2
"""

docstring = generator.generate_docstring_from_pip_model(code_snippet)
print("Generated Docstring:")
print(docstring)
```
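When the code to document lives in a source file rather than in an importable module, one option is to split the file into per-function snippets with the standard-library `ast` module and pass each snippet to `generate_docstring_from_pip_model`. This is a minimal sketch; `iter_function_sources` is a hypothetical helper and not part of pip-library-parser.

```python
import ast

def iter_function_sources(source):
    """Yield (name, source_text) for every function or method defined in `source`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield node.name, ast.get_source_segment(source, node)

module_source = '''
def double(x):
    return x * 2

class Greeter:
    def greet(self, name):
        return f"Hello, {name}"
'''

for name, snippet in iter_function_sources(module_source):
    print(f"--- {name} ---")
    print(snippet)
    # Each snippet could then be passed to
    # generator.generate_docstring_from_pip_model(snippet).
```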
|
|
|
### Installation |
|
|
|
```bash |
|
pip install transformers |
|
``` |
|
|
|
### Prompt |
|
```python
# Placeholders: fill these in before building a prompt
example = "..."  # a worked example in the --question / --query style
code = "..."     # source of the function to document
schema = "..."   # table definitions with the columns described

# Code documentation prompt
prompt = f"""<example_response>{example}</example_response><function_code>{code}</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>"""

# SQL generation prompt
prompt = f"""<example_response>{example}</example_response><schema>{schema}</schema>
<question>Write a sql query to ....</question>
<sql>"""
```
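The code documentation template above can be wrapped in a small helper so the tags are always assembled the same way. `build_doc_prompt` and its default question are hypothetical conveniences; only the tag layout is taken from the template.

```python
DEFAULT_QUESTION = "Give one line description of the python code above in natural language."

def build_doc_prompt(code, example, question=DEFAULT_QUESTION):
    """Assemble a code documentation prompt that ends with an open <doc> tag."""
    return (
        f"<example_response>{example}</example_response>"
        f"<function_code>{code}</function_code>\n"
        f"<question>{question}</question>\n"
        "<doc>"
    )

# Illustrative inputs; replace them with your own example and code.
example = (
    "--code:def function_divide2(x): return x / 2"
    "--question:Document the code"
    "--doc:Description:This function takes a number and divides it by 2."
)
code = "def example_function(x):\n    return x * 2"
print(build_doc_prompt(code, example))
```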
|
|
|
### PyTorch |
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use a GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("PipableAI/pip-code-to-doc-1.3b").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-code-to-doc-1.3b")

prompt = """<example_response>
--code:def function_2(x): return x / 2
--question:Document the code
--doc:
Description:This function takes a number and divides it by 2.
Parameters:
- x (numeric): The input value to be divided by 2.
Returns:
- float: The result of x divided by 2
Example:
To call the function, use the following code:
function_2(1.0)</example_response>
<function_code>
def example_function(x):
    return x * 2
</function_code>
<question>Document the python code above giving function description ,parameters and return type and example how to call the function.</question>
<doc>"""

# Move the tokenized inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=300)

# Keep only the text generated between <doc> and </doc>
doc = tokenizer.decode(outputs[0], skip_special_tokens=True).split('<doc>')[-1].split('</doc>')[0]
print(doc)
```
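The decoded output contains the prompt followed by the generated text, which is why the snippet above splits on `<doc>` and `</doc>`. The same post-processing can be written as a reusable function. `extract_doc` is a hypothetical name, it additionally strips surrounding whitespace, and the sample string below is hand-written rather than real model output.

```python
def extract_doc(decoded):
    """Return the text after the last <doc> tag, up to </doc> if present."""
    after_open = decoded.split("<doc>")[-1]
    return after_open.split("</doc>")[0].strip()

# Hand-written stand-in for tokenizer.decode(...) output.
decoded = (
    "<question>Document the python code above</question>\n"
    "<doc>Description: Doubles the input value.</doc>"
)
print(extract_doc(decoded))  # Description: Doubles the input value.
```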
|
|
|
|
|
|
|
## Examples |
|
|
|
### 1. Code Documentation |
|
### Prompt
|
```python
text=''' <example_response>
--code:def function_2(x): return x / 2
--question:Document the code
--doc:
Description:This function takes a number and divides it by 2.
Parameters:
- x (numeric): The input value to be divided by 2.
Returns:
- float: The result of x divided by 2
Example:
To call the function, use the following code:
function2(1.0)</example_response>
<function_code>def _plot_bounding_polygon(
    polygons_coordinates, output_html_path="bounding_polygon_map.html"
):
    # Create a Folium map centered at the average coordinates of all bounding boxes
    map_center = [
        sum(
            [
                coord[0]
                for polygon_coords in polygons_coordinates
                for coord in polygon_coords
            ]
        )
        / sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),
        sum(
            [
                coord[1]
                for polygon_coords in polygons_coordinates
                for coord in polygon_coords
            ]
        )
        / sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),
    ]

    my_map = folium.Map(location=map_center, zoom_start=12)

    # Add each bounding polygon to the map
    for polygon_coords in polygons_coordinates:
        folium.Polygon(
            locations=polygon_coords,
            color="blue",
            fill=True,
            fill_color="blue",
            fill_opacity=0.2,
        ).add_to(my_map)

    # Add bounding boxes as markers to the map
    marker_cluster = MarkerCluster().add_to(my_map)

    for polygon_coords in polygons_coordinates:
        for coord in polygon_coords:
            folium.Marker(
                location=[coord[0], coord[1]], popup=f"Coordinates: {coord}"
            ).add_to(marker_cluster)

    # Add draw control to allow users to draw additional polygons
    draw = Draw(export=True)
    draw.add_to(my_map)

    # Save the map as an HTML file
    my_map.save(output_html_path)

    return output_html_path
</function_code>
<question>Document the python code above giving function description ,parameters and return type and example how to call the function</question><doc>'''
```
|
|
|
### Response |
|
```txt |
|
Description:This function generates a map of the bounding polygons and saves it as an HTML file. |
|
Parameters: |
|
- polygons_coordinates (list of lists of tuples): A list of lists of tuples representing the coordinates of the polygons. Each polygon is a list of coordinates. |
|
- output_html_path (str, optional): The path where the HTML file should be saved. Defaults to "bounding_polygon_map.html". |
|
Returns: |
|
- str: The path to the saved HTML file. |
|
Example: |
|
To call the function, use the following code: |
|
plot_bounding_polygon([[(0, 0), (1, 0), (1, 1), (0, 1)], [(2, 2), (3, 2), (3, 3), (2, 3)]], "my_map.html"). |
|
``` |
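Responses in this layout (`Description:`, `Parameters:`, `Returns:`, `Example:`) can be split into sections for downstream use, for instance to fill a docstring template. This is a minimal sketch, assuming the model keeps to these four headers; `parse_doc_sections` is a hypothetical helper.

```python
SECTION_HEADERS = ("Description", "Parameters", "Returns", "Example")

def parse_doc_sections(doc):
    """Split a generated doc into {header: text}, keyed by the section headers."""
    sections = {}
    current = None
    for raw_line in doc.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        for header in SECTION_HEADERS:
            if line.startswith(header + ":"):
                # Start a new section and keep any text after the colon.
                current = header
                sections[current] = []
                line = line[len(header) + 1:].strip()
                break
        if current is not None and line:
            sections[current].append(line)
    return {header: "\n".join(lines) for header, lines in sections.items()}

response = """Description:This function generates a map of the bounding polygons and saves it as an HTML file.
Parameters:
- polygons_coordinates (list of lists of tuples): A list of lists of tuples representing the coordinates of the polygons. Each polygon is a list of coordinates.
- output_html_path (str, optional): The path where the HTML file should be saved. Defaults to "bounding_polygon_map.html".
Returns:
- str: The path to the saved HTML file.
Example:
To call the function, use the following code:
plot_bounding_polygon([[(0, 0), (1, 0), (1, 1), (0, 1)], [(2, 2), (3, 2), (3, 3), (2, 3)]], "my_map.html")."""

parsed = parse_doc_sections(response)
print(parsed["Returns"])
```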
|
|
|
|
|
### 2. SQL Generation |
|
### Prompt
|
```python
text = """Generate simple SQL query from the schema mentioned for the following questions.
<schema>CREATE TABLE department (Department_ID number,
Name text,
Creation text,
Ranking number,
Budget_in_Billions number,
Num_Employees number);

CREATE TABLE head (head_ID number,
name text,
born_state text,
age number);

CREATE TABLE management (department_ID number,
head_ID number,
temporary_acting text);</schema>
<question>What are the names of the heads who are born outside the California state?</question>
<sql>"""
```
|
|
|
### Response
|
```sql |
|
SELECT head.name FROM head WHERE head.born_state <> 'California'; |
|
``` |
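One way to sanity-check a generated query is to run it against an in-memory SQLite database built from the same schema. The sketch below does this with the standard-library `sqlite3` module. The inserted rows are made-up illustration data, not part of any benchmark.

```python
import sqlite3

schema = """
CREATE TABLE department (Department_ID number, Name text, Creation text,
                         Ranking number, Budget_in_Billions number, Num_Employees number);
CREATE TABLE head (head_ID number, name text, born_state text, age number);
CREATE TABLE management (department_ID number, head_ID number, temporary_acting text);
"""
generated_sql = "SELECT head.name FROM head WHERE head.born_state <> 'California';"

conn = sqlite3.connect(":memory:")
conn.executescript(schema)

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO head (head_ID, name, born_state, age) VALUES (?, ?, ?, ?)",
    [(1, "Alice", "California", 52), (2, "Bob", "Texas", 47), (3, "Carol", "Ohio", 61)],
)

# Run the generated query and inspect the result.
rows = conn.execute(generated_sql).fetchall()
print(rows)
conn.close()
```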
|
|
|
## Team
|
Avi Kothari, Gyan Ranjan, Pratham Gupta, Ritvik Aryan Kalra, Soham Acharya |