---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- python
- document
- code
- code2doc
- instruction_tuned
- basemodel
- pytorch
- docstring
- documentation
- text-generation-inference
metrics:
- accuracy
pipeline_tag: text-generation
widget:
- text: '<example_response>--code:def function_divide2(x): return x / 2--question:Document the code--doc:Description:This function takes a number and divides it by 2.Parameters:- x (numeric): The input value to be divided by 2.Returns:- float: The result of x divided by 2.Example:To call the function, use the following code:function_divide2(1.0)</example_response><function_code>def _plot_bounding_polygon(polygons_coordinates, output_html_path=bounding_polygon_map.html):map_center = [sum([coord[0]for polygon_coords in polygons_coordinatesfor coord in polygon_coords])/ sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),sum([coord[1]for polygon_coords in polygons_coordinatesfor coord in polygon_coords])/ sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),]my_map = folium.Map(location=map_center, zoom_start=12)for polygon_coords in polygons_coordinates:folium.Polygon(locations=polygon_coords,color=blue,fill=True,fill_color=blue,fill_opacity=0.2,).add_to(my_map)marker_cluster = MarkerCluster().add_to(my_map)for polygon_coords in polygons_coordinates:for coord in polygon_coords:folium.Marker(location=[coord[0], coord[1]], popup=fCoordinates: {coord}).add_to(marker_cluster)draw = Draw(export=True)draw.add_to(my_map)my_map.save(output_html_path)return output_html_path</function_code><question>Document the python code above giving function description ,parameters and return type and example how to call the function</question><doc>'
example_title: example
---
# pip-code-to-doc
[pipableAi](https://www.linkedin.com/company/pipable.ai/about/)
[colab_notebook](https://colab.research.google.com/drive/17PyMU_3QN9LROy7x-jmaema0cuLRzBvc?usp=sharing)
## What have we built?
A 1.3B-parameter code documentation model that outperforms most models at documenting code and makes your in-house libraries ready for LLM and RAG pipelines.
We have also open-sourced a [parsing lib](https://github.com/PipableAI/pip-library-parser) for the same purpose. Together, the library and the model can turn your codebase into a functional parse tree that LLMs can consume to execute complex tasks.
This is a further trained version of pip-sql-1.3b.
## How did we build it?
We used softmax cross entropy and a modified form of policy gradient along with Q loss, optimized in an EM setup.
[Figure: loss behaviour in the setup mentioned above]
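As background on the first loss term named in this section, here is a minimal sketch of softmax cross entropy for a single token prediction. It is illustrative only: the function name and the toy logits are assumptions, and it does not reproduce the policy gradient or Q loss terms, or how the terms were combined in training.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Cross entropy of softmax(logits) against a single target class."""
    # Subtract the largest logit before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    # Equivalent to -log(softmax(logits)[target_index]).
    return log_sum_exp - logits[target_index]

# With uniform logits over 4 classes, the loss equals log(4).
loss = softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], target_index=2)
print(loss)
```

The loss shrinks as the logit of the target class grows relative to the others.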
## License
The model is open source under the Apache 2.0 license.
## Usage
### Library use
```python
# Notebook (IPython/Colab) syntax: drop the leading "!" when running in a shell.
!pip3 install git+https://github.com/PipableAI/pip-library-parser
!pip3 install atlassian-python-api

from pip_library_parser import CodeToDocGenerator

# Replace 'your_module' with the name of the module you want to document
module_name = 'your_module'
module = __import__(module_name)
# Instantiate the CodeToDocGenerator
generator = CodeToDocGenerator()
# Generate docstrings for the module's functions and methods
docs = generator.generate_module_docs(module, module_name)
# 'docs' now contains a dictionary mapping function/method names to their generated docstrings
```
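Since `docs` is described as a dictionary mapping function/method names to docstrings, one possible next step is to render it as a single Markdown page. The sketch below assumes exactly that shape; `docs_to_markdown` and the sample entries are hypothetical and not part of `pip_library_parser`.

```python
def docs_to_markdown(docs):
    """Render a {name: docstring} mapping as one Markdown document."""
    sections = []
    # Sort by name so the output is stable between runs.
    for name in sorted(docs):
        sections.append(f"## {name}\n\n{docs[name].strip()}\n")
    return "\n".join(sections)

# Hypothetical stand-in for the dictionary returned by generate_module_docs.
sample_docs = {
    "your_module.scale": "Description: Multiplies x by 2.",
    "your_module.halve": "Description: Divides x by 2.",
}
markdown = docs_to_markdown(sample_docs)
print(markdown)
```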
```python
from pip_library_parser import CodeToDocGenerator
# Instantiate the CodeToDocGenerator
generator = CodeToDocGenerator()
code_snippet = """
def example_function(x):
    return x * 2
"""
docstring = generator.generate_docstring_from_pip_model(code_snippet)
print("Generated Docstring:")
print(docstring)
```
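When the code to document lives in a file rather than a hand-written string, the per-function snippets can be cut out with the standard `ast` module before each one is passed to `generate_docstring_from_pip_model`. This is a sketch under assumptions: `extract_function_sources` is a hypothetical helper, it needs Python 3.8+, and functions that share a name overwrite each other.

```python
import ast

def extract_function_sources(source):
    """Return {function_name: source_text} for every function in `source`."""
    tree = ast.parse(source)
    functions = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment returns the exact text spanned by the node.
            functions[node.name] = ast.get_source_segment(source, node)
    return functions

module_source = '''
def double(x):
    return x * 2

def halve(x):
    return x / 2
'''
snippets = extract_function_sources(module_source)
for name, snippet in snippets.items():
    print(name)
    print(snippet)
```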
### Installation
```bash
pip install transformers
```
### Prompt
```python
# `example_response` holds a worked example in the --question / --query format; `code` is the function source
prompt = f"""<example_response>{example_response}</example_response><function_code>{code}</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>"""

# SQL generation: `example_response` holds an example --question / --query pair; `schema` is the schema with its columns described
prompt = f"""<example_response>{example_response}</example_response><schema>{schema}</schema>
<question>Write a sql query to ....</question>
<sql>"""
```
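The documentation template can be wrapped in a small helper so the tags are assembled the same way every time. This is a minimal sketch: `build_doc_prompt` and its parameter names are assumptions made for illustration, and the tag layout simply mirrors the template in this section.

```python
def build_doc_prompt(code, example_response, question):
    """Assemble the tagged prompt for the documentation task."""
    return (
        f"<example_response>{example_response}</example_response>"
        f"<function_code>{code}</function_code>\n"
        f"<question>{question}</question>\n"
        "<doc>"
    )

prompt = build_doc_prompt(
    code="def f(x): return x",
    example_response="--code:def g(x): return x / 2\n--question:Document the code\n--doc:Divides x by 2.",
    question="Give one line description of the python code above in natural language.",
)
print(prompt)
```

The prompt ends with an open `<doc>` tag so that the model's continuation is the documentation itself.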
### PyTorch
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"
model = AutoModelForCausalLM.from_pretrained("PipableAI/pip-code-to-doc-1.3b").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-code-to-doc-1.3b")
prompt = f"""<example_response>
--code:def function_2(x): return x / 2
--question:Document the code
--doc:
Description:This function takes a number and divides it by 2.
Parameters:
- x (numeric): The input value to be divided by 2.
Returns:
- float: The result of x divided by 2
Example:
To call the function, use the following code:
function_2(1.0)</example_response>
<function_code>
def example_function(x):
    return x * 2
</function_code>
<question>Document the python code above giving function description ,parameters and return type and example how to call the function.</question>
<doc>"""
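# Move the tokenized inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=300)
# Keep only the text between the last <doc> tag and the closing </doc> tag
doc = tokenizer.decode(outputs[0], skip_special_tokens=True).split('<doc>')[-1].split('</doc>')[0]
print(doc)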
```
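The post-processing step in the snippet above (take the text after the last `<doc>` tag, cut at `</doc>`) can be isolated into a helper and checked without loading the model. This is a sketch: `extract_doc` is a hypothetical name, the decoded string is made up, and the added `strip()` is a small convenience not present in the original one-liner.

```python
def extract_doc(decoded_text):
    """Return the text after the last <doc> tag, cut at </doc> if present."""
    after_open = decoded_text.split("<doc>")[-1]
    return after_open.split("</doc>")[0].strip()

# Made-up decoded output, used only to exercise the helper.
decoded = "<question>Document the code</question>\n<doc>Description: Doubles x.</doc> trailing"
doc = extract_doc(decoded)
print(doc)
```

If generation stops at `max_new_tokens` before `</doc>` is emitted, the helper returns everything after `<doc>`.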
## Examples
### 1. Code Documentation
### Prompt
```python
text=''' <example_response>
--code:def function_2(x): return x / 2
--question:Document the code
--doc:
Description:This function takes a number and divides it by 2.
Parameters:
- x (numeric): The input value to be divided by 2.
Returns:
- float: The result of x divided by 2
Example:
To call the function, use the following code:
function_2(1.0)</example_response>
<function_code>def _plot_bounding_polygon(
    polygons_coordinates, output_html_path="bounding_polygon_map.html"
):
    # Create a Folium map centered at the average coordinates of all bounding boxes
    map_center = [
        sum(
            [
                coord[0]
                for polygon_coords in polygons_coordinates
                for coord in polygon_coords
            ]
        )
        / sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),
        sum(
            [
                coord[1]
                for polygon_coords in polygons_coordinates
                for coord in polygon_coords
            ]
        )
        / sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),
    ]
    my_map = folium.Map(location=map_center, zoom_start=12)
    # Add each bounding polygon to the map
    for polygon_coords in polygons_coordinates:
        folium.Polygon(
            locations=polygon_coords,
            color="blue",
            fill=True,
            fill_color="blue",
            fill_opacity=0.2,
        ).add_to(my_map)
    # Add bounding boxes as markers to the map
    marker_cluster = MarkerCluster().add_to(my_map)
    for polygon_coords in polygons_coordinates:
        for coord in polygon_coords:
            folium.Marker(
                location=[coord[0], coord[1]], popup=f"Coordinates: {coord}"
            ).add_to(marker_cluster)
    # Add draw control to allow users to draw additional polygons
    draw = Draw(export=True)
    draw.add_to(my_map)
    # Save the map as an HTML file
    my_map.save(output_html_path)
    return output_html_path
</function_code>
<question>Document the python code above giving function description ,parameters and return type and example how to call the function</question><doc>'''
```
### Response
```txt
Description:This function generates a map of the bounding polygons and saves it as an HTML file.
Parameters:
- polygons_coordinates (list of lists of tuples): A list of lists of tuples representing the coordinates of the polygons. Each polygon is a list of coordinates.
- output_html_path (str, optional): The path where the HTML file should be saved. Defaults to "bounding_polygon_map.html".
Returns:
- str: The path to the saved HTML file.
Example:
To call the function, use the following code:
plot_bounding_polygon([[(0, 0), (1, 0), (1, 1), (0, 1)], [(2, 2), (3, 2), (3, 3), (2, 3)]], "my_map.html").
```
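Because the response follows a fixed layout (Description, Parameters, Returns, Example), it can be split into sections for downstream use, for example to fill a docstring template. The sketch below assumes the model keeps these four headers; `parse_doc_sections` and the shortened sample response are hypothetical.

```python
def parse_doc_sections(doc_text):
    """Split a generated doc into its Description/Parameters/Returns/Example parts."""
    headers = ("Description", "Parameters", "Returns", "Example")
    sections = {}
    current = None
    for line in doc_text.splitlines():
        stripped = line.strip()
        matched = next((h for h in headers if stripped.startswith(h + ":")), None)
        if matched:
            current = matched
            # Keep any text that follows the header on the same line.
            rest = stripped[len(matched) + 1:].strip()
            sections[current] = [rest] if rest else []
        elif current and stripped:
            sections[current].append(stripped)
    return {name: "\n".join(lines) for name, lines in sections.items()}

# Shortened, made-up response in the same layout as the one above.
response = """Description:This function generates a map.
Parameters:
- polygons_coordinates (list): The polygons.
Returns:
- str: The path to the saved HTML file.
Example:
To call the function, use the following code:
plot_bounding_polygon([], "my_map.html")"""
sections = parse_doc_sections(response)
print(sections["Description"])
```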
### 2. SQL Generation
### Prompt
```python
text = f"""Generate simple SQL query from the schema mentioned for the following questions.
<schema>CREATE TABLE department (Department_ID number,
Name text,
Creation text,
Ranking number,
Budget_in_Billions number,
Num_Employees number);
CREATE TABLE head (head_ID number,
name text,
born_state text,
age number);
CREATE TABLE management (department_ID number,
head_ID number,
temporary_acting text);</schema>
<question>What are the names of the heads who are born outside the California state?</question>
<sql>"""
```
### Response
```sql
SELECT head.name FROM head WHERE head.born_state <> 'California';
```
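A generated query can be sanity-checked by running it against an in-memory SQLite database built from the same schema. This is only a sketch: SQLite is an assumption (the card does not name a target database), only the `head` table is created, and the two rows are invented for the check.

```python
import sqlite3

schema = """
CREATE TABLE head (head_ID number, name text, born_state text, age number);
"""
generated_sql = "SELECT head.name FROM head WHERE head.born_state <> 'California';"

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
# Hypothetical rows, invented purely for this check.
conn.executemany(
    "INSERT INTO head VALUES (?, ?, ?, ?)",
    [(1, "Alice", "California", 50), (2, "Bob", "Texas", 45)],
)
rows = conn.execute(generated_sql).fetchall()
conn.close()
print(rows)
```

A query that references a missing table or column raises `sqlite3.OperationalError`, which makes this a cheap first filter before running generated SQL anywhere else.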
## Team
Avi Kothari, Gyan Ranjan, Pratham Gupta, Ritvik Aryan Kalra, Soham Acharya