---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- python
- document
- code
- code2doc
- instruction_tuned
- basemodel
- pytorch
- docstring
- documentation
- text-generation-inference
metrics:
- accuracy
pipeline_tag: text-generation
widget:
- text: '<example_response>--code:def function_divide2(x): return x / 2--question:Document the code--doc:Description:This function takes a number and divides it by 2.Parameters:- x (numeric): The input value to be divided by 2.Returns:- float: The result of x divided by 2.Example:To call the function, use the following code:function_divide2(1.0)</example_response><function_code>def _plot_bounding_polygon(polygons_coordinates, output_html_path=bounding_polygon_map.html):map_center = [sum([coord[0]for polygon_coords in polygons_coordinatesfor coord in polygon_coords])/ sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),sum([coord[1]for polygon_coords in polygons_coordinatesfor coord in polygon_coords])/ sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),]my_map = folium.Map(location=map_center, zoom_start=12)for polygon_coords in polygons_coordinates:folium.Polygon(locations=polygon_coords,color=blue,fill=True,fill_color=blue,fill_opacity=0.2,).add_to(my_map)marker_cluster = MarkerCluster().add_to(my_map)for polygon_coords in polygons_coordinates:for coord in polygon_coords:folium.Marker(location=[coord[0], coord[1]], popup=fCoordinates: {coord}).add_to(marker_cluster)draw = Draw(export=True)draw.add_to(my_map)my_map.save(output_html_path)return output_html_path</function_code><question>Document the python code above giving function description ,parameters and return type and example how to call the function</question><doc>'
  example_title: example
---
# pip-code-to-doc

[pipableAi](https://www.linkedin.com/company/pipable.ai/about/)

[colab_notebook](https://colab.research.google.com/drive/17PyMU_3QN9LROy7x-jmaema0cuLRzBvc?usp=sharing)

## What have we built?

A 1.3B-parameter code documentation model that outperforms most models at documenting code and can make your in-house libraries ready for LLM and RAG pipelines.
We have also open-sourced a [parsing library](https://github.com/PipableAI/pip-library-parser) for the same purpose. Together, the library and the model can turn your codebase into a functional parse tree ready to be consumed by LLMs to execute complex tasks.
This model is a further-trained version of pip-sql-1.3b.
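
The parsing library's internals are not described in this card. As a rough illustration of the first step such a pipeline needs, the sketch below uses Python's standard `ast` module to map each function or method name in a piece of source to its code, which is the unit the model documents. `collect_functions` is a hypothetical helper, not part of pip-library-parser.

```python
import ast


def collect_functions(source: str) -> dict[str, str]:
    """Map each function/method name in `source` to its source text.

    Hypothetical helper for illustration; not the pip-library-parser API.
    """
    tree = ast.parse(source)
    functions = {}
    for node in ast.walk(tree):
        # Regular and async function definitions, including methods inside classes
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions[node.name] = ast.get_source_segment(source, node)
    return functions


code = '''
def example_function(x):
    return x * 2

class Shape:
    def area(self):
        return 0
'''
print(sorted(collect_functions(code)))
```

A real codebase parser would also need to handle name collisions (two classes with a method of the same name overwrite each other here) and record where each function lives.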

## How did we build it?

We used softmax cross-entropy and a modified form of policy gradient along with a Q loss, optimized in an EM setup.
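
The modified policy-gradient term, the Q loss, and the EM schedule are not specified here, so only the standard softmax cross-entropy part can be illustrated. A minimal pure-Python sketch, assuming the usual causal language-modelling setup in which the logits at position t score token t + 1 (the function names are hypothetical):

```python
import math


def softmax_cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits)."""
    # Subtract the max logit before exponentiating for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def next_token_loss(step_logits: list[list[float]], token_ids: list[int]) -> float:
    """Mean cross-entropy where the logits at step t predict token t + 1."""
    losses = [
        softmax_cross_entropy(step_logits[t], token_ids[t + 1])
        for t in range(len(token_ids) - 1)
    ]
    return sum(losses) / len(losses)


# Uniform logits over a 4-token vocabulary give a loss of log(4) per step
print(next_token_loss([[0.0] * 4] * 3, [0, 1, 2, 3]))
```
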
Loss behaviour in the setup described above:

## License

The model is open source under the Apache 2.0 license.

## Usage


### Library use
```python
# In a Jupyter/Colab notebook; drop the leading "!" when running in a shell
!pip3 install git+https://github.com/PipableAI/pip-library-parser
!pip3 install atlassian-python-api

import importlib

from pip_library_parser import CodeToDocGenerator

# Replace 'your_module' with the name of the module you want to document
module_name = 'your_module'
module = importlib.import_module(module_name)

# Instantiate the CodeToDocGenerator
generator = CodeToDocGenerator()

# Generate docstrings for the module's functions and methods
docs = generator.generate_module_docs(module, module_name)

# 'docs' now contains a dictionary mapping function/method names to their generated docstrings
```
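
To feed the generated docstrings into a RAG pipeline, you will typically store them together with their function names. A minimal sketch, assuming `docs` is a name-to-docstring dictionary as described above; the stand-in values and the `module_docs.jsonl` path are hypothetical:

```python
import json
from pathlib import Path

# Stand-in for the dictionary produced by generate_module_docs (hypothetical values)
docs = {
    "example_function": "Description: Doubles the input.",
    "other_function": "Description: Returns None.",
}

# Write one JSON record per function, a common input format for indexing tools
out_path = Path("module_docs.jsonl")
with out_path.open("w", encoding="utf-8") as f:
    for name, docstring in docs.items():
        f.write(json.dumps({"name": name, "doc": docstring}) + "\n")

# Read the records back, e.g. before embedding them
records = [json.loads(line) for line in out_path.read_text(encoding="utf-8").splitlines()]
print(len(records))
```
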

```python
from pip_library_parser import CodeToDocGenerator

# Instantiate the CodeToDocGenerator
generator = CodeToDocGenerator()

code_snippet = """
def example_function(x):
    return x * 2
"""

docstring = generator.generate_docstring_from_pip_model(code_snippet)
print("Generated Docstring:")
print(docstring)
```

### Installation

```bash
pip install transformers torch
```

### Prompt
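```python
# Code-to-doc prompt
example_response = "..."  # an example exchange in the --code: / --question: / --doc: format
code = "..."  # the code to document
prompt = f"""<example_response>{example_response}</example_response><function_code>{code}</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>"""

# Text-to-SQL prompt
example_response = "..."  # an example of some --question: / --query: pairs
schema = "..."  # the schema, with columns described
prompt = f"""<example_response>{example_response}</example_response><schema>{schema}</schema>
<question>Write a sql query to ....</question>
<sql>"""
```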
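
The code-to-doc template can be wrapped in a small helper so that the example response, the code, and the question are always filled in the same way. `build_doc_prompt` is a hypothetical convenience function; only the tag layout comes from the template above.

```python
def build_doc_prompt(code: str, example_response: str, question: str) -> str:
    """Assemble a code-to-doc prompt using the tag layout of the template.

    Hypothetical helper; the model only ever sees the resulting string.
    """
    return (
        f"<example_response>{example_response}</example_response>"
        f"<function_code>{code}</function_code>\n"
        f"<question>{question}</question>\n"
        "<doc>"
    )


prompt = build_doc_prompt(
    code="def example_function(x):\n    return x * 2",
    example_response=(
        "--code:def function_2(x): return x / 2\n"
        "--question:Document the code\n"
        "--doc:Description:This function takes a number and divides it by 2."
    ),
    question="Give one line description of the python code above in natural language.",
)
print(prompt)
```
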

### PyTorch
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("PipableAI/pip-code-to-doc-1.3b").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-code-to-doc-1.3b")

# One-shot prompt: an example documentation response, then the code to document
prompt = """<example_response>
--code:def function_2(x): return x / 2
--question:Document the code
--doc:
    Description:This function takes a number and divides it by 2.
    Parameters:
    - x (numeric): The input value to be divided by 2.
    Returns:
    - float: The result of x divided by 2
    Example:
    To call the function, use the following code:
    function_2(1.0)</example_response>
<function_code>
def example_function(x):
    return x * 2
</function_code>
<question>Document the python code above giving function description ,parameters and return type and example how to call the function.</question>
<doc>"""

# The tokenized inputs must be on the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=300)

# Keep only the generated documentation between <doc> and </doc>
doc = tokenizer.decode(outputs[0], skip_special_tokens=True).split('<doc>')[-1].split('</doc>')[0]
print(doc)
```
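
The PyTorch example keeps only the text between `<doc>` and `</doc>` by splitting the decoded output. Factored into a function (a hypothetical helper that mirrors that split and additionally trims surrounding whitespace), the post-processing can be checked without loading the model:

```python
def extract_doc(decoded: str) -> str:
    """Return the text after the last <doc> tag and before the next </doc>.

    If no closing tag was generated, everything after <doc> is returned.
    """
    return decoded.split("<doc>")[-1].split("</doc>")[0].strip()


decoded = "<question>Document the code</question>\n<doc> Description: Doubles x.</doc> trailing"
print(extract_doc(decoded))
```
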



## Examples

### Prompt
```python
text=''' <example_response>
--code:def function_2(x): return x / 2
--question:Document the code
--doc:
    Description:This function takes a number and divides it by 2.
    Parameters:
    - x (numeric): The input value to be divided by 2.
    Returns:
    - float: The result of x divided by 2
    Example:
    To call the function, use the following code:
    function2(1.0)</example_response>
<function_code>def _plot_bounding_polygon(
    polygons_coordinates, output_html_path="bounding_polygon_map.html"
):
    # Create a Folium map centered at the average coordinates of all bounding boxes
    map_center = [
        sum(
            [
                coord[0]
                for polygon_coords in polygons_coordinates
                for coord in polygon_coords
            ]
        )
        / sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),
        sum(
            [
                coord[1]
                for polygon_coords in polygons_coordinates
                for coord in polygon_coords
            ]
        )
        / sum([len(polygon_coords) for polygon_coords in polygons_coordinates]),
    ]

    my_map = folium.Map(location=map_center, zoom_start=12)

    # Add each bounding polygon to the map
    for polygon_coords in polygons_coordinates:
        folium.Polygon(
            locations=polygon_coords,
            color="blue",
            fill=True,
            fill_color="blue",
            fill_opacity=0.2,
        ).add_to(my_map)

    # Add bounding boxes as markers to the map
    marker_cluster = MarkerCluster().add_to(my_map)

    for polygon_coords in polygons_coordinates:
        for coord in polygon_coords:
            folium.Marker(
                location=[coord[0], coord[1]], popup=f"Coordinates: {coord}"
            ).add_to(marker_cluster)

    # Add draw control to allow users to draw additional polygons
    draw = Draw(export=True)
    draw.add_to(my_map)

    # Save the map as an HTML file
    my_map.save(output_html_path)

    return output_html_path
    </function_code>
    <question>Document the python code above giving function description ,parameters and return type and example how to call the function</question><doc>'''
```
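
The `map_center` expression in the example function is the mean of every vertex across all polygons. A standalone sketch of that computation (the `vertex_centroid` name is hypothetical), using the same polygons as the example call in the response below:

```python
def vertex_centroid(polygons_coordinates):
    """Mean of the first and second coordinates over all vertices of all polygons.

    Same arithmetic as map_center in _plot_bounding_polygon.
    """
    coords = [coord for polygon in polygons_coordinates for coord in polygon]
    return [
        sum(c[0] for c in coords) / len(coords),
        sum(c[1] for c in coords) / len(coords),
    ]


polygons = [[(0, 0), (1, 0), (1, 1), (0, 1)], [(2, 2), (3, 2), (3, 3), (2, 3)]]
print(vertex_centroid(polygons))  # [1.5, 1.5]
```

Note that this is the mean of the vertices, not the area-weighted centroid of the polygons, which is adequate for centering a map.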

### Response
```txt
 Description:This function generates a map of the bounding polygons and saves it as an HTML file.
    Parameters:
    - polygons_coordinates (list of lists of tuples): A list of lists of tuples representing the coordinates of the polygons. Each polygon is a list of coordinates.
    - output_html_path (str, optional): The path where the HTML file should be saved. Defaults to "bounding_polygon_map.html".
    Returns:
    - str: The path to the saved HTML file.
    Example:
    To call the function, use the following code:
    plot_bounding_polygon([[(0, 0), (1, 0), (1, 1), (0, 1)], [(2, 2), (3, 2), (3, 3), (2, 3)]], "my_map.html").
```
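
Because the generated documentation follows a fixed Description / Parameters / Returns / Example layout, it can be split into fields before indexing. A minimal sketch, assuming the output keeps exactly those headings, each once and followed by a colon; `parse_doc_sections` is hypothetical:

```python
import re


def parse_doc_sections(doc: str) -> dict[str, str]:
    """Split generated documentation into its headed sections."""
    pattern = r"(Description|Parameters|Returns|Example):"
    # With a capturing group, re.split yields [prefix, name1, body1, name2, body2, ...]
    parts = re.split(pattern, doc)
    return {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}


doc = """Description:This function generates a map.
    Parameters:
    - polygons_coordinates (list): The polygons.
    Returns:
    - str: The path to the saved HTML file.
    Example:
    plot_bounding_polygon([[(0, 0), (1, 0), (1, 1)]])"""
sections = parse_doc_sections(doc)
print(sections["Returns"])
```
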

### Team
Avi Kothari, Gyan Ranjan, Pratham Gupta, Ritvik Aryan Kalra, Soham Acharya