---
license: apache-2.0
datasets:
- PipableAI/pip-txt-to-sql-spider-bird-dataset
language:
- en
metrics:
- accuracy
tags:
- document
- code
- text2sql
- instruction_tuned
- basemodel
- jax
- pytorch
- tensorflow
- text-generation-inference
library_name: transformers
pipeline_tag: text-generation
widget:
- text: "<schema>CREATE TABLE system(JobID: String,GID: String, UID: String, Start:Time(yyyy/mm/dd), End: Time,ElapsedRaw: Time, CPUTimeRAW: Time,NCPUS: Number,NNodes: Number, NodeList: List, State:String, Timelimit: Time);</schema><question>Get UID and job id for Jobs that started on Jan 20 , 2023 ended on feb 14 2023 and has job id 20</question><sql>"
  example_title: "example"
---
# pip-parse

[pipableAi](https://www.linkedin.com/company/pipable.ai/about/)

[colab_notebook]()

## What have we built?

A 1.3B-parameter code-documentation model that outperforms most models at documenting code and at making your in-house libraries ready for LLM and RAG pipelines.
We have also open-sourced a parsing library to go with it; together, the library and the model can turn your codebase into a functional parse tree ready to be consumed by LLMs for executing complex tasks.
This is a further-trained version of pip-sql-1.3b.

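To give a feel for what a functional parse tree can look like, here is a minimal sketch using only Python's stdlib `ast` module. This is an illustrative stand-in, not the actual PipableAI parsing library; the function name `parse_functions` and the output shape are assumptions made for the example.

```python
import ast

# Hypothetical sketch (not the actual PipableAI parsing lib): walk a
# Python module with the stdlib `ast` module and collect a minimal
# functional parse tree of its functions, ready to feed to an LLM.
def parse_functions(source):
    tree = ast.parse(source)
    out = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            out.append({
                "name": node.name,
                "args": [a.arg for a in node.args.args],
                "docstring": ast.get_docstring(node),
            })
    return out
```

For example, `parse_functions("def add(a, b):\n    return a + b")` yields one entry with name `add`, args `["a", "b"]`, and no docstring.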
## How we built it?

We used softmax cross entropy and a modified form of policy gradient along with a Q loss, optimized in an EM setup.
Loss behaviour in the setup mentioned above -

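As a rough illustration of mixing a supervised and a policy-gradient-style objective (a hypothetical sketch, not our actual training code; `mixed_token_loss`, `advantage`, and `beta` are names invented for this example), a per-token loss can combine cross entropy with a REINFORCE-style term weighted by a scalar advantage:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def mixed_token_loss(logits, target_idx, advantage, beta=0.5):
    """Illustrative only: cross entropy on the target token, plus a
    policy-gradient surrogate -advantage * log p(target) scaled by beta."""
    probs = softmax(logits)
    log_p = math.log(probs[target_idx])
    ce = -log_p                # supervised cross entropy
    pg = -advantage * log_p    # REINFORCE-style surrogate
    return ce + beta * pg
```

With a zero advantage the policy-gradient term vanishes and the loss reduces to plain cross entropy.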
## License

The model is open source under the Apache 2.0 license.

## Usage

### Installation

```bash
pip install transformers
```

### Prompt

```python
prompt = f"""<code>{schema}</code>
<question>Document the code above</question>
<doc>"""
```

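For instance, the template can be filled with any snippet you want documented; in this sketch, `code_snippet` is an illustrative placeholder name, not part of the library:

```python
# Hypothetical example of filling in the prompt template.
# `code_snippet` is an illustrative placeholder, not a library name.
code_snippet = "def add(a, b):\n    return a + b"
prompt = f"""<code>{code_snippet}</code>
<question>Document the code above</question>
<doc>"""
```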
### PyTorch

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("PipableAI/pip-parser").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-parser")

inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=300)
# Keep only the generated documentation between the <doc> tags.
doc = tokenizer.decode(outputs[0], skip_special_tokens=True).split('<doc>')[-1].split('</doc>')[0]
print(doc)
```

## Examples

### Code

```python
<code>
###########################
# Generate Analytical Model
###########################
##################################################
# func: get_np_array_transition_probability_matrix
##################################################
def get_np_array_transition_probability_matrix(int_num_states, np_array_A_matrix):
    print('np_array_A_matrix:')
    print(np_array_A_matrix)
    #####################################################
    # Perturb the adjacency matrix to avoid singularities
    #####################################################
    np_array_A_matrix += (np.full((int_num_states, int_num_states), float_eps) - (np.identity(int_num_states) * float_eps))
    print('np_array_A_matrix:')
    print(np_array_A_matrix)
    print('np_array_D_matrix:')
    np_array_D_matrix = np.diag(np.sum(np_array_A_matrix, axis=1))
    print(np_array_D_matrix)
    print('np_array_D_matrix_inv:')
    np_array_D_matrix_inv = np.linalg.inv(np_array_D_matrix)
    print(np_array_D_matrix_inv)
    print('\n\n')
    print('np_array_P_matrix:')
    np_array_P_matrix = np.dot(np_array_D_matrix_inv, np_array_A_matrix)
    print(np_array_P_matrix)
    print('np.sum(np_array_P_matrix, axis=1):')
    print(np.sum(np_array_P_matrix, axis=1))
    print('\n\n')
    return np_array_P_matrix
##################################################
# func: get_np_array_perron_frobenius_eigen_vector
##################################################
def get_np_array_perron_frobenius_matrix(int_num_states, np_array_P_matrix):
    np_array_perron_frobenius_matrix = np.linalg.matrix_power(np_array_P_matrix, 1000)
    np_array_perron_frobenius_vector = np_array_perron_frobenius_matrix[0, :]
    print('np_array_perron_frobenius_matrix:')
    print(np_array_perron_frobenius_matrix)
    print('np.sum(np_array_perron_frobenius_matrix, axis=1):')
    print(np.sum(np_array_perron_frobenius_matrix, axis=1))
    print('np.sum(np_array_perron_frobenius_matrix, axis=0):')
    print(np.sum(np_array_perron_frobenius_matrix, axis=0))
    print('np.sum(np_array_perron_frobenius_matrix, axis=0)/int_num_states:')
    print(np.sum(np_array_perron_frobenius_matrix, axis=0) / int_num_states)
    print('np.dot(np_array_perron_frobenius_vector, np_array_P_matrix):')
    print(np.dot(np_array_perron_frobenius_vector, np_array_P_matrix))
    print('np_array_perron_frobenius_vector:')
    print(np_array_perron_frobenius_vector)
    print('\n\n')
    return np_array_perron_frobenius_vector, np_array_perron_frobenius_matrix
#############################
# func: get_np_array_Z_matrix
#############################
def get_np_array_Z_matrix(int_num_states, np_array_P_matrix, np_array_perron_frobenius_matrix):
    np_array_Z_matrix = np.linalg.inv(np.identity(int_num_states) - np_array_P_matrix + np_array_perron_frobenius_matrix)
    print('np_array_Z_matrix:')
    print(np_array_Z_matrix)
    print('\n\n')
    return np_array_Z_matrix
#############################
# func: get_np_array_H_matrix
#############################
def get_np_array_H_matrix(int_num_states, np_array_Z_matrix, np_array_perron_frobenius_vector):
    np_array_H_matrix = np.zeros([int_num_states, int_num_states])
    for i in range(int_num_states):
        for j in range(int_num_states):
            np_array_H_matrix[i][j] = (np_array_Z_matrix[j][j] - np_array_Z_matrix[i][j]) / np_array_perron_frobenius_vector[j]
    print('np_array_H_matrix:')
    print(np_array_H_matrix)
    print('\n\n')
    return np_array_H_matrix
###########
# func: run
###########
def run(np_array_A_matrix):
    int_num_states = len(np_array_A_matrix)
    np_array_P_matrix = get_np_array_transition_probability_matrix(int_num_states, np_array_A_matrix)
    np_array_perron_frobenius_vector, np_array_perron_frobenius_matrix = get_np_array_perron_frobenius_matrix(int_num_states, np_array_P_matrix)
    np_array_Z_matrix = get_np_array_Z_matrix(int_num_states, np_array_P_matrix, np_array_perron_frobenius_matrix)
    np_array_H_matrix = get_np_array_H_matrix(int_num_states, np_array_Z_matrix, np_array_perron_frobenius_vector)
    return np_array_H_matrix
<question>Document the python code above.
</question><doc>
```

### Response

```text
The Python code provided is used to generate an analytical model for a Markov chain with a given adjacency matrix.
The model is then used to compute the Perron-Frobenius eigenvector and the corresponding matrix. The resulting matrices are then used to compute the Z-matrix and
the H-matrix. The H-matrix is then returned as the output of the function. The code is designed to handle large matrices and perform computations efficiently.
The matrices are manipulated using numpy's powerful and efficient numerical computation library.
The code also includes comments to explain the functionality of each part of the code.
```

### Team

Avi Kothari, Pratham Gupta, Ritvik Aryan Kalra, Rohan Bhatial, Soham Acharya