anindya64 committed
Commit f243039
1 Parent(s): f48af9c

Update README.md

Files changed (1): README.md (+56, -53)

README.md

base_model:
  - deepseek-ai/deepseek-coder-1.3b-instruct
pipeline_tag: text2text-generation
---

# Prem-1B-SQL (Ollama)

- Read the blogpost [here](https://blog.premai.io/prem-1b-sql-fully-local-performant-slm-for-text-to-sql/)
- PremSQL Library | [GitHub](https://github.com/premAI-io/premsql)
 
Prem-1B-SQL is among the first fully local Text-to-SQL models developed by Prem AI. Being a 1B-parameter model,
it easily fits on low-end GPU devices (and on CPUs when quantized). We believe that AI-assisted data analysis should follow a local-first
approach, because exposing databases to third-party closed-source models can lead to data-security breaches. We will be publishing some
of the public benchmark results of this model very soon, and we will keep iterating on the model to further improve results.

- **Developed by:** [Prem AI](https://www.premai.io/)
- **License:** MIT
 
We evaluated our model on two popular benchmark datasets: BirdBench and Spider. BirdBench consists of a public validation set (1,534 data points) and a private test set, while Spider provides only a public validation set. Here are the results:

| Dataset                  | Execution Accuracy |
| ------------------------ | ------------------ |
| BirdBench (validation)   | 46%                |
| BirdBench (private test) | 51.54%             |
| Spider                   | 85%                |

The BirdBench dataset spans several difficulty levels. Here is a detailed view of the private test results by difficulty:

| Difficulty  | Count | EX    | Soft F1 |
| ----------- | ----- | ----- | ------- |
| Simple      | 949   | 60.70 | 61.48   |
| Moderate    | 555   | 47.39 | 49.06   |
| Challenging | 285   | 29.12 | 31.83   |
| Total       | 1789  | 51.54 | 52.90   |

Here is a more detailed comparison with popular closed- and open-source models:

| Model                             | # Params (in Billions) | BirdBench Test Score |
| --------------------------------- | ---------------------- | -------------------- |
| AskData + GPT-4o (current winner) | NA                     | 72.39                |
| DeepSeek coder 236B               | 236                    | 56.68                |
| GPT-4 (2023)                      | NA                     | 54.89                |
| **PremSQL 1B (ours)**             | 1                      | 51.4                 |
| Qwen 2.5 7B Instruct              | 7                      | 51.1                 |
| Claude 2 Base (2023)              | NA                     | 49.02                |
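
The scores above are execution accuracy (EX): a prediction counts as correct when the predicted SQL, executed against the database, returns the same result as the gold SQL. Below is a minimal sketch of that check (illustrative only, not the official BirdBench or Spider evaluation harness; the database path and queries are placeholders):

```python
import sqlite3

def execution_match(db_path: str, predicted_sql: str, gold_sql: str) -> bool:
    """Return True when predicted and gold SQL produce the same result set."""
    with sqlite3.connect(db_path) as conn:
        try:
            predicted_rows = set(conn.execute(predicted_sql).fetchall())
        except sqlite3.Error:
            return False  # invalid or failing SQL counts as a miss
        gold_rows = set(conn.execute(gold_sql).fetchall())
    return predicted_rows == gold_rows

# Execution accuracy = fraction of questions for which execution_match(...) is True.
```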

## How to use Prem-1B-SQL

To install PremSQL, just create a new environment and type:

```bash
pip install -U premsql
```

Please [check out our documentation](https://docs.premai.io/premsql/introduction) for more details on how to use the library.

### Running Prem-1B-SQL using the PremSQL BaseLineAgent

The easiest way to use this model is through the PremSQL BaseLineAgent pipeline. Provide the database path (for SQLite databases)
or a DB connection URI, then connect the agent with the model. Here is how you do that:

```python
from premsql.agents import BaseLineAgent
from premsql.agents.tools import SimpleMatplotlibTool
from premsql.executors import SQLiteExecutor
from premsql.generators import Text2SQLGeneratorHF, Text2SQLGeneratorOllama

# Text-to-SQL model (Text2SQLGeneratorOllama is the Ollama-backed alternative)
text2sql_model = Text2SQLGeneratorHF(
    model_or_name_or_path="premai-io/prem-1B-SQL",
    experiment_name="test_generators",
    device="cuda:0",
    type="test"
)

# Model used for the analysis and plotting steps
analyser_and_plotter = Text2SQLGeneratorHF(
    model_or_name_or_path="meta-llama/Llama-3.2-1B-Instruct",
    experiment_name="test_generators",
    device="cuda:0",
    type="test"
)

agent = BaseLineAgent(
    session_name="testing_hf",
    db_connection_uri="sqlite:////path/to/your/database.sqlite",
    specialized_model1=text2sql_model,
    specialized_model2=analyser_and_plotter,
    plot_tool=SimpleMatplotlibTool(),
    executor=SQLiteExecutor()
)

response = agent(
    "/query what all tables are present inside the database"
)
response.show_dataframe()
```

Under the hood, the agent automatically connects to your database and does all the heavy lifting, such as prompt creation and execution, for you.
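
The `db_connection_uri` follows the same URI style as SQLAlchemy (note the four slashes for an absolute SQLite path). A couple of illustrative values; the PostgreSQL line is an assumption, so check the PremSQL documentation for which backends your executor supports:

```python
# Absolute path to a local SQLite file (same style as the example above)
sqlite_uri = "sqlite:////home/user/data/california_schools.sqlite"

# Hypothetical PostgreSQL URI; backend support depends on the executor you pick
postgres_uri = "postgresql://user:password@localhost:5432/analytics"
```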
 
### Running Prem-1B-SQL using PremSQL Generators

You can also run the model using PremSQL Generators. This is helpful when you want to run generations in bulk over a dataset. Here is an example:

```python
from premsql.generators import Text2SQLGeneratorHF
from premsql.datasets import Text2SQLDataset

# Load a Text2SQL dataset (dataset_name and split here are illustrative values)
dataset = bird_dataset = Text2SQLDataset(
    dataset_name="bird",
    split="validation",
    dataset_folder="/path/to/dataset"
).setup_dataset(num_rows=10, num_fewshot=3)

# Define a generator
generator = Text2SQLGeneratorHF(
    model_or_name_or_path="premai-io/prem-1B-SQL",
    experiment_name="test_generators",
    device="cuda:0",
    type="test"
)
```
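
Bulk generation over the prepared dataset then goes through `generator.generate_and_save_results` (the same call appears again in the execution-guided decoding example below); the sampling arguments shown here are illustrative:

```python
# Run generation over every row of the dataset and persist the results
responses = generator.generate_and_save_results(
    dataset=bird_dataset,
    temperature=0.1,       # illustrative sampling settings, adjust as needed
    max_new_tokens=256
)
print(responses)
```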
 

PremSQL also supports execution-guided decoding. This strategy executes the generated SQL against the DB and, if it fails, uses the error message for correction, repeating until it gets a valid result or the retries run out.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/637b0075806b18943e4ba357/_5rdIQZwyaUFb84xKW_AV.png)

```python
from premsql.executors import SQLiteExecutor

responses = generator.generate_and_save_results(
    dataset=bird_dataset,
    executor=SQLiteExecutor(),  # executes each generated SQL and feeds errors back
    max_retries=5               # illustrative retry budget; see the PremSQL docs
)
```
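
Conceptually, the retry loop behind execution-guided decoding looks like this sketch (illustrative logic only, not the library's internal implementation):

```python
def execution_guided_generate(question, generate_sql, execute_sql, max_retries=5):
    """Regenerate SQL, feeding the database error message back as correction context."""
    error = None
    for _ in range(max_retries + 1):
        sql = generate_sql(question, error=error)  # error feedback goes into the prompt
        try:
            return execute_sql(sql)                # success: return the result rows
        except Exception as exc:
            error = str(exc)                       # failure: retry with the error message
    return None                                    # retries exhausted
```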

You can also fine-tune Prem-1B-SQL with Hugging Face Transformers or with [PremSQL Tuners](https://docs.premai.io/premsql/tuners).
Please [check out our documentation](https://docs.premai.io/premsql/introduction) to learn more about PremSQL and all the features
we provide.

## Datasets used to train the model

Prem-1B-SQL is trained using the following datasets:
 
3. [Domain specialization dataset, gathered and uploaded to PremSQL datasets](https://huggingface.co/datasets/premai-io/domains)
4. [Gretel AI synthetic dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql?row=0)

Additionally, we built error-handling datasets on top of these datasets so that the model learns from its errors and can self-correct.
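
For intuition, an error-handling sample pairs a failing query and its database error with the corrected query. The record below is purely illustrative; the field names and values are invented for this sketch, not the actual dataset schema:

```python
# Illustrative error-correction sample (invented schema, not the real dataset format)
error_handling_sample = {
    "question": "List the phone numbers of charter-funded schools opened after 2000-01-01.",
    "failing_sql": "SELECT Phone FROM school WHERE OpenDate > '2000-01-01'",
    "db_error": "no such table: school",
    "corrected_sql": "SELECT Phone FROM schools WHERE OpenDate > '2000-01-01' AND FundingType = 'Charter'",
}
```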
 
## Evaluation results of Prem-1B-SQL