---
base_model: ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- ruslanmv
- llama
- gguf
---

# Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4

This model is a 4-bit GGUF quantization of [ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL), a model fine-tuned for Text-to-SQL generation. It converts natural language queries into SQL commands and is packaged in the GGUF format used by llama.cpp for efficient inference.

## Model Details

- **Base Model**: [ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL)
- **Task**: Text-to-SQL generation
- **Quantization**: GGUF (Q4_K_M, 4-bit quantization)
- **License**: Apache-2.0

## Installation

To use this model, install `llama-cpp-python` to run the quantized model and `huggingface_hub` to download it from Hugging Face.

### Step 1: Install Required Packages

```bash
# Install llama-cpp-python; the extra index serves prebuilt CUDA 12.1 wheels
# (prefix the commands with ! when running in a Colab/Jupyter cell)
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121 \
  --force-reinstall --upgrade --no-cache-dir --verbose

# Install huggingface_hub to download models from Hugging Face
pip install huggingface_hub
```
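
As a quick sanity check that a GPU-enabled wheel was installed, you can query the bindings (a minimal sketch; `llama_supports_gpu_offload` is exposed by recent `llama-cpp-python` releases):

```python
import llama_cpp

# Report the binding version and whether GPU offload was compiled in
print(llama_cpp.__version__)
print(llama_cpp.llama_supports_gpu_offload())
```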

### Step 2: Set Up Hugging Face Hub and Download the Model

Enable Hugging Face's fast transfer feature and download the quantized model with `huggingface-cli`. The commands below use notebook syntax (`!`); drop the `!` to run them in a shell.

```python
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # fast downloads; requires the hf_transfer package

!huggingface-cli download \
  ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4 \
  unsloth.Q4_K_M.gguf \
  --local-dir . \
  --local-dir-use-symlinks False
```

Make sure the downloaded model is stored in the local directory, then set the model path (the `/content` prefix below assumes Google Colab):

```python
MODEL_PATH = "/content/unsloth.Q4_K_M.gguf"
```
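
If you prefer to stay in Python, the same file can be fetched with the `huggingface_hub` API instead of the CLI; a minimal sketch using the same repository and filename:

```python
from huggingface_hub import hf_hub_download

# Download the GGUF file and capture its local path
MODEL_PATH = hf_hub_download(
    repo_id="ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4",
    filename="unsloth.Q4_K_M.gguf",
    local_dir=".",
)
print(MODEL_PATH)
```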

## Usage Example

Here is an example that demonstrates how to generate an SQL query from a natural language prompt using the quantized GGUF model and the `llama_cpp` library.

### Step 1: Define the User Query and Prompt

The user provides a natural language query, and we format the prompt using an Alpaca-style template.

```python
# Example query (Italian): "Select all columns of table1 where the column anni equals 2020"
user_query = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
"""

# Fill the template; the response section is left empty for the model to complete
prompt = alpaca_prompt.format(
    "Provide the SQL query",
    user_query
)
print(f"Prompt created:\n{prompt}")
```

### Step 2: Load the Model and Generate the SQL Query

To load the quantized model and run inference, use the `llama_cpp` library with the Alpaca-style `prompt` built in Step 1.

```python
from llama_cpp import Llama
import os

# Ensure the downloaded model file exists
MODEL_PATH = "/content/unsloth.Q4_K_M.gguf"
assert os.path.exists(MODEL_PATH), f"Model path {MODEL_PATH} does not exist."

# Load the model; adjust n_gpu_layers to your hardware (-1 offloads all layers to the GPU)
try:
    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=1)
except Exception as e:
    raise RuntimeError(f"Failed to load the model. Check that the file is a valid GGUF model: {e}")

# Run inference on the prompt from Step 1
try:
    result = llm(
        prompt=prompt,
        max_tokens=200,
        echo=False
    )
    print(result['choices'][0]['text'])
except Exception as e:
    print(f"Error during inference: {e}")
```

### Expected Output

The model should return an SQL query equivalent to:

```sql
SELECT * FROM table1 WHERE anni = 2020
```
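
The raw completion can carry trailing tokens after the query itself. A small, hypothetical post-processing helper (the `extract_sql` name is illustrative, not part of any library) that keeps the first non-empty line:

```python
def extract_sql(completion: str) -> str:
    """Return the first non-empty line of the model's completion, trimmed."""
    for line in completion.splitlines():
        line = line.strip()
        if line:
            return line
    return ""

# Example: clean up the raw completion from Step 2
sql = extract_sql(result['choices'][0]['text'])
print(sql)
```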

### Additional Notes

- **Quantization**: The model is quantized to 4 bits in the GGUF format, enabling efficient inference even on systems with limited memory.
- **Prompt**: The prompt follows the Alpaca instruction style, which guides the model to generate an SQL query from the user's input.
- **Inference**: The `llama_cpp` library performs inference on this GGUF model. Adjust `n_gpu_layers` and `max_tokens` to your hardware and the expected length of the SQL query, as in the sketch below.
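
For instance, a CPU-only setup versus full GPU offload might look like this (a sketch; the `n_ctx` and `n_threads` values are illustrative assumptions):

```python
from llama_cpp import Llama

# CPU-only: no layers offloaded; give generation more CPU threads
llm_cpu = Llama(model_path=MODEL_PATH, n_gpu_layers=0, n_threads=8, n_ctx=2048)

# GPU: offload all layers (-1) when enough VRAM is available
llm_gpu = Llama(model_path=MODEL_PATH, n_gpu_layers=-1, n_ctx=2048)

# A single SQL statement rarely needs many tokens
result = llm_gpu(prompt=prompt, max_tokens=128, echo=False)
```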

## License

This model is released under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license.

For more detailed information, visit the [model card on Hugging Face](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL).