---
base_model: ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- ruslanmv
- llama
- gguf
---

# Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4

This model is a 4-bit GGUF quantization of [ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL), a model fine-tuned for Text-to-SQL generation. It converts natural language queries into SQL commands and is packaged in the GGUF format used by llama.cpp for efficient inference.

## Model Details

- **Base Model**: [ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL)
- **Task**: Text-to-SQL generation
- **Quantization**: GGUF (Q4_K_M, 4-bit quantization)
- **License**: Apache-2.0

## Installation

To use this model, install `llama-cpp-python` to run the quantized model and `huggingface_hub` to download it from Hugging Face.

### Step 1: Install Required Packages

```bash
# Install llama-cpp-python; the extra index serves prebuilt CUDA 12.1 wheels
# (prefix the commands with ! when running in a Colab/Jupyter cell)
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121 \
  --force-reinstall --upgrade --no-cache-dir --verbose

# Install huggingface_hub to download models from Hugging Face
pip install huggingface_hub
```
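
As a quick sanity check that a GPU-enabled wheel was installed, you can query the bindings (a minimal sketch; `llama_supports_gpu_offload` is exposed by recent `llama-cpp-python` releases):

```python
import llama_cpp

# Report the binding version and whether GPU offload was compiled in
print(llama_cpp.__version__)
print(llama_cpp.llama_supports_gpu_offload())
```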

### Step 2: Set Up Hugging Face Hub and Download the Model

Enable Hugging Face's fast transfer feature and download the quantized model with `huggingface-cli`. The commands below use notebook syntax (`!`); drop the `!` to run them in a shell.

```python
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # fast downloads; requires the hf_transfer package

!huggingface-cli download \
  ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4 \
  unsloth.Q4_K_M.gguf \
  --local-dir . \
  --local-dir-use-symlinks False
```

Make sure the downloaded model is stored in the local directory, then set the model path (the `/content` prefix below assumes Google Colab):

```python
MODEL_PATH = "/content/unsloth.Q4_K_M.gguf"
```
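
If you prefer to stay in Python, the same file can be fetched with the `huggingface_hub` API instead of the CLI; a minimal sketch using the same repository and filename:

```python
from huggingface_hub import hf_hub_download

# Download the GGUF file and capture its local path
MODEL_PATH = hf_hub_download(
    repo_id="ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4",
    filename="unsloth.Q4_K_M.gguf",
    local_dir=".",
)
print(MODEL_PATH)
```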

## Usage Example

Here is an example that demonstrates how to generate an SQL query from a natural language prompt using the quantized GGUF model and the `llama_cpp` library.

### Step 1: Define the User Query and Prompt

The user provides a natural language query, and we format the prompt using an Alpaca-style template.

```python
# Example query (Italian): "Select all columns of table1 where the column anni equals 2020"
user_query = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
"""

# Fill the template; the response section is left empty for the model to complete
prompt = alpaca_prompt.format(
    "Provide the SQL query",
    user_query
)
print(f"Prompt created:\n{prompt}")
```

### Step 2: Load the Model and Generate the SQL Query

To load the quantized model and run inference, use the `llama_cpp` library with the Alpaca-style `prompt` built in Step 1.

```python
from llama_cpp import Llama
import os

# Ensure the downloaded model file exists
MODEL_PATH = "/content/unsloth.Q4_K_M.gguf"
assert os.path.exists(MODEL_PATH), f"Model path {MODEL_PATH} does not exist."

# Load the model; adjust n_gpu_layers to your hardware (-1 offloads all layers to the GPU)
try:
    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=1)
except Exception as e:
    raise RuntimeError(f"Failed to load the model. Check that the file is a valid GGUF model: {e}")

# Run inference on the prompt from Step 1
try:
    result = llm(
        prompt=prompt,
        max_tokens=200,
        echo=False
    )
    print(result['choices'][0]['text'])
except Exception as e:
    print(f"Error during inference: {e}")
```

### Expected Output

The model should return an SQL query equivalent to:

```sql
SELECT * FROM table1 WHERE anni = 2020
```
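
The raw completion can carry trailing tokens after the query itself. A small, hypothetical post-processing helper (the `extract_sql` name is illustrative, not part of any library) that keeps the first non-empty line:

```python
def extract_sql(completion: str) -> str:
    """Return the first non-empty line of the model's completion, trimmed."""
    for line in completion.splitlines():
        line = line.strip()
        if line:
            return line
    return ""

# Example: clean up the raw completion from Step 2
sql = extract_sql(result['choices'][0]['text'])
print(sql)
```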

### Additional Notes

- **Quantization**: The model is quantized to 4 bits in the GGUF format, enabling efficient inference even on systems with limited memory.
- **Prompt**: The prompt follows the Alpaca instruction style, which guides the model to generate an SQL query from the user's input.
- **Inference**: The `llama_cpp` library performs inference on this GGUF model. Adjust `n_gpu_layers` and `max_tokens` to your hardware and the expected length of the SQL query, as in the sketch below.
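
For instance, a CPU-only setup versus full GPU offload might look like this (a sketch; the `n_ctx` and `n_threads` values are illustrative assumptions):

```python
from llama_cpp import Llama

# CPU-only: no layers offloaded; give generation more CPU threads
llm_cpu = Llama(model_path=MODEL_PATH, n_gpu_layers=0, n_threads=8, n_ctx=2048)

# GPU: offload all layers (-1) when enough VRAM is available
llm_gpu = Llama(model_path=MODEL_PATH, n_gpu_layers=-1, n_ctx=2048)

# A single SQL statement rarely needs many tokens
result = llm_gpu(prompt=prompt, max_tokens=128, echo=False)
```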

## License

This model is released under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license.

For more detailed information, visit the [model card on Hugging Face](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL).