---
base_model: ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- ruslanmv
- llama
- gguf
---
# Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4
This model is a quantized GGUF version of [ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL), a Llama 3.1 8B model fine-tuned for Text-to-SQL generation. It converts natural-language questions into SQL queries, and its 4-bit GGUF file (the model format used by llama.cpp) is designed for efficient inference on modest hardware.
## Model Details
- **Base Model**: [ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL)
- **Task**: Text-to-SQL generation
- **Quantization**: 4-bit `Q4_K_M`, stored in the GGUF format (`unsloth.Q4_K_M.gguf`)
- **License**: Apache-2.0
## Installation
To use this model, install `llama-cpp-python` to run the quantized model and `huggingface_hub` to download it.
### Step 1: Install Required Packages
```bash
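# Install llama-cpp-python using the prebuilt CUDA 12.1 wheels
# (for CPU-only machines, use https://abetlen.github.io/llama-cpp-python/whl/cpu instead).
# In a Jupyter or Colab cell, prefix each command with "!".
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121 \
  --force-reinstall --upgrade --no-cache-dir --verbose
# Install huggingface_hub to download models from Hugging Face (hf_transfer speeds up large downloads)
pip install huggingface_hub hf_transfer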
```
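Before downloading several gigabytes of weights, it can help to confirm that the packages installed correctly. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and note that the `llama-cpp-python` package is imported under the name `llama_cpp`.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names of the packages installed above (the pip name llama-cpp-python imports as llama_cpp)
required = ["llama_cpp", "huggingface_hub", "hf_transfer"]
missing = missing_modules(required)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages are installed.")
```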
### Step 2: Set up Hugging Face Hub and Download the Model
Enable Hugging Face's faster `hf_transfer` download backend and fetch the quantized model file with `huggingface-cli`:
```bash
# HF_HUB_ENABLE_HF_TRANSFER=1 routes the download through the hf_transfer package installed above.
# In a Jupyter or Colab cell, prefix the command with "!".
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download \
  ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4 \
  unsloth.Q4_K_M.gguf \
  --local-dir . \
  --local-dir-use-symlinks False
```
The file is saved to the current working directory. On Google Colab that directory is `/content`, so the model path is:
```python
MODEL_PATH = "/content/unsloth.Q4_K_M.gguf"
```
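If loading fails later, a quick sanity check is to confirm that the downloaded file really is GGUF. According to the GGUF specification in the ggml repository, a GGUF file starts with the 4-byte magic `GGUF` followed by a little-endian `uint32` format version. The sketch below reads only that header; `gguf_version` is a hypothetical helper, and it does not validate the rest of the file.

```python
import struct

def gguf_version(path):
    """Return the GGUF format version from the file header, or None if the file is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)  # 4-byte magic + 4-byte little-endian version
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

With `MODEL_PATH` set as above, `print(gguf_version(MODEL_PATH))` should print a small integer version number for a valid file and `None` for a truncated or wrong download.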
## Usage Example
Here is an example that demonstrates how to generate an SQL query from a natural language prompt using the quantized GGUF model and the `llama_cpp` library.
### Step 1: Define the User Query and Prompt
The user provides a natural language query, and we format the prompt using an Alpaca-style template.
```python
# "Select all columns of table table1 where the column anni is equal to 2020"
user_query = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
"""
prompt = alpaca_prompt.format(
"Provide the SQL query",
user_query
)
```
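If you plan to send several questions, the template can be wrapped in a small function. This is a minimal sketch: `build_alpaca_prompt` is a hypothetical helper that reuses the template and instruction text from this step unchanged, and the example questions are illustrative. Because the template ends with `### Response:` and a newline, the model's completion begins exactly where the SQL answer is expected.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
"""

def build_alpaca_prompt(user_query, instruction="Provide the SQL query"):
    """Fill the Alpaca-style template with an instruction and the user's natural-language query."""
    return ALPACA_PROMPT.format(instruction, user_query)

# Build prompts for a small batch of (illustrative) questions
questions = [
    "Select all columns from table1 where anni equals 2020",
    "Count the rows in table1",
]
prompts = [build_alpaca_prompt(q) for q in questions]
print(prompts[0])
```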
### Step 2: Load the Model and Generate SQL Query
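Next, load the quantized model with `llama_cpp.Llama` and run inference. This step builds its own prompt, wrapping the same instruction in `[INST]` and `<<SYS>>` markers, and uses it in place of the Alpaca prompt from Step 1.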
```python
import os
from llama_cpp import Llama

# Build the full model path from the current working directory
current_directory = os.getcwd()
MODEL_PATH = os.path.join(current_directory, "unsloth.Q4_K_M.gguf")

# Make sure the model file exists before trying to load it
assert os.path.exists(MODEL_PATH), f"Model path {MODEL_PATH} does not exist."

# Markers for an [INST]-style prompt with a system section
B_INST, E_INST = "<s>[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = """\
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
"""
SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS

def create_prompt(user_query):
    """Wrap the user's question in the system prompt and [INST] markers."""
    instruction = f"Provide the SQL query. User asks: {user_query}\n"
    prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt.strip()

# Define the user query ("Select all columns of table table1 where the column anni is equal to 2020")
user_query = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"
prompt = create_prompt(user_query)
print(f"Prompt created:\n{prompt}")

# Load the model.
# n_gpu_layers sets how many layers are offloaded to the GPU: 0 = CPU only, -1 = all layers.
try:
    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=1)
except (ValueError, AssertionError) as e:
    # Recent llama-cpp-python versions raise ValueError when a model cannot be loaded
    raise RuntimeError(f"Failed to load the model. Check that the file is a valid GGUF model: {e}") from e

# Perform inference
try:
    result = llm(
        prompt=prompt,
        max_tokens=200,
        echo=False,
    )
    print(result["choices"][0]["text"])
except Exception as e:
    print(f"Error during inference: {e}")
```
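With `echo=False` the returned text contains only the completion, but the model may keep generating after the query (for example, an explanation or a new `###` section) until it reaches `max_tokens`. A small post-processing step can trim that. The sketch below is a heuristic of our own, not part of the model or of `llama_cpp`: `extract_sql` is a hypothetical helper that keeps the text before the first `###` marker and then cuts after the first semicolon, if any. It does not handle semicolons inside string literals.

```python
def extract_sql(completion):
    """Heuristically pull the first SQL statement out of a raw model completion."""
    text = completion.split("###", 1)[0]  # drop any follow-on template section
    text = text.strip()
    if ";" in text:
        text = text[: text.index(";") + 1]  # keep only the first statement
    return text.strip()

raw = " SELECT * FROM table1 WHERE anni = 2020\n### Explanation:\nThis selects rows..."
print(extract_sql(raw))  # SELECT * FROM table1 WHERE anni = 2020
```

In the script above this would be applied as `extract_sql(result["choices"][0]["text"])`. Alternatively, passing `stop=["###"]` to the `llm(...)` call asks llama.cpp to end generation at that marker.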
### Expected Output
The model should return a query similar to the following (the exact text can vary between runs, since sampling is not deterministic by default):
```sql
SELECT * FROM table1 WHERE anni = 2020
```
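Because generated SQL can vary, it is worth checking a query before using it. One lightweight option, sketched here with Python's built-in `sqlite3` module, is to run it against an in-memory toy table. The schema and rows are invented for illustration; only the table name `table1` and the column `anni` come from the example query, and SQLite's dialect may differ from your target database.

```python
import sqlite3

def run_on_toy_db(sql):
    """Execute `sql` against a small in-memory SQLite table and return the fetched rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: only table1 and the anni column come from the example query
        conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, anni INTEGER)")
        conn.executemany(
            "INSERT INTO table1 (name, anni) VALUES (?, ?)",
            [("a", 2019), ("b", 2020), ("c", 2020)],
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

rows = run_on_toy_db("SELECT * FROM table1 WHERE anni = 2020")
print(rows)  # [(2, 'b', 2020), (3, 'c', 2020)]
```

A syntactically invalid query raises `sqlite3.OperationalError`, which makes malformed output easy to catch.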
### Additional Notes
- **Quantization**: The weights are quantized to 4 bits (`Q4_K_M`) and stored in the GGUF format, which lowers memory use and enables efficient inference on systems with limited memory.
- **Prompt**: Both prompts use the Alpaca instruction preamble (Step 2 wraps it in `[INST]` and `<<SYS>>` markers), which helps guide the model toward generating an SQL query from the user's request.
- **Inference**: The `llama_cpp` library is used to perform inference with this GGUF model. Adjust `n_gpu_layers` and `max_tokens` based on your hardware capabilities and the complexity of the SQL query.
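To see why 4-bit quantization matters on memory-constrained systems, weight storage can be estimated as parameters × bits per weight ÷ 8. This is a rough back-of-the-envelope sketch: it assumes a round figure of 8 × 10⁹ parameters and a nominal bit width, and it ignores the KV cache, runtime buffers, and the fact that mixed K-quant formats such as `Q4_K_M` spend somewhat more than 4 bits per weight on average, so the real file is larger than the 4-bit estimate.

```python
def weight_bytes(n_params, bits_per_weight):
    """Approximate storage needed for model weights, in bytes."""
    return n_params * bits_per_weight / 8

N_PARAMS = 8e9  # assumption: round figure for an "8B" model

for label, bits in [("FP16", 16), ("nominal 4-bit", 4)]:
    gb = weight_bytes(N_PARAMS, bits) / 1e9  # decimal gigabytes
    print(f"{label}: ~{gb:.1f} GB")
```

This prints roughly 16.0 GB for FP16 weights versus 4.0 GB at a nominal 4 bits, a fourfold reduction before any runtime overhead.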
## License
This model is released under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license.
For more detailed information, visit the [model card on Hugging Face](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL).