---
base_model: ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- ruslanmv
- llama
- gguf
---
# Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4

This model is a GGUF-quantized build of [ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL), a model fine-tuned for Text-to-SQL generation. It converts natural language queries into SQL commands and is packaged in GGUF, the model file format used by llama.cpp, for efficient local inference.

## Model Details

- **Base Model**: [ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL)
- **Task**: Text-to-SQL generation
- **Quantization**: Q4_K_M (4-bit), stored in GGUF format
- **License**: Apache-2.0

## Installation

To use this model, install `llama-cpp-python` to run the quantized model and `huggingface_hub` to download it.

### Step 1: Install Required Packages

```bash
# Install llama-cpp-python from the prebuilt CUDA 12.1 wheel index
# (in a Jupyter/Colab notebook, prefix each command with `!`)
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121 \
  --force-reinstall --upgrade --no-cache-dir --verbose

# Install huggingface_hub and hf_transfer to download models from Hugging Face
pip install huggingface_hub hf_transfer
```
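
A quick way to confirm the installation is to check that both modules can be found before going further. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name, not part of either package.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# These are import names, not pip names: llama-cpp-python installs the `llama_cpp` module.
required = ["llama_cpp", "huggingface_hub"]
missing = missing_packages(required)
if missing:
    print(f"Missing modules: {missing}. Re-run the pip commands above.")
else:
    print("All required modules are importable.")
```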

### Step 2: Set up Hugging Face Hub and Download the Model

Enable the `hf_transfer` backend for faster downloads, then fetch the quantized model file with `huggingface-cli`. The cell below is written for a Jupyter/Colab notebook, where a leading `!` runs a shell command.

```python
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

!huggingface-cli download \
 ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4 \
 unsloth.Q4_K_M.gguf \
 --local-dir . \
 --local-dir-use-symlinks False
```
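
Outside a notebook the `!` shell escape is not available. As a minimal sketch, the same file can be fetched from plain Python with `hf_hub_download` from `huggingface_hub`; the `download_model` wrapper and its default argument are illustrative assumptions, while the repository and file names come from the command above.

```python
REPO_ID = "ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4"
FILENAME = "unsloth.Q4_K_M.gguf"


def download_model(local_dir="."):
    """Download the GGUF file into local_dir and return its local path."""
    # Imported lazily so this snippet can be loaded before huggingface_hub is installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)


# Usage (the first call downloads the full model file):
# MODEL_PATH = download_model()
```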

With `--local-dir .`, the file is saved to the current working directory. On Google Colab that directory is `/content`, so the model path is:

```python
MODEL_PATH = "/content/unsloth.Q4_K_M.gguf"
```
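
Before loading, it can help to confirm that the file exists and is a GGUF file rather than a partial or wrong download. GGUF files begin with the four ASCII bytes `GGUF`, so a header check is cheap. This is a sketch only; `looks_like_gguf` is a hypothetical helper, and it does not validate the rest of the file.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def looks_like_gguf(path):
    """Return True if the file exists and starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


# Check the file name used in the download step, relative to the working directory.
model_file = Path.cwd() / "unsloth.Q4_K_M.gguf"
print(f"{model_file}: valid GGUF header = {looks_like_gguf(model_file)}")
```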

## Usage Example

Here is an example that demonstrates how to generate an SQL query from a natural language prompt using the quantized GGUF model and the `llama_cpp` library.

### Step 1: Define the User Query and Prompt

The user provides a natural language query, and we format the prompt using an Alpaca-style template.

```python
# Italian for: "Select all columns from table table1 where the column anni equals 2020"
user_query = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
"""

prompt = alpaca_prompt.format(
    "Provide the SQL query",
    user_query
)
```
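
When several questions need to be formatted, the template can be wrapped in a small function. The sketch below keeps the same Alpaca-style text but uses named placeholders; `build_prompt` and the sample questions are illustrative assumptions, not part of the original model card.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{user_input}\n\n"
    "### Response:\n"
)


def build_prompt(user_input, instruction="Provide the SQL query"):
    """Fill the Alpaca-style template with one natural language question."""
    return ALPACA_TEMPLATE.format(instruction=instruction, user_input=user_input.strip())


# Hypothetical questions, formatted one prompt per question.
questions = [
    "Select all columns from table1 where anni equals 2020",
    "Count the rows in table1",
]
prompts = [build_prompt(q) for q in questions]
print(prompts[0])
```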

### Step 2: Load the Model and Generate SQL Query

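The script below is self-contained. It loads the quantized model with the `llama_cpp` library, wraps the same Alpaca preamble and the user query in `[INST]`/`<<SYS>>` tags, and runs inference.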

```python
from llama_cpp import Llama
import os

# Get the current directory
current_directory = os.getcwd()

# Construct the full model path
MODEL_PATH = os.path.join(current_directory, "unsloth.Q4_K_M.gguf")

# Ensure the model path exists
assert os.path.exists(MODEL_PATH), f"Model path {MODEL_PATH} does not exist."

# Create the prompt for SQL query generation
B_INST, E_INST = "<s>[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = """\
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
"""
SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS

def create_prompt(user_query):
    instruction = f"Provide the SQL query. User asks: {user_query}\n"
    prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt.strip()

# Define user query (Italian for: "Select all columns from table table1
# where the column anni equals 2020")
user_query = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"
prompt = create_prompt(user_query)
print(f"Prompt created:\n{prompt}")

# Load the model
try:
    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=1)  # Adjust GPU layers as per your hardware
except (AssertionError, ValueError) as e:
    # Depending on the llama-cpp-python version, a failed load surfaces as AssertionError or ValueError
    raise RuntimeError(f"Failed to load the model. Check that the model is in the correct format: {e}") from e

# Perform inference
try:
    result = llm(
        prompt=prompt,
        max_tokens=200,
        echo=False
    )
    print(result['choices'][0]['text'])
except Exception as e:
    print(f"Error during inference: {e}")
```
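
The raw completion can contain more than the query itself, for example a trailing semicolon or a follow-on `###` section. A minimal post-processing sketch is shown below; `extract_sql` is a hypothetical helper, and it assumes a single statement with no semicolons inside string literals.

```python
def extract_sql(generated_text):
    """Return the first SQL statement found in the raw completion text."""
    # Drop anything after a follow-on Alpaca section marker.
    text = generated_text.split("###", 1)[0].strip()
    # Keep only the first statement if the model emitted a terminator.
    if ";" in text:
        text = text.split(";", 1)[0]
    return text.strip()


# Hypothetical raw completion, standing in for result['choices'][0]['text'].
raw = " SELECT * FROM table1 WHERE anni = 2020;\n\n### Instruction:\nProvide the SQL query"
print(extract_sql(raw))
```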

### Expected Output

For the query above, the model is expected to return an SQL query along these lines (generated text can vary between runs and settings):

```sql
SELECT * FROM table1 WHERE anni = 2020
```
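
Generated SQL is worth checking before it touches real data. One possible sanity check is to run it against a throwaway in-memory SQLite database. The table schema and rows below are assumptions made up for illustration (the model card does not define `table1`), and `run_generated_sql` is a hypothetical helper.

```python
import sqlite3


def run_generated_sql(sql, setup_statements):
    """Execute `sql` on a throwaway in-memory SQLite database and return the rows."""
    conn = sqlite3.connect(":memory:")
    try:
        for stmt in setup_statements:
            conn.execute(stmt)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical schema and rows matching the example query.
setup = [
    "CREATE TABLE table1 (id INTEGER, anni INTEGER)",
    "INSERT INTO table1 VALUES (1, 2019), (2, 2020), (3, 2020)",
]
rows = run_generated_sql("SELECT * FROM table1 WHERE anni = 2020", setup)
print(rows)  # [(2, 2020), (3, 2020)]
```

Invalid SQL raises `sqlite3.Error`, so wrapping the call in a `try`/`except` gives a simple pass/fail signal for each generated query.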

### Additional Notes

- **Quantization**: The model weights are quantized to 4 bits (Q4_K_M) and stored in the GGUF format, which enables efficient inference, especially on systems with limited memory.
- **Prompt**: Step 1 uses an Alpaca-style instruction template; the script in Step 2 wraps the same Alpaca preamble in `[INST]`/`<<SYS>>` tags. Both guide the model toward generating SQL from the user input.
- **Inference**: The `llama_cpp` library is used to perform inference with this GGUF model. Adjust `n_gpu_layers` and `max_tokens` based on your hardware capabilities and the complexity of the SQL query.
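
The tuning knobs mentioned above can be collected in one place. The sketch below only builds keyword-argument dictionaries; `llama_settings` is a hypothetical helper, and the `n_ctx` value and `###` stop sequence are illustrative assumptions. In `llama-cpp-python`, `n_gpu_layers=-1` offloads all layers to the GPU and `0` keeps inference on the CPU.

```python
def llama_settings(use_gpu, max_sql_tokens=200):
    """Return (load_kwargs, generate_kwargs) for llama_cpp.Llama and its call."""
    load_kwargs = {
        "n_gpu_layers": -1 if use_gpu else 0,  # -1: offload all layers, 0: CPU only
        "n_ctx": 2048,  # assumed context window; raise it for long prompts
    }
    generate_kwargs = {
        "max_tokens": max_sql_tokens,  # upper bound on generated tokens
        "echo": False,  # do not repeat the prompt in the output
        "stop": ["###"],  # assumed stop marker: cut off at a new Alpaca section
    }
    return load_kwargs, generate_kwargs


load_kwargs, generate_kwargs = llama_settings(use_gpu=False)
print(load_kwargs, generate_kwargs)

# Usage with the objects from the example above:
# llm = Llama(model_path=MODEL_PATH, **load_kwargs)
# result = llm(prompt, **generate_kwargs)
```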

## License

This model is released under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license.

For more detailed information, visit the [base model card on Hugging Face](https://huggingface.co/ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL).