metadata
base_model: ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL
language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - ruslanmv
  - llama
  - gguf

Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4

This model is a quantized GGUF build (Q4_K_M, 4-bit) of ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL, fine-tuned for Text-to-SQL generation. It converts natural language questions into SQL queries and is packaged in GGUF, the model file format used by llama.cpp, for efficient inference.

Model Details

Installation

To use this model, you need to install llama-cpp-python and huggingface_hub for downloading and running the quantized model.

Step 1: Install Required Packages

# Install llama-cpp-python from the prebuilt CUDA 12.1 wheel index
!pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121 \
  --force-reinstall --upgrade --no-cache-dir --verbose

# Install huggingface_hub to download models from Hugging Face
!pip install huggingface_hub hf_transfer

Step 2: Set up Hugging Face Hub and Download the Model

Enable hf_transfer for faster downloads by setting HF_HUB_ENABLE_HF_TRANSFER, then download the quantized model file from Hugging Face with huggingface-cli.

import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

!huggingface-cli download \
 ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4 \
 unsloth.Q4_K_M.gguf \
 --local-dir . \
 --local-dir-use-symlinks False
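Outside a notebook, the same file can be fetched from Python. The sketch below assumes huggingface_hub (and hf_transfer) are installed and uses its hf_hub_download function. The download_model wrapper and the two constants are illustrative names, not part of the original instructions.

```python
import os

# Repository and file name taken from the CLI command above.
REPO_ID = "ruslanmv/Meta-Llama-3.1-8B-Text-to-SQL-GGUF-q4"
FILENAME = "unsloth.Q4_K_M.gguf"


def download_model(local_dir: str = ".") -> str:
    """Download the GGUF file into local_dir and return its local path."""
    # The flag is read when huggingface_hub is first imported, so set it
    # before the import. It requires the hf_transfer package to be installed.
    os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "1")
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```

Calling download_model() saves the file in the current directory and returns its path, which can then be used as the model path.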

The command saves the model file in the current directory. Set the model path to match; the path below assumes Google Colab, where the working directory is /content:

MODEL_PATH = "/content/unsloth.Q4_K_M.gguf"
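Because the hard-coded path only holds on Colab, it may help to look the file up instead. The following is a minimal sketch, assuming the file name from the download step; find_model and its default search directories are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path
from typing import Iterable, Optional

MODEL_FILENAME = "unsloth.Q4_K_M.gguf"


def find_model(filename: str = MODEL_FILENAME,
               search_dirs: Optional[Iterable[str]] = None) -> Optional[Path]:
    """Return the first existing path to filename, or None if it is not found."""
    if search_dirs is None:
        # Current directory first, then the Colab default used above.
        search_dirs = [".", "/content"]
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate.resolve()
    return None
```

A None result means the download step has not run in any of the searched directories, which is easier to diagnose than a load error later on.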

Usage Example

Here is an example that demonstrates how to generate an SQL query from a natural language prompt using the quantized GGUF model and the llama_cpp library.

Step 1: Define the User Query and Prompt

The user provides a natural language query, and we format the prompt using an Alpaca-style template.

# Italian for: "Select all columns from table table1 where the column anni equals 2020"
user_query = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
"""

prompt = alpaca_prompt.format(
    "Provide the SQL query",
    user_query
)

Step 2: Load the Model and Generate SQL Query

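The script below loads the quantized model with the llama_cpp library and runs inference. Note that it builds its own prompt, wrapping the instruction in [INST] and <<SYS>> markers, rather than reusing the alpaca_prompt string from Step 1.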

from llama_cpp import Llama
import os

# Get the current directory
current_directory = os.getcwd()

# Construct the full model path
MODEL_PATH = os.path.join(current_directory, "unsloth.Q4_K_M.gguf")

# Ensure the model path exists
assert os.path.exists(MODEL_PATH), f"Model path {MODEL_PATH} does not exist."

# Create the prompt for SQL query generation
B_INST, E_INST = "<s>[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = """\
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
"""
SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS

def create_prompt(user_query):
    instruction = f"Provide the SQL query. User asks: {user_query}\n"
    prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt.strip()

# Define user query (Italian for: "Select all columns from table table1 where the column anni equals 2020")
user_query = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"
prompt = create_prompt(user_query)
print(f"Prompt created:\n{prompt}")

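# Load the model
try:
    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=1)  # Adjust GPU layers as per your hardware
except (ValueError, AssertionError) as e:
    # Recent llama-cpp-python releases raise ValueError on load failure; AssertionError covers older ones
    raise RuntimeError(f"Failed to load the model. Check that the model is in the correct format: {e}") from e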

# Perform inference
try:
    result = llm(
        prompt=prompt,
        max_tokens=200,
        echo=False
    )
    print(result['choices'][0]['text'])
except Exception as e:
    print(f"Error during inference: {e}")
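The raw completion may carry surrounding whitespace, a trailing semicolon, or extra commentary after the query. The sketch below shows one way to trim it down to a single statement. extract_sql is a hypothetical helper, and the splitting rules are assumptions about typical output rather than documented model behavior.

```python
import re


def extract_sql(generated_text: str) -> str:
    """Return the first SQL statement found in a raw completion."""
    text = generated_text.strip()
    # Stop at the first blank line: text after it is assumed to be commentary.
    text = re.split(r"\n\s*\n", text, maxsplit=1)[0]
    # Keep only the first statement. This naive split would also cut at a
    # semicolon inside a string literal, which is acceptable for a sketch.
    text = text.split(";", 1)[0]
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(text.split())
```

For example, extract_sql(result['choices'][0]['text']) could replace the bare print call in the inference step.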

Expected Output

For the query above, the model should return a SQL query like the following (exact output may vary with sampling settings):

SELECT * FROM table1 WHERE anni = 2020
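A quick way to sanity-check a generated query is to run it against a throwaway database. The sketch below uses Python's built-in sqlite3 module. Only the names table1 and anni come from the example; the id column and the sample rows are invented for illustration.

```python
import sqlite3


def run_on_sample(sql: str) -> list:
    """Execute sql against a small in-memory table and return the rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: only table1 and anni appear in the example.
        conn.execute("CREATE TABLE table1 (id INTEGER, anni INTEGER)")
        conn.executemany(
            "INSERT INTO table1 (id, anni) VALUES (?, ?)",
            [(1, 2019), (2, 2020), (3, 2020)],
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


rows = run_on_sample("SELECT * FROM table1 WHERE anni = 2020")
print(rows)  # [(2, 2020), (3, 2020)]
```

A query that references a missing table or column raises sqlite3.Error, which gives an early signal that the generated SQL does not match the schema.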

Additional Notes

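  • Quantization: The model is stored in GGUF format with Q4_K_M (4-bit) quantization, which reduces memory use and enables efficient inference on systems with limited memory.
  • Prompt: Step 1 formats the request with an Alpaca instruction template, while the script in Step 2 wraps the same instruction in [INST] and <<SYS>> markers. In both cases the instruction asks the model for a SQL query based on the user input.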
  • Inference: The llama_cpp library is used to perform inference with this GGUF model. Adjust n_gpu_layers and max_tokens based on your hardware capabilities and the complexity of the SQL query.
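The tuning advice above can be sketched as a small settings helper. The function name, the context size of 2048, and the "###" stop string are assumptions made for illustration; n_gpu_layers, n_ctx, verbose, max_tokens, temperature, stop, and echo are existing llama-cpp-python parameters.

```python
def llama_settings(use_gpu: bool, max_sql_tokens: int = 200) -> tuple:
    """Return (load kwargs, generation kwargs) for llama_cpp.Llama."""
    load_kwargs = {
        # -1 offloads every layer to the GPU; 0 keeps the model on the CPU.
        "n_gpu_layers": -1 if use_gpu else 0,
        "n_ctx": 2048,  # assumed context size, ample for short prompts
        "verbose": False,
    }
    generate_kwargs = {
        "max_tokens": max_sql_tokens,
        "temperature": 0.0,  # favor the most likely tokens for repeatable SQL
        "stop": ["###"],     # assumed stop string for Alpaca-style section headers
        "echo": False,
    }
    return load_kwargs, generate_kwargs


# Intended use (requires llama-cpp-python and the downloaded model file):
#   load_kwargs, generate_kwargs = llama_settings(use_gpu=True)
#   llm = Llama(model_path=MODEL_PATH, **load_kwargs)
#   result = llm(prompt, **generate_kwargs)
```

Keeping the two groups of settings separate makes it easier to try a different number of GPU layers without touching the generation parameters.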

License

This model is released under the Apache-2.0 license.

For more detailed information, visit the model card on Hugging Face.