---
language:
- en
- it
license: apache-2.0
tags:
- text-generation-inference
- transformers
- ruslanmv
- llama
- trl
- sft
---
# Meta-Llama 3.1 8B Text-to-SQL GPTQ Model

This repository provides a quantized 8-billion-parameter Meta-Llama 3.1 model fine-tuned for text-to-SQL tasks. The model is quantized to 4 bits with GPTQ for efficient inference. The sections below show how to install the dependencies, load the model, and generate SQL from a natural-language request.

## Model Details

- **Model Size**: 8B
- **Quantization**: GPTQ (4-bit)
- **Languages Supported**: English, Italian
- **Task**: Text-to-SQL generation
- **License**: Apache 2.0

## Installation Requirements

Before using the model, ensure that you have the following dependencies installed. We recommend using the same versions to avoid any compatibility issues.

```bash
# Install the PyTorch build for CUDA 12.1 (a compatible NVIDIA driver is needed)
# In a Jupyter or Colab notebook cell, prefix each command with "!"
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121

# Install AutoGPTQ for quantized model handling
pip install auto-gptq --no-build-isolation

# Install Optimum for model optimization
pip install optimum
```

After installing the dependencies, restart your runtime (for example, the notebook kernel) so that the newly installed packages are picked up.
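
Once the environment is back up, it can help to confirm which package versions were actually installed. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `installed_versions` is hypothetical, and the package names are simply the ones passed to `pip` above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment
            found[name] = None
    return found


# Report the packages this model card asks for
for name, ver in installed_versions(["torch", "auto-gptq", "optimum"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If `torch` reports a version other than 2.2.1, reinstalling with the pinned command is the safer route before loading the model.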

## Loading the Model

To load the quantized Meta-Llama 3.1 model and use it for text-to-SQL tasks, use the following Python code:

```python
from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM
import torch

# Define the Alpaca-style prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
"""

# Model directory and tokenizer
quantized_model_dir = "meta-llama-8b-quantized-4bit"  # Path where quantized model is saved
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

# Load the quantized model
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    device_map="auto",  # Automatically map the model to the available device (GPU or CPU)
    torch_dtype=torch.float16,  # Ensure FP16 for efficiency
    use_safetensors=True  # If you saved the model using safetensors format, set this to True
)

# Set up the text generation pipeline without specifying the device.
# It is named sql_pipeline so that it does not shadow the imported pipeline() function.
sql_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Function to generate SQL query from input text using the Alpaca prompt
def generate_sql(input_text):
    # Format the prompt
    prompt = alpaca_prompt.format(
        "Provide the SQL query",
        input_text
    )

    # Generate the response using the pipeline.
    # max_new_tokens limits only the generated tokens; max_length would also count the prompt.
    generated_text = sql_pipeline(
        prompt,
        max_new_tokens=200,
        eos_token_id=tokenizer.eos_token_id
    )[0]["generated_text"]

    # Clean the output by removing the prompt and any extra newlines
    cleaned_output = generated_text.replace(prompt, '').strip()

    return cleaned_output

# Example usage
italian_input = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"
sql_query = generate_sql(italian_input)
print(sql_query)
```
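
Removing the prompt with `str.replace` relies on the generated text reproducing the prompt character for character. As an alternative, the response can be cut out by its `### Response:` marker. The following is a minimal sketch of that idea, not part of the original script: the helper name `extract_response` is hypothetical, and it assumes the model may continue with a further `###` header after the answer.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(generated_text):
    """Return the text following the first '### Response:' marker.

    Falls back to the whole text when the marker is absent, and drops
    anything after a subsequent '###' header on a new line.
    """
    head, marker, tail = generated_text.partition(RESPONSE_MARKER)
    answer = tail if marker else head
    # Cut off any extra Alpaca-style section the model may have appended
    answer = answer.split("\n###", 1)[0]
    return answer.strip()


sample = (
    "### Instruction:\nProvide the SQL query\n\n"
    "### Input:\nSeleziona tutte le colonne della tabella table1\n\n"
    "### Response:\nSELECT * FROM table1;\n\n### Instruction:\n"
)
print(extract_response(sample))
```

This operates on plain strings only, so it can be applied to the `generated_text` value returned by the pipeline without changing how the model is called.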

## Example Usage

The example script shows how to generate SQL queries from natural language text. Pass a request in Italian or English to `generate_sql`, and it returns the SQL query produced by the model.

Example input:

```python
italian_input = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"
sql_query = generate_sql(italian_input)
print(sql_query)
```

Example output:

```sql
SELECT * FROM table1 WHERE anni = 2020;
```
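
Generated SQL is worth checking before it is run against real data. One lightweight option is to execute it on a throwaway in-memory SQLite database. The sketch below assumes a hypothetical schema for `table1` (columns `id` and `anni`) and made-up sample rows; the helper name `run_on_sample` is also an illustration, not part of this repository.

```python
import sqlite3


def run_on_sample(sql_query, rows):
    """Execute sql_query against an in-memory table1(id, anni) filled with rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, used only for this sanity check
        conn.execute("CREATE TABLE table1 (id INTEGER, anni INTEGER)")
        conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
        return conn.execute(sql_query).fetchall()
    finally:
        conn.close()


sample_rows = [(1, 2019), (2, 2020), (3, 2020)]
print(run_on_sample("SELECT * FROM table1 WHERE anni = 2020;", sample_rows))
```

A query that is syntactically invalid or references a missing column raises `sqlite3.OperationalError`, which makes such problems visible early. SQLite's dialect differs from other databases, so a passing check here does not guarantee the query is valid elsewhere.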

## Model Tags

- **text-generation-inference**
- **transformers**
- **llama**
- **trl**
- **sft**

## License

This model is released under the [Apache License 2.0](LICENSE).