MySQL support

#19
by faizelk - opened

I know this supports many database types, as the GitHub repo suggests, but how do I switch to generating MySQL rather than the default SQL?
The Git repo says to use db_type='mysql', but this does not seem to work.

import sys
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import sqlparse

model_name = "defog/sqlcoder-7b-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    # torch_dtype=torch.float16,
    load_in_4bit=True,
    device_map="auto",
    use_cache=True,
)

prompt = """Task
Generate an SQL query to answer [QUESTION]{question}[/QUESTION]

If you cannot answer the question with the available database schema, return 'I do not know'

This query will run on a database whose schema is represented in this string
CREATE TABLE books (
id INTEGER PRIMARY KEY, -- Unique ID for each row
book_id INTEGER, -- book id generated for a book, not unique
book_generation TIMESTAMP, -- datetime of the row creation
book_code VARCHAR(200), -- the book registration/barcode number
book_name VARCHAR(200) -- the book title
)
Given the database schema, here is the sql query that answers [QUESTION]{question}[/QUESTION]
[SQL]
"""

def generate_query(question):
    updated_prompt = prompt.format(question=question)
    inputs = tokenizer(updated_prompt, return_tensors="pt").to("cuda")
    generated_ids = model.generate(
        **inputs,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=10000,
        do_sample=False,
        num_beams=1,
    )
    outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

    # Keep only the text generated after the final [SQL] tag and pretty-print it
    return sqlparse.format(outputs[0].split("[SQL]")[-1], reindent=True)

print(generate_query("how many books in total?"))

Defog.ai org

Hi @faizelk, were you referring to the defog-python repository? That is our Python client for interfacing with our backend API (not with sqlcoder directly), and it includes some additional tweaks to support use cases such as this. If you're facing issues with that library, please open an issue there. Separately, if you're trying to load sqlcoder and run inference with it locally, the way you created the prompt looks fine. The only suggestion I would make is to use the non-quantized model weights (i.e. comment out load_in_4bit=True,) for better results.
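
For reference, the non-quantized suggestion might look roughly like the sketch below. The helper names build_load_kwargs and load_sqlcoder are hypothetical, not part of any library. Enabling torch_dtype=torch.float16 (the line commented out in the original snippet) is an assumption here, made so the full weights take less GPU memory. It is not something the reply above prescribes.

```python
def build_load_kwargs(quantize_4bit=False):
    """Build keyword arguments for AutoModelForCausalLM.from_pretrained.

    Hypothetical helper: with quantize_4bit=False (the default) the
    load_in_4bit flag is left out, which matches the suggestion above.
    """
    kwargs = {
        "trust_remote_code": True,
        "device_map": "auto",
        "use_cache": True,
    }
    if quantize_4bit:
        kwargs["load_in_4bit"] = True
    return kwargs


def load_sqlcoder(model_name="defog/sqlcoder-7b-2", quantize_4bit=False):
    """Load the tokenizer and model; heavy imports stay local to this call."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = build_load_kwargs(quantize_4bit)
    if not quantize_4bit:
        # Assumption: half precision instead of the float32 default.
        kwargs["torch_dtype"] = torch.float16

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, **kwargs)
    return tokenizer, model
```

Calling tokenizer, model = load_sqlcoder() would then replace the loading code in the original post, and generate_query can stay as it is.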
