Links for sample inference and database schema are dead

#4
by NPap - opened

They might be found here: https://github.com/defog-ai/sqlcoder

@gerald29 @NPap

Do you have sample code for running the model with quantization on an RTX 4090 (24 GB VRAM)?

@samvedya Just download the exl2 from here: https://huggingface.co/waldie/sqlcoder-34b-alpha-4bpw-h6-exl2

(Works for my 3090s)
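
For anyone trying the exl2 route, loading it looks roughly like the sketch below with exllamav2. This is based on the exllamav2 example scripts, so the exact API may differ across versions, and the model_dir value is just a placeholder for wherever the quant was downloaded.

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "sqlcoder-34b-alpha-4bpw-h6-exl2"  # placeholder: local download directory
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # splits layers across the available GPU memory

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.1
settings.top_p = 0.9

# placeholder prompt; use the schema + question template from the sqlcoder repo
print(generator.generate_simple("-- your schema and question prompt here", settings, 300))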

# 4-bit load of sqlcoder-34b-alpha with bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_name = "defog/sqlcoder-34b-alpha"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
    use_cache=True,
    quantization_config=quantization_config,
)
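
If you want to sanity-check that the 4-bit load actually fits in 24 GB, something like the lines below works with the model object from the snippet above. Note that get_memory_footprint only counts the weights, so real usage during generation will be somewhat higher.

# rough VRAM check for the 4-bit model loaded above
print(f"weights: {model.get_memory_footprint() / 1e9:.1f} GB")
print(f"allocated on GPU: {torch.cuda.memory_allocated() / 1e9:.1f} GB")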

This is a better way.

> @samvedya Just download the exl2 from here: https://huggingface.co/waldie/sqlcoder-34b-alpha-4bpw-h6-exl2
> (Works for my 3090s)

Hey, for reference, what kind of t/s (tokens per second) do you get for your prompts? (Or how long does it take to get an output from the model?)
How big is your schema in terms of number of columns and tables?

(The outputs I'm generating are around 4 lines.)

Hi, all.

I ran the inference script on the sqlcoder-34b-alpha model, but no SQL result was returned. Any ideas, please?

Thanks a lot

@samvedya : were you able to run the model with 8-bit quantization on an RTX 4090 (24 GB VRAM) with the above settings?

Yes.
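
For anyone looking for the 8-bit version of that config, it should just be the flag below. This is a sketch, not verified on a 24 GB card: the int8 weights of a 34B model are roughly 34 GB on their own, so expect part of the model to spill over to CPU.

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    # needed if the 8-bit weights don't fit and some layers end up on CPU
    llm_int8_enable_fp32_cpu_offload=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "defog/sqlcoder-34b-alpha",
    quantization_config=quantization_config,
    device_map="auto",
)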

@samvedya : can you please share the code or GitHub repo, and the versions of the libraries used?
Also, I want to run the code in a Windows environment. Which environment did you use, and did you make any other specific changes?

# Latest version of every library as of today
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch
# import sqlparse  # if you want to extract the SQL from the raw LLM output

model_id = "codellama/CodeLlama-34b-Instruct-hf"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)

prompt = ""  # put your schema + question prompt here

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=True).to("cuda")

output = model.generate(
    inputs["input_ids"],
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
    repetition_penalty=1.05,
)
output = output[0].to("cpu")
string_output = tokenizer.decode(output)

print(string_output)
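
On the "no SQL result returned" point above: the raw decode includes the prompt itself, so you usually have to cut the SQL out of string_output. A rough sketch with sqlparse follows; the "[SQL]" split marker is just an example, so adjust it to whatever delimiter your prompt template ends with.

import sqlparse

# take whatever the model wrote after the answer marker, stop at the first ";"
sql = string_output.split("[SQL]")[-1].split(";")[0] + ";"
print(sqlparse.format(sql, reindent=True, keyword_case="upper"))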

Hi @samvedya : I was able to run the code with 4-bit quantization on Windows using the bitsandbytes builds available at:
https://jllllll.github.io/bitsandbytes-windows-webui/bitsandbytes/
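
The install command for that index is usually along these lines (from memory, so double-check it against the page itself):

python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui/bitsandbytes/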

Can you let me know if you were able to run the model in 8 bit and what was the config used for it?
