Getting Incomplete Query in most cases

#3
by smooth-operator94 - opened

Hi NumbersStation team,

I have deployed this model on SageMaker on an ml.g5.2xlarge instance. In most cases I'm getting an incomplete query in the response. I have tried increasing the response token size using "max_tokens": 999999, but the issue still persists.

I have also tried the nsql-350M version, and that works fine, with the complete query being generated. I'm seeing this issue only with the nsql-2B model.
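For reference, this is roughly how I'm invoking the endpoint (a minimal sketch: predictor is the SageMaker predictor from my deployment script below, and text holds the prompt shown next). One thing I'm unsure about: the TGI container takes generation options under "parameters", where the token budget is called max_new_tokens, so if max_tokens isn't a recognized key the container may silently fall back to its default budget (20 new tokens, if I read the TGI defaults right), which would match the short completions I'm seeing.

# rough sketch of the invocation; the value is illustrative
response = predictor.predict({
    "inputs": text,  # the schema + question prompt shown below
    "parameters": {
        "max_new_tokens": 500  # TGI's name for the response token budget
    }
})
print(response[0]["generated_text"])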

This is the input query:
CREATE TABLE git_worklog_gpt (
    org_id text,
    type text,
    repository text,
    value integer,
    date date,
    author_email text
)

CREATE TABLE burnout_worklog_gpt (
    org_id text,
    person text,
    date date
)

CREATE TABLE work_cycletime_gpt (
    org_id text,
    type text,
    project text,
    time_in_minutes integer,
    date date
)

CREATE TABLE issue_worklog_gpt (
    org_id text,
    type text,
    project text,
    value integer,
    date date,
    status text
)

CREATE TABLE investment_report_gpt (
    org_id text,
    investment_name text,
    value integer,
    date date,
    story_point integer,
    estimate integer
)

CREATE TABLE issue_lead_time_gpt (
    org_id text,
    project text,
    type text,
    status text,
    priority text,
    lead_time_in_minutes double precision,
    date date
)

-- Using valid SQLite, answer the following questions for the tables provided above.

-- list all issues closed in last 30 days
SELECT
This is the response I'm getting:
CREATE TABLE git_worklog_gpt (
    org_id text,
    type text,
    repository text,
    value integer,
    date date,
    author_email text
)

CREATE TABLE burnout_worklog_gpt (
    org_id text,
    person text,
    date date
)

CREATE TABLE work_cycletime_gpt (
    org_id text,
    type text,
    project text,
    time_in_minutes integer,
    date date
)

CREATE TABLE issue_worklog_gpt (
    org_id text,
    type text,
    project text,
    value integer,
    date date,
    status text
)

CREATE TABLE investment_report_gpt (
    org_id text,
    investment_name text,
    value integer,
    date date,
    story_point integer,
    estimate integer
)

CREATE TABLE issue_lead_time_gpt (
    org_id text,
    project text,
    type text,
    status text,
    priority text,
    lead_time_in_minutes double precision,
    date date
)

-- Using valid SQLite, answer the following questions for the tables provided above.

-- list all issues closed in last 30 days
SELECT * FROM issue_worklog_gpt WHERE status = "Closed" AND DATE(

I would really appreciate some help getting around this.

Thanks

NumbersStation org

Hi @smooth-operator94,

I am not very familiar with the SageMaker setup. Can you provide some more details about your settings and the config you sent to SageMaker?

FYI: Here is the response I got from a local deployment:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-2B")
# load the model in fp16 on GPU 0
model = AutoModelForCausalLM.from_pretrained("NumbersStation/nsql-2B", torch_dtype=torch.float16).to(0)

text = """CREATE TABLE git_worklog_gpt (
    org_id text,
    type text,
    repository text,
    value integer,
    date date,
    author_email text
)

CREATE TABLE burnout_worklog_gpt (
    org_id text,
    person text,
    date date
)

CREATE TABLE work_cycletime_gpt (
    org_id text,
    type text,
    project text,
    time_in_minutes integer,
    date date
)

CREATE TABLE issue_worklog_gpt (
    org_id text,
    type text,
    project text,
    value integer,
    date date,
    status text
)

CREATE TABLE investment_report_gpt (
    org_id text,
    investment_name text,
    value integer,
    date date,
    story_point integer,
    estimate integer
)

CREATE TABLE issue_lead_time_gpt (
    org_id text,
    project text,
    type text,
    status text,
    priority text,
    lead_time_in_minutes double precision,
    date date
)

-- Using valid SQLite, answer the following questions for the tables provided above.

-- list all issues closed in last 30 days

SELECT"""

input_ids = tokenizer(text, return_tensors="pt").input_ids.to(0)

# note: max_length counts the prompt tokens plus the generated tokens
generated_ids = model.generate(input_ids, max_length=500)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

And the output is:

SELECT * FROM issue_worklog_gpt WHERE status = "Closed" AND date >= DATEADD(DAY, -30, GETDATE());

Hi senwu,

I'm using the following script for SageMaker deployment:

import json
import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole-20230723T133694')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'NumbersStation/nsql-2B',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.9.3"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300
)

After going through the AWS CloudWatch logs, I found these configs being used during model deployment:

Args { model_id: "NumbersStation/nsql-2B", revision: None, validation_workers: 2, sharded: None, num_shard: Some(1), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, hostname: "container-0.local", port: 8080, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/tmp"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: false }
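From that dump, max_input_length: 1024 and max_total_tokens: 2048 also put a hard cap on prompt plus completion length. If those limits turn out to be part of the problem, one thing I may try next (an untested sketch on my side; the launcher arguments above can be supplied as environment variables through the hub config) is raising the input budget at deploy time:

# untested sketch: override TGI launcher defaults through the container env
# (names mirror the launcher args in the log above; values are illustrative)
hub = {
    'HF_MODEL_ID': 'NumbersStation/nsql-2B',
    'SM_NUM_GPUS': json.dumps(1),
    'MAX_INPUT_LENGTH': json.dumps(1536),  # default above was 1024
    'MAX_TOTAL_TOKENS': json.dumps(2048)   # must stay within the model's context window
}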
