Amazon SageMaker endpoint implementation doesn't work correctly.
Hi Team, I am trying to use your model within a SageMaker endpoint (one 'ml.g5.2xlarge' instance) as per the script at https://huggingface.co/defog/sqlcoder-7b-2?sagemaker_deploy=true. On using the prompt:
Generate a SQL query to answer [QUESTION]Do we get more sales from customers in New York compared to customers in San Francisco? Give me the total sales for each city, and the difference between the two.[/QUESTION]
Instructions
If you cannot answer the question with the available database schema, return 'I do not know'
Database Schema
The query will run on a database with the following schema:
CREATE TABLE products (
  product_id INTEGER PRIMARY KEY, -- Unique ID for each product
  name VARCHAR(50), -- Name of the product
  price DECIMAL(10,2), -- Price of each unit of the product
  quantity INTEGER -- Current quantity in stock
);
CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY, -- Unique ID for each customer
  name VARCHAR(50), -- Name of the customer
  address VARCHAR(100) -- Mailing address of the customer
);
CREATE TABLE salespeople (
  salesperson_id INTEGER PRIMARY KEY, -- Unique ID for each salesperson
  name VARCHAR(50), -- Name of the salesperson
  region VARCHAR(50) -- Geographic sales region
);
CREATE TABLE sales (
  sale_id INTEGER PRIMARY KEY, -- Unique ID for each sale
  product_id INTEGER, -- ID of product sold
  customer_id INTEGER, -- ID of customer who made purchase
  salesperson_id INTEGER, -- ID of salesperson who made the sale
  sale_date DATE, -- Date the sale occurred
  quantity INTEGER -- Quantity of product sold
);
CREATE TABLE product_suppliers (
  supplier_id INTEGER PRIMARY KEY, -- Unique ID for each supplier
  product_id INTEGER, -- Product ID supplied
  supply_price DECIMAL(10,2) -- Unit price charged by supplier
);
-- sales.product_id can be joined with products.product_id
-- sales.customer_id can be joined with customers.customer_id
-- sales.salesperson_id can be joined with salespeople.salesperson_id
-- product_suppliers.product_id can be joined with products.product_id
Answer
Given the database schema, here is the SQL query that answers [QUESTION]Do we get more sales from customers in New York compared to customers in San Francisco? Give me the total sales for each city, and the difference between the two.[/QUESTION]
[SQL]
The query that the endpoint generates is: "SELECT c.city, SUM(s.quantity) AS total_sales, SUM(CASE". The output stops there, mid-query.
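For reference, the endpoint was created with the deployment snippet from that page; a rough sketch of what that snippet looks like (the execution role and container versions shown here are illustrative):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes a SageMaker execution role is configured

# Let the Hugging Face inference container pull the model straight from the Hub
hub = {
    "HF_MODEL_ID": "defog/sqlcoder-7b-2",
    "HF_TASK": "text-generation",
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.37",  # illustrative; use the versions from the generated snippet
    pytorch_version="2.1",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```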
Any thoughts on what might be going wrong here?
Hi there, please use the prompt in the model card for best results.
We do not use SageMaker for deployment, so I do not know what issues you might be hitting with it here. But it seems like you would have to increase the maximum output tokens to get the complete answer; your answer is being prematurely cut off here. I would also recommend using beam search with num_beams around 3 or 4, if that's supported by SageMaker.
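If the deployed container honors the standard text-generation parameters field in the request payload, something along these lines might be enough (values are illustrative):

```python
# Assuming `predictor` is the sagemaker.huggingface predictor for the endpoint,
# and `prompt` is the full sqlcoder prompt shown above
response = predictor.predict({
    "inputs": prompt,
    "parameters": {
        "max_new_tokens": 400,  # raise this so the SQL is not cut off mid-query
        "num_beams": 4,         # beam search, as suggested
        "do_sample": False,
    },
})
print(response[0]["generated_text"])
```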
Thanks for the reply @rishdotblog. I am using the same prompt and following the steps in your inference.py file (https://github.com/defog-ai/sqlcoder/blob/main/inference.py). To be able to change some model parameter values (num_beams, max_output_tokens), I have changed my strategy a bit: if I use the Hugging Face model hub directly in my SageMaker endpoint (as in your script https://huggingface.co/defog/sqlcoder-7b-2?sagemaker_deploy=true), I don't have any control over which model configurations I can pass. However, I can overwrite the SageMaker handlers for model loading and prediction with my own handlers (inspired by your inference.py), which forces SageMaker to use the configurations I want, similar to this implementation: https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb.
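Roughly, the custom handler looks like this (a minimal sketch modelled on the inference.py above and the linked notebook; the generation arguments are illustrative choices, not values from the model card):

```python
# code/inference.py -- custom handlers picked up by the Hugging Face inference container
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def model_fn(model_dir):
    """Load the model and tokenizer once when the endpoint starts."""
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,  # half precision; a 7B model needs roughly 14 GB of weights
        device_map="auto",
    )
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    """Generate SQL for the prompt passed in the request body."""
    model, tokenizer = model_and_tokenizer
    prompt = data.pop("inputs")
    params = data.pop("parameters", {})

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=params.get("max_new_tokens", 400),
        num_beams=params.get("num_beams", 4),
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Return only the newly generated tokens, not the echoed prompt
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return {"generated_text": generated}
```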
However, I am facing a different error now: I am running out of GPU memory in my SageMaker endpoint. I am using an ml.g5.2xlarge with 22 GB of GPU memory. Do you think your model sqlcoder-7b-2, with 7B parameters, needs more than 22 GB of GPU memory? Any ideas on how I can reduce my GPU memory footprint for inference? The exact error is this:
"com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacty of 22.20 GiB of which 37.12 MiB is free. Process 10864 has 22.16 GiB memory in use. Of the allocated memory 21.33 GiB is allocated by PyTorch, and 110.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF : 400"
Any help would be greatly appreciated. Thank you :)
Fig: GPU memory utilization during inference. Note: GPU memory utilization is at 100% now.