Use ReAct prompting with this model
I am trying to use ReAct prompting and I am just able to generate SQL query. Not able to add prefixes even using few shot prompting. Is this an expected behavior of this model?
Hi @akshat-kumar-akight could you describe what react prompting? I'm not familiar with that term, and if you could provide a reproducible code example that would help.
##################ReAct - Generating reasoning traces allow the model to induce, track, and update action plans, and even handle exceptions. The action step allows to interface with and gather information from external sources such as knowledge bases or environments. The ReAct framework can allow LLMs to interact with external tools to retrieve additional information that leads to more reliable and factual responses.
https://www.promptingguide.ai/techniques/react
##################QUESTION : List the total sales per country. Which country's customers spent the most?
##################MODEL OUTPUT (How it arrives to final query)
Invoking: `sql_db_list_tables` with `{}`
Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
Invoking: `sql_db_schema` with `Invoice,Customer`
CREATE TABLE "Customer" (
"CustomerId" INTEGER NOT NULL,
"FirstName" NVARCHAR(40) NOT NULL,
"LastName" NVARCHAR(20) NOT NULL,
"Company" NVARCHAR(80),
"Address" NVARCHAR(70),
"City" NVARCHAR(40),
"State" NVARCHAR(40),
"Country" NVARCHAR(40),
"PostalCode" NVARCHAR(10),
"Phone" NVARCHAR(24),
"Fax" NVARCHAR(24),
"Email" NVARCHAR(60) NOT NULL,
"SupportRepId" INTEGER,
PRIMARY KEY ("CustomerId"),
FOREIGN KEY("SupportRepId") REFERENCES "Employee" ("EmployeeId")
)
/*
3 rows from Customer table:
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
1 Luís Gonçalves Embraer - Empresa Brasileira de Aeronáutica S.A. Av. Brigadeiro Faria Lima, 2170 São José dos Campos SP Brazil 12227-000 +55 (12) 3923-5555 +55 (12) 3923-5566 luisg@embraer.com.br 3
2 Leonie Köhler None Theodor-Heuss-Straße 34 Stuttgart None Germany 70174 +49 0711 2842222 None leonekohler@surfeu.de 5
3 François Tremblay None 1498 rue Bélanger Montréal QC Canada H2G 1A7 +1 (514) 721-4711 None ftremblay@gmail.com 3
*/
CREATE TABLE "Invoice" (
"InvoiceId" INTEGER NOT NULL,
"CustomerId" INTEGER NOT NULL,
"InvoiceDate" DATETIME NOT NULL,
"BillingAddress" NVARCHAR(70),
"BillingCity" NVARCHAR(40),
"BillingState" NVARCHAR(40),
"BillingCountry" NVARCHAR(40),
"BillingPostalCode" NVARCHAR(10),
"Total" NUMERIC(10, 2) NOT NULL,
PRIMARY KEY ("InvoiceId"),
FOREIGN KEY("CustomerId") REFERENCES "Customer" ("CustomerId")
)
/*
3 rows from Invoice table:
InvoiceId CustomerId InvoiceDate BillingAddress BillingCity BillingState BillingCountry BillingPostalCode Total
1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34 Stuttgart None Germany 70174 1.98
2 4 2009-01-02 00:00:00 Ullevålsveien 14 Oslo None Norway 0171 3.96
3 8 2009-01-03 00:00:00 Grétrystraat 63 Brussels None Belgium 1000 5.94
*/
Invoking: `sql_db_query` with `SELECT c.Country, SUM(i.Total) AS TotalSales FROM Invoice i JOIN Customer c ON i.CustomerId = c.CustomerId GROUP BY c.Country ORDER BY TotalSales DESC LIMIT 10;`
responded: To list the total sales per country, I can query the "Invoice" and "Customer" tables. I will join these tables on the "CustomerId" column and group the results by the "BillingCountry" column. Then, I will calculate the sum of the "Total" column to get the total sales per country. Finally, I will order the results in descending order of the total sales.
#################GENERATED QUERY
Here is the SQL query:
```sql
SELECT c.Country, SUM(i.Total) AS TotalSales
FROM Invoice i
JOIN Customer c ON i.CustomerId = c.CustomerId
GROUP BY c.Country
ORDER BY TotalSales DESC
LIMIT 10;
#################JUST SHOWING THE RESULTS OF QUERY RUN ON DB ALSO (PART OF PIPELINE I BUILT)
[('USA', 523.0600000000003), ('Canada', 303.9599999999999), ('France', 195.09999999999994), ('Brazil', 190.09999999999997), ('Germany', 156.48), ('United Kingdom', 112.85999999999999), ('Czech Republic', 90.24000000000001), ('Portugal', 77.23999999999998), ('India', 75.25999999999999), ('Chile', 46.62)]
##################FOLLOWING IS WHAT THE MODEL GENERATES AT BACKEND (THEORETICALLY) AND IS DESCRIBED AS PART OF PROMPT WITH FEW SHOT EXAMPLES
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
In summary (My previous comment is output of ReAct agent based on GPT 3.5 within Langchain, the lines I have added start with #############), model starts off with getting information about DB schema by firing some basic queries to DB (this is different from prompt approach defined for creators of sqlcoder, but uses agent prompting). As a part of chain of thought, model generates multiple SQL queries. If model has not attained full information needed to generate the query, it generates output with "Thought", "Action" and "Action Input". If it has all the information or it has reached limit of maximum iterations allowed by the user (using agent kwargs), it generates the output with "Final Answer" as prefix
NOTE: You can refer to the link for more info: https://python.langchain.com/docs/integrations/toolkits/sql_database/
On a separate note, to the information mentioned above, I am unable to generate from the model an output in a format: FINAL ANSWER IS SQL
It just generates the SQL
Hi @akshat-kumar-akight , unfortunately our model is not trained for such chain-of-thought (CoT) prompting; it is only meant to be used as a direct interpreter from your question + instructions + schema to the final SQL query. This is partly due to the small size of the model (4% of parameters of 175B of GPT3), as well as our optimizing the model for a shorter and faster output to minimize the latency. You would probably get better results for chain-of-thought prompting using larger and more generic foundational models.
That said, getting sqlcoder-7b-2
to work with questions like yours above is much simpler and typically doesn't require CoT for good results (see our benchmark: https://github.com/defog-ai/sql-eval).
Concretely, you can use the following prompt and get the correct answer:
prompt = "### Task\nGenerate a SQL query to answer the following question:\n`List the total sales per country. Which country's customers spent the most?`\n\n### Schema\nCREATE TABLE Customer (\nCustomerId INTEGER NOT NULL,\nFirstName NVARCHAR(40) NOT NULL,\nLastName NVARCHAR(20) NOT NULL,\nCompany NVARCHAR(80),\nAddress NVARCHAR(70),\nCity NVARCHAR(40),\nState NVARCHAR(40),\nCountry NVARCHAR(40),\nPostalCode NVARCHAR(10),\nPhone NVARCHAR(24),\nFax NVARCHAR(24),\nEmail NVARCHAR(60) NOT NULL,\nSupportRepId INTEGER,\nPRIMARY KEY (CustomerId),\nFOREIGN KEY(SupportRepId) REFERENCES Employee (EmployeeId)\n)\nCREATE TABLE Invoice (\nInvoiceId INTEGER NOT NULL,\nCustomerId INTEGER NOT NULL,\nInvoiceDate DATETIME NOT NULL,\nBillingAddress NVARCHAR(70),\nBillingCity NVARCHAR(40),\nBillingState NVARCHAR(40),\nBillingCountry NVARCHAR(40),\nBillingPostalCode NVARCHAR(10),\nTotal NUMERIC(10, 2) NOT NULL,\nPRIMARY KEY (InvoiceId),\nFOREIGN KEY(CustomerId) REFERENCES Customer (CustomerId)\n)\n\n### Answer\nGiven the database schema above, here is the SQL query that answers the question:\n```sql\n"
result I got:
SELECT c.Country, SUM(i.Total) AS TotalSales FROM Invoice i JOIN Customer c ON i.CustomerId = c.CustomerId GROUP BY c.Country ORDER BY TotalSales DESC NULLS LAST