evaluation benchmark against gpt4o, gpt4o-mini, o1-preview, o1-mini

#18
by gweiz - opened

Has anyone tried benchmarking against the latest foundation language models?
Is it still on par with the best? Is there a guide on how to run this benchmark?

Defog.ai org

Hi @gweiz, you can follow the instructions in our eval harness repo to evaluate the latest models (e.g. OpenAI):
https://github.com/defog-ai/sql-eval/?tab=readme-ov-file#openai
