evaluation benchmark against gpt-4o, gpt-4o-mini, o1-preview, o1-mini

#18
by gweiz - opened

Has anyone benchmarked this model against the latest foundation language models?
Is it still on par with the best? Is there a guide on how to run this benchmark?

Hi @gweiz, you can follow the instructions in our eval harness repo to evaluate the latest models (e.g. OpenAI's):
https://github.com/defog-ai/sql-eval/?tab=readme-ov-file#openai
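
The harness at the link has its own runner and options, which are not reproduced here. As a rough illustration of what comparing several models on the same questions can look like, here is a minimal sketch that assumes an execution-accuracy style metric: run the generated SQL and the reference SQL against the same database and count matching result sets. Everything in it is hypothetical and for illustration only: the toy `orders` table, the two questions, and the `model_a` / `model_b` functions, which stand in for real API calls to the models being compared.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows sorted, or None if the SQL fails."""
    try:
        # Sort so that row order alone does not count as a mismatch.
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def execution_accuracy(conn, cases, generate_sql):
    """Fraction of cases where the generated SQL returns the same rows as the reference SQL."""
    correct = 0
    for question, gold_sql in cases:
        expected = run_query(conn, gold_sql)
        actual = run_query(conn, generate_sql(question))
        if expected is not None and actual == expected:
            correct += 1
    return correct / len(cases)


# Toy in-memory database (hypothetical schema, not from the harness).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO orders (amount) VALUES (10.0), (25.0), (40.0);
    """
)

# (question, reference SQL) pairs, also hypothetical.
cases = [
    ("How many orders are there?", "SELECT COUNT(*) FROM orders"),
    ("What is the largest order amount?", "SELECT MAX(amount) FROM orders"),
]


def model_a(question):
    """Stand-in for a real model call; returns canned SQL."""
    return {
        "How many orders are there?": "SELECT COUNT(id) FROM orders",
        "What is the largest order amount?": "SELECT MAX(amount) FROM orders",
    }[question]


def model_b(question):
    """Stand-in for a second model; gets the second question wrong."""
    return {
        "How many orders are there?": "SELECT COUNT(*) FROM orders",
        "What is the largest order amount?": "SELECT MIN(amount) FROM orders",
    }[question]


scores = {
    name: execution_accuracy(conn, cases, fn)
    for name, fn in [("model_a", model_a), ("model_b", model_b)]
}
print(scores)
```

Comparing result sets rather than SQL strings means that differently written but equivalent queries (such as `COUNT(id)` and `COUNT(*)` here) still count as correct. For real numbers, use the harness itself, since its metric and data may differ from this sketch.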
