evaluation benchmark against gpt-4o, gpt-4o-mini, o1-preview, o1-mini
#18 by gweiz - opened
Has anyone tried benchmarking against the latest foundation language models?
Is it still on par with the best? Is there a guide on how to go about this benchmarking?
Hi @gweiz, you can follow the instructions in our eval harness repo to evaluate against the latest models (e.g. OpenAI):
https://github.com/defog-ai/sql-eval/?tab=readme-ov-file#openai
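If you just want a quick sanity check before setting up the full harness, here is a minimal sketch that prompts an OpenAI model to generate SQL for a couple of toy questions and prints the output for eyeballing. To be clear, this is not the sql-eval harness: the toy schema, questions, and prompt wording are placeholders I made up, and the real harness goes further by executing the generated queries against a test database and comparing results against gold queries. Assumes the `openai` Python package (v1+) is installed and `OPENAI_API_KEY` is set.

```python
# Minimal sketch (not the sql-eval harness): ask an OpenAI model to write SQL
# for a few illustrative questions and print the output for manual inspection.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy schema and questions -- purely illustrative, not from the defog eval set.
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount NUMERIC,
    created_at DATE
);
"""

QUESTIONS = [
    "What is the total order amount per customer?",
    "How many orders were placed in 2024?",
]


def generate_sql(model: str, question: str) -> str:
    """Ask the model for a single SQL query answering `question` over SCHEMA."""
    prompt = (
        "Given the following PostgreSQL schema:\n"
        f"{SCHEMA}\n"
        f"Write one SQL query that answers: {question}\n"
        "Return only the SQL, with no explanation."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # note: o1-* models reject custom temperature; drop this arg for them
    )
    return resp.choices[0].message.content.strip()


if __name__ == "__main__":
    for model in ["gpt-4o", "gpt-4o-mini"]:  # swap in whichever models you want to compare
        print(f"=== {model} ===")
        for q in QUESTIONS:
            print(f"-- {q}")
            print(generate_sql(model, q))
            print()
```

For an actual apples-to-apples comparison against the models in the title, the README linked above covers setting up the test databases and running the harness end to end; as I understand it, the reported accuracy is based on whether the generated query returns the same results as the gold query, not on string matching.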