Evaluation benchmark against gpt4o, gpt4o-mini, o1-preview, o1-mini

#18

by gweiz - opened Nov 24, 2024


Has anyone tried benchmarking this model against the latest foundation language models?
Is it still on par with the best? Is there a guide on how to run this benchmark?

jp-defog

Nov 25, 2024

Hi @gweiz, you can follow the instructions in our eval harness repo to evaluate the latest models (e.g. OpenAI's):
https://github.com/defog-ai/sql-eval/?tab=readme-ov-file#openai
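For anyone who wants a feel for what such a comparison measures before setting up the harness, here is a minimal sketch of execution-based scoring: run the reference query and the model-generated query against the same database and count how often the results match. This is not the sql-eval implementation. The toy `orders` table, the helper names `run_query` and `execution_accuracy`, and the hard-coded "predicted" queries (standing in for model output) are all assumptions made for illustration.

```python
import sqlite3


def run_query(conn, sql):
    """Return the query's rows as an order-insensitive list, or None if the SQL fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort so that two queries returning the same rows in a different order still match.
    return sorted(rows, key=repr)


def execution_accuracy(conn, pairs):
    """Fraction of (gold_sql, predicted_sql) pairs whose result sets match."""
    if not pairs:
        return 0.0
    correct = 0
    for gold_sql, predicted_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        if gold is not None and gold == predicted:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'alice', 10.0), (2, 'bob', 25.5), (3, 'alice', 4.5);
    """
)

# Each pair is (gold SQL, predicted SQL); the predictions are hard-coded stand-ins
# for what a model such as gpt-4o might return.
pairs = [
    # Same rows, different column alias and ordering: counted as correct.
    ("SELECT customer, SUM(amount) FROM orders GROUP BY customer",
     "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer ORDER BY total"),
    # Extra filter changes the count: counted as wrong.
    ("SELECT COUNT(*) FROM orders",
     "SELECT COUNT(id) FROM orders WHERE amount > 5"),
    # References a table that does not exist: counted as wrong.
    ("SELECT MAX(amount) FROM orders",
     "SELECT MAX(amount) FROM order_items"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2f}")
```

To compare several models, you would generate one list of predicted queries per model over the same questions and compare the resulting scores. The harness linked above is the place to look for the actual question set and scoring rules.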
