clefourrier 
posted an update Jan 24
🏅 New top model on the GAIA benchmark!

Called FRIDAY, it's a mysterious new autonomous agent that performed quite well on both the public validation set *and* the private test set.
It notably scored above 10 points on the validation set and above 5 points on the test set on our hardest questions (level 3), which require taking arbitrarily long sequences of actions, using any number of tools, and accessing the world in general! ✨

The GAIA benchmark evaluates next-generation LLMs (LLMs with augmented capabilities thanks to added tooling, efficient prompting, access to search, etc.) and was co-authored by @gregmialz @ThomasNLG @ylecun @thomwolf and myself: gaia-benchmark/leaderboard

hello

mysterious!

We're missing a 🪄 emoji :D

@clefourrier The leaderboard has been reporting an error for several days. Are you aware of this?

Ha shoot, missed it! I restarted the space