Waseem AlShikh

wassemgtk

AI & ML interests

Multi-modal, Palmyra LLMs, Knowledge Graph

Posts 2

The Writer team had the opportunity to run an eval of Mixtral-8x22b, and the results were interesting.

| Benchmark | Score |
| --- | --- |
| #mmlu | 77.26 |
| #hellaswag | 88.81 |
| #truthfulqa | 52.05 |
| #arc_challenge | 70.31 |
| #winogrande | 84.93 |
| #gsm8k | 76.65 |
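
For a quick read of the six scores above, here is a minimal sketch that computes their unweighted mean and picks out the lowest one. The aggregate is an assumption for illustration only: the post reports the individual benchmark scores, not an average, and the variable names are hypothetical.

```python
from statistics import mean

# Scores copied from the post (Mixtral-8x22b eval run by the Writer team).
scores = {
    "mmlu": 77.26,
    "hellaswag": 88.81,
    "truthfulqa": 52.05,
    "arc_challenge": 70.31,
    "winogrande": 84.93,
    "gsm8k": 76.65,
}

# Assumption: a plain unweighted mean; the post itself gives no aggregate.
average = mean(scores.values())

# Benchmark with the lowest reported score.
weakest = min(scores, key=scores.get)

print(f"average: {average:.2f}")
print(f"lowest:  {weakest} ({scores[weakest]:.2f})")
```

Under that assumption the mean comes out to about 75.00, with truthfulqa as the lowest of the six.
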

We are thrilled to announce the release of the OmniACT dataset! This dataset and benchmark focuses on pushing the limits of how virtual agents can automate our computer tasks. Imagine less clicking and typing, and more observing as your computer handles tasks such as organizing schedules or making travel arrangements on its own.

Check it out ➡️ [OmniACT Dataset on Hugging Face]( Writer/omniact)

For a deep dive, here’s the paper: [OmniACT Paper](https://arxiv.org/abs/2402.17553)

datasets

None public yet