Inception

company

https://www.inceptionai.ai

AI & ML interests

Generative AI, Arabic NLP

Recent Activity

alielfilali01 updated a Space 32 minutes ago

inceptionai/AraGen-Leaderboard-Legacy

alielfilali01 updated a dataset 3 days ago

inceptionai/requests-dataset

alielfilali01 published a Space 3 days ago

inceptionai/AraGen-Leaderboard-Legacy

Articles

Arabic Leaderboards: Introducing Arabic Instruction Following, Updating AraGen, and More

10 days ago

• 15

Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard

Dec 4, 2024

• 35

inceptionai's activity

alielfilali01

updated a Space 32 minutes ago

AraGen Leaderboard

📊

Generative Tasks Evaluation of Arabic LLMs

alielfilali01

updated a dataset 3 days ago

inceptionai/requests-dataset

Preview • Updated 3 days ago • 212 • 1

alielfilali01

published a Space 3 days ago

AraGen Leaderboard

📊

Generative Tasks Evaluation of Arabic LLMs

alielfilali01

updated a dataset 13 days ago

inceptionai/Arabic_IFEval

Viewer • Updated 13 days ago • 404 • 56 • 3

alielfilali01

updated a Space 15 days ago

Arabic Leaderboards

📊

Generative Tasks Evaluation of Arabic LLMs

SamujjwalIIAI

in inceptionai/Llama-3.1-Sherkala-8B-Chat 22 days ago

Link to technical report is broken

#4 opened about 2 months ago by

antonpolishko

alielfilali01

updated a dataset 23 days ago

inceptionai/AraGen

Viewer • Updated 23 days ago • 23.2k • 240

alielfilali01

published a Space 23 days ago

3C3H HeatMap

👀

Generate heatmaps for model performance metrics

SarahAlBarri

published 2 datasets 23 days ago

inceptionai/Arabic_IFEval

Viewer • Updated 13 days ago • 404 • 56 • 3

inceptionai/AraGen

Viewer • Updated 23 days ago • 23.2k • 240

samta-kamboj

published a Space 23 days ago

Arabic Leaderboards

📊

Generative Tasks Evaluation of Arabic LLMs

fajrikoto

authored a paper about 1 month ago

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10 • 97

clefourrier

posted an update about 1 month ago

Post

2212

Gemma3 family is out! Reading the tech report, and this section was really interesting to me from a methods/scientific fairness pov.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say)

For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards, on the other hand, the comparison will be apples to apples, but potentially in a suboptimal way for a given model family (just as some users interact suboptimally with models).

It also contains a cool section (6) on training data memorization rate! It is important to see whether your model will output the training data it has seen as such: always an issue for privacy/copyright/... but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.

alielfilali01

posted an update about 2 months ago

Post

954

🚨 Arabic LLM Evaluation 🚨

A few models joined the ranking of https://huggingface.co/spaces/inceptionai/AraGen-Leaderboard today.

The new MistralAI model, Saba, is quite impressive: top 10! Well done @arthurmensch and team.

Sadly, Mistral did not follow its strategy of releasing public weights this time. We hope this changes soon and that we get the model with a permissive license.

We added other Mistral models and, apparently, we have been sleeping on mistralai/Mistral-Large-Instruct-2411!

Another impressive model that joined the ranking today is ALLaM-AI/ALLaM-7B-Instruct-preview. After a long wait, ALLaM is finally here and it is IMPRESSIVE given its size!

ALLaM is ranked on OALL/Open-Arabic-LLM-Leaderboard as well.

clefourrier

authored a paper 2 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 225

alielfilali01

posted an update 3 months ago

Post

2083

The 3C3H AraGen Leaderboard today welcomes deepseek-ai/DeepSeek-V3 and 12 other models (including the late gpt-3.5 💀) to the ranking of the best LLMs in Arabic!

Observations:
- DeepSeek-v3 ranked 3rd and is the only open model among the top 5!

- A 14B open model (Qwen/Qwen2.5-14B-Instruct) outperforms gpt-3.5-turbo-0125 (from last year). This shows how far we have come in advancing and supporting the Arabic presence within the LLM ecosystem!

- Contrary to what is observed in likelihood-acc leaderboards (like OALL/Open-Arabic-LLM-Leaderboard), further finetuned models like maldv/Qwentile2.5-32B-Instruct actually decreased performance compared to the original model, Qwen/Qwen2.5-32B-Instruct.
It is worth noting that the decrease is statistically insignificant, which implies that, at best, out-of-domain finetuning does not really hurt the original capabilities the model acquired during pretraining.
Previous work has addressed this (finetuning vs. pretraining), but more investigation is required (any PhDs here? This could be your question ...)

Check out the latest rankings: https://huggingface.co/spaces/inceptionai/AraGen-Leaderboard

alielfilali01

posted an update 4 months ago

Post

2010

~75% on the challenging GPQA with only 40M parameters 🔥🥳

GREAT ACHIEVEMENT! Or is it?

This new work, "Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation", takes the mystery out of many models whose results I personally suspected, especially on leaderboards other than the English one, like the Open Arabic LLM Leaderboard (OALL/Open-Arabic-LLM-Leaderboard).

The authors of this work first trained a model on the GPQA data, which, unsurprisingly, led to the model achieving 100% performance.

Afterward, they trained what they referred to as a 'legitimate' model on legitimate data (MedMCQA). However, they introduced a distillation loss from the earlier, 'cheated' model.

What they discovered was fascinating: the knowledge of GPQA leaked through this distillation loss, even though the legitimate model was never explicitly trained on GPQA during this stage.

This raises important questions about the careful use of distillation in model training, especially when the training data is opaque. As they demonstrated, it’s apparently possible to (intentionally or unintentionally) leak test data through this method.
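To make the mechanism concrete, here is a minimal sketch of a distillation objective of the kind described above. It is not taken from the paper: it assumes the common formulation in which a hard-label cross-entropy term is mixed with a temperature-scaled KL term toward the teacher, and the names `distillation_loss`, `alpha`, and `temperature` are illustrative choices, not the authors' code.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Hard-label loss on the legitimate training example."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Assumed objective: alpha * CE + (1 - alpha) * T^2 * KL(teacher || student)."""
    hard = cross_entropy(student_logits, label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# The student only ever sees the legitimate label, yet the soft term pulls
# its output distribution toward whatever the teacher believes.
student = [0.0, 0.0, 0.0]
teacher = [5.0, 0.0, 0.0]  # hypothetical logits from a contaminated teacher
loss = distillation_loss(student, teacher, label=1)
```

In this formulation the second term depends only on the teacher's outputs, so any benchmark knowledge baked into the teacher can shape the student's gradients even when the student's own training data is clean.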

Find out more: Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation (2412.15255)

1 reply

alielfilali01

posted an update 4 months ago

Post

3504

Unpopular opinion: Open Source takes courage to do!

Not everyone is brave enough to release what they have done (the way they've done it) into the wild to be judged!
It really requires a high level of "knowing wth you are doing"! It's kind of a superpower!

Cheers to the heroes here who see this!

5 replies

alielfilali01

posted an update 4 months ago

Post

1566

Apparently I forgot to put this here!

Well, this is a bit late, but consider giving our recent blog a read if you are interested in evaluation.

You don't have to be into Arabic NLP to read it. The main contribution we are introducing is a new evaluation measure for NLG. We made the first application of this measure on Arabic for now, and we will be working with colleagues from the community to expand it to other languages.

Blog:
Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard
https://huggingface.co/blog/leaderboard-3c3h-aragen

Space:
https://huggingface.co/spaces/inceptionai/AraGen-Leaderboard

Give it a read and let me know your thoughts 🤗

clefourrier

authored a paper 4 months ago

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published Dec 4, 2024 • 19

Team members 31
