BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks Jun 18 • 41
Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages May 24 • 25
CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models May 24 • 21
Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs Apr 16 • 14
Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes? Mar 5 • 4
NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates Feb 2 • 3
The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models Jan 29 • 15
A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard Jan 12 • 6