How Robust Is Your Model in Complex Code Generation Tasks? 🤔
We've launched the PECC benchmark to challenge chat models in code generation, drawing on Advent of Code for programming tasks and Project Euler for math-heavy challenges. Each problem is presented in two styles, detailed prose and a concise "LeetCode"-style description, evaluating a model's ability to understand and solve complex coding and math problems in chat-based interactions.
It seems that the Claude 3 models outperform ChatGPT:

| Model | Avg. (pass@3) |
|---|---|
| Claude 3 Haiku | 27.67 |
| GPT-3.5-Turbo | 23.75 |
| Mixtral-8x22B-Instruct-v0.1 | 8.35 |
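For readers unfamiliar with the pass@3 column: a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021) that code benchmarks commonly use, assuming n sampled generations per problem with c of them correct; PECC's exact scoring pipeline may differ in details.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples, drawn from n generations with c correct, passes."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Hypothetical example: 10 generations per problem, 2 of them correct.
print(f"pass@3 = {pass_at_k(n=10, c=2, k=3) * 100:.2f}")  # pass@3 = 53.33
```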