
Knut Jägersberg

KnutJaegersberg


AI & ML interests

NLP, opinion mining, narrative intelligence

Articles

Towards actively reasoning LLM systems

Organizations

Posts 4

Post


What We Learned from a Year of Building with LLMs

It offers a nice perspective.

“When a measure becomes a target, it ceases to be a good measure.”

— Goodhart’s Law

https://www.oreilly.com/radar/what-we-learned-from-a-year-of-building-with-llms-part-i/

Post

Shocking: 2/3 of LLMs fail at 2K context length

code_your_own_ai makes a great vlog, mostly about LLM-related AI content.
As I watched the video below, I wondered about current best practices for LLM evaluation. We have benchmarks, we have SOTA LLMs evaluating other LLMs, and we have tools that evaluate models based on human comparisons.
I often hear that you should just play with an LLM for 15 minutes to form an opinion.
For a specific use case with clear expectations, I think this can yield experiences that carry real signal, but I also see models being judged on a single prompt.
Benchmarks have their weaknesses and are not enough on their own to judge model quality. Still, I think systematic methods that try to reduce well-known sources of error should be the way forward, even for qualitative assessments.
What do you think? How could a public tool for judging models, such as lmsys/chatbot-arena-leaderboard, make use of the methodological standards established in social science?

https://www.youtube.com/watch?v=mWrivekFZMM

Collections 2

Models 118

KnutJaegersberg/Llama-3-70B-Synthia-v3.5-bnb-4bit

Text Generation • Updated about 1 hour ago

KnutJaegersberg/Tess-2.0-Llama-3-70B-bnb-4bit

Text Generation • Updated about 4 hours ago

KnutJaegersberg/KafkaLM-8x7B-German

Text Generation • Updated 1 day ago • 2

KnutJaegersberg/CPU-LLM-Horde

Updated 2 days ago • 2.81k • 16

KnutJaegersberg/granite-34b-code-instruct-awq

Text Generation • Updated 8 days ago • 14 • 3

KnutJaegersberg/Deita-34b

Text Generation • Updated 11 days ago • 493 • 3

KnutJaegersberg/Deita-34b-exl-8.0bpw

Text Generation • Updated 15 days ago

KnutJaegersberg/Deita-500m

Text Generation • Updated 21 days ago • 1.98k

KnutJaegersberg/gpt2-chatbot

Text Generation • Updated 28 days ago • 4.66k • 11

KnutJaegersberg/Deita-Mixtral-8x7b

Text Generation • Updated May 1 • 764

Datasets 24

KnutJaegersberg/Deita-6k

Viewer • Updated 11 days ago • 1

KnutJaegersberg/c4-website-classifier-dataset

Viewer • Updated Mar 21 • 361 • 2

KnutJaegersberg/Auton

Preview • Updated Dec 12, 2023 • 4 • 5

KnutJaegersberg/trilobite

Viewer • Updated Dec 3, 2023

KnutJaegersberg/facehugger

Viewer • Updated Dec 3, 2023

KnutJaegersberg/webglm_dataset

Viewer • Updated Nov 16, 2023

KnutJaegersberg/longinstruct

Viewer • Updated Nov 14, 2023 • 3 • 3

KnutJaegersberg/dolphin_orca_clustered

Updated Sep 14, 2023 • 1

KnutJaegersberg/orca-wizardlm-v1-clustered

Viewer • Updated Sep 14, 2023

KnutJaegersberg/WizardLM_evol_instruct_V2_196k_instruct_format

Preview • Updated Sep 4, 2023 • 3