Knut Jägersberg
KnutJaegersberg
AI & ML interests
NLP, opinion mining, narrative intelligence
Posts
3
Post
Shocking: 2/3 of LLMs fail at 2K context length
code_your_own_ai makes a great vlog, mostly about LLM-related AI content.
As I watched the video below, I wondered about current best practices in LLM evaluation. We have benchmarks, we have SOTA LLMs evaluating other LLMs, and we have tools that rank models based on human comparisons.
I often hear the advice to just play with an LLM for 15 minutes to form an opinion.
For a specific use case with clear expectations, I think this can yield experiences that carry real signal, but I also see models being judged on a single prompt.
Benchmarks have their weaknesses and are not enough on their own to judge model quality. Still, I think systematic methods that try to reduce scientifically known sources of error should be the way forward, even for qualitative estimates.
What do you think? How could a public tool for judging models, such as lmsys/chatbot-arena-leaderboard, make use of methodological standards known from social science?
https://www.youtube.com/watch?v=mWrivekFZMM
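To make the "more than one prompt" point concrete, here is a minimal sketch of one systematic option: collect one pairwise preference per prompt and report the win rate together with a percentile bootstrap confidence interval. The function name `bootstrap_win_rate`, the 0/1 encoding of outcomes, and the example data are assumptions for illustration only; this is not how any particular leaderboard computes its scores.

```python
import random


def bootstrap_win_rate(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Win rate of model A with a percentile bootstrap confidence interval.

    outcomes: one entry per prompt, 1 if model A was preferred, 0 if model B was.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(outcomes)
    point = sum(outcomes) / n

    # Resample the prompts with replacement and recompute the win rate each time.
    means = []
    for _ in range(n_resamples):
        sample = [outcomes[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()

    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, lower, upper


# Hypothetical data: model A preferred on 60 of 100 prompts.
point, lower, upper = bootstrap_win_rate([1] * 60 + [0] * 40)
print(f"win rate {point:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

The width of the interval is the useful part: with only a handful of prompts it becomes so wide that a single impression says little about which model is better.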
Post
QuIP# ecosystem is growing :)
Today I saw a QuIP# 2-bit Qwen-72B-Chat model on the Hub, which shows that vLLM inference is supported.
This will speed up inference and make high-performing 2-bit models more practical. I'm considering quipping MoMo now, since otherwise I can only use a short context window with Qwen-72B on my system, even with bnb double quantization.
keyfan/Qwen-72B-Chat-2bit
Also note the easier-to-use QuIP-for-all library :)
https://github.com/chu-tianxiang/QuIP-for-all
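As a rough back-of-the-envelope sketch of why bit width matters so much here: the estimate below counts weight storage only and assumes a nominal 72 billion parameters, taken from the model name. It ignores quantization metadata (scales, codebooks), activations, and the KV cache, so real usage will be higher. The helper `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB: parameters * bits / 8 bytes, nothing else."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 72e9  # assumption: nominal parameter count implied by "72B"

for bits in (16, 4, 2):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(N_PARAMS, bits):6.1f} GiB")
```

Whatever memory the weights do not occupy is what remains for the KV cache, which is why a smaller weight footprint translates into a longer usable context on the same hardware.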
models
105
KnutJaegersberg/Luminex-34B-v0.1-exl2-8.0bpw • Text Generation
KnutJaegersberg/jamba-bagel-4bit • Text Generation
KnutJaegersberg/Deita-20b • Text Generation • 2.62k • 1
KnutJaegersberg/Deita-32b • Text Generation • 730
KnutJaegersberg/Deita-32b-adapter • 3
KnutJaegersberg/website-classifier • Text Classification • 87
KnutJaegersberg/B1-66ER • 1
KnutJaegersberg/2-bit-LLMs • Text Generation • 3.03k • 92
KnutJaegersberg/Deacon-34b-qlora-adapter • Text Generation • 2
KnutJaegersberg/Deacon-34b-Adapter • Text Generation
datasets
24
KnutJaegersberg/c4-website-classifier-dataset • Viewer • 1
KnutJaegersberg/Deita-6k • Viewer • 1 • 1
KnutJaegersberg/Auton • Preview • 5
KnutJaegersberg/trilobite • Viewer
KnutJaegersberg/facehugger • Viewer
KnutJaegersberg/webglm_dataset • Viewer
KnutJaegersberg/longinstruct • Viewer • 3
KnutJaegersberg/dolphin_orca_clustered • 1
KnutJaegersberg/orca-wizardlm-v1-clustered • Viewer
KnutJaegersberg/WizardLM_evol_instruct_V2_196k_instruct_format • Preview • 3