Choms


AI & ML interests

None yet

Recent Activity

liked a Space about 1 month ago
Qwen/Qwen2.5-Coder-Artifacts
liked a Space about 2 months ago
stabilityai/stable-diffusion-3.5-large

Organizations

Amazon Web Services, ASYD Solutions

Choms's activity

replied to TuringsSolutions's post 5 months ago

If you really think the issue is not charging money, you are in for a surprise...

reacted to singh96aman's post with 🔥 6 months ago

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges (2406.12624)

Can LLMs serve as reliable judges ⚖️?

We aim to identify the right metrics for evaluating Judge LLMs and understand their sensitivities to prompt guidelines, engineering, and specificity. With this paper, we want to caution ⚠️ against blindly using LLMs as a human proxy.

Blog - https://huggingface.co/blog/singh96aman/judgingthejudges
Arxiv - https://arxiv.org/abs/2406.12624
Tweet - https://x.com/iamsingh96aman/status/1804148173008703509

@singh96aman @kartik727 @Srinik-1 @sankaranv @dieuwkehupkes
New activity in nerijs/pixel-art-xl 6 months ago

License

13
#7 opened about 1 year ago by elie707
liked a Space 6 months ago
reacted to victor's post with 🔥 6 months ago

Together MoA is a really interesting approach based on open source models!

"We introduce Mixture of Agents (MoA), an approach to harness the collective strengths of multiple LLMs to improve state-of-the-art quality. And we provide a reference implementation, Together MoA, which leverages several open-source LLM agents to achieve a score of 65.1% on AlpacaEval 2.0, surpassing prior leader GPT-4o (57.5%)."

Read more here: https://www.together.ai/blog/together-moa

PS: they provide some demo code (https://github.com/togethercomputer/MoA/blob/main/bot.py). If someone releases a Space for it, it could go 🚀