Bill Yuchen Lin

yuchenlin

https://yuchenlin.xyz

AI & ML interests

Research @allenai LLMs and Multimodality, Agents

Recent Activity

updated a dataset 3 days ago

RLRM/Big-Math-RL-Verified-CT

yuchenlin's activity

commented 2 papers 4 months ago

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published Oct 24, 2024 • 42

Stronger Models are NOT Stronger Teachers for Instruction Tuning

Paper • 2411.07133 • Published Nov 11, 2024 • 36

New activity in meta-llama/Llama-3.1-8B-Instruct 7 months ago

new tokenizer contains the cutoff date and today date by default

#74 opened 7 months ago by

commented a paper 8 months ago

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Paper • 2406.04770 • Published Jun 7, 2024 • 30

New activity in goodbadgreedy/GoodBadGreedy 8 months ago

Update README.md

#1 opened 8 months ago by

commented 2 papers 8 months ago

The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

Paper • 2407.10457 • Published Jul 15, 2024 • 24

The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

Paper • 2407.10457 • Published Jul 15, 2024 • 24

New activity in princeton-nlp/Llama-3-Base-8B-SFT-SimPO 8 months ago

no tokenizer?

#1 opened 8 months ago by

commented a paper 9 months ago

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Paper • 2406.18495 • Published Jun 26, 2024 • 13

New activity in allenai/WildBench 9 months ago

Is there any way for private model testing?

#9 opened 9 months ago by

Example IDs for GPT4o vs Claude3.5Sonnet

#8 opened 9 months ago by

Model to test, please

#7 opened 9 months ago by

commented 4 papers 9 months ago

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Paper • 2406.11069 • Published Jun 16, 2024 • 14

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Paper • 2406.11069 • Published Jun 16, 2024 • 14

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Paper • 2406.11069 • Published Jun 16, 2024 • 14

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 67

New activity in allenai/WildBench 9 months ago

[Changelog] 2024-06-13 Update the WB-scores with gpt-4o version

#6 opened 9 months ago by

commented a paper 9 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 67

New activity in allenai/BaseChat 9 months ago

Llama-3-8B thinks it is built by OpenAI

#1 opened 9 months ago by

Update README.md

#2 opened 9 months ago by