Shizhe Diao

shizhediao

https://shizhediao.github.io/

AI & ML interests

None yet

shizhediao's activity

updated a dataset 3 days ago

OptimalScale/ClimbLab

Viewer • Updated 3 days ago • 1.24B rows • 709 downloads • 7 likes

liked 2 datasets 3 days ago

OptimalScale/ClimbLab

Viewer • Updated 3 days ago • 1.24B rows • 709 downloads • 7 likes

OptimalScale/ClimbMix

Viewer • Updated 3 days ago • 395M rows • 646 downloads • 3 likes

liked 2 datasets 4 days ago

nvidia/ClimbMix

Viewer • Updated 1 day ago • 355M rows • 1.29k downloads • 21 likes

nvidia/ClimbLab

Viewer • Updated 2 days ago • 1.1B rows • 9.83k downloads • 28 likes

authored a paper 6 days ago

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published 6 days ago • 86 upvotes

upvoted a paper 6 days ago

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published 6 days ago • 86 upvotes

commented a paper 6 days ago

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published 6 days ago • 86 upvotes

authored a paper 5 months ago

Hymba: A Hybrid-head Architecture for Small Language Models

Paper • 2411.13676 • Published Nov 20, 2024 • 45 upvotes

authored a paper 7 months ago

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

Paper • 2410.03290 • Published Oct 4, 2024 • 7 upvotes

updated a dataset 7 months ago

Post-training-Data-Flywheel/function-calling-1.0

Updated Sep 20, 2024 • 17

updated a collection 8 months ago

flywheel

Collection

2 items • Updated Aug 29, 2024

updated a Space 8 months ago

📊 README

upvoted a paper 8 months ago

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 59 upvotes

updated a model 9 months ago

shizhediao/hf-lora

Updated Aug 4, 2024

liked a Space 9 months ago

🏃 Berkeley Function Calling Leaderboard

liked a model 9 months ago

nvidia/Minitron-4B-Base

Text Generation • Updated Feb 14 • 689 downloads • 134 likes

upvoted a paper 9 months ago

Compact Language Models via Pruning and Knowledge Distillation

Paper • 2407.14679 • Published Jul 19, 2024 • 40 upvotes