The Ultra-Scale Playbook 🌌 Space • Running • The ultimate guide to training LLMs on large GPU clusters • 2.5k likes
Kimi-VL-A3B Collection • Moonshot's efficient MoE VLMs, exceptional at agentic, long-context, and thinking tasks • 6 items • Updated 11 days ago • 61 likes
nvidia/Llama-Nemotron-Post-Training-Dataset Viewer • Updated 7 days ago • 3.91M rows • 7.19k downloads • 423 likes
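To inspect this dataset locally, here is a minimal sketch using the datasets library; the "train" split name is an assumption, so check the dataset card on the Hub for the actual configs and splits:

```python
# pip install datasets
from datasets import load_dataset

# Stream rows instead of downloading all ~3.91M up front.
# If the repo defines multiple configs, load_dataset will raise and
# list them; pass the config name as the second argument.
ds = load_dataset(
    "nvidia/Llama-Nemotron-Post-Training-Dataset",
    split="train",        # assumption: a "train" split exists
    streaming=True,
)

print(next(iter(ds)))     # peek at the first record's schema
```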
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025 • 143 upvotes
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 103 upvotes
Tulu 3 Datasets Collection • All datasets released with Tulu 3, state-of-the-art open post-training recipes • 33 items • Updated Mar 13 • 78 likes
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper • 2410.13863 • Published Oct 17, 2024 • 38 upvotes
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF Text Generation • Updated 10 days ago • 28.2k downloads • 2.03k likes
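For quick orientation on this model card, a minimal generation sketch with transformers (the chat template comes from the tokenizer; the prompt is just an illustration, and a 70B model needs multiple high-memory GPUs, which device_map="auto" shards across):

```python
# pip install transformers accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; 70B weights still need ~140 GB
    device_map="auto",           # shard layers across all visible GPUs
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```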
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30, 2024 • 55 upvotes