Mei dianwen

mdw123

AI & ML interests

None yet

Recent Activity

updated a collection 12 days ago

upvoted a paper 19 days ago

A Survey on Post-training of Large Language Models

Organizations

None yet

mdw123's activity

upvoted a paper 19 days ago

A Survey on Post-training of Large Language Models

Paper • 2503.06072 • Published 29 days ago • 4

upvoted a paper 20 days ago

Transformers without Normalization

Paper • 2503.10622 • Published 23 days ago • 152

upvoted a paper about 2 months ago

LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!

Paper • 2502.07374 • Published Feb 11 • 38

upvoted a paper 9 months ago

Qwen2 Technical Report

Paper • 2407.10671 • Published Jul 15, 2024 • 162

upvoted 10 papers 12 months ago

SnapKV: LLM Knows What You are Looking for Before Generation

Paper • 2404.14469 • Published Apr 22, 2024 • 26

Pre-training Small Base LMs with Fewer Tokens

Paper • 2404.08634 • Published Apr 12, 2024 • 35

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Paper • 2404.05961 • Published Apr 9, 2024 • 65

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10, 2024 • 109

OmniFusion Technical Report

Paper • 2404.06212 • Published Apr 9, 2024 • 77

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

Paper • 2404.04167 • Published Apr 5, 2024 • 14

Stream of Search (SoS): Learning to Search in Language

Paper • 2404.03683 • Published Apr 1, 2024 • 31

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 69

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 105

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28, 2024 • 110

upvoted 5 papers about 1 year ago

Advancing LLM Reasoning Generalists with Preference Trees

Paper • 2404.02078 • Published Apr 2, 2024 • 45

Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2, 2024 • 37

Long-form factuality in large language models

Paper • 2403.18802 • Published Mar 27, 2024 • 25

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27, 2024 • 47