pretraining - a tyzhu Collection

pretraining

updated 3 days ago

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

Paper • 2502.15499 • Published 16 days ago • 13

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

Paper • 2502.17422 • Published 13 days ago • 7

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Paper • 2502.17535 • Published 13 days ago • 8

Scaling LLM Pre-training with Vocabulary Curriculum

Paper • 2502.17910 • Published 12 days ago • 1

LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Paper • 2502.15007 • Published 17 days ago • 160

Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Paper • 2503.00808 • Published 7 days ago • 51