SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Paper • 2505.11594 • Published 6 days ago • 51
Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published 5 days ago • 31 • 5
Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published 5 days ago • 31
Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published 5 days ago • 31 • 5
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 859
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published Dec 16, 2024 • 58
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published Dec 16, 2024 • 58
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Paper • 2411.10958 • Published Nov 17, 2024 • 56
Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement Paper • 2410.15633 • Published Oct 21, 2024 • 7
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception Paper • 2405.15232 • Published May 24, 2024 • 2
One Shot Learning as Instruction Data Prospector for Large Language Models Paper • 2312.10302 • Published Dec 16, 2023 • 3
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published Sep 9, 2024 • 49
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4, 2024 • 73
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published Sep 9, 2024 • 49
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts Paper • 2305.14839 • Published May 24, 2023 • 1