Introducing Visual Perception Token into Multimodal Large Language Model • Paper • 2502.17425 • Published 18 days ago • 14
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation • Paper • 2408.12528 • Published Aug 22, 2024 • 51
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • Paper • 2402.19427 • Published Feb 29, 2024 • 55
Beyond Language Models: Byte Models are Digital World Simulators • Paper • 2402.19155 • Published Feb 29, 2024 • 51
The Impact of Reasoning Step Length on Large Language Models • Paper • 2401.04925 • Published Jan 10, 2024 • 18
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models • Paper • 2312.06585 • Published Dec 11, 2023 • 29
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3" • Paper • 2312.06571 • Published Dec 11, 2023 • 13