MoshiVis v0.1 Collection MoshiVis is a Vision Speech Model built as a perceptually augmented version of Moshi v0.1 for conversing about image inputs • 8 items • Updated 3 days ago • 14
Training and Inference Efficiency of Encoder-Decoder Speech Models Paper • 2503.05931 • Published 17 days ago • 2
Cosmos Transfer1 Collection World Foundation Model for Domain Transfer • 5 items • Updated 4 days ago • 11
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • 13 days ago • 342
Article LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! • 18 days ago • 45
Jamba 1.6 Collection The AI21 Jamba family consists of hybrid SSM-Transformer foundation models that outperform open-model competitors on quality and speed. • 2 items • Updated 18 days ago • 11
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 20 days ago • 68
C4AI Aya Expanse Collection Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. • 4 items • Updated 22 days ago • 38
Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality • 21 days ago • 69
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion Paper • 2503.01183 • Published 22 days ago • 26