Omar Sanseviero's picture

Omar Sanseviero

osanseviero

·

https://osanseviero.github.io/hackerllama/

AI & ML interests

Llamas, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in browser, healthcare applications, education, intersection of art and ML.🦙

Recent Activity

updated a model about 7 hours ago

google/gemma-3-1b-pt

liked a model 1 day ago

google/gemma-3-12b-it

liked a model 1 day ago

google/gemma-3-4b-it

View all activity

Organizations

osanseviero's activity

upvoted a collection 1 day ago

Gemma 3 Release

9 items • Updated about 22 hours ago • 235

upvoted an article 2 days ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

3 days ago

• 243

upvoted a collection 5 days ago

C4AI Aya Vision

Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 10 days ago • 63

upvoted an article 10 days ago

Article

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

11 days ago

• 65

upvoted a paper 12 days ago

Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published 24 days ago • 56

upvoted an article 18 days ago

Article

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

24 days ago

• 65

upvoted a collection 21 days ago

GemmaX2

GemmaX2 language models, including pretrained and instruction-tuned models of 2 sizes, including 2B, 9B. • 7 items • Updated Feb 7 • 20

upvoted 2 papers 21 days ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 23 days ago • 164

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 22 days ago • 129

upvoted a collection 23 days ago

PaliGemma 2 Mix

13 items • Updated 3 days ago • 60

upvoted a paper 28 days ago

Scaling Pre-training to One Hundred Billion Data for Vision Language Models

Paper • 2502.07617 • Published Feb 11 • 29

upvoted 2 articles about 1 month ago

Article

Open-source DeepResearch – Freeing our search agents

Feb 4

• 1.16k

Article

Open-R1: Update #1

By

and 7 others •

Feb 2

• 295

upvoted a paper about 1 month ago

Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

Paper • 2501.18512 • Published Jan 30 • 27

upvoted 2 articles about 2 months ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28

• 808

Article

Mastering Long Contexts in LLMs with KVPress

By

and 1 other •

Jan 23

• 64

upvoted 3 papers about 2 months ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 92

Enhancing Human-Like Responses in Large Language Models

Paper • 2501.05032 • Published Jan 9 • 50

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 276

upvoted a paper 2 months ago

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 93