Omar Sanseviero

osanseviero

https://osanseviero.github.io/hackerllama/

AI & ML interests

Llamas, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in browser, healthcare applications, education, intersection of art and ML.🦙

Articles

Welcome Llama 3 - Meta's new open LLM

CodeGemma - an official Google release for code LLMs

🪆 Introduction to Matryoshka Embedding Models

Welcome Gemma - Google's new open LLM

Constitutional AI with Open LLMs

Preference Tuning LLMs with Direct Preference Optimization Methods

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

Mixture of Experts Explained

Inference for PROs

Spread Your Wings: Falcon 180B is here

Code Llama: Llama 2 learns to code

Results of the Open Source AI Game Jam

Llama 2 is here - get it on Hugging Face

The Falcon has landed in the Hugging Face ecosystem

Hugging Face Machine Learning Demos on arXiv

What's new in Diffusers? 🎨

Announcing Evaluation on the Hub

An Introduction to Deep Reinforcement Learning

Welcome spaCy to the 🤗 Hub

Sentence Transformers in the 🤗 Hub

Organizations

osanseviero's activity

upvoted 2 articles 1 day ago

Article

Space secrets security update

2 days ago

• 33

Article

Indexify: Bringing HuggingFace Models to Real-Time Pipelines for Production Applications

1 day ago

• 4

upvoted a collection 1 day ago

AQLM+PV

9 items • Updated 1 day ago • 5

upvoted 5 articles 1 day ago

Article

🕳️ Attention Sinks in LLMs for endless fluency

Oct 9, 2023

• 6

Article

⚗️ 🔥 Building High-Quality Datasets with distilabel and Prometheus 2

3 days ago

• 20

Article

Sales Forecasting with Image Regression

8 days ago

• 2

Article

How to Fine-Tune Custom Embedding Models Using AutoTrain

2 days ago

• 9

Article

Orchestration of Experts: The First-Principle Multi-Model System

2 days ago

• 13

upvoted a paper 2 days ago

Editing Models with Task Arithmetic

Paper • 2212.04089 • Published Dec 8, 2022 • 5

upvoted an article 2 days ago

Article

FiftyOne Computer Vision Datasets Come to the Hugging Face Hub

• 8

upvoted a collection 2 days ago

LLaVA-Phi-3-mini

4 items • Updated Apr 28 • 11

upvoted a paper 3 days ago

Yuan 2.0-M32: Mixture of Experts with Attention Router

Paper • 2405.17976 • Published 4 days ago • 15

upvoted 18 papers 4 days ago

NeRF-Casting: Improved View-Dependent Appearance with Consistent Reflections

Paper • 2405.14871 • Published 9 days ago • 6

LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

Paper • 2405.14477 • Published 9 days ago • 14

Dense Connector for MLLMs

Paper • 2405.13800 • Published 10 days ago • 20

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published 9 days ago • 27

Not All Language Model Features Are Linear

Paper • 2405.14860 • Published 9 days ago • 33

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published 8 days ago • 11

LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters

Paper • 2405.16287 • Published 7 days ago • 9

Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

Paper • 2405.17405 • Published 5 days ago • 12

Part123: Part-aware 3D Reconstruction from a Single-view Image

Paper • 2405.16888 • Published 6 days ago • 10

Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning

Paper • 2405.17258 • Published 5 days ago • 11

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

Paper • 2405.17428 • Published 5 days ago • 12

Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

Paper • 2405.16822 • Published 6 days ago • 10

Zamba: A Compact 7B SSM Hybrid Model

Paper • 2405.16712 • Published 6 days ago • 17

Looking Backward: Streaming Video-to-Video Translation with Feature Banks

Paper • 2405.15757 • Published 8 days ago • 11

I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

Paper • 2405.16537 • Published 6 days ago • 14

Matryoshka Multimodal Models

Paper • 2405.17430 • Published 5 days ago • 29

Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published 5 days ago • 44

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published 5 days ago • 63

upvoted an article 4 days ago

Article

Training and Finetuning Embedding Models with Sentence Transformers v3

5 days ago

• 59

upvoted an article 5 days ago

Article

Introducing Transformers Agent 2.0: A Leap Forward in Intelligent Automation

5 days ago

• 6

upvoted a collection 5 days ago

DenseConnector

Official collection of "Dense Connector for MLLMs" • 4 items • Updated 4 days ago • 1

upvoted 7 papers 5 days ago

OpenMask3D: Open-Vocabulary 3D Instance Segmentation

Paper • 2306.13631 • Published Jun 23, 2023 • 8

AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct

Paper • 2405.14906 • Published 10 days ago • 18

Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining

Paper • 2405.14908 • Published 9 days ago • 10

CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner

Paper • 2405.14979 • Published 9 days ago • 13

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Paper • 2405.15319 • Published 8 days ago • 19

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Paper • 2405.15071 • Published 9 days ago • 30

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published 8 days ago • 45

upvoted a collection 5 days ago

ConvLLaVA

A collection of ConvLLaVA models. • 10 items • Updated 4 days ago • 10

upvoted 3 papers 5 days ago

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published 8 days ago • 41

Aya 23: Open Weight Releases to Further Multilingual Progress

Paper • 2405.15032 • Published 9 days ago • 21

The Road Less Scheduled

Paper • 2405.15682 • Published 8 days ago • 16

upvoted an article 5 days ago

Article

GPU Poor Savior: Revolutionizing Low-Bit Open Source LLMs and Cost-Effective Edge Computing

8 days ago

• 9

upvoted an article 6 days ago

Article

Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages

9 days ago

• 12

upvoted a paper 6 days ago

Transformers Can Represent n-gram Language Models

Paper • 2404.14994 • Published Apr 23 • 18

upvoted a paper 8 days ago

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Paper • 2405.11273 • Published 14 days ago • 15

upvoted an article 8 days ago

Article

AI has a problem with objectifying women

8 days ago

• 52

upvoted a collection 9 days ago

C4AI Aya 23

Aya 23 is an open weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 3 items • Updated 9 days ago • 34

upvoted a paper 10 days ago

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

Paper • 2404.16019 • Published Apr 24 • 1

upvoted an article 10 days ago

Article

Introducing Spaces Dev Mode for a seamless developer experience

12 days ago

• 10

upvoted a collection 10 days ago

Phi-3

Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 22 items • Updated 2 days ago • 299

upvoted 7 papers 10 days ago

Diffusion for World Modeling: Visual Details Matter in Atari

Paper • 2405.12399 • Published 12 days ago • 25

Personalized Residuals for Concept-Driven Text-to-Image Generation

Paper • 2405.12978 • Published 11 days ago • 8

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Paper • 2405.12970 • Published 11 days ago • 20

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Paper • 2405.12979 • Published 11 days ago • 7

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published 11 days ago • 23

Your Transformer is Secretly Linear

Paper • 2405.12250 • Published 13 days ago • 134

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Paper • 2405.11582 • Published 13 days ago • 10