Zmu (Zach Mustafa)

upvoted a collection 2 days ago

Papers - Custom Layers - MLP

Collection

11 items • Updated May 1 • 1

upvoted 3 papers 3 days ago

upvoted a collection 3 days ago

sentence-transformers-from-synthetic-data

Collection

Example of using distilabel to generate synthetic triplets data for fine-tuning a Sentence Transformer model • 3 items • Updated 2 days ago • 15

upvoted a collection 4 days ago

Embedding Model Datasets

Collection

A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 51 items • Updated 8 days ago • 24

upvoted an article 4 days ago

Article

Training and Finetuning Embedding Models with Sentence Transformers v3

6 days ago

• 62

upvoted 2 papers 6 days ago

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published 9 days ago • 11

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published 9 days ago • 41

upvoted a paper 8 days ago

Your Transformer is Secretly Linear

Paper • 2405.12250 • Published 14 days ago • 135

upvoted an article 9 days ago

Article

Fast, High-Fidelity LLM Decoding with Regex Constraints

Feb 23

• 1

upvoted a collection 16 days ago

Wikimedia Datasets

Collection

Wikimedia datasets, across languages and modalities, from different Wikimedia projects, on the hub. Not all tested. • 19 items • Updated 17 days ago • 9

upvoted 2 articles 18 days ago

Article

Vision Language Models Explained

Apr 11

• 92

Article

License to Call: Introducing Transformers Agents 2.0

21 days ago

• 91

upvoted a paper about 1 month ago

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22 • 17

upvoted 2 articles about 1 month ago

Article

Improving Prompt Consistency with Structured Generations

Apr 30

• 46

Article

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

Apr 29

• 69

upvoted 2 papers about 1 month ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 239

Music Consistency Models

Paper • 2404.13358 • Published Apr 20 • 12

upvoted an article about 1 month ago

Article

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

Apr 22

• 73

upvoted a collection about 2 months ago

GIT

Collection

GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering. • 18 items • Updated 11 days ago • 4

upvoted 2 articles about 2 months ago

Article

Design choices for Vision Language Models in 2024

Apr 16

• 20

Article

History of State Space Models (SSM) in 2022

Apr 11

• 7

upvoted a paper about 2 months ago

Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11 • 25

upvoted an article about 2 months ago

Article

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

Apr 4

• 20

upvoted a collection 2 months ago

Recent Mamba Papers

Collection

[NB: Notes are from TuringPost] • 3 items • Updated Mar 26 • 8

upvoted 4 papers 2 months ago

SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series

Paper • 2403.15360 • Published Mar 22 • 11

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Paper • 2403.14520 • Published Mar 21 • 31

MyVLM: Personalizing VLMs for User-Specific Queries

Paper • 2403.14599 • Published Mar 21 • 14

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Paper • 2403.11481 • Published Mar 18 • 10

upvoted 7 papers 3 months ago

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15 • 28

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14 • 11

Enhancing Vision-Language Pre-training with Rich Supervisions

Paper • 2403.03346 • Published Mar 5 • 12

MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3 • 24

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

Paper • 2403.02626 • Published Mar 5 • 9

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1 • 40

ModaVerse: Efficiently Transforming Modalities with LLMs

Paper • 2401.06395 • Published Jan 12 • 3

upvoted 4 collections 3 months ago

image

Collection

55 items • Updated 4 days ago • 1

Models - Multimodal

Collection

15 items • Updated Apr 17 • 2

Papers - Pipeline - Multimodal

Collection

2 items • Updated Mar 13 • 1

tuning

Collection

45 items • Updated 7 days ago • 3

upvoted 6 papers 3 months ago

DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14 • 21

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Paper • 2402.16671 • Published Feb 26 • 26

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Paper • 2402.15151 • Published Feb 23 • 7

The Neglected Tails of Vision-Language Models

Paper • 2401.12425 • Published Jan 23 • 2

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs

Paper • 2402.15491 • Published Feb 23 • 13

CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

Paper • 2402.15021 • Published Feb 22 • 11

upvoted a collection 4 months ago

Datasets

Collection

43 items • Updated 2 days ago • 4

upvoted 2 papers 4 months ago

Graph Mamba: Towards Learning on Graphs with State Space Models

Paper • 2402.08678 • Published Feb 13 • 12

World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13 • 33

upvoted a collection 4 months ago

🎵 The MusicBox

Collection

A collection full of musical tasks demos, for musicians & music enthusiasts • 26 items • Updated Mar 8 • 16

upvoted 5 papers 4 months ago

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Paper • 2402.07033 • Published Feb 10 • 16

Memory Consolidation Enables Long-Context Video Understanding

Paper • 2402.05861 • Published Feb 8 • 7

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7 • 31

Video Understanding with Large Language Models: A Survey

Paper • 2312.17432 • Published Dec 29, 2023 • 2

MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24 • 41

upvoted a collection 4 months ago

Multimodal

Collection

248 items • Updated 5 days ago • 12

upvoted a paper 4 months ago

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

Paper • 2401.13160 • Published Jan 24 • 9

upvoted a collection 4 months ago

cool datasets

Collection

83 items • Updated about 21 hours ago • 8

upvoted a paper 4 months ago

Make-A-Shape: a Ten-Million-scale 3D Shape Model

Paper • 2401.11067 • Published Jan 20 • 15
