Zmu (Zach Mustafa)

upvoted a collection about 12 hours ago

Wikimedia Datasets

Wikimedia datasets, across languages and modalities, from different Wikimedia projects, on the hub. Not all tested. • 19 items • Updated 2 days ago • 7

upvoted 2 articles 3 days ago

Article

Vision Language Models Explained

Apr 11

• 82

Article

License to Call: Introducing Transformers Agents 2.0

6 days ago

• 63

upvoted a paper 15 days ago

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22 • 16

upvoted 2 articles 18 days ago

Article

Improving Prompt Consistency with Structured Generations

19 days ago

• 41

Article

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

20 days ago

• 68

upvoted 2 papers 26 days ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published 26 days ago • 230

Music Consistency Models

Paper • 2404.13358 • Published 28 days ago • 12

upvoted an article 26 days ago

Article

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

27 days ago

• 71

upvoted a collection about 1 month ago

GIT

Collection

GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering. • 18 items • Updated 10 days ago • 4

upvoted 2 articles about 1 month ago

Article

Design choices for Vision Language Models in 2024

Apr 16

• 18

Article

History of State Space Models (SSM) in 2022

Apr 11

• 6

upvoted a paper about 1 month ago

Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11 • 25

upvoted an article about 1 month ago

Article

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

Apr 4

• 20

upvoted a collection about 2 months ago

Recent Mamba Papers

Collection

[NB: Notes are from TuringPost] • 3 items • Updated Mar 26 • 9

upvoted 4 papers about 2 months ago

SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series

Paper • 2403.15360 • Published Mar 22 • 11

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Paper • 2403.14520 • Published Mar 21 • 31

MyVLM: Personalizing VLMs for User-Specific Queries

Paper • 2403.14599 • Published Mar 21 • 14

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Paper • 2403.11481 • Published Mar 18 • 10

upvoted 6 papers 2 months ago

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15 • 28

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14 • 11

Enhancing Vision-Language Pre-training with Rich Supervisions

Paper • 2403.03346 • Published Mar 5 • 12

MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3 • 24

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

Paper • 2403.02626 • Published Mar 5 • 9

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1 • 40

upvoted a paper 3 months ago

ModaVerse: Efficiently Transforming Modalities with LLMs

Paper • 2401.06395 • Published Jan 12 • 3

upvoted 4 collections 3 months ago

upvoted 6 papers 3 months ago

DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14 • 18

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Paper • 2402.16671 • Published Feb 26 • 26

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Paper • 2402.15151 • Published Feb 23 • 7

The Neglected Tails of Vision-Language Models

Paper • 2401.12425 • Published Jan 23 • 2

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs

Paper • 2402.15491 • Published Feb 23 • 13

CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

Paper • 2402.15021 • Published Feb 22 • 11

upvoted a collection 3 months ago

Datasets

Collection

41 items • Updated 15 days ago • 3

upvoted 2 papers 3 months ago

Graph Mamba: Towards Learning on Graphs with State Space Models

Paper • 2402.08678 • Published Feb 13 • 12

World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13 • 33

upvoted a collection 3 months ago

🎵 The MusicBox

Collection

A collection full of musical tasks demos, for musicians & music enthusiasts • 26 items • Updated Mar 8 • 15

upvoted 3 papers 3 months ago

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Paper • 2402.07033 • Published Feb 10 • 16

Memory Consolidation Enables Long-Context Video Understanding

Paper • 2402.05861 • Published Feb 8 • 7

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7 • 31

upvoted 2 papers 4 months ago

Video Understanding with Large Language Models: A Survey

Paper • 2312.17432 • Published Dec 29, 2023 • 2

MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24 • 41

upvoted a collection 4 months ago

Multimodal

Collection

244 items • Updated 6 days ago • 12

upvoted a paper 4 months ago

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

Paper • 2401.13160 • Published Jan 24 • 9

upvoted a collection 4 months ago

cool datasets

Collection

81 items • Updated 8 days ago • 8

upvoted 3 papers 4 months ago

Make-A-Shape: a Ten-Million-scale 3D Shape Model

Paper • 2401.11067 • Published Jan 20 • 15

VMamba: Visual State Space Model

Paper • 2401.10166 • Published Jan 18 • 36

Distilling Vision-Language Models on Millions of Videos

Paper • 2401.06129 • Published Jan 11 • 13

upvoted 7 collections 4 months ago

Hermes

Collection

Nous' Flagship LLM Series • 21 items • Updated 3 days ago • 84

🧠 NeuralHermes-2.5

Collection

Models and code related to the DPO fine-tuned OpenHermes-2.5-Mistral-7B • 12 items • Updated Mar 22 • 3

🐶 Beagle

Collection

Merges done using mergekit and LazyMergekit: https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb#scrollTo=d5mYzDo1q96y • 8 items • Updated Mar 22 • 6

🔀 Phixtral

Collection

The first Mixture of Experts with phi-2 models. • 3 items • Updated Mar 22 • 7

🔮 Mixture of Experts

Collection

MoE done using mergekit and LazyMergekit: https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb#scrollTo=d5mYzDo1q96y • 13 items • Updated Mar 22 • 21

Tiny Series

Collection

Tiny datasets that empower the foundation of Small Language Model! • 11 items • Updated Jan 26 • 31

PEFT

Collection

181 items • Updated 15 days ago • 10

upvoted a paper 4 months ago

Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4 • 59

upvoted a collection 5 months ago

Latent Consistency Model Demos

Collection

Latent Consistency Models for Stable Diffusion • 8 items • Updated Nov 12, 2023 • 24
