PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published 22 days ago • 30
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published 25 days ago • 17
BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published 29 days ago • 23
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published 28 days ago • 26
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published 24 days ago • 120
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21 • 50
Orchestration of Experts: The First-Principle Multi-Model System Article • By alirezamsh • Apr 16 • 8
Design choices for Vision Language Models in 2024 Article • By gigant • about 1 month ago • 18
Llama2-7B HQQ+ Collection Extreme low-bit quantization with HQQ+ (HQQ + LoRA adapter) • 3 items • Updated 28 days ago • 14
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27 • 89
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11 • 21
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11 • 24
DragAnything: Motion Control for Anything using Entity Representation Paper • 2403.07420 • Published Mar 12 • 11
MetricX-23 Collection A collection of MetricX-23 models (https://aclanthology.org/2023.wmt-1.63/) • 6 items • Updated 3 days ago • 12
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models Paper • 2402.03749 • Published Feb 6 • 9
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19 • 50
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19 • 53
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding Paper • 2312.04461 • Published Dec 7, 2023 • 48
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 8 items • Updated 3 days ago • 24
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively Paper • 2401.02955 • Published Jan 5 • 16
Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss Paper • 2401.02677 • Published Jan 5 • 21
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding Paper • 2401.04398 • Published Jan 9 • 18
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models Paper • 2401.04658 • Published Jan 9 • 24
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding Paper • 2401.04575 • Published Jan 9 • 14
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9 • 37
Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3 • 29
LLaMA Beyond English: An Empirical Study on Language Capability Transfer Paper • 2401.01055 • Published Jan 2 • 50
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 173
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion Paper • 2312.14327 • Published Dec 21, 2023 • 6
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation Paper • 2312.14187 • Published Dec 20, 2023 • 49
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases Paper • 2312.15011 • Published Dec 22, 2023 • 15
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 Paper • 2312.16171 • Published Dec 26, 2023 • 30
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces Paper • 2312.15715 • Published Dec 25, 2023 • 19
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes Paper • 2312.15430 • Published Dec 24, 2023 • 25
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 55
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing Paper • 2206.15076 • Published Jun 30, 2022 • 3
Continuous Learning in a Hierarchical Multiscale Neural Network Paper • 1805.05758 • Published May 15, 2018 • 1
Silkie: Preference Distillation for Large Visual Language Models Paper • 2312.10665 • Published Dec 17, 2023 • 10
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 253
LLM Leaderboard best models ❤️🔥 Collection A daily-updated list of models with the best evaluations on the LLM leaderboard. • 70 items • Updated about 15 hours ago • 304
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds Paper • 2312.09246 • Published Dec 14, 2023 • 5
LIME: Localized Image Editing via Attention Regularization in Diffusion Models Paper • 2312.09256 • Published Dec 14, 2023 • 8
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection Paper • 2312.09252 • Published Dec 14, 2023 • 9