An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels • arXiv:2406.09415 • Published Jun 13, 2024 • 50 upvotes
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models • arXiv:2405.09220 • Published May 15, 2024 • 24 upvotes
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 • arXiv:2405.00664 • Published May 1, 2024 • 18 upvotes
A Careful Examination of Large Language Model Performance on Grade School Arithmetic • arXiv:2405.00332 • Published May 1, 2024 • 30 upvotes
Better & Faster Large Language Models via Multi-token Prediction • arXiv:2404.19737 • Published Apr 30, 2024 • 73 upvotes
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models • arXiv:2404.18796 • Published Apr 29, 2024 • 68 upvotes
SnapKV: LLM Knows What You are Looking for Before Generation • arXiv:2404.14469 • Published Apr 22, 2024 • 23 upvotes
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions • arXiv:2404.13208 • Published Apr 19, 2024 • 38 upvotes
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing • arXiv:2404.12253 • Published Apr 18, 2024 • 54 upvotes
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models • arXiv:2404.12387 • Published Apr 18, 2024 • 38 upvotes
BLINK: Multimodal Large Language Models Can See but Not Perceive • arXiv:2404.12390 • Published Apr 18, 2024 • 24 upvotes
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models • arXiv:2404.07973 • Published Apr 11, 2024 • 30 upvotes
RULER: What's the Real Context Size of Your Long-Context Language Models? • arXiv:2404.06654 • Published Apr 9, 2024 • 34 upvotes
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • arXiv:2404.07143 • Published Apr 10, 2024 • 104 upvotes
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders • arXiv:2404.05961 • Published Apr 9, 2024 • 64 upvotes
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance • arXiv:2404.04125 • Published Apr 4, 2024 • 27 upvotes