

Vegard Wærp

vegardw


AI & ML interests

None yet

Recent Activity

upvoted a paper about 2 months ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

liked a dataset 12 months ago

PleIAs/YouTube-Commons


Organizations

None yet

vegardw's activity

upvoted a paper about 2 months ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16, 2025 • 151

upvoted an article 12 months ago

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Article • Apr 15, 2024 • 176

upvoted a paper 12 months ago

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 67

upvoted 8 papers about 1 year ago

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

Paper • 2403.17919 • Published Mar 26, 2024 • 16

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Paper • 2403.08763 • Published Mar 13, 2024 • 51

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6, 2024 • 188

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 612

FuseChat: Knowledge Fusion of Chat Models

Paper • 2402.16107 • Published Feb 25, 2024 • 40

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 107

How to Train Data-Efficient LLMs

Paper • 2402.09668 • Published Feb 15, 2024 • 42

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 147