InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 48
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On Paper • 2407.08348 • Published Jul 11, 2024 • 51
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 34
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published May 29, 2024 • 47
Chameleon: Mixed-Modal Early-Fusion Foundation Models Paper • 2405.09818 • Published May 16, 2024 • 127
Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia? Article • By davanstrien • May 7, 2024 • 7
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 702
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck Paper • 2404.07647 • Published Apr 11, 2024 • 4
OpenCerebrum-2.0 Collection My open-source take on Aether Research's proprietary Cerebrum dataset. • 3 items • Updated Apr 13, 2024 • 2
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9, 2024 • 65
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2, 2024 • 104
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14, 2024 • 75
Augmentable Collection A collection of datasets that should be augmented further with GPT-4 • 13 items • Updated Jan 2, 2024 • 4