Collections
Collections including paper arxiv:2411.13676 (each group below is one collection):
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 125 upvotes
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 50 upvotes
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 12 upvotes
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65 upvotes

- Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
  Paper • 2402.04248 • Published • 30 upvotes
- Scavenging Hyena: Distilling Transformers into Long Convolution Models
  Paper • 2401.17574 • Published • 15 upvotes
- Scalable Autoregressive Image Generation with Mamba
  Paper • 2408.12245 • Published • 25 upvotes
- Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
  Paper • 2408.12570 • Published • 30 upvotes

- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 49 upvotes
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 52 upvotes
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 22 upvotes
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 23 upvotes
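
A listing like the one above can also be reproduced programmatically with the huggingface_hub Python client. The sketch below is a minimal example, not this page's own implementation; the "papers/<arxiv-id>" item-filter format is an assumption based on the library docs, so verify it against your installed huggingface_hub version.

```python
# Minimal sketch: list community collections that include a given paper.
# Assumes a recent huggingface_hub; the "papers/2411.13676" item format is an
# assumption taken from the library docs -- verify for your installed version.
from huggingface_hub import get_collection, list_collections

for collection in list_collections(item="papers/2411.13676", limit=10):
    # Listing results truncate each collection's item list, so refetch in full.
    full = get_collection(collection.slug)
    print(f"{full.title} ({full.slug})")
    for item in full.items:
        if item.item_type == "paper":
            print(f"  Paper • {item.item_id}")
```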