Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published 3 days ago • 18
Stylus: Automatic Adapter Selection for Diffusion Models Paper • 2404.18928 • Published 18 days ago • 14
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 11 items • Updated about 2 hours ago • 83
NuNerZero - Zero Shot NER Collection The best compact Zero-Shot NER models with MIT license • 4 items • Updated 7 days ago • 11
Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation 19 days ago • 68
Article 🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets By dvilasuero • 21 days ago • 54
Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published 10 days ago • 11
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published 17 days ago • 61
Article SeeMoE: Implementing a MoE Vision Language Model from Scratch By AviSoori1x • 11 days ago • 24
Granite Code Models Collection A series of code models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 10 items • Updated 5 days ago • 116
Article Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face 15 days ago • 12
Llama3-ChatQA-1.5 Collection Llama3-ChatQA-1.5 models excel at conversational question answering (QA) and retrieval-augmented generation (RAG). • 6 items • Updated 14 days ago • 35
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published 15 days ago • 92
ZeroGPU Spaces Collection ZeroGPU Spaces made by the community • 16 items • Updated about 3 hours ago • 122
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications Paper • 2404.13506 • Published 26 days ago • 1
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 57
Arctic Collection A collection of pre-trained dense-MoE Hybrid transformer models • 2 items • Updated 23 days ago • 18
Article Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent 26 days ago • 71
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15 • 125
RoFormer: Enhanced Transformer with Rotary Position Embedding Paper • 2104.09864 • Published Apr 20, 2021 • 7
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 29 days ago • 516
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) Paper • 2404.00579 • Published Mar 31 • 1
Article DS-MoE: Making MoE Models More Efficient and Less Memory-Intensive By bpan • Apr 9 • 26
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9 • 62
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30 • 39
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 100
Article RAG Empowerment: Cohere C4AI Command-R and Transformers Unveiled By Andyrasika • Apr 7 • 9
Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • 10 days ago • 21
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29 • 46
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29 • 26
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 37
Article Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B Apr 4 • 20
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy Paper • 2403.14610 • Published Mar 21 • 1
Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model Paper • 2206.14371 • Published Jun 29, 2022 • 3