Granite Code Models Collection A series of code models trained by IBM and released under the Apache 2.0 license. Both the base pretrained and instruct models are released. • 9 items • Updated 1 day ago • 69
Article Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face 5 days ago • 11
Llama3-ChatQA-1.5 Collection Llama3-ChatQA-1.5 models excel at conversational question answering (QA) and retrieval-augmented generation (RAG). • 6 items • Updated 4 days ago • 28
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published 5 days ago • 66
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications Paper • 2404.13506 • Published 17 days ago • 1
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 57
Arctic Collection A collection of pre-trained dense-MoE Hybrid transformer models • 2 items • Updated 13 days ago • 18
Article Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent 16 days ago • 67
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community 23 days ago • 107
RoFormer: Enhanced Transformer with Rotary Position Embedding Paper • 2104.09864 • Published Apr 20, 2021 • 7
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 19 days ago • 479
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) Paper • 2404.00579 • Published Mar 31 • 1
Article DS-MoE: Making MoE Models More Efficient and Less Memory-Intensive By bpan • 29 days ago • 25
Article Making thousands of open LLMs bloom in the Vertex AI Model Garden 28 days ago • 16
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published 29 days ago • 61
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30 • 39
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 98
Article RAG Empowerment: Cohere C4AI Command-R and Transformers Unveiled By Andyrasika • 30 days ago • 9
Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • Jan 23 • 10
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29 • 46
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29 • 26
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 37
Article Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B Apr 4 • 19
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy Paper • 2403.14610 • Published Mar 21 • 1
Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model Paper • 2206.14371 • Published Jun 29, 2022 • 3
LLM Tools Collection A collection of tools as various HF Spaces on LLMs. • 18 items • Updated 6 days ago • 1
LLM Training Datasets Collection A collection of datasets for training LLMs. • 42 items • Updated about 14 hours ago • 1
Papers Collection Large Language Model (LLM) and NLP related papers. • 62 items • Updated about 14 hours ago • 4
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 50
Common Corpus Collection The largest public domain dataset for training LLMs. • 26 items • Updated Mar 20 • 99
ORPO Collection This is the official collection of "ORPO: Monolithic Preference Optimization without Reference Model". • 5 items • Updated 25 days ago • 10
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 54
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 54
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 119
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14 • 25