Granite Code Models Collection A series of code models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 10 items • Updated 4 days ago • 115
Llama3-ChatQA-1.5 Collection Llama3-ChatQA-1.5 models excel at conversational question answering (QA) and retrieval-augmented generation (RAG). • 6 items • Updated 12 days ago • 34
Arctic Collection A collection of pre-trained dense-MoE Hybrid transformer models • 2 items • Updated 21 days ago • 18
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • about 1 month ago • 124
Aurora-M models Collection Aurora-M models (base, biden-harris redteams and instruct) • 5 items • Updated 10 days ago • 15
A little guide to building Large Language Models in 2024 Collection Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757 • 19 items • Updated Apr 1 • 13
The SPRIGHT T2I collection Collection This collection contains the datasets, model, paper, and demo associated with the SPRIGHT (SPatially RIGHT) release. • 5 items • Updated Apr 2 • 3
The Case for Co-Designing Model Architectures with Hardware Paper • 2401.14489 • Published Jan 25 • 2
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 3 days ago • 165
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27 • 88
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control Paper • 2403.09055 • Published Mar 14 • 23
Wav2Vec 2.0 Collection A collection for the first release of Wav2Vec 2.0, a speech encoder that learns powerful representations from unlabelled audio data. • 8 items • Updated Jan 16 • 12
Load 4bit models 4x faster Collection Native bitsandbytes 4bit pre quantized models • 16 items • Updated 25 days ago • 21
WhisperKit Collection Datasets, models and evaluation results for WhisperKit • 1 item • Updated Mar 23 • 5
Long-Form Test Sets Collection A collection of long-form (samples > 30s) datasets used to evaluate the Distil-Whisper models. • 5 items • Updated Mar 21 • 5
Training Datasets Collection A collection of pseudo-labelled datasets used to train the Distil-Whisper model. • 9 items • Updated Mar 21 • 12
distil-large-v3 Collection This collection contains the model repositories for distil-large-v3, which provides support for the most popular Whisper libraries. • 4 items • Updated Mar 21 • 4
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 53
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Paper • 2403.08764 • Published Mar 13 • 34
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11 • 38
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29 • 44
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27 • 182
Matryoshka Embedding Models Collection https://huggingface.co/blog/matryoshka • 12 items • Updated about 12 hours ago • 10
OpenMath Collection A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items • Updated Feb 19 • 28
InstructRetro Collection InstructRetro is an autoregressive decoder-only language model (LM) with retrieval-augmented pretraining and instruction tuning. • 4 items • Updated 17 days ago • 7
ML for Tools Collection Collection of papers about ML for using tools! • 25 items • Updated Jan 17 • 9
Comparing DPO with IPO and KTO Collection A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated Jan 9 • 31
Model Merging Collection Model merging is a very popular technique for LLMs nowadays. Here is a chronological list of papers in the space that will help you get started with it! • 28 items • Updated Mar 23 • 178
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated Apr 3 • 76
Switch-Transformers release Collection This release included various MoE (Mixture of Experts) models based on the T5 architecture. The base models use from 8 to 256 experts. • 9 items • Updated 1 day ago • 11
Nemotron 3 8B Collection The Nemotron 3 8B Family of models is optimized for building production-ready generative AI applications for the enterprise. • 5 items • Updated Feb 19 • 37
zephyr story Collection Sources mentioned in hf.co/thomwolf's tweet: x.com/Thom_Wolf/status/1720503998518640703 • 8 items • Updated Jan 24 • 15
AgentTuning: Enabling Generalized Agent Abilities for LLMs Paper • 2310.12823 • Published Oct 19, 2023 • 33
Accelerating LLM Inference with Staged Speculative Decoding Paper • 2308.04623 • Published Aug 8, 2023 • 20
Contrastive Decoding Improves Reasoning in Large Language Models Paper • 2309.09117 • Published Sep 17, 2023 • 37
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 57
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning Paper • 2307.02053 • Published Jul 5, 2023 • 23