Arctic Collection A collection of pre-trained dense-MoE Hybrid transformer models • 2 items • Updated 8 days ago • 17
Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Article • 18 days ago • 93
Aurora-M models Collection Aurora-M models (base, Biden-Harris red-teamed, and instruct) • 5 items • Updated 6 days ago • 15
A little guide to building Large Language Models in 2024 Collection Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757 • 19 items • Updated Apr 1 • 13
The SPRIGHT T2I collection Collection This collection contains the datasets, model, paper, and demo associated with the SPRIGHT (SPatially RIGHT) release. • 5 items • Updated about 1 month ago • 3
The Case for Co-Designing Model Architectures with Hardware Paper • 2401.14489 • Published Jan 25 • 2
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 4 days ago • 158
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27 • 85
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control Paper • 2403.09055 • Published Mar 14 • 23
Wav2Vec 2.0 Collection A collection for the first release of Wav2Vec 2.0, a speech encoder that learns powerful representations from unlabelled audio data. • 8 items • Updated Jan 16 • 12
Load 4-bit models 4x faster Collection Native bitsandbytes 4-bit pre-quantized models • 16 items • Updated 12 days ago • 20
WhisperKit Collection Datasets, models and evaluation results for WhisperKit • 1 item • Updated Mar 23 • 5
Long-Form Test Sets Collection A collection of long-form (samples > 30s) datasets used to evaluate the Distil-Whisper models. • 5 items • Updated Mar 21 • 5
Training Datasets Collection A collection of pseudo-labelled datasets used to train the Distil-Whisper model. • 9 items • Updated Mar 21 • 12
distil-large-v3 Collection This collection contains the model repositories for distil-large-v3, which provides support for the most popular Whisper libraries. • 4 items • Updated Mar 21 • 4
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 53
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Paper • 2403.08764 • Published Mar 13 • 33
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11 • 33
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29 • 44
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27 • 182
Matryoshka Embedding Models Collection https://huggingface.co/blog/matryoshka • 11 items • Updated Mar 27 • 10
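For context, Matryoshka embedding models are trained so that a leading prefix of the embedding vector is itself a usable embedding: you can truncate to the first k dimensions and renormalize, trading a little accuracy for much cheaper storage and search. A minimal sketch with toy vectors (pure Python; the helper names and the 8-dim vectors are made up for illustration):

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` dimensions of a Matryoshka embedding and
    renormalize to unit length so cosine similarity stays meaningful."""
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dim embeddings; real Matryoshka models use hundreds of dimensions.
doc = [0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.01]
query = [0.39, 0.31, 0.18, 0.12, 0.04, 0.03, 0.01, 0.02]

# Truncating to 4 dims halves storage and search cost while the
# similarity between near-duplicate vectors stays close to 1.
full_sim = cosine(truncate_embedding(doc, 8), truncate_embedding(query, 8))
small_sim = cosine(truncate_embedding(doc, 4), truncate_embedding(query, 4))
```

The renormalization step matters: without it, truncated vectors have smaller norms and cosine/dot-product scores are no longer comparable across dimensions.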
OpenMath Collection A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items • Updated Feb 19 • 26
InstructRetro Collection InstructRetro is an autoregressive decoder-only language model (LM) with retrieval-augmented pretraining and instruction tuning. • 4 items • Updated 4 days ago • 6
ML for Tools Collection Collection of papers about ML for using tools! • 25 items • Updated Jan 17 • 8
Comparing DPO with IPO and KTO Collection A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated Jan 9 • 31
Model Merging Collection Model merging is a very popular technique nowadays in the LLM space. Here is a chronological list of papers on the topic that will help you get started with it! • 28 items • Updated Mar 23 • 172
Zeroshot Classifiers Collection These are my current best zero-shot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer and better. • 11 items • Updated 29 days ago • 75
Switch-Transformers release Collection This release included various MoE (Mixture of Experts) models based on the T5 architecture. The base models use from 8 to 256 experts. • 9 items • Updated 23 days ago • 11
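As background on this release: Switch Transformers replace each dense FFN with a set of experts and route every token to exactly one of them. A minimal sketch of the top-1 routing rule (hypothetical `switch_route` helper with toy logits; real models route per token per layer):

```python
import math

def switch_route(router_logits):
    """Top-1 routing as in Switch Transformers: softmax the router logits,
    pick the argmax expert, and return its gate probability. The chosen
    expert's output is scaled by this gate, which keeps routing
    differentiable even though only one expert runs per token."""
    m = max(router_logits)                      # subtract max for stability
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    expert = probs.index(max(probs))
    return expert, probs[expert]

# One token, a router over 3 experts: the token goes to expert 1.
expert, gate = switch_route([1.0, 3.0, 0.5])
```

Because each token activates a single expert, an 8- to 256-expert model grows parameter count without growing per-token compute.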
Nemotron 3 8B Collection The Nemotron 3 8B Family of models is optimized for building production-ready generative AI applications for the enterprise. • 5 items • Updated Feb 19 • 35
zephyr story Collection Sources mentioned by hf.co/thomwolf in the tweet x.com/Thom_Wolf/status/1720503998518640703 • 8 items • Updated Jan 24 • 15
AgentTuning: Enabling Generalized Agent Abilities for LLMs Paper • 2310.12823 • Published Oct 19, 2023 • 33
Accelerating LLM Inference with Staged Speculative Decoding Paper • 2308.04623 • Published Aug 8, 2023 • 20
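The core idea behind speculative decoding: a small draft model proposes several tokens cheaply, and the large target model verifies them in one pass, keeping the longest agreeing prefix plus one corrected token. A toy greedy sketch (the `draft_next`/`target_next` callables are stand-ins for real models, not any library API):

```python
def speculative_decode(draft_next, target_next, prompt, k=4, max_len=12):
    """Greedy speculative decoding sketch: the draft proposes up to k
    tokens; the target accepts them in order and, on the first
    disagreement, emits its own token instead."""
    out = list(prompt)
    while len(out) < max_len:
        # Draft proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies; its token is always kept, so every round
        # makes progress even when the draft is wrong.
        for t in proposal:
            expected = target_next(out)
            out.append(expected)
            if expected != t or len(out) >= max_len:
                break
    return out

# Toy "models": the target repeats "abc"; the draft gets every third
# token wrong, so each round accepts two tokens plus one correction.
def target_next(ctx):
    return "abc"[len(ctx) % 3]

def draft_next(ctx):
    return "abx"[len(ctx) % 3]

out = speculative_decode(draft_next, target_next, ["a"], k=4, max_len=9)
```

The output is identical to what greedy decoding with the target alone would produce; the speedup comes from the target verifying several draft tokens per forward pass instead of generating one at a time.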
Contrastive Decoding Improves Reasoning in Large Language Models Paper • 2309.09117 • Published Sep 17, 2023 • 37
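Contrastive decoding scores candidate tokens by the gap between an expert and an amateur model's log-probabilities, restricted to tokens the expert itself finds plausible. A simplified sketch with made-up next-token distributions (the `alpha` cutoff is a stand-in for the paper's plausibility constraint):

```python
import math

def contrastive_pick(expert_probs, amateur_probs, alpha=0.1):
    """Contrastive decoding sketch: keep tokens whose expert probability
    is at least alpha * (max expert probability), then pick the one that
    maximizes log p_expert - log p_amateur. The mask prevents the
    contrast from promoting tokens the expert considers implausible."""
    cutoff = alpha * max(expert_probs.values())
    plausible = {t: p for t, p in expert_probs.items() if p >= cutoff}
    return max(plausible,
               key=lambda t: math.log(plausible[t]) - math.log(amateur_probs[t]))

# Toy distributions: the amateur over-weights the generic token "the",
# so the contrast demotes it in favor of content words.
expert = {"paris": 0.5, "france": 0.3, "the": 0.2}
amateur = {"paris": 0.2, "france": 0.1, "the": 0.7}
choice = contrastive_pick(expert, amateur)
```

Raising `alpha` shrinks the plausible set toward the expert's greedy choice, which is how the method interpolates between contrastive and ordinary decoding.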
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 56
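YaRN extends a RoPE model's context window by rescaling the rotary position frequencies. The simplest baseline it builds on is linear position interpolation, where positions are compressed by the target/train length ratio; YaRN refines this by treating high- and low-frequency bands differently. A sketch of that baseline only (hypothetical `rope_angles` helper, not YaRN's full scheme):

```python
import math

def rope_angles(pos, dim=8, base=10000.0, scale=1.0):
    """Rotation angles RoPE applies at position `pos`, one per frequency
    pair. `scale` > 1 is linear position interpolation: positions are
    compressed so the model never sees rotations beyond its trained
    range."""
    return [(pos / scale) * base ** (-2 * i / dim) for i in range(dim // 2)]

# With 4x interpolation, position 4096 reuses the angles of position
# 1024, so a model trained on 1k-token contexts stays in-distribution.
extended = rope_angles(4096, scale=4.0)
baseline = rope_angles(1024)
```

The known cost of uniform interpolation is that it also squeezes the high-frequency dimensions that encode fine-grained relative positions, which is the problem YaRN's per-band interpolation addresses.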
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning Paper • 2307.02053 • Published Jul 5, 2023 • 23
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023 • 30
Lost in the Middle: How Language Models Use Long Contexts Paper • 2307.03172 • Published Jul 6, 2023 • 31
A Deep Neural Network for SSVEP-based Brain-Computer Interfaces Paper • 2011.08562 • Published Nov 17, 2020 • 2