MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Paper • 2407.21770 • Published Jul 31 • 20
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 39
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems Paper • 2402.12875 • Published Feb 20 • 7
Article ArabicWeb24: Creating a High Quality Arabic Web-only Pre-training Dataset By MayFarhat • Aug 8 • 9
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Paper • 2409.06633 • Published 8 days ago • 14
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published 14 days ago • 68
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning Paper • 2402.10110 • Published Feb 15 • 3
Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models Paper • 2409.02076 • Published 15 days ago • 9
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Paper • 2409.04410 • Published 12 days ago • 23
Configurable Foundation Models: Building LLMs from a Modular Perspective Paper • 2409.02877 • Published 14 days ago • 26
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators Paper • 2403.16950 • Published Mar 25 • 4
Article How to train a new language model from scratch using Transformers and Tokenizers Feb 14, 2020 • 16
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts Paper • 2408.15901 • Published 21 days ago • 1
LLaVA-Onevision Collection LLaVa_Onevision models for single-image, multi-image, and video scenarios • 7 items • Updated 29 days ago • 7
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29 • 68
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated about 3 hours ago • 106
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published 14 days ago • 27
OLMoE Collection Artifacts for open mixture-of-experts language models. • 13 items • Updated 4 days ago • 18
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published 20 days ago • 49
Visual Understanding Collection Accurate & efficient vision models, ops and systems • 15 items • Updated Jul 15 • 1
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5 • 17
Probably function calling datasets Collection Created using the https://huggingface.co/spaces/librarian-bots/dataset-column-search-api Space. • 39 items • Updated Jul 17 • 35
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published 22 days ago • 36
Article Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 24
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published 27 days ago • 109
Article Deep Learning over the Internet: Training Language Models Collaboratively Jul 15, 2021 • 4
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12 • 54
Better Alignment with Instruction Back-and-Forth Translation Paper • 2408.04614 • Published Aug 8 • 14
🦅 🐍 FalconMamba 7B Collection This collection features the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo. • 13 items • Updated about 8 hours ago • 25
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs Paper • 2407.02552 • Published Jul 2 • 4
Llama 3.1 Collection This collection hosts the transformers and original repos of the Meta Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Aug 2 • 567
Arabic Matryoshka Embedding Models Collection A collection of advanced Arabic Matryoshka Embedding Models designed for efficient and high-performance Arabic NLP, available publicly on Hugging Face • 9 items • Updated Aug 2 • 8
Article Formatting Datasets for Chat Template Compatibility By nroggendorff • Jun 28 • 7
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 82
Article How to Fine-Tune Custom Embedding Models Using AutoTrain By abhishek • May 30 • 10