The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 8 days ago • 71
Tokenization Falling Short: The Curse of Tokenization Paper • 2406.11687 • Published 15 days ago • 13
view article Article How to Fine-Tune Custom Embedding Models Using AutoTrain By abhishek • May 30 • 10
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 18 days ago • 147
IrokoBench Collection a human-translated benchmark dataset for 16 African languages covering three tasks: NLI, MMLU and MGSM • 6 items • Updated May 31 • 14
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 29 items • Updated 26 days ago • 226
📚 FineWeb-Edu Collection FineWeb-Edu datasets, classifier and ablation model • 5 items • Updated 21 days ago • 7
Arabic Aya DPO Datasets Collection Our synthetic DPO datasets for Arabic Aya. • 5 items • Updated 28 days ago • 3
Arabic NoRobots DPO Datasets Collection Our synthetic DPO datasets for Arabic NoRobots. • 4 items • Updated May 29 • 4
view article Article ⚗️ 🔥 Building High-Quality Datasets with distilabel and Prometheus 2 By burtenshaw • 29 days ago • 21
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Paper • 2405.11273 • Published May 18 • 17
view article Article CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models May 24 • 20
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 58
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 44
view article Article Train custom AI models with the trainer API and adapt them to 🤗 By not-lain • 3 days ago • 28
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated 6 days ago • 119
view article Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • May 7 • 29
view article Article SeeMoE: Implementing a MoE Vision Language Model from Scratch By AviSoori1x • 9 days ago • 30
view article Article seemore: Implement a Vision Language Model from Scratch By AviSoori1x • 9 days ago • 48
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 116
view article Article 🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets By dvilasuero • 29 days ago • 64
Vision Language Models Papers 🖼️💬📝 Collection Papers about vision-language models, most important ones are on top of the list. • 27 items • Updated Apr 30 • 27
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 22 items • Updated May 31 • 353
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18 • 618
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 58
view article Article DS-MoE: Making MoE Models More Efficient and Less Memory-Intensive By bpan • Apr 9 • 28
view article Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20 • 33
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 102
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 574
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Paper • 2401.01335 • Published Jan 2 • 61
Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition Paper • 2404.00565 • Published Mar 31 • 6