Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Paper • 2405.11273 • Published 12 days ago • 15
Article CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models • 6 days ago • 16
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 57
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 42
Article Train custom AI models with the trainer API and adapt them to 🤗 By not-lain • 4 days ago • 19
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 11 items • Updated 13 days ago • 102
Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • 22 days ago • 24
Article SeeMoE: Implementing a MoE Vision Language Model from Scratch By AviSoori1x • 24 days ago • 24
Article seemore: Implement a Vision Language Model from Scratch By AviSoori1x • 17 days ago • 42
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published about 1 month ago • 114
Article 🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets By dvilasuero • Apr 26 • 55
Vision Language Models Papers 🖼️💬📝 Collection Papers about vision-language models; the most important ones are at the top of the list. • 27 items • Updated 30 days ago • 26
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 20 items • Updated 8 days ago • 293
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases. • 5 items • Updated Apr 18 • 545
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 58
Article DS-MoE: Making MoE Models More Efficient and Less Memory-Intensive By bpan • Apr 9 • 26
Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • Mar 20 • 24
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 102
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 567
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Paper • 2401.01335 • Published Jan 2 • 61
Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition Paper • 2404.00565 • Published Mar 31 • 6
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27 • 89
Larimar: Large Language Models with Episodic Memory Control Paper • 2403.11901 • Published Mar 18 • 30
Common Corpus Collection The largest public domain dataset for training LLMs. • 26 items • Updated Mar 20 • 103
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 120
Language models scale reliably with over-training and on downstream tasks Paper • 2403.08540 • Published Mar 13 • 13
Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper • 2403.08763 • Published Mar 13 • 48
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 175
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs Paper • 2403.02775 • Published Mar 5 • 11
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation Paper • 2401.08417 • Published Jan 16 • 26
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models Paper • 2309.11674 • Published Sep 20, 2023 • 29
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method Paper • 2402.17193 • Published Feb 27 • 23