Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker • Apr 8, 2021
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks Paper • 2404.14723 • Published Apr 2024
JetMoE: Reaching Llama2 Performance with 0.1M Dollars Paper • 2404.07413 • Published Apr 2024
HF-curated models available on Workers AI Collection • A collection of models curated with Hugging Face that can be run on Cloudflare's Workers AI serverless inference platform. • 15 items
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning Paper • 2402.11411 • Published Feb 18, 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper • 2403.08763 • Published Mar 13, 2024
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024
Awesome SFT datasets Collection • A curated list of interesting datasets for fine-tuning language models. • 43 items
Distil-Whisper Models Collection • The first version of the Distil-Whisper models, released with the Distil-Whisper paper. • 4 items • Updated Mar 21
Zephyr 7B Collection • Models, datasets, and demos associated with Zephyr 7B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 9 items
Textbooks Are All You Need II: phi-1.5 technical report Paper • 2309.05463 • Published Sep 11, 2023