Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints
Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker Apr 8, 2021
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Paper • 2405.04434
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2
Improved Baselines with Visual Instruction Tuning Paper • 2310.03744 • Published Oct 5, 2023
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20