EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published Sep 17 • 17
Seamless Communication Collection A significant step towards removing language barriers through expressive, fast and high-quality AI translation. • 16 items • Updated Jan 16 • 150
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 56
BigVGAN Collection BigVGAN is a universal neural vocoder that generates audio waveform using mel spectrogram as input. • 11 items • Updated Oct 1 • 10
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing Paper • 2110.07205 • Published Oct 14, 2021 • 5
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 27 items • Updated 6 days ago • 486
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression Paper • 2406.14909 • Published Jun 21 • 13
view article Article Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens tokens and 11 languages May 24 • 24
view article Article Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Apr 22 • 78
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 251
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Sep 25 • 680
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9 • 64
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15 • 11
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 63