Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints 23 days ago • 51
Article Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers Feb 1, 2022 • 2
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion Paper • 2405.09874 • Published 7 days ago • 14
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction Paper • 2405.10315 • Published 7 days ago • 9
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published 7 days ago • 21
CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published 7 days ago • 34
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published 8 days ago • 24
Dynamic data sampler for cross-language transfer learning in large language models Paper • 2405.10626 • Published 6 days ago • 3
Observational Scaling Laws and the Predictability of Language Model Performance Paper • 2405.10938 • Published 6 days ago • 9
Layer-Condensed KV Cache for Efficient Inference of Large Language Models Paper • 2405.10637 • Published 6 days ago • 14
INDUS: Effective and Efficient Language Models for Scientific Applications Paper • 2405.10725 • Published 6 days ago • 15
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization Paper • 2405.11582 • Published 4 days ago • 9
Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching Paper • 2405.11252 • Published 5 days ago • 10
Towards Modular LLMs by Building and Reusing a Library of LoRAs Paper • 2405.11157 • Published 6 days ago • 17
Imp: Highly Capable Large Multimodal Models for Mobile Devices Paper • 2405.12107 • Published 3 days ago • 19
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework Paper • 2405.11143 • Published 4 days ago • 28
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published 3 days ago • 35
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published 4 days ago • 42
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 11 items • Updated 6 days ago • 97
Granite Code Models Collection A series of code models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 14 items • Updated 1 day ago • 126
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 235
OBELICS 📚🔍 Collection Collection gathering artifacts related to OBELICS • 4 items • Updated Apr 15 • 5
🐶 IDEFICS 🐶 Collection Collection assembling all the models and spaces related to IDEFICS • 6 items • Updated Apr 15 • 7
From screenshots to HTML Collection WebSight is a dataset of 823,000 HTML/CSS codes representing synthetically generated English websites, each accompanied by a corresponding screenshot. • 4 items • Updated Apr 15 • 15
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated 17 days ago • 80
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published Apr 11 • 39
StarChat2 15B Collection Model, datasets, and demo for StarChat2 15B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 10 items • Updated Apr 12 • 11
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 173
OpenMath Collection A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items • Updated Feb 19 • 28
Canary Collection A collection of multilingual and multitask speech to text models from NVIDIA NeMo 🐤 • 1 item • Updated Feb 19 • 14
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 11 days ago • 173
OLMo Suite Collection Artifacts for the first set of OLMo models. • 12 items • Updated 8 days ago • 36
MAGNeT Collection Masked Audio Generation using a Single Non-Autoregressive Transformer • 9 items • Updated Apr 4 • 30
Seamless: Multilingual Expressive and Streaming Speech Translation Paper • 2312.05187 • Published Dec 8, 2023 • 8
Apple MLX-compatible 7B LLMs on the 🤗 Hub Collection This collection contains the model weights for 7B LLMs for Apple's MLX framework. Find more information at https://github.com/ml-explore/mlx • 8 items • Updated 16 days ago • 9
Distil-Whisper Models Collection The first version of the Distil-Whisper models released with the Distil-Whisper paper. • 4 items • Updated Mar 21 • 34
Seamless Communication Collection A significant step towards removing language barriers through expressive, fast and high-quality AI translation. • 16 items • Updated Jan 16 • 125
Controllable Music Production with Diffusion Models and Guidance Gradients Paper • 2311.00613 • Published Nov 1, 2023 • 23
Audio Codecs Embeddings 🎙️ Collection A collection of codec and embedding models supported in 🤗 Transformers. • 2 items • Updated Sep 16, 2023 • 1
Text to Music 🎧 Collection A collection of music generation models supported in 🤗 Transformers and 🧨 Diffusers • 5 items • Updated Sep 16, 2023 • 2
Audio Classification 🔊 Collection A collection of audio classification models supported in 🤗 Transformers • 3 items • Updated Sep 16, 2023 • 3
Text to Speech 🗣️ Collection A collection of TTS models supported in 🤗 Transformers. • 4 items • Updated Sep 16, 2023 • 5