Speaker Diarization Datasets Collection A collection of speaker diarization datasets compatible with Diarizers. • 4 items • Updated 10 days ago • 1
End-to-end speaker segmentation for overlap-aware resegmentation Paper • 2104.04045 • Published Apr 8, 2021 • 1
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation Paper • 2210.13248 • Published Oct 24, 2022 • 1
Granite Code Models Collection A series of code models trained by IBM and licensed under the Apache 2.0 license. We release both the base pretrained and instruct models. • 10 items • Updated about 16 hours ago • 106
llama 3 self-align experiments Collection Replicating the pipeline for StarCoder-2 Instruct on Llama-3-8B with some tweaks https://huggingface.co/blog/sc2-instruct • 4 items • Updated 3 days ago • 5
Community Tools Collection Cool HF tools that I and others at HF work on that I regularly use • 3 items • Updated Sep 8, 2023 • 3
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval Paper • 2311.05800 • Published Nov 10, 2023 • 2
🦢SWIM-IR Dataset Collection 29 million Synthetic Wikipedia-based Multilingual Retrieval Training Pairs. • 4 items • Updated 14 days ago • 6
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits Paper • 2305.02547 • Published May 4, 2023 • 5
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 57
ablation-models Collection 1.8B models trained on 350BT to compare different pretraining datasets • 7 items • Updated 7 days ago • 20
Generalizable Face Landmarking Guided by Conditional Face Warping Paper • 2404.12322 • Published 24 days ago • 1
Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning Paper • 2404.12897 • Published 23 days ago • 1
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published 21 days ago • 25
Antidote Project Collection Data and models generated within the Antidote Project (https://univ-cotedazur.eu/antidote) • 20 items • Updated 6 days ago • 5
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing Paper • 2206.15076 • Published Jun 30, 2022 • 3
Arcee's MergeKit: A Toolkit for Merging Large Language Models Paper • 2403.13257 • Published Mar 20 • 16
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated 6 days ago • 68
MGM Collection Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated 9 days ago • 43
Inference Endpoints For Eval Spec. Models Collection Models I want to use upstream as part of evaluation libraries, then use them to optimize evaluations and downstream applications. • 8 items • Updated Apr 5 • 2
HF-curated models available on Workers AI Collection A collection of models curated with Hugging Face that can be run on Cloudflare's Workers AI serverless inference platform. • 15 items • Updated Apr 2 • 45
RED^FM: a Filtered and Multilingual Relation Extraction Dataset Paper • 2306.09802 • Published Jun 16, 2023 • 4
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27 • 37
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30 • 39
SambaLingo Collection Expert models that adapt Llama2 to a diverse set of languages from around the world. • 27 items • Updated 24 days ago • 34
UDOP Collection UDOP is a general multimodal model for document AI • 4 items • Updated 3 days ago • 19
Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures Paper • 2402.05424 • Published Feb 8 • 17
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 32
matlok - Python Src Code Datasets (base) Collection Python code from leading AI research and tools repositories • 2 items • Updated Feb 8 • 1
matlok - Python Code Instruction Datasets Collection Python Alpaca instructions from leading AI research and tools repositories - focus is on "Manager level" understanding atm • 4 items • Updated Feb 12 • 1
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR Paper • 2310.00274 • Published Sep 30, 2023 • 2
Segment Anything Model Collection This collection contains models and demos of SAM and its smaller friends. • 17 items • Updated Mar 11 • 6
🎦🔀 Useful Tiny Video Converters Collection All spaces made to convert a video (or GIFs) to anything useful in your pipelines • 5 items • Updated Feb 20 • 5
File names and splits Collection 8 datasets that showcase the diversity of split configurations on Hugging Face. See docs: https://huggingface.co/docs/hub/datasets-file-names-and-splits. • 8 items • Updated Nov 22, 2023 • 4