Article Docmatix - a huge dataset for Document Visual Question Answering 6 days ago • 49
Article Introducing ⚔️ AI vs. AI ⚔️ a deep reinforcement learning multi-agents competition system Feb 7, 2023 • 1
Article Optimisation d'un système RAG pour la recherche sémantique By Woziii • 10 days ago • 1
Article Experimenting with Automatic PII Detection on the Hub using Presidio 14 days ago • 20
Article _Repetita iuvant_: how to improve AI code generation By as-cle-bert • 16 days ago • 5
Expressive Gaussian Human Avatars from Monocular RGB Video Paper • 2407.03204 • Published 20 days ago • 1
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28 • 124
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 581
Arabic NLI & Semantic Similarity Datasets Collection The Arabic Version of SNLI and MultiNLI datasets, originally used for Natural Language Inference (NLI), may be used for finetuning embedding models. • 6 items • Updated Jun 18 • 3
Article EU Training Data Transparency: A Proposal for a Sufficiently Detailed Summary 📑📚🖼️🇪🇺 By yjernite • 20 days ago • 8
Arabic Matryoshka Embedding Models Collection A collection of advanced Arabic Matryoshka Embedding Models designed for efficient and high-performance Arabic NLP, available publicly on Hugging Face • 6 items • Updated 11 days ago • 6
Article How I train a LoRA: m3lt style training overview By alvdansen • 22 days ago • 37
Article Formatting Datasets for Chat Template Compatibility By nroggendorff • 25 days ago • 7
Probably DPO datasets Collection A collection of datasets that probably support DPO • 146 items • Updated 27 days ago • 8
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 28 days ago • 75
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models 30 days ago • 145
Article Enhancing Image Model Dreambooth Training Through Effective Captioning: Key Observations By alvdansen • Jun 19 • 11
How Do Large Language Models Acquire Factual Knowledge During Pretraining? Paper • 2406.11813 • Published Jun 17 • 29
MobileCLIP Models + DataCompDR Data Collection MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Jun 20 • 19
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 47
Quantum Embedding with Transformer for High-dimensional Data Paper • 2402.12704 • Published Feb 20 • 2
INDUS: Effective and Efficient Language Models for Scientific Applications Paper • 2405.10725 • Published May 17 • 30
Article Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task By danaaubakirova • May 16 • 15