PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30 • 36
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 9 items • Updated about 16 hours ago • 58
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 53
A Critical Evaluation of AI Feedback for Aligning Large Language Models Paper • 2402.12366 • Published Feb 19 • 3
Reward models on the hub Collection UNMAINTAINED: See RewardBench... A place to collect reward models, an often not released artifact of RLHF. • 18 items • Updated 20 days ago • 23
⛔️🔦 Provenance, Watermarking & Deepfake Detection Collection Technical tools for more control over non-consensual synthetic content • 14 items • Updated Apr 1 • 35
Universal token classification Collection Collection of universal token classification (UTC) models capable in prompt-tuned manner to solve many information extraction tasks. • 5 items • Updated Jan 15 • 7
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 72
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models Paper • 2401.00788 • Published Jan 1 • 21
LLM-Assisted Code Cleaning For Training Accurate Code Generators Paper • 2311.14904 • Published Nov 25, 2023 • 3
Seamless Communication Collection A significant step towards removing language barriers through expressive, fast and high-quality AI translation. • 16 items • Updated Jan 16 • 120
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated 30 days ago • 75
JudgeLM: Fine-tuned Large Language Models are Scalable Judges Paper • 2310.17631 • Published Oct 26, 2023 • 31
This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models Paper • 2310.15941 • Published Oct 24, 2023 • 6
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models Paper • 2310.08491 • Published Oct 12, 2023 • 48
tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation Paper • 2301.05948 • Published Jan 14, 2023 • 3
Nougat: Neural Optical Understanding for Academic Documents Paper • 2308.13418 • Published Aug 25, 2023 • 33