AIMO Progress Prize Collection Models and datasets used in the winning solution to the AIMO 1st Progress Prize • 7 items • Updated 3 days ago • 6
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 27 days ago • 75
Parallel Sentences Datasets Collection These datasets all have "english" and "non_english" columns and can be used to make embedding models multilingual. • 13 items • Updated Jun 18 • 5
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 67 items • Updated 19 days ago • 52
MS MARCO Mined Triplets Collection These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets. • 14 items • Updated May 21 • 7
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training Paper • 2405.06932 • Published May 11 • 15
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29 • 67
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1 • 30
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression Paper • 2306.03078 • Published Jun 5, 2023 • 3
The case for 4-bit precision: k-bit Inference Scaling Laws Paper • 2212.09720 • Published Dec 19, 2022 • 3
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published Apr 11 • 40
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer and better. • 11 items • Updated Apr 3 • 87
Preference Datasets for KTO Collection This collection contains curated preference datasets for KTO fine-tuning, aligning LLMs with intent through binary feedback signals. • 5 items • Updated Mar 19 • 11
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 180
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8 • 69
LLM Leaderboard best models ❤️🔥 Collection A daily updated list of the models with the best evaluations on the LLM leaderboard. • 264 items • Updated 30 days ago • 352
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 32
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 54
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 169
CoEdIT: Text Editing by Task-Specific Instruction Tuning Paper • 2305.09857 • Published May 17, 2023 • 6
LongNet: Scaling Transformers to 1,000,000,000 Tokens Paper • 2307.02486 • Published Jul 5, 2023 • 80
TART: A plug-and-play Transformer module for task-agnostic reasoning Paper • 2306.07536 • Published Jun 13, 2023 • 10
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning Paper • 2306.07967 • Published Jun 13, 2023 • 24