view article Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA May 24, 2023 • 19
Korean Datasets I've released so far. Collection 지금까지 업로드한 한국어 데이터셋 콜렉션입니다. • 6 items • Updated Dec 29, 2023 • 14
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training Paper • 2309.10400 • Published Sep 19, 2023 • 18
FewMany Collection Benchmark For Few Shot Classification with Many Classes • 8 items • Updated 13 days ago • 6
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 9 days ago • 222
LayoutLM Collection The LayoutLM series are Transformer encoders useful for document AI tasks such as invoice parsing, document image classification and DocVQA. • 5 items • Updated Nov 27, 2023 • 7
Table Transformer Collection The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images. • 5 items • Updated Dec 1, 2023 • 10
view article Article Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent 10 days ago • 59
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 13 days ago • 444
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 233
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 52
Long-context LLMs Struggle with Long In-context Learning Paper • 2404.02060 • Published 29 days ago • 31
HF-curated models available on Workers AI Collection A collection of models curated with Hugging Face that can be run on Cloudflare's Workers AI serverless inference platform. • 15 items • Updated 29 days ago • 44
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk Paper • 2401.05033 • Published Jan 10 • 14
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27 • 85
Arcee's MergeKit: A Toolkit for Merging Large Language Models Paper • 2403.13257 • Published Mar 20 • 16
Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper • 2403.08763 • Published Mar 13 • 48
StructLM Collection The structure knowledge grounded language model • 6 items • Updated 25 days ago • 6
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Paper • 2403.04132 • Published Mar 7 • 38
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models Paper • 2403.00818 • Published Feb 26 • 12
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention Paper • 2303.16199 • Published Mar 28, 2023 • 3
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 560
GPTVQ: The Blessing of Dimensionality for LLM Quantization Paper • 2402.15319 • Published Feb 23 • 19
OpenMath Collection A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items • Updated Feb 19 • 26
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 32
LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 65 items • Updated 2 days ago • 287
The Big Benchmarks Collection Collection Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard) • 12 items • Updated Feb 7 • 73
MPT Collection The MPT collections is a series of decoder-style transformer models trained from scratch by MosaicML. Details: https://www.mosaicml.com/mpt • 9 items • Updated Nov 27, 2023 • 7
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena Paper • 2306.05685 • Published Jun 9, 2023 • 20
Textbooks Are All You Need II: phi-1.5 technical report Paper • 2309.05463 • Published Sep 11, 2023 • 84
Zephyr 7B Gemma Collection Models, dataset, and Demo for Zephyr 7B Gemma. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 5 items • Updated 19 days ago • 15
Self-Discover: Large Language Models Self-Compose Reasoning Structures Paper • 2402.03620 • Published Feb 6 • 102
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts Paper • 2402.07625 • Published Feb 12 • 10
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning Paper • 2402.06332 • Published Feb 9 • 17
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference Paper • 2401.08671 • Published Jan 9 • 12
Hydragen: High-Throughput LLM Inference with Shared Prefixes Paper • 2402.05099 • Published Feb 7 • 16
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Paper • 2309.00267 • Published Sep 1, 2023 • 45
Nomic Embed: Training a Reproducible Long Context Text Embedder Paper • 2402.01613 • Published Feb 2 • 13
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty Paper • 2401.15077 • Published Jan 26 • 17
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 55
Dr. LLaMA: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation Paper • 2305.07804 • Published May 12, 2023 • 2
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models Paper • 2309.02706 • Published Sep 6, 2023 • 2