Collections
Collections including paper arxiv:2309.10668
Collection 1:
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 90
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 33
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 17
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
  Paper • 2402.09727 • Published • 35

Collection 2:
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  Paper • 2311.00430 • Published • 53
- SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
  Paper • 2307.01952 • Published • 73
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 79
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
  Paper • 2311.00871 • Published • 2

Collection 3:
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
  Paper • 2309.08532 • Published • 50
- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 37
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 69
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 79

Collection 4:
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 77
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 57
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 79