view article Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • 9 days ago • 31
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 11 days ago • 226
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 15 days ago • 454
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6 • 12
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published 21 days ago • 58
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples Paper • 2404.07544 • Published 22 days ago • 15
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published 22 days ago • 36
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published 23 days ago • 90
view article Article It's raining diffusion personalization techniques☔️🎭🖼️ By linoyts • 22 days ago • 15
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published 25 days ago • 55
view article Article Pollen-Vision: Unified interface for Zero-Shot vision models in robotics Mar 25 • 6
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published about 1 month ago • 98
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 172
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models Paper • 2403.00818 • Published Feb 26 • 12
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 561
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 4 days ago • 158
CroissantLLM: A Truly Bilingual French-English Language Model Paper • 2402.00786 • Published Feb 1 • 22
Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access Paper • 2401.09967 • Published Jan 18 • 1
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated 30 days ago • 75
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation Paper • 2401.02117 • Published Jan 4 • 25
Possible Meissner effect near room temperature in copper-substituted lead apatite Paper • 2401.00999 • Published Jan 2 • 5
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 39
Gated Linear Attention Transformers with Hardware-Efficient Training Paper • 2312.06635 • Published Dec 11, 2023 • 3
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts Paper • 2211.15841 • Published Nov 29, 2022 • 7
Tulu V2 Suite Collection The set of models associated with the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2" • 19 items • Updated Feb 1 • 43
System 2 Attention (is something you might need too) Paper • 2311.11829 • Published Nov 20, 2023 • 38
Zephyr 7B Collection Models, datasets, and demos associated with Zephyr 7B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 9 items • Updated 21 days ago • 136
Reward models on the hub Collection UNMAINTAINED: See RewardBench... A place to collect reward models, an often not released artifact of RLHF. • 18 items • Updated 20 days ago • 23
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference Paper • 2310.04378 • Published Oct 6, 2023 • 19
Contra (Bottleneck T5) Collection Text autoencoders capable of embedding and generating text in a fixed-size latent space, useful for embeddings and latent space text editing. • 4 items • Updated Oct 3, 2023 • 27
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion Paper • 2310.03502 • Published Oct 5, 2023 • 74
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 440
Neural signature kernels as infinite-width-depth-limits of controlled ResNets Paper • 2303.17671 • Published Mar 30, 2023 • 1
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing Paper • 2110.07205 • Published Oct 14, 2021 • 4
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints Paper • 2305.13245 • Published May 22, 2023 • 5
The First Room-Temperature Ambient-Pressure Superconductor Paper • 2307.12008 • Published Jul 22, 2023 • 3