Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18 • 51
Generic 3D Diffusion Adapter Using Controlled Multi-View Editing Paper • 2403.12032 • Published Mar 18 • 14
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8 • 39
Transformers compatible Mamba Collection This release includes the `mamba` repositories compatible with the `transformers` library • 5 items • Updated Mar 6 • 27
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models Paper • 2403.00818 • Published Feb 26 • 13
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 568
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling Paper • 2402.10211 • Published Feb 15 • 8
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation Paper • 2402.10210 • Published Feb 15 • 28
Secrets of RLHF in Large Language Models Part II: Reward Modeling Paper • 2401.06080 • Published Jan 11 • 23
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 74
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM Paper • 2401.02994 • Published Jan 4 • 44
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning Paper • 2401.01325 • Published Jan 2 • 25
🔮 Mixture of Experts Collection MoE done using mergekit and LazyMergekit: https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb#scrollTo=d5mYzDo1q96y • 13 items • Updated 12 days ago • 21
Vision Models (GGUF) Collection How to use: Download a "mmproj" model file + one or more of the primary model files. • 5 items • Updated Dec 22, 2023 • 34
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Paper • 2401.01335 • Published Jan 2 • 61
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models Paper • 2312.16693 • Published Dec 27, 2023 • 12
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Paper • 2312.06585 • Published Dec 11, 2023 • 26
YUAN 2.0: A Large Language Model with Localized Filtering-based Attention Paper • 2311.15786 • Published Nov 27, 2023 • 7
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster Paper • 2311.08263 • Published Nov 14, 2023 • 14
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond Paper • 2309.16583 • Published Sep 28, 2023 • 12
Multimodal Foundation Models: From Specialists to General-Purpose Assistants Paper • 2309.10020 • Published Sep 18, 2023 • 39
Contrastive Decoding Improves Reasoning in Large Language Models Paper • 2309.09117 • Published Sep 17, 2023 • 37
GPT Can Solve Mathematical Problems Without a Calculator Paper • 2309.03241 • Published Sep 6, 2023 • 17
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer Paper • 2308.06873 • Published Aug 14, 2023 • 24