The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper • 2402.17764 • Published Feb 27 • 592
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information • Paper • 2402.13616 • Published Feb 21 • 45
Scalable Pre-training of Large Autoregressive Image Models • Paper • 2401.08541 • Published Jan 16 • 35
Comparing DPO with IPO and KTO • Collection • A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated Jan 9 • 31
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning • Paper • 2312.11461 • Published Dec 18, 2023 • 18
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention • Paper • 2312.07987 • Published Dec 13, 2023 • 40
Juanako 7B - UNA: Uniform Neural Alignment • Collection • These are the Juanako 7B Trained with SFT & DDP & UNA • 8 items • Updated Dec 2, 2023 • 3